Runahead execution is a technique that improves processor performance by pre-executing the running application instead of stalling the processor when a long-latency cache miss occurs. Previous research has shown that this technique significantly improves processor performance. However, the efficiency of runahead execution, which directly affects the dynamic energy consumed by a runahead processor, has not been explored. A runahead processor executes significantly more instructions than a traditional out-of-order processor, sometimes without providing any performance benefit, which makes it inefficient. This paper describes the causes of inefficiency in runahead execution and proposes techniques to make a runahead processor more efficient, thereby reducing its energy consumption and possibly increasing its performance.