This paper analyzes the reliability and availability features of several commodity Chip MultiProcessors (CMPs) and finds that they contain numerous single points of failure. Failures in some system components, e.g., the interconnect, cache controller, and memory controller logic, leave CMPs vulnerable to errors even when the computation is Dual Modular Redundant (DMR) or Triple Modular Redundant (TMR). Furthermore, although CMPs contain some replicated resources, these cannot be used effectively for system-level protection because shared components lack fault isolation. This paper describes a CMP design that provides system-level error protection. The proposed design adds mode configuration features in hardware so that errors in any component can be tolerated.
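The following is a minimal Python sketch, not taken from the paper, that makes the single-point-of-failure argument concrete. It models TMR majority voting over three replicas that all read their input through one shared path, such as a memory controller. The names `tmr_vote`, `run_tmr`, and `square`, along with the single-bit-flip fault model, are hypothetical illustrations. A fault confined to one core is outvoted. A fault in the shared path corrupts all three replicas identically, so the voter accepts the wrong value.

```python
from collections import Counter
from typing import List, Optional, Sequence


def tmr_vote(outputs: Sequence[int]) -> Optional[int]:
    """Return the majority value of the replica outputs, or None if no two agree."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


def square(x: int) -> int:
    # Hypothetical workload executed redundantly on each replica.
    return x * x


def run_tmr(memory: dict, addr: int,
            core_fault: Optional[int] = None,
            shared_fault: bool = False) -> Optional[int]:
    """Run three replicas that all load their input through one shared path."""
    outputs: List[int] = []
    for core in range(3):
        x = memory[addr]
        if shared_fault:
            # Hypothetical bit flip in the shared memory-controller path:
            # every replica receives the same corrupted input.
            x ^= 1
        y = square(x)
        if core == core_fault:
            # Hypothetical bit flip inside a single core: only this replica is wrong.
            y ^= 1
        outputs.append(y)
    return tmr_vote(outputs)


if __name__ == "__main__":
    mem = {0: 6}
    print(run_tmr(mem, 0))                     # fault-free run
    print(run_tmr(mem, 0, core_fault=1))       # single-core fault is outvoted
    print(run_tmr(mem, 0, shared_fault=True))  # shared-path fault goes undetected
```

With `mem = {0: 6}`, the fault-free and single-core-fault runs both vote 36. The shared-path fault makes all three replicas compute 49 and agree on it. This is why replicating computation alone does not give system-level protection when the replicas share unisolated components.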