
A Large-Scale Study of Failures in High-Performance-Computing Systems

Publisher: Carnegie Mellon University
Format: 568.0 KB PDF
Date added: 01 Dec 2005
Topics: High Performance Computing

Designing highly dependable systems requires a good understanding of failure characteristics. Unfortunately, little raw data on failures in large IT installations is publicly available because such data is confidential. This paper analyzes soon-to-be-public failure data covering systems at a large high-performance-computing site. The data was collected over the past nine years at Los Alamos National Laboratory and includes 23,000 failures recorded on more than 20 different systems, mostly large clusters of SMP and NUMA nodes. The authors study the statistics of the data, including the root causes of failures, the mean time between failures, and the mean time to repair.
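The abstract names three summary statistics: the root-cause breakdown, the mean time between failures (MTBF) and the mean time to repair (MTTR). As a rough illustration only, the sketch below computes them from a list of failure records for a single system. The FailureRecord layout and its field names are hypothetical, and measuring time between failures from one failure start to the next is just one common convention. None of this is the data format or the method used in the paper.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class FailureRecord:
    # Hypothetical record layout for illustration; not the LANL data schema.
    system_id: int
    start_hours: float   # when the failure began, hours since observation start
    repair_hours: float  # downtime until the failed component was back in service
    root_cause: str      # e.g. "hardware", "software", "unknown"


def mtbf(records):
    """Mean time between failures for ONE system.

    Assumed convention: the mean gap between consecutive failure start times.
    """
    starts = sorted(r.start_hours for r in records)
    if len(starts) < 2:
        raise ValueError("need at least two failures to compute MTBF")
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return mean(gaps)


def mttr(records):
    """Mean time to repair: the average downtime per failure."""
    return mean(r.repair_hours for r in records)


def root_cause_shares(records):
    """Fraction of all failures attributed to each root-cause category."""
    counts = Counter(r.root_cause for r in records)
    total = sum(counts.values())
    return {cause: n / total for cause, n in counts.items()}


if __name__ == "__main__":
    # Made-up sample records, not data from the study.
    sample = [
        FailureRecord(1, 0.0, 2.0, "hardware"),
        FailureRecord(1, 100.0, 4.0, "software"),
        FailureRecord(1, 250.0, 6.0, "hardware"),
        FailureRecord(1, 300.0, 0.5, "unknown"),
    ]
    print(f"MTBF: {mtbf(sample):.1f} h")   # gaps 100, 150, 50 -> 100.0 h
    print(f"MTTR: {mttr(sample):.3f} h")   # (2 + 4 + 6 + 0.5) / 4 -> 3.125 h
    print(root_cause_shares(sample))
```

Records from different systems should be grouped by system_id before calling mtbf, since interleaving failures from independent machines would shrink the apparent gaps.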


Related white papers

HP print solutions and Barclays Wealth

Leading investment management advisor Barclays Wealth wanted to replace its disparate, multi-vendor print environment with a more efficient and environmentally sound solution. One of the business benefits: annual savings from...


HP print solutions for Logica

IT and business services provider Logica wanted to replace an ageing fleet of legacy printers and copiers from different vendors with a single-vendor solution which would reduce costs and increase...


Creating a Dynamic Infrastructure through Virtualization

In almost every case, the transformation to a dynamic infrastructure will involve virtualization. Many IT professionals think of virtualization specifically in terms of servers. IBM, however, has a broader perspective, in which virtualization is seen as...


Dynamic Infrastructure Helping Build a Smarter Planet: Delivering Superior Business and IT Services with Agility and Speed

In this smarter world, we need our infrastructure to propel us forward, not hold us back. This infrastructure becomes instrumented, interconnected and intelligent to bring together the business and IT...


IBM Virtualization Services

Virtualization is a powerful technology and can have profound effects on the datacenter; however, it should be viewed as a component of an overall IT strategy that will be able...


Go Green with IBM System x Servers and Intel Xeon Processors

By "going green" with energy-efficient IBM® System x™ servers featuring Intel® Xeon® processors, you can win back control of your IT budget—and win the battle with data center power constraints.


Recommended Practices for PC Fleet Management for Mid Market and Enterprise Organizations

PC management is both costly and ongoing. Desktop support alone soaks up 30 to 45 percent of IT budgets. But optimizing your PC fleet management strategy will produce efficiencies and lower costs. ...



Featured White Paper

Technical Description: IBM XIV Storage System

The IBM XIV® Storage System offers a new level of high-end disk system performance and reliability. It is a core component of the IBM Information Infrastructure, which helps clients address their needs for availability, security, compliance and retention of information. The XIV system provides consistency under all conditions, immunity to hotspots, ...


Other White Papers

Best Practices for Translating Customer Satisfaction into Revenue

Today's support organisations are focused on two top-level metrics: financial results and customer...

Data Quality Considerations for a Master Data Management Structure

Companies acquiring companies. Human Resources sharing information with Finance. Businesses...
