Mean Time Between Failures (MTBF) is a widely used management metric in IT, applied both to discrete components and to overall systems. It is an important metric to track for system reliability, because even fault-tolerant servers fail. As complexity and coupling increase, systemic failure becomes inevitable: component failures accumulate and interact in unanticipated ways.
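As a rough illustration of how the metric behaves for a component versus a whole system, the sketch below computes an observed MTBF from operating hours and failure count, then estimates the MTBF of a system in which every component must work. The function names and the sample figures are hypothetical. The system estimate assumes independent components with constant failure rates, a textbook simplification that deliberately ignores the interacting failures described above.

```python
from typing import Iterable


def mtbf(total_operating_hours: float, failure_count: int) -> float:
    """Observed MTBF: total operating time divided by the number of failures."""
    if failure_count <= 0:
        raise ValueError("at least one failure is needed to compute MTBF")
    return total_operating_hours / failure_count


def series_mtbf(component_mtbfs: Iterable[float]) -> float:
    """MTBF of a system that fails when any single component fails.

    Assumption: components fail independently at constant rates, so the
    system failure rate is the sum of the component failure rates.
    """
    system_failure_rate = sum(1.0 / m for m in component_mtbfs)
    return 1.0 / system_failure_rate


# Hypothetical figures, for illustration only.
observed = mtbf(total_operating_hours=10_000, failure_count=4)
system = series_mtbf([50_000, 100_000, 100_000])

print(f"Observed component MTBF: {observed:.0f} hours")
print(f"Estimated system MTBF:   {system:.0f} hours")
```

With these sample inputs the system estimate comes out at roughly half the MTBF of its weakest component, which shows why adding components lowers overall MTBF even when each part is individually reliable.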


