Digital systems have been entrusted with increasingly critical responsibilities, requiring high dependability. Often the use of high-quality components and design techniques does not sufficiently reduce the likelihood of system failures, and means must be provided to tolerate faults in the system. This article reviews the basic concepts of fault-tolerant computing, focusing on hardware. It examines failures, faults, and errors in digital systems and defines measures of dependability, which dictate and evaluate fault-tolerance strategies for different classes of applications. The various mechanisms for implementing a fault-tolerance strategy are reviewed, including error detection, fault masking, fault confinement, system reconfiguration and repair, and system recovery.