Transient Faults in Computer Systems

Abstract

A powerful technique particularly appropriate for the detection of errors caused by transient faults in computer systems was developed. The technique can be implemented in either software or hardware; the research conducted thus far primarily considered software implementations. The error detection technique developed has the distinct advantage of having provably complete coverage of all errors caused by transient faults that affect the output produced by the execution of a program. In other words, the technique does not have to be tuned to a particular error model to enhance error coverage. Also, the correctness of the technique can be formally verified. The technique uses time and software redundancy. The foundation for an effective, low-overhead, software-based certification trail approach to real-time error detection resulting from transient fault phenomena was developed

    Similar works