research

Quantifying fault recovery in multiprocessor systems

Abstract

Various aspects of reliable computing are formalized and quantified with emphasis on efficient fault recovery. The mathematical model which proves to be most appropriate is provided by the theory of graphs. New measures for fault recovery are developed and the value of elements of the fault recovery vector are observed to depend not only on the computation graph H and the architecture graph G, but also on the specific location of a fault. In the examples, a hypercube is chosen as a representative of parallel computer architecture, and a pipeline as a typical configuration for program execution. Dependability qualities of such a system is defined with or without a fault. These qualities are determined by the resiliency triple defined by three parameters: multiplicity, robustness, and configurability. Parameters for measuring the recovery effectiveness are also introduced in terms of distance, time, and the number of new, used, and moved nodes and edges

    Similar works