Fault diagnosis of distributed systems : analysis, simulation and performance measurement.

Author: Mohammed Thabit Sultan
Publication date: 01/01/1992

Fault diagnosis forms an essential component in the design of highly reliable distributed computing systems. Early models for diagnosis require a global observer, whereas in later models the diagnosis is shared among the system's nodes. These models are reviewed and their different diagnosability properties reconciled. The design of improved fault diagnosis algorithms for systems without a global observer provides the main motivation for the thesis. The modified algorithm SELF3 [Hoss88] is taken as a starting point. A number of communication architectures used in distributed systems are reviewed. The properties of diagnosis algorithms depend strongly on the testing graph. A general class of testing graphs, designated H-graphs (a generalization of the Dδt graphs introduced in [Prep67]), is investigated and its diagnostic properties determined. A software simulator for distributed systems has been written as the main investigative tool for diagnosis algorithms. The design and structure of the simulator are described. The diagnosis process is measured in terms of diagnostic time and number of messages produced, and the factors upon which these quantities depend are identified. Simulation results for a number of systems are given under various fault conditions. A modified way of routing diagnosis messages is presented which, especially in large systems, reduces both the number of diagnosis messages and the time required to perform diagnosis. The thesis also contains a number of specific recommendations for improving existing self-diagnosis algorithms.

Cranfield CERES