Fault diagnosis of distributed systems : analysis, simulation and performance measurement.
Fault diagnosis forms an essential component in the design of highly reliable distributed
computing systems. Early models for diagnosis require a global observer, whereas the
diagnosis is shared among the system's nodes in later models. These models are reviewed
and their different diagnosability properties reconciled. The design of improved fault
diagnosis algorithms for systems without a global observer provides the main motivation
for the thesis. The modified algorithm SELF3 [Hoss88] is taken as a starting point.
A number of communication architectures used in distributed systems are reviewed. The
properties of diagnosis algorithms depend strongly on the testing graph. A general class
of testing graphs, designated H-graphs (a generalization of the Dδt graphs
introduced in [Prep67]), is investigated and its diagnostic properties are determined.
A software simulator for distributed systems has been written as the main investigative
tool for diagnosis algorithms. The design and structure of the simulator are described.
The diagnosis process is measured in terms of diagnostic time and number of messages
produced, and the factors upon which these quantities depend are identified. The results
of simulation of a number of systems are given under various fault conditions. A modified
way of routing diagnosis messages is presented which, especially in large systems, reduces
both the number of diagnosis messages and the time required to perform diagnosis. The
thesis also contains a number of specific recommendations
for improving existing self-diagnosis algorithms.