8 research outputs found

    Terminology and paradigms for fault tolerance

    Garbage collection in distributed systems

    PhD Thesis. The provision of system-wide heap storage has a number of advantages. However, when the technique is applied to distributed systems, automatically recovering inaccessible variables becomes a serious problem. This thesis presents a survey of such garbage collection techniques but finds that no existing algorithm is entirely suitable. A new, general-purpose algorithm is developed and presented which allows individual systems to garbage collect largely independently. The effects of these garbage collections are combined, using recursively structured control mechanisms, to achieve garbage collection of the entire heap with the minimum of overheads. Experimental results show that the new algorithm recovers most inaccessible variables more quickly than a straightforward garbage collection, giving improved memory utilisation.

    Fault injection testing of software implemented fault tolerance mechanisms of distributed systems

    PhD Thesis. One way of gaining confidence in the adequacy of the fault tolerance mechanisms of a system is to test the system by injecting faults and observing how it performs under faulty conditions. This thesis investigates the issues of testing software-implemented fault tolerance mechanisms of distributed systems through fault injection. A fault injection method has been developed. The method requires that the target software system be structured as a collection of objects interacting via messages. This enables easy insertion of fault injection objects into the target system to emulate the incorrect behaviour of faulty processors by manipulating messages. This approach allows one to inject specific classes of faults without requiring any significant changes to the target system. The method differs from previous work in that it exploits an object-oriented approach to software implementation to support the injection of specific classes of faults at the system level. The proposed fault injection method has been applied to test software-implemented reliable node systems: a TMR (triple modular redundant) node and a fail-silent node. The nodes have integrated fault tolerance mechanisms and are expected to exhibit certain behaviour in the presence of a failure. The thesis describes how various such mechanisms (for example, the clock synchronisation protocol and the atomic broadcast protocol) were tested. The testing revealed flaws in the implementation that had not been discovered before, thereby demonstrating the usefulness of the method. Application of the approach to other distributed systems is also described in the thesis.
    CEC ESPRIT programme, UK Engineering and Physical Sciences Research Council (EPSRC)

    Selective transparency in distributed transaction processing

    PhD Thesis. Object-oriented programming languages provide a powerful interface for programmers to access the mechanisms necessary for reliable distributed computing. Using inheritance and polymorphism provided by the object model, it is possible to develop a hierarchy of classes to capture the semantics and inter-relationships of various levels of functionality required for distributed transaction processing. Using multiple inheritance, application developers can selectively apply transaction properties to suit the requirements of the application objects. In addition to the specific problems of (distributed) transaction processing in an environment of persistent objects, there is a need for a unified framework, or architecture, in which to place this system. To be truly effective, not only the transaction manager, but the entire transaction support environment must be described, designed and implemented in terms of objects. This thesis presents an architecture for reliable distributed processing in which the management of persistence, provision of transaction properties (e.g., concurrency control), and organisation of support services (e.g., RPC) are all gathered into a unified design based on the object model.
    UK Science and Engineering Council: ESPRIT project

    Design and development of algorithms for fault tolerant distributed systems

    PhD Thesis. This thesis describes the design and development of algorithms for fault tolerant distributed systems. The development of such algorithms requires making assumptions about the types of component faults for which tolerance is to be provided. Such assumptions must be specified accurately. To this end, this thesis develops a classification of faults in systems. This fault classification identifies a range of fault types from the most restricted to the least restricted. For each fault type, an algorithm for reaching distributed agreement in the presence of a bounded number of faulty processors is developed, and thus a family of agreement algorithms is presented. The influence of the various fault types on the complexities of these algorithms is discussed. Early stopping algorithms are also developed for selected fault types and the influence of fault types on the early stopping conditions of the respective algorithms is analysed. The problem of evaluating the performance of distributed replicated systems which will require agreement algorithms is considered next. As a first step in the direction of meeting this challenging task, a pipeline triple modular redundant system is considered and analytical methods are derived to evaluate the performance of such a system. Finally, the accuracy of these methods is examined using computer simulations.
    UK Science and Engineering Research Council (SERC), DELTA-4 consortium of ESPRIT

    Integrating safety analysis techniques, supporting identification of common cause failures.

    When we apply safety analysis techniques on a new design, our primary objective is to malfunctions. The ultimate aim is to identify weak areas of the design and stimulate design iterations that improve the safety of the system under examination. Unfortunately, current industrial practice shows that this aim is seriously hindered by the lack of appropriate techniques for the analysis of complex hierarchical designs.

    A characterisation of faults in systems

    No full text
    SIGLE. Available from British Library Document Supply Centre - DSC:8724.9(206) / BLDSC - British Library Document Supply Centre. GB, United Kingdom

    A Characterisation of Faults in Systems

    No full text