12 research outputs found

    The implementation and use of Ada on distributed systems with high reliability requirements

    Get PDF
    The general inadequacy of Ada for programming systems that must survive processor loss was shown. A solution to the problem was proposed in which there are no syntatic changes to Ada. The approach was evaluated using a full-scale, realistic application. The application used was the Advanced Transport Operating System (ATOPS), an experimental computer control system developed for a modified Boeing 737 aircraft. The ATOPS system is a full authority, real-time avionics system providing a large variety of advanced features. Methods of building fault tolerance into concurrent systems were explored. A set of criteria by which the proposed method will be judged was examined. Extensive interaction with personnel from Computer Sciences Corporation and NASA Langley occurred to determine the requirements of the ATOPS software. Backward error recovery in concurrent systems was assessed

    On the engineering of crucial software

    Get PDF
    The various aspects of the conventional software development cycle are examined. This cycle was the basis of the augmented approach contained in the original grant proposal. This cycle was found inadequate for crucial software development, and the justification for this opinion is presented. Several possible enhancements to the conventional software cycle are discussed. Software fault tolerance, a possible enhancement of major importance, is discussed separately. Formal verification using mathematical proof is considered. Automatic programming is a radical alternative to the conventional cycle and is discussed. Recommendations for a comprehensive approach are presented, and various experiments which could be conducted in AIRLAB are described

    Study of fault-tolerant software technology

    Get PDF
    Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance

    The implementation and use of Ada on distributed systems with high reliability requirements

    Get PDF
    A preliminary analysis of the Ada implementation of the Advanced Transport Operating System (ATOPS), an experimental computer control system developed at NASA Langley for a modified Boeing 737 aircraft, is presented. The criteria that was determined for the evaluation of this approach is described. A preliminary version of the requirements for the ATOPS is contained. This requirements specification is not a formal document, but rather a description of certain aspects of the ATOPS system at a level of detail that best suits the needs of the research. The survey of backward error recovery techniques is also presented

    The construction of recoverable multi-level systems

    Get PDF
    PhD ThesisSystems structures and data structures which make possible the state restoration of user objects, are described in this thesis. Recovery is linked with types, which suggests making a distinction between recoverable and unrecoverable types. For convenience, recovery is discussed in terms of recovery blocks as developed at the University of Newcastle upon Tyne. Recovery is taken to mean restoring the values of recoverable types. Recoverable multi-level systems are considered. On the one hand levels in such systems can be backed out. On the other hand these levels provide explicit recovery for new types they introduce, and so can be called on to restore states of objects used in higher levels. The concepts and issues are discussed and explained; mechanisms and techniques for building such systems are presented. Recovery techniques for complex global data structures and techniques to maintain consistency at any time, even when recovery is impossible such as after a crash, are described and compared. Many of the presented techniques are employed in an implemented recoverable two-level system, with a recoverable filing system. This two-level system is described in detail. It is argued that in order to implement recoverability in multi-level systems with efficiency and flexibility, the interfaces of the system should provide both recoverable and unrecoverable types. It is also shown that the way in which complex data structures are updated is of major importance if recovery is to be provided in a "reasonably" efficient way and consistency is to be guaranteed after a crash.Netherlands Organisation for the Advancement of Pure Research

    Object Management for Persistence and Recoverability

    Get PDF
    PhD ThesisAs distribution becomes commonplace, there is a growing requirement for applications that behave reliably when node or network failures occur. To support reliability, operations on the components of a distributed application may be declared to occur within the scope of an atomic action. This thesis describes how atomic actions may be supported in an environment consisting of applications that operate on objects. To support the failure atomicity and permanence of effect properties of an atomic action, the objects accessed within the scope of an atomic action must be recoverable and persistent. This thesis describes how these properties may be added to the class of an object. The approach adopted is to provide a class that implements recovery and persistence mechanisms, and derive new classes from this base class. By refining inherited operations so that recovery and persistence is specific to that class, recoverable and persistent objects may be easily produced. This thesis also describes how an atomic action may be implemented as a class, so that instances of the class are atomic actions which manage the recoverable and persistent objects. Multiple instance declarations produce nested atomic actions, and the atomic action class also inherits persistence so that shortterm commit information may be saved in an object store which is used to maintain the passive state of persistent objects. Since the mechanisms and classes that support recovery, persistence, and atomic actions are constructed using the feature of an object-oriented language, they may be implemented in environments that provide suitable support for objects and object-oriented programming languages.Science and Engineering Research Council, SERC/Alve

    A VLSI architecture for enhancing software reliability

    Get PDF
    As a solution to the software crisis, we propose an architecture that supports and encourages the use of programming techniques and mechanisms for enhancing software reliability. The proposed architecture provides efficient mechanisms for detecting a wide variety of run-time errors, for supporting data abstraction, module-based programming and encourages the use of small protection domains through a highly efficient capability mechanism. The proposed architecture also provides efficient support for user-specified exception handlers and both event-driven and trace-driven debugging mechanisms. The shortcomings of the existing capability-based architectures that were designed with a similar goal in mind are examined critically to identify their problems with regard to capability translation, domain switching, storage management, data abstraction and interprocess communication. Assuming realistic VLSI implementation constraints, an instruction set for the proposed architecture is designed. Performance estimates of the proposed system are then made from the microprograms corresponding to these instructions based on observed characteristics of similar systems and language usage. A comparison of the proposed architecture with similar ones, both in terms of functional characteristics and low-level performance indicates the proposed design to be superior

    The embedded operating system project

    Get PDF
    This progress report describes research towards the design and construction of embedded operating systems for real-time advanced aerospace applications. The applications concerned require reliable operating system support that must accommodate networks of computers. The report addresses the problems of constructing such operating systems, the communications media, reconfiguration, consistency and recovery in a distributed system, and the issues of realtime processing. A discussion is included on suitable theoretical foundations for the use of atomic actions to support fault tolerance and data consistency in real-time object-based systems. In particular, this report addresses: atomic actions, fault tolerance, operating system structure, program development, reliability and availability, and networking issues. This document reports the status of various experiments designed and conducted to investigate embedded operating system design issues

    An error recovery scheme for concurrent processes

    Get PDF
    PhD ThesisWith the more widespread use of multi- processors and distributed computing systems, programmers need a simple, reliable interface to them. This thesis describes language constructs, and mechanisms for their support, that can be used in the implementation of fault-tolerant concurrent processes. The basic language structure is the Atomic Action, supported by a modified recovery cache mechanism. This combines the collection of recovery data with the locking of resources and allows recovery blocks to be integrated with Atomic Actions. Synchronisation between actions is discussed, as well as a means of detecting and breaking deadlocks, based on the use of a "blocking graph". Reliable communication and cooperation between actions is considered, and several constructs are investigated. The limitations of Shared Atomic Actions are identified, and, further, the use of a form of reliable "secretary" is shown to lead to unnecessary recovery activity. These problems are resolved by structures based on a classification of resources by the way they are used in programs. Also contained in the thesis are descriptions of trial implementations of some of the mechanisms described, and a discussion of existing concurrent programming techniques.Science Research Council of Great Britai

    Reliable resource allocation between unreliable processes

    No full text
    corecore