58 research outputs found

    Executable assertions and flight software

    Get PDF
    Executable assertions are used to test flight control software. The techniques used for testing flight software; however, are different from the techniques used to test other kinds of software. This is because of the redundant nature of flight software. An experimental setup for testing flight software using executable assertions is described. Techniques for writing and using executable assertions to test flight software are presented. The error detection capability of assertions is studied and many examples of assertions are given. The issues of placement and complexity of assertions and the language features to support efficient use of assertions are discussed

    Analysis of backward error recovery for concurrent processes with recovery blocks

    Get PDF
    Three different methods of implementing recovery blocks (RB's). These are the asynchronous, synchronous, and the pseudo recovery point implementations. Pseudo recovery points so that unbounded rollback may be avoided while maintaining process autonomy are proposed. Probabilistic models for analyzing these three methods under standard assumptions in computer performance analysis, i.e., exponential distributions for related random variables were developed. The interval between two successive recovery lines for asynchronous RB's mean loss in computation power for the synchronized method, and additional overhead and rollback distance in case PRP's are used were estimated

    Reasoning and Improving on Software Resilience against Unanticipated Exceptions

    Get PDF
    In software, there are the errors anticipated at specification and design time, those encountered at development and testing time, and those that happen in production mode yet never anticipated. In this paper, we aim at reasoning on the ability of software to correctly handle unanticipated exceptions. We propose an algorithm, called short-circuit testing, which injects exceptions during test suite execution so as to simulate unanticipated errors. This algorithm collects data that is used as input for verifying two formal exception contracts that capture two resilience properties. Our evaluation on 9 test suites, with 78% line coverage in average, analyzes 241 executed catch blocks, shows that 101 of them expose resilience properties and that 84 can be transformed to be more resilient

    Auditable Restoration of Distributed Programs

    Full text link
    We focus on a protocol for auditable restoration of distributed systems. The need for such protocol arises due to conflicting requirements (e.g., access to the system should be restricted but emergency access should be provided). One can design such systems with a tamper detection approach (based on the intuition of "break the glass door"). However, in a distributed system, such tampering, which are denoted as auditable events, is visible only for a single node. This is unacceptable since the actions they take in these situations can be different than those in the normal mode. Moreover, eventually, the auditable event needs to be cleared so that system resumes the normal operation. With this motivation, in this paper, we present a protocol for auditable restoration, where any process can potentially identify an auditable event. Whenever a new auditable event occurs, the system must reach an "auditable state" where every process is aware of the auditable event. Only after the system reaches an auditable state, it can begin the operation of restoration. Although any process can observe an auditable event, we require that only "authorized" processes can begin the task of restoration. Moreover, these processes can begin the restoration only when the system is in an auditable state. Our protocol is self-stabilizing and has bounded state space. It can effectively handle the case where faults or auditable events occur during the restoration protocol. Moreover, it can be used to provide auditable restoration to other distributed protocol.Comment: 10 page

    Implementing atomic actions in Ada 95

    Get PDF
    Atomic actions are an important dynamic structuring technique that aid the construction of fault-tolerant concurrent systems. Although they were developed some years ago, none of the well-known commercially-available programming languages directly support their use. This paper summarizes software fault tolerance techniques for concurrent systems, evaluates the Ada 95 programming language from the perspective of its support for software fault tolerance, and shows how Ada 95 can be used to implement software fault tolerance techniques. In particular, it shows how packages, protected objects, requeue, exceptions, asynchronous transfer of control, tagged types, and controlled types can be used as building blocks from which to construct atomic actions with forward and backward error recovery, which are resilient to deserter tasks and task abortion

    The implementation and use of Ada on distributed systems with high reliability requirements

    Get PDF
    The use and implementation of Ada in distributed environments in which reliability is the primary concern were investigated. In particular, the concept that a distributed system may be programmed entirely in Ada so that the individual tasks of the system are unconcerned with which processors they are executing on, and that failures may occur in the software or underlying hardware was examined. Progress is discussed for the following areas: continued development and testing of the fault-tolerant Ada testbed; development of suggested changes to Ada so that it might more easily cope with the failure of interest; and design of new approaches to fault-tolerant software in real-time systems, and integration of these ideas into Ada

    Middleware Fault Tolerance Support for the BOSS Embedded Operating System

    Get PDF
    Critical embedded systems need a dependable operating system and application. Despite all efforts to prevent and remove faults in system development, residual software faults usually persist. Therefore, critical systems need some sort of fault tolerance to deal with these faults and also with hardware faults at operation time. This work proposes fault-tolerant support mechanisms for the BOSS embedded operating system, based on the application of proven fault tolerance strategies by middleware control software which transparently delivers the added functionality to the application software. Special attention is taken to complexity control and resource constraints, targeting the needs of the embedded market.Fundação para a Ciência e a Tecnologia (FCT

    Review of Software Fault-Tolerance Methods for Reliability Enhancement of Real-Time Software Systems

    Get PDF
    Real time systems are those systems which must guarantee to response correctly within strict time constraint or within deadline. Failures can arise from both functional errors as well as timing bugs. Hence, it is necessary to provide temporal correctness of programs used in real time applications in addition to providing functional correctness. Although, there are several researches concerned with achieving fault tolerance in the presence of various functional and operational errors but many of them did not address the problem concerned with the timing bugs which is an important issue in real time systems. As for real time systems, many times it becomes a necessity for a given service to be delivered within the specified time deadline. Therefore, this paper reviews the existing approaches from the perspective of  real time systems to analyse the shortcomings of these approaches to  present a versatile and cost effective approach in the presence of timing bugs for providing fault tolerance to enhance the reliability of the real time software applications

    Recovery blocks for communicating systems

    Get PDF
    In many practical applications of real-time computing (avionics, switching systems) a message-passing inter-processes communication approach is adopted for both modularity and reliability aims. In the present paper, the problem of adding fault-tolerance in a message passing multiprocesses environment is examined. Recovery blocks implementation schemes for both asynchronous and synchronous communications are proposed, with the aim of avoiding domino-effects and exploiting the message oriented system structure. When a sender process produces a message, an acceptance test is performed on the message by system procedures, which in sequence: i) transfer the message on the receiving process working memory, ii) save present process status, or in case of error, restore some previous process status, and iii) discard no longer needed status informations
    • …
    corecore