245 research outputs found

    Design verification of SIFT

    Get PDF
    A SIFT reliable aircraft control computer system, designed to meet the ultrahigh reliability required for safety critical flight control applications by use of processor replications and voting, was constructed for SRI, and delivered to NASA Langley for evaluation in the AIRLAB. To increase confidence in the reliability projections for SIFT, produced by a Markov reliability model, SRI constructed a formal specification, defining the meaning of reliability in the context of flight control. A further series of specifications defined, in increasing detail, the design of SIFT down to pre- and post-conditions on Pascal code procedures. Mechanically checked mathematical proofs were constructed to demonstrate that the more detailed design specifications for SIFT do indeed imply the formal reliability requirement. An additional specification defined some of the assumptions made about SIFT by the Markov model, and further proofs were constructed to show that these assumptions, as expressed by that specification, did indeed follow from the more detailed design specifications for SIFT. This report provides an outline of the methodology used for this hierarchical specification and proof, and describes the various specifications and proofs performed

    Development of software fault-tolerance techniques

    Get PDF
    As computers become more widely used, and in particular as they become used in more safety critical applications, the reliability of the computer system and its software becomes more important. There is also an increasing need for high levels of reliability in applications involving very large numbers of inexpensive units where recall of the units would be disproportionately expensive. The nature of faults and the assumptions made by different approaches to correct operation are considered. The recovery block approach is described and a probabilistic analysis of its effectiveness, with and without correlated design errors is provided. Mechanisms for generating acceptance tests from specifications, and for providing recovery in the presence of asynchrony, are described. An analysis of, and design for, the provision of recovery blocks in the microprogram of the Bendix BDX930 processor is provided. An example of the use of recovery blocks in a simple operating system is also provided

    Byzantine-Resistant Total Ordering Algorithms

    Get PDF
    AbstractMulticast group communication protocols are used extensively in fault-tolerant distributed systems. For many such protocols, the acknowledgments for individual messages define a causal order on messages. Maintaining the consistency of information, replicated on several processors to protect it against faults, is greatly simplified by a total order on messages. We present algorithms that incrementally convert a causal order on messages into a total order and that tolerate both crash and Byzantine process faults. Varying compromises between latency to message ordering and resilience to faults yield four distinct algorithms. All of these algorithms use a multistage voting strategy to achieve agreement on the total order and exploit the random structure of the causal order to ensure probabilistic termination

    Fault tolerant architectures for integrated aircraft electronics systems

    Get PDF
    Work into possible architectures for future flight control computer systems is described. Ada for Fault-Tolerant Systems, the NETS Network Error-Tolerant System architecture, and voting in asynchronous systems are covered

    Fault tolerant architectures for integrated aircraft electronics systems, task 2

    Get PDF
    The architectural basis for an advanced fault tolerant on-board computer to succeed the current generation of fault tolerant computers is examined. The network error tolerant system architecture is studied with particular attention to intercluster configurations and communication protocols, and to refined reliability estimates. The diagnosis of faults, so that appropriate choices for reconfiguration can be made is discussed. The analysis relates particularly to the recognition of transient faults in a system with tasks at many levels of priority. The demand driven data-flow architecture, which appears to have possible application in fault tolerant systems is described and work investigating the feasibility of automatic generation of aircraft flight control programs from abstract specifications is reported

    An interval logic for higher-level temporal reasoning

    Get PDF
    Prior work explored temporal logics, based on classical modal logics, as a framework for specifying and reasoning about concurrent programs, distributed systems, and communications protocols, and reported on efforts using temporal reasoning primitives to express very high level abstract requirements that a program or system is to satisfy. Based on experience with those primitives, this report describes an Interval Logic that is more suitable for expressing such higher level temporal properties. The report provides a formal semantics for the Interval Logic, and several examples of its use. A description of decision procedures for the logic is also included

    End-To-End Latency of a Fault-Tolerant CORBA Infrastructure

    Get PDF
    This paper presents an evaluation of the end-to-end latency of a fault-tolerant CORBA infrastructure that we have implemented. The fault-tolerant infrastructure replicates the server applications using active, passive and semi-active replication, and maintains strong replica consistency of the server replicas. By analyses and by measurements of the running fault-tolerant infrastructure, we characterize the end-to-end latency under fault-free conditions. The main determining factor of the run-time performance of the fault-tolerant infrastructure is the Totem group communication protocol, which contributes to the end-to-end latency primarily in two ways: the delay in sending messages and the processing cost of the rotating token. To reduce the delay in sending messages for passive and semi-active replication, the position of the primary server replica on the Totem ring, the token rotation time, the processing time at the client, and the processing time at the server must be considered. For active replication, the presence of duplicate messages adversely affects the performance. However, if an effective sending-side duplicate suppression mechanism is implemented, active replication is more advantageous than both passive and semi-active replication because of the automatic selection of the most favorable position of the server replica that sends the first non-duplicate reply

    Unification of Transactions and Replication in Three-Tier Architectures Based on CORBA

    Get PDF
    In this paper, we describe a software infrastructure that unifies transactions and replication in three-tier architectures and provides data consistency and high availability for enterprise applications. The infrastructure uses transactions based on the CORBA object transaction service to protect the application data in databases on stable storage, using a roll-backward recovery strategy, and replication based on the fault tolerant CORBA standard to protect the middle-tier servers, using a roll-forward recovery strategy. The infrastructure replicates the middle-tier servers to protect the application business logic processing. In addition, it replicates the transaction coordinator, which renders the two-phase commit protocol nonblocking and, thus, avoids potentially long service disruptions caused by failure of the coordinator. The infrastructure handles the interactions between the replicated middle-tier servers and the database servers through replicated gateways that prevent duplicate requests from reaching the database servers. It implements automatic client-side failover mechanisms, which guarantee that clients know the outcome of the requests that they have made, and retries aborted transactions automatically on behalf of the clients
    • …
    corecore