582 research outputs found

    Automatic specification of reliability models for fault-tolerant computers

    The calculation of reliability measures using Markov models is required for life-critical processor-memory-switch structures that have standby redundancy or that are subject to transient or intermittent faults or repair. The task of specifying these models is tedious and prone to human error because of the large number of states and transitions required in any reasonable system. Therefore, model specification is a major analysis bottleneck, and model verification is a major validation problem. The general unfamiliarity of computer architects with Markov modeling techniques further increases the need to automate model specification. Automation requires a general system description language (SDL). For practicality, this SDL should also provide a high level of abstraction and be easy to learn and use. The first attempt to define and implement an SDL with those characteristics is presented. A program named Automated Reliability Modeling (ARM) was constructed as a research vehicle. The ARM program uses a graphical interface as its SDL, and it outputs a Markov reliability model specification formulated for direct use by programs that generate and evaluate the model.

    Rapid Recovery for Systems with Scarce Faults

    Our goal is to achieve a high degree of fault tolerance through the control of safety-critical systems. This reduces to solving a game between a malicious environment that injects failures and a controller that tries to establish correct behavior. We suggest a new control objective for such systems that offers a better balance between complexity and precision: we seek systems that are k-resilient. In order to be k-resilient, a system needs to be able to rapidly recover from a small number, up to k, of local faults infinitely many times, provided that blocks of up to k faults are separated by short recovery periods in which no fault occurs. k-resilience is a simple but powerful abstraction from the precise distribution of local faults, but much more refined than the traditional objective to maximize the number of local faults. We argue why we believe this to be the right level of abstraction for safety-critical systems when local faults are few and far between. We show that the computational complexity of constructing optimal control with respect to resilience is low and demonstrate the feasibility through an implementation and experimental results. Comment: In Proceedings GandALF 2012, arXiv:1210.202

    Reachability analysis of fault-tolerant protocols

    Due to the increasing requirements imposed on fault-tolerant protocols, their complexity is steadily growing. Verifying the functionality of the fault-tolerance mechanisms is therefore also more difficult to accomplish. This thesis provides a model-based approach to efficiently finding "loopholes" in the fault-tolerance properties of large protocols. The contributions include thinning out the state space, without missing behavior relevant to the validation goal, through a partial ordering strategy based on single fault regions. Two algorithms for (partial) analysis are designed, implemented and evaluated: the H-RAFT algorithm is based on the SDL elements constituting each transition and requires no further domain knowledge from the user. The Close-to-Failure algorithm, on the other hand, is based purely on user-provided information. Combinations of the two algorithms are also investigated. All contributions exploit the fault-tolerant nature of the protocols. In order to compare the performance of the novel techniques to well-known algorithms, a tool has been developed that allows for easy integration of different algorithms. All contributions are thoroughly investigated through experiments amounting to several CPU-months. The results show unambiguously the advantages of the developed methods and algorithms.

    Practical Model Checking of a Home Area Network System: Case Study

    The integrated communication infrastructure is the core of the Smart Grid architecture. Its two-way communication and information flow provides this network with all the resources needed to control and manage all connected components, from the utility side to the customer side. The latter, named the Home Area Network (HAN), is a dedicated network that connects smart devices inside the customer's home using different solutions. To avoid problems and anomalies throughout the development life cycle of a new HAN solution, modeling and validation are among the most powerful tools. This paper presents a practical case study of such validation. It aims to validate a HAN SDL model, described in a previous work, using model checking techniques. It introduces a method to translate the SDL model to a Promela model using the intermediate format IF. After the Promela model is generated, verification is performed to ensure that some functional properties are satisfied. The desired properties are defined in Linear Temporal Logic (LTL), and the DTSPIN model checker (an extension of SPIN with discrete time) is used to verify the correctness of the model.

    Development and implementation of the verification process for the shuttle avionics system

    The background of the shuttle avionics system design and the unique drivers associated with the redundant digital multiplexed data processing system are examined. With flight software pervading even the lowest elements of the flight-critical subsystems, it was necessary to identify a unique and orderly approach to verifying the system as flight ready for STS-1. The approach and implementation plan are discussed, and both technical problems and management issues are addressed.