Search CORE

10,367 research outputs found

Prototype of Fault Adaptive Embedded Software for Large-Scale Real-Time Systems

Author: Haney Michael
Jung Mina
Messie Derek
Nordstrom Steven
Oh Jae C.
Shetty Shweta
Publication venue
Publication date: 01/01/2005
Field of study

This paper describes a comprehensive prototype of large-scale fault adaptive embedded software developed for the proposed Fermilab BTeV high energy physics experiment. Lightweight self-optimizing agents embedded within Level 1 of the prototype are responsible for proactive and reactive monitoring and mitigation based on specified layers of competence. The agents are self-protecting, detecting cascading failures using a distributed approach. Adaptive, reconfigurable, and mobile objects for reliablility are designed to be self-configuring to adapt automatically to dynamically changing environments. These objects provide a self-healing layer with the ability to discover, diagnose, and react to discontinuities in real-time processing. A generic modeling environment was developed to facilitate design and implementation of hardware resource specifications, application data flow, and failure mitigation strategies. Level 1 of the planned BTeV trigger system alone will consist of 2500 DSPs, so the number of components and intractable fault scenarios involved make it impossible to design an `expert system' that applies traditional centralized mitigative strategies based on rules capturing every possible system state. Instead, a distributed reactive approach is implemented using the tools and methodologies developed by the Real-Time Embedded Systems group.Comment: 2nd Workshop on Engineering of Autonomic Systems (EASe), in the 12th Annual IEEE International Conference and Workshop on the Engineering of Computer Based Systems (ECBS), Washington, DC, April, 200

arXiv.org e-Print Archive

CiteSeerX

Syracuse University Research Facility and Collaborative Environment

Event-B Patterns for Specifying Fault-Tolerance in Multi-Agent Interaction

Author: A. Avizienis
J. Ferber
N.R. Jennings
N. Storey
J.R. Abrial
D. Jackson
E. Gamma
G. Leavens
J. Abrial
R. Smith
B. Meyer
S. Blazy
E. Chan
D. Cansell
D. Deugo
Y. Aridor
M. Weiss
R. Back
A. Fedoruk
S. Kumar
L. Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007
Field of study

Interaction in a multi-agent system is susceptible to failure. A rigorous development of a multi-agent system must include the treatment of fault-tolerance of agent interactions for the agents to be able to continue to function independently. Patterns can be used to capture fault-tolerance techniques. A set of modelling patterns is presented that specify fault-tolerance in Event-B specifications of multi-agent interactions. The purpose of these patterns is to capture common modelling structures for distributed agent interaction in a form that is re-usable on other related developments. The patterns have been applied to a case study of the contract net interaction protocol

CiteSeerX

Southampton (e-Prints Soton)

Rapid Recovery for Systems with Scarce Faults

Author: Huang Chung-Hao
Peled Doron
Schewe Sven
Wang Farn
Publication venue: 'Open Publishing Association'
Publication date: 01/10/2012
Field of study

Our goal is to achieve a high degree of fault tolerance through the control of a safety critical systems. This reduces to solving a game between a malicious environment that injects failures and a controller who tries to establish a correct behavior. We suggest a new control objective for such systems that offers a better balance between complexity and precision: we seek systems that are k-resilient. In order to be k-resilient, a system needs to be able to rapidly recover from a small number, up to k, of local faults infinitely many times, provided that blocks of up to k faults are separated by short recovery periods in which no fault occurs. k-resilience is a simple but powerful abstraction from the precise distribution of local faults, but much more refined than the traditional objective to maximize the number of local faults. We argue why we believe this to be the right level of abstraction for safety critical systems when local faults are few and far between. We show that the computational complexity of constructing optimal control with respect to resilience is low and demonstrate the feasibility through an implementation and experimental results.Comment: In Proceedings GandALF 2012, arXiv:1210.202

arXiv.org e-Print Archive

Directory of Open Access Journals

Incremental bounded model checking for embedded software

Author: A Biere
A Petrenko
A Pnueli
AC Dias Neto
D Kroening
Daniel Kroening
E Clarke
G Fraser
H Jin
JN Hooker
M Harman
Martin Brain
N Eén
P Fleming
Peter Schrammel
R Brummayer
RE Bryant
Ruben Martins
Tino Teige
Tom Bienmüller
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Program analysis is on the brink of mainstream usage in embedded systems development. Formal verification of behavioural requirements, finding runtime errors and test case generation are some of the most common applications of automated verification tools based on bounded model checking (BMC). Existing industrial tools for embedded software use an off-the-shelf bounded model checker and apply it iteratively to verify the program with an increasing number of unwindings. This approach unnecessarily wastes time repeating work that has already been done and fails to exploit the power of incremental SAT solving. This article reports on the extension of the software model checker CBMC to support incremental BMC and its successful integration with the industrial embedded software verification tool BTC EMBEDDED TESTER. We present an extensive evaluation over large industrial embedded programs, mainly from the automotive industry. We show that incremental BMC cuts runtimes by one order of magnitude in comparison to the standard non-incremental approach, enabling the application of formal verification to large and complex embedded software. We furthermore report promising results on analysing programs with arbitrary loop structure using incremental BMC, demonstrating its applicability and potential to verify general software beyond the embedded domain

City Research Online

Springer - Publisher Connector

Oxford University Research Archive

A method for rigorous development of fault-tolerant systems

Author: Lopatkin Ilya
Publication venue: Newcastle University
Publication date: 01/01/2013
Field of study

PhD ThesisWith the rapid development of information systems and our increasing dependency on computer-based systems, ensuring their dependability becomes one the most important concerns during system development. This is especially true for the mission and safety critical systems on which we rely not to put signi cant resources and lives at risk. Development of critical systems traditionally involves formal modelling as a fault prevention mechanism. At the same time, systems typically support fault tolerance mechanisms to mitigate runtime errors. However, fault tolerance modelling and, in particular, rigorous de nitions of fault tolerance requirements, fault assumptions and system recovery have not been given enough attention during formal system development. The main contribution of this research is in developing a method for top-down formal design of fault tolerant systems. The re nement-based method provides modelling guidelines presented in the following form: a set of modelling principles for systematic modelling of fault tolerance, a fault tolerance re nement strategy, and a library of generic modelling patterns assisting in disciplined integration of error detection and error recovery steps into models. The method supports separation of normal and fault tolerant system behaviour during modelling. It provides an environment for explicit modelling of fault tolerance and modal aspects of system behaviour which ensure rigour of the proposed development process. The method is supported by tools that are smoothly integrated into an industry-strength development environment. The proposed method is demonstrated on two case studies. In particular, the evaluation is carried out using a medium-scale industrial case study from the aerospace domain. The method is shown to provide support for explicit modelling of fault tolerance, to reduce the development e orts during modelling, to support reuse of fault tolerance modelling, and to facilitate adoption of formal methods.DEPLOY: The TrAmS Grant: The School of Computing Science, Newcastle University

Newcastle University eTheses

The ISIS project: Fault-tolerance in large distributed systems

Author: Birman Kenneth P.
Marzullo Keith
Publication venue
Publication date
Field of study

The semi-annual status report covers activities of the ISIS project during the second half of 1989. The project had several independent objectives: (1) At the level of the ISIS Toolkit, ISIS release V2.0 was completed, containing bypass communication protocols. Performance of the system is greatly enhanced by this change, but the initial software release is limited in some respects. (2) The Meta project focused on the definition of the Lomita programming language for specifying rules that monitor sensors for conditions of interest and triggering appropriate reactions. This design was completed, and implementation of Lomita is underway on the Meta 2.0 platform. (3) The Deceit file system effort completed a prototype. It is planned to make Deceit available for use in two hospital information systems. (4) A long-haul communication subsystem project was completed and can be used as part of ISIS. This effort resulted in tools for linking ISIS systems on different LANs together over long-haul communications lines. (5) Magic Lantern, a graphical tool for building application monitoring and control interfaces, is included as part of the general ISIS releases