161 research outputs found

    Explicit Representation of Exception Handling in the Development of Dependable Component-Based Systems

    Get PDF
    Exception handling is a structuring technique that facilitates the design of systems by encapsulating the process of error recovery. In this paper, we present a systematic approach for incorporating exceptional behaviour in the development of component-based software. The premise of our approach is that components alone do not provide the appropriate means to deal with exceptional behaviour in an effective manner. Hence the need to consider the notion of collaborations for capturing the interactive behaviour between components, when error recovery involves more than one component. The feasibility of the approach is demonstrated in terms of the case study of the mining control system

    Trends in reliability modeling technology for fault tolerant systems

    Get PDF
    Reliability modeling for fault tolerant avionic computing systems was developed. The modeling of large systems involving issues of state size and complexity, fault coverage, and practical computation was discussed. A novel technique which provides the tool for studying the reliability of systems with nonconstant failure rates is presented. The fault latency which may provide a method of obtaining vital latent fault data is measured

    Evaluation of Fault Mitigation Techniques Based on Approximate Computing Under Radiation

    Get PDF
    A software technique based on approximate computing and redundancy is presented to mitigate radiation-induced soft errors in COTS microprocessors. Approximate Computing relies on the capability of certain applications to accept imprecise results to improve efficiency by sacrificing its results in a controlled manner. Our approach avoids the time overhead derived from hardening while preserving the detection and correction rate. Experimental results show that we can detect and correct SDC events improving the cross-section up to 160×, and keeping accuracy under control without compromising performance. In addition, an accuracy-aware layer is included to improve error mitigation and to provide a trade-off between the number of tolerable errors and the necessary accuracy.MultiRad (funded by Région Auvergne-Rhône-Alpes, France); IRT Nanoelec (French National Research Agency ANR-10-AIRT-05 project funded through the Program d’investissement d’avenir); UGA/LPSC/-GENESIS platform and PID2022-138696OB-C22 (funded by the Spanish Ministry of Science and Innovation)

    Rapid Recovery for Systems with Scarce Faults

    Full text link
    Our goal is to achieve a high degree of fault tolerance through the control of a safety critical systems. This reduces to solving a game between a malicious environment that injects failures and a controller who tries to establish a correct behavior. We suggest a new control objective for such systems that offers a better balance between complexity and precision: we seek systems that are k-resilient. In order to be k-resilient, a system needs to be able to rapidly recover from a small number, up to k, of local faults infinitely many times, provided that blocks of up to k faults are separated by short recovery periods in which no fault occurs. k-resilience is a simple but powerful abstraction from the precise distribution of local faults, but much more refined than the traditional objective to maximize the number of local faults. We argue why we believe this to be the right level of abstraction for safety critical systems when local faults are few and far between. We show that the computational complexity of constructing optimal control with respect to resilience is low and demonstrate the feasibility through an implementation and experimental results.Comment: In Proceedings GandALF 2012, arXiv:1210.202

    Fault-Tolerant FPGA-Based Systems

    Get PDF
    This paper presents a new approach to on-line fault tolerance via reconfiguration for the systems mapped onto field programmable gate arrays (FPGAs). The fault detection, based on self-checking technique, is introduced at application level; therefore our approach can detect the faults of configurable logic blocks (CLBs) and routing interconnections in the FPGAs concurrently with the normal system work. A grid of tiles is projected on the FPGA structure and a certain number of spare CLBs is reserved inside every tile. The number of spare CLBs per tile, which will be used as a backup upon detecting any faulty CLB, is estimated in accordance with the probability of failure. After locating the faulty CLBs, the faulty tile will be reconfigured with avoiding the faulty CLBs. Our proposed approach uses a combination of hardware and software redundancy. We assume that a module external to the FPGA controls automatically the reconfiguration process in addition to the diagnosis process (DIRC); typically this is an embedded microprocessor having some storage for the various tile configurations. We have implemented our approach using Xilinx Virtex FPGA. The DIRC code is written in JBits software tools. In response to a component failure this approach capitalizes on the unique reconfiguration capabilities of FPGAs and replaces the affected tile with a functionally equivalent one that does not rely on the faulty component. Unlike fixed structure fault-tolerance techniques for ASICs and microprocessors, this approach allows a single physical component to provide redundant backup for several types of components

    A method for rigorous development of fault-tolerant systems

    Get PDF
    PhD ThesisWith the rapid development of information systems and our increasing dependency on computer-based systems, ensuring their dependability becomes one the most important concerns during system development. This is especially true for the mission and safety critical systems on which we rely not to put signi cant resources and lives at risk. Development of critical systems traditionally involves formal modelling as a fault prevention mechanism. At the same time, systems typically support fault tolerance mechanisms to mitigate runtime errors. However, fault tolerance modelling and, in particular, rigorous de nitions of fault tolerance requirements, fault assumptions and system recovery have not been given enough attention during formal system development. The main contribution of this research is in developing a method for top-down formal design of fault tolerant systems. The re nement-based method provides modelling guidelines presented in the following form: a set of modelling principles for systematic modelling of fault tolerance, a fault tolerance re nement strategy, and a library of generic modelling patterns assisting in disciplined integration of error detection and error recovery steps into models. The method supports separation of normal and fault tolerant system behaviour during modelling. It provides an environment for explicit modelling of fault tolerance and modal aspects of system behaviour which ensure rigour of the proposed development process. The method is supported by tools that are smoothly integrated into an industry-strength development environment. The proposed method is demonstrated on two case studies. In particular, the evaluation is carried out using a medium-scale industrial case study from the aerospace domain. The method is shown to provide support for explicit modelling of fault tolerance, to reduce the development e orts during modelling, to support reuse of fault tolerance modelling, and to facilitate adoption of formal methods.DEPLOY: The TrAmS Grant: The School of Computing Science, Newcastle University

    Immunotronics - novel finite-state-machine architectures with built-in self-test using self-nonself differentiation

    Get PDF
    A novel approach to hardware fault tolerance is demonstrated that takes inspiration from the human immune system as a method of fault detection. The human immune system is a remarkable system of interacting cells and organs that protect the body from invasion and maintains reliable operation even in the presence of invading bacteria or viruses. This paper seeks to address the field of electronic hardware fault tolerance from an immunological perspective with the aim of showing how novel methods based upon the operation of the immune system can both complement and create new approaches to the development of fault detection mechanisms for reliable hardware systems. In particular, it is shown that by use of partial matching, as prevalent in biological systems, high fault coverage can be achieved with the added advantage of reducing memory requirements. The development of a generic finite-state-machine immunization procedure is discussed that allows any system that can be represented in such a manner to be "immunized" against the occurrence of faulty operation. This is demonstrated by the creation of an immunized decade counter that can detect the presence of faults in real tim

    An immune system paradigm for the assurance of dependability of collaborative self-organizing systems

    Get PDF
    In collaborative self-organizing computing systems a complex task is performed by relatively simple autonomous agents that act without centralized control. Disruption of a task can be caused by agents that produce harmful outputs due to internal failures or due to maliciously introduced alterations of their functions. The probability of such harmful outputs is minimized by the application of a design principle called ”the immune system paradigm” that provides individual agents with an all-hardware fault tolerance infrastructure. The paradigm and its application are described in this paper.1st IFIP International Conference on Biologically Inspired Cooperative Computing - Biological Inspiration: Just a dream?Red de Universidades con Carreras en Informática (RedUNCI

    An immune system paradigm for the assurance of dependability of collaborative self-organizing systems

    Get PDF
    In collaborative self-organizing computing systems a complex task is performed by relatively simple autonomous agents that act without centralized control. Disruption of a task can be caused by agents that produce harmful outputs due to internal failures or due to maliciously introduced alterations of their functions. The probability of such harmful outputs is minimized by the application of a design principle called ”the immune system paradigm” that provides individual agents with an all-hardware fault tolerance infrastructure. The paradigm and its application are described in this paper.1st IFIP International Conference on Biologically Inspired Cooperative Computing - Biological Inspiration: Just a dream?Red de Universidades con Carreras en Informática (RedUNCI
    corecore