161 research outputs found
Explicit Representation of Exception Handling in the Development of Dependable Component-Based Systems
Exception handling is a structuring technique that facilitates the design of systems by encapsulating the process of error recovery. In this paper, we present a systematic approach for incorporating exceptional behaviour in the development of component-based software. The premise of our approach is that components alone do not provide the appropriate means to deal with exceptional behaviour in an effective manner. Hence, when error recovery involves more than one component, collaborations are needed to capture the interactive behaviour between components. The feasibility of the approach is demonstrated through a case study of a mining control system.
Trends in reliability modeling technology for fault tolerant systems
Reliability modeling for fault-tolerant avionic computing systems was developed. The modeling of large systems was discussed, including issues of state size and complexity, fault coverage, and practical computation. A novel technique is presented that provides a tool for studying the reliability of systems with nonconstant failure rates. Fault latency is measured, which may provide a method of obtaining vital latent-fault data.
Evaluation of Fault Mitigation Techniques Based on Approximate Computing Under Radiation
A software technique based on approximate computing and redundancy is presented to mitigate radiation-induced soft errors in COTS microprocessors. Approximate computing relies on the capability of certain applications to accept imprecise results, improving efficiency by sacrificing accuracy in a controlled manner. Our approach avoids the time overhead derived from hardening while preserving the detection and correction rate. Experimental results show that we can detect and correct SDC events, improving the cross-section by up to 160× and keeping accuracy under control without compromising performance. In addition, an accuracy-aware layer is included to improve error mitigation and to provide a trade-off between the number of tolerable errors and the necessary accuracy.
MultiRad (funded by Région Auvergne-Rhône-Alpes, France); IRT Nanoelec (French National Research Agency ANR-10-AIRT-05 project funded through the Program d’investissement d’avenir); UGA/LPSC/-GENESIS platform and PID2022-138696OB-C22 (funded by the Spanish Ministry of Science and Innovation)
Rapid Recovery for Systems with Scarce Faults
Our goal is to achieve a high degree of fault tolerance through the control
of safety-critical systems. This reduces to solving a game between a
malicious environment that injects failures and a controller who tries to
establish a correct behavior. We suggest a new control objective for such
systems that offers a better balance between complexity and precision: we seek
systems that are k-resilient. In order to be k-resilient, a system needs to be
able to rapidly recover from a small number, up to k, of local faults
infinitely many times, provided that blocks of up to k faults are separated by
short recovery periods in which no fault occurs. k-resilience is a simple but
powerful abstraction from the precise distribution of local faults, but much
more refined than the traditional objective to maximize the number of local
faults. We argue why we believe this to be the right level of abstraction for
safety critical systems when local faults are few and far between. We show that
the computational complexity of constructing optimal control with respect to
resilience is low and demonstrate the feasibility through an implementation and
experimental results.
Comment: In Proceedings GandALF 2012, arXiv:1210.202
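The fault pattern that the k-resilience abstract assumes (blocks of up to k faults separated by fault-free recovery periods) can be illustrated with a small check. This is only a sketch of that assumption on the environment, not the paper's game-solving construction; the function name, the boolean trace encoding, and the `recovery_len` parameter are hypothetical choices made for illustration.

```python
def respects_k_fault_assumption(trace, k, recovery_len):
    """Check a fault trace against the k-resilience fault assumption.

    trace: iterable of bools, True meaning a local fault occurs at that step.
    k: maximum number of faults allowed in one block.
    recovery_len: number of consecutive fault-free steps that ends a block
                  (a hypothetical parameter; the abstract only says "short").
    """
    faults_in_block = 0
    quiet_steps = recovery_len  # treat the start as fully recovered
    for fault in trace:
        if fault:
            if quiet_steps >= recovery_len:
                faults_in_block = 0  # the previous block has ended
            faults_in_block += 1
            if faults_in_block > k:
                return False
            quiet_steps = 0
        else:
            quiet_steps += 1
    return True
```

For example, with `k=2` a trace of two faults, three quiet steps, and one more fault is accepted when `recovery_len=3` but rejected when `recovery_len=4`, because the third fault then still belongs to the first block.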
Fault-Tolerant FPGA-Based Systems
This paper presents a new approach to on-line fault tolerance via reconfiguration for systems mapped onto field programmable gate arrays (FPGAs). Fault detection, based on a self-checking technique, is introduced at the application level; our approach can therefore detect faults in configurable logic blocks (CLBs) and routing interconnections in the FPGA concurrently with normal system operation. A grid of tiles is projected onto the FPGA structure, and a certain number of spare CLBs is reserved inside every tile. The number of spare CLBs per tile, which are used as backups when a faulty CLB is detected, is estimated in accordance with the probability of failure. After the faulty CLBs are located, the faulty tile is reconfigured so as to avoid them. Our proposed approach uses a combination of hardware and software redundancy. We assume that a module external to the FPGA automatically controls the reconfiguration process in addition to the diagnosis process (DIRC); typically this is an embedded microprocessor with some storage for the various tile configurations. We have implemented our approach on a Xilinx Virtex FPGA. The DIRC code is written using the JBits software tools. In response to a component failure, this approach capitalizes on the unique reconfiguration capabilities of FPGAs and replaces the affected tile with a functionally equivalent one that does not rely on the faulty component. Unlike fixed-structure fault-tolerance techniques for ASICs and microprocessors, this approach allows a single physical component to provide redundant backup for several types of components.
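The abstract says the number of spare CLBs per tile is estimated from the probability of failure but does not give the estimate itself. One plausible way to make that concrete, shown here purely as an assumption, is a binomial model: each of the CLBs in a tile fails independently with probability `p_fail`, and the tile survives while the number of failed CLBs does not exceed the number of spares. The function names and the `target` survival probability are hypothetical.

```python
from math import comb


def binom_cdf(n, p, s):
    """P(at most s failures among n independent CLBs, each failing with prob. p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(s + 1))


def min_spares(n_clbs, p_fail, target):
    """Smallest spare count whose tile survival probability reaches `target`.

    Assumed model (not taken from the paper): independent failures, and a
    tile survives as long as failures <= spares.
    """
    for spares in range(n_clbs + 1):
        if binom_cdf(n_clbs, p_fail, spares) >= target:
            return spares
    return n_clbs  # fallback if rounding keeps the CDF just below target
```

Under this model a higher failure probability never reduces the number of spares required, which matches the intuition behind sizing spares per tile.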
A method for rigorous development of fault-tolerant systems
PhD Thesis
With the rapid development of information systems and our increasing
dependency on computer-based systems, ensuring their dependability becomes
one of the most important concerns during system development. This
is especially true for the mission and safety critical systems on which we
rely not to put significant resources and lives at risk.
Development of critical systems traditionally involves formal modelling
as a fault prevention mechanism. At the same time, systems typically
support fault tolerance mechanisms to mitigate runtime errors. However,
fault tolerance modelling and, in particular, rigorous definitions of fault
tolerance requirements, fault assumptions and system recovery have not
been given enough attention during formal system development.
The main contribution of this research is in developing a method for
top-down formal design of fault tolerant systems. The refinement-based
method provides modelling guidelines presented in the following form:
a set of modelling principles for systematic modelling of fault tolerance,
a fault tolerance refinement strategy, and
a library of generic modelling patterns assisting in disciplined integration
of error detection and error recovery steps into models.
The method supports separation of normal and fault tolerant system behaviour
during modelling. It provides an environment for explicit modelling
of fault tolerance and modal aspects of system behaviour which
ensure rigour of the proposed development process.
The method is supported by tools that are smoothly integrated into an
industry-strength development environment.
The proposed method is demonstrated on two case studies. In particular,
the evaluation is carried out using a medium-scale industrial case study
from the aerospace domain.
The method is shown to provide support for explicit modelling of fault
tolerance, to reduce the development efforts during modelling, to support
reuse of fault tolerance modelling, and to facilitate adoption of formal
methods.
DEPLOY:
The TrAmS Grant:
The School of Computing Science, Newcastle University
Immunotronics - novel finite-state-machine architectures with built-in self-test using self-nonself differentiation
A novel approach to hardware fault tolerance is demonstrated that takes inspiration from the human immune system as a method of fault detection. The human immune system is a remarkable system of interacting cells and organs that protect the body from invasion and maintain reliable operation even in the presence of invading bacteria or viruses. This paper seeks to address the field of electronic hardware fault tolerance from an immunological perspective with the aim of showing how novel methods based upon the operation of the immune system can both complement and create new approaches to the development of fault detection mechanisms for reliable hardware systems. In particular, it is shown that by use of partial matching, as prevalent in biological systems, high fault coverage can be achieved with the added advantage of reducing memory requirements. The development of a generic finite-state-machine immunization procedure is discussed that allows any system that can be represented in such a manner to be "immunized" against the occurrence of faulty operation. This is demonstrated by the creation of an immunized decade counter that can detect the presence of faults in real time.
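The idea of detecting faulty state transitions by self-nonself differentiation with partial matching can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the paper's hardware design: transitions are encoded as bit strings (current state followed by next state), partial matching uses the r-contiguous-bits rule, and detectors are found by exhaustive enumeration. All function names are hypothetical.

```python
from itertools import product


def matches(detector, string, r):
    """r-contiguous-bits rule: the two strings agree in r consecutive positions."""
    run = 0
    for a, b in zip(detector, string):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, length, r):
    """Keep every candidate string that partially matches no valid ("self") transition."""
    candidates = ("".join(bits) for bits in product("01", repeat=length))
    return [c for c in candidates if not any(matches(c, s, r) for s in self_set)]


def is_faulty(transition, detectors, r):
    """Flag a transition as nonself when any detector partially matches it."""
    return any(matches(d, transition, r) for d in detectors)


# Valid transitions of a 2-bit counter: current state + next state.
SELF = ["0001", "0110", "1011", "1100"]
DETECTORS = generate_detectors(SELF, length=4, r=3)
```

With these values no valid transition is ever flagged, while the invalid transition `"0000"` is caught by the detector `"1000"`. Because one detector covers several invalid strings, fewer detectors need to be stored than with exact matching, at the cost of some invalid strings possibly going undetected.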
An immune system paradigm for the assurance of dependability of collaborative self-organizing systems
In collaborative self-organizing computing systems a complex task is performed by relatively simple autonomous agents that act without centralized control. Disruption of a task can be caused by agents that produce harmful outputs due to internal failures or due to maliciously introduced alterations of their functions. The probability of such harmful outputs is minimized by the application of a design principle called “the immune system paradigm” that provides individual agents with an all-hardware fault tolerance infrastructure. The paradigm and its application are described in this paper.
1st IFIP International Conference on Biologically Inspired Cooperative Computing - Biological Inspiration: Just a dream?
Red de Universidades con Carreras en Informática (RedUNCI)