Efficient fault-injection-based assessment of software-implemented hardware fault tolerance
With continuously shrinking semiconductor structure sizes and lower supply voltages, the per-device susceptibility to transient and permanent hardware faults is on the rise. A class of countermeasures with growing popularity is Software-Implemented Hardware Fault Tolerance (SIHFT), which avoids expensive hardware mechanisms and can be applied application-specifically. However, SIHFT can, against intuition, cause more harm than good, because its overhead in execution time and memory space also increases the figurative “attack surface” of the system – it turns out that application-specific configuration of SIHFT is in fact a necessity rather than just an advantage. Consequently, target programs need to be analyzed for particularly critical spots to harden. SIHFT-hardened programs need to be measured and compared throughout all development phases of the program to observe reliability improvements or deteriorations over time. Additionally, SIHFT implementations need to be tested. The contributions of this dissertation focus on Fault Injection (FI) as an assessment technique satisfying all these requirements – analysis, measurement and comparison, and test. I describe the design and implementation of an FI tool, named Fail*, that overcomes several shortcomings in the state of the art, and enables research on the general drawbacks of simulation-based FI. As demonstrated in four case studies in the context of SIHFT research, Fail* provides novel fine-grained analysis techniques that exploit the newly gained possibility to analyze FI results from complete fault-space exploration. These analysis techniques aid SIHFT design decisions on the level of program modules, functions, variables, source-code lines, or single machine instructions. Based on the experience from the case studies, I address the problem of large computation efforts that accompany exhaustive fault-space exploration from two different angles: Firstly, I develop a heuristic fault-space pruning technique that allows the total FI-experiment count to be freely traded for result accuracy, while still providing information on all possible fault-space coordinates. Secondly, I speed up individual TAP-based FI experiments by improving the fast-forwarding operation by several orders of magnitude for most workloads. Finally, I dissect current practices in FI-based evaluation of SIHFT-hardened programs, identify three widespread pitfalls in the result interpretation, and advance the state of the art by defining a novel comparison metric.
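As an illustration of what complete fault-space exploration means, the following is a minimal sketch and not the Fail* implementation. It assumes a toy workload (summing a small memory array), a single-bit-flip fault model, and a fault space spanned by injection time, memory address, and bit position; the names `run_sum` and `explore_fault_space` are hypothetical.

```python
from itertools import product

WORD_BITS = 8  # assumed word width of the toy memory


def run_sum(memory, fault=None):
    """Sum all memory words, reading word `step` at time `step`.

    `fault` is an optional (time, address, bit) tuple: just before the
    given time step, the given bit of the given word is flipped.
    """
    mem = list(memory)  # work on a copy so the caller's data stays intact
    total = 0
    for step in range(len(mem)):
        if fault is not None and fault[0] == step:
            _, addr, bit = fault
            mem[addr] ^= 1 << bit  # single-bit flip
        total += mem[step]
    return total


def explore_fault_space(memory):
    """Run one experiment per fault-space coordinate and classify it."""
    golden = run_sum(memory)  # fault-free reference result
    outcomes = {"benign": 0, "sdc": 0}  # sdc = silent data corruption
    times = range(len(memory))
    addresses = range(len(memory))
    bits = range(WORD_BITS)
    for coordinate in product(times, addresses, bits):
        result = run_sum(memory, fault=coordinate)
        outcomes["benign" if result == golden else "sdc"] += 1
    return outcomes


print(explore_fault_space([1, 2, 3, 4]))
```

In this toy workload, a flip only changes the result if it hits a word that has not yet been read, so a large share of the coordinates is benign. Observations of this kind are what make pruning the fault space attractive, since the experiment count grows with the product of run time and memory size.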
Compiler-Injected SIHFT for Embedded Operating Systems
Random hardware faults are a major concern for critical systems, especially when they are employed in high-radiation environments such as aerospace applications. While specialised hardware already exists for implementing fault tolerance, software solutions, named Software-Implemented Hardware Fault Tolerance (SIHFT), offer higher flexibility at a lower cost. This work describes a compiler-based approach for inserting instruction-level fault detection mechanisms in both the application code and the operating system. An experimental evaluation on an STM32 board running FreeRTOS shows the effectiveness of the proposed approach in detecting faults.
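Instruction-level fault detection is commonly built on redundancy: each value is kept twice, each operation is executed on both copies, and the copies are compared before the result is used. The sketch below shows that general idea at source level. It is an assumption-laden illustration, not the transformation performed by the compiler described above; `FaultDetected` and `checked_add` are hypothetical names.

```python
class FaultDetected(Exception):
    """Raised when the original and duplicated results disagree."""


def checked_add(a, a_dup, b, b_dup):
    """Add two values held redundantly and compare the two results.

    A hardening compiler would emit the duplicate instruction and the
    comparison automatically; here they are written out by hand.
    """
    result = a + b              # original computation
    result_dup = a_dup + b_dup  # duplicated computation on shadow copies
    if result != result_dup:    # consistency check before the result is used
        raise FaultDetected("mismatch between original and duplicate")
    return result, result_dup


# Fault-free case: both copies agree.
print(checked_add(2, 2, 3, 3))

# A bit flip in one shadow copy is detected by the comparison.
try:
    checked_add(2, 2 ^ 1, 3, 3)
except FaultDetected as exc:
    print("detected:", exc)
```

The duplicated work and the extra comparison are the source of the execution-time and memory overhead that such mechanisms introduce.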
Automated Synthesis of SEU Tolerant Architectures from OO Descriptions
SEU faults are a well-known problem in aerospace environments, but their relevance has recently grown at ground level as well, in commodity applications that are subject to strong cost-reduction constraints. At the same time, recent hardware description languages and synthesis tools narrow the boundary between the software and hardware domains, making high-level descriptions of hardware components very similar to software programs. Starting from these considerations, the present paper analyses the possibility of reusing Software Implemented Hardware Fault Tolerance (SIHFT) techniques, typically exploited in microprocessor-based systems, to design SEU tolerant architectures. The main characteristics of SIHFT techniques are examined, as well as how they have to be modified to be compatible with the synthesis flow. A complete environment is provided to automate the design instrumentation using the proposed techniques, and to perform fault injection experiments at both behavioural and gate level. Preliminary results presented in this paper show the effectiveness of the approach in terms of reliability improvement and reduced design effort.
Enhanced Compiler Technology for Software-based Hardware Fault Detection
Software-Implemented Hardware Fault Tolerance (SIHFT) is a modern approach for tackling random hardware faults of dependable systems employing solely software solutions. This work extends an automatic compiler-based SIHFT hardening tool called ASPIS, enhancing it with novel protection mechanisms and overhead-reduction techniques, and provides an extensive analysis of its compliance with the non-trivial workload of the open-source Real-Time Operating System FreeRTOS. A thorough experimental fault-injection campaign on an STM32 board shows how the system achieves remarkably high tolerance to single-event upsets, and a comparison between the implemented SIHFT mechanisms summarises the trade-off between the overhead introduced and the detection capabilities of the various solutions.
Study of fault tolerant software technology for dynamic systems
The major aim of this study is to investigate the feasibility of using systems-based failure detection, isolation, and compensation (FDIC) techniques in building fault-tolerant software and extending them, whenever possible, to the domain of software fault tolerance. First, it is shown that systems-based FDIC methods can be extended to develop software error detection techniques by using system models for software modules. In particular, it is demonstrated that systems-based FDIC techniques can yield consistency checks that are easier to implement than acceptance tests based on software specifications. Next, it is shown that systems-based failure compensation techniques can be generalized to the domain of software fault tolerance in developing software error recovery procedures. Then, the feasibility of using fault-tolerant software in flight software is investigated. In particular, possible system and version instabilities, and functional performance degradation that may occur in N-Version programming applications to flight software are illustrated. Finally, a comparative analysis of N-Version and recovery block techniques in the context of generic blocks in flight software is presented.
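The two software fault tolerance schemes compared in this abstract can be contrasted in a few lines. The sketch below follows the textbook definitions (majority voting over independently written versions, and sequentially tried alternates guarded by an acceptance test); it is not taken from the study, and the function names and the squaring example are assumptions made for illustration.

```python
from collections import Counter


def n_version(versions, x):
    """N-Version programming: run every version, return the majority result.

    Results must be hashable so that they can be counted.
    """
    results = [version(x) for version in versions]
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(results):  # require a strict majority
        raise RuntimeError("no majority among versions")
    return value


def recovery_block(alternates, acceptance_test, x):
    """Recovery block: try alternates in order until one passes the test."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue  # a crashing alternate counts as a failed attempt
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def good_square(x):
    return x * x


def other_square(x):
    return x ** 2


def faulty_square(x):
    return x * x + 1  # deliberately wrong version


print(n_version([good_square, other_square, faulty_square], 3))
print(recovery_block([faulty_square, good_square],
                     lambda x, r: r == x * x, 3))
```

N-Version programming pays the cost of running every version on each call, while a recovery block only runs further alternates after a rejection but depends on the quality of its acceptance test.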
PROMON: a profile monitor of software applications
Software techniques can be efficiently used to increase the dependability of safety-critical applications. Many approaches are based on information redundancy to prevent data and code corruption during the software execution. This paper presents PROMON, a C++ library that exploits a new methodology based on the concept of "Programming by Contract" to detect system malfunctions. Resorting to assertions, pre- and post-conditions, and marginal programmer interventions, PROMON-based applications can reach a high level of dependability.
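The "Programming by Contract" idea mentioned here can be sketched with a decorator that checks a pre-condition before a call and a post-condition after it. This is a generic illustration in Python and does not reproduce the PROMON C++ API; `contract`, `ContractViolation`, and the `mean` example are hypothetical.

```python
import functools


class ContractViolation(Exception):
    """Raised when a pre- or post-condition does not hold."""


def contract(pre=None, post=None):
    """Attach an optional pre-condition and post-condition to a function.

    `pre` receives the call arguments; `post` receives the result followed
    by the call arguments. Both return True when the contract holds.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise ContractViolation(f"pre-condition of {func.__name__} failed")
            result = func(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise ContractViolation(f"post-condition of {func.__name__} failed")
            return result
        return wrapper
    return decorate


@contract(
    pre=lambda values: len(values) > 0,
    post=lambda result, values: min(values) <= result <= max(values),
)
def mean(values):
    return sum(values) / len(values)


print(mean([1, 2, 3]))
```

A violated post-condition signals that either the code or the data it operated on was corrupted, which is how such checks can expose malfunctions at run time with little programmer effort.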
Software Fault Tolerance in Real-Time Systems: Identifying the Future Research Questions
Tolerating hardware faults in modern architectures is becoming a prominent problem due to the miniaturization of hardware components, their increasing complexity, and the need to reduce costs. Software-Implemented Hardware Fault Tolerance approaches have been developed to improve system dependability against hardware faults without resorting to custom hardware solutions. However, these approaches make it harder, from a scheduling standpoint, to satisfy the timing constraints of applications and activities. This paper surveys the current state of the art of fault tolerance approaches when used in the context of real-time systems, identifying the main challenges and the cross-links between these two topics. We propose a joint scheduling-failure analysis model that highlights the formal interactions among software fault tolerance mechanisms and timing properties. This model allows us to present and discuss many open research questions, with the final aim of spurring future research activities.
FAUST: fault-injection script-based tool
The tool described in this paper aims at evaluating the effectiveness of software-implemented fault-tolerant techniques used in safety-critical systems. The target application is stressed with the injection of transient or permanent faults. The user can therefore observe the real behaviour of the application in the presence of a fault and, if necessary, take the appropriate countermeasures. The emphasis is on ease of use and portability across all UNIX platforms.
Agent Based Test and Repair of Distributed Systems
This article demonstrates how to use intelligent agents for testing and repairing a distributed system, whose elements may or may not have embedded BIST (Built-In Self-Test) and BISR (Built-In Self-Repair) facilities. Agents are software modules that perform monitoring, diagnosis and repair of faults. Together they form a society whose members communicate, set goals and solve tasks. An experimental solution is presented, and future developments of the proposed approach are explored.