6 research outputs found
Software-implemented fault insertion: An FTMP example
This report presents a model for fault insertion through software; describes its implementation on a fault-tolerant computer, FTMP; presents a summary of fault detection, identification, and reconfiguration data collected with software-implemented fault insertion; and compares the results to hardware fault insertion data. Experimental results show detection time to be a function of time of insertion and system workload. For the fault detection time, there is no correlation between software-inserted faults and hardware-inserted faults; this is because hardware-inserted faults must manifest as errors before detection, whereas software-inserted faults immediately exercise the error detection mechanisms. In summary, the software-implemented fault insertion is able to be used as an evaluation technique for the fault-handling capabilities of a system in fault detection, identification and recovery. Although the software-inserted faults do not map directly to hardware-inserted faults, experiments show software-implemented fault insertion is capable of emulating hardware fault insertion, with greater ease and automation
Study of fault-tolerant software technology
Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance
Space station data system analysis/architecture study. Task 2: Options development DR-5. Volume 1: Technology options
The second task in the Space Station Data System (SSDS) Analysis/Architecture Study is the development of an information base that will support the conduct of trade studies and provide sufficient data to make key design/programmatic decisions. This volume identifies the preferred options in the technology category and characterizes these options with respect to performance attributes, constraints, cost, and risk. The technology category includes advanced materials, processes, and techniques that can be used to enhance the implementation of SSDS design structures. The specific areas discussed are mass storage, including space and round on-line storage and off-line storage; man/machine interface; data processing hardware, including flight computers and advanced/fault tolerant computer architectures; and software, including data compression algorithms, on-board high level languages, and software tools. Also discussed are artificial intelligence applications and hard-wire communications
Fault injection testing of software implemented fault tolerance mechanisms of distributed systems
PhD ThesisOne way of gaining confidence in the adequacy of fault tolerance mechanisms of a
system is to test the system by injecting faults and see how the system performs under
faulty conditions. This thesis investigates the issues of testing software-implemented
fault tolerance mechanisms of distributed systems through fault injection.
A fault injection method has been developed. The method requires that the target
software system be structured as a collection of objects interacting via messages. This
enables easy insertion of fault injection objects into the target system to emulate
incorrect behaviour of faulty processors by manipulating messages. This approach
allows one to inject specific classes of faults while not requiring any significant changes
to the target system. The method differs from the previous work in that it exploits an
object oriented approach of software implementation to support the injection of specific
classes of faults at the system level.
The proposed fault injection method has been applied to test software-implemented
reliable node systems: a TMR (triple modular redundant) node and a fail-silent node.
The nodes have integrated fault tolerance mechanisms and are expected to exhibit
certain behaviour in the presence of a failure. The thesis describes how various such
mechanisms (for example, clock synchronisation protocol, and atomic broadcast
protocol) were tested. The testing revealed flaws in implementation that had not been
discovered before, thereby demonstrating the usefulness of the method. Application of
the approach to other distributed systems is also described in the thesis.CEC ESPRIT programme,
UK Engineering and Physical Sciences Research Council (EPSRC)