342 research outputs found
Reliability analysis of triple modular redundancy system with spare
Hardware redundant fault-tolerant systems and the different design approaches are discussed. The reliability analysis of fault-tolerant systems is usually done under permanent fault conditions. With statistical data suggesting that up to 90% of system failures are caused by intermittent faults, the reliability analysis of fault-tolerant systems must concentrate more on this class of faults. In this work, a reconfigurable Triple Modular Redundancy (TMR) with spare system that differentiates between permanent and intermittent faults has been built. The reconfiguration process of this system depends on both the current status of its modules and their history. Based on this, a different approach for reliability analysis under intermittent fault conditions using Markov models is presented. This approach shows a much higher system reliability compared to other redundant and non-redundant configurations
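For orientation, the baseline that such analyses are usually compared against can be sketched in a few lines. This is the textbook permanent-fault model, not the intermittent-fault Markov model proposed in the abstract above. It assumes independent modules with a constant failure rate, a perfect voter, and ideal fault detection and spare switching. The function names and the failure rate used in the example are illustrative assumptions.

```python
import math
from math import comb


def module_reliability(failure_rate: float, t: float) -> float:
    """Reliability of one module under an exponential failure law (assumption)."""
    return math.exp(-failure_rate * t)


def tmr_reliability(r: float) -> float:
    """Classic TMR: the system works while at least 2 of 3 modules work.

    Assumes a perfect voter and independent, permanent module failures.
    """
    return 3 * r**2 - 2 * r**3


def tmr_with_spare_reliability(r: float) -> float:
    """Idealised TMR with one powered spare: works while at least 2 of 4 work.

    Assumes perfect fault detection and switching, which real systems lack.
    """
    n = 4
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(2, n + 1))


if __name__ == "__main__":
    # Hypothetical numbers: failure rate 1e-4 per hour, mission time 1000 hours.
    r = module_reliability(1e-4, 1000.0)
    print(f"simplex        : {r:.4f}")
    print(f"TMR            : {tmr_reliability(r):.4f}")
    print(f"TMR with spare : {tmr_with_spare_reliability(r):.4f}")
```

Under these idealised assumptions, TMR only beats a single module when the module reliability exceeds 0.5, which is one reason long-mission designs add spares or reconfiguration.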
Synchronization and fault-masking in redundant real-time systems
A real-time computer may fail because of massive component failures or because it does not respond quickly enough to satisfy real-time requirements. An increase in redundancy, a conventional means of improving reliability, can mitigate the former but can in some cases aggravate the latter considerably because of the overhead associated with redundancy management, namely the time delay resulting from synchronization and voting/interactive consistency techniques. The implications of synchronization and voting/interactive consistency algorithms in N-modular clusters on reliability are considered. All these studies were carried out in the context of real-time applications. As a demonstrative example, we have analyzed results from experiments conducted at the NASA Airlab on the Software Implemented Fault Tolerance (SIFT) computer. This analysis has indeed indicated that in most real-time applications, it is better to employ hardware synchronization instead of software synchronization and not allow reconfiguration
Fault-tolerant computer study
A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed
Critical fault patterns determination in fault-tolerant computer systems
The proposed method tries to enumerate all the critical fault-patterns (successive occurrences of failures) without analyzing every single possible fault. The conditions for the system to be operating in a given mode can be expressed in terms of the static states. Thus, one can find all the system states that correspond to a given critical mode of operation. The next step consists of analyzing the fault-detection mechanisms, the diagnosis algorithm and the process of switch control. From them, one can find all the possible system configurations that can result from a failure occurrence. Thus, one can list all the characteristics, with respect to detection, diagnosis, and switch control, that failures must have to constitute critical fault-patterns. Such an enumeration of the critical fault-patterns can be directly used to evaluate the overall system tolerance to failures. Present research is focused on how to efficiently make use of these system-level characteristics to enumerate all the failures that satisfy these characteristics
Study of fault-tolerant software technology
Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance
Fault-tolerant hardware designs and their reliability analysis
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Fault-tolerance, which is a complement to fault prevention, is an effective method of achieving ultra-high reliability. By taking this approach, fault-free computation can be achieved despite the presence of faults in the system. In this thesis three new fault-tolerant techniques are presented and their advantages over well-known fault-tolerant strategies are shown. One of these new techniques achieves higher reliability than any other similar technique presented in the literature. Generally, fault-tolerant structures consist of four major blocks: the replicated modules, the disagreement and detection circuit, the switching circuit, and the voting mechanism. The most critical component in a fault-tolerant system is the voter because the final output of the system is computed by this component. This dissertation presents a new implementation for voters which reduces both the complexity and the occupied area on the chip. The structures of the three techniques developed in this work are such that the complexity of their switching mechanisms grows only linearly with the number of modules but the voting mechanism complexity increases significantly. This is a better approach than those schemes in which the switching complexity increases significantly and the voter's complexity remains constant or grows linearly with the number of modules, because it is easier to implement a complex voter than a complex switch (voters have more regular structures). Extensive comparisons are made between different fault-tolerant techniques. A new reliability model is also developed for system reliability evaluation of the new designs. The results of these analyses are plotted, and the advantages of the new techniques are demonstrated. In the final part of the work an expert system is described which uses the knowledge acquired by these comparisons. This expert system is meant as a prototype of a component of a CAD tool which will act as an advisor on fault-tolerant techniques
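To make the role of the voter concrete, the following minimal sketch shows the textbook bitwise two-out-of-three majority function and a generic N-input variant. It is not the new voter implementation described in the thesis above. The function names are illustrative assumptions, and module outputs are modelled as plain Python integers.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote_3(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority, the logic of a classic TMR voter.

    Each output bit is 1 exactly when at least two of the inputs have it set.
    """
    return (a & b) | (b & c) | (a & c)


def majority_vote_n(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Word-level voter for N modules.

    Returns the value produced by a strict majority of modules,
    or None when no such majority exists (an assumption of this sketch).
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


if __name__ == "__main__":
    # One module (the third) delivers a corrupted word; the voter masks it.
    print(bin(majority_vote_3(0b1010, 0b1010, 0b0110)))
    print(majority_vote_n([7, 7, 3, 7, 9]))
```

The bitwise form maps directly onto a regular gate structure, which is consistent with the abstract's remark that voters have more regular structures than switches.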
Airborne Advanced Reconfigurable Computer System (ARCS)
A digital computer subsystem fault-tolerant concept was defined, and the potential benefits and costs of such a subsystem were assessed when used as the central element of a new transport's flight control system. The derived advanced reconfigurable computer system (ARCS) is a triple-redundant computer subsystem that automatically reconfigures, under multiple fault conditions, from triplex to duplex to simplex operation, with redundancy recovery if the fault condition is transient. The study included criteria development covering factors at the aircraft's operation level that would influence the design of a fault-tolerant system for commercial airline use. A new reliability analysis tool was developed for evaluating redundant, fault-tolerant system availability and survivability; and a stringent digital system software design methodology was used to achieve design/implementation visibility
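The triplex-to-duplex-to-simplex degradation with recovery from transient faults can be pictured as a small state machine. The sketch below is only an illustration of that idea under stated assumptions, not the ARCS design: the mode names, the event strings "fault" and "recovered", and the choice to treat total failure as unrecoverable are all hypothetical.

```python
from enum import IntEnum


class Mode(IntEnum):
    """Redundancy level, numbered by the count of healthy channels."""
    FAILED = 0
    SIMPLEX = 1
    DUPLEX = 2
    TRIPLEX = 3


def next_mode(mode: Mode, event: str) -> Mode:
    """Return the redundancy level after one event.

    "fault"     : one channel is switched out, so the level drops by one.
    "recovered" : a transient fault has cleared, so one channel is restored.
    FAILED is treated as absorbing (an assumption of this sketch).
    """
    if event == "fault":
        return Mode(max(mode - 1, Mode.FAILED))
    if event == "recovered":
        if mode == Mode.FAILED:
            return mode
        return Mode(min(mode + 1, Mode.TRIPLEX))
    raise ValueError(f"unknown event: {event!r}")


def run(events: list[str], start: Mode = Mode.TRIPLEX) -> list[Mode]:
    """Replay a sequence of events and return every mode visited."""
    history = [start]
    for event in events:
        history.append(next_mode(history[-1], event))
    return history


if __name__ == "__main__":
    for mode in run(["fault", "fault", "recovered", "fault", "fault"]):
        print(mode.name)
```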
Using Fine Grain Approaches for highly reliable Design of FPGA-based Systems in Space
The use of SRAM-based FPGAs in space missions is increasingly considered because of their flexibility and reprogrammability. A challenge is the devices' sensitivity to radiation effects, which has increased with modern architectures due to smaller CMOS structures. This work proposes fault tolerance methodologies that are based on a fine-grain view of modern reconfigurable architectures. The focus is on the challenges of mitigating SEUs in SRAM-based FPGAs, which can result in critical situations
Diversity Strategies for Nuclear Power Plant Instrumentation and Control Systems
This report presents the technical basis for establishing acceptable mitigating strategies that resolve diversity and defense-in-depth (D3) assessment findings and conform to U.S. Nuclear Regulatory Commission (NRC) requirements. The research approach employed to establish appropriate diversity strategies involves investigation of available documentation on D3 methods and experience from nuclear power and nonnuclear industries, capture of expert knowledge and lessons learned, determination of best practices, and assessment of the nature of common-cause failures (CCFs) and compensating diversity attributes. The research described in this report does not provide guidance on how to determine the need for diversity in a safety system to mitigate the consequences of potential CCFs. Rather, the scope of this report provides guidance to the staff and nuclear industry after a licensee or applicant has performed a D3 assessment per NUREG/CR-6303 and determined that diversity in a safety system is needed for mitigating the consequences of potential CCFs identified in the evaluation of the safety system design features. Succinctly, the purpose of the research described in this report was to answer the question, 'If diversity is required in a safety system to mitigate the consequences of potential CCFs, how much diversity is enough?' The principal results of this research effort have identified and developed diversity strategies, which consist of combinations of diversity attributes and their associated criteria. Technology, which corresponds to design diversity, is chosen as the principal system characteristic by which diversity criteria are grouped to form strategies. The rationale for this classification framework involves consideration of the profound impact that technology-focused design diversity provides. 
Consequently, the diversity usage classification scheme involves three families of strategies: (1) different technologies, (2) different approaches within the same technology, and (3) different architectures within the same technology. Using this convention, the first diversity usage family, designated Strategy A, is characterized by fundamentally diverse technologies. Strategy A at the system or platform level is illustrated by the example of analog and digital implementations. The second diversity usage family, designated Strategy B, is achieved through the use of distinctly different technologies. Strategy B can be described in terms of different digital technologies, such as the distinct approaches represented by general-purpose microprocessors and field-programmable gate arrays. The third diversity usage family, designated Strategy C, involves the use of variations within a technology. An example of Strategy C involves different digital architectures within the same technology, such as that provided by different microprocessors (e.g., Pentium and Power PC). The grouping of diversity criteria combinations according to Strategies A, B, and C establishes baseline diversity usage and facilitates a systematic organization of strategic approaches for coping with CCF vulnerabilities. Effectively, these baseline sets of diversity criteria constitute appropriate CCF mitigating strategies for digital safety systems. The strategies represent guidance on acceptable diversity usage and can be applied directly to ensure that CCF vulnerabilities identified through a D3 assessment have been adequately resolved. Additionally, a framework has been generated for capturing practices regarding diversity usage and a tool has been developed for the systematic assessment of the comparative effect of proposed diversity strategies (see Appendix A)
- …