342 research outputs found

    Reliability analysis of triple modular redundancy system with spare

    Get PDF
    Hardware redundant fault-tolerant systems and the different design approaches are discussed. The reliability analysis of fault-tolerant systems is usually done under permanent fault conditions. With statistical data suggesting that up to 90% of system failures are caused by intermittent faults, the reliability analysis of fault-tolerant systems must concentrate more on this class of faults. In this work, a reconfigurable Triple Modular Redundancy (TMR) with spare system that differentiates between permanent and intermittent faults has been built. The reconfiguration process of this system depends on both the current status of its modules and their history. Based on this, a different approach for reliability analysis under intermittent fault conditions using Markov models is presented. This approach shows a much higher system reliability compared to other redundant and non-redundant configurations

    Synchronization and fault-masking in redundant real-time systems

    Get PDF
    A real time computer may fail because of massive component failures or not responding quickly enough to satisfy real time requirements. An increase in redundancy - a conventional means of improving reliability - can improve the former but can - in some cases - degrade the latter considerably due to the overhead associated with redundancy management, namely the time delay resulting from synchronization and voting/interactive consistency techniques. The implications of synchronization and voting/interactive consistency algorithms in N-modular clusters on reliability are considered. All these studies were carried out in the context of real time applications. As a demonstrative example, we have analyzed results from experiments conducted at the NASA Airlab on the Software Implemented Fault Tolerance (SIFT) computer. This analysis has indeed indicated that in most real time applications, it is better to employ hardware synchronization instead of software synchronization and not allow reconfiguration

    Fault-tolerant computer study

    Get PDF
    A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed

    Critical fault patterns determination in fault-tolerant computer systems

    Get PDF
    The method proposed tries to enumerate all the critical fault-patterns (successive occurrences of failures) without analyzing every single possible fault. The conditions for the system to be operating in a given mode can be expressed in terms of the static states. Thus, one can find all the system states that correspond to a given critical mode of operation. The next step consists in analyzing the fault-detection mechanisms, the diagnosis algorithm and the process of switch control. From them, one can find all the possible system configurations that can result from a failure occurrence. Thus, one can list all the characteristics, with respect to detection, diagnosis, and switch control, that failures must have to constitute critical fault-patterns. Such an enumeration of the critical fault-patterns can be directly used to evaluate the overall system tolerance to failures. Present research is focused on how to efficiently make use of these system-level characteristics to enumerate all the failures that verify these characteristics

    Study of fault-tolerant software technology

    Get PDF
    Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance

    Airborne Advanced Reconfigurable Computer System (ARCS)

    Get PDF
    A digital computer subsystem fault-tolerant concept was defined, and the potential benefits and costs of such a subsystem were assessed when used as the central element of a new transport's flight control system. The derived advanced reconfigurable computer system (ARCS) is a triple-redundant computer subsystem that automatically reconfigures, under multiple fault conditions, from triplex to duplex to simplex operation, with redundancy recovery if the fault condition is transient. The study included criteria development covering factors at the aircraft's operation level that would influence the design of a fault-tolerant system for commercial airline use. A new reliability analysis tool was developed for evaluating redundant, fault-tolerant system availability and survivability; and a stringent digital system software design methodology was used to achieve design/implementation visibility

    Using Fine Grain Approaches for highly reliable Design of FPGA-based Systems in Space

    Get PDF
    Nowadays using SRAM based FPGAs in space missions is increasingly considered due to their flexibility and reprogrammability. A challenge is the devices sensitivity to radiation effects that increased with modern architectures due to smaller CMOS structures. This work proposes fault tolerance methodologies, that are based on a fine grain view to modern reconfigurable architectures. The focus is on SEU mitigation challenges in SRAM based FPGAs which can result in crucial situations

    Diversity Strategies for Nuclear Power Plant Instrumentation and Control Systems

    Get PDF
    This report presents the technical basis for establishing acceptable mitigating strategies that resolve diversity and defense-in-depth (D3) assessment findings and conform to U.S. Nuclear Regulatory Commission (NRC) requirements. The research approach employed to establish appropriate diversity strategies involves investigation of available documentation on D3 methods and experience from nuclear power and nonnuclear industries, capture of expert knowledge and lessons learned, determination of best practices, and assessment of the nature of common-cause failures (CCFs) and compensating diversity attributes. The research described in this report does not provide guidance on how to determine the need for diversity in a safety system to mitigate the consequences of potential CCFs. Rather, the scope of this report provides guidance to the staff and nuclear industry after a licensee or applicant has performed a D3 assessment per NUREG/CR-6303 and determined that diversity in a safety system is needed for mitigating the consequences of potential CCFs identified in the evaluation of the safety system design features. Succinctly, the purpose of the research described in this report was to answer the question, 'If diversity is required in a safety system to mitigate the consequences of potential CCFs, how much diversity is enough?' The principal results of this research effort have identified and developed diversity strategies, which consist of combinations of diversity attributes and their associated criteria. Technology, which corresponds to design diversity, is chosen as the principal system characteristic by which diversity criteria are grouped to form strategies. The rationale for this classification framework involves consideration of the profound impact that technology-focused design diversity provides. Consequently, the diversity usage classification scheme involves three families of strategies: (1) different technologies, (2) different approaches within the same technology, and (3) different architectures within the same technology. Using this convention, the first diversity usage family, designated Strategy A, is characterized by fundamentally diverse technologies. Strategy A at the system or platform level is illustrated by the example of analog and digital implementations. The second diversity usage family, designated Strategy B, is achieved through the use of distinctly different technologies. Strategy B can be described in terms of different digital technologies, such as the distinct approaches represented by general-purpose microprocessors and field-programmable gate arrays. The third diversity usage family, designated Strategy C, involves the use of variations within a technology. An example of Strategy C involves different digital architectures within the same technology, such as that provided by different microprocessors (e.g., Pentium and Power PC). The grouping of diversity criteria combinations according to Strategies A, B, and C establishes baseline diversity usage and facilitates a systematic organization of strategic approaches for coping with CCF vulnerabilities. Effectively, these baseline sets of diversity criteria constitute appropriate CCF mitigating strategies for digital safety systems. The strategies represent guidance on acceptable diversity usage and can be applied directly to ensure that CCF vulnerabilities identified through a D3 assessment have been adequately resolved. Additionally, a framework has been generated for capturing practices regarding diversity usage and a tool has been developed for the systematic assessment of the comparative effect of proposed diversity strategies (see Appendix A)
    • …
    corecore