728 research outputs found

    AES-EPO study program, volume I Final study report

    Get PDF
    Conceptual study of possible solutions to long- term and time-critical reliability problems affecting Apollo command module guidance and control compute

    Reliability estimation procedures and CARE: The Computer-Aided Reliability Estimation Program

    Get PDF
    Ultrareliable fault-tolerant onboard digital systems for spacecraft intended for long mission life exploration of the outer planets are under development. The design of systems involving self-repair and fault-tolerance leads to the companion problem of quantifying and evaluating the survival probability of the system for the mission under consideration and the constraints imposed upon the system. Methods have been developed to (1) model self-repair and fault-tolerant organizations; (2) compute survival probability, mean life, and many other reliability predictive functions with respect to various systems and mission parameters; (3) perform sensitivity analysis of the system with respect to mission parameters; and (4) quantitatively compare competitive fault-tolerant systems. Various measures of comparison are offered. To automate the procedures of reliability mathematical modeling and evaluation, the CARE (computer-aided reliability estimation) program was developed. CARE is an interactive program residing on the UNIVAC 1108 system, which makes the above calculations and facilitates report preparation by providing output in tabular form, graphical 2-dimensional plots, and 3-dimensional projections. The reliability estimation of fault-tolerant organization by means of the CARE program is described

    Dependability modeling and optimization of triple modular redundancy partitioning for SRAM-based FPGAs

    Full text link
    SRAM-based FPGAs are popular in the aerospace industry for their field programmability and low cost. However, they suffer from cosmic radiation-induced Single Event Upsets (SEUs). Triple Modular Redundancy (TMR) is a well-known technique to mitigate SEUs in FPGAs that is often used with another SEU mitigation technique known as configuration scrubbing. Traditional TMR provides protection against a single fault at a time, while partitioned TMR provides improved reliability and availability. In this paper, we present a methodology to analyze TMR partitioning at early design stage using probabilistic model checking. The proposed formal model can capture both single and multiple-cell upset scenarios, regardless of any assumption of equal partition sizes. Starting with a high-level description of a design, a Markov model is constructed from the Data Flow Graph (DFG) using a specified number of partitions, a component characterization library and a user defined scrub rate. Such a model and exhaustive analysis captures all the considered failures and repairs possible in the system within the radiation environment. Various reliability and availability properties are then verified automatically using the PRISM model checker exploring the relationship between the scrub frequency and the number of TMR partitions required to meet the design requirements. Also, the reported results show that based on a known voter failure rate, it is possible to find an optimal number of partitions at early design stages using our proposed method.Comment: Published in Reliability Engineering & System Safety Volume 182, February 2019, Pages 107-11

    Optimizing Scrubbing by Netlist Analysis for FPGA Configuration Bit Classification and Floorplanning

    Full text link
    Existing scrubbing techniques for SEU mitigation on FPGAs do not guarantee an error-free operation after SEU recovering if the affected configuration bits do belong to feedback loops of the implemented circuits. In this paper, we a) provide a netlist-based circuit analysis technique to distinguish so-called critical configuration bits from essential bits in order to identify configuration bits which will need also state-restoring actions after a recovered SEU and which not. Furthermore, b) an alternative classification approach using fault injection is developed in order to compare both classification techniques. Moreover, c) we will propose a floorplanning approach for reducing the effective number of scrubbed frames and d), experimental results will give evidence that our optimization methodology not only allows to detect errors earlier but also to minimize the Mean-Time-To-Repair (MTTR) of a circuit considerably. In particular, we show that by using our approach, the MTTR for datapath-intensive circuits can be reduced by up to 48.5% in comparison to standard approaches

    Improving reconfigurable systems reliability by combining periodical test and redundancy techniques: a case study

    Get PDF
    This paper revises and introduces to the field of reconfigurable computer systems, some traditional techniques used in the fields of fault-tolerance and testing of digital circuits. The target area is that of on-board spacecraft electronics, as this class of application is a good candidate for the use of reconfigurable computing technology. Fault tolerant strategies are used in order for the system to adapt itself to the severe conditions found in space. In addition, the paper describes some problems and possible solutions for the use of reconfigurable components, based on programmable logic, in space applications

    Evaluation of reliability modeling tools for advanced fault tolerant systems

    Get PDF
    The Computer Aided Reliability Estimation (CARE III) and Automated Reliability Interactice Estimation System (ARIES 82) reliability tools for application to advanced fault tolerance aerospace systems were evaluated. To determine reliability modeling requirements, the evaluation focused on the Draper Laboratories' Advanced Information Processing System (AIPS) architecture as an example architecture for fault tolerance aerospace systems. Advantages and limitations were identified for each reliability evaluation tool. The CARE III program was designed primarily for analyzing ultrareliable flight control systems. The ARIES 82 program's primary use was to support university research and teaching. Both CARE III and ARIES 82 were not suited for determining the reliability of complex nodal networks of the type used to interconnect processing sites in the AIPS architecture. It was concluded that ARIES was not suitable for modeling advanced fault tolerant systems. It was further concluded that subject to some limitations (the difficulty in modeling systems with unpowered spare modules, systems where equipment maintenance must be considered, systems where failure depends on the sequence in which faults occurred, and systems where multiple faults greater than a double near coincident faults must be considered), CARE III is best suited for evaluating the reliability of advanced tolerant systems for air transport

    A run time adaptive architecture to trade-off performance for fault tolerance applied to a DVB on-board processor

    Get PDF
    Reliability is one of the key issues in space applications. Although highly flexible and generally less expensive than predominantly used ASICs, SRAM-based FPGAs are very susceptible to radiation effects. Hence, various fault tolerant techniques have to be applied in order to handle faults and protect the design. This paper presents a reconfigurable on-board processor capable of run-time adaptation to harsh environmental conditions and different functional demands. Run-time reconfigurability is achieved applying two different reconfiguration methodologies. We propose a novel self-reconfigurable architecture able to on demand duplicate or triplicate part of the design in order to form DMR and TMR structures. Moreover, we introduce two different approaches for voting the correct output. The first one is a traditional voter that adapts to different DMR/TMR domain positions whereas the second implies comparing the captured flip-flop values directly from the configuration memory read through ICAP. The comparison is done periodically by an embedded processor thus completely excluding the voting mechanism in hardware. The proposed run-time reconfiguration methodology provides savings in terms of device utilization, reconfiguration time, power consumption and significant reductions in the amount of rad-hard memory used by partial configurations

    ATAMM analysis tool

    Get PDF
    Diagnostics software for analyzing Algorithm to Architecture Mapping Model (ATAMM) based concurrent processing systems is presented. ATAMM is capable of modeling the execution of large grain algorithms on distributed data flow architectures. The tool graphically displays algorithm activities and processor activities for evaluation of the behavior and performance of an ATAMM based system. The tool's measurement capabilities indicate computing speed, throughput, concurrency, resource utilization, and overhead. Evaluations are performed on a simulated system using the software tool. The tool is used to estimate theoretical lower bound performance. Analysis results are shown to be comparable to the predictions

    AES-EPO study program, volume II Final study report

    Get PDF
    Packaging, machine organization, error detection, and fabrication and test in determining solution to long-term and time-critical reliability of Apollo command module guidance-control compute
    • …
    corecore