728 research outputs found
AES-EPO study program, volume I Final study report
Conceptual study of possible solutions to long- term and time-critical reliability problems affecting Apollo command module guidance and control compute
Reliability estimation procedures and CARE: The Computer-Aided Reliability Estimation Program
Ultrareliable fault-tolerant onboard digital systems for spacecraft intended for long mission life exploration of the outer planets are under development. The design of systems involving self-repair and fault-tolerance leads to the companion problem of quantifying and evaluating the survival probability of the system for the mission under consideration and the constraints imposed upon the system. Methods have been developed to (1) model self-repair and fault-tolerant organizations; (2) compute survival probability, mean life, and many other reliability predictive functions with respect to various systems and mission parameters; (3) perform sensitivity analysis of the system with respect to mission parameters; and (4) quantitatively compare competitive fault-tolerant systems. Various measures of comparison are offered. To automate the procedures of reliability mathematical modeling and evaluation, the CARE (computer-aided reliability estimation) program was developed. CARE is an interactive program residing on the UNIVAC 1108 system, which makes the above calculations and facilitates report preparation by providing output in tabular form, graphical 2-dimensional plots, and 3-dimensional projections. The reliability estimation of fault-tolerant organization by means of the CARE program is described
Dependability modeling and optimization of triple modular redundancy partitioning for SRAM-based FPGAs
SRAM-based FPGAs are popular in the aerospace industry for their field
programmability and low cost. However, they suffer from cosmic
radiation-induced Single Event Upsets (SEUs). Triple Modular Redundancy (TMR)
is a well-known technique to mitigate SEUs in FPGAs that is often used with
another SEU mitigation technique known as configuration scrubbing. Traditional
TMR provides protection against a single fault at a time, while partitioned TMR
provides improved reliability and availability. In this paper, we present a
methodology to analyze TMR partitioning at early design stage using
probabilistic model checking. The proposed formal model can capture both single
and multiple-cell upset scenarios, regardless of any assumption of equal
partition sizes. Starting with a high-level description of a design, a Markov
model is constructed from the Data Flow Graph (DFG) using a specified number of
partitions, a component characterization library and a user defined scrub rate.
Such a model and exhaustive analysis captures all the considered failures and
repairs possible in the system within the radiation environment. Various
reliability and availability properties are then verified automatically using
the PRISM model checker exploring the relationship between the scrub frequency
and the number of TMR partitions required to meet the design requirements.
Also, the reported results show that based on a known voter failure rate, it is
possible to find an optimal number of partitions at early design stages using
our proposed method.Comment: Published in Reliability Engineering & System Safety Volume 182,
February 2019, Pages 107-11
Optimizing Scrubbing by Netlist Analysis for FPGA Configuration Bit Classification and Floorplanning
Existing scrubbing techniques for SEU mitigation on FPGAs do not guarantee an
error-free operation after SEU recovering if the affected configuration bits do
belong to feedback loops of the implemented circuits. In this paper, we a)
provide a netlist-based circuit analysis technique to distinguish so-called
critical configuration bits from essential bits in order to identify
configuration bits which will need also state-restoring actions after a
recovered SEU and which not. Furthermore, b) an alternative classification
approach using fault injection is developed in order to compare both
classification techniques. Moreover, c) we will propose a floorplanning
approach for reducing the effective number of scrubbed frames and d),
experimental results will give evidence that our optimization methodology not
only allows to detect errors earlier but also to minimize the
Mean-Time-To-Repair (MTTR) of a circuit considerably. In particular, we show
that by using our approach, the MTTR for datapath-intensive circuits can be
reduced by up to 48.5% in comparison to standard approaches
Improving reconfigurable systems reliability by combining periodical test and redundancy techniques: a case study
This paper revises and introduces to the field of reconfigurable computer systems, some traditional techniques used in the fields of fault-tolerance and testing of digital circuits. The target area is that of on-board spacecraft electronics, as this class of application is a good candidate for the use of reconfigurable computing technology. Fault tolerant strategies are used in order for the system to adapt itself to the severe conditions found in space. In addition, the paper describes some problems and possible solutions for the use of reconfigurable components, based on programmable logic, in space applications
Evaluation of reliability modeling tools for advanced fault tolerant systems
The Computer Aided Reliability Estimation (CARE III) and Automated Reliability Interactice Estimation System (ARIES 82) reliability tools for application to advanced fault tolerance aerospace systems were evaluated. To determine reliability modeling requirements, the evaluation focused on the Draper Laboratories' Advanced Information Processing System (AIPS) architecture as an example architecture for fault tolerance aerospace systems. Advantages and limitations were identified for each reliability evaluation tool. The CARE III program was designed primarily for analyzing ultrareliable flight control systems. The ARIES 82 program's primary use was to support university research and teaching. Both CARE III and ARIES 82 were not suited for determining the reliability of complex nodal networks of the type used to interconnect processing sites in the AIPS architecture. It was concluded that ARIES was not suitable for modeling advanced fault tolerant systems. It was further concluded that subject to some limitations (the difficulty in modeling systems with unpowered spare modules, systems where equipment maintenance must be considered, systems where failure depends on the sequence in which faults occurred, and systems where multiple faults greater than a double near coincident faults must be considered), CARE III is best suited for evaluating the reliability of advanced tolerant systems for air transport
A run time adaptive architecture to trade-off performance for fault tolerance applied to a DVB on-board processor
Reliability is one of the key issues in space applications. Although highly flexible and generally less expensive than predominantly used ASICs, SRAM-based FPGAs are very susceptible to radiation effects. Hence, various fault tolerant techniques have to be applied in order to handle faults and protect the design. This paper presents a reconfigurable on-board processor capable of run-time adaptation to harsh environmental conditions and different functional demands. Run-time reconfigurability is achieved applying two different reconfiguration methodologies. We propose a novel self-reconfigurable architecture able to on demand duplicate or triplicate part of the design in order to form DMR and TMR structures. Moreover, we introduce two different approaches for voting the correct output. The first one is a traditional voter that adapts to different DMR/TMR domain positions whereas the second implies comparing the captured flip-flop values directly from the configuration memory read through ICAP. The comparison is done periodically by an embedded processor thus completely excluding the voting mechanism in hardware. The proposed run-time reconfiguration methodology provides savings in terms of device utilization, reconfiguration time, power consumption and significant reductions in the amount of rad-hard memory used by partial configurations
ATAMM analysis tool
Diagnostics software for analyzing Algorithm to Architecture Mapping Model (ATAMM) based concurrent processing systems is presented. ATAMM is capable of modeling the execution of large grain algorithms on distributed data flow architectures. The tool graphically displays algorithm activities and processor activities for evaluation of the behavior and performance of an ATAMM based system. The tool's measurement capabilities indicate computing speed, throughput, concurrency, resource utilization, and overhead. Evaluations are performed on a simulated system using the software tool. The tool is used to estimate theoretical lower bound performance. Analysis results are shown to be comparable to the predictions
AES-EPO study program, volume II Final study report
Packaging, machine organization, error detection, and fabrication and test in determining solution to long-term and time-critical reliability of Apollo command module guidance-control compute
- …