AES-EPO study program, volume I Final study report

Conceptual study of possible solutions to long-term and time-critical reliability problems affecting Apollo command module guidance and control compute

NASA Technical Reports Server

Reliability estimation procedures and CARE: The Computer-Aided Reliability Estimation Program

Author: Mathur F. P.

Ultrareliable fault-tolerant onboard digital systems for spacecraft intended for long mission life exploration of the outer planets are under development. The design of systems involving self-repair and fault-tolerance leads to the companion problem of quantifying and evaluating the survival probability of the system for the mission under consideration and the constraints imposed upon the system. Methods have been developed to (1) model self-repair and fault-tolerant organizations; (2) compute survival probability, mean life, and many other reliability predictive functions with respect to various systems and mission parameters; (3) perform sensitivity analysis of the system with respect to mission parameters; and (4) quantitatively compare competitive fault-tolerant systems. Various measures of comparison are offered. To automate the procedures of reliability mathematical modeling and evaluation, the CARE (computer-aided reliability estimation) program was developed. CARE is an interactive program residing on the UNIVAC 1108 system, which makes the above calculations and facilitates report preparation by providing output in tabular form, graphical 2-dimensional plots, and 3-dimensional projections. The reliability estimation of fault-tolerant organization by means of the CARE program is described

NASA Technical Reports Server

Dependability modeling and optimization of triple modular redundancy partitioning for SRAM-based FPGAs

Author: Hoque Khaza Anuarul
Mohamed Otmane Ait
Savaria Yvon
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

SRAM-based FPGAs are popular in the aerospace industry for their field programmability and low cost. However, they suffer from cosmic radiation-induced Single Event Upsets (SEUs). Triple Modular Redundancy (TMR) is a well-known technique to mitigate SEUs in FPGAs that is often used with another SEU mitigation technique known as configuration scrubbing. Traditional TMR provides protection against a single fault at a time, while partitioned TMR provides improved reliability and availability. In this paper, we present a methodology to analyze TMR partitioning at an early design stage using probabilistic model checking. The proposed formal model can capture both single and multiple-cell upset scenarios, regardless of any assumption of equal partition sizes. Starting with a high-level description of a design, a Markov model is constructed from the Data Flow Graph (DFG) using a specified number of partitions, a component characterization library and a user-defined scrub rate. Such a model, together with exhaustive analysis, captures all the considered failures and repairs possible in the system within the radiation environment. Various reliability and availability properties are then verified automatically using the PRISM model checker, exploring the relationship between the scrub frequency and the number of TMR partitions required to meet the design requirements. Also, the reported results show that based on a known voter failure rate, it is possible to find an optimal number of partitions at early design stages using our proposed method.

Comment: Published in Reliability Engineering & System Safety Volume 182, February 2019, Pages 107-11

arXiv.org e-Print Archive

PolyPublie

Optimizing Scrubbing by Netlist Analysis for FPGA Configuration Bit Classification and Floorplanning

Author: Schmidt Bernhard
Teich Jürgen
Ziener Daniel
Zöllner Christian
Publication venue: 'Elsevier BV'
Publication date: 25/07/2017

Existing scrubbing techniques for SEU mitigation on FPGAs do not guarantee error-free operation after SEU recovery if the affected configuration bits belong to feedback loops of the implemented circuits. In this paper, we a) provide a netlist-based circuit analysis technique to distinguish so-called critical configuration bits from essential bits, in order to identify which configuration bits also need state-restoring actions after a recovered SEU and which do not. Furthermore, b) an alternative classification approach using fault injection is developed in order to compare both classification techniques. Moreover, c) we propose a floorplanning approach for reducing the effective number of scrubbed frames, and d) experimental results give evidence that our optimization methodology not only detects errors earlier but also considerably reduces the Mean-Time-To-Repair (MTTR) of a circuit. In particular, we show that by using our approach, the MTTR for datapath-intensive circuits can be reduced by up to 48.5% in comparison to standard approaches

arXiv.org e-Print Archive

Improving reconfigurable systems reliability by combining periodical test and redundancy techniques: a case study

Author: Bezerra Eduardo Augusto
Gough Michael Paul
Vargas Fabian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2001

This paper reviews some traditional techniques from the fields of fault tolerance and digital circuit testing and introduces them to the field of reconfigurable computer systems. The target area is that of on-board spacecraft electronics, as this class of application is a good candidate for the use of reconfigurable computing technology. Fault tolerant strategies are used in order for the system to adapt itself to the severe conditions found in space. In addition, the paper describes some problems and possible solutions for the use of reconfigurable components, based on programmable logic, in space applications

Sussex Research Online

Evaluation of reliability modeling tools for advanced fault tolerant systems

Author: Baker Robert
Scheper Charlotte

The Computer Aided Reliability Estimation (CARE III) and Automated Reliability Interactive Estimation System (ARIES 82) reliability tools for application to advanced fault tolerant aerospace systems were evaluated. To determine reliability modeling requirements, the evaluation focused on the Draper Laboratories' Advanced Information Processing System (AIPS) architecture as an example architecture for fault tolerant aerospace systems. Advantages and limitations were identified for each reliability evaluation tool. The CARE III program was designed primarily for analyzing ultrareliable flight control systems. The ARIES 82 program's primary use was to support university research and teaching. Neither CARE III nor ARIES 82 was suited for determining the reliability of complex nodal networks of the type used to interconnect processing sites in the AIPS architecture. It was concluded that ARIES was not suitable for modeling advanced fault tolerant systems. It was further concluded that subject to some limitations (the difficulty in modeling systems with unpowered spare modules, systems where equipment maintenance must be considered, systems where failure depends on the sequence in which faults occurred, and systems where multiple faults beyond double near-coincident faults must be considered), CARE III is best suited for evaluating the reliability of advanced fault tolerant systems for air transport

NASA Technical Reports Server

A run time adaptive architecture to trade-off performance for fault tolerance applied to a DVB on-board processor

Author: Berrojo Luis
Regada Raúl
Riesgo Alcaide Teresa
Torre Arnanz Eduardo de la
Veljković Filip
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

Reliability is one of the key issues in space applications. Although highly flexible and generally less expensive than predominantly used ASICs, SRAM-based FPGAs are very susceptible to radiation effects. Hence, various fault tolerant techniques have to be applied in order to handle faults and protect the design. This paper presents a reconfigurable on-board processor capable of run-time adaptation to harsh environmental conditions and different functional demands. Run-time reconfigurability is achieved applying two different reconfiguration methodologies. We propose a novel self-reconfigurable architecture able to on demand duplicate or triplicate part of the design in order to form DMR and TMR structures. Moreover, we introduce two different approaches for voting the correct output. The first one is a traditional voter that adapts to different DMR/TMR domain positions whereas the second implies comparing the captured flip-flop values directly from the configuration memory read through ICAP. The comparison is done periodically by an embedded processor thus completely excluding the voting mechanism in hardware. The proposed run-time reconfiguration methodology provides savings in terms of device utilization, reconfiguration time, power consumption and significant reductions in the amount of rad-hard memory used by partial configurations

Crossref

Archivo Digital UPM

ATAMM analysis tool

Author: Jones Robert
Mielke Roland
Stoughton John

Diagnostics software for analyzing Algorithm to Architecture Mapping Model (ATAMM) based concurrent processing systems is presented. ATAMM is capable of modeling the execution of large grain algorithms on distributed data flow architectures. The tool graphically displays algorithm activities and processor activities for evaluation of the behavior and performance of an ATAMM based system. The tool's measurement capabilities indicate computing speed, throughput, concurrency, resource utilization, and overhead. Evaluations are performed on a simulated system using the software tool. The tool is used to estimate theoretical lower bound performance. Analysis results are shown to be comparable to the predictions

NASA Technical Reports Server

AES-EPO study program, volume II Final study report

Packaging, machine organization, error detection, and fabrication and test in determining solution to long-term and time-critical reliability of Apollo command module guidance-control compute

NASA Technical Reports Server