Search CORE

4,744 research outputs found

Detailed Modeling and Reliability Analysis of Fault-Tolerant Processor Arrays

Author: Fortes Jose A. B.
Lopez-Benitez Noe
Publication venue: 'Purdue University (bepress)'
Publication date: 01/06/1989
Field of study

Recent advances in VLSI/WSI technology have led to the design of processor arrays with a large number of processing elements confined in small areas. The use of redundancy to increase fault-tolerance has the effect of reducing the ratio of area dedicated to processing elements over the area occupied by other resources in the array. The assumption of fault-free hardware support (switches, buses, interconnection links, etc.,), leads at best to conservative reliability estimates. However, detailed modeling entails not only an explosive growth in the model state space but also a difficult model construction process. To address the latter problem, a systematic method to construct Markov models for the reliability evaluation of processor arrays is proposed. This method is based on the premise that the fault behavior of a processor array can be modeled by a Stochastic Petri Net (SPN). However, in order to obtain a more compact representation, a set of attributes is associated with each transition in the Petri net model. This representation is referred to as a Modified Stochastic Petri Net (MSPN) model. A MSPN allows the construction of the corresponding Markov model as the reachability graph is being generated. The Markov model generated can include the effect of failures of several different components of the array as well as the effect of a peculiar distribution of faults when the reconfiguration occurs. Specific reconfiguration schemes such as Successive Row Elimination (SRE), Alternate Row-Column Elimination (ARCE) and Direct Reconfiguration (DR), are analyze

Purdue E-Pubs

DeSyRe: on-Demand System Reliability

Author: Armato Antonino
Bouganis Christos-Savvas
Falsafi Babak
Gaydadjiev Georgi
Isaza Sebastian
Malek Alirad
Mariani Riccardo
Pnevmatikatos Dionisios N
Pradhan Dhiraj K
Rauwerda Gerard
Seepers Robert
Shafik Rishad Ahmed
Sourdis Ioannis
Strydis Christos
Sunesen Kim
Theodoropoulos Dimitris
Tzilis Stavros
Vavouras Michail
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

Southampton (e-Prints Soton)

EUR Research Repository

Chalmers Research

Chalmers Publication Library

Explore Bristol Research

Correction, improvement and model verification of CARE 3, version 3

Author: Altschul R. E.
Manke J. W.
Nelson D. L.
Rose D. M.
Publication venue
Publication date
Field of study

An independent verification of the CARE 3 mathematical model and computer code was conducted and reported in NASA Contractor Report 166096, Review and Verification of CARE 3 Mathematical Model and Code: Interim Report. The study uncovered some implementation errors that were corrected and are reported in this document. The corrected CARE 3 program is called version 4. Thus the document, correction. improvement, and model verification of CARE 3, version 3 was written in April 1984. It is being published now as it has been determined to contain a more accurate representation of CARE 3 than the preceding document of April 1983. This edition supercedes NASA-CR-166122 entitled, 'Correction and Improvement of CARE 3,' version 3, April 1983

NASA Technical Reports Server

Care 3, Phase 1, volume 1

Author: Bryant L. A.
Guccione L.
Stiffler J. J.
Publication venue
Publication date
Field of study

A computer program to aid in accessing the reliability of fault tolerant avionics systems was developed. A simple mathematical expression was used to evaluate the reliability of any redundant configuration over any interval during which the failure rates and coverage parameters remained unaffected by configuration changes. Provision was made for convolving such expressions in order to evaluate the reliability of a dual mode system. A coverage model was also developed to determine the various relevant coverage coefficients as a function of the available hardware and software fault detector characteristics, and subsequent isolation and recovery delay statistics

NASA Technical Reports Server

A fault injection experiment using the AIRLAB Diagnostic Emulation Facility

Author: Baker Robert
Mangum Scott
Scheper Charlotte
Publication venue
Publication date
Field of study

The preparation for, conduct of, and results of a simulation based fault injection experiment conducted using the AIRLAB Diagnostic Emulation facilities is described. An objective of this experiment was to determine the effectiveness of the diagnostic self-test sequences used to uncover latent faults in a logic network providing the key fault tolerance features for a flight control computer. Another objective was to develop methods, tools, and techniques for conducting the experiment. More than 1600 faults were injected into a logic gate level model of the Data Communicator/Interstage (C/I). For each fault injected, diagnostic self-test sequences consisting of over 300 test vectors were supplied to the C/I model as inputs. For each test vector within a test sequence, the outputs from the C/I model were compared to the outputs of a fault free C/I. If the outputs differed, the fault was considered detectable for the given test vector. These results were then analyzed to determine the effectiveness of some test sequences. The results established coverage of selt-test diagnostics, identified areas in the C/I logic where the tests did not locate faults, and suggest fault latency reduction opportunities

NASA Technical Reports Server

Chip level simulation of fault tolerant computers

Author: Armstrong J. R.
Publication venue
Publication date
Field of study

Chip level modeling techniques, functional fault simulation, simulation software development, a more efficient, high level version of GSP, and a parallel architecture for functional simulation are discussed

NASA Technical Reports Server

Civil Space Technology Initiative: a First Step

Author
Publication venue
Publication date
Field of study

This is the first published overview of OAST's focused program, the Civil Space Technology Initiative, (CSTI) which started in FY88. This publication describes the goals, technical approach, current status, and plans for CSTI. Periodic updates are planned

NASA Technical Reports Server

Parallelized reliability estimation of reconfigurable computer networks

Author: Das Subhendu
Nicol David M.
Palumbo Dan
Publication venue
Publication date
Field of study

A parallelized system, ASSURE, for computing the reliability of embedded avionics flight control systems which are able to reconfigure themselves in the event of failure is described. ASSURE accepts a grammar that describes a reliability semi-Markov state-space. From this it creates a parallel program that simultaneously generates and analyzes the state-space, placing upper and lower bounds on the probability of system failure. ASSURE is implemented on a 32-node Intel iPSC/860, and has achieved high processor efficiencies on real problems. Through a combination of improved algorithms, exploitation of parallelism, and use of an advanced microprocessor architecture, ASSURE has reduced the execution time on substantial problems by a factor of one thousand over previous workstation implementations. Furthermore, ASSURE's parallel execution rate on the iPSC/860 is an order of magnitude faster than its serial execution rate on a Cray-2 supercomputer. While dynamic load balancing is necessary for ASSURE's good performance, it is needed only infrequently; the particular method of load balancing used does not substantially affect performance

NASA Technical Reports Server