Search CORE

23,346 research outputs found

Recommended from our members

Fault Tolerance Against Design Faults

Author: Strigini L.
Publication venue: 'Wiley'
Publication date: 01/01/2005
Field of study

City Research Online

Fault-tolerant sub-lithographic design with rollback recovery

Author: DeHon André
Naeimi Helia
Publication venue: 'AIP Publishing'
Publication date: 19/03/2008
Field of study

Shrinking feature sizes and energy levels coupled with high clock rates and decreasing node capacitance lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show the area overhead factor of our technique is roughly an order of magnitude lower than a gate level feed-forward redundancy scheme

Caltech Authors

ScholarlyCommons@Penn

Integrated analysis of error detection and recovery

Author: Lee Y. H.
Shin K. G.
Publication venue
Publication date
Field of study

An integrated modeling and analysis of error detection and recovery is presented. When fault latency and/or error latency exist, the system may suffer from multiple faults or error propagations which seriously deteriorate the fault-tolerant capability. Several detection models that enable analysis of the effect of detection mechanisms on the subsequent error handling operations and the overall system reliability were developed. Following detection of the faulty unit and reconfiguration of the system, the contaminated processes or tasks have to be recovered. The strategies of error recovery employed depend on the detection mechanisms and the available redundancy. Several recovery methods including the rollback recovery are considered. The recovery overhead is evaluated as an index of the capabilities of the detection and reconfiguration mechanisms

NASA Technical Reports Server

A probabilistic model for information and sensor validation

Author: Ibargüengoytia PH
Sucar LE
Vadera S
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2006
Field of study

This paper develops a new theory and model for information and sensor validation. The model represents relationships between variables using Bayesian networks and utilizes probabilistic propagation to estimate the expected values of variables. If the estimated value of a variable differs from the actual value, an apparent fault is detected. The fault is only apparent since it may be that the estimated value is itself based on faulty data. The theory extends our understanding of when it is possible to isolate real faults from potential faults and supports the development of an algorithm that is capable of isolating real faults without deferring the problem to the use of expert provided domain-specific rules. To enable practical adoption for real-time processes, an any time version of the algorithm is developed, that, unlike most other algorithms, is capable of returning improving assessments of the validity of the sensors as it accumulates more evidence with time. The developed model is tested by applying it to the validation of temperature sensors during the start-up phase of a gas turbine when conditions are not stable; a problem that is known to be challenging. The paper concludes with a discussion of the practical applicability and scalability of the model

CiteSeerX

University of Salford Institutional Repository