23,346 research outputs found

    Fault-tolerant sub-lithographic design with rollback recovery

    Get PDF
    Shrinking feature sizes and energy levels coupled with high clock rates and decreasing node capacitance lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show the area overhead factor of our technique is roughly an order of magnitude lower than a gate level feed-forward redundancy scheme

    Integrated analysis of error detection and recovery

    Get PDF
    An integrated modeling and analysis of error detection and recovery is presented. When fault latency and/or error latency exist, the system may suffer from multiple faults or error propagations which seriously deteriorate the fault-tolerant capability. Several detection models that enable analysis of the effect of detection mechanisms on the subsequent error handling operations and the overall system reliability were developed. Following detection of the faulty unit and reconfiguration of the system, the contaminated processes or tasks have to be recovered. The strategies of error recovery employed depend on the detection mechanisms and the available redundancy. Several recovery methods including the rollback recovery are considered. The recovery overhead is evaluated as an index of the capabilities of the detection and reconfiguration mechanisms

    A probabilistic model for information and sensor validation

    Get PDF
    This paper develops a new theory and model for information and sensor validation. The model represents relationships between variables using Bayesian networks and utilizes probabilistic propagation to estimate the expected values of variables. If the estimated value of a variable differs from the actual value, an apparent fault is detected. The fault is only apparent since it may be that the estimated value is itself based on faulty data. The theory extends our understanding of when it is possible to isolate real faults from potential faults and supports the development of an algorithm that is capable of isolating real faults without deferring the problem to the use of expert provided domain-specific rules. To enable practical adoption for real-time processes, an any time version of the algorithm is developed, that, unlike most other algorithms, is capable of returning improving assessments of the validity of the sensors as it accumulates more evidence with time. The developed model is tested by applying it to the validation of temperature sensors during the start-up phase of a gas turbine when conditions are not stable; a problem that is known to be challenging. The paper concludes with a discussion of the practical applicability and scalability of the model
    • …
    corecore