6,742 research outputs found

    Immunotronics - novel finite-state-machine architectures with built-in self-test using self-nonself differentiation

    Get PDF
    A novel approach to hardware fault tolerance is demonstrated that takes inspiration from the human immune system as a method of fault detection. The human immune system is a remarkable system of interacting cells and organs that protect the body from invasion and maintains reliable operation even in the presence of invading bacteria or viruses. This paper seeks to address the field of electronic hardware fault tolerance from an immunological perspective with the aim of showing how novel methods based upon the operation of the immune system can both complement and create new approaches to the development of fault detection mechanisms for reliable hardware systems. In particular, it is shown that by use of partial matching, as prevalent in biological systems, high fault coverage can be achieved with the added advantage of reducing memory requirements. The development of a generic finite-state-machine immunization procedure is discussed that allows any system that can be represented in such a manner to be "immunized" against the occurrence of faulty operation. This is demonstrated by the creation of an immunized decade counter that can detect the presence of faults in real tim

    Fault-tolerant computer study

    Get PDF
    A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed

    Control-flow checking via regular expressions

    Get PDF
    The present paper explains a new approach to program control flow checking. The check has been inserted at source-code level using a signature methodology based on regular expressions. The signature checking is performed without a dedicated watchdog processor but resorting to inter-process communication (IPC) facilities offered by most of the modern operating systems. The proposed approach allows very low memory overhead and trade-off between fault latency and program execution time overhead

    Fault-tolerant sub-lithographic design with rollback recovery

    Get PDF
    Shrinking feature sizes and energy levels coupled with high clock rates and decreasing node capacitance lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show the area overhead factor of our technique is roughly an order of magnitude lower than a gate level feed-forward redundancy scheme

    Integrated analysis of error detection and recovery

    Get PDF
    An integrated modeling and analysis of error detection and recovery is presented. When fault latency and/or error latency exist, the system may suffer from multiple faults or error propagations which seriously deteriorate the fault-tolerant capability. Several detection models that enable analysis of the effect of detection mechanisms on the subsequent error handling operations and the overall system reliability were developed. Following detection of the faulty unit and reconfiguration of the system, the contaminated processes or tasks have to be recovered. The strategies of error recovery employed depend on the detection mechanisms and the available redundancy. Several recovery methods including the rollback recovery are considered. The recovery overhead is evaluated as an index of the capabilities of the detection and reconfiguration mechanisms

    Sequential Circuit Design for Embedded Cryptographic Applications Resilient to Adversarial Faults

    Get PDF
    In the relatively young field of fault-tolerant cryptography, the main research effort has focused exclusively on the protection of the data path of cryptographic circuits. To date, however, we have not found any work that aims at protecting the control logic of these circuits against fault attacks, which thus remains the proverbial Achilles’ heel. Motivated by a hypothetical yet realistic fault analysis attack that, in principle, could be mounted against any modular exponentiation engine, even one with appropriate data path protection, we set out to close this remaining gap. In this paper, we present guidelines for the design of multifault-resilient sequential control logic based on standard Error-Detecting Codes (EDCs) with large minimum distance. We introduce a metric that measures the effectiveness of the error detection technique in terms of the effort the attacker has to make in relation to the area overhead spent in implementing the EDC. Our comparison shows that the proposed EDC-based technique provides superior performance when compared against regular N-modular redundancy techniques. Furthermore, our technique scales well and does not affect the critical path delay

    Autonomous spacecraft maintenance study group

    Get PDF
    A plan to incorporate autonomous spacecraft maintenance (ASM) capabilities into Air Force spacecraft by 1989 is outlined. It includes the successful operation of the spacecraft without ground operator intervention for extended periods of time. Mechanisms, along with a fault tolerant data processing system (including a nonvolatile backup memory) and an autonomous navigation capability, are needed to replace the routine servicing that is presently performed by the ground system. The state of the art fault handling capabilities of various spacecraft and computers are described, and a set conceptual design requirements needed to achieve ASM is established. Implementations for near term technology development needed for an ASM proof of concept demonstration by 1985, and a research agenda addressing long range academic research for an advanced ASM system for 1990s are established

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
    • …
    corecore