796 research outputs found

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    A Survey on the Best Choice for Modulus of Residue Code

    Get PDF
    Nowadays, the development of technology and the growing need for dense and complex chips have led chip industries to increase their attention on the circuit testability. Also, using the electronic chips in certain industries, such as the space industry, makes the design of fault tolerant circuits a challenging issue. Coding is one of the most suitable methods for error detection and correction. The residue code, as one of the best choices for error detection aims, is wildly used in large arithmetic circuits such as multiplier and also finds a wide range of applications in processors and digital filters. The modulus value in this technique directly effect on the area overhead parameter. A large area overhead is one of the most important disadvantages especially for testing the small circuits. The purpose of this paper is to study and investigate the best choice for residue code check base that is used for simple and small circuits such as a simple ripple carry adder. The performances are evaluated by applying stuck-at-faults and transition-faults by simulators. The efficiency is defined based on fault coverage and normalized area overhead. The results show that the modulus 3 with 95% efficiency provided the best result. Residue code with this modulus for checking a ripple carry adder, in comparison with duplex circuit, 30% improves the efficiency

    A Case Study of Self-Checking Circuits Reliability

    Get PDF
    In this paper, we analyze the reliability of self-checking circuits. A case study is presented in which a fault-tolerant system with duplicated self-checking modules is compared to the TMR version. It is shown that a duplicated self-checking system has a much higher reliability than that of the TMR counterpart. More importantly, the reliability of the selfchecking system does not drop as sharply as that of the TMR version. We also demonstrate the trade-offs between hardware complexity and error handling capability of self-checking circuits. Alternative self-checking designs where some hardware redundancies are removed with the lost of fault-secure and/or self-testing properties are also studied

    SABRE: A bio-inspired fault-tolerant electronic architecture

    Get PDF
    As electronic devices become increasingly complex, ensuring their reliable, fault-free operation is becoming correspondingly more challenging. It can be observed that, in spite of their complexity, biological systems are highly reliable and fault tolerant. Hence, we are motivated to take inspiration for biological systems in the design of electronic ones. In SABRE (self-healing cellular architectures for biologically inspired highly reliable electronic systems), we have designed a bio-inspired fault-tolerant hierarchical architecture for this purpose. As in biology, the foundation for the whole system is cellular in nature, with each cell able to detect faults in its operation and trigger intra-cellular or extra-cellular repair as required. At the next level in the hierarchy, arrays of cells are configured and controlled as function units in a transport triggered architecture (TTA), which is able to perform partial-dynamic reconfiguration to rectify problems that cannot be solved at the cellular level. Each TTA is, in turn, part of a larger multi-processor system which employs coarser grain reconfiguration to tolerate faults that cause a processor to fail. In this paper, we describe the details of operation of each layer of the SABRE hierarchy, and how these layers interact to provide a high systemic level of fault tolerance. © 2013 IOP Publishing Ltd

    Fault-tolerant computer study

    Get PDF
    A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed

    Fault-tolerant sub-lithographic design with rollback recovery

    Get PDF
    Shrinking feature sizes and energy levels coupled with high clock rates and decreasing node capacitance lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show the area overhead factor of our technique is roughly an order of magnitude lower than a gate level feed-forward redundancy scheme

    Reliable Low-Latency and Low-Complexity Viterbi Architectures Benchmarked on ASIC and FPGA

    Get PDF
    The Viterbi algorithm is commonly applied in a number of sensitive usage models including decoding convolutional codes used in communications such as satellite communication, cellular relay, and wireless local area networks. Moreover, the algorithm has been applied to automatic speech recognition and storage devices. In this thesis, efficient error detection schemes for architectures based on low-latency, low-complexity Viterbi decoders are presented. The merit of the proposed schemes is that reliability requirements, overhead tolerance, and performance degradation limits are embedded in the structures and can be adapted accordingly. We also present three variants of recomputing with encoded operands and its modifications to detect both transient and permanent faults, coupled with signature-based schemes. The instrumented decoder architecture has been subjected to extensive error detection assessments through simulations, and application-specific integrated circuit (ASIC) [32nm library] and field-programmable gate array (FPGA) [Xilinx Virtex-6 family] implementations for benchmark. The proposed fine-grained approaches can be utilized based on reliability objectives and performance/implementation metrics degradation tolerance

    Design of CMOS PSCD circuits and checkers for stuck-at and stuck-on faults

    Get PDF
    [[abstract]]We present in this paper an approach to designing partially strongly code-disjoint (PSCD) CMOS circuits and checkers, considering transistor stuck-on faults in addition to gate-level stuck-at faults. Our design-for-testability (DFT) technique requires only a small number of extra transistors for monitoring abnormal static currents, coupled with a simple clocking scheme, to detect the stuck-on faults concurrently. The DFT circuitry not only can detect the faults in the functional circuit but also can detect or tolerate faults in itself, making it a good candidate for checker design. Switch and circuit level simulations were performed on a sample circuit, and a sample 4-out-of-8 code checker chip using the proposed technique has been designed, fabricated, and tested, showing the correctness of the method. Performance penalty is reduced by a novel BiCMOS checker circuit.[[fileno]]2030108010057[[department]]電機工程學
    corecore