479 research outputs found

    Synthesis for circuit reliability

    Get PDF
    textElectrical and Computer Engineerin

    Design of a fault tolerant airborne digital computer. Volume 1: Architecture

    Get PDF
    This volume is concerned with the architecture of a fault tolerant digital computer for an advanced commercial aircraft. All of the computations of the aircraft, including those presently carried out by analogue techniques, are to be carried out in this digital computer. Among the important qualities of the computer are the following: (1) The capacity is to be matched to the aircraft environment. (2) The reliability is to be selectively matched to the criticality and deadline requirements of each of the computations. (3) The system is to be readily expandable. contractible, and (4) The design is to appropriate to post 1975 technology. Three candidate architectures are discussed and assessed in terms of the above qualities. Of the three candidates, a newly conceived architecture, Software Implemented Fault Tolerance (SIFT), provides the best match to the above qualities. In addition SIFT is particularly simple and believable. The other candidates, Bus Checker System (BUCS), also newly conceived in this project, and the Hopkins multiprocessor are potentially more efficient than SIFT in the use of redundancy, but otherwise are not as attractive

    Study of Single Event Transient Error Mitigation

    Get PDF
    Single Event Transient (SET) errors in ground-level electronic devices are a growing concern in the radiation hardening field. However, effective SET mitigation technologies which satisfy ground-level demands such as generic, flexible, efficient, and fast, are limited. The classic Triple Modular Redundancy (TMR) method is the most well-known and popular technique in space and nuclear environment. But it leads to more than 200% area and power overheads, which is too costly to implement in ground-level applications. Meanwhile, the coding technique is extensively utilized to inhibit upset errors in storage cells, but the irregularity of combinatorial logics limits its use in SET mitigation. Therefore, SET mitigation techniques suitable for ground-level applications need to be addressed. Aware of the demands for SET mitigation techniques in ground-level applications, this thesis proposes two novel approaches based on the redundant wire and approximate logic techniques. The Redundant Wire is a SET mitigation technique. By selectively adding redundant wire connections, the technique can prohibit targeted transient faults from propagating on the fly. This thesis proposes a set of signature-based evaluation equations to efficiently estimate the protecting effect provided by each redundant wire candidates. Based on the estimated results, a greedy algorithm is used to insert the best candidate repeatedly. Simulation results substantiate that the evaluation equations can achieve up to 98% accuracy on average. Regarding protecting effects, the technique can mask 18.4% of the faults with a 4.3% area, 4.4% power, and 5.4% delay overhead on average. Overall, the quality of protecting results obtained are 2.8 times better than the previous work. Additionally, the impact of synthesis constraints and signature length are discussed. Approximate Logic is a partial TMR technique offering a trade-off between fault coverage and area overheads. The approximate logic consists of an under-approximate logic and an over-approximate logic. The under-approximate logic is a subset of the original min-terms and the over-approximate logic is a subset of the original max-terms. This thesis proposes a new algorithm for generating the two approximate logics. Through the generating process, the algorithm considers the intrinsic failure probabilities of each gate and utilizes a confidence interval estimate equation to minimize required computations. The technique is applied to two fault models, Stuck-at and SET, and the separate results are compared and discussed. The results show that the technique can reduce the error 75% with an area penalty of 46% on some circuits. The delay overheads of this technique are always two additional layers of logic. The two proposed SET mitigation techniques are both applicable to generic combinatorial logics and with high flexibility. The simulation shows promising SET mitigation ability. The proposed mitigation techniques provide designers more choices in developing reliable combinatorial logic in ground-level applications

    On Fault Tolerance Methods for Networks-on-Chip

    Get PDF
    Technology scaling has proceeded into dimensions in which the reliability of manufactured devices is becoming endangered. The reliability decrease is a consequence of physical limitations, relative increase of variations, and decreasing noise margins, among others. A promising solution for bringing the reliability of circuits back to a desired level is the use of design methods which introduce tolerance against possible faults in an integrated circuit. This thesis studies and presents fault tolerance methods for network-onchip (NoC) which is a design paradigm targeted for very large systems-onchip. In a NoC resources, such as processors and memories, are connected to a communication network; comparable to the Internet. Fault tolerance in such a system can be achieved at many abstraction levels. The thesis studies the origin of faults in modern technologies and explains the classification to transient, intermittent and permanent faults. A survey of fault tolerance methods is presented to demonstrate the diversity of available methods. Networks-on-chip are approached by exploring their main design choices: the selection of a topology, routing protocol, and flow control method. Fault tolerance methods for NoCs are studied at different layers of the OSI reference model. The data link layer provides a reliable communication link over a physical channel. Error control coding is an efficient fault tolerance method especially against transient faults at this abstraction level. Error control coding methods suitable for on-chip communication are studied and their implementations presented. Error control coding loses its effectiveness in the presence of intermittent and permanent faults. Therefore, other solutions against them are presented. The introduction of spare wires and split transmissions are shown to provide good tolerance against intermittent and permanent errors and their combination to error control coding is illustrated. At the network layer positioned above the data link layer, fault tolerance can be achieved with the design of fault tolerant network topologies and routing algorithms. Both of these approaches are presented in the thesis together with realizations in the both categories. The thesis concludes that an optimal fault tolerance solution contains carefully co-designed elements from different abstraction levelsSiirretty Doriast

    Techniques for the realization of ultra- reliable spaceborne computer Final report

    Get PDF
    Bibliography and new techniques for use of error correction and redundancy to improve reliability of spaceborne computer

    Error detection for data communication systems

    Get PDF
    A description of the problems encountered in the data communications field and the various solutions can be found in a number of diverse, and often theoretical sources. My intention in writing this thesis is to bring together, in a practical and understandable manner, the theory and the application of a method of error detection used extensively in data communication systems known as the Cyclic Redundancy Check (CRC). To provide some background on the subject, a description of a data communication system is presented, and the possible sources of error are explored in some detail. Data transmission formats are described, and a comparison of various error detection schemes is presented so that the advantages of the CRC can be more readily understood. The theory behind the CRC and its physical implementation is given, along with a detailed example showing the effectiveness of the CRC for error detection. Finally, the current state-of-the-art technology available for implementing the various error detection schemes is discussed, with particular emphasis on those technologies that perform the Cyclic Redundancy Check

    Error control coding for semiconductor memories

    Get PDF
    All modern computers have memories built from VLSI RAM chips. Individually, these devices are highly reliable and any single chip may perform for decades before failing. However, when many of the chips are combined in a single memory, the time that at least one of them fails could decrease to mere few hours. The presence of the failed chips causes errors when binary data are stored in and read out from the memory. As a consequence the reliability of the computer memories degrade. These errors are classified into hard errors and soft errors. These can also be termed as permanent and temporary errors respectively. In some situations errors may show up as random errors, in which both 1-to-O errors and 0-to-l errors occur randomly in a memory word. In other situations the most likely errors are unidirectional errors in which 1-to-O errors or 0-to-l errors may occur but not both of them in one particular memory word. To achieve a high speed and highly reliable computer, we need large capacity memory. Unfortunately, with high density of semiconductor cells in memory, the error rate increases dramatically. Especially, the VLSI RAMs suffer from soft errors caused by alpha-particle radiation. Thus the reliability of computer could become unacceptable without error reducing schemes. In practice several schemes to reduce the effects of the memory errors were commonly used. But most of them are valid only for hard errors. As an efficient and economical method, error control coding can be used to overcome both hard and soft errors. Therefore it is becoming a widely used scheme in computer industry today. In this thesis, we discuss error control coding for semiconductor memories. The thesis consists of six chapters. Chapter one is an introduction to error detecting and correcting coding for computer memories. Firstly, semiconductor memories and their problems are discussed. Then some schemes for error reduction in computer memories are given and the advantages of using error control coding over other schemes are presented. In chapter two, after a brief review of memory organizations, memory cells and their physical constructions and principle of storing data are described. Then we analyze mechanisms of various errors occurring in semiconductor memories so that, for different errors different coding schemes could be selected. Chapter three is devoted to the fundamental coding theory. In this chapter background on encoding and decoding algorithms are presented. In chapter four, random error control codes are discussed. Among them error detecting codes, single* error correcting/double error detecting codes and multiple error correcting codes are analyzed. By using examples, the decoding implementations for parity codes, Hamming codes, modified Hamming codes and majority logic codes are demonstrated. Also in this chapter it was shown that by combining error control coding and other schemes, the reliability of the memory can be improved by many orders. For unidirectional errors, we introduced unordered codes in chapter five. Two types of the unordered codes are discussed. They are systematic and nonsystematic unordered codes. Both of them are very powerful for unidirectional error detection. As an example of optimal nonsystematic unordered code, an efficient balanced code are analyzed. Then as an example of systematic unordered codes Berger codes are analyzed. Considering the fact that in practice random errors still may occur in unidirectional error memories, some recently developed t-random error correcting/all unidirectional error detecting codes are introduced. Illustrative examples are also included to facilitate the explanation. Chapter six is the conclusions of the thesis. The whole thesis is oriented to the applications of error control coding for semiconductor memories. Most of the codes discussed in the thesis are widely used in practice. Through the thesis we attempt to provide a review of coding in computer memories and emphasize the advantage of coding. It is obvious that with the requirement of higher speed and higher capacity semiconductor memories, error control coding will play even more important role in the future

    Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack

    Get PDF
    Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Recon- figurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, use of adaptive techniques are seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. These include a Discrete Cosine Transform (DCT) core, Motion Estimation (ME) engine, Finite Impulse Response (FIR) Filter, Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks in addition to MCNC benchmark circuits. A iii significant reduction in power consumption is achieved ranging from 83% for low motion-activity scenes to 12.5% for high motion activity video scenes in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32dB. The diagnosability, reconfiguration latency, and resource overhead of each approach is analyzed. Compared to previous alternatives, PURE maintains a PSNR within a difference of 4.02dB to 6.67dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault-recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria
    corecore