709 research outputs found

    A Generic Dual Core Architecture with Error Containment

    Get PDF
    The dual core strategy allows to construct a fail-silent processor from two instances (master/checker) of any arbitrary standard processor. Its main drawbacks are its vulnerability with respect to common mode failures and the existence of residual single points of failure. In this paper we propose a generic frame that systematically eliminates these drawbacks. First, we employ temporal redundancy to cope with common mode failures. Unlike similar approaches we can ensure error containment even if -- as a result of the temporal redundancy -- the comparison by the checker core is delayed. We attain this by introducing a specific delay element for outgoing data. Second, we perform a systematic analysis of potential single points of failure and eliminate these by careful layout, self-checking circuits and similar methods. We finally validate our approach by means of exhaustive fault injection experiments. The results indicate a 100% self-checking coverage for stuck-at faults and complete error containment. Since the proposed framework has been kept generic in the sense that the individual standard processor cores are treated as black boxes, these results are valid independent of the core actually used

    Design of CMOS PSCD circuits and checkers for stuck-at and stuck-on faults

    Get PDF
    [[abstract]]We present in this paper an approach to designing partially strongly code-disjoint (PSCD) CMOS circuits and checkers, considering transistor stuck-on faults in addition to gate-level stuck-at faults. Our design-for-testability (DFT) technique requires only a small number of extra transistors for monitoring abnormal static currents, coupled with a simple clocking scheme, to detect the stuck-on faults concurrently. The DFT circuitry not only can detect the faults in the functional circuit but also can detect or tolerate faults in itself, making it a good candidate for checker design. Switch and circuit level simulations were performed on a sample circuit, and a sample 4-out-of-8 code checker chip using the proposed technique has been designed, fabricated, and tested, showing the correctness of the method. Performance penalty is reduced by a novel BiCMOS checker circuit.[[fileno]]2030108010057[[department]]電機工程學

    Investigations into the feasibility of an on-line test methodology

    Get PDF
    This thesis aims to understand how information coding and the protocol that it supports can affect the characteristics of electronic circuits. More specifically, it investigates an on-line test methodology called IFIS (If it Fails It Stops) and its impact on the design, implementation and subsequent characteristics of circuits intended for application specific lC (ASIC) technology. The first study investigates the influences of information coding and protocol on the characteristics of IFIS systems. The second study investigates methods of circuit design applicable to IFIS cells and identifies the· technique possessing the characteristics most suitable for on-line testing. The third study investigates the characteristics of a 'real-life' commercial UART re-engineered using the techniques resulting from the previous two studies. The final study investigates the effects of the halting properties endowed by the protocol on failure diagnosis within IFIS systems. The outcome of this work is an identification and characterisation of the factors that influence behaviour, implementation costs and the ability to test and diagnose IFIS designs

    Design of a fault tolerant airborne digital computer. Volume 1: Architecture

    Get PDF
    This volume is concerned with the architecture of a fault tolerant digital computer for an advanced commercial aircraft. All of the computations of the aircraft, including those presently carried out by analogue techniques, are to be carried out in this digital computer. Among the important qualities of the computer are the following: (1) The capacity is to be matched to the aircraft environment. (2) The reliability is to be selectively matched to the criticality and deadline requirements of each of the computations. (3) The system is to be readily expandable. contractible, and (4) The design is to appropriate to post 1975 technology. Three candidate architectures are discussed and assessed in terms of the above qualities. Of the three candidates, a newly conceived architecture, Software Implemented Fault Tolerance (SIFT), provides the best match to the above qualities. In addition SIFT is particularly simple and believable. The other candidates, Bus Checker System (BUCS), also newly conceived in this project, and the Hopkins multiprocessor are potentially more efficient than SIFT in the use of redundancy, but otherwise are not as attractive

    On Fault Tolerance Methods for Networks-on-Chip

    Get PDF
    Technology scaling has proceeded into dimensions in which the reliability of manufactured devices is becoming endangered. The reliability decrease is a consequence of physical limitations, relative increase of variations, and decreasing noise margins, among others. A promising solution for bringing the reliability of circuits back to a desired level is the use of design methods which introduce tolerance against possible faults in an integrated circuit. This thesis studies and presents fault tolerance methods for network-onchip (NoC) which is a design paradigm targeted for very large systems-onchip. In a NoC resources, such as processors and memories, are connected to a communication network; comparable to the Internet. Fault tolerance in such a system can be achieved at many abstraction levels. The thesis studies the origin of faults in modern technologies and explains the classification to transient, intermittent and permanent faults. A survey of fault tolerance methods is presented to demonstrate the diversity of available methods. Networks-on-chip are approached by exploring their main design choices: the selection of a topology, routing protocol, and flow control method. Fault tolerance methods for NoCs are studied at different layers of the OSI reference model. The data link layer provides a reliable communication link over a physical channel. Error control coding is an efficient fault tolerance method especially against transient faults at this abstraction level. Error control coding methods suitable for on-chip communication are studied and their implementations presented. Error control coding loses its effectiveness in the presence of intermittent and permanent faults. Therefore, other solutions against them are presented. The introduction of spare wires and split transmissions are shown to provide good tolerance against intermittent and permanent errors and their combination to error control coding is illustrated. At the network layer positioned above the data link layer, fault tolerance can be achieved with the design of fault tolerant network topologies and routing algorithms. Both of these approaches are presented in the thesis together with realizations in the both categories. The thesis concludes that an optimal fault tolerance solution contains carefully co-designed elements from different abstraction levelsSiirretty Doriast

    LSI/VLSI design for testability analysis and general approach

    Get PDF
    The incorporation of testability characteristics into large scale digital design is not only necessary for, but also pertinent to effective device testing and enhancement of device reliability. There are at least three major DFT techniques, namely, the self checking, the LSSD, and the partitioning techniques, each of which can be incorporated into a logic design to achieve a specific set of testability and reliability requirements. Detailed analysis of the design theory, implementation, fault coverage, hardware requirements, application limitations, etc., of each of these techniques are also presented

    Testability and redundancy techniques for improved yield and reliability of CMOS VLSI circuits

    Get PDF
    The research presented in this thesis is concerned with the design of fault-tolerant integrated circuits as a contribution to the design of fault-tolerant systems. The economical manufacture of very large area ICs will necessitate the incorporation of fault-tolerance features which are routinely employed in current high density dynamic random access memories. Furthermore, the growing use of ICs in safety-critical applications and/or hostile environments in addition to the prospect of single-chip systems will mandate the use of fault-tolerance for improved reliability. A fault-tolerant IC must be able to detect and correct all possible faults that may affect its operation. The ability of a chip to detect its own faults is not only necessary for fault-tolerance, but it is also regarded as the ultimate solution to the problem of testing. Off-line periodic testing is selected for this research because it achieves better coverage of physical faults and it requires less extra hardware than on-line error detection techniques. Tests for CMOS stuck-open faults are shown to detect all other faults. Simple test sequence generation procedures for the detection of all faults are derived. The test sequences generated by these procedures produce a trivial output, thereby, greatly simplifying the task of test response analysis. A further advantage of the proposed test generation procedures is that they do not require the enumeration of faults. The implementation of built-in self-test is considered and it is shown that the hardware overhead is comparable to that associated with pseudo-random and pseudo-exhaustive techniques while achieving a much higher fault coverage through-the use of the proposed test generation procedures. The consideration of the problem of testing the test circuitry led to the conclusion that complete test coverage may be achieved if separate chips cooperate in testing each other's untested parts. An alternative approach towards complete test coverage would be to design the test circuitry so that it is as distributed as possible and so that it is tested as it performs its function. Fault correction relies on the provision of spare units and a means of reconfiguring the circuit so that the faulty units are discarded. This raises the question of what is the optimum size of a unit? A mathematical model, linking yield and reliability is therefore developed to answer such a question and also to study the effects of such parameters as the amount of redundancy, the size of the additional circuitry required for testing and reconfiguration, and the effect of periodic testing on reliability. The stringent requirement on the size of the reconfiguration logic is illustrated by the application of the model to a typical example. Another important result concerns the effect of periodic testing on reliability. It is shown that periodic off-line testing can achieve approximately the same level of reliability as on-line testing, even when the time between tests is many hundreds of hours

    Fault-tolerant computer study

    Get PDF
    A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
    corecore