
    Catastrophic Faults in Reconfigurable Linear Arrays of Processors

    In regular architectures of identical processing elements, a widely used technique to improve the reconfigurability of the system consists of providing redundant processing elements and reconfiguration mechanisms. In this paper we consider linear arrays of processing elements with unidirectional bypass links of length g. We count particular sets of faulty processing elements, namely the catastrophic ones, which preclude reconfiguration. We show that the number of catastrophic sets of g faults is equal to the (g-1)-th Catalan number. We also provide algorithms to rank and unrank all catastrophic sets of g faults. Finally, we describe a linear-time algorithm that generates all such sets of faults.
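
    To make the Catalan-number claim concrete, here is a small brute-force check (my own illustration, not the paper's ranking or generation algorithm). It models a unidirectional array with regular links of length 1 and bypass links of length g, tests a fault pattern for catastrophe by reachability, and counts catastrophic patterns of exactly g faults. The window width (g-1)^2 + 1 is an assumption suggested by the pattern 0, g-1, 2(g-1), ..., (g-1)^2, which appears to be the widest minimal catastrophic pattern for small g.

        from itertools import combinations
        from math import comb

        def is_catastrophic(faults, g):
            # BFS over healthy processors using unidirectional steps of
            # length 1 (regular link) and g (bypass link); the pattern is
            # catastrophic iff no path crosses the faulty region.
            fault_set = set(faults)
            last = max(fault_set)
            frontier = [min(fault_set) - g]   # healthy start left of all faults
            seen = set(frontier)
            while frontier:
                node = frontier.pop()
                if node > last:
                    return False              # the faulty region was crossed
                for step in (1, g):
                    nxt = node + step
                    if nxt <= last + g and nxt not in fault_set and nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
            return True

        def catalan(n):
            return comb(2 * n, n) // (n + 1)

        for g in range(2, 6):
            # Normalize the first fault to position 0 and enumerate the rest
            # inside the assumed window; both printed counts should agree.
            window = (g - 1) ** 2 + 1
            count = sum(1 for rest in combinations(range(1, window), g - 1)
                        if is_catastrophic((0,) + rest, g))
            print(g, count, catalan(g - 1))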

    Periodic Application of Concurrent Error Detection in Processor Array Architectures

    Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations and for applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance.
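
    As a generic illustration of the time-redundancy idea (a sketch of the general technique, not code from this paper): recomputing an operation on the same hardware and comparing the two results detects a transient fault that corrupts only one of the passes, trading extra latency for the hardware a duplicated unit would cost.

        import random

        def unreliable_op(a, b, transient_rate):
            # Model of a processing element whose result suffers a transient
            # single-bit flip with probability `transient_rate`.
            result = a * b
            if random.random() < transient_rate:
                result ^= 1 << random.randrange(max(result.bit_length(), 1))
            return result

        def ced_op(a, b, transient_rate=1e-3):
            # Time-redundant CED: compute twice and compare.  A mismatch
            # signals a transient error; a permanent fault would corrupt
            # both passes identically and needs other techniques.
            first = unreliable_op(a, b, transient_rate)
            second = unreliable_op(a, b, transient_rate)
            if first != second:
                raise RuntimeError("transient error detected; retry")
            return first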

    Testing and reconfiguration of VLSI linear arrays

    Achieving fault tolerance through incorporation of redundancy and reconfiguration is quite common. In this paper we study the fault tolerance of linear arrays of N processors with k bypass links whose maximum length is g. We consider both arrays with bidirectional links and arrays with unidirectional links. We first consider the problem of testing whether a set of n faulty processors is catastrophic, i.e., precludes reconfiguration. We provide new testing algorithms which improve and generalize known testing algorithms. For bidirectional arrays we provide an O(kn) time testing algorithm; for unidirectional arrays we provide an O(n) time algorithm for the case k = 1, and an O(kn log k) time algorithm for the case k > 1. When the fault pattern is not catastrophic we study the problem of finding an optimal reconfiguration of the array. We consider optimality with respect to two parameters: the size of the reconfigured array and the number of redundant links to activate. Considering optimality with respect to the size of the reconfigured array, we prove that the problem is NP-hard in the strong sense if the bypass links are bidirectional, while it can be solved in O(kng) time if the bypass links are unidirectional. Considering optimality with respect to the number of bypass links to activate, we prove that the problem can be solved in O(kn) time if the bypass links are bidirectional, and in O(kng) time if the bypass links are unidirectional.
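
    To see why the unidirectional case is tractable, consider this toy sketch (my own simplified model, not the paper's O(kng) algorithm): with unidirectional links the healthy processors form a DAG, so the largest reconfigured array is a longest input-to-output path, found in one left-to-right dynamic-programming pass.

        def max_reconfigured_size(n, faulty, bypass_lengths):
            # Processors 0..n-1, a virtual input at -1 and output at n; a
            # link of length s runs from position p to p + s.  `steps` holds
            # the regular link (length 1) plus the k bypass-link lengths.
            steps = [1] + list(bypass_lengths)
            best = {-1: 0}                   # longest healthy path so far
            for i in list(range(n)) + [n]:   # n stands for the output
                if i in faulty:
                    continue
                preds = [best[i - s] for s in steps if (i - s) in best]
                if not preds:
                    continue                 # unreachable from the input
                best[i] = max(preds) + (1 if i < n else 0)
            return best.get(n, 0)            # 0: the pattern is catastrophic

        # Example: max_reconfigured_size(6, {1, 2}, [3]) returns 4, keeping
        # processors 0, 3, 4, 5 and bypassing the two faulty ones.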

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied, and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10^-6, three orders of magnitude better than non-fault-tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10^-6. The required hardware redundancy is approximately 15 times that of a non-fault-tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results.
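
    For readers unfamiliar with NAND multiplexing, von Neumann's construction that the improved model analyses, a rough Monte Carlo sketch of one executive stage follows; it is illustrative only and does not reproduce the paper's analytical model.

        import random

        def nand_mux_stage(bundle_a, bundle_b, gate_fail_prob):
            # One executive stage: wires of the two input bundles are paired
            # at random, and each pair drives an unreliable NAND gate whose
            # output is inverted with probability `gate_fail_prob`.
            pairing = random.sample(bundle_b, len(bundle_b))
            out = []
            for a, b in zip(bundle_a, pairing):
                v = 1 - (a & b)              # ideal NAND
                if random.random() < gate_fail_prob:
                    v ^= 1                   # von Neumann's inverting fault
                out.append(v)
            return out

        # With two bundles of, say, 100 wires all carrying logic 1, the output
        # bundle should be mostly 0; the fraction of wrong wires grows with
        # gate_fail_prob and shrinks with bundle size (the redundancy).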

    Testability and redundancy techniques for improved yield and reliability of CMOS VLSI circuits

    The research presented in this thesis is concerned with the design of fault-tolerant integrated circuits as a contribution to the design of fault-tolerant systems. The economical manufacture of very large area ICs will necessitate the incorporation of fault-tolerance features which are routinely employed in current high density dynamic random access memories. Furthermore, the growing use of ICs in safety-critical applications and/or hostile environments, in addition to the prospect of single-chip systems, will mandate the use of fault-tolerance for improved reliability. A fault-tolerant IC must be able to detect and correct all possible faults that may affect its operation. The ability of a chip to detect its own faults is not only necessary for fault-tolerance, but is also regarded as the ultimate solution to the problem of testing. Off-line periodic testing is selected for this research because it achieves better coverage of physical faults and requires less extra hardware than on-line error detection techniques. Tests for CMOS stuck-open faults are shown to detect all other faults. Simple test sequence generation procedures for the detection of all faults are derived. The test sequences generated by these procedures produce a trivial output, thereby greatly simplifying the task of test response analysis. A further advantage of the proposed test generation procedures is that they do not require the enumeration of faults. The implementation of built-in self-test is considered, and it is shown that the hardware overhead is comparable to that associated with pseudo-random and pseudo-exhaustive techniques while achieving a much higher fault coverage through the use of the proposed test generation procedures. Consideration of the problem of testing the test circuitry led to the conclusion that complete test coverage may be achieved if separate chips cooperate in testing each other's untested parts. An alternative approach towards complete test coverage would be to design the test circuitry so that it is as distributed as possible and is tested as it performs its function. Fault correction relies on the provision of spare units and a means of reconfiguring the circuit so that the faulty units are discarded. This raises a question: what is the optimum size of a unit? A mathematical model linking yield and reliability is therefore developed to answer this question and also to study the effects of such parameters as the amount of redundancy, the size of the additional circuitry required for testing and reconfiguration, and the effect of periodic testing on reliability. The stringent requirement on the size of the reconfiguration logic is illustrated by applying the model to a typical example. Another important result concerns the effect of periodic testing on reliability. It is shown that periodic off-line testing can achieve approximately the same level of reliability as on-line testing, even when the time between tests is many hundreds of hours.
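
    The flavour of such a model can be sketched as follows (a toy yield-only version under an independent-failure assumption; the thesis model additionally links yield to reliability and to the size of the test and reconfiguration circuitry).

        from math import comb

        def yield_with_spares(units, spares, p_unit, p_overhead=0.0):
            # The chip works if the testing/reconfiguration overhead works
            # and at most `spares` of the (units + spares) identical units
            # fail; unit failures are independent with probability `p_unit`.
            total = units + spares
            survive = sum(comb(total, i) * p_unit**i * (1 - p_unit)**(total - i)
                          for i in range(spares + 1))
            return (1 - p_overhead) * survive

        # Smaller units allow finer-grained sparing but inflate p_overhead;
        # sweeping these parameters exposes the optimum unit size.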

    LSI/VLSI design for testability analysis and general approach

    The incorporation of testability characteristics into large-scale digital design is not only necessary for, but also pertinent to, effective device testing and the enhancement of device reliability. There are at least three major DFT techniques, namely self-checking, LSSD, and partitioning, each of which can be incorporated into a logic design to achieve a specific set of testability and reliability requirements. A detailed analysis of the design theory, implementation, fault coverage, hardware requirements, application limitations, etc., of each of these techniques is also presented.
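
    As a pointer to what the self-checking technique involves (a generic illustration, not taken from this analysis): a self-checking block carries redundant check bits, typically predicted parity, so that an on-line checker can flag faults during normal operation.

        def parity(x):
            return bin(x).count("1") & 1

        def increment_with_parity(value):
            # In hardware the parity predictor is logic independent of the
            # adder, so a fault in either makes data and check bit disagree;
            # this software sketch computes both the same way, only to show
            # the structure of the scheme.
            result = value + 1
            return result, parity(result)

        def checker(result, check_bit):
            if parity(result) != check_bit:  # disagreement raises the error signal
                raise RuntimeError("self-checking error detected")
            return result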

    The Automatic Synthesis of Fault Tolerant and Fault Secure VLSI Systems

    This thesis investigates the design of fault tolerant and fault secure (FTFS) systems within the framework of silicon compilation. Automatic design modification is used to introduce FTFS characteristics into a design. A taxonomy of FTFS techniques is introduced and is used to identify a number of features which an "automatic design for FTFS" system should exhibit. A silicon compilation system, Chip Churn 2 (CC2), has been implemented and has been used to demonstrate the feasibility of automatic design of FTFS systems. The CC2 system provides a design language, simulation facilities and a back-end able to produce CMOS VLSI designs. A number of FTFS design methods have been implemented within the CC2 environment; these methods range from triple modular redundancy to concurrent parity code checking. The FTFS design methods can be applied automatically to general designs in order to realise them as FTFS systems. A number of example designs are presented; these are used to illustrate the FTFS modification techniques which have been implemented. Area results for CMOS devices are presented; this allows the modification methods to be compared. A number of problems arising from the methods are highlighted and some solutions suggested.
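
    Of the implemented methods, triple modular redundancy is the simplest to picture; a minimal software sketch of a TMR voter follows (illustrative only, not the CC2 realisation).

        from collections import Counter

        def tmr(module, *inputs):
            # Run three redundant copies of a module and majority-vote the
            # outputs; a fault confined to one copy is masked by the others.
            outputs = [module(*inputs) for _ in range(3)]
            value, votes = Counter(outputs).most_common(1)[0]
            if votes < 2:
                raise RuntimeError("voter found no majority")
            return value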

    On Fault Tolerance Methods for Networks-on-Chip

    Technology scaling has proceeded into dimensions in which the reliability of manufactured devices is becoming endangered. The reliability decrease is a consequence of physical limitations, the relative increase of variations, and decreasing noise margins, among others. A promising solution for bringing the reliability of circuits back to a desired level is the use of design methods which introduce tolerance against possible faults in an integrated circuit. This thesis studies and presents fault tolerance methods for the network-on-chip (NoC), a design paradigm targeted at very large systems-on-chip. In a NoC, resources such as processors and memories are connected to a communication network, comparable to the Internet. Fault tolerance in such a system can be achieved at many abstraction levels. The thesis studies the origin of faults in modern technologies and explains the classification into transient, intermittent, and permanent faults. A survey of fault tolerance methods is presented to demonstrate the diversity of available methods. Networks-on-chip are approached by exploring their main design choices: the selection of a topology, a routing protocol, and a flow control method. Fault tolerance methods for NoCs are studied at different layers of the OSI reference model. The data link layer provides a reliable communication link over a physical channel. Error control coding is an efficient fault tolerance method at this abstraction level, especially against transient faults. Error control coding methods suitable for on-chip communication are studied and their implementations presented. Error control coding loses its effectiveness in the presence of intermittent and permanent faults, so other solutions against them are presented. The introduction of spare wires and split transmissions is shown to provide good tolerance against intermittent and permanent errors, and their combination with error control coding is illustrated. At the network layer, positioned above the data link layer, fault tolerance can be achieved through the design of fault-tolerant network topologies and routing algorithms. Both of these approaches are presented in the thesis, together with realizations in both categories. The thesis concludes that an optimal fault tolerance solution contains carefully co-designed elements from different abstraction levels.
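
    As a concrete example of data-link-layer error control coding of the kind weighed here (my sketch, not an implementation from the thesis): a Hamming(7,4) code corrects any single transient bit flip on a link at the cost of three check wires per four data wires.

        def hamming74_encode(nibble):
            d = [(nibble >> i) & 1 for i in range(4)]    # data bits d0..d3
            p1 = d[0] ^ d[1] ^ d[3]                      # covers positions 1,3,5,7
            p2 = d[0] ^ d[2] ^ d[3]                      # covers positions 2,3,6,7
            p3 = d[1] ^ d[2] ^ d[3]                      # covers positions 4,5,6,7
            return [p1, p2, d[0], p3, d[1], d[2], d[3]]  # codeword positions 1..7

        def hamming74_decode(bits):
            bits = list(bits)
            s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
            s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
            s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
            syndrome = s1 | (s2 << 1) | (s3 << 2)        # 1-based error position
            if syndrome:
                bits[syndrome - 1] ^= 1                  # correct the single flip
            d = [bits[2], bits[4], bits[5], bits[6]]
            return d[0] | (d[1] << 1) | (d[2] << 2) | (d[3] << 3)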