    Catastrophic Faults in Reconfigurable Linear Arrays of Processors

    In regular architectures of identical processing elements, a widely used technique to improve the reconfigurability of the system is to provide redundant processing elements together with reconfiguration mechanisms. In this paper we consider linear arrays of processing elements with unidirectional bypass links of length g, and we count particular sets of faulty processing elements, namely the catastrophic ones. We show that the number of catastrophic sets of g faults is equal to the (g-1)-th Catalan number. We also provide algorithms to rank and unrank all catastrophic sets of g faults. Finally, we describe a linear-time algorithm that generates all such sets of faults.
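    As a quick illustration of the count stated above, the following minimal sketch (Python, my own illustration rather than code from the paper) evaluates the (g-1)-th Catalan number via the standard closed form C_k = binom(2k, k)/(k+1); the paper's rank/unrank and generation algorithms are not reproduced here.

```python
from math import comb

def catalan(k: int) -> int:
    """k-th Catalan number, C_k = binom(2k, k) // (k + 1)."""
    return comb(2 * k, k) // (k + 1)

# Per the abstract, a linear array with unidirectional bypass links of
# length g has exactly C_{g-1} catastrophic fault sets of cardinality g.
for g in range(1, 9):
    print(f"g = {g}: {catalan(g - 1)} catastrophic sets of {g} faults")
```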

    Efficient reconfigurable techniques for VLSI arrays with 6-port switches

    This paper proposes efficient techniques to reconfigure a two-dimensional degradable very large scale integration/wafer scale integration (VLSI/WSI) array under the row and column routing constraints, a problem that has been shown to be NP-complete. The proposed VLSI/WSI array consists of identical processing elements, such as processors or memory cells, embedded in a 6-port switch lattice in the form of a rectangular grid. It is shown that the proposed VLSI structure with 6-port switches eliminates the need to incorporate an internal bypass within processing elements and leads to a notable increase in harvest compared with the structure using 4-port switches. A new greedy rerouting algorithm and compensation approaches are also proposed to maximize harvest through reconfiguration. Experimental results show that the proposed VLSI array with 6-port switches consistently outperforms the most efficient alternative proposed in the literature at maximizing harvest in the presence of faulty processing elements.
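    To give intuition for greedy rerouting in degradable arrays, here is a hedged sketch of a generic greedy column-rerouting heuristic: logical columns are built left to right, and in each row the leftmost unused fault-free PE within a small column shift of the previous row's pick is chosen. This illustrates the general idea only, not the paper's 6-port-switch algorithm, and the shift bound `max_shift` is an assumed routing constraint.

```python
def greedy_column_reroute(fault_free, max_shift=1):
    """fault_free[r][c] is True when the PE at physical position (r, c) works.
    Returns a list of logical columns, each a list of (row, col) assignments."""
    rows, cols = len(fault_free), len(fault_free[0])
    used = [[False] * cols for _ in range(rows)]
    logical_columns = []
    while True:
        column, prev_c = [], None
        for r in range(rows):
            pick = None
            for c in range(cols):
                if (fault_free[r][c] and not used[r][c]
                        and (prev_c is None or abs(c - prev_c) <= max_shift)):
                    pick = c
                    break
            if pick is None:              # row r cannot be covered: stop here
                return logical_columns
            column.append((r, pick))
            prev_c = pick
        for r, c in column:               # commit the completed logical column
            used[r][c] = True
        logical_columns.append(column)

# False marks a faulty PE; the heuristic harvests three logical columns here.
grid = [[True, True, False, True],
        [True, False, True, True],
        [True, True, True, False]]
print(greedy_column_reroute(grid))
```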

    On Fault Tolerance Methods for Networks-on-Chip

    Technology scaling has proceeded into dimensions in which the reliability of manufactured devices is becoming endangered. The reliability decrease is a consequence of physical limitations, the relative increase of variations, and decreasing noise margins, among others. A promising solution for bringing the reliability of circuits back to a desired level is the use of design methods which introduce tolerance against possible faults in an integrated circuit. This thesis studies and presents fault tolerance methods for the network-on-chip (NoC), a design paradigm targeted at very large systems-on-chip. In a NoC, resources such as processors and memories are connected to a communication network, comparable to the Internet. Fault tolerance in such a system can be achieved at many abstraction levels. The thesis studies the origin of faults in modern technologies and explains their classification into transient, intermittent, and permanent faults. A survey of fault tolerance methods is presented to demonstrate the diversity of available methods. Networks-on-chip are approached by exploring their main design choices: the selection of a topology, routing protocol, and flow control method. Fault tolerance methods for NoCs are studied at different layers of the OSI reference model. The data link layer provides a reliable communication link over a physical channel; error control coding is an efficient fault tolerance method at this abstraction level, especially against transient faults. Error control coding methods suitable for on-chip communication are studied and their implementations presented. Error control coding loses its effectiveness in the presence of intermittent and permanent faults, so other solutions against them are presented: the introduction of spare wires and split transmissions is shown to provide good tolerance against intermittent and permanent errors, and their combination with error control coding is illustrated. At the network layer, positioned above the data link layer, fault tolerance can be achieved through the design of fault-tolerant network topologies and routing algorithms. Both of these approaches are presented in the thesis, together with realizations in both categories. The thesis concludes that an optimal fault tolerance solution contains carefully co-designed elements from different abstraction levels.
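    To make the data-link-layer point concrete, the sketch below applies a standard Hamming(7,4) single-error-correcting code to a 4-bit flit slice, the kind of error control coding that can mask a transient bit flip on a link. It is an illustrative example of the technique, not the specific coders implemented in the thesis.

```python
def hamming74_encode(d):
    """d: list of 4 data bits -> 7 code bits ordered [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """c: 7 received bits -> (corrected 4 data bits, error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity over positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # non-zero syndrome = 1-based error position
    if syndrome:
        c[syndrome - 1] ^= 1         # correct the single flipped bit
    return [c[2], c[4], c[5], c[6]], syndrome

# A transient fault flips one wire of the link; the decoder recovers the flit.
sent = hamming74_encode([1, 0, 1, 1])
received = sent.copy()
received[5] ^= 1
data, err = hamming74_decode(received)
assert data == [1, 0, 1, 1] and err == 6
```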

    Memory built-in self-repair and correction for improving yield: a review

    Nanometer memories are highly prone to defects due to their dense structure, making memory built-in self-repair a must-have feature for improving yield. Today's systems-on-chip contain memories occupying as much as 90% of the chip area. Shrinking technology uses stricter design rules for memories, making them more prone to manufacturing defects. Further, the use of 3D-stacked memories makes the system vulnerable to newer defects such as those originating in through-silicon vias (TSVs) and micro-bumps. The increased memory size is also resulting in an increase in soft errors during system operation. Multiple memory repair techniques based on redundancy and correction codes have been presented to recover from such defects and prevent system failures. This paper reviews recently published memory repair methodologies, including various built-in self-repair (BISR) architectures, repair analysis algorithms, in-system repair, and soft error handling using error correcting codes (ECC). It provides a classification of these techniques based on method and usage. Finally, it reviews the evaluation methods used to determine the effectiveness of the repair algorithms. The paper aims to present a survey of these methodologies and to prepare a platform for developing repair methods for upcoming generations of memories.
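    As a minimal illustration of redundancy-based repair analysis, the sketch below allocates spare rows and columns for a faulty memory block using a must-repair pass followed by a greedy pass. It is a textbook-style heuristic written for intuition, not one of the published BISR repair-analysis algorithms reviewed in the paper.

```python
def analyze_repair(faults, spare_rows, spare_cols):
    """faults: set of (row, col) faulty-cell addresses.
    Returns (rows_to_replace, cols_to_replace) or None if unrepairable."""
    faults = set(faults)
    repair_rows, repair_cols = set(), set()

    # Must-repair pass: a row with more faults than the spare columns still
    # available can only be saved by a spare row (and symmetrically).
    changed = True
    while changed:
        changed = False
        for r in {r for r, _ in faults}:
            if sum(rr == r for rr, _ in faults) > spare_cols - len(repair_cols):
                if len(repair_rows) == spare_rows:
                    return None
                repair_rows.add(r)
                faults = {f for f in faults if f[0] != r}
                changed = True
        for c in {c for _, c in faults}:
            if sum(cc == c for _, cc in faults) > spare_rows - len(repair_rows):
                if len(repair_cols) == spare_cols:
                    return None
                repair_cols.add(c)
                faults = {f for f in faults if f[1] != c}
                changed = True

    # Greedy pass: repair whichever remaining row or column covers the most
    # faulty cells, as long as the matching kind of spare is still available.
    while faults:
        rows_left = spare_rows - len(repair_rows)
        cols_left = spare_cols - len(repair_cols)
        if rows_left == 0 and cols_left == 0:
            return None
        best_row = max({r for r, _ in faults},
                       key=lambda r: sum(rr == r for rr, _ in faults))
        best_col = max({c for _, c in faults},
                       key=lambda c: sum(cc == c for _, cc in faults))
        row_hits = sum(rr == best_row for rr, _ in faults)
        col_hits = sum(cc == best_col for _, cc in faults)
        if rows_left and (row_hits >= col_hits or cols_left == 0):
            repair_rows.add(best_row)
            faults = {f for f in faults if f[0] != best_row}
        else:
            repair_cols.add(best_col)
            faults = {f for f in faults if f[1] != best_col}
    return repair_rows, repair_cols

# Example: a block with 2 spare rows, 2 spare columns, and five faulty cells.
print(analyze_repair({(0, 0), (0, 3), (0, 5), (2, 1), (4, 1)},
                     spare_rows=2, spare_cols=2))
```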

    Reconfigurable architecture for very large scale microelectronic systems


    Control and reliability of optical networks in multiprocessors

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1993. Includes bibliographical references (leaves 138-142). By James Jonathan Olsen.

    Fault Secure Encoder and Decoder for NanoMemory Applications

    Memory cells have been protected from soft errors for more than a decade; however, due to the increase in the soft error rate of logic circuits, the encoder and decoder circuitry around the memory blocks has become susceptible to soft errors as well and must also be protected. We introduce a new approach to design fault-secure encoder and decoder circuitry for memory designs. The key novel contribution of this paper is identifying and defining a new class of error-correcting codes whose redundancy makes the design of fault-secure detectors (FSD) particularly simple. We further quantify the importance of protecting encoder and decoder circuitry against transient errors, illustrating a scenario where the system failure rate (FIT) is dominated by the failure rate of the encoder and decoder. We prove that Euclidean geometry low-density parity-check (EG-LDPC) codes have the fault-secure detector capability. Using some of the smaller EG-LDPC codes, we can tolerate bit or nanowire defect rates of 10% and fault rates of 10^(-18) upsets/device/cycle, achieving a FIT rate at or below one for the entire memory system and a memory density of 10^(11) bit/cm^2 with a nanowire pitch of 10 nm for memory blocks of 10 Mb or larger. Larger EG-LDPC codes can achieve even higher reliability and lower area overhead.
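    The heart of a fault-secure detector is a syndrome check over GF(2): a retrieved word is accepted only when it satisfies every parity check of the code. The sketch below shows that check with a toy (7,4) parity-check matrix chosen purely for illustration; the EG-LDPC matrices analyzed in the paper are not reconstructed here.

```python
# Toy parity-check matrix H for a (7,4) code: rows are parity checks,
# columns index the stored codeword bits (not an EG-LDPC matrix).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(word, checks=H):
    """One GF(2) check bit per row of the parity-check matrix."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in checks]

def detector_flags_error(word):
    """A fault-secure detector accepts the word only if every check is zero."""
    return any(syndrome(word))

codeword = [0, 1, 1, 0, 0, 1, 1]   # a valid codeword of the toy code
assert not detector_flags_error(codeword)

corrupted = codeword.copy()
corrupted[2] ^= 1                   # a single upset in a memory cell or nanowire
assert detector_flags_error(corrupted)
```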