
    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    This research addresses the design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied, and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 × 10⁻⁶, three orders of magnitude better than non-fault-tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10⁻⁶. The required hardware redundancy is approximately 15 times that of a non-fault-tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results.
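
    As a rough illustration of the yield-versus-redundancy trade-off such models capture, the sketch below estimates module yield under a deliberately simple assumption set: independent device failures with probability p, a module that works only if all n of its devices work, and r-way modular redundancy with a perfect majority voter. This is not the paper's NAND-multiplexing model, and all numbers are illustrative.

```python
# Minimal yield sketch (illustrative only; not the paper's model).
# Assumes independent device failures with probability p, a module of
# n devices that works only if all its devices work, and r-way
# modular redundancy behind a perfect majority voter.
from math import comb

def module_yield(p: float, n: int) -> float:
    """Probability that a single n-device module is defect-free."""
    return (1.0 - p) ** n

def redundant_yield(p: float, n: int, r: int = 3) -> float:
    """Probability that a majority of r redundant module copies work."""
    y = module_yield(p, n)
    k_min = r // 2 + 1  # votes needed for a majority
    return sum(comb(r, k) * y**k * (1 - y)**(r - k) for k in range(k_min, r + 1))

if __name__ == "__main__":
    p, n = 3e-6, 100_000  # hypothetical failure rate and module size
    print(f"plain module yield:    {module_yield(p, n):.3f}")
    print(f"3-way redundant yield: {redundant_yield(p, n):.3f}")
```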

    Choose-Your-Own Adventure: A Lightweight, High-Performance Approach To Defect And Variation Mitigation In Reconfigurable Logic

    For field-programmable gate arrays (FPGAs), fine-grained pre-computed alternative configurations, combined with simple test-based selection, produce limited per-chip specialization to counter the yield loss, increased delay, and increased energy costs that come from fabrication defects and variation. This lightweight approach achieves much of the benefit of knowledge-based full specialization while reducing the computational, testing, and load-time costs that obstruct the knowledge-based approach to practical, palatable levels. In practice, this may more than double the power-limited computational capabilities of dies fabricated with 22nm technologies. Contributions of this work (a load-time selection sketch follows the list):
    • Choose-Your-own-Adventure (CYA), a novel, lightweight, scalable methodology to achieve defect and variation mitigation
    • An implementation of CYA, including preparatory components (generation of diverse alternative paths) and FPGA load-time components
    • A detailed performance characterization of CYA:
      – comparison to conventional loading and dynamic frequency and voltage scaling (DFVS)
      – limit studies to characterize the quality of the CYA implementation and identify potential areas for further optimization
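
    The selection step can be pictured with a short sketch. Assuming a small set of precomputed alternative bitstreams and a per-chip self-test (load_bitstream and passes_tests below are hypothetical stand-ins, not CYA's actual interfaces), load-time selection reduces to trying alternatives and keeping the fastest one that passes.

```python
# Sketch of CYA-style load-time selection (illustrative; load_bitstream()
# and passes_tests() are hypothetical stand-ins, not CYA's real interfaces).
# Each chip tries a few precomputed alternative configurations and keeps
# the fastest one that passes its self-test.
from typing import Callable, Optional, Sequence

def choose_configuration(
    alternatives: Sequence[bytes],            # precomputed bitstreams
    load_bitstream: Callable[[bytes], None],  # program the FPGA
    passes_tests: Callable[[float], bool],    # self-test at a clock period
    clock_periods_ns: Sequence[float],        # candidate periods, fastest first
) -> Optional[tuple[int, float]]:
    """Return (alternative index, clock period) of the best working choice."""
    for period in clock_periods_ns:           # try aggressive timing first
        for i, bits in enumerate(alternatives):
            load_bitstream(bits)
            if passes_tests(period):
                return i, period              # first pass at this speed wins
    return None                               # no alternative works on this die
```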

    Advances in Nanowire-Based Computing Architectures

    The Case for Reconfigurable Components with Logic Scrubbing: Regular Hygiene Keeps Logic FIT (low)

    As we approach atomic-scale logic, we must accommodate an increased rate of manufacturing defects, transient upsets, and in-field persistent failures. High defect rates demand reconfiguration to avoid defective components, and transient upsets demand online error detection to catch failures. Combining these techniques, we can detect in-field persistent failures when they occur and reconfigure around them. However, since failures may be logically masked for long periods of time, persistent failures may accumulate silently; this integration of errors over time means the effective failure rate for persistent errors can exceed transient upset rates. As a result, logic scrubbing is necessary to prevent the silent accumulation of an undetectable number of persistent errors. We provide a simple analysis to illustrate quantitatively how this phenomenon can be a concern.
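
    A back-of-the-envelope version of that concern can be sketched as follows. Assuming N resources, each suffering an in-field persistent failure independently at a small hourly rate, the expected number of latent (masked, undetected) faults grows with the interval between scrub passes; the rates below are invented for illustration and are not the paper's analysis.

```python
# Back-of-the-envelope sketch (not the paper's analysis) of why masked
# persistent faults accumulate without scrubbing. Assumes N resources,
# each failing persistently and independently with probability lam per
# hour, and that a masked fault stays latent until a scrub pass finds it.

def expected_latent_faults(n_resources: int, lam: float, hours: float) -> float:
    """Expected number of persistent faults accumulated over `hours`."""
    p_faulty = 1.0 - (1.0 - lam) ** hours   # per-resource fault probability
    return n_resources * p_faulty

if __name__ == "__main__":
    N, lam = 1_000_000, 1e-9                # hypothetical resource count and rate
    for scrub_interval_h in (1.0, 24.0, 24.0 * 365):
        latent = expected_latent_faults(N, lam, scrub_interval_h)
        print(f"scrub every {scrub_interval_h:>8.0f} h -> "
              f"~{latent:.3f} latent faults at scrub time")
```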

    Amorphous Computing

    Amorphous computing is the development of organizational principles and programming languages for obtaining coherent behaviors from the cooperation of myriads of unreliable parts that are interconnected in unknown, irregular, and time-varying ways. The impetus for amorphous computing comes from developments in microfabrication and fundamental biology, each of which is the basis of a kernel technology that makes it possible to build or grow huge numbers of almost-identical information-processing units at almost no cost. This paper sets out a research agenda for realizing the potential of amorphous computing and surveys some initial progress, both in programming and in fabrication. We describe some approaches to programming amorphous systems, which are inspired by metaphors from biology and physics. We also present the basic ideas of cellular computing, an approach to constructing digital-logic circuits within living cells by representing logic levels by concentrations of DNA-binding proteins.
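
    One canonical programming primitive from the amorphous-computing literature is the hop-count gradient, in which a value floods outward from a source particle using only local communication. The sketch below simulates that idea for randomly placed particles; the parameters are invented and the code is illustrative, not a system described in the paper.

```python
# Sketch of a classic amorphous-computing primitive: a hop-count
# "gradient" flooded outward from a source particle. Particles sit at
# random positions, talk only to neighbors within a fixed radius, and
# hold no global coordinates. (Illustrative; parameters are invented.)
import random

def build_neighbors(positions, radius):
    """Neighbor lists from purely local proximity (no global addressing)."""
    nbrs = [[] for _ in positions]
    for i, (xi, yi) in enumerate(positions):
        for j, (xj, yj) in enumerate(positions):
            if i != j and (xi - xj) ** 2 + (yi - yj) ** 2 <= radius ** 2:
                nbrs[i].append(j)
    return nbrs

def gradient(n=500, radius=0.08, source=0, seed=1):
    random.seed(seed)
    pos = [(random.random(), random.random()) for _ in range(n)]
    nbrs = build_neighbors(pos, radius)
    hops = [None] * n
    hops[source] = 0
    frontier = [source]
    while frontier:                      # each round: one local broadcast
        nxt = []
        for i in frontier:
            for j in nbrs[i]:
                if hops[j] is None:      # first message wins -> hop count
                    hops[j] = hops[i] + 1
                    nxt.append(j)
        frontier = nxt
    return hops                          # approximates distance from source

if __name__ == "__main__":
    hops = gradient()
    reached = [h for h in hops if h is not None]
    print(f"reached {len(reached)}/500 particles; max hop count {max(reached)}")
```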

    Performance analysis of fault-tolerant nanoelectronic memories

    Performance growth in microelectronics, as described by Moore's law, is steadily approaching its limits. Nanoscale technologies are increasingly being explored as a practical solution to sustaining and possibly surpassing current performance trends of microelectronics. This work presents an in-depth analysis of the impact on performance of incorporating reliability schemes into the architecture of a crossbar molecular switch nanomemory and demultiplexer. Nanoelectronics are currently in their early stages, and so fabrication and design methodologies are still in the process of being studied and developed. The building blocks of nanotechnology are fabricated using bottom-up processes, which leave them highly susceptible to defects. Hence, it is very important that defect and fault-tolerant schemes be incorporated into the design of nanotechnology-related devices. In this dissertation, we focus on the study of a novel and promising class of computer chip memories called crossbar molecular switch memories and their demultiplexer addressing units. A major part of this work was the design of a defect and fault tolerance scheme we called the Multi-Switch Junction (MSJ) scheme. The MSJ scheme takes advantage of the regular array geometry of the crossbar nanomemory to create multiple switches in the fabric of the crossbar nanomemory for the storage of a single bit. Implementing defect and fault tolerant schemes comes at a performance cost to the crossbar nanomemory; the challenge becomes achieving a balance between device reliability and performance. We have studied the reliability-induced performance penalties as they relate to the time (delay) it takes to access a bit and the amount of power dissipated by the process. MSJ was also compared to the banking and error-correction-coding fault-tolerance schemes, and studies were conducted to ascertain the potential benefits of integrating our MSJ scheme with the banking scheme. Trade-off analysis between access time delay, power dissipation, and reliability is outlined and presented in this work. Results show the MSJ scheme increases the reliability of the crossbar nanomemory and demultiplexer. Simulation results also indicated that MSJ works very well for smaller nanomemory array sizes, with reliabilities of 100% for molecular-switch failure rates of 10% or less.
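
    The reliability gain from storing one bit in multiple junctions can be approximated with a simple model. The sketch below assumes independent switch failures with probability q and that a bit is readable if at least one of its k junctions works, which may not match the paper's exact read-out model; it is meant only to show why reliability improves rapidly with k at failure rates near 10%.

```python
# Rough reliability sketch for a multi-switch-per-bit scheme such as MSJ
# (illustrative; assumes independent switch failures with probability q
# and that a bit is readable if at least one of its k junctions works,
# which may not match the paper's exact read-out model).

def bit_reliability(q: float, k: int) -> float:
    """P(bit readable) when any one of k redundant junctions suffices."""
    return 1.0 - q ** k

def array_reliability(q: float, k: int, bits: int) -> float:
    """P(every bit in the array is readable), assuming independent bits."""
    return bit_reliability(q, k) ** bits

if __name__ == "__main__":
    q = 0.10                                    # 10% switch failure rate
    for k in (1, 2, 3, 4):
        r = array_reliability(q, k, bits=1024)  # small 1 Kb array
        print(f"k={k} junctions/bit -> array reliability {r:.4f}")
```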

    A survey of fault-tolerance algorithms for reconfigurable nano-crossbar arrays

    ACM Comput. Surv., Volume 50, Issue 6 (November 2017). Nano-crossbar arrays have emerged as a promising and viable technology to improve computing performance of electronic circuits beyond the limits of current CMOS. Arrays offer both structural efficiency through reconfiguration and the prospective capability of integration with different technologies. However, certain problems need to be addressed, the most important of which is the prevailing occurrence of faults. Considering fault-rate projections as high as 20%, much higher than those of CMOS, sophisticated fault-tolerance methods are to be expected. The focus of this survey article is the assessment and evaluation of these methods and the related algorithms applied in logic mapping and configuration processes. As a start, we concisely explain reconfigurable nano-crossbar arrays with their fault characteristics and models. Following that, we demonstrate configuration techniques for the arrays in the presence of permanent faults and elaborate on the two main fault-tolerance methodologies, namely defect-unaware and defect-aware approaches, with a short review of their advantages and disadvantages. For both methodologies, we present detailed experimental results of the related algorithms regarding their strengths and weaknesses, with a comprehensive yield, success rate, and runtime analysis. Next, we overview fault-tolerance approaches for transient faults. As a conclusion, we overview the proposed algorithms with future directions and upcoming challenges. This work is supported by the EU-H2020-RISE project NANOxCOMP no. 691178 and the TUBITAK-Career project no. 113E760.
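
    As a flavor of the defect-aware methodology, the toy sketch below maps logic functions onto crossbar rows while avoiding defective junctions using simple backtracking. The algorithms surveyed in the article are considerably more sophisticated; all names and the defect map here are invented for illustration.

```python
# Toy defect-aware logic mapping onto a crossbar (illustrative only).
# A function may be assigned to a physical crossbar row only if every
# column junction that function needs is defect-free in that row; an
# assignment is found by backtracking search.

def defect_aware_map(needed, defect_free):
    """needed[f]      = set of column junctions function f must program
    defect_free[r] = set of working junctions in physical row r
    Returns {function: physical row}, or None if no mapping exists."""
    funcs = sorted(needed, key=lambda f: -len(needed[f]))  # hardest first
    assignment, used_rows = {}, set()

    def place(i):
        if i == len(funcs):                 # every function placed
            return True
        f = funcs[i]
        for r, working in defect_free.items():
            if r not in used_rows and needed[f] <= working:
                assignment[f] = r
                used_rows.add(r)
                if place(i + 1):
                    return True
                used_rows.discard(r)        # backtrack
                del assignment[f]
        return False

    return assignment if place(0) else None

if __name__ == "__main__":
    # 3 functions, 4 physical rows; row "r1" has a defective junction at column 2.
    needed = {"f0": {0, 1}, "f1": {1, 2}, "f2": {0, 2}}
    defect_free = {"r0": {0, 1, 2}, "r1": {0, 1}, "r2": {0, 2}, "r3": {1, 2}}
    print(defect_aware_map(needed, defect_free))
```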