
    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    This research addresses the design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied, and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 × 10⁻⁶, three orders of magnitude better than non-fault-tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10⁻⁶. The required hardware redundancy is approximately 15 times that of a non-fault-tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND multiplexing. It is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results.
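
    The yield figures above follow from standard redundancy arithmetic. As a hedged illustration only (the thesis derives far more detailed CAM-cache and NAND-multiplexing models), the sketch below assumes independent device failures, a hypothetical block size n_devices, and r interchangeable copies of which any one suffices.

```python
# Illustrative only (not the thesis's yield model): probability that a block
# of n_devices independent devices, each failing with probability p_fail,
# is usable when r interchangeable copies exist and any one working copy
# suffices.

def block_yield(n_devices: int, p_fail: float, r: int = 1) -> float:
    p_copy_ok = (1.0 - p_fail) ** n_devices   # a single copy is defect-free
    return 1.0 - (1.0 - p_copy_ok) ** r       # at least one of the r copies works

# Rough orders of magnitude only.
print(block_yield(n_devices=100_000, p_fail=3e-6, r=1))   # ~0.74 without sparing
print(block_yield(n_devices=100_000, p_fail=3e-6, r=4))   # ~0.995 with 4 copies
```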

    Advances in Nanowire-Based Computing Architectures


    Low-overhead fault-tolerant logic for field-programmable gate arrays

    While allowing for the fabrication of increasingly complex and efficient circuitry, transistor shrinkage and count-per-device expansion have major downsides: chiefly increased variation, degradation and fault susceptibility. For this reason, design-time consideration of faults will have to be given to increasing numbers of electronic systems in the future to ensure that yields, reliabilities and lifetimes remain acceptably high. Many mathematical operators commonly accelerated in hardware are suited to modifications that provide datapath error detection and correction capabilities with far lower area, performance and/or power consumption overheads than those incurred through more established, general-purpose fault tolerance methods such as modular redundancy. Field-programmable gate arrays are uniquely placed to allow further area savings to be made thanks to their dynamic reconfigurability. The majority of the technical work presented within this thesis is based upon a benchmark hardware accelerator, a matrix multiplier, that underwent several evolutions in order to detect and correct faults manifesting along its datapath at runtime. In the first instance, fault detectability in excess of 99% was achieved in return for 7.87% additional area and 45.5% extra latency. In the second, the ability to correct errors caused by those faults was added at the cost of 4.20% more area, while 50.7% of this overhead, and 46.2% of the previously incurred latency overhead, was removed through the introduction of partial reconfiguration in the third. The fourth demonstrates further reductions in both area and performance overheads, of 16.7% and 8.27% respectively, through systematic data width reduction, achieved by allowing errors of less than ±0.5% of the maximum output value to propagate.
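
    The abstract does not spell out how the multiplier's datapath checks are built, so the following is only a hedged sketch of one common low-overhead approach, checksum-based (ABFT-style) error detection; the function name checked_matmul and the tolerance tol are illustrative, and the thesis's actual hardware scheme may differ.

```python
# Hedged sketch of checksum-style (ABFT-like) datapath error detection for a
# matrix multiplier; the thesis's actual hardware scheme may differ.
import numpy as np

def checked_matmul(a: np.ndarray, b: np.ndarray, tol: float = 1e-6):
    c = a @ b
    # Column checksum invariant: (1^T A) B must equal 1^T C.
    expected = a.sum(axis=0) @ b
    observed = c.sum(axis=0)
    faulty_cols = np.where(np.abs(expected - observed) > tol)[0]
    return c, faulty_cols          # non-empty faulty_cols flags a datapath error

a, b = np.random.rand(8, 8), np.random.rand(8, 8)
c, bad = checked_matmul(a, b)
assert bad.size == 0               # a fault-free multiplication passes the check
```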

    Yield Enhancement of Digital Microfluidics-Based Biochips Using Space Redundancy and Local Reconfiguration

    As microfluidics-based biochips become more complex, manufacturing yield will have a significant influence on production volume and product cost. We propose an interstitial redundancy approach to enhance the yield of biochips that are based on droplet-based microfluidics. In this design method, spare cells are placed at the interstitial sites within the microfluidic array, and they replace neighboring faulty cells via local reconfiguration. The proposed design method is evaluated using a set of concurrent real-life bioassays.
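
    As a rough illustration of the local-reconfiguration idea (not the paper's actual placement or replacement algorithm), the hypothetical helper below looks for an unused spare at an interstitial site next to a faulty primary cell; the coordinate scheme and neighbourhood definition are assumptions.

```python
# Illustrative sketch (not the paper's algorithm): replace a faulty primary
# cell with an unused spare at an interstitial site in its local neighbourhood.

def find_local_spare(faulty_cell, spare_sites, used_spares):
    fx, fy = faulty_cell
    # Assumption: spares sit at the four interstitial sites diagonally
    # adjacent to each primary cell of the microfluidic array.
    candidates = [(fx + dx, fy + dy) for dx in (-0.5, 0.5) for dy in (-0.5, 0.5)]
    for site in candidates:
        if site in spare_sites and site not in used_spares:
            used_spares.add(site)
            return site            # droplet routes are remapped onto this spare
    return None                    # no local spare left; reconfiguration fails

spares = {(0.5, 0.5), (1.5, 0.5)}
print(find_local_spare((1, 1), spares, used_spares=set()))   # -> (0.5, 0.5)
```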

    Fault-Tolerant FPGA for Mission-Critical Applications

    Field-Programmable Gate Arrays (FPGAs) play a major role in electronic circuit design, particularly in safety-critical applications, because of their high performance, reconfigurability and low development cost. FPGAs are used in many applications such as data processing, networking, automotive, space and industrial systems. Moving to smaller feature sizes in the latest FPGA architectures has a negative impact on the reliability of such applications, which increases the need for fault-tolerant techniques that improve reliability and extend the lifetime of FPGA-based systems. In this thesis, two fault-tolerant techniques for FPGA-based applications are proposed, each with a built-in fault detection region. A low-cost fault detection scheme is proposed that uses this region, shared by both techniques, to detect faults. The scheme primarily detects open faults in the programmable interconnect resources of the FPGA; stuck-at faults and single-event upsets (SEUs) can also be detected. For fault recovery, each technique has its own approach. The first uses a spare module and a 2-to-1 multiplexer to recover from any detected fault. The second recovers from any detected fault using the FPGA's partial reconfiguration (PR) capability: it relies on identifying a partially reconfigurable block (P_b) in the FPGA that is used in the recovery process once the first faulty module in the system is identified. This technique uses only one location to recover from faults in any of the FPGA's modules and in the FPGA interconnect. Simulation results show that both techniques can detect and recover from open faults, and can also detect stuck-at faults and SEUs. Finally, both techniques require low area overhead.
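
    A minimal behavioural sketch of the first recovery approach, assuming the fault-detection region simply drives the select line of the 2-to-1 multiplexer once a fault is flagged; the function and signal names below are illustrative, not taken from the thesis.

```python
# Minimal behavioural sketch of the spare-module/2-to-1-multiplexer recovery;
# names and the stand-in module function are illustrative, not from the thesis.

def module_output(x, faulty=False):
    return None if faulty else 2 * x          # stand-in for the real module logic

def mux2(sel, in0, in1):
    return in1 if sel else in0

def run(x, fault_detected):
    primary = module_output(x, faulty=fault_detected)
    spare = module_output(x)
    # The fault-detection region drives the multiplexer select line.
    return mux2(fault_detected, primary, spare)

assert run(3, fault_detected=False) == 6
assert run(3, fault_detected=True) == 6       # the spare takes over transparently
```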

    Seven strategies for tolerating highly defective fabrication

    In this article, we present an architecture that supports fine-grained sparing and resource matching. The base logic structure is a set of interconnected PLAs. The PLAs and their interconnections consist of large arrays of interchangeable nanowires, which serve as programmable product and sum terms and as programmable interconnect links. Each nanowire can have several defective programmable junctions. We can test nanowires for functionality and use only the subset that provides appropriate conductivity and electrical characteristics. We then perform a matching between nanowire junction programmability and application logic needs to use almost all the nanowires, even though most of them have defective junctions. We employ seven high-level strategies to achieve this level of defect tolerance.
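
    A hedged sketch of the matching step described above: each product term is assigned to a nanowire whose working junctions cover the inputs that term must program. The greedy assignment and the data layout are illustrative only; the article's seven strategies go well beyond this.

```python
# Hedged sketch of the matching step: assign each product term to a nanowire
# whose working (non-defective) junctions cover the inputs that term needs.
# Greedy assignment only; data layout and names are illustrative.

def match_terms_to_wires(terms, wire_ok_junctions):
    assignment, free_wires = {}, set(wire_ok_junctions)
    for term, needed in terms.items():
        for wire in sorted(free_wires):
            if needed <= wire_ok_junctions[wire]:   # every needed junction works
                assignment[term] = wire
                free_wires.remove(wire)
                break
    return assignment

terms = {"t0": {0, 2}, "t1": {1, 3}}
wires = {"w0": {0, 1, 2}, "w1": {1, 2, 3}, "w2": {0, 3}}
print(match_terms_to_wires(terms, wires))   # {'t0': 'w0', 't1': 'w1'}
```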

    Precise delay measurement through combinatorial logic

    A high-resolution circuit and method for facilitating precise measurement of on-chip delays in FPGAs for reliability studies. The circuit embeds a pulse generator on an FPGA chip having one or more groups of LUTs (the "LUT delay chain"), also on-chip. The circuit also embeds a pulse width measurement circuit on-chip and measures the duration of the generated pulse through the delay chain. The pulse width of the output pulse represents the delay through the delay chain without any I/O delay. The pulse width measurement circuit uses an additional asynchronous clock autonomous from the main clock, and the FPGA propagation delay can be displayed on a hex display continuously for testing purposes.
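
    As a back-of-the-envelope illustration of how the measured pulse width maps back to a chain delay (the clock frequency and cycle count below are assumed values, not taken from the patent):

```python
# Back-of-the-envelope sketch: the counted cycles of the asynchronous clock
# during the output pulse give the delay through the LUT delay chain.
# The frequency and count below are assumed values, not from the patent.

def delay_from_count(cycles_counted: int, async_clk_hz: float) -> float:
    # Pulse width equals the chain delay (no I/O delay), quantised to one period.
    return cycles_counted / async_clk_hz

print(delay_from_count(37, 400e6) * 1e9, "ns")   # -> 92.5 ns through the chain
```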

    Fault Tolerance in Programmable Metasurfaces: The Beam Steering Case

    Metasurfaces, the two-dimensional counterpart of metamaterials, have attracted great attention thanks to their powerful control over electromagnetic waves. Recent times have seen the emergence of a variety of metasurfaces exhibiting not only countless functionalities, but also a reconfigurable or even programmable response. Reconfigurability, however, entails the integration of tuning and control circuits within the metasurface structure and, as this new paradigm moves forward, new reliability challenges may arise. This paper examines, for the first time, the reliability problem in programmable metamaterials by proposing an error model and a general methodology for error analysis. To derive the error model, the causes and potential impact of faults are identified and discussed qualitatively. The methodology is presented and instantiated for beam steering, which constitutes a relevant example for programmable metasurfaces. Results show that performance degradation depends on the type of error and its spatial distribution and that, in beam steering, error rates over 10% can still be considered acceptable.
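
    To make the kind of analysis described above concrete, here is a hedged numerical sketch, not the paper's error model: a one-dimensional phase-gradient array steered to 30°, in which roughly 10% of unit cells are assumed stuck at zero phase, and the normalised array-factor peak is compared against the fault-free case. All parameters and the failure mode are illustrative assumptions.

```python
# Hedged numerical sketch (not the paper's model): a 1-D phase-gradient array
# steered to 30 degrees, with ~10% of unit cells assumed stuck at zero phase.
import numpy as np

N, d, theta0 = 40, 0.5, 30.0                  # elements, spacing in wavelengths, steer angle
k = 2 * np.pi                                 # wavenumber in wavelength units
ideal_phase = -k * d * np.arange(N) * np.sin(np.radians(theta0))

def peak_gain(phases):
    theta = np.radians(np.linspace(-90, 90, 1801))
    af = np.exp(1j * (k * d * np.arange(N)[:, None] * np.sin(theta) + phases[:, None]))
    return np.abs(af.sum(axis=0)).max() / N   # normalised array-factor peak

rng = np.random.default_rng(0)
faulty_phase = ideal_phase.copy()
stuck = rng.random(N) < 0.10                  # assumed error: cells fail to reconfigure
faulty_phase[stuck] = 0.0                     # and stay at zero phase
print(peak_gain(ideal_phase), peak_gain(faulty_phase))   # fault-free vs degraded peak
```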