427 research outputs found

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    Get PDF
    This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

    Using Fine Grain Approaches for highly reliable Design of FPGA-based Systems in Space

    Get PDF
    Nowadays using SRAM based FPGAs in space missions is increasingly considered due to their flexibility and reprogrammability. A challenge is the devices sensitivity to radiation effects that increased with modern architectures due to smaller CMOS structures. This work proposes fault tolerance methodologies, that are based on a fine grain view to modern reconfigurable architectures. The focus is on SEU mitigation challenges in SRAM based FPGAs which can result in crucial situations

    UA2TPG: An untestability analyzer and test pattern generator for SEUs in the configuration memory of SRAM-based FPGAs

    Get PDF
    This paper presents UA2TPG, a static analysis tool for the untestability proof and automatic test pattern generation for SEUs in the configuration memory of SRAM-based FPGA systems. The tool is based on the model-checking verification technique. An accurate fault model for both logic components and routing structures is adopted. Experimental results show that many circuits have a significant number of untestable faults, and their detection enables more efficient test pattern generation and on-line testing. The tool is mainly intended to support on-line testing of critical components in FPGA fault-tolerant systems

    Self-healing concepts involving fine-grained redundancy for electronic systems

    Get PDF
    The start of the digital revolution came through the metal-oxide-semiconductor field-effect transistor (MOSFET) in 1959 followed by massive integration onto a silicon die by means of constant down scaling of individual components. Digital systems for certain applications require fault-tolerance against faults caused by temporary or permanent influence. The most widely used technique is triple module redundancy (TMR) in conjunction with a majority voter, which is regarded as a passive fault mitigation strategy. Design by functional resilience has been applied to circuit structures for increased fault-tolerance and towards self-diagnostic triggered self-healing. The focus of this thesis is therefore to develop new design strategies for fault detection and mitigation within transistor, gate and cell design levels. The research described in this thesis makes three contributions. The first contribution is based on adding fine-grained transistor level redundancy to logic gates in order to accomplish stuck-at fault-tolerance. The objective is to realise maximum fault-masking for a logic gate with minimal added redundant transistors. In the case of non-maskable stuck-at faults, the gate structure generates an intrinsic indication signal that is suitable for autonomous self-healing functions. As a result, logic circuitry utilising this design is now able to differentiate between gate faults and faults occurring in inter-gate connections. This distinction between fault-types can then be used for triggering selective self-healing responses. The second contribution is a logic matrix element which applies the three core redundancy concepts of spatial- temporal- and data-redundancy. This logic structure is composed of quad-modular redundant structures and is capable of selective fault-masking and localisation depending of fault-type at the cell level, which is referred to as a spatiotemporal quadded logic cell (QLC) structure. This QLC structure has the capability of cellular self-healing. Through the combination of fault-tolerant and masking logic features the QLC is designed with a fault-behaviour that is equal to existing quadded logic designs using only 33.3% of the equivalent transistor resources. The inherent self-diagnosing feature of QLC is capable of identifying individual faulty cells and can trigger self-healing features. The final contribution is focused on the conversion of finite state machines (FSM) into memory to achieve better state transition timing, minimal memory utilisation and fault protection compared to common FSM designs. A novel implementation based on content-addressable type memory (CAM) is used to achieve this. The FSM is further enhanced by creating the design out of logic gates of the first contribution by achieving stuck-at fault resilience. Applying cross-data parity checking, the FSM becomes equipped with single bit fault detection and correction

    X-Rel: Energy-Efficient and Low-Overhead Approximate Reliability Framework for Error-Tolerant Applications Deployed in Critical Systems

    Full text link
    Triple Modular Redundancy (TMR) is one of the most common techniques in fault-tolerant systems, in which the output is determined by a majority voter. However, the design diversity of replicated modules and/or soft errors that are more likely to happen in the nanoscale era may affect the majority voting scheme. Besides, the significant overheads of the TMR scheme may limit its usage in energy consumption and area-constrained critical systems. However, for most inherently error-resilient applications such as image processing and vision deployed in critical systems (like autonomous vehicles and robotics), achieving a given level of reliability has more priority than precise results. Therefore, these applications can benefit from the approximate computing paradigm to achieve higher energy efficiency and a lower area. This paper proposes an energy-efficient approximate reliability (X-Rel) framework to overcome the aforementioned challenges of the TMR systems and get the full potential of approximate computing without sacrificing the desired reliability constraint and output quality. The X-Rel framework relies on relaxing the precision of the voter based on a systematical error bounding method that leverages user-defined quality and reliability constraints. Afterward, the size of the achieved voter is used to approximate the TMR modules such that the overall area and energy consumption are minimized. The effectiveness of employing the proposed X-Rel technique in a TMR structure, for different quality constraints as well as with various reliability bounds are evaluated in a 15-nm FinFET technology. The results of the X-Rel voter show delay, area, and energy consumption reductions of up to 86%, 87%, and 98%, respectively, when compared to those of the state-of-the-art approximate TMR voters.Comment: This paper has been published in IEEE Transactions on Very Large Scale Integration (VLSI) System

    Analysis and Test of the Effects of Single Event Upsets Affecting the Configuration Memory of SRAM-based FPGAs

    Get PDF
    SRAM-based FPGAs are increasingly relevant in a growing number of safety-critical application fields, ranging from automotive to aerospace. These application fields are characterized by a harsh radiation environment that can cause the occurrence of Single Event Upsets (SEUs) in digital devices. These faults have particularly adverse effects on SRAM-based FPGA systems because not only can they temporarily affect the behaviour of the system by changing the contents of flip-flops or memories, but they can also permanently change the functionality implemented by the system itself, by changing the content of the configuration memory. Designing safety-critical applications requires accurate methodologies to evaluate the systemā€™s sensitivity to SEUs as early as possible during the design process. Moreover it is necessary to detect the occurrence of SEUs during the system life-time. To this purpose test patterns should be generated during the design process, and then applied to the inputs of the system during its operation. In this thesis we propose a set of software tools that could be used by designers of SRAM-based FPGA safety-critical applications to assess the sensitivity to SEUs of the system and to generate test patterns for in-service testing. The main feature of these tools is that they implement a model of SEUs affecting the configuration bits controlling the logic and routing resources of an FPGA device that has been demonstrated to be much more accurate than the classical stuck-at and open/short models, that are commonly used in the analysis of faults in digital devices. By keeping this accurate fault model into account, the proposed tools are more accurate than similar academic and commercial tools today available for the analysis of faults in digital circuits, that do not take into account the features of the FPGA technology.. In particular three tools have been designed and developed: (i) ASSESS: Accurate Simulator of SEuS affecting the configuration memory of SRAM-based FPGAs, a simulator of SEUs affecting the configuration memory of an SRAM-based FPGA system for the early assessment of the sensitivity to SEUs; (ii) UA2TPG: Untestability Analyzer and Automatic Test Pattern Generator for SEUs Affecting the Configuration Memory of SRAM-based FPGAs, a static analysis tool for the identification of the untestable SEUs and for the automatic generation of test patterns for in-service testing of the 100% of the testable SEUs; and (iii) GABES: Genetic Algorithm Based Environment for SEU Testing in SRAM-FPGAs, a Genetic Algorithm-based Environment for the generation of an optimized set of test patterns for in-service testing of SEUs. The proposed tools have been applied to some circuits from the ITCā€™99 benchmark. The results obtained from these experiments have been compared with results obtained by similar experiments in which we considered the stuck-at fault model, instead of the more accurate model for SEUs. From the comparison of these experiments we have been able to verify that the proposed software tools are actually more accurate than similar tools today available. In particular the comparison between results obtained using ASSESS with those obtained by fault injection has shown that the proposed fault simulator has an average error of 0:1% and a maximum error of 0:5%, while using a stuck-at fault simulator the average error with respect of the fault injection experiment has been 15:1% with a maximum error of 56:2%. Similarly the comparison between the results obtained using UA2TPG for the accurate SEU model, with the results obtained for stuck-at faults has shown an average difference of untestability of 7:9% with a maximum of 37:4%. Finally the comparison between fault coverages obtained by test patterns generated for the accurate model of SEUs and the fault coverages obtained by test pattern designed for stuck-at faults, shows that the former detect the 100% of the testable faults, while the latter reach an average fault coverage of 78:9%, with a minimum of 54% and a maximum of 93:16%

    Advances in Nanowire-Based Computing Architectures

    Get PDF

    Delay Measurements and Self Characterisation on FPGAs

    No full text
    This thesis examines new timing measurement methods for self delay characterisation of Field-Programmable Gate Arrays (FPGAs) components and delay measurement of complex circuits on FPGAs. Two novel measurement techniques based on analysis of a circuit's output failure rate and transition probability is proposed for accurate, precise and efficient measurement of propagation delays. The transition probability based method is especially attractive, since it requires no modifications in the circuit-under-test and requires little hardware resources, making it an ideal method for physical delay analysis of FPGA circuits. The relentless advancements in process technology has led to smaller and denser transistors in integrated circuits. While FPGA users benefit from this in terms of increased hardware resources for more complex designs, the actual productivity with FPGA in terms of timing performance (operating frequency, latency and throughput) has lagged behind the potential improvements from the improved technology due to delay variability in FPGA components and the inaccuracy of timing models used in FPGA timing analysis. The ability to measure delay of any arbitrary circuit on FPGA offers many opportunities for on-chip characterisation and physical timing analysis, allowing delay variability to be accurately tracked and variation-aware optimisations to be developed, reducing the productivity gap observed in today's FPGA designs. The measurement techniques are developed into complete self measurement and characterisation platforms in this thesis, demonstrating their practical uses in actual FPGA hardware for cross-chip delay characterisation and accurate delay measurement of both complex combinatorial and sequential circuits, further reinforcing their positions in solving the delay variability problem in FPGAs
    • ā€¦
    corecore