1,250 research outputs found

    An adaptive method to tolerate soft errors in SRAM-based FPGAs

    Get PDF
    AbstractIn this paper, we present an adaptive method that is a combination of SEU-avoidance in CAD flow and adaptive redundancy to tolerate soft error effects in SRAM-based FPGAs. This method is based on the modification of T-VPack and VPR tools. Three different steps of these tools are modified for SEU-awareness: (1) clustering, (2) placement and (3) routing. Then we use the unused resources as redundancy. We have investigated the effect of this method on several MCNC benchmarks. This investigation has been performed using three experiments: (1) SEU-awareness in clustering with redundancy, (2) SEU-awareness in clustering and placement with redundancy and (3) SEU-awareness in clustering, placement and routing with redundancy. With a confidence level of 95%, the results show that, using each of these three experiments, the system failure rate of ten MCNC circuits has been decreased between 4.52% and 10.42%, between 10.25% and 21.63%, and between 10.48% and 24.39%, respectively

    Single event upset hardened embedded domain specific reconfigurable architecture

    Get PDF

    Using Fine Grain Approaches for highly reliable Design of FPGA-based Systems in Space

    Get PDF
    Nowadays using SRAM based FPGAs in space missions is increasingly considered due to their flexibility and reprogrammability. A challenge is the devices sensitivity to radiation effects that increased with modern architectures due to smaller CMOS structures. This work proposes fault tolerance methodologies, that are based on a fine grain view to modern reconfigurable architectures. The focus is on SEU mitigation challenges in SRAM based FPGAs which can result in crucial situations

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    Get PDF
    This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

    Self-healing concepts involving fine-grained redundancy for electronic systems

    Get PDF
    The start of the digital revolution came through the metal-oxide-semiconductor field-effect transistor (MOSFET) in 1959 followed by massive integration onto a silicon die by means of constant down scaling of individual components. Digital systems for certain applications require fault-tolerance against faults caused by temporary or permanent influence. The most widely used technique is triple module redundancy (TMR) in conjunction with a majority voter, which is regarded as a passive fault mitigation strategy. Design by functional resilience has been applied to circuit structures for increased fault-tolerance and towards self-diagnostic triggered self-healing. The focus of this thesis is therefore to develop new design strategies for fault detection and mitigation within transistor, gate and cell design levels. The research described in this thesis makes three contributions. The first contribution is based on adding fine-grained transistor level redundancy to logic gates in order to accomplish stuck-at fault-tolerance. The objective is to realise maximum fault-masking for a logic gate with minimal added redundant transistors. In the case of non-maskable stuck-at faults, the gate structure generates an intrinsic indication signal that is suitable for autonomous self-healing functions. As a result, logic circuitry utilising this design is now able to differentiate between gate faults and faults occurring in inter-gate connections. This distinction between fault-types can then be used for triggering selective self-healing responses. The second contribution is a logic matrix element which applies the three core redundancy concepts of spatial- temporal- and data-redundancy. This logic structure is composed of quad-modular redundant structures and is capable of selective fault-masking and localisation depending of fault-type at the cell level, which is referred to as a spatiotemporal quadded logic cell (QLC) structure. This QLC structure has the capability of cellular self-healing. Through the combination of fault-tolerant and masking logic features the QLC is designed with a fault-behaviour that is equal to existing quadded logic designs using only 33.3% of the equivalent transistor resources. The inherent self-diagnosing feature of QLC is capable of identifying individual faulty cells and can trigger self-healing features. The final contribution is focused on the conversion of finite state machines (FSM) into memory to achieve better state transition timing, minimal memory utilisation and fault protection compared to common FSM designs. A novel implementation based on content-addressable type memory (CAM) is used to achieve this. The FSM is further enhanced by creating the design out of logic gates of the first contribution by achieving stuck-at fault resilience. Applying cross-data parity checking, the FSM becomes equipped with single bit fault detection and correction

    Analysis of Radiation-induced Cross Domain Errors in TMR Architectures on SRAM-based FPGAs

    Get PDF
    SRAM-Based FPGAs represent a low-cost alternative to ASIC device thanks to their high performance and design flexibility. In particular, for aerospace and avionics application fields, SRAM-based FPGAs are increasingly adopted for their configurability features making them a viable solution for long-time applications. However, these fields are characterized by a radiation environment that makes the technology extremely sensitive to radiation-induced Single Event Upsets (SEUs) in the SRAM-based FPGA’s configuration memory. Configuration scrubbing and Triple Modular Redundancy (TMR) have been widely adopted in order to cope with SEU effects. However, modern FPGA devices are characterized by a heterogeneous routing resource distribution and a complex configuration memory mapping causing an increasing sensitivity to Cross Domain Errors affecting the TMR structure. In this paper we developed a new methodology to calculate the reliability of TMR architecture considering the intrinsic characteristics of the new generation of SRAM-based FPGAs. The method includes the analysis of the configuration bit sharing phenomena and of the routing long lines. We experimentally evaluate the method of various benchmark circuits evaluating the Mean Upset To Failure (MUTF). Finally, we used the results of the developed method to implement an improved design achieving 29x improvement of the MUTF

    New Design Techniques for Dynamic Reconfigurable Architectures

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    An Adaptive Modular Redundancy Technique to Self-regulate Availability, Area, and Energy Consumption in Mission-critical Applications

    Get PDF
    As reconfigurable devices\u27 capacities and the complexity of applications that use them increase, the need for self-reliance of deployed systems becomes increasingly prominent. A Sustainable Modular Adaptive Redundancy Technique (SMART) composed of a dual-layered organic system is proposed, analyzed, implemented, and experimentally evaluated. SMART relies upon a variety of self-regulating properties to control availability, energy consumption, and area used, in dynamically-changing environments that require high degree of adaptation. The hardware layer is implemented on a Xilinx Virtex-4 Field Programmable Gate Array (FPGA) to provide self-repair using a novel approach called a Reconfigurable Adaptive Redundancy System (RARS). The software layer supervises the organic activities within the FPGA and extends the self-healing capabilities through application-independent, intrinsic, evolutionary repair techniques to leverage the benefits of dynamic Partial Reconfiguration (PR). A SMART prototype is evaluated using a Sobel edge detection application. This prototype is shown to provide sustainability for stressful occurrences of transient and permanent fault injection procedures while still reducing energy consumption and area requirements. An Organic Genetic Algorithm (OGA) technique is shown capable of consistently repairing hard faults while maintaining correct edge detector outputs, by exploiting spatial redundancy in the reconfigurable hardware. A Monte Carlo driven Continuous Markov Time Chains (CTMC) simulation is conducted to compare SMART\u27s availability to industry-standard Triple Modular Technique (TMR) techniques. Based on nine use cases, parameterized with realistic fault and repair rates acquired from publically available sources, the results indicate that availability is significantly enhanced by the adoption of fast repair techniques targeting aging-related hard-faults. Under harsh environments, SMART is shown to improve system availability from 36.02% with lengthy repair techniques to 98.84% with fast ones. This value increases to five nines (99.9998%) under relatively more favorable conditions. Lastly, SMART is compared to twenty eight standard TMR benchmarks that are generated by the widely-accepted BL-TMR tools. Results show that in seven out of nine use cases, SMART is the recommended technique, with power savings ranging from 22% to 29%, and area savings ranging from 17% to 24%, while still maintaining the same level of availability
    • …
    corecore