    FIMSIM: A fault injection infrastructure for microarchitectural simulators

    Fault injection is a widely used approach for experiment-based dependability evaluation in which faults can be injected into the hardware, the simulator, or the software. Simulation-based fault injection is more appealing for researchers, since it can be used at the early design stage of the processor. As such, it enables a preliminary analysis of the correlation between the criticality of circuit-level faults and their impact on applications. However, the lack of publicly available fault injectors for microarchitecture-level simulators places the extra burden of designing and implementing fault injectors on researchers who evaluate microarchitecture dependability. In this study, we present FIMSIM, to the best of our knowledge the first publicly available fault injection simulator at the microarchitecture level. FIMSIM is a compact tool capable of injecting transient, permanent, intermittent, and multi-bit faults. FIMSIM therefore provides the opportunity to comprehensively evaluate the vulnerability of different microarchitectural structures against different fault models.
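
The four fault models named in this abstract can be illustrated on a single register value. The following is a minimal sketch only: it is not FIMSIM's interface, and the function names, the 32-bit width, and the per-read probability used for intermittent faults are all assumptions made for illustration.

```python
import random

WIDTH = 32                 # assumed register width in bits (hypothetical)
MASK = (1 << WIDTH) - 1

def flip_bits(value, bits):
    """Transient or multi-bit upset: invert each listed bit position once."""
    for b in bits:
        value ^= 1 << b
    return value & MASK

def stuck_at(value, bit, level):
    """Permanent fault: force one bit to 0 or 1 on every read."""
    if level:
        return (value | (1 << bit)) & MASK
    return value & ~(1 << bit) & MASK

def intermittent(value, bit, level, probability, rng):
    """Intermittent fault: the stuck-at behaviour shows up only on some reads."""
    if rng.random() < probability:
        return stuck_at(value, bit, level)
    return value

if __name__ == "__main__":
    reg = 0b1010
    rng = random.Random(0)  # seeded so a run is repeatable
    print(bin(flip_bits(reg, [0])))      # single-bit transient
    print(bin(flip_bits(reg, [0, 3])))   # multi-bit upset
    print(bin(stuck_at(reg, 1, 0)))      # permanent stuck-at-0 on bit 1
    print(bin(intermittent(reg, 1, 0, 0.5, rng)))
```

Modelling an intermittent fault as a stuck-at that activates with a fixed probability is one simple choice; a real injector might instead use burst durations or cycle windows.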

    Deviation-tolerant computation in concurrent failure-prone hardware

    Analysis of Radiation-induced Cross Domain Errors in TMR Architectures on SRAM-based FPGAs

    SRAM-based FPGAs represent a low-cost alternative to ASIC devices thanks to their high performance and design flexibility. In particular, in aerospace and avionics applications, SRAM-based FPGAs are increasingly adopted for their configurability, which makes them a viable solution for long-term applications. However, these fields are characterized by a radiation environment that makes the technology extremely sensitive to radiation-induced Single Event Upsets (SEUs) in the SRAM-based FPGA's configuration memory. Configuration scrubbing and Triple Modular Redundancy (TMR) have been widely adopted to cope with SEU effects. However, modern FPGA devices are characterized by a heterogeneous routing resource distribution and a complex configuration memory mapping, causing an increasing sensitivity to Cross Domain Errors affecting the TMR structure. In this paper we developed a new methodology to calculate the reliability of TMR architectures considering the intrinsic characteristics of the new generation of SRAM-based FPGAs. The method includes the analysis of the configuration bit sharing phenomena and of the routing long lines. We experimentally evaluate the method on various benchmark circuits by measuring the Mean Upset To Failure (MUTF). Finally, we used the results of the developed method to implement an improved design achieving a 29x improvement of the MUTF.
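
For context, the textbook view of TMR can be sketched in a few lines: a bitwise majority voter, plus the classic reliability expression 3R^2 - 2R^3 for three independent modules of reliability R behind a perfect voter. This is not the methodology of the paper above; the function names are hypothetical, and the independence assumption is exactly what Cross Domain Errors violate, since a single upset can then corrupt more than one domain.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote over three redundant outputs."""
    return (a & b) | (a & c) | (b & c)

def tmr_reliability(r):
    """Probability that at least two of three independent modules work.

    Assumes independent module failures and a fault-free voter.
    """
    return 3 * r**2 - 2 * r**3

if __name__ == "__main__":
    good = 0b1011
    upset = good ^ 0b0100                 # one domain hit by a bit flip
    print(bin(majority(good, good, upset)))
    print(tmr_reliability(0.9))
```

Under these idealized assumptions TMR helps only when R is above 0.5; at R = 0.5 the expression returns 0.5 again.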

    The effects of high energy particles on planetary missions

    Researchers review the background and motivation for the detailed study of the variability and uncertainty of the particle environment from a space systems planning perspective. The engineering concern raised by each environment is emphasized rather than the underlying physics of the magnetosphere or the sun. Missions now being planned span the short-term range of one to three years to periods of over ten years. Thus the engineering interest is beginning to stretch over periods of several solar cycles. Coincidentally, detailed measurements of the environment are now becoming available over that period of time. Both short-term and long-term environmental predictions are needed for proper mission planning. Short-term predictions, perhaps based on solar indices, real-time observations, or short-term systematics, are very useful in near-term planning: launches, EVAs (extravehicular activities), coordinated observations, and experiments which require the magnetosphere to be in a certain state. Long-term predictions of both average and extreme conditions are essential to mission design. Engineering considerations are often driven by the worst-case environment. Knowledge of the average conditions and their variability allows trade-off studies to be made and designs to be implemented which degrade gracefully under multi-stress environments.

    Combined Time and Information Redundancy for SEU-Tolerance in Energy-Efficient Real-Time Systems

    Recently the trade-off between energy consumption and fault-tolerance in real-time systems has been highlighted. These works have focused on dynamic voltage scaling (DVS) to reduce dynamic energy dissipation and on time redundancy to achieve transient-fault tolerance. While the time redundancy technique exploits the available slack time to increase fault-tolerance by performing recovery executions, DVS exploits slack time to save energy. Therefore we believe there is a resource conflict between the time-redundancy technique and DVS. The first aim of this paper is to propose the use of information redundancy to solve this problem. We demonstrate through analytical and experimental studies that it is possible to achieve both higher transient-fault tolerance (tolerance to single event upsets (SEUs)) and lower energy consumption using a combination of information and time redundancy than when using time redundancy alone. The second aim of this paper is to analyze the interplay of transient-fault tolerance (SEU-tolerance) and adaptive body biasing (ABB) used to reduce static leakage energy, which has not been addressed in previous studies. We show that the same technique (i.e., the combination of time and information redundancy) is applicable to ABB-enabled systems and provides more advantages than time redundancy alone.

    Synthesis for circuit reliability

    Electrical and Computer Engineering

    Masking and Detecting Radiation-Induced Errors in SRAM-Based FPGAs Through Partial Circuit Replication

    Radiation found in terrestrial and space environments can induce errors in SRAM-based FPGAs. Replication of circuitry can be used to mask and detect these errors to improve reliability or availability. This work advances the understanding and implementation of partial circuit replication in SRAM-based FPGAs. Partial circuit replication is the replication of a subset of the components in a circuit. A reliability model is presented that evaluates the reliability benefit of partial circuit replication. The model suggests that the reliability benefit is inversely related to the portion of the circuit replicated. A partial triple modular redundancy case study is also presented that evaluates several different selection algorithms. Random selection was found to be ineffective, whereas maximizing protected routes while minimizing inserted voters provided a high return, reducing failure likelihood by 20% with only 9% coverage. A final study applied duplication with compare to an FPGA-based networking system to detect persistent silent network disruptions. A coverage of 29% was able to detect 45% of these failures in neutron radiation testing.

    Fault-tolerant sub-lithographic design with rollback recovery

    Shrinking feature sizes and energy levels, coupled with high clock rates and decreasing node capacitance, lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and a general-purpose, conservative analysis, we show that the area overhead factor of our technique is roughly an order of magnitude lower than that of a gate-level feed-forward redundancy scheme.
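
To give a sense of scale for the figures quoted in this abstract, the short calculation below multiplies the per-device fault rate by the device count. It is a sketch that reads Pf as a per-device, per-cycle fault probability, which is how the abstract phrases it; the variable names are illustrative. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Figures quoted in the abstract
p_fault = Fraction(1, 10**7)   # one fault per device per ten million cycles
devices = 10**12               # susceptible devices in the system

# By linearity of expectation, the mean number of faults in one cycle
expected_faults_per_cycle = p_fault * devices
print(expected_faults_per_cycle)  # 100000
```

At this rate a system sees on the order of a hundred thousand transient faults somewhere in the design every cycle, which is why recovery has to be fine-grained rather than a whole-system restart.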