3,975 research outputs found

    Fast and accurate SER estimation for large combinational blocks in early stages of the design

    Soft Error Rate (SER) estimation is an important challenge for integrated circuits because of the increased vulnerability brought by technology scaling. This paper presents a methodology to estimate, in early stages of the design, the susceptibility of combinational circuits to particle strikes. At the core of the framework lies MASkIt, a novel approach that combines signal probabilities with technology characterization to swiftly compute the logical, electrical, and timing masking effects of the circuit under study, taking into account all input combinations and pulse widths at once. Signal probabilities are estimated using a new hybrid approach that integrates heuristics with selective simulation of reconvergent subnetworks. The experimental results validate our proposed technique, showing a speedup of two orders of magnitude over traditional fault injection estimation, with an average estimation error of 5 percent. Finally, we analyze the vulnerability of the Decoder, Scheduler, ALU, and FPU of an out-of-order, superscalar processor design. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness and FEDER funds under grant TIN2013-44375-R, by the Generalitat de Catalunya under grant FI-DGR 2016, and by the FP7 program of the EU under contract FP7-611404 (CLERECO). Peer reviewed. Postprint (author's final draft).
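
    The MASkIt implementation itself is not published in this abstract. As a rough illustration of the signal-probability step such a framework builds on, the following Python sketch propagates signal probabilities through a small gate-level netlist under the textbook independence assumption; the hybrid approach described above refines exactly this step for reconvergent subnetworks. The netlist format is hypothetical.

        # Minimal sketch, NOT the MASkIt implementation: textbook
        # signal-probability propagation assuming independent nets.
        # Selective simulation of reconvergent subnetworks, as described
        # above, corrects the independence assumption where it fails.
        from typing import Dict, List, Tuple

        # Hypothetical netlist format: net -> (gate type, input nets),
        # listed in topological order.
        Netlist = Dict[str, Tuple[str, List[str]]]

        def propagate_probabilities(netlist: Netlist,
                                    inputs: Dict[str, float]) -> Dict[str, float]:
            """Compute P(net = 1) for every net under the independence assumption."""
            p = dict(inputs)
            for net, (kind, ins) in netlist.items():
                v = [p[i] for i in ins]
                if kind == "AND":
                    q = 1.0
                    for x in v:
                        q *= x
                    p[net] = q
                elif kind == "OR":
                    q = 1.0
                    for x in v:
                        q *= 1.0 - x
                    p[net] = 1.0 - q
                elif kind == "NOT":
                    p[net] = 1.0 - v[0]
                elif kind == "XOR":
                    p[net] = v[0] * (1.0 - v[1]) + v[1] * (1.0 - v[0])
                else:
                    raise ValueError(f"unsupported gate: {kind}")
            return p

        # Example: y = (a AND b) OR c with uniformly random inputs.
        net = {"n1": ("AND", ["a", "b"]), "y": ("OR", ["n1", "c"])}
        print(propagate_probabilities(net, {"a": 0.5, "b": 0.5, "c": 0.5}))
        # {'a': 0.5, 'b': 0.5, 'c': 0.5, 'n1': 0.25, 'y': 0.625}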

    Cross-layer Soft Error Analysis and Mitigation at Nanoscale Technologies

    This thesis addresses the challenge of soft error modeling and mitigation in nanoscale technology nodes and pushes the state-of-the-art forward by proposing novel modeling, analysis, and mitigation techniques. The proposed soft error sensitivity analysis platform accurately models both error generation and propagation, starting from technology-dependent device-level simulations all the way to workload-dependent application-level analysis.

    Single event upset hardened embedded domain specific reconfigurable architecture


    Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators

    In this paper, we evaluate the error criticality of radiation-induced errors on modern High-Performance Computing (HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a dedicated set of metrics. We show that, as far as imprecise computing is concerned, simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis quantifies and qualifies radiation effects on applications' outputs, correlating the number of corrupted elements with their spatial locality. We also provide the mean relative error (dataset-wise) to evaluate radiation-induced error magnitude. We apply the selected metrics to experimental results obtained in various radiation test campaigns, for a total of more than 400 hours of beam time per device. The amount of data we gathered allows us to evaluate the error criticality of a representative set of algorithms from HPC suites. Additionally, based on the characteristics of the tested algorithms, we draw generic reliability conclusions for broader classes of codes. We show that arithmetic operations are less critical for the K40, while the Xeon Phi is more reliable when executing particle interactions solved through Finite Difference Methods. Finally, iterative stencil operations appear to be the most reliable on both architectures. This work was supported by the STIC-AmSud/CAPES scientific cooperation program under the EnergySFE research project grant 99999.007556/2015-02, the EU H2020 Programme, and MCTI/RNP-Brazil under the HPC4E Project, grant agreement n° 689772. Tested K40 boards were donated thanks to Steve Keckler, Timothy Tsai, and Siva Hari from NVIDIA. Postprint (author's final draft).
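
    The paper's exact metric definitions are not reproduced in this abstract. The sketch below shows, under stated assumptions (NumPy arrays, a fixed mismatch threshold, and 4-neighbour flood-fill clustering as a stand-in for the spatial-locality measure), how a corrupted-element count, a locality proxy, and the mean relative error can be computed from a golden and an observed output.

        # Sketch of output-comparison metrics of the kind described above,
        # for a 2D result array. The paper's exact definitions may differ.
        import numpy as np

        def error_metrics(golden: np.ndarray, observed: np.ndarray, tol: float = 0.0):
            corrupted = np.abs(observed - golden) > tol
            n_corrupted = int(corrupted.sum())

            # Mean relative error over corrupted elements; where the golden
            # value is zero, fall back to the absolute error.
            denom = np.where(golden == 0, 1.0, np.abs(golden))
            rel = np.abs(observed - golden) / denom
            mre = float(rel[corrupted].mean()) if n_corrupted else 0.0

            # Count 4-connected clusters of corrupted elements (flood fill):
            # one large cluster suggests localized corruption, many small
            # ones suggest scattered corruption.
            seen = np.zeros_like(corrupted)
            clusters = 0
            for i, j in zip(*np.nonzero(corrupted)):
                if seen[i, j]:
                    continue
                clusters += 1
                stack = [(i, j)]
                while stack:
                    x, y = stack.pop()
                    if seen[x, y]:
                        continue
                    seen[x, y] = True
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < corrupted.shape[0]
                                and 0 <= ny < corrupted.shape[1]
                                and corrupted[nx, ny]):
                            stack.append((nx, ny))
            return n_corrupted, clusters, mre

    Distinguishing one large cluster from many isolated corrupted elements is the kind of spatial-locality information the criticality analysis above correlates with the corrupted-element count.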

    Emulation Applied to Reliability Analysis of Reconfigurable, Highly Reliable, Fault-Tolerant, Computing Systems for Avionics

    Emulation techniques are proposed as a solution to a difficulty arising in the analysis of the reliability of highly reliable computer systems for future commercial aircraft. The difficulty, namely the lack of credible precision in reliability estimates obtained by analytical modeling techniques, is established. This difficulty is shown to be an unavoidable consequence of: (1) a high reliability requirement so demanding as to make system evaluation by use testing infeasible, (2) a complex system design technique, fault tolerance, (3) system reliability dominated by errors due to flaws in the system definition, and (4) elaborate analytical modeling techniques whose precise outputs are quite sensitive to errors of approximation in their input data. The technique of emulation is described, indicating how its input is a simple description of the logical structure of a system and its output is the consequent behavior. The use of emulation techniques is discussed for pseudo-testing systems to evaluate bounds on the parameter values needed for the analytical techniques.

    Enhancing Program Soft Error Resilience through Algorithmic Approaches

    The rising count and shrinking feature size of transistors within modern computers are making them increasingly vulnerable to various types of soft faults. This problem is especially acute in high-performance computing (HPC) systems used for scientific computing, because these systems include many thousands of compute cores and nodes, all of which may be utilized in a single large-scale run. The increasing vulnerability of HPC applications to errors induced by soft faults is motivating extensive work on techniques to make these applications more resilient to such faults, ranging from generic techniques such as replication or checkpoint/restart to algorithm-specific error detection and tolerance techniques. Effective use of such techniques requires a detailed understanding of how a given application is affected by soft faults to ensure that (i) efforts to improve application resilience are spent on the code regions most vulnerable to faults, (ii) the appropriate resilience technique is applied to each code region, and (iii) this understanding is obtained in an efficient manner. This thesis presents two tools: FaultTelescope helps application developers view routine and application vulnerability to soft errors, while ErrorSight helps perform modular fault characteristics analysis for more complex applications. This thesis also illustrates how these tools can be used in the context of representative applications and kernels. In addition to providing actionable insights into application behavior, the tools automatically select the number of fault injection experiments required to efficiently generate error profiles of an application, ensuring that the information is statistically well-grounded without performing unnecessary experiments.
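
    The tools' actual stopping criterion is not given in this abstract. As a minimal sketch of the statistical reasoning involved, the snippet below uses the standard normal-approximation sample-size formula for a proportion (an assumption, not necessarily the tools' method) to pick the number of fault injection runs.

        # How many independent fault-injection runs are needed so the
        # observed failure proportion lies within `margin` of the true one
        # at the given confidence level? Uses n = z^2 * p(1-p) / e^2,
        # with p_est = 0.5 as the worst case.
        import math

        def injections_needed(confidence=0.95, margin=0.01, p_est=0.5):
            # Two-sided z-score for the requested confidence level.
            z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}[confidence]
            return math.ceil(z * z * p_est * (1 - p_est) / (margin * margin))

        print(injections_needed())             # 9604 runs for +/-1% at 95%
        print(injections_needed(margin=0.05))  # 385 runs for +/-5% at 95%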

    Study of Single Event Transient Error Mitigation

    Single Event Transient (SET) errors in ground-level electronic devices are a growing concern in the radiation hardening field. However, effective SET mitigation technologies that satisfy ground-level demands for generality, flexibility, efficiency, and speed are limited. The classic Triple Modular Redundancy (TMR) method is the best-known and most popular technique in space and nuclear environments, but it leads to more than 200% area and power overheads, which is too costly for ground-level applications. Meanwhile, coding techniques are extensively utilized to inhibit upset errors in storage cells, but the irregularity of combinational logic limits their use for SET mitigation. Therefore, SET mitigation techniques suitable for ground-level applications need to be addressed. Aware of these demands, this thesis proposes two novel approaches based on the redundant wire and approximate logic techniques.

    Redundant Wire is a SET mitigation technique that selectively adds redundant wire connections to prohibit targeted transient faults from propagating on the fly. This thesis proposes a set of signature-based evaluation equations to efficiently estimate the protective effect provided by each redundant wire candidate. Based on the estimated results, a greedy algorithm repeatedly inserts the best candidate. Simulation results substantiate that the evaluation equations can achieve up to 98% accuracy on average. Regarding protective effects, the technique can mask 18.4% of the faults with a 4.3% area, 4.4% power, and 5.4% delay overhead on average. Overall, the protection results obtained are 2.8 times better than the previous work. Additionally, the impact of synthesis constraints and signature length is discussed.

    Approximate Logic is a partial TMR technique offering a trade-off between fault coverage and area overhead. The approximate logic consists of an under-approximate logic and an over-approximate logic: the under-approximate logic is a subset of the original min-terms, and the over-approximate logic is a subset of the original max-terms. This thesis proposes a new algorithm for generating the two approximate logics. During generation, the algorithm considers the intrinsic failure probability of each gate and utilizes a confidence interval estimate equation to minimize the required computations. The technique is applied to two fault models, stuck-at and SET, and the respective results are compared and discussed. They show that the technique can reduce the error by 75% with an area penalty of 46% on some circuits. The delay overhead of this technique is always two additional layers of logic.

    The two proposed SET mitigation techniques are both applicable to generic combinational logic and highly flexible. Simulations show promising SET mitigation ability. The proposed techniques give designers more choices when developing reliable combinational logic for ground-level applications.
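
    The thesis's signature-based evaluation equations themselves are not reproduced in this abstract. As a hedged sketch of the machinery such evaluations are typically built on, the following Python snippet shows the common bit-parallel signature idea: each net carries a K-bit word whose bits are the net's values under K random input vectors, so one pass of bitwise operations simulates K vectors at once.

        # Sketch of bit-parallel signatures (not the thesis's equations):
        # one machine word per net encodes the net's values under K random
        # input vectors, so bitwise ops evaluate K vectors per pass.
        import random

        K = 64                      # signature width: vectors per pass
        MASK = (1 << K) - 1

        def random_signature() -> int:
            return random.getrandbits(K)

        def prob_one(sig: int) -> float:
            """Estimated P(net = 1): fraction of 1-bits in the signature."""
            return bin(sig).count("1") / K

        # Example: y = (a AND b) OR c evaluated on K vectors at once.
        a, b, c = (random_signature() for _ in range(3))
        n1 = a & b
        y = (n1 | c) & MASK
        print(f"P(n1=1) ~ {prob_one(n1):.2f}, P(y=1) ~ {prob_one(y):.2f}")
        # Expect roughly 0.25 and 0.625 for large K.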

    Analysis of Single Event Upsets Propagation at Register Transfer Level in Combinational and Sequential Circuits Based on Satisfiability Modulo Theories

    The progressive scaling of semiconductor technologies has led to significant performance improvements in digital designs. However, ultra-deep sub-micron technologies have increased the vulnerability of VLSI designs to soft errors. In order to allow a cost-effective, reliability-aware design process, it is critical to assess soft error reliability parameters in early design stages. This thesis proposes a new technique to model, analyze, and estimate the propagation of Single Event Upsets (SEUs) in combinational and sequential designs described at the Register Transfer Level (RTL) using Satisfiability Modulo Theories (SMT). The propagation of SEUs through RTL bit-vector constructs is modeled as a satisfiability problem using the SMT theory of bit-vectors.

    First, for combinational designs, two different analysis techniques, concrete and abstract modeling, are used in order to investigate the efficiency and accuracy of a data type reduction technique for soft error analysis. To analyze the vulnerability of combinational circuits, we compute the Soft Error Rate (SER), which is a summation of the propagation probabilities. Concrete modeling uses two versions of the design, one faulty and one fault-free, in order to analyze SEU propagation. Abstract modeling uses a data type reduction technique to evaluate the difference in performance and accuracy relative to the first method. Experimental results demonstrate that the loss in accuracy due to abstract modeling depends on the design behavior; however, abstract modeling significantly reduces processing time.

    The methodology is then extended to model and analyze SEU propagation in sequential circuits at RTL. In order to estimate the vulnerability of sequential circuits to soft errors, the methodology must be adapted to represent state transitions. To do so, we present an approach based on circuit unrolling, which uses multiple unrolled copies of the design to represent the various state transitions. Fault propagation is then analyzed through a certain number of states, yielding useful information regarding the sequential circuit's vulnerability to SEUs. The propagation probabilities can be computed from the SEU injection cycle to multiple subsequent cycles, and these results are then used to estimate the circuit's Soft Error Rate (SER). Experimental results demonstrate the effectiveness and applicability of the proposed approach.

    Finally, we present a new methodology to estimate digital circuit vulnerability to soft errors at RTL, in which SEU propagation through RTL bit-vector operations is modeled and analyzed using a different SMT-based modeling approach. The objective of this new approach is to improve the efficiency of the analysis: the bit-vector reduction operators and arithmetic operators are modeled in SMT to include the fault propagation properties. Because the fault propagation properties are embedded within the SMT equivalents of the RTL constructs themselves, the analysis requires only one copy of the design. To illustrate the practical utilization of our work, we have analyzed different RTL combinational circuits. Experimental results demonstrate that the proposed framework is faster than other comparable contemporary techniques. Moreover, it provides more accurate and detailed results of the circuit vulnerability, allowing more efficient application of fault tolerance techniques.
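
    As a minimal sketch of the two-copy (concrete) modeling style described above, the following Python snippet uses the z3 SMT solver's bit-vector theory (assuming the z3-solver package; the thesis's actual encodings, circuits, and SER computation are richer). It builds a golden and a faulty copy of a toy RTL expression, injects an SEU as a one-hot XOR mask, and asks whether the fault can reach the output.

        # Two-copy SEU propagation check with z3 (pip install z3-solver).
        # Toy RTL only; not the thesis's encoding.
        from z3 import BitVec, BitVecVal, Solver, sat

        W = 8
        a, b = BitVec("a", W), BitVec("b", W)
        flip = BitVec("flip", W)               # injected SEU: one-hot mask

        golden = (a + b) & BitVecVal(0x0F, W)  # toy RTL: masked adder
        faulty = ((a ^ flip) + b) & BitVecVal(0x0F, W)

        s = Solver()
        # Constrain the fault to exactly one bit flip (one-hot mask).
        s.add(flip != 0, flip & (flip - 1) == 0)
        s.add(golden != faulty)                # can the SEU propagate?

        if s.check() == sat:
            print("SEU can propagate, e.g.:", s.model())
        else:
            print("SEU is always masked")

    Computing a propagation probability from such a model, and from it the SER as a summation over fault sites as described above, would additionally require counting the satisfying input assignments rather than finding a single one.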

    Digital design techniques for dependable High-Performance Computing

    The abstract is in the attachment.