671 research outputs found

    Cross-layer Soft Error Analysis and Mitigation at Nanoscale Technologies

    Get PDF
    This thesis addresses the challenge of soft error modeling and mitigation in nansoscale technology nodes and pushes the state-of-the-art forward by proposing novel modeling, analyze and mitigation techniques. The proposed soft error sensitivity analysis platform accurately models both error generation and propagation starting from a technology dependent device level simulations all the way to workload dependent application level analysis

    Cross-Layer Resiliency Modeling and Optimization: A Device to Circuit Approach

    Get PDF
    The never ending demand for higher performance and lower power consumption pushes the VLSI industry to further scale the technology down. However, further downscaling of technology at nano-scale leads to major challenges. Reduced reliability is one of them, arising from multiple sources e.g. runtime variations, process variation, and transient errors. The objective of this thesis is to tackle unreliability with a cross layer approach from device up to circuit level

    Fast and accurate SER estimation for large combinational blocks in early stages of the design

    Get PDF
    Soft Error Rate (SER) estimation is an important challenge for integrated circuits because of the increased vulnerability brought by technology scaling. This paper presents a methodology to estimate in early stages of the design the susceptibility of combinational circuits to particle strikes. In the core of the framework lies MASkIt , a novel approach that combines signal probabilities with technology characterization to swiftly compute the logical, electrical, and timing masking effects of the circuit under study taking into account all input combinations and pulse widths at once. Signal probabilities are estimated applying a new hybrid approach that integrates heuristics along with selective simulation of reconvergent subnetworks. The experimental results validate our proposed technique, showing a speedup of two orders of magnitude in comparison with traditional fault injection estimation with an average estimation error of 5 percent. Finally, we analyze the vulnerability of the Decoder, Scheduler, ALU, and FPU of an out-of-order, superscalar processor design.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness and Feder Funds under grant TIN2013-44375-R, by the Generalitat de Catalunya under grant FI-DGR 2016, and by the FP7 program of the EU under contract FP7-611404 (CLERECO).Peer ReviewedPostprint (author's final draft

    Reliability-energy-performance optimisation in combinational circuits in presence of soft errors

    Get PDF
    PhD ThesisThe reliability metric has a direct relationship to the amount of value produced by a circuit, similar to the performance metric. With advances in CMOS technology, digital circuits become increasingly more susceptible to soft errors. Therefore, it is imperative to be able to assess and improve the level of reliability of these circuits. A framework for evaluating and improving the reliability of combinational circuits is proposed, and an interplay between the metrics of reliability, energy and performance is explored. Reliability evaluation is divided into two levels of characterisation: stochastic fault model (SFM) of the component library and a design-specific critical vector model (CVM). The SFM captures the properties of components with regard to the interference which causes error. The CVM is derived from a limited number of simulation runs on the specific design at the design time and producing the reliability metric. The idea is to move the high-complexity problem of the stochastic characterisation of components to the generic part of the design process, and to do it just once for a large number of specific designs. The method is demonstrated on a range of circuits with various structures. A three-way trade-off between reliability, energy, and performance has been discovered; this trade-off facilitates optimisations of circuits and their operating conditions. A technique for improving the reliability of a circuit is proposed, based on adding a slow stage at the primary output. Slow stages have the ability to absorb narrow glitches from prior stages, thus reducing the error probability. Such stages, or filters, suppress most of the glitches generated in prior stages and prevent them from arriving at the primary output of the circuit. Two filter solutions have been developed and analysed. The results show a dramatic improvement in reliability at the expense of minor performance and energy penalties. To alleviate the problem of the time-consuming analogue simulations involved in the proposed method, a simplification technique is proposed. This technique exploits the equivalence between the properties of the gates within a path and the equivalence between paths. On the basis of these equivalences, it is possible to reduce the number of simulation runs. The effectiveness of the proposed technique is evaluated by applying it to different circuits with a representative variety of path topologies. The results show a significant decrease in the time taken to estimate reliability at the expense of a minor decrease in the accuracy of estimation. The simplification technique enables the use of the proposed method in applications with complex circuits.Ministry of Education and Scientific Research in Liby

    Techniques pour l'évaluation et l'amélioration du comportement des technologies émergentes face aux fautes aléatoires

    Get PDF
    The main objective of this thesis is to develop analysis and mitigation techniques that can be used to face the effects of radiation-induced soft errors - external and internal disturbances produced by radioactive particles, affecting the reliability and safety in operation complex microelectronic circuits. This thesis aims to provide industrial solutions and methodologies for the areas of terrestrial applications requiring ultimate reliability (telecommunications, medical devices, ...) to complement previous work on Soft Errors traditionally oriented aerospace, nuclear and military applications.The work presented uses a decomposition of the error sources, inside the current circuits, to highlight the most important contributors.Single Event Effects in sequential logic cells represent the current target for analysis and improvement efforts in both industry and academia. This thesis presents a state-aware analysis methodology that improves the accuracy of Soft Error Rate data for individual sequential instances based on the circuit and application. Furthermore, the intrinsic imbalance between the SEU susceptibility of different flip-flop states is exploited to implement a low-cost SER improvement strategy.Single Event Transients affecting combinational logic are considerably more difficult to model, simulate and analyze than the closely-related Single Event Upsets. The working environment may cause a myriad of distinctive transient pulses in various cell types that are used in widely different configurations. This thesis presents practical approach to a possible exhaustive Single Event Transient evaluation flow in an industrial setting. The main steps of this process consists in: a) fully characterize the standard cell library using a process and library-aware SER tool, b) evaluate SET effects in the logic networks of the circuit using a variety dynamic (simulation-based) and static (probabilistic) methods and c) compute overall SET figures taking into account the particularities of the implementation of the circuit and its environment.Fault-injection remains the primary method for analyzing the effects of soft errors. This document presents the results of functional analysis of a complex CPU. Three representative benchmarks were considered for this analysis. Accelerated simulation techniques (probabilistic calculations, clustering, parallel simulations) have been proposed and evaluated in order to develop an industrial validation environment, able to take into account very complex circuits. The results obtained allowed the development and evaluation of a hypothetical mitigation scenario that aims to significantly improve the reliability of the circuit at the lowest cost.The results obtained show that the error rate, SDC (Silent Data Corruption) and DUE (Detectable Uncorrectable Errors) can be significantly reduced by hardening a small part of the circuit (Selective mitigation).In addition to the main axis of research, some tangential topics were studied in collaboration with other teams. One of these consisted in the study of a technique for the mitigation of flip-flop soft-errors through an optimization of the Temporal De-Rating (TDR) by selectively inserting delay on the input or output of flip-flops.The Methodologies, the algorithms and the CAD tools proposed and validated as part of the work are intended for industrial use and have been included in a commercial CAD framework that offers a complete solution for assessing the reliability of circuits and complex electronic systems.L'objectif principal de cette thèse est de développer des techniques d'analyse et mitigation capables à contrer les effets des Evènements Singuliers (Single Event Effects) - perturbations externes et internes produites par les particules radioactives, affectant la fiabilité et la sureté en fonctionnement des circuits microélectroniques complexes. Cette thèse à la vocation d'offrir des solutions et méthodologies industrielles pour les domaines d'applications terrestres exigeant une fiabilité ultime (télécommunications, dispositifs médicaux, ...) en complément des travaux précédents sur les Soft Errors, traditionnellement orientés vers les applications aérospatiales, nucléaires et militaires.Les travaux présentés utilisent une décomposition de sources d'erreurs dans les circuits actuels, visant à mettre en évidence les contributeurs les plus importants.Les upsets (SEU) - Evènements Singuliers (ES) dans les cellules logiques séquentielles représentent actuellement la cible principale pour les efforts d'analyse et d'amélioration à la fois dans l'industrie et dans l'académie. Cette thèse présente une méthodologie d'analyse basée sur la prise en compte de la sensibilité de chaque état logique d'une cellule (state-awareness), approche qui améliore considérablement la précision des résultats concernant les taux des évènements pour les instances séquentielles individuelles. En outre, le déséquilibre intrinsèque entre la susceptibilité des différents états des bascules est exploité pour mettre en œuvre une stratégie d'amélioration SER à très faible coût.Les fautes transitoires (SET) affectant la logique combinatoire sont beaucoup plus difficiles à modéliser, à simuler et à analyser que les SEUs. L'environnement radiatif peut provoquer une multitude d'impulsions transitoires dans les divers types de cellules qui sont utilisés en configurations multiples. Cette thèse présente une approche pratique pour l'analyse SET, applicable à des circuits industriels très complexes. Les principales étapes de ce processus consiste à: a) caractériser complètement la bibliothèque de cellules standard, b) évaluer les SET dans les réseaux logiques du circuit en utilisant des méthodes statiques et dynamiques et c) calculer le taux SET global en prenant en compte les particularités de l'implémentation du circuit et de son environnement.L'injection de fautes reste la principale méthode d'analyse pour étudier l'impact des fautes, erreurs et disfonctionnements causés par les évènements singuliers. Ce document présente les résultats d'une analyse fonctionnelle d'un processeur complexe dans la présence des fautes et pour une sélection d'applications (benchmarks) représentatifs. Des techniques d'accélération de la simulation (calculs probabilistes, clustering, simulations parallèles) ont été proposées et évalués afin d'élaborer un environnement de validation industriel, capable à prendre en compte des circuits très complexes. Les résultats obtenus ont permis l'élaboration et l'évaluation d'un hypothétique scénario de mitigation qui vise à améliorer sensiblement, et cela au moindre coût, la fiabilité du circuit sous test. Les résultats obtenus montrent que les taux d'erreur, SDC (Silent Data Corruption) et DUE (Detectable Uncorrectable Errors) peuvent être considérablement réduits par le durcissement d'un petite partie du circuit (protection sélective). D'autres techniques spécifiques ont été également déployées: mitigation du taux de soft-errors des Flip-Flips grâce à une optimisation du Temporal De-Rating par l'insertion sélective de retard sur l'entrée ou la sortie des bascules et biasing du circuit pour privilégier les états moins sensibles.Les méthodologies, algorithmes et outils CAO proposés et validés dans le cadre de ces travaux sont destinés à un usage industriel et ont été valorisés dans le cadre de plateforme CAO commerciale visant à offrir une solution complète pour l'évaluation de la fiabilité des circuits et systèmes électroniques complexes

    The impact of soft errors in logic and its commercialisation in ARM IP

    Get PDF
    The significance of soft errors in logic has grown because of reduced memory vulnerability and the shrinking dimensions of semiconductor technology coupled with the increasing amount of logic integrated into a chip. Consequently, some of ARM’s customers are concerned about how soft errors on the bus interconnect will affect the dependability of their systems, since the interconnect is a critical hub of communication in a SoC and represents a substantial and growing amount of logic. With the rising complexity of their systems, the interconnect will become larger and more complex in the future, adding to their concern. In this work the impact of soft errors on the bus interconnect logic was investigated and a product was developed to ameliorate the effects of such errors on ARM’s customers’ products. Methods to measure the SER of ARM IP were investigated by focusing on logical masking, which is a component in the calculation of the SER. The effect that the topology of a combinatorial logic circuit has on its logical masking rate was considered by performing gate-level statistical fault injection on different implementations of adder circuits. Significant variation in logical masking was found ranging from a factor of 3.1 at a synthesis frequency of 100 MHz to a factor of 2.1 at 900 MHz. This difference is explained in an original way by correlating logical masking with the circuit’s path length and fan-out. These properties could be used to create a static method of measuring the logical masking rather than the current time-consuming method of dynamic simulation. Additionally, nearly 30% of faults injected cause more than one error, which means that the combinational SER will be underestimated if research does not take gate fan-out into consideration. Using this methodology a circuit designer can now base his choice or development of a circuit on its reliability as well as its performance, power, and area. Studying the variation in the factors that affect the SER is important to ensure accuracy in addressing customer requirements. Although it is important to consider the rate of soft error occurrence, in this work the impact of errors is demonstrated to be critical. Using protocol-level fault injection it is shown that faults on the ARM AXI bus interconnect can have a serious effect on the reliability of the entire SoC such as deadlock, memory corruption, or undefined behaviour. Using a fault-path traversal algorithm, it is demonstrated that traditional error detection codes are not sufficient at preventing these failures when faults occur on certain AXI bus signals. This led to the development of novel fault tolerant methods that provide protection for these identified signals. Based on these developments, a product was proposed for an add-on to the AXI bus interconnect that can detect, correct, and report logic soft errors without changing the AMBA standard or the customer’s connecting IP

    Radiation Hardened by Design Methodologies for Soft-Error Mitigated Digital Architectures

    Get PDF
    abstract: Digital architectures for data encryption, processing, clock synthesis, data transfer, etc. are susceptible to radiation induced soft errors due to charge collection in complementary metal oxide semiconductor (CMOS) integrated circuits (ICs). Radiation hardening by design (RHBD) techniques such as double modular redundancy (DMR) and triple modular redundancy (TMR) are used for error detection and correction respectively in such architectures. Multiple node charge collection (MNCC) causes domain crossing errors (DCE) which can render the redundancy ineffectual. This dissertation describes techniques to ensure DCE mitigation with statistical confidence for various designs. Both sequential and combinatorial logic are separated using these custom and computer aided design (CAD) methodologies. Radiation vulnerability and design overhead are studied on VLSI sub-systems including an advanced encryption standard (AES) which is DCE mitigated using module level coarse separation on a 90-nm process with 99.999% DCE mitigation. A radiation hardened microprocessor (HERMES2) is implemented in both 90-nm and 55-nm technologies with an interleaved separation methodology with 99.99% DCE mitigation while achieving 4.9% increased cell density, 28.5 % reduced routing and 5.6% reduced power dissipation over the module fences implementation. A DMR register-file (RF) is implemented in 55 nm process and used in the HERMES2 microprocessor. The RF array custom design and the decoders APR designed are explored with a focus on design cycle time. Quality of results (QOR) is studied from power, performance, area and reliability (PPAR) perspective to ascertain the improvement over other design techniques. A radiation hardened all-digital multiplying pulsed digital delay line (DDL) is designed for double data rate (DDR2/3) applications for data eye centering during high speed off-chip data transfer. The effect of noise, radiation particle strikes and statistical variation on the designed DDL are studied in detail. The design achieves the best in class 22.4 ps peak-to-peak jitter, 100-850 MHz range at 14 pJ/cycle energy consumption. Vulnerability of the non-hardened design is characterized and portions of the redundant DDL are separated in custom and auto-place and route (APR). Thus, a range of designs for mission critical applications are implemented using methodologies proposed in this work and their potential PPAR benefits explored in detail.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

    Statistical Reliability Estimation of Microprocessor-Based Systems

    Get PDF
    What is the probability that the execution state of a given microprocessor running a given application is correct, in a certain working environment with a given soft-error rate? Trying to answer this question using fault injection can be very expensive and time consuming. This paper proposes the baseline for a new methodology, based on microprocessor error probability profiling, that aims at estimating fault injection results without the need of a typical fault injection setup. The proposed methodology is based on two main ideas: a one-time fault-injection analysis of the microprocessor architecture to characterize the probability of successful execution of each of its instructions in presence of a soft-error, and a static and very fast analysis of the control and data flow of the target software application to compute its probability of success. The presented work goes beyond the dependability evaluation problem; it also has the potential to become the backbone for new tools able to help engineers to choose the best hardware and software architecture to structurally maximize the probability of a correct execution of the target softwar
    corecore