657 research outputs found

    Error Mitigation Using Approximate Logic Circuits: A Comparison of Probabilistic and Evolutionary Approaches

    Get PDF
    Technology scaling poses an increasing challenge to the reliability of digital circuits. Hardware redundancy solutions, such as triple modular redundancy (TMR), produce very high area overhead, so partial redundancy is often used to reduce the overheads. Approximate logic circuits provide a general framework for optimized mitigation of errors arising from a broad class of failure mechanisms, including transient, intermittent, and permanent failures. However, generating an optimal redundant logic circuit that is able to mask the faults with the highest probability while minimizing the area overheads is a challenging problem. In this study, we propose and compare two new approaches to generate approximate logic circuits to be used in a TMR schema. The probabilistic approach approximates a circuit in a greedy manner based on a probabilistic estimation of the error. The evolutionary approach can provide radically different solutions that are hard to reach by other methods. By combining these two approaches, the solution space can be explored in depth. Experimental results demonstrate that the evolutionary approach can produce better solutions, but the probabilistic approach is close. On the other hand, these approaches provide much better scalability than other existing partial redundancy techniques.This work was supported by the Ministry of Economy and Competitiveness of Spain under project ESP2015-68245-C4-1-P, and by the Czech science foundation project GA16-17538S and the Ministry of Education, Youth and Sports of the Czech Republic from the National Programme of Sustainability (NPU II); project IT4Innovations excellence in science - LQ1602

    Conception, optimisation, et vérification formelle de techniques de tolérance aux fautes pour circuits

    Get PDF
    Technology shrinking and voltage scaling increase the risk of fault occurrences in digital circuits. To address this challenge, engineers use fault-tolerance techniques to mask or, at least, to detect faults. These techniques are especially needed in safety critical domains (e.g., aerospace, medical, nuclear, etc.), where ensuring the circuit functionality and fault-tolerance is crucial. However, the verification of functional and fault-tolerance properties is a complex problem that cannot be solved with simulation-based methodologies due to the need to check a huge number of executions and fault occurrence scenarios. The optimization of the overheads imposed by fault-tolerance techniques also requires the proof that the circuit keeps its fault-tolerance properties after the optimization.In this work, we propose a verification-based optimization of existing fault-tolerance techniques as well as the design of new techniques and their formal verification using theorem proving. We first investigate how some majority voters can be removed from Triple-Modular Redundant (TMR) circuits without violating their fault-tolerance properties. The developed methodology clarifies how to take into account circuit native error-masking capabilities that may exist due to the structure of the combinational part or due to the way the circuit is used and communicates with the surrounding device.Second, we propose a family of time-redundant fault-tolerance techniques as automatic circuit transformations. They require less hardware resources than TMR alternatives and could be easily integrated in EDA tools. The transformations are based on the novel idea of dynamic time redundancy that allows the redundancy level to be changed "on-the-fly" without interrupting the computation. Therefore, time-redundancy can be used only in critical situations (e.g., above Earth poles where the radiation level is increased), during the processing of crucial data (e.g., the encryption of selected data), or during critical processes (e.g., a satellite computer reboot).Third, merging dynamic time redundancy with a micro-checkpointing mechanism, we have created a double-time redundancy transformation capable of masking transient faults. Our technique makes the recovery procedure transparent and the circuit input/output behavior remains unchanged even under faults. Due to the complexity of that method and the need to provide full assurance of its fault-tolerance capabilities, we have formally certified the technique using the Coq proof assistant. The developed proof methodology can be applied to certify other fault-tolerance techniques implemented through circuit transformations at the netlist level.La miniaturisation de la gravure et l'ajustement dynamique du voltage augmentent le risque de fautes dans les circuits intégrés. Pour pallier cet inconvénient, les ingénieurs utilisent des techniques de tolérance aux fautes pour masquer ou, au moins, détecter les fautes. Ces techniques sont particulièrement utilisées dans les domaines critiques (aérospatial, médical, nucléaire, etc.) où les garanties de bon fonctionnement des circuits et leurs tolérance aux fautes sont cruciales. Cependant, la vérification de propriétés fonctionnelles et de tolérance aux fautes est un problème complexe qui ne peut être résolu par simulation en raison du grand nombre d'exécutions possibles et de scénarios d'occurrence des fautes. De même, l'optimisation des surcoûts matériels ou temporels imposés par ces techniques demande de garantir que le circuit conserve ses propriétés de tolérance aux fautes après optimisation.Dans cette thèse, nous décrivons une optimisation de techniques de tolérance aux fautes classiques basée sur des analyses statiques, ainsi que de nouvelles techniques basées sur la redondance temporelle. Nous présentons comment leur correction peut être vérifiée formellement à l'aide d'un assistant de preuves.Nous étudions d'abord comment certains voteurs majoritaires peuvent être supprimés des circuits basés sur la redondance matérielle triple (TMR) sans violer leurs propriétés de tolérance. La méthodologie développée prend en compte les particularités des circuits (par ex. masquage logique d'erreurs) et des entrées/sorties pour optimiser la technique TMR.Deuxièmement, nous proposons une famille de techniques utilisant la redondance temporelle comme des transformations automatiques de circuits. Elles demandent moins de ressources matérielles que TMR et peuvent être facilement intégrés dans les outils de CAO. Les transformations sont basées sur une nouvelle idée de redondance temporelle dynamique qui permet de modifier le niveau de redondance «à la volée» sans interrompre le calcul. Le niveau de redondance peut être augmenté uniquement dans les situations critiques (par exemple, au-dessus des pôles où le niveau de rayonnement est élevé), lors du traitement de données cruciales (par exemple, le cryptage de données sensibles), ou pendant des processus critiques (par exemple, le redémarrage de l'ordinateur d'un satellite).Troisièmement, en associant la redondance temporelle dynamique avec un mécanisme de micro-points de reprise, nous proposons une transformation avec redondance temporelle double capable de masquer les fautes transitoires. La procédure de recouvrement est transparente et le comportement entrée/sortie du circuit reste identique même lors d'occurrences de fautes. En raison de la complexité de cette méthode, la garantie totale de sa correction a nécessité une certification formelle en utilisant l'assistant de preuves Coq. La méthodologie développée peut être appliquée pour certifier d'autres techniques de tolérance aux fautes exprimées comme des transformations de circuits

    Fast and accurate SER estimation for large combinational blocks in early stages of the design

    Get PDF
    Soft Error Rate (SER) estimation is an important challenge for integrated circuits because of the increased vulnerability brought by technology scaling. This paper presents a methodology to estimate in early stages of the design the susceptibility of combinational circuits to particle strikes. In the core of the framework lies MASkIt , a novel approach that combines signal probabilities with technology characterization to swiftly compute the logical, electrical, and timing masking effects of the circuit under study taking into account all input combinations and pulse widths at once. Signal probabilities are estimated applying a new hybrid approach that integrates heuristics along with selective simulation of reconvergent subnetworks. The experimental results validate our proposed technique, showing a speedup of two orders of magnitude in comparison with traditional fault injection estimation with an average estimation error of 5 percent. Finally, we analyze the vulnerability of the Decoder, Scheduler, ALU, and FPU of an out-of-order, superscalar processor design.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness and Feder Funds under grant TIN2013-44375-R, by the Generalitat de Catalunya under grant FI-DGR 2016, and by the FP7 program of the EU under contract FP7-611404 (CLERECO).Peer ReviewedPostprint (author's final draft

    Single event upset hardened embedded domain specific reconfigurable architecture

    Get PDF

    Analysis and Design of Resilient VLSI Circuits

    Get PDF
    The reliable operation of Integrated Circuits (ICs) has become increasingly difficult to achieve in the deep sub-micron (DSM) era. With continuously decreasing device feature sizes, combined with lower supply voltages and higher operating frequencies, the noise immunity of VLSI circuits is decreasing alarmingly. Thus, VLSI circuits are becoming more vulnerable to noise effects such as crosstalk, power supply variations and radiation-induced soft errors. Among these noise sources, soft errors (or error caused by radiation particle strikes) have become an increasingly troublesome issue for memory arrays as well as combinational logic circuits. Also, in the DSM era, process variations are increasing at an alarming rate, making it more difficult to design reliable VLSI circuits. Hence, it is important to efficiently design robust VLSI circuits that are resilient to radiation particle strikes and process variations. The work presented in this dissertation presents several analysis and design techniques with the goal of realizing VLSI circuits which are tolerant to radiation particle strikes and process variations. This dissertation consists of two parts. The first part proposes four analysis and two design approaches to address radiation particle strikes. The analysis techniques for the radiation particle strikes include: an approach to analytically determine the pulse width and the pulse shape of a radiation induced voltage glitch in combinational circuits, a technique to model the dynamic stability of SRAMs, and a 3D device-level analysis of the radiation tolerance of voltage scaled circuits. Experimental results demonstrate that the proposed techniques for analyzing radiation particle strikes in combinational circuits and SRAMs are fast and accurate compared to SPICE. Therefore, these analysis approaches can be easily integrated in a VLSI design flow to analyze the radiation tolerance of such circuits, and harden them early in the design flow. From 3D device-level analysis of the radiation tolerance of voltage scaled circuits, several non-intuitive observations are made and correspondingly, a set of guidelines are proposed, which are important to consider to realize radiation hardened circuits. Two circuit level hardening approaches are also presented to harden combinational circuits against a radiation particle strike. These hardening approaches significantly improve the tolerance of combinational circuits against low and very high energy radiation particle strikes respectively, with modest area and delay overheads. The second part of this dissertation addresses process variations. A technique is developed to perform sensitizable statistical timing analysis of a circuit, and thereby improve the accuracy of timing analysis under process variations. Experimental results demonstrate that this technique is able to significantly reduce the pessimism due to two sources of inaccuracy which plague current statistical static timing analysis (SSTA) tools. Two design approaches are also proposed to improve the process variation tolerance of combinational circuits and voltage level shifters (which are used in circuits with multiple interacting power supply domains), respectively. The variation tolerant design approach for combinational circuits significantly improves the resilience of these circuits to random process variations, with a reduction in the worst case delay and low area penalty. The proposed voltage level shifter is faster, requires lower dynamic power and area, has lower leakage currents, and is more tolerant to process variations, compared to the best known previous approach. In summary, this dissertation presents several analysis and design techniques which significantly augment the existing work in the area of resilient VLSI circuit design

    Fully Automated Radiation Hardened by Design Circuit Construction

    Get PDF
    abstract: A fully automated logic design methodology for radiation hardened by design (RHBD) high speed logic using fine grained triple modular redundancy (TMR) is presented. The hardening techniques used in the cell library are described and evaluated, with a focus on both layout techniques that mitigate total ionizing dose (TID) and latchup issues and flip-flop designs that mitigate single event transient (SET) and single event upset (SEU) issues. The base TMR self-correcting master-slave flip-flop is described and compared to more traditional hardening techniques. Additional refinements are presented, including testability features that disable the self-correction to allow detection of manufacturing defects. The circuit approach is validated for hardness using both heavy ion and proton broad beam testing. For synthesis and auto place and route, the methodology and circuits leverage commercial logic design automation tools. These tools are glued together with custom CAD tools designed to enable easy conversion of standard single redundant hardware description language (HDL) files into hardened TMR circuitry. The flow allows hardening of any synthesizable logic at clock frequencies comparable to unhardened designs and supports standard low-power techniques, e.g. clock gating and supply voltage scaling.Dissertation/ThesisPh.D. Electrical Engineering 201

    Reliable chip design from low powered unreliable components

    Get PDF
    The pace of technological improvement of the semiconductor market is driven by Moore’s Law, enabling chip transistor density to double every two years. The transistors would continue to decline in cost and size but increase in power. The continuous transistor scaling and extremely lower power constraints in modern Very Large Scale Integrated(VLSI) chips can potentially supersede the benefits of the technology shrinking due to reliability issues. As VLSI technology scales into nanoscale regime, fundamental physical limits are approached, and higher levels of variability, performance degradation, and higher rates of manufacturing defects are experienced. Soft errors, which traditionally affected only the memories, are now also resulting in logic circuit reliability degradation. A solution to these limitations is to integrate reliability assessment techniques into the Integrated Circuit(IC) design flow. This thesis investigates four aspects of reliability driven circuit design: a)Reliability estimation; b) Reliability optimization; c) Fault-tolerant techniques, and d) Delay degradation analysis. To guide the reliability driven synthesis and optimization of combinational circuits, highly accurate probability based reliability estimation methodology christened Conditional Probabilistic Error Propagation(CPEP) algorithm is developed to compute the impact of gate failures on the circuit output. CPEP guides the proposed rewriting based logic optimization algorithm employing local transformations. The main idea behind this methodology is to replace parts of the circuit with functionally equivalent but more reliable counterparts chosen from a precomputed subset of Negation-Permutation-Negation(NPN) classes of 4-variable functions. Cut enumeration and Boolean matching driven by reliability-aware optimization algorithm are used to identify the best possible replacement candidates. Experiments on a set of MCNC benchmark circuits and 8051 functional microcontroller units indicate that the proposed framework can achieve up to 75% reduction of output error probability. On average, about 14% SER reduction is obtained at the expense of very low area overhead of 6.57% that results in 13.52% higher power consumption. The next contribution of the research describes a novel methodology to design fault tolerant circuitry by employing the error correction codes known as Codeword Prediction Encoder(CPE). Traditional fault tolerant techniques analyze the circuit reliability issue from a static point of view neglecting the dynamic errors. In the context of communication and storage, the study of novel methods for reliable data transmission under unreliable hardware is an increasing priority. The idea of CPE is adapted from the field of forward error correction for telecommunications focusing on both encoding aspects and error correction capabilities. The proposed Augmented Encoding solution consists of computing an augmented codeword that contains both the codeword to be transmitted on the channel and extra parity bits. A Computer Aided Development(CAD) framework known as CPE simulator is developed providing a unified platform that comprises a novel encoder and fault tolerant LDPC decoders. Experiments on a set of encoders with different coding rates and different decoders indicate that the proposed framework can correct all errors under specific scenarios. On average, about 1000 times improvement in Soft Error Rate(SER) reduction is achieved. Last part of the research is the Inverse Gaussian Distribution(IGD) based delay model applicable to both combinational and sequential elements for sub-powered circuits. The Probability Density Function(PDF) based delay model accurately captures the delay behavior of all the basic gates in the library database. The IGD model employs these necessary parameters, and the delay estimation accuracy is demonstrated by evaluating multiple circuits. Experiments results indicate that the IGD based approach provides a high matching against HSPICE Monte Carlo simulation results, with an average error less than 1.9% and 1.2% for the 8-bit Ripple Carry Adder(RCA), and 8-bit De-Multiplexer(DEMUX) and Multiplexer(MUX) respectively

    Soft-Error Resilience Framework For Reliable and Energy-Efficient CMOS Logic and Spintronic Memory Architectures

    Get PDF
    The revolution in chip manufacturing processes spanning five decades has proliferated high performance and energy-efficient nano-electronic devices across all aspects of daily life. In recent years, CMOS technology scaling has realized billions of transistors within large-scale VLSI chips to elevate performance. However, these advancements have also continually augmented the impact of Single-Event Transient (SET) and Single-Event Upset (SEU) occurrences which precipitate a range of Soft-Error (SE) dependability issues. Consequently, soft-error mitigation techniques have become essential to improve systems\u27 reliability. Herein, first, we proposed optimized soft-error resilience designs to improve robustness of sub-micron computing systems. The proposed approaches were developed to deliver energy-efficiency and tolerate double/multiple errors simultaneously while incurring acceptable speed performance degradation compared to the prior work. Secondly, the impact of Process Variation (PV) at the Near-Threshold Voltage (NTV) region on redundancy-based SE-mitigation approaches for High-Performance Computing (HPC) systems was investigated to highlight the approach that can realize favorable attributes, such as reduced critical datapath delay variation and low speed degradation. Finally, recently, spin-based devices have been widely used to design Non-Volatile (NV) elements such as NV latches and flip-flops, which can be leveraged in normally-off computing architectures for Internet-of-Things (IoT) and energy-harvesting-powered applications. Thus, in the last portion of this dissertation, we design and evaluate for soft-error resilience NV-latching circuits that can achieve intriguing features, such as low energy consumption, high computing performance, and superior soft errors tolerance, i.e., concurrently able to tolerate Multiple Node Upset (MNU), to potentially become a mainstream solution for the aerospace and avionic nanoelectronics. Together, these objectives cooperate to increase energy-efficiency and soft errors mitigation resiliency of larger-scale emerging NV latching circuits within iso-energy constraints. In summary, addressing these reliability concerns is paramount to successful deployment of future reliable and energy-efficient CMOS logic and spintronic memory architectures with deeply-scaled devices operating at low-voltages

    Cross-layer Soft Error Analysis and Mitigation at Nanoscale Technologies

    Get PDF
    This thesis addresses the challenge of soft error modeling and mitigation in nansoscale technology nodes and pushes the state-of-the-art forward by proposing novel modeling, analyze and mitigation techniques. The proposed soft error sensitivity analysis platform accurately models both error generation and propagation starting from a technology dependent device level simulations all the way to workload dependent application level analysis

    Elasticity and Petri nets

    Get PDF
    Digital electronic systems typically use synchronous clocks and primarily assume fixed duration of their operations to simplify the design process. Time elastic systems can be constructed either by replacing the clock with communication handshakes (asynchronous version) or by augmenting the clock with a synchronous version of a handshake (synchronous version). Time elastic systems can tolerate static and dynamic changes in delays (asynchronous case) or latencies (synchronous case) of operations that can be used for modularity, ease of reuse and better power-delay trade-off. This paper describes methods for the modeling, performance analysis and optimization of elastic systems using Marked Graphs and their extensions capable of describing behavior with early evaluation. The paper uses synchronous elastic systems (aka latency-tolerant systems) for illustrating the use of Petri nets, however, most of the methods can be applied without changes (except changing the delay model associated with events of the system) to asynchronous elastic systems.Peer ReviewedPostprint (author's final draft
    corecore