10 research outputs found

    Low overhead Soft Error Mitigation techniques for high-performance and aggressive systems

    Get PDF
    The threat of soft error induced system failure in high performance computing systems has become more prominent, as we adopt ultra-deep submicron process technologies. In this paper, we propose two techniques, namely soft error mitigation (SEM) and soft and timing error mitigation (STEM), for protecting combinational logic blocks from soft errors. Our first technique (SEM), based on distributed and temporal voting of three registers, unloads the soft error detection overhead from the critical path of the systems. Our second technique (STEM) adds timing error detection capability to guarantee reliable execution in aggressively clocked designs that enhance system performance by operating beyond worst-case clock frequency. We also present a specialized low overhead clock generation scheme that ably supports our proposed techniques. Timing annotated gate level simulations, using 45 nm libraries, of a pipelined adder-multiplier and DLX processor show that both our techniques achieve near 100% fault coverage. For DLX processor, even under severe fault injection campaigns, SEM achieves an average performance improvement of 26.58% over a conventional triple modular redundancy voter based soft error mitigation scheme, while STEM outperforms SEM by 27.42%

    On the deployment of on-chip noise sensors

    Get PDF
    The relentless technology scaling has led to significantly reduced noise margin and complicated functionalities. As such, design time techniques per se are less likely to ensure power integrity, resulting in runtime voltage emergencies. To alleviate the issue, recently several works have shed light on the possibilities of dynamic noise management systems. Most of these works rely on on-chip noise sensors to accurately capture voltage emergencies. However, they all assume that the placement of the sensors is given. It remains an open problem in the literature how to optimally place a given number of noise sensors for best voltage emergency detection. The problem of noise sensor placement is defined at first along with a novel sensing quality metric (SQM) to be maximized. The threshold voltage for noise sensors to report emergencies serves as a critical tuning knob between the system failure rate and false alarms. The problem of minimizing the system alarm rate subject to a given system failure rate constraint is formulated. It is further shown that with the help of IDDQ measurements during testing which reveal process variation information, it is possible and efficient to compute a per-chip optimal threshold voltage threshold. In the third chapter, a novel framework to predict the resonance frequency using existing on-chip noise sensors, based on the theory of 1-bit compressed sensing is proposed. The proposed framework can help to achieve the resonance frequency of individual chips so as to effectively avoid resonance noise at runtime --Abstract, page iii

    Single event upset hardened embedded domain specific reconfigurable architecture

    Get PDF

    Methods and architectures based on modular redundancy for fault-tolerant combinational circuits

    Get PDF
    Dans cette thèse, nous nous intéressons à la recherche d architectures fiables pour les circuits logiques. Par fiable , nous entendons des architectures permettant le masquage des fautes et les rendant de ce fait tolérantes" à ces fautes. Les solutions pour la tolérance aux fautes sont basées sur la redondance, d où le surcoût qui y est associé. La redondance peut être mise en oeuvre de différentes manières : statique ou dynamique, spatiale ou temporelle. Nous menons cette recherche en essayant de minimiser tant que possible le surcoût matériel engendré par le mécanisme de tolérance aux fautes. Le travail porte principalement sur les solutions de redondance modulaire, mais certaines études développées sont beaucoup plus générales.In this thesis, we mainly take into account the representative technique Triple Module Redundancy (TMR) as the reliability improvement technique. A voter is an necessary element in this kind of fault-tolerant architectures. The importance of reliability in majority voter is due to its application in both conventional fault-tolerant design and novel nanoelectronic systems. The property of a voter is therefore a bottleneck since it directly determines the whole performance of a redundant fault-tolerant digital IP (such as a TMR configuration). Obviously, the efficacy of TMR is to increase the reliability of digital IP. However, TMR sometimes could result in worse reliability than a simplex function module could. A better understanding of functional and signal reliability characteristics of a 3-input majority voter (majority voting in TMR) is studied. We analyze them by utilizing signal probability and boolean difference. It is well known that the acquisition of output signal probabilities is much easier compared with the obtention of output reliability. The results derived in this thesis proclaim the signal probability requirements for inputs of majority voter, and thereby reveal the conditions that TMR technique requires. This study shows the critical importance of error characteristics of majority voter, as used in fault-tolerant designs. As the flawlessness of majority voter in TMR is not true, we also proposed a fault-tolerant and simple 2-level majority voter structure for TMR. This alternative architecture for majority voter is useful in TMR schemes. The proposed solution is robust to single fault and exceeds those previous ones in terms of reliability.PARIS-Télécom ParisTech (751132302) / SudocSudocFranceF

    Cost Reduction and Evaluation of a Temporary Faults Detecting Technique

    No full text
    : IC technologies are approaching the ultimate limits of silicon in terms of channel width, power supply and speed. By approaching these limits, circuits are becoming increasingly sensitive to noise, which will result on unacceptable rates of soft-errors. Furthermore, defect behavior is becoming increasingly complex resulting on increasing number of timing faults that can escape detection by fabrication testing. Thus, fault tolerant techniques will become necessary even for commodity applications. This work considers the implementation and improvements of a new soft error and timing error detecting technique based on time redundancy. Arithmetic circuits were used as test vehicle to validate the approach. Simulations and performance evaluations of the proposed detection technique were made using time and logic simulators. The obtained results show that detection of such temporal faults can be achieved by means of meaningful hardware and performance cost. 1

    Cost Reduction and Evaluation of a Temporary Faults Detecting Technique

    No full text
    IC technologies are approaching the ultimate limits of silicon in terms of channel width, power supply and speed. By approaching these limits, circuits are becoming increasingly sensitive to noise, which will result on unacceptable rates of soft-errors. Furthermore, defect behavior is becoming increasingly complex resulting on increasing number of timing faults that can escape detection by fabrication testing. Thus, fault tolerant techniques will become necessary even for commodity applications. This work considers the implementation and improvements of a new soft error and timing error detecting technique based on time redundancy. Arithmetic circuits were used as test vehicle to validate the approach. Simulations and performance evaluations of the proposed detection technique were made using time and logic simulators. The obtained results show that detection of such temporal faults can be achieved by means of meaningful hardware and performance cost

    Cost reduction and evaluation of a temporary faults detecting technique

    No full text
    corecore