44 research outputs found

    PRITEXT: Processor Reliability Improvement Through Exercise Technique

    With continuous improvements in CMOS technology, transistor sizes shrink aggressively every year. Unfortunately, devices built in such deep submicron process technologies are severely degraded by several wearout mechanisms, which lead to failure under prolonged operational stress. Negative Bias Temperature Instability (NBTI) is a prominent failure mechanism that degrades the reliability of current semiconductor devices. Improving the reliability of processors is necessary to ensure a long operational lifetime, which in turn requires mitigating the physical wearout mechanisms. NBTI severely degrades the performance of PMOS transistors in a circuit when they are negatively biased: it increases the threshold voltage, leading to critical timing failures over the operational lifetime. A lack of activity among the PMOS transistors for a long duration leads to a steady increase in the threshold voltage Vth. Interestingly, NBTI stress can be recovered by removing the negative bias using appropriate input vectors. Exercising the dormant critical components in the processor has been shown to reduce NBTI stress. We use a novel methodology to generate a minimal set of deterministic input vectors, which we show to be effective in reducing NBTI wearout in a superscalar processor core. We then propose and evaluate a new technique, PRITEXT, which uses these input vectors in exercise mode to effectively reduce NBTI stress and improve the operational lifetime of superscalar processors. PRITEXT, which uses Input Vector Control, leads to a 4.5x lifetime improvement for superscalar processors on average, with a maximum lifetime improvement of 12.7x.

    Design and test for timing uncertainty in VLSI circuits.

    With technology scaling, integrated circuits (ICs) suffer from increasing process, voltage, and temperature (PVT) variations and aging effects. In most cases, these reliability threats manifest themselves as timing errors on speed-paths (i.e., critical or near-critical paths) of the circuit. Embedding a large design guard band to prevent timing errors from occurring is not an attractive solution, since this conservative design methodology diminishes the benefit of technology scaling. This creates several challenges in building reliable systems. The key problems include: (i) how to optimize a circuit's timing performance under a limited power budget, given the explosively growing number of potential speed-paths; (ii) how to generate high-quality delay test patterns that capture an IC's accurate worst-case delay; and (iii) how to achieve online timing error resilience, since a better power and performance tradeoff requires accepting some infrequent timing errors during the circuit's usage phase.

    To address these issues, we first develop a novel technique to identify so-called false paths, which enables us to find many more false paths than conventional methods. By integrating the identified false paths into a static timing analysis tool, we obtain more accurate timing information and also save the cost otherwise spent optimizing false paths. Next, existing delay automated test pattern generation (ATPG) methods may generate test patterns that are functionally unreachable, and such patterns may incur excessive (or limited) power supply noise (PSN) on sensitized paths in test mode, leading to over-testing or under-testing of the circuits. We therefore propose a novel pseudo-functional ATPG tool. By taking both circuit layout information and functional constraints into account, we use an ATPG-like algorithm to justify transitions that pose the maximized functional PSN effects on sensitized critical paths. Finally, we propose a novel in-situ correction technique to mask timing errors, namely InTimeFix, which introduces a redundant approximation circuit with more timing slack for speed-paths into the design. The synthesis of the approximation circuit relies on simple structural analysis of the original circuit, which is easily scalable to large IC designs.

    Yuan, Feng. Thesis (Ph.D.)--Chinese University of Hong Kong, 2012. Includes bibliographical references (leaves 88-100). Abstract also in Chinese. Detailed summary in vernacular field only.

    Contents:
    Abstract
    Acknowledgement
    Chapter 1: Introduction
        1.1 Challenges to Solve Timing Uncertainty Problem
        1.2 Contributions and Thesis Outline
    Chapter 2: Background
        2.1 Sources of Timing Uncertainty
            2.1.1 Process Variation
            2.1.2 Runtime Environment Fluctuation
            2.1.3 Aging Effect
        2.2 Technical Flow to Solve Timing Uncertainty Problem
        2.3 False Path
            2.3.1 Path Sensitization Criteria
            2.3.2 False Path Aware Timing Analysis
        2.4 Manufacturing Testing
            2.4.1 Functional Testing vs. Structural Testing
            2.4.2 Scan-Based DfT
            2.4.3 Pseudo-Functional Testing
        2.5 Timing Error Tolerance
            2.5.1 Timing Error Detection
            2.5.2 Timing Error Recover
    Chapter 3: Timing-Independent False Path Identification
        3.1 Introduction
        3.2 Preliminaries and Motivation
            3.2.1 Motivation
        3.3 False Path Examination Considering Illegal States
            3.3.1 Path Sensitization Criterion
            3.3.2 Path-Aware Illegal State Identification
            3.3.3 Proposed Examination Procedure
        3.4 False Path Identification
            3.4.1 Overall Flow
            3.4.2 Static Implication Learning
            3.4.3 Suspicious Node Extraction
            3.4.4 S-Frontier Propagation
        3.5 Experimental Results
        3.6 Conclusion and Future Work
    Chapter 4: PSN Aware Pseudo-Functional Delay Testing
        4.1 Introduction
        4.2 Preliminaries and Motivation
            4.2.1 Motivation
        4.3 Proposed Methodology
        4.4 Maximizing PSN Effects under Functional Constraints
            4.4.1 Pseudo-Functional Relevant Transitions Generation
        4.5 Experimental Results
            4.5.1 Experimental Setup
            4.5.2 Results and Discussion
        4.6 Conclusion
    Chapter 5: In-Situ Timing Error Masking in Logic Circuits
        5.1 Introduction
        5.2 Prior Work and Motivation
        5.3 In-Situ Timing Error Masking with Approximate Logic
            5.3.1 Equivalent Circuit Construction with Approximate Logic
            5.3.2 Timing Error Masking with Approximate Logic
        5.4 Cost-Efficient Synthesis for InTimeFix
            5.4.1 Overall Flow
            5.4.2 Prime Critical Segment Extraction
            5.4.3 Prime Critical Segment Merging
        5.5 Experimental Results
            5.5.1 Experimental Setup
            5.5.2 Results and Discussion
        5.6 Conclusion
    Chapter 6: Conclusion and Future Work
    Bibliography

    Testability and redundancy techniques for improved yield and reliability of CMOS VLSI circuits

    The research presented in this thesis is concerned with the design of fault-tolerant integrated circuits as a contribution to the design of fault-tolerant systems. The economical manufacture of very large area ICs will necessitate the incorporation of fault-tolerance features which are routinely employed in current high density dynamic random access memories. Furthermore, the growing use of ICs in safety-critical applications and/or hostile environments, in addition to the prospect of single-chip systems, will mandate the use of fault-tolerance for improved reliability. A fault-tolerant IC must be able to detect and correct all possible faults that may affect its operation. The ability of a chip to detect its own faults is not only necessary for fault-tolerance, but is also regarded as the ultimate solution to the problem of testing. Off-line periodic testing is selected for this research because it achieves better coverage of physical faults and requires less extra hardware than on-line error detection techniques. Tests for CMOS stuck-open faults are shown to detect all other faults. Simple test sequence generation procedures for the detection of all faults are derived. The test sequences generated by these procedures produce a trivial output, thereby greatly simplifying the task of test response analysis. A further advantage of the proposed test generation procedures is that they do not require the enumeration of faults. The implementation of built-in self-test is considered, and it is shown that the hardware overhead is comparable to that associated with pseudo-random and pseudo-exhaustive techniques while achieving a much higher fault coverage through the use of the proposed test generation procedures. The consideration of the problem of testing the test circuitry led to the conclusion that complete test coverage may be achieved if separate chips cooperate in testing each other's untested parts. An alternative approach towards complete test coverage would be to design the test circuitry so that it is as distributed as possible and so that it is tested as it performs its function. Fault correction relies on the provision of spare units and a means of reconfiguring the circuit so that the faulty units are discarded. This raises the question of what the optimum size of a unit is. A mathematical model linking yield and reliability is therefore developed to answer this question and also to study the effects of such parameters as the amount of redundancy, the size of the additional circuitry required for testing and reconfiguration, and the effect of periodic testing on reliability. The stringent requirement on the size of the reconfiguration logic is illustrated by the application of the model to a typical example. Another important result concerns the effect of periodic testing on reliability. It is shown that periodic off-line testing can achieve approximately the same level of reliability as on-line testing, even when the time between tests is many hundreds of hours.

    QBF with Soft Variables

    QBF formulae are usually considered in prenex form, i.e. the quantifier block is completely separated from the propositional part of the QBF. Among others, the semantics of the QBF is defined by the sequence of the variables within the prefix, where existentially quantified variables depend on all universally quantified variables stated to the left. In this paper we extend that classical definition and consider a new quantification type which we call soft variable. The idea is to allow a flexible position and quantifier type for these variables. Hence the type of quantifier of the soft variable can also be altered. Based on this concept, we present an optimization problem seeking an optimal prefix as defined by user-given preferences. We state an algorithm based on MaxQBF, and present several applications, mainly from the verification area, which can be naturally translated into the optimization problem for QBF with soft variables. We further implemented a prototype solver for this formalism, and compare our approach to previous work which, unlike ours, does not guarantee optimality and completeness.
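The dependence of prenex QBF semantics on the order of the prefix can be illustrated with a minimal sketch. This is not the paper's MaxQBF-based algorithm, only a naive recursive evaluator; the function name `eval_qbf` and the prefix encoding as `(quantifier, variable)` pairs are assumptions made for illustration.

```python
def eval_qbf(prefix, matrix, assignment=None):
    """Evaluate a prenex QBF by exhaustive expansion (exponential, illustration only).

    prefix: list of (quantifier, variable) pairs, outermost quantifier first,
            where quantifier is "forall" or "exists" (hypothetical encoding).
    matrix: function mapping a {variable: bool} dict to a bool.
    """
    assignment = dict(assignment or {})
    if not prefix:
        return matrix(assignment)
    (quantifier, variable), rest = prefix[0], prefix[1:]
    # Evaluate the remaining formula under both values of the current variable.
    branches = [
        eval_qbf(rest, matrix, {**assignment, variable: value})
        for value in (False, True)
    ]
    return all(branches) if quantifier == "forall" else any(branches)


def matrix(a):
    # Propositional part: x <-> y
    return a["x"] == a["y"]


# y is stated to the right of x, so y may depend on x: the formula is true.
forall_exists = eval_qbf([("forall", "x"), ("exists", "y")], matrix)
# y is stated to the left of x, so one y must work for every x: the formula is false.
exists_forall = eval_qbf([("exists", "y"), ("forall", "x")], matrix)
```

The same matrix evaluates differently under the two prefixes, which is why moving a variable within the prefix, or changing its quantifier, gives rise to a meaningful optimization problem.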

    Resilience of an embedded architecture using hardware redundancy

    In the last decade, the dominance of general computing systems in the market has been replaced by embedded systems, with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and the consequences of failure can be profound. Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, these components are expensive, lag behind commercial components with regard to performance, and do not provide 100% fault elimination. Without fault tolerant mechanisms, many of these faults can become errors at the application or system level, which in turn can result in catastrophic failures. In this work we study the concepts of fault tolerance and dependability and extend these concepts, providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles and the process that leads to computing failures. We provide extensive taxonomies of 1) existing fault tolerant techniques and 2) the effects of radiation in state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed model of faults and provide a classification of the different types of faults at various levels. We introduce an algorithm of fault tolerance and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance and power consumption than existing techniques. We propose a new element of the system called syndrome that is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA’s assembler and commercial hardware simulators.

    Fault Detection Methodology for Caches in Reliable Modern VLSI Microprocessors based on Instruction Set Architectures

    The present PhD thesis introduces a low-cost fault detection methodology for small embedded cache memories that is based on modern Instruction Set Architectures (ISAs) and is applied with Software-Based Self-Test (SBST) routines. The proposed methodology applies March tests through software to detect both storage faults, when applied to caches that comprise Static Random Access Memories (SRAM) only, e.g. L1 caches, and comparison faults, when applied to caches that comprise Content Addressable Memories (CAM) in addition to SRAM, e.g. fully associative Translation Lookaside Buffers (TLBs). The proposed methodology can be applied to all three cache associativity organizations (direct mapped, set-associative and fully associative), and it does not depend on the cache write policy. The methodology leverages existing powerful mechanisms of modern ISAs by utilizing instructions that we call in this PhD thesis Direct Cache Access (DCA) instructions. Moreover, our methodology exploits the native performance monitoring hardware and the trap handling mechanisms which are available in modern microprocessors. In addition, the proposed methodology applies March compare operations when needed (for CAM arrays) and verifies the test result with a compact response to comply with periodic on-line testing needs. Finally, an optimization of the proposed methodology that targets multithreaded, multicore architectures is also presented in this thesis.
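To make the idea of a March test concrete, here is a minimal sketch of the well-known March C- algorithm run against a simulated bit-addressable memory. The abstract does not say which March algorithm the thesis uses, so March C- is an assumption, as are the `FaultyMemory` fixture and the `run_march` helper. In an actual SBST routine the reads and writes would be issued by processor instructions against the cache arrays, not against a Python list.

```python
class FaultyMemory:
    """Simulated 1-bit-per-cell memory with optional stuck-at cells (hypothetical fixture)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}  # maps address -> bit value the cell is stuck at

    def write(self, addr, bit):
        # A stuck-at cell ignores the written value.
        self.cells[addr] = self.stuck_at.get(addr, bit)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


# March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}.
# The first and last elements may use either address order; "up" is used here.
MARCH_C_MINUS = [
    ("up", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up", ["r0"]),
]


def run_march(memory, size, elements=MARCH_C_MINUS):
    """Apply each March element to every address; return sorted addresses with read mismatches."""
    failing = set()
    for order, ops in elements:
        addresses = range(size) if order == "up" else range(size - 1, -1, -1)
        for addr in addresses:
            for op in ops:
                bit = int(op[1])
                if op[0] == "w":
                    memory.write(addr, bit)
                elif memory.read(addr) != bit:
                    failing.add(addr)
    return sorted(failing)


fault_free = run_march(FaultyMemory(8), 8)
stuck_cell = run_march(FaultyMemory(8, stuck_at={5: 1}), 8)
```

In this sketch the fault-free memory reports no failing addresses, while the cell stuck at 1 is caught by the first `r0` read at address 5.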

    Fault-Tolerant Computing: An Overview

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. NASA / NAG-1-613; Semiconductor Research Corporation / 90-DP-109; Joint Services Electronics Program / N00014-90-J-127