44 research outputs found
Recommended from our members
Efficient verification/testing of system-on-chip through fault grading and analog behavioral modeling
textThis dissertation presents several cost-effective production test solutions using fault grading and mixed-signal design verification cases enabled by analog behavioral modeling. Although the latest System-on-Chip (SOC) is getting denser, faster, and more complex, the manufacturing technology is dominated by subtle defects that are introduced by small-scale technology. Thus, SOC requires more mature testing strategies. By performing various types of testing, better quality SoC can be manufactured, but test resources are too limited to accommodate all those tests. To create the most efficient production test flow, any redundant or ineffective tests need to be removed or minimized.
Chapter 3 proposes new method of test data volume reduction by combining the nonlinear property of feedback shift register (FSR) and dictionary coding. Instead of using the nonlinear FSR for actual hardware implementation, the expanded test set by nonlinear expansion is used as the one-column test sets and provides big reduction ratio for the test data volume. The experimental results show the combined method reduced the total test data volume and increased the fault coverage. Due to the increased number of test patterns, total test time is increased.
Chapter 4 addresses a whole process of functional fault grading. Fault grading has always been a ”desire-to-have” flow because it can bring up significant value for cost saving and yield analysis. However, it is very hard to perform the fault grading on the complex large scale SOC. A commercial tool called Z01X is used as a fault grading platform, and whole fault grading process is coordinated and each detailed execution is performed. Simulation- based functional fault grading identifies the quality of the given functional tests against the static faults and transition delay faults. With the structural tests and functional tests, functional fault grading can indicate the way to achieve the same test coverage by spending minimal test time. Compared to the consumed time and resource for fault grading, the contribution to the test time saving might not be acceptable as very promising, but the fault grading data can be reused for yield analysis and test flow optimization. For the final production testing, confident decisions on the functional test selection can be made based on the fault grading results.
Chapter 5 addresses the challenges of Package-on-Package (POP) testing. Because POP devices have pins on both the top and the bottom of the package, the increased test pins require more test channels to detect packaging defects. Boundary scan chain testing is used to detect those continuity defects by relying on leakage current from the power supply. This proposed test scheme does not require direct test channels on the top pins. Based on the counting algorithm, minimal numbers of test cycles are generated, and the test achieved full test coverage for any combinations of pin-to-pin shortage defects on the top pins of the POP package. The experimental results show about 10 times increased leakage current from the shorted defect. Also, it can be expanded to multi-site testing with less test channels for high-volume production.
Fault grading is applied within different structural test categories in Chapter 6. Stuck-at faults can be considered as TDFs having infinite delay. Hence, the TDF Automatic Test Pattern Generation (ATPG) tests can detect both TDFs and stuck-at faults. By removing the stuck-at faults being detected by the given TDF ATPG tests, the tests that target stuck-at faults can be reduced, and the reduced stuck-at fault set results in fewer stuck-at ATPG patterns. The structural test time is reduced while keeping the same test coverage. This TDF grading is performed with the same ATPG tool used to generate the stuck-at and TDF ATPG tests.
To expedite the mixed-signal design verification of complex SoC, analog behavioral modeling methods and strategies are addressed in Chapter 7 and case studies for detailed verification with actual mixed-signal design are ad- dressed in Chapter 8. Analog modeling effort can enhance verification quality for a mixed-signal design with less turnaround time, and it enables compatible integration of the mixed-signal design cores into the SoC. The modeling process may reveal any potential design errors or incorrect testbench setup, and it results in minimizing unnecessary debugging time for quality devices.
Two mixed-signal design cases were verified by me using the analog models. A fully hierarchical digital-to-analog converter (DAC) model is implemented and silicon mismatches caused by process variation are modeled and inserted into the DAC model, and the calibration algorithm for the DAC is successfully verified by model-based simulation at the full DAC-level. When the mismatch amount is increased and exceeded the calibration capability of the DAC, the simulation results show the increased calibration error with some outliers. This verification method can identify the saturation range of the DAC and predict the yield of the devices from process variation.
A phase-locked loop (PLL) design cases were also verified by me using the analog model. Both open-loop PLL model and closed-loop PLL model cases are presented. Quick bring-up of open-loop PLL model provides low simulation overhead for widely-used PLLs in the SOC and enables early starting of design verification for the upper-level design using the PLL generated clocks. Accurate closed-loop PLL model is implemented for DCO-based PLL design, and the mixed-simulation with analog models and schematic designs enables flexible analog verification. Only focused analog design block is set to the schematic design and the rest of the analog design is replaced by the analog model. Then, this scaled-down SPICE simulation is performed about 10 times to 100 times faster than full-scale SPICE simulation. The analog model of the focused block is compared with the scaled-down SPICE simulation result and the quality of the model is iteratively enhanced. Hence, the analog model enables both compatible integration and flexible analog design verification.
This dissertation contributes to reduce test time and to enhance test quality, and helps to set up efficient production testing flows. Depending on the size and performance of CUT, proper testing schemes can maximize the efficiency of production testing. The topics covered in this dissertation can be used in optimizing the test flow and selecting the final production tests to achieve maximum test capability. In addition, the strategies and benefits of analog behavioral modeling techniques that I implemented are presented, and actual verification cases shows the effectiveness of analog modeling for better quality SoC products.Electrical and Computer Engineerin
PRITEXT: Processor Reliability Improvement Through Exercise Technique
With continuous improvements in CMOS technology, transistor sizes are shrinking aggressively every year. Unfortunately, such deep submicron process technologies are severely degraded by several wearout mechanisms which lead to prolonged operational stress and failure. Negative Bias Temperature Instability (NBTI) is a prominent failure mechanism which degrades the reliability of current semiconductor devices. Improving reliability of processors is necessary for ensuring long operational lifetime which obviates the necessity of mitigating the physical wearout mechanisms. NBTI severely degrades the performance of PMOS transistors in a circuit, when negatively biased, by increasing the threshold voltage leading to critical timing failures over operational lifetime. A lack of activity among the PMOS transistors for long duration leads to a steady increase in threshold voltage Vth. Interestingly, NBTI stress can be recovered by removing the negative bias using appropriate input vectors. Exercising the dormant critical components in the Processor has been proved to reduce the NBTI stress. We use a novel methodology to generate a minimal set of deterministic input vectors which we show to be effective in reducing the NBTI wearout in a superscalar processor core. We then propose and evaluate a new technique PRITEXT, which uses these input vectors in exercise mode to effectively reduce the NBTI stress and improve the operational lifetime of superscalar processors. PRITEXT, which uses Input Vector Control, leads to a 4.5x lifetime improvement of superscalar processor on average with a maximum lifetime improvement of 12.7x
Design and test for timing uncertainty in VLSI circuits.
由於特徵尺寸不斷縮小,集成電路在生產過程中的工藝偏差在運行環境中溫度和電壓等參數的波動以及在使用過程中的老化等效應越來越嚴重,導致芯片的時序行為出現很大的不確定性。多數情況下,芯片的關鍵路徑會不時出現時序錯誤。加入更多的時序餘量不是一種很好的解決方案,因為這種保守的設計方法會抵消工藝進步帶來的性能上的好處。這就為設計一個時序可靠的系統提出了極大的挑戰,其中的一些關鍵問題包括:(一)如何有效地分配有限的功率預算去優化那些正爆炸式增加的關鍵路徑的時序性能;(二)如何產生能夠捕捉準確的最壞情況時延的高品質測試向量;(三)為了能夠取得更好的功耗和性能上的平衡,我們將不得不允許芯片在使用過程中出現一些頻率很低的時序錯誤。隨之而來的問題是如何做到在線的檢錯和糾錯。為了解決上述問題,我們首先發明了一種新的技術用於識別所謂的虛假路徑,該方法使我們能夠發現比傳統方法更多的虛假路徑。當將所提取的虛假路徑集成到靜態時序分析工具里以後,我們可以得到更為準確的時序分析結果,同時也能節省本來用於優化這些路徑的成本。接著,考慮到現有的延時自動向量生成(ATPG) 方法會產生功能模式下無法出現的測試向量,這種向量可能會造成測試過程中在被激活的路徑周圍出現過多(或過少)的電源噪聲(PSN) ,從而導致測試過度或者測試不足情況。為此,我們提出了一種新的偽功能ATPG工具。通過同時考慮功能約束以及電路的物理佈局信息,我們使用類似ATPG 的算法產生狀態跳變使其能最大化已激活的路徑周圍的PSN影響。最後,基於近似電路的原理,我們提出了一種新的在線原位校正技術,即InTimeFix,用於糾正時序錯誤。由於實現近似電路的綜合僅需要簡單的電路結構分析,因此該技術能夠很容易的擴展到大型電路設計上去。With technology scaling, integrated circuits (ICs) suffer from increasing process, voltage, and temperature (PVT) variations and aging effects. In most cases, these reliability threats manifest themselves as timing errors on speed-paths (i.e., critical or near-critical paths) of the circuit. Embedding a large design guard band to prevent timing errors to occur is not an attractive solution, since this conservative design methodology diminishes the benefit of technology scaling. This creates several challenges on build a reliable systems, and the key problems include (i) how to optimize circuit’s timing performance with limited power budget for explosively increased potential speed-paths; (ii) how to generate high quality delay test pattern to capture ICs’ accurate worst-case delay; (iii) to have better power and performance tradeoff, we have to accept some infrequent timing errors in circuit’s the usage phase. Therefore, the question is how to achieve online timing error resilience.To address the above issues, we first develop a novel technique to identify so-called false paths, which facilitate us to find much more false paths than conventional methods. By integrating our identified false paths into static timing analysis tool, we are able to achieve more accurate timing information and also save the cost used to optimize false paths. Then, due to the fact that existing delay automated test pattern generation (ATPG) methods may generate test patterns that are functionally-unreachable, and such patterns may incur excessive (or limited) power supply noise (PSN) on sensitized paths in test mode, thus leading to over-testing or under-testing of the circuits, we propose a novel pseudo-functional ATPG tool. By taking both circuit layout information and functional constrains into account, we use ATPG like algorithm to justify transitions that pose the maximized functional PSN effects on sensitized critical paths. Finally, we propose a novel in-situ correction technique to mask timing errors, namely InTimeFix, by introducing redundant approximation circuit with more timing slack for speed-paths into the design. The synthesis of the approximation circuit relies on simple structural analysis of the original circuit, which is easily scalable to large IC designs.Detailed summary in vernacular field only.Detailed summary in vernacular field only.Yuan, Feng.Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.Includes bibliographical references (leaves 88-100).Abstract also in Chinese.Abstract --- p.iAcknowledgement --- p.ivChapter 1 --- Introduction --- p.1Chapter 1.1 --- Challenges to Solve Timing Uncertainty Problem --- p.2Chapter 1.2 --- Contributions and Thesis Outline --- p.5Chapter 2 --- Background --- p.7Chapter 2.1 --- Sources of Timing Uncertainty --- p.7Chapter 2.1.1 --- Process Variation --- p.7Chapter 2.1.2 --- Runtime Environment Fluctuation --- p.9Chapter 2.1.3 --- Aging Effect --- p.10Chapter 2.2 --- Technical Flow to Solve Timing Uncertainty Problem --- p.10Chapter 2.3 --- False Path --- p.12Chapter 2.3.1 --- Path Sensitization Criteria --- p.12Chapter 2.3.2 --- False Path Aware Timing Analysis --- p.13Chapter 2.4 --- Manufacturing Testing --- p.14Chapter 2.4.1 --- Functional Testing vs. Structural Testing --- p.14Chapter 2.4.2 --- Scan-Based DfT --- p.15Chapter 2.4.3 --- Pseudo-Functional Testing --- p.17Chapter 2.5 --- Timing Error Tolerance --- p.19Chapter 2.5.1 --- Timing Error Detection --- p.19Chapter 2.5.2 --- Timing Error Recover --- p.20Chapter 3 --- Timing-Independent False Path Identification --- p.23Chapter 3.1 --- Introduction --- p.23Chapter 3.2 --- Preliminaries and Motivation --- p.26Chapter 3.2.1 --- Motivation --- p.27Chapter 3.3 --- False Path Examination Considering Illegal States --- p.28Chapter 3.3.1 --- Path Sensitization Criterion --- p.28Chapter 3.3.2 --- Path-Aware Illegal State Identification --- p.30Chapter 3.3.3 --- Proposed Examination Procedure --- p.31Chapter 3.4 --- False Path Identification --- p.32Chapter 3.4.1 --- Overall Flow --- p.34Chapter 3.4.2 --- Static Implication Learning --- p.35Chapter 3.4.3 --- Suspicious Node Extraction --- p.36Chapter 3.4.4 --- S-Frontier Propagation --- p.37Chapter 3.5 --- Experimental Results --- p.38Chapter 3.6 --- Conclusion and Future Work --- p.42Chapter 4 --- PSN Aware Pseudo-Functional Delay Testing --- p.43Chapter 4.1 --- Introduction --- p.43Chapter 4.2 --- Preliminaries and Motivation --- p.45Chapter 4.2.1 --- Motivation --- p.46Chapter 4.3 --- Proposed Methodology --- p.48Chapter 4.4 --- Maximizing PSN Effects under Functional Constraints --- p.50Chapter 4.4.1 --- Pseudo-Functional Relevant Transitions Generation --- p.51Chapter 4.5 --- Experimental Results --- p.59Chapter 4.5.1 --- Experimental Setup --- p.59Chapter 4.5.2 --- Results and Discussion --- p.60Chapter 4.6 --- Conclusion --- p.64Chapter 5 --- In-Situ Timing Error Masking in Logic Circuits --- p.65Chapter 5.1 --- Introduction --- p.65Chapter 5.2 --- Prior Work and Motivation --- p.67Chapter 5.3 --- In-Situ Timing Error Masking with Approximate Logic --- p.69Chapter 5.3.1 --- Equivalent Circuit Construction with Approximate Logic --- p.70Chapter 5.3.2 --- Timing Error Masking with Approximate Logic --- p.72Chapter 5.4 --- Cost-Efficient Synthesis for InTimeFix --- p.75Chapter 5.4.1 --- Overall Flow --- p.76Chapter 5.4.2 --- Prime Critical Segment Extraction --- p.77Chapter 5.4.3 --- Prime Critical Segment Merging --- p.79Chapter 5.5 --- Experimental Results --- p.81Chapter 5.5.1 --- Experimental Setup --- p.81Chapter 5.5.2 --- Results and Discussion --- p.82Chapter 5.6 --- Conclusion --- p.85Chapter 6 --- Conclusion and Future Work --- p.86Bibliography --- p.10
Testability and redundancy techniques for improved yield and reliability of CMOS VLSI circuits
The research presented in this thesis is concerned with the design of fault-tolerant integrated circuits as a contribution to the design of fault-tolerant systems. The economical manufacture of very large area ICs will necessitate the incorporation of fault-tolerance features which are routinely employed in current high density dynamic random access memories. Furthermore, the growing use of ICs in safety-critical applications and/or hostile environments in addition to the prospect of single-chip systems will mandate the use of fault-tolerance for improved reliability. A fault-tolerant IC must be able to detect and correct all possible faults that may affect its operation. The ability of a chip to detect its own faults is not only necessary for fault-tolerance, but it is also regarded as the ultimate solution to the problem of testing. Off-line periodic testing is selected for this research because it achieves better coverage of physical faults and it requires less extra hardware than on-line error detection techniques. Tests for CMOS stuck-open faults are shown to detect all other faults. Simple test sequence generation procedures for the detection of all faults are derived. The test sequences generated by these procedures produce a trivial output, thereby, greatly simplifying the task of test response analysis. A further advantage of the proposed test generation procedures is that they do not require the enumeration of faults. The implementation of built-in self-test is considered and it is shown that the hardware overhead is comparable to that associated with pseudo-random and pseudo-exhaustive techniques while achieving a much higher fault coverage through-the use of the proposed test generation procedures. The consideration of the problem of testing the test circuitry led to the conclusion that complete test coverage may be achieved if separate chips cooperate in testing each other's untested parts. An alternative approach towards complete test coverage would be to design the test circuitry so that it is as distributed as possible and so that it is tested as it performs its function. Fault correction relies on the provision of spare units and a means of reconfiguring the circuit so that the faulty units are discarded. This raises the question of what is the optimum size of a unit? A mathematical model, linking yield and reliability is therefore developed to answer such a question and also to study the effects of such parameters as the amount of redundancy, the size of the additional circuitry required for testing and reconfiguration, and the effect of periodic testing on reliability. The stringent requirement on the size of the reconfiguration logic is illustrated by the application of the model to a typical example. Another important result concerns the effect of periodic testing on reliability. It is shown that periodic off-line testing can achieve approximately the same level of reliability as on-line testing, even when the time between tests is many hundreds of hours
QBF with Soft Variables
QBF formulae are usually considered in prenex form, i.e. the quantifierblock is completely separated from the propositional part of the QBF.Among others, the semantics of the QBF is defined by the sequence ofthe variables within the prefix, where existentially quantifiedvariables depend on all universally quantified variables stated to theleft.In this paper we extend that classical definition and consider a newquantification type which we call soft variable. The idea is toallow a flexible position and quantifier type for these variables.Hence the type of quantifier of the soft variable can also bealtered. Based on this concept, we present an optimization problemseeking an optimal prefix as defined by user-given preferences. We statean algorithm based on MaxQBF, and present several applications – mainlyfrom verification area – which can be naturally translated into theoptimization problem for QBF with soft variables. We further implementeda prototype solver for this formalism, and compare our approach toprevious work, that differently from ours does not guarantee optimalityand completeness
Resilience of an embedded architecture using hardware redundancy
In the last decade the dominance of the general computing systems market has being replaced by embedded systems with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and failure can be profound.
Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, these components are expensive, lag behind commercial components with regards to performance and do not provide 100% fault elimination. Without fault tolerant mechanisms, many of these faults can become errors at the application or system level, which in turn, can result in catastrophic failures.
In this work we study the concepts of fault tolerance and dependability and
extend these concepts providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles and the process that leads to computing failures. We provide extensive taxonomies of 1) existing fault tolerant techniques and of 2) the effects of radiation in state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed model of faults and provide a classification of the different types of faults at various levels. We introduce an algorithm of fault tolerance and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance and power consumption than existing techniques. We propose a new element of the system called syndrome that is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA’s assembler and commercial hardware simulators
Fault Detection Methodology for Caches in Reliable Modern VLSI Microprocessors based on Instruction Set Architectures
Η παρούσα διδακτορική διατριβή εισάγει μία χαμηλού κόστους μεθοδολογία για την
ανίχνευση ελαττωμάτων σε μικρές ενσωματωμένες κρυφές μνήμες που βασίζεται σε
σύγχρονες Αρχιτεκτονικές Συνόλου Εντολών και εφαρμόζεται με λογισμικό
αυτοδοκιμής. Η προτεινόμενη μεθοδολογία εφαρμόζει αλγορίθμους March μέσω
λογισμικού για την ανίχνευση τόσο ελαττωμάτων αποθήκευσης όταν εφαρμόζεται σε
κρυφές μνήμες που περιέχουν μόνο στατικές μνήμες τυχαίας προσπέλασης όπως για
παράδειγμα κρυφές μνήμες επιπέδου 1, όσο και ελαττωμάτων σύγκρισης όταν
εφαρμόζεται σε κρυφές μνήμες που περιέχουν εκτός από SRAM μνήμες και μνήμες
διευθυνσιοδοτούμενες μέσω περιεχομένου, όπως για παράδειγμα πλήρως
συσχετιστικές κρυφές μνήμες αναζήτησης μετάφρασης. Η προτεινόμενη μεθοδολογία
εφαρμόζεται και στις τρεις οργανώσεις συσχετιστικότητας κρυφής μνήμης και είναι
ανεξάρτητη της πολιτικής εγγραφής στο επόμενο επίπεδο της ιεραρχίας. Η
μεθοδολογία αξιοποιεί υπάρχοντες ισχυρούς μηχανισμούς των μοντέρνων ISAs
χρησιμοποιώντας ειδικές εντολές, που ονομάζονται στην παρούσα διατριβή Εντολές
Άμεσης Προσπέλασης Κρυφής Μνήμης (Direct Cache Access Instructions - DCAs).
Επιπλέον, η προτεινόμενη μεθοδολογία εκμεταλλεύεται τους έμφυτους μηχανισμούς
καταγραφής απόδοσης και τους μηχανισμούς χειρισμού παγίδων που είναι διαθέσιμοι
στους σύγχρονους επεξεργαστές. Επιπρόσθετα, η προτεινόμενη μεθοδολογία
εφαρμόζει την λειτουργία σύγκρισης των αλγορίθμων March όταν αυτή απαιτείται
(για μνήμες CAM) και επαληθεύει το αποτέλεσμα του ελέγχου μέσω σύντομης
απόκρισης, ώστε να είναι συμβατή με τις απαιτήσεις του ελέγχου εντός
λειτουργίας. Τέλος, στη διατριβή προτείνεται μία βελτιστοποίηση της
μεθοδολογίας για πολυνηματικές, πολυπύρηνες αρχιτεκτονικές.The present PhD thesis introduces a low cost fault detection methodology for
small embedded cache memories that is based on modern Instruction Set
Architectures and is applied with Software-Based Self-Test (SBST) routines. The
proposed methodology applies March tests through software to detect both
storage faults when applied to caches that comprise Static Random Access
Memories (SRAM) only, e.g. L1 caches, and comparison faults when applied to
caches that apart from SRAM memories comprise Content Addressable Memories
(CAM) too, e.g. Translation Lookaside Buffers (TLBs). The proposed methodology
can be applied to all three cache associativity organizations: direct mapped,
set-associative and full-associative and it does not depend on the cache write
policy. The methodology leverages existing powerful mechanisms of modern ISAs
by utilizing instructions that we call in this PhD thesis Direct Cache Access
(DCA) instructions. Moreover, our methodology exploits the native performance
monitoring hardware and the trap handling mechanisms which are available in
modern microprocessors. Moreover, the proposed Methodology applies March
compare operations when needed (for CAM arrays) and verifies the test result
with a compact response to comply with periodic on-line testing needs. Finally,
a multithreaded optimization of the proposed methodology that targets
multithreaded, multicore architectures is also presented in this thesi
Fault-Tolerant Computing: An Overview
Coordinated Science Laboratory was formerly known as Control Systems LaboratoryNASA / NAG-1-613Semiconductor Research Corporation / 90-DP-109Joint Services Electronics Program / N00014-90-J-127