37 research outputs found

    Scalable Test Generators for High-Speed Datapath Circuits

    Full text link
    This paper explores the design of efficient test sets and test-pattern generators for on-line BIST. The target applications are high-performance, scalable datapath circuits for which fast and complete fault coverage is required. Because of the presence of carry-lookahead, most existing BIST methods are unsuitable for these applications. High-level models are used to identify potential test sets for a small version of the circuit to be tested. Then a regular test set is extracted and a test generator TG is designed to meet the following goals: scalability, small test set size, full fault coverage, and very low hardware overhead. TG takes the form of a twisted ring counter with a small decoder array. We apply our technique to various datapath circuits including a carry-lookahead adder, an arithmetic-logic unit, and a multiplier-adder.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/43010/1/10836_2004_Article_154697.pd

    Innovative Techniques for Testing and Diagnosing SoCs

    Get PDF
    We rely upon the continued functioning of many electronic devices for our everyday welfare, usually embedding integrated circuits that are becoming even cheaper and smaller with improved features. Nowadays, microelectronics can integrate a working computer with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC). SoCs are also employed on automotive safety-critical applications, but need to be tested thoroughly to comply with reliability standards, in particular the ISO26262 functional safety for road vehicles. The goal of this PhD. thesis is to improve SoC reliability by proposing innovative techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals, and GPUs. The proposed approaches in the sequence appearing in this thesis are described as follows: 1. Embedded Memory Diagnosis: Memories are dense and complex circuits which are susceptible to design and manufacturing errors. Hence, it is important to understand the fault occurrence in the memory array. In practice, the logical and physical array representation differs due to an optimized design which adds enhancements to the device, namely scrambling. This part proposes an accurate memory diagnosis by showing the efforts of a software tool able to analyze test results, unscramble the memory array, map failing syndromes to cell locations, elaborate cumulative analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing syndromes were analyzed as case studies gathered on an industrial automotive 32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually, and results were confirmed by real photos taken from a microscope. 2. Functional Test Pattern Generation: The key for a successful test is the pattern applied to the device. They can be structural or functional; the former usually benefits from embedded test modules targeting manufacturing errors and is only effective before shipping the component to the client. The latter, on the other hand, can be applied during mission minimally impacting on performance but is penalized due to high generation time. However, functional test patterns may benefit for having different goals in functional mission mode. Part III of this PhD thesis proposes three different functional test pattern generation methods for CPU cores embedded in SoCs, targeting different test purposes, described as follows: a. Functional Stress Patterns: Are suitable for optimizing functional stress during I Operational-life Tests and Burn-in Screening for an optimal device reliability characterization b. Functional Power Hungry Patterns: Are suitable for determining functional peak power for strictly limiting the power of structural patterns during manufacturing tests, thus reducing premature device over-kill while delivering high test coverage c. Software-Based Self-Test Patterns: Combines the potentiality of structural patterns with functional ones, allowing its execution periodically during mission. In addition, an external hardware communicating with a devised SBST was proposed. It helps increasing in 3% the fault coverage by testing critical Hardly Functionally Testable Faults not covered by conventional SBST patterns. An automatic functional test pattern generation exploiting an evolutionary algorithm maximizing metrics related to stress, power, and fault coverage was employed in the above-mentioned approaches to quickly generate the desired patterns. The approaches were evaluated on two industrial cases developed by STMicroelectronics; 8051-based and a 32-bit Power Architecture SoCs. Results show that generation time was reduced upto 75% in comparison to older methodologies while increasing significantly the desired metrics. 3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices are suitable for generating structural patterns, testing and activating mitigation techniques, and validating robust hardware and software applications. GPGPUs are known for fast parallel computation used in high performance computing and advanced driver assistance where reliability is the key point. Moreover, GPGPU manufacturers do not provide design description code due to content secrecy. Therefore, commercial fault injectors using the GPGPU model is unfeasible, making radiation tests the only resource available, but are costly. In the last part of this thesis, we propose a software implemented fault injector able to inject bit-flip in memory elements of a real GPGPU. It exploits a software debugger tool and combines the C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in program variables. The goal is to validate robust parallel algorithms by studying fault propagation or activating redundancy mechanisms they possibly embed. The effectiveness of the tool was evaluated on two robust applications: redundant parallel matrix multiplication and floating point Fast Fourier Transform

    Fault Detection Methodology for Caches in Reliable Modern VLSI Microprocessors based on Instruction Set Architectures

    Get PDF
    Η παρούσα διδακτορική διατριβή εισάγει μία χαμηλού κόστους μεθοδολογία για την ανίχνευση ελαττωμάτων σε μικρές ενσωματωμένες κρυφές μνήμες που βασίζεται σε σύγχρονες Αρχιτεκτονικές Συνόλου Εντολών και εφαρμόζεται με λογισμικό αυτοδοκιμής. Η προτεινόμενη μεθοδολογία εφαρμόζει αλγορίθμους March μέσω λογισμικού για την ανίχνευση τόσο ελαττωμάτων αποθήκευσης όταν εφαρμόζεται σε κρυφές μνήμες που περιέχουν μόνο στατικές μνήμες τυχαίας προσπέλασης όπως για παράδειγμα κρυφές μνήμες επιπέδου 1, όσο και ελαττωμάτων σύγκρισης όταν εφαρμόζεται σε κρυφές μνήμες που περιέχουν εκτός από SRAM μνήμες και μνήμες διευθυνσιοδοτούμενες μέσω περιεχομένου, όπως για παράδειγμα πλήρως συσχετιστικές κρυφές μνήμες αναζήτησης μετάφρασης. Η προτεινόμενη μεθοδολογία εφαρμόζεται και στις τρεις οργανώσεις συσχετιστικότητας κρυφής μνήμης και είναι ανεξάρτητη της πολιτικής εγγραφής στο επόμενο επίπεδο της ιεραρχίας. Η μεθοδολογία αξιοποιεί υπάρχοντες ισχυρούς μηχανισμούς των μοντέρνων ISAs χρησιμοποιώντας ειδικές εντολές, που ονομάζονται στην παρούσα διατριβή Εντολές Άμεσης Προσπέλασης Κρυφής Μνήμης (Direct Cache Access Instructions - DCAs). Επιπλέον, η προτεινόμενη μεθοδολογία εκμεταλλεύεται τους έμφυτους μηχανισμούς καταγραφής απόδοσης και τους μηχανισμούς χειρισμού παγίδων που είναι διαθέσιμοι στους σύγχρονους επεξεργαστές. Επιπρόσθετα, η προτεινόμενη μεθοδολογία εφαρμόζει την λειτουργία σύγκρισης των αλγορίθμων March όταν αυτή απαιτείται (για μνήμες CAM) και επαληθεύει το αποτέλεσμα του ελέγχου μέσω σύντομης απόκρισης, ώστε να είναι συμβατή με τις απαιτήσεις του ελέγχου εντός λειτουργίας. Τέλος, στη διατριβή προτείνεται μία βελτιστοποίηση της μεθοδολογίας για πολυνηματικές, πολυπύρηνες αρχιτεκτονικές.The present PhD thesis introduces a low cost fault detection methodology for small embedded cache memories that is based on modern Instruction Set Architectures and is applied with Software-Based Self-Test (SBST) routines. The proposed methodology applies March tests through software to detect both storage faults when applied to caches that comprise Static Random Access Memories (SRAM) only, e.g. L1 caches, and comparison faults when applied to caches that apart from SRAM memories comprise Content Addressable Memories (CAM) too, e.g. Translation Lookaside Buffers (TLBs). The proposed methodology can be applied to all three cache associativity organizations: direct mapped, set-associative and full-associative and it does not depend on the cache write policy. The methodology leverages existing powerful mechanisms of modern ISAs by utilizing instructions that we call in this PhD thesis Direct Cache Access (DCA) instructions. Moreover, our methodology exploits the native performance monitoring hardware and the trap handling mechanisms which are available in modern microprocessors. Moreover, the proposed Methodology applies March compare operations when needed (for CAM arrays) and verifies the test result with a compact response to comply with periodic on-line testing needs. Finally, a multithreaded optimization of the proposed methodology that targets multithreaded, multicore architectures is also presented in this thesi

    Resilience of an embedded architecture using hardware redundancy

    Get PDF
    In the last decade the dominance of the general computing systems market has being replaced by embedded systems with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and failure can be profound. Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, these components are expensive, lag behind commercial components with regards to performance and do not provide 100% fault elimination. Without fault tolerant mechanisms, many of these faults can become errors at the application or system level, which in turn, can result in catastrophic failures. In this work we study the concepts of fault tolerance and dependability and extend these concepts providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles and the process that leads to computing failures. We provide extensive taxonomies of 1) existing fault tolerant techniques and of 2) the effects of radiation in state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed model of faults and provide a classification of the different types of faults at various levels. We introduce an algorithm of fault tolerance and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance and power consumption than existing techniques. We propose a new element of the system called syndrome that is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA’s assembler and commercial hardware simulators

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    Get PDF
    This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

    Self-Test Mechanisms for Automotive Multi-Processor System-on-Chips

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Release and Verification of an Operating System for Testing e-Flash on Microcontrollers for Automotive Applications based on Multicore Architecture

    Get PDF
    The cars produced contain an increasing number of electronic devices for active assistance to driving, safety controls, energy efficiency, passenger comfort and entertainment. Safety is the keyword and means to have electronic components high reliability. Infineon microcontroller division works to improve reliability and guarantee the quality of microcontroller flash memories. The thesis goal is to verify the operating system used to test the microcontrollers flash memorie

    Dependable Embedded Systems

    Get PDF
    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems

    Designs for increasing reliability while reducing energy and increasing lifetime

    Get PDF
    In the last decades, the computing technology experienced tremendous developments. For instance, transistors' feature size shrank to half at every two years as consistently from the first time Moore stated his law. Consequently, number of transistors and core count per chip doubles at each generation. Similarly, petascale systems that have the capability of processing more than one billion calculation per second have been developed. As a matter of fact, exascale systems are predicted to be available at year 2020. However, these developments in computer systems face a reliability wall. For instance, transistor feature sizes are getting so small that it becomes easier for high-energy particles to temporarily flip the state of a memory cell from 1-to-0 or 0-to-1. Also, even if we assume that fault-rate per transistor stays constant with scaling, the increase in total transistor and core count per chip will significantly increase the number of faults for future desktop and exascale systems. Moreover, circuit ageing is exacerbated due to increased manufacturing variability and thermal stresses, therefore, lifetime of processor structures are becoming shorter. On the other side, due to the limited power budget of the computer systems such that mobile devices, it is attractive to scale down the voltage. However, when the voltage level scales to beyond the safe margin especially to the ultra-low level, the error rate increases drastically. Nevertheless, new memory technologies such as NAND flashes present only limited amount of nominal lifetime, and when they exceed this lifetime, they can not guarantee storing of the data correctly leading to data retention problems. Due to these issues, reliability became a first-class design constraint for contemporary computing in addition to power and performance. Moreover, reliability even plays increasingly important role when computer systems process sensitive and life-critical information such as health records, financial information, power regulation, transportation, etc. In this thesis, we present several different reliability designs for detecting and correcting errors occurring in processor pipelines, L1 caches and non-volatile NAND flash memories due to various reasons. We design reliability solutions in order to serve three main purposes. Our first goal is to improve the reliability of computer systems by detecting and correcting random and non-predictable errors such as bit flips or ageing errors. Second, we aim to reduce the energy consumption of the computer systems by allowing them to operate reliably at ultra-low voltage level. Third, we target to increase the lifetime of new memory technologies by implementing efficient and low-cost reliability schemes

    Dependable design for low-cost ultra-low-power processors

    Get PDF
    Emerging applications in the Internet of Things (IoT) domain, such as wearables, implantables, smart tags, and wireless sensor networks put severe power, cost, reliability, and security constraints on hardware system design. This dissertation focuses on the architecture and design of dependable ultra-low power computing systems. Specifically, it proposes architecture and design techniques that exploit the unique application and usage characteristics of future computing systems to deliver low power, while meeting the reliability and security constraints of these systems. First, this dissertation considers the challenge of achieving both low power and high reliability in SRAM memories. It proposes both an architectural technique to reduce the overheads of error correction and a technique that uses the nature of error correcting codes to allow lower voltage operation without sacrificing reliability. Next, this dissertation considers low power and low cost. By leveraging the fact that many IoT systems are embedded in nature and will run the same application for their entire lifetime, fine-grained usage characteristics of the hardware-software system can be determined at design time. This dissertation presents a novel hardware-software co-analysis based on symbolic simulation that can determine the possible states of the processor throughout any execution of a specific application. This enables power-gating where more gates are turned off for longer, bespoke processors customized to specific applications, and stricter determination of peak power bounds. Finally, this dissertation considers achieving secure IoT systems at low cost and power overhead. By leveraging the hardware-software co-analysis, this dissertation shows that gate-level information flow security guarantees can be provided without hardware overheads
    corecore