307 research outputs found

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    This research addresses the design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied, and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 × 10^-6, three orders of magnitude better than non-fault-tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10^-6. The required hardware redundancy is approximately 15 times that of a non-fault-tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results.

    Low-Power High-Performance Ternary Content Addressable Memory Circuits

    Ternary content addressable memories (TCAMs) are hardware-based parallel lookup tables with bit-level masking capability. They are attractive for applications such as packet forwarding and classification in network routers. Despite the attractive features of TCAMs, high power consumption is one of the most critical challenges faced by TCAM designers. This work proposes circuit techniques for reducing TCAM power consumption. The main contribution of this work is divided in two parts: (i) reduction in match line (ML) sensing energy, and (ii) static-power reduction techniques. The ML sensing energy is reduced by employing (i) positive-feedback ML sense amplifiers (MLSAs), (ii) low-capacitance comparison logic, and (iii) low-power ML-segmentation techniques. The positive-feedback MLSAs include both resistive and active feedback to reduce the ML sensing energy. A body-bias technique can further improve the feedback action at the expense of additional area and ML capacitance. The measurement results of the active-feedback MLSA show 50-56% reduction in ML sensing energy. The measurement results of the proposed low-capacitance comparison logic show 25% and 42% reductions in ML sensing energy and time, respectively, which can further be improved by careful layout. The low-power ML-segmentation techniques include dual ML TCAM and charge-shared ML. Simulation results of the dual ML TCAM that connects two sides of the comparison logic to two ML segments for sequential sensing show 43% power savings for a small (4%) trade-off in the search speed. The charge-shared ML scheme achieves power savings by partial recycling of the charge stored in the first ML segment. Chip measurement results show that the charge-shared ML scheme results in 11% and 9% reductions in ML sensing time and energy, respectively, which can be improved to 19-25% by using a digitally controlled charge sharing time-window and a slightly modified MLSA. 
The static power reduction is achieved by a dual-VDD technique and low-leakage TCAM cells. The dual-VDD technique trades off the excess noise margin of the MLSA for smaller cell leakage by applying a smaller VDD to the TCAM cells and a larger VDD to the peripheral circuits. The low-leakage TCAM cells trade off the speed of READ and WRITE operations for smaller cell area and leakage. Finally, the design and testing of a complete TCAM chip are presented and compared with other published designs.
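
The lookup behaviour that the abstract attributes to a TCAM (a parallel table search with bit-level masking) can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `tcam_search`, `Entry` and `table` are hypothetical, the loop is sequential whereas the hardware compares all rows at once, and returning the first matching row is one common priority convention rather than something the abstract specifies.

```python
from typing import List, Optional, Tuple

# Each entry is (value, care_mask): a mask bit of 1 means "compare this bit",
# a mask bit of 0 means "don't care" (the masked state of a ternary cell).
Entry = Tuple[int, int]


def tcam_search(table: List[Entry], key: int) -> Optional[int]:
    """Return the index of the first entry that matches key, or None."""
    for index, (value, care_mask) in enumerate(table):
        # A row matches when key and value agree on every cared-for bit.
        if (key ^ value) & care_mask == 0:
            return index
    return None


# Hypothetical 8-bit table, most specific pattern first.
table = [
    (0b10110000, 0b11110000),  # matches 1011xxxx
    (0b10000000, 0b11000000),  # matches 10xxxxxx
    (0b00000000, 0b00000000),  # matches any key (default entry)
]
```

In this model every row is examined on every search, which mirrors why match-line sensing energy dominates in the hardware: each search activates the comparison logic of all rows.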

    Fault Detection Methodology for Caches in Reliable Modern VLSI Microprocessors based on Instruction Set Architectures

    The present PhD thesis introduces a low-cost fault detection methodology for small embedded cache memories that is based on modern Instruction Set Architectures and is applied with Software-Based Self-Test (SBST) routines. 
The proposed methodology applies March tests through software to detect both storage faults, when applied to caches that comprise only Static Random Access Memories (SRAMs), e.g. L1 caches, and comparison faults, when applied to caches that also comprise Content Addressable Memories (CAMs), e.g. fully associative Translation Lookaside Buffers (TLBs). The proposed methodology can be applied to all three cache associativity organizations (direct-mapped, set-associative and fully associative) and does not depend on the cache write policy. The methodology leverages existing powerful mechanisms of modern ISAs by utilizing instructions that this thesis calls Direct Cache Access (DCA) instructions. Moreover, the methodology exploits the native performance monitoring hardware and the trap handling mechanisms which are available in modern microprocessors. In addition, the proposed methodology applies March compare operations when needed (for CAM arrays) and verifies the test result with a compact response to comply with periodic on-line testing needs. Finally, a multithreaded optimization of the proposed methodology that targets multithreaded, multicore architectures is also presented in this thesis.
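
To make the idea of a March test concrete, the following sketch runs the well-known March C- sequence against a simulated bit-wide memory. It is an illustration only: the abstract does not say which March algorithm the thesis uses, and the `FaultyMemory` model, its stuck-at fault and the function name `march_c_minus` are assumptions introduced here. A real SBST routine would access the cache through ISA instructions rather than a Python list.

```python
class FaultyMemory:
    """Bit-wide memory model; optionally one cell is stuck at a fixed value."""

    def __init__(self, size, stuck_cell=None, stuck_value=0):
        self.cells = [0] * size
        self.stuck_cell = stuck_cell
        self.stuck_value = stuck_value

    def write(self, addr, bit):
        # Writes to the stuck cell have no effect on its observed value.
        self.cells[addr] = self.stuck_value if addr == self.stuck_cell else bit

    def read(self, addr):
        return self.stuck_value if addr == self.stuck_cell else self.cells[addr]


def march_c_minus(mem, size):
    """Run March C- and return the sorted addresses that gave a read mismatch."""
    failures = set()
    up = range(size)
    down = range(size - 1, -1, -1)

    def element(order, expect, write_value):
        # One March element: optional read-and-check, then optional write.
        for addr in order:
            if expect is not None and mem.read(addr) != expect:
                failures.add(addr)
            if write_value is not None:
                mem.write(addr, write_value)

    element(up, None, 0)   # M0: w0 (either address order)
    element(up, 0, 1)      # M1: ascending, r0 then w1
    element(up, 1, 0)      # M2: ascending, r1 then w0
    element(down, 0, 1)    # M3: descending, r0 then w1
    element(down, 1, 0)    # M4: descending, r1 then w0
    element(up, 0, None)   # M5: r0 (either address order)
    return sorted(failures)
```

For example, `march_c_minus(FaultyMemory(16, stuck_cell=5, stuck_value=1), 16)` reports address 5, while a fault-free memory yields an empty list.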

    Proposal for the development of 3D Vertically Integrated Pattern Recognition Associative Memory (VIPRAM)


    Innovative Techniques for Testing and Diagnosing SoCs

    We rely upon the continued functioning of many electronic devices for our everyday welfare, usually embedding integrated circuits that are becoming even cheaper and smaller with improved features. Nowadays, microelectronics can integrate a working computer with CPU, memories, and even GPUs on a single die, namely System-On-Chip (SoC). SoCs are also employed on automotive safety-critical applications, but need to be tested thoroughly to comply with reliability standards, in particular the ISO26262 functional safety for road vehicles. The goal of this PhD. thesis is to improve SoC reliability by proposing innovative techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals, and GPUs. The proposed approaches in the sequence appearing in this thesis are described as follows: 1. Embedded Memory Diagnosis: Memories are dense and complex circuits which are susceptible to design and manufacturing errors. Hence, it is important to understand the fault occurrence in the memory array. In practice, the logical and physical array representation differs due to an optimized design which adds enhancements to the device, namely scrambling. This part proposes an accurate memory diagnosis by showing the efforts of a software tool able to analyze test results, unscramble the memory array, map failing syndromes to cell locations, elaborate cumulative analysis, and elaborate a final fault model hypothesis. Several SRAM memory failing syndromes were analyzed as case studies gathered on an industrial automotive 32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually, and results were confirmed by real photos taken from a microscope. 2. Functional Test Pattern Generation: The key for a successful test is the pattern applied to the device. They can be structural or functional; the former usually benefits from embedded test modules targeting manufacturing errors and is only effective before shipping the component to the client. 
The latter, on the other hand, can be applied during mission with minimal impact on performance, but is penalized by high generation time. However, functional test patterns may benefit from having different goals in functional mission mode. Part III of this PhD thesis proposes three functional test pattern generation methods for CPU cores embedded in SoCs, targeting different test purposes, described as follows: a. Functional Stress Patterns: suitable for optimizing functional stress during Operational-life Tests and Burn-in Screening for an optimal device reliability characterization. b. Functional Power Hungry Patterns: suitable for determining functional peak power in order to strictly limit the power of structural patterns during manufacturing tests, thus reducing premature device over-kill while delivering high test coverage. c. Software-Based Self-Test Patterns: combine the potential of structural patterns with functional ones, allowing periodic execution during mission. In addition, external hardware communicating with a devised SBST was proposed. It helps increase the fault coverage by 3% by testing critical Hardly Functionally Testable Faults not covered by conventional SBST patterns. An automatic functional test pattern generation exploiting an evolutionary algorithm that maximizes metrics related to stress, power, and fault coverage was employed in the above-mentioned approaches to quickly generate the desired patterns. The approaches were evaluated on two industrial cases developed by STMicroelectronics: an 8051-based SoC and a 32-bit Power Architecture SoC. Results show that generation time was reduced by up to 75% in comparison to older methodologies while significantly increasing the desired metrics. 3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices are suitable for generating structural patterns, testing and activating mitigation techniques, and validating robust hardware and software applications. 
GPGPUs are known for fast parallel computation used in high performance computing and advanced driver assistance, where reliability is the key point. Moreover, GPGPU manufacturers do not provide design description code due to content secrecy. Therefore, commercial fault injectors that rely on a GPGPU model are unfeasible, leaving radiation tests as the only available resource, and these are costly. In the last part of this thesis, we propose a software-implemented fault injector able to inject bit-flips in memory elements of a real GPGPU. It exploits a software debugger tool and combines the C-CUDA grammar to wisely determine fault spots and apply bit-flip operations in program variables. The goal is to validate robust parallel algorithms by studying fault propagation or activating redundancy mechanisms they possibly embed. The effectiveness of the tool was evaluated on two robust applications: redundant parallel matrix multiplication and floating point Fast Fourier Transform.
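
The bit-flip injection and the redundancy check described in the last part of the abstract can be sketched on the CPU in a few lines. This is not the thesis tool, which works through a debugger on a real GPGPU; the function names `flip_bit_float32`, `matmul` and `redundant_matmul` are hypothetical, and the redundancy scheme shown (compute twice and compare) is simply one plain form of duplication chosen for illustration.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return value with one bit (0 = LSB, 31 = sign) of its float32 encoding flipped."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in 0..31")
    # Reinterpret the float32 as an unsigned 32-bit integer, flip, convert back.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def redundant_matmul(a, b, inject=None):
    """Compute a*b twice and compare the results.

    inject=(row, col, bit) flips one bit of one input element in the second
    copy only, emulating a fault in a program variable.
    Returns (first_result, results_agree).
    """
    first = matmul(a, b)
    a_copy = [row[:] for row in a]
    if inject is not None:
        r, c, bit = inject
        a_copy[r][c] = flip_bit_float32(a_copy[r][c], bit)
    second = matmul(a_copy, b)
    return first, first == second
```

Flipping bit 31 of `1.0` changes only the sign, so `flip_bit_float32(1.0, 31)` gives `-1.0`; injecting that fault into one copy makes the two products disagree, which is the event a redundancy mechanism is meant to catch.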

    Choose-Your-Own Adventure: A Lightweight, High-Performance Approach To Defect And Variation Mitigation In Reconfigurable Logic

    For field-programmable gate arrays (FPGAs), fine-grained pre-computed alternative configurations, combined with simple test-based selection, produce limited per-chip specialization to counter yield loss, increased delay, and increased energy costs that come from fabrication defects and variation. This lightweight approach achieves much of the benefit of knowledge-based full specialization while reducing to practical, palatable levels the computational, testing, and load-time costs that obstruct the application of the knowledge-based approach. In practice this may more than double the power-limited computational capabilities of dies fabricated with 22nm technologies. Contributions of this work:
    • Choose-Your-Own-Adventure (CYA), a novel, lightweight, scalable methodology to achieve defect and variation mitigation
    • Implementation of CYA, including preparatory components (generation of diverse alternative paths) and FPGA load-time components
    • Detailed performance characterization of CYA
        – Comparison to conventional loading and dynamic frequency and voltage scaling (DFVS)
        – Limit studies to characterize the quality of the CYA implementation and identify potential areas for further optimization

    Analysis of material efficiency aspects of personal computers product group

    This report has been developed within the project ‘Technical support for environmental footprinting, material efficiency in product policy and the European Platform on Life Cycle Assessment’ (LCA) (2013-2017) funded by the Directorate-General for Environment. The report summarises the findings of the analysis of material-efficiency aspects of the personal-computer (PC) product group, namely durability, reusability, reparability and recyclability. It also aims to identify material-efficiency aspects which can be relevant for the current revision of the Ecodesign Regulation (EU) No 617/2013. Special focus was given to the content of EU critical raw materials (CRMs) in computers and computer components, and how to increase the efficient use of these materials, including material savings thanks to reuse and repair and recovery of the products at end of life. The analysis has been based mainly on the REAPro method developed by the Joint Research Centre for the material-efficiency assessment of products. This work has been carried out in the period June 2016-September 2017, in parallel with the development of the preparatory study on the review of Regulation 617/2013 (Lot 3) - computers and computer servers, led by Viegand Maagøe and Vlaamse Instelling voor Technologisch Onderzoek NV (VITO) (2017). During this period, close communication was maintained with the authors of the preparatory study. This ensured consistency between the input data and assumptions of the two studies. Moreover, outcomes of the present research were used as scientific basis for the preparatory study for the analysis of material-efficiency aspects for computers. The research has been differentiated as far as possible for different types of computers (i.e. tablets, notebooks and desktop computers). 
The report starts with the analysis of the technical and scientific background relevant for material-efficiency aspects of computers, such as market sales, expected lifetime, bill of materials, and a focus on the content of CRMs (especially cobalt in batteries, rare earths including neodymium in hard disk drives and palladium in printed circuit boards). The report then analyses the current practices for repair, reuse and recycling of computers. Based on results available from the literature, material efficiency of the product group has the potential to be improved, in particular through lifetime extension. The residence time of IT equipment put on the market in 2000 versus 2010 generally declined by approximately 10 % (Huisman et al., 2012), while consumers expressed their preference for durable goods, lasting considerably longer than they are typically used (Wieser and Tröger, 2016). Design barriers (such as difficulties for the disassembly of certain components or for their processing for data sanitisation) can hinder the repair and the reuse of products. Malfunction and accident rates are not negligible (IDC, 2016, 2010; SquareTrade, 2009) and difficulties in repair may lead to damaged products being discarded even if still functioning. Once a computer reaches the end of its useful life, it is sent to ‘waste of electrical and electronic equipment’ (WEEE) recycling plants. Recycling of computers is usually based on a combination of manual dismantling of certain components (mainly components containing hazardous substances or valuable materials, e.g. batteries, printed circuit boards, display panels, data-storage components), followed by mechanical processing including shredding. The recycling of traditional desktop computers is perceived as non-problematic by recyclers, with the exception of some miniaturised new models (i.e. 
mini desktop computers), which still are not found in recycling plants and which could present some difficulties for the extraction of printed circuit boards and batteries (if present). The design of notebooks and tablets can cause some difficulties for the dismantling of batteries, especially for computers with compact design. Recycling of plastics from computers of all types is generally challenging due to the large use of different plastics with additives, such as flame retardants. According to all the interviewed recyclers, recycling of WEEE plastics with flame retardants is very poor or non-existent with current technologies. Building on this analysis, the report then focuses on possible actions to improve material efficiency in computers, namely measures to improve (a) waste prevention, (b) repair and reuse and (c) design for recycling. The possible actions identified are listed hereinafter. (a) Waste prevention a.1 Implementation of dedicated functionality for the optimisation of the lifetime of batteries in notebooks: the lifetime of batteries could be extended by systematically implementing a preinstalled functionality on notebooks, which makes it possible to optimise the state of charge (SoC) of the battery when the device is used in grid operation (stationary). By preventing the battery from remaining at full charge when the notebook is in grid operation, the lifetime of batteries can be potentially extended by up to 50 %. Users could be informed about the existence and characteristics of such a functionality and the potential benefits related to its use. a.2 Decoupling external power supplies (EPS) from personal computers: the provision of information on the EPS specifications and the presence/absence of the EPS in the packaging of notebooks and tablets could facilitate the reuse by the consumer of already-available EPS with suitable characteristics. 
Such a measure could promote the use of common EPS across different devices, as well as the reuse of already-owned EPS. This would result in a reduction in material consumption for the production of unnecessary power supplies (and related packaging and transport) and overall a reduction of treatment of electronic waste. The International Electrotechnical Commission (IEC) technical specification (TS) 62700, the Standard Institute of Electrical and Electronics Engineers (IEEE) 1823 and Recommendation ITU-T L.1002 can be used to develop standards for the correct definition of connectors and power specifications. a.3 Provision of information about the durability of batteries: the analysis identified the existence of endurance tests suitable for the assessment of the durability of batteries in computers according to existing standards (e.g. EN 61960). The availability of information about these endurance tests could help users to get an indication on the residual capacity of the battery after a predefined number of charge/discharge cycles. Moreover, such information would allow for comparison between different products and potentially push the market towards longer-lasting batteries. a.4 Provision of information about the ‘liquid ingress protection (IP) class’ for personal computers: this can be assessed for a notebook or tablet by performing specific tests, developed according to existing standards (e.g. IEC 60529). Users can be informed about the level of protection of the computer against the ingress of liquids (e.g. dripping water or spraying water or water jets) and in this way prevent one of the most common causes of computer failure. The yearly rate of estimated material saving if dedicated functionality for the optimisation of the lifetime of batteries (a.1) were used ranges from around 2 360 to 5 400 tonnes (t) of different materials per year. About 450 t of cobalt, 100 t of lithium, 210 t of nickel and 730 t of copper could be saved every year. 
The estimated potential savings of materials when EPS are decoupled from notebooks and tablets (a.2) are in the range 2 300-4 600 t/year (80 % related to the notebook category, and 20 % to tablets). These values can be obtained when 10-20 % of notebooks and tablets are sold without an EPS, as users can reuse already-owned and compatible EPS. Under these conditions, for example, about 190-370 t of copper can be saved every year. This estimate may increase when the same EPS can be used for both notebooks and tablets (at the moment the assessment is based on the assumption that the two product types were kept separated). Further work is needed to assess the potential improvements thanks to the provision of information about the durability of batteries (a.3), and about the ‘liquid-IP class’ (a.4). The former option (a.3) has the potential to boost competition among battery manufacturers, resulting in more durable products. The latter option (a.4) has the potential to reduce computer damage due to liquid spillage, ranked among the most recurrent failure modes. (b) Repair/reuse b.1 and b.2 Provision of information to facilitate computer disassembly: the disassembly of relevant components (such as the display panel, keyboard, data storage, batteries, memory and internal power-supply units) plays a key role to enhance repair and reuse of personal computers. Some actions have therefore been discussed (b.1) to provide professional repair operators with documentation about the sequence of disassembly, extraction, replacement and reassembly operations needed for each relevant component of personal computers, and (b.2) to provide end-users with specific information about the disassembly and replacement of batteries in notebooks and tablets. 
b.3 Secure data deletion for personal computers: this is the process of deliberately, permanently and irreversibly erasing all traces of existing data from storage media, overwriting the data completely in such a way that access to the original data, or parts of them, becomes infeasible for a given level of effort. Secure data deletion is essential for the security of personal data and to allow the reuse of computers by a different user. Secure data deletion for personal computers can be ensured by means of built-in functionality. A number of existing national standards (HMG IS Standard No 5 in the United Kingdom, DIN 66399 in Germany, NIST 800-88r1 in the United States (US)) can be used as a basis to start standardisation activities on secure data deletion. The estimated potential savings of materials due to the provision of information and tools to facilitate computer disassembly were quantified in the range of 150-620 t/year for mobile computers (notebooks and tablets) within the first 2 years of use, and in the range of 610-2 460 t/year for mobile computers older than 2 years. Secure data deletion of personal computers, instead, is considered a necessary prerequisite to enhance reuse. The need to take action on this is related to policies on privacy and protection of personal data, such as the General Data Protection Regulation (EU) 2016/679 and in particular its Article 25 on ‘data protection by design and by default’. Future work is needed to strengthen the analysis; however, it was estimated that secure data deletion has the potential to double the volume of desktop, notebook and tablet computers reused after the first useful lifetime. (c) Recyclability c.1 Provision of information to facilitate computer dismantling: computers could be designed so that crucial components for material aspects (e.g. content of hazardous substances and/or valuable materials) can be easily identified and extracted in order to be processed by means of specific recycling treatments. 
Design for dismantling can focus on components listed in Annex VII of the WEEE directive. The ‘ease of dismantling’ can be supported by the provision of relevant information (such as a diagram of the product showing the location of the components, the content of hazardous substances, instructions on the sequence of operations needed to remove these components, including type and number of fastening techniques to be unlocked, and tool(s) required). c.2 Marking of plastic components: although all plastics are theoretically recyclable, in practice the recyclability of plastics in computers is generally low, mainly due to the large amount of different plastic components with flame retardants (FRs) and other additives. Marking of plastic components according to existing standards (e.g. ISO 11469 and ISO 1043 series) can facilitate identification and sorting of plastic components during the manual dismantling steps of the recycling. c.3 FR content: according to all the recyclers interviewed, FRs are a major barrier to plastics recycling. Current mechanical-sorting processes of shredded plastics are characterised by low efficiency, while innovative sorting systems are still at the pilot stage and have been shown to be effective only in certain cases. Therefore, the provision of information on the content of FRs in plastic components is a first step to contribute to the improvement of plastics recycling. Plastics marking (as discussed above) can contribute to the separation of plastics with FRs during the manual dismantling, allowing for their recycling at higher rates (in line with the prescription of IEC/TR 62635, 2015). However, detailed information about FRs content could be given in a more systematised way, for example through the development of specific indexes. These indexes could support recyclers in checking the use of FRs in computers and in developing future processes and technologies suitable for plastics recycling. 
Moreover, these indexes could support policymakers in monitoring the use of FRs in the products and, in the medium-long term, in promoting products that use smaller quantities of FRs. An example of a FR content index is provided in this report. c.4 Battery marks: the identification of the chemistry type of batteries in computers is necessary in order to have efficient identification and sorting, and thus to improve the material efficiency during the recycling. It is proposed to start standardisation activities to establish standard marking symbols for batteries. The examples of the ‘battery-recycle mark’, developed by the Battery Association of Japan (BAJ), and the current standardisation activities for the IEC 62902 (standard marking symbols for batteries with a volume higher than 900 cm3) may be used as references to develop ad hoc standards. The benefits of actions for the design for recycling can be relevant. In particular, the proposed actions should contribute to increasing the amounts of materials that will be recycled (6 350-8 900 t/year), in particular plastics (5 950-7 960 t/year of additional plastics), but also metals such as cobalt (55-110 t), copper (240-610 t), rare earths such as neodymium and dysprosium (2-7 t) and various precious metals (gold (0.1-0.4 t), palladium (0.1-0.4 t) and silver (2-7 t)). Compared to the amount of materials recycled in the EU (2012 data), these values would represent a recycling increase of 1-2 % for cobalt, 2-5 % for palladium, and 13-50 % for rare earths. JRC.D.3-Land Resource