194 research outputs found

    Techniques for Aging, Soft Errors and Temperature to Increase the Reliability of Embedded On-Chip Systems

    This thesis investigates the challenge of providing an abstracted, yet sufficiently accurate reliability estimation for embedded on-chip systems. It also proposes new techniques to increase the reliability of register files within processors against aging effects and soft errors. Finally, it introduces a novel thermal measurement setup that clearly captures infrared images of modern multi-core processors

    On microarchitectural mechanisms for cache wearout reduction

    Hot carrier injection (HCI) and bias temperature instability (BTI) are two of the main deleterious effects that increase a transistor's threshold voltage over the lifetime of a microprocessor. This voltage degradation causes slower transistor switching and eventually can result in faulty operation. HCI manifests itself when transistors switch from logic "0" to "1" and vice versa, whereas BTI is the result of a transistor maintaining the same logic value for an extended period of time. These failure mechanisms are especially acute in those transistors used to implement the SRAM cells of first-level (L1) caches, which are frequently accessed, so they are critical to performance, and they are continuously aging. This paper focuses on microarchitectural solutions to reduce transistor aging effects induced by both HCI and BTI in the data array of L1 data caches. First, we show that the majority of cell flips are concentrated in a small number of specific bits within each data word. We also build upon previous studies showing that logic "0" is the most frequently written value in a cache, by identifying which cells hold a given logic value for a significant amount of time. Based on these observations, this paper introduces a number of architectural techniques that spread the number of flips evenly across memory cells and reduce the amount of time that logic "0" values are stored in the cells by switching… This work was supported in part by the Spanish Ministerio de Economía y Competitividad within the Plan E Funds under Grant TIN2015-66972-C5-1-R, in part by the HiPEAC Collaboration Grant funded by the FP7 HiPEAC Network of Excellence under Grant 287759, and in part by the Engineering and Physical Sciences Research Council under Grant EP/K026399/1 and Grant EP/J016284/1

    Reliable Software for Unreliable Hardware - A Cross-Layer Approach

    A novel cross-layer reliability analysis, modeling, and optimization approach is proposed in this thesis that leverages multiple layers in the system design abstraction (i.e. hardware, compiler, system software, and application program) to exploit the available reliability enhancing potential at each system layer and to exchange this information across multiple system layers

    Reliability in the face of variability in nanometer embedded memories

    In this thesis, we have investigated the impact of parametric variations on the behaviour of one performance-critical processor structure: embedded memories. As variations manifest as a spread in power and performance, as a first step, we propose a novel modeling methodology that helps evaluate the impact of circuit-level optimizations on architecture-level design choices. Choices made at the design stage ensure that conflicting requirements from higher levels are decoupled. We then complement such design-time optimizations with a runtime mechanism that takes advantage of adaptive body-biasing to lower power whilst improving performance in the presence of variability. Our proposal uses novel fully-digital variation tracking hardware based on embedded DRAM (eDRAM) cells to monitor run-time changes in cache latency and leakage. A special fine-grain body-bias generator uses the measurements to generate the optimal body-bias needed to meet the required yield targets. A novel variation-tolerant and soft-error hardened eDRAM cell is also proposed as an alternative candidate for replacing existing SRAM-based designs in latency-critical memory structures. In the ultra low-power domain, where reliable operation is limited by the minimum voltage of operation (Vddmin), we analyse the impact of failures on cache functional margin and functional yield. Towards this end, we have developed a fully automated tool (INFORMER) capable of estimating memory-wide metrics such as power, performance and yield accurately and rapidly. Using the developed tool, we then evaluate the effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Having a holistic perspective of memory-wide metrics helps us arrive at design choices optimized simultaneously for multiple metrics needed for maintaining lifetime requirements

    Improving the Reliability of Microprocessors under BTI and TDDB Degradations

    Reliability is a fundamental challenge for current and future microprocessors with advanced nanoscale technologies. With smaller gates, thinner dielectrics and higher temperatures, microprocessors are vulnerable to aging mechanisms such as Bias Temperature Instability (BTI) and Time-Dependent Dielectric Breakdown (TDDB). Under continuous stress both parametric and functional errors occur, resulting in a compromised microprocessor lifetime. In this thesis, based on a thorough study of the BTI and TDDB mechanisms, solutions are proposed to mitigate the aging processes in memory-based and random logic structures in modern out-of-order microprocessors. A large area of the processor core is occupied by memory-based structures that are vulnerable to BTI-induced errors. The problem is exacerbated when PBTI degradation in NMOS is as severe as NBTI in PMOS in high-k metal gate technology. Hence a novel design is proposed to recover 4 internal gates within an SRAM cell simultaneously to mitigate both NBTI and PBTI effects. This technique is applied to both the L2 cache banks and the busy function units with storage cells in the out-of-order pipeline in two different ways. For the L2 cache banks, a redundant cache bank is added exclusively for proactive recovery rotation. For the critical and busy function units in out-of-order pipelines, idle cycles are exploited at the per-buffer-entry level. Unlike memory-based structures, combinational logic structures such as function units in the execution stage cannot use low-overhead redundancy to tolerate errors, due to their irregular structure. A design framework that aims to improve the reliability of the vulnerable functional units of a processor core is designed and implemented. The approach is to design a generic function unit (GFU) that can be reconfigured to replace a particular functional unit (FU) while it is being recovered for improved lifetime. Although flexible, the GFU is slower than the original target FUs, so the GFU is carefully designed to minimize the performance loss when it is in use. Further schemes are also designed to avoid using the GFU on performance-critical paths of a program execution

    Improving the Robustness of Redundant Execution with Register File Randomization

    Staggered Redundant Execution (SRE) is a fault-tolerance mechanism that has been widely deployed in the context of safety-critical applications. SRE not only protects the system in the presence of faults but also helps relax the safety requirements of individual elements. However, in this paper, we show that SRE does not effectively protect the system against a wide range of faults and that new mechanisms to increase the diversity of homogeneous cores are therefore needed. We propose Register File Randomization (RFR), a low-cost diversity mechanism that significantly increases the robustness of homogeneous multicores against common-cause faults (CCFs) and register file wearout. Our results show that RFR completely removes the failure rate for register file CCFs for certain workloads and reduces the impact of stress-related register file aging by a factor of 5 for the workloads analysed. Our implementation requires less than 50 lines of RTL code, and the area (FPGA logic) overhead of RFR is less than 0.2% of a 64-bit RISC-V core FPGA implementation. This work has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 877056, from the Agencia Estatal de Investigacion from Spain under grant agreement no. PCI2020-112092, and from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 871467. Tuzov, I.; Andreu, P.; Medina, L.; Picornell-Sanjuan, T.; Robles Martínez, A.; López Rodríguez, PJ.; Flich Cardo, J.... (2021). Improving the Robustness of Redundant Execution with Register File Randomization. IEEE. 1-9. https://doi.org/10.1109/ICCAD51958.2021.96434661

    Techniques for Improving Security and Trustworthiness of Integrated Circuits

    The integrated circuit (IC) development process is becoming increasingly vulnerable to malicious activities because untrusted parties could be involved in the IC development flow. There are four typical problems that impact the security and trustworthiness of ICs used in military, financial, transportation, or other critical systems: (i) Malicious inclusions and alterations, known as hardware Trojans, can be inserted into a design by modifying the design during GDSII development and fabrication. Hardware Trojans in ICs may cause malfunctions, lower the reliability of ICs, leak confidential information to adversaries or even destroy the system under specifically designed conditions. (ii) The number of circuit-related counterfeiting incidents reported by component manufacturers has increased significantly over the past few years, with recycled ICs contributing the largest percentage of the total reported counterfeiting incidents. Since these recycled ICs have been used in the field before, their performance and reliability have been degraded by aging effects and the harsh recycling process. (iii) Reverse engineering (RE) is the process of extracting a circuit's gate-level netlist and/or inferring its functionality. RE threatens the design because attackers can steal and pirate a design (IP piracy), identify the device technology, or facilitate other hardware attacks. (iv) Traditional tools for uniquely identifying devices are vulnerable to non-invasive or invasive physical attacks. Securing the ID/key is of utmost importance since leakage of even a single device ID/key could be exploited by an adversary to hack other devices or produce pirated devices. In this work, we have developed a series of design and test methodologies to deal with these four challenging issues and thus enhance the security, trustworthiness and reliability of ICs. The techniques proposed in this thesis include: a path delay fingerprinting technique for detection of hardware Trojans, recycled ICs, and other types of counterfeit ICs including remarked, overproduced, and cloned ICs with their unique identifiers; a Built-In Self-Authentication (BISA) technique to prevent hardware Trojan insertions by untrusted fabrication facilities; an efficient and secure split manufacturing via Obfuscated Built-In Self-Authentication (OBISA) technique to prevent reverse engineering by untrusted fabrication facilities; and a novel bit selection approach for obtaining the most reliable bits for SRAM-based physical unclonable function (PUF) across environmental conditions and silicon aging effects