57 research outputs found

    On microarchitectural mechanisms for cache wearout reduction

    No full text
    Hot carrier injection (HCI) and bias temperature instability (BTI) are two of the main deleterious effects that increase a transistor's threshold voltage over the lifetime of a microprocessor. This voltage degradation causes slower transistor switching and eventually can result in faulty operation. HCI manifests itself when transistors switch from logic ''0'' to ''1'' and vice versa, whereas BTI is the result of a transistor maintaining the same logic value for an extended period of time. These failure mechanisms are especiall in those transistors used to implement the SRAM cells of first-level (L1) caches, which are frequently accessed, so they are critical to performance, and they are continuously aging. This paper focuses on microarchitectural solutions to reduce transistor aging effects induced by both HCI and BTI in the data array of L1 data caches. First, we show that the majority of cell flips are concentrated in a small number of specific bits within each data word. In addition, we also build upon the previous studies, showing that logic ''0'' is the most frequently written value in a cache by identifying which cells hold a given logic value for a significant amount of time. Based on these observations, this paper introduces a number of architectural techniques that spread the number of flips evenly across memory cells and reduce the amount of time that logic ''0'' values are stored in the cells by switchingThis work was supported in part by the Spanish Ministerio de Economía y Competitividad within the Plan E Funds under Grant TIN2015-66972-C5-1-R, in part by the HiPEAC Collaboration Grant funded by the FP7 HiPEAC Network of Excellence under Grant 287759, and in part by the Engineering and Physical Sciences Research Council under Grant EP/K 026399/1 and Grant EP/J016284/1

    Enhancing the L1 Data Cache Design to Mitigate HCI

    Get PDF
    Over the lifetime of a microprocessor, the Hot Carrier Injection (HCI) phenomenon degrades the threshold voltage, which causes slower transistor switching and eventually results in timing violations and faulty operation. This effect appears when the memory cell contents flip from logic ‘0’ to ‘1’ and vice versa. In caches, the majority of cell flips are concentrated into only a few of the total memory cells that make up each data word. In addition, other researchers have noted that zero is the most commonly-stored data value in a cache, and have taken advantage of this behavior to propose data compression and power reduction techniques. Contrary to these works, we use this information to extend the lifetime of the caches by introducing two microarchitectural techniques that spread and reduce the number of flips across the first-level (L1) data cache cells. Experimental results show that, compared to the conventional approach, the proposed mechanisms reduce the highest cell flip peak up to 65.8%, whereas the threshold voltage degradation savings range from 32.0% to 79.9% depending on the application.This work has been supported by the Spanish Ministerio de Econom´ıa y Competitividad (MINECO), by FEDER funds through Grant TIN2012-38341-C04-01, by the Intel Early Career Faculty Honor Program Award, by a HiPEAC Collaboration Grant funded by the FP7 HiPEAC Network of Excellence under grant agreement 287759, and by the Engineering and Physical Sciences Research Council (EPSRC) through Grants EP/K026399/1 and EP/J016284/1.This is the author accepted manuscript. The final version is available from IEEE at http://dx.doi.org/10.1109/LCA.2015.2460736. The dataset associated with this article can be found on the repository at https://www.repository.cam.ac.uk/handle/1810/249006

    Balancing soft error coverage with lifetime reliability in redundantly multithreaded processors

    Get PDF
    Silicon reliability is a key challenge facing the microprocessor industry. Processors need to be designed such that they are resilient against both soft errors and lifetime reliability phenomena. However, techniques developed to address one class of reliability problems may impact other aspects of silicon reliability. In this paper, we show that Redundant Multi-Threading (RMT), which provides soft error protection, exacerbates lifetime reliability. We then explore two different architectural approaches to tackle this problem, namely, Dynamic Voltage Scaling (DVS) and partial RMT. We show that each approach has certain strengths and weaknesses with respect to performance, soft error coverage, and lifetime reliability. We then propose and evaluate a hybrid approach that combines DVS and partial RMT. We show that this approach provides better improvement in lifetime reliability than DVS or partial RMT alone, buys back a significant amount of performance that is lost due to DVS, and provides nearly complete soft error coverage. I

    PRITEXT: Processor Reliability Improvement Through Exercise Technique

    Get PDF
    With continuous improvements in CMOS technology, transistor sizes are shrinking aggressively every year. Unfortunately, such deep submicron process technologies are severely degraded by several wearout mechanisms which lead to prolonged operational stress and failure. Negative Bias Temperature Instability (NBTI) is a prominent failure mechanism which degrades the reliability of current semiconductor devices. Improving reliability of processors is necessary for ensuring long operational lifetime which obviates the necessity of mitigating the physical wearout mechanisms. NBTI severely degrades the performance of PMOS transistors in a circuit, when negatively biased, by increasing the threshold voltage leading to critical timing failures over operational lifetime. A lack of activity among the PMOS transistors for long duration leads to a steady increase in threshold voltage Vth. Interestingly, NBTI stress can be recovered by removing the negative bias using appropriate input vectors. Exercising the dormant critical components in the Processor has been proved to reduce the NBTI stress. We use a novel methodology to generate a minimal set of deterministic input vectors which we show to be effective in reducing the NBTI wearout in a superscalar processor core. We then propose and evaluate a new technique PRITEXT, which uses these input vectors in exercise mode to effectively reduce the NBTI stress and improve the operational lifetime of superscalar processors. PRITEXT, which uses Input Vector Control, leads to a 4.5x lifetime improvement of superscalar processor on average with a maximum lifetime improvement of 12.7x

    Improving the Robustness of Redundant Execution with Register File Randomization

    Full text link
    [EN] Staggered Redundant execution (SRE) is a fault-tolerance mechanism that has been widely deployed in the context of safety-critical applications. SRE not only protects the system in the presence of faults but also helps relaxing safety requirements of individual elements. However, in this paper, we show that SRE does not effectively protect the system against a wide range of faults and thus, new mechanisms to increase the diversity of homogeneous cores are needed. In this paper, we propose Register File Randomization (RFR), a low-cost diversity mechanism that significantly increases the robustness of homogeneous multicores in front of common-cause faults (CCFs) and register file wearout. Our results show that RFR completely removes the failure rate for register file CCFs for certain workloads and reduces by a factor of 5X the impact of stress related register file aging for the workloads analysed. Our implementation requires less than 50 RTL lines of code and the area (FPGA logic) overhead of RFR is less than 0.2% of a 64-bit RISC-V core FPGA implementation.This work has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 877056 and the Agencia Estatal de Investigacion from Spain under grant agreement no. PCI2020-112092, and from the the European Unions Horizon 2020 research and innovation programme under grant agreement no. 871467.Tuzov, I.; Andreu, P.; Medina, L.; Picornell-Sanjuan, T.; Robles Martínez, A.; López Rodríguez, PJ.; Flich Cardo, J.... (2021). Improving the Robustness of Redundant Execution with Register File Randomization. IEEE. 1-9. https://doi.org/10.1109/ICCAD51958.2021.96434661

    StageNet: A Reconfigurable Fabric for Constructing Dependable CMPs

    Full text link

    Energy and Reliability in Future NOC Interconnected CMPS

    Get PDF
    In this dissertation, I explore energy and reliability in future NoC (Network-on-Chip) interconnected CMPs (chip multiprocessors) as they have become a first-order constraint in future CMP design. In the first part, we target the root cause of network energy consumption through techniques that reduce link and router-level switching activity. We specifically focus on memory subsystem traffic, as it comprises the bulk of NoC load in a CMP. By transmitting only the flits that contain words that we predicted would be useful using a novel spatial locality predictor, our scheme seeks to reduce network activity. We aim to further lower NoC energy consumption through microarchitectural mechanisms that inhibit datapath switching activity caused by unused words in individual flits. Using simulation-based performance studies and detailed energy models based on synthesized router designs and different link wire types, we show that (a) the pre- diction mechanism achieves very high accuracy, with an average rate of false-unused prediction of just 2.5%; (b) the combined NoC energy savings enabled by the predictor and microarchitectural support are 36% on average and up to 57% in the best case; and (c) there is no system performance penalty as a result of this technique. In the second part, we present a method for dynamic voltage/frequency scaling of networks-on-chip and last level caches in CMP designs, where the shared resources form a single voltage/frequency domain. We develop a new technique for monitoring and control and validate it by running PARSEC benchmarks through full system simulations. These techniques reduce energy-delay product by 46% compared to a state-of-the-art prior work. In the third part, we develop critical path models for HCI- and NBTI-induced wear assuming stress caused under realistic workload conditions, and apply them onto the interconnect microarchitecture. A key finding from this modeling is that, counter to prevailing wisdom, wearout in the CMP on-chip interconnect is correlated with a lack of load observed in the NoC routers, rather than high load. We then develop a novel wearout-decelerating scheme in which routers under low load have their wearout-sensitive components exercised without significantly impacting the router’s cycle time, pipeline depth, and area or power consumption. We subsequently show that the proposed design yields a 13.8∼65× increase in CMP lifetime

    Cross-Layer Approaches for an Aging-Aware Design of Nanoscale Microprocessors

    Get PDF
    Thanks to aggressive scaling of transistor dimensions, computers have revolutionized our life. However, the increasing unreliability of devices fabricated in nanoscale technologies emerged as a major threat for the future success of computers. In particular, accelerated transistor aging is of great importance, as it reduces the lifetime of digital systems. This thesis addresses this challenge by proposing new methods to model, analyze and mitigate aging at microarchitecture-level and above

    A Holistic Solution for Reliability of 3D Parallel Systems

    Full text link
    As device scaling slows down, emerging technologies such as 3D integration and carbon nanotube field-effect transistors are among the most promising solutions to increase device density and performance. These emerging technologies offer shorter interconnects, higher performance, and lower power. However, higher levels of operating temperatures and current densities project significantly higher failure rates. Moreover, due to the infancy of the manufacturing process, high variation, and defect densities, chip designers are not encouraged to consider these emerging technologies as a stand-alone replacement for Silicon-based transistors. The goal of this dissertation is to introduce new architectural and circuit techniques that can work around high-fault rates in the emerging 3D technologies, improving performance and reliability comparable to Silicon. We propose a new holistic approach to the reliability problem that addresses the necessary aspects of an effective solution such as detection, diagnosis, repair, and prevention synergically for a practical solution. By leveraging 3D fabric layouts, it proposes the underlying architecture to efficiently repair the system in the presence of faults. This thesis presents a fault detection scheme by re-executing instructions on idle identical units that distinguishes between transient and permanent faults while localizing it to the granularity of a pipeline stage. Furthermore, with the use of a dynamic and adaptive reconfiguration policy based on activity factors and temperature variation, we propose a framework that delivers a significant improvement in lifetime management to prevent faults due to aging. Finally, a design framework that can be used for large-scale chip production while mitigating yield and variation failures to bring up Carbon Nano Tube-based technology is presented. The proposed framework is capable of efficiently supporting high-variation technologies by providing protection against manufacturing defects at different granularities: module and pipeline-stage levels.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/168118/1/javadb_1.pd

    Pairing Software-Managed Caching with Decay Techniques to Balance Reliability and Static Power in Next-Generation Caches

    Get PDF
    Since array structures represent well over half the area and transistors on-chip, maintaining their ability to scale is crucial for overall technology scaling. Shrinking transistor sizes are resulting in increased probabilities of single events causing single- and multi-bit upsets which require adoption of more complex and power hungry error detection and correction codes (ECC) in hardware. At the same time, SRAM leakage energy is increasing partly due to technology trends and partly due to the increasing number of transistors present. This paper proposes and evaluates methods of reducing the static power requirements of caches, while also maintaining high reliability. In particular, we propose methods of applying reduced ECC techniques to data that has been identified (by programmer or compiler) as error-tolerant. This segregation, in turn, makes both the default data and the error-tolerant data more amenable to decay-based techniques for leakage control. We examine the potential of this split memory hierarchy along several dimensions. In particular, we consider the power and reliability issues inherent in the approach. Overall, we show that our approach allows the ECC requirements of future applications and caches to be met while also reducing leakage energy
    • …
    corecore