26 research outputs found

    Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

    Get PDF
    While technology scaling enables increased density for memory cells, the intrinsic high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspires the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM) including STT-MRAM, PCM, and RRAM. The utilization of emerging technology in Last Level Cache (LLC) designs which occupies a signifcant fraction of total die area in Chip Multi Processors (CMPs) introduces new dimensions of vulnerability, energy consumption, and performance delivery. To be specific, a part of this research focuses on eDRAM Bit Upset Vulnerability Factor (BUVF) to assess vulnerable portion of the eDRAM refresh cycle where the critical charge varies depending on the write voltage, storage and bit-line capacitance. This dissertation broaden the study on vulnerability assessment of LLC through investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip cache and primary memory. Large-latency and power-hungry Sense Amplifers (SAs) have been adapted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time. On the other hand, this dissertation investigates a novel technique to prioritize the service to 1) Extensive Read Reused Accessed blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to effciently identify ERRA blocks through aggregating the LLC block addresses tagged with identical Most Signifcant Bits into a single entry. Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation will broaden and apply advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 will be developed, validated, and re?ned both theoretically and experimentally to realize a radically different approach to post-Moore\u27s Law computing by leveraging low-rank matrices features offering data reconstruction instead of fetching data from main memory to reduce energy/latency cost per data movement. We propose LI2 enhancement to attain high performance delivery in the post-Moore\u27s Law era through equipping the contemporary micro-architecture design with a customized memory controller which orchestrates the memory request for fetching low-rank matrices to customized Fine Grain Reconfigurable Accelerator (FGRA) for reconstruction while the other memory requests are serviced as before. The goal of LI2 is to conquer the high latency/energy required to traverse main memory arrays in the case of LLC miss, by using in-situ construction of the requested data dealing with low-rank matrices. Thus, LI2 exchanges a high volume of data transfers with a novel lightweight reconstruction method under specific conditions using a cross-layer hardware/algorithm approach

    TWO CONTROL-FLOW ERROR RECOVERY METHODS FOR MULTITHREADED PROGRAMS RUNNING ON MULTI-CORE PROCESSORS

    Get PDF
    This paper presents two control-flow error recovery techniques, CFE Recovery using Data-flow graph Consideration and CFE Recovery using Macro block-level Check pointing. These techniques are proposed with regards to thread interactions in the programs. These techniques try to moderate the high memory and performance overheads of conventional control-flow checking techniques. The proposed recovery techniques are composed of two phases of control-flow error detection and recovery. These phases are designed by means of inserting additional instructions into program at compile time considering dependency graph, extracted from control-flow and data-flow dependencies among basic blocks and thread interactions in the programs. In order to evaluate the proposed techniques, five multithreaded benchmarks are utilized to run on a multi-core processor. Moreover, a total of 10000 transient faults have been injected into several executable points of each program. Fault injection experiments show that the proposed techniques recover the detected errors at-least for 91% of the cases

    Reactive Rejuvenation of CMOS Logic Paths using Self-activating Voltage Domains

    Get PDF
    Aggressive CMOS technology scaling trends exacerbate the aging-related degradation of propagation delay and energy efficiency in nanoscale designs. Recently, power-gating has been utilized as an effective low-power design technique which has also been shown to alleviate some aging impacts. However, the use of MOSFETs to realize power-gated designs will also encounter aging-induced degradations in the sleep transistors themselves which necessitates the exploration of design strategies to utilize power-gating effectively to mitigate aging. In particular, Bias Temperature Instability (BTI) which occurs during activation of power-gated voltage islands is investigated with respect to the placement of the sleep transistor in the header or footer as well as the impact of ungated input transitions on interfacial trapping. Results indicate the effectiveness of power-gating on NBTI/PBTI phenomena and propose a preferred sleep transistor configuration for maximizing higher recovery. Furthermore, the aging effect can manifest itself as timing error on critical speed-paths of the circuit, if a large design guardband is not reserved. To mitigate circuit from BTI-induced aging, the Reactive Rejuvenation (RR) architectural approach is proposed which entails detection and recovery phases. The BTI impact on the critical and near critical paths performance is continuously examined through a lightweight logic circuit which asserts an error signal in the case of any timing violation in those paths. By observing the timing violation occurrence in the system, the timing-sensitive portion of the circuit is recovered from BTI through switching computations to redundant aging-critical voltage domain. The proposed technique achieves aging mitigation and reduced energy consumption as compared to a baseline circuit. Thus, signi?cant voltage guardbands to meet the desired timing speci?cation are avoided result in energy savings during circuit operation

    Read-Tuned Stt-Ram And Edram Cache Hierarchies For Throughput And Energy Optimization

    No full text
    As capacity and complexity of on-chip cache memory hierarchy increases, the service cost to the critical loads from last level cache (LLC), which are frequently repeated, has become a major concern. The processor may stall for a considerable interval while waiting to access the data stored in the cache blocks in LLC, if there are no independent instructions to execute. To provide accelerated service to the critical loads requests from LLC, this paper concentrates on leveraging the additional capacity offered by replacing SRAM-based L2 with spin-transfer torque random access memory (STT-MRAM) to accommodate frequently accessed cache blocks in exclusive read mode in favor of reducing the overall read service time. Our proposed technique improves the temporal locality while preventing cache thrashing via sufficient accommodation of the frequently read reused fraction of working set that may exhibit distant re-reference interval in L2. Our experimental results show that the proposed technique can reduce the L2 read miss ratio by 51.7% on average compared to conventional STT-MRAM L2 design across PARSEC and SPEC2006 workloads while significantly decreasing the L2 dynamic energy consumption

    Compression or Corruption? A Study on the Effects of Transient Faults on BNN Inference Accelerators

    No full text
    Over past years, the philosophy for designing the artificial intelligence algorithms has significantly shifted towards automatically extracting the composable systems from massive data volumes. This paradigm shift has been expedited by the big data booming which enables us to easily access and analyze the highly large data sets. The most well-known class of big data analysis techniques is called deep learning. These models require significant computation power and extremely high memory accesses which necessitate the design of novel approaches to reduce the memory access and improve power efficiency while taking into account the development of domain-specific hardware accelerators to support the current and future data sizes and model structures. The current trends for designing application-specific integrated circuits barely consider the essential requirement for maintaining the complex neural network computation to be resilient in the presence of soft errors. The soft errors might strike either memory storage or combinational logic in the hardware accelerator that can affect the architectural behavior such that the precision of the results fall behind the minimum allowable correctness. In this study, we demonstrate that the impact of soft errors on a customized deep learning algorithm called Binarized Neural Network might cause drastic image misclassification. Our experimental results show that the accuracy of image classifier can drastically drop by 76.70% and 19.25% in IfcW1A1 and cnvW1A1 networks, respectively across CIFAR-10 and MNIST datasets during the fault injection for the worst-case scenarios
    corecore