65 research outputs found

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    A survey of emerging architectural techniques for improving cache energy consumption

    Get PDF
    The search goes on for another ground breaking phenomenon to reduce the ever-increasing disparity between the CPU performance and storage. There are encouraging breakthroughs in enhancing CPU performance through fabrication technologies and changes in chip designs but not as much luck has been struck with regards to the computer storage resulting in material negative system performance. A lot of research effort has been put on finding techniques that can improve the energy efficiency of cache architectures. This work is a survey of energy saving techniques which are grouped on whether they save the dynamic energy, leakage energy or both. Needless to mention, the aim of this work is to compile a quick reference guide of energy saving techniques from 2013 to 2016 for engineers, researchers and students

    A hardware mechanism to reduce the energy consumption of the register file of in-order architectures

    Get PDF
    This paper introduces an efficient hardware approach to reduce the register file energy consumption by turning unused registers into a low power state. Bypassing the register fields of the fetch instruction to the decode stage allows the identification of registers required by the current instruction (instruction predecode) and allows the control logic to turn them back on. They are put into the low-power state after the instruction use. This technique achieves an 85% energy reduction with no performance penalty

    Impact of partitioning cache schemes on the cache hierarchy of SMT processors

    Full text link
    © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Power consumption is becoming an increasingly important component of processor design. As technology node shrinks both static and dynamic power become more relevant. This is particularly critical for the cache hierarchy. Previous implementations mainly focus on reducing only one kind of power in the cache, either static or dynamic. However, for a more robust approach that will remain relevant as technology continues to shrink, both aspects of power need to be addressed. Recent processors, e.g. Intel Core or IBM Power8, implement simultaneous multithreading (SMT) cores to hide high memory latencies. In these systems, the dynamic energy in the L1 cache is even more stressed since this cache level is shared by several threads running on the same core. This paper proposes and evaluates the use of phase adaptive caches in all structures of a 3-level cache hierarchy of a SMT cores. Compared to the use of conventional caches, our work results on significant dynamic and leakage energy savings with scarce performance impact.This work was supported by the Spanish Ministerio de EconomĂ­a y Competitividad (MINECO) and by FEDER funds under Grant TIN2012–38341–C04–01.Kenyon, S.; LĂłpez, S.; Sahuquillo BorrĂĄs, J. (2015). Impact of partitioning cache schemes on the cache hierarchy of SMT processors. IEEE. https://doi.org/10.1109/HPCC-CSS-ICESS.2015.127

    Software-Based Self-Test of Set-Associative Cache Memories

    Get PDF
    Embedded microprocessor cache memories suffer from limited observability and controllability creating problems during in-system tests. This paper presents a procedure to transform traditional march tests into software-based self-test programs for set-associative cache memories with LRU replacement. Among all the different cache blocks in a microprocessor, testing instruction caches represents a major challenge due to limitations in two areas: 1) test patterns which must be composed of valid instruction opcodes and 2) test result observability: the results can only be observed through the results of executed instructions. For these reasons, the proposed methodology will concentrate on the implementation of test programs for instruction caches. The main contribution of this work lies in the possibility of applying state-of-the-art memory test algorithms to embedded cache memories without introducing any hardware or performance overheads and guaranteeing the detection of typical faults arising in nanometer CMOS technologie

    Microarchitectural techniques to reduce energy consumption in the memory hierarchy

    Get PDF
    This thesis states that dynamic profiling of the memory reference stream can improve energy and performance in the memory hierarchy. The research presented in this theses provides multiple instances of using lightweight hardware structures to profile the memory reference stream. The objective of this research is to develop microarchitectural techniques to reduce energy consumption at different levels of the memory hierarchy. Several simple and implementable techniques were developed as a part of this research. One of the techniques identifies and eliminates redundant refresh operations in DRAM and reduces DRAM refresh power. Another, reduces leakage energy in L2 and higher level caches for multiprocessor systems. The emphasis of this research has been to develop several techniques of obtaining energy savings in caches using a simple hardware structure called the counting Bloom filter (CBF). CBFs have been used to predict L2 cache misses and obtain energy savings by not accessing the L2 cache on a predicted miss. A simple extension of this technique allows CBFs to do way-estimation of set associative caches to reduce energy in cache lookups. Another technique using CBFs track addresses in a Virtual Cache and reduce false synonym lookups. Finally this thesis presents a technique to reduce dynamic power consumption in level one caches using significance compression. The significant energy and performance improvements demonstrated by the techniques presented in this thesis suggest that this work will be of great value for designing memory hierarchies of future computing platforms.Ph.D.Committee Chair: Lee, Hsien-Hsin S.; Committee Member: Cahtterjee,Abhijit; Committee Member: Mukhopadhyay, Saibal; Committee Member: Pande, Santosh; Committee Member: Yalamanchili, Sudhaka

    A low-power cache system for high-performance processors

    Get PDF
    戶ćșŠ:新 ; 栱摊ç•Șć·:ç”Č3439ć· ; ć­ŠäœăźçšźéĄž:ćšćŁ«(ć·„ć­Š) ; 授䞎ćčŽæœˆæ—„:12-Sep-11 ; æ—©ć€§ć­Šäœèš˜ç•Șć·:新576

    Drowsy cache partitioning for reduced static and dynamic energy in the cache hierarchy

    Get PDF
    Power consumption in computing today has lead the industry towards energy efficient computing. As transistor technology shrinks, new techniques have to be developed to keep leakage current, the dominant portion of overall power consumption, to a minimum. Due to the large amount of transistors devoted to the cache hierarchy, the cache provides an excellent avenue to dramatically reduce power usage. The inherent danger with techniques that save power can negatively effect the primary reason for the inclusion of the cache, performance. This thesis work proposes a modification to the cache hierarchy that dramatically saves power with only a slight reduction in performance. By taking advantage of the overwhelming preference of memory accesses to the most recently used blocks, these blocks are placed into a small, fast access A partition. The rest of the cache is put into a drowsy mode, a state preserving technique that reduces leakage power within the remaining portion of the cache. This design was implemented within a private, second level cache that achieved an average of almost 20% dynamic energy savings and an average of nearly 45% leakage energy savings. These savings were attained while incurring an average performance penalty of only 2%

    Joint Hardware-Software Leakage Minimization Approach for the Register File of VLIW Embedded Architectures

    Get PDF
    New applications demand very high processing power when run on embedded systems. Very Long Instruction Word (VLIW) architectures have emerged as a promising alternative to provide such processing capabilities under the given energy budget. However, in this new VLIW-based architectures, the register file is a very critical contributor to the overall power consumption and new approaches have to be proposed to reduce its power while preserving system performance. In this paper, we propose a novel joint hardware–software approach that reduces the leakage energy in the register files of these embedded VLIW architectures. This approach relies upon an energy-aware register assignment method and a hardware support that creates sub-banks in the global register file that can be switched on/off at run time. Our results indicate energy savings in the register file, after considering the overhead of the added extra hardware, up to 50% for modern multimedia embedded applications without performance degradation. We illustrate this approach using real-life applications running on these processors. We also illustrate the tradeoff between the area overhead vs. the gains in the leakage energy for the different strategies
    • 

    corecore