20 research outputs found

    Fine-grain CAM-tag cache resizing using miss tags

    A hardware mechanism to reduce the energy consumption of the register file of in-order architectures

    This paper introduces an efficient hardware approach that reduces register file energy consumption by putting unused registers into a low-power state. Bypassing the register fields of the fetched instruction to the decode stage identifies the registers required by the current instruction (instruction predecode) and lets the control logic turn them back on. They are returned to the low-power state after the instruction has used them. This technique achieves an 85% energy reduction with no performance penalty.
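
The gating idea in this abstract can be illustrated with a toy count of how many register-cycles stay powered. This is only a sketch under stated assumptions: the function name, the one-cycle-per-instruction model, and the example register numbers are hypothetical and are not taken from the paper, and the ratio it produces is not an energy estimate.

```python
def awake_register_cycles(instructions, num_registers):
    """Compare register-cycles spent powered with and without gating.

    instructions: list of tuples holding the register indices that each
    instruction reads or writes (assumed known at predecode).
    Assumption: every instruction occupies exactly one cycle, and a gated
    register is awake only during the cycle of an instruction that names it.
    """
    # Baseline: every register is powered during every cycle.
    always_on = num_registers * len(instructions)
    # Gated: only the distinct registers named by each instruction are awake.
    gated = sum(len(set(regs)) for regs in instructions)
    return always_on, gated


if __name__ == "__main__":
    program = [(1, 2, 3), (3, 3, 4), (5,)]  # hypothetical register fields
    print(awake_register_cycles(program, num_registers=32))
```

The model ignores wake-up latency and the energy of the control logic itself, both of which a real evaluation would have to include.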

    Energy Wall for Exascale Supercomputing

    "Sustainable development" is one of the major issues of the 21st century, and notions such as green computing and green development have appeared one after another. As large-scale parallel computing systems develop rapidly, their energy consumption is becoming very large, especially as system performance reaches petascale (10^15 Flops) or even exascale (10^18 Flops). This energy consumption raises the system temperature, which seriously undermines stability and reliability and limits the growth of system size. The effect of energy consumption on scalability is therefore a growing concern. Against this background, this paper proposes the concept of the "Energy Wall" to highlight the significance of achieving scalable performance in peta/exascale supercomputing by taking energy consumption into account. We quantify the effect of energy consumption on scalability by building an energy-efficiency speedup model, which integrates computing performance and system energy. We define the energy wall quantitatively, provide a theorem on its existence, and categorize large-scale parallel computers according to their energy consumption. In the context of several representative types of HPC applications, we analyze and extrapolate the existence of the energy wall for three kinds of topologies (3D-Torus, binary n-cube, and fat tree), which provides insights on how to mitigate the energy wall effect in system design and through hardware/software optimization in peta/exascale supercomputing.

    LowLEAC: Low leakage energy architecture for caches

    With ever-decreasing feature sizes, static power dissipation has become a concern in computing devices. On-chip memories are a major contributor to the processor's leakage power dissipation because of their large transistor count. We propose a Low Leakage Energy Architecture for Caches, called LowLEAC, to minimize the static power dissipation in caches made of CMOS SRAM cells. The technique keeps only the k most recently used cache lines powered on and powers off the other lines to reduce leakage power dissipation. This control, however, increases the dynamic power because data must be re-fetched. To overcome that, we deploy a CMOS-compatible non-volatile SRAM cell, called cNVSRAM, to implement caches. The cNVSRAM cell works as a conventional SRAM in the regular mode and saves the data in a non-volatile backup when a cache line is turned off or put into sleep mode. The non-volatile backup mode helps improve the dependability of the cache and avoids the penalty incurred by losing data from the inactive cache lines. With a small area penalty, LowLEAC achieves 18% energy savings with insignificant impact on performance. LowLEAC is a suitable architecture for cache memory in mobile computing devices to minimize battery power consumption and reduce heat.
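
The "k most recently used lines stay powered" policy described in this abstract can be sketched as a small trace simulation. This is a minimal illustration, not the LowLEAC design: the function name and the access trace are hypothetical, and the model only counts how often an access lands on a powered-off line (a "wake-up"), without modeling energy or the cNVSRAM restore cost.

```python
from collections import OrderedDict


def simulate_k_mru(accesses, k):
    """Count accesses to powered-on lines versus wake-ups of powered-off lines.

    accesses: sequence of cache line indices, in access order.
    k: number of most recently used lines kept powered on.
    """
    powered = OrderedDict()  # powered-on lines, least recently used first
    powered_hits = 0
    wakeups = 0
    for line in accesses:
        if line in powered:
            powered_hits += 1
            powered.move_to_end(line)  # mark as most recently used
        else:
            wakeups += 1  # line was off and has to be powered up
            powered[line] = None
            if len(powered) > k:
                powered.popitem(last=False)  # power off the LRU line
    return powered_hits, wakeups


if __name__ == "__main__":
    trace = [0, 1, 0, 2, 1, 3, 0]  # hypothetical line indices
    for k in (2, 4):
        print(k, simulate_k_mru(trace, k))
```

In such a model a small k saves more leakage but triggers more wake-ups, which is the dynamic-power cost the abstract says the non-volatile backup is meant to offset.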

    Energy-Aware Compilation and Hardware Design for VLIW Embedded Systems

    Tomorrow's embedded devices need to run multimedia applications that demand high computational power under low energy consumption constraints. In this context, the register file is a key source of power consumption, and its inappropriate design and management severely affect system power. In this paper, we present a new approach that reduces the energy of shared register files in forthcoming embedded VLIW processors running real-life applications by up to 60% without performance penalty. This approach relies on limited hardware extensions and a compiler-based energy-aware register assignment algorithm to deactivate parts of the register file (i.e., sub-banks) independently at run time.

    Joint Hardware-Software Leakage Minimization Approach for the Register File of VLIW Embedded Architectures

    New applications demand very high processing power when run on embedded systems. Very Long Instruction Word (VLIW) architectures have emerged as a promising alternative to provide such processing capabilities under the given energy budget. However, in these new VLIW-based architectures, the register file is a critical contributor to the overall power consumption, and new approaches are needed to reduce its power while preserving system performance. In this paper, we propose a novel joint hardware–software approach that reduces the leakage energy in the register files of these embedded VLIW architectures. This approach relies on an energy-aware register assignment method and on hardware support that creates sub-banks in the global register file that can be switched on and off at run time. Our results indicate energy savings in the register file, after accounting for the overhead of the added hardware, of up to 50% for modern multimedia embedded applications without performance degradation. We illustrate this approach using real-life applications running on these processors. We also illustrate the tradeoff between the area overhead and the gains in leakage energy for the different strategies.

    Triangular D-NUCA Caches: Modeling and Evaluation of Dynamic Power Consumption

    This thesis evaluates the dynamic power consumption of non-uniform cache access architectures (Dynamic NUCA). To this end, an architecture for D-NUCA and Triangular D-NUCA caches was modeled, parameters for estimating dynamic consumption were derived, and simulations were run to determine the dynamic consumption of the cache. The consumption parameters were obtained with CACTI, a software tool for modeling cache memories. The simulations were performed by modifying the Sim-Alpha simulator (a simulator of the ALPHA 21264 processor) to support TD-NUCA caches; the chosen benchmarks, 176.gcc, 181.mcf, 256.bzip2, and 300.twolf, belong to the SpecInt 2000 suite. The results lead to the conclusion that Decreasing TD-NUCA caches were the best overall in both power consumption and performance on three of the four benchmarks; further refinement of this design technique is therefore desirable.