970 research outputs found

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    A low-power cache system for high-performance processors

    Get PDF
    制度:新 ; 報告番号:甲3439号 ; 学位の種類:博士(工学) ; 授与年月日:12-Sep-11 ; 早大学位記番号:新576

    A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems

    Full text link
    In this paper, we present a novel cache design based on Multi-Level Cell Spin-Transfer Torque RAM (MLC STTRAM) that can dynamically adapt the set capacity and associativity to use efficiently the full potential of MLC STTRAM. We exploit the asymmetric nature of the MLC storage scheme to build cache lines featuring heterogeneous performances, that is, half of the cache lines are read-friendly, while the other is write-friendly. Furthermore, we propose to opportunistically deactivate ways in underutilized sets to convert MLC to Single-Level Cell (SLC) mode, which features overall better performance and lifetime. Our ultimate goal is to build a cache architecture that combines the capacity advantages of MLC and performance/energy advantages of SLC. Our experiments show an improvement of 43% in total numbers of conflict misses, 27% in memory access latency, 12% in system performance, and 26% in LLC access energy, with a slight degradation in cache lifetime (about 7%) compared to an SLC cache

    Buffer Controlled Cache for Low Power Multicore Processors

    Get PDF
    This thesis proposes a buffered dual access mode cache to reduce power consumption in multicore caches for embedded systems. This cache is called Buffer Controlled Cache (BCC cache). The proposed scheme introduces a pre-cache buffer to determine how to access the cache. The proposed cache shows better prediction rates and lower power consumption than conventional caches, such as Phased cache and Way-prediction cache. For single core implementation, Simplescalar and Cacti simulators have been used for these simulations using SPEC2000 benchmark programs. The experimental results show that the proposed cache improves the power consumption by 37%-42% over the conventional caches. Multi2Sim and McPAT simulators have been used for the multicore simulations using the Parsec benchmark programs. The experimental results show that the proposed cache improves the power consumption by as much as 54% over conventional caches

    Power considerations for memory-related microarchitecture designs

    Get PDF
    The fast performance improvement of computer systems in the last decade comes with the consistent increase on power consumption. In recent years, power dissipation is becoming a design constraint even for high-performance systems. Higher power dissipation means higher packaging and cooling cost, and lower reliability. This Ph.D. dissertation will investigate several memory-related design and optimization issues of general-purpose computer microarchitectures, aiming at reducing the power consumption without sacrificing the performance. The memory system consumes a large percentage of the system\u27s power. In addition, its behavior affects the processor power consumption significantly. In this dissertation, we propose two schemes to address the power-aware architecture issues related to memory: (1) We develop and evaluate low-power techniques for high-associativity caches. By dynamically applying different access modes for cache hits and misses, our proposed cache structure can achieve nearly lowest power consumption with minimal performance penalty. (2) We propose and evaluate look-ahead architectural adaptation techniques to reduce power consumption in processor pipelines based on the memory access information. The scheme can significantly reduce the power consumption of memory-intensive applications. Combined with other adaptation techniques, our schemes can effectively reduce the power consumption for both computer- and memory-intensive applications. The significance, potential impacts, and contributions of this dissertation are: (1) Academia and industry R & D has solely targeted the objective of high performance in both hardware and software designs since the beginning stage of building computer systems. However, the pursuit of high performance without considering energy consumption will inevitably lead to increased power dissipation and thus will eventually limit the development and progress of increasingly demanded mobile, portable, and high-performance computing systems. (2) Since our proposed method adaptively combines the merits of existing low-power cache designs, it approaches the optimum in terms of both retaining performance and saving energy. This low power solution for highly associative caches can be easily deployed with a low cost. (3) Using a cache miss , a common program execution event, as a triggering signal to slow down the processor issue rate, our scheme can effectively reduce processor power consumption. This design can be easily and practically deployed in many processor architectures with a low cost

    High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory

    Get PDF
    In this dissertation new computing paradigms, architectures and design philosophy are proposed and evaluated for adopting the STT-MRAM technology as highly reliable, energy efficient and fast memory. For this purpose, a novel cross-layer framework from the cell-level all the way up to the system- and application-level has been developed. In these framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rates of the entire memory and its Mean Time To Failure (MTTF) along with considering the temperature and process variation effects. Design-time, compile-time and run-time solutions have been provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements in comparison to state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design

    Revisiting LP-NUCA Energy Consumption: Cache Access Policies and Adaptive Block Dropping

    Get PDF
    Cache working-set adaptation is key as embedded systems move to multiprocessor and Simultaneous Multithreaded Architectures (SMT) because interthread pollution harms system performance and battery life. Light-Power NUCA (LP-NUCA) is a working-set adaptive cache that depends on temporal-locality to save energy. This work identifies the sources of energy waste in LP-NUCAs: parallel access to the tag and data arrays of the tiles and low locality phases with useless block migration. To counteract both issues, we prove that switching to serial access reduces energy without harming performance and propose a machine learning Adaptive Drop Rate (ADR) controller that minimizes the amount of replacement and migration when locality is low. This work demonstrates that these techniques efficiently adapt the cache drop and access policies to save energy. They reduce LP-NUCA consumption 22.7% for 1SMT. With interthread cache contention in 2SMT, the savings rise to 29%. Versus a conventional organization, energy--delay improves 20.8% and 25% for 1- and 2SMT benchmarks, and, in 65% of the 2SMT mixes, gains are larger than 20%

    Cache architectures based on heterogeneous technologies to deal with manufacturing errors

    Full text link
    [EN] SRAM technology has traditionally been used to implement processor caches since it is the fastest existing RAM technology.However,one of the major drawbacks of this technology is its high energy consumption.To reduce this energy consumption modern processors mainly use two complementary techniques: i)low-power operating modes and ii)low-power memory technologies.The first technique allows the processor working at low clock frequencies and supply voltages.The main limitation of this technique is that manufacturing defects can significantly affect the reliability of SRAM cells when working these modes.The second technique brings alternative technologies such as eDRAM, which provides minimum area and power consumption.The main drawback of this memory technology is that reads are destructive and eDRAM cells work slower than SRAM ones. This thesis presents three main contributions regarding low-power caches and heterogeneous technologies: i)an study that identifies the optimal capacitance of eDRAM cells, ii)a novel cache design that tolerates the faults produced by SRAM cells in low-power modes, iii)a methodology that allows obtain the optimal operating frequency/voltage level when working with low-power modes. Regarding the first contribution,in this work SRAM and eDRAM technologies are combined to achieve a low-power fast cache that requires smaller area than conventional designs and that tolerates SRAM failures.First,this dissertation focuses on one of the main critical aspects of the design of heterogeneous caches:eDRAM cell capacitance.In this dissertation the optimal capacitance for an heterogeneous L1 data cache is identified by analyzing the compromise between performance and energy consumption.Experimental results show that an heterogeneous cache implemented with 10fF capacitors offers similar performance as a conventional SRAM cache while providing 55% energy savings and reducing by 29% the cache area. Regarding the second contribution,this thesis proposes a novel organization for a fault-tolerant heterogeneous cache.Currently,reducing the supply voltage is a mechanism widely used to reduce consumption and applies when the system workload activity decreases.However,SRAM cells cause different types of failures when the supply voltage is reduced and thus they limit the minimum operating voltage of the microprocessor. In the proposal,memory cells implemented with eDRAM technology serve as backup in case of failure of SRAM cells, because the correct operation of eDRAM cells is not affected by reduced voltages. The proposed architecture has two working modes: high-performance mode for supply voltages that do not induce SRAM cell failures, and low-power mode for those voltages that cause SRAM cell failures. In high-performance mode, the cache provides full capacity, which enables the processor to achieve its maximum performance. In low-power mode, the effective capacity of the cache is reduced because some of the eDRAM cells are dedicated to recover from SRAM failures. Experimental results show that the performance is scarcely reduced (e.g. less than 2.7% across all the studied benchmarks) with respect to an ideal SRAM cache without failures. Finally,this thesis proposes a methodology to find the optimal frequency/voltage level regarding energy consumption for the designed heterogeneous cache. For this purpose, first SRAM failure types and their probabilities are characterized.Then,the energy consumption of different frequency/voltage levels is evaluated when the system works in low-power mode.The study shows that, mainly due to the impact of SRAM failures on performance,the optimal combination of voltage and frequency from the energy point of view does not always correspond to the minimum voltage.[ES] La tecnología SRAM se ha utilizado tradicionalmente para implementar las memorias cache debido a que es la tecnología de memoria RAM más rápida existente.Por contra,uno de los principales inconvenientes de esta tecnología es su elevado consumo energético.Para reducirlo los procesadores modernos suelen emplear dos técnicas complementarias:i) modos de funcionamiento de bajo consumo y ii)tecnologías de bajo consumo.La primeras técnica consiste en utilizar bajas frecuencias y voltajes de funcionamiento.La principal limitación de esta técnica es que los defectos de fabricación pueden afectar notablemente a la fiabilidad de las celdas SRAM en estos modos.La segunda técnica agrupa tecnologías alternativas como la eDRAM,que ofrece área y consumo mínimos.El inconveniente de esta tecnología es que las lecturas son destructivas y es más lenta que la SRAM. Esta tesis presenta tres contribuciones principales centradas en caches de bajo consumo y tecnologías heterogéneas: i)estudio de la capacitancia óptima de las celdas eDRAM, ii)diseño de una cache tolerante a fallos producidos en las celdas SRAM en modos de bajo consumo, iii)metodología para obtener la relación óptima entre voltaje y frecuencia en procesadores con modos de bajo consumo. Respecto a la primera contribución,en este trabajo se combinan las tecnologías SRAM y eDRAM para conseguir una memoria cache rápida, de bajo consumo, área reducida, y tolerante a los fallos inherentes a la tecnología SRAM.En primer lugar,esta disertación se centra en uno de los aspectos críticos de diseño de caches heterogéneas SRAM/eDRAM: la capacitancia de los condensadores implementados con tecnología eDRAM.En esta tesis se identifica la capacitancia óptima de una cache de datos L1 heterogénea mediante el estudio del compromiso entre prestaciones y consumo energético.Los resultados experimentales muestran que condensadores de 10fF ofrecen prestaciones similares a las de una cache SRAM convencional ahorrando un 55% de consumo y reduciendo un 29% el área ocupada por la cache. Respecto a la segunda contribución,esta tesis propone una organización de cache heterogénea tolerante a fallos.Actualmente,reducir el voltaje de alimentación es un mecanismo muy utilizado para reducir el consumo en condiciones de baja carga.Sin embargo,las celdas SRAM producen distintos tipos de fallos cuando se reduce el voltaje de alimentación y por tanto limitan el voltaje mínimo de funcionamiento del microprocesador. En la cache heterogénea propuesta,las celdas de memoria implementadas con tecnología eDRAM sirven de copia de seguridad en caso de fallo de las celdas SRAM, ya que el correcto funcionamiento de las celdas eDRAM no se ve afectado por tensiones reducidas.La arquitectura propuesta consta de dos modos de funcionamiento: high-performance mode para voltajes de alimentación que no inducen fallos en celdas implementadas en tecnología SRAM, y low-power mode para aquellos que sí lo hacen. En el modo high-performance mode,el procesador dispone de toda la capacidad de la cache.En el modo low-power mode se reduce la capacidad efectiva de la cache puesto que algunas de las celdas eDRAM se dedican a la recuperación de fallos de celdas SRAM.El estudio de prestaciones realizado muestra que éstas bajan hasta un máximo de 2.7% con respecto a una cache perfecta sin fallos. Finalmente, en esta tesis se propone una metodología para encontrar la relación óptima de voltaje/frecuencia con respecto al consumo energético sobre la cache heterogénea previamente diseñada. Para ello,primero se caracterizan los tipos de fallos SRAM y las probabilidades de fallo de los mismos.Después,se evalúa el consumo energético de diferentes combinaciones de voltaje/frecuencia cuando el sistema se encuentra en un modo de bajo consumo.El estudio muestra que la combinación óptima de voltaje y frecuencia desde el punto de vista energético no siempre corresponde al mínimo voltaje debido al imp[CA] La tecnologia SRAM s'ha utilitzat tradicionalment per a implementar les memòries cau degut a que és la tecnologia de memòria RAM més ràpida existent.Per contra, un dels principals inconvenients d'aquesta tecnologia és el seu elevat consum energètic.Per a reduir el consum els processadors moderns solen emprar dues tècniques complementàries: i)modes de funcionament de baix consum i ii)tecnologies de baix consum.La primera tècnica consisteix en utilitzar baixes freqüències i voltatges de funcionament.La principal limitació d'aquesta tècnica és que els defectes de fabricació poden afectar notablement a la fiabilitat de les cel·les SRAM en aquests modes.La segona tècnica agrupa tecnologies alternatives com la eDRAM, que ofereix àrea i consum mínims.L'inconvenient d'aquesta tecnologia és que les lectures són destructives i és més lenta que la SRAM. Aquesta tesi presenta tres contribucions principals centrades en caus de baix consum i tecnologies heterogènies: i)estudi de la capacitancia òptima de les cel·les eDRAM, ii)disseny d'una cau tolerant a fallades produïdes en les cel·les SRAM en modes de baix consum, iii)metodologia per a obtenir la relació òptima entre voltatge i freqüència en processadors amb modes de baix consum. Respecte a la primera contribució, en aquest treball es combinen les tecnologies SRAM i eDRAM per a aconseguir una memòria cau ràpida, de baix consum, àrea reduïda, i tolerant a les fallades inherents a la tecnologia SRAM.En primer lloc, aquesta dissertació se centra en un dels aspectes crítics de disseny de caus heterogènies: la capacitancia dels condensadors implementats amb tecnologia eDRAM.En aquesta dissertació s'identifica la capacitancia òptima d'una cache de dades L1 heterogènia mitjançant l'estudi del compromís entre prestacions i consum energètic.Els resultats experimentals mostren que condensadors de 10fF ofereixen prestacions similars a les d'una cau SRAM convencional estalviant un 55% de consum i reduint un 29% l'àrea ocupada per la cau. Respecte a la segona contribució, aquesta tesi proposa una organització de cau heterogènia tolerant a fallades.Actualment,reduir el voltatge d'alimentació és un mecanisme molt utilitzat per a reduir el consum en condicions de baixa càrrega.Per contra, les cel·les SRAM produeixen diferents tipus de fallades quan es redueix el voltatge d'alimentació i per tant limiten el voltatge mínim de funcionament del microprocessador. En la cau heterogènia proposta, les cel·les de memòria implementades amb tecnologia eDRAM serveixen de còpia de seguretat en cas de fallada de les cel·les SRAM, ja que el correcte funcionament de les cel·les eDRAM no es veu afectat per tensions reduïdes.L'arquitectura proposada consta de dues maneres de funcionament: high-performance mode per a voltatges d'alimentació que no indueixen fallades en cel·les implementades en tecnologia SRAM,i low-power mode per a aquells que sí ho fan.En el mode high-performance,el processador disposa de tota la capacitat de la cau.En el mode low-power es redueix la capacitat efectiva de la cau posat que algunes de les cel·les eDRAM es dediquen a la recuperació de fallades de cel·les SRAM.L'estudi de prestacions realitzat mostra que aquestes baixen fins a un màxim de 2.7% pel que fa a una cache perfecta sense fallades. Finalment,en aquesta tesi es proposa una metodologia per a trobar la relació òptima de voltatge/freqüència pel que fa al consum energètic sobre la cau heterogènia prèviament dissenyada.Per a açò,primer es caracteritzen els tipus de fallades SRAM i les probabilitats de fallada de les mateixes.Després,s'avalua el consum energètic de diferents combinacions de voltatge/freqüència quan el sistema es troba en un mode de baix consum.L'estudi mostra que la combinació òptima de voltatge i freqüència des del punt de vista energètic no sempre correspon al mínim voltatge degut a l'impacte de les fallades de SRAM en les presLorente Garcés, VJ. (2015). Cache architectures based on heterogeneous technologies to deal with manufacturing errors [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/58428TESI

    Towards adaptive balanced computing (ABC) using reconfigurable functional caches (RFCs)

    Get PDF
    The general-purpose computing processor performs a wide range of functions. Although the performance of general-purpose processors has been steadily increasing, certain software technologies like multimedia and digital signal processing applications demand ever more computing power. Reconfigurable computing has emerged to combine the versatility of general-purpose processors with the customization ability of ASICs. The basic premise of reconfigurability is to provide better performance and higher computing density than fixed configuration processors. Most of the research in reconfigurable computing is dedicated to on-chip functional logic. If computing resources are adaptable to the computing requirement, the maximum performance can be achieved. To overcome the gap between processor and memory technology, the size of on-chip cache memory has been consistently increasing. The larger cache memory capacity, though beneficial in general, does not guarantee a higher performance for all the applications as they may not utilize all of the cache efficiently. To utilize on-chip resources effectively and to accelerate the performance of multimedia applications specifically, we propose a new architecture---Adaptive Balanced Computing (ABC). ABC uses dynamic resource configuration of on-chip cache memory by integrating Reconfigurable Functional Caches (RFC). RFC can work as a conventional cache or as a specialized computing unit when necessary. In order to convert a cache memory to a computing unit, we include additional logic to embed multi-bit output LUTs into the cache structure. We add the reconfigurability of cache memory to a conventional processor with minimal modification to the load/store microarchitecture and with minimal compiler assistance. ABC architecture utilizes resources more efficiently by reconfiguring the cache memory to computing units dynamically. The area penalty for this reconfiguration is about 50--60% of the memory cell cache array-only area with faster cache access time. In a base array cache (parallel decoding caches), the area penalty is 10--20% of the data array with 1--2% increase in the cache access time. However, we save 27% for FIR and 44% for DCT/IDCT in area with respect to memory cell array cache and about 80% for both applications with respect to base array cache if we were to implement all these units separately (such as ASICs). The simulations with multimedia and DSP applications (DCT/IDCT and FIR/IIR) show that the resource configuration with the RFC speedups ranging from 1.04X to 3.94X in overall applications and from 2.61X to 27.4X in the core computations. The simulations with various parameters indicate that the impact of reconfiguration can be minimized if an appropriate cache organization is selected

    A survey of emerging architectural techniques for improving cache energy consumption

    Get PDF
    The search goes on for another ground breaking phenomenon to reduce the ever-increasing disparity between the CPU performance and storage. There are encouraging breakthroughs in enhancing CPU performance through fabrication technologies and changes in chip designs but not as much luck has been struck with regards to the computer storage resulting in material negative system performance. A lot of research effort has been put on finding techniques that can improve the energy efficiency of cache architectures. This work is a survey of energy saving techniques which are grouped on whether they save the dynamic energy, leakage energy or both. Needless to mention, the aim of this work is to compile a quick reference guide of energy saving techniques from 2013 to 2016 for engineers, researchers and students
    corecore