2,314 research outputs found

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Chapter One – An Overview of Architecture-Level Power- and Energy-Efficient Design Techniques

    Get PDF
    Power dissipation and energy consumption became the primary design constraint for almost all computer systems in the last 15 years. Both computer architects and circuit designers intent to reduce power and energy (without a performance degradation) at all design levels, as it is currently the main obstacle to continue with further scaling according to Moore's law. The aim of this survey is to provide a comprehensive overview of power- and energy-efficient “state-of-the-art” techniques. We classify techniques by component where they apply to, which is the most natural way from a designer point of view. We further divide the techniques by the component of power/energy they optimize (static or dynamic), covering in that way complete low-power design flow at the architectural level. At the end, we conclude that only a holistic approach that assumes optimizations at all design levels can lead to significant savings.Peer ReviewedPostprint (published version

    Power Modeling and Optimization for GPGPUs

    Get PDF
    Modern graphics processing units (GPUs) supports tens of thousands of parallel threads and delivers remarkably high computing throughput. General-Purpose computing on GPUs (GPGPUs) is becoming the attractive platform for general-purpose applications that request high computational performance such as scientific computing, financial applications, medical data processing, and so on. However, GPGPUs is facing severe power challenge due to the increasing number of cores placed on a single chip with decreasing feature size. In order to explore the power optimization techniques in GPGPUs, I first build a power model for GPGPUs, which is able to estimate both dynamic and leakage power of major microarchitecture structures in GPGPUs. I then target on the power-hungry structures (e.g. register file) to explore the energy-efficient GPGPUs. In order to hide the long latency operations, GPGPUs employs the fine-grained multi-threading among numerous active threads, leading to the sizeable register files with massive power consumption. The conventional method to reduce dynamic power consumption is the supply voltage scaling. And the inter-bank tunneling FETs (TFETs) is the promising candidate compared to CMOS for low voltage operations regarding to both leakage and performance. However, always executing at the low voltage will result in significant performance degradation. In this study, I propose the hybrid CMOS-TFET based register file and allocate TFET-based registers to threads whose execution progress can be delayed to some degree to avoid the memory contentions with other threads to reduce both dynamic and leakage power, and the CMOS-based registers are still used for threads requiring normal execution speed. My experimental results show that the proposed technique achieves 30% energy (including both dynamic and leakage) reduction in register files with negligible performance degradation compared to the baseline case equipped with naive power optimization technique

    An efficient design space exploration framework to optimize power-efficient heterogeneous many-core multi-threading embedded processor architectures

    Get PDF
    By the middle of this decade, uniprocessor architecture performance had hit a roadblock due to a combination of factors, such as excessive power dissipation due to high operating frequencies, growing memory access latencies, diminishing returns on deeper instruction pipelines, and a saturation of available instruction level parallelism in applications. An attractive and viable alternative embraced by all the processor vendors was multi-core architectures where throughput is improved by using micro-architectural features such as multiple processor cores, interconnects and low latency shared caches integrated on a single chip. The individual cores are often simpler than uniprocessor counterparts, use hardware multi-threading to exploit thread-level parallelism and latency hiding and typically achieve better performance-power figures. The overwhelming success of the multi-core microprocessors in both high performance and embedded computing platforms motivated chip architects to dramatically scale the multi-core processors to many-cores which will include hundreds of cores on-chip to further improve throughput. With such complex large scale architectures however, several key design issues need to be addressed. First, a wide range of micro- architectural parameters such as L1 caches, load/store queues, shared cache structures and interconnection topologies and non-linear interactions between them define a vast non-linear multi-variate micro-architectural design space of many-core processors; the traditional method of using extensive in-loop simulation to explore the design space is simply not practical. Second, to accurately evaluate the performance (measured in terms of cycles per instruction (CPI)) of a candidate design, the contention at the shared cache must be accounted in addition to cycle-by-cycle behavior of the large number of cores which superlinearly increases the number of simulation cycles per iteration of the design exploration. Third, single thread performance does not scale linearly with number of hardware threads per core and number of cores due to memory wall effect. This means that at every step of the design process designers must ensure that single thread performance is not unacceptably slowed down while increasing overall throughput. While all these factors affect design decisions in both high performance and embedded many-core processors, the design of embedded processors required for complex embedded applications such as networking, smart power grids, battlefield decision-making, consumer electronics and biomedical devices to name a few, is fundamentally different from its high performance counterpart because of the need to consider (i) low power and (ii) real-time operations. This implies the design objective for embedded many-core processors cannot be to simply maximize performance, but improve it in such a way that overall power dissipation is minimized and all real-time constraints are met. This necessitates additional power estimation models right at the design stage to accurately measure the cost and reliability of all the candidate designs during the exploration phase. In this dissertation, a statistical machine learning (SML) based design exploration framework is presented which employs an execution-driven cycle- accurate simulator to accurately measure power and performance of embedded many-core processors. The embedded many-core processor domain is Network Processors (NePs) used to processed network IP packets. Future generation NePs required to operate at terabits per second network speeds captures all the aspects of a complex embedded application consisting of shared data structures, large volume of compute-intensive and data-intensive real-time bound tasks and a high level of task (packet) level parallelism. Statistical machine learning (SML) is used to efficiently model performance and power of candidate designs in terms of wide ranges of micro-architectural parameters. The method inherently minimizes number of in-loop simulations in the exploration framework and also efficiently captures the non-linear interactions between the micro-architectural design parameters. To ensure scalability, the design space is partitioned into (i) core-level micro-architectural parameters to optimize single core architectures subject to the real-time constraints and (ii) shared memory level micro- architectural parameters to explore the shared interconnection network and shared cache memory architectures and achieves overall optimality. The cost function of our exploration algorithm is the total power dissipation which is minimized, subject to the constraints of real-time throughput (as determined from the terabit optical network router line-speed) required in IP packet processing embedded application

    A Low-Voltage, Low-Power 4-bit BCD Adder, designed using the Clock Gated Power Gating, and the DVT Scheme

    Full text link
    This paper proposes a Low-Power, Energy Efficient 4-bit Binary Coded Decimal (BCD) adder design where the conventional 4-bit BCD adder has been modified with the Clock Gated Power Gating Technique. Moreover, the concept of DVT (Dual-vth) scheme has been introduced while designing the full adder blocks to reduce the Leakage Power, as well as, to maintain the overall performance of the entire circuit. The reported architecture of 4-bit BCD adder is designed using 45 nm technology and it consumes 1.384 {\mu}Watt of Average Power while operating with a frequency of 200 MHz, and a Supply Voltage (Vdd) of 1 Volt. The results obtained from different simulation runs on SPICE, indicate the superiority of the proposed design compared to the conventional 4-bit BCD adder. Considering the product of Average Power and Delay, for the operating frequency of 200 MHz, a fair 47.41 % reduction compared to the conventional design has been achieved with this proposed scheme.Comment: To appear in the proceedings of 2013 IEEE International Conference on Signal Processing, Computing and Control (ISPCC,13

    The impact of design techniques in the reduction of power consumption of SoCs Multimedia

    Get PDF
    Orientador: Guido Costa Souza de AraújoDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: A indústria de semicondutores sempre enfrentou fortes demandas em resolver problema de dissipação de calor e reduzir o consumo de energia em dispositivos. Esta tendência tem sido intensificada nos últimos anos com o movimento de sustentabilidade ambiental. A concepção correta de um sistema eletrônico de baixo consumo de energia é um problema de vários níveis de complexidade e exige estratégias sistemáticas na sua construção. Fora disso, a adoção de qualquer técnica de redução de energia sempre está vinculada com objetivos especiais e provoca alguns impactos no projeto. Apesar dos projetistas conheçam bem os impactos de forma qualitativa, as detalhes quantitativas ainda são incógnitas ou apenas mantidas dentro do 'know-how' das empresas. Neste trabalho, de acordo com resultados experimentais baseado num plataforma de SoC1 industrial, tentamos quantificar os impactos derivados do uso de técnicas de redução de consumo de energia. Nos concentramos em relacionar o fator de redução de energia de cada técnica aos impactos em termo de área, desempenho, esforço de implementação e verificação. Na ausência desse tipo de dados, que relacionam o esforço de engenharia com as metas de consumo de energia, incertezas e atrasos serão frequentes no cronograma de projeto. Esperamos que este tipo de orientações possam ajudar/guiar os arquitetos de projeto em selecionar as técnicas adequadas para reduzir o consumo de energia dentro do alcance de orçamento e cronograma de projetoAbstract: The semiconductor industry has always faced strong demands to solve the problem of heat dissipation and reduce the power consumption in electronic devices. This trend has been increased in recent years with the action of environmental sustainability. The correct conception of an electronic system for low power consumption is an issue with multiple levels of complexities and requires systematic approaches in its construction. However, the adoption of any technique for reducing the power consumption is always linked with some specific goals and causes some impacts on the project. Although the designers know well that these impacts can affect the design in a quality aspect, the quantitative details are still unkown or just be kept inside the company's know-how. In this work, according to the experimental results based on an industrial SoC2 platform, we try to quantify the impacts of the use of low power techniques. We will relate the power reduction factor of each technique to the impact in terms of area, performance, implementation and verification effort. In the absence of such data, which relates the engineering effort to the goals of power consumption, uncertainties and delays are frequent. We hope that such guidelines can help/guide the project architects in selecting the appropriate techniques to reduce the power consumption within the limit of budget and project scheduleMestradoCiência da ComputaçãoMestre em Ciência da Computaçã

    Formal connectivity verification of clock and reset signals in ultra-low-power SoC designs

    Get PDF
    Abstract. This thesis investigates the usage of formal connectivity verification on clock and reset signal connectivity in ultra-low-power SoC designs. The origin of power consumption in CMOS circuits is explained, and the conflict between dynamic and static power on system parameter level is introduced. Common power reduction techniques are introduced and explained in some detail. Overview of functional verification and its role in the design flow is presented. The main classification of functional verification into logic simulation and formal verification is discussed, and details of both are explained and compared. Challenges rising from low power design methodologies are introduced. Detailed view of connectivity and integration in SoC designs is provided, and a specified method of verifying connectivity is introduced in the form of formal connectivity verification. The practical part of the thesis starts with an explanation of the verification goal and requirements for achieving it. Structure of the design environment used in the verification task is explained, and the different stages that the verification was conducted on. Creation of used connectivity properties and the used process flow for the chosen software tool is presented. The process of confirming falsified properties as design bugs is introduced. The results of the verification task are presented, providing the total target amount for each verification stage, as well as the found bugs. The found bugs and their circumstances are explained. Comparison is made between the conventional method of verifying connectivity and the investigated formal method. Results show a great decrease in overall work effort, resourcing and time spent on the connectivity verification.Formaali liitettävyysverifiointi kello- ja reset-signaaleille ultra-matalan tehonkulutuksen järjestelmäpiireissä. Tiivistelmä. Tämä diplomityö tutkii formaalin liitettävyysverifionnin käyttöä kello- ja reset-signaalien yhteyksille ultra-matalan tehonkulutuksen järjestelmäpiireissä. Tehonkulutuksen lähteet CMOS piireissä selitetään, ja esitetään konflikti dynaamisen ja staattisen tehonkulutuksen välillä systeemin parametritasolla. Tavanomaisia tehonkulutusta vähentäviä tekniikoita esitellään ja selitetään jossain määrin. Funktionaalisen verifioinnin yleiskatsaus ja asema suunnitteluvuossa esitellään. Funktionaalisen verifioinnin pääjaottelua logiikkasimulaatioon ja formaaliin verifiointiin käsitellään, ja molempien yksityiskohtia selitetään ja vertaillaan. Matalan tehonkulutuksen metodologioiden aiheuttamat ongelmat esitetään. Yksityiskohtainen kuvaus liitettävyydestä ja integroinnista järjestelmäpiireissä selitetään, ja eritelty metodi liitettävyyden verifioimiselle esitellään formaalin liitettävyysverifionnin muodossa. Käytännön osuus diplomityöstä alkaa verifoinnin tavoitteen ja vaatimusten esittelemisellä. Käytetyn mallin rakenne ja verifiointitehtävä selitetään, sekä eri tasot joilla verifiointi suoritettiin. Liitettävyys-ominaisuuksien luominen, sekä käytetty prosessivuo valitulle työkalulle esitetään. Vääriksi todistettujen ominaisuuksien varmistaminen suunnitteluvirheiksi esitellään. Tulokset verifointitehtävästä esitellään, käsitellen verifioinnin kohteiden kokonaista lukumäärää molemmilla verifiointitasoilla, sekä niistä löydettyjen virheiden määrää. Löydetyt suunnitteluvirheet ja niiden seikkaperät selitetään. Vertailua tehdään perinteisen liitettävyyden verifionnin metodin ja tutkitun formaalin metodin välillä. Tulokset osoittavat suuren säästön kokonaisessa työmäärässä, resurssoinnissa sekä liitettävyyden verifiointiin kulutetussa ajassa

    Improved RISC Processor Design Using MIPS Instruction Set Approach

    Get PDF
    Processor's are playing vital role in today's environment. A lot many processor's are into the market but daily a new processor is making its position in the society. The trade off parameter's of VLSI are playing a key role in the selecting the particular processor for its application. In the days a Low Power and Low Area circuit is to be designed. This paper totally concentrates on synthesizing a low power processor in Verilog HDL. The complexity of the processor can be increased by making it suitable for low power applications. DOI: 10.17762/ijritcc2321-8169.15052

    Circuits and Systems Advances in Near Threshold Computing

    Get PDF
    Modern society is witnessing a sea change in ubiquitous computing, in which people have embraced computing systems as an indispensable part of day-to-day existence. Computation, storage, and communication abilities of smartphones, for example, have undergone monumental changes over the past decade. However, global emphasis on creating and sustaining green environments is leading to a rapid and ongoing proliferation of edge computing systems and applications. As a broad spectrum of healthcare, home, and transport applications shift to the edge of the network, near-threshold computing (NTC) is emerging as one of the promising low-power computing platforms. An NTC device sets its supply voltage close to its threshold voltage, dramatically reducing the energy consumption. Despite showing substantial promise in terms of energy efficiency, NTC is yet to see widescale commercial adoption. This is because circuits and systems operating with NTC suffer from several problems, including increased sensitivity to process variation, reliability problems, performance degradation, and security vulnerabilities, to name a few. To realize its potential, we need designs, techniques, and solutions to overcome these challenges associated with NTC circuits and systems. The readers of this book will be able to familiarize themselves with recent advances in electronics systems, focusing on near-threshold computing
    corecore