1,895 research outputs found

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Fine-grained Energy and Thermal Management using Real-time Power Sensors

    Get PDF
    With extensive use of battery powered devices such as smartphones, laptops an

    SRAM Cells for Embedded Systems

    Get PDF

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    Get PDF
    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude

    Exploiting Aging Benefits for the Design of Reliable Drowsy Cache Memories

    Get PDF
    In this paper, we show how beneficial effects of aging on static power consumption can be exploited to design reliable drowsy cache memories adopting dynamic voltage scaling(DVS) to reduce static power. First, we develop an analytical model allowing designers to evaluate the long-term threshold voltage degradation induced by bias temperature instability (BTI)in a drowsy cache memory. Through HSPICE simulations, we demonstrate that, as drowsy memories age, static power reduction techniques based on DVS become more effective because of reduction in sub-threshold current due to BTI aging. We develop a simulation framework to evaluate trade-offs between static power and reliability, and a methodology to properly select the “drowsy” data retention voltage. We then propose different architectures of a drowsy cache memory allowing designers to meet different power and reliability constraints. The performed HSPICE simulations show a soft error rate and static noise margin improvement up to 20.8% and 22.7%, respectively, compared to standard aging unaware drowsy technique. This is achieved with a limited static power increase during the very early lifetime, and with static energy saving of up to 37% in 10 years of operation, at no or very limited hardware overhead

    Reliable Power Gating with NBTI Aging Benefits

    Get PDF
    In this paper, we show that Negative Bias Temperature Instability (NBTI) aging of sleep transistors (STs), together with its detrimental effect for circuit performance and lifetime, presents considerable benefits for power gated circuits. Indeed, it reduces static power due to leakage current, and increases ST switch efficiency, making power gating more efficient and effective over time. The magnitude of these aging benefits depends on operating and environmental conditions. By means of HSPICE simulations, considering a 32nm CMOS technology, we demonstrate that static power may reduce by more than 80% in 10 years of operation. Static power decrease over time due to NBTI aging is also proven experimentally, using a test-chip manufactured with a TSMC 65nm technology. We propose an ST design strategy for reliable power gating, in order to harvest the benefits offered by NBTI aging. It relies on the design of STs with a proper lower Vth compared to the standard power switching fabric. This can be achieved by either re-designing the STs with the identified Vth value, or applying a proper forward body bias to the available power switching fabrics. Through HSPICE simulations, we show lifetime extension up to 21.4X and average static power reduction up to 16.3% compared to standard ST design approach, without additional area overhead. Finally, we show lifetime extension and several performance-cost trade-offs when a target maximum lifetime is considered

    Register-transfer-level power profiling for system-on-chip power distribution network design and signoff

    Get PDF
    Abstract. This thesis is a study of how register-transfer-level (RTL) power profiling can help the design and signoff of power distribution network in digital integrated circuits. RTL power profiling is a method which collects RTL power estimation results to a single power profile which then can be analysed in order to find interesting time windows for specifying power distribution network design and signoff. The thesis starts with theory part. Complementary metal-oxide semiconductor (CMOS) inverter power dissipation is studied at first. Next, power distribution network structure and voltage drop problems are introduced. Voltage drop is demonstrated by using power distribution network impedance figures. Common on-chip power distribution network structure is introduced, and power distribution network design flow is outlined. Finally, decoupling capacitors function and impact on power distribution network impedance are thoroughly explained. The practical part of the thesis contains RTL power profiling flow details and power profiling flow results for one simulation case in one design block. Also, some methods of improving RTL power estimation accuracy are discussed and calibration with extracted parasitic is then used to get new set of power profiling time windows. After the results are presented, overall RTL power estimation accuracy is analysed and resulted time windows are compared to reference gate-level time windows. RTL power profiling result analysis shows that resulted time windows match the theory and RTL power profiling seems to be a promising method for finding time windows for power distribution network design and signoff.Rekisterisiirtotason tehoprofilointi järjestelmäpiirin tehonsiirtoverkon suunnittelussa ja verifioinnissa. Tiivistelmä. Tässä työssä tutkitaan, miten rekisterisiirtotason (RTL) tehoprofilointi voi auttaa digitaalisten integroitujen piirien tehonsiirtoverkon suunnittelussa ja verifioinnissa. RTL-tehoprofilointi on menetelmä, joka analysoi RTL-tehoestimoinnista saadusta tehokäyrästä hyödyllisiä aikaikkunoita tehonsiirtoverkon suunnitteluun ja verifiointiin. Työ alkaa teoriaosuudella, jonka aluksi selitetään, miten CMOS-invertteri kuluttaa tehoa. Seuravaksi esitellään tehonsiirtoverkon rakenne ja pahimmat tehonsiirtoverkon jännitehäviön aiheuttajat. Jännitehäviötä havainnollistetaan myös piirikaavioiden ja impedanssikäyrien avustuksella. Lisäksi integroidun piirin tehonsiirtoverkon suunnitteluvuo ja yleisin rakenne on esitelty. Lopuksi teoriaosuus käsittelee yksityiskohtaisesti ohituskondensaattoreiden toiminnan ja vaikutuksen tehonsiirtoverkon kokonaisimpedanssiin. Työn kokeellisessa osuudessa esitellään ensin tehoprofiloinnin vuo ja sen jälkeen vuon tulokset yhdelle esimerkkilohkolle yhdessä simulaatioajossa. Lisäksi tässä osiossa käsitellään RTL-tehoestimoinnin tarkkuutta ja tehdään RTL-tehoprofilointi loisimpedansseilla kalibroidulle RTL-mallille. Lopuksi RTL-tehoestimoinnin tuloksia ja saatuja RTL-tehoprofiloinnin aikaikkunoita analysoidaan ja verrataan porttitason mallin tuloksiin. RTL-tehoprofiloinnin tulosten analysointi osoittaa, että saatavat aikaikkunat vastaavat teoriaa ja että RTL-tehoprofilointi näyttää lupaavalta menetelmältä tehosiirtoverkon analysoinnin ja verifioinnin aikaikkunoiden löytämiseen

    Towards Energy-Efficient and Reliable Computing: From Highly-Scaled CMOS Devices to Resistive Memories

    Get PDF
    The continuous increase in transistor density based on Moore\u27s Law has led us to highly scaled Complementary Metal-Oxide Semiconductor (CMOS) technologies. These transistor-based process technologies offer improved density as well as a reduction in nominal supply voltage. An analysis regarding different aspects of 45nm and 15nm technologies, such as power consumption and cell area to compare these two technologies is proposed on an IEEE 754 Single Precision Floating-Point Unit implementation. Based on the results, using the 15nm technology offers 4-times less energy and 3-fold smaller footprint. New challenges also arise, such as relative proportion of leakage power in standby mode that can be addressed by post-CMOS technologies. Spin-Transfer Torque Random Access Memory (STT-MRAM) has been explored as a post-CMOS technology for embedded and data storage applications seeking non-volatility, near-zero standby energy, and high density. Towards attaining these objectives for practical implementations, various techniques to mitigate the specific reliability challenges associated with STT-MRAM elements are surveyed, classified, and assessed herein. Cost and suitability metrics assessed include the area of nanomagmetic and CMOS components per bit, access time and complexity, Sense Margin (SM), and energy or power consumption costs versus resiliency benefits. In an attempt to further improve the Process Variation (PV) immunity of the Sense Amplifiers (SAs), a new SA has been introduced called Adaptive Sense Amplifier (ASA). ASA can benefit from low Bit Error Rate (BER) and low Energy Delay Product (EDP) by combining the properties of two of the commonly used SAs, Pre-Charge Sense Amplifier (PCSA) and Separated Pre-Charge Sense Amplifier (SPCSA). ASA can operate in either PCSA or SPCSA mode based on the requirements of the circuit such as energy efficiency or reliability. Then, ASA is utilized to propose a novel approach to actually leverage the PV in Non-Volatile Memory (NVM) arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time

    Enhancing Power Efficient Design Techniques in Deep Submicron Era

    Get PDF
    Excessive power dissipation has been one of the major bottlenecks for design and manufacture in the past couple of decades. Power efficient design has become more and more challenging when technology scales down to the deep submicron era that features the dominance of leakage, the manufacture variation, the on-chip temperature variation and higher reliability requirements, among others. Most of the computer aided design (CAD) tools and algorithms currently used in industry were developed in the pre deep submicron era and did not consider the new features explicitly and adequately. Recent research advances in deep submicron design, such as the mechanisms of leakage, the source and characterization of manufacture variation, the cause and models of on-chip temperature variation, provide us the opportunity to incorporate these important issues in power efficient design. We explore this opportunity in this dissertation by demonstrating that significant power reduction can be achieved with only minor modification to the existing CAD tools and algorithms. First, we consider peak current, which has become critical for circuit's reliability in deep submicron design. Traditional low power design techniques focus on the reduction of average power. We propose to reduce peak current while keeping the overhead on average power as small as possible. Second, dual Vt technique and gate sizing have been used simultaneously for leakage savings. However, this approach becomes less effective in deep submicron design. We propose to use the newly developed process-induced mechanical stress to enhance its performance. Finally, in deep submicron design, the impact of on-chip temperature variation on leakage and performance becomes more and more significant. We propose a temperature-aware dual Vt approach to alleviate hot spots and achieve further leakage reduction. We also consider this leakage-temperature dependency in the dynamic voltage scaling approach and discover that a commonly accepted result is incorrect for the current technology. We conduct extensive experiments with popular design benchmarks, using the latest industry CAD tools and design libraries. The results show that our proposed enhancements are promising in power saving and are practical to solve the low power design challenges in deep submicron era
    corecore