
    Energy-precision tradeoffs in the graphics pipeline

    The energy consumption of a graphics processing unit (GPU) is an important factor in its design, whether for a server, desktop, or mobile device. Mobile products such as smartphones, tablets, and laptop computers rely on batteries to function; the lower the power demand on those batteries, the longer they last before needing to be recharged. GPUs used in servers and desktops, while not dependent on a battery for operation, are still limited by the efficiency of power supplies and heat-dissipation techniques. In this dissertation, I propose to lower the energy consumption of GPUs by reducing the precision of floating-point arithmetic in the graphics pipeline and of the data sent and stored on- and off-chip. The key idea behind this work is twofold: energy can be saved through a systematic and targeted reduction in the number of bits 1) computed and 2) communicated. Reducing the number of bits computed necessarily reduces either the precision or the range of a floating-point number. I focus on saving energy by reducing precision, which can exploit the over-provisioning of bits in many stages of the graphics pipeline. Reducing the number of bits communicated takes several forms. First, I propose enhancements to existing compression schemes for off-chip buffers to save bandwidth. I also suggest a simple extension that exploits unused bits in reduced-precision data undergoing compression. Finally, I present techniques for saving energy in on-chip communication of reduced-precision data. By designing and simulating variable-precision arithmetic circuits with promising energy-versus-precision characteristics and tradeoffs, I have developed an energy model for GPUs. Using this model and my techniques, I have shown that significant savings (up to 70% in computation in the vertex and pixel shader stages) are possible by reducing the precision of the arithmetic. Further, my compression approaches have enabled improvements of 1.26x over past work, and a general-purpose compressor design has achieved bandwidth savings of 34%, 87%, and 65% for color, depth, and geometry data, respectively, which is competitive with past work. Lastly, an initial exploration of signal gating unused lines in on-chip buses has suggested savings of 13-48% for the tested applications' traffic from a multiprocessor's register file to its L1 cache.
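    To illustrate the precision-reduction idea in software terms (an assumed behavioral analogue, not the dissertation's hardware design), the C sketch below emulates a reduced-precision float by zeroing low-order mantissa bits while leaving the exponent, and hence the range, untouched; the function name and the choice of keeping 10 mantissa bits are illustrative.

        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        /* Keep only the top kept_bits of the 23-bit IEEE-754 single-precision
         * mantissa, zeroing the rest; sign and exponent (range) are untouched. */
        static float truncate_mantissa(float x, unsigned kept_bits)
        {
            uint32_t bits;
            memcpy(&bits, &x, sizeof bits);                 /* reinterpret float as raw bits */
            uint32_t mask = 0xFFFFFFFFu << (23u - kept_bits);
            bits &= mask;                                   /* drop low-order mantissa bits */
            memcpy(&x, &bits, sizeof x);
            return x;
        }

        int main(void)
        {
            float full = 3.14159265f;
            float reduced = truncate_mantissa(full, 10);    /* hypothetical 10-bit mantissa */
            printf("full: %.7f  reduced: %.7f\n", full, reduced);
            return 0;
        }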

    Power efficient, event driven data acquisition and processing using asynchronous techniques

    Data acquisition systems used in remote environmental monitoring equipment and biological sensor nodes rely on a limited energy supply sourced from either energy harvesters or a battery to perform their functions. Among the building blocks of these systems are power-hungry analogue-to-digital converters (ADCs) and digital signal processors, which acquire and process samples at predetermined rates regardless of the monitored signal's behaviour. In this work we investigate power-efficient, event-driven data acquisition and processing techniques by implementing an asynchronous ADC and an event-driven, power-gated finite impulse response (FIR) filter. We present an event-driven single-slope ADC capable of generating asynchronous digital samples based on the input signal's rate of change. It utilizes a rate-of-change detection circuit, known as the slope detector, to determine at what point the input signal is to be sampled. After a sample has been obtained, its absolute voltage value is time-encoded and passed on to a time-to-digital converter (TDC) as part of a pulse stream. The resulting digital samples generated by the TDC are produced at a rate that exhibits the same rate-of-change profile as that of the input signal. The ADC is realized in a 0.35 µm CMOS process, occupies a silicon area of 340 µm by 218 µm, and consumes power that depends on the input signal's frequency. The samples from the ADC are asynchronous in nature and exhibit non-uniform time intervals between adjacent samples. In order to process such asynchronous samples, we present a FIR filter that is able to operate on them successfully and produce the desired result. The filter also has the ability to turn itself off between samples with longer sample periods, saving power in the process.
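    As a behavioral analogue of the change-driven acquisition described above (not the thesis's analogue slope-detector circuit), the following C sketch emits a (time, value) sample only when the input has moved by more than a threshold since the last emitted sample, so the output sample rate tracks the signal's rate of change; the threshold, evaluation rate, and test signal are illustrative assumptions.

        #include <math.h>
        #include <stdio.h>

        /* Change-driven sampling sketch: emit a sample only when the input has
         * moved by more than delta since the last emitted sample. */
        int main(void)
        {
            const double PI = 3.14159265358979323846;
            const double fs = 10000.0;      /* dense evaluation rate standing in for the analogue input */
            const double delta = 0.05;      /* change threshold (illustrative value) */
            double last = 0.0;
            int have_sample = 0;

            for (int n = 0; n < 10000; n++) {
                double t = n / fs;
                double x = sin(2.0 * PI * 50.0 * t);        /* stand-in input signal */
                if (!have_sample || fabs(x - last) > delta) {
                    printf("%.6f %.4f\n", t, x);            /* asynchronous (time, value) sample */
                    last = x;
                    have_sample = 1;
                }
            }
            return 0;
        }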

    Circuit and System Level Design Optimization for Power Delivery And Management

    As VLSI technology scales to the nanometer regime, power consumption has become a critical design concern for VLSI circuits. Power gating and dynamic voltage and frequency scaling (DVFS) are two effective power management techniques that are widely used in modern chip designs. Various design challenges emerge with these power management techniques in nanometer VLSI circuits. For example, power gating introduces unique power integrity issues and trade-offs between switching noise and rush-current noise. Assuring power integrity and achieving power efficiency are two highly intertwined design challenges. In addition, these trade-offs vary significantly with the supply voltage. It is difficult for conventional power-gated power delivery networks (PDNs) to fully meet the conflicting design constraints involved while maximizing power saving and minimizing supply noise. The DVFS controller and the DC-DC power converter are two highly intertwined enablers of DVFS-based systems. However, traditional DVFS techniques treat the design optimizations of the two as separate tasks, giving rise to sub-optimal designs. To address the above research challenges, we propose several circuit- and system-level design optimization techniques in this dissertation. For power-gated PDN designs, we propose systematic decoupling capacitor (decap) optimization strategies that optimally trade off power integrity against leakage saving. First, new global-decap and re-routable-decap design concepts are proposed to relax the tight interaction between power integrity and leakage power saving of a power-gated PDN at a single supply voltage level. Furthermore, we propose to leverage re-routable decaps to provide flexible decap allocation structures that better suit multiple supply voltage levels. The proposed strategies are implemented in an automatic design flow for choosing the optimal amounts of local, global, and re-routable decaps. The proposed techniques significantly increase leakage saving without jeopardizing power integrity. The flexible decap allocations enabled by re-routable decaps lead to optimal design trade-offs for PDNs operating with two supply voltage levels. To improve the effectiveness of DVFS, we analyze the drawbacks of circuit-level-only and policy-level-only optimizations and the promising opportunities resulting from cross-layer co-optimization of the DC-DC converter and online-learning-based DVFS policies. We present a cross-layer approach that optimizes the transition time, area, and energy overhead of the DC-DC converter along with key parameters of an online-learning DVFS controller. We systematically evaluate the benefits of the proposed co-optimization strategy on several processor architectures, namely single- and dual-core processors and processors with DVFS and power gating. Our results indicate that the co-optimization can introduce noticeable additional energy saving without significant performance degradation.
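    For context on the leakage-saving versus overhead trade-off that power gating introduces, a standard first-order condition (general background, not a result of this dissertation) is that gating a block only pays off when its idle interval exceeds the break-even time:

        t_{\mathrm{idle}} > t_{\mathrm{breakeven}} \approx \frac{E_{\mathrm{sleep}} + E_{\mathrm{wake}}}{P_{\mathrm{leak,saved}}}

    where E_{sleep} and E_{wake} are the energy overheads of entering and leaving the gated state and P_{leak,saved} is the leakage power eliminated while gated.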

    Design and Analysis of Power Distribution Networks in VLSI Circuits.

    Rapidly switching currents of on-chip devices can cause fluctuations in the supply voltage, which can be classified as IR and Ldi/dt drops. The voltage fluctuations in a supply network can inject noise into a circuit, which may lead to functional failures of the design. Power supply integrity verification is, therefore, a critical concern in high-performance designs. Also, with decreasing supply voltages, gate delay is becoming increasingly sensitive to supply voltage variation. With ever-diminishing clock periods, accurate analysis of the impact of supply voltage on circuit performance has also become critical. Increasing power consumption and clock frequency have exacerbated the Ldi/dt drop in every new technology generation. The Ldi/dt drop has become the dominant portion of the overall supply drop in high-performance designs. On-die passive decap, which has traditionally been used for suppressing Ldi/dt, has become expensive due to its area and leakage power overhead. This has created an urgent need for novel circuit techniques to suppress the Ldi/dt drop in power distribution networks. We provide accurate algorithmic solutions for determining the worst-case supply drop and the impact of supply noise on circuit performance. We propose a path-based and a block-based approach for computing the maximum circuit delay under power supply fluctuations. We also propose an early-mode supply-drop estimation approach and a statistical approach for power grid analysis. All the proposed approaches are vectorless and account for both IR and Ldi/dt drops. We also propose a performance-aware decoupling capacitance allocation technique which uses timing slacks to drive the optimization. Finally, we present analog as well as all-digital circuit techniques for inductive supply noise suppression. The proposed all-digital circuit techniques were implemented in a test chip fabricated in a 0.13 µm CMOS process. Measurements on the test chip demonstrate a reduction in the supply fluctuations by 57% for ramp loads and by 75% during resonance. We also present a low-power, all-digital on-chip oscilloscope for accurate measurement of supply noise. Supply noise measurements obtained from the on-chip oscilloscope were validated to conform well to those obtained from a traditional supply-drop monitor and direct on-chip probing.
    Ph.D., Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/58508/1/spant_1.pd
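    For reference, the two supply-noise components classified above follow the usual first-order supply-drop model (a textbook relation rather than a contribution of this work):

        \Delta V_{\mathrm{supply}} \approx I \cdot R + L \frac{di}{dt}

    where R and L are the effective resistance and inductance of the power delivery path and di/dt is the rate of change of the load current.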

    Radar Imaging in Challenging Scenarios from Smart and Flexible Platforms


    Cross-Layer Approaches for an Aging-Aware Design of Nanoscale Microprocessors

    Thanks to the aggressive scaling of transistor dimensions, computers have revolutionized our lives. However, the increasing unreliability of devices fabricated in nanoscale technologies has emerged as a major threat to the future success of computers. In particular, accelerated transistor aging is of great importance, as it reduces the lifetime of digital systems. This thesis addresses this challenge by proposing new methods to model, analyze, and mitigate aging at the microarchitecture level and above.

    Distributed IC Power Delivery: Stability-Constrained Design Optimization and Workload-Aware Power Management

    Power delivery presents key design challenges in today's systems, ranging from high-performance microprocessors to mobile systems-on-chip (SoCs). A robust power delivery system is essential to ensure reliable operation of on-die devices. Nowadays it has become an important design trend to place multiple voltage regulators on-chip in a distributed manner to cope with power supply noise. However, stability concerns arise because of the complex interactions between the multiple voltage regulators and the bulky network of surrounding passive parasitics. The recently developed hybrid stability theorem (HST) is promising for dealing with the stability of such systems by efficiently capturing the effects of all interactions; however, large overdesign and hence severe performance degradation are caused by the intrinsic conservativeness of the underlying HST framework. To address this challenge, this dissertation first extends the HST by proposing a frequency-dependent system partitioning technique to substantially reduce the pessimism in stability evaluation. By systematically exploring the theoretical foundation of the HST framework, we identify the critical constraints under which the partitioning technique can be performed rigorously to remove conservativeness while maintaining key theoretical properties of the partitioned subsystems. Based on that, we develop an efficient stability-ensuring automatic design flow for large power delivery systems with distributed on-chip regulation. Using the proposed approach, we further discover new design insights for circuit designers, such as how regulator topology, on-chip decoupling capacitance, and the number of integrated voltage regulators can be optimized for improved system trade-offs between stability and performance. Besides stability, power efficiency must be improved in every possible way while maintaining high power quality. It can be argued that the ultimate power integrity and efficiency may be best achieved via a heterogeneous chain of voltage processing starting from on-board switching voltage regulators (VRs), to on-chip switching VRs, and finally to networks of distributed on-chip linear VRs. As such, we propose a heterogeneous voltage regulation (HVR) architecture encompassing regulators with complementary characteristics in response time, size, and efficiency. By exploiting the rich heterogeneity and tunability in HVR, we develop systematic workload-aware control policies to adapt the heterogeneous VRs to workload changes at multiple temporal scales, significantly improving system power efficiency while guaranteeing power integrity. The proposed techniques are further supported by hardware-accelerated machine-learning prediction of non-uniform spatial workload distributions for more accurate HVR adaptation at fine time granularity. Our evaluations based on the PARSEC benchmark suite show that the proposed adaptive three-stage HVR reduces total system energy dissipation by up to 23.9%, and by 15.7% on average, compared with conventional static two-stage voltage regulation using off- and on-chip switching VRs. Compared with the three-stage static HVR, our runtime control reduces system energy by up to 17.9%, and by 12.2% on average. Furthermore, the proposed machine-learning prediction offers up to a 4.1% reduction in system energy.

    Space programs summary no. 37-61, volume 2 for the period 1 November - 31 December 1969. The deep space network

    Research and developments in the Deep Space Network program.

    Modeling, exploration, and power estimation for dynamically reconfigurable heterogeneous architectures

    The use of reconfigurable accelerators in the design of heterogeneous systems-on-chip offers interesting opportunities to increase performance and reduce energy consumption. Such accelerators are commonly used alongside one or more processors to offload intensive computations and data-flow processing. The concept of dynamic reconfiguration, supported by some FPGA vendors, enables much more flexible systems, notably by allowing the execution of computational blocks to be sequenced in time on the same silicon area, thereby reducing resource requirements. However, dynamic reconfiguration is not without impact on overall system performance, and it is difficult to estimate the repercussions of configuration decisions on energy consumption. The main objective of this thesis is to propose an exploration methodology for assessing the impact of the implementation choices made for the different tasks of an application on a system-on-chip containing a dynamically reconfigurable resource, with the aim of optimizing energy consumption or execution time. To this end, we have established power consumption models of reconfigurable components, in particular FPGAs, that assist the designer. Using a measurement methodology on a Virtex-5, we first show that it is possible to generate hardware accelerators of various sizes with differing execution-time and energy characteristics. Then, in order to quantify the implementation costs of these accelerators, we build three power models of dynamic partial reconfiguration. Finally, from the defined models and the generated accelerators, we develop an algorithm that explores the implementation and allocation options for a complete system. Based on a high-level modeling platform, it analyzes the implementation costs of the tasks and their execution on the different available resources (processor or reconfigurable region). The solutions offering the best performance with respect to the design constraints are retained.
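    As a rough illustration of the kind of overhead such reconfiguration models must capture (the thesis builds three detailed models; the relation below is only an assumed first-order form), the time and energy cost of a partial reconfiguration scale with the size of the partial bitstream:

        t_{\mathrm{reconf}} \approx \frac{S_{\mathrm{bitstream}}}{\Theta_{\mathrm{config}}}, \qquad E_{\mathrm{reconf}} \approx P_{\mathrm{reconf}} \cdot t_{\mathrm{reconf}}

    where S_{bitstream} is the partial bitstream size, \Theta_{config} the configuration-port throughput, and P_{reconf} the power drawn during reconfiguration.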