Search CORE

6 research outputs found

Hardware/Software Co-design for Multicore Architectures

Author: Xu Thomas Canhao
Publication venue: Turku Centre for Computer Science
Publication date: 12/09/2012
Field of study

Siirretty Doriast

UTUPub

Energy-precision tradeoffs in the graphics pipeline

Author: Pool Jeff
Publication venue
Publication date: 01/05/2012
Field of study

The energy consumption of a graphics processing unit (GPU) is an important factor in its design, whether for a server, desktop, or mobile device. Mobile products, such as smart phones, tablets, and laptop computers, rely on batteries to function; the less the demand for power is on these batteries, the longer they will last before needing to be recharged. GPUs used in servers and desktops, while not dependent on a battery for operation, are still limited by the efficiency of power supplies and heat dissipation techniques. In this dissertation, I propose to lower the energy consumption of GPUs by reducing the precision of floating-point arithmetic in the graphics pipeline and the data sent and stored on- and off-chip. The key idea behind this work is twofold: energy can be saved through a systematic and targeted reduction in the number of bits 1) computed and 2) communicated. Reducing the number of bits computed will necessarily reduce either the precision or range of a floating point number. I focus on saving energy by way of reducing precision, which can exploit the over-provisioning of bits in many stages of the graphics pipeline. Reducing the number of bits communicated takes several forms. First, I propose enhancements to existing compression schemes for off-chip buffers to save bandwidth. I also suggest a simple extension that exploits unused bits in reduced-precision data undergoing compression. Finally, I present techniques for saving energy in on-chip communication of reduced-precision data. By designing and simulating variable-precision arithmetic circuits with promising energy versus precision characteristics and tradeoffs, I have developed an energy model for GPUs. Using this model and my techniques, I have shown that significant savings (up to 70% in computation in the vertex and pixel shader stages) are possible by reducing the precision of the arithmetic. Further, my compression approaches have enabled improvements of 1.26x over past work, and a general-purpose compressor design has achieved bandwidth savings of 34%, 87%, and 65% for color, depth, and geometry data, respectively, which is competitive with past work. Lastly, an initial exploration in signal gating unused lines in on-chip buses has suggested savings of 13-48% for the tested applications' traffic from a multiprocessor's register file to its L1 cache

Carolina Digital Repository

Software Power Analysis And Optimization For Power-Aware Multicore Systems

Author: Wang Shinan
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2014
Field of study

Among all the factors in sustainable computing, power dissipation and energy consumption, arguably speaking, are fundamental aspects of modern computer systems. Different from performance metric, power dissipation is not easy to measure because hardware instrumentation is usually required. Yet as an indispensable component of a computer system, software becomes a major factor affecting power dissipation besides hardware energy-efficiency and power states. With detailed information on resource usage and power dissipation of an application/software, software developers will be able to leverage algorithms and implementations in order to produce power-efficient solutions. Hardware instrumentation, despite its accuracy, is costly and complicated to set up. A general solution to connect software with hardware along with detailed power and system information will improve the system overall efficiency. In this work, we design and implement a general solution to analyze and model software power dissipation. Based on the analysis, we propose a combined solution to optimize the energy efficiency of parallel workload. Starting from the hands-on power measurement method in detail, we provide a fine-grain power profile of two computer systems using hardware instrumentation. Being focusing on dynamic power dissipation analysis, we propose a two-level power model for power-aware multicore computer systems. Based on the model, we design and implement SPAN to relate power dissipation to the different portions of an application using the proposed power model. By using SPAN, developers can easily identify the sections of code consuming the most power in the program. Alternatively, to enable automatic source code instrumentation, we utilize compiler techniques to insert profiling code before and after each function in source code. The expected outcome includes an open source function level power profiling tool, Safari. Using the profiling tools, we propose a model to capture the relationship between concurrency (C), power (P) and execution time (T). By changing the system configuration for different parallel workload, we are able to achieve optimal/near optimal energy-efficient execution of a given workload on a specific platform

CiteSeerX

Digital Commons@Wayne State University

High-level power analysis for multi-core chips

Author: Eisley Noel A.
Peh Lishiuan
Soteriou Vassos
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2006
Field of study

Technology trends have led to the advent of multi-core chips in the form of both general-purpose chip multiprocessors (CMPs)and embedded multi-processor systems-on-a-chip (MPSoCs), with on-chip networks increasingly becoming the defacto communication fabric between cores as the demand for on-chip bandwidth scales up. These multi-core chips are composed of two key subcomponents: processor cores and a network fabric. Rapid, early-stage power estimation of these multi-core chips is crucial in assisting compilers in determining the most efficient thread partitioning and place-ment. While prior work in high-level power analysis exists, the focus has been on uniprocessor cores and ignores the interactions between cores via the on-chip network, as well as the power contribution of the on-chip fabric itself. In this paper we propose a ?rst high-level power analysis framework that synergistically considers both computation and communication in a complete CMP system. Processor cores and the communication fabric are both abstracted as network nodes and links, so data dependencies, structural dependencies and communication dependencies are all modeled as resource contention, with resource utilization as a proxy for relative power. Our tool has been validated against the cycle-accurate BTL simulator of the MITRawCMP, showing an average speed up of 7X while achieving relative accuracy of 9.1%. We see this as a ?rst step towards enabling the implementation of parallelizing compilers that explore various power-performance tradeoffs for future multi-core chip

Ktisis

ABSTRACT High-Level Power Analysis for Multi-Core Chips

Author
Publication venue
Publication date
Field of study

Technology trends have led to the advent of multi-core chips in the form of both general-purpose chip multiprocessors (CMPs) and embedded multi-processor systems-on-a-chip (MPSoCs), with on-chip networks increasingly becoming the de facto communication fabric between cores as the demand for on-chip bandwidth scales up. These multi-core chips are composed of two key subcomponents: processor cores and a network fabric. Rapid, early-stage power estimation of these multi-core chips is crucial in assisting compilers in determining the most efficient thread partitioning and placement. While prior work in high-level power analysis exists, the focus has been on uniprocessor cores and ignores the interactions between cores via the on-chip network, as well as the power contribution of the on-chip fabric itself. In this paper we propose a first high-level power analysis framework that synergistically considers both computation and communication in a complete CMP system. Processor cores and the communication fabric are both abstracted as network nodes and links, so data dependencies, structural dependencies and communication dependencies are all modeled as resource contention, with resource utilization as a proxy for relative power. Our tool has been validated against the cycle-accurate BTL simulator of the MIT Raw CMP, showing an average speedup of 7X while achieving relative accuracy of 9.1%. We see this as a first step towards enabling the implementation of parallelizing compilers that explore various power-performance tradeoffs for future multi-core chips

CiteSeerX