Data-Width-Driven Power Gating of Integer Arithmetic Circuits
When performing narrow-width computations, power gating the unused portions of an arithmetic circuit can significantly reduce leakage power. We deploy coarse-grain power gating in 32-bit integer arithmetic circuits that frequently operate on narrow-width data. Our contributions include a design framework that automatically implements coarse-grain power-gated arithmetic circuits supporting a narrow-width input data mode, and an analysis of how circuit architecture affects the efficiency of this data-width-driven power gating scheme. As an example, coarse-grain power gating of a 45-nm 32-bit multiplier is demonstrated to yield an 11.6x static leakage energy reduction per 8x8-bit operation at a 6.7% performance penalty.
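The scheme above hinges on detecting when operands fit in a narrow bit-width so the upper portion of the datapath can be gated off. A minimal software sketch of that detection predicate (illustrative only; the function name and 8-bit threshold are assumptions, not taken from the paper):

```python
def is_narrow(x: int, width: int = 8) -> bool:
    """True if the 32-bit operand fits in the low `width` bits,
    meaning the upper portion of the datapath could be power-gated."""
    return (x & 0xFFFFFFFF) >> width == 0

# Both operands narrow: an 8x8-bit multiply could run with the
# upper 24 bits of the multiplier array gated off.
a, b = 0x2B, 0x11
print(is_narrow(a) and is_narrow(b))  # True
```

In hardware this check would be a simple zero-detect over the upper operand bits, feeding the sleep-transistor control for the gated region.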
Adaptation of a GPU simulator for modern architectures
GPUs have evolved radically over the last ten years, improving in performance, power consumption, memory, and programmability, and attracting growing interest, especially in academic research into GPU architecture. That interest led to the creation of the widely used GPGPU-Sim, a GPU simulator for general-purpose computation workloads. The simulation models currently available are based on older architectures, and as new GPU architectures have been introduced, GPGPU-Sim has not been updated to model them.
This project attempts to model a more modern GPU, the Maxwell-based GeForce GTX Titan X. This is accomplished by modifying the existing configuration files for one of the older simulation models. The changes to the configuration files include the GPU's organization, updated clock domains, and increased cache and memory sizes. To test the accuracy of the model, eleven GPGPU programs, some having multiple kernels, were executed by both the model and the physical hardware, and the results were compared using IPC as the metric.
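The kinds of configuration changes described are made through GPGPU-Sim's plain-text option files. A sketch of what such edits might look like (the option names follow GPGPU-Sim's `gpgpusim.config` format; the specific values shown here are illustrative assumptions, not the thesis's actual configuration):

```
# Organization: clusters of SIMT cores (e.g. 24 SMs for a GTX Titan X)
-gpgpu_n_clusters 24
-gpgpu_n_cores_per_cluster 1

# Clock domains: <core>:<interconnect>:<L2>:<DRAM>, in MHz
-gpgpu_clock_domains 1000.0:1000.0:1000.0:3500.0
```

Cache and memory sizes are set through similar options, after which the modified model can be validated against hardware runs of the benchmark kernels.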
While for some of the kernels the model performed within 16% of the GeForce GTX Titan X, an equal number of kernels ran either much faster or much slower on the model than on the hardware. The cases in which the model was much faster are suspected to be ones where the hardware either executed single-precision instructions as double-precision instructions, or ran entirely different machine code for the same kernel than the model did. The cases in which the model was much slower are suspected to stem from the Maxwell memory subsystem, which cannot currently be accurately modeled in GPGPU-Sim.
Shader optimization and specialization
In the field of real-time graphics for computer games, performance has a significant effect on the player’s enjoyment and immersion. Graphics processing units (GPUs) are
hardware accelerators that run small parallelized shader programs to speed up computationally expensive rendering calculations. This thesis examines optimizing shader
programs and explores ways in which data patterns on both the CPU and GPU can be
analyzed to automatically speed up rendering in games.
Initially, the effect of traditional compiler optimizations on shader source-code
was explored. Techniques such as loop unrolling or arithmetic reassociation provided
speed-ups on several devices, but different GPU hardware responded differently to
each set of optimizations. Analyzing execution traces from numerous popular PC
games revealed that much of the data passed from CPU-based API calls to GPU-based
shaders is either unused, or remains constant. A system was developed to capture this
constant data and fold it into the shaders’ source-code. Re-running the game’s rendering code using these specialized shader variants resulted in performance improvements
in several commercial games without impacting their visual quality.
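The constant-folding step described above can be sketched as a source-to-source transformation: a uniform observed to hold the same value across captured frames is replaced by a compile-time constant in the shader text. This is a deliberately simplified illustration (the function name, regex approach, and `exposure` uniform are assumptions; the thesis's actual system works from traced API data, not a toy string substitution):

```python
import re

def specialize_shader(source: str, constants: dict) -> str:
    """Fold uniforms observed to be constant into the shader source,
    replacing each `uniform` declaration with a constant definition."""
    for name, value in constants.items():
        # e.g. "uniform float exposure;" -> "const float exposure = 1.5;"
        pattern = rf"uniform\s+(\w+)\s+{name}\s*;"
        source = re.sub(pattern, rf"const \1 {name} = {value};", source)
    return source

shader = "uniform float exposure;\nvoid main() { gl_FragColor = vec4(exposure); }"
print(specialize_shader(shader, {"exposure": "1.5"}))
```

With the value baked in, the shader compiler can propagate the constant and eliminate work that depended on it, which is where the rendering speed-up comes from.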