Data-Width-Driven Power Gating of Integer Arithmetic Circuits
When performing narrow-width computations, power gating the unused portions of an arithmetic circuit can significantly reduce leakage power. We deploy coarse-grain power gating in 32-bit integer arithmetic circuits that frequently operate on narrow-width data. Our contributions include a design framework that automatically implements coarse-grain power-gated arithmetic circuits supporting a narrow-width input data mode, and an analysis of how circuit architecture affects the efficiency of this data-width-driven power gating scheme. As an example, coarse-grain power gating of a 45-nm 32-bit multiplier is demonstrated to yield an 11.6x static leakage energy reduction per 8x8-bit operation at a 6.7% performance penalty.
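The scheme above hinges on detecting when operands fit in a narrow bit-width so the upper portion of the datapath can be gated off. A minimal software sketch of that detection predicate (illustrative only; the function name and 8-bit threshold are assumptions, not taken from the paper):

```python
def is_narrow(x: int, width: int = 8) -> bool:
    """True if the 32-bit operand fits in the low `width` bits,
    meaning the upper portion of the datapath could be power-gated."""
    return (x & 0xFFFFFFFF) >> width == 0

# Both operands narrow: an 8x8-bit multiply could run with the
# upper 24 bits of the multiplier array gated off.
a, b = 0x2B, 0x11
print(is_narrow(a) and is_narrow(b))  # True
```

In hardware this check would be a simple zero-detect over the upper operand bits, feeding the sleep-transistor control for the gated region.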
Adaptation of a GPU simulator for modern architectures
GPUs have evolved radically over the last ten years, improving in performance, power consumption, memory, and programmability, and attracting growing interest, especially in academic research into GPU architecture. That interest led to the creation of the widely used GPGPU-Sim, a GPU simulator for general-purpose computation workloads. The simulation models currently available are based on older architectures, and as new GPU architectures have been introduced, GPGPU-Sim has not been updated to model them.
This project attempts to model a more modern GPU, the Maxwell-based GeForce GTX Titan X. This is accomplished by modifying the existing configuration files for one of the older simulation models. The changes to the configuration files include the GPU's organization, updated clock domains, and increased cache and memory sizes. To test the accuracy of the model, eleven GPGPU programs, some having multiple kernels, were executed by both the model and the physical hardware, and the results were compared using IPC as the metric.
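The kinds of configuration changes described are made through GPGPU-Sim's plain-text option files. A sketch of what such edits might look like (the option names follow GPGPU-Sim's `gpgpusim.config` format; the specific values shown here are illustrative assumptions, not the thesis's actual configuration):

```
# Organization: clusters of SIMT cores (e.g. 24 SMs for a GTX Titan X)
-gpgpu_n_clusters 24
-gpgpu_n_cores_per_cluster 1

# Clock domains: <core>:<interconnect>:<L2>:<DRAM>, in MHz
-gpgpu_clock_domains 1000.0:1000.0:1000.0:3500.0
```

Cache and memory sizes are set through similar options, after which the modified model can be validated against hardware runs of the benchmark kernels.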
While for some of the kernels the model performed within 16% of the GeForce GTX Titan X, an equal number of kernels ran either much faster or much slower on the model than on the hardware. The cases in which the model was much faster are suspected to be ones where the hardware either executed single-precision instructions as double-precision instructions, or ran entirely different machine code for the same kernel than the model did. The cases in which the model was much slower are suspected to stem from the Maxwell memory subsystem, which cannot currently be accurately modeled in GPGPU-Sim.
Shader optimization and specialization
In the field of real-time graphics for computer games, performance has a significant effect on the player’s enjoyment and immersion. Graphics processing units (GPUs) are
hardware accelerators that run small parallelized shader programs to speed up computationally expensive rendering calculations. This thesis examines optimizing shader
programs and explores ways in which data patterns on both the CPU and GPU can be
analyzed to automatically speed up rendering in games.
Initially, the effect of traditional compiler optimizations on shader source-code
was explored. Techniques such as loop unrolling or arithmetic reassociation provided
speed-ups on several devices, but different GPU hardware responded differently to
each set of optimizations. Analyzing execution traces from numerous popular PC
games revealed that much of the data passed from CPU-based API calls to GPU-based
shaders is either unused, or remains constant. A system was developed to capture this
constant data and fold it into the shaders’ source-code. Re-running the game’s rendering code using these specialized shader variants resulted in performance improvements
in several commercial games without impacting their visual quality.
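The constant-folding step described above can be sketched as a source-to-source transformation: a uniform observed to hold the same value across captured frames is replaced by a compile-time constant in the shader text. This is a deliberately simplified illustration (the function name, regex approach, and `exposure` uniform are assumptions; the thesis's actual system works from traced API data, not a toy string substitution):

```python
import re

def specialize_shader(source: str, constants: dict) -> str:
    """Fold uniforms observed to be constant into the shader source,
    replacing each `uniform` declaration with a constant definition."""
    for name, value in constants.items():
        # e.g. "uniform float exposure;" -> "const float exposure = 1.5;"
        pattern = rf"uniform\s+(\w+)\s+{name}\s*;"
        source = re.sub(pattern, rf"const \1 {name} = {value};", source)
    return source

shader = "uniform float exposure;\nvoid main() { gl_FragColor = vec4(exposure); }"
print(specialize_shader(shader, {"exposure": "1.5"}))
```

With the value baked in, the shader compiler can propagate the constant and eliminate work that depended on it, which is where the rendering speed-up comes from.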