Architecture-Aware Optimization on a 1600-core Graphics Processor

Daga, Mayank; Feng, Wu-chun; Scogland, Thomas R.W.



Authors: Mayank Daga, Wu-chun Feng, Thomas R.W. Scogland
Publication date: 1 July 2011

Abstract

The graphics processing unit (GPU) continues to make significant strides as an accelerator in commodity cluster computing for high-performance computing (HPC). For example, three of the top five fastest supercomputers in the world, as ranked by the TOP500, employ GPUs as accelerators. Despite this increasing interest in GPUs, however, optimizing the performance of a GPU-accelerated compute node requires deep technical knowledge of the underlying architecture. Although significant literature exists on how to optimize GPU performance on the more mature NVIDIA CUDA architecture, the same is not true for OpenCL on the AMD GPU. Consequently, we present and evaluate architecture-aware optimizations for the AMD GPU. The most prominent optimizations include (i) explicit use of registers, (ii) use of vector types, (iii) removal of branches, and (iv) use of image memory for global data. We demonstrate the efficacy of our AMD GPU optimizations by applying each optimization in isolation as well as in concert to a large-scale molecular modeling application called GEM. Via these AMD-specific GPU optimizations, the AMD Radeon HD 5870 GPU delivers 65% better performance than with the well-known NVIDIA-specific optimizations.
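As a rough illustration of optimization (iii), removal of branches, the sketch below contrasts a loop containing a data-dependent `if` with an equivalent branch-free form that uses the comparison result as a multiplier. This is not the authors' GEM kernel and it is plain C rather than OpenCL; the function names `sum_above_branchy` and `sum_above_branchless` are hypothetical and chosen only for this example. It assumes finite inputs, since multiplying an infinity or NaN by zero would change the result.

```c
#include <stddef.h>

/* Hypothetical example, not taken from the report.
 * Branching version: the `if` makes control flow depend on the data. */
float sum_above_branchy(const float *x, size_t n, float threshold) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (x[i] > threshold) {
            acc += x[i];
        }
    }
    return acc;
}

/* Branch-free version: the comparison evaluates to 0 or 1, which is
 * converted to float and used to mask the contribution of each element.
 * Assumes every x[i] is finite. */
float sum_above_branchless(const float *x, size_t n, float threshold) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float keep = (float)(x[i] > threshold); /* 1.0f or 0.0f */
        acc += keep * x[i];
    }
    return acc;
}
```

Both functions return the same sum for finite inputs; whether the branch-free form is actually faster depends on the compiler and the target hardware, so it would need to be measured rather than assumed.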


Available Versions

Computer Science Technical Reports @Virginia Tech

oai:vtcstechreports.OAI2:1159

Last updated on 21/06/2013