Meta-programming and auto-tuning in the search for high performance GPU code

Abstract

Writing high performance GPGPU code is often difficult and time-consuming, potentially requiring laborious manual tuning of low-level details. Despite these challenges, the cost in ignoring GPUs in high performance computing is increasingly large. Auto-tuning is a potential solution to the problem of tedious manual tuning. We present a framework for auto-tuning GPU kernels which are expressed in an embedded DSL, and which expose compile-time parameters for tuning. Our framework allows for kernels to be polymorphic over what search strategy will tune them, and allows search strategies to be implemented in the same meta-language as the kernel-generation code (Haskell). Further, we show how to use functional programming abstractions to enforce regular (hyper-rectangular) search spaces. We also evaluate several common search strategies on a variety of kernels, and demonstrate that the framework can tune both EDSL and ordinary CUDA code

    Similar works