Hybrid Vector Library: From Memory Bound to Compute Bound with NVVM

Abstract

Existing source code usually interleaves data management, error checking, text processing and actual compute. On general-purpose processors, this mixture of tasks is not necessarily an issue, and performance is often satisfactory as is. However, when targeting a GPU, this hybrid computing turns into a coding challenge: each individual computing task does not expose a sufficient workload, and porting the whole application requires a significant investment in the software asset. We propose an alternative approach with runtime compilation based on function calls to a compute library. Hybrid Vector Library (HVL) operates on vectors, in a manner similar to BLAS level 1 routines or MKL routines, and also provides functions such as square root or exponential. In essence, all operations are performed on a vector of values. We illustrate the performance results of this approach on a typical financial benchmark.

Existing solutions such as ArrayFire do not allow a custom device function to be called in the middle of a sequence of level 1 routines. We address that issue by also processing these functions. We follow the call graph from the main compute routine and generate cubin files for user-defined device functions. These functions are then linked at runtime to the HVL call sequence, and usually generate a JCAL instruction in SASS, in a similar way to sqrt.

Our approach gives benefits to user code similar to those of ArrayFire, with the added flexibility of custom device functions.
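The idea of moving a sequence of vector operations from memory bound towards compute bound can be illustrated with a minimal CPU-side sketch in plain C++. This is not the HVL API and does not use NVVM: the names `unfused`, `fused` and `user_payoff` are hypothetical and chosen for illustration only. The first function applies one vector operation per pass over memory, as a chain of level 1 style calls would; the second performs the same arithmetic in a single pass, which is the kind of result a runtime-compiled call sequence could aim for, with `user_payoff` standing in for a user-defined device function called in the middle of the sequence.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical user-defined function, standing in for a custom device
// function that the user wants to call inside a vector call sequence.
inline double user_payoff(double s, double strike) {
    return s > strike ? s - strike : 0.0;
}

// Unfused version: each vector operation is a separate pass over memory
// and writes a full temporary vector, so memory traffic dominates.
std::vector<double> unfused(const std::vector<double>& x, double a, double strike) {
    const std::size_t n = x.size();
    std::vector<double> t1(n), t2(n), out(n);
    for (std::size_t i = 0; i < n; ++i) t1[i] = a * x[i];                      // scale
    for (std::size_t i = 0; i < n; ++i) t2[i] = std::exp(t1[i]);               // exponential
    for (std::size_t i = 0; i < n; ++i) out[i] = user_payoff(t2[i], strike);   // user function
    return out;
}

// Fused version: the same sequence is evaluated in one pass, reading each
// input element once and keeping intermediate values out of memory.
std::vector<double> fused(const std::vector<double>& x, double a, double strike) {
    const std::size_t n = x.size();
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = user_payoff(std::exp(a * x[i]), strike);
    }
    return out;
}
```

Both functions return the same values for the same inputs; the fused form reads one vector and writes one vector, whereas the unfused form also writes and re-reads two temporaries, so the ratio of arithmetic to memory traffic is higher in the fused form.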
