The CUDA programming language perfectly matches the data-parallel programming model, but it is a very specific way of programming Graphics Processing Unit (GPU) devices. On the other hand, the large amount of hardware that does not necessarily include a GPU is a resource that we would not like to leave unused. Exploiting these resources raises the issue of guaranteeing performance portability, a major challenge faced today by the heterogeneous high-performance programming community. The aim of this work is the development of an automatic translation tool from CUDA to C++. Operating at the source-to-source translation level, we manipulate the Abstract Syntax Tree of the CUDA program to obtain the C++ version.
To accomplish this syntactic analysis and transformation, we rely on Clang, a compiler front end for the C/C++ language family that also handles CUDA syntax. Our approach consists of mapping each CUDA block to a CPU thread and serializing the execution of the CUDA threads within each block. After describing the implementation of this tool, we show that the translated code preserves comparable performance when run on the target architecture. We also point out how the CUDA framework can be used profitably to target architectures other than GPUs.
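
As an illustration of the block-to-thread mapping, the following hand-written C++ sketch shows what a translated vector-addition kernel might look like. It is not output of the tool described here: the names vec_add_body and launch_vec_add are hypothetical, the CUDA built-ins blockIdx, blockDim and threadIdx are modelled as plain one-dimensional integers, and the kernel is assumed to contain no __syncthreads() call, which would need additional handling.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the body of a CUDA kernel computing c[i] = a[i] + b[i].
// The CUDA built-in indices are passed in as ordinary integer parameters.
static void vec_add_body(int blockIdx, int blockDim, int threadIdx,
                         const float* a, const float* b, float* c, int n) {
    int i = blockIdx * blockDim + threadIdx;  // global index, as in CUDA
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical launcher: one CPU thread per CUDA block. Inside each CPU thread,
// the CUDA threads of that block are executed one after another in a loop.
static void launch_vec_add(int gridDim, int blockDim,
                           const float* a, const float* b, float* c, int n) {
    std::vector<std::thread> workers;
    workers.reserve(gridDim);
    for (int block = 0; block < gridDim; ++block) {
        workers.emplace_back([=] {
            for (int t = 0; t < blockDim; ++t)  // serialized CUDA threads
                vec_add_body(block, blockDim, t, a, b, c, n);
        });
    }
    for (auto& w : workers) w.join();  // wait for all blocks to finish
}
```

In this sketch a launch such as kernel<<<3, 4>>>(...) would correspond to launch_vec_add(3, 4, ...). Spawning one thread per block is the simplest possible choice and is assumed here only for clarity; a practical implementation would more likely distribute blocks over a fixed pool of worker threads.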