The main purpose of implementing the code on the Kepler architecture is to speed up the GPU code from our group's previous work by using the new features of NVIDIA CUDA's Kepler architecture. This thesis therefore focuses specifically on this latest architecture.
To benefit from the Kepler architecture, the primary task is to adapt the code to two new features: Warp Shuffle and Dynamic Parallelism. The new code changes how data is transferred and how new kernel functions are launched. A further challenge is to trade off the use of resources on each thread to get the best performance.
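As a rough illustration of the first feature, the sketch below sums an array with Warp Shuffle, so that partial sums move between the registers of threads in a warp rather than through shared memory. It is not the code evaluated in this thesis: the names `warpReduceSum`, `sumKernel`, and `warpShuffleSum` are hypothetical, the block size of 256 is an arbitrary choice, and error checking is omitted for brevity. It also assumes a CUDA 9 or later toolkit, which provides `__shfl_down_sync`; toolkits from the Kepler era exposed the same operation as `__shfl_down` without the mask argument.

```cuda
#include <cuda_runtime.h>

// Sum the values held by the 32 lanes of a warp using register-to-register
// shuffles. After the loop, lane 0 holds the total for the warp.
__inline__ __device__ float warpReduceSum(float val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        // Each lane adds the value held by the lane `offset` positions above it.
        val += __shfl_down_sync(0xffffffffu, val, offset);
    }
    return val;
}

// Hypothetical kernel: every warp reduces its 32 inputs, then lane 0 of each
// warp adds the warp total to the global result.
__global__ void sumKernel(const float *in, float *out, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Out-of-range threads contribute 0 so that all lanes take part in the shuffle.
    float val = (idx < n) ? in[idx] : 0.0f;
    val = warpReduceSum(val);
    if ((threadIdx.x % warpSize) == 0) {
        atomicAdd(out, val);
    }
}

// Hypothetical host wrapper: copies the input to the device, runs the kernel,
// and returns the sum. Error checking is omitted (assumption: all calls succeed).
float warpShuffleSum(const float *hostIn, int n) {
    float *dIn = nullptr;
    float *dOut = nullptr;
    float result = 0.0f;

    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, hostIn, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(float));

    const int threads = 256;  // a multiple of the warp size
    const int blocks = (n + threads - 1) / threads;
    sumKernel<<<blocks, threads>>>(dIn, dOut, n);

    // This copy blocks until the kernel has finished.
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Calling `warpShuffleSum` on a host array returns the sum of its elements. Because the block size is a multiple of 32 and no thread exits before the shuffle, the full mask `0xffffffff` is valid for every warp.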
The performance of the new code varies with the work size. In general, the speedup is between 17% and 33%, with larger systems achieving better performance. This is a reasonable improvement given that only two new features are used. The main contribution of this thesis is a detailed evaluation of these two Kepler architectural features, which provides guidance to other researchers on the potential performance benefits of modifying their code. They can then make appropriate modifications and achieve a reasonable speedup according to the structure of their own code.