thesis

Optimization of Parallel Computations on Heterogeneous GPU-Based Systems

Abstract

In this master's thesis, we design and implement MultiStream: a solution that extends the existing data-parallel skeleton library SkePU with NVIDIA CUDA Streams in order to overlap data transfers between main memory and device memory with CUDA kernel execution. We demonstrate the benefits of this approach by running a task-parallel framework, FastFlow, on top of SkePU. Finally, we compare the MultiStream-extended SkePU with an ad hoc solution to discuss the trade-offs between the level of abstraction and the maximum achievable performance.
