Search CORE

3 research outputs found

Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams

Author: Czarnul Paweł
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 16/12/2020
Field of study

The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using multiple CUDA streams for overlapping communication and computations. This is crucial for efficient processing of data, in particular incoming data stream processing that would naturally be forwarded using multiple CUDA streams to GPUs. Performance is evaluated for various compute time to host-device communication time ratios, numbers of CUDA streams, for various numbers of threads managing computations on GPUs. Tests also reveal benefits of using CUDA MPS for overlapping communication and computations when using multiple processes. Furthermore, using standard memory allocation on a GPU and Unified Memory versions are compared, the latter including programmer added prefetching. Performance of a hybrid CPU+GPU version as well as scaling across multiple GPUs are demonstrated showing good speed-ups of the approach. Finally, the performance per power consumption of selected configurations are presented for various numbers of streams and various relative performances of GPUs and CPUs

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Performance optimization for SpMV on multi-GPU systems using threads and multiple streams

Author: Guo Ping
Zhang Changjiang
Publication venue: Kean Digital Learning Commons
Publication date: 01/10/2016
Field of study

Sparse matrix-vector multiplication (SpMV) is a key operation in scientific computing and engineering ap-plications. This paper presents an optimization strategy to improve SpMV performance on the multi-GPU systems by adopting OpenMP threads and multiple CUDA streams. We propose an efficient scheme to control multiple GPUs jointly complete SpMV computations by making use of OpenMP threads. Moreover, we adopt streamed approach to increase concurrency to further improve SpMV performance. In our paper, we use HYB (Hybrid ELL/COO), a hybrid sparse storage format, to demonstrate the effectiveness of our proposed approach. Our experimental results show that our approach achieves an average speedup of 3.80 over the existing SpMV implementation on a single GPU

Crossref

Kean Digital Learning Commons