We present MGPU, a C++ programming library targeted at single-node multi-GPU
systems. Such systems combine disproportionately high floating-point performance
with high data locality and are thus well suited to implementing real-time algorithms.
We describe the library design, programming interface and implementation
details in light of this specific problem domain. The core concepts of this
work are a novel kind of container abstraction and MPI-like communication
methods for intra-system communication. We further demonstrate how MGPU is used
as a framework for porting existing GPU libraries to multi-device
architectures. Putting our library to the test, we accelerate an iterative
non-linear image reconstruction algorithm for real-time magnetic resonance
imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs
and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us
to conclude that multi-GPU systems are a viable solution for real-time MRI
reconstruction as well as signal-processing applications in general.

Comment: 15 pages, 10 figures