4 research outputs found
GPU peer-to-peer techniques applied to a cluster interconnect
Modern GPUs support special protocols to exchange data directly across the
PCI Express bus. While these protocols could be used to reduce GPU data
transmission times, basically by avoiding staging to host memory, they require
specific hardware features which are not available on current generation
network adapters. In this paper we describe the architectural modifications
required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class
GPUs on an FPGA-based cluster interconnect. Besides, the current software
implementation, which integrates this feature by minimally extending the RDMA
programming model, is discussed, as well as some issues raised while employing
it in a higher level API like MPI. Finally, the current limits of the technique
are studied by analyzing the performance improvements on low-level benchmarks
and on two GPU-accelerated applications, showing when and how they seem to
benefit from the GPU peer-to-peer method.Comment: paper accepted to CASS 201