Modern Graphics Processing Units (GPUs) provide high computation power at low cost and have been described as desktop supercomputers. GPUs now expose a general, data-parallel programming model through CUDA and CAL, which present the GPU as a massively multithreaded architecture. Several high-performance, general data processing algorithms, such as sorting and matrix multiplication, have been developed for GPUs. In this paper, we present a set of general graph algorithms on the GPU using the CUDA programming model. We present implementations of breadth-first search, st-connectivity, single-source shortest path, all-pairs shortest path, minimum spanning tree, and maximum flow algorithms on commodity GPUs. Our implementations exhibit high performance, especially on large graphs. We experiment on random, scale-free, and real-life graphs of up to millions of vertices. Parallel algorithms for such problems have been reported in the literature before, especially on supercomputers. Those approaches are typically divide-and-conquer: individual processing nodes solve smaller sub-problems, and a combining step follows. The massively multithreaded model of the GPU makes it possible to apply the data-parallel approach even to irregular algorithms such as graph algorithms, using O(V) or O(E) simultaneous threads. The algorithms and the underlying techniques presented in this paper are likely to be applicable to many irregular algorithms on GPUs.
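To make the O(V)-thread, data-parallel idea concrete, the following is a minimal sketch of a level-synchronous breadth-first search in which each vertex is handled by one logical thread. It is not the implementation from the paper: it is a sequential C++ emulation in which an ordinary loop over vertices stands in for the GPU thread grid, and the names `Graph`, `bfs_kernel`, and `bfs_levels`, as well as the compressed adjacency-list layout, are assumptions made for illustration.

```cpp
#include <vector>

// Hypothetical graph layout (an assumption, not taken from the paper):
// the neighbours of vertex v are edges[offsets[v]] .. edges[offsets[v + 1] - 1].
struct Graph {
    std::vector<int> offsets;  // size V + 1
    std::vector<int> edges;    // size E
};

// Work done for a single vertex v at one BFS level. On a GPU, each vertex
// would be assigned to its own thread, giving O(V) simultaneous threads.
// Returns true if v discovered at least one unvisited neighbour.
bool bfs_kernel(const Graph& g, int v, int level, std::vector<int>& cost) {
    if (cost[v] != level) return false;  // v is not on the current frontier
    bool changed = false;
    for (int i = g.offsets[v]; i < g.offsets[v + 1]; ++i) {
        int u = g.edges[i];
        if (cost[u] == -1) {
            // Every frontier vertex that reaches u writes the same value,
            // so the order in which vertices are processed does not matter.
            cost[u] = level + 1;
            changed = true;
        }
    }
    return changed;
}

// Level-synchronous BFS: one "kernel launch" per level, repeated until no
// vertex discovers anything new. Assumes source is a valid vertex index.
// Returns the BFS level of every vertex, or -1 if it is unreachable.
std::vector<int> bfs_levels(const Graph& g, int source) {
    const int V = static_cast<int>(g.offsets.size()) - 1;
    std::vector<int> cost(V, -1);
    cost[source] = 0;
    bool changed = true;
    for (int level = 0; changed; ++level) {
        changed = false;
        // Sequential stand-in for launching one thread per vertex.
        for (int v = 0; v < V; ++v)
            changed = bfs_kernel(g, v, level, cost) || changed;
    }
    return cost;
}
```

Calling `bfs_levels(g, 0)` on a graph stored in this layout returns one level per vertex. In an actual CUDA version, the inner loop over `v` would presumably be replaced by a kernel launch with one thread per vertex, and the `changed` flag would live in device memory.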