4,671 research outputs found
Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors
Sparse matrix-vector multiplication (SpMV) is a central building block for
scientific software and graph applications. Recently, heterogeneous processors
composed of different types of cores attracted much attention because of their
flexible core configuration and high energy efficiency. In this paper, we
propose a compressed sparse row (CSR) format based SpMV algorithm utilizing
both types of cores in a CPU-GPU heterogeneous processor. We first
speculatively execute segmented sum operations on the GPU part of a
heterogeneous processor and generate a possibly incorrect results. Then the CPU
part of the same chip is triggered to re-arrange the predicted partial sums for
a correct resulting vector. On three heterogeneous processors from Intel, AMD
and nVidia, using 20 sparse matrices as a benchmark suite, the experimental
results show that our method obtains significant performance improvement over
the best existing CSR-based SpMV algorithms. The source code of this work is
downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSRComment: 22 pages, 8 figures, Published at Parallel Computing (PARCO
Parallel Unsmoothed Aggregation Algebraic Multigrid Algorithms on GPUs
We design and implement a parallel algebraic multigrid method for isotropic
graph Laplacian problems on multicore Graphical Processing Units (GPUs). The
proposed AMG method is based on the aggregation framework. The setup phase of
the algorithm uses a parallel maximal independent set algorithm in forming
aggregates and the resulting coarse level hierarchy is then used in a K-cycle
iteration solve phase with a -Jacobi smoother. Numerical tests of a
parallel implementation of the method for graphics processors are presented to
demonstrate its effectiveness.Comment: 18 pages, 3 figure
- …