Analyzing and enhancing OSKI for sparse matrix-vector multiplication
Sparse matrix-vector multiplication (SpMxV) is a kernel operation widely used
in iterative linear solvers. The same sparse matrix is multiplied by a dense
vector repeatedly in these solvers. Matrices with irregular sparsity patterns
make it difficult to utilize cache locality effectively in SpMxV computations.
In this work, we investigate single- and multiple-SpMxV frameworks for
exploiting cache locality in SpMxV computations. For the single-SpMxV
framework, we propose two cache-size-aware top-down row/column-reordering
methods based on 1D and 2D sparse matrix partitioning, utilizing the
column-net hypergraph model and an enhanced row-column-net hypergraph model of
sparse matrices. The multiple-SpMxV framework relies on splitting a given matrix into
a sum of multiple nonzero-disjoint matrices so that the SpMxV operation is
performed as a sequence of multiple input- and output-dependent SpMxV
operations. For an effective matrix splitting required in this framework, we
propose a cache-size-aware top-down approach based on 2D sparse matrix
partitioning by utilizing the row-column-net hypergraph model. The primary
objective in all three methods is to maximize the exploitation of
temporal locality. We evaluate the validity of our models and methods on a wide
range of sparse matrices through actual runs using OSKI.
Experimental results show that the proposed methods and models outperform
state-of-the-art schemes.
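
To make the splitting idea concrete, the sketch below computes y = Ax as a sequence of SpMxV operations over nonzero-disjoint submatrices that sum to the original matrix. It is only a minimal illustration under assumed simplifications: the fixed row-block split, the function name split_spmv, and the use of SciPy are not from the paper, whose splitting is instead derived from cache-size-aware 2D hypergraph partitioning.

```python
# Minimal sketch (not the paper's method): y = A x computed as a sequence of
# SpMxVs over nonzero-disjoint submatrices A_1 + ... + A_k that sum to A.
# The fixed row-block split below is an illustrative assumption; the actual
# framework chooses the split via cache-size-aware 2D hypergraph partitioning.
import numpy as np
import scipy.sparse as sp

def split_spmv(A, x, row_blocks):
    """Accumulate y = A @ x over nonzero-disjoint row-block submatrices."""
    A = A.tocsr()
    y = np.zeros(A.shape[0])
    for rows in row_blocks:
        keep = np.zeros(A.shape[0])
        keep[rows] = 1.0
        A_i = sp.diags(keep) @ A   # submatrix holding only these rows' nonzeros
        y += A_i @ x               # one input/output-dependent SpMxV
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = sp.random(8, 8, density=0.3, random_state=rng, format="csr")
    x = rng.standard_normal(8)
    y = split_spmv(A, x, [np.arange(0, 4), np.arange(4, 8)])
    assert np.allclose(y, A @ x)   # splitting leaves the result unchanged
```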
Efficient group communication for large-scale parallel clustering
Global communication requirements and the load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm in which the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution which can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error, which allows a further reduction of the communication costs.
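
To show the global reduction that this formulation seeks to relax, the sketch below simulates the straightforward parallel k-means in a single NumPy program: each partition plays the role of one process, computes partial per-cluster sums and counts over its local points, and a global reduction (a plain sum here, an allreduce in a real distributed run) yields the new centroids at every iteration. The function names and the single-program simulation are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of the straightforward parallel k-means formulation:
# each partition stands in for one process; the "global reduction" over all
# partitions (an allreduce in a real distributed run) is the per-iteration
# communication step that the proposed approach seeks to relax.
import numpy as np

def local_partial_sums(points, centroids):
    """Per-cluster sums and counts for one process' local points."""
    k, d = centroids.shape
    labels = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    sums = np.zeros((k, d))
    counts = np.zeros(k)
    for c in range(k):
        members = points[labels == c]
        sums[c] = members.sum(axis=0)
        counts[c] = len(members)
    return sums, counts

def parallel_kmeans(partitions, centroids, iters=10):
    for _ in range(iters):
        partials = [local_partial_sums(p, centroids) for p in partitions]
        total_sums = sum(s for s, _ in partials)      # global reduction
        total_counts = sum(c for _, c in partials)    # global reduction
        nonempty = total_counts > 0
        centroids = centroids.copy()
        centroids[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
    return centroids

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = np.vstack([rng.normal(0.0, 1.0, (500, 2)), rng.normal(5.0, 1.0, (500, 2))])
    partitions = np.array_split(data, 4)              # four simulated processes
    print(parallel_kmeans(partitions, data[:2].copy()))
```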