1 research outputs found
Computing the sparse matrix vector product using block-based kernels without zero padding on processors with AVX-512 instructions
The sparse matrix-vector product (SpMV) is a fundamental operation in many
scientific applications from various fields. The High Performance Computing
(HPC) community has therefore continuously invested a lot of effort to provide
an efficient SpMV kernel on modern CPU architectures. Although it has been
shown that block-based kernels help to achieve high performance, they are
difficult to use in practice because of the zero padding they require. In the
current paper, we propose new kernels using the AVX-512 instruction set, which
makes it possible to use a blocking scheme without any zero padding in the
matrix memory storage. We describe mask-based sparse matrix formats and their
corresponding SpMV kernels highly optimized in assembly language. Considering
that the optimal blocking size depends on the matrix, we also provide a method
to predict the best kernel to be used utilizing a simple interpolation of
results from previous executions. We compare the performance of our approach to
that of the Intel MKL CSR kernel and the CSR5 open-source package on a set of
standard benchmark matrices. We show that we can achieve significant
improvements in many cases, both for sequential and for parallel executions.
Finally, we provide the corresponding code in an open source library, called
SPC5.Comment: Published in Peer J C