Search CORE

5 research outputs found

Improving Numerical Accuracy for Non-Negative Matrix Multiplication on GPUs using Recursive Algorithms

Author: Alberto Fastmmw
Alexandru Nicolau
Lubomir Bic
Matthew Badin
Michael Dillencourt
Paolo D&apos
Publication venue
Publication date: 11/04/2020
Field of study

ABSTRACT Scientific computing is only bound by the limits of Moore's Law and the scalability of high performance mathematical library implementations. Most mathematical libraries however tend to focus only on general inputs, limiting their potential performance and scalability by not tailoring their implementation to specific inputs, such as non-negative inputs. By removing this limitation it is possible to improve the performance and accuracy of a range of problems. In this paper we explore the limitations of hardware to improve accuracy of non-negative matrix multiply by specifically comparing implementations on the GPU and CPU and propose algorithmic solutions to improve accuracy. Next, we demonstrate a matrix multiply implementation that takes advantage of asymptotically fast matrix multiply algorithms, which have been shown to scale better than O(N 3 ) matrix multiply implementations, and improve accuracy by up to a whole digit while increasing performance by up to 27% for matrices where the input is positive. Finally, we propose to extend the BLAS level 3 specification to non-negative matrices to allow easy integration of our solution and allow other library authors to implement their own solutions as part of an existing standard

CiteSeerX

Improving Numerical Accuracy for Non-Negative Matrix Multiplication on GPUs using Recursive Algorithms

Author: Alexandru Nicolau
Lubomir Bic
Matthew Badin
Michael Dillencourt
Publication venue
Publication date: 01/01/2013
Field of study

Scientific computing is only bound by the limits of Moore’s Law and the scalability of high performance mathematical library implementations. Most mathematical libraries however tend to focus only on general inputs, limiting their potential performance and scalability by not tailoring their implementation to specific inputs, such as non-negative inputs. By removing this limitation it is possible to improve the performance and accuracy of a range of problems. In this paper we explore the limitations of hardware to improve accuracy of non-negative matrix multiply by specifically comparing implementations on the GPU and CPU and propose algorithmic solutions to improve accuracy. Next, we demonstrate a matrix multiply implementation that takes advantage of asymptotically fast matrix multiply algorithms, which have been shown to scale better than O(N 3) matrix multiply implementations, and improve accuracy by up to a whole digit while increasing performance by up to 27 % for matrices where the input is positive. Finally, we propose to extend the BLAS level 3 specification to non-negative matrices to allow easy integration of our solution and allow other library authors to implement their own solutions as part of an existing standard

CiteSeerX

Crossref

EXPERIMENTAL EVALUATION OF PARALLEL PROGRAM SCALABILITY ON XEON PHI SMP

Author
Publication venue
Publication date
Field of study

KFUPM ePrints

EXPERIMENTAL EVALUATION OF PARALLEL PROGRAM SCALABILITY ON XEON PHI SMP

Author
Publication venue
Publication date
Field of study