Fast recursive matrix multiplication for multi-core architectures
Abstract: In this article, we present a fast algorithm for matrix multiplication optimized for recent multi-core architectures. The implementation exploits several methodologies from parallel programming, including recursive decomposition, efficient low-level implementations of basic blocks, software prefetching, and task scheduling, resulting in a multilevel algorithm with adaptive features. Measurements on different systems and comparisons with GotoBLAS, the Intel Math Kernel Library (MKL), and the AMD Core Math Library (ACML) show that the implementation presented achieves very high efficiency.
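The recursive decomposition the abstract mentions can be sketched as follows. This is an illustrative quadrant-splitting scheme, not the authors' implementation: the base case here simply falls back to NumPy's BLAS-backed multiply, where a real implementation would use a tuned low-level micro-kernel and spawn the four quadrant products as parallel tasks.

```python
import numpy as np

def recmm(A, B, threshold=64):
    """Multiply square matrices A @ B by recursive quadrant decomposition.

    Illustrative sketch: below `threshold` (or for odd sizes) we fall
    back to the optimized library kernel; above it, each quadrant of C
    is formed from two recursive sub-products, which is where a
    task-parallel runtime could schedule independent work.
    """
    n = A.shape[0]
    if n <= threshold or n % 2 != 0:
        return A @ B  # base case: hand off to the tuned kernel
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    C = np.empty((n, n), dtype=A.dtype)
    C[:h, :h] = recmm(A11, B11, threshold) + recmm(A12, B21, threshold)
    C[:h, h:] = recmm(A11, B12, threshold) + recmm(A12, B22, threshold)
    C[h:, :h] = recmm(A21, B11, threshold) + recmm(A22, B21, threshold)
    C[h:, h:] = recmm(A21, B12, threshold) + recmm(A22, B22, threshold)
    return C
```

The recursion improves cache behavior because each sub-product eventually fits in cache, which is one reason recursive layouts pair well with software prefetching.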
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices, where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current successes and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Efficient Randomized Algorithms for the Fixed-Precision Low-Rank Matrix Approximation
Randomized algorithms for low-rank matrix approximation are investigated, with emphasis on the fixed-precision problem and computational efficiency for handling large matrices. The algorithms are based on the so-called QB factorization, where Q is an orthonormal matrix. First, a mechanism for calculating the approximation error in the Frobenius norm is proposed, which enables efficient adaptive rank determination for large and/or sparse matrices. It can be combined with any QB-form factorization algorithm in which B's rows are incrementally generated. Based on the blocked randQB algorithm by P.-G. Martinsson and S. Voronin, this results in an algorithm called randQB_EI. We then further revise the algorithm to obtain a pass-efficient algorithm, randQB_FP, which is mathematically equivalent to the existing randQB algorithms and also suitable for the fixed-precision problem. In particular, randQB_FP can serve as a single-pass algorithm for calculating leading singular values under certain conditions. With large and/or sparse test matrices, we have empirically validated the merits of the proposed techniques, which exhibit remarkable speedup and memory savings over the blocked randQB algorithm. We have also demonstrated that the single-pass algorithm derived from randQB_FP is much more accurate than an existing single-pass algorithm. With data from a scenic image and an information retrieval application, we have shown the advantages of the proposed algorithms over the adaptive range finder algorithm for solving the fixed-precision problem.

Comment: 21 pages, 10 figures