Parallel Randomized Tucker Decomposition Algorithms
The Tucker tensor decomposition is a natural extension of the singular value
decomposition (SVD) to multiway data. We propose to accelerate Tucker tensor
decomposition algorithms by using randomization and parallelization. We present
two algorithms that scale to large data and many processors, significantly
reduce both computation and communication cost compared to previous
deterministic and randomized approaches, and obtain nearly the same
approximation errors. The key idea in our algorithms is to perform randomized
sketches with Kronecker-structured random matrices, which reduces computation
compared to unstructured matrices and can be implemented using a fundamental
tensor computational kernel. We provide probabilistic error analysis of our
algorithms and implement a new parallel algorithm for the structured randomized
sketch. Our experimental results demonstrate that our combination of
randomization and parallelization achieves accurate Tucker decompositions much
faster than alternative approaches. We observe up to a 16X speedup over the
fastest deterministic parallel implementation on 3D simulation data.
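
The Kronecker-structured sketch described in this abstract can be illustrated on a small dense 3-way tensor. The following is a minimal serial sketch in NumPy, not the authors' parallel implementation: the function names, the Gaussian choice for the small random matrices, and the row-major unfolding convention are all assumptions made for illustration. It shows that multiplying the tensor by one small random matrix per mode gives the same result as multiplying the unfolded tensor by a Kronecker product of those matrices, without ever forming that product.

```python
import numpy as np

def kron_sketch_mode0(X, s1, s2, seed=0):
    """Sketch the mode-0 unfolding of a 3-way tensor X with a
    Kronecker-structured random matrix (hypothetical helper).

    Equivalent to X.reshape(I0, I1*I2) @ np.kron(O1, O2), but the
    Kronecker product is never formed: each small random matrix is
    applied along its own mode (a multi-mode tensor-times-matrix product).
    """
    rng = np.random.default_rng(seed)
    I0, I1, I2 = X.shape
    O1 = rng.standard_normal((I1, s1))  # small random matrix for mode 1
    O2 = rng.standard_normal((I2, s2))  # small random matrix for mode 2
    Y = np.einsum("ijk,jb,kc->ibc", X, O1, O2).reshape(I0, s1 * s2)
    return Y, O1, O2

def mode0_basis(X, s1, s2, seed=0):
    """Orthonormal basis for the range of the sketched mode-0 unfolding."""
    Y, _, _ = kron_sketch_mode0(X, s1, s2, seed)
    Q, _ = np.linalg.qr(Y)
    return Q
```

The two random matrices hold `I1*s1 + I2*s2` entries, whereas an unstructured sketch matrix of the same shape would hold `I1*I2*s1*s2`. The basis returned by `mode0_basis` would play the role of one Tucker factor matrix; repeating the procedure for the other modes is left out here.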
Efficient Randomized Algorithms for the Fixed-Precision Low-Rank Matrix Approximation
Randomized algorithms for low-rank matrix approximation are investigated,
with the emphasis on the fixed-precision problem and computational efficiency
for handling large matrices. The algorithms are based on the so-called QB
factorization, where Q is an orthonormal matrix. Firstly, a mechanism for
calculating the approximation error in Frobenius norm is proposed, which
enables efficient adaptive rank determination for large and/or sparse matrices.
It can be combined with any QB-form factorization algorithm in which B's rows
are incrementally generated. Based on the blocked randQB algorithm by P.-G.
Martinsson and S. Voronin, this results in an algorithm called randQB EI. Then,
we further revise the algorithm to obtain a pass-efficient algorithm, randQB
FP, which is mathematically equivalent to the existing randQB algorithms and
also suitable for the fixed-precision problem. In particular, randQB FP can serve
as a single-pass algorithm for calculating leading singular values, under
certain conditions. With large and/or sparse test matrices, we have empirically
validated the merits of the proposed techniques, which exhibit remarkable
speedup and memory saving over the blocked randQB algorithm. We have also
demonstrated that the single-pass algorithm derived by randQB FP is much more
accurate than an existing single-pass algorithm. Using data from a scenic
image and an information retrieval application, we have shown the advantages of
the proposed algorithms over the adaptive range finder algorithm for solving
the fixed-precision problem.
Comment: 21 pages, 10 figures
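
The error mechanism mentioned in this abstract can be sketched as follows. When Q has orthonormal columns and B = QᵀA, the identity ‖A − QB‖_F² = ‖A‖_F² − ‖B‖_F² holds, so the residual norm can be tracked from the rows of B as they are generated, without forming A − QB. The code below is a minimal blocked QB loop built on that identity, assuming a dense NumPy matrix and Gaussian test matrices; the function name, block size, and re-orthogonalization step are illustrative choices and should not be read as the paper's exact algorithm.

```python
import numpy as np

def rand_qb_ei(A, tol, block=10, seed=0):
    """Blocked randomized QB factorization with a running Frobenius-norm
    error indicator (illustrative sketch, not the published code).

    Returns Q (orthonormal columns), B = Q.T @ A, and the estimated
    error ||A - Q @ B||_F.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    max_rank = min(m, n)
    Q = np.zeros((m, 0))
    B = np.zeros((0, n))
    err2 = np.linalg.norm(A, "fro") ** 2  # squared error of the empty approximation
    while Q.shape[1] < max_rank and err2 > tol ** 2:
        b = min(block, max_rank - Q.shape[1])
        Omega = rng.standard_normal((n, b))
        # Sample the range of the current residual A - Q @ B.
        Y = A @ Omega - Q @ (B @ Omega)
        Qi, _ = np.linalg.qr(Y)
        # Re-orthogonalize against the columns already in Q.
        Qi, _ = np.linalg.qr(Qi - Q @ (Q.T @ Qi))
        Bi = Qi.T @ A  # new rows of B
        Q = np.hstack([Q, Qi])
        B = np.vstack([B, Bi])
        # Error indicator: subtract the energy captured by the new rows.
        err2 -= np.linalg.norm(Bi, "fro") ** 2
    return Q, B, np.sqrt(max(err2, 0.0))
```

Because the indicator is computed by subtraction, it loses accuracy once the residual approaches the square root of machine precision times ‖A‖_F, so very tight tolerances would need a different stopping test.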
An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling
We present a sparse linear system solver that is based on a multifrontal
variant of Gaussian elimination, and exploits low-rank approximation of the
resulting dense frontal matrices. We use hierarchically semiseparable (HSS)
matrices, which have low-rank off-diagonal blocks, to approximate the frontal
matrices. For HSS matrix construction, a randomized sampling algorithm is used
together with interpolative decompositions. The combination of the randomized
compression with a fast ULV HSS factorization leads to a solver with lower
computational complexity than the standard multifrontal method for many
applications, resulting in speedups of up to 7-fold for problems in our test
suite. The implementation targets many-core systems by using task parallelism
with dynamic runtime scheduling. Numerical experiments show performance
improvements over state-of-the-art sparse direct solvers. The implementation
achieves high performance and good scalability on a range of modern shared
memory parallel systems, including the Intel Xeon Phi (MIC). The code is part
of a software package called STRUMPACK -- STRUctured Matrices PACKage, which
also has a distributed memory component for dense rank-structured matrices
- …
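
The combination of randomized sampling and interpolative decomposition named in this abstract can be illustrated on a single low-rank off-diagonal block. The sketch below is not STRUMPACK code and does not build a full HSS hierarchy: the function names, the Gaussian sampling matrix, and the use of SciPy's column-pivoted QR to obtain the interpolative decomposition are assumptions made for illustration. It samples the block with a few random vectors, selects k representative rows from the sample, and expresses every row of the block as a combination of those k rows.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def row_id(S, k):
    """Rank-k row interpolative decomposition S ~= X @ S[J, :],
    computed from a column-pivoted QR of S.T (hypothetical helper)."""
    _, R, piv = qr(S.T, mode="economic", pivoting=True)
    # Express the remaining pivoted columns in terms of the first k.
    T = solve_triangular(R[:k, :k], R[:k, k:])
    X = np.empty((S.shape[0], k))
    X[piv[:k], :] = np.eye(k)  # selected rows reproduce themselves
    X[piv[k:], :] = T.T        # other rows are interpolated
    return piv[:k], X

def compress_block(A12, k, p=5, seed=0):
    """Compress an off-diagonal block using random samples and a row ID.

    Returns the selected row indices J and the interpolation matrix X
    such that A12 ~= X @ A12[J, :].
    """
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A12.shape[1], k + p))  # p = oversampling
    S = A12 @ Omega  # only products with A12 are needed
    return row_id(S, k)
```

The block is accessed only through the product `A12 @ Omega`, which is the property that lets randomized compression work with a fast matrix-vector routine instead of explicit entries.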