Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance
We show how both the tridiagonal and bidiagonal QR algorithms can be restructured so that they become
rich in operations that can achieve near-peak performance on a modern processor. The key is a
novel, cache-friendly algorithm for applying multiple sets of Givens rotations to the eigenvector/singular
vector matrix. This algorithm is then implemented with optimizations that (1) leverage vector instruction
units to increase floating-point throughput, and (2) fuse multiple rotations to decrease the total number of
memory operations. We demonstrate the merits of these new QR algorithms for computing the Hermitian
eigenvalue decomposition (EVD) and singular value decomposition (SVD) of dense matrices when all
eigenvectors/singular vectors are computed. The approach yields vastly improved performance relative to the
traditional QR algorithms for these problems and is competitive with two commonly used alternatives
(Cuppen’s Divide and Conquer algorithm and the Method of Multiple Relatively Robust Representations),
while inheriting the more modest O(n) workspace requirements of the original QR algorithms. Since the
computations performed by the restructured algorithms remain essentially identical to those performed by
the original methods, robust numerical properties are preserved.
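The central operation here, applying several sets of Givens rotations to the eigenvector (or singular vector) matrix, can be illustrated with a small NumPy sketch. It assumes each set is one QR sweep of n - 1 rotations acting on adjacent column pairs (j, j+1), and compares applying the sets one after another with a wavefront schedule that interleaves them. The wavefront order only illustrates why sets can be interleaved without changing the result; it is not the paper's cache-friendly kernel and omits the vectorization and rotation fusion. All function names are hypothetical.

```python
import numpy as np


def apply_rotation(V, j, c, s):
    """Apply one Givens rotation in place to columns j and j+1 of V."""
    vj = V[:, j].copy()
    vk = V[:, j + 1].copy()
    V[:, j] = c * vj - s * vk
    V[:, j + 1] = s * vj + c * vk


def apply_sets_naive(V, C, S):
    """Apply k sets of rotations one full set (sweep) at a time.

    C[p, j], S[p, j] hold the cosine/sine of rotation j in set p,
    which acts on columns (j, j+1).
    """
    k, m = C.shape
    for p in range(k):
        for j in range(m):
            apply_rotation(V, j, C[p, j], S[p, j])
    return V


def apply_sets_wavefront(V, C, S):
    """Apply the same rotations in wavefront order t = j + 2p.

    Rotation (p, j) only has to follow the earlier rotations (in
    set-by-set order) that touch column j or j+1; the schedule
    t = j + 2p respects every such dependency, and rotations sharing
    a value of t touch disjoint column pairs.
    """
    k, m = C.shape
    for t in range(m + 2 * (k - 1)):
        for p in range(k):
            j = t - 2 * p
            if 0 <= j < m:
                apply_rotation(V, j, C[p, j], S[p, j])
    return V


rng = np.random.default_rng(0)
n, k = 8, 3
theta = rng.uniform(0, 2 * np.pi, size=(k, n - 1))
C, S = np.cos(theta), np.sin(theta)

V_naive = apply_sets_naive(np.eye(n), C, S)
V_wave = apply_sets_wavefront(np.eye(n), C, S)
print(np.allclose(V_naive, V_wave))               # True
print(np.allclose(V_wave.T @ V_wave, np.eye(n)))  # True
```

Because every rotation sees exactly the same input columns in both schedules, the two results agree bit for bit, which mirrors the point that the restructured computation stays essentially identical to the original one.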
A Distributed and Incremental SVD Algorithm for Agglomerative Data Analysis on Large Networks
In this paper, we show that the SVD of a matrix can be constructed
efficiently using a hierarchical approach. Our algorithm is proven to recover the
singular values and left singular vectors if the rank of the input matrix
is known. Further, the hierarchical algorithm can be used to recover the
largest singular values and left singular vectors with bounded error. We also
show that the proposed method is stable with respect to roundoff errors or
corruption of the original matrix entries. Numerical experiments validate the
proposed algorithms and the parallel cost analysis.
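Why a hierarchical construction can recover the singular values and left singular vectors exactly when the rank is known can be seen in a short NumPy sketch. It assumes the matrix is split into column blocks, that each block is compressed to U_i S_i from its rank-r truncated SVD, and that compressed blocks are merged pairwise up a tree. This partitioning, the merge order, and the function names are illustrative assumptions, not the paper's distributed or incremental algorithm.

```python
import numpy as np


def local_factor(block, r):
    """Rank-r truncated SVD of a column block, returned as U_r @ diag(s_r)."""
    U, s, _ = np.linalg.svd(block, full_matrices=False)
    return U[:, :r] * s[:r]


def hierarchical_svd(A, n_blocks, r):
    """Left singular vectors and singular values of A from column blocks.

    Each block A_i is compressed to B_i = U_i S_i.  Because
    B_i B_i^T = A_i A_i^T whenever r >= rank(A_i), placing the B_i side
    by side preserves A A^T, and hence the singular values and left
    singular vectors of A.  Compressed blocks are merged pairwise.
    """
    factors = [local_factor(b, r) for b in np.array_split(A, n_blocks, axis=1)]
    while len(factors) > 1:
        merged = [local_factor(np.hstack(factors[i:i + 2]), r)
                  for i in range(0, len(factors) - 1, 2)]
        if len(factors) % 2 == 1:
            merged.append(factors[-1])  # odd one out moves up unchanged
        factors = merged
    U, s, _ = np.linalg.svd(factors[0], full_matrices=False)
    return U[:, :r], s[:r]


rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))  # rank 3
U_h, s_h = hierarchical_svd(A, n_blocks=4, r=3)
U, s, _ = np.linalg.svd(A, full_matrices=False)
print(np.allclose(s_h, s[:3]))                           # True
print(np.allclose(U_h @ U_h.T, U[:, :3] @ U[:, :3].T))   # True (same subspace)
```

Comparing the projectors U U^T rather than U itself sidesteps the sign ambiguity of singular vectors.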
Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach
The real symmetric tridiagonal eigenproblem is of outstanding importance in
numerical computations; it arises frequently as part of eigensolvers for
standard and generalized dense Hermitian eigenproblems that are based on a
reduction to tridiagonal form. For its solution, the algorithm of Multiple
Relatively Robust Representations (MRRR) is among the fastest methods. Although
fast, the solvers based on MRRR do not deliver the same accuracy as competing
methods like Divide & Conquer or the QR algorithm. In this paper, we
demonstrate that the use of mixed precisions leads to improved accuracy of
MRRR-based eigensolvers with limited or no performance penalty. As a result, we
obtain eigensolvers that are not only as accurate as or more accurate than the best
available methods, but also, in most circumstances, faster and more scalable
than the competition.
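The general mixed-precision principle (inputs and outputs stay in working precision while the solver runs internally at higher precision) can be shown in a few lines. This sketch treats float32 as the working precision and float64 as the internal precision, and it uses NumPy's dense numpy.linalg.eigh rather than an MRRR solver, so it illustrates only the principle, not the paper's method. The function names are hypothetical.

```python
import numpy as np


def tridiag(d, e):
    """Dense symmetric tridiagonal matrix from diagonal d and off-diagonal e."""
    return np.diag(d) + np.diag(e, 1) + np.diag(e, -1)


def eig_single(d, e):
    """Eigenpairs computed entirely in float32 (the working precision)."""
    return np.linalg.eigh(tridiag(d, e))


def eig_mixed(d, e):
    """float32 in and out, but the eigensolver runs in float64 internally."""
    w, Q = np.linalg.eigh(tridiag(d.astype(np.float64), e.astype(np.float64)))
    return w.astype(np.float32), Q.astype(np.float32)


def orthogonality_loss(Q):
    """max |Q^T Q - I|, evaluated in float64."""
    Q = Q.astype(np.float64)
    return np.max(np.abs(Q.T @ Q - np.eye(Q.shape[1])))


rng = np.random.default_rng(2)
n = 200
d = rng.standard_normal(n).astype(np.float32)
e = rng.standard_normal(n - 1).astype(np.float32)
for name, solver in [("single", eig_single), ("mixed", eig_mixed)]:
    w, Q = solver(d, e)
    print(name, Q.dtype, orthogonality_loss(Q))
```

Rounding an orthonormal float64 basis to float32 perturbs each entry by at most a relative 2^-24, so the loss of orthogonality of the mixed variant stays on the order of float32 unit roundoff.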
Perturbation splitting for more accurate eigenvalues
Let $T$ be a symmetric tridiagonal matrix with entries and
eigenvalues of different magnitudes. For some $T$, small entrywise
relative perturbations induce small errors in the eigenvalues,
independently of the size of the entries of the matrix; this is
certainly true when the perturbed matrix can be written as
$\widetilde{T} = X^{T} T X$ with small $\|X^{T} X - I\|$. Even if it is
not possible to express the perturbations in every entry of $T$ in this way,
much can be gained by doing so for as many of the larger-magnitude entries
as possible. We propose a technique that
consists of splitting multiplicative and additive perturbations
to produce new error bounds which, for some matrices, are much
sharper than the usual ones. Such bounds may be useful in the
development of improved software for the tridiagonal eigenvalue
problem, and we describe their role in the context of a mixed
precision bisection-like procedure. Using the very same idea of
splitting perturbations (multiplicative and additive), we show
that when $T$ defines its eigenvalues well, the numerical values
of the pivots in the usual decomposition $T - \lambda I = LDL^{T}$ may
be used to compute approximations with high relative precision.
Fundação para a Ciência e Tecnologia (FCT) - POCI 201
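Bisection-like procedures for a symmetric tridiagonal matrix T rest on a classical fact: by Sylvester's law of inertia, the number of negative pivots D in T - xI = L D L^T equals the number of eigenvalues of T below x. The following is a minimal double-precision sketch of plain bisection built on that count. It does not implement the perturbation-splitting bounds or any mixed-precision refinement; the zero-pivot guard and the names negcount and bisect_eigenvalue are illustrative assumptions.

```python
import numpy as np

PIVMIN = np.finfo(np.float64).tiny  # guard against exact zero pivots (assumption)


def negcount(a, b, x):
    """Number of eigenvalues of T smaller than x.

    T has diagonal a and off-diagonal b.  The pivots of T - x*I = L D L^T
    satisfy d_0 = a_0 - x and d_i = (a_i - x) - b_{i-1}^2 / d_{i-1}.
    """
    count = 0
    d = 1.0  # placeholder; the first pivot does not use it
    for i in range(len(a)):
        d = (a[i] - x) - (b[i - 1] ** 2 / d if i > 0 else 0.0)
        if abs(d) < PIVMIN:
            d = -PIVMIN
        if d < 0:
            count += 1
    return count


def bisect_eigenvalue(a, b, k, rtol=1e-14):
    """k-th smallest eigenvalue (k = 0, 1, ...) of T by bisection."""
    radius = np.zeros(len(a))
    radius[:-1] += np.abs(b)
    radius[1:] += np.abs(b)
    lo, hi = np.min(a - radius), np.max(a + radius)  # Gershgorin interval
    pad = 1e-8 * max(1.0, abs(lo), abs(hi))
    lo, hi = lo - pad, hi + pad
    # Invariant: negcount(lo) <= k < negcount(hi)
    while hi - lo > rtol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if negcount(a, b, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(3)
a = rng.standard_normal(30)
b = rng.standard_normal(29)
T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
approx = np.array([bisect_eigenvalue(a, b, k) for k in range(30)])
print(np.allclose(approx, np.linalg.eigvalsh(T)))  # True
```

Each eigenvalue is isolated independently, which is what makes bisection easy to run in a different (for example, higher) precision for selected eigenvalues only.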
Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients
We present a robust and scalable preconditioner for the solution of
large-scale linear systems that arise from the discretization of elliptic PDEs
amenable to rank compression. The preconditioner is based on hierarchical
low-rank approximations and the cyclic reduction method. The setup and
application phases of the preconditioner achieve log-linear complexity in
memory footprint and number of operations, and numerical experiments exhibit
good weak and strong scalability at large processor counts in a distributed
memory environment. Numerical experiments with linear systems that feature
symmetry and nonsymmetry, definiteness and indefiniteness, constant and
variable coefficients demonstrate the preconditioner's applicability and
robustness. Furthermore, the number of iterations can be controlled via
the accuracy threshold of the hierarchical matrix approximations and their
arithmetic operations, and by tuning the admissibility condition parameter.
Together, these parameters allow for optimization of the memory requirements
and performance of the preconditioner.
Comment: 24 pages, Elsevier Journal of Computational and Applied Mathematics,
Dec 201
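The cyclic reduction idea underlying this preconditioner is easiest to see on a scalar tridiagonal system: each step eliminates every other unknown, leaving a tridiagonal system of half the size, and the eliminated unknowns are recovered afterwards by back-substitution. The following minimal recursive NumPy sketch assumes a diagonally dominant system so that no pivoting is needed. The paper combines cyclic reduction with hierarchical low-rank approximations; that compression, its acceleration, and the distributed-memory parallelism are not shown, and the function name is hypothetical.

```python
import numpy as np


def cyclic_reduction(a, b, c, f):
    """Solve a tridiagonal system by (scalar) cyclic reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = f[i], with
    a[0] = c[-1] = 0.  No pivoting: assumes e.g. diagonal dominance.
    """
    n = len(b)
    if n == 1:
        return f / b
    i = np.arange(1, n, 2)           # odd rows survive the reduction
    nxt = np.minimum(i + 1, n - 1)   # row below (clamped; unused when gamma = 0)
    alpha = -a[i] / b[i - 1]
    gamma = np.where(i + 1 < n, -c[i] / b[nxt], 0.0)
    # Eliminating x[i-1] and x[i+1] couples odd row i to rows i-2 and i+2.
    a_r = alpha * a[i - 1]
    b_r = b[i] + alpha * c[i - 1] + gamma * a[nxt]
    c_r = gamma * c[nxt]
    f_r = f[i] + alpha * f[i - 1] + gamma * f[nxt]
    x = np.zeros(n)
    x[i] = cyclic_reduction(a_r, b_r, c_r, f_r)  # half-size system
    # Back-substitute the even rows using their already-known neighbours.
    j = np.arange(0, n, 2)
    xp = np.concatenate(([0.0], x, [0.0]))       # xp[k + 1] == x[k]
    x[j] = (f[j] - a[j] * xp[j] - c[j] * xp[j + 2]) / b[j]
    return x


rng = np.random.default_rng(4)
n = 100
a = rng.uniform(-1, 1, n); a[0] = 0.0
c = rng.uniform(-1, 1, n); c[-1] = 0.0
b = 4.0 + rng.uniform(0, 1, n)  # strictly diagonally dominant
f = rng.standard_normal(n)
T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
print(np.allclose(cyclic_reduction(a, b, c, f), np.linalg.solve(T, f)))  # True
```

The recursion depth is about log2(n), and all rows at one level are independent, which is the source of the parallelism that cyclic reduction offers.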