40 research outputs found
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds
were presented on the amount of communication required for essentially all
O(n^3)-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs. Comment: 43 pages, 11 figures
Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach
The real symmetric tridiagonal eigenproblem is of outstanding importance in
numerical computations; it arises frequently as part of eigensolvers for
standard and generalized dense Hermitian eigenproblems that are based on a
reduction to tridiagonal form. For its solution, the algorithm of Multiple
Relatively Robust Representations (MRRR) is among the fastest methods. Although
fast, the solvers based on MRRR do not deliver the same accuracy as competing
methods like Divide & Conquer or the QR algorithm. In this paper, we
demonstrate that the use of mixed precisions leads to improved accuracy of
MRRR-based eigensolvers with limited or no performance penalty. As a result, we
obtain eigensolvers that are not only equally or more accurate than the best
available methods, but also, in most circumstances, faster and more scalable
than the competition.
Thick-restarted joint Lanczos bidiagonalization for the GSVD
The computation of the partial generalized singular value decomposition
(GSVD) of large-scale matrix pairs can be approached by means of iterative
methods based on expanding subspaces, particularly Krylov subspaces. We
consider the joint Lanczos bidiagonalization method, and analyze the
feasibility of adapting the thick restart technique that is being used
successfully in the context of other linear algebra problems. Numerical
experiments illustrate the effectiveness of the proposed method. We also
compare the new method with an alternative solution via equivalent eigenvalue
problems, considering accuracy as well as computational performance. The
analysis is done using a parallel implementation in the SLEPc library.
Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance
We show how both the tridiagonal and bidiagonal QR algorithms can be restructured so that they become
rich in operations that can achieve near-peak performance on a modern processor. The key is a
novel, cache-friendly algorithm for applying multiple sets of Givens rotations to the eigenvector/singular
vector matrix. This algorithm is then implemented with optimizations that (1) leverage vector instruction
units to increase floating-point throughput, and (2) fuse multiple rotations to decrease the total number of
memory operations. We demonstrate the merits of these new QR algorithms for computing the Hermitian
eigenvalue decomposition (EVD) and singular value decomposition (SVD) of dense matrices when all
eigenvectors/singular vectors are computed. The approach yields vastly improved performance relative to the
traditional QR algorithms for these problems and is competitive with two commonly used alternatives—
Cuppen’s Divide and Conquer algorithm and the Method of Multiple Relatively Robust Representations—
while inheriting the more modest O(n) workspace requirements of the original QR algorithms. Since the
computations performed by the restructured algorithms remain essentially identical to those performed by
the original methods, robust numerical properties are preserved.
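The operation this abstract restructures, applying sets of Givens rotations to the eigenvector/singular vector matrix, can be illustrated with a minimal, unoptimized sketch. The function names `givens` and `apply_rotations` are hypothetical, and the loop below is the plain column-pair update, not the cache-friendly, fused, vectorized algorithm the authors describe.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0  # nothing to annihilate; use the identity rotation
    return a / r, b / r


def apply_rotations(V, rotations):
    """Apply each rotation (j, c, s) to columns j and j+1 of V, in place.

    V is a list of rows. Each rotation touches two columns across every row,
    which is the memory-bound access pattern a restructured algorithm would
    try to improve (this baseline makes no such attempt).
    """
    for j, c, s in rotations:
        for row in V:
            x, y = row[j], row[j + 1]
            row[j] = c * x + s * y
            row[j + 1] = -s * x + c * y
    return V
```

Because every update is an orthogonal transformation, a matrix with orthonormal rows keeps that property up to rounding, which is the sense in which reorganizing the order of these updates can leave the numerical behavior essentially unchanged.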
MRRR-based Eigensolvers for Multi-core Processors and Supercomputers
The real symmetric tridiagonal eigenproblem is of outstanding importance in
numerical computations; it arises frequently as part of eigensolvers for
standard and generalized dense Hermitian eigenproblems that are based on a
reduction to tridiagonal form. For its solution, the algorithm of Multiple
Relatively Robust Representations (MRRR or MR3 in short) - introduced in the
late 1990s - is among the fastest methods. To compute k eigenpairs of a real
n-by-n tridiagonal T, MRRR only requires O(kn) arithmetic operations; in
contrast, all the other practical methods require O(k^2 n) or O(n^3) operations
in the worst case. This thesis centers around the performance and accuracy of
MRRR. Comment: PhD thesis
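To make the real symmetric tridiagonal eigenproblem concrete, here is a small sketch that computes a single eigenvalue of a tridiagonal T by Sturm-count bisection. This is the textbook bisection method, not MRRR, and the names `count_below` and `kth_eigenvalue` are hypothetical. Each count costs O(n) operations.

```python
def count_below(d, e, x):
    """Number of eigenvalues of the tridiagonal T smaller than x.

    d holds the n diagonal entries, e the n-1 off-diagonal entries.
    Counts the negative pivots in the LDL^T factorization of T - x*I.
    """
    count = 0
    q = 1.0
    for i in range(len(d)):
        off = e[i - 1] ** 2 / q if i > 0 else 0.0
        q = d[i] - x - off
        if q == 0.0:
            q = 1e-300  # nudge an exact zero pivot to avoid dividing by zero
        if q < 0.0:
            count += 1
    return count


def kth_eigenvalue(d, e, k, iterations=100):
    """Approximate the k-th smallest eigenvalue (k is 0-based) by bisection."""
    n = len(d)
    # Gershgorin discs give an interval that contains every eigenvalue.
    radius = [(abs(e[i - 1]) if i > 0 else 0.0) + (abs(e[i]) if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(d[i] - radius[i] for i in range(n))
    hi = max(d[i] + radius[i] for i in range(n))
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if count_below(d, e, mid) >= k + 1:
            hi = mid  # the k-th eigenvalue lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For example, with `d = [2.0, 2.0, 2.0]` and `e = [-1.0, -1.0]` the exact eigenvalues are 2 - sqrt(2), 2, and 2 + sqrt(2). Bisection yields eigenvalues only; computing the matching eigenvectors is the harder step that MRRR addresses.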
High-performance SVD partial spectrum computation
We introduce a new singular value decomposition (SVD) solver, QDWHpartial-SVD,
based on the QR-based Dynamically Weighted Halley (QDWH) algorithm, for partial-spectrum SVD
problems. By optimizing the rational function underlying the algorithms in the desired part of the spectrum only, the QDWHpartial-SVD
algorithm efficiently computes a fraction (say 1-20%) of the leading
singular values/vectors. We develop a high-performance implementation of QDWHpartial-SVD on distributed-memory manycore
systems and demonstrate its numerical robustness. We perform a
benchmarking campaign against counterparts from the state-of-the-art numerical libraries across various matrix sizes using up to 36K
MPI processes. Experimental results show performance speedups
for QDWHpartial-SVD up to 6X and 2X against vendor-optimized
PDGESVD from ScaLAPACK and KSVD on a Cray XC40 system
using 1152 nodes based on two-socket 16-core Intel Haswell CPUs,
respectively. We also port our QDWHpartial-SVD software library
to a system composed of 256 nodes with two-socket 64-core AMD
EPYC Milan CPUs and achieve speedups of up to 4X compared to vendor-optimized PDGESVD from ScaLAPACK. We also
compare energy consumption for the two algorithms and demonstrate how QDWHpartial-SVD can further outperform PDGESVD
in that regard by performing fewer memory-bound operations.
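The "rational function underlying the algorithms" can be illustrated in scalar form. The sketch below applies the dynamically weighted Halley map x -> x(a + b x^2)/(1 + c x^2) to a lower bound l on the smallest singular value of a matrix scaled to unit norm. It assumes the standard weight formulas of the original QDWH polar-decomposition iteration; the partial-spectrum optimization described in this abstract is not reproduced here. The names `qdwh_weights` and `qdwh_scalar_iterates` are hypothetical.

```python
import math


def qdwh_weights(l):
    """Weights (a, b, c) of the dynamically weighted Halley map for 0 < l <= 1.

    Assumed from the standard QDWH iteration; at l = 1 they reduce to the
    classical Halley weights (3, 1, 3).
    """
    d = (4.0 * (1.0 - l * l) / l ** 4) ** (1.0 / 3.0)
    s = math.sqrt(1.0 + d)
    a = s + 0.5 * math.sqrt(8.0 - 4.0 * d + 8.0 * (2.0 - l * l) / (l * l * s))
    b = (a - 1.0) ** 2 / 4.0
    c = a + b - 1.0
    return a, b, c


def qdwh_scalar_iterates(l0, steps):
    """Track how the lower bound l moves toward 1 under the weighted map."""
    l = l0
    history = [l]
    for _ in range(steps):
        a, b, c = qdwh_weights(l)
        # Clamp at 1 so rounding cannot push l above the valid range.
        l = min(1.0, l * (a + b * l * l) / (1.0 + c * l * l))
        history.append(l)
    return history
```

Calling `qdwh_scalar_iterates(1e-8, 6)` shows the bound moving from 1e-8 to 1 within a handful of steps, which is why a QDWH-based solver needs only a few matrix iterations even for ill-conditioned inputs.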
Accurate and Efficient Expression Evaluation and Linear Algebra
We survey and unify recent results on the existence of accurate algorithms
for evaluating multivariate polynomials, and more generally for accurate
numerical linear algebra with structured matrices. By "accurate" we mean that
the computed answer has relative error less than 1, i.e., has some correct
leading digits. We also address efficiency, by which we mean algorithms that
run in polynomial time in the size of the input. Our results will depend
strongly on the model of arithmetic: Most of our results will use the so-called
Traditional Model (TM). We give a set of necessary and sufficient conditions to
decide whether a high accuracy algorithm exists in the TM, and describe
progress toward a decision procedure that will take any problem and provide
either a high accuracy algorithm or a proof that none exists. When no accurate
algorithm exists in the TM, it is natural to extend the set of available
accurate operations by a library of additional operations, such as x+y+z, dot
products, or indeed any enumerable set which could then be used to build
further accurate algorithms. We show how our accurate algorithms and decision
procedure for finding them extend to this case. Finally, we address other
models of arithmetic, and the relationship between (im)possibility in the TM
and (in)efficient algorithms operating on numbers represented as bit strings. Comment: 49 pages, 6 figures, 1 table
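The definition of "accurate" (relative error less than 1) can be illustrated with a three-term sum in IEEE double precision. This is only a sketch of the definition, not of the paper's Traditional Model: Python's `math.fsum` stands in for an accurate library operation of the kind the abstract mentions, and `relative_error` and `naive_sum3` are hypothetical helper names.

```python
import math


def relative_error(computed, exact):
    """Relative error of a computed value with respect to a nonzero exact value."""
    return abs(computed - exact) / abs(exact)


def naive_sum3(x, y, z):
    """Left-to-right floating-point sum; each addition rounds separately."""
    return (x + y) + z


# The exact sum is 1, but 1e16 + 1.0 rounds back to 1e16 in double precision,
# so the naive result loses every digit of the answer.
x, y, z = 1e16, 1.0, -1e16
exact = 1.0
naive = naive_sum3(x, y, z)
accurate = math.fsum([x, y, z])  # correctly rounded sum of the inputs
```

Here the naive result has relative error exactly 1, so it is not accurate in the abstract's sense, while the `math.fsum` result has relative error 0.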
High-Performance Solvers for Dense Hermitian Eigenproblems
We introduce a new collection of solvers - subsequently called EleMRRR - for
large-scale dense Hermitian eigenproblems. EleMRRR solves various types of
problems: generalized, standard, and tridiagonal eigenproblems. Among these,
the last is of particular importance as it is a solver in its own right, as
well as the computational kernel for the first two; we present a fast and
scalable tridiagonal solver based on the Algorithm of Multiple Relatively
Robust Representations - referred to as PMRRR. Like the other EleMRRR solvers,
PMRRR is part of the freely available Elemental library, and is designed to
fully support both message-passing (MPI) and multithreading parallelism (SMP).
As a result, the solvers can equally be used in pure MPI or in hybrid MPI-SMP
fashion. We conducted a thorough performance study of EleMRRR and ScaLAPACK's
solvers on two supercomputers. Such a study, performed with up to 8,192 cores,
provides precise guidelines to assemble the fastest solver within the ScaLAPACK
framework; it also indicates that EleMRRR outperforms even the fastest solvers
built from ScaLAPACK's components.