High-Performance Solvers for Dense Hermitian Eigenproblems
We introduce a new collection of solvers, subsequently called EleMRRR, for
large-scale dense Hermitian eigenproblems. EleMRRR solves various types of
problems: generalized, standard, and tridiagonal eigenproblems. Among these,
the last is of particular importance, as it is a solver in its own right as
well as the computational kernel for the first two. We present PMRRR, a fast
and scalable tridiagonal solver based on the algorithm of Multiple Relatively
Robust Representations (MRRR). Like the other EleMRRR solvers,
PMRRR is part of the freely available Elemental library, and is designed to
fully support both message-passing (MPI) and multithreading parallelism (SMP).
As a result, the solvers can equally be used in pure MPI or in hybrid MPI-SMP
fashion. We conducted a thorough performance study of EleMRRR and ScaLAPACK's
solvers on two supercomputers. Such a study, performed with up to 8,192 cores,
provides precise guidelines to assemble the fastest solver within the ScaLAPACK
framework; it also indicates that EleMRRR outperforms even the fastest solvers
built from ScaLAPACK's components.
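The tridiagonal kernel at the heart of EleMRRR is also exposed through standard libraries. A minimal sketch, using SciPy's `eigh_tridiagonal` with LAPACK's MRRR driver (`stemr`), the same algorithmic family as PMRRR but not the EleMRRR code itself, on a matrix with a known spectrum:

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

# Symmetric tridiagonal test matrix: the 1D discrete Laplacian, whose
# eigenvalues are known in closed form.
n = 100
d = 2.0 * np.ones(n)        # main diagonal
e = -1.0 * np.ones(n - 1)   # off-diagonal

# lapack_driver='stemr' selects LAPACK's MRRR-based solver.
w, v = eigh_tridiagonal(d, e, lapack_driver='stemr')

# Analytic eigenvalues: 2 - 2 cos(k pi / (n + 1)), k = 1..n.
w_exact = 2.0 - 2.0 * np.cos(np.arange(1, n + 1) * np.pi / (n + 1))
print(np.max(np.abs(w - w_exact)))   # near machine precision
```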
Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach
The real symmetric tridiagonal eigenproblem is of outstanding importance in
numerical computations; it arises frequently as part of eigensolvers for
standard and generalized dense Hermitian eigenproblems that are based on a
reduction to tridiagonal form. For its solution, the algorithm of Multiple
Relatively Robust Representations (MRRR) is among the fastest methods. Although
fast, the solvers based on MRRR do not deliver the same accuracy as competing
methods like Divide & Conquer or the QR algorithm. In this paper, we
demonstrate that the use of mixed precision leads to improved accuracy of
MRRR-based eigensolvers with limited or no performance penalty. As a result, we
obtain eigensolvers that are not only equally or more accurate than the best
available methods, but also, in most circumstances, faster and more scalable
than the competition.
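The mixed-precision idea can be sketched in a few lines: run the solver in a precision higher than the working one, then round the results back. The snippet below mimics this with single precision as the working precision and double as the higher one; it illustrates the principle only and is not the paper's implementation:

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

rng = np.random.default_rng(0)
n = 200
d = rng.standard_normal(n).astype(np.float32)      # diagonal
e = rng.standard_normal(n - 1).astype(np.float32)  # off-diagonal

def residual(d, e, w, v):
    # Largest entry of the residual T V - V diag(w).
    T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
    return np.max(np.abs(T @ v - v * w))

# Working precision only: single-precision MRRR.
w32, v32 = eigh_tridiagonal(d, e, lapack_driver='stemr')

# Mixed precision, sketched: solve in double precision and round the
# results back to the single working precision.
w64, v64 = eigh_tridiagonal(d.astype(np.float64), e.astype(np.float64),
                            lapack_driver='stemr')
w_mx, v_mx = w64.astype(np.float32), v64.astype(np.float32)

print(residual(d, e, w32, v32), residual(d, e, w_mx, v_mx))
```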
Fast computation of spectral projectors of banded matrices
We consider the approximate computation of spectral projectors for symmetric
banded matrices. While this problem has received considerable attention,
especially in the context of linear scaling electronic structure methods, the
presence of small relative spectral gaps challenges existing methods based on
approximate sparsity. In this work, we show how a data-sparse approximation
based on hierarchical matrices can be used to overcome this problem. We prove a
priori bounds on the approximation error and propose a fast algorithm based
on the QDWH algorithm, along the lines of the works by Nakatsukasa et al. Numerical
experiments demonstrate that the performance of our algorithm is robust with
respect to the spectral gap. A preliminary Matlab implementation becomes faster
than eig already for matrix sizes of a few thousand.
Comment: 27 pages, 10 figures
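As a dense reference point, the spectral projector onto the eigenvalues below a splitting point mu can be obtained from the matrix sign function. The sketch below uses the plain Newton sign iteration instead of the paper's QDWH-based, hierarchical-matrix algorithm; the matrix size, bandwidth, and splitting point are illustrative:

```python
import numpy as np

# Random symmetric banded test matrix (bandwidth 2).
rng = np.random.default_rng(1)
n = 60
A = np.zeros((n, n))
for k in range(3):
    band = rng.standard_normal(n - k)
    A += np.diag(band, k) + (np.diag(band, -k) if k else 0.0)

mu = 0.0   # splitting point, assumed to fall inside a spectral gap

# Newton iteration for the matrix sign function: X -> (X + X^{-1}) / 2.
X = A - mu * np.eye(n)
for _ in range(60):
    X = 0.5 * (X + np.linalg.inv(X))

# Spectral projector onto the invariant subspace of eigenvalues < mu;
# its trace counts the eigenvalues below mu.
P = 0.5 * (np.eye(n) - X)
print(np.linalg.norm(P @ P - P), round(np.trace(P)))
```

Note that the projector property P^2 = P and commutativity with A hold up to rounding, which is what the test below checks.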
Very Large-Scale Singular Value Decomposition Using Tensor Train Networks
We propose new algorithms for singular value decomposition (SVD) of very
large-scale matrices based on a low-rank tensor approximation technique called
the tensor train (TT) format. The proposed algorithms can compute several
dominant singular values and corresponding singular vectors for large-scale
structured matrices given in a TT format. The computational complexity of the
proposed methods scales logarithmically with the matrix size under the
assumption that both the matrix and the singular vectors admit low-rank TT
decompositions. The proposed methods, which are called the alternating least
squares for SVD (ALS-SVD) and modified alternating least squares for SVD
(MALS-SVD), compute the left and right singular vectors approximately through
block TT decompositions. The very large-scale optimization problem is reduced
to sequential small-scale optimization problems, and each core tensor of the
block TT decompositions can be updated by applying any standard optimization
methods. The optimal ranks of the block TT decompositions are determined
adaptively during the iteration process, so that we can achieve high approximation
accuracy. Extensive numerical simulations are conducted for several types of
TT-structured matrices such as the Hilbert matrix, Toeplitz matrices, random
matrices with prescribed singular values, and tridiagonal matrices. The simulation results
demonstrate the effectiveness of the proposed methods compared with standard
SVD algorithms and TT-based algorithms developed for symmetric eigenvalue
decomposition.
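The TT format underlying these solvers can be illustrated with the basic TT-SVD, which computes the cores by sequential truncated SVDs of matricizations. This is a minimal sketch of the format itself, not the paper's ALS-SVD/MALS-SVD algorithms:

```python
import numpy as np

def tt_svd(tensor, rank):
    """Plain TT-SVD: peel off one TT core per mode via truncated SVD."""
    shape = tensor.shape
    cores, r_prev, M = [], 1, tensor
    for n_k in shape[:-1]:
        M = M.reshape(r_prev * n_k, -1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(rank, len(s))
        cores.append(U[:, :r].reshape(r_prev, n_k, r))  # TT core
        M = s[:r, None] * Vt[:r]
        r_prev = r
    cores.append(M.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    out = cores[0]                       # shape (1, n_1, r_1)
    for C in cores[1:]:
        out = np.tensordot(out, C, axes=([out.ndim - 1], [0]))
    return out[0, ..., 0]                # drop boundary ranks

# A tensor of exact TT rank 2: the sum of two rank-one terms.
rng = np.random.default_rng(2)
u1, u2, u3 = rng.standard_normal((3, 8))
v1, v2, v3 = rng.standard_normal((3, 8))
T = np.einsum('i,j,k->ijk', u1, u2, u3) + np.einsum('i,j,k->ijk', v1, v2, v3)

cores = tt_svd(T, rank=2)
err = np.linalg.norm(tt_reconstruct(cores) - T)
print([C.shape for C in cores], err)    # exact up to rounding
```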
A nested Krylov subspace method to compute the sign function of large complex matrices
We present an acceleration of the well-established Krylov-Ritz methods to
compute the sign function of large complex matrices, as needed in lattice QCD
simulations involving the overlap Dirac operator at both zero and nonzero
baryon density. Krylov-Ritz methods approximate the sign function using a
projection on a Krylov subspace. To achieve a high accuracy this subspace must
be taken quite large, which makes the method too costly. The new idea is to
make a further projection on an even smaller, nested Krylov subspace. If
additionally an intermediate preconditioning step is applied, this projection
can be performed without affecting the accuracy of the approximation, and a
substantial gain in efficiency is achieved for both Hermitian and non-Hermitian
matrices. The numerical efficiency of the method is demonstrated on lattice
configurations of sizes ranging from 4^4 to 10^4, and the new results are
compared with those obtained with rational approximation methods.
Comment: 17 pages, 12 figures, minor corrections, extended analysis of the preconditioning step
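The baseline Krylov-Ritz approximation that the paper accelerates can be sketched as follows: build an orthonormal Krylov basis, apply the sign function to the small projected matrix, and lift the result back. The test matrix below is a synthetic symmetric matrix with a spectral gap around zero, not a lattice Dirac operator; sizes and spectrum are illustrative:

```python
import numpy as np

def sign_via_eigh(H):
    # Matrix sign function of a symmetric matrix via eigendecomposition.
    w, Q = np.linalg.eigh(H)
    return (Q * np.sign(w)) @ Q.T

def krylov_sign_apply(A, b, k):
    """Krylov-Ritz approximation of sign(A) @ b on the k-dimensional
    Krylov subspace span{b, Ab, ..., A^(k-1) b}."""
    n = len(b)
    V = np.zeros((n, k))
    H = np.zeros((k, k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)   # reorthogonalize
        if j + 1 < k:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(k)
    e1[0] = np.linalg.norm(b)
    return V @ (sign_via_eigh(H) @ e1)

# Symmetric test matrix with eigenvalues in [-2, -0.5] and [0.5, 2].
rng = np.random.default_rng(3)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
evals = np.concatenate([-(0.5 + 1.5 * rng.random(n // 2)),
                         (0.5 + 1.5 * rng.random(n // 2))])
A = (Q * evals) @ Q.T
b = rng.standard_normal(n)

exact = sign_via_eigh(A) @ b
approx = krylov_sign_apply(A, b, 80)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(rel_err)
```

The subspace dimension needed for a given accuracy grows as the spectral gap around zero shrinks, which is exactly the cost that the nested-subspace construction attacks.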
The Anderson model of localization: a challenge for modern eigenvalue methods
We present a comparative study of the application of modern eigenvalue
algorithms to an eigenvalue problem arising in quantum physics, namely, the
computation of a few interior eigenvalues and their associated eigenvectors for
the large, sparse, real, symmetric, and indefinite matrices of the Anderson
model of localization. We compare the Lanczos algorithm in the 1987
implementation of Cullum and Willoughby with the implicitly restarted Arnoldi
method coupled with polynomial and several shift-and-invert convergence
accelerators as well as with a sparse hybrid tridiagonalization method. We
demonstrate that for our problem the Lanczos implementation is faster and more
memory efficient than the other approaches. This seemingly innocuous problem
presents a major challenge for all modern eigenvalue algorithms.
Comment: 16 LaTeX pages with 3 figures included
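The test problem is easy to reproduce in sketch form: the 3D Anderson Hamiltonian is a sparse lattice Laplacian plus random on-site disorder, and a few interior eigenvalues can be computed with shift-and-invert (here via SciPy's ARPACK wrapper, one of the accelerator strategies compared in the paper; lattice size and disorder strength are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

L, W = 10, 16.5   # linear lattice size, disorder strength
rng = np.random.default_rng(4)

# Nearest-neighbour hopping on an L x L x L lattice via Kronecker sums.
one_d = sp.diags([np.ones(L - 1), np.ones(L - 1)], [1, -1])
eye = sp.identity(L)
hop = (sp.kron(sp.kron(one_d, eye), eye)
       + sp.kron(sp.kron(eye, one_d), eye)
       + sp.kron(sp.kron(eye, eye), one_d))
# Random on-site disorder, uniform in [-W/2, W/2].
H = (hop + sp.diags(W * (rng.random(L**3) - 0.5))).tocsc()

# A few interior eigenvalues closest to E = 0 via shift-and-invert.
vals, vecs = eigsh(H, k=5, sigma=0.0, which='LM')
print(np.sort(vals))
```

Shift-and-invert requires a sparse factorization of H at each shift, which is the memory cost that made the factorization-free Lanczos approach attractive in the study above.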
Structure Preserving Parallel Algorithms for Solving the Bethe-Salpeter Eigenvalue Problem
The Bethe-Salpeter eigenvalue problem is a dense structured eigenvalue
problem arising from the discretized Bethe-Salpeter equation in the context of
computing exciton energies and states. A computational challenge is that at
least half of the eigenvalues and the associated eigenvectors are desired in
practice. We establish the equivalence between Bethe-Salpeter eigenvalue
problems and real Hamiltonian eigenvalue problems. Based on theoretical
analysis, structure preserving algorithms for a class of Bethe-Salpeter
eigenvalue problems are proposed. We also show that for this class of problems
all eigenvalues obtained from the Tamm-Dancoff approximation are overestimated.
In order to solve large scale problems of practical interest, we discuss
parallel implementations of our algorithms targeting distributed memory
systems. Several numerical examples are presented to demonstrate the efficiency
and accuracy of our algorithms.
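The structure in question can be sketched directly: for a Hermitian block A and a complex symmetric block B satisfying the usual definiteness condition, the Bethe-Salpeter Hamiltonian [[A, B], [-conj(B), -conj(A)]] has a real spectrum that comes in +/- pairs, which is why half of the eigenvalues determine the other half. A small random instance (this verifies the structure only, not the paper's structure-preserving algorithm):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (A + A.conj().T) / 2            # Hermitian block
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = (B + B.T) / 2                   # complex symmetric block

# Shift A so that [[A, B], [conj(B), conj(A)]] is positive definite,
# which guarantees a real spectrum for the BSE Hamiltonian below.
M = np.block([[A, B], [B.conj(), A.conj()]])
A = A + (abs(np.linalg.eigvalsh(M).min()) + 1.0) * np.eye(n)

H = np.block([[A, B], [-B.conj(), -A.conj()]])
w = np.linalg.eigvals(H)
print(np.sort(w.real))   # symmetric about zero, imaginary parts ~ 0
```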