2,285 research outputs found

    High-Performance Solvers for Dense Hermitian Eigenproblems

    We introduce a new collection of solvers (subsequently called EleMRRR) for large-scale dense Hermitian eigenproblems. EleMRRR solves various types of problems: generalized, standard, and tridiagonal eigenproblems. Among these, the last is of particular importance, as it is a solver in its own right as well as the computational kernel for the first two; we present a fast and scalable tridiagonal solver, referred to as PMRRR, based on the Algorithm of Multiple Relatively Robust Representations. Like the other EleMRRR solvers, PMRRR is part of the freely available Elemental library, and is designed to fully support both message-passing (MPI) and multithreading (SMP) parallelism. As a result, the solvers can be used equally well in pure MPI or in hybrid MPI-SMP fashion. We conducted a thorough performance study of EleMRRR and ScaLAPACK's solvers on two supercomputers. This study, performed with up to 8,192 cores, provides precise guidelines for assembling the fastest solver within the ScaLAPACK framework; it also indicates that EleMRRR outperforms even the fastest solvers built from ScaLAPACK's components.
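
The layering described in this abstract (generalized problem reduced to a standard one, which in turn relies on a tridiagonal kernel) can be illustrated with a small dense sketch. It is not EleMRRR or Elemental code: the function name `generalized_eigh` is hypothetical, the reduction uses a plain Cholesky factorization, and the tridiagonal stage is hidden inside NumPy's `eigh`, which is not guaranteed to use MRRR.

```python
import numpy as np

def generalized_eigh(A, B):
    """Solve A x = lambda B x for Hermitian A and Hermitian positive definite B.

    Hypothetical helper for illustration only; not part of Elemental.
    """
    L = np.linalg.cholesky(B)                      # B = L L^H
    Y = np.linalg.solve(L, A)                      # Y = L^{-1} A
    C = np.linalg.solve(L, Y.conj().T).conj().T    # C = L^{-1} A L^{-H}
    C = (C + C.conj().T) / 2                       # remove rounding asymmetry
    # Standard Hermitian problem; the library reduces C to tridiagonal form
    # internally and solves the tridiagonal problem as its kernel.
    w, Z = np.linalg.eigh(C)
    X = np.linalg.solve(L.conj().T, Z)             # back-transform: x = L^{-H} z
    return w, X

# Small random test problem (assumed data, for demonstration only).
rng = np.random.default_rng(0)
n = 6
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (G + G.conj().T) / 2
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = M @ M.conj().T + n * np.eye(n)

w, X = generalized_eigh(A, B)
residual = np.linalg.norm(A @ X - (B @ X) * w)     # should be near machine precision
```

The back-transformed eigenvectors are B-orthonormal, so `X.conj().T @ B @ X` should be close to the identity.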

    Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach

    The real symmetric tridiagonal eigenproblem is of outstanding importance in numerical computations; it arises frequently as part of eigensolvers for standard and generalized dense Hermitian eigenproblems that are based on a reduction to tridiagonal form. For its solution, the algorithm of Multiple Relatively Robust Representations (MRRR) is among the fastest methods. Although fast, the solvers based on MRRR do not deliver the same accuracy as competing methods like Divide & Conquer or the QR algorithm. In this paper, we demonstrate that the use of mixed precisions leads to improved accuracy of MRRR-based eigensolvers with limited or no performance penalty. As a result, we obtain eigensolvers that are not only equally or more accurate than the best available methods, but also, in most circumstances, faster and more scalable than the competition.

    Fast computation of spectral projectors of banded matrices

    We consider the approximate computation of spectral projectors for symmetric banded matrices. While this problem has received considerable attention, especially in the context of linear scaling electronic structure methods, the presence of small relative spectral gaps challenges existing methods based on approximate sparsity. In this work, we show how a data-sparse approximation based on hierarchical matrices can be used to overcome this problem. We prove a priori bounds on the approximation error and propose a fast algorithm based on the QDWH algorithm, along the lines of the works by Nakatsukasa et al. Numerical experiments demonstrate that the performance of our algorithm is robust with respect to the spectral gap. A preliminary Matlab implementation becomes faster than eig already for matrix sizes of a few thousand. Comment: 27 pages, 10 figures
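
As a rough illustration of what a spectral projector is, the sketch below computes the projector onto the eigenvectors of a symmetric banded matrix with eigenvalues below a shift `mu`, using the identity P = (I - sign(A - mu I)) / 2. It deliberately swaps in a plain dense Newton iteration for the matrix sign function; it does not use QDWH or hierarchical matrices, and the function name and test matrix are assumptions made for this example.

```python
import numpy as np

def spectral_projector_sign(A, mu, max_iter=50, tol=1e-13):
    """Projector onto the invariant subspace of symmetric A for eigenvalues < mu.

    Dense Newton iteration X <- (X + X^{-1}) / 2 for sign(A - mu I);
    a simple stand-in, not the QDWH-based method of the paper.
    """
    n = A.shape[0]
    X = A - mu * np.eye(n)
    for _ in range(max_iter):
        X_new = 0.5 * (X + np.linalg.inv(X))
        done = np.linalg.norm(X_new - X) <= tol * np.linalg.norm(X_new)
        X = X_new
        if done:
            break
    return 0.5 * (np.eye(n) - X)

# Symmetric tridiagonal (bandwidth 1) test matrix: 2 on the diagonal, -1 beside it.
n = 20
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
mu = 1.1                                  # assumed shift, lies in a spectral gap

P = spectral_projector_sign(A, mu)

# Reference projector from a full eigendecomposition.
evals, evecs = np.linalg.eigh(A)
V = evecs[:, evals < mu]
P_ref = V @ V.T
```

The trace of `P` counts the eigenvalues below `mu`, and `P @ P` should reproduce `P` up to rounding. The Newton iteration needs more steps as the gap around `mu` shrinks, which is the difficulty the abstract refers to.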

    Very Large-Scale Singular Value Decomposition Using Tensor Train Networks

    We propose new algorithms for singular value decomposition (SVD) of very large-scale matrices based on a low-rank tensor approximation technique called the tensor train (TT) format. The proposed algorithms can compute several dominant singular values and corresponding singular vectors for large-scale structured matrices given in a TT format. The computational complexity of the proposed methods scales logarithmically with the matrix size under the assumption that both the matrix and the singular vectors admit low-rank TT decompositions. The proposed methods, which are called the alternating least squares for SVD (ALS-SVD) and modified alternating least squares for SVD (MALS-SVD), compute the left and right singular vectors approximately through block TT decompositions. The very large-scale optimization problem is reduced to sequential small-scale optimization problems, and each core tensor of the block TT decompositions can be updated by applying any standard optimization method. The optimal ranks of the block TT decompositions are determined adaptively during the iteration process, so that high approximation accuracy can be achieved. Extensive numerical simulations are conducted for several types of TT-structured matrices, such as the Hilbert matrix, Toeplitz matrices, random matrices with prescribed singular values, and tridiagonal matrices. The simulation results demonstrate the effectiveness of the proposed methods compared with standard SVD algorithms and TT-based algorithms developed for symmetric eigenvalue decomposition.
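
To give a feel for why TT-structured data can be stored at a cost that grows only logarithmically with its size, the sketch below compresses a vector of length 2^d into TT cores. It uses the standard sequential truncated-SVD construction of a TT decomposition, not the ALS-SVD or MALS-SVD algorithms of the paper; the function names and the sine test vector are assumptions chosen for illustration.

```python
import numpy as np

def tt_svd(tensor, tol=1e-10):
    """Decompose a d-way array into TT cores of shape (r_prev, n_k, r_next)
    by sequential truncated SVDs (relative singular value threshold tol)."""
    dims = tensor.shape
    cores = []
    r = 1
    mat = np.asarray(tensor)
    for k in range(len(dims) - 1):
        mat = mat.reshape(r * dims[k], -1)
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        rank = max(1, int(np.sum(s > tol * s[0])))
        cores.append(U[:, :rank].reshape(r, dims[k], rank))
        mat = s[:rank, None] * Vt[:rank]      # carry the remainder forward
        r = rank
    cores.append(mat.reshape(r, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract TT cores back into the full array."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([c.shape[1] for c in cores])

# A sampled sine has TT rank at most 2, because sin(a + b) splits into two products.
d = 10
x = np.sin(0.01 * np.arange(2 ** d))
cores = tt_svd(x.reshape((2,) * d))

ranks = [c.shape[2] for c in cores[:-1]]
n_params = sum(c.size for c in cores)         # storage used by the TT cores
x_rec = tt_to_full(cores).reshape(-1)
```

Here `n_params` grows linearly in `d`, that is, logarithmically in the vector length `2 ** d`, as long as the ranks stay bounded.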

    A nested Krylov subspace method to compute the sign function of large complex matrices

    We present an acceleration of the well-established Krylov-Ritz methods to compute the sign function of large complex matrices, as needed in lattice QCD simulations involving the overlap Dirac operator at both zero and nonzero baryon density. Krylov-Ritz methods approximate the sign function using a projection on a Krylov subspace. To achieve high accuracy, this subspace must be taken quite large, which makes the method too costly. The new idea is to make a further projection on an even smaller, nested Krylov subspace. If additionally an intermediate preconditioning step is applied, this projection can be performed without affecting the accuracy of the approximation, and a substantial gain in efficiency is achieved for both Hermitian and non-Hermitian matrices. The numerical efficiency of the method is demonstrated on lattice configurations of sizes ranging from 4^4 to 10^4, and the new results are compared with those obtained with rational approximation methods. Comment: 17 pages, 12 figures, minor corrections, extended analysis of the preconditioning step
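
The baseline Krylov-Ritz approximation mentioned in this abstract can be sketched as follows for the Hermitian case: project A onto a k-dimensional Krylov subspace, apply the sign function to the small projected matrix, and lift the result back. The nested projection and the preconditioning step that form the paper's contribution are not implemented here; the function name, the full reorthogonalization, and the test matrix are assumptions for this example.

```python
import numpy as np

def krylov_ritz_sign(A, x, k):
    """Approximate sign(A) @ x from the Krylov subspace span{x, Ax, ..., A^(k-1) x}.

    Assumes A is Hermitian with no eigenvalue at zero and that no breakdown occurs.
    """
    beta = np.linalg.norm(x)
    V = np.zeros((x.shape[0], k), dtype=np.result_type(A, x))
    V[:, 0] = x / beta
    for j in range(1, k):
        w = A @ V[:, j - 1]
        for _ in range(2):                    # orthogonalize twice for stability
            w = w - V[:, :j] @ (V[:, :j].conj().T @ w)
        V[:, j] = w / np.linalg.norm(w)
    H = V.conj().T @ A @ V                    # k-by-k projected (Ritz) matrix
    H = (H + H.conj().T) / 2
    theta, S = np.linalg.eigh(H)              # Ritz values and vectors
    sign_H_e1 = S @ (np.sign(theta) * S[0, :].conj())   # sign(H) applied to e_1
    return beta * (V @ sign_H_e1)

# Symmetric test matrix with eigenvalues in [-2, -0.5] and [0.5, 2] (assumed data).
rng = np.random.default_rng(1)
n = 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.concatenate([-rng.uniform(0.5, 2.0, n // 2), rng.uniform(0.5, 2.0, n - n // 2)])
A = (Q * lam) @ Q.T
A = (A + A.T) / 2
x = rng.standard_normal(n)

evals, evecs = np.linalg.eigh(A)
exact = evecs @ (np.sign(evals) * (evecs.T @ x))        # reference sign(A) @ x

err_small = np.linalg.norm(krylov_ritz_sign(A, x, 5) - exact)
err_full = np.linalg.norm(krylov_ritz_sign(A, x, n) - exact)
```

With `k = n` the projection is exact up to rounding. For smaller `k` the error depends on how close the spectrum comes to zero, which is why large subspaces are needed in practice.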

    The Anderson model of localization: a challenge for modern eigenvalue methods

    We present a comparative study of the application of modern eigenvalue algorithms to an eigenvalue problem arising in quantum physics, namely, the computation of a few interior eigenvalues and their associated eigenvectors for the large, sparse, real, symmetric, and indefinite matrices of the Anderson model of localization. We compare the Lanczos algorithm in the 1987 implementation of Cullum and Willoughby with the implicitly restarted Arnoldi method coupled with polynomial and several shift-and-invert convergence accelerators as well as with a sparse hybrid tridiagonalization method. We demonstrate that for our problem the Lanczos implementation is faster and more memory efficient than the other approaches. This seemingly innocuous problem presents a major challenge for all modern eigenvalue algorithms. Comment: 16 LaTeX pages with 3 figures included

    Structure Preserving Parallel Algorithms for Solving the Bethe-Salpeter Eigenvalue Problem

    The Bethe-Salpeter eigenvalue problem is a dense structured eigenvalue problem arising from the discretized Bethe-Salpeter equation in the context of computing exciton energies and states. A computational challenge is that at least half of the eigenvalues and the associated eigenvectors are desired in practice. We establish the equivalence between Bethe-Salpeter eigenvalue problems and real Hamiltonian eigenvalue problems. Based on theoretical analysis, structure preserving algorithms for a class of Bethe-Salpeter eigenvalue problems are proposed. We also show that for this class of problems all eigenvalues obtained from the Tamm-Dancoff approximation are overestimated. In order to solve large scale problems of practical interest, we discuss parallel implementations of our algorithms targeting distributed memory systems. Several numerical examples are presented to demonstrate the efficiency and accuracy of our algorithms.
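
The overestimation claim for the Tamm-Dancoff approximation can be checked numerically on a small example. The sketch below is restricted to the real case, H = [[A, B], [-B, -A]] with A and B symmetric and both A + B and A - B positive definite, which is an assumption made here for simplicity. It obtains the positive eigenvalues of H from a symmetric matrix built with a Cholesky factor, which is one simple structure-exploiting route and not necessarily the algorithm of the paper. The function name and random data are hypothetical.

```python
import numpy as np

def bse_positive_eigenvalues(A, B):
    """Positive eigenvalues of H = [[A, B], [-B, -A]] (real symmetric A, B,
    with A + B and A - B positive definite), in ascending order."""
    M = A + B
    K = A - B
    L = np.linalg.cholesky(K)             # K = L L^T
    W = L.T @ M @ L                       # symmetric, similar to K M
    return np.sqrt(np.linalg.eigvalsh(W)) # eigenvalues of H satisfy lambda^2 in spec(K M)

# Random definite test problem (assumed data).
rng = np.random.default_rng(2)
n = 8
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)               # symmetric positive definite
S = rng.standard_normal((n, n))
B = 0.3 * (S + S.T) / 2                   # small symmetric coupling block

bse = bse_positive_eigenvalues(A, B)
tda = np.linalg.eigvalsh(A)               # Tamm-Dancoff approximation: drop B

# Cross-check against the eigenvalues of the full 2n-by-2n matrix.
H = np.block([[A, B], [-B, -A]])
full_pos = np.sort(np.linalg.eigvals(H).real)[n:]
```

Comparing the sorted arrays entry by entry, each value in `tda` should be at least as large as the corresponding value in `bse`, consistent with the abstract's statement for this class of problems.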