40 research outputs found

    Minimizing Communication for Eigenproblems and the Singular Value Decomposition

    Full text link
    Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds were presented on the amount of communication required for essentially all O(n3)O(n^3)-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs.Comment: 43 pages, 11 figure

    Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach

    Get PDF
    The real symmetric tridiagonal eigenproblem is of outstanding importance in numerical computations; it arises frequently as part of eigensolvers for standard and generalized dense Hermitian eigenproblems that are based on a reduction to tridiagonal form. For its solution, the algorithm of Multiple Relatively Robust Representations (MRRR) is among the fastest methods. Although fast, the solvers based on MRRR do not deliver the same accuracy as competing methods like Divide & Conquer or the QR algorithm. In this paper, we demonstrate that the use of mixed precisions leads to improved accuracy of MRRR-based eigensolvers with limited or no performance penalty. As a result, we obtain eigensolvers that are not only equally or more accurate than the best available methods, but also -in most circumstances- faster and more scalable than the competition

    Thick-restarted joint Lanczos bidiagonalization for the GSVD

    Full text link
    The computation of the partial generalized singular value decomposition (GSVD) of large-scale matrix pairs can be approached by means of iterative methods based on expanding subspaces, particularly Krylov subspaces. We consider the joint Lanczos bidiagonalization method, and analyze the feasibility of adapting the thick restart technique that is being used successfully in the context of other linear algebra problems. Numerical experiments illustrate the effectiveness of the proposed method. We also compare the new method with an alternative solution via equivalent eigenvalue problems, considering accuracy as well as computational performance. The analysis is done using a parallel implementation in the SLEPc library

    Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance

    Get PDF
    We show how both the tridiagonal and bidiagonal QR algorithms can be restructured so that they be- come rich in operations that can achieve near-peak performance on a modern processor. The key is a novel, cache-friendly algorithm for applying multiple sets of Givens rotations to the eigenvector/singular vector matrix. This algorithm is then implemented with optimizations that (1) leverage vector instruction units to increase floating-point throughput, and (2) fuse multiple rotations to decrease the total number of memory operations. We demonstrate the merits of these new QR algorithms for computing the Hermitian eigenvalue decomposition (EVD) and singular value decomposition (SVD) of dense matrices when all eigen- vectors/singular vectors are computed. The approach yields vastly improved performance relative to the traditional QR algorithms for these problems and is competitive with two commonly used alternatives— Cuppen’s Divide and Conquer algorithm and the Method of Multiple Relatively Robust Representations— while inheriting the more modest O(n) workspace requirements of the original QR algorithms. Since the computations performed by the restructured algorithms remain essentially identical to those performed by the original methods, robust numerical properties are preserved

    MRRR-based Eigensolvers for Multi-core Processors and Supercomputers

    Get PDF
    The real symmetric tridiagonal eigenproblem is of outstanding importance in numerical computations; it arises frequently as part of eigensolvers for standard and generalized dense Hermitian eigenproblems that are based on a reduction to tridiagonal form. For its solution, the algorithm of Multiple Relatively Robust Representations (MRRR or MR3 in short) - introduced in the late 1990s - is among the fastest methods. To compute k eigenpairs of a real n-by-n tridiagonal T, MRRR only requires O(kn) arithmetic operations; in contrast, all the other practical methods require O(k^2 n) or O(n^3) operations in the worst case. This thesis centers around the performance and accuracy of MRRR.Comment: PhD thesi

    High-performance SVD partial spectrum computation

    Get PDF
    We introduce a new singular value decomposition (SVD) solver based on the QR-based Dynamically Weighted Halley (QDWH) algorithm for computing the partial spectrum SVD (QDWHpartial-SVD) problems. By optimizing the rational function underlying the algorithms in the desired part of the spectrum only, the QDWHpartial-SVD algorithm efficiently computes a fraction (say 1-20%) of the leading singular values/vectors. We develop a high-performance implementation of QDWHpartial-SVD 1 on distributed-memory manycore systems and demonstrate its numerical robustness. We perform a benchmarking campaign against counterparts from the state-of-theart numerical libraries across various matrix sizes using up to 36K MPI processes. Experimental results show performance speedups for QDWHpartial-SVD up to 6X and 2X against vendor-optimized PDGESVD from ScaLAPACK and KSVD on a Cray XC40 system using 1152 nodes based on two-socket 16-core Intel Haswell CPU, respectively. We also port our QDWHpartial-SVD software library to a system composed of 256 nodes with two-socket 64-Core AMD EPYC Milan CPU and achieve performance speedup up to 4X compared to vendor-optimized PDGESVD from ScaLAPACK. We also compare energy consumption for the two algorithms and demonstrate how QDWHpartial-SVD can further outperform PDGESVD in that regard by performing fewer memory-bound operations

    Accurate and Efficient Expression Evaluation and Linear Algebra

    Full text link
    We survey and unify recent results on the existence of accurate algorithms for evaluating multivariate polynomials, and more generally for accurate numerical linear algebra with structured matrices. By "accurate" we mean that the computed answer has relative error less than 1, i.e., has some correct leading digits. We also address efficiency, by which we mean algorithms that run in polynomial time in the size of the input. Our results will depend strongly on the model of arithmetic: Most of our results will use the so-called Traditional Model (TM). We give a set of necessary and sufficient conditions to decide whether a high accuracy algorithm exists in the TM, and describe progress toward a decision procedure that will take any problem and provide either a high accuracy algorithm or a proof that none exists. When no accurate algorithm exists in the TM, it is natural to extend the set of available accurate operations by a library of additional operations, such as x+y+zx+y+z, dot products, or indeed any enumerable set which could then be used to build further accurate algorithms. We show how our accurate algorithms and decision procedure for finding them extend to this case. Finally, we address other models of arithmetic, and the relationship between (im)possibility in the TM and (in)efficient algorithms operating on numbers represented as bit strings.Comment: 49 pages, 6 figures, 1 tabl

    High-Performance Solvers for Dense Hermitian Eigenproblems

    Full text link
    We introduce a new collection of solvers - subsequently called EleMRRR - for large-scale dense Hermitian eigenproblems. EleMRRR solves various types of problems: generalized, standard, and tridiagonal eigenproblems. Among these, the last is of particular importance as it is a solver on its own right, as well as the computational kernel for the first two; we present a fast and scalable tridiagonal solver based on the Algorithm of Multiple Relatively Robust Representations - referred to as PMRRR. Like the other EleMRRR solvers, PMRRR is part of the freely available Elemental library, and is designed to fully support both message-passing (MPI) and multithreading parallelism (SMP). As a result, the solvers can equally be used in pure MPI or in hybrid MPI-SMP fashion. We conducted a thorough performance study of EleMRRR and ScaLAPACK's solvers on two supercomputers. Such a study, performed with up to 8,192 cores, provides precise guidelines to assemble the fastest solver within the ScaLAPACK framework; it also indicates that EleMRRR outperforms even the fastest solvers built from ScaLAPACK's components
    corecore