    Minimizing Communication for Eigenproblems and the Singular Value Decomposition

    Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds were presented on the amount of communication required for essentially all $O(n^3)$-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs.
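
    For orientation, lower bounds of this kind are usually stated in the following form. The symbols $F$ (number of arithmetic operations) and $M$ (size of the fast or local memory) are notation introduced here for illustration, not taken from the abstract.

```latex
% Communication lower bounds for an algorithm performing F arithmetic
% operations on a machine with fast (local) memory of size M:
\[
  \#\text{words moved} = \Omega\!\left(\frac{F}{\sqrt{M}}\right),
  \qquad
  \#\text{messages} = \Omega\!\left(\frac{F}{M^{3/2}}\right).
\]
% For a sequential dense n-by-n computation with F = \Theta(n^3):
\[
  \#\text{words moved} = \Omega\!\left(\frac{n^3}{\sqrt{M}}\right),
  \qquad
  \#\text{messages} = \Omega\!\left(\frac{n^3}{M^{3/2}}\right).
\]
```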

    Lanczos eigensolution method for high-performance computers

    The theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplies. These computational steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time from applying optimization techniques such as variable band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. The sequential computational time of 181.6 seconds for the panel problem on a CONVEX computer was reduced to 14.1 seconds with the optimized vector algorithm. The best computational time for the transport problem with 17,000 degrees of freedom was 23 seconds, obtained on the Cray-YMP using an average of 3.63 processors.

    The use of Lanczos's method to solve the large generalized symmetric definite eigenvalue problem

    The generalized eigenvalue problem, $Kx = \lambda Mx$, is of significant practical importance, especially in structural engineering, where it arises as the vibration and buckling problem. A new algorithm, LANZ, based on Lanczos's method is developed. LANZ uses a technique called dynamic shifting to improve the efficiency and reliability of the Lanczos algorithm. A new algorithm for solving the tridiagonal matrices that arise when using Lanczos's method is described. A modification of Parlett and Scott's selective orthogonalization algorithm is proposed. Results from an implementation of LANZ on a Convex C-220 show it to be superior to a subspace iteration code.
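
    To make the setting concrete, the following is a minimal sketch of a shift-and-invert Lanczos iteration for the symmetric definite problem, written with dense NumPy arrays. It is not LANZ: it uses one fixed shift instead of dynamic shifting, and full reorthogonalization in the M-inner product instead of selective orthogonalization. The function name `shift_invert_lanczos`, its parameters, and the small test matrices are assumptions made for illustration.

```python
import numpy as np

def shift_invert_lanczos(K, M, sigma=0.0, steps=20, seed=0, tol=1e-12):
    """Approximate eigenpairs of K x = lambda M x near the shift sigma.

    K is symmetric and M is symmetric positive definite. This sketch keeps
    all Lanczos vectors and reorthogonalizes fully in the M-inner product.
    """
    n = K.shape[0]
    steps = min(steps, n)
    A = K - sigma * M                    # a production code would factor this once
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.sqrt(q @ (M @ q))            # normalize so that q^T M q = 1
    Q, alphas, betas = [q], [], []
    for j in range(steps):
        Mq = M @ Q[j]                    # matrix-vector multiply
        w = np.linalg.solve(A, Mq)       # stands in for forward/backward solves
        alpha = w @ Mq                   # M-inner product of w with Q[j]
        alphas.append(alpha)
        w = w - alpha * Q[j]
        if j > 0:
            w = w - betas[-1] * Q[j - 1]
        for v in Q:                      # full M-orthogonalization
            w = w - (v @ (M @ w)) * v
        beta = np.sqrt(max(w @ (M @ w), 0.0))
        if j == steps - 1 or beta < tol:
            break
        betas.append(beta)
        Q.append(w / beta)
    # Tridiagonal matrix whose eigenvalues are theta = 1 / (lambda - sigma).
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    theta, S = np.linalg.eigh(T)
    lam = sigma + 1.0 / theta
    X = np.column_stack(Q) @ S           # Ritz vectors, M-orthonormal
    order = np.argsort(lam)
    return lam[order], X[:, order]

# Small illustrative pencil: a stiffness-like K and a diagonal mass-like M.
n = 8
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.diag(np.arange(1.0, n + 1.0))
lam, X = shift_invert_lanczos(K, M, sigma=0.0, steps=n)
print(lam)
```

    With `steps` equal to the matrix size the Krylov space is the whole space, so the Ritz values should agree with the eigenvalues of the pencil up to rounding. In practice far fewer steps are taken and only the eigenvalues nearest the shift are expected to converge.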

    ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

    Solving the electronic structure from a generalized or standard eigenproblem is often the bottleneck in large scale calculations based on Kohn-Sham density-functional theory. This problem must be addressed by essentially all current electronic structure codes, based on similar matrix expressions, and by high-performance computation. We here present a unified software interface, ELSI, to access different strategies that address the Kohn-Sham eigenvalue problem. Currently supported algorithms include the dense generalized eigensolver library ELPA, the orbital minimization method implemented in libOMM, and the pole expansion and selected inversion (PEXSI) approach with lower computational complexity for semilocal density functionals. The ELSI interface aims to simplify the implementation and optimal use of the different strategies, by offering (a) a unified software framework designed for the electronic structure solvers in Kohn-Sham density-functional theory; (b) reasonable default parameters for a chosen solver; (c) automatic conversion between input and internal working matrix formats, and in the future (d) recommendation of the optimal solver depending on the specific problem. Comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800 basis functions) on distributed memory supercomputing architectures.

    Fast computation of spectral projectors of banded matrices

    We consider the approximate computation of spectral projectors for symmetric banded matrices. While this problem has received considerable attention, especially in the context of linear scaling electronic structure methods, the presence of small relative spectral gaps challenges existing methods based on approximate sparsity. In this work, we show how a data-sparse approximation based on hierarchical matrices can be used to overcome this problem. We prove a priori bounds on the approximation error and propose a fast algorithm based on the QDWH algorithm, along the lines of the work by Nakatsukasa et al. Numerical experiments demonstrate that the performance of our algorithm is robust with respect to the spectral gap. A preliminary Matlab implementation becomes faster than eig already for matrix sizes of a few thousand.
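
    The link between a spectral projector and the matrix sign function can be sketched as follows, using $P = (I - \mathrm{sign}(A - \mu I))/2$ for the eigenvalues below $\mu$. The sketch swaps in the plain Newton iteration for the sign function in place of QDWH, and dense arrays in place of hierarchical matrices, so it illustrates only the identity and not the fast algorithm. The function name `projector_via_sign` and the banded test matrix are assumptions made for illustration.

```python
import numpy as np

def projector_via_sign(A, mu, max_iter=50, tol=1e-12):
    """Projector onto the eigenvalues of symmetric A below mu.

    Uses P = (I - sign(A - mu*I)) / 2, with the sign function computed by
    the Newton iteration X <- (X + X^{-1}) / 2. Requires mu not to be an
    eigenvalue of A.
    """
    n = A.shape[0]
    X = A - mu * np.eye(n)
    for _ in range(max_iter):
        X_new = 0.5 * (X + np.linalg.inv(X))
        change = np.linalg.norm(X_new - X, "fro")
        X = X_new
        if change <= tol * np.linalg.norm(X, "fro"):
            break
    return 0.5 * (np.eye(n) - X)

# Symmetric banded matrix (bandwidth 2). By Gershgorin's theorem its
# eigenvalues lie in [-3.5, -0.5] and [0.5, 3.5], so mu = 0 sits in a gap.
n = 40
diag = np.where(np.arange(n) % 2 == 0, 2.0, -2.0)
A = (np.diag(diag)
     + 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
     + 0.25 * (np.eye(n, k=2) + np.eye(n, k=-2)))
P = projector_via_sign(A, mu=0.0)
print("idempotency error:", np.linalg.norm(P @ P - P))
```

    The Newton iteration needs more steps as the gap around `mu` shrinks relative to the spectral width, which is one reason scaled or dynamically weighted iterations are preferred in practice.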

    Fast Hessenberg reduction of some rank structured matrices

    We develop two fast algorithms for Hessenberg reduction of a structured matrix $A = D + UV^H$, where $D$ is a real or unitary $n \times n$ diagonal matrix and $U, V \in \mathbb{C}^{n \times k}$. The proposed algorithm for the real case exploits a two-stage approach by first reducing the matrix to a generalized Hessenberg form and then completing the reduction by annihilation of the unwanted sub-diagonals. It is shown that the novel method requires $O(n^2k)$ arithmetic operations and it is significantly faster than other reduction algorithms for rank structured matrices. The method is then extended to the unitary plus low rank case by using a block analogue of the CMV form of unitary matrices. It is shown that a block Lanczos-type procedure for the block tridiagonalization of $\Re(D)$ induces a structured reduction on $A$ in a block staircase CMV-type shape. Then, we present a numerically stable method for performing this reduction using unitary transformations and we show how to generalize the sub-diagonal elimination to this shape, while still being able to provide a condensed representation for the reduced matrix. In this way the complexity still remains linear in $k$ and, moreover, the resulting algorithm can be adapted to deal efficiently with block companion matrices.
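
    As a point of reference, the sketch below performs the standard unstructured Householder reduction of $A = D + UV^H$ with real diagonal $D$. It costs $O(n^3)$ operations and is the dense baseline, not either of the fast algorithms described above. It also checks one fact that structured methods rely on: the Hessenberg matrix minus the transformed low-rank term is Hermitian. The function name `hessenberg_dense` and the random test data are assumptions made for illustration.

```python
import numpy as np

def hessenberg_dense(A):
    """Householder reduction Q^H A Q = H, H upper Hessenberg (O(n^3) baseline)."""
    H = np.array(A, dtype=complex)
    n = H.shape[0]
    Q = np.eye(n, dtype=complex)
    for k in range(n - 2):
        x = H[k + 1:, k].copy()
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                     # column is already reduced
        phase = x[0] / abs(x[0]) if x[0] != 0 else 1.0
        x[0] += phase * norm_x           # sign choice avoids cancellation
        v = x / np.linalg.norm(x)        # unit Householder vector, P = I - 2 v v^H
        H[k + 1:, :] -= 2.0 * np.outer(v, v.conj() @ H[k + 1:, :])   # P H
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v.conj())   # H P
        Q[:, k + 1:] -= 2.0 * np.outer(Q[:, k + 1:] @ v, v.conj())   # Q P
    return H, Q

rng = np.random.default_rng(0)
n, k = 12, 2
D = np.diag(rng.standard_normal(n))      # real diagonal part
U = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
V = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
A = D + U @ V.conj().T
H, Q = hessenberg_dense(A)

# H = Q^H D Q + (Q^H U)(Q^H V)^H, and Q^H D Q is Hermitian because D is real.
R = H - (Q.conj().T @ U) @ (Q.conj().T @ V).conj().T
print("below second subdiagonal:", np.abs(np.tril(H, -2)).max())
print("Hermitian defect of R:   ", np.abs(R - R.conj().T).max())
```

    Because `R` is Hermitian, the upper triangle of `H` is determined by its lower triangle together with the two $n \times k$ factors, which is what makes a condensed representation of the reduced matrix possible.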

    Convergence and round-off errors in a two-dimensional eigenvalue problem using spectral methods and Arnoldi-Chebyshev algorithm

    An efficient way of solving 2D stability problems in fluid mechanics is to discretize the equations, which casts the problem in the form of a generalized eigenvalue problem, and then to apply the incomplete Arnoldi-Chebyshev method. This method preserves the banded sparsity structure of the matrices of the algebraic eigenvalue problem and thus decreases memory use and CPU-time consumption. The errors that affect computed eigenvalues and eigenvectors are due to the truncation in the discretization and to finite precision in the computation of the discretized problem. In this paper we analyze those two errors and the interplay between them. We use as a test case the two-dimensional eigenvalue problem yielded by the computation of inertial modes in a spherical shell. This problem contains many difficulties that make it a very good test case. It turns out that single modes (especially the most-damped modes, i.e., those with high spatial frequency) can be very sensitive to round-off errors, even when apparently good spectral convergence is achieved. The influence of round-off errors is analyzed using the spectral portrait technique and by comparison of double precision and extended precision computations. Through the analysis we give practical recipes to control the truncation and round-off errors on eigenvalues and eigenvectors.

    Decay properties of spectral projectors with applications to electronic structure

    Motivated by applications in quantum chemistry and solid state physics, we apply general results from approximation theory and matrix analysis to the study of the decay properties of spectral projectors associated with large and sparse Hermitian matrices. Our theory leads to a rigorous proof of the exponential off-diagonal decay ("nearsightedness") for the density matrix of gapped systems at zero electronic temperature in both orthogonal and non-orthogonal representations, thus providing a firm theoretical basis for the possibility of linear scaling methods in electronic structure calculations for non-metallic systems. We further discuss the case of density matrices for metallic systems at positive electronic temperature. A few other possible applications are also discussed.
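
    The off-diagonal decay can be observed numerically on a small model. The sketch below builds a tridiagonal Hermitian matrix with alternating on-site terms, which has a spectral gap around zero, forms the projector onto the eigenvalues below the gap by a dense eigendecomposition, and records the largest entry on each diagonal. The model, its parameters, and the function names `spectral_projector` and `offdiagonal_decay` are assumptions made for illustration.

```python
import numpy as np

def spectral_projector(H, mu):
    """Projector onto the eigenvectors of Hermitian H with eigenvalues below mu."""
    w, V = np.linalg.eigh(H)
    occ = V[:, w < mu]
    return occ @ occ.conj().T

def offdiagonal_decay(P):
    """Largest |P[i, j]| on each diagonal |i - j| = d, for d = 0, ..., n - 1."""
    n = P.shape[0]
    return np.array([np.abs(np.diagonal(P, offset=d)).max() for d in range(n)])

# Gapped model: on-site terms alternate between +1 and -1, nearest-neighbour
# coupling is -0.5. Its eigenvalues satisfy |E| >= 1, so mu = 0 lies in a gap.
n = 80
onsite = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
H = np.diag(onsite) - 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))

P = spectral_projector(H, mu=0.0)
decay = offdiagonal_decay(P)
for d in (0, 5, 10, 20, 30):
    print(f"|i-j| = {d:2d}: max |P_ij| = {decay[d]:.2e}")
```

    Shrinking the gap, for example by scaling down the on-site terms, should slow the decay, which is consistent with the distinction drawn above between gapped and metallic systems.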

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.