
    A Shift Selection Strategy for Parallel Shift-invert Spectrum Slicing in Symmetric Self-consistent Eigenvalue Computation

    © 2020 ACM. The central importance of large-scale eigenvalue problems in scientific computation necessitates the development of massively parallel algorithms for their solution. Recent advances in dense numerical linear algebra have enabled the routine treatment of eigenvalue problems with dimensions on the order of hundreds of thousands on the world's largest supercomputers. In cases where dense treatments are not feasible, Krylov subspace methods offer an attractive alternative because they do not require storage of the problem matrices. However, demonstration of scalability of either of these classes of eigenvalue algorithms on computing architectures capable of expressing massive parallelism is non-trivial due to communication requirements and serial bottlenecks, respectively. In this work, we introduce the SISLICE method: a parallel shift-invert algorithm for the solution of the symmetric self-consistent field (SCF) eigenvalue problem. The SISLICE method drastically reduces the communication requirement of current parallel shift-invert eigenvalue algorithms through various shift selection and migration techniques based on density of states estimation and k-means clustering, respectively. This work demonstrates the robustness and parallel performance of the SISLICE method on a representative set of SCF eigenvalue problems and outlines research directions that will be explored in future work.
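For orientation, the shift-invert transformation that spectrum slicing builds on can be written as follows. This is the textbook form for a standard symmetric problem with a shift $\sigma$ that is not an eigenvalue; the symbols $A$, $\sigma$, $\theta$ are illustrative and not taken from the abstract, and the SISLICE shift selection itself is not shown.

```latex
\[
  A x = \lambda x
  \quad\Longleftrightarrow\quad
  (A - \sigma I)^{-1} x = \theta x,
  \qquad
  \theta = \frac{1}{\lambda - \sigma},
  \quad
  \lambda = \sigma + \frac{1}{\theta}.
\]
```

Eigenvalues $\lambda$ closest to $\sigma$ map to the $\theta$ of largest magnitude, which is why each shift resolves one slice of the spectrum and why different shifts can be processed independently.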

    Efficient Recursion Method for Inverting Overlap Matrix

    A new O(N) algorithm based on a recursion method, in which the computational effort is proportional to the number of atoms N, is presented for calculating the inverse of an overlap matrix, which is needed in electronic structure calculations with a non-orthogonal localized basis set. This efficient inversion method can be incorporated in several O(N) methods for diagonalization of a generalized secular equation. By studying convergence properties of the 1-norm of an error matrix for diamond and fcc Al, this method is compared to three other O(N) methods (the divide method, the Taylor expansion method, and Hotelling's method) with regard to computational accuracy and efficiency within density functional theory. The test calculations show that the new method is about one hundred times faster than the divide method in reaching the same convergence for both diamond and fcc Al, while the Taylor expansion method and Hotelling's method suffer from numerical instabilities in most cases. Comment: 17 pages and 4 figures
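As a point of reference for one of the baselines named above, here is a minimal dense sketch of Hotelling's iteration, $X_{k+1} = X_k (2I - S X_k)$. It is not the paper's recursion method and not an O(N) implementation. The function names, the starting guess $X_0 = S^T / (\|S\|_1 \|S\|_\infty)$, the test matrix, and the definition of the error matrix as $I - SX$ are all assumptions made for illustration.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hotelling_inverse(s, iterations=30):
    """Approximate the inverse of s with Hotelling's iteration (hypothetical helper)."""
    n = len(s)
    norm_1 = max(sum(abs(s[i][j]) for i in range(n)) for j in range(n))  # max column sum
    norm_inf = max(sum(abs(v) for v in row) for row in s)                # max row sum
    # Assumed starting guess: scaled transpose, which keeps the spectrum of S X0 in (0, 1].
    x = [[s[j][i] / (norm_1 * norm_inf) for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        sx = matmul(s, x)
        # Correction factor 2I - S X.
        corr = [[(2.0 if i == j else 0.0) - sx[i][j] for j in range(n)]
                for i in range(n)]
        x = matmul(x, corr)
    return x


def error_one_norm(s, x):
    """1-norm of the error matrix I - S X (assumed definition of the error)."""
    n = len(s)
    sx = matmul(s, x)
    return max(sum(abs((1.0 if i == j else 0.0) - sx[i][j]) for i in range(n))
               for j in range(n))


# Toy overlap-like matrix: unit diagonal, nearest-neighbour overlap 0.3.
S = [[1.0, 0.3, 0.0, 0.0],
     [0.3, 1.0, 0.3, 0.0],
     [0.0, 0.3, 1.0, 0.3],
     [0.0, 0.0, 0.3, 1.0]]
X = hotelling_inverse(S)
print(error_one_norm(S, X))
```

On a small, well-conditioned matrix like this one the iteration converges quadratically; the instabilities reported in the abstract concern the truncated, sparse O(N) setting, which this dense sketch does not reproduce.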

    Numerical Stability of Lanczos Methods

    The Lanczos algorithm for matrix tridiagonalisation suffers from strong numerical instability in finite precision arithmetic when applied to evaluate matrix eigenvalues. The mechanism by which this instability arises is well documented in the literature. A recent application of the Lanczos algorithm proposed by Bai, Fahey and Golub allows quadrature evaluation of inner products of the form $\psi^\dagger g(A) \psi$. We show that this quadrature evaluation is numerically stable and explain how the numerical errors which are such a fundamental element of the finite precision Lanczos tridiagonalisation procedure are automatically and exactly compensated in the Bai, Fahey and Golub algorithm. In the process, we shed new light on the mechanism by which roundoff error corrupts the Lanczos procedure. Comment: 3 pages, Lattice 99 contribution
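The quadrature in question can be summarised by the standard Gauss-type rule below. The notation is an assumption made here, not taken from the abstract: $T_k$ is the tridiagonal matrix after $k$ Lanczos steps started from $\psi / \|\psi\|$, with eigenvalues $\theta_j$ and normalised eigenvectors $s_j$, and $e_1$ is the first unit vector.

```latex
\[
  \psi^\dagger g(A)\, \psi
  \;\approx\;
  \|\psi\|^2 \, e_1^{T} g(T_k)\, e_1
  \;=\;
  \|\psi\|^2 \sum_{j=1}^{k} \bigl| e_1^{T} s_j \bigr|^{2} \, g(\theta_j).
\]
```

The nodes are the Ritz values $\theta_j$ and the weights are the squared first components of the eigenvectors of $T_k$, so only the small tridiagonal matrix has to be diagonalised.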

    Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge

    Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is demonstrated to efficiently solve eigenvalue problems for graph Laplacians that appear in spectral clustering. For static graph partitioning, 10-20 iterations of LOBPCG without preconditioning result in ~10x error reduction, enough to achieve 100% correctness for all Challenge datasets with known truth partitions, e.g., for graphs with 5K/0.1M (50K/1M) Vertices/Edges in 2 (7) seconds, compared to over 5,000 (30,000) seconds needed by the baseline Python code. Our Python code correctly determines all 98 (160) clusters from the Challenge static graphs with 0.5M (2M) vertices in 270 (1,700) seconds using 10GB (50GB) of memory. Our single-precision MATLAB code calculates the same clusters in half the time and memory. For streaming graph partitioning, LOBPCG is initiated with approximate eigenvectors of the graph Laplacian already computed for the previous graph, in many cases reducing the number of required LOBPCG iterations by a factor of 2-3 compared to the static case. Our spectral clustering is generic, i.e. it assumes nothing specific about the block model or streaming used to generate the graphs for the Challenge, in contrast to the base code. Nevertheless, in 10-stage streaming comparison with the base code for the 5K graph, the quality of our clusters is similar or better starting at stage 4 (7) for emerging edging (snowballing) streaming, while the computations are over 100-1000 times faster. Comment: 6 pages. To appear in Proceedings of the 2017 IEEE High Performance Extreme Computing Conference. Student Innovation Award Streaming Graph Challenge: Stochastic Block Partition, see http://graphchallenge.mit.edu/champion
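To make the underlying idea concrete, the following sketch bisects a tiny graph by the sign of the Fiedler vector of its Laplacian. It deliberately swaps in plain shifted power iteration for LOBPCG, computes a single eigenvector rather than a block, and uses a hypothetical function name and a toy edge list; it illustrates spectral partitioning in general, not the authors' code.

```python
def fiedler_partition(edges, n, iterations=200):
    """Split an undirected, connected graph in two by the sign of its Fiedler vector.

    Hypothetical helper: shifted power iteration stands in for LOBPCG.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(nbrs) for nbrs in adj]
    # 2 * max degree bounds the largest eigenvalue of L = D - A from above.
    shift = 2.0 * max(deg)

    x = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        # Remove the constant component (the eigenvector of L with eigenvalue 0).
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # Apply shift*I - L so the smallest nonzero eigenvalue of L dominates.
        y = [(shift - deg[i]) * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return [0 if xi < 0 else 1 for xi in x]


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(fiedler_partition(edges, 6))
```

Power iteration converges only linearly, at a rate set by the gap between the two smallest nonzero Laplacian eigenvalues, which is one reason block methods such as LOBPCG are preferred at the scale described in the abstract.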

    An introduction to numerical methods in low-dimensional quantum systems

    This is an introductory course on the Lanczos method and Density Matrix Renormalization Group (DMRG) algorithms, two of the leading numerical techniques applied in studies of low-dimensional quantum models. The idea of studying the models on clusters of a finite size in order to extract their physical properties is briefly discussed. The important role played by the model symmetries is also examined. Special emphasis is given to the DMRG. Comment: 36 pages, 4 figures, standard LaTeX, Brazilian School on Statistical Mechanics (2002), PDF and PS files available at http://www.sbf.if.usp.br/bj

    Spectrum of the Dirac Operator and Multigrid Algorithm with Dynamical Staggered Fermions

    Complete spectra of the staggered Dirac operator $\slashed{D}$ are determined in quenched four-dimensional $SU(2)$ gauge fields, and also in the presence of dynamical fermions. Periodic as well as antiperiodic boundary conditions are used. An attempt is made to relate the performance of multigrid (MG) and conjugate gradient (CG) algorithms for propagators with the distribution of the eigenvalues of $\slashed{D}$. The convergence of the CG algorithm is determined only by the condition number $\kappa$ and by the lattice size. Since the values of $\kappa$ do not vary significantly when quarks become dynamic, CG convergence in unquenched fields can be predicted from quenched simulations. On the other hand, MG convergence is not affected by $\kappa$ but depends on the spectrum in a more subtle way. Comment: 19 pages, 8 figures, HUB-IEP-94/12 and KL-TH 19/94; comes as a uuencoded tar-compressed .ps-file
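For context on why the condition number matters for CG, the textbook worst-case bound for a symmetric positive definite system is reproduced below. It is a general result, not the analysis of this paper; $e_k$ denotes the error after $k$ iterations and $\|\cdot\|_A$ the energy norm of the system matrix $A$, both notations assumed here.

```latex
\[
  \| e_k \|_A
  \;\le\;
  2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{k} \| e_0 \|_A ,
  \qquad
  \kappa = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)} .
\]
```

The bound depends on the spectrum only through $\kappa$, whereas the actual convergence of an iterative method can also depend on how the eigenvalues are distributed between the extremes.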