A Shift Selection Strategy for Parallel Shift-invert Spectrum Slicing in Symmetric Self-consistent Eigenvalue Computation
© 2020 ACM. The central importance of large-scale eigenvalue problems in scientific computation necessitates the development of massively parallel algorithms for their solution. Recent advances in dense numerical linear algebra have enabled the routine treatment of eigenvalue problems with dimensions on the order of hundreds of thousands on the world's largest supercomputers. In cases where dense treatments are not feasible, Krylov subspace methods offer an attractive alternative because they do not require storage of the problem matrices. However, demonstrating the scalability of either class of eigenvalue algorithms on massively parallel computing architectures is non-trivial because of communication requirements and serial bottlenecks, respectively. In this work, we introduce the SISLICE method: a parallel shift-invert algorithm for the solution of the symmetric self-consistent field (SCF) eigenvalue problem. The SISLICE method drastically reduces the communication requirement of current parallel shift-invert eigenvalue algorithms through various shift selection and migration techniques based on density of states estimation and k-means clustering, respectively. This work demonstrates the robustness and parallel performance of the SISLICE method on a representative set of SCF eigenvalue problems and outlines research directions that will be explored in future work.
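The shift-invert idea behind spectrum slicing can be illustrated with plain inverse iteration: applying (A - sigma I)^{-1} repeatedly amplifies the eigenvector whose eigenvalue lies closest to the shift sigma. The sketch below is not the SISLICE algorithm; the function names `solve` and `shift_invert_eigenvalue` are hypothetical, and a dense Gaussian elimination stands in for the sparse factorization a real code would use.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def shift_invert_eigenvalue(a, sigma, iters=50):
    """Estimate the eigenvalue of symmetric `a` nearest to `sigma`.

    Assumes `sigma` is not itself an eigenvalue (the shifted matrix
    would be singular) and that the start vector is not orthogonal
    to the wanted eigenvector.
    """
    n = len(a)
    shifted = [[a[i][j] - (sigma if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary start vector
    for _ in range(iters):
        w = solve(shifted, v)              # w = (A - sigma I)^{-1} v
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * av[i] for i in range(n))  # Rayleigh quotient


# Three shifts probe three different parts of the spectrum.
matrix = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
for shift in (0.5, 1.8, 3.0):
    print(shift, shift_invert_eigenvalue(matrix, shift))
```

Each shift can be processed independently of the others, which is the property that makes slicing the spectrum by shifts attractive for parallel execution.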
Efficient Recursion Method for Inverting Overlap Matrix
A new O(N) algorithm based on a recursion method, in which the computational
effort is proportional to the number of atoms N, is presented for calculating
the inverse of an overlap matrix which is needed in electronic structure
calculations with the non-orthogonal localized basis set. This efficient
inverting method can be incorporated in several O(N) methods for
diagonalization of a generalized secular equation. By studying convergence
properties of the 1-norm of an error matrix for diamond and fcc Al, this method
is compared to three other O(N) methods (the divide method, Taylor expansion
method, and Hotelling's method) with regard to computational accuracy and
efficiency within the density functional theory. The test calculations show
that the new method is about one-hundred times faster than the divide method in
computational time to achieve the same convergence for both diamond and fcc Al,
while the Taylor expansion method and Hotelling's method suffer from numerical
instabilities in most cases.
Comment: 17 pages and 4 figures
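Of the methods compared in this abstract, Hotelling's method is the simplest to write down: it refines an approximate inverse X of the overlap matrix S through X <- X(2I - SX). The sketch below is a dense, untruncated illustration only. It is neither the recursion method of the paper nor an O(N) implementation, the helper names are hypothetical, and the starting guess S^T / (|S|_1 |S|_inf) is a standard choice assumed here rather than taken from the abstract.

```python
def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def one_norm(a):
    """Matrix 1-norm: the maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def hotelling_inverse(s, iters=30):
    """Approximate the inverse of `s` with X <- X (2I - S X)."""
    n = len(s)
    inf_norm = max(sum(abs(x) for x in row) for row in s)
    scale = one_norm(s) * inf_norm
    # Starting guess S^T / (|S|_1 |S|_inf); this keeps the spectral
    # radius of I - S X below one for nonsingular S.
    x = [[s[j][i] / scale for j in range(n)] for i in range(n)]
    for _ in range(iters):
        sx = matmul(s, x)
        corr = [[(2.0 if i == j else 0.0) - sx[i][j] for j in range(n)]
                for i in range(n)]
        x = matmul(x, corr)
    return x


def error_norm(s, x):
    """1-norm of the error matrix I - S X."""
    n = len(s)
    sx = matmul(s, x)
    return one_norm([[(1.0 if i == j else 0.0) - sx[i][j]
                      for j in range(n)] for i in range(n)])


# Small, well-conditioned overlap-like matrix (illustrative values only).
overlap = [[1.0, 0.2, 0.05], [0.2, 1.0, 0.2], [0.05, 0.2, 1.0]]
for k in (2, 4, 8):
    print(k, error_norm(overlap, hotelling_inverse(overlap, iters=k)))
```

The error matrix is squared at every step, so convergence is quadratic for this small dense case; the instabilities reported in the abstract concern the linear-scaling setting, which this sketch does not reproduce.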
Numerical Stability of Lanczos Methods
The Lanczos algorithm for matrix tridiagonalisation suffers from strong
numerical instability in finite precision arithmetic when applied to evaluate
matrix eigenvalues. The mechanism by which this instability arises is well
documented in the literature. A recent application of the Lanczos algorithm
proposed by Bai, Fahey and Golub allows quadrature evaluation of inner products
of the form $\psi^\dagger g(A) \psi$. We show that this quadrature evaluation
is numerically stable and explain how the numerical errors which are such a
fundamental element of the finite precision Lanczos tridiagonalisation
procedure are automatically and exactly compensated in the Bai, Fahey and Golub
algorithm. In the process, we shed new light on the mechanism by which roundoff
error corrupts the Lanczos procedure.
Comment: 3 pages, Lattice 99 contribution
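A minimal sketch of Lanczos-based quadrature may help make the idea concrete. It assumes the inner product has the form psi^T g(A) psi and fixes g(A) = A^{-1}, a choice made here because the (1,1) entry of the inverse of the Lanczos tridiagonal matrix T can then be evaluated as a continued fraction. The function names are hypothetical, and no reorthogonalisation is performed.

```python
def lanczos(matvec, psi, steps):
    """Run Lanczos from `psi`; return (alphas, betas, norm of psi)."""
    norm = sum(x * x for x in psi) ** 0.5
    v = [x / norm for x in psi]
    v_prev = [0.0] * len(psi)
    alphas, betas = [], []
    beta = 0.0
    for _ in range(steps):
        w = matvec(v)
        alpha = sum(a * b for a, b in zip(v, w))
        w = [wi - alpha * vi - beta * pi
             for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = sum(x * x for x in w) ** 0.5
        if beta < 1e-12:      # Krylov space exhausted
            break
        betas.append(beta)
        v_prev, v = v, [x / beta for x in w]
    # T has one fewer off-diagonal entry than diagonal entries.
    return alphas, betas[:len(alphas) - 1], norm


def inverse_quadrature(alphas, betas, norm):
    """Estimate psi^T A^{-1} psi as |psi|^2 * (T^{-1})[0][0].

    The (1,1) entry of T^{-1} is the continued fraction
    1 / (a1 - b1^2 / (a2 - b2^2 / (a3 - ...))), evaluated bottom-up.
    """
    tail = alphas[-1]
    for a, b in zip(reversed(alphas[:-1]), reversed(betas)):
        tail = a - b * b / tail
    return norm * norm / tail


# Diagonal test operator, so the exact answer is a simple sum.
diag = [1.0, 2.0, 4.0]
apply_a = lambda v: [d * x for d, x in zip(diag, v)]
for k in (1, 2, 3):
    a_k, b_k, nrm = lanczos(apply_a, [1.0, 1.0, 1.0], k)
    print(k, inverse_quadrature(a_k, b_k, nrm))
```

With as many Lanczos steps as distinct eigenvalues, the estimate reproduces the exact value 1 + 1/2 + 1/4; fewer steps give a Gauss-quadrature approximation.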
Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge
Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is
demonstrated to efficiently solve eigenvalue problems for graph Laplacians that
appear in spectral clustering. For static graph partitioning, 10-20 iterations
of LOBPCG without preconditioning result in ~10x error reduction, enough to
achieve 100% correctness for all Challenge datasets with known truth
partitions, e.g., for graphs with 5K/0.1M (50K/1M) vertices/edges in 2 (7)
seconds, compared to over 5,000 (30,000) seconds needed by the baseline Python
code. Our Python code 100% correctly determines 98 (160) clusters from the
Challenge static graphs with 0.5M (2M) vertices in 270 (1,700) seconds using
10GB (50GB) of memory. Our single-precision MATLAB code calculates the same
clusters in half the time and memory. For streaming graph partitioning, LOBPCG is
initiated with approximate eigenvectors of the graph Laplacian already computed
for the previous graph, in many cases reducing the number of required LOBPCG
iterations by a factor of 2-3 compared to the static case. Our spectral clustering is
generic, i.e., it assumes nothing specific about the block model or streaming used
to generate the graphs for the Challenge, in contrast to the base code.
Nevertheless, in 10-stage streaming comparison with the base code for the 5K
graph, the quality of our clusters is similar or better starting at stage 4 (7)
for emerging edging (snowballing) streaming, while the computations are
100-1000 times faster.
Comment: 6 pages. To appear in Proceedings of the 2017 IEEE High Performance
Extreme Computing Conference. Student Innovation Award Streaming Graph
Challenge: Stochastic Block Partition, see
http://graphchallenge.mit.edu/champion
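The core of spectral partitioning can be sketched without LOBPCG: build the graph Laplacian, find the eigenvector of its smallest nonzero eigenvalue (the Fiedler vector), and split the vertices by sign. The sketch below deliberately swaps in a shifted power iteration for LOBPCG to stay dependency-free, so it says nothing about the performance figures quoted in the abstract; the function name `fiedler_partition` and the toy graph are hypothetical.

```python
def fiedler_partition(edges, n, iters=500):
    """Split an undirected graph into two parts by Fiedler-vector sign.

    Uses power iteration on (shift * I - L) with the constant vector
    projected out, instead of LOBPCG. Assumes a connected graph.
    """
    # Dense graph Laplacian L = D - A.
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    # Gershgorin bound: every eigenvalue of L is at most 2 * max degree.
    shift = 2.0 * max(lap[i][i] for i in range(n))
    v = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]     # remove the trivial eigenvector
        w = [shift * v[i] - sum(lap[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [0 if x < 0 else 1 for x in v]


# Two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(fiedler_partition(toy_edges, 6))
```

For the toy graph the sign split separates the two triangles. In a streaming setting, the start vector `v` would be replaced by the eigenvector computed for the previous graph, which is the warm-start idea the abstract describes.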
An introduction to numerical methods in low-dimensional quantum systems
This is an introductory course on the Lanczos Method and Density Matrix
Renormalization Group Algorithms (DMRG), two of the leading numerical
techniques applied in studies of low-dimensional quantum models. The idea of
studying the models on clusters of a finite size in order to extract their
physical properties is briefly discussed. The important role played by the
model symmetries is also examined. Special emphasis is given to the DMRG.
Comment: 36 pages, 4 figures, standard LaTeX, Brazilian School on Statistical
Mechanics (2002), PDF and PS files available at http://www.sbf.if.usp.br/bj
Spectrum of the Dirac Operator and Multigrid Algorithm with Dynamical Staggered Fermions
Complete spectra of the staggered Dirac operator $\slashed{D}$ are determined in
quenched four-dimensional gauge fields, and also in the presence of
dynamical fermions.
Periodic as well as antiperiodic boundary conditions are used.
An attempt is made to relate the performance of multigrid (MG) and conjugate
gradient (CG) algorithms for propagators with the distribution of the
eigenvalues of $\slashed{D}$.
The convergence of the CG algorithm is determined only by the condition
number $\kappa$ and by the lattice size.
Since the $\kappa$'s do not vary significantly when quarks become dynamic,
CG convergence in unquenched fields can be predicted from quenched
simulations.
On the other hand, MG convergence is not affected by $\kappa$ but depends on
the spectrum in a more subtle way.
Comment: 19 pages, 8 figures, HUB-IEP-94/12 and KL-TH 19/94; comes as a
uuencoded tar-compressed .ps-file
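The link between CG convergence and the condition number can be made concrete with the standard textbook bound, stated here for a generic Hermitian positive definite system matrix $A$ (an assumption; the abstract does not specify the operator to which CG is applied). With $\kappa = \lambda_{\max}/\lambda_{\min}$ and $e_k$ the error after $k$ iterations:

```latex
\| e_k \|_A \;\le\; 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{k} \| e_0 \|_A ,
\qquad
\| e \|_A = \sqrt{e^\dagger A \, e} .
```

The bound depends on the spectrum only through $\kappa$, so a worst-case CG estimate made for one ensemble carries over to another ensemble with a similar $\kappa$.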