Structure Preserving Parallel Algorithms for Solving the Bethe-Salpeter Eigenvalue Problem
The Bethe-Salpeter eigenvalue problem is a dense structured eigenvalue
problem arising from the discretized Bethe-Salpeter equation in the context of
computing exciton energies and states. A computational challenge is that at
least half of the eigenvalues and the associated eigenvectors are desired in
practice. We establish the equivalence between Bethe-Salpeter eigenvalue
problems and real Hamiltonian eigenvalue problems. Based on theoretical
analysis, structure preserving algorithms for a class of Bethe-Salpeter
eigenvalue problems are proposed. We also show that for this class of problems
all eigenvalues obtained from the Tamm-Dancoff approximation are overestimated.
In order to solve large scale problems of practical interest, we discuss
parallel implementations of our algorithms targeting distributed memory
systems. Several numerical examples are presented to demonstrate the efficiency
and accuracy of our algorithms.
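The equivalence and the Tamm-Dancoff claim above can be illustrated in the simplest setting. The NumPy sketch below assumes real symmetric A and B with A + B and A - B positive definite (the abstract covers a more general class) and writes the Bethe-Salpeter matrix as H = [[A, B], [-B, -A]]. It compares a structure-ignoring dense eigensolver on the 2n x 2n matrix with a reduction to an n x n symmetric eigenproblem through a Cholesky factor of A + B, and then checks that the Tamm-Dancoff approximation (keeping only the block A) gives larger eigenvalues. The helper names and the random test data are hypothetical; this is a serial illustration, not the authors' parallel algorithm.

```python
import numpy as np


def random_bse_pair(n, seed=0):
    """Hypothetical real test data: symmetric A, B with A + B and A - B positive definite."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, n))
    Y = rng.standard_normal((n, n))
    A = 0.5 * (X + X.T) + 3.0 * n * np.eye(n)  # diagonal shift keeps A +/- B positive definite
    B = 0.5 * (Y + Y.T)
    return A, B


def bse_positive_eigs_dense(A, B):
    """Positive eigenvalues of H = [[A, B], [-B, -A]] from a general, structure-ignoring solver."""
    H = np.block([[A, B], [-B, -A]])
    ev = np.linalg.eigvals(H).real  # for this class the eigenvalues are real pairs +/- lambda
    return np.sort(ev[ev > 0])


def bse_positive_eigs_structured(A, B):
    """Same eigenvalues from an n x n symmetric problem.

    With x, y the two halves of an eigenvector, u = x + y satisfies
    (A - B)(A + B) u = lambda^2 u. Writing A + B = L L^T (Cholesky),
    (A - B) L L^T is similar to the symmetric matrix L^T (A - B) L.
    """
    L = np.linalg.cholesky(A + B)
    S = L.T @ (A - B) @ L
    return np.sqrt(np.linalg.eigvalsh(S))  # eigvalsh returns ascending order


A, B = random_bse_pair(6)
lam_dense = bse_positive_eigs_dense(A, B)
lam_struct = bse_positive_eigs_structured(A, B)
lam_tda = np.linalg.eigvalsh(A)  # Tamm-Dancoff approximation: drop the coupling block B
print(lam_struct)
print(lam_tda - lam_struct)  # nonnegative if the TDA overestimates every eigenvalue
```

The symmetric route returns real eigenvalues with the +/- pairing implicit, whereas a general eigensolver applied to H can return small spurious imaginary parts; retaining such properties is what a structure-preserving method is designed to do.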
Numerical Analysis
Acknowledgements: This article will appear in the forthcoming Princeton Companion to Mathematics, edited by Timothy Gowers with June Barrow-Green, to be published by Princeton University Press.
In preparing this essay I have benefitted from the advice of many colleagues who corrected a number of errors of fact and emphasis. I have not always followed their advice, however, preferring, as one friend put it, to "put my head above the parapet". So I must take full responsibility for errors and omissions here.
With thanks to: Aurelio Arranz, Alexander Barnett, Carl de Boor, David Bindel, Jean-Marc Blanc, Mike Bochev, Folkmar Bornemann, Richard Brent, Martin Campbell-Kelly, Sam Clark, Tim Davis, Iain Duff, Stan Eisenstat, Don Estep, Janice Giudice, Gene Golub, Nick Gould, Tim Gowers, Anne Greenbaum, Leslie Greengard, Martin Gutknecht, Raphael Hauser, Des Higham, Nick Higham, Ilse Ipsen, Arieh Iserles, David Kincaid, Louis Komzsik, David Knezevic, Dirk Laurie, Randy LeVeque, Bill Morton, John C Nash, Michael Overton, Yoshio Oyanagi, Beresford Parlett, Linda Petzold, Bill Phillips, Mike Powell, Alex Prideaux, Siegfried Rump, Thomas Schmelzer, Thomas Sonar, Hans Stetter, Gil Strang, Endre Süli, Defeng Sun, Mike Sussman, Daniel Szyld, Garry Tee, Dmitry Vasilyev, Andy Wathen, Margaret Wright and Steve Wright.
A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units
We present a hierarchically blocked one-sided Jacobi algorithm for the
singular value decomposition (SVD), targeting both single and multiple graphics
processing units (GPUs). The blocking structure reflects the levels of the GPU's
memory hierarchy. The algorithm may outperform MAGMA's dgesvd, while retaining
high relative accuracy. To this end, we developed a family of parallel pivot
strategies on the GPU's shared address space that are also applicable to inter-GPU
communication. Unlike common hybrid approaches, our algorithm in a single GPU
setting needs a CPU for control purposes only, while utilizing the GPU's
resources to the fullest extent permitted by the hardware. When required by the
problem size, the algorithm, in principle, scales to an arbitrary number of GPU
nodes. The scalability is demonstrated by more than twofold speedup for
sufficiently large matrices on a Tesla S2050 system with four GPUs vs. a single
Fermi card. Comment: Accepted for publication in SIAM Journal on Scientific Computing.
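For reference, the underlying numerical kernel can be sketched serially: the classical (Hestenes) one-sided Jacobi SVD, which repeatedly orthogonalizes pairs of columns by plane rotations until all columns are mutually orthogonal. The sketch below is unblocked, runs on the CPU with NumPy, and uses the simplest cyclic row-by-row pivot ordering (an assumption); the hierarchical blocking and the parallel pivot strategies described above are not shown. It assumes a tall matrix with full column rank, and the function name is hypothetical.

```python
import numpy as np


def one_sided_jacobi_svd(G, tol=1e-14, max_sweeps=30):
    """Hestenes one-sided Jacobi SVD of an m x n matrix (m >= n): G = U diag(sigma) V^T."""
    U = np.array(G, dtype=float, copy=True)  # columns are rotated in place: U = G V
    m, n = U.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        rotated = False
        # cyclic row-by-row pivot strategy: visit every column pair (p, q), p < q
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue  # columns already numerically orthogonal
                rotated = True
                # rotation angle that makes the two updated columns orthogonal
                zeta = (beta - alpha) / (2.0 * gamma)
                t = (1.0 if zeta >= 0 else -1.0) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                # apply the same plane rotation to columns p, q of U and of V
                for X in (U, V):
                    xp = X[:, p].copy()
                    X[:, p] = c * xp - s * X[:, q]
                    X[:, q] = s * xp + c * X[:, q]
        if not rotated:
            break  # a full sweep without rotations: converged
    sigma = np.linalg.norm(U, axis=0)  # singular values are the final column norms
    order = np.argsort(sigma)[::-1]  # descending, as LAPACK returns them
    sigma, U, V = sigma[order], U[:, order], V[:, order]
    U = U / sigma  # normalize columns (assumes no zero singular values)
    return U, sigma, V


rng = np.random.default_rng(1)
G = rng.standard_normal((8, 5))
U, s, V = one_sided_jacobi_svd(G)
print(s)
```

Because each rotation touches only two columns, disjoint column pairs can be processed simultaneously; choosing which pairs to process together is the role of a parallel pivot strategy.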
A Parallel Solver for Graph Laplacians
Problems from graph drawing, spectral clustering, network flow and graph
partitioning can all be expressed in terms of graph Laplacian matrices. There
are a variety of practical approaches to solving these problems in serial.
However, as problem sizes increase and single-core speeds stagnate, parallelism
is essential to solve such problems quickly. We present an unsmoothed
aggregation multigrid method for solving graph Laplacians in a distributed
memory setting. We introduce new parallel aggregation and low degree
elimination algorithms targeted specifically at irregular degree graphs. These
algorithms are expressed in terms of sparse matrix-vector products using
generalized sum and product operations. This formulation is amenable to linear
algebra using arbitrary distributions and allows us to operate on a 2D sparse
matrix distribution, which is necessary for parallel scalability. Our solver
outperforms the natural parallel extension of the current state of the art in
an algorithmic comparison. We demonstrate scalability to 576 processes and
graphs with up to 1.7 billion edges. Comment: PASC '18. Code: https://github.com/ligmg/ligm
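The core idea of unsmoothed aggregation can be sketched serially with two levels, used as a preconditioner for conjugate gradients on a graph Laplacian L = D - W. The greedy aggregation rule, the weighted Jacobi smoother, the exact pseudo-inverse coarse solve and the grid test graph are illustrative assumptions, and all function names are hypothetical. The solver described above uses a full multilevel hierarchy, its own parallel aggregation and low-degree elimination, and a distributed 2D matrix layout, none of which is reproduced here. NumPy and SciPy are assumed to be available.

```python
import numpy as np
import scipy.sparse as sp


def grid_graph(k):
    """Adjacency W of a k x k grid graph (unit weights) and its Laplacian L = D - W."""
    path = sp.diags([np.ones(k - 1), np.ones(k - 1)], [-1, 1], shape=(k, k))
    I = sp.identity(k)
    W = (sp.kron(path, I) + sp.kron(I, path)).tocsr()
    deg = np.asarray(W.sum(axis=1)).ravel()
    return W, (sp.diags(deg) - W).tocsr()


def greedy_aggregates(W):
    """Each still-unaggregated vertex seeds an aggregate with its unaggregated neighbours."""
    n = W.shape[0]
    agg = np.full(n, -1)
    n_agg = 0
    for i in range(n):
        if agg[i] >= 0:
            continue
        nbrs = W.indices[W.indptr[i]:W.indptr[i + 1]]
        agg[[i] + [j for j in nbrs if agg[j] < 0]] = n_agg
        n_agg += 1
    return agg, n_agg


def make_two_level(L, W, omega=2.0 / 3.0):
    """Symmetric two-level cycle: Jacobi, coarse correction, Jacobi."""
    n = L.shape[0]
    agg, n_agg = greedy_aggregates(W)
    # unsmoothed (piecewise-constant) prolongator: P[i, agg[i]] = 1
    P = sp.csr_matrix((np.ones(n), (np.arange(n), agg)), shape=(n, n_agg))
    Lc_pinv = np.linalg.pinv((P.T @ L @ P).toarray())  # coarse Laplacian is singular too
    Dinv = 1.0 / L.diagonal()

    def apply(r):
        x = omega * Dinv * r  # pre-smoothing from a zero initial guess
        x += P @ (Lc_pinv @ (P.T @ (r - L @ x)))  # coarse-grid correction
        x += omega * Dinv * (r - L @ x)  # post-smoothing
        return x

    return apply


def pcg(L, b, apply_prec, rtol=1e-8, maxiter=500):
    """Preconditioned conjugate gradients; returns (x, iterations used)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, maxiter + 1):
        Lp = L @ p
        alpha = rz / (p @ Lp)
        x += alpha * p
        r -= alpha * Lp
        if np.linalg.norm(r) <= rtol * np.linalg.norm(b):
            return x, it
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter


W, L = grid_graph(32)
rng = np.random.default_rng(0)
b = rng.standard_normal(L.shape[0])
b -= b.mean()  # L is singular: the right-hand side must be orthogonal to constants
x, iters = pcg(L, b, make_two_level(L, W))
print(iters)
```

Because L annihilates constant vectors, the right-hand side is projected to zero mean before solving; the prolongator P simply copies each coarse value to every vertex of its aggregate, which is what "unsmoothed" refers to.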
Computing subdominant unstable modes of turbulent plasma with a parallel Jacobi-Davidson eigensolver
In the numerical solution of large-scale eigenvalue problems, Davidson-type methods are an increasingly popular alternative to Krylov eigensolvers. The main motivation is to avoid the expensive factorizations that are often needed by Krylov solvers when the problem is generalized or interior eigenvalues are desired. In Davidson-type methods, the factorization is replaced by iterative linear solvers that can be accelerated by a suitable preconditioner. Jacobi-Davidson is one of the most effective variants. However, parallel implementations of this method are not widely available, particularly for non-symmetric problems. We present a parallel implementation that has been included in SLEPc, the Scalable Library for Eigenvalue Problem Computations, and test it in the context of a highly scalable plasma turbulence simulation code. We analyze its parallel efficiency and compare it with a Krylov-Schur eigensolver. © 2011 John Wiley and Sons, Ltd.
The authors are indebted to Florian Merz for providing us with the test cases and for his useful suggestions. The authors acknowledge the computer resources provided by the Barcelona Supercomputing Center (BSC). This work was supported by the Spanish Ministerio de Ciencia e Innovación under project TIN2009-07519.
Romero Alcalde, E.; Román Moltó, J. E. (2011). Computing subdominant unstable modes of turbulent plasma with a parallel Jacobi-Davidson eigensolver. Concurrency and Computation: Practice and Experience, 23, 2179-2191. https://doi.org/10.1002/cpe.1740
Fast and accurate con-eigenvalue algorithm for optimal rational approximations
The need to compute small con-eigenvalues and the associated con-eigenvectors
of positive-definite Cauchy matrices naturally arises when constructing
rational approximations with a (near) optimally small $L^{\infty}$ error.
Specifically, given a rational function with $n$ poles in the unit disk, a
rational approximation with $m \ll n$ poles in the unit disk may be obtained
from the $m$th con-eigenvector of an $n \times n$ Cauchy matrix, where the
associated con-eigenvalue $\lambda_m > 0$ gives the approximation error in the
$L^{\infty}$ norm. Unfortunately, standard algorithms do not accurately compute
small con-eigenvalues (and the associated con-eigenvectors) and, in particular,
yield few or no correct digits for con-eigenvalues smaller than the machine
roundoff. We develop a fast and accurate algorithm for computing
con-eigenvalues and con-eigenvectors of positive-definite Cauchy matrices,
yielding even the tiniest con-eigenvalues with high relative accuracy. The
algorithm computes the $m$th con-eigenvalue in $\mathcal{O}(m^{2} n)$ operations
and, since the con-eigenvalues of positive-definite Cauchy matrices decay
exponentially fast, we obtain (near) optimal rational approximations in
$\mathcal{O}(n (\log \delta^{-1})^{2})$ operations, where $\delta$ is the
approximation error in the $L^{\infty}$ norm. We derive error bounds
demonstrating high relative accuracy of the computed con-eigenvalues and the
high accuracy of the unit con-eigenvectors. We also provide examples of using
the algorithm to compute (near) optimal rational approximations of functions
with singularities and sharp transitions, where approximation errors close to
machine precision are obtained. Finally, we present numerical tests on random
(complex-valued) Cauchy matrices to show that the algorithm computes all the
con-eigenvalues and con-eigenvectors with nearly full precision.
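The accuracy problem that motivates this work can be observed with a standard dense eigensolver under simplifying assumptions: the points gamma_i are real and lie in (0, 1), so C[i, j] = 1 / (1 - gamma_i gamma_j) is a real symmetric positive-definite Cauchy-type matrix. For such a C, conjugating C conj(x) = lambda x and applying C once more gives C^2 x = |lambda|^2 x, so its nonnegative con-eigenvalues are exactly its eigenvalues. The closed-form Cauchy determinant, evaluated in log space, then gives an accurate reference for the product of all eigenvalues, which the computed eigenvalues fail to reproduce once the smallest ones fall below roundoff. The point set and function names are hypothetical, and this only illustrates the difficulty; it is not the algorithm proposed in the paper.

```python
import numpy as np


def pick_cauchy_matrix(gamma):
    """C[i, j] = 1 / (1 - gamma_i * gamma_j) for real |gamma_i| < 1 (symmetric positive definite)."""
    return 1.0 / (1.0 - np.outer(gamma, gamma))


def log_det_closed_form(gamma):
    """log det C from the Cauchy determinant formula
    det C = prod_{i<j} (gamma_i - gamma_j)^2 / prod_{i,j} (1 - gamma_i * gamma_j)."""
    i, j = np.triu_indices(len(gamma), k=1)
    return (2.0 * np.sum(np.log(np.abs(gamma[i] - gamma[j])))
            - np.sum(np.log(1.0 - np.outer(gamma, gamma))))


def check_standard_eigensolver(n=40):
    """Compare LAPACK eigenvalues of C against the exact log-determinant."""
    gamma = np.linspace(0.1, 0.9, n)  # hypothetical real points in the unit disk
    C = pick_cauchy_matrix(gamma)
    lam = np.linalg.eigvalsh(C)  # standard dense symmetric eigensolver
    n_negative = int(np.sum(lam < 0))  # impossible in exact arithmetic for a positive-definite C
    computed = np.sum(np.log(np.abs(lam)))  # log of the product of the computed eigenvalues
    return n_negative, computed, log_det_closed_form(gamma)


n_negative, computed_logdet, exact_logdet = check_standard_eigensolver(40)
print(n_negative, computed_logdet, exact_logdet)
```

A negative computed eigenvalue of a positive-definite matrix, or a large gap between the two log-determinants, indicates that the small computed eigenvalues are far from the true ones, even though the large ones are accurate.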
- …