Solving Lattice QCD systems of equations using mixed precision solvers on GPUs
Modern graphics hardware is designed for highly parallel numerical tasks and
promises significant cost and performance benefits for many scientific
applications. One such application is lattice quantum chromodynamics (lattice
QCD), where the main computational challenge is to efficiently solve the
discretized Dirac equation in the presence of an SU(3) gauge field. Using
NVIDIA's CUDA platform we have implemented a Wilson-Dirac sparse matrix-vector
product that performs at up to 40 Gflops, 135 Gflops and 212 Gflops for double,
single and half precision respectively on NVIDIA's GeForce GTX 280 GPU. We have
developed a new mixed precision approach for Krylov solvers using reliable
updates which allows for full double precision accuracy while using only single
or half precision arithmetic for the bulk of the computation. The resulting
BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations
until convergence, perform better than the usual defect-correction approach for
mixed precision.
Comment: 30 pages, 7 figures
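For orientation, the "usual defect-correction approach" that the abstract compares against can be sketched as follows. This is a minimal illustration and not the reliable-updates scheme the paper develops. The helper names (`to_single`, `cg_single`, `defect_correction`), the dense list-of-lists matrix, and the tolerances are all assumptions made for the example, and single precision is only emulated by rounding through `struct`.

```python
import struct


def to_single(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matvec(A, v):
    """Dense matrix-vector product in double precision."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_single(A, b, tol, max_iter):
    """Conjugate gradient for a symmetric positive definite A.

    Every stored vector is rounded to single precision to emulate
    a low-precision inner solver (an assumption of this sketch).
    """
    x = [0.0] * len(b)
    r = [to_single(bi) for bi in b]
    p = r[:]
    rr = dot(r, r)
    target = tol * tol * rr
    for _ in range(max_iter):
        if rr <= target:
            break
        Ap = [to_single(y) for y in matvec(A, p)]
        alpha = to_single(rr / dot(p, Ap))
        x = [to_single(xi + alpha * pi) for xi, pi in zip(x, p)]
        r = [to_single(ri - alpha * api) for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [to_single(ri + beta * pi) for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def defect_correction(A, b, tol=1e-12, inner_tol=1e-4, max_outer=50):
    """Outer loop in double precision, inner solves in emulated single precision.

    Returns the solution and the number of outer corrections performed.
    """
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    for outer in range(max_outer):
        # True residual, computed in double precision.
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, outer
        # Solve A e = r approximately in low precision, then correct x.
        e = cg_single(A, r, inner_tol, 10 * len(b))
        x = [xi + ei for xi, ei in zip(x, e)]
    return x, max_outer
```

Calling `defect_correction(A, b)` on a small symmetric positive definite matrix restarts the inner solver from scratch at every outer step. That restart is the cost the abstract says its reliable-updates approach improves on in terms of iterations until convergence.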
Preconditioning of Improved and ``Perfect'' Fermion Actions
We construct a locally-lexicographic SSOR preconditioner to accelerate the
parallel iterative solution of linear systems of equations for two improved
discretizations of lattice fermions: (i) the Sheikholeslami-Wohlert scheme, where a
non-constant block-diagonal term is added to the Wilson fermion matrix, and
(ii) renormalization group improved actions, which incorporate couplings beyond
nearest neighbors of the lattice fermion fields. In case (i) we find the block
ll-SSOR scheme to be more effective by a factor of about 2 than odd-even
preconditioned solvers in terms of convergence rates, at beta=6.0. For type
(ii) actions, we show that our preconditioner accelerates the iterative
solution of a linear system of hypercube fermions by a factor of 3 to 4.
Comment: 27 pages, LaTeX, 17 figures included
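As a rough illustration of what applying an SSOR preconditioner involves, the sketch below performs one forward and one backward sweep on a small dense matrix. It uses plain lexicographic ordering, not the locally-lexicographic ordering or the block variant described in the abstract. The function name `ssor_apply`, the dense list-of-lists storage, and the relaxation parameter `omega` are assumptions made for the example.

```python
def ssor_apply(A, r, omega=1.0):
    """Return z = M^{-1} r for the SSOR preconditioner

        M = 1 / (omega * (2 - omega)) * (D + omega*L) D^{-1} (D + omega*U),

    where A = L + D + U is split into strictly lower, diagonal and
    strictly upper parts. A is a dense list of rows (assumption).
    """
    n = len(r)
    scale = omega * (2.0 - omega)

    # Forward sweep: solve (D + omega*L) y = omega*(2 - omega) * r.
    y = [0.0] * n
    for i in range(n):
        s = sum(A[i][j] * y[j] for j in range(i))
        y[i] = (scale * r[i] - omega * s) / A[i][i]

    # Backward sweep: solve (D + omega*U) z = D y.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (A[i][i] * y[i] - omega * s) / A[i][i]
    return z
```

Both sweeps are sequential in the row index, which is why the ordering of lattice sites matters for a parallel implementation of the kind the abstract describes.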
Algorithms in Lattice QCD
The enormous computing resources that large-scale simulations in Lattice QCD
require will continue to test the limits of even the largest supercomputers into
the foreseeable future. The efficiency of such simulations will therefore concern
practitioners of lattice QCD for some time to come.
I begin with an introduction to those aspects of lattice QCD essential to the
remainder of the thesis, and follow with a description of the Wilson fermion
matrix M, an object which is central to my theme.
The principal bottleneck in Lattice QCD simulations is the solution of linear
systems involving M, and this topic is treated in depth. I compare some of the
more popular iterative methods, including Minimal Residual, Conjugate Gradient
on the Normal Equation, Bi-Conjugate Gradient, QMR, BiCGSTAB and
BiCGSTAB2, and then turn to a study of block algorithms, a special class of iterative
solvers for systems with multiple right-hand sides. Included in this study
are two block algorithms which had not previously been applied to lattice QCD.
The next chapters are concerned with a generalised Hybrid Monte Carlo algorithm
(GHMC) for QCD simulations involving dynamical quarks. I focus squarely
on the efficient and robust implementation of GHMC, and describe some tricks
to improve its performance. A limited set of results from HMC simulations at
various parameter values is presented.
A treatment of the non-hermitian Lanczos method and its application to the
eigenvalue problem for M rounds off the theme of large-scale matrix computations
- …