174 research outputs found
Solving Lattice QCD systems of equations using mixed precision solvers on GPUs
Modern graphics hardware is designed for highly parallel numerical tasks and
promises significant cost and performance benefits for many scientific
applications. One such application is lattice quantum chromodyamics (lattice
QCD), where the main computational challenge is to efficiently solve the
discretized Dirac equation in the presence of an SU(3) gauge field. Using
NVIDIA's CUDA platform we have implemented a Wilson-Dirac sparse matrix-vector
product that performs at up to 40 Gflops, 135 Gflops and 212 Gflops for double,
single and half precision respectively on NVIDIA's GeForce GTX 280 GPU. We have
developed a new mixed precision approach for Krylov solvers using reliable
updates which allows for full double precision accuracy while using only single
or half precision arithmetic for the bulk of the computation. The resulting
BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations
until convergence, perform better than the usual defect-correction approach for
mixed precision.Comment: 30 pages, 7 figure
Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond
In this and a set of companion whitepapers, the USQCD Collaboration lays out
a program of science and computing for lattice gauge theory. These whitepapers
describe how calculation using lattice QCD (and other gauge theories) can aid
the interpretation of ongoing and upcoming experiments in particle and nuclear
physics, as well as inspire new ones.Comment: 44 pages. 1 of USQCD whitepapers
Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics
Graphics Processing Units (GPUs) are having a transformational effect on
numerical lattice quantum chromodynamics (LQCD) calculations of importance in
nuclear and particle physics. The QUDA library provides a package of mixed
precision sparse matrix linear solvers for LQCD applications, supporting single
GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This
library, interfaced to the QDP++/Chroma framework for LQCD calculations, is
currently in production use on the "9g" cluster at the Jefferson Laboratory,
enabling unprecedented price/performance for a range of problems in LQCD.
Nevertheless, memory constraints on current GPU devices limit the problem sizes
that can be tackled. In this contribution we describe the parallelization of
the QUDA library onto multiple GPUs using MPI, including strategies for the
overlapping of communication and computation. We report on both weak and strong
scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in
excess of 4 Tflops.Comment: 11 pages, 7 figures, to appear in the Proceedings of Supercomputing
2010 (submitted April 12, 2010
QCD simulations with staggered fermions on GPUs
We report on our implementation of the RHMC algorithm for the simulation of
lattice QCD with two staggered flavors on Graphics Processing Units, using the
NVIDIA CUDA programming language. The main feature of our code is that the GPU
is not used just as an accelerator, but instead the whole Molecular Dynamics
trajectory is performed on it. After pointing out the main bottlenecks and how
to circumvent them, we discuss the obtained performances. We present some
preliminary results regarding OpenCL and multiGPU extensions of our code and
discuss future perspectives.Comment: 22 pages, 14 eps figures, final version to be published in Computer
Physics Communication
Multi-mass solvers for lattice QCD on GPUs
Graphical Processing Units (GPUs) are more and more frequently used for
lattice QCD calculations. Lattice studies often require computing the quark
propagators for several masses. These systems can be solved using multi-shift
inverters but these algorithms are memory intensive which limits the size of
the problem that can be solved using GPUs. In this paper, we show how to
efficiently use a memory-lean single-mass inverter to solve multi-mass
problems. We focus on the BiCGstab algorithm for Wilson fermions and show that
the single-mass inverter not only requires less memory but also outperforms the
multi-shift variant by a factor of two.Comment: 27 pages, 6 figures, 3 Table
Simulating the weak death of the neutron in a femtoscale universe with near-Exascale computing
The fundamental particle theory called Quantum Chromodynamics (QCD) dictates
everything about protons and neutrons, from their intrinsic properties to
interactions that bind them into atomic nuclei. Quantities that cannot be fully
resolved through experiment, such as the neutron lifetime (whose precise value
is important for the existence of light-atomic elements that make the sun shine
and life possible), may be understood through numerical solutions to QCD. We
directly solve QCD using Lattice Gauge Theory and calculate nuclear observables
such as neutron lifetime. We have developed an improved algorithm that
exponentially decreases the time-to solution and applied it on the new CORAL
supercomputers, Sierra and Summit. We use run-time autotuning to distribute GPU
resources, achieving 20% performance at low node count. We also developed
optimal application mapping through a job manager, which allows CPU and GPU
jobs to be interleaved, yielding 15% of peak performance when deployed across
large fractions of CORAL.Comment: 2018 Gordon Bell Finalist: 9 pages, 9 figures; v2: fixed 2 typos and
appended acknowledgement
- …