Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics
Graphics Processing Units (GPUs) are having a transformational effect on
numerical lattice quantum chromodynamics (LQCD) calculations of importance in
nuclear and particle physics. The QUDA library provides a package of mixed
precision sparse matrix linear solvers for LQCD applications, supporting single
GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This
library, interfaced to the QDP++/Chroma framework for LQCD calculations, is
currently in production use on the "9g" cluster at the Jefferson Laboratory,
enabling unprecedented price/performance for a range of problems in LQCD.
Nevertheless, memory constraints on current GPU devices limit the problem sizes
that can be tackled. In this contribution we describe the parallelization of
the QUDA library onto multiple GPUs using MPI, including strategies for the
overlapping of communication and computation. We report on both weak and strong
scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in
excess of 4 Tflops.
Comment: 11 pages, 7 figures, to appear in the Proceedings of Supercomputing
2010 (submitted April 12, 2010)
Gauge Field Generation on Large-Scale GPU-Enabled Systems
Over the past years GPUs have been successfully applied to the task of
inverting the fermion matrix in lattice QCD calculations. Even strong scaling
to capability-level supercomputers, corresponding to O(100) GPUs or more, has
been achieved. However, strong scaling a whole gauge field generation algorithm
to this regime requires significantly more functionality than just having the
matrix inverter utilizing the GPUs and has not yet been accomplished. This
contribution extends QDP-JIT, the migration of SciDAC QDP++ to GPU-enabled
parallel systems, to help strong scale the whole Hybrid Monte Carlo algorithm
to this regime. Initial results are shown for gauge field generation with Chroma
simulating pure Wilson fermions on OLCF TitanDev.
Comment: The 30th International Symposium on Lattice Field Theory, June 24-29,
2012, Cairns, Australia (Acknowledgment and Citation added)
Extending the QUDA library for Domain Wall and Twisted Mass fermions
We extend the QUDA library, an open source library for performing calculations in lattice QCD on Graphics
Processing Units (GPUs) using NVIDIA's CUDA platform, to include kernels for non-degenerate twisted mass and
multi-GPU Domain Wall fermion operators. Performance analysis is provided for both cases.
Multi-mass solvers for lattice QCD on GPUs
Graphics Processing Units (GPUs) are increasingly used for
lattice QCD calculations. Lattice studies often require computing the quark
propagators for several masses. These systems can be solved using multi-shift
inverters, but these algorithms are memory intensive, which limits the size of
the problem that can be solved using GPUs. In this paper, we show how to
efficiently use a memory-lean single-mass inverter to solve multi-mass
problems. We focus on the BiCGstab algorithm for Wilson fermions and show that
the single-mass inverter not only requires less memory but also outperforms the
multi-shift variant by a factor of two.
Comment: 27 pages, 6 figures, 3 tables
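
The idea in the abstract above, solving a family of shifted systems (A + sigma_i) x_i = b one shift at a time with a single-shift BiCGstab, can be illustrated with a minimal sketch. This is not the authors' implementation: the operator is a toy non-symmetric tridiagonal matrix standing in for the Wilson operator, and the names apply_shifted, bicgstab, shifts, and solutions are hypothetical. The point it shows is that the solver's work vectors are reused for every shift, so the working memory does not grow with the number of masses.

```python
import math

def apply_shifted(v, shift):
    """Return (A + shift*I) v for a toy non-symmetric tridiagonal A.
    A is a hypothetical stand-in for a lattice fermion operator."""
    n = len(v)
    out = []
    for i in range(n):
        acc = (2.0 + shift) * v[i]
        if i > 0:
            acc -= 1.2 * v[i - 1]
        if i < n - 1:
            acc -= 0.8 * v[i + 1]
        out.append(acc)
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bicgstab(b, shift, tol=1e-10, max_iter=200):
    """Unpreconditioned BiCGstab for (A + shift*I) x = b, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual for the zero initial guess
    r_hat = list(b)      # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    target = tol * math.sqrt(dot(b, b))
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = apply_shifted(p, shift)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) < target:
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            break
        t = apply_shifted(s, shift)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) < target:
            break
        rho = rho_new
    return x

# One right-hand side, several shifts (illustrative values playing the role of masses).
b = [1.0] * 16
shifts = [0.1, 0.5, 1.0]

# Solve the shifts one after another; only one set of work vectors is live at a time.
solutions = {shift: bicgstab(b, shift) for shift in shifts}
```

A multi-shift solver would instead keep extra solution and search vectors for every shift at once, which is the memory cost the abstract refers to; the loop above trades that for repeated solves.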
Lattice QCD based on OpenCL
We present an OpenCL-based Lattice QCD application using a heatbath algorithm
for the pure gauge case and Wilson fermions in the twisted mass formulation.
The implementation is platform independent and can be used on AMD or NVIDIA
GPUs, as well as on classical CPUs. On the AMD Radeon HD 5870 our double
precision dslash implementation performs at 60 GFLOPS over a wide range of
lattice sizes. The hybrid Monte-Carlo presented reaches a speedup of four over
the reference code running on a server CPU.
Comment: 19 pages, 11 figures