SU(2) Lattice Gauge Theory Simulations on Fermi GPUs
In this work we explore the performance of CUDA in quenched lattice SU(2)
simulations. CUDA (NVIDIA's Compute Unified Device Architecture) is a hardware
and software architecture developed by NVIDIA for computing on the GPU. We
present an analysis and performance comparison between the GPU and CPU in
single and double precision. Analyses with multiple GPUs and two different
architectures (G200 and Fermi) are also presented. To obtain high
performance, the code must be optimized for the GPU architecture, i.e., the
implementation must exploit the memory hierarchy of the CUDA programming model.
We have developed codes for the Monte Carlo generation of SU(2) lattice gauge
configurations, for the mean plaquette, for the Polyakov loop at finite T and
for the Wilson loop. We also present results for the potential using many
configurations () without smearing and almost configurations
with APE smearing. With two Fermi GPUs we have achieved an excellent
performance of the speed over one CPU, in single precision, around
110 GFLOPS. We also find that, using the Fermi architecture, double precision
computations for the static quark-antiquark potential are not much slower (less
than slower) than single precision computations.
Comment: 20 pages, 11 figures, 3 tables, accepted in Journal of Computational
Physics
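The mean plaquette is the simplest of the observables listed above. The following is a minimal pure-Python sketch, not the authors' CUDA code: it assumes SU(2) links stored as unit 4-vectors (a0, a1, a2, a3), representing U = a0·1 + i(a·σ), on a periodic L^4 lattice held in a dictionary. All function names are hypothetical. It checks two properties any correct implementation must satisfy: the plaquette equals 1 on a cold (all-identity) configuration, and it is unchanged by a random gauge transformation.

```python
import itertools
import random

# Hypothetical representation: U = a0*I + i*(a1*s1 + a2*s2 + a3*s3), |a| = 1.
IDENTITY = (1.0, 0.0, 0.0, 0.0)

def su2_mul(a, b):
    """SU(2) product: c0 = a0*b0 - a.b, c_vec = a0*b_vec + b0*a_vec - a_vec x b_vec."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + b0 * a1 - (a2 * b3 - a3 * b2),
            a0 * b2 + b0 * a2 - (a3 * b1 - a1 * b3),
            a0 * b3 + b0 * a3 - (a1 * b2 - a2 * b1))

def su2_dag(a):
    """Hermitian conjugate (= inverse) of an SU(2) element."""
    return (a[0], -a[1], -a[2], -a[3])

def random_su2(rng):
    """Haar-random SU(2) element: a normalised 4D Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = sum(x * x for x in v) ** 0.5
    return tuple(x / n for x in v)

def shift(site, mu, L):
    """Neighbour site x + mu_hat with periodic boundary conditions."""
    s = list(site)
    s[mu] = (s[mu] + 1) % L
    return tuple(s)

def cold_lattice(L):
    """links[(site, mu)] = U_mu(site), all set to the identity."""
    return {(site, mu): IDENTITY
            for site in itertools.product(range(L), repeat=4) for mu in range(4)}

def gauge_transform(links, L, rng):
    """U_mu(x) -> g(x) U_mu(x) g^dagger(x + mu_hat) with random g(x)."""
    g = {site: random_su2(rng) for site in itertools.product(range(L), repeat=4)}
    return {(site, mu): su2_mul(su2_mul(g[site], u), su2_dag(g[shift(site, mu, L)]))
            for (site, mu), u in links.items()}

def mean_plaquette(links, L):
    """Average of (1/2) Tr U_mu(x) U_nu(x+mu) U_mu^dag(x+nu) U_nu^dag(x) over x, mu < nu."""
    total, count = 0.0, 0
    for site in itertools.product(range(L), repeat=4):
        for mu in range(4):
            for nu in range(mu + 1, 4):
                u = su2_mul(links[(site, mu)], links[(shift(site, mu, L), nu)])
                u = su2_mul(u, su2_dag(links[(shift(site, nu, L), mu)]))
                u = su2_mul(u, su2_dag(links[(site, nu)]))
                total += u[0]  # (1/2) Tr U = a0 in this parametrisation
                count += 1
    return total / count
```

For example, `mean_plaquette(cold_lattice(2), 2)` returns 1.0, and applying `gauge_transform` first leaves the result at 1.0 up to rounding. The 4-vector form stores 4 real numbers per link instead of a full 2×2 complex matrix, which is one reason it is attractive for memory-bound GPU code; whether the paper uses exactly this layout is not stated here.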
Landau Gauge Fixing on GPUs
In this paper we present and explore the performance of Landau gauge fixing
on GPUs using CUDA. We consider the steepest descent algorithm with Fourier
acceleration, and compare the GPU performance with a parallel CPU
implementation. Using lattice volumes, we find that the computational
power of a single Tesla C2070 GPU is equivalent to approximately 256 CPU cores.
Comment: 10 pages, 3 figures and 3 tables
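For reference, the Fourier-accelerated steepest descent step is commonly written as below (lattice spacing a = 1, N_c colours, V sites). This is a sketch of the usual textbook form, not taken from the paper: the normalisation of F, the sign convention of Δ, the step parameter α and the stopping threshold on θ are assumptions and vary between implementations. Here \hat F denotes a 4D fast Fourier transform and k_μ the integer momentum index.

```latex
% Gauge-fixing functional; Landau gauge corresponds to its maximum
F_U[g] = \frac{1}{4 V N_c} \sum_{x,\mu} \mathrm{Re}\,\mathrm{Tr}\left[ g(x)\, U_\mu(x)\, g^\dagger(x+\hat\mu) \right]

% Lattice divergence, projected onto the traceless anti-Hermitian part
\Delta(x) = \sum_{\nu} \left[ U_\nu(x-\hat\nu) - U_\nu(x) \right] - \text{h.c.} - \text{trace}

% Fourier-accelerated update, followed by U_\mu(x) \to g(x)\, U_\mu(x)\, g^\dagger(x+\hat\mu)
g(x) = \exp\left[ \frac{\alpha}{2}\, \hat F^{-1}\, \frac{p^2_{\max}}{p^2}\, \hat F\, \Delta(x) \right],
\qquad p^2 = \sum_\mu 4 \sin^2\!\left( \frac{\pi k_\mu}{L_\mu} \right)

% Convergence measure, iterated until it falls below a chosen threshold
\theta = \frac{1}{V N_c} \sum_x \mathrm{Tr}\left[ \Delta(x)\, \Delta^\dagger(x) \right]
```

The FFTs dominate the cost of each iteration, which is why the GPU/CPU comparison depends strongly on how well the Fourier transforms map onto the device.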
QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems
The multi-GPU open-source package QCDGPU has been developed for lattice Monte
Carlo simulations of pure SU(N) gluodynamics in an external magnetic field at
finite temperature and of the O(N) model. The code is implemented in OpenCL,
has been tested on AMD and NVIDIA GPUs and on AMD and Intel CPUs, and may run
on other OpenCL-compatible devices. The package has minimal external library
dependencies and is OS platform-independent. It is optimized for heterogeneous
computing: the lattice can be divided into non-equivalent parts to hide the
difference in performance between the devices used. QCDGPU has a client-server
part for distributed simulations. The package is designed both to produce
lattice gauge configurations and to analyze previously generated ones. QCDGPU
may be executed in fault-tolerant mode. The core of the Monte Carlo procedure
is based on the PRNGCL library for pseudo-random number generation on
OpenCL-compatible devices, which contains several of the most popular
pseudo-random number generators.
Comment: Presented at the Third International Conference "High Performance
Computing" (HPC-UA 2013), Kyiv, Ukraine; 9 pages, 2 figures
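Dividing the lattice into non-equivalent parts can be illustrated with a simple proportional split. This is a hypothetical sketch, not QCDGPU's actual scheme: it assumes the lattice is cut along one axis into slices and that each device's relative speed has already been measured (e.g. by a short benchmark). `partition_slices` and `slice_ranges` are illustrative names.

```python
def partition_slices(n_slices, speeds):
    """Split n_slices lattice slices among devices in proportion to their
    relative speeds, using largest-remainder rounding so the counts sum
    exactly to n_slices."""
    total = sum(speeds)
    ideal = [n_slices * s / total for s in speeds]
    counts = [int(x) for x in ideal]  # floor of each ideal share
    leftover = n_slices - sum(counts)  # always smaller than len(speeds)
    # Give the remaining slices to the devices with the largest fractional parts.
    order = sorted(range(len(speeds)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

def slice_ranges(counts):
    """Turn per-device slice counts into contiguous [start, stop) ranges."""
    ranges, start = [], 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return ranges
```

For example, `partition_slices(32, [3.0, 1.0])` returns `[24, 8]`: a device measured to be three times faster receives three times as many slices, so both finish a sweep at roughly the same time. A real implementation would also have to account for the cost of exchanging boundary slices between devices.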
Gauge Field Generation on Large-Scale GPU-Enabled Systems
Over the past years GPUs have been successfully applied to the task of
inverting the fermion matrix in lattice QCD calculations. Even strong scaling
to capability-level supercomputers, corresponding to O(100) GPUs or more, has
been achieved. However, strong scaling a whole gauge field generation algorithm
to this regime requires significantly more functionality than just having the
matrix inverter use the GPUs, and has not yet been accomplished. This
contribution extends QDP-JIT, the migration of SciDAC QDP++ to GPU-enabled
parallel systems, to help strong-scale the whole Hybrid Monte Carlo to this
regime. Initial results are shown for gauge field generation with Chroma
simulating pure Wilson fermions on OLCF TitanDev.
Comment: The 30th International Symposium on Lattice Field Theory, June 24-29,
2012, Cairns, Australia (Acknowledgment and Citation added)
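The Hybrid Monte Carlo algorithm that has to be strong-scaled here combines a momentum refresh, a reversible molecular-dynamics trajectory and a Metropolis test. The sketch below shows only that algorithmic skeleton on a toy Gaussian action S(q) = Σq²/2, which stands in (as an assumption) for the gauge and fermion action. It is not Chroma or QDP-JIT code, and every name in it is hypothetical.

```python
import math
import random

def leapfrog(q, p, grad_S, eps, n_steps):
    """Leapfrog integration of Hamilton's equations for H = p^2/2 + S(q)."""
    q, p = list(q), list(p)
    g = grad_S(q)
    p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g)]  # initial half kick
    for step in range(n_steps):
        q = [qi + eps * pi for qi, pi in zip(q, p)]  # drift
        g = grad_S(q)
        if step != n_steps - 1:
            p = [pi - eps * gi for pi, gi in zip(p, g)]  # full kick
    p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g)]  # final half kick
    return q, p

def hmc_step(q, S, grad_S, eps, n_steps, rng):
    """One HMC update; returns (new_q, accepted)."""
    p = [rng.gauss(0.0, 1.0) for _ in q]  # Gaussian momentum refresh
    h_old = S(q) + 0.5 * sum(x * x for x in p)
    q_new, p_new = leapfrog(q, p, grad_S, eps, n_steps)
    h_new = S(q_new) + 0.5 * sum(x * x for x in p_new)
    # Metropolis accept/reject removes the bias from the integration error dH.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new, True
    return q, False

# Toy action standing in for the lattice action (assumption for illustration).
def S(q):
    return 0.5 * sum(x * x for x in q)

def grad_S(q):
    return list(q)
```

In lattice QCD the expensive part of `grad_S` is the fermion force, which requires matrix inversions. The remaining steps (momentum refresh, link updates, energy sums) are the "more functionality" that must also run on the GPUs for the whole algorithm to strong-scale.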
Landau Gauge Fixing on GPUs and String Tension
We explore the performance of CUDA for Landau gauge fixing in lattice QCD,
using the steepest descent method with Fourier acceleration. The code
performance was tested on a Tesla C2070 (Fermi architecture). We also present
a study of the string tension at finite temperature in the confined phase. The
string tension is extracted from the color-averaged free energy and from the
color-singlet free energy using Landau gauge fixing.
Comment: 7 pages, 4 figures, 1 table. Contribution to the International
Meeting "Excited QCD", Peniche, Portugal, 06 - 12 May 201
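Extracting a string tension usually means fitting the potential (or free energy) as a function of separation r. The sketch below assumes a Cornell-type ansatz V(r) = A − B/r + σr, which is linear in (A, B, σ) and can therefore be fitted by ordinary least squares. The paper's actual finite-temperature fit form, error weighting and fit range are not given here and may differ. `fit_cornell` is a hypothetical name.

```python
def fit_cornell(rs, vs):
    """Unweighted least-squares fit of V(r) = A - B/r + sigma*r.
    Returns (A, B, sigma); sigma is in lattice units if r and V are."""
    rows = [(1.0, -1.0 / r, r) for r in rs]  # design-matrix rows
    # Normal equations (X^T X) c = X^T v
    m = [[sum(x[i] * x[j] for x in rows) for j in range(3)] for i in range(3)]
    b = [sum(x[i] * v for x, v in zip(rows, vs)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, 3):
            f = m[row][col] / m[col][col]
            for c in range(col, 3):
                m[row][c] -= f * m[col][c]
            b[row] -= f * b[col]
    # Back substitution
    coef = [0.0] * 3
    for i in (2, 1, 0):
        coef[i] = (b[i] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))) / m[i][i]
    return tuple(coef)
```

On noise-free synthetic data generated from the ansatz, the fit returns the input parameters. With real lattice data one would weight each point by its statistical error and estimate the uncertainty on σ, for example by jackknife.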
Lattice QCD based on OpenCL
We present an OpenCL-based lattice QCD application using a heatbath algorithm
for the pure gauge case and Wilson fermions in the twisted mass formulation.
The implementation is platform independent and can be used on AMD or NVIDIA
GPUs, as well as on classical CPUs. On the AMD Radeon HD 5870 our double
precision dslash implementation performs at 60 GFLOPS over a wide range of
lattice sizes. The hybrid Monte Carlo presented here reaches a speedup of four
over the reference code running on a server CPU.
Comment: 19 pages, 11 figures
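The abstract does not say which heatbath variant is used. A common choice for SU(3) is to update SU(2) subgroups in turn, each with the Kennedy-Pendleton SU(2) heatbath. The sketch below shows only that SU(2) step, under the assumption that the local weight is exp(βk·a0), where k = sqrt(det V) of the staple sum V. It samples a0 from P(a0) ∝ sqrt(1 − a0²)·exp(βk·a0) and completes it to a random SU(2) element. The function names are hypothetical.

```python
import math
import random

def kp_sample_a0(beta_k, rng):
    """Kennedy-Pendleton draw of a0 from P(a0) ~ sqrt(1 - a0^2) exp(beta_k a0)."""
    while True:
        r1 = 1.0 - rng.random()  # in (0, 1], safe for log
        r2 = rng.random()
        r3 = 1.0 - rng.random()
        # lambda^2 = (1 - a0)/2 drawn from ~ lambda^2 exp(-2 beta_k lambda^2) (Gamma(3/2))
        lam2 = -(math.log(r1) + math.cos(2.0 * math.pi * r2) ** 2 * math.log(r3)) / (2.0 * beta_k)
        r4 = rng.random()
        if r4 * r4 <= 1.0 - lam2:  # accept with probability sqrt(1 - lambda^2)
            return 1.0 - 2.0 * lam2

def kp_sample_su2(beta_k, rng):
    """SU(2) element (a0, a1, a2, a3) with a0 from kp_sample_a0 and the
    vector part pointing in a uniformly random direction."""
    a0 = kp_sample_a0(beta_k, rng)
    norm = math.sqrt(max(0.0, 1.0 - a0 * a0))
    cos_t = 2.0 * rng.random() - 1.0
    sin_t = math.sqrt(1.0 - cos_t * cos_t)
    phi = 2.0 * math.pi * rng.random()
    return (a0, norm * sin_t * math.cos(phi), norm * sin_t * math.sin(phi), norm * cos_t)
```

In a full update the sampled element X would be multiplied by the inverse of the normalised staple to give the new link. That step depends on the staple and link storage conventions and is left out here. For βk = 10, the sample mean of a0 should approach I_2(10)/I_1(10) ≈ 0.854.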
Landau gauge fixing on the lattice using GPU's
In this work, we consider the GPU implementation of the steepest descent
method with Fourier acceleration for Landau gauge fixing, using CUDA. The
performance of the code on a Tesla C2070 GPU is compared with a parallel CPU
implementation.
Comment: 3 pages, 1 figure, Proceedings of the Xth Quark Confinement and the
Hadron Spectrum, 8-12 October 2012, TUM Campus Garching, Munich, Germany
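The Fourier acceleration step rescales each momentum mode by p²_max/p². Plain steepest descent shrinks a mode of lattice momentum p by roughly a factor (1 − α·p²) per iteration, so long-wavelength modes converge slowly, and the rescaling equalises their rates. The sketch below is a simplified illustration that makes several assumptions: a single real periodic field in one dimension instead of the 4D colour-matrix field, and a naive O(L²) DFT where real codes use FFT libraries. The zero mode, which has p² = 0, is simply dropped. The function names are hypothetical.

```python
import cmath
import math

def lattice_p2(k, L):
    """Lattice momentum squared 4 sin^2(pi k / L) for integer mode k (a = 1)."""
    return 4.0 * math.sin(math.pi * k / L) ** 2

def fourier_accelerate(field):
    """Apply F^-1 (p2_max / p2) F to a real 1D periodic field; drop the zero mode."""
    L = len(field)
    p2_max = max(lattice_p2(k, L) for k in range(L))
    spec = [sum(field[x] * cmath.exp(-2j * math.pi * k * x / L) for x in range(L))
            for k in range(L)]
    spec = [0.0 if k == 0 else s * p2_max / lattice_p2(k, L) for k, s in enumerate(spec)]
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * x / L) for k in range(L)).real / L
            for x in range(L)]
```

On an 8-site lattice, the highest mode (alternating signs) passes through unchanged because its weight is 1. The longest non-constant wavelength, cos(2πx/8), is amplified by 4/(4 sin²(π/8)) ≈ 6.83, and a constant field is removed entirely.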