
70 research outputs found

SU(2) Lattice Gauge Theory Simulations on Fermi GPUs

Author: Bhanot
Clark
Creutz
Egri
Engels
Huntley
Kirk
Kovacs
McLerran
Nuno Cardoso
Pedro Bicudo
Press
Shakespeare
Stack
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

In this work we explore the performance of CUDA in quenched lattice SU(2) simulations. CUDA (NVIDIA's Compute Unified Device Architecture) is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi) are also presented. To obtain high performance, the code must be optimized for the GPU architecture, i.e., the implementation must exploit the memory hierarchy of the CUDA programming model. We produce codes for the Monte Carlo generation of SU(2) lattice gauge configurations, for the mean plaquette, for the Polyakov loop at finite T and for the Wilson loop. We also present results for the potential using many configurations (50 000) without smearing and almost 2 000 configurations with APE smearing. With two Fermi GPUs we have achieved an excellent performance of 200× the speed of one CPU in single precision, around 110 Gflop/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than 2× slower) than single precision computations. Comment: 20 pages, 11 figures, 3 tables, accepted in Journal of Computational Physics

arXiv.org e-Print Archive

CiteSeerX

Crossref
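The mean plaquette listed among the observables in the SU(2) paper above is the lattice average of (1/2) Tr[U_μ(x) U_ν(x+μ̂) U_μ†(x+ν̂) U_ν†(x)] over all sites x and planes μ < ν. The following is a minimal plain-Python sketch, not the authors' CUDA code: it assumes each link is stored as a unit quaternion (so that half the trace is simply the first component), and the names `qmul`, `mean_plaquette`, `gauge_transform` and the lattice helpers are hypothetical. Gauge invariance of the plaquette gives a built-in consistency check.

```python
import itertools
import math
import random

# Assumption: an SU(2) link U = a0*1 + i*(a1*s1 + a2*s2 + a3*s3), with s_k the
# Pauli matrices, is stored as the unit quaternion (a0, a1, a2, a3); (1/2) Tr U = a0.

def qmul(a, b):
    """Matrix product of two SU(2) elements, in quaternion form."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + b0 * a1 - (a2 * b3 - a3 * b2),
        a0 * b2 + b0 * a2 - (a3 * b1 - a1 * b3),
        a0 * b3 + b0 * a3 - (a1 * b2 - a2 * b1),
    )

def dagger(a):
    """Hermitian conjugate (= inverse) of an SU(2) element."""
    return (a[0], -a[1], -a[2], -a[3])

def random_su2(rng):
    """Haar-random SU(2) element: a uniformly distributed point on the 3-sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def shift(x, mu, L):
    """Neighbouring site x + mu_hat on a periodic lattice of extent L."""
    y = list(x)
    y[mu] = (y[mu] + 1) % L
    return tuple(y)

def sites(L, d=4):
    return itertools.product(range(L), repeat=d)

def cold_lattice(L, d=4):
    """Ordered start: every link is the identity."""
    return {(x, mu): (1.0, 0.0, 0.0, 0.0) for x in sites(L, d) for mu in range(d)}

def hot_lattice(L, seed=0, d=4):
    """Disordered start: every link is Haar-random."""
    rng = random.Random(seed)
    return {(x, mu): random_su2(rng) for x in sites(L, d) for mu in range(d)}

def mean_plaquette(U, L, d=4):
    """Average of (1/2) Tr[U_mu(x) U_nu(x+mu) U_mu(x+nu)^+ U_nu(x)^+] over x and mu < nu."""
    total, count = 0.0, 0
    for x in sites(L, d):
        for mu in range(d):
            for nu in range(mu + 1, d):
                left = qmul(U[x, mu], U[shift(x, mu, L), nu])
                right = qmul(dagger(U[shift(x, nu, L), mu]), dagger(U[x, nu]))
                total += qmul(left, right)[0]
                count += 1
    return total / count

def gauge_transform(U, L, rng, d=4):
    """Apply U_mu(x) -> g(x) U_mu(x) g(x+mu)^+ with Haar-random g(x)."""
    g = {x: random_su2(rng) for x in sites(L, d)}
    return {(x, mu): qmul(qmul(g[x], link), dagger(g[shift(x, mu, L)]))
            for (x, mu), link in U.items()}
```

A cold start gives exactly 1.0 and a hot start a value close to 0; a random gauge transformation leaves either value unchanged up to rounding. A GPU version would typically map sites to threads and reduce the partial sums, which is not shown here.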

Landau Gauge Fixing on GPUs

Author: Babich
Bogolubsky
Boucaud
Bowman
Cardoso
Clark
Cucchieri
Davies
Dudal
Edwards
Elitzur
Furui
Giusti
Martinelli
Nuno Cardoso
Oliveira
Orlando Oliveira
Paulo J. Silva
Pedro Bicudo
Schrock
Publication venue: 'Elsevier BV'
Publication date: 08/10/2012

In this paper we present and explore the performance of Landau gauge fixing on GPUs using CUDA. We consider the steepest descent algorithm with Fourier acceleration, and compare the GPU performance with a parallel CPU implementation. Using 32^4 lattice volumes, we find that the computational power of a single Tesla C2070 GPU is equivalent to approximately 256 CPU cores. Comment: 10 pages, 3 figures and 3 tables

arXiv.org e-Print Archive

Crossref
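The Landau gauge fixing papers in this listing maximize the functional F[g] = Σ_{x,μ} Re Tr[g(x) U_μ(x) g†(x+μ̂)] by steepest descent (equivalently, steepest ascent of F). The sketch below is a deliberately reduced toy version under stated assumptions: compact U(1) links on a periodic 2D lattice instead of SU(3) in four dimensions, a fixed step size α = 0.1, and plain steepest ascent without the Fourier acceleration used in the papers (in the standard scheme, that acceleration rescales the gradient in momentum space by p²_max/p² so that long-wavelength modes converge faster). All function names are hypothetical.

```python
import math
import random

# Toy setting (assumption): compact U(1) link angles theta[mu][x][y] on a
# periodic L x L lattice stand in for SU(3) links in four dimensions.
# A gauge transformation with phases phi(x) acts as
#   theta_mu(x) -> theta_mu(x) + phi(x) - phi(x + mu_hat).

def random_start(L, seed=0):
    """Hot start: independent uniform link angles (hypothetical test configuration)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-math.pi, math.pi) for _ in range(L)] for _ in range(L)]
            for _ in range(2)]

def landau_functional(theta, L):
    """F = average of cos theta_mu(x); Landau gauge is a (local) maximum of F."""
    return sum(math.cos(theta[mu][x][y])
               for mu in range(2) for x in range(L) for y in range(L)) / (2 * L * L)

def gradient(theta, L):
    """Delta(x) = sum_mu [sin theta_mu(x - mu_hat) - sin theta_mu(x)].

    This is dF/dphi(x) up to the normalization of F, i.e. minus the lattice
    divergence of A_mu = sin theta_mu; it vanishes once Landau gauge is reached."""
    delta = [[0.0] * L for _ in range(L)]
    for x in range(L):
        for y in range(L):
            behind = (theta[0][(x - 1) % L][y], theta[1][x][(y - 1) % L])
            delta[x][y] = sum(math.sin(behind[mu]) - math.sin(theta[mu][x][y])
                              for mu in range(2))
    return delta

def gauge_quality(theta, L):
    """theta_gf = (1/V) sum_x Delta(x)^2, the usual convergence criterion."""
    return sum(d * d for row in gradient(theta, L) for d in row) / (L * L)

def fix_landau_gauge(theta, L, alpha=0.1, tol=1e-12, max_iter=10000):
    """Plain steepest ascent of F (no Fourier acceleration); modifies theta in place."""
    for _ in range(max_iter):
        delta = gradient(theta, L)
        if sum(d * d for row in delta for d in row) / (L * L) < tol:
            break
        # Gauge transformation along the gradient, with step size alpha.
        phi = [[alpha * d for d in row] for row in delta]
        for x in range(L):
            for y in range(L):
                theta[0][x][y] += phi[x][y] - phi[(x + 1) % L][y]
                theta[1][x][y] += phi[x][y] - phi[x][(y + 1) % L]
    return theta
```

With α = 0.1 no step can decrease F in this toy model: the second derivatives of Σ cos θ with respect to φ form a weighted lattice Laplacian of norm at most 8 in two dimensions, so any α < 0.25 is safe. The gauge-fixing quality θ_gf is the quantity one would monitor against a small tolerance.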

QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems

Author: Demchik Vadim
Kolomoyets Natalia
Publication date: 26/10/2013

We have developed QCDGPU, a multi-GPU open-source package for lattice Monte Carlo simulations of pure SU(N) gluodynamics in an external magnetic field at finite temperature, and of the O(N) model. The code is implemented in OpenCL, has been tested on AMD and NVIDIA GPUs and on AMD and Intel CPUs, and may run on other OpenCL-compatible devices. The package has minimal external library dependencies and is OS platform-independent. It is optimized for heterogeneous computing through the possibility of dividing the lattice into non-equivalent parts, which hides the difference in performance between the devices used. QCDGPU has a client-server component for distributed simulations. The package is designed to produce lattice gauge configurations as well as to analyze previously generated ones. QCDGPU may be executed in fault-tolerant mode. The core of the Monte Carlo procedure is based on the PRNGCL library for pseudo-random number generation on OpenCL-compatible devices, which contains several of the most popular pseudo-random number generators. Comment: Presented at the Third International Conference "High Performance Computing" (HPC-UA 2013), Kyiv, Ukraine; 9 pages, 2 figures

arXiv.org e-Print Archive

CiteSeerX
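QCDGPU hides differences in device performance by giving devices non-equivalent parts of the lattice. One simple way to size such parts is sketched below, assuming the lattice is cut along one direction into slices and that each device's relative throughput has been measured beforehand; the function name and the largest-remainder rounding are illustrative assumptions, not QCDGPU's documented scheme.

```python
def partition_slices(n_slices, throughputs):
    """Split n_slices lattice slices among devices in proportion to their
    relative throughputs (all assumed > 0), using the largest-remainder
    method so that the counts are integers summing to n_slices."""
    total = sum(throughputs)
    quotas = [n_slices * t / total for t in throughputs]
    counts = [int(q) for q in quotas]  # floor of each ideal share
    leftover = n_slices - sum(counts)
    # Give the remaining slices to the devices with the largest fractional parts;
    # ties keep the original device order because sorting is stable.
    order = sorted(range(len(quotas)), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

For example, 32 slices shared between two devices, one measured three times faster than the other, give [24, 8].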

Gauge Field Generation on Large-Scale GPU-Enabled Systems

Author: Winter Frank
Publication date: 05/12/2012

Over the past years, GPUs have been successfully applied to the task of inverting the fermion matrix in lattice QCD calculations. Even strong scaling to capability-level supercomputers, corresponding to O(100) GPUs or more, has been achieved. However, strong scaling a whole gauge field generation algorithm to this regime requires significantly more functionality than just having the matrix inverter use the GPUs, and has not yet been accomplished. This contribution extends QDP-JIT, the migration of SciDAC QDP++ to GPU-enabled parallel systems, to help strong-scale the whole Hybrid Monte Carlo algorithm to this regime. Initial results are shown for gauge field generation with Chroma simulating pure Wilson fermions on OLCF TitanDev. Comment: The 30th International Symposium on Lattice Field Theory, June 24-29, 2012, Cairns, Australia (Acknowledgment and Citation added)

arXiv.org e-Print Archive

Crossref
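The argument in the abstract above, that offloading only the matrix inverter is not enough to strong-scale the whole algorithm, can be made quantitative with Amdahl's law: if a fraction p of the runtime is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p/s), which can never exceed 1/(1 - p). The fractions used below are hypothetical, not measurements from this paper.

```python
def overall_speedup(accelerated_fraction, accelerated_speedup):
    """Amdahl's law: total speedup when only a fraction p of the runtime
    is sped up by a factor s, while the rest (1 - p) runs unchanged."""
    p, s = accelerated_fraction, accelerated_speedup
    return 1.0 / ((1.0 - p) + p / s)
```

If the inverter took 90% of the runtime (p = 0.9), a 10× faster inverter would yield only about 5.3× overall, and even an infinitely fast one at most 10×; the remaining gauge-field and force code must also run on the GPUs for the whole algorithm to scale.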

Landau Gauge Fixing on GPUs and String Tension

Author: Bicudo Pedro
Cardoso Nuno
Oliveira Orlando
Silva Paulo J.
Publication venue: 'Jagiellonian University'
Publication date: 01/01/2012

We explore the performance of CUDA in performing Landau gauge fixing in lattice QCD, using the steepest descent method with Fourier acceleration. The code performance was tested on a Tesla C2070 (Fermi architecture). We also present a study of the string tension at finite temperature in the confined phase. The string tension is extracted from the color-averaged free energy and from the color-singlet free energy, the latter using Landau gauge fixing. Comment: 7 pages, 4 figures, 1 table. Contribution to the International Meeting "Excited QCD", Peniche, Portugal, 06 - 12 May 201

arXiv.org e-Print Archive

Crossref
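In the string-tension study above, the color-averaged free energy of a static quark-antiquark pair is obtained from the Polyakov loop correlator, F(r) = -T ln⟨P(0) P†(r)⟩ (up to an additive constant), and in the confined phase it rises roughly linearly with r, with slope σ(T). A minimal sketch of that extraction, assuming lattice units, a purely linear fit range (Coulomb-like and finite-size corrections ignored) and hypothetical function names; the color-singlet channel, which needs the Landau-gauge-fixed configurations, is not shown.

```python
import math

def free_energy(correlator, T):
    """F(r) = -T ln C(r), with C(r) > 0 the Polyakov loop correlator <P(0) P^+(r)>."""
    return [-T * math.log(c) for c in correlator]

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def string_tension(rs, correlator, T):
    """Slope sigma of F(r) = const + sigma * r over the chosen fit range."""
    _, sigma = fit_line(rs, free_energy(correlator, T))
    return sigma
```

In practice the fit range and the statistical errors (for example from jackknife resampling) matter far more than the fit itself; both are outside this sketch.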

Lattice QCD based on OpenCL

Author: Bach Matthias
Lindenstruth Volker
Philipsen Owe
Pinke Christopher
Publication venue: 'Elsevier BV'
Publication date: 26/09/2012

We present an OpenCL-based lattice QCD application using a heatbath algorithm for the pure gauge case and Wilson fermions in the twisted mass formulation. The implementation is platform independent and can be used on AMD or NVIDIA GPUs, as well as on classical CPUs. On the AMD Radeon HD 5870, our double precision dslash implementation performs at 60 GFLOPS over a wide range of lattice sizes. The hybrid Monte Carlo presented reaches a speedup of four over the reference code running on a server CPU. Comment: 19 pages, 11 figures

arXiv.org e-Print Archive

GSI Repository
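The heatbath algorithm for the pure gauge sector mentioned above needs, at its core, a way to draw an SU(2) element X with probability density proportional to √(1 - x₀²) exp(α x₀), where α = β k and k = √det V for the staple sum V. For SU(3), a common approach is to apply this SU(2) step to SU(2) subgroups (Cabibbo-Marinari); the abstract does not say which variant is used. The sketch below implements the Kennedy-Pendleton sampler in plain Python; it is not the authors' OpenCL code, and the function name is hypothetical.

```python
import math
import random

def kp_heatbath_su2(alpha: float, rng: random.Random) -> tuple:
    """Draw an SU(2) element X = (x0, x1, x2, x3) with density proportional to
    sqrt(1 - x0^2) * exp(alpha * x0) (Kennedy-Pendleton algorithm).

    alpha = beta * k, where k = sqrt(det V) for the staple sum V (assumed > 0)."""
    while True:
        # 1 - random() lies in (0, 1], so the logarithms stay finite.
        r1 = 1.0 - rng.random()
        r2 = rng.random()
        r3 = 1.0 - rng.random()
        # lambda^2 with density ~ lambda^2 * exp(-2 * alpha * lambda^2).
        lam2 = -(math.log(r1) + math.cos(2.0 * math.pi * r2) ** 2 * math.log(r3)) / (2.0 * alpha)
        # Accept with probability sqrt(1 - lambda^2); this also rejects lambda^2 > 1.
        if rng.random() ** 2 <= 1.0 - lam2:
            break
    x0 = 1.0 - 2.0 * lam2
    # Spatial part: a uniformly random direction with length sqrt(1 - x0^2).
    radius = math.sqrt(max(0.0, 1.0 - x0 * x0))
    cos_t = rng.uniform(-1.0, 1.0)
    sin_t = math.sqrt(1.0 - cos_t * cos_t)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (x0,
            radius * sin_t * math.cos(phi),
            radius * sin_t * math.sin(phi),
            radius * cos_t)
```

Under the convention that the local Boltzmann weight is exp((β/2) Re Tr(U V)), the updated link is then X Ṽ† with Ṽ = V/k. For α = 5 the sample mean of x₀ should be close to I₂(5)/I₁(5) ≈ 0.72.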

Landau gauge fixing on the lattice using GPU's

Author: Bicudo Pedro
Cardoso Nuno
Oliveira Orlando
Silva Paulo J.
Publication date: 15/01/2013

In this work, we consider the GPU implementation of the steepest descent method with Fourier acceleration for Landau gauge fixing, using CUDA. The performance of the code on a Tesla C2070 GPU is compared with a parallel CPU implementation. Comment: 3 pages, 1 figure, Proceedings of the Xth Quark Confinement and the Hadron Spectrum, 8-12 October 2012, TUM Campus Garching, Munich, Germany

arXiv.org e-Print Archive

CiteSeerX

Crossref