143 research outputs found
Lattice QCD based on OpenCL
We present an OpenCL-based Lattice QCD application using a heatbath algorithm
for the pure gauge case and Wilson fermions in the twisted mass formulation.
The implementation is platform independent and can be used on AMD or NVIDIA
GPUs, as well as on classical CPUs. On the AMD Radeon HD 5870 our double
precision dslash implementation performs at 60 GFLOPS over a wide range of
lattice sizes. The hybrid Monte-Carlo presented reaches a speedup of four over
the reference code running on a server CPU.Comment: 19 pages, 11 figure
QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems
The multi-GPU open-source package QCDGPU for lattice Monte Carlo simulations
of pure SU(N) gluodynamics in external magnetic field at finite temperature and
O(N) model is developed. The code is implemented in OpenCL, tested on AMD and
NVIDIA GPUs, AMD and Intel CPUs and may run on other OpenCL-compatible devices.
The package contains minimal external library dependencies and is OS
platform-independent. It is optimized for heterogeneous computing due to the
possibility of dividing the lattice into non-equivalent parts to hide the
difference in performances of the devices used. QCDGPU has client-server part
for distributed simulations. The package is designed to produce lattice gauge
configurations as well as to analyze previously generated ones. QCDGPU may be
executed in fault-tolerant mode. Monte Carlo procedure core is based on PRNGCL
library for pseudo-random numbers generation on OpenCL-compatible devices,
which contains several most popular pseudo-random number generators.Comment: Presented at the Third International Conference "High Performance
Computing" (HPC-UA 2013), Kyiv, Ukraine; 9 pages, 2 figure
QCD simulations with staggered fermions on GPUs
We report on our implementation of the RHMC algorithm for the simulation of
lattice QCD with two staggered flavors on Graphics Processing Units, using the
NVIDIA CUDA programming language. The main feature of our code is that the GPU
is not used just as an accelerator, but instead the whole Molecular Dynamics
trajectory is performed on it. After pointing out the main bottlenecks and how
to circumvent them, we discuss the obtained performances. We present some
preliminary results regarding OpenCL and multiGPU extensions of our code and
discuss future perspectives.Comment: 22 pages, 14 eps figures, final version to be published in Computer
Physics Communication
Design and optimization of a portable LQCD Monte Carlo code using OpenACC
The present panorama of HPC architectures is extremely heterogeneous, ranging
from traditional multi-core CPU processors, supporting a wide class of
applications but delivering moderate computing performance, to many-core GPUs,
exploiting aggressive data-parallelism and delivering higher performances for
streaming computing applications. In this scenario, code portability (and
performance portability) become necessary for easy maintainability of
applications; this is very relevant in scientific computing where code changes
are very frequent, making it tedious and prone to error to keep different code
versions aligned. In this work we present the design and optimization of a
state-of-the-art production-level LQCD Monte Carlo application, using the
directive-based OpenACC programming model. OpenACC abstracts parallel
programming to a descriptive level, relieving programmers from specifying how
codes should be mapped onto the target architecture. We describe the
implementation of a code fully written in OpenACC, and show that we are able to
target several different architectures, including state-of-the-art traditional
CPUs and GPUs, with the same code. We also measure performance, evaluating the
computing efficiency of our OpenACC code on several architectures, comparing
with GPU-specific implementations and showing that a good level of
performance-portability can be reached.Comment: 26 pages, 2 png figures, preprint of an article submitted for
consideration in International Journal of Modern Physics
Computational Physics on Graphics Processing Units
The use of graphics processing units for scientific computations is an
emerging strategy that can significantly speed up various different algorithms.
In this review, we discuss advances made in the field of computational physics,
focusing on classical molecular dynamics, and on quantum simulations for
electronic structure calculations using the density functional theory, wave
function techniques, and quantum field theory.Comment: Proceedings of the 11th International Conference, PARA 2012,
Helsinki, Finland, June 10-13, 201
First Application of Lattice QCD to Pezy-SC Processor
AbstractPezy-SC processor is a novel new architecture developed by Pezy Computing K. K. that has achieved large computational power with low electric power consumption. It works as an accelerator device similarly to GPGPUs. A programming environment that resembles OpenCL is provided. Using a hybrid parallel system “Suiren” installed at KEK, we port and tune a simulation code of lattice QCD, which is computational elementary particle physics based on Monte Carlo method. We offload an iterative solver of a linear equation for a fermion matrix, which is in general the most time consuming part of the lattice QCD simulations. On single and multiple Pezy-SC devices, the sustained performance is measured for the matrix multiplications and a BiCGStab solver. We examine how the data layout affects the performance. The results demonstrate that the Pezy-SC processors provide a feasible environment to perform numerical lattice QCD simulations
- …