Design and optimization of a portable LQCD Monte Carlo code using OpenACC
The present panorama of HPC architectures is extremely heterogeneous, ranging
from traditional multi-core CPU processors, which support a wide class of
applications but deliver moderate computing performance, to many-core GPUs,
which exploit aggressive data parallelism and deliver higher performance for
streaming computing applications. In this scenario, code portability (and
performance portability) becomes necessary for easy maintainability of
applications; this is very relevant in scientific computing, where code changes
are frequent, making it tedious and error-prone to keep different code
versions aligned. In this work we present the design and optimization of a
state-of-the-art production-level LQCD Monte Carlo application, using the
directive-based OpenACC programming model. OpenACC abstracts parallel
programming to a descriptive level, relieving programmers from specifying how
codes should be mapped onto the target architecture. We describe the
implementation of a code fully written in OpenACC, and show that we are able to
target several different architectures, including state-of-the-art traditional
CPUs and GPUs, with the same code. We also measure performance, evaluating the
computing efficiency of our OpenACC code on several architectures, comparing
with GPU-specific implementations and showing that a good level of
performance portability can be reached.
Comment: 26 pages, 2 png figures; preprint of an article submitted for
consideration in International Journal of Modern Physics
Portable multi-node LQCD Monte Carlo simulations using OpenACC
This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code
for staggered fermions, purposely designed to be portable across different
computer architectures, including GPUs and commodity CPUs. Portability is
achieved using the OpenACC parallel programming model, used to develop a code
that can be compiled for several processor architectures. The paper focuses on
parallelization on multiple computing nodes using OpenACC to manage parallelism
within the node, and OpenMPI to manage parallelism among the nodes. We first
discuss the available strategies for maximizing performance, then describe
selected relevant details of the code, and finally measure the level of
performance and scaling performance that we are able to achieve. The
work focuses mainly on GPUs, which offer a significantly higher level of
performance for this application, but also compares with results measured on
other processors.
Comment: 22 pages, 8 png figures
Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond
In this and a set of companion whitepapers, the USQCD Collaboration lays out
a program of science and computing for lattice gauge theory. These whitepapers
describe how calculations using lattice QCD (and other gauge theories) can aid
the interpretation of ongoing and upcoming experiments in particle and nuclear
physics, as well as inspire new ones.
Comment: 44 pages. 1 of USQCD whitepapers