Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond
In this and a set of companion whitepapers, the USQCD Collaboration lays out
a program of science and computing for lattice gauge theory. These whitepapers
describe how calculation using lattice QCD (and other gauge theories) can aid
the interpretation of ongoing and upcoming experiments in particle and nuclear
physics, as well as inspire new ones.
Comment: 44 pages. 1 of USQCD whitepapers
Simulating the weak death of the neutron in a femtoscale universe with near-Exascale computing
The fundamental particle theory called Quantum Chromodynamics (QCD) dictates
everything about protons and neutrons, from their intrinsic properties to
interactions that bind them into atomic nuclei. Quantities that cannot be fully
resolved through experiment, such as the neutron lifetime (whose precise value
is important for the existence of light-atomic elements that make the sun shine
and life possible), may be understood through numerical solutions to QCD. We
directly solve QCD using Lattice Gauge Theory and calculate nuclear observables
such as neutron lifetime. We have developed an improved algorithm that
exponentially decreases the time-to-solution and applied it on the new CORAL
supercomputers, Sierra and Summit. We use run-time autotuning to distribute GPU
resources, achieving 20% performance at low node count. We also developed
optimal application mapping through a job manager, which allows CPU and GPU
jobs to be interleaved, yielding 15% of peak performance when deployed across
large fractions of CORAL.
Comment: 2018 Gordon Bell Finalist: 9 pages, 9 figures; v2: fixed 2 typos and
appended acknowledgement
Parallel Algorithm for Solving Kepler's Equation on Graphics Processing Units: Application to Analysis of Doppler Exoplanet Searches
[Abridged] We present the results of a highly parallel Kepler equation solver
using the Graphics Processing Unit (GPU) on a commercial nVidia GeForce 280GTX
and the "Compute Unified Device Architecture" programming environment. We apply
this to evaluate a goodness-of-fit statistic (e.g., chi^2) for Doppler
observations of stars potentially harboring multiple planetary companions
(assuming negligible planet-planet interactions). We tested multiple
implementations using single precision, double precision, pairs of single
precision, and mixed precision arithmetic. We find that the vast majority of
computations can be performed using single precision arithmetic, with selective
use of compensated summation for increased precision. However, standard single
precision is not adequate for calculating the mean anomaly from the time of
observation and orbital period when evaluating the goodness-of-fit for real
planetary systems and observational data sets. Using all double precision, our
GPU code outperforms a similar code using a modern CPU by a factor of over 60.
Using mixed-precision, our GPU code provides a speed-up factor of over 600,
when evaluating N_sys > 1024 model planetary systems, each containing N_pl = 4
planets and assuming N_obs = 256 observations of each system. We conclude that
modern GPUs also offer a powerful tool for repeatedly evaluating Kepler's
equation and a goodness-of-fit statistic for orbital models when presented with
a large parameter space.
Comment: 19 pages, to appear in New Astronomy
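The core operation in the abstract above, solving Kepler's equation M = E - e sin(E) for the eccentric anomaly E, can be illustrated with a minimal CPU sketch. It assumes a plain Newton iteration in double precision; the paper's CUDA kernels and its mixed-precision scheme are not reproduced here. The function name `solve_kepler`, the tolerance, and the starting guess are illustrative assumptions, not taken from the paper.

```python
import math


def solve_kepler(mean_anomaly, ecc, tol=1e-12, max_iter=50):
    """Solve M = E - e*sin(E) for E with Newton's method (0 <= e < 1).

    Hypothetical helper for illustration only; not the paper's GPU code.
    """
    two_pi = 2.0 * math.pi
    # Reduce the mean anomaly to [0, 2*pi) in double precision.
    m = math.fmod(mean_anomaly, two_pi)
    if m < 0.0:
        m += two_pi
    # Starting guess: E = M for moderate eccentricity, E = pi otherwise.
    e_anom = m if ecc < 0.8 else math.pi
    for _ in range(max_iter):
        residual = e_anom - ecc * math.sin(e_anom) - m
        slope = 1.0 - ecc * math.cos(e_anom)
        step = residual / slope
        e_anom -= step
        if abs(step) < tol:
            break
    return e_anom


# Each (M, e) pair is independent, which is what makes the problem
# attractive for one-thread-per-model evaluation on a GPU.
models = [(1.0, 0.3), (0.1, 0.9), (7.5, 0.5)]
eccentric_anomalies = [solve_kepler(m, e) for m, e in models]
```

Because every model is solved independently, the loop over `models` is the part a GPU implementation would spread across threads.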
PyFR: An Open Source Framework for Solving Advection-Diffusion Type Problems on Streaming Architectures using the Flux Reconstruction Approach
High-order numerical methods for unstructured grids combine the superior
accuracy of high-order spectral or finite difference methods with the geometric
flexibility of low-order finite volume or finite element schemes. The Flux
Reconstruction (FR) approach unifies various high-order schemes for
unstructured grids within a single framework. Additionally, the FR approach
exhibits a significant degree of element locality, and is thus able to run
efficiently on modern streaming architectures, such as Graphical Processing
Units (GPUs). The aforementioned properties of FR mean it offers a promising
route to performing affordable, and hence industrially relevant,
scale-resolving simulations of hitherto intractable unsteady flows within the
vicinity of real-world engineering geometries. In this paper we present PyFR,
an open-source Python-based framework for solving advection-diffusion type
problems on streaming architectures using the FR approach. The framework is
designed to solve a range of governing systems on mixed unstructured grids
containing various element types. It is also designed to target a range of
hardware platforms via use of an in-built domain specific language based on the
Mako templating engine. The current release of PyFR is able to solve the
compressible Euler and Navier-Stokes equations on grids of quadrilateral and
triangular elements in two dimensions, and hexahedral elements in three
dimensions, targeting clusters of CPUs, and NVIDIA GPUs. Results are presented
for various benchmark flow problems, single-node performance is discussed, and
scalability of the code is demonstrated on up to 104 NVIDIA M2090 GPUs. The
software is freely available under a 3-Clause New Style BSD license (see
www.pyfr.org)
Parameter Selection and Pre-Conditioning for a Graph Form Solver
In a recent paper, Parikh and Boyd describe a method for solving a convex
optimization problem, where each iteration involves evaluating a proximal
operator and projection onto a subspace. In this paper we address the critical
practical issues of how to select the proximal parameter in each iteration, and
how to scale the original problem variables, so as to achieve reliable
practical performance. The resulting method has been implemented as an
open-source software package called POGS (Proximal Graph Solver), that targets
multi-core and GPU-based systems, and has been tested on a wide variety of
practical problems. Numerical results show that POGS can solve very large
problems (with, say, more than a billion coefficients in the data), to modest
accuracy in a few tens of seconds. As just one example, a radiation treatment
planning problem with around 100 million coefficients in the data can be solved
in a few seconds, as compared to around one hour with an interior-point method.
Comment: 28 pages, 1 figure, 1 open source implementation
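The iteration described in the abstract above (a proximal-operator evaluation followed by a projection onto a subspace) can be sketched on a toy graph-form problem: minimize 0.5*||y - b||^2 + lam*|x| subject to y = a*x, with scalar x. This is a minimal sketch that assumes a fixed proximal parameter `rho` and no preconditioning, so it omits exactly the parameter selection and scaling that the paper addresses. The function names and the toy problem are illustrative assumptions and are not the POGS API.

```python
def soft_threshold(v, kappa):
    """Proximal operator of kappa*|x| (soft thresholding)."""
    return max(v - kappa, 0.0) - max(-v - kappa, 0.0)


def graph_form_admm(a, b, lam, rho=1.0, iters=500):
    """Minimize 0.5*||y - b||^2 + lam*|x| subject to y = a*x (x scalar).

    Hypothetical toy solver with fixed rho; not the POGS implementation.
    """
    m = len(a)
    a_dot_a = sum(ai * ai for ai in a)
    x, y = 0.0, [0.0] * m      # current point on the graph y = a*x
    xt, yt = 0.0, [0.0] * m    # scaled dual variables
    for _ in range(iters):
        # 1) Proximal step on the separable objective.
        xh = soft_threshold(x - xt, lam / rho)
        yh = [(bi + rho * (yi - yti)) / (1.0 + rho)
              for bi, yi, yti in zip(b, y, yt)]
        # 2) Project (xh + xt, yh + yt) onto the subspace {(x, y): y = a*x}.
        c = xh + xt
        d = [yhi + yti for yhi, yti in zip(yh, yt)]
        x = (c + sum(ai * di for ai, di in zip(a, d))) / (1.0 + a_dot_a)
        y = [ai * x for ai in a]
        # 3) Dual update.
        xt += xh - x
        yt = [yti + yhi - yi for yti, yhi, yi in zip(yt, yh, y)]
    return x


x_opt = graph_form_admm([1.0, 2.0, 2.0], [1.0, 2.0, 3.0], lam=1.0)
```

For this one-variable problem the minimizer has the closed form soft_threshold(a.b, lam) / (a.a) = (11 - 1) / 9 = 10/9, which gives a simple check on the iteration. For a general matrix A, the projection in step 2 becomes a linear solve with I + A^T A.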
Hydrodynamics of Suspensions of Passive and Active Rigid Particles: A Rigid Multiblob Approach
We develop a rigid multiblob method for numerically solving the mobility
problem for suspensions of passive and active rigid particles of complex shape
in Stokes flow in unconfined, partially confined, and fully confined
geometries. As in a number of existing methods, we discretize rigid bodies
using a collection of minimally-resolved spherical blobs constrained to move as
a rigid body, to arrive at a potentially large linear system of equations for
the unknown Lagrange multipliers and rigid-body motions. Here we develop a
block-diagonal preconditioner for this linear system and show that a standard
Krylov solver converges in a modest number of iterations that is essentially
independent of the number of particles. For unbounded suspensions and
suspensions sedimented against a single no-slip boundary, we rely on existing
analytical expressions for the Rotne-Prager tensor combined with a fast
multipole method or a direct summation on a Graphical Processing Unit to obtain
a simple yet efficient and scalable implementation. For fully confined
domains, such as periodic suspensions or suspensions confined in slit and
square channels, we extend a recently-developed rigid-body immersed boundary
method to suspensions of freely-moving passive or active rigid particles at
zero Reynolds number. We demonstrate that the iterative solver for the coupled
fluid and rigid body equations converges in a bounded number of iterations
regardless of the system size. We optimize a number of parameters in the
iterative solvers and apply our method to a variety of benchmark problems to
carefully assess the accuracy of the rigid multiblob approach as a function of
the resolution. We also model the dynamics of colloidal particles studied in
recent experiments, such as passive boomerangs in a slit channel, as well as a
pair of non-Brownian active nanorods sedimented against a wall.
Comment: Under revision in CAMCOS, Nov 201
QCD simulations with staggered fermions on GPUs
We report on our implementation of the RHMC algorithm for the simulation of
lattice QCD with two staggered flavors on Graphics Processing Units, using the
NVIDIA CUDA programming language. The main feature of our code is that the GPU
is not used just as an accelerator, but instead the whole Molecular Dynamics
trajectory is performed on it. After pointing out the main bottlenecks and how
to circumvent them, we discuss the obtained performances. We present some
preliminary results regarding OpenCL and multiGPU extensions of our code and
discuss future perspectives.
Comment: 22 pages, 14 eps figures, final version to be published in Computer
Physics Communications