5,918 research outputs found
Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision
Modern graphics processing units (GPUs) provide impressive computing
resources, which can be accessed conveniently through the CUDA programming
interface. We describe how GPUs can be used to considerably speed up molecular
dynamics (MD) simulations for system sizes ranging up to about 1 million
particles. Particular emphasis is put on the numerical long-time stability in
terms of energy and momentum conservation, and caveats on limited
floating-point precision are issued. Strict energy conservation over 10^8 MD
steps is obtained by double-single emulation of the floating-point arithmetic
in accuracy-critical parts of the algorithm. For the slow dynamics of a
supercooled binary Lennard-Jones mixture, we demonstrate that the use of
single-floating point precision may result in quantitatively and even
physically wrong results. For simulations of a Lennard-Jones fluid, the
described implementation shows speedup factors of up to 80 compared to a serial
implementation for the CPU, and a single GPU was found to compare with a
parallelised MD simulation using 64 distributed cores.Comment: 12 pages, 7 figures, to appear in Comp. Phys. Comm., HALMD package
licensed under the GPL, see http://research.colberg.org/projects/halm
Performance analysis of parallel gravitational -body codes on large GPU cluster
We compare the performance of two very different parallel gravitational
-body codes for astrophysical simulations on large GPU clusters, both
pioneer in their own fields as well as in certain mutual scales - NBODY6++ and
Bonsai. We carry out the benchmark of the two codes by analyzing their
performance, accuracy and efficiency through the modeling of structure
decomposition and timing measurements. We find that both codes are heavily
optimized to leverage the computational potential of GPUs as their performance
has approached half of the maximum single precision performance of the
underlying GPU cards. With such performance we predict that a speed-up of
can be achieved when up to 1k processors and GPUs are employed
simultaneously. We discuss the quantitative information about comparisons of
two codes, finding that in the same cases Bonsai adopts larger time steps as
well as relative energy errors than NBODY6++, typically ranging from
times larger, depending on the chosen parameters of the codes. While the two
codes are built for different astrophysical applications, in specified
conditions they may overlap in performance at certain physical scale, and thus
allowing the user to choose from either one with finetuned parameters
accordingly.Comment: 15 pages, 7 figures, 3 tables, accepted for publication in Research
in Astronomy and Astrophysics (RAA
GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics
We present the newly developed code, GAMER (GPU-accelerated Adaptive MEsh
Refinement code), which has adopted a novel approach to improve the performance
of adaptive mesh refinement (AMR) astrophysical simulations by a large factor
with the use of the graphic processing unit (GPU). The AMR implementation is
based on a hierarchy of grid patches with an oct-tree data structure. We adopt
a three-dimensional relaxing TVD scheme for the hydrodynamic solver, and a
multi-level relaxation scheme for the Poisson solver. Both solvers have been
implemented in GPU, by which hundreds of patches can be advanced in parallel.
The computational overhead associated with the data transfer between CPU and
GPU is carefully reduced by utilizing the capability of asynchronous memory
copies in GPU, and the computing time of the ghost-zone values for each patch
is made to diminish by overlapping it with the GPU computations. We demonstrate
the accuracy of the code by performing several standard test problems in
astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster
system. We measure the performance of the code by performing purely-baryonic
cosmological simulations in different hardware implementations, in which
detailed timing analyses provide comparison between the computations with and
without GPU(s) acceleration. Maximum speed-up factors of 12.19 and 10.47 are
demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with
8192^3 effective resolution, respectively.Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included.
Accepted for publication in ApJ
Acceleration of Coarse Grain Molecular Dynamics on GPU Architectures
Coarse grain (CG) molecular models have been proposed to simulate complex sys- tems with lower computational overheads and longer timescales with respect to atom- istic level models. However, their acceleration on parallel architectures such as Graphic Processing Units (GPU) presents original challenges that must be carefully evaluated. The objective of this work is to characterize the impact of CG model features on parallel simulation performance. To achieve this, we implemented a GPU-accelerated version of a CG molecular dynamics simulator, to which we applied specic optimizations for CG models, such as dedicated data structures to handle dierent bead type interac- tions, obtaining a maximum speed-up of 14 on the NVIDIA GTX480 GPU with Fermi architecture. We provide a complete characterization and evaluation of algorithmic and simulated system features of CG models impacting the achievable speed-up and accuracy of results, using three dierent GPU architectures as case studie
The GENGA Code: Gravitational Encounters in N-body simulations with GPU Acceleration
We describe an open source GPU implementation of a hybrid symplectic N-body
integrator, GENGA (Gravitational ENcounters with Gpu Acceleration), designed to
integrate planet and planetesimal dynamics in the late stage of planet
formation and stability analyses of planetary systems. GENGA uses a hybrid
symplectic integrator to handle close encounters with very good energy
conservation, which is essential in long-term planetary system integration. We
extended the second order hybrid integration scheme to higher orders. The GENGA
code supports three simulation modes: Integration of up to 2048 massive bodies,
integration with up to a million test particles, or parallel integration of a
large number of individual planetary systems. We compare the results of GENGA
to Mercury and pkdgrav2 in respect of energy conservation and performance, and
find that the energy conservation of GENGA is comparable to Mercury and around
two orders of magnitude better than pkdgrav2. GENGA runs up to 30 times faster
than Mercury and up to eight times faster than pkdgrav2. GENGA is written in
CUDA C and runs on all NVIDIA GPUs with compute capability of at least 2.0.Comment: Accepted by ApJ. 18 pages, 17 figures, 4 table
A sparse octree gravitational N-body code that runs entirely on the GPU processor
We present parallel algorithms for constructing and traversing sparse octrees
on graphics processing units (GPUs). The algorithms are based on parallel-scan
and sort methods. To test the performance and feasibility, we implemented them
in CUDA in the form of a gravitational tree-code which completely runs on the
GPU.(The code is publicly available at:
http://castle.strw.leidenuniv.nl/software.html) The tree construction and
traverse algorithms are portable to many-core devices which have support for
CUDA or OpenCL programming languages. The gravitational tree-code outperforms
tuned CPU code during the tree-construction and shows a performance improvement
of more than a factor 20 overall, resulting in a processing rate of more than
2.8 million particles per second.Comment: Accepted version. Published in Journal of Computational Physics. 35
pages, 12 figures, single colum
- âŠ