2,555 research outputs found
NBODY6++GPU: Ready for the gravitational million-body problem
Accurate direct -body simulations help to obtain detailed information
about the dynamical evolution of star clusters. They also enable comparisons
with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a
well-known direct -body code for star clusters, and NBODY6++ is the extended
version designed for large particle number simulations by supercomputers. We
present NBODY6++GPU, an optimized version of NBODY6++ with hybrid
parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large
direct -body simulations, and in particular to solve the million-body
problem. We discuss the new features of the NBODY6++GPU code, benchmarks, as
well as the first results from a simulation of a realistic globular cluster
initially containing a million particles. For million-body simulations,
NBODY6++GPU is times faster than NBODY6 with 320 CPU cores and 32
NVIDIA K20X GPUs. With this computing cluster specification, the simulations of
million-body globular clusters including primordial binaries require
about an hour per half-mass crossing time.Comment: 13 pages, 9 figures, 3 table
Strong scaling of general-purpose molecular dynamics simulations on GPUs
We describe a highly optimized implementation of MPI domain decomposition in
a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson
and Glotzer, arXiv:1308.5587). Our approach is inspired by a traditional
CPU-based code, LAMMPS (Plimpton, J. Comp. Phys. 117, 1995), but is implemented
within a code that was designed for execution on GPUs from the start (Anderson
et al., J. Comp. Phys. 227, 2008). The software supports short-ranged pair
force and bond force fields and achieves optimal GPU performance using an
autotuning algorithm. We are able to demonstrate equivalent or superior scaling
on up to 3,375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD)
simulations of up to 108 million particles. GPUDirect RDMA capabilities in
recent GPU generations provide better performance in full double precision
calculations. For a representative polymer physics application, HOOMD-blue 1.0
provides an effective GPU vs. CPU node speed-up of 12.5x.Comment: 30 pages, 14 figure
Molecular Dynamics Simulation of Macromolecules Using Graphics Processing Unit
Molecular dynamics (MD) simulation is a powerful computational tool to study
the behavior of macromolecular systems. But many simulations of this field are
limited in spatial or temporal scale by the available computational resource.
In recent years, graphics processing unit (GPU) provides unprecedented
computational power for scientific applications. Many MD algorithms suit with
the multithread nature of GPU. In this paper, MD algorithms for macromolecular
systems that run entirely on GPU are presented. Compared to the MD simulation
with free software GROMACS on a single CPU core, our codes achieve about 10
times speed-up on a single GPU. For validation, we have performed MD
simulations of polymer crystallization on GPU, and the results observed
perfectly agree with computations on CPU. Therefore, our single GPU codes have
already provided an inexpensive alternative for macromolecular simulations on
traditional CPU clusters and they can also be used as a basis to develop
parallel GPU programs to further speedup the computations.Comment: 21 pages, 16 figure
An Efficient Cell List Implementation for Monte Carlo Simulation on GPUs
Maximizing the performance potential of the modern day GPU architecture
requires judicious utilization of available parallel resources. Although
dramatic reductions can often be obtained through straightforward mappings,
further performance improvements often require algorithmic redesigns to more
closely exploit the target architecture. In this paper, we focus on efficient
molecular simulations for the GPU and propose a novel cell list algorithm that
better utilizes its parallel resources. Our goal is an efficient GPU
implementation of large-scale Monte Carlo simulations for the grand canonical
ensemble. This is a particularly challenging application because there is
inherently less computation and parallelism than in similar applications with
molecular dynamics. Consistent with the results of prior researchers, our
simulation results show traditional cell list implementations for Monte Carlo
simulations of molecular systems offer effectively no performance improvement
for small systems [5, 14], even when porting to the GPU. However for larger
systems, the cell list implementation offers significant gains in performance.
Furthermore, our novel cell list approach results in better performance for all
problem sizes when compared with other GPU implementations with or without cell
lists.Comment: 30 page
Acceleration of Coarse Grain Molecular Dynamics on GPU Architectures
Coarse grain (CG) molecular models have been proposed to simulate complex sys- tems with lower computational overheads and longer timescales with respect to atom- istic level models. However, their acceleration on parallel architectures such as Graphic Processing Units (GPU) presents original challenges that must be carefully evaluated. The objective of this work is to characterize the impact of CG model features on parallel simulation performance. To achieve this, we implemented a GPU-accelerated version of a CG molecular dynamics simulator, to which we applied specic optimizations for CG models, such as dedicated data structures to handle dierent bead type interac- tions, obtaining a maximum speed-up of 14 on the NVIDIA GTX480 GPU with Fermi architecture. We provide a complete characterization and evaluation of algorithmic and simulated system features of CG models impacting the achievable speed-up and accuracy of results, using three dierent GPU architectures as case studie
- …