NBODY6++GPU: Ready for the gravitational million-body problem
Accurate direct N-body simulations help to obtain detailed information
about the dynamical evolution of star clusters. They also enable comparisons
with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a
well-known direct N-body code for star clusters, and NBODY6++ is the extended
version designed for large particle number simulations by supercomputers. We
present NBODY6++GPU, an optimized version of NBODY6++ with hybrid
parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large
direct N-body simulations, and in particular to solve the million-body
problem. We discuss the new features of the NBODY6++GPU code, benchmarks, as
well as the first results from a simulation of a realistic globular cluster
initially containing a million particles. For million-body simulations,
NBODY6++GPU is many times faster than NBODY6 when running on 320 CPU cores and 32
NVIDIA K20X GPUs. With this computing cluster specification, the simulations of
million-body globular clusters including primordial binaries require
about an hour per half-mass crossing time.
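As a rough, hypothetical sketch of the MPI-plus-OpenMP layering named above (the GPU offload, AVX/SSE kernels, Hermite integrator, and block time steps of the real code are all omitted), the following C++ fragment distributes the outer loop of a softened direct N-body force evaluation over MPI ranks and threads it with OpenMP. It is not NBODY6++GPU code.

```cpp
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cmath>
#include <cstdio>

struct Body { double x, y, z, m; };

// Softened accelerations for the locally owned slice [lo, hi) against all
// bodies. The inner j-loop over all N bodies is the part the real code hands
// to GPU/AVX kernels; here it is plain C++ threaded with OpenMP.
void local_accels(const std::vector<Body>& b, std::size_t lo, std::size_t hi,
                  std::vector<double>& a, double eps2) {
    #pragma omp parallel for schedule(static)
    for (std::size_t i = lo; i < hi; ++i) {
        double ax = 0.0, ay = 0.0, az = 0.0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            if (j == i) continue;
            const double dx = b[j].x - b[i].x, dy = b[j].y - b[i].y, dz = b[j].z - b[i].z;
            const double r2 = dx*dx + dy*dy + dz*dz + eps2;   // Plummer softening
            const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
            ax += b[j].m * dx * inv_r3;
            ay += b[j].m * dy * inv_r3;
            az += b[j].m * dz * inv_r3;
        }
        a[3*(i - lo) + 0] = ax; a[3*(i - lo) + 1] = ay; a[3*(i - lo) + 2] = az;
    }
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const std::size_t N = 4096;                   // toy size, not a million bodies
    std::vector<Body> bodies(N);
    for (std::size_t i = 0; i < N; ++i)           // deterministic setup on every rank
        bodies[i] = Body{std::cos(0.1 * i), std::sin(0.1 * i), 1e-3 * i, 1.0 / N};

    // Every rank stores all bodies but computes accelerations only for its slice.
    const std::size_t lo = rank * N / size, hi = (rank + 1) * N / size;
    std::vector<double> a(3 * (hi - lo));
    local_accels(bodies, lo, hi, a, 1e-4);

    // Simple consistency check: pairwise forces are antisymmetric, so the
    // mass-weighted sum of accelerations over all bodies should be ~0.
    double local_sum = 0.0, global_sum = 0.0;
    for (std::size_t i = lo; i < hi; ++i)
        local_sum += bodies[i].m * a[3*(i - lo)];
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) std::printf("sum of m*a_x over all bodies: %.3e\n", global_sum);

    MPI_Finalize();
    return 0;
}
```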
Load management strategy for Particle-In-Cell simulations in high energy particle acceleration
In the wake of the intense effort made for the experimental CILEX project,
numerical simulation campaigns have been carried out in order to finalize the
design of the facility and to identify optimal laser and plasma parameters.
These simulations bring, of course, important insight into the fundamental
physics at play. As a by-product, they also characterize the quality of our
theoretical and numerical models. In this paper, we compare the results given
by different codes and point out algorithmic limitations both in terms of
physical accuracy and computational performance. These limitations are
illustrated in the context of electron laser wakefield acceleration (LWFA). The
main limitation we identify in state-of-the-art Particle-In-Cell (PIC) codes is
computational load imbalance. We propose an innovative algorithm to deal with
this specific issue, as well as milestones towards a modern, accurate,
high-performance PIC code for high energy particle acceleration.
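For background on the load imbalance the abstract identifies, the sketch below shows the standard baseline such work improves upon, not the paper's algorithm: a greedy prefix-sum partition that cuts a 1D row of cells into per-rank slabs with near-equal particle counts rather than equal cell counts. All names and numbers are illustrative.

```cpp
#include <vector>
#include <cstdio>
#include <cstddef>

// Returns, for each of n_ranks, the index of the first cell it owns
// (plus a final sentinel entry equal to the number of cells).
std::vector<std::size_t> balance_1d(const std::vector<long>& counts, int n_ranks) {
    long total = 0;
    for (long c : counts) total += c;
    std::vector<std::size_t> first(n_ranks + 1, counts.size());
    first[0] = 0;
    long running = 0;
    int next_rank = 1;
    for (std::size_t cell = 0; cell < counts.size() && next_rank < n_ranks; ++cell) {
        running += counts[cell];
        // Cut as soon as the running total reaches the next rank's quota.
        while (next_rank < n_ranks && running >= total * next_rank / n_ranks)
            first[next_rank++] = cell + 1;
    }
    return first;                 // cells [first[r], first[r+1]) belong to rank r
}

int main() {
    // A wakefield-like density spike: most particles sit in a few cells.
    std::vector<long> counts(64, 10);
    for (std::size_t c = 28; c < 36; ++c) counts[c] = 5000;
    auto cut = balance_1d(counts, 4);
    for (int r = 0; r < 4; ++r)
        std::printf("rank %d: cells [%zu, %zu)\n", r, cut[r], cut[r + 1]);
    return 0;
}
```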
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighbor searching, and we discuss the present and future challenges we see for
exascale simulation, in particular very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity.
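As a hypothetical illustration of the SIMD acceleration mentioned in the abstract (not GROMACS code, and not its actual cluster-pair kernels), the fragment below evaluates the Lennard-Jones pair energy for four pairs at a time using AVX double-precision intrinsics.

```cpp
#include <immintrin.h>
#include <cstdio>

// E(r2) = 4*eps*((sigma^2/r2)^6 - (sigma^2/r2)^3), evaluated for four pairs
// per call using 256-bit vectors of doubles.
__m256d lj_energy4(__m256d r2, double sigma, double eps) {
    const __m256d s2   = _mm256_set1_pd(sigma * sigma);
    const __m256d c4e  = _mm256_set1_pd(4.0 * eps);
    const __m256d sr2  = _mm256_div_pd(s2, r2);                    // (sigma/r)^2
    const __m256d sr6  = _mm256_mul_pd(_mm256_mul_pd(sr2, sr2), sr2);
    const __m256d sr12 = _mm256_mul_pd(sr6, sr6);
    return _mm256_mul_pd(c4e, _mm256_sub_pd(sr12, sr6));
}

int main() {
    alignas(32) double r2[4] = {1.00, 1.12, 1.30, 1.60};           // squared distances
    alignas(32) double e[4];
    _mm256_store_pd(e, lj_energy4(_mm256_load_pd(r2), 1.0, 1.0));
    for (int i = 0; i < 4; ++i)
        std::printf("pair %d: E = %+.4f\n", i, e[i]);
    return 0;
}
```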
Parallelization of Kinetic Theory Simulations
Numerical studies of shock waves in large scale systems via kinetic
simulations with millions of particles are too computationally demanding to be
processed in serial. In this work we focus on optimizing the parallel
performance of a kinetic Monte Carlo code for astrophysical simulations such as
core-collapse supernovae. Our goal is to attain a flexible program that scales
well with the architecture of modern supercomputers. This approach requires a
hybrid model of programming that combines a message passing interface (MPI)
with a multithreading model (OpenMP) in C++. We report on our approach to
implement the hybrid design into the kinetic code and show first results which
demonstrate a significant gain in performance when many processors are used.
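A minimal skeleton of the hybrid MPI+OpenMP layout described above could look as follows; the physics is replaced by a toy Monte Carlo estimate of pi, so this is a structural sketch only, not the authors' kinetic code.

```cpp
#include <mpi.h>
#include <omp.h>
#include <random>
#include <cstdio>

int main(int argc, char** argv) {
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long samples_per_rank = 1000000;
    long local_hits = 0;

    // Threads within each rank: every thread gets its own RNG stream.
    #pragma omp parallel reduction(+ : local_hits)
    {
        std::mt19937_64 rng(12345u + 1000u * rank + omp_get_thread_num());
        std::uniform_real_distribution<double> u(0.0, 1.0);
        #pragma omp for
        for (long i = 0; i < samples_per_rank; ++i) {
            const double x = u(rng), y = u(rng);
            if (x * x + y * y < 1.0) ++local_hits;
        }
    }

    // Combine the per-rank tallies across the whole job.
    long total_hits = 0;
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("pi ~= %.6f\n",
                    4.0 * double(total_hits) / double(samples_per_rank * size));
    MPI_Finalize();
    return 0;
}
```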
A portable platform for accelerated PIC codes and its application to GPUs using OpenACC
We present a portable platform, called PIC_ENGINE, for accelerating
Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as
Graphics Processing Units (GPUs). The aim of this development is efficient
simulations on future exascale systems by allowing different parallelization
strategies depending on the application problem and the specific architecture.
To this end, this platform contains the basic steps of the PIC algorithm and
has been designed as a test bed for different algorithmic options and data
structures. Among the architectures that this engine can explore, particular
attention is given here to systems equipped with GPUs. The study demonstrates
that our portable PIC implementation based on the OpenACC programming model can
achieve performance closely matching theoretical predictions. Using the Cray
XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we
show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the
one on an Intel Sandybridge 8-core CPU by a factor of 3.4.
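The sketch below conveys the flavor of the OpenACC offloading reported above, not PIC_ENGINE itself: a structure-of-arrays particle push whose loop is offloaded with a single parallel loop directive, under the simplifying assumption that the field has already been gathered to each particle.

```cpp
#include <vector>
#include <cstdio>

struct Particles {                 // structure of arrays: GPU-friendly layout
    std::vector<double> x, v;
};

// Leapfrog-style push; E holds the field value already gathered per particle
// (a simplification: a real PIC code interpolates from the grid here).
void push(Particles& p, const std::vector<double>& E, double qm, double dt) {
    const std::size_t n = p.x.size();
    double* x = p.x.data();
    double* v = p.v.data();
    const double* e = E.data();
    // One directive offloads the whole particle loop; the data clauses move
    // the arrays to the device for the duration of the loop.
    #pragma acc parallel loop copy(x[0:n], v[0:n]) copyin(e[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        v[i] += qm * e[i] * dt;    // velocity update from local field
        x[i] += v[i] * dt;         // position update
    }
}

int main() {
    const std::size_t n = 1 << 20;
    Particles p{std::vector<double>(n, 0.0), std::vector<double>(n, 0.0)};
    std::vector<double> E(n, 1.0e-3);
    for (int step = 0; step < 10; ++step) push(p, E, -1.0, 0.1);
    std::printf("x[0] = %g, v[0] = %g\n", p.x[0], p.v[0]);
    return 0;
}
```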
SPH-EXA: Enhancing the Scalability of SPH codes Via an Exascale-Ready SPH Mini-App
Numerical simulations of fluids in astrophysics and computational fluid
dynamics (CFD) are among the most computationally-demanding calculations, in
terms of sustained floating-point operations per second, or FLOP/s. It is
expected that these numerical simulations will significantly benefit from the
future Exascale computing infrastructures, which will perform 10^18 FLOP/s. The
performance of the SPH codes is, in general, adversely impacted by several
factors, such as multiple time-stepping, long-range interactions, and/or
boundary conditions. In this work, an extensive study of three SPH
implementations (SPHYNX, ChaNGa, and XXX) is performed to gain insights and to
expose any limitations and characteristics of the codes. These codes are the
starting point of an interdisciplinary co-design project, SPH-EXA, for the
development of an Exascale-ready SPH mini-app. We implemented a rotating square
patch as a joint test simulation for the three SPH codes and analyzed their
performance on a modern HPC system, Piz Daint. The performance profiling and
scalability analysis conducted on the three parent codes allowed us to expose
their performance issues, such as load imbalance, both in MPI and OpenMP.
Two-level load balancing has been successfully applied to SPHYNX to overcome
its load imbalance. The performance analysis shapes and drives the design of
the SPH-EXA mini-app towards the use of efficient parallelization methods,
fault-tolerance mechanisms, and load balancing approaches.
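To illustrate the intra-node half of such a two-level load balance (an assumption-laden sketch, not SPH-EXA or SPHYNX code), the example below contrasts static and dynamic OpenMP scheduling when per-particle work varies strongly with neighbor count, as it does when the dense region of an SPH domain is concentrated in one part of the particle array.

```cpp
#include <omp.h>
#include <vector>
#include <cmath>
#include <cstdio>

// Stand-in for a per-particle kernel sum whose cost scales with the number
// of neighbors.
double density_like_work(int neighbors) {
    double s = 0.0;
    for (int j = 0; j < neighbors; ++j)
        s += std::exp(-0.01 * j);
    return s;
}

int main() {
    const int n = 200000;
    std::vector<int> nn(n);
    for (int i = 0; i < n; ++i)          // dense region at the start of the array
        nn[i] = (i < n / 10) ? 2000 : 50;

    // Dynamic chunks let threads steal work from the expensive region.
    double sum = 0.0, t0 = omp_get_wtime();
    #pragma omp parallel for schedule(dynamic, 256) reduction(+ : sum)
    for (int i = 0; i < n; ++i)
        sum += density_like_work(nn[i]);
    std::printf("dynamic: %.3fs (sum %.1f)\n", omp_get_wtime() - t0, sum);

    // Static chunking hands the whole expensive region to the first thread.
    sum = 0.0; t0 = omp_get_wtime();
    #pragma omp parallel for schedule(static) reduction(+ : sum)
    for (int i = 0; i < n; ++i)
        sum += density_like_work(nn[i]);
    std::printf("static : %.3fs (sum %.1f)\n", omp_get_wtime() - t0, sum);
    return 0;
}
```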
Parallel TREE code for two-component ultracold plasma analysis
The TREE method has been widely used for long-range interaction N-body
problems. We have developed a parallel TREE code for two-component classical
plasmas with open boundary conditions and highly non-uniform charge
distributions. The program efficiently handles millions of particles evolved
over long relaxation times requiring millions of time steps. Appropriate domain
decomposition and dynamic data management were employed, and large-scale
parallel processing was achieved using an intermediate level of granularity of
domain decomposition and ghost TREE communication. Even though the
computational load is not fully distributed in fine grains, high parallel
efficiency was achieved for ultracold plasma systems of charged particles. As
an application, we performed simulations of an ultracold neutral plasma with a
half million particles and a half million time steps. For the long temporal
trajectories of relaxation between heavy ions and light electrons, large
configurations of ultracold plasmas can now be investigated, which was not
possible in past studies.
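As a rough sketch of the ghost communication pattern the abstract mentions (not the authors' TREE code, and with the tree construction itself left out), the example below exchanges boundary particles between neighboring ranks of a periodic 1D slab decomposition using MPI_Sendrecv.

```cpp
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank owns the slab [rank, rank+1) in x; toy local particle positions.
    std::vector<double> x;
    for (int i = 0; i < 100; ++i) x.push_back(rank + i / 100.0);

    // Particles within `cut` of the right edge become ghosts for the next rank.
    const double cut = 0.1;
    std::vector<double> send;
    for (double xi : x)
        if (xi > rank + 1 - cut) send.push_back(xi);

    const int right = (rank + 1) % size;          // periodic neighbors
    const int left  = (rank - 1 + size) % size;

    // Exchange counts first, then the ghost coordinates themselves.
    int nsend = static_cast<int>(send.size()), nrecv = 0;
    MPI_Sendrecv(&nsend, 1, MPI_INT, right, 0,
                 &nrecv, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    std::vector<double> ghosts(nrecv);
    MPI_Sendrecv(send.data(), nsend, MPI_DOUBLE, right, 1,
                 ghosts.data(), nrecv, MPI_DOUBLE, left, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    std::printf("rank %d: sent %d ghosts right, received %d from the left\n",
                rank, nsend, nrecv);
    MPI_Finalize();
    return 0;
}
```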