An adaptive hierarchical domain decomposition method for parallel contact dynamics simulations of granular materials
A fully parallel version of the contact dynamics (CD) method is presented in
this paper. For large enough systems, 100% efficiency has been demonstrated for
up to 256 processors using a hierarchical domain decomposition with dynamic
load balancing. The iterative scheme to calculate the contact forces is left
domain-wise sequential, with data exchange after each iteration step, which
ensures its stability. The number of additional iterations required for
convergence by the partially parallel updates at the domain boundaries becomes
negligible with increasing number of particles, which allows for an effective
parallelization. Compared to the sequential implementation, we found no
influence of the parallelization on simulation results.
Comment: 19 pages, 15 figures, published in Journal of Computational Physics (2011)
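As a hedged illustration of the idea (not the paper's actual scheme), a hierarchical domain decomposition can be built by recursive bisection: split the particle set along its longest axis at the median, so that each of the 2^depth domains receives a balanced load. The function name and median-split rule below are illustrative assumptions:

```python
import numpy as np

def bisect(points, depth):
    """Recursively split particles along the longest axis at the median,
    yielding 2**depth domains with (nearly) equal particle counts."""
    if depth == 0:
        return [points]
    axis = np.argmax(points.max(axis=0) - points.min(axis=0))
    order = np.argsort(points[:, axis])
    half = len(points) // 2
    left, right = points[order[:half]], points[order[half:]]
    return bisect(left, depth - 1) + bisect(right, depth - 1)

rng = np.random.default_rng(0)
pts = rng.random((1000, 2))
domains = bisect(pts, 3)          # 2**3 = 8 domains
print([len(d) for d in domains])  # balanced particle counts
```

Re-running the bisection as particles move is one simple way to realize the dynamic load balancing the abstract mentions.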
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighbor searching, and we discuss the present and future challenges we see for
exascale simulation - in particular a very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity.
Comment: EASC 2014 conference proceedings
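GROMACS 4.6 replaced classic neighbor lists with a cluster-pair scheme tuned for SIMD and GPUs; as a hedged illustration of the underlying idea only, here is a textbook cell-list neighbor search in Python (all names and parameters are illustrative, not GROMACS code):

```python
import numpy as np
from collections import defaultdict

def cell_list_pairs(pos, box, rc):
    """Find all pairs within cutoff rc in a cubic periodic box by binning
    particles into cells of side >= rc and scanning the 27 neighbor cells."""
    ncell = max(1, int(box // rc))
    side = box / ncell
    cells = defaultdict(list)
    for i, p in enumerate(pos):
        cells[tuple((p // side).astype(int) % ncell)].append(i)
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    nb = ((cx + dx) % ncell, (cy + dy) % ncell, (cz + dz) % ncell)
                    for i in members:
                        for j in cells.get(nb, ()):
                            if i < j:
                                d = pos[i] - pos[j]
                                d -= box * np.round(d / box)  # minimum image
                                if d @ d < rc * rc:
                                    pairs.add((i, j))
    return pairs

rng = np.random.default_rng(1)
pos = rng.random((50, 3)) * 5.0
print(len(cell_list_pairs(pos, box=5.0, rc=1.2)), "pairs within cutoff")
```

This reduces pair search from O(N^2) to roughly O(N) at fixed density, which is the property cluster-based schemes then reshape for SIMD-friendly memory access.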
Efficient Implementations of Molecular Dynamics Simulations for Lennard-Jones Systems
Efficient implementations of the classical molecular dynamics (MD) method for
Lennard-Jones particle systems are considered. Both general algorithms and
techniques tailored to specific CPU architectures are explained. A simple
spatial-decomposition strategy is adopted for
parallelization. By utilizing the developed code, benchmark simulations are
performed on a HITACHI SR16000/J2 system (4.7 GHz IBM POWER6 processors) at
the National Institute for Fusion Science (NIFS) and an SGI Altix ICE 8400EX
system (2.93 GHz Intel Xeon processors) at the Institute for Solid State
Physics (ISSP), the University of Tokyo.
The parallelization efficiency of the largest run, consisting of 4.1 billion
particles with 8192 MPI processes, is about 73% relative to that of the
smallest run with 128 MPI processes at NIFS, and it is about 66% relative to
that of the smallest run with 4 MPI processes at ISSP. The factors causing the
parallel overhead are investigated. It is found that fluctuations of the
execution time of each process degrade the parallel efficiency. These
fluctuations may be due to the interference of the operating system, which is
known as OS jitter.
Comment: 33 pages, 19 figures; references added and figures revised
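For context, the Lennard-Jones interaction at the heart of such benchmarks fits in a few lines. The following is a minimal O(N^2) reference sketch in reduced units (epsilon = sigma = 1) with minimum-image periodic boundaries, not the paper's optimized implementation:

```python
import numpy as np

def lj_energy_forces(pos, box, rc=2.5):
    """Truncated Lennard-Jones: U(r) = 4*(r^-12 - r^-6) for r < rc,
    summed over all pairs with minimum-image periodic boundaries."""
    n = len(pos)
    U = 0.0
    F = np.zeros_like(pos)
    for i in range(n - 1):
        d = pos[i] - pos[i + 1:]
        d -= box * np.round(d / box)          # minimum image
        r2 = (d * d).sum(axis=1)
        mask = r2 < rc * rc
        inv6 = 1.0 / r2[mask] ** 3
        U += 4.0 * (inv6 * inv6 - inv6).sum()
        # pair force magnitude / r = 24*(2*r^-12 - r^-6)/r^2
        fmag = 24.0 * (2.0 * inv6 * inv6 - inv6) / r2[mask]
        fij = fmag[:, None] * d[mask]
        F[i] += fij.sum(axis=0)
        F[i + 1:][mask] -= fij
    return U, F

# Two particles at the potential minimum r = 2^(1/6): U = -1, zero force.
pos = np.array([[0.0, 0.0, 0.0], [2 ** (1 / 6), 0.0, 0.0]])
U, F = lj_energy_forces(pos, box=10.0)
print(U, np.abs(F).max())
```

Production codes replace the double loop with the spatial decomposition and architecture-specific kernels the abstract describes.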
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms
We present a mathematical framework for constructing and analyzing parallel
algorithms for lattice Kinetic Monte Carlo (KMC) simulations. The resulting
algorithms have the capacity to simulate a wide range of spatio-temporal scales
in spatially distributed, non-equilibrium physicochemical processes with complex
chemistry and transport micro-mechanisms. The algorithms can be tailored to
specific hierarchical parallel architectures such as multi-core processors or
clusters of Graphical Processing Units (GPUs). The proposed parallel algorithms
are controlled-error approximations of kinetic Monte Carlo algorithms,
departing from the predominant paradigm of creating parallel KMC algorithms
with exactly the same master equation as the serial one.
Our methodology relies on a spatial decomposition of the Markov operator
underlying the KMC algorithm into a hierarchy of operators corresponding to the
processors' structure in the parallel architecture. Based on this operator
decomposition, we formulate Fractional Step Approximation schemes by employing
the Trotter Theorem and its random variants; these schemes (a) determine the
communication schedule between processors, and (b) are run independently on
each processor through a serial KMC simulation, called a kernel, on each
fractional step time-window.
Furthermore, the proposed mathematical framework allows us to rigorously
justify the numerical and statistical consistency of the proposed algorithms,
showing the convergence of our approximating schemes to the original serial
KMC. The approach also provides a systematic evaluation of different processor
communication schedules.
Comment: 34 pages, 9 figures
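The fractional-step idea can be sketched on a toy lattice of independent adsorption/desorption sites. This is a deliberately simplified model (real KMC kernels have interacting rates and boundary communication, which this sketch omits), and every name below is illustrative:

```python
import random

def kmc_kernel(lattice, sites, t_window, rate_ads=1.0, rate_des=1.0):
    """Serial Gillespie KMC restricted to `sites`, run for one fractional-step
    time window: each site flips 0->1 at rate_ads and 1->0 at rate_des."""
    t = 0.0
    while True:
        rates = [rate_des if lattice[s] else rate_ads for s in sites]
        R = sum(rates)
        t += random.expovariate(R)            # time to the next event
        if t >= t_window:
            return
        x = random.uniform(0, R)              # pick an event proportional to rate
        for s, r in zip(sites, rates):
            x -= r
            if x < 0:
                lattice[s] ^= 1
                break

def fractional_step(lattice, t_end, dt):
    """Lie-Trotter splitting: partition the lattice into two blocks and, within
    each dt window, run the serial kernel on each block independently (the two
    blocks would run concurrently on two processors)."""
    half = len(lattice) // 2
    blocks = [list(range(half)), list(range(half, len(lattice)))]
    t = 0.0
    while t < t_end:
        for b in blocks:                      # in parallel on a real machine
            kmc_kernel(lattice, b, dt)
        t += dt

random.seed(2)
lat = [0] * 16
fractional_step(lat, t_end=5.0, dt=0.5)
print(sum(lat), "occupied of", len(lat))
```

In the paper's framework the splitting error of such schemes is controlled by the window dt, which is what makes the approximation error analyzable rather than the master equation exactly preserved.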
Achieving Extreme Resolution in Numerical Cosmology Using Adaptive Mesh Refinement: Resolving Primordial Star Formation
As an entry for the 2001 Gordon Bell Award in the "special" category, we
describe our 3-d, hybrid, adaptive mesh refinement (AMR) code, Enzo, designed
for high-resolution, multiphysics, cosmological structure formation
simulations. Our parallel implementation places no limit on the depth or
complexity of the adaptive grid hierarchy, allowing us to achieve unprecedented
spatial and temporal dynamic range. We report on a simulation of primordial
star formation which develops over 8000 subgrids at 34 levels of refinement to
achieve a local refinement of a factor of 10^12 in space and time. This allows
us to resolve the properties of the first stars which form in the universe
assuming standard physics and a standard cosmological model. Achieving extreme
resolution requires the use of 128-bit extended precision arithmetic (EPA) to
accurately specify the subgrid positions. We describe our EPA AMR
implementation on the IBM SP2 Blue Horizon system at the San Diego
Supercomputer Center.
Comment: 23 pages, 5 figures. Peer reviewed technical paper accepted to the
proceedings of Supercomputing 2001. This entry was a Gordon Bell Prize
finalist. For more information visit http://www.TomAbel.com/GB
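One common software route to roughly 128-bit precision is double-double arithmetic built from error-free transformations. The sketch below is a generic illustration of that technique (not Enzo's actual EPA code), showing how a position offset far below double precision survives in the low word:

```python
def two_sum(a, b):
    """Error-free transformation (Knuth's TwoSum): returns (s, e) with
    s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def dd_add(x, y):
    """Add two double-double numbers (hi, lo), keeping ~32 significant
    digits -- the kind of precision needed to place deeply refined subgrids."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    hi, lo = two_sum(s, e)
    return (hi, lo)

pos = (1.0, 0.0)
tiny = (1e-20, 0.0)        # far below double-precision resolution at 1.0
moved = dd_add(pos, tiny)
print(moved)               # the 1e-20 offset is preserved in the low word
```

At 34 levels of refinement the ratio between the root grid and the finest cells exceeds what 53-bit doubles can represent, which is why some form of extended precision becomes necessary.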