4,096 research outputs found
Performance of a second order electrostatic particle-in-cell algorithm on modern many-core architectures
In this paper we present the outline of a novel electrostatic, second order Particle-in-Cell (PIC) algorithm that makes use of 'ghost particles' located around true particle positions in order to represent a charge distribution. We implement our algorithm within EMPIRE-PIC, a PIC code developed at Sandia National Laboratories. We test the performance of our algorithm on a variety of many-core architectures including NVIDIA GPUs, conventional CPUs, and Intel's Knights Landing. Our preliminary results show the viability of second order methods for PIC applications on these architectures when compared to previous generations of many-core hardware. Specifically, we see an order of magnitude improvement in performance for second order methods between the Tesla K20 and Tesla P100 GPU devices, despite only a 4× improvement in the theoretical peak performance between the devices. Although these initial results show a large increase in runtime over first order methods, we hope to be able to show improved scaling behaviour and increased simulation accuracy in the future.
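The abstract does not spell out the ghost-particle representation, so as a point of reference the sketch below shows a conventional second-order charge deposition, the quadratic (TSC) B-spline shape, on a 1D periodic grid. The function name and parameters are illustrative, not EMPIRE-PIC's API.

```python
import math

def deposit_charge_tsc(positions, charges, n_cells, dx):
    """Deposit particle charges onto a 1D periodic grid with the quadratic
    (TSC) B-spline shape, a standard second-order deposition scheme.
    Illustrative only -- not EMPIRE-PIC's ghost-particle representation."""
    rho = [0.0] * n_cells
    for x, q in zip(positions, charges):
        xg = x / dx                       # particle position in grid units
        i = int(math.floor(xg + 0.5))     # index of nearest grid point
        d = xg - i                        # offset from that point, in [-0.5, 0.5)
        weights = (0.5 * (0.5 - d) ** 2,  # node i-1
                   0.75 - d * d,          # node i
                   0.5 * (0.5 + d) ** 2)  # node i+1 (the three weights sum to 1)
        for k, w in zip((-1, 0, 1), weights):
            rho[(i + k) % n_cells] += q * w / dx
    return rho
```

Because the weights sum to one, total charge on the grid equals the total particle charge, which is the usual sanity check for any deposition order.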
A portable platform for accelerated PIC codes and its application to GPUs using OpenACC
We present a portable platform, called PIC_ENGINE, for accelerating
Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as
Graphic Processing Units (GPUs). The aim of this development is efficient
simulations on future exascale systems by allowing different parallelization
strategies depending on the application problem and the specific architecture.
To this end, this platform contains the basic steps of the PIC algorithm and
has been designed as a test bed for different algorithmic options and data
structures. Among the architectures that this engine can explore, particular
attention is given here to systems equipped with GPUs. The study demonstrates
that our portable PIC implementation based on the OpenACC programming model can
achieve performance closely matching theoretical predictions. Using the Cray
XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we
show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the
one on an Intel Sandybridge 8-core CPU by a factor of 3.4.
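The "basic steps of the PIC algorithm" that such a platform encapsulates can be sketched as follows. This is a deliberately naive serial 1D version (nearest-grid-point deposition, explicit Euler push, normalized units), meant only to show the deposit/solve/gather/push structure that each backend would parallelize differently; it is not the PIC_ENGINE code.

```python
EPS0 = 1.0  # vacuum permittivity in normalized units (assumption for the sketch)

def pic_step(xs, vs, q, m, n_cells, dx, dt):
    """One electrostatic PIC cycle in 1D with periodic boundaries, showing
    the four canonical steps: deposit -> field solve -> gather -> push.
    Nearest-grid-point deposition and an explicit Euler push for brevity."""
    L = n_cells * dx
    # 1) charge deposition
    rho = [0.0] * n_cells
    for x in xs:
        rho[int(x / dx) % n_cells] += q / dx
    rho_mean = sum(rho) / n_cells
    # 2) field solve: integrate dE/dx = (rho - <rho>) / eps0, then de-mean E
    E = [0.0] * n_cells
    for i in range(1, n_cells):
        E[i] = E[i - 1] + dx * (rho[i - 1] - rho_mean) / EPS0
    e_mean = sum(E) / n_cells
    E = [e - e_mean for e in E]
    # 3) field gather and 4) particle push
    new_xs, new_vs = [], []
    for x, v in zip(xs, vs):
        a = (q / m) * E[int(x / dx) % n_cells]
        v_new = v + a * dt
        new_xs.append((x + v_new * dt) % L)
        new_vs.append(v_new)
    return new_xs, new_vs
```

On a GPU, the gather/push loop is trivially data-parallel, while the deposition step needs atomic updates or particle sorting, which is exactly the kind of algorithmic option a test-bed platform lets one compare.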
Multi-Architecture Monte-Carlo (MC) Simulation of Soft Coarse-Grained Polymeric Materials: SOft coarse grained Monte-carlo Acceleration (SOMA)
Multi-component polymer systems are important for the development of new
materials because of their ability to phase-separate or self-assemble into
nano-structures. The Single-Chain-in-Mean-Field (SCMF) algorithm in conjunction
with a soft, coarse-grained polymer model is an established technique to
investigate these soft-matter systems. Here we present an implementation of
this method: SOft coarse grained Monte-carlo Acceleration (SOMA). It is
suitable to simulate large system sizes with up to billions of particles, yet
versatile enough to study properties of different kinds of molecular
architectures and interactions. We achieve efficient simulations by
commissioning accelerators such as GPUs on both workstations and
supercomputers. The implementation remains flexible and maintainable because
it is written in a scientific programming language enhanced by OpenACC pragmas
for the accelerators. We present implementation details and features of the
program package, investigate the scalability of our implementation SOMA, and
discuss two applications, which cover system sizes that are difficult to reach
with other, common particle-based simulation methods.
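SOMA's actual Hamiltonian (bonded chains, multiple species, incompatibility terms) is richer than the abstract conveys; the toy sweep below illustrates only the general SCMF idea of single-particle Monte Carlo moves judged against a grid-based density functional, here reduced to a single-species compressibility penalty H = (kappa/2) * sum_i (phi_i - 1)^2. All names and parameters are illustrative assumptions.

```python
import math
import random

def scmf_sweep(xs, cell_counts, n_cells, dx, n_per_cell, kappa, beta, step, rng):
    """One Monte Carlo sweep in the spirit of SCMF: each particle attempts a
    random displacement, accepted via the Metropolis rule against a grid
    density functional. Toy single-species compressibility penalty only:
        H = (kappa / 2) * sum_i (phi_i - 1)^2,  phi_i = count_i / n_per_cell.
    Not SOMA's full multi-species Hamiltonian."""
    L = n_cells * dx
    for p in range(len(xs)):
        old = xs[p]
        new = (old + rng.uniform(-step, step)) % L
        ci = int(old / dx) % n_cells
        cj = int(new / dx) % n_cells
        if ci == cj:                  # local density unchanged: free move
            xs[p] = new
            continue
        phi_i = cell_counts[ci] / n_per_cell
        phi_j = cell_counts[cj] / n_per_cell
        d = 1.0 / n_per_cell          # density change from moving one particle
        dE = 0.5 * kappa * ((phi_i - d - 1.0) ** 2 - (phi_i - 1.0) ** 2
                            + (phi_j + d - 1.0) ** 2 - (phi_j - 1.0) ** 2)
        if dE <= 0.0 or rng.random() < math.exp(-beta * dE):
            xs[p] = new
            cell_counts[ci] -= 1
            cell_counts[cj] += 1
```

Because the energy change depends only on the two affected cells, moves in different grid regions are independent, which is the locality that makes the method amenable to GPU acceleration.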
The Fast Multipole Method and Point Dipole Moment Polarizable Force Fields
We present an implementation of the fast multipole method for computing
Coulombic electrostatic and polarization forces from polarizable force fields
based on induced point dipole moments. We demonstrate the expected
scaling of that approach by performing single energy point calculations on
hexamer protein subunits of the mature HIV-1 capsid. We also show the long time
energy conservation in molecular dynamics at the nanosecond scale by performing
simulations of a protein complex embedded in a coarse-grained solvent using a
standard integrator and a multiple time step integrator. Our tests show the
applicability of FMM combined with state-of-the-art chemical models in
molecular dynamical systems. Comment: 11 pages, 8 figures, accepted by J. Chem. Phys.
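Whatever the multipole machinery, the quantity being accelerated is a pairwise sum over dipole interactions; the brute-force O(N^2) baseline below (Gaussian units, fixed rather than induced dipoles for simplicity, names illustrative) is the reference result a fast multipole evaluation must reproduce to the chosen expansion accuracy.

```python
def dipole_energy_direct(pos, dip):
    """Brute-force O(N^2) sum of point-dipole pair energies (Gaussian units):
        U_ij = [p_i . p_j - 3 (p_i . r_hat)(p_j . r_hat)] / r^3.
    Fixed dipoles for simplicity; a polarizable model would iterate the
    induced moments to self-consistency before evaluating this sum."""
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    U = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r = tuple(pos[j][k] - pos[i][k] for k in range(3))
            r2 = dot(r, r)
            r1 = r2 ** 0.5
            r_hat = tuple(c / r1 for c in r)  # unit separation vector
            U += (dot(dip[i], dip[j])
                  - 3.0 * dot(dip[i], r_hat) * dot(dip[j], r_hat)) / (r1 * r2)
    return U
```

For two parallel unit dipoles aligned head-to-tail along their separation axis, the formula gives U = -2 / r^3, a convenient analytic check.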
Efficient Implementations of Molecular Dynamics Simulations for Lennard-Jones Systems
Efficient implementations of the classical molecular dynamics (MD) method for
Lennard-Jones particle systems are considered. Not only general algorithms but
also techniques that are efficient for specific CPU architectures are
explained. A simple spatial-decomposition-based strategy is adopted for
parallelization. By utilizing the developed code, benchmark simulations are
performed on a HITACHI SR16000/J2 system with 4.7 GHz IBM POWER6 processors
at the National Institute for Fusion Science (NIFS) and an SGI Altix ICE
8400EX system with 2.93 GHz Intel Xeon processors at the Institute for Solid
State Physics (ISSP), the University of Tokyo.
The parallelization efficiency of the largest run, consisting of 4.1 billion
particles with 8192 MPI processes, is about 73% relative to that of the
smallest run with 128 MPI processes at NIFS, and it is about 66% relative to
that of the smallest run with 4 MPI processes at ISSP. The factors causing the
parallel overhead are investigated. It is found that fluctuations of the
execution time of each process degrade the parallel efficiency. These
fluctuations may be due to the interference of the operating system, which is
known as OS jitter. Comment: 33 pages, 19 figures; references added and figures revised.
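The spatial-decomposition strategy mentioned above rests on the same locality that linked-cell lists exploit: with a cutoff, only particles in neighbouring cells can interact. A minimal serial sketch (reduced units sigma = eps = 1, cubic periodic box; function name and layout are illustrative, not the paper's code):

```python
def lj_energy_cells(pos, box, rc):
    """Lennard-Jones potential energy with cutoff rc via a linked-cell
    decomposition: the cubic periodic box of side `box` is split into cells
    at least rc wide, so only pairs in adjacent cells are examined.
    Minimum-image convention; U(r) = 4 (r^-12 - r^-6)."""
    nc = max(1, int(box / rc))  # cells per side, each at least rc wide
    cl = box / nc
    cells = {}
    for idx, p in enumerate(pos):
        key = tuple(int(c / cl) % nc for c in p)
        cells.setdefault(key, []).append(idx)
    U = 0.0
    seen = set()  # guards against double-counting when cells wrap (small nc)
    for (cx, cy, cz), members in cells.items():
        for ox in (-1, 0, 1):
            for oy in (-1, 0, 1):
                for oz in (-1, 0, 1):
                    nb = ((cx + ox) % nc, (cy + oy) % nc, (cz + oz) % nc)
                    for i in members:
                        for j in cells.get(nb, ()):
                            if i < j and (i, j) not in seen:
                                seen.add((i, j))
                                d2 = 0.0
                                for k in range(3):
                                    d = pos[i][k] - pos[j][k]
                                    d -= box * round(d / box)
                                    d2 += d * d
                                if d2 < rc * rc:
                                    inv6 = 1.0 / d2 ** 3
                                    U += 4.0 * (inv6 * inv6 - inv6)
    return U
```

In a parallel MD code the same cell structure defines the per-rank domains and the halo of neighbour cells that must be exchanged each step, which is where the communication overhead measured in the paper originates.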
HARES: an efficient method for first-principles electronic structure calculations of complex systems
We discuss our new implementation of the Real-space Electronic Structure
method for studying the atomic and electronic structure of infinite periodic as
well as finite systems, based on density functional theory. This improved
version which we call HARES (for High-performance-fortran Adaptive grid
Real-space Electronic Structure) aims at making the method widely applicable
and efficient, using high performance Fortran on parallel architectures. The
scaling of various parts of a HARES calculation is analyzed and compared to
that of plane-wave based methods. The new developments that lead to enhanced
performance, and their parallel implementation, are presented in detail. We
illustrate the application of HARES to the study of elemental crystalline
solids, molecules and complex crystalline materials, such as blue bronze and
zeolites. Comment: 17 two-column pages, including 9 figures and 5 tables. To appear in
Computer Physics Communications. Several minor revisions based on feedback.
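The core move of any real-space method is replacing plane-wave operators with finite-difference stencils on a grid. As a minimal illustration (a plain uniform-grid second-order stencil, not the HARES adaptive grid or its high-order stencils):

```python
import math

def fd_laplacian_1d(f, h):
    """Second-order central-difference Laplacian on a uniform periodic 1D
    grid -- the kind of local stencil that stands in for the kinetic-energy
    operator in real-space electronic structure codes."""
    n = len(f)
    return [(f[(i - 1) % n] - 2.0 * f[i] + f[(i + 1) % n]) / (h * h)
            for i in range(n)]
```

Applied to sin(x) the stencil should reproduce -sin(x) up to an O(h^2) truncation error; because the operator only touches nearest neighbours, it parallelizes with purely local communication, which is the scaling advantage such methods claim over plane-wave approaches.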
- …