4,096 research outputs found
Performance of a second order electrostatic particle-in-cell algorithm on modern many-core architectures
In this paper we present the outline of a novel electrostatic, second order Particle-in-Cell (PIC) algorithm that makes use of 'ghost particles' located around true particle positions in order to represent a charge distribution. We implement our algorithm within EMPIRE-PIC, a PIC code developed at Sandia National Laboratories. We test the performance of our algorithm on a variety of many-core architectures including NVIDIA GPUs, conventional CPUs, and Intel's Knights Landing. Our preliminary results show the viability of second order methods for PIC applications on these architectures when compared to previous generations of many-core hardware. Specifically, we see an order of magnitude improvement in performance for second order methods between the Tesla K20 and Tesla P100 GPU devices, despite only a 4× improvement in the theoretical peak performance between the devices. Although these initial results show a large increase in runtime over first order methods, we hope to be able to show improved scaling behaviour and increased simulation accuracy in the future.
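The abstract does not spell out the ghost-particle representation, so as a point of reference the sketch below shows a conventional second-order charge deposition, the quadratic (TSC) B-spline shape, on a 1D periodic grid. The function name and parameters are illustrative, not EMPIRE-PIC's API.

```python
import math

def deposit_charge_tsc(positions, charges, n_cells, dx):
    """Deposit particle charges onto a 1D periodic grid with the quadratic
    (TSC) B-spline shape, a standard second-order deposition scheme.
    Illustrative only -- not EMPIRE-PIC's ghost-particle representation."""
    rho = [0.0] * n_cells
    for x, q in zip(positions, charges):
        xg = x / dx                       # particle position in grid units
        i = int(math.floor(xg + 0.5))     # index of nearest grid point
        d = xg - i                        # offset from that point, in [-0.5, 0.5)
        weights = (0.5 * (0.5 - d) ** 2,  # node i-1
                   0.75 - d * d,          # node i
                   0.5 * (0.5 + d) ** 2)  # node i+1 (the three weights sum to 1)
        for k, w in zip((-1, 0, 1), weights):
            rho[(i + k) % n_cells] += q * w / dx
    return rho
```

Because the weights sum to one, total charge on the grid equals the total particle charge, which is the usual sanity check for any deposition order.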
A portable platform for accelerated PIC codes and its application to GPUs using OpenACC
We present a portable platform, called PIC_ENGINE, for accelerating
Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as
Graphic Processing Units (GPUs). The aim of this development is efficient
simulations on future exascale systems by allowing different parallelization
strategies depending on the application problem and the specific architecture.
To this end, this platform contains the basic steps of the PIC algorithm and
has been designed as a test bed for different algorithmic options and data
structures. Among the architectures that this engine can explore, particular
attention is given here to systems equipped with GPUs. The study demonstrates
that our portable PIC implementation based on the OpenACC programming model can
achieve performance closely matching theoretical predictions. Using the Cray
XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we
show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the
one on an Intel Sandybridge 8-core CPU by a factor of 3.4.
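The "basic steps of the PIC algorithm" that such a platform encapsulates can be sketched as follows. This is a deliberately naive serial 1D version (nearest-grid-point deposition, explicit Euler push, normalized units), meant only to show the deposit/solve/gather/push structure that each backend would parallelize differently; it is not the PIC_ENGINE code.

```python
EPS0 = 1.0  # vacuum permittivity in normalized units (assumption for the sketch)

def pic_step(xs, vs, q, m, n_cells, dx, dt):
    """One electrostatic PIC cycle in 1D with periodic boundaries, showing
    the four canonical steps: deposit -> field solve -> gather -> push.
    Nearest-grid-point deposition and an explicit Euler push for brevity."""
    L = n_cells * dx
    # 1) charge deposition
    rho = [0.0] * n_cells
    for x in xs:
        rho[int(x / dx) % n_cells] += q / dx
    rho_mean = sum(rho) / n_cells
    # 2) field solve: integrate dE/dx = (rho - <rho>) / eps0, then de-mean E
    E = [0.0] * n_cells
    for i in range(1, n_cells):
        E[i] = E[i - 1] + dx * (rho[i - 1] - rho_mean) / EPS0
    e_mean = sum(E) / n_cells
    E = [e - e_mean for e in E]
    # 3) field gather and 4) particle push
    new_xs, new_vs = [], []
    for x, v in zip(xs, vs):
        a = (q / m) * E[int(x / dx) % n_cells]
        v_new = v + a * dt
        new_xs.append((x + v_new * dt) % L)
        new_vs.append(v_new)
    return new_xs, new_vs
```

On a GPU, the gather/push loop is trivially data-parallel, while the deposition step needs atomic updates or particle sorting, which is exactly the kind of algorithmic option a test-bed platform lets one compare.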
Multi-Architecture Monte-Carlo (MC) Simulation of Soft Coarse-Grained Polymeric Materials: SOft coarse grained Monte-carlo Acceleration (SOMA)
Multi-component polymer systems are important for the development of new
materials because of their ability to phase-separate or self-assemble into
nano-structures. The Single-Chain-in-Mean-Field (SCMF) algorithm in conjunction
with a soft, coarse-grained polymer model is an established technique to
investigate these soft-matter systems. Here we present an implementation of
this method: SOft coarse grained Monte-carlo Acceleration (SOMA). It is
suitable to simulate large system sizes with up to billions of particles, yet
versatile enough to study properties of different kinds of molecular
architectures and interactions. We achieve efficient simulations by
commissioning accelerators such as GPUs on both workstations and
supercomputers. The implementation remains flexible and maintainable because
it is written in a scientific programming language enhanced by OpenACC pragmas
for the accelerators. We present implementation details and features of the
program package, investigate the scalability of our implementation SOMA, and
discuss two applications, which cover system sizes that are difficult to reach
with other, common particle-based simulation methods.
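SOMA's actual Hamiltonian (bonded chains, multiple species, incompatibility terms) is richer than the abstract conveys; the toy sweep below illustrates only the general SCMF idea of single-particle Monte Carlo moves judged against a grid-based density functional, here reduced to a single-species compressibility penalty H = (kappa/2) * sum_i (phi_i - 1)^2. All names and parameters are illustrative assumptions.

```python
import math
import random

def scmf_sweep(xs, cell_counts, n_cells, dx, n_per_cell, kappa, beta, step, rng):
    """One Monte Carlo sweep in the spirit of SCMF: each particle attempts a
    random displacement, accepted via the Metropolis rule against a grid
    density functional. Toy single-species compressibility penalty only:
        H = (kappa / 2) * sum_i (phi_i - 1)^2,  phi_i = count_i / n_per_cell.
    Not SOMA's full multi-species Hamiltonian."""
    L = n_cells * dx
    for p in range(len(xs)):
        old = xs[p]
        new = (old + rng.uniform(-step, step)) % L
        ci = int(old / dx) % n_cells
        cj = int(new / dx) % n_cells
        if ci == cj:                  # local density unchanged: free move
            xs[p] = new
            continue
        phi_i = cell_counts[ci] / n_per_cell
        phi_j = cell_counts[cj] / n_per_cell
        d = 1.0 / n_per_cell          # density change from moving one particle
        dE = 0.5 * kappa * ((phi_i - d - 1.0) ** 2 - (phi_i - 1.0) ** 2
                            + (phi_j + d - 1.0) ** 2 - (phi_j - 1.0) ** 2)
        if dE <= 0.0 or rng.random() < math.exp(-beta * dE):
            xs[p] = new
            cell_counts[ci] -= 1
            cell_counts[cj] += 1
```

Because the energy change depends only on the two affected cells, moves in different grid regions are independent, which is the locality that makes the method amenable to GPU acceleration.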
The Fast Multipole Method and Point Dipole Moment Polarizable Force Fields
We present an implementation of the fast multipole method for computing
Coulombic electrostatic and polarization forces from polarizable force fields
based on induced point dipole moments. We demonstrate the expected
scaling of that approach by performing single energy point calculations on
hexamer protein subunits of the mature HIV-1 capsid. We also show the long time
energy conservation in molecular dynamics at the nanosecond scale by performing
simulations of a protein complex embedded in a coarse-grained solvent using a
standard integrator and a multiple time step integrator. Our tests show the
applicability of FMM combined with state-of-the-art chemical models in
molecular dynamical systems. Comment: 11 pages, 8 figures, accepted by J. Chem. Phys.
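Whatever the multipole machinery, the quantity being accelerated is a pairwise sum over dipole interactions; the brute-force O(N^2) baseline below (Gaussian units, fixed rather than induced dipoles for simplicity, names illustrative) is the reference result a fast multipole evaluation must reproduce to the chosen expansion accuracy.

```python
def dipole_energy_direct(pos, dip):
    """Brute-force O(N^2) sum of point-dipole pair energies (Gaussian units):
        U_ij = [p_i . p_j - 3 (p_i . r_hat)(p_j . r_hat)] / r^3.
    Fixed dipoles for simplicity; a polarizable model would iterate the
    induced moments to self-consistency before evaluating this sum."""
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    U = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r = tuple(pos[j][k] - pos[i][k] for k in range(3))
            r2 = dot(r, r)
            r1 = r2 ** 0.5
            r_hat = tuple(c / r1 for c in r)  # unit separation vector
            U += (dot(dip[i], dip[j])
                  - 3.0 * dot(dip[i], r_hat) * dot(dip[j], r_hat)) / (r1 * r2)
    return U
```

For two parallel unit dipoles aligned head-to-tail along their separation axis, the formula gives U = -2 / r^3, a convenient analytic check.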
Efficient Implementations of Molecular Dynamics Simulations for Lennard-Jones Systems
Efficient implementations of the classical molecular dynamics (MD) method for
Lennard-Jones particle systems are considered. Not only general algorithms but
also techniques that are efficient for specific CPU architectures are
explained. A simple spatial-decomposition-based strategy is adopted for
parallelization. By utilizing the developed code, benchmark simulations are
performed on a HITACHI SR16000/J2 system with 4.7 GHz IBM POWER6 processors
at the National Institute for Fusion Science (NIFS) and an SGI Altix ICE
8400EX system with 2.93 GHz Intel Xeon processors at the Institute for Solid
State Physics (ISSP), the University of Tokyo.
The parallelization efficiency of the largest run, consisting of 4.1 billion
particles with 8192 MPI processes, is about 73% relative to that of the
smallest run with 128 MPI processes at NIFS, and it is about 66% relative to
that of the smallest run with 4 MPI processes at ISSP. The factors causing the
parallel overhead are investigated. It is found that fluctuations of the
execution time of each process degrade the parallel efficiency. These
fluctuations may be due to the interference of the operating system, which is
known as OS jitter. Comment: 33 pages, 19 figures; references added and figures revised.
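The spatial-decomposition strategy mentioned above rests on the same locality that linked-cell lists exploit: with a cutoff, only particles in neighbouring cells can interact. A minimal serial sketch (reduced units sigma = eps = 1, cubic periodic box; function name and layout are illustrative, not the paper's code):

```python
def lj_energy_cells(pos, box, rc):
    """Lennard-Jones potential energy with cutoff rc via a linked-cell
    decomposition: the cubic periodic box of side `box` is split into cells
    at least rc wide, so only pairs in adjacent cells are examined.
    Minimum-image convention; U(r) = 4 (r^-12 - r^-6)."""
    nc = max(1, int(box / rc))  # cells per side, each at least rc wide
    cl = box / nc
    cells = {}
    for idx, p in enumerate(pos):
        key = tuple(int(c / cl) % nc for c in p)
        cells.setdefault(key, []).append(idx)
    U = 0.0
    seen = set()  # guards against double-counting when cells wrap (small nc)
    for (cx, cy, cz), members in cells.items():
        for ox in (-1, 0, 1):
            for oy in (-1, 0, 1):
                for oz in (-1, 0, 1):
                    nb = ((cx + ox) % nc, (cy + oy) % nc, (cz + oz) % nc)
                    for i in members:
                        for j in cells.get(nb, ()):
                            if i < j and (i, j) not in seen:
                                seen.add((i, j))
                                d2 = 0.0
                                for k in range(3):
                                    d = pos[i][k] - pos[j][k]
                                    d -= box * round(d / box)
                                    d2 += d * d
                                if d2 < rc * rc:
                                    inv6 = 1.0 / d2 ** 3
                                    U += 4.0 * (inv6 * inv6 - inv6)
    return U
```

In a parallel MD code the same cell structure defines the per-rank domains and the halo of neighbour cells that must be exchanged each step, which is where the communication overhead measured in the paper originates.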
HARES: an efficient method for first-principles electronic structure calculations of complex systems
We discuss our new implementation of the Real-space Electronic Structure
method for studying the atomic and electronic structure of infinite periodic as
well as finite systems, based on density functional theory. This improved
version which we call HARES (for High-performance-fortran Adaptive grid
Real-space Electronic Structure) aims at making the method widely applicable
and efficient, using high performance Fortran on parallel architectures. The
scaling of various parts of a HARES calculation is analyzed and compared to
that of plane-wave based methods. The new developments that lead to enhanced
performance, and their parallel implementation, are presented in detail. We
illustrate the application of HARES to the study of elemental crystalline
solids, molecules and complex crystalline materials, such as blue bronze and
zeolites. Comment: 17 two-column pages, including 9 figures and 5 tables. To appear in
Computer Physics Communications. Several minor revisions based on feedback.
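The core move of any real-space method is replacing plane-wave operators with finite-difference stencils on a grid. As a minimal illustration (a plain uniform-grid second-order stencil, not the HARES adaptive grid or its high-order stencils):

```python
import math

def fd_laplacian_1d(f, h):
    """Second-order central-difference Laplacian on a uniform periodic 1D
    grid -- the kind of local stencil that stands in for the kinetic-energy
    operator in real-space electronic structure codes."""
    n = len(f)
    return [(f[(i - 1) % n] - 2.0 * f[i] + f[(i + 1) % n]) / (h * h)
            for i in range(n)]
```

Applied to sin(x) the stencil should reproduce -sin(x) up to an O(h^2) truncation error; because the operator only touches nearest neighbours, it parallelizes with purely local communication, which is the scaling advantage such methods claim over plane-wave approaches.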
- …