4,096 research outputs found

    Performance of a second order electrostatic particle-in-cell algorithm on modern many-core architectures

    In this paper we present the outline of a novel electrostatic, second order Particle-in-Cell (PIC) algorithm that makes use of 'ghost particles' located around true particle positions in order to represent a charge distribution. We implement our algorithm within EMPIRE-PIC, a PIC code developed at Sandia National Laboratories. We test the performance of our algorithm on a variety of many-core architectures including NVIDIA GPUs, conventional CPUs, and Intel's Knights Landing. Our preliminary results show the viability of second order methods for PIC applications on these architectures when compared to previous generations of many-core hardware. Specifically, we see an order of magnitude improvement in performance for second order methods between the Tesla K20 and Tesla P100 GPU devices, despite only a 4× improvement in the theoretical peak performance between the devices. Although these initial results show a large increase in runtime over first order methods, we hope to be able to show improved scaling behaviour and increased simulation accuracy in the future.

    A portable platform for accelerated PIC codes and its application to GPUs using OpenACC

    We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphics Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the basic steps of the PIC algorithm and has been designed as a test bed for different algorithmic options and data structures. Among the architectures that this engine can explore, particular attention is given here to systems equipped with GPUs. The study demonstrates that our portable PIC implementation based on the OpenACC programming model can achieve performance closely matching theoretical predictions. Using the Cray XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the same code running on an 8-core Intel Sandy Bridge CPU by a factor of 3.4.

    Multi-Architecture Monte-Carlo (MC) Simulation of Soft Coarse-Grained Polymeric Materials: SOft coarse grained Monte-carlo Acceleration (SOMA)

    Multi-component polymer systems are important for the development of new materials because of their ability to phase-separate or self-assemble into nano-structures. The Single-Chain-in-Mean-Field (SCMF) algorithm in conjunction with a soft, coarse-grained polymer model is an established technique to investigate these soft-matter systems. Here we present an implementation of this method: SOft coarse grained Monte-carlo Acceleration (SOMA). It is suitable to simulate large system sizes with up to billions of particles, yet versatile enough to study properties of different kinds of molecular architectures and interactions. We achieve efficient simulations by employing accelerators like GPUs on both workstations and supercomputers. The implementation remains flexible and maintainable because it is written in a scientific programming language enhanced by OpenACC pragmas for the accelerators. We present implementation details and features of the program package, investigate the scalability of our implementation SOMA, and discuss two applications, which cover system sizes that are difficult to reach with other, common particle-based simulation methods.

    The Fast Multipole Method and Point Dipole Moment Polarizable Force Fields

    We present an implementation of the fast multipole method for computing Coulombic electrostatic and polarization forces from polarizable force-fields based on induced point dipole moments. We demonstrate the expected O(N) scaling of that approach by performing single energy point calculations on hexamer protein subunits of the mature HIV-1 capsid. We also show the long time energy conservation in molecular dynamics at the nanosecond scale by performing simulations of a protein complex embedded in a coarse-grained solvent using a standard integrator and a multiple time step integrator. Our tests show the applicability of FMM combined with state-of-the-art chemical models in molecular dynamical systems. Comment: 11 pages, 8 figures, accepted by J. Chem. Phy

    Efficient Implementations of Molecular Dynamics Simulations for Lennard-Jones Systems

    Efficient implementations of the classical molecular dynamics (MD) method for Lennard-Jones particle systems are considered. Both general algorithms and techniques that are efficient for specific CPU architectures are explained. A simple spatial-decomposition-based strategy is adopted for parallelization. Using the developed code, benchmark simulations are performed on a HITACHI SR16000/J2 system consisting of 4.7 GHz IBM POWER6 processors at the National Institute for Fusion Science (NIFS) and an SGI Altix ICE 8400EX system consisting of 2.93 GHz Intel Xeon processors at the Institute for Solid State Physics (ISSP), the University of Tokyo. The parallelization efficiency of the largest run, consisting of 4.1 billion particles with 8192 MPI processes, is about 73% relative to that of the smallest run with 128 MPI processes at NIFS, and it is about 66% relative to that of the smallest run with 4 MPI processes at ISSP. The factors causing the parallel overhead are investigated. It is found that fluctuations of the execution time of each process degrade the parallel efficiency. These fluctuations may be due to the interference of the operating system, which is known as OS jitter. Comment: 33 pages, 19 figures, add references and figures are revise

    HARES: an efficient method for first-principles electronic structure calculations of complex systems

    We discuss our new implementation of the Real-space Electronic Structure method for studying the atomic and electronic structure of infinite periodic as well as finite systems, based on density functional theory. This improved version, which we call HARES (for High-performance-fortran Adaptive grid Real-space Electronic Structure), aims at making the method widely applicable and efficient, using High Performance Fortran on parallel architectures. The scaling of various parts of a HARES calculation is analyzed and compared to that of plane-wave based methods. The new developments that lead to enhanced performance, and their parallel implementation, are presented in detail. We illustrate the application of HARES to the study of elemental crystalline solids, molecules and complex crystalline materials, such as blue bronze and zeolites. Comment: 17 two-column pages, including 9 figures, 5 tables. To appear in Computer Physics Communications. Several minor revisions based on feedbac