Search CORE

6 research outputs found

Global Simulation of Plasma Microturbulence at the Petascale & Beyond (Optimizing the GTC Code for Blue Gene/Q): ALCF-2 Early Science Program Technical Report

Author: Ethier Stephane
Ibrahim Khaled
Madduri Kamesh
Oliker Leonid
Tang William
Wang Bei
Williams Samuel
Williams Timothy
Publication venue: Office of Scientific and Technical Information (OSTI)
Publication date: 14/05/2013

Crossref

UNT Digital Library

Particle-in-Cell algorithms for emerging computer architectures

Author: Decyk Viktor K.
Singh Tajendra V.
Publication venue: The Authors. Published by Elsevier B.V.
Publication date: 31/03/2014

We have designed Particle-in-Cell algorithms for emerging architectures. These algorithms share a common approach, using fine-grained tiles, but use different implementations depending on the architecture. On the GPU, there were two different implementations, one with atomic operations and one with no data collisions, using CUDA C and Fortran. Speedups of up to about 50 compared to a single core of the Intel i7 processor have been achieved. There was also an implementation for traditional multi-core processors using OpenMP, which achieved high parallel efficiency. We believe that this approach should work for other emerging designs such as the Intel Phi coprocessor from the Intel MIC architecture.
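
The tile idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1D periodic grid, linear weighting, and the names `deposit_tiled` and `tile_size` are all assumptions made for illustration. Each tile accumulates charge into its own private buffer, so particles in different tiles never write to the same memory location during deposition; only the final merge touches the shared grid.

```python
# Illustrative sketch only: the 1D setup and all names are assumptions,
# not code from the paper.

def deposit_tiled(positions, charge, n_cells, tile_size):
    """Deposit charge on a periodic 1D grid (unit cell width) tile by tile.

    Positions are assumed to satisfy 0 <= x < n_cells.
    """
    n_tiles = (n_cells + tile_size - 1) // tile_size

    # Group particles by the tile that owns their left-hand cell.
    tiles = [[] for _ in range(n_tiles)]
    for x in positions:
        tiles[int(x) // tile_size].append(x)

    rho = [0.0] * n_cells
    for t, members in enumerate(tiles):
        start = t * tile_size
        width = min(tile_size, n_cells - start)
        # Private buffer with one guard cell on the right. On a GPU this
        # would typically live in fast per-block memory.
        local = [0.0] * (width + 1)
        for x in members:
            cell = int(x)
            frac = x - cell
            j = cell - start
            local[j] += charge * (1.0 - frac)  # share for the left grid point
            local[j + 1] += charge * frac      # share for the right grid point
        # Merge the tile buffer into the global grid; the guard cell
        # wraps around because the grid is periodic.
        for j, value in enumerate(local):
            rho[(start + j) % n_cells] += value
    return rho


rho = deposit_tiled([0.25, 3.5, 7.75], charge=1.0, n_cells=8, tile_size=4)
```

In a parallel version the merge step still needs care, since the guard cell of one tile overlaps the first cell of its neighbour; this is where an implementation would choose between atomic updates and a collision-free ordering.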

Elsevier - Publisher Connector

A portable platform for accelerated PIC codes and its application to GPUs using OpenACC

Author: Brunner S.
Gheller G.
Hariri F.
Jocksch A.
Lanti E.
Messmer P.
Progsch J.
Tran T. M.
Villard L.
Publication venue: Elsevier BV
Publication date: 09/03/2016

We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphics Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the basic steps of the PIC algorithm and has been designed as a test bed for different algorithmic options and data structures. Among the architectures that this engine can explore, particular attention is given here to systems equipped with GPUs. The study demonstrates that our portable PIC implementation based on the OpenACC programming model can achieve performance closely matching theoretical predictions. Using the Cray XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the same code running on an Intel Sandy Bridge 8-core CPU by a factor of 3.4.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Repository for Publications and Research Data

Elsevier - Publisher Connector

An efficient mixed-precision, hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm

Author: G. Chen
L. Chacón
D.C. Barnes
Publication venue: Elsevier BV
Publication date: 22/11/2011

Recently, a fully implicit, energy- and charge-conserving particle-in-cell method has been proposed for multi-scale, full-f kinetic simulations [G. Chen, et al., J. Comput. Phys. 230,18 (2011)]. The method employs a Jacobian-free Newton-Krylov (JFNK) solver, capable of using very large timesteps without loss of numerical stability or accuracy. A fundamental feature of the method is the segregation of particle-orbit computations from the field solver, while remaining fully self-consistent. This paper describes a very efficient, mixed-precision hybrid CPU-GPU implementation of the implicit PIC algorithm exploiting this feature. The JFNK solver is kept on the CPU in double precision (DP), while the implicit, charge-conserving, and adaptive particle mover is implemented on a GPU (graphics processing unit) using CUDA in single precision (SP). Performance-oriented optimizations are introduced with the aid of the roofline model. The implicit particle mover algorithm is shown to achieve up to 400 GOp/s on an NVIDIA GeForce GTX580. This corresponds to 25% absolute GPU efficiency against the peak theoretical performance, and is about 300 times faster than an equivalent serial CPU (Intel Xeon X5460) execution. For the test case chosen, the mixed-precision hybrid CPU-GPU solver is shown to outperform the DP CPU-only serial version by a factor of about 100, without apparent loss of robustness or accuracy in a challenging long-timescale ion acoustic wave simulation. Comment: 25 pages, 6 figures, submitted to J. Comput. Phy
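
The roofline figures quoted in this abstract can be checked with a short sketch of the basic roofline bound. The function names are hypothetical, and the memory bandwidth value is an assumed round number chosen for illustration, not a figure from the paper. Only the 400 GOp/s throughput and the 25% efficiency come from the abstract; together they imply a theoretical peak of 1600 GOp/s.

```python
# Illustrative sketch only: function names and the bandwidth value are
# assumptions, not taken from the paper.

def roofline(peak_gops, bandwidth_gbs, intensity_ops_per_byte):
    """Attainable throughput in GOp/s under the basic roofline model."""
    return min(peak_gops, bandwidth_gbs * intensity_ops_per_byte)


def implied_peak(achieved_gops, efficiency):
    """Peak throughput implied by an achieved rate and a fractional efficiency."""
    return achieved_gops / efficiency


# From the abstract: 400 GOp/s at 25% of theoretical peak.
peak = implied_peak(400.0, 0.25)

# Assumed memory bandwidth in GB/s (hypothetical, for illustration only).
bandwidth = 200.0

# Below the ridge point the kernel is bandwidth bound; above it, compute bound.
ridge_point = peak / bandwidth
memory_bound = roofline(peak, bandwidth, 2.0)
compute_bound = roofline(peak, bandwidth, 16.0)
```

With these assumed numbers, a kernel needs at least `ridge_point` operations per byte of memory traffic before the compute peak, rather than bandwidth, becomes the limit.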

arXiv.org e-Print Archive

CiteSeerX

Crossref

Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms

Author: Ethier S
Ibrahim KZ
Im EJ
Madduri K
Oliker L
Williams S
Publication venue: eScholarship, University of California
Publication date: 01/09/2011

The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by the HPC community. In this work, we examine the efficient multicore optimization of GTC, a petascale gyrokinetic toroidal fusion code for studying plasma microturbulence in tokamak devices. For GTC's key computational components (charge deposition and particle push), we explore efficient parallelization strategies across a broad range of emerging multicore designs, including the recently released Intel Nehalem-EX, the AMD Opteron Istanbul, and the highly multithreaded Sun UltraSparc T2+. We also present the first study on tuning gyrokinetic particle-in-cell (PIC) algorithms for graphics processors, using the NVIDIA C2050 (Fermi). Our work discusses several novel optimization approaches for gyrokinetic PIC, including mixed-precision computation, particle binning and decomposition strategies, grid replication, SIMDized atomic floating-point operations, and effective GPU texture memory utilization. Overall, we achieve significant performance improvements of 1.3-4.7× on these complex PIC kernels, despite the inherent challenges of data dependency and locality. Our work also points to several architectural and programming features that could significantly enhance PIC performance and productivity on next-generation architectures.
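
Of the optimizations listed in this abstract, particle binning is the easiest to show in isolation. The sketch below is a generic counting sort by cell index, not the GTC implementation; the name `bin_particles` and the flat list of cell indices are assumptions made for illustration. Ordering particles by cell means that particles touching the same grid points are processed together, which improves memory locality in charge deposition and particle push.

```python
# Illustrative sketch only: a generic counting sort, not code from GTC.

def bin_particles(cells, n_cells):
    """Return particle indices ordered by cell, plus each cell's start offset."""
    # Count how many particles fall in each cell.
    counts = [0] * n_cells
    for c in cells:
        counts[c] += 1

    # Exclusive prefix sum: offsets[c] is where cell c starts in the output.
    offsets = [0] * (n_cells + 1)
    for c in range(n_cells):
        offsets[c + 1] = offsets[c] + counts[c]

    # Scatter particle indices into their bins, keeping the original
    # order within each cell.
    cursor = offsets[:-1]  # slicing makes a copy, so offsets is not modified
    order = [0] * len(cells)
    for i, c in enumerate(cells):
        order[cursor[c]] = i
        cursor[c] += 1
    return order, offsets


order, offsets = bin_particles([2, 0, 2, 1, 0], n_cells=3)
```

The particles of cell `c` are then `order[offsets[c]:offsets[c + 1]]`, so a kernel can walk the grid cell by cell and read each cell's particles contiguously.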

eScholarship - University of California

Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms

Author: Ethier S
Ibrahim KZ
Im EJ
Madduri K
Oliker L
Williams S
Publication venue: eScholarship, University of California
Publication date: 01/09/2011

eScholarship - University of California