4 research outputs found

    A portable platform for accelerated PIC codes and its application to GPUs using OpenACC

    We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphics Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, the platform contains the basic steps of the PIC algorithm and has been designed as a test bed for different algorithmic options and data structures. Among the architectures that this engine can explore, particular attention is given here to systems equipped with GPUs. The study demonstrates that our portable PIC implementation based on the OpenACC programming model can achieve performance closely matching theoretical predictions. Using the Cray XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the same code running on an Intel Sandy Bridge 8-core CPU by a factor of 3.4.

    A Space and Bandwidth Efficient Multicore Algorithm for the Particle-in-Cell Method

    The Particle-in-Cell (PIC) method allows solving partial differential equations through simulations, with important applications in plasma physics. To simulate thousands of billions of particles on clusters of multicore machines, prior work has proposed hybrid algorithms that combine domain decomposition and particle decomposition with carefully optimized algorithms for handling the particles processed on each multicore socket. Regarding the multicore processing, existing algorithms either suffer from suboptimal execution time, due to sorting operations or the use of atomic instructions, or suffer from suboptimal space usage. In this paper, we propose a novel parallel algorithm for two-dimensional PIC simulations on multicore hardware that features asymptotically optimal memory consumption and does not perform unnecessary accesses to the main memory. In practice, our algorithm reaches 65% of the maximum bandwidth and shows excellent scalability on the classical Landau damping and two-stream instability test cases.

    A bucket sort algorithm for the particle-in-cell method on manycore architectures

    The Particle-In-Cell (PIC) method is effectively used in many scientific simulation codes. In order to optimize the performance of the PIC approach, data locality is required. This relies on efficient sorting algorithms. We present a bucket sort algorithm with a small memory footprint for the PIC method targeting Graphics Processing Units (GPUs). The performance of our sorting algorithm increases with the amount of storage provided and with the orderliness of the particles. For our application, where particles are presorted, it performs better and requires less memory than other sorting algorithms in the literature. The overall PIC algorithm performs at its best when the sorting is applied.
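
    As a rough illustration of how sorting particles by grid cell gives data locality, the following is a minimal serial sketch of a generic bucket (counting) sort keyed on cell index. It is not the small-footprint GPU algorithm of the paper: it uses a full-size output array, and the names `bucket_sort_particles`, `cell_ids`, and `num_cells` are hypothetical, chosen only for this example.

```python
def bucket_sort_particles(cell_ids, num_cells):
    """Order particle indices by cell index (stable), one bucket per cell.

    Illustrative assumption: cell_ids[i] is the integer index of the grid
    cell that currently contains particle i, with 0 <= cell_ids[i] < num_cells.
    """
    # Pass 1: count how many particles fall into each cell.
    counts = [0] * num_cells
    for c in cell_ids:
        counts[c] += 1

    # An exclusive prefix sum turns the counts into each bucket's start offset.
    offsets = [0] * num_cells
    running = 0
    for c in range(num_cells):
        offsets[c] = running
        running += counts[c]

    # Pass 2: scatter each particle index into the next free slot of its bucket.
    order = [0] * len(cell_ids)
    cursor = list(offsets)
    for i, c in enumerate(cell_ids):
        order[cursor[c]] = i
        cursor[c] += 1

    return order, offsets


# Five particles spread over three cells.
order, offsets = bucket_sort_particles([2, 0, 1, 2, 0], num_cells=3)
print(order)    # particle indices grouped by cell
print(offsets)  # start of each cell's bucket within `order`
```

    Reordering the particle arrays with `order` places particles of the same cell contiguously in memory, so per-cell work reads a contiguous range starting at `offsets[cell]`.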