An efficient mixed-precision, hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm
Recently, a fully implicit, energy- and charge-conserving particle-in-cell
method has been proposed for multi-scale, full-f kinetic simulations [G. Chen,
et al., J. Comput. Phys. 230,18 (2011)]. The method employs a Jacobian-free
Newton-Krylov (JFNK) solver, capable of using very large timesteps without loss
of numerical stability or accuracy. A fundamental feature of the method is the
segregation of particle-orbit computations from the field solver, while
remaining fully self-consistent. This paper describes a very efficient,
mixed-precision hybrid CPU-GPU implementation of the implicit PIC algorithm
exploiting this feature. The JFNK solver is kept on the CPU in double precision
(DP), while the implicit, charge-conserving, and adaptive particle mover is
implemented on a GPU (graphics processing unit) using CUDA in single-precision
(SP). Performance-oriented optimizations are introduced with the aid of the
roofline model. The implicit particle mover algorithm is shown to achieve up to
400 GOp/s on an Nvidia GeForce GTX580. This corresponds to 25% absolute GPU
efficiency against the peak theoretical performance, and is about 300 times
faster than an equivalent serial CPU (Intel Xeon X5460) execution. For the test
case chosen, the mixed-precision hybrid CPU-GPU solver is shown to outperform
the DP CPU-only serial version by a factor of ~100, without apparent loss
of robustness or accuracy in a challenging long-timescale ion acoustic wave
simulation.
Comment: 25 pages, 6 figures, submitted to J. Comput. Phys.
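The throughput figures in the abstract above can be related through the roofline model it mentions. The following is a minimal sketch, not the authors' analysis: the peak is simply the value implied by 400 GOp/s at 25% efficiency, and `assumed_bandwidth_gbs` is a hypothetical number chosen for illustration only.

```python
def roofline(peak_gops, bandwidth_gbs, intensity_ops_per_byte):
    """Attainable throughput (GOp/s) under the roofline model.

    A kernel is limited either by peak compute or by memory traffic,
    whichever bound is lower at its arithmetic intensity.
    """
    return min(peak_gops, bandwidth_gbs * intensity_ops_per_byte)


# Figures quoted in the abstract.
achieved_gops = 400.0
efficiency = 0.25

# Peak implied by those two figures (not a vendor specification).
implied_peak_gops = achieved_gops / efficiency

# Hypothetical memory bandwidth in GB/s, for illustration only.
assumed_bandwidth_gbs = 200.0

# Arithmetic intensity at which the kernel stops being bandwidth-bound.
ridge_intensity = implied_peak_gops / assumed_bandwidth_gbs

if __name__ == "__main__":
    print("implied peak (GOp/s):", implied_peak_gops)
    print("ridge point (ops/byte):", ridge_intensity)
    for intensity in (1.0, 4.0, 16.0):
        bound = roofline(implied_peak_gops, assumed_bandwidth_gbs, intensity)
        print(intensity, "ops/byte ->", bound, "GOp/s")
```

Under these assumptions, any kernel below the ridge point is limited by memory traffic, which is why roofline-guided tuning typically tries to raise arithmetic intensity first.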
Performance of a second order electrostatic particle-in-cell algorithm on modern many-core architectures
In this paper we present the outline of a novel electrostatic, second order Particle-in-Cell (PIC) algorithm that makes use of 'ghost particles' located around true particle positions in order to represent a charge distribution. We implement our algorithm within EMPIRE-PIC, a PIC code developed at Sandia National Laboratories. We test the performance of our algorithm on a variety of many-core architectures including NVIDIA GPUs, conventional CPUs, and Intel's Knights Landing. Our preliminary results show the viability of second order methods for PIC applications on these architectures when compared to previous generations of many-core hardware. Specifically, we see an order of magnitude improvement in performance for second order methods between the Tesla K20 and Tesla P100 GPU devices, despite only a 4× improvement in the theoretical peak performance between the devices. Although these initial results show a large increase in runtime over first order methods, we hope to be able to show improved scaling behaviour and increased simulation accuracy in the future.
Apar-T: code, validation, and physical interpretation of particle-in-cell results
We present the parallel particle-in-cell (PIC) code Apar-T and, more
importantly, address the fundamental question of the relations between the PIC
model, the Vlasov-Maxwell theory, and real plasmas.
First, we present four validation tests: spectra from simulations of thermal
plasmas, linear growth rates of the relativistic tearing instability and of the
filamentation instability, and non-linear filamentation merging phase. For the
filamentation instability we show that the effective growth rates measured on
the total energy can differ by more than 50% from the linear cold predictions
and from the fastest modes of the simulation.
Second, we detail a new method for initial loading of Maxwell-Jüttner
particle distributions with relativistic bulk velocity and relativistic
temperature, and explain why the traditional method with individual particle
boosting fails.
Third, we scrutinize the question of what description of physical plasmas is
obtained by PIC models. These models rely on two building blocks:
coarse-graining, i.e., grouping of the order of p~10^10 real particles into a
single computer superparticle, and field storage on a grid with its subsequent
finite superparticle size. We introduce the notion of coarse-graining dependent
quantities, i.e., quantities depending on p. They derive from the PIC plasma
parameter Lambda^{PIC}, which we show to scale as 1/p. We explore two
implications. One is that PIC collision- and fluctuation-induced thermalization
times are expected to scale with the number of superparticles per grid cell,
and thus to be a factor p~10^10 smaller than in real plasmas. The other is that
the level of electric field fluctuations scales as 1/Lambda^{PIC} ~ p. We
provide a corresponding exact expression.
Fourth, we compare the Vlasov-Maxwell theory, which describes a phase-space
fluid with infinite Lambda, to the PIC model and its relatively small Lambda.
Comment: 24 pages, 14 figures, accepted in Astronomy & Astrophysics
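The 1/p scaling of Lambda^{PIC} stated in the abstract above can be motivated by a short argument. This is a sketch under assumptions that are not taken from the abstract: each superparticle carries charge p q and mass p m, the superparticle density is n/p, thermal velocities are unchanged, Lambda is defined as n lambda_D^3 up to a numerical factor, and finite superparticle size and grid effects are ignored.

```latex
% Sketch only; requires amsmath. Symbols n, q, m, v_{th}, \lambda_D are
% the real-plasma density, charge, mass, thermal speed and Debye length.
\begin{align}
  \lambda_{D,\mathrm{PIC}}^{2}
    &= \frac{\epsilon_0\,(p\,m)\,v_{th}^{2}}{(n/p)\,(p\,q)^{2}}
     = \frac{\epsilon_0\,m\,v_{th}^{2}}{n\,q^{2}}
     = \lambda_D^{2}, \\
  \Lambda^{\mathrm{PIC}}
    &= \frac{n}{p}\,\lambda_{D,\mathrm{PIC}}^{3}
     = \frac{n\,\lambda_D^{3}}{p}
     = \frac{\Lambda}{p}.
\end{align}
```

The Debye length is unchanged by coarse-graining, so the number of (super)particles per Debye volume drops by the factor p, consistent with the scaling quoted in the abstract.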
Efficient Strict-Binning Particle-in-Cell Algorithm for Multi-Core SIMD Processors
Particle-in-Cell (PIC) codes are widely used for plasma simulations. On recent multi-core hardware, the performance of these codes is often limited by memory bandwidth. We describe a multi-core PIC algorithm that achieves a close-to-minimal number of memory transfers with the main memory, while at the same time exploiting SIMD instructions for numerical computations and exhibiting a high degree of OpenMP-level parallelism. Our algorithm keeps particles sorted by cell at every time step, and represents particles from the same cell using a linked list of fixed-capacity arrays, called chunks. Chunks support either sequential or atomic insertions, the latter being used to handle fast-moving particles. To validate our code, called Pic-Vert, we consider a 3d electrostatic Landau-damping simulation as well as a 2d3v transverse instability of magnetized electron holes. Performance results on 24-core Intel Skylake hardware confirm the effectiveness of our algorithm, in particular its high throughput and its ability to cope with fast-moving particles.
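The chunk-based storage described in the abstract above can be illustrated with a small sketch. It is not the Pic-Vert implementation: the names `Chunk`, `CellBag` and `bin_particles` are hypothetical, only sequential insertion is modeled (atomic insertion and SIMD are omitted), and a 1D periodic grid is assumed.

```python
class Chunk:
    """Fixed-capacity array of particles plus a link to the next chunk."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.next = None

    def is_full(self):
        return len(self.items) >= self.capacity


class CellBag:
    """Particles of one grid cell, stored as a linked list of chunks."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.head = Chunk(capacity)

    def push(self, particle):
        # Sequential insertion: when the head chunk is full,
        # link a fresh chunk in front of it.
        if self.head.is_full():
            new_head = Chunk(self.capacity)
            new_head.next = self.head
            self.head = new_head
        self.head.items.append(particle)

    def __iter__(self):
        chunk = self.head
        while chunk is not None:
            yield from chunk.items
            chunk = chunk.next

    def num_chunks(self):
        count, chunk = 0, self.head
        while chunk is not None:
            count += 1
            chunk = chunk.next
        return count


def bin_particles(positions, num_cells, cell_width, capacity=4):
    """Sort 1D positions into per-cell bags (periodic domain assumed)."""
    bags = [CellBag(capacity) for _ in range(num_cells)]
    for x in positions:
        cell = int(x // cell_width) % num_cells
        bags[cell].push(x)
    return bags


if __name__ == "__main__":
    bags = bin_particles([0.1, 0.2, 0.3, 1.5, 0.4, 0.45], 2, 1.0, capacity=2)
    for i, bag in enumerate(bags):
        print("cell", i, "chunks:", bag.num_chunks(), "particles:", sorted(bag))
```

Because every chunk has the same capacity, particles within a chunk sit contiguously, which is the property that makes per-cell loops friendly to vectorization in a compiled implementation.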
Numerical and Analytical Methods for Laser-Plasma Acceleration Physics
Theories and numerical modeling are fundamental tools for understanding, optimizing and designing present and future laser-plasma accelerators (LPAs).
Laser evolution and plasma wave excitation in an LPA driven by a weakly relativistically intense, short-pulse laser propagating in a preformed parabolic plasma channel is studied analytically in 3D, including the effects of pulse steepening and energy depletion. At higher laser intensities, the process of electron self-injection in the nonlinear bubble wake regime is studied by means of fully self-consistent Particle-in-Cell simulations. Considering a non-evolving laser driver propagating with a prescribed velocity, the geometrical properties of the non-evolving bubble wake are studied. For a range of parameters of interest for laser plasma acceleration, the dependence of the threshold for self-injection in the non-evolving wake on laser intensity and wake velocity is characterized.
Due to the nonlinear and complex nature of the physics involved, computationally challenging numerical simulations are required to model laser-plasma accelerators operating at relativistic laser intensities. The numerical and computational optimizations that, combined in the codes INF&RNO and INF&RNO/quasi-static, make it possible to accurately model multi-GeV laser wakefield acceleration stages on present supercomputing architectures are discussed. The PIC code jasmine, capable of efficiently running laser-plasma simulations on Graphics Processing Unit (GPU) clusters, is presented. GPUs deliver exceptional performance to PIC codes, but the core algorithms had to be redesigned to satisfy the constraints imposed by the intrinsic parallelism of the architecture. The simulation campaigns run with the code jasmine to model the recent LPA experiments with the INFN-FLAME and CNR-ILIL laser systems are also presented.
Characterization of short-pulse laser-produced fast electrons by 3D hybrid particle-in-cell modeling of angularly resolved bremsstrahlung
The interaction of an intense short-pulse laser with a solid target efficiently generates energetic (fast) electrons above the energy of 1 Mega-electronvolt (MeV). Characterization of such high-energy electrons is critical for numerous applications, such as the generation of secondary particle sources, the creation of warm dense matter (WDM), advanced fusion concepts, and intense x-ray radiation for probing complex high areal density objects and inertial confinement fusion (ICF) cores. However, determining laser-driven fast electron characteristics, specifically the electron energy distribution, divergence angle, and laser-to-electron conversion efficiency, has been challenging, partly due to complex electron trajectories caused by the electric sheath potential, known as electron recirculation. This thesis reports on developing a novel fast electron characterization technique by modeling angularly resolved bremsstrahlung radiation with a three-dimensional (3D) hybrid Particle-in-cell (PIC) code. An experiment using a 50-TW Leopard laser (15 J, 0.35 ps, 2×10^19 W/cm^2) was carried out to measure bremsstrahlung radiation at two angular positions and escaped fast electrons along the laser axis for two types of targets: a 100-μm-thick Cu foil and the same Cu target with a CH backing (Cu-CH target). A 3D hybrid-PIC code, Large Scale Plasma (LSP), is used extensively in this work to simulate the electron transport within the solid target, including electron recirculation around the target, and the x-ray generation of absolute photon yields. The measurements were fitted with a series of simulations by varying all three electron parameters. Fitting results based on chi-squared analyses show good agreement for both target types when an electron slope temperature of 0.8 MeV, a divergence angle of 70 degrees, and an electron beam energy of 1.3 J are used.
Furthermore, the effects of electron recirculation on bremsstrahlung generation and the enhancement of short-pulse laser-produced x-ray intensity in various foil thicknesses are numerically studied. These results provide insight into designing and optimizing an x-ray source target for broadband x-ray radiography of a magnetically compressed aluminum rod at the Zebra pulsed power laboratory.