67 research outputs found
Investigation of the dynamics of ionization induced injected electrons under the influence of beam loading effects
In laser-driven wakefield acceleration, ionization-induced injection is an efficient way to inject electrons into the plasma wave. A detailed study was conducted of the beam dynamics under the influence of beam loading effects, which can be controlled by the concentration of nitrogen impurity introduced into the hydrogen gas. For a specific value of this concentration, the final energy of the high-energy electron bunch becomes nearly independent of the trapping positions, leading to a small energy dispersion. We also show that the final beam emittance is mainly determined by the injection process.
Improving I/O Performance for Exascale Applications through Online Data Layout Reorganization
The applications being developed within the U.S. Exascale Computing Project (ECP) to run on imminent Exascale computers will generate scientific results with unprecedented fidelity and record turn-around time. Many of these codes are based on particle-mesh methods and use advanced algorithms, especially dynamic load-balancing and mesh-refinement, to achieve high performance on Exascale machines. Yet, as such algorithms improve parallel application efficiency, they raise new challenges for I/O logic due to their irregular and dynamic data distributions. Thus, while the enormous data rates of Exascale simulations already challenge existing file system write strategies, the need for efficient reading and processing of generated data introduces additional constraints on the data layout strategies that can be used when writing data to secondary storage. We review these I/O challenges and introduce two online data layout reorganization approaches for achieving good tradeoffs between read and write performance. We demonstrate the benefits of using these two approaches for the ECP particle-in-cell simulation WarpX, which serves as a motif for a large class of important Exascale applications. We show that by understanding application I/O patterns and carefully designing data layouts we can increase read performance by more than 80 percent.
Ultra-low emittance beam generation using two-color ionization injection in laser-plasma accelerators
Ultra-low emittance (tens of nm) beams can be generated in a plasma accelerator using ionization injection of electrons into a wakefield. An all-optical method of beam generation uses two laser pulses of different colors. A long-wavelength drive laser pulse (with a large ponderomotive force and small peak electric field) is used to excite a large wakefield without fully ionizing a gas, and a short-wavelength injection laser pulse (with a small ponderomotive force and large peak electric field), co-propagating and delayed with respect to the pump laser, to ionize a fraction of the remaining bound electrons at a trapped wake phase, generating an electron beam that is accelerated in the wake. The trapping condition, the ionized electron distribution, and the trapped bunch dynamics are discussed. Expressions for the beam transverse emittance, parallel and orthogonal to the ionization laser polarization, are derived. An example is presented using a 10-μm CO2 laser to drive the wake and a frequency-doubled Ti:Al2O3 laser for ionization injection.
Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor
The Roofline Performance Model is a visually intuitive method used to bound the sustained peak floating-point performance of any given arithmetic kernel on any given processor architecture. In the Roofline, performance is nominally measured in floating-point operations per second as a function of arithmetic intensity (operations per byte of data). In this study we determine the Roofline for the Intel Knights Landing (KNL) processor, determining the sustained peak memory bandwidth and floating-point performance for all levels of the memory hierarchy, in all the different KNL cluster modes. We then determine arithmetic intensity and performance for a suite of application kernels being targeted for the KNL-based supercomputer Cori, and make comparisons to current Intel Xeon processors. Cori is the National Energy Research Scientific Computing Center’s (NERSC) next generation supercomputer. Scheduled for deployment mid-2016, it will be one of the earliest and largest KNL deployments in the world.
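The bound described in this abstract can be illustrated with a minimal sketch of the basic Roofline formula: attainable performance is the smaller of the peak floating-point rate and the product of arithmetic intensity and memory bandwidth. The function names and the machine numbers below are hypothetical placeholders chosen for illustration; they are not KNL measurements from the paper.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Upper bound on performance (GFLOP/s) for a kernel.

    arithmetic_intensity: floating-point operations per byte moved
    peak_gflops: peak floating-point rate of the machine (GFLOP/s)
    bandwidth_gbs: sustained memory bandwidth (GB/s)
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being bandwidth-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Placeholder machine parameters (assumptions, not measured values).
    PEAK_GFLOPS = 1000.0
    BANDWIDTH_GBS = 100.0

    for ai in (0.5, 10.0, 50.0):
        bound = roofline_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
        regime = "bandwidth-bound" if ai < ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS) else "compute-bound"
        print(f"AI = {ai:5.1f} flop/byte -> at most {bound:7.1f} GFLOP/s ({regime})")
```

Repeating the calculation with the bandwidth of each memory level gives one roofline per level, which is the kind of per-level characterization the abstract refers to.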
Detailed analysis of the effects of stencil spatial variations with arbitrary high-order finite-difference Maxwell solver
Very high-order or pseudo-spectral Maxwell solvers are the method of choice to reduce discretization effects (e.g. numerical dispersion) that are inherent to low-order Finite-Difference Time-Domain (FDTD) schemes. However, due to their large stencils, these solvers are often subject to truncation errors in many electromagnetic simulations. These truncation errors come from non-physical modifications of Maxwell's equations in space that may generate spurious signals affecting the overall accuracy of the simulation results. Such modifications occur, for instance, when Perfectly Matched Layers (PMLs) are used at simulation domain boundaries to simulate open media. Another example is the use of arbitrary-order Maxwell solvers with a domain decomposition technique that may, under some conditions, involve stencil truncations at subdomain boundaries, resulting in small spurious errors that eventually build up. In each case, a careful evaluation of the characteristics and magnitude of the errors resulting from these approximations, and of their impact at any frequency and angle, requires detailed analytical and numerical studies. To this end, we present a general analytical approach that enables the evaluation of numerical errors of fully three-dimensional arbitrary-order finite-difference Maxwell solvers, with arbitrary modification of the local stencil in the simulation domain. The analytical model is validated against simulations of the domain decomposition technique and PMLs, when these are used with very high-order Maxwell solvers, as well as in the infinite-order limit of pseudo-spectral solvers. Results confirm that the new analytical approach enables exact predictions in each case. They also confirm that the domain decomposition technique can be used with very high-order Maxwell solvers and a reasonably low number of guard cells with negligible effects on the overall accuracy of the simulation.
Summary of working group 6: Theory and simulations
The paper briefly summarizes the contributions presented during the working group 6 sessions on theory and simulations.
A generalized massively parallel ultra-high order FFT-based Maxwell solver
Dispersion-free ultra-high order FFT-based Maxwell solvers have recently proven to be paramount to a large range of applications, including the high-fidelity modeling of high-intensity laser–matter interactions with Particle-In-Cell (PIC) codes. To enable a massively parallel scaling of these solvers, a novel parallelization technique was recently proposed, which consists of splitting the simulation domain into several processor sub-domains, with guard regions appended at each sub-domain boundary. Maxwell's equations are advanced independently on each sub-domain using local shared-memory FFTs (instead of a single distributed global FFT). This implies small truncation errors at sub-domain boundaries, the amplitude of which depends on the guard region sizes and the order of the Maxwell solver. For moderate guard region sizes, this ‘local’ technique proved to be highly scalable on up to a million cores and notably enabled the 3D modeling of so-called plasma mirrors, for which only 8 guard cells were enough to prevent truncation error growth. Yet, for other applications, the required number of guard cells might be much higher, which would severely limit the parallel efficiency of this technique due to the large volume of guard cells to be exchanged between sub-domains. In this context, we propose a novel parallelization technique that ensures very good scaling of FFT-based solvers with an arbitrarily high number of guard cells. Our ‘hybrid’ technique consists of performing distributed FFTs on local groups of processors, with guard regions now appended to the boundaries of each group of processors. It uses a dual domain decomposition method for the Maxwell solver and other parts of the PIC cycle to keep the simulation load-balanced. This ‘hybrid’ technique was implemented in the open source exascale library PICSAR. Benchmarks show that for a large number of guard cells (>16), the ‘hybrid’ technique offers up to ×3 speed-up and ×8 memory savings compared to the ‘local’ one.
Pseudospectral Maxwell solvers for an accurate modeling of Doppler harmonic generation on plasma mirrors with particle-in-cell codes.
With the advent of petawatt-class lasers, the very large laser intensities attainable on target should enable the production of intense high-order Doppler harmonics from relativistic laser-plasma mirror interactions. At present, the modeling of these harmonics with particle-in-cell (PIC) codes is extremely challenging, as it implies an accurate description of tens to hundreds of harmonic orders over a broad range of angles. In particular, we show here that, due to the numerical dispersion of waves they induce in vacuum, standard finite difference time domain (FDTD) Maxwell solvers employed in most PIC codes can induce a spurious angular deviation of harmonic beams, potentially degrading simulation results. This effect was extensively studied, and a simple toy model based on the Snell-Descartes law was developed that allows us to accurately predict the angular deviation of harmonics depending on the spatiotemporal resolution and the Maxwell solver used in the simulations. Our model demonstrates that the mitigation of this numerical artifact with FDTD solvers mandates very high spatiotemporal resolution, preventing realistic three-dimensional (3D) simulations even on the largest computers available at the time of writing. We finally show that nondispersive pseudospectral analytical time domain solvers can considerably reduce the spatiotemporal resolution required to mitigate this spurious deviation and should enable, in the near future, accurate 3D modeling on supercomputers in a realistic time to solution.
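To make the notion of numerical dispersion in vacuum concrete, the sketch below evaluates the textbook dispersion relation of the one-dimensional second-order Yee (FDTD) scheme, sin(ω Δt / 2) = (c Δt / Δx) sin(k Δx / 2), and returns the numerical phase velocity relative to c. This is a simplified 1D illustration, not the toy model of the paper (which addresses angular deviation in higher dimensions); the function name and the sampled resolutions are assumptions made for the example.

```python
import math


def yee_1d_phase_velocity(points_per_wavelength, courant):
    """Numerical phase velocity divided by c for the 1D Yee scheme.

    points_per_wavelength: grid cells per wavelength (must be >= 2)
    courant: c * dt / dx, with 0 < courant <= 1 for stability
    """
    k_dx = 2.0 * math.pi / points_per_wavelength  # k * dx
    # Solve sin(omega*dt/2) = courant * sin(k*dx/2) for omega*dt.
    omega_dt = 2.0 * math.asin(courant * math.sin(k_dx / 2.0))
    # v_phase / c = omega / (k c) = (omega*dt) / (courant * k*dx)
    return omega_dt / (courant * k_dx)


if __name__ == "__main__":
    # Higher harmonics have shorter wavelengths, so they are sampled by
    # fewer cells and propagate with a larger phase-velocity error.
    for n in (40, 20, 10, 5):
        ratio = yee_1d_phase_velocity(n, courant=0.5)
        print(f"{n:3d} cells per wavelength: v_phase / c = {ratio:.5f}")
```

In this 1D relation the error vanishes when c Δt / Δx = 1, but that choice is not available along all directions in multidimensional FDTD, which is why the abstract points to nondispersive pseudospectral solvers instead.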