Maximum Subarray Problem in 1D and 2D via Weighted Paths in Directed Acyclic Graphs
The Maximum Subarray Problem was encountered by Ulf Grenander for maximum likelihood estimation in pattern analysis. We are given a vector (or matrix) of numbers, and we have to find the contiguous sub-vector (or sub-matrix) which has the maximum sum of numbers in it. Apart from the original application, the problem also arises for example in biological sequence analysis. We present here a linear-time algorithm in one dimension which is different from the one known due to Kadane, and present a way of extending it to two dimensions. To achieve the latter, we provide a new technique, the red-blue graphs, which encodes all the contiguous sub-matrices of an m × n matrix in size O(m × n).
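For orientation, the one-dimensional problem stated above can be illustrated with the classical linear-time scan attributed to Kadane. This is a minimal sketch of that well-known baseline only; it is not the DAG-based algorithm or the red-blue graph technique of the paper, and the function name `max_subarray` and its return convention are assumptions made for this illustration.

```python
from typing import List, Tuple


def max_subarray(values: List[float]) -> Tuple[float, int, int]:
    """Kadane-style scan (baseline, not the paper's method).

    Returns (best_sum, start, end) such that values[start:end] is a
    non-empty contiguous slice with maximum sum.
    """
    if not values:
        raise ValueError("values must be non-empty")

    best_sum = current_sum = values[0]
    best_start, best_end = 0, 1
    current_start = 0

    for i in range(1, len(values)):
        if current_sum < 0:
            # A negative running sum can only hurt: restart at position i.
            current_sum = values[i]
            current_start = i
        else:
            # Otherwise extend the current slice by one element.
            current_sum += values[i]

        if current_sum > best_sum:
            best_sum = current_sum
            best_start, best_end = current_start, i + 1

    return best_sum, best_start, best_end


if __name__ == "__main__":
    print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))
```

The scan visits each element once and keeps a constant amount of state, which is the linear-time behaviour the abstract refers to for the one-dimensional case.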
A Space and Bandwidth Efficient Multicore Algorithm for the Particle-in-Cell Method
The Particle-in-Cell (PIC) method allows solving partial differential equations through simulations, with important applications in plasma physics. To simulate thousands of billions of particles on clusters of multicore machines, prior work has proposed hybrid algorithms that combine domain decomposition and particle decomposition with carefully optimized algorithms for handling particles processed on each multicore socket. Regarding the multicore processing, existing algorithms either suffer from suboptimal execution time, due to sorting operations or use of atomic instructions, or suffer from suboptimal space usage. In this paper, we propose a novel parallel algorithm for two-dimensional PIC simulations on multicore hardware that features asymptotically optimal memory consumption and does not perform unnecessary accesses to the main memory. In practice, our algorithm reaches 65% of the maximum bandwidth and shows excellent scalability on the classical Landau damping and two-stream instability test cases.
Efficient Data Layouts for a Three-Dimensional Electrostatic Particle-in-Cell Code
The Particle-in-Cell (PIC) method is a widely used tool in plasma physics. To accurately solve realistic problems, the method requires trillions of particles, and there is therefore a strong demand for high-performance code on modern architectures. The present work describes performance results of Pic-Vert, a hybrid OpenMP/MPI and vectorized three-dimensional electrostatic PIC code. The code simulates 3d3v Vlasov-Poisson systems on Cartesian grids with periodic boundary conditions. Overall, it processes 590 million particles/second on a 24-core Intel Skylake architecture, without hyper-threading (25 million particles per second per core). The paper presents extensions in 3d of our preliminary 2d results, with highlights on the difficulties and solutions proposed for these extensions. Specifically, our main contributions consist in proposing a new space-filling curve in 3d (called L6D) to improve the cache reuse and an adapted loop transformation (strip-mining) to achieve efficient vectorization. The analysis of these optimization strategies is performed in two stages, first on a 24-core socket and second on a supercomputer, from 1 to 3,072 cores, demonstrating significant performance gains and very satisfactory weak scaling results of the code.
Efficient Data Structures for a Hybrid Parallel and Vectorized Particle-in-Cell Code
The contribution of the present work relies on an innovative and judicious combination of several optimization techniques for achieving high performance when using automatic vectorization and hybrid MPI/OpenMP parallelism in a Particle-in-Cell (PIC) code. The domain of application is plasma physics: the code simulates 2d2v Vlasov-Poisson systems on Cartesian grids with periodic boundary conditions. Overall, our code processes 65 million particles/second per core on Intel Haswell (without hyper-threading) and achieves a good weak scaling up to 0.4 trillion particles on 8,192 cores. The optimizations mainly consist in using (i) a structure of arrays for the particles, (ii) an efficient data structure for the electric field and the charge density, and (iii) an appropriate code for automatic vectorization of the charge accumulation and of the position update. In particular, we use space-filling curves to enhance data locality while enabling vectorization: starting from a redundant cell-based data structure for the electric field and for the charge density, we compare several space-filling curves for an efficient ordering of these data, and we obtain a gain of 36% in the number of L2 and L3 cache misses when using a Morton curve instead of the classical row-major one. In addition, by proposing a specific way of writing the position-update code, we achieve a 31% time improvement in that step. The optimizations bring an overall gain in the execution time of 42% with respect to a standard code. The parallelization of the particle loops is simply performed by means of both distributed and shared memory paradigms, without domain decomposition. We explain the weak and the strong scalings of the code, which are bounded, as expected, by the overhead of the MPI communications.
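To make the comparison between a Morton curve and the classical row-major ordering concrete, the sketch below computes both cell indices for a 2d grid. It is an illustration under assumptions and not the code of the paper: the function names, the choice of `ix` for the even bit positions, and the 16-bit coordinate width are all hypothetical.

```python
def row_major_index(ix: int, iy: int, ncy: int) -> int:
    """Classical row-major index of cell (ix, iy) on a grid with ncy columns."""
    return ix * ncy + iy


def morton_index(ix: int, iy: int, bits: int = 16) -> int:
    """Morton (Z-order) index obtained by interleaving the bits of ix and iy.

    Bit b of ix goes to bit 2*b of the result, bit b of iy to bit 2*b + 1.
    Which coordinate takes the even positions is a convention (assumed here).
    """
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code


if __name__ == "__main__":
    # Print the Morton index of every cell of a 4 x 4 grid, row by row.
    for ix in range(4):
        print([morton_index(ix, iy) for iy in range(4)])
```

With this ordering, the four cells of any aligned 2 x 2 block receive consecutive indices, so cells that are neighbours on the grid tend to be close in memory, which is the locality property that the abstract associates with fewer cache misses.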
OptiTrust: an Interactive Framework for Source-to-Source Transformations
This paper presents an interactive framework for developing high-performance C code via a series of source-to-source transformations. Optimization steps are described in transformation scripts, expressed as OCaml programs. The programmer can interactively visualize the textual differences associated with any step of the script. We demonstrate the effectiveness of OptiTrust by reproducing a manually optimized Particle-In-Cell numerical simulation, starting from a direct, unoptimized version of the algorithm. This case study covers many state-of-the-art optimization patterns that appear in numerical simulation codes. We argue that, compared with optimizing code by hand, deriving high-performance code using a transformation script makes the code easier to review, easier to debug, and easier to maintain as the intended program or the target hardware evolves.
High order numerical methods for Vlasov-Poisson models of plasma sheaths
This article is a report of the CEMRACS 2022 project HIVLASHEA, standing for "High order methods for Vlasov-Poisson models for sheaths". A two-species Vlasov-Poisson model is described together with some numerical simulations that exhibit the formation of a plasma sheath. The numerical simulations are performed with two different methods: a first-order classical finite difference (FD) scheme and a high-order semi-Lagrangian (SL) scheme with Strang splitting; for the latter, the implementation of (non-periodic) boundary conditions is discussed. The codes are first evaluated on a one-species case, where an analytical solution is known. For the two-species case, cross-comparisons are performed and the influence of the numerical parameters of the SL method is studied, in order to obtain a reference numerical simulation. Acknowledgements: Centre de Calcul Intensif d'Aix-Marseille is acknowledged for granting access to its high performance computing resources.
Pic-Vert : une implémentation de la méthode particulaire pour architectures multi-coeurs
In this thesis, we are interested in solving the Vlasov–Poisson system of equations (useful in the domain of plasma physics, for example within the ITER project), using classical Particle-in-Cell (PIC) and semi-Lagrangian methods. The main contribution of our thesis is an efficient implementation of the PIC method on multi-core architectures, written in C, called Pic-Vert. Our implementation (a) achieves a close-to-minimal number of memory transfers with the main memory, (b) exploits SIMD instructions for numerical computations, and (c) exhibits a high degree of shared memory parallelism. To put our work in perspective with respect to the state of the art, we propose a metric to compare the efficiency of different PIC implementations when using different multi-core architectures. Our implementation is 3 times faster than other recent implementations on the same architecture (Intel Haswell).
Pic-Vert : a particle-in-cell implementation for multi-core architectures
In this thesis, we are interested in solving the Vlasov–Poisson system of equations (useful in the domain of plasma physics, for example within the ITER project), using classical Particle-in-Cell (PIC) and semi-Lagrangian methods. The main contribution of our thesis is an efficient implementation of the PIC method on multi-core architectures, written in C, called Pic-Vert. Our implementation (a) achieves a close-to-minimal number of memory transfers with the main memory, (b) exploits SIMD instructions for numerical computations, and (c) exhibits a high degree of shared memory parallelism. To put our work in perspective with respect to the state of the art, we propose a metric to compare the efficiency of different PIC implementations when using different multi-core architectures. Our implementation is 3 times faster than other recent implementations on the same architecture (Intel Haswell).