
3 research outputs found

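Optimisation d'un algorithme Galerkin Discontinu en OpenCL appliqué à la simulation en électromagnétisme [Optimization of a Discontinuous Galerkin algorithm in OpenCL applied to electromagnetic simulation]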

Authors: Helluy Philippe, Strub Thomas, Weber Bruno
Publication venue: HAL CCSD
Publication date: 15/11/2017

In this paper, we present GPU and CPU optimization results for a Discontinuous Galerkin algorithm applied to electromagnetism and implemented in OpenCL and MPI. This algorithm was initially optimized to run in parallel on several GPUs and then adapted for CPUs. GPUs and CPUs each require a specific implementation adapted to their hardware architecture. We begin by describing the field of application. Then, we present the GPU optimizations as well as the performance obtained on GPU and CPU with this version of the code. Finally, we describe the adaptations made for the OpenCL CPU optimizations, which increased CPU performance tenfold.

INRIA a CCSD electronic archive server

Optimization of a discontinuous Galerkin solver with OpenCL and StarPU

Authors: Bramas Bérenger, Helluy Philippe, Mendoza Laura, Weber Bruno
Publication venue: Institut de Mathématiques de Marseille, AMU
Publication date: 29/01/2020

With recent advances in microprocessor design, the optimization of computing software has become more and more technical. One of the difficulties is to transform sequential algorithms into parallel ones. A possible solution is the task-based design. In this approach, the parallelization possibilities of the algorithm can be described automatically. The task-based design is also a good strategy for optimizing software incrementally. The objective of this paper is to describe a practical experience of task-based parallelization of a Discontinuous Galerkin method in the context of electromagnetic simulations. The task-based description is managed by the StarPU runtime. Additional acceleration is obtained with OpenCL.
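
As a rough illustration of the task-based idea in the abstract above, the following Python sketch derives task dependencies from the data each task declares it reads and writes, then groups tasks that could run concurrently. This is not StarPU's API and not the authors' code: the function names `infer_dependencies` and `parallel_levels`, the task names, and the data handles (`w_A`, `dw_A`, and so on) are all hypothetical, chosen only to resemble a Discontinuous Galerkin step on two sub-domains A and B.

```python
def infer_dependencies(tasks):
    """tasks: list of (name, reads, writes) in submission order.
    Returns {name: set of earlier task names it must wait for}."""
    last_writer = {}  # data handle -> last task that wrote it
    readers = {}      # data handle -> tasks that read it since the last write
    deps = {}
    for name, reads, writes in tasks:
        wait_for = set()
        for handle in reads:
            if handle in last_writer:
                wait_for.add(last_writer[handle])      # read-after-write
        for handle in writes:
            if handle in last_writer:
                wait_for.add(last_writer[handle])      # write-after-write
            wait_for.update(readers.get(handle, ()))   # write-after-read
        deps[name] = wait_for
        for handle in reads:
            readers.setdefault(handle, set()).add(name)
        for handle in writes:
            last_writer[handle] = name
            readers[handle] = set()
    return deps


def parallel_levels(deps):
    """Group tasks into levels; tasks in the same level are independent."""
    level = {}
    for name, wait_for in deps.items():  # submission order is topological
        level[name] = 1 + max((level[p] for p in wait_for), default=-1)
    groups = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for name, lv in level.items():
        groups[lv].append(name)
    return groups


# Hypothetical DG-like step: volume terms, interface flux, then field update.
tasks = [
    ("volume_A", ["w_A"], ["dw_A"]),
    ("volume_B", ["w_B"], ["dw_B"]),
    ("flux_AB", ["w_A", "w_B"], ["dw_A", "dw_B"]),
    ("update_A", ["dw_A"], ["w_A"]),
    ("update_B", ["dw_B"], ["w_B"]),
]
levels = parallel_levels(infer_dependencies(tasks))
```

The sequential submission order is all the programmer writes; the two volume tasks and the two update tasks end up in shared levels because their declared data do not conflict. A real runtime schedules tasks dynamically rather than in fixed levels, so the grouping here only shows which parallelism is available.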

INRIA a CCSD electronic archive server

Asynchronous OpenCL/MPI numerical simulations of conservation laws

Authors: Helluy Philippe, Massaro Michel, Roberts Malcolm, Strub Thomas
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Hyperbolic conservation laws are important mathematical models for describing many phenomena in physics and engineering. The Finite Volume (FV) method and the Discontinuous Galerkin (DG) method are two popular methods for solving conservation laws on computers. Both methods are good candidates for parallel computing:
• they require a large number of uniform and simple computations,
• they rely on explicit time integration,
• they present regular and local data access patterns.
In this paper, we present several FV and DG numerical simulations that we have carried out with the OpenCL and MPI paradigms. First, we compare two optimized implementations of the FV method on a regular grid: an OpenCL implementation and a more traditional OpenMP implementation. We compare the efficiency of the approach on several CPU and GPU architectures of different brands. Then we give a short presentation of the DG method. Finally, we present how we have implemented this DG method in the OpenCL/MPI framework in order to achieve high efficiency. The implementation relies on splitting the DG mesh into sub-domains and sub-zones. Different kernels are compiled according to the properties of each zone. In addition, we rely on the OpenCL asynchronous task graph in order to overlap OpenCL computations, memory transfers and MPI communications. This work has been supported by the French defense agency DGA, by the Labex ANR-11-LABX-0055-IRMIA and by the AxesSim company.
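
The three properties listed in the abstract above can be seen in even the simplest finite volume scheme. The sketch below is a minimal illustration, assuming a first-order upwind scheme for the 1D linear advection equation u_t + a u_x = 0 (a > 0) on a periodic regular grid with explicit Euler time stepping. It is not the scheme or the code used in the paper, and the function name `fv_step` is hypothetical.

```python
def fv_step(u, a, dx, dt):
    """One explicit Euler step of the first-order upwind finite volume
    scheme for u_t + a u_x = 0 with a > 0 on a periodic 1D grid."""
    n = len(u)
    # Numerical flux through the left face of each cell. Upwinding with
    # a > 0 takes the value from the left neighbor; u[-1] wraps around,
    # which gives the periodic boundary condition.
    flux = [a * u[i - 1] for i in range(n)]
    # Each cell is updated from its own two face fluxes only: the same
    # simple operation everywhere, with purely local data access.
    return [u[i] - dt / dx * (flux[(i + 1) % n] - flux[i]) for i in range(n)]


# With a * dt / dx = 1 the upwind scheme shifts the profile by one cell.
shifted = fv_step([0, 1, 0, 0], a=1, dx=1, dt=1)

# With a smaller time step the profile is smeared but the total is conserved.
u0 = [0.0, 1.0, 2.0, 0.5]
u1 = fv_step(u0, a=1.0, dx=1.0, dt=0.5)
```

Because every cell performs the same update using only its neighbors' values, the loop body maps naturally onto one work-item per cell in an OpenCL kernel or one iteration of an OpenMP parallel loop.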

Crossref

INRIA a CCSD electronic archive server