Pipelining the Fast Multipole Method over a Runtime System
The fast multipole method (FMM) is a fundamental operation for the simulation
of many physical problems. The high-performance design of such methods usually
requires carefully tuning the algorithm for both the targeted physics and the
hardware. In this paper, we propose a new approach that achieves high
performance across architectures. Our method consists of expressing the FMM
algorithm as a task flow and employing a state-of-the-art runtime system,
StarPU, in order to process the tasks on the different processing units. We
carefully design the task flow, the mathematical operators, their Central
Processing Unit (CPU) and Graphics Processing Unit (GPU) implementations, as
well as scheduling schemes. We compute potentials and forces of 200 million
particles in 48.7 seconds on a homogeneous 160-core SGI Altix UV 100 and of 38
million particles in 13.34 seconds on a heterogeneous 12-core Intel Nehalem
processor enhanced with 3 Nvidia M2090 Fermi GPUs.
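As a rough illustration of the approach (not the authors' code), the pattern of declaring an FMM operator as a StarPU codelet and submitting it as a task might look like the sketch below; the `p2p_cpu` kernel, the single potentials vector, and the CPU-only registration are simplifying assumptions, whereas the paper also registers CUDA implementations.

```cpp
#include <starpu.h>
#include <vector>
#include <cstddef>

// Placeholder CPU kernel standing in for one FMM operator (e.g. the near-field P2P pass).
static void p2p_cpu(void *buffers[], void * /*cl_arg*/)
{
    float  *pot = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
    size_t  n   = STARPU_VECTOR_GET_NX(buffers[0]);
    for (size_t i = 0; i < n; ++i)
        pot[i] += 1.0f;                // stand-in for the real interaction computation
}

int main()
{
    if (starpu_init(NULL) != 0) return 1;

    // A codelet groups the CPU (and, in the paper, CUDA) implementations of one task type.
    struct starpu_codelet p2p_cl;
    starpu_codelet_init(&p2p_cl);
    p2p_cl.cpu_funcs[0] = p2p_cpu;     // a cuda_funcs[0] entry would add a GPU variant
    p2p_cl.nbuffers     = 1;
    p2p_cl.modes[0]     = STARPU_RW;

    // Register the data so the runtime can manage transfers between processing units.
    std::vector<float> potentials(1024, 0.0f);
    starpu_data_handle_t handle;
    starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
                                (uintptr_t)potentials.data(),
                                potentials.size(), sizeof(float));

    // Submitting tasks only declares the task flow; StarPU schedules them asynchronously
    // on the available CPUs and GPUs while honoring data dependencies.
    starpu_task_insert(&p2p_cl, STARPU_RW, handle, 0);

    starpu_task_wait_for_all();
    starpu_data_unregister(handle);
    starpu_shutdown();
    return 0;
}
```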
PGAS-FMM: Implementing a distributed fast multipole method using the X10 programming language
The fast multipole method (FMM) is a complex, multi-stage algorithm over a distributed tree data structure, with multiple levels of parallelism and inherent data locality. X10 is a modern partitioned global address space language with support for asynchronous activities.
From Piz Daint to the Stars: Simulation of Stellar Mergers using High-Level Abstractions
We study the simulation of stellar mergers, which requires complex
simulations with high computational demands. We have developed Octo-Tiger, a
finite-volume, grid-based hydrodynamics simulation code with Adaptive Mesh
Refinement, which is unique in conserving both linear and angular momentum to
machine precision. To face the challenge of increasingly complex, diverse, and
heterogeneous HPC systems, Octo-Tiger relies on high-level programming
abstractions.
We use HPX with its futurization capabilities to ensure scalability both
between and within nodes, and present first results from replacing MPI with
libfabric, achieving up to a 2.8x speedup. We extend Octo-Tiger to heterogeneous
GPU-accelerated supercomputers, demonstrating node-level performance and
portability. We show scalability up to full system runs on Piz Daint. For the
scenario's maximum resolution, the compute-critical parts (hydrodynamics and
gravity) achieve 68.1% parallel efficiency at 2048 nodes.
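The "futurization" HPX provides amounts to composing futures so that dependent work is expressed as continuations rather than barriers. A minimal sketch of that pattern follows; it is not Octo-Tiger code, `solve_hydro` and `solve_gravity` are hypothetical placeholders, and the header names assume a recent HPX release.

```cpp
#include <hpx/hpx_main.hpp>   // provides main() on top of an initialized HPX runtime
#include <hpx/future.hpp>
#include <iostream>
#include <utility>

// Hypothetical stand-ins for one sub-grid's hydrodynamics and gravity solves.
double solve_hydro(double state)   { return state * 0.5; }
double solve_gravity(double state) { return state + 1.0; }

int main()
{
    // Launch both solves asynchronously; each call returns immediately with a
    // future representing the eventual result, so other work can proceed.
    hpx::future<double> hydro   = hpx::async(solve_hydro, 1.0);
    hpx::future<double> gravity = hpx::async(solve_gravity, 1.0);

    // Attach a continuation that runs once both results are ready,
    // instead of blocking the calling thread at a barrier.
    hpx::future<double> coupled =
        hpx::when_all(std::move(hydro), std::move(gravity))
            .then([](auto both) {
                auto results = both.get();          // tuple of ready futures
                return hpx::get<0>(results).get() + hpx::get<1>(results).get();
            });

    std::cout << "coupled result: " << coupled.get() << "\n";
    return 0;
}
```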
Task-based programming for Seismic Imaging: Preliminary Results
The level of hardware complexity of current supercomputers is forcing the High Performance Computing (HPC) community to reconsider parallel programming paradigms and standards. The high level of hardware abstraction provided by task-based paradigms makes them excellent candidates for writing portable codes that can consistently deliver high performance across a wide range of platforms. While this paradigm has proved efficient for achieving such goals for dense and sparse linear solvers, it has yet to be demonstrated that industrial parallel codes relying on the classical Message Passing Interface (MPI) standard, and accumulating dozens of years of expertise (and countless lines of code), can be revisited and turned into efficient task-based programs. In this paper, we study the applicability of task-based programming in the case of a Reverse Time Migration (RTM) application for Seismic Imaging. The initial MPI-based application is turned into a task-based code executed on top of the PaRSEC runtime system. Preliminary results show that the approach is competitive with (and even potentially superior to) the original MPI code on a homogeneous multicore node and can exploit complex hardware, such as a cache-coherent Non-Uniform Memory Access (ccNUMA) node or an Intel Xeon Phi accelerator, much more efficiently.
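The essence of such a rewrite is replacing bulk-synchronous MPI phases with fine-grained tasks over wavefield tiles, so a runtime can schedule them and overlap communication with computation. The paper uses PaRSEC; the following is a hedged, runtime-agnostic sketch in plain C++ (std::async standing in for a real task runtime, with a hypothetical 1-D tile decomposition and a trivial 3-point stencil rather than the 3-D wave-equation kernel).

```cpp
#include <future>
#include <vector>
#include <iostream>

// One time step of a trivial 3-point stencil over the tile [lo, hi) of the wavefield.
// In a real RTM code this would be the 3-D wave-equation update kernel.
static void update_tile(const std::vector<double>& prev, std::vector<double>& next,
                        std::size_t lo, std::size_t hi)
{
    for (std::size_t i = lo; i < hi; ++i) {
        double left  = (i == 0) ? 0.0 : prev[i - 1];
        double right = (i + 1 == prev.size()) ? 0.0 : prev[i + 1];
        next[i] = 0.5 * prev[i] + 0.25 * (left + right);
    }
}

int main()
{
    const std::size_t n = 1 << 16, tiles = 8, tile = n / tiles;
    std::vector<double> prev(n, 1.0), next(n, 0.0);

    for (int step = 0; step < 10; ++step) {
        // Express the time step as independent per-tile tasks; a task runtime
        // (PaRSEC in the paper) would instead track halo dependencies between
        // neighboring tiles rather than relying on this global barrier.
        std::vector<std::future<void>> tasks;
        for (std::size_t t = 0; t < tiles; ++t)
            tasks.push_back(std::async(std::launch::async, update_tile,
                                       std::cref(prev), std::ref(next),
                                       t * tile, (t + 1) * tile));
        for (auto& f : tasks) f.get();   // barrier: all tiles done for this step
        std::swap(prev, next);
    }

    std::cout << "sample value: " << prev[n / 2] << "\n";
    return 0;
}
```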