5 research outputs found

    On the benefits of tasking with OpenMP

    Tasking promises a programming model for parallel applications with intuitive semantics. With task dependences, it also promises better load balancing by removing global synchronizations (barriers), and offers potential for improved locality. Still, the adoption of tasking in production HPC codes has been slow: despite OpenMP supporting tasks, most codes rely on worksharing-loop constructs alongside MPI primitives. This paper provides insights into the benefits of tasking over the worksharing-loop model by reporting on the experience of taskifying an adaptive mesh refinement proxy application, miniAMR. The performance evaluation shows the taskified implementation being 15–30% faster than the loop-parallel one for certain thread counts across four systems, three architectures and four compilers, thanks to better load balancing and system utilization. Dynamic scheduling of loops narrows the gap but still falls short of tasking due to serial sections between loops. Locality improvements are incidental owing to the lack of locality-aware scheduling. Overall, the introduction of asynchrony with tasking lives up to its promises, provided that programmers parallelize beyond individual loops and across application phases.
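    The contrast the paper measures can be illustrated with a minimal sketch (a hypothetical two-phase computation, not miniAMR itself): worksharing loops insert an implicit barrier between phases, while tasks with `depend` clauses let each element of the second phase start as soon as its own input is ready.

    ```c
    #include <stdio.h>

    #define N 4

    int main(void) {
        double a[N] = {0}, b[N] = {0};

        /* Worksharing-loop style: the implicit barrier after the first
         * loop means no iteration of phase 2 may start until every
         * iteration of phase 1 has finished. */
        #pragma omp parallel
        {
            #pragma omp for
            for (int i = 0; i < N; i++) a[i] = i;        /* phase 1 */
            #pragma omp for
            for (int i = 0; i < N; i++) b[i] = 2 * a[i]; /* phase 2 */
        }

        /* Task style: per-element dependences replace the global
         * barrier, so b[i] may be computed as soon as its own a[i]
         * is ready, regardless of the other elements. */
        #pragma omp parallel
        #pragma omp single
        for (int i = 0; i < N; i++) {
            #pragma omp task depend(out: a[i]) firstprivate(i)
            a[i] = i;
            #pragma omp task depend(in: a[i]) firstprivate(i)
            b[i] = 2 * a[i];
        }

        for (int i = 0; i < N; i++) printf("%g ", b[i]);
        printf("\n");
        return 0;
    }
    ```

    Compiled without OpenMP support the pragmas are ignored and the code runs serially with the same result, which is what makes this pattern easy to adopt incrementally.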

    Detecting Non-Sibling Dependencies in OpenMP Task-Based Applications

    No full text
    The advent of the multicore era led to the duplication of functional units through an increasing number of cores. One direction for exploiting those processors is a shared-memory parallel programming model, and OpenMP is a good candidate, enabling different paradigms: data parallelism (including loop-based directives) and control parallelism through the notion of tasks with dependencies. But it is the programmer's responsibility to ensure that the declared data dependencies are complete, so that no data races can occur. Guaranteeing that no issue will occur and that all dependencies have been correctly expressed can be complex in the context of nested tasks. This paper proposes an algorithm to detect data dependencies that might be missing from the OpenMP task clauses of tasks generated by different parents. The approach is implemented in a tool relying on the OMPT interface.
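    The non-sibling pitfall the paper targets can be sketched in a few lines (an illustrative example, not taken from the paper): `depend` clauses only order *sibling* tasks, i.e. tasks generated by the same parent, so a dependence on a parent task says nothing about that parent's unfinished children.

    ```c
    #include <stdio.h>

    int main(void) {
        int x = 0;

        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task depend(out: x)     /* parent task A */
            {
                #pragma omp task                /* child of A: writes x */
                x = 1;
                /* Without this taskwait, A could complete before its
                 * child runs, and the read in task B below would race
                 * with the nested write: B's depend(in: x) orders B
                 * after its sibling A, not after A's descendants. */
                #pragma omp taskwait
            }
            #pragma omp task depend(in: x)      /* parent task B */
            printf("x = %d\n", x);
        }
        return 0;
    }
    ```

    Detecting exactly this kind of missing ordering between tasks of different parents is what the proposed OMPT-based tool automates.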

    On the Impact of OpenMP Task Granularity

    Get PDF
    Tasks are a good support for composition. During the development of a high-level component model for HPC, we have experimented with managing parallelism from components using OpenMP tasks. Since version 4.0, the standard proposes a model with dependent tasks that seems very attractive, because it enables the description of dependencies between tasks generated by different components without breaking maintainability constraints such as separation of concerns. The paper presents our feedback on using OpenMP in this context. We find that our main issues are a task granularity too coarse for our expected performance on classical OpenMP runtimes, and a task throttling heuristic that is counter-productive for our applications. We present a completion-time breakdown of task management in the Intel OpenMP runtime and propose extensions, evaluated on a testbed application derived from the Gysela plasma-physics application.
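    Granularity trade-offs of this kind are commonly handled by aggregating work into blocks; the following sketch (illustrative only, not the paper's code) shows the usual tuning knob: per-task runtime costs (creation, scheduling, dependence matching) are paid once per block rather than once per element.

    ```c
    #include <stdio.h>

    #define N     1024
    #define BLOCK 256   /* tuning knob: larger blocks amortize task overhead */

    int main(void) {
        static double v[N];

        #pragma omp parallel
        #pragma omp single
        for (int start = 0; start < N; start += BLOCK) {
            /* One task per BLOCK elements instead of one per element:
             * the fixed per-task cost is paid N/BLOCK times, not N
             * times, at the price of fewer scheduling opportunities. */
            #pragma omp task firstprivate(start)
            for (int i = start; i < start + BLOCK; i++)
                v[i] = 0.5 * i;
        }

        printf("%g\n", v[N - 1]);
        return 0;
    }
    ```

    Picking BLOCK is exactly the tension the paper analyses: too small and task-management overhead dominates; too large and load balancing degrades.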

    Preliminary Experience with OpenMP Memory Management Implementation

    No full text
    Because of the evolution of compute units, memory heterogeneity is becoming common in HPC systems, but dealing with these various memory levels often requires different approaches and interfaces. For this purpose, OpenMP 5.0 defines memory-management constructs to offer application developers the ability to exploit multiple memory spaces in a portable way. This paper proposes an overview of memory management from applications down to runtimes. We describe a convenient way to tune an application to include memory-management constructs, and we detail a methodology to integrate them into an OpenMP runtime supporting multiple memory types (DDR, MCDRAM and NVDIMM). We implement our design in the MPC framework and present results on a realistic benchmark.
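    For readers unfamiliar with these constructs, a minimal sketch of the OpenMP 5.0 allocator API looks as follows (an illustrative example, not the paper's benchmark; the `#ifdef` fallback is my addition to keep it buildable without an OpenMP runtime):

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    int main(void) {
        size_t n = 1000;
        double *hot;

    #ifdef _OPENMP
        /* OpenMP 5.0 memory-management API: request the predefined
         * high-bandwidth-memory allocator (e.g. MCDRAM). The runtime
         * may fall back to default memory when no such space exists,
         * which is what keeps the program portable. */
        hot = omp_alloc(n * sizeof *hot, omp_high_bw_mem_alloc);
    #else
        hot = malloc(n * sizeof *hot);  /* plain allocation otherwise */
    #endif
        if (!hot) return 1;

        double sum = 0.0;
        for (size_t i = 0; i < n; i++) hot[i] = 1.0;
        for (size_t i = 0; i < n; i++) sum += hot[i];
        printf("%g\n", sum);

    #ifdef _OPENMP
        omp_free(hot, omp_high_bw_mem_alloc);
    #else
        free(hot);
    #endif
        return 0;
    }
    ```

    The paper's contribution sits below this interface: how a runtime such as MPC maps these allocator requests onto the actual DDR, MCDRAM and NVDIMM spaces of the machine.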

    sOMP: Simulating OpenMP Task-Based Applications with NUMA Effects

    Get PDF
    Anticipating the behavior of applications and studying or designing algorithms are among the main purposes of performance and correctness studies of simulations in intensive computing. Yet studies that evaluate the single-node performance of a simulation often do not consider Non-Uniform Memory Access (NUMA) as having a critical effect. This work focuses on accurately predicting the performance of task-based OpenMP applications from traces collected through the OMPT interface. We first introduce TiKKi, a tool that records a rich, high-level representation of the execution trace of a real OpenMP application. From this trace, an accurate prediction of the execution time is built from the architecture of the machine and sOMP, a SimGrid-based simulator for task-based applications with data dependencies. These predictions improve when the model takes memory transfers into account. We show that good precision (10% relative error on average) can be obtained for various task grain sizes and different numbers of cores on different shared-memory architectures.