2 research outputs found

Decentralized in-order execution of a sequential task-based code for shared-memory architectures

Author: Agullo Emmanuel
Aumage Olivier
Castes Charly
Saillard Emmanuelle
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/05/2022

The hardware complexity of modern machines makes the design of adequate programming models crucial for jointly ensuring performance, portability, and productivity in high-performance computing (HPC). Sequential task-based programming models paired with advanced runtime systems allow the programmer to write a sequential algorithm independently of the hardware architecture in a productive and portable manner, and let a third-party software layer (the runtime system) deal with the burden of scheduling a correct, parallel execution of that algorithm to ensure performance. Many HPC algorithms have successfully been implemented following this paradigm, as a testimony of its effectiveness. Developing algorithms that specifically require fine-grained tasks along this model is still considered prohibitive, however, due to per-task management overhead [1], forcing the programmer to resort to a less abstract, and hence more complex, "task+X" model. We thus investigate the possibility of offering a tailored execution model, trading dynamic mapping for efficiency by using a decentralized, conservative in-order execution of the task flow, while preserving the benefits of relying on the sequential task-based programming model. We propose a formal specification of the execution model as well as a prototype implementation, which we assess on a shared-memory multicore architecture with several synthetic workloads. The results show that, provided the programmer supplies a proper task mapping, the pressure on the runtime system is significantly reduced and the execution of fine-grained task flows is much more efficient.
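To make the idea of a decentralized, in-order execution with a programmer-supplied mapping more concrete, here is a minimal sketch in Python. It is not the authors' formal specification or prototype: the function name `run_in_order`, the task tuple layout `(owner, reads, writes, fn)`, and the per-datum counter protocol are all assumptions chosen for illustration. Every worker walks the same sequential task flow in order, executes only the tasks mapped to it, and waits on shared read/write counters until earlier accesses to the same data have completed.

```python
import threading

def run_in_order(task_flow, num_workers, data):
    """Hypothetical sketch: every worker replays the whole flow, runs only its own tasks."""
    # Shared progress counters per datum (assumed protocol, not taken from the paper).
    writes_done = {key: 0 for key in data}
    reads_done = {key: 0 for key in data}
    cond = threading.Condition()

    def worker(rank):
        # Private view: how many accesses to each datum precede the current task.
        seen_writes = {key: 0 for key in data}
        seen_reads = {key: 0 for key in data}
        for owner, reads, writes, fn in task_flow:
            pure_reads = [d for d in reads if d not in writes]
            if owner == rank:
                def ready():
                    # Inputs: every earlier write to the datum has completed.
                    inputs_ok = all(writes_done[d] == seen_writes[d] for d in pure_reads)
                    # Outputs: every earlier write and read has completed.
                    outputs_ok = all(
                        writes_done[d] == seen_writes[d] and reads_done[d] == seen_reads[d]
                        for d in writes
                    )
                    return inputs_ok and outputs_ok
                with cond:
                    cond.wait_for(ready)
                fn(data)  # run the task body outside the lock
                with cond:
                    for d in pure_reads:
                        reads_done[d] += 1
                    for d in writes:
                        writes_done[d] += 1
                    cond.notify_all()
            # Owner or not, advance the private view of the flow.
            for d in pure_reads:
                seen_reads[d] += 1
            for d in writes:
                seen_writes[d] += 1

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data

# Example task bodies (illustrative only).
def set_x(d): d["x"] = 1
def set_y(d): d["y"] = d["x"] + 1
def scale_x(d): d["x"] = d["x"] * 10
def set_z(d): d["z"] = d["x"] + d["y"]

# Sequential task flow with a static mapping: (owner, reads, writes, body).
flow = [
    (0, [], ["x"], set_x),
    (1, ["x"], ["y"], set_y),
    (0, ["x"], ["x"], scale_x),
    (1, ["x", "y"], ["z"], set_z),
]
result = run_in_order(flow, 2, {"x": 0, "y": 0, "z": 0})
```

In this sketch there is no central scheduler or shared ready queue. The only shared state is the pair of counters per datum, which is one plausible way to trade dynamic mapping for lower per-task overhead.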

INRIA a CCSD electronic archive server

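Exécution ordonnée décentralisée d'un code séquentiel à base de tâches sur une architecture à mémoire partagée (English title: Decentralized in-order execution of a sequential task-based code for shared-memory architectures)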

Author: Agullo Emmanuel
Aumage Olivier
Castes Charly
Saillard Emmanuelle
Publication venue: HAL CCSD
Publication date: 28/01/2022

Abstract:
Decentralized in-order execution of a sequential task-based code for shared-memory architectures
Charly Castes, Emmanuel Agullo, Olivier Aumage, Emmanuelle Saillard
Project-Teams HiePACS and STORM
Research Report n° 9450, January 2022, 30 pages

The hardware complexity of modern machines makes the design of adequate programming models crucial for jointly ensuring performance, portability, and productivity in high-performance computing (HPC). Sequential task-based programming models paired with advanced runtime systems allow the programmer to write a sequential algorithm independently of the hardware architecture in a productive and portable manner, and let a third-party software layer (the runtime system) deal with the burden of scheduling a correct, parallel execution of that algorithm to ensure performance. Many HPC algorithms have successfully been implemented following this paradigm, as a testimony of its effectiveness.

Developing algorithms that specifically require fine-grained tasks along this model is still considered prohibitive, however, due to per-task management overhead [1], forcing the programmer to resort to a less abstract, and hence more complex, “task+X” model. We thus investigate the possibility of offering a tailored execution model, trading dynamic mapping for efficiency by using a decentralized, conservative in-order execution of the task flow, while preserving the benefits of relying on the sequential task-based programming model. We propose a formal specification of the execution model as well as a prototype implementation, which we assess on a shared-memory multicore architecture with several synthetic workloads. The results show that, provided the programmer supplies a proper task mapping, the pressure on the runtime system is significantly reduced and the execution of fine-grained task flows is much more efficient.

INRIA a CCSD electronic archive server

HAL-Rennes 1