14 research outputs found

Fibers are not (P)Threads: The Case for Loose Coupling of Asynchronous Programming Models and MPI Through Continuations

Author: Barcelona Supercomputing Center
Bosilca G.
Grant E.
Hoefler Torsten
IEEE and The Open Group
Iwasaki S.
Pritchard Howard
Schuchart J.
Schuchart Joseph
Unified Communication Framework Consortium
Publication venue: Association for Computing Machinery (ACM)
Publication date: 12/11/2020

Asynchronous programming models (APM) are gaining more and more traction, allowing applications to expose the available concurrency to a runtime system tasked with coordinating the execution. While MPI has long provided support for multi-threaded communication and non-blocking operations, it falls short of adequately supporting APMs as correctly and efficiently handling MPI communication in different models is still a challenge. Meanwhile, new low-level implementations of light-weight, cooperatively scheduled execution contexts (fibers, aka user-level threads (ULT)) are meant to serve as a basis for higher-level APMs and their integration in MPI implementations has been proposed as a replacement for traditional POSIX thread support to alleviate these challenges. In this paper, we first establish a taxonomy in an attempt to clearly distinguish different concepts in the parallel software stack. We argue that the proposed tight integration of fiber implementations with MPI is neither warranted nor beneficial and instead is detrimental to the goal of MPI being a portable communication abstraction. We propose MPI Continuations as an extension to the MPI standard to provide callback-based notifications on completed operations, leading to a clear separation of concerns by providing a loose coupling mechanism between MPI and APMs. We show that this interface is flexible and interacts well with different APMs, namely OpenMP detached tasks, OmpSs-2, and Argobots.

Comment: 12 pages, 7 figures. Published in proceedings of EuroMPI/USA '20, September 21-24, 2020, Austin, TX, US

arXiv.org e-Print Archive

Crossref

Extending the Functionality of Score-P through Plugins: Interfaces and Use Cases

Author: Hackenberg Daniel
Ilsche Thomas
Nagel Wolfgang E.
Schuchart Joseph
Schöne Robert
Tschüter Ronny
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Performance measurement and runtime tuning tools are both vital in the HPC software ecosystem and use similar techniques: the analyzed application is interrupted at specific events and information on the current system state is gathered to be either recorded or used for tuning. One of the established performance measurement tools is Score-P. It supports numerous HPC platforms and parallel programming paradigms. To extend Score-P with support for different back-ends, create a common framework for measurement and tuning of HPC applications, and to enable the re-use of common software components such as implemented instrumentation techniques, this paper makes the following contributions: (I) We describe the Score-P metric plugin interface, which enables programmers to augment the event stream with metric data from supplementary data sources that are otherwise not accessible for Score-P. (II) We introduce the flexible Score-P substrate plugin interface that can be used for custom processing of the event stream according to the specific requirements of either measurement, analysis, or runtime tuning tasks. (III) We provide examples for both interfaces that extend Score-P’s functionality for monitoring and tuning purposes

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Crossref

Technische Universität Dresden: Qucosa

MPI Application Binary Interface Standardization

Author: Besnard Jean-Baptiste
Brown Jed
Byrne Simon
Dalcin Lisandro
Gadeschi Gonzalo Brito
Hammond Jeff R.
Pérache Marc
Schnetter Erik
Schuchart Joseph
Zhou Hui
Publication date: 22/08/2023

MPI is the most widely used interface for high-performance computing (HPC) workloads. Its success lies in its embrace of libraries and ability to evolve while maintaining backward compatibility for older codes, enabling them to run on new architectures for many years. In this paper, we propose a new level of MPI compatibility: a standard Application Binary Interface (ABI). We review the history of MPI implementation ABIs, identify the constraints from the MPI standard and ISO C, and summarize recent efforts to develop a standard ABI for MPI. We provide the current proposal from the MPI Forum's ABI working group, which has been prototyped both within MPICH and as an independent abstraction layer called Mukautuva. We also list several use cases that would benefit from the definition of an ABI while outlining the remaining constraints

arXiv.org e-Print Archive

Globalization, Regional Integration, and Economic Development

Author: Beseda Martin
Horák David
Kružík Jakub
Schuchart Joseph
Sojka Radim
Čermák Martin
Říha Lubomír
Publication venue: Книжный дом
Publication date: 01/01/2007

The paper deals with the energy consumption evaluation of the Finite Element Tearing and Interconnect (FETI) based solvers of linear systems, which is an established method for solving real-world engineering problems. The authors evaluated the effect of the CPU frequency on the energy consumption of the FETI solver using a linear elasticity 3D cube synthetic benchmark. In this problem, the effect of frequency tuning on the energy consumption of the essential processing kernels of the FETI method was evaluated. The paper provides results for two types of frequency tuning: (1) static tuning and (2) dynamic tuning. For static tuning experiments, the frequency is set before execution and kept constant during the runtime. For dynamic tuning, the frequency is changed during the program execution to adapt the system to the actual needs of the application. The paper shows that static tuning brings up to 12% energy savings when compared to default CPU settings (the highest clock rate). The dynamic tuning improves this further by up to 3%.

Crossref

DSpace at VSB Technical University of Ostrava

BSU Digital Library

The EU Center of Excellence for Exascale in Solid Earth (ChEESE): Implementation, results, and roadmap for the second phase

Author: Abril Claudia
Afanasiev Michael
Amati Giorgio
Aniko Wirp Sara
Bader Michael
Badia Rosa M.
Barsotti Sara
Basili Roberto
Bayraktar Hafize B.
Bernardi Fabrizio
Boehm Christian
Brizuela Beatriz
Brogi Federico
Cabrera Eduardo
Casarotti Emanuele
Castro Manuel J.
Cerminara Matteo
Cheptsov Alexey
Cirella Antonella
Conejero Javier
Costa Antonio
de la Asunción Marc
de la Puente Josep
Djuric Marco
Dorozhinskii Ravil
Espinosa Gabriela
Esposti-Ongaro Tomaso
Farnós Joan
Favretto-Cristini Nathalie
Fichtner Andreas
Folch Arnau
Fournier Alexandre
Gabriel Alice-Agnes
Gallard Jean-Matthieu
Gibbons Steven John
Glimsdal Sylfest
González-Vida José Manuel
Gracia Jose
Gregorio Rose
Gutierrez Natalia
Halldorsson Benedikt
Hamitou Okba
Houzeaux Guillaume
Jaure Stephan
Kessar Mouloud
Krenz Lukas
Krischer Lion
Laforet Soline
Lanucara Piero
Li Bo
Lorenzino Maria Concetta
Lorito Stefano
Løvholt Finn
Macedonio Giovanni
Macías Jorge
Martínez Montesinos Beatriz
Marín Guillermo
Mingari Leonardo
Moguilny Geneviève
Montellier Vadim
Monterrubio-Velasco Marisol
Moulard Georges Emmanuel
Nagaso Masaru
Nazaria Massimo
Niethammer Christoph
Pardini Federica
Pienkowska Marta
Pizzimenti Luca
Poiata Natalia
Rannabauer Leonhard
Rodriguez Juan Esteban
Rojas Otilio
Romano Fabrizio
Rudyy Oleksandr
Ruggiero Vittorio
Samfass Philipp
Sanchez Sabrina
Sandri Laura
Scala Antonio
Schaeffer Nathanael
Schuchart Joseph
Selva Jacopo
Sergeant Amadine
Stallone Angela
Sánchez-Linares Carlos
Taroni Matteo
Thrastarson Soelvi
Titos Manuel
Tonello Nadia
Tonini Roberto
Ulrich Thomas
Vilotte Jean-Pierre
Volpe Manuela
Vöge Malte
Wössner Uwe
Publication date: 01/01/2023

HAL AMU

Norwegian Geotechnical Institute (NGI) Digital Archive

Scalable Tools for Non-Intrusive Performance Debugging of Parallel Linux Workloads

Author: Hackenberg Daniel
Ilsche Thomas
Schuchart Joseph
Schöne Robert
Publication venue: Ottawa Linux Symposium Committee
Publication date: 01/01/2014

There is a variety of tools to measure the performance of Linux systems and the applications running on them. However, the resulting performance data is often presented in plain text format or only with a very basic user interface. For large systems with many cores and concurrent threads, it is increasingly difficult to present the data in a clear way for analysis. Moreover, certain performance analysis and debugging tasks require the use of a high-resolution time-line based approach, again entailing data visualization challenges. Tools in the area of High Performance Computing (HPC) have long been able to scale to hundreds or thousands of parallel threads and help find performance anomalies. We therefore present a solution to gather performance data using Linux performance monitoring interfaces. A combination of sampling and careful instrumentation allows us to obtain detailed performance traces with manageable overhead. We then convert the resulting output to the Open Trace Format (OTF) to bridge the gap between the recording infrastructure and HPC analysis tools. We explore ways to visualize the data by using the graphical tool Vampir. The combination of established Linux and HPC tools allows us to create an interface for easy navigation through time-ordered performance data grouped by thread or CPU and to help users find opportunities for performance optimizations.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Scalable Tools for Non-Intrusive Performance Debugging of Parallel Linux Workloads

Author: Hackenberg Daniel
Ilsche Thomas
Schuchart Joseph
Schöne Robert
Publication venue: Ottawa Linux Symposium Committee

HSSS - Hochschulschriftenserver der SLUB

Run-Time Exploitation of Application Dynamism for Energy-Efficient Exascale Computing (READEX)

Author: Gerndt Michael
Kjeldsberg Per Gunnar
Nagel Wolfgang
Oleynik Yury
Schuchart Joseph
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2015

Efficiently utilizing the resources provided on current petascale and future exascale systems will be a challenging task, potentially causing a large amount of underutilized resources and wasted energy. A promising potential to improve efficiency of HPC applications stems from the significant degree of dynamic behavior, e.g., run-time alternation in application resource requirements in HPC workloads. Manually detecting and leveraging this dynamism to improve performance and energy-efficiency is a tedious task that is commonly neglected by developers. However, using an automatic optimization approach, application dynamism can be analyzed at design-time and used to optimize system configurations at run-time. The European Union Horizon 2020 READEX project will develop a tools-aided, scenario-based auto-tuning methodology to exploit the dynamic behavior of HPC applications to achieve improved energy-efficiency and performance. Driven by a consortium of European experts from academia, HPC resource providers, and industry, the READEX project aims at developing the first-of-its-kind generic framework for split design-time and run-time automatic tuning for heterogeneous systems at the exascale level.

Crossref

NORA - Norwegian Open Research Archives

READEX: Linking Two Ends of the Computing Continuum to Improve Energy-efficiency in Dynamic Applications

Author: Gerndt Michael
Gocht Andreas
Kjeldsberg Per Gunnar
Mian Umbreen Sabir
Schuchart Joseph
Říha Lubomír
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

In both the embedded systems and High Performance Computing domains, energy-efficiency has become one of the main design criteria. Efficiently utilizing the resources provided in computing systems ranging from embedded systems to current petascale and future Exascale HPC systems will be a challenging task. Suboptimal designs can potentially cause large amounts of underutilized resources and wasted energy. In both domains, a promising potential for improving efficiency of scalable applications stems from the significant degree of dynamic behaviour, e.g., runtime alternation in application resource requirements and workloads. Manually detecting and leveraging this dynamism to improve performance and energy-efficiency is a tedious task that is commonly neglected by developers. However, using an automatic optimization approach, application dynamism can be analysed at design time and used to optimize system configurations at runtime. The European Union Horizon 2020 READEX (Runtime Exploitation of Application Dynamism for Energy-efficient eXascale computing) project will develop a tools-aided auto-tuning methodology inspired by the system scenario methodology used in embedded systems. Dynamic behaviour of HPC applications will be exploited to achieve improved energy-efficiency and performance. Driven by a consortium of European experts from academia, HPC resource providers, and industry, the READEX project aims at developing the first-of-its-kind generic framework to split design time and runtime automatic tuning while targeting heterogeneous systems at the Exascale level. This paper describes plans for the project as well as early results achieved during its first year. Furthermore, it is shown how project results will be brought back into the embedded systems domain.

NORA - Norwegian Open Research Archives