17 research outputs found

Automated Scheduling Algorithm Selection and Chunk Parameter Calculation in OpenMP

Author: Ciorba Florina M.
Eleliemy Ahmed
Mohammed Ali
Müller Korndörfer Jonas H.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022

Increasing node and cores-per-node counts in supercomputers render scheduling and load balancing critical for exploiting parallelism. OpenMP applications can achieve high performance via careful selection of scheduling kind and chunk parameters on a per-loop, per-application, and per-system basis from a portfolio of advanced scheduling algorithms (Korndörfer et al., 2022). This selection is time-consuming and challenging, and may need to change during execution. We propose Auto4OMP, a novel approach for automated load balancing of OpenMP applications. With Auto4OMP, we introduce three scheduling algorithm selection methods and an expert-defined chunk parameter for the kind and chunk arguments of OpenMP's schedule clause, respectively. Auto4OMP extends the OpenMP schedule(auto) and chunk parameter implementation in LLVM's OpenMP runtime library to automatically select a scheduling algorithm and calculate a chunk parameter during execution. Loop characteristics are inferred in Auto4OMP from the loop execution over the application's time-steps. The experiments performed in this work show that Auto4OMP improves application performance by up to 11% compared to LLVM's schedule(auto) implementation and outperforms manual selection. Auto4OMP improves the performance of MPI+OpenMP applications by explicitly minimizing thread-load imbalance and implicitly reducing process-load imbalance.

edoc

RobustImpact Robust impact design of steel and composite building structures: Drawings for producing the test specimens

Author: Baldassino Nadia
Colomer Segura Carles
Demonceau Jean-François
Freddi Fabio
Hoffmann Nadine
Hoffmeister Benno
Huvelle Clara
Jaspart Jean-Pierre
Korndörfer Jonas
Kuhlmann Ulrike
Zandonini Riccardo
Publication venue: European Commission
Publication date

UCL Discovery

Seismic Fragility of Horizontal Pressure Vessels - Effects of Structural Interaction between Industrial Components

Author: Feldmann Markus
Hoffmeister Benno
Korndörfer Jonas
Publication venue: Institute of Structural Analysis and Antiseismic Research
Publication date: 01/01/2017

Crossref

Publikationsserver der RWTH Aachen University

Fragility Analysis of Horizontal Pressure Vessels in the Coupled and Uncoupled Case

Author: Feldmann Markus
Hoffmeister Benno
Korndörfer Jonas
Publication venue: 'ASME International'
Publication date: 01/01/2016

Publikationsserver der RWTH Aachen University

MLS: Multilevel Scheduling in Large Scale High Performance

Author: Ciorba Florina M.
Eleliemy Ahmed
Müller Korndörfer Jonas H.
Publication venue: The International Conference on High Performance Computing (ISC)
Publication date: 01/01/2019

High performance computing systems are of increased size (in terms of node count, core count, and core types per node), resulting in increased available hardware parallelism. Hardware parallelism can be found at several levels, from machine instructions to global computing sites. Unfortunately, exposing, expressing, and exploiting parallelism is difficult when considering the increase in parallelism within each level and when exploiting more than a single or even a couple of parallelism levels. The multilevel scheduling (MLS) project aims to offer an answer to the following research question: Given massive parallelism, at multiple levels, and of diverse forms and granularities, how can it be exposed, expressed, and exploited such that execution times are reduced, performance targets are achieved, and acceptable efficiency is maintained? The MLS project investigates the development of a multilevel approach for achieving scalable scheduling in large scale high performance computing systems across the multiple levels of parallelism, with a focus on software parallelism. By integrating multiple levels of parallelism, MLS differs from hierarchical scheduling, traditionally employed to achieve scalability within a single level of parallelism. Specifically, MLS extends and bridges the most successful (batch, application, and thread) scheduling models beyond a single or a couple of parallelism levels (scaling across) and beyond their current scale (scaling out). Via the MLS approach, the project aims to leverage all available parallelism and address hardware heterogeneity in large scale high performance computers such that execution times are reduced, performance targets are achieved, and acceptable efficiency is maintained.

edoc

Finding Neighbors in a Forest: A b-tree for Smoothed Particle Hydrodynamics Simulations

Author: Cabezón Rubén M.
Cavelan Aurélien
Ciorba Florina M.
Korndörfer Jonas H. Müller
Publication venue: 'Center for Open Science'
Publication date: 01/01/2019

Finding the exact close neighbors of each fluid element in mesh-free computational hydrodynamical methods, such as Smoothed Particle Hydrodynamics (SPH), often becomes a main bottleneck for scaling their performance beyond a few million fluid elements per computing node. Tree structures are particularly suitable for SPH simulation codes, which rely on finding the exact close neighbors of each fluid element (or SPH particle). In this work we present a novel tree structure, named b-tree, which features an adaptive branching factor to reduce the depth of the neighbor search. Depending on the particle spatial distribution, finding neighbors using b-tree has an asymptotic best case complexity of O(n), as opposed to O(n log n) for other classical tree structures such as octrees and quadtrees. We also present the proposed tree structure as well as the algorithms to build it and to find the exact close neighbors of all particles. We assess the scalability of the proposed tree-based algorithms through an extensive set of performance experiments in a shared-memory system. Results show that b-tree is up to 12× faster for building the tree and up to 1.6× faster for finding the exact neighbors of all particles when compared to its octree form. Moreover, we apply b-tree to an SPH code and show its usefulness over the existing octree implementation, where b-tree is up to 5× faster for finding the exact close neighbors compared to the legacy code.

arXiv.org e-Print Archive

edoc

Mapping Matters: Application Process Mapping on 3-D Processor Topologies

Author: Bielert Mario
Ciorba Florina M.
Korndörfer Jonas H. Müller
Lima Pilla Laércio
Publication venue: HAL CCSD
Publication date: 10/03/2021

Applications’ performance is influenced by the mapping of processes to computing nodes, the frequency and volume of exchanges among processing elements, the network capacity, and the routing protocol. A poor mapping of application processes degrades performance and wastes resources. As process mapping is frequently ignored as an explicit optimization step (since the system typically offers a default mapping), users may lack awareness of their applications’ communication behavior, and the opportunities for improving performance through mapping are often unclear. This work studies the impact of application process mapping on several processor topologies. We propose and apply a generic workflow that renders mapping as an explicit optimization step to a set of four applications, twelve mapping algorithms, and three direct network topologies. We assess the mappings’ quality in terms of volume, frequency, and distance of exchanges using metrics such as dilation (measured in hop·Byte). With a parallel trace-based simulator, we predict the applications’ execution on the three topologies using the twelve mappings. To ensure the correctness of the simulations, we compare the pre- and post-simulation results. This work emphasizes the importance of process mapping as an explicit optimization step and offers a solution for parallel applications to exploit the full potential of the allocated resources on a given system.
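
As a rough illustration of a hop·Byte metric like the one mentioned in the abstract above, the following Python sketch sums, over all communicating process pairs, the number of network hops between their nodes multiplied by the bytes they exchange. The 3-D torus shape, the shortest-path hop count, the traffic volumes, the two example mappings, and all function names are assumptions made for this example; none of them are taken from the paper.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int, int]


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Shortest hop count between two nodes of a 3-D torus (assumed topology)."""
    hops = 0
    for x, y, size in zip(a, b, dims):
        d = abs(x - y)
        # Wrap-around links let a message travel the shorter way around a ring.
        hops += min(d, size - d)
    return hops


def hop_bytes(traffic: Dict[Tuple[int, int], int],
              mapping: Dict[int, Coord],
              dims: Coord) -> int:
    """Sum of (bytes exchanged x hops travelled) over all communicating pairs."""
    return sum(nbytes * torus_hops(mapping[src], mapping[dst], dims)
               for (src, dst), nbytes in traffic.items())


# Hypothetical example: four processes, mostly nearest-neighbor traffic.
dims = (4, 4, 4)
traffic = {(0, 1): 1000, (1, 2): 1000, (2, 3): 1000, (0, 3): 10}

# Two hypothetical mappings of process rank -> node coordinate.
compact = {0: (0, 0, 0), 1: (1, 0, 0), 2: (2, 0, 0), 3: (3, 0, 0)}
scattered = {0: (0, 0, 0), 1: (2, 2, 2), 2: (0, 0, 1), 3: (2, 2, 3)}

print(hop_bytes(traffic, compact, dims))    # 3010
print(hop_bytes(traffic, scattered, dims))  # 17050
```

In this toy setting the compact mapping keeps heavily communicating ranks one hop apart, so its hop·Byte total is several times lower than that of the scattered mapping, even though the traffic is identical.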

HAL-CentraleSupelec

arXiv.org e-Print Archive

HAL-Rennes 1

A Runtime Approach for Dynamic Load Balancing of OpenMP Parallel Loops in LLVM

Author: Ciorba Florina M.
Doerfert Johannes
Finkel Hal
Iwainsky Christian
Kale Vivek
Klemm Michael
Müller Korndörfer Jonas H.
Yilmaz Akan
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2019

Load imbalance is the major source of performance degradation in computationally intensive applications that frequently consist of parallel loops. Efficient scheduling of parallel loops can improve the performance of such programs. OpenMP is the de facto standard for parallel programming on shared-memory systems. The current OpenMP specification provides only three choices for loop scheduling, which are insufficient in scenarios with irregular loops, system-induced interference, or both. Therefore, this work augments the LLVM implementation of the OpenMP runtime library with eleven state-of-the-art plus three new and ready-to-use scheduling techniques. We tested the existing and the added loop scheduling strategies on several applications from the NAS, SPEC OMP 2012, and CORAL-2 benchmark suites. The experimental results show that each newly implemented scheduling technique outperforms the others in certain application and system configurations. We measured performance gains of up to 6% compared to the fastest previously available scheduling techniques. This work establishes the importance of beyond-standard scheduling options in OpenMP for the benefit of evolving applications executing on evolving multicore architectures.
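
To make the three standard scheduling choices referred to above more concrete, the following Python sketch prints the chunk sizes that a static, a dynamic, and a guided schedule would produce for a loop of a given length. It is a simplified model under stated assumptions: the guided rule used here is the textbook "remaining iterations divided by thread count, rounded up" formula, actual OpenMP runtimes (including LLVM's) may compute guided chunks differently, and all function names are hypothetical.

```python
import math
from typing import List


def static_chunks(n: int, threads: int) -> List[int]:
    """Block partition: one near-equal chunk per thread, fixed before the loop runs."""
    base, extra = divmod(n, threads)
    return [base + (1 if t < extra else 0) for t in range(threads)]


def dynamic_chunks(n: int, chunk: int) -> List[int]:
    """Fixed-size chunks handed out on demand; the last chunk may be smaller."""
    return [min(chunk, n - start) for start in range(0, n, chunk)]


def guided_chunks(n: int, threads: int, min_chunk: int = 1) -> List[int]:
    """Decreasing chunks: each is ceil(remaining / threads), never below min_chunk
    (assumed textbook rule, not a specific runtime's implementation)."""
    chunks = []
    remaining = n
    while remaining > 0:
        size = max(min_chunk, math.ceil(remaining / threads))
        size = min(size, remaining)  # never hand out more than what is left
        chunks.append(size)
        remaining -= size
    return chunks


print(static_chunks(10, 4))     # [3, 3, 2, 2]
print(dynamic_chunks(10, 4))    # [4, 4, 2]
print(guided_chunks(100, 4))    # starts 25, 19, 14, ... and ends with chunks of 1
```

The model shows the trade-off: static uses few large chunks with no runtime overhead, dynamic uses many equal chunks that balance irregular work at the cost of more scheduling events, and guided starts large and shrinks toward the end of the loop.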

edoc

Robusimpact - Design report of the specimens for all the experimental analyses - Deliverable 4.1

Author: Baldassino Nadia
Colomer Segura Carles
Demonceau Jean-François
Freddi Fabio
Hoffmann Nadine
Hoffmeister Benno
Huvelle Clara
Jaspart Jean-Pierre
Korndörfer Jonas
Kuhlmann Ulrike
Zandonini Riccardo
Publication venue: Commission Européenne
Publication date: 01/01/2014

The present report focuses on the design of the experimental analyses that are going to be performed within the ROBUSTIMPACT project (Grant Agreement Number: RFSR-CT-2012-00029). The project focuses on the behavior of composite steel and concrete framed buildings against accidental actions. Within the project, several experimental analyses are going to be performed, spanning from the local to the global behavior.

Open Repository and Bibliography - Liège