
3 research outputs found

Dynamic Loop Scheduling Using MPI Passive-Target Remote Memory Access

Authors: Ciorba, Florina M.; Eleliemy, Ahmed
Publication date: 01/01/2018

Scientific applications often contain large, computationally intensive parallel loops. Loop scheduling techniques aim to achieve load-balanced executions of such applications. For distributed-memory systems, existing dynamic loop scheduling (DLS) libraries are typically MPI-based and employ a master-worker execution model to assign variably sized chunks of loop iterations. The master-worker execution model may adversely impact performance due to contention at the master. This work proposes a distributed chunk-calculation approach that does not require the master-worker execution scheme. Moreover, it considers the novel features in the latest MPI standards, such as passive-target remote memory access, shared-memory window creation, and atomic read-modify-write operations. To evaluate the proposed approach, five well-known DLS techniques, two applications, and two heterogeneous hardware setups were considered. The DLS techniques implemented using the proposed approach outperformed their counterparts implemented using the traditional master-worker execution model.
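The idea of calculating chunks without a master can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fixed chunk size (the abstract does not name the five DLS techniques), and it emulates the atomic read-modify-write with Python threads and a lock instead of an MPI window. In an MPI program the same role could be played by an operation such as `MPI_Fetch_and_op` on a shared counter. The names `SharedCounter`, `worker`, and `run` are hypothetical.

```python
import threading


class SharedCounter:
    """Stand-in for a counter exposed in a remote-memory-access window."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, amount):
        # Emulates an atomic read-modify-write: return the old value, then add.
        with self._lock:
            old = self._value
            self._value += amount
            return old


def worker(counter, n_iters, chunk, claimed):
    # Each worker claims its own chunks; no master hands them out.
    while True:
        start = counter.fetch_add(chunk)
        if start >= n_iters:
            break
        claimed.append((start, min(start + chunk, n_iters)))


def run(n_iters=100, chunk=8, n_workers=4):
    counter = SharedCounter()
    per_worker = [[] for _ in range(n_workers)]  # one private list per worker
    threads = [
        threading.Thread(target=worker, args=(counter, n_iters, chunk, per_worker[w]))
        for w in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Merge and sort the claimed [start, end) ranges for inspection.
    return sorted(rng for ranges in per_worker for rng in ranges)
```

Because every claim goes through one atomic update of the counter, the claimed ranges never overlap and together cover the whole iteration space, regardless of which worker obtains which chunk.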

arXiv.org e-Print Archive

Crossref

edoc

A Comprehensive Performance Evaluation of the BinLPT Workload-Aware Loop Scheduler

Authors: Broquedis, Francois; Castro, Márcio; Freitas, Henrique; Gomes, Antônio Tadeu; Mehaut, Jean-François; Penna, Pedro Henrique; Plentz, Patrícia
Publication venue: Wiley
Publication date: 19/02/2019

Workload-aware loop schedulers were introduced to deliver better performance than classical loop scheduling strategies. However, they presented limitations such as inflexible built-in workload estimators and suboptimal chunk scheduling. Targeting these challenges, we previously proposed a workload-aware scheduling strategy called BinLPT, which relies on three features: (i) user-supplied estimations of the workload of the loop; (ii) a greedy heuristic that adaptively partitions the iteration space into several chunks; and (iii) a scheduling scheme based on the Longest Processing Time (LPT) rule and an on-demand technique. In this paper, we present two new contributions to the state of the art. First, we introduce a multiloop support feature to BinLPT, which enables the reuse of estimations across loops. Based on this feature, we integrated BinLPT into a real-world elastodynamics application and evaluated it running on a supercomputer. Second, we present an evaluation of BinLPT using simulations as well as synthetic and application kernels. We carried out this analysis on a large-scale NUMA machine under a variety of workloads. Our results revealed that BinLPT is better at balancing the workloads of the loop iterations and that this behavior improves as the algorithmic complexity of the loop increases. Overall, BinLPT delivers up to 37.15% and 9.11% better performance than well-known loop scheduling strategies, for the application kernels and the elastodynamics simulation, respectively.
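The LPT rule mentioned in feature (iii) can be sketched in a few lines. This is the classic Longest Processing Time heuristic only, not BinLPT itself: the chunk-partitioning heuristic and the on-demand part are left out, and the function name `lpt_schedule` and its parameters are hypothetical. Chunks are taken in order of decreasing estimated load, and each one goes to the thread with the smallest load so far.

```python
import heapq


def lpt_schedule(chunk_loads, n_threads):
    """Assign chunk indices to threads using the classic LPT rule."""
    # Min-heap of (accumulated load, thread id): the least-loaded thread is on top.
    heap = [(0, t) for t in range(n_threads)]
    assignment = [[] for _ in range(n_threads)]
    # Visit chunks from heaviest to lightest estimated load.
    order = sorted(range(len(chunk_loads)), key=lambda i: chunk_loads[i], reverse=True)
    for i in order:
        load, t = heapq.heappop(heap)
        assignment[t].append(i)
        heapq.heappush(heap, (load + chunk_loads[i], t))
    return assignment
```

For example, `lpt_schedule([7, 5, 4, 3, 3, 2], 2)` gives both threads a total estimated load of 12.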

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

Precise Predictions for LHC Cross Sections and Phenomenology beyond NLO

Author: Roth, Robin
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2017

The production of vector boson pairs allows the interaction between three electroweak gauge bosons to be studied. A deviation of this coupling from the Standard Model prediction can be described by anomalous couplings in the formalism of effective field theory. This thesis investigates the additional radiation of jets in WZ and WH production. To this end, the observable $x_{\text{jet}}$ is introduced to separate events dominated by jet radiation from those containing two high-energy vector bosons. With this observable, phase-space regions that are sensitive to anomalous couplings between gauge bosons can be identified. In addition, a dynamical jet veto is proposed to increase the sensitivity of searches for anomalous couplings. A traditional veto with a fixed scale leads to logarithmically growing terms, which a dynamical veto can avoid. The dynamical veto also allows a larger region of phase space to be included. This improves the statistics and hence the sensitivity of searches for anomalous couplings. An accurate description of events with vector boson pairs at high transverse momenta requires higher-order corrections. In this thesis, the LoopSim method is used to compute corrections at $\bar{n}\text{NLO}$ in the strong coupling. This is an approximation to the next-to-next-to-leading-order corrections and is particularly suited to high transverse momenta. These analyses use the flexible Monte Carlo program VBFNLO in conjunction with LoopSim. This thesis also develops a parallelized implementation of VBFNLO that improves the numerical integration and run time, particularly for complex processes, and makes more efficient use of modern computing clusters.

KITopen