11 research outputs found

    Dynamic Loop Scheduling Using MPI Passive-Target Remote Memory Access

    Get PDF
    Scientific applications often contain large, computationally intensive parallel loops. Loop scheduling techniques aim to achieve load-balanced executions of such applications. For distributed-memory systems, existing dynamic loop scheduling (DLS) libraries are typically MPI-based and employ a master-worker execution model to assign variably-sized chunks of loop iterations. The master-worker execution model may adversely impact performance due to contention at the master. This work proposes a distributed chunk-calculation approach that does not require the master-worker execution scheme. Moreover, it exploits novel features of the latest MPI standards, such as passive-target remote memory access, shared-memory window creation, and atomic read-modify-write operations. To evaluate the proposed approach, five well-known DLS techniques, two applications, and two heterogeneous hardware setups are considered. The DLS techniques implemented using the proposed approach outperformed their counterparts implemented using the traditional master-worker execution model.

    Dynamic Loop Scheduling Using the MPI Passive-Target Remote Memory Access Model

    Get PDF
    Large parallel loops are present in many scientific applications. Static and dynamic loop scheduling (DLS) techniques aim to achieve load-balanced executions of applications. The use of DLS techniques in scientific applications, such as the self-scheduling-based techniques, has shown significant performance advantages over static techniques. On distributed-memory systems, DLS techniques have been implemented using the message-passing interface (MPI). Existing implementations of MPI-based DLS libraries do not consider the novel features of the latest MPI standards, such as one-sided communication, shared-memory window creation, and atomic read-modify-write operations. This poster considers these features and proposes an MPI-based DLS library written in the C language. Unlike existing libraries, the proposed DLS library does not employ a master-worker execution model. Moreover, it contains implementations of five well-known DLS techniques, namely self-scheduling, fixed-size chunking, guided self-scheduling, trapezoid self-scheduling, and factoring. An application from computer vision is used to assess and compare the performance of the proposed library against that of existing solutions. The evaluation results show improved performance and highlight the need to revise and upgrade existing solutions in light of the significant advancements in the MPI standards.

    On a Model of Optimal Resource Allocation in Multiprocessor Environments

    Get PDF
    A resource-management model for a homogeneous multiprocessor environment is considered. An approach based on fluid (flow) models is proposed, which yields control that is optimal in the sense of execution speed. An analytical expression for the completion time as a function of the parallelization parameters is obtained. The results are illustrated on the well-known example of matrix multiplication. The theoretical models are confirmed experimentally.

    Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

    Full text link
    Computationally intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular, and a balanced execution of their loop iterations is critical for achieving high performance. However, several factors may lead to an imbalanced load execution, such as problem characteristics and algorithmic or systemic variations. Dynamic loop self-scheduling (DLS) techniques are devised to mitigate these factors and, consequently, improve application performance. On distributed-memory systems, DLS techniques can be implemented using a hierarchical master-worker execution model and are, therefore, called hierarchical DLS techniques. These techniques self-schedule loop iterations at two levels of hardware parallelism: across and within compute nodes. Hybrid programming approaches that combine the message passing interface (MPI) with open multi-processing (OpenMP) dominate the implementation of hierarchical DLS techniques. The MPI-3 standard includes the feature of sharing memory regions among MPI processes. This feature introduced the MPI+MPI approach, which simplifies the implementation of parallel scientific applications. The present work designs and implements hierarchical DLS techniques by exploiting the MPI+MPI approach. Four well-known DLS techniques are considered in the evaluation proposed herein. The results indicate certain performance advantages of the proposed approach compared to the hybrid MPI+OpenMP approach.

    Multi-dimensional dynamic loop scheduling algorithms

    Full text link

    ONE MODEL OF OPTIMAL RESOURCE ALLOCATION IN HOMOGENEOUS MULTIPROCESSOR SYSTEM

    Get PDF
    This paper deals with a control model for optimal resource allocation in a homogeneous multiprocessor system. We propose an approach to developing time-optimal control based on fluid-model theory. We obtain an analytical expression for the completion time as a function of the parallel-execution parameters. The results are validated experimentally on a matrix-multiplication example.

    Evaluation of a distributed numerical simulation optimization approach applied to aquifer remediation

    Get PDF
    In this paper we evaluate a distributed approach that uses numerical simulation and optimization techniques to automatically find remediation solutions for a hypothetical contaminated aquifer. The repeated execution of the numerical simulation model of the aquifer through the optimization cycles tends to be computationally expensive. To overcome this drawback, the numerical simulations are executed in parallel on a network of heterogeneous workstations. Performance metrics for heterogeneous environments are not trivial; a new way of calculating speedup and efficiency for Bag-of-Tasks (BoT) applications is proposed, and the performance of the parallel approach is evaluated.

    Precise Predictions for LHC Cross Sections and Phenomenology beyond NLO

    Get PDF
    The production of vector-boson pairs allows the interaction among three electroweak gauge bosons to be probed. A deviation of this coupling from the Standard Model prediction can be described by anomalous couplings in the effective-field-theory formalism. This thesis studies the additional radiation of jets in WZ and WH production. For this purpose, the observable $x_{\text{jet}}$ is introduced to separate events dominated by jet radiation from those containing two high-energy vector bosons. With this observable, phase-space regions sensitive to anomalous gauge-boson couplings can be identified. In addition, a dynamic jet veto is proposed to increase the sensitivity of searches for anomalous couplings. A traditional veto at a fixed scale leads to logarithmically growing terms, which a dynamic veto avoids. The dynamic veto also allows a larger phase-space region to be included, which improves the statistics and thus the sensitivity of searches for anomalous couplings. For an accurate description of events with vector-boson pairs at high transverse momenta, higher-order corrections are necessary. In this thesis, the LoopSim method is used to compute corrections at $\bar{n}$NLO in the strong coupling, an approximation of the next-to-next-to-leading-order corrections that is particularly suited to high transverse momenta. These analyses use the flexible Monte Carlo program VBFNLO in combination with LoopSim. This thesis also develops a parallelized implementation of VBFNLO that improves the numerical integration and runtime, especially for complex processes, and makes more efficient use of modern compute clusters.

    Scalable loop self-scheduling schemes for heterogeneous clusters

    No full text