11 research outputs found

    Dynamic Loop Scheduling Using MPI Passive-Target Remote Memory Access

    Get PDF
    Scientific applications often contain large, computationally intensive parallel loops. Loop scheduling techniques aim to achieve load-balanced executions of such applications. For distributed-memory systems, existing dynamic loop scheduling (DLS) libraries are typically MPI-based and employ a master-worker execution model to assign variably-sized chunks of loop iterations. The master-worker execution model may adversely impact performance due to contention at the master. This work proposes a distributed chunk-calculation approach that does not require the master-worker execution scheme. Moreover, it exploits novel features of the latest MPI standards, such as passive-target remote memory access, shared-memory window creation, and atomic read-modify-write operations. To evaluate the proposed approach, five well-known DLS techniques, two applications, and two heterogeneous hardware setups are considered. The DLS techniques implemented using the proposed approach outperformed their counterparts implemented using the traditional master-worker execution model.

    Dynamic Loop Scheduling Using the MPI Passive-Target Remote Memory Access Model

    Get PDF
    Large parallel loops are present in many scientific applications. Static and dynamic loop scheduling (DLS) techniques aim to achieve load-balanced executions of applications. The use of DLS techniques in scientific applications, such as the self-scheduling-based techniques, has shown significant performance advantages over static techniques. On distributed-memory systems, DLS techniques have been implemented using the message-passing interface (MPI). Existing implementations of MPI-based DLS libraries do not consider the novel features of the latest MPI standards, such as one-sided communication, shared-memory window creation, and atomic read-modify-write operations. This poster considers these features and proposes an MPI-based DLS library written in the C language. Unlike existing libraries, the proposed DLS library does not employ a master-worker execution model. Moreover, it contains implementations of five well-known DLS techniques, namely self-scheduling, fixed-size chunking, guided self-scheduling, trapezoid self-scheduling, and factoring. An application from computer vision is used to assess and compare the performance of the proposed library against that of existing solutions. The evaluation results show improved performance and highlight the need to revise and upgrade existing solutions in light of the significant advancements in the MPI standards.

    On a Model of Optimal Resource Allocation in Multiprocessor Environments

    Get PDF
    A resource-management model for a homogeneous multiprocessor environment is considered. An approach based on fluid (flow) models is proposed, which yields control that is optimal in the sense of execution speed. An analytical expression for the completion time as a function of the parallelization parameters is obtained. The results are illustrated on the well-known example of matrix multiplication. The theoretical models are confirmed experimentally.

    Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

    Full text link
    Computationally intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular, and a balanced execution of their loop iterations is critical for achieving high performance. However, several factors may lead to an imbalanced load execution, such as problem characteristics and algorithmic or systemic variations. Dynamic loop self-scheduling (DLS) techniques are devised to mitigate these factors and, consequently, improve application performance. On distributed-memory systems, DLS techniques can be implemented using a hierarchical master-worker execution model and are, therefore, called hierarchical DLS techniques. These techniques self-schedule loop iterations at two levels of hardware parallelism: across and within compute nodes. Hybrid programming approaches that combine the message passing interface (MPI) with open multi-processing (OpenMP) dominate the implementation of hierarchical DLS techniques. The MPI-3 standard includes the feature of sharing memory regions among MPI processes. This feature introduced the MPI+MPI approach, which simplifies the implementation of parallel scientific applications. The present work designs and implements hierarchical DLS techniques by exploiting the MPI+MPI approach. Four well-known DLS techniques are considered in the evaluation proposed herein. The results indicate certain performance advantages of the proposed approach compared to the hybrid MPI+OpenMP approach.

    Multi-dimensional dynamic loop scheduling algorithms

    Full text link

    ONE MODEL OF OPTIMAL RESOURCE ALLOCATION IN HOMOGENEOUS MULTIPROCESSOR SYSTEM

    Get PDF
    This paper deals with a control model for optimal resource allocation in a homogeneous multiprocessor system. We propose an approach to developing time-optimal control based on fluid-model theory. We obtain an analytical expression for the completion time as a function of the parallel-execution parameters. The results are validated experimentally on a matrix-multiplication example.

    Evaluation of a distributed numerical simulation optimization approach applied to aquifer remediation

    Get PDF
    In this paper we evaluate a distributed approach that uses numerical simulation and optimization techniques to automatically find remediation solutions for a hypothetical contaminated aquifer. The repeated execution of the numerical simulation model of the aquifer through the optimization cycles tends to be computationally expensive. To overcome this drawback, the numerical simulations are executed in parallel on a network of heterogeneous workstations. Performance metrics for heterogeneous environments are not trivial; a new way of calculating speedup and efficiency for Bag-of-Tasks (BoT) applications is proposed, and the performance of the parallel approach is evaluated.

    Precise Predictions for LHC Cross Sections and Phenomenology beyond NLO

    Get PDF
    The production of vector-boson pairs allows the interaction among three electroweak gauge bosons to be probed. A deviation of this coupling from the Standard Model prediction can be described by anomalous couplings in the effective-field-theory formalism. This thesis studies the additional radiation of jets in WZ and WH production. For this purpose, the observable $x_{\text{jet}}$ is introduced to separate events dominated by jet radiation from those containing two high-energy vector bosons. With this observable, phase-space regions sensitive to anomalous gauge-boson couplings can be identified. In addition, a dynamic jet veto is proposed to increase the sensitivity of searches for anomalous couplings. A traditional veto at a fixed scale leads to logarithmically growing terms, which a dynamic veto avoids. The dynamic veto also allows a larger phase-space region to be included, which improves the statistics and thus the sensitivity of searches for anomalous couplings. For an accurate description of events with vector-boson pairs at high transverse momenta, higher-order corrections are necessary. In this thesis, the LoopSim method is used to compute corrections at $\bar{n}$NLO in the strong coupling, an approximation of the next-to-next-to-leading-order corrections that is particularly suited to high transverse momenta. These analyses use the flexible Monte Carlo program VBFNLO in combination with LoopSim. This thesis also develops a parallelized implementation of VBFNLO that improves the numerical integration and runtime, especially for complex processes, and makes more efficient use of modern compute clusters.

    Scalable loop self-scheduling schemes for heterogeneous clusters

    No full text