36,811 research outputs found

    Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

    Full text link
    Computationally-intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular and a balanced execution of their loop iterations is critical for achieving high performance. However, several factors may lead to an imbalanced load execution, such as problem characteristics, algorithmic, and systemic variations. Dynamic loop self-scheduling (DLS) techniques are devised to mitigate these factors, and consequently, improve application performance. On distributed-memory systems, DLS techniques can be implemented using a hierarchical master-worker execution model and are, therefore, called hierarchical DLS techniques. These techniques self-schedule loop iterations at two levels of hardware parallelism: across and within compute nodes. Hybrid programming approaches that combine the message passing interface (MPI) with open multi-processing (OpenMP) dominate the implementation of hierarchical DLS techniques. The MPI-3 standard includes the feature of sharing memory regions among MPI processes. This feature introduced the MPI+MPI approach that simplifies the implementation of parallel scientific applications. The present work designs and implements hierarchical DLS techniques by exploiting the MPI+MPI approach. Four well-known DLS techniques are considered in the evaluation proposed herein. The results indicate certain performance advantages of the proposed approach compared to the hybrid MPI+OpenMP approach

    DART-MPI: An MPI-based Implementation of a PGAS Runtime System

    Full text link
    A Partitioned Global Address Space (PGAS) approach treats a distributed system as if the memory were shared on a global level. Given such a global view on memory, the user may program applications very much like shared memory systems. This greatly simplifies the tasks of developing parallel applications, because no explicit communication has to be specified in the program for data exchange between different computing nodes. In this paper we present DART, a runtime environment, which implements the PGAS paradigm on large-scale high-performance computing clusters. A specific feature of our implementation is the use of one-sided communication of the Message Passing Interface (MPI) version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated the performance of the implementation with several low-level kernels in order to determine overheads and limitations in comparison to the underlying MPI-3.Comment: 11 pages, International Conference on Partitioned Global Address Space Programming Models (PGAS14

    Best Effort MPI/RT as an Alternative to MPI: Design and Performance Comparison

    Get PDF
    The Real-Time Message Passing Interface (MPI/RT) is an emerging real-time communications middleware standard for distributed real-time applications. The Message Passing Interface (MPI) is the de facto standard for high performance parallel application development. In this thesis, we describe how MPI/RT with best effort quality of service can be used as an alternative for MPI. Mercury Computer Systems\u27 RACE embedded parallel computer is used as the platform for comparison of design and performance of these two standards. The main advantages MPI/RT has over MPI are its explicit support for communication channels and its emphasis on early binding. Design and implementation of best effort MPI/RT on Mercury is described and its performance is compared with MPI in order to illustrate how MPI/RT features allow implementations to exploit the underlying platform more optimally. The results for the benchmarks show that MPI/RT outperforms MPI in almost all cases examined

    Guaranteed bandwidth implementation of message passing interface on workstation clusters

    Get PDF
    Due to their wide availability, networks of workstations (NOW) are an attractive platform for parallel processing. Parallel programming environments such as Parallel Virtual Machine (PVM), and Message Passing Interface (MPI) offer the user a convenient way to express parallel computing and communication for a network of workstations. Currently, a number of MPI implementations are available that offer low (average ) latency and high bandwidth environments to users by utilizing an efficient MPI library specification and high speed networks. In addition to high bandwidth and low average latency requirements, mission critical distributed applications, audio/video communications require a completely different type of service, guaranteed bandwidth and worst case delays (worst case latency) to be guaranteed by underlying protocol. The hypothesis presented in this paper is that it is possible to provide an application a low level reliable transport protocol with performance and guaranteed bandwidth as close to the hardware on which it is executing. The hypothesis is proven by designing and implementing a reliable high performance message passing protocol interface which also provides the guaranteed bandwidth to MPI and to mission critical distributed MPI applications. This protocol interface works with the Fiber Distributed Data Interface (FDDI) driver which has been designed and implemented for Performance Technology Inc. commercial high performance FDDI product, the Station Management Software 7.3, and the ADI / MPICH (Argonne National Laboratory and Mississippi State University\u27s free MPI implementation)
    • …
    corecore