2,371 research outputs found

    DART-MPI: An MPI-based Implementation of a PGAS Runtime System

    Full text link
    A Partitioned Global Address Space (PGAS) approach treats a distributed system as if the memory were shared on a global level. Given such a global view on memory, the user may program applications very much like shared memory systems. This greatly simplifies the tasks of developing parallel applications, because no explicit communication has to be specified in the program for data exchange between different computing nodes. In this paper we present DART, a runtime environment, which implements the PGAS paradigm on large-scale high-performance computing clusters. A specific feature of our implementation is the use of one-sided communication of the Message Passing Interface (MPI) version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated the performance of the implementation with several low-level kernels in order to determine overheads and limitations in comparison to the underlying MPI-3.Comment: 11 pages, International Conference on Partitioned Global Address Space Programming Models (PGAS14

    Generalized Portable SHMEM library for high performance computing

    Get PDF
    The Generalized Portable SHMEM library (GPSHMEM) is a portable implementation of the SHMEM library originally released by Cray Research Inc. on the Cray T3D. SHMEM and GPSHMEM realize the distributed shared memory programming model, that is, a shared memory programming model in environments in which memory is physically distributed. It is intended for use on a large variety of hardware platforms, including distributed systems with a network interconnect. The programming interface of GPSHMEM follows that of SHMEM and includes remote memory access operations (one-sided communication) and a set of collective routines such as broadcast, collection and reduction. Programming interfaces for C and Fortran are provided. Because of the minimal assumptions about the underlying hardware, GPSHMEM does not implement the full SHMEM T3D interface. The lack of a few functions is compensated by a set of extensions, including dynamic memory allocation for Fortran 77. To ease porting of SHMEM-enabled scientific Fortran 77 code from the Cray machines to use with GPSHMEM, a specialized Fortran 77 preprocessor was designed and developed

    OpenSHMEM Application Programming Interface, v1.0 Final

    Full text link
    This document defines the elements of the OpenSHMEM Application Programming Interface. The purpose of the OpenSHMEM API is to provide programmers with a standard interface for writing parallel programs using C, C++ and Fortran with one-sided communication

    High performance computing applications: Inter-process communication, workflow optimization, and deep learning for computational nuclear physics

    Get PDF
    Various aspects of high performance computing (HPC) are addressed in this thesis. The main focus is on analyzing and suggesting novel ideas to improve an application\u27s performance and scalability on HPC systems and to make the most out of the available computational resources. The choice of inter-process communication is one of the main factors that can influence an application\u27s performance. This study investigates other computational paradigms, such as one-sided communication, that was known to improve the efficiency of current implementation methods. We compare the performance and scalability of the SHMEM and corresponding MPI-3 routines for five different benchmark tests using a Cray XC30. The performance of the MPI-3 get and put operations was evaluated using fence synchronization and also using lock-unlock synchronization. The five tests used communication patterns ranging from light to heavy data traffic: accessing distant messages, circular right shift, gather, broadcast and all-to-all. Each implementation was run using message sizes of 8 bytes, 10 Kbytes and 1 Mbyte and up to 768 processes. For nearly all tests, the SHMEM get and put implementations outperformed the MPI-3 get and put implementations. We noticed significant performance increase using MPI-3 instead of MPI-2 when compared with performance results from previous studies. One can use this performance and scalability analysis to choose the implementation method best suited for a particular application to run on a specific HPC machine. Today\u27s HPC machines are complex and constantly evolving, making it important to be able to easily evaluate the performance and scalability of HPC applications on both existing and new HPC computers. The evaluation of the performance of applications can be time consuming and tedious. HPC-Bench is a general purpose tool used to optimize benchmarking workflow for HPC to aid in the efficient evaluation of performance using multiple applications on an HPC machine with only a click of a button . HPC-Bench allows multiple applications written in different languages, with multiple parallel versions, using multiple numbers of processes/threads to be evaluated. Performance results are put into a database, which is then queried for the desired performance data, and then the R statistical software package is used to generate the desired graphs and tables. The use of HPC-Bench is illustrated with complex applications that were run on the National Energy Research Scientific Computing Center\u27s (NERSC) Edison Cray XC30 HPC computer. With the advancement of HPC machines, one needs efficient algorithms and new tools to make the most out of available computational resources. This work also discusses a novel application of deep learning to a nuclear physics application. In recent years, several successful applications of the artificial neural networks (ANNs) have emerged in nuclear physics and high-energy physics, as well as in biology, chemistry, meteorology, and other fields of science. A major goal of nuclear theory is to predict nuclear structure and nuclear reactions from the underlying theory of the strong interactions, Quantum Chromodynamics (QCD). The nuclear quantum many-body problem is a computationally hard problem to solve. With access to powerful HPC systems, several ab initio approaches, such as the No-Core Shell Model (NCSM), have been developed for approximately solving finite nuclei with realistic strong interactions. However, to accurately solve for the properties of atomic nuclei, one faces immense theoretical and computational challenges. To obtain the nuclear physics observables as close as possible to the exact results, one seeks NCSM solutions in the largest feasible basis spaces. These results obtained in a finite basis, are then used to extrapolate to the infinite basis space limit and thus, obtain results corresponding to the complete basis within evaluated uncertainties. Each observable requires a separate extrapolation and most observables have no proven extrapolation method. We propose a feed-forward ANN method as an extrapolation tool to obtain the ground state energy and the ground state point-proton root-mean-square (rms) radius along with their extrapolation uncertainties. The designed ANNs are sufficient to produce results for these two very different observables in ^6Li from the ab initio NCSM results in small basis spaces that satisfy the following theoretical physics condition: independence of basis space parameters in the limit of extremely large matrices. Comparisons of the ANN results with other extrapolation methods are also provided

    Java Grande Forum Report: Making Java Work for High-End Computing

    Get PDF
    This document describes the Java Grande Forum and includes its initial deliverables.Theseare reports that convey a succinct set of recommendations from this forum to SunMicrosystems and other purveyors of Java™ technology that will enable GrandeApplications to be developed with the Java programming language

    Architecture independent environment for developing engineering software on MIMD computers

    Get PDF
    Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management
    corecore