
    DART-MPI: An MPI-based Implementation of a PGAS Runtime System

    Full text link
    A Partitioned Global Address Space (PGAS) approach treats a distributed system as if its memory were shared at a global level. Given such a global view of memory, the user may program applications much as on a shared memory system. This greatly simplifies the task of developing parallel applications, because no explicit communication has to be specified in the program for data exchange between computing nodes. In this paper we present DART, a runtime environment which implements the PGAS paradigm on large-scale high-performance computing clusters. A specific feature of our implementation is the use of the one-sided communication of the Message Passing Interface (MPI) version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated the performance of the implementation with several low-level kernels in order to determine overheads and limitations in comparison to the underlying MPI-3.
    Comment: 11 pages, International Conference on Partitioned Global Address Space Programming Models (PGAS14)
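
    To make the substrate concrete, here is a minimal C sketch of MPI-3 one-sided communication, the mechanism DART builds on. This is our own illustrative example, not DART's API: one rank writes directly into another rank's exposed memory window with MPI_Put, with no matching receive on the target.

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            /* Expose one integer per rank as globally addressable memory. */
            int local = -1;
            MPI_Win win;
            MPI_Win_create(&local, sizeof(int), sizeof(int),
                           MPI_INFO_NULL, MPI_COMM_WORLD, &win);

            /* Passive-target epoch: rank 0 writes into rank 1's window;
               rank 1 posts no matching receive. */
            MPI_Win_lock_all(0, win);
            if (rank == 0 && size > 1) {
                int val = 42;
                MPI_Put(&val, 1, MPI_INT, /* target rank */ 1,
                        /* displacement */ 0, 1, MPI_INT, win);
                MPI_Win_flush(1, win);   /* complete the put at the target */
            }
            MPI_Win_unlock_all(win);

            /* Assumes MPI-3's unified memory model for the direct read. */
            MPI_Barrier(MPI_COMM_WORLD);
            if (rank == 1) printf("rank 1 holds %d\n", local);

            MPI_Win_free(&win);
            MPI_Finalize();
            return 0;
        }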

    An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor

    Full text link
    This paper reports the implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer. The Epiphany architecture exhibits massive many-core scalability with a physically compact 2D array of RISC CPU cores and a fast network-on-chip (NoC). While the architecture is fully capable of MPMD execution, its physical topology and the memory-mapped capabilities of the cores and network translate well to Partitioned Global Address Space (PGAS) programming models and SPMD execution with SHMEM.
    Comment: 14 pages, 9 figures, OpenSHMEM 2016: Third workshop on OpenSHMEM and Related Technologies
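
    As a concrete illustration of the SPMD/PGAS style the abstract refers to, the following is a minimal OpenSHMEM 1.3 program in C. It is a generic sketch of ours, not anything Epiphany-specific: every processing element (PE) runs the same code and writes into a symmetric variable on a neighboring PE.

        #include <shmem.h>
        #include <stdio.h>

        /* Symmetric variable: exists at the same address on every PE,
           which is what maps naturally onto memory-mapped NoC cores. */
        static int dest = -1;

        int main(void) {
            shmem_init();
            int me   = shmem_my_pe();
            int npes = shmem_n_pes();

            /* Every PE writes its id into the next PE's copy of 'dest',
               a one-sided put with no receive on the target. */
            shmem_int_put(&dest, &me, 1, (me + 1) % npes);
            shmem_barrier_all();

            printf("PE %d of %d received %d\n", me, npes, dest);
            shmem_finalize();
            return 0;
        }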

    Lockouts: Past, Present, and Future

    Get PDF

    Exploring Scientific Application Performance Using Large Scale Object Storage

    Full text link
    One of the major performance and scalability bottlenecks in large scientific applications is parallel reading from and writing to supercomputer I/O systems. The use of parallel file systems and the consistency requirements of POSIX, which all the traditional HPC parallel I/O interfaces adhere to, limit the scalability of scientific applications. Object storage is a widely used storage technology in cloud computing and is increasingly proposed for HPC workloads to improve the current scalability and performance of I/O in scientific applications. While object storage is a promising technology, it is still unclear how scientific applications will use it and what the main performance benefits will be. This work addresses these questions by emulating an object store used by a traditional scientific application and evaluating the potential performance benefits. We show that scientific applications can benefit from the use of object storage at large scale.
    Comment: Preprint submitted to the WOPSSS workshop at ISC 2018
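
    To illustrate the access pattern under study, here is a hedged C sketch of checkpointing through an object interface instead of a shared POSIX file. The obj_put function and the key scheme are hypothetical names of ours (the paper emulates an object store; it does not define this API): each writer stores its block as a whole, independently named object, so no POSIX byte-range consistency between writers is needed.

        #include <stdio.h>

        /* Hypothetical object interface, emulated here with one local
           file per object: a key names a whole object, written as a
           unit from the application's point of view. */
        static int obj_put(const char *key, const void *buf, size_t len) {
            char path[256];
            snprintf(path, sizeof path, "/tmp/%s.obj", key);
            FILE *f = fopen(path, "wb");
            if (!f) return -1;
            size_t written = fwrite(buf, 1, len, f);
            fclose(f);
            return written == len ? 0 : -1;
        }

        int main(void) {
            /* Each writer uses a unique key, so no two writers ever
               share an object and no locking between them is required. */
            double block[4] = {1.0, 2.0, 3.0, 4.0};
            int rank = 0, step = 10;  /* stand-ins for per-process values */
            char key[64];
            snprintf(key, sizeof key, "ckpt-step%d-rank%d", step, rank);
            return obj_put(key, block, sizeof block) == 0 ? 0 : 1;
        }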

    Free and Open Source Software in Municipal Procurement:The Challenges and Benefits of Cooperation

    Get PDF
    The use of free and open source software by municipal governments is the exception rather than the rule. This is due to a variety of factors, including the failure of many municipal procurement policies to take into account the benefits of free software, free software vendors' second-to-market status, and a lack of established free and open source software vendors in niche markets. With feasible policy shifts to improve city operations, including building upon open standards and engaging with free software communities, municipalities may be able to better leverage free and open source software and fully realize the advantages that stem from open software development.

    Memory-built-in quantum teleportation with photonic and atomic qubits

    Full text link
    The combination of quantum teleportation and quantum memory of photonic qubits is essential for future implementations of large-scale quantum communication and measurement-based quantum computation. Both steps have been achieved separately in many proof-of-principle experiments, but the demonstration of memory-built-in teleportation of photonic qubits remains an experimental challenge. Here, we demonstrate teleportation between photonic (flying) and atomic (stationary) qubits. In our experiment, an unknown polarization state of a single photon is teleported over 7 m onto a remote atomic qubit that also serves as a quantum memory. The teleported state can be stored and successfully read out for up to 8 microseconds. Besides being of fundamental interest, teleportation between photonic and atomic qubits with the direct inclusion of a readable quantum memory represents a step towards an efficient and scalable quantum network.
    Comment: 19 pages, 3 figures, 1 table
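
    For reference, the textbook identity behind any such teleportation experiment (standard quantum information notation, not a derivation from this paper): with the unknown state |psi> = alpha|0> + beta|1> on qubit 1 and an entangled pair |Phi+> = (|00> + |11>)/sqrt(2) shared between qubits 2 and 3,

        \[
        |\psi\rangle_1 \otimes |\Phi^+\rangle_{23}
          = \tfrac{1}{2}\Big[\,
              |\Phi^+\rangle_{12}\,|\psi\rangle_3
            + |\Phi^-\rangle_{12}\,\sigma_z|\psi\rangle_3
            + |\Psi^+\rangle_{12}\,\sigma_x|\psi\rangle_3
            + |\Psi^-\rangle_{12}\,\sigma_x\sigma_z|\psi\rangle_3
          \,\Big],
        \]

    so a Bell-state measurement on qubits 1 and 2 leaves qubit 3 in the input state up to one of four known Pauli corrections. In the experiment above, qubit 3 is the atomic qubit that also serves as the readable memory.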

    SKIRT: hybrid parallelization of radiative transfer simulations

    Full text link
    We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modeling the continuum radiation of dusty astrophysical systems, including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behavior of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
    Comment: 21 pages, 20 figures
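
    The following is a minimal C sketch of the hybrid pattern described above; SKIRT itself is written in C++, and this is our generic illustration rather than its code. Threads within each process update a shared tally with lock-free atomic operations, and a single MPI reduction combines the per-process results.

        #include <mpi.h>
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>

        #define NTHREADS 4

        /* Per-process tally shared by all threads, updated lock-free. */
        static atomic_long local_packets = 0;

        static void *shoot(void *arg) {
            (void)arg;
            for (int i = 0; i < 1000000; i++) {
                /* ... trace one photon packet ... */
                atomic_fetch_add_explicit(&local_packets, 1,
                                          memory_order_relaxed);
            }
            return NULL;
        }

        int main(int argc, char **argv) {
            int provided;
            /* FUNNELED is enough: only the main thread calls MPI. */
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

            pthread_t t[NTHREADS];
            for (int i = 0; i < NTHREADS; i++)
                pthread_create(&t[i], NULL, shoot, NULL);
            for (int i = 0; i < NTHREADS; i++)
                pthread_join(t[i], NULL);

            long local = atomic_load(&local_packets), total = 0;
            MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0,
                       MPI_COMM_WORLD);

            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            if (rank == 0) printf("packets traced: %ld\n", total);

            MPI_Finalize();
            return 0;
        }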

    Design and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming

    Full text link
    Loosely coupled programming is a powerful paradigm for rapidly creating higher-level applications from scientific programs on petascale systems, typically using scripting languages. This paradigm is a form of many-task computing (MTC) which focuses on the passing of data between programs as ordinary files rather than messages. While it has the significant benefits of decoupling producer and consumer and allowing existing application programs to be executed in parallel with no recoding, its typical implementation using shared file systems places a high performance burden on the overall system and on the user who will analyze and consume the downstream data. Previous efforts have achieved great speedups with loosely coupled programs, but have done so with careful manual tuning of all shared file system access. In this work, we evaluate a prototype collective IO model for file-based MTC. The model enables efficient and easy distribution of input data files to computing nodes and gathering of output results from them. It eliminates the need for such manual tuning and makes the programming of large-scale clusters using a loosely coupled model easier. Our approach, inspired by in-memory approaches to collective operations for parallel programming, builds on fast local file systems to provide high-speed local file caches for parallel scripts, uses a broadcast approach to handle distribution of common input data, and uses efficient scatter/gather and caching techniques for input and output. We describe the design of the prototype model, its implementation on the Blue Gene/P supercomputer, and present preliminary measurements of its performance on synthetic benchmarks and on a large-scale molecular dynamics application.
    Comment: IEEE Many-Task Computing on Grids and Supercomputers (MTAGS08), 2008
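
    As a sketch of the broadcast half of such a collective IO model (our illustration, not the paper's implementation; the file paths are hypothetical): rank 0 reads a common input file once from the shared file system, MPI_Bcast distributes its contents, and each node stages a copy into a fast node-local cache for the loosely coupled scripts to read.

        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            /* Rank 0 reads the common input once from the shared FS. */
            long len = 0;
            char *buf = NULL;
            if (rank == 0) {
                FILE *f = fopen("/shared/input.dat", "rb"); /* hypothetical */
                if (!f) MPI_Abort(MPI_COMM_WORLD, 1);
                fseek(f, 0, SEEK_END);
                len = ftell(f);
                rewind(f);
                buf = malloc(len);
                if (fread(buf, 1, (size_t)len, f) != (size_t)len)
                    MPI_Abort(MPI_COMM_WORLD, 1);
                fclose(f);
            }

            /* One broadcast replaces N independent shared-FS reads. */
            MPI_Bcast(&len, 1, MPI_LONG, 0, MPI_COMM_WORLD);
            if (rank != 0) buf = malloc(len);
            MPI_Bcast(buf, (int)len, MPI_BYTE, 0, MPI_COMM_WORLD);

            /* Stage into the node-local cache for the scripts to read. */
            FILE *c = fopen("/tmp/input.dat", "wb"); /* hypothetical cache */
            if (c) { fwrite(buf, 1, (size_t)len, c); fclose(c); }

            free(buf);
            MPI_Finalize();
            return 0;
        }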

    An Overview of Collective Bargaining in the United States

    Get PDF
    [Excerpt] American history reflects a long cycle of trade union decline and growth. Analysts routinely predict the death of the labor movement (Yeselson 2012). Heralds of labor's demise often argue that unions were needed in the past, but that modern, enlightened management and the need for economic competitiveness make them obsolete (Troy 1999). But then workers fed up with employers' exploitation decide to find new ways to defend themselves. History does not repeat itself, and conditions now are not the same as those spurring the great organizing drives of the 1930s and '40s. Still, American workers have shown deep resourcefulness over long cycles of trade union growth, decline and regeneration. Workers' need for "somebody to back me up" in the face of employer power never disappears. The labor movement built by workers in the United States over the past century remains a strong base for working-class advances and the strengthening of collective bargaining in years to come.