Extending Message Passing Interface Windows to Storage

Author: Gioiosa Roberto
Kestor Gokcen
Laure Erwin
Markidis Stefano
Peng Ivy Bo
Rivas-Gomez Sergio
Publication date: 27/04/2017

This work presents an extension to MPI supporting the one-sided communication model and window allocations in storage. Our design transparently integrates with the current MPI implementations, enabling applications to target MPI windows in storage, memory or both simultaneously, without major modifications. Initial performance results demonstrate that the presented MPI window extension could potentially be helpful for a wide range of use cases, with low overhead.

arXiv.org e-Print Archive

Crossref

DART-MPI: An MPI-based Implementation of a PGAS Runtime System

Author: Fürlinger Karl
Glass Colin W.
Gracia José
Idrees Kamran
Mhedheb Yousri
Tao Jie
Zhou Huan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

A Partitioned Global Address Space (PGAS) approach treats a distributed system as if the memory were shared on a global level. Given such a global view on memory, the user may program applications very much like shared memory systems. This greatly simplifies the tasks of developing parallel applications, because no explicit communication has to be specified in the program for data exchange between different computing nodes. In this paper we present DART, a runtime environment, which implements the PGAS paradigm on large-scale high-performance computing clusters. A specific feature of our implementation is the use of one-sided communication of the Message Passing Interface (MPI) version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated the performance of the implementation with several low-level kernels in order to determine overheads and limitations in comparison to the underlying MPI-3. Comment: 11 pages, International Conference on Partitioned Global Address Space Programming Models (PGAS14)

arXiv.org e-Print Archive

Crossref

Comparison of Different Parallel Implementations of the 2+1-Dimensional KPZ Model and the 3-Dimensional KMC Model

Author: B.M. Forrest
D. Forster
E. Frey
E. Marinari
F.D.A. AaraoReis
G. Ódor
H. Rost
H. Schulz
H. van Beijeren
H.C. Fogedby
H.K. Janssen
J. Kelling
J. Krug
K.-H. Heinig
K. Kawasaki
L. Canet
M. Barma
M. F. Nagy
M. Henkel
M. Kardar
M. Lässig
M. Matsumoto
M. Plischke
M. Schwartz
M. Weigel
N. Metropolis
P. Meakin
S. Wolfram
T. Halpin-Healy
T. Hwa
T. Preis
V. Rosato
Y. Shim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/07/2012

We show that efficient simulations of the Kardar-Parisi-Zhang interface growth in 2+1 dimensions and of the 3-dimensional Kinetic Monte Carlo of thermally activated diffusion can be realized both on GPUs and modern CPUs. In this article we present results of different implementations on GPUs using CUDA and OpenCL and also on CPUs using OpenCL and MPI. We investigate the runtime and scaling behavior on different architectures to find optimal solutions for solving current simulation problems in the field of statistical physics and materials science. Comment: 14 pages, 8 figures, to be published in a forthcoming EPJST special issue on "Computer simulations on GPU

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

On the conditions for efficient interoperability with threads: An experience with PGAS languages using Cray communication domains

Author: Ibrahim KZ
Yelick K
Publication venue: eScholarship, University of California
Publication date: 01/01/2014

Today's high performance systems are typically built from shared memory nodes connected by a high speed network. That architecture, combined with the trend towards less memory per core, encourages programmers to use a mixture of message passing and multithreaded programming. Unfortunately, the advantages of using threads for in-node programming are hindered by their inability to efficiently communicate between nodes. In this work, we identify some of the performance problems that arise in such hybrid programming environments and characterize conditions needed to achieve high communication performance for multiple threads: addressability of targets, separability of communication paths, and full direct reachability to targets. Using the GASNet communication layer on the Cray XC30 as our experimental platform, we show how to satisfy these conditions. We also discuss how satisfying these conditions is influenced by the communication abstraction, implementation constraints, and the interconnect messaging capabilities. To evaluate these ideas, we compare the communication performance of a thread-based node runtime to a process-based runtime. Without our GASNet extensions, thread communication is significantly slower than processes - up to 21x slower. Once the implementation is modified to address each of our conditions, the two runtimes have comparable communication performance. This allows programmers to more easily mix models like OpenMP, CILK, or pthreads with a GASNet-based model like UPC, with the associated performance, convenience and interoperability advantages that come from using threads within a node. © 2014 ACM

eScholarship - University of California

S-Net for multi-memory multicores

Author: Grelck C.
Julku J.
Penczek F.
Peterson L.
Pontelli E.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010

Copyright ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 5th ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming: http://doi.acm.org/10.1145/1708046.1708054

S-Net is a declarative coordination language and component technology aimed at modern multi-core/many-core architectures and systems-on-chip. It builds on the concept of stream processing to structure dynamically evolving networks of communicating asynchronous components. Components themselves are implemented using a conventional language suitable for the application domain. This two-level software architecture maintains a familiar sequential development environment for large parts of an application and offers a high-level declarative approach to component coordination. In this paper we present a conservative language extension for the placement of components and component networks in a multi-memory environment, i.e. architectures that associate individual compute cores or groups thereof with private memories. We describe a novel distributed runtime system layer that complements our existing multithreaded runtime system for shared memory multicores. Particular emphasis is put on efficient management of data communication. Last but not least, we present preliminary experimental data.

VTT Research System

University of Hertfordshire Research Archive

International Migration, Integration and Social Cohesion online publications

The upper-atmosphere extension of the ICON general circulation model (version: Ua-icon-1.0)

Author: Baldauf M.
Borchert S.
Reinert D.
Schmidt H.
Zhou G.
Zängl G.
Publication venue: 'Copernicus GmbH'
Publication date: 14/08/2019

How the upper-atmosphere branch of the circulation contributes to and interacts with the circulation of the middle and lower atmosphere is a research area with many open questions. Inertia-gravity waves, for instance, have moved into the focus of research as they are suspected to be key features in driving and shaping the circulation. Numerical atmospheric models are an important pillar for this research. We use the ICOsahedral Non-hydrostatic (ICON) general circulation model, which is a joint development of the Max Planck Institute for Meteorology (MPI-M) and the German Weather Service (DWD), and provides, e.g., local mass conservation, a flexible grid nesting option, and a non-hydrostatic dynamical core formulated on an icosahedral-triangular grid. We extended ICON to the upper atmosphere and present here the two main components of this new configuration named UA-ICON: an extension of the dynamical core from shallow- to deep-atmosphere dynamics and the implementation of an upper-atmosphere physics package. A series of idealized test cases and climatological simulations is performed in order to evaluate the upper-atmosphere extension of ICON. © Author(s) 2019

MPG.PuRe