
4 research outputs found

Managing Communication Latency-Hiding at Runtime for Parallel Programming Languages and Libraries

Authors: Kristensen, Mads Ruben Burgdorff; Vinter, Brian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

This work introduces a runtime model for managing communication with support for latency-hiding. The model enables researchers outside computer science to exploit communication latency-hiding techniques seamlessly. For compiled languages, it is often possible to create efficient communication schedules, but this is not the case for interpreted languages. By maintaining data dependencies between scheduled operations, it is possible to initiate communication aggressively and evaluate tasks lazily, allowing maximal time for the communication to finish before entering a wait state. We implement a heuristic of this model in DistNumPy, an auto-parallelizing version of numerical Python (NumPy) that allows sequential NumPy programs to run on distributed memory architectures. Furthermore, we present performance comparisons for eight benchmarks with and without automatic latency-hiding. The results show that, in a stencil application, our model reduces the time spent waiting for communication by as much as a factor of 27, from a maximum of 54% to only 2% of the total execution time.
Comment: PREPRINT

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System
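The dependency-driven scheduling described in this abstract can be illustrated with a toy simulation. This is a minimal sketch, not DistNumPy's heuristic: the `Op` class, the functions `run_blocking`, `run_latency_hiding` and `stencil_ops`, and the cost model (message initiation is free, each message has a fixed latency, one computation runs at a time) are all hypothetical. Operations are recorded first and then flushed; every communication whose dependencies are met is started immediately, a computation runs as soon as its inputs are available, and the scheduler enters a wait state only when nothing else can run.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    kind: str                                   # "comm" or "comp"
    cost: float                                 # message latency for comm, CPU time for comp
    deps: list = field(default_factory=list)    # names of ops that must finish first


def run_blocking(ops):
    """Program order: each communication is waited on as soon as it is issued."""
    clock = wait = 0.0
    for op in ops:
        if op.kind == "comm":
            wait += op.cost                     # the whole latency is spent waiting
        clock += op.cost
    return clock, wait


def run_latency_hiding(ops):
    """Start communication as early as dependencies allow; compute when inputs are ready."""
    clock = wait = 0.0
    done = set()
    in_flight = {}                              # comm name -> completion time
    pending = list(ops)                         # recorded (deferred) operations, in program order
    while pending or in_flight:
        # Retire communication that has finished by the current time.
        for name, t in list(in_flight.items()):
            if t <= clock:
                done.add(name)
                del in_flight[name]
        # Aggressively initiate every communication whose dependencies are met.
        for op in [o for o in pending if o.kind == "comm" and all(d in done for d in o.deps)]:
            in_flight[op.name] = clock + op.cost
            pending.remove(op)
        # Execute the first computation (in program order) whose inputs are available.
        ready = next((o for o in pending
                      if o.kind == "comp" and all(d in done for d in o.deps)), None)
        if ready is not None:
            clock += ready.cost
            done.add(ready.name)
            pending.remove(ready)
        elif in_flight:
            # Nothing can run: wait until the next message completes.
            t = min(in_flight.values())
            wait += t - clock
            clock = t
        elif pending:
            raise ValueError("unsatisfiable dependencies")
    return clock, wait


def stencil_ops(n_blocks, latency=3.0, work=1.0):
    """Hypothetical stencil: per block, a halo exchange, an interior and a border update."""
    ops = []
    for i in range(n_blocks):
        ops.append(Op(f"halo{i}", "comm", latency))               # fetch neighbour data
        ops.append(Op(f"interior{i}", "comp", work))              # needs only local data
        ops.append(Op(f"border{i}", "comp", work, [f"halo{i}"]))  # needs the halo
    return ops


if __name__ == "__main__":
    ops = stencil_ops(4)
    print(run_blocking(ops))        # (20.0, 12.0)
    print(run_latency_hiding(ops))  # (8.0, 0.0)
```

With four blocks, the blocking version spends 12 of 20 time units waiting, while the latency-hiding version overlaps every halo exchange with interior updates and finishes in 8 units without waiting. The numbers come only from this made-up cost model and say nothing about the benchmark results reported above.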

Managing Overlapping Data Structures for Data-Parallel Applications on Distributed Memory Architectures

Author: . Brian Vinter
. Mads Ruben Burgdorff Kristensen
Publication venue: GSTF Journal on Computing (JoC)
Publication date: 28/08/2014

In this paper, we introduce a model for managing abstract data structures that map to arbitrary distributed memory architectures. It is difficult to achieve scalable performance in data-parallel applications where the programmer manipulates abstract data structures rather than memory directly. On distributed memory architectures, such abstract data-parallel operations may require communication between nodes, so the underlying system has to handle communication efficiently without any help from the user. Our data model splits data blocks into two sets, local data and remote data, and schedules the sub-blocks by availability at runtime. We implement the described model in DistNumPy, a high-productivity programming library for Python, and evaluate the implementation on a representative distributed memory system, a Cray XE-6 Supercomputer, using up to 2048 cores. The benchmarking results demonstrate good, scalable performance.

GSTF Digital Library (GSTF-DL): Open Journal Systems (Global Science and Technology Forum)
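The split into local and remote data can be sketched for a one-dimensional array under a simple block distribution. This is an illustration under stated assumptions, not the paper's data model or DistNumPy's API: the helpers `owner`, `split_sub_blocks` and `schedule` are hypothetical, ownership is assumed to be `i // block_size`, and the `arrival` mapping stands in for the order in which messages actually complete at runtime.

```python
from typing import Dict, List, Tuple

# (start, stop, owner rank) of a contiguous range of global indices in the input array
SubBlock = Tuple[int, int, int]


def owner(i: int, block_size: int) -> int:
    """Rank that owns global index i under a simple block distribution (assumed)."""
    return i // block_size


def split_sub_blocks(start: int, stop: int, block_size: int) -> List[SubBlock]:
    """Split the global range [start, stop) into maximal sub-blocks owned by a single rank."""
    pieces = []
    i = start
    while i < stop:
        r = owner(i, block_size)
        end = min(stop, (r + 1) * block_size)
        pieces.append((i, end, r))
        i = end
    return pieces


def schedule(my_rank: int, pieces: List[SubBlock],
             arrival: Dict[SubBlock, float]) -> List[SubBlock]:
    """Local sub-blocks are available immediately; remote ones are ordered by arrival time."""
    local = [p for p in pieces if p[2] == my_rank]
    remote = sorted((p for p in pieces if p[2] != my_rank), key=lambda p: arrival[p])
    return local + remote


if __name__ == "__main__":
    # Rank 1 of 3 (block size 4) computes out[4:8] = a[2:6] + a[6:10].
    pieces = split_sub_blocks(2, 6, 4) + split_sub_blocks(6, 10, 4)
    arrival = {(2, 4, 0): 5.0, (8, 10, 2): 2.0}   # made-up message completion times
    print(schedule(1, pieces, arrival))
    # [(4, 6, 1), (6, 8, 1), (8, 10, 2), (2, 4, 0)]
```

In a real runtime, the remote sub-blocks would be processed as their messages complete, for example by waiting on outstanding non-blocking receives with MPI's `MPI_Waitany`, rather than being sorted from known arrival times.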

Managing communication latency-hiding at runtime for parallel programming languages and libraries

Authors: Kristensen, Mads Ruben Burgdorff; Vinter, Brian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

Copenhagen University Research Information System

Towards Parallel Execution of Sequential Scientific Applications

Author: Kristensen, Mads Ruben Burgdorff
Publication date: 06/09/2012

Copenhagen University Research Information System