81,797 research outputs found
Behavioural types for non-uniform memory accesses
Concurrent programs executing on NUMA architectures consist of concurrent
entities (e.g. threads, actors) and data placed on different nodes. Execution
of these concurrent entities often reads or updates states from remote nodes.
The performance of such systems depends on the extent to which the concurrent
entities can be executing in parallel, and on the amount of the remote reads
and writes.
We consider an actor-based object oriented language, and propose a type
system which expresses the topology of the program (the placement of the actors
and data on the nodes), and an effect system which characterises remote reads
and writes (in terms of which node reads/writes from which other nodes). We use
a variant of ownership types for the topology, and a combination of behavioural
and ownership types for the effect system.Comment: In Proceedings PLACES 2015, arXiv:1602.0325
DART-MPI: An MPI-based Implementation of a PGAS Runtime System
A Partitioned Global Address Space (PGAS) approach treats a distributed
system as if the memory were shared on a global level. Given such a global view
on memory, the user may program applications very much like shared memory
systems. This greatly simplifies the tasks of developing parallel applications,
because no explicit communication has to be specified in the program for data
exchange between different computing nodes. In this paper we present DART, a
runtime environment, which implements the PGAS paradigm on large-scale
high-performance computing clusters. A specific feature of our implementation
is the use of one-sided communication of the Message Passing Interface (MPI)
version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated
the performance of the implementation with several low-level kernels in order
to determine overheads and limitations in comparison to the underlying MPI-3.Comment: 11 pages, International Conference on Partitioned Global Address
Space Programming Models (PGAS14
Actors: The Ideal Abstraction for Programming Kernel-Based Concurrency
GPU and multicore hardware architectures are commonly
used in many different application areas to accelerate problem solutions
relative to single CPU architectures. The typical approach to accessing
these hardware architectures requires embedding logic into the programming
language used to construct the application; the two primary forms
of embedding are: calls to API routines to access the concurrent functionality,
or pragmas providing concurrency hints to a language compiler
such that particular blocks of code are targeted to the concurrent functionality.
The former approach is verbose and semantically bankrupt,
while the success of the latter approach is restricted to simple, static
uses of the functionality.
Actor-based applications are constructed from independent, encapsulated
actors that interact through strongly-typed channels. This paper
presents a first attempt at using actors to program kernels targeted at
such concurrent hardware. Besides the glove-like fit of a kernel to the actor
abstraction, quantitative code analysis shows that actor-based kernels
are always significantly simpler than API-based coding, and generally
simpler than pragma-based coding. Additionally, performance measurements
show that the overheads of actor-based kernels are commensurate
to API-based kernels, and range from equivalent to vastly improved for
pragma-based annotations, both for sample and real-world applications
- …