21,274 research outputs found
Actors that Unify Threads and Events
There is an impedance mismatch between message-passing concurrency and virtual machines, such as the JVM. VMs usually map their threads to heavyweight OS processes. Without a lightweight process abstraction, users are often forced to write parts of concurrent applications in an event-driven style which obscures control flow, and increases the burden on the programmer. In this paper we show how thread-based and event-based programming can be unified under a single actor abstraction. Using advanced abstraction mechanisms of the Scala programming language, we implemented our approach on unmodified JVMs. Our programming model integrates well with the threading model of the underlying VM
SICStus MT - A Multithreaded Execution Environment for SICStus Prolog
The development of intelligent software agents and other
complex applications which continuously interact with their
environments has been one of the reasons why explicit concurrency has
become a necessity in a modern Prolog system today. Such applications
need to perform several tasks which may be very different with respect
to how they are implemented in Prolog. Performing these tasks
simultaneously is very tedious without language support.
This paper describes the design, implementation and evaluation of a
prototype multithreaded execution environment for SICStus Prolog. The
threads are dynamically managed using a small and compact set of
Prolog primitives implemented in a portable way, requiring almost no
support from the underlying operating system
DART-MPI: An MPI-based Implementation of a PGAS Runtime System
A Partitioned Global Address Space (PGAS) approach treats a distributed
system as if the memory were shared on a global level. Given such a global view
on memory, the user may program applications very much like shared memory
systems. This greatly simplifies the tasks of developing parallel applications,
because no explicit communication has to be specified in the program for data
exchange between different computing nodes. In this paper we present DART, a
runtime environment, which implements the PGAS paradigm on large-scale
high-performance computing clusters. A specific feature of our implementation
is the use of one-sided communication of the Message Passing Interface (MPI)
version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated
the performance of the implementation with several low-level kernels in order
to determine overheads and limitations in comparison to the underlying MPI-3.Comment: 11 pages, International Conference on Partitioned Global Address
Space Programming Models (PGAS14
libcppa - Designing an Actor Semantic for C++11
Parallel hardware makes concurrency mandatory for efficient program
execution. However, writing concurrent software is both challenging and
error-prone. C++11 provides standard facilities for multiprogramming, such as
atomic operations with acquire/release semantics and RAII mutex locking, but
these primitives remain too low-level. Using them both correctly and
efficiently still requires expert knowledge and hand-crafting. The actor model
replaces implicit communication by sharing with an explicit message passing
mechanism. It applies to concurrency as well as distribution, and a lightweight
actor model implementation that schedules all actors in a properly
pre-dimensioned thread pool can outperform equivalent thread-based
applications. However, the actor model did not enter the domain of native
programming languages yet besides vendor-specific island solutions. With the
open source library libcppa, we want to combine the ability to build reliable
and distributed systems provided by the actor model with the performance and
resource-efficiency of C++11.Comment: 10 page
Execution replay and debugging
As most parallel and distributed programs are internally non-deterministic --
consecutive runs with the same input might result in a different program flow
-- vanilla cyclic debugging techniques as such are useless. In order to use
cyclic debugging tools, we need a tool that records information about an
execution so that it can be replayed for debugging. Because recording
information interferes with the execution, we must limit the amount of
information and keep the processing of the information fast. This paper
contains a survey of existing execution replay techniques and tools.Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop
on Automated Debugging (AADebug 2000), August 2000, Munich. cs.SE/001003
Managing Communication Latency-Hiding at Runtime for Parallel Programming Languages and Libraries
This work introduces a runtime model for managing communication with support
for latency-hiding. The model enables non-computer science researchers to
exploit communication latency-hiding techniques seamlessly. For compiled
languages, it is often possible to create efficient schedules for
communication, but this is not the case for interpreted languages. By
maintaining data dependencies between scheduled operations, it is possible to
aggressively initiate communication and lazily evaluate tasks to allow maximal
time for the communication to finish before entering a wait state. We implement
a heuristic of this model in DistNumPy, an auto-parallelizing version of
numerical Python that allows sequential NumPy programs to run on distributed
memory architectures. Furthermore, we present performance comparisons for eight
benchmarks with and without automatic latency-hiding. The results shows that
our model reduces the time spent on waiting for communication as much as 27
times, from a maximum of 54% to only 2% of the total execution time, in a
stencil application.Comment: PREPRIN
Session-Based Programming for Parallel Algorithms: Expressiveness and Performance
This paper investigates session programming and typing of benchmark examples
to compare productivity, safety and performance with other communications
programming languages. Parallel algorithms are used to examine the above
aspects due to their extensive use of message passing for interaction, and
their increasing prominence in algorithmic research with the rising
availability of hardware resources such as multicore machines and clusters. We
contribute new benchmark results for SJ, an extension of Java for type-safe,
binary session programming, against MPJ Express, a Java messaging system based
on the MPI standard. In conclusion, we observe that (1) despite rich libraries
and functionality, MPI remains a low-level API, and can suffer from commonly
perceived disadvantages of explicit message passing such as deadlocks and
unexpected message types, and (2) the benefits of high-level session
abstraction, which has significant impact on program structure to improve
readability and reliability, and session type-safety can greatly facilitate the
task of communications programming whilst retaining competitive performance
Preparing HPC Applications for the Exascale Era: A Decoupling Strategy
Production-quality parallel applications are often a mixture of diverse
operations, such as computation- and communication-intensive, regular and
irregular, tightly coupled and loosely linked operations. In conventional
construction of parallel applications, each process performs all the
operations, which might result inefficient and seriously limit scalability,
especially at large scale. We propose a decoupling strategy to improve the
scalability of applications running on large-scale systems.
Our strategy separates application operations onto groups of processes and
enables a dataflow processing paradigm among the groups. This mechanism is
effective in reducing the impact of load imbalance and increases the parallel
efficiency by pipelining multiple operations. We provide a proof-of-concept
implementation using MPI, the de-facto programming system on current
supercomputers. We demonstrate the effectiveness of this strategy by decoupling
the reduce, particle communication, halo exchange and I/O operations in a set
of scientific and data-analytics applications. A performance evaluation on
8,192 processes of a Cray XC40 supercomputer shows that the proposed approach
can achieve up to 4x performance improvement.Comment: The 46th International Conference on Parallel Processing (ICPP-2017
Pervasive Parallel And Distributed Computing In A Liberal Arts College Curriculum
We present a model for incorporating parallel and distributed computing (PDC) throughout an undergraduate CS curriculum. Our curriculum is designed to introduce students early to parallel and distributed computing topics and to expose students to these topics repeatedly in the context of a wide variety of CS courses. The key to our approach is the development of a required intermediate-level course that serves as a introduction to computer systems and parallel computing. It serves as a requirement for every CS major and minor and is a prerequisite to upper-level courses that expand on parallel and distributed computing topics in different contexts. With the addition of this new course, we are able to easily make room in upper-level courses to add and expand parallel and distributed computing topics. The goal of our curricular design is to ensure that every graduating CS major has exposure to parallel and distributed computing, with both a breadth and depth of coverage. Our curriculum is particularly designed for the constraints of a small liberal arts college, however, much of its ideas and its design are applicable to any undergraduate CS curriculum
- …