66,680 research outputs found
Actors: The Ideal Abstraction for Programming Kernel-Based Concurrency
GPU and multicore hardware architectures are commonly
used in many different application areas to accelerate problem solutions
relative to single CPU architectures. The typical approach to accessing
these hardware architectures requires embedding logic into the programming
language used to construct the application; the two primary forms
of embedding are: calls to API routines to access the concurrent functionality,
or pragmas providing concurrency hints to a language compiler
such that particular blocks of code are targeted to the concurrent functionality.
The former approach is verbose and semantically bankrupt,
while the success of the latter approach is restricted to simple, static
uses of the functionality.
Actor-based applications are constructed from independent, encapsulated
actors that interact through strongly-typed channels. This paper
presents a first attempt at using actors to program kernels targeted at
such concurrent hardware. Besides the glove-like fit of a kernel to the actor
abstraction, quantitative code analysis shows that actor-based kernels
are always significantly simpler than API-based coding, and generally
simpler than pragma-based coding. Additionally, performance measurements
show that the overheads of actor-based kernels are commensurate
to API-based kernels, and range from equivalent to vastly improved for
pragma-based annotations, both for sample and real-world applications
Logic programming in the context of multiparadigm programming: the Oz experience
Oz is a multiparadigm language that supports logic programming as one of its
major paradigms. A multiparadigm language is designed to support different
programming paradigms (logic, functional, constraint, object-oriented,
sequential, concurrent, etc.) with equal ease. This article has two goals: to
give a tutorial of logic programming in Oz and to show how logic programming
fits naturally into the wider context of multiparadigm programming. Our
experience shows that there are two classes of problems, which we call
algorithmic and search problems, for which logic programming can help formulate
practical solutions. Algorithmic problems have known efficient algorithms.
Search problems do not have known efficient algorithms but can be solved with
search. The Oz support for logic programming targets these two problem classes
specifically, using the concepts needed for each. This is in contrast to the
Prolog approach, which targets both classes with one set of concepts, which
results in less than optimal support for each class. To explain the essential
difference between algorithmic and search programs, we define the Oz execution
model. This model subsumes both concurrent logic programming
(committed-choice-style) and search-based logic programming (Prolog-style).
Instead of Horn clause syntax, Oz has a simple, fully compositional,
higher-order syntax that accommodates the abilities of the language. We
conclude with lessons learned from this work, a brief history of Oz, and many
entry points into the Oz literature.Comment: 48 pages, to appear in the journal "Theory and Practice of Logic
Programming
On the relative expressiveness of higher-order session processes
By integrating constructs from the λ-calculus and the π-calculus, in higher-order process calculi exchanged values may contain processes. This paper studies the relative expressiveness of HOπ, the higher-order π-calculus in which communications are governed by session types. Our main discovery is that HO, a subcalculus of HOπ which lacks name-passing and recursion, can serve as a new core calculus for session-typed higher-order concurrency. By exploring a new bisimulation for HO, we show that HO can encode HOπ fully abstractly (up to typed contextual equivalence) more precisely and efficiently than the first-order session π-calculus (π). Overall, under session types, HOπ, HO, and π are equally expressive; however, HOπ and HO are more tightly related than HOπ and π
GPUVerify: A Verifier for GPU Kernels
We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream ker-nel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational se-mantics for GPU programming termed synchronous, delayed visibility (SDV) semantics. The SDV semantics provides a precise definition of barrier divergence in GPU kernels and allows kernel verification to be reduced to analysis of a sequential program, thereby completely avoiding the need to reason about thread interleavings, and allowing existing modular techniques for program verification to be leveraged. We describe an efficient encoding for data race detection and propose a method for automatically inferring loop invari-ants required for verification. We have implemented these techniques as a practical verification tool, GPUVerify, which can be applied directly to OpenCL and CUDA source code. We evaluate GPUVerify with respect to a set of 163 kernels drawn from public and commercial sources. Our evaluation demonstrates that GPUVerify is capable of efficient, auto-matic verification of a large number of real-world kernels
Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays
The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributuions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism
Session-Based Programming for Parallel Algorithms: Expressiveness and Performance
This paper investigates session programming and typing of benchmark examples
to compare productivity, safety and performance with other communications
programming languages. Parallel algorithms are used to examine the above
aspects due to their extensive use of message passing for interaction, and
their increasing prominence in algorithmic research with the rising
availability of hardware resources such as multicore machines and clusters. We
contribute new benchmark results for SJ, an extension of Java for type-safe,
binary session programming, against MPJ Express, a Java messaging system based
on the MPI standard. In conclusion, we observe that (1) despite rich libraries
and functionality, MPI remains a low-level API, and can suffer from commonly
perceived disadvantages of explicit message passing such as deadlocks and
unexpected message types, and (2) the benefits of high-level session
abstraction, which has significant impact on program structure to improve
readability and reliability, and session type-safety can greatly facilitate the
task of communications programming whilst retaining competitive performance
Engineering a static verification tool for GPU kernels
We report on practical experiences over the last 2.5 years related to the engineering of GPUVerify, a static verification tool for OpenCL and CUDA GPU kernels, plotting the progress of GPUVerify from a prototype to a fully functional and relatively efficient analysis tool. Our hope is that this experience report will serve the verification community by helping to inform future tooling efforts. © 2014 Springer International Publishing
- …