
    Coupling Memory and Computation for Locality Management

    We articulate the need for managing (data) locality automatically rather than leaving it to the programmer, especially in parallel programming systems. To this end, we propose techniques for tightly coupling the computation (including the thread scheduler) and the memory manager so that data and computation can be positioned close together in hardware. Such tight coupling of computation and memory management is in sharp contrast with the prevailing practice of considering each in isolation. For example, memory-management techniques usually abstract the computation as an unknown "mutator", which is treated as a "black box". As an example of the approach, in this paper we consider a specific class of parallel computations: nested-parallel computations, which dynamically create a nesting of parallel tasks. We propose a method for organizing memory as a tree of heaps that reflects the structure of the nesting. More specifically, our approach creates a heap for a task if it is separately scheduled on a processor. This allows us to couple garbage collection with the structure of the computation and the way in which it is dynamically scheduled on the processors. This coupling makes it possible to exploit locality in the program by mapping it to the locality of the hardware. For example, for improved locality, a heap can be garbage collected immediately after its task finishes, when the heap's contents are likely still in cache.
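
    As a rough illustration of the tree-of-heaps idea, the toy Haskell model below assigns each separately scheduled task its own heap and mirrors task nesting in the heap tree. This is a sketch only; the names Heap, fork, and finish are illustrative, not the paper's API.

        -- A toy model: each separately scheduled task owns a Heap of
        -- the objects it has allocated; nesting of tasks is mirrored
        -- by nesting of heaps.
        data Heap a = Heap
          { owner    :: String      -- the task this heap belongs to
          , objects  :: [a]         -- objects the task has allocated
          , children :: [Heap a]    -- heaps of nested (forked) subtasks
          }

        -- Forking a subtask adds a fresh child heap under the parent's
        -- heap, so the heap tree mirrors the dynamic task nesting.
        fork :: String -> Heap a -> Heap a
        fork childName parent =
          parent { children = Heap childName [] [] : children parent }

        -- When a task finishes, its heap can be collected immediately,
        -- while its contents are likely still in cache; survivors are
        -- promoted to the parent heap (modelled here as a list merge).
        finish :: String -> Heap a -> Heap a
        finish childName parent =
          case break ((== childName) . owner) (children parent) of
            (before, done : after) ->
              parent { objects  = objects done ++ objects parent
                     , children = before ++ after }
            _ -> parent  -- no such child; nothing to collect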

    Retrofitting parallelism onto OCaml

    OCaml is an industrial-strength, multi-paradigm programming language, widely used in industry and academia. OCaml is also one of the few modern managed system programming languages to lack support for shared-memory parallel programming. This paper describes the design, full-fledged implementation, and evaluation of a mostly-concurrent garbage collector (GC) for the multicore extension of the OCaml programming language. Given that we propose to add parallelism to a widely used programming language with millions of lines of existing code, we face the challenge of maintaining backwards compatibility--not just in terms of language features but also the performance of single-threaded code running under the new GC. To this end, the paper presents a series of novel techniques and demonstrates that the new GC strikes a balance between performance and feature backwards compatibility for sequential programs and scales admirably on modern multicore processors.

    PAEAN: portable and scalable runtime support for parallel Haskell dialects

    Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at various scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones offering full programmer control. Most current designs assume a shared-memory model at the programmer, implementation, and hardware levels. This is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads, such as garbage-collection synchronisation. What is needed is an easy way to abstract over the implementation and hardware levels while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing (PAEAN) runtime-system design aims to provide a portable, high-level, shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies, and extends the existing, well-developed support for shared-memory parallelism provided by the widely used GHC Haskell compiler. This paper summarises the state of the art in shared-nothing parallel Haskell implementations, introduces the PArallEl shAred Nothing abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.
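
    To make the shared-nothing idea concrete, here is a minimal Haskell sketch (not the PAEAN implementation) in which a spawned worker communicates only through serialised messages, so no heap structure is ever shared between tasks. The names Wire and spawnNode are hypothetical, and Show/Read stand in for a real serialisation layer.

        import Control.Concurrent (forkIO)
        import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

        -- Only serialised Strings cross a Wire, imitating the copy
        -- semantics a distributed, shared-nothing heap would enforce.
        type Wire = Chan String

        -- Spawn a worker "node" that serves a single request; it
        -- deserialises its input and serialises its reply.
        spawnNode :: (Read a, Show b) => (a -> b) -> IO (Wire, Wire)
        spawnNode worker = do
          inbox  <- newChan
          outbox <- newChan
          _ <- forkIO $ do
                msg <- readChan inbox
                let input = read msg                      -- deserialise
                writeChan outbox (show (worker input))    -- serialise
          return (inbox, outbox)

        main :: IO ()
        main = do
          (inbox, outbox) <- spawnNode (\n -> sum [1 .. n :: Integer])
          writeChan inbox (show (1000 :: Integer))  -- only copies cross
          reply <- readChan outbox
          putStrLn ("result = " ++ reply)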

    Hierarchical Memory Management for Parallel Programs

    An important feature of functional programs is that they are parallel by default. Implementing an efficient parallel functional language, however, is a major challenge, in part because the high rate of allocation and freeing associated with functional programs requires an efficient and scalable memory manager. In this paper, we present a technique for parallel memory management for strict functional languages with nested parallelism. At the highest level of abstraction, the approach consists of a technique to organize memory as a hierarchy of heaps, and an algorithm for performing automatic memory reclamation by taking advantage of a disentanglement property of parallel functional programs. More specifically, the idea is to assign to each parallel task its own heap in memory and to organize the heaps in a hierarchy/tree that mirrors the hierarchy of tasks. We present a nested-parallel calculus that specifies hierarchical heaps and prove in this calculus a disentanglement property, which prohibits a task from accessing objects allocated by another task that might execute in parallel. Leveraging the disentanglement property, we present a garbage-collection technique that can operate on any subtree in the memory hierarchy concurrently while other tasks (and/or other collections) proceed in parallel. We prove the safety of this collector by formalizing it in the context of our parallel calculus. In addition, we describe how the proposed techniques can be implemented on modern shared-memory machines and present a prototype implementation as an extension to MLton, a high-performance compiler for the Standard ML language. Finally, we evaluate the performance of this implementation on a number of parallel benchmarks.
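
    A toy Haskell rendering of the disentanglement property (an illustration, not the paper's formal calculus): a task may access objects in its own heap or any ancestor heap, but never in a sibling subtree whose task might run in parallel. The names HeapTree, ancestorsOf, and disentangled are hypothetical.

        -- A task tree in which each node stands for a task and its heap.
        data HeapTree = HeapTree
          { taskId :: String
          , kids   :: [HeapTree]
          }

        -- The heaps a task may legally access: every heap on the path
        -- from the root down to the task's own heap.
        ancestorsOf :: String -> HeapTree -> Maybe [String]
        ancestorsOf t (HeapTree name cs)
          | name == t = Just [name]
          | otherwise =
              case [path | Just path <- map (ancestorsOf t) cs] of
                (path : _) -> Just (name : path)
                []         -> Nothing

        -- Disentanglement as a runtime assertion: task `reader`
        -- touching an object owned by `ownerTask` is legal iff
        -- `ownerTask` lies on `reader`'s root-to-leaf path.
        disentangled :: HeapTree -> String -> String -> Bool
        disentangled tree reader ownerTask =
          case ancestorsOf reader tree of
            Just path -> ownerTask `elem` path
            Nothing   -> False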

    Doctor of Philosophy

    Places and distributed places bring new support for message-passing parallelism to Racket. This dissertation describes the programming model and how Racket's sequential runtime system was modified to support places and distributed places. The freedom to design the places programming model helped make the implementation tractable; specifically, the conventional pain of adding just the right amount of locking to a big, legacy runtime system was avoided. The dissertation presents an evaluation of the places design that includes both real-world applications and standard parallel benchmarks. Distributed places are introduced as a language extension of the places design and architecture. The distributed places extension augments places with the features of remote process launch, remote place invocation, and distributed message passing. Distributed places provide a foundation for constructing higher-level distributed frameworks. Example implementations of RPC, MPI, MapReduce, and nested data parallelism demonstrate the extensibility of the distributed places API.
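
    As an analogy in Haskell (Racket's places API itself is not shown here), the sketch below layers a synchronous, RPC-style call on top of an isolated, message-passing task, much as the dissertation layers higher-level frameworks on distributed places. Place, startPlace, and call are hypothetical names.

        import Control.Concurrent (forkIO)
        import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
        import Control.Monad (forever)

        -- A request carries its payload plus a private reply channel.
        data Req a b = Req a (Chan b)

        -- A "place": an isolated task that owns its state and serves
        -- requests arriving on a single channel.
        newtype Place a b = Place (Chan (Req a b))

        startPlace :: (a -> b) -> IO (Place a b)
        startPlace handler = do
          reqs <- newChan
          _ <- forkIO $ forever $ do
                 Req payload replyTo <- readChan reqs
                 writeChan replyTo (handler payload)
          return (Place reqs)

        -- A synchronous remote call, built purely from message passing:
        -- send the request, then block on the private reply channel.
        call :: Place a b -> a -> IO b
        call (Place reqs) payload = do
          replyTo <- newChan
          writeChan reqs (Req payload replyTo)
          readChan replyTo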

    Composable Scheduler Activations for Haskell


    Partial Aborts for Transactions via First Class Continuations

    Software transactional memory (STM) has proven to be a useful abstraction for developing concurrent applications, where programmers denote transactions with an atomic construct that delimits a collection of reads and writes to shared mutable references. The runtime system then guarantees that all transactions are observed to execute atomically with respect to each other. Traditionally, when the runtime system detects that one transaction conflicts with another, it aborts one of the transactions and restarts its execution from the beginning. This can lead to problems with both execution time and throughput. This thesis presents a novel approach that uses first-class continuations to restart a conflicting transaction at the point of the conflict, avoiding the re-execution of any earlier work in the transaction that has not been compromised. In practice, this allows transactions to complete more quickly, decreasing execution time and increasing throughput. The ideas presented in this thesis have been implemented in the context of the Manticore project, an ML-family language with support for parallelism and concurrency. Crucially, this work relies on constant-time continuation capture via a continuation-passing-style (CPS) transformation and heap-allocated continuations. The partial-abort scheme has been implemented as part of three modern STM implementations: TL2, TinySTM, and NOrec. Experimental results show that, while no base STM implementation is universally best, each partial-abort implementation compares favorably to its full-abort counterpart. In addition to an implementation, this thesis presents a formal semantics for partial aborts. A proof of correctness is given by relating the partial-abort semantics to an analogous full-abort semantics via a simulation. All proofs have been formally verified using the Coq theorem prover.
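
    For context, the following example shows the conventional full-abort behaviour the thesis improves on, using GHC's stock stm library (not Manticore's STM): if a conflicting transaction modifies a variable this block has read before it commits, the entire block re-executes from the top, including any uncompromised work done before the conflict.

        import Control.Concurrent.STM

        -- A classic atomic transfer between two shared references.
        -- Under full-abort STM, a conflict on `from` or `to` restarts
        -- the whole transaction from the beginning.
        transfer :: TVar Int -> TVar Int -> Int -> STM ()
        transfer from to amount = do
          balance <- readTVar from
          if balance < amount
            then retry                        -- block until funds change
            else do
              writeTVar from (balance - amount)
              modifyTVar' to (+ amount)

        main :: IO ()
        main = do
          a <- newTVarIO 100
          b <- newTVarIO 0
          atomically (transfer a b 30)
          readTVarIO b >>= print   -- prints 30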

    Parallel evaluation strategies for lazy data structures in Haskell

    Conventional parallel programming is complex and error-prone. To improve programmer productivity, we need to raise the level of abstraction with a higher-level programming model that hides many parallel coordination aspects. Evaluation strategies use non-strictness to separate the coordination and computation aspects of a Glasgow parallel Haskell (GpH) program. This allows the specification of high-level parallel programs, eliminating the low-level complexity of synchronisation and communication associated with parallel programming. This thesis employs a data-structure-driven approach to parallelism, derived through generic parallel traversal and evaluation of sub-components of data structures. We focus on evaluation strategies over list, tree, and graph data structures, allowing re-use across applications with minimal changes to the sequential algorithm. In particular, we develop novel evaluation strategies for tree data structures, using core functional programming techniques for coordination control and achieving more flexible parallelism. We use non-strictness to control parallelism more flexibly. We apply the notion of fuel as a resource that dictates parallelism generation; in particular, the bi-directional flow of fuel through a tree structure, implemented using a circular program definition, is a novel way of controlling parallel evaluation. This is the first use of circular programming in evaluation strategies, and it is complemented by a lazy function for bounding the size of sub-trees. We extend these control mechanisms to graph structures and demonstrate performance improvements on several parallel graph traversals. We combine circularity for control, which improves the performance of strategies, with circularity for computation, which uses circular data structures. In particular, we develop a hybrid traversal strategy for graphs, exploiting breadth-first order to expose parallelism initially and then proceeding in depth-first order to minimise the overhead associated with a full parallel breadth-first traversal. The efficiency of the tree strategies is evaluated on a benchmark program and two non-trivial case studies: a Barnes-Hut algorithm for the n-body problem and sparse matrix multiplication, both using quad-trees. We also evaluate a graph search algorithm implemented using the various traversal strategies. We demonstrate improved performance on a server-class multicore machine with up to 48 cores, with the advanced fuel-splitting mechanisms proving more flexible in throttling parallelism. To guide the behaviour of the strategies, we develop heuristics-based parameter selection to choose their specific control parameters.
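
    In the spirit of these tree strategies (though far simpler than the fuel mechanism), the sketch below uses GHC's Control.Parallel.Strategies to spark subtree evaluation in parallel down to a bounded depth, throttling parallelism much as fuel does. parTreeDepth is an illustrative name, not the thesis's actual strategy.

        import Control.Parallel.Strategies

        data Tree a = Leaf a | Node (Tree a) (Tree a)

        -- Evaluate subtrees in parallel down to depth d, then fall
        -- back to sequential evaluation, bounding spark creation.
        parTreeDepth :: Int -> Strategy a -> Strategy (Tree a)
        parTreeDepth _ s (Leaf x) = Leaf <$> s x
        parTreeDepth d s (Node l r)
          | d <= 0    = Node <$> parTreeDepth 0 s l
                             <*> parTreeDepth 0 s r
          | otherwise = Node <$> rparWith (parTreeDepth (d - 1) s) l
                             <*> rparWith (parTreeDepth (d - 1) s) r

        -- Usage: force every leaf to normal form, sparking subtrees in
        -- parallel for the top three levels only; the sequential sum
        -- is untouched, as with evaluation strategies generally.
        sumTree :: Tree Int -> Int
        sumTree t = go (t `using` parTreeDepth 3 rdeepseq)
          where
            go (Leaf x)   = x
            go (Node l r) = go l + go r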