Towards semi-automatic data-type translation for parallelism in Erlang
As part of our ongoing research programme into programmer-in-the-loop parallelisation, we are studying the problem of introducing alternative data structures to support parallelism. Automated support for data structure transformations makes it easier to produce the best parallelisation for a given program, or even to make parallelisation feasible. We use a refactoring approach to choose and introduce these transformations for specific algorithmic skeletons, structured forms of parallelism that capture common patterns of parallelism. Our approach integrates with the Wrangler refactoring tool for Erlang, and uses the advanced Skel [4] skeleton library for Erlang. This library has previously been shown to give good parallelisations for a number of applications, including a multi-agent system [1] where we have achieved speedups of up to 142.44 on a 61-core machine with 244 threads. We have investigated three widely-used Erlang data structures: lists, binary structures and ETS (Erlang Term Storage) tables. In general, we have found that ETS tables deliver the best parallel performance for the examples that we have considered. However, our results show that simple lists may deliver similar performance to ETS tables, and better performance than binary structures. This means that we cannot blindly choose a single optimisation to implement as part of the compilation process. Our approach also allows the use of new (possibly user-defined) data structures and other transformations in future, giving a high level of flexibility and generality.
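To make the skeleton idea concrete, here is a hedged sketch in Haskell rather than Erlang (Skel and ETS are Erlang-specific, and farm is our own illustrative name, not Skel's API): a list-based task farm in which swapping the worker, the evaluation strategy, or the underlying structure is the kind of local, mechanical change a refactoring tool can automate.

    import Control.DeepSeq (NFData)
    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- A task farm over a plain list: apply the worker function to every
    -- element, evaluating the results in parallel.
    farm :: NFData b => (a -> b) -> [a] -> [b]
    farm worker = parMap rdeepseq worker

    main :: IO ()
    main = print (sum (farm (\x -> x * x) [1 .. 10000 :: Int]))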
Refactoring for introducing and tuning parallelism for heterogeneous multicore machines in Erlang
This research has been generously supported by the European Union Framework 7 ParaPhrase project (IST-288570); the EU Horizon 2020 projects RePhrase (H2020-ICT-2014-1, agreement number 644235) and TeamPlay (H2020-ICT-2017-1, agreement number 779882); the EPSRC grant Discovery (EP/P020631/1); EU COST Action IC1202: Timing Analysis On Code-Level (TACLe); and a travel grant from EU HiPEAC. This paper presents semi-automatic software refactorings to introduce and tune structured parallelism in sequential Erlang code, as well as to generate code for running computations on GPUs and possibly other accelerators. Our refactorings are based on the Lapedo framework for programming heterogeneous multi-core systems in Erlang. Lapedo is based on the PaRTE refactoring tool and also contains (1) a set of hybrid skeletons that target both CPU and GPU processors, (2) novel refactorings for introducing and tuning parallelism, and (3) a tool to generate the GPU offloading and scheduling code in Erlang, which is used as a component of hybrid skeletons. We demonstrate, on four realistic use-case applications, that we are able to refactor sequential code and produce heterogeneous parallel versions that can achieve significant and scalable speedups of up to 220 over the original sequential Erlang program on a 24-core machine with a GPU.
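The hybrid-skeleton idea of dividing work between CPU and accelerator by a tunable ratio can be sketched as follows, in Haskell for consistency with the other examples; offload is a stand-in for Lapedo's generated GPU code, which is Erlang-specific.

    import Control.DeepSeq (NFData, force)
    import Control.Parallel.Strategies (runEval, rpar, rseq)

    -- Stand-in for an accelerator call; in Lapedo this would be generated
    -- GPU offloading code, not an ordinary host function.
    offload :: (a -> b) -> [a] -> [b]
    offload f = map f

    -- Split the input by a tunable ratio, evaluate the CPU share in
    -- parallel with the (simulated) accelerator share, then join.
    hybridFarm :: NFData b => Double -> (a -> b) -> [a] -> [b]
    hybridFarm ratio f xs = runEval $ do
      let n = round (ratio * fromIntegral (length xs))
          (cpuXs, accXs) = splitAt n xs
      cpu <- rpar (force (map f cpuXs))
      acc <- rseq (force (offload f accXs))
      _   <- rseq cpu
      return (cpu ++ acc)

Tuning then amounts to searching for the ratio that balances the two back ends, which is precisely what the paper's refactorings automate.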
Language run-time systems: An overview
The proliferation of high-level programming languages with advanced language features and the need for portability across increasingly heterogeneous and hierarchical architectures require a sophisticated run-time system to manage program execution and available resources. Additional benefits include isolated execution of untrusted code and the potential for dynamic optimisation, among others. This paper provides a high-level overview of language run-time systems with a focus on execution models, support for concurrency and parallelism, memory management, and communication, whilst briefly mentioning synchronisation, monitoring, and adaptive policy control. Two alternative approaches to run-time system design are presented and several challenges for future research are outlined. References to both seminal and recent work are provided.
In search of a map: using program slicing to discover potential parallelism in recursive functions
Funding: EU FP7 grant "Parallel Patterns for Adaptive Heterogeneous Multicore Systems" (ICT-288570), the EU H2020 grant "RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications – a Software Engineering Approach" (ICT-644235), COST Action IC1202 ("Timing Analysis on Code-Level"), the EPSRC grant "Discovery: Pattern Discovery and Program Shaping for Manycore Systems" (EP/P020631/1), and Scottish Enterprise grant PS7305CA44. Recursion schemes, such as the well-known map, can be used as loci of potential parallelism, where schemes are replaced with an equivalent parallel implementation. This paper formalises a novel technique, using program slicing, that automatically and statically identifies computations in recursive functions that can be lifted out of the function and then potentially performed in parallel. We define a new program slicing algorithm, build a prototype implementation, and demonstrate its use on 12 Haskell examples, including benchmarks from the NoFib suite and functions from the standard Haskell Prelude. In all cases, we obtain the expected results in terms of finding potential parallelism. Moreover, we have tested our prototype against synthetic benchmarks, and found that it has quadratic time complexity. For the NoFib benchmark examples we demonstrate that relative parallel speedups can be obtained (up to 32.93x the sequential performance on 56 hyperthreaded cores).
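The kind of rewrite this analysis enables can be illustrated with a small Haskell sketch (our own example, not one of the paper's 12): an explicitly recursive function whose per-element computation is independent of the traversal is equivalent to a map, and once in map form a parallel implementation can be substituted.

    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- An explicitly recursive function whose per-element computation is
    -- independent of the rest of the traversal...
    scale :: Int -> [Int] -> [Int]
    scale _ []       = []
    scale k (x : xs) = k * x : scale k xs

    -- ...is equivalent to a map, the locus of potential parallelism:
    scaleAsMap :: Int -> [Int] -> [Int]
    scaleAsMap k = map (k *)

    -- Once in map form, a parallel implementation can be substituted:
    scalePar :: Int -> [Int] -> [Int]
    scalePar k = parMap rdeepseq (k *)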
Pattern discovery for parallelism in functional languages
No longer the preserve of specialist hardware, parallel devices are now ubiquitous. Pattern-based approaches to parallelism, such as algorithmic skeletons, simplify traditional low-level approaches by presenting composable high-level patterns of parallelism to the programmer. This allows optimal parallel configurations to be derived automatically, and facilitates the use of different parallel architectures. Moreover, parallel patterns can be swapped in for sequential recursion schemes, thus simplifying their introduction. Unfortunately, there is no guarantee that recursion schemes are present in all functional programs. Automatic pattern discovery techniques can be used to discover recursion schemes. Current approaches are limited both by the range of analysable functions and by the range of discoverable patterns. In this thesis, we present an approach based on program slicing techniques that facilitates the analysis of a wider range of explicitly recursive functions. We then present an approach using anti-unification that expands the range of discoverable patterns. In particular, this approach is user-extensible; i.e. patterns developed by the programmer can be discovered without significant effort. We present prototype implementations of both approaches, and evaluate them on a range of examples, including five parallel benchmarks and functions from the Haskell Prelude. We achieve maximum speedups of 32.93x on our 28-core hyperthreaded experimental machine for our parallel benchmarks, demonstrating that our approaches can discover patterns that produce good parallel speedups. Together, the approaches presented in this thesis enable the discovery of more loci of potential parallelism in pure functional programs than is currently possible. This leads to more possibilities for parallelism, and so more possibilities to take advantage of the potential performance gains that heterogeneous parallel systems present.
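What anti-unification discovers can be pictured with a small, hedged Haskell example (ours, not the thesis's): two explicitly recursive functions generalise to a common pattern, here the standard foldr scheme rather than a user-defined one.

    -- Two explicitly recursive functions...
    sumList :: [Int] -> Int
    sumList []       = 0
    sumList (x : xs) = x + sumList xs

    anyTrue :: [Bool] -> Bool
    anyTrue []       = False
    anyTrue (x : xs) = x || anyTrue xs

    -- ...anti-unify to a common pattern: each is an instance of foldr,
    -- obtained by filling the two holes (base case, combining operator).
    sumList' :: [Int] -> Int
    sumList' = foldr (+) 0

    anyTrue' :: [Bool] -> Bool
    anyTrue' = foldr (||) False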
EasyFJP: Providing Hybrid Parallelism as a Concern for Divide and Conquer Java Applications
Because of the increasing availability of multi-core machines, clusters, Grids, and combinations of these, there is now plenty of computational power, but today's programmers are not fully prepared to exploit parallelism. In particular, Java has helped in handling the heterogeneity of such environments. However, there is a lot of ground to cover regarding facilities to easily and elegantly parallelize applications. One path to this end seems to be the synthesis of semi-automatic parallelism and Parallelism as a Concern (PaaC). The former allows users to be mostly unaware of parallel exploitation problems and at the same time manually optimize parallelized applications whenever necessary, while the latter allows applications to be separated from parallelism-related code. In this paper, we present EasyFJP, an approach that implicitly exploits parallelism in Java applications based on the fork-join synchronization pattern, a simple but effective abstraction for creating and coordinating parallel tasks. In addition, EasyFJP lets users explicitly optimize applications through policies, or user-provided rules to dynamically regulate task granularity. Finally, EasyFJP relies on PaaC by means of source code generation techniques to wire applications and parallel-specific code together. Experiments with real-world applications on an emulated Grid and a cluster show that EasyFJP delivers competitive performance compared to state-of-the-art Java parallel programming tools.
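Although EasyFJP targets Java, the fork-join pattern with a granularity policy is language-neutral; the hedged sketch below uses Haskell for consistency with the other examples, with an illustrative threshold standing in for EasyFJP's user-provided policy rules.

    import Control.Parallel (par, pseq)

    -- Fork-join divide and conquer with a granularity policy: below the
    -- threshold, recurse sequentially instead of forking. The threshold
    -- is an illustrative assumption; EasyFJP expresses such policies as
    -- user-provided rules over Java applications.
    parFib :: Int -> Int -> Integer
    parFib threshold n
      | n < 2         = fromIntegral n
      | n < threshold = parFib threshold (n - 1) + parFib threshold (n - 2)
      | otherwise     = left `par` (right `pseq` (left + right))
      where
        left  = parFib threshold (n - 1)
        right = parFib threshold (n - 2)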
Logic programming in the context of multiparadigm programming: the Oz experience
Oz is a multiparadigm language that supports logic programming as one of its major paradigms. A multiparadigm language is designed to support different programming paradigms (logic, functional, constraint, object-oriented, sequential, concurrent, etc.) with equal ease. This article has two goals: to give a tutorial of logic programming in Oz and to show how logic programming fits naturally into the wider context of multiparadigm programming. Our experience shows that there are two classes of problems, which we call algorithmic and search problems, for which logic programming can help formulate practical solutions. Algorithmic problems have known efficient algorithms. Search problems do not have known efficient algorithms but can be solved with search. The Oz support for logic programming targets these two problem classes specifically, using the concepts needed for each. This is in contrast to the Prolog approach, which targets both classes with one set of concepts, which results in less than optimal support for each class. To explain the essential difference between algorithmic and search programs, we define the Oz execution model. This model subsumes both concurrent logic programming (committed-choice-style) and search-based logic programming (Prolog-style). Instead of Horn clause syntax, Oz has a simple, fully compositional, higher-order syntax that accommodates the abilities of the language. We conclude with lessons learned from this work, a brief history of Oz, and many entry points into the Oz literature.
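The algorithmic/search distinction is language-independent; a rough analogue in Haskell (Oz's actual mechanisms, such as computation spaces and dataflow variables, have no direct counterpart here) contrasts a deterministic algorithm with a generate-and-test search expressed through the list monad.

    import Control.Monad (guard)

    -- Algorithmic problem: a known efficient algorithm, deterministic.
    gcd' :: Int -> Int -> Int
    gcd' a 0 = a
    gcd' a b = gcd' b (a `mod` b)

    -- Search problem: no known efficient algorithm; enumerate candidates
    -- and prune, using the list monad's nondeterminism.
    triples :: Int -> [(Int, Int, Int)]
    triples n = do
      a <- [1 .. n]
      b <- [a .. n]
      c <- [b .. n]
      guard (a * a + b * b == c * c)
      return (a, b, c)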
A heuristic-based approach to code-smell detection
Encapsulation and data hiding are central tenets of the object-oriented paradigm. Deciding what data and behaviour to form into a class, and where to draw the line between its public and private details, can make the difference between a class that is an understandable, flexible and reusable abstraction and one which is not. This decision is a difficult one and may easily result in poor encapsulation, which can then have serious implications for a number of system qualities. It is often hard to identify such encapsulation problems within large software systems until they cause a maintenance problem (which is usually too late), and attempting to perform such analysis manually can also be tedious and error-prone. Two of the common encapsulation problems that can arise as a consequence of this decomposition process are data classes and god classes. Typically, these two problems occur together: data classes are lacking in functionality that has typically been sucked into an over-complicated and domineering god class. This paper describes the architecture of a tool, developed as a plug-in for the Eclipse IDE, which automatically detects data and god classes. The technique has been evaluated in a controlled study on two large open source systems, comparing the tool's results to similar work by Marinescu, who employs a metrics-based approach to detecting such features. The study provides some valuable insights into the strengths and weaknesses of the two approaches.
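The flavour of such heuristics can be sketched as follows, assuming simplified metrics and illustrative thresholds (the paper's actual heuristics, and Marinescu's metrics, are more involved); Haskell is used for consistency with the other examples.

    -- Simplified per-class metrics; real detectors use richer data.
    data ClassMetrics = ClassMetrics
      { numMethods      :: Int  -- methods defined by the class
      , numAccessors    :: Int  -- getters and setters among them
      , weightedMethods :: Int  -- sum of per-method complexities
      }

    -- Data class: little behaviour beyond exposing its fields.
    isDataClass :: ClassMetrics -> Bool
    isDataClass m =
      numMethods m > 0
        && 2 * numAccessors m >= numMethods m
        && weightedMethods m <= 2 * numMethods m

    -- God class: large, complex, and domineering (thresholds assumed).
    isGodClass :: ClassMetrics -> Bool
    isGodClass m = numMethods m > 40 || weightedMethods m > 150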
Structured arrows: a type-based framework for structured parallelism
This thesis deals with the important problem of parallelising sequential code. Despite the importance of parallelism in modern computing, writing parallel software still relies on many low-level and often error-prone approaches. These low-level approaches can lead to serious execution problems such as deadlocks and race conditions. Due to the non-deterministic behaviour of most parallel programs, testing parallel software can be both tedious and time-consuming. A way of providing guarantees of correctness for parallel programs would therefore provide significant benefit. Moreover, even if we ignore the problem of correctness, achieving good speedups is not straightforward, since this generally involves rewriting a program to consider a (possibly large) number of alternative parallelisations.
This thesis argues that new languages and frameworks are needed. These languages and frameworks must not only support high-level parallel programming constructs, but must also provide predictable cost models for those constructs. Moreover, they need to be built around solid, well-understood theories that ensure that: (a) changes to the source code will not change the functional behaviour of a program, and (b) the speedup obtained by making the necessary changes is predictable. Algorithmic skeletons are parametric implementations of common patterns of parallelism that provide good abstractions for creating new high-level languages, and also support frameworks for parallel computing that satisfy these correctness and predictability requirements.
This thesis presents a new type-based framework, based on the connection between structured parallelism and structured patterns of recursion, that provides parallel structures as type abstractions that can be used to statically parallelise a program. Specifically, this thesis exploits hylomorphisms as a single, unifying construct to represent the functional behaviour of parallel programs, and to perform correct code rewritings between alternative parallel implementations, represented as algorithmic skeletons. This thesis also defines a mechanism for deriving cost models for parallel constructs from a queue-based operational semantics. In this way, we can provide strong static guarantees about the correctness of a parallel program, while simultaneously achieving predictable speedups. This work was supported by the University of St Andrews (School of Computer Science); by the EU FP7 grant "ParaPhrase: Parallel Patterns for Adaptive Heterogeneous Multicore Systems" (n. 288570); by the EU H2020 grant "RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications - a Software Engineering Approach" (ICT-644235); by COST Action IC1202 (TACLe), supported by COST (European Cooperation in Science and Technology); and by the EPSRC grant "Discovery: Pattern Discovery and Program Shaping for Manycore Systems" (EP/P020631/1).
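The hylomorphism at the heart of this framework has a standard textbook definition, reproduced below in Haskell as a hedged illustration (the thesis's typed framework itself is richer): unfold a call structure with a coalgebra, then consume it with an algebra.

    -- The textbook hylomorphism.
    hylo :: Functor f => (f b -> b) -> (a -> f a) -> a -> b
    hylo alg coalg = alg . fmap (hylo alg coalg) . coalg

    -- Example: factorial as a hylomorphism over a bespoke functor.
    data Step a = Done | More Int a

    instance Functor Step where
      fmap _ Done       = Done
      fmap f (More n x) = More n (f x)

    factorial :: Int -> Int
    factorial = hylo alg coalg
      where
        coalg 0 = Done              -- coalgebra: unfold the call structure
        coalg n = More n (n - 1)
        alg Done       = 1          -- algebra: consume it
        alg (More n r) = n * r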
Towards Implicit Parallel Programming for Systems
Multi-core processors require a program to be decomposable into independent parts that can execute in parallel in order to scale performance with the number of cores. But parallel programming is hard, especially when the program requires state, which many system programs use for optimization, for example a cache to reduce disk I/O. Most prevalent parallel programming models do not support a notion of state and require the programmer to synchronize state access manually, i.e., outside the realms of an associated optimizing compiler. This prevents the compiler from introducing parallelism automatically and requires the programmer to optimize the program manually.
In this dissertation, we propose a programming language/compiler co-design that provides a new programming model for implicit parallel programming with state, together with a compiler that can optimize the program for parallel execution.
We define the notion of a stateful function, along with composition and control structures for such functions. An example implementation of a highly scalable server shows that stateful functions integrate smoothly into existing programming language concepts, such as object-oriented programming and programming with structs. Our programming model is also highly practical and allows existing code bases to be adapted gradually. As a case study, we implemented a new data processing core for the Hadoop Map/Reduce system to overcome existing performance bottlenecks. Our lambda-calculus-based compiler automatically extracts parallelism without changing the program's semantics. We added further domain-specific semantics-preserving transformations that reduce I/O calls for microservice programs. The runtime format of a program is a dataflow graph that can be executed in parallel, performs concurrent I/O, and allows for non-blocking live updates.
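A minimal sketch of the stateful-function idea, assuming Haskell's State monad (from mtl) as a stand-in for the dissertation's own language construct: because the state, here a cache, is threaded through the type, every state access is visible to a compiler rather than hidden behind ad hoc synchronization.

    import           Control.Monad.State (State, gets, modify, runState)
    import qualified Data.Map            as Map

    type Cache = Map.Map FilePath String

    -- A stateful function: look up a "file" in the cache and fall back
    -- to a simulated expensive read on a miss, recording the result.
    cachedRead :: FilePath -> State Cache String
    cachedRead path = do
      hit <- gets (Map.lookup path)
      case hit of
        Just contents -> return contents
        Nothing -> do
          let contents = "<contents of " ++ path ++ ">"  -- simulated I/O
          modify (Map.insert path contents)
          return contents

    -- Usage: runState (cachedRead "/etc/hosts") Map.empty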