Search CORE

Stirling Online Research Repository (RIOXX)

Sheffield Hallam University Research Archive

Stirling Online Research Repository

Costing JIT Traces

Author: Maier Patrick
Morton John Magnus
Trinder Philip
Publication venue: School of Computing Science, University of Glasgow
Publication date
Field of study

Tracing JIT compilation generates units of compilation that are easy to analyse and are known to execute frequently. The AJITPar project aims to investigate whether the information in JIT traces can be used to make better scheduling decisions or perform code transformations to adapt the code for a specific parallel architecture. To achieve this goal, a cost model must be developed to estimate the execution time of an individual trace. This paper presents the design and implementation of a system for extracting JIT trace information from the Pycket JIT compiler. We define three increasingly parametric cost models for Pycket traces. We perform a search of the cost model parameter space using genetic algorithms to identify the best weightings for those parameters. We test the accuracy of these cost models for predicting the cost of individual traces on a set of loop-based micro-benchmarks. We also compare the accuracy of the cost models for predicting whole program execution time over the Pycket benchmark suite. Our results show that the weighted cost model using the weightings found from the genetic algorithm search has the best accuracy

Enlighten

Parallelisation of Haskell programs by refactoring

Author: Kozsik Tamás
Luksa Norbert
Publication venue
Publication date: 01/01/2018
Field of study

We propose a refactoring tool for the Haskell programming language, capable of introducing parallelism to the code with reduced effort from the programmer. Haskell has many ways to express concurrency and parallelism. Moreover, Eden, a dialect of Haskell supports a wide range of features for parallel and distributed computations. After comparing a number of possibilities we have found that the Eval Monad, the Par Monad and the Eden language provide similar parallel performance. Our tool is able to introduce parallelism by turning certain syntactic forms into the application of algorithmic skeletons, which are implemented with the Eval Monad, the Par Monad and the Eden language

University of Szeged

JIT-Based cost analysis for dynamic program transformations

Author: Abella
Albert
Albert
Albert
Aspinall
Autili
Aycock
Barthe
Bauman
Bolz
Bolz
Bozó
Brandolese
Brown
Cole
Dal Lago
Ferdinand
Hao
Hofmann
John Magnus Morton
Jones
Kersten
Lewis Steele
Magnus Morton
Magnus Morton
Maier
Maier
Pape
Patrick Maier
Peyton Jones
Phil Trinder
Rodrigues
Scaife
Spoonhower
Trinder
Wilhelm
Publication venue: 'Elsevier BV'
Publication date: 01/12/2016
Field of study

Tracing JIT compilation generates units of compilation that are easy to analyse and are known to execute frequently. The AJITPar project investigates whether the information in JIT traces can be used to dynamically transform programs for a specific parallel architecture. Hence a lightweight cost model is required for JIT traces. This paper presents the design and implementation of a system for extracting JIT trace information from the Pycket JIT compiler. We define three increasingly parametric cost models for Pycket traces. We determine the best weights for the cost model parameters using linear regression. We evaluate the effectiveness of the cost models for predicting the relative costs of transformed programs

Sheffield Hallam University Research Archive

Edinburgh Research Explorer

Enlighten

Programming Heterogeneous Parallel Machines Using Refactoring and Monte-Carlo Tree Search

Author: C Browne
DB Skillicorn
J Subhlok
K Ramamritham
M Aldinucci
M den Besten
Michael P Allen
RM Burstall
T Serban
Y-K Kwok
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/06/2020
Field of study

Funding: This work was supported by the EU Horizon 2020 project, TeamPlay, Grant Number 779882, and UK EPSRC Discovery, Grant Number EP/P020631/1.This paper presents a new technique for introducing and tuning parallelism for heterogeneous shared-memory systems (comprising a mixture of CPUs and GPUs), using a combination of algorithmic skeletons (such as farms and pipelines), Monte–Carlo tree search for deriving mappings of tasks to available hardware resources, and refactoring tool support for applying the patterns and mappings in an easy and effective way. Using our approach, we demonstrate easily obtainable, significant and scalable speedups on a number of case studies showing speedups of up to 41 over the sequential code on a 24-core machine with one GPU. We also demonstrate that the speedups obtained by mappings derived by the MCTS algorithm are within 5–15% of the best-obtained manual parallelisation.Publisher PDFPeer reviewe

Open Access Institutional Repository at Robert Gordon University

University of Dundee Online Publications

AGH (Akademia Górniczo-Hutnicza) University of Science and Technology: Journals

Parallel Patterns for Agent-based Evolutionary Computing

Author: Anielski Piotr
Byrski Aleksander
Kisiel-Dorohinicki Marek
Krzywicki Daniel
Mentel Szymon
Stypka Jan
Turek Wojciech
Publication venue: 'AGHU University of Science and Technology Press'
Publication date: 01/01/2016
Field of study

Computing applications such as metaheuristics-based optimization can greatly benefit from multi-core architectures available on modern supercomputers. In this paper, we describe an easy and efficient way to implement certain population-based algorithms (in the discussed case, multi-agent computing system) on such runtime environments. Our solution is based on an Erlang software library which implements dedicated parallel patterns. We provide technological details on our approach and discuss experimental results

Computer Science Journal (AGH University of Science and Technology, Krakow)

Biblioteka Nauki - repozytorium artykuÅÃ³w

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Static Analysis for Divide-and-Conquer Pattern Discovery

Author: Bozó István
Horváth Zoltán
Kozsik Tamás
Tóth Melinda
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 07/02/2017
Field of study

Routines implementing divide-and-conquer algorithms are good candidates for parallelization. Their identifying property is that such a routine divides its input into "smaller" chunks, calls itself recursively on these smaller chunks, and combines the outputs into one. We set up conditions which characterize a wide range of d&c routine definitions. These conditions can be verified by static program analysis. This way d&c routines can be found automatically in existing program texts, and their parallelization based on semi-automatic refactoring can be facilitated. We work out the details in the context of the Erlang programming language

Refactoring for introducing and tuning parallelism for heterogeneous multicore machines in Erlang

Author: Armstrong J
Brown C
Cole MI
Fields DK
Fowler M
Janjic V
Publication venue: 'Wiley'
Publication date: 24/06/2019
Field of study

This research has been generously supported by the European Union Framework 7 Para-Phrase project (IST-288570), EU Horizon 2020 projects RePhrase (H2020-ICT-2014-1), agreement number 644235; Teamplay (H2020-ICT 2017-1) agreement number 779882, and EPSRC Discovery, EP/P020631/1. EU COST Action IC1202: Timing Analysis On Code-Level (TACLe), and by a travel grant from EU HiPEAC.This paper presents semi‐automatic software refactorings to introduce and tune structured parallelism in sequential Erlang code, as well as to generate code for running computations on GPUs and possibly other accelerators. Our refactorings are based on the lapedo framework for programming heterogeneous multi‐core systems in Erlang. lapedo is based on the PaRTE refactoring tool and also contains (1) a set of hybrid skeletons that target both CPU and GPU processors, (2) novel refactorings for introducing and tuning parallelism, and (3) a tool to generate the GPU offloading and scheduling code in Erlang, which is used as a component of hybrid skeletons. We demonstrate, on four realistic use‐case applications, that we are able to refactor sequential code and produce heterogeneous parallel versions that can achieve significant and scalable speedups of up to 220 over the original sequential Erlang program on a 24‐core machine with a GPU.PostprintPeer reviewe

University of Dundee Online Publications

Automatically deriving cost models for structured parallel processes using hylomorphisms

Author: Alguwaifli Yasir
Castro David
Hammond Kevin
Sarkar Susmit
Publication venue: 'Elsevier BV'
Publication date: 01/02/2018
Field of study

This work has been partially supported by the EU Horizon 2020 grant “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications - a Software Engineering Approach” (ICT-644235), by COST Action IC1202 (TACLe), supported by COST (European Cooperation on Science and Technology), and by EPSRC grant EP/M027317/1 “C33: Scalable & Verified Shared Memory via Consistency-directed Cache Coherence”.Structured parallelism using nested algorithmic skeletons can greatly ease the task of writing parallel software, since common, but hard-to-debug, problems such as race conditions are eliminated by design. However, choosing the best combination of algorithmic skeletons to yield good parallel speedups for a specific program on a specific parallel architecture is still a difficult problem. This paper uses the unifying notion of hylomorphisms, a general recursion pattern, to make it possible to reason about both the functional correctness properties and the extra-functional timing properties of structured parallel programs. We have previously used hylomorphisms to provide a denotational semantics for skeletons, and proved that a given parallel structure for a program satisfies functional correctness. This paper expands on this theme, providing a simple operational semantics for algorithmic skeletons and a cost semantics that can be automatically derived from that operational semantics. We prove that both semantics are sound with respect to our previously defined denotational semantics. This means that we can now automatically and statically choose a provably optimal parallel structure for a given program with respect to a cost model for a (class of) parallel architecture. By deriving an automatic amortised analysis from our cost model, we can also accurately predict parallel runtimes and speedups.PostprintPeer reviewe

Finding parallel functional pearls : automatic parallel recursion scheme detection in Haskell functions via anti-unification

Author: Barwell Adam D.
Brown Christopher
Hammond Kevin
Publication venue
Publication date: 27/07/2017
Field of study

This work has been partially supported by the EU H2020 grant “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications–a Software Engineering Approach” (ICT-644235), by COST Action IC1202 (TACLe), supported by COST (European Cooperation in Science and Technology) , by EPSRC grant “Discovery: Pattern Discovery and Program Shaping for Manycore Systems” (EP/P020631/1), and by Scottish Enterprise PS7305CA44.This paper describes a new technique for identifying potentially parallelisable code structures in functional programs. Higher-order functions enable simple and easily understood abstractions that can be used to implement a variety of common recursion schemes, such as maps and folds over traversable data structures. Many of these recursion schemes have natural parallel implementations in the form of algorithmic skeletons. This paper presents a technique that detects instances of potentially parallelisable recursion schemes in Haskell 98 functions. Unusually, we exploit anti-unification to expose these recursion schemes from source-level definitions whose structures match a recursion scheme, but which are not necessarily written directly in terms of maps, folds, etc. This allows us to automatically introduce parallelism, without requiring the programmer to structure their code a priori in terms of specific higher-order functions. We have implemented our approach in the Haskell refactoring tool, HaRe, and demonstrated its use on a range of common benchmarking examples. Using our technique, we show that recursion schemes can be easily detected, that parallel implementations can be easily introduced, and that we can achieve real parallel speedups (up to 23 . 79 × the sequential performance on 28 physical cores, or 32 . 93 × the sequential performance with hyper-threading enabled).PostprintPeer reviewe

ZENODO