7 research outputs found

    Automatically deriving cost models for structured parallel processes using hylomorphisms

    This work has been partially supported by the EU Horizon 2020 grant “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications - a Software Engineering Approach” (ICT-644235), by COST Action IC1202 (TACLe), supported by COST (European Cooperation in Science and Technology), and by EPSRC grant EP/M027317/1 “C33: Scalable & Verified Shared Memory via Consistency-directed Cache Coherence”. Structured parallelism using nested algorithmic skeletons can greatly ease the task of writing parallel software, since common, but hard-to-debug, problems such as race conditions are eliminated by design. However, choosing the best combination of algorithmic skeletons to yield good parallel speedups for a specific program on a specific parallel architecture is still a difficult problem. This paper uses the unifying notion of hylomorphisms, a general recursion pattern, to make it possible to reason about both the functional correctness properties and the extra-functional timing properties of structured parallel programs. We have previously used hylomorphisms to provide a denotational semantics for skeletons, and proved that a given parallel structure for a program satisfies functional correctness. This paper expands on this theme, providing a simple operational semantics for algorithmic skeletons and a cost semantics that can be automatically derived from that operational semantics. We prove that both semantics are sound with respect to our previously defined denotational semantics. This means that we can now automatically and statically choose a provably optimal parallel structure for a given program with respect to a cost model for a (class of) parallel architecture. By deriving an automatic amortised analysis from our cost model, we can also accurately predict parallel runtimes and speedups. Postprint. Peer reviewed.
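
    As a rough illustration of the recursion pattern this abstract refers to, the sketch below writes a hylomorphism (an unfold followed by a fold) in Python and instantiates it as merge sort. It is not taken from the paper: the names `hylo`, `tree_fmap`, `split`, `merge` and `merge_sort` are hypothetical, and the tuple encoding of the tree-shaped layer is an assumption made for brevity.

```python
def hylo(algebra, coalgebra, fmap, seed):
    """Hylomorphism: unfold `seed` one layer with `coalgebra`, recurse into
    the sub-seeds via `fmap`, then collapse the layer with `algebra`."""
    recurse = lambda s: hylo(algebra, coalgebra, fmap, s)
    return algebra(fmap(recurse, coalgebra(seed)))


def tree_fmap(f, layer):
    """Apply `f` to the recursive positions of one tree-shaped layer.
    Assumed encoding: ("leaf", xs) or ("node", left, right)."""
    if layer[0] == "leaf":
        return layer
    _, left, right = layer
    # The two calls below are independent of each other; this is the point
    # where a divide-and-conquer skeleton could evaluate them in parallel.
    return ("node", f(left), f(right))


def split(xs):
    """Coalgebra: divide a list into two halves, or stop at length <= 1."""
    if len(xs) <= 1:
        return ("leaf", xs)
    mid = len(xs) // 2
    return ("node", xs[:mid], xs[mid:])


def merge(layer):
    """Algebra: combine two already-sorted lists into one sorted list."""
    if layer[0] == "leaf":
        return list(layer[1])
    _, left, right = layer
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


def merge_sort(xs):
    """Merge sort expressed as a single hylomorphism."""
    return hylo(merge, split, tree_fmap, list(xs))
```

    Calling `merge_sort([3, 1, 2])` goes through `split` on the way down and `merge` on the way up. Only `tree_fmap` decides how the sub-problems are evaluated, so in this sketch the functional behaviour stays separate from the evaluation strategy.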

    Structured arrows : a type-based framework for structured parallelism

    This thesis deals with the important problem of parallelising sequential code. Despite the importance of parallelism in modern computing, writing parallel software still relies on many low-level and often error-prone approaches. These low-level approaches can lead to serious execution problems such as deadlocks and race conditions. Due to the non-deterministic behaviour of most parallel programs, testing parallel software can be both tedious and time-consuming. A way of providing guarantees of correctness for parallel programs would therefore provide significant benefit. Moreover, even if we ignore the problem of correctness, achieving good speedups is not straightforward, since this generally involves rewriting a program to consider a (possibly large) number of alternative parallelisations. This thesis argues that new languages and frameworks are needed. These languages and frameworks must not only support high-level parallel programming constructs, but must also provide predictable cost models for these parallel constructs. Moreover, they need to be built around solid, well-understood theories that ensure that: (a) changes to the source code will not change the functional behaviour of a program, and (b) the speedup obtained by making the necessary changes is predictable. Algorithmic skeletons are parametric implementations of common patterns of parallelism that provide good abstractions for creating new high-level languages, and also support frameworks for parallel computing that satisfy these correctness and predictability requirements. This thesis presents a new type-based framework, based on the connection between structured parallelism and structured patterns of recursion, that provides parallel structures as type abstractions that can be used to statically parallelise a program.
    Specifically, this thesis exploits hylomorphisms as a single, unifying construct to represent the functional behaviour of parallel programs, and to perform correct code rewritings between alternative parallel implementations, represented as algorithmic skeletons. This thesis also defines a mechanism for deriving cost models for parallel constructs from a queue-based operational semantics. In this way, we can provide strong static guarantees about the correctness of a parallel program, while simultaneously achieving predictable speedups. Acknowledgement: “This work was supported by the University of St Andrews (School of Computer Science); by the EU FP7 grant “ParaPhrase: Parallel Patterns Adaptive Heterogeneous Multicore Systems” (n. 288570); by the EU H2020 grant “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications - a Software Engineering Approach” (ICT-644235), by COST Action IC1202 (TACLe), supported by COST (European Cooperation in Science and Technology); and by EPSRC grant “Discovery: Pattern Discovery and Program Shaping for Manycore Systems” (EP/P020631/1)”

    Pattern discovery for parallelism in functional languages

    No longer the preserve of specialist hardware, parallel devices are now ubiquitous. Pattern-based approaches to parallelism, such as algorithmic skeletons, simplify traditional low-level approaches by presenting composable high-level patterns of parallelism to the programmer. This allows optimal parallel configurations to be derived automatically, and facilitates the use of different parallel architectures. Moreover, parallel patterns can be swapped in for sequential recursion schemes, thus simplifying their introduction. Unfortunately, there is no guarantee that recursion schemes are present in all functional programs. Automatic pattern discovery techniques can be used to discover recursion schemes. Current approaches are limited by both the range of analysable functions, and by the range of discoverable patterns. In this thesis, we present an approach based on program slicing techniques that facilitates the analysis of a wider range of explicitly recursive functions. We then present an approach using anti-unification that expands the range of discoverable patterns. In particular, this approach is user-extensible; i.e. patterns developed by the programmer can be discovered without significant effort. We present prototype implementations of both approaches, and evaluate them on a range of examples, including five parallel benchmarks and functions from the Haskell Prelude. We achieve maximum speedups of 32.93x on our 28-core hyperthreaded experimental machine for our parallel benchmarks, demonstrating that our approaches can discover patterns that produce good parallel speedups. Together, the approaches presented in this thesis enable the discovery of more loci of potential parallelism in pure functional programs than currently possible. This leads to more possibilities for parallelism, and so more possibilities to take advantage of the potential performance gains that heterogeneous parallel systems present.

    Optimal program variant generation for hybrid manycore systems

    Field Programmable Gate Arrays promise to deliver superior energy efficiency in heterogeneous high performance computing, as compared to multicore CPUs and GPUs. The rate of adoption is however hampered by the relative difficulty of programming FPGAs. High-level synthesis tools such as Xilinx Vivado, Altera OpenCL or Intel's HLS address a large part of the programmability issue by synthesizing a Hardware Description Language representation from a high-level specification of the application, given in programming languages such as OpenCL C, typically used to program CPUs and GPUs. Although HLS solutions make programming easier, they fail to also lighten the burden of optimization. Application developers must rely on expert knowledge to manually optimize their applications for each target device, meaning that traditional HLS solutions do not offer a solution to the issue of performance portability. This state of affairs prompted the development of compiler frameworks such as TyTra that operate at an even higher level of abstraction that is amenable to the use of Design Space Exploration (DSE). With DSE the initial program specification can be seen as the starting location in a search-space of correct-by-construction program transformations. In TyTra the search-space is generated from the transitive closure of term-level transformations derived from type-level transformations. Compiler frameworks such as TyTra theoretically solve the issue of performance portability by providing a way to automatically generate alternative correct program variants. They however suffer from the very practical issue that the generated space is often too large to fully explore. As a consequence, the globally optimal solution may be overlooked. In this work we provide a novel solution to the issue of performance portability by deriving an efficient yet effective DSE strategy for the TyTra compiler framework.
    We make use of categorical data types to derive categorical semantics for the formal languages that describe the terms, types, cost-performance estimates and their transformations. From these we define a category of interpretations for TyTra applications, from which we derive a DSE strategy that finds the globally optimal transformation sequence in polynomial time. This is achieved by reducing the size of the generated search space. We formally state and prove a theorem for this claim and then show that the polynomial run-time for our DSE strategy has practically negligible coefficients, leading to sub-second exploration times for realistic applications.

    Finding parallel functional pearls : automatic parallel recursion scheme detection in Haskell functions via anti-unification

    This work has been partially supported by the EU H2020 grant “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications - a Software Engineering Approach” (ICT-644235), by COST Action IC1202 (TACLe), supported by COST (European Cooperation in Science and Technology), by EPSRC grant “Discovery: Pattern Discovery and Program Shaping for Manycore Systems” (EP/P020631/1), and by Scottish Enterprise PS7305CA44. This paper describes a new technique for identifying potentially parallelisable code structures in functional programs. Higher-order functions enable simple and easily understood abstractions that can be used to implement a variety of common recursion schemes, such as maps and folds over traversable data structures. Many of these recursion schemes have natural parallel implementations in the form of algorithmic skeletons. This paper presents a technique that detects instances of potentially parallelisable recursion schemes in Haskell 98 functions. Unusually, we exploit anti-unification to expose these recursion schemes from source-level definitions whose structures match a recursion scheme, but which are not necessarily written directly in terms of maps, folds, etc. This allows us to automatically introduce parallelism, without requiring the programmer to structure their code a priori in terms of specific higher-order functions. We have implemented our approach in the Haskell refactoring tool, HaRe, and demonstrated its use on a range of common benchmarking examples. Using our technique, we show that recursion schemes can be easily detected, that parallel implementations can be easily introduced, and that we can achieve real parallel speedups (up to 23.79× the sequential performance on 28 physical cores, or 32.93× the sequential performance with hyper-threading enabled). Postprint. Peer reviewed.
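
    To give a flavour of the anti-unification step mentioned in this abstract, here is a minimal first-order sketch in Python. It is not the HaRe implementation and makes no claim about the paper's actual algorithm: terms are assumed to be nested tuples `(head, arg1, ...)` with strings as atoms, and the names `anti_unify`, `generalise` and the `?x0`, `?x1` variable scheme are hypothetical.

```python
def anti_unify(t1, t2, subst):
    """Return a term that generalises both t1 and t2.

    Subterms that differ are replaced by variables; `subst` maps each
    mismatching pair (s1, s2) to its variable so that the same mismatch
    is always given the same variable.
    """
    if t1 == t2:
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and len(t1) == len(t2) and t1[0] == t2[0]):
        # Same head symbol and arity: generalise argument by argument.
        args = tuple(anti_unify(a, b, subst) for a, b in zip(t1[1:], t2[1:]))
        return (t1[0],) + args
    key = (t1, t2)
    if key not in subst:
        subst[key] = f"?x{len(subst)}"
    return subst[key]


def generalise(t1, t2):
    """Convenience wrapper returning (generalised term, substitution map)."""
    subst = {}
    return anti_unify(t1, t2, subst), subst


# Bodies of two explicitly recursive list functions (assumed encoding):
#   incAll (x:xs) = inc x : incAll xs
#   dblAll (x:xs) = dbl x : dblAll xs
inc_body = ("cons", ("app", "inc", "x"), ("app", "incAll", "xs"))
dbl_body = ("cons", ("app", "dbl", "x"), ("app", "dblAll", "xs"))
pattern, holes = generalise(inc_body, dbl_body)
```

    In this sketch `pattern` keeps the shared `cons`/`app` structure and replaces the two points of difference with variables. That shared shape is the kind of structure one would then compare against a map-like recursion scheme.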

    AutoPar: automating the parallelization of functional programs

    As the pervasiveness of parallel architectures in computing increases, so does the need for efficiently implemented parallel software. However, the development of parallel software is inherently more difficult than that of sequential software and is fraught with many pitfalls, such as race conditions and locking issues, amongst others. Developers are typically more comfortable developing sequentially, yet as the limitations of single-core processor speeds are reached, they have no choice but to reach for parallel implementations to obtain the required performance increases. An obvious solution to the parallelisation problem is to allow developers to continue to develop sequentially and generate efficient parallel programs automatically from these sequential ones. There are many existing techniques which automate the parallelisation process; however, these techniques place many constraints upon the programs they are applicable to. This thesis defines a fully automatic parallelisation technique which places no restriction on its input programs and is applicable to programs defined using any data-type. The technique consists of two components: the first allows a given program to be redefined in terms of well-partitioned data. The second then explicitly parallelises the resulting program using Glasgow parallel Haskell. The technique is applied to several Haskell programs, the results of which have then been benchmarked with respect to the performance of hand-parallelised versions of the original programs. The benchmarking process has recorded the execution time and parallel performance of each benchmark program. The evaluation of the benchmark results has allowed for the merit of the automated parallelisation technique to be shown.

    Symmetric Edit Lenses: A New Foundation for Bidirectional Languages

    Lenses are bidirectional transformations between pairs of connected structures capable of translating an edit on one structure into an edit on the other. Most of the extensive existing work on lenses has focused on the special case of asymmetric lenses, where one structure is taken as primary and the other is thought of as a projection or view. Some symmetric variants exist, where each structure contains information not present in the other, but these all lack the basic operation of composition. Additionally, existing accounts do not represent edits carefully, making incremental operation difficult or producing unsatisfactory synchronization candidates. We present a new symmetric formulation which works with descriptions of changes to structures, rather than with the structures themselves. We construct a semantic space of edit lenses between “editable structures” (monoids of edits with a partial monoid action for applying edits) with natural laws governing their behavior. We present generalizations of a number of known constructions on asymmetric lenses and settle some longstanding questions about their properties; in particular, we prove the existence of (symmetric monoidal) tensor products and sums and the non-existence of full categorical products and sums in a category of lenses. Universal algebra shows how to build iterator lenses for structured data such as lists and trees, yielding lenses for operations like mapping, filtering, and concatenation from first principles. More generally, we provide mapping combinators based on the theory of containers. Finally, we present a prototype implementation of the core theory and take a first step in addressing the challenge of translating between user gestures and the internal representation of edits.