
    An abstract machine for parallel graph reduction

    An abstract machine for parallel graph reduction on a shared-memory multiprocessor is described, intended primarily for normal-order (lazy) evaluation of functional programs. It is essential in such a design to adapt an efficient sequential model, since when execution proceeds under limited resources, performance reduces in the limit to that of the sequential engine. Naive parallel evaluation of normal-order functional languages can result in poor overall performance despite sufficient processing elements and ample parallelism in the application: needless context switching, task migration and continuation building may occur where a sequential thread of control would have sufficed. Furthermore, a compiler working from static information cannot know the availability of resources, or their optimal utilization, at any moment at run time; indeed, this may vary between runs, which further complicates the compiler writer's job of generating optimal and compact code. The benefits of this model are: 1) it is based on the G-machine, so that execution under limited resources defaults to performance close to that of the G-machine; 2) the additional instructions needed to control the complexities of parallel evaluation are extremely simple, almost trivializing the job of the compiler writer; 3) where possible, context switching and task migration are avoided by retaining a sequential thread of control (made clearer in the paper); and 4) the method has demonstrated good overall performance on a shared-memory multiprocessor.
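
    The G-machine itself compiles to stack code, but the behaviour it implements is updating graph reduction: an application node is overwritten with its result so that shared redexes are reduced at most once. The Haskell sketch below illustrates only that essence; the node layout, the `whnf` routine and the example are my illustration, not the paper's machine.

```haskell
import Data.IORef

-- Minimal heap-node model of updating graph reduction: after an
-- application node is evaluated, it is overwritten with an
-- indirection to its result, so later demands on a shared node
-- cost O(1) instead of re-reducing the redex.
data Node = App (IORef Node) (IORef Node)
          | Fun String (IORef Node -> IO (IORef Node))
          | IntLit Int
          | Ind (IORef Node)               -- indirection left by an update

-- Reduce a node to weak head normal form, updating as we go.
whnf :: IORef Node -> IO (IORef Node)
whnf ref = do
  node <- readIORef ref
  case node of
    Ind r   -> whnf r
    App f a -> do
      fr <- whnf f
      Fun _ body <- readIORef fr           -- sketch: assume a function arrives
      res <- body a >>= whnf
      writeIORef ref (Ind res)             -- the update that enables sharing
      pure res
    _ -> pure ref                          -- Fun, IntLit: already in WHNF

main :: IO ()
main = do
  identity <- newIORef (Fun "I" pure)
  arg      <- newIORef (IntLit 42)
  root     <- newIORef (App identity arg)
  result   <- whnf root >>= readIORef
  case result of
    IntLit n -> print n                    -- prints 42
    _        -> putStrLn "stuck"
```

    A parallel version of such a machine lets several workers call the reducer on different nodes; the paper's contribution is keeping that coordination cheap enough that, with few workers, behaviour collapses back to the sequential reducer.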

    SASLOG : Lazy Evaluation Meets Backtracking

    We describe SASLOG, a combined functional/logic programming language that contains Turner's SASL, a fully lazy, higher-order functional language, and pure Prolog as subsets. Our integration is symmetric: functional terms can appear in the logic part of the program and vice versa. Exploiting the natural correspondence between backtracking and lazy streams yields an elegant solution to the problem of transferring alternative variable bindings to the calling functional part of the program. We replace the rewriting approach to function evaluation with combinator graph reduction, thereby regaining computational efficiency and structure-sharing properties. Our solution is equally well suited to a fixed combinator set and to a supercombinator implementation; in the paper we use Turner's fixed combinator set.
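
    The correspondence the abstract exploits is easy to see in any lazy language: a Prolog-style predicate becomes a function returning a lazy stream of solutions, and backtracking becomes the consumer demanding the next element. The sketch below is illustrative Haskell, not SASLOG; `pythag` is a hypothetical example predicate.

```haskell
-- Backtracking as a lazy stream of answers: each generator is a choice
-- point, a failed guard backtracks to the next alternative, and the
-- functional caller forces only as many solutions as it needs.
pythag :: Int -> [(Int, Int, Int)]
pythag n =
  [ (a, b, c)
  | a <- [1 .. n]
  , b <- [a .. n]
  , c <- [b .. n]
  , a * a + b * b == c * c      -- failure here resumes the enclosing choices
  ]

main :: IO ()
main = print (take 3 (pythag 100))  -- demand-driven: only 3 answers computed
```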

    Parallel programming using functional languages

    It has been argued for many years that functional programs are well suited to parallel evaluation. This thesis investigates this claim from a programming perspective; that is, it investigates parallel programming using functional languages. The approach taken has been to determine the minimum programming effort necessary to write efficient parallel programs, without the aid of clever compile-time analyses. It is argued that parallel evaluation should be expressed explicitly, by the programmer, in programs. To achieve this, a lazy functional language is extended with parallel and sequential combinators.

    The mathematical nature of functional languages means that programs can be formally derived by program transformation. To date, most work on program derivation has concerned sequential programs. In this thesis Squigol, a functional calculus for program derivation which is becoming increasingly popular, is used to derive three parallel algorithms. It is shown that some aspects of Squigol are suitable for parallel program derivation, while other aspects are specifically orientated towards sequential algorithm derivation.

    To write efficient parallel programs, parallelism must be controlled: to limit storage usage, the number of tasks, and the minimum size of tasks. In particular, over-eager evaluation or generating excessive numbers of tasks can consume too much storage, and tasks can be too small to be worth evaluating in parallel. Several programming techniques for parallelism control were tried and compared with a run-time system heuristic for parallelism control. It was discovered that the best control was effected by a combination of run-time system and programmer control of parallelism.

    One of the problems with parallel programming using functional languages is that non-deterministic algorithms cannot be expressed. A bag (multiset) data type is proposed to allow a limited form of non-determinism to be expressed. Bags can be given a non-deterministic parallel implementation; however, provided the operations used to combine bag elements are associative and commutative, the result of bag operations will be deterministic. The onus is on the programmer to prove this, but usually this is not difficult. Also, bags' insensitivity to ordering means that more transformations are directly applicable than if, say, lists were used instead.

    It is necessary to be able to reason about and measure the performance of parallel programs; for example, algorithms which seem intuitively to be good parallel ones sometimes are not. For some higher-order functions it is possible to devise parameterised formulae describing their performance. This is done for divide-and-conquer functions, enabling constraints to be formulated which guarantee that they perform well. Pipelined parallelism is difficult to analyse, so a formal semantics for calculating the performance of pipelined programs is devised; this is used to analyse the performance of a pipelined Quicksort. By treating the performance semantics as a set of transformation rules, parallel programs may be simulated by transforming them. Some parallel programs perform poorly due to programming errors; a pragmatic method of debugging such errors is illustrated by some examples.
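
    In GHC today the thesis's parallel and sequential combinators survive as `par` and `pseq` from the `parallel` package; the sketch below (my illustration, not the thesis's code) shows both the explicit annotations and the granularity control the abstract argues for: a threshold keeps tasks from becoming too small to be worth sparking.

```haskell
import Control.Parallel (par, pseq)   -- requires the "parallel" package

-- Divide-and-conquer sum with programmer-controlled parallelism.
-- `par` sparks the left half for possible parallel evaluation;
-- `pseq` forces the right half in the current thread first, so the
-- spark has a chance to be picked up by another core. If no core is
-- free, the spark is dropped and evaluation proceeds sequentially.
psum :: Int -> [Int] -> Int
psum threshold xs
  | length xs <= threshold = sum xs   -- small tasks stay sequential
  | otherwise = left `par` (right `pseq` (left + right))
  where
    (as, bs) = splitAt (length xs `div` 2) xs
    left  = psum threshold as
    right = psum threshold bs

main :: IO ()
main = print (psum 512 [1 .. 100000])
-- build with: ghc -threaded; run with: +RTS -N
```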

    Parallel visual data restoration on multi-GPGPUs using stencil-reduce pattern

    In this paper, a highly effective parallel filter for visual data restoration is presented. The filter is designed following a skeletal approach, using the newly proposed stencil-reduce pattern, and has been implemented with the FastFlow parallel programming library. As a result of its high-level design, the filter runs seamlessly on a multicore machine, on multi-GPGPUs, or on both. The design and implementation of the filter are discussed, and an experimental evaluation is presented.
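
    FastFlow is a C++ library and the paper's filter targets GPUs, but the pattern itself has a simple sequential specification: recompute every element from a neighbourhood window, then reduce the new array, e.g. to test convergence of an iterative restoration. The Haskell sketch below states only that semantics; the names are illustrative, and it uses the `vector` package.

```haskell
import qualified Data.Vector as V

-- stencil-reduce on a 1-D array: a stencil maps each index to a new
-- value computed from its neighbourhood; a fold then summarises the
-- freshly computed array (e.g. a residual for a convergence test).
stencilReduce
  :: (V.Vector a -> Int -> a)     -- stencil: new value from neighbourhood
  -> (b -> a -> b) -> b           -- reduction over the new array
  -> V.Vector a
  -> (V.Vector a, b)
stencilReduce stencil step z img =
  let img' = V.generate (V.length img) (stencil img)
  in  (img', V.foldl' step z img')

-- Example stencil: 3-point smoothing with clamped borders.
smooth :: V.Vector Double -> Int -> Double
smooth v i = (at (i - 1) + at i + at (i + 1)) / 3
  where at j = v V.! max 0 (min (V.length v - 1) j)

main :: IO ()
main = print (stencilReduce smooth (\acc x -> acc + abs x) 0
                (V.fromList [1, 4, 2, 8, 5]))
```

    Parallel versions partition the array across workers or GPU threads; only the halo (border) elements of each partition need exchanging between iterations, which is what makes the pattern a good fit for multi-GPGPU execution.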

    Computation in director string calculus

    In this thesis we introduce a modified version of the Director String Calculus (MDSC) which preserves the applicative structure of the original lambda terms and captures strong reduction, as opposed to the weak reduction of the original Director String Calculus (DSC). Furthermore, our reduction system provides an environment which supports the non-atomic nature of the substitution operation and hence lends itself to parallel and optimal reduction. We compare our reduction method with other reduction methods, and discuss some of the advantages and disadvantages of our method.
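
    For readers unfamiliar with director strings: each application node carries a string of "directors" saying where a substituted argument must be routed: into the function child, the argument child, both, or neither. The Haskell sketch below shows only that routing idea; it is not MDSC, and the single `Var` placeholder is a deliberate simplification.

```haskell
-- One director per pending substitution, consumed as the argument
-- passes through the application node.
data Director = L | R | B | N deriving Show   -- left, right, both, none

data Term = Var                    -- the variable the directors route to
          | Const Int
          | App [Director] Term Term
          deriving Show

-- Route one argument into a term, guided by the leading director.
subst :: Term -> Term -> Term
subst arg Var       = arg
subst _   (Const n) = Const n
subst arg (App (d : ds) f a) = case d of
  L -> App ds (subst arg f) a
  R -> App ds f (subst arg a)
  B -> App ds (subst arg f) (subst arg a)
  N -> App ds f a
subst _ t = t

main :: IO ()
main = print (subst (Const 7) (App [B] Var Var))
-- App [] (Const 7) (Const 7): the argument was routed to both children
```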

    Stream Fusion, to Completeness

    Stream processing is mainstream (again): widely-used stream libraries are now available for virtually all modern OO and functional languages, from Java to C# to Scala to OCaml to Haskell. Yet expressivity and performance are still lacking. For instance, the popular, well-optimized Java 8 streams do not support the zip operator and are still an order of magnitude slower than hand-written loops. We present the first approach that represents the full generality of stream processing and eliminates overheads, via the use of staging. It is based on an unusually rich semantic model of stream interaction. We support any combination of zipping, nesting (or flat-mapping), sub-ranging, filtering, and mapping of finite or infinite streams. Our model captures idiosyncrasies that a programmer uses in optimizing stream pipelines, such as rate differences and the choice of a "for" versus a "while" loop. Our approach delivers hand-written-like code, but automatically; it explicitly avoids reliance on black-box optimizers and sufficiently-smart compilers, offering the highest guaranteed and portable performance. Our approach relies on high-level concepts that are readily mapped into an implementation. Accordingly, we have two distinct implementations: an OCaml stream library, staged via MetaOCaml, and a Scala library for the JVM, staged via LMS. In both cases, we derive libraries richer, and simultaneously many tens of times faster, than past work. We greatly exceed in performance the standard stream libraries available in Java, Scala and OCaml, including the well-optimized Java 8 streams.
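
    The unstaged core of such libraries is the stream-as-state-machine representation from earlier stream-fusion work: a state plus a step function yielding Done, Skip, or Yield. The paper's contribution is staging this kind of model (via MetaOCaml and LMS) so that whole pipelines compile to single loops; the Haskell sketch below shows only the unstaged skeleton.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A stream is an existentially hidden state plus a step function.
data Step s a = Done | Skip s | Yield a s
data Stream a = forall s. Stream (s -> Step s a) s

fromList :: [a] -> Stream a
fromList = Stream step
  where step []       = Done
        step (x : xs) = Yield x xs

smap :: (a -> b) -> Stream a -> Stream b
smap f (Stream step s0) = Stream step' s0
  where step' s = case step s of
          Done       -> Done
          Skip s'    -> Skip s'
          Yield a s' -> Yield (f a) s'

sfilter :: (a -> Bool) -> Stream a -> Stream a
sfilter p (Stream step s0) = Stream step' s0
  where step' s = case step s of
          Done                   -> Done
          Skip s'                -> Skip s'
          Yield a s' | p a       -> Yield a s'
                     | otherwise -> Skip s'   -- rejected: no output this step

sfoldl :: (b -> a -> b) -> b -> Stream a -> b
sfoldl f z (Stream step s0) = go z s0
  where go acc s = case step s of
          Done       -> acc
          Skip s'    -> go acc s'
          Yield a s' -> go (f acc a) s'

main :: IO ()
main = print (sfoldl (+) 0 (smap (* 2) (sfilter even (fromList [1 .. 10]))))
-- 60: the pipeline runs as one loop over the composed step functions
```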

    CPL: A Core Language for Cloud Computing -- Technical Report

    Running distributed applications in the cloud involves deployment, that is, the distribution and configuration of application services and middleware infrastructure. The considerable complexity of these tasks has resulted in the emergence of declarative, JSON-based, domain-specific deployment languages for developing deployment programs. However, existing deployment programs unsafely compose artifacts written in different languages, leading to bugs that are hard to detect before run time. Furthermore, deployment languages do not provide extension points for custom implementations of existing cloud services, such as application-specific load-balancing policies. To address these shortcomings, we propose CPL (Cloud Platform Language), a statically-typed core language for programming both distributed applications and their deployment on a cloud platform. In CPL, application services and deployment programs interact through statically typed, extensible interfaces, and an application can trigger further deployment at run time. We provide a formal semantics of CPL and demonstrate that it enables type-safe, composable and extensible libraries of service combinators, such as load balancing and fault tolerance. (Technical report accompanying the MODULARITY '16 submission.)
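
    To make "statically typed, composable service combinators" concrete, here is an illustrative Haskell sketch (CPL is its own core calculus, not Haskell; the names are hypothetical): a service is typed by its request and response, and a load balancer is an ordinary combinator over typed replicas rather than untyped JSON glue.

```haskell
import Data.IORef

-- A service typed by its request and response.
newtype Service req res = Service { call :: req -> IO res }

-- Round-robin load balancing as a typed, reusable combinator.
balance :: [Service req res] -> IO (Service req res)
balance replicas = do
  next <- newIORef 0
  pure $ Service $ \req -> do
    i <- atomicModifyIORef' next (\n -> ((n + 1) `mod` length replicas, n))
    call (replicas !! i) req

main :: IO ()
main = do
  let echo tag = Service (\msg -> pure (tag ++ ": " ++ msg))
  svc <- balance [echo "replica-1", echo "replica-2"]
  mapM_ (\m -> call svc m >>= putStrLn) ["a", "b", "c"]
-- replica-1: a / replica-2: b / replica-1: c
```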

    Concurrency Analysis in Javascript Programs Using Arrows

    Concurrency errors are difficult to detect and correct in asynchronous programs such as those implemented in JavaScript. One reason is that it is often difficult to keep track of which parts of the program may execute in parallel and potentially share resources in unexpected, and perhaps unintended, ways. While programming constructs such as promises can improve the readability of asynchronous JavaScript programs traditionally written using callbacks, there are no static tools to identify asynchronous functions that run in parallel and may therefore cause concurrency errors. In this work, we present a solution for implementing JavaScript programs using a library based on the abstraction of arrows. We enhance a previous implementation of the arrows library by enabling its use with Node.js and by adding parallel asynchronous path detection. Automated identification of which arrows may execute in parallel helps the programmer narrow down the possible sources of concurrency errors.
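
    The reason arrows enable this analysis is that, unlike raw callbacks or promises chained at run time, an arrow program's composition graph is a first-class value built before execution. The Haskell illustration below (the paper's library is JavaScript; these fetch functions are stand-ins) shows the property: anything combined with `&&&` or `***` is exactly what may overlap, and a static tool can read that off the structure.

```haskell
import Control.Arrow

-- Two stand-ins for asynchronous I/O, as Kleisli arrows over IO.
fetchUser, fetchPosts :: Kleisli IO Int String
fetchUser  = Kleisli (\uid -> pure ("user "  ++ show uid))
fetchPosts = Kleisli (\uid -> pure ("posts " ++ show uid))

-- `&&&` fans out to both arrows: a statically visible parallel path.
-- `>>>` sequences: a statically visible ordering constraint.
pipeline :: Kleisli IO Int (String, String)
pipeline = (fetchUser &&& fetchPosts)
       >>> first (Kleisli (\u -> pure (u ++ "!")))

main :: IO ()
main = runKleisli pipeline 7 >>= print   -- ("user 7!","posts 7")
```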

    Autonomic behavioural framework for structural parallelism over heterogeneous multi-core systems.

    With the continuous advancement of hardware technologies, significant research has been devoted to designing and developing high-level parallel programming models that allow programmers to exploit the latest developments in heterogeneous multi-core/many-core architectures. Structured programming paradigms offer a viable solution for efficiently programming modern heterogeneous multi-core architectures equipped with one or more programmable Graphics Processing Units (GPUs). Applying structured programming paradigms, a system can be subdivided into building blocks (modules, skids or components) that can be created independently and then used in different systems to derive multiple functionalities. Exploiting such systematic divisions, extra-functional features such as application performance, portability and resource utilisation can be addressed at the component level in heterogeneous multi-core architectures. While the computing function of a building block can vary between applications, the behaviour (semantics) of the block remains intact. Therefore, by understanding the behaviour of building blocks and their structural composition into parallel patterns, the process of constructing and coordinating a structured application can be automated.

    In this thesis we propose the Structural Composition and Interaction Protocol (SKIP) as a systematic methodology for exploiting the structured programming paradigm (here, the building-block approach) to construct a structured application and to extract information from, and inject information into, that application. Using the SKIP methodology, we design and develop the Performance Enhancement Infrastructure (PEI), a SKIP-compliant autonomic behavioural framework that automatically coordinates structured parallel applications based on the extracted extra-functional properties related to the parallel computation patterns. We used 15 different PEI-based applications (from large-scale applications with heavy input workloads that take hours to execute, to small-scale applications that take seconds) to evaluate PEI in terms of overhead and performance improvement. The experiments were carried out on three different heterogeneous (CPU/GPU) multi-core architectures, including one cluster with four symmetric nodes and one GPU per node, and two single machines with one GPU each. Our results demonstrate that, with less than 3% overhead, PEI achieves up to one order of magnitude speed-up when used to enhance application performance.