19 research outputs found

    Effect inference for deterministic parallelism

    In this report we sketch a polymorphic type and effect inference system for ensuring deterministic execution of parallel programs containing shared mutable state. It differs from that of Gifford and Lucassen in being based on Hindley-Milner polymorphism and in formalizing the operational semantics of parallel and sequential computation.
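
    To make the guarantee concrete, the following is a minimal C sketch (using OpenMP purely as an illustration; the report itself works with a formal calculus, not C) of the property such an effect system checks: a parallel composition is deterministic when the branches' inferred write effects are disjoint, and rejected when they may write the same location.

    #include <stdio.h>

    /* Illustration only: an effect system would infer write effect
     * {a} for the first branch and {b} for the second.  Disjoint
     * effects => the parallel composition is deterministic. */
    int main(void) {
        int a = 0, b = 0, c = 0;

        #pragma omp parallel sections
        {
            #pragma omp section
            a = 1;                  /* writes {a}               */
            #pragma omp section
            b = 2;                  /* writes {b}: disjoint, ok */
        }
        /* Every schedule yields a == 1 and b == 2. */

        #pragma omp parallel sections
        {
            #pragma omp section
            c = 1;                  /* writes {c}               */
            #pragma omp section
            c = 2;                  /* writes {c}: overlap!     */
        }
        /* c depends on the schedule; an effect system ensuring
         * determinism would reject this composition. */

        printf("a=%d b=%d c=%d\n", a, b, c);
        return 0;
    }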

    A Comparison of some recent Task-based Parallel Programming Models

    The need for parallel programming models that are simple to use and at the same time efficient for current and future parallel platforms has led to recent attention to task-based models such as Cilk++, Intel TBB and the task concept in OpenMP version 3.0. The choice of model and implementation can have a major impact on the final performance, and in order to understand some of the trade-offs we have made a quantitative study comparing four implementations of OpenMP (gcc, Intel icc, Sun Studio and the research compiler Mercurium/nanos mcc), Cilk++ and Wool, a high-performance task-based library developed at SICS. We use microbenchmarks to characterize the costs of task creation and stealing, and the Barcelona OpenMP Tasks Suite to characterize application performance. By far, Wool and Cilk++ have the lowest overhead in both spawning and stealing tasks. This is reflected in application performance when many tasks with small granularity are spawned, where Cilk++ and, in particular, Wool have the highest performance. For coarse-granularity applications, the OpenMP implementations perform quite similarly to the more lightweight Cilk++ and Wool, except for one application where mcc is superior thanks to a better task scheduler. The OpenMP implementations are generally not yet ready for use when the task granularity becomes very small. There is no inherent reason for this, so we expect future implementations of OpenMP to focus on this issue.
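
    As a concrete illustration of the kind of microbenchmark involved (a sketch under our own assumptions, not the paper's actual harness), recursive Fibonacci written with OpenMP 3.0 tasks spends almost all of its time creating and stealing tasks, which is exactly what such measurements try to isolate:

    #include <stdio.h>

    /* Task-spawn microbenchmark sketch: the work per task is tiny,
     * so task-creation and stealing overheads dominate. */
    static long fib(int n) {
        long x, y;
        if (n < 2) return n;
        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait
        return x + y;
    }

    int main(void) {
        long r;
        #pragma omp parallel
        #pragma omp single
        r = fib(30);
        printf("fib(30) = %ld\n", r);
        return 0;
    }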

    Modular cloning

    In this paper we deal with the problem of making context-dependent interprocedural optimizations (where the legality of optimizing a function depends on properties of the callers of the function) effective and compatible with (a form of) separate compilation. We improve effectiveness by cloning, generating several versions of a single function optimized for different call sites. We attack the separate compilation problem, that code cannot be generated until all calls of a function are known, by splitting the compilation process into two phases. The first phase analyses the modules one at a time in bottom-up dependency order ('main' is processed last) and produces code in an intermediate language where the constructs targeted by the optimization are annotated to control the application of the optimization. In cases where the legality of an optimization depends on properties of the callers, these annotations can take the form of annotation variables which become extra formal parameters. The second phase traverses the modules in top-down dependency order, removing all of these extra parameters by specialization. We illustrate our approach with an integrated program analysis and transformation system featuring a context-sensitive type-based analysis, cloning with sharing of identical clones, and a modular implementation allowing for the compilation of large programs. The system implements cheap eagerness and redundant eval elimination for a lazy functional language.
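
    A minimal C analogue of the two phases (hypothetical code, only to illustrate the mechanism; the actual system works on an intermediate language for a lazy functional language): phase one abstracts the optimization decision into an annotation variable that becomes an extra formal parameter, and phase two specializes that parameter away, leaving one clone per distinct call-site property.

    /* Phase 1 output: `nonzero` is an annotation variable, passed
     * as an extra formal parameter because the legality of the
     * optimization depends on the caller. */
    double mean(const double *xs, int n, int nonzero) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += xs[i];
        return nonzero ? s / n            /* caller guarantees n > 0 */
                       : (n ? s / n : 0.0);
    }

    /* Phase 2 output: the extra parameter is removed by
     * specialization, producing two clones (identical clones would
     * be shared); each call site is rewritten to the matching clone. */
    double mean_1(const double *xs, int n) {   /* nonzero == 1 */
        double s = 0.0;
        for (int i = 0; i < n; i++) s += xs[i];
        return s / n;
    }
    double mean_0(const double *xs, int n) {   /* nonzero == 0 */
        double s = 0.0;
        for (int i = 0; i < n; i++) s += xs[i];
        return n ? s / n : 0.0;
    }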

    Wool - A work stealing library


    JakobRogstadius/MOSTACHI: v1.0

    Initial release for peer review

    Resource management for task-based parallel programs over a multi-kernel: BIAS: Barrelfish Inter-core Adaptive Scheduling

    Trying to attack the problem of resource contention created by multiple parallel applications running simultaneously, we propose a space-sharing, two-level, adaptive scheduler for the Barrelfish operating system. The first level is system-wide, running close to the OS kernel, and has knowledge of the available resources, while the second level, integrated into the application's runtime, is aware of its type and amount of parallelism. Feedback on efficiency from the second level to the first level allows the latter to adaptively modify the allotment of cores (domain), intelligently promoting space-sharing of resources while still allowing time-sharing when needed. In order to avoid excess inter-core communication, the system-level scheduler is designed as a distributed service, taking advantage of the message-passing nature of Barrelfish. The processor topology is partitioned so that each instance of the scheduler handles an appropriately sized subset of cores. Malleability is achieved by suspending worker threads. Two different methodologies are introduced and explained, each suitable for distinct programming models and applications. Preliminary results are quite promising and show minimal added overhead. In specific multiprogramming configurations, initial experiments showed a significant performance improvement by avoiding contention.
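
    A sketch of how malleability through worker suspension might look (hypothetical names and shape, not the actual BIAS code): each second-level worker checks the current core allotment and parks on a condition variable when its slot has been revoked, so the first level can shrink or grow the domain without destroying threads.

    #include <pthread.h>

    static pthread_mutex_t lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  resume = PTHREAD_COND_INITIALIZER;
    static int allotment = 4;       /* cores currently granted */

    /* Worker side: park while our slot exceeds the allotment. */
    static void maybe_park(int my_slot) {
        pthread_mutex_lock(&lock);
        while (my_slot >= allotment)
            pthread_cond_wait(&resume, &lock);
        pthread_mutex_unlock(&lock);
    }

    /* First-level side: change the allotment and wake workers. */
    void set_allotment(int n) {
        pthread_mutex_lock(&lock);
        allotment = n;
        pthread_cond_broadcast(&resume);
        pthread_mutex_unlock(&lock);
    }

    void worker_loop(int my_slot) {
        for (;;) {
            maybe_park(my_slot);
            /* ... obtain a task (pop or steal) and run it ... */
        }
    }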

    A Quantitative Evaluation of popular Task-Centric Programming Models and Libraries

    Programmers today face a bewildering array of parallel programming models and tools, making it difficult to choose an appropriate one for each application. The present study focuses on the task-centric approach and compares several popular systems, including Cilk Plus, TBB and various implementations of OpenMP 3.0. We analyse their performance on the BOTS benchmark suite, both on a 48-core Magny-Cours server and a 64-core TILEPro64 embedded manycore processor.
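
    A large part of the gap at small granularities is fixed per-task overhead; a common mitigation in task-centric code (sketched below under our own assumptions, not taken from the paper) is a manual cutoff that switches to serial execution once subproblems become too small for a task to pay for itself:

    #define CUTOFF 12   /* assumed threshold; tuned per machine */

    static long fib_seq(int n) {
        return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2);
    }

    /* Below the cutoff, run serially instead of spawning tasks,
     * amortizing per-task overhead over more useful work. */
    static long fib(int n) {
        long x, y;
        if (n < CUTOFF) return fib_seq(n);
        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait
        return x + y;
    }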

    Dynamic Inter-core Scheduling in Barrelfish: avoiding contention with malleable process domains

    Trying to attack the problem of resource contention created by multiple parallel applications running simultaneously, we propose a space-sharing, two-level, adaptive scheduler for the Barrelfish operating system. The first level is system-wide, existing inside the OS, and has knowledge of the available resources, while the second level is aware of the parallelism in the application. Feedback on efficiency from the second level to the first level allows the latter to adaptively modify the allotment of cores (domain), thus intelligently avoiding time-sharing. In order to avoid excess inter-core communication, the first-level scheduler is designed as a distributed service, taking advantage of the message-passing nature of Barrelfish. The processor topology is partitioned so that each instance of the scheduler handles an appropriately sized subset of cores. Malleability is achieved by suspending worker threads. Two different methodologies are introduced and explained, each ideal for different situations.
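
    The distributed first-level service can be pictured as follows (a hypothetical C sketch, not Barrelfish code): the core topology is partitioned up front, each scheduler instance manages only its own subset of cores, and cross-partition requests travel as messages rather than through shared memory.

    /* Partition ncores among ninstances scheduler instances so
     * each instance handles a contiguous, roughly equal subset. */
    typedef struct { int first_core; int num_cores; } partition_t;

    void partition_topology(int ncores, int ninstances, partition_t *out) {
        int base  = ncores / ninstances;
        int extra = ncores % ninstances;    /* spread the remainder */
        int next  = 0;
        for (int i = 0; i < ninstances; i++) {
            out[i].first_core = next;
            out[i].num_cores  = base + (i < extra ? 1 : 0);
            next += out[i].num_cores;
        }
    }
    /* Each instance then serves allotment requests for its own
     * cores and forwards the rest via message passing. */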