
    On Lock-Free Work-stealing Iterators for Parallel Data Structures

    With the rise of multicores, there is a trend of supporting data-parallel collection operations in general purpose programming languages. These operations are highly parametric, incurring abstraction performance penalties. Furthermore, data-parallel operations must scale when applied to irregular workloads. Work-stealing is a proven technique for load balancing irregular workloads, but general purpose work-stealing also suffers abstraction penalties. We present a generic data-parallel collections design based on work-stealing for shared-memory architectures that overcomes abstraction penalties through callsite specialization of data-parallel operation instances. Moreover, we introduce work-stealing iterators that allow fine-grained and efficient work-stealing for particular data structures. By eliminating abstraction penalties and making work-stealing data-structure-aware, we achieve up to 60x better performance compared to JVM-based approaches and 3x speedups compared to tools such as Intel TBB.
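The core idea of a work-stealing iterator can be sketched in a few lines: the owner advances through a range element by element, while a thief atomically carves off the second half of the remaining work. The sketch below is illustrative, not the paper's API; the class and method names (`StealingRange`, `trySteal`) are invented, and both bounds are packed into one `AtomicLong` so progress and limit update atomically.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free work-stealing iterator over an int range [from, until).
// The owner advances `progress`; a thief CASes the limit down, taking the
// latter half of the remaining elements as a fresh range of its own.
final class StealingRange {
    // Pack (progress, limit) into one AtomicLong so both change atomically.
    private final AtomicLong state;

    StealingRange(int from, int until) { state = new AtomicLong(pack(from, until)); }

    private static long pack(int progress, int limit) {
        return ((long) progress << 32) | (limit & 0xFFFFFFFFL);
    }
    private static int progressOf(long s) { return (int) (s >>> 32); }
    private static int limitOf(long s)    { return (int) s; }

    /** Owner: claim the next element index, or -1 if the range is exhausted. */
    int next() {
        while (true) {
            long s = state.get();
            int p = progressOf(s), l = limitOf(s);
            if (p >= l) return -1;
            if (state.compareAndSet(s, pack(p + 1, l))) return p;
        }
    }

    /** Thief: atomically take the second half of the remaining work, or null. */
    StealingRange trySteal() {
        while (true) {
            long s = state.get();
            int p = progressOf(s), l = limitOf(s);
            if (l - p < 2) return null;            // too little left to split
            int mid = p + (l - p) / 2;
            if (state.compareAndSet(s, pack(p, mid)))
                return new StealingRange(mid, l);  // thief now owns [mid, l)
        }
    }
}
```

Because both steal and advance go through a single CAS on the same word, an element is never handed out twice, which is what makes per-data-structure stealing this fine-grained without locks.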

    Achieving Efficient Work-Stealing for Data-Parallel Collections

    In modern programming, high-level data structures are an important foundation for most applications. With the rise of the multi-core era, there is a growing trend of supporting data-parallel collection operations in general purpose programming languages and platforms. To facilitate object-oriented reuse these operations are highly parametric, incurring abstraction performance penalties. Furthermore, data-parallel operations must scale when used in problems with irregular workloads. Work-stealing is a proven load-balancing technique when it comes to irregular workloads, but general purpose work-stealing also suffers from abstraction penalties. In this paper we present a generic design of a data-parallel collections framework based on work-stealing for shared-memory architectures. We show how abstraction penalties can be overcome through callsite specialization of data-parallel operation instances. Moreover, we show how to make work-stealing fine-grained and efficient when specialized for particular data structures. We experimentally validate the performance of different data structures and data-parallel operations, achieving up to 60x better performance with abstraction penalties eliminated and 3x higher speedups by specializing work-stealing compared to existing approaches.
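The abstraction penalty the abstract refers to can be made concrete with a small contrast, sketched here in plain Java rather than the paper's framework: a generic fold routes every element through a boxed `BinaryOperator<Integer>`, while a specialized variant works entirely on primitives. The names (`FoldDemo`, `genericFold`, `specializedFold`) are illustrative only.

```java
import java.util.function.BinaryOperator;
import java.util.function.IntBinaryOperator;

// Illustrative contrast, not the paper's code: the generic fold pays
// boxing/unboxing on every element; the specialized fold compiles down
// to a primitive loop with no allocation on the hot path.
final class FoldDemo {
    static Integer genericFold(int[] xs, Integer zero, BinaryOperator<Integer> op) {
        Integer acc = zero;
        for (int x : xs) acc = op.apply(acc, x);      // boxes each element
        return acc;
    }

    static int specializedFold(int[] xs, int zero, IntBinaryOperator op) {
        int acc = zero;
        for (int x : xs) acc = op.applyAsInt(acc, x); // stays primitive
        return acc;
    }
}
```

Callsite specialization, as described in the abstract, aims to get the second shape automatically for each concrete operation instance instead of forcing the programmer to write it by hand.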

    A pattern language for parallelizing irregular algorithms

    Dissertation presented to the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa for the degree of Master in Engenharia Informática. In irregular algorithms, the dependencies and distribution of the data set cannot be statically predicted. This class of algorithms tends to organize computations in terms of data locality instead of parallelizing control across multiple threads. Thus, opportunities for exploiting parallelism vary dynamically, according to how the algorithm changes data dependencies. As such, effective parallelization of such algorithms requires new approaches that account for that dynamic nature. This dissertation addresses the problem of building efficient parallel implementations of irregular algorithms by proposing to extract, analyze and document patterns of concurrency and parallelism present in the Galois parallelization framework for irregular algorithms. Patterns capture formal representations of a tangible solution to a problem that arises in a well-defined context within a specific domain. We document these patterns in a pattern language, i.e., a set of inter-dependent patterns that compose well-documented template solutions that can be reused whenever a certain problem arises in a well-known context.
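A recurring pattern in irregular algorithms of the kind Galois targets is the worklist: the set of active elements is discovered dynamically as the computation runs, so the loop is driven by activity rather than by a fixed iteration space. A minimal sequential sketch (Galois executes such loops speculatively in parallel; the names here are invented for illustration):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal worklist sketch: count nodes reachable from `src` in a graph
// given as adjacency lists. New work (unvisited neighbours) is discovered
// and enqueued while processing, which is what makes the workload irregular.
final class Worklist {
    static int reachable(int[][] adj, int src) {
        boolean[] seen = new boolean[adj.length];
        Queue<Integer> work = new ArrayDeque<>();
        seen[src] = true;
        work.add(src);
        int count = 0;
        while (!work.isEmpty()) {   // loop bound is dynamic, not known a priori
            int n = work.poll();
            count++;
            for (int m : adj[n])
                if (!seen[m]) { seen[m] = true; work.add(m); }
        }
        return count;
    }
}
```

The dependences between iterations here are exactly the graph edges touched at runtime, which is why static parallelization fails and pattern-based, runtime-aware approaches are needed.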

    Towards optimisation of model queries : A parallel execution approach

    The growing size of software models poses significant scalability challenges. Amongst these challenges is the execution time of queries and transformations. In many cases, model management programs are (or can be) expressed as chains and combinations of core fundamental operations. Most of these operations are pure functions, making them amenable to parallelisation, lazy evaluation and short-circuiting. In this paper we show how all three of these optimisations can be combined in the context of Epsilon: an OCL-inspired family of model management languages. We compare our solutions with both interpreted and compiled OCL as well as hand-written Java code. Our experiments show a significant improvement in the performance of queries, especially on large models.
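The three optimisations named in the abstract can be seen together in a small query sketched with Java streams rather than Epsilon/OCL (the method and parameter names are invented for illustration): the pipeline is lazy (nothing executes until the terminal operation), parallel (fork/join over the range), and short-circuiting (`anyMatch` stops at the first witness).

```java
import java.util.stream.IntStream;

// Illustrative analogue of a lazy, parallel, short-circuiting model query.
final class QueryDemo {
    static boolean hasLargeSquare(int limit, int threshold) {
        return IntStream.range(0, limit)
                .parallel()                       // safe: map is a pure function
                .map(i -> i * i)                  // lazy: fused into the terminal op
                .anyMatch(sq -> sq > threshold);  // short-circuits on first hit
    }
}
```

Purity is the enabling property in both settings: because the mapped operation has no side effects, reordering it across threads and skipping unneeded elements cannot change the query's result.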

    On a Near Optimal Work-Stealing Tree Data-Parallel Scheduler for Highly Irregular Workloads

    We present a work-stealing algorithm for runtime scheduling of data-parallel operations in the context of shared-memory architectures, on data sets with highly irregular workloads that are not known a priori to the scheduler. This scheduler can parallelize loops and operations expressible with a parallel reduce or a parallel scan. The scheduler is based on the work-stealing tree data structure, which allows workers to decide on the work division in a lock-free, workload-driven manner and attempts to minimize the amount of communication between them. Significant effort is devoted to showing that the algorithm has the least possible amount of overhead. We provide an extensive experimental evaluation, comparing the advantages and shortcomings of different data-parallel schedulers in order to combine their strengths. We show specific workload distribution patterns appearing in practice for which different schedulers yield suboptimal speedup, explaining their drawbacks and demonstrating how the work-stealing tree scheduler overcomes them. We thus justify our design decisions experimentally, but also provide a theoretical background for our claims.
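For comparison, the conventional baseline such a scheduler improves on is eager recursive splitting on a fork/join pool, where idle workers steal forked subtasks. The sketch below is that baseline (not the paper's work-stealing tree); the threshold and class name are illustrative. The tree scheduler's contribution is to replace this fixed, eager division with lock-free, workload-driven splitting decided at runtime.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Baseline parallel reduce: split the range in half until it is small,
// fork one half so idle workers can steal it, and combine the results.
final class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000;  // illustrative cutoff
    final int[] xs; final int from, until;

    SumTask(int[] xs, int from, int until) {
        this.xs = xs; this.from = from; this.until = until;
    }

    @Override protected Long compute() {
        if (until - from <= THRESHOLD) { // small enough: sum sequentially
            long s = 0;
            for (int i = from; i < until; i++) s += xs[i];
            return s;
        }
        int mid = (from + until) >>> 1;
        SumTask left = new SumTask(xs, from, mid);
        SumTask right = new SumTask(xs, mid, until);
        left.fork();                     // stealable by idle workers
        return right.compute() + left.join();
    }
}
```

With highly irregular per-element costs, a fixed threshold like this one either over-splits cheap regions or under-splits expensive ones, which is precisely the suboptimal-speedup pattern the abstract says the work-stealing tree avoids.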