14,900 research outputs found

    pocl: A Performance-Portable OpenCL Implementation

    Get PDF
    OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left to the programmer. The situation is made worse due to multiple proprietary vendor implementations with different characteristics, and, thus, required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions to enable support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work on the source level, the parallel region formation retains the information of the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by the later generic compiler passes for efficient parallelization. The proposed open source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both already commercialized and on those that are still under research. The paper describes how the portability of the implementation is achieved. Our results show that most of the benchmarked applications when compiled using pocl were faster or close to as fast as the best proprietary OpenCL implementation for the platform at hand.Comment: This article was published in 2015; it is now openly accessible via arxi

    Expansion Trees with Cut

    Full text link
    Herbrand's theorem is one of the most fundamental insights in logic. From the syntactic point of view it suggests a compact representation of proofs in classical first- and higher-order logic by recording the information which instances have been chosen for which quantifiers, known in the literature as expansion trees. Such a representation is inherently analytic and hence corresponds to a cut-free sequent calculus proof. Recently several extensions of such proof representations to proofs with cut have been proposed. These extensions are based on graphical formalisms similar to proof nets and are limited to prenex formulas. In this paper we present a new approach that directly extends expansion trees by cuts and covers also non-prenex formulas. We describe a cut-elimination procedure for our expansion trees with cut that is based on the natural reduction steps. We prove that it is weakly normalizing using methods from the epsilon-calculus

    Enumerative Branching with Less Repetition

    Get PDF
    We can compactly represent large sets of solutions for problems with discrete decision variables by using decision diagrams. With them, we can efficiently identify optimal solutions for different objective functions. In fact, a decision diagram naturally arises from the branch-and-bound tree that we could use to enumerate these solutions if we merge nodes from which the same solutions are obtained on the remaining variables. However, we would like to avoid the repetitive work of finding the same solutions from branching on different nodes at the same level of that tree. Instead, we would like to explore just one of these equivalent nodes and then infer that the same solutions would have been found if we explored other nodes. In this work, we show how to identify such equivalences—and thus directly construct a reduced decision diagram—in integer programs where the left-hand sides of all constraints consist of additively separable functions. First, we extend an existing result regarding problems with a single linear constraint and integer coefficients. Second, we show necessary conditions with which we can isolate a single explored node as the only candidate to be equivalent to each unexplored node in problems with multiple constraints. Third, we present a sufficient condition that confirms if such a pair of nodes is indeed equivalent, and we demonstrate how to induce that condition through preprocessing. Finally, we report computational results on integer linear programming problems from the MIPLIB benchmark. Our approach often constructs smaller decision diagrams faster and with less branching

    Beyond Worst-Case Analysis for Joins with Minesweeper

    Full text link
    We describe a new algorithm, Minesweeper, that is able to satisfy stronger runtime guarantees than previous join algorithms (colloquially, `beyond worst-case guarantees') for data in indexed search trees. Our first contribution is developing a framework to measure this stronger notion of complexity, which we call {\it certificate complexity}, that extends notions of Barbay et al. and Demaine et al.; a certificate is a set of propositional formulae that certifies that the output is correct. This notion captures a natural class of join algorithms. In addition, the certificate allows us to define a strictly stronger notion of runtime complexity than traditional worst-case guarantees. Our second contribution is to develop a dichotomy theorem for the certificate-based notion of complexity. Roughly, we show that Minesweeper evaluates β\beta-acyclic queries in time linear in the certificate plus the output size, while for any β\beta-cyclic query there is some instance that takes superlinear time in the certificate (and for which the output is no larger than the certificate size). We also extend our certificate-complexity analysis to queries with bounded treewidth and the triangle query.Comment: [This is the full version of our PODS'2014 paper.

    Future value based single assignment program representations and optimizations

    Get PDF
    An optimizing compiler internal representation fundamentally affects the clarity, efficiency and feasibility of optimization algorithms employed by the compiler. Static Single Assignment (SSA) as a state-of-the-art program representation has great advantages though still can be improved. This dissertation explores the domain of single assignment beyond SSA, and presents two novel program representations: Future Gated Single Assignment (FGSA) and Recursive Future Predicated Form (RFPF). Both FGSA and RFPF embed control flow and data flow information, enabling efficient traversal program information and thus leading to better and simpler optimizations. We introduce future value concept, the designing base of both FGSA and RFPF, which permits a consumer instruction to be encountered before the producer of its source operand(s) in a control flow setting. We show that FGSA is efficiently computable by using a series T1/T2/TR transformation, yielding an expected linear time algorithm for combining together the construction of the pruned single assignment form and live analysis for both reducible and irreducible graphs. As a result, the approach results in an average reduction of 7.7%, with a maximum of 67% in the number of gating functions compared to the pruned SSA form on the SPEC2000 benchmark suite. We present a solid and near optimal framework to perform inverse transformation from single assignment programs. We demonstrate the importance of unrestricted code motion and present RFPF. We develop algorithms which enable instruction movement in acyclic, as well as cyclic regions, and show the ease to perform optimizations such as Partial Redundancy Elimination on RFPF

    On Global Types and Multi-Party Session

    Get PDF
    Global types are formal specifications that describe communication protocols in terms of their global interactions. We present a new, streamlined language of global types equipped with a trace-based semantics and whose features and restrictions are semantically justified. The multi-party sessions obtained projecting our global types enjoy a liveness property in addition to the traditional progress and are shown to be sound and complete with respect to the set of traces of the originating global type. Our notion of completeness is less demanding than the classical ones, allowing a multi-party session to leave out redundant traces from an underspecified global type. In addition to the technical content, we discuss some limitations of our language of global types and provide an extensive comparison with related specification languages adopted in different communities
    • …
    corecore