17 research outputs found

    Polyhedra and Compilation (Polyèdres et Compilation)

    No full text
    22 pages. International audience. The first use of polyhedra to solve a compilation problem, the automatic parallelization of loops in the presence of procedure calls, was described and implemented almost thirty years ago. The polyhedral model is now internationally recognized and is being integrated into the GCC compiler, even though the exponential complexity of the associated algorithms long served as a reason for rejecting the approach outright. The goal of this article is to give many examples of the use of polyhedra in an optimizing compiler and to show that they make it possible to state simple conditions that guarantee the legality of transformations.
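
    The closing claim, that polyhedra let a compiler state simple legality conditions, can be illustrated with the classic criterion for unimodular loop transformations: a transformation given by a matrix T is legal if every dependence distance vector remains lexicographically positive after applying T. The sketch below uses invented example data and is not code from the article.

```python
# A minimal sketch: a unimodular transformation T is legal iff every
# dependence distance vector d stays lexicographically positive after
# mapping, i.e. T @ d is lexicographically positive.

def lex_positive(v):
    """True if the first nonzero component of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False  # the zero vector is not lexicographically positive

def apply_matrix(T, d):
    return [sum(T[i][j] * d[j] for j in range(len(d))) for i in range(len(T))]

def transformation_is_legal(T, distances):
    return all(lex_positive(apply_matrix(T, d)) for d in distances)

# Dependence distances of a hypothetical 2-deep loop nest.
distances = [(1, 0), (0, 1), (1, -1)]
interchange = [[0, 1], [1, 0]]  # permutation matrix swapping the two loops
print(transformation_is_legal(interchange, distances))  # False
```

    The check fails because the distance (1, -1) maps to (-1, 1), which would run a dependent iteration before its source; dropping that vector makes the interchange legal.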

    Static Analysis of Upper and Lower Bounds on Dependences and Parallelism

    Get PDF
    Existing compilers often fail to parallelize sequential code, even when a program can be manually transformed into parallel form by a sequence of well-understood transformations (as is the case for many of the Perfect Club Benchmark programs). These failures can occur for several reasons: the code transformations implemented in the compiler may not be sufficient to produce parallel code, the compiler may not find the proper sequence of transformations, or the compiler may not be able to prove that one of the necessary transformations is legal. When a compiler cannot extract sufficient parallelism from a program, the programmer must try to extract additional parallelism. Unfortunately, the programmer is typically left to search for parallelism without significant assistance. The compiler generally does not give feedback about which parts of the program might contain additional parallelism, or about the types of transformations that might be needed to realize this parallelism. Standard program transformations and dependence abstractions cannot be used to provide this feedback. In this paper, we propose a two-step approach to the search for parallelism in sequential programs. We first construct several sets of constraints that describe, for each statement, which iterations of that statement can be executed concurrently. By constructing constraints that correspond to different assumptions about which dependences might be eliminated through additional analysis, transformations, and user assertions, we can determine whether we can expose parallelism by eliminating dependences. In the second step of our search for parallelism, we examine these constraint sets to identify the kinds of transformations that are needed to exploit scalable parallelism. Our tests identify conditional parallelism and parallelism that can be exposed by combinations of transformations that reorder the iteration space (such as loop interchange and loop peeling). This approach lets us distinguish inherently sequential code from code that contains unexploited parallelism. It also produces information about the kinds of transformations that will be needed to parallelize the code, without worrying about the order of application of the transformations. Furthermore, when our dependence test is inexact, we can identify which unresolved dependences inhibit parallelism by comparing the effects of assuming dependence or independence. We are currently exploring the use of this information in programmer-assisted parallelization. (Also cross-referenced as UMIACS-TR-94-40.)
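
    As a toy illustration of the first step, the brute-force sketch below (the loop and all names are invented) builds, for a single statement, the set of iteration pairs unordered by a direct dependence under two different assumptions about one flow dependence, mirroring the paper's comparison of constraint sets.

```python
# Brute-force toy: for
#     for i in 1..N: a[i] = a[i-2] + b[i]
# compare which iteration pairs (i, j), i < j, are unordered by a direct
# dependence, once assuming the flow dependence a[i-2] -> a[i] is real and
# once assuming further analysis or a user assertion eliminates it.

from itertools import combinations

N = 8

def dependent(i, j, assume_flow_dep):
    # Iteration i writes a[i]; iteration j reads a[j-2].
    return assume_flow_dep and j - i == 2

def concurrent_pairs(assume_flow_dep):
    return {(i, j) for i, j in combinations(range(1, N + 1), 2)
            if not dependent(i, j, assume_flow_dep)}

# Pairs that become concurrent only if the dependence is eliminated:
print(sorted(concurrent_pairs(False) - concurrent_pairs(True)))
# [(1, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 8)]
```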

    Mapping Deviation: A Technique to Adapt or to Guard Loop Transformation Intuitions for Legality

    Get PDF
    International audience. Parallel architectures are now omnipresent in mainstream electronic devices, and exploiting them efficiently is a challenge for all developers. Hence, developers need the support of languages, libraries, and tools to assist them in the optimization or parallelization task. Compilers can provide major help by automating this work. However, they are fragile black boxes: a compiler may take a bad optimization decision because of imprecise heuristics, or may turn off an optimization because of imprecise analyses, without providing much control or feedback to the end user. To address this issue, we introduce mapping deviation, a new compiler technique that aims at providing useful feedback on the semantics of a given program restructuring. Starting from a transformation intuition that a user or a compiler wants to apply, our algorithm studies its correctness and can suggest changes or conditions to make it possible, rather than being limited to the classical go/no-go answer. This algorithm builds on the state-of-the-art polyhedral representation of programs and provides high flexibility. We present two example applications of this technique: improving semi-automatic optimization tools for programmers, and automatically designing runtime tests to check the correctness of a transformation for compilers.
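
    The second application, runtime tests that validate a transformation, can be pictured with a toy guard (an invented example, not the paper's algorithm): for the loop `for i in 0..N-1: a[i] = a[i + k] + 1`, a cross-iteration dependence exists exactly when 1 <= |k| <= N - 1, so a compiler can emit that condition as a guard and dispatch between a parallel and a sequential version.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_is_legal(N, k):
    # A cross-iteration dependence exists iff some iteration j reads the
    # cell a[j + k] written as a[i] by another iteration i, which happens
    # exactly when 1 <= |k| <= N - 1.
    return k == 0 or abs(k) >= N

def body(a, i, k):
    a[i] = a[i + k] + 1

def run(a, N, k):
    if parallel_is_legal(N, k):              # guarded, transformed version
        with ThreadPoolExecutor() as pool:
            list(pool.map(lambda i: body(a, i, k), range(N)))
    else:                                    # original sequential order
        for i in range(N):
            body(a, i, k)
    return a

print(run(list(range(12)), N=4, k=8))        # guard holds: runs in parallel
```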

    Exploiting Multi-Level Parallelism in Streaming Applications for Heterogeneous Platforms with GPUs

    Get PDF
    Heterogeneous computing platforms support the traditional types of parallelism, such as instruction-level, data, task, and pipeline parallelism, and provide the opportunity to exploit a combination of different types of parallelism at different platform levels. The architectural diversity of platform components makes tapping into the platform potential a challenging programming task. This thesis makes an important step in this direction by introducing a novel methodology for the automatic generation of structured, multi-level parallel programs from sequential applications. We introduce a novel hierarchical intermediate program representation (HiPRDG) that captures the notions of structure and hierarchy in the polyhedral model used for compile-time program transformation and code generation. Using the HiPRDG as the starting point, we present a novel method for the generation of multi-level programs (MLPs) featuring different types of parallelism, such as task, data, and pipeline parallelism. Moreover, we introduce concepts and techniques for data parallelism identification, GPU code generation, and asynchronous data-driven execution on heterogeneous platforms with efficient overlapping of host-accelerator communication and computation. By enabling the modular, hybrid parallelization of program model components via the HiPRDG, this thesis opens the door for highly efficient, tailor-made parallel program generation and auto-tuning for next generations of multi-level heterogeneous platforms with diverse accelerators.
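
    The overlap of host-accelerator communication with computation can be sketched, very roughly, with a bounded queue between a transfer thread and a compute thread. Plain Python threads stand in for the host and the accelerator here; nothing in this sketch is from the thesis.

```python
# Pipeline-style, data-driven execution: transfer of block i+1 overlaps
# with computation on block i thanks to the bounded queue (double buffering).

from queue import Queue
from threading import Thread

def producer(blocks, q):               # stand-in for host -> device transfers
    for b in blocks:
        q.put(b)
    q.put(None)                        # end-of-stream marker

def consumer(q, out):                  # stand-in for accelerator computation
    while (b := q.get()) is not None:
        out.append(sum(x * x for x in b))

blocks = [list(range(i, i + 4)) for i in range(0, 16, 4)]
q, out = Queue(maxsize=2), []          # bounded queue = double buffering
t1 = Thread(target=producer, args=(blocks, q))
t2 = Thread(target=consumer, args=(q, out))
t1.start(); t2.start(); t1.join(); t2.join()
print(out)                             # one result per block, in order
```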

    Analysis and application of Fourier-Motzkin variable elimination to program optimization : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand

    Get PDF
    This thesis examines four of the most influential dependence analysis techniques in use by optimizing compilers: Fourier-Motzkin Variable Elimination, the Banerjee Bounds Test, the Omega Test, and the I-Test. Although the performance and effectiveness of these tests have previously been documented empirically, no in-depth analysis of how these techniques are related from a purely analytical perspective has been done. The analysis given here clarifies important aspects of the empirical results that were noted but never fully explained. A tighter bound on the performance of one of the Omega Test algorithms than was previously known is proved, and a link is shown between the integer refinement technique used in the Omega Test and the well-known Frobenius Coin Problem. The application of a Fourier-Motzkin-based algorithm to the elimination of redundant bound checks in Java bytecode is described. A system which incorporated this technique improved performance on the Java Grande Forum Benchmark Suite by up to 10 percent.
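
    For readers unfamiliar with the first of these tests, here is a compact, generic sketch of Fourier-Motzkin variable elimination over the rationals (a textbook formulation, not code from the thesis): each step combines every pair of constraints in which the chosen variable appears with opposite signs, and once all variables are eliminated, feasibility reduces to checking the remaining constant inequalities.

```python
# Constraints are pairs (coeffs, b), meaning coeffs . x <= b.

from fractions import Fraction

def fm_eliminate(constraints, k):
    """Eliminate variable k, returning constraints over the remaining vars."""
    pos, neg, rest = [], [], []
    for c, b in constraints:
        (pos if c[k] > 0 else neg if c[k] < 0 else rest).append((c, b))
    out = list(rest)
    for cp, bp in pos:
        for cn, bn in neg:
            # Scale so the k-coefficients cancel: cn[k] < 0 < cp[k].
            a, d = Fraction(cp[k]), -Fraction(cn[k])
            c = [d * cp[j] + a * cn[j] for j in range(len(cp))]
            out.append((c, d * bp + a * bn))
    return out

def feasible_rational(constraints, nvars):
    """Eliminate every variable, then check the constant constraints 0 <= b."""
    for k in range(nvars):
        constraints = fm_eliminate(constraints, k)
    return all(b >= 0 for _, b in constraints)

# Example: x >= 1, y >= 1, x + y <= 1 is infeasible.
system = [([-1, 0], -1), ([0, -1], -1), ([1, 1], 1)]
print(feasible_rational(system, 2))   # False
```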

    Optimization within a Unified Transformation Framework

    Get PDF
    Programmers typically want to write scientific programs in a high-level language with semantics based on a sequential execution model. To execute efficiently on a parallel machine, however, a program typically needs to contain explicit parallelism and possibly explicit communication and synchronization. So, we need compilers to convert programs from the first of these forms to the second. There are two basic choices to be made when parallelizing a program. First, the computations of the program need to be distributed amongst the set of available processors. Second, the computations on each processor need to be ordered. My contribution has been the development of simple mathematical abstractions for representing these choices and the development of new algorithms for making these choices. I have developed a new framework that achieves good performance by minimizing communication between processors, minimizing the time processors spend waiting for messages from other processors, and ordering data accesses so as to exploit the memory hierarchy. This framework can be used by optimizing compilers, as well as by interactive transformation tools. The state of the art for vectorizing compilers is already quite good, but much work remains to bring parallelizing compilers up to the same standard. The main contribution of my work can be summarized as improving this situation by replacing existing ad hoc parallelization techniques with a sound underlying foundation on which future work can be built. (Also cross-referenced as UMIACS-TR-96-93.)
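
    The two choices named above, distribution and ordering, are commonly expressed as a space mapping and a time mapping over the iteration space. The toy sketch below uses a block distribution invented for illustration; it is not the dissertation's framework.

```python
# space(i) picks a processor for iteration i; time_(i) orders the
# iterations assigned to each processor.

P, N = 4, 16

def space(i):           # distribution choice: block mapping
    return i // (N // P)

def time_(i):           # ordering choice: position within the block
    return i % (N // P)

schedule = {}
for i in range(N):
    schedule.setdefault(space(i), []).append((time_(i), i))
for p in sorted(schedule):
    print(p, [i for _, i in sorted(schedule[p])])
# 0 [0, 1, 2, 3]
# 1 [4, 5, 6, 7]  ... and so on
```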

    Hybrid analysis of memory references and its application to automatic parallelization

    Get PDF
    Executing sequential code in parallel on a multithreaded machine has been an elusive goal of the academic and industrial research communities for many years. It has recently become more important due to the widespread introduction of multicores in PCs. Automatic multithreading has not been achieved because classic, static compiler analysis was not powerful enough and program behavior was found to be, in many cases, input dependent. Speculative thread-level parallelization was a welcome avenue for advancing parallelization coverage, but its performance was not always optimal due to the sometimes unnecessary overhead of checking every dynamic memory reference. In this dissertation we introduce a novel analysis technique, Hybrid Analysis, which unifies static and dynamic memory reference techniques into a seamless compiler framework that extracts almost all the available parallelism from scientific codes and incurs close to the minimum necessary run-time overhead. We show how to extract maximum information from the quantities that could not be sufficiently analyzed through static compiler methods, and how to generate sufficient conditions which, when evaluated dynamically, can validate optimizations. Our techniques have been fully implemented in the Polaris compiler and resulted in whole-program speedups on a large number of industry-standard benchmark applications.
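
    The sufficient conditions mentioned above can be pictured with a toy version-dispatch scheme (an invented example, not the Polaris implementation): when static analysis cannot prove that the written and read array sections of a loop are disjoint, the compiler emits a cheap disjointness predicate and selects a code version at run time.

```python
# Toy dispatch for
#     for i in 0..N-1: a[i] = a[i + offset]
# The write section is [0, N-1]; the read section is [offset, N-1+offset].

def sections_disjoint(w_lo, w_hi, r_lo, r_hi):
    """Sufficient run-time condition: the written and read sections do not
    overlap, so the loop iterations are independent."""
    return w_hi < r_lo or r_hi < w_lo

def choose_version(N, offset):
    if sections_disjoint(0, N - 1, offset, N - 1 + offset):
        return "parallel"      # condition validated: run optimized version
    return "sequential"        # cannot validate: conservative fallback

print(choose_version(100, 100))  # parallel: reads start past all writes
print(choose_version(100, 50))   # sequential: sections overlap
```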

    Dependence Abstractions

    No full text
    International audience

    Minimal data dependence abstractions for loop transformations

    No full text
    Many abstractions of program dependences have already been proposed, such as the Dependence Distance, the Dependence Direction Vector, the Dependence Level, and the Dependence Cone. These abstractions have different precision. The minimal abstraction associated with a transformation is the abstraction that contains the minimal amount of information necessary to decide when that transformation is legal. The minimal abstractions for loop reordering and unimodular transformations are presented. As an example, the dependence cone, which approximates dependences by a convex cone of the dependence distance vectors, is the minimal abstraction for unimodular transformations. It also contains enough information for legally applying all loop reordering transformations and for finding the same set of valid mono- and multi-dimensional linear schedulings as the dependence distance set.
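
    A small sketch of this abstraction hierarchy (toy code, not the paper's): exact distance vectors are abstracted to direction vectors, and for a 2-deep nest the direction vectors alone already decide the legality of loop interchange, since only ('<', '>') dependences forbid it.

```python
def sign(x):
    return (x > 0) - (x < 0)

def direction_vector(distance):
    """Abstract a distance vector to per-component directions '<', '=', '>'."""
    return tuple({1: '<', 0: '=', -1: '>'}[sign(d)] for d in distance)

def interchange_legal(directions):
    # Interchange of a 2-deep nest is illegal exactly for ('<', '>')
    # dependences: swapping the loops would make them run backwards.
    return all(d != ('<', '>') for d in directions)

distances = [(1, 0), (0, 1), (1, -1)]
dirs = [direction_vector(d) for d in distances]
print(dirs)                      # [('<', '='), ('=', '<'), ('<', '>')]
print(interchange_legal(dirs))   # False
```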

    Dependence Tests and Program Transformations (Tests des Dependances et Transformations de Programme)

    No full text
    The parallelization of sequential programs requires several stages: analysis of dependence relations, representation of these dependences, and application of transformations using this representation to find a parallel schedule for the program instructions. The success of parallelization depends on the precision of the dependence test and of the dependence representation used. In this thesis, we present and compare different dependence test algorithms and different data dependence abstractions. The algorithm of the PIPS parallelizer is based on an approximate feasibility test using Fourier-Motzkin elimination. Our experiments show that, in practice, it is accurate enough for handling dependence systems, and that its practical complexity is polynomial. Different dependence abstractions have different precision. For deciding whether a transformation is legal, several abstractions are admissible, meaning they contain enough information to know whether the transformation is legal. The minimal a..
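
    The approximate nature of such a feasibility test can be demonstrated with a rational relaxation, here phrased as an LP feasibility check with SciPy rather than the thesis's Fourier-Motzkin implementation, on an invented example system: if the relaxation is infeasible there is certainly no dependence, while a feasible relaxation only means a possible dependence.

```python
from scipy.optimize import linprog

def may_depend(A_ub, b_ub, nvars):
    # Feasibility of {x : A_ub @ x <= b_ub} over the rationals; with a zero
    # objective, linprog just searches for any feasible point.
    res = linprog(c=[0] * nvars, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * nvars)
    return res.status != 2             # status 2 means "infeasible"

# Dependence system for iterations i, i' of `a[2*i] = a[2*i' + 1]`:
# 2*i - 2*i' = 1 has rational solutions (i = 1/2, i' = 0), so the test
# answers "maybe", although no integer solution exists: the test is
# conservative, i.e. inexact, on this system.
A = [[2, -2], [-2, 2]]                 # the equality encoded as two <=
b = [1, -1]
print(may_depend(A, b, 2))             # True (possible dependence assumed)
```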