17 research outputs found
Polyèdres et Compilation
The first use of polyhedra to solve a compilation problem, the automatic parallelization of loops in the presence of procedure calls, was described and implemented nearly thirty years ago. The polyhedral model is now internationally recognized and is being integrated into the GCC compiler, even though the exponential complexity of the associated algorithms was, for a very long time, a reason to reject them outright. The goal of this article is to give many examples of the use of polyhedra in an optimizing compiler and to show that they make it possible to state simple conditions guaranteeing the legality of transformations.
Static Analysis of Upper and Lower Bounds on Dependences and Parallelism
Existing compilers often fail to parallelize sequential code, even
when a program can be manually transformed into parallel form
by a sequence of well-understood transformations
(as is the case for many of the Perfect Club Benchmark
programs).
These failures can occur for several reasons: the code transformations
implemented in the compiler may not be sufficient to produce parallel
code, the compiler may not find the proper sequence of
transformations, or the compiler may not be able to prove that one
of the necessary transformations is legal.
When a compiler cannot extract sufficient parallelism from a program,
the task of finding the additional parallelism falls to the programmer.
Unfortunately, the programmer is typically left to search for
parallelism without significant assistance.
The compiler generally does not give feedback about which parts of the
program might contain additional parallelism, or about the types of
transformations that might be needed to realize this parallelism.
Standard program transformations and dependence abstractions cannot be
used to provide this feedback.
In this paper, we propose a two step approach for the search for
parallelism in sequential programs:
We first construct several sets of constraints that describe, for each
statement, which iterations of that statement can be executed
concurrently.
By constructing constraints that correspond to different assumptions
about which dependences might be eliminated through additional
analysis, transformations and user assertions, we can determine
whether we can expose parallelism by eliminating dependences.
In the second step of our search for parallelism, we examine these
constraint sets to identify the kinds of transformations that are
needed to exploit scalable parallelism.
Our tests will identify conditional parallelism and parallelism that
can be exposed by combinations of transformations that reorder the
iteration space (such as loop interchange and loop peeling).
This approach lets us distinguish inherently sequential code from code
that contains unexploited parallelism.
It also produces information about the kinds of transformations that
will be needed to parallelize the code, without worrying about the
order of application of the transformations.
Furthermore, when our dependence test is inexact,
we can identify which unresolved dependences inhibit parallelism
by comparing the effects of assuming dependence or independence.
We are currently exploring the use of this information in
programmer-assisted parallelization.
(Also cross-referenced as UMIACS-TR-94-40)
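The first step described above can be sketched with a toy Python model (the names and the distance-vector encoding are illustrative assumptions, not the paper's actual constraint system): per statement, we build the set of iteration pairs that dependences force into sequence, then ask whether any iterations remain free to run concurrently, including under "what if this dependence were eliminated" assumptions.

```python
def carried_pairs(n, distances):
    """Pairs (i, j), i < j, that a loop-carried dependence forces into sequence."""
    return {(i, i + d) for d in distances for i in range(n) if i + d < n}

def fully_parallel(n, distances):
    """All n iterations can run concurrently iff no pair must stay ordered."""
    return not carried_pairs(n, distances)

# a[i] = a[i-1] + b[i] has a distance-1 dependence: inherently sequential.
print(fully_parallel(8, [1]))   # False
# c[i] = a[i] * 2 carries no loop dependence: fully parallel.
print(fully_parallel(8, []))    # True
# "What if" analysis in the spirit of the paper: re-test with a dependence
# assumed eliminated (by sharper analysis or a user assertion) and compare.
```

Comparing the answers with and without an unresolved dependence is exactly how the paper distinguishes inherently sequential code from code whose parallelism is merely hidden.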
Mapping Deviation: A Technique to Adapt or to Guard Loop Transformation Intuitions for Legality
Parallel architectures are now omnipresent in mainstream electronic devices, and exploiting them efficiently is a challenge for all developers. Hence, developers need the support of languages, libraries and tools to assist them in the optimization or parallelization task. Compilers can provide major help by automating this work. However, they are very fragile black boxes. A compiler may take a bad optimization decision because of imprecise heuristics, or may turn off an optimization because of imprecise analyses, without providing much control or feedback to the end user. To address this issue, we introduce mapping deviation, a new compiler technique that aims at providing useful feedback on the semantics of a given program restructuring. Starting from a transformation intuition that a user or a compiler wants to apply, our algorithm studies its correctness and can suggest changes or conditions to make it possible, rather than being limited to the classical go/no-go answer. This algorithm builds on the state-of-the-art polyhedral representation of programs and provides high flexibility. We present two example applications of this technique: improving semi-automatic optimization tools for programmers and automatically designing runtime tests to check the correctness of a transformation for compilers.
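The kind of feedback described can be illustrated with a toy sketch (illustrative names only; this is not the paper's actual algorithm): instead of a go/no-go answer on a loop interchange, report which dependence breaks it, so the user knows what to re-examine or assert away.

```python
def interchange_violations(distances):
    """Return the distance vectors (d1, d2) that loop interchange,
    (d1, d2) -> (d2, d1), would turn lexicographically negative,
    i.e. the dependences that make the transformation illegal."""
    bad = []
    for d1, d2 in distances:
        e1, e2 = d2, d1                     # distance vector after interchange
        if e1 < 0 or (e1 == 0 and e2 < 0):  # lexicographically negative
            bad.append((d1, d2))
    return bad

# A go/no-go checker would only answer "no"; naming the culprit dependence
# gives the user something concrete to act on.
print(interchange_violations([(1, 0), (1, -1)]))   # [(1, -1)]
```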
Exploiting Multi-Level Parallelism in Streaming Applications for Heterogeneous Platforms with GPUs
Heterogeneous computing platforms support the traditional types of
parallelism, such as instruction-level, data, task, and pipeline
parallelism, and provide the opportunity to exploit a combination of
different types of parallelism at different platform levels. The
architectural diversity of platform components makes tapping into the
platform potential a challenging programming task. This thesis makes an
important step in this direction by introducing a novel methodology for
automatic generation of structured, multi-level parallel programs from
sequential applications. We introduce a novel hierarchical intermediate
program representation (HiPRDG) that captures the notions of structure
and hierarchy in the polyhedral model used for compile-time program
transformation and code generation. Using the HiPRDG as the starting
point, we present a novel method for generation of multi-level programs
(MLPs) featuring different types of parallelism, such as task, data, and
pipeline parallelism. Moreover, we introduce concepts and techniques for
data parallelism identification, GPU code generation, and asynchronous
data-driven execution on heterogeneous platforms with efficient
overlapping of host-accelerator communication and computation. By
enabling the modular, hybrid parallelization of program model components
via HiPRDG, this thesis opens the door for highly efficient tailor-made
parallel program generation and auto-tuning for next generations of
multi-level heterogeneous platforms with diverse accelerators.
Analysis and application of Fourier-Motzkin variable elimination to program optimization : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand
This thesis examines four of the most influential dependence analysis techniques in use by optimizing compilers:
Fourier-Motzkin Variable Elimination, the Banerjee Bounds Test, the Omega Test, and the I-Test.
Although the performance and effectiveness of these tests have previously been documented empirically,
no in-depth analysis of how these techniques are related from a purely analytical perspective has been done.
The analysis given here clarifies important aspects of the empirical results that were noted but never fully
explained. A tighter bound on the performance of one of the Omega Test algorithms than was known previously
is proved and a link is shown between the integer refinement technique used in the Omega Test
and the well-known Frobenius Coin Problem. The application of a Fourier-Motzkin based algorithm to the
elimination of redundant bound checks in Java bytecode is described. A system which incorporated this
technique improved performance on the Java Grande Forum Benchmark Suite by up to 10 percent.
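The core elimination step can be sketched in a few lines of Python (an exact rational feasibility test; the encoding of constraints as coefficient lists is an illustrative assumption): eliminating a variable combines each lower bound with each upper bound, and a system is infeasible over the rationals exactly when a contradictory constraint survives after all variables are gone.

```python
def fm_eliminate(constraints, v):
    """One Fourier-Motzkin step: remove variable v from a system of
    constraints, each written (coeffs, b) meaning sum(coeffs[k]*x[k]) <= b."""
    lower, upper, rest = [], [], []
    for c, b in constraints:
        (upper if c[v] > 0 else lower if c[v] < 0 else rest).append((c, b))
    # Every (lower bound, upper bound) pair combines into one constraint free of v.
    for cl, bl in lower:
        for cu, bu in upper:
            sl, su = cu[v], -cl[v]          # both positive scale factors
            c = [sl * a + su * d for a, d in zip(cl, cu)]
            rest.append((c, sl * bl + su * bu))
    return rest

def rational_feasible(constraints, nvars):
    """Eliminate every variable; the system has a rational solution iff no
    contradictory constraint 0 <= b with b < 0 survives."""
    for v in range(nvars):
        constraints = fm_eliminate(constraints, v)
    return all(b >= 0 for _, b in constraints)

# 1 <= x <= 10 is feasible; adding x >= 12 makes it infeasible.
print(rational_feasible([([-1], -1), ([1], 10)], 1))                # True
print(rational_feasible([([-1], -1), ([1], 10), ([-1], -12)], 1))   # False
```

The pairwise combination is also where the method's worst-case doubly exponential blow-up comes from: each elimination step can square the number of constraints.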
Optimization within a Unified Transformation Framework
Programmers typically want to write scientific programs in a high level
language with semantics based on a sequential execution model. To execute
efficiently on a parallel machine, however, a program typically needs to
contain explicit parallelism and possibly explicit communication and
synchronization. So, we need compilers to convert programs from the first
of these forms to the second. There are two basic choices to be made when
parallelizing a program. First, the computations of the program need to be
distributed amongst the set of available processors. Second, the computations
on each processor need to be ordered. My contribution has been the development
of simple mathematical abstractions for representing these choices and the
development of new algorithms for making these choices. I have developed a new
framework that achieves good performance by minimizing communication between
processors, minimizing the time processors spend waiting for messages from
other processors, and ordering data accesses so as to exploit the memory
hierarchy. This framework can be used by optimizing compilers, as well as by
interactive transformation tools. The state of the art for vectorizing
compilers is already quite good, but much work remains to bring parallelizing
compilers up to the same standard. The main contribution of my work can be
summarized as improving this situation by replacing existing ad hoc
parallelization techniques with a sound underlying foundation on which future
work can be built.
(Also cross-referenced as UMIACS-TR-96-93)
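The two choices named above, distribution and ordering, can be illustrated with a minimal sketch (a block distribution and a cross-processor communication count; the names and the mapping are assumptions for illustration, not the thesis's framework):

```python
def owner(i, n, nprocs):
    """Block distribution: the processor that executes iteration i."""
    block = -(-n // nprocs)        # ceil(n / nprocs)
    return i // block

n, nprocs = 8, 2
print([owner(i, n, nprocs) for i in range(n)])   # [0, 0, 0, 0, 1, 1, 1, 1]

# The cost of the choice: for a distance-1 dependence, count how often a
# value must cross processors. A block mapping pays only at block boundaries,
# which is why minimizing communication favors keeping dependent
# iterations on the same processor.
cross = sum(owner(i, n, nprocs) != owner(i - 1, n, nprocs) for i in range(1, n))
print(cross)   # 1
```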
Hybrid analysis of memory references and its application to automatic parallelization
Executing sequential code in parallel on a multithreaded machine has been an
elusive goal of the academic and industrial research communities for many years. It
has recently become more important due to the widespread introduction of multicores
in PCs. Automatic multithreading has not been achieved because classic, static
compiler analysis was not powerful enough and program behavior was found to be, in
many cases, input dependent. Speculative thread level parallelization was a welcome
avenue for advancing parallelization coverage but its performance was not always optimal
due to the sometimes unnecessary overhead of checking every dynamic memory
reference.
In this dissertation we introduce a novel analysis technique, Hybrid Analysis,
which unifies static and dynamic memory reference techniques into a seamless compiler
framework which extracts almost maximum available parallelism from scientific
codes and incurs close to the minimum necessary run time overhead. We present how
to extract maximum information from the quantities that could not be sufficiently
analyzed through static compiler methods, and how to generate sufficient conditions
which, when evaluated dynamically, can validate optimizations.
Our techniques have been fully implemented in the Polaris compiler and resulted
in whole-program speedups on a large number of industry-standard benchmark applications.
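A minimal sketch of the idea, under illustrative assumptions (this is not the Polaris implementation): when static analysis cannot decide whether two references overlap because a value is only known at run time, emit a cheap sufficient condition instead of checking every dynamic memory reference.

```python
def parallel_safe(n, off):
    """Sufficient run-time condition: iteration i writes a[i] and reads
    a[i + off]; the iterations are independent whenever the read region
    is disjoint from the written region [0, n)."""
    return off >= n or off <= -n

# The predicate costs two comparisons per loop entry, not one check per access.
print(parallel_safe(100, 100))   # True: run the loop in parallel
print(parallel_safe(100, 1))     # False: fall back to the sequential order
```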
Minimal data dependence abstractions for loop transformations
Many abstractions of program dependences have already been proposed, such as the Dependence Distance, the Dependence Direction Vector, the Dependence Level or the Dependence Cone. These different abstractions have different precision. The minimal abstraction associated to a transformation is the abstraction that contains the minimal amount of information necessary to decide when such a transformation is legal. The minimal abstractions for loop reordering and unimodular transformations are presented. As an example, the dependence cone, which approximates dependences by a convex cone of the dependence distance vectors, is the minimal abstraction for unimodular transformations. It also contains enough information for legally applying all loop reordering transformations and finding the same set of valid mono- and multi-dimensional linear schedulings as the dependence distance set.
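The level abstraction mentioned above can be sketched as follows (illustrative Python, not the paper's formalism): the depth of the first nonzero entry of each distance vector is already enough information to decide which loops of a nest are parallel.

```python
def level(distance):
    """Dependence level: depth of the first nonzero entry, or None for a
    loop-independent dependence."""
    for depth, d in enumerate(distance, start=1):
        if d != 0:
            return depth
    return None

def parallel_loops(distances, depth):
    """Loop at depth l is parallel iff no dependence is carried at level l."""
    carried = {level(d) for d in distances}
    return [l for l in range(1, depth + 1) if l not in carried]

# Distances (0,1) and (0,2): only the inner loop carries dependences,
# so the outer loop of the 2-deep nest is parallel.
print(parallel_loops([(0, 1), (0, 2)], 2))   # [1]
```

The full distance vectors carry strictly more information, but for this particular decision the level alone suffices, which is what "minimal abstraction" captures.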
Tests des Dependances et Transformations de Programme
The parallelization of sequential programs requires several stages: analysis of dependence relations, representation of these dependences, and application of transformations using this representation to find a parallel schedule for the program instructions. The success of parallelization depends on the precision of the dependence test and the dependence representation used. In this thesis, we present and compare different dependence test algorithms and different data dependence abstractions. The algorithm of the PIPS parallelizer is based on an approximate feasibility test using Fourier-Motzkin elimination. Our experiments show that, in practice, it is accurate enough for treating dependence systems, and that its practical complexity is polynomial. Different dependence abstractions have different precision. For deciding whether a transformation is legal, several abstractions are admissible, meaning they contain enough information for knowing if this transformation is legal. The minimal a..