30 research outputs found

Verified Code Generation for the Polyhedral Model

Author: Courant Nathanaël
Leroy Xavier
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2021

The polyhedral model is a high-level intermediate representation for loop nests that elegantly supports a great many loop optimizations. In a compiler, after polyhedral loop optimizations have been performed, it is necessary and difficult to regenerate sequential or parallel loop nests before continuing compilation. This paper reports on the formalization and proof of semantic preservation of such a code generator that produces sequential code from a polyhedral representation. The formalization and proofs are mechanized using the Coq proof assistant.
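As a rough illustration of what regenerating a loop nest involves, the sketch below scans a triangular iteration domain {(i, j) : 0 <= j <= i < n} under the schedule (i, j) -> (j, i), i.e. a loop interchange. This is a hand-written Python example, not the verified generator the paper describes; the function names and the choice of domain are assumptions made for illustration.

```python
def original_nest(n):
    """Scan {(i, j) : 0 <= j <= i < n} in source order (i outer, j inner)."""
    return [(i, j) for i in range(n) for j in range(i + 1)]


def interchanged_nest(n):
    """Scan the same domain with j outer and i inner.

    The new bounds come from projecting the domain onto j (0 <= j < n)
    and then bounding i for a fixed j (j <= i < n).
    """
    return [(i, j) for j in range(n) for i in range(j, n)]
```

Both functions visit exactly the same points; only the order differs, which is the property a code generator has to preserve when it rebuilds loops from a schedule.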

INRIA a CCSD electronic archive server

HAL-Rennes 1

Parameterized and multi-level tiled loop generation

Author: Kim DaeGon
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2010

Department Head: L. Darrell Whitley. 2010 Summer. Includes bibliographical references. Tiling is a loop transformation that decomposes computations into a set of smaller computation blocks. The transformation has been proven to be useful for many high-level program optimizations, such as data locality optimization and exploiting coarse-grained parallelism, and crucial for architectures with limited resources, such as embedded systems, GPUs, and the Cell architecture. Data locality and parallelism will continue to serve as major vehicles for achieving high performance on modern architectures in the multi-core era. In parameterized tiling the size of blocks is not fixed at compile time but remains a symbolic constant so that it can be selected or changed even at runtime. Parameterized tiled loops facilitate iterative and runtime optimizations, such as iterative compilation, auto-tuning and dynamic program adaptation. In this dissertation we present a collection of techniques for generating parameterized and multi-level tiled loops from affine control loops and their parallelization. Even for perfectly nested loops, the tiled loop generation problem has been believed to have exponential time complexity because of heavy machinery such as Fourier-Motzkin elimination. Disproving this decade-long belief, we provide a simple technique for generating tiled loop nests even from imperfectly nested loops. Our technique for perfectly nested loops consists of only syntactic processing that is applied only once and independently to each loop bound. Our approach to imperfectly nested loops is composed of a direct extension of the tiled code generation technique for perfectly nested loops and three simple optimizations on the resulting parameterized tiled loops. The generation as well as the optimizations are achieved with purely syntactic processing, hence loop generation time remains negligible. We also present three schemes for multi-level tiling where tiling is applied more than once.
All the schemes are scalable with respect to the number of tiling levels and can be combined to achieve better performance. To facilitate parallelization of parameterized tiled loops, we generate outermost tile-loops that are perfectly nested. We also provide a technique for statically restructuring parameterized tiled loops to wavefront scheduling on shared-memory systems. Because the formulation of parameterized tiling does not fit into the well-established polyhedral framework, such static restructuring has been a great challenge. However, we achieve this limited restructuring through syntactic processing without any sophisticated machinery.
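To make the idea of parameterized tiling concrete, here is a minimal Python sketch of a rectangular two-dimensional loop nest whose tile sizes ti and tj are ordinary run-time arguments rather than compile-time constants. It only shows what a parameterized tiled loop looks like; it is not the generation technique of the dissertation, and the function name and the rectangular domain are assumptions.

```python
def tiled_points(n, m, ti, tj):
    """Visit {(i, j) : 0 <= i < n, 0 <= j < m} tile by tile.

    ti and tj are the tile sizes; they stay symbolic until the call,
    so they can be chosen or tuned at run time.
    """
    points = []
    for it in range(0, n, ti):              # tile loop over i
        for jt in range(0, m, tj):          # tile loop over j
            # point loops: min() clips partial tiles at the domain border
            for i in range(it, min(it + ti, n)):
                for j in range(jt, min(jt + tj, m)):
                    points.append((i, j))
    return points
```

Calling tiled_points(5, 7, 2, 3) and tiled_points(5, 7, 4, 4) visits the same 35 points in different orders, which is what makes auto-tuning over tile sizes possible without regenerating code.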

Mountain Scholar (Digital Collections of Colorado and Wyoming)

From ALPHA to imperative code : a transformational compiler for an array based functional language

Author: Wilde Doran K.
Publication venue: 'Oregon State University'

Practical parallel programming demands that the details of distributing data to processors and interprocessor communication be managed by the compiler. These tasks quickly become too difficult for a programmer to do by hand for all but the simplest parallel programs. Yet, many parallel languages still require the programmer to manage much of the parallelism. I discuss the synthesis of parallel imperative code from algorithms written in a functional language called Alpha. Alpha is based on systems of affine recurrence equations and was designed to specify algorithms for regular array architectures. Being a functional language, Alpha implicitly supports the expression of both concurrency and communication. Thus, the programmer is freed from having to explicitly manage the parallelism. Using the information derived from static analysis, Alpha can be transformed into a form suitable for generating imperative parallel code through a series of provably correct program transformations. The kinds of analysis needed to generate efficient code for array-based functional programs are a generalization of dependency analysis, usage analysis, and scheduling techniques used in systolic array synthesis.

ScholarsArchive@OSU

Splitting Polyhedra to Generate More Efficient Code: Efficient Code Generation in the Polyhedral Model is Harder Than We Thought

Author: Bastoul Cédric
Loechner Vincent
Razanajato Harenome
Publication venue: HAL CCSD
Publication date: 23/01/2017

Code generation in the polyhedral model takes as input a union of Z-polyhedra and produces code scanning all of them. Modern code generation tools are heavily relying on polyhedral operations to perform this task. However, these operations are typically provided by general-purpose polyhedral libraries that are not specifically designed to address the code generation problem. In particular, (unions of) polyhedra may be represented in various mathematically equivalent ways which may have different properties with respect to code generation. In this paper, we investigate this problem and try to find the best representation of polyhedra to generate efficient code. We present two contributions. First we demonstrate that this problem has been largely under-estimated, showing significant control overhead deviations when using different representations of the same polyhedra. Second, we propose an improvement to the main algorithm of the state-of-the-art code generation tool CLooG. It generates code with fewer tests in the inner loops, and aims to reduce control overhead and to simplify vectorization for the compiler, at the cost of a larger code size. It is based on a smart splitting of the union of polyhedra while recursing on the dimensions. We implemented our algorithm in CLooG/PolyLib, and compared the performance and size of the generated code to the CLooG/isl version.
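The trade-off between inner-loop tests and code size can be seen on a tiny one-dimensional case. The Python sketch below scans two statement domains, S1 on 0 <= i < n and S2 on 0 <= i < k, once with guards inside a single loop and once with the iteration range split so that no loop body contains a test. It is a hand-made illustration, not the CLooG algorithm; the statement names and domains are assumptions.

```python
def scan_guarded(n, k):
    """One loop over the union of both domains, with a test per statement."""
    trace = []
    for i in range(max(n, k)):
        if i < n:
            trace.append(("S1", i))
        if i < k:
            trace.append(("S2", i))
    return trace


def scan_split(n, k):
    """Same scan after splitting the range: more loops, no inner tests."""
    trace = []
    lo = min(n, k)
    for i in range(lo):          # both statements are active here
        trace.append(("S1", i))
        trace.append(("S2", i))
    for i in range(lo, n):       # only S1 remains (empty when n <= k)
        trace.append(("S1", i))
    for i in range(lo, k):       # only S2 remains (empty when k <= n)
        trace.append(("S2", i))
    return trace
```

Both versions produce the same execution trace; the split version has three loop bodies instead of one, which mirrors the larger code size mentioned in the abstract.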

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

The Challenges of Non-linear Parameters and Variables in Automatic Loop Parallelisation

Author: Größlinger Armin
Publication date: 18/12/2009

With the rise of manycore processors, parallelism is becoming a mainstream necessity. Unfortunately, parallel programming is inherently more difficult than sequential programming; therefore, techniques for automatic parallelisation will become indispensable. We aim to extend the well-known polyhedron model, which promises this automation, beyond some of its current restrictions. Up to now, loop bounds and array subscripts in the modelled codes must be expressions linear in both the variables and the parameters. We lift this restriction and allow certain polynomial expressions instead of linear ones. With our extensions, we are able to handle more programs in all phases of the parallelisation process (dependence analysis, transformation of the program model, code generation). We extend Banerjee's classical dependence analysis to handle one non-linear parameter p, i.e., we are able to determine precisely the solutions of the system of conflict equalities for input programs with non-linear array accesses like A[p*i], depending on the residue class of p. We make contributions to three transformations desirable in automatic parallelisation. First, we show that using a generalised Simplex algorithm, which we have developed, schedules with non-linear parameters like theta(i)=floor(i/n) can be computed. In addition, such schedules can be expressed easily as a quantifier elimination problem, but this approach turns out to be computationally less efficient with the available implementation. As a second transformation, we study parametric tiling, which is used to adapt a parallelised program to the number of available processors at run time. Third, we present a localisation technique to exploit scratchpad memories on architectures on which data caching has to be handled by software. We transform a given code such that it keeps values which are reused in successive iterations of a sequential loop in the scratchpad.
An access to a value written in an earlier iteration is served from the scratchpad to accelerate the access. In general, this transformation introduces non-linear loop bounds in the transformed model. Finally, we present an algorithm for generating code for arbitrary semi-algebraic iteration sets, i.e., for iteration sets described by polynomial inequalities in the variables and parameters. This is a vast generalisation of existing polyhedral code generation techniques. Although our algorithm is less efficient than polyhedral code generators, this paves the way for a code generator that can handle arbitrary parametric tilings and other transformations which introduce non-linear parameters (like non-linear schedules and the localisation we present) or even non-linear variables.

Integer Affine Transformations of Parametric Z-polytopes and Applications to Loop Nest Optimization

Author: Loechner Vincent
Meister Benoit
Seghir Rachid
Publication venue: HAL CCSD
Publication date: 17/05/2010

The polyhedral model is a well-known compiler optimization framework for the analysis and transformation of affine loop nests. We present a new method concerning a difficult geometric operation that is raised by this model: the integer affine transformation of parametric Z-polytopes. The result of such a transformation is given by a worst-case exponential union of Z-polytopes. We also propose an algorithm, polynomial for fixed dimension, to count points in arbitrary unions of a fixed number of parametric Z-polytopes. We implemented these algorithms and compared them to other existing algorithms, for a set of applications to loop nest analysis and optimization.
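A small brute-force example shows why the integer affine image of a Z-polytope is hard to describe. The Python sketch below enumerates the image of the triangle {(i, j) : 0 <= j <= i <= n} under the map (i, j) -> 2i + 3j and compares it with the integer points of its bounding interval. Enumeration is used only for illustration and is not the method of the paper; the map and the function names are assumptions.

```python
def image_points(n):
    """Sorted image of {(i, j) : 0 <= j <= i <= n} under (i, j) -> 2*i + 3*j."""
    return sorted({2 * i + 3 * j for i in range(n + 1) for j in range(i + 1)})


def interval_point_count(n):
    """Number of integers between the smallest and largest image value."""
    pts = image_points(n)
    return pts[-1] - pts[0] + 1
```

For n = 2 the image is [0, 2, 4, 5, 7, 10], only 6 of the 11 integers in the interval from 0 to 10, so the image has holes and cannot be written as the integer points of a single polytope.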

INRIA a CCSD electronic archive server

Optimizing memory management on heterogeneous systems using polyhedral, compile-time techniques

Author: Βασιλειάδης Βασίλης Ι.
Publication date: 01/01/2013

Note: supplementary material is available in a separate file.

University of Thessaly Institutional Repository

Systematic Design Methods for Efficient Off-Chip DRAM Access

Author: Bayliss Samuel
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/05/2013

Typical design flows for digital hardware take, as their input, an abstract description of computation and data transfer between logical memories. No existing commercial high-level synthesis tool demonstrates the ability to map logical memory inferred from a high level language to external memory resources. This thesis develops techniques for doing this, specifically targeting off-chip dynamic memory (DRAM) devices. These are a commodity technology in widespread use with standardised interfaces. In use, the bandwidth of an external memory interface and the latency of memory requests asserted on it may become the bottleneck limiting the performance of a hardware design. Careful consideration of this is especially important when designing with DRAMs, whose latency and bandwidth characteristics depend upon the sequence of memory requests issued by a controller. Throughout the work presented here, we pursue exact compile-time methods for designing application-specific memory systems with a focus on guaranteeing predictable performance through static analysis. This contrasts with much of the surveyed existing work, which considers general purpose memory controllers and optimized policies which improve performance in experiments run using simulation of suites of benchmark codes. The work targets loop-nests within imperative source code, extracting a mathematical representation of the loop-nest statements and their associated memory accesses, referred to as the ‘Polytope Model’. We extend this mathematical representation to represent the physical DRAM ‘row’ and ‘column’ structures accessed when performing memory transfers. From this augmented representation, we can automatically derive DRAM controllers which buffer data in on-chip memory and transfer data in an efficient order. Buffering data and exploiting ‘reuse’ of data is shown to enable up to 50× reduction in the quantity of data transferred to external memory. 
Reordering memory transactions by exploiting knowledge of the physical layout of the DRAM device allows up to a 4× improvement in the efficiency of those data transfers.

Spiral - Imperial College Digital Repository

Computational Foundation for Multi-Contact Whole-Body Control of Humanoid Robots with Motion Planning in the Feedback Loop

Author: Caron Stephan
カロンステファン
Publication venue: Department of Mechano-Informatics, Graduate School of Information Science and Technology
Publication date: 24/03/2016

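Degree type: course-based doctorate. Examination committee: (Chair) Professor Yoshihiko Nakamura, University of Tokyo; Professor Isao Shimoyama, University of Tokyo; Professor Masayuki Inaba, University of Tokyo; Professor Yasuo Kuniyoshi, University of Tokyo; Associate Professor Wataru Takano, University of Tokyo; Senior Researcher Jean-Paul Laumond, LAAS-CNRS. University of Tokyo (東京大学)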

Iterative Schedule Optimization for Parallelization in the Polyhedron Model

Author: Ganser Stefan
Publication date: 17/03/2020

In high-performance computing, one primary objective is to exploit the performance that the given target hardware can deliver to the fullest. Compilers that have the ability to automatically optimize programs for a specific target hardware can be highly useful in this context. Iterative (or search-based) compilation requires little or no prior knowledge and can adapt more easily to concrete programs and target hardware than static cost models and heuristics. Thereby, iterative compilation helps in situations in which static heuristics do not reflect the combination of input program and target hardware well. Moreover, iterative compilation may enable the derivation of more accurate cost models and heuristics for optimizing compilers. In this context, the polyhedron model is of help as it provides not only a mathematical representation of programs but, more importantly, a uniform representation of complex sequences of program transformations by schedule functions. The latter facilitates the systematic exploration of the set of legal transformations of a given program. Early approaches to purely iterative schedule optimization in the polyhedron model do not limit their search to schedules that preserve program semantics and, thereby, suffer from the need to explore numbers of illegal schedules. More recent research ensures the legality of program transformations but presumes a sequential rather than a parallel execution of the transformed program. Other approaches do not perform a purely iterative optimization. We propose an approach to iterative schedule optimization for parallelization and tiling in the polyhedron model. Our approach targets loop programs that profit from data locality optimization and coarse-grained loop parallelization. The schedule search space can be explored either randomly or by means of a genetic algorithm. To determine a schedule's profitability, we rely primarily on measuring the transformed code's execution time. 
While benchmarking is accurate, it increases the time and resource consumption of program optimization tremendously and can even make it impractical. We address this limitation by proposing to learn surrogate models from schedules generated and evaluated in previous runs of the iterative optimization and to replace benchmarking by performance prediction to the extent possible. Our evaluation on the PolyBench 4.1 benchmark set reveals that, in a given setting, iterative schedule optimization yields significantly higher speedups in the execution of the program to be optimized. Surrogate performance models learned from training data that was generated during previous iterative optimizations can reduce the benchmarking effort without strongly impairing the optimization result. A prerequisite for this approach is a sufficient similarity between the training programs and the program to be optimized