Experiences with enumeration of integer projections of parametric polytopes
Many compiler optimization techniques depend on the ability to calculate the number of integer values that satisfy a given set of linear constraints. This count (the enumerator of a parametric polytope) is a function of the symbolic parameters that may appear in the constraints. In an extended problem (the "integer projection" of a parametric polytope), some of the variables that appear in the constraints may be existentially quantified and then the enumerated set corresponds to the projection of the integer points in a parametric polytope.
This paper shows how to reduce the enumeration of the integer projection of parametric polytopes to the enumeration of parametric polytopes. Two approaches are described and experimentally compared. Both can solve problems that were considered very difficult to solve analytically.
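To make the notion of an enumerator concrete, here is a toy sketch (illustrative only, not from the paper): for the parametric triangle {(i, j) : 0 ≤ j ≤ i ≤ N − 1}, the enumerator is the polynomial N(N + 1)/2 in the parameter N, which a brute-force count confirms:

```python
def count_points(n):
    """Brute-force count of integer points in the parametric
    triangle {(i, j) : 0 <= j <= i <= n - 1}."""
    return sum(1 for i in range(n) for j in range(i + 1))

def enumerator(n):
    """Closed-form enumerator of the same polytope: a polynomial in n."""
    return n * (n + 1) // 2

# The symbolic enumerator agrees with explicit counting for every n.
for n in range(20):
    assert count_points(n) == enumerator(n)
```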
Integer polyhedra for program analysis
Polyhedra are widely used in model checking and abstract interpretation. Polyhedral analysis is effective when the relationships between variables are linear, but suffers from imprecision when it is necessary to take into account the integrality of the represented space. Imprecision also arises when non-linear constraints occur. Moreover, in terms of tractability, even a space defined by linear constraints can become unmanageable owing to the excessive number of inequalities. Thus it is useful to identify those inequalities whose omission has least impact on the represented space. This paper shows how these issues can be addressed in a novel way by growing the integer hull of the space and approximating the number of integral points within a bounded polyhedron.
Integer Affine Transformations of Parametric Z-polytopes and Applications to Loop Nest Optimization
The polyhedral model is a well-known compiler optimization framework for the analysis and transformation of affine loop nests. We present a new method concerning a difficult geometric operation that is raised by this model: the integer affine transformation of parametric Z-polytopes. The result of such a transformation is given by a worst-case exponential union of Z-polytopes. We also propose a polynomial algorithm (for fixed dimension), to count points in arbitrary unions of a fixed number of parametric Z-polytopes. We implemented these algorithms and compared them to other existing algorithms, for a set of applications to loop nest analysis and optimization.
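Counting over a union can be illustrated, in a heavily simplified form, by inclusion–exclusion over two axis-aligned integer boxes (a hypothetical toy case, not the paper's polynomial algorithm):

```python
def count_box(lo, hi):
    """Integer points in the box [lo[0], hi[0]] x [lo[1], hi[1]]."""
    return max(0, hi[0] - lo[0] + 1) * max(0, hi[1] - lo[1] + 1)

def count_union(a, b):
    """|A ∪ B| = |A| + |B| - |A ∩ B| for two integer boxes A and B."""
    (alo, ahi), (blo, bhi) = a, b
    ilo = (max(alo[0], blo[0]), max(alo[1], blo[1]))   # the intersection
    ihi = (min(ahi[0], bhi[0]), min(ahi[1], bhi[1]))   # is again a box
    return count_box(alo, ahi) + count_box(blo, bhi) - count_box(ilo, ihi)
```

For general unions of Z-polytopes the intersections are themselves Z-polytopes, so the same inclusion–exclusion idea applies, at the cost of exponentially many terms in the number of sets.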
On lattice point counting in $\Delta$-modular polyhedra
Let a polyhedron $P$ be defined in one of the following ways:
(i) $P = \{x \in \mathbb{R}^n : A x \leq b\}$, where $A \in \mathbb{Z}^{(n+k) \times n}$, $b \in \mathbb{Z}^{n+k}$, and $\operatorname{rank}(A) = n$;
(ii) $P = \{x \in \mathbb{R}^n_{\geq 0} : A x = b\}$, where $A \in \mathbb{Z}^{k \times n}$ and $b \in \mathbb{Z}^{k}$.
And let all rank-order minors of $A$ be bounded by $\Delta$ in absolute value. We show that the short rational generating function for the power series $\sum_{m \in P \cap \mathbb{Z}^n} x^m$ can be computed with an arithmetic complexity that, for fixed $k$ and $\Delta$, is polynomial in $d$ and $T_{SNF}(d)$, where $d = \dim(P)$ and $T_{SNF}(d)$ is the complexity of computing the Smith Normal Form of a $d \times d$ integer matrix. In particular, $d = n$ for the case (i) and $d = n - k$ for the case (ii).
The simplest examples of polyhedra that meet conditions (i) or (ii) are simplices, the subset-sum polytope, and the knapsack or multidimensional knapsack polytopes.
We apply these results to parametric polytopes and show that the step-polynomial representation of the counting function $|P \cap \mathbb{Z}^n|$, where $P$ is a parametric polytope, can be computed in polynomial time even in varying dimension if $P$ has a structure close to the cases (i) or (ii). As another consequence, we show that the coefficients of the Ehrhart quasi-polynomial can be computed by a polynomial-time algorithm for fixed $k$ and $\Delta$.
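A standard toy example of an Ehrhart quasi-polynomial (illustrative, not from the paper): for the triangle $P = \{(x, y) : x, y \geq 0,\; 2x + 2y \leq 1\}$, the number of integer points in the dilation $tP$ is a quasi-polynomial whose coefficients depend on $t \bmod 2$:

```python
def dilated_count(t):
    """Integer points in t*P for P = {(x, y) : x >= 0, y >= 0, 2x + 2y <= 1},
    counted by brute force: points with x, y >= 0 and 2x + 2y <= t."""
    return sum(1 for x in range(t + 1) for y in range(t + 1)
               if 2 * x + 2 * y <= t)

def ehrhart(t):
    """The same count as a quasi-polynomial: its value is a polynomial
    in floor(t/2), so the coefficients in t are periodic with period 2."""
    k = t // 2                     # the periodic ingredient floor(t/2)
    return (k + 1) * (k + 2) // 2

for t in range(30):
    assert dilated_count(t) == ehrhart(t)
```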
Refactoring intermediately executed code to reduce cache capacity misses
The growing memory wall requires that more attention is given to the data cache behavior of programs. In this paper, attention is given to the capacity misses, i.e., the misses that occur because the cache size is smaller than the data footprint between the use and the reuse of the same data. The data footprint is measured with the reuse distance metric, by counting the distinct memory locations accessed between use and reuse. For reuse distances larger than the cache size, the associated code needs to be refactored in a way that reduces the reuse distance to below the cache size so that the capacity misses are eliminated. In a number of simple loops, the reuse distance can be calculated analytically. However, in most cases profiling is needed to pinpoint the areas where the program needs to be transformed for better data locality. This is achieved by the reuse distance visualizer, RDVIS, which shows the intermediately executed code for critical data reuses. In addition, another tool, SLO, annotates the source program with suggestions for locality optimization. Both tools have been used to analyze and to refactor a number of SPEC2000 benchmark programs with very positive results.
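The reuse distance metric can be sketched in a few lines (an illustrative model, not the RDVIS/SLO implementation): for each access, count the distinct addresses touched since the previous access to the same address; in a fully associative LRU cache of C lines, a reuse is a capacity miss exactly when its distance is at least C:

```python
def reuse_distances(trace):
    """For each access in the trace, the number of distinct memory locations
    accessed since the previous access to the same location, or None on the
    first use. In a fully associative LRU cache of C lines, a reuse is a
    capacity miss exactly when its distance is >= C."""
    last_pos = {}
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_pos:
            distances.append(len(set(trace[last_pos[addr] + 1:i])))
        else:
            distances.append(None)
        last_pos[addr] = i
    return distances
```

For the trace a b c b a, the reuse of b has distance 1 and the reuse of a has distance 2, so a 2-line cache would keep b but miss on a.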
Systematic Design Methods for Efficient Off-Chip DRAM Access
Typical design flows for digital hardware take, as their input, an abstract description of computation and data transfer between logical memories. No existing commercial high-level synthesis tool demonstrates the ability to map logical memory inferred from a high-level language to external memory resources. This thesis develops techniques for doing this, specifically targeting off-chip dynamic memory (DRAM) devices. These are a commodity technology in widespread use with standardised interfaces. In use, the bandwidth of an external memory interface and the latency of memory requests asserted on it may become the bottleneck limiting the performance of a hardware design. Careful consideration of this is especially important when designing with DRAMs, whose latency and bandwidth characteristics depend upon the sequence of memory requests issued by a controller.
Throughout the work presented here, we pursue exact compile-time methods for designing application-specific memory systems, with a focus on guaranteeing predictable performance through static analysis. This contrasts with much of the surveyed existing work, which considers general-purpose memory controllers and optimized policies that improve performance in experiments run using simulation of suites of benchmark codes.
The work targets loop nests within imperative source code, extracting a mathematical representation of the loop-nest statements and their associated memory accesses, referred to as the ‘Polytope Model’. We extend this mathematical representation to represent the physical DRAM ‘row’ and ‘column’ structures accessed when performing memory transfers. From this augmented representation, we can automatically derive DRAM controllers which buffer data in on-chip memory and transfer data in an efficient order. Buffering data and exploiting ‘reuse’ of data is shown to enable up to a 50× reduction in the quantity of data transferred to external memory. Reordering memory transactions using knowledge of the physical layout of the DRAM device allows up to a 4× improvement in the efficiency of those data transfers.
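The effect of request reordering on DRAM efficiency can be sketched with a toy open-page model (the address mapping below, one 1 KiB row addressed by `addr >> 10`, is a hypothetical simplification, not the thesis's controller):

```python
def row_activations(addresses, row_bits=10):
    """Row activations under a simple open-page policy: a new row is
    activated whenever a request's row differs from the currently open
    row. Row index = addr >> row_bits (hypothetical 1 KiB rows)."""
    activations, open_row = 0, None
    for addr in addresses:
        row = addr >> row_bits
        if row != open_row:
            activations += 1
            open_row = row
    return activations

# Column-major traversal of a 32x32 array of 4-byte words revisits its four
# 1 KiB rows over and over; sorting the same requests opens each row once.
ROWS, COLS = 32, 32
addr_of = lambda r, c: (r * COLS + c) * 4
column_major = [addr_of(r, c) for c in range(COLS) for r in range(ROWS)]
reordered = sorted(column_major)
```

Under this model the column-major order triggers 128 activations against 4 for the reordered schedule, which is the kind of gap a layout-aware controller exploits.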
Mini-Workshop: Ehrhart-Quasipolynomials: Algebra, Combinatorics, and Geometry
[no abstract available]
Faster Integer Points Counting in Parametric Polyhedra
In this paper, we consider the counting function $\mathcal{E}_P(y)$ that maps a parameter vector $y$ to the number of integer points in a parametric polyhedron $P_y$. We give a new representation of $\mathcal{E}_P(y)$, called a \emph{piece-wise step-polynomial with periodic coefficients}, which is a generalization of piece-wise step-polynomials and integer/rational Ehrhart's quasi-polynomials. It gives the fastest known way to calculate $\mathcal{E}_P(y)$ in certain scenarios. The most important cases are the following:
1) We show that, for a parametric polyhedron defined by a standard-form system with a fixed number of equalities, the function $\mathcal{E}_P(y)$ can be represented by a polynomial-time computable function. In turn, such a representation of $\mathcal{E}_P(y)$ can itself be constructed in polynomial time;
2) Assuming again that the number of equalities is fixed, we show that integer/rational Ehrhart's quasi-polynomials of a polytope can be computed by FPT-algorithms, parameterized by the sub-determinants of the constraint matrix or its elements;
3) Our representation of $\mathcal{E}_P(y)$ is more efficient than other known approaches if the constraint matrix has bounded elements, especially if it is sparse in addition.
Additionally, we provide a discussion about possible applications in the area of compiler optimization. Under some "natural" assumptions on a program's code, our approach has the fastest known complexity bounds.
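A minimal example of a step polynomial with periodic coefficients (illustrative, not from the paper): the function counting integers $x$ with $0 \leq 2x \leq y$ equals $\lfloor y/2 \rfloor + 1 = (y - (y \bmod 2))/2 + 1$, a polynomial in $y$ whose constant term depends on $y \bmod 2$:

```python
def count(y):
    """Brute-force count of integers x with 0 <= 2x <= y."""
    return sum(1 for x in range(y + 1) if 2 * x <= y)

def step_poly(y):
    """The same function written as a step polynomial: a polynomial in y
    whose coefficients are periodic in y mod 2."""
    r = y % 2                    # periodic coefficient
    return (y - r) // 2 + 1      # i.e. floor(y / 2) + 1

for y in range(40):
    assert count(y) == step_poly(y)
```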
Iterative Schedule Optimization for Parallelization in the Polyhedron Model
In high-performance computing, one primary objective is to exploit the performance that the given target hardware can deliver to the fullest. Compilers that have the ability to automatically optimize programs for a specific target hardware can be highly useful in this context. Iterative (or search-based) compilation requires little or no prior knowledge and can adapt more easily to concrete programs and target hardware than static cost models and heuristics. Thereby, iterative compilation helps in situations in which static heuristics do not reflect the combination of input program and target hardware well. Moreover, iterative compilation may enable the derivation of more accurate cost models and heuristics for optimizing compilers. In this context, the polyhedron model is of help as it provides not only a mathematical representation of programs but, more importantly, a uniform representation of complex sequences of program transformations by schedule functions. The latter facilitates the systematic exploration of the set of legal transformations of a given program.
Early approaches to purely iterative schedule optimization in the polyhedron model do not limit their search to schedules that preserve program semantics and thereby suffer from the need to explore large numbers of illegal schedules. More recent research ensures the legality of program transformations but presumes a sequential rather than a parallel execution of the transformed program. Other approaches do not perform a purely iterative optimization.
We propose an approach to iterative schedule optimization for parallelization and tiling in the polyhedron model. Our approach targets loop programs that profit from data locality optimization and coarse-grained loop parallelization. The schedule search space can be explored either randomly or by means of a genetic algorithm.
To determine a schedule's profitability, we rely primarily on measuring the transformed code's execution time. While benchmarking is accurate, it increases the time and resource consumption of program optimization tremendously and can even make it impractical. We address this limitation by proposing to learn surrogate models from schedules generated and evaluated in previous runs of the iterative optimization and to replace benchmarking by performance prediction to the extent possible.
Our evaluation on the PolyBench 4.1 benchmark set reveals that, in a given setting, iterative schedule optimization yields significantly higher speedups in the execution of the program to be optimized. Surrogate performance models learned from training data that was generated during previous iterative optimizations can reduce the benchmarking effort without strongly impairing the optimization result. A prerequisite for this approach is a sufficient similarity between the training programs and the program to be optimized.