333 research outputs found

    Experiences with enumeration of integer projections of parametric polytopes

    Many compiler optimization techniques depend on the ability to calculate the number of integer values that satisfy a given set of linear constraints. This count (the enumerator of a parametric polytope) is a function of the symbolic parameters that may appear in the constraints. In an extended problem (the "integer projection" of a parametric polytope), some of the variables that appear in the constraints may be existentially quantified and then the enumerated set corresponds to the projection of the integer points in a parametric polytope. This paper shows how to reduce the enumeration of the integer projection of parametric polytopes to the enumeration of parametric polytopes. Two approaches are described and experimentally compared. Both can solve problems that were considered very difficult to solve analytically.
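To make the difference between enumerating a parametric polytope and enumerating its integer projection concrete, here is a minimal brute-force sketch. The polytope `{(i, j) : 0 <= j <= n, j <= i <= 2j}` and all function names are illustrative assumptions, not taken from the paper, and the paper's two approaches are analytic rather than brute-force.

```python
def count_pairs(n: int) -> int:
    """Count all integer points (i, j) with 0 <= j <= n and j <= i <= 2*j."""
    return sum(1 for j in range(n + 1) for i in range(j, 2 * j + 1))


def count_projection(n: int) -> int:
    """Count the integers i for which SOME integer j satisfies the constraints.

    Here j is existentially quantified, so an i that has several
    witnesses j is still counted only once.
    """
    return len({i for j in range(n + 1) for i in range(j, 2 * j + 1)})


def pairs_closed_form(n: int) -> int:
    """Enumerator of the polytope as a function of the parameter n."""
    return (n + 1) * (n + 2) // 2


def projection_closed_form(n: int) -> int:
    """Enumerator of the integer projection onto i (valid for n >= 0)."""
    return 2 * n + 1


if __name__ == "__main__":
    for n in range(6):
        print(n, count_pairs(n), count_projection(n))
```

The two closed forms differ because the projection collapses every `i` with multiple witnesses `j` into a single point; computing such closed forms symbolically, without enumerating, is the hard part that the abstract refers to.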

    Integer polyhedra for program analysis

    Polyhedra are widely used in model checking and abstract interpretation. Polyhedral analysis is effective when the relationships between variables are linear, but suffers from imprecision when it is necessary to take into account the integrality of the represented space. Imprecision also arises when non-linear constraints occur. Moreover, in terms of tractability, even a space defined by linear constraints can become unmanageable owing to the excessive number of inequalities. Thus, it is useful to identify those inequalities whose omission has the least impact on the represented space. This paper shows how these issues can be addressed in a novel way by growing the integer hull of the space and approximating the number of integral points within a bounded polyhedron.

    Integer Affine Transformations of Parametric Z-polytopes and Applications to Loop Nest Optimization

    The polyhedral model is a well-known compiler optimization framework for the analysis and transformation of affine loop nests. We present a new method concerning a difficult geometric operation that is raised by this model: the integer affine transformation of parametric Z-polytopes. The result of such a transformation is given by a worst-case exponential union of Z-polytopes. We also propose a polynomial algorithm (for fixed dimension) to count points in arbitrary unions of a fixed number of parametric Z-polytopes. We implemented these algorithms and compared them to other existing algorithms, for a set of applications to loop nest analysis and optimization.

    On lattice point counting in $\Delta$-modular polyhedra

    Let a polyhedron $P$ be defined in one of the following ways: (i) $P = \{x \in R^n \colon A x \leq b\}$, where $A \in Z^{(n+k) \times n}$, $b \in Z^{(n+k)}$ and $rank\, A = n$; (ii) $P = \{x \in R_+^n \colon A x = b\}$, where $A \in Z^{k \times n}$, $b \in Z^{k}$ and $rank\, A = k$. Let all rank-order minors of $A$ be bounded by $\Delta$ in absolute value. We show that the short rational generating function for the power series $\sum\limits_{m \in P \cap Z^n} x^m$ can be computed with the arithmetic complexity $O\left(T_{SNF}(d) \cdot d^{k} \cdot d^{\log_2 \Delta}\right)$, where $k$ and $\Delta$ are fixed, $d = \dim P$, and $T_{SNF}(m)$ is the complexity of computing the Smith Normal Form of an $m \times m$ integer matrix. In particular, $d = n$ for case (i) and $d = n-k$ for case (ii). The simplest examples of polyhedra that meet conditions (i) or (ii) are simplices, the subset sum polytope, and the knapsack or multidimensional knapsack polytopes. We apply these results to parametric polytopes and show that the step polynomial representation of the function $c_P(y) = |P_{y} \cap Z^n|$, where $P_{y}$ is a parametric polytope, can be computed in polynomial time even in varying dimension if $P_{y}$ has a structure close to cases (i) or (ii). As another consequence, we show that the coefficients $e_j(P,m)$ of the Ehrhart quasi-polynomial $\left| mP \cap Z^n\right| = \sum\limits_{j = 0}^n e_j(P,m)m^j$ can be computed by a polynomial-time algorithm for fixed $k$ and $\Delta$.
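As a small worked instance of an Ehrhart quasi-polynomial, the sketch below takes the one-dimensional rational polytope `P = [0, 1/2]`, an illustrative choice that does not come from the paper. Its dilates satisfy `|mP ∩ Z| = floor(m/2) + 1`, which can be written as `(1/2)·m + e_0(m)` with a constant term that is periodic in `m` with period 2. The function names are hypothetical, and the brute-force count only serves to check the formula; it has nothing to do with the complexity bounds stated in the abstract.

```python
from fractions import Fraction


def count_dilate(m: int) -> int:
    """Brute-force |mP ∩ Z| for P = [0, 1/2]: integers x with 0 <= 2x <= m."""
    return sum(1 for x in range(m + 1) if 2 * x <= m)


def ehrhart_quasi_polynomial(m: int) -> Fraction:
    """Evaluate (1/2)*m + e_0(m), where e_0 has period 2 in m."""
    e1 = Fraction(1, 2)                                  # leading coefficient: length of P
    e0 = Fraction(1) if m % 2 == 0 else Fraction(1, 2)   # periodic constant term
    return e1 * m + e0


if __name__ == "__main__":
    for m in range(8):
        print(m, count_dilate(m), ehrhart_quasi_polynomial(m))
```

Exact rationals (`Fraction`) are used so that the half-integer coefficients are compared with the integer counts without floating-point rounding.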

    Refactoring intermediately executed code to reduce cache capacity misses

    The growing memory wall requires that more attention is given to the data cache behavior of programs. In this paper, attention is given to the capacity misses, i.e., the misses that occur because the cache size is smaller than the data footprint between the use and the reuse of the same data. The data footprint is measured with the reuse distance metric, by counting the distinct memory locations accessed between use and reuse. For reuse distances larger than the cache size, the associated code needs to be refactored in a way that reduces the reuse distance to below the cache size so that the capacity misses are eliminated. In a number of simple loops, the reuse distance can be calculated analytically. However, in most cases profiling is needed to pinpoint the areas where the program needs to be transformed for better data locality. This is achieved by the reuse distance visualizer, RDVIS, which shows the intermediately executed code for critical data reuses. In addition, another tool, SLO, annotates the source program with suggestions for locality optimization. Both tools have been used to analyze and to refactor a number of SPEC2000 benchmark programs with very positive results.
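The reuse distance metric described above can be sketched directly from its definition: for each access, count the distinct locations touched since the previous access to the same location. This is a naive quadratic-time illustration, not the RDVIS or SLO implementation. The function names are hypothetical, and the capacity-miss check assumes a fully associative LRU cache measured in whole locations.

```python
from typing import Hashable, List, Optional, Sequence


def reuse_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """Return the reuse distance of every access in the trace.

    The distance is the number of DISTINCT locations accessed strictly
    between an access and the previous access to the same location;
    first-time accesses get None.
    """
    last_seen = {}
    distances: List[Optional[int]] = []
    for pos, addr in enumerate(trace):
        if addr in last_seen:
            between = trace[last_seen[addr] + 1 : pos]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[addr] = pos
    return distances


def capacity_misses(trace: Sequence[Hashable], cache_lines: int) -> int:
    """Count reuses that miss in a fully associative LRU cache (assumption):
    a reuse hits only if fewer than cache_lines distinct locations intervene."""
    return sum(1 for d in reuse_distances(trace) if d is not None and d >= cache_lines)


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "b"]
    print(reuse_distances(trace))       # [None, None, None, 2, 2, 0]
    print(capacity_misses(trace, 2))    # 2
```

Refactoring in the sense of the abstract would aim to reorder the accesses so that the long distances fall below the cache size.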

    Systematic Design Methods for Efficient Off-Chip DRAM Access

    Typical design flows for digital hardware take, as their input, an abstract description of computation and data transfer between logical memories. No existing commercial high-level synthesis tool demonstrates the ability to map logical memory inferred from a high level language to external memory resources. This thesis develops techniques for doing this, specifically targeting off-chip dynamic memory (DRAM) devices. These are a commodity technology in widespread use with standardised interfaces. In use, the bandwidth of an external memory interface and the latency of memory requests asserted on it may become the bottleneck limiting the performance of a hardware design. Careful consideration of this is especially important when designing with DRAMs, whose latency and bandwidth characteristics depend upon the sequence of memory requests issued by a controller. Throughout the work presented here, we pursue exact compile-time methods for designing application-specific memory systems with a focus on guaranteeing predictable performance through static analysis. This contrasts with much of the surveyed existing work, which considers general purpose memory controllers and optimized policies which improve performance in experiments run using simulation of suites of benchmark codes. The work targets loop-nests within imperative source code, extracting a mathematical representation of the loop-nest statements and their associated memory accesses, referred to as the ‘Polytope Model’. We extend this mathematical representation to represent the physical DRAM ‘row’ and ‘column’ structures accessed when performing memory transfers. From this augmented representation, we can automatically derive DRAM controllers which buffer data in on-chip memory and transfer data in an efficient order. Buffering data and exploiting ‘reuse’ of data is shown to enable up to 50× reduction in the quantity of data transferred to external memory. 
    Reordering memory transactions using knowledge of the physical layout of the DRAM device allows up to a 4× improvement in the efficiency of those data transfers.

    Mini-Workshop: Ehrhart-Quasipolynomials: Algebra, Combinatorics, and Geometry

    [no abstract available]

    Faster Integer Points Counting in Parametric Polyhedra

    In this paper, we consider the counting function $E_P(y) = |P_{y} \cap Z^{n_x}|$ for a parametric polyhedron $P_{y} = \{x \in R^{n_x} \colon A x \leq b + B y\}$, where $y \in R^{n_y}$. We give a new representation of $E_P(y)$, called a piece-wise step-polynomial with periodic coefficients, which is a generalization of piece-wise step-polynomials and integer/rational Ehrhart's quasi-polynomials. It gives the fastest way to calculate $E_P(y)$ in certain scenarios. The most important cases are the following: 1) We show that, for the parametric polyhedron $P_y$ defined by a standard-form system $A x = y,\, x \geq 0$ with a fixed number of equalities, the function $E_P(y)$ can be represented by a polynomial-time computable function. In turn, such a representation of $E_P(y)$ can be constructed by a $poly\bigl(n, \|A\|_{\infty}\bigr)$-time algorithm; 2) Assuming again that the number of equalities is fixed, we show that integer/rational Ehrhart's quasi-polynomials of a polytope can be computed by FPT-algorithms, parameterized by sub-determinants of $A$ or its elements; 3) Our representation of $E_P$ is more efficient than other known approaches if $A$ has bounded elements, especially if it is also sparse. Additionally, we discuss possible applications in the area of compiler optimization. Under some "natural" assumptions on the program code, our approach has the fastest complexity bounds.

    Iterative Schedule Optimization for Parallelization in the Polyhedron Model

    In high-performance computing, one primary objective is to exploit the performance that the given target hardware can deliver to the fullest. Compilers that have the ability to automatically optimize programs for a specific target hardware can be highly useful in this context. Iterative (or search-based) compilation requires little or no prior knowledge and can adapt more easily to concrete programs and target hardware than static cost models and heuristics. Thereby, iterative compilation helps in situations in which static heuristics do not reflect the combination of input program and target hardware well. Moreover, iterative compilation may enable the derivation of more accurate cost models and heuristics for optimizing compilers. In this context, the polyhedron model is of help as it provides not only a mathematical representation of programs but, more importantly, a uniform representation of complex sequences of program transformations by schedule functions. The latter facilitates the systematic exploration of the set of legal transformations of a given program. Early approaches to purely iterative schedule optimization in the polyhedron model do not limit their search to schedules that preserve program semantics and, thereby, suffer from the need to explore numbers of illegal schedules. More recent research ensures the legality of program transformations but presumes a sequential rather than a parallel execution of the transformed program. Other approaches do not perform a purely iterative optimization. We propose an approach to iterative schedule optimization for parallelization and tiling in the polyhedron model. Our approach targets loop programs that profit from data locality optimization and coarse-grained loop parallelization. The schedule search space can be explored either randomly or by means of a genetic algorithm. To determine a schedule's profitability, we rely primarily on measuring the transformed code's execution time. 
    While benchmarking is accurate, it increases the time and resource consumption of program optimization tremendously and can even make it impractical. We address this limitation by proposing to learn surrogate models from schedules generated and evaluated in previous runs of the iterative optimization and to replace benchmarking by performance prediction to the extent possible. Our evaluation on the PolyBench 4.1 benchmark set reveals that, in a given setting, iterative schedule optimization yields significantly higher speedups in the execution of the program to be optimized. Surrogate performance models learned from training data that was generated during previous iterative optimizations can reduce the benchmarking effort without strongly impairing the optimization result. A prerequisite for this approach is a sufficient similarity between the training programs and the program to be optimized.