368 research outputs found

    Polyhedral+Dataflow Graphs

    Get PDF
    This research presents an intermediate compiler representation that is designed for optimization, and emphasizes the temporary storage requirements and execution schedule of a given computation to guide optimization decisions. The representation is expressed as a dataflow graph that describes computational statements and data mappings within the polyhedral compilation model. The targeted applications include both the regular and irregular scientific domains. The intermediate representation can be integrated into existing compiler infrastructures. A specification language implemented as a domain specific language in C++ describes the graph components and the transformations that can be applied. The visual representation allows users to reason about optimizations. Graph variants can be translated into source code or other representation. The language, intermediate representation, and associated transformations have been applied to improve the performance of differential equation solvers, or sparse matrix operations, tensor decomposition, and structured multigrid methods

    Towards self-verification in finite difference code generation

    Get PDF
    Code generation from domain-specific languages is becoming increasingly popular as a method to obtain optimised low-level code that performs well on a given platform and for a given problem instance. Ensuring the correctness of generated codes is crucial. At the same time, testing or manual inspection of the code is problematic, as the generated code can be complex and hard to read. Moreover, the generated code may change depending on the problem type, domain size, or target platform, making conventional code review or testing methods impractical. As a solution, we propose the integration of formal verification tools into the code generation process. We present a case study in which the CIVL verification tool is combined with the Devito finite difference framework that generates optimised stencil code for PDE solvers from symbolic equations. We show a selection of properties of the generated code that can be automatically specified and verified during the code generation process. Our approach allowed us to detect a previously unknown bug in the Devito code generation tool

    Verified Code Generation for the Polyhedral Model

    Get PDF
    International audienceThe polyhedral model is a high-level intermediate representation for loop nests that supports elegantly a great many loop optimizations. In a compiler, after polyhedral loop optimizations have been performed, it is necessary and difficult to regenerate sequential or parallel loop nests before continuing compilation. This paper reports on the formalization and proof of semantic preservation of such a code generator that produces sequential code from a polyhedral representation. The formalization and proofs are mechanized using the Coq proof assistant

    Reach Set Approximation through Decomposition with Low-dimensional Sets and High-dimensional Matrices

    Full text link
    Approximating the set of reachable states of a dynamical system is an algorithmic yet mathematically rigorous way to reason about its safety. Although progress has been made in the development of efficient algorithms for affine dynamical systems, available algorithms still lack scalability to ensure their wide adoption in the industrial setting. While modern linear algebra packages are efficient for matrices with tens of thousands of dimensions, set-based image computations are limited to a few hundred. We propose to decompose reach set computations such that set operations are performed in low dimensions, while matrix operations like exponentiation are carried out in the full dimension. Our method is applicable both in dense- and discrete-time settings. For a set of standard benchmarks, it shows a speed-up of up to two orders of magnitude compared to the respective state-of-the art tools, with only modest losses in accuracy. For the dense-time case, we show an experiment with more than 10.000 variables, roughly two orders of magnitude higher than possible with previous approaches

    Polyhedral Compilation: Applications, Approximations and GPU-specific Optimizations

    Get PDF
    Polyhedral compilation has been successful in analyzing, optimizing, automatically parallelizing a�ne computations for modern heterogenous target architectures. Many of the tools have been developed to automate the process of program analysis and transformations for a�ne control parts of programs including widely used open-source and production compilers such as GCC, LLVM, IBM/XL. This thesis makes contribution to the polyhedral model in three orthogonal dimensions as follows: • Applications: Applies polyhedral loop transformations on Deep learning computation kernel to demonstrate the e�ectiveness of complex loop transformations on these kernels. • Approximations: Developes two efficient algorithms to over-approximate convex polyhedra into U-TVPI polyhedra having applications in polyhedral compilation as well as automated program verification. • GPU-Specific Optimizations: Builds end-to-end fully automatic compiler framework to generate cache optimized CUDA code begining from sequential C program by using polyhedral modelling techniques.

    IEGen: semi-automatic generation of inspectors and executors

    Get PDF
    Department Head: L. Darrell Whitley.Includes bibliographical references (pages 74-76).Software that simulates real-world phenomena such as heat transfer over surfaces and molecular interaction is often based on irregular computational kernels. Indirect array accesses such as A[B[i]] that are found in irregular computations often exhibit a memory access pattern that does not make efficient use of the memory hierarchy, reducing performance. Additionally, indirect array accesses hinder our ability to apply loop optimizations to improve data locality or introduce parallelism at compile time. One approach to solving this problem, an inspector/executor strategy, inspects the index arrays (B) at runtime to determine the order of accesses to the data arrays (A), reorders the data and index arrays, and executes a transformed computation that accesses the data arrays in a more efficient manner. For the most part, the application of inspector/executor strategies to irregular computations has been done manually or with limited generality. This thesis presents the Inspector/Executor Generator (IEGen). This tool accepts an irregular computation specification and sequence of run-time reordering transformations to apply to that computation as input. IEGen then generates serial inspector and executor code that implements the transformed computation. We contribute an inspector intermediate representation (IR) called an Inspector Dependence Graph (IDG), a method for code generation of inspectors based on an IDG, a method for code generation of executors, and techniques for manipulating affine constraints with uninterpreted function symbol (UFS) expressions to enable code generation. We evaluate our techniques against an existing library that supports limited UFS expressions and additionally show that generalized generation of inspectors and executors that implement composed run-time reordering transformations is possible

    Automatic Performance Optimization of Stencil Codes

    Get PDF
    A widely used class of codes are stencil codes. Their general structure is very simple: data points in a large grid are repeatedly recomputed from neighboring values. This predefined neighborhood is the so-called stencil. Despite their very simple structure, stencil codes are hard to optimize since only few computations are performed while a comparatively large number of values have to be accessed, i.e., stencil codes usually have a very low computational intensity. Moreover, the set of optimizations and their parameters also depend on the hardware on which the code is executed. To cut a long story short, current production compilers are not able to fully optimize this class of codes and optimizing each application by hand is not practical. As a remedy, we propose a set of optimizations and describe how they can be applied automatically by a code generator for the domain of stencil codes. A combination of a space and time tiling is able to increase the data locality, which significantly reduces the memory-bandwidth requirements: a standard three-dimensional 7-point Jacobi stencil can be accelerated by a factor of 3. This optimization can target basically any stencil code, while others are more specialized. E.g., support for arbitrary linear data layout transformations is especially beneficial for colored kernels, such as a Red-Black Gauss-Seidel smoother. On the one hand, an optimized data layout for such kernels reduces the bandwidth requirements while, on the other hand, it simplifies an explicit vectorization. Other noticeable optimizations described in detail are redundancy elimination techniques to eliminate common subexpressions both in a sequence of statements and across loop boundaries, arithmetic simplifications and normalizations, and the vectorization mentioned previously. In combination, these optimizations are able to increase the performance not only of the model problem given by Poisson’s equation, but also of real-world applications: an optical flow simulation and the simulation of a non-isothermal and non-Newtonian fluid flow
    corecore