857 research outputs found

Near-optimal loop tiling by means of cache miss equations and genetic algorithms

Author: Abella Ferrer Jaume
González Colás Antonio María
Llosa Espuny José Francisco
Vera Rivera Francisco Javier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

The effectiveness of the memory hierarchy is critical for the performance of current processors. The performance of the memory hierarchy can be improved by means of program transformations such as loop tiling, a code transformation that targets capacity misses. This paper presents a novel systematic approach to near-optimal loop tiling based on an accurate data locality analysis (cache miss equations) and a powerful technique for searching the solution space based on a genetic algorithm. The results show that this approach can remove practically all capacity misses for all considered benchmarks. The reduction of replacement misses results in a decrease of the miss ratio that can be as significant as a factor of 7 for the matrix multiply kernel. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC
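
As a rough illustration of the loop tiling transformation the abstract above refers to, the sketch below tiles the matrix multiply kernel in plain Python. It is not the paper's method: the tile size is passed in by hand rather than chosen with cache miss equations and a genetic algorithm, and the names `matmul_naive`, `matmul_tiled`, and `tile` are assumptions made for this example.

```python
def matmul_naive(a, b):
    """Reference triple loop: C = A * B for lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=4):
    """Same computation, with each loop split into a tile loop and a point loop.

    The outer three loops walk over tile origins; the inner three loops stay
    inside one tile, so the working set touched between reuses is bounded by
    the (hypothetical, hand-picked) tile size.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # min(...) clamps partial tiles at the matrix boundary.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += a_ik * b[k][j]
    return c
```

In an interpreted language the cache effect itself is hard to observe, so the sketch only shows the loop structure and that the result is unchanged for any positive tile size.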

On Multiphase-Linear Ranking Functions

Author: A Podelski
A Schrijver
AM Ben-Amram
AR Bradley
B Cook
C Alias
E Albert
HY Chen
J Leike
J Leroux
J Ouaknine
M Brockschmidt
M Harrison
MA Colón
P Feautrier
P Ganty
Publication date: 23/03/2017

Multiphase ranking functions ($\mathit{M{\Phi}RFs}$) were proposed as a means to prove the termination of a loop in which the computation progresses through a number of "phases", and the progress of each phase is described by a different linear ranking function. Our work provides new insights regarding such functions for loops described by a conjunction of linear constraints (single-path loops). We provide a complete polynomial-time solution to the problem of existence and of synthesis of $\mathit{M{\Phi}RFs}$ of bounded depth (number of phases), when variables range over rational or real numbers; a complete solution for the (harder) case that variables are integer, with a matching lower-bound proof, showing that the problem is coNP-complete; and a new theorem which bounds the number of iterations for loops with $\mathit{M{\Phi}RFs}$. Surprisingly, the bound is linear, even when the variables involved change in a non-linear way. We also consider a type of lexicographic ranking functions, $\mathit{LLRFs}$, more expressive than types of lexicographic functions for which complete solutions have been given so far. We prove that for the above type of loops, lexicographic functions can be reduced to $\mathit{M{\Phi}RFs}$, and thus the questions of complexity of detection and synthesis, and of resulting iteration bounds, are also answered for this class. Comment: typos corrected.

arXiv.org e-Print Archive

Crossref
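
To make the idea of "phases" in the abstract above more concrete, here is a small sketch that runs one single-path loop and checks, along a concrete trace, that a two-component candidate behaves like a multiphase ranking function. The loop `while x >= 0: x, y = x + y, y - 1`, the candidate `(y + 1, x)`, and the names `run_loop`, `RANKING`, and `check_trace` are assumptions chosen for illustration. The check is a simplified runtime test on one execution, not the paper's synthesis algorithm and not a proof of termination.

```python
def run_loop(x, y):
    """Yield (state, next_state) for: while x >= 0: x, y = x + y, y - 1."""
    while x >= 0:
        nxt = (x + y, y - 1)
        yield (x, y), nxt
        x, y = nxt


# Candidate components, one linear function per phase (an assumption).
RANKING = [
    lambda x, y: y + 1,  # phase 0: y counts down while x may still grow
    lambda x, y: x,      # phase 1: once y is negative enough, x decreases
]


def check_trace(x, y, fs=RANKING):
    """Return (ok, phases) for the trace starting at (x, y).

    For each executed transition, pick the first component that is
    non-negative in the current state and require it to decrease by at
    least 1. `phases` records which component was used at each step.
    """
    phases = []
    for state, nxt in run_loop(x, y):
        i = next((k for k, f in enumerate(fs) if f(*state) >= 0), None)
        if i is None or fs[i](*state) - fs[i](*nxt) < 1:
            return False, phases
        phases.append(i)
    return True, phases


ok, phases = check_trace(3, 2)
print(ok, phases)
```

Note that `x` grows during the first phase, so no single linear function of the state decreases on every step of this trace; splitting the argument into two phases is what makes it go through.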

Symbolic and analytic techniques for resource analysis of Java bytecode

Author: Aspinall David
Atkey Robert
MacKenzie Kenneth
Sannella Donald
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Recent work in resource analysis has translated the idea of amortised resource analysis to imperative languages using a program logic that allows mixing of assertions about heap shapes, in the tradition of separation logic, and assertions about consumable resources. Separately, polyhedral methods have been used to calculate bounds on numbers of iterations in loop-based programs. We are attempting to combine these ideas to deal with Java programs involving both data structures and loops, focusing on the bytecode level rather than on source code.

CiteSeerX

University of Strathclyde Institutional Repository

Edinburgh Research Explorer

From Loop Transformation to Hardware Generation

Author: BEYLS K
CHRISTIAENS M
Devos Harald
Stroobandt Dirk
Van Campenhout Jan
Publication venue: Veldhoven
Publication date: 01/01/2006

Ghent University Academic Bibliography

Succinct Representations for Abstract Interpretation

Author: A. Miné
B. Dutertre
B. Jeannet
C. Lattner
D. Gopan
D. Monniaux
L. Gonnord
L. de Moura
N. Halbwachs
P. Cousot
R. Bagnara
R. Sharma
T. Gawlitza
X. Rival
Publication date: 01/01/2012

Abstract interpretation techniques can be made more precise by distinguishing paths inside loops, at the expense of possibly exponential complexity. SMT-solving techniques and sparse representations of paths and sets of paths avoid this pitfall. We improve previously proposed techniques for guided static analysis and the generation of disjunctive invariants by combining them with techniques for succinct representations of paths and symbolic representations for transitions based on static single assignment. Because of the non-monotonicity of the results of abstract interpretation with widening operators, it is difficult to conclude that some abstraction is more precise than another based on theoretical local precision results. We thus conducted extensive comparisons between our new techniques and previous ones, on a variety of open-source packages. Comment: Static Analysis Symposium (SAS), Deauville, France (2012).

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

Loo.py: transformation-based code generation for GPUs and CPUs

Author: Asanovic K.
Ellson J.
Rubinsteyn A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/05/2014

Today's highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but mathematically equivalent ways, with, in the worst case, one variant per target machine. Loo.py, a programming system embedded in Python, meets this challenge by defining a data model for array-style computations and a library of transformations that operate on this model. Offering transformations such as loop tiling, vectorization, storage management, unrolling, instruction-level parallelism, change of data layout, and many more, it provides a convenient way to capture, parametrize, and re-unify the growth among code variants. Optional, deep integration with numpy and PyOpenCL provides a convenient computing environment where the transition from prototype to high-performance implementation can occur in a gradual, machine-assisted form.

arXiv.org e-Print Archive

Crossref

Parameterized and multi-level tiled loop generation

Author: Kim DaeGon
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2010

Department Head: L. Darrell Whitley. 2010 Summer. Includes bibliographical references.

Tiling is a loop transformation that decomposes computations into a set of smaller computation blocks. The transformation has proven useful for many high-level program optimizations, such as data locality optimization and exploiting coarse-grained parallelism, and crucial for architectures with limited resources, such as embedded systems, GPUs, and the Cell architecture. Data locality and parallelism will continue to serve as major vehicles for achieving high performance on modern architectures in the multi-core era. In parameterized tiling the size of blocks is not fixed at compile time but remains a symbolic constant so that it can be selected or changed even at runtime. Parameterized tiled loops facilitate iterative and runtime optimizations, such as iterative compilation, auto-tuning and dynamic program adaptation. In this dissertation we present a collection of techniques for generating parameterized and multi-level tiled loops from affine control loops and their parallelization. The tiled loop generation problem, even for perfectly nested loops, has been believed to have exponential time complexity due to heavy machinery like Fourier-Motzkin elimination. Disproving this decade-long belief, we provide a simple technique for generating tiled loop nests even from imperfectly nested loops. Our technique for perfectly nested loops consists of only syntactic processing that is applied only once and independently to each loop bound. Our approach to imperfectly nested loops is composed of a direct extension of the tiled code generation technique for perfectly nested loops and three simple optimizations on the resulting parameterized tiled loops. The generation as well as the optimizations are achieved with purely syntactic processing, hence loop generation time remains negligible. We also present three schemes for multi-level tiling, where tiling is applied more than once. All the schemes are scalable with respect to the number of tiling levels and can be combined to achieve better performance. To facilitate parallelization of parameterized tiled loops, we generate outermost tile-loops that are perfectly nested. We also provide a technique for statically restructuring parameterized tiled loops to wavefront scheduling on shared-memory systems. Because the formulation of parameterized tiling does not fit into the well-established polyhedral framework, such static restructuring has been a great challenge. However, we achieve this limited restructuring through syntactic processing without any sophisticated machinery.

Mountain Scholar (Digital Collections of Colorado and Wyoming)
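
The parameterized tiling described in the abstract above, where tile sizes stay symbolic until runtime, can be sketched for a small triangular loop nest with affine bounds. This is a hand-written illustration under assumed names (`triangular_points`, `tiled_triangular_points`, `ti`, `tj`), not the dissertation's code generator: the tile sizes are ordinary function arguments, and the tile and point loop bounds were derived by hand for this one nest.

```python
def triangular_points(n):
    """Original nest: for i in [0, n), for j in [i, n)."""
    for i in range(n):
        for j in range(i, n):
            yield (i, j)


def tiled_triangular_points(n, ti, tj):
    """Same iteration points, visited tile by tile.

    ti and tj are runtime parameters, so one version of the loop nest
    serves every tile size.
    """
    for it in range(0, n, ti):              # tile loop over i
        # First j-tile that can contain a point with j >= it.
        first_jt = (it // tj) * tj
        for jt in range(first_jt, n, tj):   # tile loop over j
            for i in range(it, min(it + ti, n)):             # point loop over i
                # max(...) re-applies the original bound j >= i inside the
                # tile; min(...) clamps partial tiles at the boundary.
                for j in range(max(jt, i), min(jt + tj, n)):  # point loop over j
                    yield (i, j)
```

Some tiles near the diagonal contain no points for certain values of `i`; the clamped point-loop bounds simply make those inner loops empty.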