2 research outputs found

Recommended from our members

POET: Parameterized Optimization for Empirical Tuning

Authors: Quinlan D, Seymour K, Vuduc R, Yi Q, You H
Publication venue: Lawrence Livermore National Laboratory
Publication date: 29/01/2007

The excessive complexity of both machine architectures and applications has made it difficult for compilers to statically model and predict application behavior. This observation motivates the recent interest in performance tuning using empirical techniques. We present a new embedded scripting language, POET (Parameterized Optimization for Empirical Tuning), for parameterizing complex code transformations so that they can be empirically tuned. The POET language aims to significantly improve the generality, flexibility, and efficiency of existing empirical tuning systems. We have used the language to parameterize and empirically tune three loop optimizations (interchange, blocking, and unrolling) for two linear algebra kernels. We show experimentally that tuning these optimizations with POET, which requires no program analysis, takes significantly less time than tuning them with a full compiler-based source-code optimizer that performs sophisticated program analysis and optimizations.

UNT Digital Library
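
To make the idea of empirically tuning a parameterized loop transformation concrete, here is a minimal Python sketch. It is not POET and does not use POET syntax; it only illustrates the general approach the abstract describes: expose a transformation parameter (here the block size of a blocked matrix multiply), time each candidate, and keep the fastest. The function names `matmul_blocked` and `tune_block_size`, the test matrices, and the candidate list are all assumptions made for illustration.

```python
import time


def matmul_blocked(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) with square loop blocking.

    `block` is the tunable parameter; any positive integer gives the same
    numerical result, only the traversal order changes.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c


def tune_block_size(n, candidates, repeats=3):
    """Time every candidate block size and return (best_block, timings).

    Hypothetical helper: the best-of-`repeats` wall-clock time is used as
    the cost of each candidate.
    """
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            matmul_blocked(a, b, n, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best
    return min(timings, key=timings.get), timings
```

A call such as `tune_block_size(64, [4, 8, 16, 32])` returns the fastest candidate on the current machine together with the measured times. The search needs no program analysis, only repeated measurement, which is the property the abstract emphasizes; the winning block size depends on the machine and is not predictable from this sketch.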

Efficient Generation of Sequences of Dense Linear Algebra through Auto-Tuning

Author: Belter Geoffrey
Publication venue: CU Scholar
Publication date: 01/01/2012

It is rare for a programmer to solve a numerical problem with a single library call; most problems require a sequence of calls. In the case of linear algebra, programmers chain a series of Basic Linear Algebra Subprograms (BLAS) library calls to achieve the desired result. When a sequence of BLAS calls is memory bound, a great deal of performance is lost because no optimization occurs across library routines. It is not practical to create a library with every required sequence of linear algebra operations, but at the same time it is difficult for programmers to write their own high-performance implementations. One solution is for programmers to use an auto-tuning tool capable of optimizing the exact sequence of operations they need. This thesis presents a matrix representation and type system that describe basic linear algebra operations, the loops required to implement those operations, and the legality of key optimizations. These are demonstrated in an auto-tuning tool that generates loops and applies data parallelism and loop fusion. Results show that this approach can match or exceed the performance of vendor-tuned BLAS libraries, general-purpose optimizing compilers, and hand-written code. Further, this approach is shown to be portable and to work with a range of dense matrix storage formats. All of this is achieved with search times ranging from several minutes to a few hours.

CU Scholar Institutional Repository
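
The benefit of fusing loops across a sequence of BLAS-style operations can be sketched in a few lines of Python. The example below is an illustration chosen for this listing, not code from the thesis: it computes q = A p and s = Aᵀ r, first as two separate passes over A (as two independent library calls would) and then as a single fused pass that reads each element of A once. The function names `unfused` and `fused` are hypothetical.

```python
def unfused(a, p, r):
    """Compute q = A p and s = A^T r as two separate passes over A."""
    m, n = len(a), len(a[0])
    q = [sum(a[i][j] * p[j] for j in range(n)) for i in range(m)]
    s = [sum(a[i][j] * r[i] for i in range(m)) for j in range(n)]
    return q, s


def fused(a, p, r):
    """Compute the same q and s in one pass, loading each a[i][j] once."""
    m, n = len(a), len(a[0])
    q = [0.0] * m
    s = [0.0] * n
    for i in range(m):
        row, ri = a[i], r[i]
        acc = 0.0
        for j in range(n):
            aij = row[j]
            acc += aij * p[j]   # contribution to q[i]
            s[j] += aij * ri    # contribution to s[j]
        q[i] = acc
    return q, s
```

Both versions perform the same arithmetic; the fused version halves the number of reads of A, which is what matters when the sequence is memory bound. In pure Python the interpreter overhead hides this effect, so the sketch shows the transformation itself rather than a measurable speedup.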