Search CORE

10 research outputs found

Combining Target-independent Analysis with Dynamic Profiling to Build the Performance Model of a DSP

Author: F. Ferrandi
M. Lattuada
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

Fast and accurate performance estimation is a key aspect of heterogeneous embedded systems design flow, since cycle-accurate simulators, when exist, are usually too slow to be used during design space exploration. Performance estimation techniques are usually based on combination of estimation of the single processing elements which compose the system. Architectural characteristics of Digital Signal Processors (DSP), such as the presence of Single Instruction Multiple Data operations or of special hardware units to control loop executions, introduce peculiar aspects in the performance estimation problem. In this paper we present a methodology to estimate the performance of a function on a given dataset on a DSP. Estimation is performed combining the host profiling data with the function GNU GCC GIMPLE representation. Starting from the results of this analysis, we build a performance model of a DSP by exploiting the Linear Regression Technique. Use of GIMPLE representation allows to take directly into account the target-independent optimizations performed by the DSP compiler. We validate our approach by building a performance model of the MagicV DSP and by testing the model on a set of significative benchmarks

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Performance Improvement in Kernels by Guiding Compiler Auto-Vectorization Heuristics

Author: Renatio Miceli
Publication venue
Publication date
Field of study

Vectorization support in hardware continues to expand and grow as well we still continue on superscalar architectures. Unfortunately, compilers are not always able to generate optimal code for the hardware;detecting and generating vectorized code is extremely complex. Programmers can use a number of tools to aid in development and tuning, but most of these tools require expert or domain-specific knowledge to use. In this work we aim to provide techniques for determining the best way to optimize certain codes, with an end goal of guiding the compiler into generating optimized code without requiring expert knowledge from the developer. Initally, we study how to combine vectorization reports with iterative comilation and code generation and summarize our insights and patterns on how the compiler vectorizes code. Our utilities for iterative compiliation and code generation can be further used by non-experts in the generation and analysis of programs. Finally, we leverage the obtained knowledge to design a Support Vector Machine classifier to predict the speedup of a program given a sequence of optimization underprediction, with 82% of these accurate within 15 % both ways

ZENODO

A Survey on Compiler Autotuning using Machine Learning

Author: Ashouri Amir H.
Cavazos John
Killian William
Palermo Gianluca
Silvano Cristina
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/09/2018
Field of study

Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated quarterly here (Send me your new published papers to be added in the subsequent version) History: Received November 2016; Revised August 2017; Revised February 2018; Accepted March 2018

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Automatic creation of tile size selection models using neural networks

Author: Yuki Tomofumi
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2010
Field of study

2010 Spring.Includes bibliographic references (pages 54-59).Covers not scanned.Print version deaccessioned 2022.Tiling is a widely used loop transformation for exposing/exploiting parallelism and data locality. Effective use of tiling requires selection and tuning of the tile sizes. This is usually achieved by hand-crafting tile size selection (TSS) models that characterize the performance of the tiled program as a function of tile sizes. The best tile sizes are selected by either directly using the TSS model or by using the TSS model together with an empirical search. Hand-crafting accurate TSS models is hard, and adapting them to different architecture/compiler, or even keeping them up-to-date with respect to the evolution of a single compiler is often just as hard. Instead of hand-crafting TSS models, can we automatically learn or create them? In this paper, we show that for a specific class of programs fairly accurate TSS models can be automatically created by using a combination of simple program features, synthetic kernels, and standard machine learning techniques. The automatic TSS model generation scheme can also be directly used for adapting the model and/or keeping it up-to-date. We evaluate our scheme on six different architecture-compiler combinations (chosen from three different architectures and four different compilers). The models learned by our method have consistently shown near-optimal performance (within 5% of the optimal on average) across the tested architecture-compiler combinations

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Machine Learning in Compiler Optimization

Author: O'Boyle Michael
Wang Zheng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/05/2018
Field of study

In the last decade, machine learning based compilation has moved from an an obscure research niche to a mainstream activity. In this article, we describe the relationship between machine learning and compiler optimisation and introduce the main concepts of features, models, training and deployment. We then provide a comprehensive survey and provide a road map for the wide variety of different research areas. We conclude with a discussion on open issues in the area and potential research directions. This paper provides both an accessible introduction to the fast moving area of machine learning based compilation and a detailed bibliography of its main achievements

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Lancaster E-Prints

Predictive Modeling in a Polyhedral Optimization Space

Author: Bastoul Cédric
Cavazos John
Cohen Albert
Park Eunjung
Pouchet Louis-Noël
Sadayappan P.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

International audienceHigh-level program optimizations, such as loop transformations, are critical for high performance on multi-core targets. However, complex sequences of loop transformations are often required to expose parallelism (both coarse-grain and fine-grain) and improve data locality. The polyhedral compilation framework has proved to be very effective at representing these complex sequences and restructuring compute-intensive applications, seamlessly handling perfectly and imperfectly nested loops. Nevertheless identifying the most effective loop transformations remains a major challenge. We address the problem of selecting the best polyhedral optimizations with dedicated machine learning models, trained specifically on the target machine. We show that these models can quickly select high-performance optimizations with very limited iterative search. Our end-to-end framework is validated using numerous benchmarks on two modern multi-core platforms. We investigate a variety of different machine learning algorithms and hardware counters, and we obtain performance improvements over productions compilers ranging on average from 3.2x to 8.7x, by running not more than 6 program variants from a polyhedral optimization space

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Automatic performance model construction for the fast software exploration of new hardware designs

Author: Christophe Dubach
Edwin Bonilla
Felix Agakov
Grigori Fursin
John Cavazos
Olivier Temammember Of Hipeac
Publication venue
Publication date: 01/01/2006
Field of study

Developing an optimizing compiler for a newly proposed architecture is extremely difficult when there is only a simulator of the machine available. Designing such a compiler requires running many experiments in order to understand how different optimizations interact. Given that simulators are orders of magnitude slower than real processors, such experiments are highly restricted. This paper develops a technique to automatically build a performance model for predicting the impact of program transformations on any architecture, based on a limited number of automatically selected runs. As a result, the time for evaluating the impact of any compiler optimization in early design stages can be drastically reduced such that all selected potential compiler optimizations can be evaluated. This is achieved by first evaluating a small set of sample compiler optimizations on a prior set of benchmarks in order to train a model, followed by a very small number of evaluations, or probes, of the target program. We show that by training on less than 0.7 % of all possible transformations (640 samples collected from 10 benchmarks out of 880000 possible samples, 88000 per training benchmark) and probing the new program on only 4 transformations, we can predict the performance of all program transformations with an error of just 7.3 % on average. As each prediction takes almost no time to generate, this scheme provides an accurate method of evaluating compiler performance, which is several orders of magnitude faster than current approaches

CiteSeerX

Crossref