Search CORE

1,256 research outputs found

Machine Learning in Compiler Optimization

Author: O'Boyle Michael
Wang Zheng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/05/2018
Field of study

In the last decade, machine learning based compilation has moved from an an obscure research niche to a mainstream activity. In this article, we describe the relationship between machine learning and compiler optimisation and introduce the main concepts of features, models, training and deployment. We then provide a comprehensive survey and provide a road map for the wide variety of different research areas. We conclude with a discussion on open issues in the area and potential research directions. This paper provides both an accessible introduction to the fast moving area of machine learning based compilation and a detailed bibliography of its main achievements

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Lancaster E-Prints

Machine Learning Based Auto-tuning for Enhanced OpenCL Performance Portability

Author: Elster Anne C.
Falch Thomas L.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/06/2015
Field of study

Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programing such systems, and offers functional portability. It does, however, suffer from poor performance portability, code tuned for one device must be re-tuned to achieve good performance on another device. In this paper, we use machine learning-based auto-tuning to address this problem. Benchmarks are run on a random subset of the entire tuning parameter configuration space, and the results are used to build an artificial neural network based model. The model can then be used to find interesting parts of the parameter space for further search. We evaluate our method with different benchmarks, on several devices, including an Intel i7 3770 CPU, an Nvidia K40 GPU and an AMD Radeon HD 7970 GPU. Our model achieves a mean relative error as low as 6.1%, and is able to find configurations as little as 1.3% worse than the global minimum.Comment: This is a pre-print version an article to be published in the Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). For personal use onl

arXiv.org e-Print Archive

Crossref

Savant: Automatic generation of a parallel scheduling heuristic for map-reduce

Author: Dorronsoro Bernabé
Pinel Frédéric
Publication venue
Publication date: 01/01/2014
Field of study

Open Repository and Bibliography - Luxembourg

“It Looks Like You’re Writing a Parallel Loop”:A Machine Learning Based Parallelization Assistant

Author: Castaneda Lozano Roberto
Cole Murray
Franke Bjoern
Maramzin Aleksandr
Vasiladiotis Christos
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

Crossref

Edinburgh Research Explorer

Using Decision Tree Voting to Select a Polyhedral Model Loop Transformation

Author: Ruvinskiy Ray
Publication venue: 'University of Waterloo'
Publication date: 01/01/2013
Field of study

Algorithms in fields like image manipulation, sound and signal processing, and statistics frequently employ tight loops. These loops are computationally intensive and CPU-bound, making their performance highly dependent on efficient utilization of the CPU pipeline and memory bus. Recent years have seen CPU pipelines becoming more and more complicated, with features such as branch prediction and speculative execution. At the same time, clock speeds have stopped their prior exponential growth rate due to heat dissipation issues, and multiple cores have become prevalent. These developments have made it more difficult for developers to reason about how their code executes on the CPU, which in turn makes it difficult to write performant code. An automated method to take code and optimize it for most efficient execution would, therefore, be desirable. The Polyhedral Model allows the generation of alternative transformations for a loop nest that are semantically equivalent to the original. The transformations vary the degree of loop tiling, loop fusion, loop unrolling, parallelism, and vectorization. However, selecting the transformation that would most efficiently utilize the architecture remains challenging. Previous work utilizes regression models to select a transformation, using as features hardware performance counter values collected during a sample run of the program being optimized. Due to inaccuracies in the resulting regression model, the transformation selected by the model as the best transformation often yields unsatisfactory performance. As a result, previous work resorts to using a five-shot technique, which entails running the top five transformations suggested by the model and selecting the best one based on their actual runtime. However, for long-running benchmarks, five runs may be take an excessive amount of time. I present a variation on the previous approach which does not need to resort to the five-shot selection process to achieve performance comparable to the best five-shot results reported in previous work. With the transformations in the search space ranked in reverse runtime order, the transformation selected by my classifier is, on average, in the 86th percentile. There are several key contributing factors to the performance improvements attained by my method: formulating the problem as a classification problem rather than a regression problem, using static features in addition to dynamic performance counter features, performing feature selection, and using ensemble methods to boost the performance of the classifier. Decision trees are constructed from pairs of features (performance counters and structural features than can be determined statically from the source code). The trees are then evaluated according to the number of benchmarks for which they select a transformation that performs better than two baseline variants, the original program and the expected runtime if a randomly selected transformation were applied. The top 20 trees vote to select a final transformation

CiteSeerX

University of Waterloo's Institutional Repository

Learning for Optimization with Virtual Savant

Author: Massobrio Renzo
Publication venue
Publication date: 25/05/2021
Field of study

Optimization problems arising in multiple fields of study demand efficient algorithms that can exploit modern parallel computing platforms. The remarkable development of machine learning offers an opportunity to incorporate learning into optimization algorithms to efficiently solve large and complex problems. This thesis explores Virtual Savant, a paradigm that combines machine learning and parallel computing to solve optimization problems. Virtual Savant is inspired in the Savant Syndrome, a mental condition where patients excel at a specific ability far above the average. In analogy to the Savant Syndrome, Virtual Savant extracts patterns from previously-solved instances to learn how to solve a given optimization problem in a massively-parallel fashion. In this thesis, Virtual Savant is applied to three optimization problems related to software engineering, task scheduling, and public transportation. The efficacy of Virtual Savant is evaluated in different computing platforms and the experimental results are compared against exact and approximate solutions for both synthetic and realistic instances of the studied problems. Results show that Virtual Savant can find accurate solutions, effectively scale in the problem dimension, and take advantage of the availability of multiple computing resources.Los problemas de optimización que surgen en múltiples campos de estudio demandan algoritmos eficientes que puedan explotar las plataformas modernas de computación paralela. El notable desarrollo del aprendizaje automático ofrece la oportunidad de incorporar el aprendizaje en algoritmos de optimización para resolver problemas complejos y de grandes dimensiones de manera eficiente. Esta tesis explora Savant Virtual, un paradigma que combina aprendizaje automático y computación paralela para resolver problemas de optimización. Savant Virtual está inspirado en el Sı́ndrome de Savant, una condición mental en la que los pacientes se destacan en una habilidad especı́fica muy por encima del promedio. En analogı́a con el sı́ndrome de Savant, Savant Virtual extrae patrones de instancias previamente resueltas para aprender a resolver un determinado problema de optimización de forma masivamente paralela. En esta tesis, Savant Virtual se aplica a tres problemas de optimización relacionados con la ingenierı́a de software, la planificación de tareas y el transporte público. La eficacia de Savant Virtual se evalúa en diferentes plataformas informáticas y los resultados se comparan con soluciones exactas y aproximadas para instancias tanto sintéticas como realistas de los problemas estudiados. Los resultados muestran que Savant Virtual puede encontrar soluciones precisas, escalar eficazmente en la dimensión del problema y aprovechar la disponibilidad de múltiples recursos de cómputo.Fundación Carolina Agencia Nacional de Investigación e Innovación (ANII, Uruguay) Universidad de Cádiz Universidad de la Repúblic

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

REDI - Digital Repository of the National Agency of Research and Innovation

Repositorio de Objetos de Docencia e Investigación de la Universidad de Cádiz