Search CORE

55 research outputs found

MaSiF: Machine learning guided auto-tuning of parallel skeletons

Author: Cole Murray
Collins Alexander
Fensch Christian
Leather Hugh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2013
Field of study

Heriot Watt Pure

Crossref

Towards Automatic Learning of Heuristics for Mechanical Transformations of Procedural Code

Author: Carro Manuel
Mariño Julio
Tamarit Salvador
Vigueras Guillermo
Publication venue
Publication date: 09/03/2016
Field of study

The current trend in next-generation exascale systems goes towards integrating a wide range of specialized (co-)processors into traditional supercomputers. However, the integration of different specialized devices increases the degree of heterogeneity and the complexity in programming such type of systems. Due to the efficiency of heterogeneous systems in terms of Watt and FLOPS per surface unit, opening the access of heterogeneous platforms to a wider range of users is an important problem to be tackled. In order to bridge the gap between heterogeneous systems and programmers, in this paper we propose a machine learning-based approach to learn heuristics for defining transformation strategies of a program transformation system. Our approach proposes a novel combination of reinforcement learning and classification methods to efficiently tackle the problems inherent to this type of systems. Preliminary results demonstrate the suitability of the approach for easing the programmability of heterogeneous systems.Comment: Part of the Program Transformation for Programmability in Heterogeneous Architectures (PROHA) workshop, Barcelona, Spain, 12th March 2016, 9 pages, LaTe

arXiv.org e-Print Archive

Directory of Open Access Journals

Adapt or Become Extinct!:The Case for a Unified Framework for Deployment-Time Optimization

Author: Goumas Georgios
Gross Thomas R.
Karlsson Sven
McKee Sally A.
Probst Christian W.
Själander Magnus
Zhang Lixin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011
Field of study

The High-Performance Computing ecosystem consists of a large variety of execution platforms that demonstrate a wide diversity in hardware characteristics such as CPU architecture, memory organization, interconnection network, accelerators, etc. This environment also presents a number of hard boundaries (walls) for applications which limit software development (parallel programming wall), performance (memory wall, communication wall) and viability (power wall). The only way to survive in such a demanding environment is by adaptation. In this paper we discuss how dynamic information collected during the execution of an application can be utilized to adapt the execution context and may lead to performance gains beyond those provided by static information and compile-time adaptation. We consider specialization based on dynamic information like user input, architectural characteristics such as the memory hierarchy organization, and the execution profile of the application as obtained from the execution platform\u27s performance monitoring units. One of the challenges of future execution platforms is to allow the seamless integration of these various kinds of information with information obtained from static analysis (either during ahead-of-time or just-in-time) compilation. We extend the notion of information-driven adaptation and outline the architecture of an infrastructure designed to enable information flow and adaptation through-out the life-cycle of an application

Crossref

Chalmers Research

DSpace at NTUA

Online Research Database In Technology

Parallel-Pattern Aware Compiler Optimisations:Challenges and Opportunities

Author: Leather Hugh
Wang Zheng
Publication venue: Lancaster University
Publication date: 10/09/2018
Field of study

This report outlines our finding that existing compilers are not aware of the pattern semantics and thus miss massive optimisation opportunities

Lancaster E-Prints

MLGOPerf: An ML Guided Inliner to Optimize Performance

Author: Ashouri Amir H.
Chan Bryan
Elhoushi Mostafa
Gao Yaoqing
Hua Yuzhe
Manzoor Muhammad Asif
Wang Xiang
Publication venue
Publication date: 19/07/2022
Field of study

For the past 25 years, we have witnessed an extensive application of Machine Learning to the Compiler space; the selection and the phase-ordering problem. However, limited works have been upstreamed into the state-of-the-art compilers, i.e., LLVM, to seamlessly integrate the former into the optimization pipeline of a compiler to be readily deployed by the user. MLGO was among the first of such projects and it only strives to reduce the code size of a binary with an ML-based Inliner using Reinforcement Learning. This paper presents MLGOPerf; the first end-to-end framework capable of optimizing performance using LLVM's ML-Inliner. It employs a secondary ML model to generate rewards used for training a retargeted Reinforcement learning agent, previously used as the primary model by MLGO. It does so by predicting the post-inlining speedup of a function under analysis and it enables a fast training framework for the primary model which otherwise wouldn't be practical. The experimental results show MLGOPerf is able to gain up to 1.8% and 2.2% with respect to LLVM's optimization at O3 when trained for performance on SPEC CPU2006 and Cbench benchmarks, respectively. Furthermore, the proposed approach provides up to 26% increased opportunities to autotune code regions for our benchmarks which can be translated into an additional 3.7% speedup value.Comment: Version 2: Added the missing Table 6. The short version of this work is accepted at ACM/IEEE CASES 202

arXiv.org e-Print Archive

Performance Improvement in Kernels by Guiding Compiler Auto-Vectorization Heuristics

Author: Renatio Miceli
Publication venue
Publication date
Field of study

Vectorization support in hardware continues to expand and grow as well we still continue on superscalar architectures. Unfortunately, compilers are not always able to generate optimal code for the hardware;detecting and generating vectorized code is extremely complex. Programmers can use a number of tools to aid in development and tuning, but most of these tools require expert or domain-specific knowledge to use. In this work we aim to provide techniques for determining the best way to optimize certain codes, with an end goal of guiding the compiler into generating optimized code without requiring expert knowledge from the developer. Initally, we study how to combine vectorization reports with iterative comilation and code generation and summarize our insights and patterns on how the compiler vectorizes code. Our utilities for iterative compiliation and code generation can be further used by non-experts in the generation and analysis of programs. Finally, we leverage the obtained knowledge to design a Support Vector Machine classifier to predict the speedup of a program given a sequence of optimization underprediction, with 82% of these accurate within 15 % both ways

ZENODO