1,812 research outputs found

    A Survey on Compiler Autotuning using Machine Learning

    Full text link
    Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated quarterly here (Send me your new published papers to be added in the subsequent version) History: Received November 2016; Revised August 2017; Revised February 2018; Accepted March 2018

    Finding Best Compiler Options for Critical Software Using Parallel Algorithms

    Get PDF
    The efficiency of a software piece is a key factor for many systems. Real-time programs, critical software, device drivers, kernel OS functions and many other software pieces which are executed thousands or even millions of times per day require a very efficient execution. How this software is built can significantly affect the run time for these programs, since the context is that of compile-once/run-many. In this sense, the optimization flags used during the compilation time are a crucial element for this goal and they could make a big difference in the final execution time. In this paper, we use parallel metaheuristic techniques to automatically decide which optimization flags should be activated during the compilation on a set of benchmarking programs. The using the appropriate flag configuration is a complex combinatorial problem, but our approach is able to adapt the flag tuning to the characteristics of the software, improving the final run times with respect to other spread practicesThis research has been partially funded by the Spanish MINECO and FEDER projects (TIN2014-57341-R (http://moveon.lcc.uma.es), TIN2016-81766-REDT (http://cirti.es), and TIN2017-88213-R (http://6city.lcc.uma.es). It is also funded by Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Efficient memory-level parallelism extraction with decoupled strands

    Get PDF
    We present Outrider, an architecture for throughput-oriented processors that exploits intra-thread memory-level parallelism (MLP) to improve performance efficiency on highly threaded workloads. Outrider enables a single thread of execution to be presented to the architecture as multiple decoupled instruction streams, consisting of either memory accessing or memory consuming instructions. The key insight is that by decoupling the instruction streams, the processor pipeline can expose MLP in a way similar to out-of-order designs while relying on a low-complexity in-order micro-architecture. Instead of adding more threads as is done in modern GPUs, Outrider can expose the same MLP with fewer threads and reduced contention for resources shared among threads. We demonstrate that Outrider can outperform single-threaded cores by 23-131% and a 4-way simultaneous multi-threaded core by up to 87% in data parallel applications in a 1024-core system. Outrider achieves these performance gains without incurring the overhead of additional hardware thread contexts, which results in improved efficiency compared to a multi-threaded core

    Energy-Centric Scheduling for Real-Time Systems

    Get PDF
    Energy consumption is today an important design issue for all kinds of digital systems, and essential for the battery operated ones. An important fraction of this energy is dissipated on the processors running the application software. To reduce this energy consumption, one may, for instance, lower the processor clock frequency and supply voltage. This, however, might lead to a performance degradation of the whole system. In real-time systems, the crucial issue is timing, which is directly dependent on the system speed. Real-time scheduling and energy efficiency are therefore tightly connected issues, being addressed together in this work. Several scheduling approaches for low energy are described in the thesis, most targeting variable speed processor architectures. At task level, a novel speed scheduling algorithm for tasks with probabilistic execution pattern is introduced and compared to an already existing compile-time approach. For task graphs, a list-scheduling based algorithm with an energy-sensitive priority is proposed. For task sets, off-line methods for computing the task maximum required speeds are described, both for rate-monotonic and earliest deadline first scheduling. Also, a run-time speed optimization policy based on slack re-distribution is proposed for rate-monotonic scheduling. Next, an energy-efficient extension of the earliest deadline first priority assignment policy is proposed, aimed at tasks with probabilistic execution time. Finally, scheduling is examined in conjunction with assignment of tasks to processors, as parts of various low energy design flows. For some of the algorithms given in the thesis, energy measurements were carried out on a real hardware platform containing a variable speed processor. The results confirm the validity of the initial assumptions and models used throughout the thesis. These experiments also show the efficiency of the newly introduced scheduling methods
    corecore