55 research outputs found
Hybrid analysis of memory references and its application to automatic parallelization
Executing sequential code in parallel on a multithreaded machine has been an
elusive goal of the academic and industrial research communities for many years. It
has recently become more important due to the widespread introduction of multicores
in PCs. Automatic multithreading has not been achieved because classic, static
compiler analysis was not powerful enough and program behavior was found to be, in
many cases, input dependent. Speculative thread level parallelization was a welcome
avenue for advancing parallelization coverage but its performance was not always optimal
due to the sometimes unnecessary overhead of checking every dynamic memory
reference.
In this dissertation we introduce a novel analysis technique, Hybrid Analysis,
which unifies static and dynamic memory reference techniques into a seamless compiler
framework which extracts almost maximum available parallelism from scientific
codes and incurs close to the minimum necessary run time overhead. We present how
to extract maximum information from the quantities that could not be sufficiently
analyzed through static compiler methods, and how to generate sufficient conditions
which, when evaluated dynamically, can validate optimizations.
Our techniques have been fully implemented in the Polaris compiler and resulted
in whole-program speedups on a large number of industry-standard benchmark applications.
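The idea of generating a sufficient condition that is checked at run time can be illustrated with a small sketch. This is not the dissertation's algorithm or the Polaris implementation; the loop, the predicate, and all function names below are hypothetical, chosen only to show how a cheap dynamic test can select between an order-independent version and the original sequential loop.

```python
def sequential(a, n, k):
    """Original loop: a[i + k] = a[i] + 1 for i in 0..n-1 (k >= 0 assumed)."""
    out = list(a)
    for i in range(n):
        out[i + k] = out[i] + 1
    return out


def iterations_independent(n, k):
    """Sufficient condition (assumed for this example only): reads touch
    [0, n) and writes touch [k, k + n). The iterations carry no dependence
    when each one reads and writes the same element (k == 0) or when the
    two ranges do not overlap (k >= n)."""
    return k == 0 or k >= n


def order_free(a, n, k):
    """Stand-in for a parallel version: all reads are taken up front and the
    writes are applied in reverse order, which is only valid when the
    iterations are independent."""
    out = list(a)
    reads = [out[i] for i in range(n)]
    for i in reversed(range(n)):
        out[i + k] = reads[i] + 1
    return out


def hybrid(a, n, k):
    """Evaluate the predicate at run time and pick a version."""
    if iterations_independent(n, k):
        return order_free(a, n, k), "parallel"
    return sequential(a, n, k), "sequential"
```

The value of `k` is unknown at compile time, so a purely static analysis would have to assume a dependence, while checking every memory reference dynamically would cost far more than the single comparison used here.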
Efficient optimization of memory accesses in parallel programs
The power, frequency, and memory wall problems have caused a major shift in mainstream computing by introducing processors that contain multiple low power cores. As multi-core processors are becoming ubiquitous, software trends in both parallel programming languages and dynamic compilation have added new challenges to program compilation for multi-core processors. This thesis proposes a combination of high-level and low-level compiler optimizations to address these challenges.
The high-level optimizations introduced in this thesis include new approaches to May-Happen-in-Parallel analysis and Side-Effect analysis for parallel programs and a novel parallelism-aware Scalar Replacement for Load Elimination transformation. A new Isolation Consistency (IC) memory model is described that permits more scalar replacement opportunities than many existing memory models.
The low-level optimizations include a novel approach to register allocation that retains the compile time and space efficiency of Linear Scan, while delivering runtime performance superior to both Linear Scan and Graph Coloring. The allocation phase is modeled as an optimization problem on a Bipartite Liveness Graph (BLG) data structure. The assignment phase focuses on reducing the number of spill instructions by using register-to-register move and exchange instructions wherever possible.
Experimental evaluations of our scalar replacement for load elimination transformation in the Jikes RVM dynamic compiler show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core, and 1.39x on 16 cores, when compared with the load elimination algorithm available in Jikes RVM. A prototype implementation of our BLG register allocator in Jikes RVM demonstrates runtime performance improvements of up to 3.52x relative to Linear Scan on an x86 processor. When compared to Graph Coloring register allocator in the GCC compiler framework, our allocator resulted in an execution time improvement of up to 5.8%, with an average improvement of 2.3% on a POWER5 processor.
Based on the experimental evaluations and the foundations presented in this thesis, we believe that the proposed high-level and low-level optimizations are useful in addressing some of the new challenges emerging in the optimization of parallel programs for multi-core architectures.
Improving loop optimization with histogram profiling
Production compilers use numerous techniques to generate performant code. One such technique is profile-guided optimization (PGO). The principle of this technique is to insert instrumentation during compilation, gather information about program behaviour with training runs, and use this information during recompilation to improve optimization. The thesis aims to improve the precision of loop optimizations in the GNU Compiler Collection (GCC) with PGO. Currently in GCC, only the average iteration count of a loop is known with PGO. This leads to inefficiencies in both the performance and size of the binary. We implement infrastructure for measuring more information about loop iterations and add new counters, namely the histogram of iterations and the histogram of iterations modulo its size. With the histogram of iterations, we improve loop peeling and implement a new case of loop versioning optimization. This significantly improves the performance of the generated code with reasonable overhead.
Department of Applied Mathematics, Faculty of Mathematics and Physics
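To see why an iteration histogram carries more information than an average, consider the following sketch. It does not reproduce the GCC counters or the thesis's heuristics; the function names and the coverage threshold are assumptions made for illustration.

```python
from collections import Counter


def iteration_histogram(trip_counts):
    """Map each observed iteration count to how often the loop ran that many times."""
    return Counter(trip_counts)


def average_iterations(hist):
    """The single number an average-only profile would report."""
    executions = sum(hist.values())
    return sum(count * freq for count, freq in hist.items()) / executions


def smallest_peel_count(hist, coverage=0.85):
    """Smallest number of peeled iterations that fully covers at least
    `coverage` of the observed loop executions (hypothetical heuristic)."""
    executions = sum(hist.values())
    covered = 0
    for count in sorted(hist):
        covered += hist[count]
        if covered / executions >= coverage:
            return count
    return max(hist)


# A loop that usually runs once but occasionally runs 100 times.
hist = iteration_histogram([1] * 90 + [100] * 10)
average = average_iterations(hist)
peel = smallest_peel_count(hist)
```

For this profile the average is close to 11 iterations, which suggests a moderately long loop, whereas the histogram shows that peeling a single iteration already handles most executions.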
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.
Comment: version 5.0 (updated September 2018). Preprint version of the journal
article accepted at ACM CSUR 2018 (42 pages). This survey will be updated
quarterly here (send me your newly published papers to be added in the
subsequent version). History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018.
RVSDG: An Intermediate Representation for Optimizing Compilers
Intermediate Representations (IRs) are central to optimizing compilers as the
way the program is represented may enhance or limit analyses and
transformations. Suitable IRs focus on exposing the most relevant information
and establish invariants that different compiler passes can rely on. While
control-flow centric IRs appear to be a natural fit for imperative programming
languages, analyses required by compilers have increasingly shifted to
understand data dependencies and work at multiple abstraction layers at the
same time. This is partially evidenced in recent developments such as the MLIR
proposed by Google. However, rigorous use of data flow centric IRs in general
purpose compilers has not been evaluated for feasibility and usability as
previous works provide no practical implementations. We present the
Regionalized Value State Dependence Graph (RVSDG) IR for optimizing compilers.
The RVSDG is a data flow centric IR where nodes represent computations, edges
represent computational dependencies, and regions capture the hierarchical
structure of programs. It represents programs in demand-dependence form,
implicitly supports structured control flow, and models entire programs within
a single IR. We provide a complete specification of the RVSDG, construction and
destruction methods, as well as exemplify its utility by presenting Dead Node
and Common Node Elimination optimizations. We implemented a prototype compiler
and evaluate it in terms of performance, code size, compilation time, and
representational overhead. Our results indicate that the RVSDG can serve as a
competitive IR in optimizing compilers while reducing complexity.
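As a rough illustration of dead node elimination on a data-flow-centric graph, the sketch below marks every node reachable from the program results along dependency edges and discards the rest. It is a simplification, not the RVSDG specification: regions and structured control flow are omitted, and the dictionary encoding and function name are assumptions.

```python
def dead_node_elimination(nodes, results):
    """Remove nodes that no program result depends on.

    nodes:   dict mapping a node name to the list of node names it consumes
    results: names of the nodes whose values leave the graph
    """
    live = set()
    worklist = list(results)
    # Mark phase: follow dependency edges backwards from the results.
    while worklist:
        name = worklist.pop()
        if name in live:
            continue
        live.add(name)
        worklist.extend(nodes[name])
    # Sweep phase: keep only the marked nodes.
    return {name: operands for name, operands in nodes.items() if name in live}


graph = {
    "x": [],
    "y": [],
    "add": ["x", "y"],
    "mul": ["x", "x"],    # consumed only by "unused"
    "unused": ["mul"],    # never reaches a result
}
pruned = dead_node_elimination(graph, ["add"])
```

Because edges record what each computation depends on, liveness follows from reachability alone, and `mul` is removed together with `unused` in a single pass.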
Finding and understanding bugs in C compilers
Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong-code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.
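The general principle of randomized differential testing can be sketched in a few lines. This is a toy, not Csmith: it generates arithmetic expression trees instead of C programs, and the two "implementations" are hypothetical Python evaluators, one of which contains a deliberately planted bug. Only total operators are generated, mirroring the requirement that test programs have a single well-defined result.

```python
import random

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def random_expr(rng, depth):
    """Build a random expression tree. Division is never generated, so every
    program has exactly one defined value."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randint(-9, 9)
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def reference_eval(expr):
    """Straightforward evaluator used as the first implementation."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    return OPS[op](reference_eval(left), reference_eval(right))


def buggy_eval(expr):
    """Second implementation with a planted bug: it swaps the operands of a
    subtraction whose left operand is a literal."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    if op == "-" and isinstance(left, int):
        return buggy_eval(right) - left
    return OPS[op](buggy_eval(left), buggy_eval(right))


def find_mismatches(programs, impl_a, impl_b):
    """Return the programs on which the two implementations disagree."""
    return [p for p in programs if impl_a(p) != impl_b(p)]


rng = random.Random(0)
programs = [random_expr(rng, 3) for _ in range(200)]
suspects = find_mismatches(programs, reference_eval, buggy_eval)
```

Any program in `suspects` is a candidate wrong-code report, and no oracle for the correct answer is needed, only disagreement between implementations on a program whose meaning is fully defined.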
Generation of Application Specific Hardware Extensions for Hybrid Architectures: The Development of PIRANHA - A GCC Plugin for High-Level-Synthesis
Architectures combining a field programmable gate array (FPGA) and a general-purpose processor on a single chip have become increasingly popular in recent years. On the one hand, such hybrid architectures facilitate the use of application-specific hardware accelerators that improve the performance of the software on the host processor. On the other hand, they oblige system designers to handle the whole process of hardware/software co-design. The complexity of this process is still one of the main reasons that hinder the widespread use of hybrid architectures. Thus, an automated process that aids programmers with hardware/software partitioning and the generation of application-specific accelerators is important. The method presented in this thesis requires neither restrictions on the high-level language used nor special source code annotations; such requirements are usually an entry barrier for programmers without a deeper understanding of the underlying hardware platform.
This thesis introduces a seamless programming flow that allows generating hardware accelerators for unrestricted, legacy C code. The implementation consists of a GCC plugin that automatically identifies application hot-spots and generates hardware accelerators accordingly. Apart from the accelerator implementation in a hardware description language, the compiler plugin generates the host processor interfaces and, if necessary, a prototypical integration with the host operating system. An evaluation with typical embedded applications shows general benefits of the approach, but also reveals limiting factors that hamper possible performance improvements.
Safe code transformations for speculative execution in real-time systems
Although compiler optimization techniques are standard and successful in non-real-time systems, if naively applied, they can destroy safety guarantees and deadlines in hard real-time systems. For this reason, real-time systems developers have tended to avoid automatic compiler optimization of their code. However, real-time applications in several areas have been growing substantially in size and complexity in recent years. This size and complexity makes it impossible for real-time programmers to write optimal code, and consequently indicates a need for compiler optimization. Recently researchers have developed or modified analyses and transformations to improve performance without degrading worst-case execution times. Moreover, these optimization techniques can sometimes transform programs which may not meet constraints/deadlines, or which result in timeouts, into deadline-satisfying programs.
One such technique, speculative execution, also used for example in parallel computing and databases, can enhance performance by executing parts of the code whose execution may or may not be needed. In some cases, rollback is necessary if the computation turns out to be invalid. However, speculative execution must be applied carefully to real-time systems so that the worst-case execution path is not extended. Deterministic worst-case execution for satisfying hard real-time constraints, and speculative execution with rollback for improving average-case throughput, appear to lie on opposite ends of a spectrum of performance requirements and strategies.
Nonetheless, this thesis shows that there are situations in which speculative execution can improve the performance of a hard real-time system, either by enhancing average performance while not affecting the worst case, or by actually decreasing the worst-case execution time. The thesis proposes a set of compiler transformation rules to identify opportunities for speculative execution and to transform the code. Proofs of semantic correctness and timeliness preservation are provided to verify the safety of applying the transformation rules to real-time systems. Moreover, an extensive experiment using simulation of randomly generated real-time programs has been conducted to evaluate the applicability and profitability of speculative execution. The simulation results indicate that speculative execution improves average execution time and program timeliness. Finally, a prototype implementation is described in which these transformations can be evaluated for realistic applications.