20,991 research outputs found

    Quality and speed in linear-scan register allocation

    Get PDF
    A linear-scan algorithm directs the global allocation of register candidates to registers based on a simple linear sweep over the program being compiled. This approach to register allocation makes sense for systems, such as those for dynamic compilation, where compilation speed is important. In contrast, most commercial and research optimizing compilers rely on a graph-coloring approach to global register allocation. In this paper, we compare the performance of a linear-scan method against a modern graph-coloring method. We implement both register allocators within the Machine SUIF extension of the Stanford SUIF compiler system. Experimental results show that linear scan is much faster than coloring on benchmarks with large numbers of register candidates. We also describe improvements to the linear-scan approach that do not change its linear character, but allow it to produce code of a quality near to that produced by graph coloring.Engineering and Applied Science

    Trace-based Register Allocation in a JIT Compiler

    Get PDF
    State-of-the-art dynamic compilers often use global approaches, like Linear Scan or Graph Coloring, for register allocation. These algorithms consider the complete compilation unit for allocation, which increases the complexity of the implementation (e.g., support for lifetime holes in Linear Scan) and potentially also affects compilation time. We propose a novel non-global algorithm, which splits a compilation unit into traces based on profiling feedback and subsequently performs register allocation within each trace individually. Traces reduce the problem size to a single linear code segment, which simplifies the problem a register allocator needs to solve. Additionally, we can apply different register allocation algorithms to each trace. We show that this non-global approach can achieve results competitive to global register allocation. We present an implementation of Trace Register Allocation based on the Graal VM and show an evaluation for common Java benchmarks. We demonstrate that performance of this non-global approach is within 3% (on AMD64) and 1% (on SPARC) of global Linear Scan register allocation.(VLID)247065

    Trace Register Allocation Policies: Compile-time vs. Performance Trade-offs

    Get PDF
    Register allocation is an integral part of compilation, regardless of whether a compiler aims for fast compilation or optimal code quality. State-of-the-art dynamic compilers often use global register allocation approaches such as linear scan. Recent results suggest that non-global trace-based register allocation approaches can compete with global approaches in terms of allocation quality. Instead of processing the whole compilation unit (i.e., method) at once, a trace-based register allocator divides the problem into linear code segments, called traces. In this work, we present a register allocation framework that can exploit the additional flexibility of traces to select different allocation strategies based on the characteristics of a trace. This provides us with fine-grained control over the trade-off between compile time and peak performance in a just-in-time compiler. Our framework features three allocation strategies: a linear-scan-based approach that achieves good code quality, a single-pass bottom-up strategy that aims for short allocation times, and an allocator for trivial traces. To demonstrate the flexibility of the framework, we select 8 allocation policies and show their impact on compile time and peak performance. This approach can reduce allocation time by 7–43 at a peak performance penalty of about 1–11 on average. For systems that do not focus on peak performance, our approach allows to adjust the time spent for register allocation, and therefore the overall compilation time, thus finding the optimal balance between compile time and peak performance according to an application's requirements

    On the Complexity of Spill Everywhere under SSA Form

    Get PDF
    Compilation for embedded processors can be either aggressive (time consuming cross-compilation) or just in time (embedded and usually dynamic). The heuristics used in dynamic compilation are highly constrained by limited resources, time and memory in particular. Recent results on the SSA form open promising directions for the design of new register allocation heuristics for embedded systems and especially for embedded compilation. In particular, heuristics based on tree scan with two separated phases -- one for spilling, then one for coloring/coalescing -- seem good candidates for designing memory-friendly, fast, and competitive register allocators. Still, also because of the side effect on power consumption, the minimization of loads and stores overhead (spilling problem) is an important issue. This paper provides an exhaustive study of the complexity of the ``spill everywhere'' problem in the context of the SSA form. Unfortunately, conversely to our initial hopes, many of the questions we raised lead to NP-completeness results. We identify some polynomial cases but that are impractical in JIT context. Nevertheless, they can give hints to simplify formulations for the design of aggressive allocators.Comment: 10 page

    Efficient optimization of memory accesses in parallel programs

    Get PDF
    The power, frequency, and memory wall problems have caused a major shift in mainstream computing by introducing processors that contain multiple low power cores. As multi-core processors are becoming ubiquitous, software trends in both parallel programming languages and dynamic compilation have added new challenges to program compilation for multi-core processors. This thesis proposes a combination of high-level and low-level compiler optimizations to address these challenges. The high-level optimizations introduced in this thesis include new approaches to May-Happen-in-Parallel analysis and Side-Effect analysis for parallel programs and a novel parallelism-aware Scalar Replacement for Load Elimination transformation. A new Isolation Consistency (IC) memory model is described that permits several scalar replacement transformation opportunities compared to many existing memory models. The low-level optimizations include a novel approach to register allocation that retains the compile time and space efficiency of Linear Scan, while delivering runtime performance superior to both Linear Scan and Graph Coloring. The allocation phase is modeled as an optimization problem on a Bipartite Liveness Graph (BLG) data structure. The assignment phase focuses on reducing the number of spill instructions by using register-to-register move and exchange instructions wherever possible. Experimental evaluations of our scalar replacement for load elimination transformation in the Jikes RVM dynamic compiler show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core, and 1.39x on 16 cores, when compared with the load elimination algorithm available in Jikes RVM. A prototype implementation of our BLG register allocator in Jikes RVM demonstrates runtime performance improvements of up to 3.52x relative to Linear Scan on an x86 processor. When compared to Graph Coloring register allocator in the GCC compiler framework, our allocator resulted in an execution time improvement of up to 5.8%, with an average improvement of 2.3% on a POWER5 processor. With the experimental evaluations combined with the foundations presented in this thesis, we believe that the proposed high-level and low-level optimizations are useful in addressing some of the new challenges emerging in the optimization of parallel programs for multi-core architectures
    • …
    corecore