Abstract. In this paper, we extend past work on Linear Scan register allocation and propose two Extended Linear Scan (ELS) algorithms that retain the compile-time efficiency of past Linear Scan algorithms while delivering performance that can match or surpass that of Graph Coloring. Specifically, this paper makes the following contributions:
Introduction
Register allocation is the process of determining which variables (symbolic registers) should be held in physical machine registers at different program points and which should be spilled. Register assignment is the sub-process of identifying which specific machine registers should be used at different program points to hold which variables. The scope of register allocation may be local (restricted to a small region of a procedure, such as an innermost loop or an extended basic block), global (performed on an entire procedure) or interprocedural (performed across multiple procedures). Ever since its inclusion in the first compiler for FORTRAN five decades ago, register allocation has retained its role as one of the most important optimizations performed by compilers for high-level programming languages, and the algorithms used for register allocation have matured accordingly.
Starting with the seminal paper by Chaitin [5], the dominant approaches for global register allocation have been based on the idea of building an Interference Graph (IG) for the variables in a procedure and employing Graph Coloring (GC) heuristics to perform the allocation. Significant advances have been achieved over the years through the introduction of new coloring, spilling, and coalescing heuristics based on the IG, e.g.,
Spill-Free Register Allocation
This section introduces the Spill-Free Register Allocation (SFRA) problem as a theoretical foundation for comparing the fundamental differences between the Graph Coloring and Extended Linear Scan algorithms.
Spill-Free Register Allocation (SFRA): Given a set of symbolic registers, ℜ, and k physical registers, determine if it is possible to assign each symbolic register s ∈ ℜ to a physical register, reg(s, P) at each program point P where s is live. If so, report the register assignments, including any register-to-register copy statements that need to be inserted. If not, report that no feasible solution exists.
Two key assumptions in the specification of the SFRA problem are as follows. First, two "program points" are defined for each instruction $i_k$: $i_k^-$ denotes the point at which the input operands of $i_k$ are read, and $i_k^+$ denotes the point at which the output operands of $i_k$ are written. Second, we assume that register allocation is performed as a separate pass from instruction scheduling; instruction scheduling considerations for register allocation [1, 9, 11, 13] are beyond the scope of this paper.

Figure 1 summarizes the basic Graph Coloring algorithm for Spill-Free Register Allocation as described by Chaitin [5]. The correctness of this algorithm was established in [5]. The algorithm requires $O(|\Re|^2)$ space, since the interference graph can be quadratic in the number of symbolic registers. The major overhead in execution time occurs in constructing the interference graph in step 2, which takes $O(|\Re|^2)$ time. (This assumes that the liveness information in input 3 has been precomputed such that each instance of the simultaneously-live condition in step 2 can be evaluated in constant time. Otherwise, the execution time for step 2 could exceed $O(|\Re|^2)$.)
Fig. 1. Basic Graph Coloring solution to the SFRA problem
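To make the structure of Figure 1 concrete, the following Python sketch follows Chaitin's build/simplify/select outline for the spill-free case. It assumes that liveness is already available as a set of program points per symbolic register (a hypothetical input format), and the function names are illustrative. The simplify step is a heuristic: when every remaining node has degree ≥ k, the sketch reports that no solution was found, since SFRA does not allow spills.

```python
from itertools import combinations


def build_interference_graph(live):
    """live maps each symbolic register to the set of program points where it is live."""
    adj = {s: set() for s in live}
    # Step 2: O(|R|^2) pairwise "simultaneously live" checks.
    for a, b in combinations(live, 2):
        if live[a] & live[b]:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def color_spill_free(live, k):
    """Return {symbolic register: color in range(k)}, or None if simplify gets stuck."""
    adj = build_interference_graph(live)
    degree = {s: len(adj[s]) for s in adj}
    removed, stack = set(), []
    # Simplify: repeatedly remove a node with fewer than k remaining neighbors.
    while len(stack) < len(adj):
        candidate = next((s for s in adj if s not in removed and degree[s] < k), None)
        if candidate is None:
            return None  # every remaining node has degree >= k: report failure
        removed.add(candidate)
        stack.append(candidate)
        for n in adj[candidate]:
            degree[n] -= 1
    # Select: pop nodes; each has < k already-colored neighbors, so a color is free.
    color = {}
    while stack:
        s = stack.pop()
        used = {color[n] for n in adj[s] if n in color}
        color[s] = min(c for c in range(k) if c not in used)
    return color
```

The pairwise loop in `build_interference_graph` is where the quadratic time and space of step 2 come from.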

Theoretical Limitations of Graph Coloring Solution
In this section, we summarize three fundamental theoretical limitations in using Graph Coloring as a foundation for global register allocation.
First, Graph Coloring is a more limited problem than Register Allocation. Transforming Register Allocation to Graph Coloring ensures that any k-coloring of an Interference Graph yields a feasible solution to the SFRA problem, but the converse does not hold: an SFRA problem instance that has a feasible solution need not transform into a Graph Coloring instance that has one. Consider the two examples in Figure 2, assuming that two physical registers are available. In each case, a spill-free solution exists for the SFRA problem instance, but not for the corresponding Graph Coloring instance. In Example #2, the SFRA solution includes a register-move instruction in the loop, whereas a solution based on Graph Coloring instead inserts a spill instruction in the loop. It is of course well known (e.g., [2]) that renaming of variables or live-range splitting can be performed to obtain spill-free solutions with Graph Coloring for the examples in Figure 2. The observation being made here is that these transformations are orthogonal to Graph Coloring and are equally applicable to Extended Linear Scan (ELS). Also, these transformations come at the cost of increasing the number of nodes and edges in the IG, thereby further exacerbating the time and space complexity of register allocation based on Graph Coloring.

Fig. 2. Examples #1 and #2 for which a solution exists to the SFRA problem instance, but no solution exists to the corresponding Graph Coloring instance. Both examples assume k = 2 physical registers, $r_1$ and $r_2$. Example #1 has a simple SFRA solution that requires no register moves. In Example #2 (symbolic registers $s_A$, $s_B$, $s_C$ in a program containing a loop), the Interference Graph is again a complete clique on the three nodes and is therefore not 2-colorable, whereas an SFRA solution exists if a register-move instruction $r_1 := r_2$ is inserted between instructions $i_8$ and $i_9$.
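The first limitation can be checked on a small hypothetical instance (not the program of Figure 2). The Python sketch below represents each symbolic register by an inclusive interval set over integer program points; $s_A$ has a hole, so every pair of registers is simultaneously live somewhere, yet at most two registers are live at any single point. The function names are illustrative.

```python
from itertools import combinations


def live_points(intervals):
    """Expand an interval set [(lo, hi), ...] (inclusive) into program points."""
    return {p for lo, hi in intervals for p in range(lo, hi + 1)}


def max_pressure(interval_sets):
    """Largest number of symbolic registers simultaneously live at any point."""
    count = {}
    for ivs in interval_sets.values():
        for p in live_points(ivs):
            count[p] = count.get(p, 0) + 1
    return max(count.values())


def interference_edges(interval_sets):
    """Pairs of symbolic registers that are live at some common point."""
    pts = {s: live_points(ivs) for s, ivs in interval_sets.items()}
    return {(a, b) for a, b in combinations(sorted(pts), 2) if pts[a] & pts[b]}


def is_two_colorable(nodes, edges):
    """Bipartiteness check by DFS 2-coloring."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    color = {}
    for start in nodes:
        if start in color:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            n = stack.pop()
            for m in adj[n]:
                if m not in color:
                    color[m] = 1 - color[n]
                    stack.append(m)
                elif color[m] == color[n]:
                    return False
    return True


# Hypothetical instance: sA is not live at points 3-4 (a "hole").
example = {"sA": [(1, 2), (5, 6)], "sB": [(2, 4)], "sC": [(4, 5)]}
```

Here `max_pressure(example)` is 2, but the three interference edges form a triangle, so `is_two_colorable` returns False. With k = 2, an SFRA solution can place $s_A$ in $r_1$ on [1, 2], $s_B$ in $r_2$, $s_C$ in $r_1$, and $s_A$ in $r_2$ on [5, 6]; no move is needed because $s_A$ is not live across its hole, whereas Graph Coloring must give $s_A$ a single color.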
Second, the $O(|\Re|^2)$ space requirement for constructing the interference graph is a scalability limitation: the overhead of any register allocation algorithm based on Graph Coloring becomes prohibitively large when compiling procedures with a large number of symbolic registers (especially after transformations such as procedure inlining and loop unrolling are performed), or in scenarios where compiler space and time overhead is at a premium (as in dynamic compilation).
Third, Graph Coloring is an NP-hard optimization problem (without even the guarantee of a constant performance bound), whereas an exact solution can be obtained for SFRA in time that is linear in the number of live intervals for all symbolic registers as shown below in Section 2.3.
Together, these limitations suggest that the Graph Coloring formulation may have made global register allocation harder to solve than necessary, and they provide the motivation for our work on Extended Linear Scan.
Basic Extended Linear Scan Algorithm, ELS 0
Fig. 4. Overview of Extended Linear Scan algorithm ELS 0 for Spill-Free Register Allocation (see Figure 3). Inputs: same as in Figure 1. Final step: generate register moves without the use of a temporary register, and insert these instructions on the control flow edge from P to Q (as part of Output 3 in Figure 3).

In ELS 0, each symbolic register s is represented by an Interval Set, I(s). Each interval [P, Q] in I(s) represents a range of program points at which s is live. The interval set is a precise representation of liveness: as in [16], there may be "holes" in the interval set corresponding to program points where s is not live. We also define $I = \bigcup_{s \in \Re} I(s)$ to be the set of all intervals in the program, and IEP to be the set of interval endpoints, i.e., program points that correspond to endpoints of intervals in I. In the theoretical worst case, the size of I can be quadratic (|ℜ| × |IR|), where ℜ is the set of symbolic registers and IR is the intermediate representation of the procedure. The worst case can be reached, for example, when each symbolic register is live at every other instruction in IR and therefore has |IR|/2 intervals. However, as shown in Section 4, in practice the average number of intervals per symbolic register is bounded by a small constant (≈ 2).
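A minimal Python sketch of how interval sets might be derived, assuming that the program points have already been placed in a single linear order and that liveness has been precomputed as one set of live symbolic registers per point (both hypothetical input conventions). Maximal runs of consecutive live points become intervals, so a hole splits I(s) into several intervals; IEP collects the endpoints.

```python
def build_interval_sets(live_at):
    """live_at[p] is the set of symbolic registers live at program point p,
    for p = 0 .. n-1 in a linear order. Returns {s: [(P, Q), ...]} (inclusive)."""
    intervals, start = {}, {}
    for p, live in enumerate(live_at + [set()]):  # empty sentinel closes open intervals
        for s in live:
            if s not in start:
                start[s] = p  # a new interval of s opens at p
        for s in list(start):
            if s not in live:
                # s is not live at p: its current interval ended at p - 1
                intervals.setdefault(s, []).append((start.pop(s), p - 1))
    return intervals


def interval_endpoints(intervals):
    """IEP: every program point that starts or ends some interval."""
    return {e for ivs in intervals.values() for P, Q in ivs for e in (P, Q)}


# "a" has a hole at point 2, so I("a") contains two intervals.
live_at = [{"a"}, {"a", "b"}, {"b"}, {"a", "b"}, set()]
```

For this input, `build_interval_sets(live_at)` yields `{"a": [(0, 1), (3, 3)], "b": [(1, 3)]}` and IEP is `{0, 1, 3}`. This direct expansion walks every live set, so it is meant for illustration rather than as the linear-time construction assumed in Theorem 2.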
The outputs listed for the ELS 0 algorithm in Figure 4 are an extension of the outputs for the Graph Coloring algorithm in Figure 1. The boolean value Success indicates whether a feasible SFRA solution can be found. The register map reg is finer-grained for ELS 0 than for GC, since it can assign different physical registers to different intervals in the Interval Set of a given symbolic register. The third output of the ELS 0 algorithm is the set of register-move instructions needed to support the register map. We assume that it is preferable to generate register-register moves rather than spill loads and stores on current and future systems, even for loads and stores that result in cache hits. This is because many processors incur a coherence overhead for loads and stores, compared to register accesses. Further, register-register moves can be optimized by efficient copy coalescing algorithms such as the one presented in [3].
We now outline how the ELS 0 algorithm addresses the three limitations for Graph Coloring discussed in Section 2.2:
1. The ELS 0 algorithm is guaranteed to find a feasible solution to an SFRA problem instance if and only if a feasible solution exists (Theorem 1).
2. The ELS 0 algorithm has a space requirement that is linear in the size of the input SFRA problem instance (Theorem 2).
3. The ELS 0 algorithm also has a time complexity that is linear in the size of the input SFRA problem instance (Theorem 2).
Theorem 1. The ELS 0 algorithm always computes a correct solution for the SFRA problem.
Proof: [Sketch] The ELS 0 algorithm returns Success = false only if there exists a program point P with count[P] > k, i.e., with more than k symbolic registers live at P (in which case a spill-free register allocation is not possible). If the ELS 0 algorithm returns Success = true, then count[P] ≤ k holds at all program points P ∈ IEP; since count can only increase at the start of an interval, which is itself in IEP, this bound holds at every program point. Therefore, a physical register is available in the avail set for each symbolic register at each program point. The register-move instructions inserted by step 6 ensure that a symbolic register's value is correctly carried across the different physical registers that may be assigned to it.
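For the special case of straight-line code (a single linear sequence of program points with no control flow edges, which is an assumption of this sketch), the argument in the proof can be seen in a simple sweep. Intervals are visited in order of their start points, registers of finished intervals return to avail, and avail can only be empty when k intervals are already live at P, i.e., when count[P] > k. The function name `els0_linear` and the register numbering 0..k-1 are hypothetical; because there are no control flow edges, step 6 is not needed here.

```python
import heapq


def els0_linear(intervals, k):
    """intervals: {s: [(P, Q), ...]} over one linear sequence of program points.
    Returns (success, reg) with reg[(s, P, Q)] = physical register in 0..k-1."""
    events = sorted((P, Q, s) for s, ivs in intervals.items() for P, Q in ivs)
    avail = list(range(k))  # free physical registers (min-heap)
    heapq.heapify(avail)
    active = []  # min-heap of (Q, register) for intervals live at the current point
    reg = {}
    for P, Q, s in events:
        # Release the registers of intervals that ended before P.
        while active and active[0][0] < P:
            _, r = heapq.heappop(active)
            heapq.heappush(avail, r)
        if not avail:
            return False, {}  # k intervals already live at P: count[P] > k
        r = heapq.heappop(avail)
        reg[(s, P, Q)] = r  # different intervals of s may get different registers
        heapq.heappush(active, (Q, r))
    return True, reg
```

On the interval sets `{"sA": [(1, 2), (5, 6)], "sB": [(2, 4)], "sC": [(4, 5)]}` with k = 2, the sweep succeeds and gives the two intervals of sA different registers, which a single Graph Coloring color could not do.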
Theorem 2. The ELS 0 algorithm takes O(|IR| + |I |) space and O(|IR| + |I |) time.
Proof: [Sketch] Steps 1-5b take O(|IR| + |I|) space and time, assuming that all liveness information is precomputed (as in the Graph Coloring algorithm in Figure 1). Note that the size of the avail set is bounded by the constant k (the number of physical registers). For step 6, the key observation is that at most k register-move instructions are inserted on any control flow edge; since the number of control flow edges is O(|IR|) and k is a constant, step 6 also takes O(|IR|) space and time.
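Step 6 can be sketched as sequentializing a parallel copy on the edge from P to Q. The Python sketch below assumes that the maps from symbolic registers to physical registers at the end of P and at the start of Q are given, and that the target offers a register exchange operation (modeled here as a hypothetical "swap" instruction), so that cycles need no temporary register. A chain of m pending copies costs m moves and a cycle of m registers costs m − 1 swaps, so at most one instruction is emitted per symbolic register live on the edge, which is at most k. The names `edge_moves` and `simulate` are illustrative.

```python
def edge_moves(src, dst):
    """src[s] / dst[s]: physical register holding s at the end of P / start of Q.
    Returns a list of ("move", to, frm) and ("swap", r1, r2) instructions."""
    # Pending parallel copy: target register -> source register (no-ops dropped).
    pending = {dst[s]: src[s] for s in dst if dst[s] != src[s]}
    code = []
    while pending:
        sources = set(pending.values())
        # A target that no pending copy reads from can be overwritten safely.
        ready = [t for t in pending if t not in sources]
        if ready:
            t = ready[0]
            code.append(("move", t, pending.pop(t)))
            continue
        # Only cycles remain: one swap puts f's value into its target t ...
        t = next(iter(pending))
        f = pending.pop(t)
        code.append(("swap", t, f))
        # ... and the value formerly in t now lives in f, so re-route its copy.
        pending = {t2: (f if f2 == t else f2) for t2, f2 in pending.items()}
        pending = {t2: f2 for t2, f2 in pending.items() if t2 != f2}
    return code


def simulate(regs, code):
    """Apply the instructions to a register file (list indexed by register)."""
    regs = list(regs)
    for op, a, b in code:
        if op == "move":
            regs[a] = regs[b]
        else:
            regs[a], regs[b] = regs[b], regs[a]
    return regs
```

For a 3-cycle such as `src = {"a": 0, "b": 1, "c": 2}`, `dst = {"a": 1, "b": 2, "c": 0}`, the sketch emits two swaps; for a chain it emits plain moves in an order that never overwrites a value still to be copied.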
Register Allocation with Total Spills
In this section, we extend the SFRA problem statement to allow for total spills i.e., for identifying a subset of symbolic registers for which all accesses will be performed through memory instead of registers, with the goal of finding a solution with the smallest spill cost. Since the GCC compiler used to obtain our experimental results lacks support for pseudo-register live range splitting [8] , an investigation of live range splitting and partial spills in the ELS framework is a subject for future work.
Register Allocation with Total Spills (RATS): Given a set of symbolic registers, ℜ, k physical registers, and an estimated execution frequency freq[P] for each program point P, a register allocation with total spills consists of
1. a boolean function, spilled(s), which indicates whether s is to be spilled, and
2. for each symbolic register with spilled(s) = false, a register assignment reg(s, P) at each program point P where s is live.
There are two versions of the RATS problem, depending on whether or not insertion of register-move instructions is permitted:
- regMoves = false. In this version, no register-move instructions may be inserted, and the optimization problem is to find a register allocation with the lowest spill cost, i.e., the lowest number of dynamic load and store instructions for the spilled symbolic registers, as determined by the freq[P] values.
- regMoves = true. In this version, register-move instructions are permitted as in the SFRA problem statement, and the optimization goal is to minimize the combined overhead of spill cost and register moves. The relative weight given to spill costs and register moves is architecture-specific.
The SFRA problem in Section 2.3 is a decision problem that asks whether a feasible spill-free register allocation can be obtained. In contrast, the RATS problem is an optimization problem, with the goal of minimizing spill costs (for the regMoves = false version) or a combination of spill costs and register-move costs (for the regMoves = true version). Note that it is trivial to obtain a feasible solution to the RATS problem by marking all symbolic registers as spilled; the challenge is to find a least-cost solution. It is well known that both versions of the RATS problem outlined above (with regMoves = false or true) are NP-hard.
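The two objectives can be written compactly as follows. The notation Acc(s) (the program points at which s is read or written), Moves (the inserted register-move instructions, with freq[m] the estimated execution frequency of move m), and the weight w (the architecture-specific relative cost of a register move) are introduced here for illustration only.

$$
\mathrm{SpillCost} \;=\; \sum_{s \in \Re \,:\, \mathit{spilled}(s)} \;\; \sum_{P \in \mathit{Acc}(s)} \mathit{freq}[P]
$$

$$
\text{regMoves} = \text{false:}\;\; \min \mathrm{SpillCost}, \qquad
\text{regMoves} = \text{true:}\;\; \min \Big( \mathrm{SpillCost} + w \cdot \sum_{m \in \mathit{Moves}} \mathit{freq}[m] \Big)
$$

In both cases the minimum ranges over choices of spilled(·) for which the unspilled symbolic registers admit a valid register assignment under the respective rules. Every such choice satisfies count′[P] ≤ k at every program point P, where count′[P] is the number of unspilled symbolic registers live at P; marking every s as spilled gives count′[P] = 0 everywhere, which is the trivially feasible solution.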
The original algorithm by Chaitin addressed the regMoves = false version of the RATS problem by extending the algorithm in Figure 1 with a priority function that favored spilling the symbolic register s with the smallest value of totalSpillCost(s)/iDegree(s), where totalSpillCost(s) is the frequency-weighted sum of all read and write accesses to s, and iDegree(s) is the degree of s in the simplified Interference Graph. There has been a very substantial amount of past work on augmenting and refining this priority function, starting with [6]. As mentioned earlier, we expect that these advanced spill heuristics designed for GC will be equally applicable to an ELS foundation.

Inputs:
1. IR, ℜ, k, as in Figure 1.
2. freq[P], estimated frequency for program point P ∈ IEP.
3. regMoves, version of the RATS problem to be solved.
Outputs:
1. spill(s), indicates if symbolic register s was spilled.
2. If spill(s) = false, then reg(s, P) specifies the physical register assigned to s at each program point P where s is live.
3. If regMoves = true, the IR is modified with insertion of register-move instructions as in Figure 4.
Data structure initialization:
Initialize I(s), I, IEP, and count as in Figure 3, an empty stack T, spill(s) := false, and totalSpillCost(s) as defined in Section 3.
Figures 5 and 6 summarize our Extended Linear Scan algorithm for the RATS problem, ELS 1 . This algorithm uses an input parameter, regMoves, to address both versions of the RATS problem. Figure 5 includes initialization steps from the ELS 0 algorithm, and also initializes spill(s) and totalSpillCost(s). Figure 6 contains the main ELS 1 algorithm.
Step 1 in Figure 6 is the Spill Identification pass. It uses the observation from the SFRA problem that the only program points P at which spill decisions need to be made are those with count[P] > k. The heuristic used in step 1a is to process these program points in decreasing order of freq[P]. As in Chaitin's Graph Coloring algorithm, step 1b selects the symbolic register with the smallest value of totalSpillCost(s)/iDegree(s, P) for spilling. A key difference from Graph Coloring is that this decision is driven by the choice of program point P, and allows different physical registers to be assigned to the same symbolic register at different program points when regMoves = true. We define iDegree(s, P) = count[P] − 1 to be the number of symbolic registers that interfere with s at a program point P with count[P] > k, when computed in step 1b of the ELS 1 algorithm. After step 1 has completed, a feasible register allocation is obtained with count[P] ≤ k at each program point P. The symbolic registers selected for spilling are identified by spill(s) = true, and are also pushed onto stack T.
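A minimal Python sketch of steps 1a and 1b, assuming that `live_at` maps each program point to the set of symbolic registers live there, that totalSpillCost has already been computed, and that k ≥ 1 (hypothetical names and input format; an implementation following the text would walk the interval set I(s) rather than every program point when updating count). Because iDegree(s, P) = count[P] − 1 is the same for every candidate at a given P, the ratio ranks candidates at that point by totalSpillCost(s) alone; the sketch still computes the ratio to mirror the text, and breaks ties by name for determinism.

```python
def spill_identification(live_at, freq, total_spill_cost, k):
    """Step 1 (sketch). live_at[P]: symbolic registers live at point P;
    freq[P]: estimated execution frequency of P. Assumes k >= 1.
    Returns (spilled set, stack T in spill order)."""
    count = {P: len(live) for P, live in live_at.items()}
    spilled, T = set(), []
    # Step 1a: visit program points in decreasing order of freq[P].
    for P in sorted(live_at, key=lambda p: -freq[p]):
        while count[P] > k:
            i_degree = count[P] - 1  # iDegree(s, P), identical for all candidates
            candidates = live_at[P] - spilled
            # Step 1b: smallest totalSpillCost(s) / iDegree(s, P); ties by name.
            s = min(candidates, key=lambda c: (total_spill_cost[c] / i_degree, c))
            spilled.add(s)
            T.append(s)
            for Q, live in live_at.items():
                if s in live:
                    count[Q] -= 1  # s no longer occupies a register at Q
    return spilled, T
```

With `live_at = {1: {"a", "b", "c"}, 2: {"b", "c", "d"}}`, frequencies 10 and 5, costs a = 1, b = 2, c = d = 100, and k = 2, the sketch first spills a at point 1 and then b at point 2, leaving T = ["a", "b"].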
Step 2 is the Spill Resurrection pass. It examines the symbolic registers pushed on the stack to see if any of them can be "unspilled". Opportunities for resurrection arise when a later spill decision causes an earlier spill decision to become redundant.
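A minimal sketch of step 2 under the same hypothetical input format. It pops T in last-in, first-out order (our reading of the stack discipline) and unspills s only if adding s back keeps count[P] ≤ k at every program point where s is live.

```python
def spill_resurrection(live_at, spilled, T, k):
    """Step 2 (sketch). live_at[P]: symbolic registers live at point P;
    spilled / T: output of Spill Identification. Returns the final spilled set."""
    spilled = set(spilled)
    stack = list(T)  # do not mutate the caller's stack
    # count[P] = number of unspilled symbolic registers live at P
    count = {P: len(live - spilled) for P, live in live_at.items()}
    while stack:
        s = stack.pop()  # most recent spill decision first
        points = [P for P, live in live_at.items() if s in live]
        if all(count[P] + 1 <= k for P in points):
            spilled.discard(s)  # s fits wherever it is live: unspill it
            for P in points:
                count[P] += 1
    return spilled
```

In a scenario where a is spilled first at a hot point 1 and b is later spilled at point 2 (where a is not live), b is also live at point 1, so the later decision makes the earlier one redundant: b stays spilled because point 2 would exceed k, while a is resurrected.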
Step 3 is the Register Assignment pass. If regMoves is true, the algorithm uses steps 4, 5, and 6 of the ELS 0 algorithm in Figure 4. If regMoves is false, we use a different register assignment algorithm that does not insert any register-move instructions.

Although the number of intervals for a symbolic register can in theory be as large as half the total number of instructions in the program (e.g., if every alternate instruction is a "hole", which could lead to a non-linear complexity for ELS), we see that in practice the average number of intervals per symbolic register is bounded by a small constant (≈ 2). We see that the Space Compression Factor (SCF) = |I|/|IG| varies from 4.5% to 22.7%, indicating the extent to which we expect the interval set, I, to be smaller than the interference graph, IG. Finally, the last two columns contain the compile-time spent in global register allocation for these two algorithms. For improved measurement accuracy, the register allocation phase was repeated 100 times, and the timing (in ms) reported in Table 1 is the average over the 100 runs. While compile-time measurements depend significantly on the engineering of the algorithm implementations, early indications are that there is a marked reduction in compile-time when moving from GC to ELS 1 for all benchmarks. The compile-time speedups for ELS 1 relative to GC varied from 15× to 68×, with an overall speedup of 18.5× when adding all the compile-times.

Figure 7 shows the SPEC rates obtained for the Graph Coloring and ELS 1 algorithms, using the -O3 option in gcc. Recall that a larger SPEC rate indicates better performance. In summary, the runtime performance improved by up to 5.8% for ELS 1 relative to GC (for 197.parser), with an average improvement of 2.3%. There was only one case in which a small performance degradation was observed for ELS 1 relative to GC: a slowdown of 1.4% for 181.mcf.
These results show that the compile-time benefits of Extended Linear Scan can be obtained without sacrificing runtime performance; in fact, ELS 1 delivers a net improvement in runtime performance relative to GC. Further, these measurements were obtained with regMoves = true, indicating that the extra register moves did not cause a significant performance degradation. Runtime results were not obtained for the original Linear Scan algorithms, because prior work has already established that their performance is inferior to that of Graph Coloring [14, 16].
Conclusions
This paper makes the case for using Extended Linear Scan as an alternative foundation to Graph Coloring for global register allocation. It highlighted three fundamental theoretical limitations of Graph Coloring as a foundation (Section 2.2). It introduced the basic Extended Linear Scan algorithm, ELS 0 (Section 2.3), which addresses all three limitations for the problem of Spill-Free Register Allocation (SFRA). It also introduced the ELS 1 algorithm (Section 3), which extends ELS 0 to obtain a greedy algorithm for the problem of Register Allocation with Total Spills (RATS). Finally, it included experimental results for eight SPECint2000 benchmarks to compare the Graph Coloring and Extended Linear Scan algorithms (Section 4). The results show that the space and time used by ELS 1 are significantly smaller than those used by GC. The Space Compression Factor (SCF) = |I|/|IG| varied from 4.5% to 22.7%, and the compile-time speedups for ELS 1 relative to GC varied from 15× to 68×. In addition, the runtime performance improved by up to 5.8% for ELS 1 relative to GC, with an average improvement of 2.3%. This is a significant improvement over past Linear Scan algorithms, which delivered compile-time efficiency but lagged behind Graph Coloring in runtime performance. Together, these results show that Extended Linear Scan is a promising alternative foundation for global register allocation, compared to Graph Coloring, due to its compile-time scalability without loss of execution time performance. Directions for future work include further study of the trade-off between register-move instructions and spill load/store instructions, and support for region-based live range splitting.
