In this paper, we propose the use of cyclic interval graphs as an alternative representation for register allocation. The "thickness" of the cyclic interval graph captures the notion of overlap between live ranges of variables relative to each particular point of time in the program execution. We demonstrate that cyclic interval graphs provide a feasible and effective representation that accurately captures the periodic nature of live ranges found in loops.
Introduction
Register allocation plays an important role in compiler optimization. In fact, for modern high-performance processor architectures, register allocation has been viewed as a technique which "adds the largest single improvement" among various compiler optimizations [1]. Technological advances in the past decade have widened the gap between the speed of the CPU and memory (DRAMs), and this gap (a form of the von Neumann bottleneck) is expected to continue to grow [2]. Therefore, the benefit of keeping variables in registers is increasing, and thus the impact of good register allocation strategies is also increasing.
Register allocators in many modern compilers employ the classical graph coloring method originally proposed by Chaitin and improved by others [3, 4, 5, 6]. In this method, an interference graph is built to direct register allocation. Each node in the graph corresponds to a live range of a program variable. An edge between two nodes in the graph represents interference between the two live ranges. Chaitin's heuristics color the graph with k colors such that adjacent nodes are assigned different colors. Thus, a k-coloring of the interference graph corresponds to a feasible register assignment with k registers. If the graph is not k-colorable, spill code is introduced.
Given a set of live ranges, the interference graph encodes the overlapping or interference of all live ranges for an entire code segment, most often an entire procedure or function body. Thus, this representation provides a concise summary of the constraints that must be met for a correct register allocation. Furthermore, interference graph approaches are well-suited to general purpose register allocation that works even for programs with irregular and complex flow of control.
However, we believe that this concise and general strategy is sometimes also a weakness of the interference graph approach. From our point of view, interference graphs sometimes provide a representation of the problem that is too abstract, particularly in the case of well-structured inner-loop nests. More specifically, an interference graph does not encode any notion of the relative time of overlaps between live ranges. This information is very useful in developing effective coloring and spilling heuristics. This is particularly true when one considers how to effectively model the live range of a loop variable: its lifetime may cross the boundary of iterations, and it may be defined and used repetitively at regular intervals. Another weakness of the interference graph approach is the potential expense required to rebuild and recolor the interference graph after spill code has been introduced.
Although there have been successful efforts to improve Chaitin's original interference graph approach (these are presented in Section 7), we propose a new approach that is based on a different representation: cyclic interval graphs. Intuitively, the "thickness" of each point in the cyclic interval graph captures information about overlapping live ranges of variables at a particular location in the program. We will see that this information (the "fat spot") is crucial in developing our new heuristic algorithms.
We argue that an approach based on such interval graphs can address some of the shortcomings of the interference graph approaches. Since our approach takes advantage of program structure and the relative times of live ranges, we have focused on methods for structured programs, and in particular large inner-loops and loop nests. The main contributions of this paper include:
We demonstrate that cyclic interval graphs provide a feasible and effective representation to characterize sequences of live ranges of variables in successive iterations of a loop. (Section 2)
A new heuristic algorithm for minimum register allocation, the fat cover algorithm, has been developed, implemented, and studied. (Section 3)
A new spilling algorithm is proposed that makes use of the extra information available in the interval graph representation. Whenever possible, it favors register floats (moving values from one register to another) over the traditional register spills (storing a spilled variable into memory). Furthermore, the spilling phase is only invoked once, and there is no need for iterating the allocation and spilling phases. (Section 4)
The possibility of using interval graphs as a model for register allocation was noted in [7, 8]. However, to our knowledge, previous research was theoretical in nature and mainly focused on the algorithmic aspects of the interval graph model. Furthermore, the issues of applying such models to real programs with hierarchical control structures (such as nested conditionals and loops) have not been addressed. In this paper, our objective is quite different: we are primarily interested in the feasibility of using interval graphs in register allocators of real-life compilers.
It is important to note that we do not claim that our interval graph methods can solve all the problems in register allocation. In general, we do not accept the view that there is an ultimate "best" method for register allocation. On the contrary, we believe that different approaches are suited to different sorts of programs. For example, for program segments with very irregular control flow, there is little program structure to exploit, and the interference graph representation is probably the best abstraction. However, for program segments such as nested loops and structured conditionals, there is often program structure that can be exploited by encoding the problem as cyclic interval graphs. Thus, we view the interference graph approach and the cyclic interval graph approach as complementary.
It should be noted that in most of the paper we concentrate on examples that are innermost loops. However, in Section 5, we discuss how our scheme naturally extends to a hierarchical method that can handle nested loops and conditionals. In Section 6, we present experimental results to demonstrate the effectiveness of our approach for two large loops that are difficult to color. We compare the results of our prototype spilling and coloring implementations with the results observed from compiling these loops with state-of-the-art optimizing compilers (SUN Sparc C compiler, the MIPS C compiler, and the IBM RS6000 compiler). Finally, we present related work in Section 7 and our conclusions in Section 8.
Cyclic Interval Graphs
In this section, we first review the traditional interference graph representation, and then introduce our cyclic interval graph representation.
Interference Graphs and Chaitin's Heuristics
As outlined in Section 1, the traditional approach uses an interference graph representation where nodes represent live ranges of variables, and edges represent interference between two live ranges. To be more precise, according to [4], two live ranges interfere "if one of them is live at a definition point of the other." A node has degree k if it has k neighbors. Chaitin's method colors the graph with k colors such that two adjacent nodes are assigned different colors. Thus, a k-coloring of the interference graph corresponds to a feasible register assignment with k registers.
The basic heuristics of Chaitin's original method are based on a simple observation: a graph G having a node X with degree less than k is k-colorable if and only if the reduced graph $G'$ formed by removing X with all its adjacent edges is k-colorable. Thus, Chaitin's algorithm tries to remove all nodes of degree less than k. If at some point there remain only nodes with degree greater than or equal to k, then spilling is performed. This involves the introduction of some spill code (to store the definition of the spilled variable to memory and to load it for later uses) according to some heuristic. As the spill code replaces one long live range with several short live ranges, the interference graph is rebuilt and the coloring process must be repeated until a k-coloring succeeds without introducing any new spill code. Many improvements on Chaitin's original method have been proposed. For example, one such improvement is based on the observation that it is not always necessary to spill a node with k or more neighbors [6].
If some of the neighbors can be allocated the same color, then spilling is not necessary. This is one example where one can observe that a k-coloring can be achieved if the correct subset of nodes are colored the same color.
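The simplification step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Chaitin's implementation: the function names `simplify` and `select`, the adjacency-dictionary encoding of the interference graph, and the choice of the lowest free color are all hypothetical, and spill-code insertion is omitted.

```python
def simplify(adj, k):
    """Repeatedly remove a node of degree < k.

    adj maps each node to the set of its neighbors (assumed encoding).
    Returns (removal_order, stuck), where stuck holds the nodes that
    all have degree >= k and would be candidates for spilling.
    """
    graph = {v: set(ns) for v, ns in adj.items()}
    order = []
    changed = True
    while changed:
        changed = False
        for v in sorted(graph):
            if len(graph[v]) < k:
                for n in graph[v]:
                    graph[n].discard(v)
                del graph[v]
                order.append(v)
                changed = True
                break  # restart the scan on the reduced graph
    return order, set(graph)


def select(adj, order, k):
    """Color the removed nodes in reverse removal order with k colors."""
    colors = {}
    for v in reversed(order):
        used = {colors[n] for n in adj[v] if n in colors}
        # A free color exists: v had fewer than k neighbors when removed.
        colors[v] = next(c for c in range(k) if c not in used)
    return colors


# A 4-cycle a-b-c-d (hypothetical example graph).
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
order, stuck = simplify(cycle, 3)
coloring = select(cycle, order, 3)
```

With k = 2 the same 4-cycle leaves every node stuck (each has degree 2), although the cycle is 2-colorable by giving opposite nodes the same color. This is the kind of case in which spilling a node with k or more neighbors is not actually necessary.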
Cyclic Interval Graphs
The cyclic interval graph approach has been designed to expose program structure that is useful in choosing which intervals to color the same color, and which intervals are best to spill.
One challenge in designing this new representation is determining how to represent live ranges of loop variables. Figure 1 (a) shows a loop with n iterations. The numbers written alongside the instructions are the instruction numbers. Four scalar variables are defined and used in the loop: X1 to X4. Note that in the case of loops each variable has a sequence of live ranges that correspond to different iterations of the loop. For example, the live range of the variable X4 can be split into several segments. For the first iteration, X4 is defined outside the loop and dies at instruction 2 within the loop. This is one section of X4's live range. In addition, for each iteration i of the loop, X4 is defined in instruction 4 of iteration i, and is live between this definition and the last use in instruction 2 of the following iteration i + 1. There is a similar situation for X3. In order to accurately capture this information in our approach, we would like to find a representation that incorporates the regular periodic nature of variables that are defined in some iteration i and last used in some later iteration $i'$.
In Figure 1 (c), we show the interval graph for the program in Figure 1. As illustrated in Figure 1 (c), the live range of a loop variable can be represented as a periodic interval: a sequence of lifetime intervals that are equally spaced in time by some period. Such a periodic interval can be characterized by the interval corresponding to one period. For example, the live ranges of variables X1 to X4 in Figure 1 (c) have a period of one iteration. The live ranges of variables X1 and X2 do not extend across the boundary between iterations; therefore, each can be expressed as one interval, i.e. X1: [1 : 3), X2: [2 : 4). The variables X3 and X4, however, are defined in one iteration and used in the next. Therefore, for convenience, we represent each of their live ranges as a pair of intervals, i.e. X3: ([0 : 1), [3 : 5]) and X4: ([0 : 2), [4 : 5]), where the interval [0 : 1), taking X3 as an instance, can be considered an extension of the interval [3 : 5] that is wrapped around to fit in one period. The numbers 0 and 5 do not correspond to any instructions but merely provide a joining point for two successive iterations. We call such a "wrapped" interval a cyclic interval.
In general, we use the following conventions in our cyclic interval graph representations.
Let $t_0, t_1, \ldots$ be the starting time points of a sequence of machine operations. Without loss of generality, we use non-negative integers for the time points. We use $[t : t']$ to denote the interval between $t$ and $t'$ including both end points. The notation $[t : t')$ denotes the same interval but with the end point $t'$ left out.
We assume that each machine operation is in the form of a quadruple, e.g. x = y + z, which begins at some time point $t$. To be precise, we say that variable x is defined at time point $t$. The live range of x will continue to the time point $t'$ ($t' > t$), where it is last used in a statement, e.g. u = x + v. After time $t'$, the value in x is no longer live. In this paper, we define the lifetime interval of x to be $[t : t')$. When no confusion may occur, we use the terms interval and lifetime interval interchangeably. The relation between the live ranges of a set of variables is completely defined by the corresponding set of lifetime intervals.
We should note that cyclic intervals can also be used to represent live ranges of variables in loops which are first unrolled. Of course, each variable in the original loop will be split into several variables depending on the unrolling factor. Then, the interval graph can be constructed where each new variable in the unrolled loop is associated with a cyclic interval.
We should also note that the period of a cyclic interval graph may be greater than 1 iteration. This may happen when a loop body contains array references that have a loop-carried dependency with dependence distance greater than 1. A straightforward extension can represent certain array references in a loop using cyclic intervals. A full discussion is beyond the scope of this paper, and readers are referred to [9].
Cyclic interval graphs can also be used to represent programs with hierarchical control structures such as nested loops and conditionals. In the case of nested loops, we naturally get nested cyclic interval graphs. In the case of conditionals, we create a structure similar to nested loops by introducing the proper constraints between the two branches of the conditionals. In Section 5, we discuss this process further.
Before presenting the main problems to be solved on cyclic interval graphs, we first introduce the following definitions.
Coloring Cyclic Interval Graphs
Our two main coloring problems can be formulated as:
The importance of Problem 2 is obvious and is probably familiar to most compiler writers. We therefore focus on the importance of Problem 1, the problem of finding a minimum coloring of a cyclic interval graph. We argue that this is an important problem for the following two reasons:
1. It has important applications in situations when the smallest number of registers is required. For example, when allocating registers interprocedurally it is beneficial to allocate a minimal number of registers to each procedure using such a solution. This reduces the amount of register saving required at procedure call time, and can also improve interprocedural register allocation [10].
2. Using the information captured by interval graphs, we have developed a two-step approach for solving Problem 2. This approach makes effective use of the optimal solution of Problem 1 to minimize the spilling cost. As we show in Section 4, this is particularly important for programs in which the register pressure is close to k.
The above two problems are treated in Section 3 and Section 4 respectively.
Further Observations about Cyclic Interval Graphs
Our problems are related to the class of circular-arc graph coloring problems [11, 12]. A graph G is called a circular-arc graph if its vertices can be placed in a one-to-one correspondence with a set of circular arcs of a circle in such a way that two vertices of G are joined by an edge if and only if the corresponding two arcs intersect one another. In Figure 2 (b), we show the circular-arc graph representation of our example. Intuitively, one can think of "bending" each interval into an arc. Since the intervals are periodic, we can fit them into one circle.
Theoretically, the problem of determining a k-coloring for a circular-arc graph with n arcs has a complexity of $O(n \cdot k! \cdot k \log k)$ [12]. In this paper, we are interested in fast heuristic methods which find a k-coloring quickly, and generate efficient code for spilling when necessary.
As in any general graph coloring problem, finding the minimum coloring of a cyclic interval graph is NP-hard [12]. For our purpose of register allocation, it is most important to use the information provided in the interval graph as guiding heuristics for our algorithmic solutions. From our examples, we can observe that the minimum number of registers needed for a cyclic interval graph is related to the thickness of the graph, which we formally define below.
Definition 2.3 The width of a cyclic interval graph G at time t, written as width(G, t), is the number of intervals covering t.
Definition 2.4 The maximum width of a cyclic interval graph G, written $W_{max}(G)$, is the maximum of width(G, t) over all t covered by some interval in G. The minimum width of a cyclic interval graph G, written $W_{min}(G)$, is the minimum of width(G, t) over all t covered by some interval in G. Now, we state the following theorems about the number of colors required to minimally color acyclic and cyclic interval graphs.
The following theorem addresses the problem of optimal coloring of acyclic interval graphs.
Proof: First, it is obvious that G cannot be colored with fewer than k colors. Now let us complete the proof by sketching an algorithm (called the left-to-right algorithm) which is guaranteed to find the optimal coloring of G. Assume G spans from time 0 to time n. Starting from the left end (at time t = 0), move from left to right along the time line. For each interval I which ends at t, release its color back to the pool of free colors. For each interval I beginning at t, give I a free color which is not being used by any interval covering t. Initially, the pool contains $k = W_{max}(G)$ free colors. Since there will never be more than $W_{max}(G)$ intervals covering any time t, the algorithm will successfully find a k-coloring for G.
Proof: First, it is obvious that G cannot be colored with fewer than k colors. Now let us complete the proof by sketching an algorithm which is guaranteed to find a coloring of G in $W_{max}(G) + W_{min}(G)$ colors. Cut G at the point where it has the minimum width $W_{min}(G)$. Take the intervals covering the cutting point out of G and call the remaining part $G'$. Obviously, we can now treat $G'$ as an acyclic graph. Coloring $G'$ with the left-to-right algorithm guarantees that it is colored with no more than $k' = W_{max}$ colors (Theorem 2.1). Then, it is trivial to see that we can use $W_{min}(G)$ more colors to color the removed intervals.
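The left-to-right algorithm sketched in the first proof above can be written out as follows. This is a minimal sketch under our own assumptions: each interval is a half-open pair (start, end) of integer time points, following the lifetime interval convention, the graph is acyclic, and the names `max_width` and `left_to_right` are hypothetical.

```python
import heapq


def max_width(intervals):
    """Maximum number of half-open intervals (start, end) covering any time."""
    events = []
    for start, end in intervals.values():
        events.append((start, 1))
        events.append((end, -1))
    events.sort()  # at equal times, ends (-1) are processed before starts (+1)
    width = best = 0
    for _, delta in events:
        width += delta
        best = max(best, width)
    return best


def left_to_right(intervals):
    """Color an acyclic interval graph with max_width(intervals) colors."""
    free = list(range(max_width(intervals)))  # pool of free colors
    heapq.heapify(free)
    active = []  # heap of (end_time, color) for intervals still live
    coloring = {}
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1]):
        # Release the colors of all intervals that have ended by `start`.
        while active and active[0][0] <= start:
            _, color = heapq.heappop(active)
            heapq.heappush(free, color)
        color = heapq.heappop(free)  # never empty: fewer than W_max are live
        coloring[name] = color
        heapq.heappush(active, (end, color))
    return coloring


# Hypothetical example with maximum width 3.
example = {"a": (0, 3), "b": (1, 4), "c": (3, 6), "d": (4, 7), "e": (2, 5)}
coloring = left_to_right(example)
```

In this example, interval c reuses the color released by a, and d reuses the color released by b, so only three colors are needed.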
Finding a Minimal Coloring of Cyclic Interval Graphs
In this section we examine heuristic algorithms for coloring cyclic interval graphs using a minimal number of colors. More specifically, given a cyclic interval graph G, we would like to find a fast algorithm that can color G with as few colors as possible. Such algorithms will all use some sort of heuristic, and therefore they are not guaranteed to find the optimal solution. However, our goal is to find algorithms that always find the optimal or close to optimal solutions. In subsection 3.1 we develop a new algorithm, the fat cover algorithm. In subsection 3.2 we give a short summary of two other approaches to the minimal coloring problem and, in subsection 3.3, we describe another important method, the hybrid algorithm. Finally, in subsection 3.4 we present experimental results to compare the effectiveness of the four algorithms.
The Fat Cover Algorithm
Given the fact that the optimal k for a cyclic interval graph G is bounded by $W_{max}(G)$ and $W_{min}(G) + W_{max}(G)$ (Theorem 2.2), and our experimental observations which indicate that a large majority of graphs that could represent programs can be colored in $W_{max}$ colors, we have developed an algorithm, called the fat cover algorithm, that is specifically designed to work well for graphs that can be colored in $W_{max}$ colors.
The key to this algorithm is the observation that the fat spots of the interval graph are the locations that are most important, and that we can iteratively reduce the maximum width of the uncolored portion of the graph by finding a non-overlapping set of intervals that covers all of the fat spots and coloring all of these intervals with the same color. We first introduce this idea informally with an example, and then give a more formal development of the algorithm.
An Introductory Example of the Fat Cover Algorithm
Consider the graph given in Figure 3 (a), which has a maximum width of 3 and two cyclic intervals a and b. The fat spots, or the points of maximum width, are indicated by arrows. The objective of the fat cover algorithm is to find a set of non-overlapping intervals that includes a cyclic interval and covers all of the fat spots. In Figure 3 (a) we have indicated such a set with dashed lines (intervals a and d); we call this a fat cover relative to a. If we color both a and d with the same color, then we reduce the original problem to that of finding a 2-coloring for the graph given in Figure 3 (b).
Here we find that a fat cover for b is {b, f, g}. We can then reduce the problem to a 1-coloring of the acyclic interval graph given in Figure 3 (c), which is clearly 1-colorable.
A Formal Description of the Fat Cover Algorithm
Given the basic idea of the algorithm as presented in the previous section, we now give a more rigorous description of the algorithm.
Definition 3.1 The fat spots of a cyclic interval graph G, written fatspots(G), is the set of all times $t_i$ where width(G, $t_i$) = $W_{max}(G)$.
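Definitions 2.3 and 3.1 can be made concrete with a short sketch. The encoding is our own assumption: time is divided into integer slots 0 to period - 1, an interval is a half-open pair (start, end), and a pair with start > end denotes a cyclic interval that wraps around the end of the period. The names `covers`, `width`, and `fat_spots` are hypothetical.

```python
def covers(interval, t):
    """True if the half-open interval (start, end) covers time slot t.

    A pair with start > end is treated as a cyclic (wrapped) interval.
    """
    start, end = interval
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def width(intervals, t):
    """Number of intervals covering time slot t (Definition 2.3)."""
    return sum(covers(iv, t) for iv in intervals.values())


def fat_spots(intervals, period):
    """Return (W_max, list of time slots whose width equals W_max)."""
    widths = [width(intervals, t) for t in range(period)]
    w_max = max(widths)
    return w_max, [t for t, w in enumerate(widths) if w == w_max]


# Hypothetical graph with period 6; "a" is cyclic and covers slots 4, 5, 0, 1.
graph = {"a": (4, 2), "b": (0, 3), "c": (1, 4), "d": (3, 5)}
w_max, spots = fat_spots(graph, 6)
```

Here the only fat spot is slot 1, where a, b, and c are all live, so the maximum width is 3.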
Definition 3.2 A fat cover of a cyclic interval graph G relative to interval I is a subgraph F ($I \in F$) of G that obeys the following two properties: (1) all intervals in F are non-overlapping, and (2) $\forall t_i \in$ fatspots(G), there exists an interval in F that covers $t_i$.
The development of our fat cover algorithm was inspired by Theorems 2.1 and 3.1. Given a graph G with m cyclic intervals $I_{c_1}, I_{c_2}, \ldots, I_{c_m}$, the algorithm proceeds in two phases. The first phase attempts to use m colors to find a fat cover for each of the m cyclic intervals. At the ith step, a traversal from left to right is performed to find a fat cover for interval $I_{c_i}$ (call this fat cover $F_i$). Our implementation ensures that a fat cover will be found in a left-to-right traversal, if one exists. If such a cover is found, a traversal from right to left is performed which assigns the same new color $C_i$ to all of the intervals in $F_i$.
After all of the m cyclic intervals are dealt with in this first phase, the second phase uses a straightforward left-to-right algorithm to color the remaining intervals. If the first phase succeeds, then the second phase need only consider a reduced graph $G'$ that contains no cyclic intervals, and has a maximum width of $w = W_{max}(G) - m$. The coloring of $G'$ is guaranteed to use only w new colors (see proof of Theorem 2.1). Thus, if the first phase succeeds, we can find an optimal coloring in $k = W_{max}(G)$ colors for graph G. If the first phase fails to find a fat cover at some stage, the second phase simply colors the remaining cyclic intervals with new colors, and applies the simple left-to-right algorithm to color the remaining intervals. In this case, the resulting coloring may or may not be optimal. However, Theorem 4.1 ensures that, for a graph G which is $k = W_{max}(G)$ colorable, it is possible for our algorithm to succeed in handling the cyclic intervals by finding the appropriate fat covers.
It should be noted that although the algorithm finds a fat cover at each step, it may not find the fat cover that leads to an optimal solution. That is, for a k-colorable graph G, it may select a fat cover $F'$ such that $G - F'$ is not $(k - 1)$-colorable, even though there exists another fat cover F such that $G - F$ is $(k - 1)$-colorable.
For graphs with one cyclic interval, any fat cover will do, and we do find the optimal solution. This is because we are guaranteed to find a fat cover, F, if one exists. Furthermore, if a fat cover is found, we know that the graph $G - F$ will have maximum thickness of $k - 1$, and $G - F$ will contain no cyclic intervals. Therefore, by Theorem 2.1, we can guarantee that $G - F$ can be colored with $k - 1$ colors, and G can therefore be colored in k colors.
Our fat cover algorithm can be thought of as a smart way of deciding which subset of intervals should be colored with the same color. In some of the more traditional approaches using interference graphs, a simplification phase is applied to the interference graph in which pairs of nodes are coalesced into one node, thus forcing them to be colored the same color [3]. In our case we are searching for sets of nodes that have a very specific property, that is, they all belong to a fat cover of some cyclic interval. Finding such a set of intervals requires information regarding the location of all the fat spots in the interval graph. This information is explicit in our cyclic interval graph representation, and is not available in the interference graph representation.
It should be noted that the fat cover algorithm is not computationally expensive. For each of the cyclic intervals, one sweep of the graph is required (where the size of the graph is exactly the number of 3-address statements in the program). All of the remaining intervals can be handled by one final left-to-right sweep. Furthermore, since interval graphs that correspond to programs have at most one new interval per time step, the complexity at each point in the sweep is effectively constant.
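The two properties of a fat cover, and the width reduction used in the argument above, can be checked with a short sketch. It does not implement the search for a fat cover, only a test of Definition 3.2. The slot-based encoding (half-open pairs, with start > end marking a cyclic interval) and the names `is_fat_cover` and `max_width_without` are assumptions made for illustration.

```python
def covers(interval, t):
    """True if half-open (start, end) covers slot t; start > end wraps around."""
    start, end = interval
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def is_fat_cover(cover, intervals, period, relative_to):
    """Check Definition 3.2 for the set of interval names `cover`."""
    if relative_to not in cover:
        return False
    widths = [sum(covers(iv, t) for iv in intervals.values())
              for t in range(period)]
    w_max = max(widths)
    for t in range(period):
        hits = sum(covers(intervals[name], t) for name in cover)
        if hits > 1:                          # property (1): no overlaps
            return False
        if widths[t] == w_max and hits == 0:  # property (2): fat spot covered
            return False
    return True


def max_width_without(cover, intervals, period):
    """Maximum width of the graph after removing the intervals in `cover`."""
    rest = [iv for name, iv in intervals.items() if name not in cover]
    return max(sum(covers(iv, t) for iv in rest) for t in range(period))


# Hypothetical graph with period 6, one cyclic interval "a", and W_max = 3.
graph = {"a": (4, 2), "b": (0, 3), "c": (1, 4), "d": (3, 5)}
ok = is_fat_cover({"a"}, graph, 6, relative_to="a")
reduced = max_width_without({"a"}, graph, 6)
```

In this example {a} alone is a fat cover relative to a, and removing it leaves an acyclic graph of maximum width 2, which the left-to-right algorithm can color with two further colors.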
Two Other Approaches
In this subsection, we describe a naive coloring algorithm based on Chaitin's original Interference Graph approach and a Greedy algorithm.
An Interference Graph Algorithm
Since a cyclic interval graph contains information about interference (or overlap) among intervals, we can use a Chaitin-style reduction algorithm to discover a k which guarantees that a cyclic interval graph can be colored in k colors. At the ith step of this algorithm one removes, from among the remaining intervals, the interval with the fewest number of overlapping intervals (this corresponds to removing the node with least degree from an interference graph). For a graph with n intervals, there will be n steps.
Call the interval that is removed at step i $I_i$, and let $d_i$ denote the number of overlapping intervals (degree) of $I_i$ at step i. Now pick $k = \max(d_1, d_2, \ldots, d_n) + 1$. We can color the original graph in k colors by coloring the intervals in the order $I_n, I_{n-1}, \ldots, I_1$. At each step we color an interval that has at most $k - 1$ previously colored overlapping intervals.
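The reduction and coloring just described can be sketched as follows. The slot-based encoding of cyclic intervals (half-open pairs over integer slots, with start > end marking a wrapped interval), the alphabetical tie-breaking, and the names `slots` and `smallest_last_coloring` are our own assumptions.

```python
def slots(interval, period):
    """Set of integer time slots covered by a half-open (start, end) pair."""
    start, end = interval
    if start <= end:
        return set(range(start, end))
    return set(range(start, period)) | set(range(0, end))  # cyclic interval


def smallest_last_coloring(intervals, period):
    """Remove least-degree intervals one by one, then color in reverse order."""
    occupied = {n: slots(iv, period) for n, iv in intervals.items()}
    adj = {n: {m for m in occupied if m != n and occupied[n] & occupied[m]}
           for n in occupied}
    remaining = {n: set(ns) for n, ns in adj.items()}
    order, degrees = [], []
    while remaining:
        # Interval with the fewest overlapping intervals still present.
        v = min(sorted(remaining), key=lambda n: len(remaining[n]))
        degrees.append(len(remaining[v]))
        for n in remaining[v]:
            remaining[n].discard(v)
        del remaining[v]
        order.append(v)
    k = max(degrees) + 1
    colors = {}
    for v in reversed(order):
        used = {colors[n] for n in adj[v] if n in colors}
        colors[v] = min(c for c in range(k) if c not in used)
    return k, colors


# Hypothetical graph with period 6; "a" is a cyclic interval.
graph = {"a": (4, 2), "b": (0, 3), "c": (1, 4), "d": (3, 5)}
k, colors = smallest_last_coloring(graph, 6)
```

Note that k is only an upper bound established by the removal order; the coloring may use fewer than k distinct colors on other inputs.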
A Greedy Algorithm
Another approach to coloring a cyclic interval graph is to rst color the cyclical intervals, and then use a greedy algorithm to color the remaining intervals. At each step in the greedy algorithm the following three steps are performed :
1. From among the uncolored intervals, choose the "best" one to color next, call it $I_{next}$. Some possible criteria for choosing the "best" interval include: (a) the leftmost uncolored interval (the interval with the lowest starting time), (b) the longest uncolored interval, (c) the interval which overlaps with the most uncolored intervals, or (d) the interval which has the fewest available colors (where a color C is available for interval I only if C has not been used for any interval that overlaps with I).
2. From among the colors available for $I_{next}$, choose the "best" color, call it $C_{next}$. If no color is available for this interval, then allocate a new color. Some possible criteria for choosing the "best" color include: (a) best-fit (each color is available for some time intervals; a color that best fits is one where the starting and ending times for the color best match the starting and ending times for the interval $I_{next}$), (b) worst-fit, and (c) the color which can be used for the fewest unallocated intervals.
3. Assign color $C_{next}$ to interval $I_{next}$.
The experimental results reported in Figure 5 used a greedy algorithm with option (d) for choosing the interval to color, and option (c) for choosing which color to use (these were found to be the best heuristics).
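The three steps of the greedy algorithm can be sketched as follows. This sketch uses option (d) for choosing the next interval but, as a simplifying assumption, picks the lowest-numbered available color rather than option (c). The overlap relation is supplied directly as an adjacency dictionary, and the name `greedy_coloring` is hypothetical.

```python
def greedy_coloring(adj, precolored):
    """Greedy coloring; adj maps each interval to its overlapping intervals.

    precolored holds the colors already given to the cyclic intervals.
    """
    colors = dict(precolored)
    uncolored = set(adj) - set(colors)

    def available(v):
        # Colors allocated so far that no overlapping interval is using.
        n_colors = max(colors.values(), default=-1) + 1
        used = {colors[n] for n in adj[v] if n in colors}
        return [c for c in range(n_colors) if c not in used]

    while uncolored:
        # Step 1, option (d): interval with the fewest available colors.
        v = min(sorted(uncolored), key=lambda u: len(available(u)))
        # Step 2: lowest available color (assumption), else a new color.
        avail = available(v)
        new_color = max(colors.values(), default=-1) + 1
        # Step 3: assign the chosen color.
        colors[v] = avail[0] if avail else new_color
        uncolored.remove(v)
    return colors


# Hypothetical overlap relation; the cyclic interval "a" is colored first.
overlaps = {"a": {"b", "c", "d"}, "b": {"a", "c"},
            "c": {"a", "b", "d"}, "d": {"a", "c"}}
result = greedy_coloring(overlaps, {"a": 0})
```

Intervals b and d do not overlap in this example, so they end up sharing a color and three colors suffice.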
A Hybrid Algorithm
A hybrid algorithm can combine the best points of the interference graph approach with either the fat cover method (presented in 3.1) or the greedy method (presented in 3.2.2). Given a graph G, this algorithm finds the coloring in three phases. The first phase applies a reduction step based on interference information. This phase repeatedly removes all intervals that have fewer than $W_{max}(G)$ overlapping intervals. Let us call the intervals removed $I_1, I_2, \ldots, I_m$, and the graph remaining $G'$. Phase 2 applies either a greedy algorithm or the fat cover algorithm to color the intervals in $G'$, and finally phase 3 colors the intervals removed by phase 1 in the order $I_m, I_{m-1}, \ldots, I_1$.
Let us illustrate the hybrid fat cover algorithm with a step-by-step example given in Figure 4. The first phase removes the intervals that overlap with fewer than 3 other intervals. This is equivalent to applying a Chaitin-like algorithm with k = 3. Note that only intervals c and f were removed, and all of the remaining intervals overlap with 3 other intervals. This means that in order to use only 3 colors, a naive interference graph algorithm would have to resort to spilling at this point. However, as illustrated with the next four pictures, when the fat cover algorithm is applied to the remaining intervals, a 3-coloring is found.
Figure 4(c):
The first phase of the fat cover algorithm is to find a fat cover for each cyclic interval. As shown by the arrows in this picture, there are four fat spots to be covered.
By traversing left from interval a, the cover of {a, d} is found. We can now color a and d with a new color red, and proceed to the next phase. In general, if the phase that handles all of the m cyclic intervals succeeds, then we are guaranteed to be able to color the remaining intervals with $W_{max} - m$ colors. In this example $W_{max}$ is 3, and m is 2. It should be noted that this algorithm could be improved further by allowing multiple alternations between the interference graph heuristic and the fat cover algorithm. For example, after finding the fat cover in Figure 4 (c), one could remove all intervals with degree less than 2.
An Experimental Comparison
In order to experiment with a wide variety of coloring approaches and coloring heuristics, we implemented an experimental platform that supports all four approaches outlined above and also supports a wide variety of heuristics for the greedy approach. The tables given in Figure 5 summarize the experimental results that we collected for the following algorithms: (1) the algorithm based on the naive interference graph (see Section 3.2.1), (2) a greedy algorithm (see Section 3.2.2), (3) the fat cover algorithm (as described in Section 3.1), (4) a hybrid algorithm that has a first phase based on interference information, and a second phase that applies the greedy method (see Section 3.3), and (5) a hybrid algorithm that uses the fat cover method for the second phase.
One can think of our experiments as a "heuristic algorithm challenge". We experimented with four graph sizes: 10, 20, 30, and 50 intervals. For each graph size, we studied program graphs that were generated randomly. These program graphs correspond to cyclic interval graphs that could arise from inner loop constructs of a program, i.e. they are not truly random graphs. Each time point in the graph corresponds to one instruction which may have at most one definition, and at most 2 last uses. Non-cyclic intervals correspond to live ranges that are defined and last used within the same iteration, while cyclic intervals correspond to live ranges that are either live throughout the loop, or defined in one iteration and used in the next. The graph given in Figure 3 is an example of one such graph that has 8 intervals, of which 2 are cyclic intervals.
Figure 5: Number of Extra Registers Used in Coloring 1000 Random Graphs
Each experiment was run as follows. Given an input interval graph, each of the ve algorithms was applied to the graph, and the number of colors used by each algorithm was reported. If we let k min be the minimum number used by any of the algorithms, then each algorithm was charged with penalty points, one point for each register more than k min that it used. The numbers reported in Figure 5 corresponds to the number of penalty points charged to each algorithm over 1000 experiments. A score of 0 for an algorithm means that it produced the minimal number of registers in all 1000 experiments. The results clearly indicate that the hybrid fat cover algorithm is the winner. In 12 of 20 cases it gave the best result in all 1000 experiments (a score of 0), and in 19 of 20 cases it equaled or beat all of the other algorithms (in the case of 30 intervals with 10 cyclic intervals it was a close second).
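The penalty-point scoring described above amounts to a small computation, sketched here. The function name `penalty_points`, the algorithm labels, and the color counts in the example are hypothetical and are not taken from Figure 5.

```python
def penalty_points(runs):
    """Total penalty per algorithm over a list of experiments.

    Each run maps an algorithm name to the number of colors it used.
    An algorithm is charged one point per color above the best in that run.
    """
    totals = {name: 0 for name in runs[0]}
    for run in runs:
        k_min = min(run.values())  # fewest colors used by any algorithm
        for name, used in run.items():
            totals[name] += used - k_min
    return totals


# Two made-up experiments, for illustration only.
runs = [
    {"naive": 5, "greedy": 4, "hybrid_fat_cover": 4},
    {"naive": 6, "greedy": 6, "hybrid_fat_cover": 5},
]
scores = penalty_points(runs)
```

A score of 0 under this scheme means only that no other algorithm in the comparison did better, since $k_{min}$ is the best result among the algorithms tested rather than a proven optimum.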
These results do not give conclusive answers for how the various algorithms will behave on real application programs, but they do show a general trend for a large number of graphs that correspond to possible programs (any graph we generate corresponds to a possible program). Thus, we can see that the hybrid algorithm is worth implementing in a real compiler, and that the fat cover is likely a better idea than a wide variety of greedy heuristics. As we will see in the next section, the fat cover algorithm also provides some advantages when we are trying to reduce spill code. Thus, these experiments were important for us to demonstrate that the fat cover algorithm appears to do quite well for a wide variety of graphs. Furthermore, the results motivated us to pursue the next step of integrating the fat cover algorithm with a new spilling strategy.
Finding a k-Coloring of Cyclic Interval Graphs
In the previous section we presented the fat cover algorithm that was designed to find a coloring in a minimal number of registers. In this section we present a new approach for allocating registers given the constraint that only $k$ registers are available, and the minimal number of registers required to color the graph is $k'$, where $k' > k$.
General interference graphs have several drawbacks that our interval graph representation solves naturally:
Separation of the spill phase from the coloring phase: Given that we developed a good algorithm for coloring a graph $G$ with maximum thickness $W_{\max}(G)$, we take the approach that register allocation should proceed in two phases. Given $k < W_{\max}(G)$, the first phase transforms $G$ to an equivalent graph $G'$ that has maximum thickness $W_{\max}(G') = k$. This transformation process introduces register spills and is guaranteed to produce a graph $G'$ that can be colored by the second phase without introducing any further register spills. Thus, only one application of each phase is required. This differs from most approaches based on interference graphs that introduce spilling during or after the register allocation phase. These approaches cannot guarantee that the spilling will result in a k-colorable interference graph in one pass, and it is necessary to iterate the coloring/spilling process until a k-colorable solution is found. It should be noted that the approach given in [13] suggested a means of avoiding this iteration, but it uses a more complex algorithm than that required here.
Choice of spilled quantities: We use the information stored in the cyclic interval graph to make good decisions on which intervals to spill. This information is not available in the interference graph representation, and therefore cannot be exploited in spilling techniques based on that representation. It should be noted that a similar approach has been used in the context of interference graphs [14]. In this approach the "width" (footnote 1) of the interference graph was used for one of the spill heuristics. Similarly, our algorithm uses the width of the interval graph as one of the criteria when choosing a node to spill. However, our representation captures the width of the graph at every point in the program very naturally and makes it easier to beneficially exploit this information. Furthermore, we can make use of a natural metric for distance, even for cyclic intervals.

Footnote 1: The width of the interference graph at any given point in time is defined to be the number of live variables at that point.
Register Floats: Our approach uses a two-level mechanism: (1) floating registers and (2) spilling registers. As fully explained in the next section, a register float corresponds to moving a value from one register to another register, while a register spill corresponds to moving a value from a register to a memory location and back. Clearly a register float is preferred over a register spill.
Chameleon Intervals, Register Floats, and Register Spills
By carefully studying the structure of cyclic interval graphs, one can see that there are two quite different constraints that make a graph not colorable in $k$ registers. The first is the most evident. If a graph $G$ has some time, $t_i$, where there are more than $k$ intervals covering $t_i$, then it is impossible to allocate a different color to each interval at $t_i$. For example, consider the graph given in Figure 6(a). Here there are three intervals, a, b, and c, that overlap. The only way in which this graph can be colored with 2 colors is to spill one of the intervals to memory. We illustrate this process in Figure 6(b), where the interval for c has been spilled, leaving two short intervals representing the definition of c followed by a store to memory (↓) and a load from memory (↑) followed by a use. The second situation is more subtle. Consider the graph given in Figure 6(c). This graph has a maximum width of 2, but is not 2-colorable. In this situation we have not really run out of colors, and we need not resort to spilling in order to make this graph 2-colorable. Instead, we use the notion of a chameleon interval, an interval that can change color depending on its surroundings.
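To make the second situation concrete, the following Python sketch computes the width of a cyclic interval graph at every time point and checks k-colorability by brute force. The representation (each interval as a start time and a length on a time line of a given period) and the three-interval example are assumptions chosen for illustration; the example is in the spirit of Figure 6(c) but is not claimed to be the graph shown there.

```python
from itertools import product


def covered(interval, period):
    """Time points covered by a (start, length) interval, wrapping at the period."""
    start, length = interval
    return {(start + j) % period for j in range(length)}


def widths(intervals, period):
    """Number of intervals covering each time point of one loop iteration."""
    w = [0] * period
    for interval in intervals:
        for t in covered(interval, period):
            w[t] += 1
    return w


def is_k_colorable(intervals, period, k):
    """Brute-force check: can overlapping intervals always get distinct colors?"""
    cover = [covered(iv, period) for iv in intervals]
    n = len(intervals)
    for colors in product(range(k), repeat=n):
        if all(colors[i] != colors[j] or not (cover[i] & cover[j])
               for i in range(n) for j in range(i + 1, n)):
            return True
    return False


# Hypothetical loop of 3 time points: a covers {0,1}, b covers {1,2}, c wraps to {2,0}.
example = [(0, 2), (1, 2), (2, 2)]
w = widths(example, period=3)
```

In this example the width is 2 at every time point, yet every pair of intervals overlaps somewhere, so no 2-coloring exists without letting one interval change color. The brute-force check is exponential and is meant only for tiny illustrative graphs.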
If we allow the interval for variable a to change color at the location indicated by the solid bar in Figure 6(d), then we can easily color this graph with only two colors. Thus, instead of introducing the loads and stores required for a register spill, we need only introduce a register move that corresponds to the location where interval a changes from green to red. We call this register move operation a register float: a value floats from register to register, but is not spilled (footnote 3). By using chameleon intervals to find register floats, we can color any cyclic interval graph $G$ that has $W_{\max}(G) = k$ with exactly $k$ colors without introducing any spilling. This is because any graph with $W_{\max}(G) = k$ that is not immediately k-colorable must belong to the class of graphs that can be colored if we allow chameleon intervals (as illustrated in Figure 6(d)).
Thus, we can use our fat cover algorithm to color the graph, and for each cyclic interval that cannot be covered, we simply introduce a chameleon interval. No extra loads or stores need be introduced: we simply introduce a register float for each chameleon interval. Since we introduce chameleon intervals only for the cyclic intervals that do not have a fat cover, the number of chameleon intervals introduced is small (at most $W_{\min}(G)$).
If more than one register float is introduced, it is possible that some of the register moves depend on each other. For example, perhaps a red interval needs to turn to green, and a green interval needs to turn to red. This can be accomplished either by rotating values through a temporary register, or by swapping the contents of registers using a trick such as a := a xor b; b := b xor a; a := a xor b. It is most straightforward to use a temporary, and since the number of cyclic intervals is likely to be less than the maximum width of the graph, a temporary register will be available.
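The temporary-register option can be sketched as follows. This is a generic way of ordering a set of simultaneous register moves, not code from our implementation; the register names, the `TMP` name, and the helper functions are hypothetical.

```python
def sequentialize_moves(moves, temp="TMP"):
    """Order simultaneous moves {dst: src} so that no needed value is overwritten.

    A cycle (such as a swap) is broken by first saving one register in temp.
    Returns a list of (dst, src) moves to execute one after another.
    """
    pending = {dst: src for dst, src in moves.items() if dst != src}
    ordered = []
    while pending:
        sources = set(pending.values())
        # A move is safe once no other pending move still reads its destination.
        ready = [dst for dst in pending if dst not in sources]
        if ready:
            dst = ready[0]
            ordered.append((dst, pending.pop(dst)))
        else:
            # Only cycles remain: save one register and redirect its readers.
            saved = next(iter(pending))
            ordered.append((temp, saved))
            for dst in list(pending):
                if pending[dst] == saved:
                    pending[dst] = temp
    return ordered


def run_moves(sequence, registers):
    """Execute the moves one by one on a copy of the register contents."""
    regs = dict(registers)
    for dst, src in sequence:
        regs[dst] = regs[src]
    return regs


# Red must turn green and green must turn red: a two-register swap.
swap = sequentialize_moves({"R1": "R2", "R2": "R1"})
```

For the swap above, three moves are emitted (save, move, restore), which matches the cost of rotating values through a temporary; the xor trick would avoid the temporary at the price of three arithmetic instructions.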
Reducing the Width of an Interval Graph
Given that we have the coloring algorithm described in the previous section, the problem of k-coloring now reduces to the problem of transforming a graph $G$, with $W_{\max}(G) = k'$, $k' > k$, to an equivalent graph $G'$ with $W_{\max}(G') = k$. Since we are trying to reduce the width of a graph (as shown in Figure 6(b)), this transformation must introduce register spills. Therefore we would like an approach which attempts to minimize the number of register spills.
We have developed a new algorithm, the sweep and split algorithm, that is based on the cyclic interval graph representation (footnote 4). Like the fat cover coloring algorithm, the sweep and split algorithm takes advantage of the extra information available in the interval representation. Since this algorithm is straightforward, we give only an overview.

Footnote 3: The idea of register floats is not new; however, difficulty in efficiently identifying values to treat as register floats has prevented their widespread use. Our interval graph representation provides a natural mechanism, chameleon intervals, for recognizing when to use register floats and the quantities on which to use them.
The central idea of this algorithm is to sweep from left to right over the cyclic interval graph. The invariant is that at each time step $i$, any time to the left of time $i$ is guaranteed to have a maximum width $W_{\max}(G, i) \le k$. To move to the next time step, $i + 1$, there are two situations. The first is that $\mathrm{width}(G, i) \le k$, and the second is that $\mathrm{width}(G, i) = k'$, $k' > k$. In the first case, no action is required. In the second case, one must select $k' - k$ intervals to split by introducing spill code. Thus, the only difficulty is developing a good heuristic for selecting which intervals to split.
We have developed a heuristic that uses information about time which is readily available from our interval graphs. This heuristic favors intervals that will clear the longest time interval to the right of $i$. For non-cyclic intervals this is equivalent to choosing the one with the furthest next use from $i$. Note that we do not split the whole interval, but only the segment that overlaps time $i$. The other segments will be split only if the sweep selects those intervals as the ones to split at some later step $i'$. The reasoning behind our heuristic is that, according to the invariant, all times to the left of $i$ have already had their widths reduced, and so we should favor intervals that will reduce widths to the right of $i$. If multiple intervals clear the same longest distance, an interval that requires only a load is preferred over an interval that requires both a load and a store, and if a store is required, then a store that is outside of the loop is preferred.
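As a rough illustration of the sweep, the following Python sketch handles only non-cyclic segments and only the primary criterion (furthest next use). The segment model, in which each segment `(name, start, end)` needs a register over the half-open time range from one reference to the next, and all names are simplifying assumptions; the tie-breaking rules on loads and stores and the treatment of cyclic intervals are omitted.

```python
def sweep_and_split(segments, k, horizon):
    """Sweep left to right; wherever more than k segments are live, split
    (spill) the excess segments whose next use is furthest to the right.

    segments: list of (name, start, end) with the value live over [start, end).
    Returns the list of segments chosen for spilling, in order of selection.
    """
    active = list(segments)
    spilled = []
    for i in range(horizon):
        live = [seg for seg in active if seg[1] <= i < seg[2]]
        excess = len(live) - k
        if excess > 0:
            # Furthest next use first: these clear the most time to the right of i.
            live.sort(key=lambda seg: seg[2], reverse=True)
            for seg in live[:excess]:
                active.remove(seg)
                spilled.append(seg)
    return spilled


# Hypothetical segments for a short straight-line fragment, with k = 2 registers.
segs = [("a", 0, 6), ("b", 1, 5), ("c", 2, 3), ("d", 2, 4)]
chosen = sweep_and_split(segs, k=2, horizon=6)
```

In this example the width first exceeds 2 at time 2, where four segments are live; the two with the most distant next uses (a and b) are split, after which no later time point exceeds the limit. The sketch ignores the short residual intervals that the store and load of a spilled value themselves occupy.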
Let us now give a concrete example of applying the sweep and split algorithm to a cyclic interval graph that corresponds to a small program. In Figure 7(a) we give a small illustrative program, and in Figure 7(b) we give the 3-address code (footnote 5). Assuming that the number of available registers ($k$) is 3, Figure 7(c) gives the 3-address code program that results from applying the sweep and split algorithm. In order to reduce the width to 3, three intervals must be selected to be split. Interval c cannot be split because its use is at time 1. Of the remaining intervals, the best intervals to select are a, b, and n. Splitting each of these intervals frees the longest time interval to the right of the sweeping line.
Hierarchical Cyclic Interval Graphs
In the previous sections we have concentrated on cyclic interval graphs that represent innermost loops. However, it is very important to note that our techniques are not limited to these cases. In fact, there is a natural hierarchical representation for structured programs that contain nested conditionals and loops.
We should note that our emphasis has been on the application of our methods for innermost loops. We include this discussion of hierarchical structures to show that it is plausible and relatively straightforward to extend the method to structured programs. For the parts of the program with very irregular control structure, an interference graph approach is probably more effective.
The overall strategy for hierarchical graphs is to allocate pseudo-register numbers in a bottom-up fashion according to the nested structure of the program, and then to assign real register numbers in a top-down pass.
Let us first consider the case of nested loops as illustrated in Figure 8(a). In this example, it is quite clear that this is just a nesting of cyclic interval graphs, with the cyclic interval graph for LOOP 2 nested inside a cyclic interval graph for LOOP 1. Thus, we can apply our spilling and coloring algorithms in a structured manner, starting with the innermost loop and working outwards. For example, let us assume that we find a 4-coloring for LOOP 2 in Figure 8 [...] We can now find a coloring for LOOP 1, which is now a non-hierarchical cyclic interval graph. Since this is the topmost level in the hierarchy, we can use the pseudo-register numbers as the real register numbers for all non-autonomous intervals. For the two autonomous intervals, the real register numbers must be assigned to all intervals in LOOP 2 that correspond to the autonomous intervals. For example, consider the situation where autonomous intervals were assigned real registers $R_1$ and $R_5$, and the intervals inside LOOP 2 had been assigned pseudo registers $P_1$ to c, $P_2$ to d, and $P_1$ to e. Then, on the top-down pass we must assign $R_1$ to all intervals assigned to $P_1$ (c, e) and $R_5$ to all occurrences of $P_2$ (d).
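The top-down binding in this example amounts to a simple substitution, sketched below in Python. The dictionary layout and the function name are assumptions made for illustration; the register and interval names are those of the example in the text.

```python
def bind_real_registers(pseudo_assignment, pseudo_to_real):
    """Replace each interval's pseudo register by the real register chosen
    for the corresponding autonomous interval at the enclosing level."""
    return {interval: pseudo_to_real[pseudo]
            for interval, pseudo in pseudo_assignment.items()}


# Bottom-up result for LOOP 2: pseudo registers for intervals c, d, and e.
loop2_pseudo = {"c": "P1", "d": "P2", "e": "P1"}
# Result of coloring LOOP 1: real registers for the two autonomous intervals.
autonomous = {"P1": "R1", "P2": "R5"}

loop2_real = bind_real_registers(loop2_pseudo, autonomous)
```

Because intervals that shared a pseudo register inside LOOP 2 cannot overlap, they can safely share the real register that replaces it.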
In the case of nested conditionals, the hierarchical interval graph nesting is not quite as obvious. There are two basic problems. The first problem is that we must enforce a consistent allocation between intervals of the same name that are live on entry (or exit) to both sides of the conditional. For example, in the program in Figure 9(a), a is live on entry to the conditional, and must be allocated to the same register on both sides of the conditional. Similarly, in the program in Figure 9(b), b is live on exit from the conditional, and must be allocated to the same register on both sides of the conditional.
The second problem is that there may be constraints that require a consistent allocation between live ranges with the same name that both enter and exit the conditional. Consider the program in Figure 9(c). In this case, the outer loop creates a cyclic interval which imposes the constraint that the interval for a entering the conditional must be allocated to the same register as the intervals for a exiting the conditional.
We combine all of these problems in the example of a conditional nested inside a loop as illustrated in Figure 10. Note the constraints on variables live at the start of the conditional (in our example this is variable a), and those live at the end of the conditional (variables a and b). Each such variable must be allocated to the same register on both sides of the conditional.
For example, if b is allocated to register $P_i$ in the then part of the conditional, b must also be allocated to $P_i$ in the else part. Any variable that is live only within the conditional may be allocated to different registers on each side of the conditional. For example, variable e may be assigned to different registers on each side of the conditional. As illustrated in Figure 10, we capture exactly these constraints (and no more constraints) by creating a cyclic interval graph that connects the input variables of the IF-THEN and ELSE parts together and the output variables of the IF-THEN and ELSE parts together. In our example, this means that we connect a on the inputs and a and b on the outputs. With these connections we have now created a cyclic interval graph. As the executions of the two branches of the conditional are mutually exclusive, we can think of the cyclic interval graph of the nested conditional as being composed of a time line that is "wrapped" from one branch of the conditional to the other. Now, we apply the spilling and coloring heuristics to the cyclic interval graph of the embedded conditional, and then the outer loop.
This encoding of the problem will address the problem of consistent allocation of intervals entering or exiting both sides of the conditional. In our example it will force the two intervals for a entering the conditionals to be allocated the same color, say $P_i$, and the two intervals allocated to a exiting the conditional to be allocated the same color, say $P_j$. However, by looking at only the conditional, and ignoring the presence of the outer loop, there is no reason that $P_i$ and $P_j$ must be the same color. This leads to the second problem with conditionals.
Note that a is actually a cyclic interval in the next hierarchical level, LOOP 1. This means that [...] There are several methods for handling this second problem. One solution is to introduce chameleon intervals for cyclic intervals that have been allocated different registers in a nested structure. This, of course, may introduce extra register move instructions. Another approach is to force the two related ends of a cyclic interval to be pre-allocated to the same color. In our example, this means that we would pre-allocate both the entry and exit intervals for a to one pseudo register, say $P_1$. A third method is to reallocate registers on the top-down pass. This has the disadvantage of requiring two allocations, but has the advantage of propagating exactly the constraints that are imposed by allocations at a higher level. For example, it might be the case that cyclic interval a had already been allocated to a chameleon interval in the processing of LOOP 1, and there is no extra overhead introduced by assigning two different registers to the a in the conditional.
In summary, we can apply the cyclic interval graph coloring and spilling algorithms to structured programs by repeatedly applying the following two steps in a bottom-up phase:
Step 1: Solve the innermost nested construct (either a loop or a conditional). In the case of a loop, it is already a cyclic interval graph. In the case of a conditional, we create the proper cyclic interval graph by joining the input and output variables that are common to both the IF-THEN and ELSE parts of the conditional.
Step 2: Given the solution from Step 1, replace the nested structure with simple intervals as illustrated in Figure 8(b).
The bottom-up phase is then followed by a top-down phase that propagates real register numbers to the pseudo allocation done on the bottom-up phase. On the top-down phase, extra constraints due to cyclic intervals entering and exiting conditionals must be resolved by introducing chameleon intervals or recoloring the lower level with some extra initial coloring constraints.
As we shall briefly survey in Section 7, interesting work has been conducted on hierarchical methods for register allocation [15, 16, 17]. However, it is our belief that the clear and simple representation of the cyclic interval graph provides a good basis for representing live ranges that cross the boundaries of nested program structures. This of course helps in the hierarchical register allocation process.
Interval Graph Performance on Benchmark Programs
Standalone versions of our spilling and coloring algorithms were implemented (footnote 6). In this section we compare the performance of our interval graph method of spilling and register allocation to the performance of three advanced production C compilers for the IBM RS6000 (Version 1.01.003.0013), the Sun Sparc (version bundled with SunOS 4.1.1), and the MIPS (Version 2.11.2). Our comparisons use the highest level of optimization offered by these compilers. Each of these three architectures has 32 32-bit integer registers. The RS6000 also has 32 64-bit floating point registers, while the Sparc and the MIPS have 16 64-bit floating point registers.
We focus on two inner loop bodies, one taken from Livermore Loop 8 (footnote 7), the other from the Tomcat SPEC89 benchmark (Release 1.2) (footnote 8). Both of these benchmarks were selected because of the relatively large size of their loop bodies and the large number of variables referenced. This large size is necessary in order to evaluate the efficiency of register allocation and spilling on these three architectures with their large register sets. Both benchmarks are also floating point intensive and use double precision (64-bit) arithmetic. Hence we concentrate on the allocation and spilling of floating point registers.
In order to give all compilers approximately equal input, we first transformed the input source code so that all of the complex transformations had been made explicit. Both loops were manually unrolled and software pipelined. Common subexpression elimination was performed and reused data values were explicitly assigned to local scalar variables. This transformed code is isomorphic to the interval graphs which were fed through our spiller and allocator. The aggressive optimizations performed tend to increase the lifetime of variables, thus increasing the importance of the register allocator. Thus no sophisticated analysis of array indices nor any unusual optimization was required by the commercial compilers to match the performance of our standalone implementation.

Footnote 6: We are currently integrating the spiller and the register allocator, as well as an interference graph approach, into our McCAT (McGill Compiler Architecture Testbed) research compiler.

Footnote 7: To be precise, the Tomcat loop is the "I-LOOP" beginning with "DO 250 I= I1P,I2M".
Originally, we implemented an interval graph representation in gcc (Version 1.37.1). However, its optimizations, such as common subexpression elimination and alias analysis, were poor. Hence variable lifetimes were too short to receive meaningful benefit from our register allocation scheme. We decided that for meaningful results, we had to take an approach where we could make the complex transformations explicit and rely on observing the output of compilers for which we could not modify the source. This is, of course, only a preliminary step that we used to demonstrate that our ideas are worth pursuing and studying further by others in their compilers. When the McCAT compiler supports all of the complex transformations automatically, we will be able to make a more thorough study of different register allocation strategies.
We make several assumptions when generating the interval graph results. As in the rest of the paper, we assume that in an instruction like x1 = x2 + 4 the same register could be used for both x1 and x2 if x2 were dead after this point. We also assume that instructions are executed in their source code order. If instruction scheduling and register allocation were done together, code could be reordered so as to reduce the live range of certain variables and thus reduce register pressure. For simplicity we also assume, in constructing our interval graphs, that all instructions execute in unit time. This assumption biases our results towards needing more registers. For example, if in executing a floating point divide, the destination register is not filled for 20 cycles after the initiation of the divide, that register could be used for some other purpose during those 20 cycles.
In generating spills we scan the interval graph from left to right. Whenever the thickness exceeds the number of registers, the excess is spilled. The registers chosen for spilling are those whose next use is most distant. Since these are cyclic interval graphs, the distance measure is cyclic, i.e. distance is measured from the current time to the next use, wrapping around at the end of the iteration.
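The cyclic distance measure can be written with modular arithmetic, as in the Python sketch below. The function names and the sample next-use times are hypothetical; only the wrap-around rule itself is taken from the description above.

```python
def cyclic_distance(now, next_use, period):
    """Time from `now` until `next_use`, wrapping around at the end of the iteration."""
    return (next_use - now) % period


def choose_spills(next_uses, now, period, excess):
    """Pick the `excess` live values whose next use is most distant from `now`.

    next_uses maps each live value to the time point of its next use
    within an iteration of `period` time points.
    """
    ranked = sorted(next_uses,
                    key=lambda v: cyclic_distance(now, next_uses[v], period),
                    reverse=True)
    return ranked[:excess]


# Hypothetical loop body of 10 time points, examined at time 7.
live = {"x": 9, "y": 1, "z": 8}
victims = choose_spills(live, now=7, period=10, excess=1)
```

Here y is next used at time 1 of the following iteration, which is 4 steps away and therefore further than x (2 steps) or z (1 step), even though 1 is numerically the smallest time.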
When only 16 registers are available, our method requires the introduction of spills for these benchmarks. However, as can be seen in Tables 1 and 2, the number of spills required is substantially less than that required by either the Sparc or the MIPS compilers. The reduction in load spills ranges from 6 loads per iteration to 73 loads per iteration. These per-iteration counts are proportional to the dynamic loads and stores performed. The reduction in store spills is slightly smaller, ranging from 0 to 71 stores per iteration. Please note that the total number of loads and stores includes the loads and stores introduced by spilling as well as the intrinsic loads and stores, which are the first reference or final store of array elements respectively.

Table 3 gives analogous results when 32 registers are available on the RS6000. The increased number of registers alleviates the need for many of the spills. The interval graph method allows Loop 8 and the rolled version of tomcat to execute with no load or store spills. Most interesting, however, is the performance with an unrolled version of tomcat. In this case the interval graph method still required no spills, while the RS6000 had 48 load spills and 22 store spills. Loop 8 also had 48 fewer load spills and 32 fewer store spills. Similarly, when only 16 registers were available, the largest (absolute) reduction in spills came with the unrolled tomcat loop.
After registers are spilled, the interval graph must still be colored. As was discussed in Section 4, it is always possible to color an interval graph of maximum width $W_{\max}$ with the use of chameleon registers, i.e. moving values from one register to another. Using the fat cover algorithm, all but the unrolled version of tomcat were successfully colored without using chameleon intervals, as can be seen in Table 4.
For the unrolled tomcat loop, the interval graph had a minimal coloring of 34, and our method introduced 4 chameleon intervals to make it 32-colorable. In all the state-of-the-art C compilers that we have studied (GCC, SPARC, MIPS, and the RS6000 C compilers), costly spills to memory would have been used to make the graph 32-colorable. However, in our case we needed only 4 register moves because the interval graph again provided a natural representation which allowed us to avoid these spills.
The main point to be made from these experiments is that the cyclic interval graph approach does a good job on complex loops that have high register pressure. This can be seen by the very low number of loads and stores required for all cases, and the use of register floats instead of spills on the unrolled tomcat case. This is an absolute observation, and does not depend on comparing the results to other register allocation strategies.
The other point is that, when given the same programs, three production-quality compilers produce substantially worse spill code for these challenging examples. Slight variations among these three compilers might be due to slightly different low-level optimizations or instruction scheduling. However, the main point is that all of them performed significantly worse than the interval graph approach.
This evidence indicates that further work on the cyclic interval graph approach is worthwhile, and that further tests will show the kinds of programs for which the cyclic interval graph method is superior.
Related Work
In this section, we present a survey of work related to register allocation, graph coloring, and interval graphs.
In a number of recent publications, researchers have been trying to improve Chaitin's method for register allocation. Briggs et al. recognized the fact that Chaitin's original heuristic is not guaranteed to find the minimum coloring [6]. They proposed a different heuristic method which simplifies the coloring phase by separating it from the spilling phase. That is, when the graph has been reduced to the stage where all remaining nodes have a degree greater than or equal to $k$, it does not stop and spill. The algorithm continues the coloring process by selecting one remaining node to reduce the graph according to some heuristics. At the end of the reduction phase, the nodes are processed in the reverse order and are assigned colors. It is possible that during this process, a node with a degree greater than or equal to $k$ can still be colored, since more than one neighbor may have been allocated the same color. This method is based on interference graphs, and the coloring and spilling process may be iterated several times. Nonetheless, by avoiding some pointless spilling, improved code was generated for a number of test programs.
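The reduce-then-color scheme described in this paragraph can be illustrated with a simplified Python sketch. It is not the implementation of [6]: the graph representation, the choice of the highest-degree node when no low-degree node remains, and the function name are assumptions made for the example.

```python
def optimistic_color(graph, k):
    """Simplified reduce-then-color pass over an interference graph.

    graph maps each node to the set of its neighbours.
    Returns (coloring, spilled): colors 0..k-1, and nodes left uncolored.
    """
    remaining = {node: set(adj) for node, adj in graph.items()}
    stack = []
    while remaining:
        low = [n for n in remaining if len(remaining[n]) < k]
        # No low-degree node: remove one anyway instead of spilling right away.
        # (Assumed heuristic: take a node of highest remaining degree.)
        node = low[0] if low else max(remaining, key=lambda n: len(remaining[n]))
        stack.append(node)
        for neighbour in remaining.pop(node):
            remaining[neighbour].discard(node)
    coloring, spilled = {}, []
    while stack:
        # Assign colors in the reverse of the removal order.
        node = stack.pop()
        used = {coloring[m] for m in graph[node] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[node] = free[0]
        else:
            spilled.append(node)
    return coloring, spilled


# A 4-cycle: every node has degree 2, yet two colors suffice.
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
colors, spills = optimistic_color(square, k=2)
```

On the 4-cycle every node has degree equal to $k = 2$, so a scheme that spills as soon as no low-degree node exists would spill here; deferring the decision lets all four nodes be colored because the two neighbours of each node end up sharing a color.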
Bernstein et al. have introduced a collection of heuristics which reduces the likelihood of excessive spill code generation [14]. The width, which is the number of live ranges at a certain point in the program, is used to compute the spill cost of a variable. The width coupled with the depth (of loop nesting) forms the basis of their area-based heuristics. This method employs the interference graph as the basic representation of the program, and may require the graph to be rebuilt after spill code is introduced.
Callahan, Carr and Kennedy studied register allocation methods for subscripted variables, which pose a problem for many compilers. According to their method, array references which are live across several iterations are recognized and a source-to-source transformation called scalar replacement is performed so that they can be handled by coloring-based register allocators. Register moves are introduced to transfer values of subscripted variables across iterations, thus eliminating some load and store operations. However, the introduction of register moves and the subsequent processing of register allocation seem to be orthogonal, and there exists no single unified framework for this optimization problem.
Another approach to the problem of register allocation for scalar and subscripted variables has been suggested by Duesterwald, Gupta, and Soffa [18]. This method uses the integrated register allocation graph, which is an extension of the interference graph, to represent the coloring problem for both scalars and subscripted variables. The subscripted variables are allocated a set of registers that form a register pipeline.

Interprocedural register allocation has been studied by a number of people [20, 10, 21]. For example, Steenkiste and Hennessy have developed an algorithm for interprocedural register allocation where a procedure interference graph is constructed. Each node in the graph is a procedure of the program. Two procedures which are active at the same time are adjacent in the procedure interference graph. Each node of the graph is assigned a number of colors that equals the number of registers needed by the local variables of the procedure. This number is determined by an intraprocedural allocation phase. A coloring algorithm assigns different colors to adjacent nodes of the procedure interference graph. Therefore, it is evident that a good solution for the minimum register allocation problem (as described in Problem 1 of Section 2) is important for the intraprocedural allocation phase.
Cytron and Ferrante have proposed a method of storage allocation where the amount of storage needed is equal to the maximum number of simultaneously live variables in the original program [22]. The objective of their work is to allocate storage for temporary variables by renaming, which is a compiler technique that transforms imperative programs to dataflow graphs [23, 24]. They have pointed out that the formulation of the register allocation problem as a graph coloring problem based on the traditional interference graph may abstract away vital information present in the original program (like the width of the interval graph), which their method uses to guide the register allocator to achieve an optimal solution efficiently. One difference between their work and the work proposed in this paper lies in the treatment of loop variables. For example, a scalar variable defined in a loop is either changed into an array by scalar expansion when the loop bound is known a priori, or it is transformed to a dynamically allocated variable when the loop bound is not known statically [22]. In our work, we treat such variables using cyclic intervals, thus the overhead of extra arrays or dynamic allocation is avoided.
Callahan and Koblenz have presented a register allocation method via hierarchical graph coloring [15]. The main idea is to represent the hierarchical program structure as a tree of tiles. Tiles are processed first in a bottom-up fashion and the local interference graph is created and colored (perhaps with pseudo registers) on a tile by tile basis to capture the local usage pattern. Then a top-down walk binds the pseudo registers to physical registers. Spill code is finally introduced in the less frequently executed portions of the program. Knobe and Zadeck have also proposed a hierarchical register allocation scheme based on control trees [17]. A prune procedure is executed before coloring to reduce the register pressure to a desired threshold value by storing some values in memory on entry to a program region and then reloading them on exit. The authors claim that after pruning, the coloring process will terminate if the threshold value is set properly. A live range may need to be split during the coloring process. The coloring algorithms of both the hierarchical methods described above accept Chaitin's interference graph as their input.
Gupta et al. reported their work in the area of global register allocation using clique separators [16]. A clique separator is a completely connected subgraph whose removal disconnects the graph into at least two subgraphs. Their algorithm first partitions the code into code segments using clique separators. Each code segment is colored separately using the interference graph coloring method. Then, the colored subgraphs are combined by the global register allocator. In the presence of branching, the combining process may introduce register copying at the point where different control flow paths merge.
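The definition of a clique separator can be illustrated with a small check. This is only a sketch of the definition, not of the partitioning algorithm of [16]; the adjacency-set graph encoding, the function names, and the example graph are assumptions made for illustration, and the input graph is assumed to be connected and undirected.

```python
from itertools import combinations


def is_clique(graph, nodes):
    """True if every pair of the given nodes is adjacent."""
    return all(v in graph[u] for u, v in combinations(nodes, 2))


def components_without(graph, removed):
    """Connected components of the graph after deleting `removed`."""
    remaining = set(graph) - set(removed)
    components = []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if v in remaining:      # unvisited and not removed
                    remaining.discard(v)
                    component.add(v)
                    stack.append(v)
        components.append(component)
    return components


def is_clique_separator(graph, nodes):
    """True if `nodes` form a clique whose removal splits the graph."""
    return is_clique(graph, nodes) and len(components_without(graph, nodes)) >= 2


# Hypothetical interference graph: {b, c} separates a from {d, e}.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
print(is_clique_separator(graph, {"b", "c"}))  # prints True
print(is_clique_separator(graph, {"a"}))       # prints False
```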
As we pointed out before, our problems are related to the class of circular-arc graph coloring problems [11, 12]. The idea of using interval graphs for register allocation goes back over 15 years. Tucker was one of the first to note the advantages of the representation [7, 8]. He also noted that the related concept of circular-arc graphs could be applied to program loops. Interval graphs have also been used to overlay arrays and thereby minimize program memory requirements [25], and to perform channel routing in VLSI layouts [26, 27]. However, the practical use of interval graphs in register allocation appears to have been largely ignored because of perceived difficulties in dealing with circular-arc graphs and hierarchical interval graphs, both of which arise when dealing with real programs [14]. A great deal of theoretical work has been done, a good summary of which may be found in [28, 29].
Using circular-arc graphs for register allocation has recently been proposed in high-level dataflow synthesis for digital systems [30, 31, 32]. In this application domain, the computation is represented by dataflow graphs. Dataflow graphs with loops can be modeled by cyclic dataflow graphs [33, 34], and the corresponding register allocation problem can be modeled by circular-arc graphs [34]. Unlike in compiler optimization, the hardware-oriented synthesis work traditionally does not address the issue of code spilling.
A recent application of the work described in this paper is the use of the cyclic interval graph representation in a unified framework of loop scheduling and register allocation [9]. In fact, lifetime intervals can naturally be derived from an instruction schedule, and the register allocation scheme developed in this paper can be utilized effectively in the scheduling framework.
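To make the remark about deriving lifetime intervals from a schedule concrete, the following Python sketch computes one cyclic lifetime per variable from a scheduled loop body. It is not the framework of [9]; it assumes a simplified setting in which iterations do not overlap, each variable is defined at most once per iteration, and no value lives longer than one loop period. The function name, the schedule encoding, and the example loop are all hypothetical.

```python
def cyclic_lifetimes(schedule, period):
    """Derive cyclic lifetimes from a scheduled loop body.

    schedule: list of (cycle, defined_var_or_None, used_vars) tuples,
              one per operation (an assumed encoding).
    period:   number of cycles per loop iteration.
    Returns {var: (start, length)}: the value becomes live at cycle
    `start` and stays live for `length` cycles, wrapping around the
    end of the loop body when start + length exceeds period.
    """
    def_cycle = {}
    use_cycles = {}
    for cycle, defined, used in schedule:
        if defined is not None:
            def_cycle[defined] = cycle % period
        for var in used:
            use_cycles.setdefault(var, []).append(cycle % period)

    lifetimes = {}
    for var, start in def_cycle.items():
        # Forward distance around the loop from the definition to each
        # use. A distance of 0 means the use reads the previous
        # iteration's value in the cycle where it is redefined, so the
        # value is live for a full period.
        spans = [((t - start) % period) or period
                 for t in use_cycles.get(var, [])]
        if spans:                      # values never used are skipped
            lifetimes[var] = (start, max(spans))
    return lifetimes


# Hypothetical 4-cycle loop body:
#   cycle 0: t = load a[i]
#   cycle 1: s = s + t
#   cycle 3: i = i + 1
schedule = [(0, "t", ["i"]), (1, "s", ["s", "t"]), (3, "i", ["i"])]
print(cyclic_lifetimes(schedule, period=4))
# prints {'t': (0, 1), 's': (1, 4), 'i': (3, 4)}
```

Here `s` and `i` are loop-carried and occupy a register for the entire period, while `t` is live for a single cycle; intervals of this kind are the input that a cyclic interval graph allocator would work from.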
Finally, let us reiterate our view of the relation between this work and other related work in register allocation. We believe that our approach based on interval graphs is well suited to certain structured programs, in particular large inner loops and loop nests. For less structured programs, however, the interference graph is perhaps the best representation. We expect that these two representations can complement each other in a compiler, and further experiments are clearly required to determine how to combine them in the most effective fashion. We are currently implementing a combined approach in our McCAT compiler.
Conclusions and Future Work
In this paper we have presented a new approach to register allocation that is based on a hierarchical cyclic interval graph representation. Our representation can effectively characterize the overlap between live ranges of variables at different times in a program execution. Furthermore, we have demonstrated how the additional information in such a representation is useful in coloring and spilling algorithms. We believe our method is particularly suitable for handling structured program segments such as structured loops and conditionals.
Based on our interval graph framework, we have presented two approaches to the minimal coloring problem based on the notion of a fat-cover. In addition, we have presented a new approach to the k-coloring problem. Our approach introduces the notions of chameleon intervals and register floats, which help avoid expensive register spills by introducing less expensive register moves. We presented a new sweep-and-split algorithm that is used to transform graphs that are not k-colorable into graphs that are guaranteed to be k-colorable. This transformation minimizes spills by using a powerful heuristic that is guided by information available in the interval graph representation but not in the traditional interference graph representation.
In addition to illustrating how these algorithms work on inner loops, we have also shown how the cyclic interval graph representation can be extended to accurately capture the register allocation constraints in programs with nested loops and conditionals.
We have implemented our spilling algorithm and applied it to a collection of challenging loops. By comparing our results to those produced by the MIPS, SPARC, and R6000 production compilers, we have demonstrated that the cyclic interval graph representation, combined with our new algorithms, produces very encouraging results.
Based on our experimental coloring testbed and our prototype spilling program, our register allocation framework is being integrated into the low-level structured intermediate representation supported by the McCAT compiler [35]. In addition to this implementation effort, we are continuing to exploit the extra information available from the interval graph representation. For example, one potential use of such information is the combination of instruction scheduling and register allocation.
