Towards register allocation of SSA-form programs by Hack, Sebastian et al.
Towards Register Allocation for Programs in SSA-form
Sebastian Hack, Daniel Grund, Gerhard Goos
(hack|daniel|ggoos)@ipd.info.uni-karlsruhe.de







In this technical report, we present an architecture for register allocation
on the SSA-form. We show how the properties of SSA-form programs
and their interference graphs can be exploited to develop new methods
for spilling, coloring and coalescing. We present heuristic and optimal
solution methods for these three subtasks.
1 Introduction
Register allocation is commonly considered as the task of mapping variables in
a program to processor registers. Since a variable can be multiply assigned,
all definitions of the variable write to the same register. This restriction may
sometimes lead to register allocations using more registers than necessary, or
even causing memory access. In this paper we investigate register allocation for
programs in SSA-form, i.e. all variables are statically assigned only once and
correspond to a unique definition of a variable. Thus, multiple definitions of the
same (non-SSA) variable are allowed to reside in different registers.
A common technique for register allocation is graph coloring: Each variable
in the program corresponds to a node in the undirected interference graph. If
the compiler finds out that two variables cannot be held in the same register
(they interfere), an edge is drawn between the two nodes in the interference
graph. A k-coloring of the interference graph is thus a valid register allocation
for the program.
Consider the program P in figure 1(a). Its interference graph (shown in
figure 1(b)) needs three colors to be colored. However, putting P in SSA-form
creates two distinct variables d1 and d2, one for each definition of d, as shown
in figure 2(a). This allows the register allocator to assign two different
registers to d1 and d2. However, this introduces a copy on one of the control
flow edges e3 and e4. Assuming only two registers are available, one certainly
would prefer introducing a copy to avoid memory access. If three registers are
available, one would spend the third register, assign d1, d2, d3 the same
register and thus eliminate the copy on e3.
[Figure 1: A non-SSA program and its interference graph. (a) Program P; (b) Interference graph of P]
Traditionally, φ-operations are removed before register allocation. In doing
so, SSA variables which occur as operands of a φ-operation are mapped to
as few non-SSA variables as possible. Effectively, removing φ-operations before
register allocation results in prematurely coalescing the SSA variables, poten-
tially leading to spills if registers are exhausted. Thus, eliminating a copy can
cause a spill, which is a costly trade-off. Preferably, a register allocator should
only eliminate a copy if it will not cause a spill.
As Chaitin showed in [CAC+81], each undirected graph can occur as an
interference graph of some (non-SSA) program. As finding a minimal coloring of
an undirected graph is NP-complete, usually a heuristic is applied. The variables which have
to be spilled are determined while coloring the graph when a node cannot be




The feedback is necessary, since determining the chromatic number (the number
of colors (registers) needed to color the graph) is NP-complete. So, it is not
possible to tell how many colors one will need by “simply looking at the graph”.
Furthermore, the coalescing phase, which tries to eliminate useless copies, may
change the colorability of the graph. So making a reasonable trade-off between
coalescing and spilling is hard for such a register allocator.
[Figure 2: SSA version of P and its interference graph. (a) SSA program P′; (b) Interference graph of P′]
However, the situation changes totally if register allocation is based on the
SSA-form, as we show in this paper:
• Due to a special property (see [Hac05]) of the interference graphs of SSA-
form programs, called chordality, the spilling and coalescing phases can be
completely decoupled during register allocation (this fact has also been
discovered independently by Pereira and Palsberg [PP05], Brisk [Bri05]
and Bouchez, Darte and Rastello [Bou05]).
• Since chordal graphs are perfect, they inherit all properties of perfect
graphs, of which the most important one is that the chromatic number
of the graph is equal to the size of the largest clique. Even stronger,
this property holds for each induced subgraph of a perfect graph. The
interference graph of program P shown in figure 1(a) is an example of a
non-perfect graph, since its chromatic number is 3 and the largest clique
contains 2 nodes. In other words, chordality ensures that local register
pressure is not only a lower bound for the true register demand but a
precise measure. Determining the instruction in the program where the
most variables are live gives the number of registers needed for a valid
register allocation of the program. In contrast to usual (non-SSA) graph
coloring approaches, no additional register demand can arise from
inter-block effects.
• This allows the spilling phase to exactly determine the locations in the
program where variables must reside in memory. Thus, the spilling
mechanism can be based on examining the instructions in the program instead
of considering the nodes in the interference graph. After the spilling phase
has lowered the register pressure to the given bound, it is guaranteed that
no further spill will be introduced. So spilling has to take place only once.
As we show in section 3, inserting additional copy instructions will not
lower the demand of registers in SSA-form programs. Thus, the SSA-form
can be seen as a sufficiently “un-coalesced” representation to avoid
inserting spills in favor of copies.
[Figure: phases of the register allocator: Spill, Color, Coalesce/SSA-Destruction]
• Coloring a chordal graph can be done in O(|V|²). However, if the dominance
relation and live ranges have been computed in advance (which is
commonly the case), coloring can be done in O(ω(G) · n), where n is the
number of instructions in the program and ω(G) the size of the largest
live set (which is less than or equal to the number of available registers
after spilling).
• The major source of copy instructions in a program are φ-operations.
As shown in the introductory example, coalescing these copies too early
might result in an unnecessarily high register demand. The coalescing phase
must take care that coalescing two variables will not exceed the number
of available registers.
Instead of merging nodes in the interference graph, we try to assign these
two nodes the same color. Since the graph then remains chordal, coa-
lescing can easily keep track of the graph’s chromatic number and refuse
coalescing two variables if this increases the chromatic number beyond the
number of available registers.
Section 5.2 defines the coalescing problem and proves it NP-complete. In
section 5.3, we present a heuristic approach and in section 5.4 we give an
optimal solution method for the coalescing problem.
• In section 6 we show how common register targeting constraints can be
reduced to the coalescing problem.
2 Foundations
In this section, we give some foundations which are vital for the rest of this
paper. First, we introduce a more compact notation for φ-operations, secondly,
we discuss several properties of interference graphs of SSA-form programs.
In this report, we assume that all instructions have already been scheduled.
Thus, we consider a program as being given by its control flow graph consisting
of labelled instructions
(y1, . . . , ym)← τ(x1, . . . , xn)
Each label ℓ has a set pred_ℓ of control flow predecessors and a set succ_ℓ of control
flow successors. A basic block is a set of labels {ℓ1, . . . , ℓn} for which holds:
If ℓ1 is executed, all ℓi, 1 < i ≤ n, are executed. Furthermore, we require the
program to be in SSA-form (see [CFR+91] for example), i.e. each variable is













[Figure 3(a): SSA program Q]
func f1() = let i1 = 1, j1 = 1 in f2(i1, j1)
func f2(i3, j3) = if i3 < 100 then f3(j3) else f4(i3, j3)
func f3(j3) = j3
func f4(i3, j3) = let j2 = j3 + i3, i2 = j3 + 1 in f2(i2, j2)
(b) CPS for Q
Figure 3: SSA and CPS
2.1 φ-operations
The way φ-operations are commonly written is misleading in two ways:
1. Writing φ-operations as functions suggests that two different variables
being their operands may never be contained in the same register. For a
normal operation such as +,−, . . . this is true, but not for a φ-operation.
φ “computes” its value based on control flow information in a way that
only one operand is used for the computation. Interferences of operands
of a φ-operation are never caused by the φ-operation. So, if the operands
of a φ-operation do not interfere, it is explicitly desirable to hold them in
the same register, since the φ then degenerates to a no-op. Basically, a
φ-operation is a notational trick for control flow dependent value renaming.
2. Since a φ-operation conventionally defines only one value, one
usually writes multiple φ-operations at the beginning of a basic block to
indicate that multiple values are passed along an incoming edge of this
block. This suggests that these φ-operations are processed in order, which
is wrong, as the semantics of SSA requires that all φ-operations in a basic
block are evaluated simultaneously as the first instruction in this
block.
The semantics of φ-operations is best described by the continuation passing
scheme (CPS) (see [Kel95] and [App98]), which transforms an SSA-form program
into a functional one by converting each block into a function and converting
control flow edges into function calls. The values live along a control flow edge
are passed as arguments to the function representing the basic block to which
the edge jumps, as shown in figure 3. To circumvent some of the notational
deficiencies of φ-operations, we will subsume all φ-operations in a basic block
in one matrix-like Φ-operation. The brackets shall indicate that the xij are not
used as in an ordinary operation.
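\[
\begin{array}{c}
y_1 \leftarrow \phi(x_{11}, \dots, x_{1n}) \\
\vdots \\
y_m \leftarrow \phi(x_{m1}, \dots, x_{mn})
\end{array}
\quad\Longrightarrow\quad
\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
\leftarrow \Phi
\begin{bmatrix}
x_{11} & \cdots & x_{1n} \\
\vdots & \ddots & \vdots \\
x_{m1} & \cdots & x_{mn}
\end{bmatrix}
\]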
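The simultaneous evaluation of all rows of a Φ can be illustrated with a small sketch. The following Python fragment is not part of the report: the function names, the dictionary-based environment and the column index are hypothetical and only serve the illustration. It evaluates a Φ for the column that belongs to the taken control flow edge by first reading all arguments and then writing all results, and contrasts this with the misleading in-order reading.

```python
def eval_phi(results, matrix, column, env):
    """Evaluate a matrix-like Phi for one incoming edge (hypothetical helper).

    results: names y_1..y_m; matrix: m rows of argument names x_i1..x_in;
    column: index of the incoming control flow edge; env: name -> value.
    All arguments are read before any result is written.
    """
    values = [env[row[column]] for row in matrix]   # read phase
    new_env = dict(env)
    for y, value in zip(results, values):           # write phase
        new_env[y] = value
    return new_env


def eval_phi_in_order(results, matrix, column, env):
    """The wrong reading: treat the rows as a sequence of copies."""
    new_env = dict(env)
    for y, row in zip(results, matrix):
        new_env[y] = new_env[row[column]]
    return new_env


# (a2, b2) <- Phi[[a1, b2], [b1, a2]]: along edge 1 the two values are swapped.
env = {"a1": 1, "b1": 2, "a2": 1, "b2": 2}
results, matrix = ["a2", "b2"], [["a1", "b2"], ["b1", "a2"]]
parallel = eval_phi(results, matrix, 1, env)
in_order = eval_phi_in_order(results, matrix, 1, env)
```

Under these assumptions, the parallel evaluation exchanges the values of a2 and b2, while the in-order evaluation overwrites a2 before it is read and leaves both names with the same value.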

2.2 Interference graphs of SSA-form programs
We consider a program given by its control flow graph in which each node (called
label) represents a single instruction (the program is already scheduled). The
label where a variable v is defined is denoted by Dv. If a label ℓ dominates
a label ℓ′, we write ℓ ⪯ ℓ′. Let G = (VG, EG) be the interference graph of an
SSA-form program P.
The first lemma states that interfering nodes are totally ordered regarding
dominance and is due to Budimlić [BCH+02].
Lemma 1. If two values v and w are live at some label ℓ, either Dv dominates
Dw or vice versa.
Lemma 2. If v and w interfere and Dv ⪯ Dw, then v is live at Dw.
The next lemma shows that the dominance relation is not arbitrarily
directed for chains in the interference graph.
Lemma 3. Let ab, bc ∈ EG and ac ∉ EG. If Da ⪯ Db, then Db ⪯ Dc.
The proofs of these lemmas can be found in [Hac05].
3 Spilling
Normally, spilling in graph-based register allocators removes nodes from the in-
terference graph to make the graph k-colorable. But the node in the interference
graph does not reflect where and how often a variable is used; it just records all
interferences throughout the variable’s live range by its incident edges. Since
removing the node from the graph would cause inserting memory access code
after each definition and each usage of the variable, also at places in the program
where registers suffice, a lot of work has been done to break the live ranges of
variables into reasonable pieces; see [CH90] and [BDEO97] for example.
Bouchez [Bou05] investigates the problem of how to split live ranges in order
to minimize the number of reload instructions. He gives precise descriptions
of the complexity of several variants of this problem and shows that splitting
live ranges to lower the register pressure to a fixed k while inserting a minimum
number of reload instructions is NP-complete depending on the chromatic
number of the graph.
The following theorem is the foundation for the spilling techniques we
propose in this section. It is simple to prove but not obvious and has been
independently proven by Bouchez [Bou05]. For non-SSA programs it does not hold
in general. For example, consider the non-SSA version of program Q in
figure 3(a), which will be discussed in section 5.1.
Lemma 4. For each clique C ⊆ G, VC = {v1, . . . , vn}, there is a permutation
σ : VC −→ VC, such that Dσ(v1) ⪯ · · · ⪯ Dσ(vn).
Proof. By lemma 1, for each vi, vj, 1 ≤ i < j ≤ n, either Dvi ⪯ Dvj or
Dvj ⪯ Dvi holds (dominance is total for interfering variables). Hence, every
sorting algorithm can produce σ.
Theorem 1. Let G be the interference graph of an SSA-form program and C an
induced subgraph of G. C is a clique in G iff there exists a label in the program
where all variables in VC are live.
Proof. “⇐” holds by definition. “⇒”: By lemma 4, there exists a permutation
σ of the {v1, . . . , vn} = VC such that Dσ(v1) ⪯ · · · ⪯ Dσ(vn). Then, by lemma 2,
all σ(v1), . . . , σ(vn−1) are live at Dσ(vn).
This implies that the number of registers needed for the program can be
determined by checking the sets of variables live at the instructions of the
program: it is the size of the largest such set. Now it is obvious why splitting
a live range by a copy instruction does not lower the register demand in an
SSA-form program: The number of live variables does not change for any
instruction in the program, thus the chromatic number of the graph remains
the same.
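As a minimal sketch of this consequence, the register demand can be read off the live sets alone. The Python fragment below assumes that liveness has already been computed and is given as a mapping from labels to sets of live variables; the function names and the example live sets are hypothetical and not taken from the report.

```python
def register_demand(live_sets):
    """Size of the largest live set, i.e. the register demand by Theorem 1."""
    return max((len(live) for live in live_sets.values()), default=0)


def labels_over_pressure(live_sets, k):
    """Labels at which more than k variables are live (spilling is needed there)."""
    return sorted(label for label, live in live_sets.items() if len(live) > k)


# Hypothetical live sets of a small straight-line SSA program.
live_sets = {
    1: {"a"},
    2: {"a", "b"},
    3: {"b", "c"},
    4: {"d"},
    5: set(),
}
demand = register_demand(live_sets)
hot_labels = labels_over_pressure(live_sets, 1)
```

For non-SSA programs the same computation only yields a lower bound, which is why this shortcut depends on the SSA property stated in the theorem.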
3.1 An adaption of Belady’s algorithm
Guo et al. [GGP03] show the power of Belady’s algorithm [Bel66] for spilling in
long basic blocks in SSA-form. Using the properties of SSA-form programs and
their interference graphs, the method by Guo can be extended to work on the
whole procedure, not just on a linear sequence of code.
Given k registers and a label ℓ where l > k variables are live. Clearly,
l − k variables have to reside in memory at ℓ for a valid register allocation. The
method of Belady selects those to remain in memory whose usages are farthest
away from this label. Here, “away” means the number of instructions which
have to be executed from ℓ until the usage is reached. If the usage is in the same
basic block, this number is simply given by the number of instructions between
ℓ and the usage. If the usage is outside ℓ’s block, we have to define a reasonable
measure, e.g.:
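\[
\mathrm{nextuse}(\ell, v) =
\begin{cases}
\infty & \text{if } v \text{ is not live at } \ell \\
0 & \text{if } v \text{ is used at } \ell \\
1 + \min_{\ell' \in \mathrm{succ}_\ell} \mathrm{nextuse}(\ell', v) & \text{otherwise}
\end{cases}
\]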
We will apply the Belady method to each basic block separately and combine
the results to obtain a solution for the whole procedure. Consider a basic block
B. We define P as the set of all variables live in at B and the results of the
Φ-operation in B if there is one. These variables are passed to the block B from
another block. A value v live in at B is passed on each incoming control flow












the x1, . . . , xn are passed along the respective incoming control flow edges to B
and are referenced under the “name” y in B.
Since we have k registers, only k variables can be passed to B in registers.
Let σ : P −→ P be a permutation which sorts P ascendingly according to
nextuse (viewed from the entry of block B). The set of variables which we allow
to be passed in registers to B is then
\[ I := \{ p_{\sigma(1)}, \dots, p_{\sigma(\min\{k, l\})} \} \]
We apply the Belady scheme by traversing the labels in the block from entry to
exit. A set Q of maximal size k is used to hold all variables which are currently
in registers. Q is initialized with I, optimistically presuming that all variables
of I are kept in registers upon entering the block B.
At each label
\[ \ell : (\underbrace{y_1, \dots, y_m}_{D_\ell}) \leftarrow \tau(\underbrace{x_1, \dots, x_n}_{U_\ell}) \]
the U_ℓ have to be in registers when the instruction is reached. Assume that
some of the U_ℓ are not contained in Q, i.e. R := U_ℓ \ Q is not empty. Thus,
reloads have to be inserted for all variables in R. By inserting reloads, the values
of the variables are brought into registers. Thus,
\[ \max\{|R| + |Q| - k,\ 0\} \]
variables are removed from Q. As the method of Belady suggests, we remove
the ones with the highest nextuse. Furthermore, if a variable v ∈ I is displaced
before it was used, there is no sense in passing v to B in a register. inB denotes
the set of all v ∈ I which are used in B before they are displaced.
We assume that all variables defined by τ are written to registers upon
executing τ. This means that
\[ \max\{|D_\ell| + |Q| - k,\ 0\} \]
variables are displaced from Q when τ writes its results.¹ Note that for all
v ∈ U_ℓ, nextuse(ℓ, v) = 0 holds, which implies that the U_ℓ would be displaced
last. However, what matters here is not the uses at ℓ but the uses after ℓ. Thus,
we need a slightly modified version of nextuse
\[ \mathrm{nextuse}'(\ell, v) = 1 + \min_{\ell' \in \mathrm{succ}_\ell} \mathrm{nextuse}(\ell', v) \]
which disregards the usages at ℓ. After processing all labels in the block, we
record the final contents of Q in the set outB.
Finally, after processing each block as described above, we have to combine
the results to form a valid solution for the whole procedure. In particular, we
have to ensure that each variable in inB for some block B reaches B in a
register. Therefore, we have to check each predecessor P of B and insert reloads
for all inB \ outP on the edge from P to B.²
The pseudocode of the algorithm is presented in appendix D.
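Independently of appendix D, the per-block part of the scheme can be sketched as follows. This Python fragment is only an illustration under stated assumptions: a block is a list of (defs, uses) pairs, the distance of a use beyond the block exit is supplied in a hypothetical dictionary exit_dist, every instruction is assumed to use and define at most k variables, and stores are not modelled. It is not the authors' pseudocode.

```python
import math


def next_use(block, start, v, exit_dist):
    """Distance from instruction index `start` to the next use of v."""
    for j in range(start, len(block)):
        if v in block[j][1]:
            return j - start
    # No further use in this block: fall back to the distance beyond the exit.
    return (len(block) - start) + exit_dist.get(v, math.inf)


def belady_block(block, k, entry_regs, exit_dist):
    """Process one block; returns (reloads, in_B, out_B)."""
    q = set(entry_regs)          # variables currently in registers (the set Q)
    untouched = set(entry_regs)  # members of I that were not displaced so far
    in_b, reloads = set(), []

    def displace(count, start):
        # Evict the variables whose next use (seen from `start`) is farthest away.
        ranked = sorted(q, key=lambda v: (next_use(block, start, v, exit_dist), v),
                        reverse=True)
        for v in ranked[:count]:
            q.discard(v)
            untouched.discard(v)

    for idx, (defs, uses) in enumerate(block):
        missing = [u for u in uses if u not in q]            # R = U \ Q
        displace(max(len(missing) + len(q) - k, 0), idx)
        for u in missing:
            reloads.append((idx, u))
            q.add(u)
        in_b |= untouched & set(uses)
        # Results go to registers; rank by uses strictly after this instruction.
        displace(max(len(defs) + len(q) - k, 0), idx + 1)
        q |= set(defs)
    return reloads, in_b, q


block = [(["a"], []), (["b"], []), (["c"], ["a"]), (["d"], ["b", "c"]), ([], ["a", "d"])]
reloads, in_b, out_b = belady_block(block, 2, set(), {})
```

In this example with two registers, a is displaced when c is defined, since its next use is farther away than that of b, and therefore has to be reloaded before the last instruction.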
3.2 Spilling using integer linear programming
Based on theorem 1, we formulate an ILP-based approach for the spilling
problem. For ease of specification, assume that each basic block B contains
a terminating instruction which uses all variables live at the end of the basic
block.³ Thus, each variable live at the end of a block B has at least one use
in B. Let ℓ_B denote the label of the terminating instruction in block B.
Let us consider a label
\[ \ell : (\underbrace{y_1, \dots, y_m}_{D_\ell}) \leftarrow \tau(\underbrace{x_1, \dots, x_n}_{U_\ell}) \]
as shown in figure 4. Let L_ℓ be the set of variables live out at ℓ without the
variables defined at ℓ. Since all variables which are operands to τ must be present
in registers at ℓ and the variables defined by τ are also written to registers, the
following inequalities must hold:
\[ |L_\ell| + |U_\ell| - |L_\ell \cap U_\ell| \le k \]
\[ |L_\ell| + |D_\ell| \le k \]
Thus, the number p_ℓ of variables which can be “passed by” ℓ in registers is
\[ p_\ell := k - \max\{|D_\ell|,\ |U_\ell| - |L_\ell \cap U_\ell|\} \]
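To make the two inequalities and the resulting bound concrete, the following Python sketch evaluates them for a single label. The function names and the example sets are hypothetical; the sets stand for the definitions, the uses and the variables live out at the label without its definitions.

```python
def pass_by_capacity(k, defs, uses, live_through):
    """p = k - max(|D|, |U| - |L ∩ U|) for one label."""
    d, u, l = set(defs), set(uses), set(live_through)
    return k - max(len(d), len(u) - len(l & u))


def pressure_ok(k, defs, uses, live_through):
    """Check both register pressure inequalities for one label."""
    d, u, l = set(defs), set(uses), set(live_through)
    return len(l) + len(u) - len(l & u) <= k and len(l) + len(d) <= k


# Hypothetical label: y <- tau(a, b), where a, x1 and x2 stay live after the label.
capacity = pass_by_capacity(4, {"y"}, {"a", "b"}, {"a", "x1", "x2"})
fits = pressure_ok(4, {"y"}, {"a", "b"}, {"a", "x1", "x2"})
```

Here three variables may pass the label in registers, which is exactly the number of variables that do so in the example, so both inequalities hold with equality.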
¹ Note that Q ∩ D_ℓ = ∅ since the program is in SSA-form.
² Inserting code on an edge is possible by eliminating critical edges and placing the code in
the respective block.
³ A variable is live at the end of a block if it is live out at the block or occurs as an argument
to a Φ-operation in a successor of B in the column corresponding to B.
[Figure 4: Uses, definitions and variables live through at a label]
At each label ℓ where more than k variables are live, the spilling phase has
to select a subset of at most k variables which shall reside in registers
at ℓ. If a variable v is not in this subset, it has to be reloaded at its next use. So,
equivalently, the spilling phase decides which usages of a variable are preceded
by a reload and which are not. To avoid memory traffic, the number of reload operations
should be kept as small as possible.
3.2.1 Reloads
For every use of an (SSA-)variable x at label ℓ, we introduce a decision variable
m(x,ℓ) which shall be 1 if x reaches ℓ in memory, causing a reload, or 0 if x
reaches ℓ in a register. If a variable is selected to remain in memory at some
label ℓ, the next use of this variable will imply a reload, thus the decision variable
for that use must be 1. We define the function next(v, ℓ) to obtain the decision
variable for the next use of a variable v live at some label ℓ. Thus, for each
label ℓ, we add the following constraint:
\[ \sum_{v \in T_\ell} \mathrm{next}(v, \ell) \ge p_\ell \]
Consider a variable v live in at some successor S of some block B. If the first
use of v in S (note that there is at least one, by construction) is from memory,
there is no sense in keeping v in a register at the end of B. Analogously, if the
first use of v in S is from a register, v must also be kept in a register at the end
of B. If it were not, one would need a reload of v at the beginning of S. In short,
a variable cannot change its storage location at basic block borders. Thus, for
each variable v live in at some block B and all predecessors P1, . . . , Pn of B where
v is live out, we create the following constraints:
\[
\begin{aligned}
m_{(v, \ell_{P_1})} &= \mathrm{next}(v, \text{first label in } B) \\
&\;\;\vdots \\
m_{(v, \ell_{P_n})} &= \mathrm{next}(v, \text{first label in } B)
\end{aligned}
\]













\[
\begin{aligned}
m_{(x_1, \ell_{P_1})} &= \mathrm{next}(v, \text{first label in } B) \\
&\;\;\vdots \\
m_{(x_n, \ell_{P_n})} &= \mathrm{next}(v, \text{first label in } B)
\end{aligned}
\]
meaning that if the first use of v (from the beginning of B) is from memory,
all the arguments of v’s row shall also be in memory at the end of the Φ’s
predecessor blocks.
3.2.2 Rematerialization





Rematerializing a variable means recomputing it instead of reloading it from
memory. The decision variable r(v,ℓ) indicates that v shall be rematerialized
at ℓ. Clearly, a variable v can only be rematerialized at a label ℓ if the operands
o1, . . . , on of the instruction defining v are live at ℓ. The following constraint
forces all operands to reach ℓ in registers whenever r(v,ℓ) = 1:
\[ \sum_{i=1}^{n} m_{(o_i, \ell)} + n \cdot r_{(v, \ell)} \le n \]
Additionally, rematerializing v at ℓ only makes sense if v reaches ℓ in memory:
\[ r_{(v, \ell)} \le m_{(v, \ell)} \]
If one can rematerialize a variable v at a label ℓ, the costs incurred by m(v,ℓ)
should be reduced. So it is sensible to choose the weight ρ(v,ℓ) of r(v,ℓ) in the
target function between 0 and µ(v,ℓ).
3.2.3 Stores
Up to now, we only considered the costs of reloads. If one also wants to take
into account the costs of store operations, additional decision variables and
constraints have to be added. For each (SSA-)variable v, we introduce one
decision variable sv to indicate whether v must be stored to memory. sv must
be 1 if there is some label ℓ with m(v,ℓ) = 1. Furthermore, store costs should
not be incurred if a value can be rematerialized. This is expressed by the
following constraints:
\[
\begin{aligned}
s_v &\ge m_{(v, \ell_1)} - r_{(v, \ell_1)} \\
&\;\;\vdots \\
s_v &\ge m_{(v, \ell_n)} - r_{(v, \ell_n)}
\end{aligned}
\]
We then add
\[ \sum_{v \in \mathrm{Variables}} \sigma_v \cdot s_v \]
to the cost function, where the σv are arbitrarily selectable positive weights.
Since the cost function gets minimized, sv will be automatically pulled to 0 if
all m(v,ℓi) are 0.









Note that the constraints and variables for stores and rematerialization are
optional. If one wants to trade solving time against solution quality, one can
omit the variables/constraints for stores and/or rematerialization.
4 Coloring
The main tool for coloring chordal graphs is the perfect elimination order (PEO)
(see [Gol80]). A PEO determines in which order the nodes of the graph have to be
removed and put on a coloring stack so that the graph can be optimally colored⁴
by re-inserting them in reverse order and giving each node the smallest color
which is not allocated to one of its already inserted neighbors.
PEOs are based on so-called simplicial nodes. A node is simplicial if its
neighbors form a clique. A lemma by Dirac (see [Gol80]) states
that each chordal graph has a simplicial node. Since removing a node (and
its incident edges) from a chordal graph preserves chordality, there is always a
simplicial node to remove. Now it is obvious why chordal graphs can be colored
so efficiently: When a removed simplicial node is re-inserted, all its previously
inserted neighbors form a clique (the node was simplicial when it was removed),
so the node is assigned the next free color.
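The elimination and re-insertion scheme can be sketched on an explicit graph as follows. The Python fragment is a naive illustration and not an efficient implementation: the graph is assumed to be a dictionary mapping each node to the set of its neighbors, and the function names are hypothetical.

```python
from itertools import combinations


def is_simplicial(graph, v, remaining):
    """True if the remaining neighbors of v are pairwise adjacent."""
    nbrs = [n for n in graph[v] if n in remaining]
    return all(b in graph[a] for a, b in combinations(nbrs, 2))


def perfect_elimination_order(graph):
    """Repeatedly remove a simplicial node; fails if the graph is not chordal."""
    remaining, order = set(graph), []
    while remaining:
        v = next((n for n in sorted(remaining)
                  if is_simplicial(graph, n, remaining)), None)
        if v is None:
            raise ValueError("graph is not chordal")
        order.append(v)
        remaining.remove(v)
    return order


def color_chordal(graph):
    """Re-insert the nodes in reverse PEO, giving each the smallest free color."""
    coloring = {}
    for v in reversed(perfect_elimination_order(graph)):
        used = {coloring[n] for n in graph[v] if n in coloring}
        coloring[v] = next(c for c in range(len(graph)) if c not in used)
    return coloring


# A triangle a, b, c with a pendant node d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
coloring = color_chordal(graph)
```

The example graph is chordal and its largest clique has three nodes, so the sketch uses exactly three colors. A cycle of four nodes without a chord has no simplicial node and is rejected.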
PEOs are closely related to the dominance relation of an SSA-form program.
Consider an SSA-form program and its interference graph G:
Theorem 2. An SSA variable v can be added to a PEO of G if all variables
whose definitions are dominated by the definition of v have been added to the
PEO.
Proof. To be added to a PEO, v must be simplicial. Let us assume v is not
simplicial. Then, by definition, there exist two neighbors a, b of v which are not
connected (va, vb ∈ E and ab ∉ E). By the premise, all variables whose
definitions are dominated by Dv have been added to the PEO and removed
from G. Thus, Da ⪯ Dv. Then, by lemma 3, Dv ⪯ Db, which contradicts the
premise. Thus, v is simplicial.
⁴ using as few colors as possible
As a direct consequence, a PEO can be constructed by a post-order pass over
the dominance tree. Equivalently, the interference graph can also be colored by
assigning a color to a variable v, when all variables whose definitions dominate
the one of v have been colored. Thus, if dominance information and the set of
variables live at v’s definition are present, the graph can be colored with a simple
pass over the dominance tree without materializing the interference graph itself.
For pseudo code see Algorithm 1.
Algorithm 1 Coloring an interference graph of an SSA-form program
procedure Color-Program(Program P)
    Color-Recursive(start block of P)
end
procedure Color-Recursive(Basic block B)
    assigned ← colors of the live-in of B
    for each instruction (b1, . . . , bm) ← τ(a1, . . . , an) from entry to exit do
        for a ∈ {a1, . . . , an} do
            if last use of a then
                assigned ← assigned \ color(a)
            fi
        od
        for b ∈ {b1, . . . , bm} do
            color(b) ← one of all colors \ assigned
            assigned ← assigned ∪ color(b)
        od
    od
    for each block C immediately dominated by B do
        Color-Recursive(C)
    od
end
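A runnable rendering of this pass might look as follows. It is a sketch under several assumptions that are not stated in the report: blocks are lists of (defs, uses) pairs, a Φ is modelled as a first instruction with results but no uses, Φ-arguments count as live at the end of the predecessor block, and the dominance tree and live sets are given as dictionaries. All names are hypothetical, and definitions that are never used keep their register until the end of the block.

```python
def color_program(blocks, dom_children, live_in, live_out, entry, k):
    """Color SSA variables by a pre-order walk over the dominance tree."""
    color = {}

    def color_block(name):
        instrs = blocks[name]
        assigned = {color[v] for v in live_in[name]}
        for i, (defs, uses) in enumerate(instrs):
            later = {u for _, us in instrs[i + 1:] for u in us}
            for a in set(uses):
                # Last use: not used later in the block and dead at its end.
                if a not in later and a not in live_out[name]:
                    assigned.discard(color[a])
            for b in defs:
                free = [c for c in range(k) if c not in assigned]
                color[b] = free[0]   # IndexError if register pressure exceeds k
                assigned.add(color[b])
        for child in dom_children.get(name, []):
            color_block(child)

    color_block(entry)
    return color


# Program Q of figure 3, reconstructed from its CPS form.
blocks = {
    "B1": [(["i1"], []), (["j1"], [])],
    "B2": [(["i3", "j3"], []), ([], ["i3"])],        # Phi, then the branch
    "B3": [([], ["j3"])],                            # return j3
    "B4": [(["j2"], ["j3", "i3"]), (["i2"], ["j3"])],
}
dom_children = {"B1": ["B2"], "B2": ["B3", "B4"]}
live_in = {"B1": set(), "B2": set(), "B3": {"j3"}, "B4": {"i3", "j3"}}
live_out = {"B1": {"i1", "j1"}, "B2": {"i3", "j3"}, "B3": set(), "B4": {"i2", "j2"}}
color = color_program(blocks, dom_children, live_in, live_out, "B1", 2)
```

With two registers, this sketch gives i3 and j2 one register and j3 and i2 the other, so the Φ on the edge from B4 back to B2 amounts to exchanging the two registers.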





5 Coalescing
The coalescing phase tries to minimize the number of copy instructions by
coalescing variables. In our setting, copy instructions solely occur due
to the destruction of Φ-operations and to handle register constraints imposed
by the target architecture. Let us first discuss how Φ-operations can be
implemented using real processor instructions.
5.1 Implementing Φ-operations
Conventionally, while translating out of the SSA-form, Φ-operations are sub-
stituted by a sequence of copy instructions. However, the main property of a
copy operation is that it duplicates a value and therefore brings the value into
another register. But duplicating a value is not always necessary when imple-
menting Φ-operations. For example, have a look at the Φ-operation in program
Q in figure 3(a). If the basic block of the Φ-operation is reached via edge e4, j3
is assigned j2 and i3 is assigned i2. Replacing the Φ by a sequence of copies
i3 = i2
j3 = j2
during SSA-destruction suggests that i3 interferes with j2 and they thus cannot
be assigned the same register. This adds an edge from i3 to j2 in the interference
graph of Q (see figure 5(a)), raising the register demand from 2 to 3.
However, Φ-operations can be removed in a way that the register demand
for that Φ never exceeds the number of variables the Φ defines.⁵ Consider an
operation
(b1, . . . , bn) = Perm(a1, . . . , an)
which works as a “bulk copy” assigning a1 to b1, a2 to b2, and so forth in one
step. This corresponds to a permutation of the registers assigned to the ai and
bi. For example, consider the Φ in program Q in figure 3(a) and its register
allocated version in figure 5(b). To eliminate the Φ, a Perm has to be inserted
on the edge e4 which swaps R1 and R2.
There are, however, situations where a value must be duplicated. For instance,
consider a slight modification of program Q by deleting the variable j1 and
modifying the Φ-operation as shown in figure 5(c). The Φ is now utilized to
duplicate i1 into i3 and j3, which can only be achieved with a copy instruction
on the edge e1. The same situation occurs if the Φ-operation has an argument
which is live-in at the Φ’s block (see figure 5(d)). Since i1 then interferes
with the Φ, a copy from i1 to i3 is inevitable on edge e1.
To sum up, duplicates are only needed if a Φ-argument is used multiple times in
one column (and thus on the path from the predecessor to the Φ’s block) or if a
Φ-argument is live-in at the Φ’s block. Note that mere interference with a value
defined by the Φ is not sufficient: Consider program Q in figure 3(a). j2 and j3
interfere, but no duplicate of j2 is needed, since it is not live-in at the Φ’s
block.
[Figure 5: Register allocated Q and example programs Q′, Q′′. (a) Interference graph of Q; (b) Register allocated Q; (c) SSA program Q′; (d) SSA program Q′′]
Considering machine code, permutations can be implemented in various ways
(refer to appendix B for basic definitions concerning permutations):
Register swaps
Some architectures, like the x86, have an instruction which swaps the
contents of two registers. This directly implements a transposition.
Architectures without such an instruction can implement a swap using three
instructions for the classical xor trick (which also works with addition and
subtraction operations).
Moves
Having one backup register, each cycle ζ can be implemented with |ζ|+ 1





where rX is a free register not used in the permutation.
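Both implementation strategies can be sketched in a few lines. The Python fragment below is an illustration only: a Perm is assumed to be given as a dictionary that maps each source register to the register its value has to end up in, and the helper names as well as the register names are hypothetical. A cycle of length n is turned either into n − 1 swaps or into n + 1 moves through a spare register.

```python
def cycles(perm):
    """Decompose a register permutation {src: dst} into its non-trivial cycles."""
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen or perm[start] == start:
            continue                      # fixed points need no code
        cycle, reg = [], start
        while reg not in seen:
            seen.add(reg)
            cycle.append(reg)
            reg = perm[reg]
        result.append(cycle)
    return result


def cycle_to_swaps(cycle):
    """n - 1 transpositions, each given as a pair of registers to exchange."""
    return [(cycle[i], cycle[i - 1]) for i in range(len(cycle) - 1, 0, -1)]


def cycle_to_moves(cycle, spare):
    """n + 1 moves (dst, src) using one spare register."""
    moves = [(spare, cycle[-1])]          # save the value that would be lost
    for i in range(len(cycle) - 1, 0, -1):
        moves.append((cycle[i], cycle[i - 1]))
    moves.append((cycle[0], spare))
    return moves


def run(regs, moves=(), swaps=()):
    """Simulate moves and swaps on a register file given as a dictionary."""
    regs = dict(regs)
    for dst, src in moves:
        regs[dst] = regs[src]
    for a, b in swaps:
        regs[a], regs[b] = regs[b], regs[a]
    return regs


perm = {"R1": "R2", "R2": "R3", "R3": "R1", "R4": "R4"}
regs = {"R1": "a", "R2": "b", "R3": "c", "R4": "d", "rX": None}
(cycle,) = cycles(perm)
by_swaps = run(regs, swaps=cycle_to_swaps(cycle))
by_moves = run(regs, moves=cycle_to_moves(cycle, "rX"))
```

Both variants leave the value of R1 in R2, the value of R2 in R3 and the value of R3 in R1, while R4 is a fixed point and produces no code.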
5.2 Optimizing Φ-operations
As we have seen, we can eliminate Φ-operations in a way that no additional
register demand arises. Thus, a coloring of the interference graph of the SSA-
form program is a valid register allocation for the non-SSA one containing Perm-
operations. In order to lower the number of transpositions needed for a Perm,
we investigate the problem of maximizing the number of fixed points of a Perm.⁶
A variable x is a fixed point of a Perm
(. . . , x′, . . . ) = Perm(. . . , x, . . . )
if x and its target x′ are assigned the same register (see appendix B for a
detailed discussion on permutations). Clearly, for fixed points no code has to
be generated. Moreover, if the Perm consists only of fixed points, no code has
to be generated for the Perm at all.
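Given an SSA-form program P, its interference graph G = (V, E) and the
set Φ of all Φs in P. For a valid k-coloring f : V −→ {1, . . . , k} of G, we define
the costs of a single Φ-operation as
\[
c\left(
\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
\leftarrow \Phi
\begin{bmatrix}
x_{11} & \cdots & x_{1n} \\
\vdots & \ddots & \vdots \\
x_{m1} & \cdots & x_{mn}
\end{bmatrix},
f \right)
= \sum_{i=1}^{m} \sum_{j=1}^{n} \mathrm{cost}_f(y_i, x_{ij})
\quad\text{with}\quad
\mathrm{cost}_f(a, b) =
\begin{cases}
w_{ab} & \text{if } f(a) \ne f(b) \\
0 & \text{else}
\end{cases}
\tag{1}
\]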
where the wab ≥ 0 are costs for copying b to a. The overall costs of the program
are then the sum over all its Φ-operations:
\[ c(P, f) = \sum_{\varphi \in \Phi} c(\varphi, f) \]
⁶ Note that optimizing fixed points is only an approximation corresponding to the
traditional coalescing paradigm but does not generally minimize the number of transpositions.
Definition 1 (SSA-Maximize-Fixed-Points). Given an SSA-form program
P and its interference graph G, find a coloring f of G for which c(P, f) is
minimal.
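The objective can be evaluated for a given coloring with a few lines of code. The sketch below assumes that the costs of a Φ are the sum of cost_f over all pairs of a result and an argument in its row, as in equation 1, and that the program costs are the sum over all Φs. The function names, the unit weights and the example coloring are hypothetical.

```python
def phi_cost(results, matrix, coloring, weight=lambda a, b: 1):
    """Sum of w_ab over all (result, argument) pairs with different colors."""
    return sum(weight(y, x)
               for y, row in zip(results, matrix)
               for x in row
               if coloring[y] != coloring[x])


def program_cost(phis, coloring, weight=lambda a, b: 1):
    """Costs of all Phis of a program; phis is a list of (results, matrix)."""
    return sum(phi_cost(results, matrix, coloring, weight)
               for results, matrix in phis)


# The Phi of program Q: (i3, j3) <- Phi[[i1, i2], [j1, j2]].
phis = [(["i3", "j3"], [["i1", "i2"], ["j1", "j2"]])]
coloring = {"i1": 0, "i3": 0, "i2": 1, "j1": 1, "j3": 1, "j2": 0}
cost = program_cost(phis, coloring)
```

With this coloring, i1 and j1 are fixed points, whereas i2 and j2 are not, so the costs are 2 with unit weights.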
Theorem 3. SSA-Maximize-Fixed-Points is NP-complete depending on
the number of Φ-operations.
Proof. We reduce the NP-complete problem of finding a k-coloring of a graph
H to SSA-Maximize-Fixed-Points. The method of reduction is based on a
brilliant idea by Rastello et al. [RdFG03]. Consider the following example graph
H:
[Figure: example graph H with the nodes v1, v2, v3]
We construct an SSA-form program from H in the following way:
• There is one block B
• For each node vi ∈ VH there is a block Bi and a control flow edge Bi −→ B
• For each edge vivj ∈ EH there are three basic blocks B′ij , Bij , Bji and the
following control flow edges
– Bi −→ Bji
– Bj −→ Bij
– Bij −→ B′ij
– Bji −→ B′ij










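• A Φ-operation
\[
\begin{pmatrix} p_1 \\ \vdots \\ p_k \end{pmatrix}
\leftarrow \Phi
\begin{bmatrix}
v_1 & \cdots & v_{|V_H|} \\
\vdots & & \vdots \\
v_1 & \cdots & v_{|V_H|}
\end{bmatrix}
\]
is placed in block B.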
Thus, the example program fragment now looks like:
[Figure: program fragment constructed from the example graph H, with blocks defining v1, v2 and v3]

































In the interference graph G of the constructed program, an edge vivj ∈ EH







The dashed lines denote that assigning the two nodes the same color will lower
the cost by one. First of all, let us consider a lower bound for the costs. Since
each vi can only be assigned one color, there are k − 1 nodes pi with different
colors than vi, resulting in costs of k − 1 for vi. So, each optimal solution incurs
costs of at least |VH|(k − 1).
Assume H is k-colorable and let g be a k-coloring of H. Assign each vi ∈
VG the color g(vi), vi ∈ VH. Since g(vi) ≠ g(vj) for each vivj ∈ EH, the
nodes vi, aij, pij can be assigned the same color. Thus, g incurs costs of exactly
|VH|(k − 1). So each k-coloring of H induces an optimal solution of
SSA-Maximize-Fixed-Points.
A coloring f of G which does not correspond to a k-coloring of H is not an
(optimal) solution of SSA-Maximize-Fixed-Points: Since f corresponds to
no k-coloring of H, there is vivj ∈ EH for which f(vi) = f(vj). Thus, f(aij) ≠
f(vi) and f(aji) ≠ f(vj), resulting in costs strictly greater than |VH|(k − 1).
Thus, SSA-Maximize-Fixed-Points is NP-complete depending on the number of
Φ-operations.
5.3 A heuristic approach for Perm optimization
In this section, we present a heuristic approach for finding a k-coloring of the
interference graph of an SSA-form program with preferably low costs, as defined
in section 5.2.
The basic idea is to alter a coloring (as computed by the method described
in section 4) in order to assign arguments of Φ-operations and their results the
same color. We emphasize that a valid k-coloring of the interference graph G
is always maintained. Generally, the effects of changing the color of a node are
not local in the graph and require the alteration of several other nodes’ colors.
See appendix C for pseudo code of the algorithm.
For each row in a Φ, we build an optimization unit OU = (p, a1, . . . , ak)
consisting of
• The result p of the Φ in this row.
• The arguments a1, . . . , ak of the Φ which do not interfere with p. An
argument interfering with p can trivially never be assigned p’s register.
For each OU a minimization of the costs is then tried separately. The mini-
mization of an OU is not allowed to touch the results of an already processed
OU . The processing of an OU = (p, a1, . . . , ak) consists of three phases.
Init
For each allowed color c for p, we insert an entry Ec = (c, Cc, Sc) into a
priority queue consisting of
• the color c.
• a conflict graph Cc. Initially, Cc equals the subgraph induced by
p, a1, . . . , ak in the interference graph.
• a maximum weighted stable set Sc of Cc. Each ai in the OU is as-
signed the weight wpai as defined in the cost function in equation 1 in
section 5.2. The weight of p is arbitrary, because p is contained in ev-
ery maximum stable set by construction. This property is preserved
throughout the optimization process.
This queue is ordered by decreasing weights of the Sc. So, if one succeeds
in altering the coloring of the interference graph so that all nodes in the
first Sc have the same color, the OU causes minimal costs.
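The Init phase can be illustrated with a minimal Python sketch. It is not the report's implementation: the names `max_weight_stable_set` and `init_queue` are hypothetical, the conflict graph is assumed to be a set of `frozenset` edges (a one-element `frozenset` models a self-loop uu), and the stable set is found by brute force, which is only reasonable for the few arguments of a single optimization unit.

```python
import heapq
from itertools import combinations

def max_weight_stable_set(p, args, weight, conflict_edges):
    """Heaviest set {p} + subset of args that contains no conflict edge.

    A self-loop frozenset({u}) excludes u from every stable set.
    """
    best, best_w = frozenset({p}), 0
    for r in range(1, len(args) + 1):
        for subset in combinations(args, r):
            nodes = frozenset(subset) | {p}
            if any(e <= nodes for e in conflict_edges):
                continue  # some conflict edge lies inside the candidate set
            w = sum(weight[a] for a in subset)
            if w > best_w:
                best, best_w = nodes, w
    return best, best_w

def init_queue(p, args, colors, weight, conflict_edges):
    """One entry (c, Cc, Sc) per allowed color, heaviest stable set first."""
    queue = []
    for c in colors:
        stable, w = max_weight_stable_set(p, args, weight, conflict_edges)
        # heapq is a min-heap, so the weight is negated to pop the heaviest entry first
        heapq.heappush(queue, (-w, c, set(conflict_edges), stable))
    return queue
```

In this sketch every color starts with the same conflict graph, so all entries initially carry the same stable set; they only diverge once the Test phase adds edges to individual conflict graphs.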
Test
While testing has not finished, the first entry Ec is removed from the
priority queue. We then attempt to adjust the coloring of the interference
graph such that the nodes p, a1, . . . , ak are assigned the color c. Note
that until the testing phase is completed for an OU , color changes are
only virtual and are rolled back if the optimization fails for the OU .
We try to change the color of each u ∈ {p, a1, . . . , ak} to c, in this order.
If a neighbor n of u is also colored with c, we annotate n with the former
color of u. This may provoke other conflicts which are then resolved
recursively. Swapping the color of a node v initiated by changing the
color of u to c ends in one of four cases:
1. Changing v’s color does not generate new conflicts.
2. Register constraints forbid assigning v some color d. This conflict,
caused by the intention of changing u’s color to c, cannot be resolved.
This is indicated by adding the edge uu to the conflict graph Cc.
Thus, u is excluded from every possible stable set of Cc. So, c is
not assignable to u. If u = p then the entry is discarded (since
trying to assign c to the ai is not sensible if the color of p is unequal
to c). Otherwise, the entry is reinserted into the priority queue after
recomputing Sc.
3. v’s color has already been pinned (see phase Apply) by the process-
ing of another optimization unit. Then, changing v’s color would
increase the costs incurred by this other OU . uu is added to Cc and
the entry is reinserted into the queue as described above.
4. If v is a pinning candidate for the current OU , u and v are somehow
interdependent. The algorithm cannot assign c to u and v at the same
time. As we require p to be always contained in each Cc, if v = p
then we add the edge uu, otherwise the edge uv to Cc. Afterwards,
Sc is recomputed and the entry is re-inserted into the queue.
If all conflicts caused by changing u’s color to c have been resolved (all
ended in case 1), then u is marked as a pinning candidate, else all color
annotations caused by re-coloring u are discarded.
If all p, a1, . . . , ak are marked as pinning candidates, testing ends for this
OU .
Apply
If the testing phase produced at least two pinning candidates (some ai and
p could be colored with the same color), the pinning candidates become
pinned and all color changes annotated during the testing phase are applied
to G.
Note, that the Test-Phase always terminates, since in each step an edge is
added to the conflict graph, if testing was not successful. Thus, in the worst
case, the stable set will finally consist of the Φ-result only and is not re-inserted
into the priority queue. Thus, the whole algorithm terminates.
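The recursive recoloring of the Test phase can be sketched as follows. This is a simplified illustration, not the algorithm of appendix C: the function name `try_color`, the dictionary-based graph representation and the `allowed` map are assumptions, and assigning the new color before visiting the neighbors is a choice made here so that the recursion cannot revisit a node. The caller is expected to pass a copy of the coloring, which models the "virtual" color changes that are rolled back on failure.

```python
def try_color(v, parent, c, g, adj, pinned=frozenset(),
              candidates=frozenset(), allowed=None):
    """Virtually recolor v to c in the working coloring g.

    Returns (status, node): ("ok", None) on success, otherwise the reason
    ("pinned", "candidate" or "forbidden") and the node that blocked it.
    """
    old = g[v]
    if old == c:
        return ("ok", None)          # v already has the requested color
    if v in pinned:
        return ("pinned", v)         # fixed by an earlier optimization unit
    if v in candidates:
        return ("candidate", v)      # already tested for the current unit
    if allowed is not None and v in allowed and c not in allowed[v]:
        return ("forbidden", v)      # register constraint
    g[v] = c                         # assumption: assign first, then fix neighbors
    for n in adj[v]:
        if n != parent and g[n] == c:
            # a conflicting neighbor is pushed to v's former color
            status, culprit = try_color(n, v, old, g, adj,
                                        pinned, candidates, allowed)
            if status != "ok":
                return (status, culprit)
    return ("ok", None)
```

A caller would run `trial = dict(f)` and `try_color(u, None, c, trial, adj)`, and keep `trial` only if the status is "ok"; otherwise the copy is dropped, which corresponds to discarding the color annotations.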
5.4 Φ-optimization using integer linear programming
In this section we describe a method yielding optimal solutions for
SSA-Maximize-Fixed-Points. Let G = (V,E, Q) be the interference graph augmented
with edges Q that indicate a request for same colors of the adjacent nodes.
5.4.1 Formalization
Since every solution is also a valid coloring, we start with modeling a common
graph coloring problem. For each vi ∈ V and each possible color c the binary
variables xic indicate the color of node vi. xic shall be 1, iff node vi has color c.
The following constraint enforces that each node has exactly one color:
$$\sum_{c} x_{ic} = 1 \qquad \text{for each } v_i \in V$$
Furthermore, adjacent nodes must have different colors. For chordal graphs
this can be modelled efficiently with facet-defining clique constraints, since a
minimum clique cover is computable in O(|V|^2) for chordal graphs (see [Gav72]).
For each color c and each clique C in a given (minimal) clique cover $\mathcal{C}$, we add
the following constraint:
$$\sum_{v_i \in C} x_{ic} \le 1$$
So far, the model results in a valid coloring of G. Now we define the cost
function and add additional variables and constraints to model the same-color-
requests. For each edge vivj ∈ Q the binary variable yij shall indicate, whether
the incident nodes have the same color (0), or different colors (1). To model
the positive costs ωij arising from different colors we add the term ωijyij to the
objective function being minimized.
To interconnect the y with the x variables we add two constraints per color
and edge in Q. Since the yij are automatically pulled to 0 by minimization, we
only must enforce yij = 1, if vi and vj have different colors:
yij ≥ xic − xjc
yij ≥ xjc − xic
Though this is a complete formalization of the problem, one can improve it
by reducing the model size, or by adding constraints pruning the search space.
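To make the formalization concrete, the following Python sketch derives the x and y variables from a given coloring and checks them against the three constraint families above. It does not call an ILP solver; the helper names `ilp_point`, `satisfies_model` and `objective` and the tiny example in the usage note are assumptions for illustration only.

```python
def ilp_point(coloring, colors, q_edges):
    """Translate a coloring into the binary variables x_ic and y_ij."""
    x = {(v, c): int(coloring[v] == c) for v in coloring for c in colors}
    y = {e: int(coloring[e[0]] != coloring[e[1]]) for e in q_edges}
    return x, y

def satisfies_model(x, y, nodes, colors, cliques, q_edges):
    """Check the one-color, clique and linking constraints of section 5.4.1."""
    one_color = all(sum(x[v, c] for c in colors) == 1 for v in nodes)
    clique_ok = all(sum(x[v, c] for v in clique) <= 1
                    for clique in cliques for c in colors)
    link_ok = all(y[i, j] >= x[i, c] - x[j, c] and y[i, j] >= x[j, c] - x[i, c]
                  for (i, j) in q_edges for c in colors)
    return one_color and clique_ok and link_ok

def objective(y, omega):
    """Sum of omega_ij * y_ij, the cost of unsatisfied same-color requests."""
    return sum(omega[e] * y[e] for e in y)
```

For instance, with interfering nodes a and b, a third node c, and requests (a, c) of weight 2 and (b, c) of weight 1, the coloring a=0, b=1, c=0 satisfies the model with objective value 1, while coloring a and b identically violates a clique constraint.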
5.4.2 Improvements
To reduce the number of variables and constraints, we perform another step
before building the ILP. We remove the maximum number of simplicial nodes
from G, which are not concerned with equal-color-requests (not adjacent to
edges in Q). These nodes build the beginning of a PEO and thus can be colored
after the solution of the ILP is applied. Now the ILP only has to deal with the
“core” of the problem.
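The reduction step can be sketched in Python as a simple peeling loop. This is an illustrative assumption about how the removal might be organized, not the report's code: `strip_simplicial` is a hypothetical name, the graph is a dictionary of neighbor sets, and `q_nodes` is the set of nodes adjacent to some edge in Q.

```python
def strip_simplicial(adj, q_nodes):
    """Repeatedly remove simplicial nodes that carry no same-color request.

    Returns the remaining core graph and the removed nodes in removal order.
    Coloring the removed nodes in reverse removal order after the core is
    colored is always possible with max-clique many colors, because each
    node's remaining neighbors formed a clique when it was removed.
    """
    adj = {v: set(ns) for v, ns in adj.items()}
    removed = []
    changed = True
    while changed:
        changed = False
        for v in sorted(adj):
            if v in q_nodes:
                continue
            ns = adj[v]
            # v is simplicial iff its neighbors are pairwise adjacent
            if all(b in adj[a] for a in ns for b in ns if a != b):
                for n in ns:
                    adj[n].discard(v)
                del adj[v]
                removed.append(v)
                changed = True
                break
    return adj, removed
```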
Finally, we want to present two classes of inequalities which can be used to
restrict the search space: Path-inequalities and Clique-Path-inequalities. The
basic idea for both classes is to provide lower bounds for the costs coming from
the contrariness of equal-color-request and interference-edges.
Definition 2. We call two nodes a, b equal-color-connected, ecc(a, b), if there is
a path of edges in Q connecting them, and no inner node of this path interferes
with another node of the path. Formally: ∃v1, . . . , vn ∈ V :
• a = v1, b = vn
• ∀1 ≤ i < n : vivi+1 ∈ Q
• ∀1 ≤ i < j ≤ n : vivj ∈ E ⇒ {vi, vj} = {a, b}
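Definition 2 can be checked directly on small graphs. The sketch below is a plain depth-first search over simple paths in Q and is exponential in the worst case; the name `ecc`, the adjacency dictionary `q_adj` and the predicate `interferes` are assumptions made for this illustration.

```python
def ecc(a, b, q_adj, interferes):
    """True iff a path of Q-edges joins a and b such that no two path nodes
    interfere, except possibly the end points a and b themselves."""
    def extend(path):
        last = path[-1]
        if last == b:
            return True
        for nxt in q_adj.get(last, ()):
            if nxt in path:
                continue
            # a pair on the path may only interfere if it is exactly {a, b}
            if any(interferes(u, nxt) and {u, nxt} != {a, b} for u in path):
                continue
            if extend(path + [nxt]):
                return True
        return False
    return extend([a])
```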





The second class uses a clique C = {v1, . . . , vn} in the interference graph com-




$$\sum_{i=1}^{n} y_{v_i,a} \ge n - 1$$
At first sight, the conditions of the second class seem to be very special, but
especially in the case where an argument appears multiply in the same column
[Figure 6, graphs not recoverable: (a) Path-inequality, yad + ycd ≥ 1; (b) Clique-Path-inequality, yad + ybd + ycd ≥ 2]
Figure 6: Examples for the two classes of inequalities.
Register constraints can be integrated in the model by either setting the
forbidden xic to 0, or by omitting the corresponding variables and adjusting
some constraints.
6 Register constraints
Almost all processor architectures impose register constraints on some of their
instructions. Often, the register assignable to an operand of an instruction is
fixed. Graph coloring register allocators usually solve this problem by
1. splitting the live range of the constrained variable at the location of the
constrained occurrence, inserting copies that let it change its register there,
2. adding pre-colored nodes for each register in the register class, and
3. connecting the node of the constrained variable with all pre-colored nodes
but the one which represents the constraint color.
Note that omitting the live range splitting, even if two variables with the same
constraint do not interfere, could raise the register demand unnecessarily, as
shown in figure 7.
aR1 ← . . .
b ← . . .
c ← b + 1
d ← a + 1
eR1 ← b + c
f ← c + d
· · ·









Figure 7: Program with constraints and its interference graph
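Steps 2 and 3 of the conventional scheme listed above can be sketched in Python as follows. The function name `add_precolored_nodes`, the tuple encoding `("reg", r)` for pre-colored nodes and the dictionary graph representation are assumptions; live range splitting (step 1) is not modeled.

```python
def add_precolored_nodes(adj, registers, constraint):
    """Return a copy of the interference graph with one pre-colored node per
    register; each constrained variable is connected to every pre-colored
    node except the one for its required register."""
    g = {v: set(ns) for v, ns in adj.items()}
    for r in registers:
        g.setdefault(("reg", r), set())
    for var, required in constraint.items():
        for r in registers:
            if r != required:
                g[var].add(("reg", r))
                g[("reg", r)].add(var)
    return g
```

With registers R1, R2, R3 and a variable a constrained to R1, the node a ends up adjacent to the pre-colored nodes for R2 and R3, so any valid coloring that respects the pre-coloring must give a the color of R1.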
Unfortunately, coloring chordal graphs with pre-colored nodes is known to be
in P only if each color is used at most once in the pre-coloring of the graph (see [Mar04]
and also [BHT92]). However, in register allocation, constraint colors are often
used multiple times. Therefore we delegate this problem to the Perm-Optimizer,
which takes care of minimizing the transpositions in Perms inserted
for Φ-operations as described in section 5:
We insert a Φ with a single argument column right in front of each instruction
imposing register constraints on its operands or results. This expresses
that all variables live in front of the constrained instruction are able to change
their register there. Thus, the interference graph breaks into two completely
unconnected components (see figure 8). This ensures that in each component,
each color occurs only once as a pre-coloring. Along the way, this also solves
the problem of two interfering nodes which must reside in the same register and
never causes the register demand to exceed the number of registers available.
Coloring the operands of the instruction imposing the constraints is now
easy. Consider algorithm 1: Arriving at a Φ inserted due to register constraints,
the set of used colors is empty, since each value stops living at the Φ. The
values which get colored next are the results of that Φ followed by the results
of the constrained instruction. Since all colors are available, one can select the
coloring of these variables freely. Finally, we mark the color of the constrained
variable as not changeable to pass this information to the Perm-Optimizer.
It is then up to the Perm-Optimizer to find a coloring in which as many
operands and results as possible are assigned the same color.
aR1 ← . . .
b ← . . .
c ← b + 1






eR1 ← b′ + c′
f ← c′ + d′
· · ·












Figure 8: Program with constraints and its interference graph
7 Conclusions
SSA-form programs allow for a completely new architecture of register alloca-
tors. Due to the chordality of their interference graphs, the different phases of
register allocation (spilling, coloring, coalescing) can be completely decoupled,
thus avoiding the feedback based approach in common graph coloring register
allocators. In this report we gave an outline of such a new register allocator
architecture and proposed novel solution methods for each of the three phases
mentioned above.
Based on the direct correspondence of the live sets in the program to the
cliques in the program’s interference graph, we showed how an already existing,
heuristic method for spilling in basic blocks can be extended to work on a
whole procedure. Additionally, we presented an integer linear programming
formulation providing optimal solutions for the spilling problem. Both methods
work without constructing the interference graph.
Furthermore, we showed that an optimal coloring of the interference graph
can be obtained in linear time (assuming dominance and live ranges have already
been computed), also without constructing the interference graph itself.
Finally, we investigated the problem of copy coalescing, which we find is
identical to the removal of Φ-operations. We proved this problem to be NP-
complete and gave a heuristic, as well as an optimal solution method.
Furthermore, we showed how common register constraints can easily be expressed in
this setting.
8 Related Work
Besides our work, the power of chordal graphs for register allocation has been
discovered by several people independently.
Brisk [Bri05] independently developed a proof of the chordality of the
interference graphs of SSA-form programs, which is quite similar to ours. He utilizes
standard techniques from chordal graph theory to obtain an optimal coloring of
the interference graph. He does not consider spilling and coalescing.
Bouchez [Bou05] also recognizes the chordality of SSA-form programs’
interference graphs and gives an extensive analysis of the complexity of the spilling
problem in his master’s thesis.
Pereira and Palsberg [PP05] investigate interference graphs of non-SSA pro-
grams in a Java compiler and find that 95% of them are chordal. They further-
more present heuristics for spilling and coalescing which, since they work on
non-SSA programs, cannot exploit the special properties of SSA-form programs
and their interference graphs.
Although not considering chordal graphs, Appel and George [AG01] propose
a non-feedback based approach for register allocation of non-SSA programs us-
ing integer linear programming in which spilling, coloring and coalescing are
decoupled. They explicitly address processors with a small number of registers
to keep the linear programs solvable in a reasonable time.
9 Acknowledgements
We want to thank our colleagues Michael Beck, Marco Gaertler, Rubino Geiß
and Götz Lindenmayer for many fruitful discussions. We also thank Alan My-
croft and Simon Peyton Jones for many helpful comments.
A Graph theory terminology
Let G = (V,E) be an undirected graph. We call a graph G complete, iff for each
v, w ∈ VG, there is an edge vw ∈ EG and denote it by Kn, n = |VG|. We call
H = G[VH ] an induced subgraph of G, if VH ⊆ VG and for all nodes v, w ∈ VH ,
vw ∈ EG ⇐⇒ vw ∈ EH holds. H is called a clique if H is complete and H ⊆ G
for some G. ω(G) is the size of the largest clique in G. A graph G = (V,E)
with V = {v1, . . . , vn} and E = {v1v2, . . . , vn−1vn, vnv1} is called a cycle and is
denoted by Cn. A set of nodes {v1, . . . , vn} ⊆ VG is called a stable set, iff
vivj ∉ EG for each pair vi, vj in the set.
A coloring is a partition of VG into subsets C1, . . . , Ck where v, w ∈ Cm
implies that vw ∉ EG. The chromatic number χ(G) is the smallest k for which
a coloring C1, . . . , Ck of G exists.
Definition 3. A graph G is called perfect, iff ω(H) = χ(H) for each H ⊆ G.
Definition 4. A graph G is called chordal iff it does not contain any induced
Cn for n ≥ 4.
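Definition 4 can be tested on small graphs with the classical characterization that a graph is chordal iff its nodes can be removed one by one such that each removed node is simplicial at the time of removal. The Python sketch below uses this simple, non-optimized elimination; the name `is_chordal` and the dictionary-of-sets representation are assumptions.

```python
def is_chordal(adj):
    """Chordality test by repeatedly eliminating a simplicial node."""
    adj = {v: set(ns) for v, ns in adj.items()}
    while adj:
        for v, ns in adj.items():
            # v is simplicial iff its neighbors are pairwise adjacent
            if all(b in adj[a] for a in ns for b in ns if a != b):
                break
        else:
            return False  # no simplicial node left: an induced cycle of length >= 4 exists
        for n in adj[v]:
            adj[n].discard(v)
        del adj[v]
    return True
```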
B Permutations
A bijective mapping σ over a set X is called a permutation. An i ∈ X for which
σ(i) = i is called a fixed point of σ. A cyclic permutation ζ is a permutation
for which there is an i ∈ X and a |ζ| ∈ N with $\zeta^{|\zeta|}(i) = i$ and, for each
$j \in X \setminus \{i, \zeta(i), \ldots, \zeta^{|\zeta|}(i)\}$, ζ(j) = j. A cyclic permutation is written by giving
the members of the cycle:
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 4 & 2 \end{pmatrix} \Rightarrow (2\ 3\ 4)$$
Each cycle can be decomposed into cycles of length two, called transpositions,
in the following way:
(i1 i2 · · · ik) ⇒ (i1 i2) · (i2 i3) · · · (ik−1 ik)
A fixed point is a cycle of length 1. Each permutation is uniquely determined
by a product of disjoint cycles, up to isomorphy between the cycles since (1 2 3),
(2 3 1), (3 1 2) denote the same cycle.
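The notions of this appendix can be illustrated with a short Python sketch. The function names are assumptions, a permutation is represented as a dictionary mapping i to σ(i), and the product of transpositions is assumed to be applied right to left, which is the convention under which the decomposition above reproduces the cycle.

```python
def cycles(sigma):
    """Disjoint cycles of length > 1 of a permutation given as a dict."""
    seen, result = set(), []
    for start in sorted(sigma):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = sigma[i]
        if len(cycle) > 1:
            result.append(tuple(cycle))
    return result

def fixed_points(sigma):
    """Elements i with sigma(i) = i."""
    return [i for i in sorted(sigma) if sigma[i] == i]

def transpositions(cycle):
    """(i1 i2 ... ik) => (i1 i2)(i2 i3)...(ik-1 ik)."""
    return [(cycle[j], cycle[j + 1]) for j in range(len(cycle) - 1)]

def apply_transpositions(ts, i):
    """Apply the product of transpositions to i, rightmost factor first."""
    for a, b in reversed(ts):
        if i == a:
            i = b
        elif i == b:
            i = a
    return i
```

For the example permutation above, `cycles` yields the single cycle (2 3 4), the element 1 is the only fixed point, and the cycle decomposes into the two transpositions (2 3) and (3 4).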






$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$ do
    for i = 1, . . . , m do




for optimization unit OU = (p, a1, . . . , ak) do
    create priority queue Q                               ▷ Init
    for colors c assignable to p do
        Cc ← G[p, a1, . . . , ak]
        Sc ← maximum weighted independent set of Cc




        g ← f                                             ▷ Copy current coloring f to g
        pop (c, C, S) from Q
        C′ ← Test(c, C, S)
        if C′ ≠ nil then
            S′ ← maximum weighted independent set of C′
            insert entry (c, C′, S′) into Q
        fi
    until C′ = nil
    if |candidates| > 1 then                              ▷ Apply
        pinned ← pinned ∪ candidates





function Test(c, C, S)
    for u ∈ S do
        (s, v) ← TryColor(u, nil, c)
        if s = ok then
            candidates ← candidates ∪ {u}
        else
            if s = candidate and v ≠ p then
                C′ ← (VC , EC ∪ {vu})
            else







function TryColor(v ∈ VG, u ∈ VG, c)
    cv ← g(v)
    if c = cv then                                        ▷ The color of v is already c
        return (ok, nil)
    else if v ∈ pinned then                               ▷ v has been pinned by another OU
        return (pinned, v)
    else if v ∈ candidates then                           ▷ v has already been tested
        return (candidate, v)
    else if c is not allowed for v then                   ▷ Due to register constraints
        return (forbidden, v)
    fi
    for {n | vn ∈ EG, n ≠ u, c = g(n)} do                 ▷ Look at all conflicting neighbors of v
        (s, v′) ← TryColor(n, v, cv)                      ▷ Try to give a neighbor v’s color








D Pseudocode of spilling with Belady’s method
Algorithm 2 Belady’s algorithm for a basic block
function Displace(Set Q, Label ℓ, Function nextuse, Variables V )
    R ← V \ Q                                   ▷ R contains all elements of V not in Q
    X ← ∅                                       ▷ Use X to record all variables displaced from Q
    for i = 1 . . . max{|R| + |Q| − k, 0} do     ▷ Remove as many variables from Q so that R can be added to Q preserving |Q| ≤ k
        w ← arg max_{v∈Q} nextuse(ℓ, v)
        X ← X ∪ {w}
        Q ← Q \ {w}
    od
    Q ← Q ∪ V
    return X
end
procedure Belady-Block(Basic block B)
    I ← liveinB ∪ results of Φ in B
    I ← k smallest members of I concerning nextuse
    inB ← I                                     ▷ inB shall contain all variables which are already in registers upon entering B
    Q ← I
    U ← ∅                                       ▷ If a variable is used at some label, it is put into U
    for each label ℓ : (y1, . . . , ym) ← τ(x1, . . . , xn) in B do
        Insert reloads for all x ∈ {x1, . . . , xn} \ Q
        X ← Displace(Q, ℓ, nextuse, {x1, . . . , xn})
        inB ← inB \ [X \ U ]                    ▷ If a live-in value is displaced before it is used, it can be removed from inB, since it is not necessary to hold it in a register upon entering the block
        U ← U ∪ {x1, . . . , xn}
        Displace(Q, ℓ, nextuse′, {y1, . . . , ym})
    od
    outB ← variables in Q
end
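The Displace function of Algorithm 2 translates almost line by line into Python. The sketch below is an illustration under assumptions: the register count k is passed explicitly, `next_use` is a one-argument function giving the distance to the next use of a variable at the current label, and |V| ≤ k is assumed so that the loop never runs out of variables to evict.

```python
def displace(Q, V, k, next_use):
    """Make room in the register set Q for the variables V.

    Evicts the variables of Q whose next use is furthest away until all of V
    fits into k registers, adds V to Q in place, and returns the evicted set.
    """
    R = V - Q                      # variables of V not yet in a register
    X = set()                      # variables displaced from Q
    for _ in range(max(len(R) + len(Q) - k, 0)):
        w = max(Q, key=next_use)   # furthest next use is evicted first
        X.add(w)
        Q.remove(w)
    Q |= V
    return X
```

With k = 2, registers holding a and b, and next uses a: 1, b: 5, a request for c evicts b, leaving a and c in registers.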
Algorithm 3 Belady’s algorithm for spilling
procedure Spill-Belady(Procedure π, Number of registers k)
    for each basic block B of π do
        Belady-Block(B)
    od
    for each basic block B of π do
        for P ∈ predB do






[AG01] Andrew W. Appel and Lal George. Optimal Spilling for CISC Ma-
chines with Few Registers. In ACM SIGPLAN 2001 Conference on
Programming Language Design and Implementation, pages 243–253,
June 2001.
[App98] Andrew W. Appel. SSA is Functional Programming. ACM SIG-
PLAN Notices, 33(4):17–20, April 1998.
[BCH+02] Zoran Budimlić, Keith D. Cooper, Timothy J. Harvey, Ken Kennedy,
Timothy S. Oberg, and Steven W. Reeves. Fast copy coalescing and
live-range identification. In Proceedings of the ACM SIGPLAN 2002
Conference on Programming language design and implementation,
pages 25–32. ACM Press, 2002.
[BDEO97] Peter Bergner, Peter Dahl, David Engebretsen, and Matthew
O’Keefe. Spill code minimization via interference region spilling. In
PLDI ’97: Proceedings of the ACM SIGPLAN 1997 conference on
Programming language design and implementation, pages 287–295,
New York, NY, USA, 1997. ACM Press.
[Bel66] L. Belady. A Study of Replacement Algorithms for a Virtual-
Storage Computer. IBM Systems Journal, 5:78–101, 1966.
[BHT92] M. Biró, M. Hujter, and Zs. Tuza. Precoloring extension. I. Interval
graphs. Discrete Mathematics, 100:267–279, 1992.
[Bou05] Florent Bouchez. Allocation de registres et vidage en mémoire. Mas-
ter’s thesis, ÉNS Lyon, 2005.
[Bri05] Philip Brisk. Personal communication. 2005.
[CAC+81] G. J. Chaitin, M. A. Auslander, A. K. Chandra, J. Cocke, M. E. Hop-
kins, and P. W. Markstein. Register allocation via graph coloring.
Journal of Computer Languages, 6:45–57, 1981.
[CFR+91] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K.
Zadeck. Efficiently computing static single assignment form and the
control dependence graph. ACM Transactions on Programming
Languages and Systems, 13(4):451–490, October 1991.
[CH90] Fred C. Chow and John L. Hennessy. The priority-based coloring
approach to register allocation. ACM Trans. Program. Lang. Syst.,
12(4):501–536, 1990.
[Gav72] Fănică Gavril. Algorithms for minimum coloring, maximum clique,
minimum covering by cliques, and independent set of a chordal
graph. SIAM Journal on Computing, 1(2):180–187, June 1972.
[GGP03] Jia Guo, Maria Jesus Garzaran, and David Padua. The Power of
Belady’s Algorithm in Register Allocation for Long Basic Blocks.
The 16th International Workshop on Languages and Compilers for
Parallel Computing, 2003.
[Gol80] Martin Charles Golumbic. Algorithmic Graph Theory And Perfect
Graphs. Academic Press, 1980.
[Hac05] Sebastian Hack. Interference Graphs of Programs in SSA-form. Tech-
nical report, Universität Karlsruhe, June 2005.
[Kel95] Richard A. Kelsey. A Correspondence Between Continuation Passing
Style and Static Single Assignment Form. ACM SIGPLAN Notices,
30(3):13–22, 1995.
[Mar04] Dániel Marx. Graph Coloring with Local and Global Constraints.
PhD thesis, Budapest University of Technology and Economics,
2004.
[PP05] Fernando Magno Quintao Pereira and Jens Palsberg. Register allo-
cation via coloring of chordal graphs. In Proceedings of APLAS’05,
November 2005.
[RdFG03] Fabrice Rastello, François de Ferrière, and Christophe Guillon.
Optimizing translation out of SSA with renaming constraints. Technical
Report RR-03-35, LIP, ENS Lyon, June 2003.
