Using integer linear programming for instruction scheduling and register allocation in multi-issue processors  by Chang, Chia-Ming et al.
Computers Math. Applic. Vol. 34, No. 9, pp. 1-14, 1997 
Pergamon Copyright © 1997 Elsevier Science Ltd
Printed in Great Britain. All rights reserved 
0898-1221/97 $17.00 + 0.00 
PII: S0898-1221(97)00184-3 
Using Integer Linear Programming 
for Instruction Scheduling and Register 
Allocation in Multi-Issue Processors 
CHIA-MING CHANG, CHIEN-MING CHEN AND CHUNG-TA KING
Department of Computer Science, National Tsing Hua University
Hsinchu, Taiwan 300, R.O.C.
king@cs.nthu.edu.tw
(Received September 1996; revised and accepted June 1997) 
Abstract. Instruction scheduling and register allocation are two very important optimizations in
modern compilers for advanced processors. These two optimizations must be performed simultaneously
in order to maximize the instruction-level parallelism and to fully utilize the registers [1]. In
this paper, we solve register allocation and instruction scheduling simultaneously using integer linear
programming (ILP). We have successfully worked out the ILP formulations for the problem with and
without register spilling. Two kinds of optimizations are considered:
(1) fix the number of free registers and then solve for the minimum number of cycles to execute the
instructions, or
(2) fix the maximum execution cycles for the instructions and solve for the minimum number of
registers needed.
Besides being theoretically interesting, our solution serves as a reference point for other heuristic
solutions. The formulations are also applicable to high-level synthesis of ASICs and designs for
embedded processors. In these application domains, the code quality is more important than the
compilation time.
Keywords. Integer linear programming, Compiler optimization, Instruction scheduling, Register
allocation, Processor architecture.
1.  INTRODUCTION 
In modern compilers for advanced multi-issue processors, instruction scheduling and register
allocation are two very important optimizations. Due to their complexities, many previous works
considered these optimizations separately and concentrated more on their phase-ordering [1-3].
However, no matter which optimization is done first, the earlier phase has to make decisions
without knowing how the later phase will do. As a result, the latter could only work with more
constrained conditions and find it more difficult to optimize the code. Furthermore, with the
multi-issue feature of modern processors, the more instructions are issued in the same cycle,
the more registers are required. Without enough registers, either register spilling has to take
place or some instructions must be delayed. To minimize the execution time and to fully utilize
the registers, instruction scheduling and register allocation must be considered at the same time.
This research was supported in part by the National Science Council under Grants NSC-86-2213-E007-043 and 
NSC-S~2213-E-00Z-04a 
Several heuristics have been proposed to solve register allocation and instruction scheduling 
together. For example, the strategy used in [4-6] is to keep the information on the next use of each 
register. The register whose next use is the farthest is spilled if there are not enough registers. 
In [7], a graph combining the control/data flow graph and the register interference graph was 
proposed to solve the two optimizations simultaneously. The problem with this approach is that 
one cannot determine the edges in the complement graph of the parallel interference graph if we 
have more than three function units of the same type and more than two instructions using that 
type of function units. 
In this paper, we solve register allocation and instruction scheduling simultaneously using
integer linear programming (ILP). One reason for using ILP is that the technique has been
applied successfully to problems in application domains such as high-level synthesis for ASICs [8].
Our work can build upon those results. In addition, with ILP the resultant code is at least as
good as that obtained by heuristics, e.g., list scheduling [9,10]. Thus, the ILP solution can
serve as a reference for other heuristic solutions.
We will consider the optimization problem of solving instruction scheduling and register
allocation simultaneously, with and without register spilling. The goals of the ILP formulations
are to solve either for the minimum number of cycles to execute the given instructions or for the
minimum number of registers needed. As far as we know, no previous attempt has been made to
solve such optimizations using ILP. Our formulations have their theoretical merits. Furthermore,
just as they are applicable to high-level synthesis, our formulations can be used in other application
domains in which the code quality is more important than the compilation time.
The remainder of this paper is organized as follows. Basic assumptions and notations of our 
models are introduced in Section 2. ILP formulation for the optimization problem with register 
spilling not allowed will be discussed in Section 3, and that with spilling allowed will be discussed 
in Section 4. Experimental results on example code segments will be presented in Section 5. 
The formulated ILPs were solved using the LINDO package [11], which produced optimal integer 
solutions using branch-and-bound. Finally, Section 6 gives our concluding remarks. 
2. PRELIMINARIES 
In this paper, we will consider the following two optimization problems. 
PROBLEM NRS. Schedule instructions and allocate registers in a basic block with register spilling 
not allowed. 
PROBLEM RS. Schedule instructions and allocate registers in a basic block with register spilling 
allowed. 
We will use ILP to solve these two problems. Our ILP formulations consider the following two 
optimizations. 
OPTIMIZATION TIME. Fix the number of free registers and solve for the minimum number of
cycles to execute the instructions.
OPTIMIZATION REG. Fix the maximum execution cycles of the instructions and solve for the
minimum number of registers needed.
Our formulations are based on the following assumptions. 
• The target processor has a multi-issue, load/store architecture with multiple function 
units. 
• Life ranges of the registers in the schedule will not span across basic block boundaries. 
• Every instruction takes only one cycle to execute. 
• All registers are of the same type. 
• Every operand of an instruction occupies only one register. 
• Registers used in a STORE instruction can be redefined in the same cycle. 
Several notations will be used throughout the paper. Let $n$ be the total number of instructions
in the given code segment and $I_i$ be the $i$th instruction. The binary variable $x_{i,c}$ denotes
whether $I_i$ is scheduled to cycle $c$. If so, then $x_{i,c} = 1$; otherwise $x_{i,c} = 0$. Let $R$ be the number of
available registers in the processor. The notation $I_i \to I_j$ means that there is a data dependence
from $I_i$ to $I_j$. We call $I_i$ a parent of $I_j$ and $I_j$ a child of $I_i$. Let $CH(I_i)$ denote the set of all
children of instruction $I_i$. Suppose there are $t$ types of function units in the processor. For each
type of function units $F_k$, $1 \le k \le t$, there are $N_k$ units. The notation $I_i \in F_k$ will be used to
indicate that $I_i$ requires a function unit of type $F_k$.
I1   LOAD   PR1,a
I2   LOAD   PR2,b
I3   ADD    PR3,PR1,PR2
I4   LOAD   PR4,c
I5   MULT   PR5,PR3,PR4
I6   LOAD   PR6,d
I7   LOAD   PR7,e
I8   ADD    PR8,PR6,PR7
I9   MULT   PR9,PR5,PR8
I10  STORE  PR9,f
PR# denotes a pseudo register 
(a) The code segment. (b) The DFG. 
Figure 1. An example code segment and its DFG. 
In order to illustrate our ILP formulations, the example code listed in Figure 1a will be used.
Suppose that the target processor has one load/store unit, one multiplier, and two adders. The
dataflow graph (DFG) [12] corresponding to the example code is shown in Figure 1b. Nodes in the
graph represent instructions and edges represent data dependence relations between instructions.
Since it is assumed that the result of an instruction must be in one register, we can view an edge
in the DFG as representing not only a dependence relation but also a register define-use chain. Also,
we have assumed that life ranges of the registers do not span across block boundaries. Thus in the
DFG, the instructions without parents must be LOAD instructions and those without children
must be STORE instructions.
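The define-use reading of the DFG can be illustrated with a minimal Python sketch. The tuple encoding of the code segment and the helper name `build_dfg` are assumptions made for illustration; they are not part of the original formulation.

```python
# Example code of Figure 1a as (index, opcode, defined register, used registers).
# This encoding is an assumption of the sketch.
CODE = [
    (1, "LOAD", "PR1", []),
    (2, "LOAD", "PR2", []),
    (3, "ADD", "PR3", ["PR1", "PR2"]),
    (4, "LOAD", "PR4", []),
    (5, "MULT", "PR5", ["PR3", "PR4"]),
    (6, "LOAD", "PR6", []),
    (7, "LOAD", "PR7", []),
    (8, "ADD", "PR8", ["PR6", "PR7"]),
    (9, "MULT", "PR9", ["PR5", "PR8"]),
    (10, "STORE", None, ["PR9"]),
]


def build_dfg(code):
    """Return edges (i, j), meaning I_i -> I_j through a register define-use chain."""
    defined_by = {}
    edges = []
    for idx, _op, dest, uses in code:
        for reg in uses:
            edges.append((defined_by[reg], idx))
        if dest is not None:
            defined_by[dest] = idx
    return edges


edges = build_dfg(CODE)
ops = {idx: op for idx, op, _dest, _uses in CODE}
# Instructions without parents and instructions without children.
roots = [i for i in ops if all(child != i for _, child in edges)]
leaves = [i for i in ops if all(parent != i for parent, _ in edges)]
print(edges)
print([ops[i] for i in roots], [ops[i] for i in leaves])
```

Under this encoding every root is a LOAD and the only leaf is the STORE, as the text states for basic blocks whose life ranges do not cross block boundaries.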
Since the solution time of ILP depends on the number of variables in the formula, it is critical
to reduce the number of variables. One common approach is to constrain the solution space with
the earliest and latest issue times of each instruction. Let $E_i$ denote the earliest issue time of an
instruction $I_i$. Then, $E_i$ can be estimated as follows. Let $c_k$ be the number of predecessors of $I_i$
which require a function unit of type $F_k$. Let $a$ and $b$ denote the earliest issue times of two of
the parents of $I_i$. Then, we have
$$E_i = \max\left(a,\ b,\ \left\lceil \frac{c_1}{N_1} \right\rceil,\ \left\lceil \frac{c_2}{N_2} \right\rceil,\ \ldots,\ \left\lceil \frac{c_t}{N_t} \right\rceil\right) + 1.$$
To compute the latest issue time $L_i$ of $I_i$, we must know the maximum execution time $T_{\max}$
of all instructions. Without loss of generality, we can set $T_{\max} = n$, as if the instructions were
executed sequentially. Next, reverse the directions of all edges in the DFG and compute the earliest
issue time $E'_i$ of $I_i$. The latest issue time can then be obtained as $L_i = T_{\max} - E'_i + 1$.
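The estimation of the issue-time window can be sketched in Python as follows. This is a sketch under two assumptions that the extracted text does not confirm: $c_k$ is counted over the immediate parents of $I_i$, and the quotient $c_k / N_k$ is rounded up. With this reading the computed windows agree with the non-empty entries of Table 1. The names `earliest` and `latest` are hypothetical.

```python
from math import ceil

# Example 1: function-unit type of each instruction and number of units per type.
UNIT = {1: "LS", 2: "LS", 3: "ADD", 4: "LS", 5: "MULT",
        6: "LS", 7: "LS", 8: "ADD", 9: "MULT", 10: "LS"}
N_UNITS = {"LS": 1, "MULT": 1, "ADD": 2}
EDGES = [(1, 3), (2, 3), (3, 5), (4, 5), (6, 8), (7, 8), (5, 9), (8, 9), (9, 10)]


def earliest(edges, unit, n_units):
    """Earliest issue time: max over parent times and ceil(c_k / N_k), plus one."""
    parents = {i: [] for i in unit}
    for parent, child in edges:
        parents[child].append(parent)
    memo = {}

    def e(i):
        if i not in memo:
            terms = [e(p) for p in parents[i]]
            for k, n_k in n_units.items():
                # Assumption: c_k counts the immediate parents that need unit type k.
                c_k = sum(1 for p in parents[i] if unit[p] == k)
                terms.append(ceil(c_k / n_k))
            memo[i] = max(terms) + 1
        return memo[i]

    return {i: e(i) for i in unit}


def latest(edges, unit, n_units, t_max):
    """Latest issue time from the earliest times on the reversed DFG."""
    reversed_edges = [(child, parent) for parent, child in edges]
    e_rev = earliest(reversed_edges, unit, n_units)
    return {i: t_max - e_rev[i] + 1 for i in unit}


E = earliest(EDGES, UNIT, N_UNITS)
L = latest(EDGES, UNIT, N_UNITS, t_max=len(UNIT))
for i in sorted(UNIT):
    print(f"I{i}: E={E[i]}, L={L[i]}")
```

For example, the sketch gives $E_9 = 5$ and $L_9 = 9$, the same range over which the variables $x_{9,c}$ appear in the precedence example of Section 3.1.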
3. ILP FOR PROBLEM NRS
Problem NRS considers instruction scheduling and register allocation without register spilling.
It is suitable for cases in which there are a large number of free registers and the number of
instructions in the basic block is small. In this section, we present the ILP formulations for
Problem NRS. The constraints for instruction scheduling and register allocation are introduced
separately in two subsections. Their combination forms the complete formulation for this problem,
which is given in Section 3.3.
Table 1. The variable distribution table of the illustrative example in solving Problem NRS.

         LOAD/STORE                                   MULT            ADD
Cycle    I1     I2     I4     I6     I7     I10       I5     I9       I3     I8
1        x1,1   x2,1   x4,1   x6,1   x7,1
2        x1,2   x2,2   x4,2   x6,2   x7,2
3        x1,3   x2,3   x4,3   x6,3   x7,3                             x3,3   x8,3
4        x1,4   x2,4   x4,4   x6,4   x7,4             x5,4            x3,4   x8,4
5        x1,5   x2,5   x4,5   x6,5   x7,5             x5,5   x9,5     x3,5   x8,5
6        x1,6   x2,6   x4,6   x6,6   x7,6   x10,6     x5,6   x9,6     x3,6   x8,6
7                      x4,7   x6,7   x7,7   x10,7     x5,7   x9,7     x3,7   x8,7
8                                           x10,8     x5,8   x9,8            x8,8
9                                           x10,9            x9,9
10                                          x10,10
3.1. Constraints for Instruction Scheduling
In this section, we consider the constraints in the ILP formulation for instruction scheduling.
To reduce the solution space, we can use the earliest issue time $E_i$ and the latest issue time $L_i$ of
an instruction $I_i$. From $E_i$ and $L_i$, a variable distribution table can be constructed. Table 1 shows
such a table for our illustrative example. An empty entry means that the corresponding variable
is out of the range defined by $E_i$ and $L_i$ and will not be considered in the ILP formulation. On
the other hand, a nonempty entry $x_{i,c}$ means that instruction $I_i$ may be scheduled at cycle $c$. It
takes a value of 0 or 1. From this table, we can formulate the possible constraints for instruction
scheduling. The formulation generally follows that introduced in [8] for solving local instruction
scheduling. For completeness of presentation the expressions are listed below.
• Function unit constraint. The total number of instructions which can be executed
simultaneously by a particular type of function units cannot exceed the total number of
that type of function units. In other words,
$$\sum_{I_i \in F_k} x_{i,c} - N_k \le 0, \quad \text{for } 1 \le c \le T_{\max} \text{ and } 1 \le k \le t. \tag{1}$$
For the illustrative example, we have the following inequalities for the load/store units at
the first two cycles:
$$x_{1,1} + x_{2,1} + x_{4,1} + x_{6,1} + x_{7,1} - 1 \le 0, \quad \text{for the first cycle},$$
$$x_{1,2} + x_{2,2} + x_{4,2} + x_{6,2} + x_{7,2} - 1 \le 0, \quad \text{for the second cycle}.$$
As another example, at cycle seven, we have the following inequalities for multipliers and
adders:
$$x_{5,7} + x_{9,7} - 1 \le 0, \quad \text{for multipliers},$$
$$x_{3,7} + x_{8,7} - 2 \le 0, \quad \text{for adders}.$$
• Appearance constraint. Since each instruction must be executed in exactly one
cycle, we have the following expressions:
$$\sum_{c=E_i}^{L_i} x_{i,c} = 1, \quad \text{for } 1 \le i \le n. \tag{2}$$
Note that the summation is taken from $E_i$ to $L_i$ instead of from 1 to $T_{\max}$. Consider
instruction $I_1$ in the illustrative example. The following expression will be generated:
$$x_{1,1} + x_{1,2} + x_{1,3} + x_{1,4} + x_{1,5} + x_{1,6} = 1.$$
• Precedence constraint. When there is a data dependence from instruction $I_i$ to $I_j$,
i.e., $I_i \to I_j$, $I_i$ cannot be scheduled after or at the same cycle as $I_j$. This leads to the
following inequalities:
$$\sum_{c=E_i}^{L_i} (c \times x_{i,c}) - \sum_{c=E_j}^{L_j} (c \times x_{j,c}) \le -1, \quad \forall I_i \to I_j. \tag{3}$$
For example, the data dependence from instruction $I_8$ to $I_9$ in the illustrative example
produces the following inequality:
$$3x_{8,3} + 4x_{8,4} + 5x_{8,5} + 6x_{8,6} + 7x_{8,7} + 8x_{8,8} - 5x_{9,5} - 6x_{9,6} - 7x_{9,7} - 8x_{9,8} - 9x_{9,9} \le -1.$$
• Time constraint. To minimize $T_{\min}$ means that no instruction can be scheduled after
$T_{\min}$. That is,
$$\sum_{c=E_i}^{L_i} (c \times x_{i,c}) - T_{\min} \le 0, \quad \forall I_i \text{ without successors}. \tag{4}$$
For the illustrative example, we have the following inequality:
$$6x_{10,6} + 7x_{10,7} + 8x_{10,8} + 9x_{10,9} + 10x_{10,10} - T_{\min} \le 0.$$
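The four scheduling constraints can be made concrete by evaluating them on a candidate schedule. The following Python sketch expands a schedule into the 0/1 variables $x_{i,c}$ and checks (1) to (4) directly. The two schedules `GOOD` and `BAD` and the helper `check_schedule` are assumptions chosen for illustration; for simplicity the sums run over all cycles rather than over $[E_i, L_i]$.

```python
# Example 1: function-unit type of each instruction and number of units per type.
UNIT = {1: "LS", 2: "LS", 3: "ADD", 4: "LS", 5: "MULT",
        6: "LS", 7: "LS", 8: "ADD", 9: "MULT", 10: "LS"}
N_UNITS = {"LS": 1, "MULT": 1, "ADD": 2}
EDGES = [(1, 3), (2, 3), (3, 5), (4, 5), (6, 8), (7, 8), (5, 9), (8, 9), (9, 10)]


def check_schedule(schedule, t_max):
    """Evaluate constraints (1)-(4) for a schedule given as {instruction: cycle}."""
    cycles = range(1, t_max + 1)
    # x[i, c] = 1 exactly when I_i is issued in cycle c.
    x = {(i, c): int(schedule[i] == c) for i in schedule for c in cycles}
    # sum_c (c * x[i, c]) recovers the issue cycle of I_i.
    issue = {i: sum(c * x[i, c] for c in cycles) for i in schedule}

    # (1) function unit constraint, per cycle and per unit type
    units_ok = all(
        sum(x[i, c] for i in schedule if UNIT[i] == k) - n_k <= 0
        for c in cycles for k, n_k in N_UNITS.items()
    )
    # (2) appearance constraint
    appear_ok = all(sum(x[i, c] for c in cycles) == 1 for i in schedule)
    # (3) precedence constraint
    prec_ok = all(issue[p] - issue[ch] <= -1 for p, ch in EDGES)
    # (4) smallest T_min allowed by the instructions without successors
    sinks = [i for i in schedule if all(p != i for p, _ in EDGES)]
    t_min = max(issue[i] for i in sinks)
    return units_ok, appear_ok, prec_ok, t_min


# Hypothetical schedules, not taken from the paper.
GOOD = {1: 1, 2: 2, 3: 3, 4: 3, 5: 4, 6: 4, 7: 5, 8: 6, 9: 7, 10: 8}
BAD = {**GOOD, 6: 3}  # I4 and I6 now compete for the single load/store unit

print(check_schedule(GOOD, 10))
print(check_schedule(BAD, 10))
```

The second schedule violates only the function unit constraint, because two LOADs are placed in cycle 3 while the processor has one load/store unit.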
3.2. Constraints for Register Allocation
This section considers the ILP formulation for register allocation for Problem NRS. To facilitate
the formulation, a new 0/1 variable $U_{i,c}$ is defined. It is equal to 1 when the register defined in
instruction $I_i$ is alive at cycle $c$, i.e., the instruction occupies a register in cycle $c$. Note that we
do not have to compute $U_{i,c}$ of a STORE instruction.
If instruction $I_i$ has only one child $I_j$, i.e., $CH(I_i) = \{I_j\}$, then
$$U_{i,c} = \sum_{k=1}^{c} x_{i,k} - \sum_{k=1}^{c} x_{j,k}. \tag{5}$$
However, if $I_i$ has $K$ children ($K > 1$), then the inequality corresponding to $U_{i,c}$ becomes difficult
to formulate. Our solution here is to define a temporary variable $UT_{i,c}$ such that
$$UT_{i,c} = K \times \sum_{k=1}^{c} x_{i,k} - \sum_{I_j \in CH(I_i)} \sum_{k=1}^{c} x_{j,k}. \tag{6}$$
Obviously, the register defined in $I_i$ is alive at cycle $c$ if and only if $UT_{i,c} > 0$. That is, $U_{i,c} = 1$
when $UT_{i,c} > 0$ and $U_{i,c} = 0$ when $UT_{i,c} = 0$. So we have the following two inequalities to
constrain $U_{i,c}$ using $UT_{i,c}$:
$$U_{i,c} - UT_{i,c} \le 0, \tag{7}$$
$$K \times U_{i,c} - UT_{i,c} \ge 0. \tag{8}$$
Using $U_{i,c}$, we can then constrain the number of registers used in each cycle by limiting that
number to at most the total number of registers $R$:
$$\sum_{I_i \ne \text{STORE}} U_{i,c} - R \le 0, \quad \text{for each cycle } c. \tag{9}$$
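That inequalities (7) and (8) pin $U_{i,c}$ to the indicator of $UT_{i,c} > 0$ can be checked by enumeration. The short Python sketch below lists, for every value $UT_{i,c}$ can take between 0 and $K$, the values of $U_{i,c} \in \{0, 1\}$ that satisfy both inequalities. The function name `feasible_u` and the choice $K = 3$ are illustrative assumptions.

```python
def feasible_u(ut, k):
    """Values of U in {0, 1} satisfying (7) U - UT <= 0 and (8) K*U - UT >= 0."""
    return [u for u in (0, 1) if u - ut <= 0 and k * u - ut >= 0]


K = 3  # illustrative number of children
table = {ut: feasible_u(ut, K) for ut in range(K + 1)}
for ut, values in table.items():
    print(f"UT = {ut}: feasible U = {values}")
```

Exactly one value of $U_{i,c}$ remains in every case: 0 when $UT_{i,c} = 0$ and 1 otherwise. The bound $UT_{i,c} \le K$ matters here, since (8) could not be satisfied for a larger value.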
Since each instruction in the illustrative example has only one successor, equation (5) can be
applied. For example, we have the following expressions for $I_2$:
$$U_{2,1} = x_{2,1},$$
$$U_{2,2} = U_{2,1} + x_{2,2},$$
$$U_{2,c} = U_{2,c-1} + x_{2,c} - x_{3,c}, \quad \text{for } 3 \le c \le 6.$$
The constraint on the number of registers in each cycle can then be obtained. For example, at
the first cycle, we have
$$U_{1,1} + U_{2,1} + U_{4,1} + U_{6,1} + U_{7,1} - R \le 0.$$
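To see equation (5) and constraint (9) at work, the following Python sketch computes $U_{i,c}$ for every instruction of the illustrative example under one candidate schedule and then counts the registers in use per cycle. The schedule is a hypothetical one chosen for this sketch, and the helper names are assumptions.

```python
EDGES = [(1, 3), (2, 3), (3, 5), (4, 5), (6, 8), (7, 8), (5, 9), (8, 9), (9, 10)]
# Hypothetical schedule for Example 1, not taken from the paper.
SCHEDULE = {1: 1, 2: 2, 3: 3, 4: 3, 5: 4, 6: 4, 7: 5, 8: 6, 9: 7, 10: 8}


def liveness(schedule, edges, t_max):
    """U[i, c] from equation (5): cumulative issues of I_i minus those of its child."""
    u = {}
    # Each instruction of Example 1 has exactly one child, so (5) applies.
    # The STORE I10 never appears as a parent and is therefore skipped, as in (9).
    for i, j in edges:
        for c in range(1, t_max + 1):
            issued_i = sum(int(schedule[i] == k) for k in range(1, c + 1))
            issued_j = sum(int(schedule[j] == k) for k in range(1, c + 1))
            u[i, c] = issued_i - issued_j
    return u


def registers_in_use(u, t_max):
    """Left-hand sum of (9) for every cycle."""
    return [sum(v for (_i, c), v in u.items() if c == cycle)
            for cycle in range(1, t_max + 1)]


U = liveness(SCHEDULE, EDGES, 8)
usage = registers_in_use(U, 8)
print(usage)
```

For this schedule the peak is three registers, reached in cycle 5 when the results of $I_5$, $I_6$, and $I_7$ are alive together, so (9) holds for $R \ge 3$. Note that a register counts as free in the cycle in which its consumer is issued.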
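3.3. Complete Formulation for Problem NRS
The complete ILP formulation for solving Problem NRS based on Optimization TIME is as
follows.
Minimize:
$$T_{\min}, \tag{10}$$
subject to:
$$\sum_{I_i \in F_k} x_{i,c} - N_k \le 0, \quad \text{for } 1 \le c \le T_{\max} \text{ and } 1 \le k \le t, \tag{1}$$
$$\sum_{c=E_i}^{L_i} x_{i,c} = 1, \quad \text{for } 1 \le i \le n, \tag{2}$$
$$\sum_{c=E_i}^{L_i} (c \times x_{i,c}) - \sum_{c=E_j}^{L_j} (c \times x_{j,c}) \le -1, \quad \forall I_i \to I_j, \tag{3}$$
$$\sum_{c=E_i}^{L_i} (c \times x_{i,c}) - T_{\min} \le 0, \quad \forall I_i \text{ without successors}, \tag{4}$$
$$\sum_{I_i \ne \text{STORE}} U_{i,c} - R \le 0, \quad \text{for each cycle } c. \tag{9}$$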
Note that we can also solve for Optimization REG by replacing $T_{\min}$ with the given $T_{\max}$
and then minimizing $R$. Also, we can solve for the minimum execution cycles subject to
the minimum number of free registers. This is done by first solving Optimization REG to
get the minimum number of registers, say $R_{\min}$, and then replacing $R$ with $R_{\min}$ to solve
Optimization TIME. Similarly, we can also solve for the minimum number of registers subject to
the minimum execution cycles. Again this is done by first solving Optimization TIME to get
the minimum number of cycles, say $T$, and then replacing $T_{\max}$ with $T$ to solve Optimization
REG.
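For a block as small as the illustrative example, the two optimizations can be cross-checked without an ILP solver. The Python sketch below is not the ILP method of this paper: it swaps in an exhaustive breadth-first search over sets of already issued instructions, under the same one-cycle latency, function unit limits, and liveness rule as constraints (1) to (9). All function names are assumptions of the sketch.

```python
from itertools import combinations

# Example 1: function-unit type of each instruction and number of units per type.
UNIT = {1: "LS", 2: "LS", 3: "ADD", 4: "LS", 5: "MULT",
        6: "LS", 7: "LS", 8: "ADD", 9: "MULT", 10: "LS"}
N_UNITS = {"LS": 1, "MULT": 1, "ADD": 2}
EDGES = [(1, 3), (2, 3), (3, 5), (4, 5), (6, 8), (7, 8), (5, 9), (8, 9), (9, 10)]
STORES = {10}


def live_count(issued):
    """Registers occupied once the instructions in `issued` have been issued.

    A value is live while its producer is issued and some child is not,
    which matches U = 1 exactly when UT > 0.
    """
    return sum(
        1 for i in issued
        if i not in STORES and any(p == i and ch not in issued for p, ch in EDGES)
    )


def fits_units(group):
    """Function unit constraint (1) for the instructions issued in one cycle."""
    return all(
        sum(1 for i in group if UNIT[i] == k) <= n_k for k, n_k in N_UNITS.items()
    )


def min_cycles(num_regs):
    """Optimization TIME: fewest cycles with `num_regs` registers, or None."""
    everything = frozenset(UNIT)
    frontier, seen, cycles = {frozenset()}, {frozenset()}, 0
    while frontier:
        if everything in frontier:
            return cycles
        next_frontier = set()
        for issued in frontier:
            # Ready instructions: all parents were issued in an earlier cycle.
            ready = [i for i in UNIT if i not in issued
                     and all(p in issued for p, ch in EDGES if ch == i)]
            for size in range(1, len(ready) + 1):
                for group in combinations(ready, size):
                    state = issued | frozenset(group)
                    if (state not in seen and fits_units(group)
                            and live_count(state) <= num_regs):
                        seen.add(state)
                        next_frontier.add(state)
        frontier, cycles = next_frontier, cycles + 1
    return None


def min_registers(t_max):
    """Optimization REG: fewest registers that allow at most `t_max` cycles."""
    for num_regs in range(1, len(UNIT) + 1):
        cycles = min_cycles(num_regs)
        if cycles is not None and cycles <= t_max:
            return num_regs
    return None


print(min_cycles(3), min_cycles(2), min_registers(10))
```

Under these assumptions the search reports eight cycles with three registers and no feasible schedule with two registers, which agrees with the observation in Section 5 that Example 1 has no solution for Problem NRS when only two registers are available. The search enumerates at most $2^n$ states, so it is only a sanity check for small blocks.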
4. ILP FOR PROBLEM RS 
In this section, we consider the ILP formulation for Problem RS, in which register spilling 
is allowed. Basic ideas of our formulation are introduced first, followed by the constraints for 
instruction scheduling and register allocation. The complete formulation is shown in Section 4.4. 
4.1. Basic Idea
The ILP formulation for instruction scheduling and register allocation becomes very complex
when register spilling is taken into account. First, it is hard to use ILP to determine which
register should be spilled. Second, dynamically added spill code also makes the formulation for
instruction scheduling extremely difficult. Third, the spill code changes the live range of registers,
which also complicates the ILP formulations.
Our strategy here is to add spill code at every possible location first and then eliminate the
unwanted spill code. Several issues have to be resolved when using this strategy.
• Where should spill code be added?
In our formulation, we add spill code in the following two cases.
  - LOAD instructions. For a LOAD instruction with only one successor, we do not
    change anything. For a LOAD with $K$ successors, where $K > 1$, we add $K - 1$
    identical copies of that LOAD to the basic block. In this way, we can split the life
    range of registers if some of these LOAD instructions are effective.
  - Data dependences. For a data dependence $I_i \to I_j$, if $I_i$ is not a LOAD instruction
    and $I_j$ is not a STORE, then we change the dependence to $I_i \to$ STORE $\to$
    LOAD $\to I_j$. In the new instruction sequence, the STORE instruction stores the
    content of the destination register of $I_i$ to memory and the LOAD instruction loads
    that value from memory to register. For the illustrative example, Figure 2 shows the
    resultant DFG with all spill code added. The solid arcs indicate the original dependence
    relationships and the dashed arcs indicate the dependences newly introduced.
• Which spill code is redundant and can be eliminated? Again, we consider the following
two cases.
  - LOAD instructions. As mentioned above, a LOAD instruction with $K$ ($K > 1$)
    successors is replicated $K$ times. We now permit these identical LOAD instructions
    to be scheduled into the same cycle, even though there is only one load/store unit.
    If two or more of these LOAD instructions are scheduled into the same cycle, then
    we treat them as one LOAD in the ILP formulation. In this way they will occupy
    only one function unit and require only one register. We will show how this can be
    formulated in ILP shortly.
  - Data dependences. For a dependence chain $I_i \to$ STORE $\to$ LOAD $\to I_j$ derived
    from $I_i \to I_j$, we permit the STORE to be scheduled in the same cycle as $I_i$. If that
    happens in the final schedule, then this means that the dependence $I_i \to I_j$ is not
    spilled. Of course, if $I_i$ and the STORE are scheduled to the same cycle, we must also
    schedule the LOAD and $I_j$ to the same cycle. Again, the ILP formulation will be
    shown later.
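The rule for inserting spill code at every candidate location can be sketched as a small graph transformation in Python. The function `add_spill_code` and the string labels (for instance `I3_1S` for the STORE called $I_{31S}$ in Table 2) are naming assumptions of this sketch.

```python
OPS = {1: "LOAD", 2: "LOAD", 3: "ADD", 4: "LOAD", 5: "MULT",
       6: "LOAD", 7: "LOAD", 8: "ADD", 9: "MULT", 10: "STORE"}
EDGES = [(1, 3), (2, 3), (3, 5), (4, 5), (6, 8), (7, 8), (5, 9), (8, 9), (9, 10)]


def add_spill_code(ops, edges):
    """Return (new_ops, new_edges) after adding all candidate spill code."""
    new_ops, new_edges = {}, []
    children = {i: [c for p, c in edges if p == i] for i in ops}
    for i, op in ops.items():
        # A LOAD with several children is replaced by its copies, added below.
        if not (op == "LOAD" and len(children[i]) > 1):
            new_ops[f"I{i}"] = op
    for i, kids in children.items():
        for k, j in enumerate(kids, start=1):
            if ops[i] == "LOAD":
                if len(kids) > 1:
                    src = f"I{i}_{k}"  # k-th identical copy of the LOAD
                    new_ops[src] = "LOAD"
                else:
                    src = f"I{i}"
                new_edges.append((src, f"I{j}"))
            elif ops[j] == "STORE":
                new_edges.append((f"I{i}", f"I{j}"))  # nothing to spill
            else:
                # I_i -> I_j becomes I_i -> STORE -> LOAD -> I_j.
                st, ld = f"I{i}_{k}S", f"I{i}_{k}L"
                new_ops[st], new_ops[ld] = "STORE", "LOAD"
                new_edges += [(f"I{i}", st), (st, ld), (ld, f"I{j}")]
    return new_ops, new_edges


spill_ops, spill_edges = add_spill_code(OPS, EDGES)
added = sorted(name for name in spill_ops if "_" in name)
print(len(spill_ops), added)
```

For the illustrative example the transformation adds three STORE/LOAD pairs, after $I_3$, $I_5$, and $I_8$, giving 16 instructions in total, the same instructions that appear as columns of Table 2.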
For the illustrative example, Table 2 shows the resultant variable distribution table after spill
code is added. Since the table is too large, $E_i$ and $L_i$ for each instruction are specified instead. In
the next two sections, we will see how the ideas presented in this section are applied to formulating
the constraints of the ILP for Problem RS.
4.2. Constraints for Instruction Scheduling
• Function unit constraint. For ease of explanation, we divide all instructions into four
groups.
Figure 2. The DFG of the illustrative example with spill code added.
Table 2. The variable distribution table of the illustrative example for solving Problem RS.

        LOAD/STORE
        I1    I2    I31S  I31L  I4    I51S  I51L  I6
E_i     1     1     3     5     1     4     6     1
L_i     12    12    13    14    13    14    15    13

        LOAD/STORE                MULT        ADD
        I7    I81S  I81L  I10     I5    I9    I3    I8
E_i     1     3     5     6       4     5     3     3
L_i     13    14    15    16      14    15    13    14
(1) A LOAD instruction with only one child or an instruction which is not a LOAD and is not
in a spill code.
For each such instruction $I_i$, we define a 0/1 variable $f_{i,c}$ to denote if $I_i$ occupies a function
unit in cycle $c$. Note that $x_{i,c}$ can be viewed as the number of function units (0 or 1) that $I_i$ uses
in cycle $c$. Thus, we have
$$f_{i,c} - x_{i,c} = 0, \quad \text{for } E_i \le c \le L_i. \tag{11}$$
In the illustrative example, instructions $I_1$ to $I_{10}$ are in this group. Take $I_2$ as an example. We
have the expressions
$$f_{2,c} - x_{2,c} = 0, \quad \text{for } 1 \le c \le 12.$$
(2) LOAD instructions with more than one child. 
Each instruction $I_i$ in the group is a LOAD instruction with $K$ children, where $K > 1$. As
mentioned above, there will be $K$ identical copies of that LOAD instruction. Let the copies be
denoted $I_{i1}, I_{i2}, \ldots, I_{iK}$. Since these LOAD instructions can be scheduled into the same cycle,
we must determine the number of load/store units they use precisely at each cycle. Define a 0/1
variable $f_{i,c}$ to denote the number of load/store units used by $I_i$ in cycle $c$. Then, $f_{i,c} = 0$ if
$x_{i1,c} + x_{i2,c} + \cdots + x_{iK,c} = 0$, and $f_{i,c} = 1$ otherwise. We thus have the following inequalities
for $f_{i,c}$:
$$f_{i,c} - (x_{i1,c} + x_{i2,c} + \cdots + x_{iK,c}) \le 0, \tag{12}$$
$$K \times f_{i,c} - (x_{i1,c} + x_{i2,c} + \cdots + x_{iK,c}) \ge 0. \tag{13}$$
When $\sum_{k=1}^{K} x_{ik,c} = 0$, equation (12) forces $f_{i,c}$ to be 0, while equation (13) has no effect on $f_{i,c}$.
On the other hand, when $\sum_{k=1}^{K} x_{ik,c} \ge 1$, equation (13) forces $f_{i,c}$ to 1, while equation (12) has
no effect.
(3) STORE instructions which are in a spill code.
We define one 0/1 variable $f_{iS,c}$ for each set of STORE instructions with the same parent $I_i$.
The variable $f_{iS,c}$ denotes whether these STORE instructions occupy a function unit in cycle $c$.
Denote the STOREs in that set as $I_{i1S}, I_{i2S}, \ldots, I_{iKS}$, where $K > 0$. We must do some
preprocessing. Define a new variable $y_{ikS,c}$ for each $x_{ikS,c}$ to denote whether $x_{ikS,c}$ is an effective
STORE. That is, if $x_{ikS,c} = 1$ and $x_{i,c} = 0$, then $y_{ikS,c} = 1$, otherwise $y_{ikS,c} = 0$. The variable
helps to determine whether $I_{ikS}$ occupies one function unit at cycle $c$. It is constrained by the
following inequalities:
$$y_{ikS,c} - \left( \sum_{m=E_{ikS}}^{L_{ikS}} m \times x_{ikS,m} - \sum_{m=E_i}^{L_i} m \times x_{i,m} \right) \le 0, \tag{14}$$
$$y_{ikS,c} - x_{ikS,c} \le 0, \tag{15}$$
$$T \times \sum_{m=E_{ikS}}^{L_{ikS}} y_{ikS,m} - \left( \sum_{m=E_{ikS}}^{L_{ikS}} m \times x_{ikS,m} - \sum_{m=E_i}^{L_i} m \times x_{i,m} \right) \ge 0. \tag{16}$$
We then have the following inequalities to constrain $f_{iS,c}$:
$$f_{iS,c} - \sum_{k=1}^{K} y_{ikS,c} \le 0, \tag{17}$$
$$K \times f_{iS,c} - \sum_{k=1}^{K} y_{ikS,c} \ge 0. \tag{18}$$
Note that $x_{i,c} = x_{ikS,c} = 1$ for some $k$ means that the destination register of $I_i$ will not be spilled
by $I_{ikS}$. In this case, we can delete that STORE instruction and the associated LOAD. Note
that if $I_i$ has only one child in the original code, we can use $f_{iS,c}$ directly to replace $y_{ikS,c}$ in
equations (14)-(16), and delete equations (17) and (18). In our example, the children of $I_3$, $I_5$, $I_8$ are
in this group. Consider the children of $I_3$. We have the following inequalities:
$$y_{31S,c} - \left( \sum_{m=3}^{13} m \times x_{31S,m} - \sum_{m=3}^{13} m \times x_{3,m} \right) \le 0, \quad \forall\, 3 \le c \le 13,$$
$$y_{31S,c} - x_{31S,c} \le 0, \quad \forall\, 3 \le c \le 13,$$
$$16 \times \sum_{m=3}^{13} y_{31S,m} - \left( \sum_{m=3}^{13} m \times x_{31S,m} - \sum_{m=3}^{13} m \times x_{3,m} \right) \ge 0,$$
$$f_{3S,c} - y_{31S,c} \le 0, \quad 3 \le c \le 13,$$
$$f_{3S,c} - y_{31S,c} \ge 0, \quad 3 \le c \le 13.$$
(4) LOAD instructions which are in a spill code.
Define one 0/1 variable $f_{iL,c}$ for each set of such LOAD instructions whose parent's parent
is $I_i$. The variable $f_{iL,c}$ denotes if these LOAD instructions occupy a function unit in cycle $c$.
Suppose we have $K$ LOAD instructions $I_{i1L}, I_{i2L}, \ldots, I_{iKL}$ in such a set. We also need to do some
preprocessing. Define a new variable $y_{ikL,c}$ for each $x_{ikL,c}$ to denote whether $x_{ikL,c}$ is an effective
LOAD. That is, if $x_{ikL,c} = 1$ and $x_{j,c} = 0$, then $y_{ikL,c} = 1$, where $I_j$ is the child of $I_{ikL}$, otherwise
$y_{ikL,c} = 0$. The variable is used to determine whether $I_{ikL}$ occupies one function unit at cycle $c$.
Constraints associated with such variables are as follows:
$$y_{ikL,c} - \left( \sum_{m=E_j}^{L_j} m \times x_{j,m} - \sum_{m=E_{ikL}}^{L_{ikL}} m \times x_{ikL,m} \right) \le 0, \tag{19}$$
$$y_{ikL,c} - x_{ikL,c} \le 0, \tag{20}$$
$$T \times \sum_{m=E_{ikL}}^{L_{ikL}} y_{ikL,m} - \left( \sum_{m=E_j}^{L_j} m \times x_{j,m} - \sum_{m=E_{ikL}}^{L_{ikL}} m \times x_{ikL,m} \right) \ge 0. \tag{21}$$
We then have the following inequalities to constrain $f_{iL,c}$:
$$f_{iL,c} - \sum_{k=1}^{K} y_{ikL,c} \le 0, \tag{22}$$
$$K \times f_{iL,c} - \sum_{k=1}^{K} y_{ikL,c} \ge 0. \tag{23}$$
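Note that $x_{j,c} = x_{ikL,c} = 1$ for some $k$ means that the destination register of $I_i$ will not be spilled
and reloaded by $I_{ikL}$ later. Thus, we can delete that LOAD instruction. Note that if $I_i$ has only
one child in the original code, we can use $f_{iL,c}$ to replace $y_{ikL,c}$. Consider instruction $I_3$ in our
example. We have
$$y_{31L,c} - \left( \sum_{m=4}^{14} m \times x_{5,m} - \sum_{m=5}^{14} m \times x_{31L,m} \right) \le 0, \quad 5 \le c \le 14,$$
$$y_{31L,c} - x_{31L,c} \le 0, \quad \forall\, 5 \le c \le 14,$$
$$16 \times \sum_{m=5}^{14} y_{31L,m} - \left( \sum_{m=4}^{14} m \times x_{5,m} - \sum_{m=5}^{14} m \times x_{31L,m} \right) \ge 0,$$
$$f_{3L,c} - y_{31L,c} \le 0, \quad 5 \le c \le 14,$$
$$f_{3L,c} - y_{31L,c} \ge 0, \quad 5 \le c \le 14.$$
Finally, we can derive the ILP constraints related to the function units.
$$\sum_{f_{\cdot,c} \in F_k} f_{\cdot,c} - N_k \le 0, \quad \text{for } 1 \le c \le T \text{ and } 1 \le k \le t. \tag{24}$$
Note that $f_{\cdot,c} \in F_k$ means that the type of the function unit that $f_{\cdot,c}$ will use is $F_k$.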
• Appearance constraint. Since each instruction must appear in exactly one cycle, we
have the following expressions:
$$\sum_{c=E_i}^{L_i} x_{i,c} = 1, \quad 1 \le i \le n, \tag{25}$$
where $n$ is now the total number of instructions after adding all spill code.
• Precedence constraint. If the two instructions in the precedence relation $I_i \to I_j$
cannot be scheduled to the same cycle, then we have the following expression:
$$\sum_{c=E_i}^{L_i} c \times x_{i,c} - \sum_{c=E_j}^{L_j} c \times x_{j,c} \le -1. \tag{26}$$
If the two instructions in the precedence relation can be scheduled to the same cycle, then
we have the following expression:
$$\sum_{c=E_i}^{L_i} c \times x_{i,c} - \sum_{c=E_j}^{L_j} c \times x_{j,c} \le 0. \tag{27}$$
Note that if they are scheduled to the same cycle, then the spill code is ineffective.
• Time constraint. The constraint is the same as in Problem NRS:
$$\sum_{c=E_i}^{L_i} (c \times x_{i,c}) - T_{\min} \le 0, \quad \forall I_i \text{ without successors}. \tag{28}$$
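Returning to the effective-STORE variables of group (3), the role of the three inequalities can be checked by enumeration. The Python sketch below fixes the issue cycles of $I_i$ and of its spill STORE, then lists every 0/1 assignment of the $y$ variables that satisfies (14) to (16). It assumes the reading of those inequalities in which the bracketed term is the issue cycle of the STORE minus the issue cycle of $I_i$; the function name and arguments are hypothetical.

```python
from itertools import product


def effective_store_solutions(parent_cycle, store_cycle, window, t_total):
    """List the sets of cycles c with y_c = 1 that satisfy (14)-(16)."""
    # sum_m m*x_{ikS,m} - sum_m m*x_{i,m}: how many cycles the STORE trails I_i.
    gap = store_cycle - parent_cycle
    solutions = []
    for bits in product((0, 1), repeat=len(window)):
        y = dict(zip(window, bits))
        c14 = all(y[c] - gap <= 0 for c in window)
        c15 = all(y[c] - int(c == store_cycle) <= 0 for c in window)
        c16 = t_total * sum(y.values()) - gap >= 0
        if c14 and c15 and c16:
            solutions.append([c for c in window if y[c]])
    return solutions


WINDOW = list(range(3, 14))  # cycles 3..13, the range of I3 and I31S in Table 2
same = effective_store_solutions(3, 3, WINDOW, 16)
later = effective_store_solutions(3, 5, WINDOW, 16)
print(same, later)
```

When the STORE shares the cycle of $I_i$, the only feasible assignment sets every $y$ to 0, so the STORE is ineffective and occupies no function unit. When the STORE is issued later, the only feasible assignment sets $y$ to 1 in the cycle of the STORE.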
4.3. Constraints for Register Allocation
We define a 0/1 variable $U_{i,c}$ for each instruction $I_i$ in the original code segment, where "1"
means that the register defined in instruction $I_i$ is alive at cycle $c$. In other words, it must occupy
a register in cycle $c$. When a LOAD instruction has more than one child, we add spill code and
require that the copies share the same $U_{i,c}$. Note that we do not have to compute $U_{i,c}$ of a STORE
instruction.
If $CH(I_i) = \{I_j\}$ and $I_i$ is a LOAD or $I_j$ is a STORE instruction, then
$$U_{i,c} = \sum_{m=1}^{c} x_{i,m} - \sum_{m=1}^{c} x_{j,m}. \tag{29}$$
However, if $I_i$ is a LOAD with $K$ ($K > 1$) children in the original code, then we can define a
temporary variable $UT_{i,c}$ such that
$$UT_{i,c} = \sum_{k=1}^{K} \sum_{m=1}^{c} x_{ik,m} - \sum_{I_j \in CH(I_i)} \sum_{m=1}^{c} x_{j,m}. \tag{30}$$
Suppose $I_i$ is an instruction whose child is not a STORE. Then, we have
$$UT_{i,c} = K \sum_{m=1}^{c} x_{i,m} - \sum_{m=1}^{c} \sum_{k=1}^{K} y_{ikS,m} + \sum_{m=1}^{c} \sum_{k=1}^{K} y_{ikL,m} - \sum_{I_j \in CH(I_i)} \sum_{m=1}^{c} x_{j,m}, \tag{31}$$
where $y_{ikS}$ and $y_{ikL}$ are defined in the previous section. The variable $UT_{i,c}$ can thus be used to
compute $U_{i,c}$ as follows:
$$U_{i,c} - UT_{i,c} \le 0, \tag{7}$$
$$K \times U_{i,c} - UT_{i,c} \ge 0. \tag{8}$$
After we have all $U_{i,c}$'s, the number of registers used in each cycle can be constrained by
limiting that number to at most $R$:
$$\sum_{I_i \ne \text{STORE}} U_{i,c} - R \le 0, \quad \text{for each cycle } c. \tag{32}$$
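4.4. Complete Formulation for Problem RS
The complete ILP formulation for solving Problem RS based on Optimization TIME is as
follows.
Minimize:
$$T_{\min}, \tag{33}$$
subject to:
$$\sum_{f_{\cdot,c} \in F_k} f_{\cdot,c} - N_k \le 0, \quad \text{for } 1 \le c \le T,\ 1 \le k \le t, \tag{24}$$
$$\sum_{c=E_i}^{L_i} x_{i,c} = 1, \quad 1 \le i \le n, \tag{25}$$
$$\sum_{c=E_i}^{L_i} c \times x_{i,c} - \sum_{c=E_j}^{L_j} c \times x_{j,c} \le -1 \ (\text{or } 0), \quad \forall I_i \to I_j \ (\to \text{is dashed}), \tag{26}$$
$$\sum_{c=E_i}^{L_i} (c \times x_{i,c}) - T_{\min} \le 0, \quad \forall I_i \text{ without successors}, \tag{28}$$
$$\sum_{I_i \ne \text{STORE}} U_{i,c} - R \le 0, \quad \text{for each cycle } c. \tag{32}$$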
5. EXPERIMENTAL RESULTS
To evaluate the effectiveness of our ILP formulations, two examples were used. The first one is
our illustrative example shown in Figure 1, which will be referred to as Example 1. The second
example is shown in Figure 3, which will be referred to as Example 2. All the ILP formulations
are solved on a SPARC-10 workstation using the LINDO package [11]. LINDO produces an
integer solution for an ILP problem using the branch-and-bound method.
I1   LOAD   PR1,a
I2   LOAD   PR2,b
I3   MULT   PR3,PR1,PR2
I4   LOAD   PR4,c
I5   LOAD   PR5,d
I6   ADD    PR6,PR4,PR5
I7   LOAD   PR7,e
I8   ADD    PR8,PR1,PR7
I9   MULT   PR9,PR6,PR8
I10  ADD    PR10,PR3,PR9
I11  ADD    PR11,PR3,PR6
I12  STORE  PR10,f
I13  STORE  PR11,g
PR# denotes a pseudo register
(a) The code segment. (b) The DFG.
Figure 3. Another example code segment and its DFG.
Statistics related to the ILP formulations and their running times on the SPARC-10 workstation
are listed in Table 3. This table gives a general idea of the complexity of our formulations. From
the table, we can see that the formulation for Problem RS has about four times more variables
and inequalities than that for Problem NRS. The solution time is about 60 times more. Thus,
solving for Problem NRS is computationally more feasible than solving for Problem RS. Note
that we did not solve Problem RS for Example 2, because its formulation is very complex and
we do not have a suitable tool to generate the expressions automatically.

Table 3. Statistics of the ILP formulations and their running times.

           ILP for NRS                              ILP for RS
Example    Variables  Inequalities  Time            Variables  Inequalities  Time
1          56         93            2 ~ 15 sec      248        335           20 min
2          231        206           1 min           --         --
Using LINDO, we were able to obtain optimal solutions to the ILP formulations for Examples 1
and 2. When the number of registers is limited to two, the code in Example 1 will be scheduled 
as in Figure 4 by solving Problem RS. We can see from the figure that there is a register spilling 
at Cycle 5. On the other hand, the code in Example 1 renders no solution if Problem NRS is to 
be solved and the number of registers is limited to two. 
         Function Unit
Cycle    L/S     ADD    MULT
1        I1
2        I2
3        I4      I3
4        I6             I5
5        I51S
6        I7
7        I51L    I8
8                       I9
9        I10
Figure 4. Example 1 after our optimization (number of free registers is two).
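The register usage of the schedule in Figure 4 can be verified with a few lines of Python. The sketch assumes, in line with equations (5) and (29), that a register is occupied from the cycle in which its value is defined (or reloaded) up to, but not including, the cycle of the instruction that consumes it (or spills it). The labels `I5_1S` and `I5_1L` stand for $I_{51S}$ and $I_{51L}$, and the helper name is hypothetical.

```python
# Issue cycles read from Figure 4, including the effective spill pair of I5.
CYCLE = {"I1": 1, "I2": 2, "I3": 3, "I4": 3, "I5": 4, "I6": 4, "I5_1S": 5,
         "I7": 6, "I5_1L": 7, "I8": 7, "I9": 8, "I10": 9}

# (instruction that fills the register, instruction that releases it)
LIVE_RANGES = [
    ("I1", "I3"), ("I2", "I3"), ("I3", "I5"), ("I4", "I5"),
    ("I5", "I5_1S"),   # result of I5 is spilled to memory in cycle 5
    ("I5_1L", "I9"),   # and reloaded in cycle 7 for use by I9
    ("I6", "I8"), ("I7", "I8"), ("I8", "I9"), ("I9", "I10"),
]


def registers_per_cycle(cycle, live_ranges):
    """Number of occupied registers in each cycle of the schedule."""
    last = max(cycle.values())
    return [sum(1 for d, u in live_ranges if cycle[d] <= c < cycle[u])
            for c in range(1, last + 1)]


usage = registers_per_cycle(CYCLE, LIVE_RANGES)
print(usage, max(usage))
```

Under this liveness rule the schedule never needs more than two registers, and the spill in cycle 5 is what frees a register for the LOAD $I_7$ in cycle 6.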
6. CONCLUDING REMARKS 
Previous approaches to code compilation for multi-issue processors usually consider instruc- 
tion scheduling and register allocation separately. These two optimizations must be considered 
simultaneously in order to maximize the instruction-level parallelism and minimize the number 
of registers used. In this paper, we have shown how to solve register allocation and instruction 
scheduling simultaneously using ILP. We have successfully worked out the ILP formulations for 
the problem with and without register spilling. When applying the formulations to our example 
codes, optimum schedules were obtained for the target machine. 
One major problem with ILP is that the number of variables and expressions in the formulations 
could be very large for only a small code segment. This results in a very long solution time. To 
reduce the time complexity, we need a more accurate way of estimating the earliest and latest 
issue time, perhaps through some heuristics. We also need to further refine the formulations to 
minimize redundant variables and/or inequalities. We believe that through these refinements, 
we should be able to develop a more general technique for instruction scheduling and register
allocation in multi-issue processors.
REFERENCES 
1. J.R. Goodman and W. Hsu, Code scheduling and register allocation in large basic blocks, In Proceedings of
the International Conference on Supercomputing, pp. 442-452, (1988).
2. P.B. Gibbons and S.S. Muchnick, Efficient instruction scheduling for a pipelined architecture, In ACM
SIGPLAN '86 Symposium on Compiler Construction, (1986).
3. D.M. Lavery, P.P. Chang and W.W. Hwu, The importance of prepass code scheduling for superscalar and
superpipelined processors, Technical Report No. CRHC-91-18, University of Illinois, Urbana-Champaign, IL,
(1991).
4. J.R. Ellis, A Compiler for VLIW Architectures, MIT Press, Cambridge, (1986).
5. S.M. Freudenberger and J.C. Ruttenberg, Phase ordering of register allocation and instruction scheduling,
In Code Generation - Concepts, Tools and Techniques, (1991).
6. R.F. Touzeau, A FORTRAN compiler for the FPS-164 scientific computer, In ACM SIGPLAN '84 Symposium
on Compiler Construction, (1984).
7. S.S. Pinter, Register allocation with instruction scheduling: A new approach, In Proceedings of the ACM
SIGPLAN Conference on Programming Language Design and Implementation, pp. 248-257, (1993).
8. C.T. Hwang, Optimum and heuristic algorithms for the scheduling problem in high level synthesis, Ph.D.
Thesis, National Tsing-Hua University, (1992).
9. Adam, A comparison of list schedules for parallel processing systems, Communications of the ACM, 685-690,
(1974).
10. S. Davidson, Some experiments in local microcode compaction for horizontal machines, IEEE Transactions
on Computers, 460-477, (1982).
11. LINDO System, Inc., LINDO: Linear INteractive and Discrete Optimizer for linear, integer, and quadratic
programming problems.
12. D.D. Gajski, N.D. Dutt, A.C. Wu and Y.L. Lin, High-Level Synthesis, Kluwer Academic Publishers, (1992).
13. M. Auslander and M. Hopkins, An overview of the PL.8 compiler, In Proceedings of the ACM SIGPLAN
Symposium on Compiler Construction, (1982).
14. D.G. Bradlee, S.J. Eggers and R.R. Henry, Integrating register allocation and instruction scheduling for
RISCs, In Proceedings of the Fourth International Conference on Architectural Support for Programming
Languages and Operating Systems, pp. 122-131, (1991).
15. G.J. Chaitin, Register allocation and spilling via graph coloring, ACM SIGPLAN Symposium on Compiler
Construction, (1982).
16. J.A. Fisher, The optimization of horizontal microcode within and beyond basic blocks, Ph.D. Thesis, New
York University, (1979).
17. J.L. Hennessy and T. Gross, Hardware/software tradeoffs for increased performance, In ACM Transactions
on Programming Languages and Systems, (1983).
18. J.L. Hennessy and D.A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann,
(1990).
19. W. Hsu, Register allocation and code scheduling for load/d~cretionary store architectures, In Computer
Science Technical Report #722, University of Wisconsin-Madison, (1987).
20. S. Jain, Circular scheduling: A new technique to perform software pipelining, In Proceedings of the ACM
SIGPLAN '91 Conference on Programming Language Design and Implementation, (1991).
21. B.Y. Lin, A study on local instruction scheduling in superSPARC by modified GCC compiler, Master Thesis,
National Tsing-Hua University, (1993).
22. C. Norris and L.L. Pollock, A scheduler-sensitive global register allocator, In Proceedings of the International
Conference on Supercomputing, (1993).
23. K. Kennedy, P. Briggs, K.D. Cooper and L. Torczon, Coloring heuristics for register allocation, ACM
SIGPLAN Notices, (1989).
