Instruction scheduling and register allocation are two very important optimizations in modern compilers for advanced processors. These two optimizations must be performed simultaneously in order to maximize the instruction-level parallelism and to fully utilize the registers [11]. In this paper we solve register allocation and instruction scheduling simultaneously using integer linear programming (ILP). We have successfully worked out the ILP formulations for the problem with and without register spilling. Two kinds of optimizations are considered: (1) fix the number of free registers and then solve for the minimum number of cycles to execute the instructions, or (2) fix the maximum execution cycles for the instructions and solve for the minimum number of registers needed. Besides being theoretically interesting, our solution serves as a reference point for other heuristic solutions. The formulations are also applicable to high-level synthesis of ASICs and designs for embedded processors. In these application domains, the code quality is more important than the compilation time.
Introduction
In modern compilers for advanced multi-issue processors, instruction scheduling and register allocation are two very important optimizations. Due to their complexity, many previous works considered these optimizations separately and concentrated on their phase ordering [10, 11, 21]. However, no matter which optimization is done first, the earlier phase has to make decisions without knowing what the later phase will do. As a result, the later phase must work under tighter constraints and finds it more difficult to optimize the code. Furthermore, with the multi-issue feature of modern processors, the more instructions are issued in the same cycle, the more registers are required. Without enough registers, either register spilling has to take place or some instructions must be delayed. To minimize the execution time and to fully utilize the registers, instruction scheduling and register allocation must be considered at the same time.
Several heuristics have been proposed to solve register allocation and instruction scheduling together. For example, the strategy used in [6, 8, 23] is to keep the information on the next use of each register. The register whose next use is the farthest is spilled if there are not enough registers.
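The farthest-next-use rule described above can be sketched in a few lines. This is only an illustration of the selection rule, not the implementation used in the cited works; the function name `choose_spill` and the dictionary layout are hypothetical, and a value of `None` is assumed to mean that the register is never used again.

```python
def choose_spill(next_use):
    """Pick the register whose next use is farthest away.

    next_use maps a register name to the position (e.g. instruction index)
    of its next use; None is assumed to mean "never used again".
    """
    def distance(reg):
        pos = next_use[reg]
        return float("inf") if pos is None else pos

    return max(next_use, key=distance)


victim = choose_spill({"r1": 5, "r2": 9, "r3": 7})
```

Here `r2` is chosen because its next use (position 9) is the farthest of the three.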
In [22], a graph combining the control/data flow graph and the register interference graph was proposed to solve the two optimizations simultaneously. The problem with this approach is that one cannot determine the edges in the complement graph of the parallel interference graph if we have more than three function units of the same type and more than two instructions using that type of function unit.
In this paper we solve register allocation and instruction scheduling simultaneously using integer linear programming (ILP). One reason for using ILP is that the technique has been applied successfully to problems in application domains such as high-level synthesis for ASICs [15]. Our work can build upon those results. In addition, with ILP the resultant code is at least as good as that obtained by heuristics, e.g., list scheduling [1, 5]. Thus, the ILP solution can serve as a reference for other heuristic solutions.
We will consider the optimization problem of solving instruction scheduling and register allocation simultaneously with and without register spilling. The goals of the ILP formulations are either to solve for the minimum number of cycles to execute the given instructions or for the minimum number of registers needed. As far as we know, no attempt has been made so far to solve such optimizations using ILP. Our formulations have their theoretical merits. Furthermore, just as they are applicable to high-level synthesis, our formulations can be used in other application domains in which the code quality is more important than the compilation time.
The remainder of this paper is organized as follows. Basic assumptions and notations of our models are introduced in section 2. The ILP formulation for the optimization problem without register spilling is discussed in section 3, and that with spilling allowed is discussed in section 4. Experimental results on example code segments are presented in section 5. The formulated ILPs were solved using the LINDO package [17], which produced optimal integer solutions using branch-and-bound. Finally, section 6 gives our concluding remarks.
Preliminaries
In this paper, we will consider the following two optimization problems:
Problem NRS: Schedule instructions and allocate registers in a basic block with register spilling not allowed.
Problem RS: Schedule instructions and allocate registers in a basic block with register spilling allowed.
We will use ILP to solve these two problems. Our ILP formulations consider the following two optimizations:
Optimization TIME: Fix the number of free registers and solve for the minimum number of cycles to execute the instructions.
Optimization REG: Fix the maximum execution cycles of the instructions and solve for the minimum number of registers needed.
Our formulations are based on the following assumptions:
The target processor has a multi-issue, load/store architecture with multiple function units.
Life ranges of the registers in the schedule will not span across basic block boundaries.
Every instruction takes only one cycle to execute.
All registers are of the same type.
Every operand of an instruction occupies only one register.
Registers used in a STORE instruction can be redefined in the same cycle.
Several notations will be used throughout the paper. Let $n$ be the total number of instructions in the given code segment and $I_i$ be the $i$-th instruction. The binary variable $x_{i,c}$ denotes whether $I_i$ is scheduled to cycle $c$: $x_{i,c} = 1$ if so, and $x_{i,c} = 0$ otherwise. Let $R$ be the number of available registers in the processor. The notation $I_i \to I_j$ means that there is a data dependence from $I_i$ to $I_j$. We call $I_i$ a parent of $I_j$ and $I_j$ a child of $I_i$. Let $CH(I_i)$ denote the set of all children of instruction $I_i$. Suppose there are $t$ types of function units in the processor. For each type of function unit $F_k$, $1 \le k \le t$, there are $N_k$ units. The notation $I_i \in F_k$ will be used to indicate that $I_i$ requires a function unit of type $F_k$.
In order to illustrate our ILP formulations, the example code listed in Fig. 1(a) will be used. Suppose that the target processor has one load/store unit, one multiplier, and two adders. The data flow graph (DFG) [9] corresponding to the example code is shown in Fig. 1(b). Nodes in the graph represent instructions and edges represent data dependence relations between instructions. Since it is assumed that the result of an instruction must be in one register, an edge in the DFG represents not only a dependence relation but also a register define-use chain. Also, we have assumed that life ranges of the registers do not span across block boundaries. Thus in the DFG, the instructions without parents must be LOAD instructions and those without children must be STORE instructions.

******************** * Figure 1 goes here * ********************

Since the solution time of ILP depends on the number of variables in the formulation, it is critical to reduce the number of variables. One common approach is to constrain the solution space with the earliest and latest issue times of each instruction. Let $E_i$ denote the earliest issue time of an instruction $I_i$. Then $E_i$ can be estimated as follows. Let $c_k$ be the number of predecessors of $I_i$ which require a function unit of type $F_k$. Let $a$ and $b$ denote the earliest issue times of two of the parents of $I_i$. Then, we have

$$E_i = \max\{a,\; b,\; \lceil c_1/N_1 \rceil,\; \lceil c_2/N_2 \rceil,\; \ldots,\; \lceil c_t/N_t \rceil\} + 1.$$
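The estimate of $E_i$ can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state explicitly: "predecessors" is read as all transitive ancestors of $I_i$ in the DFG, and the maximum is taken over all parents rather than exactly two. The function name, the argument layout, and the four-instruction example are hypothetical and are not the code of Fig. 1.

```python
from math import ceil


def earliest_issue_times(order, parents, unit_of, capacity):
    """Estimate E_i for every instruction.

    order    : instruction names in topological order
    parents  : name -> list of parent names (missing key = no parents)
    unit_of  : name -> function-unit type required by the instruction
    capacity : function-unit type -> number of units N_k
    """
    ancestors = {}
    earliest = {}
    for inst in order:
        # Assumption: predecessors are all transitive ancestors.
        anc = set()
        for p in parents.get(inst, []):
            anc.add(p)
            anc |= ancestors[p]
        ancestors[inst] = anc

        # c_k: number of predecessors that need a unit of type F_k.
        counts = {}
        for a in anc:
            counts[unit_of[a]] = counts.get(unit_of[a], 0) + 1

        bounds = [earliest[p] for p in parents.get(inst, [])]
        bounds += [ceil(c / capacity[u]) for u, c in counts.items()]
        earliest[inst] = max(bounds, default=0) + 1
    return earliest


# Hypothetical example: two LOADs feeding an ADD whose result is stored.
E = earliest_issue_times(
    order=["I1", "I2", "I3", "I4"],
    parents={"I3": ["I1", "I2"], "I4": ["I3"]},
    unit_of={"I1": "ls", "I2": "ls", "I3": "add", "I4": "ls"},
    capacity={"ls": 1, "add": 1},
)
```

With a single load/store unit, the two LOADs cannot both finish in cycle 1, so the resource term $\lceil 2/1 \rceil = 2$ pushes the ADD to cycle 3 and the STORE to cycle 4.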
To compute the latest issue time $L_i$ of $I_i$, we must know the maximum execution time $T_{\max}$ of all instructions. Without loss of generality, we can set $T_{\max} = n$, as if the instructions were executed sequentially. Next, reverse the directions of all edges in the DFG and compute the earliest issue time [...]

[Table residue, scheduling variables by cycle. Cycle 6: $x_{1,6}, x_{2,6}, x_{4,6}, x_{6,6}, x_{7,6}, x_{10,6}, x_{5,6}, x_{9,6}, x_{3,6}, x_{8,6}$. Cycle 7: $x_{4,7}, x_{6,7}, x_{7,7}, x_{10,7}, x_{5,7}, x_{9,7}, x_{3,7}, x_{8,7}$.]

The constraints for instruction scheduling and register allocation are introduced separately in two subsections. Their combination forms the complete formulation for this problem, which is given in section 3.3.
Constraints for Instruction Scheduling
In this subsection, we consider the constraints in the ILP formulation for instruction scheduling.

$$\sum_{c=E_i}^{L_i} x_{i,c} = 1, \quad \text{for } 1 \le i \le n. \qquad (2)$$

Note that the summation is taken from $E_i$ to $L_i$ instead of from 1 to $T_{\max}$. Consider instruction $I_1$ in the illustrative example. The following expression will be generated:
Using $U_{i,c}$, we can then constrain the number of registers used in each cycle so that it does not exceed the total number of registers $R$:

$$\sum_{I_i \neq \text{STORE}} U_{i,c} - R \le 0, \quad \text{for each cycle } c. \qquad (9)$$

Since each instruction in the illustrative example has only one successor, Eq. 5 can be applied.
Note that we can also solve for Optimization REG by replacing $T_{\min}$ with the given $T_{\max}$ and then minimizing $R$. We can also solve for the minimum execution cycles subject to the minimum number of free registers. This is done by first solving for Optimization REG to get the minimum number of registers, say $R_{\min}$, and then replacing $R$ with $R_{\min}$ to solve for Optimization TIME. Similarly, we can solve for the minimum number of registers subject to the minimum execution cycles. Again this is done by first solving for Optimization TIME to get the minimum number of cycles, say $T$, and then replacing $T_{\max}$ with $T$ to solve for Optimization REG.
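The two objectives and their two-phase combination can be illustrated on a tiny instance with exhaustive search. This sketch deliberately swaps the ILP solver for brute-force enumeration, so it is a reference for very small blocks only and not the formulation of this paper. It assumes a simplified register model (a value occupies a register from its defining cycle up to, but not including, the cycle of its last use), and all function names and the four-instruction example are hypothetical.

```python
from collections import Counter
from itertools import product


def feasible_schedules(n, edges, unit_of, capacity, T):
    """Yield cycle assignments (1..T per instruction) that respect
    data dependences and function-unit capacities."""
    for sched in product(range(1, T + 1), repeat=n):
        # A child must issue strictly after each of its parents.
        if any(sched[i] >= sched[j] for i, j in edges):
            continue
        # No cycle may use more units of a type than are available.
        usage = Counter((sched[i], unit_of[i]) for i in range(n))
        if any(cnt > capacity[u] for (_, u), cnt in usage.items()):
            continue
        yield sched


def pressure(sched, edges):
    """Peak number of live values under the simplified model:
    live from the defining cycle up to (excluding) the last use."""
    last_use = {}
    for i, j in edges:
        last_use[i] = max(last_use.get(i, 0), sched[j])
    return max(
        sum(1 for i, end in last_use.items() if sched[i] <= c < end)
        for c in range(1, max(sched) + 1)
    )


def min_cycles(n, edges, unit_of, capacity, R):
    """Optimization TIME: fewest cycles with at most R registers."""
    for T in range(1, n + 1):  # T_max = n, as in the text
        for s in feasible_schedules(n, edges, unit_of, capacity, T):
            if pressure(s, edges) <= R:
                return T
    return None  # infeasible without spilling


def min_registers(n, edges, unit_of, capacity, T):
    """Optimization REG: fewest registers within T cycles."""
    needs = [pressure(s, edges)
             for s in feasible_schedules(n, edges, unit_of, capacity, T)]
    return min(needs) if needs else None


# Hypothetical block: LOAD, LOAD, ADD, STORE (indices 0..3).
edges = [(0, 2), (1, 2), (2, 3)]
unit_of = ["ls", "ls", "add", "ls"]
capacity = {"ls": 1, "add": 1}

r_min = min_registers(4, edges, unit_of, capacity, T=4)
t_given_r_min = min_cycles(4, edges, unit_of, capacity, R=r_min)
```

The last two lines mirror the two-phase procedure: first obtain $R_{\min}$ for the given $T_{\max}$, then minimize the cycle count with $R = R_{\min}$. The search is exponential in the number of instructions, which is exactly why it is usable only as a sanity check.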
ILP for Problem RS
In this section we consider the ILP formulation for Problem RS, in which register spilling is allowed. Basic ideas of our formulation are introduced first, followed by the constraints for instruction scheduling and register allocation. The complete formulation is shown in section 4.4.
Basic Idea
The ILP formulation for instruction scheduling and register allocation becomes very complex when register spilling is taken into account. First, it is hard to use ILP to determine which register should be spilled. Second, dynamically added spill code also makes the formulation for instruction scheduling extremely difficult. Third, the spill code changes the live range of registers, which also complicates the ILP formulations.
Our strategy here is to add spill code at every possible location first and then eliminate the unwanted spill code. Several issues have to be resolved when using this strategy:
Where should spill code be added? In our formulation we add spill code in the following two cases:
- LOAD instructions:
For a LOAD instruction with only one successor, we do not change anything. For a LOAD with $K$ successors, where $K > 1$, we add $K - 1$ identical copies of that LOAD to the basic block. In this way we can split the life range of registers if some of these LOAD instructions are effective.
- Data dependences:
For a data dependence $I_i \to I_j$, if $I_i$ is not a LOAD instruction and $I_j$ is not a STORE, then we change the dependence to $I_i \to \text{STORE} \to \text{LOAD} \to I_j$. In the new instruction sequence, the STORE instruction stores the content of the destination register of $I_i$ to memory and the LOAD instruction loads that value from memory back into a register. For the illustrative example, Fig. 2 shows the resultant DFG with all spill code added. The solid arcs indicate the original dependence relationships and the dashed arcs indicate the newly introduced dependences.
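The two insertion rules can be sketched as a graph transformation. This is an illustration under stated assumptions, not the authors' implementation: each replicated LOAD is assumed to feed exactly one child, the original edge $I_i \to I_j$ is assumed to be replaced rather than kept, and the generated names (`_copy`, `_S`, `_L` suffixes) as well as the example block are hypothetical.

```python
def add_spill_code(ops, edges):
    """Insert all candidate spill code into a DFG.

    ops   : instruction name -> opcode ("LOAD", "STORE", or anything else)
    edges : list of (parent, child) data dependences
    Returns the new (ops, edges) pair.
    """
    new_ops = dict(ops)
    new_edges = []

    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    for parent, kids in children.items():
        if ops[parent] == "LOAD":
            # Rule 1: a LOAD with K children gets K - 1 identical copies,
            # assumed here to feed one child each.
            for k, child in enumerate(kids):
                src = parent if k == 0 else f"{parent}_copy{k}"
                new_ops[src] = "LOAD"
                new_edges.append((src, child))
        else:
            for k, child in enumerate(kids, start=1):
                if ops[child] == "STORE":
                    # Dependences into a STORE are left unchanged.
                    new_edges.append((parent, child))
                else:
                    # Rule 2: I_i -> I_j becomes I_i -> STORE -> LOAD -> I_j.
                    st, ld = f"{parent}_S{k}", f"{parent}_L{k}"
                    new_ops[st], new_ops[ld] = "STORE", "LOAD"
                    new_edges += [(parent, st), (st, ld), (ld, child)]
    return new_ops, new_edges


# Hypothetical block, not the code of Fig. 1.
ops = {"I1": "LOAD", "I2": "LOAD", "I3": "ADD", "I4": "MUL", "I5": "STORE"}
edges = [("I1", "I3"), ("I1", "I4"), ("I2", "I3"), ("I3", "I4"), ("I4", "I5")]
spill_ops, spill_edges = add_spill_code(ops, edges)
```

In this example `I1` has two children and therefore gains one copy, and the dependence `I3 -> I4` gains a STORE/LOAD pair, so the block grows from five to eight instructions.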
******************** * Figure 2 goes here * ********************

Which spill code is redundant and can be eliminated?
Again, we consider the following two cases:
- LOAD instructions:
As mentioned above, a LOAD instruction with $K$ ($K > 1$) successors is replicated so that $K$ identical LOAD instructions exist. We now permit these identical LOAD instructions to be scheduled into the same cycle, even though there is only one load/store unit. If two or more of these LOAD instructions are scheduled into the same cycle, then we treat them as one LOAD in the ILP formulation. In this way they will occupy only one function unit and require only one register. We will show how this can be formulated in ILP shortly.
- Data dependences:
For a dependence chain $I_i \to \text{STORE} \to \text{LOAD} \to I_j$ derived from $I_i \to I_j$, we permit the STORE to be scheduled in the same cycle as $I_i$. If that happens in the final schedule, then the value carried by the dependence $I_i \to I_j$ is not spilled. Of course, if $I_i$ and the STORE are scheduled to the same cycle, we must also schedule the LOAD and $I_j$ to the same cycle. Again, the ILP formulation will be shown later.
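Once a schedule is available, the two elimination rules amount to a simple post-pass. The sketch below assumes the schedule is given as a mapping from instruction name to cycle; the function name, the tuple layout for spill chains, and the example values are hypothetical and only illustrate the rules stated above.

```python
def prune_spill_code(schedule, chains, load_groups):
    """Identify redundant spill code in a finished schedule.

    schedule    : instruction name -> issue cycle
    chains      : list of (producer, store, load, consumer) spill chains
    load_groups : list of lists, each holding the names of identical LOADs
    Returns (removed, merged): the set of deleted spill instructions and a
    map from each absorbed LOAD copy to the LOAD that represents it.
    """
    removed = set()
    for producer, store, load, consumer in chains:
        if schedule[producer] == schedule[store]:
            # Same cycle as the producer: the value is not spilled, and
            # the LOAD must then share a cycle with the consumer.
            if schedule[load] != schedule[consumer]:
                raise ValueError(f"inconsistent chain for {producer}")
            removed.update((store, load))

    merged = {}
    for group in load_groups:
        first_in_cycle = {}
        for name in group:
            cycle = schedule[name]
            if cycle in first_in_cycle:
                # Identical LOADs in one cycle count as a single LOAD.
                merged[name] = first_in_cycle[cycle]
            else:
                first_in_cycle[cycle] = name
    return removed, merged


# Hypothetical schedule fragment.
schedule = {"I1": 1, "I1_copy1": 1, "I3": 2, "I3_S1": 2, "I3_L1": 4, "I4": 4}
removed, merged = prune_spill_code(
    schedule,
    chains=[("I3", "I3_S1", "I3_L1", "I4")],
    load_groups=[["I1", "I1_copy1"]],
)
```

In the ILP itself this decision is made by the solver through the constraints; the post-pass only reads the outcome off the schedule.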
For the illustrative example, 
Constraints for Instruction Scheduling
Function unit constraint:
For ease of explanation, we divide all instructions into four groups:
(1) A LOAD instruction with only one child or an instruction which is not a LOAD and is not in a spill code: 
Note that $x_{i,c} = x_{i_{kS},c} = 1$ for some $k$ means that the destination register of $I_i$ will not be spilled by $I_{i_{kS}}$. In this case we can delete that STORE instruction and the associated [...]
Note that $x_{j,c} = x_{i_{kL},c} = 1$ for some $k$ means that the destination register of $I_i$ will not be spilled and reloaded by $I_{i_{kL}}$ later. Thus we can delete that LOAD instruction.
Note that f ;c 2 F k means that the type of the function unit that f ;c will use is F k .
Appearance constraint:
Since each instruction must appear in exactly one cycle, we have the following expressions:
where n is now the total number of instructions after adding all spill code.
Precedence constraint:
If the two instructions in the precedence relation $I_i \to I_j$ cannot be scheduled to the same cycle, then we have the following expression:
Experimental Results
To evaluate the effectiveness of our ILP formulations, two examples were used. The first one is our illustrative example shown in Figure 1, which will be referred to as Example 1. The second example is shown in Figure 3, which will be referred to as Example 2. All the ILP formulations were solved on a SPARC-10 workstation using the LINDO package [17]. LINDO produces an integer solution for an ILP problem using the branch-and-bound method.
******************** * Figure 3 goes here * ********************

Statistics related to the ILP formulations and their running times on the SPARC-10 workstation are listed in Table 3. This table gives [...]

Using LINDO, we were able to obtain optimal solutions to the ILP formulations for Examples 1 and 2. When the number of registers is limited to two, the code in Example 1 will be scheduled as in Figure 4 by solving Problem RS. We can see from the figure that there is a register spill at cycle 5. On the other hand, the code in Example 1 has no feasible solution if Problem NRS is to be solved and the number of registers is limited to two.

******************** * Figure 4 goes here * ********************
Concluding Remarks
Previous approaches to code compilation for multi-issue processors usually consider instruction scheduling and register allocation separately. These two optimizations must be considered simultaneously in order to maximize the instruction-level parallelism and minimize the number of registers used. In this paper we have shown how to solve register allocation and instruction scheduling simultaneously using ILP. We have successfully worked out the ILP formulations for the problem with and without register spilling. When applying the formulations to our example codes, optimum schedules were obtained for the target machine. One major problem with ILP is that the number of variables and expressions in the formulations can be very large even for a small code segment. This results in a very long solution time. To reduce the solution time we need a more accurate way of estimating the earliest and latest issue times, perhaps through some heuristics. We also need to further refine the formulations to minimize redundant variables and/or inequalities. We believe that through these refinements, we should be able to develop a more general technique for instruction scheduling and register allocation in multi-issue processors.
