A Parametric View of Retargetable Register Allocation by Bryant, Kelvin S. & Mauney, Jon
Kelvin S. Bryant
ksb@cs.umd.edu
Dept. of Computer Science
Univ. of Maryland, College Park, MD 20742
301-405-2703

Jon Mauney
mauney@csc.ncsu.edu
Dept. of Computer Science
N.C. State Univ., Raleigh, NC 27695

January 24, 1995

Abstract

We discuss the problems involved in building a retargetable register allocator for use in an optimizing compiler. While the popular "register coloring" method is machine-independent, the allocator as a whole must implement numerous machine-dependent decisions. We present the kinds of information that must be parameterized in order to include register allocation in a retargetable compiler back-end, and discuss a sample solution.

1 Introduction

Register allocation is an important compiler optimization, and numerous strategies for allocating registers have been described in the literature. The register allocation approach is usually published as an algorithm that can be applied to any target architecture. However, the allocator as implemented in a compiler must deal with machine-specific details and apparently must be hand-tuned to fit the target. In order for a code-generator generator system to include register allocation, the allocator must be implemented as a general algorithm, with the target-machine details suitably parameterized for easy retargeting.

For example, we will deal with what is probably the most popular register allocation technique, graph coloring, which was first described by Chaitin [CAC+81] and subsequently improved by many others [CH84, BGG+89, BCKT89].

Graph coloring is used to solve resource allocation problems where resource usage must be mutually exclusive. Nodes in the graph represent clients that use resources, edges link clients that need resources during overlapping time intervals, and, finally, colors represent the resources. The problem is to assign a color to each node using a maximum of k colors while not assigning the same color to nodes that are connected by an edge.
The register allocation equivalent, termed register coloring, attempts to color a graph called an interference graph. In the interference graph, nodes represent uses of a value or variable (known as a live range), edges connect live ranges that are live simultaneously, and allocatable registers represent the limited colors. Since the graph coloring problem is known to be NP-complete, Chaitin proposed a heuristic (shown in Figure 1).

The register coloring method is elegant and apparently quite general. However, in practice, register set configurations and register usage conventions are highly idiosyncratic. A practical register allocator must accommodate the details of the machine, which complicates retargeting register allocators to different architectures.

(to allocate k registers, by finding a k-coloring of a graph)
Repeat
    Build the interference graph.
    Repeat
        if there is an unconstrained node (a node with degree less
        than k) left in the graph then
            - Remove the node and all its edges from the graph.
            - Push the node onto coloring stack.
        else
            - Remove a constrained node and all its edges from the graph.
            - Mark the node for spilling.
            - Push node onto coloring stack.
    Until the graph is empty.
    Add spill code for the nodes marked for spilling.
Until no nodes need to be spilled.
Repeat
    - Pop node from coloring stack.
    - Add node and all its edges back into graph.
    - Assign a color that differs from all its neighbors.
Until coloring stack is empty.

Figure 1: Chaitin's Coloring Heuristic

For example, consider the following register set configurations and assembly usage conventions:

MIPS R4000: A RISC architecture that contains 32 32-bit general purpose registers that store integer, char, and pointer values. There are also 16 64-bit floating point registers that store float and double values. Function actual parameters are passed using a combination of registers and the program stack, depending on the data types and number of parameters.

MC680x0: A CISC architecture that contains 8 32-bit data registers that store integer, char, float and double values. Addresses are stored in 8 32-bit address registers. Although address registers can also store integers, only the addition and subtraction arithmetic operations use address register operands.

VAX: A CISC architecture containing 16 32-bit registers that store integer, char, pointer, float and double values. Scalar function results are returned in register r0.
Parameters are passed using the program stack (or main memory) and a special assembly call instruction.

These examples illustrate the machine dependencies that a register allocator must accommodate. If the register allocator is to be part of a retargetable code generator, then the allocator will have to deal with the following problems:

- Different machines will have different k values for the different register sets.
- A value of a particular data type may be handled differently on different architectures. For example, the MIPS has a distinct register set to store float values. But on the VAX, there is only one register set to store all data types, so integers and floats will interfere.
- On typical RISC machines, operands reside only in registers, which increases the size and complexity of the interference graph. In addition, RISC machines must also reserve registers to store intermediate results.
- The first phases of the code generator construct the control flow graph and live ranges. Different code generators will represent these structures differently.
- Machines have special-purpose registers that can be allocated in some portions of a program but not in others. For example, some machines require function results to be returned in a specific register.
- Some machines use register pairs to store certain values and instruction results.

In addition, the compiler writer must consider the classic phase-ordering problem between the register allocator and the code selector. If aggressive register allocation is performed before code selection, a CISC code selector may be forced into choosing less than optimal instructions to match the choices made by the register allocator. On the other hand, if code selection proceeds first, the register allocator may be forced into a suboptimal allocation due to restrictions imposed by the selected instructions.

Compiler writers use code generator generators to aid the development of new back ends; however, because of the above problems, register allocators are still usually hand-coded. Even with hand-coded allocators, coordinating the actions of the code selector and register allocator to minimize the phase-ordering problem is often overlooked. Previous papers describing register allocation have typically included implementation results, but none have discussed retargetability.

This paper describes an interface for building retargetable register allocators. The perspective will be that of the register allocator. The question is "What would ease the task of building/retargeting allocators?" We wish to completely encapsulate the register allocator.
Thus we need to isolate the machine-specific information so that it can be provided as a part of the code-generator system; on the other hand, we also want to be able to replace the register allocator with an improved strategy, or move the allocator to another compiler, so we must also identify the algorithm-specific needs of the allocator. We will use the term "relocatable allocator" to describe a register allocator that is designed to be portable among target machines and among containing compilers. The next section describes the data structures and accessing routines needed by a typical coloring algorithm. Section 3 presents a coloring algorithm designed using our ideas. Finally, Section 4 gives concluding remarks and future work.

2 Parametric Description

Performing register coloring requires several data structures that contain machine-independent and machine-dependent information. These structures are built in the early phases of the code generator and vary in implementation from one compiler to another. If retargeting to multiple compilers and architectures is to be achieved, accessing routines which hide implementation and architectural details must be provided by the compiler writer. We briefly describe the data structures, the associated accessing routines, and the interface between these structures and the coloring algorithm. The first three of these concern the interface between the allocator and the rest of the compiler. The fourth contains the machine-specific information.
2.1 Interference Graph

Discovering interferences between nodes is a machine-dependent operation. For example, integers and floats on the MIPS are stored in two different register sets; therefore, no edges will join integer and float live ranges. Other machines may use the same register set for integers and floats, so there may be conflicts represented in the interference graph. In essence, the set of edges, E, in the graph will vary for the same program on different machines. This difference can be encapsulated into a single function that compares two live ranges and returns "true" if they interfere. The function, in turn, would use the register-set description information of Section 2.4.

The interference graph is apparently independent of the details of the allocation algorithm, and can be constructed by the code generator. However, many coloring routines rebuild the interference graph several times during execution. Therefore, the compiler writer must provide accessing routines to allow incremental modification of the interference graph. The following is a description of the necessary routines:

insert: Given a live range, analyze conflicts and add the node to the interference graph.
delete: Remove a node and associated edges.
traverse: Traverse all nodes in the graph and apply a given operation.

Building the original graph becomes a matter of traversing the list of live ranges and applying the insert operation. The delete operation should return a reference to the removed node to optimize iterative heuristics such as Chaitin's.

2.2 Live Ranges

The vertices of the interference graph are composed of live ranges. Live ranges serve as the atomic unit of allocation for the coloring routine; however, some coloring routines will split live ranges as part of the spill process. Splitting live ranges requires the coloring routine to manipulate individual live ranges, which are implementation specific. Coloring routines also employ cost functions to help make coloring and spill decisions.
These cost functions must be able to access live range information such as live range length, type of reference (read/write), and nesting depth of individual references. Other general access routines are as follows:

- Create a new empty live range.
- Remove a single element from a live range.
- Add a single element to a live range.
- Transfer references from one live range to another.
- Traverse a live range and access individual fields (which ones?)
- Create pointers to live range elements.
- Advance to the next/previous live range.
- Go to the beginning of the live range list.

With the above routines, the compiler writer can now manipulate live ranges in an implementation-independent way. The spill routine can split live ranges and update fields to reflect the new relationships in the interference graph.
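As a rough illustration of how a few of these access routines and a cost function might fit together, here is a minimal C sketch. Everything in it is an assumption made for illustration: the names (LiveRange, lrInit, lrAddRef, lrTransfer, lrSpillWeight), the fixed-size array representation, and the weighting of each reference by 10 raised to its nesting depth. None of it is prescribed by the interface described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical live range representation: a fixed-size list of references. */
typedef enum { REF_READ, REF_WRITE } RefKind;

typedef struct {
    RefKind kind;   /* type of reference (read/write) */
    int     depth;  /* loop nesting depth of the reference */
} LiveRef;

#define MAX_REFS  32
#define MAX_DEPTH 6   /* assumed cap so the weight cannot overflow a long */

typedef struct {
    LiveRef refs[MAX_REFS];
    size_t  nrefs;
} LiveRange;

/* Create a new empty live range. */
void lrInit(LiveRange *lr) { lr->nrefs = 0; }

/* Add a single element; returns 0 if the live range is already full. */
int lrAddRef(LiveRange *lr, RefKind kind, int depth) {
    assert(depth >= 0);
    if (lr->nrefs == MAX_REFS) return 0;
    lr->refs[lr->nrefs].kind = kind;
    lr->refs[lr->nrefs].depth = depth;
    lr->nrefs++;
    return 1;
}

/* Transfer all references from src to dst.
 * Returns 0 and changes nothing if dst lacks room. */
int lrTransfer(LiveRange *dst, LiveRange *src) {
    assert(dst != src);
    if (dst->nrefs + src->nrefs > MAX_REFS) return 0;
    for (size_t i = 0; i < src->nrefs; i++)
        dst->refs[dst->nrefs++] = src->refs[i];
    src->nrefs = 0;
    return 1;
}

/* Assumed cost function: each reference weighs 10^depth (depth capped).
 * The read/write kind is stored but not used by this simple weight. */
long lrSpillWeight(const LiveRange *lr) {
    long total = 0;
    for (size_t i = 0; i < lr->nrefs; i++) {
        int d = lr->refs[i].depth < MAX_DEPTH ? lr->refs[i].depth : MAX_DEPTH;
        long w = 1;
        while (d-- > 0) w *= 10;
        total += w;
    }
    return total;
}
```

Because the cost function reads the live range only through its own fields and the access routines, a different weighting could be substituted without touching the coloring routine that calls it.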
new_lr = {};
Find entry point(s) into live range lr and enqueue into queue Q;
while ( (Q is not {}) and
        (Number of colored neighbors in new_lr <= number of allocatable registers) and
        (Number of blocks in lr > 1 /* lr is non-empty */ ) ) do
    block = head(Q);
    if (block is in lr) then
        Move block from lr to new_lr;
        if (block has 1 successor) then
            Add this successor to head of Q.
        else
            Add successors to tail of Q.
od

Figure 2: Live range splitting heuristic

2.3 Control Flow Graph

The Control Flow Graph (CFG) captures the flow of control between the basic blocks of the user program. As mentioned earlier, some coloring routines will split live ranges as part of the spill process. The split routine uses the CFG and the interference graph to locate appropriate split locations. Figure 2 shows the algorithm for a typical split routine [LH86].

The algorithm performs a breadth-first search of the given live range (using the CFG) to build two new live ranges. The first live range will be composed of the live range elements traversed in the breadth-first search. The algorithm attempts to make this live range as large as possible while keeping it colorable (i.e. the number of colored neighbors should remain less than the number of allocatable registers). The second live range is composed of the remaining nodes. To support access to the CFG, the following routines are needed:

- Return the first basic block of the program.
- Given a basic block, return all immediate successor blocks.
- Return the program statements contained in a given basic block.

2.4 Register Set Descriptions

The coloring routine will assign registers to the live ranges represented in the interference graph. These assignments are accomplished by matching the data type of a live range with a register set designed for that data type.
A retargetable allocator must be able to access a minimal set of information concerning the registers on the target machine:

- Map a live range to a register set.
- Given a register set, return the number of allocatable registers.
- Given a register set and a particular register number, return the assembly name of the register.
- Return the forbidden list of a live range. The forbidden list consists of the register numbers assigned to neighbors.
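The routines listed above can be pictured with a small C sketch. It assumes a made-up target with two register sets, one general and one floating point; the register names, the register counts, and all identifiers (DataType, RegSet, regSetOf, numAllocatable, regName, pickRegister) are invented for illustration and do not describe any real machine or the authors' implementation. The forbidden list is passed in as a plain array of register numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data types and register sets for an invented target. */
typedef enum { TY_INT, TY_CHAR, TY_PTR, TY_FLOAT, TY_DOUBLE } DataType;
typedef enum { SET_GENERAL, SET_FLOAT, NUM_SETS } RegSet;

#define MAX_REGS 4

/* Register description table, indexed by (register set, register number). */
static const char *const regNames[NUM_SETS][MAX_REGS] = {
    { "r0", "r1", "r2", "r3" },
    { "f0", "f1", "f2", NULL }   /* only three allocatable float registers */
};
static const int regCount[NUM_SETS] = { 4, 3 };

/* Map the data type of a live range to a register set. */
RegSet regSetOf(DataType t) {
    return (t == TY_FLOAT || t == TY_DOUBLE) ? SET_FLOAT : SET_GENERAL;
}

/* Given a register set, return the number of allocatable registers (k). */
int numAllocatable(RegSet s) {
    assert((int)s >= 0 && (int)s < NUM_SETS);
    return regCount[s];
}

/* Given a register set and register number, return the assembly name. */
const char *regName(RegSet s, int r) {
    assert(r >= 0 && r < numAllocatable(s));
    return regNames[s][r];
}

/* Pick the lowest-numbered register of set s that is not on the forbidden
 * list (register numbers already assigned to neighbors); -1 if none. */
int pickRegister(RegSet s, const int *forbidden, size_t nForbidden) {
    for (int r = 0; r < numAllocatable(s); r++) {
        int taken = 0;
        for (size_t i = 0; i < nForbidden; i++)
            if (forbidden[i] == r) taken = 1;
        if (!taken) return r;
    }
    return -1;
}
```

Retargeting this sketch would mean replacing only the two tables and regSetOf; the routine that picks a register reads nothing else about the machine.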
It is assumed that the code generator will know how to map a live range to a register set using information provided by the compiler writer before compiler-compile time. If the register sets are numbered, the register information could be represented by a simple matrix indexed by (register set number × register number).

3 Register Allocator Example

The goal of this section is to demonstrate the flexibility of the interface in building relocatable coloring routines. Figure 3 shows an implementation of the heuristic given in Figure 1. The routine relies on the accessing routines described in Section 2 to retrieve implementation-specific and hardware-specific information. The following is a summary of the important features of the example coloring routine.

Function buildIntGraph builds the original interference graph. It makes a call to insertNode for each live range in the live range list. The live range accessing routines allow buildIntGraph to iterate through the live ranges. In addition, these accessing routines also allow insertNode to compare live ranges (nodes) in the graph for conflicts.

The outermost loop executes until a k-coloring is guaranteed. This is accomplished by a series of operations that remove unconstrained nodes from the graph followed by marking (for spilling) and removing one or more constrained nodes. Chaitin's heuristic assumes that the code selector has already executed and generated register references for each operand. Thus, spilling requires the allocator to insert code to load registers from memory at each reference in the spilled live range. Spill code will vary for different machines and is handled by lower-level routines in the allocator.

Function getUnconstrained returns an unconstrained node if one exists. This process is optimized if the data structure containing the live ranges is sorted and indexed by degree [MB81, BCKT89]. Function getLowCost returns a pointer reference to the constrained node with the lowest spill cost.
There are many adequate methods of computing spill cost, but most rely on a combination of the degree of the live range, the number of references in the live range, and the nesting depth of the references. The interface allows modification of the function without modifying the coloring routine.

As a final note, consider deleting and reinserting nodes in the graph. Spilled nodes are not returned to the interference graph; however, their removal affects neighboring nodes. A good graph data structure should allow quick access between neighboring nodes in the graph. When a node is spilled, neighboring nodes should be incrementally updated to reflect the fact that the spilled node will not be returned to the graph. Other nodes should only be "pseudo" removed (i.e. just marked as removed but remaining in the graph data structure). This process optimizes the subsequent rebuilding phase of the allocator.

4 Conclusions

We have described major issues related to retargeting register allocators to different architectures and different optimizing compilers. The idea is to provide a common set of routines for accessing the data structures containing machine-specific and compiler-specific information. Such data structures include the interference graph, list of live ranges, control flow graph, and register set information. Now, the compiler writer can immediately concentrate retargeting efforts on improving the allocator output. Because of length limitations, only an overview of our strategy is presented here. Additional details on how the different data structures and accessing routines interact could be presented.
intGraph = buildIntGraph( lrangeList );
do {
    rebuild = 0;
    do {
        /* Add unconstrained nodes to coloring stack. */
        if ((nptr = getUnconstrained( intGraph )) != NULL) {
            sptr = deleteNode( nptr, intGraph );
            push( sptr, coloringStack );
        }
        /* Mark lowest-cost constrained node for spilling. */
        else if ((nptr = getLowCost( intGraph )) != NULL) {
            rebuild = 1;
            sptr = deleteNode( nptr, intGraph );
            markSpilled( sptr );
            push( sptr, coloringStack );
        }
    } while (!emptyGraph( intGraph ));

    /* Rebuild the graph if a node was spilled. */
    if (rebuild) {
        while (!emptyStack( coloringStack )) {
            sptr = pop( coloringStack );
            /* Spilled nodes are not added back to intGraph. */
            if (spilled( sptr ))
                addSpillCode( sptr );
            else
                insertNode( sptr, intGraph );
        }
    }
} while (rebuild);

/* Color the nodes in reverse order of removal. */
while (!emptyStack( coloringStack )) {
    sptr = pop( coloringStack );
    assignColor( sptr, intGraph );
    insertNode( sptr, intGraph );
}

Figure 3: Example Register Coloring Implementation
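To see the remove-and-push phase and the pop-and-color phase of Figure 3 run on concrete data, the following self-contained C sketch applies them to a small adjacency-matrix graph. It is a simplification under stated assumptions, not the routine of Figure 3: there is no rebuild loop and no spill code, a constrained node is chosen by highest remaining degree rather than by a spill cost, and a node marked for spilling is simply left uncolored. The function name colorGraph and the fixed size N are hypothetical.

```c
#include <assert.h>

#define N 5   /* number of live ranges (nodes) in the toy graph */

/* Single pass of the coloring heuristic on an adjacency matrix.
 * On return, color[i] is a register number in [0, k), or -1 if node i
 * was marked for spilling. Returns the number of nodes marked. */
int colorGraph(int adj[N][N], int k, int color[N]) {
    int removed[N] = { 0 };
    int stack[N];
    int top = 0, spills = 0;

    for (int i = 0; i < N; i++) color[i] = -1;

    /* Remove one node per step until the graph is empty. */
    for (int step = 0; step < N; step++) {
        int pick = -1, pickDeg = -1, constrained = 1;
        for (int i = 0; i < N; i++) {
            if (removed[i]) continue;
            int deg = 0;
            for (int j = 0; j < N; j++)
                if (!removed[j] && adj[i][j]) deg++;
            if (deg < k) { pick = i; constrained = 0; break; }
            if (deg > pickDeg) { pick = i; pickDeg = deg; }
        }
        removed[pick] = 1;
        if (constrained)
            spills++;              /* marked for spilling, never colored */
        else
            stack[top++] = pick;   /* unconstrained: push for coloring */
    }

    /* Pop nodes and give each the lowest color unused by its neighbors. */
    while (top > 0) {
        int n = stack[--top];
        for (int c = 0; c < k && color[n] < 0; c++) {
            int clash = 0;
            for (int j = 0; j < N; j++)
                if (adj[n][j] && color[j] == c) clash = 1;
            if (!clash) color[n] = c;
        }
        assert(color[n] >= 0);     /* holds because degree was below k */
    }
    return spills;
}
```

A node pushed with fewer than k remaining neighbors can have at most that many colored neighbors when it is popped, which is why the final assertion holds. On a 4-cycle 0-1-2-3 with an extra node 4 attached to node 0, calling colorGraph with k = 3 colors every node, while k = 2 marks node 0 for spilling, because once node 4 is removed every remaining node has degree 2.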
Even with the interface, there are still difficult issues that must be addressed by the compiler writer. For example, coloring algorithms expect accessing routines to a data structure containing detailed register set information. This information will vary from one machine to another; therefore, the compiler writer must produce this information for each new target. In addition, some CISC instructions have unusual register requirements. This may cause additional spills if the requirements cannot be encoded in a form usable by the coloring routine.

The interface was designed in an effort to combine automatic code generators (such as Twig [AG89] and BURG [FHP92]) with register coloring. Our approach, termed Generic Register Allocation System (GRAS) [Bry93], enhances the automatic code generator to allow the compiler writer to provide general register set information and instruction-specific register information. The general register set information is provided using a Register Description File. The automatic code generator tabulates the information in this file so that it can be used by the accessing routines during coloring. The instruction-specific register information is described in the associated instruction templates of the automatic code generator. A pre-allocation phase extracts the register information from the templates and embeds it into the live ranges so that it is available for the coloring routine. The specific register information helps guide the coloring routine in dealing with temporary values and mapping the correct register sets to the instructions.

Future efforts will combine different allocators with GRAS code generators. This will test the flexibility of our interface with different architectures. In addition, we plan to focus on some of the RISC-related issues addressed by the Marion System [BHE91].

References

[AG89] A. V. Aho, M. Ganapathi, and S. W. K. Tjiang. Code generation using tree matching and dynamic programming. ACM Trans. on Programming Languages and Systems, 11(4):491-516, October 1989.

[BCKT89] Preston Briggs, Keith D. Cooper, Ken Kennedy, and Linda Torczon. Coloring heuristics for register allocation. SIGPLAN Notices, 24(7):275-284, July 1989. Proceedings of the ACM SIGPLAN '89 Conference on Programming Language Design and Implementation.

[BGG+89] David Bernstein, Dina Q. Goldin, Martin C. Golumbic, Hugo Krawczyk, Yishay Mansour, Itai Nahshon, and Ron Y. Pinter. Spill code minimization techniques for optimizing compilers. SIGPLAN Notices, 24(7):258-263, July 1989. Proceedings of the ACM SIGPLAN '89 Conference on Programming Language Design and Implementation.

[BHE91] David G. Bradlee, Robert R. Henry, and Susan J. Eggers. The Marion system for retargetable instruction scheduling. SIGPLAN Notices, 26(6):229-240, June 1991. Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation.

[Bry93] Kelvin S. Bryant. A Generic Approach to Integrating Automatic Code Generation and Register Allocation. PhD thesis, North Carolina State Univ., September 1993.

[CAC+81] G. J. Chaitin, M. A. Auslander, A. K. Chandra, J. Cocke, M. E. Hopkins, and P. W. Markstein. Register allocation via coloring. Computer Languages, 6(1):47-57, January 1981.

[CH84] Fred C. Chow and John L. Hennessy. Register allocation by priority-based coloring. SIGPLAN Notices, 19(6):222-232, June 1984. Proceedings of the ACM SIGPLAN '84 Symposium on Compiler Construction.

[FHP92] C. W. Fraser, R. R. Henry, and T. A. Proebsting. Burg - fast optimal instruction selection and tree parsing. SIGPLAN Notices, 27(4):68-76, April 1992.

[LH86] James R. Larus and Paul N. Hilfinger. Register allocation in the SPUR Lisp compiler. SIGPLAN Notices, 21(7):255-263, July 1986. Proceedings of the ACM SIGPLAN '86 Symposium on Compiler Construction.

[MB81] D. W. Matula and L. L. Beck. Smallest-last ordering and clustering and graph coloring algorithms. Technical Report CSE-8104, Southern Methodist University, Dallas, TX, July 1981.
