Register allocation and optimization techniques in compiler construction by Jorgensen, Edward R.
UNLV Retrospective Theses & Dissertations 
1-1-1991 
Register allocation and optimization techniques in compiler 
construction 
Edward R. Jorgensen 
University of Nevada, Las Vegas 
Repository Citation 
Jorgensen, Edward R., "Register allocation and optimization techniques in compiler construction" (1991). 
UNLV Retrospective Theses & Dissertations. 133. 
http://dx.doi.org/10.25669/if4i-ax8s 

Order Number 1344909
Register allocation and optimization techniques in compiler 
construction
Jorgensen, Edward R., II, M.S.
University of Nevada, Las Vegas, 1991
Copyright ©1991 by Jorgensen, Edward R., II. All rights reserved.
U-M-I
300 N. Zeeb Rd.
Ann Arbor, MI 48106

REGISTER ALLOCATION AND 
OPTIMIZATION TECHNIQUES 
IN COMPILER CONSTRUCTION
by
Edward R. Jorgensen
A thesis submitted in partial fulfillment 
of the requirements for the degree of
Master of Science 
in
Computer Science
Department of Computer Science and Electrical Engineering 
University of Nevada, Las Vegas 
April, 1991
Abstract
The purpose of this thesis was to investigate the implementation of register allocation and 
optimization techniques used in the process of compiler construction. The implementation 
issues were investigated by choosing an architecture and examining various register 
allocation and optimization techniques. In choosing the techniques to be implemented, 
only the most promising possibilities were explored for the specific architecture chosen. 
The goal was to categorize the register allocation and optimization schemes for the 
architecture.
The thesis of Edward R. Jorgensen II for the degree of Master of Science in Computer 
Science is approved:
Chairperson, Evangelos Yfantis, Ph.D.
Examining Committee Member, Tom Nartker, Ph.D.
Examining Committee Member, John Minor, Ph.D.
Graduate Faculty Representative, Tom Schaffter, Ph.D.
Graduate Dean, Ronald W. Smith, Ph.D.
University of Nevada, Las Vegas 
April, 1991
Acknowledgements
I would like to thank my advisor Dr. Evangelos Yfantis for advice on this thesis and for 
encouragement throughout my undergraduate and graduate years at UNLV.
Additionally, I would like to thank Dr. Tom Nartker for his continuous support. Other 
members of the faculty to whom I would like to extend this appreciation include Dr. 
John Minor and Dr. Tom Schaffter.
Thanks also go to Ron Young for providing the compiler that was used for 
implementation.
I would like to thank my employer and co-workers Roger Hardwick, Tom Bozarth, Lorita 
Scanlan, and Tim Kirkpatrick. Without their support, this effort would not have been 
possible.
And finally, I would like to thank Bob Maichle and Dolly Turner for editing, general 
support and encouragement.
Table of Contents
Abstract
Acknowledgements
Table of Contents
List of Figures
1 Introduction
1.1 Purpose and Goals
1.2 Compiler Used
1.3 Target Machine
2 General Issues in Register Allocation
2.1 Potential Execution Time Savings
2.2 Register Allocation Partitioning
2.3 Register Allocation Across Procedure Calls
3 Previous Work
3.1 Usage Counts
3.2 Directed Acyclic Graph (DAG)
3.3 Register Descriptors
3.4 Lifetime Analysis
3.5 Graph Coloring Strategies
3.5.1 Graph Coloring
3.5.2 Priority-based Coloring
4 Register Allocation and Optimization Techniques
4.1 Single-Use Register Allocation
4.2 Usage Counts
4.3 Graph Coloring
4.4 Register Descriptors
4.4.1 Algorithm
4.4.2 Register Allocation Across Procedure Calls
4.4.3 Results
4.5 Directed Acyclic Graph (DAG)
4.5.1 Example
4.5.2 Results
4.6 Reaching Definitions
4.6.1 Data Flow Analysis
4.6.2 Computing the gen() and kill() functions
4.6.3 Computing the out() function
4.6.4 Set Representation
4.6.5 Results
4.7 Live-Variable Analysis
4.7.1 Data Flow Analysis
4.7.2 Computing the in() function
4.7.3 Results
4.8 Variable-Use Analysis
4.8.1 Results
4.9 Peephole Optimization
4.9.1 Optimizations
4.9.2 Results
4.10 Other Optimizations
5 Conclusion
Bibliography
Appendix A, Intel 80x86 Architecture Overview
Appendix B, Single-Use Register Allocation
B.1 Examples
B.2 Source Code
B.2.1 Source Code, Single-Use Register Allocation
Appendix C, Register Descriptors
C.1 Examples
C.2 Source Code
C.2.1 Source Code, Register Descriptors
C.2.2 Source Code, Optimize Function
Appendix D, Directed Acyclic Graph
D.1 Examples
D.2 Source Code
D.2.1 Source Code, Optimize Function
D.2.2 Source Code, DAG Routines
Appendix E, Reaching Definitions
E.1 Examples
E.2 Source Code
E.2.1 Source Code, Optimize Function
E.2.2 Source Code, Reaching Definitions
E.2.3 Source Code, Store Registers
Appendix F, Live-Variable Analysis
F.1 Examples
F.2 Source Code
F.2.1 Source Code, Optimize Function
F.2.2 Source Code, Live-Variable Analysis
F.2.3 Source Code, Store Registers
Appendix G, Variable-Use Analysis
G.1 Examples
G.2 Source Code
G.2.1 Source Code, Optimize Function
G.2.2 Source Code, Store Registers
Appendix H, Peephole Optimizer
H.1 Examples
H.2 Source Code
H.2.1 Source Code, Peephole Optimizer
Appendix I, Compiler Back-end
I.1 Source Code
I.1.1 Source Code, Back-end
Appendix J, Data Structures
J.1 Include Files
J.1.1 Include File, Back-end
J.1.2 Include File, Op-stack
J.1.3 Include File, Optimize
J.1.4 Include File, Register Descriptors
J.1.5 Include File, Flow-Graph
J.1.6 Include File, Directed Acyclic Graph
List of Figures
Figure 1, A Compiler
Figure 2, Compiler Phases
Figure 3, Directed Acyclic Graph
Figure 4, DAG Algorithm
Figure 5, DAG Example
Figure 6, Predecessor Flow-Graph
Figure 7, Predecessor Flow-Graph Algorithm
Figure 8, Algorithm to Calculate Reaching Definitions
Figure 9, Successor Flow-Graph
Figure 10, Successor Flow-Graph Algorithm
Figure 11, Algorithm to Perform Live-Variable Analysis
Figure 12, Final Code Generation and Register Allocation Strategy
Chapter 1 
Introduction
A compiler is a translator between a source language and a target language (refer to 
Figure 1). The source language is typically written by a programmer and the target 
language is acted on by a machine.
(Diagram: Source Program -> Compiler -> Target Program, with Error Messages also produced by the Compiler.)
Figure 1, A Compiler
Compilation involves several steps, including reading the input program, recognizing 
constructs of the programming language, choosing an implementation of those constructs 
from the available instructions of the target machine, and preparing an executable image of 
the operations. Compilers are typically organized in phases [1], each of which performs 
one part of the translation. The early phases of compilation which include lexical 
analysis, syntactic analysis, and semantic analysis are language specific and machine 
independent. That is, these phases incorporate knowledge of the source language, but do 
not refer to specific features of the target machine. The early phases are referred to as 
the front-end. The later phases of compilation which include resource allocation, 
instruction selection, and code generation are machine specific but language independent. 
That is, these phases incorporate specific knowledge about the target machine, but do not 
refer to features of the source language. The later phases are referred to as the 
back-end of a compiler.
The front-end compiler design issues rely on formal language theory. There are various 
well established models for compiler front-end design [2]. These include top-down (LL) 
and bottom-up (LALR). Compiler front-end phases include lexical analysis, syntactical 
analysis, and semantic analysis. Additionally, table driven tools such as yacc and lex are 
available to perform the front-end parsing and lexical analysis.
Back-end related research has yet to produce a widely accepted design model for overall 
back-end construction. Due to the diverse number of architectures and recent advances 
in new architecture designs, a significant amount of work has been performed in this 
area. Between the diverse architectures and lack of a widely accepted back-end model, 
much of the work in this area has been ad-hoc routines developed for very specific cases. 
The following diagram shows the general front-end, back-end relationship.
(Diagram: Source Program -> Front-end [Lexical Analysis, Syntax Analyzer, Semantic Analyzer] -> Intermediate Code -> Back-end [Code Optimizer, Code Generation] -> Target Program.)
Figure 2, Compiler Phases
Register allocation is one of the back-end issues encompassed by code optimization and 
code generation.
1.1 Purpose and Goals
This thesis examines various register allocation and related optimization techniques and 
further examines the related implementation issues for back-end code generation. This 
included choosing a specific architecture on which to implement and investigate the various 
register allocation techniques. Only the most promising possibilities were explored for the 
specific architecture chosen. It is expected that a different architecture would produce an 
entirely different set of results.
The goal was to categorize the register allocation and optimization techniques for a 
specific architecture.
1.2 Compiler Used
The compiler used for implementation is a compiler for a locally defined language named 
UNLV2-Version 2 [3]. The UNLV2-Version 2 language is an enhancement of the 
UNLV2-Version 1 compiler [4]. The UNLV2-Version 2 compiler was written by Ron 
Young, another graduate student at the University of Nevada, Las Vegas. The compiler 
consists of a basic front-end and back-end that produces assembly language code for the 
Intel 80x86 architecture. The produced code can then be assembled by a standard PC 
assembler. The run-time support routines for basic input and output were also included 
with the compiler.
The UNLV2-Version 2 language is a straightforward, PASCAL-like, structured language. 
Language constructs include WHILE loops, IF statements, IF...THEN...ELSE statements, 
and procedure/function calls. Procedures and functions can have parameters and local 
variables whose storage is dynamically allocated, and can call themselves recursively. 
The issues of variable scoping must be adequately handled throughout the front- and 
back-end of the compiler. Data types include integers, reals, characters, arrays, records, 
and pointers. All Pascal features are included except for the set features.
The reasons this compiler was used include the source code availability and the fact that 
the intermediate code generated is quadtuples. Quadtuples were chosen as a 
representative type of intermediate code for general application.
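As an illustration of this intermediate form, a quadtuple can be pictured as an operator, two operand fields, and a result field. The sketch below is a generic representation in Python. The field names, the operator spellings, and the sample statement are assumptions made for illustration and are not taken from the UNLV2-Version 2 compiler.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One quadtuple: operator, two operands, result (field names are illustrative)."""
    op: str
    arg1: Optional[str]
    arg2: Optional[str]
    result: str


def quads_for_sample_statement():
    """Hand-written quadtuples for the hypothetical statement  x := a * b + c."""
    return [
        Quad("*", "a", "b", "t0"),    # t0 := a * b
        Quad("+", "t0", "c", "t1"),   # t1 := t0 + c
        Quad(":=", "t1", None, "x"),  # x  := t1
    ]


def evaluate(quads, env):
    """Interpret a list of quadtuples over a dict of variable values."""
    env = dict(env)  # work on a copy so the caller's dict is not modified
    for q in quads:
        if q.op == ":=":
            env[q.result] = env[q.arg1]
        elif q.op == "+":
            env[q.result] = env[q.arg1] + env[q.arg2]
        elif q.op == "*":
            env[q.result] = env[q.arg1] * env[q.arg2]
        else:
            raise ValueError(f"unknown operator {q.op!r}")
    return env
```

The temporaries t0 and t1 in such a list are the natural candidates for registers, which is why a flat, uniform form of this kind is convenient for experimenting with allocation strategies.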
1.3 Target Machine
The Intel 80x86 architecture [5] [6] was chosen due to the wide-spread availability of such 
processors. Due to the limited number of registers available and the restrictive 
instruction set, an efficient and effective register allocation mechanism is particularly 
important.
An overview of the Intel 80x86 architecture, including the machine registers, is located 
in Appendix A. A basic knowledge of the architecture is required to understand the 
comments and descriptions presented in the remainder of this document.
The processor specific extensions are not addressed by the compiler. The processor 
specific extensions refer to the architecture enhancements for alternate memory 
addressing mechanisms (e.g., virtual memory). One reason the processor extensions 
are not addressed by the compiler is that the 80286 extensions are not compatible with 
the 80386 extensions [7]. The primary reason is that the 80286 and 80386 processor 
extensions are not used by the operating system (i.e., DOS). As a result, code written 
using the extensions is unable to call operating system functions to perform system 
interaction (e.g., file or terminal input/output).
Chapter 2
General Issues in Register Allocation
Register allocation refers to the process of allocating or assigning registers to program 
variables. This section presents an overview of the general issues involved in register 
allocation. These issues include a description of the advantages and disadvantages of 
performing register allocation.
The general problem of optimally selecting which program variables to assign to 
registers, and when to assign them, is NP-complete.
2.1 Potential Execution Time Savings
The general idea behind register allocation is that machine instructions which operate on 
values in registers typically execute faster than instructions which operate on values in 
memory. Depending on the instruction, this difference or speed-up can be significant. Due 
to the nature of the Intel architecture instruction set, one operand must be in a register. 
The second operand can be either in memory or in a register. An instruction with both 
operands in registers takes less time to execute than the same instruction with only one 
operand in a register.
For example, consider a very common instruction on the Intel 80286 processor, the ADD 
instruction with a register operand (REG) and a memory operand (MEM):
ADD REG, MEM
This instruction would take 16 machine cycles plus the effective address calculation time 
(see note 1), which is between 6 and 12 additional machine cycles [5] [6]. In contrast, the 
ADD instruction with two register operands (REG, REG):
ADD REG, REG
would take 3 machine cycles. The difference, a minimum speed-up of 19 machine 
cycles, would be multiplied by the number of uses. Over the course of a large program, 
the difference can accumulate to be quite significant.
Note 1: The effective address calculation time is the amount of time required to calculate the offset. This 
might include a base address plus an offset for indexed or additional displacement memory addressing. The 
time required to compute the physical memory addresses from the segment address and offset is included 
in the instruction time. Refer to Appendix A for an overview of the segmentation scheme used by the Intel 
80x86 architecture.
However, the difference or potential speed-up must be offset by the number of machine 
cycles required to move the operand from memory to a register. If the value was altered, 
the additional machine cycles required to move the operand from the register 
back to memory must also be considered.
Since one operand must be moved from memory into a register, the time for that 
memory-to-register operation is always required. The second operand can then either be 
moved into a register or left in memory. The final decision will be affected by 
many factors, one of which might be the amount of time required to perform the 
memory-to-register and potential register-to-memory operations. For a single operation, 
using two registers would increase the overall execution speed.
For example, assuming a register is used, the value must first be loaded into the register 
with the MOV instruction:
MOV REG, MEM
which takes 9 machine cycles plus the effective address calculation time, which is between 
6 and 12 additional machine cycles. If the value in the register is altered, at some point 
it must be saved back into memory:
MOV MEM, REG
which takes the same number of machine cycles as the load. The minimum of 
15 machine cycles for the load, plus a potential minimum of 15 additional machine 
cycles for the store, clearly shows that for a single instance, using a register would 
increase execution time.
In order to fully realize the potential benefits of the additional speed-up due to register 
usage, a series of instructions would need to be performed with the value in a register.
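The trade-off described in this section can be sketched as a small cost model. The Python below only combines the cycle figures quoted above (16 and 3 cycles for the two ADD forms, 9 cycles for either MOV, and a minimum effective address time of 6 cycles). The function names and the simplification that every use is one ADD instruction are assumptions made for illustration, not part of the thesis implementation.

```python
# Cycle figures as quoted in the text above.
ADD_REG_MEM = 16   # ADD REG, MEM, plus effective address time
ADD_REG_REG = 3    # ADD REG, REG
MOV_REG_MEM = 9    # MOV REG, MEM (load), plus effective address time
MOV_MEM_REG = 9    # MOV MEM, REG (store), plus effective address time
EA_MIN = 6         # minimum effective address calculation time


def cycles_from_memory(uses, ea=EA_MIN):
    """Cost when every use of the value is an ADD REG, MEM."""
    return uses * (ADD_REG_MEM + ea)


def cycles_from_register(uses, modified, ea=EA_MIN):
    """Cost when the value is loaded once, used as ADD REG, REG,
    and stored back once if it was modified."""
    total = (MOV_REG_MEM + ea) + uses * ADD_REG_REG
    if modified:
        total += MOV_MEM_REG + ea
    return total


def break_even_uses(modified, ea=EA_MIN):
    """Smallest number of uses for which the register version is strictly cheaper."""
    uses = 1
    while cycles_from_register(uses, modified, ea) >= cycles_from_memory(uses, ea):
        uses += 1
    return uses
```

Under this model, a single use of a value that is loaded and later stored back costs 15 + 3 + 15 = 33 cycles through a register, against 16 + 6 = 22 cycles directly from memory, which matches the single-instance observation above. The cost of the load and store is recovered only when the value is used repeatedly while it remains in the register.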
2.2 Register Allocation Partitioning
Register allocation can be performed locally or globally. Local register allocation refers 
to assigning registers within either a set of basic blocks or a program procedure. 
Global register allocation refers to assigning or using registers across multiple basic blocks 
or procedures. A subset of registers may be reserved for local assignment and the 
remaining registers reserved for global assignment.
Additionally, some registers are reserved for the compiler back-end to use for 
bookkeeping purposes. Depending on the architecture, these might include such registers as 
the stack pointer, base pointer, frame pointer, and/or instruction pointer. In general 
these registers, defined by the hardware, are dedicated to their respective function. For 
the Intel 80x86 architecture, these dedicated registers include the segment, stack, base 
pointer, and instruction pointer registers.
In general, the registers are divided into two fixed partitions: one for the dedicated 
registers and the other for the non-dedicated or general purpose registers. This document 
will address register allocation for the partition of general purpose registers.
2.3 Register Allocation Across Procedure Calls
One major issue in register allocation is how to handle allocated registers in the context 
of procedure calls. There are three major alternatives. The first is caller-save. Here, 
code must be generated to save all allocated registers before the procedure 
is called and to restore them after the procedure returns. The second 
alternative, callee-save, is to save the registers at the entry point of the procedure. In this 
manner, the procedure need only save the registers it can potentially modify. The last 
alternative is to compute dynamically, at run-time, which registers to save. As part of the 
procedure linkage, the set of registers to save is computed by taking the intersection of 
the set of live registers in the calling routine and the set of registers required by the 
procedure.
None of these alternatives is optimal in all cases. Saving registers prior to the call is 
optimal if the called procedure uses more registers than the caller routine, since fewer 
registers would need to be saved at the call-site than would need to be saved at the entry 
point. Conversely, saving registers at the entry point is optimal if the calling routine uses 
more registers than the called routine, since fewer registers would need to be saved at 
the entry point than at the procedure call.
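The three alternatives can be compared by the set of registers each one saves. The sketch below, in Python, models each strategy as a function of two sets: the registers live in the caller at the call-site and the registers the called procedure may modify. The function names and the register sets in the usage note are hypothetical.

```python
def caller_save(live_in_caller, used_by_callee):
    """Caller-save: every register allocated (live) in the caller is saved
    before the call and restored after it."""
    return set(live_in_caller)


def callee_save(live_in_caller, used_by_callee):
    """Callee-save: the procedure saves, at its entry point, every register
    it can potentially modify."""
    return set(used_by_callee)


def dynamic_save(live_in_caller, used_by_callee):
    """Dynamic linkage: save only the intersection of the caller's live
    registers and the registers required by the procedure."""
    return set(live_in_caller) & set(used_by_callee)
```

For a caller with AX and BX live and a procedure that uses BX, CX, and DX, caller-save stores two registers, callee-save stores three, and the dynamic scheme stores only BX. The sketch counts saved registers only; it does not model the run-time cost of computing the intersection at each call.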
While dynamic linkage appears to save the minimum number of registers in all cases, the 
scheme can quickly degrade into an expensive form of callee-save. The higher expense 
arises from the computation of the set of registers that must be saved which is performed 
at run-time for each procedure call [8].
One study has shown there to be very little difference between caller-save and 
callee-save [8]. This study was conducted by recompiling a large number of programs (all 
UNIX utilities) using both methods and comparing the execution times. Of all the 
programs, over 80% showed no change. The remaining programs were split between 
executing slightly faster and executing slightly slower. Overall, the differences were small 
and primarily a function of the style in which the programs were written.
The traditional approach is to use callee-save for register allocation across procedure 
calls.
Chapter 3
Previous Work
There has been a great deal of work in the area of register allocation. This chapter 
presents an overview of previous work in this area. For the purposes of presenting this 
information, the previous work has been categorized into the following general register 
allocation strategies:
Usage Counts
Directed Acyclic Graph (DAG)
Register Descriptors 
Lifetime Analysis 
Graph Coloring
These general strategies and the related previous works are presented in the following 
sections.
3.1 Usage Counts
One of the earliest methods of performing register allocation involved the calculation of 
program variable "usage counts" [9]. Machine registers are assigned to program variables 
with the highest usage during code generation. Usage counts are used primarily for 
local register allocation.
A data structure is created which contains a record of the variables that are currently 
assigned to each register and a status flag for each register. Additionally, a record of 
each variable and its usage count is also maintained. Then, as code generation occurs, 
each time an instruction is generated that uses a variable, the associated usage count is 
decremented. When the usage count drops to zero, the register is released for reuse. 
When there is an excess demand for registers, the register containing the lowest usage 
count is selected for possible re-loading. The previous value must be stored only if it has 
been changed as indicated by the status flag. This approach is generally applied either 
to basic blocks or procedures.
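The bookkeeping described above can be sketched as follows. This Python class is a minimal illustration under several assumptions: usage counts are supplied up front, emitted code is recorded as strings in Intel operand order, and a changed value is also written back when its register is released. The class and method names are hypothetical and are not taken from the cited work.

```python
class UsageCountAllocator:
    """Local register allocation driven by per-variable usage counts."""

    def __init__(self, registers, usage_counts):
        self.free = list(registers)            # registers not holding a variable
        self.remaining = dict(usage_counts)    # variable -> uses still to come
        self.holder = {}                       # register -> variable it holds
        self.dirty = {}                        # register -> status flag (value changed)
        self.code = []                         # emitted loads and stores

    def _register_of(self, var):
        for reg, held in self.holder.items():
            if held == var:
                return reg
        return None

    def _spill(self):
        """Pick the register whose variable has the lowest remaining usage count."""
        victim = min(self.holder, key=lambda r: self.remaining[self.holder[r]])
        if self.dirty[victim]:                 # store only if the value was changed
            self.code.append(f"MOV {self.holder[victim]}, {victim}")
        del self.holder[victim]
        del self.dirty[victim]
        return victim

    def use(self, var, modifies=False):
        """Called each time an instruction that uses var is generated."""
        reg = self._register_of(var)
        if reg is None:
            reg = self.free.pop(0) if self.free else self._spill()
            self.code.append(f"MOV {reg}, {var}")
            self.holder[reg] = var
            self.dirty[reg] = False
        if modifies:
            self.dirty[reg] = True
        self.remaining[var] -= 1
        if self.remaining[var] == 0:           # no further uses: release the register
            if self.dirty[reg]:                # assumption: write back on release
                self.code.append(f"MOV {var}, {reg}")
            del self.holder[reg]
            del self.dirty[reg]
            self.free.append(reg)
        return reg
```

With a single register AX and counts a = 3, b = 2, calling use("a", modifies=True) and then use("b") forces a spill: the changed value of a is stored before AX is reloaded with b.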
One of the primary advantages is the relatively low overhead associated with this 
approach. This is true even when there is a relatively low number of registers available 
on the target machine.
Another of the stated advantages is that this approach performs near optimal register 
allocation for linear code regions. However, this is not true for all architectures.
Unfortunately, one of the disadvantages is that locally optimal solutions to the problem 
of register allocation do not necessarily add up to the globally optimal solution. 
Specifically, register assignment across non-linear regions is not optimal. An optimal 
solution would depend on the frequency of execution of each flow path through the basic 
block. Assumptions are then made regarding the frequency of execution of each flow 
path and the associated program variables. For example, variables in a loop can be 
weighted more heavily.
3.2 Directed Acyclic Graph (DAG)
The use of a Directed Acyclic Graph (DAG) was suggested by Aho, Sethi, and 
Ullman [1]. The DAG is used to assign registers to temporaries during expression 
evaluation. Interior nodes of the DAG represent the generation or modification of 
temporaries created along the various paths of the DAG. Registers can be assigned to 
these interior nodes, thus improving the time required for overall evaluation of the 
expression.
For example, given the expression:
x = a / b + (c + d) * e
the DAG generated would be as follows:
Figure 3, Directed Acyclic Graph
Registers would be assigned to the t0, t1, and t2 nodes. The advantage of this approach 
is in identifying which nodes to assign registers to when there are not enough registers for 
all nodes, as might be the case with larger expressions. If not enough registers are 
available for all nodes, registers are assigned only to the longest paths in the DAG.
One potential problem with this approach is how embedded subroutine or function calls 
are handled. If registers are used as temporaries, then they have to be saved before 
calling the function and restored afterwards. Additional effort is required to recognize 
the function calls, and decide how to handle such cases. Registers might be assigned to 
temporaries only after the function call or might be passed as part of the function call.
3.3 Register Descriptors
The basic idea of using register descriptors and address descriptors to keep track of 
register contents and variable addresses during code-generation is outlined by Aho, Sethi, 
and Ullman [1] and is the combination of several strategies [10][11][12].
The strategies are combined into a single algorithm, referred to as getreg(), that performs 
register assignment during code-generation using the register and address descriptors. 
The register descriptors keep track of what is in each register, and are consulted when a 
new register is required. The address descriptors keep track of the location of the most 
recent value of the variable and the assigned memory location. At various points during 
code-generation, the current location might be a register or memory address.
As code-generation progresses and variables are accessed or updated, the function getreg() 
is used to assign registers to variables when possible. The function uses the register 
descriptors and, when required, may move a variable from a register back to memory.
The function is invoked for every tuple that accesses program variables. The basic
version of getreg(), as outlined in the literature, is used to perform local register 
allocation.
The getreg() function takes a standard quadtuple (i.e., (x = y op z)) as input and returns 
a location, L, where the result of the quadtuple should be stored. The location is 
typically a register.
The basic algorithm for getreg is as follows:
L = getreg ( quadtuple (x = y op z) )
1) If the name y is in a register that holds the value of no other names, and 
y is not live and has no next use after execution of x : = y op z, then return
the register of y for L. Update the address descriptor of y to indicate that
y is no longer in L.
2) Failing (1), return an empty register for L if there is one.
3) Failing (2), if either: 1) x has a next use in the block, or
2) op requires a register,
then find an occupied register R. Store the value in R into a memory location 
M (via a MOV instruction) if it is not already in the proper memory location, 
update the address descriptor for M, and return R. If R holds the value 
of several variables, a MOV instruction must be generated for each variable 
that needs to be stored. A suitable occupied register might be one whose 
datum is referenced furthest in the future, or one whose value is also in 
memory.
4) If x is not used in the block, or no suitable occupied register can be found, 
select the memory location of x as L.
The architecture-specific implementation and extensions to the basic getreg() algorithm 
are described in Chapter 4, Section 4.4.
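The four steps of the basic algorithm can be sketched in C as shown below. This is an illustrative reading of the steps and not the implementation described in Chapter 4. The descriptor layout (one bit per variable in `reg_desc`, a flag per variable in `in_mem`), the `struct quad` fields, and the helpers `spill` and `set_result` are all assumptions; next-use information is assumed to be supplied by the caller, and a counter stands in for emitted MOV instructions.

```c
#include <assert.h>

#define NUM_REGS  2        /* hypothetical register count */
#define NUM_VARS  8        /* hypothetical variable count (one bit each) */
#define IN_MEMORY (-1)     /* L is the memory location of x */

struct quad {
    int x, y, z;           /* variable indices for x = y op z */
    int y_live_after;      /* y is live or has a next use after the quadtuple */
    int x_used_in_block;   /* x has a next use in the block */
    int op_needs_reg;      /* op requires a register */
};

static unsigned reg_desc[NUM_REGS];  /* register descriptor: bit v set if v is held */
static int in_mem[NUM_VARS];         /* address descriptor: memory copy is current */
static int movs_emitted;             /* stands in for generated MOV instructions */

void reset_descriptors(void)
{
    for (int r = 0; r < NUM_REGS; r++) reg_desc[r] = 0;
    for (int v = 0; v < NUM_VARS; v++) in_mem[v] = 1;
    movs_emitted = 0;
}

/* Store every variable in r whose memory copy is stale, then empty r. */
static void spill(int r)
{
    for (int v = 0; v < NUM_VARS; v++)
        if ((reg_desc[r] >> v & 1u) && !in_mem[v]) {
            movs_emitted++;
            in_mem[v] = 1;
        }
    reg_desc[r] = 0;
}

int getreg(const struct quad *q)
{
    int r;

    assert(q->y >= 0 && q->y < NUM_VARS);
    /* 1) y alone in a register and dead afterwards: reuse that register. */
    for (r = 0; r < NUM_REGS; r++)
        if (reg_desc[r] == (1u << q->y) && !q->y_live_after) {
            reg_desc[r] = 0;               /* y is no longer in L */
            return r;
        }
    /* 2) Otherwise an empty register, if there is one. */
    for (r = 0; r < NUM_REGS; r++)
        if (reg_desc[r] == 0)
            return r;
    /* 3) Otherwise free an occupied register when one is really needed. */
    if (q->x_used_in_block || q->op_needs_reg) {
        int victim = 0;
        for (r = 0; r < NUM_REGS; r++) {   /* prefer one needing no stores */
            int stale = 0;
            for (int v = 0; v < NUM_VARS; v++)
                if ((reg_desc[r] >> v & 1u) && !in_mem[v]) stale = 1;
            if (!stale) { victim = r; break; }
        }
        spill(victim);
        return victim;
    }
    /* 4) Otherwise the result goes directly to the memory location of x. */
    return IN_MEMORY;
}

/* Record that x now lives only in location L. */
void set_result(int L, int x)
{
    for (int r = 0; r < NUM_REGS; r++) reg_desc[r] &= ~(1u << x);
    if (L == IN_MEMORY) { in_mem[x] = 1; return; }
    reg_desc[L] = 1u << x;
    in_mem[x] = 0;
}
```

The victim choice in step 3 uses only the "value is also in memory" heuristic; picking the register whose datum is referenced furthest in the future would require per-variable next-use distances, which this sketch does not model.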
3.4 Lifetime Analysis
In order to perform register allocation across basic blocks, information about the variables 
used in the next basic block or blocks is required. Generally, for local register allocation, 
the values in registers must be written to memory at the end of each basic block. This 
is required since control may reach the block from multiple other blocks, and it cannot 
be directly assumed that a variable used by the successor block will always appear in the 
same register. In some cases this may result in performing a register-to-memory 
operation and then in a successive block, performing a corresponding memory-to-register 
operation on the same variable. Additionally, since the redundant pair spans a basic block boundary, a peep-hole 
optimization will be unable to correct the problem. In fact, depending on the specific 
data-flow, the additional memory-to-register operation may be required (i.e., a loop 
situation).
It might be possible to retain a value or values in a register across multiple basic blocks, 
by obtaining information about the variables used in the successor block or blocks. This 
would allow a more global solution to register allocation issues.
To obtain such information, a data-flow analysis must be performed in order to generate 
the data-flow information. The data-flow information consists of the in() (i.e., variables 
that are live going into the basic block) and/or the out() (i.e., variables still live going 
out of the basic block).
There are several methods of computing the in()'s and out()'s of basic blocks. Two of 
those methods are investigated in Chapter 4, Sections 4.6 and 4.7.
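For orientation, the textbook iterative live-variable computation can be sketched as follows. It is not necessarily either of the two methods investigated in Chapter 4. The `struct block` layout, the bit-per-variable sets, and the limit of two successors per block are assumptions made for this illustration. The equations solved are out(B) = union of in(S) over the successors S of B, and in(B) = use(B) united with (out(B) minus def(B)).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUCCS 2

struct block {
    unsigned use;           /* variables read before being written in the block */
    unsigned def;           /* variables written in the block */
    int succ[MAX_SUCCS];    /* successor block indices, -1 where unused */
    unsigned in, out;       /* results: live on entry / live on exit */
};

/* Iterate the liveness equations over all blocks until nothing changes. */
void compute_liveness(struct block *b, int n)
{
    int changed = 1;

    assert(b != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        b[i].in = b[i].out = 0;

    while (changed) {
        changed = 0;
        /* Liveness flows backward, so visit the blocks in reverse order. */
        for (int i = n - 1; i >= 0; i--) {
            unsigned out = 0, in;
            for (int s = 0; s < MAX_SUCCS; s++)
                if (b[i].succ[s] >= 0)
                    out |= b[b[i].succ[s]].in;
            in = b[i].use | (out & ~b[i].def);
            if (in != b[i].in || out != b[i].out) {
                b[i].in = in;
                b[i].out = out;
                changed = 1;
            }
        }
    }
}
```

A variable whose bit is set in out() of a block and in in() of its successor is a candidate for staying in a register across the block boundary.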
3.5 Graph Coloring Strategies
The application of graph coloring to the problem of register allocation is outlined by 
Chaitin, et al. [13] and later refined by Chaitin [14]. There have been many 
variations on the graph coloring strategy [15][16][17]. For register allocation, each node 
in a coloring graph represents a program variable or quantity that is a candidate for 
residing in a machine register. The number of colors used for coloring the graph is the
number of registers available for use in register allocation. The goal is to find the best 
way to assign program variables to registers so the number of variables in registers is 
maximized.
Coloring a graph is an assignment of a color to each graph node in such a manner that 
if two nodes are adjacent (i.e., connected by an edge of the graph), then they must have 
a different color assigned. A coloring is said to be an n-coloring if it does not use more 
than n colors. The chromatic number of a graph is defined to be the smallest n 
for which there is an n-coloring of the graph. The basic problem of 
determining whether a given graph is n-colorable is NP-complete. This suggests that in 
some cases an impractical amount of computation time might be required to determine 
the coloring. In some cases, the amount of time could be an exponential function of the 
size of the graph. Any algorithms that use the graph coloring strategy must take some 
steps to overcome this potentially significant obstacle.
This section presents an overview of the basic graph coloring algorithm and one of the 
graph coloring variations.
3.5.1 Graph Coloring
The basic graph coloring algorithm outlined by Chaitin, et al. involves creating a register 
interference graph for each procedure in the source program. Two computations which 
reside in machine registers are said to interfere with each other if they are live 
simultaneously at any point in the program. The resulting graph is potentially very large, 
and must be stored in a data structure that takes as little space as possible. For an 
N-node graph, an N by N bit matrix can be constructed to represent the graph.
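A bit matrix of this kind might be laid out as in the following sketch. The type and function names (`struct igraph`, `igraph_add_edge`, and so on) are hypothetical. For simplicity the full N by N matrix is stored, although a symmetric relation needs only one triangle of it.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

struct igraph {
    int n;                  /* number of nodes */
    size_t row_bytes;       /* bytes per matrix row */
    unsigned char *bits;    /* n rows of row_bytes bytes, one bit per pair */
};

/* Allocate an empty graph; returns 0 on success, -1 if allocation fails. */
int igraph_init(struct igraph *g, int n)
{
    assert(g != NULL && n > 0);
    g->n = n;
    g->row_bytes = ((size_t)n + CHAR_BIT - 1) / CHAR_BIT;
    g->bits = calloc((size_t)n, g->row_bytes);
    return g->bits != NULL ? 0 : -1;
}

static void set_bit(struct igraph *g, int row, int col)
{
    g->bits[(size_t)row * g->row_bytes + (size_t)col / CHAR_BIT] |=
        (unsigned char)(1u << (col % CHAR_BIT));
}

/* Record that a and b are live simultaneously (interference is symmetric). */
void igraph_add_edge(struct igraph *g, int a, int b)
{
    assert(a >= 0 && a < g->n && b >= 0 && b < g->n);
    set_bit(g, a, b);
    set_bit(g, b, a);
}

int igraph_interferes(const struct igraph *g, int a, int b)
{
    assert(a >= 0 && a < g->n && b >= 0 && b < g->n);
    return (g->bits[(size_t)a * g->row_bytes + (size_t)b / CHAR_BIT]
            >> (b % CHAR_BIT)) & 1;
}

void igraph_free(struct igraph *g)
{
    free(g->bits);
    g->bits = NULL;
}
```

The matrix answers "do these two nodes interfere?" in constant time, at a cost of roughly N*N/8 bytes.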
A series of operations and optimizations are performed on the graph after it is built. 
The resulting graph is then used to perform register assignment. One of the basic 
assumptions of graph coloring is that no previous register allocation has been performed, 
and that all values have memory locations assigned. This is required in case a register 
does not get assigned to the variable or should spill code be required. Spilling or spill 
code implies that the values are kept in a memory location as opposed to a register.
After the interference graph is built, the next step is to coalesce the nodes for the 
purpose of assuring that separate nodes in the graph get the same color. This is done 
by taking two nodes that do not interfere and combining them in a single node which 
interferes with any node which either of them interfered with before.
Additionally, before the interference graphs for program procedures are colored, the 
graph is reduced. This reduction, referred to as subsumption, is performed by eliminating 
nodes in the graph that have fewer edges than there are colors available. That is, if 
there are n colors available, and a node has fewer than n neighbors, that node can be colored 
no matter how colors are assigned to its neighbors. As such, the node can be eliminated from the 
graph. This is a very powerful reduction, which can in some cases completely eliminate 
the interference graph. If the graph is reduced to the empty set, colors can be assigned 
to the nodes in the reverse of the order in which they were removed.
The final step is to perform the graph coloring on the processed interference graph (if 
it is non-empty). This step can be potentially very time consuming, and would only be 
performed if a high degree of optimization is required.
If the graph is not colorable, spill code must be introduced. As spill code is introduced, 
the interference graph is modified accordingly. As the interference graph is updated, the 
new graph may be colorable. This process is continued until a complete coloring can be 
obtained.
Due to the overhead required and the nature of the coloring algorithm, this approach 
works best for a target machine with a large number of free registers.
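The reduction and color assignment steps can be sketched as below, under several assumptions: the graph has at most 32 nodes, so that each adjacency row fits in one `unsigned` (assumed to be at least 32 bits wide); no node interferes with itself; and coalescing and spill code generation are omitted. The function name `color_graph` and its interface are hypothetical. When the reduction gets stuck the function simply reports failure, which is the point at which spill code would have to be introduced.

```c
#include <assert.h>

#define MAX_NODES 32
#define NO_COLOR  (-1)

static int count_bits(unsigned w)
{
    int c = 0;
    while (w) { w &= w - 1; c++; }
    return c;
}

/* adj[i] has bit j set when nodes i and j interfere.
   Returns 1 and fills color[] on success, 0 if the reduction gets stuck. */
int color_graph(const unsigned adj[], int n, int k, int color[])
{
    int stack[MAX_NODES], top = 0, progress = 1;
    unsigned remaining = 0;

    assert(n >= 0 && n <= MAX_NODES && k > 0 && k <= 32);
    for (int i = 0; i < n; i++)
        remaining |= 1u << i;

    /* Reduce: remove any node with fewer than k remaining neighbors. */
    while (remaining != 0 && progress) {
        progress = 0;
        for (int i = 0; i < n; i++)
            if ((remaining >> i & 1u) && count_bits(adj[i] & remaining) < k) {
                remaining &= ~(1u << i);
                stack[top++] = i;
                progress = 1;
            }
    }
    if (remaining != 0)
        return 0;               /* spill code would be needed here */

    /* Assign colors in the reverse of the removal order. */
    for (int i = 0; i < n; i++)
        color[i] = NO_COLOR;
    while (top > 0) {
        int v = stack[--top], c = 0;
        unsigned used = 0;
        for (int j = 0; j < n; j++)
            if ((adj[v] >> j & 1u) && color[j] != NO_COLOR)
                used |= 1u << color[j];
        while (used >> c & 1u)  /* lowest color not used by a neighbor */
            c++;
        color[v] = c;
    }
    return 1;
}
```

Because each node had fewer than k neighbors remaining when it was removed, a free color always exists when it is put back, which is why the reverse order matters.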
3.5.2 Priority-based Coloring
One of the variations of the standard coloring algorithm involves assigning priorities in 
node coloring. This assignment of priorities is based on estimates of the benefits that 
can be derived from allocating individual quantities in registers. Since the costs involved 
in register allocation are taken into account, the algorithm will not over-allocate and will 
execute in linear time.
This approach to graph coloring still assumes that all values have memory locations 
assigned. This is required in case a register does not get assigned to the variable or 
should spill code be required.
Standard graph coloring does not take into account the cost and saving involved in 
allocating variables to registers. The cost refers to the required instructions that perform 
the register-to-memory and memory-to-register operations that put variables in registers 
or make the registers available for other uses. The savings refers to the amount of time 
gained in the execution of the program. Variables occur with different frequencies and 
with varying degrees of clustering, so that the relative benefits of assigning registers to 
variables differ. The standard coloring algorithm always tries to allocate as many items 
in registers as possible and does not recognize the fact that this is sometimes not 
beneficial due to the register load/store costs. Also, the standard coloring algorithm 
does not take into account the loop structure of the program. In practice, variables 
occurring in frequently executed regions should be given greater preference for residing 
in registers.
In order to overcome these problems, cost and saving estimates are generated for each 
program variable. These estimates include the cost of moving variables from memory 
into registers and saving the register values back to memory. The following parameters 
are used:
MOVCOST   The cost of a memory-to-register or register-to-memory
          operation, which in practice is the execution time of the
          move instruction of the target machine.
LODSAVE   The amount of execution time saved for each reference
          of a variable residing in a register compared with the
          corresponding memory reference that is replaced.
STRSAVE   The amount of execution time saved for each definition
          of a variable residing in a register compared with the
          corresponding store to memory being replaced.
Thus, for each variable in the basic block, the potential savings of having the variable in 
a register can be estimated. However, if the current basic block is considered together 
with the preceding and subsequent blocks, the actual saving may increase. This can 
happen if preceding or subsequent blocks use the same variable, and that variable is 
assigned to the same machine register. This might potentially eliminate a series of 
memory-to-register or register-to-memory operations, which would have an impact on the 
savings estimates. To deal with this possibility, the following two separate estimates are 
tracked:
MAXSAVE = LODSAVE * u + STRSAVE * d
MINSAVE = LODSAVE * u + STRSAVE * d - MOVCOST * n
Where: u is the number of uses of the variable,
d is the number of definitions, and 
n is either 0, 1, or 2.
The value of n depends on whether a load is required at the beginning of the basic 
block and/or a store is required at the end. If the variable is loaded into a register 
at the beginning of a block and saved back to memory at the end of the block, n would 
be 2. If the variable is loaded but not changed, and so does not need to be moved back 
to memory, n would be 1. If the variable is already in a register and is not altered, 
n would be 0. The variable might already be in a register from a previous basic block.
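The two estimates translate directly into code. In the sketch below the structure `struct cost_model` and the function names are hypothetical, and the costs are plain integers in whatever time unit the target machine's instruction timings use.

```c
#include <assert.h>

struct cost_model {
    int movcost;    /* MOVCOST: one memory-to-register or register-to-memory move */
    int lodsave;    /* LODSAVE: time saved per use of a register-resident variable */
    int strsave;    /* STRSAVE: time saved per definition */
};

/* MAXSAVE = LODSAVE * u + STRSAVE * d */
int max_save(const struct cost_model *m, int u, int d)
{
    return m->lodsave * u + m->strsave * d;
}

/* MINSAVE = LODSAVE * u + STRSAVE * d - MOVCOST * n, with n in {0, 1, 2} */
int min_save(const struct cost_model *m, int u, int d, int n)
{
    assert(n >= 0 && n <= 2);
    return max_save(m, u, d) - m->movcost * n;
}
```

For instance, with the purely illustrative costs MOVCOST = 2, LODSAVE = 1, and STRSAVE = 1, a variable with three uses and one definition has MAXSAVE = 4, and MINSAVE = 0 when both a load and a store are needed (n = 2), so placing it in a register for that block alone would gain nothing.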
After the estimates are generated, they are weighted by the loop-nesting depths of the 
program. In this manner, variables in loops, particularly inner loops, are given a higher 
priority for register allocation.
Then successive iterations of the coloring algorithm are performed. Each iteration assigns 
one live range to a register by choosing the most promising live range according to the 
cost and saving estimates computed over that live range. By assigning the live ranges 
with the highest priority first, registers are allocated where they will have the most impact 
on the overall execution time of the program. The algorithm terminates when either all 
live ranges have been allocated, or all registers have been assigned over all basic blocks. 
By using the priority-based register assignment and potentially terminating before a 
complete coloring is obtained, the computation time does not deteriorate when a 
complete coloring cannot be achieved.
Chapter 4
Register Allocation and Optimization Techniques
This chapter outlines the register allocation and optimization techniques that were 
investigated for the Intel 80x86 architecture. The general concepts for existing register 
allocation and optimization techniques are described in the literature and summarized in 
Chapter 3. A detailed description of how each general concept was specifically applied 
is included in this chapter. A copy of the code and any additional specific information 
related to each method is located in a respective appendix.
Not all register allocation techniques were evaluated through direct implementation. 
Two methods, usage counts and graph coloring, were investigated but not implemented. 
The results of those evaluations are also outlined in this chapter.
4.1 Single-Use Register Allocation
The most basic and simplest form of allocating registers during code generation is 
single-use register allocation. Single-use register allocation means that the value or values are 
placed in registers as required for the instruction and saved back to memory after the 
instruction. No attempt to retain values in registers across multiple instructions or track 
register contents is made.
The back-end, using single-use register allocation would convert the expression: 
a = b + c
into the following code:
mov ax, word ptr ss:[bp-4]
add ax, word ptr ss:[bp-6]
mov word ptr ss:[bp-2], ax
The advantage of this approach is that there is very little compile-time overhead. Also, 
the construction of the back-end becomes much simpler and correspondingly smaller. The 
primary disadvantage is that the code generated is very inefficient. A large number of 
needless and possibly redundant memory-to-register and register-to-memory operations are 
performed.
For example, after the preceding code has been generated, the value of a is in the AX 
register. If any succeeding statements used the same variable, a would be reloaded from 
memory. If the succeeding statement that used the same variable immediately followed, 
it would be possible to perform peep-hole optimization to remove the redundant load/store.
This approach is most commonly used when the compiler is not being used to produce 
high quality code. Several books for C-Language programming and compiler construction, 
particularly ones that focus on front-end issues, favor this approach for its simplicity [18].
The purpose of implementing this approach was to establish a baseline for comparison with 
the other register allocation schemes. This also helped to establish the back-end model.
Refer to Appendix B for detailed examples and a copy of the source code.
4.2 Usage Counts
Register allocation by usage counts involves the calculation of program variable frequency 
counts. As code is generated, machine registers are assigned to the program variables 
with the highest usage. In general, usage counts are used primarily for local register 
allocation. During this process, a register descriptor must be maintained to track the 
variables that are assigned to registers. When applying the usage counts register allocation 
technique to the Intel 80x86 architecture, some architecture specific issues must be 
addressed.
It would be possible, and with relatively low overhead, to generate the program variable 
usage counts. This information would then be used to assign registers to the variables with 
the highest use counts.
This method alone would be inadequate due to the mandatory register use requirements 
of the instruction set. For example, given the quadtuple:
a = b + c
The instruction set requires that the operand b be assigned to a register, and that the 
result, a, would then reside in the same register. It would be possible for the variables 
a or b to have a relatively low usage in the basic block. If that were the case, and the 
variable did not get assigned to a register in the correct order, the program would not 
work.
The variable b must be moved into a register prior to the instruction regardless of its 
usage count. If all registers are being used, another register must be made available by 
the generation of spill code. Since the variable has a low usage, it might then either 
needlessly use a register or require more spill code to be generated to be able to re-use 
the register.
Additionally, it might be possible for the variable c in the quadtuple above to be used 
a large number of times without being redefined. If the instances of use are in close 
proximity, register assignment would be appropriate. If the instances of use are 
infrequent or more widely distributed, register assignment may not be appropriate 
depending on the number of free registers available.
The generation of spill code to free registers for quadtuples with mandatory register use 
requirements would make the direct application of usage counts tend to generate poor 
code when there are more variables than available registers. This would be true even 
for linear code segments, where usage counts normally generate very high-quality code.
While the direct application of usage counts would be inappropriate for the Intel 80x86 
architecture, it might be possible to combine this approach with another mechanism.
4.3 Graph Coloring
The basic graph coloring algorithm involves creating a register interference graph for each 
procedure in the source program. Interference implies that two variables are live 
simultaneously at some point in the program. A series of optimizations is performed after the 
interference graph is built to reduce the graph so that multiple nodes that do not 
interfere are coalesced into a single node. The resulting graph is then colored in order 
to assign different registers to the interfering variables.
If a coloring cannot be obtained, spill code must be introduced. The interference graph 
is updated as the spill code is generated to see if the resulting graph is colorable. This 
process is continued until a coloring can be obtained. With a small number of registers 
available, it is very unlikely that for non-trivial programs a coloring will be obtained 
without the generation of spill code.
This approach does not take into consideration the mandatory register use requirements 
of the instruction set. Variables that are spilled may be required in a register due to 
the instruction set. If that is the case, additional code must be generated to free the 
required register, load the variable into the register, and potentially save the result back 
to memory. The end result will be an excess of potentially conflicting spill code 
generation.
Another problem with graph coloring is that, due to the overhead required and the nature 
of the coloring algorithm, this approach tends to work best for a target machine with a 
large number of free registers. The more free registers, the better the coloring algorithm 
will be able to assign a maximum of variables to registers. For the Intel architecture, 
there are six registers available for allocation to variables. However, the number of 
registers actually available for allocation is dependent upon the operation or series of 
operations being performed. For example, the word multiply instruction uses a minimum 
of two registers and possibly three registers. Because of this, the number of available 
registers does not remain static, further eroding the potential effectiveness of the coloring 
algorithm.
The coloring algorithm was therefore determined to be inapplicable to the Intel 
architecture, and was not implemented as part of this effort.
4.4 Register Descriptors
The general algorithm for using register descriptors and address descriptors for register 
allocation and code generation is embodied in the getreg() algorithm as presented by Aho, 
Sethi, and Ullman [1]. This section presents information about the specific 
implementation of the getreg() function for the Intel 80x86 architecture.
The function getreg() is used during code-generation to assign registers to variables when 
possible. The dynamic nature of the register assignment makes this approach flexible for 
architecture-specific modifications. As such, instructions that require a specific register 
will be assigned those registers as part of the modified getreg() function. A series of 
support routines is used to perform various memory store and register free operations. 
The modified algorithm and the associated support routines are described in the following 
section.
4.4.1 Algorithm
The modified getreg() function performs the register assignment in an architecture-specific, 
instruction-specific context. To do this, the standard quadtuple (x = y op z) is passed along with 
a flag to indicate which register must be assigned. This might be "any_register" or a 
specific register depending on the operation. The modified algorithm for the Intel 
architecture is as follows:
L = getreg ( quadtuple (x = y op z), which_register )
1) If the variable y is in a register that holds the value of no other names,
and y is not live and has no next use after execution of the quadtuple, then
return the register of y for L. Update the address descriptor of y to
indicate that y is no longer in L.
2) Failing (1), return an empty register for L if there is one. Search the 
registers available for free registers. The search must be performed based 
on the register required and the variable type since reals, characters, and integers 
use different registers.
3) Failing (2), if either: 1) x has a next use in the block, or
2) the operand requires a register,
then free a register (either the register required or one with the least number 
of entities currently assigned). If any register is acceptable, search occupied registers for 
the specific variable type. Generate the required instructions to store the 
selected register to memory. This implies that multiple instructions might 
be generated if the register has multiple variables assigned.
In addition to the getreg() function, a number of support routines are required. These 
routines are used to perform the functions of manipulating the register descriptors, freeing 
registers as required, and generating the memory-to-register/register-to-memory 
instructions.
Additionally, the next-use information must be generated. The next-use information 
consists of a simple live-variable analysis for the basic block. Performing the live-variable 
analysis through a single block without the potential of multiple control-flows is very 
straightforward. These routines are used exclusively by the back-end. Appendix C 
presents a copy of the source code for the getreg() function for reference.
4.4.2 Register Allocation Across Procedure Calls
The getreg() function performs caller-save register allocation for procedure calls. This is 
handled by default since a procedure call would be placed into a separate basic block. 
Since the getreg() function will write to memory, and therefore free, all registers at the 
end of a basic block, no additional effort is required at the start of a procedure. All 
registers are assumed to be available.
It would be possible to change this, and leave values in the registers across a procedure 
call. If the procedure required a register, the register descriptors would indicate the prior 
use. If all registers are used, the appropriate register-to-memory instructions could be 
generated for the register required. Such a scheme would require additional 
compile time.
Previous studies have established [8] that the difference between caller-save and 
callee-save does not justify the additional effort.
4.4.3 Results
The use of the getreg() function produced acceptable results for local register allocation. 
The results were very good when the program had a small number of variables. The 
results were acceptable for programs with a larger number of variables. Specific 
registers must be used for certain machine instructions due to the mandatory register use 
requirements of the architecture.
Unnecessary register-to-memory and memory-to-register operations were generated in 
some situations. A certain amount of register swapping is unavoidable with the small 
number of registers available and the required register use for the instruction set.
Refer to Appendix C for detailed examples and a copy of the source code.
4.5 Directed Acyclic Graph (DAG)
Directed Acyclic Graphs (DAGs) are widely used for common subexpression elimination 
within basic blocks and performing transformations on basic blocks. The basic approach 
to building and applying a DAG is presented by Aho, Sethi, and Ullman [1]. The use 
of a DAG for register allocation is also mentioned by Holub [19] for register assignment 
during expression evaluation. This DAG approach, developed locally [20], primarily 
follows the approach of Aho, Sethi, and Ullman.
The DAG has been applied for the purpose of improving register allocation mechanisms 
in the context of the Intel architecture. A complete DAG is created for the quadtuples 
in a basic block prior to instruction generation. This DAG is then used to heuristically 
reorder the quadtuples. Code is then generated based upon the reordered quadtuples. 
The nature of the heuristic reordering tends to coalesce the quadtuples that share 
common variables. This is directly applicable to register allocation since a variable is 
more likely to be put in a register, and allowed to stay in a register, when the statements 
that use the variable/register are in close proximity. This is especially true for the Intel 
80x86 architecture, due to the mandatory register use requirements for most instructions. 
The proximity of quadtuples or instructions that use the same variables can significantly 
improve overall register allocation and code generation efficiency (see footnote 2) where there is a high 
contention for available registers.
The general processes of building, accessing, and numbering a DAG are outlined in the 
literature. The classic approach to using a DAG includes building the DAG and 
subsequently numbering the DAG nodes in a very specific manner. This numbering is 
performed starting with a root node, traversing downward, left child first, sequentially 
numbering each node. A node is only numbered if all the parents have been previously 
numbered. If one or more of the parents is as yet un-numbered, the downward traversal 
for that branch stops. This process continues until all the nodes are numbered. This 
numbering, in reverse, is used to re-order the nodes. The new ordering for the nodes 
is more optimal since the algorithm, when possible, tends to coalesce the nodes that share 
common variables. This would allow a more efficient register assignment and code 
generation for the Intel architecture.
The classic approach to creating and accessing a DAG typically includes a parent field, 
where the parent field is used to refer to a linked list of parent nodes. This linked list 
of parents is used to verify the parent numbering status during the DAG node numbering 
process. Each parent is checked, via the linked list, for its node numbering status. When 
an un-numbered parent is found, node traversal down that branch of the DAG is halted.
The DAG node description typically includes the operand. In practical terms, the 
operand is already stored in the symbol table, and as such the structure need only contain 
a pointer to the appropriate location in the symbol table.
Footnote 2: This assumes some type of dynamic register allocation mechanism (such as the 
getreg() function previously described).
For this application of a DAG, the basic DAG build and all DAG accesses can be 
performed without using a linked list of parents. The structure contains a use_count 
instead of a list of parent pointers. The use count is a simple integer counter that is 
used to indicate the number of parents. The count is very easily set or incremented 
during the DAG build. This counter is used instead of a linked list of parents. The 
algorithm is more efficient and easier to implement without the additional complexity 
of a linked list of parent nodes.
A node might be defined as follows:
struct typical_node {
    struct symtab *operand;
    int operator;
    struct typical_node *left;
    struct typical_node *right;
    int node_order;
    int use_count;
};
A linked list of root nodes is maintained as part of the node building. This is required 
for this application since a basic block can have multiple independent sets of quadtuples. 
The list of root nodes is used to track the multiple DAGs that result from the multiple 
independent quadtuples. Refer to Section 4.5.1 for an example.
Only the non-leaf nodes are numbered after the DAG is built for this application. The 
non-leaf nodes represent the result of a quadtuple and leaf nodes represent quadtuple 
operands for register allocation purposes. A non-leaf node can be an operand to another 
quadtuple.
The numbering is performed starting with a root node, traversing downward, left child 
first, sequentially numbering each node. A node is numbered only if all the parents 
have been previously numbered. If one or more of the parents is as yet un-numbered, 
the downward traversal for that branch stops. The status of the parent node or nodes 
is checked with the use_count counter which is part of the node structure. If the 
use_count is 1, then there is only one parent un-numbered or unaccounted for, which 
would be the parent performing the test. While traversing the tree, when a use_count 
of greater than 1 is encountered, downward traversal stops, and the use_count is 
decremented indicating that this parent, one of the potentially multiple parents, has been 
numbered. This method is used instead of checking parents via a linked list.
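The numbering rule described above can be sketched in C as follows. This is only a minimal sketch under stated assumptions, not the thesis implementation: the function names number_node and number_dag are hypothetical, the roots are passed as an array rather than the linked list used in the thesis, and a root node is assumed to have a use_count of 0.

```c
#include <stddef.h>

struct symtab;                      /* opaque here; only pointers are used */

struct typical_node {
    struct symtab *operand;
    int operator;
    struct typical_node *left;
    struct typical_node *right;
    int node_order;
    int use_count;                  /* number of parents (0 for a root) */
};

/* Number the non-leaf nodes reachable from n, left child first.
 * *next holds the next number to hand out. */
static void number_node(struct typical_node *n, int *next)
{
    if (n == NULL || (n->left == NULL && n->right == NULL))
        return;                     /* leaf nodes are not numbered */
    if (n->use_count > 1) {         /* another parent is still un-numbered */
        n->use_count--;             /* account for this parent and stop */
        return;
    }
    n->node_order = (*next)++;
    number_node(n->left, next);
    number_node(n->right, next);
}

/* Number every DAG in the block; returns how many nodes were numbered.
 * roots/nroots is a hypothetical array form of the roots list. */
static int number_dag(struct typical_node **roots, size_t nroots)
{
    int next = 1;
    size_t i;

    for (i = 0; i < nroots; i++)
        number_node(roots[i], &next);
    return next - 1;
}
```

The use_count is consumed by the traversal, which is acceptable here because the DAG is deleted once the quadtuples have been re-ordered.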
The following is the basic algorithm developed for building, numbering, re-ordering, 
deleting the DAG, and generating code:
roots_list = NULL;
while ( block != NULL )
    for ( block_start to block_end )
        makenode ( tuple, roots_list );
    number_tuples ( roots_list );
    reorder_tuples ( block, roots_list );
    delete_dag ( roots_list );
    generate_code ( block );
end_while;
Figure 4, DAG Algorithm
This algorithm is specific for the purpose of re-ordering quadtuples within the basic block 
prior to code generation.
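The re-ordering step can be sketched as a sort on the assigned node order. This is an illustrative sketch only: the struct quad record and the function names are hypothetical, and emitting the quadtuples in descending node order is an assumption inferred from the example in Section 4.5.1 rather than a statement of how reorder_tuples is actually written.

```c
#include <stdlib.h>

/* Hypothetical, simplified quadtuple: result = left + right. */
struct quad {
    char result;
    char left;
    char right;
    int  order;     /* node_order assigned by the DAG numbering */
};

/* qsort comparison: highest node order first. */
static int by_order_desc(const void *pa, const void *pb)
{
    const struct quad *a = pa;
    const struct quad *b = pb;

    return (b->order > a->order) - (b->order < a->order);
}

/* Re-order the quadtuples of one basic block in place. */
static void reorder_quads(struct quad *q, size_t n)
{
    qsort(q, n, sizeof *q, by_order_desc);
}
```

Because the node orders within a block are unique, the result does not depend on whether the sort is stable.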
4.5.1 Example
This example is provided in order to illustrate the potential advantages of using a DAG 
for reordering quadtuples prior to register allocation. This example demonstrates the 
use_count and node numbering as described. Assuming the following set of quadtuples:
e = x + y
c = a + b
j = e + l
n = o + p
d = b + e
i = j + k
h = d + e
f = c + d
The DAG that would be generated is as follows:
roots → f, h, i, n
Figure 5, DAG Example
When traversing the DAG to determine the ordering, the use_counts (the number inside 
the node) provide the information regarding the number of parents. For example, in 
Figure 5 as the traversing progresses from the (d , +) node, the 2 would indicate an un­
numbered parent. The 2 would be decremented to 1, and traversal down that branch 
would be discontinued. As traversal downward from the (h , +) node progresses, the 
use_count for the (d , +) would have already been changed to 1, and the node would 
be appropriately numbered.
The node numbering that is generated is indicated by the numbers to the left on the 
outside of the node. According to this numbering, the re-ordered quadtuples would be 
as follows:
n = o + p
e = x + y
j = e + l
i = j + k
d = b + e
h = d + e
c = a + b
f = c + d
This new ordering would tend to generate more efficient code due to the improved 
register allocation possibilities. This is especially true when there are a limited number 
of registers available.
4.5.2 Results
The DAG is not used directly for register allocation. Instead, the DAG is applied prior 
to code generation to re-order the quadtuples. The re-ordered quadtuples should then 
lend themselves to better register allocation due to the proximity of quadtuples that have 
common variables.
The DAG tended to have little or no impact on programs with numerous small basic 
blocks. Small basic blocks tend to be generated for a class of control-oriented programs. 
Such programs might include I/O processing (where little or no data processing is 
performed). This is because small basic blocks tend to have fewer variables, and the 
getreg() function was able to assign variables to registers regardless of the order. For very 
small blocks (three to five quadtuples), there are not enough quadtuples to allow 
significant re-ordering.
A parameter, MAX_DAG_SIZE, was implemented to address this issue. If the basic 
block had MAX_DAG_SIZE or fewer quadtuples, the DAG build was not performed, and 
the root pointer was set to NULL.
The DAG tended to improve the register allocation only for larger blocks that contained 
a series of quadtuples. This might occur in programs that evaluated a formula or 
performed a series of calculations within a single basic block. The following program is 
used to demonstrate the results of the DAG approach.
UNLV Language Compiler
 1: { Example Program for DAG Re-ordering }
 2:
 3: program one
 4:
 5: var a, b, c, d, e, f, h, i: integer;
 6: var j, k, l, n, o, p, x, y: integer;
 7:
 8: begin
 9:   writeln ("tstl -- test DAG ");
10:
11:   e = x + y
12:   c = a + b
13:   j = e + l
14:   n = o + p
15:   d = b + e
16:   i = j + k
17:   h = d + e
18:   f = c + d
19:
20: end.
This program is used as an example and, due to the un-initialized variables, will not have 
meaningful results. The intermediate quadtuples that are generated for the primary basic
block are listed below. For space considerations, only the quadtuples for the basic block 
of interest are shown. The Ord N represents the new order as assigned by the DAG.
unoptimized tuples (main), 32 tuples.
IM_ADD    x(1/0)     y(1/0)   t001(0/0)   000000   Ord 14
IM_STORE  t001(0/0)  NULL     e(1/0)      000000   Ord 13
IM_ADD    a(1/0)     b(1/0)   t002(0/0)   000000   Ord 4
IM_STORE  t002(0/0)  NULL     c(1/0)      000000   Ord 3
IM_ADD    e(1/0)     l(1/0)   t003(0/0)   000000   Ord 12
IM_STORE  t003(0/0)  NULL     j(1/0)      000000   Ord 11
IM_ADD    o(1/0)     p(1/0)   t004(0/0)   000000   Ord 16
IM_STORE  t004(0/0)  NULL     n(1/0)      000000   Ord 15
IM_ADD    b(1/0)     e(1/0)   t005(0/0)   000000   Ord 8
IM_STORE  t005(0/0)  NULL     d(1/0)      000000   Ord 7
IM_ADD    j(1/0)     k(1/0)   t006(0/0)   000000   Ord 10
IM_STORE  t006(0/0)  NULL     i(1/0)      000000   Ord 9
IM_ADD    d(1/0)     e(1/0)   t007(0/0)   000000   Ord 6
IM_STORE  t007(0/0)  NULL     h(1/0)      000000   Ord 5
IM_ADD    c(1/0)     d(1/0)   t008(0/0)   000000   Ord 2
IM_STORE  t008(0/0)  NULL     f(1/0)      000000   Ord 1
The following set of quadtuples represents the intermediate code after reordering.
optimized tuples (main), 32 tuples.
IM_ADD    o(1/0)     p(1/0)   t004(1/17)  000000   Ord 16
IM_STORE  t004(0/0)  NULL     n(1/0)      000000   Ord 15
IM_ADD    x(1/0)     y(1/0)   t001(1/8)   000000   Ord 14
IM_STORE  t001(0/0)  NULL     e(1/13)     000000   Ord 13
IM_ADD    e(1/19)    l(1/0)   t003(1/14)  000000   Ord 12
IM_STORE  t003(0/0)  NULL     j(1/22)     000000   Ord 11
IM_ADD    j(1/0)     k(1/0)   t006(1/23)  000000   Ord 10
IM_STORE  t006(0/0)  NULL     i(1/0)      000000   Ord 9
IM_ADD    b(1/0)     e(1/25)  t005(1/20)  000000   Ord 8
IM_STORE  t005(0/0)  NULL     d(1/25)     000000   Ord 7
IM_ADD    d(1/28)    e(1/0)   t007(1/26)  000000   Ord 6
IM_STORE  t007(0/0)  NULL     h(1/0)      000000   Ord 5
IM_ADD    a(1/0)     b(1/19)  t002(1/11)  000000   Ord 4
IM_STORE  t002(0/0)  NULL     c(1/28)     000000   Ord 3
IM_ADD    c(1/0)     d(1/0)   t008(1/29)  000000   Ord 2
IM_STORE  t008(0/0)  NULL     f(1/0)      000000   Ord 1
In order to fully demonstrate the DAG approach, the code generated was examined.
A fragment of the code generated by the preceding program is presented with and 
without the DAG optimization. Both use the standard getreg() function as described in 
Section 4.4.
      Standard Code                      DAG Re-Ordered Code
 1    mov ax,word ptr ss:[bp-4]          mov ax,word ptr ss:[bp-8]
 2    add ax,word ptr ss:[bp-2]          add ax,word ptr ss:[bp-6]
 3    mov bx,word ptr ss:[bp-32]         mov bx,word ptr ss:[bp-4]
 4    add bx,word ptr ss:[bp-30]         add bx,word ptr ss:[bp-2]
 5    mov cx,ax                          mov cx,bx
 6    add cx,word ptr ss:[bp-12]         add cx,word ptr ss:[bp-12]
 7    mov dx,word ptr ss:[bp-8]          mov dx,cx
 8    add dx,word ptr ss:[bp-6]          add dx,word ptr ss:[bp-14]
 9    mov di,word ptr ss:[bp-30]         mov di,word ptr ss:[bp-30]
10    add di,ax                          add di,bx
11    mov si,cx                          mov si,di
12    add si,word ptr ss:[bp-14]         add si,bx
13    mov word ptr ss:[bp-24],ax         mov word ptr ss:[bp-10],ax
14    mov ax,di                          mov ax,word ptr ss:[bp-32]
15    add ax,word ptr ss:[bp-24]         add ax,word ptr ss:[bp-30]
16    mov word ptr ss:[bp-20],ax         mov word ptr ss:[bp-28],ax
17    mov ax,bx                          add ax,di
18    add ax,di
The differences between the two programs are bolded. The DAG approach was able 
to save two memory accesses (lines 7 and 12), and one instruction (line 17). However, 
by re-ordering the nodes an additional memory access was required (line 14). As such 
the total amount of improvement for this small example was 31 machine cycles or 
approximately 10%. More complex blocks could potentially yield even better results.
Overall, the DAG approach was able to increase the efficiency of larger, more 
complex basic blocks. In doing so, no negative impacts were produced.
Refer to Appendix D for detailed examples and a copy of the source code.
4.6 Reaching Definitions
One type of variable lifetime analysis is reaching definitions. The use of reaching 
definitions involves the tracking of where a specific variable was last defined before 
reaching a given block. The use of reaching definitions for code optimizations is 
presented by Aho, Sethi, and Ullman [1]. The primary use is for performing 
transformations such as constant folding, dead-code elimination, loop invariant detection, 
code motion, induction variable detection, strength reduction, and induction variable 
elimination.
The use of variable lifetime information can also be directly applied to register allocation 
techniques. Specifically, lifetime information about variables, within the context of the 
program control flow, can be used to determine which variables would be best left in 
registers across multiple basic blocks. This allows a more global approach to register 
allocation. This approach would be easily applied to the various local register allocation 
techniques without significant alteration in the local allocation scheme.
During local register allocation using the getreg() algorithm, the values in registers are 
written to memory at the end of each basic block. This leads to the possibility of 
redundant register-to-memory and memory-to-register operations for some variables. 
However, depending on the data-flow, the second memory-to-register operation may be 
required due to a loop or jump.
The first step is to perform a data-flow analysis to generate a flow-graph. The flow-graph 
is then used to generate the data-flow information. This information is represented in 
the form of the sets in() and out(), which are used to indicate which variables are live going 
into and out of a basic block. Since the out() of a block is the in() of the next block, 
only the out() need be saved. The out() of a basic block represents the variables at the 
end of the basic block that are either generated within the block or enter the block and 
are not killed.
In order to calculate the out(), the in(), gen(), and kill() sets are required. The gen() 
function represents variables that are generated in the block, the in() function represents 
variables coming into the basic block, and the kill() function represents variables that are 
killed in the block. A variable may be left in a register across multiple blocks depending 
on the out() set for the successor block or blocks.
4.6.1 Data Flow Analysis
Program control flow information is required in order to accurately calculate the out() sets. 
The generation of the control-flow information, in the form of a flow-graph, requires a 
separate pass over the quadtuples. The predecessor or successor information may be 
represented, depending on what the flow-graph will be used for. The data-flow 
information necessary for the reaching definitions, as described in the previous section, 
requires predecessor information.
A predecessor flow-graph would look like the following:
Figure 6, Predecessor Flow-Graph
Each node in the graph represents a basic block, and each arrow represents a pointer to 
the predecessor block or blocks. There can potentially be any number of predecessor 
blocks for each node, depending on the control-flow.
The following algorithm was developed in order to efficiently calculate the predecessor 
information. This algorithm is more efficient than the algorithm presented in the 
literature.
for ( each basic block B ) do
    if ( label )
        labels[label_num] = block_num;
for ( each basic block B ) do
    if ( last_statement != any_jump )
        pred[B+1] = block_num;
    if ( last_statement == jump )
        pred[labels[jmp_lbl_num]] = block_num;
    if ( last_statement == cond_jump )
        pred[labels[jmp_lbl_num]] = block_num;
        pred[B+1] = block_num;
endfor;
Figure 7, Predecessor Flow-Graph Algorithm
The predecessor flow-graph is then used to calculate the reaching definitions.
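The two passes of Figure 7 can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the thesis source: struct block, the enum names, the fixed array limits, and the helper add_pred are all hypothetical. Because a block can have any number of predecessors, the sketch appends to a small per-block list rather than assigning a single value.

```c
#define MAX_LABELS 64       /* hypothetical limits */
#define MAX_PREDS  8
#define NONE       (-1)

enum last_stmt { LAST_OTHER, LAST_JUMP, LAST_COND_JUMP };

struct block {
    int label;              /* label at the start of the block, or NONE */
    enum last_stmt last;    /* kind of the last statement in the block */
    int target;             /* label jumped to, if last is a jump */
    int preds[MAX_PREDS];   /* predecessor block numbers */
    int npreds;
};

static void add_pred(struct block *b, int p)
{
    if (b->npreds < MAX_PREDS)
        b->preds[b->npreds++] = p;
}

static void build_preds(struct block *blk, int n)
{
    int labels[MAX_LABELS];
    int i;

    for (i = 0; i < MAX_LABELS; i++)
        labels[i] = NONE;

    /* Pass 1: record which block each label starts. */
    for (i = 0; i < n; i++) {
        blk[i].npreds = 0;
        if (blk[i].label >= 0 && blk[i].label < MAX_LABELS)
            labels[blk[i].label] = i;
    }

    /* Pass 2: fall-through edges and jump edges. */
    for (i = 0; i < n; i++) {
        int t = blk[i].target;

        if (blk[i].last != LAST_JUMP && i + 1 < n)
            add_pred(&blk[i + 1], i);          /* control falls through */
        if (blk[i].last != LAST_OTHER &&
            t >= 0 && t < MAX_LABELS && labels[t] != NONE)
            add_pred(&blk[labels[t]], i);      /* jump or taken branch */
    }
}
```

Both passes are linear in the number of blocks, since a jump target is resolved by a direct index into the label table.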
4.6.2 Computing the gen() and kill() functions
The gen() and kill() sets for each block must be generated prior to computing the out(). 
The gen() function for a block represents the variables that are generated or defined 
within that block. The kill() function for a block represents the other definitions that are 
killed by a new definition (i.e., gen()) of the same variable. That is, when a variable is 
re-defined, the new definition is said to kill all other definitions of the variable until 
either the end of the program or yet another re-definition.
The gen() and kill() functions for single statements are very straight-forward to calculate. 
The gen() and kill() sets for each statement must be combined for a series of statements. 
A series of statements refers to a basic block for register allocation purposes. The gen() 
and kill() for a cascade of statements is computed as follows:
gen[B] = gen[B2] ∪ ( gen[B1] - kill[B2] )
kill[B] = kill[B2] ∪ ( kill[B1] - gen[B2] )
Each variable definition is numbered in order of occurrence. This number is then used 
as an identifier or index for that specific definition.
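As an illustration, the two cascade formulas can be written over bit masks in which bit k stands for definition number k. This is a sketch only: struct gk and compose are hypothetical names, and a single 32-bit word is assumed to be wide enough for the definitions involved.

```c
#include <stdint.h>

/* gen and kill sets of a statement or statement sequence. */
struct gk {
    uint32_t gen;
    uint32_t kill;
};

/* Combine s1 followed by s2 into the sets for the whole sequence. */
static struct gk compose(struct gk s1, struct gk s2)
{
    struct gk r;

    r.gen  = s2.gen  | (s1.gen  & ~s2.kill);   /* gen[B2] U (gen[B1] - kill[B2]) */
    r.kill = s2.kill | (s1.kill & ~s2.gen);    /* kill[B2] U (kill[B1] - gen[B2]) */
    return r;
}
```

Folding compose over the statements of a basic block, in order, yields the gen() and kill() sets of the block. For two successive definitions of the same variable, the second definition survives in gen and the first ends up in kill.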
4.6.3 Computing the out() function
The in() and out() sets are generated based on the flow-graph information and the gen() 
and kill() information. For example:
in[B] = ∪ out[P]
out[B] = gen[B] ∪ ( in[B] - kill[B] )
Where ∪ out[P] represents the union of the out() sets of all the predecessor blocks. 
The predecessor information is contained in the flow-graph.
These sets are computed iteratively for all blocks in the program according to the 
following algorithm:
for ( each block B ) do
    out[B] = gen[B];
endfor;
change = TRUE;
while ( change )
    change = FALSE;
    for ( each block B ) do
        in[B] = ∪ out[P];
        oldout = out[B];
        out[B] = gen[B] ∪ ( in[B] - kill[B] );
        if ( out[B] != oldout )
            change = TRUE;
    endfor;
endwhile;
Figure 8, Algorithm to Calculate Reaching Definitions
This algorithm will essentially propagate a variable generation or definition across as 
many blocks as it is live (i.e., not killed by another definition) for all possible paths of 
program control-flow.
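The iteration of Figure 8 can be sketched in C as shown below. This is a minimal sketch under stated assumptions: struct rd_block, the fixed predecessor limit, and the use of one 32-bit word per set are hypothetical simplifications, not the thesis data structures.

```c
#include <stdint.h>

#define MAX_PREDS 4             /* hypothetical limit */

struct rd_block {
    uint32_t gen, kill;         /* inputs: bit k = definition k */
    uint32_t in, out;           /* results */
    int preds[MAX_PREDS];       /* predecessor block numbers */
    int npreds;
};

/* Iterate until no out[] set changes; returns the number of passes made. */
static int reaching_definitions(struct rd_block *b, int n)
{
    int i, p, passes = 0, change = 1;

    for (i = 0; i < n; i++)
        b[i].out = b[i].gen;

    while (change) {
        change = 0;
        passes++;
        for (i = 0; i < n; i++) {
            uint32_t in = 0;
            uint32_t oldout = b[i].out;

            for (p = 0; p < b[i].npreds; p++)
                in |= b[b[i].preds[p]].out;     /* in[B] = U out[P] */
            b[i].in  = in;
            b[i].out = b[i].gen | (in & ~b[i].kill);
            if (b[i].out != oldout)
                change = 1;
        }
    }
    return passes;
}
```

The loop terminates because each out[] set can only grow and the number of definitions is finite.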
4.6.4 Set Representation
The sets for in(), out(), gen(), and kill() can be represented with a bit sequence. Each 
bit represents a variable definition. The variable definition numbering process produces 
a unique number for each definition in the program. This number is then used as an 
index into the bit sequence (i.e., bit 1 represents definition 1 and so forth).
Due to the potentially large number of sets required, this will help reduce the amount 
of storage space required to manipulate and save the information. The operations such 
as union and difference can be performed with logical operators. For example, the 
formula:
out[B] = gen[B] ∪ ( in[B] - kill[B] )
can be implemented as follows:
out[B] = gen[B] | ( in[B] & ¬( kill[B] ) )
Where ¬ implies NOT, | implies OR, and & implies AND. The relative speed at which 
the logical operations can be performed helps improve the efficiency of the overall 
algorithm.
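When there are more definitions than bits in a machine word, the same formula can be applied one word at a time. The following is a sketch only; the helper names set_add, set_has, and compute_out are hypothetical, and definitions are numbered from 0 here for simplicity.

```c
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)

/* Add definition number def to a bit-vector set. */
static void set_add(unsigned *set, size_t def)
{
    set[def / WORD_BITS] |= 1u << (def % WORD_BITS);
}

/* Test whether definition number def is in the set. */
static int set_has(const unsigned *set, size_t def)
{
    return (set[def / WORD_BITS] >> (def % WORD_BITS)) & 1u;
}

/* out = gen | (in & ~kill), applied to every word of the vectors. */
static void compute_out(unsigned *out, const unsigned *gen,
                        const unsigned *in, const unsigned *kill,
                        size_t nwords)
{
    size_t w;

    for (w = 0; w < nwords; w++)
        out[w] = gen[w] | (in[w] & ~kill[w]);
}
```

Each word-wide operation handles many definitions at once, which is where the speed advantage over an element-by-element set representation comes from.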
4.6.5 Results
The code generated using the reaching definitions optimization, in almost all cases, tended 
to be worse with the optimization than without.
The reasons for this can be demonstrated with the following example program.
UNLV Language Compiler
 1: { Example Program }
 2:
 3: program aho
 4:
 5: var i, j, m, n : integer;
 6: var a, u1, u2, u3 : integer;
 7:
 8: begin
 9:   i := 1;
10:   j := 1;
11:   u1 := 1;
12:   u2 := 1;
13:   u3 := 1;
14:   m := u1 - 1;
15:   n := m + 2;
16:   a := u1;
17:
18:   while ( i < 4 ) do
19:   begin
20:     j := j + i;
21:     i := i - 1;
22:     if ( j > 4 ) then
23:       a := u2
24:     else
25:       i := u3;
26:   end;
27: end.
The getreg() function assigns registers as required to the variables and/or temporaries for 
the statements in the basic block on lines 9 through 16. All register values would be 
written to memory at the end of the basic block (line 16) as part of the normal getreg() 
strategy. This would free the registers to be used in the next basic block. The values 
required would need to be retrieved from memory as needed for the basic block 
beginning at line 18.
The following code fragment illustrates the normal operation of getreg() for the transition 
from the basic block ending at statement number 16 and the basic block beginning at 
statement 18:
        mov     word ptr ss:[bp-12],ax
        mov     word ptr ss:[bp-10],bx
        mov     word ptr ss:[bp-8],cx
        mov     word ptr ss:[bp-4],dx
        mov     word ptr ss:[bp-2],di
        mov     word ptr ss:[bp-16],si
L000000:
        mov     ax,word ptr ss:[bp-16]
        cmp     ax,4                    ; if stmt
        jl      short @000001
        jmp     L000001
@000001:
        mov     bx,word ptr ss:[bp-6]
        add     bx,1
The register contents for all registers are written to memory at the end of the basic block 
(indicated by the label L000000). The registers are available for the next block (in this 
case the while loop). Assuming that all the values in registers are live beyond the end 
of the basic block, they would be retained in registers as demonstrated in the following 
example.
L000000:
        mov     word ptr ss:[bp-12],ax
        mov     ax,word ptr ss:[bp-6]
        cmp     ax,4                    ; if stmt
        jl      short @000001
        jmp     L000001
@000001:
        add     si,1
However, a problem arises as registers are required in the succeeding basic block³. If the 
value required in a register is already in a register from the previous basic block, then 
a gain has been made. If the value required in a register is not currently in a register, 
not only does the value have to be loaded from memory, but the contents of an existing 
register must be saved to memory in order to free the register.
This completely eliminates the potential gain from retaining a value in a register. The 
register-to-memory operation resulting from a delayed store might now be located inside 
a loop as in the example above. In fact, for loops that use all registers, the loops will 
always contain additional memory-to-register and register-to-memory operations. A loop 
might use all the registers due to a large number of operations inside the loop or the use 
of register intensive instructions (e.g., multiply and divide). The reaching definitions 
optimization made an improvement only for cases where the loop was very small, the 
program had few variables, and the variable in question was in a register from the 
previous basic block.
³ Registers will almost certainly be required due to the mandatory register use for most instructions.
This is due to the fact that a variable is considered live going into a block, even if the 
variable is not used in that block. If a variable is maintained in a register and not used 
in the block, it has very little chance of being allowed to stay in a register due to the 
small number of available registers and the mandatory register use of the instruction set.
This caused the code generated using the reaching definitions global register allocation 
optimization to be of a very poor quality. The code quality degraded to that of 
single-use register allocation due to the loading of loops with needless memory-to-register 
and register-to-memory operations.
Refer to Appendix E for detailed examples and a copy of the source code.
4.7 Live-Variable Analysis
Another type of variable lifetime analysis is live-variable analysis. Live-variable analysis 
is very similar to the reaching definitions in that the end-result is the computation of the 
sets in() and out(). The method of generating the sets differs in that the gen() and kill() 
functions are replaced by the use() and def() functions respectively.
The primary difference between the two methods is that live-variable analysis produces 
a more refined analysis. This is because if a variable is defined, used, and then at some 
point never used again it can be considered dead. The live-variable analysis will 
recognize this. The reaching definitions consider a variable dead only when it is 
re-defined.
The reaching definitions tend to ignore undefined variables. The live-variable analysis, 
with the use() function, handles undefined variables correctly. This advantage is expected 
to be short lived in the context of an executable program.
The lifetime information and the program control flow information, are used in a similar 
manner to determine which variables would be best left in registers across multiple basic 
blocks. The data-flow analysis is also performed in a similar manner as described in 
Section 4.6.1. The flow-graph for live-variable analysis requires successor information 
instead of predecessor information. The flow-graph is created differently to encompass 
the successor information.
This live-variable analysis information is also represented in the form of sets which 
provide information about which variables are live going into and out of a basic block. 
During the calculation of the in() and out() sets, the use() and def() sets are required. 
The def() function represents variables that are defined in the block and the use() 
function represents variables that are used in the block. The in() function represents 
variables that are live coming into the basic block. Since the out() of a block is the 
union of the in() sets of its successor blocks, only the in() need be saved. Based on the 
in() sets for a successor block or blocks, a variable may be left in a register at the end 
of the current block.
4.7.1 Data Flow Analysis
The final calculation of the in() sets requires program control flow information. The 
generation of the control-flow information, in the form of a flow-graph, requires a 
separate pass over the quadtuples. The data-flow information required for the 
live-variable analysis, as previously described, requires successor information. A successor 
flow-graph would look like the following:
Figure 9, Successor Flow-Graph
Each node in the graph represents a basic block, and each arrow represents a pointer to 
the successor block or blocks. There will be at most two successor blocks. This fact 
allows for a less complex data structure to represent the successor flow-graph.
In order to generate the successor information, the following algorithm was developed.
for ( each basic block B ) do
    if ( label )
        blks[label] = block_number;
for ( each basic block B ) do
    if ( last_statement == jump )
        succ[B] = blks[label];
    else
        succ[B] = next_block;
        if ( last_statement == cond_jump )
            succ[B] = blks[label];
        endif;
endfor;
Figure 10, Successor Flow-Graph Algorithm
The successor flow-graph can then be used to perform the live-variable analysis.
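Since a block has at most two successors, the successor information fits in a two-element array per block. The following C sketch of Figure 10 rests on that observation; struct sblock, the enum names, and the convention that slot 0 holds the fall-through or unconditional target and slot 1 the conditional-branch target are assumptions made for illustration only.

```c
#define MAX_LABELS 64       /* hypothetical limit */
#define NONE       (-1)

enum last_stmt { LAST_OTHER, LAST_JUMP, LAST_COND_JUMP };

struct sblock {
    int label;              /* label at the start of the block, or NONE */
    enum last_stmt last;    /* kind of the last statement in the block */
    int target;             /* label jumped to, if last is a jump */
    int succ[2];            /* successor block numbers, NONE if unused */
};

static void build_succs(struct sblock *blk, int n)
{
    int blks[MAX_LABELS];
    int i;

    for (i = 0; i < MAX_LABELS; i++)
        blks[i] = NONE;

    /* Pass 1: record which block each label starts. */
    for (i = 0; i < n; i++)
        if (blk[i].label >= 0 && blk[i].label < MAX_LABELS)
            blks[blk[i].label] = i;

    /* Pass 2: fill in the successor slots. */
    for (i = 0; i < n; i++) {
        int t    = blk[i].target;
        int fall = (i + 1 < n) ? i + 1 : NONE;
        int tgt  = NONE;

        if (blk[i].last != LAST_OTHER && t >= 0 && t < MAX_LABELS)
            tgt = blks[t];

        blk[i].succ[0] = (blk[i].last == LAST_JUMP) ? tgt : fall;
        blk[i].succ[1] = (blk[i].last == LAST_COND_JUMP) ? tgt : NONE;
    }
}
```

Keeping the branch target in its own slot means a conditional jump records both the next block and the labelled block.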
4.7.2 Computing the in() function
The in() and out() sets are generated based on the flow-graph information and the use() 
and def() information. For example:
in[B] = use[B] ∪ ( out[B] - def[B] )
out[B] = ∪ in[S]
Where ∪ in[S] represents the union of the in() sets of all successor blocks. The successor 
information is contained in the flow-graph.
These sets are computed iteratively for all blocks in the program according to the 
following algorithm:
for ( each block B ) do
    in[B] = NULL;
endfor;
change = TRUE;
while ( change )
    change = FALSE;
    for ( each block B ) do
        out[B] = ∪ in[S];
        oldin = in[B];
        in[B] = use[B] ∪ ( out[B] - def[B] );
        if ( in[B] != oldin )
            change = TRUE;
    endfor;
endwhile;
Figure 11, Algorithm to Perform Live-Variables Analysis
This algorithm will essentially propagate a variable definition across as many blocks as it 
is live (i.e., not killed by another definition or by lack of use) for all possible paths of 
program control-flow.
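The iteration of Figure 11 can be sketched in C as shown below. This is a minimal sketch under stated assumptions: struct lv_block, the two-slot successor array with -1 for an unused slot, and the use of one 32-bit word per set (bit k = variable k) are hypothetical simplifications rather than the thesis data structures.

```c
#include <stdint.h>

struct lv_block {
    uint32_t use, def;      /* inputs: bit k = variable k */
    uint32_t in, out;       /* results */
    int succ[2];            /* successor block numbers, -1 if unused */
};

/* Iterate until no in[] set changes. */
static void live_variables(struct lv_block *b, int n)
{
    int i, s, change = 1;

    for (i = 0; i < n; i++)
        b[i].in = 0;

    while (change) {
        change = 0;
        for (i = 0; i < n; i++) {
            uint32_t out = 0;
            uint32_t oldin = b[i].in;

            for (s = 0; s < 2; s++)
                if (b[i].succ[s] >= 0)
                    out |= b[b[i].succ[s]].in;  /* out[B] = U in[S] */
            b[i].out = out;
            b[i].in  = b[i].use | (out & ~b[i].def);
            if (b[i].in != oldin)
                change = 1;
        }
    }
}
```

Information flows backward through the flow-graph here, so visiting the blocks in reverse order would typically converge in fewer passes; the forward order of the figure is kept for clarity.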
4.7.3 Results
The results from the live-variable analysis were very similar to the results from the 
reaching definitions. The sets generated from the live-variable analysis were more refined 
than the sets generated from the reaching definitions, but it made little or no difference 
in the code generated.
The problem is related to the fact that variables are considered live going into a block, 
even if they are not used in that block. If a variable is maintained in a register and not 
used in the block, there is a very high chance that the register will be required for other 
operations prior to its eventual use in a successive block. If the register is required, the 
register-to-memory code is generated anyway. Not only does this eliminate the potential 
savings, but the loops tend to become loaded with additional, needless register-to-memory 
and memory-to-register operations. The code generated is significantly worse and almost 
degrades to single-use register allocation.
4.8 Variable-Use Analysis
Although the reaching definitions and live-variable analysis did not generate improved 
code, the basic assumption that some variables can be left in registers across a basic 
block is still valid. A number of cases are found in the code generated with the getreg() 
function where a variable is written to memory and then, in a successive block, the 
variable is read back into a register. The in() and out() sets, as generated, are too 
generalized to be useful for the specific architecture.
Another method of determining which variables can be retained in registers across blocks 
was required in order to address the problems with the reaching definitions and live-variable 
analysis. It was determined by examining the generated code that the variables that 
potentially could be retained in registers across blocks were, in almost all cases, variables 
that were actually used in the successor block. The in() and out() sets specified variables 
that were live, which does not necessarily mean that the variable is actually used.
A new approach, named variable-use analysis, was created and subsequently 
implemented in order to evaluate it. The implementation followed the 
basic strategy as outlined in live-variable analysis. The generation of the data-flow 
information is exactly the same as for the live-variable analysis which is outlined in 
Section 4.7.1.
The primary difference is that instead of the in() function being used during code 
generation, the use() function was used. The use() function, already generated as part 
of the live-variable analysis, contains the information about which variables are actually 
used in a given block. The flow-graph represents the information about the successor block or 
blocks.
At the end of a block, the use() function for the successor block is used to determine 
which variables can be allowed to stay in registers. With the use() function there is a 
reasonable assurance that the variable, since it is used in the successive block, will not need 
to be put back into memory before being accessed.
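The end-of-block decision described above can be sketched in C as follows. This is an illustrative sketch only: the register table layout, the constant NREGS, and the function regs_to_flush are hypothetical, and taking the union of the use() sets when there are two successor blocks is an assumption, not something stated in the text.

```c
#include <stdint.h>

#define NREGS     6         /* hypothetical: ax, bx, cx, dx, di, si */
#define REG_EMPTY (-1)

/* regs[r] is the number of the variable currently held in register r,
 * or REG_EMPTY. use_succ is the use() set of the successor block (or the
 * union of the successors' use() sets), with bit k standing for variable k.
 * Returns a mask of the registers that must be written to memory and
 * freed; the remaining occupied registers keep their values. */
static unsigned regs_to_flush(const int regs[NREGS], uint32_t use_succ)
{
    unsigned flush = 0;
    int r;

    for (r = 0; r < NREGS; r++) {
        if (regs[r] == REG_EMPTY)
            continue;
        if (!(use_succ & (UINT32_C(1) << regs[r])))
            flush |= 1u << r;   /* not used next: store and release */
    }
    return flush;
}
```

For instance, if ax, bx, and cx hold variables 0, 1, and 2 and only variable 0 is used in the successor block, the mask selects bx and cx for storing while ax keeps its value.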
4.8.1 Results
The variable-use analysis tended to have little or no impact for programs with a number 
of small basic blocks. Small basic blocks tend to be generated for a class of 
control-oriented programs. Such programs might include I/O processing (where little or no data 
processing is performed).
The variable-use analysis tended to improve the register allocation only for a series of 
basic blocks that used the same variables. This might occur in programs that evaluated 
a formula or performed a series of calculations within a specific control structure.
The results of the variable-use analysis can be demonstrated with the following example 
program.
UNLV Language Compiler
 1: { Example Program }
 2:
 3: program v_use
 4:
 5: var i, j, a : integer;
 6:
 7: begin
 8:   i := 1;
 9:   j := 5;
10:   a := 0;
11:
12:   while ( i < 4 ) do
13:   begin
14:     i := i + 1;
15:     j := j - 1;
16:     if ( j > 4 )
17:       a := 2
18:     else
19:       a := 3;
20:   end;
21: end.
The following code fragment illustrates the normal operation of getreg() for the transition 
from the basic block ending at statement number 10 and the basic block beginning at 
statement number 12.
        mov     word ptr ss:[bp-6],ax
        mov     word ptr ss:[bp-4],bx
        mov     word ptr ss:[bp-2],cx
L000000:
        mov     ax,word ptr ss:[bp-6]
        cmp     ax,4                    ; if stmt
        jl      short @000001
        jmp     L000001
@000001:
The contents of all registers are written to memory before the start of the next block. 
Then, the next block must obtain values from memory as demonstrated by this code 
fragment.
The variable-use method detects that a variable, currently in a register, is used in the 
successor block, and the value is retained in a register as demonstrated in the following 
code fragment.
        mov     word ptr ss:[bp-4],bx
        mov     word ptr ss:[bp-2],cx
L000000:
        cmp     ax,4                    ; if stmt
        jl      short @000001
        jmp     L000001
@000001:
The value was retained in a register and used in the successive block. Both the 
register-to-memory and memory-to-register operations were saved. Since the second 
memory-to-register operation was in a loop, there is a potential for significant execution time savings.
If the loop requires significant register use, either from a large number of quadtuples or 
register intensive instructions, the value that was retained in a register may need to be 
written to memory. This register-to-memory operation, and possibly another 
memory-to-register operation, will then be performed inside a loop. The loop would then be loaded 
with some additional register-to-memory and memory-to-register operations.
Depending on the register use of the loop, the results of the variable-use global register 
allocation method can be slightly erratic.
4.9 Peephole Optimization
Peephole optimization is applied after code generation. An instruction set specific peephole 
optimizer is applied to a small window or subset of the generated code. The specialized 
peephole optimizer performs basic, architecture specific simplifications. This provides a last 
chance to improve the generated code or remove relatively obvious problems with the final 
assembly code.
A circular buffer was added to the code generator in order to apply the peephole 
optimizations. A specialized print routine was used for the code that the code generator 
produced. The print routine maintained the last N generated lines in a circular buffer. The oldest 
instruction is written to the final output file as new instructions are added. The buffer 
allows the peephole optimizer to be executed on the last N instructions in the buffer. This 
makes it possible to perform any potential updates to the generated instructions while they 
are still in memory.
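A print routine of this kind can be sketched in C as follows. This is a minimal sketch under stated assumptions: the window size, the line length limit, and all of the names (struct peep_buf, peep_emit, and so on) are hypothetical and are not taken from the thesis source.

```c
#include <stdio.h>
#include <string.h>

#define PEEP_WINDOW   4     /* N: hypothetical window size */
#define PEEP_LINE_LEN 80    /* hypothetical maximum line length */

struct peep_buf {
    char  line[PEEP_WINDOW][PEEP_LINE_LEN];
    int   head;             /* index of the oldest buffered line */
    int   count;            /* number of lines currently buffered */
    int   written;          /* lines already written to the output */
    FILE *out;
};

static void peep_init(struct peep_buf *pb, FILE *out)
{
    pb->head = pb->count = pb->written = 0;
    pb->out = out;
}

/* Write the oldest buffered line to the output file and drop it. */
static void peep_write_oldest(struct peep_buf *pb)
{
    fprintf(pb->out, "%s\n", pb->line[pb->head]);
    pb->head = (pb->head + 1) % PEEP_WINDOW;
    pb->count--;
    pb->written++;
}

/* Called by the code generator in place of a direct print. */
static void peep_emit(struct peep_buf *pb, const char *text)
{
    int tail;

    if (pb->count == PEEP_WINDOW)
        peep_write_oldest(pb);          /* make room for the new line */
    tail = (pb->head + pb->count) % PEEP_WINDOW;
    strncpy(pb->line[tail], text, PEEP_LINE_LEN - 1);
    pb->line[tail][PEEP_LINE_LEN - 1] = '\0';
    pb->count++;
    /* a peephole pass over the buffered window would run here */
}

/* Return the k-th buffered line, where k = 0 is the oldest. */
static char *peep_get(struct peep_buf *pb, int k)
{
    return pb->line[(pb->head + k) % PEEP_WINDOW];
}

/* Write out everything still buffered, e.g. at end of code generation. */
static void peep_flush(struct peep_buf *pb)
{
    while (pb->count > 0)
        peep_write_oldest(pb);
}
```

Because the buffered lines are still plain strings in memory, an optimizer can rewrite or delete them through peep_get before they reach the output file.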
4.9.1 Optimizations
The peephole optimizer performs a series of architecture specific optimizations. These 
include the use of machine idioms, algebraic simplifications, and redundant-instruction 
elimination. The peephole optimizations are limited to basic blocks, but this does not affect 
most of the peephole optimizations being performed.
An attempt is made to use efficient instructions as part of the code generation process. 
This includes the use of short jumps4 where possible. Instructions that may be sequential 
in the final assembly code are often generated in different parts of the code generator. For 
example, if an instruction requires a register, the getregQ function will be called. Spill code 
instructions might be generated if a register must be made available by the getregQ function, 
prior to the original instruction being produced by the back-end.
During the peephole optimization pass, the assembly instructions are reviewed for a set of 
possible optimizations. These include the algebraic simplifications of addition or 
subtraction of 0 or 1 and multiplication by 0 or 1. In such cases the instruction can be 
either removed or altered. Multiplication by 0 can be replaced by setting the operand to 
zero. Multiplication of an operand by a power of two is replaced with a corresponding 
shift instruction.
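The algebraic rules above can be sketched in C. This is an illustration only: it models an instruction abstractly as an operation with a constant operand, whereas the optimizer described here works on generated assembly text. The type struct insn, the enum op values, and the function names are hypothetical.

```c
#include <stdio.h>

enum op { OP_ADD, OP_SUB, OP_MUL, OP_INC, OP_DEC, OP_XOR, OP_SHL, OP_NONE };

struct insn {
    enum op op;     /* operation performed on the destination operand */
    int has_imm;    /* nonzero if the source operand is a constant */
    int imm;        /* the constant, or the shift count for OP_SHL */
};

/* Return log2(v) if v is a power of two greater than 1, otherwise -1. */
static int shift_count(int v)
{
    int n = 0;

    if (v < 2 || (v & (v - 1)) != 0)
        return -1;
    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Rewrite *in in place.  OP_NONE means "remove this instruction";
 * OP_XOR stands for xor-ing the destination with itself. */
void simplify(struct insn *in)
{
    int n;

    if (!in->has_imm)
        return;
    switch (in->op) {
    case OP_ADD:
        if (in->imm == 0)
            in->op = OP_NONE;                       /* x + 0: remove */
        else if (in->imm == 1) {
            in->op = OP_INC;                        /* x + 1: inc */
            in->has_imm = 0;
        }
        break;
    case OP_SUB:
        if (in->imm == 0)
            in->op = OP_NONE;                       /* x - 0: remove */
        else if (in->imm == 1) {
            in->op = OP_DEC;                        /* x - 1: dec */
            in->has_imm = 0;
        }
        break;
    case OP_MUL:
        if (in->imm == 0) {
            in->op = OP_XOR;                        /* x * 0: zero it */
            in->has_imm = 0;
        } else if (in->imm == 1)
            in->op = OP_NONE;                       /* x * 1: remove */
        else if ((n = shift_count(in->imm)) > 0) {
            in->op = OP_SHL;                        /* x * 2^n: shift */
            in->imm = n;
        }
        break;
    default:
        break;
    }
}
```

A multiplication by a constant that is not a power of two, such as 6, is left unchanged by this sketch.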
A check for redundant register-to-memory and memory-to-register operations is also 
performed. Given the way the getreg() function works, it is unlikely that a redundant 
load/store will be generated within a basic block. A redundant load/store across blocks 
will not be removed, since the peephole cannot span multiple blocks. Because a block may 
be reached from multiple other blocks, whether the load/store is redundant depends on the 
program control flow. The variable-use analysis, as described in Section 4.8, addresses 
this issue.
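A check of this kind can be sketched in C as a comparison of two adjacent instructions in the window. The sketch assumes both lines are in the same basic block with no label between them, and that operands are formatted identically (for example "mov word ptr ss:[bp-24],ax" followed by "mov ax,word ptr ss:[bp-24]"). The function names parse_mov and redundant_move are hypothetical and are not taken from the compiler.

```c
#include <stdio.h>
#include <string.h>

/* Split "mov dst,src" into its two operands.  Returns 1 on success,
 * 0 if the line is not a mov or an operand does not fit in len bytes. */
static int parse_mov(const char *line, char *dst, char *src, size_t len)
{
    const char *comma;
    size_t n;

    if (strncmp(line, "mov ", 4) != 0)
        return 0;
    line += 4;
    comma = strchr(line, ',');
    if (comma == NULL)
        return 0;
    n = (size_t)(comma - line);
    if (n >= len || strlen(comma + 1) >= len)
        return 0;
    memcpy(dst, line, n);
    dst[n] = '\0';
    strcpy(src, comma + 1);
    return 1;
}

/* Return 1 if 'second' only moves back the value that 'first' just
 * moved (a store followed by a reload of the same location, or the
 * reverse), so that 'second' can be removed. */
int redundant_move(const char *first, const char *second)
{
    char d1[48], s1[48], d2[48], s2[48];

    if (!parse_mov(first, d1, s1, sizeof d1) ||
        !parse_mov(second, d2, s2, sizeof d2))
        return 0;
    return strcmp(d1, s2) == 0 && strcmp(s1, d2) == 0;
}
```

Because the comparison is purely textual, two lines that differ only in spacing would not be matched; a real implementation would need to normalize operands first.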
4.9.2 Results
The peephole optimizer was effective for the specific types of optimizations addressed. 
For some programs, depending on the code, the overall execution speed can be improved 
significantly. However, the number of optimizations that the peephole optimizer can 
address is limited.
4 A short jump uses a displacement as opposed to a specific address. For the Intel architecture, the 
displacement jump is faster and has a smaller op-code.
The following example program illustrates the operation of the peephole optimizer. Since 
it is unlikely that most of these operations would be performed as shown here, this program 
is an example only.
UNLV Language Compiler
   1: program peep
   2:
   3: var a,b,c,d : integer
   4:
   5: begin
   6:
   7:    a := 1;
   8:    b := 0;
   9:    c := 2;
  10:
  11:    d := c * 0;
  12:    c := a * 1;
  13:    a := c + 0;
  14:    c := a + 1;
  15:    a := c - 1;
  16:    c := a - 0;
  17:    a := c * 2;
  18:    c := a * 4;
  19:
  20: end.
The peephole optimizer will recognize the multiplication by 1 and remove the instruction. 
Multiplication by 0 will be replaced with an xor instruction (see footnote 5). Addition or 
subtraction of 1 will be replaced with an inc or dec instruction, respectively. For addition 
or subtraction of 0, the instruction will be removed.
5 The xor instruction is the fastest method of setting an operand to zero.
The following code fragment illustrates the generated code as optimized by the peephole
optimizer.
Standard Code                       Peephole Optimized Code

mov ax,1                            mov ax,1
mov bx,0                            xor bx,bx
mov cx,2                            mov cx,2
mov di,ax                           mov di,ax
mov ax,cx                           mov ax,cx
mov bx,0                            xor bx,bx
imul bx
mov word ptr ss:[bp-8],di           mov word ptr ss:[bp-8],di
mov di,ax                           mov di,ax
mov ax,word ptr ss:[bp-8]           mov ax,word ptr ss:[bp-8]
mov bx,1                            mov bx,1
imul bx
mov word ptr ss:[bp-4],ax           mov word ptr ss:[bp-4],ax
add ax,0
mov word ptr ss:[bp-8],ax           mov word ptr ss:[bp-8],ax
add ax,1                            inc ax
mov word ptr ss:[bp-4],ax           mov word ptr ss:[bp-4],ax
sub ax,1                            dec ax
mov word ptr ss:[bp-8],ax           mov word ptr ss:[bp-8],ax
sub ax,0
mov word ptr ss:[bp-2],di           mov word ptr ss:[bp-2],di
mov di,ax                           mov di,ax
mov bx,2                            mov bx,2
imul bx                             shl ax,1
mov word ptr ss:[bp-4],di           mov word ptr ss:[bp-4],di
mov di,ax                           mov di,ax
mov bx,4                            mov bx,4
imul bx                             shl ax,2
For clarity, blank lines have been added where instructions have been removed; there are 
no blank lines in the final assembly file. A significant improvement has been made in the 
execution time of the program. A gain of this magnitude is not expected for typical 
programs.
Overall, the peephole optimizer will perform architecture specific optimizations on the 
generated code. The program, depending on how it is written, might not present any 
instances for performing such optimizations. When these optimizations can be made, there 
will always be an improvement in execution time.
4.10 Other Optimizations
There is a series of architecture-independent optimizations that can be performed on the 
quadtuples in addition to the architecture-specific register allocation mechanisms and 
optimizations [1][2][21][22]. These optimizations include common subexpression 
elimination, copy propagation, loop optimizations, code motion, and induction-variable 
strength reduction. These optimizations will indirectly affect register allocation and 
directly affect overall program execution time.
Such optimizations can be combined with some of the register allocation techniques 
previously described. Many intermediate code optimizations require the flow-graph 
information. The flow-graph can be created once, and used for a series of optimizations. 
This would help reduce the overall optimization overhead required during compilation.
Chapter 5
Conclusion
The goal of this thesis was to categorize the register allocation and optimization 
techniques and the related implementation issues for the Intel architecture. The 
techniques investigated included usage counts, graph coloring, directed acyclic graphs, 
register descriptors, various lifetime analysis techniques, and peephole optimizations. 
These techniques were investigated and categorized as to their effectiveness for the Intel 
architecture.
Of the register allocation and optimization techniques investigated, the customized 
directed acyclic graph, register descriptors, customized variable-use analysis, and peephole 
optimization were the most effective for the given architecture. The final code generation 
and register allocation strategy for the Intel architecture is as follows:
[Figure 12, Final Code Generation and Register Allocation Strategy: Intermediate Code -> 
Code Optimization (DAG, Variable-use) -> Code Generation (Register Descriptors) -> 
Final Optimization (Peephole Optimizer) -> Assembly Code]
The usage counts, graph coloring, reaching definitions, and live-variable analysis 
techniques were not effective for the Intel architecture. This was due to the limited 
number of registers available and the mandatory register use requirements of the 
instruction set.
An additional observation from this effort is that the register use by the instruction set 
is as important as the number of registers available. This is especially true for target 
machines with a relatively small number of available registers. In the literature, when 
the results of register allocation techniques are categorized, the only machine-dependent 
issue considered is the number of registers available. Categorization for the Intel 
architecture based only on the number of free registers available is misleading.
Bibliography
[1] Aho, A., Sethi, R., Ullman, J. 1985 Compilers: Principles, Techniques, and Tools. 
Addison-Wesley Publishing Company.
[2] Fischer, C., LeBlanc, R. 1988 Crafting a Compiler. The Benjamin/Cummings 
Publishing Company.
[3] Young, R. 1991 An implementation of the "UNLV2-Version-2" Programming 
Language.
[4] McCauley, Daniel. 1989 Design and Implementation of a Procedural Language 
"UNLV2" with Code Generation and Optimization. Master's Thesis, University of 
Nevada, Las Vegas.
[5] Intel iAPX 86/88, 186/188 User's Manual. 1985 Intel Corporation Literature 
Department.
[6] Intel iAPX 286 Programmer's Reference Manual. 1985 Intel Corporation Literature 
Department.
[7] Duncan, R., Petzold, C., Baker, M., Schulman, A., Davis, S., Nelson, R., Moote, 
R. 1990 Extending DOS. Addison-Wesley Publishing Company.
[8] McKusick, M. 1984 "Register Allocation and Data Conversion in Machine 
Independent Code Generators". Ph.D. Dissertation, University of California, 
Berkeley.
[9] Freiburghouse, R. 1974 "Register allocation via usage counts". Communications 
of the ACM 17:11, pages 638-642.
[10] Belady, L.A. 1966 "A study of replacement algorithms for a virtual storage 
computer". IBM Systems Journal 5:2, pages 78-101.
[11] Lowry, E.S., Medlock, C.W. 1969 "Object Code Optimizations". 
Communications of the ACM 12, pages 13-22.
[12] Marill, T. 1962 "Computational chains and the simplification of computer 
programs". IEEE Transactions on Electronic Computers EC-11:2, pages 173-180.
[13] Chaitin, G., Auslander, M., Chandra, A., Cocke, J., Hopkins, M., and Markstein, 
P. 1981 "Register Allocation Via Coloring". Computer Languages, V6, pages 47-57.
[14] Chaitin, G. 1982 "Register Allocation & Spilling Via Graph Coloring". ACM 
SIGPLAN Notices 17 (Proceedings of the SIGPLAN 82 Symposium on Compiler 
Construction), pages 201-207.
[15] Chow, F. and Hennessy, J. 1984 "Register Allocation by Priority-based Coloring". 
ACM SIGPLAN Notices 19 (Proceedings of the ACM SIGPLAN 84 Symposium on 
Compiler Construction), pages 222-232.
[16] Briggs, P., Cooper, K., Kennedy, K., Torczon, L. 1989 "Coloring Heuristics for 
Register Allocation". Computer Languages, pages 275-284.
[17] Gupta, R., Soffa, M., Steele, T. 1989 "Register Allocation Via Clique Separators". 
Computer Languages, pages 264-274.
[18] Hendrix, J. 1990 A Small C Compiler. M&T Books, M&T Publishing Company.
[19] Holub, A. 1990 Compiler Design in C. Prentice-Hall Publishing Company.
[20] Condit, A., Maichle, B., Yfantis, A. 1991 Personal Conversations.
[21] Lombardo, J., Yfantis, A. 1991 "Loop Optimization Based on Strength Reduction 
and Code Motion of Exponential Functions". ISMM International Conference, 
Computer Applications in Design and Simulation Analysis. March 19-21, 1991 Las 
Vegas, Nevada.
[22] Lombardo, J. 1991 "Implicit Recognition of Parallelism by Compiler 
Optimization". ISMM International Conference, Computer Applications in Design 
and Simulation Analysis. March 19-21, 1991 Las Vegas, Nevada.
Appendix A
Intel 80x86 Architecture Overview
This appendix presents a brief overview of the Intel 80x86 architecture. The instruction 
set and address translation mechanisms are unique to the Intel architecture.
This overview includes only the base architecture and not some of the processor specific 
extensions. The processor specific extensions are not addressed by the compiler. The 
processor specific extensions refer to the architecture enhancements for alternate memory 
addressing mechanisms (i.e., virtual memory).
One reason the processor extensions are not addressed by the compiler is that the 
80286 extensions are not compatible with the 80386 extensions. Supporting them would 
require significantly different code to be generated based on the processor being used. 
Additionally, the generated code would not be portable across different machines.
The primary reason the processor-specific extensions are not used is that the operating 
system (i.e., DOS) does not use them. As a result, code written using the processor 
extensions is unable to call operating system functions to perform system interaction 
(e.g., file or terminal input/output).
Register Summary:
There are 14 registers for the base Intel 80x86 architecture. The registers are as follows:
AX      Accumulator
BX      Base Register (only register allowing indexed addressing)
CX      Count Register
DX
SI, DI  Index Registers (for indexed mode addressing)
CS      Code Segment Register
DS      Data Segment Register
SS      Stack Segment Register
ES      Extra Segment Register
IP      Instruction Pointer
BP      Base Pointer
SP      Stack Pointer
Flag    Flag Register
The data registers (AX, BX, CX, and DX) and the index registers are available for 
general use. However, there are some restrictions. The BX, SI, and DI registers are the 
only registers that can perform indirect addressing.
The segment registers are used for address translation and are generally not available 
otherwise.
The SP and BP registers are used to access the stack.
Instruction Summary:
The following is a brief overview of the instructions that perform arithmetic operations. 
These are presented to illustrate the instruction set and the mandatory register use 
requirements.
ADD R, M        R = R + M
ADD M, R        M = M + R
ADD Ri, Rj      Ri = Ri + Rj
ADD M, M        Illegal
SUB R, M        R = R - M
SUB M, R        M = M - R
SUB Ri, Rj      Ri = Ri - Rj
SUB M, M        Illegal
MUL M           AX/DX = AX * M
MUL R           AX/DX = AX * R
DIV M           AX = AX / M
DIV R           AX = AX / R
Since most memory-to-memory operations are illegal, a register must be assigned to at 
least one operand. This is true for almost all instructions. Some instructions, such as 
multiply, require the AX and DX registers.
Addressing Modes Summary:
The following is an overview of the addressing modes.
Direct (contents of a variable or register).
Immediate (immediate value).
Indirect (contents of a specified address)
Displacement (contents of a specified address plus a fixed or constant 
displacement).
Indexed (contents of a specified address plus a variable displacement).
The indirect, displacement, and indexed addressing modes can only be performed with 
certain registers.
MOV R, [BX]         Indirect mode legal with BX, SI, DI only.
MOV [BX], R         Indirect mode legal with BX, SI, DI only.
MOV R, [BX][SI]     Indexed mode legal with BX and SI/DI only.
MOV [BX][SI], R     Indexed mode legal with BX and SI/DI only.
MOV M, M            Illegal
For indexed memory operations, the SI or DI register (typically used as an offset from 
a base) is added to the BX register (typically used as a base) to form the final offset. 
The offset is used during address translation.
Address Translation:
Address translation is how the final physical addresses are generated. Memory is 
logically divided into work areas, or memory segments. Information in a segment is then 
accessed as an offset into the segment. This requires two registers, one for the segment 
address and another for the offset within that segment.
This allows the architecture to access 2^20 bytes (1 MB) of memory with 16-bit registers. 
The formula used to perform address translation is:
physical address = ( segment register * 16 ) + offset
The multiplication is performed using a bit shift operation for speed. The address 
translation is performed by the hardware automatically for all memory accesses. There is 
no way to circumvent address translation.
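The translation formula can be worked through in C as a short sketch. The function name physical_address is hypothetical. The final mask to 20 bits is an assumption that models the 20 address lines of the original 8086, on which a sum above 1 MB wraps around.

```c
#include <stdio.h>

/* Combine a 16-bit segment value and a 16-bit offset into a 20-bit
 * physical address: (segment * 16) + offset, with the multiplication
 * done as a left shift by 4 bits. */
unsigned long physical_address(unsigned int segment, unsigned int offset)
{
    unsigned long seg = segment & 0xFFFFUL;
    unsigned long off = offset & 0xFFFFUL;

    return ((seg << 4) + off) & 0xFFFFFUL;  /* keep 20 address bits */
}
```

For example, segment 1234h with offset 0010h gives 12340h + 0010h = 12350h. Many different segment and offset pairs name the same physical address, since segments overlap every 16 bytes.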
Appendix B
Single-Use Register Allocation
This appendix presents some examples and the source code for the single-use register 
allocation.
The examples consist of the list file and the final generated assembly language file for 
each example. The source code follows the examples.
B.1 Examples
The following are example programs along with the assembly language source files.
UNLV Language Compiler
   1: { Example Program }
   2:
   3: program fib
   4:
   5: var a,b,c,d,e: integer;
   6:
   7: begin
   8:    writeln ("fibonaci series");
   9:    a := 1;
  10:    b := 1;
  11:    c := 1;
  12:    e := 0;
  13:
  14:    while ((c < 30) and (e = 0)) do
  15:    begin
  16:       if (a < 0) then
  17:       begin
  18:          writeln ("overflow.");
  19:          e := 1;
  20:       end
  21:       else
  22:          write (a);
  23:       d := a + b;
  24:       a := b;
  25:       b := d;
  26:       c := c + 1;
  27:       if (e = 1) or (c mod 10 = 0) then
  28:          writeln (" ")
  29:       else
  30:          write (", ");
  31:    end;
  32: end.
_TEXT   segment byte public 'CODE'
        assume  cs:_TEXT, ds:_DATA, ss:_DATA
        extrn   __rdint:near
        extrn   __rdflt:near
        extrn   __wrint:near
        extrn   __wrflt:near
        extrn   __wrstr:near
        extrn   __wreol:near
fib     proc    far
        finit
        push    ds
        xor     ax,ax
        push    ax
        mov     ax,_DATA
        mov     ds,ax
        mov     ss,ax
        mov     sp,_DATA-_STACK
        push    ax              ; null display reg
        push    ax              ; return address
        push    bp              ; save activation record link
        mov     bp,sp
        mov     ds:_disp+0,bp
        sub     sp,38           ; allocate space for local vars
        mov     word ptr ss:[bp-12],offset ds:_lit+0
        push    word ptr ss:[bp-12]
        push    ds:_disp+0
        call    __wrstr
        add     sp,4
        push    ds:_disp+0
        call    __wreol
        add     sp,2
        mov     word ptr ss:[bp-10],1
        mov     word ptr ss:[bp-8],1
        mov     word ptr ss:[bp-6],1
        mov     word ptr ss:[bp-2],0
L000000:
        mov     ax,word ptr ss:[bp-6]
        cmp     ax,30
        jnl     short @000006
        mov     word ptr ss:[bp-14],1
        jmp     short @000007
@000006:
        mov     word ptr ss:[bp-14],0
@000007:
        mov     ax,word ptr ss:[bp-2]
        cmp     ax,0
        jne     short @000008
        mov     word ptr ss:[bp-16],1
        jmp     short @000009
@000008:
        mov     word ptr ss:[bp-16],0
@000009:
        mov     ax,word ptr ss:[bp-14]
        and     ax,word ptr ss:[bp-16]
        mov     word ptr ss:[bp-18],ax
        mov     ax,word ptr ss:[bp-18]
        or      ax,ax
        jnz     @000001
        jmp     L000001
@000001:
        mov     ax,word ptr ss:[bp-10]
        cmp     ax,0            ; if stmt
        jl      short @000002
        jmp     L000002
@000002:
        mov     word ptr ss:[bp-22],offset ds:_lit+16
        push    word ptr ss:[bp-22]
        push    ds:_disp+0
        call    __wrstr
        add     sp,4
        push    ds:_disp+0
        call    __wreol
        add     sp,2
        mov     word ptr ss:[bp-2],1
        jmp     L000003
L000002:
        push    word ptr ss:[bp-10]
        push    ds:_disp+0
        call    __wrint
        add     sp,4
L000003:
        mov     ax,word ptr ss:[bp-10]
        add     ax,word ptr ss:[bp-8]
        mov     word ptr ss:[bp-24],ax
        mov     ax,word ptr ss:[bp-24]
        mov     word ptr ss:[bp-4],ax
        mov     ax,word ptr ss:[bp-8]
        mov     word ptr ss:[bp-10],ax
        mov     ax,word ptr ss:[bp-4]
        mov     word ptr ss:[bp-8],ax
        mov     ax,word ptr ss:[bp-6]
        add     ax,1
        mov     word ptr ss:[bp-26],ax
        mov     ax,word ptr ss:[bp-26]
        mov     word ptr ss:[bp-6],ax
        mov     ax,word ptr ss:[bp-2]
        cmp     ax,1
        jne     short @000010
        mov     word ptr ss:[bp-28],1
        jmp     short @000011
@000010:
        mov     word ptr ss:[bp-28],0
@000011:
        mov     ax,word ptr ss:[bp-6]
        push    bx
        mov     bx,10
        cwd
        idiv    bx
        pop     bx
        mov     ax,dx
        cmp     ax,0
        jne     short @000012
        mov     word ptr ss:[bp-32],1
        jmp     short @000013
@000012:
        mov     word ptr ss:[bp-32],0
@000013:
        mov     ax,word ptr ss:[bp-28]
        or      ax,word ptr ss:[bp-32]
        mov     word ptr ss:[bp-34],ax
        mov     ax,word ptr ss:[bp-34]
        or      ax,ax
        jnz     @000004
        jmp     L000004
@000004:
        mov     word ptr ss:[bp-36],offset ds:_lit+26
        push    word ptr ss:[bp-36]
        push    ds:_disp+0
        call    __wrstr
        add     sp,4
        push    ds:_disp+0
        call    __wreol
        add     sp,2
        jmp     L000005
L000004:
        mov     word ptr ss:[bp-38],offset ds:_lit+28
        push    word ptr ss:[bp-38]
        push    ds:_disp+0
        call    __wrstr
        add     sp,4
L000005:
        jmp     L000000
L000001:
        mov     ah,4ch
        int     21h
fib     endp
_TEXT   ends
_DATA   segment word public 'DATA'
_disp   dw      1 dup(0)
_lit    db      102,105,98,111,110,97,99,105,32,115,101
        db      114,105,101,115,0
        db      111,118,101,114,102,108,111,119,46,0
        db      32,0
        db      44,32,0
_STACK  label   byte
_DATA   ends
_NOUSE  segment word stack 'STACK'
_NOUSE  ends
        end     fib
UNLV Language Compiler
   1: { Example Program }
   2:
   3: program v_use
   4:
   5: var i, j, m, n : integer;
   6: var a,u1,u2,u3: integer
   7:
   8: begin
   9:    m := 1;
  10:    n := 1;
  11:    u1 := 1;
  12:    u2 := 1;
  13:    u3 := 1;
  14:    m := m + n;
  15:    i := m - 1;
  16:    j := n;
  17:    a := u1;
  18:
  19:    while ( i < 4 ) do
  20:    begin
  21:       i := i + 1;
  22:       j := j - 1;
  23:       if ( j > 4 ) then
  24:          a := u2
  25:       else
  26:          i := u3;
  27:    end;
  28: end.
_TEXT   segment byte public 'CODE'
        assume  cs:_TEXT, ds:_DATA, ss:_DATA
        extrn   __rdint:near
        extrn   __rdflt:near
        extrn   __wrint:near
        extrn   __wrflt:near
        extrn   __wrstr:near
        extrn   __wreol:near
v_use   proc    far
        finit
        push    ds
        xor     ax,ax
        push    ax
        mov     ax,_DATA
        mov     ds,ax
        mov     ss,ax
        mov     sp,_DATA-_STACK
        push    ax              ; null display reg
        push    ax              ; return address
        push    bp              ; save activation record link
        mov     bp,sp
        mov     ds:_disp+0,bp
        sub     sp,28           ; allocate space for local vars
        mov     word ptr ss:[bp-12],1
        mov     word ptr ss:[bp-10],1
        mov     word ptr ss:[bp-6],1
        mov     word ptr ss:[bp-4],1
        mov     word ptr ss:[bp-2],1
        mov     ax,word ptr ss:[bp-12]
        add     ax,word ptr ss:[bp-10]
        mov     word ptr ss:[bp-18],ax
        mov     ax,word ptr ss:[bp-18]
        mov     word ptr ss:[bp-12],ax
        mov     ax,word ptr ss:[bp-12]
        sub     ax,1
        mov     word ptr ss:[bp-20],ax
        mov     ax,word ptr ss:[bp-20]
        mov     word ptr ss:[bp-16],ax
        mov     ax,word ptr ss:[bp-10]
        mov     word ptr ss:[bp-14],ax
        mov     ax,word ptr ss:[bp-6]
        mov     word ptr ss:[bp-8],ax
L000000:
        mov     ax,word ptr ss:[bp-16]
        cmp     ax,4            ; if stmt
        jl      short @000001
        jmp     L000001
@000001:
        mov     ax,word ptr ss:[bp-16]
        add     ax,1
        mov     word ptr ss:[bp-24],ax
        mov     ax,word ptr ss:[bp-24]
        mov     word ptr ss:[bp-16],ax
        mov     ax,word ptr ss:[bp-14]
        sub     ax,1
        mov     word ptr ss:[bp-26],ax
        mov     ax,word ptr ss:[bp-26]
        mov     word ptr ss:[bp-14],ax
        mov     ax,word ptr ss:[bp-14]
        cmp     ax,4            ; if stmt
        jg      short @000002
        jmp     L000002
@000002:
        mov     ax,word ptr ss:[bp-4]
        mov     word ptr ss:[bp-8],ax
        jmp     L000003
L000002:
        mov     ax,word ptr ss:[bp-2]
        mov     word ptr ss:[bp-16],ax
L000003:
        jmp     L000000
L000001:
        mov     ah,4ch
        int     21h
v_use   endp
_TEXT   ends
_DATA   segment word public 'DATA'
_disp   dw      1 dup(0)
_STACK  label   byte
_DATA   ends
_NOUSE  segment word stack 'STACK'
_NOUSE  ends
        end     v_use
B.2 Source Code
The following is the source code for the single-use register allocation technique. This is 
the complete back-end. No additional support routines are required. The include files 
are located in Appendix J for reference.
B.2.1 Source Code, Single-Use Register Allocation
#include <stdio.h>
#ifdef __TURBOC__
#include <alloc.h>
#else
#include <malloc.h>
#endif
#include "backend.h"
#include "defs.h"

#include "sym.h"
#include "code.h"
#include "errcodes.h"
#include "opstack.h"

#include "optimize.h"
#include "regs.h"
#include "tuples.h"

extern FILE *asmout;
extern int debug;
extern int gen_code;
extern Sym *display[];
extern Sym *progname;
extern struct reg_desc registers[];
extern struct stack stack[];
extern int stack_size;
extern int temp_labels;
extern int locals;
extern int level;
extern int tuple_index;
extern int max_level;
extern struct basic *blocks;
extern struct basic *block_start;
extern struct tuple tuples[];

struct data_item *data_list = NULL;
struct data_item *data_list_head = NULL;
int data_list_offset = 0;
int last_level = 0;

/*
 * backend() -- generate actual assembly language instructions from
 * the quadtuples...
 */
extern char *reg_str[];
extern char *char_reg_str[];

void backend()
{
    extern char *gen_addr();
    extern char *gen_addr2();
    extern void add2reg();
    extern void allocate_temp_storage();
    extern void free_reg();
    extern void make_segments();
    extern void store_registers();
    extern void print_reg_chain();
    extern void free_ax_dx();
    int x_reg;
    int y_reg;
    int z_reg;
    char x_reg_buf[30];
    char y_reg_buf[30];
    Sym *x_sym = NULL;
    Sym *y_sym = NULL;
    Sym *z_sym = NULL;
    struct opstack *x = NULL;
    struct opstack *y = NULL;
    struct opstack *z = NULL;
    struct reg_desc *work;
    int i;
    int j;
    int count;
    int reg;
    int arg_size = 0;
    int if_stmt;
    struct tuple *cur;

    if (!gen_code)
        return;

    (void) allocate_temp_storage();

    /*
     * generate code with register assignment for each basic block.
     */
    blocks = block_start;
    while (blocks != NULL)
    {
        /*
         * initialize register descriptor table
         */
        for (i = 0; i < NUMREG; i++)
        {
            registers[i].count = 0;
            registers[i].symbol = NULL;
            registers[i].next = NULL;
        }

        for (i = blocks->start; i < blocks->start + blocks->number; i++)
        {
            if (debug)
            {
                fprintf(asmout, "; tuple #%d%s\n", i, i ==
                    blocks->start ? " -- start of basic block" : "");
                printf("\ntuple #%d%s\n", i, i == blocks->start ?
                    " -- start of basic block" : "");
            }
            cur = &tuples[i];
            x = &cur->result;
            y = &cur->op1;
            z = &cur->op2;
            if (x->type & OPSTACK_VAR)
                x_sym = x->value.symtab;
            if (y->type & OPSTACK_VAR)
                y_sym = y->value.symtab;
            if (z->type & OPSTACK_VAR)
                z_sym = z->value.symtab;

            switch (cur->code)
            {
            case IM_AVAL:
                z_sym = z->value.symtab;
                arg_size += z_sym->size;
                if (z->type & OPSTACK_INT)
                    fprintf(asmout, "\tpush\t%s\n", gen_addr(z));
                else if (z->type & OPSTACK_STRING)
                    fprintf(asmout, "\tpush\t%s\n", gen_addr(z));
                else if (z->type & OPSTACK_REAL)
                {
                    fprintf(asmout,
                        "\tpush\tword ptr ss:[bp%+d]\n", z_sym->offset + 2);
                    fprintf(asmout,
                        "\tpush\tword ptr ss:[bp%+d]\n", z_sym->offset);
                }
                break;
            case IM_AREF:
                z_sym = z->value.symtab;
                arg_size += z_sym->size;
                if ((z_sym->type & SYM_TYPE_STRING) == 0)
                {
                    fprintf(asmout, "\tlea\tax,%s\n", gen_addr(z));
                    fprintf(asmout, "\tpush\tax\n");
                }
                else
                    fprintf(asmout, "\tpush\t%s\n", gen_addr(z));
                break;

            case IM_CALLP:
            case IM_CALLF:
                z_sym = z->value.symtab;
                fprintf(asmout, "\tpush\tds:_disp%+d\n", level * 2);
                fprintf(asmout, "\tcall\t%s\n", z_sym->symbol);
                fprintf(asmout, "\tadd\tsp,%d\n", arg_size + SIZE_INT);
                arg_size = 0;
                break;

            case IM_END:
                fprintf(asmout, "_TEXT\tends\n");
                (void) make_segments();
                fprintf(asmout, "\tend\t%s\n", z->value.symtab->symbol);
                break;

            case IM_ENDF:
            case IM_ENDP:
                z_sym = z->value.symtab;
                if (z_sym == progname)
                {
                    fprintf(asmout, "\tmov\tah,4ch\n");
                    fprintf(asmout, "\tint\t21h\n");
                }
                else
                {
                    if (cur->code == IM_ENDF)
                    {
                        if (z_sym->type & SYM_TYPE_REAL)
                        {
                            if (z_sym->reg != R0_REG)
                                fprintf(asmout,
                                    "\tfld\t%s\n", gen_addr2(z_sym));
                        }
                    }
                    if (z_sym->level + 1 != last_level)
                    {
                        /* last variable was indexed off of another display
                           register */
                        fprintf(asmout, "\tmov\tbp,ds:_disp%+d\n",
                            (z_sym->level + 1) * SIZE_INT);
                        last_level = z_sym->level;
                    }
                    fprintf(asmout, "\tmov\tbx,ss:[bp+4]\n");
                    fprintf(asmout, "\tmov\tds:_disp%+d,bx\n", level * 2);
                    fprintf(asmout, "\tmov\tsp,bp\n");
                    fprintf(asmout, "\tpop\tbp\n");
                    fprintf(asmout, "\tret\n");
                }
                fprintf(asmout, "%s\tendp\n", z->value.symtab->symbol);
                break;

            case IM_PROG:
                fprintf(asmout, "_TEXT\tsegment\tbyte public 'CODE'\n");
                fprintf(asmout,
                    "\tassume\tcs:_TEXT, ds:_DATA, ss:_DATA\n");
                fprintf(asmout, "\textrn\t__rdint:near\n");
                fprintf(asmout, "\textrn\t__rdflt:near\n");
                fprintf(asmout, "\textrn\t__wrint:near\n");
                fprintf(asmout, "\textrn\t__wrflt:near\n");
                fprintf(asmout, "\textrn\t__wrstr:near\n");
                fprintf(asmout, "\textrn\t__wreol:near\n");
                break;

            case IM_FUNC:
            case IM_PROC:
                last_level = level;
                z_sym = z->value.symtab;
                if (z_sym == progname)
                {
                    /* main program setup */
                    fprintf(asmout, "%s\tproc\tfar\n", z_sym->symbol);
                    fprintf(asmout, "\tfinit\n");
                    fprintf(asmout, "\tpush\tds\n");
                    fprintf(asmout, "\txor\tax,ax\n");
                    fprintf(asmout, "\tpush\tax\n");
                    fprintf(asmout, "\tmov\tax,_DATA\n");
                    fprintf(asmout, "\tmov\tds,ax\n");
                    fprintf(asmout, "\tmov\tss,ax\n");
                    fprintf(asmout, "\tmov\tsp,_DATA-_STACK\n");
                    fprintf(asmout, "\tpush\tax\t; null display reg\n");
                    fprintf(asmout, "\tpush\tax\t; return address\n");
                }
                else
                    fprintf(asmout, "%s\tproc\tnear\n", z_sym->symbol);
                fprintf(asmout,
                    "\tpush\tbp\t; save activation record link\n");
                fprintf(asmout, "\tmov\tbp,sp\n");
                fprintf(asmout, "\tmov\tds:_disp%+d,bp\n",
                    last_level * SIZE_INT);
                fprintf(asmout,
                        "\tsub\tsp,%d\t; allocate space for local vars\n",
                        stack[level].size);
                break;
            case IM_STORE:
                if (!(x->type & OPSTACK_VAR))
                    error(ERR_INTERR,"IM_STORE result not a variable",
                          NULL,ABORT);
                if (y->type & OPSTACK_CONST)
                {
                    /* special case - load a constant into a variable */
                    x_sym->mem = FALSE;
                    if (x_sym->type & SYM_TYPE_REAL)
                    {
                        x_sym->reg = NO_REG;
                        fprintf(asmout,"\tfld\t%s\n",gen_addr(y));
                        fprintf(asmout,"\tfstp\t%s\n",gen_addr2(x_sym));
                        fprintf(asmout,"\tfwait\n");
                    }
                    else
                    {
                        x_sym->reg = NO_REG;
                        fprintf(asmout,"\tmov\t%s,%s\n",
                                gen_addr2(x_sym),gen_addr(y));
                    }
                    break;
                }
                else
                {
                    if (x_sym->type & SYM_TYPE_REAL)
                    {
                        x_sym->reg = NO_REG;
                        fprintf(asmout,"\tfld\t%s\n",gen_addr(y));
                        fprintf(asmout,"\tfstp\t%s\n",gen_addr2(x_sym));
                        fprintf(asmout,"\tfwait\n");
                    }
                    else
                    {
                        x_sym->reg = NO_REG;
                        fprintf(asmout,"\tmov\tax,%s\n",gen_addr(y));
                        fprintf(asmout,"\tmov\t%s,ax\n",
                                gen_addr2(x_sym));
                    }
                }
                break;
            case IM_F2I:
                fprintf(asmout,"\tfld\t%s\n",gen_addr(y));
                fprintf(asmout,"\tfistp\t%s\n",gen_addr(x));
                fprintf(asmout,"\tfwait\n");
                break;
            case IM_I2F:
                fprintf(asmout,"\tfild\t%s\n",gen_addr(y));
                fprintf(asmout,"\tfstp\t%s\n",gen_addr(x));
                fprintf(asmout,"\tfwait\n");
                break;
            case IM_RDIV:
                fprintf(asmout,"\tfld\t%s\n",gen_addr(y));
                fprintf(asmout,"\tfdiv\t%s\n",gen_addr(z));
                fprintf(asmout,"\tfstp\t%s\n",gen_addr(x));
                fprintf(asmout,"\tfwait\n");
                break;
            case IM_ADD:
            case IM_SUB:
                if (x_sym->type & SYM_TYPE_REAL)
                {
                    /* real addition/subtraction */
                    fprintf(asmout,"\tfld\t%s\n",gen_addr(y));
                    if (cur->code == IM_ADD)
                        fprintf(asmout,"\tfadd\t%s\n",gen_addr(z));
                    else
                        fprintf(asmout,"\tfsub\t%s\n",gen_addr(z));
                    fprintf(asmout,"\tfstp\t%s\n",gen_addr(x));
                    fprintf(asmout,"\tfwait\n");
                }
                else
                {
                    /* integer add/sub */
                    fprintf(asmout,"\tmov\tax,%s\n",gen_addr(y));
                    if (cur->code == IM_ADD)
                        fprintf(asmout,"\tadd\tax,%s\n",gen_addr(z));
                    else
                        fprintf(asmout,"\tsub\tax,%s\n",gen_addr(z));
                    fprintf(asmout,"\tmov\t%s,ax\n",gen_addr(x));
                }
                break;
            case IM_LOR:
            case IM_AND:
                fprintf(asmout,"\tmov\tax,%s\n",gen_addr(y));
                if (cur->code == IM_LOR)
                    fprintf(asmout,"\tor\tax,%s\n",gen_addr(z));
                else
                    fprintf(asmout,"\tand\tax,%s\n",gen_addr(z));
                fprintf(asmout,"\tmov\t%s,ax\n",gen_addr(x));
                break;
            case IM_REQ:
            case IM_RNE:
            case IM_RLE:
            case IM_RLT:
            case IM_RGE:
            case IM_RGT:
                if (tuples[i+1].code == IM_JMPZ)
                {
                    tuples[i+1].code = IM_NOP;
                    if_stmt = TRUE;
                }
                else
                    if_stmt = FALSE;
                /* load y into register */
                fprintf(asmout,"\tmov\tax,%s\n",gen_addr(y));
                strcpy(x_reg_buf,gen_addr(x));
                if (if_stmt)
                {
                    int code;
                    fprintf(asmout,
                            "\tcmp\tax,%s\t; if stmt\n",gen_addr(z));
                    code = cur->code;
                    i++;
                    cur = &tuples[i];
                    if (code == IM_REQ)
                        fprintf(asmout,"\tje\tshort @%6.6d\n",cur->label);
                    else if (code == IM_RNE)
                        fprintf(asmout,"\tjne\tshort @%6.6d\n",
                                cur->label);
                    else if (code == IM_RLT)
                        fprintf(asmout,"\tjl\tshort @%6.6d\n",cur->label);
                    else if (code == IM_RLE)
                        fprintf(asmout,"\tjle\tshort @%6.6d\n",
                                cur->label);
                    else if (code == IM_RGT)
                        fprintf(asmout,"\tjg\tshort @%6.6d\n",cur->label);
                    else if (code == IM_RGE)
                        fprintf(asmout,"\tjge\tshort @%6.6d\n",
                                cur->label);
                    fprintf(asmout,"\tjmp\tL%6.6d\n",cur->label);
                    fprintf(asmout,"@%6.6d:\n",cur->label);
                }
                else
                {
                    fprintf(asmout,"\tcmp\tax,%s\n",gen_addr(z));
                    if (cur->code == IM_REQ)
                        fprintf(asmout,"\tjne\tshort @%6.6d\n",
                                temp_labels);
                    else if (cur->code == IM_RNE)
                        fprintf(asmout,"\tje\tshort @%6.6d\n",
                                temp_labels);
                    else if (cur->code == IM_RLT)
                        fprintf(asmout,"\tjnl\tshort @%6.6d\n",
                                temp_labels);
                    else if (cur->code == IM_RLE)
                        fprintf(asmout,"\tjnle\tshort @%6.6d\n",
                                temp_labels);
                    else if (cur->code == IM_RGT)
                        /* jng: the 8086 mnemonic for "jump if not greater" */
                        fprintf(asmout,"\tjng\tshort @%6.6d\n",
                                temp_labels);
                    else if (cur->code == IM_RGE)
                        fprintf(asmout,"\tjnge\tshort @%6.6d\n",
                                temp_labels);
                    fprintf(asmout,"\tmov\t%s,1\n",x_reg_buf);
                    fprintf(asmout,
                            "\tjmp\tshort @%6.6d\n@%6.6d:\n\tmov\t%s,0\n",
                            temp_labels+1,temp_labels,x_reg_buf);
                    fprintf(asmout,"@%6.6d:\n",temp_labels+1);
                    temp_labels += 2;
                }
                break;
            case IM_EDIV:
            case IM_MOD:
                if (y_sym == NULL)
                    y_reg = NO_REG;
                else
                    y_reg = y_sym->reg;
                (void) free_ax_dx();  /* DX:AX are required */
                strcpy(y_reg_buf,gen_addr(y));
                x_reg = AX_REG;
                x_sym->reg = AX_REG;
                strcpy(x_reg_buf,gen_addr(x));
                if (y_reg != AX_REG)
                    fprintf(asmout,"\tmov\tax,%s\n",y_reg_buf);
                if (z->type & OPSTACK_CONST)
                {
                    /* special case - cannot have immediate mode */
                    if (z->type & OPSTACK_INT)
                    {
                        count = 0x7fff;
                        z_reg = NO_REG;
                        for (j = LOWER_INT_REG; (z_reg == NO_REG) &&
                                                (j <= UPPER_INT_REG); j++)
                        {
                            if ((j != AX_REG) && (j != DX_REG))
                                if (registers[j].count < count)
                                {
                                    count = registers[j].count;
                                    z_reg = j;
                                }
                        }
                        if (z_reg == NO_REG)
                            (void) error(ERR_INTERR,
                                         "could not assign constant to reg.",
                                         NULL,ABORT);
                        fprintf(asmout,"\tpush\t%s\n",reg_str[z_reg]);
                        fprintf(asmout,"\tmov\t%s,%s\n",
                                reg_str[z_reg],gen_addr(z));
                    }
                }
                fprintf(asmout,"\tcwd\n");
                fprintf(asmout,"\tidiv\t%s\n",
                        (z->type & OPSTACK_CONST) ? reg_str[z_reg] :
                                                    gen_addr(z));
                if (cur->code == IM_MOD)
                    x_reg = DX_REG;
                /* build register descriptor - register-based result */
                (void) add2reg(x_reg,x);
                /* free registers for y and z -- if necessary */
                if (debug)
                {
                    (void) print_reg_chain("free_ax_dx free",AX_REG);
                    (void) print_reg_chain("free_ax_dx free",DX_REG);
                }
                if (x_reg == AX_REG)
                {
                    free(registers[DX_REG].next);
                    registers[DX_REG].count = 0;
                    registers[DX_REG].next = NULL;
                }
                else
                {
                    free(registers[AX_REG].next);
                    registers[AX_REG].count = 0;
                    registers[AX_REG].next = NULL;
                }
                if (z->type & OPSTACK_VAR)
                {
                    if ((z->live == FALSE) && (z->nextuse == 0)
                        && (z_sym->reg != NO_REG))
                    {
                        (void) free_reg(z);
                    }
                }
                else if (z->type & OPSTACK_CONST)
                {
                    if (z->type & OPSTACK_INT)
                        fprintf(asmout,"\tpop\t%s\n",reg_str[z_reg]);
                }
                break;
            case IM_MULT:
                if (x_sym->type & SYM_TYPE_REAL)
                {
                    /* real multiplication */
                    fprintf(asmout,"\tfld\t%s\n",gen_addr(y));
                    fprintf(asmout,"\tfmul\t%s\n",gen_addr(z));
                    fprintf(asmout,"\tfstp\t%s\n",gen_addr(x));
                    fprintf(asmout,"\tfwait\n");
                }
                else
                {
                    /* integer multiply */
                    strcpy(y_reg_buf,gen_addr(y));
                    fprintf(asmout,"\tmov\tax,%s\n",y_reg_buf);
                    if (z->type & OPSTACK_CONST)
                    {
                        fprintf(asmout,"\tmov\tbx,%s\n",gen_addr(z));
                        fprintf(asmout,"\timul\tbx\n");
                    }
                    else
                        fprintf(asmout,"\timul\t%s\n",gen_addr(z));
                    fprintf(asmout,"\tmov\t%s,ax\n",gen_addr(x));
                }
                break;
            case IM_LABEL:
                fprintf(asmout,"L%6.6d:\n",cur->label);
                break;
            case IM_JMP:
                fprintf(asmout,"\tjmp\tL%6.6d\n",cur->label);
                break;
            case IM_JMPZ:
                /* make sure y is in a register, currently, we are
                   always the first tuple in a basic block so all
                   registers are free.  !!! *** !!!  this will have
                   to be changed if data-flow analysis is done... */
                if (registers[AX_REG].count != 0)
                    (void) error(ERR_INTERR,
                                 "IM_JMPZ - AX not free",NULL,ABORT);
                fprintf(asmout,"\tmov\tax,%s\n",gen_addr(y));
                fprintf(asmout,"\tor\tax,ax\n");
                fprintf(asmout,"\tjnz\t@%6.6d\n",cur->label);
                fprintf(asmout,"\tjmp\tL%6.6d\n",cur->label);
                fprintf(asmout,"@%6.6d:\n",cur->label);
                break;
            case IM_JMPNZ:
                /* make sure y is in a register, currently, we are
                   always the first tuple in a basic block so all
                   registers are free.  !!! *** !!!  this will have
                   to be changed if data-flow analysis is done... */
                if (registers[AX_REG].count != 0)
                    (void) error(ERR_INTERR,"IM_JMPNZ - AX not free",
                                 NULL,ABORT);
                fprintf(asmout,"\tmov\tax,%s\n",gen_addr(y));
                fprintf(asmout,"\tor\tax,ax\n");
                fprintf(asmout,"\tjz\t@%6.6d\n",cur->label);
                fprintf(asmout,"\tjmp\tL%6.6d\n",cur->label);
                fprintf(asmout,"@%6.6d:\n",cur->label);
                break;
            }
        }
        blocks = blocks->next;
    }
    printf("%d bytes allocated for temporary and local variables.\n",
           stack[level].size);
    printf("%d intermediate code tuples generated.\n",tuple_index);
    tuple_index = 0;
}
/*
 * make_segments() -- generate assembly language statements to
 *                    define the contents of the data and stack
 *                    segments.
 */
void make_segments()
{
    fprintf(asmout,"_DATA\tsegment\tword public 'DATA'\n");
    fprintf(asmout,"_disp\tdw\t%d dup(0)\n",max_level+1);
    data_list = data_list_head;
    if (data_list != NULL)
    {
        fprintf(asmout,"_lit");
        while (data_list != NULL)
        {
            if (debug)
            {
                /* adjacent literals form a single format string */
                printf("make_data_segment: type: %d, offset: "
                       "%d, length: %d",
                       data_list->type,data_list->offset,data_list->length);
                if (data_list->type & DATA_ITEM_STRING)
                    printf(", string: %s\n",data_list->value.string);
                else if (data_list->type & DATA_ITEM_REAL)
                    printf(", real: %f\n",data_list->value.rval);
            }
            if (data_list->type & DATA_ITEM_STRING)
            {
                int i,j;
                char *str;
                fprintf(asmout,"\tdb\t");
                str = data_list->value.string;
                j = strlen(str)+1;
                for (i = 0; i < j; i++)
                {
                    fprintf(asmout,"%d",*(str+i));
                    if ((!(i % 10) && (i != 0)) || (*(str+i) == '\0'))
                    {
                        fprintf(asmout,"\n");
                        if (!(i % 10) && (*(str+i) != '\0'))
                            fprintf(asmout,"\tdb\t");
                    }
                    else
                        fprintf(asmout,",");
                }
            }
            else if (data_list->type & DATA_ITEM_REAL)
                fprintf(asmout,"\tdd\t%f\n",data_list->value.rval);
            data_list = data_list->next;
        }
    }
