The Design of a Two Level Code Generator by Byrne, Michael
 
 
 
http://researchcommons.waikato.ac.nz/ 
 
 
Research Commons at the University of Waikato 
 
Copyright Statement: 
The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand). 
The thesis may be consulted by you, provided you comply with the provisions of the 
Act and the following conditions of use:  
• Any use you make of these documents or images must be for research or private
study purposes only, and you may not make them available to any other person.
• Authors control the copyright of their thesis. You will recognise the author's right
to be identified as the author of the thesis, and due acknowledgement will be
made to the author where appropriate.
• You will obtain the author's permission before publishing any material from the
thesis.
 
The Design of a Two Level Code Generator 
A thesis submitted in partial 
fulfilment of the requirements
for the degree of Master of
Science in Computer Science at
the University of Waikato by
Michael Byrne.
University of Waikato
1987 
Acknowledgments
My thanks go to Keith Hopper for the considerable support and
advice he has provided, and to my family who were patient during
the time I spent on this thesis.
Abstract 
The Rcode intermediate code used in the University of Waikato
Portable Language Implementation Project (PLIP) compiler system
has been designed to represent the source program independently
of the source language and the target machine environment, with
only sufficient structural information to ensure that efficient
target machine code can be represented. Like many such
intermediate codes, significant work is still required to produce
target machine code. This study has investigated the design and
use of a second intermediate code that divides the code generator
into two phases, based on the observation that generation of
target machine code will have many similarities for machines that
are architecturally similar. This code is a Generic Action Set
(GAS) code that represents common architectural features of a set
of machines considered to form a family. The first phase is the
generation of GAS code and its optimisation, and is common for
all machines in the family. The second phase is the generation of
target machine code from GAS code. It has been recognised that
generation of target machine code for machines in the family will
still involve many similarities, but machine idiosyncrasies make
adequate abstraction difficult. However the development of
"fluid" abstractions for machines in the family to assist with
portability of compiler code has been studied, using the clear
definition provided for the GAS family as a basis for the level
of abstraction.
Producing a code generator for a new machine will often involve
very little effort, if one already exists for a similar machine.
Table of Contents
Acknowledgments
Abstract
Chapter 1
Introduction
    Rcode intermediate code
    Thesis objective
Chapter 2
Survey of portability approaches
    Intermediate codes
    Diana - an intermediate code for Ada
    Rcode - PLIP first level intermediate code
    Linear intermediate codes
    Portable target code generation
    Table driven code generation
    Hand coded code generators
    Code generator generators
    Other literature
Chapter 3
GAS code: a second intermediate code
    Justification for GAS code
    GAS code structure
    Subroutine calling
Chapter 4
Three address/Stack GAS machine
    GAS machine and interpretation
    Family definition for prototype design
    Families and constraints
    GAS code for VAX family
    Characteristics of family
    Temporaries
    Addressing modes
    Data types supported
    Memory structure
    Code labels and addresses
    Runtime structures
    Subroutine calling convention
    Input and output
    System virtual calls
    Interrupts
    Machine state
    Logical shifts
    GAS instructions
    GAS operands
    Compiler architecture
    Generation of GAS code from Rcode
    Dynamic and oversized objects
    Addition/Subtraction of oversized integers
    Multiplication of oversized integers
    Implementing control structures
Chapter 5
GAS code optimisation
    GAS code optimisations
    Static evaluation
    Other global optimisations
    Identification of variables
    Bitstrings in the optimiser
    Pointers effect on data flow analysis
    Loop optimisation
    Local code block optimisation
    Representing optimiser database information
    GAS code and basic blocks
Chapter 6
Target machine code generator
    Storage allocation
    Handling operating system intrinsics
    Register allocation and assignment and instruction selection
    Register stores before instruction
    Selection of operand locations
    Generating binary
    Dumping registers
    Index registers and register allocation
    Allocating space to dump registers
    Saving resources used by routine
    Parameters of subroutine
    Handling GAS calling convention
    Jump/Call displacement fixup
    References to global data objects
    VAX abstracted resource database and generation procedures
    Target machine code optimisation
Chapter 7
Conclusions
    Introduction
    GAS code generation
    Future research and development
Appendix A
    Machine Family No 1 GAS code Reference Manual
Appendix B
    Machine Family No 1 GAS code Users Guide
Appendix C
    Pcode Reference Manual
Bibliography
Chapter One 
Introduction 
This chapter discusses the portability concepts involved in the
University of Waikato Portable Language Implementation Project
(PLIP), and then describes the objectives of this thesis. The
PLIP project involves the development of a portable compiler
system. The aim is to develop a compiler system that can be
easily modified to provide a compiler for any source language
for any machine and operating system. Such a compiler system
arose out of the need at the University of Waikato for a standard
implementation of various programming languages on a range of
target machines and operating systems. Typically students are
faced with implementations of programming languages on different
machines that can differ in semantics and even syntax, and which
offer a variety of directives. The development environment of
editors, debuggers, subroutine libraries and linkers can also
differ significantly. The solution of providing standard language
implementations on all machines would only be practical if
compilers could be easily ported. To allow a standard development
environment, the project has been extended to the development of
a portable linker and editor/debugger system. This editor project
involves more than the development of a typical text editor; it
involves a "generic" editor concept. Information on the portable
linker and generic editor projects is provided in references [1]
and [2]. For each language for which a compiler is developed, a
standard runtime library will be developed. Currently a standard
Modula-2 runtime library has been defined, with implementations
produced for the Digital VAX/VMS [3] and UNIX [4] operating
systems.
The PLIP project requirements are also the requirements for
development of portable software generally. Therefore the project
has assumed wider significance in terms of an implementation
study of portability. The combined tools will provide the ideal
facilities for the development of software packages. The Modula-2
compiler implementation and runtime library provide a tool for
the development of portable systems software such as data
communications software, and the compiler system and generic
editor themselves.
Portability of the compiler is achieved by:
a) Use of the Modula-2 language as implemented by the
portable compiler system itself.
b) Use of the standard Modula-2 runtime library produced for
the PLIP project for facilities such as file access and
concurrency.
c) Use of a parser generator to develop a parser and
semantic analyser for any given source language as easily as
possible.
d) Division of the compiler into a front end which is
target machine independent, and a back end code generator. The
interface between the front and back ends is a target machine and
source language independent intermediate code. This concept is
central to the success of the project.
e) A module architecture that organizes machine and
operating system dependence into a few modules.
Rcode intermediate code 
The key element for portability is the intermediate code used to
separate the front and back ends of the compiler. It provides the
starting point for the subject of this thesis: the compiler
backend. The intermediate code, called Rcode, is oriented
towards the basic low level operations required, such as adding
two byte integers, but also allows the front end to communicate
semantic structural information to the back end that will assist
the back end to produce efficient machine code. The basic low
level operations of Rcode are not machine specific. The objects
manipulated are typical of target machine objects such as signed
or unsigned integers, reals, booleans or bitstrings but are not
specific to any target machine. The objects manipulated in Rcode
are not source language objects such as array elements or record
fields. Semantic structural information provided includes:
a) Flow control logic
b) Code blocks, procedures and associated lexical nesting
information
c) Identification of independent global storage areas
d) Identification of local variables of procedures and code
blocks
f) Identification of subroutine calling components such as
parameters and results
g) Expression structure via a tree format
These are all provided in a source language independent form. The
equivalent structures for a given source language must be mapped
by the front end into the intermediate code structures.
The Rcode basic low level operations are "tree" based. The
operands of operations can be subtrees of operations that
return operand values. For example in figure 1, the top ADD has
an operand which will be the sum of objects "A" and "B" and a
second operand which is the product of "X+Y" and "C". This
structure, which is similar to a typical parse tree, provides
structural information which could not be provided by a linear
code. Consider the representation of A+B+(X+Y)*C in first linear
form:
    ADD A,B,Temp_1
    ADD X,Y,Temp_2
    MUL C,Temp_2,Temp_3
    ADD Temp_1,Temp_3,Temp_4
then tree form:
                ADD
               /   \
              /     \
           ADD       MULTIPLY
          /   \      /     \
         /     \    /       \
        A       B  C        ADD
                           /   \
                          /     \
                         X       Y
The tree form represents directly the expression structure.
Efficient target machine code generation is assisted by the
availability of information on the semantic structure of a
program as provided by Rcode.
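The relationship between the two forms can be illustrated with a
short sketch. The Python below is not part of the PLIP system; the
node class, the linearise function and the Temp_n naming are
assumptions made purely for illustration. It walks the tree for
A+B+(X+Y)*C and emits three address linear code with explicit
temporaries.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Op:
    """Hypothetical tree node: an operator with two operand subtrees."""
    name: str
    left: "Node"
    right: "Node"


Node = Union[Op, str]  # a leaf is simply the name of an object


def linearise(tree: Node) -> List[str]:
    """Post-order walk emitting one three address instruction per operator."""
    code: List[str] = []
    counter = 0

    def walk(node: Node) -> str:
        nonlocal counter
        if isinstance(node, str):
            return node                      # leaf: use the object directly
        left = walk(node.left)
        right = walk(node.right)
        counter += 1
        temp = f"Temp_{counter}"             # temporary introduced as required
        code.append(f"{node.name} {left},{right},{temp}")
        return temp

    walk(tree)
    return code


tree = Op("ADD", Op("ADD", "A", "B"), Op("MUL", "C", Op("ADD", "X", "Y")))
linear = linearise(tree)
for line in linear:
    print(line)
```

The walk recovers the linear form from the tree, but the reverse
direction would require the structure to be rediscovered from the
temporaries, which is the information a linear code does not carry
directly.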
Rcode represents semantic structures that are commonly available
in a wide range of languages. It does not represent semantic
information that is specific to very few languages, or which does
not contribute to code generation. In particular, Rcode data
types are simple machine level objects. Higher level source
language objects, such as records or arrays, are not identified.
This is because these data types do not contribute much to code
generation and also there are significant variations between
source languages on rules governing their data types.
A discussion of intermediate languages is included in chapter
two.
Thesis Objective 
The central aim has been to design a portable back end for the
compiler that produces efficient target machine code. In
particular a prototype design has been undertaken to develop a
back end for the Digital Corporation VAX. A more general mechanism
for a portable back end will be developed at a later stage using
the experience gained in the prototype design and development,
but this is beyond the scope of this thesis.
The design for the backend prototype has been based on the
following realisations and assumptions:
a) That from Rcode to efficient target machine code is a
large step.
b) There is much involved in this process that is not target
machine dependent.
c) Many machines for which backends will be produced will
have similar architectures.
The Rcode to efficient target machine code step will involve
things such as linearisation of Rcode, introducing temporary
variables as required, optimisation, calling convention
strategies, operand addressing mode selection, and register
allocation. Steps such as linearisation and temporary variable
declaration, many optimisations and calling convention handling
are not particularly target machine dependent, but depend more on
the general architecture of the machine. Machines with similar
architectures will involve much that is similar in these steps.
Similarity of architecture should therefore be a significant
factor in the design of a portable backend.
Chapter Two 
Survey of Portability Approaches
This chapter will discuss further the concept of portability of
software via a portable language system, will survey previous
techniques used to achieve portability of compilers and in
particular, techniques used to develop portable code generator
backends for compilers, and will contrast these to the approach
taken in the PLIP project.
Portability of software depends heavily on the availability of a
programming language for which compilers hopefully exist on all
target machines. However several problems exist in finding such a
language:
a) Semantics of the language and even the syntax accepted
frequently vary from one compiler to another in
annoying subtle ways.
b) Pragmas available are not standard.
c) Access to system resources such as files and timers is
not standard, or is provided in only a limited form.
Access to system resources should be provided in a machine
independent conceptualised manner. Cobol provides many of the
system resource access facilities required for commercial
applications, but for applications such as writing compilers,
very few of the required facilities are provided in the language.
This problem is often solved by the development of a standard
runtime library that abstracts the system resources in a non
machine-dependent manner. However, the library implementation
must itself be portable. The main difficulty is that the
implementation will at various points have to have access to
information on the target machine, operating system and call
system libraries. The solution adopted for the PLIP project is to
ensure that all target machine and operating system information
is kept in separate modules, and the compiler front end
intercepts calls to the operating system and generates
appropriate calling code for the host library involved. It is
important to separate information on the target machine and
operating system, because a compiler may be ported to run under
more than one operating system on a given machine. A Modula-2
standard runtime library has been developed as part of the PLIP
project that provides such things as file manipulation and access
facilities, timers, delays and alarms, concurrency and process
management, process synchronisation, interprocess communication,
exception handling, and access to arguments passed to a program.
The first and second problems above can only be effectively
solved by porting a standard compiler to all target machines.
This must be carefully done to ensure that the semantics remain
the same regardless of such things as different sized data
objects of the host machine. Obviously, ease of porting of the
compiler becomes critical.
Most attempts to develop portable compilers have centred on the
use of an intermediate language. The compiler front end is
designed to be target machine independent and produce an
intermediate code which is target machine and source language
independent. The design of the intermediate code is critical to
the success of this approach. It must allow efficient target code
to be produced, as well as allowing portability.
The problem then becomes one of trying to find a way of easily
developing code generators for a range of machines that accept
the intermediate code as input and produce target machine code.
One approach is simply to provide a code generator or even
runtime interpreter for each target machine. An example of a
runtime interpreter is the P-Code system [13]. A second approach
is to represent the machine and operating system dependency of
the back-end by representing relevant characteristics of the
target machine and operating system by tables of data, or a mix
of tables and procedure calls. One code generator algorithm for
all machines uses this table of information. A third approach is
to provide a code generator generator that requires a formal
machine description and which will generate a code generator
program. A survey of these approaches is provided by Ganapathi et
al [12].
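The second, table driven approach can be sketched as follows. This
is only an illustration under stated assumptions: the two machine
tables, the mnemonics (add3, add2, move and so on) and the generate
function are hypothetical and do not describe any real instruction
set or any of the systems surveyed. The point shown is that one
generator algorithm serves every machine and only the table is
replaced.

```python
from typing import Dict, List, Tuple

# Hypothetical machine description tables: operation -> instruction templates.
# Neither table describes a real instruction set.
THREE_ADDRESS_MACHINE: Dict[str, List[str]] = {
    "ADD": ["add3 {a},{b},{dst}"],
    "MUL": ["mul3 {a},{b},{dst}"],
}
TWO_ADDRESS_MACHINE: Dict[str, List[str]] = {
    "ADD": ["move {a},{dst}", "add2 {b},{dst}"],
    "MUL": ["move {a},{dst}", "mul2 {b},{dst}"],
}

Instruction = Tuple[str, str, str, str]  # (operation, a, b, dst)


def generate(code: List[Instruction], table: Dict[str, List[str]]) -> List[str]:
    """One generator algorithm for all machines; only the table changes."""
    output: List[str] = []
    for op, a, b, dst in code:
        for template in table[op]:
            output.append(template.format(a=a, b=b, dst=dst))
    return output


# Assumes dst never names the same object as b, so the move is safe.
program: List[Instruction] = [("ADD", "X", "Y", "T1"), ("MUL", "C", "T1", "T2")]
three_addr = generate(program, THREE_ADDRESS_MACHINE)
two_addr = generate(program, TWO_ADDRESS_MACHINE)
```

A realistic table would also have to describe registers, addressing
modes and operand constraints, which is where the mix of tables and
procedure calls mentioned above becomes necessary.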
Intermediate Codes 
Several intermediate codes have been developed in attempts to
achieve portability of compilers. Often they are designed for
different environments. Some are designed to allow a given
language to be ported to a range of host machines, others to
allow a range of languages to be ported to a given machine, and
some to allow a range of languages to be ported to a range of
hosts. The first environment would be typical of an organisation
interested in a particular language. An example is the Diana [5]
intermediate code. The second approach is typical of machine
vendors. Many machine vendors have developed, or have attempted
to develop, a house standard intermediate code for use by several
front ends and a single code generator. The third is typical of
software houses that produce compilers and equipment vendors that
sell a range of machines and operating systems. The Uncol [6]
project intermediate code is an example of an intermediate code
designed for such an environment. The PLIP project has shown that
a single intermediate code can be successful for all three
environments, except that the range of languages supported is
restricted to a range or family of source languages rather than
all languages. The Rcode language developed in the PLIP project
is an implementation of such an intermediate code. The success of
this language has resulted from a careful consideration of what
needs to be in an intermediate code to allow efficient target
code to be generated.
A major factor affecting intermediate code design is how close it
should be to either the source level or target machine level. The
closer to source language the easier it will be to develop a
front end for a new language, but the greater the effort to
develop a back end for a new target machine, and the reverse if
the code is closer to the target machine.
If the aim is to develop compilers for one language on several
machines it would appeal to place the intermediate code at a low
level so that much of what the compiler does is common. If the
aim is to produce a compiler for several languages for one
machine, it would appeal to place the intermediate language at a
high level. If the aim were to produce compilers for several
source languages on each of several machines, the level of the
intermediate code is less clear. If an intermediate code
optimiser is to be used and efficient code generated then the
decision on the form of the intermediate code becomes more
complex. A target machine independent code representation allows
some target machine independent optimisations to be performed.
More optimisation is possible the closer the representation is to
the target machine. It is the efficient manipulation of target
machine objects that provides a significant source of
optimisation. Saving memory loads and stores of target machine
objects is a prime source. The closer the representation is to
target machine form, the more exactly target machine memory
accesses will be represented. Many optimisations also depend on
information on the semantic structure of a program, such as flow
of control, what variables occupy a given storage area, and the
structure of an offset computation. A low level intermediate
code that does not provide this information will limit
optimisation effectiveness. Optimisation therefore suggests an
intermediate code involving target machine data objects, but
higher level control structure information.
Diana - An intermediate code for Ada
To achieve efficient code generation, the intermediate code must
effectively represent the semantic structures of the source
language that are relevant to efficient code generation in the
back end. If the intermediate code is designed to handle one
language and several target machines then the intermediate code
could be designed to represent all the information generated by
the syntax and semantic analysis of the front end, that is the
parse tree and symbol table. This is the approach taken with
Diana, which is an intermediate code used for Ada. This language
represents all the semantic structures of Ada, including data
type information in detail, Ada concepts of tasks, generics,
modules, subroutines, exception handling, and control structures.
It is an attributed tree-structured intermediate code. Tree nodes
contain attributes that provide semantic information. Often this
semantic information could be computed anyway, but is provided in
a tree node because the front end semantic analyser would have
computed the information and if represented in the intermediate
code, it will not need to be recomputed. The result is that Diana
is a complex intermediate code and complicates the entire
compiler, and other software such as editors and interpreters
that may use the intermediate code. This could perhaps be
justified if more efficient code was produced, but the PLIP
project intermediate code (Rcode) has shown that much of the
information contained in Diana does not contribute significantly
to target machine code. The complexity of Diana resulted because
it is used for other requirements such as the need to generate
source code from Diana.
If the concept of Diana were extended to an intermediate code for
several languages and target machines, representing semantic
structures of all the source languages would lead to a very
complex intermediate language. The back end really becomes a
translator for the union of all features of the source languages
represented in the intermediate code. This would provide very
little gain from the use of an intermediate code. This was one of
the major problems in the Uncol project.
Rcode - The PLIP first level intermediate code
The solution adopted in Rcode is to select those semantic
structures for a family of source languages that really are
important to efficient code generation in the backend. In
developing Rcode it has been determined that for a related
"family" of languages, only a few semantic structures are
important for the generation of efficient target code, and Rcode
is suitable for an environment where multiple source languages,
target machines and operating systems are involved. The
definition of such a family of languages is outside the scope of
this thesis.
The PLIP project has shown the Rcode semantic structures useful
in representing the family of languages that includes Pascal,
Algol, Modula-2, Ada, PL/1, C, Lisp, etc as:
a) Control structures
    Loop
    Counting loop ("for" loop)
    Case
    Not concurrent (sequence)
    Set (optionally concurrent)
    Goto
b) Subroutine and function call mechanisms
c) Expression evaluation and structure
Note that assignment is really just a form of function call
mechanism. Many of the additional semantic constructs of
languages are not relevant to target code generation. A
significant omission above is that source data types are not
considered important. In reality source data types are only
relevant to the user and the front end. The programmer is
concerned with manipulating source language data objects not
target machine objects. Target code is concerned with
manipulation of bits. Machine level instructions will often
perform operations on bits interpreting them as being some form
of representation such as two's complement binary integers. At
the source level the concept of data types usually includes
semantic rules such as preventing mixing of types in arithmetic
expressions, or assigning a real value to an integer. At the
machine level an operation may interpret bits as representing a
certain sized two's complement value; no checks are made whether
this piece of memory is "typed" for this data type. The bits may
have been placed there by an operation involving a different
interpretation of the bits. An exception may occur if the bit
pattern does not represent a value within the machine's capability
(oversized). Thus the source datatype concept is almost totally
unrelated to the manipulation of bits at the machine level. The
Rcode "Basictype" parallels the machine level bit string
interpretation for an operation rather than the source level
"datatype" concept. At the machine level, major issues concern
how to utilise registers efficiently and the addressing of
objects given reasonably complex address computations which are
usually related to accessing objects in arrays and records.
However the concept of records or arrays is not important, only
that an array or record represents that several target machine
objects will have runtime addresses that are related to a common
base address. In Rcode, computation of an address in a complex
data structure is represented by a subtree. This address
structure information can be used to generate efficient target
machine addressing modes. The front end should map source
operations on source data types to intermediate code operations
that involve manipulation of target machine size bit objects.
This would seem to imply replacing the problem of representing
all source types by representing all machine level bitstring
interpretations available. This is not a real problem as target
machine level interpretations are much less complex, and there is
more standardisation than source language data types. Details
such as actual size may differ but not the basic forms such as
signed and unsigned integer, reals and decimal. The Rcode
"basictype" can represent in a uniform way, basic interpretations
and sizes across a wide range of machines. Source languages may
have integers, reals, arrays, enumerated types, strings and
records, and details of the rules governing such objects can vary
greatly. Examples are rules governing semantics such as packing
of arrays and records, and string variables.
The front end must be given access to information on bitstring
interpretations supported by the target machine and will generate
code that involves operations that use these interpretations. It
will determine how objects of arrays and records will be packed
based on alignment requirements for target machine objects, and
the size of objects in memory. In Rcode the interpretations are
represented in a coherent fashion using the "basictype"
definition of types. This is a one byte value used to describe
the type of manipulation such as cardinal, boolean, real and
integer, together with size. It allows a standard method by which
various parts of the compiler can access information on the
interpretations available on the target machine.
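As an illustration of how a one byte value can carry both an
interpretation and a size, consider the sketch below. The bit
layout shown (interpretation in the high four bits, size in bytes
in the low four bits), the numeric codes and the function names
are all assumptions made for this example; the actual Rcode
basictype encoding is not given in this chapter.

```python
from enum import IntEnum
from typing import Tuple


class Kind(IntEnum):
    """Interpretations named in the text; the numeric codes are invented."""
    CARDINAL = 0
    INTEGER = 1
    BOOLEAN = 2
    REAL = 3


def pack_basictype(kind: Kind, size_in_bytes: int) -> int:
    """Pack kind (high nibble) and size (low nibble) into one byte."""
    if not 1 <= size_in_bytes <= 15:
        raise ValueError("size must fit in four bits")
    return (int(kind) << 4) | size_in_bytes


def unpack_basictype(value: int) -> Tuple[Kind, int]:
    """Recover the interpretation and the size from a packed byte."""
    return Kind(value >> 4), value & 0x0F


# A two byte signed integer under this hypothetical layout.
two_byte_integer = pack_basictype(Kind.INTEGER, 2)
kind, size = unpack_basictype(two_byte_integer)
```

Any part of a compiler that receives such a value can test the
interpretation and size without knowing which source language type
produced the operation.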
In Rcode, expression evaluation is provided by the tree
structure. The tree structure provides a clear indication of
expression structure as opposed to linear code. Directed acyclic
graphs would also, of course, allow common subexpressions to be
represented.
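The remark about directed acyclic graphs can be made concrete with
a small sketch. It is not taken from Rcode; the DagBuilder class
and its methods are hypothetical. A node is reused whenever the
same operator is applied to the same operand nodes, so the two
occurrences of X+Y in (X+Y)*(X+Y) share one node.

```python
from typing import Dict, List, Tuple, Union

Key = Tuple[str, int, int]  # (operator, left node id, right node id)


class DagBuilder:
    """Builds a DAG by reusing a node when the same operation recurs."""

    def __init__(self) -> None:
        self.nodes: List[Union[str, Key]] = []
        self.index: Dict[Union[str, Key], int] = {}

    def _intern(self, item: Union[str, Key]) -> int:
        # Return the existing node id, or allocate a new one.
        if item not in self.index:
            self.index[item] = len(self.nodes)
            self.nodes.append(item)
        return self.index[item]

    def leaf(self, name: str) -> int:
        return self._intern(name)

    def op(self, name: str, left: int, right: int) -> int:
        return self._intern((name, left, right))


dag = DagBuilder()
x, y = dag.leaf("X"), dag.leaf("Y")
first = dag.op("ADD", x, y)
second = dag.op("ADD", x, y)            # same subexpression, same node
product = dag.op("MUL", first, second)  # (X+Y)*(X+Y)
```

The sketch ignores commutativity and assignments that change an
operand between the two uses, both of which a real optimiser would
have to consider.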
Rcode includes provision for expressing source type information.
The intention for this is to provide for "symbol file" production
for relevant languages and to provide a symbolic debugger with a
template for interpreting storage in a form more suitable to the
user. The types supported will not cover exactly all types
available in any language, but provide sufficient for a debugger
to provide a reasonable interpretation of the contents of memory.
Linear Intermediate Codes 
An alternative approach to intermediate code is represented by
the "P-code" system, which is a very low level unstructured
linear intermediate code that has:
    Memory which is word and byte oriented, with word
      addresses and byte pointers supported
    Zero address top of stack arithmetic instructions
    Several pointer registers
    Procedure calling mechanism that supports the
      lexical and scope structure of Pascal procedures,
      plus handling of result and local variables
    Block word and byte movement and compare
      instructions that assist with records, arrays and
      strings
    Instructions to allocate and deallocate storage in a
      heap memory area
    Instructions for manipulating bitstrings that
      represent sets, to implement set operations
    Conditional branch instructions as the basic
      mechanism for providing flow control
    Instructions for manipulating boolean, integer,
      real, pointers, scalars (bytes) and strings.
This language is designed as an intermediate code for
implementing Pascal and therefore its design very much reflects
the facilities of Pascal. However it could probably be reasonably
used for a wider range of languages. As P-code is very low level,
most of the Pascal compiler is target machine independent, so the
compiler would be easy to port. Only a P-code to target machine
interpreter or translator would be required. However efficient
code generation will not be easy because very little structural
information is provided. This is a pity as Pascal was designed to
allow efficient target code to be produced. Great effort will be
required to extract structural information such as flow of
control. The subroutine calling will allow effective calling
target code to be generated. However the convention used limits
P-code to source languages that have procedures involving
statically sized variables, arguments and results. The low level
intermediate code approach was never seriously considered as a
viable alternative in the PLIP project.
An improved linear intermediate code that contains semantic
structural information is often used. Typically these codes
consist of three address instructions. Like Rcode, these
instructions will involve operations on machine level data
objects, and semantic structural information is also supplied.
It may be provided in the form of instruction codes such as
"parameter" or "call" which provide structural information for
subroutine calling, and declaration pseudo instructions to
declare procedures or storage areas, and For, Case and Loop
codes. Expression structure may be retained in that an operand
that is computed may actually be indicated in the three address
instruction by a pointer to the instruction in which the operand
was computed. Therefore a tree or even directed graph structure
for instructions may be maintained. However control structures
such as FOR loops and complex nesting of CASE and LOOP structures
can only be provided if codes for marking beginning and end of
loops and case options and for marking FOR index variables are
included in the intermediate code. In this case the linear three
address code carries much of the information provided by Rcode.
However in most modern languages, directed graphs will be
required for common subexpressions. An intermediate code cannot
be written to an external device in post order form and still
retain a directed graph structure. Linear codes solve this
problem by the use of temporary variables. The next chapter will
discuss how this problem can be solved by the introduction of a
second intermediate language.
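The way a linear code keeps a shared subexpression by means of a
temporary can be illustrated with a minimal sketch. The Node class
and the linearise function are hypothetical names chosen for this
illustration and are not part of Rcode or PLIP. A node reachable
from two parents is computed once into a temporary, and both uses
then name that temporary, so the sharing survives in linear form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical expression node: a leaf variable or an operator."""
    op: str             # operator name, or "var" for a leaf
    args: tuple = ()    # operand nodes (empty for a leaf)
    name: str = ""      # variable name, used only by leaves


def linearise(root):
    """Return three address instructions (dest, op, src1, src2) for a DAG."""
    code = []
    place = {}          # id(node) -> name that holds the node's value

    def visit(node):
        key = id(node)
        if key in place:                # shared node: reuse its temporary
            return place[key]
        if node.op == "var":
            place[key] = node.name
        else:
            operands = [visit(child) for child in node.args]
            temp = f"t{len(code) + 1}"  # one new temporary per instruction
            code.append((temp, node.op, *operands))
            place[key] = temp
        return place[key]

    visit(root)
    return code


# (a + b) * ((a + b) - c), with the node for a + b shared by both uses.
a, b, c = Node("var", name="a"), Node("var", name="b"), Node("var", name="c")
shared = Node("add", (a, b))
root = Node("mul", (shared, Node("sub", (shared, c))))
for instruction in linearise(root):
    print(instruction)
```

The addition appears once in the output and the temporary t1 is
named twice, which is the information a pure post order tree dump
would lose.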
An example of a linear intermediate code that contains many
semantic structure pseudo instructions is the intermediate code
used by Anklam et al [7] for the development of a code generator
back end for the Digital VAX. This intermediate code has
instructions that can have a variable number of operands rather
than three address operands. As it is intended for a specific
target machine it is reasonably target machine dependent, most
notably in that the data types supported are designed to map
easily to VAX data types. However the types are generally useful
for a wide range of machines.
An intermediate code designed for implementing compilers for
several languages on one machine may include specific target
machine features. For example, the data manipulation operations
can be specifically target machine codes, addressing modes
specifically those of the target machine, and the registers of
the target machine can be specifically included. An intermediate
code for multiple target machines can't represent target machine
features without becoming very large, or being modified for each
target machine, and therefore the front end. An intermediate
language designed for a single target machine would seem short
sighted in that the compiler front end is likely to be ported to
another machine at some stage, if it is any good.
A group in Denmark [14] have used the idea of two levels of
intermediate code in the development of an ADA compiling
environment.
Portable Target Code Generation
Table Driven Code Generators
The table driven approach of representing information on the
machine instruction set, data types supported, and architecture
generally, can become very complicated if the idiosyncrasies of
the various target machines are to be accounted for, or
alternatively only a limited representation of target machines is
used so that inefficient code is generated. The table is used to
make decisions during such tasks as assignment of addresses and
offsets, allocation of registers, selection of instructions and
subroutine call construction. Information such as data object
sizes supported for various operations, availability of two and
three address instructions, the requirement of some instructions
to use specific registers and the various sizes of offsets for
branch instructions must be represented. As the code generator
performs these tasks it will need to maintain a database on the
location of objects in memory (variables, routines, constants,
labels) and the allocation of registers. This resource database
can be referred to as a machine resource allocation database.
This database must be capable of representing the resources of a
wide range of machines. Both the table and database must be able
to handle the various memory structures and addressing modes of
the intended target machines.
The major difficulty of the table driven approach is that the
table which can represent a wide range of machines will be many
times that required for one machine alone. This is because for
any machine there can be factors important to generation of
efficient machine code that are not present on other machines. If
efficient code is to be produced it will be necessary for the
table to represent the sum of factors important to all intended
target machines. The wider the number of intended target
machines, the larger and more complex the table would be. If the
intended target machines can be limited to a family of similar
machines then the table can obviously be kept significantly
smaller.
For any given machine many of the characteristics represented in
the table will not be relevant as they represent information
relevant to other machines. However the code generator will still
include the logic for processing this information in case it
encounters a machine for which this information is important in
code generation. The target code generator tables and code will
be unwieldy in size.
Hand Coded Code Generators 
The hand production of separate code generators for each target
machine only seems a practical proposition if the intermediate
code is low level so that the generator is small. The
disadvantages of such an intermediate code have already been
discussed. However this approach would be practical for higher
level intermediate codes if each code generator was engineered in
such a way that the code generator for one machine could be
easily modified to produce a code generator for a similar
machine.
Such engineering can be achieved by identifying the parts of the
code generator logic that involve machine dependency and, instead
of accessing tables of information, inserting target machine
dependent compiler code. This approach is enhanced if the code
generator is abstracted as much as possible and target machine
dependencies kept to the level absolutely necessary. As the
target machine dependent issues that are relevant will vary from
one machine to another, the nature of target code generators for
each machine could vary significantly. However if the generator
structure is abstracted effectively, a code generator for one
machine should be reasonably easy to modify to produce a code
generator for any other machine that has similar characteristics.
This approach really comes down to a classical software
engineering problem involving careful abstraction of functions
and relevant databases so that portability and changeability are
maximised. The resource database and any target machine table
representing characteristics are implemented in a target machine
dependent way but the structure is abstracted and access provided
via procedures and enumerated data types. Code generator
algorithms are carefully designed so that they only deal with
machine specific details when absolutely required. Where target
machine code is required it is clearly identified.
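The separation described here can be sketched as follows. This is
only an illustration under stated assumptions: the names BasicType,
Target, ToyRegisterTarget and gen_sum are hypothetical, and the
mnemonics belong to an invented machine rather than to any PLIP
target. Machine characteristics are reached only through procedures
and an enumerated type, so the algorithm in gen_sum contains no
machine specific detail.

```python
from abc import ABC, abstractmethod
from enum import Enum, auto


class BasicType(Enum):
    """Hypothetical enumerated type naming the supported integer sizes."""
    INT2 = auto()
    INT4 = auto()


class Target(ABC):
    """Abstract access to machine characteristics (all names assumed)."""

    @abstractmethod
    def size_of(self, basic_type):
        """Return the size in bytes of a basic type on this machine."""

    @abstractmethod
    def emit_add(self, dest, source, basic_type):
        """Return the instruction text that adds source into dest."""


class ToyRegisterTarget(Target):
    """Target machine dependent part, clearly identified in one place."""

    def size_of(self, basic_type):
        return {BasicType.INT2: 2, BasicType.INT4: 4}[basic_type]

    def emit_add(self, dest, source, basic_type):
        suffix = {2: "W", 4: "L"}[self.size_of(basic_type)]
        return f"ADD{suffix} {source},{dest}"


def gen_sum(target, dest, sources, basic_type):
    """Machine independent algorithm: accumulate every source into dest."""
    return [target.emit_add(dest, source, basic_type) for source in sources]


print(gen_sum(ToyRegisterTarget(), "R0", ["x", "y"], BasicType.INT4))
```

Porting to a similar machine would then mean writing another
subclass of Target, leaving gen_sum untouched.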
This approach has been taken throughout the PLIP project. At
levels close to where target machine instructions are being
generated, abstraction becomes more difficult. The issues
involved will not be common to all machines, so that the
abstractions required will differ. In this situation porting will
require modification of the abstractions provided in the code
generator logic and database, but this should never become an
excuse to abandon abstraction attempts, or a code generator
developed for one machine will not be easy to port to even a very
similar machine.
Code Generator Generators
This approach appears to offer the real possibility of easily
generating a code generator for a new target machine. All that is
required is definition of the machine characteristics in a
formal grammar. This definition is processed by a code generator
generator to produce a code generator for the target machine.
This approach was seriously considered for the PLIP project but
was considered to have the following problems:
a) The code generator produced would probably be table
driven and would suffer the same problems as table driven
compilers: the table and generator code would be too large, as
the generalised table format would include more complexity than
required for the actual target machine itself. An intelligent
code generator generator would modify the table and code form as
much as possible for each target machine, but this would make the
generator generator very difficult to write.
b) As with most automatically generated software, the code
generator will be slow.
c) Optimisation is difficult. It was envisaged that the
intermediate code would probably have to be linear and
optimisation limited to such options as peephole optimisation.
Tree structured intermediate codes and more complex optimisations
would require a much more complex generator generator.
d) The effort involved in developing the description for one
machine would be heavy, nearly equivalent to hand coding the
code generator. The definition would no doubt require extensive
debugging equivalent to a hand coded code generator, but the
relationship between errors in definition and invalid code
produced would not be as obvious as the source of such errors in
a hand coded generator.
e) Modification of the description for one machine to
produce a description for another similar machine will not be
easy because of the nature of many formal grammars.
f) Most automatic code generators have been unsuccessful,
although during 1986 two successful projects were reported. One
by Schmidt and Voller [15] involved the development of a portable
compiler system for "Pascal and Fortran like" languages. The
Vienna Development Method and its specification language META IV
are used to describe the formal specification of the target
machine and source languages. A common intermediate language is
derived from the definition of the source languages and a code
generator generator processes the target machine specification to
produce executable Pascal programs that implement the code
generators. A second by Ganapathi and Fischer [16] uses affix
grammars to describe the target machine instruction set and the
code generator is obtained automatically using attributed parsing
techniques. It is claimed a code generator such as this can
perform most "popular" target machine optimisations.
Most attempts at producing retargetable code generators have
mainly been based on code generator generators, even if the
result of the generator generator is essentially a table
representation of the target machine, or carefully engineered
hand coded code generators.
Overall it was felt that with equivalent effort smaller, faster
code generators that produced better quality code could be
produced by hand coding as compared to table driven generators or
automatic generator produced code generators. The decision was
made to investigate ways of improving the engineering of target
code generators so that code generators developed for one machine
could more easily be modified to produce generators for other
machines.
Other Literature
References to allied (more or less) work are:
a) Compiler tool kit, Tanenbaum et al [17]. They have
produced a tool kit for making portable compilers. The
"Amsterdam Compiler Kit" consists of a set of integrated programs
to simplify the task of producing portable compilers. For each
language a front-end must be written to produce intermediate
code. A portable optimiser is provided for this intermediate code
which is then translated or interpreted to the assembly language
of the target.
b) Portability via virtual machines, Yankov and Bonev [18].
They describe the use of intermediate codes for virtual
processors which are interpreted by emulators on the target
machines.
c) Table driven code generation, Graham [19]. A Ph.D. study
of table driven code generation.
d) Modula-2 optimising compiler, Powell [20]. This project
demonstrates how a compiler that produces reasonable code can be
constructed quickly if advantage is taken of existing software.
The parser was generated using Yacc [21] and P-code intermediate
code is produced, providing compatibility with many Pascal
compilers. The procedure call convention conforms to that of
Pascal and C compilers running under Berkeley Unix. The P-code of
course allows existing P-code translators and interpreters to be
used.
e) Portable C Compilers (PCC), Johnson [22]. The front-end
of the compiler is generated by Yacc. Its intermediate code
consists of prefix notation for expressions and assembly code for
the rest. Obviously such an intermediate code does not make
portability easy. However three quarters of the compiler is
independent of the target machine, indicating the degree to which
effective software engineering should be able to reduce target
machine dependency of compilers.
f) A portable C compiler, Snyder [23]. Developed a portable
C compiler that is driven by a set of machine dependent
information contained in a set of tables which are automatically
constructed from a user provided machine description for a
machine dependent but abstracted machine. The user defines the
translation to target machine code of the abstract machine code
produced by providing macro definitions.
g) Multiple front and back ends, Davison and Fraser [24].
Describes compiler organisations designed to allow combinations
of various front and back ends. Defines the terms "union" and
"intersection" machines in reference to intermediate codes. An
intersection intermediate code has limited operators and
addressing modes so that the front end has few choices. A union
intermediate code has a wide range of operators and addressing
modes so that the front end has a wide range of choices, but as
not all target machines may support all these alternatives it
may be necessary for the front end to have access to target
machine information. These concepts are useful background for the
design of any intermediate code form.
h) Code generator generator, Heyliger et al [25]. Produced a
recommendation for a retargetable compiler using a compiler
compiler that utilised a machine description provided in a
machine description language.
i) A re-usable code generator for Prime 50-series computers,
T. Akin [26]. Involved the design and development of a target
code generator that could be used with multiple front ends.
This product is typical for compiler developers with one target
machine in mind. The input is a tree structured intermediate form
and the output is assembler.
Chapter Three 
GAS code: A Second Intermediate Code 
The objectives listed in chapter one for the design of the code
generator required that account be taken of the fact that Rcode
is a long way from target machine code, and that many of the
target machines will have similar architectures. It is these two
factors that led to the concept for a second intermediate code.
This chapter outlines reasoning that led to the concept of a
second intermediate code, and the role that such an intermediate
code should play, including its relationship to Rcode. The next
chapter deals in more detail with the concepts that arose during
the development of a prototype GAS code developed for the
prototype code generator for PLIP.
Justification for GAS code 
Rcode has been designed to provide an intermediate code that the
front end can effectively generate, and which contains semantic
structural information essential for efficient code generation,
but is not source language or target machine dependent. This
provides the ideal vehicle for front and back end independence.
It does involve operations on objects that are essentially
target machine objects and not source language objects. However
it is not close to target machine code in several ways:
a) Computed operand values for an Rcode are specified by a
subtree of Rcode instructions. In a register based machine, the
value will be typically computed into a register. The operand
computation is linearised. This linearisation is not performed in
Rcode as it loses useful structural information provided by the
tree structuring and very much depends on the number of addresses
allowed in target machine instructions. If zero address
instructions are available, the tree's structure can be very
easily converted to target code. Intermediate results are held on
the stack. The only problems occur when attempting to handle
common subexpressions. If the machine is three address, temporary
storage locations must be created to store intermediate results.
If only two address instructions are available, more temporaries
and move instructions will be created. If the machine has
registers, selection of locations for holding values will become
even more complex.
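The dependence of linearisation on the number of addresses can be
shown with a minimal sketch. The tuple form of the tree, the
function names and the mnemonics are all assumptions made for the
illustration and do not reproduce Rcode or any real instruction
set. The same tree is written out by a post order walk for a zero
address machine, and then for a two address machine where
temporaries and moves have to be introduced.

```python
def stack_code(tree):
    """Zero address code: a post order walk of the expression tree.

    A tree is either a variable name (str) or a tuple (op, left, right).
    """
    if isinstance(tree, str):
        return [f"PUSH {tree}"]
    op, left, right = tree
    return stack_code(left) + stack_code(right) + [op]


def two_address(tree):
    """Two address code of the form 'OP source,dest' (dest := dest OP source).

    Assumes no program variable is itself named like a temporary (t1, t2, ...).
    """
    code, temps = [], set()

    def visit(node):
        if isinstance(node, str):
            return node
        op, left, right = node
        dest, source = visit(left), visit(right)
        if dest not in temps:           # never overwrite a program variable
            temp = f"t{len(temps) + 1}"
            temps.add(temp)
            code.append(f"MOVE {dest},{temp}")
            dest = temp
        code.append(f"{op} {source},{dest}")
        return dest

    visit(tree)
    return code


tree = ("ADD", "x", ("MUL", "y", "z"))  # x + y * z
print(stack_code(tree))
print(two_address(tree))
```

The zero address form needs no temporaries at all, while the two
address form for the same tree introduces two temporaries and two
moves.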
b) Rcode includes arithmetic operations on block objects
that may be both large and dynamically sized. For a real machine,
these may have to be converted into operations on objects
supported by the target machine. These breakdown operations are
not performed in Rcode partly because breakdown requires some
knowledge of machine architecture (the basic arithmetic
instructions, flags, and branch instructions available to do the
job) and because this breakdown would make the Rcode target
machine dependent and defeat its purpose. The target machine may
even support operations on very large and dynamically sized
objects. Any breakdown should be left to the code generator so
that full structural information is preserved, allowing the
target code generator to decide how the breakdown is best
performed.
On many target machines, the operands of an arithmetic
instruction must be of the same type. On other machines operands
may be different in size, eliminating code needed to massage
different sized operands to the same size. Rcode arithmetic
operations (block operations aside) seem to specify that all
operands are the same size because a basictype operand is
included. However consider:
               ADD   (4 byte integer)
              /   \
             /     \
            X      ADD   (2 byte integer)
                  /   \
                 /     \
                Y       Z
A clever code generator would still generate a first instruction
to add the two byte values in the first add, then an instruction
to add a four byte value at X to the two byte result of the first
add, rather than an instruction to massage the result of the
first add to four bytes.
c) Bitstring manipulations are specified in Rcode. These
involve bitstrings that could be both large and dynamically
sized. The operations could be converted to operations on
bitstrings of size supported by the target machine, using
bitstring instructions supported by the target machine. This
breakdown is not performed in Rcode for the same reasons
described above for block arithmetic operations. Some target
machines support some bit operations on dynamically sized
bitstrings, so the breakdown may not be desired, as noted in (b)
above for arithmetic operations.
d) Subroutine calling in Rcode does not involve any form of
convention such as use of the stack for the result of a function,
passing parameters, or allocation of storage for local objects.
Rcode will give a specification of the arguments, result and
local objects, in terms of size and alignment, but not how this
is to be implemented on the target machine. There will be
significantly more involved at the target machine level to
actually implement a subroutine call. The subroutine structure
in Rcode is designed to adequately reflect the subroutine
facilities and semantic structure of the source languages:
lexical nesting and scope, arguments, result, and local objects
of routines. Rcode does not attempt to represent how a subroutine
calling structure is implemented as this is obviously target
machine dependent. The implementation on a stack based, single
processor machine will probably differ significantly from the
implementation for a transputer based machine.
e) Rcode does not reflect the way operands for instructions
are defined in target machine instructions. Rcode operands can be
subtrees, whereas target machine operands merely involve
objects referenced via addressing modes such as indexing, static
offsets and indirection. All operands in Rcode are referred to by
specifying the identity of a storage area, and an offset into the
area. This corresponds to direct addressing if the offset is
statically computable and indexed addressing if the offset is
dynamically computed. Immediate addressing is supported, if the
operand itself can be statically evaluated. It is possible to
extract from the Rcode more information for almost unlimited
complexity of addressing. The following example clearly shows
the operand address computation structure and would allow complex
addressing modes on the target machine to be used, if available.
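[Expression tree diagram: an operand address computation built
from ADD, MULT, LOAD and REFER nodes over the storage areas X and
Y and the constants #4, #12 and #5. The layout of the branches is
not recoverable from the scanned copy.]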
Linearisation of Rcode expression trees will lose information on
the structure of an operand address computation. This effect must
be taken account of if efficient addressing modes are still to be
generated in the target machine code. Various target machines
provide fairly complex addressing modes such as computing an
operand address as the contents of the memory location whose
address is in the location whose address is given by the sum of
the contents of a register and an immediate value. These allow
complex Rcode operand address expressions to be implemented as a
single complex target machine operand. Rcode contains the
information on an operand address computation structure, but does
not reflect that in target code this complexity is converted into
reasonably complex addressing modes, or more generally that,
rather than the address computation being simply linearised, it
is converted to a smaller number of linear instructions using the
complex addressing modes available. The effective use of complex
addressing modes can significantly improve the quality of code
produced. In reality most target machines will be limited in the
extent of complexity of operand addressing modes. Additionally,
in most source languages, there is a limit to the level of
complexity of addressing modes required for the implementation of
constructs in the language. Only when using an array of records
which contain an array of records etc could very elaborate
addressing be required. The limit on addressing mode complexity
required in practice needs to be kept in mind. It should also be
noted that the generation of complex addressing modes is not
really target machine dependent. Concepts such as indexing and
indirection are all reasonably common. If a machine does not have
a given complex addressing mode, it is a fairly easy matter to
break down the complex mode into the simpler modes provided.
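The choice between immediate, direct and indexed addressing
described in (e) can be sketched as below. The operand encoding
and the names fold and addressing_mode are hypothetical, invented
for this illustration, and only ADD and MULT over integer constants
are folded. An offset that folds to a constant gives direct
addressing, and anything left over must be computed at run time,
giving indexed addressing.

```python
import operator

# Operators this sketch knows how to evaluate statically (an assumption).
FOLDABLE = {"ADD": operator.add, "MULT": operator.mul}


def fold(expr):
    """Return an int if expr is statically computable, else a reduced tree.

    An expression is an int constant, a variable name (str), or a tuple
    (op, left, right) with op taken from FOLDABLE.
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return FOLDABLE[op](left, right)
    return (op, left, right)


def addressing_mode(operand):
    """Classify ("const", value) or ("ref", area, offset_expression)."""
    if operand[0] == "const":
        return ("immediate", operand[1])
    _, area, offset = operand
    offset = fold(offset)
    if isinstance(offset, int):
        return ("direct", area, offset)     # offset known at compile time
    return ("indexed", area, offset)        # offset computed at run time


print(addressing_mode(("ref", "X", ("ADD", 4, ("MULT", 12, 5)))))
print(addressing_mode(("ref", "X", ("ADD", 4, ("MULT", "i", 12)))))
```

A real code generator would go further and match the residual
offset tree against the indexed and indirect forms the target
machine actually provides.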
f) Many target machines will provide registers. Commonly
used objects will hopefully be kept in registers. Therefore there
is the question of where operands should be located for an
instruction, and register addressing modes are introduced. When
offsets or pointers to operands are computed, there is the
question of whether and which index registers should be used. In
the Rcode machine when a subexpression is required for an
operation, the subtree for the operation is tacked on to the
instruction.
g) For a machine that provides a stack, subroutine calling
will typically use the stack. A major question in target code
generation will be the effective use of stack pointer registers
in the access of objects allocated on the stack.
h) In a zero address oriented machine, emphasis in target
code generation will be on the effective locating of objects on
the stack, and the addressing of these objects using various
stack pointers. In fact the Rcode tree structure will suit a zero
address architecture.
i) Rcode specifies control logic structures with high level
facilities such as For loops and Case structures. In many
machines these will be implemented at the target machine level
using conditional and unconditional branches.
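The conversion in (i) of a high level loop into branches can be
sketched as follows. The function lower_for, its textual output
format and its label scheme are hypothetical and are not the form
used by Rcode or GAS code. The sketch also assumes that the limit
is a constant or a variable that the loop body does not alter, so
that testing it on every iteration is safe.

```python
def lower_for(var, start, stop, body, label="L1"):
    """Lower 'FOR var := start TO stop DO body' to branch level code.

    body is a list of already lowered instruction strings.
    """
    top, done = f"{label}_top", f"{label}_done"
    return [
        f"{var} := {start}",
        f"{top}:",
        f"if {var} > {stop} goto {done}",   # conditional branch out of loop
        *body,
        f"{var} := {var} + 1",
        f"goto {top}",                      # unconditional branch back
        f"{done}:",
    ]


for line in lower_for("i", "1", "n", ["s := s + i"]):
    print(line)
```

Once the loop is in this form the fact that it was a For loop is
no longer explicit, which is why a target machine loop instruction
is easier to select while the structure is still known.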
Rcode has been designed to allow effective isolation between the
front and back ends. However Rcode is obviously not close to
target machine code. In the differences between Rcode and target
machine code discussed above, it is clear that several
differences are dependent on the target machine, such as use of
target machine registers. Other differences, such as linearising
the tree structured code of Rcode, converting the high level
control structures of Rcode into more typical branch
instructions, breaking down arithmetic operations on oversized
objects and linearisation of operand subtrees into instructions
using complex addressing modes, are not particularly target
machine dependent, but are more dependent on the general
architecture and instruction set. For example linearisation will
depend very much on whether the machine is register or zero
address oriented. Another example is the breaking of operations
on block objects and bitstrings into operations on objects
supported by the target machine. This will involve creating
temporary objects as required, using hardware facilities for
detecting overflow and carry, and using instructions supported
by the target machine. For many machines, the instruction sets
and status flags available are very similar for this purpose. If
the machine is zero address oriented, a different approach will
be used than for a machine with registers. If the target machine
has a stack that can support subroutine calling, code will be
generated to use the stack. In a transputer machine, a procedure
call is notably different in the mechanism for passing of
arguments. Though the details of this code will be specific to
each target machine, the general form will be common to all
machines with a stack.
Hence architecture and general instruction set will be
a significant factor in the process of converting Rcode to target
machine code.
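The breakdown of a block arithmetic operation into word sized
operations that propagate a carry can be sketched as follows. The
16 bit word size, the little-endian list representation and the
name block_add are assumptions made for the illustration; on a real
machine the carry would be held in a status flag and consumed by an
add with carry instruction.

```python
WORD_BITS = 16                      # assumed word size of the target
WORD_MASK = (1 << WORD_BITS) - 1


def block_add(a, b):
    """Add two equal length blocks held as little-endian lists of words.

    Returns (sum_words, carry_out).
    """
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    result, carry = [], 0
    for word_a, word_b in zip(a, b):
        total = word_a + word_b + carry
        result.append(total & WORD_MASK)    # what one word sized add leaves
        carry = total >> WORD_BITS          # what the carry flag would hold
    return result, carry


# 0x0001FFFF + 0x00000001 = 0x00020000, held as two 16 bit words.
print(block_add([0xFFFF, 0x0001], [0x0001, 0x0000]))
```

Because the block length may be dynamic, the loop itself would
appear in the generated code rather than being unrolled by the
code generator.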
Optimisation is another reason for the use of a second
intermediate code. Optimisation could be undertaken at the Rcode
level for such purely machine independent optimisations as common
subexpression elimination, and removal of loop invariant code
from within loops. However some of these optimisations require
that a directed acyclic graph be created rather than a pure tree,
but the convention for writing Rcode out in post order form will
mean that the directed graph will be returned to a tree,
effectively losing the optimisation. Optimisations would still
have to be performed at a later stage, and many of these
optimisations would use the same information generated for Rcode
optimisation. The advantage of Rcode optimisation is that code
does not have to be written for the optimisations performed for
each target machine or family of machines encountered. However
optimisers would still have to be written for each target
machine code and these would require the regeneration of some of
the same information used by the Rcode optimiser. The concept of
writing an optimiser for each target machine does not really
appeal. Additionally, at the target machine code level the
program structure is no longer available as it is in Rcode, so
optimisation effectiveness will be significantly diminished. A
program representation that is closer to final machine code, but
which contains structural information, has several advantages. If
this code were linearised using temporaries for common subexpressions,
common subexpressions could be represented by references to the
same memory location rather than by a subtree. If the code were
written to disk, the optimisation would not be lost. The code
would also contain more of the objects, including temporaries,
that would be directly manipulated in the final target machine
code, so that information generated for optimisation, such as
liveness analysis, would be more useful to the target code
generator. The target code generator is particularly concerned
with register allocation, and this benefits considerably from a
liveness analysis performed across as many objects as possible.
The retention of Rcode structural information would be useful
right through to target code generation. It is also very useful
for generating optimiser information such as for data flow
analysis (see later chapter on GAS code optimisation), as basic
blocks of code and loops are much easier to identify. At any time
during target code generation, the context of the program
structure is still known, so that possible uses of target machine
instructions such as Loop can be identified.
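The benefit of running liveness analysis over linearised code with explicit temporaries can be illustrated with a minimal sketch. It assumes a basic block held as (op, src1, src2, dest) tuples; the tuple layout and the function name are illustrative assumptions, not part of the GAS design.

```python
def liveness(block, live_out):
    """Backward scan over one basic block of three address instructions.

    Each instruction is a tuple (op, src1, src2, dest). The result gives,
    for each instruction, the set of names live just after it executes.
    """
    live = set(live_out)
    live_after = []
    for op, src1, src2, dest in reversed(block):
        live_after.append(set(live))
        live.discard(dest)  # dest is (re)defined by this instruction
        live.update(s for s in (src1, src2) if s is not None)  # sources are needed
    live_after.reverse()
    return live_after


block = [
    ("ADD", "X", "Y", "Temp_1"),
    ("MULT", "Z", "Temp_1", "A"),
]
result = liveness(block, live_out={"A"})
```

Here Temp_1 is live after the ADD but not after the MULT, so a register allocator could release whatever holds it at that point.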
A major source of optimisation is static evaluation of a program.
This can eliminate significant blocks of code, allow initialised
data area linker directives to replace code that effectively
initialises areas, and indicate such things as whether an object
size Rcode subtree produces a statically determinable value. An
interpreter is an effective tool for static evaluation if
available to the compiler. An Rcode interpreter would be
difficult to produce because of the complexity of Rcode, but once
produced would interpret any Rcode, independent of target
machine. A machine code interpreter would merely involve the
compiler executing the code the compiler has just produced. But
why produce a lot of machine code that is found to be redundant?
If the structure of the program provided by Rcode is not
available, useful static evaluation would probably be limited
anyway. A lower level representation of a program that was valid
for a range of machines, and which was much simpler than Rcode
would be a useful compromise. An interpreter would only have to
be produced for each family.
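As a small illustration of static evaluation, the sketch below decides whether a size computation subtree yields a statically determinable value. The nested tuple tree encoding and the operator table are assumptions made for the example and are far simpler than real Rcode.

```python
import operator

# Hypothetical subset of foldable operators.
FOLDABLE = {"ADD": operator.add, "SUB": operator.sub, "MULT": operator.mul}


def static_value(node):
    """Return the value of a subtree if statically determinable, else None."""
    if isinstance(node, int):
        return node              # literal: always known
    if isinstance(node, str):
        return None              # named object: only known at run time
    op, left, right = node
    lv, rv = static_value(left), static_value(right)
    if lv is None or rv is None or op not in FOLDABLE:
        return None
    return FOLDABLE[op](lv, rv)


static_tree = ("MULT", ("ADD", 3, 5), 4)      # size known at compile time
dynamic_tree = ("MULT", ("ADD", "N", 5), 4)   # depends on the object N
```

A subtree for which `static_value` returns a number could be replaced by a constant, or by an initialised data directive in the case of a global area.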
GAS code Structure 
The argument has therefore been made for converting Rcode to a
form closer to target machine code. This second code is to
reflect the significant architecture features of the target
machine. The conversion of Rcode to GAS code will allow much that
is common in producing target code for members of the family to
be performed by a common GAS code generator. The GAS code will
make common family optimisers and interpreters possible. The
structural information provided by Rcode should be retained,
except for expression structure provided by the tree structure of
Rcode. The major role for the GAS code involves:
a) Linearisation of Rcode expressions. This will depend on
the general machine architecture. Consider the following Rcode:
            STORE
           /     \
          /       \
    REFER A       MULT
                 /    \
                /      \
              ADD       Z
             /   \
            /     \
           X       Y
For a stack machine: 
PUSH X 
PUSH Y 
ADD 
PUSH Z 
MULT 
STORE A 
For a register machine: 
ADD X,Y,Temp_1
MULT Z,Temp_1,A 
Rcode expression subtrees will be replaced by GAS code sequences.
GAS code should model the zero, accumulator, two, or three
address orientation of the machines in the family.
b) As noted above, this linearisation will destroy
information on the structure of operand address computation.
Therefore the GAS machine should allow operand addressing modes that
are the equal of any of the modes in target machines in the
family. This will allow the address computation to be converted
to a series of instructions with complex addressing modes
selected when advantageous. If only simple addressing modes were
provided by the GAS machine, the linearisation of addressing
computation would lose structural information that would preempt
use of more complex addressing modes on the target machine.
c) It may be necessary to break down block arithmetic and
bitstring operations into operations on objects of types
supported by the target machine, including reflecting the need
for all operands to be the same size, if this reflects the nature
of all the machines in the family. This process will require
access to the data types supported by the target machine. As
noted above, the target machines in the family may support
operations on dynamically sized objects, and if so, this
breakdown will not be necessary.
d) The representation of the implementation of subroutine
calls on the target machine. If the machine has a stack that can
be used for subroutines, GAS code should allow subroutine calls
to be converted into a series of GAS instructions that represent
the PUSH and POP instructions that will be used, and the fact
that objects on the stack will be accessed by stack pointers. In
fact a standard general calling convention should be represented
in the GAS code so that the GAS code represents as closely as
possible the final target machine code structure. This is
discussed below. Such a convention would bypass any host machine
conventions unless these were compatible. However portability of
software will be enhanced by a standardised calling convention.
Instructions to support this convention must be included in this
GAS code. If the target machine involves transputers, the
subroutine environment will differ markedly from a single
processor machine, and the GAS code must reflect the calling
environment for this type of machine.
e) Control structures should be converted to typical
machine code level instructions that will be used in the family
such as conditional branches, and associated labels. However, as
the program structural information is still important for
optimisation and target code generation, the Rcodes such as CASE,
LOOP and FOR should be retained, though they will be redundant.
f) Special instructions should be provided for input and
output, disabling/enabling interrupts, and processor register
manipulation. These will allow GAS code to be implemented for
these areas that reflects the typical form of target machine
code. Also Rcodes such as "Add In Place" can be implemented in
GAS code by disabling interrupts, if no single target machine
instructions are available to do an indivisible add. The GAS
generator will require information on whether Rcodes such as
Add_In_Place can be handled directly in target machine code and
if so, the Rcode will be left in place. The Rcode input and
output and register manipulation codes seem adequate, but the
subtrees for port address, size and value computations will be
replaced by GAS code sequences, and as there is no way for GAS
code to return a value to an Rcode node, a simple GAS input,
output or target GAS instruction is placed on the end of the
sequence that computes the size, address and value components.
The Rcode for input, output and register access will be retained
for structural information.
g) Calls to the operating system kernel should be modelled,
for example for requesting saving or loading of process context.
To support this GAS code should include an instruction that
represents a kernel call. Many machines have an equivalent
instruction; CHMK on the VAX, INT on the Intel 8086.
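The linearisation described in item a) can be sketched for the example tree given there. This is a minimal sketch, assuming the expression is held as nested (op, left, right) tuples with operand names as strings; the function names are illustrative only. Operand order in the three address MULT may differ from the listing in item a), which is harmless for a commutative operation.

```python
import itertools


def to_stack(node, out):
    """Post order walk emitting zero address (stack machine) code."""
    if isinstance(node, str):
        out.append(f"PUSH {node}")
        return
    op, left, right = node
    to_stack(left, out)
    to_stack(right, out)
    out.append(op)


def to_three_address(node, out, temps, dest=None):
    """Emit three address code; returns the name that holds the node's value."""
    if isinstance(node, str):
        return node
    op, left, right = node
    a = to_three_address(left, out, temps)
    b = to_three_address(right, out, temps)
    if dest is None:
        dest = f"Temp_{next(temps)}"   # intermediate result needs a temporary
    out.append(f"{op} {a},{b},{dest}")
    return dest


def linearise_store(target, expr):
    """Linearise 'STORE target, expr' where expr is an operator subtree."""
    stack_code, reg_code = [], []
    to_stack(expr, stack_code)
    stack_code.append(f"STORE {target}")
    to_three_address(expr, reg_code, itertools.count(1), dest=target)
    return stack_code, reg_code


expr = ("MULT", ("ADD", "X", "Y"), "Z")
stack_code, reg_code = linearise_store("A", expr)
```

The stack form needs no temporaries because intermediate values sit on the stack, while the three address form must name Temp_1 explicitly.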
The GAS code attempts to generically model the target code for a
family of machines. The number of GAS codes will not need to be
large to allow the above requirements for GAS code to be met.
Instructions are required to generically model:
Memory moves
Stack operations (if stack present)
ALU operations (arithmetic and logical)
Subroutine call and return
Control instructions
Special instructions
The instruction set should reflect the instructions that will
typically be used to implement various ALU and control structures
on the target machines in the family. For example in many
families arithmetic operations are provided on a few fixed sized
objects. The arithmetic instructions in the GAS code should
reflect this. Execution control is often provided by conditional
and unconditional branch instructions. The GAS machine should
reflect this. However for subroutine calls there is a wide
variety of calling conventions provided by machines which are
often quite different. This problem is discussed separately
below.
The instructions must allow an efficient choice of actual target
machine instructions. This requires that no unnecessary
constraints are placed on instructions. For example, requiring
that each operand in an arithmetic code should be of the same
size, or that certain arithmetic instructions should be two
address. However constraints could be applied when it is certain
that all or almost all machines in a family will have the given
constraint. An example is that many machines only provide
arithmetic operations on operands of a few allowable sizes, and
that dynamic sized operands are not catered for. The concept of
constraints and family definitions is discussed more fully in the
next chapter, and is very important for successful GAS machine
and code design.
The lack of constraints on the form of GAS instructions makes it
very likely that they will be orthogonal and three address. In
many processors, the instruction count is considerably increased
by non orthogonality, and by two address and three address
variations of instructions. Additionally, the GAS code only
attempts to model the major characteristics of a family. It
contains no special purpose extra instructions. Therefore the GAS
instruction count will always be small.
Subroutine Calling
Many machines and operating systems provide a standard calling
convention. For GAS subroutine calls, the structure of the call
should be clear, so that efficient target code can be generated
on the target machine, perhaps using any efficient instructions
available on the target machine. However a decision has been
taken in the PLIP project to have a standard calling convention
regardless of the target machine's own facilities. As a result
the structure of the runtime stack will be fairly similar for all
target machines in a family, only differing for reasons of
alignment and perhaps a few objects that are required for
implementation on the target machine. This will assist with
portability of programs that manipulate the stack. An example of
this is the implementation of exception handlers in the PLIP
Modula-2 portable runtime library. The target machine subroutine
call convention and associated special instructions may not
necessarily be used. The GAS machine references to local objects
of the routine, arguments, the result area, and outer scope
objects should be obvious given the GAS machine addressing
facilities so that efficient use can be made of target machine
facilities by the target code generator. The structure of
subroutine calls will be made even clearer because the Rcode
"Call" will be retained. This helps to clearly mark to the code
generator the components of a call; creation of the argument
block, computation of the result area, and the computation of the
called routine descriptor.
Control instructions should reflect the significant control
instructions available in the target machine. For example,
conditional branch instructions may be the dominant mechanism.
However in some machines, special control instructions such as
loop instructions suited for implementing FOR type loops in
certain situations may be available. A good code generator should
recognise these situations and use the instruction. However if
the GAS code has been generated to implement the loop using
conditional branch instructions, the use of the loop instruction
would seem preempted. As Rcode control codes are kept to assist
target code generation, the detection of situations where special
target machine codes can be used such as loop instructions is
reasonably simple.
Declaration of data storage areas is necessary for any target
code generation phase. Not only must code be emitted, but linker
directives specifying global storage areas must also be emitted.
The Rcode declarations for global storage areas will be retained
in the GAS code. Note that Rcode global storage area declarations
contain area size specifications that are always statically
determinable. Rcode variable declarations are for dynamically
allocated storage, such as on a stack, and will not result in the
generation of linker storage directives. These will generally be
converted into GAS code to allocate this storage, for example a
PUSH instruction. The size computation Rcode subtrees for
variable declarations will have been converted to GAS code,
before the PUSH is generated.
All Rcodes that mark the declaration of objects or routines will
be retained. These Rcodes contain useful semantic information,
such as "here is the declaration for a variable of the routine",
or "here is the code for a routine", or "here is the code to
compute the result size". This is all useful information to the
target code generator.
It should now be clear that Rcode is not converted to GAS code
and the Rcode then discarded, along with the wealth of semantic
information it contains. An important concept of the second
intermediate code is that it co-exists with much of the Rcode. The
important semantic structural information of Rcode is retained by
retaining Rcodes such as Call_, For, Case and Loop, and
declarations, and only really replacing actual "real" Rcode
operations such as Add, Load and Negate, so that the operations
more closely resemble the "real" operations of the target
machine. A move made closer to target code may make optimisation
better because more of the real operations and data objects are
represented, but this would be overshadowed if semantic
information was lost. The only structural information that is
lost is the structure of expressions, because of their
linearisation.
At the conceptual level, GAS code is generated for "real" Rcodes
and replaces these Rcodes on the Rcode tree. "Real" Rcodes are
action codes; those that actually do something. GAS codes are
"tacked" onto the Rcode tree. At the representation level, GAS
codes are coded in the data field of the two byte Reserved Rcode.
This means that the routines developed for handling Rcode trees
can equally be used for handling Rcode trees containing GAS code,
a real plus for efficiency in the compiler.
The need to declare temporary storage objects as Rcode
expressions are linearised, and the need to create labels to mark
positions in the GAS code that are targets of GAS code branch
instructions leads to the need for GAS pseudo codes to define
these storage objects and labels. However there could potentially
be many of these objects and labels, and the compiler databases
could become clogged with information on them. One solution is to
delete the database entries for the temporary objects and labels
for a routine when handling of a routine is complete. However,
a point may be reached earlier when it is known that a
temporary object or label will not be referred to again.
Therefore pseudo GAS instructions to indicate this would be
useful from a practical view point.
The generation of GAS code from Rcode can be achieved by the
standard techniques for generating machine code from a syntax
tree, except that the GAS code is attached to and replaces some
parts of the tree, rather than a machine code stream being
produced. Techniques such as Syntax Directed Translation are
quite applicable to generating GAS code. In most cases the
conversion of each Rcode essentially involves a template
consisting of several alternative sets of possible GAS code
sequences depending on the contents of the "holdup" stack of
previously generated GAS codes and operands. Operands for
instructions in the selected template are "filled" in on the
basis of heldup operands. This is the standard procedure for
syntax directed translation.
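The role of the holdup stack in choosing between template alternatives can be sketched as follows. The sketch assumes a post order stream of operand names and operator names, with the store target appearing first; the MOVE opcode, the token set and the instruction layout are assumptions for illustration rather than the actual GAS templates.

```python
def translate(postorder):
    """Translate a post order token stream into three address instructions.

    Instructions are lists [op, src1, src2, dest] so a destination can be
    patched when the STORE template picks its 'retarget' alternative.
    """
    holdup = []     # held-up operands waiting for an instruction
    code = []
    n_temps = 0
    for tok in postorder:
        if tok in ("ADD", "MULT"):
            right, left = holdup.pop(), holdup.pop()
            n_temps += 1
            dest = f"Temp_{n_temps}"
            code.append([tok, left, right, dest])
            holdup.append(dest)
        elif tok == "STORE":
            value, target = holdup.pop(), holdup.pop()
            if code and value.startswith("Temp_") and code[-1][3] == value:
                # Alternative 1: the value was just computed into a
                # temporary, so retarget that instruction.
                code[-1][3] = target
            else:
                # Alternative 2: the value is a plain operand, copy it.
                code.append(["MOVE", value, None, target])
        else:
            holdup.append(tok)   # operand: hold it up
    return code


computed = translate(["A", "X", "Y", "ADD", "Z", "MULT", "STORE"])
copied = translate(["A", "X", "STORE"])
```

Which alternative is selected depends only on what the holdup stack and the previously generated code contain at the moment the STORE is reached.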
The GAS code will be spread through the Rcode tree, but it is
important to realize that if the Rcode tree is walked in post
order form, and declaration Rcodes and GAS labels noted, and
GAS codes extracted and placed in one linear list, then
the GAS code could be executed. The GAS code executed,
will or should, produce the required semantic effect of the
program. The declarations are used to declare storage objects and
pieces of code. Each of these objects can then be referenced. If
GAS code were interpreted on a real machine, storage area
declarations encountered would result in allocation of storage
for the size required, and a database entry would be created that
points to the memory allocated to the storage area. When a
routine declaration Rcode is encountered, a database entry would
be created that pointed to the Rcode tree node that is the start
of the routine code.
Rcode includes machine and system dependent codes (group 6).
Except for 6a, these Rcodes will be passed untouched to the code
generator as these are very target machine and operating system
dependent. Only the target code generator can handle these.
After GAS code has been generated, it can be optimised. The GAS
code can be treated as any typical three address code, or zero
address code, or whatever form, and optimised using any of the
well known techniques available for optimising code in this form.
A later chapter on GAS code optimisation describes the design for
the GAS code optimisation in the PLIP project. The design takes
significant advantage of the structural information provided by
the retained Rcodes.
Chapter 4 
Three Address/Stack GAS Machine
This thesis has involved considerable work on the refinement of
the concept of a Generic Action Set of instructions for a generic
machine. The generic machine and its instructions represent the
architecture for a family of machines. A generic machine and code
set has been developed for a family of machines that includes the
VAX and many other common processors such as, for example, the
Motorola 68000, Zilog Z8000 and Data General MV series. This GAS
machine and code design has been incorporated in the code
generator being developed for the VAX as part of PLIP. A
Reference Manual and Users Guide Manual have been produced for
the GAS design and are included in appendix A and B. The work on
the VAX prototype code generator has provided an environment in
which the two level concept has been refined. A fuller
generalised development of the two level concept is planned but
is beyond the scope of this thesis. This development will involve
application to families such as zero address machines, and
multi processor machines. In particular it is hoped to produce a
design for transputer based machines in the near future.
This chapter describes the prototype design developed for
register/stack machines of which the VAX is a member. The design
of this GAS machine definition has consumed much of the time for
this thesis. It has been found that the design of the GAS machine
for a family is vital. If the definition does not accurately
model the significant factors affecting machine code generation
for machines in the family, the code produced for all machines
will be poor. The GAS machine concept offers much for
portability, but if the definition is not precise, portability
will be gained at the expense of code quality. One of the
important goals of the PLIP project is that code produced must be
of high quality. The prototype development has been essential to
demonstrate that both portability and quality code can be
produced by a two level code generator. If this were not
possible, the GAS machine concept would just have become another
interesting idea that did not live up to its promise. The
prototype development has shown that portability and quality of
target code can be achieved for register/stack based machines.
Further prototype development will establish if this will also be
true for zero address and multi processor machines.
GAS Machine and Interpretation
As stated in the previous chapter, it should be possible to
interpret GAS code using the semantics defined for the generic
machine. Interpretation would involve accepting a stream of
Rcodes containing GAS code, in post order form, or an Rcode tree,
and ignoring Rcodes except declaration Rcodes and Rcode reserved
extensions containing GAS codes. The GAS codes are then
"executed". The declaration Rcodes are equivalent to creating a
new memory segment, either data or code. The machine therefore
emulates a segmented virtual memory machine. If the GAS code is
interpreted on a conventional machine, a database will be
maintained which records the location in real memory of each
object (storage area, routine, temporary etc) declared in the
Rcode. This database is similar to segment tables in a segmented
virtual memory machine. References to routines, or objects in
memory will require access to this database.
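The interpretation scheme can be sketched in a few lines. This is only a toy under stated assumptions: the stream items ("DECLARE", "GAS", and anything else), the three opcodes and the single flat memory list are hypothetical stand-ins for declaration Rcodes, reserved Rcodes carrying GAS codes, and the host machine's memory.

```python
def interpret(stream):
    """Execute GAS items in a post order stream, ignoring structural Rcodes."""
    memory = []      # real memory of the host machine
    database = {}    # object name -> base address, like a segment table

    def load(name):
        return memory[database[name]]

    for item in stream:
        kind = item[0]
        if kind == "DECLARE":              # declaration: allocate a segment
            _, name, size = item
            database[name] = len(memory)
            memory.extend([0] * size)
        elif kind == "GAS":                # reserved Rcode carrying a GAS code
            _, op, a, b, dest = item
            if op == "MOVE_CONST":
                memory[database[dest]] = a
            elif op == "ADD":
                memory[database[dest]] = load(a) + load(b)
            elif op == "MULT":
                memory[database[dest]] = load(a) * load(b)
        # any other Rcode carries structure only and is skipped here
    return {name: memory[base] for name, base in database.items()}


stream = [("DECLARE", n, 1) for n in ("X", "Y", "Z", "A", "Temp_1")] + [
    ("GAS", "MOVE_CONST", 2, None, "X"),
    ("GAS", "MOVE_CONST", 3, None, "Y"),
    ("GAS", "MOVE_CONST", 4, None, "Z"),
    ("FOR",),                              # structural Rcode, ignored
    ("GAS", "ADD", "X", "Y", "Temp_1"),
    ("GAS", "MULT", "Z", "Temp_1", "A"),
]
final = interpret(stream)
```

Every operand reference goes through `database`, which mirrors the point that references to objects in memory require access to the database.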
Family definition for Prototype GAS Machine Design 
The previous chapter introduced the concept that a generic action
set for a generic machine represents a "family" of machines,
argued the case for its existence, described its relationship to
Rcode, and discussed what should be achieved in the move from
Rcode to the GAS code level.
The factors discerned as important for the second level code
evolved during the design of a GAS machine prototype for the
register/stack based family. These are not an exhaustive set of
requirements for a generic machine. It is expected for example,
that the design of a generic machine definition for transputer
based machines will involve additional important concepts. A
more complete generic machine conceptual model will no doubt be
established during later more in-depth studies of the two level
code generator concept. The issues that appeared during the
development of the VAX family GAS machine prototype are now
discussed and provide an insight into the major issues involved
in defining a new family.
The model for a GAS machine requires an architectural definition
and an instruction set. The architecture defines such things as
stack facilities and any associated pointers, the structure of
memory, addressing modes for operands, and types of operands
handled. The instruction set describes GAS machine operations
that are available, but will also include some "pseudo"
operations, involved with declaration of objects. Additionally
Rcode declaration codes are significant in the GAS machine model
for declaring code and memory objects.
The instruction set provides an important definition of the
family, and generally reflects architectural features of the
family. For a machine such as the VAX, instructions include:
ALU          arithmetic and logical
Memory       movement
Stack        push/pop
Control      mainly conditional branching
Subroutine   Call/Return
Arithmetic instructions essentially involve operations on
statically sized objects, of size 8, 16, 32, 64 bits. These objects
are multiples of bytes, and if located in memory must be located
on byte boundaries.
The types of arithmetic instructions will determine how block
arithmetic instructions are broken down. The types of control
instructions will determine how the control structures of Rcode
are represented.
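How a block arithmetic operation might be broken down into the fixed sizes above can be sketched as follows. The ADD and ADDC (add with carry) opcodes, the offset notation, the greedy largest-first choice and the least significant part at the lowest address are all assumptions made for the example, not features taken from the GAS definition.

```python
# Object sizes in bytes, corresponding to 64, 32, 16 and 8 bit objects.
SUPPORTED_SIZES = (8, 4, 2, 1)


def break_down_block_add(a, b, dest, nbytes):
    """Emit (opcode, size, src1, src2, dest) tuples for an nbytes wide add.

    Parts are emitted least significant first so each later part can add
    in the carry produced by the one before it.
    """
    code, offset = [], 0
    while offset < nbytes:
        remaining = nbytes - offset
        size = next(s for s in SUPPORTED_SIZES if s <= remaining)
        opcode = "ADD" if offset == 0 else "ADDC"
        code.append(
            (opcode, size, f"{a}+{offset}", f"{b}+{offset}", f"{dest}+{offset}")
        )
        offset += size
    return code


parts = break_down_block_add("P", "Q", "R", 13)
```

A 13 byte add is split into an 8 byte, a 4 byte and a 1 byte operation; a family with different arithmetic instruction types would simply use a different size table.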
How the Rcode tree structure and intermediate result values are
handled on the target machine is significant. Rcode is tree based
which means that storage for intermediate results is implied. An
internal stack of intermediate results would be typical of any
implementation of the Rcode machine. The Rcode machine is
therefore very much aligned with a Stack (zero address)
architecture. The handling of intermediate results is done
implicitly for the programmer. For a machine with registers and
without zero address instructions, the code must be linearised
and the destination for all instructions explicitly specified.
The programmer is therefore concerned with the handling of
intermediate results. The essential advantage of registers is
that values that are to be used earliest in the following code can be
kept in registers to avoid unnecessary memory references. In a
zero address machine, many instructions do not require space for
operand specification, but all temporary results are kept on the
stack which is kept in memory. Hence all operands will require
memory references. With accelerators such as cache memory, this
loss of use of registers is compensated to some extent, and is
balanced by many instructions not requiring memory references to
obtain operand addresses. Conversely instruction caching and
prefetching to some extent negates the advantage of zero address
instructions not requiring operands. To handle common
subexpressions requires that intermediate results be available
at random, rather than in the regulated manner that a tree
structure requires in which values will reappear automatically at
the top of the stack when required again. To implement common
subexpressions on a stack machine, common expression results
required could be accessed using an offset into the stack. This
is messy because the offset would have to be computed each time
it is required, and when the value is no longer required it
cannot be easily removed from the stack. A more practical
approach is a second stack that would be used to allocate local
variables for a routine and would also be used to allocate space
for temporaries. When a common subexpression value is required,
the value can be obtained from the variable stack and pushed on
the computation stack. A register machine has the advantage that
the common value can be stored in a register. Referencing the
value later in the code is simple and fast. Therefore whether or
not a machine has registers is obviously a significant factor in
defining a family.
An alternative to register based and zero address machines would
be an accumulator based architecture. In a zero address
architecture, GAS instructions for arithmetic and logical
operations would not have any operands. For an accumulator based
machine, the location of one operand is implicitly in the
accumulator, and the GAS code would reflect this by arithmetic
and logical instructions that only have one operand. If a machine
has several registers, arithmetic and logical GAS instructions
will typically have three operands. It would be
possible to create a family for register based machines that have
arithmetic and logical instructions that are always two address.
The three address form is more general and can be used for two
address target machines as two address instructions can always be
derived from three address instructions. However if a GAS code is
developed that is two address, then the GAS code will more
closely model the final target machine code form if the target
machine is two address.
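The claim that two address instructions can always be derived from three address instructions can be made concrete with a small sketch. It assumes a three address form (op, a, b, dest) meaning dest := a op b, and a two address form (op, src, dst) meaning dst := dst op src; the MOVE opcode and the scratch name are hypothetical.

```python
COMMUTATIVE = {"ADD", "MULT"}


def to_two_address(op, a, b, dest, scratch="Temp_S"):
    """Derive two address instructions from one three address instruction."""
    if dest == a:
        return [(op, b, dest)]                 # dest := dest op b
    if dest == b:
        if op in COMMUTATIVE:
            return [(op, a, dest)]             # dest := dest op a
        # dest := a op dest with a non commutative op: go via a scratch
        return [("MOVE", a, scratch), (op, b, scratch), ("MOVE", scratch, dest)]
    return [("MOVE", a, dest), (op, b, dest)]  # copy a, then combine with b


general = to_two_address("ADD", "X", "Y", "Temp_1")
in_place = to_two_address("MULT", "Z", "Temp_1", "Temp_1")
awkward = to_two_address("SUB", "X", "A", "A")
```

The extra MOVE instructions in the general and non commutative cases show why a natively two address GAS code would model a two address target more closely.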
Therefore whether a machine is register, zero address,
accumulator, three or two address based will be very significant
when defining a GAS machine. In fact whether a machine is
register based is not as fundamental at the GAS level as whether
the machine is two or three address. Registers are only important
in deciding the location of operands. Register machines are
likely to be two or three address machines, and therefore require
temporaries for holding intermediate subexpression results.
Therefore the GAS code for these machines is likely to be two
address or three address, and include provisions to declare
temporaries, but this would be no different from a memory
oriented two or three address machine that had no general
registers.
The subroutine call structure of Rcode (lexical level structuring
and scope, arguments, results, and local objects) must be coded
efficiently in the target machine. The basic facilities provided
by the family for implementing such subroutine calls will
significantly affect much of the code. As described in the
previous chapter, a decision has also been taken in the PLIP
project to implement a general calling standard in the interest
of portability of user code. Therefore the subroutine calling
conventions implemented in the target machine instruction set may
not necessarily be used, unless they allow conformance to the
family standard. Only simple CALL and RETURN instructions are
required. However for the VAX family machines, a stack is
assumed available on which the result, arguments and local
storage objects can be located and efficiently accessed.
Efficient access really requires that pointers are available that
can be used to access the result, arguments and local objects
efficiently. It is also important that the GAS code retains
information on the structure of subroutine calls. This is
provided by the GAS machine calling convention and the retention
of the Rcode routine declaration Rcode and the "Call" Rcode.
Some Rcodes such as "Add In Place" will be implemented in the
family GAS code by specifying disabling of interrupts if the
target machine does not have a suitable equivalent instruction.
Therefore a machine belonging to the family must have
instructions to disable and enable interrupts.
Other factors such as addressing modes and data types supported
by various instructions are also very important. However these
two factors are not useful for defining a family. As discussed in
the previous chapter, Rcode implies the addressing of objects in
terms of offsets into storage areas, with indirect addressing
allowed. Complex addressing modes allow information on the
address computation structure to be retained to some extent. The
level to which this information needs to be retained depends on
the complexity of addressing modes of the target machines in the
family. The GAS operand addressing modes should allow the more
complex addressing modes commonly available on target machines in
the family to be used. However it is not required that all
machines in the family have a certain set of "typical" addressing
modes, therefore addressing modes themselves are not part of a
family definition. Complex GAS operand addresses will be broken
down into a sequence of computations if the target machine does
not support the level of complexity present in the GAS
instruction address. In the GAS machine prototype developed,
access to stack objects via pointers is modelled by GAS machine
pointers. The data types supported by target machines and
operations available on each data type often vary greatly from
machine to machine. Requiring that machines in a family must have
very similar datatype support would produce many more GAS
families. In fact all that is important for defining the VAX
family in regard to data type support is as stated above; for
machines in the VAX family, arithmetic operations must involve a
small number of statically sized objects. Individual target
machine data type support is instead taken into account during
Rcode, GAS code and target machine code generation on a local
level using tables of information, but machines in the family do
not have to have support for certain datatypes. The constraint
that operations are on fixed sized objects limits the range of
machines that will match the family, but allows GAS code to be
closer to the target code for the machines in the family, making
the second level code more useful. If operations on dynamic and
block operands had been allowed, more machines would have been
included in the family, but GAS code generated for machines which
actually only allow operations on fixed sized objects would not
reflect the breakdown of oversized and dynamic operand
operations, which occur in Rcode via block exact arithmetic
operations. Additionally, the constraint is applied that operands
for arithmetic operations must begin on byte boundaries. The GAS
code produced should therefore move operands of arithmetic
operations not located on byte boundaries into temporaries.
The VAX family GAS machine prototype includes bitstring
facilities. Bitstring operations include the logical operations
NOT, OR, XOR and AND plus a SHIFT and an EXTRACT instruction.
They operate on bitstrings that can be any size and begin at any
bit location in memory. As an example of use, they will be
generated to ensure that the operands of any arithmetic
operations that involve objects not located on byte boundaries
are extracted into temporaries so that the arithmetic operation
can be performed. This models what will be done to extract values
from bit packed records and arrays into registers for
computation. This does not necessarily imply that the target
machine must have all these bitstring instructions. If the target
machine does not have a certain bitstring instruction, the GAS
bitstring instructions must be converted into a series of target
machine instructions involving logical operations that are
available such as OR, AND and shifts. Unlike the arithmetic
operations, GAS bitstring operations are not limited to operands
of fixed sizes. This is because the range of bitstring operations
of machines is usually quite wide, some degree of dynamically
sized and large sized operands often being provided. If the
target machine bitstring operations require operands to be sized
in multiples of bytes, the massaging must be done by the target
code generator. The only requirement for a machine to belong to
the family is that it is capable somehow of emulating the
bitstring operations in the GAS code, even if this only involves
AND and OR of register contents.
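
The emulation of a bitstring EXTRACT by shifts and an AND can be
sketched as follows. This is only an illustrative model, not the
prototype's implementation: the function names, the use of a Python
bytes object as "memory", and the bit numbering (bit 0 is the least
significant bit of byte 0) are all assumptions made for the example.

```python
def extract_bits(memory: bytes, bit_address: int, size: int) -> int:
    """Return `size` bits starting at absolute bit `bit_address`.

    Assumed bit numbering: bit 0 is the least significant bit of
    byte 0, and later bytes hold more significant bits.
    """
    if size < 0 or bit_address < 0 or bit_address + size > len(memory) * 8:
        raise ValueError("bitstring extends past the storage area")
    first_byte, shift = divmod(bit_address, 8)
    last_byte = (bit_address + size + 7) // 8
    # Gather the covering bytes into one integer, low byte first.
    word = 0
    for i, byte in enumerate(memory[first_byte:last_byte]):
        word |= byte << (8 * i)
    # The shift drops the bits below the field; the AND drops those above.
    return (word >> shift) & ((1 << size) - 1)


def add_unaligned(memory: bytes, bit_a: int, bit_b: int, size: int) -> int:
    """Add two unaligned fields by first extracting them into temporaries."""
    temp_a = extract_bits(memory, bit_a, size)
    temp_b = extract_bits(memory, bit_b, size)
    return (temp_a + temp_b) & ((1 << size) - 1)
```

Here the two local variables play the role of the temporaries into
which unaligned operands are moved before the arithmetic operation.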
Families and Constraints
The central issue of family definition should be becoming clearer.
Defining a family was stated as finding architectural
features common to a group of machines. This is perhaps more
accurately described as finding constraints that are mutually
common to a significant group of machines. All the constraints
should be common to all machines in the family. The larger the
number of constraints the greater the number of common code
generation steps that can be represented in the Rcode to GAS code
generation step, and the more of the final target code objects
that will be represented, thus aiding the GAS code optimiser and
code generator. An example of how more target machine data
objects are represented by applying constraints is that extra
temporaries used to massage operands to the same size will be
created if a constraint is applied that operands must be of the
same size. The architectural features such as two address versus
three address can be seen as a constraint factor. Two address
code is more constraining than three address code; it allows GAS
code to represent the common step for all machines in such a
family to convert three address Rcode operations to two address
operations, creating the extra temporaries required. It is
interesting to note that constraints really represent limitations
in the target machine architecture. A machine with an
architecture with few constraints:
    three, two, one and zero address instructions
    operands can be of differing sizes
    operands can be dynamically or statically sized
    operands can begin on any bit
would be quite simple to generate target code for from Rcode. The
more limited a machine architecture is, the more work that
is involved in generating target code from GAS code. The more
that common constraints can be recognised amongst machines, the
more the common work that can be done by a GAS code stage. Too
many constraints will result in families that have only one
member, so the advantage of the GAS code concept would be lost.
Finding a balance is the key to a successful GAS code design.
This balance for the GAS family took a long time to discern, but
now that it has been done the concepts are much clearer and a GAS
definition for say a transputer based family would be easier to
develop.
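
The common step of converting three address operations to two
address operations, with temporaries created only where needed, can
be sketched as below. The instruction records, the MOVE opcode, the
temporary names T1, T2, ... and the set of commutative operations are
assumptions made for this illustration and are not taken from the
GAS code definition.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ThreeAddr:
    op: str
    dest: str
    src1: str
    src2: str          # meaning: dest := src1 op src2


@dataclass(frozen=True)
class TwoAddr:
    op: str            # "MOVE": dest := src; otherwise dest := dest op src
    dest: str
    src: str


COMMUTATIVE = {"ADD", "MUL", "AND", "OR", "XOR"}   # assumed opcode names


def to_two_address(code):
    """Rewrite three address operations as two address operations."""
    temps = (f"T{n}" for n in count(1))
    out = []
    for ins in code:
        if ins.dest == ins.src1:
            out.append(TwoAddr(ins.op, ins.dest, ins.src2))
        elif ins.dest == ins.src2 and ins.op in COMMUTATIVE:
            out.append(TwoAddr(ins.op, ins.dest, ins.src1))
        elif ins.dest == ins.src2:
            # dest must not be overwritten before src2 is read,
            # so the result is built in an extra temporary.
            temp = next(temps)
            out.append(TwoAddr("MOVE", temp, ins.src1))
            out.append(TwoAddr(ins.op, temp, ins.src2))
            out.append(TwoAddr("MOVE", ins.dest, temp))
        else:
            out.append(TwoAddr("MOVE", ins.dest, ins.src1))
            out.append(TwoAddr(ins.op, ins.dest, ins.src2))
    return out
```

For example, a := b - a needs a temporary because the destination is
also the second source, whereas a := b + c does not.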
For a factor to be important in describing a family, it must be
necessary for all machines in the family to conform to the
factor. Therefore if two address instructions are a requirement,
then all machines must be two address. If arithmetic operations
must involve fixed sized objects, this is a constraint. However
the addressing modes of the GAS machine are not a constraint.
Machines in the family do not have to have these addressing modes
to be included in the family. Similarly data type support is not
a factor.
GAS machine and code for family containing VAX
Characteristics of family
a) May have two or three address instructions
   (not accumulator or zero address instructions
   and probably has general purpose registers)
b) Has a stack with pointer register support for
   allocating and accessing arguments and local
   variables of routines
c) An instruction set:
   - ALU arithmetic instructions
     that operate on a small range of statically
     sized objects. All objects need not be of the
     same size, but must be of the same general type.
     All memory operands must be byte aligned
   - Memory movement of bytes, either
     dynamically sized or statically sized
   - Stack instructions to Push/Pop bytes
   - Control instructions that essentially involve
     conditional and unconditional branching
   - Simple subroutine call and return
   - Interrupt enable and disable
   - Instructions that allow logical
     operations OR, AND, XOR, NOT to
     be emulated
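
Treating the characteristics above as constraints that every member
must satisfy, family membership can be sketched as a simple check.
The Machine record and its field names are hypothetical and cover
only part of the list; they are not part of the GAS definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical summary of a target machine's relevant features."""
    name: str
    address_forms: frozenset           # operand counts offered, e.g. {2, 3}
    stack_with_pointer_registers: bool
    fixed_size_arithmetic: bool
    byte_aligned_memory_operands: bool
    interrupt_enable_disable: bool
    can_emulate_logical_ops: bool


def violated_constraints(machine: Machine) -> list:
    """Return the names of the family constraints the machine fails."""
    checks = [
        ("two or three address instructions",
         bool(machine.address_forms & {2, 3})),
        ("stack with pointer register support",
         machine.stack_with_pointer_registers),
        ("arithmetic on statically sized objects",
         machine.fixed_size_arithmetic),
        ("byte aligned memory operands",
         machine.byte_aligned_memory_operands),
        ("interrupt enable and disable",
         machine.interrupt_enable_disable),
        ("logical operations can be emulated",
         machine.can_emulate_logical_ops),
    ]
    return [name for name, satisfied in checks if not satisfied]


def in_family(machine: Machine) -> bool:
    """A machine belongs to the family only if no constraint is violated."""
    return not violated_constraints(machine)
```

Addressing modes and data type support are deliberately absent from
the record, since the text does not treat them as family constraints.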
Temporaries 
To allow linearisation of Rcode, temporary storage locations can
be defined using the USEVAR GAS code. The basictype and unique id
assigned to the temporary must be provided. As it may become
obvious that a temporary is no longer required, a DELVAR GAS code
is provided so that database entries related to the temporary can
be deleted. The basictype size indicates the required size for
the temporary, but smaller size values, or parts of the value
stored may be used at a later stage. References may be made to
values in temporaries by giving the temporary identification, and
a byte offset and bit offset to the start of the value in the
temporary. These offsets may be computed.
Addressing Modes
Operands will often require an address which marks the beginning
of a location from which to start taking a source value or to
which a value is to be written. For all objects except
bitstrings, these locations are always byte boundaries. For
bitstring operations, they can be any bit position. An object in
memory can be referred to in several ways:
a) specifying the identity for the area in which it is
located, a byte offset into the area, plus a bit offset.
Both of these offsets can be computed
OR
b) a stack pointer register provides the memory address
OR
c) the memory locations provided by (a) and (b) could be
considered to contain the address of the required memory
location, i.e. indirection
Additionally, an "extra offset" can be added to the address
provided by (a), (b) and (c). This extra offset allows
modelling of source language concepts such as accessing
the fields of a record that is referenced via a pointer.
The addressing modes effectively provided are:
immediate
storage
    value from which source value can be obtained
    is explicitly provided
direct
    direct reference to a memory location
    requires
        a storage area id
    and
        static bit and byte offset to that area
    and
        static extra offset
indexed
    a) reference provided by storage area id
       and any of the bit, byte or extra
       offsets are computed
    or
    b) stack pointer contents provides address
       (additionally any of the bit, byte and
       extra offsets may be computed to give
       more levels of indexing)
indirection
    direct and indexed addresses above contain
    an address which is added to the extra offset
    or
    a value in a temporary holds an address
    which is added to the extra offset
Bit and byte offsets must be a basictype value specified by an
immediate value, or computed into a temporary or a memory
location whose address involves a static byte offset, and nil bit
offset.
The identity of a memory area may need to include reference to
the identity of the module from which it has been imported.
It could be argued that these addressing modes are not as complex
as are those on some target machines, the VAX included, but the
modes provided are adequate to efficiently implement the
constructs found in the intended source languages.
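
How the pieces of an operand address combine can be sketched as
follows. This is a simplified model under stated assumptions: the
Operand record and its field names are invented for the example, the
extra offset is taken to be in bytes, and an indirect operand is
required to have a zero bit offset on the location holding the
address. None of these details is fixed by the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Operand:
    area: Optional[str] = None       # storage area id, form (a)
    pointer: Optional[str] = None    # stack pointer name, form (b)
    byte_offset: int = 0
    bit_offset: int = 0
    indirect: bool = False           # form (c)
    extra_offset: int = 0            # assumed to be a byte count


def effective_address(op: Operand,
                      area_base: Dict[str, int],
                      pointers: Dict[str, int],
                      read_address: Callable[[int], int]) -> Tuple[int, int]:
    """Return (byte address, bit offset) for a memory operand."""
    if (op.area is None) == (op.pointer is None):
        raise ValueError("exactly one of area or pointer must be given")
    base = area_base[op.area] if op.area is not None else pointers[op.pointer]
    location = base + op.byte_offset
    if op.indirect:
        if op.bit_offset != 0:
            raise ValueError("model assumes byte aligned address cells")
        # The located cell holds the address of the required object.
        return read_address(location) + op.extra_offset, 0
    return location + op.extra_offset, op.bit_offset
```

A field of a record reached through a pointer held in a local
variable would then be an indirect operand with the field's
position as the extra offset.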
Data types supported
The datatypes supported are the same as for Rcode (Rcode
Basictypes of signed/unsigned integer, real, decimal), blocks of
Basictype objects and bitstrings. Block operands are only
supported for memory move, compare and search instructions. Block
arithmetic operations must be reduced to a series of operations
on Basictype objects. This reflects that most machines do not
provide arithmetic operations on blocks of memory, but do provide
block memory move, compare and search instructions. The block
structure of such operations is made clear in Rcode and should be
retained in GAS code rather than be expressed as operations on a
series of Basictype objects. GAS memory move instructions include
LOAD, PUSH and POP.
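
The reduction of an oversized arithmetic operation to a series of
operations on fixed size objects can be sketched for unsigned
addition as below. The function name, the representation of
operands as Python integers and the explicit carry are assumptions
made for the illustration; the text does not specify how carries
are represented in GAS code.

```python
def add_in_chunks(a: int, b: int, total_bits: int, max_bits: int):
    """Add two unsigned values using adds no wider than `max_bits`.

    Returns (result, carry_out, steps); each step records one
    fixed size addition as (bit position, part of a, part of b,
    partial sum, carry out).
    """
    if total_bits <= 0 or max_bits <= 0:
        raise ValueError("sizes must be positive")
    carry = 0
    result = 0
    steps = []
    for shift in range(0, total_bits, max_bits):
        width = min(max_bits, total_bits - shift)
        part_mask = (1 << width) - 1
        part_a = (a >> shift) & part_mask
        part_b = (b >> shift) & part_mask
        total = part_a + part_b + carry
        carry = total >> width           # carried into the next chunk
        result |= (total & part_mask) << shift
        steps.append((shift, part_a, part_b, total & part_mask, carry))
    return result, carry, steps
```

With a 64 bit operand and a largest supported size of 32 bits this
produces two additions, the second of which consumes the carry of
the first.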
A Basictype operand is specified in GAS instructions by a
basictype specifier which is identical to the Rcode basictype
specifier, and a reference indicating the operand value or
location. Basictype operands are taken from an immediate value or
the contents of a temporary or a memory location. A Basictype
coded byte must be given to indicate the size of the operand. If
taken from an immediate value, the operand will be taken
beginning at the least significant bit of the immediate value. If
taken from memory, the operand will be taken from the bit whose
address is given. If taken from a temporary, the value is taken
from the bit at the offset specified.
Block operands consist of a block of basictype objects. The size
of the block in terms of number of basictype objects is
specified by one basictype operand, and the start of the block
and the basictype of each unit provided by another. Block
operands always involve objects in memory, or constant,
initialised storage areas. Therefore a bit address must be
provided or a bit offset into a constant provided. If the block
is in fact a single basictype object, the size operand will be an
immediate value that specifies a length of one. This allows
clear identification of memory move instructions that involve a
basictype value and not a block. If the block only involves a
series of bytes, the basictype specifier can indicate an unsigned
type one byte long, and the size operand the number of bytes.
Bitstring operands may be taken from an immediate value, or
located at any bit in memory, temporary, or constant area. They
must also be provided with a size, which must be a basictype
value which is immediate or located in a temporary, or at a
location in memory that is specified with a static byte offset,
and bit offset.
The data types supported are therefore quite extensive and should
exceed most of those available on the target machine. In
practice, GAS operations will not necessarily be on objects
supported by the target machine. Within one machine there is
often a complex variation in the datatypes that will be supported
by different instructions. Instead, the GAS generator need only
choose a Basictype that is as big or perhaps slightly larger than
is supported for the operation involved. The target code
generator then can easily generate code for the GAS code. However
the closer the GAS code operand is to a supported size, the more
effective the GAS code optimisation will be. An alternative
approach would be to reduce operands to the same size or
smaller than the largest size supported by the target. The GAS
code optimiser would be effective on this code, but the code may
be poor if many operands are smaller than necessary. The choice
of approach depends on the amount of information made available
to the GAS code generator. Providing information on all the
available data sizes for each data type supported by each GAS
code operation could be significant. The approach taken has been
to represent the largest size of operand that can be handled by
the target machine for a given GAS code such as ADD. When
ADD operations on large sized objects are encountered, they will
be broken down into operations on objects of this size. If
the size shown is not actually supported by the target machine in
the specific circumstances (e.g. a register that allows ADD on
objects of this size can't be used) a few extra target machine
code instructions may have to be generated. This will not be
difficult. Therefore accurate representation of target machine
data types supported under all possible situations is not vital
for GAS code generation.
Memory Structure
The GAS machine memory structure required much thought. The
desire that GAS code reflect the nature of target machine code
meant that code to reference local variables and parameters
should reflect the fact that such objects would normally be
stored on a stack. However it would require a lot of effort for
the GAS code generator to accurately model the target machine
stack as it will be at runtime. Additionally the contents of the
stack may change with optimisation and when the detailed target
machine code is generated. Objects that appear dynamically sized
may turn out to be statically sized. Objects not referenced may
be omitted. Additional objects may appear on the stack.
Alignment constraints make the possible effects of objects
disappearing and appearing quite complex. The approach taken is
to have a stack which consists of "objects". An object is one
complete Rcode variable, GAS temporary, Rcode Basictype object
(variable or immediate) or GAS pointer. Each stack location
contains one object. When Rcode declarations for a local object
of a routine are encountered, the object is assigned the next
object location on the stack, and a PUSH instruction is
generated. The PUSH instruction has three operands: an alignment
operand in Rcode byte form that specifies the required alignment,
a basictype operand to specify the size of the block to push, and
a basictype operand that specifies the type of basictype object
and the first basic object in the block. To allocate space for an
Rcode variable, the operand that specifies the first basic object
in the block will be "NIL". When target machine code is
generated, the PUSH instruction will probably be converted into
an increment of the stack pointer (probably for all local objects
in one increment). Therefore the GAS machine will not involve the
use of dope vectors for dynamically sized objects. When objects
on the stack are referenced, they will be referenced by their
"object offset" plus a byte and bit offset within the object. As
stack pointers are maintained (see below) the offset for local
variables of routines will be from a pointer LOCALSTORE. This
implies that the GAS code generator must keep a database of
object offsets assigned to Rcode objects, so that later
references to these objects in the Rcode can be converted to
their object offsets. If the GAS code containing Rcode tree is
written out to disk, this database will be lost. However, the
Rcode declarations should be retained, so the database can be
rebuilt when the Rcode is rebuilt. Arguments and results will
similarly be pushed onto the stack. A description of the GAS
machine pointers and calling convention is given below.
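
The database of object offsets kept by the GAS code generator can
be sketched as below. The class, its method names and the tuple
encoding of the PUSH instruction and of operand references are
assumptions made for the illustration; only the idea of one stack
location per object, referenced relative to LOCALSTORE, comes from
the text.

```python
class LocalObjectTable:
    """Assigns object offsets to the local objects of one routine."""

    def __init__(self):
        self._offsets = {}    # Rcode object id -> object offset
        self.code = []        # GAS instructions emitted so far

    def declare_local(self, rcode_id, alignment, count, basictype):
        """Give a declared object the next object location and emit PUSH."""
        if rcode_id in self._offsets:
            raise ValueError(f"object {rcode_id!r} already declared")
        offset = len(self._offsets)
        self._offsets[rcode_id] = offset
        # "NIL" as first object: space is allocated, nothing is copied.
        self.code.append(("PUSH", alignment, count, (basictype, "NIL")))
        return offset

    def reference(self, rcode_id, byte_offset=0, bit_offset=0):
        """Convert a later Rcode reference into an object relative operand."""
        return ("LOCALSTORE", self._offsets[rcode_id], byte_offset, bit_offset)
```

Because offsets count objects rather than bytes, an object whose
size changes during optimisation does not disturb the references to
the objects that follow it.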
Global storage areas will represent storage areas defined in the
Rcode. The GAS machine will retain the same global storage
structure as the Rcode. This involves the concept of several
independent storage areas. The sizes of these areas will be
defined in Rcode declare and append area codes and the import
area code. These Rcodes will be retained and no GAS code will be
directly generated for them. The target code generator will
convert them into suitable portable linker directives. GAS
operands that reference an object in a global storage area will
do so by specifying the storage area id, module id if the area is
imported, and a byte and bit offset into the area.
If a GAS instruction has an operand that would extend past the
static storage area or stack object in which it is located, the
behaviour of the program will be unpredictable. When an operand
extends beyond a stack object, it may extend into the following
object on the stack.
Code Labels and Addresses
The structure for memory containing compiler generated code is
mainly important in terms of identifying routines and targets of
branch instructions. The actual size of storage used and
alignment requirements are not important to the front end or for
the GAS code, as long as entry points for procedures and the
targets of branches can be declared and referred to.
In any computing machine, code path control structures are an
essential component. The GAS machine must model the typical
facilities provided by its family for code path control.
Fundamental to this are the addressing methods used to specify
the targets of subroutine calls and branches. In the GAS machine
these cannot be specified as real addresses, or relative
displacements, because these require that target machine
instructions have been determined so that the addresses of, or
displacements to, the target addresses are known. GAS code
operands must be generated for GAS CALL and branch instructions
that specify the target in a way that doesn't involve an explicit
target machine address or displacement.
Many of the branches will be required to implement Rcode control
structures such as loops and case structures. The GAS code pseudo
instruction USEVAR can be inserted in the GAS code to mark points
that will be used as targets of branch instructions used to
implement these control structures. As will be discussed later in
the chapter on GAS code optimisation, basic code block
identification and flow analysis will primarily be involved with
looking for USEVAR codes and branch instructions that refer to
these labels. GAS branch instructions can use the label id as a
target specifier. The target code generator will convert the uses
of the label into the address allocated to the label. A GAS code
interpreter will interpret uses of the labels as references to
the GAS code following the USEVAR code that defined the label.
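
The conversion of label uses into allocated addresses by a later
phase can be sketched as a two pass resolution. The tuple encoding
of instructions, the "LABEL" tag standing in for the label marking
pseudo instruction, the branch opcode names and the fixed
instruction size are all assumptions made for the illustration.

```python
BRANCH_OPS = {"BRANCH", "BRANCH_IF"}   # assumed branch opcode names


def resolve_labels(gas_code, instruction_size=1):
    """Replace label ids in branches by the address given to each label."""
    # Pass 1: a label takes no space and names the next instruction.
    addresses = {}
    address = 0
    for ins in gas_code:
        if ins[0] == "LABEL":
            addresses[ins[1]] = address
        else:
            address += instruction_size
    # Pass 2: drop the markers and patch the branch targets.
    resolved = []
    for ins in gas_code:
        if ins[0] == "LABEL":
            continue
        if ins[0] in BRANCH_OPS:
            resolved.append((ins[0], addresses[ins[1]]) + tuple(ins[2:]))
        else:
            resolved.append(tuple(ins))
    return resolved
```

Because the first pass sees every marker before any branch is
patched, forward references need no special treatment.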
Some GAS branches will involve the direct implementation of Rcode
jumps to targets that will involve labels that are defined in the
Rcode by the Declare_Label Rcode. These jumps may also involve a
return to a different "environment" (see below under subroutine
calling conventions). The aspects normally involved in change of
environment on a target machine such as stack unwinding are
assumed to be automatic in the GAS machine. The target of the GAS
instruction will be specified by an operand that refers to the
label by the Rcode defined label id and lexical level. The
"Declare_Label" codes are left on the Rcode tree by the GAS code
generator to mark their location for following phases of the
compiler. A GAS code interpreter would interpret the use of the
label as a pointer to the Rcode subtree following the Rcode that
declared the label. However it may also have to unwind the
current "environment".
Subroutine Calls in GAS code will require a target address. The
GAS CALL instruction will be produced in response to the Rcodes
"Call" and "Fast_Call". The generation of the target addresses
will ultimately be traced to a Refer_Routine Rcode which returns
a routine descriptor, part of which contains the routine address.
When the GAS code generator encounters a Refer_Routine Rcode, it
should generate a GAS operand that refers to the routine by the
Rcode assigned id and lexical level. For some GAS codes, this
operand will be assumed to return a routine descriptor. If this
operand is supplied as the target of a GAS CALL instruction the
address part of the routine descriptor is assumed to be the
target address. The environment pointer will be relevant for a
"Call" Rcode, but not a "Fast_Call" Rcode. If the target of the
call is computed the contents of the storage location specified
in the GAS CALL instruction provides the target address of the
call. Note that the storage location value will have been filled
by a value that could be traced to a Refer_Routine Rcode. GAS
code, as stated earlier, does not involve real target machine
code addresses. The "Refer_Routine" is the only way the address
of a routine can be described in Rcode. "Refer_Routine" Rcode
and GAS operands that specify a routine id and lexical
level are expected to return an address in the target machine. A
Rcode or GAS code interpreter encountering these would have to
generate a target machine pointer size value that uniquely
identifies the relevant program points. The values wouldn't have
to be true target machine code addresses. They could be
identification codes for interpreter database records for labels
or routines in the Rcode or GAS code.
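
One way an interpreter might hand out such identification codes is
sketched below. The class, its method names and the default pointer
size are hypothetical and serve only to illustrate the idea of
pointer size values that index database records.

```python
class ProgramPointTable:
    """Maps labels and routines to unique pointer size codes."""

    def __init__(self, pointer_bits=32):
        self._limit = 1 << pointer_bits   # codes must fit in a pointer
        self._codes = {}                  # key -> identification code
        self._records = []                # identification code -> key

    def refer(self, kind, ident, lexical_level):
        """Return the code for a label or routine, creating it if new."""
        key = (kind, ident, lexical_level)
        if key not in self._codes:
            if len(self._records) >= self._limit:
                raise OverflowError("no pointer size codes left")
            self._codes[key] = len(self._records)
            self._records.append(key)
        return self._codes[key]

    def lookup(self, code):
        """Recover the database record that a stored code stands for."""
        return self._records[code]
```

A value stored by one instruction and later used as a computed call
target can then be mapped back to its routine with lookup.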
Runtime Structures
This deals with the support the GAS machine will provide for
maintaining runtime structures. These structures are related
exclusively to structures associated with subroutine calls.
One of the more significant aspects of modern processor
architectures is the set of facilities provided to support
subroutine calls. Some architectures provide facilities required
for Fortran type structures, with no concepts of lexical nesting
and associated scope concepts. These processors provide call
mechanisms that provide efficient allocation of local storage for
the subroutine, and dynamic return from the subroutine, including
the automatic deallocation of local storage, and space occupied
by the arguments. Additionally the mechanism often provides
facilities for the automatic saving and restoration of all or
selected registers. Registers are provided that allow efficient
access to arguments and local variables. Lexical nesting and
scope must be handled by the programmer. Some processors provide
facilities to allow the automatic maintenance of access paths to
all objects "in scope" of the current procedure. Some processors
also provide for the inclusion of automatic facilities for
exception handling associated with the procedure call structure.
In the VAX, if an exception such as divide by zero or stack
overflow occurs, the context of the current procedure may define
the address of an exception handler which will be called. If the
address defined in the current context is NIL, the exception
handler definition in the caller environment will be examined,
and if not NIL, this exception handler will be called. If it too
is NIL the unwinding process continues down the dynamic
environment call chain until a non-NIL exception handler is
found. When a procedure is called, the default exception handler
will be that of the caller. The programmer can then define a new
handler at any time within the procedure. The concept of exception
handlers associated with procedures is not implemented in Pascal
or Algol but is present in ADA and the Portable Language
Implementation Project runtime library.
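
The search for an exception handler along the dynamic call chain
can be sketched as below. The Frame record and its field names are
assumptions made for the illustration and do not describe the VAX
call frame layout; None plays the role of NIL.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Frame:
    """One procedure activation on the dynamic call chain."""
    routine: str
    handler: Optional[Callable[[str], str]] = None   # None models NIL
    caller: Optional["Frame"] = None


def find_handler_frame(frame: Optional[Frame]) -> Optional[Frame]:
    """Walk towards the callers until a non-NIL handler is found."""
    while frame is not None:
        if frame.handler is not None:
            return frame
        frame = frame.caller
    return None


def raise_exception(current: Frame, name: str) -> str:
    """Call the nearest handler, unwinding past frames that have none."""
    target = find_handler_frame(current)
    if target is None:
        raise RuntimeError(f"unhandled exception: {name}")
    return target.handler(name)
```

A procedure that never defines its own handler therefore behaves as
if it had inherited the handler of its caller.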
Also associated with procedures is the handling of results. In
many cases an attempt will be made to return the result in a
register. An alternative is to return the result on the stack.
The caller code and subroutine code must conform to the same
convention as to how the subroutine arguments and results will be
handled. Usually the assembler programmer or language compiler
will adhere to a standard convention for all subroutine calls. In
many machines these conventions are machine specific, involving
specific registers and expecting the use of specific
instructions. Universal compliance with the convention enables
the use of independently compiled subroutines. In the Portable
Language Implementation Project, it is desired that the
convention be as similar as possible on whatever target machine
is involved. This assists portability of any code that
deliberately manipulates the stack. For example, the
implementation of the "Raise" exception procedure in the Modula-2
runtime library of PLIP manipulates the stack to force return to
an environment that may be several layers below the current
procedure level. Implementation of this on a new target will be
easier if the stack structure is similar. The only differences in
the stack structure of various target machines will be related to
the alignment requirements and miscellaneous objects such as
static and dynamic link pointers. This also implies that the
result will always be passed on the stack, and that calls to
subroutines may not involve the use of special instructions of
the target machine, but will use any simple call instruction that
merely involves saving the processor status word/register and
return address, then jumping to the address specified. Special
CALL instructions will only be used if they preserve the
fundamental stack structure.
The additional advantage of using a common convention is that GAS
code can be generated for the handling of arguments and results
rather than leaving this to the target code generator. One of the
major aims of GAS code is that it should represent as much as
possible the common steps involved in generating target code. The
alternative would be to have a high level GAS code instruction
for subroutines that left the implementation details completely
to the target machine generator.
With these concepts in mind, a calling convention and associated
GAS codes and GAS machine registers have been designed. They
allow the target code generator to recognise the full structure
of the subroutine call so that special call instructions can be
used, and the call linkage mechanisms can be implemented, such as
static links and register saving.
The GAS machine stack will be used for results and arguments.
Several pointers are defined that point to relevant locations on
the stack. These pointers are:
a) TOP_OF_STACK. Contains the offset of the object on the
top of the stack.
b) ARGLIST. Contains the offset of the start of the
arguments for the current procedure.
c) LOCALSTORE. Contains the offset of the start of the local
objects for the current routine.
d) STACKBASE. Contains the unique byte address of the base
of the stack. This will not normally be referenced in GAS code.
e) STACKLIMIT. Contains the unique byte address of the
limit to which the stack may grow.
f) ROUTINEDESCR. Contains the routine descriptor for the
currently active routine. Consists of the current Program Counter
and a STATICLINK pointer (see below).
g) PC. Contains pointer to current GAS code instruction.
h) STATICLINK. Contains pointer to the static environment
for the current routine. This may be a pointer to an environment
vector containing pointers to all the DISPLAY vectors that are
currently in scope, or a pointer to the DISPLAY vector of the
lexically enclosing routine. In this it forms a static link chain
mechanism. The static environment vector and static chain
mechanisms are both used commonly.
i) PREVDISPLAY. Points to the DISPLAY vector for the
previously active environment, which is for the caller routine.
This represents the dynamic link mechanism.
j) RETURNMARK. Contains the offset of the stack location
containing the return address to be used when returning from the
current location.
k) RESULT. Points to the result area for the routine.
l) EXCEPTHANDLER. Points to the routine to be executed if
an exception is detected by some monitor. The routine takes one
parameter which is pointed to by EXCEPTPARM. The exceptions which
can be detected are obviously machine dependent. Typical examples
are divide by zero, stack over or underflow, real arithmetic
overflow or underflow. They require that the processor has
hardware monitors to detect these exceptions. Different machines
will have different mechanisms for responding to such exceptions.
They may have a VAX approach where all exceptions will invoke a
common handler, or may have a separate handler for each exception
type. If the latter is used, the target code generated should set
all handlers to point to the same handler routine.
m) EXCEPTPARM. This contains the parameter that will be
used by the exception handler routine. Its contents are machine
specific, but should allow the exception to identify the
exception cause.
These registers allow implementation of subroutine calling, but
also allow a way of expressing, in a portable way, manipulation
such as changing the return address so return will unwind to an
outer lexical level.
Subroutine Calling Convention
The subroutine calling convention used (see later diagrams that
demonstrate what happens at each step) is as follows. The full
convention is based on the assumption that GAS code must be fully
self-contained, without any requirement that Rcodes such as
"Call" and "Declare_Routine" be present. The only Rcodes required
will be the declarations of static areas and constants, and
exported symbols. However the GAS code generator will normally
leave on the Rcode tree Rcodes such as "Call" and
"Declare_Routine" that will be useful during code generation.
a) Compute the size of the result area and allocate space on
the stack using a PUSH instruction.
b) Execute the GAS PUSHMARK instruction. This will:
     Create a new uninitialised "display" vector.
     Move the contents of the TOP_OF_STACK to the
     RESULT pointer of the new display vector.
     Save space for the return address and any
     necessary status information by performing a
     PUSH of a NIL value that is the size
     necessary. This will require target machine
     information on the size required.
     Move the contents of the TOP_OF_STACK to the
     RETURNMARK of the new display vector.
     Move the contents of the TOP_OF_STACK, incremented
     by one, to the ARGLIST of the new display vector.
The PUSHMARK instruction alerts the target code generator that a
subroutine call is under way. It assumes that the result area is
the object on the top of the stack. The display vector may on
the target machine exist as registers, or space may be allocated
on the stack, or it may be a mixture of registers and stack
objects. All parts of the vector do not have to be created at
this point, but could be created in parts so that the components
are spread through the stack. The implementation of the display
is up to the target code generator.
Note that the PUSHMARK instruction is not necessary if the "Call"
Rcode is present, as it makes it clear to the target code
generator that a call involving the standard calling convention
is under way. The subtrees of the "Call" Rcode will also make it
clear when the result area size computation is under way that it
is a part of a subroutine call. This information can be used to
useful effect by the target code generator (see Chapter Six).
c) Push the arguments onto the stack.
d) The target routine descriptor is computed, and the routine
environment pointer is pushed on the stack. If the
address is computed the descriptor may be a temporary or
allocated memory location. The routine descriptor contains the
address of the routine entry point and a pointer to its
environment. Computation of descriptors essentially occurs when
the Refer_Routine Rcode is encountered. The GAS machine will
assume routine descriptor computation is automatic. When a
Refer_Routine Rcode is encountered, a GAS code operand containing
the routine id and lexical level is generated. When this operand
is used in a GAS code, it is assumed that the routine descriptor
will be returned, unless the instruction is the CALL GAS code, in
which case, just the address of the routine is returned. The CALL
instruction is low level and is used for implementing the Rcode
"Fast_Call". It is not involved in environments. If the Rcode
subtree of the "Call" Rcode for computing the routine descriptor
of the called routine merely contains a "Refer_Routine" Rcode,
GAS code will be generated to PUSH the environment pointer of
the descriptor, and the GAS CALL instruction will refer directly
to the routine via its Rcode assigned id and lexical level. The
target machine code generator will actually have to generate code
to compute the environment pointer (and an environment vector if
necessary). It will recognise this because the PUSH GAS code
involves the environment pointer of the routine descriptor for a
routine. If the routine descriptor Rcode subtree involves a
computed descriptor, GAS code must be generated to perform the
computation, PUSH the environment pointer of the resulting
descriptor and the CALL generated will specify an indirect
address through the address field of the descriptor, which may be
stored in memory or a temporary. The contents of the environment
pointer field will have been computed at some Refer_Routine Rcode
in the descriptor computation subtree or even earlier. The
environment structure chosen depends on the target machine. It
may be a static link pointer (the address of the activation
record of the lexically enclosing routine) or an environment
vector (which contains pointers to all activation records in
scope for the procedure). As it is quite possible that procedure
variables (hence a procedure call may involve a dynamically
computed target address and associated environment) and procedure
parameters may be used, the static link mechanism would probably
be favoured regardless of the host machine facilities. The
back end is not designed for a specific language such as Pascal
where procedure variables and parameters can be ruled out.
e) Call the routine using the CALL GAS instruction with the
address determined in (d) as target. This will result in the
current "address" being pushed onto the stack, any status
automatically saved, and control being passed to the address
specified. This instruction is a simple jump to subroutine
instruction, and it is anticipated that the target machine code
generator will usually generate such a simple instruction.
f) The first instruction in the subroutine will be the
NEWDISPLAY GAS code. This instruction indicates to the target
machine code generator that standard calling conventions are to
be used when this routine is called. Note that this instruction
is not needed in the presence of the "Declare_Routine" Rcode as
it will be obvious that a subroutine is being entered. The
NEWDISPLAY instruction will operate as follows:
     The most recently created display is made the
     current display, and will make its PREVDISPLAY
     field point to the previously active display.
     There could be several new displays that have
     been created, but which have not yet been
     activated. This can occur when evaluation of
     arguments or the target routine descriptor
     involves routine calls, which themselves may
     involve further routine calls. The most
     recently created display is always the correct
     display to be activated when a NEWDISPLAY code
     is encountered.
     The return address and status information is on
     the top of the stack and is "popped" into the
     memory location pointed to by the RETURNMARK
     field of the display. In target machine code
     this pop may not need to be done as the
     RETURNMARK may be implemented on the stack
     using the value pushed by the CALL instruction
     used on the target machine. There is
     The static link is "popped" into the STATICLINK
     of the display vector. This operation may also
     not be performed in the target machine code as
     the STATICLINK may be implemented on the stack.
     The TOP_OF_STACK is moved to the LOCALSTORE
     field of the display.
g) Allocate space for local objects by the use of PUSH
instructions.
h) At the completion of the routine, the POPMARK GAS code will
be used, followed by the RET code. This will place the
RETURNMARK value of the current display into the TOP_OF_STACK of
the previous display, delete the current display, and make the
previous display active. The RET instruction will return using
the address pointed to by the TOP_OF_STACK and reinstate any
status registers of the machine that had been implicitly saved
during the subroutine call. RET is not involved with environments
and is a simple return from subroutine.
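Steps a) to h) can be traced with a small model. The following
Python sketch is an illustration only and is not part of PLIP: the
class names Display and GasMachine, the use of list indices as
stack offsets, a one-slot return address, and the list of pending
displays are all assumptions made for the example. It follows the
order of operations given above for PUSHMARK, CALL, NEWDISPLAY,
POPMARK and RET.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

NIL = None  # stands for an uninitialised stack slot


@dataclass
class Display:
    """Illustrative display vector; the field layout is an assumption."""
    result: int = -1
    returnmark: int = -1
    arglist: int = -1
    localstore: int = -1
    staticlink: Any = NIL
    prevdisplay: Optional["Display"] = None


class GasMachine:
    """Toy model: the stack is a list and offsets are list indices."""

    def __init__(self) -> None:
        self.stack: List[Any] = []
        self.pending: List[Display] = []  # created by PUSHMARK, not yet active
        self.current: Optional[Display] = None

    @property
    def top_of_stack(self) -> int:
        return len(self.stack) - 1

    def push(self, value: Any = NIL) -> None:
        self.stack.append(value)

    def pushmark(self) -> None:
        display = Display(result=self.top_of_stack)
        self.push(NIL)                    # room for return address and status
        display.returnmark = self.top_of_stack
        display.arglist = self.top_of_stack + 1
        self.pending.append(display)

    def call(self, return_address: int) -> None:
        self.push(return_address)         # simple jump to subroutine

    def newdisplay(self) -> None:
        display = self.pending.pop()      # most recently created display
        display.prevdisplay = self.current
        self.current = display
        return_address = self.stack.pop()
        self.stack[display.returnmark] = return_address
        display.staticlink = self.stack.pop()
        display.localstore = self.top_of_stack

    def popmark(self) -> None:
        display = self.current
        del self.stack[display.returnmark + 1:]  # TOP_OF_STACK := RETURNMARK
        self.current = display.prevdisplay

    def ret(self) -> int:
        return self.stack.pop()           # leaves the result area on top


def demo_call() -> tuple:
    """Call a routine that adds its two arguments, following steps a) to h)."""
    m = GasMachine()
    m.push(NIL)                  # a) result area
    m.pushmark()                 # b)
    m.push(10)                   # c) arguments
    m.push(20)
    m.push("env")                # d) environment pointer
    m.call(return_address=1234)  # e)
    m.newdisplay()               # f)
    m.push(NIL)                  # g) one local object
    d = m.current
    m.stack[d.result] = m.stack[d.arglist] + m.stack[d.arglist + 1]
    m.popmark()                  # h) POPMARK, then RET
    return m.ret(), m.stack
```

In this model the caller finds only the result area left on the
stack after RET, which is the property the convention relies on.
A real target code generator would be free to keep the display in
registers rather than in a separate record.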
The target machine code generator may be able to perform the
steps of the call convention very easily using instructions and
facilities such as frame pointers in the target machine. It may
have to perform significant work to implement the call convention
if the machine only has limited facilities for maintaining the
display components. The GAS subroutine convention represents the
essence of subroutine calls when lexical nesting and scope rules
are involved, and the stack is used as the primary vehicle.
However it does represent the conventions that the result and
arguments will be passed on the stack, and that an environment
pointer will be passed on the stack. It also reflects that
registers will commonly be available to assist the process.
Non-standard calling conventions can be implemented by using only
the CALL GAS code. The programmer or front end language processor
will be responsible for handling arguments and results
themselves. The Rcode "Construct" can be used to guarantee that
arguments are placed on the stack exactly as they should be. The
language front-end could also decide to bypass the standard
convention for short high-use routines, and generate Rcode to
pass parameters in registers. In both cases the Rcode Fast_Call
will be used instead of the Call Rcode; this is, however, not the
concern of the back-end.
Input and Output
The Rcode input and output statements are supported by GAS code
IN and OUT statements. The operands of the instruction specify
the port address, size of block to transfer, buffer address, and
the type of port (indicated by the Rcode Basic type field of the
buffer address operand, to indicate if a byte, double byte etc.
port is involved). Whether the port address is valid or not will be
target hardware dependent. The target machine code generator will
have to generate appropriate code to check the port address. The
target hardware may support the block port operation directly by
a single instruction, or it may have to generate a loop of
instructions to execute the operation. The GAS machine retains
the block input output concept as it is much easier to break a
block move down than to recognise a block move from a loop
containing move instructions.
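The breakdown of a block port operation into a loop can be
sketched as follows. This Python fragment is illustrative only:
the function name lower_block_in, the read_port callback (standing
in for a single target machine input instruction) and the
requirement that the block size be a multiple of the port width
are assumptions, not part of the GAS definition.

```python
def lower_block_in(read_port, port, size, unit):
    """Emulate a block IN of `size` bytes as a loop of `unit`-byte reads.

    read_port(port, unit) is a hypothetical stand-in for one target
    machine input instruction that returns `unit` bytes.
    """
    if unit <= 0 or size % unit != 0:
        raise ValueError("block size must be a multiple of the port width")
    buffer = bytearray()
    for _ in range(size // unit):
        buffer += read_port(port, unit)   # one target instruction per unit
    return bytes(buffer)


def make_fake_port():
    """Return a read_port callback that yields the bytes 0, 1, 2, ..."""
    counter = iter(range(256))

    def read_port(port, unit):
        return bytes(next(counter) for _ in range(unit))

    return read_port
```

Going the other way, from such a loop back to a single block
instruction, would require the target code generator to recognise
the loop pattern, which is the harder direction the text refers to.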
System Virtual Calls
These are used to make use of services of the operating system
kernel. What they are translated into will be determined by the
target code generator. The operands of the SVC code are a value
in the range [0..255] to represent the call code, an operand
specifying the number of following operands, and zero or more
following operands. The code value will be interpreted by the
target code generator as a kernel function, and the generator
will process it accordingly, using the operands as required.
Currently two standard SVC codes have been defined:
     254     Save Context
     255     Load Context
These will be used by the front end or programmer to implement
virtual concurrency facilities. Load Context will require one
parameter to specify the identification of the process involved.
NOTE: Context is here specified to mean those elements of
      the state of the target machine needed to enable
      execution to continue after a save and load pair as
      if neither had been executed, and regardless of what
      occurred between the save and load.
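The SVC operand layout described above (call code, operand count,
then the operands) can be sketched as a small record. The class
name Svc and the flat list encoding are assumptions made for
illustration; only the code range and the two standard codes come
from the text.

```python
from dataclasses import dataclass, field
from typing import Any, List

SAVE_CONTEXT = 254   # standard SVC code: Save Context
LOAD_CONTEXT = 255   # standard SVC code: Load Context


@dataclass
class Svc:
    """Hypothetical in-memory form of a GAS SVC instruction."""
    code: int
    operands: List[Any] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0 <= self.code <= 255:
            raise ValueError("SVC call code must be in the range [0..255]")

    def encode(self) -> List[Any]:
        """Return [call code, number of following operands, operands...]."""
        return [self.code, len(self.operands), *self.operands]
```

For example, a Load Context for process 7 would carry one
following operand, while Save Context carries none.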
Interrupts
The GAS machine is provided with general interrupt facilities
available on any machine. This consists of:
a) SETINT new state. The new state operand can have values
     DISABLE
     ENABLE
b) RTI. Specifies a return from interrupt.
Both of these GAS code instructions will be converted to
appropriate target machine code. As these facilities are
available on most target machines they allow portable
implementations of Rcodes such as Add_In_Place, that do not
depend on the availability of any special indivisible
instructions. Along with the SVC code for load and save context,
these two instructions provide the essential portable tools for
interrupt handlers. The register and input/output port
instructions will also be very useful.
Machine State
The GAS machine status is specifically represented by five state
variables. They are:
     CARRY
     OVERFLOW
     EQUAL
     POSITIVE
     NEGATIVE
For any target machine there will be other additional machine
status information. Manipulation of these status bits will
therefore be non portable. Usually status bits are stored in a
specific register. Manipulation of these registers is achieved by
the two GAS instructions STATESET and STATEREAD for manipulating
registers. Any target machine status registers, and the GAS
status are preserved and reinstated on CALL/RET GAS instructions.
Any instruction that manipulates the machine status registers is
obviously target machine dependent. However the bits related to
the GAS status bits can be manipulated portably.
The GAS status flags can be used as source values for operations
and therefore provide a portable means of implementing overflow
handling. They are used to break Rcode operations on dynamically
sized objects into GAS code operations on "Basictype" objects.
When the condition a flag represents is TRUE, the flag will
have a value of one, else it will be zero.
The GAS flags are set by various GAS instructions depending on
the instruction results. The target code generator must ensure
that target status bits used to represent the GAS status bits
are set accordingly, either as a consequence of the target
machine instructions, or by specific setting of the status bits
if not set by target machine instructions. Reference to GAS
flags will require target code to access the target
machine status bits used to represent the GAS flags.
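The use of a status flag as a source value can be illustrated with
multiple unit addition. The Python sketch below is an assumption
laden illustration, not GAS code: it takes an 8 bit basic unit,
stores units least significant first, and models only the CARRY
flag, which holds one when set and zero otherwise.

```python
def add_units(a, b, unit_bits=8):
    """Add two equal-length lists of basic units, least significant first.

    Returns (result_units, carry), where carry is the final value of
    the modelled CARRY flag (one if set, else zero).
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of units")
    mask = (1 << unit_bits) - 1
    carry = 0                        # CARRY flag: 1 when TRUE, else 0
    result = []
    for x, y in zip(a, b):
        total = x + y + carry        # the flag is used as a source value
        result.append(total & mask)
        carry = 1 if total > mask else 0
    return result, carry
```

Because the flag is an ordinary one or zero value, the same loop
works whatever instruction the target machine uses to expose its
own carry bit.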
Logical Shifts
Most processors provide some form of instruction to rotate the
contents of memory or a register. The shift will be by a certain
number of bits, and could be left (the most significant bit
direction) or right (the least significant bit direction). The
shift could be arithmetic, logical, or rotation. If arithmetic,
the most significant bit will be considered a sign bit. If the
shift is left, the shift will simply not involve the sign bit. If
the shift is right, on each bit shift, the sign bit will be
replicated into the bit immediately to its right. If the shift
is binary (that is, logical), all bits will be involved. If the
shift is rotation, bits shifted off the end will be placed back
in the bit position on the other end of the bitstring. Target
machine code must be generated to produce the exact effect
specified by the GAS shift instruction. The bitstring specified
as the target of the shift in the GAS shift instruction can be
any size, so a series of target machine instructions may be
required. The GAS code generator should try to produce shift
operations on objects directly supported for shift operations.
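The three shift modes can be stated precisely with a short model.
The Python sketch below is illustrative only; the function name
gas_shift, its parameter names and the string values used for the
direction and mode are assumptions, and the bitstring is held as an
unsigned integer of the given width.

```python
def gas_shift(value, width, count, direction, mode):
    """Shift a `width`-bit bitstring by `count` bits, one bit at a time.

    direction is "left" or "right"; mode is "arithmetic", "logical"
    or "rotate".
    """
    if direction not in ("left", "right"):
        raise ValueError(direction)
    if mode not in ("arithmetic", "logical", "rotate"):
        raise ValueError(mode)
    mask = (1 << width) - 1
    top = 1 << (width - 1)
    value &= mask
    for _ in range(count):
        if mode == "rotate":
            if direction == "left":
                value = ((value << 1) | (value >> (width - 1))) & mask
            else:
                value = (value >> 1) | ((value & 1) << (width - 1))
        elif mode == "logical":
            value = (value << 1) & mask if direction == "left" else value >> 1
        else:
            sign = value & top
            if direction == "left":
                # the sign bit is not involved: only the other bits move
                value = sign | ((value << 1) & (mask >> 1))
            else:
                # the sign bit is replicated into the bit to its right
                value = sign | (value >> 1)
    return value
```

For an 8 bit value 10010110, a one bit arithmetic right shift gives
11001011, while the logical right shift gives 01001011. A target
with narrower shift instructions would have to reproduce the same
result with a series of instructions.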
GAS Instructions
The format of GAS instructions is an operation code followed by
several operands. All instructions except the SVC have a fixed
number of operands. The target machine must have an instruction
set that can sensibly emulate the GAS instructions. As a minimum,
emulation of each GAS code should only require a standard
template of target machine code instructions. Additional target
machine code will be required, particularly for register dumping
and restoration.
GAS Operands
GAS operands specify the values to be used for an instruction.
The values can be basictype objects, blocks of bytes, bitstrings,
a branch label, condition value or shift mode. A special case
exists when an uninitialised object is pushed on the stack. This
is done when space is allocated for the result of a routine and
for local variables of a routine. In this case the source value
to push will be specified as a "nil" address value (see above).
Compiler Architecture
The PLIP compiler structure has been organised to allow separate
phases to operate as independent processes. From the backend
viewpoint, the phases are:
     Front end
     GAS code generator
     GAS code optimiser
     Target code generator and optimiser
As GAS code will be attached to the Rcode tree, all phases are
designed to operate from Rcode queues. An Rcode queue effectively
provides a pipeline between phases. The common connection
mechanism between these phases is one of the benefits of encoding
GAS codes as part of the Rcode tree. A plug and socket approach
can also be supported as each phase does not know how Rcodes are
placed on its input queue, or what happens to Rcodes it places on
its output queue. For example, Rcodes on the input queue to the
GAS code generator may be placed there by an Rcode file reader
process, or directly by the front end process. The Rcodes from
the output queue may be absorbed by a process that writes Rcodes
to a disk file, or directly by the GAS code optimiser. When the
decision was made to attach GAS codes to the Rcode tree it became
clear that a common connection mechanism between phases became
possible, and some time was spent developing the concurrent,
queue connected architecture. Additionally the Rcode file readers
and writers were extended to call routines that encoded and
decoded GAS codes from the Rcode reserved extension, thus
allowing Rcode file input/output routines to be used for any
intermediate files between the GAS code generator, GAS code
optimiser, and the target code generator.
The Generation of GAS code from Rcode
The generation of GAS code from Rcode will be done on a post-order
flow of Rcode using what is essentially a syntax directed
translation mechanism. Code holdup for one procedure will not be
used. Therefore the generator operates on a very local level. The
main code improvement mechanisms used during this phase are to
collapse as many address computations as possible into one GAS
addressing mode so as to preserve as much information on an
address structure as possible, and to try and eliminate
unnecessary store instructions. The unnecessary store
instructions will commonly be generated for Rcode sequences such
as: 
              |
              |
            STORE
            /   \
           /     \
  REFER_VAR y    ADD
                /   \
               /     \
       REFER_VAR x    #5
It would be easy to generate GAS code of the form:
     USEVAR temp1
     ADD X,#5,temp1
     LOAD temp1,Y
However the USEVAR and LOAD instructions can be eliminated:
     ADD X,#5,Y
A holdup stack is maintained to allow these code improvements to
be achieved. Each node of the stack contains a GAS operand and a
GAS instruction. The node at the top of the stack will contain
the GAS operand that contains the result of the previous
operation, and also the instruction itself. If no instruction was
generated for the previous Rcode the instruction code field of
the stack node will be NIL. This will occur for Rcodes such as
"REFER_VARIABLE", which are involved in address computations that
can be represented directly in GAS addressing modes. Each
instruction of the address computation will result in a more
complex addressing mode being constructed. If the previous Rcode
does not return a result, the operand field will be NIL. This
will occur for Rcodes such as "STORE". To illustrate how this
stack is used consider the generation of GAS code for the
following Rcode. The tree structure of the Rcode is represented
by indentation.
     STORE
          datatype_1
          REFER_VARIABLE Z
          ADD
               datatype_2
               LOAD
                    datatype_3
                    REFER_VARIABLE X
               LOAD
                    datatype_4
                    ADD
                         datatype_5
                         REFER_VARIABLE Y
                         LITERAL #6
The REFER_VARIABLE Z will be encountered first and converted to a
source GAS operand that will return the address of "Z". The
"basictype" field for the operand will be set to that required
for a target machine pointer value. This operand will be pushed
onto the stack with a NIL instruction field. The "REFER_VARIABLE
X" Rcode is encountered next and a source GAS operand is
generated that returns the address of "X". The "basictype" field
value is also set for a target machine pointer type. This operand
is also pushed on the stack along with a NIL instruction field.
The first LOAD instruction is then encountered. The holdup stack
is popped and the operand which returns the address of X is
obtained. This is changed to an operand which specifies an object
of basictype datatype_3 that is located in memory at X. This
operand is pushed back onto the stack with a NIL instruction. The
REFER_VARIABLE Y Rcode is then encountered and an operand that
specifies the address of Y is pushed along with a NIL
instruction. The "LITERAL #6" is encountered next and a source
GAS operand returning an immediate value #6 is generated. The
"basictype" field of the operand is set for the size and type of
the immediate value. This operand is also pushed on the stack
along with a NIL instruction field. The second "ADD" Rcode is
then encountered. The two operands on the stack are popped, and
it is determined that one specifies the address of Y and one is an
immediate value #6. As a result, a new operand is created that
specifies the address of the location at an offset of 6 bytes
from the start of "Y". This operand is pushed onto the holdup
stack with a NIL instruction. This illustrates how address
computation structure provided by Rcode trees is represented in
the addressing mode facilities of GAS code. The LOAD Rcode is
then encountered. The holdup stack is popped and the operand that
specifies an address offset 6 bytes from the start of Y is found.
This is changed to an operand that specifies the contents of this
location, and the Basic type specifier for the operand will be
set to datatype_4. This operand is pushed on the stack with a NIL
instruction. The first ADD Rcode is then encountered. The holdup
stack is popped twice. This will provide one operand that
specifies a basictype object of type datatype_3 and located in
memory at X, and an operand which specifies a basictype object
of type datatype_4 located in memory at offset 6 bytes from Y. A
GAS temporary is defined (but no USEVAR GAS code is generated
yet), and an operand generated that specifies a basictype object
of type datatype_2 located in this temporary. A GAS ADD
instruction is then generated which has as source operands the
two operands popped, and as target operand the operand that
specifies the temporary. Note that the datatype of each operand
may be different. The operand referring to the temporary and the
new GAS ADD instruction are pushed on the stack. The STORE Rcode
is then encountered. The holdup stack is popped twice and it is
noted that the operand popped first is a temporary whose contents
are generated in the instruction popped with it. The temporary is
discarded. The second operand popped defines the destination
address of the store. The operand specifies the address of
location Z. This operand is changed to specify a basictype object
of type datatype_1 located in memory at Z and the GAS ADD
instruction is altered so that this operand is the target. This
ADD instruction is then emitted.
Note that if any of the operands for this ADD instruction did
not begin on byte boundaries, temporaries would have to be used
and the value moved to the temporary for the ADD operation. This
reflects the family restriction that arithmetic operands must
begin on byte boundaries.
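The holdup stack walk described above can be reproduced with a
short program. The following Python sketch is an illustration
under stated assumptions and is not the PLIP generator: the nested
tuple form of the Rcode tree, the Operand record, the textual
operand syntax (for example "Y+6" for the object at offset 6 from
Y) and the use of LOAD for a plain store are all invented for the
example. Only REFER_VARIABLE, LITERAL, LOAD, ADD and STORE are
handled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Operand:
    kind: str                    # "addr", "mem", "imm" or "temp"
    base: str = ""
    offset: int = 0
    dtype: Optional[str] = None

    def __str__(self) -> str:
        if self.kind == "imm":
            return f"#{self.offset}"
        text = self.base + (f"+{self.offset}" if self.offset else "")
        return "@" + text if self.kind == "addr" else text


def generate(tree):
    """Translate a nested-tuple Rcode tree into a list of GAS code lines."""
    holdup = []                  # entries are (operand, pending instruction)
    code = []
    temp_count = 0

    def fmt(instr, target):
        op, left, right = instr
        return f"{op} {left},{right},{target}"

    def pop_source():
        operand, instr = holdup.pop()
        if instr is not None:    # the temporary is really needed: emit it
            code.append(f"USEVAR {operand}")
            code.append(fmt(instr, operand))
        return operand

    def walk(node):
        nonlocal temp_count
        kind = node[0]
        if kind == "REFER_VARIABLE":
            holdup.append((Operand("addr", node[1]), None))
        elif kind == "LITERAL":
            holdup.append((Operand("imm", offset=node[1]), None))
        elif kind == "LOAD":
            _, dtype, child = node
            walk(child)
            address, _ = holdup.pop()
            assert address.kind == "addr", "sketch only loads via addresses"
            holdup.append((replace(address, kind="mem", dtype=dtype), None))
        elif kind == "ADD":
            _, dtype, left_node, right_node = node
            walk(left_node)
            walk(right_node)
            left, right = holdup[-2][0], holdup[-1][0]
            if left.kind == "addr" and right.kind == "imm":
                # fold address + literal into one addressing mode
                del holdup[-2:]
                folded = replace(left, offset=left.offset + right.offset)
                holdup.append((folded, None))
            else:
                right = pop_source()
                left = pop_source()
                temp_count += 1
                temp = Operand("temp", f"temp{temp_count}", dtype=dtype)
                holdup.append((temp, ("ADD", left, right)))
        elif kind == "STORE":
            _, dtype, dest_node, source_node = node
            walk(dest_node)
            walk(source_node)
            source, instr = holdup.pop()
            dest, _ = holdup.pop()
            target = replace(dest, kind="mem", dtype=dtype)
            if instr is not None:        # retarget and drop the temporary
                code.append(fmt(instr, target))
            else:
                code.append(f"LOAD {source},{target}")
        else:
            raise ValueError(f"unsupported Rcode: {kind}")

    walk(tree)
    return code


EXAMPLE = ("STORE", "datatype_1", ("REFER_VARIABLE", "Z"),
           ("ADD", "datatype_2",
            ("LOAD", "datatype_3", ("REFER_VARIABLE", "X")),
            ("LOAD", "datatype_4",
             ("ADD", "datatype_5", ("REFER_VARIABLE", "Y"), ("LITERAL", 6)))))
```

Calling generate(EXAMPLE) yields a single three address ADD whose
target is Z, with no USEVAR and no separate store. A temporary is
only materialised when a pending instruction's result is consumed
as a source by a further operation.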
This process is essentially a standard bottom up translation. The
techniques for converting such things as Rcode arithmetic and
control operations to linear GAS code involve standard well
established code translation methods for converting a parse tree
to target code. Code has been produced for this translation
process but has not figured dominantly in the design effort for
the two level concept as no new issues are involved.
Dynamic sized and oversized objects
When arithmetic operations on dynamically sized objects or
objects too large for basictypes are encountered, they must be
broken down into operations on objects of size supported by the
target machine. In the VAX family GAS code, this breakdown will
be performed at the GAS level. The following notes describe how
this problem can be approached for operations involving equal
size source objects and illustrate how the GAS resources defined
for this GAS machine can be used. The handling of overflow as
specified by the Rcode arithmetic operations is also discussed.
These multiple arithmetic algorithms themselves are well known.
Dynamically sized or oversized objects can be encountered in
Rcode extended exact arithmetic codes, decimal arithmetic and
bitstring operations. Extended arithmetic codes and decimal
arithmetic involve objects that are a multiple of bytes with the
operands specified by Block_Load, Block_Update and Select Rcodes
or as the result of a preceding block operation. Bitstring
objects start on byte boundaries and are a multiple of bits with
operands specified by the Bit_Select, Bit_Block_Load and
Bit_Block_Update Rcodes or as the result of a previous operation.
Bitstring operations are not broken down since target machine
bitstring facilities are usually quite varied such as dynamic
bitstring operations up to 32 bits. It will be the task of the
target code generator to break down bitstring operations into the
target machine instructions required.
The breakdown for dynamically sized objects must be done at run
time. Statically sized objects can be broken down at compile
time. The algorithm for the breakdown depends on the type of
operation involved. In the following algorithms it is assumed
that the largest source operand the target machine will support
for the arithmetic operation involved is called a basic unit, and
that data is packed with the least significant byte at the lower
memory address.
Addition/Subtraction of (signed/unsigned) oversized integers
This description assumes both operands will be of the same size.
For subtraction it is assumed that the second source operand is
the subtractor (see GAS code reference manual). When two integers
of a size n bits are added the result will be an integer of size
n but with the possibility of an overflow bit. The Rcode datatype
for the operation will specify how overflow is to be handled. A
byte may have to be prepended to the result which is a boolean
indicating whether or not overflow has occurred, or an overflow
error may have to be signalled. Appropriate GAS code should be
generated. The signalling of overflow error is left to a default
target machine handler. The Rcode only specifies that an overflow
error message should be signalled, not what the message should
be. The GAS code SVC call instruction with a value indicating
integer overflow will be generated.
The algorithm for breaking down dynamically sized integer
addition and subtraction is as follows:
When the size of the source operand is greater than
the size of the largest object that the target machine
allows for the operation involved then perform the
following:
Create a binary byte sized temporary initialised to
zero, and two temporaries of size equal to the largest
operand that the target machine can handle directly
for the GAS code instruction involved. These
temporaries will be called basic unit temporaries "A"
and "B".
For each basic unit of the source operands taken in
turn from the least significant end, until the
remaining most significant part of the operands is of
size equal to or less than the size the machine can support:
(i) Except for the least significant basic unit,
perform an unsigned operation on the basic unit
of the second source operand and the byte
temporary and store the result in the "A" basic
unit sized temporary. If overflow occurs, move
"1" to the byte temporary, else move zero to it.
(ii) Except for the least significant basic unit,
perform an unsigned operation on the first
source basic unit and the "A" basic unit
temporary, storing the result in the
corresponding basic unit of the target operand.
If overflow occurs add one to the byte
temporary. For the least significant basic unit,
add the two source operand basic units, and
store the result in the corresponding target
basic unit. If overflow move one to the byte
temporary.
Generate code to load the remainder of the first source
operand into the "A" basic unit temporary and the
remainder of the second source operand into the "B"
basic unit temporary. This is done because the
remainder may be any size less than or equal to the
size the target machine can handle. Hence the compiler
must use a temporary of basic unit size.
Generate code to perform the operation on the byte
temporary and the remainder of the "B" basic unit
temporary storing the result in the "B" basic unit
temporary. If overflow, set the byte temporary to one,
else zero. Perform the operation on the remainder of
the "A" and "B" basic unit temporaries placing the
result in the "A" basic unit temporary. Test if the
result in the "A" basic unit temporary is too large for
the remainder of the result operand or that overflow
has occurred; if so set the byte temporary to one (else
do nothing). If the byte temporary is still zero copy
the least significant bytes of the "A" basic unit
temporary to the remaining bytes of the result operand. If
the Rcode operation datatype specifies, move zero to
the least significant byte of the result operand to
indicate no overflow has occurred. If the temporary
byte is not zero, and the Rcode datatype requires a
byte to indicate overflow, do the copy as above, but
move one to the least significant byte of the result. If
the Rcode datatype of the operation specifies
signalling a runtime error on overflow, generate a GAS
code SVC instruction for overflow instead.
Generating code when the compiler can determine if the
operation involves operands that are not larger than supported by
the target machine for the operation is as follows:
This requires that the size be statically determinable.
Check whether it matches a datasize supported by the
target machine. This would require detailed information
available on the data sizes the target machine directly
supports for each GAS arithmetic operation and for each
datatype such as signed integer. This is not a large
volume of information as there are not many GAS
arithmetic instructions. If the operand sizes are
supported by the target machine generate an instruction
to perform the operation. If not then generate an
instruction to load the remaining bytes of each source
operand into temporaries sized to the next largest size
supported by the target machine and then perform the
operation on the temporary. Code is generated to check
whether the result in the temporary can fit back into
the remaining bytes of the result, or that overflow has
not occurred. If no overflow occurred the result
temporary which corresponds to the remainder of the
result is moved to the result operand. A zero value is
moved to the least significant byte of the result if
the operation datatype specifies prepending a byte
indicating if overflow has occurred (see Rcode
reference manual). If overflow has occurred, then the
action taken depends on the datatype specified for the
operation. If the least significant byte must be set to
indicate overflow, move a value of one to the least
significant byte of the target operand, after doing
the copy from the result temporary as above. If a
runtime error must be signalled, generate an SVC
Gascode instead.
If the operands are oversized, but statically sized, the breakdown
will follow that described for dynamically sized operands,
however the compiler could take advantage of knowing the exact
size of the operands and generate a simple iterative counted loop
through the basic units of the operands.
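As an illustration only, the breakdown above can be modelled in Python for unsigned operands. The name add_oversized, the unit parameter (the basic unit size in bytes) and the returned overflow flag are assumptions of this sketch and are not GAS code. The carry variable plays the role of the byte temporary, and the two-step unsigned addition described above is collapsed into one Python integer addition.

```python
def add_oversized(a: bytes, b: bytes, unit: int = 4):
    """Add two equal-length unsigned integers stored least significant
    byte first, one basic unit at a time.

    Returns (result_bytes, overflow_flag).
    """
    if len(a) != len(b):
        raise ValueError("operands must be the same size")
    carry = 0                      # stands in for the byte temporary
    out = bytearray()
    for i in range(0, len(a), unit):
        chunk_a = a[i:i + unit]
        chunk_b = b[i:i + unit]
        width = len(chunk_a)       # the last chunk may be a short remainder
        total = (int.from_bytes(chunk_a, "little")
                 + int.from_bytes(chunk_b, "little")
                 + carry)
        carry = 1 if total >> (8 * width) else 0
        mask = (1 << (8 * width)) - 1
        out += (total & mask).to_bytes(width, "little")
    return bytes(out), bool(carry)
```

A caller would then act on the flag as the Rcode datatype requires: prepend a boolean byte, or signal the overflow error.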
Multiplication of oversized integers
This assumes both operands are of the same size and that the
result of multiplication is double this size and that division
produces a result which consists of a quotient and remainder each
of the same size as the divisor. Each iteration over the basic
units will be more complex, but the algorithm will involve the
usual multiple precision algorithms that can be implemented using
temporaries, and information on target machine sizes for
arithmetic operations.
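A minimal sketch of one such multiple precision algorithm, the schoolbook method, is given below for unsigned operands. It is not taken from the thesis: the name mul_oversized and the unit parameter (basic unit size in bytes) are hypothetical, and operands are assumed to be a whole number of basic units stored least significant byte first.

```python
def mul_oversized(a: bytes, b: bytes, unit: int = 2) -> bytes:
    """Multiply two equal-length unsigned integers unit by unit.

    The result is twice the operand size, least significant byte first.
    """
    if len(a) != len(b) or len(a) % unit:
        raise ValueError("operands must be equal size, a multiple of unit")
    base = 1 << (8 * unit)
    n = len(a) // unit
    units_a = [int.from_bytes(a[i * unit:(i + 1) * unit], "little") for i in range(n)]
    units_b = [int.from_bytes(b[i * unit:(i + 1) * unit], "little") for i in range(n)]
    acc = [0] * (2 * n)            # double-size result, one entry per basic unit
    for i, x in enumerate(units_a):
        carry = 0
        for j, y in enumerate(units_b):
            t = acc[i + j] + x * y + carry
            acc[i + j] = t % base
            carry = t // base
        acc[i + n] = carry         # this slot has not been written before
    return b"".join(u.to_bytes(unit, "little") for u in acc)
```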
Implementing Control Structures
Control structures such as FOR loops, CASE structures and loops
are implemented using conditional and unconditional branches.
However, unlike code generation for a specific target
machine in which branching addressing modes of the target machine
will be used, typically involving relative addressing modes, the
GAS branch instruction will involve a branch to a specific code
location marked by a GAS label. As the chapter on target code
generation will describe, the target code generator will convert
any branch into an addressing mode suitable for the target
machine.
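The conversion from label-based branches to a relative addressing mode can be sketched as follows. This is only an illustration under stated assumptions: the opcode names LABEL, BR and BRC are hypothetical stand-ins for GAS codes, every instruction is assumed to occupy one address unit, and displacements are taken relative to the instruction after the branch.

```python
def resolve_branches(code):
    """Replace label operands of branches by relative displacements.

    code is a list of tuples: ("LABEL", name) marks a location,
    ("BR", name) and ("BRC", name) branch to a label, and any other
    tuple is treated as an opaque instruction.
    """
    address = {}
    pc = 0
    for ins in code:               # first pass: record label addresses
        if ins[0] == "LABEL":
            address[ins[1]] = pc
        else:
            pc += 1
    out = []
    for ins in code:               # second pass: rewrite the branches
        if ins[0] == "LABEL":
            continue
        if ins[0] in ("BR", "BRC"):
            next_pc = len(out) + 1
            out.append((ins[0], address[ins[1]] - next_pc))
        else:
            out.append(ins)
    return out
```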
The GAS generator can take any approach in generating GAS code
that, given the semantics of the GAS machine, will lead to the
production of correct target machine code. This leaves room for
such decisions as allocating indexes of FOR loops into
temporaries or onto the stack. It is envisaged that the GAS
generator will be significantly modified to improve target code
produced, based on tests of code produced for various pieces of
source code. The GAS code generator algorithms are complicated by
the fact that not only is the efficiency of the GAS code produced
important, but that some seemingly efficient GAS code may produce
inefficient target machine code. It is also important that the
writer of a GAS code generator does not have a specific target
machine in mind. The GAS code may produce efficient code for this
machine but very poor code for others.
Chapter Five 
GAS code Optimisation 
This chapter discusses the reasons for optimisation of GAS code
and then discusses the requirements and design of the optimiser.
The design uses many of the optimisations based on data flow
analysis as described for example by Aho et al [9]. They
describe these concepts in terms of linear three address code and
do not provide a rigorous description of implementation. This
chapter presents a design using data flow analysis in the context
of GAS code attached to an Rcode tree. It includes a description
of many of the required implementation aspects, such as how to
represent basic code blocks (defined below), the contents and
structure of a database that stores information generated during
optimisation and problems such as identifying "variables"
relevant to optimisation. Many of the optimisations are well
described in the literature on compilers and hence are not
described in this chapter. This chapter aims to present the
design for a complete optimiser: what optimisations should be
performed, what data structures should be generated to enable
efficient implementation of these optimisations, and elaboration
of handling problems such as identifying variables and handling
pointers in the Rcode and GAS code environment.
In earlier chapters it was suggested that an Rcode optimiser
could be produced but that certain problems existed:
a) Significant target code optimisation would still be
required
b) Interpretation is very useful for optimisation, but
construction of an Rcode interpreter will not be very easy
c) Optimisations such as common subexpression elimination
will be represented by a directed graph but that if Rcode is
written out to disk, the optimisation will be lost since the
post order external representation does not allow for directed
graphs to be represented
d) Much of the information generated for optimisation is
also useful for target code generation. An example is information
on which definitions of variable values reach a given point in a
program. Another example is whether at a given point, a variable
with its current value could be used at some later stage. This is
specifically useful for register allocation. This information is
more useful computed for all objects that will be involved in
target machine code. The information generated for Rcode
optimisation will often involve operations on non target machine
objects (block operations and bitstrings), and will not involve
all objects required at the machine level, for example
temporaries used to break down block arithmetic or bitstring
operations.
A second intermediate code offers reasonable compromises in
solving these problems. All the optimisations that can be
achieved in Rcode can be achieved in GAS code, plus additional
optimisation can be achieved such as copy propagation
involving the extra temporaries that are created in GAS code, and
elimination of branches to instructions that are themselves
unconditional jumps. GAS code will be easier to interpret than
Rcode as it involves fewer operations. The information generated
for optimisation will be more useful to the target code generator
as it involves more of the actual target code objects involved,
both because temporaries are created, and because arithmetic
operations are broken down. However bitstrings are not broken
down so in this respect not all target machine temporaries will
be represented. Common subexpressions can be represented in
external form as they are stored in temporaries. Though GAS code
is a linear code, retaining the Rcode information improves the
ease of optimisation. Data flow analysis and recognition of loops
are much easier than having to discern this information from a
purely linear code.
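The elimination of branches to unconditional jumps mentioned above can be illustrated with a small sketch. The instruction layout (label, opcode, target) and the opcode names BR and BRC are assumptions made for this example only; each branch is retargeted to the final destination of any chain of unconditional jumps.

```python
def thread_jumps(code):
    """Retarget branches whose destination is itself an unconditional jump.

    code is a list of (label_or_None, opcode, target_or_None) tuples.
    """
    index_of = {label: i for i, (label, _, _) in enumerate(code)
                if label is not None}

    def final_target(label):
        seen = set()               # guards against a cycle of jumps
        while label not in seen:
            seen.add(label)
            _, opcode, target = code[index_of[label]]
            if opcode != "BR":
                break              # destination is a real instruction
            label = target
        return label

    return [(label, opcode,
             final_target(target) if opcode in ("BR", "BRC") else target)
            for label, opcode, target in code]
```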
The price of optimisation at the GAS code level is that an
optimiser will have to be written for each family of machines for
which a GAS machine is defined. This is, of course, still much
less effort than writing one for each target!
Optimisation will, however, still be required at the target
machine code level as GAS code still will not represent all
temporary data objects required in target code, and extra
branches will be generated in target machine code that may
themselves branch to instructions that involve unconditional
branches. A description of additional target code optimisation
needed is given in the chapter on target code generation.
GAS Code Optimisations
The GAS code optimisations consist of various global (performed
across basic code blocks using data flow analysis and interpreter
results) and local code block optimisations. Basic blocks are
"straight line" pieces of code with one entry and one exit point.
Within these blocks it is known that the instructions will be
performed. Some instructions could actually be performed in
parallel via pipelining and instruction caches. Note that operand
holdup has already been used in initial GAS code generation to
allow for efficient addressing mode selection and limited
instruction holdup is used to avoid some memory operations. The
optimisations that are recommended to be performed on GAS code
are:
Global
    Static evaluation
    Loop invariant code removal
    Induction variable elimination from loop
    Constant folding
Local Code Block
    Common subexpression elimination
    Instruction rearrangement
    Peep hole optimisation
All optimisations are performed on a procedure by procedure
basis. This is limiting in that inter-procedural optimisation is
ignored. The worst case is assumed: all "out" and outer scope
variables are alive on exit from the procedure and must be copied
to memory from registers at the end of the procedure. Local
variables are obviously not live on exit. The "out" parameters
are parameters such as "pass by reference". At the start of the
procedure, no variables are assumed resident in registers. When a
procedure call is encountered, all variables are copied from
registers back into memory. Any temporaries in registers must be
saved on the stack, but how this is done is explained later in
the target machine code generator notes. In languages such as
Pascal and Modula, use of procedures is heavy, hence inter-procedural
motion analysis could conceivably produce useful code
improvements, particularly for heavily used short procedures such
as a "Get_Character" procedure. This could easily be added as an
additional phase. However time limitations currently preclude
implementation of inter-procedural flow analysis but its
incorporation should be seriously considered in the near future.
Static evaluation
Whenever an assignment is performed (LOAD, ADD, etc), then an
interpreter will be called to attempt to evaluate the code that
produces the operands to be added. The interpreter will be given
the Rcode tree pointer of the Rcode tree containing the GAS code
that produces the source used in the assignment.
Whenever a conditional expression is used for a case structure,
the interpreter will be called to attempt to evaluate the
case expression. The possibility of eliminating unreachable
options can then be detected and the code for the options
eliminated.
The interpreter will be expected to return the computed value if
static evaluation is possible. If a variable is assigned a static
value a STORE instruction assigning the static value to the
target variable is generated, and the subtree for the source is
discarded. If the lexical level is global, and the variable has
not been assigned a value previously, the gascode INITVAR is
generated which will be used to generate a linker directive to
initialise the storage allocated for the global variable to the
static value computed.
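The behaviour expected of the interpreter can be sketched as follows. This is a toy model, not the interpreter of the design: the tuple form of the source tree, the operator table OPS, the ASSIGN result and the known dictionary (static values recorded so far in straight-line code) are all assumptions made for illustration. Only the STORE name is taken from the text above.

```python
# Hypothetical operator table; a real interpreter would cover the GAS codes.
OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,
}

def evaluate(node, known):
    """Return the static value of a source tree, or None if not static.

    node is an int constant, a variable name, or an (op, left, right) tuple.
    """
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return known.get(node)     # None when the value is not static
    op, left, right = node
    lhs, rhs = evaluate(left, known), evaluate(right, known)
    if lhs is None or rhs is None or op not in OPS:
        return None
    return OPS[op](lhs, rhs)

def assign(target, source, known):
    """Emit a STORE of the static value if one exists, else keep the tree."""
    value = evaluate(source, known)
    if value is None:
        known.pop(target, None)    # target no longer has a static value
        return ("ASSIGN", target, source)
    known[target] = value
    return ("STORE", target, value)
```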
Other Global Optimisations
The remaining global optimisations all depend on the results of
code motion and data flow analysis. Code motion analysis
identifies basic blocks that consist of code sequences that have
one entry point and one exit point. Additionally the basic blocks
that can immediately precede and succeed each block are
identified. Within a code block, common subexpression and
register allocation can be effectively performed but these are
usually done after the global optimisations. The following
descriptions of code motion and data flow analysis are based on
ideas and algorithms by Aho and Ullman, but extended to account
for the specific environment provided by Rcode with GAS code and
to allow for the effects of type coercion in languages such as
Modula-2.
An algorithm for detecting basic code blocks based on one by Aho
and Ullman is as follows:
Identify leader statements, statements which
start each basic block.
- The first statement lexically in
a program unit or procedure is a
leader statement.
- Any statement which is the target of a
conditional or unconditional jump, (or
any labeled statement), is a leader.
- Any statement which follows a
conditional jump or call is a leader.
For each leader construct its basic block,
which consists of the leader and all
statements up to but not including the next
leader, a RETURN or the lexical end of the
program or routine. Any statement not placed
in a block can be removed.
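The steps above can be sketched in Python. The instruction layout (label, opcode, target) and the opcode names BR, BRC, CALL and RET are hypothetical. As assumptions of this sketch, the RETURN is kept as the last statement of its block, and a block is also closed after an unconditional branch, so that unlabelled statements following it are reported as removable.

```python
def basic_blocks(code):
    """Partition code into basic blocks using leader statements.

    code is a list of (label_or_None, opcode, target_or_None) tuples.
    Returns (blocks, dead): blocks is a list of lists of statement
    indices, dead lists the indices placed in no block.
    """
    leaders = set()
    for i, (label, opcode, _) in enumerate(code):
        if i == 0 or label is not None:        # first or labelled statement
            leaders.add(i)
        if opcode in ("BRC", "CALL") and i + 1 < len(code):
            leaders.add(i + 1)                 # follows a conditional jump or call
    blocks, dead, current = [], [], None
    for i, (_, opcode, _) in enumerate(code):
        if i in leaders:
            current = []
            blocks.append(current)
        if current is None:
            dead.append(i)                     # not reachable from any leader
            continue
        current.append(i)
        if opcode in ("BR", "RET"):
            current = None                     # block ends here
    return blocks, dead
```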
A flow graph can then be constructed which shows which basic code
blocks can be predecessors and successors of a given block. A
block B2 follows block B1 if:
i) There is a conditional or unconditional jump
from the last statement of B1 to the first
statement in B2.
ii) B2 immediately follows B1 lexically, and B1
ends in a call or conditional jump.
Also the lexically first block in the routine has no
predecessors, and the basic blocks which terminate with a RETURN
have no successors.
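A minimal sketch of the flow graph construction follows. Each basic block is summarised by the label of its leader and by its final instruction; this summary form and the opcode names are assumptions of the example. Rule (ii) is generalised slightly here, as an assumption, so that any block not ending in an unconditional jump or RETURN also falls through to the lexically next block.

```python
def flow_graph(blocks):
    """Compute successor and predecessor sets for basic blocks.

    blocks is a lexically ordered list of
    (leader_label_or_None, last_opcode, target_or_None) tuples.
    """
    by_label = {label: i for i, (label, _, _) in enumerate(blocks)
                if label is not None}
    succ = {i: set() for i in range(len(blocks))}
    for i, (_, opcode, target) in enumerate(blocks):
        if opcode in ("BR", "BRC"):
            succ[i].add(by_label[target])      # rule (i): jump to first statement
        if opcode not in ("BR", "RET") and i + 1 < len(blocks):
            succ[i].add(i + 1)                 # rule (ii): lexical successor
    pred = {i: set() for i in succ}
    for i, outs in succ.items():
        for j in outs:
            pred[j].add(i)
    return succ, pred
```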
During the identification of basic blocks several useful extra
functions can be performed. Set the "in" and "out" lists to NIL
to indicate that variables have not yet been identified (after
basic blocks have been identified, target machine generation may
be performed which would try to use these "in" and "out" lists.
The NIL value will be an indication that the lists do not exist).
The identification of "variables" and "pointers" of interest can
be performed.
After the identification of all basic blocks, collapsing of
blocks with only one predecessor into the predecessor can be
performed. GAS codes that do not belong to any basic block can be
deleted. Another pass through the basic blocks to perform common
expression elimination can then be performed so that extra
"useful variables" can be identified. Note that common
subexpression elimination cannot be performed until "variables"
of interest (as opposed to Rcode storage areas and variables)
have been identified.
The major task of data flow analysis can then begin. This
analysis takes several forms. The major forms are du-chaining,
ud-chaining and live variable analysis.
Reaching analysis (ud-chains) involves determining for each basic
block, for each variable, what definitions of variables are
still valid on entry to a block, and which definitions are valid
on exit from the block. Definitions are any statements which
assign a value to a variable. This information is useful for such
things as constant folding: where a variable (A) has its value
used at a point, but it is noted from ud-analysis that the
definition A := 2 is the only live definition of A at the
instruction, then the value "2" can be used in place of the
reference to A. The ud-chains also assist in detecting loop
invariant code, particularly subexpression computation, copy
propagation (e.g. for x=y z=x, if x has no further uses, it is possible
to generate z=y), and detection of induction variables.
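Reaching analysis is usually computed by iterating gen and kill sets over the flow graph until nothing changes. The sketch below shows that standard iterative scheme under simplifying assumptions: each block is reduced to an ordered list of (definition id, variable) pairs, and the names reaching_definitions, blocks and preds are hypothetical.

```python
def reaching_definitions(blocks, preds):
    """Iterative reaching-definitions analysis.

    blocks: {name: [(def_id, variable), ...]} in execution order.
    preds:  {name: [names of predecessor blocks]}.
    Returns (reach_in, reach_out), each mapping a block to a set of def ids.
    """
    all_defs = {d: v for defs in blocks.values() for d, v in defs}
    gen, kill = {}, {}
    for name, defs in blocks.items():
        last = {}
        for d, v in defs:
            last[v] = d            # a later definition overrides an earlier one
        gen[name] = set(last.values())
        kill[name] = {d for d, v in all_defs.items() if v in last} - gen[name]
    reach_in = {n: set() for n in blocks}
    reach_out = {n: set(gen[n]) for n in blocks}
    changed = True
    while changed:
        changed = False
        for n in blocks:
            reach_in[n] = set().union(*(reach_out[p] for p in preds.get(n, [])))
            new_out = gen[n] | (reach_in[n] - kill[n])
            if new_out != reach_out[n]:
                reach_out[n], changed = new_out, True
    return reach_in, reach_out
```

If exactly one definition of a variable appears in reach_in for a block, and it assigns a constant, a use of that variable at the start of the block is a candidate for constant folding.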
Live variable analysis involves identifying for each basic code
block, variables that are alive on entry and variables that are
alive on exit. "Aliveness" of a variable means that the variable
is used with its current value at some time later in the
code. A use can be made of this information in register
allocation. If a register is required but all registers are
currently in use, choose a register that contains a variable
value that is no longer alive. Live variables analysis can be
achieved using the results of ud-chain analysis.
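For illustration, live variable information can also be computed directly with the standard backward iterative scheme, sketched below. This is a swapped-in textbook technique rather than the derivation from chains described above. The names are hypothetical; live_at_exit models the worst-case assumption that "out" and outer scope variables are alive when the procedure exits.

```python
def live_variables(blocks, succs, live_at_exit=frozenset()):
    """Backward iterative live-variable analysis.

    blocks: {name: [(defined_var_or_None, [used_vars]), ...]}.
    succs:  {name: [names of successor blocks]}.
    Returns (live_in, live_out), each mapping a block to a set of variables.
    """
    use, define = {}, {}
    for name, instrs in blocks.items():
        use[name], define[name] = set(), set()
        for d, used in instrs:
            # used before any definition within this block
            use[name] |= set(used) - define[name]
            if d is not None:
                define[name].add(d)
    live_in = {n: set() for n in blocks}
    live_out = {n: set() for n in blocks}
    changed = True
    while changed:
        changed = False
        for n in blocks:
            nexts = succs.get(n, [])
            if nexts:
                out = set().union(*(live_in[s] for s in nexts))
            else:
                out = set(live_at_exit)        # block leaves the procedure
            new_in = use[n] | (out - define[n])
            if out != live_out[n] or new_in != live_in[n]:
                live_out[n], live_in[n], changed = out, new_in, True
    return live_in, live_out
```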
The du-chaining analysis involves computing for each basic block,
for each variable, uses of the variable that could use the value
of the variable as it is on entry to the block, and similarly on
exit from the block. "Uses of a variable" are those GAS code
instructions that have the variable as an operand. Hence for each
variable there will be a list of identifiers of instructions that
will use the variable's current value. The du chains can be used
for assisting with the detection of loop invariant code,
particularly subexpressions, from the body of a loop. They can also
be used to assist with register allocation by providing
information on which variables will be used soonest after a given
point in a program.
Global common subexpression analysis would require that available
expression data flow analysis be performed. This would require
that each expression be assigned an id and expressions alive on
entry and exit from each basic block computed. Available
expressions analysis is similar to reaching analysis, except
that for an expression to be available at the start of a block it
must be available at the end of each of its predecessors. An
expression is killed by a basic block if any of its
operands are defined. An expression is defined in a basic block
only if it is computed in the block.
Identification of Variables 
A major question is what constitutes a "variable" for this
analysis. The Rcode will only contain definitions of storage
areas which are variable or static areas. These storage areas may
contain only a simple Rcode basictype object, which is the basic
operand type used in GAS code instructions (except for block
operands used in memory move instructions and bitstring operands
used in bitstring operations), or could contain an array or record
structure. If these storage areas were the "variables"
identified, the optimisations achievable would be limited, and
register allocation would be severely limited in the information
that it could use from data flow analysis. As basic types will
correspond very closely to objects that will be stored in
registers and will be the objects used in computations used in
subexpression evaluations and loop control variables, these are
the variables most advantageous to specify. In the following
these are what are referred to as "variables". Rcode storage
areas and variables are referred to together as "storage areas".
Storage areas may contain several basictype variables. The
problem is then how to identify basictype variables that may be
located anywhere in a storage area.
The Rcode has provision to convey symbol table information, which
would be very useful for the identification of variables. However
the Rcode reference manual states that the backend will not
expect to see these Rcodes. If Rcode is handed directly to the
back end by the front end, the symbol table could be provided
also. However, if the Rcode is written to disk the symbol table
would not be available to the backend. For this reason the
back end has been designed assuming no access to the symbol
table. It has also been assumed that Rcodes for symbol table
definitions will not, as the Rcode manual states, normally be
present in the Rcode. If Rcode symbol table definitions were
contained in the Rcode, then the Rcode could be very long if many
modules had been imported and symbol definition Rcodes for
imported modules would be duplicated in any programs that also
import those modules. An alternative would be to assume that an
Rcode file only contains the symbol table definition Rcodes for
the program excluding those from imported modules, and then the
symbol files for all imported modules could be read to
reconstruct the full symbol table. The front end will generate
type definition Rcodes for debugging purposes, but only for the
current module. Additionally, all definitions will not
necessarily be included. The pragma involved will specify which
type definitions are to be included. Whether for optimisation the
extra overhead of reading symbol files for all imported modules
and handling the symbol table is worthwhile is not obvious.
Perhaps at a later stage of this project an investigation can be
made of the benefits obtainable. As a compromise, the specific
inclusion of symbol table information for pointers would be of
most value (see notes on pointers).
It is not necessary to have the full symbol table available to
the backend. In fact the optimiser database structure (see notes
below) is a useful form for optimisation. If symbol table
information were present in the Rcode, it could be used purely to
help build the variable and pointer database of the optimiser. A
full symbol table is not necessary for optimisation. Full symbol
table information can be useful. For example in Pascal pointers
are guaranteed to point only to objects of specified types. When
an object pointed to by a pointer is changed, only those objects
of the type the pointer can point to are considered likely to
change.
In the absence of symbol table information in the Rcode,
basictype objects and pointers must be identified by an
examination of the GAS code. When new basictype objects are
encountered, an entry should be made in the optimiser database
defining the variable address and its datatype (obtained from the
GAS code instruction in which the new variable is encountered).
Only those variables that can be clearly identified are useful in
dataflow analysis. Such variables are temporary variables and
those with a static offset into a storage area. Those that are
identified by a computed offset are not explicitly identifiable
by the compiler, and the compiler will not be able to determine
what actual piece of storage is involved, although it may be
possible to determine when the same variable is later referred
to, with the emphasis on may. The computed offset is always
computed into a temporary. If two variables have a computed
offset in the same temporary, and the datatypes are the same,
then the two variables are actually the same (because in GAS
code, temporaries are only ever defined once but may be used
several times). Note that the offsets for two variables will
only be computed in the same temporary if common subexpression
elimination has been performed. This is because if Rcode refers
to the same variable at a computed offset in two locations, the
same subtree will be duplicated in both locations to compute the
offset. In GAS code these subtrees will be converted to producing
a computed offset in two different temporaries. If common
subexpression analysis is performed, the same temporary will be
used in each location.
If a variable identified by a dynamic offset is encountered
during dataflow analysis, all variables in the storage area
involved must then be considered potentially used or defined as
appropriate. This is further complicated in languages that allow
static type coercion, such as Modula. Static type coercion will
allow variables to overlap so that when one variable is
altered, other variables may also be altered in a way that
depends on the degree of overlap. Modula allows address
manipulation and this provides another source of possible overlap
of objects. As this address manipulation may involve dynamic
computation, the effect of overlap would be impossible for the
compiler to analyse. Static overlaps could however be analysed
and the effects determined.
The definition of a variable in a storage area is an address and
a datatype. In a language such as Modula 2 variables could start
at the same address, but be of different datatype and possibly
also of different size.
In data flow analysis and register allocation, these effects must
be considered. Any consideration must be conservative, so that
incorrect code will not be generated.
For definition reaching analysis, it is desired to compute all
definitions of variables that may reach a given basic block. As
stated above, this information is used for such optimisations as
constant folding. If it is known that only one definition such as
y:=10 can reach an instruction x:=y, then folding can be used to
generate x:=10. If any other definition can reach the instruction
then x:=y must be generated. It wouldn't matter if the exact
form of the definition is unknown; all that is important is
whether another definition could reach the x:=y instruction. When
a variable at a computed offset is used, a definition of all
variables in the storage area should be considered generated, but
the variable involved may be set to an unknown value because it
may be overlapped partially by some unknown amount. All current
definitions of variables in the area should be considered killed.
For example, if x:=10 reaches a code block, but x:=20 occurs in
the block before y:=x, then the x:=10 can be considered "killed",
but constant folding of y:=x to y:=20 can still be used. However
if x is a variable in storage area two, and an assignment
definition setting a variable to 20 is encountered for a
variable in storage area two but at a dynamically computed
offset, it is not known if x is affected. We therefore assume
x:=10 is killed and a new but undefined definition of x is
generated. This will prevent constant folding as the value of x
cannot be computed by the compiler. The concept of an undefined
definition is a useful way of handling the effects of uncertain
overlap. The existence of one of these definitions will prevent
erroneous code being generated. That is, they will stop
optimisations such as constant folding from being performed when
there is uncertainty of the validity of the optimisation.
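The effect of an undefined definition on constant folding can be
sketched as follows. This is only an illustration under stated
assumptions: the names UNKNOWN, define_static, define_computed and
fold are hypothetical, and the reaching information is held as a
simple map from each variable to the set of values that may reach
the current point, rather than as the bitstrings used by the
optimiser.

```python
# Hypothetical sketch: an "undefined definition" blocks constant folding.
UNKNOWN = object()  # marker for a definition whose value cannot be computed


def define_static(reaching, var, value):
    """A definition of a clearly identified variable kills the old ones."""
    reaching[var] = {value}


def define_computed(reaching, area_of, area):
    """A definition at a computed offset in `area`: every variable in the
    area has its current definitions killed and an undefined one generated."""
    for var in reaching:
        if area_of[var] == area:
            reaching[var] = {UNKNOWN}


def fold(reaching, var):
    """Return the constant to substitute for a use of `var`, or None."""
    values = reaching.get(var, set())
    if len(values) == 1:
        (value,) = values
        if value is not UNKNOWN:
            return value
    return None


area_of = {"x": 2, "y": 1}          # x lives in storage area two
reaching = {"x": {10}, "y": {5}}    # x:=10 reaches the block
define_static(reaching, "x", 20)    # x:=20 kills x:=10
folded_before = fold(reaching, "x")
define_computed(reaching, area_of, 2)
folded_after = fold(reaching, "x")
```

After the static definition the use of x can be folded to 20; after
the definition at a computed offset in area two it cannot, while y in
another area is unaffected.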
For "liveness" analysis, if a variable at a computed offset is
defined, then consider "live" variables in the storage area to
remain "live". In liveness analysis it is hoped to find when a
variable is clearly defined at some point, because the variable is
then dead between the previous use and this definition. Any old
value in a register between the previous use and this definition
is dead, hence the register can be reused without saving the value
in memory. In the case where a variable identified by a computed
offset is defined, none of the variables in the storage area may
be defined for sure, so all live variables must be considered
still live (hence the register contents will be saved if the
register is reused). Additionally, when target machine code is
being generated for the GAS code involved, any registers
currently holding variables in the area will have to first be
saved back to memory, and the register marked to indicate it no
longer holds the variable value, because of possible partial
overlap. This would ensure any reference to any variable in the
storage area in a later instruction will use the value in memory,
which will be the correct value. If any values were left in
registers, these values would no longer be correct if there were
any overlap at all. If the computed offset for a variable is
stored in the same temporary as the temporary used to compute the
offset for the variable stored in the register, and the datatype
is the same, then the register will not have to be dumped (this
will allow useful register use for source code such as
a[i+1]:=a[i+1]+1). If the variable is used (as opposed to
defined), consider all objects in the storage area to become live.
This is conservative in that all variables in registers will be
saved if the register is reused, because the value involved may
be touched later by this use. Again any registers containing
values in the storage area should be first dumped to memory, but
the register is still marked to indicate it holds the value, in
case of possible overlap. This will ensure that the value "used"
will be correct.
For available expression analysis, an expression is considered no
longer available when it is possible that one of its operands is
changed. Therefore when a variable is defined at a computed
offset, all variables in the area could change, so that any
expressions in which any of these variables are used must be
considered no longer available.
In the presence of type coercion, when any variable is defined,
any registers containing variables that overlap should be dumped
first, and marked as no longer containing the variable. When any
of the overlapping variables are later referenced, the memory
values will be up to date and will have to be used, as no register
copy for any of them will exist. When a variable is used, any
variables overlapping should be dumped first, but the registers
still marked as containing the variables. This is true even if
the variable involved is identified with a static offset.
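The register handling just described can be illustrated with a
small sketch. The function name before_access and the two maps are
assumptions made for the illustration only: registers maps a
register name to the variable it currently holds, and overlaps maps
a variable to the set of variables it may overlap.

```python
# Hypothetical sketch of the dump-and-mark policy for overlapping variables.
def before_access(registers, overlaps, var, is_definition):
    """Return the variables that must be stored to memory before `var`
    is accessed. On a definition the registers are also marked empty;
    on a use they keep their contents."""
    stored = []
    for reg, held in registers.items():
        if held is not None and held in overlaps.get(var, set()):
            stored.append(held)          # dump the register to memory first
            if is_definition:
                registers[reg] = None    # register copy is no longer valid
    return stored


registers = {"r1": "a", "r2": "b"}
overlaps = {"x": {"a"}}                  # x may overlap a, but not b

dumped_on_use = before_access(registers, overlaps, "x", is_definition=False)
held_after_use = registers["r1"]
dumped_on_def = before_access(registers, overlaps, "x", is_definition=True)
```

A use of x dumps a but leaves r1 marked as holding it; a definition
of x dumps a and clears r1, so later references to a reload it from
memory.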
Coercion makes it necessary to know which variables overlap
each other. This can be achieved by a bitstring (see below) for
each variable to represent any variable that it overlaps. For
variables identified by computed offset, it would have to be
assumed all other variables in the storage area are overlapped.
When a variable is used or defined, all variables it overlaps
must be treated as being referenced, or defined with an unknown
value. The same approach could be taken for variables identified
with a static offset. However this is not as accurate as
possible. To improve on this approach, the exact effect on the
variables overlapped could be computed. These computations would
involve determining which variables are overlapped and dumping
only these variables if in registers. A bitstring representing
variables overlapped (see below) associated with each statically
identified variable would be an efficient method of
implementation. In initial implementations of this compiler, the
simpler approach will be used.
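The more accurate approach for statically identified variables can
be sketched as below. This is an assumption-laden illustration, not
the implementation planned for the compiler: each variable is taken
to be an (offset, size) pair within one storage area, its position
in the list is taken as its optimiser variable id, and a Python
integer stands in for the overlap bitstring.

```python
# Hypothetical sketch: overlap bitstrings from static offsets and sizes.
def overlap_masks(variables):
    """variables[i] = (offset, size) of variable i in one storage area.
    Bit j of the returned masks[i] is set when variable j overlaps i."""
    masks = [0] * len(variables)
    for i, (off_i, size_i) in enumerate(variables):
        for j, (off_j, size_j) in enumerate(variables):
            # Two half-open byte ranges overlap when each starts
            # before the other ends.
            if i != j and off_i < off_j + size_j and off_j < off_i + size_i:
                masks[i] |= 1 << j
    return masks


# Variable 0 occupies bytes 0..3, variable 1 bytes 2..3, variable 2 bytes 4..7.
masks = overlap_masks([(0, 4), (2, 2), (4, 4)])
```

Here variables 0 and 1 overlap each other and variable 2 overlaps
nothing, so only the register holding the other member of the pair
would need to be dumped.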
As variables are only identified as code is processed, it makes
sense to identify useful variables during the pass of the code
used for Basic Code Block analysis.
Bitstrings in the optimiser
During dataflow analysis, it is often necessary at the start and
end of basic code blocks to identify the state of all variables,
definition instructions and use instructions important to the
optimiser at that point, in relation to a particular property that
is boolean in nature (i.e. the liveness of each variable). An
economical way to achieve this is to use a bitstring where each
bit represents the state of the property for each variable,
definition or use important to the optimiser. Each variable,
definition or use important to the optimiser has a bit position
allocated in these "optimiser bitstrings". For example, a
bitstring used to represent liveness of variables at the start
of a basic block of code typically may be:
101110011
i.e. variable one is live, variable two isn't, etc.
The bitstring will be used in logical OR and AND operations many
times, hence it must be efficiently implemented. Bitstrings will
consist of a linked list of bitstring pieces. Each piece is of
the size of the most convenient object size for manipulating
bitstrings on the machine the compiler is running on. Various
bitstrings recorded for a basic code block will be pointed to
from the basic code block optimiser database records, and
additionally variable bitstrings will be maintained for each
storage area, of variables identified that are located in that
storage area. An array of such pointers will be kept for each of
variable, global, and imported storage areas for each lexical
level.
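A bitstring built as a linked list of pieces might look like the
following sketch. The piece size WORD_BITS, the class Piece and the
functions make, set_bit, get_bit and combine are all assumptions for
illustration; a real implementation would pick the piece size to
suit the host machine.

```python
# Hypothetical sketch: bitstrings as a linked list of fixed-size pieces.
import operator

WORD_BITS = 32  # assumed "most convenient object size" on the host machine


class Piece:
    def __init__(self, bits=0, next_piece=None):
        self.bits = bits
        self.next = next_piece


def make(nbits):
    """Build an all-zero bitstring with room for at least nbits bits."""
    head = None
    for _ in range((nbits + WORD_BITS - 1) // WORD_BITS):
        head = Piece(0, head)
    return head


def _piece_for(head, index):
    piece = head
    for _ in range(index // WORD_BITS):
        piece = piece.next
    return piece


def set_bit(head, index):
    _piece_for(head, index).bits |= 1 << (index % WORD_BITS)


def get_bit(head, index):
    return bool(_piece_for(head, index).bits >> (index % WORD_BITS) & 1)


def combine(a, b, op):
    """Apply op (operator.or_ or operator.and_) piece by piece to two
    bitstrings of equal length, returning a new bitstring."""
    head = tail = None
    while a is not None and b is not None:
        node = Piece(op(a.bits, b.bits))
        if head is None:
            head = node
        else:
            tail.next = node
        tail = node
        a, b = a.next, b.next
    return head


a = make(40)
set_bit(a, 1)
set_bit(a, 35)
b = make(40)
set_bit(b, 3)
set_bit(b, 35)
union = combine(a, b, operator.or_)
common = combine(a, b, operator.and_)
```

Each OR or AND touches one machine word per piece, which is what
makes the representation economical for the repeated set operations
of data flow analysis.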
The optimiser database record for an identified variable should
include the following:
    a) Optimiser assigned variable id
       (also serves as optimiser variable
       bitstring bit offset)
    b) Rcode storage area id
       (area id or
       variable id or
       temporary id)
    c) Offset (must be static value or temporary)
    d) Type of variable (Rcode Basic type)
    e) Pointer to overlap bitstring
    f) Link to next variable in storage area
    g) Used as pointer (see later)
    h) Reference count in routine
    i) Liveness
    j) Next use GAS code instruction
The (i) and (j) fields are used in next-use analysis during
target machine code generation. See the chapter on target machine
code generation.
Access to variable records is required as follows:
    a) Given a storage area id, offset and basic type,
       locate the variable record
    b) Given one variable in a storage area, locate the
       variable records for all other variables in
       the area
    c) Given a variable id, locate the record for
       the variable
The first access requirement is obtained by accessing the GAS
code entity database given the storage area id, obtaining the
variable id of the first variable in the storage area, then
using the link to the next variable in the storage area to find
the variable record that matches the offset and basic type.
The link to the next variable in a storage area is used for
construction of a linked list of variables belonging to an area.
The linked list will be maintained as a ring. The start of the
ring will be pointed to from the GAS code entity database record
for the storage area, which will contain the variable id of the
start of the ring. The ring allows the second access requirement
to be provided.
The pointers to all currently active variable definitions will be
kept in an array indexed by the variable's id. This allows the
third access requirement to be provided.
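The three access requirements can be sketched together as follows.
The class and field names (VarRecord, VariableDatabase,
next_in_area, area_start) are hypothetical, and only a few of the
record fields listed above are modelled; a plain dictionary stands
in for the GAS code entity database lookup of the ring start.

```python
# Hypothetical sketch: variable records linked as a ring per storage area.
from dataclasses import dataclass


@dataclass
class VarRecord:
    var_id: int
    area_id: int
    offset: int
    basic_type: str
    next_in_area: int = -1   # id of the next variable in the ring


class VariableDatabase:
    def __init__(self):
        self.by_id = []       # indexed by variable id (requirement c)
        self.area_start = {}  # area id -> variable id at the start of the ring

    def add(self, area_id, offset, basic_type):
        rec = VarRecord(len(self.by_id), area_id, offset, basic_type)
        self.by_id.append(rec)
        start = self.area_start.get(area_id)
        if start is None:
            rec.next_in_area = rec.var_id        # ring of one
            self.area_start[area_id] = rec.var_id
        else:
            first = self.by_id[start]
            rec.next_in_area = first.next_in_area
            first.next_in_area = rec.var_id      # splice in after the start
        return rec

    def others_in_area(self, var_id):
        """Requirement b: walk the ring until it returns to the start."""
        current = self.by_id[var_id].next_in_area
        while current != var_id:
            yield self.by_id[current]
            current = self.by_id[current].next_in_area

    def find(self, area_id, offset, basic_type):
        """Requirement a: search the ring for a matching record."""
        start = self.area_start.get(area_id)
        if start is None:
            return None
        for rec in [self.by_id[start], *self.others_in_area(start)]:
            if rec.offset == offset and rec.basic_type == basic_type:
                return rec
        return None


db = VariableDatabase()
a = db.add(1, 0, "int")
b = db.add(1, 4, "real")
c = db.add(2, 0, "int")
```

Because the list is a ring, the walk can start from any variable in
the area and still visit all of the others exactly once.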
GAS code statements that define and use variables must also be
identified. Bitstrings will be used, but statements of interest to
the optimiser must be assigned an id by the optimiser. An array
of pointers to instructions of interest will be maintained. The
array will be indexed by the statement ids.
If available expression analysis is performed, bitstrings will be
maintained to represent whether expressions are alive or not at
the start and end of basic blocks.
Pointers and their effect on data flow analysis 
Use of pointers complicates data flow analysis. When an object
that is pointed to by a pointer is altered, all variables could
be assumed to change. This is on the conservative side. Analysis
of the effects of pointers on reaching analysis and live variable
analysis is very useful for optimisation in languages such as
Pascal and Modula because they are heavy in the use of pointers.
Also any languages that allow call by reference parameters will
benefit from analysis of pointers. However such an analysis can
be very difficult, particularly in a language such as Modula that
allows coercion, which would allow overlap of variables, and
arithmetic of any desired form on pointers. The problems covered
above, involved in handling objects at computed offsets and in
the presence of possible overlapping of variables, are made
significantly worse in the presence of pointers. Pointer
analysis also has a high overhead because for each basic block,
for each pointer, a list of variables that may be pointed to on
entry, and exit, must be maintained. The analysis of pointers as
discussed below is based on the ideas of Aho and Ullman [24], but
has been extended to include identification of pointers, and
coercion and arbitrary arithmetic on pointer contents. In
particular, the concept of a pointer "touching" a variable as
opposed to pointing to a variable is introduced. When a pointer
"touches" a variable, the pointer will not point explicitly at
the variable, but may point to an object that overlaps the
variable. This concept is important in reaching analysis, in that
a definition statement may touch a variable, hence altering its
value, but it can't be inferred exactly how the value of the
variable will alter, which we could have done if the pointer
specifically pointed to the variable. This would be important,
for example, where two definitions reached a point, but they both
set the variable to 2. It would seem appropriate to perform
constant folding; however, if one of the definitions only touched
rather than pointed explicitly to the variable, this cannot be
done. Again the concept of definition to an undefined value
becomes important.
Pointer analysis is made even more difficult in Rcode or GAS code
because pointer type variables are not identified types. In fact
the only objects declared are storage areas for which size and
alignment is given. Aho, Sethi and Ullman [9] express the opinion
that pointer analysis should only depend on use of objects which
are clearly defined as pointers, and for source languages in
which assignments to pointers are guaranteed to only provide
valid pointer values. In a language such as Modula, this is in no
way guaranteed because of coercion. Also because of coercion,
when an object pointed to is used or defined, all variables in
the storage area containing the object pointed to should be
considered possibly used or defined indirectly. This is unless
the compiler performs accurate variable overlap computation as
mentioned above.
The above problems in a language such as Modula make pointer
analysis of debatable value, but it is planned to be implemented.
The user will however have to explicitly request pointer
analysis, and this must be clear in the project documentation.
In the absence of overlap calculations, pointers are only worth
handling on a storage area basis. For a pointer, it is only
required to know which storage areas the pointer may point to.
When the object pointed to is defined or used, all variables in
any storage areas that may be pointed to will be considered used
or defined, because the variable pointed to may overlap any other
variables in the area. Note that if one of the variables in a
storage area so used is a pointer, this pointer will then be
considered to possibly point to all variables. This will in fact
also be the case whenever any variable in a storage area is
altered; all pointers in that area must be considered to be able
to point to all variables, from that point on.
Pointers can be identified whenever a part of a storage area or
temporary is used to store an address value. That part of the
storage area involved must be identified by a static offset into
the variable or by a computed offset into a storage area. If the
pointer value is stored at a computed offset into a variable area
(which occurs when an array exists whose elements contain
pointers), it will not be possible to clearly identify when this
pointer value is used later. The behaviour of such pointers
cannot be analysed by the compiler, because it can never be
certain which pointer is being referenced. Whenever a pointer
value located at a computed offset into a variable is used to
reference a variable, all variables (and pointers are also
variables) will have to be considered possibly used or defined,
because it will not be known what actual object is being
referenced; the compiler cannot compute when the contents of this
pointer was set. It would be impossible to determine even the
Rcode storage area that is pointed to. The approach to pointers
must be conservative, and the conservative approach when unsure
of a pointer's exact contents is to assume that all variables may
be potentially defined or used. Consider the pointer to possibly
"touch" all variables, touch in the sense that all variables may
not be pointed to exactly, but they may overlap with the variable
pointed to. If certain of the variable pointed to, all variables
in the storage area of the object pointed to must be considered
"touched" because of possible overlap with the object pointed
to. The conservative treatment in data flow analysis of pointers
that may point to several objects is described below.
The term "touched" is used in the descriptions below to define 
all variables that the compiler considers may be pointed to or 
possibly overlapped by the objects that may currently be pointed 
to. 
In reaching analysis, when the object pointed to is defined, all
variables that may be touched are considered defined by an
unknown value (unknown because the overlap involved is unknown
because of the possibility of coercion), unless it is known by
the compiler that a specific variable is pointed to directly, in
which case the variable will be defined to the value generated in
the instruction. Any current definitions of the variables touched
should be considered killed. If the pointer explicitly points to
any variable then this variable will be considered defined by
the value generated in the instruction.
In liveness analysis, consider first the case where the variable
pointed to is defined. Then consider no change; all live
variables remain live. Now consider when the variable pointed to
is used. Consider all variables in storage areas that may be
pointed to to become live. As any variable in any storage area
that can be pointed to can be involved, this is conservative and
is equivalent to saying "immediately preceding this instruction,
save the contents of any register that contains the contents of a
variable in any storage area that may be pointed to, because this
statement may refer to a part of that variable".
If an address is loaded into a location that is not clearly
identified, then consider all pointers in the storage areas that
may contain the pointer to now be able to point to all variables.
The variable whose address is involved cannot just be added to
the list of variables that the pointers can point to, because
possible overlap of pointers means that the effect on pointers
that have been identified is unclear. This produces the same
effect as loading any value, not just an address value, into a
variable that can't be clearly identified.
The next problem in pointer analysis involves specifying what is
pointed to by the pointer. The contents may be unknown or
undefined. If unknown, it means that the pointer may have been
assigned a value but nothing is known of the value. In this case,
the pointer must be assumed to possibly point to all variables,
or storage areas. Undefined means that the pointer has not been
assigned a value. A use of a pointer with unknown value will
produce a compiler warning message. The contents of a pointer
will be represented as follows:
    a bitstring indicating variables that may be
    "touched", in that the object pointed to may overlap an
    object that is touched, but is not pointed to directly
    AND
    a bitstring indicating variables that may be explicitly
    pointed to
The AND represents the fact that on some paths into a basic
block, the contents of a pointer may have been defined to point
explicitly to variables and touch any other variables in the same
storage areas as these variables, whereas other paths may only
have determined that the pointer may touch certain variables.
The following is how pointers will be analysed to determine what
pointers may point at, and what they may touch. The analysis is
based on the algorithm of Aho and Ullman [24].
a) If there is an assignment to a pointer where the source
   GAS code operand is "addressof x" then the pointer
   will point to "x" explicitly and may touch any other
   variable in the same storage area as x.
b) If a pointer "a" is assigned a value which is the
   contents of another pointer "b", then pointer "a" can
   only point to the variables that "b" can point to and can
   only touch the variables that "b" can touch.
c) If a pointer "a" is assigned a value which results from a
   computation which involves adding a constant value to the
   address for the start of a storage area, and the result
   will not go outside the area, then assume the pointer may
   point to any variable in the area. If the result would go
   outside the area, assume all variables may be indicated.
d) If a variable, including any pointer, in the same area as
   a pointer is changed, then the pointer must be considered
   able to touch any variable, but not point to any. This
   is on the conservative side.
e) If the pointer is assigned a computed value other than as
   specified in (c), assume the pointer can then point to any
   variable.
When analysing pointers across code blocks, any pointer on entry
to a block should be considered to possibly point to the union of
what the pointer may point to on exit from all predecessor
blocks.
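The rules above and the union on block entry can be sketched as
follows. This is an illustration only, using Python sets in place
of the two bitstrings; the names PointerState, address_of,
copy_pointer, area_plus_constant, area_changed, computed_value and
merge are hypothetical. Where a rule does not say what happens to
the "touch" set, the sketch leaves it empty, which is an assumption.

```python
# Hypothetical sketch: "points to" and "touches" sets for one pointer.
from dataclasses import dataclass, field


@dataclass
class PointerState:
    points: set = field(default_factory=set)   # may be explicitly pointed to
    touches: set = field(default_factory=set)  # may be overlapped only


def address_of(var, area_vars):
    """Rule (a): points to var, may touch the rest of its storage area."""
    return PointerState({var}, set(area_vars) - {var})


def copy_pointer(src):
    """Rule (b): the target inherits both sets of the source pointer."""
    return PointerState(set(src.points), set(src.touches))


def area_plus_constant(area_vars, all_vars, in_bounds):
    """Rule (c): area start plus a constant."""
    return PointerState(set(area_vars) if in_bounds else set(all_vars))


def area_changed(all_vars):
    """Rule (d): something in the pointer's own area was altered."""
    return PointerState(set(), set(all_vars))


def computed_value(all_vars):
    """Rule (e): any other computed value."""
    return PointerState(set(all_vars))


def merge(predecessor_states):
    """State on entry to a block: union over all predecessor exits."""
    result = PointerState()
    for state in predecessor_states:
        result.points |= state.points
        result.touches |= state.touches
    return result


all_vars = {"x", "y", "z"}
area = {"x", "y"}
p = address_of("x", area)          # one path: pointer := addressof x
q = area_changed(all_vars)         # other path: the pointer's area was altered
entry = merge([p, q])
```

On entry to the join block the pointer may still point explicitly
only to x, but it may touch every variable, so a definition through
it must be treated as an undefined definition of all of them except
where x is known to be the target.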
Loop Optimisation
Code loop optimisation involves global code motion analysis on
the code comprising the loop, using the information on reaching
analysis. However most loop optimisations will be invalid using
these techniques if the code motion flow graph (the graph
representing the possible motion of basic blocks) for the
procedure is not reducible. To be reducible essentially means
that any loop must have only one entry point. Languages such as
Modula-2 are inherently reducible, but languages such as Fortran
and Pascal with GOTOs can allow programs with loops that can be
entered in more than one place. However structured programming
also results in reducible flow graphs. Additionally, loops with
multiple entry points will not be common because the logic
associated with them is tricky and usually avoided. However, the
programmer should have the ability to suppress such loop
optimisations during compilation, and there should be a clear
warning in the documentation about the need for this suppression
when loops with more than one entry point are used.
If loop analysis is to be performed, the Rcode structure makes
loops easy to identify. The "Loop" Rcode clearly identifies loops
(except those implemented using GOTOs, but if the graph is
reducible, this just means that some loops are missed). This
greatly simplifies the process needed to find loops as described
by Aho and Ullman for three address intermediate code. When loop
analysis is performed, for each loop identified, a database
record will be produced that contains:
    Loop id
    Contained loops
    Contained basic code blocks
The loop id is assigned by the optimiser. The contained loops are
loops contained within the loop (sub-loops). These are easily
detected in the Rcode by the nested structure clearly represented
by the Rcode. Note that a reducible flow graph cannot have
overlapping loops. Contained basic code blocks are those basic
code blocks in the loop that are not contained inside sub-loops.
Loop identification will be useful in register allocation.
Emphasis will be placed on allocating to registers variables
heavily used within an innermost loop, for the duration of that
loop. The level of use of each variable is obtained by computing
usage counts for variables in the basic blocks of the loop (see
under target machine code generation).
If loop analysis is not to be performed, loops will not be
identified. Later stages that may use loop analysis results will
operate quite happily as they will merely find that no loops have
been identified.
Loop optimisations include moving loop invariant code out to the
start of the loop and creating a preamble to the loop which is
only executed once on entry to the loop. There will not be the
problem that can occur in some intermediate languages, in that
instructions are removed from the loop that are the target of
jumps, because jumps refer to points defined by the USEVAR
GAS code instruction, which will not be removed from the loop. A
second optimisation will involve identification of induction
variables and replacement of their computation by addition and
subtraction rather than multiplication and division (strength
reduction), or even complete elimination of the variable if only
involved in tests. The algorithms for loop optimisations are well
known and available from various sources (see bibliography).
Local Code Block Optimisation
Local code block optimisation is performed after global data flow
based optimisations. Typical local code block optimisations
include common subexpression elimination, re-ordering of
instructions and peephole optimisation such as elimination of
unnecessary assignment statements.
Common subexpression analysis can be achieved easily within a
local code block by keeping an "event number" which is
incremented each time an assignment is encountered. At the start
of the basic block, this event number is set to zero. The symbol
table descriptor of each variable records the event number of the
last assignment that altered its value. The "event number" of the
expression is the largest event number of the variables within
it, when the expression is computed. When the expression is
needed again, its event number is recomputed and compared to its
event number when previously computed. If changed, code to
recompute the expression value will be generated. Several
alternative well known algorithms can be used. Note that common
subexpression elimination is performed on basic blocks during the
identification of basic blocks, because it is easy, and this will
hopefully allow more variables to be identified as being
referred to in more places by allowing recognition of common
computed offsets for variables.
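The event number scheme can be sketched for a single basic block as
follows. The instruction form (target, op, left, right) and the
function name local_cse are assumptions for illustration. The
sketch also records the event number of the variable holding the
earlier result, so that the result is not reused after its holder
has been overwritten; that extra check is an addition not described
in the text above.

```python
# Hypothetical sketch: local common subexpression detection by event numbers.
def local_cse(block):
    """block: list of (target, op, left, right) instructions.
    Returns the block with recomputations replaced by copies."""
    event = 0              # set to zero at the start of the basic block
    last_assigned = {}     # variable -> event number of its last assignment
    available = {}         # expression -> (expr event, holder, holder event)
    out = []
    for target, op, left, right in block:
        key = (op, left, right)
        # Event number of the expression: largest among its operands.
        expr_event = max(last_assigned.get(left, 0),
                         last_assigned.get(right, 0))
        hit = available.get(key)
        if hit and hit[0] == expr_event and last_assigned.get(hit[1]) == hit[2]:
            out.append((target, "copy", hit[1], None))   # reuse earlier value
        else:
            out.append((target, op, left, right))        # must (re)compute
        event += 1                                       # one more assignment
        last_assigned[target] = event
        if target not in (left, right):
            available[key] = (expr_event, target, event)
    return out


block = [
    ("t1", "+", "a", "b"),
    ("t2", "+", "a", "b"),   # same expression, operands unchanged
    ("a", "+", "a", "c"),    # alters a
    ("t3", "+", "a", "b"),   # must be recomputed
]
result = local_cse(block)
```

The second computation of a+b becomes a copy of t1, while the one
after the assignment to a is recomputed because the event number of
the expression has changed.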
Peephole optimization involves the following:
    * Redundant assignment elimination
          a=b c=a a=z --> c=b a=z
          a=b b=a --> a=b
    * Unreachable code elimination
          If the first statement in a basic code block
          is unlabelled and immediately follows
          an unconditional branch statement
          then it can be eliminated
    * Elimination of ADD A,0,A and MULT A,1,A
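Two of the peephole patterns listed can be sketched over a list of
instruction tuples. The tuple layouts ("COPY", dst, src) and
(op, source1, source2, dst), and the function name peephole, are
assumptions for illustration; the sketch also assumes the
instructions all lie within one basic block with no label between
them.

```python
# Hypothetical sketch: two peephole patterns over instruction tuples.
def peephole(code):
    """Remove ADD A,0,A, MULT A,1,A and the second copy of a=b b=a."""
    out = []
    for ins in code:
        # ADD A,0,A and MULT A,1,A leave A unchanged.
        if ins[0] == "ADD" and ins[2] == 0 and ins[1] == ins[3]:
            continue
        if ins[0] == "MULT" and ins[2] == 1 and ins[1] == ins[3]:
            continue
        # a=b immediately followed by b=a: the second copy is redundant.
        if (ins[0] == "COPY" and out and out[-1][0] == "COPY"
                and out[-1][1] == ins[2] and out[-1][2] == ins[1]):
            continue
        out.append(ins)
    return out


code = [
    ("COPY", "a", "b"),      # a=b
    ("COPY", "b", "a"),      # b=a, redundant
    ("ADD", "x", 0, "x"),    # no effect
    ("MULT", "y", 1, "y"),   # no effect
    ("ADD", "x", 1, "x"),    # kept
]
cleaned = peephole(code)
```

Only the first copy and the genuine addition survive.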
It has been decided that initially common expression elimination
will not be performed globally because empirical studies show
that little benefit is gained, as outlined by Anklam et al [7].
Because of this decision, temporaries will not survive beyond a
basic block. Only local code block common expression elimination
produces useful benefits. A global common expression elimination
phase could be added to the optimiser. Temporaries would then
exist beyond basic block boundaries.
Each of the steps of the optimiser consists of a subroutine. Code
motion and data flow analysis and the various optimisations are
implemented as subroutines that operate on the Rcode tree for the
routine and use the entity database and optimiser database. This
makes it easy for any particular optimisation to be omitted or
for the optimisation phases to be omitted entirely.
The code motion and data flow analysis is the necessary first
stage of the GAS code optimiser and generates the optimiser
database, which contains information on basic code blocks and,
for each basic code block, the results of du, ud and live
variable analysis. Loop information may also be generated. The
database information is used by the target machine generator for
register allocation and assignment. However if the generator
detects that no optimisation and flow analysis has been done, it
will allocate registers on a very local basis. This gives the
option of producing code very quickly, but the code will be
terribly bad!
Representing Optimiser Database Information 
The optimiser database is the fundamental basis of the
optimisation operation and may be used by the target machine code
generator. A vital property of the nature of the database is that
it should allow as much independence between various parts of the
optimiser and target code generator as possible. For example, if
optimisations result in moving instructions around, this
should not affect liveness analysis or reaching analysis. Whether
loops have been identified or not, or live variable analysis done,
should not upset the target machine generator; only the
effectiveness of code generation should be involved. Design of
the optimiser database is therefore vital.
Basic blocks, variables and GAS code database entries will all be
assigned individual "ids". This is for use in bitstrings as
defined elsewhere, but also allows that one of the objects may
easily be deleted. If a GAS code is deleted, a later direct
reference to it by a pointer will cause an error. If any
reference is via an "id", the GAS code can only be accessed via
an optimiser database. This will reveal that the GAS code for the
"id" has been deleted.
There will be a separate database structure for each basic block.
A list of all basic blocks in a routine, which will include a
pointer to each basic block's database, will be kept in an array
in lexical order of blocks. The offset of a pointer in this array
is the id of the basic block pointed to. Each basic block
database will include the id of each GAS code instruction in
the block kept in motion sequence, live variables on entry and
exit (bitstring linked list pointer), live variable definitions
on entry and exit, live uses on entry and exit (bitstring), all
predecessor and successor basic block ids, and the lexically
successor basic block id. Bitstrings are also pointed to that
represent definition instructions that are killed, definition
instructions that are generated in the block, variables that are
used before they are defined in the block, and variables which
are defined before they are used. The first two bitstrings are
useful in reaching analysis, and the second two are useful in
liveness analysis. When processing of the current routine is
complete, all information relating to basic blocks in the routine
will be removed. No allowance has been made in the database for
recording information on available expressions, as common
subexpression elimination is not being performed across basic
code blocks, as described earlier. It is quite likely this will be
added later.
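The second pair of bitstrings described above can be illustrated with a short sketch. It is written in Python purely for illustration and is not taken from the PLIP sources: the function names, the tuple layout of an instruction and the use of a Python integer as a bitstring (bit i standing for variable id i) are all assumptions.

```python
def use_def_bitstrings(block):
    """block is a list of (target_id_or_None, [source_ids]) in block order.

    Returns (use, define): 'use' has a bit set for each variable used before
    it is defined in the block, 'define' for each variable defined before it
    is used. Both are plain integers treated as bitstrings.
    """
    use = 0
    define = 0
    for target, sources in block:
        for v in sources:
            if not (define >> v) & 1:      # no earlier definition in block
                use |= 1 << v
        if target is not None and not (use >> target) & 1:
            define |= 1 << target          # no earlier use in block
    return use, define


def live_on_entry(use, define, live_on_exit):
    """Standard liveness equation for a single basic block."""
    return use | (live_on_exit & ~define)


# v0 = v1 + v2 ; v1 = v0 + v1
example_block = [(0, [1, 2]), (1, [0, 1])]
use_bits, define_bits = use_def_bitstrings(example_block)
entry_bits = live_on_entry(use_bits, define_bits, 0b1001)
```

In the example, variables 1 and 2 are used before any definition, and variable 0 is defined before it is used; variable 3 is live on exit and untouched, so it remains live on entry.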
A database of all variables identified by the optimiser will also
be maintained, as described earlier. An array of pointers to all
variable structures will be maintained. This array will be
indexed by variable ids. When processing of the routine is
complete, all entries for variables will be removed. Therefore a
new variable database is created for each routine. Code for
procedures will be encountered in post-order form so that inner
procedures will be encountered first. Variables for outer
procedures will not be identified first and therefore will not
fill the variable index first. Variables from various lexical
levels will be intermingled in the variable array. Hence merely
removing variables for the current lexical level at the end of a
routine will not be straightforward. In the version of the
backend being developed this will not be done. All variable
entries are deleted, and a variable database is built for each
routine. This approach will not be sufficient if interprocedural
data flow analysis is performed in later implementations. However
an advantage is that the variable database will only be as large
as the number of variables used by a routine and its lexically
enclosing routines. This will save significant space in the data
flow analysis.
An array of pointers to all GAS code instructions in the routine
being processed that involve a definition of a variable will be
maintained. This array will be indexed by the instruction id
assigned to the GAS code involved. Note that GAS code instructions
that do not involve definitions or uses of interest will not be
assigned an id, so that bitstrings are kept as small as possible.
When the processing of the current routine is complete, the array
is purged.
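The id-indexed arrays described above (for basic blocks, variables and definition instructions) share one property: a deleted object is detected on lookup rather than through a dangling pointer. A minimal sketch of that idea follows. It is in Python for illustration only; the class `IdTable` and its method names are hypothetical and do not come from the PLIP sources.

```python
class IdTable:
    """Array of object references indexed by id; deleted slots hold None."""

    def __init__(self):
        self._slots = []

    def add(self, obj):
        """Store obj and return its id, which is its offset in the array."""
        self._slots.append(obj)
        return len(self._slots) - 1

    def delete(self, ident):
        """Mark the entry deleted without renumbering the other ids."""
        self._slots[ident] = None

    def lookup(self, ident):
        """Return the object, or None when it has been deleted."""
        return self._slots[ident]

    def purge(self):
        """Discard all entries, as at the end of a routine."""
        self._slots.clear()


gas_codes = IdTable()
add_id = gas_codes.add(("ADD", "a", "b", "c"))
mov_id = gas_codes.add(("MOV", "d", "a"))
gas_codes.delete(add_id)
```

Because ids are array offsets, they can also serve directly as bit positions in the bitstrings used by the data flow analysis.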
A database structure will be maintained for all loops identified
as described earlier. An array of pointers to all loop structures
will be maintained, with the offset for each loop pointer being
the id for that loop. When processing of the current routine is
complete, the loop database is purged.
It must be remembered also that the GAS code entity database will
also be present. When the current lexical level is left (the
processing of the routine has finished), the GAS code entity
database will only be purged of entries relating to the current
lexical level.
GAS Code and Basic Blocks
When basic blocks are formed, an optimiser database is formed
which contains, for each basic block, a pointer to the GAS code
instructions for the basic block. The instructions in a basic
block, except for the last instruction, will be any GAS code
except a conditional branch, return from subroutine, or call.
This will allow USEVAR and USELABEL Rcodes to appear in basic
blocks.
During optimisation, GAS codes may be moved around or deleted,
but after optimisation it must be possible to write the modified
Rcode back to disk in post-order form. This would seem to involve
patching the Rcode tree as changes are made to the GAS code.
However, a GAS code BLOCK has been defined, which is only used by
the optimiser and target machine code generator, and which has
the field block-id. When a basic block is identified, the GAS
codes involved are removed from the Rcode tree, and the GAS code
Rcode BLOCK is grafted to the tree in their place. When the Rcode
tree is written to disk, and the BLOCK GAS code is encountered,
the GAS codes for the block will be accessed via the optimiser
database and written with appropriate Link Rcodes to bind them
together. Before code generation can be performed, the Rcode file
will need to be read, an Rcode tree and entity database
regenerated, basic code blocks identified and the optimiser
database created. If improved target machine code is to be
generated, data flow analysis will have to be performed to
generate the live variable "in" and "out" lists again. Writing
optimised GAS code to disk will therefore have a significant
overhead. However it may be necessary because of restrictions on
memory size, or to allow later interpretation of GAS code, or to
allow target machine code generators to use the same GAS code.
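The write-out step described above can be sketched as a post-order walk in which each BLOCK placeholder is expanded from the optimiser database. The sketch is in Python for illustration only. The node layout `(kind, payload, children)`, the dictionary standing in for the optimiser database, and the way a `LINK` entry follows each pair of codes are assumptions made for the example, not the actual Rcode file format.

```python
def write_post_order(node, block_db, out):
    """Append the post-order form of 'node' to the list 'out'.

    node     -- tuple (kind, payload, children)
    block_db -- maps a block id to its GAS codes in their current order
    """
    kind, payload, children = node
    for child in children:
        write_post_order(child, block_db, out)
    if kind == "BLOCK":
        # Expand the placeholder: the GAS codes come from the database,
        # so moves and deletions made by the optimiser are reflected.
        for position, code in enumerate(block_db[payload]):
            out.append(code)
            if position > 0:
                out.append("LINK")   # bind this code to those before it
    else:
        out.append(kind)


tree = ("ROUTINE", None, [("BLOCK", 0, []), ("BLOCK", 1, [])])
blocks = {0: ["MOV", "ADD"], 1: ["RET"]}
written = []
write_post_order(tree, blocks, written)
```

The tree itself never needs patching during optimisation; only the per-block lists in the database change.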
To conserve space, target operands will be stripped off GAS code
instructions. This "original" target operand will represent a
"variable" which is loaded into the database, unless this has
already been done. The database entry will point to the target
operand stripped from the GAS code. The GAS code will be modified
to point to a target operand of a special "Target_Id" variety
which contains the fields:
    target_type
    indirection
    target_id
The target_id field contains the unique id assigned to the
variable by the optimiser. This approach conserves space, and
also allows fast efficient access to the variable database when
processing instructions later to produce generate and kill lists
(see above under reaching analysis), use and definition sets (see
above under liveness analysis), and next use analysis (see
chapter on target code generation). Having the "id" of variables
in GAS code instructions will be most useful. If optimised GAS
code is written to disk, operands must be written in full, so the
"target id" of targets must be used to access the full operand in
the database so that the operand can be written in full.
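A rough sketch of this operand stripping is given below, in Python for illustration only. The three field names follow the text; everything else (the tuple form of a full operand, the class `VariableDatabase`, the function `strip_target`, and the fixed indirection value) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetId:
    target_type: str
    indirection: int
    target_id: int


class VariableDatabase:
    """Holds each distinct full operand once and hands out ids."""

    def __init__(self):
        self._ids = {}        # full operand -> id
        self._operands = []   # id -> full operand

    def intern(self, operand):
        """Load the operand unless this has already been done."""
        if operand not in self._ids:
            self._ids[operand] = len(self._operands)
            self._operands.append(operand)
        return self._ids[operand]

    def full_operand(self, ident):
        """Recover the full operand, e.g. when writing GAS code to disk."""
        return self._operands[ident]


def strip_target(instruction, db):
    """Replace the full target operand with a compact TargetId."""
    op, target, *sources = instruction
    ref = TargetId(target_type=target[0], indirection=0,
                   target_id=db.intern(target))
    return (op, ref, *sources)


db = VariableDatabase()
x = ("int32", "area1", 8)     # assumed form: (type, storage area, offset)
first = strip_target(("ADD", x, "p", "q"), db)
second = strip_target(("MOV", x, "r"), db)
```

Both instructions now carry the same small id, which can be used as a bit position or array index, while the full operand is stored only once.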
Chapter Six
Target Machine Code Generator
Emphasis in the code generator design has been placed on
producing a code generator that is as independent of the target
machine as possible, but which must produce efficient target
code. For the prototype, the functions required of the code
generator include:
a) Converting GAS code addressing modes into target machine
addressing modes. This involves manipulation of pointer and index
registers and will often involve general register allocation and
assignment strategies.
b) Converting bitstring operations into operations of the
target machine.
c) Mapping operations on basictype objects in GAS code into
operations of a size supported by the target machine. For GAS
code, objects are chosen to be as close as possible to target
machine types, but as already stated, the GAS code generator may
not necessarily be able to specify target machine datatypes
exactly.
d) Converting arithmetic operations, which are represented as
three address operations, into appropriate target machine
operations, which are perhaps two or three address operations.
e) Allocating and assigning variables to registers. Allocation
involves deciding which variables should be assigned to
registers, and assignment involves assigning actual registers to
these variables. How independent these two operations are depends
to a large extent on the orthogonality of the target machine.
f) Mapping GAS branch instructions to target machine
instructions. At this stage use may be made of Rcode control
codes still present to recognise when special target machine
instructions such as LOOP can be used. Fixup of target addresses
will be required. This requires information on the size of target
machine code instructions so that the offset of branch target
instructions is known.
g) Mapping the GAS subroutine calling convention to target
machine facilities. This will involve manipulation of target
machine stack pointer registers, and possible use of target
machine special subroutine calling instructions, but only, of
course, if these are compatible with the standard GAS calling
convention. The use of dope vectors to handle dynamically sized
local variables and results will also be required. Strategies for
saving register contents at call time are also needed.
h) Implementing any GAS SVC instructions by appropriate
generation of thunks or calls to the kernel.
i) Implementing machine and operating system specific Rcodes
which still remain, again either by generation of thunks or calls
to target machine libraries.
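The branch fixup mentioned under (f) can be sketched as two passes over the generated instructions: one to find the byte position of every label from the instruction sizes, and one to compute each branch displacement. The sketch below is in Python for illustration only. The dictionary keys, and the assumptions that every instruction size is already known and that displacements are measured from the end of the branch instruction, are illustrative choices rather than details from the thesis.

```python
def fix_up_branches(instructions):
    """Return one displacement per instruction (None for non-branches).

    Each instruction is a dict with 'size' in bytes and, optionally,
    'label' (a name for its own address) and 'branch_to' (a label name).
    """
    label_position = {}
    starts = []
    position = 0
    for ins in instructions:            # pass 1: assign addresses
        starts.append(position)
        if "label" in ins:
            label_position[ins["label"]] = position
        position += ins["size"]

    displacements = []
    for start, ins in zip(starts, instructions):   # pass 2: patch
        if "branch_to" in ins:
            end_of_branch = start + ins["size"]
            displacements.append(label_position[ins["branch_to"]]
                                 - end_of_branch)
        else:
            displacements.append(None)
    return displacements


code = [
    {"label": "top", "size": 3},
    {"size": 4},
    {"size": 2, "branch_to": "top"},
    {"size": 2, "branch_to": "done"},
    {"label": "done", "size": 1},
]
fixed = fix_up_branches(code)
```

A real generator would also have to cope with branches whose size depends on the distance to the target, which this sketch ignores.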
The target machine code generator is a subroutine that receives
as input an Rcode tree for a routine, a GAS entity database and
the optimiser database. This generator will expect as a minimum
that basic blocks have been identified, "variables" have been
identified and that live variable, use and definition, and "in"
and "out" lists for basic code blocks have been set to initial
NIL values (as before any data flow analysis is performed).
Hopefully full data flow analysis information is available as
well.
Target code generation is still being performed with the Rcode
structure available, as GAS code is attached to the Rcode tree.
The Rcode tree structure provides useful program structure
information. For example it is clear when code is being generated
to allocate space for local variables of a routine, or when the
conditional expression for a choice structure is being generated.
Target machine code will be generated directly from some Rcodes.
This specifically involves group six machine specific Rcodes,
which are passed untouched through to the target code generator.
The generator is itself very important in terms of code
optimisation. The GAS code optimiser performed general logical
optimisations. The target machine generator is involved in code
improvement based on efficient register usage, effective
instruction selection and operand addressing mode selection.
These code improvement strategies are target machine specific.
It is obvious that there is still significant work to be achieved
by the target machine code generator. To enhance portability, the
aim has been to make the algorithms involved as target machine
independent as possible, with target machine dependency limited
to as few as possible clearly defined constants and procedures.
In fact it has been found that such a clean solution has not been
practical, and instead "fluid" abstracted procedures, constants
and associated database have been developed that are ported to
each new target machine in a generic family. Hopefully the
porting only involves changes in implementation, but in practice
some changes to the definition of procedures and the database
will be expected, hence the term "fluid". The more effective the
abstraction, the fewer the definition changes required.
The concept of generic families allows that "fluid" procedures,
constants and database will be developed with abstraction
generalised with a family in mind. This has been found to be
much more practical than abstracting for as many machines as
possible, as the abstractions become so general they are not
very useful. The GAS family definition provides a clear
definition for a group of machines, and provides a clear basis
for the abstraction.
In some code generators, the approach that has been taken is that
of representing relevant target machine and operating system
characteristics by tables of information. A totally machine
independent program performs target code generation using the
table of information when requiring information on target machine
and operating system characteristics (see chapter two). Tables
are adequate when there are several independent target machine
characteristics for which the target code generator will require
information. However the characteristics involved are usually
related. It is difficult to represent such relationships
satisfactorily in a table. How, for example, can the following be
represented: not only can a certain register be used to store 32
bit values for integer addition, but the register is also the
only one that can be used for storing a bitstring size value? Or:
a two address addition instruction may be provided, but only if
one operand is located in a register, perhaps a specific
register. These pieces of information may only be relevant for a
single machine. The table structure becomes too complex, and must
be designed to represent any information relevant in code
generation on any of the target machines in the family. The code
generator must take account of these characteristics; it cannot
ignore them. The table and code generator become a superset of
all the target machines. The family concept represented by the
GAS machine helps to reduce considerably the complexity required
as opposed to a table structure and code generator for all target
machines, and makes a table based approach more practical. The
characteristics relevant for machines in one family are likely to
be very similar.
The approach which has been attempted in this design is to
produce a target code generator whose structure is based on an
orthogonal machine with the characteristics of the generic
family, and modify the structure for the non-orthogonal
irregularities of the particular target machine. This approach
seems to be successful because machines in a family have similar
architectures and differences between them tend to be localised.
Significant architectural features have much greater effect on
the code generator form. Within the family there can be many
minor differences which modify the general algorithms in a
straightforward way, such as providing only two address
instructions, and the result of these instructions must be a
register, or restricting registers for operands of a division
instruction to using specific registers. However the task of
selecting registers is still the main issue, it is only the local
details that vary. The target machine family concept therefore
has benefit through into the target code generation phase. A
generalised target code generator for all machines is not
practical, but one for a family of machines does seem practical.
A general target code generator for all machines would involve
variations that included major architectural features. The
database that characterises the machine and represents the state
of allocation of machine resources could be quite different for
many machines. A single database that allowed representation of
all these factors would be huge. Within a family, variations only
relate to variations on the central architectural features that
characterise the family. A single database that can be used to
characterise all the machines in one family and the state of
allocation of resources is of a more practical size.
The prototype code generator for the VAX machine is based on
producing a basic code generator for a machine which is
orthogonal in the sense that any operands of any given
instruction can be of different sizes and can be located in
memory or any one of a set of registers, and for any three
address GAS instructions, two and three address target
instructions are available. Additionally any registers can be
used as pointer or index registers. Any further non-orthogonality
will have to be catered for by target machine specific code
included where required. An example is where a two address
instruction would be the most efficient approach, but the target
machine only provides two address instructions where the operand
that is overwritten must be located in a register, it cannot be
in memory. An example of such an instruction would be a=a+y,
where "a" is not used before any value currently located in a
register and therefore "a" will not win a register. When the code
generator for the orthogonal machine would call for a two address
instruction to be generated with both operands in memory, a
decision will instead have to be made to generate a three address
instruction with operands in memory if one is available, or to
force the dumping of a register, or a decision may be made to
dump a register anyway. These changes in the code generator
should be clearly marked. It should be possible to easily
generate a target code generator for one machine using an
existing target code generator for a machine with similar
non-orthogonalities. For example, many machines will have a
non-orthogonality in which the result of the two address
instruction must be a register. A modification to the prototype
generator to handle this will produce a target code generator
that will be easy to modify for all target machines with this
non-orthogonality. For machines that are reasonably orthogonal
the prototype code generator for a family of machines will easily
provide the target code generator. Their code generators will
also not have the overhead of many of the tests for the presence
of non-orthogonalities that would be required if a single general
table driven target code generator was used.
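The a=a+y case discussed above can be illustrated by a small selection routine. This is a sketch in Python under stated assumptions, not the prototype's code: the function name, the pseudo-instruction mnemonics (`ADD2`, `ADD3`, `SPILL`, `MOVE`) and the fixed scratch register `R0` are all hypothetical, and the three outcomes simply mirror the three choices listed in the text.

```python
def select_add(dest, left, right, in_register, has_three_address):
    """Choose an instruction sequence for dest = left + right.

    in_register(name) -> bool says whether a variable is in a register.
    The machine modelled here only allows a two address add when the
    overwritten operand is in a register (the non-orthogonality).
    """
    if dest == left and in_register(dest):
        # Orthogonal case: plain two address instruction.
        return [("ADD2", right, dest)]
    if has_three_address:
        # Fall back to a three address form with operands in memory.
        return [("ADD3", left, right, dest)]
    # Otherwise force the dumping of a register (R0 is an assumed choice).
    return [("SPILL", "R0"),
            ("MOVE", left, "R0"),
            ("ADD2", right, "R0"),
            ("MOVE", "R0", dest)]


in_reg = select_add("a", "a", "y", lambda v: v == "a", False)
three = select_add("a", "a", "y", lambda v: False, True)
dumped = select_add("a", "a", "y", lambda v: False, False)
```

Keeping such a decision in one clearly marked routine is what makes it easy to reuse for another machine with the same non-orthogonality.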
The elements of the target code generators for all machines in
the VAX family that will essentially be common include:
a) Register allocation, which involves use of live variable
analysis and next use information, plus assessing whether
variables are currently located in registers. Register assignment
will be target machine specific.
b) Building the subroutine calls.
c) Implementation of control structures primarily by
conditional branch instructions.
d) Branch fixup; the general mechanism is not target machine
dependent.
Elements which could vary significantly from machine to machine
include:
a) Bitstring manipulation
b) Addressing mode selection
c) Register dumping on subroutine calling
d) Register assignment
e) Building instruction bit patterns
The VAX prototype target code generator will be non-orthogonal
in that all operands of arithmetic instructions must be of the
same size, except for immediate values. Additionally, registers
12 to 15 will be reserved as pointers as described in the VAX
Architecture Handbook. A few instructions require use of
specified registers, or register pairs. These will be handled on
the local level at which they occur. For many VAX instructions,
there are restrictions on the allowable formats for some or all
operands. The datatype of an operation is usually reflected in
the instruction code.
As discussed above, abstraction of the target code generation
process is the key to portability. Consider in more detail what
this abstraction involves.
As mentioned, a database will be maintained which represents the
characteristics and current state of allocation of the machine
resources. To allow the code of the compiler to be as modifiable
for other machines as possible, it is desirable that the database
be abstracted as much as possible. The actual database structure
is therefore hidden and procedures provide access. For example,
for register allocation it will be necessary to know if a
variable is currently stored in a register. This is not achieved
by directly accessing the database records, as this requires
specific knowledge of the record structures which will vary from
machine to machine. Instead a procedure such as
    In_Register(Variable_Id)
is provided which returns the register (or None) used. Other
examples include:
    Allocate_Specific_Register(Reg_Id,Variable_Id)
    Allocate_Register(Variable_Id)
    Deallocate_Register(Register_Id)
    Has_Two_Address(Gas_Code,Datatype)
Abstraction requires that careful consideration be given to what
procedures are required generally across a range of machines.
These procedures may be changed from one machine to another
because of non-orthogonality. However the required set of
procedures will be similar for machines with similar
non-orthogonalities. These abstracted procedures for accessing
the database will therefore be important in achieving portability
of the target code generators. They are discussed in more detail
later in this chapter.
Such functions as handling of operands, requests for generation
of target code, and the algorithms involved in such things as
register allocation, handling of declarations and subroutine call
handling should also be handled in an abstracted manner. This is
discussed later but will lead to code such as:
    IF In_Register(X) <> No_Register AND
       Used_Before(Y,X) THEN
        Set_Location(Y,Location(X))
        Deallocate_Register(X)
    END
In this abstracted code, the procedures In_Register, Set_Location
and Deallocate_Register will access a target machine resources
database, but the details are hidden from the code above. The
Used_Before procedure will access the optimiser database. Note
that "X" and "Y" are variables relevant to the target code
generator and are not Rcode defined variables.
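To make the idea of hiding the resources database behind procedures concrete, a minimal sketch follows, in Python for illustration only. The method names echo the procedures named in the text, but their bodies, the dictionary-based storage and the semantics chosen for Set_Location and Deallocate_Register are assumptions; the `used_before` predicate is passed in as a stand-in for the optimiser database.

```python
NO_REGISTER = None


class MachineResources:
    """Hides how register state is stored from the code generator."""

    def __init__(self, register_names):
        self._holder = {r: None for r in register_names}  # register -> var
        self._location = {}                                # var -> location

    def in_register(self, var):
        for reg, held in self._holder.items():
            if held == var:
                return reg
        return NO_REGISTER

    def allocate_specific_register(self, reg, var):
        self._holder[reg] = var
        self._location[var] = reg

    def location(self, var):
        return self._location.get(var)

    def set_location(self, var, where):
        self._location[var] = where
        if where in self._holder:       # location is a register
            self._holder[where] = var

    def deallocate_register(self, var):
        """Give up var's claim on a register, if it has one."""
        reg = self.in_register(var)
        if reg is not NO_REGISTER:
            self._holder[reg] = None
        if self._location.get(var) in self._holder:
            del self._location[var]


def reuse_register(db, x, y, used_before):
    """Python rendering of the abstracted fragment shown in the text."""
    if db.in_register(x) is not NO_REGISTER and used_before(y, x):
        db.set_location(y, db.location(x))
        db.deallocate_register(x)


resources = MachineResources(["R0", "R1"])
resources.allocate_specific_register("R1", "x")
reuse_register(resources, "x", "y", lambda first, second: True)
```

Porting to another machine in the family would then mean replacing the body of `MachineResources` while leaving `reuse_register` untouched.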
Other information can be made available through procedures that
access the optimiser database, and information about the target
machine is contained in the PLIP "Machines" module.
If the functions of the target code generator are to be
abstracted, and the abstracted approach used for developing a
target code generator for one machine ported to many other
machines in the family, it is necessary that the approaches to
generator functions be well thought out, but also that the choice
of approach be significantly influenced by the ease of porting
the approach to other machines in the family. If a differing
approach to a function is considered more effective on a specific
machine, someone could implement this approach at some later
time. The following sections discuss most of the central
functions of target code generators and suggest approaches that
will be taken to produce efficient code but still remain easily
portable to a range of machines.
Storage Allocation
The first step will involve calling the storage allocator routine
for any declarations encountered in the routine (higher lexical
level declarations will have already been encountered and
processed). If the current lexical level is the outermost level,
then the storage involved is static and linker directives must be
generated. As a portable linker is used, the generation of linker
directives is independent of the target machine. Handling of
Rcode global storage area and "variable" declarations is not
significantly different, except for the generation of linker
directives for global storage areas. Note that Rcode variable
areas may contain several variables significant to target code
generation (see also optimiser chapter). GAS variables in a
global storage area will be coded with the address field set to
the offset of the object into the storage area. A linker
directive will be generated to relocate this operand by the
address the linker assigns to the storage area. Storage
allocation for static areas is therefore not performed by the
target code generator.
The allocator will update the database entries for Rcode
variable storage areas, to record the byte offset of the area
(if statically sized) or its dope vector (if dynamically sized)
from the byte on the stack pointed to by the stack pointer that
will be used to access local variables. Whether a variable
storage area is statically sized is determined by static
evaluation by the GAS code interpreter of the subtree that
determines the size of the object. If the value returned is
static, then the size subtree is deleted and the variable is
allocated space at a static offset into the routine activation
record. The entity record for the object is updated to indicate
no dope vector. If not statically sized, target machine code will
be generated for the size subtree (this code will not be emitted
yet), with the result to be placed in a dope vector. The entity
record for the object will be updated to record that a dope
vector is used. The machine module will contain information on
the size and required alignment for dope vectors. Note that space
for these storage areas may not necessarily be allocated in the
order declarations are encountered, as alignment requirements may
make it possible to use less total space by using a different
ordering. Database entries for GAS code variables located in
variable storage areas will be updated to record the byte offsets
of the variables into the area.
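The remark that a different ordering may use less total space can be shown with a small layout sketch, in Python for illustration only. The function names, the `(name, size, alignment)` tuples and the "largest alignment first" policy are assumptions for the example; the thesis does not state which ordering the allocator uses, and offsets here simply grow upward from zero.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def lay_out(areas, reorder):
    """Assign a byte offset to each (name, size, alignment) area.

    With reorder=True the areas are placed in order of decreasing
    alignment, which avoids most padding. Returns (offsets, total_size).
    """
    order = sorted(areas, key=lambda a: -a[2]) if reorder else list(areas)
    offsets = {}
    position = 0
    for name, size, alignment in order:
        position = align_up(position, alignment)
        offsets[name] = position
        position += size
    return offsets, position


declared = [("flag", 1, 1), ("count", 4, 4), ("mark", 1, 1), ("total", 8, 8)]
in_order_offsets, in_order_size = lay_out(declared, reorder=False)
packed_offsets, packed_size = lay_out(declared, reorder=True)
```

For these four declarations, declaration order needs 24 bytes because of padding before `count` and `total`, while the reordered layout needs 14.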
The Rcode tree that specifies the size of the variable will now
contain GAS code that terminates with a PUSH GAS code. The
operand of this push can be used to determine clearly if the
variable is statically or dynamically sized. This PUSH GAS code
will normally be converted into an instruction to increment the
stack pointer.
After all declarations for a routine have been processed, the
space consumed by all statically sized dynamic objects and dope
vectors will be known, and code to extend the stack pointer by
this amount is generated. The code generated earlier to compute
the size of each dynamically sized object is emitted, and code is
generated to advance the stack pointer by this computed size, and
to store the computed byte offset of the object into the routine
invocation record in the dope vector for the object. References
to objects in the area will require generation of code that
accesses the object via the dope vector for the object, and at
the byte offset of the object into the area.
For global storage areas, all storage areas are statically
sized. A linker directive will be emitted for each area which
specifies the alignment of the area, size (which is always
static), area protection, explanatory text, and the area id.
Whenever references are made to a global object, the address will
be an offset into the area, and a linker directive specifying
relocation of this address by the start address of the area will
be emitted.
For constants, the expression subtree of the Declare Constant
Rcode that specifies the constant value is passed to the GAS code
interpreter, which should return a static value. Code to push an
object of this size and alignment on the stack will be generated.
The entity database entry for the constant will be updated to
include the address assigned to the constant, and its size. It is
assumed that constants of a word or less will be handled as
immediate values directly in the code, although this can vary
depending on the level of use of the value and the effectiveness
of the immediate addressing mode on the target machine. Constants
declared in a routine will still be allocated storage globally.
Linker directives will be emitted for constants specifying a read
only initialised storage area.
Global initialised storage areas are defined by using the
Append_Area Rcode. This Rcode specifies statically sized
extensions to global storage areas with initial literal values
specified. When encountered, the value subtree of the Rcode is
handed to the GAS code interpreter to evaluate the initial value
and then a linker directive is emitted that specifies an
extension to a storage area, and an initial value. Additionally,
if an INITVAR GAS code is encountered, a linker directive to
initialise part of a storage area with the specified statically
computed value is emitted. This code is generated when the GAS
interpreter determines by static evaluation that a part of a
global storage area has an initial value.
The output of the target machine code generator will be suitable
for input to the portable linker. This will mean that any
instructions that refer to global objects or constants will have
linker relocation directives appended to specify an offset from
the start address of the relevant storage area. For all global
storage areas and routines, linker directives defining them are
generated which can be used for resolving external references to
them in other modules. For references to global storage areas and
routines in other modules, linker directives are generated for
such references specifying relocation of references by the
addresses assigned to the areas/routines identified by module id
and area-id/routine-id.
Handling operating system interface and intrinsics
Calls to the operating system libraries, and kernel calls, will
need to be generated. These may be required to implement
arithmetic functions such as real number operations, to implement
SVC GAS codes, and for direct references to target system
dependent relocatable object language symbols. In each case
linker directives must be generated that identify the target
symbol. GAS code SVC calls such as saving and loading context may
be implemented directly by generation of thunks, or by calls to
the operating system via a software interrupt instruction of some
form, or by a call to a target system dependent relocatable
object. The requirements for operating system calls and use of
host intrinsics will be known to the target code generator and
will be handled accordingly.
Register Allocation and Assignment, and Instruction Selection
These algorithms will use the results of basic block
identification, variable identification and liveness analysis. If
loops have been identified, this information will be used to
provide global allocation to registers of heavily used variables
in the loop. This will be based on a usage count analysis across
the basic blocks comprising the loop.
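As a rough illustration of the usage count analysis mentioned above, the following Python sketch counts operand references across the basic blocks of a loop and gives the available registers to the most heavily used variables. The representation of an instruction as a tuple of operand names, the function name `pick_loop_registers` and the register names are assumptions made for this example only; they are not part of the PLIP design.

```python
from collections import Counter

def pick_loop_registers(loop_blocks, registers):
    """Map the most frequently referenced variables of a loop to registers.

    loop_blocks: list of basic blocks; each block is a list of instructions,
                 and each instruction is a tuple of operand names (assumed form).
    registers:   registers available for global allocation, in preference order.
    """
    counts = Counter(
        operand
        for block in loop_blocks
        for instr in block
        for operand in instr
    )
    # most_common() orders by descending count; zip stops when registers run out.
    ranked = [name for name, _ in counts.most_common()]
    return dict(zip(ranked, registers))

# Two blocks of a hypothetical loop: "i" is referenced four times, "a" twice.
usage_blocks = [
    [("i", "n", "t"), ("a", "i", "a")],
    [("i", "one", "i")],
]
usage_allocation = pick_loop_registers(usage_blocks, ["R2", "R3"])
```

A real allocator would probably weight references by loop nesting depth; the sketch uses plain counts to keep the idea visible.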
Additionally for each basic block, "next use analysis" will be
performed. This analysis is as described by Aho and Ullman [9].
It uses information developed during GAS code optimisation. It is
not itself performed during optimisation as this information is
not required for the optimisations performed. A backward pass is
made over each basic block at the stage when target machine code
is being generated for the block. For each instruction
encountered, record in a separate list (that consists of one entry
for each GAS code instruction in the block and has a forward and
backward link) the id of the GAS code and the "liveness" and
"next use" status for each of its operands (as maintained in the
variable database established). If an operand is altered by the
instruction, the record for the variable entry for the operand
is set to indicate it is not alive and has no next use (NIL value
for next use field of database entry). The variable database
entries for operands not altered are set to indicate that they
are live and that the next uses of them are in the instruction
being processed (store the id of the current GAS code instruction
in next use field). Note that the effects of operations on
objects at computed offsets, operations on objects referenced via
pointers, and the effects of possible type coercion, will be
handled as described in the optimiser chapter. Any object in the
same storage area as an object altered, or any objects in storage
areas that could be pointed to by a pointer to an object to be
changed, will be left with the same liveness and next use status.
If the object is used, these other objects will be marked as
live, but the next use field will be left unchanged, indicating
that these objects have low priority for registers because
they are not directly involved. The live status ensures that any
changes to these objects will not be thrown away during register
reallocation. Additionally, the current location(s)
(register/memory) of each operand used in a basic block is
maintained in the database. At the start of the backward pass,
the liveness of each variable will be set to the liveness of each
object as computed by the global liveness analysis on exit from
the code block, and the next use value will be set to NIL. If the
RET instruction is the last instruction in a basic block, all
variables except those at the current lexical level are
considered to be live but the next use field in their symbol
table entries will be NIL. This is because no interprocedural
data flow analysis is currently done. Little can be done in a
modular compilation system. If this is added at a later date,
then not all variables will be live on exit. If no global
optimisation has been done, the "out" live variable list will be
NIL, hence assume all variables are live on exit.
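The backward pass described above might be sketched as follows. This is a minimal illustration in Python and not the thesis implementation: the `Instr` record, the function name `next_use_pass` and the use of `None` for NIL are assumptions. It also follows the usual textbook ordering (mark the altered operand dead first, then mark the operands read as live), so that an operand that is both read and altered by the same instruction ends up live before it. The aliasing cases (computed offsets, pointers, coercion) are deliberately left out.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    """Hypothetical GAS-like binary instruction: op src1,src2,dst."""
    id: int
    op: str
    src1: str
    src2: str
    dst: str

def next_use_pass(block, live_on_exit):
    """Return {instruction id: {operand: (live, next_use)}} for one basic block."""
    names = {n for i in block for n in (i.src1, i.src2, i.dst)}
    # State at block exit: liveness from global analysis, next use NIL (None).
    table = {n: (n in live_on_exit, None) for n in names}
    records = {}
    for instr in reversed(block):
        # Record the status that holds immediately after this instruction.
        records[instr.id] = {n: table[n] for n in (instr.src1, instr.src2, instr.dst)}
        # The altered operand is not alive before the instruction ...
        table[instr.dst] = (False, None)
        # ... and operands read are live, with their next use in this instruction.
        table[instr.src1] = (True, instr.id)
        table[instr.src2] = (True, instr.id)
    return records

# t := b + c ; a := t + d, with only "a" live on exit from the block.
nu_block = [Instr(1, "ADD", "b", "c", "t"), Instr(2, "ADD", "t", "d", "a")]
nu_records = next_use_pass(nu_block, live_on_exit={"a"})
```

In this example the temporary `t` is live with next use 2 after instruction 1, and dead with no next use after instruction 2, so its register can be released at that point.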
The variation in order of computation within a basic code block
has not been considered at this stage in project development. The
reason is that common subexpression elimination on basic code
blocks makes tree labelling and reversal of order of operators
less than optimal, because temporaries may live beyond the
subtree in which they are defined. However there are techniques
for producing sub-optimal trees in terms of registers used that
could be implemented in later versions. Note that tree weighting
of the GAS code is possible because the following is the form of
Rcode tree that will be generated for binary operations:
                      LINK
                     /    \
                    /      \
                   /        \
                LINK        ADD a,temp2,c
               /    \
              /      \
             /        \
    MUL x,y,a          LINK (2)
                      /    \
                     /      \
                    /        \
        ADD g,h,temp1        ADD temp0,temp1,temp2
Note that temp0 is an "outside" temporary, representing a common
expression result. To allow tree reversal the GAS codes for a
basic block will have to be put back into tree form as above but
a special form of tree would have to be allowed that allowed tree
weighting values to be recorded on nodes. Also note that next use
analysis within a basic code block would have to be done after
tree reversal, and after the GAS code instructions for the block
have again been linearised.
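To make the idea of recording weighting values on tree nodes concrete, here is a small Python sketch of a Sethi-Ullman style labelling. It is offered only as one plausible reading of "tree weighting": the thesis does not specify the scheme, the `Node` class and `label` function are hypothetical, and the simplified rule that every leaf needs one register is an assumption (the classical formulation treats left and right leaves differently).

```python
class Node:
    """Expression tree node; leaves have no children."""
    def __init__(self, name, left=None, right=None):
        self.name = name
        self.left = left
        self.right = right
        self.weight = None   # registers needed to evaluate this subtree

def label(node):
    """Record on each node an estimate of the registers its subtree needs."""
    if node.left is None and node.right is None:
        node.weight = 1      # simplification: every leaf is loaded into a register
        return node.weight
    l = label(node.left)
    r = label(node.right)
    # Equal needs on both sides cost one extra register; otherwise the larger wins.
    node.weight = l + 1 if l == r else max(l, r)
    return node.weight

# c := (x * y) + (temp0 + (g + h)), the expression shown in the tree above.
sum_gh = Node("ADD", Node("g"), Node("h"))
right = Node("ADD", Node("temp0"), sum_gh)
root = Node("ADD", Node("MUL", Node("x"), Node("y")), right)
root_weight = label(root)
```

With such weights available, a later version could evaluate the heavier subtree first. As the text notes, an "outside" temporary such as temp0 weakens the guarantee, because its lifetime extends beyond the tree being labelled.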
Consider now how registers will be allocated in code generation
and how this interacts with selection of target code instructions
and addressing modes.
Code will be generated in a walk of the Rcode tree for a
procedure, and when each basic block is encountered code for it
will be generated. As the walk follows the Rcode tree, procedure
structural information is available whenever code is being
generated for a basic block. GAS code instructions for each basic
block will be accessed using the optimiser database pointers to
GAS code instructions for the block. Code will also be generated
for group six Rcodes.
To illustrate the basic target code generation process consider
the GAS binary operation involving Rcode basictypes, ADD B,C,A
where A:=B+C. Assume a multi-register machine, and allow for the
effects of coercion and for the possibility of deferred
storage of elements in arrays (objects identified by computed
offsets). The following is a general description of the algorithm
that will be used, and discusses some of the relevant factors.
The allocation and assignment of registers is always a
statistical problem, but the aim is to minimise memory references
by trying to keep in registers objects with earliest next use.
The uncertainty of what is being manipulated that is introduced
by coercion and pointers makes effective allocation of registers
very difficult. Several steps are involved in the target code
generation.
Register stores before instruction
If any registers currently contain values from the same storage
area as A (but not including A) or contain values for locations
identified by a pointer that could point to an object in the same
storage area as A (if no pointer analysis has been done, then any
register containing such a value will be involved) then generate
code to dump these values back into memory, and mark the
registers as no longer storing these variables. They may be
changed by the instruction so these variables must be considered
no longer live. A similar approach is taken for B and C to dump
register contents to memory, but leaves the registers still
marked as pointing to the variables. If efforts are made to more
exactly identify variables with static offsets that overlap, less
register dumping would be required (see notes on GAS code
optimisations). If indirect addressing is used to specify A, then
generate code to dump the contents of any registers that contain
variables in any of the storage areas that may be pointed to by
the pointer value providing the address of A, and consider these
registers then free. The possible areas that could be pointed to
would be identified by pointer analysis (see GAS code
optimisation). In the absence of any global pointer analysis, all
registers will have to be dumped and considered free. Note that
even if pointer analysis has not been done by the GAS code
optimiser, limited pointer analysis could be done within the
local basic code block. If indirect addressing is used to
identify B or C then do the same as for A, but leave the
registers marked as still pointing to variables dumped.
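The direct-addressing part of this step could be sketched as below. The Python is only illustrative: the register descriptor is modelled as a dictionary, `area_of` is a hypothetical map from each variable to its storage area, and the `MOV reg,var` strings stand in for whatever store instruction the target machine provides. Pointer-based aliasing is not modelled.

```python
def stores_before(regs, area_of, target, sources):
    """Emit stores needed before an instruction such as ADD B,C,A.

    regs:    dict register -> variable whose value it currently holds
             (updated in place)
    area_of: dict variable -> storage area identifier (assumed available)
    Returns the list of store instructions generated.
    """
    code = []
    for reg, var in list(regs.items()):
        if var == target:
            continue
        if area_of[var] == area_of[target]:
            # May be changed by the instruction: store it and forget the register copy.
            code.append(f"MOV {reg},{var}")
            del regs[reg]
        elif any(var != s and area_of[var] == area_of[s] for s in sources):
            # May overlap a source operand: store it but keep the register marked.
            code.append(f"MOV {reg},{var}")
    return code

dump_regs = {"R1": "x", "R2": "y", "R3": "z"}
dump_areas = {"A": "g1", "x": "g1", "B": "g2", "y": "g2", "C": "g3", "z": "g4"}
dump_code = stores_before(dump_regs, dump_areas, target="A", sources=("B", "C"))
```

Here `x` shares a storage area with the target A, so its register is stored and released, while `y` shares an area with the source B and is stored but stays associated with R2.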
Selection of Operand Locations
The next factor to consider is where the operands will be
accessed for the instruction. This depends on many factors:
whether the machine supports three address and two address
instructions, and any limitations on where the operands of these
instructions can be located. For example, two address
instructions such as ADD X,Y may be limited to the first operand
in a register, the second either in memory or a register, with
the first operand receiving the result.
If only two address instructions are available, several factors
will be significant. In a two address instruction, the result
overwrites one of the source operands, typically the first. The
following cases arise:
If the target operand doesn't win a register from a source
operand that can be the source operand replaced by the result in
a two address instruction (using a reverse instruction if
necessary, and available, such as RSUB, RDIV) it will be
necessary to generate a move instruction to move the first source
operand to the location chosen for the target, before the two
address instruction is generated. However the generation of this
move is delayed until after the location of the source operands
has been determined. The "win" of a register from a source
operand is based on the current location of source operands,
current register usage, next use of objects in the basic code
block, and liveness analysis. The target may not win because
neither source operand is in a register, or if any source operand
is in a register, the target won't win the source register
(because the source has a next use before the result or before
the contents of other registers).
If the target operand wins a register from a source operand that
can be replaced by the result in a two address instruction
available on the target machine, then if the source operand is
still live, an instruction must be generated to dump the contents
of the register to memory first. Two address instructions are
most useful in a situation where the target wins a register from
a source operand that is not live after the instruction.
If a source operand is in memory but has a next use and wins a
register, then code must be generated to load the register. The
operand value will be loaded into the register allocated, then a
copy will be moved into the location reserved for the result if
necessary. The operand is then available in a register for the
next use, saving a memory reference.
The location of the source operands and of the result have to be
considered together. First consider the case of a source operand
currently in a register, that has no next use, and does not
contain the value for any other object, and can be replaced by
the result in a two address instruction available on the target
machine (considering reverse instructions if necessary). This
register will be chosen for the target. Then the operands (minus
the target if the previous step succeeded in allocating the
result a register) are considered for register allocation, the
operand with the earliest next use first. Note that if a source
operand has no next use, it will not be considered for a
register. An operand will win a register if it has a next use
and any of the following are true:
- there are free registers that can be used for the
  specified operand in the given instruction
- there are registers with objects that are live but
  have no next use in the basic block (these will be
  globally allocated objects) and these registers can be
  used for the operand
- the operand will be used before the object currently
  stored in any register that can be used for the
  operand, unless the register has been earmarked to
  receive an operand already. The register it will be
  allocated is one that contains a value used latest in
  the code block.
The selection criteria are applied in this order.
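The three criteria above can be expressed as a short selection routine. The following Python is a sketch under stated assumptions, not the thesis algorithm: next uses are modelled as instruction positions that increase through the block, `None` stands for NIL, each register descriptor is either `None` (free) or a hypothetical `(variable, live, next_use)` tuple, and machine restrictions on which registers suit which operand are ignored.

```python
def choose_register(operand_next_use, regs, earmarked=()):
    """Pick a register for an operand, or None if it does not win one.

    regs: dict register -> None (free) or (variable, live, next_use)
    earmarked: registers already promised to another operand
    """
    if operand_next_use is None:
        return None                      # no next use: not considered at all
    usable = {r: held for r, held in regs.items() if r not in earmarked}
    # 1. A free register.
    for r, held in usable.items():
        if held is None:
            return r
    # 2. A register whose object has no next use in the block
    #    (live only because it is globally allocated or live on exit).
    for r, held in usable.items():
        if held[2] is None:
            return r
    # 3. The register whose content is used latest, if the operand is used sooner.
    candidates = [(held[2], r) for r, held in usable.items()]
    if candidates:
        latest_use, r = max(candidates)
        if operand_next_use < latest_use:
            return r
    return None

sel_regs = {"R1": ("p", True, 5), "R2": ("q", True, 9)}
sel_wins = choose_register(7, sel_regs)       # q is used later than the operand
sel_loses = choose_register(12, sel_regs)     # every register is needed sooner
```

Any code needed to dump the displaced value would be generated once all operand locations are known, as the text goes on to describe.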
If the source object that is overwritten by the result in the two
address instruction must be in a register (required by the target
machine instruction forms) then the target operand must be
allocated a register, and the source operand that is overwritten
moved to this register. The above algorithm will be applied,
except that the target operand will be considered for a register
before the source operand regardless of next use. If the target
fails to obtain a register after the next use criteria have been
applied, it will be allocated the register containing the object
with latest use in the block.
It is after the locations of the target and source operands have
been determined that instructions are generated to move a copy of
the source operand to the location of the result and to dump any
registers reallocated or allocated for receiving the result.
If a three address instruction is available it would be used in
several situations:
-When it is not desirable to place the result in a register.
 This occurs when registers are fully used, and the result
 has a next use after any of the register contents or the
 result has no next use in the basic block (but is still live
 of course or the instruction is redundant) and is not
 globally allocated a register.
-Either of the two source operands may be in registers or
 win register locations because they have an earlier next
 use than other objects in registers. Source operands may
 win registers specifically over the target because of next
 use considerations.
A two address instruction will be chosen in preference to a
three address instruction if:
-either of the source operands has no next use after
 the instruction but is currently stored in a register
 and the result has a next use in the basic block
-either of the source operands is stored in a register,
 but has a next use after the target operand, and after
 all other values currently stored in registers
-the target operand is the same object as one of the
 source operands (may need to use reverse operation for
 sub, div etc instructions)
The above preferences can be modified when the timings of various
target machine instructions are taken into account.
Generate Binary
After deciding the target instruction form to use, and the
locations selected for the operands, generate the relevant binary
target bit pattern for the instruction that implements the actual
GAS operands. Also generate any linker directives required for
relocation of operand addresses.
Dump Registers
If any of the operands are in registers but have no next uses
in the basic block and are not allocated globally to
registers, generate code to dump the registers to memory and
mark the registers as free. If any source operands are no
longer live, mark the registers as free.
At the end of the basic block, the contents of registers that
hold values must be considered. In a simple approach, at the end
of the basic block, use the register descriptors to determine if
any registers hold any variables that are live on exit from the
basic block and for these variables check if a valid memory copy
still exists. For any live variables that do not have current
live memory copies, generate MOV instructions to save the
register copies in memory. In a more complex approach a global
decision to hold commonly used variables in registers may be
taken. For example for all basic blocks of a loop, a decision may
be made to hold variables "x" and "y" in registers "R1" and "R2".
If these variables have been dumped (for example due to register
shortage), code must be generated to reload them, as all basic
code blocks will assume on entry that these variables are in the
registers globally allocated to them. If these variables are in
registers, they won't be dumped at the end of this basic block.
Their variable database entries will indicate the variable is to
be kept in a specified register. Another useful extension for
the global allocation of registers is to compute disjoint
lifetimes for variables with high usage and globally allocate
registers for these variables using graph colouring algorithms.
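The suggested extension, sharing a register between variables whose lifetimes are disjoint, might look like the following Python sketch. It uses a simple greedy colouring as a stand-in for whichever graph colouring algorithm would actually be chosen; the lifetime intervals, the function name `colour` and the register names are all assumptions for illustration.

```python
def colour(lifetimes, registers):
    """Assign registers so that overlapping lifetimes never share one.

    lifetimes: dict variable -> (first_position, last_position), inclusive
    registers: list of register names available for global allocation
    Returns dict variable -> register, or None where no register is left.
    """
    def overlap(a, b):
        return a[0] <= b[1] and b[0] <= a[1]

    assignment = {}
    # Visit variables in order of lifetime start (a greedy, non-optimal order).
    for var in sorted(lifetimes, key=lambda name: lifetimes[name]):
        taken = {
            assignment[other]
            for other in assignment
            if assignment[other] is not None
            and overlap(lifetimes[other], lifetimes[var])
        }
        free = [r for r in registers if r not in taken]
        assignment[var] = free[0] if free else None
    return assignment

# y and z are never live at the same time, so they can share R2.
gc_lifetimes = {"x": (0, 10), "y": (2, 6), "z": (7, 12)}
gc_assignment = colour(gc_lifetimes, ["R1", "R2"])
```

A variable that receives `None` would simply fall back to the per-block allocation described earlier.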
Index registers and register allocation
Index registers are involved with the implementation of GAS
addressing modes. GAS addressing modes (see the GAS machine
chapter) allow identification of several possible levels of
indexing. The structure addressing modes selected during GAS code
generation and optimisation should allow as much address
structure as possible to be retained so that maximum advantage
can be taken of target machine addressing modes. In particular
where an array address involves computation that includes a loop
index variable, the loop index variable should be retained as
clearly identified in the address computation for the array
access.
Index registers will be used in several situations. One use is
when an operand is specified as a computed offset into a global
area and the offset loaded into an index register, with the base
the address of the start of the storage area. Another use occurs
when a GAS code operand is identified by an indirect address
contained in a temporary, plus an immediate offset. An example of
the advantage of this form of addressing is that successive
references to various fields of a record pointed to by a pointer
would be efficiently implemented. The base address of the record
would be loaded into an index register and the various fields
would be accessed using different immediate value offsets. If the
GAS code for computing the temporary involves adding two computed
objects, and the machine supports two level indexing (two index
registers added to a static base address) then generate code to
load two index registers with these values. Hopefully one of the
register values will be used again for another reference. This
situation could occur when there is a two dimensional array or an
array of records. In each case the static base value would be
zero. Another use of indexing occurs when accessing stack
objects. This will use a stack pointer and an immediate
and/or computed offset. If the stack object is accessed via a
dope vector, a combined indexed/indirect addressing mode will be
used. If the offset is computed, two level indexing can be used
if available on the target machine (e.g. Intel 8086 has it) or
the contents of the stack pointer register must be added to the
computed offset, and single indexing used.
The use of index registers can be considered part of normal
register allocation if the machine's general registers can be used
as index registers. In many machines, a limited number of general
registers can be used as index registers, or index registers are
quite distinct from general registers. This will not allow index
register allocation to be considered just part of normal register
allocation. In the VAX machine, any general register can be used
as an index register, although it is normal to reserve registers
12 to 15 as pointer registers, and registers 0 and 1 are used to
return results from any VMS library routines called. Therefore
the VAX prototype code generator will mainly consider registers 2
to 11 as available for general register and index register
allocation. For target machines with restricted use of general
registers for indexing, additional code will be required so that
only allowable registers are used when index registers are
allocated. This code will typically have to be hand coded on a
machine by machine basis. However as described earlier in this
chapter, the VAX prototype is fairly orthogonal in the use of
registers for general use and indexing and should provide the
structure for machines with constraints.
This completes the factors involved in generating code for an
instruction such as ADD. Consider now how to handle registers
that are dumped to memory.
Allocating space on stack to dump registers
When code is being generated and register contents have to be
saved in memory, the problem is "where". For variables already
allocated space in memory, the location is obviously this
allocated storage location. Temporaries have not, however, been
allocated any memory. Space will be allocated on the stack for
storing dumped temporary values. The space for these temporaries
is allocated as if it were local storage. However, only sufficient
space should be allocated to hold the maximum number of
temporaries saved at any one time. When a temporary of a specific
data type is to be dumped, a check is made for an allocated
temporary storage object on stack that contains a variable that
is no longer live. If one is found, the temporary is dumped to
it. If one is not found, extra space is allocated on the stack.
With this approach, more space may be allocated than absolutely
necessary, because space may have been allocated for a temporary
that is now dead, but is of a smaller size than the temporary
currently being dumped. Therefore there may be unusable gaps on
the stack. However garbage collection to reclaim these gaps would
not be worth the overhead. The code generator must keep track of
temporary space allocated on the stack.
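The bookkeeping described here can be sketched as a small slot table kept by the code generator. The Python below is illustrative only: the class `TempArea`, its methods and the use of a byte size in place of a data type are assumptions, and offsets are relative to the start of the temporary area rather than to any real frame layout.

```python
class TempArea:
    """Tracks stack slots used for dumped temporaries within one routine."""

    def __init__(self):
        self.slots = []    # each slot is [offset, size, occupant or None]
        self.size = 0      # total space allocated so far

    def dump(self, temp, size):
        """Return the offset at which `temp` is stored, reusing a dead slot if possible."""
        for slot in self.slots:
            if slot[2] is None and slot[1] >= size:
                slot[2] = temp
                return slot[0]
        # No dead slot is large enough: extend the area.
        offset = self.size
        self.slots.append([offset, size, temp])
        self.size += size
        return offset

    def release(self, temp):
        """Mark the slot holding `temp` as reusable once the temporary is dead."""
        for slot in self.slots:
            if slot[2] == temp:
                slot[2] = None

area = TempArea()
off_t1 = area.dump("t1", 4)
off_t2 = area.dump("t2", 2)
area.release("t2")
off_t3 = area.dump("t3", 4)   # the dead 2-byte slot is too small: a gap remains
area.release("t1")
off_t4 = area.dump("t4", 2)   # fits in the dead 4-byte slot
```

The third dump shows the unusable gap mentioned in the text: the slot freed by `t2` is too small for `t3`, so the area grows instead.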
Saving resources used by a routine
When a routine is called, any variables the routine uses must be
dumped to memory if the current memory value is not up to date.
Additionally, any registers used by the routine that are
currently being used by the caller must be temporarily saved. A
bitstring indicating which registers are used at the call point
could be pushed onto the stack by the caller. The called routine
will use this bitstring to save the registers indicated. At the
end of the routine, the bitstring is used to restore registers.
Hence the first and last instructions in any routine will be
standard register dump and restore code. A more complex
dump/restore algorithm would involve only saving those registers
that are in use by the caller and which are also used by the
routine. This would be done by the compiler performing an AND
operation on the bitstring representing registers in use in the
caller at the call point and a bitstring representing the
registers used by the routine, with the resulting bitstring used
to decide registers to be saved/restored. The bitstring
representing the registers used by a routine would hopefully be
available in the GAS code entity database record for the routine
at the time the compiler is generating code for the call so that
the compiler, knowing what registers the caller has in use, can
evaluate common registers, and hence generate code to push a
bitstring that indicates these common registers. If the entity
database record for the routine has no bitstring for registers
used (code has not yet been generated for the routine: only the
definition has been encountered), or a record doesn't exist (the
routine has not been encountered at all, for example an imported
routine) then generate code to push a bitstring that indicates
saving of all registers currently used by the caller. This all,
of course, depends on being able to clearly identify the routine
being called. If this is not possible, then again push a
bitstring to indicate all registers in use by the caller must be
saved.
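The AND of the two bitstrings, together with the fallback when nothing is known about the routine, is small enough to show directly. In this Python sketch bit i of an integer stands for register i; the function names `save_mask` and `registers_in` are assumptions, and `None` models a missing entity database record or a record without a registers-used bitstring.

```python
def save_mask(caller_in_use, callee_used=None):
    """Bitstring of registers that must be saved across a call.

    caller_in_use: bit i set if the caller has register i in use at the call point
    callee_used:   bit i set if the routine uses register i, or None if unknown
    """
    if callee_used is None:
        # No information on the routine: save everything the caller is using.
        return caller_in_use
    return caller_in_use & callee_used

def registers_in(mask):
    """List the register numbers whose bits are set in `mask`."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]

# Caller uses R2, R3, R5, R6; the routine uses R1 and R3: only R3 is common.
common = save_mask(0b01101100, 0b00001010)
unknown = save_mask(0b00001100)
```

The same intersection idea applies to the "variables used" bitstring discussed next, except that there the result only drives compile-time decisions.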
Variables that need to have their memory copy updated from
register can be identified in a similar manner. The GAS code
entity database record for the routine will contain a bitstring
indicating which variables it MAY use. The caller will require
code to dump registers that contain any of these variables used.
If no information is available on the routine, all registers
except those containing temporary values will have to be dumped.
The MAY for the use of a variable indicates that a variable may
be referenced. Uncertainty due to coercion, computed
offsets and pointers could mean that many variables MAY be
touched. Note however that handling the variables used bitstring
only involves the compiler; it does not affect target machine
code.
If the target machine has few registers, and efficient
instructions are available to dump them, the simplest approach may
be to dump all registers anyway.
When a routine uses very few registers, then it may be faster for
the routine to dump these registers rather than have code to
process the register dump bitstring on the stack. Hence it will
ignore the bitstring on the stack. This decision also depends on
how efficient the code to dump registers based on the bitstring
can be made. The target machine may have a register save
instruction that can utilise the bitstring directly, or it may
require a series of bit tests and push instructions. Hence the
decision at which point to dump registers used by the routine
(rather than using the registers indicated in the bitstring) will
depend very much on the target machine instructions for saving
registers.
Parameters of subroutine
Parameter passing conventions (call by reference, value etc) are
handled by the Rcode. All the back end sees is the need to build
an argument block.
Handling of subroutine GAS calling convention
The handling of the result, argument and local variable storage
areas has been discussed. They require computation of size and
the allocation of space on the stack (possibly using dope
vectors). The GAS calling convention indicates the form that the
stack should have during a subroutine:
    result
    return block (address and status)
    arguments
    local variable storage
Priority is placed on implementing this structure in the interest
of portability as discussed in chapter four, rather than varying
the format to suit special target machine calling conventions and
instructions. Associated with the GAS subroutine convention is
the Display Vector. This contains pointers to the result,
arguments and local variables, dynamic and static link pointers,
and exception handler information. It is suggested that an
effective way to implement the stack of display vectors that will
be required at runtime is to maintain a separate stack consisting
of the display vectors. This stack is referred to as the display
stack to distinguish it from the "main" stack. The base of the
main stack will contain four pointers:
    Display Vector stack base
    Display Vector stack limit
    Currently active Display Vector
    Top of display stack (Most recently created Display Vector)
Assume that the Call and Declare_Routine Rcodes are left on the
Rcode tree by the GAS code generator. The PUSHMARK, NEWDISPLAY
and POPMARK GAS codes become unimportant, as the structure of the
call is provided by the Rcodes. These GAS codes are provided
though (see Chapter Four) as these subroutine related Rcodes may
not be present. In this case they are used to indicate the
difference between a fast call and a full GAS subroutine calling
mechanism.
When a CALL is encountered during target code generation, the
structure of the call (result, arguments, local variables and
routine descriptor handling) is made clear by the "Call" Rcode.
On encountering a Call Rcode the target code generator will:
a) Allocate space for a new display on the display stack,
and generate an instruction to increase the top of display stack
pointer by the size of a display vector.
b) An instruction is generated to move the main stack
pointer to the RESULT field of the new display vector.
c) The GAS code to compute the size of the result is then
processed. This is located on the result computation sub-tree of
the Call Rcode.
d) The GAS code to generate the arguments, also on a
sub-tree of the Call Rcode, is processed. The first GAS code will
be a PUSHMARK instruction. At this point the code generator will
generate instructions to:
    -increment the stack pointer by the size of a pointer
     (return address) and the size of any status information
     required for the target machine return instruction.
    -move the address of this return block to the
     RETURNMARK field of the new display vector.
    -move the stack pointer to the ARGLIST field of the new
     display vector.
The target machine code to generate and place the arguments on
the stack is then generated.
e) The sub-tree of the Call Rcode that contains the GAS
code to compute the routine descriptor and push it on the main
stack is then processed. The generator will use information on
the target machine to decide how to form the static environment.
The descriptor will not in fact be pushed on the stack. It is
suggested that the static link pointer be moved to the STATICLINK
field of the new display, and the address of the target routine
be retained by the code generator.
f) The CALL GAS code is encountered as the last instruction
of the routine descriptor computation sub-tree. A target machine
call instruction will be generated using the address computed in
(e), ignoring the GAS operand for the CALL instruction.
W en processing a routine declaration, 
wi 11 respond as fol lows: 
the target code generator 
al The 
dec l arations 
i:- struc ion. 
first GAS instruction i n th e subroutine, 
for local variables, wi 11 be the 
Again th i s 
Rcode 
in struction i s present in 
befo re any 
NEWDISPLA Y 
case the 
code "D ec lar e Routine" i s not retained by the GAS 
generator. It c l ear ly mark s the start o f a subrout i ne. When th i s 
struct io n i s encountered severa l in structions are generated: 
-the current display vector pointer is moved to the
PREVDISPLAY field of the new vector on top of the
display vector stack.
-the pointer to the new display vector is moved to the
current display vector pointer (in the main stack base)
-the return address and any status information
currently located on the top of the main stack is
popped and moved to the location pointed to by the
RETURNMARK field of the current display vector
-the stack pointer contents are moved to the LOCALSTORE
field of the current display vector.
b) The local variable declarations are encountered and
space allocated as suggested.
c) The end of the routine will consist of two GAS codes:
POPMARK and RET. The code generator will generate the following
instructions:
-The RETURNMARK field of the current display vector is
copied into the TOP_OF_STACK field of the display
vector pointed to by the PREVDISPLAY field of the
current vector, and to the stack pointer register.
-The current display vector pointer is set to point to
the display vector pointed to by PREVDISPLAY.
-The pointer to the top of the display stack is
decremented by the size of a display vector. This
deallocates the display vector for the routine that is
terminating. At termination of a routine its display
vector will always be the most recently created display
vector.
-a return machine code instruction is generated
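
The call, entry and exit sequences above can be traced with a small model. The following is only a sketch in Python, not part of the PLIP implementation. It assumes an upward growing main stack whose pointer addresses the top item, a return block of one word, and that the new display vector is created at PUSHMARK. The names Machine, pushmark, newdisplay and popmark_ret are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DisplayVector:
    prevdisplay: Optional["DisplayVector"] = None
    returnmark: Optional[int] = None    # address of the return block
    arglist: Optional[int] = None
    localstore: Optional[int] = None
    top_of_stack: Optional[int] = None


class Machine:
    def __init__(self) -> None:
        self.mem: List[object] = [None] * 64   # main stack storage
        self.sp = -1                           # index of the top item
        self.displays = [DisplayVector()]      # separate display vector stack
        self.current = self.displays[0]        # current display vector pointer

    def push(self, value: object) -> None:
        self.sp += 1
        self.mem[self.sp] = value

    def pop(self) -> object:
        value = self.mem[self.sp]
        self.sp -= 1
        return value

    def pushmark(self) -> DisplayVector:
        """Caller, step (d): reserve the return block, note RETURNMARK and ARGLIST."""
        new = DisplayVector()
        self.sp += 1                  # assumed one-word return block
        new.returnmark = self.sp
        new.arglist = self.sp
        self.displays.append(new)
        return new

    def call(self, return_address: int) -> None:
        """Step (f): a simple call instruction pushes the return address."""
        self.push(return_address)

    def newdisplay(self) -> None:
        """Callee entry, step (a) of the routine declaration sequence."""
        new = self.displays[-1]
        new.prevdisplay = self.current
        self.current = new
        self.mem[new.returnmark] = self.pop()   # move return address to return block
        new.localstore = self.sp

    def popmark_ret(self) -> object:
        """Callee exit, step (c): POPMARK followed by RET."""
        finished = self.current
        self.sp = finished.returnmark
        finished.prevdisplay.top_of_stack = self.sp
        self.current = finished.prevdisplay
        self.displays.pop()                     # deallocate the display vector
        return self.pop()                       # return instruction pops the address
```

In this model the arguments and locals of the terminating routine are released simply by reloading the stack pointer from RETURNMARK, which is why no argument count is needed at return.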
The static link method is left up to the target code generator
based on the resources of the machine. The static link may be a
pointer to a list of pointers to display vectors that provide
the static environment, or simply a pointer to the display
vector for the previous static level implementing a linked list
for the environment.
The above describes how the calling mechanism can be implemented
with no help from any target machine facilities, except for a
stack pointer and simple call and return instructions.
Improvements in the speed of the mechanism could be achieved by
reserving registers to hold the pointers to the current and new
display vectors, and the fields of the current display vector
such as PREVDISPLAY, LOCALSTORE, ARGLIST and RESULT, and by
utilising instructions of the target machine, particularly call
instructions, that reduce the overhead of the process.
It would be possible to utilise the VAX CALLS instruction to
automate more of the call process. Registers could be used as
follows (note that these registers may not point directly to the
objects specified, but can be used to easily access the desired
objects by the addition of a small static offset):
-The Argument Pointer register (AP register 12)
provides the ARGLIST for the current display vector.
-The Stack Pointer register (SP register 14) provides
the TOP_OF_STACK for the current display vector.
-The Frame Pointer register (FP register 13) provides
the LOCALSTORE for the current display.
-The Program Counter (PC register 15) provides the PC
for the current display vector.
-The Frame Pointer register also effectively provides
the RETURNMARK for the current display vector. It does
not point directly to the return block, but does
provide the effective return address for the VAX RET
instruction.
-Register 10 provides the RESULT field of the current
display.
-The Argument Pointer also is used for the STATICLINK
field. After the arguments have been pushed, the static
environment will be computed, and a pointer to it is
pushed onto the stack. Therefore the AP provides access
to both the static environment and the argument list,
although it does not actually point to either directly.
-The EXCEPTHANDLER for the current display vector is
provided on the stack along with the saved register
contents pointed to by the frame pointer. The exception
handler mechanism provided by the VAX can then be
utilised.
-The EXCEPTPARM for the current display vector will be
provided on the stack after the exception has occurred
(see VAX Architecture Handbook).
-The Frame Pointer register also effectively provides
the PREVDISPLAY as the AP, FP and SP registers
described here will be saved on the stack when a
subroutine is called and together they effectively
provide the previous display vector for the routine
which called the current subroutine.
The STACKLIMIT and STACKBASE fields of the current display are
not currently implemented because the VAX does not provide
registers for these, but provides read only pages on either end
of the memory allocated for the stack.
To begin a subroutine call, the result area is computed and stack
space is allocated.
When the PUSHMARK GAS code is encountered, register 10 is PUSHED
and the stack pointer is then moved to register 10. Register 10
then will point to this saved value and at an offset of 4 bytes
from register 10, is the space allocated for the result area of
the subroutine to be called. The arguments are then pushed on the
stack. The routine descriptor is computed. The address of the
routine, if computed, will be computed into a register. The
static link will be pushed on the stack. During computation of
arguments and the routine descriptor, the code generator must
note that the current result area pointer is now on the stack,
but can be accessed via register 10. The CALL GAS code will then
be encountered and a VAX CALLS instruction is generated with the
number of arguments operand set to zero. This CALLS instruction
will automatically:
-PUSH the number of arguments operand
-save stack pointer in a temporary internal register
-PUSH registers specified in bit mask at the start of
the called routine
-PUSH PC (return address), FP, AP
-PUSH a longword (32 bits) (not relevant to discussion)
-PUSH zero longword (exception handler)
-FP is replaced by SP
-AP is set to the saved SP
-PC is set to the first instruction in called subroutine
This stack structure provides what is termed the return frame for
the called routine.
The PC, AP, FP, SP, Register 10, the pushed static link, and the
pushed exception handler together provide the display
vector for the new routine. At plus 4 bytes from the AP is the
static link, and at plus 8 bytes is the first argument. The FP
points directly to EXCEPTHANDLER. At plus 12 bytes from the FP is
the previous FP which can be used to access the previous display
vector EXCEPTHANDLER field (see above).
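
The offsets quoted above can be checked against a small model of the stack. The following Python sketch follows only the list of CALLS actions given in the text. It does not model stack alignment or the contents of the fourth longword, and it assumes that arguments are pushed in reverse order so that the first argument lies nearest the static link. The class name Vax and its methods are hypothetical.

```python
class Vax:
    """Toy model of a downward growing stack of 4-byte longwords."""

    def __init__(self, stack_top: int = 0x1000) -> None:
        self.mem = {}
        self.sp = stack_top
        self.fp = 0
        self.ap = 0
        self.pc = 0
        self.r10 = 0

    def push(self, value: int) -> None:
        self.sp -= 4
        self.mem[self.sp] = value

    def pushmark(self) -> None:
        """PUSHMARK: save register 10, then point it at the saved value."""
        self.push(self.r10)
        self.r10 = self.sp

    def calls(self, numarg: int, target: int, return_pc: int) -> None:
        """The CALLS steps listed in the text (no registers saved by the mask)."""
        self.push(numarg)
        saved_sp = self.sp          # temporary internal register
        self.push(return_pc)
        self.push(self.fp)
        self.push(self.ap)
        self.push(0)                # longword not relevant to the discussion
        self.push(0)                # zero longword (exception handler)
        self.fp = self.sp
        self.ap = saved_sp
        self.pc = target


vax = Vax()
vax.fp, vax.ap, vax.r10 = 0x2000, 0x3000, 0x4000   # caller's register values
vax.push(-1)                        # result area allocated by the caller
vax.pushmark()
for argument in reversed([11, 22]):  # assumed reverse order of pushing
    vax.push(argument)
vax.push(0xABC)                     # static link
vax.calls(0, target=0x500, return_pc=0x123)
```

With this layout the word at AP+4 is the static link, AP+8 is the first argument, FP points at the zero handler longword, FP+12 holds the caller's FP, and the result area lies 4 bytes above the address held in register 10.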
The previous display vector is provided as follows:
PC              pushed PC
LOCALSTORE      pushed FP
ARGLIST         pushed AP
RESULT          the previously pushed R10 contents
STATICLINK      pointed to via pushed AP
TOP_OF_STACK    register 10 contains the value the stack
                pointer will be set to on return to the
                caller; this will point to the pushed
                static link for the caller.
RETURNMARK      effectively provided by the pushed FP
EXCEPTHANDLER   provided on the stack in the return
                frame of the caller (use saved FP to
                access this frame)
When the VAX RET instruction is encountered, the following is
performed by the VAX:
-PC, AP, FP replaced by values in return frame
-any saved registers restored
The first instruction in the caller after the CALLS should be a
POP into register 10 to restore the pointer to the RESULT area of
the caller routine. The RESULT produced by the called routine
will then be available on the top of the stack.
In this approach to implementing the GAS call mechanism, the
current display is contained in a mixture of registers and stack
locations. Previous displays are spread through the stack. Though
display vector fields are spread, it is reasonably easy to obtain
access to any of them. Extra items appear on the stack than are
specified for the GAS call convention, but these items are simply
located. Any programmer that manipulates the stack to implement
some function and hopes the standard stack structure of the GAS
convention is available on each target machine so that their code
is easily ported, will find the extra items on the VAX stack easy
to account for. The display vector is however provided, though
the programmer must take account that the registers used for
display vector fields do not all point directly at the objects
involved.
It is important to note that the "number of arguments"
operand of the CALLS instruction has not been used to
deallocate the space used by arguments. This VAX mechanism limits
the size of the argument block to 1024 bytes. Additionally code
is required to compute the size of the arguments so that the
"number of arguments" operand can be provided. It is felt the use
of register 10 for stack storage deallocation is more effective
on both counts. The allocation of the result to the stack rather
than returning through register zero as the VAX convention
specifies also gives wider portability. The major defect of the
use of the VAX CALLS instruction as described here, is the
excessive use of registers. Additionally the real gain in
performance of subroutines is limited.
An elimination of the need for register 10 can be achieved by
using a stack of ARGLIST pointers as follows:
a) Allocate space for the result, with any descriptor
required for a dynamically sized result allocated last.
b) Push the current stack pointer onto the ARGLIST stack.
c) Generate the arguments onto the stack, statically sized
arguments and descriptors for dynamically sized arguments first.
d) Compute the routine descriptor and push onto stack.
e) The compiler selects the registers to be saved.
f) Allocate space for saving registers, a display vector
and a dummy VAX return frame by decrementing the stack pointer.
The space is allocated assuming it will occupy the space
currently occupied by the routine descriptor static link (but not
the PC field of the descriptor). Set the return frame condition
handler, register mask and the "number of arguments" fields to
zero.
g) Move the computed static link to the STATICLINK field of
new display.
h) Save selected registers in the allocated space.
i) Call the subroutine using a JSB instruction that uses
the address in the computed routine descriptor on the stack, if
the address is dynamically computed, else use the statically
evaluated address.
j) The first instructions of the subroutine should be as
follows:
-The return address on the top of the stack is popped
into the "saved" PC field of the return block.
-The current FP and AP registers are moved to the
"saved" FP and AP fields respectively of the return
block.
-The value on top of the ARGLIST stack is copied into
the AP register.
-The current SP contents are moved to the FP.
The subroutine is not finally considered entered until the last
step which makes the display vector for the new routine finally
active. Between the JSB instruction and the move of the SP to the
FP, the caller is still considered active.
The current display vector is effectively pointed to by the FP
and is comprised as follows:
-EXCEPTHANDLER is located in the Condition Handler
field of the return block. The VAX Condition Handler
mechanism will therefore still operate correctly. The
FP points to the previous return frame, which will
contain the next level condition handler.
-RESULT and ARGLIST fields are effectively provided by
the AP register.
-LOCALSTORE is effectively provided by the FP.
-PREVDISPLAY is provided by the "saved" FP.
-PC is provided by the machine PC.
-TOP_OF_STACK is provided by the VAX SP register.
k) Space is allocated for the local variables of the
subroutine. Statically sized objects and dope vectors first, then
space for dynamically sized objects.
l) Return from subroutine is achieved by the VAX RET
instruction. This will automatically load AP, FP and PC with
those of the caller (saved in the return block) and will leave
the SP pointing to the display vector of the subroutine that has
just completed. The following steps are then performed:
-Restore saved registers.
-Pop the value on the top of the ARGLIST stack into the
SP. This value effectively points to the result and
arguments of the subroutine just returned from.
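
The role of the ARGLIST stack in steps (b), (j) and (l) can be seen in a reduced model. The Python sketch below keeps only the result slot, the arguments and the ARGLIST stack. The routine descriptor, register save area and return frame of steps (d) to (i) are left out, and the names StackMachine, begin_call and finish_call are hypothetical.

```python
class StackMachine:
    """Downward growing main stack plus a separate stack of ARGLIST values."""

    def __init__(self, stack_top: int = 0x1000) -> None:
        self.mem = {}
        self.sp = stack_top
        self.ap = 0
        self.arglist_stack = []

    def push(self, value) -> None:
        self.sp -= 4
        self.mem[self.sp] = value

    def pop(self):
        value = self.mem[self.sp]
        self.sp += 4
        return value


def begin_call(m: StackMachine) -> None:
    m.push(None)                     # a) space for the result
    m.arglist_stack.append(m.sp)     # b) remember where the result and arguments start


def finish_call(m: StackMachine, body) -> None:
    caller_ap = m.ap                 # stands in for the "saved" AP field
    m.ap = m.arglist_stack[-1]       # j) AP taken from the top of the ARGLIST stack
    m.mem[m.ap] = body(m)            # callee stores its result in the result slot
    m.ap = caller_ap                 # RET reloads the caller's AP
    m.sp = m.arglist_stack.pop()     # l) pop ARGLIST value into SP, freeing arguments


def add(m: StackMachine) -> int:
    """Hypothetical callee: the two arguments lie just below the result slot."""
    return m.mem[m.ap - 4] + m.mem[m.ap - 8]


# Evaluate add(add(1, 2), 10): the inner call occurs while the outer
# argument list is being generated.
m = StackMachine()
begin_call(m)
begin_call(m)
m.push(1)
m.push(2)
finish_call(m, add)                  # leaves 3 on the stack as the first outer argument
m.push(10)
finish_call(m, add)
total = m.pop()
```

Because the inner call leaves its result on top of the main stack, that result serves directly as an argument of the outer call, and no display vector is needed for the outer routine until it becomes active.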
This completes the subroutine call process. Its main advantages
over the previous two methods are:
a) Does not require exclusive use of any more registers than
the VAX convention (AP, FP, SP).
b) Recognises that in physical reality, a separate stack of
created display vectors is not required, but that merely a
separate stack of ARGLIST values for created display vectors is
required. Space is only allocated for a display vector when its
routine actually becomes active. This is useful when generating
arguments requires subroutine calls.
The problem for this method (and the first method involving a
separate display vector stack) is how to allocate space for the
ARGLIST stack. Space can be allocated in the same storage
area allocated for the main program stack. It grows in from the
opposite end to the main stack. The base of the main stack
contains a pointer to the top of the ARGLIST stack. The
difficulty is that hardware stack overflow protection would not
now operate on the VAX. The main stack could overflow into the
ARGLIST stack. In a machine with stack base and limit registers,
the limit register could be moved to always point to the top of
the ARGLIST stack. Alternatively space could be allocated in the
heap, but this has the disadvantage with multiple processes that
space would be wasted if there were many of these stacks
allocated and only partially full. An alternative is for ARGLIST
stacks to grow in segments, but management of these structures
would add extra overhead to subroutine calling.
Jump/Call displacement Fixup
Jumps to labels are considered in two categories, jumps to labels
in the routine, and jumps to labels outside the routine. Jumps
inside the routine will be optimised to minimise required offset
sizes. Jumps to labels outside are more complex and will require
a return to the environment of the label. This will require
manipulation of the stack. The routine containing the target
label will not have been coded, hence the offset to the label
will not be computable. The branch instruction will be encoded
with a displacement large enough to address anywhere in the
program address space. The target label will not have been
encountered because Rcode presents the program in Post-order
form. The code for outer enclosing lexical levels will be
encountered last, enclosed lexical levels first. In the languages
that will be handled the target of a jump must be within the
lexical level or to an outer enclosing lexical level, in the
current environment. The code for enclosing lexical levels will
not have been encountered. A linker directive will be generated
to indicate that the contents of the displacement field must be
set to the address of the label. When this label is later
encountered during target code generation, a linker directive is
generated to identify the label to the linker.
Branch instructions will be encoded using PC relative addressing
modes whenever possible.
Call instructions can be encoded to minimise the displacement
field if the code for the procedure has been generated, which will
be true if the routine called is lexically contained. If the
code for the called routine has not been generated, then a
full-length displacement must be generated. However the required
displacement length can be computed as the Call instruction is
generated. The actual value of the displacement cannot be
computed until the size of displacement fields of any branch
instructions preceding the call instruction in the routine are
computed.
As code is generated for the routine, branch instructions are
coded assuming long displacement fields in the instruction.
Labels are given byte offsets from the start of the routine based
on branch instructions with long offsets. A list of pointers to
branch and call instructions is kept, in physical order. When the
coding of the routine is complete, the relative displacement for
each branch instruction is computed in terms of initially
allocated offsets for labels, and the minimum length displacement
fields necessary for each branch is computed. Given the new
computed size of displacement fields, byte offsets for labels are
recomputed. Branch and call instruction displacement operands are
then computed and stored in the instruction.
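
The fixup procedure just described can be sketched as follows. This is an illustration in Python rather than the prototype's code. It assumes a one byte operation code, displacement fields of 1, 2 or 4 bytes, and displacements measured from the end of the branch instruction. The item encoding and the function names are hypothetical.

```python
LONG = 4  # assumed size in bytes of a long displacement field


def size_for(displacement: int) -> int:
    """Smallest assumed displacement field that can hold the value."""
    if -128 <= displacement <= 127:
        return 1
    if -32768 <= displacement <= 32767:
        return 2
    return LONG


def layout(items, disp_size):
    """Assign byte offsets to labels and branches for given field sizes."""
    offset, labels, starts = 0, {}, {}
    for index, (kind, value) in enumerate(items):
        if kind == "label":
            labels[value] = offset
        elif kind == "code":
            offset += value                    # value is a byte count
        else:                                  # "branch": value is the label name
            starts[index] = offset
            offset += 1 + disp_size[index]     # opcode byte plus displacement
    return labels, starts, offset


def fix_branches(items):
    branches = [i for i, (kind, _) in enumerate(items) if kind == "branch"]
    # First layout: every branch is coded with a long displacement.
    labels, starts, _ = layout(items, {i: LONG for i in branches})
    sizes = {}
    for i in branches:
        end_of_branch = starts[i] + 1 + LONG
        sizes[i] = size_for(labels[items[i][1]] - end_of_branch)
    # Second layout: recompute label offsets with the minimum field sizes.
    labels, starts, total = layout(items, sizes)
    disps = {i: labels[items[i][1]] - (starts[i] + 1 + sizes[i]) for i in branches}
    return sizes, disps, total


routine = [
    ("label", "top"),
    ("code", 10),
    ("branch", "far"),
    ("code", 200),
    ("branch", "top"),
    ("label", "far"),
]
sizes, disps, total = fix_branches(routine)
```

Since the first layout overestimates every distance, a field size chosen from it is always sufficient after the routine shrinks. A second pass could shorten some fields further, but a single pass matches the description above.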
References to global data objects
References to global objects will involve operands whose size can
be computed at the time the instruction is generated. The
displacements involved will not depend on the size required for
instructions, as do branch instruction displacements. References
to global objects will usually involve an offset into a data
segment area. The address for the start of this area may be in a
register, or a global addressing mode may be used. In the latter
case, the address field of the instruction will be loaded with
the offset of the global object into the data segment involved,
and a linker directive will be generated to specify relocation of
the offset by the start address assigned to the data segment
involved.
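
The relocation of a global reference might be pictured as below. This Python sketch assumes a 4 byte little-endian address field and a linker directive that records only the position of the field and the name of the data segment. The record layout, the operation code value and the function names are hypothetical and are not the PLIP linker format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Relocation:
    code_offset: int   # position of the address field in the code
    segment: str       # data segment whose start address must be added


def emit_global_ref(code: bytearray, relocs: List[Relocation],
                    opcode: int, offset_in_segment: int, segment: str) -> None:
    """Emit an instruction whose address field holds the segment offset."""
    code.append(opcode)
    relocs.append(Relocation(len(code), segment))     # the linker directive
    code += offset_in_segment.to_bytes(4, "little")


def link(code: bytearray, relocs: List[Relocation],
         segment_bases: Dict[str, int]) -> bytearray:
    """Add the assigned segment start address to each recorded field."""
    out = bytearray(code)
    for reloc in relocs:
        field = slice(reloc.code_offset, reloc.code_offset + 4)
        value = int.from_bytes(out[field], "little")
        value = (value + segment_bases[reloc.segment]) & 0xFFFFFFFF
        out[field] = value.to_bytes(4, "little")
    return out


code, relocs = bytearray(), []
emit_global_ref(code, relocs, opcode=0xD0, offset_in_segment=0x24, segment="data")
linked = link(code, relocs, {"data": 0x1000})
```

The code generator never needs to know where the data segment will be placed. It only needs the offset of the object within the segment, which is known when the instruction is generated.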
VAX Abstracted Resource Database and Code Generation Procedures
The discussion of target code functions above suggests abstracted
approaches to target code generation. The following discusses in
more detail what is involved in the mechanics of abstraction. It
attempts to clarify how abstraction is being manifested in the
PLIP project VAX prototype target code generator, and hopefully
demonstrates how abstraction can contribute to portability of
target code generators.
As described earlier in this chapter, portability is enhanced if
the access to information on the resources available on a
machine, and the state of allocation of those resources is
provided by an abstracted database that is accessed by standard
abstracted procedures and constants. Additionally the abstraction
of the target code generator itself whenever possible will also
improve the portability of code generators. It was also suggested
that a single abstracted target code generator for all machines
in the family should not be attempted as the complexity of some
machines would make the abstraction large and complex. Therefore
the aim is a set of fairly fluid procedures and constants that
will hopefully be easily modified to produce code generators for
new machines in the family. In some respects this appears like a
new level of families within the target code generators, a third
level of the code generator. However machines at this level are
not easily categorised into useful families, as many subtle
differences often exist between any two machines which makes a
complete generic handling of the two machines not as
straightforward as the generic treatment of families at the GAS
level.
What is really being aimed at is a pragmatic treatment of the
mechanics of code generator development that will maximise the
potential for easily modifying a code generator for one target
machine to produce the code generator for another machine. The
abstracted procedures and constants will greatly assist this aim.
These procedures and constants for machines in a GAS family will
tend to differ more in implementation rather than definition.
It has been suggested that a good starting point for a target
code generator is to base the structure on the structure that
would be developed for an orthogonal machine that would be a
member of the same generic family as the target machine. The VAX
provides a reasonably orthogonal machine with registers, and
therefore the VAX prototype provides a good basis for target code
generators for other machines in the family. Its main non
orthogonal features have already been discussed.
The following is a description of some of the abstracted
procedures and constants developed for the VAX prototype that
will illustrate the typical form of these procedures. Note that
most constants required by the target code generator will be
provided in the "Machine" module and will be information related
to the size of objects supported. These procedures and constants
are based on the approaches to issues such as register allocation
and two, three address instructions, as discussed earlier in the
chapter. They are therefore abstractions of the approach taken in
relation to these issues. For example the procedure "In_Register"
is important regardless of the machine. Similarly when an
instruction ADD X,Y,Z <Z=X+Y> is encountered, it will be
necessary to copy back to memory the contents of any registers
that contain values that share the same storage area of X, Y or Z.
These can be requested by procedure calls:
Dump_All_In_Area(X,Source)
Dump_All_In_Area(Y,Source)
Dump_All_In_Area(Z,Target)
and these procedures will be implemented in terms of the target
machine. The procedures below are an example of typical
procedures that are being developed for the VAX prototype.
However many of these procedures will appear in the target code
generators for a wide range of machines.
Several procedures will return information required for decision
making by the target code generator or are used to update the
target code generator database to reflect changes in say register
contents or locations of variables:
In_Register(variable_id)
    If the variable is currently located in a register
    returns the register id, else returns no register.
Used_First(var_1, var_2)
    Returns TRUE if var_1 has a next use before var_2
Is_Live(variable)
    Returns TRUE if variable is live - i.e. could be
    used again before it is defined again
Has_Two_Address_Form(Code, Datatype)
    Returns TRUE if the target machine supports a two
    address instruction for the GAS code specified
    and the Rcode Basictype specified
Has_Three_Address_Form(Code, Datatype)
    Returns TRUE if the target machine supports a
    three address instruction for the GAS code
    specified and Rcode Basictype specified
Can_Be_In_Registers(Code,Form,Datatype,Operand)
    Returns the set of registers that can be used for
    the location of the specified operand, for the
    target machine instruction of specified two or
    three address form, that would be used to
    implement the GAS code instruction specified in
    "Code" for the Rcode Basictype specified in
    Datatype. The "Operand" specification can be:
    Target_Op, First_Op, Second_Op, Third_Op. Note
    that this procedure should be implemented without
    an array to represent the information because it
    would be a large array, and usually a standard set
    of registers will be allowable, except in few
    instructions.
No_Registers
    Returns an empty register set. Useful in context:
    IF Can_Be_In_Registers = No_Registers
Can_Be_In_Memory(Code,Form,Datatype,Operand)
    Returns true if the specified operand can be in
    memory (see Can_Be_In_Registers)
Location(Varid)
    Returns a location specifier of the variable
    specified. If the variable has several current
    copies, for example in two registers and memory,
    will return a register by preference. The
    representation of the location is target machine
    dependent.
Set_Location(Varid,Location_Specifier)
    Will modify the target code generator database to
    reflect the fact that there is now an additional
    copy of the variable at the location specified by
    "Location_Specifier".
Several procedures are needed that request generation of target
machine code to implement standard functions.
Allocate_Register(Variableid)
    Will allocate a register to the variable purely on
    the basis of next use criteria. Will generate a
    load instruction and will update the target code
    generator database to reflect that the variable
    now has a copy in a register, if the variable does
    win a register. Will also generate an instruction
    to dump the contents of a register won from
    another variable, but only if there is no other
    current copy of the variable.
Dump_All_In_Area(Variable,Source_or_Target)
    This procedure will generate target machine code
    to dump any registers that contain variables that
    share the same area as the specified variable.
    If "Source_or_Target" is "Source", the registers
    will still be considered to contain the variables.
    If "Source_or_Target" is "Target" the registers
    will be considered free.
Dump_All
    Will generate target machine code to dump all
    registers that are not globally allocated and
    contain live variables.
Dump_Variable(Variableid)
    Will generate target machine code to dump the
    specified variable, if currently in a register.
Get_Register(Variable,Allowed_Registers,Operand)
    Will return in Operand a pointer to the target
    machine code operand that specifies the location
    to use for the variable. If a register is won,
    this will be a register operand. The
    "Allowed_Registers" are the registers that can be
    considered for use. Will not generate any code to
    spill any register allocated that contained a
    value for a variable.
Dump_Register(Registerid)
    Will generate code to dump register indicated in
    the target machine operand.
Leave_Basic_Block
    Will load into registers any globally allocated
    variables that are not currently in their globally
    allocated registers, then will call Dump_All.
Generate_Add(type,locid,locid,target_locid)
    Will generate target machine code to add objects
    of the type specified using values at the
    locations specified by the first two locations,
    and placing the result in the third location. Note
    that on some machines this procedure may generate
    a sequence of instructions or a call to an
    intrinsic subroutine library.
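
A much reduced form of such a database is sketched below in Python to show how the enquiry procedures and Allocate_Register might fit together. It is an assumption-laden illustration and not the VAX prototype. Each register holds at most one variable, next use is given as a simple position number, a variable with no recorded next use is treated as never used again, and the LOAD and STORE strings stand in for generated target code. The class and attribute names are hypothetical.

```python
from math import inf

NO_REGISTER = None


class ResourceDatabase:
    def __init__(self, registers, next_use, in_memory):
        self.registers = list(registers)
        self.holder = {reg: None for reg in self.registers}  # register -> variable
        self.next_use = dict(next_use)      # variable -> position of its next use
        self.in_memory = set(in_memory)     # variables with a current memory copy
        self.code = []                      # generated "instructions"

    def in_register(self, var):
        for reg, held in self.holder.items():
            if held == var:
                return reg
        return NO_REGISTER

    def used_first(self, var_1, var_2):
        return self.next_use.get(var_1, inf) < self.next_use.get(var_2, inf)

    def location(self, var):
        """A register is returned by preference over memory."""
        reg = self.in_register(var)
        return reg if reg is not NO_REGISTER else "mem:" + var

    def allocate_register(self, var):
        if self.in_register(var) is not NO_REGISTER:
            return
        free = [reg for reg in self.registers if self.holder[reg] is None]
        if free:
            reg = free[0]
        else:
            # Candidate victim: the held variable whose next use is furthest away.
            reg = max(self.registers,
                      key=lambda r: self.next_use.get(self.holder[r], inf))
            victim = self.holder[reg]
            if not self.used_first(var, victim):
                return                      # the variable does not win a register
            if victim not in self.in_memory:
                self.code.append(f"STORE {reg},{victim}")
                self.in_memory.add(victim)
        self.code.append(f"LOAD {var},{reg}")
        self.holder[reg] = var


db = ResourceDatabase(["R0", "R1"],
                      next_use={"X": 1, "Y": 2, "W": 9, "V": 5},
                      in_memory={"X", "Y", "W", "V"})
db.allocate_register("W")
db.allocate_register("V")
db.allocate_register("X")   # wins R0 from W, whose next use is furthest away
```

In this run no STORE is generated when X displaces W because W still has a current copy in memory, which is the condition stated for Allocate_Register above.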
The following illustrates how these procedures are used. This is
not a comprehensive algorithm for generation of target code for
an addition operation, but is intended to illustrate the use of
the procedures without being too complex. This does not of
course, handle all the possible combinations of locations of
operands that could occur.
    Dump_All_In_Area(X,Source)
    Dump_All_In_Area(Y,Source)
    Dump_All_In_Area(Z,Target)
    Allocate_Register(X)
    Allocate_Register(Y)
    IF Has_Two_Address_Form(ADD_,Type) THEN
        IF In_Register(X) AND
           Used_First(X,Z) AND
           Used_First(X,Y) THEN
            Set_Location(Z,Location(X))
        ELSIF In_Register(Y) AND
              Used_First(Y,Z) AND
              Has_Two_Address_Form(ADD_,Type) THEN
            Set_Location(Z,Location(Y))
        ELSE
            Allocate_Register(Z)
            IF In_Register(Z) THEN
                IF NOT (In_Register(X)) THEN
                    Generate_Move(Type,Location(X),Location(Z))
                    Set_Location(X,Location(Z))
                ELSIF NOT (In_Register(Y)) THEN
                    Generate_Move(Type,Location(Y),Location(Z))
                    Set_Location(Y,Location(Z))
                END
            END
        END
    ELSE
        Allocate_Register(Z)
    END
    Generate_Add(Type,Location(X),Location(Y),Location(Z))
The "Dump_All_In_Area" procedures will cause "STORE" instructions
to be generated to dump registers containing values that may
overlap any of the operands. The "Source" and "Target" values
specify whether the operand involved may be changed by the
operation involved. If a "Source" operand, the registers whose
contents are copied back to memory can still be considered to
hold the variables after the instruction. If a "Target" operand
the registers should no longer be considered to hold the values
involved as the variables may be changed. For more information on
possible overlapping of variables in the same storage area or
via use of pointers, see notes earlier in this chapter, and the
GAS code optimisation chapter.
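
The difference between a "Source" and a "Target" dump can be shown with a few lines of Python. The sketch assumes that each variable belongs to exactly one named storage area and that a register holds one variable. The function signature, the dictionaries and the STORE strings are hypothetical.

```python
def dump_all_in_area(holder, area_of, variable, source_or_target, code):
    """Store every register whose variable shares the area of `variable`."""
    for reg, held in holder.items():
        if held is None or area_of[held] != area_of[variable]:
            continue
        code.append(f"STORE {reg},{held}")
        if source_or_target == "Target":
            holder[reg] = None      # contents may be changed, so free the register


holder = {"R0": "A", "R1": "B", "R2": "C"}       # register -> variable held
area_of = {"A": 1, "B": 1, "C": 2, "X": 1}       # variable -> storage area
code = []
dump_all_in_area(holder, area_of, "X", "Source", code)
after_source = dict(holder)                      # registers still hold A and B
dump_all_in_area(holder, area_of, "X", "Target", code)
```

After the "Source" call the registers are unchanged in the database, while after the "Target" call R0 and R1 are free and only R2, which holds a variable from another area, is still occupied.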
The "Allocate_Register(X)" and "Allocate_Register(Y)" will result
in generation of register "LOAD" instructions if the variables
are not currently in registers, but can win a register. If a
register is won from another variable, a register "STORE"
instruction will be generated to copy the register back to
memory. Registers are won on a "next use" basis. Register
allocation is discussed earlier in the chapter. Many approaches
have been developed to the problem of register allocation. In
this thesis a primary aim has been to demonstrate that any
register allocation scheme can be effectively abstracted in a set
of procedures, constants and database.
The next step in the algorithm above is to consider the possible
locations for operands and the possible use of two address
instructions. This logic represents an abstraction of the
approach outlined earlier in this chapter. The procedures and
algorithms reflect that the VAX usually offers the choice of two
and three address instructions for most operations.
The "Generate_Add" procedure will note the locations chosen for
the operands and the datatype and will generate appropriate
target machine instructions. It will detect that a two address
instruction is to be used when the location of the target
operand is the same as one of the source operands.
Another example of the abstraction of target code generation
logic is the actual generation of target machine code bit
patterns. For many machines there are often just a few basic
instruction formats. Consider a fictional machine that has
instructions with formats:
no operands (e.g. return from subroutine etc)
one register operand (e.g. increment register etc)
two register operands (e.g. register to register move)
register then memory reference operands
two memory reference operands
Additionally assume that the instruction format depends entirely
on the operation code. This means that each target instruction
has a fixed format, obviously a fairly unrealistic assumption,
but one that simplifies the discussion.
The above instruction formats are quite common. Note though that
no immediate operands are allowed. This can be represented by the
following data structure and procedure abstractions:
    Code_Form = (
        No_Ops,
        One_Reg,
        Two_Reg,
        Reg_Memory,
        Memory_Memory
    )

    PROCEDURE Encode_Operation(
        Code,
        Op_Location_1,
        Op_Location_2
    )
    TYPE
        Byte = [0..255]
    VAR
        Address_Length : Byte
        Code_Bytes     : ARRAY[0..7] OF Byte
    BEGIN
        Code_Bytes[0] := Code_Bits[Code]
        CASE Operation_Form[Code] OF
        One_Reg:
            Code_Bytes[1] :=
                Register_Bits[
                    Register_Id(Op_Location_1),
                    First_Register_Operand
                ]
        Two_Reg:
            Code_Bytes[1] :=
                Register_Bits[
                    Register_Id(Op_Location_1),
                    First_Register_Operand
                ]
                &
                Register_Bits[
                    Register_Id(Op_Location_2),
                    Second_Register_Operand
                ]
        Reg_Memory:
            Code_Bytes[1] :=
                Register_Bits[
                    Register_Id(Op_Location_1),
                    First_Register_Operand
                ]
            Address_Length :=
                Encode_Address(
                    Op_Location_2,
                    Code_Bytes,
                    2
                )
        Memory_Memory:
            Address_Length :=
                Encode_Address(Op_Location_1, Code_Bytes, 1)
            Address_Length :=
                Encode_Address(
                    Op_Location_2,
                    Code_Bytes,
                    1 + Address_Length
                )
        END
        Send_Linker(Code_Bytes)
    END
If the instruction has no operands or only one operand, the extra
operand specifiers provided in the procedure call would be
ignored.
The array "Operation_Form" returns the operation format in terms
of the enumerated data type "Operands_Form". This reflects that on
many machines the number and allowable forms of operands are a
function of the operation code. The operands themselves may be
specified independently as occurs in the VAX, or operands may be
coded jointly or with the operation code bit pattern. In this
fictitious machine, operands are coded independently, unless there
are two register operands, in which case the registers are coded
into the same byte. The algorithm above could be extended to
check that the supplied location conforms to that required
(register or memory location).
The additional procedure "Encode_Address" is required which will
take a location specifier that represents a memory location and
will encode this into the bits necessary for the target machine to
specify the given location. This procedure itself may have a
similar form to the algorithm above, with a "CASE" structure
based on an "Operands_Form" enumerated data type that represents
the target machine's forms of memory reference operands.
Additionally this procedure will emit linker directives for
relocation if necessary. Two arrays are required that store bit
sequences needed for register identification and for each code
operation. The register array is two dimensional, with one
dimension the register identification, and the second an
indication of whether the register operand is the first or second
operand. This assumes the first register is coded into the first
four bits of the second byte of the target code instruction, and,
if the instruction is a register to register instruction, the
second register is coded into the second four bits of the second
byte. It could be argued, in keeping with the philosophy that
table representation is less effective for representation of
machine information, that these arrays should be replaced with
procedure calls. Procedures provide more flexibility,
particularly when it comes to porting, but they have a higher
overhead than the time required to access an array. This decision
will always be important where low level pieces of information
are required.
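The encoding scheme for the fictional machine can be modelled in a few lines. The following is only a sketch under stated assumptions: the opcode names and values, the reading of "first four bits" as the high nibble, and the 2-byte address format used by the stand-in for "Encode_Address" are all hypothetical, and the encoded bytes are returned rather than sent to a linker.

```python
# Illustrative model of Encode_Operation for the fictional machine.
# Opcode values, nibble layout and the 2-byte address format are assumptions.
from enum import Enum, auto
from typing import Optional

class CodeForm(Enum):
    NO_OPS = auto()
    ONE_REG = auto()
    TWO_REG = auto()
    REG_MEMORY = auto()
    MEMORY_MEMORY = auto()

# Stand-ins for the Code_Bits and Operation_Form arrays (hypothetical opcodes).
CODE_BITS = {"RET": 0x01, "INC": 0x10, "MOVR": 0x20, "LOADM": 0x30, "MOVM": 0x40}
OPERATION_FORM = {
    "RET": CodeForm.NO_OPS, "INC": CodeForm.ONE_REG, "MOVR": CodeForm.TWO_REG,
    "LOADM": CodeForm.REG_MEMORY, "MOVM": CodeForm.MEMORY_MEMORY,
}

def register_bits(reg: int, first: bool) -> int:
    """Stand-in for Register_Bits: first operand in the high nibble (assumed)."""
    return (reg & 0x0F) << 4 if first else reg & 0x0F

def encode_address(address: int, code: bytearray) -> int:
    """Stand-in for Encode_Address: append a 2-byte big-endian address."""
    code += address.to_bytes(2, "big")
    return 2  # number of bytes added

def encode_operation(op: str, loc1: Optional[int] = None,
                     loc2: Optional[int] = None) -> bytes:
    code = bytearray([CODE_BITS[op]])
    form = OPERATION_FORM[op]
    if form is CodeForm.ONE_REG:
        code.append(register_bits(loc1, True))
    elif form is CodeForm.TWO_REG:
        # Both registers share the second byte of the instruction.
        code.append(register_bits(loc1, True) | register_bits(loc2, False))
    elif form is CodeForm.REG_MEMORY:
        code.append(register_bits(loc1, True))
        encode_address(loc2, code)
    elif form is CodeForm.MEMORY_MEMORY:
        encode_address(loc1, code)
        encode_address(loc2, code)
    # NO_OPS: any operand specifiers supplied are simply ignored.
    return bytes(code)

print(encode_operation("MOVR", 3, 5).hex())        # 2035
print(encode_operation("LOADM", 2, 0x1234).hex())  # 30201234
```

Appending to a growing byte array replaces the explicit byte offsets of the procedure above; porting to another machine in the same family would mainly mean changing the two tables and the two stand-in functions.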
The VAX instructions have operands that are dependent on the
operation code, but only in respect of the number of operands and
restrictions on the formats of each operand. Operands are coded
independently, and not to the same bytes as the operation code
or other operands. This means that bit pattern generation could
be abstracted as generating bits independently for the operation
code and each operand. This approach is valid for a machine where
there are no restrictions on the nature of any operands. There is
no need for the concept of instruction "forms". Each operand can
be coded into bits independently. All that is required for each
instruction is the number of operands. However, some VAX
instructions naturally involve restrictions on the nature of some
or all of their operands. An example is the "MOVC" instruction
which involves moving a block of memory, therefore the
operands that specify the source and destination blocks must
obviously be memory addresses given explicitly or via an address
in a register or a memory location. The source and destination
cannot be registers. The independent operands approach could
produce invalid target machine code by allowing invalid operands
to be produced. Therefore a more rigorous abstraction would allow
checks to be made on operand types based on the operation code.
Therefore the concept of instruction "forms" could be used. For
the VAX there will be many forms to accommodate the full
instruction set. Each form would consist of an operation code
field and several operands, each one of which has a set of
allowable forms.
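A check of operand kinds against an instruction "form" might look like the sketch below. The table entries are simplified illustrations, not the VAX architecture definitions, and the names `INSTRUCTION_FORMS`, `check_operands` and "ADD3" are hypothetical.

```python
# Hypothetical "forms" table: each opcode lists, per operand, the operand
# kinds it will accept. Kind names and table entries are illustrative only.
from typing import Dict, List, Set, Tuple

REGISTER, MEMORY, IMMEDIATE = "register", "memory", "immediate"

INSTRUCTION_FORMS: Dict[str, List[Set[str]]] = {
    # Block move: a length, then source and destination blocks that must be
    # memory references, never registers.
    "MOVC": [{REGISTER, MEMORY, IMMEDIATE}, {MEMORY}, {MEMORY}],
    # A three address add: sources may be anything, the target must be writable.
    "ADD3": [{REGISTER, MEMORY, IMMEDIATE}, {REGISTER, MEMORY, IMMEDIATE},
             {REGISTER, MEMORY}],
}

def check_operands(opcode: str, operands: List[Tuple[str, int]]) -> None:
    """Raise ValueError if the operand kinds do not match the opcode's form."""
    form = INSTRUCTION_FORMS[opcode]
    if len(operands) != len(form):
        raise ValueError(f"{opcode} expects {len(form)} operands")
    for position, ((kind, _), allowed) in enumerate(zip(operands, form), start=1):
        if kind not in allowed:
            raise ValueError(f"{opcode} operand {position} may not be {kind}")

# Accepted: both blocks are memory references.
check_operands("MOVC", [(IMMEDIATE, 16), (MEMORY, 0x1000), (MEMORY, 0x2000)])
try:
    # Rejected: the source block is given as a register.
    check_operands("MOVC", [(IMMEDIATE, 16), (REGISTER, 3), (MEMORY, 0x2000)])
except ValueError as error:
    print(error)
```

Keeping the allowable kinds in one table per machine is one way to express the forms; the same information could equally be held in procedures, in line with the preference stated earlier in the chapter.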
Very few machines have a fully orthogonal instruction format
where for any instruction all operands can have any of the forms
supported by the machine. Most machines will be similar to the
VAX in which the instruction "forms" concept is an effective
basis for abstracting the process of target code bit
generation. Therefore the general algorithm above will be
applicable to most machines. Modifications may have to be made in
each CASE option, but hopefully many machines will have such
similar instruction formats that often very few changes will be
required.
Hopefully this discussion of abstraction of target code bit
pattern generation illustrates how portability of the target code
generator can be enhanced by abstraction of this function and
various other functions of the target code generator. As the
abstraction is not performed with all possible target machines in
mind but rather a generic family, the abstraction can be more
effective.
Full implementation of the prototype VAX code generator based on
the approach of a target code generator database and code
generation algorithms that utilise the procedures is still
nowhere near complete. This task involves significant work and is
beyond the scope of this thesis. What is significant from the
viewpoint of this thesis is that the two level approach offers
the chance to develop target code generators that are generalised
as much as possible for the machines in the same generic family.
This is much more practical than developing a generalised target
code generator for all machines. Generalisation is not achieved
via tables of information but via a fluid abstracted set of
procedures and constants that utilise an abstracted database.
The development of such target code generators, and in
particular identifying and developing the most effective
abstracted procedures and target code generator database, will
require significant further study. During the research work for
this thesis the concepts have been developed to a stage where
they have been demonstrated to be practical and to contribute to
the goal of portability, and that the two level code generator
approach enhances their practicality and effectiveness.
Target Machine Code Optimisation
Any target machine optimisation will obviously require such an
optimiser be produced for each target machine. Therefore it will
not be desired to perform much optimisation at this level. Use of
the GAS code optimiser requires that only one optimiser be
developed for each family, and that hopefully provides much of
the optimisation relevant for the machines in the family, so that
little optimisation is required of target machine code.
Additionally, optimisation at the machine code level is usually
difficult as program structural information is usually not
available. However in the PLIP compilers, the code generation is
performed with Rcode structural information still available, and
basic blocks and code motion information usually available. This
would allow more reasonably complex optimisations to be
performed, but GAS code optimisation would already have achieved
most of these. The only practical and useful optimisations at the
target machine level are therefore peephole optimisations, in
particular branches to unconditional branches, copy propagation,
and elimination of unnecessary loads and stores that may be
introduced by the target code generation process.
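As an illustration of one of the peephole optimisations named above, the following sketch retargets branches whose destination is itself an unconditional branch. The instruction representation, the mnemonics "BR", "BEQ" and "BNE", and the function name `thread_branches` are assumptions for this example only.

```python
# Illustrative peephole pass: retarget branches whose destination is itself
# an unconditional branch. The instruction representation is hypothetical.
from typing import Dict, List, Tuple

Instruction = Tuple[str, str]   # (operation, operand), e.g. ("BR", "L2")

def thread_branches(code: List[Instruction],
                    labels: Dict[str, int]) -> List[Instruction]:
    """`labels` maps a label to the index of the instruction it marks."""
    def final_target(label: str) -> str:
        seen = set()
        # Follow chains of unconditional branches, guarding against cycles.
        while label not in seen:
            seen.add(label)
            op, operand = code[labels[label]]
            if op != "BR":
                break
            label = operand
        return label

    return [(op, final_target(arg)) if op in ("BR", "BEQ", "BNE") else (op, arg)
            for op, arg in code]

program = [("BEQ", "L1"), ("ADD", "r1"), ("BR", "L2"), ("RET", "")]
labels = {"L1": 2, "L2": 3}     # L1 marks the BR, L2 marks the RET
print(thread_branches(program, labels))
```

Here the conditional branch to L1 is redirected straight to L2, since L1 only holds an unconditional branch to L2.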
Chapter Seven 
Conclusions
Introduction
The aim of this study has been to develop a design for a compiler
back-end code generator for the VAX/VMS environment for the PLIP
project. This project required that portability be a major
consideration in the design. GAS code has been developed as an
effective way to improve portability primarily by representing
the common architectural features of a family of machines, which
for the VAX family machines include:
Linear code using three address instructions
Control logic implemented by branch instructions
Subroutine calls implemented via stack
All arithmetic on fixed sized objects
The GAS code is designed to contain as few codes as possible by
effectively moving all object "typing" and addressing modes into
the GAS operands. Future target code generation for RISC hardware
will find this useful. The addressing modes are designed to allow
address computation structure to be clearly represented, but only
to the level necessary to efficiently implement source language
structures rather than all target machine addressing modes.
GAS code Generation
Generation of GAS code will involve many of the common steps of
generating target machine code for machines in the family. GAS
code more closely resembles target machine code than Rcode. It is
attached to the Rcode tree in the Reserved two byte extension
Rcode, but program structural information represented by Rcodes
such as "Loop", "Case" and "Call" is retained, as well as
declaration Rcodes. These are useful to a GAS code optimiser and
interpreter, and during target code generation. This has the
added advantage that the procedures for handling Rcode will
handle GAS code since it is only a form of Rcode.
Optimisation at the GAS code level has advantages over
optimisation at either the Rcode or target machine level. At the
Rcode level, available optimisations are not as comprehensive as
possible at the GAS code or target machine level. Optimisation at
the target code level would require an optimiser be written for
each target machine. Optimisation at the GAS code level offers a
useful compromise. Because GAS code consists of few codes the
optimiser is simpler than either an Rcode or target machine code
optimiser, and a GAS interpreter is a more practical proposition
than an Rcode interpreter. The fact that GAS code is attached to
the Rcode tree allows program structural information to still be
available to the GAS optimiser; this has been found to assist
both the mechanics and effectiveness of the optimisation.
Extending portability into the generation of target machine code
has been achieved by abstracting the process of generating code
as much as possible with the aim that the code for a target code
generator for one machine will be easy to port to another
machine. How well the abstractions for one machine will apply to
another machine will obviously depend on similarities between the
machines. Decisions as to the optimum context in which to extract
functionality are subjective, dependent upon the viewpoint of the
individual designer. The more general the abstraction the wider
the range of machines to which it will apply, but probably the
less useful the abstraction. It has been noticed during
development work for the first code generator that it is the
major features of a machine's architecture that dominate the
functions that must be performed to generate target code. The
target code generators for the machines in a GAS family will
therefore have strong similarities. The abstractions developed
for target code generation should therefore aim to be general in
terms of the architectural features that define the GAS family to
which the machine belongs. It is not expected that one single set
of abstractions will apply to all machines in a family, but that
a fluid set of abstractions will be developed with modifications
in the abstractions for each machine to allow for idiosyncrasies
of each target machine. Abstraction will be manifested in a set of
procedures and a database that represents the characteristics and
resources of the target machine and the current allocation of
these resources by the compiler. A table driven approach to
target code generation was rejected as being incapable of
adequately representing the machine characteristics necessary for
quality target code generation across even a small range of
machines. Table driven code generators are based on a single code
generator that is supplied relevant target machine information in
table form. The code generator tends to be the superset of
characteristics for all the intended target machines. As there
are many idiosyncrasies and local differences the code generator
and table have to be large. The concept of a set of "fluid"
abstracted procedures and associated database allows advantage to
be taken of what is common between code generators, but
modifications are made to incorporate idiosyncrasies of each
target machine. Code generator generators were mainly rejected on
the grounds that to produce a code generator for a new machine,
the two level code generator approach would, given the same
effort required for code generator generators, allow development
of a smaller and faster code generator that produced better
quality target code. This would be more so if the new machine is
similar to a machine for which a code generator already exists.
To assist with abstraction and maximise commonality between
target code generators, common approaches have been taken to
target code generation issues such as handling storage and code
declarations, subroutine calling conventions, register saving
during subroutine calling and allocating space for temporary
variables.
Future Research and Development
The further development and study of Rcode as the main basis of
portability via front-end and back-end separation should be
pursued. This will occur as compilers are developed for a range
of source languages, target machines and operating systems.
The development of GAS code for other families such as zero
address and transputer based machines is required. This work will
also allow the "family" concept to be refined. The work done for
the prototype GAS code in this study should provide the basis for
other families as much of the GAS codes will be the same for any
two families.
The preliminary GAS code optimiser designed in this study
requires refinement.
The VAX target code generator that is under development requires
further work, with abstractions of functions and extension and
refinement of the associated database. The abstractions most
useful for portability will always be subjective and will be
improved with further study and experience.
A set of source programs is required for testing purposes. These
would allow the correctness and quality of target code produced
to be assessed. Additionally the value of improvements in the
code generator could then be assessed in an objective manner. It
is important that these programs provide a wide and balanced
representation of typical programs for which the compiler would
be used. It is hoped eventually to use the test suite being
developed at the University of Tasmania for this purpose.
Appendix A 
Machine family No 1 GAS code Reference Manual
Introduction 
The Generic Action Set (GAS) Code described in this Reference Manual is for
machines with the following general architectural characteristics
a. Has two and/or three address instructions. Does not provide zero address
or accumulator instructions for binary operations.
b. Provides a stack with pointer register support suitable for allocating and
accessing arguments and local variables of routines.
c. Offers an instruction set which can directly implement the GAS Code
instructions described below.
If a target machine cannot implement most of these instructions with one or a few
of its own instructions, the target machine is not compatible with this family. The
characteristics of this set are :-
a. ALU arithmetic instructions operate on a small range of statically sized
objects which must be byte aligned. All operands need not be of the same size.
NOTE This requires that Rcode block exact arithmetic operations be broken
down into series of operations on basic objects that are supported by the
target machine.
b. Memory movement instructions can involve movement of dynamically or
statically sized numbers of bytes.
c. Stack instructions equivalent to PUSH and POP are available.
d. Control instructions essentially involve conditional and unconditional
branching.
e. Both simple and complex subroutine Call and Return facilities are offered.
f. Facilities are available to enable and disable interrupts or, at the least, an
uninterruptible test-and-set style instruction is available.
g. Provides at least instructions for logical operations OR, AND, XOR and
NOT.
The GAS Machine Stack 
The GAS Machine has a stack which is used for allocating storage for results,
local variables, arguments and return address blocks for subroutines. It is
considered to consist of a stack of objects, each of which can consist of a different
number of bytes. Objects are placed on the stack by the PUSH GAS Code and
removed by the POP GAS Code.
Display Vector
Associated with each routine, whether active or not, is a "Display Vector"
which contains several pointers to objects on the stack plus a few other values.
This vector consists of the following "fields" :-
a. TOP_OF_STACK. This is a counter of objects on the stack. When an
object is pushed onto the stack, it is incremented by one. When a new stack is
created it is set to zero. Most references to an object on the stack will involve
a display pointer plus an object offset (object number), and a byte and bit
offset within the object, rather than a conventional stack pointer which points to
bytes plus a byte offset and bit offset. This overcomes the problem of handling
dynamically sized objects which are usually handled on many machines by
descriptors of some form. The GAS machine has facilities to automatically
locate an object given an object offset. When the address of a stack object is
required as data for an instruction, it is assumed that a unique byte address is
generated. Reference to this object can then be made via this unique address.
This address is a runtime value and the mapping of objects, given their object
offsets and their unique byte addresses, is handled automatically by the GAS
Machine.
NOTE When the GAS Code is being interpreted in a real machine, some
mechanism will be required to achieve this mapping.
b. RESULT. This contains the object number (offset) of the current result
object.
c. ARGLIST. This contains the object number (offset) of the start of the
current argument list.
d. LOCALSTORE. This contains the object number (offset) of the start of
the local objects of the current routine.
e. STACKBASE. This is the unique byte address of the base of the stack.
f. STACKLIMIT. This contains the unique byte address of the limit to
which the stack may grow.
NOTE Stack underflow corresponds to an attempt to pop an object when the
TOP_OF_STACK is zero. Stack overflow occurs when a push would
result in the stack growing past the address given by the contents of
STACKLIMIT.
g. ROUTINEDESCR. This contains the routine descriptor for the currently
active routine. It consists of the current PC and the STATICLINK.
(1) PC. This contains a pointer to the current GAS Code when the machine
is running. Such a pointer indicates where in the Rcode-style tree structure
this is to be found.
(2) STATICLINK. This contains a pointer to the static environment for
the current routine. It may be an environment vector or a link to the
display for the lexically enclosing routine.
h. PREVDISPLAY. This field points to the previously active display. This
is not actually required by the GAS machine but is present to reflect the fact
that the target machine will require it. It represents the dynamic link
mechanism.
i. RETURNMARK. This contains the object number (offset) of the return
address to be used when returning from the current routine.
j. EXCEPTHANDLER. This points to the routine to be executed if an
exception is detected by some "monitor", for example by the GAS Machine
"hardware". This routine takes one parameter which is contained in the
EXCEPTPARM pointer.
k. EXCEPTPARM. This contains a pointer to the parameter to be used by
the exception handler routine.
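The object-numbered stack and the underflow and overflow conditions in the NOTE above can be modelled with a short sketch. Field names follow the manual, but the Python representation, the use of a byte budget in place of the STACKBASE and STACKLIMIT addresses, and the 1-based object numbering are assumptions made for illustration.

```python
# Minimal model of the GAS Machine stack of objects and part of its display
# vector. The representation here is an assumption, not the manual's design.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Display:
    top_of_stack: int = 0      # count of objects on the stack
    result: int = 0            # the remaining fields hold object numbers
    arglist: int = 0
    localstore: int = 0
    returnmark: int = 0
    prevdisplay: Optional["Display"] = None   # dynamic link

class GasStack:
    def __init__(self, limit_bytes: int) -> None:
        self.objects: List[bytes] = []    # each object may have a different size
        self.limit_bytes = limit_bytes    # models STACKLIMIT minus STACKBASE
        self.display = Display()

    def push(self, obj: bytes) -> int:
        used = sum(len(o) for o in self.objects)
        if used + len(obj) > self.limit_bytes:
            raise OverflowError("stack overflow: push would pass STACKLIMIT")
        self.objects.append(obj)
        self.display.top_of_stack += 1
        return self.display.top_of_stack   # object number of the new object

    def pop(self) -> bytes:
        if self.display.top_of_stack == 0:
            raise IndexError("stack underflow: TOP_OF_STACK is zero")
        self.display.top_of_stack -= 1
        return self.objects.pop()

    def byte_address(self, object_number: int) -> int:
        """Map an object number to a unique byte address (offset from base)."""
        return sum(len(o) for o in self.objects[:object_number - 1])

stack = GasStack(limit_bytes=8)
stack.push(b"\x00" * 4)           # object 1, four bytes
second = stack.push(b"ab")        # object 2, two bytes
print(second, stack.byte_address(second))
```

The `byte_address` method stands in for the automatic mapping from object offsets to unique byte addresses that the manual says a real interpreter or target machine must provide by some mechanism.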
Run-time Subroutine Structure 
A number of run-time data structures provide the basic GAS Machine
mechanism for handling the environments of routines. Each time a new procedure is
entered, a display vector is created.
Note that the CALL GAS Code is equivalent to the FAST_CALL Rcode. That
is, it only involves pushing a return address and branching to code at the address
given. To handle full routines the PUSHMARK, NEWDISPLAY and POPMARK
GAS Codes are provided. The GAS Machine implements the CALL Rcode in the
following way
a. Compute the size of the result area, then push a block object of this size
onto the stack, giving it the alignment required for the result, and a NIL value.
b. Execute the PUSHMARK GAS Code. This results in the creation of a new
display vector, and allocates space on the stack for a subroutine return block
(return address and any status information required in a return block) by
pushing an object of the size required for the return block, and with a NIL
value. See the details of this instruction later in this manual.
c. The arguments are then pushed onto the stack.
d. The routine descriptor is computed and also pushed on the stack.
NOTE The GAS machine never needs code to establish the static link field. It
is assumed that this is built into it. The generation of target machine
code or GAS Code interpretation will, however, require some suitable
mechanism.
e. Use the CALL GAS Code using the address field of the descriptor (which is
currently the object on the top of the stack). The byte offset of the address
field into this object will be target machine dependent, and the GAS Code
generator must therefore have access to this information. The address of the
instruction after the CALL and any status information that is also pushed by the
target machine calling instruction will be pushed on the stack; control then
passes to the instruction pointed to by the routine address. The return address
and status information will be considered by the GAS Machine to occupy one
object location and to be of the size required for such a return block on the
target machine. The GAS Code for calling may thus be written as
CALL Always, Offset_of_RetAddr + (TOP_OF_STACK)
f. In the routine code, use the NEWDISPLAY GAS Code as the first
instruction. This performs several actions
(1) The most recently created Display becomes the current (active) display
with its PREVDISPLAY field pointing to the display that was active. The
TOP_OF_STACK pointer of this new display will be made equal to the
TOP_OF_STACK of the old display.
NOTE 1. There could be several new displays yet to be activated. This
would occur if the generation of the arguments, or the routine
descriptor, involved calls to routines and these routines
themselves involved calls to routines (etc.). The most recently created
display is the one to make active because these calls are nested.
However, note that if when computing and pushing arguments a
subroutine call is encountered, the computation of the environment of
the called procedure will always be correct as the new display
created by the PUSHMARK is still not the active display at the time
of the new call.
2. One of these calls may involve a FAST_CALL Rcode, in which
case a new display is not generated and used in the routine.
However, this is why the activation of the new display cannot be done
with the CALL GAS Code, because even though there may be new
inactive displays, on a fast call, the CALL GAS Code does not
involve a new display becoming active.
(2) The return block is on the top of the stack and must be moved to the
stack location pointed to by RETURNMARK.
(3) The static link contained in the routine descriptor which is now on the
stack top is moved into the STATICLINK field of the display.
(4) The routine descriptor on the top of the stack is now not needed, so the
TOP_OF_STACK is decremented by 1.
(5) Move the TOP_OF_STACK to the LOCALSTORE field of the display.
g. Allocate space on the stack for local objects.
To return from the subroutine use the POPMARK instruction followed by the
RET instruction. The POPMARK instruction will place the RETURNMARK value
in the TOP_OF_STACK of the previous display, effectively deleting the current
display and making the previous display active. The RET instruction will find the
correct return address and status information (required for the target machine
return instruction) at the top of stack.
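The calling and return sequences above can be traced with a small simulation. This is a simplified sketch, not the GAS Machine itself: stack objects are plain Python values, only a few display fields are modelled, the routine descriptor is a dictionary with hypothetical keys, and trimming the Python list in `popmark` is a modelling convenience.

```python
# Simplified simulation of the CALL Rcode sequence (steps a to g) and the
# return sequence. All Python names are hypothetical.
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class Display:
    top_of_stack: int = 0
    returnmark: int = 0
    localstore: int = 0
    staticlink: Any = None
    prevdisplay: Optional["Display"] = None

class GasMachine:
    def __init__(self) -> None:
        self.stack: List[Any] = []
        self.active = Display()
        self.pending: List[Display] = []   # created by PUSHMARK, not yet active

    def push(self, value: Any) -> None:
        self.stack.append(value)
        self.active.top_of_stack += 1

    def pushmark(self) -> None:
        self.push(None)                    # return block, NIL for now
        self.pending.append(Display(returnmark=self.active.top_of_stack))

    def call(self, return_address: int) -> None:
        self.push(return_address)          # fast call: push, then branch

    def newdisplay(self) -> None:
        new = self.pending.pop()           # most recently created display
        new.prevdisplay = self.active
        new.top_of_stack = self.active.top_of_stack
        self.active = new
        self.stack[new.returnmark - 1] = self.stack.pop()   # (2) move return block
        new.top_of_stack -= 1
        new.staticlink = self.stack.pop()["static_link"]    # (3) and (4)
        new.top_of_stack -= 1
        new.localstore = new.top_of_stack                   # (5)

    def popmark(self) -> None:
        previous = self.active.prevdisplay
        previous.top_of_stack = self.active.returnmark
        del self.stack[previous.top_of_stack:]   # arguments and locals vanish
        self.active = previous

    def ret(self) -> int:
        self.active.top_of_stack -= 1
        return self.stack.pop()            # return address is on top

m = GasMachine()
m.push("result")                                   # a. result area
m.pushmark()                                       # b. display and return block
m.push(10); m.push(20)                             # c. arguments
m.push({"address": 500, "static_link": "outer"})   # d. routine descriptor
m.call(return_address=101)                         # e. CALL
m.newdisplay()                                     # f. first routine instruction
m.push("local")                                    # g. local objects
m.popmark()
print(m.ret(), m.stack)
```

After the return only the result object remains on the stack, which matches the intent of pushing the result area before the return block.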
GAS Machine Flags 
Several flags are provided by the GAS machine. Two of them, Overflow and
Carry, are explicitly set by executing various instructions in the GAS Machine.
These flags will have a value of "One" (TRUE) or "Zero" (FALSE). They can be
treated as providing a source value (1 or 0) or a condition (TRUE or FALSE)
depending on the context. The flags are :-
OVERFLOW
CARRY
POSITIVE
NEGATIVE
EQUAL
Temporary Objects
As GAS Code is generated, it will become necessary to provide temporary
storage objects to hold temporary results. It will also be necessary to provide
temporary labels to implement such things as Case structures and loops. To allow for
GAS Code to identify, use and dispose of these objects the following pseudo-GAS
Codes have been identified
a. USEVAR id
b. DELVAR id
c. USELABEL id
d. DELLABEL id
Encountering a USEVAR GAS Code means that storage is to be allocated for a
data object of the Rcode Basictype. This temporary object can then be referred to
using the id. When a DELVAR is encountered use of the id will no longer be
valid. The same identifier may be used later for other temporary objects if
needed. Encountering a USELABEL GAS Code means that the current Rcode
node pointer should be noted and associated with the temporary label id given.
Reference to the id will then be associated with the Rcode tree pointer. When the
DELLABEL is encountered, the temporary label id becomes invalid.
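The lifetime rules for temporary identifiers can be sketched as a small bookkeeping class. The class name, the idea of passing the Basictype as an argument, and the use of an integer as the Rcode node pointer are assumptions made for this illustration.

```python
# Hypothetical bookkeeping for the USEVAR/DELVAR and USELABEL/DELLABEL
# pseudo-GAS Codes; the class and its storage model are assumptions.
from typing import Dict

class Temporaries:
    def __init__(self) -> None:
        self.variables: Dict[int, str] = {}   # id -> Basictype of the storage
        self.labels: Dict[int, int] = {}      # id -> Rcode node pointer (an index)

    def usevar(self, ident: int, basictype: str) -> None:
        if ident in self.variables:
            raise KeyError(f"temporary {ident} is already in use")
        self.variables[ident] = basictype

    def delvar(self, ident: int) -> None:
        del self.variables[ident]             # the id may be reused afterwards

    def uselabel(self, ident: int, node_pointer: int) -> None:
        self.labels[ident] = node_pointer     # note the current Rcode node

    def dellabel(self, ident: int) -> None:
        del self.labels[ident]                # the label id is now invalid

temps = Temporaries()
temps.usevar(1, "int32")
temps.delvar(1)
temps.usevar(1, "byte")        # the same identifier reused for a new object
temps.uselabel(7, node_pointer=42)
print(temps.variables, temps.labels)
```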
GAS Data Types
Basictypes. It is assumed that the target machine can handle directly some of the
"Basictypes" defined in Rcode. It is also assumed that the front-end of a compiler
will generate only Basictypes which the target machine can handle. When
generating GAS Code, extended arithmetic operations that appear in the Rcode should be
broken into a set of operations involving the Rcode basic datatypes supported by
the target machine.
Basictype operands are required for all arithmetic operations, and are specified
by an operand that consists of two parts, the basictype and the object reference.
The object reference identifies the value involved and may be an immediate value
or identification of the memory location containing the value, or the address of a
memory location (code or data). Object referencing is discussed below.
Blocks. Memory move GAS Codes (LOAD, PUSH, POP, IN and OUT) are
operations on blocks of basictype objects as are TEST and SCAN. Blocks are
specified by one operand that specifies the number of basictype objects in the
block, and another operand that specifies the first basictype object in the block. If
the block is considered to be just a series of bytes, the length operand should
specify the number of bytes and the second operand should specify a basictype
object that is unsigned and one byte in size.
If a block is specified in such a way that the size results in running out of real
target machine data space, the program is erroneous.
NOTE In practice the exact effects are target machine dependent.
c. Shift Mode. Used to indicate the type of shift in the SHIFT instruction. It is
one byte coded with a value Arithmetic, Logical or Rotate.
d. Alignment. Used in the PUSH instruction to specify the alignment required
for an object to be pushed. It is a byte coded using the format specified in the
Rcode Reference Manual.
e. Byte Cardinal. This is a single byte providing an unsigned binary exact
value. It is used in the SVC instruction to specify both the SVC code and how
many arguments follow.
f. General. Provides identification of a Basictype value or bitstring to be
used in a GAS instruction.
General Operand Forms 
These operands consist of a byte coded according to the Rcode "Basictype"
definition, a byte coded to indicate the variant form involved, and a number of
bytes that depend on the variant. Before describing the variant forms, various
sub-fields of these operand descriptors must be described.
a. Immediate Sub-field. This provides an immediate value. It consists of
one byte that is a binary unsigned value in the range [0..7] that specifies how
many bytes follow, then this number of bytes follows. The first byte is the least
significant byte.
b. Display Field. This is a one byte value coded to identify one of the
components of the GAS Display vector.
c. Temporary_Id. This is a two byte value coded as an unsigned binary
exact value indicating the GAS generator assigned identification for a GAS
temporary label or storage location.
d. Direct Basictype. This is a sub-field used to specify a value that indicates
the size of a bitstring or an offset of an object in a storage location (object, byte
and bit offsets). These values may be static or computed. If computed, they
must be stored at a location whose offsets are definitely given as static values.
This sub-field consists of a byte coded according to the Rcode "basictype"
definition, a byte coded to indicate the variant form of the sub-field, followed
by a number of bytes dependent on the variant. The variants are coded as
follows:
(1) Immediate. Coded as Immediate sub-field.
(2) Dynamic.
- Rcode Lexical Level (one byte)
- Display component (coded as Display Field sub-field)
- Object offset, Byte offset, Bit offset
all coded as Immediate sub-fields
(3) Static.
- Rcode Areanumber (one byte)
- Byte, Bit offsets (coded as Immediate sub-fields)
(4) Temporary.
- Temporary id (coded as Temporary_Id sub-field)
- Byte, bit offsets (coded as Immediate sub-fields)
(5) Imported.
- Rcode Moduleid (one byte)
- Rcode Areaid (one byte)
- Byte, bit offsets (coded as Immediate sub-fields)
e. Code Object. This identifies a code object. The value returned is the
address of the code object. It consists of one byte coded to indicate the variant
form of this sub-field, followed by a number of bytes depending on the variant.
(1) Routine.
- Rcode Routineid (one byte)
- Rcode Lexical level (one byte)
(2) Code.
- Rcode Labelid (one byte)
- Rcode Lexical level (one byte)
(3) Temp.
- GAS assigned Temporary label id (coded as Temporary_Id sub-field)
(4) Imported.
- Rcode Moduleid (one byte)
- Rcode Routineid (one byte)
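The Immediate sub-field described above (a length byte in the range [0..7] followed by that many data bytes, least significant byte first) can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of the GAS Code definition. The sketch assumes unsigned values only, and assumes that the value zero may be coded with a length byte of zero and no data bytes; the manual permits a zero length but does not say how zero is normally coded.

```python
def encode_immediate(value: int) -> bytes:
    """Encode a non-negative integer as a length byte plus little-endian data."""
    if value < 0:
        raise ValueError("this sketch only handles unsigned values")
    # Assumption: zero is coded as a zero length byte with no data bytes.
    length = (value.bit_length() + 7) // 8
    if length > 7:
        raise ValueError("an Immediate sub-field carries at most 7 data bytes")
    return bytes([length]) + value.to_bytes(length, "little")


def decode_immediate(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode an Immediate sub-field at data[pos]; return (value, next_pos)."""
    length = data[pos]
    if length > 7:
        raise ValueError("length byte out of range [0..7]")
    end = pos + 1 + length
    if end > len(data):
        raise ValueError("truncated Immediate sub-field")
    return int.from_bytes(data[pos + 1:end], "little"), end
```

Returning the next position from the decoder lets a caller step through the several Immediate sub-fields that make up, for example, the object, byte and bit offsets of a Dynamic variant.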
The variant forms of a General operand are:
a. In. This variant provides a value from which a basictype object is to be
taken. If the value is an immediate series of bytes or an address, the basictype
value will be taken starting at the least significant bit. If the value is specified
by a reference to a bit in memory, the basictype value will be taken starting at
this bit. Whether this bit is assumed the most or least significant bit is machine
dependent and depends on how objects are packed. If the value provided is an
immediate value or an address and is longer than required for the basictype,
only the least significant part is used. If the value is not long enough, missing
bits are assumed to be sign extended, based on the basictype involved.
The variant part of the In form is coded:
(1) Immediate. Coded as an Immediate sub-field.
(2) Special. Coded as a Special sub-field.
(3) Target.
- Is address. This is a one byte BOOLEAN value indicating if the
value to be used is the address of the memory location identified
(TRUE) or the contents (FALSE). If TRUE the bit identified must
be located on a byte boundary in memory (not a temporary storage
location or display pointer).
- Target location specifier coded as specified for the "Out" variant (see
below).
(4) Code. Value is the address of the code object identified. Coded as Code
Object sub-field.
(5) Constant.
- Rcode Constantid (one byte)
- Rcode Lexical level (one byte)
- Location (coded as Direct Static sub-field)
b. Out. Provides identification of a storage location for a basictype object.
The identification is given by identifying the bit that marks the beginning of
storage used for the object. The contents are required or will be changed, or
its address is required (excludes display pointers and temporary locations). It
consists of an extra offset field coded as a Direct Basictype field, a byte indicating
"indirection" which is coded as a binary value, a byte coded to indicate
variant form, and a number of following bytes that depend on the operand
form and which identify a bit in storage. The "indirection" byte is coded
TRUE if this bit marks the beginning of a stored address that identifies the
actual storage bit required, else it is FALSE. The extra offset is a byte offset
that identifies the offset from the bit identified (after the possibility of indirection)
of the actual bit required. The variant part identifies a bit in storage
coded as follows:
(1) Target. Bit located in display pointer
- Rcode lexical level (one byte)
- Display pointer identity (coded as Display Field sub-field)
(2) Dynamic. Bit located in stack object
- Rcode lexical level
- Display pointer identity (coded as Display Field sub-field)
- Object, byte, bit (coded as Direct Basictype sub-field)
(3) Static. Bit located in a static storage area in current module
- Rcode Areaid
- Byte, bit offset (coded as Direct Basictype sub-field)
(4) Temp. Bit located in a GAS temporary storage location
- Temporary identity (coded as Temporary_Id sub-field)
- Byte, bit offset (coded as Direct Basictype sub-field)
(5) Imported. Bit is located in static storage area of another module
- Rcode Moduleid
- Rcode Areaid
- Byte, bit offset (coded as Direct Basictype sub-field)
c. Fixed. This is an immediate fixed value, coded as an Immediate sub-field.
d. Ident. Basictype value is taken from or stored starting at the least
significant bit of the GAS temporary storage location specified. Coded as a
Temporary_Id sub-field.
e. Bit_In. This refers to a bitstring used for GAS Code source operand
information. It consists of two parts. The first part provides a value which
specifies the length of the bitstring in bits, coded as a Direct Basictype sub-field.
The second part provides the reference to the value from which the bitstring
will be taken, coded in the same way that the "In" variant is coded.
NOTE If the value is an immediate or memory address value, and is longer
than required, the bitstring will be taken starting at the least significant
bit. If the value is too short the "missing" bits will be taken as zero.
f. Bit_Out. Identifies a bit in storage to which a bitstring will be written. It
is of the same form as the Bit_In variant.
g. NIL. Provides NIL reference when no object is required (pushing a NIL
value). Has no further components.
Gas Instruction Descriptions 
Arithmetic Instructions 
All "source" operands should be Basictype objects specified by General
operands of the "In", "Fixed" or "Temp" variant form or Condition operands. The
destination must be of the "Out" or NIL variant of General operand. Therefore all
operands will be basictype objects.
All operands must begin on byte boundaries, ie the bit offset must be zero.
For all arithmetic operations, if the operation results in a positive value then the
GREATER condition will be set to TRUE. If the result is negative the LESS
condition will be true. If the result is equal to zero the EQUAL condition will be set.
If the size of the result produced exceeds that of the data type specified for the
destination operand, the overflow condition will be set TRUE, and the result will
have the most significant part truncated. The datatype for the destination operand
will be set by the GAS Code generator to the Basictype specified in the Rcode
instruction from which the GAS Code arithmetic instruction is derived.
NOTE The Rcode Basictype bit 6 specifies information on how to handle
overflow on integers and bit 7 specifies whether to truncate or round for real
arithmetic. The GAS code generator should generate appropriate code.
ADD_
source_1 source_2 destination
Adds source_1 and source_2 and places result in destination.
SUB
source_1 source_2 destination
Subtracts source_1 from source_2 giving destination.
MULT
source_1 source_2 destination
This multiplies source_1 by source_2 giving the result in destination.
DIV_
source_1 source_2 destination
This operation divides source_1 by source_2 giving the destination.
NEG
source destination
Takes source operand and negates it (based on the data type) and places the result
in destination.
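The condition setting rules for arithmetic instructions can be illustrated with a short Python sketch of ADD_ on signed integers. This is an illustration under stated assumptions, not the manual's definition: the names Conditions and gas_add are hypothetical, the destination is assumed to be a two's complement integer of a given bit width, and the GREATER, LESS and EQUAL conditions are assumed to reflect the truncated value actually stored (the manual does not say whether they reflect the exact or the truncated result).

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    greater: bool = False
    less: bool = False
    equal: bool = False
    overflow: bool = False


def gas_add(a: int, b: int, bits: int) -> tuple[int, Conditions]:
    """Add two signed integers into a destination that is `bits` bits wide."""
    exact = a + b
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    overflow = not (lo <= exact <= hi)
    # Truncate the most significant part, then reinterpret as a signed value.
    truncated = exact & ((1 << bits) - 1)
    if truncated > hi:
        truncated -= 1 << bits
    # Assumption: the conditions describe the value placed in the destination.
    cond = Conditions(
        greater=truncated > 0,
        less=truncated < 0,
        equal=truncated == 0,
        overflow=overflow,
    )
    return truncated, cond
```

For an 8 bit destination, adding 100 and 28 overflows in this sketch: the stored value wraps to -128, so both the overflow and LESS conditions are set.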
Logical Instructions 
AND_ 
source_1 source_2 destination
Performs a logical AND on the data in source_1 and source_2 and places result in
the destination. The source_1 and source_2 operands must be of the General
form. The destination must be NIL, "Out" or "Bit_Out" variants only of the General
operand form. If the destination is a Basictype the Positive, Negative or
Equal conditions will be set according to the result placed in the destination. If the
result is too large for the destination, the least significant bits will be placed in the
destination, and the Overflow flag will be set to true. If either of the source
operands is a basictype, it will be treated simply as a bitstring for the purposes
of the operation.
XOR
source_1 source_2 destination
As described for AND but performs logical XOR operation.
NOT_
source destination
Inverts all the bits in source and places result in destination. The source operand
must be of the General form. The destination must be of either the "Out" or
"Bit_Out" variant of the General form of operand. If the destination is specified
as a Basictype operand, the Positive, Negative, and Equal conditions will be set
based on the value placed in the destination. If the result is too large for the
destination, only the least significant bits will be placed in the destination and
Overflow will be set TRUE.
TEST
blocksize operand_1 operand_2
Compares the block of Basictype objects starting at operand_1 and the block of
Basictype objects starting at operand_2. The size, operand_1 and operand_2
must be of the General operand form. Both operand_1 and operand_2 must
refer either to bitstrings of the same size, or Basictype objects of the same type
and size. If the blocksize value is "one", and operand_1 and operand_2 are
Basictype objects, the values will be compared and the Positive condition set
TRUE if operand_1 > operand_2; the Equal state set TRUE if they are equal; the
Negative state set TRUE if operand_1 < operand_2. If the size is greater than
one, the block comprising a series of Basictypes of length as specified by the
blocksize operand will be tested only for equality and the Equal condition set
accordingly. If the operands are bitstrings, they will also only be tested for
equality.
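The two behaviours of TEST (a full comparison for a single Basictype object, and an equality test only for longer blocks) can be sketched in Python as follows. The function name gas_test is hypothetical, blocks are modelled as Python lists of integers, and the sketch assumes that the Positive and Negative conditions are left FALSE when the block is longer than one object; the manual only states that the Equal condition is set in that case.

```python
def gas_test(block1, block2, blocksize):
    """Return (positive, equal, negative) for two blocks of Basictype values."""
    a, b = block1[:blocksize], block2[:blocksize]
    if blocksize == 1:
        # A single object is compared fully.
        return (a[0] > b[0], a[0] == b[0], a[0] < b[0])
    # Longer blocks are tested for equality only.
    # Assumption: Positive and Negative are cleared in this case.
    return (False, a == b, False)
```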
Special Instructions 
HLT 
With no operands this operation terminates program execution. 
TEST_AND_SET
operand
Tests the bit at the address specified by operand, and sets it to one if zero. This is
an indivisible operation. If the bit was zero then the EQUAL condition will be
TRUE, otherwise it will be FALSE. The "operand" must be a General operand of
"Bit_Out" variant. The bit involved will be the first bit in the Bitstring specified.
RTI
With no operands this operation returns from an interrupt.
IN_
blocksize, port, location
This will move a block of Basictype data objects from the port specified to the
storage location starting at the Basictype object specified by the location operand.
The port operand must provide a Basictype value. Whether this value is a valid
port address for the target machine is for the user to ensure. The location operand
must be an "Out", "Temp" or NIL variant of the General operand form. The
blocksize operand must provide a Basictype value, and specifies how many Basictype
values are to be transferred.
OUT
blocksize, port, location
Moves data from the location specified to the port specified. Similar to the IN
operation.
STATESET
regno, source
Moves the Basictype object specified by source into the special register specified
by regno. The source must supply a Basictype value. Regno must be a "Fixed"
variant (immediate) of the General operand form.
NOTE Encodings of possible register codes are contained in the Reg_Kind
enumerated data type contained in the portable project runtime library
MACHINE module.
STATEREAD
regno, destination
Moves the Basictype object from the special register specified to the storage
location specified by destination, otherwise similar to STATESET.
SETINT
new_state
new_state is an "Interrupt State" operand form. If
SETINT Disable
is used, for the following GAS Code all maskable interrupts are masked, until
SETINT Enable
when the previous masking state will be restored.
SVC
code, no_ops, operand ...
Used to make use of services provided by an operating system kernel. These are
calls that cause a "wakeup" of the kernel, effectively a software interrupt of the
kernel. The following two codes have been reserved:
254 Save Context
255 Load Context
The SVC code is a "Byte Cardinal" operand form and provides an immediate
value in the range [0..255].
The no_ops is a "Byte Cardinal" operand that specifies how many operands are
following.
The following operands can be any operand of the General form.
Jump Instructions 
BRC
condition address
This instruction branches to the address specified if and only if the condition
specified is TRUE. The condition will be Positive, Equal, Negative, Overflow or
Carry. The address operand may be any General operand that returns a Basictype
value. Whether the address is valid is a problem for the user.
CALL
condition address
This instruction calls the code at the address specified provided that the condition
is TRUE. The condition will be Always, Positive, Equal, Negative, Overflowed
or Carry Set. It is equivalent to the Rcode FAST_CALL. A return block consisting
of the return address and any status bits pushed by the target machine call
instruction is pushed on the stack before the routine is called. The address
operand may be any General operand that returns a Basictype value. Whether the
address is valid is a problem for the user.
RET
This instruction is equivalent to the Rcode Fast_Return. It returns through the
target machine return block on the top of stack.
PUSHMARK
This instruction will:
a. Create a new Display vector
b. Current TOP_OF_STACK -> new RESULT
This assumes that an object of the size/alignment of the result area is on the
top of stack.
c. Push a NIL object of the size and alignment required for a return block
used by the return from subroutine target machine instruction that will be used
for actual subroutine return.
d. TOP_OF_STACK -> new RETURNMARK
e. Current TOP_OF_STACK + 1 -> new ARGLIST
NEWDISPLAY
This code sets up a new display, and must be the first instruction in a subroutine.
It performs the following:
a. Make most recently created display active.
b. Previous TOP_OF_STACK -> new TOP_OF_STACK
c. Make PREVDISPLAY of the new display point to the previous display.
d. POP return block on top of stack to location pointed to by RETURNMARK
e. POP the return block now on top of stack into ROUTINEDESCR of current
display.
f. Move TOP_OF_STACK contents to LOCALSTORE of current display
POPMARK
This performs the action of:
a. Current RETURNMARK -> previous TOP_OF_STACK.
b. Delete current display.
c. Make previous display active.
Multiple Instructions 
SCAN
blocksize target search_for result
This instruction expects target to specify an Unsigned Basictype object of one byte
in length. Together the blocksize and target identify a block of memory to be
searched. The search_for operand identifies a Basictype object that provides a mask
value to be searched for in the target block. The search begins at the byte
specified by target and continues until a match is found, or the number of bytes left
in the block is less than the size of the search mask value. If the mask is found,
its start address is stored in result. If not found, result will be 0. The result
should be a Basictype large enough to receive a pointer value, or truncation of the
most significant bits will occur.
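The search performed by SCAN can be sketched in Python as below. The function name gas_scan is hypothetical. Memory is modelled as a Python bytes object indexed by address, and the mask value is passed as the bytes of the search_for object; both are assumptions made for illustration.

```python
def gas_scan(memory: bytes, target: int, blocksize: int, mask: bytes) -> int:
    """Return the address of the first occurrence of mask in the block, or 0."""
    end = target + blocksize          # first address past the block
    addr = target
    # Stop once fewer bytes remain in the block than the mask is long.
    while end - addr >= len(mask):
        if memory[addr:addr + len(mask)] == mask:
            return addr
        addr += 1
    return 0
```

Because 0 is used as the "not found" result, a match that starts at address 0 cannot be distinguished from a failed search in this sketch.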
XLATE
source destination
Converts the Basictype value provided by source to the basictype specified for
destination, and places the result in destination. Note that this instruction can be
used for sign extending operands. If the result is too large for the destination, the
Overflow condition is set and the most significant bits are truncated. Such programs
will be unpredictable.
Operand Movement Instructions 
PUSH
size source
Pushes the object, whatever its size, into the next object position on the top of the
stack, but allowing for the required alignment of the new object. The
TOP_OF_STACK pointer of the current display vector will be increased by ONE.
Note that source could provide a NIL reference.
POP
operand
Will pop the object on the top of the stack into the location specified. The operand
must be an "Out" or "Temp" variant of the General Operand form. The
TOP_OF_STACK pointer of the current display is decremented by ONE. The
object popped may consist of a block of objects of the Basictype specified by
operand. In that case the POP will result in a block of objects being placed in the
location specified by operand.
LOAD
blocksize source destination
Moves the block of Basictype objects specified by blocksize and source to the
location specified by destination. The destination must be an "Out" or "Temp"
variant of the General type operand. Note that the source operand could provide a
NIL reference.
SHIFT
shiftmode shiftsize operand
Shifts the data in the operand by the number of bits specified in the Basictype
value provided by the shiftsize operand, using a shift mode as specified in the
shiftmode operand. If the shiftsize value is negative, the shift is left; if positive the
shift is right. The allowable shift modes are Arithmetic_Shift, Logical_Shift, and
Rotate. The operand must be a "Bit_Out" variant of the General operand form.
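The direction and mode rules for SHIFT can be sketched in Python as follows. The function name gas_shift and the lower case mode strings are hypothetical. The operand is modelled as an unsigned bit pattern of a given width, and an arithmetic right shift is assumed to replicate the most significant bit, which is the usual two's complement convention but is not spelled out in the manual.

```python
def gas_shift(value: int, shiftsize: int, mode: str, bits: int) -> int:
    """Shift a `bits`-wide bit pattern; negative shiftsize is left, positive is right."""
    mask = (1 << bits) - 1
    value &= mask
    n = abs(shiftsize)
    if mode == "rotate":
        n %= bits
        if shiftsize < 0:   # rotate left
            return ((value << n) | (value >> (bits - n))) & mask
        return ((value >> n) | (value << (bits - n))) & mask
    if shiftsize < 0:       # a left shift fills with zeros in both modes
        return (value << n) & mask
    if mode == "logical":
        return value >> n
    if mode == "arithmetic":
        # Assumption: reinterpret as two's complement so the sign bit is replicated.
        sign = value >> (bits - 1)
        signed = value - (sign << bits)
        return (signed >> n) & mask
    raise ValueError(f"unknown shift mode: {mode}")
```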
EXTRACT
source destination
Moves the bits of source and places them in destination. The source operand must be a
"Bit_In" variant of the General operand form and the destination operand must be
a "Bit_Out" variant of the General operand form. If the size of the source exceeds
that of the target, only a bitstring equal to the length of the target will be placed in
the target. The remaining bits of the source will be ignored.
Pseudo Instructions 
USEVAR
id
The "id" operand is an immediate value that contains the identity of a new
temporary data object.
DELVAR
id
Specifies that the temporary object specified by the immediate value in "id" no
longer exists.
USELABEL
id
The "id" operand is an immediate value containing the identity of a new
temporary code label.
DELLABEL
id
Specifies that the temporary label specified by the immediate value in "id" no
longer exists.
Appendix B 
Machine Family No 1 GAS Code User's Guide
1 Introduction
This manual provides background information describing the concept and use
of Generic Action Set Code for the Machine Family No 1. The Reference Manual
provides detailed specifications of the GAS Code and GAS Machine for the family
of target machines involved.
The Portable Language Implementation Project, which led to the production of
this manual, involves the development of a software production system that is
easily portable to a wide range of target machines. This is achieved by the use of
a compiler system that can be easily adapted to compile a range of languages and
to produce code for a range of target machines. Parts of the compiler which are
specific to a language, machine or operating system are kept to the minimum
absolutely required and any such dependency is confined to one kind per module (eg
only machine or only language). Careful design has ensured that each form of
dependency only involves a few modules.
To assist in this, the Modula-2 language pseudo-module SYSTEM has been
slightly modified to include low level mechanisms for handling machine registers,
input/output and execution context. The runtime library, which forms an essential
part of the portable system, includes facilities for concurrency, message and
event-based synchronisation and exception handling in addition to the more traditional
input/output facilities.
For all languages, the front end of the compiler will produce a common
intermediate code, Rcode (see the Rcode User's Guide and the Rcode Reference
Manual). This is a tree-structured language. The front end considers storage
element sizes in the production of Rcode as the only target machine related
parameters.
The back-end of the compiler has to produce target machine code. For
machines which are architecturally very similar, the compiler back-end code
involved is very similar. To take advantage of this fact, the concept of a Generic
Action Set code has been developed. This Generic Action Set code (or GAS Code)
is a second intermediate code. It is a linear machine-code language which is hung
on the Rcode tree to replace relevant Rcode segments. The "instructions" of the
GAS Code and the actual GAS Code "stream" produced from the Rcode will reflect
the general nature of code generated for any target machine that has an architecture
similar to that of the GAS Machine.
Consider, as an example, the production of GAS Code for a high-level
language assignment statement such as
K := (X+Y) * Z
For a real target machine with registers the GAS Code generated would be of the
form
ADD X Y TEMP1
MULT TEMP1 Z K
During generation of machine code for a typical family member, this GAS Code
fragment could be converted to
LOAD R1, X
ADD R1, Y
MULT R1, Z
STORE R1, K
For a zero address real machine on the other hand, the GAS Code generated would
be
PUSH X 
PUSH Y 
ADD 
PUSH Z 
MULT 
POP K 
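The two GAS Code forms shown above for K := (X+Y) * Z can be reproduced by a small Python sketch that walks an expression tree. This is only an illustration of the idea, not the code generator described in this thesis: the tuple representation of the tree, the function names and the OPS table are all hypothetical, and the three-address version assumes the root of the tree is an operator node.

```python
from itertools import count

OPS = {"+": "ADD", "*": "MULT"}   # hypothetical operator-to-mnemonic table


def three_address(target, expr):
    """Emit 'OP src1 src2 dest' lines; expr is a name or (op, left, right)."""
    code, temps = [], count(1)

    def walk(node, dest=None):
        if isinstance(node, str):
            return node                        # a plain variable needs no code
        op, left, right = node
        a, b = walk(left), walk(right)
        dest = dest or f"TEMP{next(temps)}"    # inner results go to temporaries
        code.append(f"{OPS[op]} {a} {b} {dest}")
        return dest

    walk(expr, target)
    return code


def zero_address(target, expr):
    """Emit stack machine code for the same tree (post-order walk)."""
    code = []

    def walk(node):
        if isinstance(node, str):
            code.append(f"PUSH {node}")
            return
        op, left, right = node
        walk(left)
        walk(right)
        code.append(OPS[op])

    walk(expr)
    code.append(f"POP {target}")
    return code
```

Calling either function with the target "K" and the tree ("*", ("+", "X", "Y"), "Z") yields the corresponding instruction sequence listed above.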
Consider, as a second example, the following
Z = ROUT (X,Y)
The GAS Code produced in a machine with registers and a hardware stack facility
could then be
PUSH RESULT
PUSH X
PUSH Y
CALL ROUT
STORE RESULT Z
POP RESULT
where RESULT is an "area" reserved for results of procedures that return a result.
This could be converted almost directly to target machine code as
PUSH R0
PUSH X
PUSH Y
CALL ROUT
STORE R0, Z
POP R0
As can be seen from these simple examples, the conversion to GAS Code from
Rcode is common to all target machines in a "family". The GAS instructions
represent what is typical of the instructions of the machines in the family, and the
GAS Code generated generally reflects the kind of target machine code which will
eventually be generated.
The GAS instructions to represent a family will have the form of a reduced
instruction set machine. This is because:
a. Individual machines in the family will have special instructions for
efficiently implementing certain functions. There is no point including "specials"
in the GAS instruction set; only the general instruction categories need be
represented. When converting GAS to target machine code, these "special"
instructions in the target machines may be used, although various workers have
argued in favour of adhering to a limited set of instructions even when such
"specials" are available.
b. The GAS instructions are orthogonal in the sense that operations such as
ADD for all data types are implemented by one generic GAS instruction, with
the type of data involved indicated within each extra operand. In many
machines, the number of instructions is greatly increased by a separate instruction
for each data type (and size!).
c. The number of operands for an instruction such as ADD is fixed. For example
ADD assumes three operands:
ADD source_1, source_2, result
In many machines there are several instructions for ADD, one for two operands,
one for three, etc.
d. The GAS Code operations in this family are at most binary. This rules out
instructions such as
ADD X, Y, Z, K
ie Add X, Y, Z giving K.
2 Machine Family Definition 
The concept of family in relation to GAS Code generation refers to the
classification of machines according to the general architectural features of target
machines in that family. Machines in the same family should give machine code
for a program that is very similar in nature. The major factors that influence the
form of machine code produced are:
a. Does the machine support zero (stack based), one, two, or three address
operations or a mixture?
b. Does the machine hardware support PUSH, POP and stack pointer relative
addressing?
c. What is the general range of instructions provided, ie ADD, DIV, MOVE,
BNE?
d. Does the machine only support arithmetic on statically sized objects, and
do the objects have to be byte aligned (not bit)?
Other factors can affect the code produced. The most important of these factors
is the data types supported by the machine. However, this is an area of great
diversity among machines, and is not therefore an appropriate factor to be used for
differentiating machine families. For a factor to be useful in discerning families, it
must be clearly present or not present on machines. Many machines either have
registers or don't, have zero-address operations or don't, do or don't have PUSH
and POP instructions. However, very few machines have identical or even very
close data structure support. Data structure support is so complex and important
that even the Rcode generator has to know what data sizes are available in the
hardware in order to allocate space reasonably. Rcode generates instructions
involving what it calls Basictypes, which are basic exact, (binary coded) decimal
and real types of various sizes. Simple Rcode will only be generated for datatypes
which are directly supported by the target machine.
The front-end may also generate (block exact arithmetic) Rcodes which use
operands indicating a multiple hardware-sized object. The GAS Code generator
must in this case perform the operation in a series of stages, using the Basictypes
supported by the target machine. This is because the machines in the family are
only assumed to provide arithmetic operations on Basictype objects. This is a
significant machine factor that can be used to discern families. It is present or it is
not.
NOTE A machine is considered to directly support an operation on a given data
type, either by having a direct machine hardware instruction, or by providing
a software emulating system call.
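As an illustration of performing a multiple hardware-sized operation "in a series of stages", the following Python sketch adds two values that are each held as several machine words, using only word-sized additions and a carry. The function name add_multiword, the 16 bit default word size and the least significant word first ordering are assumptions made for the example; the manual does not prescribe how the stages are organised.

```python
def add_multiword(a: list[int], b: list[int], word_bits: int = 16) -> tuple[list[int], bool]:
    """Add two equal-length word lists (least significant word first).

    Returns (sum_words, carry_out).
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of words")
    mask = (1 << word_bits) - 1
    result, carry = [], 0
    for wa, wb in zip(a, b):
        total = wa + wb + carry       # one Basictype-sized ADD plus the carry-in
        result.append(total & mask)   # keep the word-sized part
        carry = total >> word_bits    # propagate the overflow to the next stage
    return result, bool(carry)
```

Each loop iteration corresponds to one stage that a target machine could perform with its own word-sized add instruction and carry condition.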
Another significant factor for machine code generation is whether indexed
addressing is supported. However this has more of a local minor effect on the
code generated and is not included as a factor for family classification of machines.
The use of index registers and allowable offset sizes is very machine dependent
and is, therefore, best left to the stage of converting GAS Code into target
machine code. The GAS Code will have the form
ADD .. refer_variable+offset ..
If a machine supports indexed addressing this will convert to
LOAD IReg, variable address
LOAD Reg, (IReg+offset)
ADD Reg ..
whereas if the machine only supports indirect addressing, the machine code will
take the form
LOAD Reg, variable address
ADD Reg, offset
LOAD R, (Reg)
ADD R ..
The GAS Code produced does not preclude either alternative above. However if
the GAS Code generated was of the form
PUSH A
PUSH B
PUSH C
ADD
MULT
and the target machine did not in fact support zero address operations, but had
registers, conversion of GAS Code to target machine code would be very difficult.
The GAS Code addressing modes should, however, provide sufficient information
so that address computation structure can be easily identified, allowing the target
machine code generator to select efficient addressing modes of the target
machine.
The above example clearly demonstrates that factors used to differentiate
families of machines must be such that they significantly affect the form of machine
code produced; the presence or absence of such a factor on a machine must be
clearly discernible.
3 Block Structure and Scope 
Many high-level programming languages have what is referred to as block
structuring. The implication of this is that blocks of code are defined which are
lexically contained within other code blocks. A code block can declare its own
objects. Some form of "scope" rules are applied, typically such that the code in a
block can access its own local objects and objects in blocks within which it is
lexically contained. Objects declared inside blocks that are lexically contained within
a block are "invisible" to the containing block.
This block structuring is also implemented in conjunction with an extended
subroutine structure where each code block is named and can be invoked by the use of
this name. Arguments can be passed at any such invocation. Implicit in a call is
that space for local objects of the block called is automatically allocated.
When the execution of a code block terminates, space allocated for local objects
of the block and any arguments is assumed to be released, and control returns to the
instruction in the outer calling environment after the one which invoked the code
block. Those code blocks that can be called from any point in a program follow
the same scope rules as defined for access to objects. That is, code blocks
themselves are considered to be objects. A code block which is lexically enclosed in a
code block is considered to be a local object of the enclosing block. As a code
block is assumed to be able to access local data objects in all code blocks which
enclose it lexically, and as the local data objects are only assumed to exist for the
time of activation of a code block, then it is necessary that when a code block is
active, all its lexically enclosing code blocks are also currently active even though
possibly suspended. When recursion is allowed there can be many current
activations of a given code block. In this case the interpretation of "in scope" data
objects is dynamic. When a given code block wants to access a data object in
some enclosing block, but there are several current activations of that code block,
the data object accessed will be the one in the most recent activation of the code
block.
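The rule that an access to an enclosing block resolves to its most recent activation can be illustrated with a much simplified display, sketched here in Python. The class and method names are hypothetical, and the sketch keeps only one slot per lexical level; it does not model the other components of the GAS Display vector (such as RESULT, RETURNMARK or LOCALSTORE).

```python
class Display:
    """One slot per lexical level, each pointing at the most recent activation."""

    def __init__(self):
        self.slots = {}    # lexical level -> frame (dict of local objects)
        self.saved = []    # stack of (level, previous frame), restored on exit

    def enter(self, level, local_objects):
        # Remember what the slot held so it can be reinstated on exit.
        self.saved.append((level, self.slots.get(level)))
        self.slots[level] = dict(local_objects)

    def leave(self):
        level, previous = self.saved.pop()
        if previous is None:
            del self.slots[level]
        else:
            self.slots[level] = previous

    def lookup(self, level, name):
        # Always resolves to the most recent activation at that level.
        return self.slots[level][name]
```

For example, if a level 1 block is entered with n = 3, entered again recursively with n = 2, and then calls a level 2 block, a lookup of n at level 1 returns 2; after both inner activations have exited it returns 3 again.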
Also related to code blocks is the question of error traps provided by many real
target machines, such as "divide by zero". To enable programs to recover
themselves when such errors are encountered, a programmer will often include code to
capture the trap. However, the code required in each code block may be different.
On entry to a code block the programmer will set up appropriate code to handle
traps. On return the handler code in force on entry must be re-instated. Hence
such handler code is related to code block structuring.
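The save and re-instate discipline for trap handlers described above can be pictured with a small sketch. It is only an illustration in Python, not part of the GAS Machine definition; the class name `TrapState` and the string handler names are hypothetical.

```python
from contextlib import contextmanager


class TrapState:
    """Hypothetical holder for the trap handler currently in force."""

    def __init__(self):
        self.handler = None  # no handler installed at program start

    @contextmanager
    def code_block(self, handler):
        """Install `handler` for the duration of one code block."""
        saved = self.handler      # remember the handler in force on entry
        self.handler = handler
        try:
            yield
        finally:
            self.handler = saved  # re-instate it on return


traps = TrapState()
with traps.code_block("outer_handler"):
    with traps.code_block("inner_handler"):
        inner = traps.handler       # handler of the called block
    after_inner = traps.handler     # caller's handler is back in force
after_outer = traps.handler
```

Because the saved handler lives in the activation that installed it, nesting and recursion are handled in the same way as local data objects.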
Many processors provide special hardware instructions to help in the
implementation of this block structuring. These relate to such things as :-
a. Allocating space for local data objects when the code block is called.
b. Providing facilities for accessing data objects of lexically containing code
blocks. This is often referred to as a static link mechanism.
c. Providing facilities for accessing arguments and result areas.
d. Providing facilities for deallocating space used for local data objects and
arguments, and for reinstating the environment of the caller code block and
returning to the instruction after the call. This is often referred to as a dynamic
link mechanism.
e. Providing facilities to save registers on entry to a subroutine and then
reinstate saved values on exit.
f. Providing facilities to save the program status. In the GAS Machine this
comprises the Less_Than, Greater_Than, Equal, Carry_Set, and Overflowed
status "flags". On exit from the subroutine, the saved values should be
re-instated.
g. Providing facilities for managing exception handlers on entry/exit. On entry
to a subroutine, the current exception handler should be saved, and on exit the
saved handler should be re-instated.
The use of any facilities provided in the target machine will usually help to
implement block structure constructs efficiently. However, the types of facilities
offered by many processors are varied, and often cover only certain of these issues.
For example, the DEC VAX processor "CALLS" instruction provides facilities for
easy management of arguments, register saving/reinstatement, dynamic linking,
status saving and reinstatement, and exception handlers. However, it does not
provide automatic allocation/deallocation of space for local data objects or
automatic maintenance of access paths to objects in lexically enclosing code blocks.
Code Block Environment 
The GAS Machine must be able to cater for a range of target machine
facilities for implementing block structuring. It must therefore provide a fully
conceptualised implementation of block structuring, not a partial implementation.
The basis of the GAS machine support for block structuring is the concept of
the "environment" of a code block. An environment consists of :-
a. Its local data objects.
b. Its arguments list: the arguments passed to it when called.
c. Its result area: where to place any result value that is to be returned on
completion of the block.
d. The environments of lexically enclosing code blocks. This is required to
allow access to objects in these blocks in accordance with the scope rules.
e. The environment of the code block which called it. This is required for
returning on completion of the current code block.
f. The currently active exception handler routine for the code block.
g. The current program counter value for the code block.
The register contents are also part of a machine environment, but this is very target
machine related and is not important at the GAS machine level.
The environment of a code block is maintained in a display vector. This
consists of a set of pointers to objects on the code stack. The display vector interacts
with the PUSHMARK, POPMARK and NEWDISPLAY GAS Code instructions.
The way in which the Display and these GAS Code instructions interact in the
operation of the GAS Machine is detailed in the "GAS Code Reference Manual".
The access to data objects in outer lexically enclosing blocks is assumed to be
provided automatically, given a display pointer identity, lexical level, and offset from
the pointer contents. The GAS Machine therefore does not bias in favour of any
particular method of implementing such an access (this is left up to the target
machine code generating stage, where techniques will be used that make use of any
special features provided by some particular target machine), but provides
sufficient information to allow any method of implementation to be used.
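As an illustration of access by lexical level and offset, the following Python sketch models one conventional way of keeping a display: one frame pointer per lexical level, saved on block entry and restored on exit. It is an assumption-laden model and does not reproduce the PUSHMARK, POPMARK or NEWDISPLAY semantics, which are defined only in the Reference Manual. The names `Machine`, `enter`, `leave`, `load` and `store` are hypothetical.

```python
class Machine:
    """Toy model of a code stack with a display vector (illustrative only)."""

    def __init__(self, max_levels=8):
        self.stack = []                     # one list of local slots per activation
        self.display = [None] * max_levels  # display[level] -> index of a frame
        self.saved = []                     # display entries to restore on exit

    def enter(self, level, n_locals):
        """Activate a block declared at `level`, allocating its locals."""
        self.saved.append((level, self.display[level]))
        self.stack.append([0] * n_locals)
        self.display[level] = len(self.stack) - 1

    def leave(self):
        """Terminate the current block and reinstate the display entry."""
        level, old = self.saved.pop()
        self.stack.pop()
        self.display[level] = old

    def load(self, level, offset):
        return self.stack[self.display[level]][offset]

    def store(self, level, offset, value):
        self.stack[self.display[level]][offset] = value


m = Machine()
m.enter(0, 1); m.store(0, 0, 10)   # outermost block
m.enter(1, 1); m.store(1, 0, 20)   # nested block, first activation
m.enter(1, 1); m.store(1, 0, 30)   # recursive activation of the same block
innermost = m.load(1, 0)           # most recent activation is seen
outer = m.load(0, 0)               # enclosing block is still reachable
m.leave()
previous = m.load(1, 0)            # earlier activation is visible again
```

A target code generator is free to realise the same accesses with static links or any other mechanism; the sketch only shows what information (level and offset) is sufficient.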
The importance of providing these runtime structures is that GAS Code can
represent the steps involved in implementing subroutine calls in the family, the
structure of the subroutine call is still clearly described (the Call and
Declare Routine Rcodes may not be retained by the GAS Code generator), and a
standard calling convention and stack structure is provided which the target code
generator can attempt to provide as closely as possible regardless of target
machine, providing in so doing a structure that will assist with portability of any
code that manipulates the stack.
Datatypes Supported 
The data type support provided reflects the support typically provided by the
machines in the family, except that for a data form for which there is no "typical"
family support, the structure of objects of this form should be clearly retained so
that efficient target code can be selected. An example of this in family No 1 is
bitstrings. Target machines in the family provide a wide variety of facilities for
bitstring operations. Therefore in the GAS Machine bitstrings can consist of either a
statically or dynamically defined number of bits. No restrictions are placed on the
possible size of the bitstrings other than by some implementation restriction, which
typically uses a 16-bit object to indicate the number of bits. Whether the target machine
provides limited operations on statically sized bitstrings that are multiples of bytes, or
a wide range of operations on dynamically sized bitstrings with or without a size
limitation, is a problem for the target code generator. For machine family No 1
the types supported are :-
a. Basictype Objects. These reflect the fact that the machines in the family
typically provide operations on static sized objects that are multiples of bytes.
The Rcode Basictype coding scheme therefore provides the ideal mechanism for
describing these objects.
b. Blocks of Basictype Objects. Provision of these reflects the fact that many
of the machines in the family provide operations on blocks of memory,
considered either as a block of bytes or a block of Basictype values. The
operations typically provide block moves (LOAD, PUSH, POP, IN, OUT), compare
and scan operations. Again the facilities provided on the machines in the family
vary, so that it is necessary to retain the full structure of the block in the GAS
Code operand. Some target machines in the family may provide limited
operations on blocks of bytes only, whereas some will provide operations on blocks
of Basictype objects.
c. Bitstrings. Strings of arbitrary numbers of bits are frequently manipulable
by target hardware in a restricted way.
Temporary Objects
A temporary storage reference involves an operand which is a reference to an
object which has been allocated by the GAS Machine to temporary storage. These
records are created when a temporary result or value requires storage. Such
temporaries are results generated whenever an Rcode subtree is required to return
a result. When converted to GAS Code, the code for the subtree will be generated
with the value to be returned left in a temporary. The value in the temporary is
used either in generating target code or in working out addressing modes for the node
with the subtree. The subtree could involve the generation of the length of an
operand, or the initial value of an index of a "FOR" loop. Temporary storage
declarations contain the "id" of the element and the machine data type to be stored.
When a temporary storage object is required, the USEVAR GAS Code will be
generated and inserted in the Rcode tree. The unique-id allocated to this object is
used to reference this object. The DELVAR GAS Code is used to allow deallocation of the
entity data base record for this object, and also for the generation of target machine
code, as temporary objects are obviously very important in register allocation
during final conversion from GAS Code to target machine code.
Temporary objects may be allocated space on the stack, or to a register, or to
global storage. The choice depends heavily on the target machine architecture and
the lifetime required for the temporary object. The "temporary data store" concept
therefore clearly identifies objects that are temporary, and the USEVAR/DELVAR
pseudo GAS Codes provide the information needed for their efficient
allocation/deallocation.
The use of temporaries can be considered to be the equivalent of modelling a
machine with an unlimited number of registers. Whenever a temporary storage
location is needed, a register can always be found, that is, a temporary in GAS
Machine terminology. Extending the analogy with machine registers, it is assumed
that "currently active" temporary data objects are saved when a CALL GAS Code
is encountered.
Note that with block structured languages, a temporary data object is normally
considered to be local to the code block in which it is defined. If the block can be
called recursively, then at one time there can be several instances of the temporary.
When a temporary is referenced, the most recent instance of the temporary is
required. The GAS machine automatically achieves this.
In the target machine, a temporary can only be allocated to a global storage
location if it is not possible for there to be more than one instance of it at one
time. This is definitely true if there is no CALL GAS Code between the USEVAR
and DELVAR for the temporary. If there is, the temporary must be allocated
space on the stack or to a register. This ensures that the most recent activation of
a temporary is accessed when a temporary is referenced, because if the object is
allocated on the stack then the temporary in the current frame will be accessed, and if
that temporary has been allocated to a register then the register will only hold data
for the currently active code block.
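The rule that a temporary may go to global storage only when no CALL lies between its USEVAR and DELVAR can be checked with a single pass over the linear code. The following Python sketch assumes, purely for illustration, that each GAS Code is held as a tuple whose first element is the instruction name and that USEVAR and DELVAR carry the temporary's id as their first operand; neither representation is taken from the GAS Code definition.

```python
def can_use_global_storage(code):
    """Map each temporary id to True when no CALL occurs in its lifetime."""
    call_free = {}  # temporaries currently live -> no CALL seen so far
    result = {}
    for op, *args in code:
        if op == "USEVAR":
            call_free[args[0]] = True
        elif op == "CALL":
            # every temporary live across a CALL may have several instances
            for temp in call_free:
                call_free[temp] = False
        elif op == "DELVAR":
            result[args[0]] = call_free.pop(args[0])
    return result


example = [
    ("USEVAR", "t1"),
    ("ADD", "a", "b", "t1"),
    ("DELVAR", "t1"),
    ("USEVAR", "t2"),
    ("CALL", "f"),
    ("DELVAR", "t2"),
]
placement = can_use_global_storage(example)
```

Here `t1` could be placed in global storage, whereas `t2` is live across the CALL and must go on the stack or in a register.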
Registers for active but suspended code blocks are saved on the stack. A
temporary can be moved between stack and registers. The entity database record must
be kept up to date by the code generator to show where it is located at all times.
When the DELVAR GAS Code is encountered, the storage for the temporary
can be deallocated, but this may not happen immediately. If temporaries are
created and deleted on a nested basis, storage can usually be deallocated immediately
by popping them off the stack, since the temporary to be deleted will always be on
the top of the stack. The details of handling this are a problem for a GAS Code
interpreter or target machine code generator.
A temporary jump label record is created to implement such high-level
language constructs as WHILE, IF-THEN-ELSE and CASE, for which a jump to a
label is required. The USELABEL pseudo-GAS Code is added to the Rcode tree
when a temporary label is required, and a DELLABEL pseudo-GAS Code is added
when the label is no longer needed, so that any GAS entity database record can be
deallocated. The information required for these jumps is different from that for Rcode
labels, which are source language related entities and have additional information
such as lexical level. Jump labels are merely the entities required for
implementing this kind of control flow structure, and the information required is simpler.
Hence the use of separate records for jump labels.
Immediate Objects 
Immediate objects occur when a value has been statically evaluated by the
back-end. These objects can be 1, 2, 4 or 8 byte length objects, and the datatype
must be one of the Rcode basic datatypes.
Standard Conditions 
Standard elements are used to implement the characteristic of many machines
to allow extended size arithmetic to be implemented via several steps, with the use
of carry and overflow flags to allow carry over from one stage to the next, and also
to control the flow of control depending on the result of previous instructions.
a. CARRY - if the previous exact arithmetic produced a carry, then the standard
element CARRY will contain 1, else it will contain zero. For example, in the
sequence
    ADD X, Y, Z
    ADD A, CARRY, M
    ADD M, B, RESULT
    RET
X and Y are added to produce Z, the least significant part of a
double precision add. The next two add instructions add the most significant
components and any CARRY from the least significant add. The result is
placed in the RESULT standard storage area since it is the result of a function
call.
b. OVERFLOW - The last operation resulted in an overflow.
c. POSITIVE - The last operation resulted in a "Positive" value, or in a test
operation the first operand is larger than the second.
d. EQUAL - The last operation was a test of two equal values, or the
result was zero.
e. NEGATIVE - The last operation resulted in a negative value, or in a test
operation the first operand is less than the second.
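The double precision sequence given under CARRY above can be traced with a small Python model. The 16-bit word size and the function names are assumptions made only for this illustration, and carries out of the two high-order additions are ignored, as they are in the three-instruction sequence itself.

```python
WORD_BITS = 16                 # assumed word size for the illustration
MASK = (1 << WORD_BITS) - 1


def add(x, y):
    """Exact add of two words: returns (result word, CARRY as 1 or 0)."""
    total = x + y
    return total & MASK, total >> WORD_BITS


def double_add(x, a, y, b):
    """Add the two-word values (a:x) and (b:y); x and y are the low words."""
    z, carry = add(x, y)       # ADD X, Y, Z
    m, _ = add(a, carry)       # ADD A, CARRY, M
    result, _ = add(m, b)      # ADD M, B, RESULT
    return result, z           # high word, low word


high, low = double_add(0xFFFF, 0x0001, 0x0001, 0x0002)
```

With these operands the low-order add overflows the word, so CARRY supplies 1 to the high-order add and the sum 0x0001FFFF + 0x00020001 comes out as high word 4, low word 0.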
Machines in family No 1 provide overflow and carry bits that provide the basic
mechanism for multiple-precision arithmetic operations. They also provide the
equivalent of Positive, Equal and Zero flags and these are used to provide the main
basis for control of program flow.
In many machines, some flags reflect factors intrinsic to the machine's own
architecture. From a conceptual, non machine specific view, the only significant
flags are interrupt enable/disable, carry/overflow flags, and Positive, Negative and
Equal flags.
The GAS standard conditions can be used as booleans in tests, or to provide a
"1" (TRUE) or 0 (FALSE) value. This is done to reflect the fact that there are
times when "carry" can be used, for example, to provide one of the source values in
an addition operation, and at other times can be used as the condition in a
conditional branch. Many processors have a "zero" status bit, but this is the same as
"equal", where the previous operation would have set the "equal" condition to true
if it had resulted in a zero result.
4 Run-time Entities 
The term "entities" refers to objects whose declarations are encountered in the
Rcode, or objects that are created during GAS Code generation, such as temporary
labels. Additionally, objects will be identified that are important to the GAS Code
generator and optimiser, and target code generator, but which are a sub-field of an
Rcode or GAS Code entity. An example is a Basictype object located in an Rcode
storage area or variable storage area. Such an object will be important to the GAS
optimiser and the target code generator as it will no doubt be a candidate for
register allocation. During GAS Code and target machine code generation, an entity
data base will be maintained that records information about these objects.
If GAS Code is generated and the Rcode tree is written to disc then any
interpreter or target machine code generator that reads this Rcode file will have to
reconstruct the entity database. All Rcode declarations for static storage areas and
constants must be left on the Rcode tree to define storage areas; the task of
identifying all objects important to the GAS Code optimiser and target code generator
will be much easier if all declaration Rcodes are left on the Rcode tree. This of
course will depend on the storage available. The subtrees of these Rcode nodes
will contain GAS Code.
The "run-time" data base consists of a set of records, one record for each object
encountered. Objects are of one of the following types :-
a. Rcode Variable storage areas - space allocated on stack.
b. Rcode Static areas - global space allocated.
c. Rcode Constants - also allocated global storage.
d. Labels - Rcode labels or GAS temporary labels.
e. Symbols.
f. Routines.
g. Temporary storage locations required by the GAS Machine.
h. Optimiser and target code generator "variables".
Entity records for Rcode objects should be constructed when declarations are
encountered in the Rcode. Entity records for temporary data objects and labels
should be created during GAS Code generation as they are required. They may be
deleted when no longer needed. The GAS optimiser and target code generator
should create and delete records for variables they identify as important. The
entity records are used to relate the declaration of objects and their later use.
Note that the entity database records for Rcode objects have to be accessible
via their Rcode identification when GAS Code is being generated, for example
when an Rcode REFER_VARIABLE requires conversion to an object offset. They
also have to be accessible via their object offset and lexical level when target
machine code is being generated, so that a GAS Code with a reference to an object
by object offset and lexical level can be converted into a byte offset to the object,
or to its descriptor if dynamic in length.
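The two access paths required of the entity database can be sketched as two indexes over the same records. This is a minimal Python illustration; the record fields and the method names are hypothetical and are not taken from the PLIP implementation.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    rcode_id: int        # identification used in the Rcode
    lexical_level: int
    object_offset: int   # position of the object within its block
    byte_offset: int     # byte offset allocated to the object


class EntityDatabase:
    """One set of records, reachable by Rcode id or by (level, offset)."""

    def __init__(self):
        self._by_id = {}
        self._by_position = {}

    def declare(self, entity):
        self._by_id[entity.rcode_id] = entity
        self._by_position[(entity.lexical_level, entity.object_offset)] = entity

    def by_rcode_id(self, rcode_id):
        # used while GAS Code is being generated
        return self._by_id[rcode_id]

    def by_position(self, lexical_level, object_offset):
        # used while target machine code is being generated
        return self._by_position[(lexical_level, object_offset)]

    def delete(self, rcode_id):
        entity = self._by_id.pop(rcode_id)
        del self._by_position[(entity.lexical_level, entity.object_offset)]


db = EntityDatabase()
db.declare(Entity(rcode_id=7, lexical_level=1, object_offset=0, byte_offset=0))
db.declare(Entity(rcode_id=8, lexical_level=1, object_offset=1, byte_offset=4))
```

Both lookups return the same record object, so an update made through one path (for example a newly allocated byte offset) is visible through the other.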
5 GAS Instructions 
Instruction Format
GAS Code instructions will have the format
    instruction_code operand_1, operand_2, ...
Each instruction has a fixed number of operands, except for the SVC instruction.
For an instruction such as ADD, the number of operands in a register-based
machine will be three :-
    Operand_1    source operand
    Operand_2    source operand
    Operand_3    destination operand
This is the most general form for a BINARY operation. The add instruction code
is therefore generic.
Rcode vs. GAS Code 
The following paragraphs attempt to explain the relationship between the two
intermediate codes - Rcode and GAS Code.
Rcode is a tree structured intermediate language. Operands of an Rcode
operation can be defined by a routine which is a subtree of Rcode that returns the
required operand value. An example is an Add Rcode where the first operand is
obtained from a subtree that involves much arithmetic. Execution of Rcode would
therefore essentially be recursive. When an Rcode instruction is to be executed, a
halt in execution would be needed when a subtree is encountered, and execution of
this subtree would be performed. When complete, the value returned (if any)
would be used to continue the execution of the "suspended" Rcode instruction. At
any time there could be a stack of suspended Rcodes. GAS Code, on the other
hand, is essentially a linear code. It is generated by removing much of the tree
structure of Rcode. GAS Code is generated for any subtrees of an Rcode
instruction, and any values returned by these subtrees are left in temporary storage
locations. When GAS Code has been generated for all subtrees, then the GAS
Code will be generated for the Rcode itself.
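The flattening of an Rcode tree into linear GAS Code can be sketched as a post-order walk that leaves each subtree value in a temporary. The Python below is only an illustration: the tuple representation of tree nodes and of GAS Codes, the temporary naming scheme, and the placement of DELVAR immediately after the consuming instruction are all assumptions.

```python
import itertools


def linearise(node, code, counter):
    """Append linear code for `node`; return (operand, is_temporary)."""
    if isinstance(node, str):            # leaf: a named object, used directly
        return node, False
    op, left, right = node
    a, a_is_temp = linearise(left, code, counter)    # subtrees come first
    b, b_is_temp = linearise(right, code, counter)
    temp = f"t{next(counter)}"
    code.append(("USEVAR", temp))        # temporary to hold this node's value
    code.append((op, a, b, temp))        # then the code for the node itself
    if a_is_temp:
        code.append(("DELVAR", a))       # subtree results are no longer needed
    if b_is_temp:
        code.append(("DELVAR", b))
    return temp, True


tree = ("ADD", ("ADD", "a", "b"), "c")   # (a + b) + c
code = []
result = linearise(tree, code, itertools.count(1))
```

For this tree the walk emits USEVAR t1, ADD a, b, t1, USEVAR t2, ADD t1, c, t2 and DELVAR t1, leaving the final value in t2.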
Most "topological" Rcodes will probably be left on the Rcode tree by a typical
GAS Code generator after GAS Code has been generated. The term "topology" is
used here to mean Rcodes that help describe the semantic structure of the code.
Examples of this are Loop, Case, For, and Call_. This information will still be
very useful during target machine code generation. However, the subtrees of these
Rcodes will now point mainly to linear sequences of GAS Code. The literal fields
of these Rcodes are also retained.
Declarative Rcodes are also retained. This is because the entity database is not
saved if the GAS Code is written to disc. As stated earlier, the database contains
information on declarations encountered and is used for relating references to
objects to the declarations of objects, and also for recording offsets allocated to
objects on the stack. Declarations include the size and alignment of objects. When
a GAS Code containing tree is read from disc, this database must be regenerated.
Therefore the reader will need to use the declaration Rcodes encountered to rebuild
the database. Any sub-trees of these declaration Rcodes will contain GAS Code.
GAS Code nodes are implemented via the Reserved Rcode. Each of these
Rcodes is a GAS Code instruction. The format of the Rcode will be
    byte_one :                The Reserved Rcode indicator
    byte_two, byte_three :    Number of bytes of literal data
                              from byte_four to the end of this Rcode
    byte_four :               The indicator of a particular GAS Code
    rest :                    GAS Code operand bytes
If several GAS Codes are linked in a sequence, this will be done using the standard
LINK Rcode.
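Reading the layout above as a one-byte indicator, a two-byte length, a one-byte GAS Code indicator and then the operand bytes, a node could be packed and unpacked as in the following Python sketch. The indicator value 0xFF and the high-byte-first order of the length are assumptions; the text gives neither.

```python
RESERVED_RCODE = 0xFF   # assumed value of the Reserved Rcode indicator


def encode_gas_node(gas_code, operands):
    """Pack one GAS Code instruction into a Reserved Rcode node."""
    literal = bytes([gas_code]) + bytes(operands)   # byte_four to the end
    length = len(literal)
    # assumed byte order for the length: high byte first
    return bytes([RESERVED_RCODE, length >> 8, length & 0xFF]) + literal


def decode_gas_node(node):
    """Return (GAS Code indicator, operand bytes) from a packed node."""
    if node[0] != RESERVED_RCODE:
        raise ValueError("not a Reserved Rcode node")
    length = (node[1] << 8) | node[2]
    literal = node[3:3 + length]
    if len(literal) != length:
        raise ValueError("truncated node")
    return literal[0], literal[1:]


node = encode_gas_node(0x01, [2, 3, 4])
decoded = decode_gas_node(node)
```

Because the length counts from byte_four onwards, a reader that does not understand a particular GAS Code can still skip the whole node.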
Rcode Groups and GAS Code
Rcodes consist of several groups. This division into groups is relevant to the
actions which should be taken by a GAS Code generator.
Group 0 This group is of no major significance to the back-end of a compiler
and has no effect on a GAS Code generator. They are prelude operations used to
reconstruct the Rcode tree when reading from backing store.
Group 1 This group contains declarations such as program units, variables and
constants.
They are used by the back-end to produce the GAS entity data base, and for the
allocation of storage.
Group 2 These are used to obtain values as operands for Rcode instructions
without actually using the contents of storage locations. These provide operands
that are evaluable addresses and constants. They are such things as
Refer-Constant, Refer-Variable, Refer-Result, Refer-Area and Literal. Refer-Variable(X)
means use the address of the variable "X" as the operand. Refer-Area(F) means
return the address of the area "F". If the variables are global, the operand for
Refer_Variable(X) will be statically evaluable; if not, an instruction such as
    LOADADDR @SP+4, R1
could eventually need to be generated in the target machine. In GAS Code the
operand is a pointer to the GAS Code entity record for X, and the mode of
addressing will be "address of".
Group 3 This group of Rcodes provides the facilities for generating operand
values that are computed from values in storage locations. It also includes
operations on storage locations themselves. An example of what is essentially an
operand reference is "Block-Load", which returns a block of bytes at a specified
address.
An example of an Rcode that involves operand manipulation is the
"Block_Store" code, which will store a given value in a specified location.
The group also includes the Call Rcode. This will produce GAS Code for
setting up the argument list, and for the Call of the procedure code.
This group of codes therefore involves operations on storage entities that
involve :-
a. Returning all or part of the contents of a storage location - effectively
operand specification.
b. Setting all or part of the contents of a storage location.
c. Invoking a procedure - a procedure is a piece of storage that is treated as
code.
This group of Rcodes will therefore produce in GAS Code :-
a. Operands of GAS Code instructions.
b. GAS Code memory reference instructions.
c. GAS Code instruction sequences to set up argument lists for and then to
CALL procedures.
It is important in GAS Code to retain, via addressing modes, as much of an
address computation's structure as possible.
Group 4 This group provides control logic information - Loop, For, Case, Goto,
Link, Label. Most of these Rcodes are left untranslated on the Rcode tree, so as to
provide vital information to the phase that produces target machine code from GAS
Code. However, most of these codes will also involve the generation of a BRC GAS
Code instruction of some kind and the creation of destination label records (see
notes on these above). Also included are Rcodes which control sequences of
statements: statements that are to be executed in strict sequential order, or a set of
statements that can be executed in any order or in parallel if the target machine
supports concurrent execution. Information on whether statements must be
executed in sequence or can be executed concurrently is very important to the phase
that converts GAS Code to target machine code - which is why it is important that
group 4 Rcodes are left on the Rcode tree.
Group 5 These are the codes which provide arithmetic features. They include
codes to load/store values from/to storage locations, and control operations on
shared storage locations. Some arithmetic operations will convert directly to single
GAS Code instructions, whereas others will require several GAS Code instructions -
for example, extended precision operations where data type sizes exceed the target
machine capabilities. Also included in this group are facilities for runtime checks
and error message generation.
Group 6 The Rcodes in groups 0 to 5 will be generated independently of the target
machine or operating system. The back-end must support all of the Rcodes in
these groups. Group 6 involves Rcodes which have semantics which are
frequently dependent on the target machine or, possibly, its operating system. Some
of these, therefore, will not be supported by all compiler back-ends. The Rcodes
in group 6(a), however, are all required of every target machine and are therefore
processed into GAS Code. Other Rcodes in this group are passed untouched to the
phase that converts the tree with GAS Code into target machine code. It is only in
this phase that details of the specific target machine are known.
Group 6(a) Rcodes are referred to as "Compiler Standard Extensions". These
are generated by the compiler front-end to implement such source language
features as Modula-2's get and set register facilities, input and output, and service
calls. These facilities will not necessarily be supportable on all target machines,
and the back-end will give error messages if they are not supported.
Other codes in group 6 will only appear if included in-line in the user's source
program. This implies that the source language supports in-line Rcode. The front
end will pass these codes on unprocessed, as will the back-end phase that generates
GAS Code. These machine specific Rcode extensions provide the ability to take
advantage of specific features of a machine. Only the user who writes the in-line
Rcode, and the back-end phase that produces target machine code, are involved in
the implementation of these extensions. Therefore it provides a streamlined
mechanism for taking advantage of machine specific features in a way that has
minimal effect on the compiler code.
NOTE It should always be noted that any user program containing in-line machine
specific Rcode extensions will not be readily portable to other machines, if at all.
Group 7 This group provides type information that is used for :-
a. Implementation of separate compilation by providing a data structure to
record symbol table information.
b. Generating information to be passed to a source-language debugger. The
back-end must therefore process these codes to produce whatever format is
required for an interpreter or debugger to be used. These Rcodes will not be
processed during the phase of producing GAS Code, but should be handled in
the following phase, where the details of the target debugger or interpreter will
be known.
Group 8 This group allows the possibility of extending Rcode to include either
general purpose or implementation specific extensions. Each extension code is to
be followed by a second code indicating the extension within the extension code
group. GAS Code is the "Reserved" Rcode.
Special Instructions 
Most of the GAS instructions are fairly self explanatory, because they are
simple instructions. However, the special instructions group requires some further
elaboration.
Input/Output. The IN/OUT instructions are used to code input and output Rcode
instructions specified in group 6(a) "Compiler Standard Extensions". These codes
provide a portable conceptualised representation of IO. They involve transfers
between a storage area and an I/O port. It is up to the user programmer to ensure
the port address is valid for the target machine, and that the port specified handles
the data type specified.
If the source language supports in-line Rcode, then IO can be done with
machine specific extensions, rather than with the IN/OUT instructions. However
this will make the program less portable to other machines. Most languages, such
as MODULA-2, do not support in-line Rcode, and therefore group 6(a) Rcodes
will be generated for IO.
Note that the port-id can be dynamically evaluable where the target machine
supports this. This means that no checking can be made for a valid port-id by the
back-end, and that the back-end should generate an error message when the target
machine cannot support this facility.
State Set/Read. When a user program refers to a register (eg in MODULA-2
using the get and set register facility) the front-end will examine the register
codings available in the portable compiler MACHINE module and determine if the
register is a special register such as a status register or segment register. If it is,
the front-end will generate an Rcode instruction for manipulating special registers;
otherwise, it will generate the set/get general register Rcodes.
The data type is relevant in that the front-end checks that the data type length is not
greater than the register length. If it is shorter, the back-end will generate
appropriate code to replace the lower order bits by the source, leaving the
remaining bits unchanged.
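The effect of setting a register from a shorter source can be stated as a mask-and-merge operation. The Python sketch below assumes a 32-bit register by default; the function name and parameters are hypothetical and stand in for whatever code a back-end would actually emit.

```python
def set_low_bits(register, source, source_bits, register_bits=32):
    """Replace the low `source_bits` of `register` by `source`."""
    low_mask = (1 << source_bits) - 1
    register_mask = (1 << register_bits) - 1
    # keep the high-order bits of the register, take the low-order bits of the source
    merged = (register & ~low_mask) | (source & low_mask)
    return merged & register_mask


updated = set_low_bits(0x12345678, 0xAB, 8)
```

Writing the 8-bit value 0xAB into a register holding 0x12345678 gives 0x123456AB: only the low-order byte changes.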
Note that special registers are treated differently from general registers. General
registers are allocated by the automatic register allocation mechanism of the
back-end. Special registers are not.
Set-Interrupt. This instruction is designed to implement Rcode instructions that
must be uninterruptible (as opposed to critical code in which only mutual exclusion
need be enforced), and to allow return from interrupt handlers to be specified. For
example
    Add_In_Place_Locked address, value_to_add
could be coded in GAS Code as
    ADD X, Y, X
however, this gives no indication that this must be uninterruptible. Therefore it
should be coded as
    SETINT DISABLE
    ADD X, Y, X
    SETINT ENABLE
SVC instruction. This implements the Rcode group 6(a) "Service_Call"
instruction. For languages which support concurrency, for example, the compiler
front-end will generate system calls for LOADCONTEXT and SAVECONTEXT. Other
calls will be generated directly only by languages that support in-line Rcode. In
the latter case the back-end will have to be aware of the meaning of the system
call code and generate appropriate code. This requires co-operation between the
programmer and the compiler back-end, which is inevitable when writing
machine-related system software.
The codes 254 and 255 are allocated the meanings SAVECONTEXT and
LOADCONTEXT across all languages and target machines. The concept of SVC is
important. System calls such as GETDATE, GETTIME, GETRECORD are
effectively only subroutine calls, and are not considered SVC's. SVC's are where
the operating system kernel is effectively asynchronously interrupted and forced to
perform some action such as setting a timer, performing physical IO via OS
routines, or setting up and saving process data. Use of the kernel for facilities such as
GETTIME will be provided by a runtime library procedure.
Appendix C 
Rcode Reference Manual 
University of Waikato
Department of Computer Science
PORTABLE LANGUAGE IMPLEMENTATION PROJECT: 
RCODE REFERENCE MANUAL 
Introduction 
This reference manual contains the formal specifications of the meaning and 
purpose of each opcode within the intermediate code known as Rcode. Any
implementation is at liberty to choose values and sizes for parameters and opcodes
themselves, except as may be herein defined. The values and forms used within
the project of which this specification is a part are defined in the appropriate
Modula-2 Definition modules used within the compiler structure.
The following opcode descriptions and formats are, unless otherwise stated,
common to all target machines. Opcodes in Groups 6b, 6c, etc are specific to
particular processor hardware, designed to allow a compiler front-end to use special
hardware facilities where this is (rarely) necessary.
Object Alignment 
Alignment of objects referred to in Rcode, wherever required, is specified as a
single octet (byte of eight bits). This indicates that the address of the allocated
storage must be divisible by the corresponding power of 2.
For example, 0 indicates arbitrary byte-alignment, whereas a value of 1 means
that the address must be word-aligned (divisible by 2). These are the two most
useful values on most microprocessors, since some classes of object are required to
be word-aligned because of the machine bus architecture. Alignment values up to
3 (indicating quadword alignment) are those most likely to be used in practice; a
value of 9 (indicating page alignment) is occasionally required for the beginning of
storage areas.
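The alignment rule can be illustrated with a minimal sketch. The function
names is_aligned and align_up are hypothetical and are not part of Rcode; the
sketch only assumes that an alignment code k requires the address to be
divisible by 2 to the power k, as stated above.

```python
def is_aligned(address: int, alignment_code: int) -> bool:
    """Return True if address is divisible by 2**alignment_code."""
    return address % (1 << alignment_code) == 0


def align_up(address: int, alignment_code: int) -> int:
    """Return the smallest address >= the given one that meets the alignment."""
    unit = 1 << alignment_code          # 0 -> 1 byte, 1 -> 2 bytes, 9 -> 512 bytes
    return (address + unit - 1) // unit * unit


if __name__ == "__main__":
    print(is_aligned(6, 1))   # word alignment
    print(align_up(513, 9))   # next page boundary
```

A front end or back end might use a helper of this kind when laying out
storage; the manual itself does not prescribe one.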
Data Types
The primitive data types recognised in Rcode are boolean, tristate, signed
exact, unsigned exact, real, decimal and unspecified. Except for boolean and
tristate, these may come in different sizes. Particular opcodes are defined for
operating on multiple-byte blocks of data, and on strings of bits.
It is the responsibility of a compiler front end to map operations on source 
language data types into appropriate Rcode opcodes. For the common simple 
types (eg boolean, integer, real), the correspondence is obvious; operations on 
more complex data structures will need to be built up out of the available Rcode 
primitives. 
Boolean A boolean datum has two possible values: true or false. Considered
as integers, these have values 1 and 0 respectively. An attempt to use any other
integer value as a boolean gives undefined results. A boolean may be stored as a
component of a packed structure in a single bit; unpacked, however, it takes up a
whole byte.
Tristate A tristate datum, as its name suggests, has three possible values,
represented by the integers -1, 0 and +1. Use of other integer values where a
tristate is wanted gives undefined results. Tristate values are returned by the
comparison operators, where they are used to denote 'less-than', 'equal' and
'greater-than' respectively. A tristate value allows concise expression of the state
of CPU condition codes. As part of a packed structure, it requires two bits for
storage; unpacked, a single byte.
Signed Exact Signed exact numbers (integers) may be represented in any
appropriate form for the target machine. This is most often twos-complement
form. At least two sizes (byte and word) of arithmetic are supported within
Rcode. This may, of course, result in the generation of multiple length machine
code arithmetic on those machines which do not support this in hardware.
Unsigned Exact Unsigned exact numbers (cardinal numbers) are available in
the same range of sizes as signed ones, for all machines. However, it should be
noted for machine code generation purposes that, on most machines, hardware
arithmetic for unsigned numbers is restricted to addition and subtraction, as
hardware multiply and divide instructions only operate on signed quantities.
Real Real arithmetic hardware is unavailable on many microcomputers. Where
machine code generation is required, therefore, it will often be necessary to
emulate hardware floating point facilities. The lengths provided for real numbers
include at least 32, 64 and 80-bit (temporary) as defined in the proposed IEEE
standard. Where hardware facilities for longer real arithmetic are available on any
particular machine then a size coding greater than 2 should be used.
Decimal Decimal arithmetic operates on blocks of packed decimal digits, each 
digit occupying four bits. 
Unspecified Unspecified is the data type for which arithmetic is really exact
arithmetic in another guise. However, whereas signed and unsigned exact
arithmetic is meant to cause a run-time error should the result be outside the
representable range, unspecified arithmetic merely returns the truncated result.
For example, unspecified 8-bit, 16-bit and 32-bit addition and subtraction is the
same as unsigned addition and subtraction modulo 256, 65_536 and 4_294_967_296
respectively.
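The difference between unspecified and unsigned exact arithmetic can be shown
with a short sketch. The function names below are hypothetical, and the use of
a Python exception to stand for the run-time error is an assumption made only
for illustration.

```python
def unspecified_add(a: int, b: int, bits: int) -> int:
    """Unspecified addition: the result is truncated modulo 2**bits."""
    return (a + b) % (1 << bits)


def unspecified_sub(a: int, b: int, bits: int) -> int:
    """Unspecified subtraction: the result is truncated modulo 2**bits."""
    return (a - b) % (1 << bits)


def unsigned_exact_add(a: int, b: int, bits: int) -> int:
    """Unsigned exact addition: an out-of-range result is an error."""
    result = a + b
    if not 0 <= result < (1 << bits):
        # Stands in for the run-time error described in the manual.
        raise OverflowError("result outside the representable range")
    return result


if __name__ == "__main__":
    print(unspecified_add(200, 100, 8))   # truncated result
    print(unspecified_sub(0, 1, 16))      # wraps around
```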
Type Coding
Wherever required, the data type is represented in a single octet (byte of eight
bits), encoded as follows:-
a. Bits 0..2. These contain the size, interpreted for exact and unspecified
values as a count of octets equal to the corresponding power of 2, and for reals
in a standard way up to 2 (0 - 32-bit, 1 - 64-bit, 2 - 80-bit); values of 3 and 4
are target machine dependent in interpretation. The size code must be 0 for
boolean, tristate and decimal data. This is because unpacked boolean and
tristate values always occupy 1 byte. For decimal data, the size is dynamically
determined from that of the operands.
b. Bits 3..5. These specify the data type, as follows:
0 - boolean
1 - tristate
2 - signed exact
3 - unsigned exact
4 - unspecified
5 - real
6 - decimal
7 - reserved
c. Bit 6. If the data type is a signed or unsigned exact, bit 6 equal to 1 causes
the result of the operation to have, prepended to it, a byte containing a boolean
indication of whether overflow occurred or not. Also, the signalling of a
run-time error on overflow is disabled for that operation. Bit 6 equal to zero
causes the operation to return its normal result, and to signal an error on
overflow. For other data types, the use of this bit is reserved for conveying
implementation-specific information. For the VAX, for example, if the datum
is a double-precision real, the bit distinguishes between the two kinds of double
precision available: it is 0 for D_floating, and 1 for G_floating.
d. Bit 7. For real operations (data type 5), this bit is available to indicate
whether the result of the operation is to be truncated (bit 7 = 0) or rounded
(bit 7 = 1). For other data types, this bit is reserved for conveying
implementation-specific information.
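The layout of the type octet can be sketched as a pair of packing and
unpacking routines. The names encode_type and decode_type are hypothetical, and
the sketch assumes that bit 0 is the least significant bit of the octet, which
the manual does not state explicitly.

```python
BOOLEAN, TRISTATE, SIGNED, UNSIGNED, UNSPECIFIED, REAL, DECIMAL, RESERVED = range(8)


def encode_type(size_code: int, data_type: int, bit6: int = 0, bit7: int = 0) -> int:
    """Pack size (bits 0..2), data type (bits 3..5), bit 6 and bit 7 into one octet."""
    if not 0 <= size_code <= 7 or not 0 <= data_type <= 7:
        raise ValueError("size code and data type must each fit in three bits")
    if data_type in (BOOLEAN, TRISTATE, DECIMAL) and size_code != 0:
        raise ValueError("size code must be 0 for boolean, tristate and decimal")
    if bit6 not in (0, 1) or bit7 not in (0, 1):
        raise ValueError("bit 6 and bit 7 must be 0 or 1")
    return size_code | (data_type << 3) | (bit6 << 6) | (bit7 << 7)


def decode_type(octet: int) -> tuple:
    """Unpack an octet into (size_code, data_type, bit6, bit7)."""
    return (octet & 0x07, (octet >> 3) & 0x07, (octet >> 6) & 1, (octet >> 7) & 1)


if __name__ == "__main__":
    code = encode_type(1, REAL, bit7=1)   # 64-bit real, rounded result
    print(code, decode_type(code))
```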
Notation
The opcodes in this Reference Manual are divided into functional groups and
subgroups indicating the purpose for which they were originally conceived. No
implementer is restricted to using any opcode for any specific purpose suggested in
this manual, but the use of Rcode at all implies that the user intends to conform to
the form and function specified for each individual opcode.
In addition to being divided into groups, all opcodes are divided into a small
number of classes:-
a. Static (S). This opcode requires no run-time action.
b. Declarative (D). This opcode declares some run-time entity. It may, or
may not, therefore require run-time action dependent upon whether the entity is
statically determinable or not.
c. Functional (F). This opcode class returns some value at run-time.
d. Procedural (P). This opcode performs some run-time action without
returning any value.
e. Symbol table (T). An opcode in this class provides information which is
not meaningful for code generation purposes, but may be of use to provide
symbolic debugging information.
The bracketed letter code used above appears against each Rcode entry in the
reference list. Where some opcode may or may not return some value dependent
upon its operands, then (F/P) will be seen in the list.
Each opcode has zero or more operands, listed below the entry heading. Those
items marked "-" describe operands to the instruction which occur as subtrees.
Items prefixed by "•" describe information which occurs literally (ie is not
dynamically computable), immediately following the opcode. Whether any of these
subtrees is expected to return a value will naturally depend on the opcode.
Group 0: Pseudo-Operations
Opcodes in this group cause special actions to be taken during the building or
rebuilding of the Rcode tree.
Group 0a: Miscellaneous
Noop
This opcode is ignored on input. The opcode value must be 0,
permitting null bytes to be inserted anywhere in an Rcode file, for
example as filler at the end of a disk block.
Null (S)
This opcode represents a null subtree. This opcode is used
whenever an operand to the parent node is being omitted as, for example,
when some compiler optimisation detects unreachable code and
wishes to replace it with a Null.
Refer_Generic (T)
• generic nesting level of argument group
• argument number within group
This opcode is used wherever a reference occurs to the corresponding
generic argument. The back end of a compiler should never see
this, except as a part of the symbol table information.
Identifier (T)
- the character string representation
This only occurs as an operand which is giving a name to something.
The subtree is expected to be an appropriate literal - usually a
Short_Block_Lit.
Group 0b: Subtree Stack Manipulation
These operations are provided to allow a compiler front-end, operating under
tight memory constraints, to output the subtrees of a node in any convenient order,
possibly before it has determined the correct order, and even before it has
discovered whether it will need those subtrees or not. Instead, it can direct the
subsequent compiler phases to perform the re-ordering, and deletion of unwanted
operands, as it is rebuilding the tree.
NOTE 1. These operations do not themselves occur as part of the tree structure of
the Rcode.
2. The semantics are described as equivalent operations on a stack which is
being used to rebuild a tree. Any equivalent actions are, of course,
permissible.
Tree_Rotate (S)
• Depth into subtree stack to rotate
• How many steps to rotate (signed)
A single-step rotation in the positive direction moves the subtree at
the top of the stack to the bottom of the affected part, moving the
rest of the affected part one place up. A single-step rotation in the
negative direction has the opposite effect. Multi-step rotations
effectively consist of the appropriate number of single-step rotations.
Tree_Duplicate (S)
• Depth of entry in stack to copy (0 = top)
• how many copies to make
Makes the required number of copies of the specified subtree, and
pushes them onto the stack being used in rebuilding the tree.
Tree_Delete (S)
• Number of subtrees to delete
Deletes the specified number of subtrees from the top of the
tree-building stack.
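The three stack operations can be sketched on an ordinary list whose last
element is the top of the stack. The function names are hypothetical. The
sketch assumes that the rotation depth is the number of stack entries affected,
counted from the top; the manual does not spell this out.

```python
import copy


def tree_rotate(stack: list, depth: int, steps: int) -> None:
    """Rotate the top `depth` entries; a positive step sends the top to the bottom."""
    if depth > len(stack):
        raise IndexError("rotation depth exceeds stack size")
    if depth < 2:
        return
    part = stack[-depth:]
    k = steps % depth                     # a negative step count wraps round
    if k:
        part = part[-k:] + part[:-k]
    stack[-depth:] = part


def tree_duplicate(stack: list, depth: int, copies: int) -> None:
    """Push `copies` copies of the entry `depth` places below the top (0 = top)."""
    entry = stack[-1 - depth]
    for _ in range(copies):
        stack.append(copy.deepcopy(entry))


def tree_delete(stack: list, count: int) -> None:
    """Remove `count` subtrees from the top of the stack."""
    if count > len(stack):
        raise IndexError("cannot delete more subtrees than are on the stack")
    del stack[len(stack) - count:]


if __name__ == "__main__":
    s = ["a", "b", "c", "d"]              # "d" is the top of the stack
    tree_rotate(s, 3, 1)
    print(s)
```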
Group 0c: Source Text Symbol Information
These opcodes allow the inclusion of source line and column number information
in the Rcode, from where it can be included in some target-system debugger
symbol-table. Like the operations in the previous sub-group, they occur
'free-floating', not as part of the Rcode tree structure.
Line number information, if present, takes the form of two explicit operations.
Column number information, however, may optionally (see Standard Pragmas) be
associated with every Rcode tree node, where it occurs as an extra byte following
the opcode, but preceding the additional opcode-specific literal information. This
byte specifies the column position in the current source line corresponding to the
Rcode operation, as a non-negative increment over the position of the operation
which preceded it in the file. The column number is initialised to one at the start
of each source line.
Set_Sourceline (T)
• new source line counter value
Sets the current source line counter to the specified number, and
resets the column counter to one.
Inc_Sourceline (T)
Increments the current source line counter by 1, and resets the
column counter to one.
Advance_Columnpos (T)
• one-byte amount to be added to the current column position
Adds the specified amount to the column counter.
This is useful where a gap of more than 255 characters occurs
between successive Rcode operations coming from the same source
line. It is also usable when column information is not present on
every opcode.
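The way a later compiler phase might track source positions from these
operations can be sketched as follows. The class SourcePosition and its method
names are hypothetical. The initial line counter value of 0 is an assumption,
since the manual only defines the initial column value.

```python
class SourcePosition:
    """Tracks the current source line and column while reading Rcode."""

    def __init__(self) -> None:
        self.line = 0      # assumed starting value; not defined by the manual
        self.column = 1    # the column counter starts at one on each line

    def set_sourceline(self, line: int) -> None:
        self.line = line
        self.column = 1

    def inc_sourceline(self) -> None:
        self.line += 1
        self.column = 1

    def advance_columnpos(self, amount: int) -> None:
        if not 0 <= amount <= 255:
            raise ValueError("the amount is a single byte")
        self.column += amount

    def node(self, increment: int) -> tuple:
        """Apply the per-node column byte and return the (line, column) of the node."""
        self.advance_columnpos(increment)
        return (self.line, self.column)


if __name__ == "__main__":
    pos = SourcePosition()
    pos.set_sourceline(10)
    print(pos.node(4))
```

A gap of 300 columns between two operations could then be expressed as an
Advance_Columnpos of 255 followed by a per-node increment of 45.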
Group 0d: Code Generation Control
The operations indicated by the opcodes in this subgroup also occur
free-floating - independent of the tree structure.
Pragma (S)
- pragma operands
- text information as a literal
• pragma code id
A pragma is meant to convey implementation-specific information to
control aspects of the code generation process. Standard pragmas
are defined at the end of this Reference Manual. The list of pragmas
which may be defined for a particular language should be contained
in a Language System User Guide or equivalent document.
Group 1: Declarations
Group 1a: Globally-occurring Declarations
These global declarations may only occur at the outermost (top) level of a
compilation unit.
Begin_Unit (S)
- unit name
- unit body itself
- the end_unit code (see below)
• year (four characters)
• month
• day of month
• hour (twenty-four hour clock)
• minute
• second
• hundredths of a second (all two characters)
• Target machine type (code values specified for project)
• Target machine version
• Target machine variant
• Target OS type (code values specified for project)
• Target OS version
• Target OS variant
• Rcode version
• Rcode variant
• Bits per byte
• Bytes per short integer
• Bytes per default integer
• Bytes per address
This opcode marks the beginning of a compilation unit, and provides
some information which may be used for consistency checking.
The literal parameters form a time-stamp which corresponds in
format to an ANSI proposed standard over all but the last six
'fields'. All code values specified for the Portable Language
Implementation Project are contained in appropriate compiler/linker
Definition modules. They need not, of course, be used by any other
system making use of Rcode. The time-stamp facilitates version
checking. It is meant to be passed on to a linker program, so that,
when linking together the complete user program, the linker can
check that different units importing the same unit have not imported
different versions of that unit.
End_Unit (S)
• module kind (face, body or entire)
• number of imported modules
• number of generic argument groups (0 if non-generic)
• number of internal declarations
This opcode marks the end of a compilation unit.
'Module face', 'module body' and 'entire module' are Peano
language terms. There are corresponding terms in other modular
languages. A module face consists of definitions which are read by
the compiler front end in order to perform interface consistency
checking when compiling an importing unit, and also when
compiling the corresponding module body - the compiler back end should
never see them. A module body is that compilation unit which
corresponds to a known module face, making direct use of the
definitions therein. An entire module is complete and self-contained,
having no separate module face containing definitions. The
difference between a module body and an entire module is not
important to the back end of a compiler.
Import_Unit (S)
- name of imported unit
• id number assigned to unit
• date/time/version information
Specifies a compilation unit which is imported by this one. Import
information is passed on to the linker, where it is used to sort out
what units are needed in the final program, and in what order they
are to be initialised and finalised. The date/time/version information
is identical in form to the literal fields of the Begin_Unit opcode.
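The version check that a linker might perform with this information can be
sketched as follows. The function name, the tuple layout and the time-stamp
strings are all hypothetical; the manual specifies only that the import record
carries the same fields as Begin_Unit and that the linker compares them.

```python
from collections import defaultdict


def find_version_conflicts(imports):
    """Return {imported unit: sorted time-stamps} for units imported in more than one version.

    `imports` is an iterable of (importing_unit, imported_unit, time_stamp).
    """
    stamps = defaultdict(set)
    for _importer, imported, stamp in imports:
        stamps[imported].add(stamp)
    return {unit: sorted(seen) for unit, seen in stamps.items() if len(seen) > 1}


if __name__ == "__main__":
    # Illustrative records only; the time-stamp format here is made up.
    records = [
        ("Main", "Lists", "1987-03-01 10:15:00.00"),
        ("Queue", "Lists", "1987-03-02 09:00:00.00"),
        ("Main", "Queue", "1987-03-02 09:30:00.00"),
    ]
    print(find_version_conflicts(records))
```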
Declare_Area (S)
- optional explanatory text, to be displayed on a linker map
• id number assigned to area
• alignment required
• area protection :- read-only or read/write
This opcode declares a static storage area. The size of the area is
initially zero; space is allocated in it using extend-area and
append-area instructions.
The explanatory text, if present, should take the form of a short
block literal instruction. This literal information, interpreted as a
text string, is passed on to the linker, which may display it on a
storage allocation map.
Group 1b: Other Declarations
Declarations in this group may occur at any point within a compilation unit,
dependent upon the source language from which they were generated.
Extend_Area (S)
• number of area to extend
• how many bytes to extend area by
Increase the size of the storage area by the specified number of
uninitialised bytes. It is up to the front end of a compiler to keep track
of (and adjust) the current size of the area, in order to satisfy any
alignment requirements for the new extension.
Append_Area (S)
- literal information
• number of area to append to
• how many bytes of literal information to store
This increases the size of the storage area by the specified number of
bytes, and initialises the newly-allocated storage with the given
information. It is up to the front end of a compiler to keep track of
(and adjust) the current size of the area, in order to satisfy any
alignment requirements for the new extension.
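The bookkeeping expected of a front end can be sketched as a small tracker
that pads the area before each new extension. The class AreaTracker is
hypothetical, and the sketch assumes that the area itself was declared with an
alignment at least as strict as that of any object placed in it.

```python
class AreaTracker:
    """Front-end record of the current size of one storage area."""

    def __init__(self) -> None:
        self.size = 0     # a newly declared area is initially of size zero
        self.ops = []     # the Extend_Area operations that would be emitted

    def allocate(self, nbytes: int, alignment_code: int) -> int:
        """Reserve nbytes at an offset divisible by 2**alignment_code; return the offset."""
        unit = 1 << alignment_code
        padding = -self.size % unit
        if padding:
            # Extend by filler bytes so that the next extension starts aligned.
            self.ops.append(("Extend_Area", padding))
            self.size += padding
        offset = self.size
        self.ops.append(("Extend_Area", nbytes))
        self.size += nbytes
        return offset


if __name__ == "__main__":
    area = AreaTracker()
    area.allocate(3, 0)
    print(area.allocate(4, 2), area.ops)
```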
Declare_Constant (D)
- expression giving value
• assigned id number
• lexical level
• alignment required
Declares a constant entity and specifies its value.
Declare_Variable (D)
- expression giving size in bytes
• assigned id number
• lexical level
• alignment required for the allocated storage
Declares a variable entity and specifies how much storage it should
be allocated - but NOT where this storage should be.
The lexical level is present on this and the Declare_Constant
operations in order to allow optimisation of access to statically-sized
entities. A compiler back end may simply allocate the appropriate
amounts of space in the stack frame, and remember the (static)
starting offsets, with which it can replace all references to those entities.
At any point where access to an outer block is required, the
complete tree for the outer block need not have been built, since the
code generator may be operating on a routine at a time, rather than
for an entire compilation unit. However, if a compiler front end
could be written to run as one pass (as it could be for Pascal say),
the declaration of a constant or variable would certainly have been
seen before any references to it. The explicit presence of a lexical
level on the declaration therefore completes the information
necessary for a compiler back end to recognise that this has indeed
happened.
To further support the generation of code for a routine at a time,
it is also required of a compiler front end that entity declarations
local to different blocks at the same lexical level are not
intermingled. Particularly, for two successive blocks at the same lexical
level, the declarations local to the second block should all occur after
the declaration of the first block. In this way, successive entity
declarations with the same lexical level (with no intervening ones at
outer lexical levels) may be assumed by the back end to belong to
the same block.
Declare_Type (T)
- size of objects of the type in bytes, or null if unbounded
- size of objects of the type in bits, or null if unbounded
- code for performing run-time consistency checks
- type definition itself
• assigned id number
• lexical level
• nesting level within compilation unit
• alignment required for objects of the type
• packed or not
• machine type or not
• basic machine type coding
• length in bits
• length in bytes
• is an unsafe type
• is a forward definition
This opcode forms an Rcode type descriptor - in terms of the
high-level language definition. The run-time consistency checking code
could be something like ensuring that the lower bound of a subrange
is not greater than the upper bound plus one. The Rcode expression
comprising the type definition consists of one of the operators in
group 7c (below).
Note that all type definitions (whether they are directly given a
name via a type definition or not) in the source language get
translated into corresponding type entity declarations. Types which
were just part of other types get referred to in the appropriate
places, in the latter, using one of the operations in group 7b (below).
Declare_Routine (D)
- code for the routine itself
• assigned id number
• lexical level
• is it forward
• is it immediate
• is it a macro routine
• is proper routine entry/exit code to be generated
• size of function result, or 0 if none or variable-length
An immediate procedure or function is one that may be used in
constant expressions. This information is really only meaningful to a
compiler front end.
Macro routines are those for which calls are expanded in-line.
Hence, no code need be generated for them; the back end of a
compiler system should never see them.
The static size of the function result is included for deciding
whether the result should be returned in a register or not. To ensure
consistency with importing units, the decision should be made based
on this size alone.
Declare_Module (T)
- zero or more generic arg group definitions
- zero or more contained declarations
• assigned id number
• lexical level
• whether face, body or entire
• number of generic argument groups, or 0 if non-generic
• number of contained declarations
This opcode declares a module entity. If this is a non-generic
module then the code generator should inspect its contents to
generate machine code and/or interpret the Rcode as required.
Declare_Label (D)
• assigned id number
• lexical level
This defines a label. Note that Goto's and labels are provided only
for implementing Pascal-like languages. They are not required for
any other purpose.
Define_Symbol (S)
- name of symbol
- expression giving value
• data type
Defines a global symbol in the target-system-dependent relocatable
object language. This permits Rcode to express the necessary
interfacing to operating-system library routines and other software.
Group 1c: Object Characteristics
The opcodes in this sub-group identify general characteristics for all kinds of
objects defined in symbol table form, characteristics which are of use when a
compiler is importing definitions to form symbol table entries, but which could/would
be ignored in the back-end of a compiler, by a debugger, etc.
Whole_Object (T)
- formal argument list
• is built-in to the source language
• forms part of a standard library
• argument group number
• is exported
• kind of parameter (or not)
This defines characteristics which are common to complete objects of
whatever kind they may be. The kind of parameter relates to the
kind of calling convention appropriate to this particular object (eg by
value as variable, by value as constant, as fixed type, as variable
type, etc).
Part_Object (T)
- field offset
- field width
• is a bit field or not
• is exported
• kind of parameter (or not)
Defines characteristics which are common to components of objects
of whatever kind they may be.
Group 2: Values of Things
The operations in this group return values without operating on any operands.
Multiple versions of the 'return-literal' opcode are provided, to save space where
only small amounts of literal information are wanted.
Byte_Lit (F)
• one byte of literal information
Returns the literal information as value.
Short_Word_Lit (F)
• two bytes of literal information
Returns the literal information as value.
Long_Word_Lit (F)
• four bytes of literal information
Returns the literal information as value.
Short_Block_Lit (F)
• one-byte count of how many bytes of literal information
• one or more bytes of literal information (255 bytes max)
Returns the literal information as value.
Literal (F)
• count of how many bytes of literal information
• one or more bytes of literal information (up to 65536)
Returns the literal information as value.
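A front end wishing to save space might pick the literal opcode from the
length of the data, as in the sketch below. The function name literal_opcode is
hypothetical, and the preference for the fixed-size opcodes over
Short_Block_Lit (which needs an extra count byte) is an assumption; the manual
leaves the choice to the front end.

```python
def literal_opcode(data: bytes) -> str:
    """Name the most compact literal opcode able to carry `data`."""
    n = len(data)
    if n == 0 or n > 65536:
        raise ValueError("a literal carries between 1 and 65536 bytes")
    if n == 1:
        return "Byte_Lit"
    if n == 2:
        return "Short_Word_Lit"
    if n == 4:
        return "Long_Word_Lit"
    if n <= 255:
        return "Short_Block_Lit"    # the count fits in a single byte
    return "Literal"


if __name__ == "__main__":
    for sample in (b"A", b"AB", b"ABC", b"ABCD", bytes(300)):
        print(len(sample), literal_opcode(sample))
```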
Refer_Constant (F)
• id number
• lexical level
Returns the value of the specified constant entity.
Generic_Refer_Constant (T)
• id number
• lexical level
• generic nesting level
This opcode only occurs within a generic entity, when referring to
some constant entity declared within it. It facilitates re-assignment
of lexical levels and id numbers, when instantiating the generic
entity.
Refer_Variable (F)
• id number
• lexical level
Returns the address of the specified variable entity.
Generic_Refer_Variable (T)
• id number
• lexical level
• generic nesting level
This opcode only occurs within a generic entity, when referring to
some variable entity declared within it. It facilitates re-assignment
of lexical levels and id numbers, when instantiating the generic
entity.
Refer_Area (F)
• id number
This returns the address of the start of the specified storage area in
the current compilation unit.
Refer_Imported_Area (F)
• area number within module
• module number
Returns the address of the start of the storage area from the specified
imported unit.
Refer_Symbol (F)
• name of symbol
• data type
Returns the value of an externally-defined global symbol in the
target-system-dependent relocatable object language. This allows
Rcode to interface to operating-system library routines and other
software.
Refer_Arglist (F)
• lexical level
This opcode returns the address of the start of the argument list for
the specified block. This address is intended for use in address
arithmetic (to perform offsetting) prior to doing a load, to obtain the
value of some argument. It should not be used to store into the
argument list.
WARNING If the argument list is zero-length, the address returned
from the operation specified by this opcode is undefined.
Refer_Result (F)
• lexical level
Returns the address of the start of the function result area for the
specified block. If the result area is zero-length, the address
returned is undefined.
Refer_Routine (F)
• id number
• lexical level
This returns a descriptor for the specified routine. The lexical level
should agree with that in the routine declaration. A descriptor
consists of the address of the routine, followed by an environment
pointer value (eg the static link, or alternatively a pointer to a
display array).
Generic_Refer_Routine (T)
• id number
• lexical level
• generic nesting level
This opcode only occurs within a generic entity, when referring to
some routine entity declared within it. It returns a descriptor for the
specified routine.
Refer_Imported_Routine (F)
• id number of routine within unit
• unit number
This returns a descriptor for the specified routine from the specified
imported unit. Note that within that unit, the routine must be
marked as being exported. This may be done by source-level
declaration or internally by a compiler front end - as appropriate for
the language concerned.
Group 3: Basic Operand Manipulation
This group of opcodes defines all forms of manipulation of operands excluding
any form of transformation (ie not including logic or arithmetic).
Group 3a: Memory Operations
Note that loading returns an object found in store, while storing places an
object into a given location. A conventional machine Load operation comprises
both of these since both a source and a destination are involved.
Storage is meant to occur at some time after the "issue" of a Store opcode,
and then only if necessary. This allows for delayed storage operations to reduce
memory accesses when generating machine code. The Update opcodes are
therefore provided to indicate that any holdup must be cleared and the location
immediately updated.
Block_Load (F)
- source address for value to load
- length in bytes of value to load
• alignment that may be assumed for source address
Returns the specified number of bytes of data beginning at the
specified address. The block may be zero-length.
The alignment value is provided to allow a machine code
generator to produce code to move more than a byte at a time, if this
would be more efficient on the target machine.
Bit_Block_Load (F)
- base byte address of source value to load
- bit offset from source base address
- length in bits of value to load
This opcode returns the specified number of bits of data beginning at
the specified bit offset from the given base address. The block may
be zero-length.
Block_Store (P)
- destination address
- value to store
• alignment that may be assumed for destination address
This opcode stores the value in memory as a whole number of bytes,
beginning at the specified address.
The alignment value is provided to allow units larger than a byte
to be moved at a time, if this would be more efficient on the target
machine.
Bit_Block_Store (P)
- base destination address
- bit offset from base destination address
- value to store
Stores the value in memory, as some number of bits, beginning at
the specified offset from the given base address.
Block_Update (P)
- address of block
- size of block, in bytes
• alignment that can be assumed for address of block
This operation indicates that optimisations performed on accesses to
the specified block of store (delayed loads or delayed stores) are not
to be carried across an occurrence of the operation. It allows the
programmer to specify checkpoints for atomic operations on shared
variables.
Bit_Block_Update (P)
- base byte address for block
- bit offset of start of block from base address
- size of block to be updated, in bits
The effects of this opcode are similar to Block_Update, except that
the block is a bitstring beginning on an arbitrary bit boundary.
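The interaction between delayed stores and the Update operations can be
modelled with a small sketch. The class DelayedStore is hypothetical and is
only one possible reading of the text: stores are held back in a pending table,
loads see the pending values, and Block_Update forces the named block out to
memory.

```python
class DelayedStore:
    """Toy model of byte-addressed memory with delayed stores."""

    def __init__(self, size: int) -> None:
        self.memory = bytearray(size)
        self.pending = {}                 # address -> byte value not yet written

    def block_store(self, address: int, value: bytes) -> None:
        """Record the store; memory itself is not touched yet."""
        for i, byte in enumerate(value):
            self.pending[address + i] = byte

    def block_load(self, address: int, length: int) -> bytes:
        """Return the current value, taking any delayed stores into account."""
        return bytes(self.pending.get(a, self.memory[a])
                     for a in range(address, address + length))

    def block_update(self, address: int, size: int) -> None:
        """Clear any holdup on the block so that memory is up to date."""
        for a in range(address, address + size):
            if a in self.pending:
                self.memory[a] = self.pending.pop(a)


if __name__ == "__main__":
    store = DelayedStore(8)
    store.block_store(2, b"\x01\x02")
    print(bytes(store.memory), store.block_load(2, 2))
    store.block_update(2, 2)
    print(bytes(store.memory))
```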
Group 3b: Operations on Structures
Select (F)
- byte offset into value for part to select
- size in bytes of part to select
- value to select from
Returns the specified portion of the given value. This operation is
used to implement selection of components of structures.
Bit_Select (F)
- bit offset into value for part to select
- size in bits of part to select
- value to select from
Returns the specified portion of the given value. This operation is
useful for implementing selection of components of packed
structures.
Construct (F)
- zero or more components
• number of components
This opcode combines the given component values into a
physically-contiguous block value. Each component occupies some whole
number of bytes, with no gaps. This operation is used, for example,
to implement constructors in the Peano language, or to assemble the
list of parameters for a routine invocation. The components must
each be (at the top level) a Construct_Component operation, which
is described below.
Packed_ Construct (F)
- zero or more components
• number of components
This opcode has similar meaning to a Construct operation, except
that each component is packed into some number of bits, again with
no gaps.
Construct_ Component (F)
- replicator count, defaults to 1 if null
- size to give component within overall structure
- value of component
Specifies a component of either a Construct or a Packed_ Construct
operation.
The size to give the component is interpreted as bytes for a
Construct, and bits for a Packed_ Construct. If the actual component
value is smaller than this, it gets padded with zero bits at the
high-address end to the specified size; if larger, it gets truncated at
the high-address end. It is the job of a compiler front end to determine
size values such that, if the complete structured value is given its
required alignment, each component ends up with an alignment
suitable for its data type.
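The padding and truncation rule for components can be sketched as follows. This is only an illustrative model, not part of Rcode: values are represented as Python byte strings whose index 0 is assumed to be the lowest address, the names fit_component and construct are hypothetical, and the replicator count is assumed to mean that the fitted component is repeated that many times.

```python
def fit_component(value: bytes, size: int) -> bytes:
    """Pad with zero bytes, or truncate, at the high-address end."""
    if len(value) < size:
        return value + bytes(size - len(value))
    return value[:size]


def construct(components) -> bytes:
    """Model of a byte-level Construct.

    components is a list of (replicator, size_in_bytes, value) triples;
    a replicator of None stands for a null operand and defaults to 1.
    """
    block = bytearray()
    for replicator, size, value in components:
        count = 1 if replicator is None else replicator
        # Assumption: the replicator repeats the fitted component.
        block += fit_component(value, size) * count
    return bytes(block)


# A 2-byte value padded to 4 bytes, then a 2-byte value truncated to
# 1 byte and replicated twice.
example = construct([(None, 4, b"\x01\x02"), (2, 1, b"\xaa\xbb")])
```

The components are laid end to end with no gaps, so any alignment of later components depends entirely on the sizes chosen for the earlier ones.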
Group 3c: Other 
Call_ (F/P) 
- size of result area, or null if fixed
- composite value of argument list
- routine descriptor
• size of function result area, or 0 if variable or zero-length
• alignment required for argument list
• alignment required for result area
This opcode calls the specified routine. The argument list may be
built, for example, by using the Construct operation. Used in a
context where a value is required this opcode implies a function call,
otherwise a procedure call.
Group 4: Control
Group (P)
- body of group
This operation implements the grouped statement, such as a
'compound statement' source language style of construction. A grouped
statement may be left in the middle with an Exit_ operation (see
below). The body of the group is simply a statement or sequence of
statements, none of which returns a value.
Loop_ (P)
- body of loop
This operation implements the loop statement in typical high-level
languages. It may also be used for the other loop constructs present
in several languages.
Local_ Block (F/P)
- size of result area, or null if fixed
- composite value of argument list
- body of block
• lexical level
• alignment required for result area
• alignment required for argument list
• size of result area, or 0 if none or variable-length
This declares a local block, which is equivalent to an inline routine
call. It may have its own local declarations, and may also be left
with an Exit_ operation (see below). If it is used in a context where
a value is required then it implies a function local block, otherwise a
procedure local block.
For_ (P)
- one or more for-index definitions
- body of for-loop
• lexical level
• number of for-indexes
Implements a for-statement, with the for-indexes declared as local
constants, and with static increments of either plus or minus 1. This
directly corresponds to the for-statement of languages such as Peano
or Ada. Other languages, with different semantics, could still
implement their for-statements in terms of this primitive.
The body of the for-loop is performed for all possible combinations
of values of the for-indexes, starting from their initial values up
to their final values, in steps of plus or minus 1 (as specified in each
index definition), with the last for-index varying the most rapidly.
This operation is like a local block, in that the body may contain
declarations local to the for-statement. Indeed, the for-indexes
themselves are effectively declared as constant entities within the body
of the for-loop, with id numbers assigned in sequence, with the first
for-index being 1.
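The iteration order of a for-statement with several for-indexes can be sketched as below. This is an illustrative model only: each index definition is represented as a hypothetical (initial, final, backwards) triple, where the third element is the step direction given in the index definition, and the function names are not part of Rcode.

```python
from itertools import product


def index_values(initial: int, final: int, backwards: bool) -> range:
    """Values taken by one for-index; empty if the bounds are crossed."""
    step = -1 if backwards else 1
    return range(initial, final + step, step)


def for_iterations(index_defs):
    """All index combinations, with the last index varying most rapidly."""
    return list(product(*(index_values(*d) for d in index_defs)))


# First index counts 1..2 forwards, second counts 3..2 backwards.
order = for_iterations([(1, 2, False), (3, 2, True)])
```

If any one index has an empty range, the body is not performed at all, since no combination of index values exists.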
For_ Index (D)
- initial value
- final value
• data type of for-index
• backwards or not
Defines a for-index for a for-statement. The data type must not be
real. If 'backwards' is true, the step is minus 1, so if the initial
value is less than the final value, the for-loop is not executed at all.
Otherwise, if 'backwards' is false, the step is plus 1. Then for the
number of iterations of the for-loop to be zero, the initial value must
be greater than the final value.
Exit_ (P)
• number of levels to exit
Exits the specified innermost number of levels of group, loop,
for-statement or block. If an Exit_ operation causes an exit from a
block, it must be the last (outermost) construct of the ones being left.
An Exit_ may not cause an exit from more than one block.
Case_ (F/P)
- selecting expression
- zero or more alternatives
- else part
• data type of selecting expression (must not be real)
• number of alternatives
This opcode is used to implement both case- and if-statements in
typical high-level languages. The selecting expression is used to
select an alternative that has a case-label (see below) with a
matching value. If no such alternative is found, the else-part alternative
is selected for execution.
An if-statement is represented as a case-statement with a boolean
selecting expression, and appropriate labels on its alternatives.
All the alternatives (except the else-part) must, at the top level,
be Case_ Alternative operations (defined below). This operation
may also occur in expressions, where it returns a value. This will be
the value returned by the alternative which was selected for
execution. In this case, all the alternatives must be capable of returning
a value of an appropriate type.
NOTE A case-operation is implementable on several computers
using jump tables, on others using a case instruction of some
kind. However, if the selecting expression is known to have
only two or three values (for example, if the data type is
boolean or tristate), an implementation using comparisons
and conditional branches may be more efficient.
If there is more than one case-label matching a given value for
the selecting expression, the result is undefined.
Case_ Alternative (F/P) 
- one or more case label definitions
- alternative statement/expression itself
• number of labels
Defines an alternative in a case-statement or case-expression. The
parent node must be a Case_ operation. An alternative is selected
for execution if one of its associated labels matches the value of the
selecting expression.
Byte_ Case_ Label (S)
• one-byte label value
Defines a single case label value of some byte-sized data type (the
data type is specified in the Case_ operation).
Byte_ Range_ Case_ Label (S)
• one-byte low bound
• one-byte high bound
Defines a range of label values of some byte-sized data type (the
data type comes from the Case_ operation). This label matches any
value of the selecting expression from the low bound up to the high
bound, inclusive. If the low bound is greater than the high bound,
this represents the null range, which matches no values.
Shortword_ Case_ Label (S)
• two-byte label value
Defines a single case label value of some shortword-sized data type
(the data type is specified in the Case_ operation).
Shortword_ Range_ Case_ Label (S)
• two-byte low bound
• two-byte high bound
Defines a range of label values of some shortword-sized data type
(the data type comes from the Case_ operation). This label matches
any value of the selecting expression from the low bound up to the
high bound, inclusive. If the low bound is greater than the high
bound, this represents the null range, which matches no values.
Longword_ Case_ Label (S)
• longword-sized label value
Defines a single case label value of some longword-sized data type
(the data type is specified in the Case_ operation).
Longword_ Range_ Case_ Label (S)
• longword-sized low bound
• longword-sized high bound
Defines a range of label values of some longword-sized data type
(the data type comes from the Case_ operation). This label matches
any value of the selecting expression from the low bound up to the
high bound, inclusive. If the low bound is greater than the high
bound, this represents the null range, which matches no values.
Case_ Label (S)
- large label value
Defines a single case label value of some data type with the same
size (the data type is specified in the Case_ operation).
NOTE This operation is reserved for expansion to integer data sizes
larger than a longword.
Case_ Label_ Range (S)
- label value low bound
- label value high bound
Defines a range of label values of some data type with the same size
(the actual data type comes from the Case_ operation). This label
matches any value of the selecting expression from the low bound up
to the high bound, inclusive. If the low bound is greater than the
high bound, this represents the null range, which matches no values.
NOTE This operation is reserved for expansion to integer data sizes
larger than a longword.
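The way case labels select an alternative can be sketched as follows. This is a simplified model under stated assumptions: a single label is represented as an integer, a range label as a hypothetical (low, high) pair, and the function names are illustrative rather than part of Rcode. Where several labels match, Rcode leaves the result undefined; this sketch simply takes the first match.

```python
def label_matches(label, value: int) -> bool:
    """A label is an int (single value) or a (low, high) pair (range)."""
    if isinstance(label, tuple):
        low, high = label
        # A range with low > high is the null range and matches nothing.
        return low <= value <= high
    return label == value


def select_alternative(selector: int, alternatives, else_part):
    """alternatives is a list of (labels, body) pairs."""
    for labels, body in alternatives:
        if any(label_matches(label, selector) for label in labels):
            return body  # first match; Rcode leaves duplicates undefined
    return else_part


alternatives = [
    ([1, (3, 5)], "first"),   # single label 1 and range 3..5
    ([(9, 7)], "never"),      # null range
]
```

A back end is free to realise the same selection with a jump table or with comparisons and conditional branches; the sketch only models which alternative is chosen.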
Sequence (F/P)
- zero or more statement subtrees
• number of subtrees in sequence
The statements of the sequence are to be executed in strict sequential
order. This operation may occur in an expression, where it returns
whatever value is returned from the last subtree in the sequence.
Set_ (F/P)
- zero or more statement subtrees
• number of statements in set
The statements of the set are to be executed in some unspecified
order. The compiler back end may even take advantage of
parallelism in the target hardware, and execute parts of the set concurrently.
This operation may occur in an expression, where it returns whatever
value is returned from the last subtree in the set, irrespective of
which subtree completes execution last.
No_ Transplant (F)
- subtree to be isolated
In general, the back end of a compiler may take advantage of the
associativity of operators to rearrange expressions such as
(a + b) + c into a + (b + c), to permit further transformations in
the interests of efficiency. However, this may have implications on
the accuracy of the resulting computation. The No_ Transplant
operation permits a compiler front end to set limits to this
rearrangement; specifically, no portion of the subtree below the
No_ Transplant may be moved into the part of the tree above, nor is
movement in the opposite direction permitted. Apart from this, the
No_ Transplant operation simply returns whatever value is returned
from the subtree.
Link (P)
- head of list
- rest of list
This opcode enables the construction of linked lists of opcodes as,
for example, in the Sequence and Set_ codes above. Except for the
last element in the list the 'rest' will be another Link opcode.
Goto (P)
• id number of destination label
• lexical level of destination label
Implements the Goto statement found in some programming
languages.
Label (S)
• id number of label
Defines the label to point to the current place in the block. For
example, if this occurs in a sequence, transferring control to the label
will cause execution to resume at that point in the sequence.
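The accuracy concern that motivates the No_ Transplant operation in this group can be illustrated with ordinary floating-point arithmetic. The sketch below assumes that Python floats are IEEE 754 double-precision values, as on common platforms; the variable names are illustrative only.

```python
# Three real values whose sum depends on the order of evaluation.
a, b, c = 1e20, -1e20, 1.0

# Evaluating (a + b) first cancels the large terms exactly.
left_first = (a + b) + c

# Evaluating (b + c) first loses c, because 1.0 is far smaller than the
# spacing between representable values near 1e20.
right_first = a + (b + c)
```

Here left_first and right_first differ, although the two expressions are algebraically equal; a front end that depends on the first form would wrap it in a No_ Transplant to prevent the rearrangement.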
Group 5 : Arithmetic 
Group 5a: Memory Operations
This subgroup models subgroup 3a, except that all objects are of basic types
only.
Load (F)
- address to load from
• data type of value to load
Returns the value at the specified address.
Store (P)
- address to store value at
- value to store
• data type of value to store
Stores the value at the specified address. The size specified by the
data type should agree with that of the value being stored.
Update (P)
- address to update
• data type of value to update
This opcode indicates that optimisations performed on accesses to
the specified piece of memory (delayed loads or delayed stores) are
not to be carried across an occurrence of the operation. It allows the
programmer or a compiler front end to specify checkpoints for
atomic operations on shared variables.
Group 5b: Numeric Operations
Add (F)
- first operand
- second operand
• data type
If the data type is boolean, this performs an inclusive-or operation
on its operands, taken as boolean values. If the data type is signed,
unsigned or unspecified exact, or real, it performs the appropriate
numeric add operation on its operands, taken as values of the
corresponding type. The tristate data type is not allowed.
Subtract (F)
- first operand
- second operand
• data type
If the data type is signed, unsigned or unspecified exact, or real, this
performs the appropriate numeric subtract operation on its operands,
taken as values of the corresponding type. Boolean and tristate data
types are not allowed.
Negate (F)
• value to negate
• data type
If the data type is boolean, this performs a not-operation on its
operand, returning false if its value is true, true if its value is false.
If the data type is tristate, this operation returns -1 if its operand
value is +1, +1 if the operand is -1, and 0 if the operand is 0. It
could thus be used to invert the sense of a comparison. If the data
type is signed or unspecified exact, or real, this performs the
appropriate numeric negate operation on its operand, taken as a
value of the corresponding type. The unsigned exact data type is not
allowed.
Absolute_ Value (F)
- operand value
• data type
If the data type is signed exact or real, this performs the appropriate
absolute-value operation on its operand, taken as a value of the
corresponding type. The boolean, tristate, unsigned exact and
unspecified data types are not allowed.
Multiply (F)
- first operand
- second operand
• data type
If the data type is boolean, this performs an and-operation on its
operands, taken as boolean values. If the data type is signed,
unsigned or unspecified exact, or real, it performs the appropriate
numeric multiply operation on its operands, taken as values of the
corresponding type. The tristate data type is not allowed.
Divide (F)
• dividend
- divisor
• data type
If the data type is signed, unsigned or unspecified exact, or real, this
performs the appropriate numeric divide operation on its operands,
taken as values of the corresponding type. The boolean and tristate
data types are not allowed.
Modulus (F)
- dividend
- divisor
• data type
If the data type is signed, unsigned or unspecified exact, this returns
the remainder on dividing the dividend value by the divisor. The
boolean, tristate and real data types are not allowed.
Group 5c: Extended-precision Numeric Operations
These special opcodes are provided to simplify the task of a back-end code
generator which will frequently have to adopt a different strategy for multiple
length arithmetic of all kinds.
Block_ Add (F)
• first operand
- second operand
• data type
Performs a multiple-precision add operation on its operands, which
may be any number of bytes long, but should be of the same size.
The size field in the data type is ignored; the only valid types are
signed exact, unsigned exact and unspecified.
Block_ Subtract (F)
- first operand
- second operand
• data type
Performs a multiple-precision subtract operation on its operands,
which may be any number of bytes long, but should be of the same
size. The size field in the data type is ignored; the only valid types
are signed exact, unsigned exact, and unspecified.
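A byte-at-a-time model of the multiple-precision add and subtract operations is sketched below. It rests on assumptions the manual does not state: operands are modelled as Python byte strings with the least significant byte first, and any carry or borrow out of the most significant byte is discarded. The function names are illustrative only.

```python
def block_add(x: bytes, y: bytes) -> bytes:
    """Add two equal-length operands, propagating a carry between bytes."""
    if len(x) != len(y):
        raise ValueError("operands should be of the same size")
    result = bytearray(len(x))
    carry = 0
    for i in range(len(x)):          # least significant byte first (assumed)
        total = x[i] + y[i] + carry
        result[i] = total & 0xFF
        carry = total >> 8
    return bytes(result)             # final carry discarded (assumed)


def block_subtract(x: bytes, y: bytes) -> bytes:
    """Subtract y from x, propagating a borrow between bytes."""
    if len(x) != len(y):
        raise ValueError("operands should be of the same size")
    result = bytearray(len(x))
    borrow = 0
    for i in range(len(x)):
        diff = x[i] - y[i] - borrow
        result[i] = diff & 0xFF
        borrow = 1 if diff < 0 else 0
    return bytes(result)
```

On a real target the loop body would typically map to an add-with-carry or subtract-with-borrow instruction, which is why these operations are kept separate from the ordinary Add and Subtract.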
Extended_ Multiply (F)
- multiplicand
- multiplier
• data type
This multiplies its two operands, returning a double-length result.
The data type must be signed exact.
Extended_ Divide (F)
• dividend
- divisor
• data type
Returns both the quotient and remainder of the division, in that
order, as a composite value. The size of the divisor (and of the
quotient and the remainder) is given directly by the data type; the size
of the dividend is twice this. The data type must be signed exact.
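The size relationships of the two extended operations can be sketched with Python integers. This is a hedged illustration only: the bits parameter stands in for the size given by the data type, the function names are hypothetical, and division is assumed to truncate towards zero, which the manual does not specify.

```python
def signed_range(bits: int):
    """Inclusive bounds of a signed exact value of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def extended_multiply(a: int, b: int, bits: int) -> int:
    """Multiply two bits-wide operands; the product needs 2 * bits."""
    low, high = signed_range(bits)
    if not (low <= a <= high and low <= b <= high):
        raise OverflowError("operand does not fit the data type")
    return a * b  # always representable in 2 * bits as signed exact


def extended_divide(dividend: int, divisor: int, bits: int):
    """Divide a 2*bits-wide dividend; return (quotient, remainder)."""
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient          # truncation towards zero (assumed)
    remainder = dividend - quotient * divisor
    low, high = signed_range(bits)
    if not low <= quotient <= high:
        raise OverflowError("quotient does not fit the data type")
    return quotient, remainder
```

For example, with bits set to 16, extended_multiply(30000, 30000, 16) gives a value that needs 32 bits, and extended_divide(100000, 300, 16) brings a 32-bit dividend back to a 16-bit quotient and remainder.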
Extended_ Modulus_ Floating (F)
• real multiplicand
• real multiplier
- multiplier mantissa extension, integer
• data type for integer operands and integer part of result
• data type for real operands and real part of result
Does the floating multiplication, and returns the whole number and
fraction parts of the result separately, in that order, as a composite
value. The whole number part is returned as a signed integer, the
fraction part as a real.
Group 5d: Conversions
Convert (F)
- source value
• result data type
• source data type
Does conversions between operand precisions and between exact and
real values. The valid conversions are as follows :-
a. From signed/unsigned exact to signed/unsigned exact - this is
always valid provided that the value does not overflow the
bounds of the result type. This could occur if the result type is
smaller than the source type, but it also means that a negative
value of a signed exact type cannot be converted to an unsigned
exact.
b. From unspecified to unspecified - this is always valid, but
the result type must not be larger than the source type. The
truncated bits are simply thrown away - that is, overflow is ignored.
c. From signed exact to real - this should always be valid -
provided overflow does not occur.
d. From unsigned exact to real - this is always possible,
although there may be no hardware support for direct machine
code generation.
e. From real to signed exact - this is valid provided overflow
does not occur.
f. From real to unsigned exact - this is always possible
provided that integer overflow does not occur. Fractions are
truncated towards zero. This may not be supported by hardware for
machine code generation.
g. From real to real - this is valid provided overflow does not
occur. For conversion to a lesser precision, the result is
truncated or rounded depending on whether bit 7 of the result data
type is 0 or 1 respectively.
All other combinations of source and result data type are illegal.
Convert_ Tristate_ Boolean (F)
- source value
• byte containing conversion mask
Maps the source value, interpreted as being of tristate type, onto a
boolean value, according to the 3-bit conversion mask :-
• Bit 0 indicates what result to give if the source tristate value is
-1.
• Bit 1 indicates what result to give if the source value is 0.
• Bit 2 indicates the result if the source value is +1.
This operation is useful for mapping the result of a comparison
onto a boolean value, for implementing the comparison operators in
Pascal and Modula-2, for example.
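The conversion mask can be sketched as a three-entry lookup. In this illustrative model the tristate value is a Python integer in {-1, 0, +1}, the compare helper stands in for the tristate-returning Compare operation of Group 5e, and the mask constant names are hypothetical.

```python
def compare(left, right) -> int:
    """Tristate comparison: -1 less-than, 0 equal, +1 greater-than."""
    return (left > right) - (left < right)


def convert_tristate_boolean(tristate: int, mask: int) -> bool:
    """Bit 0 of the mask answers -1, bit 1 answers 0, bit 2 answers +1."""
    return bool((mask >> (tristate + 1)) & 1)


# Masks for the six usual relational operators (names are illustrative).
LESS = 0b001
LESS_OR_EQUAL = 0b011
EQUAL = 0b010
NOT_EQUAL = 0b101
GREATER_OR_EQUAL = 0b110
GREATER = 0b100

is_le = convert_tristate_boolean(compare(2, 5), LESS_OR_EQUAL)
```

One tristate comparison followed by a mask lookup is therefore enough to express any of the relational operators of a source language.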
Group 5e: Comparisons
Compare (F)
- left-hand operand
- right-hand operand
• data type
Performs the appropriate comparison operation on its operands,
returning a tristate result as follows :-
-1 means less-than,
0 means equal,
+1 means greater-than.
Any data type, except unspecified, is allowed.
Block_ Compare (F)
• left-hand operand
- right-hand operand
• data type
Performs a string-comparison operation on its operands taken data
type unit size at a time, making the comparison on a data type basis.
Both operands should be of the same size. It returns a tristate
result, with the same meanings as for Compare.
Group 5f: Bitstring Operations
Bit_ Op (F)
- left-hand operand
- right-hand operand
• operation mask
This is a generalised two-operand bitwise operation. The operand
sizes, in bits, should be equal: the result has the same size.
Each bit of the result is determined as follows :-
a. The corresponding bit of the right-hand operand is extracted,
left-shifted one place, and inclusive-ORed with the corresponding
bit extracted from the left-hand operand, to give an integer
number in the range 0 to 3.
b. The bit (of the low-order 4) in the operation mask which has
this number immediately gives the value of the result bit.
Some example masks are
2_ #1110 - bitwise-or (set union)
2_ #1000 - bitwise-and (set intersection)
2_ #0010 - bitwise-clear (asymmetric set difference)
2_ #0110 - bitwise-xor (symmetric set difference)
2_ #1011 - bitwise >= (reverse implication)
2_ #1101 - bitwise <= (implication)
2_ #1010 - identity operation on left-hand operand
2_ #0101 - complement first operand (set difference with
universal set)
Either operand may be null if, according to the operation mask, its
value is irrelevant to that of the result.
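Steps a and b above amount to using the operation mask as a four-entry truth table. A minimal sketch, assuming operands are modelled as non-negative Python integers of a known width in bits (the function name and the width parameter are illustrative, not part of Rcode):

```python
def bit_op(left: int, right: int, mask: int, width: int) -> int:
    """Apply the 4-bit operation mask to each bit position in turn."""
    result = 0
    for position in range(width):
        left_bit = (left >> position) & 1
        right_bit = (right >> position) & 1
        # Step a: right bit shifted left one place, ORed with left bit.
        index = (right_bit << 1) | left_bit
        # Step b: the mask bit with that number is the result bit.
        result |= ((mask >> index) & 1) << position
    return result


union = bit_op(0b1100, 0b1010, 0b1110, 4)         # bitwise-or
intersection = bit_op(0b1100, 0b1010, 0b1000, 4)  # bitwise-and
difference = bit_op(0b1100, 0b1010, 0b0010, 4)    # bitwise-clear
```

A code generator would normally recognise the common masks and emit a single machine instruction for each, falling back to a combination of instructions for the rest.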
Arithmetic_ Shift (F)
- value to be shifted
• how many bit places to shift
Returns a result of the same size as the value, shifted arithmetically
left (if positive) or right (if negative) by the number of places
specified.
Logical_ Shift (F)
- value to be shifted
• how many bit places to shift
Returns a result of the same size as the value, shifted logically left
(if positive) or right (if negative) by the number of places specified.
Rotate (F)
- value to be shifted
• how many bit places to shift
Returns a result of the same size as the value, rotated left (if
positive) or right (if negative) by the number of places specified.
Singleton (F)
• size in bits of result to produce
- offset of bit to be set to 1
Returns a bitstring with all the bits, except one, equal to 0.
The offset of the bit to be set must be less than the size
of the result.
Zero_ Extend (F)
- size in bits of result
- value to be extended
If the result size is greater than that of the second operand value, the
result is that value, with zero bits added at the most significant end,
to make up the required size. If the result size is less than that of
the operand value, the latter is truncated at the high-address end to
the required size. The bits lost should all be zero, otherwise a
run-time error occurs.
Sign_ Extend (F)
- size in bits of result
- value to be extended
If the result size is greater than that of the second operand value, the
result is that value, with its sign bit duplicated at the most significant
end, to make up the required size. If the result size is less than that
of the operand value, the latter is truncated at the most significant
end to the required size. The bits lost should all be equal to the sign
bit of the result, otherwise a run-time error occurs.
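The two extension operations, including their run-time checks on truncation, can be sketched as below. The model makes assumptions beyond the text: values are held as unsigned bit patterns in Python integers, the operand width is passed explicitly as value_bits, truncation is taken at the most significant end in both cases, and a Python RuntimeError stands in for the Rcode run-time error.

```python
def zero_extend(value: int, value_bits: int, result_bits: int) -> int:
    """Widen with zero bits, or narrow if only zero bits are lost."""
    if result_bits < value_bits and (value >> result_bits) != 0:
        raise RuntimeError("Zero_Extend: non-zero bits lost")
    return value & ((1 << result_bits) - 1)


def sign_extend(value: int, value_bits: int, result_bits: int) -> int:
    """Widen by copying the sign bit, or narrow if only sign copies are lost."""
    if result_bits >= value_bits:
        sign = (value >> (value_bits - 1)) & 1
        extra = result_bits - value_bits
        fill = (((1 << extra) - 1) << value_bits) if sign else 0
        return value | fill
    truncated = value & ((1 << result_bits) - 1)
    lost = value >> result_bits
    result_sign = (truncated >> (result_bits - 1)) & 1
    expected = ((1 << (value_bits - result_bits)) - 1) if result_sign else 0
    if lost != expected:
        raise RuntimeError("Sign_Extend: lost bits differ from sign bit")
    return truncated
```

Narrowing with either operation therefore doubles as a range check: the value survives only if it was already representable in the smaller size.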
Group 5g: Run-time Checks
Signal (D)
- error message text
This operation signals a run-time error. The message text should be
given by one of the literal operations.
Assert (D)
- boolean expression
- error message text
This operation signals a run-time error if the given condition is
found to be false. The message text should be given by one of the
literal operations.
Range_ Check (F)
- value to be checked
- low limit to be checked against
- high limit to be checked against
- error message text
• data type
The data type should be boolean, tristate, or signed or unsigned
exact. The value to be checked is returned as the result of the range
check operation. However, if this value is less than the low limit, or
greater than the high limit, a run-time error is signalled. The
message text should be given by one of the literal operations.
Either of the high or low limit expressions could be null,
meaning that the corresponding limit check is not to be performed. This
allows one-sided range checks, if the front end is able to determine,
from the logic of the program, which checks are not necessary.
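The behaviour of a range check with optional limits can be sketched as follows. In this illustrative model a null limit is represented by Python's None, a RuntimeError stands in for the signalled run-time error, and the function name is hypothetical.

```python
def range_check(value: int, low, high, message: str) -> int:
    """Return value unchanged, or signal an error if it is out of range.

    A limit of None models a null limit expression: that side of the
    check is simply not performed.
    """
    if low is not None and value < low:
        raise RuntimeError(message)
    if high is not None and value > high:
        raise RuntimeError(message)
    return value


# A one-sided check: only the upper limit is tested.
index = range_check(5, None, 10, "index out of range")
```

Because the checked value is returned, the operation can be wrapped around any subexpression without otherwise changing the tree.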
Group 6: Machine/System-dependent Operations
This section of Rcode contains a number of 'standard' system-dependent
operations which are needed on any machine which is to implement programs involving
concurrency, input/output, interrupts, etc. In addition there are a number of
machines which include particular instructions which could be used to achieve
special effects only applicable to that machine. These special effects are not usually
produced by a compiler front end generating Rcode, but more likely from using
the two-level language facility for inline Rcode which Peano offers. It is for this
reason that machine-dependent Rcodes are provided. Extensions for machines not
covered in this manual are permissible, provided that such extensions are ONLY
going to affect programmer generated Rcode and NOT compiler generated Rcode -
which would then no longer be portable.
Group 6a: Compiler Standard Extensions
The opcodes in this sub-section may be generated by compiler front ends and
will be understood by all code generators or interpreters provided as part of the
Portable Language Implementation Project.
Input (P)
- i/o port address
- buffer location
• number of units to transfer
• data type
Transfer the specified number of data type objects from the
addressed port into the machine buffer specified. The port address
is expected to be valid for some input device on the machine
concerned. The data type must specify a size of object which
corresponds to the port concerned, for example a byte for a byte
port!
Input_ Move (P)
- i/o port address
- buffer location
• number of units to transfer
• data type
This opcode specifies the transfer of the given number of data type
units from the addressed port into the buffer (if non-null) and
returning the value. The port address is expected to be valid for some
input device on the machine concerned. The data type must specify
a size of object which corresponds to the port concerned, for
example a byte for a byte port.
Output (P)
- i/o port address
• buffer location
• number of units to transfer
• data type
Transfer the number of units specified from the buffer to the device
attached to the output port address. The port address is expected to
be a valid address for an output device. The data type must
correspond to the kind of output port attached to the address, for
example a word if the port is a word port!
Fast_ Call (P)
• address of routine to call
This opcode implies the direct generation of a machine code which
does not set up any environment - which may be done for normal
routine calls. It may thus be used to implement non-standard calling
conventions - in particular the calling convention used by the native
operating system.
Fast_ Return (P)
This opcode implies the generation of a machine code which merely
returns from some routine call without carrying out any frame
cleanup that may be done for normal routines by the back end. It
may be used to implement non-standard calling conventions.
Jump (P)
- address to jump to
This opcode implies the action of a jump instruction to the specified
destination. This may be used to implement non-standard routine
calling conventions.
Interrupt_ Return (P)
This opcode implies the action of a machine code to return from an
interrupt routine, restoring any machine-dependent environment
which the hardware may have saved. This could be used in a
routine declaration which doesn't have the standard routine entry and
exit code generated, to set up special-purpose interrupt handlers.
Get_ Priority (P)
- result location expression
This opcode places the current process priority into the location
specified. Note that the bit-pattern which represents this priority
must be interpreted in a machine-dependent manner.
Set_ Priority (P)
• value expression
This makes the current process priority be the given value
expression (which must be a valid machine-dependent value!)
Save_ Raise_ Priority (P)
- value expression
This saves the current process priority (in some processor dependent
manner) and increments the current priority by the given value
expression - which must result in a machine-dependent valid value.
Restore_ Lower_ Priority (P)
- location
Place the current value of the priority of the current process in the
given location and restore the priority to that value most recently
saved.
Get_ Special_ Reg (F)
• address for result (usually null)
• data type
• register number
Returns the value of the specified privileged CPU register. An
encoding of the possible register numbers is contained in the
Reg_ Kind enumerated data type contained in the project runtime
library MACHINE module. The machine code generated by a
compiler back end depends upon the register involved and, of course,
the machine.
Set_ Special_ Reg (P)
- value to be given
• data type
• register number
Sets the specified privileged CPU register to the given value. An
encoding of the possible register numbers is contained in the
Reg_ Kind enumerated data type contained in the project runtime
library MACHINE module. The machine code(s) which may be
generated by a compiler back end are often register and
machine-dependent.
Get_ General_ Reg (F)
- address for result (usually null)
• data type
• register number
Returns the value of the specified general-purpose register. The
encodings of the possible register numbers are contained in the
Reg_ Kind enumerated data type contained in the project runtime
library MACHINE module.
Set_ General_ Reg (P)
- value to be set
• data type
• register number
Sets the value of the specified general-purpose register to that given.
The encodings of the possible register numbers are contained in the
Reg_ Kind enumerated data type contained in the project runtime
library MACHINE module.
Allocate_ General_ Reg (D)
- address for saving value (if needed)
• data type
• register number
Marks the specified general-purpose register as in use for the rest of
the current block. Anything the code generator may have put in it is
saved. This operation may be used to implement non-standard
routine calling conventions, with arguments passed in registers, for
example.
Deny_ General_ Reg (D)
- null opcode
• data type
• register number
Marks the specified general-purpose register as in use for the
duration of the current block. This operation should occur right at the
start of the block. The code generator never touches the register, so
its initial contents are preserved. This operation may be used to
implement non-standard routine calling conventions, with arguments
passed in registers, for example.
Free_ General_ Reg (D)
- address last saved in (usually null)
• data type
• register number
This opcode is used to inform a compiler back end code generator
that it may now use the specified register in its generated code. The
operation may follow an Allocate_ General_ Reg or
Deny_ General_ Reg operation on that register, in the same block.
Service_ Call (P) 
. optional parameter 
• one byte service call cod< 
Th is opcode implies the generation of a machine code t< cal: ., 
40 
Test_ Set (P) 
PORTABLE LANGUAGE IMPLEMENTATION PROIECT 
kernel service within the 'operating system ' of the machine con-
cerned with the appropriate parameiers . This is also used where 
specific hardware services (eg context saving - 254 or context loading 
- 255) arc required . 
• address of location to test-and-set 
• address for result 
This opcode requires a machine code generator to generate an unin-
terruptible (if possible) code for testing and setting the location 
specified . returning a boolean value, true indicating that the destina-
tion was previously set, false otherwise . 
Test_ Clear (P)
• address of location to test-and-clear
• address for result
This opcode requires a machine code generator to generate
uninterruptible (if possible) code for testing and clearing the location
specified, returning the result as a boolean value, true indicating that
the destination was previously clear, otherwise false.
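The result conventions of Test_ Set and Test_ Clear can be illustrated with a small Python model. This is only a sketch: the Flag class is hypothetical, and a threading.Lock stands in for whatever uninterruptible instruction sequence a real target machine would provide.

```python
import threading


class Flag:
    """Hypothetical model of a memory location that is either set or clear."""

    def __init__(self, is_set=False):
        self._is_set = is_set
        self._lock = threading.Lock()  # assumption: models uninterruptibility

    def test_set(self):
        """Set the location; return True if it was previously set."""
        with self._lock:
            previously_set = self._is_set
            self._is_set = True
            return previously_set

    def test_clear(self):
        """Clear the location; return True if it was previously clear."""
        with self._lock:
            previously_clear = not self._is_set
            self._is_set = False
            return previously_clear


flag = Flag()
results = [flag.test_set(), flag.test_set(), flag.test_clear(), flag.test_clear()]
```

Note that the two results are not simple negations of each other: each opcode reports whether the location was already in the state the opcode establishes.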
Add_ In_ Place (P)
- address to add value to
- value to add
This opcode requires the indivisible action of incrementing the value
at the given location by the amount specified. It could be used to
indivisibly update a memory counter which is being shared between
asynchronous concurrent activities.
Subtract_ In_ Place (P)
- address to subtract value from
- value to subtract
This opcode requires the action of indivisibly subtracting the given
amount from the value at the address specified. It could be used to
indivisibly update a memory counter which is being shared between
asynchronous concurrent activities.
Add_ In_ Place_ Locked (P) 
- address to add value to 
- value to add 
This opcode requires the uninterruptible action of incrementing the 
value at the given location by the amount specified. It may be used
for counting semaphore implementation. 
Subtract_ In_ Place_ Locked (P) 
- address to subtract value from 
- value to subtract 
This opcode requires the uninterruptible action of subtracting the
given amount from the value stored in the address specified. It may
be used for implementing a counting semaphore.
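The effect of the locked in-place operations on a counter shared between concurrent activities can be sketched in Python. The SharedCounter class and the run_workers helper are hypothetical, and a threading.Lock is assumed as a stand-in for the uninterruptible update that the target machine would perform.

```python
import threading


class SharedCounter:
    """Hypothetical shared memory counter updated by locked operations."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()  # assumption: models the uninterruptible update

    def add_in_place_locked(self, amount):
        with self._lock:
            self.value += amount

    def subtract_in_place_locked(self, amount):
        with self._lock:
            self.value -= amount


def run_workers(counter, workers=4, steps=1000):
    """Run several concurrent activities that each add 2 and subtract 1 per step."""

    def work():
        for _ in range(steps):
            counter.add_in_place_locked(2)
            counter.subtract_in_place_locked(1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

Because every update is indivisible, the final value depends only on the number of updates (here a net gain of one per step per worker), not on how the activities interleave.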
Inline_ Code (P)
• one-byte length of literal code to insert
• literal code itself
The specified literal code is to be copied, unchanged, by a compiler
code generator, to its code output. This operation may be used to
generate the very machine specific special instructions, not otherwise
normally considered by the machine code generator. Care should be
taken with the use of this operation, as it cannot be guaranteed that
the code generator checks that the literal information constitutes a
valid sequence of machine instructions.
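The operand layout of Inline_ Code (a one-byte length followed by the literal code) might be handled as in the following Python sketch. The function names are hypothetical, and only the operands are modelled, since the numeric value of the opcode itself is not given here.

```python
def encode_inline_code_operands(literal: bytes) -> bytes:
    """Prefix the literal code with its one-byte length."""
    if len(literal) > 255:
        raise ValueError("literal code does not fit a one-byte length")
    return bytes([len(literal)]) + literal


def decode_inline_code_operands(stream: bytes, pos: int = 0):
    """Return (literal code, position of the next byte in the stream)."""
    length = stream[pos]
    literal = stream[pos + 1:pos + 1 + length]
    if len(literal) != length:
        raise ValueError("stream ends inside the literal code")
    # The bytes are copied unchanged; no check is made that they form
    # valid machine instructions.
    return literal, pos + 1 + length


operands = encode_inline_code_operands(b"\x7a\x00")
literal, next_pos = decode_inline_code_operands(operands)
```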
Group 6b: Z8000-specific Operations
The following list of Rcode extensions permits access to all the special features
of the Z8000-series microprocessors except :-
a. The translate and translate-and-test instructions.
b. The compare-and-decrement group of instructions.
Extended Processing Unit (EPU) instructions.
Extensions to cope with the first three may be added as necessary. However, these
codes may be generated by the target code generator where suitable actions are
detected. Extensions for the last will have to await the availability of
documentation on the kinds of EPUs available.
Multi_ Micro_ Test (F)
Generates an MBIT instruction, and returns a boolean value indicating
the state of the MI pin; true if the pin is high (inactive), false if
it is low.
Multi_ Micro_ Request (P) 
- 16-bit delay count (must be greater than 2) 
Generates an MREQ instruction, and returns a tristate value based on
the resultant setting of the condition codes, as follows :-
+1 (greater than) - request not signalled (resource not available)
 0 (equal)        - request not granted (resource not available)
-1 (less than)    - request granted (resource available)
Multi_ Micro_ Reset (P) 
Generates an MRES instruction.
Multi_ Micro_ Set (P)
Generates an MSET instruction.
Group 6c: VAX-specific operations
The following Rcode extensions are specific to the Digital Equipment VAX-11
target machine series of processors.
Exec_ Service_ Call (P)
- service call code
Generates a CHME instruction, with the specified argument value,
which should be a 16-bit code.
Super_ Service_ Call (P)
- service call code
Generates a CHMS instruction, with the specified argument value,
which should be a 16-bit code.
User_ Service_ Call (P)
- service call code
Generates a CHMU instruction, with the specified argument value,
which should be a 16-bit code.
Group 7: Type Information 
This final group of opcodes is included in Rcode for two purposes :-
a. To enable compilers to implement the separate compilation feature of
languages such as Peano and Modula-2.
b. For generating information to be passed to a source-language debugger.
For the former purpose, when the front end of a compiler is compiling a
compilation unit which is a module face or (for languages like Peano) an entire
module, it produces an Rcode file called a symbol file. When compiling some later
unit which imports this one, the compiler front end re-reads the symbol file, and
rebuilds its symbol table from the information within, thus facilitating full type
checking across compilation units.
The back end of a compiler never sees this symbol file. Instead, when
compiling the corresponding module body or (for languages like Peano) simultaneously
with the production of the symbol file, the front end produces another Rcode
stream, containing the actual code which is to be passed to the back end for the
generation of target machine code.
This latter code file may also contain symbol table information (some of which
is just a copy of that in the symbol file), for debugging purposes. The portable
language project implementation translates this into whatever format is appropriate
for either the project editing environment used as a debugger or for a
target-system-specific debugger.
Group 7a: Symbol Table Declarations 
Expose (T) 
• id number of module to expose 
• exported or not
Implements the exposing-declaration in Peano or the Use facility in
languages like Ada, for example.
Declare_ Constant_ Id (T)
- name
- general characteristics
- type of constant
- expression referencing value
Declares a constant identifier. The general characteristics should be
one of the opcodes in group 1c. The Rcode expression giving the
type should be one of the operations in group 7b (below). The
expression for referencing the value of the constant is expected to be
inserted in-line in the Rcode generated by the front end, wherever a
reference to the constant occurs.
The expression for referring to the value is expected to consist of
a Declare_ Constant to give the appropriate id number and lexical
level.
Declare_ Var_ Id (T) 
- name 
- object characteristics 
- type of variable 
- expression indicating the L-value of the variable 
Declares a variable identifier. Object characteristics will usually be
given by an opcode from group 1c. The Rcode expression giving the
type should be one of the operations in group 7b (below). The
expression which returns the address (L-value) of the variable would
normally be a Declare_ Variable operation referencing the actual
variable entity.
Declare_ Type_ Id (T) 
- name 
- object characteristics
- type itself
Declares a type identifier. Object characteristics is expected to be an
Rcode from group 1c. The Rcode expression giving the type should
be one of the operations in group 7b (below).
Declare_ Label_ Id (T)
- name
- object characteristics
- label value
This declares an explicit user-defined label. The label value is
expected to be a Declare_ Label opcode.
Declare_ Routine_ Ident (T)
- name
- object characteristics
- routine type
- list of arguments, or null if niladic
- name of result, or null if not function
- expression returning routine descriptor
• function precedence (or 0 if procedure)
Declares a procedure or function identifier. The Rcode expression
giving the routine type should be one of the operations in group 7b
(below). This should be a procedure or function type, though (for
example in Peano) it may be a generic type whose subtypes are
procedure or function types.
The expression which returns a descriptor for the routine is
inserted in-line in the Rcode generated by the front end, wherever a
reference to the routine occurs. This would normally be a
Declare_ Routine operation referencing the appropriate routine
entity. However, for routines declared externally, this would have to
be an expression which built a routine descriptor out of a
Define_ Symbol operation.
Declare_ Module_ Id (T) 
- name 
- object characteristics
• id number
Declares a module identifier. The id number corresponds to that in
a Declare_ Module module entity declaration.
Group 7b: Type References
Refer_ Type (T)
• id number
• lexical level
This references the type entity declared with the specified id number
at the given lexical level.
Refer_ Module_ Ident (P)
- name expression
This opcode enables module identifier numbers to be translated back
into the name of the imported module.
Generic_ Refer_ Type (T)
• id number
• lexical level
• generic nesting level
This opcode only occurs within a generic entity, when referring to
some type entity declared within it. Facilitates re-assigning of
lexical levels and id numbers, when instantiating the generic object.
Refer_ Imported_ Type (T)
• id number of type within unit
• unit number
This opcode references the type entity exported from the specified
compilation unit.
Subtype (T)
- parent type
- zero or more actual arguments
• number of actual arguments
This instantiates the generic type which is given as the parent type.
The root opcodes specifying the parent type, and any actual type
arguments, may be any of the opcodes in this group, including
further Subtypes.
Group 7c: Components of Type Definitions 
The operations in this sub-group occur as the type definition operand, in a type
entity declaration (see the Define_ Type operation in group 1b, above). They
supply further information specific to each class of type. As always, references to
other types are done with operations in group 7b.
Enumerated_ Type (T)
- zero or more identifiers making up the enumeration list
• number of identifiers in the enumeration list
Defines an enumerated type. The identifiers in the enumeration list
are automatically declared as constant identifiers, with literal values
starting from 0 up to one less than the number of identifiers in the
list.
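The automatic declaration of enumeration identifiers can be illustrated by a short Python sketch. The function name and the example identifiers are hypothetical; only the numbering rule (0 up to one less than the number of identifiers) is taken from the description above.

```python
def enumeration_constants(identifiers):
    """Map each identifier in the enumeration list to its literal value."""
    return {name: value for value, name in enumerate(identifiers)}


colours = enumeration_constants(["red", "green", "blue"])
```

Here the three identifiers receive the literal values 0, 1 and 2, in list order.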
Char_ Type (T) 
Specifies the "standard" type char. 
Integer_ Type (T) 
Specifies the universal basic type integer.
Cardinal_ Type (T) 
Specifies the universal basic type cardinal. 
Real_ Type (T) 
• number of exponent bits 
Specifies the universal basic type real. The number of exponent bits
is merely a count and does not imply any particular underlying
hardware exponent base or offset, etc.
Subrange_ Type (T) 
- base type of subrange 
- expression givi ng low bound 
- expression giving high bound 
- normalisation code
• signed or not
• minimum value
• maximum value
Defines a subrange type, and whether it is signed or not. Only
subranges of the predefined type integer are signed. The normalisation
code is used to define the correct storage values in minimum
storage space. The minimum and maximum values are expected to
be within the range of a machine word for representation purposes.
Array_ Type (T)
- subscript type
- array component type
Defines an array type. Note that in many languages,
multi-dimensional arrays are treated as arrays of arrays. This is the
technique for specifying multi-dimensional array structured types in
Rcode.
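The arrays-of-arrays technique can be sketched in Python as a nesting of array type nodes, one per dimension. The ArrayType class and the multi_dim_array helper are hypothetical illustrations, with plain strings standing in for type references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical tree node mirroring the two operands of an array type."""
    subscript: object   # subscript type
    component: object   # array component type


def multi_dim_array(subscript_types, element_type):
    """Build a multi-dimensional array type as nested one-dimensional arrays."""
    result = element_type
    # The last subscript is the innermost array, so wrap from the inside out.
    for subscript in reversed(subscript_types):
        result = ArrayType(subscript, result)
    return result


matrix = multi_dim_array(["row_range", "column_range"], "real")
```

The two-dimensional case yields an array indexed by row_range whose component type is itself an array indexed by column_range.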
Record_ Type (T)
- zero or more record field definitions
• number of fields
Defines a language record type. Each field definition is given by a
Record_ Field operation, defined in group 7d below.
Note that records in Modula-2 and Pascal differ from the ones in
Peano in that they allow variants. These are represented by extra,
compiler-generated fields whose types are union types, since unions
are the Peano equivalent to variants.
Union_ Type (T)
- zero or more selecting expressions
- zero or more union field definitions
• number of selecting expressions
• number of fields
Defines a union type. Each field definition is given by a
Union_ Field operation, defined in group 7d below.
For languages which permit record variants these may be
represented in the Rcode by extra record fields whose types are
union types. In this case, there will be just one selecting expression,
which is a dummy anyway, since the variants are not checked.
Set_ Type (T)
- set element type
This defines a language set type for relevant languages.
Channel_ Type (T)
- element type
• is FIFO or not
• direction onward or not
This opcode defines a type which is some kind of input/output
pathway from the machine (which could be virtual).
Routine_ Type (T)
- list of argument types, or null if niladic
- result type, or null if not function
- variable result size
• argument count
• fixed result size (or -1 if none or variable)
Defines a procedure or function type. This corresponds to the
"signature" of a routine or function.
Pointer_ Type (T) 
- pointer component type
This opcode defines a pointer type.
Capsule_ Type (T) 
- name
- initialisation code reference
• finalisation code reference
• module id
• lexical level
This defines a capsule type of object used in object-oriented and
similar languages.
Generic_ Type (T) 
- size in bytes of arglist
- size in bits of arglist
- formal arg group
- type definition proper
• allow unconstrained or not
Defines a generic type. The formal argument group is given by an
Arg_ Group operation, defined in group 7d, below. The type
definition proper may be given by any of the operations in group 7c;
for instance, it could be a further Generic_ Type.
Discrete_ Type (T)
This opcode specifies that a particular generic type is to be discrete.
This opcode is normally handled only within the front end of any
compiler system. The back end will never see this since it should
have been converted into the exact type specification before reaching
that point.
Private_ Type (T) 
- actual type 
Defines a private or otherwise hidden type. The Rcode expression
giving the actual type is one of those operations in group 7b, above.
Group 7d: Components of Type Definitions 
Record_ Field (T) 
- name
- component characteristics
- type of field
• default value, or null if none
Defines a field of a record structure. This operation only occurs as a
subtree of a Record_ Type operation.
Union_ Field (T)
- list of associated case labels
- name
- component characteristics
- type of field
• number of associated selecting expression
• number of associated case labels
Defines a field of a union or variant record structure. This operation
only occurs as a subtree of a Union_ Type operation.
The selecting expression number, if nonzero, is used to specify
an element in the list of selecting expressions hanging off the parent
Union_ Type node; these are numbered from 1 upwards. This field
of the union is selected if and only if the selecting expression
matches one of the case labels associated with this union field, and
provided no preceding (lower-numbered) selecting expression selects
any union field. If the union field comprises the else-part of the
union, that is, it is to be selected only if none of the selecting
expressions selects any union field, this is indicated by the number
of the selecting expression, and the number of associated case labels,
both being zero.
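The selection rule for union fields can be sketched in Python as follows. This is one reading of the rule, not the project's implementation: the UnionField class, the select_field function and the representation of selecting expressions by their already-evaluated values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnionField:
    """Hypothetical union field; selector_number 0 marks the else-part."""
    name: str
    selector_number: int
    case_labels: tuple = ()


def select_field(fields, selector_values):
    """selector_values[i] is the value of selecting expression number i + 1."""
    # Lower-numbered selecting expressions take priority: the first one
    # that matches a case label of one of its fields decides the result.
    for number, value in enumerate(selector_values, start=1):
        for field in fields:
            if field.selector_number == number and value in field.case_labels:
                return field
    # No selecting expression selected a field: fall back to the else-part,
    # marked by a zero selector number and no case labels.
    for field in fields:
        if field.selector_number == 0 and not field.case_labels:
            return field
    return None


fields = [
    UnionField("a", 1, (1, 2)),
    UnionField("b", 2, (5,)),
    UnionField("other", 0),
]
chosen = [select_field(fields, values).name for values in ([2, 5], [9, 5], [9, 9])]
```

In the first case both selecting expressions match, and the lower-numbered one wins; in the last case neither matches and the else-part is chosen.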
Value_ Arg (T) 
- name of argument 
- type of argument 
- expression giving field offset 
• is value held in a bit field 
Defines a formal value argument in a generic argument group. The
field offset indicates the location of the value of the argument, within
the composite value of the entire argument group. It is interpreted
as a bit offset for a bit field (that is, the argument group structure is
packed), and as a byte offset otherwise. This opcode only occurs as
a subtree to an Arg_ Group operation (below).
Type_ Arg (T) 
- name of argument 
- formal type 
Defines a formal type argument in a generic argument group, where
the formal type has been explicitly specified. No storage is taken up
in the composite argument group value, for holding run-time
information specifically about this argument. This opcode only occurs as
a subtree to an Arg_ Group operation (below).
Unspec_ Type_ Arg (T)
- name of argument
- expression giving field offset
• is size held in a bit field
• is size expressed in bits
• is private allowed for actual type
Defines a formal type argument in a generic argument group, where
no formal type has been specified. On instantiation, the actual type
may be any type, except that private types may be disallowed.
Within the generic type, nothing is known about the type except
its size. The field offset expression indicates the offset, within the
composite argument group value, to the field containing this size.
The offset is to be interpreted in bits for a bit field (i.e. the argument
group structure is packed), and in bytes otherwise.
The size itself may be expressed in bits or in bytes. The former
is needed if the formal type is used as part of a packed structure
anywhere within the generic, otherwise the latter is adequate. This
operation only occurs as a subtree to an Arg_ Group operation
(below).
Arg_ Group (T) 
- zero or more formal argument definitions
• number of formal argument definitions comprising group
Defines a group of formal arguments for a generic entity. The
formal argument definitions are some mixture of Value_ Arg,
Type_ Arg and Unspec_ Type_ Arg opcodes.
Group 8: Two-byte Extended Rcodes 
The possibility of extending Rcode for either special implementation specific
functions or general purpose extensions is allowed for in the Rcode basic design
by including three extension codes :-
a. Implementation_ Specific.
b. Extension.
c. Reserved - for compiler GAS Code (second-level intermediate language).
See also GAS Code User's Guide and Reference Manuals.
Each of these is expected to have at least two bytes of following literal
information. The first of these bytes is a count of the following literal bytes (up to 255
maximum) and the second is the extended opcode itself. The need for more literal
information is, of course, dependent upon the semantics attributed to the extension
opcode byte.
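The layout of an extended Rcode can be sketched in Python under two stated assumptions. First, the count byte is taken to include the extended opcode byte itself, which is one possible reading of "a count of the following literal bytes". Second, the value 0xFE used for the extension code is purely a placeholder, since the numeric values of the three extension codes are not given here. The function names are hypothetical.

```python
HYPOTHETICAL_EXTENSION_CODE = 0xFE  # placeholder value, not taken from the manual


def encode_extended(extension_code, extended_opcode, literal=b""):
    """Emit: extension code, count byte, extended opcode, further literal bytes."""
    count = 1 + len(literal)  # assumption: the count covers the extended opcode too
    if count > 255:
        raise ValueError("too many literal bytes for a one-byte count")
    return bytes([extension_code, count, extended_opcode]) + literal


def decode_extended(stream, pos=0):
    """Return (extension code, extended opcode, literal bytes, next position)."""
    extension_code = stream[pos]
    count = stream[pos + 1]
    if count < 1:
        raise ValueError("extended opcode byte is missing")
    body = stream[pos + 2:pos + 2 + count]
    if len(body) != count:
        raise ValueError("stream ends inside the literal information")
    return extension_code, body[0], body[1:], pos + 2 + count


encoded = encode_extended(HYPOTHETICAL_EXTENSION_CODE, 0x03, b"\x10\x20")
decoded = decode_extended(encoded)
```

Because the count precedes the extended opcode, a back end that does not recognise a particular extension can still skip over it without knowing its semantics.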
STANDARD PRAGMAS
The following are the currently-defined standard pragmas for communication
between the front and back ends of a compilation system.
Side_ Effects. One byte Boolean literal parameter.
This pragma indicates whether the back end is free to optimise
expressions as though they had no side effects (as in Peano), or not
(e.g. Modula-2, Pascal). The flag is true to indicate that side effects
may be present, false otherwise. The default assumption is that side
effects are absent.
Column_ Info. One byte Boolean literal parameter.
This pragma indicates that each subsequent Rcode opcode includes a
column number byte (if the parameter is true) or not (false). The
default assumption is that column numbers are omitted.
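How a back end might track the two standard pragmas and their defaults can be sketched in Python. The PragmaState class is hypothetical, and treating any nonzero parameter byte as true is an assumption, since the encoding of the Boolean literal is not stated here.

```python
from dataclasses import dataclass


@dataclass
class PragmaState:
    """Hypothetical record of pragma settings held by a back end."""
    side_effects: bool = False  # default: side effects are absent
    column_info: bool = False   # default: column numbers are omitted

    def apply(self, pragma_name, parameter_byte):
        flag = parameter_byte != 0  # assumption: any nonzero byte means true
        if pragma_name == "Side_Effects":
            self.side_effects = flag
        elif pragma_name == "Column_Info":
            self.column_info = flag
        else:
            raise ValueError(f"unknown pragma: {pragma_name}")

    def assume_no_side_effects(self):
        """True while expressions may be treated as free of side effects."""
        return not self.side_effects


state = PragmaState()
before = state.assume_no_side_effects()
state.apply("Side_Effects", 1)   # e.g. a Modula-2 or Pascal front end
state.apply("Column_Info", 1)
after = state.assume_no_side_effects()
```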
