Abstract

A major problem in deriving a compiler from a formal definition is the production of correct and efficient object code. We propose a solution to this problem in the framework of a compiler writing system where the compilation process may be viewed as successive translations from an attributed abstract syntax to another abstract syntax. The code-generator generator needs two kinds of specifications :
-an attributed abstract syntax (AAS) of the target machine : it is the description of the I.R. given as input to the code-generator,
-a target machine description where the basic concepts are hierarchically described by tree patterns. These tree patterns are terms of the target abstract syntax.
The code generation process is divided into two steps : the instruction selection process and the register allocation process. The instruction selection process applies to the I.R. term a set of rewriting rules driven by tree templates derived from the target machine specification. The register allocation process consists of several evaluation passes of an attribute grammar derived automatically from the target machine specification. The first pass sets the constraints on temporaries according to the whole context, the second one does life-time analysis and packing on temporaries, and the last one assigns effective resources to temporaries.
Introduction
A compiler, in order to produce code, needs full knowledge not only of the syntax and the semantics of the source language but also of the structure of the target machine and the semantics of its instruction set. Considerable research effort has been invested into making compiler construction as modular and as automatic as possible. Tools based on formalisms such as abstract data types and attribute grammars are widely known and used to produce front ends. In order to have a uniform approach to the whole compilation process, it would be useful to use the same formalism for the front end and the back end. An approach using high level semantics has been developed in MESS [LP 87], but the emphasis was put essentially on the front end. This paper is devoted to the back end of the compiler writing system. Usually, code generation is converted into a syntactic process using tree structured patterns describing instructions of the target machine and a tree structured I.R. The instruction selection is done by covering the input I.R. with instruction patterns. Various works differ in the way the instruction set of the target machine is described and in the pattern matching and transformations used to reduce the I.R. tree.
These code-generator generators are generally not embedded in a full compiler writing system using a uniform formal framework. The instruction set of the target machine, except in the works of Giegerich [Gie 90], is not described by a formal semantics. It is often obscured by information related to the reduction algorithm and cannot be derived straightforwardly from the handbook of the target machine. As a consequence of this lack of formal semantics, there is no means to prove the correctness of the generated code.
Our compiler writing system produces a compiler from a specification including three parts : a source language definition, a target language definition and the description of the implementation choices.
The fundamental background of the specifications is that the whole compilation process is viewed as successive mappings from an abstract data type (ADT) into another abstract data type [GDM 84]. The axioms of the various ADTs make it possible to prove the correctness of each step of the compilation process [DMR 89].
Since an abstract data type without axioms is hardly more than an abstract syntax, the tools used here to handle these specifications are attributed abstract syntaxes (AAS for short) and attribute grammars (AG) specifying the mappings between AASes. An AAS is mainly an abstract syntax with declarations of attributes attached to phyla. Productions of the AG are operator definitions of the AAS. Throughout the paper, we call semantic rules the attribute definitions related to the productions of the AG. The FNC-2 system has been especially designed for dealing with these tools [JP 90].
We focus on the back end of the compiler writing system. It uses the same formal background as the preceding steps.
The first section presents the code-generator specification for the instruction selection process. It needs two kinds of specifications :
-an AAS of the target machine : it is the description of the I.R. given as input to the code-generator,
-a target machine description hierarchically structured in three levels [DMR 87].
It consists of a description of the target instruction set and a mapping of the instruction set into the target AAS (i.e. the semantics of the instructions is given in terms of the target AAS). The second section deals with the code generation mechanism which is divided into two steps : the instruction selection process and the register allocation process.
The compiler writer describes the semantics of each instruction of the target machine by an I.R. term. Thus the instruction selection process can be defined formally as a reverse translation, as in [GS 88] [Gie 90]. Operationally, the reverse translation is defined by a set of tree templates derived from the target machine specification and a set of machine-independent rewriting rules. As the description language has a formal semantics, the correctness of the rewriting rules has been established [DMR 89] : they preserve the semantics of the I.R. term.
This step produces a term of the canonical target AAS. It is the target AAS enriched by some universal operations on temporary resources and restricted to the canonical form of instructions and addressing modes. This term is given as input to the register allocation step, which is an evaluation pass of the AG based on the canonical target AAS, whose semantic rules are automatically generated by the system.
Notions such as the size of an operand, the type or the name of a storage, or the value of an address are specified by attributes attached to phyla of the target AAS.
2.2 The target machine specification
We provide a language to specify the instruction set processor of a target machine.
The basic concepts used are described by specific constructs of the language : storage bases, storage classes, value classes, access modes, access classes and instructions. Each construct is defined by properties such as the size of the associated addressable units or the semantics of the occurrence of the construct. This semantics is expressed using a term of the target AAS and takes into account its size. As the occurrences of a construct are related to the size of the addressable units, their semantic descriptions are nearly identical [DMR 87]. A solution proposed to deal with large algebraic specifications is the use of parameterization and instantiation mechanisms. Such mechanisms fit our machine specification language very well. The compiler writer can factor several instances of a given construct into a generic pattern followed by the possible values of the generic parameters of the pattern. From this declaration the system derives as many occurrences of the construct as there are sizes of addressable units associated with it. The instantiation mechanism is bound to a name generation mechanism. Throughout the paper the following notations will be used :
• If n is a name and L is a variable, when L is instantiated by v, n!L builds the name n_v.
• <S V > means that V is a constant or a variable of sort S.
• All keywords of the language are in bold letters in the following examples.
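The name generation mechanism can be illustrated by a minimal Python sketch. It is not the system's implementation: the function names `instantiate_name` and `all_instances` are hypothetical, and the treatment of names with several parameters (joining the values with underscores in declaration order) is an assumption extrapolated from the single-parameter rule n!L gives n_v.

```python
from itertools import product

def instantiate_name(generic, bindings):
    """Expand a generic name such as 'disp_am!size' under the given bindings."""
    base, *params = generic.split("!")
    # Each '!param' is replaced by '_' followed by the value bound to param.
    return base + "".join("_" + bindings[p] for p in params)

def all_instances(generic, domains):
    """Enumerate every actual name obtained from the declared parameter domains."""
    _, *params = generic.split("!")
    names = []
    for values in product(*(domains[p] for p in params)):
        names.append(instantiate_name(generic, dict(zip(params, values))))
    return names

if __name__ == "__main__":
    print(all_instances("dregister!size", {"size": ["B", "W", "L"]}))
```

Under these assumptions a pattern with two generic parameters, such as dindex_am!index_size!size, yields one name per combination of parameter values.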
Storage classes
A component of the physical storage does not represent the same operand depending on the size associated with the operation applied to this operand. For instance, an access to a register may designate a byte operand, a word operand or a longword operand. Thus, we define two fundamental concepts : storage base and storage class. A storage base is defined as a set of smallest addressable units of physical storage. For a given storage base, the compiler writer must describe as many storage classes as there are ways to gather storage base elements to represent logical storage units. A storage class occurrence is characterized by a set of properties; in the machine description language, the storage class construct is described using a predefined keyword for each of these properties.
The MC68000 has two kinds of registers : the data registers dedicated to data values and address registers dedicated to addresses. Thus, the compiler writer must declare the two following storage bases :
Storage_base DREG -- Data registers
   Set is { DREG [k] where k in 0 .. 31 }
End
Storage_base AREG -- Address registers
   Set is { AREG [k] where k in 0 .. 13 }
End

Let us consider the storage classes related to the data register storage base. As an access to a data register may represent an access to a byte operand, a word operand or a long word operand, the compiler writer must declare three storage classes : the dregister_B, the dregister_W and the dregister_L storage classes. The parameterization and instantiation mechanisms avoid the repetition of similar declarations. The compiler writer can declare a generic pattern of a data register storage class using generic names. The instance part of the declaration includes the information needed by the name generation mechanism to build the actual names.

Instances size in {B, W, L}
   case size is
      B : length is 1
      W : length is 2
      L : length is 4
   End case
End

From this pattern, the system deduces the three descriptions of actual storage classes. The specification of the address register class is similar to that of the data register class. The only difference is that the byte access to an address register is not available.
Access modes
Let us consider an assignment statement of A to B : in the sequel, A is called the source operand and B the destination operand of the assignment. In an instruction context, an operand is designated by an addressing mode. Whereas an addressing mode in source position designates the contents of a storage, it designates the storage itself in destination position. A particular machine has several addressing modes. For a given addressing mode of the machine, the compiler writer must define as many access modes as there are associated storage classes. An access mode pattern is specified by :
-a canonical form, representative of the access mode, including its name and its parameters. These parameters are formal storage or value classes.
-its related attributes : length, format and costs.
-a template that describes the access path to the corresponding operand : the operand in source (resp. destination) position is defined by the term obtained by applying the dereference (resp. the cell constructor) operation to this template.
As for the storage class construct, the compiler writer can define a formal access mode. Among the numerous addressing modes of the MC68000, let us consider the indirect with displacement addressing mode. This access mode has instances which depend on the size of the location indirectly accessed in source position. Thus the compiler writer defines a generic access mode pattern "disp_am!size" parameterized by the size.
Access_mode
   Canonical_form -- Indirect with displacement access modes
      disp_am!size (<aregister_L reg>, <value_W val>)
   Attributes
      $length = size -- length of the addressable unit
      $fmt = "val(reg)" -- Assembly language format
   Template
      index (cont_of_areg_L (<aregister reg>), const_value_W (<immediate_value val>))
   Instances size in {B, W, L}
End

As the template of the previous definition gives an address whose dereference operator is cont_of_address!size and whose cell constructor is designates_address!size, the system derives the two following generic access modes, respectively in destination and source position :

   designates_address!size (index (cont_of_areg_L (<aregister reg>), const_value_W (<immediate_value val>)))
   cont_of_address!size (index (cont_of_areg_L (<aregister reg>), const_value_W (<immediate_value val>)))
This leads to three templates in source position (respectively in destination position) when the size is instantiated by {B, W, L}. The indirect with index access mode has instances which depend on the size of the location indirectly accessed in source position.
Access_mode
   Canonical_form -- Indirect with displacement access modes
      dindex_am!index_size!size (<aregister_L reg1>, <dregister!index_size reg2>, <value_B val>)
   Attributes
      $length = size
      $fmt = "val (reg1, reg2.index_size)"
         when val = 0 "0 (reg1, reg2.index_size)"
   Template
      index (cont_of_areg_L (<aregister reg1>),
             add_L (sign_extend_L (cont_of_dreg!index_size (<dregister reg2>)),
                    sign_extend_L (const_value_B (<immediate_value val>))))
      when val = 0
      index (cont_of_areg_L (<aregister reg1>),
             cont_of_dreg!index_size (<dregister reg2>))
   Instances size in {B, W, L}, index_size in {W, L}
End
Notice that the when clause in the specification of this access mode describes an optimizing case, which applies if the value of val is 0.
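The derivation of the source and destination templates from an access path, described above for disp_am!size, can be sketched in Python as follows. This is only an illustration under stated assumptions: terms are encoded as nested tuples (operator, child, ...), the helper names `derive_templates` and `show` are hypothetical, and instantiated operator names are built with the n_v convention.

```python
SIZES = ("B", "W", "L")

# Access path of disp_am!size, written as a nested tuple (operator, child, ...).
DISP_TEMPLATE = ("index",
                 ("cont_of_areg_L", "<aregister reg>"),
                 ("const_value_W", "<immediate_value val>"))

def derive_templates(path, sizes=SIZES):
    """Wrap an access path with the dereference and cell-constructor operators."""
    # Source position: the dereference operator gives the contents of the cell.
    source = {s: ("cont_of_address_" + s, path) for s in sizes}
    # Destination position: the cell constructor designates the cell itself.
    destination = {s: ("designates_address_" + s, path) for s in sizes}
    return source, destination

def show(term):
    """Render a nested-tuple term in the prefix notation used in the listings."""
    if isinstance(term, str):
        return term
    op, *args = term
    return f"{op} ({', '.join(show(a) for a in args)})"

if __name__ == "__main__":
    src, dst = derive_templates(DISP_TEMPLATE)
    for size in SIZES:
        print(show(src[size]))
        print(show(dst[size]))
```

Each size yields one template per position, which matches the count of three source and three destination templates stated for this access mode.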
Access classes
The operands of an instruction are access classes, which are defined as sets of access modes. An access class can also be specified by a generic pattern including the instantiation of its elements. There are as many instances of a generic access class as there are possible sizes of operands.
Access_class
   <All_access!size AM> = dreg_am!size (<dregister!size reg>) where size in {B, W, L}
                        = areg_am!size (<aregister!size reg>) where size in {W, L}
                        ...
End
Instructions

An instruction may be characterized by the following properties :
-the access classes defining the operands to which the instruction applies,
-its related attributes : format (i.e. the syntax in the assembly language) and length,
-the template describing what is performed by the instruction (it is a term of the abstract data type).
Nearly every instruction of the target machine may be applied to the different lengths of its operands. In order to avoid the repetition of such descriptions, the compiler writer specifies a pattern of an instruction and its instances. Let us consider the move instruction which corresponds to an assignment operation. The size of the instruction may be specified to be a byte, a word or a longword. We obtain the following specification :
Instruction
   Canonical_form
      move!size (<All_access!size AM1>, <Altdata_access!size AM2>)
   Attributes
      $length = size
      $fmt = "MOVE.$length $fmt (<All_access!size AM1>), $fmt (<Altdata_access!size AM2>)"
   Template
      assign!size (src (<All_access!size AM1>), dst (<Altdata_access!size AM2>))
   Instances size in {B, W, L}
End
The addi instruction specifies the addition of an immediate operand to an appropriate operand of the alterable data access class :
Instruction
   Canonical_form
      addi!size (<Immediate_access!size AM1>, <Altdata_access!size AM2>)
   Attributes
      $length = size
      $fmt = "ADDI.$length $fmt (<Immediate_access!size AM1>), $fmt (<Altdata_access!size AM2>)"
   Template
      assign!size (add!size (src (<Altdata_access!size AM2>), src (<Immediate_access!size AM1>)),
                   dst (<Altdata_access!size AM2>))
   Instances size in {B, W, L}
End
The asl instruction specifies the shift of the content of a data register by a quick value :
Instruction
   Canonical_form
      asl!size (<Quick_access AM1>, <Dregister_access!size AM2>)
   Attributes
      $length = size
      $fmt = "ASL.$length $fmt (<Quick_access AM1>), $fmt (<Dregister_access!size AM2>)"
   Template
      assign!size (shift_al!size (src (<Quick_access AM1>), src (<Dregister_access!size AM2>)),
                   dst (<Dregister_access!size AM2>))
   Instances size in {B, W, L}
End
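To make the role of the instruction templates concrete, the following Python sketch matches an I.R. term against an ordered list of templates. It is a simplification under several assumptions: terms are nested tuples, pattern variables are strings starting with "?", the operands are supposed to be already reduced to canonical access-mode forms, the src and dst markers are kept as explicit nodes, and the membership of an operand in its access class is not checked. The names `match`, `select` and `TEMPLATES` are hypothetical.

```python
def match(pattern, term, env=None):
    """Match a term against a pattern; return the variable bindings or None."""
    env = dict(env or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env and env[pattern] != term:
            return None  # a repeated variable must bind the same subterm
        env[pattern] = term
        return env
    if isinstance(pattern, str) or isinstance(term, str):
        return env if pattern == term else None
    if len(pattern) != len(term) or pattern[0] != term[0]:
        return None
    for sub_pattern, sub_term in zip(pattern[1:], term[1:]):
        env = match(sub_pattern, sub_term, env)
        if env is None:
            return None
    return env

# Long-word instances of the addi and move templates, tried in this order.
TEMPLATES = [
    ("addi_L", ("assign_L",
                ("add_L", ("src", "?am2"), ("src", "?am1")),
                ("dst", "?am2"))),
    ("move_L", ("assign_L", ("src", "?am1"), ("dst", "?am2"))),
]

def select(term):
    """Return the first template (in list order) that matches the term."""
    for name, pattern in TEMPLATES:
        env = match(pattern, term)
        if env is not None:
            return name, env
    return None

if __name__ == "__main__":
    ir = ("assign_L", ("add_L", ("src", "d0"), ("src", "#4")), ("dst", "d0"))
    print(select(ir))
```

The addi template uses AM2 twice, so the matcher only accepts terms in which the modified source operand and the destination are the same access mode instance.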
2.3 The canonical target machine AAS
The bottom-up matching process of the I.R. is carried out until each modification of the I.R. is identified with an instruction template. For each modification, in the context of an instruction template, operand subterms are matched with access mode templates. If they are leaves of the instruction template, they are replaced by their canonical form in the modification; otherwise the location designated by the access mode is stored in a temporary and the modification is rewritten using this temporary. The process goes on in order to bring the modification closer to an instruction template. Finally the modification is flattened into a sequence of universal store trees and an instance of the instruction template [DMR 89]. The universal store trees must be described. Thus it is necessary to enrich the target machine AAS with corresponding phyla and operators. The following specification is automatically added by the system.
The interface specification
The system needs to link the internal names such as temporary_am, Univ_assign, and the actual names of the machine specification that become possible synonyms during the rewriting process. For that purpose, the system uses an interface declaration module specified by the compiler writer. For the MC68000 it follows : 
Target templates and attribute grammar derivation
Two modules are used to process the target machine specification and give two outputs, respectively a set of tree patterns and an attributed abstract syntax of the target machine. The first module builds three families of trees corresponding to access mode templates in source position, access mode templates in destination position, and instruction templates. These families are written as Prolog clauses [DMR 88]. The properties of these trees derived from the specification are also translated into Prolog clauses. These tree templates are used by the rewriting step to achieve the instruction selection. The second module builds an AG for FNC-2. The evaluation of the AG achieves the register allocation step.
3 The code-generator generator
Instruction selection
For each I.R. term, the rewriting algorithm needs to know the boundary where the access mode pattern matching can stop and where the instruction pattern matching can begin. For this purpose, we define a partition of instruction templates into instruction classes that have the same boundary. Two templates of an instruction class can be represented by a canonical representative. The instruction selection algorithm applies a set of rules as specified in [DMR 89]. The strategy of application of the rewriting rules is strongly connected with the notion of canonical representative of an instruction class, which defines the context of the search for access modes. The variables occur once in a canonical representative and represent the operands; they are annotated by a property source or destination.
Rewriting rules
Notations

Let $AM_{source}$ be the ordered set of access mode patterns in source position. Let $AM_{destination}$ be the ordered set of access mode patterns in destination position. Let $IC$ be the ordered set of instruction class patterns. In all the following rules, the search for a pattern of a set of patterns that matches a term is done by trying the patterns of the set one after the other, with respect to the set ordering.

Some of the rules are more formally described in [DMR 89]. The first one describes the replacement of a subtree which is an instance of an access mode by the instantiated canonical form of this access mode. Informally, when matching a tree $t$ with an instruction class pattern, the source and destination position contexts are set according to the position property in the instruction class pattern. If there is a subtree $t_i$ of $t$ which is an instance of an access mode pattern in the right position, then $t_i$ is replaced in $t$ by the instantiated canonical form of the access mode pattern. The instruction class and access mode patterns are matched with respect to the ordering of the two sets $IC$ and $AM$.

We recall here the second rule completely. It gives a good idea of the use of temporaries. It describes the transformation to be done when a subtree supposed to be an operand in an instruction context is not an instance of an access mode but has inner subtrees that are instances of access modes.

Rule R2

Let $t$ be the tree to transform. Suppose there exist $T \in IC$ and a substitution $\sigma_{IC} = \{ \langle A_i, t_i \rangle \mid A_i \in Var(T), 1 \le i \le n \}$ with $\sigma_{IC}(T) = t$, and suppose there exists $t_i$ such that $\langle A_i, t_i \rangle \in \sigma_{IC}$, the context of $A_i$ in $T$ is the position $pos$, and for all $A \in AM_{pos}$ there exists no substitution $\rho$ such that $\rho(A) = t_i$. Then if there exist :

1. a largest subtree $t_{ij}$ starting from the leaves of $t_i$,
2. and $B \in AM_{source}$,
3. and a substitution $\gamma$ such that $\gamma(B) = t_{ij}$,

we define the rewriting of the tree $t$ by :

where $\delta$ is the substitution $\delta = \{ \langle \theta, tmp_a \rangle \}$ and $tmp_a$ is a new temporary location.

The rules described in [DMR 89] only deal with terms whose subtrees considered as operands in an instruction context are nested access mode instances.
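The use of temporaries in the second rule can be illustrated by a small Python sketch. It does not reproduce the formal rewriting of [DMR 89]: it assumes that terms are nested tuples, that a caller-supplied predicate decides whether a subtree is an instance of a source access mode, and that saving a subtree is represented by a univ_assign node while the reference to the temporary is a temporary_am node (the node shapes are assumptions; only the internal names come from the interface specification). Unlike the rule, which handles one subtree at a time, the sketch treats all qualifying inner subtrees in one traversal.

```python
import itertools

def make_fresh():
    """Return a generator of new temporary names tmp0, tmp1, ..."""
    counter = itertools.count()
    return lambda: f"tmp{next(counter)}"

def rewrite_operand(t_i, is_access_mode, fresh):
    """Replace the largest inner access mode instances of t_i by temporaries.

    Returns the list of universal stores to emit and the rewritten operand."""
    stores = []

    def walk(term):
        if isinstance(term, str):
            return term
        if is_access_mode(term):
            # Top-down traversal: the first match on a path is the largest one.
            tmp = fresh()
            stores.append(("univ_assign", term, tmp))
            return ("temporary_am", tmp)
        return (term[0],) + tuple(walk(child) for child in term[1:])

    # t_i itself is known not to be an access mode instance, so only its
    # proper subtrees are examined.
    rewritten = (t_i[0],) + tuple(walk(child) for child in t_i[1:])
    return stores, rewritten

if __name__ == "__main__":
    operand = ("add_L", ("cont_of_dreg_L", "d1"), ("const_value_L", "4"))
    is_am = lambda term: term[0] == "cont_of_dreg_L"  # hypothetical predicate
    print(rewrite_operand(operand, is_am, make_fresh()))
```

The rewritten operand refers to the temporary in source position, so the rules can be applied again to the smaller tree.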
When the compiler writer describes the translation of an expression from the source AAS to the target AAS, the term produced contains embedded arithmetic and access path operators. Thus it is necessary to specify rules to deal with nested arithmetic operators. Informally, when in an instruction context a subtree $t_i$ cannot be reduced to an access mode instance using the rules of [DMR 89], then, starting from the leaves of $t_i$, we look for the biggest subtree $t_{ij}$ of $t_i$ which matches the left son of an arithmetic instruction A. First, the subtree of $t_{ij}$ corresponding to the source operand modified by the instruction is saved in a temporary; second, the subtree of $t_{ij}$ is replaced by the reference to the temporary in source position; third, an instance of A is generated using $t_{ij}$; and fourth, $t_{ij}$ is replaced by the reference to the temporary in source position.

   , $h-cs (Denotation_temporary))
   -- replace the information related to the temporary by the intersection of the
   -- constraints related to the left hand side instructions with that of current one
end where ;

For instance, in the asl tree, the constraint for tmp0 is Dregister_access-interface, which binds the temporary tmp0 with a long data register as the size of the instruction is long. In the same way, the constraint for tmp1 in the sub tree is Altdata_access-interface, which also binds tmp1 with a long data register. tmp2 is bound with a long data register because of the constraints on the second operand of an addi instruction.
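The intersection of constraints on a temporary can be pictured with the following Python sketch. It is an illustration only: the table `ADMISSIBLE`, which maps an interface name to the kinds of register it admits for a temporary, is hypothetical (the All_access-interface entry in particular is an assumed name), and the symbol table is modelled as a plain dictionary rather than as a global attribute of the AG.

```python
# Hypothetical table: kinds of register admitted by each access-class interface.
ADMISSIBLE = {
    "All_access-interface": frozenset({"dregister", "aregister"}),
    "Altdata_access-interface": frozenset({"dregister"}),
    "Dregister_access-interface": frozenset({"dregister"}),
    "Aregister_access-interface": frozenset({"aregister"}),
}

def constrain(symtab, temporary, interface):
    """Intersect the constraint already recorded for a temporary with a new one."""
    allowed = ADMISSIBLE[interface]
    symtab[temporary] = symtab.get(temporary, allowed) & allowed
    return symtab

def collect(uses):
    """Fold a sequence of (temporary, interface) uses into a symbol table."""
    symtab = {}
    for temporary, interface in uses:
        constrain(symtab, temporary, interface)
    return symtab

if __name__ == "__main__":
    uses = [("tmp0", "All_access-interface"),
            ("tmp0", "Dregister_access-interface"),
            ("tmp1", "Altdata_access-interface")]
    print(collect(uses))
```

An empty intersection would indicate that no single kind of register satisfies every use of the temporary; how the system handles that case is not covered by this sketch.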
The overloading of the denotation of a temporary is done by looking for the information coupled to the name of the temporary in the global attribute $symtab.
where Denotation_temporary -> use
   $term := let info : info-binding :=
                lookup ($id (Denotation_temporary), $symtab (Denotation_temporary))
            in
               if info = Aregister_access-interface then
                  Temporary-union (Denotation_aregister ( ) with
                     $type := aregister,
                     $id := $id (Denotation_temporary)
                  end with)
               elsif info = Dregister_access-interface then
                  Temporary-union (Denotation_dregister ( ) with
                     $type := dregister,
                     $id := $id (Denotation_temporary)
                  end with)
               else
                  Temporary-union (Denotation_temporary ( ) with
                     $type := temporary,
                     $id := $id (Denotation_temporary)
                  end with)
               end if ;
end where ;
The overloading of the temporary access mode is done using the union of types. 
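The semantic rule above can be paraphrased in Python as follows. This is a sketch, not generated FNC-2 code: the symbol table is assumed to be a dictionary from temporary names to interface names, and the `Denotation` record with its `type` and `id` fields is a hypothetical stand-in for the union of denotation types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Denotation:
    type: str  # 'aregister', 'dregister' or 'temporary'
    id: str    # name of the temporary, preserved by the overloading

def overload(temp_id, symtab):
    """Rebuild the denotation of a temporary from the constraint found in symtab."""
    info = symtab.get(temp_id)
    if info == "Aregister_access-interface":
        return Denotation("aregister", temp_id)
    elif info == "Dregister_access-interface":
        return Denotation("dregister", temp_id)
    # No register constraint recorded: the node stays a plain temporary.
    return Denotation("temporary", temp_id)

if __name__ == "__main__":
    table = {"tmp0": "Dregister_access-interface"}
    print(overload("tmp0", table))
    print(overload("tmp3", table))
```

In every branch the identifier of the temporary is kept, so later passes can still relate the register denotation to the original temporary.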
