Generating Efficient Simulators from a Specification Language by Larsson, Fredrik
SICS Technical Report
T 
ISSN 
Uppsala Master's Theses in
Computer Science
Examensarbete DV
  

 
ISSN 
Generating E cient Simulators from a Specication Language
  
Fredrik Larsson
Computing Science Department
Uppsala University
Box    S   Uppsala Sweden
This work has been carried out at
Swedish Institute of Computer Science
Box  	 S 	
  Kista Sweden
and has been sponsored by
Ericsson Utvecklings AB
Box   S  

Alvsjo Sweden
Abstract
A simulator is a powerful tool for hardware as well as software development. However,
implementing an efficient simulator by hand is a very labour-intensive and error-prone
task. This paper describes a tool for automatic generation of efficient instruction set
architecture (ISA) simulators. A specification file describing the ISA is used as input to
the tool. Besides a simulator, the tool also generates an assembler and a disassembler for
the architecture. We present a method where statistics are used to identify frequently used
instructions. Special versions of these instructions are then created by the tool in order to
speed up the simulator. With this technique we have generated a SPARC V8 simulator
which is more efficient than our hand-coded and hand-optimized one.
Keywords: Instruction Set Simulator, Interpreter, Specification Language, Instruction
Set Architecture, SPARC, Automatic Code Generation
Supervisors: Peter Magnusson, Bengt Werner
Examiner: Johan Bevemyr
Contents
1 Introduction
    Background
    Levels of Abstraction
    The Aim of This Thesis
    Benefits of a Simulator Generation Tool
    Organization of This Thesis
2 Simulation Techniques
    Intermediate Format
    Threaded Code
3 The Simulator Generation Tool
    Aims
    Design
    First Approach
    Improvements
    Core Interface
    Test Suites
    Implementation
    Discussion
4 The Description Language
    Requirements
    What Needs to be Expressed
    Example of Instruction Coding
    Design
    First Approach
    Improvements
    Discussion
5 The Intermediate Format
    Introduction
    Requirements
    Packing Parameters
    Storing the Intermediate Format
    Optimizing Using Statistics
    Motivation
    Specialization
    Generalization
    The Iteration Process
6 Generated Parts
    Introduction
    Main Include File
    The Decoder
    The Service Routines
    The Disassembler
    The Statistics Converter
    The Assembler and Output Functions
7 Performance
8 Future Work
9 Related Work
10 Conclusion
A An Example of a Specification: SPARC
B An Example of a Decoder
C An Example of Generated Service Routines
1 Introduction
In this section we give a brief introduction to the field of simulators: why they are used and
how they can be created. The purpose of this work will also be discussed.
Background
Ever since the early days of computer history, simulation techniques have been an important
research area. When new computer systems are to be built, it is essential that the behavior and
correctness of such systems can be tested and verified in an efficient way. With a simulator
it is, for example, possible for a hardware manufacturer to test new ideas and solutions for
a component, as well as to measure its performance, even though it has not yet been built.
Using simulators for hardware verification can thus reduce both the development cost and
time significantly.
Besides hardware verification, simulators can be used as a tool in software development. A
simulator can act as an ordinary profiler, counting instructions, locating commonly used
routines, etc., but it also has the ability to collect information on a much lower level. For
example, the cache performance of a program can be analyzed, giving detailed information
about which instructions cause cache misses. Pipeline throughput and functional unit usage
can be examined. Also, if an entire operating system is run on top of a simulator, the behavior
and performance of the whole system can be analyzed. All this is important when optimizing
for a specific target machine.
When debugging a nondeterministic system which depends very much on timing issues, such
as an operating system kernel or a server program (lots of timer events), a simulator can be a
very helpful tool: if it is deterministic (a requirement), it can reproduce the same
state every time the system is run. In this way it is possible to isolate infrequent,
scheduling-dependent errors.
Again, if a simulator exists for a new architecture which has not yet been built, it is still
possible to write software for it. Development of hardware and software can be done in
parallel or in any order. The simulator g88, for example, was used to debug a UNIX kernel
before hardware was available.
A third, and perhaps most common, reason for developing a simulator is that it makes it
possible for one machine to run applications from another environment. A common word for
this kind of simulator is emulator. The PCx Software Emulator for the Amiga computer,
which emulates a Pentium Pro, is an example.
Levels of Abstraction
A simulator can simulate a processor at different levels of abstraction, from the analog
transistor level to the instruction set architecture level as seen by an assembly-language
programmer. A list of different levels which could be identified is presented here:
- Instruction Set Architecture Level: The level of the instruction set architecture (ISA)
  is the highest level of abstraction, where only the result of each instruction is seen
  and not the mechanism behind it (an assembly-language programmer's view). For example,
  no pipeline or functional units are modeled.
- Organizational Level: At the organizational level, pipelines, instruction fetch and
  issue, caches, the MMU, the behavior of the functional units, etc. are simulated. This
  gives an almost clock-cycle-true simulator in which the timing of the instructions and
  potential resource conflicts are considered.
- Register Transfer Level: The register transfer level describes the internal operation
  of the functional units: how storage elements, buses and control signals are configured
  (bit level).
- Logical Level: The logical level simulates the logical equations that implement a
  given datapath.
The lower the simulation level is, the more information needs to be processed, which leads
to longer simulation times. Of course, different parts of the simulator can use different levels
of abstraction. The user can concentrate on implementing the parts she wishes to
examine at a lower level, thus reducing simulation time for other, less interesting parts. It
is even possible to switch level during simulation, for example if more accurate information is
needed from certain parts of a program while other parts are of no interest beyond keeping
a correct abstract state. Such a technique has been used in SimOS and is discussed in
more detail elsewhere.
The Aim of This Thesis
Implementing a simulator by hand can be a very labour-intensive and error-prone task. It
could take several months to complete a simulator, which means that a lot of work will be
focused on making the simulator work instead of using it as a tool for making design decisions.
It might also be difficult to verify the correctness of the simulator when it is ready for use.
It would be useful to have a tool which makes it possible to generate a simulator from some sort
of specification (a meta-tool). Such tools already exist today, but most of them are focused
on abstraction levels below the ISA level (see the section on levels of abstraction) and are
thus more suitable for hardware verification than for making instruction set simulators. An
example of a Hardware Definition Language (HDL) is VHDL. For the ISA level there does exist
a tool, but its solution for generating simulators (writing the specification of an ISA in a
functional-like language and then executing that specification when simulating) is not
efficient enough for all needs.
Thus the aim of this thesis was to construct a tool that could, from an ISA specification,
generate a simulator that should be as fast as, or preferably faster than, a hand-coded and
hand-optimized one. Of course, such a simulator will be considerably more efficient than
simulators on lower abstraction levels.
 Benets of a Simulator Generation Tool
As indicated in the previous section there are several benets of using a simulator generation
tool SGT The user will make less errors if all information of an ISA is gathered into one
preferably compact specication le instead of being scattered around over several source
les Changing or modifying instruction sets for a simulator is much easier done by means
of a tool than by hand The user can concentrate herself on central parts of the system eg
how the semantics of an instruction should be implemented in order to be eciently executed
She can forget about routine work such as writing code for decoding instructions etc which
will be generated by the tool
In our approach we use execution statistics of instructions to optimize the simulator Such a
task would be nearly impossible to do by hand or at least extremely laborious especially if
several simulators are to be optimized for dierent program types
A SGT also has the ability of generating other utility tools such as assemblers disassemblers
and test programs for validating the correctness of the simulator which could be very helpful
Organization of This Thesis
Section 2 describes some important simulation techniques used in the generated simulator. In
section 3 we discuss our approach to an SGT. Section 4 describes our specification language.
We use an intermediate format for more efficient interpretation, and this is covered in section
5. The generated parts of the simulator are described in section 6. We present the performance
of a generated simulator in section 7, related work in section 9, future work in section 8, and
our conclusions in the last section.
2 Simulation Techniques
From previous work we are familiar with how to construct efficient simulators.
We of course want to make use of this knowledge when we design a simulator generation tool.
Below we summarize some important techniques from that earlier work which can
be used to implement an efficient ISA simulator.
Intermediate Format
A common way to implement a simulator is to use an interpretation loop that loops over the
binary and interprets the instructions one at a time. The simulator decodes the instructions at
runtime and then calls the corresponding service routine, which performs the function of the
instruction. If the instructions in the source code have complicated bit patterns (opcodes),
which is very common for instruction sets, the decode phase could be very expensive to
perform at runtime. Therefore, a better approach is to first translate the source code into
an intermediate format which is much faster to interpret. Figure 1 shows this translation.
Figure 1: Mapping to intermediate format. (Diagram labels: instructions A, B and C in the
Native Format; A, B and C in the Intermediate Format; Service Routines.)
The major dierence between the native format the source code and the intermediate for
mat is that the latter is optimized for software interpretation as opposed to hardware In
our scheme it contains a pointer to the service routine for the instruction With this repre
sentation we only need to decode the parameters of the instructions and perform a jump to
the service routine This makes the simulation much more ecient
For performance reasons it is not always the best solution to have one service routine per
instruction Commonly used instructions could have special versions of service routines in
	
order to speed up the simulator
 
Rarely used instructions could be brought together into one
service routine where they could for instance share code

This makes the mapping between
instructions and service routines a rather dicult task when it comes to optimization Our
solution to this will be discussed in more details in section 
The instruction parameters such as register numbers immediate values branch osets etc
must also be found in the intermediate format But here we have the ability to store them in a
dierent way which can make the simulation more ecient We can for example precalculate
certain transformations which otherwise must be performed at runtime
With intermediate format we mean both what service routines there are the mapping and
how their parameters are stored
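The idea of decoding once and then interpreting a cheaper representation can be sketched in C as below. This is only a minimal illustration, not the format produced by the tool: the types `cpu_t` and `interm_t`, the routines `sr_add_reg` and `sr_add_imm`, and the functions `decode` and `run` are hypothetical names, only the two SPARC add forms are handled, and the hardwired zero register is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated state: 32 integer registers. */
typedef struct { uint32_t reg[32]; } cpu_t;

/* One intermediate-format entry: a pointer to the service routine plus
   parameters that were decoded once, ahead of execution. */
typedef struct interm {
    void (*service)(cpu_t *, const struct interm *);
    uint8_t rd, rs1, rs2;
    int32_t imm;                 /* already sign-extended at decode time */
} interm_t;

static void sr_add_reg(cpu_t *c, const interm_t *p)
{
    c->reg[p->rd] = c->reg[p->rs1] + c->reg[p->rs2];
}

static void sr_add_imm(cpu_t *c, const interm_t *p)
{
    c->reg[p->rd] = c->reg[p->rs1] + (uint32_t)p->imm;
}

/* Translates one 32-bit native word. Returns 0 on success and -1 if the
   word is not one of the two add forms this sketch knows about. */
int decode(uint32_t w, interm_t *out)
{
    assert(out != NULL);
    if ((w >> 30) != 2 || ((w >> 19) & 0x3f) != 0)
        return -1;
    out->rd = (uint8_t)((w >> 25) & 0x1f);
    out->rs1 = (uint8_t)((w >> 14) & 0x1f);
    if ((w >> 13) & 1) {                      /* i = 1: immediate operand */
        int32_t imm = (int32_t)(w & 0x1fff);
        if (imm & 0x1000)
            imm -= 0x2000;                    /* precalculated sign extension */
        out->imm = imm;
        out->rs2 = 0;
        out->service = sr_add_imm;
    } else {                                  /* i = 0: register operand */
        out->rs2 = (uint8_t)(w & 0x1f);
        out->imm = 0;
        out->service = sr_add_reg;
    }
    return 0;
}

/* Plain interpretation loop over already translated code. */
void run(cpu_t *c, const interm_t *code, size_t n)
{
    for (size_t i = 0; i < n; i++)
        code[i].service(c, &code[i]);
}
```

A caller would translate each native word with `decode` once and then pass the resulting array to `run`; the bit-level work, including the sign extension of the immediate, is then no longer repeated every time an instruction executes.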
Threaded Code
With the technique explained above, threaded code can be used to make the simulator
even more efficient. With threaded code, no loop structure is used for the interpretation.
Instead, all necessary loop code is rolled out at the end of each service routine. This means
that the last thing a service routine performs is a jump to the next one, which can be found
in the intermediate format. In this way no subroutines need to be called, and thus we can
save a few assembler instructions (see footnote 3). The simulator could be said to be
executing a thread, since it never returns from a call.
 
Footnote 1: For example, if it is very common to add one, we could make a special service
routine whose only task is to perform that. In this way we do not need to extract the
immediate value (1) from the instruction during runtime.
Footnote 2: This could lead to better cache utilization of the host machine running the
simulator.
Footnote 3: Typically we do not need to save the return address or to do the return from the
subroutine. We also save the jump to the top of the interpretation loop, since we do not
have one.
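Threaded dispatch as described in this section can be sketched in C as follows. The sketch assumes a compiler that supports the GNU C "labels as values" extension (`&&label` and `goto *`), which is not part of standard C; the toy instruction set (`OP_INC`, `OP_DOUBLE`, `OP_HALT`) and the function `run_threaded` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_INC, OP_DOUBLE, OP_HALT };   /* hypothetical toy instruction set */

#define MAX_CODE 64

/* Translates the opcodes into an intermediate format that holds the address
   of each service routine, then executes it without an interpretation loop. */
int32_t run_threaded(const uint8_t *ops, size_t n, int32_t acc)
{
    /* GNU C extension: label addresses stored in a table. */
    static void *const label[] = { &&inc, &&dbl, &&halt };
    void *code[MAX_CODE + 1];
    void **pc;

    assert(n <= MAX_CODE);
    for (size_t i = 0; i < n; i++) {
        assert(ops[i] <= OP_HALT);
        code[i] = label[ops[i]];
    }
    code[n] = &&halt;                  /* sentinel terminates every program */
    pc = code;

    /* The dispatch code is replicated at the end of every service routine. */
#define NEXT() goto *(*pc++)

    NEXT();
inc:
    acc += 1;
    NEXT();
dbl:
    acc *= 2;
    NEXT();
halt:
    return acc;
#undef NEXT
}
```

Each routine ends with its own copy of the dispatch (`NEXT()`), so there is no call, no return and no jump back to the top of a loop between two simulated instructions.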


3 The Simulator Generation Tool
In this section our concept of a simulator generation tool (ISA level) will be presented. A
more detailed description of the different parts will be given in the following sections.
Aims
The simulator generation tool should be able to:
- find a good intermediate format for fast interpretation
- generate efficient threaded C code for the simulator
- generate an assembler and a disassembler
- make optimizations by using statistics
- generate test suites for testing the correctness of the simulator
Design
Clearly we need a specification of the architecture to make a simulator of. A complete
ISA specification must contain information about how instructions are coded (opcodes), the
semantics of each instruction (what they should do), the syntax for the assembler and
disassembler, and the register and memory structure. To simulate accurate timing, information
about resources, e.g. caches and pipelines, needs to be added. The tool should then be able to
generate all the different parts of a simulator from this specification.
This seemed to be a lot of work and not achievable within a six-month thesis project. Therefore
we had to focus on something at first. Since we already had a simulator, SimICS, which
could run SPARC V8 code, it was natural for us to use it as a basis and try to replace
parts of it with generated ones. In this way we always had a version of the simulator which was
runnable, and we could always compare the generated parts with the old ones, which helped
us find bugs even in SimICS. We focused on the instruction set and on implementing
SPARC V8 instructions from a specification, using the simulator core with all register and
condition codes as well as the memory simulation from SimICS. But all the time we had in
mind that the tool should be able to generate simulators for other processors, including CISC
(Complex Instruction Set Computer) ones.
First Approach
Figure 2 shows what the tool is expected to do. The intermediate format is first created
from the specification, and then all the components. The decoder needs both the native
format (described in the specification) and the intermediate format, since it should do the
translation between the two. This translation is performed before the instruction is executed
the first time. The intermediate code is then saved so that the decoder does not have to be
called on subsequent invocations of the instruction. The service routines are created from the
intermediate format, and the assembler and the disassembler are generated directly from the
specification.
Figure 2: Overall structure. The ellipses represent data and the squares are generated
components. (Diagram labels: Specification, Intermediate Format, Decoder, Service Routines,
Assembler, Disassembler.)
Optimizations were not considered in this first approach; instead we concentrated our work
on integrating the generated parts with SimICS. Our specification language (see footnote 4)
was simple but expressive: the whole SPARC V8 instruction set could be described, but we only
implemented a few test instructions, which worked well together with SimICS. The intermediate
format used a one-to-one mapping between instructions and service routines, since this was the
easiest to begin with.
The decoder, the disassembler and a primitive assembler worked fine as well.
Improvements
With the first approach we had something that could run, so we now focused on improvements.
The intermediate format defines a mapping between opcodes and service routines. This
format can be changed by making specializations or generalizations of service routines. A
specialization defines new service routines that handle special cases faster than the original
routine. For instance, a specialized version of an add instruction could be an increment
instruction, which always adds one. The increment instruction can be implemented more
efficiently than the add instruction, since we do not need to extract the immediate value (1 in
this case) from the intermediate format. A generalization, on the other hand, is the opposite
of a specialization. Here the same service routine is used for several instructions. This way
they can share code, and thus we can make better use of the instruction cache of the host
machine running the simulator.
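The difference between a specialized and a generalized service routine can be sketched in C as below. This is a hedged illustration under assumed names: `entry_t`, `sr_add_imm`, `sr_inc`, `sr_logic_imm`, `bind_add_imm` and the sub-operations `SUB_XOR` and `SUB_OR` are hypothetical, and the choice of xor and or as "rarely used" operations is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t reg[32]; } cpu_t;

typedef struct entry {
    void (*service)(cpu_t *, const struct entry *);
    uint8_t rd, rs1, subop;
    int32_t imm;
} entry_t;

enum { SUB_XOR, SUB_OR };   /* hypothetical rarely used operations */

/* General routine: reads the immediate from the intermediate format. */
void sr_add_imm(cpu_t *c, const entry_t *e)
{
    c->reg[e->rd] = c->reg[e->rs1] + (uint32_t)e->imm;
}

/* Specialization: the immediate is known to be 1, so it is never read. */
void sr_inc(cpu_t *c, const entry_t *e)
{
    c->reg[e->rd] = c->reg[e->rs1] + 1u;
}

/* Generalization: two operations share one routine and its operand fetch. */
void sr_logic_imm(cpu_t *c, const entry_t *e)
{
    uint32_t a = c->reg[e->rs1];
    uint32_t b = (uint32_t)e->imm;
    c->reg[e->rd] = (e->subop == SUB_XOR) ? (a ^ b) : (a | b);
}

/* Decoder-side choice between the general and the specialized routine. */
void bind_add_imm(entry_t *e, uint8_t rd, uint8_t rs1, int32_t imm)
{
    assert(rd < 32 && rs1 < 32);
    e->rd = rd;
    e->rs1 = rs1;
    e->imm = imm;
    e->subop = 0;
    e->service = (imm == 1) ? sr_inc : sr_add_imm;
}
```

The decision is taken once, when the instruction is translated, so the test `imm == 1` costs nothing at execution time.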
When nding an ecient intermediate format execution statistics can be used to give hints
of which service routines that need to be created which specializations and generalization we
should make Figure  describes our scheme of how to do this

See section  for information about how the specication language looked like in this rst approach

Statistics Converter
Service Routines
Instruction Statistics Raw Statistics
Assembler
Disassembler
Specification
Decoder
Intermediate Format
Figure   Overall structure with optimization using statistics
Here a statistics converter is first created from the specification. The purpose of this
module is to convert raw opcode statistics to instruction statistics. Opcode statistics contain
information about how frequently every unique opcode (bit pattern) is executed, i.e. how
common one particular add instruction, with its specific registers and immediate value, is,
for example (see footnote 5). This information could be output
from a simulator or some other tool. In instruction statistics, on the other hand, all bit
patterns matching a service routine are grouped together. A bit pattern is parsed into a set of
fields with assigned values. The frequency of each field set is kept, so that if a service routine
is specialized into two new ones, the statistics are also split between them. The reason for
doing this translation is that the specification gives us information about the instructions on
the instruction level (the add instruction) rather than on the opcode level (all variants of add).
When the statistics converter has been generated as a stand-alone tool (see footnote 6), the
generation of the intermediate format is performed. This process could be viewed as an
iteration where better and better intermediate formats are produced. For each format a
simulator is created, and by measuring its performance we can see if it is more efficient than
the previous version. The statistics are used to see which service routines to create. For a
more detailed description of this process, see the section on the iteration process.
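The conversion from raw opcode statistics to instruction statistics can be pictured with the C sketch below. It is an illustration under assumptions, not the generated converter: the record types `raw_stat_t` and `pattern_t`, the functions `convert_stats` and `add_imm_frequency`, and the two-entry pattern table are hypothetical. The mask and values encode the SPARC add forms (op = 2, op3 = 0, i = 0 or 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Raw statistics: one record per unique bit pattern that was executed. */
typedef struct { uint32_t word; uint64_t count; } raw_stat_t;

/* Instruction-level pattern: the bits that identify the instruction. */
typedef struct { const char *name; uint32_t mask, value; } pattern_t;

static const pattern_t patterns[] = {
    { "add (register)",  0xC1F82000u, 0x80000000u },
    { "add (immediate)", 0xC1F82000u, 0x80002000u },
};
#define N_PATTERNS (sizeof patterns / sizeof patterns[0])

/* Groups the raw counts per instruction. totals must have N_PATTERNS
   elements. Returns the number of raw records that matched no pattern. */
size_t convert_stats(const raw_stat_t *raw, size_t n, uint64_t *totals)
{
    size_t unmatched = 0;

    assert(totals != NULL && (raw != NULL || n == 0));
    for (size_t p = 0; p < N_PATTERNS; p++)
        totals[p] = 0;
    for (size_t r = 0; r < n; r++) {
        size_t p;
        for (p = 0; p < N_PATTERNS; p++) {
            if ((raw[r].word & patterns[p].mask) == patterns[p].value) {
                totals[p] += raw[r].count;
                break;
            }
        }
        if (p == N_PATTERNS)
            unmatched++;
    }
    return unmatched;
}

/* Frequency of one field value: how often add-immediate ran with the given
   13-bit immediate. This is the kind of figure that hints at a specialization. */
uint64_t add_imm_frequency(const raw_stat_t *raw, size_t n, uint32_t simm13)
{
    uint64_t total = 0;
    for (size_t r = 0; r < n; r++) {
        if ((raw[r].word & patterns[1].mask) == patterns[1].value &&
            (raw[r].word & 0x1FFFu) == simm13)
            total += raw[r].count;
    }
    return total;
}
```

If `add_imm_frequency` shows that most add-immediate executions use the value 1, that suggests creating an increment routine and splitting the statistics between the two routines.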
As before, the decoder needs information about the native format as well as the intermediate
format. But this decoder is a little more complicated than the previous one. Since no
one-to-one mapping is used between instructions and service routines, this one must be able to
recognize special instances of the instructions and which instructions should use the same
service routine.
The assembler could be created directly from the specification, but the disassembler uses some
information from the intermediate format. In this way it is able to show which special service
routine a certain instruction uses. This is a nice feature when debugging and optimizing the
simulator.
Footnote 5: A SPARC V8 instruction. It adds 1 to the contents of a global register and stores
the result in a global register.
Footnote 6: This means that in order to create the simulator, the SGT must first create another
tool. The SGT has now become a meta-meta-tool.
Core Interface
With the scheme described in the previous section, the architecture of a generated simulator
can be viewed in figure 4.
Figure 4: Architecture of a generated simulator. (Diagram labels: The Simulator, containing
Decoder, Disassembler, Service Routines, Simulator Core and Devices; Helper Tool, containing
Statistics Converter, Assembler and Disassembler; Static Functionality.)
Here the shaded parts of the simulator are not generated by the tool and must thus be supplied
from some other source. In our case we use SimICS. Devices include the memory hierarchy, disks
or other storage units, bitmapped screens and other peripheral devices we wish to simulate.
The simulator core contains all start-up code for the simulator, generic event queues to control
the simulator, the command line interface, debugging facilities, and architectural aspects such
as the structure of the register file and whether delay slots are used. The simulator core
should be so generic that it could support different architecture specifications.
When an instruction needs to refer to the core part of the system (it could be a memory
instruction, for instance, reading from memory), it uses special access primitives. These
primitives are used directly in the specification, and thus we make the specification
independent of the user-written core and device parts (except for the primitives, of course).
This means that it is very easy to replace these parts without changing the specification,
e.g. when adding a new timer device or changing the cache size.
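The separation through access primitives can be illustrated with the C sketch below. Only the `REG` macro is named in this thesis; everything else here is an assumption made for the example: the `core_t` layout, the `MEM_READ32` primitive, the `core_read32` function behind it, and the load routine `sr_load_word`. It is not the SimICS interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hand-written core side (hypothetical layout). */
typedef struct {
    uint32_t reg[32];
    uint8_t mem[256];
    unsigned long mem_reads;   /* a cache or timing model could hook in here */
} core_t;

static uint32_t core_read32(core_t *core, uint32_t addr)
{
    assert(addr <= sizeof core->mem - 4);
    core->mem_reads++;
    /* Big-endian byte order, as on SPARC. */
    return ((uint32_t)core->mem[addr] << 24) |
           ((uint32_t)core->mem[addr + 1] << 16) |
           ((uint32_t)core->mem[addr + 2] << 8) |
           (uint32_t)core->mem[addr + 3];
}

/* The access primitives that the specification is allowed to use. */
#define REG(n)        (core->reg[(n)])
#define MEM_READ32(a) core_read32(core, (a))

/* Generated side: a service routine written only in terms of primitives. */
void sr_load_word(core_t *core, unsigned rd, unsigned rs1, int32_t simm13)
{
    REG(rd) = MEM_READ32(REG(rs1) + (uint32_t)simm13);
}
```

Replacing `core_read32` with a version that models a cache would change nothing in `sr_load_word`, which is the independence the primitives are meant to provide.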
The simulator needs the disassembler, since it should be possible to step through a program
instruction by instruction and also to disassemble parts of a program.
The helper tool consists of the same disassembler as in the simulator, an assembler and the
statistics converter. The shaded part of the helper tool just controls it and does not need to
be changed for different architectures.
Test Suites
The process of generating test suites is rather complicated, since it requires deeper knowledge
about the semantics of the instructions than what is necessary for generating a simulator. We
must, for example, know that a branch instruction is a branch instruction in order to test it
properly. So generation of test suites was not considered during the time of the thesis work.
A SPARC V8 suite developed earlier was used to verify correctness.
Implementation
The SGT, called SimGen, was implemented in C for maximum performance. Handling
large amounts of statistics requires a fast tool. However, the development time could have been
shorter if we had used a language with better support for symbols and dynamic data types.
Flex and Yacc were used for generating parsers for the specification language.
Discussion
In our approach we separate the instructions, which are specified in the description language,
from the devices and the simulator core, which must be implemented by hand. This solution
was not our intention from the beginning. Instead we wanted to be able to generate the
whole architecture from a single specification. Lack of time forced us to concentrate on the
instruction set. However, specifying an entire system, including register structure, memory
hierarchy, delay slots, etc., could be a very tricky task, especially if we want the system to
be as efficient as possible. Therefore, implementing those parts by hand is not such a bad
idea after all.
4 The Description Language
In this section we will talk about our specification language: how it was developed and how
the final version looks.
Requirements
The specification language should
- describe the processor on the ISA level
- be able to express RISC as well as CISC architectures
- include syntax descriptions for the instructions
- be expressive but compact
- make it possible to generate efficient simulator code
- be easy to use
We thus exclude description of
- register architecture and condition codes
- memory hierarchy
- other devices
which must be implemented by hand by the user, but it should be possible to access their
functionality within the specification. This could be done by using macros, functions or global
variables. The service routines, which will contain these references later on, will be compiled
and linked with the user-written modules.
What Needs to be Expressed
What must be expressed in the specification language is, first, the bit patterns (opcodes)
that encode the instructions (see the next section for an example). This information is needed
by the decoder, the disassembler and the statistics converter, since they must be able to
identify the different instructions in the native format. The assembler also needs it when
writing the instructions to a file. Second, the semantics of the instructions is of course
needed, and can in an abstract way be viewed as functions transforming the state of the
simulated processor from one state to another. Third, the syntax of the instructions, which is
used by the disassembler and assembler, is also needed. Instruction statistics, describing how
frequently different bit patterns are used, must also be specified somewhere for optimization,
but this information is better stored in a separate place since it is created by a tool rather
than handwritten.
Example of Instruction Coding
Typical machine code instructions are built up of different fields, which form formats. Those
formats could be of varying length, which is common in CISC instruction sets, or have a
fixed length as in RISC instructions (typically 32 bits on a modern processor). Below, two
instructions from the SPARC V8 architecture are shown.
Add with register: add rs1, rs2, rd
    op  rd     op3     rs1    i  (unused)  rs2
    10  XXXXX  000000  YYYYY  0  00000000  ZZZZZ
Add with signed immediate: add rs1, simm13, rd
    op  rd     op3     rs1    i  simm13
    10  XXXXX  000000  YYYYY  1  WWWWWWWWWWWWW
Each column in the tables represents one field. In the first row the names of the fields are
given, and in the second row their values in binary. Those fields with numbers (here op,
op3 and i) identify the instruction. The only difference in this case between the instructions
is the value of the i field, which tells whether the instruction is an add with register or an
add with immediate. The fields containing the letters X, Y, Z or W have variable values and
are parameters to the instruction, pointing out different registers or being immediate values.
The first add instruction has a field which is not used but should be set to zero.
The semantics of the first instruction is to add the contents of register rs1 to the contents
of register rs2 and store the result in register rd. The widths of the register fields are 5
bits, since the SPARC V8 architecture has 32 different registers. The second add instruction
uses an immediate value as an operand instead of a register. This value is stored in simm13,
which can hold values from -4096 to 4095 (two's complement). Since all SPARC V8 registers
are 32 bits, this field must be sign-extended before use.
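The field layout and the sign extension described above can be checked with a few lines of C. This is a small sketch: the helper names `bits` and `sign_extend` are chosen for the example and are not taken from the generated code, and the field positions are those of the SPARC format shown in the tables (op in bits 31 to 30, rd in 29 to 25, op3 in 24 to 19, rs1 in 18 to 14, i in bit 13, simm13 in 12 to 0).

```c
#include <assert.h>
#include <stdint.h>

/* Extracts bits hi..lo (inclusive, bit 0 is the least significant). */
uint32_t bits(uint32_t word, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> lo) & mask;
}

/* Sign-extends the low `width` bits of v to 32 bits (two's complement). */
int32_t sign_extend(uint32_t v, unsigned width)
{
    assert(width >= 1 && width <= 32);
    uint32_t sign = 1u << (width - 1);
    if (width < 32)
        v &= (1u << width) - 1u;
    return (int32_t)((v ^ sign) - sign);
}
```

For the word 0x84006005, which encodes an add with rs1 = 1, an immediate of 5 and rd = 2, `bits(w, 13, 13)` gives the i field and `sign_extend(bits(w, 12, 0), 13)` gives the operand as a signed 32-bit value.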


 Design
  First Approach
Our rst approach to the specication language can be viewed in gure  where the two add
instructions from the previous section are described
The specication begins with a description of dierent formats used The rst is called fa
and the second fb Both have a width of  bits Between the square brackets all elds of
the format are listed together with their widths The sum must be equal to the format width
Then follows an instruction specication which states that the addinstruction should use
format fa and interpret rs rs and rd as parameters The pattern declaration constrains
the values of the elds op op and i ie they identify the addinstruction The syntax

format f a   op rd op  rs	 i	 not
used rs
format f b   op rd op  rs	 i	 simm	 	 
instruction f a addrs	 rs rd
pattern
op  	  op     i  
syntax
 printfaddtrld rld rldn rs	 rs rd 
semantics
 REGrd  REGrs	  REGrs 
instruction f b addirs	 simm	  rd
pattern
op  	  op     i  	
syntax
 printfaddtrld rld rldn rs	 sign
extendsimm	 	  rd 
semantics
 REGrd  REGrs	  sign
extendsimm	  	  
Figure   Example of a specication of the SPARC instructions add with register and add with
immediate
declaration is just Ccode used for the disassembler This code could use the parameter elds
as ordinary variables The semantics parts is also built up by Ccode and is merely copied to
the instructions service routine Again the parameter elds can be used as variables The
REG construct is a macro dened in the simulator core which expands to a register reference
The addiinstruction add immediate is described in the same manner The sign extend
is also a macro which surprise signextends the simm value to  bits
Drawbacks
The specication approach described here is rather straight forward We dene the format
of the instructions and then the parameters syntax and semantics A benet of this is that
it is rather simple but it has some drawbacks too When specifying a whole architecture
this way the specication le tends to become rather large and thus hard to read since
every single instruction must be specied although some of them only dier slightly Take
the two add instructions for example They have two dierent addressing modes but their
common semantics is to add the rst operand to the second and then store the result in
the destination We want to have a specication on a higher abstraction level which should
separate the description of instructions and addressing modes

In the following section we present a solution for this, as well as other constructs that make
it possible to generate an efficient intermediate format from the specification.

The Motorola 680x0 architecture, for example, has over ten different addressing modes. This could
therefore be very useful.

 Improvements
Field Declaration
We have replaced the format declaration used in the first approach with field declarations.
This was done mainly because we want to be able to refer to fields globally, without a particular
format.
fields  
op 	  rd op 	 rs			 i	 	  simm	 	 rs
Here some of the SPARC V elds are specied The number  after the fieldskeyword is
an oset and means that these elds are placed within the rst  bits of an opcode op uses
the rst  bits and rs the last  Several declarations with dierent osets can be specied
This is useful when we want to dene elds for a CISC architecture which has dierent format
widths The minus character before simm species that this eld should be sign extended
before usage and that the tool now has this responsibility
Intermediate Form
A new construct is the intermediate form declaration. Here it is possible to specify certain
transformations that should be applied to the fields during the translation to the intermediate
format. For example, when accessing a register within a service routine, we must multiply the
register number by the register width in order to get the correct offset from the beginning of
the simulated register file. If we can calculate this during the translation to the intermediate
format rather than at runtime, we can save a shift instruction for every register access.
Of course, more complicated transformations can be used if necessary.
intermediate form
rd  REG
OFFSET
DSTrd   
rs	  REG
OFFSET
SRCrs	   
rs  REG
OFFSET
SRCrs   
Above, some transformations for the SPARC register fields are shown. The macros used here
are defined in the simulator core and are used to calculate the correct position within the
register file. The numbers after the field names state that the transformed field needs a total
of 11 bits, of which the lowest 2 will always be zero since we multiply by 4.
The tool can make use of this information when packing the fields into the intermediate format.
If all fields do not fit, for example, it can strip some zero-bits without losing information.
 	

Some simulator host architectures may have an assembler instruction for this, but not all.

The structure of the SPARC register file is rather complicated since it uses register
windows. Some smart transformations can be used to speed up register accesses, but these are not
explained here. See 

It then of course needs to shift the field at runtime, reproducing the zero-bits. If the
transformation was complicated, however, we could still gain execution time.
	
Combinative ContextSensitive Macros
By using combinative contextsensitive macros CCSmacros we can specify instruction se
mantics on a higher abstraction level and thus get a much more readable specication Figure
	 below shows how this could be done with our add instructions considered earlier
define OP	
fields  rs	  syntax rldrs	 semantics  REGrs	 
define OP
case i   
fields  rs  syntax rldrs semantics  REGrs 
case i  	 
fields  simm	   syntax ldsimm	  semantics  simm	  
define DST
fields  rd  syntax rldrd semantics  REGrd 
instruction addOP	 OP DST
pattern
op  	  op   
syntax
add OP	 OP DST
semantics
 DST  OP	  OP 
Figure   Here three different macros are defined, one for each operand of the add instruction. The
OP2 macro expands differently depending on the value of the i field.
A CCSmacro has a caselist which assigns dierent meanings to the macro depending on
which boolean expression evaluates to true A case expression is built up by constraints on
elds that must be satised in order for the corresponding casebranch to be used Exactly
one of these expressions must be true for every possible valuation of the elds If only one
meaning of the macro is requested the case expression can be omitted eg OP and DST Each
branch has three dierent parts which corresponds to the context sensitivity of the macro  If
it is used in the parameter list of an instruction denition the fieldsdeclaration will replace
the macro ie the eld list between the square brackets If the macro is used in the syntax
area the syntaxdenition replaces the macro and if it is used in the semantics part of an
instruction the semanticsdenition replaces the macro
The eect of using such macro is shown in gure 
 where the instruction denition for the
addinstruction is expanded into two dierent versions


Add With Register
Add With Immediate
instruction add i rs rs rd
pattern
op   		 op
   		 i  
syntax
add rldrs rldrs rldrd
semantics
 REGrd  REGrs  REGrs 
instruction add i rs simm
 rd
pattern
op   		 op
   		 i  
syntax
add rldrs ldsimm
 rldrd
semantics
 REGrd  REGrs  simm
 
Figure   The add instruction definition is expanded to two different versions
What has happened is that since the OP2 macro had two cases, we got an instruction for each
case. Note that the pattern part now has an extra condition on i, which comes from the macro.
If more than one multi-case macro is used, an instruction definition for every combination
(hence the name combinative) of cases will be generated. In the next figure some arithmetic and
logical instructions of the SPARC V8 architecture are defined. After the macro expansion
we get a total of 16 instruction definitions here (8 instructions with 2 addressing modes each).
The uses keyword is used to declare the use of the ARITH macro. Macros used as parameters
are implicitly declared, since this is so common.
define ARITH
case op     syntax add semantics   
case op   	  syntax sub semantics   
case op   	  syntax and semantics   
case op   		  syntax andn semantics   
case op   	  syntax or semantics   
case op   		  syntax orn semantics   
case op   		  syntax xor semantics   
case op   			  syntax xnor semantics   
instruction arithOP	 OP DST uses ARITH
pattern
op  	
syntax
ARITH OP	 OP DST
semantics
 DST  OP	ARITHOP 
Figure   Some SPARC arithmetic and logical instructions in one definition, assuming that OP1, OP2
and DST are already defined. The field declaration part of the case branches can be skipped since no
fields are used. We switch on the op3 field since it identifies the instructions.

Syntax Strings
The syntax denition used in the description language needs an explanation A denition
of the form 	AfT
EgB	 corresponds to the output from the C function call printf	A TB	
E E is an arbitrary Cexpression with the type T The A and B parts are ordinary text
including the  character since an expression is placed between the curly brackets f is
written nf For example 	Sum
 fld
g	 corresponds to printf	Sum
  ld	 
This representation was used because it matched well with the macros, since we only have to
replace the macros with the syntax strings without having to bother about C-style argument
lists. We also believe it is easier to parse these syntax strings when making a full assembler,
but we did not have time to do this within this thesis work.
Virtual Fields
Sometimes it is possible to have a more efficient but incomplete representation of some part
of an architecture. The states that are not representable are so unusual that we really want
to make use of the more efficient technique. Since we still want to be correct, we can let the
simulator execute in different modes: one correct but slow, and one incomplete but fast. If we
detect a state that is not representable, we can switch over to the slower mode and then switch
back as soon as possible. The simulation of condition codes in SimICS uses this technique.
Note that this is only used for simulator efficiency, i.e. representation of simulator state, and
must not change the behavior of the simulated processor.
Since we use dierent representations we need dierent service routines one per mode and
thus we must be able to express this in the specication language Our solution for this is to
introduce something we call virtual  elds which can be used as usual elds in case expressions
or in the pattern part of an instruction denition Instead of being a part of an opcode they
represent internal states of the simulator For every virtual eld the user need to supply a
function in the simulator core for example which denes the meaning of the eld ie which
dierent values it can have and when See appendix B for an example of the use of such
function The virtual elds are declared among the other elds as follows
fields  
op 	  rd modevirtual onpagevirtual
and can be used like ordinary elds 
instruction fOO Q M JB
pattern
bar    mode    onpage  	
!!!
We use a modeeld in our SPARC V implementation to specify whether an instruction
should execute in our optimized mode for condition codes or not If the decoder detects by

means of a user dened function that we are in the optimized mode a faster service routine
is invoked for the current instruction which uses the more ecient representation If a service
routine nds out that the state is not representable it switches over to the slower mode and
reexecutes the instruction But this time the service routine for the slower mode is used
In the same way if a slow mode service routine nds out that it is possible to execute in
optimized mode it can switch back to that mode Since some instructions now have two
dierent service routines we need to store both service routine pointers in the intermediate
format Actually we use two formats one for each mode
An onpageeld is used to separate branch instructions which jumps to the same memory
page from those who fall o This is done because we can implement onpage branches more
eciently than opage branches An onpage branch does not need to calculate the simulated
physical address we should jump to by using its virtual address ie going through the TLB
Translation Lookaside Buer Instead we can just add a proper oset to the physical
address If the decoder detects an onpage branch a service routine implementing the more
ecient target calculation is used For more information of this see 

 Discussion
Our main goal was to be able to generate a simulator from our specification language that
should be as fast as, or preferably faster than, a hand-coded and hand-optimized simulator. To
be as good as a human designer, the tool must know which optimizations and transformations
could be applied to certain constructs of the simulated system, e.g. a better representation
of condition codes and how to perform on-page branches more efficiently. Programming a tool
with this information is very tricky, if not impossible. It is not very flexible either, since
different optimizations could be applied to different architectures. A better approach is to let the
specification language contain constructs for optimization purposes, such as our intermediate
form declaration and the possibility to use virtual fields. These constructs were at first
developed in order to express optimizations used in SimICS for our SPARC V8 implementation,
but we believe they could be as useful for other architectures.
To be able to describe the instruction semantics on a higher abstraction level while still
being able to generate efficient code, we invented the CCS macros. From the beginning they
were only intended to express different addressing modes, but as the figure defining the
arithmetic and logical instructions shows, they can be used to express other things as well.
An alternative way to express different addressing modes compactly could be to determine the
addressing mode at runtime instead, e.g. by using the i field as a parameter and switching on it.
In this way we also only need a single instruction definition for the two different add
instructions, but we will increase execution time. The
CCS macros give us both speed and a compact specification which is easy to read.
All text between the { and the } markers is pure C code (besides the use of CCS macros)
and is just copied to the source code of the generated simulator. This means that errors will
first show up when compiling the simulator and not when the tool parses the specification.
This is of course a drawback, but the approach has the advantage of being easy to use, since
we do not need to invent a new language for the user to express the instruction semantics in.

 The Intermediate Format
Earlier we have talked about the intermediate format in a very schematic way. In this section
we give a more detailed description of this topic and how it is used by the SGT.
  Introduction
From an earlier section we are familiar with the basic concept of an intermediate format. We make a
translation to a format that is more efficient to interpret than the native one, because it contains
pointers to the service routines implementing the instructions as well as efficiently stored
parameters. But what do we mean by an instruction? The instructions used by a SPARC
processor, for example, have a width of 32 bits. This gives a total number of 2^32 possible
patterns. Should all the legal ones be viewed as different service routines, i.e. no parameters at
all, or should just one service routine be used, with the 32-bit opcode as the only parameter
and a very complicated semantics? Clearly the truth must be somewhere between these
extreme cases. We cannot use 2^32 service routines, and by using one we have not gained
much, since then the decoder has to be a part of the service routine.
When looking in an architecture manual, certain fields divide the bit patterns into different
instructions. It could be add with register or add with immediate, for example. This
division is a very intuitive one when choosing an intermediate format, but is it the best?
Suppose that it is very common to add with the immediate value  . In this case we could
have a special service routine which just does that. In this way we do not have to extract the
immediate parameter from the intermediate format at runtime, since we already know
the parameter equals  . We save some instructions in the service routine, and it also gives
the compiler that compiles the service routine a chance to make further optimizations.
Say that several instructions are very seldom used. Then we can use a single service routine
for them, which makes it possible for them to share code, and thus we can make better use of
the instruction cache on the host machine running the simulator.
These two cases show that we may get a more efficient simulator if we use an alternative way
to map bit patterns to service routines. The use of execution statistics will help us find
such a mapping.
 Requirements
  The service routine parameters should be packed in an efficient way.
  The intermediate format, including service routines, should be printable, so that it is possible
for the user to see the decisions made by the tool.
  The tool should optimize the set of service routines by using statistics.

 Packing Parameters
When running a service routine, some fields are implicitly set to specific values, i.e. the fields
used by the decoder to identify which service routine to store a pointer to in the intermediate
format. We call these static fields, since their values never change for a specific
service routine. This means that we do not need to store them in the intermediate format.
What remains are the parameter fields, which hold register numbers, immediate values,
branch offsets etc. This information we of course want to store in an efficient way.
From previous work we know that it is better to pack all parameters in one machine
word than to use one word for each parameter. This has to do with slow memory loads. If we store
the parameters in a single word we have to extract them by shifting and masking, but we
still gain execution time, since these operations are made on registers instead of memory
cells. In SimICS the intermediate format is 64 bits wide: 32 bits for the service routine
pointer and 32 bits for storing parameters. Since we started by implementing the same
architecture as SimICS simulates (SPARC V8), the same size for the intermediate format was
used. When implementing a CISC architecture, for example, we perhaps need to use a wider
format. However, the algorithm packing the parameters is flexible enough to use any width.
What we want to minimize is of course the number of assembler instructions which extract
the parameters. When we do this we must also consider aspects such as sign extension of
parameters before use, whether zero-bits are stored or not, and the intermediate transformation
of parameters which the user can specify (see the intermediate form declaration).
Our algorithm is described below. It must be pointed out that it is designed for generating
extraction code to be run on a SPARC host. Thus some other packing of
parameters is perhaps better for another host machine with another instruction set.
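As a rough picture of the format discussed above, one intermediate entry can be modelled in C as a service routine pointer plus one packed parameter word. This is a sketch under stated assumptions: the type names, the bit layout used by sr_add and the simple dispatch loop are invented for illustration. The thesis describes service routines that end in an unrolled interpretation loop epilogue rather than returning to a loop like this.

```c
#include <assert.h>
#include <stdint.h>

struct sim_state { uint32_t reg[32]; };

typedef void (*service_routine)(struct sim_state *s, uint32_t params);

/* One entry of the intermediate format: pointer plus packed parameters. */
struct intermediate_entry {
    service_routine routine;
    uint32_t params;
};

/* Example routine; the layout rd:[14:10] rs1:[9:5] rs2:[4:0] is assumed. */
static void sr_add(struct sim_state *s, uint32_t params)
{
    unsigned rd  = (params >> 10) & 31;
    unsigned rs1 = (params >> 5) & 31;
    unsigned rs2 = params & 31;
    s->reg[rd] = s->reg[rs1] + s->reg[rs2];
}

/* Simplified interpreter loop over already translated instructions. */
static void run(struct sim_state *s, const struct intermediate_entry *code,
                unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        assert(code[i].routine != 0);
        code[i].routine(s, code[i].params);
    }
}
```

The parameters arrive in a single word, so the routine extracts them with register operations only, which is the reason given above for packing them.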
Packing Algorithm
 The algorithm rst checks to see if all parameters t into the intermediate format when
all intermediate transformations are performed before the packing ie the intermediate
eld widths are used
 If the parameters do not t then zerobits are removed for some or all elds The user
can specify that some of the least signicant bits in a eld always will be zero see
section  and thus we only need to store the most signicant bits
 If still the parameters do not t some or all of the intermediate transformations must be
performed at runtime instead Of course if we have complicated transformations we
could gain execution time by storing them in separated machine words and thus be able
to precalculate the transformation anyway But since this require deeper knowledge of
the transformations we have not considered this

 If the parameters cannot be packed here, the algorithm gives up. However, for an
architecture with fixed instruction width this case is very unlikely to occur, since if we
use the same width for the intermediate format as for the native one (excluding the service
routine pointer), the parameters must fit.
 
 Now the parameters must be placed in the right order depending on their type We
use the following heuristic to decide how to order them 
  A parameter which need to be sign extended S should be placed to the left in
the format We thus only need to perform one arithmetic shift to signextend and
extract the parameter If other parameters must be signextended they must rst
be shifted up and then arithmeticshifted down again to produce the sign
  Zerobits Z users should be placed in the middle where only a shift and a mask
are enough for the extraction and reproducing of zerobits
  Normal parameters which do not need any transformation P should be placed to
the right or to the left where only a mask or a shift is enough for the extraction
Of course if they must be placed in the middle we need both a shift and a mask
  An index I parameter which identies instructions within a generalized service
routine see section  must be placed on the same location for all instructions
sharing a service routine We have chosen to place them in the least signicant
bits on the right
  If the service routine only has one parameter no shifting and masking is of course
necessary and signed parameters could be presignextended
This gure summarizes the parameter placements 
SSSSSSSSSSPPPPPPZZZZZZPPPPPPPIII
In order to extract all these parameters we need a total of  SPARC instructions which
are faster then  loads One shift for the Seld shiftmask for the high Peld
shiftmask for the Zeld shiftmask for the low Peld and a mask for the index
eld
Our parameter placement heuristic is not optimal. There are situations where it is possible
to save one or a few instructions if we reorder the parameters in an intelligent way. For
example, one possible optimization is to place Z fields at the bit position
equal to the number of zero-bits. In this way only one mask is necessary for extracting and
zero-bit-extending such a parameter (we mask away the bits around the field).
We could use an algorithm which generates all possible permutations of the fields and then
calculates which one is the most efficient. Such a method would find the optimal packing but
could be rather slow.
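The placement shown in the summary figure can be tried out with a small C sketch. The field widths follow the figure (10, 6, 6, 7 and 3 bits); the function names and the choice of two zero-bits for the Z field are assumptions. The signed extraction relies on the compiler implementing right shift of a negative int32_t as an arithmetic shift, which GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

enum { Z_ZERO_BITS = 2 };  /* assumed number of stripped zero-bits */

/* Layout, most significant bit first: S[31:22] P[21:16] Z[15:10] P[9:3] I[2:0] */
static uint32_t pack(int32_t s, uint32_t p_hi, uint32_t z,
                     uint32_t p_lo, uint32_t idx)
{
    assert(s >= -512 && s <= 511);
    assert(p_hi < 64 && p_lo < 128 && idx < 8);
    assert((z & ((1u << Z_ZERO_BITS) - 1)) == 0 && (z >> Z_ZERO_BITS) < 64);
    return ((uint32_t)s << 22) | (p_hi << 16) |
           ((z >> Z_ZERO_BITS) << 10) | (p_lo << 3) | idx;
}

/* S field: one arithmetic shift both extracts and sign-extends. */
static int32_t get_s(uint32_t w)     { return (int32_t)w >> 22; }
/* High P field: shift and mask. */
static uint32_t get_p_hi(uint32_t w) { return (w >> 16) & 0x3F; }
/* Z field: shifting by 8 instead of 10 reproduces the two zero-bits. */
static uint32_t get_z(uint32_t w)    { return (w >> 8) & 0xFC; }
/* Low P field: shift and mask. */
static uint32_t get_p_lo(uint32_t w) { return (w >> 3) & 0x7F; }
/* Index field: mask only. */
static uint32_t get_idx(uint32_t w)  { return w & 0x7; }
```

Counting the operations gives 1 + 2 + 2 + 2 + 1 = 8 shifts and masks, matching the instruction count listed for this layout.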
  
The parameters should be stored in a memory structure in this case, but this has not been
implemented yet.

Unless we have introduced a too wide index field when generalizing. See the section on generalization.


 Storing the Intermediate Format
One of the requirements we had on the intermediate format was that it should be printable in
some readable form. We did not think it was enough to look at the produced service routines
to find out the actions performed by the tool, and since we wanted to find a good solution by
iterating over the intermediate format (see the later sections on optimization), we needed to
be able to read it as well. The solution was to store it in a text format which is both
human-readable and computer-readable/writable. The figure below shows the add instruction in
its intermediate form.
Fields
op      
rd       REG OFFSET DSTrd   
op
      
rs 
      REG OFFSET SRCrs   
i      
rs  
     REG OFFSET SRCrs   
simm
  
  
  
Service Routine add i 

Instruction add i 

Fields
rs  parameter
rs  parameter
rd  parameter
op  
op
  
i  
Intermediate Format
A rd    
B rs    
C rs  
  
AAAAAAAAAAABBBBBBBBBCCCCCCCCCCC
Syntax
add susrGetRegStrrs susrGetRegStrrs susrGetRegStrrd
Semantics

REGrd  REGrs  REGrs 

Statistics 


Figure   The add with register instruction represented in the intermediate format
First all elds are declared their positions in the native format startbit to stopbit
 
 if they
should be signextended  or not  and then what was specied in the intermediate form
declaration in the specication  the intermediate width of the eld number of zerobits and
the transformation
 
In the SPARC specication the bits are numbered the other way around  eg  to 
 instead of 	    as
this is also used in the SPARC architecture manual 

After all eld declarations a list of service routine denitions follows Each service routine has
in turn a list of instructions that share this service routine In the gure the service routine
called add i  only contains the instruction add i  same name
An instruction denition rst declares which elds to use and the constraints bound to them
Next we can see how our tool has packed the intermediate parameters for the service routine
The packing must be made with respect to the intermediate form declaration in the speci
cation ie new eld widths and whether zerobits can be removed For the add instruction
the sum of the eld widths is  bits so the removal of zerobits is necessary here Two bits
has been removed from rs
Next comes the syntax declaration and then the semantics
A new part compared to the specication is the statistics which is used when optimizing
the simulator Here the user can add statistics about the instructions by means of the
tool These statistics contains information of how frequent dierent parameters are for the
instructions We will see how this is used in the next section
 Optimizing Using Statistics
  Motivation
The typical action performed by a service routine implementing a triadic instruction such
as SPARC add is shown in the following pseudo assembler code:
 r contains the packed parameters r contains the start of the
 simulated register file
service routine add  add gX gY gZ 
and r MASK P r  Store first operand parameter in register r
srl r SHIFT P r  Store second operand in r by first shifting
and r MASK P r  and then masking
srl r SHIFT P
 r
  Store destination register number in r

ld r  r r  Load r with emulated register value gX
ld r  r r  Load r with emulated register value gY
add r r r  Perform the actual instruction add
st r r  r
  Store the result in the destination register gZ
epilogue  The unrolled interpretation loop code Contains 
 assembler instructions including the loading of
 r for the next service routine and a jump there
An instruction which uses three parameters is called triadic.

A total of  instructions. Now suppose that add gX g gX is very common (add the
contents of register g to gX), so we make a special version of it:
 r contains now only one parameter gX since we know we will use
 the emulated machine register g and can thus be used directly
service routine add  add gX g gX 
ld r  r r  Load r with the emulated register value of gX
ld r  IM G r  Load r with the emulated register value of g
 The offset to g is stored in IM G a constant now
add r r r  Perform the actual instruction add
st r r  r  Store the result in the destination register
epilogue
This service routine contains just  assembler instructions and is thus significantly faster. If
we make special versions of the most common instructions this way, we will hopefully get a
more efficient simulator. The ideal would be to have a special case for every possible
instruction (2^32), but this solution is of course not realistic, because the memory on our host
machine will probably run out long before we get there, and we will certainly thrash the
instruction cache somewhere along the way. Therefore we want exactly as many service routines
as fit into the host's instruction cache: special versions of common instructions, and routines
shared among unusual ones to bring down instruction cache usage.
In the following subsections we show how to do this by using statistics.
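The same contrast can be expressed in C, which is closer to how the generated service routines are written. This is only a sketch: the parameter layout, the function names and the choice of register 1 as the fixed source register are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct regs { uint32_t r[32]; };

/* General routine: three register numbers are extracted from the packed
 * word with shifts and masks before the actual add is performed. */
static void sr_add_general(struct regs *s, uint32_t params)
{
    unsigned rs1 = params & 31;
    unsigned rs2 = (params >> 5) & 31;
    unsigned rd  = (params >> 10) & 31;
    s->r[rd] = s->r[rs1] + s->r[rs2];
}

/* Specialized routine for "add X, G, X": only X is stored, so no shifting
 * or masking is needed, and the second source register is a constant. */
enum { FIXED_SRC = 1 };  /* hypothetical fixed source register */

static void sr_add_x_fixed_x(struct regs *s, uint32_t params)
{
    assert(params < 32);
    s->r[params] += s->r[FIXED_SRC];
}
```

Both routines give the same result for the common case; the specialized one simply has less extraction work to do before the add.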
 Specialization
We refer to the process of making special versions of commonly used service routines as
specialization. Our SGT understands two specialization techniques, and instruction statistics
are used for both of them. When using the first specialization technique, the tool searches for
the service routine with the highest accumulated frequency count when matching a parameter
field to a fixed value. A new service routine is added where this field has been removed from
the parameters and added to the static fields, with the constraint that it should be equal to
the value. The old service routine is kept to handle all cases where the condition on the field
does not hold.
What we bypass in the special version is the extraction of the parameter, which can save us two
assembler instructions: a shift and a mask. We also get fewer parameters in the intermediate
format, and this could lead to better packing of the remaining fields. Perhaps no zero-bits
need to be removed, for example.
The other way to specialize is to look for service routines where two or more parameters
frequently have the same value. Then we only need to store one of them, and thus we
again save parameter extraction instructions. When looking at statistics this seems to be
rather common. The SPARC add instruction, for example, very often uses the same register
as source and destination.
Service Routine add i 

Instruction add i 

Fields
rs  parameter simm  parameter
rd  parameter op  
op   i  
Intermediate Format
A simm    
B rs 	  
 
C rd   
 
AAAAAAAAAAAAABBBBBBBBBCCCCCCCCC
Syntax

Semantics

REGrd  REGrs 
 simm 

Statistics 
freq rs simm rd  freq rs simm rd 
           
         	  
           
   	     	   
        	  	 
         	  
   	        
           
           
           
      	     
     


Figure   The original add immediate instruction with statistics
The figure above shows the add immediate instruction together with instruction execution
statistics. One row in the statistics table shows how frequent certain parameter values are. The
instruction add  r   r occurs  times, for instance.
Our tool uses a simple algorithm when deciding what specialization to perform. For each
service routine it checks which parameter value occurs most frequently in the statistics. In
the figure this is simm13 =  , which occurs  times. The algorithm also counts how
often parameters have the same value. In the same figure we find rs1 = rd 
times. The constraint with the highest count wins, and the corresponding specialization method
is used for it.
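The selection rule just described can be sketched in C. This is a simplified illustration rather than the tool's algorithm: it only considers fixed values of the immediate field and the rs1 = rd constraint, and the row layout and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* One row of a statistics table for add immediate (hypothetical layout). */
struct stat_row { unsigned long freq; unsigned rs1; int simm; unsigned rd; };

enum spec_kind { SPEC_NONE, SPEC_FIXED_SIMM, SPEC_RS1_EQ_RD };

struct spec_choice { enum spec_kind kind; int value; unsigned long freq; };

/* Pick the constraint with the highest accumulated frequency. */
static struct spec_choice choose_specialization(const struct stat_row *rows,
                                                size_t n)
{
    struct spec_choice best = { SPEC_NONE, 0, 0 };
    unsigned long equal_freq = 0;

    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Accumulate the frequency of all rows sharing this simm value. */
        unsigned long f = 0;
        for (size_t j = 0; j < n; j++)
            if (rows[j].simm == rows[i].simm)
                f += rows[j].freq;
        if (f > best.freq) {
            best.kind = SPEC_FIXED_SIMM;
            best.value = rows[i].simm;
            best.freq = f;
        }
        /* Accumulate the frequency of rows where source equals destination. */
        if (rows[i].rs1 == rows[i].rd)
            equal_freq += rows[i].freq;
    }
    if (equal_freq > best.freq) {
        best.kind = SPEC_RS1_EQ_RD;
        best.value = 0;
        best.freq = equal_freq;
    }
    return best;
}
```

The caller would then split the service routine according to the returned constraint and move the matching statistics rows to the new routine, as the text goes on to describe.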
In gure  right a new service routine has been created where rs is equal to rd All
statistics data where this condition holds has also been moved from the original service routine
left to the new We see that the new routine only needs two parameters rd and simm
and thus we save extraction instructions The parameters are also better stored for the new
routine since we do not need to remove any zerobits here


Service Routine add i 

Instruction add i 

Fields
rs  parameter simm  parameter
rd  parameter op  
op   i  
Intermediate Format
A simm    
B rs 	  
 
C rd   
 
AAAAAAAAAAAAABBBBBBBBBCCCCCCCCC
Syntax

Semantics

REGrd  REGrs 
 simm 

Statistics 
freq rs simm rd 
     
   	  
     
  	   
     
   	  


Service Routine add i  rs rd

Instruction add i  rs rd

Fields
simm  parameter rd  parameter
op   op  
rs  rd i  
Intermediate Format
A simm    
B rd   
 
AAAAAAAAAAAAABBBBBBBBBBB
Syntax

Semantics

REGrd  REGrd 
 simm 

Statistics 
freq simm rd  freq simm rd 
         
       	  
   	      
  	       
         
         
         
     	    
    


Figure   The add instruction shown in figure  broken up into two different versions: the
original on the left and a special variant on the right where rs1 is equal to rd
Further specializations can now be made, either from the original routine or from the new
one. The statistics show that adding  seems to be very common; a possible
choice would therefore be to make a special instance of the new routine where simm13 =  . If
this is done we get almost the same code as shown in the optimized service routine of section  .
The only difference is that, since we use an immediate value, we do not need to load the value
from a register, which saves us another instruction.
Compiler Optimizations
When we specialize a parameter, we replace its occurrences in the semantics with the value or
parameter it was equal to, thus replacing variables with constants or other variables. This
gives the compiler that compiles the generated service routine an opportunity to make further
optimizations, such as constant propagation and common subexpression elimination.
We have not measured how much can be gained from compiler optimizations, but we think
it can make a difference, especially for instructions with long and complicated semantics.
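As a small illustration of the substitution described above, the two C functions below show the same add semantics before and after specialization. The function names and the constant 4 are hypothetical and chosen only for the example; a plain array stands in for the simulator's register file.

```c
/* General add immediate: both operands are run-time parameters. */
static long add_general(const long *reg, unsigned rs1, long simm13)
{
    return reg[rs1] + simm13;
}

/* Specialized for rs1 = rd and an assumed constant simm13 = 4: the
 * parameter has been replaced by a literal, so the compiler can fold
 * it into the generated code instead of extracting it at run time. */
static long add_rd_const4(const long *reg, unsigned rd)
{
    return reg[rd] + 4;
}
```

Both functions compute the same value whenever the specialization condition holds; the second one simply has fewer inputs to extract.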
 Generalization
We call the process of grouping uncommon service routines together generalization. Its
purpose is to minimize the instruction cache usage of the host machine running the
simulator. Making the optimal selection is a very hard task which involves deep analysis of
instruction semantics. This is beyond the scope of this work, so we use an approximate
method in which we identify the most infrequently used service routines by using the statistics.
We then bring them together so that they can at least share the epilogue code, which is
identical for most of the service routines.
A generalization is almost the reverse of a specialization. However, a generalization does not
move existing static fields to parameter fields. Instead it creates a new parameter field, which
the service routine switches on in order to execute the correct instruction semantics. The new
parameter must be placed at the same location in the intermediate format for all instructions
sharing the routine; otherwise it is impossible to tell them apart. See figure  for an example
of how a generalized service routine looks.
An optimization that can be done in order to increase code sharing is to gather instructions
which have their parameters packed in the same way, thus making it possible for them to
share parameter extraction code. If this is used we get a trade-off between instruction usage
statistics and parameter packing similarities, which we do not know how to handle. Thus
our tool does not consider this.
In the following subsection we present a way to control specializations and generalizations.
 The Iteration Process
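[Figure content: flow diagram. SimGen turns the Specification into Intermediate.0 and a
Statistics Converter. The converter turns Opcode Statistics into Instruction Statistics, which
SimGen adds to obtain Intermediate.1. In each later step SimGen turns Intermediate.n into
Intermediate.n+1, and a Simulator can be generated from each intermediate format.]
Figure   The process of finding an efficient simulator by iteration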
In gure  our scheme of nding an e
cient simulator is shown From the spec
ication the SGT SimGen creates the
rst intermediate format intermediate
where all CCSmacros are expanded We
thus have one service routine per instruc
tion This version of the intermediate for
mat corresponds to the native format of
the architecture and it thus forms the ba
sis for our iteration process This is also
where we create the statistics converter
since it should use the native format to
convert opcode frequencies to instruction
frequencies
The intermediate format is stored in the
text format explained in section  and
contains all information from the original
specication le including the syntax de
nitions Thus we do not need the speci
 
The unrolled interpretation loop code

cation le any more
 
Instruction statistics are then added to the intermediate format by the tool. Here
(intermediate.1) an instruction appears like the add instructions in figure  .
Now that we have statistics, our iteration process can begin. The user basically has two
choices for each iteration step:
  the number of new service routines to create by specializations
  how many instructions to bring together by generalizations
If the user species for example  new service routines the tool will search through the
statistics for the  best specializations creating a new service routine for each case The new
intermediate format will then contain this new conguration If fewer service routines are
requested on the other hand the tool will make as many generalizations as needed
For every new intermediate format a corresponding simulator can be created Thus the
user can test the performance of the simulator at each iteration step and by making more
specializations and generalizations its possible to tune the simulator for the host machine
which should run it In this way it could be possible to nd the best instruction cache usage
As we save every intermediate format on the way we can easily undo bad decisions This
process can of course be made automatic but we have not gured out a good way yet
 
This is somewhat different from our view of the system in section  , where, for example, the
decoder used both the specification and the intermediate format. In practice it is the same
thing, since all source information needed by the tool is simply copied into the intermediate
format text file. It was practical to store all information in the same place, since we find
an efficient intermediate format by iteration.

 Generated Parts
In this section we present the different parts of the simulator, how they are generated,
and what their source code looks like.
	  Introduction
Since SimICS is written in C with some GNU extensions, it was natural for us to use the
same language for our generated parts. We could of course have chosen assembler to get
maximum control of the code, but with a little experience it is rather easy to predict what
kind of machine code the compiler (GCC) will produce, especially since we do not use any
complicated constructs. C also has the benefit of being more readable than assembler and
easier to port to other platforms.
C code is generated for the following parts:
  Decoder
  Service Routines
  Disassembler
  Statistics Converter
  Assembler and Opcode Output Functions
As said before, the remaining parts must be implemented by hand.
	 Main Include File
Since we wanted the source code to be readable while still being efficient, we decided to
generate a main include file which contains common operations needed by the different parts.
The main contents of this file are:
  extraction macros for the native format fields
  intermediate format packing macros for each instruction
  extraction macros for the intermediate format fields

Native Format Extraction Macros
The purpose of these macros is to extract fields from the native format. They are used by
the decoder, the disassembler and the statistics converter, which all need to examine these
fields to perform their tasks. Each macro takes a variable in which the extracted field will be
stored and a char pointer to the beginning of an opcode. This makes it easy to handle
variable-length opcodes, which are used in CISC instructions. The following two macros show
the extraction of the SPARC V8 rd (A) and simm13 (B) fields:
/* ..AAAAA. ........ ...BBBBB BBBBBBBB */
#define EXTRACT_NATIVE_rd(rd, code) \
{ \
  rd = ((code)[0] >> 1) & 0x1f; \
}
#define EXTRACT_NATIVE_simm13(simm13, code) \
{ \
  simm13 = ((code)[2] >> 0) & 0x1f; \
  simm13 = (simm13 << 8) | (((code)[3] >> 0) & 0xff); \
  simm13 = sign_extend_int32(simm13, 13); \
}
The first macro has to shift the first opcode byte one bit to the right and then mask off
the most significant bits in order to extract the rd field. The second macro is a little more
complicated, since the simm13 field covers more than one byte. It must also be sign-extended,
which is done by the macro
#define sign_extend_int32(v, w) (((long)(v) << (32 - (w))) >> (32 - (w)))
which first shifts the field v up and then arithmetic-shifts it down again to produce the right
sign.
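The shift-up, shift-down idea can be tried out in isolation with the following sketch. It is not the generated code: the function names are hypothetical, fixed-width types from stdint.h replace long, and the example assumes the SPARC big-endian layout in which simm13 occupies the low 5 bits of the third byte and all of the fourth byte. The right shift of a negative value is arithmetic on GCC and most other compilers, but ISO C leaves it implementation-defined.

```c
#include <stdint.h>

/* Sign-extend the low `width` bits of v (1 <= width <= 32): shift the
 * field up to the top of the word, then shift it down arithmetically. */
static int32_t sign_extend32(uint32_t v, unsigned width)
{
    unsigned shift = 32u - width;
    return (int32_t)(v << shift) >> shift;
}

/* Extract the 13-bit immediate from the four bytes of an instruction. */
static int32_t native_simm13(const unsigned char *code)
{
    uint32_t field = ((uint32_t)(code[2] & 0x1f) << 8) | code[3];
    return sign_extend32(field, 13);
}
```

With the bytes 0x80 0x00 0x3f 0xff the field is all ones, so native_simm13 returns -1.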
Since these macros are automatically generated by a rather general algorithm, which is not
presented here, a lot of unnecessary code is laid out, such as shifting fields zero bits and
masking a byte with 0xff. We have not spent any time on removing this code; we leave this
to the compiler, i.e. RISC (Relegating Interesting Stuff to the Compiler).
Intermediate Format Packing Macros
Intermediate format packing macros are used, as indicated by their name, to pack parameter
fields into the intermediate format. The decoder uses them for this task. Below we show the
macro used for the add immediate instruction which we have considered earlier.
define PACK INTERMEDIATE add i w simm
 rs rd 
 
rs   REG OFFSET SRCrs    
rd   REG OFFSET DSTrd    
rs     
rd     
w   
w  w  
 ! simm
    	 xfff 
w  w   ! x 
w  w   ! rs    	 xff 
w  w   ! rd    	 xff 

The macro first performs the intermediate transformations for rs1 and rd, as specified by the
user in the specification. In order for all fields to fit, two zero bits are used for each of
them, and they are therefore shifted down 2 bits each. All fields are then packed into the
intermediate format.
We get a macro like this for every instruction, which results in a rather big include
file (  lines for SPARC V8 without specializations). This could be cut down by letting
instructions with the same intermediate format share packing macros. This has however not
been done.
Intermediate Format Extraction Macros
These macros are the most time critical ones, since they are used within the service routines
to extract parameters. Below the extraction macros are shown for the same add instruction
which was packed above:
/* simm13, rs1, rd */
/* AAAAAAAAAAAAA.BBBBBBBBBCCCCCCCCC */
#define EXTRACT_INTERMEDIATE_rs1_B(w, code) w = ((code) >> 7) & 0x7fc
#define EXTRACT_INTERMEDIATE_rd_A(w, code) w = ((code) << 2) & 0x7fc
#define EXTRACT_INTERMEDIATE_simm13_A(w, code) w = ((long)(code)) >> 19
The first macro extracts the rs1 field by first shifting it 7 bits to the right and then masking
off the bits around it, thus producing the zero bits as well. The second macro works in the same
way, except that the shift goes in the other direction. simm13 is extracted and sign-extended
with only one shift.
Unlike the packing macros, the intermediate extraction macros have been designed to be shared
by several instructions. But since a field can be packed differently for different service
routines, several macros can exist for the same field; therefore a letter (here A, B), which
has nothing to do with the format shown in the comment, is appended to the macro names to
distinguish them from each other.
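The interplay between packing and extraction can be checked with a small round-trip sketch. The layout used here is an assumption made for the example (13 bits of simm13, one unused bit, then two 9-bit register fields stored without their two zero bits), and the function names are hypothetical. The register arguments are byte offsets, i.e. multiples of 4 below 2048.

```c
#include <stdint.h>

/* Pack simm13 into bits 31..19, leave bit 18 unused, and store the two
 * register offsets without their two low zero bits in bits 17..9 and 8..0. */
static uint32_t pack_add_imm(int32_t simm13, uint32_t rs1_off, uint32_t rd_off)
{
    uint32_t w = 0;
    w = (w << 13) | ((uint32_t)simm13 & 0x1fff);
    w = (w << 1);
    w = (w << 9) | ((rs1_off >> 2) & 0x1ff);
    w = (w << 9) | ((rd_off >> 2) & 0x1ff);
    return w;
}

/* One shift and one mask restore the offset including its zero bits. */
static uint32_t unpack_rs1(uint32_t w) { return (w >> 7) & 0x7fc; }
static uint32_t unpack_rd(uint32_t w)  { return (w << 2) & 0x7fc; }

/* A single arithmetic shift both extracts and sign-extends simm13
 * (arithmetic right shift of a negative value is assumed, as on GCC). */
static int32_t unpack_simm13(uint32_t w) { return (int32_t)w >> 19; }
```

Packing (-5, 40, 1020) and extracting again yields the same three values, which shows why the extraction side needs no separate step to restore the zero bits.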
 
It looks pretty stupid to first shift the fields up two bits in the transformation and then
back again, but since we do not analyze the user-written transformation, things like this can
occur. However, the decoder is not time critical, since it is only run once per instruction.
The include file also contains other things needed, such as constant definitions, prototypes,
inclusions of user-written headers, etc.
	 The Decoder
The decoders task is to identify the dierent instructions in the native format and then
construct the corresponding intermediate format for them The na!"ve way of doing this is to
loop through all instructions for an incoming opcode and test the static elds until a match
occurs Then we extract the parameter elds so we can do intermediate translation Since
many instructions uses the same elds a lot of tests will be duplicated this way For example
the two addinstructions we have looked at several times only diers in the ields thus it
would be a waste of time to check the op and opelds for both of them To avoid this we
rst build a decode tree in which a branch corresponds to an instruction with a certain format
and certain static values In this way we will prevent duplicated tests
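The effect of a decode tree can be sketched as nested tests in C, where each static field is examined once and the result is shared by all instructions below that node. The sketch is hand-written for four instructions and is not the generated decoder; the enum and function names are hypothetical, and the field positions and opcode values (op = 2, op3 = 0x00 for add, op3 = 0x14 for subcc, i in bit 13) are taken from the SPARC V8 instruction formats.

```c
#include <stdint.h>

enum insn { INSN_ADD, INSN_ADD_IMM, INSN_SUBCC, INSN_SUBCC_IMM, INSN_OTHER };

/* Each level of the tree tests one field; add and add immediate share
 * the tests on op and op3 and differ only in the final test on i. */
static enum insn decode(uint32_t word)
{
    unsigned op = word >> 30;               /* bits 31..30 */
    if (op != 2)
        return INSN_OTHER;

    unsigned op3 = (word >> 19) & 0x3f;     /* bits 24..19 */
    unsigned i = (word >> 13) & 1;          /* bit 13 */
    switch (op3) {
    case 0x00: return i ? INSN_ADD_IMM : INSN_ADD;
    case 0x14: return i ? INSN_SUBCC_IMM : INSN_SUBCC;
    default:   return INSN_OTHER;
    }
}
```

A real decoder would pack the parameters and set the service routine pointer at each leaf instead of returning an enumeration value.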
[Figure content: decode tree. Starting at the root, the field nodes op, rd, op3, rs1, i, rs2,
simm13 and the virtual field mode are tested in turn. Value nodes carry either a constant or
P (parameter). The leaves are the instructions A to I; the remaining branches lead to other
instructions.]
Figure   A decode tree for 9 different instructions
Figure  shows how such a tree could look for the following instructions: (A) a specialized
version of add where rs1 = rd, (B) a specialized version of add immediate where rs1 = rd
and simm13 =  , (C) a specialized version of add immediate where rs1 = rd, (D) add, (E)
add immediate, (F) subcc in mode  , (G) subcc in mode  , (H) subcc immediate in mode  ,
(I) subcc immediate in mode  .
Here we also see the usage of virtual fields. The subcc instructions have two different
versions, one for each mode, which correspond to different representations of condition codes.
The subcc instruction subtracts but also sets condition codes.
The circle marks the root node of the tree. A rounded rectangle represents a new field in the
format and corresponds to the extraction of the field in the code to generate. The first one
is called op and occurs in all SPARC V8 instructions. None of these nodes has any siblings
here, but this could be the case if several different fields start at the same bit position.
The rectangular nodes are children of the field nodes and represent the different values the
fields have in different instructions. These nodes correspond to tests on the fields in the code
to generate. For example, op can have values from 0 to 3, and simm13 could be equal to
  in the specialized instruction B. A P means that a field is used as a parameter. Each leaf
in the tree represents a unique instruction and the action to perform when we have found it,
that is, here, to pack the parameters and set the service routine pointer for the intermediate
format.
When the decode tree has been built, it is rather easy to generate the decoder from it. We
only have to traverse the tree in inorder and lay out the proper code. In appendix B the
code corresponding to figure  is shown together with some comments.
It would be possible to rearrange the tree and test the different fields in some other order,
for example the most used fields first. This way we could split the search space faster and thus
get a more efficient decoder. We have not implemented this, since the decoder is not the
most time-critical part of the system; it is only used once per instruction.
	
 The Service Routines
void service routines

entry pointsSIM ENTRY INDEX add i  rs rd simm
   		SIM ENTRY POINT add i  rs rd simm
 
entry pointsSIM ENTRY INDEX add i  rs rd  		SIM ENTRY POINT add i  rs rd
entry pointsSIM ENTRY INDEX add i   		SIM ENTRY POINT add i 
return
  Service Routines  

Figure   Building jump table for the service routines The 
operator means the address of the
label GNU
extensions to C
From the intermediate format text file it is rather straightforward to create the service
routines. Basically we put all of them in one C function (the service routine function) with
labels pointing out their positions. Since the GNU extension of C lets us take the address of a
label, it is easy to build up a jump table for the service routines. This is done by letting the
initialization phase of the simulator call the service routines function, which then stores the
addresses in a table (array). This procedure is shown in figure  and is necessary because
labels cannot be referred to globally. The decoder uses the jump table to set the service
routine pointers in the intermediate code.
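The label-address technique can be tried out with the toy interpreter below. It is a minimal sketch and not the SimICS code: the three operations, the program representation (an array of table indices rather than stored pointers) and the convention of passing NULL to request initialization are assumptions made for the example. It requires GCC or a compatible compiler, since && on labels and goto * are GNU extensions.

```c
#include <stddef.h>

enum { OP_INC, OP_DOUBLE, OP_HALT, NUM_OPS };

static void *entry_points[NUM_OPS];

/* Called once with program == NULL to fill the jump table, because the
 * labels are only visible inside this function. Later calls interpret
 * the program, jumping from routine to routine through the table. */
__attribute__((noinline))
long service_routines(const int *program, long acc)
{
    const int *pc = program;

    if (program == NULL) {
        entry_points[OP_INC]    = &&sim_inc;
        entry_points[OP_DOUBLE] = &&sim_double;
        entry_points[OP_HALT]   = &&sim_halt;
        return 0;
    }
    goto *entry_points[*pc++];

sim_inc:
    acc += 1;
    goto *entry_points[*pc++];      /* plays the role of the epilogue */
sim_double:
    acc *= 2;
    goto *entry_points[*pc++];
sim_halt:
    return acc;
}
```

After the initialization call, running the program { OP_INC, OP_DOUBLE, OP_INC, OP_HALT } with acc = 3 gives (3 + 1) * 2 + 1 = 9. The noinline attribute is there because GCC documents that label addresses may differ between inlined copies of a function.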
	
Actually, this could be any constraint on the field, such as op     op   etc. But this is not
necessary for describing the SPARC architecture.

SIM ENTRY POINT add i 

u long
rs rd
long
simm

EXTRACT INTERMEDIATE simm
 Asimm
 rOP
EXTRACT INTERMEDIATE rs Brs rOP
EXTRACT INTERMEDIATE rd Drd rOP
REGrd  REGrs  simm
 
epilogue

SIM ENTRY POINT add i  rs rd

u long
rd
long
simm

EXTRACT INTERMEDIATE simm
 Asimm
 rOP
EXTRACT INTERMEDIATE rd Ard rOP
REGrd  REGrd  simm
 
epilogue

SIM ENTRY POINT add i  rs rd simm
 

u long
rd
EXTRACT INTERMEDIATE rd Crd rOP
REGrd  REGrd   
epilogue

SIM ENTRY POINT and i  or i  xor i 

u long
rs rd rs INDEX 
EXTRACT INTERMEDIATE rs Crs rOP
EXTRACT INTERMEDIATE rd Crd rOP
EXTRACT INTERMEDIATE rs Brs rOP
EXTRACT INTERMEDIATE INDEX  AINDEX  rOP
switchINDEX 

case   and i  

REGrd  REGrs 	 REGrs 
break

case   or i  

REGrd  REGrs ! REGrs 
break

case   xor i  

REGrd  REGrs " REGrs 
break


epilogue

Figure   Specialized service routines for add instruction left and generalized service routine right
In gure  we show some generated service routines To the left we have three dierent
versions of the add immediate instruction The rst is the general one which can handle all
cases It begins with extracting the three parameters simm rs and rd from the variable
rOP which holds the parameter part of the intermediate format The instructions semantics
and the epilogue
 
are then executed The next routine is used whenever the condition
rs  rd holds Note that rs has been replaced by rd in the semantics The last service
routine is specialized a step further with simm equal to 
The service routine on the right side in the gure is an example of a generalized one It
is composed by the three SPARC V instructions and or and xor An extra index eld
INDEX  needs to be introduced which we can switch on to reach the correct semantics In
principle we have replaced the old static elds by a new one in order to distinguish between
the instructions The new eld is however smaller and easier to decode than the old ones
The purpose of generalizing is as mentioned before to bring down instruction cache usage by
 
The epilogue checks for interrupt events  handles them if necessary  and branches to the next instruction
after copying its parameters into rOP
	
sharing code In this case we were lucky since all parameter extraction code could be shared If
the parameters are packed dierently for each instruction they have to be extracted separately
within the corresponding case block The index eld has of course the same location
The intermediate format of these instructions are shown below 
Intermediate Format
A rs    
B rs    
C rd    
D INDEX  
 
  
AAAAAAAAAAABBBBBBBBBCCCCCCCCCDD
Here the two middle fields need two zero bits each. The index field is two bits wide, since we
must be able to recognize three different instructions.
	 The Disassembler
The disassembler is generated in the same way as the decoder, but instead of building the
intermediate format it writes the instruction's syntax (our syntax strings are easily converted
to printf statements). It also ignores the virtual fields, since these are not a part of the
native format.
The following shows an example of how output from the disassembler could look:
x ld o   l
xc add o  l
x sub o 
 o
x orcc g g g
x be x
xc or g g o
x call xcc
x nop
x sethi hix o
		 The Statistics Converter
The statistics converter is also generated by using a decode tree, but here we must take care
of the virtual fields. This is because we do not have any statistics that tell us how frequent
different virtual field values are. Remember that virtual fields represent internal states of
the simulator which cannot be derived from the native format. Typically a virtual field
corresponds to different modes in the simulator. The instructions which can execute in those
different modes have one service routine per mode, since we have different semantics for each
mode, and thus it is necessary to spread the instructions' statistics among them according to
their relative usage. Our generalization and specialization algorithms will not work if this
is not done.
The user must measure these frequencies in some way, or estimate them. They are then given
to the statistics converter, which processes the statistics accordingly.
For example, if the simulator has two modes and runs in mode one   of the time, we supply
this information to the statistics converter. If we have one instruction subcc X, Y, Z with
frequency  , we will get subcc mode  X, Y, Z with frequency   and subcc mode  X, Y, Z
with frequency   after the conversion.
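The division of a frequency between two modes can be written down in a few lines of C. This is a sketch only: the names mode_split and split_by_mode are hypothetical, the mode share is given as a whole percentage, and the numbers in the usage note are invented for illustration rather than taken from our measurements.

```c
/* Frequencies assigned to the two per-mode versions of an instruction. */
struct mode_split { unsigned long mode0; unsigned long mode1; };

/* Give mode 0 its percentage of the measured frequency and hand the
 * remainder to mode 1, so that no counts are lost to rounding. */
static struct mode_split split_by_mode(unsigned long freq, unsigned mode0_percent)
{
    struct mode_split s;
    s.mode0 = freq * mode0_percent / 100;
    s.mode1 = freq - s.mode0;
    return s;
}
```

With an invented frequency of 1000 and a mode 0 share of 80 per cent, the two versions receive 800 and 200. Very large frequencies could overflow the multiplication; a wider type would avoid that.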
This is an approximation, since all instructions get the same relative division between the
modes. If, for example, the simulator could give us better information about the usage of
certain instructions in different modes, we could do better. This has however not been done
yet.
	 The Assembler and Output Functions
We generate a very simple assembler which understands instructions in the format:
name parnum1 parnum2 ... parnumN
where name is the name of the instruction after the expansion of the combinative macros,
for example add_i_0. The parnums are the corresponding parameter values in decimal.
When the assembler has identified an instruction, it calls the instruction's output function,
which writes the opcode to a file. An example of such a function is the following:
void out_add_i_0(FILE *out, u_long rs1, u_long rs2, u_long rd)
{
  u_long w;
  w = 0;
  w = (w << 2) | 0x2;                    /* op */
  w = (w << 5) | ((rd >> 0) & 0x1f);
  w = (w << 6) | 0x0;                    /* op3 */
  w = (w << 5) | ((rs1 >> 0) & 0x1f);
  w = (w << 1) | 0x0;                    /* i */
  w = (w << 8) | 0x0;                    /* unused bits */
  w = (w << 5) | ((rs2 >> 0) & 0x1f);
  fwrite((char *)&w, sizeof(u_long), 1, out);
}
The assembler was mainly created to test the decoder and disassembler. But the output
functions could be useful if we want to generate test suites in the future.
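How such an assembler might tie the line format to an output function can be sketched as follows. This is an assumption-laden illustration, not the generated assembler: the function names are hypothetical, only one instruction is recognized, the parameter order rs1, rs2, rd is assumed to follow the output function above, and the opcode is returned in a variable instead of being written to a file.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode the register form of add: op = 2, op3 = 0, i = 0. */
static uint32_t encode_add_reg(unsigned rs1, unsigned rs2, unsigned rd)
{
    uint32_t w = 0;
    w = (w << 2) | 0x2;
    w = (w << 5) | (rd & 0x1f);
    w = (w << 6) | 0x0;
    w = (w << 5) | (rs1 & 0x1f);
    w = (w << 1) | 0x0;
    w = (w << 8) | 0x0;
    w = (w << 5) | (rs2 & 0x1f);
    return w;
}

/* Parse "name parnum1 parnum2 parnum3" with decimal parameters.
 * Returns 1 and stores the opcode on success, 0 otherwise. */
static int assemble_line(const char *line, uint32_t *word)
{
    char name[32];
    unsigned p1, p2, p3;

    if (sscanf(line, "%31s %u %u %u", name, &p1, &p2, &p3) != 4)
        return 0;
    if (strcmp(name, "add_i_0") != 0)
        return 0;
    *word = encode_add_reg(p1, p2, p3);
    return 1;
}
```

A full assembler would look the name up in a generated table of output functions instead of comparing against a single string.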

 Performance
In order to measure the performance of a generated simulator, and to test how much it is
possible to gain by specializations, we have run the GCC benchmark from SPEC  on a
generated simulator. This benchmark is the optimization part of the GNU C Compiler (GCC).
To keep the execution time down we only use peephole optimization, and thus we use a
modified version of the benchmark.
We first collected execution statistics of instructions from the benchmark, and then we
generated different versions of the simulator using different numbers of specializations. In
this way we managed to get the simulator   better than the hand-optimized version (classic) and
  better than itself when no specializations were used. The simulator only executed in the
service routines   of the total time; the rest was spent in other parts of the simulator.
Thus we get a fairer picture if we only consider the time spent in the service routines,
i.e. the part of the simulator which we can affect with specializations and generalizations.
The following diagram shows this time for different versions of the simulator when run on a
SuperSPARC  MHz.






[Figure content: chart of the time spent in the service routines, in seconds (Time s), against
the number of specializations, with the classic simulator shown for comparison.]
Figure   The performance of different specialized simulators on the SPEC  benchmark GCC
We see that with no specializations the generated simulator is slower than the classic one,
but with only  specializations it has become faster. We get the most efficient simulator
with  specializations. This one has   faster service routines compared to the one with
no specializations, and   faster than the service routines in the classic simulator. With
 specializations something happens. The most likely reason for this performance degradation
is that we thrash the instruction cache of the host machine. We have not examined whether
generalizations could change the performance for the better here. However, we have learned that
the use of the instruction cache is extremely important for the performance. If the instruction
cache is badly used, the effect of specializations is reduced. Therefore we first sort the
service routines by the execution statistics, so that frequently used service routines are placed
consecutively in memory. This optimization increases the speed of the service routines by a
few per cent.
The generated simulator is a SPARC V8 user-level simulator. Floating point and coprocessor
instructions were not implemented.
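Ordering the service routines by their execution counts amounts to a descending sort before the code is laid out. The following sketch uses the standard qsort function; the struct and function names are hypothetical, and the actual layout step (emitting the routines in this order) is not shown.

```c
#include <stdlib.h>

/* A service routine together with its call count from the statistics. */
struct routine { const char *name; unsigned long calls; };

/* Comparison function for qsort: most frequently called routine first. */
static int by_calls_desc(const void *a, const void *b)
{
    const struct routine *x = a;
    const struct routine *y = b;
    return (x->calls < y->calls) - (x->calls > y->calls);
}

/* Sort the routines so that hot routines end up next to each other. */
static void sort_routines(struct routine *r, size_t n)
{
    qsort(r, n, sizeof r[0], by_calls_desc);
}
```

Emitting the routines in the sorted order places the hot ones consecutively, which is the property the instruction cache benefits from.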
We have also optimized simulators for some other small applications, and the behavior seems
to be the same: the simulator gets faster when we specialize. However, more programs should
be tested to evaluate the importance of specializations and generalizations.
[Figure content: chart of the number of service routines (vertical axis) against the number of
specializations (horizontal axis).]
Figure   This diagram shows how many service routines correspond to     per cent of the
total service routine calls for different specialized simulators. From the GCC benchmark.
The diagram in gure 
 shows that the instruction usage of the GCC benchmark is very
skewed ie a very small amount of instructions stand for large usage For example only 

service routines  instructions when no specializations are made out of  stands for  
of the total service routine calls and only  for   This motivates specializations since
they will be applied to the far most used instructions We can also see in the diagram that
when we specialize a service routine split it into two parts the more genaral one seems to
be quite common too With  specializations ie  new service routines almost all of
the more genaral ones stayed among the   most called service routines and half of them
stayed among the 
  most called However only around  stayed amoung the   most
called

	 Future Work
A complete simulator generation tool is a rather complex system, and since this work has
been carried out as a six-month thesis, much work still needs to be done. In the following list
we summarize some of the most important things that could be improved or added to the
system in the future:
  Augment the specification language with constructs for specifying register structure
and memory hierarchies, as well as the structure of the TLB and paging system; whether the
processor is little endian or big endian; instruction timing information, with
the ability to specify resources such as functional units and pipelines; and constructs
for specifying delay slots, instruction issue, etc. Clearly a lot of work can be done
here, but we must not forget that the tool needs to be able to generate efficient code for the
new parts as well.
  Implement simulators for other architectures to evaluate the generality, usability and
utility of the specification language.
  Generate test suites to verify the correctness of the simulator. This could be done by
generating programs which contain all implemented instructions with random values
as parameters, as well as critical values such as MIN_INT    MAX_INT. Of course
we need to be careful with branches and memory addresses. The test programs should
write their state to a file at given times. They can then be run both on the simulator
and on a native system. The written states can be compared to verify at least that the
test programs run correctly.
  Make the generation of a full assembler possible. This could be useful when test suites
are to be created.
  Assign a cost function to each parameter which corresponds to how much we can gain
if we specialize it. This cost function should of course depend on the statistics, but also
on the parameter type and placement within the intermediate code. For example, we gain
more if we specialize a parameter which has to perform its intermediate transformation
at runtime (due to the compiler optimizations that can then be performed) than for one which
does not have such a transformation or has it precalculated.
  Better analysis of how specializations and generalizations affect the performance of the
generated simulator. Collect execution statistics from a large set of programs and then
measure the performance of the simulator running other programs. Investigate how
important the placement of service routines is for the instruction cache usage of the
host machine.
  Make our iteration process for optimizing the simulator automatic.


 Related Work
As mentioned in the introduction, there has not been much work on simulator generation tools
for the instruction set architecture level. A few exist, but their main approach has not
been focused on how to generate an efficient simulator, which was the main goal of this
thesis work.
Todd A. Cook and Ed Harcourt have developed a functional programming language,
LISAS, which is used as a specification language for an instruction set architecture. The
language contains constructs for describing instruction formats and for declaring memory sizes
with word lengths and simple register files. The semantics of an instruction is implemented by
a function in the language, which uses other helper functions to determine different addressing
modes etc. When simulating, the specification file (program) is simply executed. We have
not seen any measurements of how efficient this approach is, or of whether it is possible to
compile this language to efficient code. The specification language lacks syntax descriptions
for the instructions.
Another tool is the Visualization-based Microarchitecture Workbench (VMW) by Trung A.
Diep. It uses templates to specify how instructions are coded and which fields to use
as parameters. The templates contain hexadecimal mask values which are used to identify
different fields. An API (Applications Programming Interface) is defined for C, which
is used to help the programmer implement the simulator. The API includes functions for
getting information about the next instruction to execute, the contents of memory addresses
and caches, whether some functional units are occupied, etc. Thus no description language is
used to specify the architecture; instead the user programs the simulator in C using this API.
A nice feature of the tool is that it has a graphical user interface for controlling the
simulator. We have not found any performance measurements of a simulator implemented using
the VMW.
Some people have addressed subsets of the problem, such as the New Jersey Machine-Code
Toolkit (NJMCTK). The NJMCTK is used for helping programmers write applications
which process machine code, i.e. assemblers, disassemblers and debuggers. The user can
specify how instructions are coded in a description language, and the tool is then able to
generate code for an instruction encoder as well as a decoder. At first we considered using
the tool for decoding native instructions and producing our intermediate format, but it turned
out to be too inflexible. However, we have borrowed some ideas and terminology from the
NJMCTK.
Another work that could be mentioned here is Automatic Generation of Assemblers by John
D. Wick.


 Conclusion
We now have a tool which will be very helpful during the development of efficient simulators.
The specification language provides a compact but expressive way to describe different
instruction set architectures, and our experience so far indicates significantly shorter
development times with far fewer errors.
Furthermore, our test results show that a generated simulator can be faster than a hand-coded
and hand-optimized one. Thus we do not lose in performance here, which usually is
the main dilemma when using utilities of this kind.

A An Example of a Specification: SPARC
The following text is an example of a description file which describes a subset of the SPARC
V instruction set architecture.

 Description file for a subset of the SPARC V instruction set architecture

 Declaration of the positions and names of the different fields.
The dispon and dispoff fields are actually the same, but since
different intermediate transformations for on-page and
off-page purposes (see below) will be used, the field is coded as two
fields.
op rd op	 rs	
i simm rs	
a cond op	 dispon dispoff
imm shcnt	
dispon dispoff trap
modevirtual
onpagevirtual
 Some intermediate transformations to speed up the simulator 
intermediate form
rd  REG OFFSET DSTrd   
rs  REG OFFSET SRCrs   
rs  REG OFFSET SRCrs   
imm  imm   
dispon  dispon     Transform for an onpage branch
dispoff  dispoff     Transform for an onpage branch
 Definition of some useful CCSmacros 
define OP
fields  rs 
syntax sinrsginrsoinrslildrs  
semantics  REGrs 
define OP
case i   
fields  rs 
syntax sinrsginrsoinrslildrs  
semantics  REGrs 
case i   
fields  simm 
syntax ldsimm
semantics  simm 
define DST
fields  rd 
syntax sinrdginrdoinrdlildrd  
semantics  REGrd 
define MODE
case mode    semantics 
case mode    semantics 
 Used to select the correct primitive when branching offpage or onpage 
define ON OR OFF PAGE
case onpage    semantics OFF
case onpage    semantics ON
 Used to code if a branch is annulled or not 
define ANNUL
case a    syntax 
case a    syntax a semantics annull epilogue

 Different fields are used depending on on- or off-page branches 
define DISP
case onpage    fields dispoff syntax xlxPC 
 	dispoff semantics dispoff
case onpage    fields dispon  syntax xlxPC 
 	dispon semantics dispon
 The names of all conditional branch instructions 
define BRANCH NAMES
case cond    syntax bne
case cond    syntax be
case cond    syntax bg
case cond    syntax ble
case cond    syntax bge
case cond    syntax bl
case cond    syntax bgu
case cond    syntax bleu
case cond    syntax bcc
case cond    syntax bcs
case cond    syntax bpos
case cond    syntax bneg
case cond    syntax bvc
case cond    syntax bvs
 The condition that must hold in order for a branch to be taken. The
first 14 are used in our optimized mode for condition codes and the last
14 are used in the other mode.
define COND
case mode     cond    semantics rCMP VALUE A ! rCMP VALUE B
case mode     cond    semantics rCMP VALUE A  rCMP VALUE B
case mode     cond    semantics intrCMP VALUE A  intrCMP VALUE B
case mode     cond    semantics intrCMP VALUE A  intrCMP VALUE B
case mode     cond    semantics intrCMP VALUE A  intrCMP VALUE B
case mode     cond    semantics intrCMP VALUE A  intrCMP VALUE B
case mode     cond    semantics rCMP VALUE A  rCMP VALUE B
case mode     cond    semantics rCMP VALUE A  rCMP VALUE B
case mode     cond    semantics rCMP VALUE A  rCMP VALUE B
case mode     cond    semantics rCMP VALUE A  rCMP VALUE B
case mode     cond    semantics rCMP VALUE A  rCMP VALUE B    
case mode     cond    semantics rCMP VALUE A  rCMP VALUE B    
case mode     cond   
semantics intrCMP VALUE A " rCMP VALUE B  rCMP VALUE A  rCMP VALUE B " rCMP VALUE A  
case mode     cond   
semantics intrCMP VALUE A " rCMP VALUE B  rCMP VALUE A  rCMP VALUE B " rCMP VALUE A  
case mode     cond    semantics !ccode z
case mode     cond    semantics ccode z
case mode     cond    semantics !ccode z ## ccode n ! ccode v
case mode     cond    semantics ccode z ## ccode n ! ccode v
case mode     cond    semantics ccode n  ccode v
case mode     cond    semantics ccode n ! ccode v
case mode     cond    semantics !ccode c ## ccode z
case mode     cond    semantics ccode c ## ccode z
case mode     cond    semantics !ccode c
case mode     cond    semantics ccode c
case mode     cond    semantics !ccode n
case mode     cond    semantics ccode n
case mode     cond    semantics !ccode v
case mode     cond    semantics ccode v
 This is an interesting definition which defines all SPARC V conditional branches.
After the expansion of the CCS-macros we get a total of 112 instruction definitions.
The COND has 28 different entries; ANNUL and ON OR OFF PAGE have 2
each (28 x 2 x 2 = 112).
JUMP REL ON PAGE N and JUMP REL OFF PAGE are primitives defined in
the simulator core.
instruction branchesDISP uses ON OR OFF PAGE ANNUL COND
pattern
op     op  
syntax
BRANCH NAMESANNUL DISP
semantics

if COND

JUMP REL ON OR OFF PAGE PAGE NDISP
ANNUL



instruction sethiimm DST
pattern
op     op  
syntax
sethi hixlximm   DST
semantics
 DST  imm 
 MEMORY LOAD g is a core primitive to load an address from memory 
instruction ldOP OP DST
pattern
op     op  
syntax
ld OP 
 OP DST
semantics
 MEMORY LOAD gDST OP 
 OP u long Ld Word 
instruction addOP OP DST
pattern
op     op  
syntax
add OP OP DST
semantics
 DST  OP 
 OP 
 Subcc has different semantics depending on the mode used 
instruction subccOP OP DST
pattern
op     op     mode  
syntax
subcc OP OP DST
semantics

rCMP VALUE A  OP
rCMP VALUE B  OP
DST  rCMP VALUE A  rCMP VALUE B

 If we are in noptmode just switch back to optmode and reexecute
the intruction since the result of an subcc always can be
represented in opmode 
instruction subcc
pattern
op     op     mode  
semantics

switch to opt

	
B An Example of a Decoder
This is an example of a decoder for some variants of the add and subcc instructions defined
in appendix A. Different degrees of specialisation have been applied to the add instructions.
The subcc uses virtual fields since it has different semantics for different modes.
 long sgDecodeINTERMEDIATE CODE p u char code CPU STATE   state
 
 u long
	 op rd op rs i rs mode
 long
 simm

 EXTRACT NATIVE opop code
 if op  
 
 EXTRACT NATIVE rdrd code
 EXTRACT NATIVE opop code
 if op  
	 
 EXTRACT NATIVE rsrs code
 if  REG OFFSET SRCrs      REG OFFSET DSTrd   
 
 EXTRACT NATIVE ii code
 if i  
 
 EXTRACT NATIVE rsrs code
 PACK INTERMEDIATE add i  rs rdpparameters rs rd
 pentry pt  v entry pointsSIM ENTRY INDEX add i  rs rd
	 return 
 
 if i  
 
 EXTRACT NATIVE simmsimm code
 if simm  
 
 PACK INTERMEDIATE add i  rs rd simm pparameters rd
 pentry pt  v entry pointsSIM ENTRY INDEX add i  rs rd simm 
 return 
	 
 PACK INTERMEDIATE add i  rs rdpparameters simm rd
 pentry pt  v entry pointsSIM ENTRY INDEX add i  rs rd
 return 
 
 
	 EXTRACT NATIVE ii code
	 if i  
	 
	 EXTRACT NATIVE rsrs code
		 PACK INTERMEDIATE add i pparameters rs rd rs
	 pentry pt  v entry pointsSIM ENTRY INDEX add i 
	 return 
	 
	 if i  
	 
 EXTRACT NATIVE simmsimm code
 PACK INTERMEDIATE add i pparameters simm rs rd
 pentry pt  v entry pointsSIM ENTRY INDEX add i 
 return 
	 
 
 if op  
 
 EXTRACT NATIVE rsrs code
 EXTRACT NATIVE ii code
 if i  
 
 EXTRACT NATIVE rsrs code
 mode  usrGetVirtualField mode  state subcc i  rd rd rs rs rs rs NULL
	 if mode  
 
 PACK INTERMEDIATE subcc i  mode pparameters rs rd rs
 pentry pt  v entry pointsSIM ENTRY INDEX subcc i  mode 
 return 
 


 if mode  
 
 PACK INTERMEDIATE subcc i  mode pparameters rs rd rs
 pentry pt  v entry pointsSIM ENTRY INDEX subcc i  mode 
	 return 
 
 
 if i  
 
 EXTRACT NATIVE simmsimm code
 mode  usrGetVirtualField mode  state subcc i  rd rd rs rs simm simm NULL
 if mode  
 
 PACK INTERMEDIATE subcc i  mode pparameters simm rs rd
	 pentry pt  v entry pointsSIM ENTRY INDEX subcc i  mode 
 return 
 
 if mode  
 
 PACK INTERMEDIATE subcc i  mode pparameters simm rs rd
 pentry pt  v entry pointsSIM ENTRY INDEX subcc i  mode 
 return 
 
 
	 
 
 return 
 
Comments
Above, the corresponding decoder of figure  is shown. It extracts the native fields and
compares them to different values in order to identify the incoming instruction. When a field
is used as a parameter field it is just extracted, since no comparison is necessary. However,
since specializations are used, the same field can act both as a static field and as a parameter for
different service routines. For example, on line  the simm is checked against  to see if the
specialized service routine should be used or not.
On line  two fields, after the application of the user-specified intermediate transformations,
are compared to see if they are equal. This test is performed to see if a specialized service
routine which uses equal parameter values should be used.
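The equality test can be sketched in C as shown below. The routine indices, the Decoded structure and the reg_offset transformation are hypothetical stand-ins for the generated names. The point is only that the specialized variant is chosen when the transformed source and destination fields are equal, and that it then needs one parameter less.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical service routine indices. */
enum { ADD_GENERIC, ADD_RS1_EQ_RD };

typedef struct {
    int      routine;     /* selected service routine          */
    unsigned nparams;     /* number of packed parameters       */
    uint32_t params[3];   /* transformed (pre-scaled) fields   */
} Decoded;

/* Assumed intermediate transformation: register number to byte offset. */
static uint32_t reg_offset(unsigned r)
{
    assert(r < 32);
    return (uint32_t)r * 4u;
}

/* Decode "add rs1, rs2, rd", choosing the specialized routine if rs1 == rd. */
static Decoded decode_add(unsigned rs1, unsigned rs2, unsigned rd)
{
    Decoded d = { ADD_GENERIC, 0, { 0, 0, 0 } };
    uint32_t src1 = reg_offset(rs1);
    uint32_t src2 = reg_offset(rs2);
    uint32_t dst  = reg_offset(rd);

    if (src1 == dst) {
        /* Source and destination coincide: rs1 need not be packed. */
        d.routine = ADD_RS1_EQ_RD;
        d.params[d.nparams++] = src2;
        d.params[d.nparams++] = dst;
    } else {
        d.params[d.nparams++] = src1;
        d.params[d.nparams++] = src2;
        d.params[d.nparams++] = dst;
    }
    return d;
}
```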
The use of a virtual field is shown on line  and below. The variable mode (a virtual field) is set
to the value returned by the user-defined function usrGetVirtualField. Such a function gets
the state of the CPU as a parameter, as well as the name of the instruction and its parameters
with corresponding values. The function can now return an appropriate value based on this
data. In this case the mode of the simulator is returned (see the virtual field part of section
 ). We can now test whether the service routine for one mode or the other should be used.
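A user-defined function of this kind could look roughly like the C sketch below. The CpuState layout, the mode values and all names are assumptions made for illustration. The sketch only mirrors the calling convention visible in the decoder: the CPU state, the instruction name, and name/value pairs terminated by NULL.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical CPU state holding the current condition code mode. */
typedef struct { int optimized_cc; } CpuState;

enum { MODE_OPT = 0, MODE_NOPT = 1 };     /* assumed mode values      */
enum { SUBCC_OPT, SUBCC_NOPT };           /* assumed routine indices  */

/* Returns the value of the virtual field "mode". The variable arguments are
   (char *name, unsigned long value) pairs terminated by a NULL name. */
static long get_virtual_field_mode(const CpuState *state, const char *instr, ...)
{
    va_list ap;
    const char *name;

    (void)instr;                           /* not needed for this field */
    va_start(ap, instr);
    while ((name = va_arg(ap, char *)) != NULL)
        (void)va_arg(ap, unsigned long);   /* skip the value of the pair */
    va_end(ap);

    return state->optimized_cc ? MODE_OPT : MODE_NOPT;
}

/* Decoder side: pick the subcc service routine that matches the mode. */
static int select_subcc(const CpuState *state,
                        unsigned rd, unsigned rs1, unsigned rs2)
{
    long mode = get_virtual_field_mode(state, "subcc",
                                       "rd",  (unsigned long)rd,
                                       "rs1", (unsigned long)rs1,
                                       "rs2", (unsigned long)rs2,
                                       (char *)NULL);
    return mode == MODE_OPT ? SUBCC_OPT : SUBCC_NOPT;
}
```

Since the field value comes from the simulator state rather than from the instruction word, the same native instruction can be decoded to different service routines at different times.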
v entry points is an array which holds all service routine pointers When the right service
routine is found for an instruction the corresponding pointer is stored in the intermediate
format together with the packed service routine parameters
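The role of the entry point array can be sketched in C as follows. All types, indices and routine names are hypothetical. A plain loop stands in for the epilogue of the generated service routines, which is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t reg[32]; } Cpu;
typedef void (*ServiceRoutine)(Cpu *cpu, uint32_t params);

/* Assumed intermediate format: routine pointer plus packed parameters. */
typedef struct {
    ServiceRoutine entry;
    uint32_t       params;
} Intermediate;

enum { IDX_ADD, IDX_INC, NUM_ENTRIES };

/* Generic add: rd in bits 4:0, rs1 in bits 9:5, rs2 in bits 14:10. */
static void sr_add(Cpu *cpu, uint32_t p)
{
    unsigned rd = p & 31u, rs1 = (p >> 5) & 31u, rs2 = (p >> 10) & 31u;
    cpu->reg[rd] = cpu->reg[rs1] + cpu->reg[rs2];
}

/* Specialized add of the constant 1 to rd; only rd is packed. */
static void sr_inc(Cpu *cpu, uint32_t p)
{
    cpu->reg[p & 31u] += 1u;
}

/* The table of all service routine pointers, indexed by routine. */
static const ServiceRoutine entry_points[NUM_ENTRIES] = {
    [IDX_ADD] = sr_add,
    [IDX_INC] = sr_inc,
};

/* Decode time: copy the pointer from the table into the intermediate code. */
static Intermediate emit(int index, uint32_t params)
{
    Intermediate im;
    assert(index >= 0 && index < NUM_ENTRIES);
    im.entry  = entry_points[index];
    im.params = params;
    return im;
}

/* Run time: call through the stored pointers; no decoding is repeated. */
static void run(Cpu *cpu, const Intermediate *code, size_t n)
{
    size_t k;
    for (k = 0; k < n; k++)
        code[k].entry(cpu, code[k].params);
}
```

Storing the pointer itself rather than the index saves one table lookup each time the instruction is executed.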

C An Example of Generated Service Routines
This example shows the corresponding service routines for those instructions decoded in
appendix B.
include v includeh
void v service routines 

v entry pointsSIM ENTRY INDEX add i  rs rd    SIM ENTRY POINT add i  rs rd
v entry pointsSIM ENTRY INDEX add i     SIM ENTRY POINT add i 
v entry pointsSIM ENTRY INDEX add i  rs rd simm     SIM ENTRY POINT add i  rs rd simm 
v entry pointsSIM ENTRY INDEX add i  rs rd    SIM ENTRY POINT add i  rs rd
v entry pointsSIM ENTRY INDEX add i     SIM ENTRY POINT add i 
v entry pointsSIM ENTRY INDEX subcc i  mode     SIM ENTRY POINT subcc i  mode 
v entry pointsSIM ENTRY INDEX subcc i  mode     SIM ENTRY POINT subcc i  mode 
v entry pointsSIM ENTRY INDEX subcc i  mode     SIM ENTRY POINT subcc i  mode  subcc i  mode 
v entry pointsSIM ENTRY INDEX subcc i  mode     SIM ENTRY POINT subcc i  mode  subcc i  mode 
return
  Service Routines  
SIM ENTRY POINT add i  rs rd

u long
rs rd
EXTRACT INTERMEDIATE rs Ars rOP
EXTRACT INTERMEDIATE rd Ard rOP
REGrd  REGrd 
 REGrs 
epilogue

SIM ENTRY POINT add i 

u long
rs rd rs
EXTRACT INTERMEDIATE rs Ars rOP
EXTRACT INTERMEDIATE rd Brd rOP
EXTRACT INTERMEDIATE rs Ars rOP
REGrd  REGrs 
 REGrs 
epilogue

SIM ENTRY POINT add i  rs rd simm 

u long
rd
EXTRACT INTERMEDIATE rd Crd rOP
REGrd  REGrd 
  
epilogue

SIM ENTRY POINT add i  rs rd

u long
rd
long
simm
EXTRACT INTERMEDIATE simm Asimm rOP
EXTRACT INTERMEDIATE rd Ard rOP
REGrd  REGrd 
 simm 
epilogue

SIM ENTRY POINT add i 

u long
rs rd
long
simm
EXTRACT INTERMEDIATE simm Asimm rOP
EXTRACT INTERMEDIATE rs Brs rOP
EXTRACT INTERMEDIATE rd Drd rOP
REGrd  REGrs 
 simm 
epilogue

SIM ENTRY POINT subcc i  mode 

u long
rs rd rs
EXTRACT INTERMEDIATE rs Ars rOP
EXTRACT INTERMEDIATE rd Brd rOP

EXTRACT INTERMEDIATE rs Ars rOP
rCMP VALUE A  REGrs 
rCMP VALUE B  REGrs 
REGrd  rCMP VALUE A  rCMP VALUE B
epilogue

SIM ENTRY POINT subcc i  mode 

u long
rs rd
long
simm
EXTRACT INTERMEDIATE simm Asimm rOP
EXTRACT INTERMEDIATE rs Brs rOP
EXTRACT INTERMEDIATE rd Drd rOP
rCMP VALUE A  REGrs 
rCMP VALUE B  simm 
REGrd  rCMP VALUE A  rCMP VALUE B
epilogue

SIM ENTRY POINT subcc i  mode  subcc i  mode 

u long
INDEX 
EXTRACT INTERMEDIATE INDEX  AINDEX  rOP
switchINDEX 

case   subcc i  mode  

switch to opt
break

case   subcc i  mode  

switch to opt
break


epilogue
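The difference between a generic and a specialized service routine in the listing above can be sketched in C as follows. The packing layout, the helper names and the constant 1 used by the specialized routine are assumptions made for illustration. They are not the layout or the constant used by the generated code.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t reg[32]; } Cpu;

/* Assumed intermediate transformation, applied once at decode time:
   a register number becomes a byte offset into the register file. */
static uint32_t reg_offset(unsigned r)
{
    assert(r < 32);
    return (uint32_t)r * (uint32_t)sizeof(uint32_t);
}

/* Access a register through its pre-scaled byte offset. */
static uint32_t *reg_at(Cpu *cpu, uint32_t offset)
{
    return (uint32_t *)((unsigned char *)cpu->reg + offset);
}

/* Pack simm (13 bits) into bits 28:16, rs1 into bits 15:8, rd into bits 7:0. */
static uint32_t pack_add_imm(unsigned rs1, unsigned rd, int32_t simm)
{
    return (((uint32_t)simm & 0x1FFFu) << 16)
         | (reg_offset(rs1) << 8)
         | reg_offset(rd);
}

/* Generic routine: three extractions before the addition. */
static void sr_add_imm(Cpu *cpu, uint32_t p)
{
    uint32_t raw  = (p >> 16) & 0x1FFFu;
    int32_t  simm = (raw & 0x1000u) ? (int32_t)raw - 0x2000 : (int32_t)raw;
    uint32_t rs1  = (p >> 8) & 0xFFu;
    uint32_t rd   = p & 0xFFu;
    *reg_at(cpu, rd) = *reg_at(cpu, rs1) + (uint32_t)simm;
}

/* Specialized routine for rs1 == rd and simm == 1: one extraction only. */
static void sr_add_rs1_eq_rd_simm_const(Cpu *cpu, uint32_t p)
{
    *reg_at(cpu, p & 0xFFu) += 1u;
}
```

Both routines give the same result for an instruction that adds 1 to a register in place, but the specialized one extracts fewer parameters and has the constant compiled in.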



References
 R Bedichek  Some Ecient Hardware Simulation Techniques In USENIX 
Winter pp 	
 Blittersoft 	 PCx  Software PC Emulation http blittersoftwildnetcouk
 M Rosenblum S Herrod E Witchell A Gupta 	 Complete Computer System
Simulation the SimOS Approach In IEEE Parallel and Distributed Technology
 B Werner P Magnusson 
 A Hybrid Simulation Approach Enabling Performance
Characterization of Large Software Systems Proceedings of MASCOTS
 pp 

 R Lipsett E Marschner M Shaded 	 VHDL  The Language IEEE Design and
Test of Computers April pages 
	 T A Cook  Instruction Set Architecture Speci cation PhD thesis from North
Carolina State University

 T A Cook E Harcourt  A Functional Speci cation Language for Instruction
Set Architectures
 J R Bell 
 Threaded Code In Communications of the ACM Vol 	 No 	 June
pp 


 P Magnusson  Partial Translation SICS Technical Report T 
 P Magnusson  Simulation of Parallel Hardware In MASCOTS January
 P Magnusson D Samuelsson  A Compact Intermediate Format for SimICS
SICS Research Report R 

 D Samuelsson  System Level Interpretation of the SPARC V Instruction Set
Architecture SICS Research Report R 
 P Magnusson SimICS Homepage http wwwsicssesimics
 The SPARC V Reference Manual 
 T A Diep  A Visualizationbased Microarchitecture Workbench PhD thesis
from Carnegie Mellon University
	 N Ramsey M F Fernandez 	 New Jersey MachineCode Toolkit Reference Man
ual Version 

 J D Wick 
 Automatic Generation of Assemblers PhD thesis from Yale Uni
versity

