Emulation of a Complex Instruction Set Computer with a Reduced Instruction Set Computer by McNeley, K. J. & Milutinovic', V. M.
Purdue University
Purdue e-Pubs
Department of Electrical and Computer
Engineering Technical Reports
Department of Electrical and Computer
Engineering
4-1-1988





McNeley, K. J. and Milutinovic', V. M., "Emulation of a Complex Instruction Set Computer with a Reduced Instruction Set Computer"








School of Electrical Engineering
Purdue University
West Lafayette, Indiana 47907
This research (especially in its final phases) was partially supported by 
NCR Corporation, World’s Headquarters, Dayton, Ohio.
Emulation of a Complex Instruction Set Computer With a Reduced
Instruction Set Computer
K. J. McNeley and V. M. Milutinovic 
School of Electrical Engineering
Purdue University
West Lafayette, Indiana 47907
ABSTRACT
This paper analyzes some of the difficulties of emulating a Complex Instruction Set Computer (CISC) with a Reduced Instruction Set Computer (RISC). It will be shown that although the speed advantage of a RISC is sacrificed, a CISC can be emulated with the exception of software constructs that support nonstandard hardware interfaces. Some concrete examples will be used to help illustrate the execution-time bottlenecks as well as to discuss possible solutions from an architectural point of view for both silicon and Gallium Arsenide (GaAs). In addition, it will be shown that the most efficient method of emulation involves debugging compiled High-Level Language (HLL) source code on a CISC, and then recompiling the HLL code with a compiler that is familiar with the target RISC architecture.
Table of Contents
Section 1: INTRODUCTION
1.1 The Interest in GaAs
1.2 The Motivation Behind the Emulation of a CISC
1.3 Benefits Resulting from the Emulation
Section 2: EMULATION EFFICIENCY OF A CISC WITH A RISC
2.1 Problems Encountered in Emulation
2.1.1 Addressing Mode Emulation
2.1.2 Condition Code Support
2.1.3 Trace Mode Emulation
2.1.4 Software Support of Infrequently Used Instructions
2.2 A Solution to the Efficiency Problem
Section 3: THE TRANSLATOR'S SOPHISTICATION
3.1 Condition Code Optimization
3.1.1 Locating Conditional Instructions
3.1.2 Substituting RISC Supported Conditional Constructs
3.1.3 A Default Solution to Condition Code Synthesis
3.2 The Impact of the Data Format on Efficiency
3.3 The Effect of Increasing the Register File Size
3.3.1 Decreasing the LOAD/STORE Latency
3.3.2 The Impact on the Instruction Format
Section 4: PIPELINE CONSIDERATIONS
4.1 Why Pipelining is an Important Consideration
4.2 Pipelining in the Silicon Environment
4.3 Pipelining in the GaAs Environment
Section 5: PACKING CONSIDERATIONS
5.1 Intrainstruction Packing
5.2 Interinstruction Packing
Section 6: OTHER ARCHITECTURAL CONSIDERATIONS
6.1 On-Package Cache Support
6.2 Coprocessor Support
6.2.1 Monitoring the Instruction Bus
6.2.2 Conditional Actions Based on Coprocessor Conditions
6.2.3 An Operand Transfer Scheme
6.3 Bit and Byte Field Support
Section 7: INSTRUCTION TRANSLATION
7.1 Translation From MC68020 Code to MIPS Code
7.2 Disassembly and Decomposition of the MC68020 Instructions
7.3 Condition Code Analysis and Interinstruction Optimization
7.4 Standard MIPS Optimization Mechanisms
Section 8: CONCLUSION
Section 9: ACKNOWLEDGEMENTS
Section 10: REFERENCES
Section 11: TABLES AND FIGURES
Table I. PERFORMANCE COMPARISON OF GaAs AND SILICON
Figure 1. SU-MIPS (4 MHz) EXECUTION EFFICIENCY FOR DIFFERENT COMPILED BENCHMARKS
Figure 2. MC68020 (10 MHz) EXECUTION EFFICIENCY FOR DIFFERENT COMPILED BENCHMARKS
Figure 3. COMPARISON OF HLL EXECUTION EFFICIENCIES AND CODE SIZE FOR DIFFERENT METHODS OF CODE GENERATION
Figure 4. CACHE HIT RATIOS VERSUS CACHE SIZE
Appendix A ADDRESSING MODE EMULATION
A.1 Register Allocation
A.2 Addressing modes supported by the MC68020
A.3 Register Definitions used in Translation
A.4 Addressing Mode Translation
Appendix B CONDITION CODE SYNTHESIS
B.1 Condition Code Bit Map
B.2 Basic Conditions Tested
B.3 Macros Preceding a Conditional Instruction
B.4 Condition Code Translation
Appendix C INSTRUCTION SET SYNTHESIS
C.1 Register Definitions used in Translation
C.2 Instruction Set Translation
Appendix D QUICKSORT BENCHMARK TRANSLATION
D.1 The Purpose of this Appendix
D.2 Pseudocode used for the Quicksort Benchmark
D.3 MC68020 Assembly Language Code for Quicksort
D.4 SU-MIPS Assembly Language Code for Quicksort
D.5 Translated MC68020 Assembly Language Code for Quicksort
INTRODUCTION
1.1 The Interest in GaAs
Gallium Arsenide (GaAs) has reached a level of integration where it is now feasible to design very high speed computer components of moderate scale. The current fabrication limitation is about 30K transistors per chip [HiInM84] and is expected to more than double by the year 2000. Since several 32-bit microprocessor designs exist today that require fewer than 60K transistors to implement [1], GaAs technology has attracted a great deal of scientific interest.
In addition to the speed advantage of about 20:1 which Enhancement/Depletion mode Metal Semiconductor Field Effect Transistors (E/D-MESFETs) offer [IkToM84], it has been shown that GaAs has the ability to withstand environmental conditions far more severe than its silicon counterpart. In particular, GaAs can withstand 10-100 million rads and a temperature range of -200 °C to +200 °C [EdLiW85]. This is important when environmental constraints are critical, such as those found in aerospace and military applications where radiation hardness is a primary concern.
One final property which makes GaAs a more attractive material than silicon is its ability to interface directly with optical fibers [Honey85]. This technology has the potential to be used for communications processing by using optical fibers as inputs and outputs in the same fashion that current silicon microprocessors use electrical connections. This also has the potential to help alleviate the off-chip bandwidth restriction and reduce the interface hardware required.
1.2 The Motivation Behind the Emulation of a CISC
The three factors mentioned above make this technology an exciting area for research and further development for special purpose applications. Unfortunately, some processor designs are not directly implementable on a single GaAs chip because of the differences between silicon and GaAs device physics. These differences can be found in Table I. In summary, the four major differences are: (1) the low transistor count allowable, (2) the large ratio of off-chip propagation delays to on-chip switching delays, (3) the low on-chip gate fanin (and fanout), and (4) the low yield. It is clear from this discussion that some architectures will never be viable for a direct GaAs implementation on a single chip.
There are three basic approaches available for a GaAs microprocessor. They include: (1) functionally divide the processor and implement each function as a separate chip, (2) divide the processor in a bit-slice manner and implement each slice as a chip, and (3) implement the entire processor as a single, very simple chip. Although the first two approaches may be worthy of further study, it is generally agreed that the excessive communication between processor elements is prohibitive in a GaAs environment. Any implementation of a GaAs processor with a multi-chip configuration will be plagued with degraded performance due to large off-chip propagation delays [MilFu84].
1. Although GaAs cannot simply be considered a fast silicon, the transistor count is a relatively good indicator of architectural complexity and is therefore important.
Because of the aforementioned design considerations, the only architecture which is transistor count compatible with GaAs is known as a Reduced Instruction Set Computer (RISC) architecture. The RISC design philosophy consists of three basic design principles [Katev83]: (1) identify the machine instructions most frequently used by the target application, (2) optimize the datapath and timing for these frequent instructions, and (3) incorporate other frequent instructions into the instruction set only if they fit into the already elaborated VLSI scheme. Since RISC architectures are the only architectures in which a single-chip computer can be implemented in GaAs, what needs to be shown is that the reduced instruction set primitives can fully emulate all of the elaborate capabilities of a Complex Instruction Set Computer (CISC) architecture.
It will be shown in this paper that a RISC can emulate a CISC with the exception of software constructs for hardware dependent capabilities such as dynamic memory sizing, nonstandard bus arbitration, etc. Of course, these constructs could be supported on new designs, if they were needed, by including the required hardware. It will also be shown that the most efficient method of emulation is accomplished by debugging compiled High-Level Language (HLL) source code on a CISC and then recompiling the debugged code into RISC machine code using a compiler that understands the resources available on the target RISC architecture. This technique will support the maximum execution-time speedup possible from GaAs technology.
1.3 Benefits Resulting from the Emulation
Several benefits exist as a direct consequence of the emulation of a CISC with a RISC architecture in GaAs. Four advantages are outlined below:
1. A possibility exists to replace CISC architectures where tolerance for excessive radiation or wide temperature variations is required.
2. A possibility exists to capture a market share of CISC type application programs in the general purpose computing arena where the execution speed improvement of a GaAs RISC is desired.
3. The emulation allows portability of CISC programs to the RISC environment, increasing the number of application programs readily available to RISC architectures.
4. Software productivity is enhanced in speed critical environments because programs can first be debugged using CISC aids available in hardware (for example, trace mode) and then executed on a RISC architecture which achieves speed by reducing its overhead.
The only restriction is that the two architectures must have some basic similarities, such as the same address space and the same data bus size. It is also advantageous to have the same memory map if I/O devices and coprocessors are memory mapped. If this is not the case, the I/O drivers will require some address translation or, possibly, recoding.
EMULATION EFFICIENCY OF A CISC WITH A RISC
2.1 Problems Encountered in Emulation
Although it would appear advantageous to directly translate a program written using a CISC instruction set into one using a RISC instruction set, close scrutiny reveals that this endeavor is almost futile. The chief advantage of a RISC is the decrease in the execution time of machine code due to a shorter machine clock cycle [CiaStSO]. Unfortunately, this speed advantage cannot be fully exploited when emulating a CISC using a machine code translation scheme. This is due to several architectural bottlenecks. The rest of this section is devoted to an explanation of these bottlenecks.
2.1.1 Addressing Mode Emulation
One of the architectural design philosophies in a RISC is the LOAD/STORE [2] concept of main memory addressing. A minimum number of off-chip operand references are used because of a large (and oftentimes windowed) register file. This means that memory bandwidth is less of a restriction in this type of architecture than in most CISCs. When translation is considered, however, the operand fetch latency must be evaluated. This is especially true for CISC supported addressing modes.
Appendix A illustrates one example of an addressing mode emulation using the Stanford University MIPS (SU-MIPS) [PrGrH84] [GiGrH83] to emulate the Motorola 68020 (MC68020®) [Motor84]. The assembly code has been packed and optimized to allow a maximum utilization of hardware resources [Gross83] within the known context. The point of this emulation is that it can take as many as eight RISC instructions to do a simple arithmetic ADD if some indirect addressing modes are used. As a consequence, because of the memory operand fetch latency, a RISC loses its speed advantage over a CISC implemented in silicon.
2. LOAD/STORE architectures minimize the operand fetch latency by prefetching all operands and storing them in registers. Since registers are generally the fastest level in the memory hierarchy, only a small operand fetch delay occurs compared with other levels of storage.
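To make the cost of indirect modes concrete, here is a minimal sketch (in Python, only to keep it testable) of how a translator might expand one MC68020 memory indirect mode. It is not the Appendix A translation: the function name, the scratch registers t0 and t1, and the MIPS-style mnemonics are hypothetical, and the target is assumed to be a plain LOAD/STORE machine with register-plus-immediate adds and shifts.

```python
def expand_add_memory_indirect(dn, an, xn, scale, bd, od):
    """Expand ADD.L ([bd,An],Xn*scale,od),Dn (MC68020 memory indirect,
    postindexed) into LOAD/STORE-style primitives.

    Mnemonics and the scratch registers t0/t1 are hypothetical, not SU-MIPS
    syntax. Effective address = mem[An + bd] + Xn*scale + od.
    """
    shift = {1: 0, 2: 1, 4: 2, 8: 3}[scale]   # the 68020 allows scale 1, 2, 4, 8
    return [
        f"add  t0, {an}, #{bd}",     # An + base displacement
        "ld   t0, 0(t0)",            # fetch the intermediate pointer from memory
        f"sll  t1, {xn}, #{shift}",  # scale the index register
        "add  t0, t0, t1",           # postindex: add the scaled index
        f"add  t0, t0, #{od}",       # add the outer displacement
        "ld   t1, 0(t0)",            # fetch the operand itself
        f"add  {dn}, {dn}, t1",      # the arithmetic ADD proper
    ]


for line in expand_add_memory_indirect("d0", "a1", "d2", 4, 8, 16):
    print(line)
```

Even this variant needs seven primitives, and the two loads are dependent: the second cannot start until the first has returned the intermediate pointer, which is where the operand fetch latency described above comes from.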
2.1.2 Condition Code Support
Although some RISC architectures, like the University of California at Berkeley RISC (UCB-RISC) [Katev83], support condition codes, many others do not. Instead, these RISCs often have a compare and branch instruction to simulate the two-step process of testing a condition and executing a conditional action. This reduces the execution time needed for a conditional action by a factor of two [3]. Even when a RISC architecture supports condition codes, the support generally covers a rather small subset compared with most CISCs.
Condition codes are set during the execution of virtually every instruction a CISC executes. Consequently, either an elaborate translation mechanism is required to check when the condition codes are actually used, or every instruction decomposition must synthesize the effect of the executed instruction on the condition code register. An example of condition code synthesis is shown in Appendix B. As with the examples used throughout this paper, the MC68020 and the SU-MIPS were used as the CISC/RISC pair. The generalized translation sequence assumes the existence of a sophisticated translator and occurs in a three-step process as follows:
1. The translator locates all conditional instructions and retraces the job stream to determine the last instruction or instructions which set the condition code bits used in the decision. In situations where the condition code bits used in a conditional action do not lie within the confines of a labeled routine, any procedure that has a branch to that label must be checked.
2. The translator then replaces the conditional instruction with a corresponding compare and branch instruction if the RISC supports the condition tested and the sources for the comparison can be determined.
3. Alternately, the macros which set the required bits are inserted. This is the default when the translator cannot, for one reason or another, determine where the required condition code bits are set. An example of this is when a conditional branch vectors program flow based on an entry in a lookup table, and consequently, the destination procedure is indeterminate at translation time. Since the condition code bits required by the target procedure are also indeterminate, all bits in the condition code register must be appropriately set.
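The macros of step 3 have to recompute in software the flags that a CISC produces as a side effect. As a rough sketch (Python for testability; the function name is hypothetical), this is what fully synthesizing the MC68020 condition code register after ADD.L involves. The bit positions X=4, N=3, Z=2, V=1, C=0 follow the MC68020 CCR layout; on a RISC, each line below would become one or more primitives.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def add_l_with_ccr(src, dst):
    """Return (result, ccr) for ADD.L src,dst with every CCR bit synthesized."""
    src &= MASK32
    dst &= MASK32
    full = src + dst
    result = full & MASK32
    c = 1 if full > MASK32 else 0          # carry out of bit 31
    # Signed overflow: both operands have the same sign and the result differs.
    v = 1 if (~(src ^ dst) & (src ^ result) & SIGN32) else 0
    z = 1 if result == 0 else 0            # result is zero
    n = 1 if result & SIGN32 else 0        # result is negative
    x = c                                  # ADD sets X the same as C
    ccr = (x << 4) | (n << 3) | (z << 2) | (v << 1) | c
    return result, ccr


print(add_l_with_ccr(0x7FFFFFFF, 1))   # (2147483648, 10): N and V set
```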
2.1.3 Trace Mode Emulation
Hardware costs have undergone a dramatic decrease in the last decade, primarily due to advances in VLSI fabrication. As a result of the evolution of low cost integrated circuits, a large portion of project development costs lies in software development, especially during the program debugging phase. In response to this situation, hardware manufacturers are beginning to include a trace mode for program checkout and support for functions like setting program breakpoints, virtual machine support, etc. This concept may be well suited to the CISC philosophy, but it violates the RISC philosophy of supporting only simple, fast instructions.
3. The overall execution time actually increases by 1.1% when considering the entire program. The execution time is reduced only when considering an individual conditional instruction with an explicit test for condition.
The software emulation of a trace mode is very expensive in terms of both execution time and memory space. As a result, it has been ignored during the emulation process discussed in this paper. Instead, it is assumed that program debugging will be done in a CISC environment and the program then recompiled for the RISC environment. The validity of this assumption lies in making hardware/software trade-offs between GaAs and silicon environments. Both RISCs and CISCs have specific advantages, and therefore both types of architectures should be exploited. If it is advantageous to support a trace mode in a development environment, the complexity of this support should be migrated into silicon, since it is known that a RISC's software is not efficient for this task. Moreover, in the case of the SU-MIPS, trace capabilities are virtually meaningless after code reorganization is complete because instruction boundaries are not always preserved.
2.1.4 Software Support of Infrequently Used Instructions
One of the main arguments heard from RISC proponents (in the CISC vs. RISC debate) is that compiler writers generally do not use the full capabilities of a complex instruction set [Wulf81]. This is either because the complex instruction has a longer execution time than the corresponding primitives [4], or because the complex instruction does not quite fit the particular need in all cases and extensive analysis would be required to determine when it makes sense to use it [5]. As a result, it is believed that a large portion of the complex instructions are not used except by assembly language programmers.
The question that arises from consideration of the emulation of infrequently used instructions is: if these particular instructions are used so infrequently, then why does it matter if they are expensive in execution time? The reason is that this paper analyzes the emulation as a whole. For example, the instruction decomposition for bit field operations is very expensive in execution time. This can be used to justify an external formatting processor which can provide data in the format requested rather than trying to do bit manipulations. Inclusion of this type of hardware has the additional advantage of normalizing a mantissa or stripping off the exponent for a floating point coprocessor. What is important in the design is the overall system architecture, not the individual pieces of it.
4. "Primitive" indicates one of the RISC instructions used to emulate a CISC instruction. The VAX autoincrement addressing mode is the classic example of a longer execution time for a CISC instruction.
5. For example, consider hardware support of a CASE statement or a PROCEDURE call. Invariably it either supports only one language well, or is so general that it is inefficient for special cases.
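As an illustration of why bit field decomposition is costly, the following sketch emulates BFEXTU on a 32-bit register operand using only shifts and masks. The function name is hypothetical, and it deliberately handles only fields that fit inside the register (offset + width <= 32); the real instruction also wraps fields around a register operand and accepts memory operands that can span several bytes, each of which adds more primitives.

```python
def bfextu(word: int, offset: int, width: int) -> int:
    """Emulate MC68020 BFEXTU on a 32-bit register operand.

    The 68020 counts bit field offsets from the most significant bit (bit 31).
    Simplification: only fields with offset + width <= 32 are handled.
    """
    if not (1 <= width <= 32 and 0 <= offset and offset + width <= 32):
        raise ValueError("field outside the simplified range")
    left = (word << offset) & 0xFFFFFFFF   # shift the field to the top, drop bits above it
    return left >> (32 - width)            # right-justify and zero-fill


print(hex(bfextu(0xABCD1234, 4, 8)))       # 0xbc
```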
There is one problem associated with this concept, however. If the emulation assumes external processors, then how can the translator know when to synthesize execution-time expensive instructions and when to pass the job to another processor? The answer is simple: make the translator portable, much the same way the 'C' compiler was made portable. Allow a user definition of the back end of the translator so that it can be customized to several versions of the baseline architecture. This allows flexibility but does add somewhat of a burden to the translator writer. This is not, however, an unreasonable requirement, since there is a widespread belief that the full capabilities of GaAs technology will never be achieved without the appropriate advances in compiler (or in this case translator) technology.
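One way to picture a user-definable back end is a target description listing which CISC operations an external processor takes over, with everything else synthesized from primitives. The sketch below is hypothetical throughout (TargetDescription, offloaded and lower are invented names) and shows only the dispatch decision, not code generation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TargetDescription:
    """User-supplied description of one version of the baseline RISC.

    Hypothetical: `offloaded` names the CISC opcodes that an external processor
    (for example a formatting or floating point coprocessor) executes.
    """
    name: str
    offloaded: set[str] = field(default_factory=set)


def lower(opcode: str, target: TargetDescription) -> str:
    """Decide how the translator back end handles one CISC opcode."""
    if opcode in target.offloaded:
        return f"dispatch {opcode} to coprocessor"
    return f"synthesize {opcode} from RISC primitives"


# The same front end serves two versions of the baseline architecture.
baseline = TargetDescription("baseline")
with_formatter = TargetDescription("with formatting processor", {"BFEXTU", "BFINS"})
print(lower("BFEXTU", baseline))        # synthesize BFEXTU from RISC primitives
print(lower("BFEXTU", with_formatter))  # dispatch BFEXTU to coprocessor
```

Keeping the decision in data rather than in the translator's code is what lets one translator be retargeted to several versions of the baseline architecture.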
2.2 A Solution to the Efficiency Problem
The previous section would indicate that an emulation of a CISC with any sort of efficiency is virtually impossible. That is not the case. The implication of the previous section is that a translation based emulation with any sort of efficiency is impossible. The obvious conclusion is that translation of assembly language instructions is not desirable unless it is absolutely necessary. As already indicated, the most efficient method of emulation consists of debugging compiled HLL source code on a CISC and then recompiling the code with a RISC compiler that understands the nature of the target architecture and can fully exploit the resources available to it.
Although recompiling HLL source code is not generally considered an emulation, it is the most efficient approach. As long as it can be shown that an emulation is possible from a translation, the method employed to achieve increased efficiency is unimportant. If a compiler based emulation is used to emulate the MC68020 with the SU-MIPS, then execution efficiencies greater than unity can be achieved, as illustrated in Figure 1 and Figure 2. The eight Pascal benchmarks used for this comparison are the following:
Puzzle   Solves a three-dimensional cube packing problem.
Queen    Solves the eight queens chess problem.
Perm     Computes all permutations of one through seven.
Towers   Solves the Towers of Hanoi problem for 14 discs.
Intmm    Multiplies two 40x40 integer matrices.
Bubble   Bubble sorts 5000 integers.
Quick    Quicksorts 5000 integers.
Tree     Tree insertion sort of 5000 integers.
When a translation based emulation is used for the same two processors, the execution efficiency is reduced and storage space is increased. The extent of these effects can be determined for the Quicksort benchmark as shown in Figure 3; the actual code translation can be found in Appendix D. This translation was done for MC68020 code written directly in assembly language and is compared with SU-MIPS code also written in assembly language, so that compiler technology does not influence the performance evaluation. From these figures it is clear that whether GaAs or silicon is the material of choice for the RISC, compiler based emulation is by far the more attractive alternative.
Unfortunately, there are situations when a direct translation is the only alternative for emulation. A translation is required, for example, when the only source code available is machine code that can be disassembled into assembly code and then translated into RISC assembly code. For the most part, this does not include system monitors or I/O drivers. Although these routines are written in assembly code because of their speed critical nature, they are generally architecturally dependent and a translation is meaningless. For example, a RISC in general will not have the same hardware controls, such as the interrupt mechanisms, nor will it have the same memory mapping of its I/O ports. Since the translator cannot anticipate these situations, it does not make sense to perform a translation. Perhaps more importantly, though, the key words for consideration are "speed critical nature." It has already been shown that an efficient translation cannot be achieved, so it is impossible to make the translated routines fast.
THE TRANSLATOR'S SOPHISTICATION
3.1 Condition Code Optimization
In subsection 2.1.2, the steps in condition code emulation were briefly discussed. This section will further elaborate on these problems. So much attention is being given to this problem because it has the potential of being a major execution-time bottleneck. Fortunately, with a sophisticated translator some of the speed degradation which this problem imposes can be eliminated. The ensuing subsections outline a partial solution to this problem.
3.1.1 Locating Conditional Instructions
The dominant execution-time inefficiency when attempting to synthesize condition codes results from trying to emulate them for every instruction in the CISC instruction set which affects the status register. The first step in minimizing the unnecessary overhead is to reduce the number of instances where the synthesis occurs. This is accomplished by finding a conditional action and then retracing the sequential execution until the last instruction (or instructions) that affect the condition codes used in the conditional action are isolated. For most instructions, this will not be difficult. For instructions which have labels in close proximity to the conditional action, this may not be the case. Whenever a program segment does not have a clear-cut direction which it is destined to follow, all of its condition codes must be synthesized regardless of whether they are known to be used or not. This can be very expensive in execution time.
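The retracing step can be sketched as a backward scan that stops at the first instruction that sets the condition codes, and gives up (forcing full synthesis) when it reaches a label or a subroutine call. This is a minimal sketch under assumptions: the program is a flat list of (label, mnemonic) pairs, the function name is hypothetical, and SETS_CC and UNKNOWN_CC are small illustrative subsets, not complete MC68020 tables.

```python
# Illustrative subsets of MC68020 mnemonics; a real translator needs full tables.
SETS_CC = {"ADD", "SUB", "CMP", "TST", "AND", "OR", "MOVE"}
UNKNOWN_CC = {"JSR", "BSR"}   # a call may return with any condition codes


def find_cc_setter(program, branch_index):
    """Return the index of the last instruction that sets the condition codes
    read by the conditional instruction at branch_index, or None if that
    cannot be decided by a straight-line backward scan.

    program is a list of (label, mnemonic) pairs; label is None when absent.
    """
    i = branch_index
    while i > 0:
        if program[i][0] is not None:
            # Control can enter at a label from elsewhere, so the condition
            # codes here depend on every predecessor: give up.
            return None
        i -= 1
        mnemonic = program[i][1]
        if mnemonic in SETS_CC:
            return i
        if mnemonic in UNKNOWN_CC:
            return None
    return None


prog = [(None, "MOVE"), (None, "ADD"), (None, "LEA"), (None, "BEQ")]
print(find_cc_setter(prog, 3))   # 1: the ADD, since LEA leaves the codes alone
```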
Consider translating code written at the assembly level. At one point or another, most assembly language programmers have written a program where branches occur to a program segment based on an entry in a lookup table which may or may not be known at assembly time. In this event, there is no way for the translator to anticipate the direction of program flow, and consequently the last instruction or instructions prior to a branch must set the condition codes for all bits that are affected by the job stream. As mentioned earlier, the complete emulation of condition codes is disastrous from an efficiency standpoint. The conclusion from this is that the existence of an efficient translation for code written at the assembly level is often dubious. As already mentioned, the best method of assuring efficiency is using well structured HLL programs which are compiled into well structured assembly language statements using modern compiler methods.
3.1.2 Substituting RISC Supported Conditional Constructs
Although the substitution of MIPS supported conditional actions is not addressed in Appendix B, its discussion is appropriate. This process entails locating instructions where a conditional action such as "Branch on Equal" or "Set on Positive" can be performed equivalently by a MIPS primitive and the two operands can be identified. When this is the case, the condition code synthesis can be completely ignored and an efficient translation can be supported by replacing CISC instructions with RISC primitives.
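The common case of a CMP directly followed by a conditional branch can be handled by a peephole pass. In this sketch the function name and the lowercase compare-and-branch mnemonics are hypothetical (SU-MIPS syntax differs), only the signed conditions are mapped, and the pass assumes earlier analysis has shown that no later instruction reads the condition codes set by the CMP.

```python
# Hypothetical compare-and-branch mnemonics: "bgt a, b, L" branches to L if a > b
# (signed). After CMP src,Dn the 68020 branch conditions test Dn against src.
FUSABLE = {"BEQ": "beq", "BNE": "bne", "BGT": "bgt",
           "BGE": "bge", "BLT": "blt", "BLE": "ble"}


def fuse_compare_branch(insns):
    """Replace CMP src,Dn immediately followed by a fusable Bcc with a single
    compare-and-branch primitive. Assumes the condition codes are not read
    again on either path after the branch.

    Each instruction is a (mnemonic, operands) tuple.
    """
    out, i = [], 0
    while i < len(insns):
        op, args = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if op == "CMP" and nxt is not None and nxt[0] in FUSABLE:
            src, dn = args
            (target,) = nxt[1]
            out.append((FUSABLE[nxt[0]], (dn, src, target)))
            i += 2          # both CISC instructions are consumed
        else:
            out.append((op, args))
            i += 1
    return out


code = [("CMP", ("D1", "D0")), ("BGT", ("L2",)), ("ADD", ("D1", "D0"))]
print(fuse_compare_branch(code))   # [('bgt', ('D0', 'D1', 'L2')), ('ADD', ('D1', 'D0'))]
```

Conditions without a mapping (for example the unsigned BHI) are left untouched, so they fall back to condition code synthesis.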
How often does the situation occur where a RISC's primitives can replace a CISC's conditional action and the macros which set the condition codes? In practice, this situation occurs quite frequently, as shown in Appendix D. The condition codes are usually set in the instruction directly preceding the conditional action. When this occurs, the two instructions are prime candidates for single instruction replacement. One of the few examples of a conditional action that can never be replaced with a MIPS conditional action is one that tests the condition codes set by a ROTATE or a SHIFT instruction. Fortunately, the decomposition of these instructions includes primitives which set the appropriate bits in the condition code register every time they are synthesized. This generally allows the macros listed in Appendix B.3 to handle these cases without the aid of the condition code macros for the other affected bits.
The conclusion of this subsection is that it is often possible to salvage some efficiency in a majority of the cases of conditional actions. The practical implementation of this type of synthesis depends heavily on the structure of the assembly language program, but compilers today generate relatively well structured code. Although the best method of emulation results from compiling HLL constructs directly into RISC machine instructions, it is encouraging to realize that in cases where that is impossible, a translation can still be performed with some amount of efficiency in condition code synthesis.
3.1.3 A Default Solution to Condition Code Synthesis
The conditional actions of a CISC cannot always be replaced by a RISC's primitives. The macros for the individual bits of the condition code cannot always be separated either. What can be done, for the sake of efficiency, in the situation where for one reason or another it cannot be determined what should be done about the condition codes? Unfortunately, the only solution is to synthesize all bits affected by the sequence of instructions in question.
Consider the case of a subroutine that has a conditional branch a few instructions into the routine. Naturally, this subroutine has a label associated with its entry point. This means that any of a number of other branches or calls can reference this entry point as a destination. Since it is sometimes indeterminate at compile time which calls or branches will vector execution to this subroutine, all routines which have its entry point as a possible destination may be forced to set the condition codes for the bits used in the destination subroutine's conditional branch. Although this is not an attractive situation, it is unavoidable.
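The default rule can be expressed as a small propagation: every branch or call site that may reach a labeled entry point must synthesize the bits the code there reads on entry, and a site whose destination cannot be resolved (a lookup-table jump) must synthesize all five MC68020 bits. The names below are hypothetical, the sketch assumes the per-label "reads on entry" sets were computed beforehand, and it does not follow control flow through intermediate labels.

```python
ALL_BITS = frozenset("XNZVC")   # the five MC68020 condition code bits


def bits_to_synthesize(edges, reads_on_entry):
    """Map each branch or call site to the CCR bits it must synthesize.

    edges: (site, target) pairs; target is None when the destination cannot be
           resolved at translation time (for example a lookup-table jump).
    reads_on_entry: label -> bits that the code at that label reads before
           setting them itself (assumed to be computed beforehand).
    """
    needed = {}
    for site, target in edges:
        if target is None:
            bits = ALL_BITS          # unknown destination: set everything
        else:
            bits = frozenset(reads_on_entry.get(target, ()))
        needed[site] = needed.get(site, frozenset()) | bits
    return needed


reads = {"SUB1": {"Z"}}
edges = [("call1", "SUB1"), ("call2", "SUB1"), ("jmp_table", None)]
print(sorted(bits_to_synthesize(edges, reads)["call2"]))   # ['Z']
```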
3.2 The Impact of the Data Format on Efficiency
Another concern related to improving the execution-time efficiency of emulation is the data format. More precisely, most CISCs support formats other than binary arithmetic. For example, the MC68020 supports Binary Coded Decimal (BCD) arithmetic for many types of operations. It would be advantageous from an efficiency standpoint to support only those types of formats which the architecture supports; in the case of a RISC, this is only binary arithmetic.
Consider the task of adding three BCD numbers located in registers and storing the result back into another register. The translation sequence for this would be:
    ABCD RS1,RD   <=>   UNPK RS1,RS1
                        UNPK RD,RD
                        add  RS1,RD
                        PACK RS1,RS1
                        PACK RD,RD
    ABCD RS2,RD   <=>   UNPK RS2,RS2
                        UNPK RD,RD
                        add  RS2,RD
                        PACK RS2,RS2
                        PACK RD,RD
In this example, the capitalized instructions are MC68020 instructions used in the
translation and the lowercase instructions are MIPS primitives. This example
illustrates the generalized method employed in translation, but it is clearly an
inefficient method. Obviously, the first step in optimizing this code is to remove
the instructions "PACK RD,RD" and "UNPK RD,RD" in the middle of the sequence.
The next level of improvement is to remove all of the PACK and UNPK
operations entirely. Although in a CISC the operands may be resident in main
memory instead of registers, this is not true in LOAD/STORE architectures.
Because of this, the most practical method of improvement is to recognize that
the operand is in BCD format when it is fetched. If it is immediately
translated into a binary number and left in the register file in that format, no
translation has to be performed until it is written back into main memory. This
will improve performance for frequently used data items which are used in
repeated iterations of arithmetic operations.
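The convert-once strategy can be illustrated with a minimal Python sketch, assuming packed BCD with one decimal digit per 4-bit nibble. The helper names are hypothetical, and the sketch ignores operand size and the MC68020 X (extend) bit that ABCD uses to chain carries across bytes.

```python
def bcd_to_bin(bcd: int) -> int:
    """Convert a packed-BCD value (one decimal digit per nibble) to binary."""
    value, scale = 0, 1
    while bcd:
        digit = bcd & 0xF
        if digit > 9:
            raise ValueError(f"invalid BCD digit {digit:#x}")
        value += digit * scale
        scale *= 10
        bcd >>= 4
    return value

def bin_to_bcd(value: int) -> int:
    """Convert a non-negative binary integer back to packed BCD."""
    if value < 0:
        raise ValueError("packed BCD is unsigned")
    bcd, shift = 0, 0
    while value:
        bcd |= (value % 10) << shift
        value //= 10
        shift += 4
    return bcd

# Three BCD operands, converted once when they are fetched from memory ...
a, b, c = 0x12, 0x34, 0x56
total = bcd_to_bin(a) + bcd_to_bin(b) + bcd_to_bin(c)  # plain binary adds
# ... and converted back only when the result is stored.
print(hex(bin_to_bcd(total)))  # 0x102
```

Only the fetch and the final store pay for format conversion; every intermediate add is an ordinary binary add, in contrast to the UNPK/add/PACK sequence needed per ABCD in the direct translation.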
3.3 The Effect of Increasing the Register File Size
One notable inconsistency in the addressing mode emulation can be found in
the register allocation scheme. In Appendix A.1, 24 registers are listed for use
by the translator. All 24 of these registers are required if there is to be
no sacrifice in the number of registers available to the MC68020 compiler.
Since the original SU-MIPS has only 16 general purpose registers, this means one
of two things: either a lot of register/memory swapping must be done for the
least frequently used registers to accommodate the translation between
architectures, or the register file of the SU-MIPS must be enlarged to a
minimum of 24 registers. Increasing the register file size will impact the
instruction format by enlarging each register field by 1 bit. Both of these
alternatives will be discussed in more detail below.
3.3.1 Decreasing the LOAD/STORE Latency
If swapping between the register file and memory is the required alternative,
this will degrade the somewhat optimistic emulation efficiency because of the
additional wait states introduced by spill-over LOAD/STORE operations. In the
silicon environment the addition of eight 32-bit registers would be
inconsequential; in the GaAs environment this would not be the case. This
assertion is based on the design parameters of the original SU-MIPS. The
original register file accounted for approximately 21% of the total number of
transistors and about 8.3% of the VLSI area [PrGrH84]. If 50% more registers
and the control circuitry to support them were added, the transistor count would
be over 8.0K. If 100% more registers were added to support 32, the device
count would exceed 11K transistors. More importantly, with today's GaAs
technology, the data path area might be prohibitive unless one of the register
data buses were excluded.
However, within the next decade or so, GaAs technology is expected to
support 60K transistors. Because of the off-chip communication bottleneck, it is
believed that the additional transistor count should be used to exploit a larger
register file or a register windowing scheme. An example of the benefits of
register windowing can be found in the examination of the UCB-RISC, where
dramatic execution-time decreases were obtained from an eight-window register
scheme [Katev83]. If a translator were designed to minimize the off-chip
memory references by keeping more of its operands in the enlarged register file,
this could overshadow other available alternatives for increasing execution
speed. Examples of alternatives which improve performance include additional
hardware resources such as multiple Arithmetic Logic Units (ALUs) or serial
on-chip multipliers [6].
6. In digital signal processing there is an advantage to these types of resources
because of the nature of signal processing. This may outweigh the advantages
of the additional registers [Tseng84].
3.3.2 The Impact on the Instruction Format
The instruction format would require minor modifications to allow the addition
of 16 registers in the silicon environment [7]. The major difference would be the
elimination of ALU3 instructions [8]. Fortunately, an examination of the
Appendices reveals that the number of ALU3 instructions is relatively small. If
an ALU3 instruction were required, it could be synthesized by moving one of the
source operands into the destination register and then executing an ALU2
instruction. Alternatively, it could simply be included in the instruction set in its
current form with the provision that it cannot be packed with an ALU2
instruction.
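The synthesis of an ALU3 operation from ALU2 primitives can be sketched as follows. This is a minimal Python sketch under stated assumptions: the textual form `op src,dst` meaning `dst := dst op src` mirrors the `add RS1,RD` form used earlier, while the function name, the scratch register `R_tmp`, and the set of commutative operations are hypothetical. One case needs care: when the destination is also the second source, moving the first source into the destination would overwrite it, so a commutative operation is simply reordered and a non-commutative one goes through a scratch register.

```python
# Operations whose operands may be swapped without changing the result.
COMMUTATIVE = {"add", "and", "or", "xor"}

def lower_alu3(op, rs1, rs2, rd, scratch="R_tmp"):
    """Rewrite the three-operand form rd := rs1 op rs2 as ALU2 primitives.

    Each returned 'op src,dst' means dst := dst op src, and 'mov src,dst'
    means dst := src. `scratch` is a hypothetical free register.
    """
    if rd == rs1:
        # Destination already holds the first source: one ALU2 suffices.
        return [f"{op} {rs2},{rd}"]
    if rd == rs2:
        if op in COMMUTATIVE:
            # rs1 op rs2 == rs2 op rs1, and rd already holds rs2.
            return [f"{op} {rs1},{rd}"]
        # Save rs2 before rd (which is rs2) is overwritten by rs1.
        return [f"mov {rs2},{scratch}", f"mov {rs1},{rd}", f"{op} {scratch},{rd}"]
    # General case described in the text: move a source into the destination.
    return [f"mov {rs1},{rd}", f"{op} {rs2},{rd}"]

print(lower_alu3("sub", "R1", "R2", "R3"))  # ['mov R1,R3', 'sub R2,R3']
```

In the common case the cost is one extra MOV per ALU3 operation, which is acceptable given how few ALU3 instructions the Appendices contain.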
A more important consideration than the elimination of the ALU3 instruction
is the inability of the modified instruction format to support packing LOAD
instructions with ALU2 instructions. One proposed method to help reduce the
impact of this modification is the introduction of two new instructions, known as
LD2 and ST2, which could be packed with an ALU2 instruction. Any of the
following methods could be used:
1. An LD2 (ST2) instruction could have two registers, where one register is the
   address of the source (destination) operand and the other register is the
   destination (source). No offset would be allowed for the address register.
2. An LD2 instruction could have two registers or a register and an
   immediate. The two sources would be added to form the effective
   address of the operand, with the result written to the second source.
3. An LD2 (ST2) instruction could have two registers or a register and an
   immediate, with an implicit destination (source) register that is always
   used.
All existing three-operand formats could still be supported if that were desired,
but they could not be packed with an ALU2 instruction. One additional benefit
of the enlarged register field is the expansion of the immediate constant to
any number in the range of -16 to +15. This can be very beneficial for the
emulation of condition codes, where the limitations of a four-bit immediate
field resulted in excessive ALU3 MOV instructions.
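The quoted range is what a 5-bit two's-complement immediate field provides. A minimal Python sketch, assuming two's-complement encoding of the immediate (the helper names are hypothetical):

```python
def signed_range(bits: int) -> tuple:
    """Smallest and largest value of a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def fits_immediate(value: int, bits: int = 5) -> bool:
    """True if `value` can be encoded directly in the immediate field."""
    lo, hi = signed_range(bits)
    return lo <= value <= hi

print(signed_range(5))  # (-16, 15)
```

Under this assumption, a constant that fails the test cannot be used as an immediate operand and has to be placed in a register by an additional instruction first.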
7. In the GaAs environment, packing will not be allowed, so these modifications
to the instruction format will not be required.
8. An ALU3 instruction is so named because it has three operand fields: (1)
source 1, (2) source 2, and (3) destination.
PIPELINE CONSIDERATIONS
4.1 Why Pipelining is an Important Consideration
The pipeline depth can have a dramatic impact on the emulation efficiency
when translating CISC instructions. For example, a delayed branch scheme is
very attractive in silicon because the probability of finding an instruction to
replace the NO-OP directly after the branch has been estimated to be as high
as 90% [Gross83b]. Unfortunately, delayed branch schemes lose some of their
impact in the GaAs environment with a long pipe. When the branch delay is
extended to six machine cycles due to the instruction fetch time, the
probability of replacing all five NO-OPs is less than 5%. For pipe depths
between two and five, the probability of NO-OP replacement approximately
follows a logarithmic curve between these two endpoints. A similar argument
applies to an operand fetch, which has the same memory latency.
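Filling delay slots from the instructions that precede a branch can be sketched as follows. This minimal Python sketch is one simple variant of the technique, not the reorganizer cited above: it assumes delay-slot instructions always execute whether or not the branch is taken, and it only moves a contiguous run of instructions from the end of the basic block, so the moved instructions cross nothing but the branch. It stops at the first instruction that conflicts with the branch. The `Instr` record and the register names are hypothetical. One slot usually needs only one independent instruction; five slots need five consecutive independent ones, which illustrates why long branch delays are so rarely filled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    text: str
    defs: frozenset = frozenset()  # registers written
    uses: frozenset = frozenset()  # registers read

NOP = Instr("nop")

def fill_delay_slots(block, branch, slots):
    """Move up to `slots` instructions from the end of `block` (a branch-free
    basic block) into the delay slots after `branch`; pad with NO-OPs.
    Returns (remaining_block, delay_slot_contents)."""
    moved = []
    i = len(block)
    while len(moved) < slots and i > 0:
        cand = block[i - 1]
        # After the move the branch executes first, so the candidate must not
        # write a register the branch reads, and the branch must not write a
        # register the candidate reads or writes.
        if cand.defs & branch.uses or branch.defs & (cand.uses | cand.defs):
            break
        moved.append(cand)
        i -= 1
    moved.reverse()  # keep original program order inside the slots
    return block[:i], moved + [NOP] * (slots - len(moved))

add = Instr("add R2,R1", frozenset({"R1"}), frozenset({"R1", "R2"}))
sub = Instr("sub R5,R4", frozenset({"R4"}), frozenset({"R4", "R5"}))
beq = Instr("beq R1,R0,L1", uses=frozenset({"R0", "R1"}))
rest, filled = fill_delay_slots([add, sub], beq, 3)
print([x.text for x in rest], [x.text for x in filled])
# ['add R2,R1'] ['sub R5,R4', 'nop', 'nop']
```

Here the `add` writes R1, which the branch tests, so only one of the three slots can be filled and two NO-OPs remain.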
What this implies is that it is very difficult to support elaborate addressing
modes or excessive branching. The only hope of overcoming these problems is
the possibility of an off-chip, on-package cache split between instruction storage
and data storage. If these caches can be made large enough to ensure a reasonable
hit probability, then it is possible to bring the average memory fetch latency
down toward two machine cycles and obtain a closer match to the silicon SU-MIPS
architecture. This will be discussed in more detail in section 6.1.
4.2 Pipelining in the Silicon Environment
The SU-MIPS pipeline scheme and instruction cycle time were optimized to
match the silicon operand fetch time [Gross83b]. This includes the use of a
five-stage pipeline:
(IF) Instruction Fetch - Send out the Program Counter (PC) and
     increment it.
(ID) Instruction Decode - Decode the requested instruction.
(OD) Operand Decode - Compute the effective address and send it
     to memory if a LOAD or a STORE; alternately, use the ALU
     for a register-to-register ALU operation.
(SX) Operand Store - Send out the operand if a STORE operation;
     alternately, use the ALU for a register-to-register ALU
     operation.
(OF) Operand Fetch - Receive the operand if a LOAD and write
     it to a register.
This sequence lends itself well to the silicon environment because the operand
or instruction fetch time is only twice the instruction execution time.
Consequently, the pipeline overlap was designed to fetch the next instruction
every two machine cycles. The result is a good balance between the datapath
time, ALU time, and memory access time. With efficient packing, delayed
branches, and a large register file to reduce off-chip operand references, an
efficient use of the hardware resources is possible. Even when the emulation
requires support of elaborate addressing modes, it is not prohibitive in the
silicon environment because it will only take two clock cycles to fetch a
memory operand.
4.3 Pipelining in the GaAs Environment
The GaAs environment is radically different from the silicon environment for
several reasons. The most troublesome, as mentioned before, is the tremendous
bottleneck of off-package communications. This makes many of the mechanisms
that are advantageous in silicon difficult to implement in GaAs. One of these
is the pipeline mechanism, because in GaAs the instruction execution time is an
order of magnitude faster than the memory fetch, as compared to only twice as
fast in silicon. This changes the pipeline implementation, although it does not
force much of a difference in the particular sequence of events which occurs.
One possible pipeline scheme in the GaAs environment is shown below:
(IF)   Instruction Fetch - Send out the PC, increment it, and
       decode the current instruction.
(REG1) Register 1 Read - Read the first operand from a register.
(REG2) Register 2 Read - Read the second operand from a
       register.
(ALU)  Arithmetic or logic operations on the operands to produce an
       arithmetic result, calculate an effective address, or perform
       a comparison.
(MEM)  Memory Address - Send out an effective memory address.
(WB)   Write Back - Send out the results of an ALU operation for a
       memory or register write, or fetch the operand for a
       memory read.
This pipeline scheme is designed to consist of five stages, where REG1 and
REG2 were separated to emphasize that they are fetched sequentially. The
sequential fetch makes sense for two reasons. First, an attempt is being made
to get an optimum match between the datapath time, memory fetch time, and
the instruction fetch time. Second, two read buses are expensive in VLSI real
estate. This is not a problem in silicon but is a problem in GaAs. Since the
extra bus could not be fully utilized because of the limited memory bandwidth,
it makes more sense to use the VLSI area saved from using only one bus for an








In the SU-MIPS architecture, there is no pipeline interlocking scheme [9].
Instead of hardware controls which prevent pipeline hazards and conflicts, the
Stanford architects decided to use software interlocking [HeJoP83]. The
advantage of this scheme is that the hardware can generally achieve a higher
throughput because an intelligent compiler can frequently put an unrelated
instruction in the job stream which can execute while the hazard is being
avoided. Of course, this is not always the case. One example of this is
discovered when attempting to pack an addressing mode decomposition with an
instruction decomposition.
Because of software interlocks, it would appear that there would be an
advantage in combining each of the eighteen MC68020 addressing modes with
every instruction decomposition to determine the overall execution efficiency.
Upon closer examination, this proves not to be the case. Whenever the next
primitive is dependent on the data being fetched, a situation known as a data
dependency exists, and intrainstruction packing is usually prevented. This implies
that the NO-OPs following the final LOAD instruction in an addressing mode
decomposition cannot be optimized out.
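Software interlocking of a LOAD can be sketched as a pass that pads the primitive stream with NO-OPs whenever a primitive reads a register that a LOAD wrote too recently. This is a minimal Python sketch: the `Prim` record, the register names, and the one-slot load delay are assumptions for illustration, not SU-MIPS specifications. Because the operation primitive of a decomposition normally consumes the value its addressing-mode LOAD fetched, there is nothing independent to place in the slot and a NO-OP is emitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prim:
    text: str
    is_load: bool = False
    defs: frozenset = frozenset()  # registers written
    uses: frozenset = frozenset()  # registers read

NOP = Prim("nop")

def insert_load_nops(prims, load_delay=1):
    """Pad with NO-OPs so that no primitive reads a register written by a
    LOAD that is fewer than `load_delay` slots ahead of it."""
    out = []
    for p in prims:
        # Look back over the last `load_delay` emitted slots for a hazard.
        while load_delay > 0 and any(
                q.is_load and q.defs & p.uses for q in out[-load_delay:]):
            out.append(NOP)
        out.append(p)
    return out

# Hypothetical decomposition of a memory-operand ADD: fetch, then use.
ld = Prim("ld 0(R8),R1", True, frozenset({"R1"}), frozenset({"R8"}))
add = Prim("add R1,R2", False, frozenset({"R2"}), frozenset({"R1", "R2"}))
print([p.text for p in insert_load_nops([ld, add])])
# ['ld 0(R8),R1', 'nop', 'add R1,R2']
```

An unrelated primitive from a neighboring instruction could in principle occupy the slot instead of the NO-OP, which is what interinstruction packing tries to exploit.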
5.2 Interinstruction Packing
Now consider packing on an interinstruction basis. Although it was shown that
data dependencies often prevent packing on an intrainstruction basis, packing
sequential addressing mode primitives with previous instruction primitives can be
done in a large majority of cases [10].
Unfortunately, there is one problem associated with this type of packing. If a
particular addressing mode decomposition has a label associated with it, packing
is not always possible on an interinstruction basis with the instruction primitives
preceding it. The reason is that if primitives from a previous instruction are
packed with the primitives of the next addressing mode, then any routine that
jumps to or calls that instruction will execute code that could adversely affect
the execution of the program. For example, if the last primitive in the
decomposition was a register ADD and it was packed into the next sequential
addressing mode decomposition, which happened to have a label, then the
destination register of the ADD would always be modified whenever a branch was
taken to that label. This cannot always be allowed.
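The label restriction can be expressed as one extra condition in a packing pass. In this minimal Python sketch (the `Prim` record, the `packable` flag, and the greedy pairing policy are assumptions for illustration, not the MIPS reorganizer's actual algorithm), two adjacent primitives are combined into one instruction word only if both are packable, neither writes a register the other reads or writes, and the second one does not carry a label; otherwise a jump to that label would also execute the first primitive.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Prim:
    text: str
    label: Optional[str] = None    # branch/call target name, if any
    packable: bool = True          # may share an instruction word
    defs: frozenset = frozenset()  # registers written
    uses: frozenset = frozenset()  # registers read

def independent(a, b):
    """No RAW, WAR, or WAW conflict between a and b."""
    return not (a.defs & (b.uses | b.defs) or b.defs & a.uses)

def pack(prims):
    """Greedily pair adjacent primitives into single instruction words."""
    words, i = [], 0
    while i < len(prims):
        a = prims[i]
        if i + 1 < len(prims):
            b = prims[i + 1]
            # A labelled second primitive is a jump target: packing it with
            # `a` would make every jump to the label execute `a` as well.
            if a.packable and b.packable and b.label is None and independent(a, b):
                words.append((a.text, b.text))
                i += 2
                continue
        words.append((a.text,))
        i += 1
    return words

add = Prim("add R4,R5", defs=frozenset({"R5"}), uses=frozenset({"R4", "R5"}))
ld = Prim("ld 0(R8),R1", label="L10", defs=frozenset({"R1"}), uses=frozenset({"R8"}))
print(pack([add, ld]))  # [('add R4,R5',), ('ld 0(R8),R1',)]
```

Without the label the same two primitives would share one word, so the number of words saved depends on where labels fall in a particular program.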
9. MIPS was derived from "Microprocessor without Interlocked Pipeline Stages."
10. An interesting consequence of this fact is that the efficiency of emulation
cannot accurately be determined without an actual translation and execution,
since the interinstruction packing probability is program dependent.
OTHER ARCHITECTURAL CONSIDERATIONS
6.1 On-Package Cache Support
The significance of an on-package, off-chip cache for execution-time
performance cannot be overemphasized. Since the cache affects the pipeline
depth, it is extremely important; it can significantly decrease the average pipe
fetch latency. Cache basically comes in two storage types, Instruction (I) cache
and Data (D) cache. Both types have tremendous advantages in the GaAs
environment.
Significant execution-time improvements can be attained from a cache with as
little as 4K bits of storage [HwaBf84]. For example, a hit ratio of over 50% can
be expected from an I cache with 64 32-bit words [SmiGoBB]. Since memories of
this size have already been fabricated in GaAs [HiInM84], this is not
unreasonable to consider. The I cache should be large enough to hold a typical
size program loop; in the MC68020, this was chosen so that 64 instructions can
be kept on-package. Given the 4K limitation, a modest D cache could also be
implemented in GaAs with 64 32-bit words. These can be used to keep many of
the most frequently used instructions or operands of a program only two machine
cycles away.
If a cache is available, then support for a four-level memory hierarchy must
be provided:
1. Register operands available with virtually no fetch latency.
2. Cache operands available within two machine cycles on a cache hit.
3. Memory operands available within five machine cycles.
4. Mass storage that has an access time which is device dependent.
Assuming a reasonable cache hit ratio, many of the memory references should
statistically reside in the first two levels of the memory hierarchy, as shown in
Figure 4. This will enhance the efficiency of emulation because, when attempting
to translate CISC instructions, it is advantageous (from an architectural point of
view) to have RISC hardware with characteristics as close as possible.
One of the possible implementations for a cache would be to use software
interlocks for operands which are fetched from the cache and hardware
interlocks for operands which are fetched on a cache miss. If this scheme is
used, then a GaAs MIPS would be able to use a two-stage pipeline identical to
the SU-MIPS. The conclusion that can be drawn from an implementation of
cache in GaAs is that the closer the operands are to the ALU, the better the
efficiency will be.
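The benefit of keeping operands in the first levels of the hierarchy can be estimated with the usual weighted-average latency formula. This Python sketch uses the two-cycle hit and five-cycle memory figures given above and assumes, for simplicity, that a miss costs exactly the memory latency with no extra cache-lookup penalty; the function name is hypothetical. With the 50% hit ratio quoted for a 64-word I cache, the average fetch latency is 3.5 cycles, and it approaches two cycles only as the hit ratio approaches one.

```python
def average_fetch_latency(hit_ratio, hit_cycles=2, miss_cycles=5):
    """Expected fetch latency in machine cycles for a single cache level."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie in [0, 1]")
    return hit_ratio * hit_cycles + (1.0 - hit_ratio) * miss_cycles

for h in (0.5, 0.9, 0.99):
    print(f"hit ratio {h:.2f}: {average_fetch_latency(h):.2f} cycles")
```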
6.2 Coprocessor Support
One aspect of the instruction set synthesis which was not addressed in
Appendix C is the extensive coprocessor instruction set which the MC68020
supports. The primary reason for this omission is that most RISCs don't have
coprocessor support and it is tedious at best to emulate floating point
instructions. Another reason is that with an instruction cache on-package, it is
very difficult for the coprocessor to monitor the instruction bus to ascertain
which instruction the microprocessor is executing at any given point in time
[11]. Finally, some types of instructions, such as a conditional action based on a
coprocessor result, are almost prohibitive in the GaAs environment because of
the difference between the instruction cycle time and the off-chip signal
propagation time. In this section, we will attempt to address these questions.
6.2.1 Monitoring the Instruction Bus
The problem with trying to monitor the instruction bus is that when an
on-package cache is present, the internal instruction bus is usually hidden from
external peripherals. In general, only the Cache Direct Memory Access (CDMA)
controller has access to both the internal and external buses. Because of this,
the CDMA is the only device capable of granting the coprocessor access to the
currently executing instruction.
One proposal for handling coprocessors is based on the CDMA sending a
requested instruction to both the microprocessor and the external instruction bus
in the event of a cache hit. In the event of a cache miss, the CDMA must
request the instruction from external memory anyway, so it is available to the
coprocessor immediately. One drawback of this proposal is that it prohibits
CDMA prefetching of the next sequential instruction to be executed from main
memory [12], and it increases bus contention. In most applications that
would require a coprocessor, however, these drawbacks are offset by the
advantages of coprocessor support.
6.2.2 Conditional Actions Based On Coprocessor Conditions
For an existing RISC there are two predominant methods for supporting a
conditional action based on a coprocessor condition. One employs a memory
location used as a status register to support a software interlock; the location
is set to TRUE when a memory-mapped coprocessor register becomes valid. The
other method uses a memory location as a data register but is interrupt driven:
it causes an interrupt when the data becomes valid by asserting the interrupt pin.
If the interrupt method is used to indicate that the condition has been tested,
the validity of the test result is known as soon as the interrupt vector is placed
on the instruction bus; this is the most efficient method.
11. This is the method most frequently used in coprocessors today.
12. This applies most directly to CDMAs with a remote PC to indicate the
location of the next instruction and/or a "likely bit" to indicate the probability
of a branch [PaGaH83].
Fortunately, on new RISC designs there are many coprocessor support options
available for implementation with regard to flags being tested and set. In
addition to the methods described above, a separate set of pins could be
included to indicate the validity of the result and, optionally, the result of the
test. The constraining factor is the number of pins which are available due to
packaging considerations.
6.2.3 An Operand Transfer Scheme
The number of internal registers available to the coprocessor is best left
transparent to the RISC unless a particular coprocessor is being designed to be
integrated with the overall architecture or it is decided that only a particular
coprocessor will be supported. Therefore, it makes no difference whether the
coprocessor is a stack machine or a register machine; rather, what is important
is that the operands are readily available to the RISC.
The most frequent solution to the operand transfer question is based on a
memory-mapped register scheme. This scheme requires that the values that are
contained in the registers used by the coprocessor are reflected in the contents
of main memory. Memory should be updated to reflect any changes in the
internal registers in a "write through" fashion similar to a data cache, by stealing
processor memory cycles. If, for example, the instruction on the instruction bus
is a coprocessor instruction, then any of the mechanisms described in the
previous subsection can be used to indicate the validity of the result, as long as
the coprocessor has written its result to a memory location prior to the
microprocessor requesting it. If all memory references are handled through a
memory controller, then the exponent can be stripped or the mantissa normalized
during the operand fetch. This will be discussed in more detail in the next
section.
6.3 Bit and Byte Field Support
As indicated in other sections, because of the limitations in VLSI area which a
GaAs design can exploit, any migration of CISC operations into peripheral chips
has the potential to improve execution efficiency. For formatting preprocessors,
this is only true if the peripheral processor doesn't impact the memory fetch
latency. Even though bit and byte field operations may be infrequent, if they
are expensive enough relative to the overall instruction translation, it could
prove beneficial to add a peripheral chip with formatting capabilities.
For instance, if a data format chip increased the operand fetch latency for
every operand fetched by two machine cycles, then it would not make sense for
this chip to be placed in the memory path if most memory requests consisted of
32-bit word-aligned operands. However, if a chip were designed which acted
much like a DMA controller and only increased the memory latency when a
request for a formatted memory operand or coprocessor operand was made, it
could be a valuable addition to the architecture. Appendix G illustrates the





7.1 Translation From MC68020 Code to MIPS Code
Just as it is assumed that a new hardware technology will soon be available
for commercial use, so too is it assumed that a new software technology will be
available. This assumption is in fact almost necessitated if some amount of
emulation efficiency is to be salvaged. The belief in a new software technology
has a twofold justification. First, it is substantiated by the large development
effort in the areas of intelligent compilers and other types of artificially
intelligent software. Second, it is substantiated by the general interest in RISC
architectures as a whole, an architecture which emphasizes the role of an
intelligent compiler as an integral part of the architecture.
So far, the individual pieces of the translation have been analyzed without
dealing with the actual task at hand. Now, an elaboration of the actual translation
scheme employed in getting from MC68020 machine code to MIPS machine code
is needed. The emulation has been described as consisting of modular, disjoint
tasks which could be integrated into a single program that is capable of the
translation. The goal now is to combine all the pieces together and comment on
their various interactions. This process begins with raw MC68020 machine code
and ends in optimized MIPS machine code.
7.2 Disassembly and Decomposition of the MC68020 Instructions
The first step of translation is to disassemble the MC68020 machine code into
assembly level code so that it can be dealt with in a symbolic manner.
Disassemblers have already been written for this processor, so little elaboration
about this point is required. The thrust of the effort here is in assigning labels
to various parts of the program and in decoding operand lengths.
The next step in emulation consists of decomposing the disassembled
instructions into their MIPS primitives. This will include the addressing mode
decomposition outlined in Appendix A, followed by the operation decomposition
outlined in Appendix C. The MC68020 machine code address will be associated
with the first primitive of the addressing mode decomposition. The condition
code synthesis will then be inserted into the job stream without regard to
whether or not the status register bits are used; this decomposition is outlined in
Appendix B. The MIPS supported constructs will, however, replace their MC68020
counterparts where appropriate.
Very little optimization will be performed at this point in the translation, with
the possible exception of register reassignment for global optimization and any
intrainstruction packing that is possible. If a register file larger than 24
registers were available, then the register reassignment would attempt to assign
variables in such a manner as to reduce the number of LOAD/STORE instructions
required. The instruction boundaries will be preserved until after the condition
codes have been analyzed and removed where possible. This will prevent
optimizations for code that is destined to be removed.
7.3 Condition Code Analysis and Interinstruction Optimization
Condition code analysis and optimization will occur next in the sequence, as
per the specifications outlined in section 3.1. This is one of the lengthiest
optimization steps in the sequence. One thing which was not previously
discussed is the concept of preserving the condition code boundaries to ensure
that the synthesis is not prevented from altering the required bits in the status
register at the proper point during execution.
The next module of translation handles interinstruction packing of MIPS
primitives. This task was outlined in more detail in section 5.2. Care must be
taken to preserve instruction, subroutine, and condition code boundaries where
required. For example, most RISC primitives cannot be packed with a preceding
decomposition's primitives if a subroutine boundary is in question. This is
because it is generally not desirable to execute primitives from a preceding
instruction decomposition every time a branch is taken to the subroutine.
The preservation of the subroutine boundary is frequently not required in the
case of condition code synthesis, however. The condition code synthesis only
modifies registers R19 and R20. Since it does not matter if these registers are
modified throughout the subroutine, it is inconsequential if the condition code
synthesis crosses the routine's boundary, as long as the synthesis is completed
before the affected bits are reset by another instruction or a conditional action
is reached.
7.4 Standard MIPS Optimization Mechanisms
The remainder of the translation mechanism will rely on the standard MIPS
optimizer and assembler, with some minor differences. The key difference is
that the optimizer will only be allowed to operate on register reassignment and
delayed branching NO-OP replacement. Subroutine, procedure, and condition
code boundaries must be preserved, since the optimizer and reorganizer are
oblivious to the translator's overall purpose. The following is the order in which
this optimization will occur:
1. Reorganize the MIPS machine code to avoid pipeline and branch hazards.
2. Pack the MIPS machine code into the most compact form allowed under
   the imposed boundary constraints.
3. Assemble the resulting instruction translation into MIPS machine code,
   which is "hidden" from the end user.
Although certain portions of the standard MIPS optimization scheme could be
combined with other modules of the translation, it does not make sense to
rewrite any more of the existing software than is absolutely necessary.
Following that philosophy, an attempt has been made to reuse as much existing
software as possible. It is generally easier to modify a piece of software than
it is to rewrite a new piece and be faced with a new set of problems
altogether. In addition, following the conclusions of this paper, it is assumed
that a software translation will be avoided when possible, so starting from
scratch is not justified because of the development effort that it would require.
CONCLUSION
This paper analyzed the architectural considerations concerning the emulation
of a CISC with a RISC. It was shown that it is possible to emulate the
constructs supported by a CISC except for some software constructs which are
used to control the hardware. The MC68020 and the SU-MIPS were selected as
the comparison architectures to help exemplify the important concepts of a
translation; their selection was based on the current interest in both of these
microprocessors. It was further shown that a poor emulation efficiency is
expected from a direct translation of the MC68020 instructions into MIPS
primitives because of the nature of the MC68020 instructions.
In addition, several comments were made on the translation from CISC to
RISC code with regard to the design of GaAs-fabricated architectures. The most
important differences between GaAs and silicon architectures are due to device
physics; specifically, the speed limitations in GaAs are due to off-chip
communication bottlenecks. GaAs enjoys an order of magnitude decrease relative to
silicon in its machine cycle time, but there is virtually no speed advantage when
going off-package. As a result, many of the accepted constructs of CISC
architectures are not transferable into GaAs if a RISC architecture is involved.
When CISC concepts are implemented on a RISC design, they violate the
philosophy that gives a RISC its speed advantage.
It is therefore no surprise that virtually no execution-time improvement can
be realized in a translation, because the CISC's compiled code uses constructs
and resources unavailable or inconsistent with the RISC's. The resulting
inefficiency may be tolerable because of the radiation-hardness advantages of a
GaAs RISC, but it was concluded that the most efficient method of emulation could
be achieved by first debugging HLL source code on a CISC using all of the
development hardware available and then recompiling the source code with a
compiler that is aware of the resources available to the RISC. This allows the
exploitation of the advantages of both types of architectures, namely, the
flexibility of a CISC and the speed of a RISC. Furthermore, using a RISC to
execute compiled code instead of translated code allows emulation efficiencies
much greater than unity.
ACKNOWLEDGEMENTS
The authors are thankful to their colleagues from the Purdue University GaAs
project for their useful comments and suggestions and to RCA's Advanced












Clark, D.W., Strecker, W.D., "Comments on 'The Case for the Reduced Instruction
Set Computer,' by Patterson and Ditzel," ACM SIGARCH Computer Architecture News,
Vol. 8, No. 6, October 1980, pp. 34-38.
Eden, R.C., Welch, B.M., "Integrated Circuits: the Case for Gallium Arsenide,"
IEEE Spectrum, Vol. 9, No. 12, December 1983, pp. 30-37.
Gill, J., Gross, T., Hennessy, J., Jouppi, N., Przybylski, S., Rowen, C.,
"Summary of MIPS Instructions," Technical Report 83-237, Stanford University,
November 1983.
Grappel, R.D., Hemenway, J.E., "A Tale of Four Microprocessors: Benchmarks
Quantify Performance," Electronic Design News, April 1, 1981, p. 179.
Gross, T.R., "Code Optimization Techniques for Pipelined Architectures," Digest
of Papers, Spring COMPCON 83.
Gross, T.R., "Code Optimizations of Pipelined Constraints," Technical Report
83-255, Stanford University, December 1983.
Hennessy, J., Jouppi, N., Przybylski, S., Rowen, C., Gross, T., "Performance
Issues in VLSI Processor Design," Proceedings of the International Conference on
Computer Design, Rye, New York, October 1983.
Hennessy, J.L., Gross, T.R., "Code Generation and Reorganization in the Presence
of Pipeline Constraints," Proceedings of the Ninth POPL Conference, January 1982,
pp. 120-127.
Hennessy, J.L., Gross, T.R., "Postpass Code Optimization of Pipeline
Constraints," ACM Transactions on Programming Languages and Systems, Vol. 5,
No. 3, July 1983, pp. 422-448.
[HiInM84] Hirayama, M., Ino, M., Matsuoka, Y., Suzuki, M., "A GaAs 4Kb SRAM With
Direct Coupled FET Logic," Proceedings of the 1984 IEEE International Solid-State
Circuits Conference, San Francisco, CA, February 1984, pp. 46-47.
[Honey85] Lee, T.C., EE694 Graduate Engineering Seminar, Purdue University,
Spring 1985.
[HwaBr84] Hwang, K., Briggs, F.A., "Computer Architecture and Parallel
Processing," McGraw-Hill Book Company, 1984.
[IkToM84] Ikawa, Y., Toyoda, N., Mochisuki, M., Terada, T., Kanazawa, K.,
Hirose, M., Mizoguchi, T., Hojo, A., "A 1K GaAs Gate Array," Proceedings of the
1984 IEEE International Solid-State Circuits Conference, San Francisco,
California, February 1984, pp. 40-41.
[Katev83] Katevenis, M.G.H., "Reduced Instruction Set Computer Architectures
for VLSI," PhD Thesis, University of California at Berkeley, October 1983.
[MilFu84] Milutinovic, V., Fura, D., "An Introduction into the GaAs
Microprocessor Architecture," to appear.
[Motor84] Motorola Inc., "MC68020 32-Bit Microprocessor User's Manual,"
Prentice Hall, 1984.
[PaGaH83] Patterson, D.A., Garrison, P., Hill, M., Lioupis, D., Nyberg, C.,
Sippel, T., Van Dyke, K., "Architecture of a VLSI Instruction Cache for a RISC,"
Proceedings of the Tenth Annual International Symposium on Computer
Architecture, Stockholm, Sweden, June 1983, pp. 108-116.
[PrGrH84] Przybylski, S.A., Gross, T.R., Hennessy, J.L., Jouppi, N.P.,
Rowen, C., "Organization and VLSI Implementation of MIPS," Technical Report
84-259, Stanford University, April 1984.
[SmiGo85] Smith, J.E., Goodman, J.R., "Instruction Cache Replacement Policies
and Organizations," IEEE Transactions on Computers, Vol. C-34, No. 3,
March 1985.
[Tseng84] Tseng, P.S., "A Statistical Study of the Software for Dedicated
Special Purpose Applications," Purdue University Internal Report,
December 1984.
[Wulf81] Wulf, W.A., "Compilers and Computer Architecture," IEEE Computer,
Vol. 14, No. 7, July 1981, pp. 41-48.
Table I. PERFORMANCE COMPARISON OF GaAs AND SILICON [MilFu84].
                                         GaAs               Silicon
COMPLEXITY
  Transistor Count/Chip                  20-30K             300-400K
  Chip Area                              yield and power    yield and power
                                         dependent          dependent
SPEED
  Gate Delay                             50-150 ps          1-3 ns
  On-chip Memory Access                  0.5-2.0 ns         10-20 ns
  Off-chip/On-package Access             4-10 ns            40-80 ns
  Off-chip/Off-package Access            20-80 ns           100-200 ns
IC DESIGN
  Transistors/Gate                       1 + fanin          1 + fanin
  Transistors/Memory Cell
    Static                               6                  6
    Dynamic                              1                  1
  Fanin (typical transistor)             3-4                5
  Fanout (typical transistor)            3-4                5
  Fanout Gate Delay (ea. gate)           25-40%             25-40%
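As a rough arithmetic check on Table I, the sketch below takes the midpoint of each delay range (an assumption; the table gives only ranges) and computes how many times faster GaAs is than silicon at each level of the memory hierarchy. The dictionary and function names are illustrative and do not come from the paper.

```python
# Delay ranges transcribed from Table I, in picoseconds: (low, high).
TABLE_I_PS = {
    "gate delay": {"GaAs": (50, 150), "Si": (1_000, 3_000)},
    "on-chip memory access": {"GaAs": (500, 2_000), "Si": (10_000, 20_000)},
    "off-chip/on-package access": {"GaAs": (4_000, 10_000), "Si": (40_000, 80_000)},
    "off-chip/off-package access": {"GaAs": (20_000, 80_000), "Si": (100_000, 200_000)},
}


def midpoint(delay_range):
    """Midpoint of a (low, high) range; used as a single representative delay."""
    low, high = delay_range
    return (low + high) / 2


def gaas_advantage(row):
    """Ratio of silicon delay to GaAs delay, using range midpoints."""
    entry = TABLE_I_PS[row]
    return midpoint(entry["Si"]) / midpoint(entry["GaAs"])


if __name__ == "__main__":
    for row in TABLE_I_PS:
        print(f"{row:28s} {gaas_advantage(row):5.1f}x")
```

With these midpoints the advantage falls from 20x for a single gate to 12x for on-chip memory, about 8.6x on-package, and 3x off-package, which is the off-chip bottleneck described in the conclusions.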














































Figure 3. COMPARISON OF EXECUTION EFFICIENCIES AND CODE SIZE FOR 











Figure 4. CACHE HIT RATIOS VERSUS CACHE SIZE
[Plot: hit ratio (%), 0 to 100, for the data cache and the instruction cache versus cache size in 32-bit words, 0 to 384.]




MIPS       FUNCTION                     MC68020   BITS
R0-R7      Data registers               D0-D7     32
R8-R14     Address registers            A0-A6     32
R15        User stack pointer           A7        32
PC         Program counter              PC        32
R16        Status & condition code      SR        16
R17        Interrupt stack pointer      A7'       32
R18        Master stack pointer         A7"       32
[13]       Vector base register         VBR       32
[14]       Alternate function           SFC       3
[14]       Alternate function           DFC       3
[15]       Cache control register       CACR      32
[15]       Cache address register       CAAR      32
R19-R23    Scratchpad registers         [16]      32
13. The default address for exception handling in MIPS is always zero; therefore, VBR is
not required.
14. These are not supported by MIPS.
15. Since MIPS has built-in (non-optional) cache support, these registers are not required.
16. MIPS will require additional scratchpad registers for address calculations that the
MC68020 does not require.
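A translator needs the mapping above as a lookup table. The following is a minimal sketch in Python, assuming the translator handles register names as strings; `REGISTER_MAP`, `SCRATCH_REGISTERS`, `map_register`, and the choice to raise an error for VBR, SFC, DFC, CACR, and CAAR are hypothetical, chosen to reflect notes 13 to 16.

```python
# Hypothetical lookup table built from the register-mapping table above.
# None marks MC68020 registers with no MIPS counterpart (notes 13-15).
REGISTER_MAP = {f"D{i}": f"R{i}" for i in range(8)}            # D0-D7 -> R0-R7
REGISTER_MAP.update({f"A{i}": f"R{i + 8}" for i in range(7)})  # A0-A6 -> R8-R14
REGISTER_MAP.update({
    "A7": "R15",     # user stack pointer
    "PC": "PC",      # program counter
    "SR": "R16",     # status and condition codes
    "A7'": "R17",    # interrupt stack pointer
    'A7"': "R18",    # master stack pointer
    "VBR": None,
    "SFC": None,
    "DFC": None,
    "CACR": None,
    "CAAR": None,
})
SCRATCH_REGISTERS = [f"R{i}" for i in range(19, 24)]            # R19-R23 (note 16)


def map_register(name):
    """Return the MIPS register that emulates the given MC68020 register."""
    if name not in REGISTER_MAP:
        raise ValueError(f"unknown MC68020 register {name!r}")
    target = REGISTER_MAP[name]
    if target is None:
        raise ValueError(f"{name} has no MIPS counterpart")
    return target
```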
A.2 Addressing Modes Supported by the MC68020
1. Data Register Direct [17]
2. Address Register Direct
3. Address Register Indirect
4. Address Register Indirect with Postincrement
5. Address Register Indirect with Predecrement
6. Address Register Indirect with Displacement
7. Address Register Indirect with Index (8-Bit Displacement)
8. Address Register Indirect with Index (Base Displacement)
9. Memory Indirect Post-Indexed
10. Memory Indirect Pre-Indexed
11. PC Indirect with Displacement [18]
12. PC Indirect with Index (8-Bit Displacement)
13. PC Indirect with Index (Base Displacement)
14. PC Memory Indirect Post-Indexed
15. PC Memory Indirect Pre-Indexed
16. Absolute Short Address
17. Absolute Long Address
18. Immediate
A.3 Register Definitions used in Translation
Alo     Low-order bits of a physical address (24 bits)
An      Any address register
Ahi     High-order bits of a physical address (8 bits)
Bd      Base displacement (sign-extended to 32 bits)
DISP    Any 8-bit displacement
Od      Outer displacement (sign-extended to 32 bits)
SCALE   Constant equal to 0, 2, 4, or 8
SIZE    Constant equal to 1, 2, or 4
Xn      Any register used as an index register (sign-extended to 32 bits)
17. Modes 1, 2, 16, and 18 are directly supported by MIPS.
18. PC-relative modes will need some modifications because the translation will change the
relative location of some of the operands.
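The Ahi/Alo split is what lets mode 17 (absolute long) be rebuilt from two smaller pieces. The sketch below shows the split and the recombination in Python. It assumes only what the definitions state (8 high bits, 24 low bits); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split_absolute(address):
    """Split a 32-bit absolute address into (Ahi, Alo): high 8 bits, low 24 bits."""
    address &= MASK32
    return address >> 24, address & 0xFFFFFF


def join_absolute(ahi, alo):
    """Recombine the halves as the mode 17 sequence does:
    shift Ahi left by 24 (Sll #24,R21), then add Alo (Add R22,R21)."""
    return ((ahi << 24) + alo) & MASK32
```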
A.4 Addressing Mode Translation
MODE 1: Directly supported by MIPS.



































































MODE 11:  Ld Bd-2,R22
          Mov PC,R21 ;; Add R22,R21
          Ld 0[R21],R23 ;; No-op
          No-op ;;
MODE 12:  Mov DISP-1,R22 ;; Mov PC,R21
          Add R22,R21 ;; Mov Xn,R22
          Sll SCALE,R22 ;; Add R22,R21
          Ld 0[R21],R23 ;; No-op
          No-op ;;
MODE 13:  Ld Bd-2,R22
          Mov PC,R21 ;; Add R22,R21
          Mov Xn,R22 ;; Sll SCALE,R22
          Ld [R21+R22],R23 ;; Add R22,R21
          No-op ;;
MODE 14:  Ld Bd-2,R22
          Mov PC,R21 ;; Mov Xn,R23
          Ld [R21+R22],R21 ;; Sll SCALE,R23
          Ld Od,R22
          Add R22,R21 ;; Add R23,R21
          Ld 0[R21],R23 ;; No-op
          No-op ;;
MODE 15:  Ld Bd-2,R22
          Mov PC,R21 ;; Add R22,R21
          Mov Xn,R22 ;; Sll SCALE,R22
          Ld [R21+R22],R21 ;; No-op
          Ld Od,R22
          Ld [R21+R22],R23 ;; Add R22,R21
          No-op
MODE 16:  Directly supported by MIPS.
MODE 17:  Ld Alo,R22
          Mov Ahi,R21 ;; Sll #24,R21
          Ld [R21+R22],R23 ;; Add R22,R21
          No-op ;;
MODE 18:  Directly supported by MIPS.
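The indexed PC-relative modes all reduce to a three-way sum of a base, a sign-extended displacement, and a shifted index. The sketch below computes that sum for mode 12 in Python. It treats SCALE as a shift count (0 to 3), as the Sll in the translation implies, and it deliberately ignores the "DISP-1" correction for the relocated PC (note 18). The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def ea_pc_indexed(pc, disp8, xn, scale):
    """Effective address for PC indirect with index (8-bit displacement):
    PC + sext(DISP) + (Xn << SCALE), the sum MODE 12 forms before its Ld."""
    index = sign_extend(xn, 32) << scale
    return (pc + sign_extend(disp8, 8) + index) & MASK32
```

For example, `ea_pc_indexed(0x1000, 0xFE, 3, 2)` adds a displacement of -2 and an index of 12 to the PC.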
Appendix B
CONDITION CODE SYNTHESIS
B.1 Condition Code Bit Map
The status register consists of a 32-bit dedicated register (R16) in which only a
few bits are defined by the condition code emulator. The defined bit map is as follows:
Bit 0: Carry (C)
Bit 1: Overflow (V)
Bit 2: Zero (Z)
Bit 3: Negative (N)
Bit 4: Extend (X)
Bit 5: 0
Bit 6: 0
Bit 7: 0
The upper 24 bits of the condition code register will be reserved for the system
monitor. The lower byte of the status register is comparable to the user byte in the
MC68020.
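The bit map can be captured in a few lines. This is a minimal Python sketch, assuming R16 is treated as an unsigned 32-bit value whose upper 24 bits belong to the monitor; `FLAG_BITS`, `pack_r16`, and `unpack_r16` are hypothetical names.

```python
# Bit positions of the emulated condition codes in the low byte of R16 (B.1).
FLAG_BITS = {"C": 0, "V": 1, "Z": 2, "N": 3, "X": 4}   # bits 5-7 stay 0


def pack_r16(flags, monitor_bits=0):
    """Build an R16 value from a collection of flag names and the 24 monitor bits."""
    low_byte = 0
    for name in flags:
        low_byte |= 1 << FLAG_BITS[name]
    return ((monitor_bits & 0xFFFFFF) << 8) | low_byte


def unpack_r16(r16):
    """Return the five condition-code flags held in the low byte of R16."""
    return {name: bool((r16 >> bit) & 1) for name, bit in FLAG_BITS.items()}
```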
B.2 Basic Conditions Tested
There are 29 instruction groupings listed in Appendix A-3 of the Motorola user's manual
that affect the condition codes in some way. Since the overflow and carry bits are
handled with the macros listed below, the overflow exception handler must be disabled; it
will be used only for the multiply and divide instructions, where overflow will be tested
explicitly with the conditional trap instruction. Notice that the condition code synthesis is
not completely optimized, since only the macros required will be inserted; the optimization
will be performed after the specific macros required are inserted. The MIPS instruction
format was obtained from [GiGrH83].
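The multiply overflow test mentioned above can be read from the MULS.L translation in Appendix C: after the multiply steps, Hi is compared with the sign extension of Lo (Sra #31, Xor, then Tne). The sketch below models that test in Python, with Python's unbounded integers standing in for the Hi:Lo register pair. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def muls_l(a, b):
    """Return (Lo, overflow) for a signed 32 x 32 multiply.

    Overflow is flagged when Hi is not just the sign extension of Lo,
    which is what Sra #31 / Xor / Tne #0 checks in the translation.
    """
    product = to_signed32(a) * to_signed32(b)
    lo = product & MASK32
    hi = (product >> 32) & MASK32          # >> is arithmetic for negative ints
    sign_fill = MASK32 if lo & 0x80000000 else 0
    return lo, (hi ^ sign_fill) != 0
```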
B.3 Macros Preceding a Conditional Instruction
CC => carry clear
      Or #-2,R16,R19 ;; Not R19




F => always false
      Mov #0,R19 ;;
GE => greater or equal









      And #1,R19 ;; No-op
      And #4,R16,R20 ;; Sra #2,R20







LE => less or equal
      Mov #10,R19
      Seq #2,R19
      ;; And R16,R19
      Seq #8,R19
      And #4,R16,R20 ;; And #1,R19
      Sra #2,R20 ;; Or R20,R19
LS => lower or same
      And #5,R16,R19 ;;
LT => less than




MI => minus
      Mov #8,R19 ;; And R16,R19
NE => not equal
      Or #-5,R16,R19 ;; Not R19
PL => plus
      Mov #-9,R19 ;; Or R16,R19
      Not R19 ;;
T => always true
      Mov #-1,R19 ;;
VC => overflow clear
      Or #-3,R16,R19 ;; Not R19
VS => overflow set
      And #2,R16,R19 ;;
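Each macro leaves a value in R19 that is nonzero exactly when the condition holds, so a following conditional branch only has to test R19 against zero. The sketch below restates several of the macros as Python bit operations on a 32-bit R16 laid out as in B.1; the function names are hypothetical, and the mnemonics in the comments are the macro bodies listed above.

```python
MASK32 = 0xFFFFFFFF


def u32(value):
    """Reduce an immediate such as #-2 to its 32-bit register pattern."""
    return value & MASK32


# Each function returns the value the macro leaves in R19 (nonzero = true).
def cc(r16):  # carry clear:    Or #-2,R16,R19 ;; Not R19
    return ~(r16 | u32(-2)) & MASK32


def ne(r16):  # not equal:      Or #-5,R16,R19 ;; Not R19
    return ~(r16 | u32(-5)) & MASK32


def vc(r16):  # overflow clear: Or #-3,R16,R19 ;; Not R19
    return ~(r16 | u32(-3)) & MASK32


def mi(r16):  # minus:          Mov #8,R19 ;; And R16,R19
    return 8 & r16


def pl(r16):  # plus:           Mov #-9,R19 ;; Or R16,R19 ;; Not R19
    return ~(u32(-9) | r16) & MASK32


def ls(r16):  # lower or same:  And #5,R16,R19  (C or Z)
    return r16 & 5


def vs(r16):  # overflow set:   And #2,R16,R19
    return r16 & 2
```

The OR-with-a-negative-mask-then-NOT idiom isolates the complement of a single flag in two operations, which is why the "clear" conditions cost no more than the "set" ones.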
B.4 Condition Code Translation
Definitions:
Dn      Destination register of the instruction
Lb      Lower bound for a comparison
Offset  The offset of the bit field within the effective operand
Rn      Result of the operation
Sn      Source register of the instruction
Ub      Upper bound for a comparison
Width   Width of the bit field in the effective address
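Most entries in the table that follows reuse the same zero and negative sequences on the result Rn. The sketch below models them in Python under two loudly stated assumptions that are inferred from the masks in the listings, not taken from [GiGrH83]: the set-conditional instructions (Seq, Sgt) write all ones when the condition holds and zero otherwise, and "Sgt #0,R19" means "R19 is less than zero". It also assumes ALU immediates are 4 bits wide and sign-extended, so that "And #11,R16" acts as a mask of -5 and clears only Z; read as a plain 32-bit 11, it would also clear X and the monitor bits. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def imm4(value):
    """Assumed ALU immediate: 4 bits, sign-extended (so #11 acts as -5)."""
    value &= 0xF
    return (value - 16 if value & 0x8 else value) & MASK32


def set_cond(condition):
    """Assumed Seq/Sgt result: all ones when true, zero when false."""
    return MASK32 if condition else 0


def to_signed32(value):
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def set_zero(r16, rn):
    # Mov Rn,R19 ;; Seq #0,R19 ;; And #4,R19 ;; And #11,R16 ;; Or R19,R16
    r19 = set_cond((rn & MASK32) == 0) & 4
    return (r16 & imm4(11)) | r19


def set_negative(r16, rn):
    # Mov #-9,R19 ;; And R19,R16        (clear N)
    r16 &= -9 & MASK32
    # Mov #8,R20 ;; Mov Rn,R19 ;; Sgt #0,R19 ;; And R20,R19 ;; Or R19,R16
    r19 = set_cond(to_signed32(rn) < 0) & 8
    return r16 | r19
```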
OPERATION SYNTHESIS
ABCD Mov #199,R19
Sgt Rn,R19
Mov #17,R20




And R16,R19
ADD Mov #128,R20
ADDI Xor Sn,Rn,R19







Or Sn,Dn,R19




And #4,R19
Or R19,R16
BIT AFFECTED







;; No-op /* set zero









/* set overflow














/* set carry
;; No-op /* set zero
;; And
;;
#11,R16




And R20,R19
And #1,R16,R19
Mov #-17,R20
Or R19,R16
ADDX Mov #128,R20





And Sn,Dn,R19
Rol #1,R19
Or R19,R16
Or Sn,Dn,R19
Xor #-1,Rn,R20
Rol #1,R20

























Rn,R19
/* set negative






























/* set carry
;; No-op /* set zero
;; Or R19,R16
;; And
;; No-op





#4,R19
R20,R16











;; Or R19,R16



























Or R20,R16
Mov #128,R20
And Sn,Rn,R19
Rol #1,R19
Or R19,R16
Or Sn,Rn,R19
Xor #-1,Dn,R20
Rol #1,R20
Mov Rn,R19
Seq #0,R19
And #4,R19





And #1,R16,R19
Mov #-17,R20






/* set negative
;; Or R19,R16






;; No-op /* set carry


























R20,R19
R19,R20
R20,R16
/* set carry








/* set negative
;; Or R19,R16
;; Rol
;; And
#4,R19
R20,R16
/* set extend
;;
OPERATION  SYNTHESIS                           BIT AFFECTED
SUBX   Mov #128,R20 ;; Sll #24,R20             /* set overflow
       Xor Sn,Dn,R19 ;; And R19,R20
       Xor Dn,Rn,R19 ;; And R19,R20
       Rol #2,R20 ;; And #13,R16
       Or R20,R16 ;;
       Mov #128,R20 ;; Sll #24,R20             /* set carry
       And Sn,Rn,R19 ;; And R20,R19
       Rol #1,R19 ;; And #14,R16
       Or R19,R16 ;; No-op
       Or Sn,Rn,R19 ;; And R20,R19
       Xor #-1,Dn,R20 ;; And R19,R20
       Rol #1,R20 ;; Or R20,R16
       Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And R16,R19 ;; Or R19,R16
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
       And #1,R16,R19 ;; Rol #4,R19            /* set extend
       Mov #-17,R20 ;; And R20,R16
       Or R19,R16 ;;
CAS    Mov #128,R20 ;; Sll #24,R20             /* set overflow
CAS2   Xor Sn,Dn,R19 ;; And R19,R20
CMP    Xor Dn,Rn,R19 ;; And R19,R20
CMPI   Rol #2,R20 ;; And #13,R16
CMPM   Or R20,R16 ;; No-op
CMPA
       Mov #128,R20 ;; Sll #24,R20             /* set carry
       And Sn,Rn,R19 ;; And R20,R19
       Rol #1,R19,R19 ;; And #14,R16
       Or R19,R16 ;; No-op
       Or Sn,Rn,R19 ;; And R20,R19
       Xor #-1,Dn,R20 ;; And R19,R20
       Rol #1,R20 ;; Or R20,R16
       Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And #4,R19 ;; Or R19,R16
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
OPERATION  SYNTHESIS                           BIT AFFECTED
DIVS   Overflow bit is set during the instruction.
DIVU
MULS   And #14,R16 ;;                          /* clear carry
MULU
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
       Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R16,R19 ;;
SBCD   Mov Rn,R19 ;; No-op                     /* set carry
NBCD   Sgt #0,R19                                 and extend
       Mov #17,R20 ;; And R20,R19
       Not R20 ;; And R20,R16
       Or R19,R16 ;;
       Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
NEG    Mov #128,R20 ;; Sll #24,R20             /* set carry
       Mov Dn,R19 ;; Or Rn,R19
       And R20,R19 ;; Rol #1,R19
       And #14,R16 ;; Or R19,R16
       Mov #128,R20 ;; Sll #24,R20             /* set overflow
       Mov Dn,R19 ;; And Rn,R19
       And R20,R19 ;; Rol #2,R19
       And #13,R16 ;; Or R19,R16
       Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
       And #1,R16,R19 ;; Rol #4,R19            /* set extend




OPERATION  SYNTHESIS
NEGX   Mov #128,R20 ;; Sll #24,R20
       Mov Dn,R19 ;; Or Rn,R19
       And R20,R19 ;; Rol #1,R19
       And #14,R16 ;; Or R19,R16
       Mov #128,R20 ;; Sll #24,R20
       Mov Dn,R19 ;; And Rn,R19
       And R20,R19 ;; Rol #2,R19
       And #13,R16 ;; Or R19,R16
       Mov Rn,R19 ;; No-op
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
BTST   Mov #1,R19 ;; Mov Dn,Lo
BCHG   Rol Lo,R19 ;; And R23,R19
BSET   Seq #0,R20
BCLR   And #4,R20 ;; And #11,R16
       Or R20,R16 ;;
BFTST  And #12,R16 ;;
BFCHG
BFSET  Mov Width,Lo ;; Mov #0,R19
BFCLR  Mov #-1,R20 ;; Rlc R20,R19
BFEXTS Sll Offset,R19 ;; And Dn,R19
BFEXTU Srl Off+Wid-1,R19 ;; Sll #3,R19
BFFFO  Mov #-9,R20 ;; And R20,R16
       Or R19,R16 ;;
       Mov Width,Lo ;; Mov #0,R19
       Mov #-1,R20 ;; Rlc R20,R19
       Sll Offset,R19 ;; And Dn,R19
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
BFINS  And #12,R16 ;;
       Mov #128,R20 ;; Sll #24,R20
       Mov Sn,R19 ;; And R20,R19
       Mov #-9,R20 ;; Rol #4,R19
       And R20,R16 ;; Or R19,R16
       Mov Sn,R19 ;; No-op
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
BIT AFFECTED
/* set carry
/* set overflow
/* set zero
/* set zero














       Carry/overflow set during instruction execution
       Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
       And #1,R16,R19 ;; Rol #4,R19            /* set extend
       Mov #-17,R20 ;; And R20,R16
       Or R19,R16 ;;
       And #12,R16 ;;                          /* clear carry
                                                  and overflow
       Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
       Carry set during instruction execution
       And #13,R16 ;;                          /* clear overflow
       Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
       And #1,R16,R19 ;; Rol #4,R19            /* set extend
       Mov #-17,R20 ;; And R20,R16











       And #12,R16 ;;                          /* clear carry
                                                  and overflow
       Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
       Mov #16,R20 ;; And R16,R20              /* set carry
       Srl #4,R20 ;; And #14,R16
       Or R20,R16 ;;
       And #13,R16 ;;                          /* clear overflow
       Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
       Carry set during instruction execution
       And #13,R16 ;;                          /* clear overflow
       Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16





       ;; No-op                                /* set zero
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
OPERATION  SYNTHESIS                           BIT AFFECTED
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
ASR    Carry set during instruction execution
LSR
ROXR   And #13,R16 ;;                          /* clear overflow
       Mov Rn,R19 ;; No-op                     /* set zero





       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
       And #1,R16,R19 ;; Rol #4,R19            /* set extend
       Mov #-17,R20 ;; And R20,R16
       Or R19,R16 ;;
ASR    And #12,R16 ;;                          /* clear carry
LSR                                               and overflow
(r=0)  Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
ROXR   Mov #16,R20 ;; And R16,R20              /* set carry
(r=0)  Srl #4,R20 ;; And #14,R16
       Or R20,R16 ;;
       And #13,R16                             /* clear overflow
       Mov Rn,R19 ;; No-op                     /* set zero
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16
       Mov #-9,R19 ;; And R19,R16              /* set negative
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
OPERATION SYNTHESIS






       Mov Rn,R19 ;; No-op
       Seq #0,R19
       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;





       ;; Mov Rn,R19







       And #4,R19 ;; And #11,R16
       Or R19,R16 ;;
       Mov #-9,R19 ;; And R19,R16
       Mov #8,R20 ;; Mov Rn,R19
       Sgt #0,R19
       And R20,R19 ;; Or R19,R16
BIT AFFECTED
/* clear overflow
/* set zero
/* set negative
/* clear carry
   and overflow
/* set zero
/* set negative
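The carry and overflow macros in B.4 share one pattern: build the sign mask 0x80000000 (Mov #128 ;; Sll #24), combine source, destination, and result bitwise, keep bit 31, and rotate it to bit 0 (C) or bit 1 (V). The sketch below expresses the ADD formulas (from the ADD/ADDX entries) and the SUB/CMP formulas (from the SUBX and CMP entries) in Python and is meant to be checked against ordinary integer arithmetic. Function names are hypothetical, and the sketch returns the flag bits rather than merging them into R16.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000   # Mov #128,R20 ;; Sll #24,R20


def rol32(value, count):
    """Rotate a 32-bit value left; used to move bit 31 down to bit 0 or 1."""
    value &= MASK32
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK32


def add_flags(s, d):
    """Return (R, C, V) for R = D + S, with C in bit 0 and V in bit 1."""
    s, d = s & MASK32, d & MASK32
    r = (s + d) & MASK32
    carry = ((s & d) | (~r & (s | d))) & SIGN
    overflow = ((s ^ r) & (d ^ r)) & SIGN
    return r, rol32(carry, 1), rol32(overflow, 2)


def sub_flags(s, d):
    """Return (R, C, V) for R = D - S; C is the borrow."""
    s, d = s & MASK32, d & MASK32
    r = (d - s) & MASK32
    borrow = ((s & r) | (~d & (s | r))) & SIGN
    overflow = ((s ^ d) & (d ^ r)) & SIGN
    return r, rol32(borrow, 1), rol32(overflow, 2)
```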
Appendix C
INSTRUCTION SET SYNTHESIS
C.1 Register Definitions used in Translation
Any address register
Any valid condition
Any data register
Compare operand register
Update operand register
The effective address register of the operand (R21)
The prefetched effective operand register (R23)
Hi register
Any valid label
Any constant in the range of 0 to 2^24
Lo register
A bit field offset (0-31 MOD 32)
Any data register pair
Any constant between 0 and 32 (0 to 8 in most cases)
A bit field width (0-31 MOD 32)
NOTES:
1. In register direct mode, <eop> is replaced by an arbitrary destination register and the
store instruction is omitted.
2. R19, R20 and R22 are used as scratch registers but must not affect condition code
evaluation. If two effective operands are used, the first operand fetched will be
placed in R20 and the second operand will reside in R23.
3. WIDTH + OFFSET must not exceed 32 bits. A translation error message should be
generated if it does.
4. Not all forms of the instructions will be translated. For example:
ADD <ea>,Dn <=> Add R23,Dn ;;






will be translated using the second form of the instruction because the first form is
less complex. The primary concern in the translation is to determine the worst case































PACK <eop2>,<eop2>




[MC68020 instruction]











       Add SI,<eop> ;; No-op
       St <eop>,0[<ea>] ;;
       Mov #1,R19 ;; Sll #4,R19
       And R16,R19 ;; Sra #4,R19
       Add <eop1>,<eop2> ;; Add R19,<eop2>
       St <eop2>,0[<ea2>] ;;
       And Dn,<eop> ;; No-op








       Mov SI,Lo ;; Mov #0,R19





       Or R20,R16 ;; Sll #32-SI,R19
       Sra #32-SI,R19 ;; Not R19
       Sne #0,R19
       And #2,R19 ;; Or R19,R16
       St <eop>,0[<ea>] ;;













       Mov SI-#1,Lo ;; Sra Lo,<eop>
       And #1,<eop>,R19 ;; And #14,R16
       Sra #1,<eop> ;; Or R19,R16
       St <eop>,0[<ea>] ;;
       Bne #0,R19,Label
       No-op ;; No-op
       Mov #1,R19 ;; Sll OFFSET,R19
       Xor R19,<eop> ;; No-op
       St <eop>,0[<ea>] ;;
       Mov #1,R19 ;; Sll OFFSET,R19
       Not R19 ;; And R19,<eop>
       St <eop>,0[<ea>] ;;
       Mov WIDTH,Lo ;; Mov #0,R19
       Mov #-1,R20 ;; Rlc R20,R19
       Sll OFFSET,R19 ;; Xor R19,<eop>
       St <eop>,0[<ea>] ;;
       Mov WIDTH,Lo ;; Mov #0,R19
       Mov #-1,R20 ;; Rlc R20,R19
       Sll OFFSET,R19 ;; Not R19
       And R19,<eop> ;; No-op
       St <eop>,0[<ea>] ;;
       Mov WIDTH,Lo ;; Mov #0,R19
       Mov #-1,R20 ;; Rlc R20,R19
       Sll OFFSET,R19 ;; And <eop>,R19
       Sll #32-WIDTH-OFFSET,R19 ;; Sra #32-WIDTH,R19
       Mov R19,Dn ;;
       Mov WIDTH,Lo ;; Mov #0,R19
       Mov #-1,R20 ;; Rlc R20,R19
       Sll OFFSET,R19 ;; And <eop>,R19
       Srl OFFSET,R19 ;; Mov R19,Dn
       Mov WIDTH+OFFSET,R19 ;; Mov #1,Dn
       Mov OFFSET,R22 ;; Sll WIDTH+OFFSET-#1,Dn




       No-op ;; No-op
       Mov WIDTH+OFFSET,R19 ;; No-op
       Add #1,R19,Dn ;;
       Mov WIDTH,Lo ;; Mov #0,R19
       Mov #-1,R20 ;; Rlc R20,R19
       And Dn,R19,R20 ;; Sll OFFSET,R20
       Sll OFFSET,R19 ;; And
       Or R20,<eop> ;; No-op
       St <eop>,0[<ea>] ;;
BFSET  Mov WIDTH,Lo ;; Mov
       Mov #-1,R20 ;; Rlc
       Sll OFFSET,R19 ;; Or
       St <eop>,0[<ea>] ;;
BFTST  Affects condition codes only.
BRA    Bra Label
       No-op ;; No-op
BSET   Mov #1,R19 ;; Sll
       Or R19,<eop> ;; No-op
       St <eop>,0[<ea>] ;;




BTST   Affects condition codes only.
CAS    Beq <eop>,Dc,Equal
       Mov <eop>,Dc ;; No-op
       Bra Out
       No-op ;; No-op
Equal  St Du,0[<ea>] ;; No-op
Out    No-op ;;
CAS2   Bne Rn1,Dc1,Nequal1
       No-op ;; No-op
       Bra Out1
       Mov Du1,Rn1 ;; No-op
Nequal1 Mov Rn1,Dc ;; No-op
Out1   Bne Rn2,Dc2,Nequal2
       No-op ;; No-op
       Bra Out2
       Mov Du2,Rn2 ;; No-op
















CLR    St #0,0[<ea>] ;;
CMP    Affects condition codes only.
CMPA   Affects condition codes only.
CMPI   Affects condition codes only.
CMPM   Affects condition codes only.
CMP2   Affects condition codes only.
DBCC   Bne #0,R19,Out
       Sub #1,Dn ;; No-op
       Bne #-1,Dn,Label
Out    No-op ;; No-op
DIVSL.L Teq #0,<eop>,Zerodivide






Loop Dstep R20 ;;Dstep
Dstep R20 ;;Dstep







Sra #31,Dr,R19 ;; And
Sub R19,R20,Dr ;;Sub





Tne #0,R19,Overflow
Mov #0,Hi ;;Mov










R19,R20















#31,R19
R19,Dq
Hi,R19




























































































MOVE     St <eop1>,0[<ea2>] ;;
MOVEA    St An,0[<ea>] ;;
MOVECCR  St R16,0[<ea>] ;;
MOVESR   St R16,0[<ea>] ;;









       St R2,0[<ea>] ;Sub #1,<ea>
       St R3,0[<ea>] ;Sub #1,<ea>
       St R4,0[<ea>] ;Sub #1,<ea>
       St R5,0[<ea>] ;Sub #1,<ea>
       St R6,0[<ea>] ;Sub #1,<ea>
       St R7,0[<ea>] ;Sub #1,<ea>
       St R8,0[<ea>] ;Sub #1,<ea>
       St R9,0[<ea>] ;Sub #1,<ea>
       St R10,0[<ea>] ;Sub #1,<ea>
       St R11,0[<ea>] ;Sub #1,<ea>
       St R12,0[<ea>] ;Sub #1,<ea>
       St R13,0[<ea>] ;Sub #1,<ea>
       St R14,0[<ea>] ;Sub #1,<ea>
       St R15,0[<ea>] ;No-op
NOTE: The translator will remove the store instruction for any register not
selected to be saved. This only involves checking the register list.
MOVEP  Xe #3,Dn,R19 ;; Sll #8,R19
       Xe #2,Dn,R20 ;; Sll #24,R20




       Xe #0,Dn,R19 ;; Sll #24,R19
       Or R20,R19 ;; No-op
       St R19,1[<ea>] ;;
MOVEQ  Mov LI,Dn ;;
MULS.L Mov #0,Hi ;Mov Dn,Lo
       Msetup <eop> ;Mstep <eop>
       Mstep <eop> ;Mstep <eop>
       Mstep <eop> ;Mstep <eop>
       Mstep <eop> ;Mstep <eop>
       Mstep <eop> ;Mstep <eop>
       Mstep <eop> ;Mstep <eop>
       Mstep <eop> ;Mstep <eop>
       Mov Lo,Dn ;Mov Hi,R19
       Sra #31,Dn ;Xor Dn,R19
       Mov Lo,Dn ;No-op
       Tne #0,R19,Overflow ;No-op
MULU.L Mov #0,Hi ;Mov Dn,Lo
       Msetup <eop> ;Mstep <eop>
       Mstep <eop> ;Mstep <eop>
       Mstep <eop> ;Mstep <eop>
       Mstep <eop> ;Mstep <eop>
       Mstep <eop> ;Mstep <eop>
       Mstep <eop> ;Mstep <eop>
       Mstep <eop> ;Mstep <eop>
       Umend <eop> ;Mov Lo,Dn
NBCD   UNPK <eop>,<eop> [MC68020 instruction]
       Not <eop> ;; No-op
       PACK <eop>,0[<ea>] [MC68020 instruction]
NEG    Subr #0,<eop> ;; No-op
       St <eop>,0[<ea>] ;;
NEGX   Mov #16,R19 ;; Subr #0,<eop>
       And R16,R19 ;; Srl #4,R19
       Sub R19,<eop> ;; No-op
       St <eop>,0[<ea>] ;;
NOP    No-op ;; No-op
NOT    Not <eop> ;; No-op
       St <eop>,0[<ea>] ;;
OR     Or Dn,<eop> ;; No-op
       St <eop>,0[<ea>] ;;
ORI    Ld LI,R19
       Or R19,<eop> ;; No-op
       St <eop>,0[<ea>] ;;
PACK   Ld LI,R19
       Xe #1,<eop1>,R20 ;; Add R20,R19
       Xe #2,<eop1>,R20 ;; Sll #8,R20
       Add R20,R19 ;; No-op
       Mov #15,R20 ;; And R20,R19
       Sll #8,R20 ;; And R20,R19
       Xe #1,R19,R20 ;; Sll #4,R20
       Xe #0,R19 ;; Add R20,R19
       St R19,0[<ea2>] ;; No-op
PEA    St <ea>,-1[R15] ;; Sub #1,R15
RESET  Privileged instruction - not available in user mode.
ROL    Mov #1,R19 ;; Rol SI,<eop>
       And <eop>,R19 ;; And #14,R16
       St <eop>,0[<ea>] ;; Or R19,R16
ROR    Mov #128,R19 ;; Sll #24,R19
       Rol #32-SI,<eop> ;; And <eop>,R19
       Srl #31,R19 ;; And #14,R16
       St <eop>,0[<ea>] ;; Or R19,R16
ROXL   Mov #1,R19 ;; Sll #31-SI,R19
       And <eop>,R19 ;; Not R19,R20
       And R20,<eop> ;; Srl #27-SI,R19




       Sll #27-SI,R20 ;; Or R20,<eop>
       Mov #-18,R20 ;; And R20,R16
       Sne #0,R19
       And R20,R19 ;; Rol SI,<eop>
       St <eop>,0[<ea>] ;; Or R19,R16
ROXR   Mov #1,R19 ;; Sll SI-#1,R19
       And <eop>,R19 ;; Not R19,R20
       And R20,<eop> ;; Srl SI-#4,R19
       Mov #16,R20 ;; And R16,R20
       Sll SI-#4,R20 ;; Or R20,<eop>
       Mov #-18,R20 ;; And R20,R16
       Sne #0,R19
       And R20,R19 ;; Rol #32-SI,<eop>
       St <eop>,0[<ea>] ;; Or R19,R16
RTD    Ld 0[R15],R19 ;; Add #1,R15
       Ld LI,R20
       Mov R19,PC ;; Add R20,R15
RTE    Privileged instruction - not available in user mode.
RTR    Ld 0[R15],R16 ;; Add #1,R15
       Ld 0[R15],R19 ;; Add #1,R15
       No-op ;; Mov R19,PC
RTS    Ld 0[R15],R19 ;; Add #1,R15
       No-op ;; Mov R19,PC
SBCD   UNPK <eop1>,R20 [MC68020 instruction]
       UNPK <eop2>,<eop2> [MC68020 instruction]
       Mov #1,R19 ;; Sll #4,R19
       And R16,R19 ;; Sra #4,R19
       Sub R20,<eop2> ;; Sub R19,R20
       PACK R20,<ea2> [MC68020 instruction]
SCC    Sne #0,R19,<ea>
STOP   Privileged instruction - not available in user mode.
SUB    Sub Dn,<eop> ;; No-op
       St <eop>,0[<ea>] ;;
SUBA   Sub <eop>,An ;;
SUBI   Ld LI,R19
       Subr <eop>,R19 ;; No-op
       St R19,0[<ea>] ;;
SUBQ   Sub SI,<eop> ;; No-op











       Mov #1,R19 ;; Sll #4,R19
       And R16,R19 ;; Sra #4,R19
       Sub <eop1>,<eop2> ;; Sub R19,<eop2>
       St <eop2>,0[<ea2>] ;;
       Xe #1,Dn,R19 ;; Sll #24,R19
       Xe #0,Dn,R20 ;; Sll #16,R20
       Or R20,R19 ;; Srl #16,Dn
       Or R19,Dn ;;
       Mov #1,R19 ;; Sll #7,R19
       Or R19,<eop> ;; No-op






NOTE: This trap is always taken. 11 bits are available to pass the vector.
       Tne #0,R19,Trap
       Tne #0,R19,Overflow








       Xe #0,<eop>,R19 ;; Mov #15,R22
       Sll #4,R19,R20 ;; And R22,R19
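The bit-field translations (BFEXTU, BFEXTS, BFCHG, and the others) all start by building a mask of WIDTH one-bits and shifting it to OFFSET. The sketch below shows the two extract forms in Python. It assumes the Mov WIDTH,Lo / Mov #-1,R20 / Rlc R20,R19 sequence yields WIDTH low-order one-bits, and it measures OFFSET from bit 0 as the Sll/Srl in the listing imply; the MC68020 itself numbers bit-field offsets from the most significant bit, so a real translator would presumably convert between the two. Function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def low_ones(width):
    """WIDTH one-bits at the bottom of a word (assumed result of the Rlc sequence)."""
    return MASK32 if width == 32 else (1 << width) - 1


def check_field(offset, width):
    """Enforce note 3 of C.1: WIDTH + OFFSET must not exceed 32 bits."""
    if not 1 <= width <= 32 or offset < 0 or offset + width > 32:
        raise ValueError("WIDTH + OFFSET must not exceed 32 bits")


def bfextu(word, offset, width):
    """Unsigned extract: Sll OFFSET,R19 ;; And <eop>,R19 ;; Srl OFFSET,R19."""
    check_field(offset, width)
    mask = low_ones(width) << offset
    return (word & mask) >> offset


def bfexts(word, offset, width):
    """Signed extract; the listing gets the same effect with
    Sll #32-WIDTH-OFFSET followed by Sra #32-WIDTH."""
    value = bfextu(word, offset, width)
    return value - (1 << width) if value & (1 << (width - 1)) else value
```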








D.1 The Purpose of this Appendix
This appendix was included for several reasons. First, the translation of MC68020
assembly language code is illustrated to aid the reader in understanding the nature of a
translation. Second, it was incorporated to help numerically evaluate the efficiency of
a typical application program for both the assembly version and the translated versions on
these two microprocessors. Although the compiled versions are compared in Figures 1 and
2 for a program written in Pascal to sort integer words, the actual efficiency is
partly determined by the compiler's efficiency. These routines were written directly in
assembly code and are therefore a better reflection of the machine's performance than of
the compiler writer's ability.
The benchmark chosen to test the two architectures is the Carnegie Mellon benchmark
known as Quicksort. The resulting MC68020 code is an adapted version of MC68000/10
code written by Motorola that first appeared in [GraJe81]. The SU-MIPS assembly code
appears here for the first time, both in code written directly in assembly and in code derived
from a translation of the MC68020 code. The translated version was then optimized
using the translation specifications outlined in the text, with global register optimization
used to overcome the limitations on the register file's size.
The code was evaluated for both program size and the resulting execution time for a
4 MHz SU-MIPS and a 10 MHz MC68020. The test data for this benchmark consists of 102
(N=100) records, each 16 bytes long. Parameter M is set to nine. The records are as
follows:
Record 0:    00 00 00 00 00 00 00
Record 1:    FF 00 00 00 00 00 00
Record 2:    FE 00 00 00 00 00 00


















Record 101:  FF FF FF FF FF FF FF
Notice that only the key values (bytes 3 to 9 in each record) are significant. All data
values are hexadecimal bytes.
D.2 Pseudocode used for the Quicksort Benchmark
procedure QUICKSORT(N,M,REC,STACK)
  integer L,U,I,J
  integer array STACK[0:2*F(N)-1]
  character string V
  L := 1; U := N
  do forever
    I := L; J := U+1; V := REC[L]
    do forever
      do I := I+1 until REC[I] >= V end-do
      do J := J-1 until REC[J] <= V end-do
      if J > I
        then swap REC[I] with REC[J]




    swap REC[L] with REC[J]
    if both subfile sizes (J-L and U-J) <= M
      then
        if stack is empty
          then goto end-outer
          else pop L and U from stack
        end-if
      else
        if smaller subfile size (J-L or U-J) <= M
          then set L and U to lower and upper limit of larger subfile
          else
            push lower and upper limits of larger subfile onto stack





  do for I from N-1 to 1 in steps of -1
    if REC[I] > REC[I+1] then
      V := REC[I]; J := I+1
      do forever
        REC[J-1] := REC[J]; J := J+1
        if REC[J] >= V then goto end-last end-if
      end-do
      end-last:
      REC[J-1] := V
    end-if
  end-do
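The pseudocode can be rendered in Python as below. This is a sketch under stated assumptions: keys are plain integers rather than 7-byte strings, REC is a Python list whose element 0 and element N+1 hold sentinel keys (smallest and largest, as Records 0 and 101 do in the test data), and N is at least 1. A few lines of the pseudocode are missing from this copy (the exit when the scans cross, and which subfile is sorted after a push). They are filled in here with the usual form of this quicksort, in which the larger subfile is pushed and the smaller one is partitioned next; that part is an assumption rather than the benchmark's text. The function name is hypothetical.

```python
def quicksort_benchmark(rec, n, m):
    """Sort rec[1..n] in place: quicksort down to subfiles of at most m
    records, then one insertion-sort pass. rec[0] and rec[n + 1] are sentinels."""
    stack = []
    lo, hi = 1, n
    while True:
        # Partition rec[lo..hi] around v = rec[lo].
        i, j, v = lo, hi + 1, rec[lo]
        while True:
            i += 1
            while rec[i] < v:          # do I := I+1 until REC[I] >= V
                i += 1
            j -= 1
            while rec[j] > v:          # do J := J-1 until REC[J] <= V
                j -= 1
            if j <= i:                 # scans crossed (assumed exit)
                break
            rec[i], rec[j] = rec[j], rec[i]
        rec[lo], rec[j] = rec[j], rec[lo]
        left, right = j - lo, hi - j   # subfile sizes J-L and U-J
        if left <= m and right <= m:
            if not stack:
                break                  # goto end-outer
            lo, hi = stack.pop()
        elif min(left, right) <= m:    # continue with the larger subfile
            lo, hi = (lo, j - 1) if left > right else (j + 1, hi)
        elif left > right:             # push larger, sort smaller (assumed)
            stack.append((lo, j - 1))
            lo = j + 1
        else:
            stack.append((j + 1, hi))
            hi = j - 1
    # Insertion-sort phase; rec[n + 1] stops the inner scan.
    for i in range(n - 1, 0, -1):
        if rec[i] > rec[i + 1]:
            v, j = rec[i], i + 1
            while True:
                rec[j - 1] = rec[j]
                j += 1
                if rec[j] >= v:
                    break
            rec[j - 1] = v
```

With N = 100 and M = 9 as in the test data, most of the final ordering is done by the insertion pass, which is cheap because every record is already within its small subfile.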









* Attributes: 4 Gigabyte Address Range *





* Input: D0 - "N" Record Count *
*        D1 - "M" Threshold for Insertion Sort *
*        A0 - "REC" Address of the Sort Array *
*        A7 - "STACK" Stack Address *
*




* All Registers are Transparent over this Routine *
*









LENGTH   EQU      16                  sort entry record length
KEY      EQU      3                   offset to key within record




* quicksort subroutine entry
*
*************************************************************************
*
QUICK    MOVEM.L  D0-D7/A0-A6,-(SP)   save all registers
         MOVE.L   D0,D2               copy number of records over
         LSL.L    #4,D0               calc. ptr. to last record
         LEA      -LENGTH(A0,D0.L),A1 A1 <- ptr. to last record (U)
         LSL.L    #4,D1               find total size of M records
         MOVE.L   D1,A6               keep value in A6 for later
         MOVEM.L  D0/D2/A1,-(SP)      save dummy, count, top of stack
         CLR.L    -(SP)               mark sort stack empty




*************************************************************************
* Register use: A0 -> first record of the subfile
*               A1 -> last record of the subfile
*               A2 & A3 -> key pointers
*               A4 & A5 -> work pointers
*               A6 -> length of the "M" records
*               SP -> recursive call arguments
*
SORT     LEA      KEY(A0),A2          A2 -> KEY(I) = REC(L)
         LEA      LENGTH+KEY(A1),A3   A3 -> KEY(J) = REC(U+1)
LOOP1    LEA      KEY(A0),A4          A4 -> V for current record
         LEA      LENGTH(A2),A2       I <- I+1
         MOVE.L   A2,A5               A5 = temp for I
         MOVE.L   #KEYLEN-1,D0        D0 = loop counter
CMP1     CMPM.B   (A5)+,(A4)+         compare V-REC(I)
         DBNE     D0,CMP1             loop while equal
         BHI      LOOP1               if REC(I) < V continue compare
LOOP2    LEA      KEY(A0),A4          A4 -> KEY(V) of current record
         LEA      -LENGTH(A3),A3      J <- J-1
         MOVE.L   A3,A5               A5 = temp for J
         MOVE.L   #KEYLEN-1,D0        loop counter
CMP2     CMPM.B   (A4)+,(A5)+         compare REC(J)-V
         DBNE     D0,CMP2             loop while equal
         BHI      LOOP2               if REC(J) > V continue compare
         CMP.L    A3,A2               I >= J
         BCC      END1ST              branch if I >= J
         MOVEM.L  -KEY(A2),D0-D3      swap
         MOVEM.L  -KEY(A3),D4-D7      REC(J)
         MOVEM.L  D0-D3,-KEY(A3)      with
         MOVEM.L  D4-D7,-KEY(A2)      REC(I)
         BRA      LOOP1               continue
*
* new subfile found, now determine the next stage
*************************************************************************
*
END1ST   SUB.L    #KEY,A3             sub. key - get beg. of record
         MOVEM.L  (A0),D0-D3          swap
         MOVEM.L  (A3),D4-D7          REC(L)
         MOVEM.L  D0-D3,(A3)          with
         MOVEM.L  D4-D7,(A0)          REC(J)
         MOVE.L   A1,D1               D1 <- U
         MOVE.L   A3,D2               D2 <- J











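         SUB.L    D2,D1                 D1 <- U-J
         SUB.L    A0,D2                 D2 <- J-L
         CMP.L    A6,D2                 compare (J-L) <= MSIZE
         BHI      NEWLU                 branch if no
         CMP.L    A6,D1                 compare (U-J) <= MSIZE
         BHI      NEWL                  branch if no
         MOVEM.L  (SP)+,A0/A1           pop next L and U from stack
         MOVE.L   A0,D0                 test if stack is empty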
continue if sort is not em pty
*
*************************************************************************
* decide subfile direction
*
*************************************************************************
*
NEWLU    CMP.L    A6,D1                 (U-J) <= MSIZE? (U-J) smaller?
         BLE      NEWU                  branch if so
         CMP.L    D1,D2                 determine smaller subfile
         BCS      STACK                 branch if (J-L) is smaller
         MOVE.L   A1,-(SP)              stack U
         MOVE.L   A3,-(SP)              stack J
*                                       (U-J) subfile smaller, set
*                                       L & U to larger subfile limits
NEWU     LEA      -LENGTH(A3),A1        U <- J-1, L stays the same
         BRA      SORT                  continue the sort
*                                       (J-L) subfile smaller, set
*                                       L & U to larger subfile limits
STACK    MOVEM.L  A0/A3,-(SP)           push L & J onto sort stack
NEWL     LEA      LENGTH(A3),A0         L <- J+1, U stays the same
         BRA      SORT                  continue the sort
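The END1ST, NEWLU, NEWU, NEWL and STACK code decides which subfile to work on next. A Python sketch of that decision follows. It assumes, as the comments state, that subfiles of M or fewer records are left for the final insertion pass and that an empty sort stack ends the partitioning phase. When both parts exceed M, this sketch stacks the larger part and continues with the smaller one. That is the rule in Knuth's Algorithm Q, because it keeps the stack depth logarithmic in N; it is not necessarily the choice the listing makes. `next_subfile` is a hypothetical name.

```python
def next_subfile(l: int, j: int, u: int, m: int, stack: list):
    """Choose the next subfile after [l, u] was partitioned at index j.

    Parts with m or fewer records are left for the insertion pass.
    Returns the next (l, u) to partition, or None when nothing is left.
    """
    left, right = j - l, u - j            # (J-L) and (U-J), in records
    if left > m and right > m:
        # Both parts are large: stack the larger, keep working on the smaller.
        if left > right:
            stack.append((l, j - 1))
            return (j + 1, u)
        stack.append((j + 1, u))
        return (l, j - 1)
    if left > m:                          # only the left part needs sorting
        return (l, j - 1)
    if right > m:                         # only the right part needs sorting
        return (j + 1, u)
    return stack.pop() if stack else None # both small: pop the next subfile
```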
*************************************************************************
* fall into insertion sort as all subfiles below
* or equal M records - insertion sort phase
*************************************************************************
*
* Register use: D0 -> loop counter
*               D1 -> counter and swap register
*               D2 / D4 -> swap registers
*               D5 / D7 -> "V" save registers
*               A0 -> REC(I)
*               A1 -> REC(J)
*               A2 / A3 -> work registers
*               A4 -> REC(J-1)
*               A5 -> "V" save registers
*               A6 -> frame pointer
*
* Note: stack space is reserved for "V" key compare record copies
         MOVEM.L  (SP)+,D0/A0           reload rec. count & top record
         LINK     A6,#-LENGTH           allocate "V" key copy area
         SUB      #2,D0                 D0 ranges from N-2 through 0



































LOOP     LEA      -LENGTH(A0),A0        I <- I-1
         LEA      KEY(A0),A2            A2 -> KEY(I)
         LEA      LENGTH+KEY(A0),A3     A3 -> KEY(I+1)
         MOVE.L   #KEYLEN-1,D1          loop counter for compare
CMPII1   CMPM.B   (A3)+,(A2)+           compare KEY(I) & KEY(I+1)
         DBNE     D1,CMPII1             loop while equal
         BLS      ENDIF                 branch if KEY(I) <= KEY(I+1)
         MOVEM.L  (A0),D5-D7/A5         V <- REC(I)
         MOVEM.L  D5-D7,(SP)            add on stack for key compare
         LEA      LENGTH(A0),A1         A1 -> REC(J) = REC(I+1)
         MOVE.L   A0,A4                 prime A4 -> REC(J-1)
LOOPIN   MOVEM.L  (A1),D1-D4            temp <- REC(J)
         MOVEM.L  D1-D4,(A4)            REC(J-1) <- temp
         MOVE.L   A1,A4                 A4 -> REC(J-1)
         LEA      LENGTH(A1),A1         J = J+1
         LEA      KEY(SP),A2            A2 -> KEY(V)
         LEA      KEY(A1),A3            A3 -> KEY(J)
         MOVE.L   #KEYLEN-1,D1          loop counter in D1
CMPVJ    CMPM.B   (A3)+,(A2)+           compare KEY(V) & KEY(J)
         DBNE     D1,CMPVJ              loop while equal
         BHI      LOOPIN                if KEY(V) > KEY(J) keep looping
         MOVEM.L  D5-D7/A5,(A4)         REC(J-1) <- V
ENDIF    DBRA     D0,LOOP               continue linear insert
         UNLK     A6                    free and restore stack
         MOVEM.L  (SP)+,D0-D7/A0-A6     restore registers
         RTS                            return to caller
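The insertion phase runs I from the top of the array down, with D0 counting from N-2 to 0. Whenever REC(I) is out of order, it copies V aside and slides the following records down one slot until V fits. Below is a Python sketch of that pass on a list of keys. The explicit end-of-array bound is our own assumption, and `insertion_pass` is a hypothetical name.

```python
def insertion_pass(a: list) -> list:
    """Straight insertion from the top of the list down; sorts a in place.

    When position i is visited, a[i+1:] is already in order.
    """
    n = len(a)
    for i in range(n - 2, -1, -1):        # like D0 running from N-2 down to 0
        if a[i] <= a[i + 1]:              # BLS ENDIF: already in order
            continue
        v = a[i]                          # V <- REC(I)
        j = i + 1
        while j < n and v > a[j]:         # BHI LOOPIN: shift while V > REC(J)
            a[j - 1] = a[j]               # REC(J-1) <- REC(J)
            j += 1
        a[j - 1] = v                      # REC(J-1) <- V
    return a
```

By the time this pass runs, partitioning has already placed every record inside its final subfile of at most M records. As a result, each record moves only a short distance, which is why the threshold M pays off.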









* QUICKSORT - Assembly Version *
*
* Attributes: 67 Megabytes Address Range *





* Input: R0 -> "N" Record Count *
*        R1 - "M" Threshold for Insertion Sort *
*        R2 - "REC" Address of the Sort Array *
*        R3 -> Return Address of the Calling Program *
*
* Register: R4  - "REC + L" Lower Limit of the Subfile *
* Use:      R5  - "REC + U" Upper Limit of the Subfile *
*           R6  - "REC+I" Index Pointer *
*           R7  - "REC+J" Index Pointer *
*           R8  - Scratch Register for Compares, etc. *
*           R9  - Scratch Register for Compares, etc. *
*           R10 - Scratch Register for Compares, etc. *
*           R11 - LSW "V" *
*           R12 - middle word of "V" *
*           R13 - MSW "V" *








* Output: The Sort Data Array is Sorted *
*
* Lines: 131 *
* Bytes: 524 *
*
*************************************************************************














         St   R2,-3[R3]    ;;Srl #4,R1
         St   R4,-4[R3]    ;;Mov R2,R4
         St   R5,-5[R3]    ;;Add #4,R4
         St   R6,-6[R3]    ;;Sub #7,R3
         St   R7,0[R3]     ;;Mov R2,R5
         St   R8,-1[R3]    ;;Add R0,R5
         St   R9,-2[R3]    ;;Mov R0,R0
         St   R10,-3[R3]   ;;Mov R0,R0
         St   R11,-4[R3]   ;;Mov R0,R0
         St   R12,-5[R3]   ;;Mov R0,R0
         St   R13,-6[R3]   ;;Sub #7,R3
         St   R14,0[R3]    ;;Mov R0,R0
         St   R15,-1[R3]   ;;Mov R0,R0





* get V=REC[L], set I=L, set J=U+4
*
SORT Ld 0[R4],R13 ;;Mov R4,R6
Ld 2[R4],R11 ;;Xc #0,R13
         Ld   1[R4],R12    ;;Srl #16,R11
         Add  #4,R5,R7     ;;Sll #16,R11
* set I=I+4
*
* find REC[I] >= V

















         R8,R11,NEXTI
* find REC[J] <= V
NEXTJ Ld -4[R7],R10 ;;Sub #4,R7










         R8,R11,NEXTJ
                           ;;Sll #16,R8
* if J > I then swap REC[I] with REC[J] and goto SORT, else branch out
         Ld   0[R7],R10    ;;Mov R0,R0
         Ble  R7,R6,END1ST
         Ld   0[R6],R15    ;;Mov R0,R0
         Ld   1[R6],R14    ;;Mov R0,R0
         St   R15,0[R7]    ;;Mov R0,R0
         St   R14,1[R7]    ;;Mov R0,R0
         St   R10,0[R6]    ;;Mov R0,R0
         St   R9,1[R6]     ;;Mov R0,R0
         Ld   2[R6],R15    ;;Mov R0,R0
         Ld   3[R6],R14    ;;Mov R0,R0
         Ld   2[R7],R10    ;;Mov R0,R0
         Ld   3[R7],R9     ;;Mov R0,R0
         St   R15,2[R7]    ;;Mov R0,R0
         St   R14,3[R7]    ;;Mov R0,R0
         St   R10,2[R6]    ;;Mov R0,R0
         St   R9,3[R6]     ;;Mov R0,R0
         Bra  SORT
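Unlike the MC68020 code, the hand-coded RISC version appears to hold V in three registers (R13, R12, R11 for words 0, 1, 2) and to compare whole words rather than loop over bytes. The `Xc #0,R13` seems to keep only the low byte of word 0 (record byte 3, the first key byte). The `Srl #16` / `Sll #16` pairs seem to clear the low half of word 2, which holds bytes 10 and 11, outside the key. The Python sketch below shows why such a word-wide compare can match a byte-serial unsigned compare. It assumes big-endian 32-bit words (68000 byte order) and the KEY = 3, KEYLEN = 7, LENGTH = 16 layout. The masks are our own reading of the listing, and `packed_key` and `word_cmp` are hypothetical names.

```python
import struct

# Assumed layout: 16-byte records, 7-byte key at byte offset 3 (bytes 3..9).


def record_words(rec: bytes) -> tuple:
    """Split a 16-byte record into four big-endian 32-bit words (68000 byte order)."""
    return struct.unpack(">4I", rec)


def packed_key(rec: bytes) -> int:
    """Gather key bytes 3..9 from words 0..2 into one unsigned 56-bit integer."""
    w0, w1, w2 = record_words(rec)[:3]
    first = w0 & 0xFF   # byte 3 is the low byte of word 0
    last = w2 >> 16     # bytes 8-9 are the high half of word 2; bytes 10-11 drop out
    return (first << 48) | (w1 << 16) | last


def word_cmp(a: bytes, b: bytes) -> int:
    """Three-way unsigned key compare using word-sized pieces instead of bytes."""
    ka, kb = packed_key(a), packed_key(b)
    return (ka > kb) - (ka < kb)
```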
*************************************************************************
*
* new subfile found, now determine the next stage
*
*************************************************************************
*
* this routine means I>J so first swap V with REC[J],
* then branch to NEWLU if [(J-L) or (U-J)] > M
END1ST   Ld   0[R4],R13    ;;Mov R0,R0
         Ld   2[R4],R11    ;;Mov R0,R0
         Ld   3[R4],R15    ;;Mov R0,R0
         Ld   0[R7],R10    ;;Mov R0,R0
         Ld   2[R7],R8     ;;Mov R0,R0
         Ld   3[R7],R14    ;;Mov R0,R0
         St   R8,2[R4]     ;;Mov R0,R0




         St                ;;Mov R0,R0
         St   R12,1[R7]    ;;Mov R7,R8
St R13,0[R7] ;;Sub R4,R8
St R14,3[R4] ;;Mov R5,R9









* means that [(J-L) and (U-J)] <= M
         Ld   1[R3],R10    ;;Mov R0,R0
         Beq  #0,R9,OUTER
         Ld   2[R3],R11    ;;Mov R0,R0
         Mov  R10,R4       ;;Mov R11,R5
Bra SORT




* decide subfile direction
*************************************************************************
*
* entry here means that (J-L) > M
         R9,R1,NEWU
R7,-1[R3] 
R8,R9,STACK 
















* STACK - entry here means that (J-L) > M and (J-L) < (U-J)











* fall into insertion sort as all subfiles below
* or equal M records - insertion sort phase
OUTER    Add  R0,R2,R6     ;;Add #4,R7
LOOP     Ld   -4[R6],R13   ;;Sub #4,R6
         Ld   4[R6],R10    ;;Xc  #0,R13
         Ld   1[R6],R12    ;;Xc  #0,R10
         Bge  R10,R13,LOOP
         Ld   5[R6],R9     ;;Mov R0,R0
         Bge  R9,R12,LOOP
         Ld   2[R6],R11    ;;Mov R0,R0
         Ld   6[R6],R8     ;;Srl #16,R11
         Sll  #16,R11      ;;Srl #16,R8




* now set J=J+4, set REC[J-4] = REC[J], set J=J+4
* then check to see if REC[J] >= V; branch if so
NEXT     Ld   4[R7],R10    ;;Add #4,R7
         Ld   1[R7],R9     ;;Mov R0,R0
         St   R10,-4[R7]   ;;Mov R0,R0
         Ld   2[R7],R8     ;;Xc  #0,R10
         Ld   3[R7],R14    ;;Mov R0,R0
         St   R9,-3[R7]    ;;Mov R0,R0
         St   R8,-2[R7]    ;;Mov R0,R0
         Bge  R10,R13,LAST
         St   R14,-1[R7]   ;;Srl #16,R8
         Bge  R9,R12,LAST
         Sll  #16,R8       ;;Mov R0,R0
         Blt  R8,R11,NEXT
*
         Mov  R0,R0        ;;Mov R0,R0
* since REC[J] >= V, move V into REC[J-1] and
* goto LOOP if I <= REC+4
LAST     Ld   0[R6],R13    ;;Mov R0,R0
         Ld   2[R6],R11    ;;Mov R0,R0
         Ld   3[R6],R14    ;;Mov R0,R0
         St   R13,-4[R7]   ;;Mov R0,R15
         St   R12,-3[R7]   ;;Add R2,R15
         St   R11,-2[R7]   ;;Add #4,R15
         Ble  R6,R15,LOOP
*
         St   R14,-1[R7]   ;;Mov R0,R0
* now the routine is done, so exit after restoring the registers
         Ld   R15,1[R3]    ;;Mov R0,R0
         Ld   R14,2[R3]    ;;Mov R0,R0
         Ld   R13,3[R3]    ;;Mov R0,R0
         Ld   R12,4[R3]    ;;Mov R0,R0
         Ld   R11,5[R3]    ;;Mov R0,R0
         Ld   R10,6[R3]    ;;Mov R0,R0
         Ld   R9,7[R3]     ;;Add #7,R3
         Ld   R8,1[R3]     ;;Mov R0,R0
         Ld   R7,2[R3]     ;;Mov R0,R0
         Ld   R6,3[R3]     ;;Mov R0,R0
         Ld   R5,4[R3]     ;;Mov R0,R0
         Ld   R4,5[R3]     ;;Mov R0,R0
         Ld   R2,6[R3]     ;;Add #7,R3
         Ld   R1,0[R3]     ;;Add #2,R3
         Ld   0[R3],R3     ;;Mov R0,R0
*
         Ld   R0,-1[R3]    ;;Mov R3,PC
         End
D.5 Translated MC68020 Assembly Language Code for Quicksort




* SU-MIPS Carnegie Mellon Benchmark I *
*
*
* QUICKSORT - Translated Version *
*
* Attributes: 67 Megabyte Address Range *





* Input: D0 - "N" Record Count *
*        D1 - "M" Threshold for Insertion Sort *
*
*




* Output: The Sort Data Array is Sorted *
*
*
* All Registers are Transparent over this Routine *









LENGTH   EQU  16   sort entry record length
KEY      EQU  3    offset to key within record
KEYLEN   EQU  7    sort key length
*************************************************************************
*




* MOVEM.L D0-D7/A0-A6,-(SP)   save all registers
*
QUICK    St   R14,-1[R15]  ;;Sub #1,R15
         St   R13,-1[R15]  ;;Sub #1,R15
         St   R12,-1[R15]  ;;Sub #1,R15
         St   R11,-1[R15]  ;;Sub #1,R15
         St   R10,-1[R15]  ;;Sub #1,R15
         St   R9,-1[R15]   ;;Sub #1,R15
         St   R8,-1[R15]   ;;Sub #1,R15
         St   R7,-1[R15]   ;;Sub #1,R15
         St   R6,-1[R15]   ;;Sub #1,R15
         St   R5,-1[R15]   ;;Sub #1,R15
         St   R4,-1[R15]   ;;Sub #1,R15
         St   R3,-1[R15]   ;;Sub #1,R15
         St   R2,-1[R15]   ;;Sub #1,R15
         St   R1,-1[R15]   ;;Sub #1,R15
         St   R0,-1[R15]   ;;Sub #1,R15
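Each MOVEM.L in the translated listing becomes one store per register, paired with a stack-pointer decrement. Below is a Python sketch of that expansion, and of the matching MOVEM.L (SP)+ reload, on a word-addressed memory. It assumes one register per word and that each `St Rn,-1[R15]` together with the `Sub #1,R15` in its second slot acts as a single predecrement store. `movem_push` and `movem_pop` are hypothetical names.

```python
def movem_push(mem: list, sp: int, regs: list) -> int:
    """Emulate MOVEM.L reglist,-(SP) on word-addressed memory.

    Registers are stored highest-numbered first, one 'St Rn,-1[R15]' plus
    'Sub #1,R15' per register, so regs[0] ends at the lowest address.
    Returns the new stack pointer.
    """
    for value in reversed(regs):
        sp -= 1            # predecrement by one word
        mem[sp] = value
    return sp


def movem_pop(mem: list, sp: int, count: int) -> tuple:
    """Emulate MOVEM.L (SP)+,reglist: load count words upward from sp.

    Returns (register values, new stack pointer).
    """
    regs = [mem[sp + k] for k in range(count)]
    return regs, sp + count
```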
*
* MOVE.L D0,D2             copy number of records over
* LSL.L #4,D0              calc. ptr. to last record
*
         Mov  R0,R2        ;;Sll #4,R0
* LEA -LENGTH(A0,D0.L),A1  A1 <- ptr. to last record = U
* LSL.L #4,D1              find total size of M records
*
         Mov  -LENGTH,R9   ;;Add R8,R9
         Add  R0,R9        ;;Sll #4,R1
* MOVEM.L D0/D2/A1,-(SP)   save dummy, count, top of stack
* MOVE.L D1,A6             keep value in A6 for later
*
         St   R9,-1[R15]   ;;Sub #1,R15
         St   R2,-1[R15]   ;;Sub #2,R15
         St   R0,0[R15]    ;;Mov R1,R14
* CLR.L -(SP)              mark sort stack empty
         St   #0,-1[R15]   ;;Sub #1,R15
*






* Register use: A0 -> first record of the subfile
*               A1 -> last record of the subfile
*               A2 & A3 -> key pointers
*               A4 & A5 -> work pointers
*               A6 -> length of the "M" records
*               SP -> recursive call arguments
*
* LEA KEY(A0),A2           A2 -> KEY(I) = REC(L)
SORT     Add  KEY,R8,R10   ;;Mov R0,R0
*
* LEA LENGTH+KEY(A1),A3    A3 -> KEY(J) = REC(U+1)
         Mov  LENGTH+KEY,A3 ;;Add A1,A3
*
*
*




LOOP1    Add  KEY,R8,R12   ;;Mov R0,R0
* LEA LENGTH(A2),A2        I <- I+1
* MOVE.L #KEYLEN-1,D0      D0 = loop counter
* MOVE.L A2,A5             A5 temp for I
*
         Mov  LENGTH,R5    ;;Add R10,R5
         Mov  R5,R10       ;;Mov #KEYLEN-1,R0
* CMPM.B (A5)+,(A4)+       compare V-REC(I)
* DBNE D0,CMP1             loop while equal
* BHI LOOP1                if REC(I) < V continue compare
CMP1     Ld   [R13>>#2],R1 ;;Not R13,R3
         Ld   [R12>>#2],R2 ;;Xc  R3,R1
         Not  R12,R4       ;;Xc  R4,R2
         Bne  R2,R1,CMP1
         Add  #1,R12       ;;Add #1,R13
         Beq  #0,R0,CMP1
         Sub




         R2,R1,LOOP1
         R0,R0             ;;Mov R0,R0
* LEA KEY(A0),A4           A4 -> KEY(V) of current record
LOOP2    Add  KEY,R8,R12   ;;Mov R0,R0
*
* LEA -LENGTH(A3),A3       J <- J-1
* MOVE.L A3,A5             A5 = temp for J
* MOVE.L #KEYLEN-1,D0      loop counter
         Mov  LENGTH,R5    ;;Sub R5,R11
         Mov  R11,R13      ;;Mov #KEYLEN-1,R0
* CMPM.B (A4)+,(A5)+       compare REC(J)-V
* DBNE D0,CMP2             loop while equal
* BHI LOOP2                if REC(J) > V continue compare
CMP2     Ld   [R12>>#2],R1 ;;Not R12,R3
         Ld   [R13>>#2],R2 ;;Xc  R3,R1
         Not  R13,R4       ;;Xc  R4,R2
         Bne  R2,R1,CMP2
         Add  #1,R12       ;;Add #1,R13
         Beq  #0,R0,CMP2
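The CMPM.B loops above are emulated with word loads. `Ld [R13>>#2],R1` fetches the word that contains byte address R13, and the `Not`/`Xc` pair then picks one byte out of it. The Python sketch below assumes big-endian byte order and that `Xc` extracts the byte numbered, from the least significant end, by the low two bits of its first operand. Because `~a & 3` equals `3 - (a & 3)`, that selects byte `a & 3` counted from the most significant end, as big-endian addressing requires. `load_byte` and `cmpm_compare` are hypothetical names.

```python
def load_byte(mem_words: list, addr: int) -> int:
    """Read byte address addr from word-addressed, big-endian memory.

    Mirrors 'Ld [Rn>>#2],Rk' followed by 'Not Rn,Rt' and 'Xc Rt,Rk', assuming
    Xc extracts the byte numbered (0 = least significant) by Rt's low two bits.
    """
    word = mem_words[addr >> 2]      # the word that holds the byte
    sel = ~addr & 3                  # Not: equals 3 - (addr & 3)
    return (word >> (8 * sel)) & 0xFF


def cmpm_compare(mem_words: list, src: int, dst: int, count: int) -> int:
    """Byte-serial compare in the style of CMPM.B (src)+,(dst)+ inside DBNE.

    Returns dst_byte - src_byte at the first mismatch, or 0 if all count
    bytes are equal.
    """
    for _ in range(count):
        x = load_byte(mem_words, src)
        y = load_byte(mem_words, dst)
        if x != y:
            return y - x
        src += 1
        dst += 1
    return 0
```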






* CMP.L A3,A2              I >= J
* BCC END1ST               branch if I >= J
         Bhi  A2,A3,END1ST




* MOVEM.L -KEY(A2),D0-D3   swap
         Ld   -3[R10],R0   ;;Add #1,R10
         Ld   0[R10],R1    ;;Add #4,R10
         Ld   0[R10],R2    ;;Mov R0,R0
*
         Ld   4[R10],R3    ;;Sub #5,R10
*
* MOVEM.L -KEY(A3),D4-D7   REC(J)
         Ld   -3[R11],R4   ;;Add #1,R11
         Ld   0[R11],R5    ;;Add #4,R11
         Ld   0[R11],R6    ;;Mov R0,R0
*
         Ld   4[R11],R7    ;;Sub #5,R11
*
* MOVEM.L D0-D3,-KEY(A3)   with
         St   R0,-3[R11]   ;;Add #1,R11
         St   R1,0[R11]    ;;Add #4,R11
         St   R2,0[R11]    ;;Mov R0,R0
*
         St   R3,4[R11]    ;;Sub #5,R11
         St   R4,-3[R10]   ;;Add #1,R10
         St   R5,0[R10]    ;;Add #4,R10
         St   R6,0[R10]    ;;Mov R0,R0
         Bra  LOOP1
*
         St   R7,4[R10]    ;;Sub #5,R10
*************************************************************************
*
* new subfile found, now determine the next stage
*************************************************************************
* SUB.L #KEY,A3            sub. key - get beg. of record
* MOVEM.L (A0),D0-D3       swap
END1ST   Ld   0[R8],R0     ;;Add #4,R8
         Ld   0[R8],R0     ;;Add #4,R8







         Ld   0[R11],R4    ;;Add #4,R11
         Ld   0[R11],R5    ;;Add #4,R11




* MOVEM.L D0-D3,(A3)       with
* MOVE.L A1,D1             D1 <- U
         St   R3,0[R11]    ;;Sub #4,R11
         St   R2,0[R11]    ;;Sub #4,R11
         St   R1,0[R11]    ;;Sub #4,R11
*
         St   R0,0[R11]    ;;Mov R9,R1
* MOVEM.L D4-D7,(A0)       REC(J)
* MOVE.L A3,D2             D2 <- J
         St   R7,0[R8]     ;;Sub #4,R8
         St   R6,0[R8]     ;;Sub #4,R8
         St   R5,0[R8]     ;;Sub #4,R8
         St   R4,0[R8]     ;;Mov R11,R2
*
* SUB.L D2,D1              D1 <- U-J
* SUB.L A0,D2              D2 <- J-L
* CMP.L A6,D2              compare (J-L) <= MSIZE
* BHI NEWLU                branch if no
         Bhi  R14,R2,NEWLU
*
         Sub  R2,R1        ;;Sub R8,R2
* CMP.L A6,D1              compare (U-J) <= MSIZE
* BHI NEWL                 branch if no
         Bhi  R14,R1,NEWL
* MOVEM.L (SP)+,A0/A1      pop next L and U from stack
* MOVE.L A0,D0             test if stack is empty
*
         Ld   1[R15],R9    ;;Mov R8,R0
*

















         R14,R1,NEWU
         R0,R0




(U-J) <= MSIZE? (U-J) smaller? 
branch if so
                           ;;Mov R0,R0
determine smaller subfile 












*
*
NEWU     Bra  SORT
         Mov  #-16,R9      ;;Add
(U-J) subfile smaller, set
L & U to larger subfile limits
R0,R0
#2,R15
U <- J-1, L stays the same
continue the sort
(J-L) subfile smaller, set
L & U to larger subfile limits
R11,R9
push L & J onto sort stack
STACK    St   R10,-1[R15]  ;;Sub #1,R15
         St   R8,-1[R15]   ;;Sub #1,R15
*
* LEA LENGTH(A3),A0        L <- J+1, U stays the same
* BRA SORT                 continue the sort
*
NEWL     Bra  SORT
         Mov  LENGTH,R8    ;;Add R11,R8
*
*************************************************************************
* fall into insertion sort as all subfiles below
* or equal M records - insertion sort phase
*************************************************************************
*
* Register use: D0 -> loop counter
*               D1 -> counter and swap register
*               D2 / D4 -> swap registers
*               D5 / D7 -> "V" save registers
*               A0 -> REC(I)
*               A1 -> REC(J)
*               A2 / A3 -> work registers
*               A4 -> REC(J-1)
*               A5 -> "V" save registers
*               A6 -> frame pointer
* Note: stack space is reserved for "V" key compare record copies
*
* MOVEM.L (SP)+,D0/A0      reload rec. count & top record
                           ;;Add #1,R15
                           ;;Add #1,R15
* LINK A6,#-LENGTH         allocate "V" key copy area
* SUB #2,D0                D0 ranges from N-2 through 0
*
Mov #LENGTH,R1 ;;Sub #4,R15
         St   R14,0[R15]   ;;Sub #2,R0
*
Mov R15,R14 ;;Sub R1,R15
* LEA -LENGTH(A0),A0       I <- I-1
* LEA KEY(A0),A2           A2 -> KEY(I)
* LEA LENGTH+KEY(A0),A3    A3 -> KEY(I+1)
* MOVE.L #KEYLEN-1,D1      loop counter for compare
LOOP     Add  #KEY,R8,R11  ;;Mov R0,R0
         Mov  LENGTH,R1    ;;Sub R1,R8
*
         Add  #KEY,R8,R10  ;;Mov #KEYLEN-1,R1
* CMPM.B (A3)+,(A2)+       compare KEY(I) & KEY(I+1)
* DBNE D1,CMPII1           loop while equal
* BLS ENDIF                branch if KEY(I) <= KEY(I+1)
CMPII1   Ld   [R11>>#2],R5 ;;Not R11,R7
         Ld   [R10>>#2],R6 ;;Xc  R7,R5
         Not  R10,R13      ;;Xc  R13,R6
         Bne  R5,R6,CMPII1
         Add  #1,R11       ;;Add #1,R10
         Beq  #0,R1,CMPII1
         Sub  #1,R1        ;;Mov R0,R0
*
         Bls  R6,R5,ENDIF
* MOVEM.L (A0),D5-D7/A5    V <- REC(I)
* LEA LENGTH(A0),A1        A1 -> REC(J) = REC(I+1)
         Ld   0[R8],R5     ;;Mov R8,R9
         Ld   4[R9],R6     ;;Add #4,R9
         Ld   4[R9],R7     ;;Add #4,R9
*
         Ld   4[R9],R13    ;;Sub #-8,R9
* MOVEM.L D5-D7,(SP)       add on stack for key compare
* MOVE.L A0,A4             prime A4 -> REC(J-1)
         St   R5,0[R15]    ;;Mov R8,R12
         St   R6,4[R15]    ;;Add #4,R15
*
         St   R7,4[R15]    ;;Sub #4,R15
*
* MOVEM.L (A1),D1-D4       temp <- REC(J)
LOOPIN   Ld   0[R9],R1     ;;Mov R0,R0
         Ld   4[R9],R2     ;;Add #4,R9
         Ld   4[R9],R3     ;;Add #4,R9
*
         Ld   4[R9],R3     ;;Add #-8,R9
* MOVEM.L D1-D4,(A4)       REC(J-1) <- temp
* MOVE.L A1,A4             A4 -> REC(J-1)
         St   R1,0[R12]    ;;Add #4,R12
         St   R2,0[R12]    ;;Add #4,R12
         St   R3,0[R12]    ;;Add #4,R12
*
         St   R4,0[R12]    ;;Mov R9,R12
* LEA LENGTH(A1),A1        J = J+1
* LEA KEY(A1),A3           A3 -> KEY(J)
         Mov  LENGTH,R11   ;;Add R9,R11
         Mov  R11,R9       ;;Add #KEY,R11
* LEA KEY(SP),A2           A2 -> KEY(V)
* MOVE.L #KEYLEN-1,D1      loop counter in D1
         Add  #KEY,R15,R10 ;;Mov #KEYLEN-1,R1
* CMPM.B (A3)+,(A2)+       compare KEY(V) & KEY(J)
* DBNE D1,CMPVJ            loop while equal
* BHI LOOPIN               if KEY(V) > KEY(J) keep looping
CMPVJ    Ld   [R11>>#2],R2 ;;Not R11,R3
         Ld   [R10>>#2],R4 ;;Xc  R3,R2
         Not  R10,R3       ;;Xc  R3,R4
         Bne  R2,R4,CMPVJ
         Add  #1,R11       ;;Add #1,R10
         Bne  #0,R1,CMPVJ





* MOVEM.L D5-D7/A5,(A4)    REC(J-1) <- V
         St   R5,0[R4]     ;;Mov R0,R0
         St   R6,4[R4]     ;;Add #4,R4
         St   R7,4[R4]     ;;Add #4,R4
* DBRA D0,LOOP             continue linear insert
ENDIF    Bne  #0,R0,LOOP
*
         Sub  #1,R0        ;;Mov R0,R0
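The translated code replaces `DBRA D0,LOOP` with `Bne #0,R0,LOOP` followed by `Sub #1,R0`. Assume that instruction sits in a branch delay slot and executes whether or not the branch is taken. Then testing R0 before the decrement gives the same trip count as DBRA, which decrements first and exits when the low word of D0 reaches -1. The Python sketch below counts loop-body executions both ways. The function names are hypothetical, and the 16-bit wrap is modeled only for DBRA.

```python
def dbra_iterations(d0: int) -> int:
    """Body executions for 'LOOP: body; DBRA D0,LOOP' (DBRA uses D0's low word)."""
    count = 0
    while True:
        count += 1                   # loop body
        d0 = (d0 - 1) & 0xFFFF       # DBRA: decrement the low 16 bits first...
        if d0 == 0xFFFF:             # ...and fall through once they reach -1
            return count


def bne_delay_slot_iterations(r0: int) -> int:
    """Body executions for 'LOOP: body; Bne #0,R0,LOOP' with 'Sub #1,R0' after it.

    Assumes the Sub sits in the branch delay slot: the branch tests the old
    R0, and the decrement executes whether or not the branch is taken.
    """
    count = 0
    while True:
        count += 1                   # loop body
        taken = r0 != 0              # Bne #0,R0 compares before the decrement
        r0 -= 1                      # delay-slot Sub #1,R0 always executes
        if not taken:
            return count
```

With D0 starting at N-2, both forms run the insertion loop body N-1 times.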
* UNLK A6                  free and restore stack
* MOVEM.L (SP)+,D0-D7/A0-A6  restore registers
* RTS                      return to caller
         Add  #4,R14,R15   ;;Add #1,R15
         Ld   -1[R15],R0   ;;Add #1,R15
         Ld   -1[R15],R1   ;;Add #1,R15
         Ld   -1[R15],R2   ;;Add #1,R15
         Ld   -1[R15],R3   ;;Add #1,R15
         Ld   -1[R15],R4   ;;Add #1,R15
         Ld   -1[R15],R5   ;;Add #1,R15
         Ld   -1[R15],R6   ;;Add #1,R15
         Ld   -1[R15],R7   ;;Add #1,R15
         Ld   -1[R15],R8   ;;Add #1,R15
         Ld   -1[R15],R9   ;;Add #1,R15
         Ld   -1[R15],R10  ;;Add #1,R15
         Ld   -1[R15],R11  ;;Add #1,R15
         Ld   -1[R15],R12  ;;Add #1,R15
         Ld   -1[R15],R13  ;;Mov R12,Hi
         Ld   1[R15],R12   ;;Add #1,R15
         Ld   -1[R15],R14  ;;Mov R12,PC
         Mov  Hi,R12       ;;Add #1,R15
* END
         End
