Application of Compiler-Assisted Multiple Instruction Rollback Recovery to Speculative Execution by Alewine, N.J. et al.
July 1993 UILU-ENG-93-2229
CRHC-93-16
Center for Reliable and High-Performance Computing
APPLICATION OF 
COMPILER-ASSISTED 
MULTIPLE INSTRUCTION 
ROLLBACK RECOVERY TO 
SPECULATIVE EXECUTION
N.J. Alewine 
W. K. Fuchs 
W.-M. Hwu
Coordinated Science Laboratory 
College of Engineering
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Approved for Public Release. Distribution Unlimited.
UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE
REPORT DOCUMENTATION PAGE
1a. REPORT SECURITY CLASSIFICATION
Unclassified
1b. RESTRICTIVE MARKINGS 
None
2a. SECURITY CLASSIFICATION AUTHORITY
2b. DECLASSIFICATION/DOWNGRADING SCHEDULE
3. DISTRIBUTION/AVAILABILITY OF REPORT 
Approved for public release; 
distribution unlimited
4. PERFORMING ORGANIZATION REPORT NUMBER(S)
UILU-ENG-93-2229 CRHC-93-16
5. MONITORING ORGANIZATION REPORT NUMBER(S)
6a. NAME OF PERFORMING ORGANIZATION 
Coordinated Science Lab 
University of Illinois
6b. OFFICE SYMBOL 
(If applicable)
N/A
7a. NAME OF MONITORING ORGANIZATION
National Aeronautics and Space Administration
6c. ADDRESS (City, State, and ZIP Code)
1101 W. Springfield Avenue 
Urbana, IL 61801
7b. ADDRESS (City, State, and ZIP Code)
Moffett Field, CA
8a. NAME OF FUNDING/SPONSORING 
ORGANIZATION
7a
8b. OFFICE SYMBOL 
(If applicable)
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER
8c. ADDRESS (City, State, and ZIP Code) 10. SOURCE OF FUNDING NUMBERS
PROGRAM PROJECT TASK
7b
ELEMENT NO. NO. NO.
WORK UNIT 
ACCESSION NO.
11. TITLE (Include Security Classification)
Application of Compiler-Assisted Multiple Instruction Rollback Recovery to Speculative 
Execution
12. PERSONAL AUTHOR(S) Alewine, N. J., W. K. Fuchs, and W.-M. Hwu
13a. TYPE OF REPORT
Technical
13b. TIME COVERED 
FROM_________ _ TO
14. DATE OF REPORT (Year, Month, Day)
1993 July 19
15. PAGE COUNT
18
16. SUPPLEMENTARY NOTATION
17. COSATI CODES 18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number)
FIELD GROUP SUB-GROUP rollback recovery, compiler-assisted multiple instruction,
transient processor failures, instructional level parallelism
19. ABSTRACT (Continue on reverse if necessary and identify by block number)
Speculative execution is a method to increase instruction level parallelism which can be exploited by both 
super-scalar and VLIW architectures. The key to a successful general speculation strategy is a repair 
mechanism to handle mispredicted branches and accurate reporting of exceptions for speculated instructions. 
Multiple instruction rollback is a technique developed for recovery from transient processor failure. Many 
of the difficulties encountered during recovery from branch misprediction or from instruction re-execution 
due to exception in a speculative execution architecture are similar to those encountered during multiple 
instruction rollback.
This paper investigates the applicability of a recently developed compiler-assisted multiple instruction 
rollback scheme to aid in speculative execution repair. Extensions to the compiler-assisted scheme to support 
branch and exception repair are presented along with performance measurements across ten application 
programs.
20. DISTRIBUTION /AVAILABILITY OF ABSTRACT 
☒ UNCLASSIFIED/UNLIMITED □ SAME AS RPT. □ DTIC USERS
21. ABSTRACT SECURITY CLASSIFICATION
Unclassified
22a. NAME OF RESPONSIBLE INDIVIDUAL 22b. TELEPHONE (Include Area Code) 22c. OFFICE SYMBOL
DD FORM 1473, 84 MAR 83 APR edition may be used until exhausted. 
All other editions are obsolete.
SECURITY CLASSIFICATION OF THIS PAGE 
UNCLASSIFIED
TO APPEAR: WORKSHOP ON HARDWARE AND SOFTWARE ARCHITECTURES FOR FAULT
TOLERANCE: PERSPECTIVES AND TOWARDS A SYNTHESIS, JUNE 14-16, 1993
LE MONT SAINT-MICHEL, FRANCE
APPLICATION OF COMPILER-ASSISTED MULTIPLE 
INSTRUCTION ROLLBACK RECOVERY TO 
SPECULATIVE EXECUTION
N. J. Alewine*, W. K. Fuchs, W.-M. Hwu
Center for Reliable and High-Performance Computing 
Coordinated Science Laboratory 
University of Illinois at Urbana-Champaign
Abstract
Speculative execution is a method to increase in­
struction level parallelism which can be exploited by 
both super-scalar and VLIW architectures. The key 
to a successful general speculation strategy is a repair 
mechanism to handle mispredicted branches and ac­
curate reporting of exceptions for speculated instruc­
tions. Multiple instruction rollback is a technique 
developed for recovery from transient processor fail­
ures. Many of the difficulties encountered during re­
covery from branch misprediction or from instruction 
re-execution due to exceptions in a speculative exe­
cution architecture are similar to those encountered 
during multiple instruction rollback.
This paper investigates the applicability of a re­
cently developed compiler-assisted multiple instruc­
tion rollback scheme to aid in speculative execution 
repair. Extensions to the compiler-assisted scheme 
to support branch and exception repair are presented 
along with performance measurements across ten ap­
plication programs.
1 Introduction
Super-scalar and VLIW architectures have been
shown effective in exploiting instruction level parallelism
(ILP) present in a given application [1-3]. Creating
additional ILP in applications has been the subject
of study in recent years [4-6]. Code motion within
a basic block is insufficient to unlock the full potential
of super-scalar and VLIW processors with issue rates
greater than two [3]. Given a trace of the most frequently
executed basic blocks, limited code movement
across block boundaries can create additional ILP at
the expense of requiring complex compensation code
to ensure program correctness [7]. Combining multiple
basic blocks into superblocks permits code movement
within the superblock without the compensation code
required in standard trace scheduling [3].
* International Business Machines Corporation, Boca Raton, FL.
1 This research was supported in part by the National Aeronautics
and Space Administration (NASA) under grant NASA
NAG 1-613, in cooperation with the Illinois Computer Laboratory
for Aerospace Systems and Software (ICLASS), and in part
by the Department of the Navy and managed by the Office of
the Chief of Naval Research under Contract N00014-91-J-1283.
General upward and downward code movement 
across trace entry points (joins) and general down­
ward code motion across trace exit points (branches, 
or forks) is permitted without the need for special 
hardware support [7]. Sophisticated hardware support 
is required, however, for unrestricted upward code mo­
tion across a branch boundary. Such code motion 
is referred to as speculative execution and has been 
shown to substantially enhance performance over non- 
speculated architectures [8-10]. This paper focuses on 
the support hardware for speculative execution, which 
ensures correct operation in the presence of except­
ing speculated instructions (referred to as exception 
repair) and of mispredicted branches (referred to as 
branch repair). It is shown that data hazards which re­
sult from exception and branch repair are very similar 
to data hazards that result from multiple instruction 
rollback, and that techniques used to resolve rollback 
data hazards are applicable to exception and branch 
repair.
The remainder of the paper is organized as follows.
Section 2 gives a brief overview of a compiler-assisted
multiple instruction rollback (MIR) scheme to be used
as a base for application to speculative execution repair
(SER). Section 3 describes speculative execution
and the requirements for exception repair and branch
repair. Section 4 introduces a schedule reconstruction
scheme and extends the compiler-assisted rollback
scheme. Section 5 describes read buffer flush costs and
Section 6 presents performance impacts which result
from read buffer flushes.
2 Compiler-Assisted Multiple Instruction Rollback Recovery
2.1 Hazard Classification
Within a general error model, data hazards resulting
from instruction retry are of two types [11-13].
On-path hazards are those encountered when the instruction
path after rollback is the same as the initial
path, and branch hazards are those encountered when
the instruction path after rollback is different from the
initial path. As shown in Figure 1, rx represents an
on-path hazard: during the initial instruction sequence
rx is written, and after rollback it is read prior to
being re-written. As shown in Figure 2, ry represents
a branch hazard: the initial instruction sequence
writes ry, and after rollback ry is read prior to being
re-written, this time along a path other than the original one.
Figure 1: On-path data hazard.
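The on-path case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: instructions are modeled as hypothetical (destination, sources) tuples, the rollback window runs from index `fault` to index `detect`, and the function name `on_path_hazards` is not from the paper.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical instruction form: (destination register or None, source registers).
Instr = Tuple[Optional[str], Tuple[str, ...]]


def on_path_hazards(seq: List[Instr], fault: int, detect: int) -> Set[str]:
    """Registers that are on-path hazards when execution rolls back from
    index `detect` to index `fault` and then repeats the same path.

    A register is a hazard if it was written inside the window during the
    initial pass and, on re-execution, is read before being written again.
    """
    window = seq[fault:detect + 1]
    written = {dest for dest, _ in window if dest is not None}
    hazards: Set[str] = set()
    redefined: Set[str] = set()
    for dest, sources in window:
        for reg in sources:
            if reg in written and reg not in redefined:
                hazards.add(reg)
        if dest is not None:
            redefined.add(dest)
    return hazards


# rx is read by the first instruction and overwritten by the second.
example = [("r1", ("rx", "r2")), ("rx", ("r1",)), ("r3", ("rx",))]
hazard_regs = on_path_hazards(example, fault=0, detect=2)
```

In this example only rx is reported: r1 and r3 are also written in the window, but each is redefined before any read on the repeated path.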
2.2 On-path Hazard Resolution Using a 
Read Buffer
Hardware support consisting of a read buffer of size
2N, as shown in Figure 3, has been shown to be effective
in resolving on-path hazards [11-13]. The read
buffer maintains a window of register read history. If
an on-path hazard is present, then prior to writing
over the old value of the hazard register, a read of
that value must have taken place within the last N
instructions (else after rollback of ≤ N, a read of the
hazard register would not occur before a redefinition).
Key to this scenario is the fact that the original path
is repeated. Branch hazard resolution is left to the
compiler. At rollback, the read buffer is flushed back
to the general purpose register file (GPRF), restoring
the register file to a restartable state. The primary
advantage of the read buffer is that it does not require
an additional read port as with a history buffer, replication
of the GPRF as with the future file, or bypass
logic as with the reorder buffer or delayed write buffer
[14,15].
Figure 2: Branch data hazard.
Figure 3: Read buffer.
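A toy software model may help make the read buffer behavior concrete. The sketch below assumes two source operands per instruction (hence 2N entries), a register file held in a dictionary, and a hypothetical `execute` helper that performs an add; none of these names come from the paper, and real hardware would differ in many details.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class ReadBuffer:
    """Toy model of a read buffer holding the last 2N register reads."""

    def __init__(self, n: int) -> None:
        # Two source operands per instruction, N instructions of history.
        self.entries: Deque[Tuple[str, int]] = deque(maxlen=2 * n)

    def record_read(self, reg: str, value: int) -> None:
        """Save the register number and the value that was just read."""
        self.entries.append((reg, value))

    def flush(self, regfile: Dict[str, int]) -> None:
        """Write saved values back newest first, so the oldest saved value
        of each register is the one left in the register file."""
        while self.entries:
            reg, value = self.entries.pop()
            regfile[reg] = value


def execute(regfile: Dict[str, int], rb: ReadBuffer,
            dest: str, src_a: str, src_b: str) -> None:
    """Hypothetical two-operand add: dest = src_a + src_b."""
    a, b = regfile[src_a], regfile[src_b]
    rb.record_read(src_a, a)
    rb.record_read(src_b, b)
    regfile[dest] = a + b


regs = {"r1": 0, "r2": 5, "r3": 7}
rb = ReadBuffer(n=2)
execute(regs, rb, "r1", "r2", "r3")   # reads r2 and r3, writes r1
execute(regs, rb, "r2", "r1", "r1")   # overwrites r2 after it was read
rb.flush(regs)                        # rollback: r2 gets its old value back
```

After the flush, r2 again holds the value that the first instruction read. r1 is not returned to its initial value, which is acceptable on the repeated path because the first instruction rewrites it before any use.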
2.3 Branch Hazard Removal Compiler 
Transformations
Compiler transformations have been shown to be
effective in resolving branch hazards [11,12]. Branch
hazard resolution occurs at three levels: 1) pseudo
code, 2) machine code, and 3) post-pass. Resolution
at the pseudo code level would be accomplished by
renaming the pseudo register ry of instruction Ii (Figure 2)
to rz. Node splitting, loop expansion and loop
protection transformations aid in breaking pseudo register
equivalence relationships so that renaming can
be performed. After the pseudo registers are mapped
to physical registers, some branch hazards could reappear.
This is prevented at the machine code level
by adding hazard constraints to live range constraints
prior to register allocation. Branch hazards that remain
after the first two levels can be resolved by either
creating a "covering" on-path hazard or by inserting
nop (no operation) instructions ahead of the hazard
instruction until the rollback is guaranteed to be under
the branch. Given the branch hazard of Figure
2, a covering on-path hazard is created by inserting
a MOV ry, ry instruction immediately before the instruction
in which ry is defined. This guarantees that
the old value of ry is loaded into the read buffer and
is available to restore the register file during rollback.
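The post-pass covering transformation can be illustrated with a small sketch. It assumes instructions are hypothetical (opcode, destination, sources) tuples and that the index of the hazard-defining instruction is already known; the helper name `add_covering_move` is an illustration, not the compiler pass used by the authors.

```python
from typing import List, Tuple

# Hypothetical instruction form: (opcode, destination, source registers).
Instr = Tuple[str, str, Tuple[str, ...]]


def add_covering_move(code: List[Instr], def_index: int) -> List[Instr]:
    """Return a copy of `code` with 'MOV ry, ry' inserted immediately
    before the instruction at def_index, where ry is that instruction's
    destination. The MOV reads ry, so its old value enters the read buffer."""
    _, ry, _ = code[def_index]
    cover: Instr = ("MOV", ry, (ry,))
    return code[:def_index] + [cover] + code[def_index:]


code = [("BNE", "", ("r1", "r2")), ("ADD", "ry", ("r3", "r4"))]
covered = add_covering_move(code, def_index=1)
```

The inserted MOV costs one instruction slot, which is why this level is applied only to hazards that renaming and register allocation constraints could not remove.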
3 Speculative Execution
Figures 4 and 5 illustrate the two basic problems
which are encountered when attempting upward code
motion across a branch. As shown in Figure 4, if the
speculated instruction (i.e., an instruction moved upward
past one or more branches) modifies the system
state, and due to the branch outcome the speculated
instruction should not have been executed, program
correctness could be affected. Figure 5 illustrates that
if the speculated instruction causes an exception, and
again due to the branch outcome the excepting instruction
should not have been executed, program performance
or even program correctness could be affected.
Figure 4: r1 in live_out of taken path.
Figure 5: Speculated instruction traps.
3.1 Branch Repair
Figure 6 shows an original instruction schedule and
a new schedule after speculation. Instructions d, i,
and f have been speculated above branches c and
g from their respective fall-through paths.² Speculated
instructions are marked "(s)." The motivation
for such a schedule might be to hide the load delay
of the speculated instructions or to allow more time
for the operands of the branch instructions to become
available. If c commits to the taken path (i.e., it is
mispredicted by the static scheduler), some changes
to the system state that have resulted from the execution
of d, i, and f may have to be undone. No update
is required for the PC; execution simply begins at j.
If instead c commits to the fall-through path but g
commits to the taken path, then only i's changes to
the system state may have to be undone.
Not all changes to the system state are equally important.
If, for example, d writes to register rx and
rx ∉ live_in(j) (i.e., along the path starting at j, a
redefinition of rx will be encountered prior to a use of
rx [16]), then the original value of rx does not have
to be restored. Inconsistencies in the system state
as a result of mispredicted branches exhibit similarities
to branch hazards in multiple instruction rollback
[11,12]. Given this similarity between branch hazards
due to instruction rollback and branch hazards
due to speculative execution, compiler-driven data-flow
manipulations, similar to those developed to eliminate
branch hazards for MIR [11,12], can be used to
resolve branch hazards that result from speculation.
Such compiler transformations have been proposed for
branch misprediction handling [9]. Since re-execution
of speculated instructions is not required for branch
misprediction, compiler resolution of branch hazards
becomes a sufficient branch repair technique.
2 For this example it is assumed that the fall-through paths
are the most likely outcome of the branch decisions at c and g.
Figure 6: Branch repair (original schedule, speculated schedule, and recovery blocks).
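The live_in test described in Section 3.1 reduces to a set membership check. The sketch below assumes the compiler already has the destination register of each speculated instruction and the live_in set of the taken-path target; the register assignments for d, i, and f are invented for illustration and are not taken from Figure 6.

```python
from typing import Dict, Set


def registers_to_restore(spec_writes: Dict[str, str],
                         live_in_target: Set[str]) -> Dict[str, str]:
    """Map each speculated instruction to the register it wrote, keeping
    only writes whose register is live on entry to the taken-path target."""
    return {instr: reg for instr, reg in spec_writes.items()
            if reg in live_in_target}


# Hypothetical destinations for the speculated instructions d, i, and f.
spec_writes = {"d": "rx", "i": "r2", "f": "r7"}
live_in_j = {"r2", "r7"}   # assumed live_in(j); rx is redefined before use
needs_repair = registers_to_restore(spec_writes, live_in_j)
```

Here the write by d needs no repair because rx is not live at j, while the writes by i and f are branch hazards that the compiler must resolve.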
3.2 Exception Repair
Figure 6 also demonstrates the handling of speculated
trapping instructions. If d is a trapping instruction
and an exception occurred during its execution,
handling of the exception must be delayed until c
commits so that changes to the system state are minimized,
and in some cases to ensure that repair is possible
in the event that c is mispredicted. If c commits
to the taken path, the exception is ignored and d is
handled like any other speculated instruction given a
branch mispredict. If c was correctly predicted, three
exception repair strategies are possible. The first is to
undo the effects of only those instructions speculated
above c (i.e., d, i, and f) and then branch to a recovery
block RB_c [10] as shown in Figure 6. The address
of the recovery block can be obtained by using the PC
value of the excepting instruction as an index into a
hash table. This strategy ensures precise interrupts
[14,17] relative to the nonspeculated schedule but not
relative to the original schedule. Recovery blocks can
cause significant code growth [10]. The second strategy
undoes the effects of all instructions subsequent to
d (i.e., i, b, and f), handles the exception, and resumes
execution at instruction i [9]. This latter strategy provides
restartable states and does not require recovery
blocks. A third exception repair strategy undoes the
effects of only those subsequent instructions that are
speculated above c (i.e., only i and f), handles the exception,
and resumes execution at instruction i; however,
this time only speculated instructions are executed
until c is reached. The improved efficiency of strategy
3 over that of strategy 2 comes at the cost of slightly
more complex exception repair hardware.
When a branch commits and is mispredicted, the
exception repair hardware must perform three functions:
1) determine whether an exception has occurred
during the execution of a speculated instruction, 2) if
an exception has occurred, determine the PC value
of the excepting instruction, and 3) determine which
changes to the system state must be undone. Functions
1 and 2 are similar to error detection and location
in multiple instruction rollback. Function 3 is similar
to on-path hazard resolution in multiple instruction
rollback [11,12,18]. On-path hazards assume that after
rollback the initial instruction sequence from the
faulty instruction to the instruction where the error
was detected is repeated.
Figure 7 illustrates the speculation of a group of
instructions and re-execution strategy 3. The load instruction
traps, but the exception is not handled until
the branch instruction commits to the fall-through
path. Control is then returned to the trapping instruction.
This scenario is identical to multiple instruction
rollback where an error occurs during the load instruction
and is detected during the branch instruction. For
this example, only r1 must be restored during rollback,
since the other registers written in the sequence, r5 among
them, will be rewritten prior to use during
re-execution. Figure 7 shows that exception repair
hazards in speculative execution are the same as on-path
hazards in multiple instruction rollback, and a
read buffer as described in Section 2 can be used to
resolve these hazards. The depth of the read buffer is
the maximum distance from a branch instruction to In
along any backwards walk³, where In is a trapping
instruction that was speculated above that branch instruction.
Figure 7: Exception repair.
3.3 Schedule Reconstruction
Assumed in Figures 6 and 7 are mechanisms to
identify speculative instructions, determine the PC
value of excepting speculated instructions, and determine
how many branches a given instruction has been
speculated above. An example of the latter case is
shown in Figure 6, where instructions d, i, and f are
undone if c is mispredicted; however, only i must be
undone if g is mispredicted.
If the hardware had access to the original code 
schedule, the design of these mechanisms would be 
straightforward. Unfortunately, static scheduling re­
orders instructions at compile-time and information as 
to the original code schedule is lost. To enable recov­
ery from mispredicted branches and proper handling 
of speculated exceptions, some information relative to 
the original instruction order must be present in the 
compiler-emitted instructions. This will be referred to 
as schedule reconstruction.
By limiting the flexibility of the scheduler, less information
about the original schedule is required. For
example, if speculation is limited to one level only
(i.e., above a single branch), a single bit in the opcode
field is sufficient to indicate that the instruction has
been moved above the next branch [8]. The hardware
would then know exactly which instruction effects to
undo (i.e., the ones with this bit set). Also, removing
branch hazards directly with the compiler permits
general speculation with no schedule reconstruction
for branch repair [9].
4 Implicit Index Schedule Reconstruction
Implicit index scheduling supports general speculation
of regular and trapping instructions. The scheme
was inspired by the handling of stores in the sentinel
scheduling scheme [9] and was designed to exploit the
unique properties of the read buffer hardware design
described in Section 2. Schedule reconstruction is accomplished
by marking each instruction speculated or
nonspeculated by including a bit in the opcode field,
and using this encoding to maintain an operand history
of speculated instructions in a FIFO queue called
a speculation read buffer (SRB). The SRB operates
similarly to a read buffer with additional provisions for
exception handling.
3 A walk is a sequence of edge traversals in a graph where the
edges visited can be repeated [19].
4.1 Exception Repair Using a Speculation 
Read Buffer
Figure 8 shows an original code schedule and two
speculative schedules, along with the contents of the
SRB at the time branches Ic and Ig commit. Instructions
Id and If have been speculated above branch
instruction Ic, and Ii has been speculated above both
Ig and Ic. The encoding of speculated instructions informs
the hardware that the source operand values are
to be saved in the SRB, along with the corresponding
register addresses and the PC of the speculated
instruction.
Speculated instructions execute normally unless 
they trap. If a speculated instruction traps, the ex­
ception bit in the SRB which corresponds to the trap­
ping instruction is set and program execution contin­
ues. Subsequent instructions that use the result of the 
trapping instruction are allowed to execute normally.
A chk_except(k) instruction is placed in the home
block of each speculated instruction. Only one
chk_except(k) instruction is required for a home block.
As the name implies, chk_except(k) checks for pending
exceptions. The command can simultaneously interrogate
each location in the SRB by utilizing the
bit field k. As shown in schedule 1 of Figure 8,
chk_except(001111), placed after branch Ic, checks
exceptions for instructions Id and If. If a checked
exception bit is set, the SRB is flushed in reverse order,
restoring the appropriate register and PC values.
Execution can then begin with the excepting instruction.
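The SRB and chk_except(k) behavior can be approximated in software as follows. This is a rough sketch under several assumptions that the text does not fix: each speculated instruction reserves two slots, the leftmost mask bit refers to the oldest slot, a trap sets the exception bit on every slot of the trapping instruction, and the class and method names (`SpeculationReadBuffer`, `record`, `trap`) are hypothetical. The register values are invented and only loosely follow schedule 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Slot:
    pc: int                    # PC of the speculated instruction
    reg: Optional[str]         # register address of the saved operand, if any
    value: Optional[int]       # operand value read by the instruction
    excepted: bool = False     # exception bit


class SpeculationReadBuffer:
    """Toy SRB: two slots per speculated instruction, oldest slot first."""

    def __init__(self) -> None:
        self.slots: List[Slot] = []

    def record(self, pc: int, operands: Dict[str, int]) -> None:
        """Save up to two source operands; unused slots stay empty."""
        saved = list(operands.items())[:2]
        saved += [(None, None)] * (2 - len(saved))
        for reg, value in saved:
            self.slots.append(Slot(pc, reg, value))

    def trap(self, pc: int) -> None:
        """Set the exception bit for a speculated instruction that trapped."""
        for slot in self.slots:
            if slot.pc == pc:
                slot.excepted = True

    def chk_except(self, mask: str, regfile: Dict[str, int]) -> Optional[int]:
        """mask[i] == '1' checks slot i (assumed: leftmost bit = oldest slot).
        On a pending exception, flush newest first back to the excepting
        instruction and return its PC; otherwise return None."""
        hits = [i for i, slot in enumerate(self.slots)
                if i < len(mask) and mask[i] == "1" and slot.excepted]
        if not hits:
            return None
        first = hits[0]
        restart_pc = self.slots[first].pc
        while len(self.slots) > first:
            slot = self.slots.pop()
            if slot.reg is not None:
                regfile[slot.reg] = slot.value
        return restart_pc


regs = {"r2": 100, "r6": 0, "r7": 3, "r8": 5}
srb = SpeculationReadBuffer()
srb.record(0x10, {"r2": regs["r2"]}); regs["r2"] = 999                  # Ii
srb.record(0x14, {"r7": regs["r7"], "r8": regs["r8"]}); regs["r6"] = 15  # Id
srb.record(0x18, {"r7": regs["r7"]}); regs["r7"] = 7                    # If
srb.trap(0x14)                                   # Id traps while speculated
restart = srb.chk_except("001111", regs)         # checks the Id and If slots
```

In this run the check returns the PC of Id and restores r7, while r6 keeps its corrupted value because re-execution of Id rewrites it before use.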
Figure 8 illustrates several on-path hazards which
are resolved by the SRB. In schedule 1, if Ii traps and
the branch Ic commits to the taken path, Ii has corrupted
r2 and If has corrupted r7. Flushing the SRB
up through Ii restores both registers to their values
prior to the initial execution of Ii. Note that register
r6 is also corrupted but not restored by the SRB, since
after rollback r6 will be rewritten with a correct value
before the corrupted value is used.
As an alternative to checking for exceptions in each
home block, the exception could be handled when the
exception bit reaches the bottom of the SRB. This is
similar to the reorder buffer used in dynamic scheduling
[14] and eliminates the cost of the chk_except(k)
command; however, it increases the exception handling
latency, which can impact performance depending on
the frequency of exceptions.
Figure 8: Exception repair using a speculation read buffer (SRB). Panels: Original Schedule, Speculated Schedule 1, and Speculated Schedule 2; each speculated schedule is shown with its SRB contents (exception bit, operand value, register number) in a buffer of depth 2N.
Implicit index scheduling derives its name from the 
ability of the compiler to locate a particular register 
value within the SRB. This is possible only if the dy­
namically occurring history of speculated instructions 
is deterministic at branch boundaries. Superblocks 
guarantee this by ensuring that the sole entry into the 
superblock is at the header and by limiting specula­
tion to within the superblock. For standard blocks, 
bookkeeping code [7] can be used to ensure this deter­
ministic behavior.
4.2 Branch Repair Using a Speculation 
Read Buffer
As described in Section 2, branch repair can be han­
dled by resolving branch hazards with the compiler. 
Branch hazard resolution in multiple instruction roll­
back can be assisted by the read buffer when cover­
ing on-path hazards are present, reducing the perfor­
mance cost of variable renaming [11,12]. In a similar 
fashion, the SRB can assist in branch repair. Figure 
9 shows the original code schedule and the two spec­
ulative schedules of Figure 8. For this example, it is 
assumed that r2, r3, r6, and r7 are elements in both 
live_in(Ij) and live_in(Ik).
As shown in schedule 1, if branch instruction Ic
commits to the taken path, r2, r6, and r7, which were
modified in Ii, Id, and If, respectively, must be restored.
If instead Ic commits to the fall-through path
and Ig commits to the taken path, only r2 must be restored.
Registers r2 and r7 are rollback hazards that
result from exception repair; therefore, the SRB contains
their unmodified values. By including a flush(k)
command at the targets of Ic and Ig, the SRB can be
used to restore r2 and/or r7 given a misprediction of
Ic or Ig.
The flush(k) command selectively flushes the appropriate
register values given a branch misprediction.
For example, in schedule 2 of Figure 9, if Ic is predicted
correctly and Ig is mispredicted, the SRB is flushed in
reverse order up through Ii, restoring value(r2) from
Ii but not restoring value(r7) from If. Since speculation
is always from the most probable branch path,
the flush(k) command is always placed on the most
improbable branch path, minimizing the performance
penalty. Not all branch hazards are resolved by the
presence of on-path hazards. These remaining hazards
can be resolved with compiler transformations.
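A selective flush of this kind can be sketched as a masked, newest-first walk over the SRB slots. The slot layout, the saved values, and the convention that the leftmost mask bit selects the oldest slot are assumptions made for illustration; the function name `flush` simply mirrors the command name in the text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical slot form: (register address or None, saved value or None).
Slot = Tuple[Optional[str], Optional[int]]


def flush(slots: List[Slot], mask: str, regfile: Dict[str, int]) -> List[str]:
    """Restore the registers selected by `mask`, newest slot first.
    mask[i] == '1' selects slot i (assumed: slot 0 is the oldest).
    Returns the registers written, in the order they were restored."""
    restored: List[str] = []
    for i in range(len(slots) - 1, -1, -1):
        reg, value = slots[i]
        if mask[i] == "1" and reg is not None:
            regfile[reg] = value
            restored.append(reg)
    return restored


# Assumed SRB layout, oldest first: Id(r7), Id(r8), Ii(r2), Ii(empty),
# If(r7), If(empty). Values are invented.
slots = [("r7", 3), ("r8", 5), ("r2", 100), (None, None), ("r7", 3), (None, None)]
regs = {"r2": 999, "r7": 7, "r8": 5}
restored = flush(slots, "001000", regs)   # only the operand saved for Ii
```

With this mask only r2 is written back, and r7 keeps the value produced by If, which matches the case where the later branch alone is mispredicted.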
5 SRB Flush Penalty
The examples of Section 4 demonstrate that
compiler-assisted multiple instruction rollback can be
applied to both branch repair and exception repair in a
speculative execution architecture. The flush penalty
of the read buffer is not a key concern in multiple instruction
rollback applications since instruction faults
are typically very rare. In application to exception repair
in speculative execution, the SRB flush penalty is
also not a major concern due to the infrequency of exceptions
involving speculated instructions. However,
in application to branch repair, the SRB flush penalty
could produce significant performance impacts. Studies
of branch behavior show a conditional branch frequency
of 11% to 17% [20]. Static branch prediction
methods result in branch mispredictions in the range
of 5% to 15%. This results in a branch repair frequency
as high as 2.5%. Assuming a CPI (clock cycles
per instruction) rate of one and an average SRB flush
penalty of ten cycles, the performance overhead of the
flush mechanism would reach 22.5%. This indicates
the importance of minimizing the amount of redundant
data stored in the SRB so that the flush penalty
is reduced.
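The overhead estimate above is a product of three factors, which the following sketch makes explicit. The function name and parameter names are hypothetical; the inputs are the upper ends of the ranges quoted in the text, and the result is only as precise as those rounded figures.

```python
def flush_overhead(branch_freq: float, mispredict_rate: float,
                   flush_penalty_cycles: float, cpi: float = 1.0) -> float:
    """Fraction of extra execution time spent flushing the SRB.

    branch repair frequency = branch_freq * mispredict_rate (per instruction);
    each repair costs flush_penalty_cycles on top of a base of `cpi` cycles
    per instruction.
    """
    repair_freq = branch_freq * mispredict_rate
    return repair_freq * flush_penalty_cycles / cpi


# Upper ends of the quoted ranges: 17% branches, 15% mispredicted, 10 cycles.
worst_case = flush_overhead(0.17, 0.15, 10)
best_case = flush_overhead(0.11, 0.05, 10)
```

With these inputs the repair frequency is about 2.5% and the overhead comes out at roughly a quarter of the base execution time, in the same range as the figure quoted above; at the low ends of both ranges it is closer to 5%.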
Recently, a technique was proposed to reduce the
amount of redundant data in a read buffer so that the
read buffer size could be reduced [12,13]. A similar
technique can be used to assure that only the data
required for branch and exception repair is stored in
the SRB. In the implicit index scheme of Section 4, a
bit indicating whether an instruction is speculated is
added to the opcode field. By expanding this field to
two bits, operand storage requirements can be specified.
Figure 10 shows the reduced contents of the
SRB given schedule 1 of Figure 9. In the modified
scheme, only the first read of r7 must be maintained.
Register r8 is not required since it was not modified.
The improved scheme also eliminates blank spaces in
the SRB. For this example, the misprediction of Ic in
schedule 1 of Figure 9 results in four fewer variables to
flush.
The coding of the two speculation bits would be as
follows: 00) no save required, 01) save operand 1, 10)
save operand 2, and 11) save both operands. If neither
operand of a speculated instruction has to be saved in
the SRB, the instruction is not marked as speculated.
This is not a problem for branch repair; however, if
such an instruction traps, the hardware would have no
way of knowing not to handle the exception immediately.
There would also be no entry in the SRB for the
exception bit or for the corresponding PC value. One
solution to the problem would be to add another bit to
the opcode field which marks speculated trapping instructions.
A better solution is to code all speculated
trapping instructions which have no operands to save
as 01. This will indicate that exception handling is to
be delayed and cause a reservation of an entry in the
SRB, and also will slightly increase the flush penalty
during branch repairs.
Figure 9: Branch repair using a speculation read buffer (SRB). Panels: Original Schedule, Speculated Schedule 1, and Speculated Schedule 2; each speculated schedule is shown with its flush(k) commands and SRB contents.
Figure 10: SRB with reduced content.
Figure 11: Instrumentation code placement.
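The two-bit coding described in Section 5 can be written down as a small encoder and decoder. The constant and function names below are hypothetical, and the encoder is assumed to be applied only to speculated instructions; the 01 fallback for trapping instructions with nothing to save follows the rule stated in the text.

```python
NO_SAVE, SAVE_OP1, SAVE_OP2, SAVE_BOTH = 0b00, 0b01, 0b10, 0b11


def encode_spec_bits(save_op1: bool, save_op2: bool, traps: bool = False) -> int:
    """Two-bit speculation field for a speculated instruction.
    A trapping instruction with no operand to save is coded 01 so that an
    SRB entry is still reserved for its exception bit and PC."""
    bits = (SAVE_OP1 if save_op1 else 0) | (SAVE_OP2 if save_op2 else 0)
    if bits == NO_SAVE and traps:
        return SAVE_OP1
    return bits


def operands_to_save(bits: int) -> tuple:
    """Operand numbers (1 and/or 2) whose values go into the SRB."""
    return tuple(n for n, flag in ((1, SAVE_OP1), (2, SAVE_OP2)) if bits & flag)


load_bits = encode_spec_bits(False, False, traps=True)   # coded as 01
```

The cost of the 01 fallback is one extra saved operand per such instruction, which is the slight increase in flush penalty mentioned above.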
6 Performance Evaluation 
6.1 Evaluation Methodology
In this section, results of a read buffer flush penalty 
evaluation are presented. The instrumentation code 
segments of Figure 11 call a branch error procedure 
which performs the following functions:
1. Update the read buffer model.
2. Force actual branch errors during program execution,
allowing execution to proceed along an
incorrect path for a controlled number of instructions.
3. Terminate execution along the incorrect path and 
restore the required system state from the simu­
lated read buffer.
4. Measure the resulting flush cycles during the 
branch repair.
5. Begin execution along the correct path until the 
next branch is encountered.
An example instrumentation code segment is shown
in Figure 12. Parameters, such as operand saving information,
current PC, branch fall-through PC, and
branch target PC values, are passed by the instrumentation
code to the branch error procedure. An
additional miscellaneous parameter contains instruction
type and information used for debugging.
Figure 13 gives a high level flow of operation for the 
branch error procedure. When a branch instruction 
in the original application program is encountered, an 
arm-branch flag is set. Prior to the execution of the 
next application instruction, the arm-branch flag is 
checked, and if set, the branch decision made by the 
application program is set aside. The branch is then 
predicted by the branch prediction model. Four mod­
els are used in the evaluation: 1) predict taken, 2) pre­
dict not taken, 3) dynamic prediction, and 4) static 
prediction from profiling information. The dynamic 
prediction model is derived from a two-bit counter 
branch target buffer (BTB) design [21] and is the 
only model that requires updating with each predic­
tion outcome.
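For readers unfamiliar with two-bit counter prediction, a generic saturating-counter predictor is sketched below. It is not the BTB design of [21]: the table is an unbounded dictionary keyed by branch PC, the initial state is assumed to be weakly not taken, and the class name `TwoBitPredictor` is hypothetical.

```python
from typing import Dict


class TwoBitPredictor:
    """Generic two-bit saturating counter per branch PC.
    States 0 and 1 predict not taken; states 2 and 3 predict taken."""

    def __init__(self, initial_state: int = 1) -> None:
        self.initial_state = initial_state      # assumed: weakly not taken
        self.counters: Dict[int, int] = {}

    def predict(self, pc: int) -> bool:
        return self.counters.get(pc, self.initial_state) >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Move the counter toward the actual outcome, saturating at 0 and 3."""
        state = self.counters.get(pc, self.initial_state)
        self.counters[pc] = min(3, state + 1) if taken else max(0, state - 1)


predictor = TwoBitPredictor()
history = []
for outcome in (True, True, False, True):
    history.append(predictor.predict(0x40))    # prediction made before update
    predictor.update(0x40, outcome)
```

The sequence shows the hysteresis that distinguishes this model from the three static ones: after two taken outcomes, a single not-taken outcome does not flip the prediction.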
After the branch is predicted, the prediction is 
checked against the actual branch path taken by the 
application program. If the prediction was correct, ex­
ecution proceeds normally. If the prediction was incor­
rect, the correct branch path is loaded into the recov­
ery queue along with a branch error detection (BED) 
latency, and the predicted path is loaded into the PC. 
The BED latency indicates how long the execution of 
instructions is to continue along the incorrect path. 
The branch error time-out flag is set when the BED 
latency is reached. When a branch error is detected, 
the register file state is repaired using the read buffer 
contents. The PC value of the correct branch path is 
obtained from the recovery queue. During branch er­
ror rollback recovery, the number of cycles required to 
flush the read buffer during branch repair is recorded.
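The procedure described above can be sketched as a small simulation. This is a simplified model under stated assumptions, not the authors' instrumentation: the class and method names are hypothetical, the read buffer is a plain list of (register, saved value) pairs, and entries are restored newest first so that the oldest saved value of each register is the one that survives. The rate of two entries per flush cycle is the one assumed in the evaluation.

```python
from math import ceil

ENTRIES_PER_CYCLE = 2  # two read buffer entries flushed per cycle


class BranchErrorSim:
    def __init__(self, regs, bed_latency):
        self.regs = regs                # general purpose register file (GPRF)
        self.bed_latency = bed_latency  # instructions executed on the wrong path
        self.read_buffer = []           # (register, saved value), oldest first
        self.recovery_queue = []        # [correct PC, remaining BED latency]
        self.flush_cycles = 0

    def on_branch(self, predicted_pc, actual_pc):
        """Return the next PC; on a misprediction, queue the correct path."""
        if predicted_pc == actual_pc:
            return actual_pc
        self.read_buffer.clear()  # simplification: track wrong-path saves only
        self.recovery_queue.append([actual_pc, self.bed_latency])
        return predicted_pc

    def wrong_path_step(self, save_regs, execute):
        """Run one wrong-path instruction; return the correct PC on time-out."""
        for r in save_regs:
            self.read_buffer.append((r, self.regs[r]))
        execute(self.regs)
        entry = self.recovery_queue[-1]
        entry[1] -= 1
        return self.repair() if entry[1] <= 0 else None

    def repair(self):
        """Restore the GPRF from the read buffer and record flush cycles."""
        for r, value in reversed(self.read_buffer):
            self.regs[r] = value
        self.flush_cycles += ceil(len(self.read_buffer) / ENTRIES_PER_CYCLE)
        self.read_buffer.clear()
        correct_pc, _ = self.recovery_queue.pop()
        return correct_pc
```

For example, with a BED latency of 2 and wrong-path instructions that modify register 16, the second call to `wrong_path_step` triggers the repair, restores register 16 to its value at the branch, and returns the PC of the correct path.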
$_simlb_2_24_0:
    # instruction 24
    # Begin brsim_sim hook: s1 - 16, s2 - 0: normal
    subu  $sp, 44
    la    $at, $_simlb_2_24_0     # hook address
    sw    $at, 20($sp)
    la    $at, $_simlb_2_24_1     # instruction address
    sw    $at, 24($sp)
    la    $at, $_simlb_2_25_0     # next hook address
    sw    $at, 28($sp)
    li    $at, 8216               # miscellaneous
    sw    $at, 32($sp)
    li    $at, 16                 # directs read buffer to save register 16
    sw    $at, 40($sp)
    move  $at, $sp
    j     brsim_save
    # End brsim_sim hook.
$_simlb_2_24_1:
    addu  $16, $16,               # original instruction
$_simlb_2_25_0:
    # instruction 25
    # Begin brsim_sim hook: s1 - 16, s2 - 9: branch
    subu  $sp, 44
    la    $at, $_simlb_2_25_0     # hook address
    sw    $at, 20($sp)
    la    $at, $_simlb_2_25_1     # instruction address
    sw    $at, 24($sp)
    la    $at,
    sw    $at, 28($sp)
    li    $at, 532505
    sw    $at, 32($sp)
    la    $at,
    sw    $at, 36($sp)
    li    $at, 304                # registers 16 and 9
    sw    $at, 40($sp)
    move  $at, $sp
    j     brsim_save
    # End brsim_sim hook.
$_simlb_2_25_1:
    bne   $16, $9, $_main_5       # original instruction
$_main_6:
Figure 12: Instrumentation code sequences.
[Flowchart; recoverable box labels: update RB model; update
recovery queue; return; restore GPRF from RB model and record
flush cycles; load PC from recovery queue.]
PC - program counter
GPRF - general purpose register file
RB - read buffer
BPM - branch prediction model
Figure 13: Branch error procedure operation.
Table 1: Application programs.
Program     Static Size   Description
QUEEN       148           eight-queen program
WC          181           UNIX utility
QSORT       252           quick sort algorithm
CMP         262           UNIX utility
GREP        907           UNIX utility
PUZZLE      932           simple game
COMPRESS    1826          UNIX utility
LEX         6856          lexical analyzer
YACC        8099          parser-generator
CCCP        8775          preprocessor for GNU C compiler
It is assumed for this evaluation that two read
buffer entries can be flushed in a single cycle. This
corresponds to a split-cycle-save assumption for the
general purpose register file [12]. Performance overhead
due to read buffer flushes (% increase) is computed as

$$\text{Flush-OH} = 100 \cdot \frac{\text{flush-cycles}}{\text{total-cycles}}$$

All instructions are assumed to require one cycle for 
execution. This assumption is conservative since the 
MIPS processor used for the evaluation requires two 
cycles for a load. The additional cycles would increase 
the total-cycles and thereby reduce the observed per­
formance overhead. In addition to accurately measur­
ing flush costs, the evaluation verifies the operation of 
the read buffer and its ability to restore the appropri­
ate system state over a wide range of applications.
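The overhead computation can be worked through in a few lines. The sketch below assumes that the numerator of Flush-OH is the total number of flush cycles, that each branch repair flushes its read buffer entries two per cycle with any odd entry taking a full cycle, and that total-cycles is the instruction count at one cycle per instruction. The function names and the sample numbers are illustrative, not measurements from the report.

```python
from math import ceil


def flush_cycles(entries_per_repair, entries_per_cycle=2):
    """Total flush cycles over a list of per-repair read buffer entry counts."""
    return sum(ceil(n / entries_per_cycle) for n in entries_per_repair)


def flush_overhead_percent(flush, total_cycles):
    """Flush-OH as a percentage of total execution cycles."""
    return 100.0 * flush / total_cycles


# Three hypothetical repairs flushing 3, 4, and 1 entries in a 1000-cycle run.
cycles = flush_cycles([3, 4, 1])
print(cycles, flush_overhead_percent(cycles, 1000))
```

Counting a load as two cycles would raise `total_cycles` without changing `flush`, so the reported percentage would fall, which is why the one-cycle assumption is described as conservative.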
The instrumentation insertion transformation oper­
ates on the s-code emitted by the MIPS code generator 
of the IMPACT C compiler [3]. The transformation 
determines which operands require saving in the read 
buffer and inserts calls to the initialization, branch er­
ror, and summary procedures. The resulting s-code 
modules are then compiled and run on a DECstation
3100. For the evaluation, BED latencies from 1 to 10 
were used. Table 1 lists the ten application programs 
evaluated. Static Size is the number of assembly in­
structions emitted by the code generator, not includ­
ing the library routines and other fixed overhead.
6.2 Evaluation Results
Experimental measurements of read buffer flush
overhead (Flush-OH) for various BED latencies are
shown in Figures 14 through 23. The four branch
prediction strategies used for the evaluation are:
1) predict taken (P-Taken), 2) predict not taken
(P-N-Taken), 3) dynamic prediction based on a
branch target buffer (Dyn-Pred), and 4) static branch
prediction using profiling data (Prof-Pred).
Figure 14: Flush penalty: QUEEN. [Plot of Flush-OH (%) versus BED latency (1 to 10) for P-Taken, P-N-Taken, Dyn-Pred, and Prof-Pred.]
Figure 15: Flush penalty: WC. [Plot of Flush-OH (%) versus BED latency (1 to 10) for P-Taken, P-N-Taken, Dyn-Pred, and Prof-Pred.]
Flush costs were closely related to branch predic­
tion accuracies, i.e., the more often a branch was mis­
predicted, the more often flush costs were incurred. 
In a speculative execution architecture, branch predic­
tion inaccuracies result in performance impacts in ad­
dition to the impacts from the branch repair scheme. 
Branch misprediction increases the base run time of 
an application by permitting speculative execution of 
unproductive instructions. Increased levels of speculation
increase the performance impacts associated with
branch prediction inaccuracies. Only the performance
impacts associated with read buffer flushes are shown in
Figures 14 through 23.
Figure 16: Flush penalty: COMPRESS. [Plot of Flush-OH (%) versus BED latency for P-Taken, P-N-Taken, Dyn-Pred, and Prof-Pred.]
Figure 17: Flush penalty: CMP. [Plot of Flush-OH (%) versus BED latency for P-Taken, P-N-Taken, Dyn-Pred, and Prof-Pred.]
Figure 18: Flush penalty: PUZZLE. [Plot of Flush-OH (%) versus BED latency for P-Taken, P-N-Taken, Dyn-Pred, and Prof-Pred.]
Figure 19: Flush penalty: QSORT. [Plot of Flush-OH (%) versus BED latency for P-Taken, P-N-Taken, Dyn-Pred, and Prof-Pred.]
For nine of the ten applications, P-N-Taken was
significantly or marginally more accurate
in predicting branch outcomes than P-Taken.
For QSORT, P-Taken was significantly more accurate 
than P-N-Taken. This result demonstrates that in 
a speculative execution architecture, it is difficult to 
guarantee optimal performance across a range of ap­
plications given a choice between predict-taken and 
predict-not-taken branch prediction strategies.
For all but one application, Prof-Pred was more ac­
curate than either P-Taken or P-N-Taken. For CMP, 
Prof-Pred, P-N-Taken, and Dyn-Pred were nearly per­
fect in their prediction of branch outcomes. Prof-Pred 
marginally outperformed Dyn-Pred in all applications 
except LEX.
The purpose of measuring read buffer flush costs 
given the recovery from injected branch errors is to 
establish the viability of using a read buffer design
for branch repair for speculative execution. Although 
in such a speculative schedule only static prediction 
strategies would be applicable, the Dyn-Pred model 
was included to better assess how varying branch pre­
diction strategies impact flush costs. Overall, the ac­
curacy of Dyn-Pred fell between P-Taken/P-N-Taken 
and Prof-Pred.
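To make the comparison between the static strategies concrete, the sketch below derives a profile-based prediction for each branch (the majority direction seen in a trace) and measures accuracy against the two fixed strategies. It is a minimal sketch under assumptions: the trace format, the tie-breaking rule (ties predict taken), and the function names are hypothetical, and the same trace is used for profiling and evaluation, which gives an upper bound on what a profile from a different input would achieve.

```python
from collections import defaultdict


def profile_predictions(trace):
    """Map each branch PC to its majority direction in the trace."""
    counts = defaultdict(lambda: [0, 0])  # [not-taken count, taken count]
    for pc, taken in trace:
        counts[pc][int(taken)] += 1
    return {pc: c[1] >= c[0] for pc, c in counts.items()}


def accuracy(trace, predict):
    """Fraction of branch outcomes in the trace that predict(pc) gets right."""
    trace = list(trace)
    hits = sum(predict(pc) == taken for pc, taken in trace)
    return hits / len(trace)


# Hypothetical trace: branch 1 is mostly taken, branch 2 is never taken.
trace = [(1, True)] * 3 + [(1, False)] + [(2, False)] * 4
table = profile_predictions(trace)
print(accuracy(trace, table.__getitem__))  # profile-based
print(accuracy(trace, lambda pc: True))    # predict taken
print(accuracy(trace, lambda pc: False))   # predict not taken
```

In this small trace neither fixed strategy suits both branches, while the per-branch profile does. This is the kind of mixed behavior that makes it hard for a single predict-taken or predict-not-taken choice to work well across applications.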
Over the ten applications studied, read buffer flush
overhead ranged from 49.91% for the P-Taken strategy
in CCCP to 0.01% for the P-N-Taken strategy in
CMP, given a BED of ten. It can be seen from Figures
14 through 23 that a good branch prediction strategy
is key to a low read buffer flush cost. The results
show that given a static branch prediction strategy
using profiling data, an average BED of ten produces
flush costs no greater than 14.8% and an average flush
cost of 8.1% across the ten applications studied. This
performance overhead is comparable to the overhead
expected from a delayed write buffer scheme with a
maximum allowable BED of ten [15]. Given a maximum
BED of ten and an average BED of less than
ten, the flush costs of the read buffer would be less
than those of a delayed write buffer, since a delayed
write buffer is designed for a worst-case BED and the
flush penalty of a read buffer is based on the average
BED. The observed flush costs are small in comparison
to the substantial performance gain of speculated
architectures over nonspeculated architectures
[8-10].
Figure 20: Flush penalty: GREP. [Plot of Flush-OH (%) versus BED latency.]
Figure 22: Flush penalty: YACC. [Plot of Flush-OH (%) versus BED latency (1 to 10).]
The BED for a given branch in this evaluation cor­
responds to the number of instructions moved above 
a branch in a speculative schedule. The results of the 
evaluation indicate that if the average number of in­
structions speculated above a given branch is < 10, 
then the read buffer becomes a viable approach to 
handling branch repair.
Figure 23: Flush penalty: CCCP. [Plot of Flush-OH (%) versus BED latency.]
7 Summary
Speculative execution has been shown to be an ef­
fective method to create additional instruction level 
parallelism in general applications. Speculating in­
structions above branches requires schemes to han­
dle mispredicted branches and speculated instructions 
that trap.
This paper showed that branch hazards resulting 
from branch mispredictions in speculative execution 
are similar to branch hazards in multiple instruction 
rollback developed for processor error recovery. It was 
shown that compiler techniques previously developed 
for error recovery can be used as an effective branch 
repair scheme in a speculative execution architecture. 
It was also shown that data hazards that result in
rollback due to exception repair are similar to on-path
hazards, suggesting a read buffer approach to exception
repair.
Implicit index scheduling was introduced to exploit 
the unique characteristics of rollback recovery using 
a read buffer approach. The read buffer design was 
extended to include PC values to aid in rollback from 
excepting speculated instructions.
Read buffer flush penalties were measured by in­
jecting branch errors into ten target applications and 
measuring the flush cycles required to recover from 
the branch errors using a simulated read buffer. It 
was shown that with a static branch prediction strat­
egy using profiling data, flush costs under 15% are 
achievable. The results of these evaluations indicate 
that compiler-assisted multiple instruction rollback is 
viable for branch and exception repair in a speculative 
execution architecture.
8 Acknowledgements
The authors wish to thank Shyh-Kwei Chen and 
C.-C. Jim Li for their help with the compiler aspects 
of this paper. We would like to thank Scott Mahlke, 
William Chen, and John Christopher Gyllenhaal for 
their excellent technical suggestions and assistance 
with the IMPACT C compiler. Finally, we express 
our thanks to Janak Patel for his contributions to this 
research.
References
[1] R. P. Colwell, R. P. Nix, J. O'Donnell, D. B. Papworth,
and P. K. Rodman, "A VLIW Architecture
for a Trace Scheduling Compiler," in Proc.
2nd Int. Conf. Architecture Support Programming
Languages and Operating Syst., pp. 105-111, Oct.
1987.
[2] J. C. Dehnert, P. Y. Hsu, and J. P. Bratt, “Over­
lapped Loop Support in the Cydra 5,” in Proc. 
3rd Int. Conf. Architecture Support Programming 
Languages and Operating Syst., pp. 25-38, April 
1989.
[3] P. Chang, W. Chen, N. Warter, and W.-M. W. Hwu,
"IMPACT: An Architecture Framework
for Multiple-Instruction-Issue Processors,"
in Proc. 18th Annu. Symp. Comput. Architecture,
pp. 266-275, May 1991.
[4] B. R. Rau and C. D. Glaeser, "Some Scheduling
Techniques and an Easily Schedulable Horizontal
Architecture for High Performance Scientific
Computing," in Proc. 20th Annu. Workshop
Microprogramming Microarchitecture, pp. 183-198,
Oct. 1981.
[5] M. S. Lam, “Software Pipelining: An Effective 
Scheduling Technique for VLIW Machines,” in 
Proc. ACM SIGPLAN 1988 Conf. Programming 
Language Design Implementation, pp. 318-328, 
June 1988.
[6] A. Aiken and A. Nicolau, “Optimal Loop Paral­
lelization,” in Proc. ACM SIGPLAN 1988 Conf. 
Programming Language Design Implementation, 
pp. 308-317, June 1988.
[7] J. A. Fisher, "Trace Scheduling: A Technique
for Global Microcode Compaction," IEEE Trans.
Comput., vol. C-30, no. 7, pp. 478-490, July 1981.
[8] M. D. Smith, M. S. Lam, and M. Horowitz, 
“Boosting Beyond Scalar Scheduling in a Super­
scalar Processor,” in Proc. 17th Annu. Symp. 
Comput. Architecture, pp. 344-354, May 1990.
[9] S. A. Mahlke, W. Y. Chen, W.-M. W. Hwu, B. R.
Rau, and M. S. Schlansker, "Sentinel Scheduling
for VLIW and Superscalar Processors," in Proc.
5th Int. Conf. Architecture Support Programming
Languages and Operating Syst., pp. 238-247, Oct.
1992.
[10] M. D. Smith, M. A. Horowitz, and M. S. Lam, 
“Efficient Superscalar Performance Through 
Boosting,” in Proc. 5th Int. Conf. Architecture 
Support Programming Languages and Operating 
Syst., pp. 248-259, Oct. 1992.
[11] N. J. Alewine, S.-K. Chen, C.-C. J. Li, W. K.
Fuchs, and W.-M. W. Hwu, "Branch Recovery
with Compiler-Assisted Multiple Instruction
Retry," in Proc. 22nd Int. Symp. Fault-Tolerant
Comput., pp. 66-73, July 1992.
[12] N. J. Alewine, Compiler-assisted Multiple In­
struction Rollback Recovery using a Read Buffer. 
PhD thesis, Tech. Rep. CRHC-93-06, University 
of Illinois at Urbana-Champaign, 1993.
[13] N. J. Alewine, S.-K. Chen, W. K. Fuchs, and
W.-M. W. Hwu, "Compiler-assisted Multiple Instruction
Rollback Recovery using a Read Buffer,"
Tech. Rep. CRHC-93-11, Coordinated Science
Laboratory, University of Illinois, May 1993.
[14] J. E. Smith and A. R. Pleszkun, “Implementing 
Precise Interrupts in Pipelined Processors,” IEEE 
Trans. Comput., vol. 37, pp. 562-573, May 1988.
[15] Y. Tamir and M. Tremblay, “High-Performance 
Fault-Tolerant VLSI Systems Using Micro Roll­
back,” IEEE Trans. Comput., vol. 39, pp. 548- 
554, Apr. 1990.
[16] A. V. Aho, R. Sethi, and J. D. Ullman, Compil­
ers: Principles, Techniques, and Tools. Reading, 
MA: Addison-Wesley, 1986.
[17] M. Johnson, Superscalar Microprocessor Design. 
Englewood Cliffs, NJ: Prentice-Hall, Inc., 1991.
[18] C.-C. J. Li, S.-K. Chen, W. K. Fuchs, and
W.-M. W. Hwu, "Compiler-Assisted Multiple
Instruction Retry," Tech. Rep. CRHC-91-31,
Coordinated Science Laboratory, University of Illinois,
May 1991.
[19] J. A. Bondy and U. Murty, Graph Theory with 
Applications. London, England: Macmillan Press 
Ltd., 1979.
[20] J. L. Hennessy and D. A. Patterson, Computer 
Architecture: A Quantitative Approach. San Ma­
teo, CA: Morgan Kaufmann Publishers, Inc., 
1990.
[21] J. K. Lee and A. J. Smith, “Branch Prediction 
Strategies and Branch Target Buffer Design,” 
Computer, vol. 17, no. 1, pp. 6-22, Jan. 1984.
