Application of compiler-assisted multiple instruction rollback recovery to speculative execution by Hwu, W.-M. et al.
NASA-CR-193360
July 1993 UILU-ENG-93-2229
CRHC-93-16
Center for Reliable and High-Performance Computing
APPLICATION OF
COMPILER-ASSISTED
MULTIPLE INSTRUCTION
ROLLBACK RECOVERY TO
SPECULATIVE EXECUTION
N.J. Alewine
W. K. Fuchs
W.-M. Hwu
(NASA-CR-193360) APPLICATION OF
COMPILER-ASSISTED MULTIPLE
INSTRUCTION ROLLBACK RECOVERY TO
SPECULATIVE EXECUTION (Illinois
Univ.) 18 p
N93-32355
Unclas
G3/62 0176461
Coordinated Science Laboratory
College of Engineering
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Approved for Public Release. Distribution Unlimited.
https://ntrs.nasa.gov/search.jsp?R=19930023166 2020-03-17T04:57:07+00:00Z
REPORT DOCUMENTATION PAGE
1a. REPORT SECURITY CLASSIFICATION: Unclassified
1b. RESTRICTIVE MARKINGS: None
3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release; distribution unlimited
4. PERFORMING ORGANIZATION REPORT NUMBER(S): UILU-ENG-93-2229, CRHC-93-16
6a. NAME OF PERFORMING ORGANIZATION: Coordinated Science Lab, University of Illinois
6b. OFFICE SYMBOL: N/A
6c. ADDRESS: 1101 W. Springfield Avenue, Urbana, IL 61801
7a. NAME OF MONITORING ORGANIZATION: National Aeronautics and Space Administration
7b. ADDRESS: Moffett Field, CA
8a. NAME OF FUNDING/SPONSORING ORGANIZATION: 7a
8c. ADDRESS: 7b
11. TITLE: Application of Compiler-Assisted Multiple Instruction Rollback Recovery to Speculative Execution
12. PERSONAL AUTHOR(S): Alewine, N. J., W. K. Fuchs, and W.-M. Hwu
13a. TYPE OF REPORT: Technical
14. DATE OF REPORT: July 1993
15. PAGE COUNT: 18
18. SUBJECT TERMS: rollback recovery, compiler-assisted multiple instruction, transient processor failures, instruction level parallelism
19. ABSTRACT
Speculative execution is a method to increase instruction level parallelism which can be exploited by both
super-scalar and VLIW architectures. The key to a successful general speculation strategy is a repair
mechanism to handle mispredicted branches and accurate reporting of exceptions for speculated instructions.
Multiple instruction rollback is a technique developed for recovery from transient processor failures. Many
of the difficulties encountered during recovery from branch misprediction or from instruction re-execution
due to exceptions in a speculative execution architecture are similar to those encountered during multiple
instruction rollback.
This paper investigates the applicability of a recently developed compiler-assisted multiple instruction
rollback scheme to aid in speculative execution repair. Extensions to the compiler-assisted scheme to support
branch and exception repair are presented along with performance measurements across ten application
programs.
TO APPEAR: WORKSHOP ON HARDWARE AND SOFTWARE ARCHITECTURES FOR FAULT
TOLERANCE: PERSPECTIVES AND TOWARDS A SYNTHESIS JUNE 14-16, 1993
LE MONT SAINT-MICHEL, FRANCE
APPLICATION OF COMPILER-ASSISTED MULTIPLE
INSTRUCTION ROLLBACK RECOVERY TO
SPECULATIVE EXECUTION
N. J. Alewine, W. K. Fuchs, W.-M. Hwu
Center for Reliable and High-Performance Computing
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
Abstract
Speculative execution is a method to increase instruction level parallelism which can be exploited by both super-scalar and VLIW architectures. The key to a successful general speculation strategy is a repair mechanism to handle mispredicted branches and accurate reporting of exceptions for speculated instructions. Multiple instruction rollback is a technique developed for recovery from transient processor failures. Many of the difficulties encountered during recovery from branch misprediction or from instruction re-execution due to exceptions in a speculative execution architecture are similar to those encountered during multiple instruction rollback.
This paper investigates the applicability of a recently developed compiler-assisted multiple instruction rollback scheme to aid in speculative execution repair. Extensions to the compiler-assisted scheme to support branch and exception repair are presented along with performance measurements across ten application programs.
1 Introduction
Super-scalar and VLIW architectures have been shown effective in exploiting instruction level parallelism (ILP) present in a given application [1-3]. Creating additional ILP in applications has been the subject of study in recent years [4-6]. Code motion within a basic block is insufficient to unlock the full potential of super-scalar and VLIW processors with issue rates greater than two [3]. Given a trace of the most frequently executed basic blocks, limited code movement across block boundaries can create additional ILP at the expense of requiring complex compensation code to ensure program correctness [7]. Combining multiple basic blocks into superblocks permits code movement within the superblock without the compensation code required in standard trace scheduling [3].
* International Business Machines Corporation, Boca Raton, FL.
† This research was supported in part by the National Aeronautics and Space Administration (NASA) under grant NASA NAG 1-613, in cooperation with the Illinois Computer Laboratory for Aerospace Systems and Software (ICLASS), and in part by the Department of the Navy and managed by the Office of the Chief of Naval Research under Contract N00014-91-J-1283.
General upward and downward code movement
across trace entry points (joins) and general down-
ward code motion across trace exit points (branches,
or forks) is permitted without the need for special
hardware support [7]. Sophisticated hardware support
is required, however, for unrestricted upward code mo-
tion across a branch boundary. Such code motion
is referred to as speculative execution and has been
shown to substantially enhance performance over non-
speculated architectures [8-10]. This paper focuses on
the support hardware for speculative execution, which
ensures correct operation in the presence of except-
ing speculated instructions (referred to as exception
repair) and of mispredicted branches (referred to as
branch repair). It is shown that data hazards which re-
sult from exception and branch repair are very similar
to data hazards that result from multiple instruction
rollback, and that techniques used to resolve rollback
data hazards are applicable to exception and branch
repair.
The remainder of the paper is organized as follows.
Section 2 gives a brief overview of a compiler-assisted
multiple instruction rollback (MIR) scheme to be used
as a base for application to speculative execution re-
pair (SER). Section 3 describes speculative execution
and the requirements for exception repair and branch
repair. Section 4 introduces a schedule reconstruction
scheme and extends the compiler-assisted rollback
scheme. Section 5 describes read buffer flush costs and
Section 6 presents performance impacts which result
from read buffer flushes.
2 Compiler-Assisted Multiple Instruc-
tion Rollback Recovery
2.1 Hazard Classification
Within a general error model, data hazards result-
ing from instruction retry are of two types [11-13].
On-path hazards are those encountered when the in-
struction path after rollback is the same as the initial
path and branch hazards are those encountered when
the instruction path after rollback is different than the
initial path. As shown in Figure 1, rx represents an on-path hazard where during the initial instruction sequence rx is written and after rollback is read prior to being re-written. As shown in Figure 2, ry represents a branch hazard where the initial instruction sequence writes ry and after rollback ry is read prior to being re-written, however this time not along the original path.
[Figure 1: On-path data hazard.]
[Figure 2: Branch data hazard.]
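The on-path definition can be illustrated with a minimal Python sketch. The instruction encoding (a destination register plus a tuple of source registers) and the function name are assumptions made for illustration; rollback is assumed to return to the first instruction of the window and to repeat the same path.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: (destination register, tuple of source registers).
Instr = Tuple[str, Tuple[str, ...]]

def on_path_hazards(window: List[Instr]) -> Set[str]:
    """Registers written in `window` whose first access on re-execution is a read."""
    written = {dest for dest, _ in window}
    first_access_is_read: Set[str] = set()
    seen: Set[str] = set()
    for dest, srcs in window:
        for reg in srcs:              # sources are read before dest is written
            if reg not in seen:
                first_access_is_read.add(reg)
                seen.add(reg)
        seen.add(dest)
    return written & first_access_is_read

# r1 = r2 + r3 ; r2 = r1 * r4 : r2 is read first and overwritten later.
window = [("r1", ("r2", "r3")), ("r2", ("r1", "r4"))]
hazards = on_path_hazards(window)
```

Here r2 is reported because its old value is consumed before the window redefines it, while r1 is not, since re-execution rewrites r1 before any read.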
2.2 On-path Hazard Resolution Using a
Read Buffer
Hardware support consisting of a read buffer of size
2N, as shown in Figure 3, has been shown to be ef-
fective in resolving on-path hazards [11-13]. The read
buffer maintains a window of register read history. If
an on-path hazard is present, then prior to writing
over the old value of the hazard register, a read of
that value must have taken place within the last N
instructions (else after rollback of ≤ N, a read of the hazard register would not occur before a redefinition). Key to this scenario is the fact that the original path is repeated. Branch hazard resolution is left to the compiler. At rollback, the read buffer is flushed back to the general purpose register file (GPRF), restoring the register file to a restartable state. The primary advantage of the read buffer is that it does not require an additional read port as with a history buffer, replication of the GPRF as with the future file, or bypass logic as with the reorder buffer or delayed write buffer [14,15].
[Figure 3: Read buffer.]
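A toy software model of the read buffer may make the flush step concrete. This is a sketch under stated assumptions, not the hardware design: the class and method names are hypothetical, the 2N entries are assumed to hold up to two source reads for each of the last N instructions, and rollback is assumed to return to the oldest instruction still covered by the buffer.

```python
from collections import deque

class ReadBuffer:
    """Toy model: FIFO of the last 2N register reads as (register, value)."""
    def __init__(self, n: int):
        self.entries = deque(maxlen=2 * n)   # oldest entries fall off the front

    def record_read(self, reg: int, value: int) -> None:
        self.entries.append((reg, value))

    def flush(self, gprf: dict) -> int:
        """Write saved values back, newest first, so the oldest value wins."""
        count = len(self.entries)
        while self.entries:
            reg, value = self.entries.pop()
            gprf[reg] = value
        return count

gprf = {1: 10, 2: 20}
rb = ReadBuffer(n=2)
rb.record_read(1, gprf[1])   # r1 is read ...
gprf[1] = 99                 # ... then overwritten (an on-path hazard)
rb.record_read(1, gprf[1])   # a later read sees the new value
flushed = rb.flush(gprf)     # rollback restores the oldest saved value
```

Flushing newest-first means that when a register appears more than once, the value from its earliest read is the one left in the register file.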
2.3 Branch Hazard Removal Compiler
Transformations
Compiler transformations have been shown to be
effective in resolving branch hazards [11, 12]. Branch
hazard resolution occurs at three levels: 1) pseudo code, 2) machine code, and 3) post-pass. Resolution at the pseudo code level would be accomplished by renaming the pseudo register ry of instruction Ii (Figure 2) to rz. Node splitting, loop expansion and loop protection transformations aid in breaking pseudo register equivalence relationships so that renaming can be performed. After the pseudo registers are mapped to physical registers, some branch hazards could reappear. This is prevented at the machine code level by adding hazard constraints to live range constraints prior to register allocation. Branch hazards that remain after the first two levels can be resolved by either creating a "covering" on-path hazard or by inserting nop (no operation) instructions ahead of the hazard instruction until the rollback is guaranteed to be under the branch. Given the branch hazard of Figure 2, a covering on-path hazard is created by inserting an MOV ry, ry instruction immediately before the instruction in which ry is defined. This guarantees that the old value of ry is loaded into the read buffer and is available to restore the register file during rollback.
3 Speculative Execution
Figures 4 and 5 illustrate the two basic problems which are encountered when attempting upward code motion across a branch. As shown in Figure 4, if the speculated instruction (i.e., an instruction moved upward past one or more branches) modifies the system state, and due to the branch outcome the speculated instruction should not have been executed, program correctness could be affected. Figure 5 illustrates that if the speculated instruction causes an exception, and again due to the branch outcome, the excepting instruction should not have been executed, program performance or even program correctness could be affected.
[Figure 4: r1 in live_out of taken path.]
[Figure 5: Speculated instruction traps.]
3.1 Branch Repair
Figure 6 shows an original instruction schedule and a new schedule after speculation. Instructions d, i, and f have been speculated above branches c and g from their respective fall-through paths.2 Speculated instructions are marked "(s)." The motivation for such a schedule might be to hide the load delay of the speculated instructions or to allow more time for the operands of the branch instructions to become available. If c commits to the taken path (i.e., it is mispredicted by the static scheduler), some changes to the system state that have resulted from the execution of d, i, and f may have to be undone. No update is required for the PC; execution simply begins at j. If instead c commits to the fall-through path but g commits to the taken path, then only i's changes to the system state may have to be undone.
Not all changes to the system state are equally important. If, for example, d writes to register rx and rx ∉ live_in(j) (i.e., along the path starting at j, a redefinition of rx will be encountered prior to a use of rx [16]), then the original value of rx does not have to be restored. Inconsistencies to the system state as a result of mispredicted branches exhibit similarities to branch hazards in multiple instruction rollback [11,12]. Given this similarity between branch hazards due to instruction rollback and branch hazards due to speculative execution, compiler-driven data-flow manipulations, similar to those developed to eliminate branch hazards for MIR [11,12], can be used to resolve branch hazards that result from speculation. Such compiler transformations have been proposed for branch misprediction handling [9]. Since re-execution of speculated instructions is not required for branch misprediction, compiler resolution of branch hazards becomes a sufficient branch repair technique.
2 For this example it is assumed that the fall-through paths are the most likely outcome of the branch decisions at c and g.
[Figure 6: Branch repair. Panels: Original Schedule, Speculated Schedule, Recovery Blocks (RB_c and RB_g).]
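The live_in test above can be written as a small Python sketch. The destination registers assigned to d, f, and i and the live_in set are hypothetical values chosen for illustration; Figure 6 does not give them.

```python
from typing import Dict, List, Set

def registers_to_restore(speculated_writes: Dict[str, str],
                         undone: List[str],
                         live_in_target: Set[str]) -> Set[str]:
    """Registers written by undone speculated instructions that are live at the target."""
    return {speculated_writes[i] for i in undone} & live_in_target

# Hypothetical destination registers for the speculated instructions of Figure 6.
writes = {"d": "r6", "f": "r7", "i": "r2"}
live_in = {"r2", "r7"}                                   # hypothetical live_in set
restore_c = registers_to_restore(writes, ["d", "i", "f"], live_in)  # c mispredicted
restore_g = registers_to_restore(writes, ["i"], live_in)            # only g mispredicted
```

With these assumed values, r6 is never restored because it is not live at the branch target, which mirrors the rx ∉ live_in(j) case in the text.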
3.2 Exception Repair
Figure 6 also demonstrates the handling of speculated trapping instructions. If d is a trapping instruction and an exception occurred during its execution, handling of the exception must be delayed until c commits so that changes to the system state are minimized, and in some cases to ensure that repair is possible in the event that c is mispredicted. If c commits to the taken path, the exception is ignored and d is handled like any other speculated instruction given a branch mispredict. If c was correctly predicted, three exception repair strategies are possible. The first is to undo the effects of only those instructions speculated above c (i.e., d, i, and f) and then branch to a recovery block RB_c [10] as shown in Figure 6. The address of the recovery block can be obtained by using the PC value of the excepting instruction as an index into a hash table. This strategy ensures precise interrupts [14,17] relative to the nonspeculated schedule but not relative to the original schedule. Recovery blocks can cause significant code growth [10]. The second strategy undoes the effects of all instructions subsequent to d (i.e., i, b, and f), handles the exception, and resumes execution at instruction i [9]. This latter strategy provides restartable states and does not require recovery blocks. A third exception repair strategy undoes the effects of only those subsequent instructions that are speculated above c (i.e., only i and f), handles the exception, and resumes execution at instruction i; this time, however, only speculated instructions are executed until c is reached. The improved efficiency of strategy 3 over that of strategy 2 comes at the cost of slightly more complex exception repair hardware.
When a branch commits and is mispredicted, the
exception repair hardware must perform three func-
tions: 1) determine whether an exception has occurred
during the execution of a speculated instruction, 2) if
an exception has occurred, determine the PC value
of the excepting instruction, and 3) determine which
changes to the system state must be undone. Func-
tions 1 and 2 are similar to error detection and location
in multiple instruction rollback. Function 3 is similar
to on-path hazard resolution in multiple instruction
rollback [11,12, 18]. On-path hazards assume that af-
ter rollback the initial instruction sequence from the
faulty instruction to the instruction where the error
was detected is repeated.
[Figure 7: Exception repair.]
Figure 7 illustrates the speculation of a group of instructions and re-execution strategy 3. The load instruction traps, but the exception is not handled until the branch instruction commits to the fall-through path. Control is then returned to the trapping instruction. This scenario is identical to multiple instruction rollback where an error occurs during the load instruction and is detected during the branch instruction. For this example, only r1 must be restored during rollback since r4 and r5 will be rewritten prior to use during re-execution. Figure 7 shows that exception repair hazards in speculative execution are the same as on-path hazards in multiple instruction rollback, and a read buffer as described in Section 2 can be used to resolve these hazards. The depth of the read buffer is
the maximum distance from Ib to In along any backwards walk,3 where In is a trapping instruction that was speculated above branch instruction Ib.
3.3 Schedule Reconstruction
Assumed in Figures 6 and 7 are mechanisms to
identify speculative instructions, determine the PC value of excepting speculated instructions, and determine how many branches a given instruction has been speculated above. An example of the latter case is shown in Figure 6, where instructions d, i, and f are undone if c is mispredicted; however, only i must be undone if g is mispredicted.
If the hardware had access to the original code
schedule, the design of these mechanisms would be
straightforward. Unfortunately, static scheduling re-
orders instructions at compile-time and information as
to the original code schedule is lost. To enable recov-
ery from mispredicted branches and proper handling
of speculated exceptions, some information relative to
the original instruction order must be present in the compiler-emitted instructions. This will be referred to as schedule reconstruction.
By limiting the flexibility of the scheduler, less information about the original schedule is required. For example, if speculation is limited to one level only (i.e., above a single branch), a single bit in the opcode field is sufficient to indicate that the instruction has been moved above the next branch [8]. The hardware would then know exactly which instruction effects to undo (i.e., the ones with this bit set). Also, removing branch hazards directly with the compiler permits general speculation with no schedule reconstruction for branch repair [9].
4 Implicit Index Schedule Reconstruc-
tion
Implicit index scheduling supports general speculation of regular and trapping instructions. The scheme was inspired by the handling of stores in the sentinel scheduling scheme [9] and was designed to exploit the unique properties of the read buffer hardware design described in Section 2. Schedule reconstruction is accomplished by marking each instruction speculated or nonspeculated by including a bit in the opcode field, and using this encoding to maintain an operand history of speculated instructions in a FIFO queue called a speculation read buffer (SRB). The SRB operates similarly to a read buffer, with additional provisions for exception handling.
3 A walk is a sequence of edge traversals in a graph where the edges visited can be repeated [19].
4.1 Exception Repair Using a Speculation
Read Buffer
Figure 8 shows an original code schedule and two speculative schedules, along with the contents of the SRB at the time branches Ic and Ig commit. Instructions Id and If have been speculated above branch instruction Ic, and Ii has been speculated above both Ig and Ic. The encoding of speculated instructions informs the hardware that the source operand values are to be saved in the SRB, along with the corresponding register addresses and the PC of the speculated instruction.
Speculated instructions execute normally unless
they trap. If a speculated instruction traps, the ex-
ception bit in the SRB which corresponds to the trap-
ping instruction is set and program execution contin-
ues. Subsequent instructions that use the result of the
trapping instruction are allowed to execute normally.
A chk_except(k) instruction is placed in the home block of each speculated instruction. Only one chk_except(k) instruction is required for a home block. As the name implies, chk_except(k) checks for pending exceptions. The command can simultaneously interrogate each location in the SRB by utilizing the bit field k. As shown in schedule 1 of Figure 8, the chk_except(001111) that follows Ic checks exceptions for instructions Id and If. If a checked exception bit is set, the SRB is flushed in reverse order, restoring the appropriate register and PC values. Execution can then begin with the excepting instruction.
Figure 8 illustrates several on-path hazards which are resolved by the SRB. In schedule 1, if Ii traps and the branch Ic commits to the taken path, Ii has corrupted r2 and If has corrupted r7. Flushing the SRB up through Ii restores both registers to their values prior to the initial execution of Ii. Note that register r6 is also corrupted but not restored by the SRB, since after rollback, r6 will be rewritten with a correct value before the corrupted value is used.
As an alternative to checking for exceptions in each home block, the exception could be handled when the exception bit reaches the bottom of the SRB. This is similar to the reorder buffer used in dynamic scheduling [14] and eliminates the cost of the chk_except(k) command; however, it increases the exception handling latency, which can impact performance depending on the frequency of exceptions.

Original Schedule
Ia: r1 = r2 * r3
Ib: r3 = r4 + r5
Ic: bne r1, r3, Ij
Id: r6 = r7 * r8
Ie: r8 = r8 + 4
If: r7 = r7 + 4
Ig: bne r8, r7, Ik
Ih: r6 = r6 + 4
Ii: r2 = MEM(r2)

SRB Contents (depth 2N; columns: PC, operand value, Reg. No.)
If    -            0
If    value(r7)    7
Id    value(r8)    8
Id    value(r7)    7
Ii    -            0
Ii    value(r2)    2

Figure 8: Exception repair using a speculation read buffer (SRB). Panels: Original Schedule, Speculated Schedule 1, Speculated Schedule 2, SRB Contents.

Implicit index scheduling derives its name from the ability of the compiler to locate a particular register value within the SRB. This is possible only if the dynamically occurring history of speculated instructions is deterministic at branch boundaries. Superblocks guarantee this by ensuring that the sole entry into the superblock is at the header and by limiting speculation to within the superblock. For standard blocks, bookkeeping code [7] can be used to ensure this deterministic behavior.
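The exception check and flush can be sketched in Python as follows. This is a minimal model under stated assumptions, not the hardware: the names SrbEntry, chk_except, and flush_for_exception are hypothetical, the leftmost bit of k is assumed to select the oldest SRB entry, the exception bit is assumed to be kept on the first entry of the trapping instruction, and the register values are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class SrbEntry:
    pc: str                 # PC (label) of the speculated instruction
    reg: Optional[int]      # source register number, None for an empty slot
    value: Optional[int]    # value read from that register
    exc: bool = False       # exception bit

def chk_except(srb: List[SrbEntry], k: str) -> bool:
    """True if any entry selected by bit field k has its exception bit set.
    Assumption: the leftmost bit of k selects the oldest entry (srb[0])."""
    return any(bit == "1" and e.exc for bit, e in zip(k, srb))

def flush_for_exception(srb: List[SrbEntry], gprf: Dict[int, int]) -> str:
    """Flush newest-to-oldest down to the oldest excepting entry; return its PC."""
    first = next(i for i, e in enumerate(srb) if e.exc)
    for e in reversed(srb[first:]):
        if e.reg is not None:
            gprf[e.reg] = e.value
    restart_pc = srb[first].pc
    del srb[first:]
    return restart_pc

gprf = {2: 55, 6: 56, 7: 11, 8: 8}          # state after the speculated instructions ran
srb = [SrbEntry("Ii", 2, 100, exc=True), SrbEntry("Ii", None, None),
       SrbEntry("Id", 7, 7), SrbEntry("Id", 8, 8),
       SrbEntry("If", 7, 7), SrbEntry("If", None, None)]
pending_first = chk_except(srb, "001111")   # Id and If: nothing pending
pending_second = chk_except(srb, "110000")  # Ii: exception pending
restart = flush_for_exception(srb, gprf)
```

In this run r2 and r7 return to their pre-speculation values while r6 keeps its corrupted value, matching the observation that r6 is rewritten during re-execution before it is used.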
4.2 Branch Repair Using a Speculation
Read Buffer
As described in Section 2, branch repair can be handled by resolving branch hazards with the compiler. Branch hazard resolution in multiple instruction rollback can be assisted by the read buffer when covering on-path hazards are present, reducing the performance cost of variable renaming [11,12]. In a similar fashion, the SRB can assist in branch repair. Figure 9 shows the original code schedule and the two speculative schedules of Figure 8. For this example, it is
assumed that r2, r6, and r7 are among the elements in both
live_in(Ij) and live_in(Ik).
As shown in schedule 1, if branch instruction Ic commits to the taken path, r2, r6, and r7, which were modified in Ii, Id, and If, respectively, must be restored. If instead Ic commits to the fall-through path and Ig commits to the taken path, only r2 must be restored. Registers r2 and r7 are rollback hazards that result from exception repair; therefore, the SRB contains their unmodified values. By including a flush(k) command at the target of Ic and Ig, the SRB can be used to restore r2 and/or r7 given a misprediction of Ic or Ig.
The flush(k) command selectively flushes the appropriate register values given a branch misprediction. For example, in schedule 2 of Figure 9, if Ic is predicted correctly and Ig is mispredicted, the SRB is flushed in reverse order up through Ii, restoring value(r2) from Ii but not restoring value(r7) from If. Since speculation is always from the most probable branch path, the flush(k) command is always placed on the most improbable branch path, minimizing the performance penalty. Not all branch hazards are resolved by the presence of on-path hazards. These remaining hazards can be resolved with compiler transformations.
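A selective flush can be sketched in Python as below. The function name follows the flush(k) command in the text, but the slot layout, the bit ordering (leftmost bit of k selects the oldest slot), and all register values are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Each SRB slot: (register number or None, saved value or None), oldest first.
Slot = Tuple[Optional[int], Optional[int]]

def flush(srb: List[Slot], k: str, gprf: Dict[int, int]) -> int:
    """Restore the slots selected by k, newest first. Returns slots written.
    Assumption: the leftmost bit of k selects the oldest slot."""
    restored = 0
    for bit, (reg, value) in reversed(list(zip(k, srb))):
        if bit == "1" and reg is not None:
            gprf[reg] = value
            restored += 1
    return restored

# Assumed schedule 2 ordering, oldest first: Id, Id, Ii, Ii, If, If.
srb = [(7, 70), (8, 80), (2, 20), (None, None), (7, 70), (None, None)]

after_ic = {2: 999, 7: 74, 8: 80}        # hypothetical state, Ic mispredicted
n_ic = flush(srb, "111010", after_ic)    # restores r7, r8, r2

after_ig = {2: 999, 7: 74, 8: 80}        # hypothetical state, only Ig mispredicted
n_ig = flush(srb, "001000", after_ig)    # restores r2 only
```

Under these assumptions the second call leaves r7 untouched, which corresponds to restoring value(r2) from Ii without restoring value(r7) from If.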
5 SRB Flush Penalty
The examples of Section 4 demonstrate that
compiler-assisted multiple instruction rollback can be
applied to both branch repair and exception repair in a
speculative execution architecture. The flush penalty
of the read buffer is not a key concern in multiple instruction
rollback applications since instruction faults
are typically very rare. In application to exception re-
pair in speculative execution, the SRB flush penalty is
not a major concern due to the infrequency of ex-
ceptions involving speculated instructions. However,
in application to branch repair, the SRB flush penalty
could produce significant performance impacts. Studies of branch behavior show a conditional branch frequency of 11% to 17% [20]. Static branch prediction methods result in branch mispredictions in the range of 5% to 15%. This results in a branch repair frequency as high as 2.5%. Assuming a CPI (clock cycles per instruction) rate of one and an average SRB flush penalty of ten cycles, the performance overhead of the flush mechanism would reach 22.5%. This indicates the importance of minimizing the amount of redundant data stored in the SRB so that the flush penalty is reduced.
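The overhead estimate is a product of three factors and can be checked with a short Python sketch. The function name is hypothetical. The choice of a 15% branch frequency in the second call is an inference from the arithmetic (it reproduces the quoted 22.5%), not something stated in the text.

```python
def flush_overhead(branch_freq: float, mispredict_rate: float,
                   flush_penalty_cycles: float, cpi: float = 1.0) -> float:
    """Added cycles per instruction from SRB flushes, as a percentage of base CPI."""
    repair_freq = branch_freq * mispredict_rate      # branch repairs per instruction
    return 100.0 * repair_freq * flush_penalty_cycles / cpi

upper = flush_overhead(0.17, 0.15, 10)    # upper ends of both quoted ranges
quoted = flush_overhead(0.15, 0.15, 10)   # 15% branch frequency gives about 22.5
```

Taking the upper end of both ranges gives a repair frequency of about 2.55% and an overhead of about 25.5%, so the quoted figures are of the same order.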
Recently, a technique was proposed to reduce the amount of redundant data in a read buffer so that the read buffer size could be reduced [12,13]. A similar technique can be used to assure that only the data required for branch and exception repair is stored in the SRB. In the implicit index scheme of Section 4, a bit indicating whether an instruction is speculated is added to the opcode field. By expanding this field to two bits, operand storage requirements can be specified. Figure 10 shows the reduced contents of the SRB given schedule 1 of Figure 9. In the modified scheme, only the first read of r7 must be maintained. Register r8 is not required since it was not modified. The improved scheme also eliminates blank spaces in the SRB. For this example, the misprediction of Ic in schedule 1 of Figure 9 results in four fewer variables to flush.
The coding of the two speculation bits would be as follows: 00) no save required, 01) save operand 1, 10) save operand 2, and 11) save both operands. If neither operand of a speculated instruction has to be saved in the SRB, the instruction is not marked as speculated. This is not a problem for branch repair; however, if such an instruction traps, the hardware would have no way of knowing not to handle the exception immediately. There would also be no entry in the SRB for the exception bit or for the corresponding PC value. One solution to the problem would be to add another bit to the opcode field which marks speculated trapping instructions. A better solution is to code all speculated trapping instructions which have no operands to save as 01. This will indicate that exception handling is to be delayed and cause a reservation of an entry in the SRB, and also will slightly increase the flush penalty during branch repairs.
[Figure 9: Branch repair using a speculation read buffer (SRB). Panels: Original Schedule, Speculated Schedule 1, Speculated Schedule 2, SRB Contents.]
[Figure 10: SRB with reduced content.]
[Figure 11: Instrumentation code placement.]
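The two-bit coding, including the 01 fallback for speculated trapping instructions, can be sketched in Python. The constant and function names are hypothetical; only the four codes and the fallback rule come from the text.

```python
from typing import List, Tuple

SAVE_NONE, SAVE_OP1, SAVE_OP2, SAVE_BOTH = 0b00, 0b01, 0b10, 0b11

def speculation_bits(save_op1: bool, save_op2: bool, speculated_trapping: bool) -> int:
    """Two-bit operand-save code; a speculated trapping instruction with nothing
    to save is coded 01 so that an SRB entry is still reserved."""
    bits = (SAVE_OP1 if save_op1 else 0) | (SAVE_OP2 if save_op2 else 0)
    if bits == SAVE_NONE and speculated_trapping:
        return SAVE_OP1
    return bits

def operands_to_save(bits: int, operands: Tuple[int, int]) -> List[int]:
    """Register numbers the SRB should record for this instruction."""
    return [reg for mask, reg in ((SAVE_OP1, operands[0]), (SAVE_OP2, operands[1]))
            if bits & mask]

code_plain = speculation_bits(False, False, speculated_trapping=False)
code_trap = speculation_bits(False, False, speculated_trapping=True)
saved = operands_to_save(SAVE_OP2, (7, 8))
```

The fallback costs one extra SRB entry per such instruction, which is the slight increase in flush penalty the text mentions.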
6 Performance Evaluation
6.1 Evaluation Methodology
In this section, results of a read buffer flush penalty
evaluation are presented. The instrumentation code
segments of Figure 11 call a branch error procedure
which performs the following functions:
1. Update the read buffer model.
2. Force actual branch errors during program execution, allowing execution to proceed along an incorrect path for a controlled number of instructions.
3. Terminate execution along the incorrect path and restore the required system state from the simulated read buffer.
4. Measure the resulting flush cycles during the branch repair.
5. Begin execution along the correct path until the next branch is encountered.
An example instrumentation code segment is shown
in Figure 12. Parameters, such as operand saving information, current PC, branch fall-through PC, and
branch target PC values, are passed by the instrumentation
code to the branch error procedure. An
additional miscellaneous parameter contains instruc-
tion type and information used for debugging.
Figure 13 gives a high level flow of operation for the branch error procedure. When a branch instruction in the original application program is encountered, an arm_branch flag is set. Prior to the execution of the next application instruction, the arm_branch flag is checked, and if set, the branch decision made by the application program is set aside. The branch is then predicted by the branch prediction model. Four models are tested in the evaluation: 1) predict taken, 2) predict not taken, 3) dynamic prediction, and 4) static prediction from profiling information. The dynamic prediction model is derived from a two bit counter branch target buffer (BTB) design [21] and is the only model that requires updating with each prediction outcome.
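For readers unfamiliar with two bit counter prediction, a minimal Python sketch of one saturating counter follows. It is a generic illustration, not the BTB design of [21]: the class name, the initial state, and the omission of BTB indexing by branch address are all assumptions.

```python
class TwoBitCounter:
    """Saturating counter: states 0-1 predict not taken, 2-3 predict taken."""
    def __init__(self, state: int = 1):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

def mispredictions(outcomes, counter=None) -> int:
    """Count mispredictions, updating the counter after every outcome."""
    counter = counter or TwoBitCounter()
    misses = 0
    for taken in outcomes:
        misses += counter.predict() != taken
        counter.update(taken)
    return misses

loop_branch = [True] * 8 + [False]      # a loop-closing branch taken 8 times
misses = mispredictions(loop_branch)
```

The update after every outcome is what distinguishes this model from the three static ones, which need no state.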
After the branch is predicted, the prediction is
checked against the actual branch path taken by the
application program. If the prediction was correct, ex-
ecution proceeds normally. If the prediction was incor-
rect, the correct branch path is loaded into the recov-
ery queue along with a branch error detection (BED)
latency, and the predicted path is loaded into the PC.
The BED latency indicates how long the execution of
instructions is to continue along the incorrect path.
The branch error time_out flag is set when the BED
latency is reached. When a branch error is detected,
the register file state is repaired using the read buffer
contents. The PC value of the correct branch path is
obtained from the recovery queue. During branch er-
ror rollback recovery, the number of cycles required to
flush the read buffer during branch repair is recorded.
$_simlb_2_24_0:                            # instruction 24
# Begin brsim sim hook: s1 = 16, s2 = 0: normal
        subu    $sp, 44
        la      $at, $_simlb_2_24_0        # hook address
        sw      $at, 20($sp)
        la      $at, $_simlb_2_24_1        # instruction address
        sw      $at, 24($sp)
        la      $at, $_simlb_2_25_0        # next hook address
        sw      $at, 28($sp)
        li      $at, 8216                  # miscellaneous
        sw      $at, 32($sp)
        li      $at, 16                    # directs read buffer to save register 16
        sw      $at, 40($sp)
        move    $at, $sp
        j       brsim_save
# End brsim sim hook.
$_simlb_2_24_1:
$_simlb_2_25_0:                            # instruction 25
# Begin brsim sim hook: s1 = 16, s2 = 9: branch
        subu    $sp, 44
        la      $at, $_simlb_2_25_0        # hook address
        sw      $at, 20($sp)
        la      $at, $_simlb_2_25_1        # instruction address
        sw      $at, 24($sp)
        la      $at, $_main_6              # next hook address
        sw      $at, 28($sp)
        li      $at, 532505                # miscellaneous
        sw      $at, 32($sp)
        la      $at, $_main_5              # target
        sw      $at, 36($sp)
        li      $at, 304                   # directs read buffer to save registers 16 and 9
        sw      $at, 40($sp)
        move    $at, $sp
        j       brsim_save
# End brsim sim hook.
$_simlb_2_25_1:
        bne     $16, $9, $_main_5          # original instruction
$_main_6:
Figure 12: Instrumentation code sequences.
[Flowchart. Legible box labels: "branch", "arm_branch <- 1",
"update RB model", "load recovery queue with not predicted path",
"restore GPRF from RB model, record flush cycles", "load PC from
recovery queue".]
PC - program counter
GPRF - general purpose register file
RB - read buffer
BPM - branch prediction model
Figure 13: Branch error procedure operation.
Table 1: Application programs.
Program    Static Size   Description
QUEEN      148           eight-queen program
WC         181           UNIX utility
QSORT      252           quick sort algorithm
CMP        262           UNIX utility
GREP       907           UNIX utility
PUZZLE     932           simple game
COMPRESS   1826          UNIX utility
LEX        6856          lexical analyzer
YACC       8099          parser-generator
CCCP       8775          preprocessor for gnu C compiler
It is assumed for this evaluation that two read buffer entries
can be flushed in a single cycle. This corresponds to a
split-cycle-save assumption of the general purpose register file
[12]. Performance overhead due to read buffer flushes (% increase)
is computed as

\[ \mathit{Flush\_OH} = 100 \cdot \frac{\mathit{flush\_cycles}}{\mathit{total\_cycles}} \]

All instructions are assumed to require one cycle for execution.
This assumption is conservative since the MIPS processor used for
the evaluation requires two cycles for a load. The additional
cycles would increase total_cycles and thereby reduce the observed
performance overhead. In addition to accurately measuring flush
costs, the evaluation verifies the operation of the read buffer
and its ability to restore the appropriate system state over a
wide range of applications.
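The overhead computation can be illustrated with a short sketch. The function names and the sample numbers are hypothetical, and total_cycles is taken here to be the instruction cycle count without the flush cycles, which is an assumption about the formula's denominator.

```python
import math


def flush_cycles(entries_flushed, entries_per_cycle=2):
    """Cycles to flush one recovery's read buffer entries, at the
    assumed rate of two entries per cycle."""
    return math.ceil(entries_flushed / entries_per_cycle)


def total_cycles(instructions, loads=0, load_cycles=1):
    """Execution cycles with one cycle per instruction, except loads,
    which cost load_cycles each."""
    return (instructions - loads) + loads * load_cycles


def flush_overhead(flush_cycle_count, total_cycle_count):
    """Flush_OH as a percentage of total cycles."""
    return 100.0 * flush_cycle_count / total_cycle_count


# Hypothetical run: four recoveries flushing 5, 2, 8, and 1 entries.
spent = sum(flush_cycles(n) for n in [5, 2, 8, 1])  # 3 + 1 + 4 + 1 cycles

one_cycle = flush_overhead(spent, total_cycles(100))
two_cycle_loads = flush_overhead(
    spent, total_cycles(100, loads=20, load_cycles=2))
```

With these made-up numbers the one-cycle assumption reports 9.0% overhead, while charging two cycles for each of 20 loads reports 7.5%, which shows the direction in which the one-cycle assumption is conservative.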
The instrumentation insertion transformation operates on the
s-code emitted by the MIPS code generator of the IMPACT C compiler
[3]. The transformation determines which operands require saving
in the read buffer and inserts calls to the initialization, branch
error, and summary procedures. The resulting s-code modules are
then compiled and run on a DECstation 3100. For the evaluation,
BED latencies from 1 to 10 were used. Table 1 lists the ten
application programs evaluated. Static Size is the number of
assembly instructions emitted by the code generator, not including
the library routines and other fixed overhead.
6.2 Evaluation Results
Experimental measurements of read buffer flush overhead
(Flush_OH) for various BED latencies are shown in Figures 14
through 23. The four branch prediction strategies used for the
evaluation are: 1) predict taken (P_Taken), 2) predict not taken
(P_N_Taken), 3) dynamic prediction based on a branch target buffer
(Dyn_Pred), and 4) static branch prediction using profiling data
(Prof_Pred).

[Plots: Flush OH (%) versus BED latency (1 to 10), one curve each
for P_Taken, P_N_Taken, Dyn_Pred, and Prof_Pred.]
Figure 14: Flush penalty: QUEEN.
Figure 15: Flush penalty: WC.
Flush costs were closely related to branch prediction accuracies,
i.e., the more often a branch was mispredicted, the more often
flush costs were incurred. In a speculative execution
architecture, branch prediction inaccuracies result in performance
impacts in addition to the impacts from the branch repair scheme.
Branch misprediction increases the base run time of an application
by permitting speculative execution of unproductive instructions.
Increased levels of speculation increase the performance impacts
associated with branch prediction inaccuracies. Only the
performance impacts associated with read buffer flushes are shown
in Figures 14 through 23.
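Since each misprediction triggers one flush, counting mispredictions per strategy gives a rough view of how often flush costs are incurred. The sketch below covers the three static strategies only. It is an illustration under stated assumptions: the function names are hypothetical, Prof_Pred is modeled as a per-branch majority vote over a profiling trace, and the trace is synthetic rather than taken from the evaluated programs.

```python
from collections import Counter, defaultdict


def profile_directions(profile_trace):
    """Majority direction per branch PC from (pc, taken) pairs.
    Assumption: ties are resolved as taken."""
    votes = defaultdict(Counter)
    for pc, taken in profile_trace:
        votes[pc][taken] += 1
    return {pc: c[True] >= c[False] for pc, c in votes.items()}


def misprediction_counts(trace, profile):
    """Count mispredictions (and so flush events) per static strategy."""
    misses = Counter()
    for pc, taken in trace:
        predictions = {
            "P_Taken": True,
            "P_N_Taken": False,
            # Assumption: branches unseen while profiling predict not taken.
            "Prof_Pred": profile.get(pc, False),
        }
        for name, predicted in predictions.items():
            if predicted != taken:
                misses[name] += 1
    return misses


# Synthetic trace: a loop branch taken 9 times then exiting, plus a
# branch that is never taken.
trace = [(0x10, True)] * 9 + [(0x10, False)] + [(0x20, False)] * 4
misses = misprediction_counts(trace, profile_directions(trace))
```

On this trace the profile-based strategy mispredicts only the loop exit, while each fixed-direction strategy loses on whichever branch runs against it. Profiling and measuring on the same trace is a simplification; separate inputs would normally be used.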
[Plots: Flush OH (%) versus BED latency (1 to 10), one curve each
for P_Taken, P_N_Taken, Dyn_Pred, and Prof_Pred.]
Figure 16: Flush penalty: COMPRESS.
Figure 17: Flush penalty: CMP.
Figure 18: Flush penalty: PUZZLE.
Figure 19: Flush penalty: QSORT.
For nine of the ten applications, P_N_Taken was significantly or
marginally more accurate in predicting branch outcomes than
P_Taken. For QSORT, P_Taken was significantly more accurate than
P_N_Taken. This result demonstrates that in a speculative
execution architecture, it is difficult to guarantee optimal
performance across a range of applications given a choice between
predict-taken and predict-not-taken branch prediction strategies.
For all but one application, Prof_Pred was more accurate than
either P_Taken or P_N_Taken. For CMP, Prof_Pred, P_N_Taken, and
Dyn_Pred were nearly perfect in their prediction of branch
outcomes. Prof_Pred marginally outperformed Dyn_Pred in all
applications except LEX.
The purpose of measuring read buffer flush costs given the
recovery from injected branch errors is to establish the viability
of using a read buffer design for branch repair for speculative
execution. Although in such a speculative schedule only static
prediction strategies would be applicable, the Dyn_Pred model was
included to better assess how varying branch prediction strategies
impact flush costs. Overall, the accuracy of Dyn_Pred fell between
P_Taken/P_N_Taken and Prof_Pred.
Over the ten applications studied, read buffer flush overhead
ranged from 49.91% for the P_Taken strategy in CCCP to 0.01% for
the P_N_Taken strategy in CMP, given a BED of ten. It can be seen
from Figures 14 through 23 that a good branch prediction strategy
is key to a low read buffer flush cost. The results show that
given a static branch prediction strategy using profiling data, an
average BED of ten produces flush costs no greater than 14.8% and
an average flush cost of 8.1% across the ten applications studied.
This performance overhead is comparable to the overhead expected
from a delayed write buffer scheme with a maximum allowable BED of
ten [15]. Given a maximum BED of ten and an average BED of less
than ten, the flush costs of the read buffer would be less than
that of a delayed write buffer, since a delayed write buffer is
designed for a worst-case BED and the flush penalty of a read
buffer is based on the average BED. The observed flush costs are
small in comparison to the substantial performance gain of
speculated architectures over that of nonspeculated architectures
[8-10].

[Plots: Flush OH (%) versus BED latency (1 to 10), one curve each
for P_Taken, P_N_Taken, Dyn_Pred, and Prof_Pred.]
Figure 20: Flush penalty: GREP.
Figure 21: Flush penalty: LEX.
The BED for a given branch in this evaluation corresponds to the
number of instructions moved above a branch in a speculative
schedule. The results of the evaluation indicate that if the
average number of instructions speculated above a given branch is
< 10, then the read buffer becomes a viable approach to handling
branch repair.
[Plots: Flush OH (%) versus BED latency (1 to 10), one curve each
for P_Taken, P_N_Taken, Dyn_Pred, and Prof_Pred.]
Figure 22: Flush penalty: YACC.
Figure 23: Flush penalty: CCCP.
7 Summary
Speculative execution has been shown to be an effective method to
create additional instruction level parallelism in general
applications. Speculating instructions above branches requires
schemes to handle mispredicted branches and speculated
instructions that trap.
This paper showed that branch hazards resulting from branch
mispredictions in speculative execution are similar to branch
hazards in multiple instruction rollback developed for processor
error recovery. It was shown that compiler techniques previously
developed for error recovery can be used as an effective branch
repair scheme in a speculative execution architecture. It was also
shown that data hazards that result in rollback due to exception
repair are similar to on-path hazards, suggesting a read buffer
approach to exception repair.
Implicit index scheduling was introduced to exploit the unique
characteristics of rollback recovery using a read buffer approach.
The read buffer design was extended to include PC values to aid in
rollback from excepting speculated instructions.
Read buffer flush penalties were measured by injecting branch
errors into ten target applications and measuring the flush cycles
required to recover from the branch errors using a simulated read
buffer. It was shown that with a static branch prediction strategy
using profiling data, flush costs under 15% are achievable. The
results of these evaluations indicate that compiler-assisted
multiple instruction rollback is viable for branch and exception
repair in a speculative execution architecture.
8 Acknowledgements
The authors wish to thank Shyh-Kwei Chen and C.-C. Jim Li for
their help with the compiler aspects of this paper. We would like
to thank Scott Mahlke, William Chen, and John Christopher
Gyllenhaal for their excellent technical suggestions and
assistance with the IMPACT C compiler. Finally, we express our
thanks to Janak Patel for his contributions to this research.
References
[1] R. P. Colwell, R. P. Nix, J. O'Donnell, D. B. Papworth, and
P. K. Rodman, "A VLIW Architecture for a Trace Scheduling
Compiler," in Proc. 2nd Int. Conf. Architecture Support
Programming Languages and Operating Syst., pp. 105-111, Oct.
1987.
[2] J. C. Dehnert, P. Y. Hsu, and J. P. Bratt, "Overlapped Loop
Support in the Cydra 5," in Proc. 3rd Int. Conf. Architecture
Support Programming Languages and Operating Syst., pp. 26-38,
April 1989.
[3] P. Chang, W. Chen, N. Warter, and W.-M. W. Hwu, "IMPACT: An
Architecture Framework for Multiple-Instruction-Issue
Processors," in Proc. 18th Annu. Symp. Comput. Architecture,
pp. 266-275, May 1991.
[4] B. R. Rau and C. D. Glaeser, "Some Scheduling Techniques and
an Easily Schedulable Horizontal Architecture for High
Performance Scientific Computing," in Proc. 14th Annu. Workshop
Microprogramming Microarchitecture, pp. 183-198, Oct. 1981.
[5] M. S. Lam, "Software Pipelining: An Effective Scheduling
Technique for VLIW Machines," in Proc. ACM SIGPLAN 1988 Conf.
Programming Language Design Implementation, pp. 318-328, June
1988.
[6] A. Aiken and A. Nicolau, "Optimal Loop Parallelization," in
Proc. ACM SIGPLAN 1988 Conf. Programming Language Design
Implementation, pp. 308-317, June 1988.
[7] J. A. Fisher, "Trace Scheduling: A Technique for Global
Microcode Compaction," IEEE Trans. Comput., vol. c-30, no. 7,
pp. 478-490, July 1981.
[8] M. D. Smith, M. S. Lam, and M. Horowitz, "Boosting Beyond
Static Scheduling in a Superscalar Processor," in Proc. 17th
Annu. Symp. Comput. Architecture, pp. 344-354, May 1990.
[9] S. A. Mahlke, W. Y. Chen, W.-M. W. Hwu, B. R. Rau, and M. S.
Schlansker, "Sentinel Scheduling for VLIW and Superscalar
Processors," in Proc. 5th Int. Conf. Architecture Support
Programming Languages and Operating Syst., pp. 238-247, Oct.
1992.
[10] M. D. Smith, M. A. Horowitz, and M. S. Lam, "Efficient
Superscalar Performance Through Boosting," in Proc. 5th Int.
Conf. Architecture Support Programming Languages and Operating
Syst., pp. 248-259, Oct. 1992.
[11] N. J. Alewine, S.-K. Chen, C.-C. J. Li, W. K. Fuchs, and
W.-M. W. Hwu, "Branch Recovery with Compiler-Assisted Multiple
Instruction Retry," in Proc. 22nd Int. Symp. Fault-Tolerant
Comput., pp. 66-73, July 1992.
[12] N. J. Alewine, Compiler-assisted Multiple Instruction
Rollback Recovery using a Read Buffer. PhD thesis, Tech. Rep.
CRHC-93-06, University of Illinois at Urbana-Champaign, 1993.
[13] N. J. Alewine, S.-K. Chen, W. K. Fuchs, and W.-M. W. Hwu,
"Compiler-assisted Multiple Instruction Rollback Recovery using a
Read Buffer," Tech. Rep. CRHC-93-11, Coordinated Science
Laboratory, University of Illinois, May 1993.
[14] J. E. Smith and A. K. Pleszkun, "Implementing
Precise Interrupts in Pipelined Processors," IEEE
Trans. Comput., vol. 37, pp. 562-573, May 1988.
[15] Y. Tamir and M. Tremblay, "High-Performance Fault-Tolerant
VLSI Systems Using Micro Rollback," IEEE Trans. Comput., vol. 39,
pp. 548-554, Apr. 1990.
[16] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles,
Techniques, and Tools. Reading, MA: Addison-Wesley, 1986.
[17] M. Johnson, Superscalar Microprocessor Design. Englewood
Cliffs, NJ: Prentice-Hall, Inc., 1991.
[18] C.-C. J. Li, S.-K. Chen, W. K. Fuchs, and W.-M. W. Hwu,
"Compiler-Assisted Multiple Instruction Retry," Tech. Rep.
CRHC-91-31, Coordinated Science Laboratory, University of
Illinois, May 1991.
[19] J. A. Bondy and U. Murty, Graph Theory with
Applications. London, England: Macmillan Press
Ltd., 1979.
[20] J. L. Hennessy and D. A. Patterson, Computer Architecture: A
Quantitative Approach. San Mateo, CA: Morgan Kaufmann Publishers,
Inc., 1990.
[21] J. K. Lee and A. J. Smith, "Branch Prediction
Strategies and Branch Target Buffer Design,"
Computer, vol. 17, no. 1, pp. 6-22, Jan. 1984.
