Decomposing the proof of correctness of pipelined microprocessors by Gopalakrishnan, Ganesh & Hosabettu, Ravi
Decomposing the Proof of Correctness of Pipelined Microprocessors
Ravi Hosabettu1, Mandayam Srivas2, Ganesh Gopalakrishnan1
1Department of Computer Science, University of Utah, Salt Lake City, UT 84112
2Computer Science Laboratory, SRI International, Menlo Park, CA 94025
Contact email: hosabett@cs.utah.edu
January 12, 1998
Abstract
We present a systematic approach to decompose and incrementally build the proof of correctness
of pipelined microprocessors. The central idea is to construct the abstraction function using
completion functions, one per unfinished instruction, each of which specifies the effect (on the
observables) of completing the instruction. In addition to avoiding the term-size and case explosion
that can occur during flushing for deep and complex pipelines, and helping to localize errors, our
method can also handle stages with iterative loops. The technique is illustrated on a pipelined as
well as a superscalar pipelined implementation of a subset of the DLX architecture.
Keywords: Processor verification, Decomposition, Incremental verification
Category: A
1 Introduction
Modern microprocessors employ radical optimizations such as superscalar pipelining, speculative
execution and out-of-order execution to enhance their throughput. These optimizations make
microprocessor verification difficult in practice. Most approaches to mechanical verification of
pipelined processors rely on the following key techniques. First, given a pipelined implementation
and a simpler ISA-level specification, they require a suitable abstraction mapping from an
implementation state to a specification state and define the correspondence between the two
machines using a commute diagram. Second, they use symbolic simulation to derive logical
expressions corresponding to the two paths in the commute diagram, which are then tested for
equivalence. An automatic way to perform this equivalence testing is to use ground decision
procedures for equality with uninterpreted functions such as the ones in PVS. This strategy has
been used to verify several processors in PVS [CRSS94,SM96]. Some of the approaches to pipelined
processor verification rely on the user providing the definition for the abstraction function. Burch
and Dill [BD94] observed that the effect of flushing the pipeline, for example by pumping a sequence
of NOPs, can be used to automatically compute a suitable abstraction function. Burch and Dill used
this flushing approach along with a validity checker [JDB95,BDL96] to effectively automate the
verification of pipelined implementations of several processors.
The pure flushing approach has the drawback of generating an impractically large abstraction
function for deeper pipelines. Also, the number of examined cases explodes as the control part
becomes complicated. To overcome this drawback, Burch [Bur96] decomposed the verification problem
into three subproblems and suggested an alternative method for constructing the abstraction
function. This method required the user to add some extra control inputs to the implementation and
set them appropriately while constructing the abstraction function. Along with a validity checker
which needed the user to help with many manually derived case splits, he used these techniques in
superscalar processor verification. However, despite the manual effort involved, the reduction
obtained in the expression size and the number of cases explored, as well as how the method will
scale, is not clear.
In this paper, we propose a systematic methodology to modularize as well as decompose the proof
of correctness of microprocessors with complex pipeline architectures. Called the completion
functions method, our approach relies on the user expressing the abstraction function in terms of a
set of completion functions, one per unfinished instruction. Each completion function specifies the
desired effect (on the observables) of completing the instruction. Notice that one is not obligated
to state how such completion would actually be attained, which, indeed, can be very complex,
involving details such as squashing, pipeline stalls, and even data dependent iterative loops.
Moreover, we strongly believe that a typical designer would have a very clear understanding of the
completion functions, and would not find the task of describing them and constructing the
abstraction function onerous. Thus, in addition to actually gaining from designers' insights,
verification based on the completion functions method has a number of other advantages. It results
in a natural decomposition of proofs. Proofs build up in a layered manner where the designer
actually debugs the last pipeline stage first through a verification condition, and then uses this
verification condition as a rewrite rule in debugging the penultimate stage, and so on. Because of
this layering, the proof strategy employed is fairly simple and almost generic in practice.
Debugging is far more effective than in other methods because errors can be localized to a stage,
instead of having to wade through monolithic proofs. The method is not explicitly targeted towards
any single aspect of processor design such as control, and can naturally handle loops in pipeline
stages.
1.1 Related work
Cyrluk has developed a technique called 'Inverting the abstraction mapping' [Cyr96] for guiding
theorem provers during processor verification. In addition to not decomposing proofs in our sense,
this technique also suffers from large term sizes. Park and Dill have used the idea of aggregation
functions in distributed cache coherence protocol verification [PD96]. The completion functions are
similar to aggregation functions, but our goal is the decomposition of the proof that we can achieve
using them. Additional comparisons with past work are made in subsequent sections.
2 Correctness Criteria for Processor Verification
The completion functions approach aims to realize the correctness criterion expressed in
Figure 1(a) (used in [SH97]), in a manner such that proofs based on it are modular and layered as
pointed out earlier. Figure 1(a) expresses that n implementation transitions which start and end
with flushed states correspond to m transitions in the specification machine, where m is the number
of instructions executed in the specification machine. I_step is the implementation transition
function and A_step is the specification transition function. projection extracts only those
implementation state components visible to the specification, i.e., the observables. This criterion
is preferred because it corresponds to the intuition that a real pipelined microprocessor starting
at a flushed state, running some program and terminating in a flushed state is emulated by a
specification machine whose starting and terminating states are in direct correspondence through
projection. One way to adapt this correctness criterion into an inductive argument would be to
first show that the processor meets the criterion in Figure 1(b), and then check that the
abstraction function ABS satisfies the condition that in a flushed state fs, ABS(fs) =
projection(fs). One also needs to prove that the implementation machine will eventually reach a
flushed state if no more instructions are









Figure 1: Pipelined microprocessor correctness criteria
Intuitively, Figure 1(b) says that if the implementation and the specification machines start in a
corresponding pair of states, then after executing a transition, their new states correspond.
impl_state is an arbitrary reachable state of the implementation machine. Figure 1(b) uses a
modified transition function A_step' instead of A_step since certain implementation transitions
might correspond to executing zero, or more than one, instructions in the specification machine.
The case of zero instructions can arise if, e.g., the implementation machine stalls due to a load
interlock. The case of more than one instruction can arise if, e.g., the implementation machine has
multiple pipelines. The number of instructions executed by the specification machine is provided by
a function on implementation states (called the synchronization function). One of the crucial proof
obligations is to show that this function does not always return zero.
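The relationship between A_step and A_step' can be illustrated with a small sketch. The Python below is only an illustration under assumed names (a_step_prime, and a toy specification whose whole state is a program counter); it is not taken from the PVS development.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def a_step_prime(a_step: Callable[[S], S], spec_state: S, count: int) -> S:
    """Apply the specification step `count` times.

    `count` plays the role of the synchronization function's result for
    the current implementation state: 0 for a stall, 1 for a normal
    cycle, 2 or more for a multiple-issue cycle.
    """
    if count < 0:
        raise ValueError("synchronization count must be non-negative")
    for _ in range(count):
        spec_state = a_step(spec_state)
    return spec_state


# Hypothetical toy specification: every instruction advances the pc by 4.
def toy_a_step(pc: int) -> int:
    return pc + 4
```

With a count of zero the specification state is returned unchanged, which is exactly why the obligation that the synchronization function is not always zero matters: otherwise an implementation that never made progress would satisfy the commute diagram.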
The most difficult task here is to define an appropriate abstraction function and to prove that
Figure 1(b) commutes. One way to define an abstraction function [BD94] is to flush the pipeline so
that all the unfinished instructions complete and update the observables, and then apply a
projection. Since most machines allow for stalling the pipeline, i.e., advancing the implementation
machine without fetching a new instruction, flushing can be performed by a sequence of stall
transitions of the implementation machine. The number of stall transitions required depends on the
depth of the pipeline, stall cycles due to interlocks, etc. This would generate the following
verification condition for proving that Figure 1(b) commutes (where flush is as discussed before):
Flush_VC: A_step(projection(flush(impl_state))) = projection(flush(I_step(impl_state)))
It is practical to prove this verification condition only for simple and shallow pipelines. For
superscalar processors with multiple pipelines and complex control logic, the logical expressions
generated are too large to manage and to check for equivalence. Another drawback is that the number
of stall transitions needed to flush the pipeline must be known a priori. This number, even if
finite, may be indeterminate if the control involves data-dependent loops or if some part of the
processor such as the memory-cache interface is abstracted away for managing the complexity of the
system.
3 The Completion Functions Approach
The completion functions approach is also based on using an abstraction function corresponding to
flushing the entire pipeline. However, this function is not derived via flushing in our basic
approach1. Rather, we construct the abstraction function as a composition of a sequence of
completion functions, each of which, as said earlier, specifies the desired effect (on the
observables) of completing one unfinished instruction. These completion functions must also leave
all non-observable state components unchanged. The order in which these functions are composed is
determined by the program order of the unfinished instructions. The conditions under which each
function is composed with the rest, if any, are determined by whether the unfinished instructions
ahead of it could disrupt the flow of instructions, e.g., by being a taken branch or by raising an
exception. Observe that one is not required to state how these conditions are actually realised in
the implementation. As we illustrate later, this definition of the abstraction function leads to a
very natural decomposition of the proof of the commute diagram and supports incremental
verification. Any mistakes, either in specifying the completion functions or in constructing the
abstraction function, might lead to a false negative verification result, but never a false
positive.
Consider a very simple four stage pipeline with one observable state component regfile, which is
shown in Figure 2. The instructions flow down the pipeline with every cycle in order with no stalls,
hazards, etc. (This is unrealistically simple, but we explain how to handle these artifacts in
subsequent sections.) There can be three unfinished instructions in this pipeline at any time, held
in the three sets of pipeline registers labeled IF/ID, ID/EX, and EX/WB. The completion function
corresponding to an unfinished instruction held in a set of pipeline registers (such as ID/EX) would
state how the different values stored in that set of registers (ID/EX in this example) are combined
to complete that instruction. In our example, the completion functions are C_EX_WB, C_ID_EX and
C_IF_ID. Now the abstraction function, whose effect should be to flush the pipeline, can be
expressed as a composition of these completion functions as follows (we omit projection here as
regfile is the only observable state component):
ABS(impl_state) = C_IF_ID(C_ID_EX(C_EX_WB(impl_state)))
Figure 2: A simple four stage pipeline and decomposition of the proof under completion functions
This definition of the abstraction function leads to a decomposition of the proof of the commute
diagram for regfile as shown in Figure 2. The decomposition shown generates the following series of
verification conditions, the last one of which corresponds to the complete commute diagram.
VC1: regfile(I_step(impl_state)) = regfile(C_EX_WB(impl_state))
VC2: regfile(C_EX_WB(I_step(impl_state))) = regfile(C_ID_EX(C_EX_WB(impl_state)))








I_step executes the instructions already in the pipeline as well as a newly fetched instruction.
Given this, VC1 expresses the following fact: since regfile is updated in the last stage, we would
expect that after I_step is executed, the contents of regfile would be the same as after completing
the instruction in the set EX/WB of pipeline registers.
Now consider the instruction in ID/EX. I_step executes it partially as per the logic in stage EX,
and then moves the result to the set EX/WB of pipeline registers. C_EX_WB can now take over and
complete this instruction. This would result in the same contents of regfile as completing the
instructions held in sets EX/WB and ID/EX of pipeline registers in that order. This is captured by
VC2. VC3 and VC4 are similar. Note that our ultimate goal is to prove only VC4, with the proofs of
VC1 through VC3 acting as 'helpers'. Each verification condition in the above series can be proved
using a standard strategy which involves expanding the outermost function on both sides of the
equation and using the previously proved verification condition (if any) as a rewrite rule to
simplify the expressions, followed by the necessary case analysis, as well as reasoning about the
terms introduced by function expansions. Since we expand only the topmost functions on both sides,
and because we use the previously proved verification condition, the sizes of the expressions
produced during the proof and the required case analysis are kept in check.
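The layered verification conditions can be made concrete on a toy model. The Python sketch below is an illustration only and rests on several assumptions that are not in the paper: instructions are triples (dest, a, b) meaning regfile[dest] := a + b (so there are no data hazards), the newly fetched instruction is passed to I_step as an argument, and the forms of VC3 and VC4 are filled in by following the pattern of VC1 and VC2. Random testing stands in for the symbolic proof done in PVS.

```python
import random
from typing import NamedTuple, Tuple

Instr = Tuple[int, int, int]            # (dest, a, b): regfile[dest] := a + b


class Impl(NamedTuple):
    regfile: Tuple[int, ...]            # the only observable
    if_id: Instr                        # fetched, not yet decoded
    id_ex: Instr                        # decoded, not yet executed
    ex_wb: Tuple[int, int]              # (dest, result) awaiting writeback


def write(rf, dest, val):
    return rf[:dest] + (val,) + rf[dest + 1:]


def a_step(rf, instr):                  # specification: run one instruction
    dest, a, b = instr
    return write(rf, dest, a + b)


def i_step(s, new_instr):               # implementation: one clock cycle
    dest, a, b = s.id_ex
    return Impl(write(s.regfile, *s.ex_wb), new_instr, s.if_id, (dest, a + b))


# Completion functions update the observable and leave the latches alone.
def c_ex_wb(s):
    return s._replace(regfile=write(s.regfile, *s.ex_wb))


def c_id_ex(s):
    return s._replace(regfile=a_step(s.regfile, s.id_ex))


def c_if_id(s):
    return s._replace(regfile=a_step(s.regfile, s.if_id))


def abs_fn(s):
    return c_if_id(c_id_ex(c_ex_wb(s))).regfile


def check_vcs(s, new_instr):
    """Evaluate VC1..VC4 (VC3 and VC4 are assumed forms) on one state."""
    n = i_step(s, new_instr)
    vc1 = n.regfile == c_ex_wb(s).regfile
    vc2 = c_ex_wb(n).regfile == c_id_ex(c_ex_wb(s)).regfile
    vc3 = c_id_ex(c_ex_wb(n)).regfile == c_if_id(c_id_ex(c_ex_wb(s))).regfile
    vc4 = abs_fn(n) == a_step(abs_fn(s), new_instr)
    return vc1, vc2, vc3, vc4


def random_instr(rng, nregs=4):
    return (rng.randrange(nregs), rng.randrange(8), rng.randrange(8))


def random_state(rng, nregs=4):
    rf = tuple(rng.randrange(16) for _ in range(nregs))
    return Impl(rf, random_instr(rng), random_instr(rng),
                (rng.randrange(nregs), rng.randrange(16)))
```

Each condition adds exactly one completion function to both sides of the previous one, which is what allows the previous condition to be reused as a rewrite rule. If a bug were planted in the EX logic of i_step, VC1 would still hold and VC2 would be the first to fail, localizing the error to that stage.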
As mentioned earlier, the completion functions approach also supports incremental and layered
verification. When proving VC1, we are verifying the writeback stage of the pipeline against its
specification C_EX_WB. When proving VC2, we are verifying one more stage of the pipeline, and so on.
This makes it easier to locate errors. In [BD94], if there is a bug in the pipeline, the validity
checker would produce a counterexample - a set of formulas potentially involving all the
implementation variables - that implies the negation of Flush_VC. Such an output is not helpful in
pinpointing the bug.
Another important advantage of the completion functions method is that it is applicable even when
the number of stall transitions to flush the pipeline is indeterminate, which can happen if, e.g.,
the pipeline contains data dependent iterative loops. The completion functions, which state the
desired effect of completing an unfinished instruction, help us express the effect of flushing
directly. The proof that the implementation eventually goes to a flushed state can be done by using
a measure function which returns the number of cycles the implementation takes to flush (this will
be a data dependent expression, not a constant) and showing that either the measure function
decreases after every cycle or the implementation machine is flushed.
A disadvantage of the completion functions approach is that the user must explicitly specify the
definitions for these completion functions and then construct an abstraction function. In a later
section, we describe a hybrid approach to reduce the manual effort involved in this process.
4 Application to DLX and Superscalar DLX Processors
In this section, we explain how to apply our methodology to verify two examples - a pipelined and a
superscalar pipelined implementation of a subset of the DLX processor [HP90]. We describe how to
specify the completion functions and construct an abstraction function, how to handle stalls,
speculative fetching and certain hazards, and illustrate the particular decomposition and the proof
strategies that we used. These are the same examples that were verified by Burch and Dill using the
flushing approach in [BD94] and by Burch using his techniques in [Bur96], respectively. Our
verification is carried out in PVS.
4.1 DLX processor details
The specification of this processor has four state components: the program counter pc, the register
file regfile, the data memory dmem and the instruction memory imem. There are six types of
instructions supported: load, store, unconditional jump, conditional branch, alu-immediate and
3-register alu instruction. The ALU is modeled using an uninterpreted function. The memory system
and the register file are modeled as stores with read and write operations. The semantics of read
and write operations are provided using the following two axioms:
addr1 = addr2 IMPLIES read(write(store,addr1,val1),addr2) = val1
addr1 /= addr2 IMPLIES read(write(store,addr1,val1),addr2) = read(store,addr2)
The specification is provided in the form of a transition function A_step.
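As a quick sanity check on the two store axioms, here is a minimal executable model that satisfies them. It is a sketch under assumptions of our own: the store is a read-only mapping, write returns a fresh store rather than mutating, and unwritten locations read as 0 (the axioms themselves say nothing about unwritten locations).

```python
from types import MappingProxyType


def write(store, addr, val):
    """Return a new store equal to `store` except that `addr` maps to `val`."""
    updated = dict(store)
    updated[addr] = val
    return MappingProxyType(updated)


def read(store, addr):
    """Look up `addr`; the default of 0 is an assumption of this toy model."""
    return store.get(addr, 0)
```

In the PVS development the store is left abstract and only the two axioms are available to the prover, so any property proved there holds for every model of the axioms, not just this one.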
The implementation is a five stage pipeline as shown in Figure 3. There are four sets of pipeline
registers holding information about the partially executed instructions in 15 pipeline registers.
The intended functionality of each of the stages is also shown in the diagram. The implementation
uses a simple 'assume not taken' prediction strategy for jump and branch instructions. Consequently,
if a jump or branch is indeed taken (br_taken signal is asserted), then the pipeline squashes the
subsequent instruction and corrects the pc. If the instruction following a load is dependent on it
(st_issue signal is asserted), then that instruction will be stalled for a cycle in the set IF/ID
of pipeline registers; otherwise instructions flow down the pipeline with every cycle. No
instructions are fetched in the cycle where stall_input is asserted. The implementation provides
forwarding of data to the instruction decode unit (ID stage) where the operands are read. The
details of forwarding are not shown in the diagram. The implementation is also provided in the form
of a transition function I_step. The detailed implementation and specification, as well as the
proofs, can be found at [Hos98].
Figure 3: Pipelined implementation
4.2 Specifying the completion functions
There can be four partially executed instructions in this processor at any time, one each in the
four sets of pipeline registers shown. We associate a completion function with each such
instruction. We need to identify how a partially executed instruction is stored in a particular set
of pipeline registers - once this is done, the completion function for that unfinished instruction
can be easily derived from the specification.
Consider the set IF/ID of pipeline registers. The intended functionality of the IF stage is to
fetch an instruction (place it in instr_id) and increment the pc. The bubble_id register indicates
whether the instruction is valid or not. (It might be invalid, for example, if it is being squashed
due to a taken branch.) So in order to complete the execution of this instruction, the completion
function should do nothing if the instruction is not valid; otherwise it should update the pc with
the target address if it is a jump or a taken branch instruction, update the dmem if it is a store
instruction, and update the regfile if it is a load, alu-immediate or alu instruction, according to
the semantics of the instruction. The details of how these are done are in the specification. This
function is not obtained by tracing the implementation; instead, the user directly provides the
intended effect. Also note that we are not concerned with load interlock or data forwarding while
specifying the completion function. We call this function C_IF_ID.
Consider the set ID/EX of pipeline registers. The ID stage completes the execution of jump and
branch instructions, so this instruction would affect only dmem and regfile. The bubble_ex register
indicates whether the instruction is valid or not, operand_a and operand_b are the two operands read
by the ID stage, opcode_ex and dest_ex determine the opcode and the destination register of the
instruction, and offset_ex is used to calculate the memory address for load and store instructions.
The completion function should state how this information can be combined to complete the
instruction, which again can be gleaned from the specification. We call this function C_ID_EX.
Similarly, the completion functions for the other two sets of pipeline registers - C_EX_MEM and
C_MEM_WB - are specified.
The completion functions for the unfinished instructions in the initial sets of pipeline registers
are very close to the specification and it is very easy to derive them. (For example, C_IF_ID is
almost the same as the specification.) For the unfinished instructions in the later sets of pipeline
registers, however, deriving them is more involved, as the user needs to understand how the
information about unfinished instructions is stored in the various pipeline registers, but the
functions themselves are much simpler. Also, the completion functions are independent of how the
various stages are implemented and depend only on their functionality.
4.3 The decomposition and the proof details
Since the instructions flow down the pipeline in order, the abstraction function is defined as the
composition of these completion functions followed by projection, as shown below:
ABS(impl_state) = projection(C_IF_ID(C_ID_EX(C_EX_MEM(C_MEM_WB(impl_state)))))
The synchronization function, for this example, returns zero if there is a load interlock (st_issue
is true), or stall_input is asserted, or a jump/branch is taken (br_taken is true); otherwise it
returns one. The modified specification transition function is A_step'. The proof that this
function is not always zero was straightforward and we skip the details here. This proof is also
needed in the approach of [BD94].
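The two definitions above are small enough to write out directly. The sketch below is illustrative only: compose is a hypothetical helper showing the order in which the completion functions are applied (oldest instruction first, so C_MEM_WB is innermost), and sync takes the three control signals as plain booleans rather than computing them from an implementation state.

```python
from functools import reduce


def compose(*fns):
    """compose(f, g, h)(x) == f(g(h(x))): the rightmost function runs first."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(fns), x)


def sync(st_issue: bool, stall_input: bool, br_taken: bool) -> int:
    """Number of specification instructions matched by one implementation step."""
    return 0 if (st_issue or stall_input or br_taken) else 1


# Hypothetical stand-ins that just record the order in which they are applied.
def tag(name):
    return lambda trace: trace + (name,)


toy_abs = compose(tag("projection"), tag("C_IF_ID"), tag("C_ID_EX"),
                  tag("C_EX_MEM"), tag("C_MEM_WB"))
```

Applying toy_abs to an empty trace records C_MEM_WB first and projection last, matching the nesting in the definition of ABS.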
4.3.1 The decomposition
The decomposition we used for regfile for this example is shown in Figure 4. The justification for
the first three verification conditions is similar to that in Section 3. There are two verification
conditions corresponding to the instruction in set IF/ID of pipeline registers. If st_issue is true,
then that instruction is not issued, so C_ID_EX ought to have no effect in the lower path in the
commute diagram. VC4_r requires us to prove this under condition P1 = st_issue. VC5_r is for the
case when the instruction is issued, so it should be proved under condition P2 = NOT st_issue.
VC6_r is the verification condition corresponding to the final commute diagram for regfile.
The decomposition for dmem is similar except that the first verification condition VC1_d is slightly
different. Since dmem is not updated in the last stage, VC1_d for dmem states that dmem is not
affected by C_MEM_WB, i.e., dmem(C_MEM_WB(impl_state)) = dmem(impl_state). The rest of the
verification conditions are identical to those for regfile.
Figure 4: The decomposition of the commute diagram for regfile
The commute diagram for pc was decomposed into only three verification conditions. The first one,
VC1_p, stated that pc(C_ID_EX(C_EX_MEM(C_MEM_WB(impl_state)))) = pc(impl_state), since completing
the instructions in the last three sets of pipeline registers will not affect the pc. In addition,
completing the instruction in set IF/ID of pipeline registers will not affect the pc either, if that
instruction is not stalled and is not a jump/taken branch. This is captured by VC2_p. The third
one, VC3_p, was the verification condition corresponding to the final commute diagram for pc.
The decomposition we used for imem had two verification conditions: VC1_i, which stated that
completing the four instructions in the pipeline has no effect on imem, and VC2_i, which
corresponded to the final commute diagram for imem.
4.3.2 The proof
We need a rewrite rule for each register of a particular set of pipeline registers that states that
it is unaffected by the completion functions of the unfinished instructions ahead of it. For
example, for bubble_ex, the rewrite rule is bubble_ex(C_EX_MEM(C_MEM_WB(impl_state))) =
bubble_ex(impl_state). All these rules can be generated and proved automatically. We then defined a
strategy which would set up these, and the definitions and the axioms from the implementation and
the specification, as rewrite rules. We avoid setting up as rewrite rules those definitions on which
we do case analysis - st_issue and br_taken - and those corresponding to the feedback logic.
The correctness of the feedback logic is captured succinctly in the form of the following two
lemmas, one each for the two operands that it reads. If there is a valid instruction in set IF/ID
of pipeline registers and it is not stalled, then the value read in the ID stage by the feedback
logic is the same as the value read from regfile after the three instructions ahead of it are
completed. Their proofs are done by using the strategy above to set up all the rewrite rules,
setting up the definitions in the lemmas being proved as rewrite rules, followed by an assert to do
the rewrites and simplifications, followed by (apply (then* (repeat (lift-if)) (bddsimp) (ground)))
to do the case analysis.
The proof strategy for proving all the verification conditions of regfile and dmem is similar - use
the strategy described above to set up the rewrite rules, set up the previously proved verification
conditions and the lemmas about feedback logic as rewrite rules, expand the outermost function on
both sides, assert to do the rewrites and simplifications, then do case analysis with (apply (then*
(repeat (lift-if)) (bddsimp) (ground))). Minor differences were that some finished without the need
for case analysis (like VC2_r and VC2_d) and some needed the outermost function to be expanded on
only one of the sides (like VC4_r and VC4_d). VC6_r and VC6_d were slightly more involved in that
the various cases introduced by expanding A_step' were considered in the following order -
st_issue, stall_input, br_taken - followed by a similar strategy as described before.
The proofs of the verification conditions for pc were again similar except that we do additional
case analysis after expanding the br_taken condition. Finally, the proofs of verification conditions
for imem were trivial since the instruction memory does not change.
We needed an invariant in this example: that dest_ex is zero_reg whenever bubble_ex is true or
opcode_ex is a store or a jump or a branch instruction. Making dest_ex equal to zero_reg was to
ensure that the regfile was not updated under these conditions. The proof that the invariant is
closed under I_step was however trivial.
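The invariant can be read as a simple predicate over three of the ID/EX pipeline registers. The sketch below is illustrative only; the encoding of zero_reg as register 0 and of opcodes as strings are assumptions of ours, not details from the PVS model.

```python
ZERO_REG = 0                                   # assumed encoding of zero_reg
NO_WRITE_OPCODES = {"store", "jump", "branch"}  # assumed opcode names


def invariant_holds(bubble_ex: bool, opcode_ex: str, dest_ex: int) -> bool:
    """dest_ex must be zero_reg whenever the ID/EX slot writes no register."""
    if bubble_ex or opcode_ex in NO_WRITE_OPCODES:
        return dest_ex == ZERO_REG
    return True
```

Showing that the invariant is closed under I_step amounts to checking that the predicate holds of the new ID/EX registers whenever it held of the old state.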
We make two observations here. The proof of a particular verification condition, say for regfile,
may use the previous verification conditions of all other specification state components, hence
these need to be proved in that order. The particular order in which we did the proof was VC1_r,
VC1_d, VC2_r, VC2_d, VC3_r, VC3_d, the two lemmas for feedback logic, VC4_r, VC4_d, VC5_r, VC5_d,
VC1_i, VC1_p, VC2_p, VC6_r, VC6_d, VC3_p and VC2_i. The second observation is that this is the
particular decomposition that we chose. We could have avoided proving, say, VC4_r, and proved that
goal when it arises within, say, VC6_r, if the prover can handle the term sizes.
Finally we prove that the implementation machine eventually goes to a flushed state if it is stalled
sufficiently long and then check that in that flushed state fs, ABS(fs) = projection(fs). For this
example, this proof was done by observing that bubble_id will be true after two stall transitions
(hence no instruction in set IF/ID of pipeline registers) and that this 'no-instruction'-ness
propagates down the pipeline with every stall transition.
4.4 Superscalar DLX processor
The superscalar DLX processor is a dual issue version of the DLX processor. Both the pipelines have
a structure similar to Figure 3 except that the second pipeline only executes alu-immediate and alu
instructions. In addition, there is one instruction buffer location.
Specifying the completion functions for the various unfinished instructions was similar. A main
difference was how the completion functions of the unfinished instructions in the sets IF/ID of
pipeline registers and the instruction buffer (say the instructions are i, j, k and completion
functions are C_i, C_j and C_k respectively) are composed to handle the speculative fetching of
instructions. These unfinished instructions could be potential branches since the branch
instructions are executed in the ID stage of the first pipeline. So while constructing the
abstraction function, we compose C_j (with C_i(...rest of the completion functions in order...))
only if instruction i is not a taken branch, and then compose C_k only if instruction j is not a
taken branch either. We used a similar idea in constructing the synchronization function too. The
specification machine would not execute any new instructions if any of the instructions i, j, k
mentioned above is a taken branch. It is very easy and natural to express these conditions using
completion functions since we are not concerned with when exactly the branches are taken in the
implementation machine. However, if using the pure flushing approach, even the synchronization
function will have to be much more complicated, having to cycle the implementation machine for many
cycles [Bur96].
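The conditional composition described above can be sketched as a short loop. This is an illustration under assumptions of ours, not the PVS definition: the helper name compose_speculative is hypothetical, each instruction is given as a pair (completion function, taken-branch predicate), and the predicate is evaluated on the state reached after all earlier instructions have been completed.

```python
def compose_speculative(state, rest, steps):
    """Complete `rest`, then each instruction in program order until a
    taken branch has been completed; later instructions are skipped.

    steps: list of (completion_fn, is_taken_branch) in program order.
    """
    s = rest(state)
    for complete, is_taken_branch in steps:
        taken = is_taken_branch(s)   # decided before this instruction completes
        s = complete(s)
        if taken:
            break                    # speculatively fetched successors are dropped
    return s


# Hypothetical stand-ins: a "state" is just the tuple of completed names.
def completes(name):
    return lambda s: s + (name,)


def always(_s):
    return True


def never(_s):
    return False
```

The first instruction is always completed, the second only if the first is not a taken branch, and the third only if neither of the first two is, mirroring the composition of C_i, C_j and C_k in the text.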
Another difference between the two processors was the complex issue logic here, which could issue
zero to two instructions per cycle. We had eight verification conditions on how different
instructions get issued or stalled/move around. (This again is the particular decomposition that we
chose; we can reduce this number by choosing a coarser decomposition.) The complete PVS
specification and proofs can be found at [Hos98]. The proofs of all the verification conditions
again used very similar strategies. The synchronization function had many more cases in this
example and the previously proved verification conditions were used many times over.
4.5 Hybrid approach to reduce the manual effort
In some cases, it is possible to derive the definitions of some of the completion functions automatically
from the implementation to reduce the manual effort. We illustrate this on the DLX example.
The implementation is provided in the form of a typical transition function giving the ’new’ value for
each state component. Since the implementation modifies the regfile in the writeback stage, we take
C_MEM_WB to be new_regfile. This is a function of dest_wb and result_wb. To determine how C_EX_MEM
updates the register file, we perform a step of symbolic simulation of the non-observables, i.e. we replace
dest_wb and result_wb in the above function with their ’new_’ counterparts. Since the MEM stage updates
dmem, C_EX_MEM will have another component modifying dmem, which we simply take as new_dmem.
Similarly, we derive C_ID_EX from C_EX_MEM through symbolic simulation. For the set IF/ID of pipeline
registers, this gets complicated on two counts: the instruction there could get stalled due to a load
interlock, and the forwarding logic appears in the ID stage. So we let the user specify this function
directly. We have done a complete proof using these completion functions. The details of the proof are
similar. An important difference here is that this eliminated the invariant that was needed earlier.
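The derivation step for the register file can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the `Pipe` record and its field names `dest_mem` and `result_mem` are hypothetical stand-ins for the EX/MEM pipeline registers, only the regfile component is modelled (dmem and loads are ignored), and substitution is done by function composition on concrete values rather than on symbolic terms.

```python
from typing import Dict, NamedTuple

RegFile = Dict[int, int]


class Pipe(NamedTuple):
    """Hypothetical non-observable pipeline registers (names are assumptions)."""
    dest_mem: int     # destination of the instruction in EX/MEM
    result_mem: int   # result of the instruction in EX/MEM
    dest_wb: int      # destination of the instruction in MEM/WB
    result_wb: int    # result of the instruction in MEM/WB


def new_regfile(regfile: RegFile, dest_wb: int, result_wb: int) -> RegFile:
    """Writeback part of the implementation transition function."""
    updated = dict(regfile)
    updated[dest_wb] = result_wb
    return updated


def new_dest_wb(p: Pipe) -> int:
    """'New' value of dest_wb after one implementation step."""
    return p.dest_mem


def new_result_wb(p: Pipe) -> int:
    """'New' value of result_wb after one step (ALU instructions only)."""
    return p.result_mem


def c_mem_wb(regfile: RegFile, p: Pipe) -> RegFile:
    """C_MEM_WB is taken to be new_regfile itself."""
    return new_regfile(regfile, p.dest_wb, p.result_wb)


def c_ex_mem(regfile: RegFile, p: Pipe) -> RegFile:
    """C_EX_MEM: new_regfile with dest_wb, result_wb replaced by their 'new' values."""
    return new_regfile(regfile, new_dest_wb(p), new_result_wb(p))
```

Composing in program order, `c_ex_mem(c_mem_wb(regfile, p), p)` gives the register file after both the MEM/WB and the EX/MEM instructions have completed.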
While reducing the manual effort, this way of deriving the completion functions from the implementation
has the disadvantage that we are verifying the implementation against itself. This contradicts our view of
these as desired specifications and negates our goal of incremental verification. In the example above, a
bug in the writeback stage would go undetected and appear in the completion functions that are being
built up. (In fact, VC1_r for regfile is true by the construction of C_MEM_WB and hence need not be proved;
we believe we can formalize this under suitable assumptions on the implementation.) All bugs will
eventually be caught, however, since the final commute diagram uses the ’correct’ specification provided
by the user instead of one generated from the implementation. To combine the advantages of both, we
could use a hybrid approach in which explicitly provided and symbolically generated completion functions
are used in combination. For example, we could derive the completion function for the last stage, specify
it for the penultimate stage, then derive it for the stage before that (from the specification for the
penultimate stage), and so on.
5 Conclusions
We have presented a systematic approach to modularize and decompose the proof of correctness of pipelined
microprocessors. It relies on the user expressing the cumulative effect of flushing in terms of a set of
completion functions, one per unfinished instruction. This results in a natural decomposition of the proof
and allows the verification to proceed incrementally. While this method increases the manual effort on the
part of the user, we found that specifying the completion functions and constructing the abstraction
function was quite easy, and we believe that a typical designer would have an understanding of these. We
also believe that our approach can verify deeper and more complex pipelines than is possible with other
automated methods.
Our future plan is to see how our approach can be applied, or adapted, to verify more complex
pipeline control that uses out-of-order completion of instructions. Our initial attempts at verifying such a
processor appear encouraging. The particular processor we are attempting to verify allows out-of-order
completion of instructions, but has complex issue logic that allows an instruction to complete out of order
only if it does not cause any WAW hazards. The crucial idea here is that we can use this property of the
issue logic to reorder the completion functions of the unfinished instructions to match the program order
(the order used by the abstraction function). Other plans include testing the efficacy of our approach for
verifying pipelines with data-dependent iterative loops and an asynchronous memory interface.
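The reordering idea can be illustrated with a toy Python sketch. It is not taken from the proofs: it assumes each unfinished instruction is reduced to a (destination, value) pair whose value is already computed, so that the only possible interference between two completion functions is a write to the same register.

```python
from typing import Dict, List, Tuple

RegFile = Dict[int, int]
Write = Tuple[int, int]   # (destination register, value); an assumed encoding


def complete(regfile: RegFile, write: Write) -> RegFile:
    """Completion function of one unfinished instruction (register write only)."""
    dest, value = write
    updated = dict(regfile)
    updated[dest] = value
    return updated


def abstract(regfile: RegFile, unfinished: List[Write]) -> RegFile:
    """Apply the completion functions in the order given."""
    for write in unfinished:
        regfile = complete(regfile, write)
    return regfile


def has_waw_hazard(unfinished: List[Write]) -> bool:
    """True if two unfinished instructions write the same register."""
    dests = [dest for dest, _ in unfinished]
    return len(dests) != len(set(dests))
```

When `has_waw_hazard` is false, `abstract` returns the same register file for any ordering of `unfinished`, which is the property that would let completion order be rearranged into program order. With a WAW hazard the two orders can differ.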
Acknowledgements
We would like to thank John Rushby for very useful feedback on the first draft of this paper.
References
[BD94] J. R. Burch and D. L. Dill. Automatic verification of pipelined microprocessor control. In David
Dill, editor, Computer-Aided Verification, CAV ’94, volume 818 of Lecture Notes in Computer
Science, pages 68-80, Stanford, CA, June 1994. Springer-Verlag.
[BDL96] Clark Barrett, David Dill, and Jeremy Levitt. Validity checking for combinations of theories
with equality. In Srivas and Camilleri [SC96], pages 187-201.
[Bur96] J. R. Burch. Techniques for verifying superscalar microprocessors. In Design Automation
Conference, DAC ’96, June 1996.
[CRSS94] D. Cyrluk, S. Rajan, N. Shankar, and M. K. Srivas. Effective theorem proving for hardware
verification. In Ramayya Kumar and Thomas Kropf, editors, Theorem Provers in Circuit Design
(TPCD ’94), volume 910 of Lecture Notes in Computer Science, pages 203-222, Bad Herrenalb,
Germany, September 1994. Springer-Verlag.
[Cyr96] David Cyrluk. Inverting the abstraction mapping: A methodology for hardware verification. In
Srivas and Camilleri [SC96], pages 172-186.
[Hos98] Ravi Hosabettu. PVS specification and proofs of DLX and superscalar DLX examples, 1998.
Available at http://www.cs.utah.edu/~hosabett/pvs/dlx.html.
[HP90] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach.
Morgan Kaufmann, San Mateo, CA, 1990.
[JDB95] R. B. Jones, D. L. Dill, and J. R. Burch. Efficient validity checking for processor verification.
In International Conference on Computer Aided Design, ICCAD ’95, 1995.
[PD96] Seungjoon Park and David L. Dill. Protocol verification by aggregation of distributed actions. In
Rajeev Alur and Thomas A. Henzinger, editors, Computer-Aided Verification, CAV ’96, volume
1102 of Lecture Notes in Computer Science, pages 300-310, New Brunswick, NJ, July/August
1996. Springer-Verlag.
[SC96] Mandayam Srivas and Albert Camilleri, editors. Formal Methods in Computer-Aided Design
(FMCAD ’96), volume 1166 of Lecture Notes in Computer Science, Palo Alto, CA, November
1996. Springer-Verlag.
[SH97] J. Sawada and W. A. Hunt, Jr. Trace table based approach for pipelined microprocessor
verification. In Orna Grumberg, editor, Computer-Aided Verification, CAV ’97, volume 1254 of
Lecture Notes in Computer Science, pages 364-375, Haifa, Israel, June 1997. Springer-Verlag.
[SM96] Mandayam K. Srivas and Steven P. Miller. Applying formal verification to the AAMP5
microprocessor: A case study in the industrial use of formal methods. Formal Methods in System
Design, 8(2):153-188, March 1996.
