Achieving Maximum Performance: A Method for the Verification of Interlocked Pipeline Control Logic by Kerstin Eder & Geoff Barrett
Kerstin Eder

University of Bristol
Department of Computer Science
Woodland Road, MVB
Bristol BS8 1UB, GB
eder@cs.bris.ac.uk
Geoff Barrett
Broadcom DSL BU
320 Bristol Business Park
Coldharbour Lane
Bristol BS16 1EJ, GB
gbarrett@broadcom.com
ABSTRACT
Getting the interlock logic which controls pipeline ﬂow correct
is an important prerequisite for maximising pipeline performance.
Unnecessary pipeline stalls can only be eliminated when they can
be distinguished from those stalls which are necessary to preserve
functional correctness.
We propose a method for deriving a maximum pipeline perfor-
mance speciﬁcation from a complete functional speciﬁcation of the
pipeline control logic. The performance specification can be used
to generate simulation testbench assertions. Alternatively, it can
serve as a basis for formal property checking. The most promising
aspect of our work is, however, the potential to synthesise the
actual control logic from its formal description.
Categories and Subject Descriptors
B.5.2 [Register-Transfer-Level Implementation]: Design Aids - Verification;
B.5.1 [Register-Transfer-Level Implementation]: Design - Control Design, Pipeline
General Terms
Performance, Veriﬁcation
Keywords
Pipeline Stall, Interlock Logic, Veriﬁcation
1. INTRODUCTION
1.1 Motivation
Unnecessary pipeline stalls decrease processor performance. Ensuring
that a pipelined microprocessor utilises its maximum pipeline
throughput is therefore an important task for processor designers.
The complexity of today's microprocessor architectures demands
a systematic and rigorous approach; one that can be provided by
applying formal methods.

This publication is one result of work that was performed during a
collaborative summer project between the University of Bristol and
Broadcom Corporation. Kerstin Eder is grateful for the support of
both organisations during this time.

DAC 2002, June 10-14, 2002, New Orleans, Louisiana, USA.
Copyright 2002 ACM 1-58113-461-4/02/0006 ...$5.00.

This paper describes the method developed while verifying the
performance of the FirePath processor. FirePath [9] is a Broadcom
proprietary LIW/SIMD processor targeted at communications and
multimedia applications, its ﬁrst use being embedded within a DSL
system chip. The processor is based on a pipelined dual instruction
multiple data microarchitecture. Pipeline flow is controlled via
interlock logic that prevents functional hazards. To achieve best
performance, the pipeline should only stall when this is required
to guarantee functional correctness. For example, competition for
a completion bus results in pipeline congestion, potentially
stalling every final pipeline stage that did not win the grant for
the completion bus. Stalls in a completion pipeline stage can cause
preceding stages to stall.
From a performance point of view, the instructions in the pipeline
should move into the next pipeline stage whenever possible. Hence,
a performance bug is a pipeline stall for which there is no functional
justification. Maximum performance can be achieved if there are
no unnecessary stalls. With our method we can derive, by applying
fixed-point iteration, a maximum performance specification from a
specification of the control logic that describes all functional
constraints that cause a pipeline stall. This performance
specification can be included in a verification testbench in the
form of an assertion. Alternatively, the specification can be used
as a basis for formal property checking.
Our method is applicable to any pipelined microprocessor architecture
that uses interlock logic to control pipeline flow, provided
the functional specification of the flow control logic has the
properties described in Section 3.1. These properties are by no means
restrictive; in fact, they are what one would naturally expect when
specifying pipeline control flow.
A very promising aspect of our approach is that the functional
speciﬁcation can serve as the basis for RTL code synthesis. We
are conﬁdent that the basic HDL code for the interlock logic can
be generated fully automatically from its functional speciﬁcation.
This is one topic of our current research.
This paper introduces our method by means of a case study in
Section 2. For this purpose and to focus our attention on the ap-
proach (rather than the architecture itself) we introduce a simple
pipelined processor architecture in Section 2.1. The performance
speciﬁcation of our example architecture will be developed from
its functional specification in Section 2.2. Section 3 establishes
the theoretical soundness of our approach and outlines assumptions
that need to be satisfied to apply the method. Our results are
presented in Section 4, which is followed by some conclusions and an
outlook on further work in Section 5.
1.2 Related Work
Approaches to verify the correctness of pipelined microproces-
sor designs range from traditional simulation-based veriﬁcation via
automatic property or model checking [4, 2] to semi-automatic the-
orem proving [1]. Combined approaches have also been success-
fully applied [3].
In practice, simulation-based veriﬁcation is still widely used to
gain conﬁdence in design correctness; this also applies to pipeline
ﬂow control. Kohno and Matsumoto describe in [5] a test pro-
gram generation tool which supports the veriﬁcation of pipeline
design. Their approach requires a pipeline behaviour speciﬁcation
to be provided. Based on this speciﬁcation their tool automatically
generates test cases. The pipeline behaviour speciﬁcation contains
pipeline deﬁnitions comprising pipeline stages, corresponding data
hazards, bypassing mechanisms and resource usage for each in-
struction category. Any additional stall conditions are speciﬁed in
explicit stall deﬁnitions. Compared to the speciﬁcation used in our
approach, their pipeline behaviour speciﬁcation is very similar, but
also much more detailed since it is describing the behaviour on in-
struction category level while ours is, except for instructions which
enforce an explicit pipeline stall, independent of the actual instruc-
tion set. The assertions obtained from our speciﬁcation can easily
be added to existing test benches which, as we have experienced,
facilitates the integration of our method into an existing veriﬁcation
environment.
In [6], Kroening and Paul introduce a tool that supports the de-
sign of pipelined microprocessors from prepared sequential ma-
chines which are later used as reference models. Prepared in this
context means, amongst other things, that the hardware of the orig-
inal sequential machine has been partitioned into pipeline stages.
The design method of pipelining sequential architectures is further
explained in [7]. Under the assumption that a prepared sequen-
tial machine is given, they present a method that adds a stall en-
gine and also forwarding logic. The correctness of the resulting
pipelined design with respect to the original sequential design is
then proven with the theorem prover PVS [8]. Designers are ex-
pected to provide register mappings for the intermediate results of
all pipeline stages. Again, the logic description of the stall engine is
very similar to our functional speciﬁcation. From our point of view
the tool proposed by Kroening and Paul is beneﬁcial if the design
is developed in a systematic sequential top-down reﬁnement ap-
proach. However, from our experience, a combination of top-down
and bottom-up design is often necessary to meet the tight time-to-
market schedules engineers are faced with in practice. Our method
only requires a functional speciﬁcation of the pipeline ﬂow control
logic. Sufﬁcient information to develop a functional speciﬁcation
is often available early in the design cycle ie in the microarchitec-
tural description of the pipelined processor design. Like Kroening
and Paul we are ultimately aiming to generate HDL code for the
pipeline ﬂow control logic from our functional speciﬁcation.
2. AN EXAMPLE CASE STUDY
2.1 Example Architecture
To demonstrate our method we introduce a simple pipelined mi-
croprocessor architecture. Alongside the architectural description
we also informally introduce our modelling language by giving a
formal description of the features relevant to our analysis.
Our example architecture comprises two pipes, long and short,
which share one combined fetch/decode/issue stage. The long pipe
has two execution stages and one writeback stage, and the short
pipe has one execution/writeback stage. The first stages of both
pipelines operate in lock step, ie progress is only made in
synchrony. Note that the pipeline stages in Figure 1 have been indexed
starting from the fetch/decode/issue stages.

Figure 1: Example pipeline architecture with two pipes and one
completion bus. [Diagram labels: PIPE (long, short); PIPESTAGE (1 to 4);
completion bus c.]

For our formal description we define the sets/types:
\[
\begin{array}{l}
\mathit{PIPE} = \{\mathit{long}, \mathit{short}\} \\
\mathit{PIPESTAGE} = \{1, 2, 3, 4\}
\end{array}
\]
For flow control purposes each pipe stage has a moving or empty
(moe) flag, which propagates backwards to its predecessor and
indicates whether the stage is moving forward on the next cycle or
is currently empty. It is used to determine whether the stage will
block the preceding stage. If the moe flag is set, the respective stage
is ready to take an instruction from the preceding stage on the next
cycle. If the moe flag is clear, the preceding pipe stage gets blocked,
ie does not move forward on the next cycle; it stalls.
Further, each pipe stage has a require to move (rtm) flag, which
propagates forwards to its successor and indicates whether the
contents of that stage intend to move on. So when a stage does not
require to move, it is either not valid (ie it contains a pipeline
bubble) or it is valid but does not need to move into the next pipeline
stage (mostly because its processing finishes at that stage and no
writeback is required).
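\[
\begin{array}{l}
\mathrm{BOOLEAN}\ p.s.\mathit{moe} \\
\mathrm{BOOLEAN}\ p.s.\mathit{rtm} \\
\text{where } p \in \mathit{PIPE},\ s \in \mathit{PIPESTAGE}
\end{array}
\]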
A special (instruction-speciﬁc) ﬂag, which can only be accessed
in the fetch/decode/issue stage of the long pipe, indicates whether
the machine is in a wait state.
\[ \mathrm{BOOLEAN}\ \mathit{op\_is\_WAIT} \]
Our example architecture has eight registers. The register use is
recorded on an eight bit scoreboard, where the register address is
used to access the scoreboard bit-array. Each instruction has one
source and one destination register.
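\[
\begin{array}{l}
\mathit{REGADDRESS} = \{7..0\} \\
\mathit{SDREG} = \{\mathit{src}, \mathit{dst}\} \\
\mathrm{BOOLEAN}\ \mathit{scb}[8] \\
\mathrm{REGADDRESS}\ p.1.r.\mathit{regaddr} \\
\text{where } p \in \mathit{PIPE},\ r \in \mathit{SDREG}
\end{array}
\]

\[
\begin{array}{l}
\mathit{SPEC}_{\mathit{func}} = f(\langle \mathit{long.4.moe}, \mathit{long.3.moe}, \mathit{long.2.moe}, \mathit{long.1.moe}, \mathit{short.2.moe}, \mathit{short.1.moe} \rangle) \\
\quad = (\mathit{long.req} \land \lnot \mathit{long.gnt} \rightarrow \lnot \mathit{long.4.moe}) \\
\quad \land\ (\mathit{long.3.rtm} \land \lnot \mathit{long.4.moe} \rightarrow \lnot \mathit{long.3.moe}) \\
\quad \land\ (\mathit{long.2.rtm} \land \lnot \mathit{long.3.moe} \rightarrow \lnot \mathit{long.2.moe}) \\
\quad \land\ ((\mathit{long.1.rtm} \land \lnot \mathit{long.2.moe} \\
\qquad \lor\ \mathit{op\_is\_WAIT} \\
\qquad \lor\ \lnot \mathit{short.1.moe} \\
\qquad \lor\ \exists r : \mathit{SDREG} : \exists a : \mathit{REGADDRESS} : \mathit{long}.1.r.\mathit{regaddr} = a \land \mathit{scb}[a] \land c.\mathit{regaddr} \neq a) \\
\qquad \rightarrow \lnot \mathit{long.1.moe}) \\
\quad \land\ (\mathit{short.req} \land \lnot \mathit{short.gnt} \rightarrow \lnot \mathit{short.2.moe}) \\
\quad \land\ ((\mathit{short.1.rtm} \land \lnot \mathit{short.2.moe} \\
\qquad \lor\ \lnot \mathit{long.1.moe} \\
\qquad \lor\ \exists r : \mathit{SDREG} : \exists a : \mathit{REGADDRESS} : \mathit{short}.1.r.\mathit{regaddr} = a \land \mathit{scb}[a] \land c.\mathit{regaddr} \neq a) \\
\qquad \rightarrow \lnot \mathit{short.1.moe})
\end{array}
\]
Figure 2: A functional specification describing the necessary pipeline stalls.
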
The final stages of both execution subpipes connect to a completion
bus named c; the short pipe has higher priority for completion
than the long pipe. Each pipe has a dedicated signal to request the
completion bus. There is one grant signal for each pipe. To target
a register the completion bus holds the target register's address.
\[
\begin{array}{l}
\mathrm{BOOLEAN}\ p.\mathit{req},\ p.\mathit{gnt} \\
\mathrm{REGADDRESS}\ c.\mathit{regaddr} \\
\text{where } p \in \mathit{PIPE}
\end{array}
\]
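The declarations above can be collected into a small data model. The following Python sketch is only an illustration: the class name `PipelineSignals`, the field names and the helper `empty_pipeline` are assumptions made here and are not part of the modelling language used in this paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Stages present in each pipe; stage 1 is the fetch/decode/issue stage.
STAGES: Dict[str, Tuple[int, ...]] = {"long": (1, 2, 3, 4), "short": (1, 2)}


@dataclass
class PipelineSignals:
    """Signals visible to the interlock logic in one cycle (hypothetical layout)."""
    rtm: Dict[Tuple[str, int], bool]     # require-to-move flag per (pipe, stage)
    req: Dict[str, bool]                 # completion bus request per pipe
    gnt: Dict[str, bool]                 # completion bus grant per pipe
    op_is_wait: bool                     # wait-state flag of the long pipe's stage 1
    scb: List[bool]                      # eight-bit scoreboard, indexed by register address
    regaddr: Dict[Tuple[str, str], int]  # (pipe, "src" or "dst") -> address in stage 1
    c_regaddr: int                       # target register address on completion bus c


def empty_pipeline() -> PipelineSignals:
    """An idle pipeline: no stage requires to move and the scoreboard is clear."""
    return PipelineSignals(
        rtm={(p, s): False for p, stages in STAGES.items() for s in stages},
        req={p: False for p in STAGES},
        gnt={p: False for p in STAGES},
        op_is_wait=False,
        scb=[False] * 8,
        regaddr={(p, r): 0 for p in STAGES for r in ("src", "dst")},
        c_regaddr=0,
    )
```

The moe flags are deliberately left out of this record: they are the outputs that the specifications in Section 2.2 constrain.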
This concludes the example architecture. Note that the example
processor is simpler than the FirePath in several ways. These in-
clude the fact that FirePath is two-sided and has more and deeper
execution pipes than the example. Furthermore FirePath has sev-
eral pipeline decouple stages (shunts), interrupt logic and several
completion buses. FirePath has been successfully veriﬁed with our
method. To keep the example comprehensible, however, we restrict
this paper to the given simple architecture.
2.2 Speciﬁcation
2.2.1 Functional Speciﬁcation
Our aim is to demonstrate that the pipeline ﬂow control allows
maximum performance. By maximum performance we understand
that there are no unnecessary pipeline stalls. In the setting de-
scribed in Section 2.1 this means that there are no situations where
the moe ﬂag is clear when it should be set.
We can start from a speciﬁcation, which describes under which
conditions it is necessary to stall a pipe stage to avoid hazards. In
practice, this information is typically contained in the microarchi-
tectural description of the processor, and hence should be accessi-
ble at an early stage during processor design. The following speci-
ﬁcation describes the individual constraints under which a pipeline
stall is necessary in our example architecture.
We start with the completion stages which compete for access to
the completion bus. A completion stage does not move on when it
does not win the grant to use the completion bus.
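\[
\begin{array}{l}
\mathit{long.4.rtm} \land \lnot \mathit{long.gnt} \rightarrow \lnot \mathit{long.4.moe} \\
\mathit{short.2.rtm} \land \lnot \mathit{short.gnt} \rightarrow \lnot \mathit{short.2.moe}
\end{array}
\]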
Note that for the completion stages the bus request ﬂag should be
set when the rtm ﬂag is set, which transforms the above statements
to:
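\[
\begin{array}{l}
\mathit{long.req} \land \lnot \mathit{long.gnt} \rightarrow \lnot \mathit{long.4.moe} \\
\mathit{short.req} \land \lnot \mathit{short.gnt} \rightarrow \lnot \mathit{short.2.moe}
\end{array}
\]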
Note further that the completion logic, eg the arbitration scheme
of the bus, can also be included in the functional speciﬁcation. For
the sake of simplicity we will concentrate on the moe ﬂags only.
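To prevent hazards caused by overwriting, the long pipe's
intermediate stages should stall when their succeeding pipe stage stalls.
\[
\begin{array}{l}
\mathit{long.2.rtm} \land \lnot \mathit{long.3.moe} \rightarrow \lnot \mathit{long.2.moe} \\
\mathit{long.3.rtm} \land \lnot \mathit{long.4.moe} \rightarrow \lnot \mathit{long.3.moe}
\end{array}
\]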
In the same way, a fetch/decode/issue stage should stall when the
respective issue pipe is stalled.
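\[
\begin{array}{l}
\mathit{long.1.rtm} \land \lnot \mathit{long.2.moe} \rightarrow \lnot \mathit{long.1.moe} \\
\mathit{short.1.rtm} \land \lnot \mathit{short.2.moe} \rightarrow \lnot \mathit{short.1.moe}
\end{array}
\]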
Further, if the machine is in a wait state then the long pipe
cannot issue.
\[ \mathit{op\_is\_WAIT} \rightarrow \lnot \mathit{long.1.moe} \]
In addition, both the long and the short pipe operate in lock step
during fetch, decode and issue.
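\[
\begin{array}{l}
\lnot \mathit{long.1.moe} \rightarrow \lnot \mathit{short.1.moe} \\
\lnot \mathit{short.1.moe} \rightarrow \lnot \mathit{long.1.moe}
\end{array}
\]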
Note that the above two statements describe the logical equiva-
lence of the moe ﬂags of the initial stages of both pipelines. We use
two implications instead of one equivalence. This makes our spec-
iﬁcation compositional with respect to individual pipeline stages,
which facilitates the maintenance of our speciﬁcation.
If a source or destination register is outstanding then the instruc-
tion cannot be issued. A register is outstanding if it is on the score-
board, ie its scoreboard ﬂag is set, and it is not bypassed, ie it will
not be written to in the current cycle, which means it is not a com-
pletion target register.
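\[
\begin{array}{l}
\forall p : \mathit{PIPE} : (\exists r : \mathit{SDREG} : \exists a : \mathit{REGADDRESS} : \\
\quad p.1.r.\mathit{regaddr} = a \land \mathit{scb}[a] \land c.\mathit{regaddr} \neq a) \rightarrow \lnot p.1.\mathit{moe}
\end{array}
\]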
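This statement completes our specification. To get a more compact
description we can transform the above specification such that
there is exactly one statement for each pipeline stage as given in
Figure 2.

\[
\begin{array}{l}
\mathit{SPEC}_{\mathit{perf}} \\
\quad = (\mathit{long.req} \land \lnot \mathit{long.gnt} \leftarrow \lnot \mathit{long.4.moe}) \\
\quad \land\ (\mathit{long.3.rtm} \land \lnot \mathit{long.4.moe} \leftarrow \lnot \mathit{long.3.moe}) \\
\quad \land\ (\mathit{long.2.rtm} \land \lnot \mathit{long.3.moe} \leftarrow \lnot \mathit{long.2.moe}) \\
\quad \land\ ((\mathit{long.1.rtm} \land \lnot \mathit{long.2.moe} \\
\qquad \lor\ \mathit{op\_is\_WAIT} \\
\qquad \lor\ \lnot \mathit{short.1.moe} \\
\qquad \lor\ \exists r : \mathit{SDREG} : \exists a : \mathit{REGADDRESS} : \mathit{long}.1.r.\mathit{regaddr} = a \land \mathit{scb}[a] \land c.\mathit{regaddr} \neq a) \\
\qquad \leftarrow \lnot \mathit{long.1.moe}) \\
\quad \land\ (\mathit{short.req} \land \lnot \mathit{short.gnt} \leftarrow \lnot \mathit{short.2.moe}) \\
\quad \land\ ((\mathit{short.1.rtm} \land \lnot \mathit{short.2.moe} \\
\qquad \lor\ \lnot \mathit{long.1.moe} \\
\qquad \lor\ \exists r : \mathit{SDREG} : \exists a : \mathit{REGADDRESS} : \mathit{short}.1.r.\mathit{regaddr} = a \land \mathit{scb}[a] \land c.\mathit{regaddr} \neq a) \\
\qquad \leftarrow \lnot \mathit{short.1.moe})
\end{array}
\]
Figure 3: A maximum performance specification.
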
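An assignment to the vector of moving or empty flags comprising
\[ \langle \mathit{long.4.moe}, \mathit{long.3.moe}, \mathit{long.2.moe}, \mathit{long.1.moe}, \mathit{short.2.moe}, \mathit{short.1.moe} \rangle \]
satisfies this specification iff
\[ \vdash f(\langle \mathit{long.4.moe}, \mathit{long.3.moe}, \mathit{long.2.moe}, \mathit{long.1.moe}, \mathit{short.2.moe}, \mathit{short.1.moe} \rangle). \]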
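As an illustration, the functional specification of Figure 2 can be evaluated for a concrete cycle. The Python sketch below assumes a flat dictionary `env` for the input signals; the key names, the function names and `EXAMPLE_ENV` are hypothetical choices made here, not part of the paper's notation.

```python
STAGES = ["long.4", "long.3", "long.2", "long.1", "short.2", "short.1"]


def stall_conditions(env, moe):
    """Per stage, the disjunction of constraints that make a stall necessary."""
    def outstanding(pipe):
        # A source or destination register is on the scoreboard and not bypassed.
        return any(
            env["scb"][env[f"{pipe}.1.{r}.regaddr"]]
            and env["c.regaddr"] != env[f"{pipe}.1.{r}.regaddr"]
            for r in ("src", "dst")
        )

    return {
        "long.4": env["long.req"] and not env["long.gnt"],
        "long.3": env["long.3.rtm"] and not moe["long.4"],
        "long.2": env["long.2.rtm"] and not moe["long.3"],
        "long.1": (env["long.1.rtm"] and not moe["long.2"])
        or env["op_is_WAIT"] or not moe["short.1"] or outstanding("long"),
        "short.2": env["short.req"] and not env["short.gnt"],
        "short.1": (env["short.1.rtm"] and not moe["short.2"])
        or not moe["long.1"] or outstanding("short"),
    }


def satisfies_functional_spec(env, moe):
    """Figure 2: whenever a stall condition holds, the moe flag must be clear."""
    cond = stall_conditions(env, moe)
    return all(not (cond[s] and moe[s]) for s in STAGES)


# Hypothetical cycle: the long pipe requests the completion bus but is not granted.
EXAMPLE_ENV = {
    "long.req": True, "long.gnt": False, "short.req": True, "short.gnt": True,
    "long.3.rtm": True, "long.2.rtm": True, "long.1.rtm": True, "short.1.rtm": True,
    "op_is_WAIT": False, "scb": [False] * 8, "c.regaddr": 0,
    "long.1.src.regaddr": 0, "long.1.dst.regaddr": 1,
    "short.1.src.regaddr": 2, "short.1.dst.regaddr": 3,
}
```

With these inputs the all-clear assignment satisfies the specification, whereas the all-set assignment does not, because `long.4` would move without holding the grant.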
2.2.2 Performance Speciﬁcation
The functional specification in Figure 2 contains formulas of the
form $\mathit{condition} \rightarrow \lnot \mathit{moe}$, where condition represents a disjunction
of the individual constraints that should lead to a pipeline stall.
Note that a violation of this specification occurs when condition is
satisfied but the moe flag is set. In that situation, despite the
condition holding, the respective pipeline stage signals
to its predecessor that it is either moving or empty, so it could be
overwritten on the next cycle. The result is that the pipe is moving
even though it should not, which will cause a hazard. This
corresponds to a functional bug; we therefore call a specification of the
above form a functional specification.
However, our aim is to identify cases where the pipeline stalls
although it is safe to move on. Formally, this can be expressed
by specifying that $\lnot \mathit{moe} \land \lnot \mathit{condition}$ should not occur, where
condition represents a disjunction of all individual constraints that
should lead to a pipeline stall. This is logically equivalent to the
formula $\lnot \mathit{moe} \rightarrow \mathit{condition}$, stating that if the moe flag is not set
then we would expect condition to hold. If condition does not hold,
then we have indeed identified an unnecessary pipeline stall. A
specification containing formulas of this form will be called a
performance specification.
The performance specification that corresponds to the functional
specification introduced in Section 2.2.1 is given in Figure 3. It
can be included in a testbench in the form of an assertion.
Alternatively, the performance specification can serve as a basis for
property checking; we have left property checking for future work.
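The difference between the two kinds of violation can be sketched as a per-cycle check. The Python fragment below is a hypothetical illustration, not the testbench assertion used for FirePath: it assumes that the stall condition of each stage has already been evaluated for the current cycle, and the function names are chosen here for the example.

```python
def functional_violations(cond, moe):
    """Stages that move although a stall condition holds (condition and moe)."""
    return [s for s in moe if cond[s] and moe[s]]


def performance_violations(cond, moe):
    """Stages that stall although no stall condition holds (not moe and not condition)."""
    return [s for s in moe if not moe[s] and not cond[s]]


# Hypothetical cycle: short.2 has no reason to stall but its moe flag is clear.
cond = {"long.4": True, "long.3": True, "short.2": False}
moe = {"long.4": False, "long.3": False, "short.2": False}
print(performance_violations(cond, moe))
```

A testbench assertion in the sense of this section would fail in any cycle for which the second list is non-empty.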
2.2.3 Concluding Remarks on our Case Study
For obvious reasons, a design should satisfy both its functional
specification and its performance specification. Hence, the
combined specification would contain formulas of the form
$\mathit{condition} \leftrightarrow \lnot \mathit{moe}$, expressing that the pipeline stalls if and only if
condition is satisfied. The combined pipeline specification thus
corresponds to the specification obtained by changing all
$\rightarrow$ in Figure 2 into $\leftrightarrow$.
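As a worked instance, consider the completion stage of the long pipe. Its combined formula is
\[ (\mathit{long.req} \land \lnot \mathit{long.gnt}) \leftrightarrow \lnot \mathit{long.4.moe} \]
and negating both sides gives an explicit expression for the flag:
\[ \mathit{long.4.moe} \leftrightarrow (\lnot \mathit{long.req} \lor \mathit{long.gnt}) \]
In words, the final stage of the long pipe is moving or empty exactly when it does not request the completion bus or has been granted it.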
Because the focus of our project was on performance veriﬁca-
tion, we only intended to verify the performance part of the com-
bined speciﬁcation. In practice, it is common to use other veriﬁca-
tion methods to gain conﬁdence in the functional correctness of the
design.
To include the assertions into a testbench, what remains to be
done is to translate them into the HDL used for RTL design and
simulation. In addition, the signals referred to in the assertion need
to be connected to the corresponding signals in the RTL code of
the pipeline design. Close cooperation with designers is required
to ensure that the RTL signals have the intended semantics.
3. DERIVING THE PERFORMANCE SPECIFICATION FROM THE FUNCTIONAL SPECIFICATION
In the previous section we gave an informal and intuitive under-
standing of the relationship between the functional and the perfor-
mance speciﬁcation. The careful reader will have noticed that it is
possible to satisfy our functional speciﬁcation and yet never move
at all. There are, in fact, many possible implementations of this
specification of varying performance. So we must ask which of
these implementations we require. Of course, we want the best one,
ie the one which stalls least often. In this section, we will show
that the best solution exists and prove that it is derived by
changing each $\rightarrow$ in Figure 2 into $\leftrightarrow$, as we have indicated to obtain the
combined functional and performance specification in the previous
section. In general, the best solution may be more complicated than
this but only if control flows in both directions along the pipeline.
3.1 Properties of the Functional Speciﬁcation
We ﬁrst establish some properties of the functional speciﬁcation.
It is important that the functional specification can be provided in
a form that satisfies these properties; they are preconditions for
deriving a maximum performance specification. Section 3.2 describes
how the derivation is performed.
Clearly, the functional specification given in Section 2.2.1 does
not uniquely determine the values that should be assigned to the
moe flags. Note for instance that our specification is satisfied if all
moe flags are set to false.
\[ \vdash f(\langle \mathit{False}, \mathit{False}, \mathit{False}, \mathit{False}, \mathit{False}, \mathit{False} \rangle) \tag{1} \]
Assigning false to all moe flags blocks the entire pipeline. But,
from a functional point of view, correctness is certainly preserved
when the pipeline stalls completely. This is why the specification
in Section 2.2.1 is called a functional (correctness) specification.
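Note further that our specification can be split into two separate
pipeline specifications of the form:
\[ f(\langle \mathit{long.4.moe}, \ldots, \mathit{long.1.moe} \rangle) = \bigwedge_{i=4}^{1} \bigl( \mathit{long}.i.F(\lnot \mathit{long}.(i{+}1).\mathit{moe}, \lnot \mathit{short}.i.\mathit{moe}) \rightarrow \lnot \mathit{long}.i.\mathit{moe} \bigr) \]
and
\[ f(\langle \mathit{short.2.moe}, \mathit{short.1.moe} \rangle) = \bigwedge_{i=2}^{1} \bigl( \mathit{short}.i.F(\lnot \mathit{short}.(i{+}1).\mathit{moe}, \lnot \mathit{long}.i.\mathit{moe}) \rightarrow \lnot \mathit{short}.i.\mathit{moe} \bigr) \]
where the function F, which describes the individual stalling
constraints, is constructed using only conjunction and disjunction (on
its argument variables), ie it is monotonic. With suitable indexing
the above specification can be further generalised to:
\[ f(\langle \mathit{moe} \rangle) = \bigwedge \bigl( \langle F \rangle(\lnot \langle \mathit{moe} \rangle) \rightarrow \lnot \langle \mathit{moe} \rangle \bigr) \]
such that
\[ \langle F \rangle[i](\lnot \langle \mathit{moe} \rangle) \rightarrow \lnot \langle \mathit{moe} \rangle[i] \]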
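From the monotonicity of F it follows that f is disjunctive in the
sense that if two assignments to moe flag vectors satisfy f then their
bitwise disjunction also satisfies f:
\[ \frac{\vdash f(\langle \mathit{moe} \rangle_1) \qquad \vdash f(\langle \mathit{moe} \rangle_2)}{\vdash f(\langle \mathit{moe} \rangle_1 \lor \langle \mathit{moe} \rangle_2)} \tag{2} \]
where
\[ \langle \mathit{moe}_{n..1} \rangle_1 \lor \langle \mathit{moe}_{n..1} \rangle_2 = \langle \langle \mathit{moe} \rangle_1[n] \lor \langle \mathit{moe} \rangle_2[n], \ldots, \langle \mathit{moe} \rangle_1[1] \lor \langle \mathit{moe} \rangle_2[1] \rangle \]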
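Now we can show that:
\[ F(\lnot(\langle \mathit{moe} \rangle_1 \lor \langle \mathit{moe} \rangle_2)) \rightarrow \lnot(\langle \mathit{moe} \rangle_1 \lor \langle \mathit{moe} \rangle_2) \]
Assume: $\vdash f(\langle \mathit{moe} \rangle_1)$, which means $F(\lnot \langle \mathit{moe} \rangle_1) \rightarrow \lnot \langle \mathit{moe} \rangle_1$,
and $\vdash f(\langle \mathit{moe} \rangle_2)$, which means $F(\lnot \langle \mathit{moe} \rangle_2) \rightarrow \lnot \langle \mathit{moe} \rangle_2$.
Proof:
\[
\begin{array}{lll}
 & F(\lnot(\langle \mathit{moe} \rangle_1 \lor \langle \mathit{moe} \rangle_2)) & \\
= & F(\lnot \langle \mathit{moe} \rangle_1 \land \lnot \langle \mathit{moe} \rangle_2) & \\
\rightarrow & F(\lnot \langle \mathit{moe} \rangle_1) \land F(\lnot \langle \mathit{moe} \rangle_2) & \text{since } \lnot \langle \mathit{moe} \rangle_1 \land \lnot \langle \mathit{moe} \rangle_2 \rightarrow \lnot \langle \mathit{moe} \rangle_i \text{ where } i \in \{1, 2\} \text{ and } F \text{ monotonic} \\
\rightarrow & \lnot \langle \mathit{moe} \rangle_1 \land \lnot \langle \mathit{moe} \rangle_2 & \text{by assumption} \\
= & \lnot(\langle \mathit{moe} \rangle_1 \lor \langle \mathit{moe} \rangle_2) & \text{q.e.d.}
\end{array}
\]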
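Property (2) can also be checked by brute force for a fixed set of inputs. The Python sketch below encodes the stall conditions of Figure 2, with the scoreboard check abstracted into the precomputed flags `long.outstanding` and `short.outstanding`; these key names, the function names and `EXAMPLE_ENV` are assumptions made for this illustration. It complements the proof above for one input valuation only and does not replace it.

```python
from itertools import product

STAGES = ["long.4", "long.3", "long.2", "long.1", "short.2", "short.1"]


def stall_conditions(env, moe):
    """Stall condition per stage; monotonic in the negated moe flags."""
    return {
        "long.4": env["long.req"] and not env["long.gnt"],
        "long.3": env["long.3.rtm"] and not moe["long.4"],
        "long.2": env["long.2.rtm"] and not moe["long.3"],
        "long.1": (env["long.1.rtm"] and not moe["long.2"]) or env["op_is_WAIT"]
        or not moe["short.1"] or env["long.outstanding"],
        "short.2": env["short.req"] and not env["short.gnt"],
        "short.1": (env["short.1.rtm"] and not moe["short.2"])
        or not moe["long.1"] or env["short.outstanding"],
    }


def f(env, moe):
    """Functional specification: a stall condition implies a clear moe flag."""
    cond = stall_conditions(env, moe)
    return all(not (cond[s] and moe[s]) for s in STAGES)


def valid_assignments(env):
    """All moe vectors that satisfy f for the given inputs."""
    candidates = (dict(zip(STAGES, bits))
                  for bits in product([False, True], repeat=len(STAGES)))
    return [moe for moe in candidates if f(env, moe)]


def closed_under_or(env):
    """Property (2): the bitwise OR of two valid assignments is valid."""
    valid = valid_assignments(env)
    return all(f(env, {s: m1[s] or m2[s] for s in STAGES})
               for m1 in valid for m2 in valid)


EXAMPLE_ENV = {
    "long.req": True, "long.gnt": False, "short.req": True, "short.gnt": True,
    "long.3.rtm": True, "long.2.rtm": False, "long.1.rtm": True, "short.1.rtm": True,
    "op_is_WAIT": False, "long.outstanding": False, "short.outstanding": False,
}
```

For `EXAMPLE_ENV` the all-clear vector is among the valid assignments, as property (1) requires, and `closed_under_or(EXAMPLE_ENV)` holds.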
Properties 1 and 2 of our functional speciﬁcation form the theo-
retical foundation for the derivation of the performance speciﬁca-
tion. The two properties do not restrict the pipeline control ﬂow
logic. Establishing the ﬁrst property is trivial, since our speciﬁca-
tion does not state anything about when pipeline stages do not stall.
To implement pipeline functionality that allows instructions to
enter at the fetch/decode/issue stage, progress through the intermediate
stages, and exit at a completion stage or possibly earlier, the control
flows backwards, starting from the completion stages. For any such
pipeline the conditions which cause necessary pipeline stalls can be
speciﬁed by deﬁning a suitable function F which takes the negated
moving or empty ﬂags as arguments. Monotonicity should be a
natural property of the resulting functional speciﬁcation.
3.2 Derivation
To get a unique description of the values that should be assigned
to the moe ﬂags to achieve maximum performance we need to add,
to the functional speciﬁcation, the requirement that the desired set
of assignments to moe ﬂags is the most liberal, ie the assignment
which makes more moe ﬂags true than any other.
It is important to prove that there is a unique most liberal
assignment. To prove this we need to show that there is at least one
assignment possible, namely the one that sets all flags in the moe
vector to false (see property (1)), and that or'ing the individual
flags of two valid assignments results in another valid assignment,
which is established through property (2).
Under the assumption that the functional speciﬁcation has these
two properties, the most liberal assignment to moe ﬂags, written
hMOEi, can be obtained by disjunctively combining the individual
bits of all valid assignments to moe ﬂags.
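\[ \langle \mathit{MOE} \rangle = \bigvee \langle \mathit{moe} \rangle \quad \text{s.t.} \quad \vdash f(\langle \mathit{moe} \rangle) \]
\[ \vdash f(\langle \mathit{MOE} \rangle) \tag{3} \]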
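Equation (3) can be evaluated directly by enumeration. The Python sketch below does this for a deliberately simplified model, a single pipe without lock step or scoreboard, in which stage i must stall when it requires to move and stage i+1 is not moving or empty. The model, the 0-based indexing and all names are assumptions made for this illustration.

```python
from itertools import product


def f(rtm, blocked_at_completion, moe):
    """Functional specification of a single n-stage pipe.

    Index 0 is the first stage and index n-1 the completion stage;
    rtm[n-1] is unused because the bus request stands in for it.
    """
    n = len(moe)
    for i in range(n):
        if i == n - 1:
            cond = blocked_at_completion            # req and not gnt
        else:
            cond = rtm[i] and not moe[i + 1]        # successor does not move
        if cond and moe[i]:
            return False
    return True


def most_liberal(rtm, blocked_at_completion):
    """Equation (3): bitwise OR over every assignment that satisfies f."""
    n = len(rtm)
    result = [False] * n
    for bits in product([False, True], repeat=n):
        if f(rtm, blocked_at_completion, bits):
            result = [a or b for a, b in zip(result, bits)]
    return result


# Stage 2 (index 1) holds a bubble, so stage 1 can move although the
# completion stage lost the bus.
print(most_liberal([True, False, True, True], True))  # [True, True, False, False]
```

The enumeration is exponential in the number of stages, so it only serves to illustrate the definition on small examples.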
Fixed-point iteration can now be applied to ﬁnd the least stalling
solution. The resulting speciﬁcation f(hMOEi) states under which
conditions most moe ﬂags are set, ie how maximum pipeline per-
formance can be achieved. It establishes that, to get maximum per-
formance, each individual moe ﬂag should be assigned as follows:
\[ \langle \mathit{MOE} \rangle[i] := \lnot \langle F \rangle[i](\lnot \langle \mathit{MOE} \rangle) \tag{4} \]
This confirms our intuitive approach from Section 2.2.2 and
formally establishes the relationship between the functional and the
performance parts of our combined specification.
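One way to carry out the fixed-point iteration for the example architecture is sketched below in Python. The choice to start from the all-set vector and to update all flags simultaneously is an assumption of this sketch; the text above does not prescribe a particular iteration scheme. The key names, the abstraction of the scoreboard check into `long.outstanding` and `short.outstanding`, and `EXAMPLE_ENV` are likewise hypothetical.

```python
STAGES = ["long.4", "long.3", "long.2", "long.1", "short.2", "short.1"]


def stall_conditions(env, moe):
    """Stall condition per stage, following Figure 2."""
    return {
        "long.4": env["long.req"] and not env["long.gnt"],
        "long.3": env["long.3.rtm"] and not moe["long.4"],
        "long.2": env["long.2.rtm"] and not moe["long.3"],
        "long.1": (env["long.1.rtm"] and not moe["long.2"]) or env["op_is_WAIT"]
        or not moe["short.1"] or env["long.outstanding"],
        "short.2": env["short.req"] and not env["short.gnt"],
        "short.1": (env["short.1.rtm"] and not moe["short.2"])
        or not moe["long.1"] or env["short.outstanding"],
    }


def max_performance(env):
    """Iterate MOE := not F(not MOE), as in equation (4), until it is stable.

    Because the stall conditions are monotonic, each step can only clear
    flags, so the loop terminates after at most len(STAGES) + 1 rounds.
    """
    moe = {s: True for s in STAGES}
    while True:
        cond = stall_conditions(env, moe)
        nxt = {s: not cond[s] for s in STAGES}
        if nxt == moe:
            return moe
        moe = nxt


EXAMPLE_ENV = {
    "long.req": True, "long.gnt": False, "short.req": True, "short.gnt": True,
    "long.3.rtm": True, "long.2.rtm": False, "long.1.rtm": True, "short.1.rtm": True,
    "op_is_WAIT": False, "long.outstanding": False, "short.outstanding": False,
}
```

For `EXAMPLE_ENV` the long pipe loses the completion bus, so stages 4 and 3 stall; stage 2 holds a bubble, which lets every other stage keep moving. Setting `op_is_WAIT` additionally clears both stage 1 flags through the lock step constraint.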
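It remains to show that $\langle \mathit{MOE} \rangle$ is indeed the solution that
provides the maximum performance. We need to show that any valid
assignment to moe flags is subsumed by the assignment in $\langle \mathit{MOE} \rangle$.
Suppose: $\vdash f(\langle \mathit{moe} \rangle')$.
Assume: $\langle \mathit{moe} \rangle'[j] \rightarrow \langle \mathit{MOE} \rangle[j]$ for $j > i$ under the given
indexing. This assumption is based on the observation that the moe flag
of a final stage depends not on moe flags but solely on a completion
bus grant which is treated as a constant in F.
We can now prove for each moe flag i in the vector that:
\[ \langle \mathit{moe} \rangle'[i] \rightarrow \langle \mathit{MOE} \rangle[i] \]
Proof:
\[
\begin{array}{lll}
\langle \mathit{moe} \rangle'[i] & \rightarrow \lnot \langle F \rangle[i](\lnot \langle \mathit{moe} \rangle') & \text{by (4)} \\
 & \rightarrow \lnot \langle F \rangle[i](\lnot \langle \mathit{MOE} \rangle) & F \text{ monotonic} \\
 & = \langle \mathit{MOE} \rangle[i] & \text{by (4)}
\end{array}
\]
Hence:
\[ \langle \mathit{moe} \rangle' \rightarrow \langle \mathit{MOE} \rangle \qquad \text{q.e.d.} \]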
Note that the inductive proof above is possible because control
flows, starting from the completion stages, to the respective
predecessor pipeline stages only. If control flows in both directions along
the pipeline the best solution may be more complicated.
4. RESULTS
Our aim was to identify possible performance bugs in the pipeline
flow control logic of the FirePath processor. A performance bug, in
this context, is defined as an unnecessary pipeline stall. Using the
architecture and microarchitecture manuals, and in close collabo-
ration with designers, we investigated all constraints that stall the
pipeline to preserve functional correctness, and wrote them down
in the form of a functional speciﬁcation that has the properties dis-
cussed in Section 3.1. This speciﬁcation is already a valuable ref-
erence for design engineers. We have shown how a maximum per-
formance speciﬁcation can be obtained from the functional speci-
ﬁcation. Ideally the combined speciﬁcation is used as a basis for
formal property checking. Alternatively it can be translated into
testbench assertions which are checked during simulation.
The specification of the FirePath pipeline design is now a permanent
part of the processor's testbench. It ensures that any
modifications of the pipeline flow control logic preserve the initial intent.
Even the best simulation is by no means exhaustive, hence the fact
that the assertions are not triggered during simulation does not im-
ply that the design satisﬁes the speciﬁcation. A more thorough ap-
proach is to use a property checking tool instead of simulation. Our
investigation has found that the speciﬁcation can also be translated
into statements that can be veriﬁed by property checkers. Running
a property checker means exhaustive veriﬁcation and is therefore
preferable to simulation.
The project took 10 weeks of one person’s time from a cold start
on the processor architecture and microarchitecture (and Verilog).
In summary, we uncovered inefficiencies in the pipeline control
flow, and also some incorrect initialisation values of control
signals. The completion logic has been redesigned as a consequence
of our analysis, resulting in an efficiency increase at the pipeline
completion stages.
Another achievement of this project is the gain in design under-
standing amongst engineers. The entire pipeline ﬂow control was
formalised and is now documented in the functional speciﬁcation
which serves as a design reference. It greatly helped to clarify how
the pipeline ﬂow is controlled, and bridged gaps when parts were
designed by different teams. In several cases, functional equiva-
lence of different implementations needed to be established before
a more abstract description was accepted across the design teams.
An instance of this, which is not addressed in our example
architecture, is the shunt stage, where the same effect can be achieved
via various implementations, all of which should satisfy the same
functional specification on a more abstract level.
5. CONCLUSION AND FURTHER WORK
Interlocked pipeline ﬂow control is designed to prevent hazards
by stalling the pipeline. The design of the interlock logic is per-
ceived to be tricky and debugging it can delay the design process
considerably. There is a danger of introducing unintended pipeline
stalls during debugging and more generally whenever the design is
modiﬁed.
While the prevention of hazards is very important to achieve
functional correctness, the detection and prevention of unnecessary
pipeline stalls is crucial to maximise pipeline performance and thus
obtain high processor performance. The aim of our project was to
develop a speciﬁcation that detects and thus helps to prevent un-
necessary pipeline stalls.
We believe that our method can be applied to any pipelined
microprocessor design that uses interlock logic to prevent hazards,
provided the functional specification of the flow control logic can
be given in a form that satisfies the properties detailed in
Section 3.1. These properties are not restrictive; in fact, they state what
one would naturally expect when specifying pipeline control flow.
The simplicity of our approach is based on the fact that it relies
purely on a functional understanding of the pipeline ﬂow control
logic, which many designers are familiar with at an early stage of
the design process. In addition, the speciﬁcation is, except for in-
structions which enforce an explicit pipeline stall, independent of
the actual instruction set, allowing the control ﬂow to be developed
and veriﬁed in separation from the data ﬂow. We have found that
our method can already deliver useful results, even when the design
is not yet complete.
For this pilot project we have derived the performance specification
manually. We are now working on a tool which, given a functional
specification that has the properties mentioned in Section 3.1,
generates the corresponding performance specification and also
Verilog/VHDL assertions. This tool will form the first module of a
(semi-)automatic pipeline ﬂow control design environment. Ulti-
mately, we would like to generate the HDL code that implements
the pipeline ﬂow control logic from the functional speciﬁcation.
This is much more ambitious. Issues like timing and the intro-
duction of shunt stages to decouple pipelines if signal propagation
times cannot meet cycle times need to be addressed. A project that
aims to show that this is feasible is currently in progress.
6. REFERENCES
[1] M. Bickford and M. Srivas. Veriﬁcation of a pipelined
microprocessor using CLIO. In Workshop on Hardware
Speciﬁcation, Veriﬁcation and Synthesis: Mathematical
Aspects, volume 408 of Lecture Notes in Computer Science.
Springer, 1989.
[2] A. Biere, E. Clarke, R. Raimi, and Y. Zhu. Verifying safety
properties of a PowerPC microprocessor using symbolic
model checking without BDDs. In 11th International
Conference on Computer-Aided Veriﬁcation, Lecture Notes in
Computer Science, pages 60–71. Springer, 1999.
[3] J. R. Burch and D. L. Dill. Automatic veriﬁcation of pipelined
microprocessor control. In 6th International Conference on
Computer-Aided Veriﬁcation, volume 818 of Lecture Notes in
Computer Science, pages 68–80. Springer, 1994.
[4] E. M. Clarke, O. Grumberg, and D. A. Peled. Model
Checking. MIT Press, 1999.
[5] K. Kohno and N. Matsumoto. A new veriﬁcation methodology
for complex pipeline behavior. In DAC 2001 Conference
Proceedings, June 2001.
[6] D. Kroening and W. J. Paul. Automated pipeline design. In
DAC 2001 Conference Proceedings, June 2001.
[7] S. Müller and W. Paul. Computer Architecture: Complexity
and Correctness. Springer, 2000.
[8] S. Owre, N. Shankar, J. M. Rushby, and D. W. J.
Stringer-Calvert. PVS System Guide. SRI International,
version 2.3 edition, September 1999. Available at
http://pvs.csl.sri.com.
[9] S. Wilson. Broadcom’s FirePath Processor Architecture. In
Embedded Processor Forum, June 2001.