Verification of microarchitectural refinements in rule-based systems by Dave, Nirav H. et al.
Veriﬁcation of Microarchitectural Reﬁnements in
Rule-based Systems
Nirav Dave∗, Michael Katelman†, Myron King∗, Arvind∗, José Meseguer†
∗ Massachusetts Institute of Technology - Computer Science and Artiﬁcial Intelligence Laboratory
Cambridge, MA 02139, U.S.A.
{ndave, mdk, arvind}@csail.mit.edu
† University of Illinois at Urbana-Champaign - Department of Computer Science
Urbana, IL 61801, U.S.A.
{katelman, meseguer}@uiuc.edu
Abstract—Microarchitectural reﬁnements are often required
to meet performance, area, or timing constraints when designing
complex digital systems. While reﬁnements are often straightfor-
ward to implement, it is difﬁcult to formally specify the conditions
of correctness for those which change cycle-level timing. As
a result, in the later stages of design only those changes are
considered that do not affect timing and whose veriﬁcation can
be automated using tools for checking FSM equivalence. This
excludes an essential class of microarchitectural changes, such as
the insertion of a register in a long combinational path to meet
timing. A design methodology based on guarded atomic actions,
or rules, offers an opportunity to raise the notion of correctness
to a more abstract level. In rule-based systems, many useful
reﬁnements can be expressed simply by breaking a single rule
into smaller rules which execute the original operation in multiple
steps. Since the smaller rule executions can be interleaved with
other rules, the veriﬁcation task is to determine that no new
behaviors have been introduced. We formalize this notion of
correctness and present a tool based on SMT solvers that can
automatically prove that a reﬁnement is correct, or provide
concrete information as to why it is not correct. With this tool, a
larger class of reﬁnements at all stages of the design process can
be veriﬁed easily. We demonstrate the use of our tool in proving
the correctness of the reﬁnement of a processor pipeline from
four stages to ﬁve.
I. INTRODUCTION
Modular reﬁnement is an important technique in designing
complex digital systems because it eases architectural explo-
ration for better performance, area, and power. For modular
reﬁnement to be viable it should be relatively easy to deter-
mine if a local change preserves the overall correctness of the
design. Generally, it is extremely difﬁcult for a designer to give
a full formal correctness speciﬁcation for a system. Specifying
correctness requires a level of knowledge of the overall system
and familiarity with formal veriﬁcation methods that few
designers possess. As a consequence, common practice is to
settle for partial veriﬁcation via testing. Testing works, but as
test suites tend to be built in conjunction with the design itself,
designers rarely gain sufﬁcient conﬁdence in their reﬁnements’
correctness until the ﬁnal stages of the design cycle.
An alternative is to restrict the types of reﬁnements to ones
whose local correctness guarantees that the overall behavior
will remain unaffected, and designs usually rely on the notion
of equivalence supported by the design language semantics
for proving or testing local equivalence. As most hardware
description languages describe synthesizable systems at the
level of gates and wires, this amounts to FSM (ﬁnite-state
machine) equivalence. Tools usually require the designer to
specify the mapping of state elements (e.g., ﬂip-ﬂops), and
thus reduce the problem of FSM equivalence to combina-
tional equivalence, which can be performed efﬁciently. FSM-
equivalence-preserving reﬁnements have proven to be quite
useful because tools are available to prove the local correctness
automatically and there is no negative impact on the overall
veriﬁcation strategy. However, FSM reﬁnement is too restric-
tive, disallowing many desirable changes such as adding a
buffer to cut a critical path in a pipeline. Thus these tools are
limited to veriﬁcation in the later stages of design when the
timing has been decided.
Recently, languages like Bluespec [4], which describe de-
signs not as gates and wires but as a set of guarded atomic
actions (or rules) on state elements, have been proposed.
Over the last six years, it has been established that Bluespec
programs not only can produce no-compromise hardware [1],
but that keeping programs at the rule level allows more
ﬂexibility in design and reﬁnements [8], [9]. For instance, the
addition of a pipeline stage can be implemented in a natural
way by splitting the rule corresponding to the appropriate
stage into multiple rules, and introducing state to hold the
intermediate results.
A Bluespec program can be reasoned about at two levels.
At the ﬁrst level we deal with rules in an unscheduled manner.
The semantics state that we compute by selecting any valid
rule (i.e., one whose guard evaluates to true) for execution,
update the state by executing the body of the rule, and then
repeat the process. This means that the program is naturally
non-deterministic, and programs at this level are meant to
be correct for all possible traces of execution. At the second
level the compiler adds a scheduler which is responsible for
resolving the non-determinism so that we may synthesize
the program into a high-quality FSM implementation. The
choice of scheduler is a purely performance-based concern and
should not affect the correctness. We exploit this separation








y_i = f1(x_i, r1_i);
r1_0 = 0; r1_{i+1} = y_i;
z_i = f2(y_i, r2_i);
r2_0 = 0; r2_{i+1} = z_i;
Fig. 1. Initial FSM
Despite the guarded atomic action formalism’s deep relation
to term-rewriting systems and formal proofs, little work has
been done to verify rule-based programs at anything beyond
the implementation level. The main contribution of this paper
is to deﬁne a notion of program equivalence between rule-
based programs and describe an SMT-based algorithm that
automatically veriﬁes the correctness of “rule splitting” re-
ﬁnements. Our notion of program equivalence is based on
the transitive closure of permitted transitions and not on
more standard trace-based characterizations. If needed, the
distinctions drawn by trace-based characterizations can be
expressed programmatically and veriﬁed using our notion of
equivalence.
A tool based on this algorithm is able to prove the correct-
ness of interesting reﬁnements in a matter of minutes, well
within range to be useful as a debugging aid for the designer.
We use the tool to show the correctness of several examples
including the reﬁnement of a four-stage processor pipeline into
a ﬁve-stage pipeline.
Paper Organization: In Section II, we discuss the kinds of
reﬁnements we want to make and why their correctness
cannot be formulated at the FSM level. We also discuss the
challenge of veriﬁcation at the level of rules and discuss how
nondeterministic speciﬁcations affect the veriﬁcation task. In
Section III, we formalize a notion of equivalence in the context
of rule reﬁnements. In Section IV, we discuss the algorithm
used by our tool to mechanically verify equivalence using an
SMT solver. In Section V, we discuss the veriﬁcation of a
processor program. In the last two sections we discuss related
work and present our conclusion.
II. MOTIVATING REFINEMENT EXAMPLE
To understand the challenges of reﬁnement in rule-based
systems we must ﬁrst understand how such reﬁnements differ
from reﬁnements of FSMs, motivating our notion of behavior
and explaining where the new method and tool are needed.
A. Reﬁning an FSM
Consider the hardware represented by the FSM system
shown in Figure 1. The system consists of two registers r1
and r2, both initially zero, and some combinational logic
implementing functions f1 and f2. The critical path in this
system goes from r1 to r2 via f1 and f2. In order to improve
performance, a designer may want to break this path by adding
a buffer (say, a one element FIFO) on the critical path as shown
in Figure 2. Though we have not shown the circuitry to do so,
we will assume that r2 does not change and the output z is







y_i = f1(x_i, r1_i); z_0 = ⊥;
r1_0 = 0; r1_{i+1} = y_i;
yp_0 = ⊥; yp_{i+1} = y_i;
z_{i+1} = f2(yp_{i+1}, r2_{i+1});
r2_0 = 0; r2_1 = r2_0;
r2_{i+2} = z_{i+1};
Fig. 2. Reﬁned FSM
In this reﬁned system, the operation that was done in one
cycle is now done in two; f1 is evaluated in the ﬁrst cycle,
and f2 in the second. The computation is fully pipelined so
that each stage is always productive (except the ﬁrst cycle
of the second stage, when the FIFO buffer is empty) and we
have the same cycle-level computation rate. However the clock
period in the reﬁned system can be much shorter, thereby
increasing system throughput. Though the cycle-by-cycle states
of the two FSMs do not match directly, a little analysis
shows that the sequences of values assumed by r2 and z are
the same in both systems. In other words, the refined system
produces the same answer as the original system but one cycle
later. Therefore, in many situations such a reﬁnement may be
considered correct even though the FSMs of the two systems
are not equivalent.
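The one-cycle shift can be checked with a small simulation. The following is a minimal sketch, not part of the original work: f1 and f2 are arbitrary placeholder functions chosen for illustration, and None stands in for the undefined value ⊥.

```python
def f1(x, r1):
    # Placeholder combinational function (an assumption, not from the paper).
    return (x + 3 * r1) % 97


def f2(y, r2):
    # Placeholder combinational function (an assumption, not from the paper).
    return (5 * y + r2) % 89


def run_initial(xs):
    """Single-cycle FSM in the style of Fig. 1: one z output per cycle."""
    r1, r2, zs = 0, 0, []
    for x in xs:
        y = f1(x, r1)
        z = f2(y, r2)
        r1, r2 = y, z
        zs.append(z)
    return zs


def run_refined(xs):
    """Two-stage FSM in the style of Fig. 2, with a one-element buffer yp."""
    r1, r2, yp, zs = 0, 0, None, []
    for x in xs:
        y = f1(x, r1)          # first stage
        if yp is None:
            z = None           # first cycle: the buffer is still empty
        else:
            z = f2(yp, r2)     # second stage consumes the buffered value
            r2 = z
        r1, yp = y, y
        zs.append(z)
    return zs


xs = [4, 8, 15, 16, 23, 42]
initial = run_initial(xs)
refined = run_refined(xs)
```

Under these assumptions, `refined` is `initial` delayed by one cycle: its first entry is undefined and every later entry equals the entry of `initial` from the previous cycle.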
The problem here is that if we don’t rely on FSM equiv-
alence then how should we deﬁne equivalence? A solution
could be to introduce the notion of a message or valid input
and output and then deﬁne equivalence in terms of input-output
sequences of messages as opposed to cycle-by-cycle behavior
of input and output. Unless we carry this notion of validity
everywhere in the design it is difﬁcult to reason in these terms
mechanically.
In the following section we discuss a rule-based description
of this example and show how reﬁnements are expressed in
such systems.
B. Reﬁnements at the Rule Level
When designers specify systems using rules, they often have
in mind a particular datapath and FSM, although the exact
datapath and FSM are generated by the compiler. For example,
a designer may express the FSM design in Figure 1 using
a single rule as shown in Figure 3. In contrast to the FSM
interface, a rule-based system has no strict notion of clock
cycle to determine when data is passed. To deal with this we
have added two FIFOs inQ and outQ; when we “take input”
we dequeue a value from inQ and when we have a new value
to output we enqueue into outQ.
If we assume that a rule executes in one clock cycle then the
rule in Figure 3 speciﬁes that every cycle r1 and r2 should
be updated, one value should be dequeued from inQ, and
one value should be enqueued in the outQ. (The approximate
logic generated by each rule is shown as a cloud in all the
ﬁgures; we have omitted the control logic to avoid clutter.)
The sequences of values for r1 and r2 match exactly with






register r1 = 0, r2 = 0
fifo inQ, outQ;
method input(x) = inQ.enq(x);
method output() = outQ.deq();
method outValue = outQ.first();
rule produce_consume when (!inQ.empty() && !outQ.full()):
let x = inQ.first(); inQ.deq();
let y = f1(x,r1); let z = f2(y,r2);
r1 := y; r2 := z; outQ.enq(z);
Fig. 3. A Rule-based Speciﬁcation of the Initial Program
which serve as input and output in the original program match
the sequence of values in inQ and outQ (see Figure 4).
([x_0, x_1, x_2, ...], r1_0, r2_0, []) −→
([x_1, x_2, ...], r1_1, r2_1, [z_1]) −→
([x_2, ...], r1_2, r2_2, [z_1, z_2]) −→
...
where: r1_0 = 0; r2_0 = 0;
r1_{i+1} = f1(x_i, r1_i);
r2_{i+1} = f2(r1_{i+1}, r2_i);
z_i = r2_i
Fig. 4. The behavior of the program in Figure 3. State is represented by
quadruples where the ﬁrst and ﬁnal member are the contents of inQ and
outQ, and the second and third members are the values of r1 and r2
The reﬁned FSM in Figure 2 may be described by splitting
our single rule into two rules: produce and consume, which
communicate via the FIFO q as shown in Figure 5.
The ﬁrst thing to understand about this two-rule program
is that it represents a nondeterministic speciﬁcation which
can be implemented by many different FSMs. For multiple
rule programs, the semantics only state that any enabled rule
(i.e., a rule in a state where its guard is true) can be executed;
it does not determine which rule to choose if more than
one is enabled. The following are possible schedules for this
program:







In the ﬁrst schedule the program repeatedly enters a token
into the FIFO and then immediately takes it out. This emulates
the execution of the rule in the unreﬁned program (Figure 3)
and leaves the FIFO q empty after each consume rule
execution. This schedule also does the same set of updates
to registers r1 and r2 as the original program. The second
schedule repeatedly queues up two tokens before removing
them. Note that this schedule will be valid only if q has space






register r1 = 0, r2 = 0
fifo q, inQ, outQ
rule produce when (!q.full() && !inQ.empty()):
let x = inQ.first(); inQ.deq();
let y = f1(x,r1);
q.enq(y); r1 := y;
rule consume when (!q.empty() && !outQ.full()):
let y = q.first(); q.deq();
let z = f2(y,r2);
outQ.enq(z); r2 := z;
Fig. 5. A Reﬁnement of the Program in Figure 3
starts, there will always be at least one token in q.
In case of multiple-rule programs, the behavior of the
program must be thought of in terms of the set of permitted
executions or more precisely, the set of the sequences of
values assumed by various state elements. A scheduler picks
a speciﬁc execution from this set. A schedule is chosen by the
compiler (with voluntary inputs from the designer) based on
some goodness criteria. The current Bluespec compiler [11]
schedules as many enabled rules as possible every cycle as
long as the rules do not conﬂict with each other. The behavior
produced by a parallel scheduler must be consistent with
some one-rule-at-a-time schedule. For the example at hand the
Bluespec compiler will schedule only the producer in the
ﬁrst cycle and then repeatedly schedule consumer followed
by producer in each subsequent cycle.
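The one-rule-at-a-time semantics can be sketched with a small interpreter. The sketch below is illustrative only: f1 and f2 are placeholder functions, outQ is assumed unbounded, and a seeded random choice among enabled rules stands in for an arbitrary scheduler. It suggests why any schedule of the two-rule program yields the same output sequence as the one-rule program.

```python
import random
from collections import deque


def f1(x, r1):
    return (x + 3 * r1) % 97   # placeholder function (assumption)


def f2(y, r2):
    return (5 * y + r2) % 89   # placeholder function (assumption)


def run_one_rule(xs):
    """produce_consume fired once per input, as in the one-rule program."""
    r1 = r2 = 0
    out = []
    for x in xs:
        y = f1(x, r1)
        z = f2(y, r2)
        r1, r2 = y, z
        out.append(z)
    return out


def run_two_rules(xs, q_capacity, seed):
    """produce/consume fired one at a time in a randomly chosen order."""
    rng = random.Random(seed)
    in_q, q, out = deque(xs), deque(), []
    r1 = r2 = 0

    def produce():
        nonlocal r1
        y = f1(in_q.popleft(), r1)
        q.append(y)
        r1 = y

    def consume():
        nonlocal r2
        z = f2(q.popleft(), r2)
        out.append(z)
        r2 = z

    while True:
        enabled = []
        if in_q and len(q) < q_capacity:   # guard of produce
            enabled.append(produce)
        if q:                              # guard of consume (outQ unbounded)
            enabled.append(consume)
        if not enabled:
            return out
        rng.choice(enabled)()              # the scheduler picks any enabled rule
```

Varying `seed` and `q_capacity` changes the interleaving of the two rules but not the list of outputs returned.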
C. Observability
In what sense are the modules in Figure 3 and Fig-
ure 5 equivalent? Notice that given any sequence of inputs
x0, x1, x2, x3, ... both programs produce the same sequence of
outputs z1, z2, z3, .... However, the interleavings of permissi-
ble “observations” are different. Assuming all FIFOs are of
size 1, both systems can observe the following sequences:
x0, z1, x1, z2, x2, z3... and x0, x1, z1, z2, x2, x3, z3, .... How-
ever, the sequence x0, x1, x2, z1, z2, z3, ... can only be ob-
served for the reﬁned system, as the reﬁned system has more
buffering. Bluespec is expressive enough that one can write
a program to distinguish between these two modules by only
taking output from the module after a ﬁxed number of inputs
have entered. In spite of this, we want a notion of equivalence
which permits this reﬁnement.
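The difference in permissible observations can be explored by exhaustive search over FIFO occupancies. The following sketch makes several simplifying assumptions: every FIFO has size 1, data values are abstracted to tokens, the original module is modeled as a chain of two FIFOs (inQ, outQ) and the refined one as a chain of three (inQ, q, outQ). An observation string uses 'x' for an accepted input and 'z' for a taken output.

```python
from functools import lru_cache


def feasible(obs, n_fifos):
    """Can the input/output interleaving `obs` occur with size-1 FIFOs?"""

    @lru_cache(maxsize=None)
    def search(pos, occ):
        if pos == len(obs):
            return True
        # Silent rule firings: move a token from FIFO i to FIFO i + 1.
        for i in range(n_fifos - 1):
            if occ[i] and not occ[i + 1]:
                nxt = list(occ)
                nxt[i], nxt[i + 1] = False, True
                if search(pos, tuple(nxt)):
                    return True
        event = obs[pos]
        if event == "x" and not occ[0]:        # environment enqueues into inQ
            return search(pos + 1, (True,) + occ[1:])
        if event == "z" and occ[-1]:           # environment dequeues from outQ
            return search(pos + 1, occ[:-1] + (False,))
        return False

    return search(0, (False,) * n_fifos)


ORIGINAL, REFINED = 2, 3
```

With this model, `feasible("xxxzzz", ORIGINAL)` is false while `feasible("xxxzzz", REFINED)` is true, and both systems admit `"xzxzxz"` and `"xxzzxxz"`.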
The equality we will deﬁne applies only to full programs,
i.e., those which do not interact with the outside world. To
express equality between systems which interact with the
outside world, we need to construct a “generic” context which
represents all possible interactions with the outside world. This








register r1 = 0, r2 = 0
fifo inQ, outQ, obsQ;
rule produce_consume_observe when (!inQ.empty()
&& !outQ.full() && !obsQ.full()):
let x = inQ.first(); inQ.deq();
let y = f1(x,r1); let z = f2(y,r2);
let a = f3(r1,r2);
r1 := y; r2 := z;
outQ.enq(z); obsQ.enq(a);
Fig. 6. Program of Figure 3 with an Observer
sink queues to drive interactions and store results. For instance,
in our example we can attach a FIFO initially containing N
elements as source to inQ and an output FIFO with M empty
slots to outQ. It is easy to see why, given sufﬁcient sizes of N
and M , this closed system will model all possible interactions
of inQ and outQ with the outside world.
Under a weaker notion of equality, which relies on the
transitive closure of rule applications instead of trace equiv-
alence, the previously discussed reﬁnement is correct. At
the same time this weaker notion of equality can lead to
errors if the module is used incorrectly (for instance if the
input to the module changes depending on the number of
values outstanding). We rely on the user to express desired
distinctions programmatically. For instance, if the user believes
that the relative order of inputs and outputs matters, they can
add an additional FIFO into which witnesses of both input
and output events are enqueued. We believe this is the right
tradeoff between greater ﬂexibility of reﬁnements, and user-
responsibility in expressing correctness [3].
As another example, consider reﬁnements of a processor.
To show the correctness of a reﬁnement, it is sufﬁcient to
show that the reﬁned processor generates the same sequence of
instruction addresses of committed instructions as the original.
As such we can add a single observation FIFO to the context to
observe differences and consider all possible initial instruction
and data memory conﬁgurations to verify correctness.
D. An Example to Illustrate Incorrect Reﬁnements
While reﬁnements are often easy to implement, it is not
uncommon for a designer to make subtle mistakes. Consider
the original one-rule produce-consume example augmented
with observation logic as shown in Figure 6. In addition
to doing the original computation, this program computes a
function of the state of r1 and r2, and at each iteration inserts
the result into a new FIFO queue (obsQ). A designer may want
to do the same rule splitting exercise he had done with the ﬁrst








register r1 = 0, r2 = 0
fifo q, q1, inQ, outQ, obsQ
rule produce when (!q.full() && !inQ.empty()):
let x = inQ.first(); inQ.deq();
let y = f1(x,r1);
q.enq(y); r1 := y;
rule consume when (!q.empty() && !outQ.full()):
let y = q.first(); q.deq();
let z = f2(y,r2);
outQ.enq(z); r2 := z;
rule observe when (!obsQ.full()):
let a = f3(r1, r2); obsQ.enq(a);








register r1 = 0, r2 = 0
fifo inQ, outQ, obsQ, r1Q, r2Q, q;
rule produce when (!inQ.empty() && !r1Q.full()
&& !q.full()):
let x = inQ.first(); inQ.deq();
let y = f1(x,r1);
r1Q.enq(r1); q.enq(y); r1 := y;
rule consume when (!q.empty() && !r2Q.full()
&& !outQ.full()):
let y = q.first(); q.deq();
let z = f2(y,r2);
r2Q.enq(r2);
outQ.enq(z); r2 := z;
rule observe when (!obsQ.full() && !r1Q.empty()
&& !r2Q.empty()):
let x = f3(r1Q.first(),r2Q.first());
r1Q.deq(); r2Q.deq(); obsQ.enq(x);
Fig. 8. A correct reﬁnement of the program in Figure 6
This reﬁnement is clearly wrong; we can observe r1 and r2
out-of-sync via the new observer circuit. Thus, the sequence
produce observe consume has no correspondence in
the original program. For our tool to be useful to a designer,
it must be able to correctly determine that this reﬁnement is
incorrect (or rather that it failed to ﬁnd a matching behavior
in the original program). A correct reﬁnement is shown in
Figure 8, where extra queues have been introduced to keep
relevant values in sync. The correct solution would be obvious
to an experienced hardware designer because all paths in a
pipeline have the same number of stages.
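The out-of-sync observation can be reproduced concretely. The sketch below is illustrative only: it picks simple placeholder definitions for f1, f2, and f3 (f3 just returns the pair it is given), ignores the capacities of outQ and obsQ, and skips a scheduled rule whose guard is false.

```python
def f1(x, r1):
    return x + r1          # placeholder function (assumption)


def f2(y, r2):
    return y + r2          # placeholder function (assumption)


def f3(r1, r2):
    return (r1, r2)        # placeholder observer: exposes both registers


def original_observations(xs):
    """Lockstep program: one observation per produce_consume_observe firing."""
    r1 = r2 = 0
    obs = []
    for x in xs:
        y = f1(x, r1)
        z = f2(y, r2)
        obs.append(f3(r1, r2))
        r1, r2 = y, z
    return obs


def split_observations(xs, schedule):
    """Naively split program: produce, consume, and observe fire separately."""
    in_q, q = list(xs), []
    r1 = r2 = 0
    obs = []
    for rule in schedule:
        if rule == "produce" and in_q and not q:
            y = f1(in_q.pop(0), r1)
            q.append(y)
            r1 = y
        elif rule == "consume" and q:
            r2 = f2(q.pop(0), r2)
        elif rule == "observe":
            obs.append(f3(r1, r2))
    return obs


xs = [1, 2, 3]
good = split_observations(xs, ["observe", "produce", "consume", "observe"])
bad = split_observations(xs, ["produce", "observe", "consume"])
```

Here `bad` contains an observation taken while q is non-empty, pairing the updated r1 with the stale r2. That pair never appears in `original_observations(xs)`, whereas the observations in `good` do.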
E. Reﬁnements in Nondeterministic Programs
The examples that we have considered so far have started









register r1 = 0, r2 = 0
fifo inQ, outQ, obsQ;
rule produce_consume when (!inQ.empty() && !outQ.full()):
let x = inQ.first(); inQ.deq();
let y = f1(x,r1); let z = f2(y,r2);
r1 := y; r2 := z; outQ.enq(z);
rule observe when (!obsQ.full()):
let x = f3(r1,r2); obsQ.enq(x);
Fig. 9. A program with a nondeterministic observer
produce deterministic behaviors. Much of the value of rule-
based programs comes from the ability to specify programs
which can have multiple distinct behaviors. An example of
a useful nondeterministic speciﬁcation is that of a speculative
processor whose correctness does not depend upon the number
of instructions which are executed on the incorrect path. What
does it mean to do a reﬁnement in such a program?
Consider the example in Figure 9, which is a variation of
our producer-consumer example with an observer (Figure 6).
Unlike the lockstep version which did one observation for each
iteration, in this program we are allowed to not only miss some
updates of r1 and r2, but are permitted to repeatedly make
the same observations. An implementation, i.e., a particular
schedule, of this rule-based speciﬁcation would pick some
deterministic sequence of observations from the allowed set.
By giving such a speciﬁcation, the designer is saying, in effect,
that any schedule of observations is acceptable. In that sense,
the observations made in the program in Figure 6 are an
acceptable implementation of this nondeterministic program.
By the same reasoning we could argue that the reﬁnement
shown in Figure 8 is a correct reﬁnement of Figure 9.
But suppose we do not want to rule out any behaviors
prematurely in our refinements. A correct refinement will then
have to preserve all possible behaviors. We show a correct
reﬁnement of the nondeterministic program in Figure 10,
where we introduce an extra register, r1p, to keep a relevant
copy of r1 in sync with r2 with which to make legal
observations.
It is nontrivial to show that all behaviors in the new
program can be modeled by the original nondeterministic
speciﬁcation and vice versa. As we demonstrate later, our
tool can automatically verify this condition, though we do
require the programmer to specify a projection function, by
which state in the two different programs can be related. The
partial function relationship is both natural for designers to
come up with and easy to specify. Having manually deﬁned
this function, the designer passes it to the tool which tells










register r1 = 0, r2 = 0, r1p = 0
fifo inQ, outQ, q, obsQ
rule produce when (!q.full() && !inQ.empty()):
let x = inQ.first(); inQ.deq();
let y = f1(x,r1);
r1 := y; q.enq(y);
rule consume when (!q.empty() && !outQ.full()):
let x = q.first(); q.deq();
let z = f2(x,r2);
r1p := x; r2 := z; outQ.enq(z);
rule observe when (!obsQ.full()):
let x = f3(r1p, r2); obsQ.enq(x);
Fig. 10. Correct reﬁnement of Figure 9
an execution from one program which it believes cannot be
simulated by the other.
III. FORMALIZING BEHAVIORS AND CORRECTNESS OF
REFINEMENTS
We model the behavior of a program using a state transition
system. A program P has a collection of state elements and
a set of rules RP . The states in the transition system are
the values assumed by these state elements. Thus, the state
transition system of a program with two 32-bit registers would
have 2^64 states.
A. Equivalence of Programs
Deﬁnition 1 (State Transition System of Program P ). Each
program P is modeled by a state transition system given by
a triple of the form:
(S, S_0, −→_S)
where S is the set of states associated with program
P; S_0 ⊆ S is the set of states corresponding to initial
configurations of P; and −→_S ⊆ S × S is the transition
relation, defined such that (s, s′) ∈ −→_S if and only if there
exists some rule R in P whose execution takes the state s to
s′. In addition, we write ⇝ to denote the reflexive transitive
closure of −→.
It is sometimes useful to know which rule R caused the
transition from s to s′; we will denote this by writing:
s −R→ s′
Similarly, we write σ = R_1, R_2, ..., R_n for the sequence of
rule executions where:
s_0 −R_1→ s_1 −R_2→ ... −R_n→ s_n
Intuitively two programs P1 and P2 with the same set of
states are equivalent if every transition in one system can
be simulated by a sequence of transitions in the other. That
is, every finite execution s ⇝ s′ in P1 has a corresponding
execution s ⇝ s′ in P2 and vice versa.
Deﬁnition 2 (Equivalence of Programs). Let P be a program
modeled by the transition system 𝒮 = (S, S_0, −→_S) and
let P′ be a program modeled by the transition system
𝒮′ = (S′, S′_0, −→_S′); P and P′ are equivalent if and
only if S = S′, S_0 = S′_0, and ⇝_S = ⇝_S′.
This definition captures the fact that two programs may have
a different set of rules but may still be equivalent in terms of
their transitive closure. This ability allows us to add “derived”
rules whose execution is always expressible in terms of the
other rules in the program without affecting the meaning of
the program, which is very important in implementing these
systems. Sometimes the consequences of this equality are non-
intuitive. For example, a modulo up-counter (0 to 1 to 2 back to
0) and a modulo down-counter (0 to 2 to 1 back to 0) will have
the same transitive closure and thus be considered equivalent.
Our notion of equality only says that you can get from 0 to 1;
whether you do so directly or through another state (here, 2)
does not matter. If we want to distinguish these two counters,
we can always add an observation queue to record the counter
value after each transition. Practically, this is not required in
a real system as the context in which such a counter is used
will implicitly either demand one of these orders or will work
equally well with either counter.
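The counter example can be checked directly by computing both closures. The following is a small sketch in which the helper name `closure` and the encoding of the one-step relation as a set of pairs are our own choices for illustration.

```python
def closure(states, step):
    """Reflexive transitive closure of a one-step relation (a set of pairs)."""
    reach = {(s, s) for s in states} | set(step)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(reach):
            for (c, d) in list(reach):
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return reach


states = {0, 1, 2}
up = {(s, (s + 1) % 3) for s in states}      # 0 -> 1 -> 2 -> 0
down = {(s, (s - 1) % 3) for s in states}    # 0 -> 2 -> 1 -> 0
```

The one-step relations `up` and `down` differ, yet `closure(states, up)` and `closure(states, down)` are the same set (all nine pairs), so the two counters are equivalent under Definition 2.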
Modeling the behavior of a program in terms of the tran-
sitive closure of executions also allows us to deﬁne several
other notions precisely:
Deﬁnition 3 (Deterministic Programs). Let P be a program
modeled by the transition system 𝒮 = (S, S_0, −→_S). P
is deterministic if and only if all pair-wise executions are
joinable, that is, for all s_0, s_1, s_2 ∈ S such that
s_0 ⇝_S s_1 and s_0 ⇝_S s_2, there exists an s_3 ∈ S such
that s_1 ⇝_S s_3 and s_2 ⇝_S s_3. A program is called
nondeterministic if it is not a deterministic program.
Examples of nondeterministic programs are shown in Fig-
ures 7, 9, and 10. All other examples given in Section II are
deterministic according to the above deﬁnition.
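For small finite systems the joinability condition of Definition 3 can be tested by brute force. The sketch below uses hypothetical helper names and encodes the transition relation as a set of pairs. It is meant only to make the definition concrete, not to suggest how a tool would check it on realistic state spaces.

```python
def reachable(start, step):
    """States reachable from `start` in zero or more steps."""
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for (b, c) in step:
            if b == a and c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def is_deterministic(states, step):
    """True when every pair of executions from a common state is joinable."""
    reach = {s: reachable(s, step) for s in states}
    for s0 in states:
        for s1 in reach[s0]:
            for s2 in reach[s0]:
                if not (reach[s1] & reach[s2]):   # no common successor s3
                    return False
    return True


counter = {(s, (s + 1) % 3) for s in range(3)}
fork = {(0, 1), (0, 2)}                       # two distinct dead ends
diamond = {(0, 1), (0, 2), (1, 3), (2, 3)}    # both branches rejoin at 3
```

Under this definition the counter and the diamond are deterministic even though the diamond offers a choice of rules, while the fork is not, since its two branches can never be joined.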
As we have shown, implementing a rule-based program
requires choosing a schedule. A scheduler by deﬁnition im-
plements a speciﬁc execution sequence. Thus, in the case of
non-deterministic programs the scheduler eliminates all non-
determinism and produces one speciﬁc behavior from among
the allowed set of behaviors. Even for deterministic programs
the scheduler is sometimes not able to produce a “complete”
behavior because the scheduling of rules may be unfair. In
such cases we say that an implementation is partially correct:
Deﬁnition 4 (Partially Correct Implementation). Let P be a
program modeled by the transition system 𝒮 = (S, S_0, −→_S)
and let P′ be a program modeled by the transition system
𝒮′ = (S′, S′_0, −→_S′). P′ is a partially correct implementation
of P if and only if S′ = S, S′_0 = S_0, and ⇝_S′ ⊆ ⇝_S.
B. Correctness of Reﬁnements
Correctness is slightly more complicated to define for
refinements because the specification and implementation
programs have different state elements. In addition, the set of
rules in the implementation is formed by removing a rule from
the speciﬁcation rule set, replacing it by several rules which
together simulate the removed rule. The new rules operate
on the new implementation state, and the remaining rules are
“lifted” to operate on the new implementation state. This is
clearly more complicated than the simple addition of a derived
rule described in the previous section.
Consider the reﬁnement from the program in Figure 3
to the one in Figure 5. Both programs have r1 and r2 but
they get updated at different times in the two programs and
may not match. Intuitively we know that if q contains an
element, then we are part way through at least one round
of an equivalent produce_consume atomic computation,
and r1 and r2 will appear out of sync in the two programs.
Conversely, whenever q is empty we should be able to draw a
correspondence between the executions of the two programs.
We can guarantee that the reﬁned program does not add new
behaviors if we can show that for any execution in the reﬁned
program, whenever it reaches a state where q is empty we can
ﬁnd a corresponding execution in the original program which
has the same values for the matching state elements, i.e., r1
and r2. To guarantee that we haven’t lost behaviors we must
also show the converse: namely that any computation in the
original program can be mimicked by the reﬁned program.
This is quite easy to show because produce followed by
consume behaves exactly as produce_consume.
We ﬁrst offer an intuitive reason why the original program
can mimic the reﬁned one. Consider those preﬁxes of a
schedule in the reﬁned program which have equal number of
produce and consume rule executions. At the end of such
a preﬁx, q must be empty and we can meaningfully verify
that the speciﬁcation and implementation match. Further, since
produce always adds a token and consume always removes
one, we must have a non-empty q when we have an unequal
number of produces and consumes. As such, there is no
meaningful state match. However, by making an appropriate
number of consumes we can always empty q, whereupon
our previous line of reasoning applies and we have a
match. Therefore, all we have to show is that 1) for all prefixes
where we end with an empty q the speciﬁcation can mimic the
execution of the implementation, and 2) for all other sequences
the implementation can move to a state where q is empty.
To reason about the correctness of reﬁnement formally,
we need a projection function p to relate implementation
state to speciﬁcation state. This projection function is often
partial. The states which are in the domain of the projection
function (i.e., Dom(p)) are called relatable. That is, if
𝒮 = (S, S_0, −→_S) is refined by 𝒯 = (T, T_0, −→_T) via a
partial function p : T ⇀ S, the set of relatable states in T
is, by definition, the set Dom(p) of states where p is defined.
All initial states should be in Dom(p), that is, T_0 ⊆ Dom(p).
Furthermore, for any two states t_1, t_2 ∈ Dom(p) such that
t_1 ⇝_T t_2, we should have p(t_1) ⇝_S p(t_2).
This condition covers all ﬁnite executions in the implemen-
tation program which start and end in relatable states, but not
those which start in a relatable state but do not end in one.
To address those executions, we must verify that every ﬁnite
execution in the implementation beginning in a relatable state
which does not end in a relatable state, can eventually reach
one.
Deﬁnition 5 (Partially Correct Reﬁnement). Let P be a
program modeled by the transition system 𝒮 = (S, S_0, −→_S),
P′ be a program modeled by the transition system 𝒯 =
(T, T_0, −→_T), and p : T ⇀ S a partial function relating
states having the property that T_0 ⊆ Dom(p). P′ is a partially
correct refinement of P exactly when the following conditions
hold:
1) Correspondence of Initial State: {p(t) | t ∈ T_0} = S_0.
2) Soundness: For all t_1, t_2 ∈ Dom(p) such that t_1 ⇝_T t_2,
also p(t_1) ⇝_S p(t_2).
3) Limited Divergence: For all t_0 ∈ T_0 and t_1 ∈ T such
that t_0 ⇝_T t_1, there exists t_2 ∈ Dom(p) such that
t_1 ⇝_T t_2.
The ﬁrst clause states the initial states correspond to each
other. The second clause states that every possible execution
in the implementation whose starting and ending states have
corresponding states in the speciﬁcation must have a corre-
sponding execution in the speciﬁcation. The third clause states
that from any reachable state in the implementation we can
always get back to a state which corresponds to a state in the
speciﬁcation. Note that these clauses alone do not guarantee
that the speciﬁcation has been fully implemented. To guarantee
that a speciﬁcation has been fully implemented, we need the
notion of total correctness.
Deﬁnition 6 (Totally Correct Reﬁnement). A totally correct
reﬁnement is a partially correct reﬁnement that, in addition,
satisﬁes:
4) Completeness: For all s_1, s_2 ∈ S and t_1 ∈ Dom(p)
such that s_1 ⇝_S s_2 and p(t_1) = s_1, there exists a
t_2 ∈ Dom(p) such that t_1 ⇝_T t_2 with p(t_2) = s_2.
This states that all executions in the speciﬁcation program are
preserved in the implementation.
Of the conditions for total correctness, correspondence of
initial state and completeness are easy to verify in the context
of rule splitting. This leaves us only concerned with soundness
and limited divergence.
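The four conditions can be made concrete on a toy finite abstraction of the produce/consume split. Everything in the sketch below is a hypothetical model chosen for illustration: the specification is a counter of completed produce_consume steps bounded by N, an implementation state (a, b) counts produce and consume firings, q is empty exactly when a equals b, and the projection p is defined only on such states. The conditions are checked by explicit enumeration, not with an SMT solver.

```python
def reach(start, step):
    """States reachable from `start` in zero or more steps."""
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for (b, c) in step:
            if b == a and c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


N = 4
S0 = {0}
S_STEP = {(s, s + 1) for s in range(N)}                 # produce_consume
T = {(a, b) for b in range(N + 1) for a in (b, b + 1) if a <= N}
T0 = {(0, 0)}
T_STEP = ({((a, b), (a + 1, b)) for (a, b) in T if a == b and a < N}   # produce
          | {((a, b), (a, b + 1)) for (a, b) in T if a == b + 1})      # consume


def p(t):
    """Partial projection: defined only when q is empty (a == b)."""
    a, b = t
    return a if a == b else None


def check(t_step):
    dom = {t for t in T if p(t) is not None}
    return {
        "initial": {p(t) for t in T0} == S0,
        "soundness": all(p(t2) in reach(p(t1), S_STEP)
                         for t1 in dom for t2 in reach(t1, t_step) & dom),
        "limited_divergence": all(bool(reach(t1, t_step) & dom)
                                  for t0 in T0 for t1 in reach(t0, t_step)),
        "completeness": all(any(p(t2) == s2 for t2 in reach(t1, t_step) & dom)
                            for t1 in dom for s2 in reach(p(t1), S_STEP)),
    }
```

Calling `check(T_STEP)` reports all four conditions as satisfied. Adding a spurious transition such as `((2, 2), (0, 0))` to the implementation breaks soundness, because the specification counter cannot move from 2 back to 0.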
IV. CHECKING SIMULATION USING SMT SOLVERS
We can understand the execution of rule R as the application
of a pure function fR of type S −→ S to the current
state. When the guard of R fails, it causes no state change
(i.e., fR(s) = s). We can compose these functions to generate
a function fσ corresponding to a sequence of rules σ. To prove
the correctness of reﬁnements, we pose queries about fσ to
an SMT solver.
SMT solvers are conceptually Boolean Satisfiability (SAT)
solvers extended to allow predicates over non-Boolean
domains (characterized by the particular theories they
implement). SMT solvers do not directly reason about computation,
but rather permit assertions about the input-output relation
of functions. They provide concrete counter-examples when
an assertion is false. For example, suppose we wish to verify
that some concrete function f behaves as the identity function.
We can formulate a universally quantified property:
$\forall x, y.\ (x = f(y)) \implies (x = y)$. An SMT solver can
check this property by searching for a counter-example, provided
the domains of x and y are finite and f is expressed in terms of
Boolean variables. If the SMT solver can find a counter-example,
then the property is false. If not, then we are assured that f must
be the identity.
The speed of SMT solvers on large domains is due to their
ability to exploit symmetries in the search space [6].
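To make the counter-example view concrete, here is a minimal sketch in which exhaustive search over a small bit-vector domain stands in for the SMT solver. The function `f`, the width `WIDTH`, and the helper `find_counterexample` are hypothetical names chosen for illustration; a real solver would explore the same question symbolically rather than by enumeration.

```python
from itertools import product

WIDTH = 4  # hypothetical bit-vector width; the domain is 0 .. 2**WIDTH - 1

def f(y):
    """Hypothetical function under test: XOR with a mask twice is the identity."""
    return (y ^ 0b1010) ^ 0b1010

def find_counterexample(func, width):
    """Search for (x, y) with x = func(y) and x != y, i.e. a witness that the
    property 'x = func(y) implies x = y' fails. Returns None if none exists."""
    for x, y in product(range(2 ** width), repeat=2):
        if x == func(y) and x != y:
            return (x, y)
    return None

identity_ok = find_counterexample(f, WIDTH) is None
broken = find_counterexample(lambda y: (y + 1) % 2 ** WIDTH, WIDTH)
```

Here `identity_ok` holds because no witness exists for `f`, while `broken` is a concrete pair showing that the incrementing function is not the identity.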
When we reason about rule execution it is often useful to
discard all executions where a rule produces no state update
(a degenerate execution); it is clearly equivalent to the same
execution with that rule removed. As such, when posing
questions to the solver it is useful to add clauses which state
that sequential states of an execution are different. To represent
this assertion for the rule R, we deﬁne the predicate function
fˆR(s2, s1) which asserts that the guard of rule R evaluates to
true in s1 and that s2 is the updated state:
$\hat{f}_R(s_2, s_1) = (s_2 = f_R(s_1)) \wedge (s_2 \neq s_1)$
As with the functions, we can construct a larger predicate
fˆσ(s2, s1) which is true when a non-degenerate execution of
σ takes us from s1 to s2.
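The relationship between the rule function, its composition over a sequence, and the non-degenerate predicate can be sketched as follows. This is an illustrative model only: the helper names (`make_rule`, `f_seq`, `fhat_seq`) and the counter rule `inc` are assumptions, and intermediate states are computed directly rather than being existentially quantified as they would be in a solver query.

```python
from functools import reduce

def make_rule(guard, update):
    """f_R: apply update when the guard holds, otherwise leave the state unchanged."""
    return lambda s: update(s) if guard(s) else s

def f_seq(rules):
    """f_sigma: compose the rule functions of a sequence, left to right."""
    return lambda s: reduce(lambda state, rule: rule(state), rules, s)

def fhat_seq(rules):
    """fhat_sigma(s2, s1): true when executing the sequence from s1 reaches s2
    and every individual step changes the state (no degenerate steps)."""
    def pred(s2, s1):
        cur = s1
        for rule in rules:
            nxt = rule(cur)
            if nxt == cur:          # degenerate step: the rule did nothing
                return False
            cur = nxt
        return cur == s2
    return pred

# Hypothetical rule: increment a counter while it is below 3.
inc = make_rule(lambda s: s < 3, lambda s: s + 1)

final = f_seq([inc, inc])(2)            # second inc is a no-op, result is 3
valid = fhat_seq([inc, inc])(2, 0)      # 0 -> 1 -> 2, both steps change state
degenerate = fhat_seq([inc, inc])(3, 2) # reaches 3, but the second step is degenerate
```

The last two values show the difference between the function and the predicate: `f_seq` happily returns 3 from state 2, while `fhat_seq` rejects that execution because its second step changes nothing.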
Now we explain how the propositions in Deﬁnition 5 can
be checked via a small set of easily answerable SMT queries.
A. Checking Correctness
For this discussion let us assume we have a specification
program P and a refinement P′ whose respective transition
systems $\mathcal{S} = (S, S_0, \rightarrow_S, \rightsquigarrow_S)$ and $\mathcal{T} = (T, T_0, \rightarrow_T, \rightsquigarrow_T)$
are related by the projection function $p : T \rightharpoonup S$.
Now let us consider the soundness proposition from Definition 5:
$\forall t_1, t_2 \in \mathrm{Dom}(p).\ (t_1 \rightsquigarrow_T t_2) \implies (p(t_1) \rightsquigarrow_S p(t_2))$.
A naïve approach to verifying this property entails explicitly
enumerating all pairs $(t_1, t_2)$ in the relation $\rightsquigarrow_T$ and checking
that the corresponding pair $(p(t_1), p(t_2))$ is in the relation $\rightsquigarrow_S$. As
the set of states in each system is finite, both of these
relations are also finite (bounded by $|T|^2$ and $|S|^2$, respectively),
and thus we can mechanically check the implication.
We can substantially reduce this work by noticing two facts.
First, because of transitivity, if we have already checked the
correctness of $t_1 \overset{\sigma_1}{\rightsquigarrow}_T t_2$ and $t_2 \overset{\sigma_2}{\rightsquigarrow}_T t_3$, then there is no need
to check the correctness of the execution $\sigma = \sigma_1\sigma_2$. Second, if we
have already found an execution $\sigma$ such that $t \overset{\sigma}{\rightsquigarrow}_T t'$, then we
can ignore all other executions $\sigma' \neq \sigma$ which have the same
starting and ending states, as they must also be correct. This
essentially reduces the task from checking the entire transitive
closure to checking only a covering of it. Unfortunately, the
size of this covering is still very large.
The insight on which our algorithm is built is that proving
this property for a small set of ﬁnite rule sequences is
tantamount to proving the property for any execution. We
explain this idea using the program in Figure 5.
• Let’s begin by considering all rule sequences of length
one: produce and consume.
• The sequence consume is never valid for execution
starting in a relatable state so we need not consider it
further.
• The sequence produce is valid to execute but does
not take us to a relatable state, so we construct more
sequences by extending it with each rule in the implemen-
tation. These new sequences are produce produce
and produce consume.
• The sequence produce consume always takes a re-
latable state to another relatable state. We check that
all concrete executions of produce consume have
a corresponding execution in the speciﬁcation. We do
this check over a ﬁnite set of sequences in S (in this
case: produce_consume), the selection of which we
will explain later. Since all executions of produce
consume end in a relatable state, we need not extend it.
• produce produce never takes us from a relatable state
to a relatable state, so we again extend the sequence to
get new sequences produce produce produce and
produce produce consume.
• produce produce produce is degenerate if q is of
length 2 (q has to have some known ﬁnite length).
• Suppose we could prove that the sequence produce
produce consume always behaves like produce
consume produce. Then any execution prefixed by
produce produce consume is equal to an execution
prefixed by produce consume produce. Notice
that we need not consider any sequences prefixed by
produce consume produce because it itself has the
prefix produce consume. Therefore we need not consider
further sequences prefixed by produce produce
consume.
• Because we have no new extension to consider, we have
proved the correctness of this reﬁnement.
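The key commutation step in the walkthrough above can be checked exhaustively on a toy model. The sketch below is only an assumption about what the program in Figure 5 looks like: a counter pair `(produced, consumed)` with a queue of capacity `Q_LEN` and a hypothetical bound `MAX_ITEMS`. A state is taken to be relatable when the queue is empty.

```python
MAX_ITEMS, Q_LEN = 4, 2   # hypothetical bounds; Q_LEN = 2 as in the walkthrough

def produce(t):
    produced, consumed = t
    if produced < MAX_ITEMS and produced - consumed < Q_LEN:
        return (produced + 1, consumed)
    return t                      # guard fails: no state change

def consume(t):
    produced, consumed = t
    return (produced, consumed + 1) if produced > consumed else t

def run(seq, t):
    """Final state of a non-degenerate execution of seq from t, else None."""
    for rule in seq:
        nxt = rule(t)
        if nxt == t:
            return None
        t = nxt
    return t

relatable = [(n, n) for n in range(MAX_ITEMS + 1)]   # queue empty

ppc = (produce, produce, consume)
pcp = (produce, consume, produce)
ppp = (produce, produce, produce)

ppc_matches_pcp = all(run(ppc, t) == run(pcp, t) for t in relatable)
ppp_is_degenerate = all(run(ppp, t) is None for t in relatable)
consume_never_valid = all(run((consume,), t) is None for t in relatable)
```

On this model all three flags hold, mirroring the bullets above: consume cannot fire from a relatable state, three consecutive produce steps are degenerate for a queue of length 2, and produce produce consume agrees with produce consume produce.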
Each of these steps involved an invocation of the SMT
solver on queries which are much simpler than the general
query presented previously, though the solver still must
conceptually traverse the entire state space. The queries
themselves are simple because they are always posed over rule
sequences of concrete length, which are much shorter than
the sequences in T. The only problem with this procedure
is that in the worst case this algorithm will run for as many
iterations as the maximum number of states in S. If we give up
before the algorithm terminates, this only means we have failed
to establish the correctness of the refinement. We think it is
unlikely that the type of refinements we consider in this paper
will hit this case. In fact, most refinements can be shown to
be correct with a very small number of considered sequences.
B. The Algorithm
The algorithm constructs three sets, each of whose elements
corresponds to a set of ﬁnite executions of T . For each
iteration, Rσ represents the set of ﬁnite sequences for which
we have explicitly found a corresponding member, and U
represents the set of ﬁnite executions we have yet to verify
(each element of U conceptually represents all ﬁnite sequences
starting with some concrete sequence of rule executions σ).
NU is the new value of U being constructed for the next
iteration of the execution.
The Veriﬁcation Algorithm:
1) Initially: $R_\sigma := \emptyset$, $U := \{R_i \mid R_i \in R_{P'}\}$, $NU := \emptyset$.
2) If $U = \emptyset$, we have verified all finite executions. Exit with
Success.
3) Check if we have reached our iteration limit. If so, give
up, citing the current U set as the cause of the uncertainty.
4) For each $\sigma \in U$:
   a) Check if the execution of σ from a relatable state is
   ever non-degenerate:
      $\exists t_1 \in \mathrm{Dom}(p), t_2 \in T.\ (t_1 \overset{\sigma}{\rightsquigarrow}_T t_2)$
   If no such execution exists we can stop considering σ
   immediately.
   b) Check if σ should be added to $R_\sigma$. That is, if some
   execution of σ should have a correspondence in S:
      $\exists t, t' \in \mathrm{Dom}(p).\ (t \overset{\sigma}{\rightsquigarrow}_T t')$
   If so, $R_\sigma := R_\sigma \cup \{\sigma\}$.
   c) Check if all finite executions of σ that should have a
   correspondence in S have such a correspondence:
      $\forall t, t' \in \mathrm{Dom}(p).\ (t \overset{\sigma}{\rightsquigarrow}_T t') \implies \exists \sigma'.\ (p(t) \overset{\sigma'}{\rightsquigarrow}_S p(t'))$
   If this fails due to some concrete execution of σ,
   exit with Failure, providing the counter-example as
   justification.
   d) For every execution where σ does not put us in a
   relatable state, we must show that extensions of the
   form σσ′ have an equivalent execution σ1σ2σ′, where
   σ1 is a member of $R_\sigma$ and |σ1σ2| ≤ |σ|. Thus, the
   correctness of σσ′ is reduced to the correctness of the
   shorter sequence σ2σ′.
      $\forall t \in \mathrm{Dom}(p), t' \in T.\ (t \overset{\sigma}{\rightsquigarrow}_T t') \implies \exists \sigma_1 \in R_\sigma, \sigma_2.\ (|\sigma_1\sigma_2| \leq |\sigma|) \wedge (\sigma_1(t) \in \mathrm{Dom}(p)) \wedge (\sigma_2(\sigma_1(t)) = t')$
   If this succeeds, we need not consider executions for
   which σ is a prefix. If not, partition all the extensions
   into $|R_{P'}|$ sets by extending σ by one rule
   execution: $NU := NU \cup \{\sigma.R_i \mid R_i \in R_{P'}\}$.
5) $U := NU$, $NU := \emptyset$. Go to Step 2.
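The control flow of the algorithm can be sketched in a few dozen lines if every solver query is replaced by brute-force enumeration over a tiny state space. This is a simplification for illustration, not the tool described in this paper: the producer/consumer model, the bounds `MAX_ITEMS` and `Q_LEN`, the enumeration of candidate sequences by length, and all function names are assumptions.

```python
from itertools import product

MAX_ITEMS, Q_LEN = 4, 2   # hypothetical bounds that keep the state space finite

def produce(t):
    p, c = t
    return (p + 1, c) if p < MAX_ITEMS and p - c < Q_LEN else t

def consume(t):
    p, c = t
    return (p, c + 1) if p > c else t

def produce_consume(s):
    return s + 1 if s < MAX_ITEMS else s

IMPL_RULES = {"produce": produce, "consume": consume}
SPEC_RULES = {"produce_consume": produce_consume}
T_STATES = [(p, c) for p in range(MAX_ITEMS + 1)
            for c in range(p + 1) if p - c <= Q_LEN]

def rel(t):
    return t[0] == t[1]       # relatable when the queue is empty

def proj(t):
    return t[0]               # projection, meaningful only when rel(t)

def run(rules, seq, state):
    """Final state of a non-degenerate execution of seq, or None."""
    for name in seq:
        nxt = rules[name](state)
        if nxt == state:
            return None
        state = nxt
    return state

def seqs_up_to(rules, n):
    """All rule-name sequences of length 0..n (empty when n < 0)."""
    for k in range(n + 1):
        yield from product(rules, repeat=k)

def verify(max_iters=10):
    r_sigma, u = [], [(r,) for r in IMPL_RULES]            # Step 1
    for _ in range(max_iters):                             # Step 3: iteration limit
        if not u:                                          # Step 2
            return "Success"
        nu = []
        for sigma in u:                                    # Step 4
            execs = [(t, run(IMPL_RULES, sigma, t)) for t in T_STATES if rel(t)]
            execs = [(t, t2) for t, t2 in execs if t2 is not None]
            if not execs:                                  # 4a: never valid
                continue
            both = [(t, t2) for t, t2 in execs if rel(t2)]
            if both:                                       # 4b
                r_sigma.append(sigma)
            for t, t2 in both:                             # 4c
                if not any(run(SPEC_RULES, s, proj(t)) == proj(t2)
                           for s in seqs_up_to(SPEC_RULES, len(sigma))):
                    return ("Failure", sigma, t, t2)

            def covered(t, t2):                            # 4d
                for s1 in r_sigma:
                    tm = run(IMPL_RULES, s1, t)
                    if tm is None or not rel(tm):
                        continue
                    budget = len(sigma) - len(s1)
                    if any(run(IMPL_RULES, s2, tm) == t2
                           for s2 in seqs_up_to(IMPL_RULES, budget)):
                        return True
                return False

            if not all(covered(t, t2) for t, t2 in execs):
                nu.extend(sigma + (r,) for r in IMPL_RULES)
        u = nu                                             # Step 5
    return ("Unknown", u)

outcome = verify()
```

On this toy model the loop terminates after examining sequences of length at most three, in the same order of reasoning as the produce/consume walkthrough in Section IV-A.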
C. Formulating the SMT Queries
The four conditions in the inner-most loop of the algorithm
can be formulated as the following SMT queries using the $\hat{f}_\sigma$
predicate, a total, computable version $\hat{p} : T \rightarrow S$ of the
projection function $p$, and a predicate $rel : T \rightarrow \{0, 1\}$, where
$\hat{p}$ agrees with $p$ wherever $p$ is defined and $rel(t)$ returns true
exactly when $p(t)$ is defined.
1) Existence of valid execution of σ starting from a relatable
state:
∃t1, t2 ∈ T.fˆσ(t2, t1) ∧ rel(t1)
2) Verifying that each execution of σ in the implementation
starting and ending in a relatable state has a correspond-
ing execution in the speciﬁcation:
$\forall t_1, t_2 \in T.\ (rel(t_1) \wedge rel(t_2) \wedge \hat{f}_\sigma(t_2, t_1)) \implies \bigvee_{\sigma' \in EC(\sigma)} \hat{f}_{\sigma'}(\hat{p}(t_2), \hat{p}(t_1))$
where EC is the “expected correspondences” function,
which takes a sequence of rules σ in T and returns
a finite set of sequences in S to which σ is likely
to correspond. This function can be easily generated
by the tool or the user: since the refinements are rule
splittings, it is easy to predict the candidates in the
specification that could possibly mimic σ. For instance,
consider the reﬁnement of the program in Figure 3 to
the one in Figure 5. Each occurrence of produce
in the implementation should correspond to an oc-
currence of produce_consume in the speciﬁcation.
Thus, the sequence produce produce consume
produce, if it has a correspondence at all, could
only correspond to the sequence produce_consume
produce_consume produce_consume.
3) Checking that every valid execution of σ in the imple-
mentation has an equivalent sequence which is correct
by concatenation of smaller sequences:
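$\forall t_1, t_2, t_m \in T.\ rel(t_1) \wedge \hat{f}_\sigma(t_2, t_1) \implies rel(t_m) \wedge \bigvee_{\sigma_1 \in R_\sigma} \bigvee_{\sigma_2 \in EA(\sigma, \sigma_1)} (\hat{f}_{\sigma_1}(t_m, t_1) \wedge \hat{f}_{\sigma_2}(t_2, t_m))$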
Our algorithm requires us to ﬁnd, given σ and σ1 in T ,
a σ2 such that the execution of σ is the same as the
execution of σ1σ2, and |σ1σ2| ≤ |σ|. We assume the
existence of an “expected alternatives” function EA which
enumerates all possible σ2 given σ and σ1.
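A possible shape for the two helper functions, for the produce/consume example only, is sketched below. Both definitions are assumptions made for illustration: `expected_correspondences` hard-codes the observation that each produce maps to one produce_consume, and `expected_alternatives` naively enumerates every implementation sequence within the length bound rather than pruning candidates.

```python
from itertools import product

IMPL_RULES = ("produce", "consume")   # rule names of the implementation

def expected_correspondences(sigma):
    """EC: each 'produce' in sigma is expected to match one 'produce_consume'."""
    count = sigma.count("produce")
    return [("produce_consume",) * count]

def expected_alternatives(sigma, sigma1, rules=IMPL_RULES):
    """EA: every candidate sigma2 with len(sigma1) + len(sigma2) <= len(sigma)."""
    budget = len(sigma) - len(sigma1)
    return [s for k in range(budget + 1) for s in product(rules, repeat=k)]

ec = expected_correspondences(("produce", "produce", "consume", "produce"))
ea = expected_alternatives(("produce", "produce", "consume"),
                           ("produce", "consume"))
```

The naive EA grows exponentially with the length budget, so any practical version would need to discard candidates that cannot match.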
D. Step-By-Step Demonstration
For the sake of clarity, we provide an additional example
of the algorithm’s execution. Figure 11 gives the trace of
reasoning through which our algorithm progresses in order to
verify the reﬁnement of the program in Figure 6 to the one in
Figure 7. Each node represents an element in the algorithm’s
set U , and the path from the root to any node in the graph
corresponds to the concrete value σ for that node. At each
node, we verify the correctness of all corresponding ﬁnite
executions of σ: nodes displayed as ⊥ are vacuously true by
Step 4a, while other leaf nodes are either true by Step 4d or
incorrect by Step 4c. The program is ultimately rejected as the
reﬁnement being checked is incorrect:
• We begin by considering all rule sequences of length
one executed in a relatable state: produce, consume,
and observe. The rule observe always ends in a
relatable state, and corresponds directly to the observe
rule in the speciﬁcation program. consume is never valid
to execute, so the only sequence which we extend is
produce since it never ends in a relatable state.
• We now extend produce, giving us three new
sequences to consider: produce produce, produce
consume, and produce observe. produce
consume always ends in a relatable state and
corresponds to the execution of produce_consume
in the speciﬁcation. Neither produce produce, nor
produce observe ever end in a relatable state, and
since we are unable to prove their equivalence to an
execution we have already veriﬁed, we extend both.
• In the third iteration, we consider the sequence produce
observe consume, which always ends in a relatable
state. This exposes an error in the refinement since there
is no possible sequence of rules in the specification which
produces this final state (in this case, the implementation

Fig. 11. Tree visualization of the algorithmic steps to check the refinement
of the program in Figure 6 to the one in Figure 7
V. THE DEBUGGING TOOL AND EVALUATION
Our tool works with Bluespec SystemVerilog (BSV) [4]
which is a commercial rule-based language aimed at hardware
design. It takes as input an intermediate output of the BSV
compiler where all data types are represented as bit vectors.
The rules are expressed using bit vector expressions and
primitive modules (e.g., registers, FIFOs, and memories). The
algorithm in Section IV works more efficiently when rule sizes
are small; therefore, the first phase of the tool reduces
the size of actions by action sequentialization, conditional
merging, and “when lifting” [7]. Next, the tool generates the
function fR for each rule R. We use typed λ-calculus with
let blocks to represent these functions and apply many small
transformations to simplify them.
The tool is essentially an embodiment of the algorithm
shown in Section IV in Haskell. As we have discussed, this
algorithm makes many queries to an SMT solver; we use the

Fig. 12. SMIPS processor refinement. Panel (b): 5-stage SMIPS processor.
(e.g., rule commutativity and sequence degeneracy) of the
programs, we remove unneeded sequences from consideration
in the sets EA and EC. This has a substantial impact on the
size of SMT queries.
To demonstrate our tool, we consider a reﬁnement of a
Simpliﬁed MIPS (SMIPS) processor, whose ISA contains
a representative subset of 35 instructions from the MIPS
ISA. While the ISA semantics are speciﬁed one instruction
at a time, our program is pipelined with ﬁve stages in the
style of the DLX processor [17], and resembles soft-cores
used in many FPGA designs. The execution of the ﬁnal
implementation is split into the following ﬁve separate stages
(see Figure 12(b)):
1) Fetch requests the next instruction from the instruction
memory (imem) based on the pc register which it then
updates speculatively to the next consecutive pc.
2) Decode takes the data from the instruction memory and
the fetch stage, decodes the instruction, and passes it
along to the execute stage. It also reads the appropriate
locations in the register ﬁle rf, stalling to avoid data
hazards (stall logic is not shown).
3) Execute gets decoded instructions from the execute
queue, performs ALU operations and translates addresses
for memory operations. To handle branch operations, it
kills mispredicted instructions and sets the pc.
4) Memory performs reads and writes to the data memory,
passing the data to the writeback stage. (A further
refinement might introduce a more realistic split-phase
memory, which would move some of this functionality
into the writeback stage.)
5) Writeback gets instructions in the form of register des-
tination and value pairs, performing the update on the
register ﬁle.
The implementation program contains one rule per stage,
and stages communicate via FIFO connections. If we were
to execute the rules for each stage in reverse order (starting
from writeback and finishing with fetch), the result is a fully
pipelined system. If each FIFO is implemented as a single
register with a valid bit, this is indistinguishable from the standard
processor complete with pipeline stalls. If instead we execute
the rules in pipeline order, we end up with a system where the
instructions fly through the processor one at a time. For code
simplicity, our final implementation actually decomposes the
execute stage into three mutually exclusive cases, implementing
each with a separate rule (exec, exec_branch, and
exec_branch_mispredict). Since the rule guards are
mutually exclusive, this does not modify the pipeline structure,
nor does it change the analysis.
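The effect of rule ordering can be seen in a small simulation. The sketch below is a deliberately abstract model and not the SMIPS design: instructions are plain integers, each FIFO is a single slot (standing in for a register with a valid bit), and the function name `run_schedule` is an assumption.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]

def run_schedule(order, passes):
    """Fire the stage rules in the given order, `passes` times.
    Returns (retired instructions, peak number of occupied FIFOs)."""
    fifos = [None] * (len(STAGES) - 1)     # fifos[i] links stage i to stage i+1
    next_instr, retired, peak = 0, [], 0
    for _ in range(passes):
        for stage in order:
            i = STAGES.index(stage)
            if i == 0:                                  # fetch: needs an empty output FIFO
                if fifos[0] is None:
                    fifos[0] = next_instr
                    next_instr += 1
            elif i == len(STAGES) - 1:                  # writeback: drains the last FIFO
                if fifos[-1] is not None:
                    retired.append(fifos[-1])
                    fifos[-1] = None
            elif fifos[i - 1] is not None and fifos[i] is None:
                fifos[i], fifos[i - 1] = fifos[i - 1], None   # move one stage forward
            peak = max(peak, sum(slot is not None for slot in fifos))
    return retired, peak

forward = run_schedule(STAGES, 6)          # pipeline order
reverse = run_schedule(STAGES[::-1], 6)    # writeback first, fetch last
```

In pipeline order at most one FIFO is ever occupied and one instruction retires per pass, whereas in reverse order all four FIFOs fill up and the stages overlap, which is the pipelined behavior described above.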
Our implementation is relatively complicated and we would
like to know if it matches the ISA. One way to achieve this is
to start with a single-rule description of the behavior (translit-
erated directly from the documentation, which we consider to
be correct), and incrementally reﬁne the program towards the
ﬁnal ﬁve-stage implementation. After each reﬁnement, our tool
can be used to verify correctness with regard to the previous
iteration. For the sake of brevity, we examine only the ﬁnal
reﬁnement, which takes a four-stage processor (Figure 12(a))
and splits the fetch-decode stage. Though the transformation
is straightforward, the tool must be able to correctly resolve
the effect of speculative execution from branch prediction.
The tool is able to establish the correctness of this reﬁne-
ment step in under 7 minutes. To do so it needed to check
21 executions in the reﬁned program of maximum length 3,
ﬁnding correspondences in the four-stage program for the 5
corresponding rules, fetch_decode for fetch decode,
and exec_branch_mispredict for the misspeculating sequences
fetch exec_branch_mispredict and fetch
fetch exec_branch_mispredict.
VI. RELATED WORK
There is a rich body of literature for verifying pipelined
processors (see, e.g., [2], [5], [12], [15], [20]). Most of this
work is motivated by proving the correctness of a pipelined or
out-of-order microarchitectural implementation of a processor.
Usually the speciﬁcation is an unpipelined model of the
processor. The work generally relies on mechanical theorem
proving, and the technique of pipeline draining to match the
speciﬁcation and implementation states is well established in
this context. Our lifting and projection functions effectively do
the same thing, and at some level, all these papers are about
the ability of the systems to simulate each other. The literature
on bisimulation (see, e.g., [16]) and stuttering simulation (see,
e.g., [14]) is also very rich and relevant.
Our tool is totally automatic and is intended as a debugging
aid as opposed to establishing total correctness of the design.
The problem we address is one of local transformation, i.e.,
an atomic rule which has been split into multiple atomic sub-
rules with the same functionality. We want to show that this
local transformation has no “bad” consequences on the whole
design. In this scenario almost every aspect of the design
has been speciﬁed and it becomes amenable to automated
veriﬁcation. The proof of the reﬁnements we discussed relied
on a concrete size for the inserted FIFOs; in contrast, the proof
of Arvind and Shen [2] works for FIFOs of any size.
Singh et al. [19] translate a restricted subset of Bluespec to
PROMELA as a means for querying the SPIN model checker
about reﬁnements. Conceptually one could translate both the
speciﬁcation and implementation programs into PROMELA
and verify that the PROMELA systems represent a valid
PROMELA reﬁnement. The subset of Bluespec they consider
is interesting from a semantic point of view, but not large
enough to deal with realistic programs. Richards et al. [18]
proposed a more complete translation of Bluespec to PVS
leveraging a monadic representation. Both of these focused on
the task of getting the Bluespec design translated for the model
checker faithfully. Expressing our formulation of refinement
correctness, which is based on the transitive closure of rules, as
an LTL or CTL property is not straightforward; in fact it is not
clear to us if this is even possible.
VII. DISCUSSION
In this paper we have presented both a notion of reﬁnement
and a tool for verifying the correctness of such reﬁnements.
The exact notion of correctness is aimed at being easily and
naturally expressible by a designer. The tool quickly ﬁnds the
correspondence by searching for a minimum cover of the paths
in the system starting from all possible relatable states.
The notion of equality we implement is that of a closed
system. We have shown intuitively how to rephrase the verification
of open systems, i.e., modules, by encoding the notion
of equivalence programmatically. This is highly practical as it
is both easy to do and lends itself to the way that designers
understand equivalences, allowing them to express their con-
cerns more precisely. It is possible to automatically generate
the necessary contexts to express some standard equivalences,
e.g., trace equivalence.
The current tool can be improved in three orthogonal
dimensions. First, we currently only leverage the theory of bit
vectors. By adding additional theories of FIFOs, arrays, and
uninterpreted functions [13] we can dramatically reduce the
complexity of our SMT queries. Second, our interface with
the SMT solver is inefﬁcient, requiring ﬁle-level IO. More
than half of the compute time comes from marshaling and
unmarshaling the query representation. This clearly can be
eliminated by directly integrating an SMT solver with our tool.
Finally, our algorithm allows us to reason about each element
of U in parallel. This is quite straightforward to exploit in
a multithreaded implementation of our program. With a fast
enough tool, we are conﬁdent that this can help shape how
designers approach their work and will encourage further use
of formal reasoning in design.
ACKNOWLEDGMENTS
We are thankful to the anonymous referee who helped
us clarify the difference between our transitive closure-based
equivalence and the more standard trace-based techniques.
We are thankful to Armando Solar-Lezama for helping us
understand the difﬁculty of expressing our problem in model
checking. This work has been supported by the National
Science Foundation (#CCF-0541164).
REFERENCES
[1] Arvind, Rishiyur S. Nikhil, Daniel L. Rosenband, and Nirav Dave. High-
level Synthesis: An Essential Ingredient for Designing Complex ASICs.
In Proceedings of ICCAD’04, San Jose, CA, 2004.
[2] Arvind and Xiaowei Shen. Using Term Rewriting Systems to Design
and Verify Processors. IEEE Micro, 19(3):36–46, May 1999.
[3] Arvind, Nirav Dave, and Michael Katelman. Getting formal
verification into design flow. In Proceedings of the 15th International
Symposium on Formal Methods, FM ’08, pages 12–32, 2008.
[4] Bluespec, Inc., Waltham, MA. Bluespec SystemVerilog Version 3.8
Reference Guide, November 2004.
[5] Jerry R. Burch and David L. Dill. Automatic veriﬁcation of pipelined
microprocessor control. In Computer Aided Veriﬁcation, pages 68–80,
1994.
[6] Stephen A. Cook. The complexity of theorem-proving procedures.
In Proceedings of the third annual ACM symposium on Theory of
computing, STOC ’71, pages 151–158, New York, NY, 1971.
[7] Nirav Dave, Arvind, and Michael Pellauer. Scheduling as Rule Com-
position. In Proceedings of Formal Methods and Models for Codesign
(MEMOCODE), Nice, France, 2007.
[8] Nirav Dave, Man Cheuk Ng, Michael Pellauer, and Arvind. A design
ﬂow based on modular reﬁnement. In Formal Methods and Models
for Co-Design, 2010. MEMOCODE ’10. 8th IEEE/ACM International
Conference on, June 2010.
[9] Kermin Fleming, Chun-Chieh Lin, Nirav Dave, Gopal Raghavan, Jamey
Hicks, and Arvind. H.264 decoder: A case study in multiple design
points. In Proceedings of Formal Methods and Models for Codesign
(MEMOCODE 2008), Anaheim, CA, 2008.
[10] Vijay Ganesh and David L. Dill. A Decision Procedure for Bit-Vectors
and Arrays. In 19th International Conference on Computer Aided
Veriﬁcation (CAV-07), pages 519–531, 2007.
[11] James C. Hoe and Arvind. Operation-Centric Hardware Description
and Synthesis. IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, 23(9), September 2004.
[12] Sava Krstic´, Robert B. Jones, and John O’Leary. Mothers of pipelines.
Electron. Notes Theor. Comput. Sci., 174:7–22, June 2007.
[13] S. Lahiri, S. Seshia, and R. Bryant. Modeling and veriﬁcation of out-
of-order microprocessors in UCLID. In FMCAD ’02, volume 2517 of
LNCS, pages 142–159. Springer-Verlag, November 2002.
[14] P. Manolios. A compositional theory of reﬁnement for branching time.
In CHARME 2003, volume 2860 of Lecture Notes in Computer Science,
pages 304–318. Springer, 2003.
[15] Kenneth L. McMillan. Verification of an implementation of Tomasulo’s
algorithm by compositional model checking. In Proceedings of the
10th International Conference on Computer Aided Veriﬁcation, CAV
’98, pages 110–121, London, UK, 1998.
[16] K. S. Namjoshi. A simple characterization of stuttering bisimulation. In
FSTTCS’97, volume 1346 of Lecture Notes in Computer Science, pages
284–296. Springer, 1997.
[17] David A. Patterson and John L. Hennessy. Computer Organization
& Design: The Hardware/Software Interface, Second Edition. Morgan
Kaufmann, 1997.
[18] Dominic Richards and David Lester. A monadic approach to automated
reasoning for Bluespec SystemVerilog. Innovations in Systems and
Software Engineering, pages 1–11, 2011.
[19] Gaurav Singh and Sandeep Shukla. Verifying compiler based reﬁnement
of Bluespec specifications using the SPIN model checker. In Model
Checking Software, volume 5156 of Lecture Notes in Computer Science,
pages 250–269. Springer Berlin / Heidelberg, 2008.
[20] P.J. Windley. Formal modeling and veriﬁcation of microprocessors.
IEEE Transactions on Computers, 44(1):54–72, January 1995.
