Verifying Safety of a Token Coherence Implementation by Parametric Compositional Refinement by Burckhardt, Sebastian et al.
University of Pennsylvania
ScholarlyCommons
Departmental Papers (CIS) Department of Computer & Information Science
January 2005
Verifying Safety of a Token Coherence
Implementation by Parametric Compositional
Refinement
Sebastian Burckhardt
University of Pennsylvania
Rajeev Alur
University of Pennsylvania, alur@cis.upenn.edu
Milo M.K. Martin
University of Pennsylvania, milom@cis.upenn.edu
Follow this and additional works at: http://repository.upenn.edu/cis_papers
From the 6th International Conference, VMCAI 2005, Paris, France, January 17-19, 2005.
This paper is posted at ScholarlyCommons. http://repository.upenn.edu/cis_papers/174
For more information, please contact libraryrepository@pobox.upenn.edu.
Recommended Citation
Sebastian Burckhardt, Rajeev Alur, and Milo M.K. Martin, "Verifying Safety of a Token Coherence Implementation by Parametric
Compositional Refinement", Lecture Notes in Computer Science: Verification, Model Checking, and Abstract Interpretation 3385, 130-145.
January 2005. http://dx.doi.org/10.1007/978-3-540-30579-8_9
Verifying Safety of a Token Coherence
Implementation by Parametric Compositional
Refinement∗
(Extended Version with Proofs)†
Sebastian Burckhardt Rajeev Alur Milo M. K. Martin
University of Pennsylvania
Abstract
We combine compositional reasoning and reachability analysis to for-
mally verify the safety of a recent cache coherence protocol. The protocol
is a detailed implementation of token coherence, an approach that decou-
ples correctness and performance. First, we present a formal and abstract
specification that captures the safety substrate of token coherence, and
highlights the symmetry in states of the cache controllers and contents
of the messages they exchange. Then, we prove that this abstract speci-
fication is coherent, and check whether the implementation proposed by
the protocol designers is a refinement of the abstract specification. Our
refinement proof is parametric in the number of cache controllers, and is
compositional as it reduces the refinement checks to individual controllers
using a specialized form of assume-guarantee reasoning. The individual
refinement obligations are discharged using refinement maps and reach-
ability analysis. While the formal proof justifies the intuitive claim by
the designers about the ease of verifiability of token coherence, we report
on several bugs in the implementation, and accompanying modifications,
that were missed by extensive prior simulations.
∗This research was partially supported by the NSF award CCR0306382, and a donation
from Intel Corporation.
†The published short version [7] is copyrighted by Springer Verlag. This extended version
includes more material, in particular more proofs. Because readability suffers as a result, we
recommend that this version be used for reference only. Both versions can be found online [8].
The numbering of proofs, definitions and examples is consistent with the regular version;
references that appear only in the extended version are prefixed with E.
1 Introduction
Shared memory multiprocessors have become the most important architecture
used for commercial and scientific workloads. Such systems use hardware cache
coherence protocols to create the illusion of a single, shared memory without
caches. These protocols are important factors of the overall system performance,
and numerous optimizations contribute to their complexity. Since hard-to-cover
race conditions elude simulations of the protocols, formal methods are often
employed to verify their correctness.
Token Coherence is a new approach to cache coherence protocols that decou-
ples correctness requirements from performance choices, claiming to improve
both performance and verifiability [23]. Separate correctness mechanisms en-
sure safety and liveness. Safety is achieved by token counting: per memory
location, the number of tokens in the system is a global invariant. By requiring
at least one token for read access and all tokens for write access, the protocol
directly enforces a single-writer, multiple-reader policy. Liveness, on the other
hand, is achieved by persistent requests. This reliable but slower protocol is
used when the regular requests do not succeed within a timeout period. Persis-
tent requests are required because the regular requests, while likely to complete
quickly, do not guarantee eventual success.
In this work, we combine compositional verification and model checking to verify
the safety of a detailed implementation of a token coherence protocol for an
arbitrary number of caches. Our method takes advantage of the opportunities
offered by the token coherence design. It proceeds in four steps.
1. We present a formal specification of the safety substrate of token coher-
ence. This abstract protocol is based on rewrite rules and multisets, and
expresses the symmetry between components and messages. It applies to
arbitrary network topologies, cache numbers, and even cache hierarchies.
2. We prove manually that the abstract protocol is safe (i.e. coherent). The
verification problem is thus reduced to checking that the implementation
correctly refines the abstract protocol.
3. We prove that the refinement can be verified for each component individ-
ually, by replacing its context with an abstraction. We prove that this
decomposition into local refinement obligations is sound, using a variant
of assume-guarantee reasoning based on contextual refinement, and per-
forming an induction on the number of caches.
4. We discharge the local refinement obligations with the conventional model
checker Murϕ [13, 12]. To obtain the models, we manually translate,
abstract and annotate the implementation code. This procedure reduces
the refinement checking to a reachability problem, which Murϕ solves by
enumerative state space search.
Even though the protocol implementation had been extensively simulated prior
to this work, we discovered a few bugs, and were able to fix them quickly with
the help of counterexamples produced by the model checker. The compositional
refinement method proved to be effective in avoiding the state space explosion
problem [17] which is commonly encountered in system-level models [29].
1.1 Related Work
Prior work on formal verification of cache coherence varies in (1) the protocol
complexity and level of detail, (2) the coverage achieved (safety, liveness,
parametric systems), (3) the underlying tools (enumerative or symbolic model
checkers, decision procedures, theorem provers), (4) reduction techniques (symmetry,
abstraction, compositional verification), and (5) degree of automation.
We refer to Pong and Dubois [29] for a general survey, and to various illustrative
efforts [24, 28, 15, 3].
Our proof methodology modifies and combines a variety of ideas in the formal
verification literature. These include assume-guarantee reasoning for compositional
verification (cf. [1, 9, 2, 26]), structural induction for proving properties
for an arbitrary number of processes (cf. [20, 10, 16, 14, 11, 4]), data abstraction
(cf. [34, 18]), use of term rewrite systems for hardware verification [5], and
proving refinement using reachability analysis (cf. [19]).
2 Process Model
In this section, we define the process model and introduce our assume-guarantee
proof rules. We chose to define the process model from scratch, so as to keep it
concise and self-contained, and to obtain the desired combination of features.
Except for the specialized definition of contextual refinement, all concepts (traces,
composition, refinement) are standard and appear in many variations and combinations
in the process algebra literature [30].
A process is defined as the set of its traces, which are finite words over an
alphabet Σ of events. Σ is considered fixed and common to all processes. We
further partition Σ = Σe∪Σc into disjoint subclasses: Σe contains events that are
visible to external observers of the system only, while Σc describes synchronous
communication events. Matching events in Σc (e.g. sending and receiving of a
message) are denoted σ and σ̄.
Definition 2.1 A process P over Σ is a non-empty prefix-closed language; i.e.
P ⊂ Σ∗, P ≠ ∅ and for all u, v ∈ Σ∗ : uv ∈ P ⇒ u ∈ P. A process P refines a
process Q, written P ≼ Q, iff P ⊂ Q. A process P is closed if P ⊂ Σ∗e.
The refinement relation ≼ is a complete partial order on the processes. The
bottom (silent) process {ε} has but one trace: the empty string ε. The top
(universal) process Σ∗ includes all possible traces.
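As a rough illustration of Definition 2.1, the notions of prefix closure and refinement can be sketched in Python. The sketch assumes finite trace sets (the processes in the text are in general infinite), encodes a trace as a tuple of event names, and uses hypothetical function names that do not appear in the paper.

```python
def prefix_closure(traces):
    """Return the smallest prefix-closed set containing the given traces."""
    return {t[:i] for t in traces for i in range(len(t) + 1)}


def is_process(traces):
    """A process is a non-empty, prefix-closed set of traces."""
    return bool(traces) and prefix_closure(traces) == set(traces)


def refines(p, q):
    """P refines Q iff every trace of P is also a trace of Q."""
    return set(p) <= set(q)


# The silent process has only the empty trace.
SILENT = {()}
EXAMPLE = prefix_closure({("a", "b")})
```

On finite sets, refinement is plain set inclusion, so the silent process refines every process.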
When composing processes, we merge their traces by interleaving their events
and hiding mutual communication.
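Definition 2.2 Let u, v, w ∈ Σ∗ be traces. We define the relation u | v ⊢ w
(read: u and v can combine to form w) by the following inference rules:
$$\frac{}{\varepsilon \mid \varepsilon \vdash \varepsilon}\ \text{(epsilon)}
\qquad
\frac{u \mid v \vdash w \quad \sigma \in \Sigma_c}{u\sigma \mid v\bar{\sigma} \vdash w}\ \text{(communication)}$$
$$\frac{u \mid v \vdash w \quad \sigma \in \Sigma}{u\sigma \mid v \vdash w\sigma}\ \text{(l-event)}
\qquad
\frac{u \mid v \vdash w \quad \sigma \in \Sigma}{u \mid v\sigma \vdash w\sigma}\ \text{(r-event)}$$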
Example 2.3 Let Σe = {a, b, c, d} and Σc = {e, ē}. Then we have
ab | cd ⊢ acbd,  ab | cd ⊢ abcd,  ae | ēb ⊢ ab,  ae | ēb ⊢ aeēb,
but not ae | ēb ⊢ ba.
Definition 2.4 Let P, Q be processes. Then P | Q ≐ {w ∈ Σ∗ | ∃u ∈ P : ∃v ∈
Q : u | v ⊢ w}.
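The combination relation and the composition operator can be explored on small examples with a short Python sketch. It is not part of the paper: traces are encoded as tuples of strings, the complement of a communication event `e` is written `~e` (an encoding chosen here for illustration), and the names `bar`, `combine` and `compose` are hypothetical. The function enumerates all w that the four inference rules can derive.

```python
from functools import lru_cache


def bar(event):
    """Complement of a communication event: 'e' <-> '~e' (assumed encoding)."""
    return event[1:] if event.startswith("~") else "~" + event


def combine(u, v, comm):
    """Return the set of all traces w that u and v can combine to form."""

    @lru_cache(maxsize=None)
    def go(i, j):
        # go(i, j) holds all combinations of the prefixes u[:i] and v[:j].
        if i == 0 and j == 0:
            return frozenset({()})                      # rule (epsilon)
        out = set()
        if i > 0:                                       # rule (l-event)
            out |= {w + (u[i - 1],) for w in go(i - 1, j)}
        if j > 0:                                       # rule (r-event)
            out |= {w + (v[j - 1],) for w in go(i, j - 1)}
        if i > 0 and j > 0 and u[i - 1] in comm and v[j - 1] == bar(u[i - 1]):
            out |= go(i - 1, j - 1)                     # rule (communication)
        return frozenset(out)

    return set(go(len(u), len(v)))


def compose(p, q, comm):
    """Composition of two finite trace sets: all combinations of their traces."""
    return {w for u in p for v in q for w in combine(u, v, comm)}


COMM = frozenset({"e", "~e"})
```

With this encoding, `combine(("a", "e"), ("~e", "b"), COMM)` contains both `("a", "b")`, where the matching events are hidden, and `("a", "e", "~e", "b")`, where they are interleaved, but no trace that ends in `a`.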
Composition is commutative, which follows from Lemma E.1 below. It is also
associative, which follows from Lemma E.2 below.
Lemma E.1 Let u, v, w ∈ Σ∗. If u | v ⊢ w, then v | u ⊢ w.
Proof. We prove this by induction on the derivation of u | v ⊢ w, with a
case distinction on the type of the last rule used.
case (epsilon):
In this case, u = v = w = ε, so the claim is immediate.
case (l-event):
In this case, we have a derivation for u′σ | v ⊢ w′σ.
We have a derivation subtree for u′ | v ⊢ w′.
By induction, v | u′ ⊢ w′.
Apply (r-event) to get v | u ⊢ w.
case (r-event):
In this case, we have a derivation for u | v′σ ⊢ w′σ.
We have a derivation subtree for u | v′ ⊢ w′.
By induction, v′ | u ⊢ w′.
Apply (l-event) to get v | u ⊢ w.
case (communication):
In this case, we have a derivation for u′σ | v′σ̄ ⊢ w.
We have a derivation subtree for u′ | v′ ⊢ w.
By induction, v′ | u′ ⊢ w.
Apply (communication) to get v | u ⊢ w.
□
Lemma E.2 Let u1, u2, u3, v, w ∈ Σ∗. If the following two conditions hold:
u1 | u2 ⊢ v (1)
v | u3 ⊢ w (2)
then there exists a word z ∈ Σ∗ such that both of the following hold:
u1 | z ⊢ w (3)
u2 | u3 ⊢ z (4)
Proof. The proof proceeds by induction on the sum of the depths of the
derivation trees for (1) and (2). We do a case distinction on what inference
rules are used at the bottom of those derivations. For each combination of
inference rules, there is at least one matching proof case.
case (epsilon),(epsilon):
We are given derivations for (1),(2) of the form
ε | ε ⊢ ε and ε | ε ⊢ ε
Set z = ε.
This gives (3) by (epsilon) and (4) by (epsilon).
case (epsilon),(l-event):
case (epsilon),(communication):
case (r-event),(epsilon):
case (l-event),(epsilon):
These combinations are not possible, because the variable v
cannot match both ε and v′σ.
case (communication),(any):
We are given derivations for (1),(2) of the form
u′1σ | u′2σ̄ ⊢ v and v | u3 ⊢ w
Furthermore, we have a derivation subtree for
u′1 | u′2 ⊢ v
By the induction hypothesis, there exists a z′ ∈ Σ∗ such that
(a) u′1 | z′ ⊢ w (b) u′2 | u3 ⊢ z′
Now set z = z′σ̄. Then
we get (3) from (a) by (communication),
and we get (4) from (b) by (l-event).
case (any),(r-event):
We are given derivations for (1),(2) of the form
u1 | u2 ⊢ v and v | u′3σ ⊢ w′σ
Furthermore, we have a derivation subtree for
v | u′3 ⊢ w′
By the induction hypothesis, there exists a z′ ∈ Σ∗ such that
(a) u1 | z′ ⊢ w′ (b) u2 | u′3 ⊢ z′
Now set z = z′σ. Then
we get (3) from (a) by (r-event),
and we get (4) from (b) by (r-event).
case (l-event),(l-event):
We are given derivations for (1),(2) of the form
u′1σ | u2 ⊢ v′σ and v′σ | u3 ⊢ w′σ
Furthermore, we have derivation subtrees for
u′1 | u2 ⊢ v′ and v′ | u3 ⊢ w′
By the induction hypothesis, there exists a z ∈ Σ∗ such that
(a) u′1 | z ⊢ w′ (b) u2 | u3 ⊢ z
Now we get (3) from (a) by (l-event),
and (4) is (b).
case (r-event),(l-event):
We are given derivations for (1),(2) of the form
u1 | u′2σ ⊢ v′σ and v′σ | u3 ⊢ w′σ
Furthermore, we have derivation subtrees for
u1 | u′2 ⊢ v′ and v′ | u3 ⊢ w′
By the induction hypothesis, there exists a z′ ∈ Σ∗ such that
(a) u1 | z′ ⊢ w′ (b) u′2 | u3 ⊢ z′
Now set z = z′σ. Then
we get (3) from (a) by (r-event),
and we get (4) from (b) by (l-event).
case (l-event),(communication):
We are given derivations for (1),(2) of the form
u′1σ | u2 ⊢ v′σ and v′σ | u′3σ̄ ⊢ w
Furthermore, we have derivation subtrees for
u′1 | u2 ⊢ v′ and v′ | u′3 ⊢ w
By the induction hypothesis, there exists a z′ ∈ Σ∗ such that
(a) u′1 | z′ ⊢ w (b) u2 | u′3 ⊢ z′
Now set z = z′σ̄. Then
we get (3) from (a) by (communication),
and we get (4) from (b) by (r-event).
case (r-event),(communication):
We are given derivations for (1),(2) of the form
u1 | u′2σ ⊢ v′σ and v′σ | u′3σ̄ ⊢ w
Furthermore, we have derivation subtrees for
u1 | u′2 ⊢ v′ and v′ | u′3 ⊢ w
By the induction hypothesis, there exists a z ∈ Σ∗ such that
(a) u1 | z ⊢ w (b) u′2 | u′3 ⊢ z
Now (a) already is (3),
and we get (4) from (b) by (communication).
□
As in CCS [27], composition does not restrict its components: for processes
P, Q we always have P ≼ P | Q. This follows from Lemma E.3. Furthermore,
for all processes P, P | {ε} = P. This follows from the above, and from Lemma
E.4.
Lemma E.3 For any u ∈ Σ∗, u | ε ⊢ u and ε | u ⊢ u.
Proof. By induction on the length of u. If u = ε, then the result is direct from
rule (epsilon). Otherwise, u = u′σ for some σ ∈ Σ. By induction, u′ | ε ⊢ u′
and ε | u′ ⊢ u′. Application of the rules (l-event) and (r-event), respectively,
then implies the result. □
Lemma E.4 Let u, v ∈ Σ∗. If u | ε ⊢ v or ε | u ⊢ v, then u = v.
Proof. We prove that u | ε ⊢ v implies u = v (the other case follows with Lemma
E.1), using induction on the derivation of u | ε ⊢ v, with a case distinction on
the last rule used.
case (epsilon):
u = v is immediate from the matching.
case (l-event):
In this case, u = u′σ and v = v′σ for some σ ∈ Σ.
We have a derivation subtree for u′ | ε ⊢ v′.
By induction, u′ = v′, and therefore u = v.
case (r-event):
case (communication):
These are impossible: their second component cannot match ε.
□
Refinement is preserved by composition: if P′ ≼ P, then P′ | Q ≼ P | Q. We
can use this fact to prove that a system implementation refines its specification
P′ | Q′ ≼ P | Q (5)
from the simpler, local refinement conditions
P′ ≼ P and Q′ ≼ Q (6)
However, this method is not very powerful, because the refinements (6) often do
not hold, owing to implicit assumptions on the context. Assume-guarantee
reasoning remedies this shortcoming. We provide the context as an explicit
subscript to the refinement relation, enabling us to conclude (5) from
P′ ≼_Q P and Q′ ≼_P Q (7)
Most process models used for compositional refinement of hardware [2, 25] can
express the contextual refinement P′ ≼_Q P directly as P′ ‖ Q ≼ P (using synchronous
parallel composition). The same does not work in our context (as
exemplified by observation 5 below), so we use a direct definition instead.
Definition 2.5 (Contextual refinement.) Let P, P′, C be processes. Then
P′ is said to refine P in context C, written P′ ≼_C P, iff for all traces u ∈ P′
the following condition holds: if there is a trace v ∈ C such that u ↑ Σc = v ↑ Σc
(i.e. the communication events in u, v match up), then u ∈ P.
Intuitively, we require that all behaviors of P ′ that are actually possible within
an environment that adheres to C are allowed by P .
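A small Python sketch may help to see how the context filters the traces that have to be checked. It works on finite trace sets only and makes two assumptions that are not stated in the text: the complement of an event `m` is encoded as `~m`, and "the communication events match up" is read as "the projection of v is the event-wise complement of the projection of u". All function names are hypothetical.

```python
def bar(event):
    """Complement of a communication event: 'm' <-> '~m' (assumed encoding)."""
    return event[1:] if event.startswith("~") else "~" + event


def project(trace, comm):
    """Keep only the communication events of a trace, in order."""
    return tuple(e for e in trace if e in comm)


def matches(u, v, comm):
    """True if the communication events of u and v pair up one by one."""
    pu, pv = project(u, comm), project(v, comm)
    return len(pu) == len(pv) and all(b == bar(a) for a, b in zip(pu, pv))


def refines_in_context(impl, spec, context, comm):
    """Only traces of impl that some context trace can match must be in spec."""
    return all(
        u in spec
        for u in impl
        if any(matches(u, v, comm) for v in context)
    )


COMM = frozenset({"m", "~m"})
SPEC = {(), ("a",)}
# The implementation misbehaves, but only after receiving the message m.
IMPL = {(), ("a",), ("~m",), ("~m", "b")}
SILENT_CONTEXT = {()}
SENDING_CONTEXT = {(), ("m",)}
```

In this toy example the implementation does not refine the specification outright, yet it does refine it in the silent context, because no context trace ever supplies the message that triggers the extra behavior. A context that can send `m` exposes the violation.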
The following observations provide insight about contextual refinement.
1. For any process C, ≼_C is a pre-order on processes.
2. If P′ ≼_C P, and C′ ≼ C, then P′ ≼_C′ P.
3. Refinement in a universal context corresponds to regular refinement:
P′ ≼_Σ∗ P ⇔ P′ ≼ P.
4. Refinement in a silent context corresponds to refinement of closed processes:
P′ ≼_{ε} P ⇔ (P′ ∩ Σ∗e) ≼ (P ∩ Σ∗e)
5. The refinement P′ | C ≼_{ε} P | C does not imply P′ ≼_C P, because the
traces of P′ | C do not indicate what mutual communication takes place.
However, the converse always holds.
The following lemma shows how contextual refinement and composition are
related to each other.
Lemma E.5 Let u, v ∈ Σ∗. Then u ↑ Σc = v ↑ Σc iff there exists a w ∈ Σ∗e
such that u | v ⊢ w.
Proof. First, we show the implication ‘⇒’, using induction on the sum of the
lengths of u and v. We make a case distinction based on the kind of the last
events in u and v.
• If u = ε, then u ↑ Σc = ε, therefore v ↑ Σc = ε, and therefore v ∈ Σ∗e.
Now choose w = v, and because we know ε | v ⊢ v by Lemma E.3, we get
u | v ⊢ w.
• If v = ε, then v ↑ Σc = ε, so u ↑ Σc = ε, and therefore u ∈ Σ∗e. Now
choose w = u, and because we know u | ε ⊢ u by Lemma E.3, we get
u | v ⊢ w.
• If u = u′σ with σ ∈ Σe, then u ↑ Σc = u′ ↑ Σc and by induction, there
must exist a w′ ∈ Σ∗e such that u′ | v ⊢ w′. Now choose w = w′σ, and by
applying (l-event) we get u | v ⊢ w.
• If v = v′σ with σ ∈ Σe, then v ↑ Σc = v′ ↑ Σc and by induction, there
must exist a w′ ∈ Σ∗e such that u | v′ ⊢ w′. Now choose w = w′σ, and by
applying (r-event) we get u | v ⊢ w.
• If u = u′σ and v = v′τ with σ, τ ∈ Σc, then (because u ↑ Σc = v ↑ Σc)
it must be the case that τ = σ̄ and u′ ↑ Σc = v′ ↑ Σc. By induction,
there must exist a w ∈ Σ∗e such that u′ | v′ ⊢ w. Now we simply apply
(communication) to get u | v ⊢ w.
Now, we show the implication ‘⇐’, using induction on the derivation of u | v ⊢
w, with a case distinction on the last rule used.
case (epsilon):
In this case, u = v = ε and therefore trivially
u ↑ Σc = v ↑ Σc.
case (l-event):
In this case, u = u′σ and w = w′σ for some σ ∈ Σ.
Because w ∈ Σ∗e, we must have σ ∈ Σe and w′ ∈ Σ∗e.
We have a derivation subtree for u′ | v ⊢ w′.
By induction, u′ ↑ Σc = v ↑ Σc,
from which it follows that u ↑ Σc = v ↑ Σc.
case (r-event):
In this case, v = v′σ and w = w′σ for some σ ∈ Σ.
Because w ∈ Σ∗e, we must have σ ∈ Σe and w′ ∈ Σ∗e.
We have a derivation subtree for u | v′ ⊢ w′.
By induction, u ↑ Σc = v′ ↑ Σc,
from which it follows that u ↑ Σc = v ↑ Σc.
case (communication):
In this case, u = u′σ and v = v′σ̄ for some σ ∈ Σc.
We have a derivation subtree for u′ | v′ ⊢ w.
By induction, u′ ↑ Σc = v′ ↑ Σc,
from which it follows that u ↑ Σc = v ↑ Σc.
□
To avoid circularity in the assume-guarantee reasoning, we conservatively require
that the specification processes can always engage in a subset of communication
events Σr ⊂ Σc that is sufficiently large, i.e. Σr ∪ Σ̄r = Σc; in our case,
we will take care of this requirement by having specification processes accept
any message at any time¹. We use the following definition to formalize this
property of processes.
Definition 2.6 Let P be a process over Σ, and Σr ⊂ Σ be an event subset. P
is called Σr-enabled iff ∀u ∈ P : ∀σ ∈ Σr : uσ ∈ P.
We now give the two proof rules for compositional refinement. The first rule is
simpler, but restricted to two components. The second rule is a generalization
suited for induction.
Theorem 2.7 Let P, P′, Q, Q′, C be processes over Σ = Σe ∪ Σc. Let Σr ⊂ Σc
such that Σr ∪ Σ̄r = Σc. Then the following proof rules are sound:
$$\frac{P' \preccurlyeq_{Q} P \qquad P, Q \text{ are } \Sigma_r\text{-enabled} \qquad Q' \preccurlyeq_{P} Q}{P' \mid Q' \preccurlyeq_{\{\varepsilon\}} P \mid Q}$$
$$\frac{P' \preccurlyeq_{Q \mid C} P \qquad P, Q \text{ are } \Sigma_r\text{-enabled} \qquad Q' \preccurlyeq_{P \mid C} Q}{P' \mid Q' \preccurlyeq_{C} P \mid Q}$$
Proof. We need only prove the second rule, since the first can be obtained
from the second by setting C = {ε}.
Suppose we are given processes P, Q, P′, Q′, C that satisfy the hypotheses of the
proof rule. Suppose that u ∈ P′ | Q′ and v ∈ C and u ↑ Σc = v ↑ Σc. There
must be a derivation for u ∈ P′ | Q′, say p | q ⊢ u for some p ∈ P′ and q ∈ Q′.
By Lemma E.5, there exists a w ∈ Σ∗e such that u | v ⊢ w. Now Lemma E.6
tells us that in this situation, p ∈ P and q ∈ Q. Because p | q ⊢ u, this implies
that u ∈ P | Q. Therefore, P′ | Q′ ≼_C P | Q. □
¹ If this is not true by default, we could extend the specification to generate a special error
event if it receives an unexpected message.
Lemma E.6 Let P, Q, P′, Q′, C be processes, let Σr ⊂ Σc such that Σr ∪ Σ̄r =
Σc, let both P, Q be Σr-enabled, let P′ ≼_{Q|C} P, let Q′ ≼_{P|C} Q and let p ∈ P′,
q ∈ Q′, c ∈ C.
Now, if p | q ⊢ u for some u ∈ Σ∗, and u | c ⊢ w for some w ∈ Σ∗e, then p ∈ P
and q ∈ Q.
Proof. We prove this by induction on the derivation of p | q ⊢ u, with a case
distinction on the kind of rule used last.
case (epsilon):
In this case, p = q = ε and therefore p ∈ P and q ∈ Q.
case (l-event):
In this case, we have p = p′σ, u = u′σ and derivations
p′σ | q ⊢ u′σ and u′σ | c ⊢ w
We have a derivation subtree for p′ | q ⊢ u′.
By Lemma E.7, there are c′ ∈ C and w′ ∈ Σ∗e such that u′ | c′ ⊢ w′.
By induction, p′ ∈ P and q ∈ Q.
By Lemma E.2, there is a z ∈ Σ∗ such that
(a) p | z ⊢ w (b) q | c ⊢ z
By Lemma E.5, (a) implies p ↑ Σc = z ↑ Σc.
Furthermore, (b) and q ∈ Q imply that z ∈ Q | C.
Therefore, by P′ ≼_{Q|C} P, we have p ∈ P.
case (r-event):
In this case, we have q = q′σ, u = u′σ and derivations
p | q′σ ⊢ u′σ and u′σ | c ⊢ w
We have a derivation subtree for p | q′ ⊢ u′.
By Lemma E.7, there are c′ ∈ C and w′ ∈ Σ∗e such that u′ | c′ ⊢ w′.
By induction, p ∈ P and q′ ∈ Q.
By Lemma E.1 and Lemma E.2, there is a z ∈ Σ∗ such that
(a) q | z ⊢ w (b) p | c ⊢ z
By Lemma E.5, (a) implies q ↑ Σc = z ↑ Σc.
Furthermore, (b) and p ∈ P imply that z ∈ P | C.
Therefore, by Q′ ≼_{P|C} Q, we have q ∈ Q.
case (communication):
In this case, we have p = p′σ, q = q′σ̄, and derivations
p′σ | q′σ̄ ⊢ u and u | c ⊢ w
We have a derivation subtree for p′ | q′ ⊢ u.
By induction, p′ ∈ P and q′ ∈ Q.
Now, at least one of σ ∈ Σr or σ̄ ∈ Σr holds.
Therefore, p′σ ∈ P or q′σ̄ ∈ Q, and therefore p ∈ P or q ∈ Q.
Say q ∈ Q.
Proceed as in the second part of the (l-event)-clause above.
Say p ∈ P.
Proceed as in the second part of the (r-event)-clause above.
□
Lemma E.7 Let u, v, w ∈ Σ∗ such that u | v ⊢ w. If u′ is a prefix of u, then
there exist prefixes v′ of v and w′ of w such that u′ | v′ ⊢ w′.
Proof. We prove this first for the special case where u = u′σ. The general
case can then be obtained by induction on the length difference between u and
its prefix u′. For the special case u = u′σ, we use induction on the derivation
of u | v ⊢ w, with a case distinction on the type of rule used last.
case (epsilon):
This case is impossible.
case (l-event):
In this case, we have a derivation of u′σ | v ⊢ w′σ.
We have a derivation subtree for u′ | v ⊢ w′.
This directly confirms the claim since w′ is a prefix of w.
case (r-event):
In this case, we have a derivation of u′σ | v′τ ⊢ w′τ.
We have a derivation subtree for u′σ | v′ ⊢ w′.
By induction, there exist prefixes v′′ of v′ and w′′ of w′
such that u′ | v′′ ⊢ w′′.
This confirms the claim since v′′ is a prefix of v and w′′ is a prefix of w.
case (communication):
In this case, we have a derivation of u′σ | v′σ̄ ⊢ w.
We have a derivation subtree for u′ | v′ ⊢ w.
This directly confirms the claim since v′ is a prefix of v.
□
For example, consider again the local refinement obligations (7). Suppose that
the specification processes P,Q can receive messages at any time. We can then
apply the first proof rule to conclude that P ′ | Q′ refines P | Q, if there is no
external communication, i.e., there are no other components in the system.
3 Token Coherence
In this section, we introduce a formal specification of the safety substrate of
token coherence. This abstract protocol is a generalization of the MOESI token
counting rules in Martin’s dissertation [21]. We then justify its use as a
specification by proving that it is coherent, and that any implementation that
refines it is therefore coherent as well.
3.1 Background: Cache Coherence
Cache coherence describes the contract between the memory system and the
processor in a shared-memory multiprocessor. It is typically established at the
granularity of a cache block. A memory system is cache coherent if for each
block, writes are serialized, and reads get the value of the last write.
Definition 3.1 Let V be the set of values of a fixed cache block, and v0 ∈ V
the initial value. Let Σrw = {rd(v), wr(v) | v ∈ V} be the alphabet of events,
describing accesses to the block by some processor. Then the coherent traces of
the system are given by the following regular language over Σrw:
$$\mathit{Coh} = \mathit{rd}(v_0)^{*} \Bigl( \bigcup_{v \in V} \mathit{wr}(v)\, \mathit{rd}(v)^{*} \Bigr)^{*}$$
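Membership in Coh can be decided by a single pass over the trace that remembers the last value written. The following Python sketch illustrates this reading of the definition; the encoding of events as `("rd", v)` and `("wr", v)` pairs and the name `is_coherent` are assumptions made for the example.

```python
def is_coherent(trace, v0):
    """Check that every read returns the most recently written value.

    Before the first write, reads must return the initial value v0.
    """
    current = v0
    for kind, value in trace:
        if kind == "wr":
            current = value          # a write starts a new wr(v) rd(v)* block
        elif kind == "rd":
            if value != current:     # a read of a stale or unwritten value
                return False
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return True
```

For instance, with initial value 0, the trace rd(0) wr(1) rd(1) rd(1) wr(2) rd(2) is accepted, while wr(1) rd(0) is rejected because the read returns a value that has already been overwritten.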
Token coherence, like many contemporary coherence protocols such as the pop-
ular MOESI protocol family [33], provides this strong form of coherence by
enforcing a “single writer, multiple reader” policy2.
Whether a cache block can be read and/or written is determined by its per-
mission state, one of {L,Lr, Lrw} (see also figure 1 for more detail). The cache
coherence protocol then enforces the following important mutual exclusion prop-
erty: If one cache is in Lrw state, all other caches are in L state.
For example, the popular MOESI protocol family [33] fits into our permission
scheme as follows: the I state is L, both S and O are Lr, and both M and E are
Lrw.
2 We are considering only the interface between the memory system and the processor here.
Independently, the contract between the processor and the programmer may use weaker forms
of coherence that involve temporal reordering of events, as specified by the memory model.
Figure 1: LTS model of caches.
(a) Without values. [State diagram over the permission states L, Lr and Lrw, showing
local access transitions (read, write) and protocol-controlled transitions.]
(b) With values. Each cache is an LTS (Q, q0, δ, Σrw) where
Q = {L, Lr(v), Lrw(v) | v ∈ V}
q0 = L
δ ⊂ Q × (Σrw ∪ {ε}) × Q
and for v, w ∈ V we have δ contain
1) local access transitions:
Lr(v) −rd(v)→ Lr(v)
Lrw(v) −rd(v)→ Lrw(v)
Lrw(v) −wr(w)→ Lrw(w)
2) protocol-controlled transitions of the form q −ε→ q′ with q, q′ ∈ Q. The
protocol implementation controls these state changes.
Token coherence guarantees mutual exclusion by assigning a fixed number m of
tokens to each cache block. These tokens are stored in the caches and the memory,
and may be carried around by protocol messages, but their total number is constant.
The permission state of a cache corresponds to how many tokens it has: L for 0
tokens, Lr for 1 . . . (m − 1) tokens and Lrw for m tokens.
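The correspondence between token counts and permission states, and the mutual exclusion property it yields, can be sketched in a few lines of Python. The function names are hypothetical, and the list of per-cache token counts is a simplification that ignores tokens held by the memory or in flight in messages.

```python
def permission(tokens, m):
    """Map a cache's token count for one block to its permission state."""
    if not 0 <= tokens <= m:
        raise ValueError("token count must lie between 0 and m")
    if tokens == 0:
        return "L"
    return "Lrw" if tokens == m else "Lr"


def mutually_exclusive(token_counts, m):
    """If one cache is in Lrw, all other caches must be in L."""
    states = [permission(t, m) for t in token_counts]
    writers = states.count("Lrw")
    return writers == 0 or (writers == 1 and states.count("L") == len(states) - 1)
```

Because the total number of tokens never exceeds m, a cache that holds all m tokens leaves none for the others, so every distribution of at most m tokens passes the check.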
The mutual exclusion has additional applications beyond just cache coherence:
some processor designs directly exploit the per-address mutual exclusion pro-
vided by the cache coherence protocol, to implement synchronization primitives,
or for optimizations such as speculative loads or speculative retirement [31].
3.2 The abstract protocol
In our abstract protocol, system components and messages are of the same type
and treated completely symmetrically: both are represented by token bags.
Token bags are finite multisets (or bags) over some set T of tokens, and may be
required to satisfy some additional constraints (well-formedness). The tokens in
the bag constitute the state of the component, or the contents of the message.
The state of the entire system is represented as yet another bag that encloses
the token bags of the individual components and messages. The sending of a
message is modeled as a division, where a bag separates into two bags, dividing
its tokens. The receipt of a message, symmetrically, is modeled as a fusion of
token bags. Change is expressed by local reactions: tokens within a bag can be
consumed, produced or modified according to rewrite rules.
We now formalize the general behavior of such token transition systems, which
prepares us for defining the actual abstract protocol.
Definition 3.2 (Multisets) Let T be a set. Two words u, v ∈ T∗ are equivalent
if one is a permutation of the other. The induced equivalence classes
{[u] | u ∈ T∗} are called finite multisets over T, or T-bags. Multiset union is
defined as concatenation: [u] ⊎ [v] ≐ [uv]. The set of all T-bags is denoted M(T).
For x ∈ M(T), let |x| denote the set of elements of T that occur in x.
For example, for any t1, t2 ∈ T, all of the following denote the same T-bag:
[ t1² t2 ] = [ t1 t1 t2 ] = {t1t1t2, t1t2t1, t2t1t1}. The exponent is a convenient
notation for repeated symbols, and is often used with regular languages.
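For readers who prefer code, finite multisets map directly onto `collections.Counter` from the Python standard library. The sketch below is an illustration only; the helper names `bag`, `union` and `support` are hypothetical.

```python
from collections import Counter


def bag(*tokens):
    """Build a T-bag from a word; the order of the tokens is irrelevant."""
    return Counter(tokens)


def union(x, y):
    """Multiset union: multiplicities add up, as in concatenation of words."""
    return x + y


def support(x):
    """The set |x| of elements that occur in the bag x."""
    return set(x)
```

Two words that are permutations of each other produce equal bags, for example `bag("t1", "t1", "t2") == bag("t1", "t2", "t1")`.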
Definition 3.3 (Token Transition System) A TTS is a tuple (T, B, I, Σe, W)
where T is a set of tokens, B ⊂ M(T) defines the set of well-formed T-bags,
I ∈ M(B) is the initial configuration, Σe is a set of local events, and
W ⊂ Σe × M(T) × 2^T × M(T) is a set of rewrite rules.
A rewrite rule (a, x, H, y) ∈ W is denoted a: x ⟹_H y. It describes a reaction
labeled a that can occur whenever all the tokens in x are together in a bag, and
the bag does not contain any of the inhibiting tokens listed in H. When the
reaction fires, the tokens x are replaced by the tokens y. If H is empty, we omit
it from the notation.
A TTS defines a process over the alphabet Σ = Σe ∪ Σc, with Σc = {snd(b), rcv(b) |
b ∈ B}, with the traces {u ∈ Σ∗ | ∃C ∈ M(B) : I −u→ C}, where we define the
transition relation C −u→ C′ with the inference rules³ below.
$$\frac{}{C \xrightarrow{\varepsilon} C}\ \text{(stutter)}
\qquad
\frac{C \xrightarrow{u} C' \quad C' \xrightarrow{v} C''}{C \xrightarrow{uv} C''}\ \text{(trans)}$$
$$\frac{x \uplus y \in B}{[\,C\ x\ y\,] \xrightarrow{\varepsilon} [\,C\ x \uplus y\,]}\ \text{(fusion)}
\qquad
\frac{}{[\,C\ x \uplus y\,] \xrightarrow{\varepsilon} [\,C\ x\ y\,]}\ \text{(division)}$$
$$\frac{a\colon x \underset{H}{\Longrightarrow} y \quad |z| \cap H = \emptyset \quad y \uplus z \in B}{[\,C\ x \uplus z\,] \xrightarrow{a} [\,C\ y \uplus z\,]}\ \text{(reaction)}$$
$$\frac{}{[\,C\ x\,] \xrightarrow{\mathit{snd}(x)} [\,C\,]}\ \text{(send)}
\qquad
\frac{}{[\,C\,] \xrightarrow{\mathit{rcv}(x)} [\,C\ x\,]}\ \text{(receive)}$$
³ The variables in the rule templates range over the following domains: u, v, w ∈ Σ∗,
x, y, z ∈ B, and C, C′, C′′ ∈ M(B). Furthermore, as a syntactic shortcut, we allow C, C′, C′′
to match several positions in a multiset of token bags: for example, [ C z ] can match [ x y z ]
by setting C = [ x y ].
Token transition systems have a feel of concurrency much like a biological system
where reactive substances are contained in cells that can undergo fusion and
division. Chemical abstract machines [6] capture the same idea (with molecules,
membranes, and solutions instead of tokens, bags, and configurations), but are
also different in many ways (for example, they do not have fusion or division).
Example E.8 Suppose our tokens are electrons ⊖, positrons ⊕, and gamma
rays γ, and we have a reaction where electrons and positrons annihilate each
other, emitting a gamma ray:
fire: [ ⊖ ⊕ ] ⟹ [ γ ]
Then the following is a trajectory of the system:
[ [ γ ⊕ ⊕ ] [ ⊖ ] ]
→ [ [ γ ⊕ ] [ ⊕ ] [ ⊖ ] ]
→ [ [ γ ⊕ ] [ ⊕ ⊖ ] ]
−fire→ [ [ γ ⊕ ] [ γ ] ]
What happens is that the left bag splits off a bag containing a single ⊕. This
bag is then absorbed by the right bag. The latter now contains the ingredients
needed for the “fire” reaction, so it can react.
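The three steps of this trajectory can be replayed with a small Python sketch. It is only an illustration under stated assumptions: the token names "e-", "e+" and "gamma", the helper names divide, fuse and react, and the choice to treat every bag as well-formed are ours and are not part of the formal definition.

```python
from collections import Counter

def divide(config, i, part):
    """(division): split the sub-multiset `part` off bag i as a new bag."""
    part = Counter(part)
    if part - config[i]:                      # some token of `part` is not in bag i
        raise ValueError("part is not contained in the bag")
    return config[:i] + [config[i] - part, part] + config[i + 1:]

def fuse(config, i, j):
    """(fusion): merge bags i and j into one bag (all bags assumed well-formed)."""
    rest = [b for k, b in enumerate(config) if k not in (i, j)]
    return rest + [config[i] + config[j]]

def react(config, i, consumed, produced, inhibitors=()):
    """(reaction): replace `consumed` by `produced` in bag i unless inhibited."""
    consumed, produced = Counter(consumed), Counter(produced)
    if consumed - config[i]:
        raise ValueError("reactants are not all present in the bag")
    rest = config[i] - consumed
    if any(rest[t] > 0 for t in inhibitors):
        raise ValueError("reaction is inhibited")
    return config[:i] + [rest + produced] + config[i + 1:]

# Trajectory of Example E.8.
c0 = [Counter(["gamma", "e+", "e+"]), Counter(["e-"])]
c1 = divide(c0, 0, ["e+"])                    # left bag splits off a positron
c2 = fuse(c1, 1, 2)                           # positron bag is absorbed
c3 = react(c2, 1, ["e-", "e+"], ["gamma"])    # "fire"
```

Representing bags as Counter objects keeps multiset union and difference to a single operator each, which mirrors the rule templates closely.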
Definition 3.4 (The abstract protocol.) The safety substrate Tm (where m
is the number of tokens, a fixed parameter) is a TTS (T,B, I,Σe,W ) where
• T contains the following tokens:
R is a regular token as used by token coherence.
O(s) is an owner token in one of two states s ∈ {C,D} (clean or dirty).
D(v) is an instance of the data, with value v ∈ V .
M(v) is a memory cell containing the value v ∈ V .
• B is defined by imposing two conditions on a token bag x ∈M(T ):
– if x contains data D(v), then it must contain at least one regular
token R or an owner token O(s).
– if x contains a dirty owner token O(D), then it must contain data
D(v).
• $I \doteq [\, [\, R^{m-1} \ O(C) \ M(v_0) \,] \,]$.
• $\Sigma_e \doteq \{\mathrm{rd}(v), \mathrm{wr}(v), \mathrm{memread}, \mathrm{memwrite}, \mathrm{copy}, \mathrm{drop} \mid v \in V\}$.
• W consists of the rewrite rules shown in Table 1.
Table 2 shows an example trajectory of the abstract protocol. Next, we explain
the reaction rules and their interaction in some more detail.
rd(v) reads a value from a data instance (it can be applied at any time, and does
not modify the state). wr(w) modifies a data token, and can only be applied if
all m tokens (one owner token and m − 1 regular tokens) are present, and no
other data copies are in the same bag (which guarantees that the data token
being modified is the only one in the system).
To guarantee proper writebacks of modified data, a special owner token is used.
The owner token records the clean/dirty state, i.e. whether the memory value is
stale. When modifying data, the owner token is set to dirty. When the memory
writes back the data (memwrite), the owner token is cleaned. memread loads
data from the memory only if there is a clean owner token, and thereby avoids
reading stale data.
The rules copy and drop imply that data instances D(v) can be freely copied
or destroyed, subject only to the restriction enforced by B that all bags are
well-formed — for example, whoever has the dirty owner token must keep at
least one data instance.
We can now prove that the abstract protocol is coherent.
Theorem 3.5 The closed system Tm ∩ Σ∗e is coherent:
(Tm ∩ Σ∗e) ↑ Σrw ⊂ Coh
Proof. We show that (1) all of the following invariants hold in the initial
state I, and (2) if the invariants hold for a state C, they hold for any state
C′ such that $C \xrightarrow{u} C'$ for some $u \in \Sigma_e^*$ (by induction on
derivations).
1. The number of regular tokens R in the system is m− 1.
2. There is always exactly one owner token O(s).
3. There is always exactly one memory cell M(v).
4. All data instances D(v) have the same values.
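$$
\begin{array}{rl}
\mathrm{rd}(v)\colon & [\, D(v) \,] \Longrightarrow [\, D(v) \,] \\
\mathrm{wr}(w)\colon & [\, R^{m-1} \ O(s) \ D(v) \,] \Longrightarrow_{\{D(v)\}} [\, R^{m-1} \ O(D) \ D(w) \,] \\
\mathrm{memread}\colon & [\, M(v) \ O(C) \,] \Longrightarrow [\, M(v) \ O(C) \ D(v) \,] \\
\mathrm{memwrite}\colon & [\, M(v) \ O(D) \ D(w) \,] \Longrightarrow [\, M(w) \ O(C) \ D(w) \,] \\
\mathrm{copy}\colon & [\, D(v) \,] \Longrightarrow [\, D(v) \ D(v) \,] \\
\mathrm{drop}\colon & [\, D(v) \,] \Longrightarrow [\ ]
\end{array}
$$
Table 1: The reaction rules of the abstract protocol.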
Description                      System trajectory
initial state                    [ [ M(v0) O(C) R^(m-1) ]_D  [ ]_C1  [ ]_C2 ]
C1 requests M (requests are abstracted away)
D responds
  - read memory data             --memread--> [ [ M(v0) D(v0) O(C) R^(m-1) ]_D  [ ]_C1  [ ]_C2 ]
  - send data w/ tokens          --> [ [ M(v0) ]_D  [ D(v0) O(C) R^(m-1) ]  [ ]_C1  [ ]_C2 ]
C1 receives response             --> [ [ M(v0) ]_D  [ D(v0) O(C) R^(m-1) ]_C1  [ ]_C2 ]
C1 writes value v1               --wr(v1)--> [ [ M(v0) ]_D  [ D(v1) O(D) R^(m-1) ]_C1  [ ]_C2 ]
C2 requests S (requests are abstracted away)
C1 responds
  - copy data                    --copy--> [ [ M(v0) ]_D  [ D(v1) D(v1) O(D) R^(m-1) ]_C1  [ ]_C2 ]
  - send data w/ token           --> [ [ M(v0) ]_D  [ D(v1) O(D) R^(m-2) ]_C1  [ D(v1) R ]  [ ]_C2 ]
Table 2: A short example trajectory of the abstract protocol, representing a
system with a memory D and two caches C1 and C2. For clarification, token
bags carry subscripts indicating the component that they represent. Those
subscripts are not part of the abstract protocol.
5. If the owner token is clean, any data instances present have the same value
as the memory cell.
6. If there is a data token, it contains the value of the last write. Otherwise,
the memory does.
We proceed by induction on the derivation of $C \xrightarrow{u} C'$, with a case
distinction on the last rule used. Since the invariants are predicates on states,
it is immediately clear that they are preserved in the case (stutter), and the
case (trans) is a straightforward induction. The cases (fusion) and (division)
preserve the invariants because the latter are insensitive to how the tokens are
partitioned into bags. The cases (send) and (receive) cannot apply because
we assumed $u \in \Sigma_e^*$. Finally, the rule (reaction) is the only interesting
case, and we will examine it in detail.
First, invariants 1, 2 and 3 must be preserved because the reactions preserve
the number of R, O and M tokens. We examine the remaining invariants 4, 5,
and 6 separately for each reaction.
Invariant 4 is obviously preserved by rd(v), memwrite, copy and drop. The rule
memread also preserves it, because it requires a clean owner token, and invariant 5
guarantees that in this situation the memory has the same value as all existing
data tokens. Finally, the rule wr(w) preserves the invariant because the data token
that participates in the reaction must be the only data token in the whole system.
This is because any other data tokens would have to be either in a different bag
(which, by B, they can't be without an accompanying R or O token, and we know that
there are no R or O tokens in any other bag because, by invariants 1 and 2, there
is a fixed number of them, and they are all in this bag), or in the same bag (which
they can't be because they would inhibit the reaction).
Invariant 5 is obviously preserved by rd(v), copy and drop. It is preserved by
wr(w), because it is vacuously satisfied after the reaction (the owner token is
then dirty). Similarly, it becomes true after memwrite. And it is also preserved
by memread, which creates a new data token with the same value as the memory.
Invariant 6 is true after wr(w), because there is just one data token (as argued
for Invariant 4). It is clearly preserved by rd(v), copy and memwrite. It is
preserved by drop because either the token being dropped is not the last one
(in which case the remaining data tokens keep the value), or it is the last one
(in which case the memory must be up-to-date, because of invariant 5 and
the fact that the owner token must be clean, which is because a dirty owner
token requires the presence of a data token, because of B). It is preserved by
memread because either the newly created token is not the only one (in which
case invariant 4 helps us), or it is the only one (in which case we know that the
value read from memory is the value of the last write).
Together, these invariants guarantee that all data instances D(v) are always
up-to-date; therefore, reads get the correct value which implies coherence. 
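As a sanity check that complements the proof (and does not replace it), the six invariants can be tested by exhaustive exploration of a small, bounded instance of the closed system. The following Python sketch rests on several assumptions of ours: m = 2, two data values, a cap MAX_DATA on the number of data tokens (so that copy and memread cannot grow the state space without bound), tokens encoded as the strings "R", "OC", "OD", "D0", "D1", "M0", "M1", and empty bags dropped as a normalization. The state additionally records the value of the last write so that invariant 6 can be checked.

```python
from itertools import combinations

M_TOKENS = 2      # m: one owner token plus m-1 regular tokens (assumed small)
VALUES = (0, 1)   # data values V (assumption: two values)
MAX_DATA = 3      # exploration bound on the number of D tokens (assumption)

def bag(tokens):
    return tuple(sorted(tokens))

def config(bags):
    return tuple(sorted(b for b in bags if b))   # multiset of non-empty bags

def is_data(t):
    return t[0] == "D"

def well_formed(b):
    has_data = any(is_data(t) for t in b)
    if has_data and not any(t == "R" or t[0] == "O" for t in b):
        return False                              # data needs an R or O token
    return not ("OD" in b and not has_data)       # dirty owner needs data

def take(b, tokens):
    """Remove `tokens` from bag b; return the rest, or None if some are missing."""
    rest = list(b)
    for t in tokens:
        if t not in rest:
            return None
        rest.remove(t)
    return rest

def reactions(b, n_data):
    regs, out = ["R"] * (M_TOKENS - 1), []
    for v in VALUES:
        d, mem = f"D{v}", f"M{v}"
        for s in "CD":                            # wr(w), inhibited by other data
            z = take(b, regs + ["O" + s, d])
            if z is not None and not any(is_data(t) for t in z):
                out += [(z + regs + ["OD", f"D{w}"], w) for w in VALUES]
        if take(b, [mem, "OC"]) is not None and n_data < MAX_DATA:
            out.append((list(b) + [d], None))     # memread
        for w in VALUES:                          # memwrite
            z = take(b, [mem, "OD", f"D{w}"])
            if z is not None:
                out.append((z + [f"M{w}", "OC", f"D{w}"], None))
        if d in b:
            if n_data < MAX_DATA:
                out.append((list(b) + [d], None))  # copy
            out.append((take(b, [d]), None))       # drop
    return [(bag(nb), w) for nb, w in out if well_formed(nb)]

def successors(state):
    cfg, last = state
    bags = list(cfg)
    n_data = sum(is_data(t) for b in bags for t in b)
    for i, b in enumerate(bags):
        others = bags[:i] + bags[i + 1:]
        for nb, w in reactions(b, n_data):                    # (reaction)
            yield config(others + [nb]), (last if w is None else w)
        for k in range(1, len(b)):                            # (division)
            for idx in combinations(range(len(b)), k):
                x = bag(b[j] for j in idx)
                y = bag(b[j] for j in range(len(b)) if j not in idx)
                if well_formed(x) and well_formed(y):
                    yield config(others + [x, y]), last
        for j in range(i + 1, len(bags)):                     # (fusion)
            if well_formed(b + bags[j]):
                rest = [c for k, c in enumerate(bags) if k not in (i, j)]
                yield config(rest + [bag(b + bags[j])]), last

def invariants_hold(state):
    cfg, last = state
    toks = [t for b in cfg for t in b]
    data = {t for t in toks if is_data(t)}
    mem = [t for t in toks if t[0] == "M"]
    own = [t for t in toks if t[0] == "O"]
    if toks.count("R") != M_TOKENS - 1 or len(own) != 1 or len(mem) != 1:
        return False                                          # invariants 1-3
    if len(data) > 1:
        return False                                          # invariant 4
    if own[0] == "OC" and data and data != {"D" + mem[0][1]}:
        return False                                          # invariant 5
    current = next(iter(data)) if data else mem[0]
    return current[1] == str(last)                            # invariant 6

def explore(v0=0):
    init = (config([bag(["R"] * (M_TOKENS - 1) + ["OC", f"M{v0}"])]), v0)
    seen, frontier = {init}, [init]
    while frontier:
        state = frontier.pop()
        assert invariants_hold(state), state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)
```

Calling explore() raises an AssertionError on the first reachable state that violates an invariant and otherwise returns the number of states visited. Because of the bounds, a successful run covers only the chosen instance; the parametric claim still rests on the inductive argument above.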
All state is modeled by tokens, and there is no distinction between components
and messages. This symmetry points out interesting design directions. For
example, we consider the memory cell M(v) to be stationary. However, the
formal token rules do not impose this restriction and could be used as an
implementation guideline for a system with home migration.
3.3 How a TTS is an LTS
For completeness, and for use by later proofs, we show how our token transition
systems can be understood as LTSs, and that our processes compose in the same
way that the LTSs do. We include this section only in the extended version since
it is quite technical in a boring way.
To define the LTS corresponding to a TTS, we use exactly the same inference
rules as for defining the traces of a TTS, except that we do not include the
(stutter) and (trans) rules (since they are redundant once we take the trace
set of the LTS). To avoid possible confusion when referencing these rules, we
added primes to the rule names in this section.
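Definition E.9 The token transition system $S = (T, B, I, \Sigma_e, W)$ defines a
labeled transition system $\mathrm{LTS}(S) \doteq (Q, q_0, \Sigma \cup \{\epsilon\}, \delta)$ where
$$
Q \doteq M(B), \qquad q_0 \doteq I, \qquad \Sigma = \Sigma_e \cup \Sigma_c, \qquad
\Sigma_c \doteq \{\mathrm{snd}(b), \mathrm{rcv}(b) \mid b \in B\}
$$
and δ is defined by the inference rules
$$
\frac{x \uplus y \in B}{[\, C \ x \ y \,] \xrightarrow{\epsilon}_\delta [\, C \ x \uplus y \,]} \ \text{(fusion')}
\qquad
[\, C \ x \uplus y \,] \xrightarrow{\epsilon}_\delta [\, C \ x \ y \,] \ \text{(division')}
$$
$$
\frac{a\colon x \Longrightarrow_H y \qquad |z| \cap H = \emptyset \qquad y \uplus z \in B}{[\, C \ x \uplus z \,] \xrightarrow{a}_\delta [\, C \ y \uplus z \,]} \ \text{(reaction')}
$$
$$
[\, C \ x \,] \xrightarrow{\mathrm{snd}(x)}_\delta [\, C \,] \ \text{(send')}
\qquad
[\, C \,] \xrightarrow{\mathrm{rcv}(x)}_\delta [\, C \ x \,] \ \text{(receive')}
$$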
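Definition E.10 For a labeled transition system $(Q, q_0, \Sigma \cup \{\epsilon\}, \delta)$,
states $q, q' \in Q$ and a word $v \in \Sigma^*$ we define:
$q \stackrel{v}{\Longrightarrow} q'$ iff there exists a $k \geq 0$ and a sequence of
transitions $q_0 \xrightarrow{v_1}_\delta q_1 \xrightarrow{v_2}_\delta \cdots \xrightarrow{v_k}_\delta q_k$
such that $q_0 = q$, $q_k = q'$ and $v_1 v_2 \ldots v_k = v$ (where
$v_1 v_2 \ldots v_k$ is $\epsilon$ for $k = 0$).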
Definition E.11 For an LTS $M = (Q, q_0, \Sigma \cup \{\epsilon\}, \delta)$, define the trace set
$$
L(M) \doteq \{w \in \Sigma^* \mid \exists q \in Q : q_0 \stackrel{w}{\Longrightarrow} q\}
$$
Lemma E.12 Let P be the process defined by the token transition system S.
Then $L(\mathrm{LTS}(S)) = P$.
Proof. For the ‘⊂’ direction, we are given a word w and a transition sequence
as in definition E.10. For each transition, we have a primed rule as listed under
definition E.9. We can then transform this transition sequence into a derivation
for $I \xrightarrow{w} C$, by taking, for each transition, the corresponding
non-primed rule (as listed under definition 3.3), and gluing them together with
(trans). This then shows that $w \in P$.
For the ‘⊃’ direction, we show that for all derivations of $C \xrightarrow{u} C'$,
there exists a transition sequence as in definition E.10, with $v = u$, $q_0 = C$
and $q_k = C'$ (we do so by induction on the derivation of $C \xrightarrow{u} C'$).
From that, we conclude that for all u such that $I \xrightarrow{u} C$ for some C,
we must have $u \in L(\mathrm{LTS}(S))$.
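Definition E.13 For two LTSs $M_i = (Q_i, q_{0i}, \Sigma \cup \{\epsilon\}, \delta_i)$
(with $i \in \{1, 2\}$) we define the composition $M_1 \mid M_2$ to be the LTS
$(Q_1 \times Q_2, (q_{01}, q_{02}), \Sigma \cup \{\epsilon\}, \delta)$ where δ is
defined by the inference rules
$$
\frac{q_1 \xrightarrow{u}_{\delta_1} q'_1 \qquad u \in \Sigma \cup \{\epsilon\}}{(q_1, q_2) \xrightarrow{u}_\delta (q'_1, q_2)} \ \text{(l-event')}
\qquad
\frac{q_2 \xrightarrow{u}_{\delta_2} q'_2 \qquad u \in \Sigma \cup \{\epsilon\}}{(q_1, q_2) \xrightarrow{u}_\delta (q_1, q'_2)} \ \text{(r-event')}
$$
$$
\frac{q_1 \xrightarrow{\sigma}_{\delta_1} q'_1 \qquad q_2 \xrightarrow{\sigma}_{\delta_2} q'_2 \qquad \sigma \in \Sigma_c}{(q_1, q_2) \xrightarrow{\epsilon}_\delta (q'_1, q'_2)} \ \text{(communication')}
$$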
Lemma E.14 For two LTSs $M_i = (Q_i, q_{0i}, \Sigma \cup \{\epsilon\}, \delta_i)$
(with $i \in \{1, 2\}$), the following holds:
$$
L(M_1) \mid L(M_2) = L(M_1 \mid M_2)
$$
Definition E.15 For two LTSs $M_i = (Q_i, q_{0i}, \Sigma \cup \{\epsilon\}, \delta_i)$
(with $i \in \{1, 2\}$) we say that $M_2$ can simulate $M_1$ if there exists a weak
simulation relation. A weak simulation relation is a relation
$S \subset Q_1 \times Q_2$ such that
(S1) $(q_{01}, q_{02}) \in S$
(S2) for all $(q_1, q_2) \in S$, and for all transitions $q_1 \xrightarrow{u}_{\delta_1} q'_1$,
there exists a state $q'_2 \in Q_2$ such that $(q'_1, q'_2) \in S$ and
$q_2 \stackrel{u}{\Longrightarrow}_{\delta_2} q'_2$.
Lemma E.16 If M2 can simulate M1, then L(M1) ⊂ L(M2).
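For finite transition systems, conditions (S1) and (S2) can be checked mechanically. The Python sketch below illustrates this under our own encoding assumptions: an LTS is a dictionary from states to lists of (label, successor) pairs, the label None stands for ε, and the function names are hypothetical.

```python
def eps_closure(delta, states):
    """All states reachable from `states` using only epsilon (None) transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for label, nxt in delta.get(q, ()):
            if label is None and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def weak_successors(delta, q, label):
    """States q' with q ==label==> q' (epsilon steps allowed before and after)."""
    before = eps_closure(delta, {q})
    if label is None:
        return before
    stepped = {n for p in before for l, n in delta.get(p, ()) if l == label}
    return eps_closure(delta, stepped)

def is_weak_simulation(rel, delta1, delta2, init1, init2):
    """Check (S1) and (S2) for a candidate relation given as a set of pairs."""
    if (init1, init2) not in rel:                       # (S1)
        return False
    for q1, q2 in rel:                                  # (S2)
        for label, n1 in delta1.get(q1, ()):
            matches = weak_successors(delta2, q2, label)
            if not any((n1, n2) in rel for n2 in matches):
                return False
    return True

# M1 sends immediately; M2 first takes an internal step, then sends.
m1 = {"a0": [("snd", "a1")]}
m2 = {"b0": [(None, "b1")], "b1": [("snd", "b2")]}
```

Here the relation {(a0, b0), (a1, b2)} passes the check, whereas the relation {(a0, b0)} alone fails (S2), since the successor a1 has no related partner.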
[Figure 2 (table image): rows are the controller states O, NO, and L; columns are
the events GETX, GETS, Lockdown, Unlockdown, Data Owner, Data Shared, Ack,
Ack Owner, Exclusive Compl., Owned Compl., and Shared Compl.]
Figure 2: The SLICC table for the memory controller. Rows show controller
states, columns show events, and cells show transitions. For example, consider
the upper left box. It states that if a Request-Exclusive message arrives while
the controller is in state O, the actions d, b and j are executed in sequence, and
the controller transitions to the NO state. Shaded cells indicate that an event
is not expected to occur in the given state.
4 Implementation
In this section, we describe how we verified the safety of a detailed
implementation of token coherence for an arbitrary number of caches. We describe
how we used compositional verification to deal with the parametric character, and
how we employed abstraction to handle the fine level of detail. We conclude with a
list of discovered bugs.
4.1 The Protocol Implementation
The protocol implementation was developed by Martin et al. for architecture
research on token coherence [21], and was extensively simulated prior to this work.
It consists of finite state machines (FSMs) for the cache and memory controllers,
augmented with message passing capabilities. The FSMs are specified using
the domain-specific language SLICC (Specification Language for Implementing
Cache Coherence) developed by Martin et al.
The FSMs include all necessary transient states that arise due to the
asynchronous nature of the protocol. The memory and cache controllers amount to
600 and 1800 lines of SLICC code, respectively, a scale at which purely manual
analysis methods are impractical, in particular because these low-level
specifications usually change over time.
The SLICC compiler generates (1) executables for the simulation environment
and (2) summary tables containing the control states, events and transitions
in a human-readable table format (see footnote 4). Figure 2 shows the summary
table for the memory controller, with its 3 states and 11 events. Note that some
parts of the state, such as the number of tokens, or the actual data values, are
stored in variables that are not visible in the summary table.
Footnote 4: More about the table format can be found in Sorin et al. [32].
Due to lack of space, we cannot reproduce the summary table for the cache
controller (17 states and 20 events), and we cannot explain further the meaning
of the states and events. The complete SLICC code and interactive HTML tables
are online [22], along with implementations of three other cache coherence
protocols.
4.2 Parametric Compositional Refinement Proof
Consider the system $S'_n$ consisting of n caches $C'$, a directory controller $D'$
(which is attached to the memory, and sometimes called memory controller), and
an interconnection network $N'$. We consistently use primes for implementation
processes to distinguish them from specification processes:
$$
S'_n \doteq \underbrace{C' \mid C' \mid \cdots \mid C'}_{n} \mid N' \mid D' \tag{8}
$$
In the beginning, the memory holds all tokens. We define local specification
processes as token transition systems:
$$
\begin{aligned}
D &\doteq T_m = (T, B, I, \Sigma_e, W) \\
C &\doteq (T, B, [\, [\ ] \,], \Sigma_e, W) \\
N &\doteq (T, B, [\, [\ ] \,], \Sigma_e, W)
\end{aligned}
$$
Because a token transition system already models all conceivable distributions
of state over the system, no new behavior emerges if we compose. This “self-
similarity” is an important feature in our induction proof, and expressed in the
following general proposition:
Proposition E.17 Let $P_i$ be three processes ($i \in \{1, 2, 3\}$) defined by the
token transition systems $S_i = (T, B, I_i, \Sigma_e, W)$ (which are identical
except for their initial states). Then, if $I_1 \uplus I_2 = I_3$, we have
$P_1 \mid P_2 = P_3$.
Proof. We adopt the LTS view of transition systems for this proof, as
developed in section 3.3. We denote the respective labeled transition systems
by $M_i \doteq \mathrm{LTS}(S_i)$.
For the direction ‘⊃’, note that the transition system $M_1 \mid M_2$ can simulate
$M_3$ because (1) we can first send all bags from $M_2$ to $M_1$, which does not
produce any observed behavior, and makes the state of $M_1$ match the state of
$M_3$, and (2) from there on, we can simulate all transitions of $M_3$ with
identical transitions of $M_1$, using (l-event’) to embed $M_1$’s behavior in
$M_1 \mid M_2$.
For the direction ‘⊂’, we let $M_3$ simulate $M_1 \mid M_2$ by the following relation:
$$
S \doteq \{((q_1, q_2), q_3) \in (Q_1 \times Q_2) \times Q_3 \mid q_1 \uplus q_2 = q_3\}
$$
To show that S is a weak simulation, we need to show that it satisfies the
properties (S1) and (S2) in definition E.15. (S1) is direct from the assumptions.
For (S2), we make a case distinction on the kind of transition of $M_1 \mid M_2$.
First, assume we are dealing with (l-event’) or (r-event’). We can then
match all possible transitions (one of (fusion’), (division’), (reaction’), (send’)
or (receive’)) because $q_1 \uplus q_2 = q_3$ means that the bags that are involved
in producing the transition in $q_1$ or $q_2$ can produce the same transition in $q_3$.
Second, assume we need to match a (communication’) transition. It turns
out that we can simulate this by doing nothing at all: consider
$q_1 \xrightarrow{\mathrm{snd}(b)} q'_1$ and $q_2 \xrightarrow{\mathrm{rcv}(b)} q'_2$.
Then we must have $q_1 \uplus q_2 = q'_1 \uplus q'_2 = q_3$, so the relation is
maintained because $q_3 \stackrel{\epsilon}{\Longrightarrow} q_3$.

We now state the central result which (together with Theorem 3.5) allows us to
verify the implementation components D′, C ′ and N ′ individually, each within
an abstracted context rather than a fully instantiated system.
Theorem 4.1 If the implementation processes satisfy the local refinement
obligations
$$ D' \preccurlyeq_C D \tag{9} $$
$$ C' \preccurlyeq_D C \tag{10} $$
$$ N' \preccurlyeq_D C \tag{11} $$
then for all $n \in \mathbb{N}$, we have $S'_n \preccurlyeq_{\{\epsilon\}} T_m$,
i.e., the system refines the formal token coherence protocol.
Proof. Using proposition E.17, we obtain the following equalities, which
allow us to split and merge specification processes as needed during the proof:
$$ C \mid D = D \tag{12} $$
$$ C \mid C = C \tag{13} $$
The following expresses the fact that a collection of n caches behaves just like
a single cache, and we prove it by induction:
$$ {C'}^{(n)} \preccurlyeq_D C \tag{14} $$
For $n = 1$, it coincides with (10). As for the step $n \to (n+1)$: the equation
(12) can be expanded in the induction hypothesis (14) to yield
$$ {C'}^{(n)} \preccurlyeq_{C \mid D} C \tag{15} $$
and the same expansion in (10) produces
$$ C' \preccurlyeq_{C \mid D} C \tag{16} $$
Now, (15) and (16) are plugged into theorem 2.7 to get the following
$$ {C'}^{(n)} \mid C' \preccurlyeq_D C \mid C $$
from which we can complete the induction by using (13).
Now, we add the network. First, we expand (12) in (11) to get
$$ N' \preccurlyeq_{C \mid D} C \tag{17} $$
which can be plugged into theorem 2.7, along with (15), to get
$$ {C'}^{(n)} \mid N' \preccurlyeq_D C \mid C \tag{18} $$
which can be reduced to the following using (13):
$$ {C'}^{(n)} \mid N' \preccurlyeq_D C \tag{19} $$
Finally, we plug (9) and (19) into the theorem and get
$$ {C'}^{(n)} \mid N' \mid D' \preccurlyeq_{\{\epsilon\}} C \mid D \tag{20} $$
whose right-hand side can be collapsed by (12) to complete the proof.
4.3 Discharging the Obligations
To discharge the remaining obligations, we used manual translation, abstraction,
and annotation, and the explicit model checker Murϕ [13, 12]. The following
steps give an overview of the method.
1. Obtain models D′, C′ for the memory and cache controller implementations.
This step involves translating the SLICC code to Murϕ, instrumenting
it with the read/write events relevant for coherence, and abstracting
both the state space and the message format. Fig. 3 shows snippets of
translated code. The SLICC instructions that fell prey to the abstraction
are in slanted face. For example, only a single cache block is modeled,
therefore the code dealing with addresses is abstracted away. Also, message
source and destination fields are irrelevant due to the deep symmetry
of formal token coherence. Furthermore, two data values are sufficient (see
footnote 5).
2. Obtain good encodings for the specification/environment processes D, C.
We can take advantage (1) of the global system invariants established
earlier and (2) of the fact that fusion and division are not observable. For
example, the flattening map
$[\, b_1 \ b_2 \ldots b_k \,] \mapsto b_1 \uplus b_2 \uplus \cdots \uplus b_k$ provides
a canonical representative state. This means that a single T-bag, rather
than a multiset of T-bags, is sufficient to model the context. The models
we obtain this way are compact and contribute much to the state-space
economy of our approach.
Footnote 5: Restricting the set of values is justified by data independence [34],
which implies that we can freely substitute values in the traces.
Murϕ model (translated and abstracted):

    rule "get Request-Excl in O state"
      (I_DirectoryState = state_O)
    ==>
    begin
      d_sendDataWithAllTokens();
      I_DirectoryState := state_NO;
    endrule;

    procedure d_sendDataWithAllTokens();
    var
      out_msg: I_message;
    begin
      out_msg.RType := DATA_OWNER;
      if !(I_Tokens > 0) then
        error "d: assertion failed. ";
      endif;
      out_msg.Tokens := I_Tokens;
      out_msg.DataBlk := I_DataBlk;
      out_msg.Dirty := false;
      I_Tokens := 0;
      -- uppercase calls are annotations naming the matching specification transitions
      EVENT_MEMLOAD();
      EVENT_SEND(out_msg);
      EVENT_DROP();
    end;

SLICC source:

    transition(O, RequestExcl, NO) {
      d_sendDataWithAllTokens;
      b_forwardToSharers;
      j_popIncomingRequestQueue;
    }

    action(d_sendDataWithAllTokens, "d") {
      peek(requestNetwork_in, RequestMsg) {
        enqueue(responseNetwork_out, ResponseMsg) {
          out_msg.Address := address;
          out_msg.Type := CoherenceResponseType:DATA_OWNER;
          out_msg.Sender := id;
          out_msg.SenderMachine := MachineType:Directory;
          out_msg.Destination.add(in_msg.Requestor);
          out_msg.DestMachine := MachineType:L1Cache;
          assert(directory[address].Tokens > 0);
          out_msg.Tokens := directory[in_msg.Address].Tokens;
          out_msg.DataBlk := directory[in_msg.Address].DataBlk;
          out_msg.Dirty := false;
          out_msg.MessageSize := MessageSizeType:Response_Data;
        }
      }
      directory[address].Tokens := 0;
    }

Figure 3: The Murϕ code (first listing) is obtained from the SLICC code (second
listing).
3. Annotate the transitions of the implementation with matching specification
transitions, and provide refinement maps. For each transition of the
implementation process, the annotations specify a sequence of transitions
of the specification process. Fig. 3 shows such annotations in uppercase.
The refinement maps are functions that map a controller state to its
corresponding token bag.
4. Run the model checker Murϕ separately for the two relevant obligations
(see footnote 6) $D' \preccurlyeq_C D$ and $C' \preccurlyeq_D C$.
Proposition 4.3 listed below describes how the contextual refinement is
discharged. The state enumeration performed by the model checker effectively
constructs and verifies the relation R, which describes the reachable
states of the implementation process I within the abstract context C. The
annotations provided by the user eliminate the need for existential
quantification. The model checker also validates the assertions present in the
implementation code.
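The flattening map used in step 2 is simple enough to sketch directly. The Python fragment below is an illustration only; the function name flatten and the encoding of tokens as strings are our assumptions, not part of the Murϕ models.

```python
from collections import Counter

def flatten(configuration):
    """Map a configuration [ b1 b2 ... bk ] to the single bag b1 + b2 + ... + bk."""
    total = Counter()
    for bag in configuration:
        total.update(bag)          # multiset union, one bag at a time
    return total

# Two configurations that differ only by a fusion step.
before_fusion = [["M(v0)"], ["D(v0)", "O(C)", "R"], []]
after_fusion = [["M(v0)", "D(v0)", "O(C)", "R"], []]
```

Both configurations flatten to the same bag, which is why a context that differs only by unobservable fusion and division steps can be represented by one canonical state.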
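Definition 4.2 For a labeled transition system $(Q, q_0, \Sigma \cup \{\epsilon\}, \delta)$,
states $q, q' \in Q$ and a word $v \in \Sigma^*$ we define:
$q \stackrel{v}{\Longrightarrow} q'$ iff there exists a $k \geq 0$ and a sequence of
transitions $q_0 \xrightarrow{v_1} q_1 \xrightarrow{v_2} \cdots \xrightarrow{v_k} q_k$
such that $q_0 = q$, $q_k = q'$ and $v_1 v_2 \ldots v_k = v$
(where $v_1 v_2 \ldots v_k$ is $\epsilon$ for $k = 0$).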
Footnote 6: Theorem 4.1 lists three obligations, but we skip $N' \preccurlyeq_D C$
because it reduces to checking the reliability of the network, which is trivial at
the given abstraction level.
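Proposition 4.3 Let I, S and C be processes defined by the trace sets of the
labeled transition systems $L_i \doteq (Q_i, q_{0i}, \Sigma \cup \{\epsilon\}, \delta_i)$
with $i \in \{I, S, C\}$. Let $\phi : Q_I \to Q_S$ be a function (the refinement
map). If $R \subset Q_I \times Q_C$ is a relation such that
(R1) $(q_{0I}, q_{0C}) \in R$, and $\phi(q_{0I}) = q_{0S}$
(R2) If $(q_I, q_C) \in R$ and $q_C \xrightarrow{u} q'_C$ for some
$u \in \Sigma_e \cup \{\epsilon\}$, then $(q_I, q'_C) \in R$.
(R3) If $(q_I, q_C) \in R$ and $q_I \xrightarrow{u} q'_I$ for some
$u \in \Sigma_e \cup \{\epsilon\}$, then $(q'_I, q_C) \in R$ and
$\phi(q_I) \stackrel{u}{\Longrightarrow} \phi(q'_I)$.
(R4) If $(q_I, q_C) \in R$ and $q_I \xrightarrow{\sigma} q'_I$ and
$q_C \xrightarrow{\sigma} q'_C$ for some $\sigma \in \Sigma_c$, then
$(q'_I, q'_C) \in R$ and $\phi(q_I) \stackrel{\sigma}{\Longrightarrow} \phi(q'_I)$.
then $I \preccurlyeq_C S$.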
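Proof. We first prove the following statement. Let the circumstances be as
described above.
If $u, v \in \Sigma^*$ and $w \in \Sigma_e^*$ such that
$$ u \mid v \vdash w \tag{21} $$
$$ q_{0I} \stackrel{u}{\Longrightarrow} q_I \tag{22} $$
$$ q_{0C} \stackrel{v}{\Longrightarrow} q_C \tag{23} $$
then both of the following hold:
$$ (q_I, q_C) \in R \tag{24} $$
$$ q_{0S} \stackrel{u}{\Longrightarrow} \phi(q_I) \tag{25} $$
The proposition then follows: for given $u \in I$, $v \in C$ such that
$u \uparrow \Sigma_c = v \uparrow \Sigma_c$, we know by Lemma E.5 that there exists
a $w \in \Sigma_e^*$ such that $u \mid v \vdash w$. The above statement then implies
$u \in S$. Because this is true for any such u, v, it implies $I \preccurlyeq_C S$.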
For the proof of the statement, we do a simultaneous induction on the struc-
tures of (21), (22) and (23). To make it more clear that this is a well founded
induction, we can define an integer metric that strictly decreases: the sum of the
depth of the derivation and the lengths of the sequences satisfies this property.
The induction proceeds by case distinction on the types of the last derivation
of (21) and the last transitions in the sequences (22) and (23).
case (any, any, ε):
In this case, (23) has the form $q_{0C} \stackrel{v}{\Longrightarrow} q'_C \xrightarrow{\epsilon} q_C$.
By induction, $(q_I, q'_C) \in R$ and $q_{0S} \stackrel{u}{\Longrightarrow} \phi(q_I)$.
The first implies (24) by (R2).
The second is (25).
case (any, ε, any):
In this case, (22) has the form $q_{0I} \stackrel{u}{\Longrightarrow} q'_I \xrightarrow{\epsilon} q_I$.
By induction, $(q'_I, q_C) \in R$ and $q_{0S} \stackrel{u}{\Longrightarrow} \phi(q'_I)$.
By (R3), we get (24) and $\phi(q'_I) \stackrel{\epsilon}{\Longrightarrow} \phi(q_I)$, and from there (25).
case ((epsilon), any, any):
Note that the cases for nontrivial sequences (22) and (23)
are already covered (since they can only contain ε transitions).
Therefore, it suffices to consider $q_I = q_{0I}$ and $q_C = q_{0C}$,
for which (R1) implies the claims (24) and (25).
case ((r-event), any, σ):
In this case, we have a derivation for $u \mid v'\sigma \vdash w'\sigma$
and we know $q_{0C} \stackrel{v'}{\Longrightarrow} q'_C \xrightarrow{\sigma} q_C$ with $v = v'\sigma$.
Clearly, $w' \in \Sigma_e^*$.
We have a derivation subtree for $u \mid v' \vdash w'$.
By induction, $(q_I, q'_C) \in R$ and $q_{0S} \stackrel{u}{\Longrightarrow} \phi(q_I)$.
The first implies (24) by (R2).
The second is (25).
case ((l-event), σ, any):
In this case, we have a derivation for $u'\sigma \mid v \vdash w'\sigma$
and we know $q_{0I} \stackrel{u'}{\Longrightarrow} q'_I \xrightarrow{\sigma} q_I$ with $u = u'\sigma$.
Clearly, $w' \in \Sigma_e^*$.
We have a derivation subtree for $u' \mid v \vdash w'$.
By induction, $(q'_I, q_C) \in R$ and $q_{0S} \stackrel{u'}{\Longrightarrow} \phi(q'_I)$.
By (R3), we get (24) and $\phi(q'_I) \stackrel{\sigma}{\Longrightarrow} \phi(q_I)$, and from there (25).
case ((communication), σ, σ):
In this case, we have a derivation for $u'\sigma \mid v'\sigma \vdash w$
and we know
$q_{0I} \stackrel{u'}{\Longrightarrow} q'_I \xrightarrow{\sigma} q_I$ and
$q_{0C} \stackrel{v'}{\Longrightarrow} q'_C \xrightarrow{\sigma} q_C$.
We have a derivation subtree for $u' \mid v' \vdash w$.
By induction, $(q'_I, q'_C) \in R$ and $q_{0S} \stackrel{u'}{\Longrightarrow} \phi(q'_I)$.
By (R4), we get (24) and $\phi(q'_I) \stackrel{\sigma}{\Longrightarrow} \phi(q_I)$, and from there (25).

The full Murϕ code is available online [8].
4.4 Results
The translation required about two days of work. This estimate assumes
familiarity with token coherence, and some knowledge of the implementation. We
found several bugs of varying severity, all of which were missed by prior random
simulation tests similar to those described by Wood et al. [35]. Seven changes
were needed to eliminate all failures (not counting mistakes in the verification
model):
1. The implementation included assertions that do not hold in the general
system. Although they were mostly accompanied by a disclaimer like
“remove this for general implementation”, the latter was missing in one
case.
2. The implementation was incorrect for the case where a node has only one
token remaining and answers a Request-Shared. This situation was not
encountered by simulation, probably because the number of tokens always
exceeded the number of simulated nodes. We fixed the implementation,
which involved adding another state to the finite state control.
3. Persistent-Request-Shared messages (which are issued if the regular
Request-Shared is not answered within a timeout period) suffered from the same
problem, and we applied the same fix.
4. The implementation copied the dirty bit from incoming messages even if
they did not contain the owner token. Although this does not compromise
coherence, it can lead to suboptimal performance due to superfluous
writebacks. This performance bug would have gone undetected had we only
checked for coherence, rather than for refinement of the abstract protocol.
5. After fixing bug 4, a previously masked bug surfaced: the dirty bit was no
longer being updated if a node with data received a dirty owner token.
6. Two shaded boxes (i.e. transitions that are specified to be unreachable)
were actually reachable. This turned out to be yet another instance of the
same kind of problem as in bug 2.
7. Finally, another (last) instance of bug 2 was found and fixed.
As expected, the compositional approach heavily reduced the number of searched
states. This kept computational requirements low, in particular considering that
the results are valid for an arbitrary number of caches. Measurements were
carried out on a 300MHz Pentium III ThinkPad.
# tokens   component           # states   # transitions   time
4          memory controller         92            1692   0.3s
8          memory controller        188            5876   0.6s
32         memory controller        764           83396   7.49s
4          cache controller         700           23454   1.4s
8          cache controller        1308           76446   4.6s
32         cache controller        4956         1012638   65.2s
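A quick arithmetic check on these measurements suggests that, for the three reported token counts, the number of states grows linearly in the number of tokens. The Python fragment below only reproduces that arithmetic from the table; the helper name per_token_growth is ours, and no claim is made beyond the listed data points.

```python
# State counts copied from the results table, keyed by number of tokens.
STATES = {
    "memory controller": {4: 92, 8: 188, 32: 764},
    "cache controller": {4: 700, 8: 1308, 32: 4956},
}

def per_token_growth(counts):
    """Additional states per additional token between consecutive measurements."""
    points = sorted(counts.items())
    return [(s2 - s1) / (t2 - t1) for (t1, s1), (t2, s2) in zip(points, points[1:])]

memory_growth = per_token_growth(STATES["memory controller"])
cache_growth = per_token_growth(STATES["cache controller"])
```

The growth rate is the same on both intervals for each controller (24 states per token for the memory controller and 152 for the cache controller).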
5 Conclusions and Future Work
We make three main contributions. First, we formally verified the safety of
a system-level implementation of token coherence, for an arbitrary number of
caches. Second, we developed a general and formal specification of the safety
substrate of token coherence, and proved its correctness. Third, we demonstrated
that token coherence’s “design for verification” approach indeed facilitates the
verification as claimed.
Future work may address the following open issues. First, the methodology does
not currently address liveness. Second, other protocols or concurrent
computations may benefit from the high-level abstraction expressed by token
transition systems, and offer opportunities for compositional refinement along the
same lines. Third, much room for automation remains: for example, we could
attempt to integrate theorem provers with the SLICC compiler.
References
[1] M. Abadi and L. Lamport. Conjoining specifications. ACM Transactions
on Programming Languages and Systems, 17(3):507–535, May 1995.
[2] R. Alur and T. A. Henzinger. Reactive modules. In Proceedings of the 11th
Annual IEEE Symposium on Logic in Computer Science, page 207. IEEE
Computer Society, 1996.
[3] R. Alur and B. Wang. Verifying network protocol implementations by
symbolic refinement checking. In Proceedings of the 13th International
Conference on Computer-Aided Verification, 2001.
[4] T. Arons, A. Pnueli, S. Ruah, J. Xu, and L. D. Zuck. Parameterized
verification with automatically computed inductive assertions. In Computer
Aided Verification, pages 221–234, 2001.
[5] Arvind and X. W. Shen. Using term rewriting systems to design and verify
processors. IEEE Micro, 19(3):36–46, 1999.
[6] G. Berry and G. Boudol. The chemical abstract machine. Theoretical
Computer Science, 96(1):217–248, 1992.
[7] S. Burckhardt, R. Alur, and M. M. K. Martin. Verifying safety of a token
coherence protocol by compositional parametric refinement. In Sixth
International Conference on Verification, Model Checking and Abstract
Interpretation (VMCAI), LNCS 3385. Springer Verlag, 2005.
[8] S. Burckhardt et al. Verifying safety of a token coherence implementation
by parametric compositional refinement: Extended version.
http://www.seas.upenn.edu/~sburckha/token/, 2004.
[9] K. M. Chandy and J. Misra. Parallel program design: a foundation.
Addison-Wesley Longman Publishing Co., Inc., 1988.
[10] E. M. Clarke, O. Grumberg, and S. Jha. Verifying parameterized networks.
ACM Trans. Program. Lang. Syst., 19(5):726–750, 1997.
[11] G. Delzanno. Automatic verification of parameterized cache coherence
protocols. In Computer Aided Verification, pages 53–68, 2000.
[12] D. L. Dill. The Murphi verification system. In Proceedings of the 8th
International Conference on Computer Aided Verification, pages 390–393.
Springer-Verlag, 1996.
[13] D. L. Dill, A. J. Drexler, A. J. Hu, and C. H. Yang. Protocol verification as
a hardware design aid. In International Conference on Computer Design,
pages 522–525, 1992.
[14] E. A. Emerson and V. Kahlon. Reducing model checking of the many to
the few. In Conference on Automated Deduction, pages 236–254, 2000.
[15] S. M. German. Formal design of cache memory protocols in IBM. Formal
Methods in System Design, 22(2):133–141, 2003.
[16] S. M. German and A. P. Sistla. Reasoning about systems with many
processes. J. ACM, 39(3):675–735, 1992.
[17] G. J. Holzmann. Algorithms for automated protocol verification. AT&T
Tech. J., Jan./Feb. 1990.
[18] Y. Kesten and A. Pnueli. Control and data abstraction: The cornerstones
of practical formal verification. International Journal on Software Tools
for Technology Transfer, 2(4):328–342, 2000.
[19] R. P. Kurshan. Computer-aided verification of coordinating processes: the
automata-theoretic approach. Princeton University Press, 1994.
[20] R. P. Kurshan and K. McMillan. A structural induction theorem for pro-
cesses. In Proceedings of the Eighth Annual ACM Symposium on Principles
of Distributed Computing, pages 239–247. ACM Press, 1989.
[21] M. M. K. Martin. Token Coherence. PhD thesis,
University of Wisconsin-Madison, 2003.
[22] M. M. K. Martin et al. Protocol specifications and tables for four
comparable MOESI coherence protocols: Token coherence, directory, snooping,
and hammer.
http://www.cs.wisc.edu/multifacet/theses/milo_martin_phd/, 2003.
[23] M. M. K. Martin, M. D. Hill, and D. A. Wood. Token coherence:
decoupling performance and correctness. In Proceedings of the 30th Annual
International Symposium on Computer Architecture, pages 182–193. ACM
Press, 2003.
[24] K. McMillan and J. Schwalbe. Formal verification of the Encore Gigamax
cache consistency protocol. In Proceedings of the International Symposium
on Shared Memory Multiprocessing, pages 242–251, Tokyo, Japan, 1991.
[25] K. L. McMillan. A compositional rule for hardware design refinement. In
Proceedings of the 9th International Conference on Computer-Aided
Verification, pages 24–35, June 1997.
[26] K. L. McMillan. Verification of an implementation of Tomasulo’s algorithm
by compositional model checking. In A. J. Hu and M. Y. Vardi, editors,
Computer Aided Verification, volume 1427 of Lecture Notes in Computer
Science, pages 110–121. Springer, 1998.
[27] R. Milner. Communicating and Mobile Systems: the pi-Calculus. Cambridge
University Press, 1999.
[28] S. Park and D. L. Dill. Verification of FLASH cache coherence protocol
by aggregation of distributed transactions. In Proceedings of the Eighth
Annual ACM Symposium on Parallel Algorithms and Architectures, pages
288–296. ACM Press, 1996.
[29] F. Pong and M. Dubois. Verification techniques for cache coherence
protocols. ACM Computing Surveys, 29(1):82–126, 1997.
[30] A. Ponse, S. A. Smolka, and J. A. Bergstra. Handbook of Process Algebra.
Elsevier Science Inc., 2001.
[31] P. Ranganathan, V. S. Pai, and S. V. Adve. Using speculative retirement
and larger instruction windows to narrow the performance gap between
memory consistency models. In Proceedings of the Ninth Annual ACM
Symposium on Parallel Algorithms and Architectures, pages 199–210. ACM
Press, 1997.
[32] D. J. Sorin, M. Plakal, A. E. Condon, M. D. Hill, M. M. K. Martin, and
D. A. Wood. Specifying and verifying a broadcast and a multicast snooping
cache coherence protocol. IEEE Transactions on Parallel and Distributed
Systems, 13(6):556–578, 2002.
[33] P. Sweazey and A. J. Smith. A class of compatible cache consistency
protocols and their support by the IEEE Futurebus. In Proceedings of the
13th Annual International Symposium on Computer Architecture, pages
414–423. IEEE Computer Society Press, 1986.
[34] P. Wolper. Expressing interesting properties of programs in propositional
temporal logic. In Proceedings of the 13th ACM SIGACT-SIGPLAN
Symposium on Principles of Programming Languages, pages 184–193. ACM
Press, 1986.
[35] D. A. Wood, G. A. Gibson, and R. H. Katz. Verifying a multiprocessor
cache controller using random test generation. IEEE Design & Test,
7(4):13–25, 1990.
