State space reduction in modeling checking parameterized cache coherence protocol by two-dimensional abstraction by Yang Guo et al.
J Supercomput (2012) 62:828–854
DOI 10.1007/s11227-012-0755-0
Yang Guo · Wanxia Qu · Long Zhang · Weixia Xu
Published online: 17 April 2012
© The Author(s) 2012. This article is published with open access at Springerlink.com
Abstract Scalability of cache coherence protocol is a key component in future
shared-memory multi-core or multi-processor systems. The state space explosion is
the first hurdle while applying model-checking to scalable protocols. In order to val-
idate parameterized cache coherence protocols effectively, we present a new method
of reducing the state space of parameterized systems, two-dimensional abstraction
(TDA). Drawing inspiration from the design principle of parameterized systems, an
abstract model of an unbounded system is constructed out of finite states. The
mathematical principles underlying TDA are presented. Theoretical reasoning demonstrates
that TDA is correct and sound. An example of parameterized cache coherence proto-
col based on MESI illustrates how to produce a much smaller abstract model by TDA.
We also demonstrate the power of our method by applying it to various well-known
classes of protocols. During the development of TH-1A supercomputer system, TDA
was used to verify the coherence protocol in FT-1000 CPU and showed the potential
advantages in reducing the verification complexity.
Y. Guo () · L. Zhang
Institute of Microelectronics and Microprocessor, School of Computer Science, National University




W. Qu · W. Xu






State space reduction in mc parameterized cache coherence 829
Keywords Parameterized cache coherence protocol · True concurrency · Model
checking · Two-dimensional abstraction
1 Introduction
Model checking is an automatic technique for verifying finite state concurrent sys-
tems, which uses a finite state machine to describe the system under consideration
and temporal logic to state the properties that the system must satisfy. This method
has been used successfully in practice to verify complex software and hardware
systems [1, 2]. However, efficient verification of parameterized cache coherence
protocols is one of the most challenging problems in the verification domain today. Firstly,
parameterized systems are composed of an arbitrary number of processes that run
concurrently and cooperatively (the number of processes is called the system parameter). The
behavior of one process is determined not only by its current state, but also by
changes in the environment in which it runs. Secondly, parameterized systems are by nature
unbounded. The system parameter may be arbitrarily large, and the ultimate goal is
to validate the properties of a system for every possible number of processes. In such
cases, the number of global states can be enormous, resulting in state space explosion.
Formal verification of parameterized systems is known to be undecidable in general and
thus cannot be fully automated. Thirdly, symbolic methods such as BDD or SAT, which
can enable scalable formal verification, can be ineffective when it comes to
cache coherence protocols because most of the state variables are relevant in protocol
property verification. As faster and larger systems are designed, the complexity of cache
protocols will continue to increase.
Fong Pong [3] presented a comprehensive survey of various approaches to the
verification of cache coherence protocol based on state enumeration, model check-
ing, and symbolic state models. He pointed out that no framework had been proposed
so far to deal with the memory consistency model in the context of formal verification
based on state expansion. Monolithic formal verification methods that treat the proto-
col as a whole have been used fairly routinely for verifying cache coherence protocols
from the early 1990s [4, 5]. However, these monolithic techniques will not be able
to handle the very large state space of parameterized protocols. While techniques like
indexed predicates [6], counter abstraction [7], environment abstraction [8, 9], and
cutoff-based approaches [10] have been proposed for parameterized protocol verification
in recent years, most of them do not scale well to large protocols, and those that do scale
require an inordinate amount of manual effort to succeed [11]. We are not aware of
any published work that has reported formal verification of a parameterized cache
coherence protocol with reasonable complexity.
All successful applications of model checking thus far have made use of domain-specific
abstraction techniques. Continuing this trend and drawing inspiration
from recent work like environment abstraction [8, 9], we exploit the domain knowl-
edge about parameterized systems to devise an appropriate abstraction method. We
propose a novel generic approach called two-dimensional abstraction (TDA), which
could effectively reduce the state space of parameterized systems. In our work, the
size of the state transition graph for each process is reduced independently at first,
830 Y. Guo et al.
then the whole system composed of the reduced processes is abstracted based on
the design principles of parameterized systems, thus avoiding the construction of the
complete state space that might be too large to fit into memory.
TDA has a number of advantages over other approaches. First, TDA abstracts
away redundant information from a concrete system via decomposition–abstraction–
composition–reabstraction, thus effectively alleviating the state explosion problem
during parameterized systems verification. Second, TDA can be used for parallel
systems in the usual fashion because it has no limitation in communication mode
among processes. Third, TDA can be used with any model checker. The freedom to
choose model checkers is important in practice. Fourth, TDA is sound and complete.
We give soundness and completeness proofs for our method. Finally, constant
heterogeneous processes and infinite state systems are allowed, which makes
TDA suitable for large scale heterogeneous systems. We demonstrate the power of
our method by applying it to various well-known classes of protocols.
The rest of this paper is organized as follows. In Sect. 2, we introduce previous
related work. Section 3 gives some background information. In Sect. 4, we propose
a model with true concurrency semantics for parameterized systems. In Sect. 5, we
present concepts of a TDA model and the method to construct a TDA model. A cache
coherence protocol based on MESI is used to illustrate the approach of getting a much
smaller state space by TDA in Sect. 6. Experimental results of various well-known
protocols and application are presented in Sect. 7. Section 8, the last section, presents
concluding remarks.
2 Related works
The development of effective techniques for checking parameterized systems is one
of the most challenging problems in verification today. Prior research in the area of
coherence protocol verification has ranged from simulation to formal methods. These
techniques have had varying degrees of success, but few of them have been applied
to a large industrial-strength protocol like FLASH.
Simulation with random or directed stimulus has been shown to be effective at
finding most protocol errors [12]. However, simulation tends not to be effective at
uncovering subtle bugs, especially those related to the consistency model. Subtle con-
sistency bugs often occur only under unusual combinations of circumstances, and it
is unlikely that simulation will drive the protocol to these situations.
For verification of high level specifications, modern industrial practice consists of
modeling small instances of the protocols in guard/action languages such as Mur-
phi [13] or TLA+ [14], and exploring the reachable states through explicit state enu-
meration.
The idea of using non-interference lemmas for parameterized model checking is
attributed to McMillan [15], Chou [16], and Li [17], which is also called the CMP
method. The CMP approach to parameterized verification is a combination of data
type reduction and compositional reasoning. In this approach, a model checker is
used as proof assistant and the user guides the proof by supplying invariants or non-
interference lemmas. Similar types of reasoning have been applied by Chen to verify
non-parameterized hierarchical protocols [18]. The compositional method of McMil-
lan is used for compositional reasoning to handle infinite state systems including
directory based protocols. This technique, which requires user intervention at various
stages, has been applied to verify safety and liveness properties of the FLASH protocol.
The paper by Chou [16] presented a method along similar lines that was used to
verify the safety of the FLASH and GERMAN protocols. Krstic [19] gave a formalization of
the method. The CMP method scales well. As far as we are aware, the CMP method
is one of the few methods able to handle the full complexity of the FLASH protocol. Intel
used CMP to verify an industrial-strength cache protocol several orders of magni-
tude larger than even the FLASH protocol [20]. Talupur and Tuttle showed how to
derive high-quality invariants from message flows and how to use these invariants
to accelerate the CMP method [21, 22]. A message flow is a sequence of messages
sent among processors during the execution of a protocol. The hardest part of using
CMP is finding a set of protocol invariants that enable CMP to work. The user has
the burden of coming up with non-interference lemmas which can be non-trivial and
require deep understanding of the protocol under verification.
Another effective method for parameterized verification is the abstraction approach
[6–9, 11, 23–25]. Predicate abstraction, first proposed by Graf [11] as a special
case of the general framework of abstract interpretation, has been used in the
verification of parameterized protocols. In predicate abstraction, a finite set of predicates
is defined over the concrete set of states. These predicates are used to construct a
finite state abstraction of a concrete system. The automation in generating the finite
abstract model makes this scheme attractive for combining deductive and algorithmic
approaches to infinite state verification. Lahiri [26] proposed the use of a symbolic
decision procedure and its application to predicate abstraction. One of the main
problems with predicate abstraction is that it typically makes a large number of
theorem prover calls when computing the abstract transition relation or the abstract state
space. Pnueli [23] presented the method of invisible invariants, which combines a
small-model theorem with a heuristic to generate proofs of correctness of parameterized
systems. Wang [24] used monotonic abstraction to provide an over-approximation
of the transition system induced by a parameterized system. The over-approximation
gives a transition system that is monotonic with respect to a well quasi-ordering
on the set of configurations. Timm [25] presented an approach combining symmetry
arguments with spotlight abstractions. The technique determines (the size of) a
particular instantiation of the parameterized system from the given temporal logic
formula, and feeds this into an abstracting model checker. Environment abstraction [8,
9] exploits the replicated structure of a parameterized system to simplify its verification,
and it converts the unbounded system into a bounded one via a finite state
description method. In real cache coherence protocols, the internal state of each cache
can be quite complex, and thus environment abstraction might fail. Another method
is divide-and-conquer; in other words, the abstraction for each process is made
independently before the model for the whole system is constructed [27]. Unfortunately, the
many constraints imposed on the systems under consideration make this approach impractical.
Other related work includes that of Pandav [28] who has proposed a set of heuris-
tics to aid in constructing invariants for cache protocols. Delzanno [29] used arith-
metic constraints to model possibly infinite sets of global states of a multi-processor
system with many identical caches. General purpose symbolic model checkers for
infinite-state systems working over arithmetical domains were used. Delzanno and
Bultan [30, 31] described a constraint-based verification method for handling the
safety and liveness properties of the GERMAN protocol, but their method cannot verify
single-index liveness properties. Emerson and Kahlon [32] verified GERMAN by first
reducing it to a snoopy bus protocol and then invoking a theorem asserting that if a
snoopy bus protocol of a certain form is correct for 7 nodes then it is correct for any
number of nodes. Pnueli proposed an elegant cutoff method that can verify the DIR
protocol [10]; although it was sound, it was not complete and worked only for safety
properties. A broad technique was proposed for the verification of WSIS systems that can
handle the DIR protocol as an example [33], yet again the resulting technique was
sound but not complete.
3 Preliminaries
This section contains basic material about Kripke structures, temporal logic, and
equivalence relations on Kripke structures [34].
Definition 1 (Kripke structure) Let AP be a set of atomic propositions. A Kripke
structure M over AP is a five-tuple M = (AP, S, I,R,L) where
1. S is a finite set of states.
2. I ⊆ S is the set of initial states.
3. R ⊆ S × S is a transition relation that must be total, that is, for every state s ∈ S
there is a state s′ ∈ S such that R(s, s′).
4. $L : S \to 2^{AP}$ is a function that labels each state with the set of atomic propositions
true in that state.
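As a minimal sketch of Definition 1, the five components can be written down as a small Python record with a check of the side conditions (I ⊆ S, R ⊆ S × S, totality of R, labels drawn from AP). The class name `Kripke`, the method `is_well_formed`, and the two-state example are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Kripke:
    AP: set   # atomic propositions
    S: set    # finite set of states
    I: set    # initial states, I ⊆ S
    R: set    # transition relation as a set of (s, t) pairs
    L: dict   # labeling function: state -> set of propositions true there

    def is_well_formed(self):
        """Check the side conditions of Definition 1."""
        in_range = all(s in self.S and t in self.S for (s, t) in self.R)
        # Totality: every state has at least one successor.
        total = all(any((s, t) in self.R for t in self.S) for s in self.S)
        labeled = all(self.L[s] <= self.AP for s in self.S)
        return self.I <= self.S and in_range and total and labeled


# Hypothetical two-state example: a cache line that is either invalid or valid.
toggle = Kripke(
    AP={"valid"},
    S={"inv", "val"},
    I={"inv"},
    R={("inv", "val"), ("val", "inv")},
    L={"inv": set(), "val": {"valid"}},
)

# Dropping the transition out of "val" breaks totality.
stuck = Kripke(toggle.AP, toggle.S, toggle.I, {("inv", "val")}, toggle.L)
```

Here `toggle.is_well_formed()` holds, while `stuck` fails the totality requirement of item 3.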
Temporal logic is used to specify properties of Kripke structures. CTL, a powerful
logic, describes properties of computation trees. A tree is formed by designating a
state in a Kripke structure as the initial state and then unwinding the structure into
an infinite tree with the designated state at the root. In CTL, formulas are composed
of path quantifiers and temporal operators. The path quantifiers are used to describe
the branching structure in the computation tree. There are two such quantifiers A (for
all computation paths) and E (for some computation path). The temporal operators, X
(next time), F (in the future), G (always), U (until), and R (release) describe properties
of a path through the tree.
There are two types of formulas in CTL: state formulas which are true in a specific
state and path formulas which are true along a specific path. Let AP be the set of
atomic propositions, the syntax of CTL is given by the following rules:
1. If p ∈ AP , then p is a state formula.
2. If f and g are state formulas, then ¬f, f ∧ g, and f ∨ g are state formulas.
3. If f is a path formula, then Ef and Af are state formulas.
4. If f is a state formula, then f is also a path formula.
5. If f and g are path formulas, then ¬f, f ∧ g, f ∨ g, Xf, Ff, Gf, fUg, and fRg
are path formulas.
Let M be a Kripke structure over AP. A path in M from a state s is an infinite
sequence of states $\pi = s_0 s_1 s_2 \cdots$ such that $s_0 = s$ and $R(s_i, s_{i+1})$ holds for all $i \ge 0$.
We use $\pi^i$ to denote the suffix of $\pi$ starting at $s_i$.
The restriction of CTL to universal path quantifiers A is called ACTL.
Simulation equivalence restricts the logic and relaxes the requirement that the
structures should satisfy exactly the same formulas, resulting in a great reduction.
Definition 2 (Simulation relation) Given two structures $M$ and $M'$ with $AP' \subseteq AP$, a
relation $H \subseteq S \times S'$ is a simulation relation between $M$ and $M'$ if and only if for all
$s$ and $s'$, if $H(s, s')$ then the following conditions hold:
1. $L(s) \cap AP' = L'(s')$.
2. For every state $s_1$ such that $R(s, s_1)$, there is a state $s_1'$ with the property that
$R'(s', s_1')$ and $H(s_1, s_1')$.
If there exists a simulation relation $H$ such that for every initial state $s_0$ in $M$ there
is an initial state $s_0'$ in $M'$ for which $H(s_0, s_0')$, we say that $M'$ simulates $M$ (denoted
by $M \preceq M'$).
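For finite structures, Definition 2 can be checked mechanically. The sketch below computes the largest relation satisfying both conditions by repeatedly discarding pairs that violate condition 2, then tests the initial-state requirement. The dictionary encoding and the names `greatest_simulation`, `simulates`, `concrete`, and `abstract` are assumptions made for illustration; the naive fixpoint loop is not taken from the paper.

```python
def greatest_simulation(M, Mp):
    """Largest H ⊆ S × S' satisfying both conditions of Definition 2."""
    # Condition 1: labels agree once restricted to the propositions AP' of M'.
    H = {(s, sp) for s in M["S"] for sp in Mp["S"]
         if M["L"][s] & Mp["AP"] == Mp["L"][sp]}
    changed = True
    while changed:
        changed = False
        for (s, sp) in list(H):
            for (u, s1) in M["R"]:
                if u != s:
                    continue
                # Condition 2: every move s -> s1 needs a matching move sp -> sp1.
                if not any((sp, sp1) in Mp["R"] and (s1, sp1) in H
                           for sp1 in Mp["S"]):
                    H.discard((s, sp))
                    changed = True
                    break
    return H


def simulates(M, Mp):
    """True if M' simulates M, i.e. M ⪯ M'."""
    H = greatest_simulation(M, Mp)
    return all(any((s0, sp0) in H for sp0 in Mp["I"]) for s0 in M["I"])


# Hypothetical three-state concrete structure and a two-state abstraction of it.
concrete = {"AP": {"p", "q"}, "S": {0, 1, 2}, "I": {0},
            "R": {(0, 1), (1, 2), (2, 0)},
            "L": {0: {"p"}, 1: {"p", "q"}, 2: set()}}
abstract = {"AP": {"p"}, "S": {"P", "N"}, "I": {"P"},
            "R": {("P", "P"), ("P", "N"), ("N", "P")},
            "L": {"P": {"p"}, "N": set()}}
```

With these inputs `simulates(concrete, abstract)` holds; removing the abstract transition from "P" to "N" leaves the concrete step 1 → 2 unmatched and the check fails.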
4 Modeling parameterized systems
States of each process in a parameterized system are considered as interpretations
over a finite variable set, $V$. For each $V$, a subset $V^e$ is called an external variable
set that is used by the process to communicate with the environment consisting of
other processes. The set $V^i = V - V^e$ is an internal variable set. Obviously, the
environment may update only external variables, whereas the process may update all
the variables. Such processes are modeled by Kripke structures which describe a class
of finite state systems with first-order logic propositions. A complex parameterized
system is modeled as a composition of such smaller processes when the following
conditions are met.
Definition 3 (Compatible structure) Let $M_1 = (AP_1, S_1, I_1, R_1, L_1)$ and
$M_2 = (AP_2, S_2, I_2, R_2, L_2)$ be two Kripke structures with state variable sets $V_1$ and $V_2$,
respectively. If $V_1^i \cap V_2^i = \emptyset$ and $V_1^e = V_2^e$, then $M_1$ and $M_2$
are compatible structures. The former condition indicates that each internal variable is
owned by only one process, and the latter requires the external variables to be shared by both
processes.
Definition 4 (Compatible state) Let M1 = (AP1, S1, I1,R1,L1) and M2 = (AP2, S2,
I2,R2,L2) be two compatible structures. If L1(s1) ∩ AP2 = L2(s2) ∩ AP1 is true,
then s1 ∈ S1 and s2 ∈ S2 are compatible. Compatible states agree on the external
variables as well as the common atomic propositions.
Processes communicate with each other in the synchronous or asynchronous
mode. In the synchronous execution mode, all processes execute the transitions at the
same time, whereas in the asynchronous execution mode, the process state transitions
are independent of each other: the system evolves by interleaving the evolution of its
processes. At each execution cycle, only one process is chosen to perform a transi-
tion. However, parameterized systems, in which different processes may change their
states at the same time, are very common in reality. There is no order between these
transitions, thus preserving the true meaning of concurrency. We call such a communication
mode asynchronous composition with true concurrency semantics. From
the viewpoint of computer science, it is more interesting to investigate asynchronous
products of Kripke structures with true concurrency semantics. We propose a formal
model with true concurrency semantics for parameterized systems, which is more
suitable for describing concurrent systems in the usual fashion.
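Definition 5 (Asynchronous composition with true concurrency semantics) Let
$M_k = (AP_k, S_k, I_k, R_k, L_k)$ be the kth ($1 \le k \le n$) Kripke structure among compatible
structures. Their asynchronous composition with true concurrency semantics,
$\prod_{a,\,k=1}^{n} M_k = (AP, S, I, R, L)$,
is defined to be:
1. $AP = \bigcup_{k=1}^{n} AP_k$.
2. $S = \{\langle s_1, s_2, \ldots, s_n \rangle \mid s_k \in S_k\ (1 \le k \le n) \text{ are compatible states}\} \subseteq \prod_{k=1}^{n} S_k$.
3. $I = \{\langle s_1, s_2, \ldots, s_n \rangle \mid \bigwedge_{k=1}^{n} s_k \in I_k\} \subseteq S$.
4. $R = \{(\langle s_{1,i}, s_{2,i}, \ldots, s_{n,i} \rangle, \langle s_{1,i+1}, s_{2,i+1}, \ldots, s_{n,i+1} \rangle) \mid \exists j, 1 \le j \le n, (s_{j,i}, s_{j,i+1}) \in R_j\}$.
5. $L(\langle s_1, s_2, \ldots, s_n \rangle) = \bigcup_{k=1}^{n} L_k(s_k)$.
Theorem 1 The asynchronous composition operator with true concurrency semantics,
$\prod_a$, is commutative and associative.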
Proof By Definition 5, the set of atomic propositions of the composition is the union of
the component atomic propositions; so is the set of labels. States of the composition are
vectors of compatible component states, and they are elements of the Cartesian
product of the component state sets. Each transition of the composition involves a
transition of at least one of the n components. Because the union and product of sets are
commutative and associative, the asynchronous composition operator with true concurrency
semantics is also commutative and associative. □
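The composition of Definition 5 can be sketched for small, explicitly enumerated structures as follows. The sketch reads item 4 as "a nonempty set of components each take a local transition while all other components keep their state"; the stay-put part is an assumption, suggested by the later proof of Theorem 3 where the components that do not move satisfy s = t. The dictionary encoding and the names `compatible`, `compose`, and `cache` are hypothetical.

```python
from itertools import product


def compatible(M1, s1, M2, s2):
    """Definition 4: the two states agree on the shared atomic propositions."""
    return M1["L"][s1] & M2["AP"] == M2["L"][s2] & M1["AP"]


def compose(Ms):
    """Asynchronous composition with true concurrency semantics (sketch)."""
    n = len(Ms)
    S = {c for c in product(*(M["S"] for M in Ms))
         if all(compatible(Ms[a], c[a], Ms[b], c[b])
                for a in range(n) for b in range(a + 1, n))}
    I = {c for c in S if all(c[k] in Ms[k]["I"] for k in range(n))}
    R = set()
    for s in S:
        for t in S:
            steps = [(s[k], t[k]) in Ms[k]["R"] for k in range(n)]
            stays = [s[k] == t[k] for k in range(n)]
            # At least one component moves; every other component either
            # moves as well or stays put (assumed reading of item 4).
            if any(steps) and all(m or h for m, h in zip(steps, stays)):
                R.add((s, t))
    AP = set().union(*(M["AP"] for M in Ms))
    L = {c: set().union(*(Ms[k]["L"][c[k]] for k in range(n))) for c in S}
    return {"AP": AP, "S": S, "I": I, "R": R, "L": L}


def cache(name):
    """Hypothetical process: 'a' (idle) can move to 'b' (busy) and stay there."""
    p = "busy_" + name
    return {"AP": {p}, "S": {"a", "b"}, "I": {"a"},
            "R": {("a", "b"), ("b", "b")}, "L": {"a": set(), "b": {p}}}


ps = compose([cache("1"), cache("2")])
```

In `ps`, the global transition from ("a", "a") to ("b", "b") is present: both processes move in the same step, which an interleaving composition would not allow.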
5 Two-dimensional abstraction
Now we use a two-dimensional graph shown in Fig. 1 to describe the state space of
parameterized systems, where the x axis denotes system parameter n, and the y axis
denotes the state space of each process m. To simplify the presentation, it is supposed
that all processes are identical. Since the full cross-product of the process states needs
to be considered in the global system at each step, the result of the asynchronous
composition with true concurrency semantics is very large, in the worst case $m^n$.
Fig. 1 State space of parameterized systems
Too many reachable states impede the automatic verification in many practical cases.
Two-dimensional abstraction technique proposed in this paper is specifically tailored
for parameterized systems with true concurrency semantics and helps avoiding the
problem of state explosion.
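To make the growth along the x axis concrete, the worst-case bound can be evaluated for a few system sizes. The choice m = 4 (one local state per MESI state) is an illustrative simplification and not a figure from the paper; the function name `worst_case_states` is hypothetical.

```python
def worst_case_states(m, n):
    """Upper bound m**n on global states for n identical processes with m local states."""
    return m ** n


# Assumed example: m = 4 local states per process.
for n in (2, 4, 8, 16):
    print(n, worst_case_states(4, n))
```

Doubling n squares the bound, which is why reducing m alone (the y axis) does not remove the dependence on the system parameter.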
Definition 6 (Two-dimensional abstraction) For asynchronous concurrent parame-
terized systems with true concurrency semantics, two-dimensional abstraction is a
process constructing an abstract model by first reducing the state space of each pro-
cess independently along the y axis in order to reduce m and then hiding the system
parameter n along the x axis based on the design principles of parameterized systems.
The former step is called y-abstraction, and the latter x-abstraction. The correspond-
ing reduced results are called the y-abstract model and TDA model, respectively.
The selection of an equivalence relation between a TDA model and a concrete system
is of prime importance for the successful application of TDA in practice. The simulation
relation [35] results in a greater reduction of the number of states by restricting
the logic and relaxing the requirement that two structures should satisfy exactly
the same set of formulas. Given two Kripke structures $M_1 = (AP_1, S_1, I_1, R_1, L_1)$
and $M_2 = (AP_2, S_2, I_2, R_2, L_2)$ with $AP_2 \subseteq AP_1$, if there exists a simulation relation
$H$ such that for every initial state $s_{10}$ ($s_{10} \in I_1$) in $M_1$ there is an initial state $s_{20}$
($s_{20} \in I_2$) in $M_2$ for which $H(s_{10}, s_{20})$, we say that $M_2$ simulates $M_1$ and denote it by
$M_1 \preceq M_2$. Intuitively, for every transition in $M_1$, there is a corresponding transition
in $M_2$.
In the following sections, $PS^c(n)$ refers to the concrete model of asynchronous
concurrent parameterized systems with true concurrency semantics consisting of n
concrete processes. $PS^y(n)$ is the y-abstract model of $PS^c(n)$ and $PS^t(n)$ is its TDA
model.
5.1 y-Abstraction
The y-abstraction deals with each concrete process independently in order to abstract
away the information irrelevant to the system properties. Any property-preserving
abstraction method may be used. We construct a finite predicate set $\Phi = \{\varphi_1, \varphi_2, \ldots, \varphi_r\}$
from the properties and the system description, and build the y-abstract model through the
method of basic predicate abstraction.
The predicate set $\Phi$ defines an equivalence relation on $S_k^c$, the set of states of
$M_k^c = (AP_k^c, S_k^c, I_k^c, R_k^c, L_k^c)$ ($1 \le k \le n$), and each equivalence class is denoted by an
abstract state. Each concrete state is labeled with the predicate formulas satisfied
in that state. In other words, the labeling function $L_k^c$ maps a concrete state into a
predicate set. The set of states of the y-abstract model $M_k^y$, $S_k^y$, is a set of normal boolean
expressions on $b_1, b_2, \ldots, b_r$, where $b_j$ ($1 \le j \le r$) corresponds to predicate $\varphi_j$. A
y-abstract state is a truth assignment to the r boolean variables. The labeling function $L_k^y$ maps
a y-abstract state into a boolean expression. The abstract operator $H_k^{cy}$ determines
the relationship between concrete states and abstract states. The method of building
the transition relation $R_k^y$ of the y-abstract model $M_k^y$ from the concrete transition
relation $R_k^c$ is the same as that introduced by Graf and Saidi [11]. From the above
definitions, we can conclude that $H_k^{cy} \subseteq S_k^c \times S_k^y$ is a simulation relation between $M_k^c$
and $M_k^y$, so the following theorem holds.
Theorem 2 $M_k^c \preceq M_k^y$ ($1 \le k \le n$).
Proof The proof is given in [11]. □
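The effect of the y-abstraction on a single process can be illustrated with an explicit-state existential abstraction: each concrete state is mapped to the tuple of truth values of the predicates, and an abstract transition exists whenever some concrete transition connects members of the two classes. This is a simplified stand-in that enumerates states directly; it is not the theorem-prover-based construction of Graf and Saidi [11]. The function name `y_abstract` and the counter example are assumptions.

```python
def y_abstract(S, I, R, predicates):
    """Existential predicate abstraction of one explicitly enumerated process."""
    def h(s):
        # Abstraction map: one boolean b_j per predicate phi_j.
        return tuple(bool(phi(s)) for phi in predicates)

    S_y = {h(s) for s in S}
    I_y = {h(s) for s in I}
    # An abstract transition exists if some concrete transition induces it.
    R_y = {(h(s), h(t)) for (s, t) in R}
    return h, S_y, I_y, R_y


# Hypothetical process: a modulo-6 counter, abstracted by two predicates.
S = set(range(6))
R = {(s, (s + 1) % 6) for s in S}
predicates = [lambda s: s == 0, lambda s: s >= 3]
h, S_y, I_y, R_y = y_abstract(S, {0}, R, predicates)
```

The six concrete states collapse into three abstract states, and every concrete step is matched by an abstract step between the images of its endpoints, which is the property that Theorem 2 relies on.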
In the following, we will demonstrate how the y-abstraction affects the parame-
terized concurrent systems.
Definition 7 (Visible transitions set and invisible transitions set) Given a Kripke
structure $M = (AP, S, I, R, L)$, we assume that $AP_f$ is the set of atomic propositions
involved in the temporal formula f. The set of visible transitions of $M$ w.r.t.
$AP_f$ includes the transitions affecting the truth of atomic propositions in $AP_f$, which
is denoted by $VTS(M, AP_f) = \{(s, t) \mid (s, t) \in R \wedge (L(s) \cap AP_f \ne L(t) \cap AP_f)\}$. The
set $IVTS(M, AP_f) = R - VTS(M, AP_f)$ is called the set of invisible transitions of
$M$ w.r.t. $AP_f$.
It is obvious that $VTS(M, AP_f)$ and $IVTS(M, AP_f)$ depend on the system property.
They satisfy $VTS(M, AP_f) \cap IVTS(M, AP_f) = \emptyset$ and $VTS(M, AP_f) \cup
IVTS(M, AP_f) = R$.
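The partition of Definition 7 is straightforward to compute, taking a visible transition to be one that changes the valuation of the propositions in AP_f (that is, L(s) ∩ AP_f differs from L(t) ∩ AP_f). The function name `split_transitions` and the three-state example are hypothetical.

```python
def split_transitions(R, L, AP_f):
    """Return (VTS, IVTS) of a transition relation R w.r.t. the propositions AP_f."""
    visible = {(s, t) for (s, t) in R if L[s] & AP_f != L[t] & AP_f}
    invisible = set(R) - visible
    return visible, invisible


# Hypothetical three-state cycle; only proposition "p" occurs in the formula f.
L = {0: {"p"}, 1: {"p", "q"}, 2: set()}
R = {(0, 1), (1, 2), (2, 0)}
vts, ivts = split_transitions(R, L, {"p"})
```

The step from 0 to 1 changes only "q", so it is invisible with respect to {"p"}; the other two steps flip "p" and are visible.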










[...] $R_k^c(s_k^c, t_k^c) \in IVTS(M_k^c, AP_f)$, the corresponding y-abstract transition $R_k^y(s_k^y, t_k^y)$ is a
[...] connects two different y-abstract states in $M_k^y$, that is to
say, $M_k^y$ performs a transition. Hence, all transitions in $M_k^c$ are maintained. Figure 2
illustrates two kinds of concrete transitions and their y-abstract transitions. Therefore,
the y-abstract model $PS^y(n)$ is an asynchronous composition of $M_1^y, M_2^y, \ldots, M_n^y$.
Theorem 3 The asynchronous composition operator with true concurrency semantics,
$\prod_a$, is monotonic w.r.t. $\preceq$, that is, $M_k^c \preceq M_k^y$ ($1 \le k \le n$) $\Rightarrow PS^c(n) \preceq PS^y(n)$.
Fig. 2 How y-abstraction affects transitions in asynchronous concurrent parameterized systems
Proof Let $PS^c(n) = (AP^c, S^c, I^c, R^c, L^c) = \prod_{a,\,k=1}^{n} M_k^c$ be an asynchronous
composition with true concurrency semantics, where $M_k^c = (AP_k^c, S_k^c, I_k^c, R_k^c, L_k^c)$. Its
y-abstract model is denoted by $PS^y(n) = (AP^y, S^y, I^y, R^y, L^y) = \prod_{a,\,k=1}^{n} M_k^y$, where
$M_k^y = (AP_k^y, S_k^y, I_k^y, R_k^y, L_k^y)$.
[...]
$AP_k^y \subseteq AP_k^c. \quad (2)$
[...] $AP_k^c = AP^c. \quad (3)$
Note that the abstract function $H_k^{cy}$, described in Sect. 5.1, is a simulation relation
between $M_k^c$ and $M_k^y$; hence, for every $s^y$ in $PS^y(n)$, the following identity holds:
[...]
That is to say, a y-abstract state is obtained by applying $H_k^{cy}$ ($1 \le k \le n$) to the
kth element in concrete state $s^c$.
Now we will show that $H^{cy} \subseteq S^c \times S^y$ is a simulation relation between $PS^c(n)$
and $PS^y(n)$. For every $s^c = \langle s_{1a}^c, s_{2b}^c, \ldots, s_{kl}^c, \ldots, s_{ng}^c \rangle \in S^c$, suppose that
$s^y = \langle s_{1a}^y, s_{2b}^y, \ldots, s_{kl}^y, \ldots, s_{ng}^y \rangle \in S^y$ is its y-abstract state, namely, $H^{cy}(s^c) = s^y$; then,
by Definition 2, both of the following conditions must hold:
1. $L^c(s^c) \cap AP^y = L^y(s^y)$.
2. $\forall t^c\, (t^c \in S^c \wedge R^c(s^c, t^c) \Rightarrow \exists t^y\, (t^y \in S^y \wedge R^y(s^y, t^y) \wedge H^{cy}(t^c, t^y)))$.
Proof of condition (1): $L^c(s^c) \cap AP^y = L^y(s^y)$.
[...] $L_k^c(s_{kl}^c)$ is a set of atomic propositions true in $s_{kl}^c$, so it is only relative
to $AP_k^c$ and independent of $AP^c$ [...]
[...] $= L^y(s^y). \quad (9)$
Hence, condition (1) is true.
Proof of condition (2): $\forall t^c\, (t^c \in S^c \wedge R^c(s^c, t^c) \Rightarrow \exists t^y\, (t^y \in S^y \wedge R^y(s^y, t^y) \wedge H^{cy}(t^c, t^y)))$.
For each $t^c = \langle t_{1a'}^c, t_{2b'}^c, \ldots, t_{kl'}^c, \ldots, t_{ng'}^c \rangle \in S^c$, $R^c(s^c, t^c)$ implies that there is at
least one component in the concrete model that makes a transition. Suppose that the
first k ($1 \le k \le n$) components make transitions, while the last n − k components
do not. There are several cases to be considered.
Case 1: $t^c \ne s^c \wedge R_k^c(s_{kl}^c, t_{kl'}^c) \in IVTS(M_k^c, AP_f)$, as represented in the middle
of Fig. 2.
Because $M_k^c \preceq$ [...]
Now we construct $t^y$ by Definition 5 as follows:
[...]
As the last n − k components in the concrete model do not make transitions, we
obtain
$s_{(k+1)r}^c = t_{(k+1)r'}^c, \ldots, s_{ng}^c = t_{ng'}^c. \quad (13)$
[...]
This expression indicates that applying $H_k^{cy}$ to the kth element of $t^c$ will yield its
y-abstract state; thus, $(t^c, t^y) \in H^{cy}$.
From (11), there is at least one element in $s^y$ and $t^y$ that satisfies $R_k^y(s_{ke}^y, t_{ke'}^y)$, so
$(s^y, t^y) \in R^y$.
The other two cases, $t^c \ne s^c \wedge R_k^c(s_{kl}^c, t_{kl'}^c) \in VTS(M_k^c, AP_f)$ and $t^c = s^c$, can be
discussed in a similar way.
To this point, both conditions (1) and (2) are true. We conclude that $H^{cy} \subseteq S^c \times S^y$
is a simulation between $PS^c(n)$ and $PS^y(n)$. By Definition 2, for every initial state
$s_0^c \in I^c$ in $PS^c(n)$ there is an initial state $s_0^y \in I^y$ in $PS^y(n)$ such that $H^{cy}(s_0^c, s_0^y)$;
as a consequence, the theorem is proved. □
Theorem 3 implies that the y-abstract model weakly preserves ACTL* formulas.
Applying this theorem to each kind of ACTL* formula, we get the following
conclusion.
Theorem 4 For each ACTL* formula f ($AP_f \subseteq AP^y$), $PS^y(n) \models f \Rightarrow PS^c(n) \models f$.
Proof From Theorem 3, we obtain
$PS^c(n) \preceq PS^y(n)$.
Hence, $PS^y(n) \models f \Rightarrow PS^c(n) \models f$ holds, as proved in [34]. □
Intuitively, this theorem is true because a formula in ACTL* describes properties
that are quantified over all possible behaviors of a system. Because every behavior
of $PS^c(n)$ is a behavior of $PS^y(n)$, every formula of ACTL* that is true in $PS^y(n)$
must also be true in $PS^c(n)$. Theorem 4 is very useful for large scale system verification
since it provides a way of accelerating the verification by taking advantage of
an exhaustive search of a smaller state space.
5.2 x-Abstraction
During the construction of parameterized systems, the designers reason about its cor-
rectness by focusing on the execution of one process (called hub) and consider its in-
teraction with other processes (called rims, all rims constitute the hub’s environment)
[8]. The x-abstraction, following this idea, produces a much smaller state space.
As described in the earlier sections, $PS^y(n)$ is an asynchronous concurrent system
with true concurrency semantics. Without loss of generality, assume that $PS^y(n)$ contains
n − 1 (n > 1) rims (numbered from 1 to n − 1) and one hub (numbered n). We
[...]
It is straightforward to find that $L_k^y(s_k^y)$ ($1 \le k \le n$) on the right hand side of the
identity is the set of all labels of rims (or hubs) and they are atomic propositions that
process k satisfies in the current state. These atomic propositions reflect process properties.
Consequently, the object of x-abstraction is the whole parameterized system
whose properties relate to either one process or many processes.
Definition 8 (Process property) The first-order predicate prop(k), $1 \le k \le n$, indicating
that the kth process has property prop, is called a process property. We use
PROP(k) = {prop(k)} to denote all properties that the kth process holds.
Given a process d, the d-label is an instance of prop(k), meaning that process d
meets the property prop. PROP(d) = {prop(d)} is the set of all d-labels. For every
$s^y$ ($s^y \in S^y$) and process d ($1 \le d \le n$), we have either $s^y \models prop(d)$ or $s^y \not\models prop(d)$.
If $s^y \models prop(d)$ holds, the y-abstract state $s^y$ has the label prop(d).
The global state label of the y-abstract model can be simplified as follows, by
Definition 8:
$L^y = L(1) \cup \cdots \cup L(k) \cup \cdots \cup L(n) = \bigcup_{k=1}^{n} L(k) = \{l(d) \mid s^y \models l(d), 1 \le d \le n\}. \quad (15)$
It is interesting to note that the global label of the y-abstract state $s^y$ consists of all the
process properties it satisfies. Next we will introduce a new notation to describe the
parameterized system.
Definition 9 The first-order predicate snps(k) = prop(k)∧(∧j =k prop(j)) describes
not only the kth process but also its environment (comprising the j th process).
snps(k) is a quite detailed picture of the global system, and all the snapshots are
represented as SNPS = {snps(k)}.
State space reduction in mc parameterized cache coherence 841
A snapshot snps(k) gives the necessary condition that an equivalent partition meets
on PSy(n): if there exits a process d satisfying sy |= snps(d), snps(k) is one of the
abstract states of sy . All such y-abstract states which satisfy the above condition
compose an equivalence class. If snps(k) were of the form ±prop1(k)∧±prop2(k)∧
· · · ∧ ±propr (k), r > 1, where prop1(k), . . . ,propr (k) are r process properties and
±propi (k) (1 ≤ i ≤ r) indicates that propi (k) appears positive or negative, snps(k)
can be expressed by a tuple 〈b1, b2, . . . , br〉, where bi = 1 ⇔ snps(k) ⇒ propi (k).
That is, the value of each bit bi reflects the polarity of the corresponding predicate
propi (k) in snps(k). Labeling the y-abstract states with atomic formulas will result
in a much smaller state space.
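The bit-tuple encoding can be sketched as follows. This is a minimal illustration only: the function name `snapshot_bits` and the dictionary representation of a snapshot are assumptions made here, not notation from the paper.

```python
def snapshot_bits(polarity, prop_order):
    """Encode a snapshot as the tuple <b1, ..., br>.

    polarity   : maps each property name to True (appears positively)
                 or False (appears negated) in the snapshot
    prop_order : list of property names fixing the bit positions
    """
    return tuple(1 if polarity[name] else 0 for name in prop_order)


# Hypothetical snapshot: prop1(k) appears positively, prop2(k) negated.
bits = snapshot_bits({"prop1": True, "prop2": False}, ["prop1", "prop2"])
```

With r process properties there are at most 2^r such tuples, which is why the abstract state space stays small regardless of the number of processes.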
In order to construct a TDA model, PROP and SNPS must meet two conditions:
coverage and congruence. Coverage means that every y-abstract state is reflected by
some snapshots, and congruence implies that snps(k) contains enough information
about a process to conclude a label holds true for this process or not. That is to say,
for each snps(k) ∈ SNPS and each prop(k) ∈ PROP it holds that snps(k) → prop(k)
or snps(k) → ¬prop(k).
Suppose that PROP and SNPS of PSy(n) satisfy the above conditions. Then the TDA
model is a Kripke structure PSt = 〈APt, St, It, Rt, Lt〉:
1. APt is the set of atomic propositions involved in the process property prop(k), and
APt = APy according to Definition 8;
2. St = SNPS is the set of abstract states: the abstract operator αn(sy) = {snps(k) ∈
SNPS|sy |= snps(n)} maps all the y-abstract states sy , where hub meets the con-
dition of snps(k), into the TDA abstraction state snps(k);
3. I t is the set of initial abstract states: snps(k) ∈ I t if there exists a parameterized
system PSy(n) and a y-abstract state sy ∈ I y such that snps(k) ∈ αn(sy);
4. Lt is the labeling function: for each snps(k) ∈ St ,Lt(snps(k)) = {prop(k) :
snps(k) ⇒ prop(n)};
5. Rt is the set of abstract transitions: for each snps1(k) ∈ St , snps2(k) ∈ St , if there
exist a parameterized system PSy(n) and two y-abstract states sy ∈ Sy, ty ∈ Sy
which meet the condition of snps1(k) ∈ αn(sy) ∧ snps2(k) ∈ αn(ty) ∧ (sy, ty) ∈
Ry , then (snps1(k), snps2(k)) ∈ Rt .
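The five items above amount to an existential abstraction of a finite y-abstract model. The following sketch shows one way this could be computed for an explicitly enumerated instance. The function `build_tda`, its parameters, and the toy two-process instance are all hypothetical. Unlike the definition, the sketch keeps only those snapshots that some state actually satisfies.

```python
def build_tda(states, initial, transitions, alpha, labels_of):
    """Existential abstraction of a finite y-abstract Kripke structure.

    states      : iterable of y-abstract states
    initial     : set of initial y-abstract states
    transitions : set of pairs (s, t) of the y-abstract transition relation
    alpha       : maps a y-abstract state to the set of snapshots it satisfies
    labels_of   : maps a snapshot to the set of properties it implies
    """
    abs_states = set()
    for s in states:
        abs_states |= alpha(s)
    abs_initial = set()
    for s in initial:
        abs_initial |= alpha(s)
    # An abstract transition exists whenever some concrete pair witnesses it.
    abs_transitions = {
        (a, b)
        for (s, t) in transitions
        for a in alpha(s)
        for b in alpha(t)
    }
    abs_labels = {a: labels_of(a) for a in abs_states}
    return abs_states, abs_initial, abs_transitions, abs_labels


# Toy two-process instance (hypothetical, for illustration only).
# A snapshot is the pair (prop1, prop2) seen from the hub, process n = 2.
def alpha(s):
    return {(int(s[1] == "S"), int(s[0] == "I"))}


def labels_of(snapshot):
    names = ("prop1", "prop2")
    return {name for name, bit in zip(names, snapshot) if bit}


S_t, I_t, R_t, L_t = build_tda(
    states=["II", "IS", "SS"],
    initial={"II"},
    transitions={("II", "IS"), ("IS", "SS"), ("SS", "II")},
    alpha=alpha,
    labels_of=labels_of,
)
```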
Each TDA abstract state is labeled with the properties prop(k) that process k satisfies.
Since process k has only finitely many states after y-abstraction, St is finite, too.
From the theoretical perspective, TDA reduces the state space by the fraction
(|S| − |St|)/|S|, where S is the set of asynchronous composition states defined in
Definition 5. At this point, our goal of reducing the state space of parametric
verification has been achieved.
Theorem 5 For a single-indexed ACTL* specification ∀x ϕ(x) where the atomic
formulas involved in ϕ(x) are labels in Lt , the following holds: PSt |= ϕ(x) ⇒
∀nPSy(n) |= ∀xϕ(x).
Proof The proof is given in [36]. 
The correctness of TDA means that the TDA model is weakly preserved for single-indexed
ACTL* specifications, which is guaranteed by Theorems 3, 4, and 5. In addition,
Theorem 5 implies that TDA is sound, namely, any single-indexed ACTL*
specification which holds in a TDA model also holds in a concrete model with an
arbitrary number of processes. The completeness and soundness of our approach provide
a solid theoretical foundation for optimizing the state space of parameterized systems.
6 An example
We show how TDA runs on a parameterized MESI protocol. The MESI protocol is
a four-state write-invalidate cache coherence protocol in which every memory block
can be in one of the following states: Modified, Exclusive, Shared, and Invalid [37].
Invalid means that a memory block is not present in the cache and to load it the
processor would have to send a request (LD) to the main memory. Modified identifies
cache lines that have been written by the corresponding processor (ST). The current
version of the modified block resides in the cache and is not visible to the rest of the
system at this time. The processor can perform LD, ST, and Eviction on this data.
Shared is the only state which allows other valid copies of the same memory block
to be stored in other caches. A processor can load from a Shared memory block or
evict it without notifying other processors or the memory. Exclusive means that the
processor is the one who owns the right to modify the block and the main memory
is current with the contents of the cache. If one cache has an Exclusive or Modified
state, all matching lines in other caches are marked Invalid.
Let PSc(3) be a distributed shared-memory multi-processor system with three pro-
cessors which ensures the data consistency through a directory-based MESI protocol
considering single memory block and single cache line. The directory itself is a data
structure whose entries record, for every block of memory, the state (i.e., cache access
permission, namely, dirstate) and the identities of the processors which have cached
that block (sharedset). Each cache tag residing in a processor includes at least three
fields: memaddr, cachestate, and cachedata. From the viewpoint of each cache con-
troller, a particular memory block can be in one of the four states: MODF, EXCL,
SHRD, or INVD. From a system-wide perspective, the state of a cache line
is determined by the corresponding dirstate and cachestate. Regardless of dirstate, if
the range of cachedata is contained in [0,1], there are as many as 32 transitions in the
state machine of a single processor for a single memory block, even though only 7 states
are valid (shown on the left hand side of Fig. 3). It is very difficult to draw the state
machine graph if cachedata and memaddr are allowed to take on any values from their
domains.
Now we want to check whether PSc(3) satisfies the following property: there exists
a processor without a copy of a block of memory when that block is shared by another
processor. The first step is to simplify the MESI protocol for a single processor through
y-abstraction by Definition 6. Because the above property only relates to the state
of the cache line and does not depend on its value, cachedata is redundant. The Kripke
structure of the reduced MESI protocol by y-abstraction is shown on the right hand side
of Fig. 3, where states are labeled with predicates satisfied in the current state; for
example, ‘M’ means cachestate = MODF.
Fig. 4 y-abstract model of 3 MESI-based processors
According to Definition 5, there are only 14 valid states out of a possible 4 ∗ 4 ∗ 4
states in PSc(3) (shown in Fig. 4). Each of them is labeled with a predicate-vector
of length three, with the three entries representing the predicate the current memory
block satisfies in processors 1, 2, and 3, respectively. For example, 〈EII〉 implies
that processor 1 owns the right to modify the memory block and the memory data
is not present in the caches of processors 2 and 3. To load the memory data, both
of them must issue a request to the main memory. Other states are excluded due
to compatibility constraints. Take 〈MMM〉 as an example. For the particular cache
line in processors 1, 2 and 3, cachestate is an internal variable, whereas dirstate
and sharedset are external variables. The labels for an M state in each proces-
sor are {dirstate = M, sharedset = P1, cachestate = M}, {dirstate = M, sharedset =
P2, cachestate = M}, and {dirstate = M, sharedset = P3, cachestate = M}, respec-
tively. The M states do not agree on the external variable sharedset, so they are not
compatible.
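The count of 14 valid states can be reproduced with a short enumeration. The sketch below assumes the compatibility rule implied by the MESI description above: a cache in state M or E must hold the only valid copy, while S copies may co-exist. The names `compatible` and `valid_states` are illustrative.

```python
from itertools import product


def compatible(global_state):
    """Return True if the cache states can co-exist for one memory block.

    Assumed rule: a cache in M or E must be the only cache holding a
    valid (non-I) copy; any number of S copies may co-exist.
    """
    owners = [c for c in global_state if c in "ME"]
    valid = [c for c in global_state if c != "I"]
    return not owners or len(valid) == 1


# All 4 * 4 * 4 combinations of per-processor states, filtered by the rule.
valid_states = ["".join(g) for g in product("MESI", repeat=3) if compatible(g)]
```

Under this rule, 8 of the surviving states use only S and I, and the remaining 6 have exactly one cache in M or E with the other two in I.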
In the second step, we use the following process property to represent that the
block of memory is shared by the kth processor and there is another processor which
has no copy of the block of memory:
$$\delta(k) = (\mathit{cachestate}[k] = S) \wedge \Big(\bigvee_{j \neq k} \mathit{cachestate}[j] = I\Big). \tag{16}$$
We define prop1(k) and prop2(k) by
$$\mathit{prop}_1(k) \triangleq (\mathit{cachestate}[k] = S), \tag{17}$$
$$\mathit{prop}_2(k) \triangleq \exists j\, (j \neq k \wedge \mathit{cachestate}[j] = I). \tag{18}$$
Thus
$$\begin{aligned}
\mathit{snps}(1) &= \mathit{prop}_1(1) \wedge \mathit{prop}_2(1),\\
\mathit{snps}(2) &= \mathit{prop}_1(2) \wedge \mathit{prop}_2(2),\\
\mathit{snps}(3) &= \mathit{prop}_1(3) \wedge \mathit{prop}_2(3).
\end{aligned} \tag{19}$$
Table 1 PSy(3) state space partition using snps(2)
| Equivalence class | Label of equivalence class | Bit-vector of label |
|---|---|---|
| {ISI, SSI, ISS} | prop1(2) ∧ prop2(2) = snps(2) | 〈11〉 |
| {III, IEI, IMI, EII, MII, IIE, IIS, IIM, SII} | ¬prop1(2) ∧ prop2(2) = ¬snps(2) | 〈01〉 |
| {SSS} | prop1(2) ∧ ¬prop2(2) = ¬snps(2) | 〈10〉 |
| {SIS} | ¬prop1(2) ∧ ¬prop2(2) = ¬snps(2) | 〈00〉 |
Table 1 demonstrates the result of the state space of PSy(3) partitioned by snps(2).
The first column lists the equivalence classes, the second gives the label of
each equivalence class, and the last shows its bit-vector expression. From
the table we note that there are only 4 states in the TDA model, reducing the space by
71.4 % compared with that in the y-abstract model. The state of 〈11〉 in the resulting
model means that processor 2 has a shared copy of the memory block and the memory
data is not present in the caches of processor 1 and/or processor 3. Therefore, the TDA
model is precise enough to prove the above system property, namely, TDA is correct.
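The partition in Table 1 and the 71.4 % figure can be checked mechanically. The sketch below hard-codes the 14 y-abstract states listed in Table 1; the helper name `bits_for` is an assumption introduced here for illustration.

```python
from collections import defaultdict

# The 14 valid y-abstract states of PSy(3), as listed in Table 1.
Y_STATES = ["ISI", "SSI", "ISS", "III", "IEI", "IMI", "EII", "MII",
            "IIE", "IIS", "IIM", "SII", "SSS", "SIS"]


def bits_for(state, k):
    """Bit-vector <prop1(k), prop2(k)> of a global state; k is 1-based."""
    prop1 = state[k - 1] == "S"                       # cachestate[k] = S
    prop2 = any(c == "I"                              # some j != k with I
                for j, c in enumerate(state, start=1) if j != k)
    return (int(prop1), int(prop2))


# Group the y-abstract states by the bit-vector seen from processor 2.
classes = defaultdict(set)
for s in Y_STATES:
    classes[bits_for(s, 2)].add(s)

# Fraction of states removed: (|S^y| - |S^t|) / |S^y|.
reduction = (len(Y_STATES) - len(classes)) / len(Y_STATES)
```

Here `classes` has four keys, one per row of Table 1, and `reduction` evaluates to 10/14, i.e. about 71.4 %.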
Because the system parameter n is existentially quantified, a group of parameterized
systems with different system parameters can be modeled by the same TDA
model. To illustrate the soundness, we applied our method to several other concrete
systems. As expected, at least 3 concrete systems have the same TDA model as
PSc(3) has. Figure 5 shows one such system.
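A partial sanity check of this independence from n can be sketched as follows. It compares only the sets of abstract states (not the transition relations) for several processor counts, and it assumes the same compatibility rule as the MESI description: M or E excludes any other valid copy. The function names are illustrative.

```python
from itertools import product


def compatible(global_state):
    """Assumed rule: a cache in M or E must hold the only valid copy."""
    owners = [c for c in global_state if c in "ME"]
    valid = [c for c in global_state if c != "I"]
    return not owners or len(valid) == 1


def abstract_states(n, k=2):
    """Set of bit-vectors <prop1(k), prop2(k)> over all valid n-processor states."""
    result = set()
    for g in product("MESI", repeat=n):
        if not compatible(g):
            continue
        prop1 = g[k - 1] == "S"
        prop2 = any(c == "I" for j, c in enumerate(g) if j != k - 1)
        result.add((int(prop1), int(prop2)))
    return result


# Abstract state sets for systems with 3, 4 and 5 processors.
models = {n: abstract_states(n) for n in (3, 4, 5)}
```

Under these assumptions all three systems yield the same four abstract states, which agrees with the observation that one TDA model covers several concrete systems.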
7 Case studies
To validate our approach, we have implemented TDA and applied it to verify several
classical cache coherence protocols as described in [38] and a hierarchical cache
protocol in FT-1000 CPU.
7.1 Protocols and properties to be verified
Classical protocols and properties these protocols should have are introduced briefly
here.
Fig. 5 Another concrete system with 4 MESI-based processors has the same TDA model as PSc(3) has
Synapse N + 1
Synapse N + 1 is a write-allocation protocol developed by Synapse for the N + 1
computer. A cache can be in one of three possible states: invalid (the cache has no
valid data), valid (the cache has a potentially shared copy of the data), and dirty (the
cache has a modified copy of the data). dirty is an exclusive state: only one cache can
have a dirty line. The state changes according to write and read commands issued by
the corresponding processor (for example, Rm, W) or coming from the system bus
(such as Rm and W), as shown in Fig. 6. Rh is an internal action that denotes a read
hit, Rm denotes a read miss, and W denotes a write.
Fig. 6 The Synapse N + 1 protocol from the perspective of cache Ci
There are two possible sources of data inconsistency for Synapse:
UNS1: a dirty cache co-exists with one or more caches in state valid;
UNS2: more than one cache is in state dirty.
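These two conditions can be written as predicates over a global state. The sketch below represents a global state as a tuple of per-cache state names. The helper `number` mirrors the Number(q) notation used for the Illinois protocol. All function names are assumptions introduced for illustration.

```python
from collections import Counter


def number(global_state, q):
    """Number(q): how many caches are in state q in the global state."""
    return Counter(global_state)[q]


def uns1(global_state):
    """A dirty cache co-exists with one or more caches in state valid."""
    return number(global_state, "dirty") >= 1 and number(global_state, "valid") >= 1


def uns2(global_state):
    """More than one cache is in state dirty."""
    return number(global_state, "dirty") > 1


def is_consistent(global_state):
    """True if neither source of inconsistency is present."""
    return not (uns1(global_state) or uns2(global_state))
```

For example, `is_consistent(("valid", "valid", "invalid"))` holds, whereas a state such as `("dirty", "valid", "invalid")` triggers UNS1.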
Illinois
The University of Illinois protocol is a snoopy cache, write-invalidate, write-in co-
herence policy. The special feature is that caches can have exclusive copies of data.
Bus invalidation signals are sent only for writes to shared data. The memory copy is
updated using a write-back policy (replacement). In addition to invalid, caches can
be in one of the following states: valid-exclusive (the cache has an exclusive copy
of the data that is consistent with the memory such that a modification of its content
requires no bus invalidation signal), shared (the cache has a copy of the data con-
sistent with the memory and other caches may have copies of the data), and dirty
(the cache has a modified copy of the data, i.e., the data in main memory are ob-
solete and the content of the other caches is not valid). The transition is given in
Fig. 7, and the behavior of one cache may be internal actions Rh (read hit), Rm (read
miss), We (write in exclusive state), Wd (write in dirty state), WI (write and inval-
idate), and Rep (replacement with a new memory line). In this figure, P is defined
as Number(dirty) = 0 ∧ Number(shared) = 0 ∧ Number(valid-exclusive) = 0, where
Number(q) denotes the number of caches in state q in the current global state.
The possible sources of data inconsistency are:
UNS1: a dirty cache co-exists with caches either in state shared or valid-exclusive;
UNS2: there is more than one dirty cache.
The other possible violations of the exclusivity of state valid-exclusive are:
UNS3: there is more than one valid-exclusive cache;
UNS4: a shared cache co-exists with a cache in state valid-exclusive.
Fig. 7 The Illinois protocol from the perspective of cache Ci
Fig. 8 The Berkeley protocol from the perspective of cache Ci
Berkeley
The Berkeley protocol is a variation of MESI with write-allocation and with a
shared modified state, named owned non-exclusively. In this state, the main memory
is not coherent with the possible multiple, cached copies of the owner data. The other
three states are invalid, unowned (similar to the MESI Shared state), and owned ex-
clusively (similar to the MESI Modified state). Figure 8 demonstrates how one cache
changes its state according to different commands.
In the Berkeley protocol, we have the following sources of data inconsistency:
UNS1: an owned exclusively cache co-exists with one or more caches either in
state owned non-exclusively, or unowned;
UNS2: there is more than one owned exclusively cache.
Dragon
Dragon is a write-allocation protocol that uses a signal to indicate snoop hits on the
bus. The protocol has four states: shared clean (multiple clean copies may coexist),
shared dirty (multiple dirty copies may coexist), valid exclusive (the cache
has an exclusive clean copy), and dirty (the cache has an exclusive dirty copy). The
possible transitions from the perspective of cache Ci are shown in Fig. 9, where
P,Q,S,T are defined as follows:
$$P \triangleq \mathit{Number}(\text{exclusive}) = 0 \wedge \mathit{Number}(\text{dirty}) = 0 \wedge \mathit{Number}(\text{shared-dirty}) = 0 \wedge \mathit{Number}(\text{shared-clean}) = 0,$$
$$Q \triangleq \mathit{Number}(\text{shared-dirty}) + \mathit{Number}(\text{shared-clean}) \ge 2,$$
$$S \triangleq \mathit{Number}(\text{shared-dirty}) = 0 \wedge \mathit{Number}(\text{shared-clean}) = 1,$$
$$T \triangleq \mathit{Number}(\text{shared-dirty}) = 1 \wedge \mathit{Number}(\text{shared-clean}) = 0.$$
Fig. 9 The Dragon protocol from the perspective of cache Ci
In the Dragon protocol, there are several possible sources of data inconsistency:
UNS1: a dirty cache co-exists with one or more caches either in state shared dirty,
shared clean or valid exclusive;
UNS2: a valid exclusive cache co-exists with one or more caches either in state
shared clean, or shared dirty;
UNS3: there is more than one dirty cache;
UNS4: there is more than one valid exclusive cache.
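The guards P, Q, S, T and the four inconsistency conditions are all counting predicates, so they can be evaluated directly on a global state. In the sketch below a global state is a tuple of per-cache state names. The state names, the extra name "invalid" for a cache holding no copy, and both function names are assumptions made for illustration.

```python
from collections import Counter


def dragon_guards(global_state):
    """Evaluate the guards P, Q, S, T on a global state."""
    n = Counter(global_state)
    sd, sc = n["shared-dirty"], n["shared-clean"]
    return {
        "P": n["exclusive"] == 0 and n["dirty"] == 0 and sd == 0 and sc == 0,
        "Q": sd + sc >= 2,
        "S": sd == 0 and sc == 1,
        "T": sd == 1 and sc == 0,
    }


def dragon_violations(global_state):
    """Evaluate UNS1 to UNS4 on a global state."""
    n = Counter(global_state)
    shared = n["shared-dirty"] + n["shared-clean"]
    return {
        "UNS1": n["dirty"] >= 1 and shared + n["exclusive"] >= 1,
        "UNS2": n["exclusive"] >= 1 and shared >= 1,
        "UNS3": n["dirty"] > 1,
        "UNS4": n["exclusive"] > 1,
    }
```

A state in which every cache is "invalid" satisfies P, and a state such as `("dirty", "shared-clean")` is flagged by UNS1.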
7.2 Experimental results
Figures 10 and 11 present some results of these experiments.
The asynchronous composition of an n-processor system which ensures data consistency
through some protocol is a concrete system. Figure 10 shows the number
of concrete states of each protocol against different system parameters according to
Definition 5. Although in the worst case the number of states in the asynchronous
composition could be as large as $\prod_{k=1}^{n} |S_k|$, in practice it typically turns out to be much
smaller. This is because some states, such as 〈dirty, dirty〉 in the Illinois protocol and
〈owned-exclusively, owned-exclusively〉 in the Berkeley protocol, are prohibited. As can be
seen from this figure, as the number of processors increases (especially beyond
13 for Berkeley and Dragon, and 20 for Synapse N + 1 and Illinois), the number of states
grows rapidly. Therefore, the largest asynchronous composition we can obtain
comprises only 24 processors (Synapse N + 1).
In Fig. 11, we plot the number of states in the TDA model of each protocol. Because
process properties used in TDA are made of predicates taken from the properties
to be verified, different properties for the same protocol have different TDA models.
Two predicates, cachestate(i) = dirty/shared and Number(dirty/valid-exclusive),
are enough to express these properties formally, resulting in at most 4
TDA abstract states. AHG denotes the number of reachable states in the abstract
history graph described in [39], which is greater than that in TDA. It is also important
to notice that the number of states in the TDA model does not change with the
system parameter, which is consistent with the conclusion in Sect. 6. All experiments
were conducted on a PC with a 3.3 GHz Intel Core processor, 8 GB of available main
memory, running Red Hat Linux (6.1) and GCC (4.4.5).
Fig. 10 Asynchronous composition state number with different processor number
Fig. 11 TDA state number against properties, UNS1–UNS4 correspond to properties to be verified for each protocol, and AHG denotes abstract history graph
Fig. 12 Architecture of FT-1000 CPU
7.3 Application for FT-1000 CPU
FT-1000 CPU is a key component in the TH-1A supercomputer system [40]. It adopts
a parallel system-on-chip multi-core architecture. Eight multi-thread cores, each
with a private cache hierarchy (L1 Cache), are integrated on the chip. The eight cores
share a large-capacity multi-bank L2 Cache, and communication between cores is
achieved through the Cache Crossbar. The Cache Ordering Unit (COU) is responsible for
cache coherence and memory ordering. The L2 Cache can access the off-chip high-speed
DDR3 DRAM via memory controller units (MCU). The inter-chip direct connect
interface supports cache coherence packets and large block data transfer packets, and
can be used for connecting 2–4 processors directly to build large-scale tightly-coupled
shared-memory systems. The chip provides efficient I/O access through an integrated PCIE
2.0 standard interface. Figure 12 illustrates the architecture of the FT-1000 CPU.
In FT-1000 based SMP systems, a two-level hierarchical coherence protocol is
designed to provide a coherent view of shared data items for programmers. The
first level is the chip-level protocol, used to keep multiple copies of the data among
the eight L1 caches consistent. The second level is the inter-chip protocol, used to
maintain L2 cache coherence among different chips. Both levels of this protocol are
based on the standard three-state (unowned, shared, exclusive) invalidation-based
directory-based cache coherence protocol with some extensions. This hierarchical
protocol is more complicated, with more corner cases and a bigger state space, than
non-hierarchical protocols: it has eight instances of the chip-level protocol
and at most four instances of the inter-chip protocol running concurrently. It therefore
seems clear that such hierarchical protocols cannot be checked directly by current model
checkers, e.g., Murphi or NuSMV. During the development of the FT-1000 CPU, we applied TDA
to reduce the state space of the chip-level protocol, and checked several safety
properties using NuSMV. Then, the FT-1000 CPU is regarded as a single-core processor and
the verification of the inter-chip protocol is simplified. We claimed the correctness of
the original protocol by verifying the second-level protocol. Some chip-level
experimental results are given in Table 2, where UNS1 and UNS2 are the same as those of
Synapse N + 1.
Table 2 Experimental results of FT-1000 chip-level protocol
| Asynchronous composition state number | TDA state number (UNS1) | TDA state number (UNS2) | Time (ms) (UNS1) | Time (ms) (UNS2) |
|---|---|---|---|---|
| 264 | 4 | 2 | 64 | 57 |
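The figures in Table 2 can be related to the reduction ratio (|S| − |St|)/|S| from Sect. 5. The sketch below also shows one reading that happens to reproduce the count of 264. It is an assumption made here, not something stated in the paper: each of the eight L1 caches is taken to be unowned, shared or exclusive, and an exclusive copy is taken to exclude every other valid copy.

```python
# Assumed counting argument (not taken from the paper):
#   2**n global states contain no exclusive copy (each cache unowned or shared),
#   n global states contain exactly one exclusive copy and n - 1 unowned caches.
n_caches = 8
concrete = 2 ** n_caches + n_caches


def reduction(concrete_states, abstract_states):
    """Fraction of states removed: (|S| - |S^t|) / |S|."""
    return (concrete_states - abstract_states) / concrete_states


# TDA state numbers for UNS1 and UNS2 as reported in Table 2.
ratios = {"UNS1": reduction(concrete, 4), "UNS2": reduction(concrete, 2)}
```

With the Table 2 values, the ratios come to roughly 98.5 % for UNS1 and 99.2 % for UNS2.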
8 Conclusions
The verification of cache coherence in general is known to be NP-hard. In the age of
exascale computing, scalability is emerging as one of the key components in paral-
lel computing [41]. Scalable multi-core multi-processor architectures are inevitable.
More and more complex processes and unbounded system parameter result in the
state explosion during the verification of parameterized cache coherence protocols.
A generic abstraction method for parameterized systems, two-dimensional abstraction
(TDA), has been put forward in this paper. The novelty of our approach is
that it analyzes in depth the intrinsic factors affecting the size of the state space and
reduces the state space in two dimensions, so that a much smaller abstract model is
produced. Compared with traditional approaches, our approach can effectively reduce
the verification complexity and greatly scale the verification capabilities. We give
complete soundness and completeness proofs for our method. We have demonstrated
the benefits of our approach on several coherence protocols with realistic features.
Our future work is to integrate TDA with model-checking tools and check the
advanced cache coherence protocol hierarchically organized for a next generation
supercomputer. We also plan to investigate combining TDA with CMP method in the
future.
Acknowledgements This work is inspired by the idea from M. Talupur’s work on environment ab-
straction, and supported by the National Natural Science Foundation of China under Grant No. 61070036
and 61133007.
Open Access This article is distributed under the terms of the Creative Commons Attribution License
which permits any use, distribution, and reproduction in any medium, provided the original author(s) and
the source are credited.
References
1. Grumberg O, Veith H (2008) 25 Years of model checking: history, achievements, perspectives. In:
Lecture notes in computing science, vol 5000, VII. Springer, Berlin, p 231
2. Guerraoui R, Henzinger TA, Singh V (2008) Model checking transactional memories. Distrib Comput
22(3):129–145
3. Pong F, Dubois M (1997) Verification techniques for cache coherence protocols. ACM Comput Surv
29(1):82–126
4. Abts D, Lilja DJ, Scott S (2000) Toward complexity-effective verification: a case study of the cray
SV2 cache coherence protocol. In: The 27th annual int’l symp on computer architecture (ISCA-2000),
Vancouver, British Columbia, Canada
5. Abdulla PA, Delzanno G, Rezine A (2008) Monotonic abstraction in parameterized verification. Elec-
tron Notes Theor Comput Sci 223:3–14
6. Lahiri SK, Bryant RE (2007) Predicate abstraction with indexed predicates. ACM Trans Comput
Logic 9(1). doi:10.1145/1297658.1297662
7. Pnueli A, Xu J, Zuck L (2002) Liveness with (0; 1; ∞) counter abstraction. In: Proc of the 14th
international conference on computer aided verification (CAV). Lecture notes in computer science,
vol 2404. Springer, Berlin, pp 107–122
8. Clarke E, Talupur M, Veith H (2006) Environment abstraction for parameterized verification. In: Proc
of the 7th VMCAI. Lecture notes in computer science, vol 3855. Springer, Berlin, pp 126–141
9. Clarke E, Talupur M, Veith H (2008) Proving Ptolemy right: environment abstraction principle for
parameterized verification. In: Proc of TACAS. Lecture notes in computer science, vol 4963. Springer,
Berlin, pp 33–47
10. Pnueli A, Ruah S, Zuck L (2001) Automatic deductive verification with invisible invariants. In: Proc
of the 7th TACAS, pp 82–97
11. Graf S, Saidi H (1997) Construction of abstract state graphs with PVS. In: Proc of the 9th int’l conf
on computer aided verification. Springer, Berlin, pp 72–83
12. Sorin DJ, Plakal M, Condon AE, et al (2002) Specifying and verifying a broadcast and a multicast
snooping cache coherence protocol. IEEE Trans Parallel Distrib Syst 13(6):556–577
13. Dill DL, Drexler AJ, Hu AJ, Yang CH (1992) Protocol verification as a hardware design aid. In: IEEE
intl conference on computer design, pp 522–525
14. Lamport L (2002) Specifying systems: the TLA+ language and tools for hardware and software engi-
neers. Addison-Wesley, Reading
15. McMillan KL (2001) Parameterized verification of the FLASH cache coherence protocol by compo-
sitional model checking. In: Proc on correct hardware design and verification methods (CHARME).
Lecture notes in computer science, vol 2144. Springer, Berlin, pp 179–195
16. Chou C-T, Mannava PK, Park S (2004) A simple method for parameterized verification of cache
coherence protocols. In: Proc on formal methods in computer-aided design (FMCAD). Lecture notes
in computer science, vol 3312. Springer, Berlin, pp 382–398
17. Li Y (2007) Mechanized proofs for the parameter abstraction and guard strengthening principle in
parameterized verification of cache coherence protocols. In: Proc of the ACM symposium on applied
computing (SAC), pp 1534–1535
18. Chen X, Yang Y, Gopalakrishnan G, Chou C-T (2010) Efficient methods for formally verifying safety
properties of hierarchical cache coherence protocols. Form Methods Syst Des 36(1):37–64
19. Krstic S (2005) Parameterized system verification with guard strengthening and parameter abstraction.
In: Automated verification of infinite state systems (AVIS)
20. Talupur M, Krstic S, O’Leary J, Tuttle MR (2008) Parametric verification of industrial strength cache
coherence protocols. In: Proc workshop on design of correct circuits (DCC)
21. Talupur M, Tuttle M (2008) Going with the flow: parameterized verification using message flows. In:
Formal methods in computer aided design (FMCAD), pp 1–8
22. O’Leary J, Talupur M, Tuttle MR (2009) Protocol verification using flows: an industrial experience.
In: Proc of the 9th international conference on formal methods in computer-aided design (FMCAD),
pp 172–179
23. Pnueli A, Zuck LD (2003) Model-checking and abstraction to the aid of parameterized systems. In:
Proc of the 4th international conference on verification, model checking, and abstract interpretation.
Lecture notes in computer science, vol 2144. Springer, Berlin, p 4
24. Wang C, Hachtel GD, Somenzi F (2006) Abstraction refinement for large scale model checking.
Springer, Berlin
25. Timm N, Wehrheim H (2010) On symmetries and spotlights—verifying parameterised systems. In:
Lecture notes in computer science, vol 6447. Springer, Berlin, pp 534–548
26. Lahiri SK, Bryant RE, Cook B (2003) A symbolic approach to predicate abstraction. In: Proc of the
international conference on computer-aided verification (CAV’03). Lecture notes in computer science,
vol 2742. Springer, Berlin, pp 141–153
27. Konnov IV, Zakharov VA (2010) An invariant-based approach to the verification of asynchronous
parameterized networks. J Symb Comput 45(11):1144–1162
28. Pandav S, Slind L, Gopalakrishnan G (2005) Counterexample guided invariant discovery for param-
eterized cache coherence verification. In: Proc of CHARME. Lecture notes in computer science, vol
3725. Springer, Berlin, pp 317–331
29. Delzanno G (2000) Automatic verification of parameterized cache coherence protocols. In: Proc of
the 12th international conference on computer aided verification (CAV), pp 53–68
30. Delzanno G, Bultan T (2001) Constraint-based verification of client–server protocols. In: Proc of the
7th international conference on principles and practice of constraint programming, pp 286–301
31. Delzanno G (2003) Constraint-based verification of parameterized cache coherence protocols. Form
Methods Syst Des 23(3):257–301
32. Emerson EA, Kahlon V (2003) Exact and efficient verification of parameterized cache coherence
protocols. In: Proc on correct hardware design and verification methods (CHARME’03). Lecture notes
in computer science, vol 2860. Springer, Berlin, pp 247–262
33. Baukus K, Lakhnech Y, Stahl K (2002) Parameterized verification of a cache coherence protocol:
safety and liveness. In: Proc VMCAI. Lecture notes in computer science, vol 2294. Springer, Berlin,
pp 247–262
34. Clarke EM Jr, Grumberg O, Peled DA (1999) Model checking. The MIT Press, Cambridge, London
35. Girard A, Julius AA, Pappas GJ (2008) Approximate simulation relations for hybrid systems. Discret
Event Dyn Syst 18(2):163–179
36. Talupur M (2006) Abstraction techniques for parameterized verification. Ph.D. Dissertation, Pitts-
burgh, Carnegie Mellon University
37. Culler D, Singh JP, Gupta A (1998) Parallel computer architecture: a hardware/software approach.
Morgan Kaufmann, San Mateo
38. Delzanno G (2003) Constraint-based verification of parameterized cache coherence protocols. Form
Methods Syst Des 23:257–301
39. Emerson EA, Kahlon V (2003) Rapid parameterized model checking of snoopy cache coherence
protocols. In: Proc TACAS. Lecture notes in computer science, vol 2619. Springer, Berlin, pp 144–
159
40. Yang X-J, Liao X-K, Lu K, et al (2011) The TianHe-1A supercomputer: its hardware and software.
J Comput Sci Technol 26(3):344–351
41. Bosque JL, Roblas OD, Toharia P, Pastor L (2011) Evaluating scalability in heterogeneous systems.
J Supercomput 58:367–375
