In this paper, we introduce a cache protocol verification technique based on a symbolic state expansion procedure. A global FSM (Finite State Machine) model characterizing the protocol behavior is built and protocol verification becomes equivalent to finding whether or not the global FSM may enter erroneous states. In order to reduce the complexity of the state expansion process, all the caches in the same state are grouped into an equivalence class and the number of caches in the class is symbolically represented by a repetition constructor. This symbolic representation is partly justified by the symmetry and homogeneity of cache-based systems. However, the key idea behind the representation is to exploit a unique property of cache coherence protocols: the fact that protocol correctness is not dependent on the exact number of cached copies. Rather, symbolic states only need to keep track of whether the caches have 0, 1 or multiple copies. The resulting symbolic state expansion process only takes a few steps and verifies the protocol for any system size. Therefore, it is more efficient and reliable than current approaches.
The verification procedure is first applied to the verification of five existing protocols under the assumption of atomic protocol transitions. A simple snooping protocol on a split-transaction shared bus is also verified to illustrate the extension of our approach to protocols with non-atomic transitions.
Index Terms: Cache coherence protocol, formal verification, finite state machine, symbolic expansion, shared-memory multiprocessor.
A New Approach for the Verification of Cache Coherence Protocols
Introduction
In a shared-memory multiprocessor, private caches are needed to reduce the effects of memory access latency and contention. Whereas private caches significantly improve system performance, they introduce the cache coherence problem. Multiple cached copies of the same memory word must be consistent at any time. A cache coherence protocol ensures that changes made to shared memory locations by any processor are made visible to all other processors.
A cache coherence protocol is a set of rules coordinating communicating entities (usually cache and memory controllers) to enforce consistency among multiple data copies. Many protocols [1, 7] have been proposed, described and implemented; however, they have never been validated formally. The simplicity of these protocols, the lack of verification tools, and the complexity of current formal validation procedures may explain this state of affairs. Informal techniques for protocol verification rely on time-consuming, error-prone testing procedures carried out by engineers and require a great deal of ingenuity. As protocols grow more complex, it becomes extremely difficult to verify them by relying on human reasoning alone.
In a broad sense, the goal of validation is to verify that a protocol satisfies its specification and possesses the required invariant properties. Validation activities, including simulation studies, state reachability analysis and logical reasoning, should exist at all phases of design and implementation. Simulations are conceptually simple but suffer from incompleteness, since a random test sequence would have to run indefinitely to enter all reachable states. It is also very unlikely that validation procedures based on trace-driven simulations can detect most design errors: a protocol passing the test is only shown to be correct for the particular simulation runs.
This paper introduces a new approach for validating cache coherence protocols at the early design stage. Our method is based on reachability analysis. Abstracting away hardware implementation details, we specify protocols by finite state automata that characterize the behavior of the caches. The system state is the composition of individual cache states, as in [23, 29], but the system state space is represented and expanded symbolically. The symbolic representation is partly based on the symmetry and homogeneity of cache-based systems. However, the major contribution of this paper is to exploit a unique property of cache coherence protocols: protocol correctness does not depend on the exact number of cached copies. Rather, symbolic states only need to keep track of whether the caches have 0, 1 or multiple copies. Symbolic states are obtained by grouping caches in the same state into an equivalence class and symbolically representing the number of caches in the class by a repetition constructor. This state representation leads to an efficient state expansion process. More importantly, our method verifies protocols independently of the system size, so the verification result holds for any number of caches. A global state graph is reported at the end of the procedure.
The global state graph is useful not only to verify data consistency but also to represent the protocol at the system level in a compact fashion and to demonstrate the similarities and disparities among protocols.
After reviewing the current state of the art for verifying cache protocols, we develop the global state model for cache protocols under the assumption of atomic protocol transitions and illustrate it with the Illinois protocol. Next, we introduce our new methodology based on symbolic states. Equivalence relations leading to the state representation by classes are introduced, followed by the symbolic construction of the global state graph.
The methodology is then applied to several protocols of moderate complexity described in [1].
Finally, a simple snooping protocol on a split-transaction shared bus is verified to illustrate the extension of our methodology to protocols with non-atomic transitions.
Current Approaches
Several approaches to verify cache protocols have been explored by other authors.
Approaches based on testing were attempted by Baer and Girault [2] who introduced a Petri Net model of cache protocols. This model comprehensively specifies the underlying hardware structure. The Petri Net model is valuable in capturing the synchronization between communicating hardware entities, and hence, it is an important methodology for mapping protocol designs to actual implementations. The construction of a Petri Net model as shown in [2] is difficult to automate and is very complex even though the protocol is simple; the verification procedure is not specified and is probably very complex.
Reachability analysis is based on exhaustively exploring all possible interactions among the entities participating in the protocol. The system is characterized by its state. From a given state, exploring all possible interactions among entities leads to a number of new states. States in which the protocol fails to preserve the expected correctness properties are classified as erroneous; all other states are permissible. If any erroneous state is reachable, the protocol is incorrect. The major difficulty of this technique is the "state space explosion" problem [5, 16]: the complexity of state exploration normally blows up quickly as the number and complexity of the entities involved in the protocol increase. Reachability analysis has been widely adopted for the automated verification of communication protocols [5, 16, 19, 30]. To validate a cache coherence protocol, however, it is not sufficient to track the possible states; the state models must also capture the consistency of data values.
In [29], Rudolph and Segall presented a proof of a snooping protocol by enumerating the various scenarios of reads and writes. Each cache is modeled as a finite state automaton, and the system is modeled as a product machine composed of n such automata. Nanda and Bhuyan [23] presented a similar approach based on the composition of communicating finite state machines and on state enumeration. Dill et al. [13] developed the Murϕ system, which also searches all reachable system states. In an enumeration approach, a large number of redundant states are visited and expanded during the state expansion procedure.
Enumerating the states of complex protocols therefore runs into the state space explosion problem.
Another technique for validating protocols relies on logical proofs [5, 15, 20]. This approach can validate a full range of properties: ideally, any property that can be formulated in logic can be verified. However, formulating assertions that reflect the desired correctness properties, and proving them, is often error-prone and requires considerable ingenuity. Even in some studies requiring great effort, correctness conditions have been formulated incorrectly and/or incompletely. More importantly, this approach cannot deal with state-oriented transitions.
An approach combining the advantages of reachability analysis and logical proofs has recently been applied to the verification of communication protocols [3, 4, 5, 8]. Reachability analysis, based on state models augmented with variables and processing routines, is used to expand the major states, while logical properties are formulated and proved over state variables and associated context variables. This intermediate approach is well suited to the verification of cache coherence protocols: on one hand, coherence activities are mainly reflected by the state changes of caches, which suggests reachability analysis; on the other hand, data aspects can be modeled by augmenting the state description with context variables.
Recent work has focused on the state space explosion problem. Instead of explicitly enumerating the state space, McMillan and Schwalbe [22] and several other authors [10, 11] represented the system state space symbolically. If V is the set of (boolean) variables representing the states of the components in the system, a system state is an assignment of either 0 or 1 to each variable in V, and the set of all system states is obtained from all possible interpretations of the variables in V. A boolean function f(V) can then represent the global state space if f(V) is true exactly for the reachable states. Similarly, the possible state transitions are represented by a boolean function g(V, V') that is true if and only if the system can move from state V to state V'. In this model checking method, Binary Decision Diagrams (BDDs) [6] (or Typed Decision Graphs in [11, 21]) are used to perform operations on boolean formulas efficiently. Correctness conditions of protocols are given as temporal logic formulas, which are then converted into quantified boolean functions whose truth values are evaluated. Although this symbolic model checking method does not enumerate all reachable states explicitly, the size of the BDDs representing the transition relations may grow rapidly with the scale and complexity of the system. It was recently observed that complex systems often exhibit a great deal of regularity and symmetry. Ip and Dill [17, 18] implemented symmetric Murϕ, which exploits the symmetry of the system by grouping together states whose representations are permutations of each other. A similar idea was applied to symbolic model checking by Clarke et al. [9] and Emerson et al. [14]. By applying the (symmetric) permutation operators G to a state s, we obtain the orbit θ(s) of s, the set of states symmetrically equivalent to s, which can be canonically represented by one state picked from θ(s) (denoted ξ(s) in [18]).
Transition relations $R(s_1, s_2)$ are then converted into $R_G(\theta(s_1), \theta(s_2))$. Since states that are permutations of each other are grouped and represented by only one canonical state in their orbit, the state space and the BDD size after the transformation can be significantly reduced. Methods for model checking the transformed model without explicitly building it are also discussed in [9].
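To make the orbit construction concrete, here is a minimal Python sketch assuming fully symmetric caches, so that θ(s) is simply the set of all reorderings of the global state tuple and ξ(s) can be taken as its lexicographically smallest member (the sorted tuple). The helper names `orbit`, `canonical` and `count_states` are ours, the four state names are borrowed from the Illinois protocol used later in the paper, and the counts cover all m^n assignments rather than only reachable states.

```python
from itertools import permutations, product
from math import comb

# Illustrative per-cache states; caches are assumed to be fully symmetric.
STATES = ("Invalid", "Valid-Exclusive", "Shared", "Dirty")

def orbit(s):
    """theta(s): every global state obtained by permuting the caches of s."""
    return set(permutations(s))

def canonical(s):
    """xi(s): the lexicographically smallest member of the orbit, i.e. the sorted tuple."""
    return tuple(sorted(s))

def count_states(n):
    """Return (number of raw global states, number of orbits) for n caches."""
    raw = list(product(STATES, repeat=n))
    return len(raw), len({canonical(s) for s in raw})

if __name__ == "__main__":
    s = ("Invalid", "Dirty", "Invalid")
    assert canonical(s) == min(orbit(s))
    for n in (2, 3, 4):
        raw, orbits = count_states(n)
        # one orbit per multiset of n states drawn from m symbols
        assert orbits == comb(n + len(STATES) - 1, n)
        print(n, raw, orbits)
```

For n = 2, 3 and 4 it prints 16/10, 64/20 and 256/35 raw states versus orbits. The orbit count follows C(n + m - 1, n), so symmetry alone still leaves a state space that grows with n.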
Protocol Model

Finite State Automaton
A protocol can be specified by a simple finite state machine (FSM) model which includes the interactions between caches and main memory only. To simplify the presentation of the methodology, we first assume that each protocol transition is atomic, that is, the state changes of all the caches involved in a transition take zero time. Useful verifications can be derived under this assumption. In Section 6, we show through an example how the methodology can be extended to non-atomic protocol transitions.
Representing a cache coherence protocol by an FSM model is natural from the perspective of protocol designers, and, in the past, FSM models have been extensively used to describe and specify cache coherence protocols at a logical level. Without loss of generality, formal definitions of the protocol model are as follows. Strong relations between the definitions of cache states and the status of cached copies are common in cache protocol designs [29]. This suggests a primary verification procedure consisting of searching all reachable global states and proving that all reached states are permissible, in the sense that individual cache states are compatible [23]. The problem of searching the global state space is therefore converted into the problem of finding an efficient model for the global FSM.
Model for Data Consistency
A cache coherence protocol must support correct execution of a program on a multiprocessor system. In general, there are two distinct requirements:
• The ordering of accesses must conform to a well defined consistency model and the parallel code must be written correctly for this model through proper use of synchronization primitives (e.g., critical sections).
• For correctly written programs, the cache protocol must always return the latest value on each load.
We formulate this latter condition within the framework of reachability expansion as follows. For a single memory location, our method associates each cache C_i with an auxiliary variable cdata_i, and the memory with an auxiliary variable mdata, to keep track of data consistency between memory and cached copies.
Each cdata_i takes values from the domain {nodata, fresh, obsolete}, and mdata from the domain {fresh, obsolete}. Initially, all caches are in the Invalid state without data copies (cdata_i = nodata for all i), and memory has the fresh copy (mdata = fresh). The value assignments to these variables during the state expansion conform to the protocol. A data inconsistency occurs when a processor can access its local copy while its value is obsolete.
Figure 1 shows the state transition diagram of the Illinois protocol [24] for one cache C_i. We use this protocol as a running example throughout the paper. The Illinois protocol distinguishes private and non-actively shared blocks from actively shared blocks, so that invalidations for write hits on private and non-actively shared blocks can be avoided. There are four states for cached blocks: Invalid, Valid-Exclusive (not modified; only copy in caches), Shared (not modified; possible copies in other caches) and Dirty (modified; only copy in caches). On a read miss, a block is loaded in state Valid-Exclusive or Shared depending on the value of the sharing-detection function F. In the framework of a finite state machine, the Illinois protocol can be described as follows:
The Illinois Protocol
• State symbols Q = {Invalid, Valid-Exclusive, Shared, Dirty}.
• Cache events Σ = {…, Rep(j)}, which stand for read, write and replacement issued by cache C_i and by other caches C_j.
• Cache algorithm from the perspective of cache C_i: Following the model of data consistency in Section 3.2, a formal model of the Illinois protocol keeps track of data consistency by associating each cache C_i with a variable cdata_i and the memory with a variable mdata. The data transfer aspects of the Illinois protocol from the perspective of C_i can be specified as follows.
1. Read Miss.
    if (there exists C_j in Dirty state)
        mdata = cdata_j          /* update memory */
        cdata_i = cdata_j
    else if (there is no cached copy)
        cdata_i = mdata          /* get data from memory */
    else                         /* arbitrarily choose C_j with a copy */
        cdata_i = cdata_j
2. Write Hit.
    if (C_i has a dirty copy)
        no action is taken
    else
        …
3. Write Miss.
    if (there exists C_j in Dirty state)
        cdata_i = cdata_j        /* must be a fresh copy */
    else if (there exists C_j with a copy)
        cdata_i = cdata_j        /* must observe fresh copy */
    else                         /* no cached copy */
        cdata_i = mdata
4. Replacement.
    if (C_i has a dirty copy)
        mdata = cdata_i          /* write back */
    cdata_i = nodata
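The data-transfer rules above can be exercised with a small executable model. The Python sketch below is an illustration under stated assumptions, not the authors' tool: the cache state changes follow the usual Illinois semantics (a read miss loads Valid-Exclusive when no other copy exists and Shared otherwise, an exclusive or dirty copy becomes Shared when another cache reads, any write leaves C_i Dirty and invalidates the other copies), and the effect of the write itself (C_i's copy becomes the only fresh value; other copies and memory become obsolete) is filled in by us, since those lines are not recoverable above. The class `IllinoisModel`, the flag `invalidate_on_write` (used to inject a bug) and the helper `random_run` are hypothetical names.

```python
import random

I, VE, S, D = "Invalid", "Valid-Exclusive", "Shared", "Dirty"
NODATA, FRESH, OBSOLETE = "nodata", "fresh", "obsolete"

class IllinoisModel:
    """One memory block shared by n caches, with auxiliary cdata/mdata tags."""

    def __init__(self, n, invalidate_on_write=True):
        self.q = [I] * n                  # cache states
        self.cdata = [NODATA] * n         # cdata_i for each cache C_i
        self.mdata = FRESH                # memory holds the fresh copy initially
        self.invalidate_on_write = invalidate_on_write  # False injects a bug

    def _holders(self, i):
        """Caches other than C_i that currently hold a copy."""
        return [j for j, s in enumerate(self.q) if j != i and s != I]

    def _load(self, i):
        """Data transfer on a miss, following the Read/Write Miss rules."""
        others = self._holders(i)
        dirty = [j for j in others if self.q[j] == D]
        src = dirty or others             # prefer the dirty copy, else any copy
        self.cdata[i] = self.cdata[src[0]] if src else self.mdata
        return others, dirty

    def read(self, i):
        if self.q[i] != I:
            return                        # read hit: nothing changes
        others, dirty = self._load(i)
        if dirty:
            self.mdata = self.cdata[dirty[0]]   # the dirty copy updates memory
        if others:                        # sharing detected: all copies Shared
            for j in others:
                self.q[j] = S
            self.q[i] = S
        else:
            self.q[i] = VE

    def write(self, i):
        if self.q[i] == I:
            self._load(i)                 # write miss: obtain a copy first
        if self.q[i] != D:                # C_i becomes the only fresh copy
            for j, d in enumerate(self.cdata):
                if j != i and d != NODATA:
                    self.cdata[j] = OBSOLETE
                    if self.invalidate_on_write:
                        self.q[j] = I
            self.mdata = OBSOLETE
        self.cdata[i] = FRESH
        self.q[i] = D

    def replace(self, i):
        if self.q[i] == D:
            self.mdata = self.cdata[i]    # write back the dirty copy
        self.cdata[i] = NODATA
        self.q[i] = I

    def consistent(self):
        """No cache that can be read holds an obsolete value."""
        return all(s == I or d != OBSOLETE for s, d in zip(self.q, self.cdata))

def random_run(n, steps, seed, **kwargs):
    """Apply random events and report whether consistency always held."""
    rng = random.Random(seed)
    m = IllinoisModel(n, **kwargs)
    for _ in range(steps):
        getattr(m, rng.choice(("read", "write", "replace")))(rng.randrange(n))
        if not m.consistent():
            return False
    return True
```

With two caches, the sequence read(0), read(1), write(0) leaves cache 1 Invalid in the default model, whereas with `invalidate_on_write=False` cache 1 stays Shared with an obsolete value and `consistent()` returns False. Random runs like these only sample behaviors; the expansion procedures discussed next explore them all.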
Reachability Analysis -State Space Expansion
Since the reachability graph is constructed over all global states and since value assignments to auxiliary variables M are irrelevant to state transitions, we concentrate now on the transitions between global states alone.
Exhaustive Enumeration of the State Space
To verify the protocol completely at the finite automaton level, all state transitions must be exhaustively simulated. Conventionally, an exhaustive search algorithm such as the one shown in Figure 2 is used to explore the system state space. Since the state space is enumerated explicitly, the number of caches must be fixed; the state space is then finite, because the numbers of state symbols and of cache events are also finite. In a system with n caches, $|Q| = m$ state symbols and $|\Sigma_G| = k$ cache events, the system state space contains at most $m^n$ states (an extreme case, because some states are not reachable). However, the number of state visits in the expansion process is far larger than $m^n$: for each state in the working list, we must generate all of its directly reachable states, although some of them may have been visited previously. Without any pruning, roughly $k \cdot m^n$ state visits are needed to complete the expansion in the worst case. If connectivity information recording the path from a given state to each reached state is also stored, the limited memory capacity quickly becomes a problem. The state space grows exponentially with the complexity of the protocol and the number of entities in the validation model. A quantitative analysis of this technique is given in [16].
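A minimal Python sketch of such an exhaustive expansion, assuming the atomic Illinois transitions described earlier (a read miss loads Valid-Exclusive or Shared according to F, an exclusive or dirty copy becomes Shared when another cache reads, any write leaves the writer Dirty and all other caches Invalid, and a replacement invalidates the local copy). It is a breadth-first search over explicit state tuples in the spirit of Figure 2, not a reproduction of it; `next_states`, `permissible` and `enumerate_states` are hypothetical names.

```python
from collections import deque

I, VE, S, D = "Invalid", "Valid-Exclusive", "Shared", "Dirty"

def next_states(state):
    """Global states directly reachable from `state` by one atomic Illinois event."""
    n, succ = len(state), set()
    for i, q in enumerate(state):
        others_copy = any(s != I for j, s in enumerate(state) if j != i)
        if q != I:                        # read hit: no change
            succ.add(state)
        elif others_copy:                 # read miss, F = 1: all copies -> Shared
            succ.add(tuple(S if (j == i or s != I) else I for j, s in enumerate(state)))
        else:                             # read miss, F = 0: Valid-Exclusive
            succ.add(tuple(VE if j == i else s for j, s in enumerate(state)))
        succ.add(tuple(D if j == i else I for j in range(n)))             # write
        succ.add(tuple(I if j == i else s for j, s in enumerate(state)))  # replacement
    return succ

def permissible(state):
    """Valid-Exclusive and Dirty copies must be the only cached copy."""
    exclusive = sum(s in (VE, D) for s in state)
    copies = sum(s != I for s in state)
    return exclusive == 0 or copies == 1

def enumerate_states(n):
    """Exhaustive breadth-first expansion from the all-Invalid state."""
    start = (I,) * n
    visited, work, visits = {start}, deque([start]), 0
    while work:
        state = work.popleft()
        if not permissible(state):
            raise AssertionError(f"erroneous state reached: {state}")
        for nxt in next_states(state):
            visits += 1                   # count each distinct successor generated
            if nxt not in visited:
                visited.add(nxt)
                work.append(nxt)
    return visited, visits

if __name__ == "__main__":
    for n in range(1, 7):
        reached, visits = enumerate_states(n)
        print(n, len(reached), visits, any(S in s for s in reached))
```

With n = 1 no Shared state is ever generated, so a one-cache model silently misses part of the protocol; for n ≥ 2 this model reaches 2n + 2^n global states, and every explicit run must be repeated for each system size of interest.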
Pruning the State Space by Counting Equivalence
To keep the state space manageable, pruning of redundant states is necessary. Two system states $(q_1, q_2, \ldots, q_n)$ and $(s_1, s_2, \ldots, s_n)$ are strictly equivalent if and only if $q_i = s_i$ for all $1 \le i \le n$, where $q_i, s_i \in Q$. This strict equivalence relation is certainly too conservative. As mentioned before, the behavior of all cache entities is characterized by a common FSM with deterministic transition functions. All n! permutations of a state $(q_1, q_2, \ldots, q_n)$ are therefore equivalent in the validation process, because the order of the tuple is irrelevant [9, 18]. Exploiting symmetry to reduce the complexity of a verification procedure is not new.
Based on this system symmetry, Ip and Dill [17, 18] have implemented a symmetric version of Murϕ and applied it to the verification of cache protocols. A similar approach has been taken by Clarke et al. [9] and Emerson and Sistla [14]. In their approaches, a canonical state S is selected to represent its orbit θ(S), which is the set of states symmetrically equivalent to S.
Studies have shown that exploiting symmetry to reduce the complexity of a verification procedure is a first step in the right direction, but it is not enough to permit the reliable verification of large-scale systems [26, 27, 28]. For a system of n processors, the reduction of the state space is limited to a factor of at most n! when the system symmetry is exploited. Fortunately, equivalence classes can be further broadened by taking into account a unique feature of cache protocols: we do not have to keep track of the exact number of cached copies to check for protocol correctness. This extension of state equivalence relations beyond the strict counting equivalence of Definition 5 was first proposed in [25].
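The gap between counting equivalence and this coarser equivalence can be measured directly. The Python sketch below is our own illustration (`count_vector`, `abstract` and `class_counts` are hypothetical names): it takes one representative per counting-equivalence class over all assignments of four cache states to n caches, then keeps only whether each state holds 0, 1 or several caches.

```python
from collections import Counter
from itertools import combinations_with_replacement

STATES = ("Invalid", "Valid-Exclusive", "Shared", "Dirty")

def count_vector(s):
    """Counting equivalence: only the number of caches in each state matters."""
    c = Counter(s)
    return tuple(c[q] for q in STATES)

def abstract(s):
    """Record only whether each state holds 0, 1 or several (>= 2) caches."""
    return tuple(min(k, 2) for k in count_vector(s))

def class_counts(n):
    """(#counting-equivalence classes, #classes after the 0/1/several abstraction)."""
    reps = list(combinations_with_replacement(STATES, n))  # one per count vector
    return len(reps), len({abstract(s) for s in reps})

if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, *class_counts(n))
```

For n = 8 and n = 16 it reports 165 and 969 counting classes but 65 abstract classes in both cases: once n ≥ 8 the abstract count no longer depends on n, while the symmetry-reduced count keeps growing. These figures cover all assignments, not only reachable states.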
Symbolic Expansion
The major difficulties of exhaustive enumeration techniques are the state space explosion problem, the large amount of memory required to manipulate the state information, and the inefficiency of testing for the convergence of the state expansion process.
Searching through the history and working lists against the new state becomes intolerable for large state spaces. The number of visited states which can be maintained is also limited by the memory size. Another technical problem is the fact that the validation is done for a fixed number of caches. It is not clear at first that a protocol correct for a system with n caches would also be correct for a system with n' caches, n ≠ n'. The verification procedure should deal with any n in order to validate the protocol for any system.
Composite State
The number of caches in a particular cache state plays an important role in judging whether or not a system state is permissible. For example, several caches in the Dirty state signal a data inconsistency. Similarly, if a cache is in the Shared state, its local copy is clean and possibly present in other caches; in theory, an arbitrary number of caches could hold clean copies without affecting protocol correctness. In all these cases, the actual number of copies is not important. What is critical in all protocols is whether there are 0, 1 or several copies in a given state. These possibilities can be represented by the following set of repetition constructors.²
2. Note that if a protocol were ever invented in which two dirty copies are permissible, the methodology would still be applicable, provided that this new possibility is added to the list of repetition constructors.
Definition 6 (Repetition Constructors)
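The body of Definition 6 is not reproduced here. Judging from the ordering 1 < + < * and 0 < * given in the discussion of information ordering, and from the remark that only 0, 1 or several copies matter, one natural reading is that each constructor denotes a range of cache counts: 0 (none), 1 (exactly one), + (one or more), * (zero or more). The Python sketch below encodes that assumed reading and checks whether a concrete global state, with any number of caches, belongs to the family of a composite state; `RANGE`, `allows`, `represents` and `dirty_family` are hypothetical names.

```python
from collections import Counter

# Assumed meaning of each constructor: the range of cache counts it allows.
INF = float("inf")
RANGE = {"0": (0, 0), "1": (1, 1), "+": (1, INF), "*": (0, INF)}

def allows(rc, k):
    """True if exactly k caches in a state are compatible with constructor rc."""
    lo, hi = RANGE[rc]
    return lo <= k <= hi

def represents(composite, concrete):
    """True if the concrete global state belongs to the family of `composite`.

    composite: dict cache state -> constructor (absent states mean "0")
    concrete:  tuple of per-cache states, for any number of caches
    """
    counts = Counter(concrete)
    return all(allows(composite.get(q, "0"), counts[q])
               for q in set(composite) | set(counts))

# The composite state (Dirty, Invalid*): one Dirty cache, any number of Invalid ones.
dirty_family = {"Dirty": "1", "Invalid": "*"}
```

For instance, `represents(dirty_family, ("Invalid", "Dirty", "Invalid"))` is True, while a state with two Dirty caches or with a Shared cache is rejected, regardless of how many caches the system has.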
Computing Values of the Sharing-Detection Function on Composite States
Because composite states are defined over the states of an arbitrary number of caches, the value of the sharing-detection function F must be computed accordingly. Given a composite state $S = (q_1^{r_1}, q_2^{r_2}, \ldots, q_n^{r_n})$, we compute F as follows:
1. If none of the q_i's indicates that cached copies exist, all caches observe no data sharing.
3. In fact, the generated state is (Shared, Invalid*), which represents a superset containing (Shared, Invalid+). Nevertheless, for the sake of illustrating how the sharing-detection function is computed in this example, we assume that state (Shared, Invalid+) is generated.
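Only the first rule for F survives in the text above. As an assumption of ours, not the paper's remaining rules, F can be evaluated as a three-valued function on a composite state: after taking the requesting cache out of its class, F = 1 if some copy-holding class certainly contains a cache (constructor 1 or +), F = 0 if no copy-holding class can contain one, and undecided if copies can exist only in classes marked *, in which case the composite state has to be split into both cases. The constructor reading (0: none, 1: exactly one, +: one or more, *: zero or more) is also assumed; `sharing_detection`, `TAKE_ONE` and `illinois_copy` are hypothetical names.

```python
# Assumed constructor semantics: "0" none, "1" exactly one, "+" one or more, "*" zero or more.
TAKE_ONE = {"1": "0", "+": "*", "*": "*"}   # class constructor after the requester leaves it

def sharing_detection(composite, requester, holds_copy):
    """Value of F observed by one cache of class `requester` in `composite`.

    Returns 1 (another copy surely exists), 0 (no other copy can exist) or
    None (undecided: the composite state must be split into both cases).
    """
    rest = dict(composite)
    rest[requester] = TAKE_ONE[rest[requester]]
    sure = any(holds_copy(q) and rc in ("1", "+") for q, rc in rest.items())
    maybe = any(holds_copy(q) and rc == "*" for q, rc in rest.items())
    if sure:
        return 1
    return None if maybe else 0

def illinois_copy(q):
    """In the Illinois protocol every non-Invalid state holds a copy."""
    return q != "Invalid"
```

For (Invalid+), an Invalid requester sees F = 0, in line with rule 1; for (Shared+, Invalid*) it sees F = 1; for (Shared*, Invalid+) the result is undecided, so an expansion would have to consider both outcomes.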
Information Ordering and Pruning
Repetition constructors can be ordered by the set of possible states they specify.
The resulting order is 1 < + < * and 0 < *. We also write $q^1 < q^+ < q^*$ and $q^0 < q^*$, where $q \in Q$. $q^+$ reveals that one cache is in state q (which is always permissible) or that multiple caches are in state q (which may indicate a data inconsistency condition).
Definition 10 (Essential State) Composite state S is essential if and only if there does not exist a composite state $S' \neq S$ such that $S \subseteq_F S'$.
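Reading each constructor as a range of cache counts (0: none, 1: exactly one, +: one or more, *: zero or more; an assumption on our part), r1 ≤ r2 holds exactly when every count allowed by r1 is also allowed by r2, which reproduces 1 < + < * and 0 < *. The Python sketch below (hypothetical names `leq`, `covered`, `essential`) checks structural covering and keeps only the essential states of a list. It ignores F, so it corresponds to the case F = null of Corollary 1; with a sharing-detection function, ⊆_F additionally requires equal values of F.

```python
INF = float("inf")
RANGE = {"0": (0, 0), "1": (1, 1), "+": (1, INF), "*": (0, INF)}  # assumed semantics

def leq(r1, r2):
    """r1 <= r2: every cache count allowed by r1 is allowed by r2."""
    (lo1, hi1), (lo2, hi2) = RANGE[r1], RANGE[r2]
    return lo2 <= lo1 and hi1 <= hi2

def covered(s1, s2):
    """Structural covering of composite states (dict: cache state -> constructor)."""
    return all(leq(s1.get(q, "0"), s2.get(q, "0")) for q in set(s1) | set(s2))

def essential(states):
    """Drop every state covered by another one; of two equal states keep the first."""
    keep = []
    for i, s in enumerate(states):
        dominated = any(
            j != i and covered(s, t) and (not covered(t, s) or j < i)
            for j, t in enumerate(states))
        if not dominated:
            keep.append(s)
    return keep
```

Given (Invalid+), (Dirty, Invalid+) and (Dirty, Invalid*), `essential` keeps (Invalid+) and (Dirty, Invalid*): the middle state is contained in the last one, while (Invalid+) is not, since 0 ≤ 1 does not hold in this order.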
At the end of the expansion process, the state space is simply decomposed into several families (which may be overlapping) represented by essential composite states as shown in Figure 3 . Readers should be aware of the fact that the generation of all essential states is successful only when the verified protocol is correct. If the protocol is incorrect, expanding error states which lead to unpredictable states is practically meaningless. We assume that the state expansion process terminates whenever a protocol error is detected. 
Rules and Algorithm for the Expansion Process
We need to define the set of operations applicable to composite states in the state generation process. In the following, '/' signifies "or" selection. During the state expansion process, the next state is produced by simulating the current state, exploring all possible cache transitions, and repeatedly applying the above rules. Specifically, a state expansion step has two phases. During the first phase, a new composite state is derived from the current state by firing the Coincident, One-step or N-steps transition rules. In the second phase, the Aggregation rule is applied to lump together caches in the same state.
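The transition and aggregation rules themselves are not reproduced above. Assuming each constructor stands for a range of cache counts (0: none, 1: exactly one, +: one or more, *: zero or more), the Aggregation rule can be sketched as adding the count ranges of the classes that end up in the same cache state and widening the result to the most precise constructor that contains it. `widen` and `aggregate` are hypothetical names; this is our reading, not the paper's rule table.

```python
INF = float("inf")
RANGE = {"0": (0, 0), "1": (1, 1), "+": (1, INF), "*": (0, INF)}  # assumed semantics

def widen(lo, hi):
    """Most precise constructor whose count range contains [lo, hi]."""
    return next(rc for rc in ("0", "1", "+", "*")
                if RANGE[rc][0] <= lo and hi <= RANGE[rc][1])

def aggregate(pairs):
    """Aggregation: lump (cache state, constructor) pairs that share a cache state."""
    acc = {}
    for q, rc in pairs:
        lo, hi = acc.get(q, (0, 0))
        acc[q] = (lo + RANGE[rc][0], hi + RANGE[rc][1])   # add the count ranges
    merged = {q: widen(lo, hi) for q, (lo, hi) in acc.items()}
    return {q: rc for q, rc in merged.items() if rc != "0"}  # drop empty classes
```

Merging 1 with 1 yields +, because the exact count 2 cannot be expressed; this deliberate loss of precision is what keeps the number of composite states bounded.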
Before formalizing the algorithm for symbolic state expansion and protocol verification, we first prove, as promised, the monotonicity of the expansion process. 
Lemma 1 The aggregation process is monotonic
$S_1 = (q_1^{r_1}, q_2^{r_2}, \ldots, q_{i-1}^{r_{i-1}}, q_i^{r_i}, q_{i+1}^{r_{i+1}}, \ldots, q_n^{r_n})$, $S_2 = (q_1^{r_1}, q_2^{r_2}, \ldots, q_{i-1}^{r_{i-1}}, q_i^{1/+/*}, q_{i+1}^{r_{i+1}}, \ldots, q_n^{r_n})$
Lemma 3 The claim $S_1 \le S_2$ holds if $S_1 \subseteq_F S_2$, that is, $r_j \le r'_j$ for all $j$ and $F(S_1) = F(S_2)$.
Proof: The result extends the conclusion of Lemma 1, and the proof is similar.
Proof: Because F is null, the transition functions depend only on the local cache state and the intended operation. By Definition 9, the containment relation $\subseteq_{null}$ is then characterized by the structural covering relation ≤ alone. As a result, the claim follows by induction from Lemma 3.
Corollary 1 demonstrates that the symbolic expansion process is monotonic for protocols whose behavior does not depend on any characteristic function.
Therefore, during the expansion process, $S_1$ can be discarded, because all successors originating from $S_1$ are contained in the states generated by expanding $S_2$. Corollary 2 states that the symbolic expansion process is also monotonic for protocols depending on a sharing-detection function. Proof: See Appendix A.1.
Corollary 2 If F is the sharing-detection function and S
The preceding results suggest a very efficient expansion process shown in Figure 4 to obtain essential states. Two lists keep track of non-expanded and visited states. At each step, a new state is produced by expanding the current state, and then a pruning process justified by the monotonicity property removes contained states. The final output reported in list H is the set of essential states. All possible states are included in the reported essential states, as we now show.
Theorem 5
The essential composite states generated by the algorithm of Figure 4 are complete: they symbolically characterize all states that can be produced by an exhaustive expansion process such as the algorithm of Figure 2.
Proof:
The exact number of caches is explicitly specified in a validation model based on enumeration, whereas the symbolic approach employs canonical forms to represent states. Consider states u and v (derived from u by transition τ) in the enumeration approach, and composite states s and t (derived from s by transition τ) in symbolic form, such that s symbolically represents u. Then t also represents v, because the generation of t from s applies the same transition functions and accumulates the same information as the expansion of u into v. By induction on the length of the transition sequence from the initial state, every state produced by exhaustive enumeration is therefore represented by some generated composite state, and hence by an essential state.
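The two-list expansion of Figure 4 can be tried end to end on the Illinois protocol with a hypothetical symbolic rule set of our own: an acting cache leaves its class, a write makes it Dirty and sends every other cache to Invalid, a replacement makes it Invalid, and a read miss consults F, splitting undecided cases. Constructors are read as count ranges (0: none, 1: exactly one, +: one or more, *: zero or more), pruning uses structural covering (sound here because undecided values of F are split into separate successors), and the lists W and H become `work` and `kept`. All names are ours, and the resulting states are what this rule set yields, not necessarily the essential states reported for Illinois in the paper.

```python
INF = float("inf")
RANGE = {"0": (0, 0), "1": (1, 1), "+": (1, INF), "*": (0, INF)}  # assumed semantics
TAKE_ONE = {"1": "0", "+": "*", "*": "*"}      # class after one cache leaves it
I, VE, S, D = "Invalid", "Valid-Exclusive", "Shared", "Dirty"
COPY = (VE, S, D)                              # states holding a copy

def widen(lo, hi):
    return next(r for r in ("0", "1", "+", "*") if RANGE[r][0] <= lo and hi <= RANGE[r][1])

def aggregate(pairs):
    acc = {}
    for q, r in pairs:
        lo, hi = acc.get(q, (0, 0))
        acc[q] = (lo + RANGE[r][0], hi + RANGE[r][1])
    merged = {q: widen(lo, hi) for q, (lo, hi) in acc.items()}
    return {q: r for q, r in merged.items() if r != "0"}

def leq(r1, r2):
    return RANGE[r2][0] <= RANGE[r1][0] and RANGE[r1][1] <= RANGE[r2][1]

def covered(s, t):
    return all(leq(s.get(q, "0"), t.get(q, "0")) for q in set(s) | set(t))

def is_error(s):
    """Valid-Exclusive/Dirty must be a single copy with no other cached copy."""
    excl = [q for q in (VE, D) if q in s]
    copies = [q for q in COPY if q in s]
    return any(s[q] != "1" for q in excl) or (bool(excl) and len(copies) > 1)

def successors(s):
    """Symbolic Illinois successors of composite state s (our own rule set)."""
    out = []
    for q, r in s.items():
        others = dict(s)
        others[q] = TAKE_ONE[r]                 # the acting cache leaves its class
        out.append(aggregate([(D, "1")] + [(I, rr) for rr in others.values()]))  # write
        out.append(aggregate([(I, "1")] + list(others.items())))                 # replace
        if q != I:
            continue                            # read hit: no change
        if any(p in COPY and rr in ("1", "+") for p, rr in others.items()):
            cases = [others]                    # F = 1 for sure
        else:                                   # F = 0, plus one F = 1 case per "*" class
            maybe = [p for p, rr in others.items() if p in COPY and rr == "*"]
            f0 = [(p, rr) for p, rr in others.items() if p not in maybe]
            out.append(aggregate([(VE, "1")] + f0))
            cases = [{**others, p: "+"} for p in maybe]
        for c in cases:                         # F = 1: every copy becomes Shared
            out.append(aggregate([(S, "1")] + [(S if p in COPY else p, rr) for p, rr in c.items()]))
    return out

def symbolic_expand(initial, successors, covered, is_error):
    """Worklist expansion that prunes contained states (after Figure 4)."""
    work, kept = [initial], [initial]           # W: still to expand, H: kept states
    while work:
        s = work.pop()
        for t in successors(s):
            if is_error(t):
                raise AssertionError(f"erroneous state: {t}")
            if any(covered(t, h) for h in kept):
                continue                        # t adds nothing new
            kept = [h for h in kept if not covered(h, t)]
            work = [w for w in work if not covered(w, t)]
            kept.append(t)
            work.append(t)
    return kept

if __name__ == "__main__":
    for e in symbolic_expand({I: "+"}, successors, covered, is_error):
        print(e)
```

Started from (Invalid+), this rule set stops with four essential states, (Dirty, Invalid*), (Valid-Exclusive, Invalid*), (Shared+, Invalid*) and (Shared*, Invalid+), without any reference to the number of caches. The last two families overlap, and the initial state is absorbed by the last one.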
Uniqueness of Essential States and Accumulation of State Information
In addition to the property of monotonicity, the SSM method has two other interesting properties: the uniqueness of essential states and the accumulation of state information.
Proof: This theorem means that the set of essential states defines a fixpoint at which the state expansion process terminates. By Theorem 5, ES represents all possible configurations that the system can reach. Therefore, S must be contained in at least one $S_e$ in ES.
Because the symbolic state expansion is monotonic, all states derived from S are contained in states derived from $S_e$. When the state transition graph of ES is strongly connected, there is at least one path from $S_e$ to every other essential state. Hence it is impossible to reach from S an essential state that does not belong to ES. ∎
It is not difficult to see that Theorem 6 is invalid when the state graph is not strongly connected. Consider the simple case in which the state graph consists of two subgraphs, G1 and G2, that are individually strongly connected, with paths from G1 to G2 but not vice versa. If the state expansion process starts from a state that is contained in states of G2 but not in states of G1, then only subgraph G2 is produced. Because the essential states cover all possible states that can be generated by classical state enumeration methods, this property also holds for traditional state enumeration methods. To generate the entire state graph, the state expansion must start with a state in G1.
The protocol designers cannot determine whether the state graph is strongly connected in advance. It is, however, always safe to start the state expansion process with an initial state in which all caches are empty because this is usually the state when the system is turned on. If subgraphs containing sink states are detected, we can isolate the subgraphs and analyze them.
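Once the expansion from the all-empty state has produced a state graph, checking whether it is strongly connected takes only two graph searches, one forward and one backward from any node. The Python sketch below uses hypothetical names (`reachable`, `strongly_connected`); its example graph mirrors the two-subgraph case discussed above, with G1 = {a, b}, G2 = {c, d} and a single edge from G1 to G2.

```python
from collections import defaultdict

def reachable(graph, start):
    """All nodes reachable from `start` (graph: dict node -> list of successors)."""
    seen, stack = {start}, [start]
    while stack:
        for v in graph.get(stack.pop(), ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def strongly_connected(graph):
    """Forward and backward searches from one node must both reach every node."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    if not nodes:
        return True
    root = next(iter(nodes))
    reverse = defaultdict(list)
    for u, vs in graph.items():
        for v in vs:
            reverse[v].append(u)
    return reachable(graph, root) == nodes and reachable(reverse, root) == nodes

if __name__ == "__main__":
    g = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}   # G1 -> G2 only
    print(strongly_connected(g), reachable(g, "c"))
```

Here `strongly_connected(g)` is False and a search started in G2 never reaches G1, which is exactly the situation in which a designer would isolate the sink subgraph for separate analysis.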
Accumulation of state information is a major strength of the SSM method over other approaches. Consider the state transition caused by a write miss in the verification of the Illinois protocol:
Initially, no processor has a copy of the block. When a processor causes a write miss, its cache receives an exclusive, dirty copy and all other caches remain in the Invalid state.
To reach the resulting state (Dirty, Inv*), a traditional state enumeration method would need to model at least two caches; in general, it is difficult to predict how many caches a model needs in order to reach all possible states of a protocol. The SSM method eliminates this uncertainty, since it verifies a protocol model independently of the system size or the number of processors.
Verification of the Illinois Protocol
We demonstrate the symbolic expansion by applying the algorithm of Figure 4 to the Illinois protocol. The other four protocols described in [1] are verified in Appendix B.
We start the expansion process in the initial state (Invalid+), in which no cache has a copy of the block. … after the refinement.
Systems with Non-Atomic Accesses
In this section, we consider a shared-bus snooping protocol to demonstrate the applicability of the methodology to verify systems with non-atomic protocol transitions.
We verify the design of a snooping protocol for a split-transaction (packet-switched) bus.
Although this protocol is simple, it is adequate for demonstration purposes, and it suggests that the methodology can be applied to more complex protocols. Figure 6 depicts the state transition diagram of a snooping protocol for a split-transaction bus. When a cache needs to load data from main memory or from the cache with a dirty copy, a packet is sent on the bus, the bus is released, and the missing block is supplied later. Between the initial request and the return of the block, other, independent protocol transactions can take place. Such non-atomic transactions are modeled by including more implementation details and adding more transient states to the state machine model, as shown in Figure 6. It is always possible to introduce such additional states to the state model. In this verification process, we are not just modeling the protocol but also some details of its implementation [12]. This protocol is basically similar to the protocols in [1, 22], and it also has the ability to prevent concurrent requests to the same memory block through interprocessor interlocks.
A Protocol for a Split-Transaction Bus
• Aggressive snarfing.
We first tried a very aggressive protocol, which always allows a read miss to propagate on the bus; bus interlocks prevent the propagation of a write request on the bus only if a request is already pending for the same block. When multiple misses for the same block are pending, main memory or the current owner responds with the data block and a null destination field, causing all caches in the RP state to grab the copy and share it. Unfortunately, this intuitively correct design leads to a data-inconsistency state after 31 expansion steps. The error trace is:
The problem arises because caches with read misses are allowed to receive data sent in response to a preceding cache write miss.
• Snarfing on read miss data.
To correct the flaw in the first protocol, the cache in the WP state rejects all concurrent requests to the same block, which means that no read miss can propagate on the bus if a write request is pending; however, multiple read misses can propagate concurrently.
After fixing the first design error, we ran the protocol model and discovered another flaw after 27 steps:
In the above event sequence, the caches in the RP state load obsolete data from main memory rather than the up-to-date data from the current Owner.
• Correct Design.
The flaw in the second design suggests that the responding cache in the XMem state should inhibit main memory if any concurrent read request appears on the bus. With this addition to the protocol, the verification of the protocol model runs without detecting any error and reports the global state transition diagram shown in Figure 7 after 28 steps. In states S2 and S4, an interlock must be set on the bus to avoid multiple concurrent […]. In all cases, the verification procedure took only a few seconds of computing time.
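The request-admission rules of the three designs can be summarized as a small predicate. This is a hypothetical sketch: the `Design` names, the function `may_propagate` and the list of pending request kinds are illustrative and not taken from the paper's model; it assumes the write-request interlock of the first design is kept unchanged in the later ones. The memory-inhibit fix of the correct design is not modeled, because it concerns who supplies the data rather than which requests are admitted.

```python
from enum import Enum

class Design(Enum):
    AGGRESSIVE = 1   # "aggressive snarfing"
    SNARF_READS = 2  # "snarfing on read miss data"
    CORRECT = 3      # SNARF_READS interlock plus memory inhibit (not modeled here)

def may_propagate(design: Design, kind: str, pending: list[str]) -> bool:
    """May a new request of `kind` ("read" or "write") for a block go on the
    bus, given the kinds of requests already pending for that block?"""
    if kind == "write":
        # A write request waits while any request for the block is pending
        # (assumed identical in all three designs).
        return not pending
    if design is Design.AGGRESSIVE:
        # Read misses always propagate, even behind a pending write miss.
        return True
    # Later designs: the cache in WP rejects concurrent requests, but several
    # read misses may still be pending at the same time.
    return "write" not in pending

# A write miss is pending: may a read miss for the same block follow it?
for d in Design:
    print(d.name, may_propagate(d, "read", ["write"]))
```

Only the aggressive design admits the read miss here, which is the situation the first error trace exposes: a read-miss cache can receive data sent in response to a preceding write miss.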
Conclusion
We have presented an overview of the method based on Symbolic State Models (SSM) for validating cache coherence protocols at the behavioral level. We have also shown how to extend the method to verify a protocol with non-atomic transitions at a lower level of abstraction. By exploiting equivalence relations among global states, we can symbolically represent and generate the system state space rather than enumerate it. The global transition diagram built upon the symbolic essential states not only facilitates the verification of data consistency but also specifies the global behavior of the protocol for any number of processors. This description is much more concise and precise than previous approaches that rely on the specification of each cache FSM. Such a representation highlights the similarities and disparities among protocols at the system level. […] protocols for hierarchically organized machines. Second, the verification is independent of the number of caches in the system and is therefore totally reliable. This unique feature is not shared by other approaches.
We did not find any consistency problem in the five protocols that we have examined under atomic transitions. This was somewhat expected. The protocols are relatively simple and have been time-tested. The example with non-atomic protocol transitions is much more complex and demonstrates that the verification technique can detect subtle protocol errors.
Currently, a fully automated verification system has been implemented, which includes a high-level description language and a mechanical verifier. The tool has been successfully applied to a wide range of cache protocols, including central directory-based protocols, the S3.mp (Sun's Scalable Shared-memory MultiProcessor) distributed directory-based protocol, and delayed consistency protocols developed for systems with relaxed memory models [26], [27], [28].
Consider the values of F returned in the successor composite state S1.
(a) F(S1) = v1. This can happen if the transition τ is repeatedly applied to S1 on caches in a state q, with q ≠ Invalid, and S1 has the structure (Invalid+/*, q+). S2 must then have the same structure. Because the transition τ removes all cached copies in S1 and has the same effect when applied to S2, F(S2) = v1.
(b) F(S1) = v2. This can happen if a transition τ applied to S1 removes all cached copies except the one in the cache that originates τ. Because S1 ⊆_F S2, τ must have the same effect when applied to S2, and hence F(S2) = v2.
(c) F(S1) = v3. In this case, S1 carries the information that two or more cached copies exist. Since S1 ≤ S2, that is, S1 is structurally covered by S2, two or more cached copies also exist in S2, so F(S2) = v3.
From the above, we conclude that F(S1) = F(S2) in all cases.
The global state transition diagram for the Berkeley protocol is shown in Figure 9. A block in the Shared-Dirty state in one cache can coexist with multiple Valid copies in other caches.
Fig. 9. The Berkeley protocol after 33 state visits.
B.3 The Firefly Protocol
Figure 10 shows the global state transition diagram for the Firefly protocol. The Firefly protocol is a write-broadcast protocol with dynamic sharing detection. The Exclusive state indicates that no other cache has a copy and that writes no longer need to be broadcast. A write-back occurs when a dirty copy is selected for replacement.
B.4 The Dragon Protocol
Figure 11 shows the output of our algorithm for the Dragon protocol. The Dragon protocol is also a write-broadcast protocol. It differs from the Firefly protocol in that it supports a Shared-Dirty state and updates to shared blocks are not immediately reflected in main memory.
The cache that performed the latest write to the shared block is in the Shared-Dirty state and is responsible for supplying the block on misses in remote caches and for updating main memory on replacement. A Shared-Dirty copy is not consistent with the main-memory copy, as is apparent from the table of Figure 11. The Berkeley and the […] We also notice that the Exclusive state, as defined in the Illinois protocol, is included to save a broadcast message upon writing to an exclusive and clean copy.
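To make the Firefly/Dragon contrast concrete, the sketch below applies a write hit on a shared block under each policy. The `Block` record, the function names, and the state names Shared-Clean (for Dragon's non-owner copies) and Dirty (for a Dragon writer with no sharers) are assumptions for illustration; the logic follows the descriptions above. Firefly updates the other copies and main memory, and dynamic sharing detection decides between Shared and Exclusive. Dragon updates only the other caches and makes the latest writer the Shared-Dirty owner.

```python
from dataclasses import dataclass

@dataclass
class Block:
    states: list[str]   # per-cache state of the block ("Invalid" = no copy)
    values: list        # per-cache data copy (None when Invalid)
    memory: int         # main-memory copy

def firefly_shared_write(b: Block, writer: int, value: int) -> None:
    """Firefly-style write hit on a shared block: broadcast the new value to
    every cached copy and update main memory as well."""
    for i, s in enumerate(b.states):
        if s != "Invalid":
            b.values[i] = value
    b.memory = value
    # Dynamic sharing detection: with no other copy left, stop broadcasting.
    sharers = any(s != "Invalid" for i, s in enumerate(b.states) if i != writer)
    b.states[writer] = "Shared" if sharers else "Exclusive"

def dragon_shared_write(b: Block, writer: int, value: int) -> None:
    """Dragon-style write hit on a shared block: broadcast to the cached copies
    only; main memory keeps its old, now stale, value."""
    sharers = False
    for i, s in enumerate(b.states):
        if s == "Invalid":
            continue
        b.values[i] = value
        if i != writer:
            b.states[i] = "Shared-Clean"   # any previous owner gives up ownership
            sharers = True
    # The latest writer owns the block and must write it back on replacement.
    b.states[writer] = "Shared-Dirty" if sharers else "Dirty"

fire = Block(["Shared", "Shared", "Invalid"], [5, 5, None], memory=5)
firefly_shared_write(fire, 0, 7)
drag = Block(["Shared-Clean", "Shared-Clean", "Invalid"], [5, 5, None], memory=5)
dragon_shared_write(drag, 0, 7)
print(fire.memory, drag.memory)   # 7 5: only Dragon leaves main memory stale
```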
