Abstract. In shared-memory multiprocessors, sequential consistency offers a natural tradeoff between the flexibility afforded to the implementor and the complexity of the programmer's view of the memory. Sequential consistency requires that some interleaving of the local temporal orders of read/write events at different processors be a trace of serial memory. We develop a systematic methodology for proving sequential consistency for memory systems with three parameters: number of processors, number of memory locations, and number of data values. From the definition of sequential consistency, it suffices to construct a non-interfering observer that watches and reorders read/write events so that a trace of serial memory is obtained. While in general such an observer must be unbounded even for fixed values of the parameters (checking sequential consistency is undecidable!), we show that for two paradigmatic protocol classes, lazy caching and snoopy cache coherence, there exist finite-state observers. In these cases, sequential consistency for fixed parameter values can thus be checked by language inclusion between finite automata.
Introduction
Shared-memory multiprocessors are an important class of supercomputing systems. In recent years a number of such systems have been designed in both academia and industry. The design of a correct and efficient shared memory is one of the most difficult tasks in the design of such systems. The shared-memory interface is a contract between the designer and the programmer of the multiprocessor. In general, there is a tradeoff between the ease of programming and the flexibility of shared-memory semantics necessary for an efficient implementation. Not surprisingly, a number of abstract shared-memory models have been developed.
All abstract memory models can be understood in terms of the fundamental serial-memory model. A serial memory behaves as if there is a centralized memory that services read and write requests atomically, such that a read to a location returns the latest value written to that location. Coherence requires that the global temporal order of events (reads and writes) at different processors be a trace of serial memory. Sequential consistency [Lam79] ignores the global temporal order and requires only that some interleaving of the local temporal orders of events at different processors be a trace of serial memory. Although sequential consistency is a strictly weaker property than coherence, the absence of a synchronizing global clock between the different processors in a multiprocessor makes a sequentially consistent memory indistinguishable from a serial memory. Compared to coherence, sequential consistency clearly offers more flexibility for an efficient implementation; yet most real systems that claim to be sequentially consistent actually end up implementing coherence. In an effort to get more flexibility for implementation, memory models that relax the local temporal order of events at each processor have been developed in recent years. This has been achieved at the cost of complicating the programmer's interface. These memory models, such as weak ordering, partial store ordering, total store ordering, and release consistency [AG96], relax the processor order of events in different ways and provide fence or synchronization operations across which sequentially consistent behavior is guaranteed.
We focus on the verification of sequential consistency for two reasons. First, the interface provided by sequential consistency is clear, easy to understand, and widely believed to be the correct tradeoff between implementation flexibility and complexity of the programmer's view of shared memory. In fact, there is a school of thought [Hil98] that considers the performance gains achieved by relaxed semantics not worth the added complexity of the programmer's interface and advocates sequential consistency as the shared-memory interface for future multiprocessors. Second, even relaxed memory models have fence operations across which sequentially consistent behavior should be observed. Hence, the techniques developed in this paper will also be useful for their verification.
High-level descriptions of shared-memory systems are typically parameterized by the number n of processors, the number m of memory locations, and the number v of data values that can be written in a memory location. A parameterized memory system consists of a central-control part C and a processor part P. Both C and P are functions that take values for m and v and return a finite-state process. An instantiation of the system containing n processors, m memory locations, and v data values is constructed by composing C(m, v) with n copies of P(m, v). We would like to verify sequential consistency for all values of the parameters. However, sequential consistency is not a local property: correctness for n processors (m locations, v values) cannot be deduced by reasoning about individual processors (locations, values). The following observations about real shared-memory systems, which we assume in our modeling, are crucial for our results. We assume that the memory system is monotonic and symmetric with respect to both the set of locations and the set of data values. Monotonicity in locations means that every run of the system projected onto a subset of locations is a run of the system with just that subset of locations. Monotonicity in data values means that a sequence $\sigma$ is a run of the system with some set of possible data values if and only if it is a run of the system with a larger set of data values. Symmetry in locations means that, if $\sigma$ is a run of the memory system and $\lambda$ is a permutation on the set of locations, then $\lambda(\sigma)$ is also a run of the memory system. Finally, symmetry in data values means that, if $\sigma$ is a run of the memory system and $\delta$ is any function from data values to data values, then $\delta(\sigma)$ is also a run of the memory system. Even for fixed values of the parameters, checking whether a memory system is sequentially consistent is undecidable [AMP96].
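To make the symmetry and monotonicity assumptions concrete, the following Python sketch applies a location permutation $\lambda$ and a data-value function $\delta$ to a run, and projects a run onto a subset of locations. It is a minimal sketch under stated assumptions: a run is modeled only by its read/write events, encoded with a hypothetical `Op(kind, proc, loc, val)` tuple; the other actions of a real memory system (such as bus messages) are omitted.

```python
from typing import Callable, Dict, List, NamedTuple, Set


class Op(NamedTuple):
    """Hypothetical encoding of a read/write event <<kind, proc>, loc, val>."""
    kind: str  # "R" or "W"
    proc: int  # processor index, 1..n
    loc: int   # memory location, 0..m-1
    val: int   # data value, 0..v-1


Run = List[Op]


def permute_locations(run: Run, lam: Dict[int, int]) -> Run:
    """lambda(sigma): rename every location through the permutation lam."""
    if sorted(lam) != sorted(lam.values()):
        raise ValueError("lam must be a permutation of its domain")
    return [op._replace(loc=lam[op.loc]) for op in run]


def map_values(run: Run, delta: Callable[[int], int]) -> Run:
    """delta(sigma): apply an arbitrary function to every data value."""
    return [op._replace(val=delta(op.val)) for op in run]


def project_locations(run: Run, subset: Set[int]) -> Run:
    """Keep only the events on locations in subset (as in monotonicity)."""
    return [op for op in run if op.loc in subset]
```

Checking the assumptions themselves would require the set of all runs of the system; these helpers only show what the transformed runs look like.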
The main reason for the problem being undecidable is that the specification of sequential consistency allows a processor to read the value at a location after an unbounded number of succeeding writes to that location by other processors. In real systems, finite resources such as buffers and queues bound the number of writes that can be pending. To establish sequential consistency, it is sufficient to construct a witness that observes the reads and writes occurring in the system (without interfering with it) and reorders them, while preserving the order of events in each processor, such that a trace of serial memory is obtained. We call such a witness an observer. If a finite-state observer exists, then it can be composed with a fixed-parameter instantiation of the memory system, and the problem of deciding sequential consistency is reduced to a language-containment check between two finite-state automata, which can be discharged by model checking. In the concrete examples we have looked at (see below), we have indeed seen that a finite-state observer exists for fixed values of the parameters.
However, our goal is to verify sequential consistency for arbitrary values of the parameters. Towards this end, we first develop a novel inductive proof framework for proving sequential consistency for any number n of processors, given fixed m and v. Inductive proofs on parameterized systems [KM89] use an implementation preorder and show the existence of a process invariant such that the composition of the invariant with an additional process is smaller than the process invariant in the preorder. The preorders typically used, for instance trace containment and simulation, preserve the temporal sequence of events. Since we check a sufficient condition for sequential consistency by means of an observer that reorders the read/write events of the processors in the system, preorders that preserve the temporal sequence of events do not suffice for our purpose. Our inductive proof strategy first determines a process invariant $I_1$ of the memory system with respect to the trace-containment preorder, to get a finite-state abstraction that can generate all sequences of observable actions for any number of processes. We then find a merge invariant $I_2$ such that (1) the single-process memory system containing $I_2$ is sequentially consistent, and (2) there is an observer that maps every run $\sigma$ of $I_2 \parallel P$ that can be produced in an environment of $I_1$ to a run $\sigma'$ of $I_2$, such that the read/write events in $\sigma'$ are an interleaving of the read/write events of $I_2$ and $P$ in $\sigma$, and the traces obtained from $\sigma$ and $\sigma'$ are identical. Given a run $\sigma$ of the memory system with $n > 1$ processors, we use the observer to create a run $\sigma'$ of the memory system with $n - 1$ processors, such that $\sigma$ and $\sigma'$ are identical when projected to the events of the first $n - 2$ processors, and the read/write events of the $(n-1)$-st processor in $\sigma'$ are an interleaving of the read/write events of the $(n-1)$-st and $n$-th processors in $\sigma$.
By repeating this step $n - 1$ times, we generate a run of the memory system with a single processor, which is sequentially consistent by the base case of the induction.
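The following sketch illustrates one inductive step on a concrete run, using the simplest conceivable witness: one that keeps the temporal order and merely relabels the events of processor $n$ as events of processor $n - 1$. This is an illustrative assumption, not the general construction. The hypothetical `merge_last_two` corresponds to a witness that needs no reordering, whereas a protocol such as lazy caching requires a witness that reorders events.

```python
from typing import List, NamedTuple


class Op(NamedTuple):
    kind: str  # "R" or "W"
    proc: int  # processor index, 1..n
    loc: int
    val: int


def merge_last_two(run: List[Op], n: int) -> List[Op]:
    """Fold processor n into processor n - 1, keeping the temporal order.

    Events of processors 1..n-2 are untouched, and the events now labeled
    n - 1 are an interleaving of the old events of processors n - 1 and n.
    """
    if n < 2:
        raise ValueError("need at least two processors")
    return [op._replace(proc=n - 1) if op.proc == n else op for op in run]


def collapse(run: List[Op], n: int) -> List[Op]:
    """Apply the step n - 1 times, ending with a single-processor run."""
    for k in range(n, 1, -1):
        run = merge_last_two(run, k)
    return run
```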
The induction demonstrates sequential consistency for any number of processors, but only for fixed m and v. We would like sufficient conditions under which using fixed values for m and v lets us conclude sequential consistency for all m and v. To that end, we impose three requirements on the process and merge invariants. The first two requirements, symmetry and monotonicity on memory locations, are identical to the corresponding assumptions on the memory system. The third requirement is called location independence. A process is location independent if it has the property that a sequence of events $\sigma$ is a run of the process with m locations if the m sequences obtained by projecting $\sigma$ onto individual memory locations are runs of the process with a single location. We show that if the two invariants satisfy location symmetry, location monotonicity, and location independence, and the observer is location and data independent, then it suffices to do the induction for three memory locations and two data values. As a result, the correctness of the memory system can be proved by discharging two finite-state lemmas using a model checker: one that proves the correctness of the process invariant, and another that proves the correctness of the merge invariant.
Our proof framework can be applied to a variety of protocols; in particular, all cache-coherence protocols described in [AB86] fall into its domain. We demonstrate the method by verifying two example protocols: lazy caching [ABM93] and a snoopy cache-coherence protocol [HP96]. The correctness of lazy caching has been established before by manual proofs [ABM93, Gra94, LLOR99]. The correctness of the snoopy cache-coherence protocol is argued informally in [HP96]. Finite-state observers exist for both these examples. In both cases, the proof of a parameterized system was reduced to finite-state lemmas in the way described above, and discharged by our model checker MOCHA [AHM+98]. Manual effort was required to construct the process and merge invariants and the observer, and to verify that the assumptions on the memory system and the requirements on the invariants and observer are indeed satisfied.
Related work. We use process induction for the verification of an abstract memory model. We list related work along two axes: work that verifies abstract memory models, and work that verifies systems with an arbitrary number of processes. MS91,CGH
Parameterized memory systems
A parameterized memory system M has three parameters: the number n of processors, the number m of memory locations, and the number v of data values. The parameterized memory system M is built from two parameterized I/O-processes C and P, which have two parameters: the number m of memory locations and the number v of data values. Intuitively, the I/O-process P represents a single processor in the system and C represents a central controller. The I/O-process M(n, m, v) is built from the I/O-processes C(m, v) and P(m, v) by composing C(m, v) and n copies of P(m, v). Given n > 0, m > 0, and v > 0, the memory system M(n, m, v) is an I/O-process that has processors numbered $1, \ldots, n$, memory locations numbered $0, \ldots, m-1$, and data values numbered $0, \ldots, v-1$.

We now formally define a parameterized memory system. Let $\mathbb{N}$ be the set of all non-negative integers. For any k > 0, let $\mathbb{N}_k$ denote the set of all non-negative integers less than k. A parameterized memory system is a pair $\langle C, P \rangle$ such that both C and P are functions that map $(\mathbb{N} \setminus \{0\}) \times (\mathbb{N} \setminus \{0\})$ to I/O-processes, such that for all m > 0 and v > 0, we have that $\mathit{Priv}(C(m, v)) = \ldots$ and $\mathit{ObsNames}_P$ are finite sets that satisfy the following properties:
1. $\mathit{PrivNames}_C \cap \mathit{ObsNames}_C = \emptyset$, and $\mathit{PrivNames}_P \cap \mathit{ObsNames}_P = \emptyset$.
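2. $\mathit{PrivNames}_C \cap (\mathit{ObsNames}_P \cup (\mathit{PrivNames}_P \times \mathbb{N})) = \emptyset$, and $\mathit{ObsNames}_C \cap (\mathit{PrivNames}_P \times \mathbb{N}) = \emptyset$.
3. $R \in \mathit{PrivNames}_P$ and $W \in \mathit{PrivNames}_P$.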
The functions name, loc, and val are defined on $\mathit{Act}(C(m, v)) \cup \mathit{Act}(P(m, v))$, and extract respectively the first, second, and third components of the actions. Given some m and v, let RdWr(m, v) be the union of the set of read actions $\{\langle R, j, k \rangle \mid j < m \text{ and } k < v\}$ and the set of write actions $\{\langle W, j, k \rangle \mid j < m \text{ and } k < v\}$.
For all m and v and for all k > 0, let $P_k(m, v)$ denote the I/O-process that is obtained from P(m, v) by renaming every private action a to action a', such that (1) name(a') is the pair $\langle \mathit{name}(a), k \rangle$, (2) loc(a') = loc(a), and (3) val(a') = val(a). A parameterized memory system defines a function that maps $\mathbb{N} \times (\mathbb{N} \setminus \{0\}) \times (\mathbb{N} \setminus \{0\})$ to I/O-processes as follows:

$$M(n, m, v) = C(m, v) \parallel P_1(m, v) \parallel \cdots \parallel P_n(m, v).$$
For particular n, m, v, we say that M(n, m, v) is a memory system. Note that M(n, m, v) is compatible with $P_{n+1}(m, v)$, due to the renaming of private actions in $P_{n+1}(m, v)$ and the conditions on the names of private and observable actions of C and P described above. The observable actions of M(n, m, v) are the same for all n > 0. We define a function proc on the set of actions $\bigcup_{k, m, v} \mathit{Priv}(P_k(m, v))$ such that if a is a private action of $P_k(m, v)$, then proc(a) = k.
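A small Python sketch of the renaming that produces $P_k(m, v)$ and of the function proc. For brevity it assumes, hypothetically, that the private actions of P(m, v) are exactly RdWr(m, v), and it encodes an action as a `(name, loc, val)` tuple. The point it illustrates is that the renamed copies have pairwise disjoint private actions, which is what makes the composition of C(m, v) with the copies well defined.

```python
from typing import FrozenSet, Tuple

Action = Tuple[object, int, int]  # hypothetical encoding: (name, loc, val)


def rdwr(m: int, v: int) -> FrozenSet[Action]:
    """RdWr(m, v): read actions <R, j, k> and write actions <W, j, k>."""
    return frozenset((name, j, k)
                     for name in ("R", "W") for j in range(m) for k in range(v))


def rename_private(priv: FrozenSet[Action], k: int) -> FrozenSet[Action]:
    """Private actions of P_k(m, v): name(a') = <name(a), k>, same loc and val."""
    return frozenset(((name, k), loc, val) for (name, loc, val) in priv)


def proc(action: Action) -> int:
    """proc(a) = k for a private action a of P_k(m, v)."""
    name, _loc, _val = action
    return name[1]
```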
Sequential consistency
Let Memop(n, m, v) be the union of the sets $\{\langle \langle R, i \rangle, j, k \rangle \mid 0 < i \le n,\ j < m,\ k < v\}$ and $\{\langle \langle W, i \rangle, j, k \rangle \mid 0 < i \le n,\ j < m,\ k < v\}$. Thus Memop(n, m, v) denotes the set of read and write operations of M(n, m, v). The functions name, loc, and val, which were originally defined on actions of P(m, v) and C(m, v), can be defined analogously on actions of M(n, m, v). Thus, the four functions name, loc, val, and proc are defined on all members of Memop(n, m, v). We use Memop to denote the set $\bigcup_{n, m, v} \mathit{Memop}(n, m, v)$. Let $\sigma = \sigma_1 \sigma_2 \ldots \sigma_k$ be a sequence in $\mathit{Memop}^*$, the set of finite sequences with elements from Memop. The abstraction of $\sigma$, denoted by $\alpha(\sigma)$, is a labeled directed graph $\langle V, E, L \rangle$, where V is a finite set of vertices, $E \subseteq V \times V$, and L is a function from V to Memop(n, m, v), such that (1) $V = \{1, 2, \ldots, k\}$, (2) for all $i \in V$, we have that $L(i) = \sigma_i$, and (3) for all $x, y \in V$, we have that $\langle x, y \rangle \in E$ iff $\mathit{proc}(L(x)) = \mathit{proc}(L(y))$ and $x < y$. We observe that for every sequence $\sigma \in \mathit{Memop}^*$, the abstraction $\alpha(\sigma)$ is an acyclic graph. Thus, we can obtain total orderings of the vertices in $\alpha(\sigma)$ that respect the dependencies specified by its edges. Since the edges form only a partial order, several such total orders, each yielding a linearization of $\sigma$, may exist. Formally, a one-to-one mapping $f : V \to V$ is a total order of $\alpha(\sigma) = \langle V, E, L \rangle$ if for all $x, y \in V$, whenever $\langle x, y \rangle \in E$ we have that $f^{-1}(x) < f^{-1}(y)$. If f is a total order of $\alpha(\sigma)$, then the sequence $L(f(1))\, L(f(2)) \ldots L(f(|V|))$ of actions in Memop(n, m, v) is a linearization of $\sigma$.
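The abstraction and its linearizations can be enumerated directly for very short sequences. The sketch below uses a hypothetical `Op(kind, proc, loc, val)` encoding of Memop actions and numbers positions from 1 as in the text. It tries every permutation and is therefore exponential, so it is meant only to illustrate the definitions.

```python
from itertools import permutations
from typing import Iterator, List, NamedTuple, Set, Tuple


class Op(NamedTuple):
    kind: str  # "R" or "W"
    proc: int
    loc: int
    val: int


def abstraction_edges(sigma: List[Op]) -> Set[Tuple[int, int]]:
    """Edges of alpha(sigma): <x, y> iff same processor and x < y (1-based)."""
    k = len(sigma)
    return {(x, y) for x in range(1, k + 1) for y in range(x + 1, k + 1)
            if sigma[x - 1].proc == sigma[y - 1].proc}


def linearizations(sigma: List[Op]) -> Iterator[List[Op]]:
    """Yield L(f(1)) ... L(f(|V|)) for every total order f of alpha(sigma)."""
    edges = abstraction_edges(sigma)
    k = len(sigma)
    for f in permutations(range(1, k + 1)):  # f[pos - 1] = f(pos)
        inv = {vertex: pos for pos, vertex in enumerate(f, start=1)}  # f^-1
        if all(inv[x] < inv[y] for (x, y) in edges):
            yield [sigma[vertex - 1] for vertex in f]
```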
We are interested in defining which sequences from $\mathit{Memop}^*$ are serial. Intuitively, a sequence from $\mathit{Memop}^*$ is serial if it can be produced by a serial memory, where each read from a location returns the value written by the last write to that location. We state this formally below. Let $\sigma = \sigma_1 \sigma_2 \ldots \sigma_k$ be a sequence in $\mathit{Memop}^*$. We define $\mathit{lastwrite}_\sigma$ as a function that associates with each position i in $\sigma$ the position j in $\sigma$ where the most recent write to the location $\mathit{loc}(\sigma_i)$ was done. Formally, $\mathit{lastwrite}_\sigma$ is a mapping from the set $\{1, 2, \ldots, k\}$ to $\{1, 2, \ldots, k\} \cup \{\bot\}$ such that $\mathit{lastwrite}_\sigma(i) = j$ if there exists a j such that $j \le i$, $\mathit{loc}(\sigma_i) = \mathit{loc}(\sigma_j)$, $\mathit{name}(\sigma_j) = \langle W, n_1 \rangle$ for some $n_1$, and there does not exist any $j'$ with $j < j' \le i$, $\mathit{name}(\sigma_{j'}) = \langle W, n_2 \rangle$ for some $n_2$, and $\mathit{loc}(\sigma_{j'}) = \mathit{loc}(\sigma_i)$; otherwise $\mathit{lastwrite}_\sigma(i) = \bot$. The sequence $\sigma$ is serial if for all $i \le k$, if $\mathit{lastwrite}_\sigma(i) \ne \bot$, then $\mathit{val}(\sigma_i) = \mathit{val}(\sigma_{\mathit{lastwrite}_\sigma(i)})$.
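A direct transcription of lastwrite and of the serial predicate, under the same hypothetical `Op` encoding, with `None` standing for ⊥ and positions numbered from 1. As in the definition, a read with no earlier write to its location is unconstrained, and a write is its own last write.

```python
from typing import List, NamedTuple, Optional


class Op(NamedTuple):
    kind: str  # "R" or "W"
    proc: int
    loc: int
    val: int


def lastwrite(sigma: List[Op], i: int) -> Optional[int]:
    """Position j <= i of the latest write to loc(sigma_i), or None for bottom."""
    target = sigma[i - 1].loc
    for j in range(i, 0, -1):
        op = sigma[j - 1]
        if op.kind == "W" and op.loc == target:
            return j
    return None


def is_serial(sigma: List[Op]) -> bool:
    """sigma is serial iff each event carries the value of its last write."""
    for i in range(1, len(sigma) + 1):
        j = lastwrite(sigma, i)
        if j is not None and sigma[i - 1].val != sigma[j - 1].val:
            return False
    return True
```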
Assumptions on parameterized memory systems
In order to reduce the proof of sequential consistency of the parameterized memory system to finite-state model-checking obligations, we make some assumptions about memory systems. We first state a few additional definitions.
Let $\sigma$ be a run of the memory system M(n, m, v). We denote by $\sigma|_j$ the run $\sigma$ restricted to the j-th memory location. Formally, we have $\sigma|_j = \sigma[\Sigma]$, where $\Sigma = \{a \mid a \in \mathit{Act}(M(n, m, v)) \text{ and } \mathit{loc}(a) = j\}$. For j > 0, we denote by $\sigma|_{<j}$ the run $\sigma$ restricted to memory locations numbered less than j; that is, $\sigma|_{<j} = \sigma[\Sigma]$, where $\Sigma = \{a \mid a \in \mathit{Act}(M(n, m, v)) \text{ and } \mathit{loc}(a) < j\}$.

Assumption 1 (Location symmetry) Let $\lambda : \mathbb{N}_m \to \mathbb{N}_m$ be a permutation function on the set of memory locations. Extend $\lambda$ to actions, extended actions and extended action sequences in the natural way. Then, …

Note that the function $\lambda$ in assumption 1 above is a permutation on the set of locations, whereas the function $\delta$ in assumption 3 could be any arbitrary function on the set of data values. Let $\theta_0$ be the function from $\mathit{Act}(M(n, m, v))$ to $\mathit{Act}(M(n, 1, v))$ that changes the location attribute to 0. Formally, $\theta_0(a) = a'$ such that name(a') = name(a), loc(a') = 0, and val(a') = val(a). We extend $\theta_0$ to extended action sequences in the natural way. The observer $\Omega$ is location independent if for all j, we have that $\Omega(\theta_0(\sigma|_j)) = \theta_0(\Omega(\sigma)|_j)$. The observer $\Omega$ is data independent if for every function $\delta : \mathbb{N}_v \cup \{\bot\} \to \mathbb{N}_v \cup \{\bot\}$ such that $\delta(x) = \bot$ iff $x = \bot$, we have that $\Omega(\delta(\sigma)) = \delta(\Omega(\sigma))$.

Proposition 1. Suppose the parameterized memory system M satisfies assumptions 1–4. For all n > 0, the following two statements are equivalent:
1. There is a location and data independent serializer for M(n, n, 2).
2. There is a location and data independent serializer for M(n, m, v) for all m > 0 and v > 0.
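The projection $\sigma|_j$, the map $\theta_0$, and the location-independence condition can be tested on individual runs. The sketch below is hypothetical: runs are lists of read/write events in an assumed `Op` encoding, the observer is any Python function from runs to runs, and `location_independent_on` only checks the defining equation on the given run, so it can refute location independence but never prove it.

```python
from typing import Callable, List, NamedTuple


class Op(NamedTuple):
    kind: str  # "R" or "W"
    proc: int
    loc: int
    val: int


Run = List[Op]


def restrict(run: Run, j: int) -> Run:
    """sigma|_j: the run restricted to memory location j."""
    return [op for op in run if op.loc == j]


def theta0(run: Run) -> Run:
    """theta_0: set every location to 0, keeping name and value."""
    return [op._replace(loc=0) for op in run]


def location_independent_on(observer: Callable[[Run], Run], run: Run, m: int) -> bool:
    """Check Omega(theta_0(sigma|_j)) == theta_0(Omega(sigma)|_j) for all j < m."""
    return all(observer(theta0(restrict(run, j))) == theta0(restrict(observer(run), j))
               for j in range(m))
```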
Suppose we fix the number of processors to n. Due to the above proposition, it suffices to consider only n locations and 2 data values, provided the serializer we design is location and data independent. Since our objective is to prove sequential consistency for an arbitrary number of processors, we give a method based on induction over the number of processors. The inductive step in the method considers two processors and designs a serializer-like function for them. Then an argument similar to the one used in proving Proposition 1 lets us show that it is enough to perform the inductive step for fixed numbers of memory locations and data values. If the parameterized memory system M is sequentially consistent, then by our definition there exists an observer $\Omega$ for M such that for every sequence $\sigma$ of memory operations of M(n, m, v), the function $\Omega$ produces a rearranged sequence $\sigma'$ such that (1) $\sigma'$ is serial, and (2) $\sigma$ and $\sigma'$ agree on the ordering of the memory operations of each individual processor. We wish to provide an inductive construction that produces such an observer for arbitrary n. The construction uses the notion of a generalized processor called a merge invariant, and a witness function that works like an observer for a two-processor system consisting of the merge invariant and P(m, v).
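For a single finite sequence, the existence of a rearranged sequence $\sigma'$ with properties (1) and (2) can be decided by searching over interleavings of the per-processor subsequences, building $\sigma'$ while tracking the value last written to each location. The sketch below is an illustration only: it is exponential in the worst case and is not the finite-state observer construction of this paper. It assumes a hypothetical `Op` encoding and treats a read with no earlier write as unconstrained, matching the definition of serial sequences.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Op(NamedTuple):
    kind: str  # "R" or "W"
    proc: int
    loc: int
    val: int


def find_serial_reordering(sigma: List[Op]) -> Optional[List[Op]]:
    """Return a serial sigma' preserving each processor's order, or None."""
    queues: Dict[int, List[Op]] = {}
    for op in sigma:
        queues.setdefault(op.proc, []).append(op)
    procs = sorted(queues)

    def dfs(pos: Tuple[int, ...], mem: Dict[int, int],
            out: List[Op]) -> Optional[List[Op]]:
        if all(pos[i] == len(queues[p]) for i, p in enumerate(procs)):
            return list(out)  # every event has been placed
        for i, p in enumerate(procs):
            if pos[i] == len(queues[p]):
                continue  # this processor has no events left
            op = queues[p][pos[i]]
            if op.kind == "R" and op.loc in mem and mem[op.loc] != op.val:
                continue  # the read would not be serial at this point
            new_mem = dict(mem)
            if op.kind == "W":
                new_mem[op.loc] = op.val
            nxt = pos[:i] + (pos[i] + 1,) + pos[i + 1:]
            out.append(op)
            result = dfs(nxt, new_mem, out)
            out.pop()
            if result is not None:
                return result
        return None

    return dfs(tuple(0 for _ in procs), {}, [])
```

The first test sequence below is the classic store-buffering pattern: no interleaving that preserves each processor's order is serial, so no observer can map it to a serial sequence.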
Recall that RdWr(m, v) is the set of private actions of P(m, v) that represent read and write operations. For technical reasons, we want the memory operations of the merge invariant to be named differently than those of P(m, v). Let
Reduction to a fixed number of memory locations
In this section, we use assumptions 1 and 2 on the parameterized memory system. Further, we impose requirements on the process and merge invariants and the merging function that reduce the verification problem to one on a fixed number of memory locations. The first two requirements are identical to assumptions 1 and 2 on the parameterized memory system.
Reduction to a fixed number of data values
In this section, we assume that the memory system satisfies assumptions 3 and 4. Recall the definition of a data-independent observer.

Theorem 3. Let M be a parameterized memory system satisfying assumptions 3 and 4. For all n > 0, m > 0, and v > 0, if $\Omega$ is a data-independent observer for the memory system M(n, m, v), then $\Omega$ is a serializer for M(n, m, 2) iff $\Omega$ is a serializer for M(n, m, v).

Consider a merging function $\Psi$. We say that $\Psi$ is data independent if for all v, and for every function $\delta : \mathbb{N}_v \cup \{\bot\} \to \mathbb{N}_v \cup \{\bot\}$ such that $\delta(x) = \bot$ iff $x = \bot$, we have that $\Psi(\delta(\sigma)) = \delta(\Psi(\sigma))$. Suppose that the witness $\Psi$ for $B_2^{I_2, I_1}(m, v)$ is data independent. Then the implicit observer function that is produced for M(n, m, v) as a result of n applications of the witness is also data independent.

Corollary 1. Let M be a parameterized memory system satisfying assumptions 1–4. Let $I_1$ be a possible process invariant and let $I_2$ be a possible merge invariant for M satisfying requirements 1–3. Let $\Psi$ be a location and data independent merging function. Suppose $A^{I_1}(1, 2)$ and $B_1^{I_2}(1, 2)$ are true, and $\Psi$ is a witness for $B_2^{I_2, I_1}(3, 2)$. Then M(n, m, v) is sequentially consistent for all n > 0, m > 0, and v > 0; that is, M is sequentially consistent.
Two Applications: Lazy Caching and Snoopy Coherence
We show how the theory developed in the previous section can be used to verify sequential consistency of memory systems with an arbitrary number of processors, locations and data values using a model checker. We consider two specific memory protocols, namely the lazy caching protocol from [ABM93] and a snoopy cache-coherence protocol from [HP96].
For each of these protocols, we first argue that assumptions 1–4 are satisfied by the memory system, and that requirements 1–3 are satisfied by the process and merge invariants. Then, we design a witness and argue that it is location and data independent. The following observations provide justification for our informal arguments:
- The invariants and the witness have the property that they never base their decisions on data values. Thus, they are data independent by design.
- The memory system inherently enforces a total order on the writes to every location. In fact, every memory system we know of has this property. Our merge witness respects this total order for every location. Let M be a parameterized memory system and let $\Psi$ be a merging function. Let $\sigma$ be a run of M and let j be any location. The order of writes in $\Psi(\sigma)|_j$ is the same as the total order of writes to location j in $\sigma$. Every read to a location reads the value written to that location by some earlier write. The witness also respects this causal relationship between the writes and the reads. If two reads of location j access the value written by the same write, then the witness places them in $\Psi(\sigma)|_j$ in their temporal order of occurrence. Thus, the ordering of events to a location j is independent of the events to other memory locations and is determined solely by the temporal sequence of events to location j. Hence, our witness is naturally location independent.
We finally discharge the three proof obligations of Corollary 1 using our model checker MOCHA.

Lazy Caching. The lazy caching protocol allows a processor to complete a write in its local cache and proceed even while other processors continue to access the "old" value of the data in their local caches. Each cache has an output queue in which writes are buffered and an input queue in which updates are buffered. In order to satisfy sequential consistency, some restrictions are placed on read accesses to a cache when either writes or updates are buffered. A complete description of the protocol can be found in [ABM93].
The I/O-process C(m, v) for this protocol is the trivial process with a single state that accepts all input actions. The I/O-process P(m, v) is a description of one processor and cache in the system. The set Priv(P(m, v)) has actions with three different names: read, write, and update. An update action occurs when a write gets updated to a local cache from the head of its input queue. Across the n processors, there is one private action for each combination of these names with processors, locations, and data values, for a total of $3nmv$ private actions. The set Obs(P(m, v)) has actions with one name: serialize. A serialize action occurs when a processor takes the result of a local write from the head of its output queue and transmits it on the bus. The serialize action does not identify the processor that performed it. Thus, a processor has $mv$ different observable actions.
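The action counts above can be reproduced by enumerating the action sets. In this sketch the action names follow the text, while the tuple encoding and the function names `lazy_private` and `lazy_observable` are hypothetical. The private names of the k-th processor are renamed to $\langle \mathit{name}, k \rangle$, as in the definition of $P_k(m, v)$.

```python
from typing import Set, Tuple

Action = Tuple[object, int, int]  # hypothetical encoding: (name, location, value)


def lazy_private(m: int, v: int, k: int) -> Set[Action]:
    """Private read/write/update actions of the renamed copy P_k(m, v)."""
    return {((name, k), j, d) for name in ("read", "write", "update")
            for j in range(m) for d in range(v)}


def lazy_observable(m: int, v: int) -> Set[Action]:
    """Observable serialize actions; they do not identify the processor."""
    return {("serialize", j, d) for j in range(m) for d in range(v)}
```

For example, with n = 3, m = 2 and v = 2, the union of the private actions of the three copies has 3 · 3 · 2 · 2 = 36 elements, and each processor shares the same 4 observable actions.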
The process invariant $I_1$ is such that for all m and v, the I/O-process $I_1(m, v)$ simply generates all possible sequences of serialize actions. It is trivial to see that $I_1$ is a process invariant. The merge invariant $I_2$ is exactly the same as P. The merging function is non-trivial. It queues write actions and delays them until the corresponding update action has been seen by all processors. It also delays read actions until the corresponding write has been made visible. The witness preserves processor order, never bases decisions on data values, and respects the total order of writes that is inherent to the lazy-caching protocol. By design, the witness is location and data independent.

Snoopy Cache Coherence. The snoopy coherence protocol has a bus on which all caches send messages, as well as "snoop" and react to messages. For each location, every cache has a state machine that is in one of three states: read-shared, write-exclusive, or invalid. If a location is in read-shared state, then a cache has permission to read the value. If a location is in write-exclusive state, then a cache has permission to both read and write the value. In order to transition to read-shared or write-exclusive states, the cache sends messages over the bus, and other caches respond to these messages. There is also a central memory attached to the bus. When a location is not held in write-exclusive state by any cache, the memory owns that location and responds to read requests for that location.
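The per-location cache states can be sketched as a small state machine. The state names and the request/response message names come from the text; the transition rules are an assumption in the style of a standard MSI protocol (another cache's write-request invalidates the local copy, and another cache's read-request downgrades write-exclusive to read-shared), since the exact rules of the protocol in [HP96] are not given here.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    READ_SHARED = "read-shared"
    WRITE_EXCLUSIVE = "write-exclusive"


def can_read(s: State) -> bool:
    """Read permission: read-shared or write-exclusive."""
    return s in (State.READ_SHARED, State.WRITE_EXCLUSIVE)


def can_write(s: State) -> bool:
    """Write permission: write-exclusive only."""
    return s is State.WRITE_EXCLUSIVE


def on_own_response(s: State, msg: str) -> State:
    """Assumed upgrade after this cache's own request has been answered."""
    if msg == "read-response" and s is State.INVALID:
        return State.READ_SHARED
    if msg == "write-response":
        return State.WRITE_EXCLUSIVE
    return s


def on_snoop(s: State, msg: str) -> State:
    """Assumed reaction to another cache's request seen on the bus."""
    if msg == "write-request":
        return State.INVALID
    if msg == "read-request" and s is State.WRITE_EXCLUSIVE:
        return State.READ_SHARED
    return s
```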
The I/O-process C(m, v) for this protocol models the central memory, and P(m, v) models one processor with a local cache. The process C(m, v) has no private actions. It has observable actions with four different names: read-request, write-request, read-response, and write-response. The process P(m, v) has private actions with two different names, read and write, and the same set of observable actions as C(m, v). None of the observable actions identify the processor that performed the action.
The process invariant is such that for all m and v, $I_1(m, v)$ is a generalization of the processor P(m, v). The processor is generalized so that it can send a read-request for a location even if it already has the location in read-shared, and a write-request even if it already has the location in write-exclusive. The merge invariant $I_2$ is identical to $I_1$ with the additional capability to execute private read and write actions. The merging function preserves the temporal order of occurrence of reads and writes. This simple witness works because the snoopy protocol implements coherence. Again, by design the witness is data and location independent. We used MOCHA to verify that $I_1(1, v)$ is a process invariant and $I_2(3, v)$ is a merge invariant. MOCHA required less than 15 minutes to check these.
