The 'Test model-checking' approach to the verification of formal memory models of multiprocessors by Gopalakrishnan, Ganesh & Nalumansu, Ratan
The 'Test Model-checking' Approach to the Verification of Formal Memory Models of Multiprocessors *
Ratan Nalumasu, Rajnish Ghughal, Abdel Mokkedem and Ganesh Gopalakrishnan
UUCS-98-008
Department of Computer Science, University of Utah,
Salt Lake City, UT 84112-9205
Contact email: {ratan, ganesh}@cs.utah.edu
This technical report combines work reported in CAV 98 and SPAA 98.
Abstract
We offer a solution to the problem of verifying formal memory models of processors by combining the strengths of model-checking and a formal testing procedure for parallel machines. We characterize the formal basis for abstracting the tests into test automata and associated memory rule safety properties whose violations pinpoint the ordering rule being violated. Our experimental results on Verilog models of a commercial split transaction bus demonstrate the ability of our method to effectively debug design models during early stages of their development.
Keywords: Formal memory models, shared memory multiprocessors, formal testing, model-checking.
1 Introduction
The fundamentally important problem [AG96] of verifying whether a given memory system model (or "a memory system") provides a formal memory model (or "memory model") appears in a number of guises. CPU designers are interested in knowing whether some of the aggressive execution techniques such as speculative issue of memory operations violate sequential consistency; I/O bus designers are interested in knowing the exact semantics of shared accesses provided by split I/O transactions [Cor97]; even language designers of multi-threaded languages such as Java that support shared updates [GJS96] are interested in this problem. Formal verification methods are ideally suited for this problem because: (i) the semantics of memory orderings are too subtle to be fathomed through informal reasoning alone; (ii) ad hoc testing methods cannot provide assurance that the desired memory model has been implemented. Unfortunately, despite the central importance of this problem and the large body of formal methods research in this area, there is still no single formally based method that the designer of a realistic multiprocessor system can use on his/her detailed design model to quickly find violations in the design. In this paper we describe such a method called test model-checking.
Test model-checking formally adapts to the realm of model-checking a formally based architectural testing method called Archtest. Archtest has been successfully used on a number of commercial multiprocessors [Col] by running a suite of test-programs on them. Archtest is an incomplete testing method in that it does not, under all circumstances, detect violations of memory orderings [Col92]. Nevertheless, its tests have been shown to be incisive in practice [Col]. Most importantly, the formal theory of memory ordering rules developed by Collier in [Col92] forms the basis for Archtest, which means that whenever a violation is detected by Archtest, there is a formal line of reasoning leading back to the precise cause.
* Supported in part by ARPA Order #B990 under SPAWAR Contract #N0039-95-C-0018 (Avalanche), DARPA under contract #DABT6396C0094 (UV).
Being based on Archtest, test model-checking is also incomplete. However, none of the (presumed) complete alternatives to date have been shown to be practical for verifying large designs. For example, [PD96] involves the use of manually guided mechanical theorem proving. Even approaches based on conventional model-checking are impossibly difficult to use in practice. For example, the assertions pertaining to the sequential consistency of lazy caching [Ger95], a simple memory system, expressed in various temporal logics (by [Gra94] in ∀CTL* [CES86] and [LLOR97] in TLA [Lam94]) are highly complex. We do not believe that descriptions of this style will scale up. On the other hand, the test model-checking method has not only been able to comfortably handle the memory system defined by the symmetric multiprocessor (SMP) bus called Runway [BCS96, GGH+97] used by Hewlett-Packard in their high-end machines, but it also discovered many subtle bugs in the early Utah Runway Model (URM) that we created. Our URM includes a number of details such as split transactions, out of order transaction completions, and even an element of speculative execution. The errors we made in capturing these details could well have been made in an actual industrial context. We believe that with growing system complexity, the role of debugging methods that are effective and are formally based will only grow in significance, regardless of whether the methods are complete or not.
Test model-checking has a number of other desirable features. It involves model-checking a fixed set of safety properties for each formal memory model, which are very nearly independent of the actual memory system model being tested. This fixed nature greatly facilitates the use of test model-checking within the design cycle where debugging is most effective, design changes are frequent, and time-consuming alterations to the properties being verified following design changes would be frowned upon (test model-checking will not need such alterations). Also, the formal adaptation of the tests of Archtest made in test model-checking can be verified once and for all, thanks to the fixed set of tests used in test model-checking (we describe and argue the correctness of these abstractions later). Finally, in test model-checking, a memory model is viewed as a collection of simpler ordering rules, and for each constituent ordering rule, a specific property is tested on the memory system. We found that this significantly helps compartmentalize errors, as opposed to producing non-intuitive error traces that could result during conventional model-checking, which can be very difficult to understand for non-trivial memory systems.
Test model-checking is also a more effective debugger for memory models than Archtest in a formal sense. The tests of Archtest are straight-line programs of length k, one per node. Such programs execute on various nodes of the multiprocessor concurrently. The recommendation accompanying Archtest is that users run the tests for as large a k as is feasible, because then the chances of being scheduled according to different interleavings (by the underlying operating system, memory controller arbiter, etc.) increase. In adapting the tests of Archtest, test model-checking gives the effect of choosing k = ∞. Thus, we cover all possible schedules. The subtle bugs detected by test model-checking on realistic examples that are reported in Section 5 corroborate our intuition that test model-checking is indeed an effective debugging tool for memory models.
To reiterate, our specific contributions in this paper are: (i) the adaptation of a formal testing method for memory models to model-checking, which can be applied during the design of modern microprocessors whose memory systems are very complex; (ii) a formal characterization (accompanied by proofs) of how the tests of the testing method are abstracted and turned into a fixed set of safety properties that are then model-checked; and (iii) experimental results on three examples using the VIS model-checker, the last example being much larger than any previously reported in this context.
(C1) $\forall (a,d) \in address \times datum\ \forall i \in index : init \Rightarrow AG(enable(read_i(a,d)) \Rightarrow avail_i(a,d))$
(C2) $\forall (a,d), (a,d') \in address \times datum,\ d \ne d'\ \forall i \in index :$
$init \Rightarrow AG((avail_i(a,d) \land EF(enable(read_i(a,d)))) \Rightarrow A[\neg avail_i(a,d)\ W\ AG(\neg avail_i(a,d))])$
(C3) $\forall (a,d) \in address \times datum\ \forall i,k \in index : init \Rightarrow AG[after(write_k(a,d)) \Rightarrow AF(avail_i(a,d))]$
(S1) $\forall (a,d) \in address \times datum\ \forall i \in index :$
$init \Rightarrow AG[after(write_i(a,d)) \Rightarrow A(\neg enabled(read_i(a,d))\ W\ avail_i(a,d))]$
(S4) $\forall (a,d), (a',d') \in address \times datum,\ d \ne d'\ \forall i,k \in index :$
$init \Rightarrow A([\neg avail_i(a,d)\ W\ (avail_i(a',d') \land \neg avail_i(a,d))] \Rightarrow [\neg avail_k(a,d)\ W\ avail_k(a',d')])$
Figure 1: Part of the specification of Sequential Consistency, from [Gra94]
Related Work
In [Gra94], abstract interpretation [CC77] is employed to reduce infinite-system verification to finite ∀CTL* model-checking. They apply this technique to verify the sequential consistency of lazy caching with unbounded queues. They recognize that to get an exact characterization of sequential consistency involving only the observable event names, one needs full second order logic [Gra94]. To be able to express sequential consistency in ∀CTL*, they give a stronger characterization of sequential consistency. For this stronger characterization, the expression of sequential consistency is very complex, as shown in Figure 1 (this figure shows only part of their sequential consistency expression). A technique very similar to test model-checking was proposed in [McM93] under the section heading 'Sequential Consistency'. To give a historic perspective, our test model-checking idea originated in our attempt to answer the following two questions: (i) which memory ordering rule(s) is [McM93] really verifying? (ii) is this a general technique, i.e., can other memory ordering rules be verified in the same fashion? We still have not found a satisfactory answer to the first question because the test in [McM93] uses only one location, so it cannot be a test for sequential consistency; it could plausibly be a test for coherence, which again does not correspond to what Collier formally proves in [Col92]. One of our contributions is that we answer these questions by elaborating on the theoretical as well as practical aspects of test model-checking.
In [PD96], the authors use a method called aggregation on a distributed shared memory coherence protocol used in an experimental multiprocessor, to arrive at a simplified model of system behavior. Their technique involves manual theorem proving. The work in [HMTLB95] as well as [DPN93] is aimed at verifying that synchronization routines work correctly under various memory models, where the memory models themselves are described using finite-state operational models. They do not address the problem of establishing the memory models provided by detailed memory subsystem designs, which is our contribution. In [GK97, GK94], the authors analyze the problem of deciding whether a given set of traces is sequentially consistent. Our approach differs in two respects. First, we are interested in proving that detailed models of memory systems are correct, while they obtain traces (presumably from actual machines) and analyze them for sequential consistency. Second, our method is more useful for CPU designers as it can give feedback during early phases of the design, pinpointing which ordering rules are violated (if any).
2 Overview of Archtest
Archtest is based on the theory presented in [Col92] that formally defines and characterizes architectural rules obeyed by memory subsystems of multiprocessors. Although these rules are elemental, in realistic memory systems the rules manifest in compound form. Obeying a compound rule is tantamount to obeying all the constituent elemental rules; violating a compound rule is tantamount to violating any of the constituent elemental rules.

Initially A = 0
Process P1        Process P2
L1: A := 1;       X[1] := A;
L2: A := 2;       X[2] := A;
L3: A := 3;       X[3] := A;
...               ...
Lk: A := k;       X[k] := A;
Figure 2: Test_ROWO: Archtest test for A(CMP, RO, WO)

Each such elemental rule describes a constraint on the order in which various read and write events can occur. For read operations there is one read event per read operation. For write operations, however, there is one write event per process per write operation, which captures the effect of a write operation becoming visible to different processors at different times. Some of the elemental ordering rules are:
Rule of Computation (CMP): This is a basic rule defining how the terminal value of each operand is calculated from the initial values of the operand. Though most of the literature on memory architectures implicitly assumes this rule, we will often keep it explicit in our discussions.
Rule of Read Order (RO): For any pair of read events a and b in the same process, if a comes before b in program order then a happens before b.
Rule of Write Order (WO): For any pair of write events a and b in the same process, if a comes before b in program order then a happens before b.
Rule of Program Order (PO): For any pair of events a and b in the same process, if a comes before b in program order then a happens before b. Event a or b can be either a read or a write event. So, both RO and WO are special cases of PO. This is one of the strongest ordering rules and is essential for sequential consistency.
Rule of Write Atomicity (WA): A write operation becomes visible to all processes instantaneously. More precisely, one conceptual store S_i is associated with each processor node P_i. Then, for each write operation W, one write event W_i is defined per store S_i. WA guarantees that there is no i, j and no event e such that e is before W_i and is after W_j.
In order to check memory subsystems for a compound rule, Archtest provides a test for each compound rule along with a set of conditions to be checked for. If any of the conditions is violated, then a failure to obey the compound rule is detected.
Test_ROWO: Archtest test for A(CMP, RO, WO)
The test of Archtest for the compound rule consisting of the elemental rules CMP, RO, and WO, denoted A(CMP, RO, WO), is shown in Figure 2. Process P1 executes a sequence of write instructions (intended to check for WO), and P2 executes a sequence of read instructions (intended to check for RO). If the memory system correctly realizes A(CMP, RO, WO), then Condition 1 produces a positive outcome:
Condition 1 (Monotonic) The sequence of X values is monotonically increasing, i.e.:
$\forall i, j : 1 \le i < j \le k : X[i] \le X[j]$, or equivalently $\forall i : 1 \le i \le k-1 : X[i] \le X[i+1]$.
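As an illustration only, Condition 1 can be checked on a recorded array of read values with a few lines of Python. The function name `monotonic` and the list representation of X are assumptions made for this sketch; they are not part of Archtest.

```python
def monotonic(xs):
    """Condition 1 (Monotonic): X[i] <= X[i+1] for every adjacent pair.

    `xs` is a hypothetical Python list holding the values X[1..k]
    recorded by process P2 in the Test_ROWO program.
    """
    return all(a <= b for a, b in zip(xs, xs[1:]))


# A run in which P2 only ever sees newer values of A satisfies the condition.
ok_run = [0, 1, 1, 3]
# A run in which a later read returns an older value violates RO or WO.
bad_run = [0, 2, 1]
```

Checking adjacent pairs suffices because, as the condition states, the pairwise and adjacent formulations are equivalent.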




Initially A = B = 0
P1             P2                 P3                 P4
L1: A := 1;    LA1: U[1] := A;    LB1: X[1] := B;    L1: B := 1;
L2: A := 2;    LB1: V[1] := B;    LA1: Y[1] := A;    L2: B := 2;
...            LA2: U[2] := A;    LB2: X[2] := B;    ...
Lk: A := k;    LB2: V[2] := B;    LA2: Y[2] := A;    Lk: B := k;
               ...                ...
               LAk: U[k] := A;    LBk: X[k] := B;
               LBk: V[k] := B;    LAk: Y[k] := A;
Figure 3: Test_WA: Archtest test for A(CMP, RO, WO, WA)
Initially A = B = 0
P1                 P2
L11: A := 1;       L11: B := 1;
L12: Y[1] := B;    L12: X[1] := A;
L21: A := 2;       L21: B := 2;
L22: Y[2] := B;    L22: X[2] := A;
...                ...
Lk1: A := k;       Lk1: B := k;
Lk2: Y[k] := B;    Lk2: X[k] := A;
Figure 4: Test_PO: Archtest test for A(CMP, PO)
Test_WA: Archtest test for A(CMP, RO, WO, WA)
Test_WA, shown in Figure 3, tests for A(CMP, RO, WO, WA), with the conditions checked being: (i) the Monotonic condition (suitably modified for arrays U, V, X, Y), and (ii) Atomic, which is:
Condition 2 (Atomic) $\forall i, j : 1 \le i, j \le k : V[i] \ge X[j] \lor Y[j] \ge U[i]$.
The Atomic condition watches for the possibility that a write operation from P1 and a write operation from P4 appear to have finished in different orders to P2 and P3.
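A minimal sketch of Condition 2 in Python follows, assuming the four arrays are available as equal-length lists. The function name `atomic` and the sample runs are hypothetical and serve only to make the condition concrete.

```python
def atomic(U, V, X, Y):
    """Condition 2 (Atomic): for all i, j, V[i] >= X[j] or Y[j] >= U[i].

    U, V are the values of A and B read by P2; X, Y are the values of
    B and A read by P3 (equal-length lists, an assumption of this sketch).
    """
    return all(V[i] >= X[j] or Y[j] >= U[i]
               for i in range(len(U))
               for j in range(len(X)))


# P2 sees the write to A but not yet the write to B (U=1, V=0), while
# P3 sees the write to B but not yet the write to A (X=1, Y=0):
# the two writes appear to finish in different orders.
violating = atomic([1], [0], [1], [0])
# Both observers see the write to A first.
consistent = atomic([1], [0], [0], [1])
```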
Test_PO: Archtest test for A(CMP, PO)
Test_PO, shown in Figure 4, tests for A(CMP, PO), with the conditions checked being: (i) the Monotonic condition (suitably modified for arrays X, Y), and (ii) PO-Cross, which is:
Condition 3 (PO-Cross) $\forall i, j : 1 \le i, j \le k : (X[i] \ge j \lor Y[j] \ge i) \land (X[i] \le j \lor Y[j] \le i)$.
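Condition 3 can be sketched in the same style. The function below is illustrative only; the name `po_cross` and the use of Python lists (0-based) to hold the 1-based arrays X and Y of Figure 4 are assumptions of the sketch.

```python
def po_cross(X, Y):
    """Condition 3 (PO-Cross) over the arrays of Test_PO.

    Y[j] is the value of B read by P1 right after it wrote A := j;
    X[i] is the value of A read by P2 right after it wrote B := i.
    Lists are 0-based, so X[i - 1] stands for the paper's X[i].
    """
    k = len(X)
    return all((X[i - 1] >= j or Y[j - 1] >= i) and
               (X[i - 1] <= j or Y[j - 1] <= i)
               for i in range(1, k + 1)
               for j in range(1, k + 1))


# Each process writes 1 and then reads 0 from the other location:
# neither write is visible to the other process's later read.
dekker_violation = po_cross([0], [0])
# P1 runs to completion before P2 starts (k = 2).
sequential_run = po_cross([2, 2], [0, 0])
```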
All Archtest test programs such as Test_WA, Test_PO etc. are meant to be run on real machines and there cannot be any real guarantees that the particular interleavings that reveal violations (such as for memory ordering rule WA watched by condition Atomic in Test_WA) will indeed happen. To allow for as many interleavings as possible, Archtest recommends that its tests be run for large values of k. With test model-checking, we effectively run the tests for k = ∞. Test model-checking achieves this by transforming each Archtest test into a test automaton which exploits non-determinism to effectively check for k = ∞. Also, the model-checking framework guarantees that we explore all possible interleavings rather than a particular interleaving.
3 Test model-checking
Test model-checking converts the tests of Archtest to corresponding memory rule test automata ("test automata") that drive the model of the memory system being examined. In our experiments, we use the Verilog language supported by VIS [Ver] to capture the memory system models as well as the test automata. The Conditions corresponding to each compound memory rule being tested are turned into corresponding memory rule safety properties that are checked by the VIS tool. The reader may take a peek at Section 4.1 to know which compound rules define sequential consistency [Lam79]. In the remainder of this section, we explain the assumptions under which we formally derive test automata as well as memory rule safety properties, followed by a description of how test automata as well as memory rule safety properties are derived for specific cases.
3.1 Assumptions about memory systems realized in hardware
Memory systems realized in hardware as well as finite-state models thereof are assumed to be data independent; i.e., the control logic of the system moves data around, and does not base its control-point settings on the data values themselves. We also assume that the system is address semi-dependent [IIB95], i.e. the control logic can at most compare two addresses for equality or inequality and base its actions on the outcome of this test. These assumptions are standard, and form the basis for defining test automata as well as memory rule safety properties.
3.2 Creation of test automata
As illustrated in Figure 5, we obtain test automata for various memory models by finitely abstracting the data used in the tests of Archtest, using non-determinism to justify the abstraction. For example, we abstract the specific activities of process P1 of Figure 2 into that of (non-deterministically) writing all possible ascending values over {0,1}, as shown in P1 of Figure 5. Also, since we cannot store infinite arrays in creating process P2, we turn P2 and the corresponding memory rule safety property into an automaton that checks that the array values read are monotonically increasing. This, in turn, can be performed using just two consecutive array values x1 and x2 that are nondeterministically recorded by P2. Hence, the memory rule safety property we model-check for is: P2 in final state ⇒ x2 ≥ x1.
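The construction can be illustrated with a small explicit-state search written in Python. This is a sketch under stated assumptions, not the Verilog/VIS setup used in the experiments: the memory system is abstracted into two hypothetical callbacks, `write(mem, v)` returning the next memory state and `read(mem)` returning the possible `(value, next_state)` pairs, and the function `holds_monotonic`, the state names `"s0"`, `"s1"`, `"final"`, and the "stale copy" memory are invented for the example.

```python
from collections import deque


def holds_monotonic(write, read, init_mem):
    """Explore every interleaving of the two test automata and report whether
    the safety property 'P2 in final state => x2 >= x1' holds everywhere."""
    start = (0, init_mem, "s0", None, None)   # (P1 state, memory, P2 state, x1, x2)
    seen, todo = {start}, deque([start])
    while todo:
        p1, mem, p2, x1, x2 = todo.popleft()
        if p2 == "final" and x2 < x1:
            return False                      # a reachable state violates the property
        succ = []
        # P1 writes 0 for a while, then switches (once) to writing 1 forever.
        for v in ((0, 1) if p1 == 0 else (1,)):
            succ.append((v, write(mem, v), p2, x1, x2))
        # P2 keeps reading; it nondeterministically records two consecutive reads.
        if p2 != "final":
            for v, mem2 in read(mem):
                if p2 == "s0":
                    succ.append((p1, mem2, "s0", None, None))  # read, do not record
                    succ.append((p1, mem2, "s1", v, None))     # record x1
                else:
                    succ.append((p1, mem2, "final", x1, v))    # record x2
        for s in succ:
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return True


# Serial memory: a single cell; a read returns the last value written.
serial_ok = holds_monotonic(lambda m, v: v, lambda m: [(m, m)], 0)

# Hypothetical faulty memory that keeps a stale copy a read may return.
stale_ok = holds_monotonic(lambda m, v: (v, m[0]),
                           lambda m: [(m[0], m), (m[1], m)],
                           (0, 0))
```

Because the data domain is {0, 1} and the automata have a handful of control states, the reachable state space is finite even though the automata model test programs of unbounded length.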
We now provide a justification that these abstractions preserve the memory rule safety properties, i.e., for the same memory system model, a violation of a condition occurs in a test of Archtest for k = ∞ iff the same violation occurs in model-checking the corresponding memory rule safety property when test automata are used to drive the memory system model. To keep the presentation simple, we formally argue how the test automata find every violation present in the test of Archtest with k = ∞; the opposite direction of the iff, i.e. how a test of Archtest with k = ∞ finds violations found by the test automata, is easy to see because the test automata just appear as a "stuttering" of the test of Archtest. For example, the actions of P1 in Figure 5 can be viewed as repeating the initialization and then repeating the instruction at label L1 of P1 of Figure 2. Our proof sketches are illustrated on the two tests presented in Section 2 and another test described in this section.
3.3 Abstracting Test_ROWO
We show that if the test program in Test_ROWO shows that Monotonic is violated, then the test automaton also reveals the error. Since Monotonic is violated,
$\exists i : 1 \le i < k : X[i] > X[i+1]$
$\iff \exists i, \alpha : 1 \le i < k : (X[i] > \alpha) \land (X[i+1] \le \alpha)$
$\iff \exists i, \alpha : 1 \le i < k : (X[i] > \alpha) \land \neg(X[i+1] > \alpha)$
[Figure 5: Test_ROWO automata: Test automata for A(CMP, RO, WO). P1: state S1 loops on A:=0, then moves to a state looping on A:=1. P2: state S2 loops on rd(A), then records x1 := rd(A) and x2 := rd(A) (dashed edges), then loops on rd(A).]
(a) Initially A = 0
Process P1        Process P2
L1: A := 1;       X[1] := (A > α);
L2: A := 2;       X[2] := (A > α);
L3: A := 3;       X[3] := (A > α);
...               ...
Lk: A := k;       X[k] := (A > α);

(b) Initially A = (0 > α)
Process P1              Process P2
L1: A := (1 > α);       X[1] := A;
L2: A := (2 > α);       X[2] := A;
L3: A := (3 > α);       X[3] := A;
...                     ...
Lk: A := (k > α);       X[k] := A;

(c) Initially A = 0
Process P1          Process P2
L1: A := 0;         X[1] := A;
...                 ...
Lα: A := 0;         X[α] := A;
Lα+1: A := 1;       X[α+1] := A;
Lα+2: A := 1;       X[α+2] := A;
...                 ...
Lk: A := 1;         X[k] := A;
Figure 6: Abstraction of Test_ROWO
Since the last formula compares X[i] and X[i+1] only to α, we can rewrite the test program as shown in Figure 6(a) assuming data independence, and rewrite the last formula as
$\exists i : 1 \le i < k : X[i] = 1 \land X[i+1] = 0$
Note that in Figure 6(a) all reads of A occur in the expression A > α. Hence, we can replace every A := v with A := (v > α) and X[i] := (A > α) with X[i] := A without affecting Monotonic, again if data independence holds, to obtain Figure 6(b). Figure 6(c) is obtained by simplifying Figure 6(b): each v > α evaluates to 0 for v ≤ α and 1 otherwise. This figure is generalized to obtain the test automaton in Figure 5. Intuitively the automaton finds the violation as follows. P1 remains in the initial state for α iterations (executing A:=0) and then switches to the second state (executing A:=1). Also, P2 remains in the initial state for i − 1 iterations and then switches to the second state, recording x1 and then x2 (dashed edges show when these variables are recorded). Thus the test automaton's execution is identical to that in Figure 6(c) except that the test automaton gives the effect of taking k to ∞. Also notice that x1 and x2 get the values corresponding to X[i] and X[i+1]. Also, corresponding to X[i] = 1 ∧ X[i+1] = 0, we have x1 = 1 ∧ x2 = 0. Hence the memory rule safety property corresponding to condition Monotonic is found violated by the test automaton exactly when Test_ROWO for k = ∞ detects a violation. Note that the nondeterminism employed in constructing test automata enables P1 and P2 to guess the right value of α and i corresponding to the violation.
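The core of this argument, that thresholding values against some α loses no Monotonic violations, can be checked exhaustively on small sequences. The Python sketch below is an illustration under assumptions: the helper names are invented, and restricting the choice of α to values occurring in the sequence is a simplification made for the sketch.

```python
from itertools import product


def abstract(xs, alpha):
    """Replace each value v by (v > alpha), encoded as 0 or 1."""
    return [int(v > alpha) for v in xs]


def violates_monotonic(xs):
    """True when some X[i] > X[i+1]."""
    return any(a > b for a, b in zip(xs, xs[1:]))


def violation_visible_after_abstraction(xs):
    """True when some threshold alpha exposes a '1 followed by 0' pattern."""
    return any(violates_monotonic(abstract(xs, alpha)) for alpha in set(xs))


# Exhaustive comparison over all sequences of length <= 4 with values in 0..3.
agree = all(
    violates_monotonic(list(xs)) == violation_visible_after_abstraction(list(xs))
    for n in range(5)
    for xs in product(range(4), repeat=n)
)
```

If X[i] > X[i+1], choosing α = X[i+1] yields the abstract pair (1, 0); conversely an abstract pair (1, 0) means X[i] > α ≥ X[i+1], so the two checks agree.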
3.4 Abstracting Test_WA
The test automaton for Test_WA is shown in Figure 7. In this automaton P1 and P4 write all possible ascending sequences of {0, 1} in A and B respectively. Each processor independently and non-deterministically decides to switch from writing 0 to writing 1. Modifications similar to those in Test_ROWO are applied to P2 and P3 also, to (nondeterministically) decide which U[i], V[i] pair and X[j], Y[j] pair are recorded in u, v and x, y. The memory rule safety property corresponding to condition Atomic is: P2 and P3 in their final states ⇒ v ≥ x ∨ y ≥ u. As was explained in Section 3.2 for Test_ROWO, our abstraction avoids having to remember the entire extent of the arrays U, V, X, and Y. (In Test_WA, one has to check for Monotonic also; this is done similarly to that in Test_ROWO.)
[Figure 7: Test_WA test automata: Test automata for A(CMP, RO, WO, WA). P1 writes A:=0, then A:=1. P2 loops on rd(A); rd(B), records u := rd(A); v := rd(B), then loops on rd(A); rd(B). P3 loops on rd(B); rd(A), records x := rd(B); y := rd(A), then loops on rd(B); rd(A). P4 writes B:=0, then B:=1.]
To show that the abstraction preserves Atomic, let Atomic be violated in Test_WA of Archtest. Hence
$\exists i, j : U[i] > Y[j] \land X[j] > V[i]$
$\iff \exists i, j, \alpha, \beta : Y[j] = \alpha \land U[i] > \alpha \land V[i] = \beta \land X[j] > \beta$
Similar to Test_ROWO, assuming data-independence, we have an execution of the test automaton (Figure 7) in which P1, P2, P3, P4 iterate for α, i − 1, j − 1, β times (respectively) in their initial states before switching to their final states. This test automaton execution detects violations of Atomic exactly when Test_WA for k = ∞ would. A violation of Atomic happens exactly when u = 1 ∧ v = 0 ∧ x = 1 ∧ y = 0.
3.5 Abstracting Test_PO
We now discuss a test for the elemental ordering rule Program Order (PO), which is somewhat more complex than the previous two tests. PO requires that two events of the same process occur in the order specified by the program. Archtest provides the test for the compound rule A(CMP, PO) shown in Figure 4. Violation of A(CMP, PO) is detected if Condition 3 fails. We obtain the test automaton and the memory rule safety property for Test_PO of Figure 4 as illustrated in Figure 8. P1 executes a pair of instructions, a write to A followed by a read from B, infinitely often. The value written to A is 0 for some iterations and is nondeterministically changed to 1. P2 runs similarly. P1 nondeterministically selects a pair of write followed by read instructions. It assigns the value written to A to j and the value read from B to y. Similarly, P2 updates i and x. The dashed edges in Figure 8 show when x, y, i, j are updated. The memory rule safety property corresponding to condition PO-Cross is: P1 and P2 in their final states ⇒ (x ≥ j ∨ y ≥ i) ∧ (x ≤ j ∨ y ≤ i).
To show that this abstraction preserves PO-Cross, let PO-Cross be violated in Archtest test Test_PO.
$\exists i, j : (X[i] < j \land Y[j] < i) \lor (X[i] > j \land Y[j] > i)$
$\iff \exists i, j, \alpha, \beta : ((X[i] = \alpha) \land (j > \alpha) \land (Y[j] = \beta) \land (i > \beta)) \lor ((X[i] > \alpha) \land (j = \alpha) \land (Y[j] > \beta) \land (i = \beta))$
Similar to the case of Test_WA, if ∃i, j : X[i] < j ∧ Y[j] < i, then we can get a case in the test automata where x = 0 ∧ j = 1 ∧ y = 0 ∧ i = 1. Similarly, if ∃i, j : X[i] > j ∧ Y[j] > i, then we can get a case in the test automata where x = 1 ∧ j = 0 ∧ y = 1 ∧ i = 0. Hence, the memory rule safety property corresponding to PO-Cross will be violated in the test automata if and only if PO-Cross is violated in Archtest test Test_PO for k = ∞.
[Figure 8: Test_PO test automata: Test automata for A(CMP, PO). State diagram; node labels include A:=1, rd(B), rd(A).]

Event       Action or condition
R_i(d, a)   if Mem[a] = d
W_i(d, a)   Mem[a] := d
Figure 9: Serial memory transaction rules
4 Case Studies
To demonstrate the effectiveness of our approach, we verified three different memory systems, namely serial memory, lazy caching, and a simplified version of the Runway bus, all using VIS [Ver]. These three memory systems are described in some detail below, along with some of the subtle bugs that we could detect using test model-checking. Details of all our experiments can be obtained from the Web [Mok] or by contacting the authors.
4.1 How do we check for sequential consistency?
A sequentially consistent memory system [Lam93] requires that there be a single self-consistent trace t of memory operations that, when projected onto the memory operations of each individual processor P_i (R_i(a, d) and W_i(a, d) for processor i), is according to program order for P_i. As suggested in [Col92], we can show that sequential consistency is A(CMP, PO, WA).
As [Col92] does not list a single compound test to check for A(CMP, PO, WA), we can use the following two tests that are available: Test_WA, which tests for A(CMP, RO, WO, WA), and Test_PO, which tests for A(CMP, PO). This combination is exactly equivalent to testing sequential consistency because PO implies RO and WO (as formally defined in [Col92]). For every memory system we consider, these two tests are model-checked separately and summarized in Figure 14.
Event       | Allowed if                                          | Action
R_i(d,a)    | C_i(a) = d ∧ Out_i = {} ∧ no *-ed entries in In_i   |
W_i(d,a)    |                                                     | Out_i := append(Out_i, (d,a))
MW_i(d,a)   | head(Out_i) = (d,a)                                 | Mem[a] := d; Out_i := tail(Out_i); (∀k ≠ i :: In_k := append(In_k, (d,a))); In_i := append(In_i, (d,a,*))
MR_i(d,a)   | Mem[a] = d                                          | In_i := append(In_i, (d,a))
CU_i(d,a)   | head(In_i) is either (d,a) or (d,a,*)               | In_i := tail(In_i); C_i := update(C_i, d, a)
CI_i        |                                                     | C_i := restrict(C_i)
Initially: ∀a Mem[a] = 0 ∧ ∀i = 1...n : C_i ⊆ Mem ∧ In_i = {} ∧ Out_i = {}
Fairness: no action other than CI_i can be always enabled but never taken
W: write; MW: memory write; CU: cache update; R: read; MR: memory read; CI: cache invalidate
Figure 10: Gerth's version of the lazy caching algorithm, from Figure 4 of [Ger95].
4.2 Serial memory and Lazy caching
The serial memory protocol for n processors and a memory is shown in Figure 9. Serial memories are often used to define SC operationally. The lazy caching protocol [Ger95], shown in Figure 10, also implements sequential consistency, and is geared towards a bus based architecture. The memory interface still consists of reads and writes; however, caches C_i are interposed between the shared memory Mem and the processors P_i. Each cache C_i contains a part of the memory Mem and has two queues associated with it: an out-queue Out_i in which P_i's write requests are buffered and an in-queue In_i in which the pending cache updates are stored. These queues model the asynchronous behavior of write events in a sequentially consistent memory. A write event W_i(a, d) does not have an immediate effect. Instead, a request (d, a) is placed in Out_i. When the write request is taken out of the queue, by an internal memory-write event MW_i(a, d), the memory is updated and a cache update request (d, a) is placed in every in-queue. This cache update is eventually removed by an internal cache update event CU_j(a, d), as a result of which the cache C_j gets updated. Cache evictions are modeled by internal cache invalidate events: CI_i can arbitrarily remove locations from cache C_i. Caches are filled both as the delayed result of write events and through internal memory-read events, MR_i(a, d). The latter events model the effect of a cache miss: in that case the read event stalls until the location is copied from the memory. A read event R_i(a, d), predictably, stalls until a copy of location a is present in C_i but also until the copy contains a correct value in the following sense: SC demands that a processor P_i reads the value at a location a that was recently written by P_i unless some other processor updated a in the meantime. Hence, a read event R_i(a, d) cannot occur unless all pending writes in Out_i are processed, as well as the cache update requests from In_i that correspond to writes of P_i. For this reason, such cache update requests are marked (with a *). Figure 11 shows the structure of the Verilog model we created for the memory model verification.












Figure 11: Verilog architecture of the two-processor Lazy Caching parallel machine
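As an illustration only, the lazy caching transitions described in this section can be read as a small executable state machine. The Python sketch below is not the Verilog model of Figure 11; the class name `LazyCaching`, its method names, and the representation of a starred entry as a boolean flag are assumptions made for the example.

```python
from collections import deque


class LazyCaching:
    """Toy reading of the lazy caching transitions (hypothetical names)."""

    def __init__(self, n_procs, addresses):
        self.mem = {a: 0 for a in addresses}            # Initially Mem[a] = 0
        self.cache = [dict() for _ in range(n_procs)]   # C_i, a partial copy of Mem
        self.out = [deque() for _ in range(n_procs)]    # Out_i: buffered writes
        self.inq = [deque() for _ in range(n_procs)]    # In_i: pending cache updates

    def write(self, i, d, a):
        """W_i: the write has no immediate effect, it is only queued."""
        self.out[i].append((d, a))

    def can_read(self, i, a):
        """R_i guard: a is cached, Out_i is empty, In_i holds no starred entry."""
        no_star = all(not starred for (_, _, starred) in self.inq[i])
        return a in self.cache[i] and not self.out[i] and no_star

    def read(self, i, a):
        """R_i: return the cached value, or None if the read must stall."""
        return self.cache[i][a] if self.can_read(i, a) else None

    def memory_write(self, i):
        """MW_i: move head(Out_i) into memory and into every in-queue."""
        d, a = self.out[i].popleft()
        self.mem[a] = d
        for k, queue in enumerate(self.inq):
            queue.append((d, a, k == i))   # starred only in the writer's queue

    def memory_read(self, i, a):
        """MR_i: queue a cache fill for address a (models a cache miss)."""
        self.inq[i].append((self.mem[a], a, False))

    def cache_update(self, i):
        """CU_i: apply head(In_i) to cache C_i."""
        d, a, _ = self.inq[i].popleft()
        self.cache[i][a] = d

    def cache_invalidate(self, i, a):
        """CI_i: drop a location from C_i."""
        self.cache[i].pop(a, None)
```

In this sketch a processor that has just written stalls on its reads until its own starred update has been applied to its cache, which is the purpose of the * marking.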
4.3 Runway-PA8000 Memory System
Figure 12 shows a simplified view of two HP PA8000 CPUs and a memory controller (HOST)
interconnected by the HP Runway bus [BCS96, Cam97, Kan96] (see footnote 1). We describe the
Runway-PA8000 system in some detail to facilitate a clear description of some of the subtle bugs in
URM unearthed by the test model-checking technique. Runway is a synchronous, split-transaction bus
that is responsible for providing a coherent view of shared memory to the processors (clients) while
still allowing the clients to maintain private copies of memory lines in their caches. Cache coherency
is maintained by a snoopy coherency protocol described below.
4.3.1 Snoopy Coherency Protocol
Each cache line in a client can be in one of four states: invalid, shared, private-clean or dirty.
If a client suffers a read miss in its cache, it generates an rsp (read shared or private) transaction; if
it suffers a write miss, it generates an rp (read private) transaction. The transaction is broadcast
on the Runway when the client wins bus mastership. All clients snoop the transaction into their CCC
(cache coherency check) queues and process the entries in the CCC queue at their own speed. When a
transaction gets to the head of the CCC queue of client $C_i$, the client sends a ccr (cache coherency
response) to HOST according to Figure 13, and also changes its state to reflect the transaction; for
example, if the transaction is an rp generated by $C_i$, it assumes the "invalid-private-clean" transient
state. If a client generates a coh.copyout as ccr, it later issues a c2cw (cache to cache write) to supply
the data. HOST enters the ccrs into its CCR queue, and after all clients have responded to a
transaction, HOST determines whether the data will be supplied by another client. If no client is going
to supply the data, HOST generates an hdr (host data return) transaction on the Runway to supply
the data to the requester. It also drives the Client_op lines to indicate whether the data must be
shared (i.e., at least one of the ccrs is coh.shared). When a client notices a data return (an hdr or
c2cw) targeted towards it, it enters the information into its data return (DR) queue. Note that a client
might receive a data return before it generates the corresponding ccr. In this case, the client keeps
the data in the data return queue until the ccr is sent out.

Footnote 1: We have purposefully omitted arbitration lines and other details for the sake of clarity.
The actual Runway allows up to four CPUs and one I/O processor, and also many more transactions
(including coherent, non-coherent and I/O transactions) than we describe here. We provide a
simplified view of its operation which captures the essential complexity of its behavior.

[Figure 12: Simplified View of Runway-PA8000 Memory System. Labels recoverable from the figure:
"mem Ctrl and main mem", CCR1, CCR2, Cache Coherency Responses.]

Transaction   Generated by   State           ccr
-             self           -               coh.ok
-             other          invalid         coh.ok
rsp           other          private-clean   coh.shared
rsp           other          shared          coh.shared
rp            other          shared          coh.ok
rp            other          private-clean   coh.ok
-             other          dirty           coh.copyout
Figure 13: ccr generated when a transaction gets to the head of the CCC queue
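The response rules of Figure 13, together with the HOST's decision described in this section, can be sketched as two small functions. This is a minimal illustration in Python rather than part of the URM; the function names `ccr_response` and `host_decision` and the tuple returned by the latter are assumptions, while the transaction, state and response strings are taken from the text.

```python
def ccr_response(transaction, generated_by_self, state):
    """Return the ccr a client sends, following the rows of Figure 13."""
    if generated_by_self:
        return "coh.ok"            # own transaction: always coh.ok
    if state == "dirty":
        return "coh.copyout"       # the client will supply the data by c2cw
    if state == "invalid":
        return "coh.ok"
    if state in ("shared", "private-clean"):
        if transaction == "rsp":
            return "coh.shared"
        if transaction == "rp":
            return "coh.ok"
    raise ValueError(f"unexpected combination: {transaction!r}, {state!r}")


def host_decision(ccrs):
    """Decide how the data is returned once every client has responded.

    Returns ("c2cw", None) when some client supplies the data, otherwise
    ("hdr", shared), where shared says whether the line must be held shared.
    """
    if "coh.copyout" in ccrs:
        return ("c2cw", None)
    return ("hdr", "coh.shared" in ccrs)
```

For example, `host_decision(["coh.ok", "coh.shared"])` selects an hdr whose data must be held shared, whereas any coh.copyout response means the HOST leaves the data return to the responding client.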
4.3.2 Delay in ccr generation
If a client has a c2cw transaction for a line that is yet to go on the Runway, it delays generating any
more ccrs for that line. To see why this is necessary, consider the following. Suppose a client C1 has a
dirty line. Client C2 requests this line by issuing an rsp transaction on the bus. C1 generates coh.copyout
in response to C2's request, invalidates its own line, and creates a c2cw transaction for C2. Note that
the most recent data for this line is with C1 and not HOST. Now, a client C3 requests the same
line by issuing an rsp. C2 and C3 generate coh.shared and coh.ok ccrs, respectively, in response to C3's
request. C1's ccr in response to C3's request will be coh.ok. If C1 sends coh.ok to HOST before its
c2cw goes on the bus, then HOST can supply stale data to C3 through its hdr transaction. To avoid
this, C1 delays generating the ccr until the c2cw goes on the bus.
4.3.3 Arbitration
Runway follows a complex pipelined arbitration algorithm to determine the bus master. Here, we
present only an approximation of the algorithm. Every bus user (client or HOST) must become the
bus master before it can drive the bus. Bus mastership at cycle N+2 is acquired by initiating the
arbitration in cycle N, driving the request through dedicated arbitration lines (not shown in the
figure). During cycle N+1, every potential bus user evaluates the others' drives and, in conjunction
with round-robin pointers for arbitration priorities, determines who wins bus mastership for cycle
N+2. Those who do not win bus mastership keep off the bus. Bus arbitration proceeds in a pipelined
manner concurrently with transaction processing.
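The three-cycle timing of this approximation (request in cycle N, evaluation in cycle N+1, mastership in cycle N+2) can be sketched as follows. The priority policy used here is an assumption, since the text does not specify it: the first requester at or after a single round-robin pointer wins, and the pointer then moves just past the winner. The function name `arbitrate` and its arguments are likewise hypothetical.

```python
def arbitrate(requests_by_cycle, n_users):
    """Map each cycle to its bus master under a simple round-robin policy.

    requests_by_cycle[n] is the set of bus users (numbered 0..n_users-1)
    that drive an arbitration request in cycle n. Because arbitration is
    pipelined, a new arbitration may start in every cycle.
    """
    pointer = 0          # assumed single round-robin priority pointer
    masters = {}
    for n, requesters in enumerate(requests_by_cycle):
        if not requesters:
            continue
        # Cycle n+1: every user evaluates the requests and picks the same
        # winner, the requester closest to the pointer in round-robin order.
        winner = min(requesters, key=lambda u: (u - pointer) % n_users)
        masters[n + 2] = winner            # the winner drives the bus in cycle n+2
        pointer = (winner + 1) % n_users   # losers keep off the bus and retry
    return masters
```

With three users, `arbitrate([{0, 1}, {1}, set(), {0, 2}], 3)` gives user 0 the bus in cycle 2, user 1 in cycle 3, and user 2 in cycle 5.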
4.3.4 PA8000 Runway interface
In addition to the Runway specifics described above, the PA8000 Runway interface (PARI) also adheres
to the following constraints in order to ensure Program Order and Write Atomicity. PARI allows a
client to initiate Runway transactions for various cache misses; it is possible that these transactions
complete out of order. However, all instructions complete strictly in program order. PARI guarantees
that the client will stall the coherency response for any cache line for which it has an outstanding miss
(i.e., it has initiated a Runway transaction and has assumed ownership, but is still waiting for the
data). The coherency response will be generated only after the client has received the data and has
used it to make forward progress by at least one instruction. PARI guarantees that if a client receives
data for its Runway transaction before it has assumed ownership, then it will not modify or use the
data until it processes its own transaction (and thus assumes ownership). PARI guarantees that if a
client has a c2cw transaction, then it gets the highest priority to go to the Runway.
4.4 The Runway-PA8000 in VIS Verilog
We constructed a Verilog model of the Runway-PA8000 system, the Utah Runway Model (URM), and
the two abstractions $Test_{PO}$ and $Test_{WA}$ to verify that its memory model is sequentially consistent.
The complexity of the system stems from a number of sources: (a) multiple outstanding transactions
for each processor, (b) out-of-order completion of the Runway transactions, but in-order completion
of instructions, (c) eager assumption of ownership without receiving the corresponding data,
(d) "equivalent" states introduced by decoupled execution due to coherency queues, (e) speculative
execution features of the processor to ensure performance in spite of in-order completion of the
instructions, and (f) an involved distributed pipelined arbitration algorithm. We did not try to model
each of these features in full detail, but we did include a modicum of these aggressive features in
our URM, which occupies more than 2,000 lines of VIS Verilog code (see [Mok]). For instance,
all essential features of (a), (b), (c), and (e) are included, (f) is abstracted by using nondeterminism,
and (d) is abstracted as explained below.
Abstraction of Queues. Additional abstraction effort was necessary to make our URM digestible
by VIS. This essentially consists of getting rid of the CCC, CCR, and DR queues, which are the main
cause of state explosion, while retaining the HDR queue in the HOST and the C2CW queues in the
HOST and clients.
In Runway, most of the conflicts are detected and resolved by the HOST. There is one situation
where a client detects a conflict: the client has a pending c2cw transaction. The client resolves this
by delaying its coherency response; the net result of this delay is that the HOST does not generate
hdr transactions until the c2cw goes on the Runway. Since we abstracted away the CCR queues,
in our URM the clients send the coherency response for a coherent transaction immediately after
its occurrence on the bus. Hence, in our URM the clients cannot resolve conflicts by delaying the
coherency response; instead the HOST computes whether the coherency response would have needed
to be delayed and, if so, delays the hdrs appropriately. This is achieved as follows. A counter is
associated with each HDR queue entry. If the counter is non-zero, then the entry is waiting for some
c2cw transactions for that line from the clients, hence the hdr needs to be delayed. After all the
pending c2cw transactions for that line go on the bus, the counter becomes zero, and hence the hdr
transaction can go on the bus. In our URM, we used a two-bit counter, which allows up to four processors.
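The counter mechanism can be illustrated with a few lines of Python. This is a sketch of the idea only, not the Verilog used in the URM; the class name `HdrEntry` and its methods are assumptions, and the two-bit width is modelled simply by restricting the counter to the range 0..3.

```python
class HdrEntry:
    """One HDR queue entry with its pending-c2cw counter (hypothetical names)."""

    MAX_PENDING = 3   # largest value a two-bit counter can hold

    def __init__(self, line, pending_c2cw):
        if not 0 <= pending_c2cw <= self.MAX_PENDING:
            raise ValueError("counter does not fit in two bits")
        self.line = line
        self.counter = pending_c2cw

    def on_c2cw(self, line):
        """A c2cw for `line` went on the bus: one fewer write to wait for."""
        if line == self.line and self.counter > 0:
            self.counter -= 1

    def can_issue(self):
        """The hdr may go on the bus only once no c2cw is pending."""
        return self.counter == 0
```

An entry created with two pending c2cw transactions stays blocked after the first one appears on the bus and becomes issuable after the second; c2cw transactions for other lines leave it unchanged.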
In Runway, all clients save the data returns (hdr and c2cw transactions) in the DR queue until
the corresponding request appears at the head of the client's CCC queue. This is necessary to enforce
in-order completion of instructions. We abstract away the CCC queues and the data return queues by
associating one bit of information with each cache line in each client. This bit is set for an address
$a$ whenever a data return happens for $a$, but a preceding instruction is not yet completed. After all
preceding instructions are completed, the data is used, and the bit is reset, indicating the completion
of the instruction.

A(CMP, PO)           #states       #bdd nodes   conditions verified   runtime (mn:sec)
lazy caching         7.80248e+06   306692       Vacuity               01:12
                                                PO_Cond               36:33
URM                  953675        1657308      Vacuity               14:23
                                                PO_Cond               27h28:30

A(CMP, WO, RO, WA)   #states       #bdd nodes   conditions verified   runtime (mn:sec)
serial memory        21242         10084        Vacuity               00:04
                                                Cond1 - Cond3         00:34
lazy caching         1.90736e+06   513655       Vacuity               02:02
                                                Cond1 - Cond3         59:33
URM                  985236        1695092      Vacuity               17:24
                                                Cond1 - Cond3         40h17:33
Figure 14: Verification results using VIS on a SPARC ULTRA-1 with 512 MB Memory
4.5 Verification results
The tables in Figure 14 show the execution times for model-checking our Serial memory, Lazy caching
and URM models for the tests of A(CMP, PO) and A(CMP, RO, WO, WA) (recall that A(CMP, PO, WA)
implies SC). Each of the three models, run separately with the two tests $Test_{WA}$ and $Test_{PO}$, is
model-checked for the following conditions (Figure 8 does not show some of these states):
$Test_{WA}$: Monotonic: $(P_2.inS_2 \Rightarrow P_2.U_1 < P_2.U_2)$
             $\wedge\ (P_2.inS_2 \Rightarrow P_2.V_1 < P_2.V_2)$
             $\wedge\ (P_3.inS_2 \Rightarrow P_3.X_1 < P_3.X_2)$
             $\wedge\ (P_3.inS_2 \Rightarrow P_3.Y_1 < P_3.Y_2)$
           Atomic: $(P_2.inS_1 \wedge P_3.inS_1) \Rightarrow (P_2.V > P_3.X \vee P_3.Y > P_2.U)$
$Test_{PO}$: PO_Cross: $(P_1.inS_3 \wedge P_2.inS_3) \Rightarrow (P_1.Y > P_2.I \vee P_2.X > P_1.J) \wedge (P_1.Y < P_2.I \vee P_2.X < P_1.J)$
As can be seen, all these conditions are safety properties and are independent of the model itself,
which is a distinct advantage over other methods.
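Because the conditions are plain state predicates, they can be evaluated on any state of a counter-example trace independently of the model that produced it. The Python sketch below illustrates this under stated assumptions: a state is represented as a dictionary keyed by names such as "P2.inS2" or "P1.Y", the location subscripts are read as in the conditions listed above (in particular inS1 for Atomic), and the implication in PO_Cross is taken to range over the whole conjunction. The helper `first_violation` is hypothetical and is not a feature of VIS.

```python
def implies(p, q):
    """Logical implication on Python booleans."""
    return (not p) or q


def monotonic(s):
    return (implies(s["P2.inS2"], s["P2.U1"] < s["P2.U2"])
            and implies(s["P2.inS2"], s["P2.V1"] < s["P2.V2"])
            and implies(s["P3.inS2"], s["P3.X1"] < s["P3.X2"])
            and implies(s["P3.inS2"], s["P3.Y1"] < s["P3.Y2"]))


def atomic(s):
    return implies(s["P2.inS1"] and s["P3.inS1"],
                   s["P2.V"] > s["P3.X"] or s["P3.Y"] > s["P2.U"])


def po_cross(s):
    guard = s["P1.inS3"] and s["P2.inS3"]
    body = ((s["P1.Y"] > s["P2.I"] or s["P2.X"] > s["P1.J"])
            and (s["P1.Y"] < s["P2.I"] or s["P2.X"] < s["P1.J"]))
    return implies(guard, body)


def first_violation(trace, condition):
    """Index of the first state violating a safety condition, or None."""
    for step, state in enumerate(trace):
        if not condition(state):
            return step
    return None
```

A safety property is violated as soon as one reachable state falsifies the predicate, which is why a single finite trace suffices as a counter-example.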
The size of the state space and the number of nodes in the BDDs are also reported. Note that lazy
caching has more states than Runway due to the queues present in the model. The complexity of
the Runway protocol is much higher, however, which results in a larger BDD size and a longer run time.
Nevertheless, in all our experiments, whenever there was a memory ordering rule violation in our
model, test model-checking detected it quickly (on the order of minutes). A very desirable feature one
can provide in a tool based on test model-checking is a menu of previously generated test automata for
the various compound rules in [Col92], using which designers can probe their model.
Our Verilog models capture quite faithfully the cache coherence protocol and the ordering rules
of the three memory systems.
After extensive debugging using test model-checking driven by $Test_{PO}$ and $Test_{WA}$, we
have high confidence that the memory model provided by Lazy caching and Runway-PA8000 is
sequentially consistent. The verification of serial memory was straightforward.
Description of a Bug found in preliminary model of lazy caching: The following bug in
our model of Lazy Caching was caught by a violation of PO_Cross in $Test_{PO}$. The bug was in the
queues used by Lazy Caching, which were implemented as shift registers. We forgot to shift the *-bit
in $In_i$ when the processor $P_i$ receives a cache update from the $In_i$ queue. With this bug it is possible
that the $In_i$ queue is not *-ed when it should be, and consequently reads in $P_i$ may bypass writes. This
results in a violation of PO. This is a difficult bug to catch because its detection involves understanding
the complex feedback among all components of the protocol (queues, memory, and caches).
Moreover, this bug is interesting because it violates PO but doesn't violate WA. This is so because
only the write-read (WR) order is affected by this bug. Our technique effectively caught this bug:
the PO_Cross condition did not pass when we model-checked the model for $Test_{PO}$. However,
$Test_{WA}$ (note that it doesn't involve PO) passes! This shows the futility of ad hoc testing methods:
one could apply subjective criteria to consider a test similar to $Test_{WA}$ to be sufficiently incisive,
when in fact it fails to account for a crucial ordering relation such as PO.
Description of a Bug found in preliminary URM: Similarly, another corner-case bug in our URM
was caught by test model-checking through a violation of the PO_Cross condition using $Test_{PO}$.
This bug generated a long counter-example trace, due to the depth of the sequential logic of the
model. The trace revealed the following situation:
(1) client i has removed its own read transaction from the bus, then
(2) client i sends coh.ok in response to a subsequent coherent transaction for the same line before
getting the data for its transaction (by hdr or c2cw).
This problem was fixed using the counter in the HOST's HDR entries to record the pending c2cws
and the one-bit information in the client's cache lines to record whether the data is supplied, as
explained in Section 4.4. After fixing the bug, the PO condition passed.
5 Conclusion and Future Plans
We presented a new approach to verify multi-processors for formal memory models, which combines
two existing powerful techniques: model-checking, and the testing method of Archtest. From our
results, we conclude that test model-checking can be of great value in detecting bugs during early
stages of the design cycle of modern microprocessors whose memory subsystems are complex. Our
results on our URM of the HP PA/Runway bus attest to this.
So far we have identified the rules and corresponding tests for sequential consistency. We are
currently working on identifying similar rules and tests for other well-known formal memory models such
as TSO, PSO, and RMO [AG96] that are described in the SPARC V9 architecture manual [WG94].
This work may involve defining new rules as well as new tests corresponding to them.
We are currently working to formulate some reasonable assumptions about the memory system
model under which the tests administered by our test automata can be rendered complete. Also,
for a limited class of models, model-checking the test for some small value of k might actually be
sufficient. Our initial attempts in this direction are encouraging.
Acknowledgments We would like to thank Dr. Collier for his help in explaining his work, his
very informative emails and for providing Archtest. We would like to thank Dr. Narendran for many
fruitful discussions. We would like to thank Dr. Al Davis and his Avalanche team for offering us
the unique opportunity to work on state-of-the-art processors and busses.
















References
[AG96] Sarita V. Adve and Kourosh Gharachorloo. Shared memory consistency models: A
tutorial. Computer, 29(12):66-76, December 1996.
[BCS96] William R. Bryg, Kenneth K. Chan, and Nicholas S. Fiduccia. A high-performance,
low-cost multiprocessor bus for workstations and midrange servers. Hewlett-Packard
Journal, pages 18-24, February 1996.
[Cam97] Albert Camilleri. A hybrid approach to verifying liveness in a symmetric
multiprocessor. In Theorem Proving in Higher Order Logics, 10th International Conference,
TPHOLs'97, Murray Hill, NJ, pages 49-67, August 1997. Springer-Verlag LNCS 1275.
P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static
analysis of programs by construction or approximation of fixpoints. In Proceedings of
4th POPL, pages 238-252, Los Angeles, CA, ACM Press, 1977.
E. M. Clarke, E. A. Emerson, and A. P. Sistla. Automatic verification of finite-state
concurrent systems using temporal logic specifications. ACM TOPLAS, 8(2):244-263,
1986.
W. W. Collier. Multiprocessor diagnostics. http://www.infomall.org/diagnostics/archtest.html.
[Col92] W. W. Collier. Reasoning About Parallel Architectures. Prentice-Hall, Englewood Cliffs,
NJ, 1992.
[Cor97] Francisco Corella, April 1997. Invited talk at Computer Hardware Description
Languages 1997, Toledo, Spain, on Verifying I/O Systems.
David L. Dill, Seungjoon Park, and Andreas Nowatzyk. Formal specification of abstract
memory models. In Gaetano Borriello and Carl Ebeling, editors, Research on Integrated
Systems, pages 38-52. MIT Press, 1993.
[Ger95] Rob Gerth. Introduction to sequential consistency and the lazy caching
algorithm. Distributed Computing, 1995. Also can be found in
http://www.research.digital.com/SRC/tla/papers.html#Lazy.
G. Gopalakrishnan, R. Ghughal, R. Hosabettu, A. Mokkedem, and R. Nalumasu. Formal
modeling and validation applied to a commercial coherent bus: A case study. In
Hon F. Li and David K. Probst, editors, CHARME, Montreal, Canada, 1997.
James Gosling, Bill Joy, and Guy Steele. The Java™ Language Specification. Sun
Microsystems, 1.0 edition, August 1996. Appeared also as a book with the same title in
Addison-Wesley's 'The Java Series'.
Phillip B. Gibbons and Ephraim Korach. On testing cache-coherent shared memories.
In Proceedings of the 6th Annual Symposium on Parallel Algorithms and Architectures,
pages 177-188, New York, NY, USA, June 1994. ACM Press.
Phillip B. Gibbons and Ephraim Korach. Testing shared memories. SIAM Journal on
Computing, 26(4):1208-1244, August 1997.
S. Graf. Verification of a distributed cache memory by using abstractions. Lecture













[HB95] R. Hojati and R. Brayton. Automatic datapath abstraction of hardware systems. In
Conference on Computer-Aided Verification, 1995.
R. Hojati, R. Mueller-Thuns, P. Loewenstein, and R. Brayton. Automatic verification
of memory systems which service their requests out of order. In CHDL, pages 623-639,
1995.
[Kan96] Gerry Kane. PA-RISC 2.0 Architecture. Prentice Hall, 1996. ISBN 0-13-182734-0.
Leslie Lamport. How to make a multiprocessor computer that correctly executes
multiprocess programs. IEEE Transactions on Computers, 9(29):690-691, 1979.
Leslie Lamport. How to make a correct multiprocess program execute correctly on a
multiprocessor. Technical report, Digital Equipment Corporation, Systems Research
Center, February 1993.
Leslie Lamport. The temporal logic of actions. ACM Transactions on Programming
Languages and Systems, 16(3):872-923, May 1994. Also appeared as SRC Research
Report 79.
P. Ladkin, L. Lamport, B. Olivier, and D. Roegel. Lazy caching in TLA. Distributed
Computing, 1997.
Kenneth L. McMillan. Symbolic Model Checking. Kluwer Academic Press, 1993.
[Mok] A. Mokkedem. Verification of three memory systems using test model-checking.
http://www.cs.utah.edu/~mokkedem/vis/vis.html.
Seungjoon Park and David L. Dill. Verification of FLASH cache coherence protocol by
aggregation of distributed transactions. In SPAA, pages 288-296, Padua, Italy, June
24-26, 1996.
Vis-1.2 release. http://www-cad.eecs.berkeley.edu/Respep/Research/vis/index.html.
[WG94] David L. Weaver and Tom Germond. The SPARC Architecture Manual - Version 9.
P T R Prentice-Hall, Englewood Cliffs, NJ 07632, USA, 1994.
