Towards a formal model of shared memory consistency for intel itanium by Gopalakrishnan, Ganesh & Chatterjee, Prosenjit
School of Computing 
University of Utah 
Salt Lake City, UT 84112 USA 
April 5, 2001 
Abstract 
We provide a simple formal model for Itanium™ shared memory consistency covering a
core set of instructions. Existing descriptions of Itanium shared memory consistency are
based on an informal collection of ordering rules as well as several examples. Our
operational model employs widely understood data structures such as buffers and memories,
and expresses ordering constraints precisely using a collection of non-deterministic rules.
This can enable the construction of reliable prototype implementations, formal verification
against implementations, formal verification against other formal models, as well as
verification of synchronization routines. Our model covers all published ordering constraints,
and also sheds light on tricky concepts such as causality.
Towards a Formal Model of Shared Memory Consistency for Intel Itanium™ 
Prosenjit Chatterjee and Ganesh Gopalakrishnan*
School of Computing, University of Utah
http://www.cs.utah.edu/formal_verification/
Technical Report No. UUCS-01-0003
1 Introduction 
The Itanium™ shared memory consistency model
[1, 2] is described in terms of a collection of ordering
rules, constraints stated in English, and examples of
legal and illegal executions. While good for initial
understanding, such descriptions often leave many details
unanswered. This can make it difficult for programmers
to write reliable MP libraries. As far as we know,
a formal specification (operational or otherwise) has
not yet been published for Itanium.
In this paper, we provide a simple execution-oriented
(operational) model for Itanium shared memory
consistency, reverse-engineered from [1, 2]. We believe
that it captures the publicly available descriptions
accurately (and more completely). The availability
of an operational model can help designers build
executable prototypes to gain deeper understanding. In
addition, they can use model checkers to gain a deeper
understanding of synchronization routines
as well as specific ordering issues [3, 4]. Like any
formal specification, an operational model runs the risk of
being over- or under-specified. In this paper we point
out, as space permits,¹ how we have strived to avoid
these risks.
* This work was supported by National Science Foundation
Grants CCR-9987516 and CCR-0081406.
Our model deals with cacheable memory instructions
consisting of acquire loads (written ld.acq),
ordinary loads (ld), release stores (st.rel), and
ordinary stores (st), as well as memory fences. It does
not currently handle atomic read-modify-writes,
non-cacheable memory, or special rules pertaining to data
dependencies involving registers [1, Section 13.2].
Despite its simplicity, our model captures all published
ordering properties of the instructions we consider, and
also sheds more light on corner cases pertaining to
causality. While operational models have been
proposed for commercial shared memory systems (notably
for SPARC V9 [5]), a notable feature of our operational
model is its use of a few explicit devices such as vector
timestamps [6] to clearly describe the tricky notion of
causality.
2 Overview of the Itanium Memory 
Model 
The Itanium memory model can be understood in
terms of program order and global visibility ("visibility")
order. For memory operations of type 'store', visibility
refers to when the effects of the store become apparent
to all processors. For memory operations of type
'load', visibility refers to when the execution of the load
appears to have been carried out for the processor
carrying out the load. (All other processors do not directly
observe the load happening.) As in [1], for two different
memory operations X and Y, X » Y specifies that
X is before Y in program order, and X → Y indicates that
Y must be visible only after X is visible. Further, if
X → Y and Y → Z, then X → Z.
¹ Details appear on our webpage.
Now, if X and Y are two memory operations in the same
program, and X » Y, Itanium requires the following:
1. If X is a load (store) and Y is a store (load) to
the same location, WAR (RAW) hazards must be
avoided.
2. If X and Y are stores to the same location, WAW
hazards must be avoided. In addition, X → Y.
3. If X and Y are memory operations to any location,
and have a fence between them, then X → Y.
4. If X is an Acquire load and Y is any other memory
operation to any location, then X → Y.
5. If X is any memory operation to any location and
Y is a Release store, then X → Y.
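The visibility-order part of these rules can be sketched as a small predicate. This is only an illustrative reading of Rules 2 to 5, not part of the model: the names `Op` and `visibility_ordered` are hypothetical, and Rule 1 is left out because it states a hazard requirement rather than an X → Y constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A memory operation (hypothetical helper type, not from the paper)."""
    kind: str  # one of "ld", "ld.acq", "st", "st.rel"
    addr: str


def visibility_ordered(x: Op, y: Op, fence_between: bool = False) -> bool:
    """Return True if X » Y forces X → Y under Rules 2-5."""
    stores = ("st", "st.rel")
    if x.kind in stores and y.kind in stores and x.addr == y.addr:
        return True   # Rule 2: stores to the same location
    if fence_between:
        return True   # Rule 3: a fence separates X and Y
    if x.kind == "ld.acq":
        return True   # Rule 4: X is an Acquire load
    if y.kind == "st.rel":
        return True   # Rule 5: Y is a Release store
    return False      # otherwise X and Y may execute in any order
```

For instance, an ordinary store followed by a release store is ordered (Rule 5), while a release store followed by an ordinary store to a different location is not.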
Under all other circumstances, X and Y can get
executed in any order. [1] also asserts the following
constraints on executions:
Coherence: There is a single visibility order (which
is also a total order) of all stores per memory location
observed by all the processors. Further, this total
order is consistent with the program order for memory
operations on that location in each processor.
RC_tso: Intuitively, ld.acq and st.rel are used to
"bracket" instruction sequences, to permit more liberal
execution orders for instructions in between. To a
crude approximation, ld.acq and st.rel are strongly
ordered as in sequential consistency [7]. However, more
precisely viewed, RC_tso [2] captures the orderings
involving ld.acq and st.rel as per Release Consistency [7],
with ld.acq and st.rel obeying TSO [5]. Under RC_tso,
there is a single global visibility order of all Release
Stores, with the exception that each processor may see
(via ordinary or acquire loads) its own updates earlier
than when other processors see them. Further, this global
visibility order is a total order consistent with program
order of all release stores in each processor.
Causality: When a st.rel instruction X in some
processor P1 is read by an ld.acq instruction Y in another
processor P2, then no store instruction following Y in
program order may become visible to any processor before
X is visible.
2.1 Examples 
The following examples (some from [1]) illustrate
the Itanium ordering rules (assume that each memory
location has value 0 in the beginning).
• The following execution is invalid due to Rule 2
pertaining to WAW, Rule 4 pertaining to Acquire, and the
requirement of coherence:
[Example execution not recoverable from the source]
• Fence (Rule 3) is illustrated by the following invalid
execution. Here, ld(B,0) and ld(A,0) are seen after
a fence, while the stores that supply new values into A
and B are not getting flushed as is required by fences:
P          Q
st(A,1)    st(B,1)
Fence      Fence
ld(B,0)    ld(A,0)
• Acquire and Release (Rules 4 and 5) are illustrated
by the following invalid execution. Here,
st(A,1) precedes st.rel(B,1), while ld.acq(B) precedes
ld(A). However, the execution shows ld(A,0):
P              Q
st(A,1)        ld.acq(B,1)
st.rel(B,1)    ld(A,0)
• Coherence is illustrated by the following invalid
execution. This is because the Acquire semantics forces
the ld instructions to occur after the ld.acq
instructions. However, processors R and S are observing the
updates to A in different orders:
P          Q          R              S
st(A,1)    st(A,2)    ld.acq(A,1)    ld.acq(A,2)
                      ld(A,2)        ld(A,1)
• RC_tso is illustrated by the following valid execution.
The store of P into A is locally visible to P (say, via
a cache or store buffer) before it becomes visible to all
[Remainder of sentence and example execution not recoverable from the source]
• Another aspect of release stores is that all the st.rel
of all the processors taken together form a single global
visibility order that is also a total order. Considering
this, the following outcome is not valid, because Q and
[Remainder of sentence and example execution not recoverable from the source; surviving fragments: ld(A,0), st.rel(B,1)]
• The following example violates the causality rule.
The st.rel(A,1) of P is observed by Q via a ld.acq.
Further, st(B,1) of Q is observed by the ld.acq of R.
Causality now requires that st(B,1) must be visible
to R only after st.rel(A,1). However, in this example
R still reads the old value of A via ld(A,0):
P              Q              R
st.rel(A,1)    ld.acq(A,1)    ld.acq(B,1)
               st(B,1)        ld(A,0)
2.2 The Operational Model
Each instruction issued by a processor is modeled 
by a tuple t = (p,l,o,a,d,v). The field p of tuple t is 
selected by p(t), and so on. Here, 
Figure 1. An Operational Model of Itanium
• p(t) is the processor issuing the instruction.
• l(t) is the ordinal position ("label") of the instruction
in the sequential program running on p(t).
• o(t) is the operation type, which can be ld, ld.acq,
st, st.rel, or Fence.
• a(t) is the memory location to be written into (if
o(t) ∈ {st, st.rel}), or to be loaded from (if o(t) ∈
{ld, ld.acq}).
• d(t) is the data value to be written into a (for
stores), or to be loaded from a (for loads).
• v(t) is a vector of labels, whose purpose is to model
causality, as will be explained shortly.
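As a concrete reading of the tuple t = (p,l,o,a,d,v), the fields can be written down as a small record type. This is a minimal sketch; the class name `Instr`, the use of `None` for undefined fields, and the choice of integers and strings for the field types are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Instruction tuple t = (p, l, o, a, d, v); None marks an undefined field."""
    p: int                               # issuing processor
    l: int                               # label: position in p's program
    o: str                               # "ld", "ld.acq", "st", "st.rel", "Fence"
    a: Optional[str] = None              # memory location (undefined for Fence)
    d: Optional[int] = None              # data written or returned
    v: Optional[Tuple[int, ...]] = None  # label vector (set only for st)


# A fence carries no address, data, or label vector.
fence = Instr(p=0, l=3, o="Fence")
# An ordinary store stamped with a label vector over three processors.
store = Instr(p=1, l=2, o="st", a="B", d=1, v=(1, 0, 0))
```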
Some of the fields of a tuple t may be undefined for
certain instructions, as will be apparent from the
transition system. The operational semantics is now
described in terms of five data structures held by each
processor P_i (see Figure 1), and how each instruction
tuple t that is issued updates these data structures
and/or returns the read value, as per Table 1. By
'buffer' we mean an unbounded structure in which the
entries maintain their arrival order as in a FIFO, but
entries may be removed from anywhere provided a
removal condition is satisfied. The oldest entry is always
at the head and the youngest at the tail. Initially, all
buffers are empty. The data structure elements are:
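Event: ld.acq(t)
  Guard:  ∃ t' ∈ WOB_p(t): a(t') = a(t) ∧ d(t') = d(t)
          ∧ t' youngest for address a(t)
          else
          M_p(t)[a(t)] = d(t)
          ∧ ¬∃ t' ∈ WIB_p(t): p(t') = p(t) ∧ a(t') = a(t)
  Action: (d(t) is returned)
Event: ld(t)
  Guard:  ∃ t' ∈ WOB_p(t): a(t') = a(t) ∧ d(t') = d(t)
          ∧ t' youngest for address a(t)
          else true
  Action: (d(t) is returned on a hit) else Issue(RB_p(t), t)
Event: st.rel(t)
  Guard:  true
  Action: Issue(WOB_p(t), t)
Event: st(t)
  Guard:  true
  Action: t ← t[L_p(t)/v]; Issue(WOB_p(t), t)
Event: Fence(t)
  Guard:  true
  Action: Flush(t)
Event: MW(t)
  Guard:  t ∈ WIB_p(t) ∧ Allowed(WIB_p(t), t)
  Action: M_p(t)[a(t)] ← d(t); Delete(WIB_p(t), t);
          if (o(t) = st.rel) L_p(t)[p(t)] ← l(t)
Event: MR(t)
  Guard:  t ∈ RB_p(t) ∧ M_p(t)[a(t)] = d(t)
          ∧ ¬∃ t' ∈ WIB_p(t): p(t') = p(t) ∧ a(t') = a(t)
  Action: Delete(RB_p(t), t)
Event: Del(t)
  Guard:  t ∈ WOB_p(t)
  Action: ProcWOB(WOB_p(t), t)
Table 1. Transition System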
1. a memory M_i that spans the entire address space
of Itanium and holds word-sized data in each
location. M_i is updated when the MW(t) event of
Table 1 fires, which removes an entry from WIB_i and
writes it into M_i. Initially, each location of M_i
carries data 0.
2. a write-out buffer WOB_i into which st and st.rel
are enqueued. When the Del(t) event of Table 1
fires, an entry is removed from WOB_i and atomically
copied into all WIB_j.
3. a load buffer RB_i into which ld instructions (but
not ld.acq) are enqueued when event ld(t) of
Table 1 fires. Later, when an MR(t) event fires, a
tuple t is removed from RB_i, and data d(t)
corresponding to this tuple gets returned.
4. a write-in buffer WIB_i, and
5. a label-vector L_i held by each processor P_i. This
is a vector of natural numbers, with each entry
initialized to 0. Specifically, L_i[j] holds the label
of the last st.rel instruction of P_j that has already
been written into M_i. In other words, L_i[j]
indicates the (release-store) instruction of P_j up to
which M_i has "caught up." To maintain this
invariant, whenever any memory location in M_i gets
updated by a release store operation represented
by the tuple t, L_p(t)[p(t)] gets set to the value l(t),
which is the label of the release instruction
represented by t. Before an st instruction is enqueued
into WOB_i, the v(t) field of this instruction is set
to the current L_i value.
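The five per-processor structures, and the way the label vector is maintained and copied into ordinary stores, can be sketched as follows. This is an illustrative simplification rather than the model itself: the class `Proc`, its method names, and the dictionary representation of tuples are assumptions, and `post_to_memory` deliberately omits the Allowed guard of Table 1.

```python
from collections import defaultdict


class Proc:
    """Per-processor state: M, WOB, RB, WIB and the label vector L."""

    def __init__(self, pid, nprocs):
        self.pid = pid
        self.M = defaultdict(int)  # memory; every location initially 0
        self.WOB = []              # write-out buffer (head = index 0)
        self.RB = []               # load buffer
        self.WIB = []              # write-in buffer
        self.L = [0] * nprocs      # L[j]: last st.rel of P_j written into M

    def issue_store(self, label, addr, data, release=False):
        """Enqueue a st or st.rel; an ordinary st is stamped with the current L."""
        entry = {"p": self.pid, "l": label,
                 "o": "st.rel" if release else "st",
                 "a": addr, "d": data,
                 "v": None if release else tuple(self.L)}
        self.WOB.append(entry)
        return entry

    def post_to_memory(self, entry):
        """MW-style update of M from WIB (the Allowed guard is not checked here)."""
        self.WIB.remove(entry)
        self.M[entry["a"]] = entry["d"]
        if entry["o"] == "st.rel":
            self.L[entry["p"]] = entry["l"]  # record the release caught up with


# Q (processor 1 of 3) absorbs P's st.rel(A,1) with label 1, then issues st(B,1).
q = Proc(pid=1, nprocs=3)
rel = {"p": 0, "l": 1, "o": "st.rel", "a": "A", "d": 1, "v": None}
q.WIB.append(rel)
q.post_to_memory(rel)
s = q.issue_store(label=2, addr="B", data=1)
```

In this run the ordinary store issued by Q carries the label vector (1, 0, 0), recording that Q's memory had already caught up with the release store labelled 1 from processor 0.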
2.3 State Transition Rules 
Table 1 defines the operational semantics of the
Itanium shared memory model. The first column shows
Events that happen if the guard condition in the
second column is true, performing the actions shown in
the third column. At any time, any one of the eligible
events may be picked in a fair manner. Each event
happens when the next instruction t is issued by
processor p(t) (for events ld.acq(t) through Fence(t)), or
when an instruction is removed from one of the internal
buffers and is carried out (for events MW(t), MR(t),
and Del(t)). Notice that in case of events ld.acq(t),
ld(t), as well as MR(t), tuple t carries the data d(t)
being returned (following the convention used in [8]).
When these events fire, a constraint expressed in the
Guard field shows what this data is. We use = for
equality testing, and ← for assignment.
ld.acq(t): If the next instruction tuple t of processor
p(t) is a ld.acq, we perform the ld.acq(t) event. We seek
an entry t' in WOB_p(t) such that a(t') = a(t), and t' is
the youngest such entry, if multiple entries exist. If t'
exists, the returned data d(t) is the same as d(t'). If no
such entry exists ("else"), ld.acq must get serviced from
the memory M_p(t), and then only when there is no
tuple t' in the WIB_p(t) buffer such that p(t') = p(t)
and a(t') = a(t). The condition p(t') = p(t) prevents a
ld.acq from bypassing an earlier issued st or st.rel on
the same address.
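The servicing of ld.acq described above can be sketched as a function that either returns the value read or reports that the load must wait. The function name `ld_acq`, the dictionary representation of tuples, and the use of `None` to mean "not yet enabled" are assumptions made for this sketch.

```python
def ld_acq(pid, addr, wob, wib, mem):
    """Value an ld.acq by processor pid may return now, or None if it must wait.

    wob and wib are lists ordered oldest first; mem maps addresses to data.
    """
    hits = [t for t in wob if t["a"] == addr]
    if hits:
        return hits[-1]["d"]  # youngest matching WOB entry supplies the data
    # Otherwise read memory, but not while an own store to addr sits in WIB.
    if any(t["p"] == pid and t["a"] == addr for t in wib):
        return None
    return mem.get(addr, 0)   # locations initially hold 0


wob = [{"p": 0, "a": "A", "d": 1}, {"p": 0, "a": "A", "d": 2}]
own_in_wib = [{"p": 0, "a": "B", "d": 7}]
other_in_wib = [{"p": 1, "a": "B", "d": 9}]
```

With these inputs, a load of A hits the youngest WOB entry and reads 2, a load of B by processor 0 must wait while its own store to B is still in WIB, and a load of B is served from memory when only another processor's store to B is pending there.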
ld(t): As with ld.acq(t), the ld(t) event is serviced
directly by WOB_p(t) upon a 'hit'; otherwise, t is
enqueued into RB_p(t) via Issue(RB_p(t), t).
st.rel(t): results in t being enqueued into WOB_p(t) via
procedure Issue.
st(t): first updates the v field of tuple t with the
label vector L_p(t) (shown by t ← t[L_p(t)/v]), and then
enqueues the resulting tuple t into WOB_p(t) via
procedure Issue.
Fence(t): is carried out by procedure Flush, which
flushes every pending RB_p(t) entry, every WOB_p(t)
entry, and every WIB_j entry for all j, where the entry
comes from p(t) and occurs earlier than t in program
order.
MW(t): updates the memory array M_p(t) from
WIB_p(t). Its guard 'Allowed' captures when tuple t,
which is present in WIB_p(t), can be processed ahead
of all the other tuples within WIB_p(t). This is precisely
when there is no older WIB_p(t) entry t' for which one of
the following four conditions holds: (i) a(t) = a(t'),
(ii) both t and t' are st.rel, (iii) both come from
p(t) with o(t) = st.rel, (iv) the label of t' matches
v(t)[p(t')], which is the label of the last st.rel from
p(t') seen by p(t), o(t') = st.rel, and o(t) = st.
Condition (iv) blocks the st from happening until after M_p(t)
also has assimilated t', ensuring causality. When event
MW(t) fires, M_p(t) is first updated, and tuple t is then
deleted from WIB_p(t) by procedure Delete. Also, if the
operation of tuple t is st.rel, the label-vector L_p(t) is
updated with the label l(t) carried by tuple t to record
the release-store up to which M_p(t) has caught up.
MR(t): represents when a tuple t buffered in RB_p(t)
(corresponding to an ld instruction) gets serviced. This
event is allowed when memory array M_p(t) holds d(t)
at address a(t), and there is no t' in WIB_p(t) with a
matching address from the same processor.
Del(t): calls procedure ProcWOB, which first checks if
o(t) = st and there is an entry t' in either RB_p(t) or
WOB_p(t) with address a(t) and a lower label, or if
o(t) = st.rel and there is an entry t' in
either RB_p(t) or WOB_p(t) with a lower label. If neither,
ProcWOB deletes t from WOB, copying it atomically
into every WIB. The functions used in the transition
system are now described.
Flush(t):
  WHILE ∨ (len(WOB_p(t)) > 0)
        ∨ (len(RB_p(t)) > 0)
        ∨ (∃ i, t' ∈ WIB_i: p(t) = p(t') ∧ l(t') < l(t))
  DO FOR t'' ∈ RB_p(t) DO an MR(t'') event
     FOR t'' ∈ WOB_p(t) DO ProcWOB(WOB_p(t), t'')
     FOR t'' ∈ some WIB_i where p(t) = p(t'') ∧ l(t'') < l(t)
         DO MW(t'')
  END WHILE
ProcWOB(WOB_p(t), t):
  IF ∨ (o(t) = st ∧ ¬∃ t' ∈ {RB_p(t), WOB_p(t)}:
          a(t) = a(t') ∧ l(t') < l(t))
     ∨ (o(t) = st.rel ∧ ¬∃ t' ∈ {RB_p(t), WOB_p(t)}:
          l(t') < l(t))
  THEN
     Delete t from WOB_p(t);
     FOR all i DO Issue(WIB_i, t)
  END IF
Allowed(WIB_p(t), t):
  ¬∃ t' ∈ WIB_p(t):
     t' older than t
     ∧
     ( ∨ (a(t) = a(t'))
       ∨ (o(t) = o(t') = st.rel)
       ∨ (p(t) = p(t') ∧ o(t) = st.rel)
       ∨ (l(t') = v(t)[p(t')]
          ∧
          (o(t') = st.rel ∧ o(t) = st))
     )
Issue(Buffer, t): Add t to the tail of Buffer as in a
FIFO queue.
Delete(Buffer, t): Here, Buffer is either RB or WIB.
This procedure deletes t wherever it may be in Buffer.
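Function Allowed can be read as a scan over the entries of a write-in buffer that are older than t. The following is a minimal sketch of that reading, assuming tuples are represented as dictionaries with keys p, l, o, a, d, v and that the buffer is a list ordered oldest first; the name `allowed` and this representation are illustrative choices, not part of the model.

```python
def allowed(wib, t):
    """True if t may be written to memory ahead of every older entry in wib."""
    position = wib.index(t)
    for older in wib[:position]:
        same_addr = older["a"] == t["a"]                              # (i)
        both_rel = older["o"] == "st.rel" and t["o"] == "st.rel"      # (ii)
        same_proc_rel = older["p"] == t["p"] and t["o"] == "st.rel"   # (iii)
        causal = (t["o"] == "st" and older["o"] == "st.rel"           # (iv)
                  and t["v"] is not None
                  and older["l"] == t["v"][older["p"]])
        if same_addr or both_rel or same_proc_rel or causal:
            return False  # some older entry must be processed first
    return True


# Two ordinary stores to different addresses from different processors.
st_a = {"p": 0, "l": 1, "o": "st", "a": "a", "d": 1, "v": (0, 0)}
st_b = {"p": 1, "l": 1, "o": "st", "a": "b", "d": 2, "v": (0, 0)}
# A release store from processor 0 that arrives after st_a.
rel_c = {"p": 0, "l": 2, "o": "st.rel", "a": "c", "d": 3, "v": None}
```

With `wib = [st_a, st_b, rel_c]`, both ordinary stores are eligible in either order, while the release store is held back by the older store from the same processor (condition (iii)).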
3 Analysis of our Operational Model 
We now show how our operational model meets the 
requirements laid out in Section 2. 
1. RAW for load operation r and store operation w
earlier in program order: (i) If r is satisfied when
it hits a w in WOB (Table 1, event ld.acq(t) or
ld(t)), RAW is satisfied. (ii) If r is satisfied from
memory and w has already been written into
memory, the RAW hazard is avoided. If, however,
w is in WIB and hence blocks r (which is in RB) from
issuing, we freeze r until w is written into
memory (Table 1, event MR(t)). Thus, here also the
RAW hazard is avoided.
2. WAR for load r and store w from the same
processor: Note that w cannot move from WOB to
WIB until the load is drained from RB (see
ProcWOB). Hence, WAR hazards are avoided.
3. WAW, as well as visibility order for stores to the
same location, are guaranteed as follows. If there
are two stores to the same address in WOB, event
Del(t) removes them in the oldest-first order. If
the second store comes while the first has gone
into WIB, then the "t' older than t" check in
function Allowed prevents a younger write from
overtaking an older one.
4. Fence: Procedure Flush carries out all "preceding"
instructions before allowing instruction issuing to
resume. Hence Rule 3 is obeyed.
5. Acq: Hazard aspects of Acq have already been
covered. Since Acq blocks further instruction
issuing until it gets carried out (see event ld.acq(t)),
Rule 4 pertaining to visibility is satisfied.
6. Rel: Loads that come before st.rel are handled by
ProcWOB, which checks for loads with lower labels.
Stores before st.rel are also checked in a similar
manner. A st.rel that enters WIB when there
is another store in WIB from the same processor
is prevented from reordering by function Allowed.
This meets Rule 5.
7. Coherence: The rules for handling WOB and
WIB ensure Coherence.
8. RC_tso: Handling of WOB and WIB ensures a
total global visibility order of release stores. The
"TSO" aspect of RC_tso comes naturally because
each processor may see its own update early via
the WOB, exactly as in classical TSO [5].
All rules except for causality have been discussed.
We now discuss causality in some detail. Causality
can be summarized at a high level as follows: "Before
any st operation o is posted into any M_i, ensure that
every st.rel operation r that o is 'causally dependent
upon' has already been updated into M_i." "Causally
dependent on" means o was issued by some P_j after it
had updated its own store M_j with the value provided
by r.
Causality is obeyed to a certain extent. Specifically,
if a st.rel satisfies a ld.acq instruction, then all
subsequent store operations following that ld.acq instruction
in program order will be visible to all processors after
that st.rel operation. It suffices to prove this
condition by proving that if X is a st.rel from any processor
p(X) satisfying Y, which is a ld.acq in processor p(Y),
Z is a st to any memory address in p(Z) where Y » Z
(hence p(Y) = p(Z)), and X → Y in p(Y), then X →
Z for any processor P_k. Since X → Y,
• X must have been updated in M_p(Y) by the time Y
is carried out,
• the label vector v(Z) must reflect the update of
X, i.e., v(Z)[p(X)] ≥ l(X), and
• for any other processor P_k, either X is updated in
M_k or else it resides in WIB_k. When Z gets issued
to all WIB buffers, and in particular WIB_k, it
cannot participate in the MW(t) event before X
can do so, due to the behavior of function Allowed.
As an example, consider the earlier discussed example,
now with labels:
P                 Q                 R
1: st.rel(A,1)    1: ld.acq(A,1)    1: ld.acq(B,1)
                  2: st(B,1)        2: ld(A,0)
The label vector carried by instruction st(B,1)
would be [1,0,0] because Q would have seen the
st.rel(A,1) instruction of P situated at label 1 when
it issues st(B,1). If st.rel(A,1) still resides in the
WIB_R buffer when st(B,1) also enters WIB_R,
function Allowed ensures that the former is posted into M_R
before the latter. Thus, ld(A,0) is impossible in R.
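The labelled example can be checked mechanically by enumerating the orders in which the two entries of WIB_R could be posted into M_R. The sketch below considers only condition (iv) of function Allowed; the helper names `causally_blocked` and `legal_posting_orders`, the dictionary representation of tuples, and the numbering of P, Q, R as processors 0, 1, 2 are assumptions made for illustration.

```python
from itertools import permutations

# Entries of WIB_R, oldest first: P's release store, then Q's ordinary store.
rel_A = {"p": 0, "l": 1, "o": "st.rel", "a": "A", "d": 1, "v": None}
st_B = {"p": 1, "l": 2, "o": "st", "a": "B", "d": 1, "v": (1, 0, 0)}


def causally_blocked(t, older):
    """Condition (iv): st t must wait for the st.rel whose label it recorded."""
    return (t["o"] == "st" and older["o"] == "st.rel"
            and older["l"] == t["v"][older["p"]])


def legal_posting_orders(wib):
    """Posting orders (as address lists) that never violate condition (iv)."""
    legal = []
    for order in permutations(wib):
        ok = True
        for i, t in enumerate(order):
            # Entries older than t in wib that would be posted after t.
            for other in order[i + 1:]:
                if wib.index(other) < wib.index(t) and causally_blocked(t, other):
                    ok = False
        if ok:
            legal.append([e["a"] for e in order])
    return legal


orders = legal_posting_orders([rel_A, st_B])
```

Here `orders` contains the single order A then B, so R cannot observe B = 1 in its memory while A is still 0. If st(B,1) carried the vector (0, 0, 0) instead, both posting orders would be permitted by this condition.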
3.1 Ordering Relaxations
We now discuss a few examples of ordering
relaxations correctly supported by our model.
Releases can be bypassed by subsequent operations.
Moreover, these operations may bypass operations
preceding the release. For example, when a processor
issues st(a,1), st.rel(b,1), and st(c,1) in program order,
st(c,1) can bypass both st.rel(b,1) and st(a,1).
This is supported by our operational model as
follows. Suppose these instructions are in WOB.
ProcWOB will consider st(c,1) as well as st(a,1) eligible
for movement into WIB, because, for st instructions,
the label comparisons are done address-wise.
However, ProcWOB will not be able to move st.rel(b,1)
into WIB before it moves st(a,1), because for release
stores, label comparisons are across all addresses.
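The eligibility test applied by ProcWOB in this example can be sketched as follows. The function name `may_leave_wob` and the dictionary representation of tuples are assumptions; the program order st(a,1), st.rel(b,1), st(c,1) is the one implied by the discussion above.

```python
def may_leave_wob(t, wob, rb):
    """True if ProcWOB would move t from WOB into the WIBs."""
    pending = [e for e in wob + rb if e is not t]
    if t["o"] == "st":
        # Ordinary stores compare labels only against the same address.
        return not any(e["a"] == t["a"] and e["l"] < t["l"] for e in pending)
    if t["o"] == "st.rel":
        # Release stores compare labels across all addresses.
        return not any(e["l"] < t["l"] for e in pending)
    return False


st_a = {"l": 1, "o": "st", "a": "a", "d": 1}
rel_b = {"l": 2, "o": "st.rel", "a": "b", "d": 1}
st_c = {"l": 3, "o": "st", "a": "c", "d": 1}
wob = [st_a, rel_b, st_c]
eligible = [t["a"] for t in wob if may_leave_wob(t, wob, rb=[])]
```

With all three stores in WOB, `eligible` lists a and c but not b: the release store becomes eligible only once st(a,1) has left the buffer, whereas st(c,1) is free to go first.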
Itanium is not required to provide any global total
order for st instructions. In this example,
P1         P2         P3             P4
st(a,1)    st(b,2)    ld.acq(a,1)    ld.acq(b,2)
                      ld(b,0)        ld(a,0)
it allows P3 to see st(a,1) before st(b,2) and vice
versa in P4. This relaxation is supported by function
Allowed. Suppose st(a,1) and st(b,2) are both in
WIB_P3 and WIB_P4 in some order. Function Allowed
can pick st(a,1) to post first in M_P3, while it can pick
st(b,2) to post first in M_P4.
3.2 How R»R may impact causality
It is unclear by reading [1] whether the following
execution is legal or not:
P1             P2             P3         P4
ld(A,1)        st.rel(A,1)    st(A,2)    ld.acq(B,1)
ld.acq(A,2)                              ld(A,0)
st(B,1)
If the instructions ld(A,1) and ld.acq(A,2) are
ordered because they are loads on the same location,
then the following consequences of causality
emerge. We have st.rel(A,1) being ordered before
ld.acq(A,2) in the visibility order of P1. Due
to the acquire semantics, st(B,1) is performed after
ld.acq(A,2). The situation is quite analogous to
the Causality example in Section 2.1, except the causal
chain forms through a load-to-load order. Now, since
st(B,1) is observed by ld.acq(B,1), we cannot have
ld(A,0) in P4 due to causality. It is unknown to
us whether load-to-load orderings such as that between
ld(A,1) and ld.acq(A,2) are to be obeyed, and if
so, whether they must give rise to causal chains in this
fashion.
4 Concluding Remarks
In this paper, we provided a simple operational
model for Itanium™ shared memory consistency. Our
operational model is based on three buffers, a memory
array, a label-array, and a collection of
non-deterministic rules to process loads, stores, and fences
with respect to these data structures. We point out
aspects of this memory model, including causality rules.
We believe that our model can form a concrete point
of discussion for understanding the Itanium™
processor. We also anticipate usage in formal verification, as
well as easy adaptation through changes to the rules to
other memory models.
References
[1] Intel, The IA-64 Architecture Software Developer's
Manual Vol. 2 rev. 1.1: Itanium (TM) System Architecture,
Intel, 2000, Volume 2, Chapter 13, "Coherence
and MP Ordering." http://developer.intel.com/design/ia-64/downloads/24531802.htm.
[2] Gil Neiger, 2001, http://www.cs.utah.edu/mpv/papers/neiger/fmcad2001.pdf.
[3] David L. Dill, Seungjoon Park and Andreas Nowatzyk,
"Formal Specification of Abstract Memory Models", in
Gaetano Borriello and Carl Ebeling, editors, Research
on Integrated Systems, pp. 38-52. MIT Press, 1993.
[4] Ratan Nalumasu, Rajnish Ghughal, Abdel Mokkedem
and Ganesh Gopalakrishnan, "The 'Test Model-Checking'
Approach to the Verification of Formal Memory
Models of Multiprocessors", in Alan J. Hu and
Moshe Y. Vardi, editors, Computer Aided Verification,
volume 1427 of Lecture Notes in Computer Science, pp.
464-476, Vancouver, BC, Canada, June 1998, Springer-Verlag.
[5] David L. Weaver and Tom Germond, The SPARC
Architecture Manual - Version 9, P T R Prentice-Hall,
Englewood Cliffs, NJ 07632, USA, 1994.
[6] Mustaque Ahamad, Gil Neiger, James E. Burns, Prince
Kohli and Phillip W. Hutto, "Causal Memory: Definitions,
Implementation and Programming", Distributed
Computing, vol. 9, n. 1, pp. 37-49, 1995.
[7] Sarita V. Adve and Kourosh Gharachorloo, "Shared
memory consistency models: A tutorial", Computer,
vol. 29, n. 12, pp. 66-76, December 1996.
[8] Rob Gerth, "Sequential Consistency and the Lazy
Caching Algorithm", Distributed Computing, vol. ?,
n. 12, pp. 57-59, 1999.
A Details of function Allowed 
Why l(t') = v(t)[p(t')] and not l(t') ≤ v(t)[p(t')] is
used in function Allowed: Suppose l(t') < v(t)[p(t')].
Then the value L = v(t)[p(t')] corresponds to some
instruction, say t''. There are two cases: (i) t'' is in
WIB_p(t). In this case, function Allowed ensures that
t' gets posted into M_p(t) before t''. Then, we will be
back to the = test. (ii) t'' has already been posted into M, in
which case it isn't in WIB_p(t). This is a contradiction
because t' is still in WIB, violating Allowed.
