Analyzing the Intel Itanium memory ordering rules using logic programming and SAT by Yang, Yue
Analyzing the Intel Itanium Memory Ordering
Rules Using Logic Programming and SAT *
Yue Yang, Ganesh Gopalakrishnan, Gary Lindstrom, and Konrad Slind
School of Computing, University of Utah
{yyang, ganesh, gary, slind}@cs.utah.edu
Abstract. We present a non-operational approach to specifying and
analyzing shared memory consistency models. The method uses higher
order logic to capture a complete set of ordering constraints on execution
traces, in an axiomatic style. A direct translation of the semantics to a
constraint logic programming language provides an interactive and
incremental framework for exercising and verifying finite test programs. The
framework has also been adapted to generate equivalent boolean
satisfiability (SAT) problems. These techniques make a memory model
specification executable, a powerful feature lacking in most non-operational
methods. As an example, we provide a concise formalization of the Intel
Itanium memory model and show how constraint solving and SAT
solving can be effectively applied for computer aided analysis. Encouraging
initial results demonstrate the scalability for complex industrial designs.
1 Introduction
Modern shared memory architectures rely on a rich set of memory-access
related instructions to provide the flexibility needed by software. For instance, the
Intel Itanium™ processor family [1] provides two varieties of loads and stores
in addition to fence and semaphore instructions, each associated with different
ordering restrictions. A memory model defines the underlying memory ordering
semantics (also known as memory consistency). Proper understanding of
these ordering rules is essential for the correctness of shared memory consistency
protocols that are aggressive in their ordering permissiveness, as well as
for compiler transformations that rearrange multithreaded programs for higher
concurrency and minimal synchronization. Due to the complexity of advanced
computer architectures, however, practicing designers face a serious problem in
reliably comprehending the memory model specification.
Consider, for example, the assembly code shown in Fig. 1 that is run concurrently
on two Itanium processors (such code fragments are generally known as
litmus tests): The first processor, P1, executes a store of datum 1 into address a;
it then performs a store-release¹ of datum 1 into address b. Processor P2 performs
a load-acquire from b, loading the result into register r1. It is followed by an
ordinary load from location a into register r2. The question arises: if all locations
* This work was supported by a grant from the Semiconductor Research Corporation
for Task 1031.001, and Research Grants CCR-0081406 and CCR-0219805 of NSF.
P1              P2
st a,1;         ld.acq r1,b;
st.rel b,1;     ld r2,a;
Fig. 1. A litmus test showing the ordering properties of store-release and load-acquire.
Initially, a = b = 0. Can it result in r1 = 1 and r2 = 0? The Itanium memory model
does not permit this result. However, if the load-acquire in P2 is changed to an ordinary
load, the result would be allowed.
initially contain 0, can the final register values be r1=1 and r2=0? To determine
the answer, the Itanium memory model must be consulted. The formal
specification of the Itanium memory model is given in an Intel application note [2]. It
comprises a complex set of ordering rules, 24 of which are expressed explicitly
based on a large amount of special terminology. One can follow a
pencil-and-paper approach to reason that the execution shown in Fig. 1 is not permitted by
the rules specified in [2]. Based on this, one can conclude that even though the
instructions in P2 pertain to different addresses, the underlying hardware is not
allowed to carry out the ordinary load at the beginning, and by the same token,
a shared memory consistency protocol or an optimizing compiler cannot reorder
the instructions in P2. A further investigation shows that the above result would
be permitted if the st.rel in P1 is changed to a st, or the ld.acq in P2 is
changed to a ld. Therefore, st.rel and ld.acq must be used together as a pair to
achieve the “barrier” effect in this scenario.
A litmus test like this can reveal crucial information to help system
designers make the right decisions in code selection and optimizations. But as bigger tests
are used and more intricate rules are involved, trace properties quickly become
non-intuitive and hand-proving program compliance can be very difficult. How
can one be assured that there does not exist an interacting rule that might
introduce unexpected implications? Also, a large scale design is often composed
from simpler components. To avoid being overwhelmed by the overall
complexity, a useful technique is to isolate the rules related to a specific architectural
feature so that one can analyze the model piece by piece. For example, if one
can selectively enable/disable certain rules, he or she may quickly find out that
the “program order” rules in [2] are critical to the scenario in Fig. 1 while many
others are irrelevant.
These issues suggest that a series of useful features are needed from the
specification framework to help people better understand the underlying model.
Unfortunately, most non-operational specification methods leave these issues
unresolved because they use notations that do not support analysis through
execution. Given that designers need lucid and reliable memory model specifications,
and given that memory model specifications can live for decades, it is crucial
that progress be made in this regard.
1 Briefly, a store-release instruction will, at its completion, ensure that all previous
instructions are completed; a load-acquire instruction correspondingly ensures that
all following instructions will complete only after it completes. These explanations
are far from precise - what do “previous” and “completion” mean? A formal
specification of a memory model is key to precisely capturing these and all similar notions.
In this paper, we take a fresh look at the non-operational specification method
and explore what verification techniques can be applied. We make the following
contributions in this paper. First, we present a compositional method to
axiomatically capture all aspects of the memory ordering requirements, resulting
in a comprehensive, constraint-based memory consistency model. Second, we
propose a method to encode these specifications using FD-Prolog.² This enables
one to perform interactive and incremental analysis. Third, we have harnessed a
boolean satisfiability checker to solve the constraints. To the best of our
knowledge, this is the first application of SAT methods for analyzing memory model
compliance. As a case study in this approach, we have formalized a large subset
of the Itanium memory model and used constraint programming and boolean
satisfiability for program analysis.
Related work The area of memory model specification has been pursued under
different approaches. Some researchers have employed operational style
specifications [3] [4] [5] [6], in which the update of a global state is defined step-by-step
with the execution of each instruction. For example, an operational model [4]
for Sparc V9 [7] was developed in Murphi. With the model checking capability
supported by Murphi, this executable model was used to examine many code
sequences from the Sparc V9 architecture book. While the descriptions comprising
an operational specification often mirror the decision process of an implementer
and can be exploited by a model checker, they are not declarative. Hence they
tend to emphasize the how aspects through their usage of specific data structures,
not the what aspects that formal specifications are supposed to emphasize.
Other researchers have used non-operational (also known as axiomatic)
specifications, in which the desired properties are directly defined. Non-operational
styles have been widely used to describe conceptual memory models [8] [9]. One
noticeable limitation of these specifications is the lack of a means for automatic
execution. An axiomatic specification of the Alpha memory model was written
by Yu [10] in Lisp in 1995. Litmus tests were written in an S-expression syntax.
Verification conditions were generated for the litmus tests and fed to the Simplify
[11] verifier of Compaq/SRC. In contrast, we specify the modern Itanium
memory model. Our specification is much closer to the actual industrial specification,
thanks to the declarative nature of FD-Prolog. The FD constraint solver offers
a more interactive and incremental environment. We have also applied SAT and
demonstrated its effectiveness.
Lamport and colleagues have specified the Alpha and Itanium memory
models in TLA+ [12] [13]. These specifications also support the execution of litmus
tests. Their approach builds visibility order inductively. While this also
precisely specifies the visibility order, the manner in which such inductive definitions
2 FD-Prolog refers to Prolog with a finite domain (FD) constraint solver. For example,
SICStus Prolog and GNU Prolog have this feature.
Fig. 2. The process of making an axiomatic memory model executable. Legality of a
litmus test can be checked by either a constraint solver or a SAT solver.
are constructed will vary from memory model to memory model, making
comparisons among them harder. Our method instead relies on primitive relations
and directly describes the components that make up a full memory model. This
makes our specifications easier to understand, and more importantly, to
compare against other memory models using the same primitives. This also means
we can disable some sub-rules quite reliably without affecting the other primitive
ordering rules - a danger in a style which merges all the ordering concerns in a
monolithic manner.
Roadmap In the next section, we introduce our methodology. Section 3
describes the Itanium memory ordering rules. Section 4 demonstrates the analysis
of the Itanium memory model through execution. We conclude and propose
future work in Section 5. The concise specification of the Itanium ordering
constraints is provided in the Appendix.
2 Overview of the Framework
A pictorial representation of our methodology is shown in Fig. 2. We use a
collection of primitive ordering rules, each serving a clear purpose, to specify
even the most challenging commercial memory models. This approach mirrors
the style adopted in modern declarative specifications written by the industry,
such as [2]. Moreover, by using pure logic programs supported by certain modern
flavors of Prolog that also include finite domain constraints, one can directly
capture these higher order logic specifications and also interactively execute the
specifications to obtain execution results for litmus tests. Alternatively, we can
obtain SAT instances of the boolean constraints representing the memory model
through symbolic execution, in which case boolean satisfiability tools can be
employed to quickly answer whether certain litmus tests are legal or not.
2.1 Specification Method
To define a memory model, we use predicate calculus to specify all constraints
imposed on an ordering relation order. The constraints are almost completely
first-order; however, since order is a parameter to the specification, the
constraints are most easily captured with higher order predicate calculus (we use the
HOL logic [14]). Previous non-operational specifications often implicitly require
general ordering properties, such as totality, transitivity, and circuit-freeness.
This is the main reason why such specifications cannot readily be executed.
In contrast, we are fully explicit about such properties, and so our constraints
completely characterize the memory model.
2.2 Executing Axiomatic Specifications
A straightforward transcription of the formal predicate calculus specification
into a Prolog-style logic program makes interactive and incremental
execution of litmus tests possible. This encourages exploration and experiment in the
validation and (we anticipate) the development of complex coherence protocols.
To make a specification executable, we instantiate it over a finite execution and
convert the verification problem to a satisfiability problem.
The Algorithm Given a finite execution ops with n operations, there are n²
ordering pairs, constituting an ordering matrix M, where the element M_ij
indicates whether operations i and j should be ordered. We go through each ordering
rule in the specification and impose the corresponding constraint regarding the
elements of M. Then we check the satisfiability of all the ordering requirements.
If such an M exists, the trace ops is legal, and a valid interleaving can be derived
from M. Otherwise, ops is not a legal trace.
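The search over ordering matrices can be illustrated with a small, deliberately naive sketch. It is not the FD-Prolog or SAT implementation described in this paper: it simply enumerates every boolean matrix for a tiny n and keeps those satisfying the general ordering properties plus any extra rules. The names is_linear_order and legal_matrices are hypothetical, and extra rules are modelled as plain Python predicates over M.

```python
from itertools import product

def is_linear_order(m):
    """True if boolean matrix m is an irreflexive, total, transitive order."""
    n = len(m)
    for i in range(n):
        if m[i][i]:
            return False                      # irreflexive: no op precedes itself
        for j in range(n):
            if i != j and m[i][j] == m[j][i]:
                return False                  # exactly one of i<j, j<i must hold
            for k in range(n):
                if m[i][j] and m[j][k] and not m[i][k]:
                    return False              # transitive
    return True

def legal_matrices(n, extra_rules=()):
    """Enumerate all n x n matrices M that satisfy every ordering rule."""
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    found = []
    for bits in product([False, True], repeat=len(cells)):
        m = [[False] * n for _ in range(n)]
        for (i, j), b in zip(cells, bits):
            m[i][j] = b
        if is_linear_order(m) and all(rule(m) for rule in extra_rules):
            found.append(m)
    return found

# Three operations, with one extra (illustrative) rule: op 0 must precede op 2.
solutions = legal_matrices(3, [lambda m: m[0][2]])
print(len(solutions))
```

A trace would be reported legal when the returned list is non-empty. The exhaustive enumeration grows as 2^(n²-n), which is exactly why the paper delegates the search to a constraint solver or a SAT solver.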
Applying Constraint Logic Programming Logic programming differs
fundamentally from conventional programming in that it describes the logical
structure of the problems rather than prescribing the detailed steps of solving them.
This naturally reflects the philosophy of the axiomatic specification style. As a
result, our formal specification can be easily encoded using Prolog. Memory
ordering constraints can be solved through a conjunction of two mechanisms that
FD-Prolog readily provides. One applies backtracking search for all constraints
expressed by logical variables, and the other uses non-backtracking constraint
solving based on arc consistency [15] for FD variables, which is potentially more
efficient and certainly more complete (especially under the presence of negation)
than with logical variables. This works by adding constraints in a monotonically
increasing manner to a constraint store, with the in-built constraint propagation
rules of FD-Prolog helping refine the variable ranges (or concluding that the
constraints are not satisfiable) when constraints are discovered and asserted to
the constraint store.
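To give a feel for how propagation narrows variable ranges without backtracking, the following is a minimal sketch of arc-consistency pruning for constraints of the form x < y. It is an illustration in Python rather than FD-Prolog, and it assumes (our choice, not the paper's encoding) that each FD variable holds the candidate positions of one operation in the interleaving. The function name propagate is hypothetical.

```python
def propagate(domains, less_than):
    """Prune finite domains until every `x < y` constraint is arc consistent.

    domains:   dict mapping variable name -> set of candidate positions
    less_than: list of (x, y) pairs meaning position[x] < position[y]
    Returns False if some domain becomes empty (constraints unsatisfiable).
    """
    changed = True
    while changed:
        changed = False
        for x, y in less_than:
            # keep a value of x only if some larger value of y supports it
            new_x = {a for a in domains[x] if any(a < b for b in domains[y])}
            # keep a value of y only if some smaller remaining value of x supports it
            new_y = {b for b in domains[y] if any(a < b for a in new_x)}
            if new_x != domains[x] or new_y != domains[y]:
                domains[x], domains[y] = new_x, new_y
                changed = True
            if not new_x or not new_y:
                return False
    return True

# Three operations, each initially allowed at positions 1..3.
doms = {"A": {1, 2, 3}, "B": {1, 2, 3}, "C": {1, 2, 3}}
ok = propagate(doms, [("A", "B"), ("B", "C")])
print(ok, doms)
```

Here the constraints are added to the store all at once; asserting them one by one and re-running the propagation would mimic the monotonically growing constraint store described above.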
Applying Boolean Satisfiability Techniques The goal of a boolean
satisfiability problem is to determine a satisfying variable assignment for a boolean
formula or to conclude that no such assignment exists. A slight variant of the
Prolog code can let us benefit from SAT solving techniques, which have advanced
tremendously in recent years. Instead of solving constraints using an FD solver,
we can let Prolog emit SAT instances through symbolic execution. The resultant
formula is satisfiable if and only if the litmus test is legal under the memory model.
It is then sent to a SAT solver to find out the result.
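As a rough illustration of what an emitted SAT instance might look like, the sketch below encodes only the general ordering requirement (a strict total order over n operations) as CNF clauses and prints them in DIMACS form. The variable numbering and the helper names linear_order_cnf, to_dimacs and count_models are assumptions made for this example; the brute-force model counter merely stands in for a real SAT solver on tiny inputs.

```python
from itertools import permutations, product

def linear_order_cnf(n):
    """Return (num_vars, clauses) stating that M is a strict total order."""
    var = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                var[(i, j)] = len(var) + 1    # DIMACS variables start at 1
    clauses = []
    for i in range(n):
        for j in range(i + 1, n):
            clauses.append([var[(i, j)], var[(j, i)]])       # totality
            clauses.append([-var[(i, j)], -var[(j, i)]])     # asymmetry
    for i, j, k in permutations(range(n), 3):
        # transitivity: M_ij and M_jk imply M_ik
        clauses.append([-var[(i, j)], -var[(j, k)], var[(i, k)]])
    return len(var), clauses

def to_dimacs(num_vars, clauses):
    """Render the clause list in DIMACS CNF text format."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)

def count_models(num_vars, clauses):
    """Brute-force stand-in for a SAT solver; only viable for tiny instances."""
    count = 0
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            count += 1
    return count

num_vars, clauses = linear_order_cnf(3)
print(to_dimacs(num_vars, clauses))
```

Each memory ordering rule would contribute further clauses over the same M_ij variables; the litmus test is then legal exactly when the combined formula has a model.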
3 Specifying the Itanium Memory Consistency Model
The original Itanium memory ordering specification is informally given in various
places in the Itanium architecture manual [1]. Intel later provided an
application note [2] to guide system developers. This document uses a combination of
English and informal mathematics to specify a core subset of memory
operations in a non-operational style. We demonstrate how the specification of [2]
can be adapted to our framework to enable computer aided analysis. Virtually
the entire Intel application note has been captured.³ We assume proper address
alignment and common address size for all memory accesses, which would be
the common case encountered by programmers (even these restrictions could be
easily lifted). The detailed definition of the Itanium memory model is presented
in the Appendix. This section explains each of the rules. The following
definitions are used throughout this paper:
Instructions - Instructions with memory access or memory ordering semantics.
Five instruction types are defined in this paper: load-acquire (ld.acq),
store-release (st.rel), unordered load (ld), unordered store (st), and memory fence
(mf). An instruction i may have read semantics (isRd i = true) or write
semantics (isWr i = true). Ld.acq and ld have read semantics. St.rel and st have
write semantics. Mf has neither read nor write semantics. Instructions are
decomposed into operations to allow a finer specification of the ordering properties.
Execution - Also known as a trace, contains all memory operations generated
by a program, with stores being annotated with the write data and loads being
annotated with the return data. An execution is legal if there exists an order
among the operations in the execution that satisfies all memory model
constraints.
Address Attributes - Every memory location is associated with an address
attribute, which can be write-back (WB), uncacheable (UC), or write-coalescing
(WC). Memory ordering semantics may vary for different attributes. Predicate
attribute is used to find the attribute of a location.
Operation Tuple - A tuple containing necessary attributes is used to
mathematically describe memory operations. Memory operation i is represented by a
tuple ⟨P, Pc, Op, Var, Data, WrId, WrType, WrProc, Reg, UseReg, Id⟩, where
3 We have formally captured 21 out of 24 rules from [2]. Semaphore operations, which
require 3 rules, have yet to be defined.
requireLinearOrder            requireMemoryDataDependence   requireReadValue
 - requireIrreflexiveTotal     - MD:RAW                      - validWr
 - requireTransitive           - MD:WAR                      - validLocalWr
 - requireAsymmetric           - MD:WAW                      - validRemoteWr
                                                             - validDefaultWr
requireWriteOperationOrder    requireDataFlowDependence      - validRd
 - local/remote case           - DF:RAR
 - remote/remote case          - DF:RAW                     requireNoUCBypass
                               - DF:WAR
requireProgramOrder                                         requireSequentialUC
 - acquire case               requireCoherence               - RAR case
 - release case                - local/local case            - RAW case
 - fence case                  - remote/remote case          - WAR case
                                                             - WAW case
requireAtomicWBRelease
Table 1. The specification hierarchy of the Itanium memory ordering rules.
p i      = P      : issuing processor
pc i     = Pc     : program counter
op i     = Op     : instruction type
var i    = Var    : shared memory location
data i   = Data   : data value
wrID i   = WrId   : identifier of a write operation
wrType i = WrType : type of a write operation
wrProc i = WrProc : target processor of a write operation
reg i    = Reg    : register
useReg i = UseReg : flag of a write indicating if it uses a register
id i     = Id     : global identifier of the operation
A read instruction or a fence instruction is decomposed into a single
operation. A write instruction is decomposed into multiple operations, comprising
a local write operation (wrType i = Local) and a set of remote write
operations (wrType i = Remote) for each target processor (wrProc i), which also
includes the issuing processor. Every write operation i that originates from a
single write instruction shares the same program counter (pc i) and write ID
(wrID i).
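The decomposition can be sketched as follows. This is a minimal illustration, not the paper's Prolog encoding: the Operation class mirrors the tuple fields listed above with Python-style names, the decompose function and its input format (a list of (processor, pc, instruction, location, data) tuples) are hypothetical, and setting wr_proc of the local write to the issuing processor is our assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Operation:
    p: int                       # issuing processor
    pc: int                      # program counter
    op: str                      # instruction type: ld.acq, st.rel, ld, st, mf
    var: Optional[str]           # shared memory location
    data: Optional[int]          # data value
    wr_id: Optional[int]         # identifier of a write operation
    wr_type: Optional[str]       # "Local" or "Remote" for writes
    wr_proc: Optional[int]       # target processor of a write operation
    id: int                      # global identifier of the operation
    reg: Optional[str] = None    # register
    use_reg: bool = False        # whether a write takes its data from a register

def decompose(program, processors):
    """Split instructions into operations: one local write plus one remote
    write per processor for stores, a single operation otherwise."""
    ops, next_id, next_wr = [], 1, 1
    for p, pc, op, var, data in program:
        if op in ("st", "st.rel"):
            targets = [("Local", p)] + [("Remote", q) for q in processors]
            for wr_type, wr_proc in targets:
                ops.append(Operation(p, pc, op, var, data,
                                     next_wr, wr_type, wr_proc, next_id))
                next_id += 1
            next_wr += 1          # all pieces of one store share a write ID
        else:
            ops.append(Operation(p, pc, op, var, data, None, None, None, next_id))
            next_id += 1
    return ops

# The program of Fig. 1, with loads annotated by the values in question.
fig1 = [(1, 0, "st", "a", 1), (1, 1, "st.rel", "b", 1),
        (2, 0, "ld.acq", "b", 1), (2, 1, "ld", "a", 0)]
trace = decompose(fig1, processors=[1, 2])
print(len(trace))
```

With two processors, each store yields three operations and each load one, giving the eight operations that the execution in Fig. 3 lists.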
3.1 The Itanium Memory Ordering Rules
As shown below, predicate legal is a top-level constraint that defines the
legality of a trace ops by checking the existence of an order among ops that satisfies
all requirements. Each requirement is formally defined in the Appendix.
legal ops = ∃ order.
    requireLinearOrder ops order ∧
    requireWriteOperationOrder ops order ∧
    requireProgramOrder ops order ∧
    requireMemoryDataDependence ops order ∧
    requireDataFlowDependence ops order ∧
    requireCoherence ops order ∧
    requireReadValue ops order ∧
    requireAtomicWBRelease ops order ∧
    requireSequentialUC ops order ∧
    requireNoUCBypass ops order
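The shape of this definition, an existential over order wrapped around a conjunction of rules, can be mimicked in a few lines. The sketch below is a strong simplification and not the Itanium model: every instruction is treated as one atomic operation (no local/remote decomposition), the total order is obtained by enumerating permutations, and only two hypothetical rules are given, a release/acquire program order rule and a read value rule. All function names are illustrative.

```python
from itertools import permutations

# Each operation: (processor, program counter, instruction, location, value)
FIG1 = [
    (1, 0, "st",     "a", 1),
    (1, 1, "st.rel", "b", 1),
    (2, 0, "ld.acq", "b", 1),   # r1 = 1
    (2, 1, "ld",     "a", 0),   # r2 = 0
]

def require_program_order(ops, order):
    """Simplified rule: nothing later moves above an acquire, nothing earlier
    moves below a release (same processor only)."""
    pos = {op: k for k, op in enumerate(order)}
    for x in ops:
        for y in ops:
            if x[0] == y[0] and x[1] < y[1]:      # x precedes y in program order
                if x[2] == "ld.acq" or y[2] == "st.rel":
                    if pos[x] > pos[y]:
                        return False
    return True

def require_read_value(ops, order):
    """A load returns the latest earlier store to its location, or the initial 0."""
    memory = {}
    for proc, pc, ins, loc, val in order:
        if ins.startswith("st"):
            memory[loc] = val
        elif memory.get(loc, 0) != val:
            return False
    return True

def legal(ops, rules):
    """legal ops = exists order. rule_1 ops order and ... and rule_n ops order"""
    return any(all(rule(ops, order) for rule in rules)
               for order in permutations(ops))

print(legal(FIG1, [require_program_order, require_read_value]))
print(legal(FIG1, [require_read_value]))   # program order rule disabled
```

Because the rules are just entries in a list, a rule is disabled by leaving it out, which is the kind of "what if" experiment the framework is meant to support. Under these simplified rules the Fig. 1 outcome is rejected with both rules enabled and accepted once the program order rule is dropped.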
Table 1 illustrates the hierarchy of the Itanium memory model definition.
Most constraints strictly follow the rules from [2]. We also explicitly add a
predicate requireLinearOrder to capture the general ordering requirement since [2]
has only English to convey this important ordering property.
General Ordering Requirement (Appendix A.1) This requires order to
be an irreflexive total order which is also circuit-free.
Write Operation Order (Appendix A.2) This specifies the ordering among
write operations originating from a single write instruction. It guarantees that no
write can become visible remotely before it becomes visible locally.
Program Order (Appendix A.3) This restricts reordering among
instructions of the same processor with respect to the program order.
Memory-Data Dependence (Appendix A.4) This restricts reordering among
instructions from the same processor when they access common locations.
Data-Flow Dependence (Appendix A.5) This is supposed to specify how
local data dependency and control dependency should be treated. However, this
is an area that is not fully specified in [2]. Instead of pointing to an informal
document as done in [2], we provide a formal specification covering most cases
of data dependency, namely establishing data dependency between two memory
operations by checking for conflicting uses of local registers.⁴ Although [2]
outlines four categories for data-flow dependency (RAR, RAW, WAR, and WAW),
the WAW case (a write here is actually a read in terms of register usage, e.g.,
st a,r) does not establish any value-based data dependence relation. Therefore,
data dependency as specified in orderedByLocalDependence is only set up by
the first three cases.
4 We do not cover branch instructions or indirect-mode instructions that also induce
data dependency. We provide enough data dependency specification to let designers
experiment with straight-line code that uses registers - this is an important
requirement to support execution.
Coherence (Appendix A.6) This constrains the order of writes to a common
location. If two writes to the same location with the attribute of WB or UC become
visible to a processor in some order, they must become visible to all processors
in that order.
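A rough sketch of the coherence condition, under simplifying assumptions of our own: only remote write operations are considered, every location is taken to have a WB or UC attribute, and the global order is supplied as a list of hypothetical (wr_id, location, target_processor) events. The function name coherent is illustrative and does not correspond to a predicate in the Appendix.

```python
from itertools import combinations

def coherent(order):
    """order: list of (wr_id, location, target_processor) remote-write events
    in global order. Returns True if every pair of writes to one location
    becomes visible in the same relative order at every processor."""
    pos = {(w, p): k for k, (w, loc, p) in enumerate(order)}
    loc_of = {w: loc for w, loc, p in order}
    procs = {p for _, _, p in order}
    for w1, w2 in combinations(sorted(loc_of), 2):
        if loc_of[w1] != loc_of[w2]:
            continue                      # coherence only relates same-location writes
        seen = {pos[(w1, p)] < pos[(w2, p)]
                for p in procs if (w1, p) in pos and (w2, p) in pos}
        if len(seen) > 1:                 # processors disagree on the order
            return False
    return True

# Writes 1 and 2 both target location a; processors 1 and 2 see them
# in opposite orders, so this global order violates coherence.
print(coherent([(1, "a", 1), (2, "a", 1), (2, "a", 2), (1, "a", 2)]))
```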
Read Value (Appendix A.7) This defines what data can be observed by a
read operation. There are three scenarios: a read can get the data from a local
write (validLocalWr), a remote write (validRemoteWr), or the default value
(validDefaultWr). In validRemoteWr we require that “the read is not ordered
with the candidate remote write”. It is slightly different from [2], which requires
that “the candidate write is ordered with the read”. This results from the
difference in the way the ordering path is constructed. Since we do not have an explicit
rule that establishes the order when a read gets the value from a write, the
ordering relation between them would not pre-exist. In [2], a total order is implicitly
imposed. Similar to shared memory read value rules, predicate validRd
guarantees consistent assignments of registers - the value of a register is obtained from
the most recent previous assignment of the same register.
Total Ordering of WB Releases (Appendix A.8) This specifies that
store-releases to write-back (WB) memory must obey remote write atomicity, i.e.,
they become remotely visible atomically.
Sequentiality of UC Operations (Appendix A.9) This specifies that
operations to uncacheable (UC) memory locations must have the property of
sequentiality, i.e., they must become visible in program order.
No UC Bypassing (Appendix A.10) This specifies that uncacheable (UC)
memory is not cacheable and does not allow local bypassing from UC writes.
4 Making the Itanium Memory Model Executable
We have developed two methods to analyze the Itanium memory model. The
first, as mentioned earlier, uses Prolog backtracking search, augmented with
finite-domain constraint solving. The second approach targets the powerful SAT
engines that have recently emerged.
The Constraint Logic Programming Approach
Our formal Itanium specification is implemented in SICStus Prolog [16]. Litmus
tests are contained in a separate test file.⁵ When a test number is selected,
the FD constraint solver examines all constraints automatically and answers
whether the selected execution is legal. By running the litmus tests we can learn
5 We have verified most of the sample programs provided by [2]. The only 3 (out of 17)
examples we cannot handle at this point involve disjoint accesses to memory locations.
Other litmus tests can also be easily added.
P1                          P2
(1) st_local(a,1);          (7) ld.acq(1,b);
(2) st_remote1(a,1);        (8) ld(0,a);
(3) st_remote2(a,1);
(4) st.rel_local(b,1);
(5) st.rel_remote1(b,1);
(6) st.rel_remote2(b,1);
Fig. 3. An execution resulting from the program in Fig. 1. Stores are decomposed into
local stores and remote stores. Loads are associated with return values.
the degree to which executions are constrained, i.e., we can obtain a general view
of the global ordering relation between pairs of instructions.
Consider, for example, the program discussed earlier in Fig. 1. Its
instructions are decomposed into operations as shown in Fig. 3. After taking this trace
as input, the Prolog tool attempts all possible orders until it can find an
instantiation that satisfies all constraints. For this particular example, it returns “illegal
trace” as the result. If one comments out the requireProgramOrder rule and
examines the trace again, the tool quickly finds a legal ordering matrix and a
corresponding interleaving as shown in Fig. 4. Many other experiments can be
conveniently performed in a similar way. Therefore, not only does this approach
give people the notation to write rigorous as well as readable specifications, it
also allows users to play with the model, asking “what if” queries after selectively
enabling/disabling the ordering rules that are crucial to their work.
Although translating the formal specification to Prolog is fairly
straightforward, there does exist some “logic gap” between predicate calculus and Prolog.
Most Prolog systems do not directly support quantifiers. Therefore, we need to
implement the effect of a universal quantifier by enumerating the related finite
domain. The existential quantifier is realized by the backtracking mechanism of
1 2 3 4 5 6 7 8
1 0 1 1 0 0 0 0 0
2 0 0 1 0 0 0 0 0
3 0 0 0 0 0 0 0 0
4 1 1 1 0 1 1 1 0
5 1 1 1 0 0 1 1 0
6 1 1 1 0 0 0 1 0
7 1 1 1 0 0 0 0 0
8 1 1 1 1 1 1 1 0
F ig . 4. A legal ordering m atrix  for the execution shown in Fig. 3 when requirePro- 
gram Order is disabled. A value 1 indicates th a t the two operations are ordered. A 
possible interleaving 8 4 5 6 7 1 2 3  is also autom atically derived from this m atrix.
Prolog when proper predicate conditions are set.
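
The treatment of quantifiers described above can be mimicked in a few lines of Python, used here as a stand-in for Prolog. This is a minimal sketch under stated assumptions: `forall`, `exists`, and `find_order` are hypothetical helper names, and the toy constraints are not taken from the Itanium rules. The universal quantifier is checked by enumerating a finite domain, while the existential quantifier is realized by trying candidates one after another, which plays the role of backtracking.

```python
from itertools import permutations


def forall(domain, pred):
    """Universal quantifier over a finite domain, by explicit enumeration."""
    return all(pred(x) for x in domain)


def exists(candidates, pred):
    """Existential quantifier: return the first witness found, or None.

    Moving on to the next candidate after a failure mirrors backtracking.
    """
    for candidate in candidates:
        if pred(candidate):
            return candidate
    return None


def find_order(ops, must_precede):
    """Search for an interleaving in which a precedes b for every (a, b)."""
    def satisfies(seq):
        pos = {op: k for k, op in enumerate(seq)}
        return forall(must_precede, lambda ab: pos[ab[0]] < pos[ab[1]])
    return exists(permutations(ops), satisfies)


# Hypothetical toy constraints, for illustration only.
print(find_order([1, 2, 3], [(3, 1), (1, 2)]))  # (3, 1, 2)
print(find_order([1, 2], [(1, 2), (2, 1)]))     # None, i.e. an illegal trace
```

Exhaustive enumeration of permutations is only practical for very small traces; a finite-domain constraint solver prunes the search far more aggressively.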
The SAT Approach
As an alternative method, we use our Prolog program as a driver to emit
propositional formulae asserting the solvability of the constraints. After being
converted to a standard format called DIMACS, the final formula is sent to a
SAT solver, such as berkmin [17] or zChaff [18]. Although the clause-generation
phase can be detached from the logic programming approach, the ability to have
it coexist with FD-Prolog might be advantageous since it allows the two methods
to share the same specification base. Boolean satisfiability is NP-complete.
However, tremendous progress has been achieved in recent years
in SAT tools, making SAT solving an effective technique for industrial
applications. According to our initial results, this seems to offer an encouraging path
to tackling larger problems.
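
To make the translation to DIMACS concrete, the sketch below encodes only the general linear-order requirement (Appendix A.1) over n operations as CNF clauses. It is an illustration in Python, not the authors' Prolog-driven generator: the variable numbering in `var`, the function names, and the brute-force model counter are all assumptions introduced for this example.

```python
from itertools import permutations, product


def linear_order_cnf(n):
    """Return (num_vars, clauses) saying `order` is a strict total order.

    Variable var(i, j) is true iff operation i is ordered before operation j.
    """
    def var(i, j):
        return i * n + j + 1  # DIMACS variables are numbered from 1

    clauses = []
    for i in range(n):
        clauses.append([-var(i, i)])                  # irreflexivity
    for i in range(n):
        for j in range(i + 1, n):
            clauses.append([var(i, j), var(j, i)])    # totality
            clauses.append([-var(i, j), -var(j, i)])  # asymmetry
    for i, j, k in permutations(range(n), 3):
        clauses.append([-var(i, j), -var(j, k), var(i, k)])  # transitivity
    return n * n, clauses


def to_dimacs(num_vars, clauses):
    """Serialize the clauses in DIMACS CNF format."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
    return "\n".join(lines) + "\n"


def count_models(num_vars, clauses):
    """Brute-force model counter, usable only for checking tiny instances."""
    count = 0
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            count += 1
    return count


num_vars, clauses = linear_order_cnf(3)
print(to_dimacs(num_vars, clauses).splitlines()[0])  # p cnf 9 15
print(count_models(num_vars, clauses))               # 6
```

For three operations the formula has exactly six models, one per permutation. The remaining ordering rules would contribute further clauses over the same variables, and the resulting file could then be passed to an off-the-shelf SAT solver.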
Performance Results
Performance statistics from some litmus tests are shown below. These tests are
chosen from [2] and represented by their original table numbers. Performance is
measured on a Dell Inspiron 3800 machine with 512 MB of memory and a 700 MHz CPU.
SICStus Prolog is run in compiled mode. The SAT solver used is berkmin.
Test            Result    FD Solver (sec)   Vars   Clauses   SAT (sec)   CNF Gen Time
[2, Table 5]    illegal   0.38              64     679       0.03        negligible
[2, Table 10]   legal     2.36              100    1280      0.01        negligible
[2, Table 15]   illegal   17.7              576    15706     0.05        a minute
[2, Table 18]   illegal   1.9               144    2125      0.01        few secs
[2, Table 19]   legal     3.8               144    2044      0.02        few secs
5 Conclusions
The setting in which contemporary memory models are expressed and analyzed
needs to be improved. Towards this, we present a framework based on axiomatic
specifications (expressed in higher order logic) of memory ordering requirements.
It is straightforward to encode these requirements as constraint logic programs
or, by an extra level of translation, as boolean satisfiability problems. In the
latter case, one can employ current SAT tools to quickly answer whether certain
executions are permitted or not. Our techniques are demonstrated through the
adaptation and analysis of the Itanium memory model. Being able to tackle such
a complex design also attests to the scalability of our framework for cutting-edge
commercial architectures.
Our methodology provides several benefits. First, the ability to execute the
underlying model is a powerful feature that promotes understanding. Second, the
compositional specification style provides modularity, reusability, and scalability.
It also allows one to add constraints incrementally for investigation purposes.
Third, the expressive power of the underlying logic allows one to define a wide
range of requirements using the same notation, providing a rich taxonomy for
memory consistency models. Finally, the method of converting axiomatic rules to
a propositional formula allows one to perform property checking through boolean
reasoning, thus opening up new means to conduct formal verification.
Future Work. One enhancement that can be made is to develop the capability
of exercising symbolic (non-ground) litmus tests. Such a tool may be used
to automatically synthesize critical instructions of concurrent code fragments
comprising compiler idioms or other synchronization primitives. For example,
one could imagine using a symbolic store instruction in a program and asking
a tool to solve whether it should be an “ordinary” or a “release” store to help
generate aggressive code. Another area of improvement is in reducing the logic
gap between the formal specification and the tools that execute the specification.
One possibility is to apply a quantified boolean formulae (QBF) solver that
directly accepts quantifiers. The research of QBF solvers is still at a preliminary
stage compared to propositional SAT. We hope our work can help accelerate its
development by providing industrially motivated benchmarks.
References
1. Intel Itanium Architecture Software Developer's Manual,
http://developer.intel.com/design/itanium/manuals.htm
2. A Formal Specification of Intel Itanium Processor Family Memory Ordering.
Application Note, Document Number: 251429-001 (October 2002)
3. K. Gharachorloo: Memory consistency models for shared-memory multiprocessors.
Technical Report CSL-TR-95-685, Stanford University (December 1995)
4. D. Dill, S. Park, A. Nowatzyk: Formal Specification of Abstract Memory Models.
Research on Integrated Systems: Proceedings of the 1993 Symposium, Ed. G. Borriello
and C. Ebeling, MIT Press (1993)
5. Prosenjit Chatterjee, Ganesh Gopalakrishnan: Towards a Formal Model of Shared
Memory Consistency for Intel Itanium. ICCD 2001, Austin, TX (Sept 2001)
6. Yue Yang, Ganesh Gopalakrishnan, Gary Lindstrom: Specifying Java Thread
Semantics Using a Uniform Memory Model. Joint ACM Java Grande - ISCOPE
Conference (2002)
7. The SPARC Architecture Manual, Version 9, Prentice Hall (1993)
8. Leslie Lamport: How to Make a Multiprocessor Computer That Correctly Executes
Multiprocess Programs. IEEE Transactions on Computers, 28(9): 690-691 (1979)
9. M. Ahamad, G. Neiger, James Burns, Prince Kohli, Philip Hutto: Causal Memory:
Definitions, Implementation and Programming. Technical Report: GIT_CC-93/95
(July 1994)
10. Yuan Yu, through personal communication
11. Simplify, http://research.compaq.com/SRC/esc/Simplify.html
12. TLA+, http://research.microsoft.com/users/lamport/tla/tla.html
13. Rajeev Joshi, Leslie Lamport, John Matthews, Serdar Tasiran, Mark Tuttle, Yuan
Yu: Checking Cache-Coherence Protocols with TLA+. Formal Methods in System
Design, 22(2): 125-131 (Mar 2003)
14. M. J. C. Gordon, T. F. Melham: Introduction to HOL: A theorem proving
environment for higher order logic, Cambridge University Press (1993)
15. J. Jaffar, J-L. Lassez: Constraint Logic Programming. Principles Of Programming
Languages, Munich, Germany (January 1987)
16. SICStus Prolog, http://www.sics.se/sicstus
17. E. Goldberg, Y. Novikov: BerkMin: a Fast and Robust Sat-Solver. Design,
Automation and Test in Europe Conference and Exhibition, Paris, France (2002)
18. M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, S. Malik: Chaff: Engineering an
Efficient SAT Solver. 39th Design Automation Conference, Las Vegas (June 2001)
Appendix: Formal Itanium Memory Ordering Specification
A.1 General Ordering Requirement
requireLinearOrder ops order ≡
    requireIrreflexiveTotal ops order ∧ requireTransitive ops order ∧
    requireAsymmetric ops order
requireIrreflexiveTotal ops order ≡ ∀ i, j ∈ ops.
    id i ≠ id j ⇒ (order i j ∨ order j i)
requireTransitive ops order ≡ ∀ i, j, k ∈ ops. (order i j ∧ order j k) ⇒ order i k
requireAsymmetric ops order ≡ ∀ i, j ∈ ops. order i j ⇒ ¬(order j i)
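
The general ordering requirement translates almost literally into executable predicates. The sketch below is written in Python rather than Prolog and is only an illustration: operations are represented directly by their identifiers, `order` is passed as a two-argument function, and the two sample relations are hypothetical.

```python
def require_irreflexive_total(ops, order):
    """Any two distinct operations are ordered one way or the other."""
    return all(order(i, j) or order(j, i) for i in ops for j in ops if i != j)


def require_transitive(ops, order):
    return all(order(i, k) for i in ops for j in ops for k in ops
               if order(i, j) and order(j, k))


def require_asymmetric(ops, order):
    return all(not order(j, i) for i in ops for j in ops if order(i, j))


def require_linear_order(ops, order):
    return (require_irreflexive_total(ops, order)
            and require_transitive(ops, order)
            and require_asymmetric(ops, order))


# Hypothetical three-operation example: the interleaving 2, 0, 1.
position = {2: 0, 0: 1, 1: 2}
ops = list(position)


def linear(i, j):
    return position[i] < position[j]


def cyclic(i, j):
    return (i, j) in {(0, 1), (1, 2), (2, 0)}


print(require_linear_order(ops, linear))  # True
print(require_linear_order(ops, cyclic))  # False, the cycle is not transitive
```

Each universal quantifier becomes an `all(...)` over the finite set of operations, in line with the enumeration strategy described for the Prolog implementation.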
A.2 Write Operation Order
requireWriteOperationOrder ops order ≡ ∀ i, j ∈ ops.
    orderedByWriteOperation i j ⇒ order i j
orderedByWriteOperation i j ≡ isWr i ∧ isWr j ∧ wrID i = wrID j ∧
    (wrType i = Local ∧ wrType j = Remote ∧ wrProc j = p i ∨
     wrType i = Remote ∧ wrType j = Remote ∧
     wrProc i = p i ∧ wrProc j ≠ p i)
A.3 Program Order
requireProgramOrder ops order ≡ ∀ i, j ∈ ops.
    (orderedByAcquire i j ∨ orderedByRelease i j ∨ orderedByFence i j) ⇒
    order i j
orderedByProgram i j ≡ p i = p j ∧ pc i < pc j
orderedByAcquire i j ≡ orderedByProgram i j ∧ op i = ld.acq
orderedByRelease i j ≡ orderedByProgram i j ∧ op j = st.rel ∧
    (isWr i ⇒ (wrType i = Local ∧ wrType j = Local ∨
               wrType i = Remote ∧ wrType j = Remote ∧ wrProc i = wrProc j))
orderedByFence i j ≡ orderedByProgram i j ∧ (op i = mf ∨ op j = mf)
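
The following Python sketch evaluates the program-order rule on the trace of Fig. 3 against the ordering matrix of Fig. 4, which was obtained with requireProgramOrder disabled. It is an illustration only: the record layout `Op`, the numbering of processors as 1 and 2, and the assumption that all operations decomposed from one instruction share a program counter are choices made for this example, not details confirmed by the paper.

```python
from collections import namedtuple

# Hypothetical record layout for the operations of Fig. 3.
Op = namedtuple("Op", "id p pc op wr_type wr_proc")

OPS = [
    Op(1, 1, 1, "st", "Local", None),
    Op(2, 1, 1, "st", "Remote", 1),
    Op(3, 1, 1, "st", "Remote", 2),
    Op(4, 1, 2, "st.rel", "Local", None),
    Op(5, 1, 2, "st.rel", "Remote", 1),
    Op(6, 1, 2, "st.rel", "Remote", 2),
    Op(7, 2, 1, "ld.acq", None, None),
    Op(8, 2, 2, "ld", None, None),
]

# Ordering matrix of Fig. 4 (row i, column j set to 1: i is ordered before j).
ORDER = [
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
]


def order(i, j):
    return ORDER[i.id - 1][j.id - 1] == 1


def is_wr(i):
    return i.wr_type is not None


def ordered_by_program(i, j):
    return i.p == j.p and i.pc < j.pc


def ordered_by_acquire(i, j):
    return ordered_by_program(i, j) and i.op == "ld.acq"


def ordered_by_release(i, j):
    same_kind = (
        (i.wr_type == "Local" and j.wr_type == "Local")
        or (i.wr_type == "Remote" and j.wr_type == "Remote"
            and i.wr_proc == j.wr_proc)
    )
    return (ordered_by_program(i, j) and j.op == "st.rel"
            and (not is_wr(i) or same_kind))


def ordered_by_fence(i, j):
    return ordered_by_program(i, j) and "mf" in (i.op, j.op)


def program_order_violations(ops):
    """Pairs that requireProgramOrder would order but the matrix does not."""
    return [(i.id, j.id) for i in ops for j in ops
            if (ordered_by_acquire(i, j) or ordered_by_release(i, j)
                or ordered_by_fence(i, j)) and not order(i, j)]


print(program_order_violations(OPS))  # [(1, 4), (2, 5), (3, 6), (7, 8)]
```

Under these assumptions the matrix of Fig. 4 breaks the release ordering between the two stores and the acquire ordering between the two loads, which is why it is only admissible once the program-order rule is switched off.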
A.4 Memory-Data Dependence
requireMemoryDataDependence ops order ≡ ∀ i, j ∈ ops.
    (orderedByRAW i j ∨ orderedByWAR i j ∨ orderedByWAW i j) ⇒
    order i j
orderedByMemoryData i j ≡ orderedByProgram i j ∧ var i = var j
orderedByRAW i j ≡
    orderedByMemoryData i j ∧ isWr i ∧ wrType i = Local ∧ isRd j
orderedByWAR i j ≡
    orderedByMemoryData i j ∧ isRd i ∧ isWr j ∧ wrType j = Local
orderedByWAW i j ≡ orderedByMemoryData i j ∧ isWr i ∧ isWr j ∧
    (wrType i = Local ∧ wrType j = Local ∨
     wrType i = Remote ∧ wrType j = Remote ∧
     wrProc i = p i ∧ wrProc j = p i)
A.5 Data-Flow Dependence
requireDataFlowDependence ops order ≡ ∀ i, j ∈ ops.
    orderedByLocalDependence i j ⇒ order i j
orderedByLocalDependence i j ≡ orderedByProgram i j ∧ reg i = reg j ∧
    (isRd i ∧ isRd j ∨
     isWr i ∧ wrType i = Local ∧ useReg i ∧ isRd j ∨
     isRd i ∧ isWr j ∧ wrType j = Local ∧ useReg j)
A.6 Coherence
requireCoherence ops order ≡ ∀ i, j ∈ ops.
    (isWr i ∧ isWr j ∧ var i = var j ∧
     (attribute (var i) = WB ∨ attribute (var i) = UC) ∧
     (wrType i = Local ∧ wrType j = Local ∧ p i = p j ∨
      wrType i = Remote ∧ wrType j = Remote ∧ wrProc i = wrProc j) ∧
     order i j) ⇒
    (∀ p, q ∈ ops.
     (isWr p ∧ isWr q ∧ wrID p = wrID i ∧ wrID q = wrID j ∧
      wrType p = Remote ∧ wrType q = Remote ∧ wrProc p = wrProc q) ⇒
     order p q)
A.7 Read Value
requireReadValue ops order ≡ ∀ j ∈ ops.
    (isRd j ⇒ (validLocalWr ops order j ∨ validRemoteWr ops order j ∨
               validDefaultWr ops order j)) ∧
    ((isWr j ∧ useReg j) ⇒ validRd ops order j)
validLocalWr ops order j ≡ ∃ i ∈ ops.
    (isWr i ∧ wrType i = Local ∧ var i = var j ∧ p i = p j ∧
     data i = data j ∧ order i j) ∧
    (¬∃ k ∈ ops. isWr k ∧ wrType k = Local ∧ var k = var j ∧ p k = p j ∧
     order i k ∧ order k j)
validRemoteWr ops order j ≡ ∃ i ∈ ops.
    (isWr i ∧ wrType i = Remote ∧ wrProc i = p j ∧ var i = var j ∧
     data j = data i ∧ ¬(order j i)) ∧
    (¬∃ k ∈ ops. isWr k ∧ wrType k = Remote ∧ var k = var j ∧ wrProc k = p j ∧
     order i k ∧ order k j)
validDefaultWr ops order j ≡
    (¬∃ i ∈ ops. isWr i ∧ var i = var j ∧ order i j ∧
     (wrType i = Local ∧ p i = p j ∨ wrType i = Remote ∧ wrProc i = p j)) ⇒
    data j = default (var j)
validRd ops order j ≡ ∃ i ∈ ops.
    (isRd i ∧ reg i = reg j ∧ orderedByProgram i j ∧ data j = data i) ∧
    (¬∃ k ∈ ops. isRd k ∧ reg k = reg j ∧
     orderedByProgram i k ∧ orderedByProgram k j)
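
As a worked instance of the read-value rule, the Python sketch below checks the two loads of Fig. 3 against the ordering matrix of Fig. 4. It is illustrative only: the record layout `Op`, the helper names, the processor numbering, and the default value 0 for both locations are assumptions. The register clause of requireReadValue is omitted because no operation in this trace uses a register.

```python
from collections import namedtuple

# Hypothetical record layout for the operations of Fig. 3.
Op = namedtuple("Op", "id p kind wr_type wr_proc var data")

OPS = [
    Op(1, 1, "wr", "Local", None, "a", 1),
    Op(2, 1, "wr", "Remote", 1, "a", 1),
    Op(3, 1, "wr", "Remote", 2, "a", 1),
    Op(4, 1, "wr", "Local", None, "b", 1),
    Op(5, 1, "wr", "Remote", 1, "b", 1),
    Op(6, 1, "wr", "Remote", 2, "b", 1),
    Op(7, 2, "rd", None, None, "b", 1),  # ld.acq(1,b)
    Op(8, 2, "rd", None, None, "a", 0),  # ld(0,a)
]
DEFAULT = {"a": 0, "b": 0}  # assumed initial memory contents

# Ordering matrix of Fig. 4 (row i, column j set to 1: i is ordered before j).
ORDER = [
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
]


def order(i, j):
    return ORDER[i.id - 1][j.id - 1] == 1


def local_wr_for(i, j):
    """i is a local write to j's variable issued by j's processor."""
    return (i.kind == "wr" and i.wr_type == "Local"
            and i.var == j.var and i.p == j.p)


def remote_wr_for(i, j):
    """i is a remote write to j's variable that targets j's processor."""
    return (i.kind == "wr" and i.wr_type == "Remote"
            and i.var == j.var and i.wr_proc == j.p)


def valid_local_wr(ops, j):
    return any(
        local_wr_for(i, j) and i.data == j.data and order(i, j)
        and not any(local_wr_for(k, j) and order(i, k) and order(k, j)
                    for k in ops)
        for i in ops)


def valid_remote_wr(ops, j):
    return any(
        remote_wr_for(i, j) and i.data == j.data and not order(j, i)
        and not any(remote_wr_for(k, j) and order(i, k) and order(k, j)
                    for k in ops)
        for i in ops)


def valid_default_wr(ops, j):
    earlier_write = any(
        (local_wr_for(i, j) or remote_wr_for(i, j)) and order(i, j)
        for i in ops)
    # Implication as written: no earlier visible write => default value read.
    return earlier_write or j.data == DEFAULT[j.var]


def require_read_value(ops):
    return all(
        valid_local_wr(ops, j) or valid_remote_wr(ops, j)
        or valid_default_wr(ops, j)
        for j in ops if j.kind == "rd")


ld_acq, ld = OPS[6], OPS[7]
print(valid_remote_wr(OPS, ld_acq))  # True: it reads st.rel_remote2(b,1)
print(valid_default_wr(OPS, ld))     # True: no visible write to a precedes it
print(require_read_value(OPS))       # True
```

With the interleaving 8 4 5 6 7 1 2 3, the plain load runs before any store to a becomes visible on its processor, so returning the assumed default value 0 is consistent with the rule.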
A.8 Total Ordering of WB Releases
requireAtomicWBRelease ops order ≡ ∀ i, j, k ∈ ops.
    (op i = st.rel ∧ wrType i = Remote ∧ op k = st.rel ∧ wrType k = Remote ∧
     wrID i = wrID k ∧ attribute (var i) = WB ∧ order i j ∧ order j k) ⇒
    (op j = st.rel ∧ wrType j = Remote ∧ wrID j = wrID i)
A.9 Sequentiality of UC Operations
requireSequentialUC ops order ≡ ∀ i, j ∈ ops. orderedByUC i j ⇒ order i j
orderedByUC i j ≡
    orderedByProgram i j ∧ attribute (var i) = UC ∧ attribute (var j) = UC ∧
    (isRd i ∧ isRd j ∨
     isRd i ∧ isWr j ∧ wrType j = Local ∨
     isWr i ∧ wrType i = Local ∧ isRd j ∨
     isWr i ∧ wrType i = Local ∧ isWr j ∧ wrType j = Local)
A.10 No UC Bypassing
requireNoUCBypass ops order ≡ ∀ i, j, k ∈ ops.
    (isWr i ∧ wrType i = Local ∧ attribute (var i) = UC ∧ isRd j ∧
     isWr k ∧ wrType k = Remote ∧ wrProc k = p k ∧ wrID k = wrID i ∧
     order i j ∧ order j k) ⇒
    (wrProc k ≠ p j ∨ var i ≠ var j)
