
































Huawei Dresden Research Center













We observe that the standard notion of thread fairness is
insufficient for guaranteeing termination of even the simplest
shared-memory programs under weak memory models.
Guaranteeing termination requires additional model-specific
fairness constraints, which we call memory fairness. In the
case of acyclic declarative memory models, such as TSO and
RA, we show that memory fairness can be equivalently expressed
in a uniform fashion as prefix-finiteness of an extended
coherence order. This uniform memory fairness representation
yields the first effective way of proving termination
of spinloops under weak memory consistency.
1 Introduction
Consider the following concurrent program demonstrating
the “message passing” idiom, where x and y are shared variables
initialized with 0, and a and b are local variables.
    x := 1    ∥    repeat { a := y } until (a ≠ 0)
    y := 1         b := x
                                            (MP-Loop)
There are two basic semantic questions about this program:
1. Can this program terminate with b = 0?
2. Does the program always terminate? I.e., is thread 2
guaranteed to eventually read y = 1?
The first question has been studied extensively in the
literature under a variety of memory models. Under sequential
consistency (SC) [16], the answer is “No.” Once thread 2
exits the loop, it has observed the y := 1 write of thread 1 and
therefore also the earlier x := 1 write, and so b = 1. The
same reasoning also holds for some weak memory models
(e.g., x86-TSO [18] and (S)RA [13]), but not for other models
(e.g., PSO [23] and RC11 with relaxed accesses [14]). Under
PSO, for example, the writes of thread 1 may be propagated
out of order to thread 2, and so observing the y := 1 write
does not imply that x := 1 has happened.
The second question has been thoroughly studied only
under SC. Under SC, MP-Loop can diverge if, e.g., thread
2 is always scheduled and thread 1 never gets a chance to
run. This run is considered unfair because although thread
1 is always available to be scheduled, it is never selected.
A standard assumption is thread fairness (which is typically
simply called fairness in the literature [15, 17, 19]), namely
that every non-terminated thread is eventually scheduled.
With a fair scheduler, MP-Loop is guaranteed to terminate.
Under weak memory consistency, the situation is substantially
more complex. Thread fairness alone does not suffice
to ensure termination of MP-Loop, because merely executing
the y := 1 write does not mean that its effect is propagated
to the other threads. Take, for example, the operational
TSO model [18], where writes are appended to a
thread-local buffer and are later asynchronously applied to
the shared memory. With such a model, it is possible that
the y := 1 write is forever stuck in the first thread’s buffer
and so thread 2 never gets a chance to read y = 1.
To rule out such behaviors, we introduce another property,
memory fairness (MF), which ensures that threads do not
continuously observe the same stale memory state.
In this paper, we consider three operationally defined models:
SC [16], TSO [18], and RA (i.e., the release-acquire fragment
of C11 following its operational characterization by
Kang et al. [11]). SC trivially satisfies MF. For TSO, we
require that every buffered write is eventually unbuffered. For
RA, more adaptations are necessary: (1) we constrain the
timestamp ordering so that no write can overtake infinitely
many other writes; and (2) we add a transition that forcefully
updates the views of threads so that all executed writes
eventually become globally visible.
Besides introducing MF into the aforementioned opera-
tional definitions, we show that MF can also be introduced
in a uniform fashion in their declarative/axiomatic defini-
tions. At first, this is rather challenging because in declara-
tive models the concept of a transition eventually happen-
ing does not make sense (since, in particular, events of dif-
ferent threads are not totally ordered). We observe, how-
ever, that the total order is actually not strictly necessary
for defining fairness; what is important is that every event is
preceded by only a finite number of other events. We there-
fore formulate memory fairness as prefix-finiteness of the
, Ori Lahav, Egor Namakonov, Jonas Oberhauser, Anton Podkopaev, and Viktor Vafeiadis
extended coherence order, which means that in a fair execu-
tion no write can be preceded by an infinite number of other
events (e.g., reads that have not yet observed the write).
We justify the uniform declarative definition of memory
fairness by showing that in a number of instances (SC, TSO,
RA), it is equivalent to the fair adaptations of the correspond-
ing operational models. This requires extending the exist-
ing equivalence results between operational and declarative
models to infinite executions, and involves more advanced
constructions that make use of memory fairness.
Moreover, we show that our uniform MF definition is useful
for verification. Termination of spinloops in declarative
models with our uniform MF definition reduces to whether
an iteration reading only final latest writes exits the loop.
For example, under MF models, the loop in MP-Loop terminates
because reading the final write to y (i.e., y := 1) exits
the loop. We apply this reduction to verify the termination
and/or fairness of two lock implementations.
Outline. In §2 we define fairness operationally and incorporate
it in the operational definitions of SC, x86-TSO, and
RA. In §3 we recap the declarative framework for defining
memory models. In §4 we present our declarative MF condi-
tion and establish its equivalence to the operational MF no-
tions. In §5 we show that the declarative fairness character-
ization yields an effective method for proving termination
of spinloops and illustrate it on two lock implementations
in §6. We conclude with a discussion of fairness in other
models in §7.
Supplementary Material. The material enclosed with
this submission contains: (1) an appendix with typeset proof
sketches for the lemmas and propositions of the article and
further examples, and (2) a Coq development with mechanized
proofs for all the results of Sections 2 to 5.
2 What is a Fair Operational Semantics?
In this section we define our operational framework and its
fairness constraints. We initially demonstrate our terminology
for sequential consistency (SC). In §2.1 and §2.2, we
instantiate our framework to the total-store-order (TSO) model
and release/acquire (RA), and discuss memory fairness in
each of these models.
Labeled Transition Systems. Our formal development
is based on labeled transition systems (LTSs), which we use
to represent both programs and operational memory mod-
els. We assume that the transition labels of these systems
are split between observable transition labels and silent tran-
sition labels. Using transition labels we define a trace to be a
(finite or infinite) sequence of transition labels (of any kind);
whereas an observable trace is a (finite or infinite) sequence
of observable transition labels. Then, LTSs capture sets of
traces and of observable traces in the standard way, which
is formulated below.
Formally, we define an LTS to be a tuple 〈Q, Σ, Θ, init, −→〉,
where Q is a set of states, Σ is a set of observable transition
labels, Θ is a set of silent transition labels, init ∈ Q is the
initial state, and −→ ⊆ Q × (Σ ⊎ Θ) × Q is a set of transitions. We
denote by A.Q, A.Σ, A.Θ, A.init, and −→_A the components
of an LTS A.
We denote by src(t), tlab(t), and tgt(t) the three components
of a transition t ∈ −→. For σ ∈ Σ ⊎ Θ, we write
$\xrightarrow{\sigma}$ for the relation {〈src(t), tgt(t)〉 | t ∈ −→, tlab(t) = σ}.
We say that a transition label σ ∈ Σ ⊎ Θ is enabled in some
state q ∈ Q if $q \xrightarrow{\sigma} q'$ for some q′ ∈ Q.
A run of A is a (finite or infinite) sequence μ of transitions
in −→_A such that src(μ(0)) = A.init and tgt(μ(k − 1)) =
src(μ(k)) for every k ≥ 1 in dom(μ). A run μ of A induces
the trace ρ if ρ(k) = tlab(μ(k)) for every k ∈ dom(μ). Also,
μ induces the observable trace ρ′ if ρ′ is the restriction to Σ
of some trace ρ that is induced by μ.
An (observable) trace ρ is called an (observable) trace of
A if it is induced by some run of A. We write OTr(A) for the
set of all observable traces of A and OTrfin(A) for the set of
all finite observable traces of A.
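The LTS notions above can be made concrete with a small sketch. The following Python fragment is only an illustration restricted to finite runs; the class name `LTS`, the helper names, and the toy system are assumptions made here and are not part of the formal development.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LTS:
    """A finite LTS; transitions are (source, label, target) triples."""
    init: str
    observable: frozenset   # the observable labels (Sigma)
    silent: frozenset       # the silent labels (Theta)
    transitions: frozenset  # subset of Q x (Sigma | Theta) x Q

def is_run(lts, run):
    """A run starts in the initial state and chains each target to the next source."""
    if not all(t in lts.transitions for t in run):
        return False
    if run and run[0][0] != lts.init:
        return False
    return all(run[k - 1][2] == run[k][0] for k in range(1, len(run)))

def trace(run):
    """The trace induced by a run: its sequence of transition labels."""
    return [label for (_, label, _) in run]

def observable_trace(lts, run):
    """The induced trace restricted to the observable labels."""
    return [label for label in trace(run) if label in lts.observable]

# Hypothetical two-state system: "a" is observable, "tau" is silent.
toy = LTS(
    init="q0",
    observable=frozenset({"a"}),
    silent=frozenset({"tau"}),
    transitions=frozenset({("q0", "tau", "q1"), ("q1", "a", "q0")}),
)
run = [("q0", "tau", "q1"), ("q1", "a", "q0"), ("q0", "tau", "q1")]
```

Here `trace(run)` keeps all three labels, while `observable_trace(toy, run)` keeps only the single observable one.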
Domains and Event Labels. To define programs and their
semantics, we fix sets Loc, Tid, and Val of (shared) locations,
thread identifiers, and values (respectively). We assume that
Val contains a distinguished value 0, which serves as the
initial value for all locations. In addition, we assume that Tid
is finite, given by Tid = {1, 2, ... , N} for some N ≥ 1. We use
x, y to range over Loc; τ, π to range over Tid; and v to range
over Val. Programs interact with the memory using
event labels, defined as follows.
Definition 2.1. An event label l is one of the following:
• Read event label: R(x, v_R) where x ∈ Loc and v_R ∈ Val.
• Write event label: W(x, v_W) where x ∈ Loc and v_W ∈ Val.
• Read-modify-write label: RMW(x, v_R, v_W) where x ∈ Loc
and v_R, v_W ∈ Val.
The functions typ, loc, valr, and valw return (when applicable)
the type (R/W/RMW), location (x), read value (v_R), and
written value (v_W) of a given event label l. We denote by ELab
the set of all event labels.
Remark 1. For conciseness, we have not included fences in
the set of event labels. Both in TSO [18] and RA [13], fences
can be modeled as read-modify-writes to an otherwise unused
distinguished location f.
Remark 2. Rich programming languages like C/C++ [3]
and Java [4] as well as the ARMv8 multiprocessor [21] have
multiple kinds of accesses. Supporting them requires extending
our event labels with additional modifiers. However, simple event
labels as defined above suffice for the purpose of this paper.
Sequential Programs. To keep the presentation abstract,
we do not fix a particular programming language, but rather
Making Weak Memory Models Fair ,
represent sequential programs as LTSs with ELab, the set
of all event labels, serving as the set of observable transi-
tion labels. For simplicity, we assume that sequential pro-
grams do not have silent transitions.1 For an example of a
toy programming language syntax and its reading as an LTS,
see [20]. In our code snippets throughout the paper, we im-
plicitly assume such a standard interpretation.
We refer to observable traces of sequential programs (i.e.,
sequences over ELab) as sequential traces.
Concurrent Programs. A concurrent program, which we
also simply call a program, is a top-level parallel composition
of sequential programs, defined as a finite mapping
assigning a sequential program to each thread τ ∈ Tid. A
concurrent program P induces an LTS with Tid × ELab serving
as the set of observable transition labels (and no silent
transition labels). This LTS follows the interleaving semantics
of P: its states are tuples in $\prod_{\tau \in \mathsf{Tid}} P(\tau).\mathsf{Q}$; the initial state is
$\lambda \tau.\ P(\tau).\mathsf{init}$; and its transitions are given by:
$$\frac{l \in \mathsf{ELab} \qquad p(\tau) \xrightarrow{l}_{P(\tau)} p'}{p \xrightarrow{\tau : l}_{P} p[\tau \mapsto p']}$$
In the sequel, we identify concurrent programs with their
induced LTSs.
We refer to observable traces of concurrent programs (i.e.,
sequences over Tid × ELab) as concurrent traces. We denote
the two components of a pair σ ∈ Tid × ELab by tid(σ)
and elab(σ) respectively.
Behaviors. We define a behavior to be a function β assigning
a sequential trace to every thread, since the events
executed by each thread capture precisely what it has
observed about the memory system.
Notation 2.2. The restriction of a concurrent trace ρ to thread
τ ∈ Tid, denoted by ρ|τ, is the sequence obtained from ρ by
keeping only the transition labels of the form τ : _.
Definition 2.3. The behavior induced by a concurrent trace
ρ, denoted by β(ρ), is given by
β(ρ) ≜ λτ ∈ Tid. λk ∈ dom(ρ|τ). elab(ρ|τ(k)).
This notation is extended to sets of concurrent traces in the
obvious way (β(S) ≜ {β(ρ) | ρ ∈ S}).
Notation 2.4. For an LTS A with A.Σ = Tid × ELab, we
denote by B(A) the set of behaviors induced by observable
traces of A (i.e., B(A) ≜ β(OTr(A))) and by Bfin(A) the
set of behaviors induced by finite observable traces of A (i.e.,
Bfin(A) ≜ β(OTrfin(A))).
The following property easily follows from our definitions.
Proposition 2.5. For every program P, if β(ρ1) = β(ρ2),
then ρ1 ∈ OTr(P) iff ρ2 ∈ OTr(P).
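As an illustration of the notion of a behavior, the following Python sketch computes the behavior induced by a finite concurrent trace. The encoding of labels as tuples and the names `restrict` and `behavior` are assumptions chosen for this example only.

```python
def restrict(trace, tid):
    """Restriction of a concurrent trace to one thread: keep only its labels."""
    return [(t, lab) for (t, lab) in trace if t == tid]

def behavior(trace, tids):
    """Map every thread to the sequence of event labels it executed."""
    return {tid: [lab for (_, lab) in restrict(trace, tid)] for tid in tids}

# Two interleavings of the same per-thread actions (event labels as tuples).
rho1 = [(1, ("W", "x", 1)), (2, ("R", "y", 0)), (1, ("W", "y", 1))]
rho2 = [(2, ("R", "y", 0)), (1, ("W", "x", 1)), (1, ("W", "y", 1))]
same = behavior(rho1, [1, 2]) == behavior(rho2, [1, 2])
```

The two traces differ only in how the threads are interleaved, so they induce the same behavior; this is the situation that Proposition 2.5 speaks about.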
1 This assumption serves us merely to simplify the presentation, since silent
program transitions can always be attached to the next memory access.
Thread Fairness. Not all program behaviors are fair.
Example 2.6. Consider the following program:
    x := 1    ∥    L: a := x
                      if a = 0 goto L
                                            (Rloop)
The behaviors of this program include the behavior assigning
W(x, 1) to the first thread and R(x, 1) to the second, but
also the (infinite) behavior assigning the empty sequence to
the first thread and the infinite sequence R(x, 0), R(x, 0), ...
to the second. This behavior occurs if an unfair scheduler
only schedules the second thread to run even though the
first thread is always available to execute.
A natural constraint, which in particular excludes the infi-
nite behavior in the example above, requires a fair scheduler,
which we formally define as follows.
Definition 2.7. Let P be a program.
• A thread τ ∈ Tid is enabled in p ∈ P.Q if 〈τ, l〉 is enabled
in p for some l ∈ ELab.
• A thread τ ∈ Tid is continuously enabled at index k in an
infinite run μ of P if it is enabled in src(μ(j)) for every
index j ≥ k. Thread τ is continuously enabled in μ if it is
continuously enabled in μ at some index k.
• A run μ of P is thread-fair if μ is finite or for every thread
τ ∈ Tid and index k such that τ is continuously enabled
in μ at k, there exists j ≥ k such that tid(tlab(μ(j))) = τ.
• A thread-fair observable trace of P is any concurrent trace
induced by a thread-fair run of P.
• A thread-fair behavior of P is any behavior induced by a
thread-fair observable trace of P. We denote by Btf(P)
the set of all thread-fair behaviors of P.
Returning to Example 2.6, thread-fair behaviors of Rloop
are either finite or must assign 〈W(x, 1)〉 to the first thread.
The following proposition is useful in the sequel.
Proposition 2.8. For every program P, if β(ρ1) = β(ρ2),
then ρ1 is a thread-fair observable trace of P iff ρ2 is a
thread-fair observable trace of P.
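Thread fairness is a property of infinite runs, so it cannot be tested on a finite prefix. It can, however, be decided for ultimately periodic ("lasso") runs of the form prefix · cycle^ω: a thread is continuously enabled from some index on exactly when it is enabled in every source state of the cycle, and in that case fairness requires it to take a step inside the cycle. The Python sketch below illustrates this under assumptions made only for this example: the lasso encoding, the function names, and the modeling of Rloop states as pairs of "terminated" flags.

```python
def thread_fair_lasso(cycle, enabled_threads):
    """Decide thread fairness of an infinite run prefix . cycle^omega.

    `cycle` is a non-empty list of (source, (tid, label), target) transitions;
    `enabled_threads(state)` returns the set of threads enabled in `state`.
    The finite prefix is irrelevant for fairness and is therefore omitted.
    """
    if not cycle:
        raise ValueError("the cycle of a lasso run must be non-empty")
    scheduled = {tid for (_, (tid, _), _) in cycle}
    always_enabled = set.intersection(
        *(set(enabled_threads(src)) for (src, _, _) in cycle)
    )
    # Every continuously enabled thread must be scheduled within the cycle.
    return always_enabled <= scheduled

def rloop_enabled(state):
    """Rloop program states as (thread 1 terminated, thread 2 terminated)."""
    done1, done2 = state
    return ({1} if not done1 else set()) | ({2} if not done2 else set())

read0 = (2, ("R", "x", 0))
# Thread 1 never runs although it is always enabled: not thread-fair.
starving = [((False, False), read0, (False, False))]
# Thread 1 has already executed its write in the prefix: thread-fair.
after_write = [((True, False), read0, (True, False))]
```

On these inputs, `thread_fair_lasso(starving, rloop_enabled)` is false and `thread_fair_lasso(after_write, rloop_enabled)` is true. The check concerns the program alone; whether the memory system permits the second thread to keep reading 0 is a separate question.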
Memory Systems. To give operational semantics to pro-
grams, we synchronize them with memory systems, which,
like programs, are LTSs with Tid × ELab serving as the set
of observable transition labels. In addition, memory systems
have silent transition labels, which vary from one system to
another. Intuitively, the set of silent transition labels M.Θ
of a memory system M consists of internal actions that the
program cannot observe (e.g., cache-related operations).
The most well-known memory system is that of sequential
consistency [16], denoted here by MSC, in which writes
by each thread are made immediately visible to all other
threads. MSC tracks the most recent value written to each
location. Its initial state maps each location to zero. That is,
MSC.Q ≜ Loc → Val and MSC.init ≜ λx. 0. The system
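MSC has no silent transitions (MSC.Θ = ∅) and its transition
relation −→MSC is defined as follows:
write:
$$\frac{l = \mathtt{W}(x, v) \qquad M' = M[x \mapsto v]}{M \xrightarrow{\tau : l}_{\mathsf{MSC}} M'}$$
read:
$$\frac{l = \mathtt{R}(x, v) \qquad M(x) = v}{M \xrightarrow{\tau : l}_{\mathsf{MSC}} M}$$
rmw:
$$\frac{l = \mathtt{RMW}(x, v_R, v_W) \qquad M(x) = v_R \qquad M' = M[x \mapsto v_W]}{M \xrightarrow{\tau : l}_{\mathsf{MSC}} M'}$$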
Writing v to x simply updates the value of x stored in M.
(M[x ↦ v] is the function that maps x to v and all other
locations y to M(y).) Reading v from x succeeds iff the value
stored for x in memory is v. The atomic read-modify-write
RMW(x, v_R, v_W) reads location x yielding value v_R and
immediately writes v_W to it. Note that MSC is oblivious to the
thread that takes the action (i.e., the transitions of −→MSC labeled
τ : l and π : l coincide for all threads τ and π). The other memory
systems below do not have this property.
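The three kinds of MSC transitions can be sketched directly. The Python fragment below is an illustration only: the tuple encoding of event labels and the names `msc_step` and `msc_accepts` are assumptions of this example, and unset locations are treated as holding the initial value 0.

```python
def msc_step(mem, label):
    """Return the successor memory if `label` is enabled in `mem`, else None.

    The thread identifier is not an argument: MSC is oblivious to it.
    """
    kind, loc = label[0], label[1]
    if kind == "W":
        return {**mem, loc: label[2]}
    if kind == "R":
        return mem if mem.get(loc, 0) == label[2] else None
    if kind == "RMW":
        v_read, v_written = label[2], label[3]
        return {**mem, loc: v_written} if mem.get(loc, 0) == v_read else None
    raise ValueError(f"unknown event label type: {kind}")

def msc_accepts(trace):
    """Check whether a finite concurrent trace is an observable trace of MSC."""
    mem = {}
    for (_tid, label) in trace:
        mem = msc_step(mem, label)
        if mem is None:
            return False
    return True

stale_read = [(1, ("W", "x", 1)), (2, ("R", "x", 0))]
early_read = [(2, ("R", "x", 0)), (1, ("W", "x", 1))]
```

Under this sketch, `msc_accepts(early_read)` holds whereas `msc_accepts(stale_read)` does not: once the write has been executed, reading the old value is no longer enabled.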
Linking Programs and Memory Systems. By linking
programs and memory systems, we can talk about the
behavior of a program P under a memory system M. We say
that a certain behavior β is a behavior of a program P under
a memory system M if β is both a behavior of P and a
behavior of M (i.e., β ∈ B(P) ∩ B(M)). Similarly, β is called a
thread-fair behavior of P under M if β ∈ Btf(P) ∩ B(M).
Proposition 2.9. Let P be a program, M be a memory system,
and β be a behavior.
• β is a behavior of P under M iff β = β(ρ) for some ρ ∈
OTr(P) ∩ OTr(M).
• β is a thread-fair behavior of P under M iff β = β(ρ) for
some ρ ∈ OTr(M) that is also a thread-fair observable
trace of P.
Example 2.10. Thread-fair behaviors of the program Rloop
under MSC must be finite. Indeed, in observable traces of
MSC, after the first thread performs W(x, 1), the second thread
will perform R(x, 1) and terminate its execution. The behavior
β_inf that assigns the empty sequence to the first thread
and the infinite sequence consisting of R(x, 0) event labels
to the second thread cannot be obtained from a thread-fair
run of Rloop.
Example 2.11. Consider the following program (assuming
that x is initialized to 0):
    L1: x := 1     ∥    L2: a := x
        x := 0              if a = 1 goto L2
        goto L1
                                            (WWRloop)
The infinite behavior that assigns the infinite sequences W(x, 1),
W(x, 0), W(x, 1), W(x, 0), ..., and R(x, 0), R(x, 0), ... to the first
and second threads (respectively) is a thread-fair behavior
of this program under MSC: in a corresponding run both
threads are executed infinitely often.
Memory Fairness. As we have already discussed, thread
fairness alone is often insufficient to reason about termination
under weak memory models. For this reason, we introduce
memory fairness (MF), which ensures that a thread cannot
be lagging behind indefinitely because the memory system
did not propagate certain updates to it. We formalize
this intuition by having MF require that the silent memory
transitions are scheduled infinitely often.
Definition 2.12. Let M be a memory system.
• A silent transition label θ ∈ M.Θ is continuously enabled
at index k in an infinite run μ of M if it is enabled
in src(μ(j)) for every index j ≥ k. The label θ is continuously
enabled in μ if it is continuously enabled in μ
at some index k.
• A run μ of M is memory-fair if μ is finite or for every
silent memory transition label θ ∈ M.Θ and index k
such that θ is continuously enabled in μ at k, there exists
j ≥ k such that tlab(μ(j)) = θ.
• A memory-fair observable trace of M is any concurrent
trace induced by a memory-fair run of M.
• A memory-fair behavior of M is any behavior induced
by a memory-fair observable trace of M. We denote by
Bmf(M) the set of all memory-fair behaviors of M.
Linking this definition with programs, we say that a certain
behavior β is a memory-fair behavior of a program P
under a memory system M if β ∈ B(P) ∩ Bmf(M). Similarly,
β is called a thread&memory-fair behavior of P under M if
β ∈ Btf(P) ∩ Bmf(M).
Proposition 2.13. Let P be a program, M be a memory system,
and β be a behavior.
• β is a memory-fair behavior of P under M iff β = β(ρ)
for some observable trace ρ of P that is also a memory-fair
observable trace of M.
• β is a thread&memory-fair behavior of P under M iff β =
β(ρ) for some thread-fair observable trace ρ of P that is
also a memory-fair observable trace of M.
Since MSC.Θ = ∅, every behavior of a program P under
MSC is (vacuously) memory-fair. Next, we demonstrate two
weaker memory systems with non-empty sets of silent transitions
that have non-memory-fair traces. In these systems,
whether a program terminates or deadlocks may crucially
depend on memory fairness.
2.1 The Total Store Order Memory System
We instantiate memory fairness to the “Total Store Order”
(TSO) model [18, 22] of the x86 architecture. This memory
system, denoted by MTSO, is defined by:
1. MTSO.Q ≜ (Loc → Val) × (Tid → (Loc × Val)∗)
(Each state consists of a memory and a per-thread
store buffer.)
2. MTSO.Θ ≜ {prop(τ) | τ ∈ Tid}
(Silent transitions consist of a propagation label for
every thread.)
3. MTSO.init ≜ 〈M0, B0〉, where M0 ≜ λx. 0 and B0 ≜
λτ. ε (Initially, all buffers are empty.)
4. −→MTSO is given in Fig. 1.
In addition to the global memory M, states of MTSO include
a mapping B assigning a FIFO store buffer to every
thread. Writes are first written to the local buffer and later
non-deterministically propagate to memory (in the order in
which they were issued). Reads read the most recent value
of the relevant location in the thread’s buffer, and refer to
the memory if no such value exists. RMWs can only
execute when the thread’s buffer is empty and write their
result to the memory directly.
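A minimal Python sketch of such a store-buffer machine is given below. It is not the formal system of Fig. 1: the class name `TSO`, its method names, and the representation of buffers as lists with the oldest entry first are assumptions of this illustration. The method `read` returns the unique value that the given thread can read in the current state.

```python
class TSO:
    """Toy TSO machine: a shared memory plus one FIFO store buffer per thread."""

    def __init__(self, tids):
        self.mem = {}                      # unset locations hold the value 0
        self.buf = {t: [] for t in tids}   # oldest buffered write first

    def write(self, tid, loc, val):
        """Writes only enter the issuing thread's buffer."""
        self.buf[tid].append((loc, val))

    def read(self, tid, loc):
        """Latest buffered value for `loc` if any, otherwise the memory value."""
        for (l, v) in reversed(self.buf[tid]):
            if l == loc:
                return v
        return self.mem.get(loc, 0)

    def rmw(self, tid, loc, update):
        """RMWs need an empty buffer and act on the memory directly."""
        if self.buf[tid]:
            raise RuntimeError("RMW is not enabled while the buffer is non-empty")
        old = self.mem.get(loc, 0)
        self.mem[loc] = update(old)
        return old

    def propagate(self, tid):
        """Silent step prop(tid): move the oldest buffered write to memory."""
        if not self.buf[tid]:
            return False
        loc, val = self.buf[tid].pop(0)
        self.mem[loc] = val
        return True

# Both threads write and then read the other location before any propagation.
m = TSO([1, 2])
m.write(1, "x", 1)
a1 = m.read(1, "y")
m.write(2, "y", 1)
a2 = m.read(2, "x")
```

With no propagation step in between, both `a1` and `a2` are 0, which is an outcome that a memory without buffers cannot produce for this access pattern. After `m.propagate(1)`, thread 2 can only read 1 from `x`.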
Example 2.14 (Store Buffering). The following annotated
behavior is allowed under MTSO (but not under MSC):
    x := 1              ∥    y := 1
    a := y //reads 0         a := x //reads 0
                                            (SB)
Indeed, the first thread may run first, but the write of 1 to x
may remain in its store buffer. Then, when the second thread
runs, it reads the initial value (0) of x from the memory.
Example 2.15. Revisiting the Rloop program from §2, unlike
under MSC, thread-fair behaviors of Rloop under MTSO
include the (infinite) behavior assigning W(x, 1) to the
first thread and the infinite sequence R(x, 0), R(x, 0), ... to
the second. Indeed, the entry 〈x, 1〉 may indefinitely remain
in the first thread’s buffer, so that W(x, 1) is never executed
from the point of view of the second thread. To disqualify
this behavior, we need to further require memory fairness.
Indeed, in runs inducing this infinite behavior, the silent
memory transition prop(1) is necessarily continuously enabled.
Memory fairness requires that prop(1) is eventually
executed, and from that point on MTSO prohibits the
second thread from executing R(x, 0).
We note that the notion of memory fairness is sensitive
to the choice of silent memory transitions. For example,
consider an alternative memory system, denoted by M′TSO, with
less informative silent transition labels that do not record
the thread identifier of the propagated write. (Formally, in
M′TSO the label of the propagation step is prop rather than
prop(τ).) Then, M′TSO induces the same set of behaviors as
MTSO, but not the same set of memory-fair behaviors. In
particular, we can extend the Rloop program with an additional
thread that constantly writes to some unrelated location
y, and obtain a memory-fair run of M′TSO by infinitely
often propagating a write to y, and never propagating the
W(x, 1) entry.
2.2 The Release/Acquire Memory System
We instantiate our operational framework with a memory
system for Release/Acquire (RA), enriched with silent mem-
ory transitions for capturing fair behaviors. Here we follow
an operational formulation of RA from Kaiser et al. [10],
based on the Promising Semantics of Kang et al. [11].
The memory of the RA system records a (finite) set of
messages, each of which corresponds to some write that was
previously executed. Messages (of the same location) are
ordered using timestamps, and carry a view, i.e., a mapping from
locations to timestamps. In turn, the states of this memory
system also keep track of the current view of each thread,
and use these views to confine the set of messages that threads
may read and write. In particular, if a thread has observed
(either by reading or by writing itself) a message whose
view V has V(x) = t, then it can only read messages of
x whose timestamp is greater than or equal to t.
To formally define this system, we let Time ≜ ℕ (using
natural numbers as timestamps), View ≜ Loc → Time (the
set of views), and Msg ≜ Loc × Val × Time × View (the
set of messages). We denote a message m as a tuple of the
form 〈x : v@t, V〉, where x ∈ Loc, v ∈ Val, t ∈ Time, and
V ∈ View. We write loc(m), val(m), ts(m), and view(m)
to refer to the components of a message m. The usual order
< on natural numbers is lifted pointwise to views; ⊔ denotes
the pointwise maximum on views; and V0 is the minimum
view (V0 ≜ λx. 0).
With these definitions and notations, the RA memory system,
denoted here by MRA, is defined as follows (its silent
memory transitions are discussed in the sequel):
1. MRA.Q ≜ P(Msg) × (Tid → View).
2. MRA.init ≜ 〈M0, λτ. V0〉, where the initial memory
is given by M0 ≜ {〈x : 0@0, V0〉 | x ∈ Loc}.
3. −→MRA is given in Fig. 2.
The states of MRA consist of a set M of all messages added
to the memory so far and a mapping T assigning a view
to each thread. Write steps of thread τ writing to location
x pick a timestamp t that is fresh (∄m ∈ M. loc(m) =
x ∧ ts(m) = t) and greater than the latest timestamp that τ
has observed for x (T(τ)(x) < t); update the thread’s view
to include this timestamp (T′ = T[τ ↦ T(τ)[x ↦ t]]);
and add a corresponding message to the memory carrying
the (updated) thread view (M′ = M ∪ {〈x : v@t, T′(τ)〉}).
Read steps of thread τ reading from location x pick a message
from the current memory (〈x : v@t, V〉 ∈ M) whose
timestamp is greater than or equal to the latest timestamp
that τ has observed for x (T(τ)(x) ≤ t); and incorporate the
message’s view in the thread view (T′ = T[τ ↦ T(τ) ⊔ V]).
RMW steps are defined as atomic sequencing of a read step
followed by a write step, with the restriction that the new
message’s (fresh) timestamp is the successor of the timestamp
of the read message (T″(τ)(x) = T′(τ)(x) + 1). The
latter condition is needed to ensure the atomicity of RMWs:
′ =  [g ↦→ 〈G, E〉 · (g)]





(g) = 〈G=, E=〉· ... ·〈G1, E1〉
" [G1 ↦→ E1]···[G= ↦→ E=] (G) = E




(g) = n " (G) = ER
" ′ = " [G ↦→ EW]





(g) = 1 · 〈G, E〉 ′ =  [g ↦→ 1]






Figure 1. Transitions of MTSO
< ∈ ". loc(<) = G ∧ ts(<) = C
T (g) (G) < C
T ′ = T [g ↦→ T (g) [G ↦→ C]]





〈G : E@C,V 〉 ∈ "
T (g) (G) ≤ C


















Figure 2. Transitions ofMRA
no other write can intervene between the read part and the
write part of the RMW (i.e., no message can be placed be-
tween the read and the written messages).
Example 2.16 (Message passing). The following annotated
behavior is disallowed under MRA:
    x := 1    ∥    a := y //reads 1
    y := 1         b := x //reads 0
                                            (MP)
Indeed, the second thread can read 1 for y only after the
first thread added two messages m_x = 〈x : 1@t_x, [x ↦ t_x]〉
and m_y = 〈y : 1@t_y, [x ↦ t_x, y ↦ t_y]〉 to the memory with
t_x, t_y > 0. When reading m_y, the second thread increases its
view of x to be t_x. Since t_x > 0, it is then unable to read the
initial message of x, and must read m_x.
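The view mechanism behind this argument can be replayed with a small Python sketch of the write and read steps (RMWs and silent steps are omitted). The class `RA`, the encoding of messages as `(loc, val, ts, view)` tuples, and the choice of timestamp 1 for both writes are assumptions of this illustration rather than part of the formal system.

```python
class RA:
    """Toy release/acquire memory: timestamped messages and per-thread views."""

    def __init__(self, tids, locs):
        self.locs = list(locs)
        zero = {x: 0 for x in self.locs}
        # Initial memory: one message with value 0 and timestamp 0 per location.
        self.msgs = [(x, 0, 0, dict(zero)) for x in self.locs]
        self.view = {t: dict(zero) for t in tids}

    def write(self, tid, loc, val, ts):
        """Add a message with a fresh timestamp above the thread's view of `loc`."""
        used = {m[2] for m in self.msgs if m[0] == loc}
        if ts in used or not self.view[tid][loc] < ts:
            raise ValueError("timestamp is not fresh or not above the thread view")
        self.view[tid][loc] = ts
        self.msgs.append((loc, val, ts, dict(self.view[tid])))

    def readable(self, tid, loc):
        """Messages of `loc` whose timestamp is not below the thread's view."""
        return [m for m in self.msgs if m[0] == loc and self.view[tid][loc] <= m[2]]

    def read(self, tid, msg):
        """Read `msg` and join its view into the thread's view."""
        if msg not in self.readable(tid, msg[0]):
            raise ValueError("message is not readable by this thread")
        for x in self.locs:
            self.view[tid][x] = max(self.view[tid][x], msg[3][x])
        return msg[1]

ra = RA([1, 2], ["x", "y"])
ra.write(1, "x", 1, ts=1)
ra.write(1, "y", 1, ts=1)
before = sorted(m[1] for m in ra.readable(2, "x"))
m_y = next(m for m in ra.msgs if m[0] == "y" and m[1] == 1)
ra.read(2, m_y)
after = sorted(m[1] for m in ra.readable(2, "x"))
```

Before thread 2 reads the message of `y`, it may read either value of `x` (`before` is `[0, 1]`); afterwards only the value 1 remains readable (`after` is `[1]`).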
Example 2.17. By forcing RMWs to use the successor of
the read message’s timestamp as the timestamp of the written
message, MRA forbids different RMWs from reading the same
message. To see this, consider the following example (where FADD
denotes an atomic fetch-and-add instruction that returns its
read value):
    a := FADD(x, 1) //reads 0    ∥    b := FADD(x, 1) //reads 0
                                            (2RMW)
W.l.o.g., if the first thread runs first, it reads from the initialization
message 〈x : 0@0, V0〉 (it is the only message of x in M0),
and it is forced to add a message with timestamp 1. When
the second thread runs, it may not read from the initialization
message: that would again require adding a message of
x with timestamp 1, but that timestamp is no longer fresh.
Thus, it may only read from the message that was added by
the first thread.
Example 2.18. Fences (modeled as RMWs to an otherwise
unused distinguished location f) can be used to recover
sequential consistency when needed. The following outcome
is forbidden by RA.
    x := 1                 ∥    y := 1
    FADD(f, 0)                  FADD(f, 0)
    a := y //reads 0            b := x //reads 0
                                            (SB+RMWs)
Due to the RMWs in both threads, MRA forbids the annotated
program behavior. Indeed, suppose, w.l.o.g., that the
first thread executes its FADD(f, 0) first. It will read from
the initialization message to f and will add to memory a
message of the form 〈f : 0@1, V〉 with V(x) > 0. When
the second thread executes its FADD(f, 0), it will necessarily
read that message and incorporate the view V in its thread
view, so that its view of x will be increased. Then, when it
reads x it may not pick the initial message.
The RA memory system defined so far (with no silent
transitions) allows non-fair executions. In particular, it allows
messages added by some thread to never propagate to
other threads, so that other threads may forever read a message
with a lower timestamp, and thus allows, e.g., a thread-fair
infinite behavior for the Rloop program from §2. To address
this, we include silent memory transitions in MRA, labeled
with tuples of the form prop(τ, m), where τ ∈ Tid and
m ∈ Msg (i.e., MRA.Θ ≜ {prop(τ, m) | τ ∈ Tid, m ∈ Msg}).
Then, we include in MRA the following silent memory step:
RA-propagate:
$$\frac{m \in M \qquad T(\tau)(\mathsf{loc}(m)) < \mathsf{ts}(m) \qquad T' = T[\tau \mapsto T(\tau)[\mathsf{loc}(m) \mapsto \mathsf{ts}(m)]]}{\langle M, T\rangle \xrightarrow{\mathsf{prop}(\tau, m)}_{\mathsf{MRA}} \langle M, T'\rangle}$$
For a given thread τ and message m that has not yet been
observed by thread τ (T(τ)(loc(m)) < ts(m)), this step
increases τ’s view to include m’s timestamp (T′ = T[τ ↦
T(τ)[loc(m) ↦ ts(m)]]). Intuitively speaking, it ensures
that every thread τ eventually advances its view so that it
cannot keep reading an old message indefinitely.
Example 2.19. While thread-fair behaviors of Rloop under
MRA include an infinite behavior (in which the second
thread indefinitely reads the initialization message), memory
fairness forbids this behavior. Indeed, in runs inducing this
infinite behavior, a silent label prop(2, 〈x : 1@t, [x ↦ t]〉)
(where t is the timestamp of the message added by instruction
x := 1 of Rloop) is necessarily continuously enabled. Memory
fairness ensures that the corresponding transition is eventually
executed, and from that point on, MRA prohibits the
second thread from executing R(x, 0).
To conclude this section, we emphasize again how memory
fairness is sensitive to the choice of silent memory transitions.
For instance, the system obtained from MRA by discarding
the message m from the labels of silent memory
steps induces the same set of behaviors as MRA, but not the
same set of memory-fair behaviors. In the next sections, we
present the declarative approach for defining the semantics
of memory systems, which uniformly captures memory fairness,
and does not require the technical ingenuity needed
for ensuring fairness in operational memory systems.
3 Preliminaries on Declarative Semantics
In this section we review the declarative (a.k.a. axiomatic)
framework for assigning semantics to concurrent programs
and present the well-known declarative models for the three
operational models presented above. Later, we will extend
the framework and the existing correspondence results with
fairness guarantees that account for infinite behaviors.
Relations. Given a binary relation (in particular, a function)
R, dom(R) and codom(R) denote its domain and codomain.
We write R?, R+, and R∗ respectively to denote its
reflexive, transitive, and reflexive-transitive closures. The
inverse relation is denoted by R−1. We denote by R1 ; R2 the
(left) composition of two relations R1, R2, and assume that
; binds tighter than ∪ and \. We denote by [A] the identity
relation on a set A. In particular, [A] ; R ; [B] = R ∩ (A × B).
For n ≥ 0 and a relation R on a set A, R^n is recursively defined
by R^0 ≜ [A] and R^{n+1} ≜ R ; R^n.
Events. Events represent individual memory accesses in
a run of a program.
Definition 3.1. An event e is a tuple 〈k, τ : l〉 where k ∈
ℕ ∪ {⊥} is a serial number inside each thread (⊥ for initialization
events), τ ∈ Tid ⊎ {⊥} is a thread identifier (⊥ for
initialization events), and l ∈ ELab is an event label (as defined
in Def. 2.1). The functions sn, tid, and elab return
the serial number, thread identifier, and event label of
an event. The functions typ, loc, valr, and valw are lifted
to events in the obvious way. We denote by Event the set of
all events, and use R, W, RMW to denote the following subsets:
R ≜ {e ∈ Event | typ(e) = R ∨ typ(e) = RMW}
W ≜ {e ∈ Event | typ(e) = W ∨ typ(e) = RMW}
RMW ≜ {e ∈ Event | typ(e) = RMW}
We use subscripts and superscripts to restrict sets of events
to a certain location and thread (e.g., W_x = {w ∈ W | loc(w) = x}
and E^τ = {e ∈ E | tid(e) = τ}). The set of initialization events
is given by Init ≜ {〈⊥, ⊥ : W(x, 0)〉 | x ∈ Loc}.
Notation 3.2. Given a relation ' on events, we denote by
' |loc the restriction of ' to events of the same location:
' |loc = {〈41, 42〉 ∈ ' | ∃G ∈ Loc. loc(41) = loc(42) = G}
Our representation of events induces a sequenced-before relation <
on events, where e1 < e2 holds iff
(e1 ∈ Init ∧ e2 ∉ Init) ∨ (tid(e1) = tid(e2) ∧ sn(e1) < sn(e2))
Initialization events precede all non-initialization events, while
events of the same thread are ordered according to their serial
numbers.
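The sequenced-before condition can be sketched in a few lines of Python. The Event tuple and the use of None for ⊥ are assumptions made for this illustration only.

```python
from collections import namedtuple

# An event <k, tau : l>; sn and tid are None (standing in for ⊥) on initialization events.
Event = namedtuple("Event", ["sn", "tid", "label"])

def is_init(e):
    return e.sn is None and e.tid is None

def sequenced_before(e1, e2):
    """Initialization events come first; same-thread events follow serial numbers."""
    if is_init(e1) and not is_init(e2):
        return True
    if is_init(e1) or is_init(e2):
        return False
    return e1.tid == e2.tid and e1.sn < e2.sn

init_x = Event(None, None, ("W", "x", 0))
w = Event(0, 1, ("W", "x", 1))      # first event of thread 1
r = Event(1, 1, ("R", "y", 0))      # second event of thread 1
other = Event(0, 2, ("W", "y", 1))  # first event of thread 2
```

Events of different threads, such as w and other, are unrelated by this relation.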
Behaviors (i.e., mappings from threads to sequential traces)
are associated with sets of events in the obvious way:
Definition 3.3. The set of events extracted from a behavior β,
denoted by Event(β), is given by Event(β) ≜ Init ∪
{〈k, τ : β(τ)(k)〉 | τ ∈ Tid, k ∈ dom(β(τ))}.
It is easy to see that for every behavior β, Event(β) satisfies
certain “well-formedness” properties:
Definition 3.4. A set E ⊆ Event is well-formed if the following hold:
• Init ⊆ E.
• tid(e) ≠ ⊥ and sn(e) ≠ ⊥ for every e ∈ E \ Init.
• If tid(e1) = tid(e2) and sn(e1) = sn(e2), then e1 = e2.
• For every e ∈ E \ Init and 0 ≤ k < sn(e), there exists
l ∈ ELab such that 〈k, tid(e) : l〉 ∈ E.
Execution Graphs. An execution graph consists of a set of events,
a reads-from mapping that determines the write event from which each
read reads its value, and a modification order which totally orders
the writes to each location.
Definition 3.5. An execution graph G is a tuple 〈E, rf, mo〉 where:
1. E is a well-formed (possibly infinite) set of events.
2. rf, called reads-from, is a relation on E satisfying:
• If 〈w, r〉 ∈ rf then w ∈ W, r ∈ R, loc(w) = loc(r),
and valw(w) = valr(r).
• w1 = w2 whenever 〈w1, r〉, 〈w2, r〉 ∈ rf (that is,
rf^{-1} is functional).
• E ∩ R ⊆ codom(rf) (every read should read from
some write).
3. mo, called modification order, is a disjoint union of relations
{mo_x}_{x ∈ Loc}, such that each mo_x is a strict total order on
E ∩ W_x.
We denote the components of G by G.E, G.rf, and G.mo, and use G.po
(called program order) to denote the restriction of sequenced-before
to G.E (i.e., G.po ≜ [G.E] ; < ; [G.E]). For a set E′ ⊆ Event, we
write G.E′ for G.E ∩ E′ (e.g., G.W = G.E ∩ W).
We denote by EGraph the set of all execution graphs.
A declarative memory system is simply a set of execution
graphs (often formulated using a conjunction of several con-
straints). We refer to execution graphs in a declarative mem-
ory system G as G-consistent execution graphs.
We can now define the behaviors allowed by a given declarative
memory system.
Definition 3.6. A behavior β is allowed by a declarative memory
system G if Event(β) = G′.E for some execution graph G′ ∈ G. We
denote by B(G) (Bfin(G)) the set of all (finite) behaviors that are
allowed by G.
The linking with programs is defined as follows.
Definition 3.7. Let P be a program, G be a declarative memory
system, and β be a behavior.
• β is a behavior of P under G if β ∈ B(P) ∩ B(G).
• β is a thread-fair behavior of P under G if β ∈ Btf(P) ∩ B(G).
3.1 A Declarative Memory System for SC
To provide a declarative formulation of SC, following Al-
glave et al. [2], we use the standard “from-read” relation
(a.k.a. “reads-before”). In this relation a read r is ordered before
a write w if r reads from a write w′ that is earlier than w in the
modification order.
Definition 3.8. The from-read relation for an execution graph G,
denoted by G.fr, is defined by:
G.fr ≜ (G.rf^{-1} ; G.mo) \ [G.E]
Note that we have to explicitly subtract the identity relation from
G.rf^{-1} ; G.mo to make sure that RMW events are not G.fr-ordered
before themselves.
Having defined fr, the “SC-happens-before” relation is given by:
G.hbSC ≜ (G.po ∪ G.rf ∪ G.mo ∪ G.fr)^+
In turn, SC consistency requires that G.hbSC is irreflexive:
GSC ≜ {G ∈ EGraph | G.hbSC is irreflexive}
Intuitively speaking, every trace of MSC induces an execution graph
G with irreflexive G.hbSC; and, conversely, every total order on G.E
that extends G.hbSC is essentially a trace of MSC. The following
standard theorem formalizes these claims for finite executions:
Theorem 3.9 ([2]). Bfin(MSC) = Bfin(GSC).
Example 3.10. To see that GSC forbids the annotated outcome of the
SB program from Example 2.14, it suffices to note that the following
graph is GSC-inconsistent (W(x, 0) and W(y, 0) are the implicit
initialization writes):
[Figure omitted: execution graph over the events W(x, 0), W(y, 0),
W(x, 1), R(y, 0), W(y, 1), and R(x, 0).]
Indeed, to get the desired behavior, the rf-edges are forced because
of the read values. Since mo cannot contradict po (they are both
included in hbSC), the mo-edges are also forced (from each
initialization write to the later write to the same location). We
obtain fr-edges from R(x, 0) to W(x, 1) and from R(y, 0) to W(y, 1),
which, in turn, imply an hbSC-cycle composed of two po and two fr
edges.
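The argument can be replayed mechanically on a finite graph. The following is a small sketch, not the authors' tooling: events are named by strings, relations are sets of pairs, and all helper names are assumptions made for this illustration.

```python
def compose(r1, r2):
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def transitive_closure(r):
    closure = set(r)
    while True:
        step = closure | compose(closure, r)
        if step == closure:
            return closure
        closure = step

def is_irreflexive(r):
    return all(a != b for (a, b) in r)

def from_read(rf, mo):
    """fr = (rf^-1 ; mo) minus the identity relation."""
    rf_inv = {(r, w) for (w, r) in rf}
    return {(a, b) for (a, b) in compose(rf_inv, mo) if a != b}

def sc_consistent(po, rf, mo):
    hb_sc = transitive_closure(po | rf | mo | from_read(rf, mo))
    return is_irreflexive(hb_sc)

Wx0, Wy0 = "W(x,0)", "W(y,0)"   # initialization writes
Wx1, Ry0 = "W(x,1)", "R(y,0)"   # thread 1
Wy1, Rx0 = "W(y,1)", "R(x,0)"   # thread 2
inits = [Wx0, Wy0]
po = {(i, e) for i in inits for e in [Wx1, Ry0, Wy1, Rx0]} | {(Wx1, Ry0), (Wy1, Rx0)}
rf = {(Wy0, Ry0), (Wx0, Rx0)}
mo = {(Wx0, Wx1), (Wy0, Wy1)}
sb_weak_is_sc = sc_consistent(po, rf, mo)

# For contrast: if thread 2 reads x = 1 instead, the graph is SC-consistent.
Rx1 = "R(x,1)"
po2 = {(i, e) for i in inits for e in [Wx1, Ry0, Wy1, Rx1]} | {(Wx1, Ry0), (Wy1, Rx1)}
rf2 = {(Wy0, Ry0), (Wx1, Rx1)}
sb_other_is_sc = sc_consistent(po2, rf2, mo)
```

The first graph contains the cycle of two po and two fr edges described above, so the check fails for it and succeeds for the contrasting graph.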
3.2 A Declarative Memory System for TSO
Following Alglave et al. [2], a declarative formulation for
TSO is easily obtained from the one of SC, by removing
from the transitive closure in hbSC the program order edges
from writes to reads that are not necessarily “preserved” in
TSO. Indeed, because writes are buffered in TSO, roughly speaking,
the effect of a write in TSO may be delayed w.r.t. subsequent reads.
By contrast, it cannot be delayed w.r.t. subsequent writes, since
entries in the TSO buffers propagate in a FIFO fashion.
When removing the write-to-read program order edges, we need to
explicitly enforce “SC per-location” (a.k.a. coherence), which takes
care of intra-thread write-read pairs (a read r from x that is later
in program order than a write w to x may not read from a write that
is mo-earlier than w). To achieve this, the model employs the
following derived relations:
G.rfe ≜ G.rf \ G.po (external reads-from)
G.ppo ≜ G.po \ ((W \ RMW) × (R \ RMW)) (preserved program order)
G.hbTSO ≜ (G.ppo ∪ G.rfe ∪ G.mo ∪ G.fr)^+ (TSO-happens-before)
G.scloc ≜ (G.po|loc ∪ G.rf ∪ G.mo ∪ G.fr)^+ (SC-per-location order)
Then, TSO consistency requires that G.hbTSO and G.scloc are
irreflexive:
GTSO ≜ {G ∈ EGraph | G.hbTSO and G.scloc are irreflexive}
Theorem 3.11 ([2]). Bfin(MTSO) = Bfin(GTSO).
Revisiting the execution graph for the SB program in Example 3.10,
we note that it is GTSO-consistent. In particular, the two po edges
that participate in the G.hbSC cycle are from a write to a read, so
neither of them is included in G.hbTSO.
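A finite-graph check of the TSO conditions can be sketched as follows. The encoding (string-named events, explicit writes/reads/rmws sets, a loc dictionary) is an assumption made for this illustration.

```python
def compose(r1, r2):
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def transitive_closure(r):
    closure = set(r)
    while True:
        step = closure | compose(closure, r)
        if step == closure:
            return closure
        closure = step

def is_irreflexive(r):
    return all(a != b for (a, b) in r)

def from_read(rf, mo):
    rf_inv = {(r, w) for (w, r) in rf}
    return {(a, b) for (a, b) in compose(rf_inv, mo) if a != b}

def tso_consistent(po, rf, mo, writes, reads, rmws, loc):
    fr = from_read(rf, mo)
    rfe = rf - po
    plain_w, plain_r = writes - rmws, reads - rmws
    # ppo drops the po edges that go from a plain write to a plain read.
    ppo = {(a, b) for (a, b) in po if not (a in plain_w and b in plain_r)}
    hb_tso = transitive_closure(ppo | rfe | mo | fr)
    po_loc = {(a, b) for (a, b) in po if loc[a] == loc[b]}
    scloc = transitive_closure(po_loc | rf | mo | fr)
    return is_irreflexive(hb_tso) and is_irreflexive(scloc)

# The SB graph of Example 3.10.
Wx0, Wy0, Wx1, Ry0, Wy1, Rx0 = "W(x,0)", "W(y,0)", "W(x,1)", "R(y,0)", "W(y,1)", "R(x,0)"
loc = {Wx0: "x", Wy0: "y", Wx1: "x", Ry0: "y", Wy1: "y", Rx0: "x"}
po = {(i, e) for i in [Wx0, Wy0] for e in [Wx1, Ry0, Wy1, Rx0]} | {(Wx1, Ry0), (Wy1, Rx0)}
rf = {(Wy0, Ry0), (Wx0, Rx0)}
mo = {(Wx0, Wx1), (Wy0, Wy1)}
sb_is_tso = tso_consistent(po, rf, mo, {Wx0, Wy0, Wx1, Wy1}, {Ry0, Rx0}, set(), loc)

# A coherence violation: one thread writes x = 1 and then reads the stale x = 0.
Rstale = "R(x,0)'"
loc2 = {Wx0: "x", Wx1: "x", Rstale: "x"}
po2 = {(Wx0, Wx1), (Wx0, Rstale), (Wx1, Rstale)}
stale_is_tso = tso_consistent(po2, {(Wx0, Rstale)}, {(Wx0, Wx1)},
                              {Wx0, Wx1}, {Rstale}, set(), loc2)
```

The second graph passes the hbTSO condition but fails scloc, which is why the per-location requirement has to be enforced separately.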
3.3 A Declarative Memory System for RA
The declarative model for RA is obtained by strengthening
the SC per-location requirement to use RA’s happens-before relation
instead of the program order:
G.hbRA ≜ (G.po ∪ G.rf)^+ (RA-happens-before)
G.raloc ≜ (G.hbRA|loc ∪ G.rf ∪ G.mo ∪ G.fr)^+ (RA-per-location order)
Then, RA consistency requires that G.raloc is irreflexive:
GRA ≜ {G ∈ EGraph | G.raloc is irreflexive}
Equivalence to the operational RA model for finite behaviors follows
from the results of Kang et al. [11]:
Theorem 3.12. Bfin(MRA) = Bfin(GRA).
Next, we provide declarative justifications for some of the
examples presented in §2.2.
Example 3.13. To see that the annotated outcome of the MP program
from Example 2.16 is disallowed by GRA, it suffices to note that the
following (partially depicted) execution graph is GRA-inconsistent:
[Figure omitted: partial execution graph containing W(x, 0), W(x, 1),
and R(x, 0).]
An execution graph for this outcome must have the rf and mo edges
depicted in the figure. Since mo goes from W(x, 0) to W(x, 1), and
R(x, 0) reads from W(x, 0), we have an fr edge from R(x, 0) to
W(x, 1). Due to the hbRA edge from W(x, 1) to R(x, 0), we obtain a
raloc-cycle, rendering this graph GRA-inconsistent.
Example 3.14. To see that the annotated outcome of the 2RMW program
from Example 2.17 is disallowed by GRA, it suffices to note that the
following execution graph is GRA-inconsistent for any choice of mo:
                W(x, 0)
          rf ↙          ↘ rf
RMW(x, 0, 1)          RMW(x, 0, 1)
To see this, note that in GRA-consistent executions, mo cannot
contradict po. Hence, we must have mo from the initial write to the
two RMWs. This implies an fr edge in both directions between the two
RMWs, so that raloc must be cyclic.
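The claim "for any choice of mo" can be checked exhaustively on this three-event graph. This is a sketch under the same assumed encoding as before (string-named events, relations as sets of pairs); none of the helper names come from the paper.

```python
from itertools import permutations

def compose(r1, r2):
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def transitive_closure(r):
    closure = set(r)
    while True:
        step = closure | compose(closure, r)
        if step == closure:
            return closure
        closure = step

def from_read(rf, mo):
    rf_inv = {(r, w) for (w, r) in rf}
    return {(a, b) for (a, b) in compose(rf_inv, mo) if a != b}

def ra_consistent(po, rf, mo, loc):
    fr = from_read(rf, mo)
    hb_ra = transitive_closure(po | rf)
    hb_loc = {(a, b) for (a, b) in hb_ra if loc[a] == loc[b]}
    raloc = transitive_closure(hb_loc | rf | mo | fr)
    return all(a != b for (a, b) in raloc)

def total_order(seq):
    """The strict total order that lists seq from mo-first to mo-last."""
    return {(seq[i], seq[j]) for i in range(len(seq)) for j in range(i + 1, len(seq))}

init, u1, u2 = "W(x,0)", "RMW1(x,0,1)", "RMW2(x,0,1)"
loc = {init: "x", u1: "x", u2: "x"}
po = {(init, u1), (init, u2)}
rf = {(init, u1), (init, u2)}   # both RMWs read the initial value 0

# Try every strict total order on the three writes to x as the modification order.
results = {perm: ra_consistent(po, rf, total_order(perm), loc)
           for perm in permutations([init, u1, u2])}
```

Every candidate mo yields a raloc cycle, either through the two fr edges between the RMWs or because mo contradicts po.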
4 Making Declarative Semantics Fair
In this section we introduce memory fairness into declara-
tive memory models in a model-agnostic fashion.
To define fairness of execution graphs, we require that the
partial ordering of events in the graph is, like the ordering
of natural numbers, prefix-finite. From an operational point
of view, an event preceded by an infinite number of events
is never executed.
Definition 4.1. A relation R on a set A is prefix-finite if
{a ∈ A | 〈a, b〉 ∈ R} is finite for every b ∈ A.
Concretely, we require the modification order and the from-read
relation to be prefix-finite.²
Definition 4.2. An execution graph G is fair if G.mo and G.fr are
prefix-finite. We denote by G^fair_X the set of execution graphs in
G_X that are fair, for X ∈ {SC, TSO, RA}.
Example 4.3. The following program illustrates our definition of
fairness:
    x := 1;              ∥   L2: x := 2;
L1: a := x  //only 1     ∥       goto L2
    goto L1
(SCDeclUnfair)
Thread-fair executions of this program cannot produce the annotated
outcome with the SC memory system. With the declarative SC memory
system, however, there are two ways in which every read can read
from the write of 1.
First, the write of 1 to x may have infinitely many mo-predecessors,
in which case mo is not prefix-finite.
Otherwise, the write of 1 may have finitely many mo-predecessors but
infinitely many mo-successors. Then, each of these mo-successors has
infinitely many fr-predecessors (the reads of 1), so fr is not
prefix-finite.
In both cases, the execution graphs are unfair. (As we prove below,
this is not a coincidence.)
Example 4.4. Conversely, one should avoid unnecessary
prefix-finiteness constraints. In particular, requiring
prefix-finiteness of cyclic relations, such as hbSC under TSO or RA,
is too strong. Doing so would forbid the annotated behavior of the
following example, since the corresponding execution graph contains
an infinite po ∪ fr descending chain.
² Note that the sequenced-before and reads-from relations are
prefix-finite in a well-formed execution graph: the former by
construction, the latter since its inverse relation is functional.
Yet, both TSO and RA allow the annotated behavior, as every write
may be delayed past one or two reads.
L1: k := k + 1               ∥   L2: m := m + 1
    x := k                   ∥       y := m
    a := y  //0,0,1,2...     ∥       b := x  //0,1,2,3...
    goto L1                  ∥       goto L2
(HbAcyclic)
Thread 1: W(x, 1) R(y, 0) W(x, 2) R(y, 0) W(x, 3) R(y, 1) ...
Thread 2: W(y, 1) R(x, 0) W(y, 2) R(x, 1) W(y, 3) R(x, 2) ...
Our main result extends Theorems 3.9, 3.11 and 3.12 to infinite
traces by imposing memory fairness on the operational systems
(Def. 2.12) and execution graph fairness on the declarative systems
(Def. 4.2).
Theorem 4.5. For X ∈ {SC, TSO, RA}, Bmf(M_X) = B(G^fair_X).
The full proof of this theorem can be found in Appendix B and its
Coq mechanization in [24]. Here, we outline the proof, starting with
the easier direction.
4.1 Bmf(M_X) ⊆ B(G^fair_X)
Given a memory-fair behavior β of M_X, we let ρ be a memory-fair
observable trace of M_X such that β(ρ) = β. Then, using ρ, we
construct a fair execution graph G ∈ G_X. Its events are determined
by β (G.E = Event(β)), and its relations are defined differently
for every system:
SC. The rf and mo relations are determined by the trace order: for
each read, rf assigns the latest write to the same location, while
mo corresponds to the trace order of writes to the same location. It
follows that fr is included in the trace order, and since the trace
order is prefix-finite, mo and fr are prefix-finite as well.
TSO. We define mo to be the order in which writes to the same
location are propagated (unbuffered) to memory. For each read, rf
maps it either to the mo-maximal write to the same location that was
propagated before it in ρ (if the read reads from memory) or to the
po-maximal one (if it reads from the buffer). Since every write is
eventually propagated to memory, and once propagated no thread can
read from an mo-prior write, it follows that both mo and fr are
prefix-finite.
RA. The mo component of G follows the order induced by timestamps of
messages in the operational run. Prefix-finiteness of mo follows
from the facts that a location and a timestamp uniquely identify the
corresponding message (and, respectively, the write event in G) and
that timestamps are natural numbers; that is, each write event w
representing a message with a timestamp t has at most t mo-prior
writes.
The rf component of G connects an event related to a read/RMW
transition of ρ with the write event representing the message read
by the transition.
Prefix-finiteness of fr follows from the fact that in the fair
operational run every message is eventually propagated to every
thread. That is, for any given write event w to a location x in G
representing a message with a timestamp t, there cannot be
infinitely many reads from x in G reading from write events that
correspond to messages with timestamps smaller than t.
4.2 B(G^fair_X) ⊆ Bmf(M_X)
The converse direction is more challenging. Given a fair
G_X-consistent execution graph G, we have to find a memory-fair
observable trace ρ of M_X such that Event(β(ρ)) = G.E.
Put differently, we need a total order over G.E \ Init that extends
G.po, so that some memory-fair run of M_X executes according to this
order. Existing proofs of correspondence between declarative and
operational definitions of SC and RA pick an arbitrary total order
extending G.hbSC and G.hbRA respectively. (Assuming the axiom of
choice, any partial order R on a set A can be extended to a total
order on A.) It is then not difficult to show that executing the
program following that order yields the labels appearing in the
execution graph. For infinite graphs, however, an arbitrary
extension of G.hbSC (or G.hbRA respectively) does not necessarily
correspond to a (memory-fair) run of the program. For this, we need
an enumeration of G.E \ Init, as defined next.
Definition 4.6. An enumeration of a set A is a (finite or infinite)
injective sequence ν covering all the elements in A (i.e.,
A = {ν(i) | i ∈ dom(ν)}). An enumeration ν of A respects a partial
order R on A if i < j whenever 〈ν(i), ν(j)〉 ∈ R. A set A is
countable if it can be enumerated.
Prefix-finiteness of a partial order ensures that a suitable
enumeration exists:
Proposition 4.7. Let R be a prefix-finite partial order on a
countable set A. Then, there exists an enumeration of A that
respects R.
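One way to build such an enumeration is sketched below as a Python generator. It follows the idea of taking an arbitrary enumeration and emitting, before each element, its not-yet-emitted predecessors; the function names and the divisibility example are assumptions made for this illustration, not taken from the paper.

```python
from itertools import count, islice

def respecting_enumeration(base, preds):
    """Yield the elements of the enumeration `base` in an order that respects the
    strict partial order whose (finite) set of predecessors of x is preds(x)."""
    emitted = set()
    for x in base:
        # New elements: x and those of its predecessors that were not emitted yet.
        new = [a for a in preds(x) | {x} if a not in emitted]
        # In a strict partial order, a < b implies preds(a) is a proper subset of
        # preds(b), so sorting the finite batch by predecessor count respects it.
        new.sort(key=lambda a: len(preds(a)))
        for a in new:
            emitted.add(a)
            yield a

def proper_divisors(n):
    """Predecessors of n in the prefix-finite order 'a properly divides b'."""
    return {d for d in range(1, n) if n % d == 0}

def swapped():
    """2, 1, 4, 3, 6, 5, ...: an enumeration of the positive integers."""
    for k in count(1):
        yield 2 * k
        yield 2 * k - 1

prefix = list(islice(respecting_enumeration(swapped(), proper_divisors), 12))
```

Prefix-finiteness is what makes each batch finite; without it, an element with infinitely many predecessors could never be emitted.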
However, we do not yet have that the “happens-before” relation of
each model is prefix-finite; we only know that G.mo and G.fr are
prefix-finite. Next, we show that prefix-finiteness of G.mo and G.fr
suffices for prefix-finiteness of the other relations, as long as
the program in question has a bounded number of threads.
First, note that every relation on a finite set is prefix-finite,
and prefix-finiteness is preserved by (finite) composition.
Lemma 4.8. Let R and R′ be prefix-finite relations and n ∈ N.
Then R ∪ R′, R ; R′ and R^{≤n} are also prefix-finite.
For transitive closures, we need an auxiliary property.
Definition 4.9. A relation R on a set A is n-total if for every
n + 1 distinct elements a_1, ..., a_{n+1} ∈ A, we have
〈a_i, a_j〉 ∈ R for some 1 ≤ i, j ≤ n + 1.
For an execution graph G with n threads, G.po is n-total (as a
relation on G.E \ Init). By the pigeonhole principle, any set of
n + 1 events in G.E \ Init contains two elements belonging to the
same thread, and those two events are ordered by G.po.
Now, if a relation R is n-total and acyclic, its transitive closure
R^+ has bounded length, which entails that R^+ is prefix-finite
provided R is prefix-finite.
Lemma 4.10. Let R be an acyclic, n-total, prefix-finite relation.
Then, R^+ is prefix-finite.
As a corollary, we obtain the prefix-finiteness of the
“happens-before” relation in fair execution graphs.
Corollary 4.11. For X ∈ {SC, TSO, RA}, let G be a fair
G_X-consistent execution graph. Then G.hb_X is prefix-finite.
From Prop. 4.7, there is an enumeration ν that respects hb_X. We use
ν to construct a program trace ρ:
SC. The trace ρ follows ν exactly. Since MSC has no silent memory
transitions, ρ is trivially memory fair.
TSO. The trace ρ is incrementally constructed by following the order
of events in ν and appending an appropriate sequence of transitions.
If the next event in ν is a read, we append to ρ all unexecuted
po-prior writes and then the read. If the next event in ν is a
write, we append the write to the trace if it has not already been
executed, followed by its propagation to memory. By construction,
every write in ρ is eventually propagated to memory.
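The TSO construction can be sketched on a finite example. The code below is a rough illustration under stated assumptions: events are plain reads and writes (RMWs are left out), the input enumeration is assumed to respect hbTSO, and the store-buffer replay in run_tso is a simplified stand-in for the operational TSO system, not its definition from the paper.

```python
from collections import deque, namedtuple

Ev = namedtuple("Ev", ["tid", "sn", "kind", "loc", "val"])  # kind is "W" or "R"

def tso_trace(enumeration):
    """Turn an enumeration (assumed to respect hbTSO) into a list of
    ('exec', e) and ('prop', w) steps."""
    events = list(enumeration)
    executed, trace = set(), []
    for e in events:
        if e.kind == "R":
            # First execute the unexecuted po-earlier writes of the same thread.
            pending = sorted((w for w in events
                              if w.kind == "W" and w.tid == e.tid
                              and w.sn < e.sn and w not in executed),
                             key=lambda w: w.sn)
            for w in pending:
                executed.add(w)
                trace.append(("exec", w))
            trace.append(("exec", e))
        else:
            if e not in executed:
                executed.add(e)
                trace.append(("exec", e))
            trace.append(("prop", e))   # every write is propagated eventually
    return trace

def run_tso(trace):
    """Replay the steps on per-thread FIFO store buffers; return what each read sees."""
    memory, buffers, seen = {}, {}, {}
    for step, e in trace:
        buf = buffers.setdefault(e.tid, deque())
        if step == "exec" and e.kind == "W":
            buf.append(e)
        elif step == "exec":
            own = [w.val for w in buf if w.loc == e.loc]
            seen[e] = own[-1] if own else memory.get(e.loc, 0)
        else:
            assert buf[0] == e, "store buffers drain in FIFO order"
            memory[e.loc] = buf.popleft().val
    return seen

# SB with the weak outcome: both reads return 0.
W1, R1 = Ev(1, 0, "W", "x", 1), Ev(1, 1, "R", "y", 0)
W2, R2 = Ev(2, 0, "W", "y", 1), Ev(2, 1, "R", "x", 0)
trace = tso_trace([R1, R2, W1, W2])   # respects fr: each read precedes the other thread's write
seen = run_tso(trace)
```

In the resulting trace both writes are buffered before the reads execute and are propagated afterwards, which reproduces the annotated read values.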
RA. The trace ρ is the enumeration ν interleaved with silent RA
transition labels. Namely, for each write w and thread τ, we compute
an index i in the enumeration s.t. it is safe to propagate w to τ at
that index: for each event in τ with index greater than i, there is
no hb-following (i) write that mo-precedes w and (ii) read that
reads from a write mo-preceding w. Since G is fair, such an index is
defined for all (non-initialization) writes. Then, after the event
with an index corresponding to some write has been enumerated, we
execute a propagation transition for the write. In that way, every
write is eventually propagated to every thread, so the resulting
trace is memory fair.
Remark 3. Corollary 4.11 relies on having a bounded num-
ber of threads. With unbounded thread spawning, prefix-
finiteness of mo and fr is not enough to rule out unfair be-
haviors. To see this, consider the annotated behavior of the
following program and the corresponding execution graph:
L : k := k + 1
spawn
{
[k + 1] := 1













While mo and fr are trivially prefix-finite, hbSC has an infinite
descending chain, and indeed there is no SC execution of the
program leading to the annotated behavior.
5 Termination of Spinloops
In this section we consider the problem of termination of spinloops
under weak memory models. We show that, assuming fairness as defined
above, termination can be proven by looking only at a single,
specific iteration of the loop.
For simplicity, we henceforth assume that programs are
deterministic, as defined below.





Definition 5.1. A program P is deterministic if p −l1→_P p1 and
p −l2→_P p2 imply that typ(l1) = typ(l2) and loc(l1) = loc(l2), and,
moreover, if l1 = l2, then p1 = p2 also holds.
For a behavior β of a deterministic program P, we denote by μ_τ(β)
the unique run of P(τ) that induces the sequential trace β(τ).
Definition 5.2. A spinloop iteration of thread τ in a behavior β is
a range of event serial numbers [n, n′] such that the sequence of
corresponding program steps:
1. performs only reads: typ(tlab(μ_τ(β)(i))) = R for n ≤ i ≤ n′; and
2. returns the program to the starting state of the loop:
src(μ_τ(β)(n)) = tgt(μ_τ(β)(n′)).
An infinite spinloop of thread τ in a behavior β is an infinite
sequence s of consecutive spinloop iterations of thread τ (i.e.,
s(i) = [n_i, n′_i] ⟹ ∃n′. s(i + 1) = [n′_i + 1, n′]).





If infinite spinloops are the only source of unbounded behavior in
programs (i.e., their individual iterations are of bounded length
and there are boundedly many writes to each memory location), then
because of fairness an infinite spinloop has to eventually read from
the mo-maximal writes. Thus, the theorem below provides a sufficient
condition for establishing termination of spinloops.
Theorem 5.3. Let β be a behavior of a deterministic program and G be
a fair execution graph with G.E = Event(β) and G.scloc (see §3.2)
irreflexive. For every infinite spinloop s of a thread τ in β whose
iterations have bounded length and read only from locations that are
written to by finitely many writes in G, there is a loop iteration
s(i) whose reads all read from mo-maximal writes.
6 Proving Deadlock Freedom for Locks
In this section we use Theorem 5.3 to reason about termina-
tion of spinlock and ticket lock clients. Our supplementary
material contains a proof for an MCS lock client.
6.1 Spinlock
Consider the following spinlock implementation:
int l := 0
void lock() { int r
  repeat { repeat { r := l } until (r = 0) }
  until (CAS(l, 0, 1)) }
void unlock() { l := 0 }












Proof. Assume for the sake of contradiction that the program has an
infinite thread-fair behavior β, which is induced by a fair
execution graph G. By inspection, since G is infinite, β must
contain an infinite spinloop. The number of write events to location
l in G is finite since each thread makes at most two writes to l.
Fix the mo-maximal one among them and denote it w. Due to
thread-fairness of β, the value written by w has to be 0. Otherwise,
it could only have been the value 1 produced by the CAS instruction,
which is followed by a store writing 0. The write event produced by
that store would have been mo-after w by {SC, TSO, RA}-consistency
of G.
By Theorem 5.3, there is a spinloop iteration that reads from w,
which is a contradiction, since reading 0 from location l exits the
loop. □
6.2 Ticket lock
Consider the following ticket lock implementation:
int serving := 0, ticket := 0
void lock() { int s := 0, r := FADD(ticket, 1)
  repeat { s := serving } until (s = r) }
void unlock() { serving := serving + 1 }
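For readers who want to experiment with the client structure, here is a rough Python sketch of the ticket lock. It is only an illustration: a small mutex stands in for the atomic FADD, the clients run a bounded number of rounds rather than looping forever, and CPython threads do not exhibit weak-memory behavior, so the sketch says nothing about TSO or RA. The names TicketLock and run_clients are hypothetical.

```python
import threading
import time

class TicketLock:
    """Ticket lock; a mutex stands in for the atomic FADD instruction."""
    def __init__(self):
        self.serving = 0
        self.ticket = 0
        self._atomic = threading.Lock()

    def lock(self):
        with self._atomic:             # r := FADD(ticket, 1)
            r = self.ticket
            self.ticket += 1
        while self.serving != r:       # repeat { s := serving } until (s = r)
            time.sleep(0)              # yield to other threads while spinning

    def unlock(self):
        self.serving += 1              # only the current lock holder writes serving

def run_clients(num_threads, rounds):
    lock = TicketLock()
    counters = [0] * num_threads
    shared = {"total": 0}

    def client(i):
        for _ in range(rounds):
            lock.lock()
            counters[i] += 1
            shared["total"] += 1       # protected by the ticket lock
            lock.unlock()

    threads = [threading.Thread(target=client, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counters, shared["total"], lock

counters, total, lock = run_clients(3, 20)
```

Each call to lock obtains a distinct ticket, and tickets are served in increasing order, which is the safety property the termination argument builds on.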
Theorem 6.2. In every thread-fair behavior of the following program
under G^fair_{SC,TSO,RA}, r_1, ..., r_N all grow unboundedly:
L1: lock()            L2: lock()            ...   LN: lock()
    r_1 := r_1 + 1        r_2 := r_2 + 1              r_N := r_N + 1
    unlock()              unlock()                    unlock()
    goto L1               goto L2                     goto LN
Proof. For any thread-fair behavior β of this program and a fair
execution graph G inducing β, it can be shown that each call to lock
reads a unique value from ticket, and that whenever a certain lock
call reads ticket value v (and the spinloop exits), the
corresponding unlock writes value v + 1 to serving. Moreover, the
values written to ticket and to serving are strictly increasing
along G.mo. (These are standard safety properties, so we elide
details of their proofs.)
By way of contradiction, now assume that there is a fair execution
graph G inducing β where r_i for some 1 ≤ i ≤ N is incremented only
a finite number of times.
Due to thread-fairness of β, the only way this can happen is if
thread i has an infinite spinloop. There may well be multiple
threads with infinite spinloops, so among those threads let us
consider the thread τ that reads the smallest value for ticket, say
k, just before going into the infinite spinloop. So, for all
0 ≤ j < k, some lock call has read ticket value j and the
corresponding unlock has subsequently set serving to value j + 1. In
particular, the mo-maximal among those writes sets serving to value
k. Note that there cannot be any writes to serving with larger
values because they all require serving to first be set to k + 1
(which does not happen since τ is stuck in a spinloop).
Because of thread-fairness and Theorem 5.3, the infinite spinloop
must have an iteration that reads from the mo-maximal write to
serving, i.e., reading value k. This is a contradiction, because
reading k exits the loop. □
7 Related Work and Discussion
We have investigated fairness in weak memory models, both
operationally and declaratively, established three equivalence
results, and showed how the declarative formulations can be used for
reasoning about program termination.
Several papers, e.g., [5, 6, 9], have studied declarative
formulations of transactional consistency with prefix-finiteness
constraints to ensure that a transaction is never preceded by an
infinite set of other transactions. In particular, Gotsman and
Burckhardt [9] established a connection between declarative
presentations that include fairness constraints and operational
presentations for models in their “Global Operation Sequencing”
framework. The TSO model can be expressed in this framework.
Nevertheless, their declarative specifications require
prefix-finiteness of the global visibility order, while we derive
this property from prefix-finiteness of more local relations (mo and
fr). Thus, our formulation is easily applicable for model checking
based on partial order reduction in the style of [12]. To the best
of our knowledge, this is the first work to make a connection
between liveness in declarative models formulated in the widely used
framework of Alglave et al. [2] and in operational models.
Finally, we outline several directions for future work.
Fairness under ARM and POWER. Low-level hardware memory models, such
as ARM and POWER, record syntactic dependencies between
instructions, so as to allow certain executions with cycles in
po ∪ rf. In these models prefix-finiteness of mo and fr does not
suffice for prefix-finiteness of the appropriate “happens-before”
relation. For instance, under ARMv8 [8], assuming prefix-finiteness
of mo and fr does not forbid the out-of-thin-air read of the value 5
in the
following example (with an unbounded address domain):
L1: y_i := x_i  //5       ∥   L2: x_j := y_{j+1}  //5
    i := i + 1            ∥       j := j + 1
    goto L1               ∥       goto L2
(Inf)
Thread 1: R(x0, 5) W(y0, 5) R(x1, 5) W(y1, 5) R(x2, 5) W(y2, 5) ...
Thread 2: R(y1, 5) W(x0, 5) R(y2, 5) W(x1, 5) R(y3, 5) W(x2, 5) ...
We suspect that in ARMv8, the appropriate liveness condition should
require prefix-finiteness of the “ordered-before” (ob) relation, and
leave the correspondence to the operational ARM model to future
work.
Weak RMWs. Besides the usual (“strong”) CAS instructions, C11 also
supports “weak” CASes [1], which may fail spuriously, i.e., even
when they read the expected value. They exist because on some
architectures (namely, POWER and ARM) weak CASes are a bit more
efficient than strong ones. In any case, a strong CAS can be
implemented by repeatedly performing a weak CAS in a loop as long as
it fails spuriously. Naturally, termination of such loops depends
upon the weak CASes not always failing spuriously, which is an
additional fairness requirement. Since this requirement is
orthogonal to the notion of memory fairness introduced in this
paper, we leave it for future work.
Acknowledgments
Ori Lahav was supported by the Israel Science Foundation
(grant number 5166651) and by the Alon Young Faculty Fellowship.
We thank Fu Ming, Diogo Behrens, and Lilith Oberhauser for
implementing and reproducing the HMCS bug on ARM.
References




[2] Jade Alglave, Luc Maranget, and Michael Tautschnig. 2014. Herding
Cats: Modelling, Simulation, Testing, and Data Mining for Weak
Memory. ACM Trans. Program. Lang. Syst. 36, 2, Article 7 (July 2014),
74 pages. https://doi.org/10.1145/2627752
[3] Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark
Weber. 2011. Mathematizing C++ Concurrency. In POPL 2011. ACM, New
York, 55–66. https://doi.org/10.1145/1925844.1926394
[4] John Bender and Jens Palsberg. 2019. A formalization of Java’s
concurrent access modes. Proc. ACM Program. Lang. 3, OOPSLA (2019),
142:1–142:28. https://doi.org/10.1145/3360568
[5] Ahmed Bouajjani, Constantin Enea, and Jad Hamza. 2014. Verifying
Eventual Consistency of Optimistic Replication Systems. SIGPLAN
Not. 49, 1 (Jan. 2014), 285–296.
https://doi.org/10.1145/2578855.2535877
[6] Andrea Cerone, Giovanni Bernardi, and Alexey Gotsman. 2015. A
Framework for Transactional Consistency Models with Atomic
Visibility. In 26th International Conference on Concurrency Theory
(CONCUR 2015) (LIPIcs), Vol. 42. Schloss Dagstuhl–Leibniz-Zentrum
fuer Informatik, 58–71.
[7] Milind Chabbi, Michael Fagan, and John Mellor-Crummey. 2015. High
performance locks for multi-level NUMA systems. ACM SIGPLAN
Notices 50, 8 (2015), 215–226.
[8] Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit Sarkar,
Ali Sezgin, Luc Maranget, Will Deacon, and Peter Sewell. 2016.
Modelling the ARMv8 Architecture, Operationally: Concurrency and
ISA. In POPL 2016. ACM, New York, 608–621.
https://doi.org/10.1145/2837614.2837615
[9] Alexey Gotsman and Sebastian Burckhardt. 2017. Consistency Models
with Global Operation Sequencing and their Composition. In 31st
International Symposium on Distributed Computing (DISC 2017)
(Leibniz International Proceedings in Informatics (LIPIcs)), Andréa
W. Richa (Ed.), Vol. 91. Schloss Dagstuhl–Leibniz-Zentrum fuer
Informatik, Dagstuhl, Germany, 23:1–23:16.
https://doi.org/10.4230/LIPIcs.DISC.2017.23
[10] Jan-Oliver Kaiser, Hoang-Hai Dang, Derek Dreyer, Ori Lahav, and
Viktor Vafeiadis. 2017. Strong Logic for Weak Memory: Reasoning
About Release-Acquire Consistency in Iris. In 31st European
Conference on Object-Oriented Programming (ECOOP 2017) (Leibniz
International Proceedings in Informatics (LIPIcs)), Peter Müller
(Ed.), Vol. 74. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik,
Dagstuhl, Germany, 17:1–17:29.
https://doi.org/10.4230/LIPIcs.ECOOP.2017.17
[11] Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and
Derek Dreyer. 2017. A Promising Semantics for Relaxed-Memory
Concurrency. In POPL 2017. ACM, New York, 175–189.
https://doi.org/10.1145/3009837.3009850
[12] Michalis Kokologiannakis, Azalea Raad, and Viktor Vafeiadis.
2019. Model Checking for Weakly Consistent Libraries. In
Proceedings of the 40th ACM SIGPLAN Conference on Programming
Language Design and Implementation (PLDI 2019). Association for
Computing Machinery, New York, NY, USA, 96–110.
https://doi.org/10.1145/3314221.3314609
[13] Ori Lahav, Nick Giannarakis, and Viktor Vafeiadis. 2016. Taming
Release-acquire Consistency. In POPL ’16. ACM, New York, 649–662.
https://doi.org/10.1145/2837614.2837643
[14] Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and
Derek Dreyer. 2017. Repairing Sequential Consistency in C/C++11. In
PLDI 2017. ACM, New York, 618–632.
https://doi.org/10.1145/3062341.3062352
[15] Leslie Lamport. 1977. Proving the Correctness of Multiprocess
Programs. IEEE Trans. Software Eng. 3, 2 (1977), 125–143.
https://doi.org/10.1109/TSE.1977.229904
[16] Leslie Lamport. 1979. How to Make a Multiprocessor Computer That
Correctly Executes Multiprocess Programs. IEEE Trans. Computers 28,
9 (1979), 690–691.
[17] Daniel Lehmann, Amir Pnueli, and Jonathan Stavi. 1981.
Impartiality, Justice and Fairness: The Ethics of Concurrent
Termination. In ICALP 1981 (LNCS), Shimon Even and Oded Kariv
(Eds.), Vol. 115. Springer, 264–277.
https://doi.org/10.1007/3-540-10843-2_22
[18] Scott Owens, Susmit Sarkar, and Peter Sewell. 2009. A Better x86
Memory Model: x86-TSO. In TPHOLs ’09. Springer, Heidelberg,
391–407. https://doi.org/10.1007/978-3-642-03359-9_27
[19] David Park. 1980. On the semantics of fair parallelism. In
Abstract Software Specifications. Springer, 504–526.
[20] Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. 2019. Bridging
the Gap between Programming Languages and Hardware Weak Memory
Models. Proc. ACM Program. Lang. 3, POPL, Article 69 (Jan. 2019),
31 pages. https://doi.org/10.1145/3290382
[21] Christopher Pulte, Shaked Flur, Will Deacon, Jon French, Susmit
Sarkar, and Peter Sewell. 2018. Simplifying ARM concurrency:
multicopy-atomic axiomatic and operational models for ARMv8.
PACMPL 2, POPL (2018), 19:1–19:29. https://doi.org/10.1145/3158107
[22] Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa
Nardelli, and Magnus O. Myreen. 2010. x86-TSO: A Rigorous and Usable
Programmer’s Model for x86 Multiprocessors. Commun. ACM 53, 7
(2010), 89–97. https://doi.org/10.1145/1785414.1785443
[23] SPARC International Inc. 1994. The SPARC architecture manual
(version 9). Prentice-Hall.
[24] Supp 2020. Coq proof scripts and supplementary material for this
paper (enclosed with submission).
A Proofs of Propositions and Lemmas in the Paper
A.1 Proofs for Section 2
Proof of Prop. 2.5. W.l.o.g. suppose that ρ_1 ∈ OTr(P). Note that there are no silent transition labels in traces of P, so ρ_1 is a trace of P. It means that there exists a run μ_1 of P inducing ρ_1.
Note that transitions of different sequential programs are independent of each other. That is, they can be reordered in μ_1 in such a way that the resulting run μ_2 induces ρ_2. Since β(ρ_1) = β(ρ_2), no reordering of transitions within the same thread is needed to do that. □
Proof of Prop. 2.8. W.l.o.g. suppose that ρ_1 is a thread-fair observable trace of P induced by some μ_1. By Prop. 2.5, ρ_2 is an observable trace of P induced by some μ_2.
Note that, since β(ρ_1) = β(ρ_2), every thread’s LTS goes through the same sequence of states. That is, for each thread, μ_2 has the same set of continuously enabled states as μ_1 does. Also, since β(ρ_1) = β(ρ_2), every continuously enabled state is succeeded by the same sequence of labels in both μ_1 and μ_2. Thus, if ρ_1 is thread-fair, then so is ρ_2. □
Proof of Prop. 2.9. In the right-to-left implications of both claims, the trace ρ belongs both to B(M) and to B(P) (respectively, B_tf(P)), thus satisfying the needed conditions.
Note that for a (thread-fair) behavior β under M there are two traces: one from B(P) (respectively, B_tf(P)) and another from B(M). Since these traces have the same behavior, the left-to-right implications can be proved by applying Prop. 2.5 (respectively, Prop. 2.8) to the trace from B(M). □
Proof of Prop. 2.13. In the right-to-left implications of both claims, the trace ρ belongs both to B_mf(M) and to B(P) (respectively, B_tf(P)), thus satisfying the needed conditions.
Note that for a memory-fair (thread&memory-fair) behavior β under M there are two traces: one from B(P) (respectively, B_tf(P)) and another from B_mf(M). Since these traces have the same behavior, the left-to-right implications can be proved by applying Prop. 2.5 (respectively, Prop. 2.8) to the trace from B_mf(M). □
A.2 Proofs for Section 4
Proof of Prop. 4.7. Let ν be an enumeration on A. For i ∈ ℕ, let S_i = {ν(i)} ∪ {a | 〈a, ν(i)〉 ∈ R} be the set containing ν(i) and all its R-predecessors. Note that this set is finite because R is prefix-finite.
Now let T_i = S_i \ ⋃{S_j | j < i} be the set of new elements in S_i (i.e., those not included in the S-set for previous indices).





















... t^i_{|T_i|} .
Note that this sequence has at least i elements, so μ(i) is well defined. Moreover, by construction, this sequence has no duplicates, respects R, and the sequence for i is a prefix of the sequence for i + 1. From these facts, it follows that μ is an enumeration of A that respects R. □
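The construction in this proof can be tried out on a small finite example. The following is only a sketch under stated assumptions: the lines defining the concatenated sequence are missing from this copy, so the code assumes that each T_i is listed in some R-respecting order, and it assumes that R is transitive (a strict partial order). The function name `respecting_enumeration` and the divisibility example are hypothetical.

```python
def respecting_enumeration(nu, preds):
    """Return an enumeration of the elements of `nu` that respects R.

    nu    : list giving an arbitrary enumeration of a finite set A
    preds : dict mapping each element to the set of its R-predecessors
            (assumed transitive and acyclic, i.e. a strict partial order)
    """
    seen = set()      # union of all S_j for j < i
    result = []
    for x in nu:
        s_i = {x} | preds[x]   # S_i: nu(i) together with its R-predecessors
        t_i = s_i - seen       # T_i: the elements that are new in S_i
        # In a strict partial order, a R b implies that preds[a] is a proper
        # subset of preds[b], so sorting by predecessor count respects R.
        result.extend(sorted(t_i, key=lambda a: (len(preds[a]), a)))
        seen |= s_i
    return result


# Hypothetical example: proper divisibility on {1, ..., 6}.
A = [1, 2, 3, 4, 5, 6]
preds = {b: {a for a in A if a != b and b % a == 0} for b in A}
order = respecting_enumeration([6, 5, 4, 3, 2, 1], preds)
print(order)
```

Starting from the "backwards" enumeration 6, 5, 4, ..., every element is still emitted only after all of its proper divisors, which is the property the proof needs.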
Proof of Lemma 4.10. It suffices to show that R^{2n+1} ⊆ R^{≤2n}. The reason is that then R^+ ⊆ R^{≤2n}, and so by Lemma 4.8, R^+ is prefix-finite. Consider therefore 2n + 2 distinct elements a_0, ..., a_{2n+1} ∈ A such that 〈a_k, a_{k+1}〉 ∈ R for all 0 ≤ k ≤ 2n. Consider the set of n + 1 elements a_1, a_3, ..., a_{2n+1}. By n-totality, there exist distinct 0 ≤ i, j ≤ n such that 〈a_{2i+1}, a_{2j+1}〉 ∈ R. Since R is acyclic, it follows that i < j, and so there is a shorter path from a_0 to a_{2n+1}, i.e., 〈a_0, a_{2n+1}〉 ∈ R^{≤2n}, as required. (In more detail, we have 〈a_0, a_{2i+1}〉 ∈ R^{2i+1}, 〈a_{2i+1}, a_{2j+1}〉 ∈ R, 〈a_{2j+1}, a_{2n+1}〉 ∈ R^{2n−2j}, and so 〈a_0, a_{2n+1}〉 ∈ R^{2i+1+1+2n−2j} = R^{2(n+i+1−j)} ⊆ R^{≤2n}.) □
Proof of Corollary 4.11. In all cases, by applying Lemma 4.10, it suffices to show that the non-transitive versions of these relations are acyclic, prefix-finite, and n-total. Acyclicity follows from the consistency conditions, and prefix-finiteness follows by Lemma 4.8, as they are unions of prefix-finite relations.
n-totality follows immediately for SC and RA, because po is n-total where n is the number of threads.
In the case of TSO, po \ ((W \ RMW) × (R \ RMW)) is 2n-total (where n is the number of threads) because for every three events in the same thread at least two of them are related by po \ ((W \ RMW) × (R \ RMW)). Namely, let the three events be a, b, and c, and without loss of generality assume that 〈a, b〉 ∈ po and 〈b, c〉 ∈ po. If 〈a, b〉 ∉ (W \ RMW) × (R \ RMW), then 〈a, b〉 satisfies the required condition. Otherwise, depending on the type of c, either 〈b, c〉 or 〈a, c〉 satisfies it. □
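The case analysis for TSO is small enough to check exhaustively. The sketch below is an illustration only: it encodes event types as the strings "R" (non-RMW read), "W" (non-RMW write), and "RMW", and the helper names `excluded` and `some_pair_preserved` are hypothetical.

```python
from itertools import product

TYPES = ("R", "W", "RMW")  # "R" and "W" stand for non-RMW reads and writes


def excluded(first, second):
    """True iff the pair of types lies in (W \\ RMW) x (R \\ RMW)."""
    return first == "W" and second == "R"


def some_pair_preserved(a, b, c):
    """a, b, c are the types of three events of one thread, in po order.

    Returns True iff at least one of the po-pairs (a, b), (b, c), (a, c)
    survives the removal of the write-read pairs.
    """
    return any(not excluded(x, y) for x, y in ((a, b), (b, c), (a, c)))


all_ok = all(some_pair_preserved(*triple) for triple in product(TYPES, repeat=3))
print(all_ok)
```

The check succeeds for all 27 type combinations, matching the claim that any three events of one thread contain a related pair.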
Proof of Theorem 5.3. Assume for the sake of contradiction that N > 0 is the smallest number for which no iteration s(i) reads from mo-maximal stores in its first N steps. Consider a spinloop iteration s(i) in which the first N − 1 steps read mo-maximal stores, and let the location that its N-th step reads be x.
Because of determinism and G.scloc irreflexivity, in all subsequent iterations s(j) with j > i, the first N − 1 transitions are identical, while the N-th transitions read location x.
Since by assumption the spinloop only reads from locations with a finite set of writes, x must have an mo-maximal write w. By assumption, step N of the spinloop never reads from w, so all of these infinitely many reads must read from stores which are mo-ordered before w. Thus we have infinitely many reads which are fr-ordered before w. But because G is fair, fr is prefix-finite, which is a contradiction. □
B Equivalence proofs of operational and declarative representations
In this section, we present the proof of Theorem 4.5. We split it into six lemmas: forward and backward directions for SC
(Lemmas B.1 and B.2), RA (Lemmas B.3 and B.4), and TSO (Lemmas B.6 and B.20) models.




Proof. Let β = β(ρ) where ρ is a trace induced by a run μ of M_SC.
We construct an SC-consistent fair execution graph G inducing β such that G.E represents labels of ρ. We start by defining a partial function ρ^· : ℕ ⇀ Tid × ℕ × ELab:
ρ^k ≜ 〈n, τ : l〉 where k ∈ dom(ρ), ρ(k) = τ : l, l ∈ ELab, and n ≜ |{i ≤ k | ∃l′ ∈ ELab. ρ(i) = τ : l′}|
ρ^k ≜ ⊥ otherwise
We define a set E, which consists of events constructed from the trace ρ via the function ρ^· supplemented with initial events, and a strict partial order <_ρ over it:
E ≜ Init ∪ {ρ^k | k ∈ dom(ρ)}
<_ρ ≜ Init × (E \ Init) ∪ {〈ρ^i, ρ^j〉 ∈ E × E | i < j}
We define G to be a tuple 〈E, rf, mo〉 where:
• rf ≜ ([W] ; <_ρ|loc ; [R]) \ (<_ρ|loc ; [W] ; <_ρ|loc) relates a read and the <_ρ-latest previous write to the same location;
• mo ≜ [W] ; <_ρ|loc ; [W] relates <_ρ-ordered writes to the same location.
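As an illustration of these two definitions, the following sketch derives rf and mo from a finite SC trace. It is a simplification under stated assumptions: events are identified by their trace index rather than by the per-thread counter used in the proof, only plain reads and writes are handled, and the names `sc_graph` and the example trace are hypothetical.

```python
def sc_graph(trace, locations):
    """Build rf and mo from a finite trace of (tid, kind, loc, val) labels.

    Events are identified by their trace index k (a simplification of the
    per-thread numbering used in the proof); initial writes are the pairs
    ("init", loc) and precede every trace event.
    """
    writes = {loc: [("init", loc)] for loc in locations}  # per-location writes in trace order
    rf, mo = set(), set()
    for k, (tid, kind, loc, val) in enumerate(trace):
        event = (k, tid, kind, loc, val)
        if kind == "R":
            # rf relates the latest earlier write to the same location with the read
            rf.add((writes[loc][-1], event))
        elif kind == "W":
            # mo relates every earlier write to the same location with this write
            mo.update((w, event) for w in writes[loc])
            writes[loc].append(event)
    return rf, mo


# Hypothetical message-passing trace: thread 1 writes x then y, thread 2 polls y then reads x.
trace = [(1, "W", "x", 1), (2, "R", "y", 0), (1, "W", "y", 1),
         (2, "R", "y", 1), (2, "R", "x", 1)]
rf, mo = sc_graph(trace, ["x", "y"])
```

In this example the first read of y is related by rf to the initial write, while the later reads are related to the writes of thread 1.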
Finally, we prove the following.
• G is an execution graph.
– Requirements on the event set hold by construction.
– During a read transition the value currently stored in memory is observed, and this value is written by the last write to the same location. Also, rf⁻¹ is functional and is defined for all reads because for each read there is a unique latest previous write to the same location.
– For each l ∈ Loc the relation mo_l is a strict total order because <_ρ|loc is a strict total order.
• fr ⊆ <_ρ. Suppose that there are r ∈ R, w ∈ W such that fr(r, w) ∧ w <_ρ r. By the definition of fr, there is a w′ ∈ W such that rf(w′, r) ∧ mo(w′, w). By the definition of G.mo it follows that w′ <_ρ w. But then r reads from a non-latest write, which contradicts the definition of G.rf.
• G is fair because both mo and fr are subsets of <_ρ, which is prefix-finite.
• G ∈ G_SC holds because G.hb_SC is a subset of <_ρ, which is a strict partial order.
• β(G) = β. By the definition of ρ^·, the sequence of labels of events belonging to a thread τ is exactly the restriction of ρ to τ. □





Proof. Let β ∈ B(G^fair_SC). Let G be a fair SC-consistent execution graph with G.E that induces the behavior β. We’ll show that there exists a memory-fair trace ρ of M_SC that induces behavior β.
Since G.E represents ρ, G.E \ Init = ⋃_{τ ∈ Tid, k ∈ dom(ρ_τ)} {(k, τ : ρ_τ(k))} where ρ_τ is a restriction of ρ to thread τ. That is, for each τ ∈ Tid there exists μ_τ.
By Corollary 4.11 and Prop. 4.7, there exists an enumeration {e_i}_{i∈ℕ} of G.E \ Init that respects G.hb_SC.
Now, we define the run of the memory system. We build a function μ : ℕ → M.Q × (Tid × ELab) × M.Q recursively. Consider an arbitrary i ∈ ℕ.
• Let (k, τ : ρ_τ(k)) = e_i.
• Then let (src, lbl, tgt) = μ_τ(k). Note that lbl = ρ_τ(k) by construction.
• If i = 0 let M_prev = M.init, else let M_prev = tgt(μ(i − 1)).
• If typ(lbl) = R ∨ typ(lbl) = F, let M_next = M_prev.
Otherwise let M_next = M_prev[loc(lbl) → val_w(lbl)].
• Finally, we define μ(i) = (M_prev, lbl, M_next).
Suppose that on each step the SC memory subsystem allows the transition between M_prev and M_next. Then μ is a sequence of transitions. Also, src(μ(0)) = M_SC.init and adjacent states agree. That is, μ is a run of M_SC.
Let ρ = tlab ∘ μ be a trace of M_SC. For each τ ∈ Tid the restriction ρ|τ is equal to ρ_τ by construction, so β(ρ) = β(G) = β. Also, since SC doesn’t have silent memory transitions, ρ is memory-fair.
It remains to show that on each step the SC memory subsystem allows the transition between M_prev and M_next. To do that we need to prove that during transitions with R and RMW labels a thread observes the value currently stored in memory at some address x. Note that at the moment of that transition’s execution the value stored in memory is written by the current mo|x-latest write w because the enumeration order of events respects hb_SC ⊇ mo. So it suffices to prove that rf(w, r), where r is the event that corresponds to the aforementioned transition.
Suppose by contradiction that there is w′ such that w′ ≠ w ∧ rf(w′, r) ∧ mo(w′, w). Then fr(r, w) and, since fr ⊆ hb_SC, the r transition should have been executed before the w transition, which contradicts the choice of w. □
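The recursive construction of the run can be sketched as a simple replay of an enumeration against a memory map. This is an illustration under assumptions, not the paper's formal machine: labels are tuples (tid, kind, loc, val), only reads and writes are modelled, memory is a dictionary with a default initial value of 0, and the name `replay_sc` is hypothetical.

```python
def replay_sc(enumeration, initial_value=0):
    """Replay labels in enumeration order against an SC memory.

    Returns the list of transitions (M_prev, label, M_next). Raises
    ValueError if a read label does not match the value currently in
    memory, i.e. if the SC memory subsystem would not allow the step.
    """
    memory = {}
    run = []
    for tid, kind, loc, val in enumeration:
        m_prev = dict(memory)
        if kind == "W":
            memory[loc] = val          # M_next = M_prev[loc -> val]
        elif kind == "R" and memory.get(loc, initial_value) != val:
            raise ValueError(f"read of {loc} expects {val}")
        run.append((m_prev, (tid, kind, loc, val), dict(memory)))
    return run


# Hypothetical enumeration respecting hb_SC for a message-passing program.
events = [(1, "W", "x", 1), (2, "R", "y", 0), (1, "W", "y", 1),
          (2, "R", "y", 1), (2, "R", "x", 1)]
run = replay_sc(events)
```

An enumeration that places a read before the write it reads from is rejected, which mirrors the final step of the proof: respecting hb_SC is what guarantees that every read transition is allowed.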




Proof. Let β be in B_mf(M_RA), and let ρ and μ be, respectively, a (finite or infinite) fair trace and run of M_RA inducing β.
Then, we define a partial function ρ^· : ℕ ⇀ Tid × ℕ × ELab, a set E, and a partial order <_ρ over it as in Lemma B.1 for SC.
We define a function smap : E → Q_RA:
smap(e) ≜ q where e = ρ^k for some k ∈ ℕ and 〈_, q〉 = μ(k)
smap(e) ≜ init_RA where e ∈ Init
its projection vmap : E → View, which maps non-initial events to their thread views:
vmap(e) ≜ T(tid(e)) where e ∈ E \ Init and 〈_, T, _〉 = smap(e)
vmap(e) ≜ V_init where e ∈ Init
and its projection tmap : E ⇀ Time, which maps events to timestamps of related messages:
tmap(e) ≜ vmap(e)(loc(e)) where e ∈ E and loc(e) is defined
tmap(e) ≜ ⊥ otherwise
We denote {〈e, e′〉 | f(e) = f(e′)} by =_f and {〈e, e′〉 | f(e) < f(e′)} by <_f for f ∈ {tmap, vmap}. Obviously, both <_tmap and <_vmap are strict partial orders.
We define G to be the execution graph 〈E, rf, mo〉 where:
• rf ≜ [W] ; =_tmap|loc ; [R];
• mo ≜ [W] ; <_tmap|loc ; [W].
Consequently, G.fr is equal to [R] ; <_tmap|loc ; [W].
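The three timestamp-based relations can be computed directly on a small set of events. The sketch below is an illustration under assumptions: the `Event` record, its field names, and the example events are hypothetical, and timestamps are given explicitly instead of being extracted from a run of the RA machine.

```python
from typing import NamedTuple


class Event(NamedTuple):
    name: str
    kind: str   # "W" or "R"
    loc: str
    ts: int     # tmap(e): timestamp of the related message


def ra_relations(events):
    """Derive rf, mo and fr from timestamps, as pairs of event names."""
    writes = [e for e in events if e.kind == "W"]
    reads = [e for e in events if e.kind == "R"]
    rf = {(w.name, r.name) for w in writes for r in reads
          if w.loc == r.loc and w.ts == r.ts}
    mo = {(w1.name, w2.name) for w1 in writes for w2 in writes
          if w1.loc == w2.loc and w1.ts < w2.ts}
    fr = {(r.name, w.name) for r in reads for w in writes
          if r.loc == w.loc and r.ts < w.ts}
    return rf, mo, fr


events = [Event("init_x", "W", "x", 0), Event("w1", "W", "x", 1),
          Event("w2", "W", "x", 2), Event("r1", "R", "x", 0),
          Event("r2", "R", "x", 2)]
rf, mo, fr = ra_relations(events)

# fr computed the usual way, as the composition rf^-1 ; mo
fr_composed = {(r, w2) for (w1, r) in rf for (w1b, w2) in mo if w1 == w1b}
```

On this example the timestamp-based fr coincides with rf⁻¹ ; mo, which is the observation the text uses.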
Finally, we prove the following.
• G is an execution graph.
– Requirements on the event set hold by construction.
– During a read transition, a message is read, and this message is written by some write to the same location. Also, rf⁻¹ is functional and is defined for all reads because for each read there is a unique message to the same location with the same timestamp.
– For each l ∈ Loc the relation mo_l is a strict total order because <_tmap|loc is a strict total order by properties of tmap.
• G is fair. First, we show that G.mo is prefix-finite. Fix w ∈ E ∩ W. We know that tmap(w) is defined and belongs to ℕ. As a consequence, |dom(G.mo ; [w])| ≤ tmap(w) by definition of G.mo.
Now, we show that G.fr is prefix-finite. Suppose that it is not the case. Then, there exist w ∈ E ∩ W and an infinite set of unique read events {r_i}_{i∈ℕ} ⊆ E ∩ R s.t. ∀i ∈ ℕ. 〈r_i, w〉 ∈ G.fr. Since the number of threads in the program is bounded, we may assume that tid(r_i) = τ for all i and some τ. Also, since {r_i}_{i∈ℕ} ⊆ E \ Init, we know that there exists {a_i}_{i∈ℕ} ⊆ ℕ s.t. ∀i ∈ ℕ. r_i = ρ^{a_i}. We can deduce that ∀i ∈ ℕ. G.rf⁻¹(r_i) <_tmap w.
For each i, μ_i.Q.T(τ)(x) is less than tmap(w). tmap(w) is a timestamp of some message m = 〈loc(w) : val(w)@tmap(w), _〉 ∈ μ_i.Q.M. That is, the transition −prop(τ, m)→_RA is continuously enabled in μ and never taken. It contradicts fairness of μ.
• G ∈ G_RA, that is, the relation G.hb_RA|loc ∪ mo ∪ fr is acyclic. We start by showing that G.hb_RA is acyclic. It follows from the fact that po ∪ rf ⊆ <_ρ and <_ρ is a strict partial order:
– G.po ⊆ <_ρ holds by construction of G.
– rf ⊆ <_ρ. Fix an edge 〈e, e′〉 in rf. Since e′ ∈ R, there exists j s.t. e′ = ρ^j. Also, e′ ∈ G.E \ Init. If e ∈ Init, then e <_ρ e′. If e ∈ G.E \ Init, then there exists i s.t. e = ρ^i. Also, it means that 〈_, tid(e) : elab(e), smap(e)〉 = μ(i). Since e ∈ W, the μ(i) transition is a write step, i.e., on this step a message m = 〈loc(e) : _@tmap(e), _〉 is added. The transition j reads the message, i.e., i < j. Consequently, e <_ρ e′.
Acyclicity of G.hb_RA means that a G.hb_RA|loc ∪ mo ∪ fr cycle has to contain at least one mo ∪ fr edge. Then, it is enough to show that G.hb_RA|loc ⊆ ≤_tmap since mo ∪ fr ⊆ <_tmap (mo by definition, and fr as being equal to [R] ; <_tmap|loc ; [W]) and <_tmap is a strict partial order.
To show that G.hb_RA|loc ⊆ ≤_tmap, it is enough to prove that G.hb_RA ⊆ ≤_vmap. For the latter, it is enough to show G.po ∪ rf ⊆ ≤_vmap since ≤_vmap is transitive.
– G.po ⊆ ≤_vmap. Fix an edge 〈e, e′〉 in G.po. By construction of G, we know that smap(e) →*_RA smap(e′). Consequently, vmap(e) ≤ vmap(e′) since the view of a specific thread only grows during the RA run.
– rf ⊆ ≤_vmap. Fix an edge 〈e, e′〉 in rf. If e ∈ Init, then vmap(e) = V_init, that is, vmap(e) ⊑ vmap(e′). If e ∈ G.E \ Init, then there exists i s.t. e = ρ^i. Also, it means that 〈_, tid(e) : elab(e), smap(e)〉 = μ(i). Since e ∈ W, the μ(i) transition is a write step, i.e., on this step a message m = 〈loc(e) : _@tmap(e), vmap(e)〉 is added.
Since e′ ∈ R, there exists j s.t. e′ = ρ^j. On the μ(j) transition, message m is read by thread tid(e′). That is, the thread’s view is updated by the message’s view vmap(e). As a consequence, vmap(e) ⊑ vmap(e′). □





Proof. Let β ∈ B(G^fair_RA) be induced by a fair execution graph G s.t. G is in G^fair_RA. Let {e_i}_{i∈ℕ} be an enumeration of G.E \ Init events³ which respects G.po ∪ G.rf. Such an enumeration exists since G.po ∪ G.rf is acyclic. We also define corresponding sequences of event sets {E_i = Init ∪ ⋃_{j<i} {e_j}}_{i∈ℕ} and (partial) execution graphs:
{G_i = 〈E_i, [E_i] ; rf ; [E_i], [E_i] ; mo ; [E_i]〉}_{i∈ℕ}
For the sequence of execution graphs, we will construct a sequence of RA memory subsystem states {q_i}_{i∈ℕ} ⊆ Q_RA s.t., for all i ∈ ℕ, q_i simulates G_i and the following conditions hold:
• q_0 = init_RA;







The latter condition states that q_i and q_{i+1} are related via a step with the same label as e_i and a finite number of RA’s silent transitions, i.e., propagation of some messages. The propagation steps are required for constructing a fair run.






RA q_{i+1} the number of silent transitions is finite, we can construct a run of the RA operational machine from {q_i}_{i∈ℕ} with the same behavior β induced by the execution graph G. In the remainder of the proof, we define a number of auxiliary constructions, build {q_i}_{i∈ℕ}, then show that the aforementioned conditions on the sequence hold, and prove that the related run of the RA machine is fair.
Since G.mo is acyclic and prefix-finite, there exists a function tmap : G.E → Time satisfying the following requirements:
• e ∈ Init ⇒ tmap(e) = 0;
• 〈w, w′〉 ∈ G.mo ⇒ tmap(w) < tmap(w′);
• 〈w, w′〉 ∈ G.mo \ (G.mo ; G.mo) ⇒ tmap(w′) = tmap(w) + 1;
• 〈w, r〉 ∈ G.rf ; [G.R] ⇒ tmap(w) = tmap(r).
Consequently, we know that if 〈r, w〉 ∈ G.fr, then tmap(r) < tmap(w).
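One way to obtain such a function on a finite graph is to use each write's position in the per-location modification order. The following sketch assumes mo is given as an ordered list of writes per location, starting with the initial write, and rf as a map from reads to the writes they read from; the names `timestamps`, `mo_order`, and the example events are hypothetical.

```python
def timestamps(mo_order, rf):
    """Assign tmap to writes (position in mo) and reads (timestamp of rf source).

    mo_order : dict loc -> list of writes in mo order, initial write first
    rf       : dict read -> write it reads from
    """
    tmap = {}
    for loc, writes in mo_order.items():
        for position, w in enumerate(writes):
            tmap[w] = position          # initial write gets 0, mo-successor gets +1
    for r, w in rf.items():
        tmap[r] = tmap[w]               # a read shares the timestamp of its source
    return tmap


mo_order = {"x": ["init_x", "w1", "w2"]}
rf = {"r1": "init_x", "r2": "w2"}
tmap = timestamps(mo_order, rf)
print(tmap)
```

Prefix-finiteness of mo is what makes the position of every write a finite number in the infinite case; in this finite sketch it holds trivially. Here r1 reads the initial write, so it is fr-before w1 and w2, and indeed its timestamp 0 is smaller than theirs.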
³ Here we assume that G.E is infinite to be specific. However, a similar argument works for the finite case.
We define a set of safe points for propagation of a write event w, denoted safepoints(w), to contain events which are not
hb?_RA




; [{e | loc(e) = loc(w) ∧ tmap(e) < tmap(w)} ∪ {w}])
We define the following partial function tslot : Tid × ((G.E ∩ W) \ Init) ⇀ ℕ:
tslot(τ, e_i) ≜ ⊥ if G.E_τ ⊆ ∅
tslot(τ, e_i) ≜ min{j | i ≤ j ∧ ∀k > j. e_k ∈ G.E_τ ⇒ e_k ∈ safepoints(e_i)} otherwise
Note that G.E \ Init ⊆ {e_i}_{i∈ℕ}, thus we can define the function for elements of the sequence.
We use tslot(τ, w) to point to a transition in {q_i}_{i∈ℕ} which includes the propagation step −prop(τ, m)→_RA for a message m representing w in the RA memory, i.e., m = 〈loc(w) : val(w)@tmap(w), _〉.
We need to show that tslot is defined for all non-empty threads and non-initializing write events in G as a consequence of G’s fairness.
Lemma B.5. tslot(τ, w) ≠ ⊥ for all w ∈ (G.E ∩ W) \ Init and τ s.t. G.E_τ ≠ ∅.
Proof. Fix τ and w. If thread τ has a finite number of events then either they all are enumerated before w, or the po-latest one among them is in G.E_τ ∩ safepoints(w). Now, we consider the case of an infinite number of events in τ. To prove the lemma, we need to show that G.E_τ ∩ safepoints(w) is not empty.
Since w ∈ G.E \ Init, there exists i s.t. w = e_i. Let’s pick e_{j_0} for some j_0 s.t. e_{j_0} ∈ G.E_τ and j_0 ≥ i. If e_{j_0} ∈ safepoints(w), then the proof is completed. Otherwise, there exists some k_0 s.t.
e_{k_0} ∈ codom([e_{j_0}] ; G.hb?_RA) ∧ loc(e_{k_0}) = loc(w) ∧ tmap(e_{k_0}) < tmap(w)
We know that k_0 is bigger than j_0 since 〈e_{j_0}, e_{k_0}〉 ∈ G.hb?_RA and the enumeration {e_i}_{i∈ℕ} respects G.hb_RA. Then, we can pick j_1 > k_0 s.t. e_{j_1} ∈ G.E_τ. Again, we have either found an element of safepoints(w), or we could pick k_1 > j_1. By iterating the process, we either find an element of safepoints(w), or construct an infinite sequence {e_{k_n}}_{n∈ℕ} s.t.
loc(e_{k_n}) = loc(w) ∧ tmap(e_{k_n}) < tmap(w)
All elements of the sequence are either read or write events since they all operate on the same location as the write event w. That is, {e_{k_n}}_{n∈ℕ} has an infinite subsequence containing either only read events or only write events. In the first case, there is an infinite number of G.fr-predecessors of w; in the second case, there is an infinite number of G.mo-predecessors of w. In both cases, it contradicts fairness of G. □
Now, we construct {q_i = 〈M_i, T_i〉}_{i∈ℕ}. As we mentioned earlier, we want q_i to simulate G_i: (i) there should be a message in M_i for each write event in G_i and (ii) T_i(τ) has to represent tmap-timestamps of write events from dom(G.hb_RA ; [G_i.E_τ]) for each thread τ. However, all components of q_i also have to account for message propagation steps (which we determine by tslot).
We define an auxiliary function set2view : P(G.E) → View which assigns views to a set of write events from G.E:
set2view(E) ≜ ⊔{[loc(w) : tmap(w)] | w ∈ W ∩ E}
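The join used in set2view is a pointwise maximum of timestamps. The sketch below illustrates this under assumptions: views are dictionaries from locations to timestamps, a missing location is treated as timestamp 0, and the helper names `join`, `loc_of`, and the example writes are hypothetical.

```python
def join(view1, view2):
    """Pointwise maximum of two views (missing locations count as 0)."""
    return {loc: max(view1.get(loc, 0), view2.get(loc, 0))
            for loc in view1.keys() | view2.keys()}


def set2view(writes, loc_of, tmap):
    """Join the singleton views [loc(w) : tmap(w)] of all given writes."""
    view = {}
    for w in writes:
        view = join(view, {loc_of[w]: tmap[w]})
    return view


loc_of = {"w1": "x", "w2": "x", "w3": "y"}
tmap = {"w1": 1, "w2": 2, "w3": 1}
view = set2view({"w1", "w2", "w3"}, loc_of, tmap)
print(view)
```

For each location the resulting view records the largest timestamp among the given writes to that location.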
Then, we define a function vmap : G.E → View which assigns views representing hb_RA paths, a function vmap_prop : G.E \ Init → View which assigns views representing observed propagated messages according to tslot, and a combination of the




vmap_prop(e_i) ≜ set2view {w | tslot(tid(e_i), w) < i}
vmap_full(e) ≜ vmap(e) ⊔ vmap_prop(e)
All three functions are monotone on G.hb_RA paths:
∀〈e, e′〉 ∈ G.hb_RA, d ∈ {−, prop, full}. vmap_d(e) ⊑ vmap_d(e′)
The function e2m : G.E ∩ W → Msg constructs a message from a write event:
e2m(e) ≜ 〈loc(e) : val(e)@tmap(e), vmap_full(e)〉
Now, we can construct all components of q_i:
M_i ≜ memory_init ∪ ⋃{e2m(e_j) | j < i ∧ e_j ∈ W}
T′_i ≜ λτ. ⊔{vmap_full(e_j) | j < i ∧ e_j ∈ E_τ} ⊔ set2view{w | tslot(τ, w) < i − 1}
T_i ≜ λτ. T′_i(τ) ⊔ set2view{w | tslot(τ, w) < i}
q_i ≜ 〈M_i, T_i〉










holds for all i ∈ ℕ.






• elab(e) = R(x, v). In this case, M_{i+1} = M_i and T′_{i+1} = T_i[τ ↦ T_i(τ) ⊔ vmap_full(e)] by construction. Since e ∈ G.E ∩ R and the enumeration {e_i}_{i∈ℕ} respects G.rf, there exists k < i s.t. 〈e_k, e〉 ∈ G.rf, tmap(e_k) = tmap(e), val(e_k) = v, and e2m(e_k) ∈ M_i.
For the transition to hold, we need to show that T(τ)(x) ≤ tmap(e_k), i.e.,
(⊔{vmap_full(e_j) | j < i ∧ e_j ∈ E_τ} ⊔ set2view{w | tslot(τ, w) < i})(x) ≤ tmap(e_k)
It can be split into three statements:
1. (⊔{vmap(e_j) | j < i ∧ e_j ∈ E_τ})(x) ≤ tmap(e_k).
Suppose it does not hold. Then, there exist some e_j and w ∈ W s.t. loc(w) = x, tmap(e_k) < tmap(w), 〈w, e_j〉 ∈ G.hb?_RA, and 〈e_j, e〉 ∈ G.po. That is, there is a cycle [w] ; G.hb?_RA ; [e_j] ; G.po ; [e] ; G.rf⁻¹ ; [e_k] ; mo ; [w]. Since [w] ; G.hb?_RA ; [e_j] ; G.po ; [e] ⊆ G.hb_RA|loc and G.rf




2. (⊔{vmap_prop(e_j) | j < i ∧ e_j ∈ E_τ})(x) ≤ tmap(e_k).
Suppose it does not hold. Then, there exist some j, p, and w ∈ W s.t. loc(w) = x, tmap(e_k) < tmap(w), p = tslot(τ, w), p < j, and {e_p, e_j} ⊆ E_τ. That is, 〈e_p, e〉 ∈ G.po ⊆ G.hb?_RA. The fact that tmap(e) = tmap(e_k) < tmap(w) contradicts the definition of tslot.
3. (set2view{w | tslot(τ, w) < i})(x) ≤ tmap(e_k).
Suppose it does not hold. Then, there exists some w ∈ W s.t. loc(w) = x, tmap(e_k) < tmap(w), and tslot(τ, w) < i. It contradicts the definition of tslot.
• elab(e) = W(x, v). In this case, M_{i+1} = M_i ∪ {e2m(e)} and T′_{i+1} = T_i[τ ↦ T_i(τ) ⊔ vmap_full(e)] by construction.
By G.hb_RA-monotonicity of vmap_full, vmap_full(e) = T_i(τ)[x ↦ tmap(e)] and, consequently, T_i(τ) ⊔ vmap_full(e) = vmap_full(e).
Absence of a message to x with the same timestamp tmap(e) in M_i follows from the construction of M_i and properties of tmap. Also, T_i(τ)(x) < tmap(e) holds for the same reason as in the read case.
• elab(e) = RMW(x, v_R, v_W). This case is similar to the read and write ones.






RA q_{i+1}. For that, we take a set of pairs
X = {〈τ, w〉 | i = tslot(τ, w)}
Since tslot(τ, e_k) ≥ k for all k ∈ ℕ and τ ∈ Tid, |X| is smaller than |Tid| ∗ i, i.e., it is finite. Thus, there is a finite number of message propagation steps which lead to q_{i+1} from 〈M_{i+1}, T′_{i+1}〉. Note that the number of steps may be smaller than the size of X, e.g., if {〈τ, w〉, 〈τ, w′〉} ⊆ X and loc(w) = loc(w′); in this case, it is enough to make a propagation step only for the message to that location with the biggest timestamp. We construct a run μ of the RA model inducing β from {q_i}_{i∈ℕ}.
Lastly, we need to show that μ is fair. For that, it is enough to show that there do not exist τ ∈ Tid, m ∈ Msg, and k ∈ ℕ s.t. for all j > k there exists q′ s.t. q_j −prop(τ, m)→_RA q′. Suppose there are such τ, m, and k. It means that m ∈ M_j for all j > k and there is w ∈ {e_i}_{i≤k} ∩ (G.E ∩ W) \ Init s.t. e2m(w) = m and tid(w) = τ. By Lemma B.5, tslot(τ, w) is defined. That is, tmap(w) ≤ T_j(τ)(x) for j > tslot(τ, w). It contradicts that for all j > k there exists q′








Proof. Let β ∈ B_mf(M_TSO). Then, β = β(ρ) where ρ is a trace induced by a memory-fair run μ of M_TSO. We’ll construct a TSO-consistent fair execution graph G inducing β s.t. G.E represents labels of ρ.
First we’ll relate write and propagation transitions. Since ρ is memory-fair, every buffered write eventually gets propagated. That is, there exists a bijective function write2prop : {i | ρ(i) = (_, W(_, _))} → {i | ρ(i) = (_, prop)} that maps the trace index of a write transition to the trace index of the transition that propagates it.
We start by constructing the set of graph events E. It is made of the Init set and trace-induced events constructed with the function ρ^· like in the SC case. Note that ρ^· has an inverse ρ⁻¹ on the resulting set of non-initializing events.
The program order on the resulting event set (as defined in Lemma B.1) under TSO doesn’t necessarily represent the order in which the shared memory is accessed. To establish that order, we introduce the visibility function vis : E → ℤ. For all e such that typ(e) ∈ {R, RMW} let vis(e) = ρ⁻¹(e), that is, the trace index of the transition corresponding to that event. On the other hand, the visibility of a non-RMW write is determined by its propagation rather than the write transition itself. That is, for all e such that typ(e) = W ∧ e ∉ Init let vis(e) = write2prop(ρ⁻¹(e)). Also let vis(e) = −1 for all e ∈ Init.
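For a finite trace in which every buffered write is eventually propagated, write2prop and vis can be computed with one FIFO queue per thread. The sketch below is an illustration under assumptions: trace entries are pairs (tid, label) with labels ("W", loc, val), ("R", loc, val), ("RMW", loc, v_r, v_w) or ("prop",), events are identified with their trace indices (that is, with ρ⁻¹(e)), initial events are omitted, and the function name `visibility` is hypothetical.

```python
from collections import defaultdict, deque


def visibility(trace):
    """Compute write2prop and vis for a finite TSO trace.

    Events are identified by the trace index of their own transition.
    Propagation transitions are matched to buffered writes of the same
    thread in FIFO order.
    """
    pending = defaultdict(deque)   # per-thread FIFO of buffered write indices
    write2prop = {}
    vis = {}
    for i, (tid, label) in enumerate(trace):
        kind = label[0]
        if kind == "W":
            pending[tid].append(i)
        elif kind == "prop":
            w = pending[tid].popleft()   # oldest buffered write of this thread
            write2prop[w] = i
            vis[w] = i                   # a write becomes visible when propagated
        else:                            # "R" or "RMW": visible immediately
            vis[i] = i
    return write2prop, vis


trace = [(1, ("W", "x", 1)), (1, ("R", "y", 0)), (2, ("W", "y", 1)),
         (1, ("prop",)), (2, ("prop",))]
write2prop, vis = visibility(trace)
```

In this example the read at index 1 is visible before the write at index 0 of the same thread, which is exactly the write-read reordering that TSO permits.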
The relations rf and mo will represent the events’ visibility in the graph.
• The modification order is determined by the order in which events become visible:
∀x y. mo(x, y) ⟺ x ∈ W ∧ y ∈ W ∧ loc(x) = loc(y) ∧ vis(x) < vis(y).
• The rf relation formalizes the rule according to which a write to a location is observed: it is either the latest non-propagated write to that location from the reading thread’s buffer or, if this buffer is empty, the latest propagated write to that location. Formally, rf(w, r), where both w and r access the location l, if either:
– tid(r) = tid(w) and ρ⁻¹(w) is the index of the latest transition of a write to l such that ρ⁻¹(w) < vis(r) < vis(w);
– the buffer of tid(r) is empty at the moment of the vis(r)-th transition and vis(w) is the index of the latest l-propagation transition less than vis(r).
Lemma B.7. G = 〈E, rf, mo〉 is a TSO-consistent fair execution graph inducing β s.t. G.E represents labels of ρ.
Proof. The proof of G inducing β and having G.E representing labels of ρ is the same as in Lemma B.1. The requirements on G being an execution graph hold by the construction of G.E, G.rf and G.mo. Fairness and the two conditions of TSO-consistency are proved in Lemma B.13, Lemma B.18 and Lemma B.19, respectively. □
To prove fairness and TSO-consistency of the aforementioned graph we’ll first state some auxiliary claims.
Proposition B.8. The index of a write transition is less than the index of its propagation: ∀i_w. i_w < write2prop(i_w).
Proposition B.9. An event transition is executed at the same moment or before it becomes visible: ∀e. ρ⁻¹(e) ≤ vis(e).
Proof. If e is not a write, it becomes visible in the same transition. If e is a write event, it can be propagated only after the write transition itself. □
Proposition B.10. Program order on writes and RMWs respects the propagation order: ∀x y. ([W] ; po ; [W])(x, y) ⟹ vis(x) < vis(y).
Proof. Writes of the same thread are propagated in FIFO order. Also, the vis function is monotone w.r.t. event index for events of the same thread. □
Proposition B.11. ∀w r. rf(w, r) ⟹ ρ⁻¹(w) < vis(r). Moreover, rfe(w, r) ⟹ vis(w) < vis(r).
Proof. By construction, a read event can only observe a write that has been executed before. If the read observes a write from another thread, then that write must have already been propagated. □
Lemma B.12. fr(r, w) ⟹ vis(r) < vis(w).
Proof. Suppose that there exists a read r such that fr(r, w) ∧ vis(w) < vis(r). It reads from some w′ such that mo(w′, w). By the definition of mo it follows that vis(w′) < vis(w). But r must read at least from the most recent propagated write, that is, at least from w. □
Lemma B.13. G is fair.
Proof. mo is prefix-finite by construction. Suppose, by contradiction, that there exists w for which there are infinitely many reads that read before it. Note that there is only a finite number of read transitions before vis(w) in the trace order. That is, there are infinitely many of them after vis(w). But, as Lemma B.12 shows, there are no such reads. □
Proposition B.14. Relation po ; fr is irreflexive.
Proof. Suppose that po(w, r) and fr(r, w). Then there exists a w′ such that rf(w′, r) and mo(w′, w).
Consider vis(w). By Lemma B.12, vis(r) < vis(w). Then the write buffer is not empty at the moment of the read, so r should have observed w or a more recent write from the buffer. If w′ is from another thread, it cannot be observed. If w′ is from the same thread, then mo(w′, w) implies po(w′, w), and thus w′ cannot be observed either. □
Proposition B.15. [W] ; po_loc ; [R] ; fr ; [W] ⊆ mo.
Proof. Suppose that w_1 and w_2 are related in the way shown above. Then either mo(w_1, w_2) or mo(w_2, w_1). Suppose the latter is the case. Then fr(r, w_1) by the definition of fr. That is, there exists a cycle of the form po ; fr. According to Prop. B.14, this is not possible. □
Proposition B.16. Program order respects the precedence relation except for write-read pairs: ∀x y. (po \ ((W \ RMW) × (R \ RMW)))(x, y) ⟹ vis(x) < vis(y).
Proof. If the first event is not a write, it becomes visible immediately: vis(x) = ρ⁻¹(x). Also ρ⁻¹(x) < ρ⁻¹(y). Finally, by Prop. B.9, ρ⁻¹(y) ≤ vis(y).
Suppose that the first event is a write. Then the second event is a write or an RMW, so vis(x) < vis(y) by Prop. B.10. □
Proposition B.17. ∀r ∈ {rfe, mo, fr}, x, y. r(x, y) ⟹ vis(x) < vis(y).
Proof. By the definition of mo together with Prop. B.11 and Lemma B.12. □
Lemma B.18. The relation po_loc ∪ rf ∪ mo ∪ fr is acyclic.
Proof. Suppose, by contradiction, that there exists such a cycle. We represent it as po chains (possibly made of one event only) alternated with rfe ∪ moe ∪ fre edges. With that, we’ll show that for every e_i (the first event in the cycle’s i-th po chain) vis(e_i) < vis(e_{i+1}). Since it’s a cycle, this results in vis(e_0) < vis(e_0).
Consider an arbitrary i-th po chain and the corresponding e_i (its po-first event). Also consider r_i (the next rfe ∪ moe ∪ fre edge), which relates t_i (the po-last event in the i-th chain) and e_{i+1}.
If e_i = t_i, then vis(e_i) = vis(t_i). Also, by Prop. B.17, vis(t_i) < vis(e_{i+1}).
Suppose e_i ≠ t_i. If typ(e_i) = W ∧ typ(t_i) = R, then r_i ∈ fr. Then by Prop. B.15, mo(e_i, e_{i+1}), so vis(e_i) < vis(e_{i+1}). If, on the other hand, t_i ∈ W, then by Prop. B.15 vis(e_i) < vis(t_i); otherwise, if e_i ∈ R, Prop. B.16 applies, so again vis(e_i) < vis(t_i). Finally, vis(t_i) < vis(e_{i+1}) by Prop. B.17.
In all cases it’s true that vis(e_i) < vis(e_{i+1}). □
Lemma B.19. The relation ppo ∪ rfe ∪ mo ∪ fr is acyclic.
Proof. The argument from Lemma B.18 applies. By definition of ppo we don’t consider the case of po ∩ ((W \ RMW) × (R \ RMW)), a special case of po ∩ (W × R), which was the only one in the proof of Lemma B.18 where the restriction to the same location was needed. □
This concludes the proof of Lemma B.6. □





Lemma B.21. Let G be a fair TSO-consistent execution graph of P where the number of threads is finite. Then there exists a fair TSO execution trace of P with the same behavior.
Proof. The proof structure is the same as in Lemma B.2: we construct an enumeration of graph events that respects an acyclic relation on it, then obtain a run of the program according to that enumeration and finally prove that the TSO memory subsystem allows that run’s transitions.
We’ll consider a weaker relation as a base for the enumeration, namely, hb_TSO ≜ (ppo ∪ rfe ∪ mo ∪ fr)^+. It is prefix-finite by Corollary 4.11.
By Prop. 4.7 there exists an enumeration {e_i}_{i∈ℕ} of G.E \ Init that respects hb_TSO.
According to that enumeration we build a run of a concurrent system. A corresponding sequence of transitions is obtained with the following algorithm. We start with the initial state. Then we enumerate (w.r.t. hb_TSO) graph events and for each event extend the current run prefix with a block of transitions.
• If the current event is an RMW, the block will consist only of an RMW transition.
• If the current event is a non-RMW read, the block will consist of write transitions for all write events that po-precede the current event but haven’t been enumerated or processed in this way yet. Such events may appear because hb_TSO doesn’t respect the program order between non-RMW writes and reads in general. Additionally, the last transition of the block will be the read transition itself.
• If the current event is a non-RMW write and it hasn’t already been included in some read’s block in the way described above, the current block will include the corresponding write transition. Additionally, whether the write has been processed before or not, the block will include the propagation transition.
It remains to show that the resulting label sequence is actually the sought trace. □
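The block-building algorithm above can be sketched for finite programs as follows. This is an illustration under assumptions: events are pairs (tid, n) with n the position in the thread's program order, `program` maps each thread to the list of its event kinds ("R", "W", "RMW", with "R" and "W" non-RMW), the enumeration is assumed to respect hb_TSO, and the name `build_run` is hypothetical.

```python
def build_run(enumeration, program):
    """Turn an enumeration of events into a sequence of TSO transitions.

    Each emitted transition is (tid, n, action), where action is one of
    "W" (buffer the write), "R", "RMW", or "prop" (propagate the write).
    """
    issued = set()   # non-RMW writes whose write transition was already emitted
    run = []
    for tid, n in enumeration:
        kind = program[tid][n]
        if kind == "RMW":
            run.append((tid, n, "RMW"))
        elif kind == "R":
            # First emit po-earlier writes of this thread that were not issued yet.
            for m in range(n):
                if program[tid][m] == "W" and (tid, m) not in issued:
                    issued.add((tid, m))
                    run.append((tid, m, "W"))
            run.append((tid, n, "R"))
        else:  # non-RMW write
            if (tid, n) not in issued:
                issued.add((tid, n))
                run.append((tid, n, "W"))
            run.append((tid, n, "prop"))   # the write becomes visible here
    return run


# Hypothetical store-buffering program: each thread writes, then reads.
program = {1: ["W", "R"], 2: ["W", "R"]}
enumeration = [(1, 1), (2, 1), (1, 0), (2, 0)]   # both reads precede both writes
run = build_run(enumeration, program)
```

Although the enumeration lists both reads before both writes, each write transition is emitted before the po-later read of its thread, and only its propagation is delayed until the write itself is enumerated.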
Lemma B.22. The sequence ` of transitions obtained by the algorithm above is a fair TSO execution trace.
Proof. Suppose that on each stepTSOmemory subsystem allows the transition obtained by the algorithm.Note that src(`(0)) =
MTSO.init and adjacent states agree. That is, ` is a run ofMTSO.
Let d = tlab ◦ ` be a trace of MTSO. For each g ∈ Tid the restriction d |g is equal to dg by construction: even if the
enumeration outputs events out of program order, due to the specific form of read blocks they appear in the trace according
to po. So V (d) = V () = V . Also, for each write transition in the run there exists a propagation transition, so silent memory
transition labels (g, prop) are eventually taken. That is, d is memory-fair.
It remains to show that the TSO memory subsystem allows the transition on each step. Namely, we need to show that
propagation transitions occur only on a non-empty thread buffer, that RMW transitions occur only on an empty thread buffer,
and that read/RMW labels agree with the current buffer and memory contents. The first condition holds because propagation
occurs only after a write transition that puts an item into the buffer, and propagation is the only way to remove an item
from a buffer. The second condition holds because the enumeration respects the program order between writes and RMWs, so by
the time such a transition is executed, all previous writes of the thread have already been propagated.
To prove the last condition, we consider the write event that a given read event reads from and show that this write
coincides with the one determined by the TSO memory subsystem⁴. Let r be the read event and loc the location it reads from,
w_tr the write that has been observed during the transition, and w_rf the rf-predecessor of r. Suppose by contradiction that
w_tr ≠ w_rf.
• If the thread buffer has non-propagated writes to the location loc, then w_tr is the latest non-propagated write to loc
of the same thread.
– If mo(w_rf, w_tr), then fr(r, w_tr). Therefore, there is a po; fr cycle, which is prohibited by TSO.
– Suppose that mo(w_tr, w_rf).
∗ If w_rf belongs to the same thread, it can only appear after r in the program order, since w_tr is the latest write. So
there is a po; rf cycle, which is prohibited by TSO.
∗ If w_rf belongs to another thread, it has already been propagated, since rfe is respected by the enumeration.
But this means that w_tr has also been propagated, contradicting the assumption that it is still buffered.
• If all previous writes to loc from the same thread have already been propagated, then w_tr is the mo-latest (in the
enumeration) write to loc.
– If mo(w_rf, w_tr), then fr(r, w_tr). Then, since the enumeration respects fr, w_tr is preceded by r in the trace order
and thus cannot be observed during the transition.
– Suppose that mo(w_tr, w_rf). Then w_rf is not propagated yet. The only case in which r can still read from it is when w_rf
belongs to the same thread and is buffered. But this contradicts the condition on the empty buffer.
This concludes the proof of Lemma B.20. 
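The three side conditions used in the proof of Lemma B.22 (propagation only from a non-empty buffer, RMW only on an empty buffer, and read values agreeing with buffer and memory contents) can be made concrete with a small replay function. The following Python sketch models a textbook store-buffer machine; it is an assumption that this matches the memory subsystem MTSO in spirit. The tuple encoding of transitions and the name `replay_tso` are hypothetical.

```python
from collections import deque


def replay_tso(run, init=0):
    """Replay a list of transitions on a toy TSO store-buffer machine.

    Hypothetical transition encoding:
      ("write", tid, loc, val)       buffer a write
      ("prop", tid)                  propagate the oldest buffered write
      ("read", tid, loc, val)        read val from buffer or memory
      ("rmw", tid, loc, old, new)    atomic update directly on memory
    Returns True if every transition is allowed, False otherwise.
    """
    memory = {}
    buffers = {}
    for step in run:
        kind, tid = step[0], step[1]
        buf = buffers.setdefault(tid, deque())
        if kind == "write":
            _, _, loc, val = step
            buf.append((loc, val))
        elif kind == "prop":
            # Condition 1: propagation needs a non-empty buffer.
            if not buf:
                return False
            loc, val = buf.popleft()
            memory[loc] = val
        elif kind == "read":
            _, _, loc, val = step
            # Condition 3: a read sees the latest buffered write of its own
            # thread to loc, or else the current memory contents.
            pending = [v for (l, v) in buf if l == loc]
            visible = pending[-1] if pending else memory.get(loc, init)
            if visible != val:
                return False
        elif kind == "rmw":
            _, _, loc, old, new = step
            # Condition 2: an RMW needs an empty buffer; it reads memory.
            if buf or memory.get(loc, init) != old:
                return False
            memory[loc] = new
        else:
            raise ValueError(f"unknown transition kind: {kind}")
    return True


allowed = replay_tso([
    ("write", 1, "x", 1),
    ("read", 1, "x", 1),   # thread 1 reads its own buffered write
    ("prop", 1),
    ("read", 2, "x", 1),   # thread 2 reads the propagated value
])
stale = replay_tso([
    ("write", 1, "x", 1),
    ("read", 2, "x", 1),   # rejected: the write is not propagated yet
])
```

The second run is rejected because thread 2 can only observe the write of thread 1 after it has been propagated to memory.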
4 Here we reason about events in the context of the operational semantics. Formally, the operational semantics deals only with event
labels. However, every value stored in memory or in a buffer is placed there by some transition, which in turn is determined by some
graph event, since the trace under consideration is built from the execution graph. We therefore implicitly assume such a mapping
between memory values and the graph events that produce them.
C MCS lock
Consider the following implementation of the lock passing mechanism of the MCS lock (simplified for clarity):
int* locked := new int(0)
int** next := new int*(null)
void receiveLock() { int r
  [locked] := 1
  [next] := locked
  repeat { r := [locked] } until (r = 0) }
void passLock() { int* l
  repeat { l := [next] } until (l ≠ null)
  [l] := 0 }
We verify that the lock passing cannot get stuck, i.e., that the spinloops terminate, in the following client, which passes
the lock from the current owner to the next:
passLock() ∥ receiveLock() (MCS-Client)
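To make the role of memory fairness in this client tangible, the following Python sketch runs the two threads on a toy store-buffer machine with a round-robin scheduler. It is a simulation of a single schedule, not a proof, and it only covers a TSO-like machine. The names `receive_lock`, `pass_lock`, `run_client`, and the parameter `prop_every` are hypothetical, and propagating one buffered write per thread every `prop_every` rounds is just one simple way to model eventual propagation.

```python
from collections import deque


def receive_lock():
    # [locked] := 1; [next] := locked; spin until [locked] = 0
    yield ("write", "locked", 1)
    yield ("write", "next", "locked")
    while True:
        r = yield ("read", "locked")
        if r == 0:
            return


def pass_lock():
    # spin until [next] is not null, then [l] := 0
    while True:
        l = yield ("read", "next")
        if l is not None:
            break
    yield ("write", l, 0)


def run_client(prop_every=3, max_rounds=1000):
    """Run the client on a toy store-buffer machine with a fair scheduler.

    Returns (rounds, memory) if both threads finish and all buffers drain
    within max_rounds, and None otherwise.
    """
    memory = {"locked": 0, "next": None}
    threads = [receive_lock(), pass_lock()]
    buffers = [deque() for _ in threads]
    ops = [next(t) for t in threads]  # pending operation of each thread
    for rnd in range(1, max_rounds + 1):
        # Thread fairness: every unfinished thread takes one step per round.
        for i, t in enumerate(threads):
            op = ops[i]
            if op is None:
                continue
            if op[0] == "write":
                buffers[i].append((op[1], op[2]))
                reply = None
            else:
                loc = op[1]
                own = [v for (l, v) in buffers[i] if l == loc]
                reply = own[-1] if own else memory[loc]
            try:
                ops[i] = t.send(reply)
            except StopIteration:
                ops[i] = None
        # Memory fairness: buffered writes are propagated periodically.
        if rnd % prop_every == 0:
            for buf in buffers:
                if buf:
                    loc, val = buf.popleft()
                    memory[loc] = val
        if all(op is None for op in ops) and not any(buffers):
            return rnd, memory
    return None


fair_result = run_client(prop_every=3)
# With propagation effectively disabled, the spinloops never exit.
unfair_result = run_client(prop_every=10**9, max_rounds=200)
```

The second call never propagates buffered writes within the round limit, so both spinloops keep reading stale values. This mirrors the observation that thread fairness alone does not guarantee termination.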
Theorem C.1. MCS-Client’s thread-fair behaviors under fair {SC, TSO, RA} are all finite.
Proof. Observe that the mo-maximal store on next sets [next] = locked. Thus, by Theorem 5.3, the spinloop in passLock
terminates. Observe that it terminates only after reading from that store. Thus, after termination we have l = locked. Due to
thread-fairness, the passing thread executes a store w setting [l] to 0, and this store appears in the mo of [l] = [locked]. To
show that the spinloop in receiveLock terminates, it suffices (by Theorem 5.3) to show that w is mo-maximal. We show the
proof only for RA, since it allows the most behaviors. Consider, for the sake of contradiction, the following (partially depicted)
execution graph, which depicts the only legal mo in which w is not mo-maximal:
[Figure: partial execution graph. Initialization events: W([locked], 0) and W([next], null). Events of receiveLock: W([locked], 1), W([next], locked). Events of passLock: R([next], locked), w: W([locked], 0). Edge labels shown: rf, mo.]
This graph has a raloc-cycle and is therefore not GRA-consistent.
Note that the store w of passLock is mo-maximal only because raloc is irreflexive in G{RA,SC,TSO}. In other memory systems (e.g., ARMv8,
PowerPC, (R)C11), w is not guaranteed to be mo-maximal without additional fences, and locks that pass ownership in this manner
will not always terminate, even in thread- and memory-fair executions. This includes the HMCS lock: “the fences
necessary for the HMCS lock on systems with processors that use weak ordering” suggested in the original paper [7, p. 218] do
not ensure termination on any of these systems. Indeed, we have reproduced this non-terminating behavior on an ARMv8
server with 128 cores using the fences suggested in that paper, resulting eventually in a system-wide, permanent hang. This
shows that termination of spinloops under weak memory models is a practically important topic that has so far been mostly
ignored in the literature.
