Robustness against Power is PSPACE-complete by Derevenetc, Egor & Meyer, Roland
Egor Derevenetc¹,² and Roland Meyer²
¹Fraunhofer ITWM ²University of Kaiserslautern
Abstract. Power is a RISC architecture developed by IBM, Freescale,
and several other companies and implemented in a series of POWER
processors. The architecture features a relaxed memory model provid-
ing very weak guarantees with respect to the ordering and atomicity of
memory accesses.
Due to these weaknesses, some programs that are correct under sequen-
tial consistency (SC) show undesirable effects when run under Power.
We call these programs not robust against the Power memory model.
Formally, a program is robust if every computation under Power has the
same data and control dependencies as some SC computation.
Our contribution is a decision procedure for robustness of concurrent
programs against the Power memory model. It is based on three ideas.
First, we reformulate robustness in terms of the acyclicity of a happens-
before relation. Second, we prove that among the computations with
cyclic happens-before relation there is one in a certain normal form.
Finally, we reduce the existence of such a normal-form computation to a
language emptiness problem. Altogether, this yields a PSpace algorithm
for checking robustness against Power. We complement it by a matching
lower bound to show PSpace-completeness.
1 Introduction
To execute code as fast as possible, modern processors reorder operations. For
example, Intel x86/x86-64 and SPARC processors implement the Total Store Or-
dering (TSO) memory model [13] which allows write buffering: store operations
in each thread can be queued and get executed on memory later. Processors
can also execute independent instructions out of program order as soon as the
input data and computational units are available for them. This is an inherent
feature of the POWER and ARM microprocessors [12]. Moreover, Power and
ARM memory models, unlike TSO, do not guarantee store atomicity: one write
can become visible to different threads at different times. They only ensure that
all threads see stores to the same memory location in the same order; stores to
different memory locations can be seen in different order by different threads.
All these optimizations are usually designed so that a single-threaded pro-
gram has the illusion that its instructions are executed in program order. The
picture changes in the presence of concurrency. Concurrent programs are often
assumed to have sequentially consistent (SC) semantics [10]: each thread exe-
cutes its operations in program order, stores become visible immediately to all
threads. Concurrent programs may observe a difference from SC when run on
arXiv:1404.7092v1 [cs.LO] 28 Apr 2014
Thread 1               Thread 2
𝑎 : mem[&𝑥] ← 1        𝑐 : r1 ← mem[&𝑦]
𝑏 : mem[&𝑦] ← 1        𝑑 : r2 ← mem[&𝑥]
Fig. 1. Message Passing (MP) program [14]. By &𝑥 and &𝑦 we denote the addresses of
the variables 𝑥 and 𝑦. Initially, 𝑥 = 𝑦 = 0. The first thread writes a message into 𝑥 and
sets flag variable 𝑦, signifying that the message is written. The second thread reads the
flag and, if it is set, expects to see the message written to 𝑥 by the first thread.
a modern processor with a weak memory model. To see this, consider the MP
program in Figure 1. SC and TSO forbid the situation where r1 > r2 upon ter-
mination of both threads. However, this is possible on Power: instruction 𝑐 can
read the value written by 𝑏, whereas 𝑑 reads the initial value.
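The SC half of this claim can be checked by brute force. The following is a minimal sketch, not part of the paper's method: it enumerates every interleaving of the two MP threads under an atomic, immediately visible memory and collects the final register values. The tuple encoding of instructions and the name `sc_outcomes` are assumptions made for illustration.

```python
from itertools import combinations

# Instructions of the MP program (Figure 1), in program order per thread.
THREAD1 = [("store", "x", 1), ("store", "y", 1)]        # a, b
THREAD2 = [("load", "y", "r1"), ("load", "x", "r2")]    # c, d

def sc_outcomes(t1, t2):
    """Run every SC interleaving of two threads; return the set of (r1, r2)."""
    outcomes = set()
    n = len(t1) + len(t2)
    # Choose which positions of the interleaving belong to thread 1.
    for slots in combinations(range(n), len(t1)):
        mem = {"x": 0, "y": 0}
        regs = {"r1": 0, "r2": 0}
        it1, it2 = iter(t1), iter(t2)
        for pos in range(n):
            kind, addr, arg = next(it1) if pos in slots else next(it2)
            if kind == "store":
                mem[addr] = arg          # visible to all threads at once
            else:
                regs[arg] = mem[addr]    # load into register `arg`
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes

OUTCOMES = sc_outcomes(THREAD1, THREAD2)
print(sorted(OUTCOMES))
```

The outcome r1 = 1, r2 = 0 does not appear in the enumerated set, which is the behavior that Power additionally allows.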
We call a program not robust against Power [15,6,7,2,5,8,4] if it exhibits non-
SC behaviors when executed under the Power memory model. More formally, a
program is robust if all its Power computations have the same data and control
dependencies as the computations under SC. That is, for every Power compu-
tation there is a sequentially consistent computation which executes the same
instructions, all loads read from the same stores in both computations, and stores
to the same address happen in the same order. Robust programs produce the
same results on Power and SC architectures, which means verification results for
SC remain valid for the weak memory model.
We present an algorithm for deciding robustness against Power. This is the
first decidability result for this architecture and, more generally, the first decid-
ability result for a non-store atomic memory model. We obtain the algorithm
in the following steps. First, we reformulate robustness in terms of acyclicity of
a happens-before relation, using the result by Shasha and Snir [15]. Second, we
show that among the computations with cyclic happens-before there is always
one in a certain normal form. Next, we prove that the set of all normal-form com-
putations can be generated by a multiheaded automaton — an automaton model
developed recently in the context of robustness [8]. Finally, to check cyclicity of
the happens-before relation we intersect this automaton with regular languages.
The program is robust iff the intersection is empty. This reduces robustness to
language emptiness for multiheaded automata. The algorithm works in space
polynomial in the size of the program. We obtain a matching lower bound by a
reduction of SC-reachability to robustness, similar to [5].
Related work The happens-before relation was formulated by Lamport [9].
Shasha and Snir [15] have shown that a computation violates sequential consis-
tency iff it has a cyclic happens-before relation. Burckhardt and Musuvathi [6]
proposed the first algorithm for detecting non-robustness against TSO based on
monitoring SC computations. Burnim et al. [7] pointed out a mistake in the
definition of TSO used in [6] and described monitoring algorithms for the TSO
and PSO memory models. Alglave and Maranget [2] presented a tool to stat-
ically over-approximate happens-before cycles in programs written in x86 and
Power assembly, and to insert synchronization primitives (memory fences and
syncs) as required for robustness (called stability in their work). Bouajjani et
al. [5] obtained the first decidability result for robustness: robustness against
TSO is PSpace-complete for finite-state programs. In [4] they presented a re-
duction of robustness against TSO to SC reachability for general programs and
an algorithm for optimal fence insertion.
The Power architecture has attracted considerable recent attention. Alglave
et al. [3] give an overview of the numerous publications devoted to defining its
semantics. We highlight two Power models: the operational model by Sarkar
et al. [14] and the axiomatic one by Mador-Haim et al. [11]. These models were
extensively tested against the architecture and were proven to be equivalent [11].
Nevertheless, the operational model is known to forbid certain behaviors that
are possible on real hardware¹ and in the axiomatic model² [3]. Fortunately,
there is a suggested fix: in Section 4.5 of [14] one should read from a coherence-
order-earlier write instead of from a different write (two occurrences). Then, the
operational model is believed to strictly and tightly over-approximate Power [1].
In the present paper we stick to the corrected operational model from [14].
Finally, we would like to note that ARM has a memory model very similar
to that of Power. The differences and similarities are highlighted by Maranget et
al. in [12,3]. This fact promises a relatively easy transfer of the proof techniques
used in the present paper to the ARM memory model.
2 Programming Model
We define programs and their semantics in terms of automata. An automaton
is a tuple 𝐴 = (𝑆,𝛴,𝛥, 𝑠0, 𝐹 ), where 𝑆 is a set of states, 𝛴 is an alphabet,
𝛥 ⊆ 𝑆×(𝛴∪{𝜀})×𝑆 is a set of transitions, 𝑠0 ∈ 𝑆 is an initial state, and 𝐹 ⊆ 𝑆
is a set of final states. We call the automaton finite if 𝑆 and 𝛴 are finite. We write
𝑠1 −𝑎→ 𝑠2 if 𝑡 = (𝑠1, 𝑎, 𝑠2) ∈ 𝛥 and denote src(𝑡) := 𝑠1, dst(𝑡) := 𝑠2, lab(𝑡) := 𝑎.
The language of the automaton is ℒ(𝐴) := {𝜎 ∈ 𝛴* | 𝑠0 −𝜎→ 𝑠 for some 𝑠 ∈ 𝐹}.
For a sequence 𝜎 = 𝑎1 . . . 𝑎𝑛 ∈ 𝛴* we define |𝜎| := 𝑛, 𝜎[𝑖] := 𝑎𝑖, first(𝜎) := 𝑎1,
and last(𝜎) := 𝑎𝑛. We use · for concatenation,↓for projection, and 𝜀 for the empty
sequence. Given 𝛼 ∈ 𝛴* and 𝑎, 𝑏 ∈ 𝛼, we write 𝑎 <𝛼 𝑏 if 𝛼 = 𝛼1 · 𝑎 · 𝛼2 · 𝑏 · 𝛼3.
Given a function 𝑓 : 𝑋 → 𝑌 , 𝑥′ ∈ 𝑋, and 𝑦′ ∈ 𝑌 , we define 𝑓 ′ = 𝑓 [𝑥′ ←˒ 𝑦′] by
𝑓 ′(𝑥) := 𝑓(𝑥) for 𝑥 ∈ 𝑋 ∖ {𝑥′} and 𝑓 ′(𝑥′) := 𝑦′.
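To make these definitions concrete, here is a small sketch of language membership for an automaton with 𝜀-transitions, together with the function update 𝑓[𝑥′ ←˒ 𝑦′]. The encoding (transitions as triples, `EPS` for 𝜀, words as strings) and the example automaton `DELTA` are illustrative assumptions, not notation from the paper.

```python
from collections import deque

EPS = None  # stands for the empty label ε

def accepts(delta, s0, final, word):
    """Check whether `word` is in L(A) for A = (S, Σ, Δ, s0, F).

    `delta` is a set of transitions (src, label, dst); label EPS is an ε-move.
    """
    # Breadth-first search over configurations (state, letters consumed).
    start = (s0, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        state, pos = queue.popleft()
        if pos == len(word) and state in final:
            return True
        for src, label, dst in delta:
            if src != state:
                continue
            if label is EPS:
                nxt = (dst, pos)
            elif pos < len(word) and label == word[pos]:
                nxt = (dst, pos + 1)
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def update(f, x, y):
    """Return f[x ←˒ y]: a copy of the mapping f that sends x to y."""
    g = dict(f)
    g[x] = y
    return g

# Hypothetical example automaton over the alphabet {a, b}.
DELTA = {("q0", "a", "q1"), ("q1", EPS, "q0"), ("q1", "b", "q2")}
print(accepts(DELTA, "q0", {"q2"}, "aab"))
```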
A program is a finite sequence of threads: 𝒫 = 𝒯1 . . . 𝒯𝑛. A thread is an
automaton 𝒯tid = (𝑄tid,CMD, ℐtid, 𝑞0tid, 𝑄tid) with a finite set of control states
𝑄tid, all of them being final, initial state 𝑞0tid, and a set of transitions ℐtid called
instructions and labeled with commands CMD defined below. Each thread has
an id from TID := [1..|𝒫|].
¹ http://diy.inria.fr/cats/pldi-power/#lessvs
² http://diy.inria.fr/cats/cav-power/
Let DOM = ADDR be a finite domain of values and addresses containing the
value 0. Let REG be a finite set of registers that take values from DOM. The set
of commands CMD includes loads, stores, local assignments, and conditionals
(assume):
⟨cmd⟩ ::= ⟨reg⟩ ← mem[⟨expr⟩] | mem[⟨expr⟩] ← ⟨expr⟩
| ⟨reg⟩ ← ⟨expr⟩ | assume(⟨expr⟩)
The set of expressions EXPR is defined over constants from DOM, registers from
REG, and (unspecified) functions FUN over DOM ∪ {⊥}. We assume that these
functions return ⊥ iff any of the arguments is ⊥.
2.1 Power Semantics
We briefly recall the corrected model from [14]. The state of a running program
consists of the runtime states of threads and the state of a storage subsystem.
The runtime state of a thread includes information about the instructions be-
ing executed by the thread. In order to start executing an instruction, the thread
must fetch it. The thread can fetch any instruction whose source control state is
equal to the destination state of the last fetched instruction. Then, the thread
must perform any computation required by the semantics of this instruction.
For example, for a load the thread must compute the address being accessed,
then read the value at this address, and place it into the target register. The
last step of executing an instruction is committing it. Committing an instruction
requires committing all its dependencies. For example, before committing a load
the thread must commit all its address dependencies — the instructions which
define the values of registers used in the address expression — and control depen-
dencies — the program-order-earlier (fetched earlier than the load) conditional
instructions. Moreover, all loads and stores accessing the same address must be
committed in the order in which they were fetched.
The storage subsystem keeps track, for each address, of the global ordering
of stores to this address — the coherence order — and the last store to this
address propagated to each thread. When a thread commits a store, this store is
assigned a position in the coherence order which we identify by a rational number
— the coherence key. We choose rational numbers (rather than naturals) to be
able to insert a store between any two stores in the coherence order. The key
must be greater than the coherence key of the last store to the same address
propagated to this thread. The committed store is immediately propagated to
its own thread. At some point later this store can be propagated to any other
thread, as long as it is coherence-order-later (has a greater coherence key) than
the last store to the same address propagated to that thread. When a thread
loads a value from a certain address, it gets the value written by the last store
to this address propagated to this thread. A thread can also forward the value
being written by a not yet committed store to a later load reading the same
address. This situation is called an early read.
An important property of Power is that it maintains the illusion of sequen-
tial consistency for single-threaded programs. This means that reorderings on
the thread level must not lead to situations when, e.g., a program-order-later
load reads a coherence-order-earlier store than the one read by a program-order-
earlier load from the same address. In [14] these restrictions are enforced by the
mechanism of restarting operations. We put these conditions into the require-
ments on final states of the running program instead.
To keep the paper readable, we omit the descriptions of Power synchroniza-
tion instructions: sync, lwsync, isync. All constructions in the paper can be
consistently extended to support them with the final result continuing to hold.
Formally, we define the semantics of program 𝒫 on Power by a Power au-
tomaton 𝑍(𝒫) := (𝑆𝑍 ,E, 𝛥𝑍 , 𝑠0𝑍 , 𝐹𝑍). Here, E is a set of labels called events
that we define together with the transitions.
State space A state of the Power automaton is a pair 𝑠𝑍 = (ts, 𝑠𝑌 ) ∈ 𝑆𝑍 with
runtime thread states ts : TID→ 𝑆𝑋 and storage subsystem state 𝑠𝑌 ∈ 𝑆𝑌 .
A runtime thread state 𝑠𝑋 = (fetched, committed, loaded) ∈ 𝑆𝑋 includes a
finite sequence of fetched instructions fetched ∈ ℐ*, a set of indices of committed
instructions committed ⊆ [1..|fetched|], and a function giving the store read by
a load loaded : [1..|fetched|] → {⊥}∪ {inita | a ∈ ADDR} ∪TID×N. We use inita
to denote the initial store of value 0 to address a. The initial state of a running
thread is 𝑠0𝑋 := (𝜀, ∅, 𝜆𝑖.⊥).
A state of the storage subsystem 𝑠𝑌 = (co, prop) ∈ 𝑆𝑌 includes a mapping
from a store instruction (its thread id and index in the list of fetched instructions)
to its position in the coherence order co : (TID× N ∪ {inita | a ∈ ADDR}) → Q,
and a mapping from a thread id and an address to the last store to this address
propagated to this thread prop : TID× ADDR→ {inita | a ∈ ADDR} ∪ TID× N.
The initial state of the storage subsystem is 𝑠0𝑌 := (𝜆tid.𝜆𝑖.0, 𝜆tid.𝜆a.inita).
The initial state of automaton 𝑍(𝒫) is 𝑠0𝑍 := (𝜆tid.𝑠0𝑋 , 𝑠0𝑌 ).
Transition relation Fix a state 𝑠𝑍 = (ts, 𝑠𝑌 ) with 𝑠𝑌 = (co, prop) and a thread
id tid ∈ TID with runtime state ts(tid) = (fetched, committed, loaded).
Let eval(tid, 𝑖, 𝑒) return the value in DOM of expression 𝑒 in the 𝑖’th
fetched instruction of thread tid, or ⊥ when the value is undefined. For-
mally eval(tid, 𝑖, 𝑒) := v, where v is computed as follows. If 𝑒 ∈ DOM, then
v := 𝑒. If 𝑒 = f(𝑒1 . . . 𝑒𝑛), then v := f(eval(tid, 𝑖, 𝑒1) . . . eval(tid, 𝑖, 𝑒𝑛)). Oth-
erwise, 𝑒 = r ∈ REG. Let 𝑖′ ∈ [1..𝑖 − 1] be the greatest index, such that
fetched[𝑖′] is a local assignment or a load to r. If there is no such index,
we define v := 0. If lab(fetched[𝑖′]) = r ← 𝑒v, then v := eval(tid, 𝑖′, 𝑒v). If
lab(fetched[𝑖′]) = r ← mem[𝑒a], then v := ⊥ if loaded[𝑖′] = ⊥, v := 0 if
loaded[𝑖′] = init*, and v := val(loaded[𝑖′]) otherwise (see the definition of val
below).
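The recursive definition of eval can be sketched as follows. This is an illustrative encoding under stated assumptions: instructions are tuples such as `("assign", r, e)`, `("load", r, e_a)`, and `("store", e_a, e_v)`; constants are Python integers; registers are strings; a function application is a tuple `(f, e1, ..., en)`; ⊥ is `None`; and a single marker `"init"` stands for all initial stores init_a.

```python
import operator

BOT = None  # the undefined value ⊥

def apply_fun(f, args):
    """Functions in FUN are strict: the result is ⊥ iff some argument is ⊥."""
    return BOT if any(a is BOT for a in args) else f(*args)

def eval_expr(ts, tid, i, e):
    """Value of expression e at the i'th (1-based) fetched instruction of tid.

    ts maps a thread id to (fetched, loaded): fetched is a list of
    instructions, loaded maps a 1-based index to BOT, "init", or (tid', i').
    """
    fetched, loaded = ts[tid]
    if isinstance(e, int):                      # e ∈ DOM
        return e
    if isinstance(e, tuple):                    # e = f(e1 ... en)
        f, *args = e
        return apply_fun(f, [eval_expr(ts, tid, i, a) for a in args])
    # Otherwise e is a register: find the latest earlier definition of it.
    for j in range(i - 1, 0, -1):
        ins = fetched[j - 1]
        if ins[0] == "assign" and ins[1] == e:
            return eval_expr(ts, tid, j, ins[2])
        if ins[0] == "load" and ins[1] == e:
            src = loaded.get(j, BOT)
            if src is BOT:
                return BOT                      # load not yet performed
            if src == "init":
                return 0                        # initial store of value 0
            src_tid, src_idx = src              # val(loaded[j])
            store = ts[src_tid][0][src_idx - 1]
            return eval_expr(ts, src_tid, src_idx, store[2])
    return 0                                    # register never written

# Hypothetical thread states: thread 2's first load has read thread 1's store.
TS = {
    1: ([("assign", "r1", 5), ("store", 0, (operator.add, "r1", 1))], {}),
    2: ([("load", "r2", 0), ("load", "r3", 0),
         ("assign", "r4", (operator.add, "r2", "r3"))], {1: (1, 2)}),
}
print(eval_expr(TS, 2, 2, "r2"))
```

In this example the second load of thread 2 has not been performed, so any expression that uses `r3` evaluates to ⊥.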
The expression addr(tid, 𝑖) returns the value of the address argument of the
𝑖’th fetched instruction of thread tid and is defined as follows. We use the special
value ⊤ if the instruction has no such argument. If lab(fetched[𝑖]) = r ← mem[𝑒a]
or lab(fetched[𝑖]) = mem[𝑒a] ← 𝑒v, then addr(tid, 𝑖) := eval(tid, 𝑖, 𝑒a). Otherwise,
addr(tid, 𝑖) := ⊤.
Similarly, the expression val(tid, 𝑖) returns the value of the value argu-
ment of the 𝑖’th fetched instruction of thread tid and is defined as follows. If
lab(fetched[𝑖]) = mem[𝑒a] ← 𝑒v, lab(fetched[𝑖]) = r ← 𝑒v, or lab(fetched[𝑖]) =
assume(𝑒v), then val(tid, 𝑖) := eval(tid, 𝑖, 𝑒v). Otherwise, val(tid, 𝑖) := ⊤.
The expressions addrdep(tid, 𝑖), datadep(tid, 𝑖), ctrldep(tid, 𝑖) denote the sets of
indices of instructions in thread tid being respectively address, data, and control
dependencies of the 𝑖’th instruction. The first two can be formally defined in
a recursive manner, similar to eval. Also, ctrldep(tid, 𝑖) := {𝑖′ ∈ [1..𝑖 − 1] |
lab(fetched[𝑖′]) = assume(𝑒v)}.
Let 𝒯tid = (𝑄tid,CMD, ℐtid, 𝑞0tid, 𝑄tid) ∈ 𝒫. The transition relation 𝛥𝑍 is the
smallest relation defined by the rules below:
POW-FETCH Consider instr ∈ ℐtid with src(instr) = dst(last(fetched)) or
src(instr) = 𝑞0tid if fetched = 𝜀, then:
(ts, 𝑠𝑌 ) −(fetch, tid, instr)→ (ts[tid ←˒ (fetched · instr, committed, loaded)], 𝑠𝑌 ).
POW-LOAD If fetched[𝑖] is a load, loaded[𝑖] = ⊥, and a = addr(tid, 𝑖) ≠ ⊥, then:
(ts, 𝑠𝑌 ) −(load, tid, 𝑖, a)→ (ts[tid ←˒ (fetched, committed, loaded[𝑖 ←˒ prop(tid, a)])], 𝑠𝑌 ).
POW-EARLY Let fetched[𝑖] be a load, loaded[𝑖] = ⊥, and a = addr(tid, 𝑖) ≠ ⊥.
Let 𝑖′ ∈ [1..𝑖 − 1] be the greatest index such that fetched[𝑖′] is a store with
a′ = addr(tid, 𝑖′) ∈ {a,⊥}. If a′ ≠ ⊥, val(tid, 𝑖′) ≠ ⊥, and 𝑖′ ∉ committed, then:
(ts, 𝑠𝑌 ) −(load, tid, 𝑖, a)→ (ts[tid ←˒ (fetched, committed, loaded[𝑖 ←˒ (tid, 𝑖′)])], 𝑠𝑌 ).
POW-COMMIT Consider 𝑖 ∈ [1..|fetched|] ∖ committed where fetched[𝑖] is not
a store. Assume addrdep(tid, 𝑖) ∪ datadep(tid, 𝑖) ∪ ctrldep(tid, 𝑖) ⊆ committed.
Assume a = addr(tid, 𝑖) ≠ ⊥ and v = val(tid, 𝑖) ≠ ⊥. If a ≠ ⊤, assume
{𝑖′ ∈ [1..𝑖 − 1] | addr(tid, 𝑖′) ∈ {a,⊥}} ⊆ committed. In case fetched[𝑖] is a
load, assume loaded[𝑖] ≠ ⊥. In case fetched[𝑖] is an assume(), assume v ≠ 0.
Then:
(ts, 𝑠𝑌 ) −(commit, tid, 𝑖)→ (ts[tid ←˒ (fetched, committed ∪ {𝑖}, loaded)], 𝑠𝑌 ).
POW-STORE Assume all the preconditions from the previous rule hold, but
fetched[𝑖] is a store. Choose a coherence key k ∈ Q such that there is no
tid′ ∈ TID, 𝑖′ ∈ N for which co(tid′, 𝑖′) = k. Then:
(ts, 𝑠𝑌 ) −(commit, tid, 𝑖, k, a)→ (ts[tid ←˒ (fetched, committed ∪ {𝑖}, loaded)], 𝑠′𝑌 ),
where 𝑠′𝑌 := (co[(tid, 𝑖) ←˒ k], prop).
Additionally, this transition is immediately followed by a POW-PROP tran-
sition propagating the store to the thread where it was committed.
POW-PROP Consider tid′ ∈ TID, 𝑖′ ∈ N with co(tid′, 𝑖′) ≠ ⊥. Let a =
addr(tid′, 𝑖′). Assume co(prop(tid, a)) < co(tid′, 𝑖′). Then:
(ts, 𝑠𝑌 ) −(prop, tid, tid′, 𝑖′, a)→ (ts, (co, prop[(tid, a) ←˒ (tid′, 𝑖′)])).
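The storage-subsystem part of these rules (POW-STORE, POW-PROP, and the memory read of POW-LOAD) can be sketched as a small class. This is an illustration under simplifying assumptions: the thread-local state is omitted, the address and value of a store are passed in directly instead of being computed by addr and val, and the class name `Storage` and its methods are hypothetical. Coherence keys are rationals, modeled with `fractions.Fraction`.

```python
from fractions import Fraction

class Storage:
    """Sketch of the storage subsystem state (co, prop)."""

    def __init__(self):
        self.co = {}      # store id (tid, i) -> coherence key (rational)
        self.addr = {}    # store id -> address (passed in, an assumption)
        self.value = {}   # store id -> value (passed in, an assumption)
        self.prop = {}    # (tid, address) -> last propagated store; absent = init

    def key(self, store):
        # The initial store has key 0, as in the initial state.
        return Fraction(0) if store is None else self.co[store]

    def commit_store(self, tid, i, k, a, v):
        """POW-STORE, followed by the mandatory propagation to the own thread."""
        k = Fraction(k)
        if k in self.co.values():
            raise ValueError("coherence key already used")
        self.co[(tid, i)], self.addr[(tid, i)], self.value[(tid, i)] = k, a, v
        self.propagate(tid, (tid, i))

    def propagate(self, tid, store):
        """POW-PROP: only coherence-order-later stores may be propagated."""
        a = self.addr[store]
        if not self.key(self.prop.get((tid, a))) < self.co[store]:
            raise ValueError("store is not coherence-order-later")
        self.prop[(tid, a)] = store

    def load(self, tid, a):
        """POW-LOAD: read the last store to a propagated to thread tid."""
        store = self.prop.get((tid, a))
        return 0 if store is None else self.value[store]

# Storage events of the MP program: b reaches thread 2, a does not.
STORAGE = Storage()
STORAGE.commit_store(1, 1, 1, "x", 1)   # a, coherence key 1
STORAGE.commit_store(1, 2, 2, "y", 1)   # b, coherence key 2
STORAGE.propagate(2, (1, 2))
R1, R2 = STORAGE.load(2, "y"), STORAGE.load(2, "x")
print(R1, R2)
```

Because nothing forces store 𝑎 to be propagated to thread 2 before 𝑏, thread 2 can observe the flag without the message.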
Final states The set of final states 𝐹𝑍 ⊆ 𝑆𝑍 consists of all states
𝑠𝑍 = (ts, (co, prop)) ∈ 𝑆𝑍 , such that for each tid ∈ TID, ts[tid] =
(fetched, committed, loaded) the following holds:
FIN-COMM All instructions are committed: committed = [1..|fetched|].
FIN-LD Loads agree with the coherence order. Let fetched[𝑖] be a load, and
fetched[𝑖′] be an earlier load to the same address: 𝑖′ < 𝑖, addr(tid, 𝑖) =
addr(tid, 𝑖′). Then co(loaded[𝑖′]) ≤ co(loaded[𝑖]).
FIN-LD-ST Loads and stores in the same thread agree with the coherence
order. Let fetched[𝑖] be a load, let fetched[𝑖′] be an earlier store to the same
address: 𝑖′ < 𝑖, addr(tid, 𝑖) = addr(tid, 𝑖′). Then co(tid, 𝑖′) ≤ co(loaded[𝑖]).
The set of all Power computations of program 𝒫 is Cpower(𝒫) := ℒ(𝑍(𝒫)). The
set of all SC computations of the program Csc(𝒫) ⊆ Cpower(𝒫) includes only
those computations where each instruction is executed atomically, and stores
are immediately propagated to all threads.
Example 1. 𝜎MP = fetch(𝑎)·commit(𝑎)·prop(𝑎, 1)·fetch(𝑏)·commit(𝑏)·prop(𝑏, 1)·
prop(𝑏, 2) · fetch(𝑐) · fetch(𝑑) · load(𝑐) · load(𝑑) · commit(𝑑) · commit(𝑐) is a feasible
Power computation of the program MP (Figure 1):
– fetch(𝑎) := (fetch, 1, 𝑎) — thread 1 fetches store instruction 𝑎.
– commit(𝑎) := (commit, 1, 1, 1,&𝑥) — thread 1 commits 𝑎 with k = 1.
– prop(𝑎, 1) := (prop, 1, 1, 1,&𝑥) — 𝑎 is propagated to its own thread.
– fetch(𝑏) := (fetch, 1, 𝑏) — thread 1 fetches store instruction 𝑏.
– commit(𝑏) := (commit, 1, 2, 2,&𝑦) — thread 1 commits 𝑏 with k = 2.
– prop(𝑏, 1) := (prop, 1, 1, 2,&𝑦) — 𝑏 is propagated to its own thread.
– prop(𝑏, 2) := (prop, 2, 1, 2,&𝑦) — 𝑏 is propagated to thread 2.
– fetch(𝑐) := (fetch, 2, 𝑐) — thread 2 fetches load 𝑐.
– fetch(𝑑) := (fetch, 2, 𝑑) — thread 2 fetches load 𝑑.
– load(𝑐) := (load, 2, 1,&𝑦) — thread 2 reads value 1 written by 𝑏 to 𝑦, because
𝑏 was propagated to thread 2.
– load(𝑑) := (load, 2, 2,&𝑥) — thread 2 reads the initial value 0 of 𝑥, because
𝑎 was not propagated to thread 2.
– commit(𝑑) := (commit, 2, 2) — thread 2 commits load 𝑑.
– commit(𝑐) := (commit, 2, 1) — thread 2 commits load 𝑐.
In the end, FIN-COMM holds as all fetched instructions are indeed commit-
ted, and FIN-LD and FIN-LD-ST trivially hold, as none of the threads has two
instructions accessing the same address.
Lemma 1. Assume 𝑠0𝑍 −𝜎→ 𝑠𝑍 ∈ 𝐹𝑍 . Then 𝑠𝑍 is uniquely determined.
Proof. Given a state and an event e, there is at most one transition from this
state labeled by e. This is clear for non-load events. For load events, this follows
from Lemma 4 and Lemma 5: if a load event was produced by a load from
memory transition, then condition (3) from Lemma 5 holds, then condition (1)
from Lemma 4 cannot hold for any store, therefore, the load event cannot be
produced by an early read transition. ⊓⊔
Lemma 2. Let 𝑠0𝑍 −𝜎→ (ts, 𝑠𝑌 ) −e→ (ts′, 𝑠′𝑌 ). Let (fetched, committed, loaded) =
ts(tid), (fetched′, committed′, loaded′) = ts′(tid) for some tid ∈ TID. If loaded[𝑖] ≠
⊥, then loaded′[𝑖] = loaded[𝑖].
Proof. Follows from the loaded[𝑖] = ⊥ requirement in POW-LOAD and POW-
EARLY transitions. ⊓⊔
Lemma 3. Let 𝑠0𝑍 −𝜎→ 𝑠𝑍 −e→ 𝑠′𝑍 . Assume eval(tid, 𝑖, 𝑒) = v ≠ ⊥ in 𝑠𝑍 . Then
eval(tid, 𝑖, 𝑒) = v in 𝑠′𝑍 .
Proof. By definition of eval, Lemma 2, and the fact that functions in FUN are
deterministic. ⊓⊔
Lemma 4. Consider a computation 𝜎 ∈ Cpower(𝒫). Then a load (tid, 𝑖) reads a
value from a store (tid, 𝑖′) via an early read (POW-EARLY) transition iff (1)
𝜎 = 𝜎1 · (load, tid, 𝑖, a) ·𝜎2 · (commit, tid, 𝑖′, *, a) ·𝜎3, 𝑖′ ∈ [1..𝑖− 1] and (2) 𝜎3 does
not contain events matching (commit, tid, [𝑖′ + 1..𝑖− 1], *, a).
Proof. From left to right. Assume the load (tid, 𝑖) reads the store (tid, 𝑖′) via an
early read transition. Then (tid, 𝑖′) must be the latest store to the same address
fetched in thread tid before the load, and it must not be committed before the
load (i.e., it is committed after it),
therefore (1) holds. If (2) does not hold, then (tid, 𝑖′) is not the latest store to
address a in thread tid before the load event, since stores to the same address
are committed in the order of fetching. Contradiction.
From right to left. Let 𝑠0𝑍 −𝜎1→ 𝑠𝑍 = (ts, 𝑠𝑌 ). Consider ts(tid) =
(fetched, committed, loaded). Let 𝑖′′ < 𝑖 be the greatest index such that
fetched[𝑖′′] is a store with addr(tid, 𝑖′′) ∈ {a,⊥}.
Assume 𝑖′ < 𝑖′′. If addr(tid, 𝑖′′) = a, we get a contradiction to (2), since stores
to the same address are committed in the order of fetching. If addr(tid, 𝑖′′) = ⊥,
then an early read is not possible in state 𝑠𝑍 , and the load reads from the
latest propagated store (POW-LOAD), which is coherence-order-before the store
(tid, 𝑖′), which is program-order-before (tid, 𝑖). This situation is forbidden by
FIN-LD-ST.
By Lemma 3, addr(tid, 𝑖′) ∈ {a,⊥}, therefore, 𝑖′′ = 𝑖′. Assume addr(tid, 𝑖′) =
⊥ or val(tid, 𝑖′) = ⊥. Then, again, a load from the latest propagated store
takes place, which is impossible (see above). Therefore, addr(tid, 𝑖′) = a and
val(tid, 𝑖′) ≠ ⊥.
Obviously, 𝑖′ ∉ committed holds, as each fetched instruction is committed
only once, and (tid, 𝑖′) is committed after the load takes place, see (1). All in
all, all requirements for the early read from (tid, 𝑖′) are met, therefore, an early
read transition from state 𝑠𝑍 is possible. As shown above, a load from memory
transition from the same state leads to 𝜎 ∉ Cpower(𝒫), therefore, (tid, 𝑖) reads
from the store (tid, 𝑖′) via an early read transition. ⊓⊔
Lemma 5. Consider a computation 𝜎 ∈ Cpower(𝒫). Then a load (tid, 𝑖) reads a
value from a store (tid′, 𝑖′) via a load from memory (POW-LOAD) transition iff
(1) 𝜎 = 𝜎1 · (prop, tid, tid′, 𝑖′, a) · 𝜎2 · (load, tid, 𝑖, a) · 𝜎3, (2) 𝜎2 does not contain
events matching (prop, tid, *, *, a), and (3) 𝜎3 does not contain events matching
(commit, tid, [1..𝑖− 1], *, a).
Proof. From left to right. Assume the load (tid, 𝑖) reads the store (tid′, 𝑖′) via a
load from memory transition. Then, the load has read from the latest store to
address a propagated to thread tid, i.e., (1) and (2) hold. Assume (3) does not
hold, i.e., 𝜎3 contains a commit event (commit, tid, 𝑖′′, *, a) with 𝑖′′ < 𝑖. Then, (tid, 𝑖) reads
from the store (tid′, 𝑖′), which is coherence-order-before the store (tid, 𝑖′′), which
is program-order-before (tid, 𝑖). This situation is forbidden by FIN-LD-ST.
From right to left. By (1), (3), and Lemma 4, the load event was not generated
by an early read transition. Therefore, the event was generated by a load from
memory transition, and the load has taken the value from the latest propagated
store to address a, which is, by (1) and (2), (tid′, 𝑖′). ⊓⊔
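Lemmas 4 and 5 make it possible to recover the source of a load from the event sequence alone. A minimal sketch follows, assuming the input is a valid computation in Cpower(𝒫) given as a list of event tuples shaped like the events of Example 1; the function name `read_source` and the marker `"init"` for an initial store are illustrative assumptions.

```python
def read_source(sigma, tid, i):
    """Return ("early", (tid, i')) or ("memory", store) for load (tid, i).

    Applies the characterizations of Lemma 4 and Lemma 5 to sigma.
    A memory read with no earlier prop event reads the initial store.
    """
    pos = next(p for p, e in enumerate(sigma) if e[:3] == ("load", tid, i))
    a = sigma[pos][3]
    # Lemma 4 (1): a store commit (commit, tid, i', k, a) with i' < i
    # that occurs after the load event.
    later = [e for e in sigma[pos + 1:]
             if e[0] == "commit" and len(e) == 5
             and e[1] == tid and e[4] == a and e[2] < i]
    if later:
        # Lemma 4 (2) singles out the last such commit: no commit of a
        # store with an index in [i'+1 .. i-1] may follow it.
        return ("early", (tid, later[-1][2]))
    # Lemma 5: the last (prop, tid, tid', i', a) event before the load.
    props = [e for e in sigma[:pos]
             if e[0] == "prop" and e[1] == tid and e[4] == a]
    return ("memory", (props[-1][2], props[-1][3]) if props else "init")

# The computation of Example 1, with k = 1 for a and k = 2 for b.
SIGMA_MP = [
    ("fetch", 1, "a"), ("commit", 1, 1, 1, "x"), ("prop", 1, 1, 1, "x"),
    ("fetch", 1, "b"), ("commit", 1, 2, 2, "y"), ("prop", 1, 1, 2, "y"),
    ("prop", 2, 1, 2, "y"),
    ("fetch", 2, "c"), ("fetch", 2, "d"),
    ("load", 2, 1, "y"), ("load", 2, 2, "x"),
    ("commit", 2, 2), ("commit", 2, 1),
]
# A hypothetical single-threaded computation with an early read.
SIGMA_EARLY = [
    ("fetch", 1, "s"), ("fetch", 1, "l"), ("load", 1, 2, "x"),
    ("commit", 1, 1, 1, "x"), ("prop", 1, 1, 1, "x"), ("commit", 1, 2),
]
print(read_source(SIGMA_MP, 2, 1), read_source(SIGMA_EARLY, 1, 2))
```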
3 Robustness
Intuitively, a trace 𝑇 (𝜎) abstracts a program computation 𝜎 to the dataflow
and control-flow relations between instructions. Formally, the trace of 𝜎 is a
directed graph 𝑇 (𝜎) := (𝑉,→𝑝𝑜,→𝑐𝑜,→𝑠𝑟𝑐,→𝑐𝑓 ) with nodes 𝑉 and four kinds
of arcs. The nodes are instructions together with their thread identifiers and
serial numbers (in order to distinguish instructions executed in different threads
and the same instruction executed multiple times in the same thread): 𝑉 ⊆
({inita | a ∈ ADDR} ∪ ⋃tid∈TID {tid}) × N × ℐtid. The program order →𝑝𝑜 is the
order in which instructions were fetched in each thread. The coherence order →𝑐𝑜
gives the global ordering of writes to each address. The source order →𝑠𝑟𝑐 shows
the store from which a load took its value. The conflict order →𝑐𝑓 shows, for a
load, the stores to the same address following the store the load took its value
from. We define the happens-before relation as →ℎ𝑏 := →𝑝𝑜 ∪→𝑐𝑜 ∪→𝑠𝑟𝑐 ∪→𝑐𝑓 .
Formally, consider a computation 𝜎 ∈ Cpower(𝒫). Let 𝑠0𝑍 −𝜎→ 𝑠𝑍 with 𝑠𝑍 =
(ts, (co, prop)). By Lemma 1, 𝑠𝑍 is uniquely determined. The trace 𝑇 (𝜎) :=
(𝑉,→𝑝𝑜,→𝑐𝑜,→𝑠𝑟𝑐,→𝑐𝑓 ) is defined as follows. Assuming tid ∈ TID, ts(tid) =
(fetched, committed, loaded), 𝑖 ∈ [1..|fetched|], and similarly for tid′, we have:
𝑉 :={(tid, 𝑖, fetched[𝑖]) | tid ∈ TID, 𝑖 ∈ N},
→𝑝𝑜 :={((tid, 𝑖, fetched[𝑖]), (tid, 𝑖 + 1, fetched[𝑖 + 1])) |
𝑖 ∈ [1..|fetched| − 1]},
→𝑐𝑜 :={((tid, 𝑖, fetched[𝑖]), (tid′, 𝑖′, fetched′[𝑖′])) |
addr(tid, 𝑖) = addr(tid′, 𝑖′) and co(tid, 𝑖) < co(tid′, 𝑖′)} ∪
{(init𝑎, (tid′, 𝑖′, fetched′[𝑖′])) | 𝑎 = addr(tid′, 𝑖′)},
→𝑠𝑟𝑐 :={((tid, 𝑖, fetched[𝑖]), (tid′, 𝑖′, fetched′[𝑖′])) |
(tid, 𝑖) = loaded′[𝑖′]} ∪
{(init𝑎, (tid′, 𝑖′, fetched′[𝑖′])) | init𝑎 = loaded′[𝑖′]},
→𝑐𝑓 :={(𝑎, 𝑏) | ∃𝑐 : 𝑐 →𝑠𝑟𝑐 𝑎 and 𝑐 →𝑐𝑜 𝑏}.
[Figure: trace graph with nodes init&𝑥, init&𝑦, 𝑎, 𝑏, 𝑐, 𝑑 and arcs 𝑎 →𝑝𝑜 𝑏, 𝑐 →𝑝𝑜 𝑑, init&𝑥 →𝑐𝑜 𝑎, init&𝑦 →𝑐𝑜 𝑏, 𝑏 →𝑠𝑟𝑐 𝑐, init&𝑥 →𝑠𝑟𝑐 𝑑, 𝑑 →𝑐𝑓 𝑎.]
Fig. 2. Trace of computation 𝜎MP from Example 1.
We will also need address →𝑎𝑑𝑑𝑟 and data →𝑑𝑎𝑡𝑎 dependence relations (de-
fined as expected based on addrdep and datadep).
→𝑎𝑑𝑑𝑟 :={((tid, 𝑖, fetched[𝑖]), (tid, 𝑖′, fetched[𝑖′])) | 𝑖 ∈ addrdep(tid, 𝑖′)},
→𝑑𝑎𝑡𝑎 :={((tid, 𝑖, fetched[𝑖]), (tid, 𝑖′, fetched[𝑖′])) | 𝑖 ∈ datadep(tid, 𝑖′)}.
Since →𝑝𝑜 includes all the information from the fetched component of a thread
state,→𝑎𝑑𝑑𝑟 and→𝑑𝑎𝑡𝑎 can be reconstructed from→𝑝𝑜 by inspecting the instruc-
tions labeling a node. They are therefore not included in the trace explicitly.
The robustness problem is, given a program 𝒫, to check whether the set of
all traces under Power is a subset of all traces under SC: 𝑇power(𝒫) ⊆ 𝑇sc(𝒫),
where 𝑇mm(𝒫) := {𝑇 (𝜎) | 𝜎 ∈ Cmm(𝒫)} for mm ∈ {power, sc}.
Shasha and Snir have shown that a trace belongs to an SC computation iff
its happens-before relation is acyclic:
Lemma 6 ([15]). A program 𝒫 is robust against Power iff there is no trace
𝑇 ∈ 𝑇power(𝒫) with cyclic →ℎ𝑏.
Example 2. The trace of computation 𝜎MP (Figure 2) has a cyclic happens-
before relation. By Lemma 6, this means that the program is not robust. Indeed,
in no SC computation load 𝑑 can read 0 whereas 𝑐 has read 1.
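The cycle in this example can be checked mechanically. The sketch below encodes the trace of 𝜎MP with short node names (an assumption made for readability; the paper's nodes are triples), derives the conflict order from →𝑠𝑟𝑐 and →𝑐𝑜 as in the definition of →𝑐𝑓, and runs a standard depth-first search for a cycle in →ℎ𝑏. It also checks an SC-like variant in which 𝑑 reads from 𝑎.

```python
def conflict(src, co):
    """cf := {(a, b) | there is c with c ->src a and c ->co b}."""
    return {(a, b) for (c1, a) in src for (c2, b) in co if c1 == c2}

def has_cycle(nodes, edges):
    """Depth-first search for a cycle in a directed graph."""
    succ = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in nodes}

    def visit(u):
        color[u] = GREY                      # u is on the current DFS path
        for v in succ[u]:
            if color[v] == GREY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in nodes)

NODES = ["init_x", "init_y", "a", "b", "c", "d"]
PO = {("a", "b"), ("c", "d")}
CO = {("init_x", "a"), ("init_y", "b")}

# Trace of sigma_MP: c reads from b, d reads the initial value of x.
SRC_POWER = {("b", "c"), ("init_x", "d")}
CF_POWER = conflict(SRC_POWER, CO)
HB_POWER = PO | CO | SRC_POWER | CF_POWER

# SC-like variant: d reads from a instead.
SRC_SC = {("b", "c"), ("a", "d")}
HB_SC = PO | CO | SRC_SC | conflict(SRC_SC, CO)

print(has_cycle(NODES, HB_POWER), has_cycle(NODES, HB_SC))
```

The cycle found in the first trace is 𝑎 →𝑝𝑜 𝑏 →𝑠𝑟𝑐 𝑐 →𝑝𝑜 𝑑 →𝑐𝑓 𝑎.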
4 Normal-Form Computations
We say that a computation 𝜏 ∈ Cpower(𝒫) is in normal form of degree 𝑛 if there is
a partitioning 𝜏 = 𝜏1 · · · 𝜏𝑛, such that all fetch events are in 𝜏1 (NF-A) and events
related to different instructions occur in different parts of the computation in
the same order (NF-B):
NF-A (𝜏2 · · · 𝜏𝑛)↓ fetch = 𝜀.
NF-B Formally, for 𝑗 ∈ {1, 2} let e𝑗 , e′𝑗 be events related to instruction (tid𝑗 , 𝑖𝑗).
If e1, e2 ∈ 𝜏𝑠 and e′1, e′2 ∈ 𝜏𝑠′ , then e1 <𝜏𝑠 e2 iff e′1 <𝜏𝑠′ e′2.
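One way to read NF-A and NF-B operationally is the following check on a given partitioning. It is a sketch under assumptions: every event is a tuple `(kind, tid, i, ...)` that relates to instruction `(tid, i)` (in the paper, fetch events carry the instruction itself and prop events name the propagated store), and NF-B is interpreted as requiring that any two distinct instructions keep one relative order across all parts.

```python
from itertools import combinations

def instr(event):
    """Instruction an event relates to; assumes the shape (kind, tid, i, ...)."""
    return (event[1], event[2])

def is_normal_form(parts):
    """Check NF-A and NF-B for a partitioning tau = tau_1 ... tau_n."""
    # NF-A: no fetch event outside the first part.
    if any(e[0] == "fetch" for part in parts[1:] for e in part):
        return False
    # NF-B: events of two distinct instructions are ordered the same way
    # in every part that contains events of both.
    before = set()
    for part in parts:
        for e1, e2 in combinations(part, 2):   # e1 occurs before e2
            p, q = instr(e1), instr(e2)
            if p != q:
                if (q, p) in before:
                    return False
                before.add((p, q))
    return True

# Hypothetical partitionings of degree 3 for two single-instruction threads.
PARTS_OK = [
    [("fetch", 1, 1), ("fetch", 2, 1)],
    [("load", 1, 1), ("load", 2, 1)],
    [("commit", 1, 1), ("commit", 2, 1)],
]
PARTS_BAD = PARTS_OK[:2] + [[("commit", 2, 1), ("commit", 1, 1)]]
print(is_normal_form(PARTS_OK), is_normal_form(PARTS_BAD))
```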
In the rest of this section we prove the following theorem:
Theorem 1. A program is robust iff it has no normal-form computations of
degree |𝒫|+ 3 with cyclic happens-before relation.
Consider a computation 𝜎 ∈ Cmm(𝒫). By 𝜎 ∖ (tid, 𝑖) we denote the computa-
tion obtained from 𝜎 by deleting all events related to the 𝑖’th fetched instruction
in thread tid.
Lemma 7. Consider a non-empty computation 𝜎 ∈ Cpower(𝒫). Then there is a
(tidx, 𝑖x), such that 𝜎′ = 𝜎 ∖ (tidx, 𝑖x) satisfies |𝜎′| < |𝜎| and 𝜎′ ∈ Cpower(𝒫).
Proof. Consider the last fetched instruction in each thread. If among such in-
structions there is a non-store instruction, delete it: its result cannot be used by
any other instruction. If all these instructions are stores, delete the one, on which
(1) no load or store depends via (→𝑠𝑟𝑐 ∪→𝑑𝑎𝑡𝑎)+ · →𝑎𝑑𝑑𝑟, and (2) no condition
depends via (→𝑠𝑟𝑐 ∪→𝑑𝑎𝑡𝑎)+.
Towards a contradiction, assume there is no such store. Consider a last
fetched (store) instruction in a thread tid1: (tid1, 𝑖1). Case 1: there is a load or a
store (tid2, 𝑖′2) whose address depends on (tid1, 𝑖1). Case 2: there is a condition
(tid2, 𝑖′2) whose value depends on (tid1, 𝑖1). Consider the last fetched instruction
in thread tid2: (tid2, 𝑖2). It must be a store, and it must have been committed
after (tid1, 𝑖1): a store can only be committed after all loads and stores fetched
before it have their addresses determined (Case 1) and after all preceding con-
ditions are committed (Case 2).
Continuing the reasoning, for any last fetched instruction in a thread (tid𝑗 , 𝑖𝑗)
there is a last instruction in a different thread (tid𝑗+1, 𝑖𝑗+1) which must have been
committed later. Taking into account finiteness of the number of threads, we get
a contradiction. ⊓⊔
Fix a program 𝒫. Consider a shortest Power computation 𝛼 ∈ Cpower(𝒫)
with cyclic →ℎ𝑏. Let (tidx, 𝑖x) be the instruction determined by Lemma 7. Let
𝛼 := 𝛼1 · x1 · 𝛼2 · x2 · · ·𝛼𝑛, where {x1 . . . x𝑛−1} are the events related to the
𝑖x’th instruction fetched in thread tidx. Then 𝛼 ∖ (tidx, 𝑖x) := 𝛼′ := 𝛼1 · 𝛼2 · · ·𝛼𝑛.
Since 𝛼′ is shorter than 𝛼, its →ℎ𝑏 is acyclic. Therefore, there is a computation
𝛽 ∈ Csc(𝒫) with 𝑇 (𝛽) = 𝑇 (𝛼′).
Computations 𝛽 and 𝛼′ consist of the same fetch, load, and commit events:
fetch events are determined by →𝑝𝑜; address component a of load and store
commit events is determined by →𝑎𝑑𝑑𝑟, →𝑑𝑎𝑡𝑎 (derivable from →𝑝𝑜), and →𝑠𝑟𝑐;
since →𝑐𝑜 is the same for both computations, we can assume that matching store
commit events have the same value of coherence key k. Notably, 𝛽 can have more
propagate events than 𝛼′ as Power semantics does not guarantee that all stores
are propagated to all threads. Now we reorder events in each part 𝛼𝑗 of 𝛼 in the
way they follow in 𝛽. This gives a computation 𝛾 := 𝛽 ↓𝛼1 ·x1 ·𝛽 ↓𝛼2 ·x2 · · ·𝛽 ↓𝛼𝑛.
In the rest of the section we show that 𝛾 is a valid Power computation of program
𝒫 and has the same trace as 𝛼.
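The construction of 𝛾 is a split followed by a projection, which the following sketch spells out. It assumes a simplified event shape `(kind, tid, i, ...)` in which every event names the instruction it relates to, and it assumes events are pairwise distinct so that membership tests identify them; the helper names are hypothetical.

```python
def related(event, tid, i):
    """Assumed event shape: (kind, tid, i, ...) relates to instruction (tid, i)."""
    return event[1] == tid and event[2] == i

def split_at(alpha, tid, i):
    """Split alpha into parts alpha_1 .. alpha_n and events x_1 .. x_{n-1}."""
    parts, xs = [[]], []
    for e in alpha:
        if related(e, tid, i):
            xs.append(e)
            parts.append([])       # an x event closes the current part
        else:
            parts[-1].append(e)
    return parts, xs

def project(beta, part):
    """beta ↓ part: the events of beta that occur in part, in beta's order."""
    keep = set(part)
    return [e for e in beta if e in keep]

def build_gamma(alpha, beta, tid, i):
    """gamma := beta↓alpha_1 · x_1 · beta↓alpha_2 · x_2 ··· beta↓alpha_n."""
    parts, xs = split_at(alpha, tid, i)
    gamma = []
    for j, part in enumerate(parts):
        gamma += project(beta, part)
        if j < len(xs):
            gamma.append(xs[j])
    return gamma

# Hypothetical example: instruction (3, 1) plays the role of (tid_x, i_x).
ALPHA = [("fetch", 1, 1), ("fetch", 2, 1), ("fetch", 3, 1),
         ("commit", 2, 1), ("commit", 1, 1), ("commit", 3, 1)]
BETA = [("fetch", 1, 1), ("commit", 1, 1), ("fetch", 2, 1), ("commit", 2, 1)]
GAMMA = build_gamma(ALPHA, BETA, 3, 1)
print(GAMMA)
```

Events of 𝛽 that do not occur in any part of 𝛼, such as additional propagate events, are dropped by the projection.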
Lemma 8. For all tid ∈ TID we have 𝛼↓ fetch↓ tid = 𝛾 ↓ fetch↓ tid.
Proof. Since 𝑇 (𝛽) = 𝑇 (𝛼′), by definition of 𝛼 and properties of projection, for
any tid ∈ TID we have
𝛼↓ fetch↓ tid = 𝛼1 ↓ fetch↓ tid · x1 ↓ fetch↓ tid · · ·𝛼𝑛 ↓ fetch↓ tid
= · · · (𝛽 ↓ fetch↓ tid)↓(𝛼𝑖 ↓ fetch↓ tid) · x𝑖 ↓ fetch↓ tid · · ·
= 𝛽 ↓𝛼1 ↓ fetch↓ tid · x1 ↓ fetch↓ tid · · ·𝛽 ↓𝛼𝑛 ↓ fetch↓ tid
= (𝛽 ↓𝛼1 · x1 · · ·𝛽 ↓𝛼𝑛)↓ fetch↓ tid
= 𝛾 ↓ fetch↓ tid.
⊓⊔
Lemma 9. Consider some (tid, 𝑖) and (tid′, 𝑖′). Let 𝑃 (𝜎) := true if requirements
(1)–(2) from Lemma 4 or (1)–(3) from Lemma 5 hold for 𝜎, and 𝑃 (𝜎) := false
otherwise. Then, if 𝑃 (𝛼) then 𝑃 (𝛾).
Proof. The proof is a case distinction: which of the two cases in the definition
of 𝑃 holds for 𝛼, which holds for 𝛽, and whether the distinguished load and commit
events are located in the same part 𝛼𝑗 . We consider two of the cases. The others
are similar.
Assume requirements (1)–(2) from Lemma 4 hold for 𝛼 and requirements (1)–
(3) from Lemma 5 hold for the sequentially consistent computation 𝛽. If load and
commit events are in the same part, then 𝛼 = 𝛼1·x1 · · · (𝛼′𝑗 ·𝑏·𝛼′′𝑗 ·𝑐·𝑑·𝛼′′′𝑗 )·x𝑗 · · ·𝛼𝑛,
𝛽 = 𝛽1 · · · 𝑐 · 𝑑 · 𝛽2 · 𝑏 · 𝛽3, where 𝑏 = (load, tid, 𝑖, a), 𝑐 = (commit, tid, 𝑖′), 𝑑 =
(prop, tid, tid, 𝑖′, a), 𝑖′ < 𝑖. Consequently, 𝛾 = 𝛽 ↓𝛼1 · x1 · · ·𝛽 ↓𝛼𝑗 · x𝑗 · · ·𝛽 ↓𝛼𝑛 =
𝛽 ↓𝛼1 · x1 · · · (𝛽1 ↓𝛼𝑗 · 𝑐 · 𝑑 · 𝛽2 ↓𝛼𝑗 · 𝑏 · 𝛽3 ↓𝛼𝑗) · x𝑗 · · ·𝛽 ↓𝛼𝑛, which has the shape of a load from
memory. We therefore check requirements (1)–(3) of Lemma 5. First, 𝛽2 ↓𝛼𝑗
must have no prop events to thread tid with the address a — holds as 𝛽2 does not
have them. Second, 𝛽3 ↓𝛼𝑗 must have no commits of earlier writes in thread tid
— holds as 𝛽3 does not have them. Third, 𝛽 ↓𝛼𝑙 = (𝛽1 ·𝛽2 ·𝛽3)↓𝛼𝑙, 𝑙 ∈ [𝑖+ 1..𝑛]
must have no commit events for stores with indices [1..𝑖− 1], the same address
and thread id. Consider 𝛽1 ↓𝛼𝑙 — if it has such an event 𝑒, then two stores to
the same address, 𝑒 and 𝑐, are committed in different order in 𝛼′ and 𝛽, which
is impossible due to 𝑇 (𝛼′) = 𝑇 (𝛽). Consider 𝛽2 ↓𝛼𝑙 — it does not have such an
event, because 𝛽2 does not have prop events to address a, therefore, it does not
have commits of own stores there too. Consider 𝛽3 ↓𝛼𝑙 — it does not have such
an event, because 𝛽3 does not. Finally, none of x𝑙 events, 𝑙 ∈ [𝑖+ 1..𝑛− 1], must
be a commit of earlier writes in thread tid — holds, as these events belong to
the last fetched instruction of a thread.
Consider the case when load and commit events are in different parts, i.e.
𝛼 = 𝛼1 · x1 · · · (𝛼′𝑗 · 𝑏 · 𝛼′′𝑗 ) · · · (𝛼′𝑘 · 𝑐 · 𝑑 · 𝛼′′𝑘) · · ·𝛼𝑛, 𝛽 = 𝛽1 · 𝑐 · 𝑑 · 𝛽2 · 𝑏 · 𝛽3,
where 𝑏, 𝑐, 𝑑 are defined as before and 𝑖′ < 𝑖. Then, 𝛾 = 𝛽 ↓ 𝛼1 · x1 · · ·𝛽 ↓
𝛼𝑗 · x𝑗 · · ·𝛽 ↓𝛼𝑘 · · ·𝛽 ↓𝛼𝑛 = 𝛽 ↓𝛼1 · x1 · · ·𝛽1 ↓𝛼𝑗 · 𝛽2 ↓𝛼𝑗 · 𝑏 · 𝛽3 ↓𝛼𝑗 · x𝑗 · · ·𝛽1 ↓
𝛼𝑘 · 𝑐 · 𝑑 · 𝛽2 ↓𝛼𝑘 · 𝛽3 ↓𝛼𝑘 · x𝑘 · · ·𝛽 ↓𝛼𝑛, which has the shape of an early read. Therefore,
one must check that 𝛽2 ↓𝛼𝑘 ·𝛽3 ↓𝛼𝑘 · x𝑘 · · ·𝛽 ↓𝛼𝑛 has no commit events matching
(commit, tid, [𝑖′ + 1..𝑖 − 1], *, a). Consider 𝛽2 ↓𝛼𝑘 — does not have such events,
because they would be immediately followed by a prop event to thread tid and
address a, which contradicts requirement (2) of Lemma 5. Consider 𝛽3 ↓𝛼𝑘 —
does not have such events, because 𝛽3 does not have them by requirement (3) of
Lemma 5. Consider 𝛽 ↓𝛼𝑙, 𝑙 ∈ [𝑗+1..𝑛] — does not have such events, because 𝛼𝑙
do not have them by requirement (2) of Lemma 4. Finally, x𝑙, 𝑙 ∈ [𝑗 + 1..𝑛− 1]
belong to the last fetched instruction of a thread, therefore do not contain the
described commit events. ⊓⊔
Lemma 10. 𝛾 ∈ Cpower(𝒫).
Proof. We proceed by induction. Assume (1) 𝛾 = 𝛾1 · e · 𝛾2, (2) $s_{0Z} \xrightarrow{\gamma_1} s_Z$, and
(3) all loads satisfied in 𝛾1 have read from the same stores as in 𝛼. We show that
$s_{0Z} \xrightarrow{\gamma_1 \cdot e} s_Z'$ and all loads satisfied in 𝛾1 · e have read from the same stores as
in 𝛼. Let 𝑠𝑍 = (ts, 𝑠𝑌 ) and ts(tid) = (fetched, committed, loaded). Consider the
event e.
(fetch, tid, 𝑖) A transition labeled by e from state 𝑠𝑍 is feasible due to Lemma 8
and the fact that feasibility of a fetch transition is conditioned solely on the
previous fetch transition with the same thread id.
(load, tid, 𝑖, a) For the transition to be feasible, addr(tid, 𝑖) = a must hold. In order
to have addr(tid, 𝑖) ̸= ⊥, all loads in thread tid, on which addr(tid, 𝑖) depends,
must be satisfied. Note that these loads are the same in 𝛼 and 𝛾 due to
Lemma 8. Since 𝛼 ∈ Cpower(𝒫), these load events occurred before e in 𝛼. Let
e′ be one of these load events. If e′ ∈ 𝛼𝑖 and e ∈ 𝛼𝑗 , 𝑖 < 𝑗, or e′ ∈ {x𝑖 |
𝑖 ∈ [1..𝑛 − 1]}, or e ∈ {x𝑖 | 𝑖 ∈ [1..𝑛 − 1]}, then e′ and e are located in 𝛾
in the same order. If e′, e ∈ 𝛼𝑖, then e′, e ∈ 𝛽. Since the →𝑝𝑜 components of
𝑇 (𝛼) and 𝑇 (𝛽) match up to a single deleted arc, e′ and e are located in 𝛽
(therefore, in 𝛽 ↓𝛼𝑖 and 𝛾) in this order. By inductive assumption (3) and
the fact that functions in FUN are deterministic, addr(tid, 𝑖) = a holds.
Assume the load (tid, 𝑖) has read from a store (tid′, 𝑖′) in 𝛼. Then, by Lem-
mas 4, 5, 9, either conditions (1)–(3) of Lemma 5 hold, or conditions (1)–(2)
of Lemma 4 hold. In the former case, (prop, tid, tid′, 𝑖′, a) is the last prop
event to tid with address a, therefore, a load from memory transition read-
ing (tid′, 𝑖′) is feasible from state 𝑠𝑍 . In the latter case, (tid′, 𝑖′) is the lat-
est non-committed store to address a, and an early read transition reading
(tid′, 𝑖′) is possible. The proof that addr(tid′, 𝑖′) ̸= ⊥ is similar to the proof
that addr(tid, 𝑖) ̸= ⊥.
(commit, tid, 𝑖) The proof of addr(tid, 𝑖) ̸= ⊥ and val(tid, 𝑖) ̸= ⊥ is similar to the
proof of addr(tid, 𝑖) ̸= ⊥ in the previous case. If fetched[𝑖] is a load or a store,
there must be no preceding loads and stores to unknown addresses, which
holds and can be proven in a similar way. If fetched[𝑖] is a load, requirement
loaded[𝑖] ̸= ⊥ holds for the same reasons. If fetched[𝑖] is a conditional, re-
quirement val(tid, 𝑖) ̸= 0 holds by inductive assumption (3), the fact that
functions in FUN are deterministic, and the fact that 𝛼 ∈ Cpower(𝒫).
(commit, tid, 𝑖, k, a) Value k is unique, since it was unique in 𝛼, and 𝛼 and 𝛾
consist of the same commit events. We check co(prop(tid, a)) < k. Assume it
does not hold. Then, there is e′ = (prop, tid, tid′, 𝑖′, a), where co(tid′, 𝑖′) > k,
and e′, e are located in 𝛾 in this order. If e′ ∈ 𝛼𝑖, e ∈ 𝛼𝑗 , 𝑖 < 𝑗, or e′ ∈ {x𝑖 |
𝑖 ∈ [1..𝑛 − 1]}, or e ∈ {x𝑖 | 𝑖 ∈ [1..𝑛 − 1]}, these events are located in 𝛼
in this order, which contradicts 𝛼 ∈ Cpower(𝒫). If e′, e ∈ 𝛼𝑖, these events are
located in 𝛽 in this order, which contradicts 𝛽 ∈ Cpower(𝒫).
This transition is immediately followed by a prop transition in 𝛾, since it did
so in 𝛼 and 𝛽 (unless 𝑒 ∈ {x𝑖 | 𝑖 ∈ [1..𝑛− 1]}, which is a simpler case), and
by properties of projection.
(prop, tid, tid′, 𝑖′, a) The requirement co(prop(tid, a)) < co(tid′, 𝑖′) is proven sim-
ilarly to co(prop(tid, a)) < k in the previous case.
As shown above, $s_{0Z} \xrightarrow{\gamma} s_Z$. It remains to check that 𝑠𝑍 ∈ 𝐹𝑍 . The
requirement that all fetched instructions are committed trivially holds: 𝛽 includes
the same commit events as 𝛼′, therefore, by definition, 𝛾 contains the same
commit events as 𝛼. The other two requirements that loads and stores agree
with the coherence order hold due to Lemma 8, the inductive assumption (3),
and the fact that 𝛼 and 𝛾 consist of the same commit events (i.e. the coherence
keys of matching stores are equal in these computations). ⊓⊔
Lemma 11. 𝑇 (𝛾) = 𝑇 (𝛼)
Proof. Equality of→𝑝𝑜 follows from Lemma 8. Equality of source relation follows
from Lemmas 4, 5, 9, 10. Store order is determined by a and k components of
store commit events. Since computations 𝛼 and 𝛾 consist of the same commit
events, the →𝑐𝑜 relations in the traces of 𝛼 and 𝛾 are the same. ⊓⊔
Lemma 12. 𝛾 ∈ Cpower(𝒫) and 𝑇 (𝛾) = 𝑇 (𝛼).
Proof. Corollary of Lemmas 10 and 11. ⊓⊔
Without loss of generality we may assume that all fetch events of 𝛼 are located
within 𝛼1 · x1: every thread can always first fetch all instructions and in the rest
of the computation only execute them; such a reordering does not change the
trace. Also, note that the maximal number of events an instruction can generate
is |𝒫| + 2. This bound is achieved by a store that is fetched, committed, and
propagated to all threads. Then the following lemma holds:
Lemma 13. Computation 𝛾 is in normal form of degree |𝒫|+ 3.
Proof. By definition of 𝛾 and properties of projection. ⊓⊔
Together with Lemma 6 this proves Theorem 1.
Example 3. Consider 𝛼 := fetch(𝑐) · fetch(𝑑) · fetch(𝑎) · fetch(𝑏) · commit(𝑎) · prop(𝑎, 1) ·
commit(𝑏) · prop(𝑏, 1) · prop(𝑏, 2) · load(𝑐) · load(𝑑) · commit(𝑑) · commit(𝑐),
which is essentially 𝜎MP with fetch events moved to the front. We cancel the x𝑖
events related to the store instruction 𝑏, namely x1 = fetch(𝑏), x2 = commit(𝑏),
x3 = prop(𝑏, 1), and x4 = prop(𝑏, 2), as 𝑏 is the last instruction
of thread 1 and no address depends on it (we could also cancel the events of 𝑑
instead). Therefore, 𝛼1 := fetch(𝑐)·fetch(𝑑)·fetch(𝑎), 𝛼2 := commit(𝑎)·prop(𝑎, 1),
𝛼3 := 𝛼4 := 𝜀, 𝛼5 := load(𝑐) · load(𝑑) · commit(𝑑) · commit(𝑐), and 𝛼′ :=
𝛼1 · 𝛼2 · 𝛼3 · 𝛼4 · 𝛼5. The trace of 𝛼′ is shown in Figure 3. The SC computation
with the same trace is 𝛽 := fetch(𝑐) · load(𝑐) · commit(𝑐) · fetch(𝑑) · load(𝑑) ·
commit(𝑑) · fetch(𝑎) · commit(𝑎) · prop(𝑎, 1) · prop(𝑎, 2). The normal-form computation
is 𝛾 := 𝛽 ↓𝛼1 ·x1 · · ·𝛽 ↓𝛼5 = (fetch(𝑐)·fetch(𝑑)·fetch(𝑎))·fetch(𝑏)·(commit(𝑎)·
prop(𝑎, 1))·commit(𝑏)·prop(𝑏, 1)·prop(𝑏, 2)·(load(𝑐)·commit(𝑐)·load(𝑑)·commit(𝑑)).
It is feasible and has the same trace as 𝛼 and 𝜎MP (Figure 2).
[Figure 3: trace graph over the nodes init&𝑥, init&𝑦, 𝑎 : mem[&𝑥]← 1 (Thread 1), 𝑐 : r1 ← mem[&𝑦] and 𝑑 : r2 ← mem[&𝑥] (Thread 2), with edges labeled 𝑝𝑜, 𝑐𝑓, 𝑠𝑟𝑐, and 𝑐𝑜.]
Fig. 3. Trace of the computations 𝛼′ and 𝛽 from Example 3.
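The construction of 𝛾 in Example 3 can be replayed mechanically. The following is a minimal sketch, assuming events are encoded as plain strings and that the thread assignment (𝑎, 𝑏 in thread 1 and 𝑐, 𝑑 in thread 2) is as in the example; the helper names `project`, `interleave`, and `fetches_of` are hypothetical and not part of the paper.

```python
def project(word, keep):
    """Projection word↓keep: the subsequence of `word` made of events in `keep`."""
    keep = set(keep)
    return [e for e in word if e in keep]


def interleave(parts, xs):
    """Build part_1 · x_1 · part_2 · x_2 ... part_n from n parts and n-1 events."""
    out = []
    for j, part in enumerate(parts):
        out.extend(part)
        if j < len(xs):
            out.append(xs[j])
    return out


# Assumption: thread assignment read off Example 3.
thread = {"a": 1, "b": 1, "c": 2, "d": 2}

# alpha_1 .. alpha_5 and the cancelled events x_1 .. x_4 of instruction b.
parts = [
    ["fetch(c)", "fetch(d)", "fetch(a)"],
    ["commit(a)", "prop(a,1)"],
    [],
    [],
    ["load(c)", "load(d)", "commit(d)", "commit(c)"],
]
xs = ["fetch(b)", "commit(b)", "prop(b,1)", "prop(b,2)"]

# The SC computation beta with the same trace as alpha'.
beta = ["fetch(c)", "load(c)", "commit(c)", "fetch(d)", "load(d)", "commit(d)",
        "fetch(a)", "commit(a)", "prop(a,1)", "prop(a,2)"]

alpha = interleave(parts, xs)
# Reorder every part alpha_j in the way its events follow in beta.
gamma = interleave([project(beta, p) for p in parts], xs)


def fetches_of(word, tid):
    """Fetch events of thread `tid`, in order (the projection used in Lemma 8)."""
    return [e for e in word if e.startswith("fetch(") and thread[e[6]] == tid]
```

Note that prop(𝑎, 2) occurs in 𝛽 but in no part 𝛼𝑗, so the projection drops it; 𝛾 and 𝛼 therefore consist of the same events, and only the order inside 𝛼5 changes.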
5 From Normal-Form Computations to Emptiness
We now reduce robustness to language emptiness. First, we define a multiheaded
automaton capable of generating all normal-form computations of a program.
Next, we intersect it with regular languages that check cyclicity of the happens-
before relation. Altogether, the program is robust iff the intersection is empty.
5.1 Generating Normal-Form Computations
To generate all normal-form computations, we use so-called multiheaded au-
tomata [8]. Essentially, a multiheaded automaton generates a computation
𝜎1 . . . 𝜎𝑛 by simultaneously generating its parts 𝜎𝑖. The automaton has a head
for each part, and the transition relation defines the head producing an event.
Formally, an 𝑛-headed automaton over 𝛴 is an automaton operating on the ex-
tended alphabet [1..𝑛] × 𝛴: 𝐴 = (𝑆, [1..𝑛] × 𝛴,𝛥, 𝑠0, 𝐹 ). The language of 𝐴 is
ℒ(𝐴) := {second(𝜎↓({1}×𝛴) · · ·𝜎↓({𝑛}×𝛴)) | 𝑠0 𝜎−→ 𝑠 for some 𝑠 ∈ 𝐹}, where
second((𝑎1, 𝑏1) · · · (𝑎𝑚, 𝑏𝑚)) := 𝑏1 · · · 𝑏𝑚. Multiheaded automata are closed under
intersection with regular languages. Moreover, for finite multiheaded automata
language emptiness is NL-complete [8]:
Lemma 14 ([8]). Consider an 𝑛-headed automaton 𝑈 and a finite automaton
𝑉 over a common alphabet 𝛴. There is an 𝑛-headed automaton 𝑊 with ℒ(𝑊 ) =
ℒ(𝑈) ∩ ℒ(𝑉 ) and with the number of states bounded by $|S_W| \le |S_U| \cdot |S_V|^{2n} + 1$.
Lemma 15 ([8]). Emptiness for 𝑛-headed automata is NL-complete.
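To illustrate the definition of ℒ(𝐴), the following minimal sketch computes the word contributed by a single run of an 𝑛-headed automaton. It assumes the run is given as the list of its transition labels (head, symbol) in firing order; the function name `word_of_run` is hypothetical.

```python
from collections import defaultdict


def word_of_run(run, n):
    """Return second(run↓({1}×Σ) · ... · run↓({n}×Σ)) for one run.

    `run` lists the labels (head, symbol) in the order the transitions fire.
    The resulting word concatenates the per-head projections, head 1 first.
    """
    per_head = defaultdict(list)
    for head, symbol in run:
        if not 1 <= head <= n:
            raise ValueError(f"head {head} is outside [1..{n}]")
        per_head[head].append(symbol)
    return [s for h in range(1, n + 1) for s in per_head[h]]


# The heads fire in an interleaved order, yet the word is sorted by part.
run = [(1, "a"), (3, "c"), (1, "a"), (2, "b"), (3, "c"), (2, "b")]
word = word_of_run(run, 3)
```

In this example the three heads jointly produce a word of the shape 𝑎𝑎𝑏𝑏𝑐𝑐, with each part generated by its own head.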
We will generate all normal-form computations of program 𝒫 with the 𝑛-
headed automaton 𝑀(𝒫) := (𝑆𝑀 ,E, 𝛥𝑀 , 𝑠0𝑀 , 𝐹𝑀 ), where 𝑛 := |𝒫| + 3. The
automaton generates all events related to a single instruction in one shot, but,
possibly, in different parts of the computation. All fetch events are generated
in the first part of the computation. In order to generate them, the automaton
keeps track of the destination state of the last fetched instruction in each thread
(component ctrl-state of the automaton state).
Each instruction can only read the last value written to a register. Therefore,
the automaton only needs to remember |REG| register values per thread (compo-
nent reg-value). However, an instruction cannot be executed until the values of
all registers that it reads become known. To obey this restriction, the automaton
memorizes the part of the computation in which the register value gets computed
(reg-comp-head). For example, while handling an assignment r1 ← r1 + r2, the
automaton learns that the new value of r1 is the sum of the current values of r1
and r2. It also remembers that this value is available no earlier than the current
values of r1 and r2 are computed. Similarly, the automaton remembers the parts
of the computation in which the addresses of load and store instructions be-
come known (addr-comp-head), and certain kinds of instructions get committed
(reg-comm-head, assume-comm-head, addr-comm-head).
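The bookkeeping for the assignment r1 ← r1 + r2 described above can be sketched as follows. This is a simplified illustration for a single thread, assuming parts are numbered from 1; the class `ThreadRegs` and the function `assign` are hypothetical names, and only the reg-value and reg-comp-head components are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadRegs:
    value: dict = field(default_factory=dict)      # plays the role of reg-value
    comp_head: dict = field(default_factory=dict)  # plays the role of reg-comp-head


def assign(regs, target, reads, fn, h2):
    """Handle `target <- fn(reads)` with the result computed in part h2.

    The result cannot be available earlier than any of the values it reads,
    so h2 must be at least the maximal reg-comp-head of the read registers.
    Returns that lower bound.
    """
    earliest = max([1] + [regs.comp_head[r] for r in reads])
    if h2 < earliest:
        raise ValueError(f"part {h2} is too early, need at least {earliest}")
    # Evaluate before updating, since the target may also be read.
    result = fn(*(regs.value[r] for r in reads))
    regs.value[target] = result
    regs.comp_head[target] = h2
    return earliest


# r1 = 2 was computed in part 1, r2 = 5 in part 3.
regs = ThreadRegs(value={"r1": 2, "r2": 5}, comp_head={"r1": 1, "r2": 3})
earliest = assign(regs, "r1", ["r1", "r2"], lambda x, y: x + y, h2=3)
```

Here the new value of r1 becomes available in part 3 at the earliest, because r2 is not known before that part.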
The automaton has to keep a separate memory state for each thread and for
each part of the computation. The memory state of a thread in a part is updated
when a store instruction gets propagated to this thread in this part. When a load
instruction is handled, the automaton chooses a part where the load event takes
place and uses the memory state of that part. Besides the memory valuation
(mem-value), the memory state includes coherence keys (last-key) to guarantee
that the generated computation respects the coherence order.
When starting the computation, the automaton non-deterministically guesses
the memory valuations and coherence keys for all parts of the computation
(except the first one). Upon termination, the automaton checks that the parts
of the computation generated by each head fit together at the concatenation
points. This ensures the overall computation is valid for the program. The trick
is to remember the guess of the initial memory valuations and coherence keys
in immutable components of the automaton state (mem-value𝑔, last-key𝑔). The
final states require that the current memory state in part h of the computation
coincides with the guessed initial state in part h + 1.
State space A state from 𝑆𝑀 (except the special initial state 𝑠0𝑀 ) includes
the following information:
– ctrl-state(tid) gives the current control state of thread tid.
– reg-comp-head(tid, r) gives the part in which the last value assigned to register r
in thread tid gets computed.
– reg-value(tid, r) gives this computed value.
– reg-comm-head(tid, r) gives the part in which the last instruction assigning a
value to register r in thread tid gets committed.
– assume-comm-head(tid) gives the part in which the latest fetched condition
in thread tid is committed.
– mem-value(tid, a, h) gives the value of the last write to a propagated to thread
tid in the part h or earlier.
– last-key(tid, a, h) gives the coherence key of the last write to a propagated to
thread tid in the part h or earlier.
– mem-value𝑔, last-key𝑔 are immutable copies of the guessed values of the pre-
vious two components (see MH-GUESS below).
– early-mem-value(tid, a, h) gives the value written by the last fetched store to
a which is still in-flight in the part h of computation, ⊥ if there is no such
store, ⊤ if the value of the store is unknown or there is a later in-flight store
in this part with an unknown address.
– addr-comp-head(tid) gives the leftmost part of the computation, in which the
addresses of all already fetched memory accesses are computed.
– addr-comm-head(tid, a) gives the rightmost part of the computation having
a commit to address a by thread tid.
– instr-count(tid) gives the number of instructions fetched in thread tid.
The initial state 𝑠0𝑀 does not contain any information.
Transition relation We define transitions by specifying the new (primed) val-
ues of the state components and the label 𝜆 of the transition. First, we define
the transition guessing the initial memory state in each part of the computation:
MH-GUESS Assume the current state is 𝑠0𝑀 . Then, there are transitions
to the states satisfying ctrl-state′ := 𝜆tid.𝑞0tid, reg-comp-head′ := 𝜆tid.𝜆r.1,
reg-value′ := 𝜆tid.𝜆r.0, reg-comm-head′ := 𝜆tid.𝜆r.1, assume-comm-head′ :=
𝜆tid.1, early-mem-value′ := 𝜆tid.𝜆a.𝜆h.⊥, mem-value′ = mem-value′𝑔,
last-key′ = last-key′𝑔, addr-comp-head′ := 𝜆tid.1, addr-comm-head′ :=
𝜆tid.𝜆a.1, instr-count′ := 𝜆tid.0. Also, mem-value′(tid, a, 1) := 0,
last-key′(tid, a, 1) := 0 for all tid ∈ TID, a ∈ ADDR. Moreover,
last-key′(tid, a, h) ≤ last-key′(tid, a, h + 1) for h ∈ [1..𝑛 − 1], tid ∈ TID,
a ∈ ADDR (we assume last-key′(tid, a, 𝑛) := ∞). The label is 𝜆 := 𝜀.
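The constraints that MH-GUESS places on a guessed memory state can be checked as in the following minimal sketch. It assumes dictionaries indexed by (tid, a, h) with integer coherence keys, and it checks only the two explicit constraints (zero values in part 1, keys non-decreasing across parts); the function name `valid_guess` is hypothetical.

```python
def valid_guess(mem_value, last_key, tids, addrs, n):
    """Check the MH-GUESS constraints on guessed values and keys.

    Part 1 starts from the initial memory (value 0, key 0), and for every
    thread and address the guessed keys must not decrease from part to part.
    """
    for tid in tids:
        for a in addrs:
            if mem_value[(tid, a, 1)] != 0 or last_key[(tid, a, 1)] != 0:
                return False
            for h in range(1, n):
                if last_key[(tid, a, h)] > last_key[(tid, a, h + 1)]:
                    return False
    return True


# One thread, one address, three parts.
good_keys = {(1, "x", 1): 0, (1, "x", 2): 2, (1, "x", 3): 2}
bad_keys = {(1, "x", 1): 0, (1, "x", 2): 4, (1, "x", 3): 2}
values = {(1, "x", 1): 0, (1, "x", 2): 7, (1, "x", 3): 7}
```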
Fix a state 𝑠𝑀 . We overload eval(tid, 𝑒) to mean the value of expression 𝑒 for
the valuation of registers defined by 𝜆r.reg-value(tid, r).
Let HEAD := [1..𝑛]. Let tid ∈ TID, ctrl-state(tid) = 𝑞1, and instr = $q_1 \xrightarrow{\mathit{cmd}} q_2$ ∈
ℐtid. Let h1 := 1. Let h2 ∈ HEAD with h2 ≥ h1 and h2 ≥ reg-comp-head(tid, r) for each
register r read in cmd. Let h3 ∈ HEAD with h3 ≥ h2, h3 ≥ reg-comm-head(tid, r) for each
register r read in cmd, and h3 ≥ assume-comm-head(tid). Let 𝑖 := instr-count(tid) + 1
and instr-count′ := instr-count[tid ←˒ 𝑖]. Depending on the type of cmd, there are
the following transitions from 𝑠𝑀 labeled by events 𝜆:
MH-ASSIGN cmd = r ← 𝑒v. Let v := eval(tid, 𝑒v). Then reg-value′ :=
reg-value[(tid, r) ←˒ v], reg-comp-head′ := reg-comp-head[(tid, r) ←˒ h2],
reg-comm-head′ := reg-comm-head[(tid, r) ←˒ h3]. 𝜆 := (h1, fetch, tid, instr) ·
(h3, commit, tid, 𝑖).
MH-ASSUME cmd = assume(𝑒v). Let eval(tid, 𝑒v) ̸= 0. Then
assume-comm-head′ := assume-comm-head[tid ←˒ h3]. 𝜆 :=
(h1, fetch, tid, instr) · (h3, commit, tid, 𝑖).
MH-LOAD cmd = r ← mem[𝑒a]. Let a := eval(tid, 𝑒a). Let
h3 ≥ addr-comm-head(tid, a). If early-mem-value(tid, a, h2) = ⊥,
let v := mem-value(tid, a, h2) (load from memory case). Otherwise,
let v := early-mem-value(tid, a, h2) and assume v ≠ ⊤
(early read case). Then reg-value′ := reg-value[(tid, r) ←˒ v],
reg-comp-head′ := reg-comp-head[(tid, r) ←˒ h2],
reg-comm-head′ := reg-comm-head[(tid, r) ←˒ h3], addr-comp-head′ :=
addr-comp-head[tid ←˒ max{addr-comp-head(tid), h2}], addr-comm-head′ :=
addr-comm-head[(tid, a) ←˒ h3]. 𝜆 := (h1, fetch, tid, instr) · (h2, load, tid, 𝑖, a) ·
(h3, commit, tid, 𝑖).
MH-STORE cmd = mem[𝑒a] ← 𝑒v. Let a := eval(tid, 𝑒a). Assume h3 ≥
addr-comp-head(tid), h3 ≥ addr-comm-head(tid, a). Let v := eval(tid, 𝑒v).
Let k ∈ Q, k ≠ last-key(tid, a, h) for any tid ∈ TID, a ∈ ADDR,
h ∈ HEAD. Then early-mem-value′ := early-mem-value[(tid, a, [h1..h2 − 1]) ←˒ ⊤,
(tid, a, [h2..h3 − 1]) ←˒ v]. We also set early-mem-value′ :=
early-mem-value′[(tid, a′, h) ←˒ ⊤] for all a′ ∈ ADDR ∖ {a}, h ∈ [h1..h2 − 1]
with early-mem-value(tid, a′, h) ∈ DOM. We define addr-comp-head′ :=
addr-comp-head[tid ←˒ max{addr-comp-head(tid), h2}], addr-comm-head′ :=
addr-comm-head[(tid, a) ←˒ h3]. Let 𝑇 ⊆ TID ∖ {tid}, initially mem-value′ :=
mem-value, last-key′ := last-key, and 𝜆 := (h1, fetch, tid, instr) ·
(h3, commit, tid, 𝑖, k, a). For tid′ = tid and for each tid′ ∈ 𝑇 : let h ∈ HEAD, h ≥
h3 (h := h3 for tid′ = tid), last-key(tid′, a, h) < k ≤ last-key𝑔(tid′, a, h+1), then
mem-value′ := mem-value′[(tid′, a, h) ←˒ v], last-key′ := last-key′[(tid′, a, h) ←˒
k], 𝜆 := 𝜆 · (h, prop, tid′, tid, 𝑖, a).
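The choice of the loaded value in MH-LOAD distinguishes three situations for the early-mem-value entry of the chosen part h2. The following minimal sketch spells them out; it assumes dictionaries indexed by (tid, a, h), uses the sentinels `BOTTOM` and `TOP` for ⊥ and ⊤, and treats a missing entry as ⊥. The function name `load_value` is hypothetical.

```python
BOTTOM = object()  # stands for ⊥: no in-flight store to the address
TOP = object()     # stands for ⊤: in-flight store with unknown value or address


def load_value(early_mem_value, mem_value, tid, a, h2):
    """Return (case, value) for a load of address a by thread tid in part h2.

    Returns None when the transition is not enabled, i.e. when the entry is ⊤.
    """
    early = early_mem_value.get((tid, a, h2), BOTTOM)
    if early is BOTTOM:
        # Load from memory: read the last write propagated to this thread.
        return ("memory", mem_value[(tid, a, h2)])
    if early is TOP:
        return None
    # Early read: take the value of the in-flight store of the same thread.
    return ("early", early)


mem = {(1, "x", 2): 0}
from_memory = load_value({}, mem, 1, "x", 2)
early_read = load_value({(1, "x", 2): 42}, mem, 1, "x", 2)
blocked = load_value({(1, "x", 2): TOP}, mem, 1, "x", 2)
```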
For brevity we allowed a single transition to be labeled by several events. An
automaton with such transitions can be trivially translated to the canonical form
by breaking one such transition into several consecutive ones.
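The translation to the canonical form can be sketched as follows: a transition labeled by 𝑘 events is replaced by a chain of 𝑘 single-event transitions through fresh intermediate states. The representation of transitions as triples and the naming of the fresh states as ("mid", number) are assumptions made for this illustration; the fresh names are assumed not to clash with existing state names.

```python
def split_transitions(transitions):
    """Split multi-event transitions into consecutive single-event ones.

    `transitions` is a list of (source, labels, target), where `labels` is a
    list of events. The result is a list of (source, label, target) triples;
    a transition without events is kept with the label None (an ε-move).
    """
    result = []
    fresh = 0
    for source, labels, target in transitions:
        if not labels:
            result.append((source, None, target))
            continue
        current = source
        for k, label in enumerate(labels):
            if k == len(labels) - 1:
                nxt = target
            else:
                nxt = ("mid", fresh)  # fresh intermediate state
                fresh += 1
            result.append((current, label, nxt))
            current = nxt
    return result


# A load-like transition emitting a fetch, a load, and a commit event.
chain = split_transitions([("s", ["fetch", "load", "commit"], "t")])
```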
Final states The set of final states 𝐹𝑀 is a subset of 𝑆𝑀 ∖ {𝑠0𝑀} consisting of
all states with mem-value(tid, a, h) = mem-value𝑔(tid, a, h+1), last-key(tid, a, h) =
last-key𝑔(tid, a, h + 1) for all tid ∈ TID, a ∈ ADDR, h ∈ [1..𝑛− 1].
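The acceptance condition checks that the parts fit together at the concatenation points. A minimal sketch, assuming the four components are dictionaries indexed by (tid, a, h); the function name `is_final` is hypothetical.

```python
def is_final(mem_value, last_key, mem_value_g, last_key_g, tids, addrs, n):
    """Check that the memory state reached at the end of each part h equals
    the state guessed for the beginning of part h + 1, for h in [1..n-1]."""
    return all(
        mem_value[(t, a, h)] == mem_value_g[(t, a, h + 1)]
        and last_key[(t, a, h)] == last_key_g[(t, a, h + 1)]
        for t in tids
        for a in addrs
        for h in range(1, n)
    )


# Two parts: part 1 ends with value 7 and key 3, which part 2 was guessed to start from.
cur_val = {(1, "x", 1): 7, (1, "x", 2): 9}
cur_key = {(1, "x", 1): 3, (1, "x", 2): 4}
guess_val = {(1, "x", 1): 0, (1, "x", 2): 7}
guess_key = {(1, "x", 1): 0, (1, "x", 2): 3}
accepted = is_final(cur_val, cur_key, guess_val, guess_key, [1], ["x"], 2)
```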
Soundness and completeness
Lemma 16. ℒ(𝑀) ⊆ Cpower(𝒫).
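Proof. Consider 𝜎 = 𝜆1 · · ·𝜆𝑚 such that $s_{0M} \xrightarrow{\lambda_1} s_{M1} \xrightarrow{\lambda_2} \cdots \xrightarrow{\lambda_m} s_{Mm} \in F_M$.
For h ∈ HEAD and 𝑠 ∈ [0..𝑚], let $\tau_h^s := \mathit{second}((\lambda_1 \cdots \lambda_s){\downarrow}(\{h\} \times \mathrm{E}))$.
Let $(s_{Z,1}^{0}, \ldots, s_{Z,n}^{0}) \in (S_Z)^n$ be the states of 𝑍 defined so that SND-B holds for
𝑠 = 0 (see below). By induction on 𝑠 ∈ [1..𝑚] we show: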
SND-A $s_{Z,h}^{0} \xrightarrow{\tau_h^s} s_{Z,h}^{s}$.
SND-B For all tid ∈ TID, h ∈ HEAD, $s_{Z,h}^{s}$ = (ts, (co, prop)), ts(tid) =
(fetched, committed, loaded), the following holds:
SND-B1 fetched is the list of instructions fetched by
$(\tau_1^m \cdots \tau_{h-1}^m \cdot \tau_h^s){\downarrow}\mathrm{fetch}{\downarrow}\mathit{tid}$.
SND-B2 committed consists of the indices of instructions committed by
$(\tau_1^m \cdots \tau_{h-1}^m \cdot \tau_h^s){\downarrow}\mathrm{commit}{\downarrow}\mathit{tid}$.
SND-B3 loaded contains the information about the stores being read by
loads in $\tau_1^m \cdots \tau_{h-1}^m \cdot \tau_h^s$, determined according to Lemmas 4 and 5.
SND-B4 co(tid, 𝑖) = k if (commit, tid, 𝑖, k, a) ∈ $\tau_1^m \cdots \tau_{h-1}^m \cdot \tau_h^s$ for some
a ∈ ADDR, otherwise, co(tid, 𝑖) = ⊥.
SND-B5 prop(tid, a) = (tid′, 𝑖′) if (prop, tid, tid′, 𝑖′, a) is the last event of
$(\tau_1^m \cdots \tau_{h-1}^m \cdot \tau_h^s){\downarrow}(\mathrm{prop}, \mathit{tid}, *, *, a)$, otherwise, prop(tid, a) = inita.
SND-C For each tid ∈ TID: ctrl-state(tid) = dst(last($s_{Z,1}^{s}$.ts(tid).fetched)) (or
𝑞0tid if no instructions were fetched).
SND-D For each tid ∈ TID, r ∈ REG, for each h ∈ [reg-comp-head(tid, r)..𝑛]:
reg-value(tid, r) = eval(tid, instr-count(tid) + 1, r) computed for the state $s_{Z,h}^{s}$.
SND-E For each tid ∈ TID, r ∈ REG, h ∈ [reg-comm-head(tid, r)..𝑛]: let 𝑖 be
the index of the latest instruction in $s_{Z,h}^{s}$.ts(tid).fetched writing to r, then
𝑖 ∈ $s_{Z,h}^{s}$.ts(tid).committed.
SND-F For each tid ∈ TID, h ∈ [assume-comm-head(tid)..𝑛]: $s_{Z,h}^{s}$ does not
contain uncommitted conditional instructions in thread tid having indices
≤ instr-count(tid).
SND-G For each tid ∈ TID, a ∈ ADDR, h ∈ HEAD: let 𝑤 := $s_{Z,h}^{s}$.prop(tid, a).
If 𝑤 = inita, mem-value(tid, a, h) = 0. If 𝑤 = (tid′, 𝑖′), mem-value(tid, a, h) =
val(tid′, 𝑖′) computed in $s_{Z,h}^{s}$.
SND-H For each tid ∈ TID, a ∈ ADDR, h ∈ HEAD: last-key𝑔(tid, a, h) ≤
$s_{Z,h}^{s}$.co($s_{Z,h}^{s}$.prop(tid, a)) = last-key(tid, a, h) ≤ last-key𝑔(tid, a, h + 1).
SND-K For each tid ∈ TID, a ∈ ADDR, h ∈ HEAD: let 𝑖 ∈ N be the
maximal index such that $s_{Z,h}^{s}$.ts(tid).fetched[𝑖] is a store and addr(tid, 𝑖) = a
in $s_{Z,n}^{s}$. Let 𝑖′ be the maximal index such that $s_{Z,h}^{s}$.ts(tid).fetched[𝑖′] is
a store and addr(tid, 𝑖′) ∈ {⊥, a} in $s_{Z,h}^{s}$. Then early-mem-value(tid, a, h) =
⊥ if such 𝑖 does not exist or 𝑖 ∈ $s_{Z,h}^{s}$.ts(tid).committed. Otherwise,
early-mem-value(tid, a, h) = ⊤ if addr(tid, 𝑖′) = ⊥ or val(tid, 𝑖) = ⊥ in $s_{Z,h}^{s}$.
Otherwise, early-mem-value(tid, a, h) = val(tid, 𝑖) computed in $s_{Z,h}^{s}$.
SND-L For each tid ∈ TID, h ∈ [addr-comp-head(tid)..𝑛], 𝑖 ∈
[1..|$s_{Z,h}^{s}$.ts(tid).fetched|]: addr(tid, 𝑖) ≠ ⊥ in $s_{Z,h}^{s}$.
SND-M For each tid ∈ TID, a ∈ ADDR, h ∈ [addr-comm-head(tid, a)..𝑛]: if
addr(tid, 𝑖) = a in $s_{Z,n}^{s}$ for some 𝑖, then 𝑖 ∈ $s_{Z,h}^{s}$.ts(tid).committed.
Finally we will show that $s_{Z,h}^{m} = s_{Z,h+1}^{0}$ for all h ∈ [1..𝑛− 1] and $s_{Z,n}^{m}$ ∈ 𝐹𝑍 ,
thus proving the claim of the lemma.
Base case: 𝑠 = 1. We must show that 𝑠𝑀 1 satisfies the inductive statement.
This is easy to check from the definition of the destination state of the MH-GUESS
transition.
Step case: assume the inductive statement holds for some 𝑠 ∈ [0..𝑚 − 1].
Consider 𝜆𝑠 (for notational convenience and without loss of generality we assume
below that h𝑗 ̸= h𝑗′ for 𝑗 ̸= 𝑗′):
Assignment 𝜆𝑠 = (h1, fetch, tid, instr) · (h3, commit, tid, 𝑖), instr = $q_1 \xrightarrow{r \leftarrow e_v} q_2$.
Let e1 := (fetch, tid, instr), e3 := (commit, tid, 𝑖).
We need to show that $s_{Z,h_1}^{s-1} \xrightarrow{e_1} s_{Z,h_1}^{s}$, i.e. that the assignment instruction
can be fetched. This follows from the choice of h1 := 1 in MH-ASSIGN and
SND-B1, SND-C.
We also need to show that $s_{Z,h_3}^{s-1} \xrightarrow{e_3} s_{Z,h_3}^{s}$, i.e. that the assignment instruction
can be committed. First, the e3 transition requires the instruction being
committed to be fetched, which holds due to SND-B1 and h3 ≥ h1. Second,
this instruction must be not committed yet, which holds by SND-B2 and
the fact that 𝑀 commits each instruction once and only once. Third, all
control dependencies must be committed. This is by the choice of h3 in MH-
ASSIGN and SND-F. Fourth, all the preceding data dependencies must be
committed. This is by the choice of h3 in MH-ASSIGN and SND-E. Finally,
the argument of the function must be computed. This is by choice of h3 ≥ h2
in MH-ASSIGN, Lemma 3, and SND-D.
In the end, we must show that the invariants hold in the new state. The
only non-trivial thing is SND-D, which holds due to SND-D, definition of v
in MH-ASSIGN, definitions of eval, and the fact that functions in FUN are
deterministic.
Assume 𝜆𝑠 = (h1, fetch, tid, $q_1 \xrightarrow{\mathit{instr}} q_2$) · (h3, commit, tid, 𝑖), instr = assume(𝑒v).
The proof is similar to the previous case. The commit transition additionally
requires eval(tid, 𝑖, 𝑒v) ≠ 0, which holds because the analogous check
in MH-ASSUME holds, together with SND-D, the definitions of eval, and the fact that functions
in FUN are deterministic.
Load 𝜆𝑠 = (h1, fetch, tid, instr) · (h2, load, tid, 𝑖, a) · (h3, commit, tid, 𝑖), instr =
r ← mem[𝑒a]. Let e1 := (fetch, tid, instr), e2 := (load, tid, 𝑖, a), e3 :=
(commit, tid, 𝑖).
$s_{Z,h_1}^{s-1} \xrightarrow{e_1} s_{Z,h_1}^{s}$ holds for the same reasons as before.
Next, we show that $s_{Z,h_2}^{s-1} \xrightarrow{e_2} s_{Z,h_2}^{s}$, where this transition is a POW-EARLY
transition in the early read case of MH-LOAD and a POW-LOAD
transition in the load from memory case. First, we must show that
$s_{Z,h}^{s}$.ts(tid).loaded[𝑖] = ⊥. This holds by SND-B3 and the fact that 𝑀 generates
a load event once and only once for a single fetched load instruction.
Assume the early read case. This means, early-mem-value(tid, a, h2) ∈ DOM.
By SND-K, this means, the last fetched store with an unknown address or
address of the load is not yet committed, has the address of the load and
has the value known. By POW-EARLY, the load can take the value from
this store, and SND-B3 holds in the new state.
Consider the load from memory case. This means,
early-mem-value(tid, a, h2) = ⊥. By SND-K, this means, there is no
earlier fetched store with the same address which is not yet committed. By
POW-LOAD, the load can take the value from the last propagated store,
and SND-B3 holds in the new state.
The argument for $s_{Z,h_3}^{s-1} \xrightarrow{e_3} s_{Z,h_3}^{s}$ is similar to the previous cases. Additionally,
first we must show that $s_{Z,h}^{s}$.ts(tid).loaded[𝑖] ≠ ⊥. This is by h3 ≥ h2
(MH-LOAD), SND-B3. Second, we must ensure that all preceding instructions
accessing the same address a are committed, and there are no previously
fetched instructions with unknown address. This holds by choice of h3
in MH-LOAD, SND-L, and SND-M.
In the new state, SND-D holds by definition of v in POW-LOAD, definitions
of eval, SND-G, and SND-K. Proofs for the other conditions are simpler.
Store 𝜆𝑠 = (h1, fetch, tid, instr) · (h3, commit, tid, 𝑖, k, a) · (h3, prop, tid, tid, 𝑖, a) ·
(h4, prop, tid1, tid, 𝑖, a) · · · (h𝑢+3, prop, tid𝑢, tid, 𝑖, a). Let e1 := (fetch, tid, instr),
e3 := (commit, tid, 𝑖, k, a), e4 := (prop, tid, tid, 𝑖, a), e𝑗+3 := (prop, tid𝑗 , tid, 𝑖, a)
for 𝑗 ∈ [1..𝑢].
$s_{Z,h_1}^{s-1} \xrightarrow{e_1} s_{Z,h_1}^{s}$ holds for the same reasons as before.
$s_{Z,h_3}^{s-1} \xrightarrow{e_3} s_{Z,h_3}^{s}$ holds for the same reasons as in the case of a load. The
requirement that the coherence key is unique in POW-STORE follows from a
similar requirement in MH-STORE and SND-H. By POW-STORE, the only
available transition from $s_{Z,h_2}^{s}$ is a propagation of the write to its own thread, i.e.
e4, which indeed follows e3 in 𝜏 . Next, we show that e4 and further propagate
transitions are feasible.
First, the POW-PROP rule requires the write being propagated to have a coherence
key (i.e. to be committed), which holds by choice of h𝑗 , 𝑗 ∈ [3..𝑢+3]
in MH-STORE and SND-B2. Second, it requires the coherence key of the
latest propagated store to be less than the key of the store being propagated.
This is ensured by the check last-key(tid′, a, h) < k and SND-H.
It is easy to see that the inductive statements hold in the new state as well.
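The key comparison that MH-STORE performs when it picks the part for a prop event can be sketched as follows. This is an illustration under assumptions: coherence keys are integers or `Fraction` values, the dictionaries are indexed by (tid, a, h), and a missing guess for part 𝑛 + 1 is treated as ∞. The function name `admissible_heads` is hypothetical.

```python
import math
from fractions import Fraction


def admissible_heads(k, last_key, last_key_g, tid2, a, h3, n):
    """Parts h >= h3 in which a store with key k may be propagated to tid2.

    The part must satisfy last-key(tid2, a, h) < k <= last-key_g(tid2, a, h+1):
    the key exceeds the latest key propagated in part h so far, and it does
    not exceed the key guessed for the beginning of the next part.
    """
    heads = []
    for h in range(h3, n + 1):
        upper = last_key_g.get((tid2, a, h + 1), math.inf)
        if last_key[(tid2, a, h)] < k <= upper:
            heads.append(h)
    return heads


# Three parts; nothing has been propagated yet in parts 2 and 3,
# so their current keys equal the guessed initial keys 2 and 5.
last_key = {(2, "x", 1): 0, (2, "x", 2): 2, (2, "x", 3): 5}
last_key_g = {(2, "x", 1): 0, (2, "x", 2): 2, (2, "x", 3): 5}
heads = admissible_heads(Fraction(3), last_key, last_key_g, 2, "x", 1, 3)
```

In this example only part 2 qualifies: key 3 is too large for part 1, whose successor was guessed to start from key 2, and too small for part 3, which already starts from key 5.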
Now we prove $s_{Z,h}^{m} = s_{Z,h+1}^{0}$ for all h ∈ [1..𝑛− 1]. The equality of the ts components
immediately follows from the SND-B inductive statement.
Now we prove $s_{Z,n}^{m}$ ∈ 𝐹𝑍 . FIN-COMM holds, because 𝑀 always emits a
commit event for each fetched instruction.
Let us turn to FIN-LD property. First, one should note that 𝑀 generates prop
events for stores to the same address in each part 𝜏𝑗 in the ascending order by
k. This is by MH-STORE. Together with SND-H, this means that these events
are sorted in 𝜏 in the ascending order by k. The rest of the proof of FIN-LD is
a simple case consideration: whether the loads 𝑖, 𝑖′ were done from memory or
from a local store early.
FIN-LD-ST is proven by a similar case consideration.
⊓⊔
We call 𝛼 a prefix of 𝜎 and write 𝛼 ⊑ 𝜎 if 𝜎 = 𝛼 · 𝛽 for some 𝛽.
Lemma 17. {𝜏 ∈ Cpower(𝒫) | 𝜏 is in normal form of degree 𝑛} ⊆ ℒ(𝑀).
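Proof. Let 𝜏 = 𝜏1 · · · 𝜏𝑛 ∈ Cpower(𝒫) be a normal-form computation, i.e. $s_{0Z} \xrightarrow{\tau} s_Z \in F_Z$.
We show that there is a sequence of transitions
$s_{0M} \xrightarrow{\lambda_1} s_{M1} \xrightarrow{\lambda_2} \cdots \xrightarrow{\lambda_m} s_{Mm} \in F_M$
such that $\tau_h = \mathit{second}((\lambda_1 \cdots \lambda_m){\downarrow}(\{h\} \times \mathrm{E}))$.
Let $\tau_h^s := \mathit{second}((\lambda_1 \cdots \lambda_s){\downarrow}(\{h\} \times \mathrm{E}))$ and
$s_{0Z} \xrightarrow{\tau_1 \cdots \tau_{h-1} \cdot \tau_h^s} s_{Z,h}^{s} \rightarrow^{*} s_Z$. By
induction on 𝑠 ∈ [1,∞) we show the following inductive statements: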
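CMPL-A There is a sequence of 𝑠 transitions: $s_{0M} \xrightarrow{\lambda_1} s_{M1} \xrightarrow{\lambda_2} \cdots \xrightarrow{\lambda_s} s_{Ms}$.
CMPL-B For all h ∈ HEAD: $\tau_h = \tau_h^s \cdot \bar{\tau}_h^s$ for some suffix $\bar{\tau}_h^s$, the part of $\tau_h$ that is still to be generated.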
CMPL-C If e1, e2 ∈ 𝜏 are two events related to instruction (tid, 𝑖), then e1 ∈ $\tau_h^s$
for some h iff e2 ∈ $\tau_{h'}^s$ for some h′.
CMPL-D For each tid ∈ TID: ctrl-state(tid) = dst(last($s_{Z,1}^{s}$.ts(tid).fetched)) (or
ctrl-state(tid) = 𝑞0tid if no instructions were fetched).
CMPL-F For each tid ∈ TID, r ∈ REG, h ∈ [reg-comp-head(tid, r)..𝑛]:
reg-value(tid, r) = eval(tid, instr-count(tid) + 1, r) computed for the state $s_{Z,h}^{s}$.
CMPL-F’ For each tid ∈ TID, r ∈ REG, h ∈ [1..reg-comp-head(tid, r) − 1]:
eval(tid, instr-count(tid) + 1, r) = ⊥.
CMPL-G For each tid ∈ TID, r ∈ REG, h ∈ [reg-comm-head(tid, r)..𝑛]: let 𝑖
be the index of the last instruction in $s_{Z,h}^{s}$.ts(tid).fetched writing to r, then
𝑖 ∈ $s_{Z,h}^{s}$.ts(tid).committed.
CMPL-G’ For each tid ∈ TID, r ∈ REG, h ∈ [1..reg-comm-head(tid, r) − 1]:
let 𝑖 be the index of the last instruction in $s_{Z,h}^{s}$.ts(tid).fetched writing to r, then
𝑖 ∉ $s_{Z,h}^{s}$.ts(tid).committed.
CMPL-K For each tid ∈ TID, h ∈ [assume-comm-head(tid)..𝑛]: let 𝑖 be
an index of an assume() instruction in $s_{Z,h}^{s}$.ts(tid).fetched, then
𝑖 ∈ $s_{Z,h}^{s}$.ts(tid).committed.
CMPL-K’ For each tid ∈ TID, h ∈ [1..assume-comm-head(tid) − 1]: let 𝑖 be
an index of the last assume() instruction in $s_{Z,h}^{s}$.ts(tid).fetched, then
𝑖 ∉ $s_{Z,h}^{s}$.ts(tid).committed.
CMPL-L For each tid ∈ TID, a ∈ ADDR, h ∈ HEAD: let 𝑤 := $s_{Z,h}^{s}$.prop(tid, a).
If 𝑤 = inita, mem-value(tid, a, h) = 0. If 𝑤 = (tid′, 𝑖′), mem-value(tid, a, h) =
val(tid′, 𝑖′) computed in $s_{Z,h}^{s}$.
CMPL-M For each tid ∈ TID, a ∈ ADDR, h ∈ HEAD: last-key𝑔(tid, a, h) <
$s_{Z,h}^{s}$.co($s_{Z,h}^{s}$.prop(tid, a)) = last-key(tid, a, h) ≤ last-key𝑔(tid, a, h + 1).
CMPL-N For each tid ∈ TID, a ∈ ADDR, h ∈ HEAD: let 𝑖 ∈ N be the
maximal index such that $s_{Z,h}^{s}$.ts(tid).fetched[𝑖] is a store and addr(tid, 𝑖) = a
in $s_{Z,n}^{s}$. Let 𝑖′ be the maximal index such that $s_{Z,h}^{s}$.ts(tid).fetched[𝑖′] is
a store and addr(tid, 𝑖′) ∈ {⊥, a} in $s_{Z,h}^{s}$. Then early-mem-value(tid, a, h) =
⊥ if such 𝑖 does not exist or 𝑖 ∈ $s_{Z,h}^{s}$.ts(tid).committed. Otherwise,
early-mem-value(tid, a, h) = ⊤ if addr(tid, 𝑖′) = ⊥ or val(tid, 𝑖) = ⊥ in $s_{Z,h}^{s}$.
Otherwise, early-mem-value(tid, a, h) = val(tid, 𝑖) computed in $s_{Z,h}^{s}$.
CMPL-P For each tid ∈ TID, a ∈ ADDR, h ∈ [addr-comm-head(tid, a)..𝑛]: if
addr(tid, 𝑖) = a in $s_{Z,n}^{s}$ for some 𝑖, then 𝑖 ∈ $s_{Z,h}^{s}$.ts(tid).committed.
CMPL-P’ For each tid ∈ TID, a ∈ ADDR, h ∈ [1..addr-comm-head(tid, a) − 1]:
there is 𝑖 with addr(tid, 𝑖) = a in $s_{Z,n}^{s}$, such that 𝑖 ∉ $s_{Z,h}^{s}$.ts(tid).committed.
CMPL-R For each tid ∈ TID: instr-count(tid) = |$s_{Z,1}^{s}$.ts(tid).fetched|.
Base case: 𝑠 = 1. We choose the first (MH-GUESS) transition $s_{0M} \xrightarrow{\lambda_1} s_{M1}$
so that the inductive statements hold:
Guess We define mem-value and last-key components of 𝑠𝑀 1 according to
CMPL-L and CMPL-M requirements. The other inductive statements triv-
ially hold.
Assume the inductive statements hold for 𝑠 and the suffix $\bar{\tau}_h^s$ of 𝜏h that is still to be
generated is non-empty for some h ∈ HEAD.
We show they hold for 𝑠′ := 𝑠+ 1. The proof is done by pointing out an appropriate
transition $s_{Ms} \xrightarrow{\lambda_{s+1}} s_{M,s+1}$. We choose the first possible option out of
the following:
Assignment Assume e1 ⊑ $\bar{\tau}_{h_1}^{s}$ and e3 ⊑ $\bar{\tau}_{h_3}^{s}$, where $\bar{\tau}_{h}^{s}$ is the suffix of 𝜏h that is
still to be generated and h1 < h3 (h1 = h3 is possible, but
here and further we write strict inequalities for notational convenience), e1 :=
(fetch, tid, $q_1 \xrightarrow{\mathit{cmd}} q_2$), e3 := (commit, tid, 𝑖), h1 = 1, 𝑖 = instr-count(tid),
cmd = r← 𝑒v. Then, as we show next, an MH-ASSIGN transition is feasible.
First, $s_{Z,h_1}^{s} \xrightarrow{e_1}$, therefore, the state of the last fetched instruction in thread
tid in $s_{Z,h_1}^{s}$ is 𝑞1. By CMPL-D, ctrl-state(tid) = 𝑞1 too.
Second, we choose h2 := max{reg-comp-head(tid, r) | r is read in cmd}. It
satisfies the requirements from MH-ASSIGN. Note that h2 ≤ h3 by CMPL-F’
and POW-COMMIT: an instruction cannot be committed until its arguments
are computed.
Third, we must show that h3 ≥ reg-comm-head(tid, r) holds for each register r read
by the instruction, and that h3 ≥ assume-comm-head(tid). This holds by CMPL-G’, CMPL-K’, and
POW-COMMIT: an instruction cannot be committed until its data and
control dependencies are committed.
In the destination state, CMPL-F holds by CMPL-F in the source state,
definition of reg-value′ in MH-ASSIGN and definitions of eval. The other
inductive statements trivially hold.
Assume Assume e1 ⊑ $\bar{\tau}_{h_1}^{s}$ and e3 ⊑ $\bar{\tau}_{h_3}^{s}$, where $\bar{\tau}_{h}^{s}$ is the suffix of 𝜏h that is
still to be generated, h1 < h3, e1 = (fetch, tid, $q_1 \xrightarrow{\mathit{cmd}} q_2$),
e3 = (commit, tid, 𝑖), h1 = 1, 𝑖 = instr-count(tid),
cmd = assume(𝑒v). Then, an MH-ASSUME transition is feasible.
The proof is similar to the proof for the case of assignment. The MH-
ASSUME transition additionally requires eval(tid, 𝑒v) ̸= 0. This holds by
CMPL-F, definition of reg-value′ in MH-ASSIGN and definitions of eval.
The inductive statements trivially hold in the destination state.
Load Assume e1 ⊑ $\bar{\tau}_{h_1}^{s}$, e2 ⊑ $\bar{\tau}_{h_2}^{s}$, and e3 ⊑ $\bar{\tau}_{h_3}^{s}$, where $\bar{\tau}_{h}^{s}$ is the suffix
of 𝜏h that is still to be generated, h1 < h2 < h3,
e1 = (fetch, tid, $q_1 \xrightarrow{\mathit{cmd}} q_2$), e2 = (load, tid, 𝑖, a), e3 = (commit, tid, 𝑖),
𝑖 = instr-count(tid), cmd = r← mem[𝑒a]. We show that an MH-LOAD transition
is feasible. We point out only differences with respect to the proof for
the assignment case.
Assume e2 was produced by a POW-EARLY transition. This means that the last
store writing to a has its address known and is not yet committed in
$s_{Z^s_{h_2}}$. Then, by CMPL-N, early-mem-value(tid, a, h2) ∈ DOM,
and we have v := early-mem-value(tid, a, h2). Assume e2 was produced by
a POW-LOAD transition. Then a POW-EARLY transition was not possible
(Lemma 4, Lemma 5). This means that there were no in-flight stores to a
in $s_{Z^s_{h_2}}$. Then, by CMPL-N, early-mem-value(tid, a, h2) = ⊥, and we have
v := mem-value(tid, a, h2). In both cases, by CMPL-N and CMPL-L, we have
reg-value′ and reg-comp-head′ satisfying CMPL-F and CMPL-F’.
Additionally, we must show that h3 ≥ addr-comm-head(tid, a). This holds by
CMPL-P’ and CMPL-N.
Store Assume 𝑢 ∈ N, e𝑗 ⊑ 𝜏𝑠h𝑗 for 𝑗 ∈ [1..𝑢 + 3], where h2 = h3,
e1 = (fetch, tid, $q_1 \xrightarrow{\mathrm{cmd}} q_2$), e2 = (commit, tid, 𝑖, k, a),
e3 = (prop, tid, tid, 𝑖, a), e𝑗 = (prop, tid𝑗 , tid, 𝑖, a) for 𝑗 ∈ [4..𝑢 + 3],
𝑖 = instr-count(tid), cmd = mem[𝑒a] ← 𝑒v. Assume that there are no other prop
events for (tid, 𝑖) in 𝜏 , except for e3 . . . e𝑢+3. We show that an MH-STORE
transition is feasible.
The requirements to be checked are similar to those in the load case. The
requirement that k is not already used holds by CMPL-M and the fact that
the same requirement in POW-STORE is met.
Consider the requirements in MH-STORE for generating prop events. The
requirement that the propagation event to thread tid is generated in the
same part as the commit is met by the assumption h3 = h2. The requirement
last-key(tid′, a, h) < k ≤ last-key𝑔(tid′, a, h + 1) is met by CMPL-L, choice
of last-key𝑔 in the initial transition, and POW-PROP.
This means that the inductive invariant CMPL-A holds for 𝑠 + 1. Also, CMPL-B
holds by the choice of e1 . . . e𝑢+3, and CMPL-D holds trivially. CMPL-C holds by
the assumption that there are no other prop events in 𝜏 , except for e3 . . . e𝑢+3.
CMPL-F, CMPL-F’, CMPL-G, CMPL-G’ hold as a store instruction does
not affect register values. CMPL-K, CMPL-K’ hold as a store instruction
is not assume(). CMPL-L holds by the definition of mem-value′ in MH-STORE.
CMPL-M holds by the definition of last-key′ in MH-STORE. CMPL-N holds by
the definition of early-mem-value′ in MH-STORE. CMPL-P, CMPL-P’ hold by
the definition of addr-comm-head′ in MH-STORE. CMPL-R holds by the definition
of instr-count′ in MH-STORE.
Now we must show that one of the cases above always takes place. Consider
the event e = first(𝜏𝑠1 ). By CMPL-C and the fact that 𝜏 ∈ Cpower(𝒫), it is a
fetch event (fetch, tid, 𝑖, instr). Choose the case based on the kind of instr. By
NF-A and NF-B, all events related to the instruction (tid, 𝑖) constitute prefixes
of 𝜏𝑠h , h ∈ HEAD. The requirement 𝑖 = instr-count(tid) holds by CMPL-R. The
requirements like h1 ≤ h2 ≤ h3 in the load case naturally follow from the fact
that 𝜏 ∈ Cpower(𝒫).
Assume 𝜏𝑠h = 𝜏h for all h ∈ HEAD. Then 𝜏𝑠h ∈ 𝐹𝑀 by choice of mem-value𝑔
and last-key𝑔 in 𝑠𝑀 1 and CMPL-L, CMPL-M.
⊓⊔
Lemma 18. {𝜏 ∈ Cpower(𝒫) | 𝜏 is in normal form of degree 𝑛} ⊆ ℒ(𝑀(𝒫)) ⊆
Cpower(𝒫).
Proof. Corollary of Lemmas 16 and 17. ⊓⊔
5.2 Checking Cyclicity of the Happens-Before Relation
We call a happens-before cycle beautiful if it has the following form:
(tid1, 𝑖1, instr1)→𝑝𝑜*(tid1, 𝑖′1, instr′1)→ℎ𝑜𝑝 . . .
→ℎ𝑜𝑝(tid𝑛, 𝑖𝑛, instr𝑛)→𝑝𝑜*(tid𝑛, 𝑖′𝑛, instr′𝑛)→ℎ𝑜𝑝(tid1, 𝑖1, instr1).
Here, →ℎ𝑜𝑝 := (→𝑐𝑜 ∪ →𝑠𝑟𝑐 ∪ →𝑐𝑓 ) and tid𝑘 ≠ tid𝑙 for 𝑘 ≠ 𝑙. We call 𝜃 :=
tid1 . . . tid𝑛 the profile of the cycle.
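The shape of a beautiful cycle can be illustrated by a small check. The following Python sketch is not part of the construction; it assumes a hypothetical encoding in which a cycle is a list of nodes (tid, index) together with one label per edge, listed so that the last edge is a hop back to the first node. It returns the profile if the cycle is beautiful and None otherwise.

```python
HOP = {"co", "src", "cf"}  # the three relations that make up ->hop


def beautiful_profile(nodes, labels):
    """Return the profile of a beautiful cycle, or None if it is not beautiful.

    Assumed encoding (hypothetical, for illustration only):
    nodes[k] = (tid, index), and labels[k] labels the edge from nodes[k]
    to nodes[(k + 1) % n]. The cycle is listed starting where a thread is
    entered, so the closing edge labels[-1] must be a hop.
    """
    n = len(nodes)
    if n == 0 or len(labels) != n or labels[-1] not in HOP:
        return None
    profile = [nodes[0][0]]
    for k in range(n):
        (tid, idx), (next_tid, next_idx) = nodes[k], nodes[(k + 1) % n]
        if labels[k] == "po":
            # Program order stays inside one thread and moves forward.
            if tid != next_tid or idx >= next_idx:
                return None
        elif labels[k] in HOP:
            # Every hop except the closing one enters the next profile thread.
            if k + 1 < n:
                profile.append(next_tid)
        else:
            return None
    # Each thread may be visited only once: tid_k != tid_l for k != l.
    return profile if len(set(profile)) == len(profile) else None
```

For instance, a two-thread cycle with edges po, cf, po, cf yields the profile [1, 2], while a cycle that alternates between two threads twice is rejected.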
Example 4. The happens-before cycle shown in Figure 2 is beautiful.
Lemma 19 ([8]). A computation 𝜏 ∈ Cpower(𝒫) has a happens-before cycle iff
it has a beautiful happens-before cycle.
Given a cycle profile 𝜃, we define the automaton 𝑀 ′(𝒫, 𝜃) as a modification
of 𝑀(𝒫) that marks one event in each thread tid𝑗 ∈ 𝜃 by enter (identifying
(tid𝑗 , 𝑖𝑗 , *)) and a later (or the same) event by leave (identifying (tid𝑗 , 𝑖′𝑗 , *),
𝑖𝑗 ≤ 𝑖′𝑗). Note that 𝑀(𝒫) generates the events in program order, which en-
sures (tid𝑗 , 𝑖𝑗 , *)→𝑝𝑜*(tid𝑗 , 𝑖′𝑗 , *). Technically, 𝑀 ′(𝒫, 𝜃) introduces the following
changes:
– The alphabet is E′ := E× 2{enter,leave}.
– The automaton generates only load and prop events, as only they are relevant
for cycle detection.
– The prop events include the k component of the corresponding commit event.
To check $(\mathit{tid}_j, i'_j, *) \rightarrow_{hop} (\mathit{tid}_{j+1}, i_{j+1}, *)$, we use an
intersection with a regular language $H^{\mathit{tid}_j,\mathit{tid}_{j+1}}$. The language
$H^{\mathit{tid}_1,\mathit{tid}_2}$ includes a computation 𝜏 iff one or more of the following
conditions hold:
H-ST (e1,𝑚1), (e2,𝑚2) ∈ 𝜏 , leave ∈ 𝑚1, enter ∈ 𝑚2, e1 = (prop, tid1, tid1, k1, a),
e2 = (prop, tid2, tid2, k2, a), and k1 < k2.
H-SRC 𝜏 = 𝜏1 · (e1,𝑚1) · 𝜏2 · (e2,𝑚2) · 𝜏3, leave ∈ 𝑚1, enter ∈ 𝑚2,
e1 = (prop, tid2, tid1, a), e2 = (load, tid2, a), 𝜏2 does not contain events
(prop, tid2, *, a).
H-CF1 𝜏 = 𝜏1 · (e3,𝑚3) · 𝜏2 · (e2,𝑚2) · 𝜏3, leave ∈ 𝑚2, e3 = (prop, tid1, tid3, k3, a),
e2 = (load, tid1, a), 𝜏2 does not contain events (prop, tid1, *, *, a), (e1,𝑚1) ∈
𝜏1 · 𝜏2 · 𝜏3, enter ∈ 𝑚1, e1 = (prop, tid2, tid2, k2, a), k3 < k2.
H-CF2 (e1,𝑚1), (e2,𝑚2) ∈ 𝜏 , enter ∈ 𝑚1, leave ∈ 𝑚2, e1 = (load, tid1, a),
e2 = (prop, tid2, tid2, k2, a) and there is no (e3,𝑚3) ∈ 𝜏 with e3 =
(prop, tid3, tid3, k3, a) with k3 < k2.
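To make two of these conditions concrete, the following Python sketch checks H-ST and H-SRC on a finite marked computation. It is only an illustration under assumptions that are not taken from the construction: a computation is a list of pairs (event, marks), prop events are tuples ("prop", destination thread, source thread, k, a), and load events are tuples ("load", tid, a).

```python
def h_st(tau, tid1, tid2):
    """H-ST: a leave-marked store of tid1 is coherence-before an
    enter-marked store of tid2 to the same address (k1 < k2)."""
    leaves = [(e[3], e[4]) for e, m in tau
              if e[0] == "prop" and e[1] == tid1 and e[2] == tid1 and "leave" in m]
    enters = [(e[3], e[4]) for e, m in tau
              if e[0] == "prop" and e[1] == tid2 and e[2] == tid2 and "enter" in m]
    return any(a1 == a2 and k1 < k2 for k1, a1 in leaves for k2, a2 in enters)


def h_src(tau, tid1, tid2):
    """H-SRC: a leave-marked store of tid1 is propagated to tid2 and then
    read by an enter-marked load of tid2, with no other store to the same
    address propagated to tid2 in between."""
    for pos, (e1, m1) in enumerate(tau):
        if not (e1[0] == "prop" and e1[1] == tid2 and e1[2] == tid1 and "leave" in m1):
            continue
        addr = e1[4]
        for e2, m2 in tau[pos + 1:]:
            if e2[0] == "prop" and e2[1] == tid2 and e2[4] == addr:
                break  # another store to addr reached tid2 first
            if e2 == ("load", tid2, addr) and "enter" in m2:
                return True
    return False
```

Both functions scan the computation with a bounded amount of bookkeeping, which is in line with the conditions being recognizable by finite automata once the number of coherence keys is bounded.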
Lemma 20. Program 𝒫 has a beautiful cycle with profile 𝜃 = tid1 . . . tid𝑛 iff
$M'(\mathcal{P}, \theta) \cap H^{\mathit{tid}_1,\mathit{tid}_2} \cap \ldots \cap H^{\mathit{tid}_n,\mathit{tid}_1} \neq \emptyset$.
Note that 𝑀 ′(𝒫, 𝜃) is infinite-state. To obtain a finite-state automaton,
we observe that the instruction indices are irrelevant for the detection of
happens-before cycles (instr-count can be dropped), and that the number of
different coherence keys that must be stored in the state at any moment is
polynomial in the size of 𝒫. Indeed, the last-key and last-key𝑔 components of
the state each store at most |ADDR| · |𝒫| · 𝑛 different coherence keys. Each
modification of the last-key component of the state can be extended by a
normalization step that turns the coherence keys into consecutive natural
numbers starting from zero. The normalization step must preserve the less-than
relation on the keys. In order for the detection of happens-before cycles to
work correctly, the automaton has to remember the coherence keys of marked
store events: they must be preserved during normalization. Altogether, this
results in 𝑂(|ADDR| · |𝒫|² · 𝑛) different keys, which is polynomial in the
size of 𝒫.
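The normalization step can be sketched as an order-preserving renaming. The Python fragment below is a minimal illustration, assuming a hypothetical state layout in which last_key maps slots such as (tid, address) to coherence keys and marked_keys is the set of keys of marked store events; neither name nor layout is fixed by the construction above.

```python
def normalize(last_key, marked_keys):
    """Rename all live coherence keys to 0, 1, 2, ... while preserving <.

    last_key: dict mapping a slot (e.g. (tid, address)) to a coherence key.
    marked_keys: keys of marked store events, which must stay comparable
    with the keys stored in last_key and are therefore renamed together.
    """
    live = sorted(set(last_key.values()) | set(marked_keys))
    rank = {key: position for position, key in enumerate(live)}
    new_last_key = {slot: rank[key] for slot, key in last_key.items()}
    new_marked = {rank[key] for key in marked_keys}
    return new_last_key, new_marked
```

Since the renaming is monotone on the live keys, every comparison k1 < k2 between stored keys has the same outcome before and after normalization, and the largest key in use never exceeds the number of live keys minus one.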
Theorem 2. Robustness against Power is PSpace-complete.
Proof. By Theorem 1, Lemma 19, and Lemma 20, a program is non-robust iff
the equation from Lemma 20 holds for some 𝜃. In order to check robustness, we
enumerate all profiles 𝜃 and check the equation from Lemma 20. The enumer-
ation can be done in PSpace. By construction and Lemma 14, the size of the
intersection automaton is exponential in the size of the program. By Lemma 15,
language emptiness for it can be checked in PSpace in the size of the program,
which gives us the upper bound.
The PSpace lower bound follows from PSpace-hardness of SC state reach-
ability. One can reduce reachability to robustness by inserting an artificial
happens-before cycle in the target state. ⊓⊔
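The outer loop of the upper-bound argument can be sketched as follows. This Python fragment only illustrates the enumeration of profiles; the callback nonempty_for_profile is a hypothetical stand-in for the emptiness check of Lemma 20 and is not implemented here. Profiles are taken to be sequences of pairwise distinct thread identifiers of any length from one up to the number of threads, which is an assumption of this sketch.

```python
from itertools import permutations


def profiles(threads):
    """Lazily yield every sequence of pairwise distinct thread identifiers."""
    for length in range(1, len(threads) + 1):
        yield from permutations(threads, length)


def is_robust(threads, nonempty_for_profile):
    """A program is robust iff no profile admits a beautiful cycle.

    nonempty_for_profile(theta) is a hypothetical oracle that returns True
    iff the intersection from Lemma 20 is non-empty for the profile theta.
    """
    return not any(nonempty_for_profile(theta) for theta in profiles(threads))
```

The generator keeps only the current profile in memory, which mirrors the observation that the enumeration needs polynomial space even though the number of profiles is exponential in the number of threads.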
Acknowledgements. The authors thank Parosh Aziz Abdulla, Jade Alglave,
Mohamed Faouzi Atig, Ahmed Bouajjani, and Carl Leonardsson for helpful
discussions on the Power memory model and the anonymous reviewers for
suggestions. The first author was supported by a grant from the Competence
Center High Performance Computing and Visualization (CC-HPC) of the Fraunhofer
Institute for Industrial Mathematics (ITWM). The work was partially supported
by the DFG project R2M2: Robustness against Relaxed Memory Models.
References
1. J. Alglave, October 2013. Personal communication.
2. J. Alglave and L. Maranget. Stability in weak memory models. In CAV, volume
6806 of LNCS, pages 50–66. Springer, 2011.
3. J. Alglave, L. Maranget, and M. Tautschnig. Herding cats. CoRR, abs/1308.6810,
2013.
4. A. Bouajjani, E. Derevenetc, and R. Meyer. Checking and enforcing robustness
against TSO. In ESOP, volume 7792 of LNCS, pages 533–553. Springer, 2013.
5. A. Bouajjani, R. Meyer, and E. Möhlmann. Deciding robustness against Total
Store Ordering. In ICALP, volume 6756 of LNCS, pages 428–440. Springer, 2011.
6. S. Burckhardt and M. Musuvathi. Effective program verification for relaxed mem-
ory models. In CAV, volume 5123 of LNCS, pages 107–120. Springer, 2008.
7. J. Burnim, C. Stergiou, and K. Sen. Sound and complete monitoring of sequential
consistency for relaxed memory models. In TACAS, volume 6605 of LNCS, pages
11–25. Springer, 2011.
8. G. Calin, E. Derevenetc, R. Majumdar, and R. Meyer. A theory of partitioned
global address spaces. In FSTTCS, volume 24 of LIPIcs, pages 127–139, 2013.
9. L. Lamport. Time, clocks, and the ordering of events in a distributed system.
CACM, 21(7):558–565, 1978.
10. L. Lamport. How to make a multiprocessor computer that correctly executes
multiprocess programs. IEEE Transactions on Computers, 28(9):690–691, 1979.
11. S. Mador-Haim, L. Maranget, S. Sarkar, K. Memarian, J. Alglave, S. Owens,
R. Alur, M. M. K. Martin, P. Sewell, and D. Williams. An axiomatic memory
model for POWER multiprocessors. In CAV, volume 7358 of LNCS, pages 495–
512. Springer, 2012.
12. L. Maranget, S. Sarkar, and P. Sewell. A tutorial introduction to the ARM
and POWER relaxed memory models. https://www.cl.cam.ac.uk/~pes20/
ppc-supplemental/test7.pdf. Draft.
13. S. Owens, S. Sarkar, and P. Sewell. A better x86 memory model: x86-TSO (ex-
tended version). Technical Report CL-TR-745, University of Cambridge, 2009.
14. S. Sarkar, P. Sewell, J. Alglave, L. Maranget, and D. Williams. Understanding
POWER multiprocessors. In PLDI, pages 175–186. ACM, 2011.
15. D. Shasha and M. Snir. Efficient and correct execution of parallel programs that
share memory. ACM TOPLAS, 10(2):282–312, 1988.
