Lazy TSO Reachability by Bouajjani, Ahmed et al.
arXiv:1501.02683v1 [cs.PL] 12 Jan 2015
Lazy TSO Reachability
Ahmed Bouajjani¹, Georgel Calin², Egor Derevenetc²,³, and Roland Meyer²
¹LIAFA, University Paris 7 ²University of Kaiserslautern ³Fraunhofer ITWM
Abstract. We address the problem of checking state reachability for
programs running under Total Store Order (TSO). The problem has been
shown to be decidable but the cost is prohibitive, namely non-primitive
recursive. We propose here to give up completeness. Our contribution is
a new algorithm for TSO reachability: it uses the standard SC semantics
and introduces the TSO semantics lazily and only where needed. At
the heart of our algorithm is an iterative refinement of the program of
interest. If the program’s goal state is SC-reachable, we are done. If the
goal state is not SC-reachable, this may be due to the fact that SC under-
approximates TSO. We employ a second algorithm that determines TSO
computations which are infeasible under SC, and hence likely to lead to
new states. We enrich the program to emulate, under SC, these TSO
computations. Altogether, this yields an iterative under-approximation
that we prove sound and complete for bug hunting, i.e., a semi-decision
procedure halting for positive cases of reachability. We have implemented
the procedure as an extension to the tool Trencher [1] and compared it
to the Memorax [2] and CBMC [14] model checkers.
1 Introduction
Sequential consistency (SC) [21] is the semantics typically assumed for parallel
programs. Under SC, instructions are executed atomically and in program order.
When programs are executed on an Intel x86 processor, however, they are only
guaranteed a weaker semantics known as Total Store Order (TSO). TSO weakens
the synchronization guarantees given by SC, which in turn may lead to erroneous
behavior. TSO reflects the architectural optimization of store buffers. To reduce
the latency of memory accesses, store commands are added to a thread-local
FIFO buffer and only later executed on memory.
To check for correct behavior, reachability techniques have proven useful.
Given a program and a goal state, the task is to check whether the state is
reachable. To give an example, assertion failures can be phrased as reachability
problems. Reachability depends on the underlying semantics. Under SC, the
problem is known to be PSpace-complete [18]. Under TSO, it is considerably
more difficult: although decidable, it is non-primitive recursive-hard [8].
Due to the high complexity, tools rarely provide decision procedures [2,23,24].
Instead, most approaches implement approximations. Typical approximations
of TSO reachability bound the number of loop iterations [5, 6], the number of
context switches between threads [9], or the size of store buffers [19, 20]. What
all these approaches have in common is that they introduce store buffering in
the whole program. We claim that such a comprehensive instrumentation is
unnecessarily heavy.
The idea of our method is to introduce store buffering lazily and only where
needed. Unlike [2], we do not target completeness. Instead, we argue that our lazy
TSO reachability checker is useful for a fast detection of bugs that are due to the
TSO semantics. At a high level, we solve the expensive TSO reachability problem
with a series of cheap SC reachability checks — very much like SAT solvers are
invoked as subroutines of costlier analyses. The SC checks run interleaved with
queries to an oracle. The task of the oracle is to suggest sequences of instructions
that should be considered under TSO, which means they are likely to lead to
TSO-reachable states outside SC.
To be more precise, the algorithm iteratively repeats the following steps.
First, it checks whether the goal state is SC-reachable. If this is the case, the
state will be TSO-reachable as well and the algorithm returns. If the state is not
SC-reachable, the algorithm asks the oracle for a sequence of instructions and
encodes the TSO behavior of the sequence into the input program. As a result,
precisely this TSO behavior becomes available under SC. The encoding is linear
in the size of the input program and in the length of the sequence.
The algorithm is a semi-decision procedure: it always returns correct answers
and is guaranteed to terminate if the goal state is TSO-reachable. This guarantee
relies on one assumption on the oracle. If the oracle returns the empty sequence,
then the SC- and the TSO-reachable states of the input program have to coincide.
We also come up with a good oracle: robustness checkers naturally meet the
above requirement. Intuitively, a program is robust against TSO if its partial
order-behaviors (reflecting data and control dependencies) under TSO and under
SC coincide. Robustness is much easier than TSO reachability, actually PSpace-
complete [10, 11], and hence well-suited for iterative invocations.
We have implemented lazy TSO reachability as an extension to our tool
Trencher [1], reusing the robustness checking algorithms of Trencher to
derive an oracle. The implementation is able to solve positive instances of TSO
reachability as well as correctly determine safety for robust programs. The source
code and experiments are available online [1].
The structure of the paper is as follows. We introduce parallel programs with
their TSO and their SC semantics in Section 2. Section 3 presents our main
contribution, the lazy approach to solving TSO reachability. Section 4 describes
the robustness-based oracle. The experimental evaluation is given in Section 5.
Details and proofs missing in the main text can be found in the appendix.
Related Work
As already mentioned, TSO reachability was proven decidable but non-primitive
recursive [8] in the case of a finite number of threads and a finite data domain. In
the same setting, robustness was shown to be PSpace-complete [11]. Checking
and enforcing robustness against weak memory models has been addressed in
[3, 7, 10–13, 26]. The first work to give an efficient sound and complete decision
procedure for checking robustness is [10].
The works [2, 23, 24] propose state-based techniques to solve TSO reachability.
An under-approximative method that uses bounded context switching is
given in [9]. It encodes store buffers into a linear-size instrumentation, and the
instrumented program is checked for SC reachability. The under-approximative
techniques of [5,6] are able to guarantee safety only for programs with bounded
loops. On the other side of the spectrum, over-approximative analyses abstract
store buffers into sets combined with bounded queues [19, 20].
2 Parallel Programs
We use automata to define the syntax and the semantics of parallel programs. A
(non-deterministic) automaton over an alphabet Σ is a tuple A = (Σ,S,→, s0),
where S is a set of states, →⊆ S × (Σ ∪ {ε}) × S is a set of transitions, and
s0 ∈ S is an initial state. The automaton is finite if the transition relation →
is finite. We write $s \xrightarrow{a} s'$ if $(s, a, s') \in \to$, and extend the transition relation to
sequences $w \in \Sigma^*$ as expected. The language of A with final states F ⊆ S is
$L_F(A) := \{ w \in \Sigma^* \mid s_0 \xrightarrow{w} s \in F \}$. We say that state s ∈ S is reachable if
$s_0 \xrightarrow{w} s$ for some sequence $w \in \Sigma^*$. Letter a precedes b in w, denoted by $a <_w b$,
if w = w1 · a · w2 · b · w3 for some w1, w2, w3 ∈ Σ∗.
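The definitions above can be made concrete with a minimal sketch. It assumes an automaton is given as a set of (state, letter, state) triples with `None` standing for ε; the names `reachable_states`, `accepts`, and the example automaton `T1` (the control flow of thread t1 in Figure 1 with abbreviated labels) are illustrative and not part of the paper.

```python
from collections import deque

def reachable_states(transitions, s0):
    """All states s with s0 --w--> s for some word w (breadth-first search)."""
    seen, queue = {s0}, deque([s0])
    while queue:
        s = queue.popleft()
        for (p, _letter, q) in transitions:
            if p == s and q not in seen:
                seen.add(q)
                queue.append(q)
    return seen

def eps_closure(transitions, states):
    """States reachable from `states` using only epsilon (None) transitions."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for (p, letter, q) in transitions:
            if p == s and letter is None and q not in closure:
                closure.add(q)
                stack.append(q)
    return closure

def accepts(transitions, s0, final, word):
    """Membership test for L_F(A): does some run on `word` end in a final state?"""
    current = eps_closure(transitions, {s0})
    for letter in word:
        step = {q for (p, a, q) in transitions if p in current and a == letter}
        current = eps_closure(transitions, step)
    return bool(current & set(final))

# Hypothetical example: control flow of thread t1 from Figure 1.
T1 = {("q0", "store", "q1"), ("q1", "load", "q2"), ("q2", "assume", "qg")}
```

With final states {qg}, the word store · load · assume is in the language, while store · assume is not, because no transition labelled assume leaves q1.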
A parallel program P is a finite sequence of threads that are identified by
indices t from TID. Each thread t := (Comt, Qt, It, q0,t) is a finite automaton with
transitions It that we call instructions. Instructions It are labelled by commands
from the set Comt which we define in the next paragraph. We assume, w.l.o.g., that
states of different threads are disjoint. This implies that the sets of instructions
of different threads are disjoint. We use $I := \biguplus_{t \in TID} I_t$ for all instructions and
$Com := \bigcup_{t \in TID} Com_t$ for all commands. For an instruction $inst := (s, cmd, s')$
in I, we define cmd(inst) := cmd, src(inst) := s, and dst(inst) := s′.
t1: q0,1 --mem[x] ← 1--> q1,1 --r1 ← mem[y]--> q2,1 --assume r1=0--> qg,1
t2: q0,2 --mem[y] ← 1--> q1,2 --r2 ← mem[x]--> q2,2 --assume r2=0--> qg,2
Fig. 1. Simplified Dekker’s algorithm.
To define the set of commands, let DOM be a finite domain of values that
we also use as addresses. We assume that value 0 is in DOM. For each thread
t, let REGt be a finite set of registers that take their values from DOM. We
assume per-thread disjoint sets of registers. The set of expressions of thread
t, denoted by EXPt, is defined over registers from REGt, constants from DOM,
and (unspecified) operators over DOM. If r ∈ REGt and e, e′ ∈ EXPt, the
set of commands Comt consists of loads from memory r ← mem[e], stores
to memory mem[e] ← e′, memory fences mfence, assignments r ← e, and
conditionals assume e. We write $REG := \biguplus_{t \in TID} REG_t$ for all registers and
$EXP := \bigcup_{t \in TID} EXP_t$ for all expressions.
The program in Figure 1 serves as our running example. It consists of two
threads t1 and t2 implementing a mutual exclusion protocol. Initially, the addresses
x and y contain 0. The first thread signals its intent to enter the critical
section by setting variable x to 1. Next, the thread checks whether the second
thread wants to enter the critical section, too. It reads variable y and, if it is
0, the first thread enters its critical section. The critical section actually is the
state qg,1. The second thread behaves symmetrically.
2.1 Semantics of Parallel Programs
The semantics of a parallel program P under memory model M = TSO and
M = SC follows [25]. We define the semantics in terms of a state-space automaton
XM(P) := (E, SM, ∆M, s0). Each state s = (pc, val, buf) ∈ SM is a tuple where
the program counter pc : TID→ Q holds the current control state of each thread,
the valuation val : REG ∪ DOM→ DOM holds the values stored in registers and
at memory addresses, and the buffer configuration buf : TID→ (DOM×DOM)∗
holds a sequence of address-value pairs.
In the initial state s0 := (pc0, val0, buf0), the program counter holds the
initial control states, pc0(t) := q0,t for all t ∈ TID, all registers and addresses
contain value 0, and all buffers are empty, buf0(t) := ε for all t ∈ TID.
The transition relation ∆TSO for TSO satisfies the rules given in Figure 2.
There are two more rules for register assignments and conditionals that are
standard and omitted. TSO architectures implement (FIFO) store buffering,
which means stores are buffered for later execution on the shared memory. Loads
from an address a take their value from the most recent store to address a that is
buffered. If there is no such buffered store, they access the main memory. This is
modelled by the Rules (LB) and (LM). Rule (ST) enqueues store operations as
address-value pairs to the buffer. Rule (MEM) non-deterministically dequeues
store operations and executes them on memory. Rule (F) states that a thread can
execute a fence only if its buffer is empty. As can be seen from Figure 2, events
labelling TSO transitions take the form E ⊆ TID× (I ∪{flush})× (DOM∪{⊥}).
The SC [21] semantics is simpler than TSO in that stores are not buffered.
Technically, we keep the set of states but change the transitions so that Rule (ST)
is immediately followed by Rule (MEM).
We are interested in the computations of program P under M ∈ {TSO, SC}.
They are given by CM(P) := LF (XM(P)), where F is the set of states with empty
buffers. With this choice of final states, we avoid incomplete computations that
have pending stores. Note that all SC states have empty buffers, which means the
SC computations form a subset of the TSO computations: CSC(P) ⊆ CTSO(P).
We will use notation ReachM(P) for the set of all states s ∈ F that are reachable
by some computation in CM(P).
To give an example, the program from Figure 1 admits the TSO computation
τwit below where the store of the first thread is flushed at the end:
τwit = store1 · load1 · store2 · flush2 · load2 · flush1.
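To see the difference between the two semantics on the running example, the following sketch explores the state space of the program from Figure 1 explicitly. It is a simplified illustration under stated assumptions, not the paper's implementation: control states are numbered 0 to 3 per thread, the command set is restricted to the three commands of Figure 1, and the helper names (`PROGRAM`, `successors`, `goal_reachable`) are hypothetical. Buffers follow the convention of the transition rules: new stores are added at the front and flushed from the back.

```python
from collections import deque

# Figure 1, one list of (src, cmd, dst) per thread; state 3 plays the role of q_g.
PROGRAM = [
    [(0, ("store", "x", 1), 1), (1, ("load", "r1", "y"), 2), (2, ("assume0", "r1"), 3)],
    [(0, ("store", "y", 1), 1), (1, ("load", "r2", "x"), 2), (2, ("assume0", "r2"), 3)],
]
GOAL_PC = (3, 3)

def freeze(val):
    return tuple(sorted(val.items()))

def successors(state, tso):
    pc, val_items, buf = state
    val = dict(val_items)
    for t, thread in enumerate(PROGRAM):
        for src, cmd, dst in thread:
            if pc[t] != src:
                continue
            new_pc = pc[:t] + (dst,) + pc[t + 1:]
            new_val, new_buf = dict(val), buf
            if cmd[0] == "store":
                _, a, v = cmd
                if tso:  # buffer the store, newest entry at the front
                    new_buf = buf[:t] + (((a, v),) + buf[t],) + buf[t + 1:]
                else:    # SC: the store hits memory immediately
                    new_val[a] = v
            elif cmd[0] == "load":
                _, r, a = cmd
                pending = [v for (b, v) in buf[t] if b == a]
                # most recent buffered store to a, else main memory
                new_val[r] = pending[0] if pending else val[a]
            else:  # "assume0": only enabled if the register holds 0
                if val[cmd[1]] != 0:
                    continue
            yield new_pc, freeze(new_val), new_buf
        if buf[t]:  # flush the oldest buffered store of thread t
            a, v = buf[t][-1]
            new_val = dict(val)
            new_val[a] = v
            yield pc, freeze(new_val), buf[:t] + (buf[t][:-1],) + buf[t + 1:]

def goal_reachable(tso):
    """Is a state with pc = GOAL_PC and empty buffers reachable?"""
    init = ((0, 0), freeze({"x": 0, "y": 0, "r1": 0, "r2": 0}), ((), ()))
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        if state[0] == GOAL_PC and not any(state[2]):
            return True
        for nxt in successors(state, tso):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

In this sketch, `goal_reachable(False)` explores the SC semantics and `goal_reachable(True)` the TSO semantics; only the latter finds both threads in their critical sections, along a computation such as τwit.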
$$\frac{cmd = r \leftarrow mem[e_a] \qquad buf(t){\downarrow}(\{a\} \times DOM) = (a, v) \cdot \beta}{s \xrightarrow{(t,\, inst,\, a)} (pc', val[r := v], buf)} \quad \text{(LB)}$$
$$\frac{cmd = r \leftarrow mem[e_a] \qquad buf(t){\downarrow}(\{a\} \times DOM) = \varepsilon}{s \xrightarrow{(t,\, inst,\, a)} (pc', val[r := val(a)], buf)} \quad \text{(LM)}$$
$$\frac{cmd = mem[e_a] \leftarrow e_v}{s \xrightarrow{(t,\, inst,\, a)} (pc', val, buf[t := (a, v) \cdot buf(t)])} \quad \text{(ST)}$$
$$\frac{buf(t) = \beta \cdot (a, v)}{s \xrightarrow{(t,\, flush,\, a)} (pc, val[a := v], buf[t := \beta])} \quad \text{(MEM)}$$
$$\frac{cmd = mfence \qquad buf(t) = \varepsilon}{s \xrightarrow{(t,\, inst,\, \bot)} (pc', val, buf)} \quad \text{(F)}$$
Fig. 2. Transition rules for XTSO(P) assuming s = (pc, val, buf) with pc(t) = q and
inst = (q, cmd, q′) in thread t. The program counter is always set to pc′ = pc[t := q′].
We assume $a = \widehat{e_a}$ to be the address returned by an address expression $e_a$ and $v = \widehat{e_v}$
the value returned by a value expression $e_v$. We use buf(t) ↓ ({a} × DOM) to project
the buffer content buf(t) to store operations that access address a.
Consider an event e = (t, inst , a). By thread(e) := t we refer to the thread
that produced the event. Function inst(e) := inst returns the instruction.
For flush events, inst(e) gives the instruction of the matching store event. By
addr (e) := a we denote the address that is accessed (if any). In the example,
thread(store1) = t1, inst(store1) = $q_{0,1} \xrightarrow{mem[x] \leftarrow 1} q_{1,1}$, and addr(store1) = x.
3 Lazy TSO Reachability
We introduce the reachability problem and present our main contribution: an
algorithm that checks TSO reachability lazily. The iterative algorithm queries an
oracle to identify sequences of instructions that, under the TSO semantics, lead
to states not reachable under SC. In Section 3.1, we show that the algorithm
yields a sound and complete semi-decision procedure.
Given a memory model M ∈ {SC,TSO}, the M reachability problem expects
as input a program P and a set of goal states G ⊆ SM. We are mostly interested
in the control state of each thread. Therefore, goal states (pc, val, buf) typically
specify a program counter pc but leave the memory valuation unconstrained.
Formally, the M reachability problem asks if some state in G is reachable in the
automaton XM(P).
Given: A parallel program P and goal states G.
Problem: Decide $L_{F \cap G}(X_M(P)) \neq \emptyset$.
We use notation ReachM(P) ∩ G for the set of reachable final goal states in P.
Instead of solving reachability under TSO directly, the algorithm we propose
solves SC reachability and, if no goal state is reachable, tries to lazily introduce
store buffering on a certain control path of the program. The algorithm delegates
choosing the control path to an oracle function O. Given an input program R,
the oracle returns a sequence of instructions I ∗ in that program. Formally, the
oracle satisfies the following requirements:
– If O(R) = ε then ReachSC(R) = ReachTSO(R).
– Otherwise, O(R) = inst1 inst2 . . . instn with cmd(inst1) a store, cmd(instn)
a load, cmd(insti) ≠ mfence, and dst(insti) = src(insti+1) for i ∈ [1..n − 1].
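The second requirement is purely syntactic and can be checked on any candidate answer. The following sketch assumes instructions are (src, cmd, dst) triples whose command is a tuple starting with its kind ("store", "load", "mfence", "assign", or "assume"); this encoding and the name `is_wellformed_answer` are illustrative only. The first requirement is semantic and is not checked here.

```python
def is_wellformed_answer(sigma):
    """Check the syntactic shape required of a non-empty oracle answer."""
    if not sigma:
        # The empty answer carries a semantic obligation
        # (SC- and TSO-reachable states coincide) that cannot be checked here.
        return True
    kinds = [cmd[0] for (_src, cmd, _dst) in sigma]
    if kinds[0] != "store" or kinds[-1] != "load":
        return False
    if "mfence" in kinds:
        return False
    # Consecutive instructions must form a control path.
    return all(sigma[i][2] == sigma[i + 1][0] for i in range(len(sigma) - 1))

# Hypothetical encoding of inst(store1) and inst(load1) from Figure 1.
STORE1 = ("q0,1", ("store", "x", 1), "q1,1")
LOAD1 = ("q1,1", ("load", "r1", "y"), "q2,1")
```

Here `is_wellformed_answer([STORE1, LOAD1])` holds, whereas the reversed sequence is rejected because it neither starts with a store nor forms a path.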
The lazy TSO reachability checker is outlined in Algorithm 1. As input, it
takes a program P and an oracle O. We assume some control states in each
thread to be marked to define a set of goal states. The algorithm returns true
iff the program can reach a goal state under TSO. It works as follows. First, it
creates a copy R of the program P . Next, it checks if a goal state is SC-reachable
in R (Line 3). If that is the case, the algorithm returns true. Otherwise, it asks
the oracle O where in the program to introduce store buffering. If O(R) ≠ ε,
the algorithm extends R to emulate store buffering on the path O(R) under SC
(Line 8). Then it goes back to the beginning of the loop. If O(R) = ε, by the first
property of oracles, R has the same reachable states under SC and under TSO.
This means the algorithm can safely return false (Line 10). Note that, since R
emulates TSO behavior of P , the algorithm solves TSO reachability for P .
Algorithm 1 Lazy TSO Reachability Checker
Input: Marked program P and oracle O
Output: true if some goal state is TSO-reachable in P
        false if no goal state is TSO-reachable in P
1: R := P;
2: while true do
3:   if ReachSC(R) ∩ G ≠ ∅ then {check if some goal state is SC-reachable}
4:     return true;
5:   else
6:     σ := O(R); {ask the oracle where to use store buffering}
7:     if σ ≠ ε then
8:       R := R ⊕ σ;
9:     else
10:      return false;
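The control structure of Algorithm 1 can be sketched as a small higher-order function. The SC reachability check, the oracle, and the extension operator ⊕ are passed in as callables, since their realization is discussed elsewhere in the paper; the toy stand-ins below are hypothetical and exist only to exercise the loop.

```python
def lazy_tso_reachability(program, oracle, sc_reachable, extend):
    """Loop of Algorithm 1; may diverge if the oracle never returns the empty sequence."""
    r = program
    while True:
        if sc_reachable(r):          # Line 3: goal SC-reachable in R?
            return True
        sigma = oracle(r)            # Line 6: where to introduce store buffering
        if not sigma:                # Line 9: SC and TSO coincide on R
            return False
        r = extend(r, sigma)         # Line 8: R := R (+) sigma

# Toy stand-ins (assumptions): a "program" is a pair (emulated, candidates).
# `emulated` holds the sequences already encoded, `candidates` the sequences
# the toy oracle can still suggest. The goal counts as SC-reachable once the
# sequence ("store1", "load1") has been emulated.
def toy_sc_reachable(r):
    emulated, _candidates = r
    return ("store1", "load1") in emulated

def toy_oracle(r):
    emulated, candidates = r
    remaining = [c for c in candidates if c not in emulated]
    return remaining[0] if remaining else ()

def toy_extend(r, sigma):
    emulated, candidates = r
    return (emulated | {sigma}, candidates)
```

For instance, starting from `(frozenset(), [("store2", "load2"), ("store1", "load1")])` the loop performs two extensions before the SC check succeeds, while with only `("store2", "load2")` as a candidate it returns false after one extension.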
Let σ := O(R) = inst1 inst2 . . . instn and let t := (Comt, Qt, It, q0,t) be
the thread of the instructions in σ. The modified program R ⊕ σ replaces t by
a new thread t ⊕ σ. The new thread emulates under SC the TSO semantics
of σ. Formally, the extension of t by σ is $t \oplus \sigma := (Com'_t, Q'_t, I'_t, q_{0,t})$. The
thread is obtained from t by adding sequences of instructions starting from
q0 := src(inst1). To remember the addresses and values of the buffered stores,
we use auxiliary registers ar1, . . . , armax and vr1, . . . , vrmax, where max ≤ n − 1 is
the total number of store instructions in σ. The sets $Com'_t \supseteq Com_t$ and $Q'_t \supseteq Q_t$
are extended as necessary.
We define the extension by describing the new transitions that are added
to I ′t for each inst i. In our construction, we use a variable count to keep track
of the number of store instructions already processed. Initially, Q′t := Qt and
count := 0. Based on the type of instructions, we distinguish the following cases.
If cmd(inst i) = mem[e] ← e′, we increment count by 1 and add instructions
that remember the address and the value being written in arcount and vrcount.
If cmd(inst i) = r ← mem[e], we add instructions to I ′t that perform a load
from memory only when a load from the simulated buffer is not possible. More
precisely, if j ∈ [1, count] is found so that arj = e, register r is assigned the
value of vrj . Otherwise, r receives its value from the address indicated by e.
qi−1 --assume arcount ≠ e--> · · · --assume ar1 ≠ e--> · --r ← mem[e]--> qi
For each j ∈ [1..count], a branch --assume arj = e--> · --r ← vrj--> qi leaves the chain at the state where arj is tested.
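The effect of the load gadget can be sketched as a lookup function. The sketch assumes the auxiliary registers are kept in two Python lists, with index 0 standing for ar1 and vr1, and that the registers are tested from the most recent store downwards, as suggested by the chain of assume instructions; the function name is hypothetical.

```python
def emulated_load(ar, vr, addr, memory):
    """Value seen by a load from `addr` on the emulated path.

    ar, vr: addresses and values of the stores recorded so far
            (ar[0], vr[0] play the role of ar_1, vr_1).
    """
    for j in reversed(range(len(ar))):   # test ar_count first, ar_1 last
        if ar[j] == addr:                # branch "assume ar_j = e"
            return vr[j]                 # r <- vr_j
    return memory[addr]                  # all "assume ar_j != e" passed: r <- mem[e]
```

With two recorded stores to x holding the values 1 and 2, a load from x yields 2, while a load from an address that was not stored to falls through to main memory.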
If cmd(inst i) is an assignment or a conditional, we add (qi−1, cmd(inst i), qi)
to I ′t . By the definition of an oracle, cmd(inst i) is never a fence.
The above cases handle all instructions in σ. So far, the extension added new
instructions to I ′t that lead through the fresh states q1, . . . , qn. Out of control
state qn, we now recreate the sequence of stores remembered by the auxiliary
registers. Then we return to the control flow of the original thread t.
qn --mem[ar1] ← vr1--> · · · --mem[armax] ← vrmax--> dst(instn)
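Recording and replaying the buffered stores can be sketched in the same list-based encoding (index 0 standing for ar1 and vr1). The helper names are hypothetical; the point of the sketch is only that the replay executes the stores in the order in which they were recorded, so a later store to the same address overwrites an earlier one, as a FIFO buffer would.

```python
def record_store(ar, vr, addr, value):
    """Emulate a store on the new path: remember it instead of writing memory."""
    ar.append(addr)
    vr.append(value)

def replay_stores(ar, vr, memory):
    """Out of q_n: execute mem[ar_1] <- vr_1, ..., mem[ar_max] <- vr_max in order."""
    for addr, value in zip(ar, vr):
        memory[addr] = value
    return memory
```

For example, after recording stores of 1 to x, 5 to y, and 9 to x, replaying them on a zero-initialized memory leaves x at 9 and y at 5.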
Next, we remove inst1 from the program. This prevents the oracle from
discovering in the future another instruction sequence that is essentially the
same as σ. As we will show, this is key to guaranteeing termination of the
algorithm for acyclic programs. However, the removal of inst1 may reduce the
set of TSO-reachable states. To overcome this problem, we insert additional
instructions. Consider an instruction inst ∈ It with src(inst) = src(insti) for
some i ∈ [1..n] and assume that inst ≠ insti. We add instructions that recreate
the stores buffered in the auxiliary registers and return to dst(inst).
qi --mem[ar1] ← vr1--> · · · --mem[arcount] ← vrcount--> · --cmd(inst)--> dst(inst)
Similarly, for all load instructions inst i as well as out of q1 we add instructions
that flush and fence the pair (ar1, vr1), make visible the remaining buffered
stores, and return to state q in the original control flow. Below, q := src(inst i) if
inst i is a load and q := dst(inst1), otherwise. Intuitively, this captures behaviors
that delay inst1 past loads earlier than instn, and that do not delay inst1 past
the first load in σ.
qi --mem[ar1] ← vr1--> · --mfence--> · · · --mem[arcount] ← vrcount--> q
[Figure 3: thread t1 with control states q0,1, q1,1, q2,1, qg,1 and the fresh states q1, q2. Instructions shown for t1: ar1 ← x, vr1 ← 1, assume ar1 ≠ y, assume ar1 = y, r1 ← vr1, r1 ← mem[y] (twice), mem[ar1] ← vr1 (twice), mfence, assume r1=0. Thread t2 is unchanged: q0,2 --mem[y] ← 1--> q1,2 --r2 ← mem[x]--> q2,2 --assume r2=0--> qg,2.]
Fig. 3. Extension by inst(store1) · inst(load1) of the program in Figure 1. Goal state
(pc, val, buf) with val(x) = val(y) = 1 and val(r1) = val(r2) = 0 is now SC-reachable.
Figure 3 shows the extension of the program in Figure 1 by the instruction
sequence inst(store1) · inst(load1) := $q_{0,1} \xrightarrow{mem[x] \leftarrow 1} q_{1,1} \xrightarrow{r_1 \leftarrow mem[y]} q_{2,1}$.
3.1 Soundness and Completeness
We show that Algorithm 1 is a decision procedure for acyclic programs. From
here until (inclusively) Theorem 3 we assume that all programs are acyclic, i.e.,
their instructions and control states form directed acyclic graphs. Theorem 4
then explains how Algorithm 1 yields a semi-decision procedure for all programs.
We first prove the extension sound and complete (Lemma 1): extending R
by sequence σ := O(R) neither adds nor removes TSO-reachable states.
Afterwards, Lemma 2 shows that if Algorithm 1 extends R by σ (Line 8) then,
in subsequent iterations of the algorithm, no new sequence returned by the oracle
is the same as σ (projected back to P). Next, by the first condition of an oracle
and using Lemma 2, we establish that Algorithm 1 is a decision procedure for
acyclic programs (Theorem 3). Finally, we show that Algorithm 1 can be turned
into a semi-decision procedure for all programs using a bounded model checking
approach (Theorem 4).
Lemma 1 Let DOM ∪ REG be the addresses and registers of program R and
let σ := O(R). Then we have (pc, val, buf) ∈ ReachTSO(R) if and only if
(pc, val′, buf) ∈ ReachTSO(R ⊕ σ) with val(a) = val′(a) for all a ∈ DOM ∪ REG.
Let t be the thread that differs in R and R ⊕ σ. To prove Lemma 1, one can show
that for any prefix α′ of α ∈ CTSO(R) there is a prefix β′ of β ∈ CTSO(R ⊕ σ),
and vice versa, that maintain the following invariants.
Inv-0 $s_0 \xrightarrow{\alpha'} (pc, val, buf)$ and $s_0 \xrightarrow{\beta'} (pc', val', buf')$.
Inv-1 If pc and pc′ differ, they only differ for thread t. If pc(t) ≠ pc′(t), then
pc(t) = dst(insti) and pc′(t) = qi for some i ∈ [1..n − 1].
Inv-2 val′(a) = val(a) for all a ∈ DOM ∪ REG.
Inv-3 buf and buf′ differ at most for t. If buf(t) ≠ buf′(t), then pc′(t) = qi
for some i ∈ [1..n − 1] and $buf(t) = (\widehat{ar}_{count}, \widehat{vr}_{count}) \cdots (\widehat{ar}_1, \widehat{vr}_1) \cdot buf'(t)$, where
count stores are seen along σ from src(inst1) to dst(insti).
We now show that the oracle never suggests the same sequence σ twice. Since
in R ⊕ σ we introduce new instructions that correspond to instructions in R,
we have to map back sequences of instructions I⊕ in R ⊕ σ to sequences of
instructions I in R. Intuitively, the mapping gives the original instructions from
which the sequence was produced. Formally, we define a family of projection
functions $h_\sigma : I_\oplus^* \to I^*$ with hσ(ε) := ε and hσ(w · inst) := hσ(w) · hσ(inst). For
an instruction inst ∈ I⊕, we define hσ(inst) := inst provided inst ∈ I. We set
hσ(inst) := insti if inst is a first instruction on the path between qi−1 and qi
for some i ∈ [1..n]. In all other cases, we delete the instruction, hσ(inst) := ε.
Then, if R0 := P is the original program, σj is the sequence that the oracle
returns in iteration j ∈ N of the while loop, and w is a sequence of instructions
in Rj+1, we define h(w) := hσ0(. . . hσj (w)). This latter function maps sequences
of instructions in program Rj+1 back to sequences of instructions in P .
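The projection can be sketched as follows. The sketch assumes that each extension step records, for every instruction it adds, either the original instruction it stands for (if it is a first instruction on a path between qi−1 and qi) or `None` (if it is deleted by the projection); this `origin` dictionary and the string names of the instructions are illustrative and not part of the paper. Mapping both assume instructions of a load gadget to the load is one reading of "a first instruction on the path".

```python
def project(seq, origin):
    """h_sigma: map a sequence over the extended program back one step."""
    out = []
    for inst in seq:
        if inst not in origin:            # instruction already present before the extension
            out.append(inst)
        elif origin[inst] is not None:    # first instruction on the path q_{i-1} .. q_i
            out.append(origin[inst])
        # otherwise: deleted, h_sigma(inst) = epsilon
    return out

def project_all(seq, origins):
    """h: apply h_{sigma_j} first and h_{sigma_0} last."""
    for origin in reversed(origins):
        seq = project(seq, origin)
    return seq

# Hypothetical bookkeeping for the extension of Figure 3.
ORIGIN_FIG3 = {
    "ar1<-x": "store1", "vr1<-1": None,
    "assume ar1!=y": "load1", "assume ar1=y": "load1",
    "r1<-mem[y] (new)": None, "r1<-vr1": None,
    "mem[ar1]<-vr1": None, "mfence": None,
}
```

Projecting the emulated path `["ar1<-x", "vr1<-1", "assume ar1!=y", "r1<-mem[y] (new)", "mem[ar1]<-vr1", "assume r1=0"]` with `ORIGIN_FIG3` yields `["store1", "load1", "assume r1=0"]`.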
We are ready to state our key lemma. Intuitively, if the oracle in Algorithm 1
returns σ := O(R) and σ′ := O(R ⊕ σ) then, necessarily, h(σ′) ≠ h(σ).
Lemma 2 Let R0 := P and Ri+1 := Ri ⊕ σi for σi := O(Ri) as in Algorithm 1.
If σj+1 ≠ ε then h(σj+1) ≠ h(σi) for all i ≤ j.
Proof. Assume, to the contrary, that h(σj+1) = h(σi) for some i ≤ j where
σj+1 := O(Rj+1) and σi := O(Ri). Let $inst_{first}$ be the first (store) instruction
and $inst_{last}$ the last (load) instruction of σj+1. Similarly, let $inst'_{first}$ and $inst'_{last}$
be the first and last instructions of σi. Since h(σj+1) = h(σi), it follows that
$h(inst_{first}) = h(inst'_{first})$ and $h(inst_{last}) = h(inst'_{last})$.
However, since all control flows of Ri+1 := Ri ⊕ σi that recreate $h(inst'_{first})$
before $h(inst'_{last})$ also place a fence between the two, no other later sequences
that the oracle returns have $h(inst'_{first})$ come before $h(inst'_{last})$. This in particular
means that σj+1 = O(Rj+1) where $h(inst_{first})$ comes before $h(inst_{last})$ does not
exist. In conclusion, the initial assumption is false. ⊓⊔
We can now prove Algorithm 1 sound and complete for acyclic programs
(Theorem 3). Lemma 2 and the assumption that the input program is acyclic
ensure that if no goal state is found SC-reachable (Line 4), then Algorithm 1
eventually runs out of sequences σ to return (Line 7). If that is the case, O(R)
returns ε in the last iteration of Algorithm 1. By the first oracle condition, we
know that the SC- and TSO-reachable states of R are the same. Hence, no goal
state is TSO-reachable in R and, by Lemma 1, no goal state is TSO-reachable
in the input program P either. Otherwise, a goal state s is SC-reachable by
some computation τ in Rj for some j ∈ N and, by Lemma 1, there is a TSO
computation in P corresponding to τ that reaches s.
Theorem 3 For acyclic programs, Algorithm 1 terminates. Moreover, it returns
true on input P if and only if ReachTSO(P) ∩ G ≠ ∅.
Proof. It is immediate that Algorithm 1 always terminates for acyclic programs.
On the one hand, the number of instruction sequences that start with a store
and end with a load as in the second oracle condition are finite in P . On the
other hand, by Lemma 2, at each iteration the oracle returns a sequence that
differs (in P) from the previous ones. These two facts imply termination.
We now prove that ReachTSO(P) ∩ G ≠ ∅ iff Algorithm 1 returns true
on input P. For the easy direction, assume that Algorithm 1 returns true on
input P. This means that ReachSC(R) ∩ G ≠ ∅ in the last iteration of the
algorithm’s loop. Then, by ReachSC(R) ⊆ ReachTSO(R) and Lemma 1, we know
that ReachSC(R) ⊆ ReachTSO(P). Hence, ReachTSO(P) ∩ G ≠ ∅.
For the reverse direction, assume that ReachTSO(P) ∩ G ≠ ∅. Furthermore,
let R0 := P and Ri+1 := Ri ⊕ σi for σi := O(Ri). By the initial termination
argument we know there exists j ∈ N such that the algorithm terminates with
R = Rj in its last loop iteration. That means that either the check in Line 3 of the
algorithm succeeds, in which case Algorithm 1 returns true, or the check in Line 7
of the algorithm fails, i.e., O(Rj) = ε and ReachSC(Rj) ∩ G = ∅. In the latter
case, by the first oracle condition we know that ReachTSO(Rj) ∩G = ∅ and, by
Lemma 1, we get ReachTSO(Rj) ⊆ ReachTSO(R0). Then, ReachTSO(P) ∩ G = ∅
contradicts the above assumption and concludes the proof. ⊓⊔
To establish that Algorithm 1 is a semi-decision procedure for all programs,
one can use an iterative bounded model checking approach. Bounded model
checking unrolls the input program P up to a bound k ∈ N on the length
of computations. Then Algorithm 1 is applied to the resulting programs Pk.
If it finds a goal state TSO-reachable in Pk, this state corresponds to a TSO-
reachable goal state in P . Otherwise, we increase k and try again. By Theorem 3,
we know that Algorithm 1 is a decision procedure for each Pk. This implies
that Algorithm 1 together with iterative bounded model checking yields a semi-
decision procedure that terminates for all positive instances of TSO reachability.
For negative instances of TSO reachability, however, the procedure is guaranteed
to terminate only if the input program P is acyclic.
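The iterative bounded model checking wrapper can be sketched as a simple loop over increasing bounds. The callables `unroll` and `lazy_check` stand for the unrolling of P to Pk and for Algorithm 1, respectively; both, as well as the optional `max_bound` cut-off (added here only so that negative instances can be exercised), are assumptions of this sketch.

```python
from itertools import count

def semi_decide(program, unroll, lazy_check, max_bound=None):
    """Return the first bound k for which Algorithm 1 reports reachability on P_k.

    Without max_bound the loop runs forever on negative instances,
    which is exactly the semi-decision behaviour described above.
    """
    bounds = count(1) if max_bound is None else range(1, max_bound + 1)
    for k in bounds:
        if lazy_check(unroll(program, k)):
            return k
    return None

# Toy stand-ins (assumptions): a "program" is the length of its shortest
# TSO computation to the goal, and unrolling just pairs it with the bound.
def toy_unroll(program, k):
    return (program, k)

def toy_lazy_check(unrolled):
    needed, k = unrolled
    return k >= needed
```

With these stand-ins, `semi_decide(6, toy_unroll, toy_lazy_check)` stops at bound 6, and `semi_decide(6, toy_unroll, toy_lazy_check, max_bound=3)` gives up and returns `None`.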
Theorem 4 We have G ∩ ReachTSO(P) ≠ ∅ if and only if, for large enough
k ∈ N, Algorithm 1 returns true on input Pk.
Proof. Assume that G ∩ ReachTSO(P) ≠ ∅. Then there exist some state s ∈ G
and α ∈ CTSO(P) such that $s_0 \xrightarrow{\alpha} s$. Let k be the length of α and G′ be the
goal states of XTSO(Pk). There exists a computation β ∈ CTSO(Pk) that mimics
α and reaches s′ ∈ G′. Hence, G′ ∩ ReachTSO(Pk) ≠ ∅ and, by Theorem 3,
Algorithm 1 returns true on input Pk.
For the reverse direction, assume that Algorithm 1 returns true on input Pk
for some k ∈ N. Let $s'_0$ be the initial state of XTSO(Pk) and, as before, G′ be the
goal states of XTSO(Pk). By Theorem 3, there exists s′ ∈ G′ ∩ ReachTSO(Pk)
and β ∈ CTSO(Pk) such that $s'_0 \xrightarrow{\beta} s'$. Since Pk unrolls P up to bound k, there
exists a computation α ∈ CTSO(P) that mimics β and reaches s ∈ G. Therefore,
G ∩ ReachTSO(P) ≠ ∅. ⊓⊔
4 A Robustness-based Oracle
This section argues why robustness yields an oracle. Robustness [7, 10, 13, 26] is
a correctness criterion requiring that for each TSO computation of a program
there is an SC computation that has the same data and control dependencies.
Delays due to store buffering are still allowed, as long as they do not produce
dependencies between instructions that SC computations forbid.
Dependencies between events are described in terms of the happens-before
relation of a computation τ ∈ CTSO(P). The happens-before relation is a union
of the three relations that we define below: →hb (τ) := →po ∪ ↔ ∪ →cf .
The program order relation →po is the order in which threads issue their
commands. Formally, it is the union of the program order relations for all threads:
$\to_{po} := \bigcup_{t \in TID} \to^t_{po}$. Let τ′ be the subsequence of all non-flush events of thread
t in τ. Then $\to^t_{po} := <_{\tau'}$.
The equivalence relation ↔ links, in each thread, flush events and their
matching store events: (t, inst , a)↔ (t, flush, a).
The conflict relation →cf orders accesses to the same address. Assume, on the
one hand, that τ = τ1 ·store · τ2 ·load · τ3 ·flush · τ4 such that store↔ flush,
events store and load access the same address a and come from thread t,
and there is no other store event store′ ∈ τ2 such that thread(store′) = t and
addr (store′) = a. Then the load event load is an early read of the value buffered
by the event store and store→cf load.
On the other hand, assume τ = τ1 · e · τ2 · e′ · τ3 such that e and e′ are either
load or flush events that access the same address a, neither e nor e′ is an early
read, and at least one of e or e′ is a flush to a. If there is no other flush event
flush ∈ τ2 with addr(flush) = a then $e \to_{cf} e'$.
Figure 4 depicts the happens-before relation of computation τwit.
[Figure 4: events store1, flush1, load1 of thread t1 and store2, flush2, load2 of thread t2, connected by two po edges and two cf edges.]
Fig. 4. The relation →hb (τwit).
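The three relations can be computed directly from their definitions. The sketch below assumes a computation is a list of named events in which every store has a matching flush (as is the case for computations in CTSO(P), whose final states have empty buffers); the `Event` encoding and the function name are illustrative choices, not the paper's data structures.

```python
from typing import NamedTuple, Optional

class Event(NamedTuple):
    name: str
    thread: int
    kind: str                     # "store", "load", or "flush"
    addr: str
    store: Optional[str] = None   # for flush events: name of the matching store

def happens_before(tau):
    """Return (po, eq, cf) as sets of pairs of event names."""
    pos = {e.name: i for i, e in enumerate(tau)}
    flush_of = {e.store: e for e in tau if e.kind == "flush"}
    # Program order: non-flush events of the same thread, in order of occurrence.
    po = {(e.name, f.name) for i, e in enumerate(tau) for f in tau[i + 1:]
          if e.thread == f.thread and "flush" not in (e.kind, f.kind)}
    # Equivalence: each store and its flush, in both directions.
    eq = set()
    for s, fl in flush_of.items():
        eq |= {(s, fl.name), (fl.name, s)}
    cf, early = set(), set()
    # Early reads: the latest own store to the address is still buffered.
    for i, ld in enumerate(tau):
        if ld.kind != "load":
            continue
        for st in reversed(tau[:i]):
            if st.kind == "store" and (st.thread, st.addr) == (ld.thread, ld.addr):
                if pos[flush_of[st.name].name] > i:
                    cf.add((st.name, ld.name))
                    early.add(ld.name)
                break
    # Remaining conflicts between loads and flushes to the same address.
    def relevant(e):
        return e.kind != "store" and e.name not in early
    for i, e in enumerate(tau):
        for j in range(i + 1, len(tau)):
            f = tau[j]
            if not (relevant(e) and relevant(f)) or e.addr != f.addr:
                continue
            if "flush" not in (e.kind, f.kind):
                continue
            if any(g.kind == "flush" and g.addr == e.addr for g in tau[i + 1:j]):
                continue
            cf.add((e.name, f.name))
    return po, eq, cf

TAU_WIT = [
    Event("store1", 1, "store", "x"), Event("load1", 1, "load", "y"),
    Event("store2", 2, "store", "y"), Event("flush2", 2, "flush", "y", "store2"),
    Event("load2", 2, "load", "x"), Event("flush1", 1, "flush", "x", "store1"),
]
```

On τwit, the sketch finds the program order pairs (store1, load1) and (store2, load2) and the conflict pairs (load1, flush2) and (load2, flush1), which matches the two po and two cf edges of the figure.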
A program P is said to be robust against
TSO if for each computation τ ∈ CTSO(P)
there exists a computation τ ′ ∈ CSC(P) such
that →hb (τ) =→hb (τ ′). If a program P is
robust, then it reaches the same set of final
states under SC and under TSO:
Lemma 5 If P is robust against TSO, then ReachSC(P) = ReachTSO(P).
Proof. The ⊆ inclusion holds by CSC(P) ⊆ CTSO(P). For the reverse, assume
that there is a TSO computation τ ∈ CTSO(P) such that $s_0 \xrightarrow{\tau} s$. Since P is
robust, there is an SC computation τ′ ∈ CSC(P) such that →hb (τ) = →hb (τ′).
Then τ′ ∈ CTSO(P) and, by Lemma 8, $s_0 \xrightarrow{\tau'} s$, so s is SC-reachable. ⊓⊔
Our robustness-based oracle makes use of the following characterization of
robustness from earlier work [10]: a program P is not robust against TSO iff
CTSO(P) contains a computation, called witness, as in Figure 5.
Lemma 6 ([10]) Program P is robust against TSO if and only if the set of
TSO computations CTSO(P) contains no witness.
A witness τ delays stores of only one thread in P . The other threads adhere
to the SC semantics. Conditions (W1) – (W4) in Figure 5 describe formally this
restrictive behavior. Furthermore, condition (W5) implies that no computation
τ′ ∈ CSC(P) can satisfy →hb (τ) = →hb (τ′).
The computation τwit is a witness for the program in Figure 1. Indeed, in no
SC computation of this program can both loads read the initial values of x and y.
Relative to Figure 5, we have store = store1, load = load1, flush = flush1,
τ3 = store2 · flush2 · load2, and τ1 = τ2 = τ4 = ε.
τ = τ1 · store · τ2 · load · τ3 · flush · τ4
Fig. 5. Witness τ with store ↔ flush and thread t := thread(store) = thread(load).
Witnesses satisfy the following constraints: (W1) Only thread t delays stores. (W2)
Event flush is the first delayed store of t and load is the last event of t past which
flush is delayed. So τ2 contains neither flush events nor fences of t. (W3) Sequence
τ3 contains no events of thread t. (W4) Sequence τ4 consists only of flush events e of
thread t. All these events e satisfy addr(e) ≠ addr(load). (W5) We require $load \to^+_{hb} e$
for all events e in τ3 · flush.
The robustness-based oracle, given input P , finds a witness τ as in Figure 5
and returns the sequence of instructions for the events in store · τ2 · load that
belong to thread t. If no witness exists, it returns ε. By Lemmas 5 and 6, this
satisfies the oracle conditions from Section 3. Note that, given a robust program
and the robustness-based oracle as inputs, Algorithm 1 returns within the first
iteration of the while loop.
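The interplay between SC reachability checks and the oracle can be sketched as follows. This is a minimal illustration of the iterative scheme as described in the text, not the Trencher implementation: the callables sc_reachable, oracle, and extend are hypothetical stand-ins for the SC reachability check, the oracle O, and the extension R ⊕ σ, and the max_iterations bound is an addition for demonstration purposes only.

```python
def lazy_tso_reachability(program, sc_reachable, oracle, extend, max_iterations=None):
    """Iterate SC checks and program extensions.

    Returns True if a goal state is found, False if the oracle returns the
    empty sequence, and None if the (hypothetical) iteration bound is hit.
    """
    current = program
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        if sc_reachable(current):      # goal state reachable under SC
            return True
        sigma = oracle(current)        # instruction sequence to emulate
        if not sigma:                  # empty sequence: nothing left to add
            return False
        current = extend(current, sigma)
        iteration += 1
    return None


# Toy usage: a "program" is a set of sequences already emulated.
robust = lazy_tso_reachability(
    frozenset(),
    sc_reachable=lambda r: False,
    oracle=lambda r: (),               # robust: no witness exists
    extend=lambda r, s: r | {s},
)
non_robust = lazy_tso_reachability(
    frozenset(),
    sc_reachable=lambda r: len(r) >= 1,
    oracle=lambda r: ("store", "load"),
    extend=lambda r, s: r | {s},
)
```

In the first toy call the oracle returns the empty sequence at once, so the loop exits in its first iteration, as stated above for robust programs.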
5 Experiments
We have implemented our lazy TSO reachability algorithm on top of the tool
Trencher [1]. Trencher was initially developed for checking robustness and
implements the algorithm for finding witness computations described in [10]. Our
implementation reuses that algorithm as a robustness-based oracle. Trencher
originally used SPIN [17] as back-end SC reachability checker. The current im-
plementation, however, uses a simpler model checker that exploits information
about the instruction set for partial-order reduction. Moreover, it avoids having
to compile the verifier executables (pan) as is the case for SPIN.
We have implemented Algorithm 1 with the following amendments. First, the
extension does not delete the store instruction inst1. This ensures the extended
program has a (sound) superset of the TSO behaviors of the original program.
Second, the extension only adds instructions along q1, . . . , qn. The remaining in-
structions were added to ensure all behaviors of the original program exist in the
extended program, once inst1 is removed. The resulting algorithm is guaranteed
to give correct results for cyclic programs. Of course, it cannot be guaranteed
to terminate in general. Finally, our implementation explores extensions due to
different instruction sequences in parallel, rather than sequentially.
We compare our prototype implementation against two other model checkers
that support TSO semantics: Memorax [2] (revision 4f94ab6) and CBMC [14]
(version 4.7). Memorax implements a sound and complete reachability checking
procedure by reducing to coverability in a well-structured transition system.
CBMC is an SMT-based bounded model checker for C programs. Consequently,
it is sound, but not complete: it is complete only up to a given bound on the
number of loop iterations in the input program.
5.1 Examples
#  Program             T   St  Tr  RQ  CPU (ms)  Real (ms)
1  Parker (non-rob)    2   11  10   4         8          5
2  Peterson (non-rob)  2   14  18  12        21         13
3  Dekker (non-rob)    2   24  30  30       171         70
4  Lamport (non-rob)   3   33  36  27      1839        694
5  MCS Lock            4   52  50  30       127         61
6  CLH Lock            3   43  41  70        10          7
7  Lock-Free Stack     4   46  50  14         9          7
Fig. 6. Trencher benchmarking results. The tests
are available online [1]. Times here are in milliseconds.
We tested our tool on a
set of examples. Figure 6
summarizes characteristics
of the examples taken from
the initial Trencher tests:
number of threads (T),
states (St), and transitions
(Tr). The first example is a
model of the buggy Parker
class from Java VM [15]. The next three examples are mutual exclusion proto-
cols implemented via shared variables. These protocols do not guarantee mutual
exclusion under TSO. We tested Dekker’s and Peterson’s algorithms for two
threads, and Lamport’s fast mutex [22] for three threads. The last three tests
from Figure 6 give statistics concerning reachability in robust test cases for the
lock-free stack, and for the MCS and CLH locking algorithms from [16].
We also performed three parametrized tests. First, we varied the number
of threads in Lamport’s fast mutex [22] (see left-hand-side of Figure 7). The
modified Dekker in Figure 8 is inspired by the examples of the fence-insertion
tool musketeer [4] and adds an “N -branching diamond” (see right-hand-side
of Figure 8) to both program threads. Lastly, the program in Figure 9 places
stores to address x on a length N loop in thread t1: since t1 expects to load the
initial y value while t2 expects to load 1 and then 0 from x, an execution that
reaches the goal state goes through the length N loop twice.
5.2 Evaluation
We ran all tests on a QEMU @ 2.67GHz virtual machine (16 cores) with 8GB
RAM running GNU/Linux. The table in Figure 6 summarizes the results of the
Trencher benchmark tests. RQ is the number of SC reachability queries raised
by Trencher. The columns CPU and Real give the total CPU time and the
wall-clock time for performing a test.
The first graph in Figure 10 depicts the running times of the three tools
on the non-robust examples from Figure 6. For CBMC, we used the versions
of the mutual exclusion algorithms that its authors provide. For Memorax,
we hand-wrote *.rmm files for the first 4 test programs. We did not perform
a comparison for robust programs: if SC reachability returns false on an input
program, our implementation decides mutual exclusion as fast as Trencher is
able to determine robustness. Moreover, CBMC implements strictly an
under-approximative method where the number of loop iterations is bounded. Our
robust tests, however, contain unbounded loops.
[Figure 7, left: control-flow graph of thread ti with edge labels r ← i, mem[x] ← r, ry ← mem[y], assume ry ≠ 0, assume ry = 0, mem[y] ← r, assume rx ≠ r, ry ← mem[y], assume ry ≠ r, assume rx = r, mem[y] ← 0, rx ← mem[x]. Right: plot of wall-clock time (seconds) against N, the number of threads, for Memorax, Trencher, and CBMC.]
Fig. 7. The i-th Lamport mutex thread (left) and running times for N threads (right).
[Figure 8, left: thread t1 with edge labels mem[x] ← 1, r1 ← mem[y], assume r1 = 0, ♦N(a), and thread t2 with edge labels mem[y] ← 1, r2 ← mem[x], assume r2 = 0, ♦N(b). Right: the diamond ♦N(a) between entry and exit, with edge label r ← mem[a] followed by the branches assume r = 0, mem[a] ← 1 and, for all i ∈ [1..N − 1], assume r = i, mem[a] ← (i + 1) mod N.]
Fig. 8. Dekker’s algorithm modified so that an “N-branching diamond” over distinct
addresses a, b ∉ {x, y} is placed between the accesses to x and y. A final goal state is
TSO-reachable if the first store is delayed past the last load in either t1 or t2.
[Figure 9, left: thread t1 with edge labels assume r1 < N, assume r1 = N, r2 ← mem[y], assume r2 = 0, r1 ← 0, mem[x] ← r1, r1 ← r1 + 1, and thread t2 with edge labels mem[y] ← 1, r3 ← mem[x], assume r3 = 1, r3 ← mem[x], assume r3 = 0. Right: plot of wall-clock time (seconds) against N, the length of the loop, for Trencher and CBMC.]
Fig. 9. A final goal state is TSO-reachable if t1 goes through the (length N) loop two
times: once to satisfy assume r3 = 1 and the second time to satisfy assume r3 = 0.
[Figure 10, left: plot of wall-clock time (seconds) against the test order index from Figure 6 (tests 1 to 4), for Memorax, Trencher, and CBMC. Right: plot of wall-clock time (seconds) against N, the diamond branching factor, for Trencher and Memorax.]
Fig. 10. Running times for the non-robust tests in Figure 6 (left) and Figure 8 (right).
The high load needed to verify Lamport’s mutex — in comparison with the
other Figure 6 tests — is justified by the correlation between the program’s
data domain size and its number of threads. For a larger number of threads,
the right-hand-side graph in Figure 7 shows that CBMC is fastest. This is the
case since, actually, the smallest unwind bound suffices for CBMC to conclude
reachability. For Memorax and Trencher the system runs out of memory
when N = 5. This underlines once again just how troublesome the state-space
explosion is for TSO reachability. It is not easily noticeable in the
picture, but Memorax’s exponential scaling is better than Trencher’s:
Trencher is slightly faster than Memorax for N ∈ {2, 3}, whereas Memorax clearly
outperforms Trencher when N = 4.
The graph in Figure 9 shows that, for the second parameterized test, our
prototype is faster than CBMC. Indeed, with increasing N, an ever larger number
of constraints need to be generated by CBMC. For Trencher, regardless of the
value of N , it takes three SC reachability queries to conclude TSO reachability.
The second graph in Figure 10 shows that, for the programs described by
Figure 8, our prototype is faster than Memorax. It seems Memorax cannot
cope well with the branching factor that the parameter N introduces.
To better understand the difficulty of the latter two parametric tests, we
present the exponential scaling behaviors of Trencher in Figure 11.
5.3 Discussion
Because we find several witnesses in parallel, throughout the experiments our
implementation required up to 2 iterations of the loop in Algorithm 1. In the
case of robust programs, one iteration is always sufficient. This suggests that
robustness violations are really the critical behaviors leading to TSO reachability.
The experiments indicate that, at least for some programs with a high branching
factor, our implementation is faster than Memorax if a useful witness can be
found within a small number of iterations of Algorithm 1. Similarly, our
prototype is better than CBMC for programs which require a high unwinding bound
to make visible TSO behavior reaching a goal state. Although the two programs
by which we show this are rather artificial, we expect such characteristics to
occur in actual code. Hence, our approach seems to be strong on an orthogonal
set of programs. In a portfolio model checker, it could be used as a promising
alternative to the existing techniques.
[Figure 11: two plots of Trencher's wall-clock time (seconds), one against the value of N in Figure 8 (500 to 3,000) and one against the value of N in Figure 9 (20 to 160).]
Fig. 11. Additional Trencher results for the programs in Figures 8 and 9. Memorax
already takes 1 minute and 24 seconds for the program in Figure 8 and N = 50, while
CBMC takes 8 minutes and 35 seconds for the program in Figure 9 and N = 20.
To evaluate the practicality of our method, more experiments are needed. In
particular, we hope to be able to substantiate the above conjecture for concrete
programs with behavior like that depicted in Figures 8 and 9. Unfortunately,
there seems to be no clear way of translating (compiled) C programs into our
simplified assembly syntax without substantial abstraction. To handle C code, an
alternative would be to reimplement our method within CBMC. But this would
force us to determine a-priori a good-enough unwinding bound. Moreover, we
could no longer conclude safety of robust programs with unbounded loops.
Acknowledgements The third author was supported by a grant from the Competence Center
High Performance Computing and Visualization (CC-HPC) of the Fraunhofer
Institute for Industrial Mathematics (ITWM). The work was partially supported
by the PROCOPE project ROIS: Robustness under Realistic Instruction Sets
and by the DFG project R2M2: Robustness against Relaxed Memory Models.
References
1. Trencher tool. http://concurrency.informatik.uni-kl.de/trencher.html.
2. P. A. Abdulla, M. F. Atig, Y.-F. Chen, C. Leonardsson, and A. Rezine. Counter-
Example Guided Fence Insertion under TSO. In TACAS, volume 7214 of LNCS,
pages 204–219. Springer, 2012.
3. J. Alglave. A Shared Memory Poetics. PhD thesis, University Paris 7, 2010.
4. J. Alglave, D. Kroening, V. Nimal, and D. Poetzl. Don’t Sit on the Fence - A
Static Analysis Approach to Automatic Fence Insertion. In CAV, volume 8559 of
LNCS, pages 508–524. Springer, 2014.
5. J. Alglave, D. Kroening, V. Nimal, and M. Tautschnig. Software Verification for
Weak Memory via Program Transformation. In ESOP, volume 7792 of LNCS,
pages 512–532. Springer, 2013.
6. J. Alglave, D. Kroening, and M. Tautschnig. Partial Orders for Efficient BMC of
Concurrent Software. CoRR, abs/1301.1629, 2013.
7. J. Alglave and L. Maranget. Stability in Weak Memory Models. In CAV, volume
6806 of LNCS, pages 50–66. Springer, 2011.
8. M. F. Atig, A. Bouajjani, S. Burckhardt, and M. Musuvathi. On the Verification
Problem for Weak Memory Models. In POPL, pages 7–18. ACM, 2010.
9. M. F. Atig, A. Bouajjani, and G. Parlato. Getting Rid of Store-Buffers in TSO
Analysis. In CAV, volume 6806 of LNCS, pages 99–115. Springer, 2011.
10. A. Bouajjani, E. Derevenetc, and R. Meyer. Checking and Enforcing Robustness
against TSO. In ESOP, volume 7792 of LNCS, pages 533–553. Springer, 2013.
11. A. Bouajjani, R. Meyer, and E. Möhlmann. Deciding Robustness against Total
Store Ordering. In ICALP, volume 6756 of LNCS, pages 428–440. Springer, 2011.
12. S. Burckhardt and M. Musuvathi. Effective Program Verification for Relaxed
Memory Models. In CAV, volume 5123 of LNCS, pages 107–120. Springer, 2008.
13. J. Burnim, C. Stergiou, and K. Sen. Sound and Complete Monitoring of Sequential
Consistency for Relaxed Memory Models. In TACAS, volume 6605 of LNCS, pages
11–25. Springer, 2011.
14. E. Clarke, D. Kroening, and F. Lerda. A Tool for Checking ANSI-C Programs. In
TACAS, volume 2988 of LNCS, pages 168–176. Springer, 2004.
15. D. Dice. A race in LockSupport park() arising from Weak Memory Models.
https://blogs.oracle.com/dave/entry/a_race_in_locksupport_park.
16. M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. MKP, 2008.
17. G. J. Holzmann. The Model Checker SPIN. IEEE Tr. Sof. Eng., 23:279–295, 1997.
18. D. Kozen. Lower Bounds for Natural Proof Systems. In FOCS, pages 254–266.
IEEE Computer Society, 1977.
19. M. Kuperstein, M. Vechev, and E. Yahav. Partial-Coherence Abstractions for
Relaxed Memory Models. In PLDI, pages 187 – 198. ACM, 2011.
20. M. Kuperstein, M. T. Vechev, and E. Yahav. Automatic Inference of Memory
Fences. ACM SIGACT News, 43(2):108–123, 2012.
21. L. Lamport. How to Make a Multiprocessor Computer that Correctly Executes
Multiprocess Programs. IEEE Tr. on Com., 28(9):690–691, 1979.
22. L. Lamport. A Fast Mutual Exclusion Algorithm. ACM Tr. Com. Sys., 5(1), 1987.
23. A. Linden and P. Wolper. An Automata-Based Symbolic Approach for Verifying
Programs on Relaxed Memory Models. In SPIN, volume 6349 of LNCS, pages
212–226. Springer, 2010.
24. A. Linden and P. Wolper. A Verification-based Approach to Memory Fence Inser-
tion in Relaxed Memory Systems. In MCS, volume 6823 of LNCS, pages 144–160.
Springer, 2011.
25. S. Owens, S. Sarkar, and P. Sewell. A Better x86 Memory Model: x86-TSO (ex-
tended version). Technical Report CL-TR-745, University of Cambridge, 2009.
26. D. Shasha and M. Snir. Efficient and Correct Execution of Parallel Programs that
Share Memory. ACM Tr. on Prog. Lang. and Sys., 10(2):282–312, 1988.
A A Simple Safe Program
The program from Figure 12 is safe since no goal state is TSO-reachable: the
initial control states will never be left since the conditionals will never succeed.
However, the algorithm that we describe for Theorem 4 does not terminate for
this example. Although every Pk that unrolls the program in Figure 12 up to
k ∈ N is found safe, the algorithm only stops if a TSO-reachable state is found
or if O(R) = ε, which is never the case.
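[Figure 12: thread t1 has control states q0,1 and qf,1, with self-loops mem[x] ← 1 − r1 and r1 ← mem[y] on q0,1 and an edge assume r1 = 2 from q0,1 to qf,1. Thread t2 has control states q0,2 and qf,2, with self-loops mem[y] ← 1 − r2 and r2 ← mem[x] on q0,2 and an edge assume r2 = 2 from q0,2 to qf,2.]
Fig. 12. A safe program for which Algorithm 1 (as in Theorem 4) does not terminate.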
The underlying reason why O(R) ≠ ε always holds is that there are infinitely
many sequences (inststore)^m · instload, where inststore = (q0,1, mem[x] ← 1 − r1, q0,1),
instload = (q0,1, r1 ← mem[y], q0,1), and m ∈ N.
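Why the conditionals never succeed can be illustrated by a small value-set computation. The sketch below assumes that all registers and memory cells initially hold 0 (an assumption, since the initial values are not restated in this appendix). Every value a thread can load is either an initial value or a value 1 − r written by a store, independently of how stores are buffered. The function name possible_values is hypothetical.

```python
def possible_values(initial=0):
    """Least set of values containing `initial` and closed under v -> 1 - v."""
    values = {initial}
    frontier = [initial]
    while frontier:
        v = frontier.pop()
        stored = 1 - v              # value written by mem[x] <- 1 - r
        if stored not in values:
            values.add(stored)
            frontier.append(stored)
    return values
```

Under the stated assumption the set closes at {0, 1}, so neither assume r1 = 2 nor assume r2 = 2 can be satisfied, which agrees with the safety claim for Figure 12.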
B Proofs missing in Subsection 3.1
Prior to proving Lemma 1 we do a bit of preparation. We rely on computations
that delay flush events locally the least. Lemma 7 explains what this means.
Lemma 7 Let α ∈ CTSO(R) and t ∈ TID. There exists α̈ ∈ CTSO(R) such
that →hb (α) = →hb (α̈) and, for all events estore ↔ eflush within thread t, if
α̈↓t := αprefix · estore · α′ · eflush · αsuffix then either
(1) α′ := β · eload · β′ and all events e ∈ β′ are flushes,
or (2) all events e ∈ α′ are local assignments or conditionals.
Proof. Intuitively, the lemma states that flush events of thread t that are delayed
past same-thread local events may be delayed less without changing the
happens-before relation of the computation. Local events are assignments,
conditionals, and store events in the same thread.
Let α := α1 · estore · α2 · e · α3 · eflush · α4 such that estore ↔ eflush are events
of thread t, e is a local event in t and thread(e′) ≠ t for all events e′ ∈ α3.
We denote by α0 := α1 · estore · α2 · α3 · eflush · e · α4 the TSO computation that
first performs the flush eflush and then the event e. Notice that since α3 contains
no events e′ with thread(e′) = t, feasibility of computation α0 is ensured and
→hb (α) = →hb (α0) holds.
Starting with the last flush event in α, we use the above reordering of events
e to locally delay flush events less. In the end we obtain computation α̈ in which
no flush event of thread t can be locally delayed less. ⊓⊔
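A single reordering step from the proof of Lemma 7 can be sketched on a list-based encoding of computations. The encoding of events as (thread, kind, label) tuples, the kind names, and the function name delay_flush_less are assumptions made for this illustration. The caller is expected to ensure that the store matching the flush occurs before the local event e.

```python
LOCAL_KINDS = {"assign", "assume", "store"}  # local events of a thread


def delay_flush_less(alpha, e_index, flush_index, t):
    """Rewrite alpha1 . e . alpha3 . flush . alpha4 into alpha1 . alpha3 . flush . e . alpha4."""
    if not 0 <= e_index < flush_index < len(alpha):
        raise ValueError("the local event must occur before the flush")
    e, flush = alpha[e_index], alpha[flush_index]
    alpha3 = alpha[e_index + 1:flush_index]
    if e[0] != t or e[1] not in LOCAL_KINDS:
        raise ValueError("e must be a local event of thread t")
    if flush[0] != t or flush[1] != "flush":
        raise ValueError("expected a flush event of thread t")
    if any(event[0] == t for event in alpha3):
        raise ValueError("alpha3 must not contain events of thread t")
    return alpha[:e_index] + alpha3 + [flush, e] + alpha[flush_index + 1:]


# Example: the flush of thread 1 is moved in front of its local assignment.
alpha = [(1, "store", "st"), (1, "assign", "a"), (2, "load", "ld"), (1, "flush", "fl")]
alpha0 = delay_flush_less(alpha, 1, 3, 1)
```

Applying such steps repeatedly, starting with the last flush event, corresponds to the construction of α̈ described in the proof.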
Furthermore, in order to reference instructions of R ⊕ σ that the extension
adds we give an alternative description for some of the transition sequences in
the main text. Recall that variable count keeps track of the number of store
instructions processed along σ.
If cmd(insti) = mem[e] ← e′, we said count is incremented and instructions
that remember the value and address written in arcount and vrcount are added.
qi−1 −[arcount ← e]→ • −[vrcount ← e′]→ qi   (1)
If cmd(insti) = r ← mem[e] we said instructions are added that load from
memory only when a load from the simulated buffer is not possible. More
precisely, if some j ∈ [1, count] such that arj = e is found, r is assigned the value
of vrj. Otherwise, the register r receives its value from the address ê.
qi−1 −[assume arcount ≠ e]→ · · · −[assume ar1 ≠ e]→ • −[r ← mem[e]]→ qi,
with a branch −[assume arj = e]→ • −[r ← vrj]→ qi at every check j ∈ [1, count].
Alternatively, assuming qcheck,i,count := qi−1, this can be stated as adding
{(qcheck,i,count, assume arcount = e, qbuf,i,count)} (2)
⊎ {(qcheck,i,count, assume arcount ≠ e, qcheck,i,count−1)} (3)
⊎ {(qbuf,i,count, r ← vrcount, qi)} (4)
...
⊎ {(qcheck,i,1, assume ar1 = e, qbuf,i,1)} (5)
⊎ {(qcheck,i,1, assume ar1 ≠ e, qmem,i)} (6)
⊎ {(qbuf,i,1, r ← vr1, qi)} (7)
⊎ {(qmem,i, r ← mem[e], qi)} (8)
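The effect of instructions (1) and (2–8) can be sketched at the level of values. In the sketch below, the list aux plays the role of the auxiliary register pairs (ar1, vr1), ..., (arcount, vrcount), oldest first; the function names track_store and emulated_load and the dictionary-based memory are assumptions made for illustration.

```python
def track_store(aux, address, value):
    """Emulate (1): remember address and value of a delayed store."""
    aux.append((address, value))       # count is len(aux) after the append


def emulated_load(aux, memory, address):
    """Emulate (2-8): read the newest matching auxiliary pair, else memory."""
    for ar, vr in reversed(aux):       # checks run from j = count down to j = 1
        if ar == address:              # assume ar_j = e succeeded: r <- vr_j
            return vr
    return memory[address]             # all checks failed: r <- mem[e]


# Example: two delayed stores to x and one to y, memory still holds old values.
aux = []
memory = {"x": 0, "y": 0, "z": 7}
track_store(aux, "x", 1)
track_store(aux, "y", 5)
track_store(aux, "x", 2)
```

Searching from count downwards mirrors the order of the checks (2), (3), ..., (5), (6): the most recently delayed store to an address takes precedence, as it would in a store buffer.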
We said that out of control state qn we create a sequence of stores to flush the
contents of the auxiliary registers and return to the code of the original thread.
qn −[mem[ar1] ← vr1]→ · · · −[mem[armax] ← vrmax]→ dst(instn)
Alternatively, we could have stated it as adding
{(qn, mem[ar1]← vr1, qflush,1)} (9)
...
⊎ {(qflush,max−1, mem[armax]← vrmax, dst(instn))} (10)
Furthermore, for all instructions inst ∈ It with src(inst) = src(insti) for
some i ∈ [1..n] and for which inst ≠ insti we added instructions that flush the
stores buffered in the auxiliary registers and return to dst(inst).
qi −[mem[ar1] ← vr1]→ · · · −[mem[arcount] ← vrcount]→ • −[cmd(inst)]→ dst(inst)
Alternatively, we could have stated it as adding
{(qi, mem[ar1]← vr1, qnext,i,1)} (11)
...
⊎ {(qnext,i,count−1, mem[arcount]← vrcount, qnext,i,count)} (12)
⊎ {(qnext,i,count, cmd(inst), dst(inst))} (13)
Finally, for all load instructions insti, where i < n, as well as out of q1 we
added instructions that flush and fence the pair (ar1, vr1), make the remaining
buffered stores in the auxiliary registers visible, and return to q. Here q :=
src(insti) in the load case and q := dst(inst1) otherwise.
qi −[mem[ar1] ← vr1]→ • −[mfence]→ · · · −[mem[arcount] ← vrcount]→ q
Alternatively, we could have stated it as adding
{(qi, mem[ar1]← vr1, qfence,i)} (14)
⊎ {(qfence,i, mfence, qorig,i,2)} (15)
⊎ {(qorig,i,2, mem[ar2]← vr2, qorig,i,3)} (16)
...
⊎ {(qorig,i,count, mem[arcount]← vrcount, q)} (17)
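The instruction set (14–17) can be generated mechanically. The sketch below produces the transitions as (source, command, destination) triples with string-encoded state names; the function name fence_return_transitions and the string encoding are hypothetical. The treatment of count = 1, where the fence leads directly to q, is an assumption, since the equations above are stated for count ≥ 2.

```python
def fence_return_transitions(i, count, q):
    """Build the transitions (14)-(17) for control state q_i."""
    if count < 1:
        raise ValueError("at least one delayed store is required")
    # Assumption: with a single delayed store the fence returns to q directly.
    after_fence = f"q_orig,{i},2" if count >= 2 else q
    transitions = [
        (f"q_{i}", "mem[ar_1] <- vr_1", f"q_fence,{i}"),   # (14)
        (f"q_fence,{i}", "mfence", after_fence),           # (15)
    ]
    for j in range(2, count + 1):                          # (16)-(17)
        target = q if j == count else f"q_orig,{i},{j + 1}"
        transitions.append((f"q_orig,{i},{j}", f"mem[ar_{j}] <- vr_{j}", target))
    return transitions


example = fence_return_transitions(i=2, count=3, q="q")
```

For count = 3 this yields four transitions: the first store, the fence, and two further stores, the last of which returns to q.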
We can now turn to the actual proof of Lemma 1.
Proof (of Lemma 1). Assume t is the thread of σ := inst1 · . . . · instn, XTSO(R ⊕ σ) :=
(E⊕, S⊕, ∆TSO, s⊕, F⊕), I and Q are the instructions and states of R, DOM and
REG are the addresses and registers used by R, and I⊕ are the instructions I′t of
R ⊕ σ as described in Section 3.
A direct result of Lemmas 7 and 8 is that TSO computations of R that delay
flushes of t locally the least reach all the states in the set ReachTSO(R). Assume
α ∈ CTSO(R) is a computation where flushes of t are delayed locally the least as
Lemma 7 describes and let s0, . . . , sm ∈ STSO for some m ∈ N be all the states
along the transition sequence s0 −α→ s, i.e., the sequence starts in the initial
state s0 and sm := s. Also, for all k ∈ [0, m], let αk denote the prefix of α
with s0 −αk→ sk.
We prove by induction over state indexes k ∈ [0, m] that there exist prefixes
βk of β ∈ CTSO(R ⊕ σ) and states s′0, . . . , s′m ∈ S⊕ along s⊕ −β→ s′ ∈ ∆∗TSO with
s′0 := s⊕ and s′m := s′ such that the following invariants hold:
Inv-0 s0 −α′→ (pc, val, buf) and s⊕ −β′→ (pc′, val′, buf′).
Inv-1 If pc and pc′ differ then they only differ for thread t. Moreover, if
pc(t) ≠ pc′(t) then pc(t) = dst(insti) and pc′(t) = qi for some i ∈ [1..n − 1].
Inv-2 val′(a) = val(a) for all a ∈ DOM ∪ REG.
Inv-3 buf and buf′ differ at most for t. Furthermore, if buf(t) ≠ buf′(t) then
pc′(t) = qi for some i ∈ [1..n − 1] and buf(t) = (ârcount, v̂rcount) · . . . · (âr1, v̂r1) ·
buf′(t) where count stores are seen along σ from src(inst1) to dst(insti).
For the induction base case k = 0, α0 = ε, so the state reached is the initial
state s0, with pc = pc0, val = val0, and buf = buf0. Then, for β0 := ε and
s′0 = s⊕, invariants Inv-0...3 hold.
For the induction step case, assume that invariants Inv-0...3 hold for
k < m and that sk −e→ sk+1 := (pc+, val+, buf+) for some e ∈ E. We use a
case distinction over possible events e to define βk+1 such that
s′0 −βk+1→ s′k+1 := (pc′+, val′+, buf′+) and invariants Inv-0...3 hold for k + 1.
If thread(e) := t′ ≠ t it means inst(e) ∈ I⊕ is enabled in pc′(t′), so there
exist e′ ∈ E⊕ and s′k+1 ∈ S⊕ such that inst(e′) := inst(e) and (s′k, e′, s′k+1) ∈
∆TSO in XTSO(R ⊕ σ). We define βk+1 := βk · e′ and find that, by the ∆TSO
semantics (Figure 2) and under the assumption that invariants Inv-0...3 hold
for k, invariants Inv-0...3 also hold for k + 1.
If thread(e) = t we make the following case distinction over e and pc′(t).
Case 1 “e is a flush event.” This first case deals with the possibility that a store
operation is flushed. Depending on whether buf′(t) ≠ ε, we either flush the oldest
address-value pair of buf′(t) or the first address-value auxiliary registers pair. By
Lemma 7, the latter case can only happen when pc′(t) = qi for some i ∈ [2..n − 1]
and insti performs a load or i = 1.
If buf′(t) ≠ ε we flush the oldest write access buffered. Namely, let eflush ∈ E⊕
and s′k+1 ∈ S⊕ such that, according to rule (WM), (s′k, eflush, s′k+1) ∈ ∆TSO.
We define βk+1 := βk · eflush and invariants Inv-0...3 hold for k + 1 since
(0) Inv-0,3 hold for k so s0 −αk+1→ sk+1 and s′0 −βk+1→ s′k+1, implying Inv-0
holds for k + 1.
(1) Inv-1 holds for k, pc+(t) = pc(t), and pc′+(t) = pc′(t), so Inv-1 holds for
k + 1.
(2) Inv-2,3 hold for k, so events e and eflush update the same address by the
same value and Inv-2 holds for k + 1.
(3) Inv-3 holds for k and events e and eflush remove one address-value pair
from both buf(t) and buf′(t), so Inv-3 holds for k + 1.
Otherwise, buf′(t) = ε and count stores are encountered from src(inst1) to
pc′(t) = qi for some i ∈ [1..n − 1]. Then buf(t) = (ârcount, v̂rcount) · . . . · (âr1, v̂r1)
and, by Lemma 7, we know insti is either the first store inst1 of σ or a load.
Either way, let e1, . . . , ecount, eflush, efence ∈ E⊕ match equations (14–17) in the
extension and s′k+1 ∈ S⊕ such that events ej are, for all j ∈ [1..count], the
buffering events for the stores (14,16–17), eflush is the flush event for the store
(14), efence is the event for the fence (15), and
s′k −e1·eflush·efence·e2·...·ecount→ s′k+1 ∈ ∆∗TSO
according to rules (ST,MEM,F) in Figure 2. We then define βk+1 :=
βk · e1 · eflush · efence · e2 · . . . · ecount · e and find that invariants Inv-0...3 hold
for k + 1 since
(0) Inv-0,3 hold for k so s0 −αk+1→ sk+1 and s′0 −βk+1→ s′k+1, i.e. Inv-0 holds
for k + 1.
(1) Inv-1 holds for k and pc+(t) = q = pc′+(t), where q := src(insti) if insti
is a load and q := dst(inst1) otherwise, so Inv-1 holds for k + 1.
(2) Inv-2,3 hold for k, events e and eflush update the same address by the
same value and, since the other events do not update any address, Inv-2 holds
for k + 1.
(3) Inv-3 holds for k, and events e2, . . . , ecount place the corresponding
address-value pairs that match buf+(t) into buf′+(t), so Inv-3 holds for k + 1.
Case 2 “e is not a flush event, pc′(t) = qi for i ∈ [1..n − 1], inst(e) ≠ insti+1.”
Event e corresponds to an instruction that does not follow σ. Then, events for
instructions (11–13) place the auxiliary address-value pairs into buf′+(t) and
then perform cmd(inst(e)). Let e1, . . . , ecount, e′ ∈ E⊕ and s′k+1 ∈ S⊕ such that
ej are, for all j ∈ [1..count], the buffering events for stores (11–12), e′ is the
event for instruction (13), and s′k −e1·...·ecount·e′→ s′k+1 ∈ ∆∗TSO, according to the
Figure 2 rules. We define βk+1 := βk · e1 · . . . · ecount · e′ and find that invariants
Inv-0...3 hold for k + 1 since
(0) Inv-0 holds for k so s0 −αk+1→ sk+1 and s′0 −βk+1→ s′k+1, i.e. Inv-0 holds for
k + 1.
(1) Inv-1 holds for k and pc+(t) = dst(inst(e)) = pc′+(t), so Inv-1 holds for
k + 1.
(2) Inv-2 holds for k and the events e and e′ update at most one REG register
by the same value, so Inv-2 holds for k + 1.
(3) Inv-3 holds for k, the buffering store events e1, . . . , ecount make the
address-value pairs of the auxiliary registers explicit in buf′+(t), and if events
e and e′ are buffering events for stores then they add the same address-value
pair, so Inv-3 holds for k + 1.
Case 3 “inst(e) performs a store and Case 2 fails.” We analyze the following
subcases depending on the value of pc′(t).
Case 3a “pc′(t) = qi−1 for some i ∈ [1..n − 1].” Since Case 2 does not hold, inst(e) =
insti and auxiliary registers track the store insti. Let ea, ev ∈ E⊕ be events for
the instructions in (1) and s′k+1 ∈ S⊕ such that s′k −ea·ev→ s′k+1 ∈ ∆∗TSO according
to the ∆TSO rule for local assignments. We define βk+1 := βk · ea · ev and find
that invariants Inv-0...3 hold for k + 1 since
(0) Inv-0 holds for k so s0 −αk+1→ sk+1 and s′0 −βk+1→ s′k+1, i.e. Inv-0 holds for
k + 1.
(1) Inv-1 holds for k, pc+(t) = dst(insti), and pc′+(t) = qi, so Inv-1 holds
for k + 1.
(2) Inv-2 holds for k and no memory changes occurred outside of auxiliary
registers, so Inv-2 holds for k + 1.
(3) Inv-3 holds for k and (ârcount, v̂rcount) matches the address-value pair
added by e to buf+(t), so Inv-3 holds for k + 1.
Case 3b “pc′(t) = pc(t) ≠ src(inst1).” This case is similar to the one when
thread(e) ≠ t since inst(e) ∈ I⊕. Then there exist e′ ∈ E⊕ and s′k+1 ∈ S⊕
such that inst(e′) = inst(e) and (s′k, e′, s′k+1) ∈ ∆TSO in XTSO(R ⊕ σ). We
define βk+1 := βk · e′ and find that, by the ∆TSO semantics (Figure 2), invariants
Inv-0...3 continue to hold for k + 1.
Case 4 “inst(e) performs a load and Case 2 fails.” We analyze the following
subcases depending on the value of pc′(t).
Case 4a “pc′(t) = qi−1 for some i ∈ [1..n − 1].” Since Case 2 does not hold, inst(e) =
insti and we use (4–7,8) to load from e only when no register arj matches e for
any j ∈ [1..count].
If there exists a largest j ∈ [1..count] such that arj = e then r will take
its value from the auxiliary register vrj. Let ecount, . . . , ej, eassign ∈ E⊕ and
s′k+1 ∈ S⊕ such that ek are, for all k ∈ [j + 1..count], the events for negative
conditional checks (3,6), ej is the event for the earliest positive conditional check
(2,5), eassign is the event for an instruction (4,7), and
s′k −ecount·...·ej·eassign→ s′k+1 ∈ ∆∗TSO
according to the rules for conditionals and local assignments in ∆TSO. We
define βk+1 := βk · ecount · . . . · ej · eassign and find that the invariants Inv-0...3
hold for k + 1 since
(0) Inv-0 holds for k so s0 −αk+1→ sk+1 and s′0 −βk+1→ s′k+1, i.e. Inv-0 holds for
k + 1.
(1) Inv-1 holds for k, pc+(t) = dst(insti), and pc′+(t) = qi, so Inv-1 holds for
k + 1.
(2) Inv-2 holds for k, both e and eassign update r by the same value, and no
other event ecount, . . . , ej changes any address, so Inv-2 holds for k + 1.
(3) Inv-3 holds for k and no event alters buffer contents, so Inv-3 holds for
k + 1.
Otherwise, arj ≠ e holds for all j ∈ [1..count] and the register r will take its
value from the address indicated by e. Namely, let ecount, . . . , e1, eload ∈ E⊕
and s′k+1 ∈ S⊕ such that ek are, for all k ∈ [1..count], the events for
negative conditional checks (3,6), eload is the event for instruction (8), and
s′k −ecount·...·e1·eload→ s′k+1 ∈ ∆∗TSO according to the rule for conditionals in ∆TSO
and (LB/LM). We define βk+1 := βk · ecount · . . . · e1 · eload and find that invariants
Inv-0...3 hold for k + 1:
(0) Inv-0 holds for k so s0 −αk+1→ sk+1 and s′0 −βk+1→ s′k+1, i.e. Inv-0 holds for
k + 1.
(1) Inv-1 holds for k, pc+(t) = dst(insti), and pc′+(t) = qi, so Inv-1 holds for
k + 1.
(2) Inv-2 holds for k, both e and eload update r by the same value, and no
other event ecount, . . . , e1 changes any address, so Inv-2 holds for k + 1.
(3) Inv-3 holds for k and no event alters buffer contents, so Inv-3 holds for
k + 1.
Case 4b “pc′(t) = qn−1.” Since Case 2 does not hold, inst(e) = instn. Furthermore,
because count = max, in addition to performing the events that simulate the load
behavior as in subcase 4a, the extension returns to the original program flow
using events for (9–10) and makes the auxiliary registers address-value pairs
explicit in buf′+(t).
Let e′1, . . . , e′max ∈ E⊕ and s′k+1 ∈ S⊕ such that e′k are, for all k ∈ [1..max],
the buffering events for stores (9,10), and s′′k+1 −e′1·...·e′max→ s′k+1 ∈ ∆∗TSO according
to (LS) from Figure 2, with s′′k+1 being notation for s′k+1 from 4a. We define
βk+1 := β′k+1 · e′1 · . . . · e′max, where β′k+1 is notation for βk+1 from 4a, and find
that the invariants Inv-0...3 hold for k + 1 since
(0) Inv-0 holds for k so s0 −αk+1→ sk+1 and s′0 −βk+1→ s′k+1, i.e. Inv-0 holds for
k + 1.
(1) Inv-1 holds for k and pc+(t) = dst(instn) = pc′+(t), so Inv-1 holds for
k + 1.
(2) Inv-2 holds for k, both events e and eload update r by the same value, and
no other event ecount, . . . , e1, e′1, . . . , e′max changes any address, so Inv-2 holds for
k + 1.
(3) Inv-3 holds for k and events e′1, . . . , e′max place the corresponding
address-value pairs that match buf+(t) into buf′+(t), so Inv-3 holds for k + 1.
Case 4c “pc′(t) = pc(t).” This case is similar to 3b. Let e′ ∈ E⊕ and s′k+1 ∈ S⊕
such that inst(e′) = inst(e) and (s′k, e′, s′k+1) ∈ ∆TSO in XTSO(R ⊕ σ). We
define βk+1 := βk · e′ and find that, by the ∆TSO semantics (Figure 2), the
invariants Inv-0...3 hold for k + 1.
Case 5 “e performs an assignment, conditional, or memory fence and Case 2 fails.”
We analyze the following subcases.
Case 5a “pc′(t) = qi−1 for i ∈ [1..n − 1].” Since Case 2 does not hold, inst(e) = insti is
either a conditional or an assignment.
If cmd(insti) = r ← e let e′ ∈ E⊕ and s′k+1 ∈ S⊕ such that inst(e′) =
(qi−1, r ← e, qi) and (s′k, e′, s′k+1) ∈ ∆TSO by the ∆TSO rule for local
assignments. We define βk+1 := βk · e′ and find that the invariants Inv-0...3 hold for
k + 1 since
(0) Inv-0 holds for k so s0 −αk+1→ sk+1 and s′0 −βk+1→ s′k+1, i.e. Inv-0 holds for
k + 1.
(1) Inv-1 holds for k, pc+(t) = dst(insti), and pc′+(t) = qi, so Inv-1 holds for
k + 1.
(2) Inv-2 holds for k and e is evaluated the same by both e and e′, so the
register r is updated by the same value and Inv-2 holds for k + 1.
(3) Inv-3 holds for k and no event alters buffer contents, so Inv-3 holds for
k + 1.
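Otherwise, $\mathit{cmd}(\mathit{inst}_i) = \mathsf{assume}\ e$. Let $e' \in E_\oplus$ and $s'_{k+1} \in S_\oplus$ such that
$\mathit{inst}(e') = (q_{i-1}, \mathsf{assume}\ e, q_i)$ and $(s'_k, e', s'_{k+1}) \in \Delta_{\mathrm{TSO}}$ by the $\Delta_{\mathrm{TSO}}$ rule for
conditionals. We define $\beta_{k+1} := \beta_k \cdot e'$ and find that the invariants Inv-0...3
hold for k + 1 since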
(0) Inv-0 holds for k so $s_0 \xrightarrow{\alpha_{k+1}} s_{k+1}$ and $s'_0 \xrightarrow{\beta_{k+1}} s'_{k+1}$, i.e. Inv-0 holds for
k + 1.
(1) Inv-1 holds for k, $\mathit{pc}_+(t) = \mathit{dst}(\mathit{inst}_i)$, and $\mathit{pc}'_+(t) = q_i$, so Inv-1 holds for
k + 1.
(2) Inv-2 holds for k and neither $e$ nor $e'$ changes any address, so Inv-2
holds for k + 1.
(3) Inv-3 holds for k and no event alters buffer contents, so Inv-3 holds for
k + 1.
5b “$\mathit{pc}'(t) = \mathit{pc}(t)$.” This case covers the remaining possibilities when e is an
assignment, conditional, or memory fence. Similar to cases 3b and 4c, let
$e' \in E_\oplus$ and $s'_{k+1} \in S_\oplus$ such that $\mathit{inst}(e') = \mathit{inst}(e)$ and $(s'_k, e', s'_{k+1}) \in \Delta_{\mathrm{TSO}}$
in $X_{\mathrm{TSO}}(R \oplus \sigma)$. We define $\beta_{k+1} := \beta_k \cdot e'$ and find that, by the $\Delta_{\mathrm{TSO}}$ semantics
(Figure 2), invariants Inv-0...3 hold for k + 1.
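The above case distinction covers all possibilities for events e that α may
perform from $s_k$. Hence, by complete induction, the extension does not remove
TSO-reachable states: if $s = (\mathit{pc}, \mathit{val}, \mathit{buf})$ is reachable by α, then there exist
$s' = (\mathit{pc}', \mathit{val}', \mathit{buf}')$ and $\beta \in C_{\mathrm{TSO}}(R \oplus \sigma)$ such that $s'$ is reachable by β in $R \oplus \sigma$,
$\mathit{pc} = \mathit{pc}'$, $\mathit{val}(a) = \mathit{val}'(a)$ for all $a \in \mathit{DOM} \cup \mathit{REG}$, and both $\mathit{buf}$ and $\mathit{buf}'$ are empty.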
For the reverse direction, let $f_\tau : C_{\mathrm{TSO}}(R) \to C_{\mathrm{TSO}}(R \oplus \tau)$ be the map
$\alpha \mapsto \beta$ that the inductive proof implies, and let $f_\tau : E \to E_\oplus^*$ denote its
restriction to events matching the different inductive cases. Furthermore, consider
computations $\beta \in C_{\mathrm{TSO}}(R \oplus \sigma)$ that do not interleave events of other threads
within the events of sequences $f_\tau(e)$. Such computations reach the entire set
$\mathit{Reach}_{\mathrm{TSO}}(R \oplus \sigma)$. For example, since local events $e_{\mathit{count}}, \ldots, e_1$ as in case 4a that
precede $e_{\mathit{load}}$ can be performed right before $e_{\mathit{load}}$, the above restriction does not
change the set of TSO-reachable states in $R \oplus \sigma$. Note that $f_\tau$ is a bijection
between such computations β and computations $\alpha \in C_{\mathrm{TSO}}(R)$ that delay flushes
locally the least with respect to t. Another induction can show that for each computation β
as described above there exists a computation $\alpha \in C_{\mathrm{TSO}}(R)$ such that invariants
Inv-0...3 hold for prefixes of β and α. This implies that the extension by σ does
not add TSO-reachable states. □
C TSO Semantics and Proofs missing in Section 4
Figure 13 describes the full TSO semantics. For completeness, states s ∈ SM use
the additional event counter $\mathit{ec} : \mathit{TID} \to \mathbb{N}$ to identify events. This is used, e.g.,
to define matching stores and flushes and does not affect our results in any way.
As mentioned in subsection 2.1, under SC, stores are flushed immediately:
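\[
\frac{\mathit{cmd} = \mathit{mem}[e_a] \leftarrow e_v,\quad a = \widehat{e_a},\quad v = \widehat{e_v},\quad \mathit{id} = \mathit{ec}(t)}
     {s \xrightarrow{(t,\,\mathit{id},\,\mathit{inst},\,a)\,(t,\,\mathit{id},\,\mathit{flush},\,a)} (\mathit{ec}', \mathit{pc}', \mathit{val}[a := v], \mathit{buf})}
\quad \text{(LSWM)}
\]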
Lemma 8. If $\alpha, \beta \in C_{\mathrm{TSO}}(P)$, $s_0 \xrightarrow{\alpha} s$, and $\to_{hb}(\alpha) = \to_{hb}(\beta)$, then $s_0 \xrightarrow{\beta} s$.
Proof. Assume $s_0 \xrightarrow{\beta} s'$. Since α and β have the same program order $\to_{po}$,
s and s′ have the same event counter ec and program counter pc.
Moreover, since α and β have the same conflict order $\to_{cf}$, s and s′ have the
same memory valuation val. Finally, since computations α and β empty the
buffers, s and s′ have empty buffers. In conclusion, s = s′. □
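The argument of Lemma 8 can be illustrated on a toy model. The following Python sketch is not the paper's formalization: it assumes that events are store and flush events only, that program order is the per-thread event sequence, and that conflict order is the per-address sequence of flushes. The names `replay`, `program_order`, and `conflict_order` and the event encoding are hypothetical.

```python
def replay(computation):
    """Replay store/flush events; return (event counters, memory valuation)."""
    counters, memory, buffers = {}, {}, {}
    for thread, kind, addr, value in computation:
        if kind == "store":
            # A store is appended to the issuing thread's FIFO buffer.
            buffers.setdefault(thread, []).append((addr, value))
            counters[thread] = counters.get(thread, 0) + 1
        elif kind == "flush":
            # A flush moves the oldest buffered store to memory.
            oldest = buffers[thread].pop(0)
            assert oldest == (addr, value), "flush must match oldest store"
            memory[addr] = value
    # Computations are assumed to leave all buffers empty.
    assert not any(buffers.values())
    return counters, memory


def program_order(computation):
    """Per-thread sequence of events (assumed stand-in for ->po)."""
    po = {}
    for event in computation:
        po.setdefault(event[0], []).append(event)
    return po


def conflict_order(computation):
    """Per-address sequence of flushes (assumed stand-in for ->cf)."""
    cf = {}
    for event in computation:
        if event[1] == "flush":
            cf.setdefault(event[2], []).append(event)
    return cf


# alpha and beta interleave the two stores differently but flush in the
# same order; gamma swaps the two flushes to x.
alpha = [(0, "store", "x", 1), (1, "store", "x", 2),
         (0, "flush", "x", 1), (1, "flush", "x", 2)]
beta = [(1, "store", "x", 2), (0, "store", "x", 1),
        (0, "flush", "x", 1), (1, "flush", "x", 2)]
gamma = [(0, "store", "x", 1), (1, "store", "x", 2),
         (1, "flush", "x", 2), (0, "flush", "x", 1)]
```

In this toy model, `alpha` and `beta` agree on both orders and `replay` returns the same result for them, whereas `gamma` has the same program order but a different conflict order and ends with a different value for `x`.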
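\[
\frac{\mathit{cmd} = r \leftarrow \mathit{mem}[e_a],\quad a = \widehat{e_a},\quad \mathit{buf}(t){\downarrow}(\mathbb{N} \times \{a\} \times \mathit{DOM}) = (\mathit{id}, a, v) \cdot \beta}
     {s \xrightarrow{(t,\,\mathit{ec}(t),\,\mathit{inst},\,a)} (\mathit{ec}', \mathit{pc}', \mathit{val}[r := v], \mathit{buf})}
\quad \text{(RB)}
\]
\[
\frac{\mathit{cmd} = r \leftarrow \mathit{mem}[e_a],\quad a = \widehat{e_a},\quad \mathit{buf}(t){\downarrow}(\mathbb{N} \times \{a\} \times \mathit{DOM}) = \varepsilon,\quad v = \mathit{val}(a)}
     {s \xrightarrow{(t,\,\mathit{ec}(t),\,\mathit{inst},\,a)} (\mathit{ec}', \mathit{pc}', \mathit{val}[r := v], \mathit{buf})}
\quad \text{(RM)}
\]
\[
\frac{\mathit{cmd} = \mathit{mem}[e_a] \leftarrow e_v,\quad a = \widehat{e_a},\quad v = \widehat{e_v},\quad \mathit{id} = \mathit{ec}(t)}
     {s \xrightarrow{(t,\,\mathit{id},\,\mathit{inst},\,a)} (\mathit{ec}', \mathit{pc}', \mathit{val}, \mathit{buf}[t := (\mathit{id}, a, v) \cdot \mathit{buf}(t)])}
\quad \text{(LS)}
\]
\[
\frac{\mathit{buf}(t) = \beta \cdot (\mathit{id}, a, v)}
     {s \xrightarrow{(t,\,\mathit{id},\,\mathit{flush},\,a)} (\mathit{ec}, \mathit{pc}, \mathit{val}[a := v], \mathit{buf}[t := \beta])}
\quad \text{(WM)}
\]
\[
\frac{\mathit{cmd} = \mathsf{mfence},\quad \mathit{buf}(t) = \varepsilon}
     {s \xrightarrow{(t,\,\mathit{ec}(t),\,\mathit{inst},\,\bot)} (\mathit{ec}', \mathit{pc}', \mathit{val}, \mathit{buf})}
\quad \text{(LF)}
\]
\[
\frac{\mathit{cmd} = r \leftarrow e,\quad v = \widehat{e}}
     {s \xrightarrow{(t,\,\mathit{ec}(t),\,\mathit{inst},\,\bot)} (\mathit{ec}', \mathit{pc}', \mathit{val}[r := v], \mathit{buf})}
\quad \text{(LA)}
\]
\[
\frac{\mathit{cmd} = \mathsf{assume}\ e,\quad \widehat{e} \neq 0}
     {s \xrightarrow{(t,\,\mathit{ec}(t),\,\mathit{inst},\,\bot)} (\mathit{ec}', \mathit{pc}', \mathit{val}, \mathit{buf})}
\quad \text{(LC)}
\]
Fig. 13. Transition rules for $X_{\mathrm{TSO}}(P)$ assuming $s = (\mathit{ec}, \mathit{pc}, \mathit{val}, \mathit{buf})$ with $\mathit{pc}(t) = q$,
$\mathit{inst} = q \xrightarrow{\mathit{cmd}} q'$ in thread t, $\mathit{ec}' = \mathit{ec}[t := \mathit{ec}(t) + 1]$, $\mathit{pc}' = \mathit{pc}[t := q']$. We use $\widehat{e}$ to
evaluate e under val and $\mathit{buf}(t){\downarrow}(\mathbb{N} \times \{a\} \times \mathit{DOM})$ for stores in buf(t) that access a.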
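The rules of Fig. 13 can be read operationally. The following Python sketch explores all TSO computations of a small straight-line program under simplifying assumptions: event counters and event labels are omitted, expressions are constants, assignments and conditionals (LA, LC) are left out, and the program counter is an index into an instruction list. The instruction encoding and all function names are hypothetical.

```python
def update(val, key, value):
    """Return a copy of the valuation `val` with `key` mapped to `value`."""
    d = dict(val)
    d[key] = value
    return tuple(sorted(d.items()))


def set_at(tup, i, item):
    """Return a copy of `tup` with position `i` replaced by `item`."""
    return tup[:i] + (item,) + tup[i + 1:]


def successors(state, program):
    """Yield all TSO successors of `state` (buffers keep newest store first)."""
    pcs, val, bufs = state
    mem = dict(val)
    for t, code in enumerate(program):
        buf = bufs[t]
        if buf:  # (WM): flush the oldest store, i.e. the last buffer entry
            a, v = buf[-1]
            yield (pcs, update(val, a, v), set_at(bufs, t, buf[:-1]))
        if pcs[t] == len(code):
            continue
        instr = code[pcs[t]]
        next_pcs = set_at(pcs, t, pcs[t] + 1)
        if instr[0] == "store":  # (LS): prepend the store to the buffer
            _, a, v = instr
            yield (next_pcs, val, set_at(bufs, t, ((a, v),) + buf))
        elif instr[0] == "load":
            _, r, a = instr
            pending = [v for (b, v) in buf if b == a]
            # (RB): newest buffered store to a, else (RM): read memory
            v = pending[0] if pending else mem[a]
            yield (next_pcs, update(val, r, v), bufs)
        elif instr[0] == "mfence" and not buf:  # (LF): needs an empty buffer
            yield (next_pcs, val, bufs)


def final_states(program, init):
    """Valuations of all reachable states with finished threads, empty buffers."""
    start = (tuple(0 for _ in program), tuple(sorted(init.items())),
             tuple(() for _ in program))
    seen, stack, finals = {start}, [start], set()
    while stack:
        state = stack.pop()
        pcs, val, bufs = state
        if all(pc == len(c) for pc, c in zip(pcs, program)) and not any(bufs):
            finals.add(val)
        for nxt in successors(state, program):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return finals


INIT = {"x": 0, "y": 0, "r1": 0, "r2": 0}
SB = [[("store", "x", 1), ("load", "r1", "y")],
      [("store", "y", 1), ("load", "r2", "x")]]
SB_FENCED = [[("store", "x", 1), ("mfence",), ("load", "r1", "y")],
             [("store", "y", 1), ("mfence",), ("load", "r2", "x")]]


def register_outcomes(program):
    """Set of (r1, r2) pairs over all final states of `program`."""
    return {(d["r1"], d["r2"]) for d in map(dict, final_states(program, INIT))}
```

On the store-buffering program `SB`, the outcome `(0, 0)` is among `register_outcomes(SB)`, which is the well-known behavior that SC forbids. With a fence between store and load in each thread (`SB_FENCED`), rule (LF) forces the buffer to drain first and the outcome disappears.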
