Reasoning About TSO Programs Using Reduction and Abstraction by Bouajjani, Ahmed et al.
Reasoning About TSO Programs Using
Reduction and Abstraction
Ahmed Bouajjani1, Constantin Enea1, Suha Orhun Mutluergil2, and Serdar
Tasiran3
1 IRIF, University Paris Diderot & CNRS, {abou,cenea}@irif.fr,
2 Koc University, smutluergil@ku.edu.tr
3 Amazon Web Services, tasirans@amazon.com
Abstract. We present a method for proving that a program running un-
der the Total Store Ordering (TSO) memory model is robust, i.e., all its
TSO computations are equivalent to computations under the Sequential
Consistency (SC) semantics. This method is inspired by Lipton’s reduction
theory for proving atomicity of concurrent programs. For programs
which are not robust, we introduce an abstraction mechanism that allows us
to construct robust programs over-approximating their TSO semantics.
This enables the use of proof methods designed for the SC semantics in
proving invariants that hold on the TSO semantics of a non-robust pro-
gram. These techniques have been evaluated on a large set of benchmarks
using the infrastructure provided by CIVL, a generic tool for reasoning
about concurrent programs under the SC semantics.
1 Introduction
A classical memory model for shared-memory concurrency is Sequential Con-
sistency [15] (SC), where the actions of different threads are interleaved while
the program order between actions of each thread is preserved. For performance
reasons, modern multiprocessors implement weaker memory models, e.g., Total
Store Ordering (TSO) [19] in x86 machines, which relax the program order. For
instance, the main feature of TSO is the write-to-read relaxation, which allows
reads to overtake writes. This relaxation reflects the fact that writes are buffered
before being flushed non-deterministically to the main memory.
Nevertheless, most programmers usually assume that memory accesses hap-
pen instantaneously and atomically like in the SC memory model. This assump-
tion is safe for data-race free programs [3]. However, many programs employing
lock-free synchronization are not data-race free, e.g., programs implementing
synchronization operations and libraries implementing concurrent objects. In
most cases, these programs are designed to be robust against relaxations, i.e.,
they admit the same behaviors as if they were run under SC. Memory fences
must be included appropriately in programs in order to prevent non-SC behav-
iors. Getting such programs right is a notoriously difficult and error-prone task.
Robustness can also be used as a proof method that allows reusing existing
SC verification technology: invariants of a robust program running under SC are
also valid for its TSO executions. Therefore, the problem of checking robustness
of a program against relaxations of a memory model is important.
arXiv:1804.05196v1 [cs.LO] 14 Apr 2018
In this paper, we address the problem of checking robustness in the case
of TSO. We present a methodology for proving robustness which uses the con-
cepts of left/right mover in Lipton’s reduction theory [16]. Intuitively, a program
statement is a left (resp., right) mover if it commutes to the left (resp., right)
with respect to the statements in the other threads. These concepts have been
used by Lipton [16] to define a program rewriting technique which enlarges the
atomic blocks in a given program while preserving the same set of behaviors. In
essence, robustness can also be seen as an atomicity problem: every write state-
ment corresponds to two events, inserting the write into the buffer and flushing
the write from the buffer to the main memory, which must be proved to happen
atomically, one after the other. However, differently from Lipton’s reduction the-
ory, the events that must be proved atomic do not correspond syntactically to
different statements in the program. This leads to different uses of these concepts
which cannot be seen as a direct instantiation of this theory.
Then, following the idea of combining reduction and abstraction introduced
in [11], we define a program abstraction technique that roughly, makes reads
non-deterministic. This technique can be used in two ways. On one side, it can
lead to programs that have exactly the same set of reachable configurations
as the original program (but these configurations can be reached in different
orders), which can be proved robust using the mover theory. This implies that
the original program reaches the same set of configurations both under TSO and
SC (thus enabling the preservation of SC invariants). On the other side, it can
lead to over-approximations of the original program which are robust. In this
case, any invariant of the over-approximation under SC is also an invariant for
the TSO behaviors of the original program.
We tested the applicability of the proposed reduction and abstraction based
techniques on an exhaustive benchmark suite containing 34 challenging programs
(from [2] and [7]). These techniques were precise enough for proving robustness
of 32 of these programs. One program (presented in Figure 3) is not robust, and
required abstraction in order to derive a robust over-approximation. There is only
one program which cannot be proved robust using our techniques (although it is
robust). We believe however that an extension of our abstraction mechanism to
atomic read-write instructions will be able to deal with this case. We leave this
question for future work.
2 Overview
The TSO memory model allows strictly more behaviors than the classic SC mem-
ory model: writes are first stored in a thread-local buffer and non-deterministically
flushed into the shared memory at a later time (also, the write buffers are ac-
cessed first when reading a shared variable). However, in practice, many pro-
grams are robust, i.e., they have exactly the same behaviors under TSO and SC.
Robustness implies for instance, that any invariant proved under the SC seman-
tics is also an invariant under the TSO semantics. We describe in the following a
sound methodology for checking that a program is robust, which avoids modeling
and verifying TSO behaviors. Moreover, for non-robust programs, we show an
abstraction mechanism that yields robust programs over-approximating
the behaviors of the original program.
procedure send() {
  y := r1;
  x := 1;
}
procedure recv() {
  do {
    r1 := x;
  } while (r1 == 0);
  r2 := y;
}
Fig. 1. An example message passing program and a sample trace. Edges of the trace
show the happens-before order of global accesses; they are simplified by applying
transitive reduction.
As a first example, consider the simple “message passing” program in Fig-
ure 1. The send method sets the value of the “communication” variable y to
some predefined value from register r1. Then, it raises a flag by setting the
variable x to 1. Another thread executes the method recv, which waits until the flag
is set and then reads y (and stores the value in register r2). This program
is robust: the TSO memory model does not enable new behaviors although the
writes may be delayed. For instance, consider the following TSO execution (we
assume that r1 contains 42):
(t1, isu) (t1, isu)(t1, com, y, 42) (t1, com, x, 1)
(t2, rd, x, 0) (t2, rd, x, 0) (t2, rd, x, 1)(t2, rd, y, 42)
The actions of each thread (t1 or t2) are aligned horizontally. They are either
issue actions (isu) for writes being inserted into the thread local buffer (e.g.,
the first (t1, isu) represents the write of y being inserted into the buffer), commit
actions (com) for writes being flushed to the main memory (e.g., (t1, com, y, 42)
represents the write y := 42 being flushed and executed on the shared memory),
or read actions for reading values of shared variables. Note that every assignment
to a shared variable generates two actions, an issue and a commit. The issue
action is “local”: it doesn’t enable or disable actions of other threads.
The above execution can be “mimicked” by an SC execution. If we had not
performed the isu actions of t1 that early but delayed them until just before
their corresponding com actions, we would obtain a valid SC execution of the
same program with no need to use store buffers:
(t1, wr, y, 42) (t1, wr, x, 1)
(t2, rd, x, 0) (t2, rd, x, 0) (t2, rd, x, 1)(t2, rd, y, 42)
Above, consecutive isu and com actions are combined into a single write action
(wr). This intuition corresponds to an equivalence relation between TSO
executions and SC executions: if both executions contain the same actions on
the shared variables (performing the same accesses on the same variables with
the same values) and the order of actions on the same variable is the same
in both executions, we say that these executions have the same trace [20], or
that they are trace-equivalent. For instance, both the SC and TSO executions
given above have the same trace given in Figure 1. The notion of trace is used
to formalize robustness for programs running under TSO [7]: a program is called
robust when every TSO execution has the same trace as an SC execution of the
same program.
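The trace comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tooling: the function name `trace`, the tuple encoding of actions, and the artificial "init" write node that stands for the initial value are all hypothetical. The sketch also assumes that reads take their value from memory (never from the local buffer), that every issue is eventually committed, and it records only consecutive program-order edges.

```python
from collections import defaultdict

def trace(execution, init=0):
    """Happens-before edges of an execution as a set of (source, target, relation).

    Actions are tuples (t, "isu"), (t, "com", x, v) or (t, "rd", x, v).
    """
    # Pair the k-th issue of a thread with its k-th commit (buffers are FIFO),
    # so that both actions map to the same write node.
    commits = defaultdict(list)
    for action in execution:
        if action[1] == "com":
            commits[action[0]].append(action)

    def write_node(t, k):
        _, _, x, v = commits[t][k]
        return ("wr", t, k, x, v)

    def init_node(x):
        # Hypothetical node standing for the initial value of x.
        return ("wr", "init", 0, x, init)

    edges = set()
    issued, committed, read_count = defaultdict(int), defaultdict(int), defaultdict(int)
    last = {}                      # last node of each thread in program order
    writes = defaultdict(list)     # committed write nodes, per variable
    reads = defaultdict(list)      # read nodes seen so far, per variable

    def add_po(t, node):
        if t in last:
            edges.add((last[t], node, "po"))
        last[t] = node

    for action in execution:
        t, kind = action[0], action[1]
        if kind == "isu":
            # Program order places a write at its issue, not at its commit.
            add_po(t, write_node(t, issued[t]))
            issued[t] += 1
        elif kind == "rd":
            x, v = action[2], action[3]
            node = ("rd", t, read_count[t], x, v)
            read_count[t] += 1
            add_po(t, node)
            source = writes[x][-1] if writes[x] else init_node(x)
            edges.add((source, node, "rf"))
            reads[x].append(node)
        elif kind == "com":
            x, v = action[2], action[3]
            node = write_node(t, committed[t])
            committed[t] += 1
            # so and fr only relate actions with different values.
            for w in [init_node(x)] + writes[x]:
                if w[4] != v:
                    edges.add((w, node, "so"))
            for r in reads[x]:
                if r[4] != v:
                    edges.add((r, node, "fr"))
            writes[x].append(node)
    return edges

# One interleaving consistent with the TSO execution above, and an SC execution
# in which every isu is immediately followed by its com.
tso = [("t1", "isu"), ("t2", "rd", "x", 0), ("t1", "isu"), ("t1", "com", "y", 42),
       ("t2", "rd", "x", 0), ("t1", "com", "x", 1), ("t2", "rd", "x", 1),
       ("t2", "rd", "y", 42)]
sc = [("t1", "isu"), ("t1", "com", "y", 42), ("t2", "rd", "x", 0),
      ("t2", "rd", "x", 0), ("t1", "isu"), ("t1", "com", "x", 1),
      ("t2", "rd", "x", 1), ("t2", "rd", "y", 42)]
same_trace = trace(tso) == trace(sc)
```

Under these assumptions the two executions yield the same edge set, which matches the claim that they are trace-equivalent.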
Our method for showing robustness is based on proving that every TSO exe-
cution can be permuted to a trace-equivalent SC execution (where issue actions
are immediately followed by the corresponding commit actions). We say that an
action α moves right until another action β in an execution if we can swap α
with every later action until β while preserving the feasibility of the execution
(e.g., not invalidating reads and keeping the actions enabled). We observe that
if α moves right until β then the execution obtained by moving α just before
β has the same trace as the initial execution. We also have the dual notion
of moves-left with a similar property. As a corollary, if every issue action moves
right until the corresponding commit action or every commit action moves left
until the corresponding issue action, we can find an equivalent SC execution. For
our execution above, the issue actions of the first thread move right until their
corresponding com actions. Note that there is a commit action which doesn’t
move left: moving (t1, com, x, 1) to the left of (t2, rd, x, 0) is not possible since it
would disable this read.
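The swap test behind "moves left" can be illustrated on SC-style executions in which each isu/com pair is merged into one wr action. The following Python sketch is a simplification under stated assumptions: the helper names `replay` and `moves_left` are hypothetical, and feasibility is approximated by checking only that every read still returns the current memory value (control flow of the program is not modelled).

```python
def replay(execution, init=0):
    """Replay an execution of (thread, kind, var, value) actions.

    Returns the final memory, or None if some read sees a value that
    differs from the current memory content.
    """
    mem = {}
    for thread, kind, var, value in execution:
        if kind == "wr":
            mem[var] = value
        elif kind == "rd" and mem.get(var, init) != value:
            return None
    return mem

def moves_left(execution, i, init=0):
    """Can action i be swapped with the action just before it?"""
    # Only swaps with an action of a different thread are considered.
    if i == 0 or execution[i][0] == execution[i - 1][0]:
        return False
    swapped = execution[:i - 1] + [execution[i], execution[i - 1]] + execution[i + 1:]
    before, after = replay(execution, init), replay(swapped, init)
    return before is not None and before == after

run = [("t2", "rd", "x", 0), ("t1", "wr", "y", 42), ("t2", "rd", "x", 0),
       ("t1", "wr", "x", 1), ("t2", "rd", "x", 1), ("t2", "rd", "y", 42)]
write_y_moves_left = moves_left(run, 1)   # wr y does not conflict with rd x
write_x_moves_left = moves_left(run, 3)   # wr x := 1 would disable rd x = 0
```

In this encoding the write to y commutes with the preceding read of x, while moving the write to x before the read that returned 0 makes the replay fail, as in the example above.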
In general, issue actions and other thread local actions (e.g. statements using
local registers only) move right of other threads’ actions. Moreover, issue actions
(t, isu) move right of commit actions of the same thread that correspond to writes
issued before (t, isu). For the message passing program, the issue actions move
right until their corresponding commits in all TSO executions since commits
cannot be delayed beyond actions of the same thread (for instance reads). Hence,
we can safely deduce that the message passing program is robust. However, this
reasoning may fail when an assignment is followed by a read of a shared variable
in the same thread.
procedure foo() {
  x := 1;
  r1 := z;
  fence;
  r2 := y;
}
procedure bar() {
  y := 1;
  fence;
  r3 := x;
}
Fig. 2. An example store buffering program.
Consider the “store-buffering”-like program in Figure 2. This program is also
robust. However, the issue action generated by x := 1 might not always move
right until the corresponding commit. Consider the following execution (we
assume that the initial value of z is 5):
(t1, isu) (t1, rd, z, 5) (t1, com, x, 1) . . .
(t2, isu) (t2, com, y, 1)(t2, τ)(t2, rd, x, 0) . . .
Here, we assumed that t1 executes foo and t2 executes bar. The fence
instruction generates an action τ . The first issue action of t1 cannot be moved
to the right until the corresponding commit action since this would violate the
program order. Moreover, the corresponding commit action does not move left
due to the read action of t2 on x (which would become infeasible).
The key point here is that a later read action by the same thread ((t1, rd, z, 5))
prevents the issue action from moving right (until the commit). However,
this read action moves to the right of the actions of the other threads. So, we
can construct an equivalent SC execution by first moving the read action right,
to just after the commit (t1, com, x, 1), and then moving the issue action right
until the commit action.
In general, we can say that an issue (t, isu) of a thread t moves right until
the corresponding commit if each read action of t after (t, isu) can move right
until the next action of t that follows both the read and the commit. Actually,
this property is not required for all such reads. The read actions that follow a
fence cannot happen between the issue and the corresponding commit actions.
For instance, the last read action of foo cannot happen between the first issue
of foo and its corresponding commit action. Such reads that follow a fence are
not required to move right. In addition, we can omit the right-moves check for
the read actions that read from the thread local buffer (see Section 3 for more
details).
In brief, our method for checking robustness does the following for every write
instruction (assignment to a shared variable): either the commit action of this
write moves left or the actions of later read instructions that come before a fence
move right in all executions. This semantic condition can be checked using the
concept of movers [17] as follows: every write instruction is either a left-mover
or all the read instructions that come before a fence and can be executed later
than the write (in an SC execution) are right-movers. Note that this requires no
modeling and verification of TSO executions.
For non-robust programs that might reach different configurations than SC
executions, we define an abstraction mechanism that replaces read instructions
with “non-deterministic” reads that can read more values than the original
instructions. The abstracted program has more behaviors than the original one
(under both SC and TSO), but it may turn out to be robust. When it is robust,
any property of its SC semantics also holds for the TSO semantics of
the original program.
Consider the work stealing queue implementation in Figure 3. A queue is
represented with an array items. Its head and tail indices are stored in the shared
variables H and T, respectively. There are three procedures that can operate on
this queue: any number of threads may execute the steal method and remove
an element from the head of the queue, and a single unique thread may execute
put or take methods nondeterministically. The put method inserts an element
at the tail index and the take method removes an element from the tail index.
This program is not robust. Our robustness check fails on this program be-
cause the writes of the worker thread (executing the put and take methods) are
var H, T, items;
procedure steal() {
  local h, t, res;
  L1: h := H;
  t := T;
  if (h ≥ t)
    return −1;
  res := items[h];
  if (cas(H, h, h+1))
    return res;
  else
    goto L1;
}
procedure put(var elt) {
  local t;
  t := T;
  items[t] := elt;
  T := t+1;
}
procedure take() {
  local h, t, res;
  L1: t := T;
  T := t−1;
  h := H; // havoc(h, h ≤ H);
  if (t < h) {
    T := h;
    return −1;
  }
  res := items[t];
  if (t > h)
    return res;
  T := h+1;
  if (cas(H, h, h+1))
    return res;
  else
    goto L1;
}
Fig. 3. Work Stealing Queue.
not left movers and the read from the variable H in the take method is not a
right mover. This read is not a right mover w.r.t. successful CAS actions of the
steal procedure that increment H.
Worse than that, this program might reach configurations under the TSO
semantics that are not possible under SC, and properties of the SC executions
are not satisfied by TSO executions. If there is a single element in the queue and
the take method takes it by delaying its writes after some concurrent steals, one
of the concurrent steals might also remove this last element. Popping the same
element twice is not possible under SC, but it is possible under TSO semantics.
However, we can still prove some properties of this program under TSO.
We apply our abstraction on this instruction of the take method that reads
from H such that instead of reading the exact value of H, it can read any value
less than or equal to the value of H. We write this instruction as havoc(h, h ≤ H)
(it assigns to h a nondeterministic value satisfying the constraint h ≤ H). Note
that this abstraction is sound in the sense that it reaches more states under
SC/TSO than the original program.
The resulting program is robust. The statement havoc(h, h ≤ H) is a right
mover w.r.t. successful CAS actions of the stealer threads. Hence, for all the
write instructions, the reachable read instructions become right movers and our
check succeeds. The abstract program satisfies the specification of an idempotent
work stealing queue (elements can be dequeued multiple times) which implies
that the original program satisfies this specification as well.
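The difference between the exact read h := H and the abstracted read havoc(h, h ≤ H) with respect to a successful CAS that increments H can be made concrete with a small Python sketch. It is an illustration only: the function names are hypothetical, each read is modelled as the set of values it may return, and the lower bound 0 on indices is an assumption introduced to keep the sets finite.

```python
def exact_read(H):
    """Possible results of h := H."""
    return {H}

def havoc_read(H):
    """Possible results of havoc(h, h <= H), assuming indices start at 0."""
    return set(range(0, H + 1))

def is_right_mover_wrt_increment(read, H):
    """Does `read; H := H + 1` commute to `H := H + 1; read`?

    Every value the read may return before the increment must still be a
    possible result after the increment.
    """
    return read(H) <= read(H + 1)

exact_commutes = is_right_mover_wrt_increment(exact_read, 3)
havoc_commutes = is_right_mover_wrt_increment(havoc_read, 3)
over_approximates = exact_read(3) <= havoc_read(3)
```

The exact read returns 3 before the increment but only 4 after it, so it does not commute. Any value at most 3 is also at most 4, so the havoc read does, and it still admits the value returned by the exact read.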
3 TSO Robustness
We present the syntax and the semantics of a simple programming language
used to state our results. We define both the TSO and the SC semantics, an
abstraction of executions called trace [20] that intuitively, captures the happens-
before relation between actions in an execution, and the notion of robustness.
Syntax. We consider a simple programming language which is defined in Figure 4.
Each program P has a finite number of shared variables $\vec{x}$ and a finite
number of threads $\vec{t}$. Also, each thread $t_i$ has a finite set of local registers
$\vec{r_i}$ and a start label $l^0_i$. Bodies of the threads are defined as finite sequences
of labelled instructions. Each instruction is followed by a goto statement which
defines the evolution of the program counter. Note that multiple instructions
can be assigned to the same label which allows us to write non-deterministic
programs and multiple goto statements can direct the control to the same label
which allows us to mimic imperative constructs like loops and conditionals. An
assignment to a shared variable 〈var〉 := 〈expr〉 is called a write instruction.
Also, an instruction of the form 〈reg〉 := 〈var〉 is called a read instruction.
〈prog〉 ::= program 〈pid〉 vars 〈var〉∗ 〈thread〉∗
〈thread〉 ::= thread 〈tid〉 regs 〈reg〉∗ init 〈label〉 begin 〈linst〉∗ end
〈linst〉 ::= 〈label〉: 〈inst〉; goto 〈label〉;
〈inst〉 ::= 〈var〉 := 〈expr〉
| 〈reg〉 := 〈expr〉
| 〈reg〉 := 〈var〉
| fence
| 〈reg〉 := cas(〈var〉, 〈expr〉, 〈expr〉)
| skip
| assume 〈bexpr〉
Fig. 4. Syntax of the programs. The star (∗) indicates zero or more occurrences of
the preceding element. 〈pid〉, 〈tid〉, 〈var〉, 〈reg〉 and 〈label〉 are elements of their given
domains representing the program identifiers, thread identifiers, shared variables, regis-
ters and instruction labels, respectively. 〈expr〉 is an arithmetic expression over 〈reg〉∗.
Similarly, 〈bexpr〉 is a boolean expression over 〈reg〉∗.
Instructions of the program can read from or write to shared variables or
registers. However, each instruction can access at most one shared variable. We
assume that the program P comes with a set D that represents the domain of the
variables and the registers; and a set of functions F that allows us to calculate
arithmetic and boolean expressions.
The fence statement empties the buffer of the executing thread. The cas
(compare-and-swap) instruction checks whether the value of its input variable
is equal to its second argument. If so, it writes its third argument as the value
of the variable and returns true. Otherwise, it returns false. In either case, cas
empties the buffer immediately after it executes. The assume statement allows
us to check conditions. If the boolean expression it contains holds at that state,
it behaves like a skip. Otherwise, the execution blocks. Formal descriptions of
the instructions are given in Figure 5.
TSO Semantics. Under the TSO memory model, each thread maintains a local
queue to buffer write instructions. A state s of the program is a triple of the
form (pc, mem, buf). Let L be the set of available labels in the program P.
Then, $pc : \vec{t} \to L$ shows the next instruction to be executed for each thread,
$mem : \bigcup_{t_i \in \vec{t}} \vec{r_i} \cup \vec{x} \to D$ represents the current values in shared variables and
registers, and $buf : \vec{t} \to (\vec{x} \times D)^*$ represents the contents of the buffers.
There is a special initial state $s_0 = (pc_0, mem_0, buf_0)$. At the beginning, each
thread $t_i$ points to its initial label $l^0_i$, i.e., $pc_0(t_i) = l^0_i$. We assume that there is a
special default value 0 ∈ D. All the shared variables and registers are initialized to
0, i.e., $mem_0(x) = 0$ for all $x \in \bigcup_{t_i \in \vec{t}} \vec{r_i} \cup \vec{x}$. Lastly, all the buffers are initially
empty, i.e., $buf_0(t_i) = \epsilon$ for all $t_i \in \vec{t}$.
The transition relation $\to_{TSO}$ between program states is defined in Figure 5.
Transitions are labelled by actions. Each action is an element from
$\vec{t} \times (\{\tau, isu\} \cup (\{com, rd\} \times \vec{x} \times D))$. Actions keep the information about the thread performing
the transition and the actual parameters of the reads and the writes to shared
variables. We are only interested in accesses to shared variables; therefore, other
transitions are labelled with τ as thread local actions.
A TSO execution of a program P is a sequence of actions $\pi = \pi_1, \pi_2, \ldots, \pi_n$
such that there exists a sequence of states $\sigma = \sigma_0, \sigma_1, \ldots, \sigma_n$ where $\sigma_0 = s_0$ is the
initial state of P and $\sigma_{i-1} \xrightarrow{\pi_i} \sigma_i$ is a valid transition for any $i \in \{1, \ldots, n\}$. We
assume that buffers are empty at the end of the execution.
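To make the buffer-based semantics concrete, the following Python sketch models the memory and the per-thread FIFO buffers and records the generated actions. It is a minimal sketch under assumptions, not a transcription of Figure 5: the class name `TSOMachine` and its methods are hypothetical, program counters and registers are left to the caller, a fence is modelled as draining the buffer before a τ action, and reads served from the local buffer are recorded as rd actions as well.

```python
from collections import defaultdict, deque

class TSOMachine:
    def __init__(self):
        self.mem = defaultdict(int)      # shared memory, default value 0
        self.buf = defaultdict(deque)    # one FIFO of (variable, value) per thread
        self.actions = []                # the execution generated so far

    def write(self, t, x, v):
        """Issue: the write only enters the buffer of thread t."""
        self.buf[t].append((x, v))
        self.actions.append((t, "isu"))

    def commit(self, t):
        """Commit: the oldest buffered write of t reaches the shared memory."""
        x, v = self.buf[t].popleft()
        self.mem[x] = v
        self.actions.append((t, "com", x, v))

    def read(self, t, x):
        """Read: the thread's own buffer is consulted before the memory."""
        value = self.mem[x]
        for y, v in self.buf[t]:
            if y == x:
                value = v                # the latest buffered write to x wins
        self.actions.append((t, "rd", x, value))
        return value

    def fence(self, t):
        """Assumed fence behaviour: drain the buffer, then a local step."""
        while self.buf[t]:
            self.commit(t)
        self.actions.append((t, "tau"))

# The message passing program of Figure 1 with both writes delayed.
m = TSOMachine()
m.write("t1", "y", 42)
m.write("t1", "x", 1)
first = m.read("t2", "x")     # flag not yet visible to t2
m.commit("t1")                # y := 42 reaches memory
m.commit("t1")                # x := 1 reaches memory
flag = m.read("t2", "x")
r2 = m.read("t2", "y")
```

Because the buffer is FIFO, the write to y is always committed before the write to x, so a receiver that sees the flag also sees the message.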
SC Semantics. Under SC, a program state is a pair of the form (pc,mem)
where pc and mem are defined as above. Shared variables are read directly
from the memory mem and every write updates directly the memory mem.
To make the relationship between SC and TSO executions more obvious, every
write instruction generates isu and com actions which follow one another in the
execution (each isu is immediately followed by the corresponding com). Since
there are no write buffers, fence instructions have no effect under SC.
Traces and TSO Robustness. Consider a (TSO or SC) execution π of P. The
trace of π is a graph, denoted by Tr(π). Nodes of Tr(π) are the actions of π except
the τ actions. In addition, isu and com actions are unified in a single node:
the isu action that puts an element into the buffer and the corresponding com
action that drains that element from the buffer correspond to the same node
in the trace. Edges of Tr(π) represent the happens-before order (hb) between
these actions. The hb order is the union of four relations. The program order po keeps the
order of actions performed by the same thread excluding the com actions. The
store order so keeps the order of com actions on the same variable that write
different values. The read-from relation, denoted by rf, relates a com action to
a rd action that reads its value. Lastly, the from-reads relation fr relates a rd
action to a com action that overwrites the value read by rd; it is defined as the
composition of the inverse of rf and so.
Our hb relation is slightly different than the standard definition. In the stan-
dard definition, two com actions are related by so if they operate on the same
variable and write the same value. In our definition, they are not related by so.
In addition, fr relation implicitly changes due to the difference in so. If a com
action overwrites a value read by rd but writes the same value, these rd and
com actions are not related by fr according to our definition. However, they are
related according to the standard definition. This relaxation on hb definition is
necessary to relate trace robustness and mover concepts later. In order not to
confuse standard and new definitions, we will denote standard traces with $Tr_{std}$.
We say that the program P is TSO robust if for any TSO execution π of
P, there exists an SC execution π′ such that Tr(π) = Tr(π′). For the standard
traces, it has been proved that robustness implies that the program reaches the
same valuations of the shared memory under both TSO and SC [7]. Moreover,
the following result characterizes TSO executions that have the same trace as
an SC execution.
Lemma 1 ([20]). A TSO-execution π of a program P has the same standard
trace as an SC-execution of P if and only if the happens-before order in $Tr_{std}(\pi)$
is acyclic.
However, Lemma 1 is not entirely true for the new trace definition. We have
shown that only the “only if” direction holds for the new trace definition: there
can be TSO executions with an acyclic happens-before order that are not possible under SC.
Since the new hb relation is a subset of the standard one, it is easy
to see that standard trace-robustness implies new trace-robustness. Moreover,
we have shown that new trace-robustness implies that TSO and SC executions
reach the same set of valuations of the shared memory. (We do not provide
proofs due to space restrictions.)
4 A Reduction Theory for Checking Robustness
We present a methodology for checking robustness which builds on concepts
introduced in Lipton’s reduction theory [17]. This theory allows rewriting a
given concurrent program (running under SC) into an equivalent one that has
larger atomic blocks. Proving robustness is similar in spirit in the sense that
one has to prove that issue and commit actions can happen together atomically.
However, differently from the original theory, these actions do not correspond
to different statements in the program (they are generated by the same write
instruction). Nevertheless, we show that the concepts of left/right movers can
be also used to prove robustness.
Movers. Let $\pi = \pi_1, \ldots, \pi_n$ be an SC execution. We say that the action $\pi_i$ moves
right (resp., left) in π if $\pi_1, \ldots, \pi_{i-1}, \pi_{i+1}, \pi_i, \pi_{i+2}, \ldots, \pi_n$ (resp., $\pi_1, \ldots, \pi_{i-2}, \pi_i, \pi_{i-1}, \pi_{i+1}, \ldots, \pi_n$)
is also a valid execution of P, the thread of $\pi_i$ is different from the thread of
$\pi_{i+1}$ (resp., $\pi_{i-1}$), and both executions reach the same end state $\sigma_n$. Since
every issue action is followed immediately by the corresponding commit action,
an issue action moves right, resp., left, when the commit action also moves right,
resp., left, and vice-versa.
Let $instOf_\pi$ be a function, depending on an execution π, which given an
action $\pi_i \in \pi$, gives the labelled instruction that generated $\pi_i$. Then, a labelled
instruction ℓ is a right (resp., left) mover if for all SC executions π of P and for
all actions $\pi_i$ of π such that $instOf_\pi(\pi_i) = \ell$, $\pi_i$ moves right (resp., left) in π.
A labelled instruction is a non-mover if it is neither a left nor a right mover, and
it is a both mover if it is both a left and a right mover.
Reachability Between Instructions. An instruction ℓ′ is reachable from the
instruction ℓ if ℓ and ℓ′ both belong to the same thread and there exists an
SC execution π and indices 1 ≤ i < j ≤ |π| such that $instOf_\pi(\pi_i) = \ell$ and
$instOf_\pi(\pi_j) = \ell'$. We say that ℓ′ is reachable from ℓ before a fence if $\pi_k$ is not an
action generated by a fence instruction in the same thread as ℓ, for all i < k < j.
When ℓ is a write instruction and ℓ′ a read instruction, we say that ℓ′ is buffer-free
reachable from ℓ if $\pi_k$ is not an action generated by a fence instruction in
the same thread as ℓ or a write action on the same variable that ℓ′ reads from,
for all i < k < j.
Definition 1. We say that a write instruction $\ell_w$ is atomic if it is a left mover
or every read instruction $\ell_r$ buffer-free reachable from $\ell_w$ is a right mover. We
say that P is write atomic if every write instruction $\ell_w$ in P is atomic.
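Once the mover classification and the buffer-free reachability relation are known, Definition 1 reduces to a simple check. The Python sketch below assumes both are supplied by the caller (for instance by a separate mover analysis); the function name, the string labels for instructions, and the restriction of the work stealing queue to a single write and a single reachable read are illustrative simplifications, not the full analysis.

```python
def is_write_atomic(writes, left_movers, right_movers, buffer_free_reads):
    """Check Definition 1 for every write instruction.

    buffer_free_reads maps a write instruction to the read instructions
    that are buffer-free reachable from it.
    """
    return all(
        w in left_movers
        or all(r in right_movers for r in buffer_free_reads.get(w, ()))
        for w in writes
    )

# Message passing (Figure 1): no read follows the writes in send,
# so the condition holds vacuously.
mp_atomic = is_write_atomic(["y := r1", "x := 1"], set(), set(), {})

# take (Figure 3), restricted to one write and one read for illustration.
take_atomic = is_write_atomic(
    ["T := t-1"], set(), set(), {"T := t-1": ["h := H"]})

# After the abstraction, the read of H is a right mover.
abstract_atomic = is_write_atomic(
    ["T := t-1"], set(), {"havoc(h, h <= H)"},
    {"T := t-1": ["havoc(h, h <= H)"]})
```

The classifications passed in here follow the discussion of Figures 1 and 3 in Section 2; the check itself never inspects TSO executions.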
Note that all of the notions used to define write atomicity (movers and in-
struction reachability) are based on SC executions of the programs. The following
result shows that write atomicity implies robustness.
Theorem 1 (Soundness). If P is write atomic, then it is robust.
We will prove the contrapositive of the statement. For the proof, we need
the notion of minimal violation defined in [7]. A minimal violation is a TSO
execution in which the number of same-thread actions between isu and
corresponding com actions, summed over all writes, is minimal. A minimal violation is
of the form $\pi = \pi_1, (t, isu), \pi_2, (t, rd, y, *), \pi_3, (t, com, x, *), \pi_4$ such that $\pi_1$ is
an SC execution; only t can delay com actions; the first delayed action is the
$(t, com, x, *)$ action after $\pi_3$ and it corresponds to $(t, isu)$ after $\pi_1$; $\pi_2$ does
not contain any com or fence actions by t (writes of t are delayed until after
$(t, rd, y, *)$); $(t, rd, y, *) \to_{hb}^{+} act$ for all $act \in \pi_3 \circ \{(t, com, x, *)\}$ (isu and
com actions of other threads are counted as one action for this case); $\pi_3$ does not
contain any action of t; $\pi_4$ contains only and all of the com actions of t that are
delayed in $(t, isu) \circ \pi_2$; and none of the com actions in $(t, com, x, *) \circ \pi_4$ touches
y.
Minimal violations are important for us because of the following property:
Lemma 2 (Completeness of Minimal Violations [7]). The program P is
robust iff it does not have a minimal violation.
Lemma 2, stated for standard traces, also holds for our extended trace definition.
Before going into the proof of Theorem 1, let us define some notation. Let π
be a sequence representing an execution or a fragment of it. Let Q be a set of
thread identifiers. Then, $\pi|_Q$ is the projection of π on actions from the threads
in Q. Similarly, $\pi|_n$ is the projection of π on its first n elements for some natural
number n, and sz(π) gives the length of the sequence π. We also define a product
operator ⊗. Let π and ρ be some execution fragments. Then, π ⊗ ρ is the same as π
except that if the i-th isu action of π is not immediately followed by a com action
by the same thread, then the i-th com action of ρ is inserted after this isu. The product
operator helps us to fill unfinished writes in one execution fragment by inserting
commit actions from another fragment immediately after the issue actions.
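The projection and product operators just defined can be read operationally as follows. This Python sketch is only an illustration of the definitions: the function names `project` and `product` and the tuple encoding of actions are assumptions, and the sketch presumes that ρ contains enough com actions for every unfinished isu of π.

```python
def project(pi, threads):
    """pi|_Q: keep only the actions of threads in Q."""
    return [a for a in pi if a[0] in threads]

def product(pi, rho):
    """pi (x) rho: complete unfinished issues of pi with commits of rho.

    If the i-th isu of pi is not immediately followed by a com of the same
    thread, the i-th com of rho is inserted right after it.
    """
    rho_commits = [a for a in rho if a[1] == "com"]
    result, k = [], 0                      # k counts the isu actions of pi
    for j, action in enumerate(pi):
        result.append(action)
        if action[1] == "isu":
            nxt = pi[j + 1] if j + 1 < len(pi) else None
            finished = nxt is not None and nxt[0] == action[0] and nxt[1] == "com"
            if not finished:
                result.append(rho_commits[k])
            k += 1
    return result

# A fragment in which thread t delays two writes past a read.
pi2 = [("t", "isu"), ("u", "rd", "y", 0), ("t", "rd", "z", 5), ("t", "isu")]
pi4 = [("t", "com", "x", 1), ("t", "com", "y", 2)]
filled = product(project(pi2, {"t"}), pi4)
```

The result places each commit directly after its issue, which is the shape of an SC execution fragment for thread t.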
Proof (Theorem 1). Assume P is not robust. Then, there exists a minimal violation
$\pi = \pi_1, \alpha, \pi_2, \theta, \pi_3, \beta, \pi_4$ satisfying the conditions described before, where
$\alpha = (t, isu)$, $\theta = (t, rd, y, *)$ and $\beta = (t, com, x, *)$. Below, we show that the
write instruction w = instOf(α) is not atomic.
1. w is not a left mover.
1.1. $\rho = \pi_1, \pi_2|_{\vec{t} \setminus \{t\}}, \pi_3|_{\vec{t} \setminus \{t\}}|_{sz(\pi_3|_{\vec{t} \setminus \{t\}})-1}, \gamma, (\alpha, \beta)$ is an SC execution of
P where γ is the last action of $\pi_3$. γ is a read or write action on x
performed by a thread t′ other than t and the value of γ is different from
what is written by β.
1.1.1. ρ is an SC execution because t never changes the value of a shared variable
in $\pi_2$ and $\pi_3$. So, even if we remove the actions of t in those parts,
the actions of other threads are still enabled. Since other threads perform
only SC operations in π, $\pi_1, \pi_2|_{\vec{t} \setminus \{t\}}, \pi_3|_{\vec{t} \setminus \{t\}}$ is an SC execution.
From π, we also know that the first enabled action of t is α if we
delay the actions of t in $\pi_2$ and $\pi_3$.
1.1.2. The last action of $\pi_3$ is γ. By definition of a minimal violation, we
know that $\theta \to_{hb}^{+} \alpha$ and $\pi_3$ does not contain any action of t. So,
there must exist an action $\gamma \in \pi_3$ such that either γ reads from x
and $\gamma \to_{fr} \beta$ in π or γ writes to x and $\gamma \to_{so} \beta$ in π. Moreover, γ
is the last action of $\pi_3$ because if there are other actions after γ, we
can delete them and obtain another minimal violation which is
shorter than π, contradicting the minimality of π.
1.2. $\rho' = \pi_1, \pi_2|_{\vec{t} \setminus \{t\}}, \pi_3|_{\vec{t} \setminus \{t\}}|_{sz(\pi_3|_{\vec{t} \setminus \{t\}})-1}, (\alpha, \beta), \gamma$ is either an SC execution with
a different end state than ρ defined in 1.1, or it is not an SC execution,
where instOf(γ′) = instOf(γ).
1.2.1. In the last state of ρ, x has the value written by β. If γ is a write
action on x, then x has a different value at the end of ρ′ due to
the definition of a minimal violation (γ and β should write different
values to have an so edge between them according to our new hb
definition, although it is not necessary according to the standard hb
definition). If γ is a read action on x, then it does not read the value
written by β in ρ (again due to the new definition of so). However,
γ reads this value in ρ′. Hence, ρ′ is not a valid SC execution.
2. There exists a read instruction r buffer-free reachable from w such that r is
not a right mover. We will consider two cases: Either there exists a rd action
of t on variable z in pi2 such that there is a later write action by another
thread t′ on z in pi2 that writes a different value or not. Moreover, z is not a
variable that is touched by the delayed commits in pi4 i.e., it does not read
its value from the buffer.
2.1. We first evaluate the negation of above condition. Assume that for all
actions γ and γ′ such that γ occurs before γ′ in pi2, either γ ≠ (t, rd, z, vz)
or γ′ ≠ (t′, isu)(t′, com, z, v′z). Then, r = instOf(θ) is not a right mover
and it is buffer-free reachable from w.
2.1.1. ρ = pi1, pi2|−→t \{t}, pi2|{t} ⊗ pi4, θ, θ′ is a valid SC execution of P where
θ′ = (t′, isu)(t′, com, y, ∗) for some t ≠ t′.
2.1.1.1. ρ is an SC execution. pi1, pi2|−→t \{t} is a valid SC execution since
t does not update the value of a shared variable in pi2. Moreover,
all of the actions of t become enabled after this sequence since t
never reads the value of a variable updated by another thread in pi2.
Lastly, the first action of pi3 is enabled after this sequence.
2.1.1.2. The first action of pi3 is θ′ = (t′, isu)(t′, com, y, ∗). Let θ′ be the
first action of pi3. Since θ →hb θ′ in pi and θ′ is not an action
of t by definition of minimal violation, the only case we have is
θ →fr θ′. Hence, θ′ is a write action on y that writes a different
value than θ reads.
2.1.1.3. r is buffer-free reachable from w: ρ is an SC execution, the first action
of ρ after pi1, pi2|−→t \{t} is α, β; w = instOf((α, β)), r = instOf(θ),
and the actions of t in ρ between α, β and θ are not instances of a
fence instruction or a write to y.
2.1.2. ρ′ = pi1, pi2|−→t \{t}, pi2|{t} ⊗ pi4, θ′, θ is not a valid SC execution.
2.1.2.1. In the last state of ρ, the value of y seen by t is the value read
in θ. It is different from the value written by θ′. However, in the
last state of ρ′, the value of y seen by t must be the value θ′ writes.
Hence, ρ′ is not a valid SC execution.
2.2. Assume that there exist γ = (t, rd, z, vz) and γ′ = (t′, isu)(t′, com, z, v′z)
in pi2. Then, r = instOf(γ) is not a right mover and r is buffer-free
reachable from w.
2.2.1. Let i be the index of γ and j be the index of γ′ in pi2. Then, define
ρ = pi1, pi2|j−1|−→t \{t}, pi2|i|{t} ⊗ pi4, γ′. ρ is an SC execution of P.
2.2.1.1. ρ is an SC execution. The prefix pi1, pi2|j−1|−→t \{t} is a valid SC
execution because t does not update any shared variable in pi2.
Moreover, all of the actions of t in pi2|i|{t} ⊗ pi4 become enabled
after this sequence since t never reads the value of a variable
updated by another thread in pi2, and γ′ is the next enabled action
in pi2 after this sequence since it is a write action.
2.2.2. Let i and j be indices of γ and γ′ in pi2 respectively. Define ρ′ =
pi1, pi2|j−1|−→t \{t}, pi2|i−1|{t} ⊗ pi4, γ′, γ. Then, ρ′ is not a valid SC ex-
ecution.
2.2.2.1. In the last state of ρ, the value of z seen by t is vz. It is different
from v′z, the value written by γ′. However, in the last state of
ρ′, the value of z seen by t must be v′z. Hence, ρ′ is not a valid SC
execution.
2.2.3. r is buffer-free reachable from w because ρ defined in 2.2.1 is an SC
execution, the first action after pi1, pi2|j−1|−→t \{t} is α, β, w = instOf((α, β)),
r = instOf(γ), and the actions of t in ρ between α, β and γ are not
instances of a fence instruction or a write to z by t.
5 Abstractions and Verifying non-Robust Programs
In this section, we introduce program abstractions which are useful for verifying
non-robust TSO programs (or even robust programs – see an example at the end
of this section). In general, a program P ′ abstracts another program P for some
semantic model M ∈ {SC,TSO} if every shared variable valuation σ reachable
from the initial state in an M execution of P is also reachable in an M execution
of P′. We denote this abstraction relation as P ⪯M P′.
In particular, we are interested in read instruction abstractions, which re-
place instructions that read from a shared variable with more “liberal” read
instructions that can read more values (this way, the program may reach more
shared variable valuations). We extend the program syntax in Section 3 with
havoc instructions of the form havoc(〈reg〉, 〈varbexpr〉), where 〈varbexpr〉 is a
boolean expression over a set of registers and a single shared variable 〈var〉.
The meaning of this instruction is that the register reg is assigned with any
value that satisfies varbexpr (where the other registers and the variable var
are interpreted with their current values). The program abstraction we consider
will replace read instructions of the form 〈reg〉 := 〈var〉 with havoc instructions
havoc(〈reg〉, 〈varbexpr〉).
While replacing read instructions with havoc instructions, we must guarantee
that the new program reaches at least the same set of shared variable valuations
after executing the havoc as the original program after the read. Hence, we
allow such a rewriting only when the boolean expression varbexpr is weaker (in
a logical sense) than the equality reg = var (hence, there exists an execution of
the havoc instruction where reg = var).
Lemma 3. Let P be a program and P′ be obtained from P by replacing an
instruction l1 : r := x; goto l2 of a thread t with l1 : havoc(r, φ(x, −→r)); goto l2
such that ∀x, r. x = r ⟹ φ(x, −→r) is valid. Then, P ⪯SC P′ and P ⪯TSO P′.
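The side condition of Lemma 3 can be illustrated on a finite value domain. The following Python sketch is only an illustration under stated assumptions: the names havoc_values, is_weaker_than_read, fig6_phi and the domain {0, 1} are hypothetical, and the constraint is modeled as a predicate over the current value of the shared variable and the candidate register value (other registers are assumed to be fixed). The constraint used is the one written in the comment of Figure 6.

```python
from typing import Callable, Iterable, List

# A constraint phi(x, r): x is the current value of the shared variable,
# r is the candidate value for the register (hypothetical encoding).
Constraint = Callable[[int, int], bool]


def havoc_values(phi: Constraint, x: int, domain: Iterable[int]) -> List[int]:
    """Values that havoc(r, phi) may assign to r when the shared variable holds x."""
    return [r for r in domain if phi(x, r)]


def is_weaker_than_read(phi: Constraint, domain: Iterable[int]) -> bool:
    """Finite-domain check of the side condition: x = r implies phi(x, r)."""
    return all(phi(v, v) for v in domain)


def fig6_phi(x: int, r: int) -> bool:
    """Constraint from Figure 6: (x != 0) ? (r = x or r = 0) : (r = 0)."""
    return (r == x or r == 0) if x != 0 else r == 0


DOMAIN = [0, 1]  # assumed finite domain, for illustration only

if __name__ == "__main__":
    print(havoc_values(fig6_phi, 0, DOMAIN))
    print(havoc_values(fig6_phi, 1, DOMAIN))
    print(is_weaker_than_read(fig6_phi, DOMAIN))
```

A constraint such as r = 0 alone would fail this check, since it excludes the value that the original read returns when x is 1.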
The notion of trace extends to programs that contain havoc instructions as
follows. Assume that (t, hvc, x, φ(x)) is the action generated by an instruction
havoc(r, φ(x,−→r )), where x is a shared variable and −→r a set of registers (the
action stores the constraint φ where the values of the registers are instantiated
with their current values – the shared variable x is the only free variable in φ(x)).
Roughly, the hvc actions are special cases of rd actions. Consider an execution
pi where an action α = (t, hvc, x, φ(x)) is generated by reading the value of a
write action β = (com, x, v) (i.e., the value v was the current value of x when the
havoc instruction was executed). Then, the trace of pi contains a read-from edge
β →rf α as for regular read actions. However, fr edges are created differently. If
α was a rd action we would say that we have α→fr γ if β →rf α and β →st γ.
For the havoc case, the situation is a little bit different. Let γ = (com, x, v′) be
an action. We have α →fr γ if and only if either β →rf α, β →st γ and φ(v′)
is false or α →fr γ′ and γ′ →st γ where γ′ is an action. Intuitively, there is a
from-read dependency from a havoc action to a commit action only when the
commit action invalidates the constraint φ(x) of the havoc (or when it follows
such a commit in store order).
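The difference between the two fr rules can be made concrete with a small Python sketch. It is an illustration under assumptions that are not in the text: the store order on x is given as a list of (identifier, value) pairs and is treated as a transitive total order, and the identifiers init, w1, w2, w3 are hypothetical. The instantiated constraint is the one from Figure 6 with the register value fixed.

```python
from typing import Callable, List, Tuple

Commit = Tuple[str, int]  # (hypothetical commit identifier, value written to x)


def read_fr_targets(store_order: List[Commit], beta_index: int) -> List[str]:
    """fr targets of a regular rd action reading from store_order[beta_index]:
    every commit that follows the source in store order."""
    return [ident for ident, _ in store_order[beta_index + 1:]]


def havoc_fr_targets(store_order: List[Commit], beta_index: int,
                     phi: Callable[[int], bool]) -> List[str]:
    """fr targets of a hvc action reading from store_order[beta_index]:
    the first later commit whose value falsifies phi, and all commits after it."""
    targets: List[str] = []
    invalidated = False
    for ident, value in store_order[beta_index + 1:]:
        if not phi(value):
            invalidated = True
        if invalidated:
            targets.append(ident)
    return targets


def fig6_phi_for(r1: int) -> Callable[[int], bool]:
    """Constraint of Figure 6 with the register instantiated to r1; x stays free."""
    def phi(x: int) -> bool:
        return (r1 == x or r1 == 0) if x != 0 else r1 == 0
    return phi


STORE_ORDER: List[Commit] = [("init", 0), ("w1", 1), ("w2", 0), ("w3", 1)]

if __name__ == "__main__":
    print(read_fr_targets(STORE_ORDER, 0))
    print(havoc_fr_targets(STORE_ORDER, 0, fig6_phi_for(0)))
    print(havoc_fr_targets(STORE_ORDER, 1, fig6_phi_for(1)))
```

In this sketch, a havoc that returned 0 has no fr edge to any later commit, because the instantiated constraint holds for every value of x, whereas a regular read of 0 has an fr edge to every later commit.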
The notion of write-atomicity (Definition 1) extends to programs with havoc
instructions by interpreting havoc instructions havoc(r, φ(x,−→r )) as regular read
instructions r := x. Theorem 1 which states that write-atomicity implies robust-
ness can also be easily extended to this case.
Read abstractions are useful in two ways. First, they allow us to prove
properties of non-robust programs, such as the work stealing queue example in
Figure 3. We can apply appropriate read abstractions to relax the original
program so that it becomes robust. Then, we can use SC reasoning tools on the
robust program to prove invariants of the program.
Second, read abstractions could be helpful for proving robustness directly.
The method based on write-atomicity we propose for verifying robustness is
sound but not complete. Some incompleteness scenarios can be avoided using
read abstractions. If we can abstract read instructions such that the new program
reaches exactly the same states (in terms of shared variables) as the original one,
it may help to avoid executions that violate mover checks.
Consider the program in Figure 6. The write statement x := 1 in procedure
foo is not atomic. It is not a left mover due to the read of x in the do-while loop
of bar. Moreover, the later read from y is buffer-free reachable from this write
and it is not a right mover because of the write to y in bar. To make it atomic, we
apply read abstraction to the read instruction of bar that reads from x. In the
new relaxed read, r1 can read 0 along with the value of x when x is not zero as
shown in the comments below the instruction. With this abstraction, the write
to x becomes a left mover because reads from x after the write can now read
the old value which was 0. Consequently, the program becomes write-atomic. If
we think of TSO traces of the abstract program and replace hvc nodes with rd
nodes, we obtain exactly the TSO traces of the original program. However, the
read abstraction adds more SC traces to the program and the program becomes
robust.
6 Experimental Evaluation
To test the practical value of our method, we have considered the benchmark
for checking TSO robustness described in [2], which consists of 34 programs.
This benchmark is quite exhaustive: it includes examples introduced in previous
works on this subject.
Table 1. Benchmark results. The second column (SV) states whether the SC and TSO
executions of the original program (without read abstractions) reach the same
shared variable valuations. The SR column shows whether the original program is
robust according to the standard definition. The fourth column (RB) gives the
robustness status of the original program according to our extended hb
definition. The RA column shows the number of read abstractions performed. The
RM column gives the number of read instructions that are checked to be right
movers, and the LM column gives the number of write instructions that are shown
to be left movers. PO shows the total number of proof obligations generated,
and VT is the total verification time in seconds.
Name             SV  SR  RB  RA  RM  LM  PO   VT
Chase-Lev        -   -   -   1   2   -   149  0.332
FIFO-iWSQ        +   -   +   -   2   -   124  0.323
LIFO-iWSQ        +   -   +   -   1   -   109  0.305
Anchor-iWSQ      +   -   +   -   1   -   109  0.309
MCSLock          +   +   +   2   2   -   233  0.499
r+detour         +   -   +   -   1   -   53   0.266
r+detours        +   -   +   -   1   -   64   0.273
sb+detours+coh   +   -   +   -   2   -   108  0.322
sb+detours       +   -   +   -   1   1   125  0.316
write+r+coh      +   -   +   -   1   -   78   0.289
write+r          +   -   +   -   1   -   48   0.261
dc-locking       +   +   +   1   4   1   52   0.284
inline pgsql     +   -   +   2   2   -   90   0.286
Many of the programs in this benchmark are easy to prove write-atomic. No
write is followed by a buffer-free read instruction, which makes the writes
trivially atomic (as in the message passing program in Figure 1). This holds
for 20 out of the 34 programs.
For 13 examples, we needed to perform mover checks and/or read abstractions
to show robustness. 12 of these examples are robust by our trace-robustness
definition, but 10 of them are not robust according to the standard
trace-robustness definition. For the 12 programs other than Chase-Lev, both SC
and TSO executions reach the same set of shared variable valuations. Detailed
information for these examples can be found in Table 1. To check whether
writes/reads are left/right movers and whether the abstractions are sound, we
have used the tool Civl [12]. This tool allows proving assertions about
concurrent programs (Owicki-Gries annotations) and also checking whether an
instruction is a left/right mover. The buffer-free read instructions reachable
from a write before a fence were obtained using a trivial analysis of the
control-flow graph (CFG) of the program. This analysis is a sound approximation
of the definition in Section 4, and it was sufficient for all the examples.
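A CFG analysis of this kind could look like the following Python sketch. It is not the analysis used in the experiments, only a minimal illustration under assumptions: the CFG is a dictionary from hypothetical labels to successor labels, every label is tagged as write, read, fence or other, and only fences are treated as draining the buffer. Reads of the variable that was just written are not filtered out, so the result over-approximates the buffer-free reads.

```python
from collections import deque
from typing import Dict, List, Set


def reads_before_fence(cfg: Dict[str, List[str]], kind: Dict[str, str],
                       write: str) -> Set[str]:
    """Labels of read instructions reachable from `write` along CFG paths
    that do not pass through a fence (breadth-first search)."""
    seen: Set[str] = set()
    reads: Set[str] = set()
    queue = deque(cfg.get(write, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if kind[node] == "fence":
            continue  # the buffer is empty after a fence; stop exploring this path
        if kind[node] == "read":
            reads.add(node)
        queue.extend(cfg.get(node, []))
    return reads


# Procedure foo of Figure 6: l1: x := 1; l2: r2 := y; (labels are hypothetical)
FOO_CFG = {"l1": ["l2"], "l2": []}
FOO_KIND = {"l1": "write", "l2": "read"}

# The same procedure with a fence inserted between the write and the read.
FENCED_CFG = {"l1": ["lf"], "lf": ["l2"], "l2": []}
FENCED_KIND = {"l1": "write", "lf": "fence", "l2": "read"}

if __name__ == "__main__":
    print(reads_before_fence(FOO_CFG, FOO_KIND, "l1"))
    print(reads_before_fence(FENCED_CFG, FENCED_KIND, "l1"))
```

A write for which this set is empty needs no mover check, which corresponds to the trivially atomic case described above.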
Our method was not precise enough to prove robustness for just one example,
named nbw-w-lr-rl in [7]. This program contains a method with explicit calls
to the lock and unlock methods of a spinlock. The instruction that writes to
the lock variable inside the unlock method is not atomic, because of the reads
from the lock variable and the calls to the getAndSet primitive inside the lock
method. Abstracting the reads from the lock variable is not sufficient in this
case due to the conflicts with getAndSet actions. However, we believe that read
abstractions could be extended to getAndSet instructions (which both read and
write to a shared variable atomically) in order to deal with this example.
7 Related Work
The weakest correctness criterion that enables SC reasoning for proving
invariants of programs running under TSO is state-robustness, i.e., the
reachable set of states is the same under both SC and TSO. However, this problem has high
complexity (at least non-primitive recursive for programs with a finite number
of threads and a finite data domain [6]). Therefore, it is difficult to come up with
an efficient and precise solution. A symbolic decision procedure is presented in
[1] and over-approximate analyses are proposed in [13,14].
Due to the high complexity of state-robustness, stronger correctness crite-
ria with lower complexity have been proposed. Trace-robustness (that we call
simply robustness in our paper) is one of the most studied criteria in the litera-
ture. Bouajjani et al. [8] have proved that deciding trace-robustness is PSpace-
complete for a finite number of threads and a finite data domain.
There are various tools for checking trace-robustness. Trencher ([7]) is a
sound and complete tool for this purpose that tries to find minimal violations
by delaying writes of a single thread. Musketeer ([4]) provides an approximate
solution by checking existence of critical cycles on the control-flow graph. Other
tools that implement approximate verification procedures have been proposed
in [5,9,10]. Compared to these tools, our approach is either more general, i.e.,
it allows proving robustness for programs that have an unbounded number of
threads and an unbounded data domain, or more precise than, for instance,
Musketeer. Also, we propose an abstraction mechanism that allows proving
invariants of non-robust programs.
Besides trace-robustness, there are other correctness criteria like triangular
race freedom (Trf) and persistence that are stronger than state-robustness.
Persistence ([2]) is incomparable to trace-robustness, and Trf [18] is stronger
than both trace-robustness and persistence. Our method can verify examples
that are state-robust but neither persistent nor Trf.
Reduction and abstraction techniques have been used for reasoning about SC
programs. Qed ([11]) is a tool that supports statement transformations as a way of
abstracting programs combined with a mover analysis. Also, Civl ([12]) allows
proving location assertions in the context of the Owicki-Gries logic which is en-
hanced with Lipton’s reduction theory [16]. Our work enables the use of such
tools for reasoning about the TSO semantics of a program.
References
1. Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Carl Leonardsson, and
Ahmed Rezine. Counter-example guided fence insertion under TSO. In International
Conference on Tools and Algorithms for the Construction and Analysis of Systems,
pages 204–219. Springer, 2012.
2. Parosh Aziz Abdulla, Mohamed Faouzi Atig, and Tuan-Phong Ngo. The best
of both worlds: Trading efficiency and optimality in fence insertion for TSO. In
European Symposium on Programming Languages and Systems, pages 308–332.
Springer, 2015.
3. Sarita V. Adve and Mark D. Hill. A unified formalization of four shared-memory
models. IEEE Trans. Parallel Distrib. Syst., 4(6):613–624, 1993.
4. Jade Alglave, Daniel Kroening, Vincent Nimal, and Daniel Poetzl. Don't sit on the
fence. In International Conference on Computer Aided Verification, pages 508–524.
Springer, 2014.
5. Jade Alglave and Luc Maranget. Stability in weak memory models. In Computer
Aided Verification, pages 50–66. Springer, 2011.
6. Mohamed Faouzi Atig, Ahmed Bouajjani, Sebastian Burckhardt, and Madanlal
Musuvathi. On the verification problem for weak memory models. ACM Sigplan
Notices, 45(1):7–18, 2010.
7. Ahmed Bouajjani, Egor Derevenetc, and Roland Meyer. Checking and enforcing
robustness against TSO. In Programming Languages and Systems, pages 533–553.
Springer, 2013.
8. Ahmed Bouajjani, Roland Meyer, and Eike Möhlmann. Deciding robustness
against total store ordering. In International Colloquium on Automata, Languages,
and Programming, pages 428–440. Springer, 2011.
9. Sebastian Burckhardt and Madanlal Musuvathi. Effective program verification for
relaxed memory models. In International Conference on Computer Aided Verifi-
cation, pages 107–120. Springer, 2008.
10. Jacob Burnim, Koushik Sen, and Christos Stergiou. Sound and complete
monitoring of sequential consistency for relaxed memory models. In International
Conference on Tools and Algorithms for the Construction and Analysis of Systems,
pages 11–25. Springer, 2011.
11. Tayfun Elmas, Shaz Qadeer, and Serdar Tasiran. A calculus of atomic actions. In
ACM Symposium on Principles of Programming Languages, page 14. Association
for Computing Machinery, Inc., January 2009.
12. Chris Hawblitzel, Shaz Qadeer, and Serdar Tasiran. Automated and modular
refinement reasoning for concurrent programs. Computer Aided Verification, 2015.
13. Michael Kuperstein, Martin Vechev, and Eran Yahav. Partial-coherence abstrac-
tions for relaxed memory models. In ACM SIGPLAN Notices, volume 46, pages
187–198. ACM, 2011.
14. Michael Kuperstein, Martin Vechev, and Eran Yahav. Automatic inference of
memory fences. ACM SIGACT News, 43(2):108–123, 2012.
15. Leslie Lamport. How to make a multiprocessor computer that correctly executes
multiprocess programs. Computers, IEEE Transactions on, 100(9):690–691, 1979.
16. Richard J Lipton. Reduction: A method of proving properties of parallel programs.
Communications of the ACM, 18(12):717–721, 1975.
17. Richard J. Lipton. Reduction: A method of proving properties of parallel programs.
Commun. ACM, 18(12):717–721, 1975.
18. Scott Owens. Reasoning about the implementation of concurrency abstractions on
x86-TSO. In ECOOP, volume 6183, pages 478–503. Springer, 2010.
19. Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, and
Magnus O. Myreen. x86-TSO: a rigorous and usable programmer’s model for x86
multiprocessors. Commun. ACM, 53(7):89–97, 2010.
20. Dennis Shasha and Marc Snir. Efficient and correct execution of parallel programs
that share memory. ACM Transactions on Programming Languages and Systems
(TOPLAS), 10(2):282–312, 1988.
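$$\dfrac{x := ae(\vec{r_t}) \in ins(pc(t)) \quad v = eval(ae(\vec{r_t})) \quad x \in \vec{x}}{(pc, mem, buf) \xrightarrow{(t,\,isu)}_{TSO} (pc', mem, buf[t \to buf(t) \circ \langle (x, v) \rangle])}$$

$$\dfrac{buf(t) = \langle (x, v) \rangle \circ buf' \quad x \in \vec{x}}{(pc, mem, buf) \xrightarrow{(t,\,com,\,x,\,v)}_{TSO} (pc, mem[x \to v], buf[t \to buf'])}$$

$$\dfrac{r := ae(\vec{r_t}) \in ins(pc(t)) \quad v = eval(ae(\vec{r_t})) \quad r \in \vec{r_t}}{(pc, mem, buf) \xrightarrow{(t,\,\tau)}_{TSO} (pc', mem[r \to v], buf)}$$

$$\dfrac{r := x \in ins(pc(t)) \quad x \in \vec{x} \quad v = mem(x) \quad x \notin varsOfBuf(buf(t)) \quad r \in \vec{r_t}}{(pc, mem, buf) \xrightarrow{(t,\,rd,\,x,\,v)}_{TSO} (pc', mem[r \to v], buf)}$$

$$\dfrac{r := x \in ins(pc(t)) \quad x \in \vec{x} \quad buf(t) = \alpha \circ \langle (x, v) \rangle \circ \beta \quad x \notin varsOfBuf(\beta) \quad r \in \vec{r_t}}{(pc, mem, buf) \xrightarrow{(t,\,rd,\,x,\,v)}_{TSO} (pc', mem[r \to v], buf)}$$

$$\dfrac{fence \in ins(pc(t)) \quad buf(t) = \epsilon}{(pc, mem, buf) \xrightarrow{(t,\,\tau)}_{TSO} (pc', mem, buf)}$$

$$\dfrac{r := cas(x, ae_1(\vec{r_t}), ae_2(\vec{r_t})) \in ins(pc(t)) \quad x \in \vec{x} \quad r \in \vec{r_t} \quad mem(x) = eval(ae_1(\vec{r_t})) \quad buf(t) = \epsilon \quad v = eval(ae_2(\vec{r_t}))}{(pc, mem, buf) \xrightarrow{(t,\,isu)(t,\,com,\,x,\,v)}_{TSO} (pc', mem[r \to 1][x \to v], buf)}$$

$$\dfrac{r := cas(x, ae_1(\vec{r_t}), ae_2(\vec{r_t})) \in ins(pc(t)) \quad x \in \vec{x} \quad r \in \vec{r_t} \quad mem(x) \neq eval(ae_1(\vec{r_t})) \quad buf(t) = \epsilon \quad v = mem(x)}{(pc, mem, buf) \xrightarrow{(t,\,rd,\,x,\,v)}_{TSO} (pc', mem[r \to 0], buf)}$$

$$\dfrac{assume\ be(\vec{r_t}) \in ins(pc(t)) \quad eval(be(\vec{r_t})) = \top}{(pc, mem, buf) \xrightarrow{(t,\,\tau)}_{TSO} (pc', mem, buf)}$$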
Fig. 5. The TSO Transition Relation. The function ins takes a label l and returns
the set of instructions labeled by l. We always assume that pc′ = pc[t → l′] where
pc(t) : inst goto l′; is a labelled instruction of t and inst is the instruction described
at the beginning of the rule. The evaluation function eval calculates the value of an
arithmetic or boolean expression based on mem (ae stands for arithmetic expression).
Sequence concatenation is denoted by ◦. The function varsOfBuf takes a sequence of
pairs and returns the set consisting of the first fields of these pairs.
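The buffer-related rules of Fig. 5 can be exercised with a small executable model. The Python sketch below is a simplification under assumptions: the class TSOState and its method names are hypothetical, program counters and registers are left to the caller, and only the issue, commit, read and fence rules are modeled. The demo function replays the classic store-buffering scenario, in which each of two threads writes one variable and then reads the other.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Tuple


class TSOState:
    """Shared memory plus one FIFO store buffer per thread (hypothetical model)."""

    def __init__(self, shared: Dict[str, int], threads: Iterable[str]) -> None:
        self.mem: Dict[str, int] = dict(shared)
        self.buf: Dict[str, Deque[Tuple[str, int]]] = {t: deque() for t in threads}

    def issue(self, t: str, x: str, v: int) -> None:
        """(t, isu): append the write (x, v) to the end of t's buffer."""
        self.buf[t].append((x, v))

    def commit(self, t: str) -> None:
        """(t, com, x, v): move the oldest buffered write of t to memory."""
        x, v = self.buf[t].popleft()
        self.mem[x] = v

    def read(self, t: str, x: str) -> int:
        """(t, rd, x, v): latest buffered write of t to x, else the memory value."""
        for var, v in reversed(self.buf[t]):
            if var == x:
                return v
        return self.mem[x]

    def fence_enabled(self, t: str) -> bool:
        """A fence can execute only when t's buffer is empty."""
        return not self.buf[t]


def store_buffering():
    """t1: x := 1; r1 := y   and   t2: y := 1; r2 := x, with delayed commits."""
    s = TSOState({"x": 0, "y": 0}, ["t1", "t2"])
    s.issue("t1", "x", 1)
    s.issue("t2", "y", 1)
    r1 = s.read("t1", "y")   # the write to y by t2 is still buffered
    r2 = s.read("t2", "x")   # the write to x by t1 is still buffered
    own = s.read("t1", "x")  # t1 sees its own buffered write
    s.commit("t1")
    s.commit("t2")
    return r1, r2, own, s.mem


if __name__ == "__main__":
    print(store_buffering())
```

Both reads return 0 in this run, an outcome that no SC interleaving of the two threads produces.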
procedure foo() {
  x := 1;
  r2 := y;
}
procedure bar() {
  do {
    r1 := x;
    // havoc(r1, (x ≠ 0) ? r1 = x ∨ r1 = 0 : r1 = 0)
  } while (r1 == 0);
  y := 1;
}
Fig. 6. An example program that needs read abstraction to pass our robustness checks.
The havoc statement in comments reads as follows: if value of x is not 0 then r1 gets
either the value of x or 0. Otherwise, it is 0.
