Efficient Verification of Concurrent Programs Over TSO Memory Model by Narayan, Chinmay et al.
arXiv:1606.05435v1 [cs.LO] 17 Jun 2016
Efficient Verification of Concurrent Programs
Over the TSO Memory Model
Chinmay Narayan, Subodh Sharma, S.Arun-Kumar
Indian Institute of Technology Delhi
Abstract. We address the problem of efficient verification of multi-
threaded programs running over Total Store Order (TSO) memory model.
It has been shown that even with finite data domain programs, the com-
plexity of control state reachability under TSO is non-primitive recur-
sive. In this paper, we first present a bounded-buffer verification approach
wherein a bound on the size of buffers is placed; verification is performed
incrementally by increasing the size of the buffer with each iteration of
the verification procedure until the said bound is reached. For programs
operating on finite data domains, we also demonstrate the existence of a
buffer bound k such that if the program is safe under that bound, then it
is also safe for unbounded buffers. We have implemented this technique
in a tool ProofTraPar. Our results against memorax [2], a state-of-the-art
sound and complete verifier for TSO memory model, have been encour-
aging.
1 Introduction
The explosion in the number of schedules is central to the complexity of veri-
fying the safety and correctness of concurrent programs. There exist a plethora
of approaches in the literature that explore ways and means to address the
schedule-space explosion problem; incidentally, many of the published techniques
operate under the assumption of a sequentially consistent (SC) memory
model. In contrast, almost all modern multi-core processors conform to memory
models weaker than SC. A program executing on a relaxed memory model ex-
hibits more behaviours than on the SC memory model. As a result, a program
declared correct by a verification methodology that assumes SC memory model
can possibly contain a buggy behaviour when executed on a relaxed memory
model.
Consider x86 machines that conform to TSO (Total Store Ordering). The
compiler or the runtime system of the program under the TSO memory model
is allowed to reorder a read following a write (read and writes are to different
variables) within a process, i.e. break the program order specified by the devel-
oper. Operationally, such a re-ordering is achieved by maintaining per-process
store buffers. Write operations issued by a process/thread are enqueued in the
store buffer local to that process. The buffered writes are later flushed (from
the buffer) into the global memory. The point in time when flushes take place
is deterministically known only when the store buffers are full. When the store
buffers are partially full, flushes are allowed to take place non-deterministically.
Therefore, when a read operation of variable x, is executed by a process, the pro-
cess first checks whether there is a recent write to x in the process’s store buffer.
If such a write exists then the value from store buffer is returned, otherwise the
value is read from the global memory.
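The store-buffer mechanism described above can be sketched in a few lines of Python. This is only an illustrative toy model, not the formal semantics used later in the paper; the class name TSOMemory and its method names are assumptions introduced here.

```python
from collections import deque


class TSOMemory:
    """Toy model: one FIFO store buffer per process in front of a shared memory."""

    def __init__(self, init, num_procs):
        self.mem = dict(init)                            # global memory
        self.buf = [deque() for _ in range(num_procs)]   # per-process store buffers

    def write(self, pid, var, val):
        # A write is only enqueued locally; other processes cannot see it yet.
        self.buf[pid].append((var, val))

    def read(self, pid, var):
        # The newest buffered write to var by the same process wins.
        for v, val in reversed(self.buf[pid]):
            if v == var:
                return val
        # Otherwise the value comes from global memory.
        return self.mem[var]

    def flush_one(self, pid):
        # Commit the oldest buffered write (FIFO order) to global memory.
        var, val = self.buf[pid].popleft()
        self.mem[var] = val


m = TSOMemory({"x": 0}, num_procs=2)
m.write(0, "x", 1)
own_view = m.read(0, "x")      # process 0 reads its own buffered write
other_view = m.read(1, "x")    # process 1 still reads global memory
m.flush_one(0)
after_flush = m.read(1, "x")   # the write is now globally visible
```

Before the flush the two processes disagree on the value of x, which is exactly the effect that makes TSO weaker than SC.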
flag1 = false, flag2 = false, t = 0;
P1
while(true){
1. flag1:=true;
2. t:=2;
3. while(flag2 = true & t = 2);
4. //Critical Section
5. flag1:=false;
}
P2
while(true){
6. flag2:=true;
7. t:=1;
8. while(flag1 = true & t = 1);
9. //Critical Section
10. flag2:=false;
}
Fig. 1: Peterson’s algorithm for two processes
Figure 1 shows Peterson’s algorithm as an instance of a program that is correct
under SC semantics but can fail when executed under TSO. In this algorithm, two
processes P1 and P2 coordinate their access to their respective critical
sections using a shared variable t. This algorithm satisfies the mutual
exclusion property under the SC memory model, i.e. both processes cannot be
simultaneously present in their critical sections. The property, however, is
violated when the same algorithm is executed with a weaker memory model, such
as TSO. Consider the following execution under TSO. The write operations at
1, 2, 6 and 7 from processes P1 and P2 are stored in store buffers and are yet
to be reflected in the global memory. The reads at control locations 3 and 8 will
return initial values, thereby violating the mutual exclusion property.
One can avoid such erroneous behaviors and restore the SC semantics of
the program by inserting special instructions, called memory fences, at chosen
control locations in the program. A memory fence ensures that the store buffer
of the process (which executes the fence instruction) is flushed entirely before
proceeding to the next instruction for execution. In the example, when fence
instructions are placed after flag1:=true in P1 and after flag2:=true in P2, the
mutual exclusion property is restored.
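The failing execution and the effect of the two fences can be replayed with a small Python sketch. It replays one fixed schedule only (lines 1-2 of P1, lines 6-7 of P2, then both loop guards) and is therefore an illustration, not a proof of correctness; the function names are assumptions introduced here.

```python
def read(pid, var, mem, buf):
    """TSO read: newest own buffered write to var, else global memory."""
    for v, val in reversed(buf[pid]):
        if v == var:
            return val
    return mem[var]


def entry_protocol_outcome(with_fence):
    """Replay one schedule of Fig. 1 and report (P1 enters, P2 enters).

    No buffered write is flushed unless a fence forces it.
    """
    mem = {"flag1": False, "flag2": False, "t": 0}
    buf = {1: [], 2: []}

    def write(pid, var, val):
        buf[pid].append((var, val))

    def fence(pid):
        for var, val in buf[pid]:      # drain the buffer in FIFO order
            mem[var] = val
        buf[pid].clear()

    write(1, "flag1", True)            # line 1
    if with_fence:
        fence(1)
    write(1, "t", 2)                   # line 2
    write(2, "flag2", True)            # line 6
    if with_fence:
        fence(2)
    write(2, "t", 1)                   # line 7

    # Lines 3 and 8: a process enters when its spin guard is false.
    p1_spins = read(1, "flag2", mem, buf) and read(1, "t", mem, buf) == 2
    p2_spins = read(2, "flag1", mem, buf) and read(2, "t", mem, buf) == 1
    return (not p1_spins, not p2_spins)


without_fence = entry_protocol_outcome(with_fence=False)
with_fences = entry_protocol_outcome(with_fence=True)
```

Without fences both guards read the initial flag values from global memory and both processes enter. With the fences, each flag is visible to the other process before the guard is tested, so in this schedule both processes keep spinning until one of the buffered writes to t is flushed.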
Safety verification under TSO is a hard problem even in the case of finite data
domain programs. The main reason for this complexity is the unboundedness
of store buffers. A program can be proved correct under TSO only when the
non-reachability of the error location is shown irrespective of the bound on the
buffers. The work in [7] demonstrated the equivalence of the TSO-reachability
problem to the coverability problem of lossy channel machines which is decidable
and of non-primitive recursive complexity. A natural question is to ask if it is
possible to have a buffer bound k such that if a finite data domain program is safe
under the k-bounded TSO semantics then it is guaranteed to be safe even with
unbounded buffers. For programs without loops such a statement seems to hold
intuitively. For programs with loops, it is possible that a write instruction inside
a loop keeps filling the buffer with values without ever getting them flushed to
the main memory. However, for finite data domain programs, only a finite set
of different values will be present in this unbounded buffer and this leads to a
sufficient bound on the buffer size.
In this paper we show that it is possible to verify a program P under TSO
(with unbounded buffers) by generalizing the bounded buffer verification. To-
wards this we first define TSOk, TSO semantics with buffer size k, and then
characterize a bound k0 such that if a program is safe in TSOk0 then it is safe
for any buffer bound greater than k0. We adapt a recently proposed trace parti-
tioning based approach [16,25] for the TSO memory model. These methods work
for the SC memory model as follows: the set of all SC executions of a program
P are partitioned in a set of equivalence classes such that it is sufficient to prove
the correctness of only one execution per equivalence class. As this trace parti-
tioning approach works with symbolic executions, we first define an equivalent
TSO semantics to generate a set of symbolic TSO traces. Subsequently we invoke
a trace partitioning tool ProofTraPar [25] for proving the correctness of these
traces. Note that the set of behaviors of a program P under TSOk is a subset of
the behaviors of P under TSOk′ for any k′ > k. The trace partitioning approach
allows us to reuse the proof of correctness of P with buffer bound k in the proof
of correctness of P with any buffer bound greater than k. In a nutshell, the main
contributions of this work are:
– We characterize a buffer bound in case of finite state programs such that if
the program is correct under TSO up to that bound then it is correct for
unbounded buffers as well.
– We adapt the recently proposed trace partition based proof strategy of SC
verification [16,25] for TSO by defining an equivalent TSO semantics to
generate a set of symbolic TSO traces.
– We implement our approach in a tool, ProofTraPar[25], and compare its
performance against memorax[2], a sound and complete verifier for safety
properties under TSO. We perform competitively in terms of time as well as
space. In a few examples, memorax timed out after consuming around 6GB
of RAM whereas our approach could analyze the program in less than 100
MB memory.
Section 2 covers the related work in the area of verification under relaxed
memory models. Section 3 covers the notations used in this paper. Section 4
shows the necessary and sufficient conditions to generalize bounded verification
to unbounded buffers for finite data domain programs. Section 5 presents an
equivalent TSO semantics to generate a set of symbolic traces which can be
used by the trace partitioning tool ProofTraPar to check the correctness under
a buffer bound. This section ends with an approach based on critical cycle to
insert memory fence instructions. Section 6 compares the performance of our
approach with memorax. Section 7 concludes with future directions.
2 Related Work
Figure 2 captures the related work in this area. Verification approaches for
relaxed memory models can be broadly divided into three classes: precise,
under-approximate and over-approximate. For finite state programs, the works
in [7,2] present sound and complete algorithms for control state reachability
under the TSO and PSO memory models.
RMM Verification
  Precise
    Safety Property: Memorax [7,2], Remmex [22,23]
    SC Property: Robustness [27,11,10,6], Persistence [3]
  Under-approximate
    Buffer-bounded [13,20,24,14]
    Context-bounded [8]
  Over-approximate [4,5,19]
Fig. 2: State of the art
Sets of infinite configurations, aris-
ing from unbounded buffer size, are
finitely presented using regular ex-
pressions. Acceleration based tech-
niques that led to faster convergence
in the presence of loops were pre-
sented in [22,23]. However, the ter-
mination of the algorithm was not
guaranteed. Notice that in both [2]
and [23] the specification was a set
of control states to be avoided. One
can also ask the state reachability
question with respect to SC specifica-
tion, i.e. does a program P reach only
SC reachable states under a relaxed
memory model? This problem was shown to have the same complexity as SC
verification (PSPACE-complete) and hence gives a more tractable correctness
criterion than the general state reachability problem. References [27,11,10,6,3] work
with this notion of correctness and give efficient algorithms to handle a range
of memory models. In this paper we work with the control state reachability
problem as opposed to the SC state reachability problem.
Over-approximate analyses [4,5,19] trade precision for efficiency and construct
an over-approximate set of reachable states. Recently [1,28] used stateless
model checking under the TSO and PSO memory models. The main focus of these
approaches is on finding bugs rather than proving programs correct. Another
line of work to make the state reachability problem more tractable involves
either restricting the size of buffers [13,20,24,14] or bounding the context
switches [8] among threads. None of these methods gives a completeness
guarantee, even for finite data domain programs.
3 Preliminary
A concurrent program is a set of processes uniquely identified by indices t from
the set TID. As in [2,9], a process Pt is specified as an automaton 〈Qt, LABLt, δt, q0,t〉.
Here Qt is a finite set of control states, δt ⊆ Qt × LABLt × Qt is a transition
relation and q0,t is the initial state. Without loss of generality we assume every
transition is labeled with a different symbol from LABLt. LABLt represents a
finite set of labels to symbolically represent the instructions of the program. Let
SV be the set of shared variables of program P ranged over by x, y, z, Val be
a finite set of constants ranged over by v, LVt be the set of local variables of
process Pt ranged over by ℓ,m, and Expt be the set of expressions constructed
using LVt, Val and appropriate operators. Let $LV = \bigcup_t LV_t$,
$Exp = \bigcup_t Exp_t$, and $LABL = \bigcup_t LABL_t$. Let a, b, c range over
LABL and e range over Exp. Formally an instruction, from set INST, is one of
the following types: (i) x:=e, (ii) ℓ:=x, (iii) ℓ:=e, (iv) assume(e), and
(v) fence, where x ∈ SV, ℓ ∈ LV and e ∈ Exp.
A function Ins : LABL → INST assigns an instruction to every label.
The first two assignment instructions, (i) and (ii), are the write and the read
operations of shared variables, respectively. Instruction (iii) assigns the value of
an expression (constructed from local variables and constants) to a local variable,
hence, does not include any shared memory operation. Instruction (iv) is used
to model loop and conditional statements of the program. Note that the boolean
expression e in assume(e) does not contain any shared variable. Instruction (v)
represents the fence operation provided by the TSO architecture. Let Loc(a)
be the shared variable used in Ins(a). For a function F : A → B, let the function
F[p ← q] be the same as F everywhere except at p where it maps to q.
TSO Semantics In the TSO memory model, every process has a buffer of un-
bounded capacity. However, we present the TSO semantics by first defining a
k-bounded TSO semantics where all buffers are of fixed size k. For a concurrent
program P , the k-bounded semantics is given by a transition system
$TSO_k = \langle S, \rightarrow_k, s_0 \rangle$. Every state s ∈ S is of the form
$(cs, Lm, Gm, Buff_k)$ where process control states $cs : TID \to Q$ with
$Q = \bigcup_t Q_t$, local memory $Lm : TID \times LV \to Val$, global memory
$Gm : SV \to Val$, and k-length bounded buffers
$Buff_k : TID \to (SV \times Val)^k$.
We overload operator ‘.’ to denote the concatenation of labels as well as a deref-
erencing operator to identify a specific field inside a state. Therefore, for a state
s, s.Gm, s.Lm and s.Buffk denote the functions representing global memory, local
memory, and buffers respectively. Every write operation to a shared variable by
process Pt initially gets stored in the process-local buffer provided that the buffer
has less than k (buffer-bound) elements. This write operation is later removed
from the buffer non-deterministically to update the global memory. A read op-
eration of a shared variable say x, by a process Pt first checks the local buffer for
any write to x. If the buffer contains any write to x then the value of the last write to
x is returned as a result of this read operation. If no such write is present in the
buffer of Pt then the value is read from the global memory. A process executes
instruction fence only when its local buffer is empty. For instruction assume(e),
boolean expression e is evaluated in the local state of Pt. Execution proceeds
only when the expression e evaluates to true. Assignment operation involving
only local variables changes the local memory of Pt. The transition relation →k
is defined in detail in the Appendix.
Relevance of the buffer size k Parameter k influences the extent of reordering
that happens in an execution. For example, if k = 0 then no reordering happens
and the set of executions is the same as under the SC memory model. Size pa-
rameter k, under the TSO memory model, allows any two instructions separated
by at most k instructions to be reordered, provided that one is a write and the other
is a read instruction. This reorder-bounded analysis was also shown effective by
[18] and seems a natural way to make this problem tractable.
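The influence of k can be observed on a small example with an explicit-state exploration. The sketch below enumerates all executions of a hypothetical two-process program (process 0 runs x:=1; l0:=y and process 1 runs y:=1; l1:=x) under a buffer bound k. It is an illustration written for this text, not the tool described in the paper. One modelling assumption is made: for k = 0 a write goes straight to global memory, matching the remark that k = 0 coincides with SC.

```python
def outcomes(k):
    """All final (l0, l1) values of the two-process program under buffer bound k."""
    # Each process writes 1 to its own variable, then reads the other variable.
    prog = [[("w", "x"), ("r", "y")],
            [("w", "y"), ("r", "x")]]
    # State: (program counters, local registers, global memory, store buffers)
    init = ((0, 0), (None, None), (("x", 0), ("y", 0)), ((), ()))
    seen, stack, finals = {init}, [init], set()

    def push(state):
        if state not in seen:
            seen.add(state)
            stack.append(state)

    while stack:
        pcs, regs, mem, bufs = stack.pop()
        if pcs == (2, 2) and bufs == ((), ()):
            finals.add(regs)
        for t in (0, 1):
            buf = bufs[t]
            if buf:
                # Flush: commit the oldest buffered write of process t.
                (var, val), rest = buf[0], buf[1:]
                new_mem = tuple((v, val if v == var else old) for v, old in mem)
                push((pcs, regs, new_mem, bufs[:t] + (rest,) + bufs[t + 1:]))
            if pcs[t] == 2:
                continue
            kind, var = prog[t][pcs[t]]
            new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if kind == "w":
                if k == 0:
                    # Assumption: k = 0 is modelled as write-through (SC).
                    new_mem = tuple((v, 1 if v == var else old) for v, old in mem)
                    push((new_pcs, regs, new_mem, bufs))
                elif len(buf) < k:
                    new_bufs = bufs[:t] + (buf + ((var, 1),),) + bufs[t + 1:]
                    push((new_pcs, regs, mem, new_bufs))
            else:
                # Read: newest own buffered write to var, else global memory.
                hits = [val for v, val in buf if v == var]
                val = hits[-1] if hits else dict(mem)[var]
                push((new_pcs, regs[:t] + (val,) + regs[t + 1:], mem, bufs))
    return finals


sc_outcomes = outcomes(0)
tso1_outcomes = outcomes(1)
tso2_outcomes = outcomes(2)
```

For this program the bound k = 1 already adds the non-SC outcome (0, 0), and k = 2 adds nothing further because each process performs a single write.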
4 Unbounded Buffer Analysis
In this section we show that for any finite data domain program and safety
property φ, there exists a buffer size k0 such that it is sufficient to prove φ for
all buffers up to size k0. Note that for programs with write instructions inside
loops, it is possible to keep on writing to the buffer without flushing the writes to
the main memory. However since the data domain is finite, such instructions are
guaranteed to repeatedly write the same set of values to the buffer. It is this
repetition that guarantees the existence of a sufficient bound on the buffer.
The set of states in $TSO_k$ is monotonic with respect to the buffer bound,
i.e. $S_k \subseteq S_{k'}$ for all $k \le k'$. Let
$s{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})}$ denote the restriction of a state
in S to only control states, global memory, local memory, and last writes (if
any) to shared variables in buffers. Let
$S_k{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})} = \{ s{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})} \mid s \in S_k \}$
be the states of $S_k$ after projecting out the above information. For a finite
data domain the set
$\bigcup_{k=0}^{\infty} S_k{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})}$ is finite
because only finitely many different possibilities exist for functions cs, Gm,
Lm and $Buff_{lst}$. Further,
$S_k{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})} \subseteq S_{k+1}{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})}$.
Therefore there exists a $k_0$ such that
$S_{k_0}{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})}$ is equal to the set
$S_{k_0+1}{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})}$. In this section we show
that for every $k > k_0$, sets $S_k{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})}$ and
$S_{k+1}{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})}$ are equal and hence we can
stop the analysis at $k_0$.
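The existence of $k_0$ rests on a simple counting argument, spelled out below. The name $U$ for the union of all projected state sets is introduced here only for brevity, and the projection subscript $(cs,Gm,Lm,Buff_{lst})$ is abbreviated to a bare $\downharpoonleft$.

```latex
\begin{align*}
  & S_0{\downharpoonleft} \subseteq S_1{\downharpoonleft} \subseteq
    S_2{\downharpoonleft} \subseteq \cdots \subseteq U,
    \qquad U = \bigcup_{k=0}^{\infty} S_k{\downharpoonleft},
    \quad |U| < \infty \\
  & \Longrightarrow\ \text{at most } |U| \text{ of the inclusions are strict} \\
  & \Longrightarrow\ k_0 = \min\{\, k \mid S_k{\downharpoonleft} =
    S_{k+1}{\downharpoonleft} \,\} \text{ exists, with } k_0 \le |U| .
\end{align*}
```

The counting argument alone only yields the first index at which two consecutive sets coincide. That the chain stays constant from that index on is the separate statement proved in this section.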
For a buffer Buff(t), let $\sigma_{Buff(t).lst}$ denote the sequence of last
writes to shared variables in buffer Buff(t). Let
$Exec(Gm, Lm(t), Buff(t), \sigma.a.\sigma', (x, v), Lm'(t))$ be a predicate,
where a = (x := e), that holds true iff (i) after executing the sequence
$\sigma_{Buff(t).lst}.\sigma.a.\sigma'$ from the global memory Gm and local
memory Lm(t) the local memory of process t is Lm′(t) and (ii) in the same
sequence the value of expression e in write instruction x := e at label a is v.
The following two lemmas relate the states of the $TSO_k$ and $TSO_{k+1}$
transition systems. We use $s_0 \xrightarrow{\sigma} s$ to denote a sequence of
transitions over a sequence σ of labels.
Lemma 1. For all n, σ, $s \in S_{k+1}$ such that $s_0 \xrightarrow{\sigma} s$,
|σ| = n and k ≥ 0, there exists a state $s' \in S_k$ such that s′.Gm = s.Gm and
for all t ∈ TID,
1. $(|s.Buff_{k+1}(t)| = k+1) \Rightarrow \exists x, v.$
   (i) $s.Buff_{k+1}(t) = s'.Buff_k(t).(x, v)$
   (ii) $\exists\, q_t, q_t', \sigma', \sigma''$ such that $(q_t, a, q_t') \in \delta_t$,
        $s'.cs(t) \xrightarrow{\sigma'} q_t$, $q_t' \xrightarrow{\sigma''} s.cs(t)$,
        σ′ and σ′′ do not modify the buffer of Pt, and
        $Exec(s'.Gm, s'.Lm(t), s'.Buff_k(t), \sigma'.a.\sigma'', (x, v), s.Lm(t))$
and
2. $(|s.Buff_{k+1}(t)| < k+1) \Rightarrow$
   (i) $s.Buff_{k+1}(t) = s'.Buff_k(t)$,
   (ii) s.Lm(t) = s′.Lm(t), and
   (iii) s.cs(t) = s′.cs(t).
The above lemma states that every state in $S_{k+1}$ where the buffer sizes of
all processes are less than k+1 is also present in $S_k$. The detailed
proof of this lemma is given in the Appendix. Now we are ready to prove that
after $k_0$, any increase in buffer size does not yield any new reachable control
location.
Theorem 1. For all k,
$$(S_k{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})} = S_{k+1}{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})}) \Rightarrow (S_{k+1}{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})} = S_{k+2}{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})})$$
Proof. It is sufficient to show that
$S_{k+2}{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})} \subseteq S_{k+1}{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})}$,
as the other side of the inclusion holds by monotonicity. That is, for every
state $s \in S_{k+2}$ there must exist a state $s' \in S_{k+1}$ such that
s.cs = s′.cs, s.Gm = s′.Gm, s.Lm = s′.Lm and $s.Buff_{lst} = s'.Buff_{lst}$.
Let us prove it by contradiction, i.e. assume there is a state $s \in S_{k+2}$
such that no state $s' \in S_{k+1}$ exists with s.cs = s′.cs, s.Gm = s′.Gm,
s.Lm = s′.Lm and $s.Buff_{lst} = s'.Buff_{lst}$. Following Lemma 1, this state s
must have at least one buffer with k + 2 entries in it. Without loss of
generality let t ∈ TID be such that $s.Buff_{k+2}(t)$ is the only full buffer.
1. Clearly, there exists a state $s' \in S_{k+2}$ where all buffers except t are
   the same as in s, $s'.Buff_{k+2}(t)$ is of size k+1 and there exists a
   sequence of transitions $\sigma.a.\sigma'$ from s′.cs(t) to s.cs(t) by
   process t with only one write operation a, which enqueues the last entry
   (x, v) of $s.Buff_{k+2}(t)$.
2. For state s′, the conditions s′.Gm = s.Gm (as there is no flush operation in σ), and
   $Exec(s'.Gm, s'.Lm(t), s'.Buff_{lst}(t), \sigma.a.\sigma', (x, v), s.Lm(t))$ hold.
3. As all buffers of s′ are of size at most k + 1, s′ also exists in $S_{k+1}$
   (Lemma 1).
4. As $S_k{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})} = S_{k+1}{\downharpoonleft}_{(cs,Gm,Lm,Buff_{lst})}$
   holds, there exists a state $s'' \in S_k$ such that (i) s′′.Gm = s′.Gm,
   (ii) s′′.Lm = s′.Lm, (iii) s′′.cs = s′.cs, and
   (iv) $s''.Buff_{lst}(t) = s'.Buff_{lst}(t)$ for all t ∈ TID.
5. This state s′′ can have at most k entries in its process buffers. Therefore this
   state must be present in $S_{k+1}$ as well.
6. Using Point 2 and the conditions (i), (ii), (iii), and (iv) of Point 4 above,
   we get $Exec(s''.Gm, s''.Lm(t), s''.Buff_{lst}(t), \sigma.a.\sigma', (x, v), s.Lm(t))$.
   This implies that after executing the sequence $\sigma.a.\sigma'$ by process t
   from state s′′ in $S_{k+1}$ the resultant state, say s′′′, will have at most
   k + 1 write entries in the buffer of process t. Further the global memory,
   local memories, control states and last writes to shared variables in buffers
   will be identical in s′′′ and s. Therefore $s''' \in S_{k+1}$ is the matching
   state with respect to s, a contradiction.
5 Trace Partitioning Approach
As a consequence of Theorem 1 one can use an explicit state model checker for
state reachability analysis of finite data domain programs. However, in this pa-
per we are interested in adapting a recently proposed trace partitioning based
verification method [16,25] for relaxed memory models. This method has been
shown to be very effective for verification under the SC memory model. The approach
for SC verification, as given in [25], is presented in Algorithm 1. Firstly, an au-
tomaton is built that represents the set of symbolic traces under the SC memory
model. For SC memory model such an automaton is obtained by language level
shuffle operation [26,17] on individual processes. Subsequently, a symbolic trace
is picked from this automaton and checked against a given safety property using
weakest precondition axioms [15]. If this trace violates the given property then
we have a concrete erroneous trace. Otherwise, an alternating finite automaton
(AFA) [12] is constructed from the proof of correctness of this trace.
The AFA construction algorithm ensures that every trace in the language of
this AFA is correct and hence can be safely removed from the set of all symbolic
traces of the input program. This process is repeated until either all symbolic
traces are proved correct or an erroneous trace is found. This algorithm is sound
and complete for finite data domain programs.
Input: A concurrent program P = {p1, · · · , pn} with safety property φ
Result: yes, if the program is safe, else a counterexample
Construct the automaton A(P) to capture the set of all SC traces of P;
Let tmp be the language of A(P);
while tmp is not empty do
    Let σ ∈ tmp with φ as a safety assertion to be checked;
    Let Â_{σ,¬φ} be the AFA constructed from σ and ¬φ;
    if σ violates φ then
        σ is a valid counterexample;
        return (σ);
    else
        tmp := tmp \ Rev, where Rev is the reverse of the language of Â_{σ,¬φ};
    end
end
return (yes);
Algorithm 1: SC verification algorithm [25]
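The control flow of Algorithm 1 can be sketched in Python on a tiny example. This is a hedged illustration, not the method of [25]: the AFA built from a proof is replaced by a much cruder stand-in that covers only the traces obtained by swapping adjacent independent labels. The two-process program, the property phi and all function names are assumptions introduced here.

```python
from itertools import combinations

# Hypothetical program: label -> (process, assigned variable, source).
# Process 1 runs a.b, process 2 runs c.d; a source is a constant or a variable.
INS = {"a": (1, "x", 1), "b": (1, "l1", "y"),
       "c": (2, "y", 1), "d": (2, "l2", "x")}
SHARED = {"x", "y"}


def shuffle(p, q):
    """All interleavings of p and q that keep each sequence's own order."""
    n = len(p) + len(q)
    for pos in combinations(range(n), len(p)):
        ip, iq = iter(p), iter(q)
        yield tuple(next(ip) if i in pos else next(iq) for i in range(n))


def run(trace):
    """Sequentially execute a trace from the all-zero initial state."""
    env = {"x": 0, "y": 0, "l1": 0, "l2": 0}
    for lab in trace:
        _, dst, src = INS[lab]
        env[dst] = env[src] if isinstance(src, str) else src
    return env


def independent(p, q):
    """Labels of different processes that share no shared variable."""
    (tp, dp, sp), (tq, dq, sq) = INS[p], INS[q]
    return tp != tq and not ({dp, sp} & {dq, sq} & SHARED)


def covered_by(trace):
    """Stand-in for the AFA language: swap adjacent independent labels."""
    seen, todo = {trace}, [trace]
    while todo:
        s = todo.pop()
        for i in range(len(s) - 1):
            if independent(s[i], s[i + 1]):
                t = s[:i] + (s[i + 1], s[i]) + s[i + 2:]
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
    return seen


def verify(traces, phi):
    """Main loop of Algorithm 1; returns (verdict, number of traces checked)."""
    tmp, checked = set(traces), 0
    while tmp:
        sigma = min(tmp)                 # deterministic choice of a trace
        checked += 1
        if not phi(run(sigma)):
            return sigma, checked        # concrete counterexample
        tmp -= covered_by(sigma)         # discard every trace already proved
    return "yes", checked


phi = lambda env: not (env["l1"] == 0 and env["l2"] == 0)
verdict, checked = verify(shuffle(("a", "b"), ("c", "d")), phi)
```

On this example the six SC interleavings fall into three classes, so three trace checks suffice. The AFA-based generalization of [25] is more powerful than this stand-in because it is derived from the proof of the checked trace rather than from a fixed independence relation.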
The main challenge in applying this trace partitioning approach to the TSO
memory model is the construction of the set of symbolic traces. Consider a
program with two processes in Figure 3. With initial values of shared variables x
and y as 0, it is possible to have ℓ1 = ℓ2 = 0 under the TSO memory model. We
can construct a symbolic trace b.d.a.c such that after executing this sequence
the state ℓ1 = ℓ2 = 0 is reached.
Process 1      Process 2
a. x:=1        c. y:=1
b. ℓ1:=y       d. ℓ2:=x
Fig. 3
Note that this trace is not constructible using
the standard interleaving semantics which was used
to construct the set of traces under the SC memory
model. This is because of the program order between
a and b in process 1 and between c and d in process
2. To use Algorithm 1 for the TSO memory model we
would like to first construct a set of all such symbolic traces such that the sequen-
tial executions of these traces yield all reachable states under the TSO memory
model. For the above example, it involves breaking the program orders a−b and
c − d and then applying standard interleaving semantics to construct symbolic
traces under the TSO memory model. Let us look at another non-trivial example
in Figure 4.
Process 1      Process 2
a. ℓ:=2        d. m:=3
b. y:=ℓ+ 1     e. x:=m+ 2
c. ℓ:=x        f. m:=y
Fig. 4
Assume the initial values of all variables are 0, and
ℓ, m are local variables. In TSO it is possible to have
the final values of variables ℓ and m as 0. This can
happen when writes at b and e are still in the buffers
and the read operations at c and f read from the initial
values. Let us construct a symbolic trace whose sequential execution will yield
this state. In this trace label e must appear after label c and label b must appear
after label f. This means that the trace will break either the order between b and
c or the order between e and f. However, by breaking the order between b and
c the value of ℓ = 2, assigned at a, no longer flows to b and hence y is assigned
the wrong value 1. Similarly, by breaking the order between e and f the value of m = 3,
assigned at d, no longer flows to e and hence x is assigned the wrong value 2. In
a nutshell, it is not possible to create a symbolic trace whose execution will yield
the state where ℓ = m = 0, x = 5, and y = 3. Notice that the problem appeared
because of the use of the same local variable in two definitions. Such a scenario
is unavoidable when (i) multiple reads are assigned to the same local variable,
and/or (ii) in the case of loops the local variable appears in a write instruction
within the loop.
We propose to handle such cases by renaming local variables, viz. ℓ and m in
this case. For example, the execution of trace σ = ℓ:=2. ℓ1:=ℓ. ℓ:=x. m:=3.
m1:=m. m:=y. y:=ℓ1 + 1. x:=m1 + 2 results in state ℓ = m = 0, x = 5, y = 3 as
required by a TSO execution; in σ both reads precede both delayed writes. Let
us look at the snapshot instructions ℓ1:=ℓ and m1:=m
more carefully. We earlier saw that the problem arises when reordering b−c and
e−f instructions as their reordering will break the value flows of ℓ and m from a
and d respectively. Therefore, we create new instances of these local variables, ℓ1
and m1, to take the snapshot of ℓ and m respectively which are later used in the
write instructions b and e. This renaming ensures that even if we reorder b− c
and e−f instructions (as done in σ) the correct value flows from a to b and from
d to e are not broken. We will show that for a buffer bound of k it is sufficient
to use at most k instances of these local variables and they can be safely reused
even in the case of loops. We call such symbolic traces, which correspond to TSOk
executions, SC interpretable traces. Formally, the SC interpretation of a trace
σ ∈ LABL∗ is a function SCI : LABL∗ × Var → Val ∪ {Undef} such that SCI(σ, x)
calculates the last value assigned to variable x in the sequential execution of σ.
For example, if σ = a.b.c where labels a, b, and c denote ℓ := 3, x := ℓ + 2 and
y := 2 respectively then SCI(σ, x) = 5 and SCI(σ, ℓ) = 3. The value Undef is used to
denote the infeasibility of σ, as some boolean expressions in assume instructions
may become unsatisfiable because of the values that flow into them. If σ does not
contain any assignment to x then SCI(σ, x) returns the initial value of x.
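The SC interpretation can be sketched as a small sequential interpreter. The encoding of instructions as (kind, target, expression) triples and the function name sci are assumptions introduced here, and None stands in for Undef. The second example is the renamed Figure 4 trace, written with both reads placed before both delayed writes.

```python
def sci(trace, init):
    """Sequentially execute a symbolic trace.

    Returns the final valuation, or None (standing for Undef) when an
    assume instruction fails and the trace is therefore infeasible.
    """
    env = dict(init)
    for kind, target, expr in trace:
        value = expr(env)
        if kind == "assume":
            if not value:
                return None
        else:
            env[target] = value
    return env


# sigma = a.b.c with l := 3, x := l + 2, y := 2
abc = [("assign", "l", lambda e: 3),
       ("assign", "x", lambda e: e["l"] + 2),
       ("assign", "y", lambda e: 2)]
abc_result = sci(abc, {"l": 0, "x": 0, "y": 0})

# Renamed Figure 4 trace: snapshots l1 and m1 preserve the value flow.
fig4 = [("assign", "l", lambda e: 2),
        ("assign", "l1", lambda e: e["l"]),
        ("assign", "l", lambda e: e["x"]),
        ("assign", "m", lambda e: 3),
        ("assign", "m1", lambda e: e["m"]),
        ("assign", "m", lambda e: e["y"]),
        ("assign", "y", lambda e: e["l1"] + 1),
        ("assign", "x", lambda e: e["m1"] + 2)]
fig4_result = sci(fig4, {"l": 0, "m": 0, "l1": 0, "m1": 0, "x": 0, "y": 0})

# An infeasible trace: the assume contradicts the preceding assignment.
undef_result = sci(abc + [("assume", None, lambda e: e["l"] == 4)],
                   {"l": 0, "x": 0, "y": 0})
```

The interpreter never consults a store buffer: all TSO effects are encoded in the order of the labels and in the renamed local variables.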
Let us now construct a transition system such that the traces of this transition
system represent SC interpretable traces corresponding to TSOk semantics. We
represent this transition system as
$TSO^\sharp_k = \langle S^\sharp, \Rightarrow_k, s^\sharp_0 \rangle$. Every state
$s^\sharp \in S^\sharp$ is of the form $(cs, Li, Buff^\sharp_k)$ such that
cs : TID → Q represents process control states, and
$Buff^\sharp_k : TID \to (SV \times LABL)^k$ represents per process buffers of
length k. Unlike the buffers of TSOk, these buffers contain write instruction
labels along with the modified shared variable. A function Li : TID×LV → N tracks the
instances of the local variables which have been used (for renaming purposes)
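in the construction of traces up to a given state.

$$\frac{Ins(a) = (\ell := x), \quad Buff^\sharp_k(t){\downharpoonleft}_{\{x\}\times LABL} = \epsilon}{(cs, Li, Buff^\sharp_k) \overset{a}{\Rightarrow}_k (cs', Li, Buff^\sharp_k)}\ (MRead^\sharp)$$

$$\frac{Ins(a) = (\ell := e)}{(cs, Li, Buff^\sharp_k) \overset{a}{\Rightarrow}_k (cs', Li, Buff^\sharp_k)}\ (LWrite^\sharp)$$

$$\frac{Ins(a) = (assume(e))}{(cs, Li, Buff^\sharp_k) \overset{a}{\Rightarrow}_k (cs', Li, Buff^\sharp_k)}\ (Assume^\sharp)$$

$$\frac{Buff^\sharp_k = (x, a).Buff^{\sharp\prime}_k}{(cs, Li, Buff^\sharp_k) \overset{a}{\Rightarrow}_k (cs', Li, Buff^{\sharp\prime}_k)}\ (flush^\sharp)$$

Fig. 5: All rules assume transitions for thread t, i.e. cs[t] = q, (q, a, q′) ∈ δt, and
cs′ = cs[t ← q′]

First, we define ⇒k for simple
cases, viz. read from memory, operations associated with local variables like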
assume(e) and ℓ := e, and non-deterministic flush. In Rules MRead♯, LWrite♯,
and Assume♯ the labels that denote these operations are put in the trace with
only change in the control state of the process. As there is no notion of local
and global valuation in a state s♯ of the transition system, no update takes place
unlike in TSOk. For memory read operation, in Rule MRead♯, the condition on
the buffer of Pt is the same as in TSOk. For non-deterministic flush operation,
Rule flush♯ removes the first label present in the buffer of Pt and puts that in the
trace. In rule Assume♯, the assume instruction is simply put in the trace without
evaluating the satisfiability of the boolean expression. This is different from the
corresponding rule in TSOk. This difference follows from the fact that we are
only interested in constructing symbolic traces. Symbolic model checking of these
traces will ensure that only feasible executions get analyzed (where all assume
instructions hold true). Now let us look at the remaining three operations, viz.
read from the buffer, write to the buffer and fence instruction, in detail.
Buffered Read Like TSOk, this transition takes place when Pt executes an
instruction ℓ := x to read the value of shared variable x and store it in its
local variable ℓ.
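$$\frac{Ins(a) = (\ell := x), \quad Buff^\sharp_k{\downharpoonleft}_{\{x\}\times LABL} = \alpha.(x, b), \quad Ins(b) = (x := e), \quad Ins(c) = (\ell := e)}{(cs, Li, Buff^\sharp_k) \overset{c}{\Rightarrow}_k (cs', Li, Buff^\sharp_k)}\ (BRead^\sharp)$$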
For this transition to take place, the buffer of Pt must have at least one write
instruction that modifies the shared variable x. Conditions Buff♯k ⇃{x}×LABL=
α.(x, b) and Ins(b) = (x := e) ensure that the last write to x in Buff♯k of
Pt is due to instruction Ins(b) which is of the form (x := e). Under these
conditions, in TSOk, read of x uses the value of expression e to modify ℓ.
Whereas in TSO♯k a label c is added to the trace such that Ins(c) represents
the assignment of e to variable ℓ.
Buffered Write This transition takes place when Pt executes a write instruction
of the form x := e. Let $\vec{\ell}$ be the set of local variables used in
expression e. For each of the local variables ℓ in $\vec{\ell}$, an integer
Li(ℓ) is used to create an assignment instruction of the form
$\ell_{Li(\ell)} := \ell$. These instructions are put in the trace (through
corresponding symbolic labels $\vec{a_\ell}$). Further, expression e is also
modified where every instance of a local variable ℓ in $\vec{\ell}$ is
substituted with $\ell_{Li(\ell)}$.

$$\frac{\begin{array}{c}
Ins(a) = (sv := e), \quad FV(e) = \vec{\ell}, \quad |Buff^\sharp_k| < k, \\
\forall \ell \in \vec{\ell},\ \text{create a label } a_\ell \text{ (if not already present in } LABL\text{) s.t. }
Ins(a_\ell) = (\ell_{Li(\ell)} := \ell), \quad Li'[\ell] = Li[\ell]\,\%\,(k+1) + 1, \\
\text{create a label } a' \text{ (if not already present in } LABL\text{) s.t. }
Ins(a') = (sv := e'), \quad e' = e[\vec{\ell} / \vec{\ell_{Li(\ell)}}], \\
Buff^{\sharp\prime}_k = Buff^\sharp_k[t \leftarrow Buff^\sharp_k[t].(sv, a')]
\end{array}}{(cs, Li, Buff^\sharp_k) \overset{\vec{a_\ell}}{\Rightarrow}_k (cs', Li', Buff^{\sharp\prime}_k)}\ (BWrite^\sharp)$$

This modified expression e′ is denoted $e[\vec{\ell} / \vec{\ell_{Li(\ell)}}]$ in
Rule BWrite♯. A label, a′, representing the assignment of e′ to x is put in the
buffer in the form of a tuple (x, a′). Note that the transition rule BWrite♯
increases the value of Li(ℓ) (modulo (k + 1)) for every local variable ℓ present
in expression e. We can show the following property,
Lemma 2. For a state $s^\sharp = (cs, Li, Buff^\sharp_k)$ of $TSO^\sharp_k$, if
Li(ℓ) = m then local variable $\ell_m$ does not appear in any write instruction
used in buffers $Buff^\sharp_k$.
Proof. Suppose Li(ℓ) = m holds. By assumption, local variables among pro-
cesses are disjoint therefore the only possibility is that Buff♯k[t] contains a
write instruction that uses local variable ℓm. If this were the case then there
must be at least k+1 different writes appearing between that write and the
time s♯ is reached. This holds because every write, that uses a local variable
ℓ first increments its index by 1 and wraps around after k + 1. This incre-
mented index is then used to create an instance of the local variable ℓ used
in this write operation. But it contradicts our assumption that the buffer is
of bounded length k.
The above lemma is used in the equivalence proof of $TSO_k$ and $TSO^\sharp_k$.
Fence Fence instruction, like TSOk, gets enabled only when $Buff^\sharp_k[t]$ is
empty. In the resultant state, function Li(t, ℓ) is set to 1 for every local
variable ℓ of Process Pt. This enables the reuse of indices in Function Li while
preserving Lemma 2.

$$\frac{Ins(a) = (fence), \quad Buff^\sharp_k[t] = \epsilon, \quad Li' = Li[(t, \ell) \leftarrow 1]\ \ \forall \ell \in LV_t}{(cs, Li, Buff^\sharp_k) \overset{\epsilon}{\Rightarrow}_k (cs', Li', Buff^\sharp_k)}\ (Fence^\sharp)$$
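The renaming performed by the buffered write rule and the index reset performed by the fence rule can be sketched as follows. This is a simplified single-process illustration: expressions are plain strings, labels are replaced by the instruction text itself, and the function names are assumptions introduced here. The snapshot uses the current index Li(ℓ) and advances it afterwards, following the rule as written.

```python
import re


def buffered_write(instr, li, buf, k, local_vars):
    """Apply a BWrite-style step to `x := e`, given as the pair (x, e).

    Returns the snapshot instructions emitted into the trace and updates
    the index map li and the buffer buf in place.
    """
    x, e = instr
    assert len(buf) < k, "the rule is enabled only while the buffer is not full"
    emitted = []
    used = sorted(v for v in local_vars
                  if re.search(rf"\b{re.escape(v)}\b", e))
    for l in used:
        inst = f"{l}_{li[l]}"                          # instance name for l
        emitted.append(f"{inst} := {l}")               # snapshot goes into the trace
        e = re.sub(rf"\b{re.escape(l)}\b", inst, e)    # substitute l by its instance
        li[l] = li[l] % (k + 1) + 1                    # advance index, wrap after k+1
    buf.append((x, f"{x} := {e}"))                     # renamed write waits in the buffer
    return emitted


def fence(li, buf):
    """Fence-style step: enabled on an empty buffer, resets every index to 1."""
    assert not buf, "a fence is enabled only when the buffer is empty"
    for l in li:
        li[l] = 1


k = 2
li, buf = {"l": 1}, []
t1 = buffered_write(("y", "l + 1"), li, buf, k, ["l"])
t2 = buffered_write(("x", "l * 2"), li, buf, k, ["l"])
buf.pop(0)                                             # a flush frees one slot
t3 = buffered_write(("y", "l"), li, buf, k, ["l"])
```

After the third write the index has wrapped back to 1, and the instance l_1 no longer occurs in any buffered write, which is the situation described by Lemma 2.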
To show the equivalence of TSOk and TSO♯k we want to prove the following:
(i) for every state s reachable in TSOk there exists a trace σ♯ in TSO♯k such
that the SC interpretation of σ♯ reaches a state with the same global memory
and local memory as s, and (ii) for every trace σ♯ of TSO♯k whose
SC interpretation is not Undef (i.e. the execution is feasible) there exists a
state s ∈ TSOk with the same global and local memory as obtained after the SC
interpretation of σ♯. We formally prove the following theorem in the Appendix.
Theorem 2. Transition systems TSOk and TSO♯k are equivalent in terms of
state reachability.
In Theorem 1 we used the restricted set $S{\downharpoonleft}_{cs,Gm,Lm,\mathit{Bufflst}}$ as a means to define the fixed
point. However, there are no explicit representations of the global memory (Gm)
and the local memory (Lm) in the state definition of TSO♯k. Therefore, in order
to define a fixed point condition like Theorem 1 we first augment the definition of
state in TSO♯k to include global memory and local memory. Let Gm♯ : SV → Lab
and Lm♯ : TID × LV → Lab be the functions assigning labels (of write instructions)
to shared variables and local variables respectively. Specifically, Gm♯(x) = a
means that the write instruction at label a was used to define the current value
of x in this state. Similarly, Lm♯(t, ℓ) = a means that the write instruction at
label a was used to define the current value of local variable ℓ of process t. Note
that in the construction of TSO♯k the values written by these write instructions
are only being represented symbolically using instruction labels. Therefore we
need a way to relate the instruction labels and the actual values written. For
a concurrent program P with finite data domain it is possible to construct an
equivalent program P ′ such that every assignment to variables in P ′ is only of
constant values. For example, consider the program in Figure 6 such that the
domain of variable x is {1, 2}. This program is equivalent to the program in
Figure 7 where only constant values are used in the write instructions. Here the
domain of x is used along with if-then-else conditions to decide the value that
needs to be written to y.
    ℓ := x
    y := ℓ + 3
Fig. 6

    ℓ := x
    if (ℓ = 1)
        y := 4
    else if (ℓ = 2)
        y := 5
Fig. 7
After this transformation, every write label
uniquely identifies the value written to a shared vari-
able. Hence the functions Gm♯, Lm♯ can be extended
to SV→ Val and LV→ Val respectively. This allows us
to use Theorem 1 for checking the fixed point.
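The step from Fig. 6 to Fig. 7 can be sketched mechanically: for each value in the finite domain of the local variable, evaluate the right-hand side once and emit a guarded constant write. This is a minimal illustration, assuming an assignment that depends on a single local variable; the helper names `expand_to_constant_writes` and `render_branches` are hypothetical and do not come from the paper or from ProofTraPar.

```python
def expand_to_constant_writes(domain, expr):
    """Pair every value of the local variable with the constant it produces."""
    return [(value, expr(value)) for value in sorted(domain)]


def render_branches(local, target, branches):
    """Print the guarded constant writes in the style of Fig. 7."""
    lines = []
    for position, (guard, constant) in enumerate(branches):
        keyword = "if" if position == 0 else "else if"
        lines.append(f"{keyword} ({local} = {guard})")
        lines.append(f"    {target} := {constant}")
    return "\n".join(lines)


# Fig. 6: y := l + 3 with the domain of x (and hence of l) being {1, 2}.
branches = expand_to_constant_writes({1, 2}, lambda value: value + 3)
program = render_branches("l", "y", branches)
```

Each emitted write now carries a constant, so a write label determines the written value, which is the property the fixed point check relies on.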
5.1 Fence Insertion For Program Correction
Let P be a program that is correct under the SC mem-
ory model. Let σ be an execution of P that violates
the given safety property under the TSO memory model. We can insert a fence
instruction in P so that σ does not appear as an execution under the TSO memory
model. To this end, we use the critical cycle based approach of [27] and [6]
to detect the locations of fence insertions. For an execution σ, let Cmptσ be a
competing [6] or conflicting [27] relation on the read and write events of σ such
that (a, b) ∈ Cmptσ iff (i) both memory events operate on the same location but
originate from different processes, (ii) at least one of them is a write instruction,
and (iii) a appears before b in σ. Let poσ denote the program order among the
instructions of the processes present in σ; it is determined by the process
specification. Let ppoσ = poσ \ {(a, b) | a ∈ W, b ∈ R, (a, b) ∈ poσ} be the subset of poσ
preserved under the TSO memory model, i.e. everything except write-read orders.
An execution σ contains a critical cycle $\stackrel{cs}{\rightarrow} \subseteq (Cmpt_\sigma \cup po_\sigma)^+$ iff (i) no cycle
exists in $(Cmpt_\sigma \cup ppo_\sigma)^+$, (ii) per process there are at most two memory accesses
a and b in $\stackrel{cs}{\rightarrow}$ such that Loc(a) ≠ Loc(b), and (iii) for a given shared variable x
there are at most three memory accesses on x, which must originate from different
processes. Following Theorem 1 of [6], an execution in TSO is sequentially
consistent if and only if it does not contain any critical cycle. Therefore, in order
to forbid an execution in TSO that is not sequentially consistent, it is sufficient
to ensure that no critical cycle exists in that execution. To avoid critical cycles,
we need to strengthen the ppoσ relation by adding a minimal set of program
orders such that Point (i) of the critical cycle definition is not satisfied, i.e. finding a
set Dlay ⊆ poσ \ ppoσ, a set of write-read pairs of instructions within each process,
such that $(Cmpt_\sigma \cup ppo_\sigma \cup Dlay)^+$ becomes cyclic. Once we identify that minimal
set of program orders, we insert fence instructions in between them to enforce
the required orderings.
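As an illustration of the search for Dlay, the sketch below brute-forces the smallest set of write-read program-order pairs whose addition makes Cmptσ ∪ ppoσ cyclic. It is not the procedure of [6] or [27]: the exhaustive subset search, the function names and the event encoding are assumptions made for this example, which uses the classic store-buffering shape (P1: write x, read y; P2: write y, read x) with the competing pairs given explicitly.

```python
from itertools import combinations


def has_cycle(edges):
    """Detect a directed cycle by depth-first search with three colours."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])
    white, grey, black = 0, 1, 2
    colour = {node: white for node in graph}

    def visit(node):
        colour[node] = grey
        for successor in graph[node]:
            if colour[successor] == grey:
                return True
            if colour[successor] == white and visit(successor):
                return True
        colour[node] = black
        return False

    return any(colour[node] == white and visit(node) for node in graph)


def minimal_delays(kind, po, cmpt):
    """Smallest set of write-read po pairs that makes cmpt + ppo cyclic."""
    relaxed = [(a, b) for (a, b) in po if kind[a] == "W" and kind[b] == "R"]
    ppo = [pair for pair in po if pair not in relaxed]
    for size in range(len(relaxed) + 1):
        for dlay in combinations(relaxed, size):
            if has_cycle(cmpt + ppo + list(dlay)):
                return set(dlay)
    return None


# Events: a = write x (P1), b = read y (P1), c = write y (P2), d = read x (P2).
kind = {"a": "W", "b": "R", "c": "W", "d": "R"}
po = [("a", "b"), ("c", "d")]
cmpt = [("b", "c"), ("d", "a")]   # both reads precede the competing writes
dlay = minimal_delays(kind, po, cmpt)
```

For this execution both write-read pairs are needed, which corresponds to placing one fence between the write and the read in each process.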
Overall Algorithm. The algorithm that combines incremental buffer bounded
verification and fence insertion for finite data programs works as follows. We start the
verification with a buffer bound of 0. To this end, the transition system TSO♯k is
constructed using the relation ⇒k given in this section. This transition system
is represented as an automaton with the error location representing the accepting
states and the initial locations representing the initial state. The set of traces
accepted by this automaton is then passed to the trace partitioning algorithm
of [25], implemented in the tool ProofTraPar. If an erroneous trace is found at
this bound then the program is not safe even under the SC memory model and hence the
algorithm returns the result ‘Unsafe’. If all traces satisfy the given safety
property then the bound is increased by one and the analysis starts again. If
an error trace is found for a non-zero buffer bound then the critical cycles are
obtained from this trace. Using these critical cycles, a set of fence locations is
generated and the input program is modified by inserting fences in the code.
After the modification the analysis starts again with the same bound. This is
just an implementation choice, because even if we increase the bound after the
modification the fixed point will still eventually be reached.
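The control flow of this loop can be sketched as follows. All callbacks (`find_error_trace`, `fences_for`, `insert_fences`, `reached_fixed_point`) and the `max_bound` cut-off are hypothetical placeholders for the trace partitioning check, the critical cycle analysis and the fixed point test of Theorem 1; where exactly the fixed point test sits in the loop is an assumption of this sketch, and the toy stubs only imitate a store-buffering bug.

```python
def verify_with_fences(program, find_error_trace, fences_for, insert_fences,
                       reached_fixed_point, max_bound=16):
    """Incremental buffer-bounded verification with fence insertion (sketch)."""
    k = 0
    while k <= max_bound:
        trace = find_error_trace(program, k)
        if trace is not None:
            if k == 0:
                return "Unsafe", program, k      # already buggy under SC
            program = insert_fences(program, fences_for(trace))
            continue                             # re-analyse with the same bound
        if reached_fixed_point(program, k):
            return "Safe", program, k
        k += 1
    return "Unknown", program, k


# Toy stubs: a "program" is just the set of fences inserted so far.
REQUIRED = frozenset({"P1:after-write", "P2:after-write"})


def toy_find_error_trace(fences, k):
    return "store-buffering trace" if k >= 1 and not REQUIRED <= fences else None


def toy_fences_for(trace):
    return REQUIRED


def toy_insert_fences(fences, new_fences):
    return fences | new_fences


def toy_reached_fixed_point(fences, k):
    return k >= 2


result = verify_with_fences(frozenset(), toy_find_error_trace, toy_fences_for,
                            toy_insert_fences, toy_reached_fixed_point)
```

In the toy run the error appears at bound 1, fences are inserted, and the fixed point is declared at bound 2.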
6 Experimental Results
We implemented our approach by extending the tool ProofTraPar which im-
plements the trace partitioning based approach of [25]. We implemented TSO♯k
semantics and fixed point reachability check on top of ProofTraPar. Its perfor-
mance was compared against memorax which implements sound and complete
verification of state reachability under the TSO memory model. Note that other
tools which exist in this landscape of relaxed memory verification either consider
SC behaviour as specification [3,6,10] or are sound but not complete [23,1,28].
However, memorax does not assume any bound on the buffer size and it uses
the coverability based approach of well-structured transition systems. Figure 8
compares the performance, in terms of time and memory, of our approach with
memorax. We ran all experiments on a 3.1GHz Intel i7 4-core machine with 8GB
RAM. Out of 11 examples, our tool outperformed memorax in 8 examples. Our
tool not only performed better in terms of time but also in terms of memory
consumption. Except in two cases, qrcu and clh queue, our tool consumed less
than 200 MB of RAM on every example, whereas memorax in most
cases took more than 500 MB of RAM and in some cases even touched the 3GB
mark. Programs like Alternating bit protocol, clh queue and Qrcu (quick read copy
update algorithm) remain correct even under the TSO memory model. For other
algorithms where bugs were exposed under TSO we were able to synthesize fences
to correct their behaviour.

Program                #P   ProofTraPar                Memorax [2]                #F
                            Time (Sec)   Memory (MB)   Time (Sec)   Memory (MB)
Peterson.safe          2    1.19         20            0.9          43            2
Dekker.safe            2    1.6          21.3          54.2         676           2
Lamport.safe           2    17           42            97           2312          4
Szymanksi.safe         2    27           121           ERR          ERR           4
Alternating Bit (ABP)  2    3.12         39            0.17         11            0
Dijkstra               2    16           70            -            -             2
Pgsql                  2    1.2          20            210          2800          2
RWLock.safe (2R,1W)    3    41           164           -            -             2
clh                    2    326          1500          -            -             0
Simple-dekker          3    103          155           600          3280          3
Qrcu.safe (2R,1W)      3    490          3000          -            -             0
Fig. 8: Comparison of our tool with Memorax [2]. A timeout, denoted by ‘-’, is set
to 10 minutes. #P and #F denote the number of processes and the number of fences
synthesized.
Analysis of the benchmarks. memorax performed better on three benchmarks,
viz. peterson, szymanksi, and ABP. After carefully looking at them we realized
that the performance of memorax loosely depends upon the number of backward
control flow paths from the error location to the start location, and the number of
write instructions present along those paths. In benchmarks where ProofTraPar
outperformed memorax, viz. dekker, lamport, clh, qrcu, more than two such
control paths exist. To further check this hypothesis experimentally we modified
peterson and ABP to add a write instruction along an already existing control
flow path where no write instruction was present. This write was performed
on a variable which was never read and hence did not affect the program.
After this modification memorax became more than 6 times slower in analyzing
these two benchmarks. In contrast, the analysis of these modified benchmarks with
ProofTraPar exhibited very little (less than a second) increase in time
compared to the unmodified benchmarks. Interestingly, a bug was exposed in
memorax when we made a similar change in szymanksi. As a result of this bug
the modified program szymanksi was declared safe. Note that the original
program szymanksi is incorrect under TSO and we only modified the code by
adding a write instruction to an unused variable. Therefore it is not possible for
the modified program szymanksi to become safe unless there is a bug in the tool.
This bug was also confirmed by the author of memorax.
6.1 Discussion
Note that memorax starts from the symbolic representation of all possible
configurations of buffer contents, which it further refines using backward reachability
analysis. In our approach, however, we start from a finite and small buffer bound
(an under-approximation) and keep expanding until we reach a fixed point. We
believe that this difference, picking an over-approximation as a starting point in
one case and an under-approximation as a starting point in the other, plays
a crucial role in the better performance of our approach on these benchmarks.
In all benchmarks except peterson, a buffer size of 1 was sufficient to expose
the error. In peterson, a buffer size of 2 was needed to expose the bug.
Effectively, the buffer size depends upon the minimum distance (along a control flow
path) between a write and a read instruction within a process whose reordering
reveals the bug. In the case of peterson, this distance is 2, since the reordering
of the first instruction (write to flag_i) and the third instruction (read of flag_j)
within each process reveals the bug. In our benchmarks fence instructions were
inserted after finding an erroneous trace, as discussed in Section 5.1. A fence
instruction restricts the unbounded growth of the buffer by flushing the buffer
contents. As a result, when a fence is inserted within a loop the buffer never
grows in size with loop iterations and the fixed point is reached quickly. In fact, for
all the benchmarks, if a bug was exposed with buffer size k then after inserting
the fence instruction the fixed point was reached with buffer size k + 1. For
benchmarks which remain correct under TSO, a larger buffer bound was required to
reach the fixed point, and this bound depends upon the number of write
operations in each process. As a result, their analysis took longer and consumed
more memory. Detailed analysis of the benchmarks and the tool are available at
www.cse.iitd.ac.in/~chinmay/ProofTraParTSO.
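The observation that a buffer size of 1 already exposes a write-read reordering can be reproduced on the store-buffering pattern with a small explicit-state explorer. This is a sketch under stated assumptions, not the TSO♯k construction: it stores concrete values instead of labels, treats bound 0 as SC (writes go straight to memory), blocks a write while the buffer of its process is full, and uses the hypothetical name `explore_bounded_tso`.

```python
def explore_bounded_tso(programs, variables, k):
    """Return the register valuations reachable with store buffers of size <= k."""
    start = (tuple(0 for _ in programs),                  # program counters
             tuple((v, 0) for v in sorted(variables)),    # shared memory
             tuple(() for _ in programs),                 # FIFO store buffers
             tuple(() for _ in programs))                 # registers read so far
    seen, stack, finals = {start}, [start], set()
    while stack:
        pcs, mem, bufs, regs = stack.pop()
        memory = dict(mem)
        successors = []
        for t, prog in enumerate(programs):
            if bufs[t]:  # flush the oldest buffered write of process t
                (var, val), rest = bufs[t][0], bufs[t][1:]
                flushed = dict(memory)
                flushed[var] = val
                successors.append((pcs, tuple(sorted(flushed.items())),
                                   bufs[:t] + (rest,) + bufs[t + 1:], regs))
            if pcs[t] == len(prog):
                continue
            op, var, arg = prog[pcs[t]]
            new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if op == "write" and k == 0:      # bound 0: behave like SC
                direct = dict(memory)
                direct[var] = arg
                successors.append((new_pcs, tuple(sorted(direct.items())),
                                   bufs, regs))
            elif op == "write" and len(bufs[t]) < k:
                grown = bufs[t] + ((var, arg),)
                successors.append((new_pcs, mem,
                                   bufs[:t] + (grown,) + bufs[t + 1:], regs))
            elif op == "read":  # newest own buffered value, else memory
                own = [val for (v, val) in bufs[t] if v == var]
                value = own[-1] if own else memory[var]
                new_regs = regs[:t] + (regs[t] + ((arg, value),),) + regs[t + 1:]
                successors.append((new_pcs, mem, bufs, new_regs))
        if all(pc == len(p) for pc, p in zip(pcs, programs)) and not any(bufs):
            finals.add(regs)
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return finals


store_buffering = [
    [("write", "x", 1), ("read", "y", "r1")],
    [("write", "y", 1), ("read", "x", "r2")],
]
both_zero = ((("r1", 0),), (("r2", 0),))
sc_outcomes = explore_bounded_tso(store_buffering, {"x", "y"}, 0)
tso_outcomes = explore_bounded_tso(store_buffering, {"x", "y"}, 1)
```

Here the write and the read of each process are adjacent, so the outcome in which both reads return 0 is absent at bound 0 and appears at bound 1.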
7 Conclusion and Future Work
This paper uses the trace partitioning based approach to verify state reachability
of concurrent programs under the TSO memory model. We have also shown that
for finite state programs there exists a buffer bound such that if the program is safe
up to that bound then it is guaranteed to be safe for unbounded
buffers as well. This work can be easily extended to the PSO memory model.
This method gives us an alternate decidability proof of state reachability under
the TSO (and PSO) memory model. We have also shown experimentally that for
standard benchmarks used in the literature such a bound is very small (in the
range of 2-4) and hence we may use SC verification based methods to efficiently
check concurrent programs under these memory models. We believe that for
other buffer based memory models a buffer bound can be shown to exist in a
similar manner. Recently, [21] proposed a buffer based operational semantics for
the C11 model. It will be interesting to investigate the application of the bounded
buffer based method proposed in this paper to that semantics as well.
References
1. Abdulla, P. A., Aronis, S., Atig, M. F., Jonsson, B., Leonardsson, C.,
and Sagonas, K. F. Stateless model checking for TSO and PSO. In TACAS’15.
2. Abdulla, P. A., Atig, M. F., Chen, Y.-F., Leonardsson, C., and Rezine, A.
Counter-example guided fence insertion under TSO. In TACAS’12, Springer-Verlag.
3. Abdulla, P. A., Atig, M. F., and Ngo, T.-P. The best of both worlds: Trading
efficiency and optimality in fence insertion for TSO. In ESOP’15, Springer-Verlag.
4. Alglave, J., Kroening, D., Nimal, V., and Poetzl, D. Don’t sit on the fence
- A static analysis approach to automatic fence insertion. In CAV’14.
5. Alglave, J., Kroening, D., Nimal, V., and Tautschnig, M. Software verifi-
cation for weak memory via program transformation. In ESOP’13 (2013).
6. Alglave, J., and Maranget, L. Stability in weak memory models. CAV’11,
Springer-Verlag, pp. 50–66.
7. Atig, M. F., Bouajjani, A., Burckhardt, S., and Musuvathi, M. On the
verification problem for weak memory models. SIGPLAN Not., Jan’10 45, 1.
8. Atig, M. F., Bouajjani, A., and Parlato, G. Getting rid of store-buffers in
TSO analysis. In CAV’11 (2011).
9. Bouajjani, A., Calin, G., Derevenetc, E., and Meyer, R. Lazy TSO reach-
ability. In FASE’15 (2015).
10. Bouajjani, A., Derevenetc, E., and Meyer, R. Checking and enforcing
robustness against TSO. In ESOP’13, Springer-Verlag, pp. 533–553.
11. Burnim, J., Sen, K., and Stergiou, C. Sound and complete monitoring of
sequential consistency for relaxed memory models. TACAS’11, Springer-Verlag.
12. Chandra, A. K., Kozen, D. C., and Stockmeyer, L. J. Alternation. J. ACM
28, 1 (Jan. 1981), 114–133.
13. Dan, A. M., Meshman, Y., Vechev, M. T., and Yahav, E. Predicate abstrac-
tion for relaxed memory models. SAS’13, pp. 84–104.
14. Dan, A. M., Meshman, Y., Vechev, M. T., and Yahav, E. Effective abstrac-
tions for verification under relaxed memory models. VMCAI’15, pp. 449–466.
15. Dijkstra, E. W. Guarded commands, nondeterminacy and formal derivation of
programs. Commun. ACM 18, 8 (Aug. 1975), 453–457.
16. Farzan, A., Kincaid, Z., and Podelski, A. Inductive data flow graphs. In
POPL’13.
17. Hopcroft, J. E., Motwani, R., and Ullman, J. D. Introduction to automata
theory, languages, and computation, 2nd edition.
18. Joshi, S., and Kroening, D. Property-driven fence insertion using reorder
bounded model checking. In FM 2015: (2015).
19. Kuperstein, M., Vechev, M. T., and Yahav, E. Partial-coherence abstractions
for relaxed memory models. PLDI’11, pp. 187–198.
20. Kuperstein, M., Vechev, M. T., and Yahav, E. Automatic inference of mem-
ory fences. SIGACT News 43, 2 (2012), 108–123.
21. Lahav, O., Giannarakis, N., and Vafeiadis, V. Taming release-acquire consis-
tency. In POPL’16 (New York, NY, USA, 2016), POPL 2016, ACM, pp. 649–662.
22. Linden, A., and Wolper, P. A verification-based approach to memory fence
insertion in relaxed memory systems. In SPIN’11 (2011), pp. 144–160.
23. Linden, A., and Wolper, P. A verification-based approach to memory fence
insertion in PSO memory systems. In TACAS’13, Springer-Verlag, pp. 339–353.
24. Meshman, Y., Dan, A. M., Vechev, M. T., and Yahav, E. Synthesis of memory
25. Narayan, C., Sharma, S., Guha, S., and Arun-Kumar, S. From
traces to proofs: Proving concurrent program safe (accepted for publish-
ing). Theoretical Aspects of Software Engineering, 2016 (arXived Version:
http://arxiv.org/abs/1506.07635).
26. Riddle, W. E. An approach to software system modelling and analysis. Comput.
Lang. 4, 1 (Jan. 1979), 49–66.
27. Shasha, D., and Snir, M. Efficient and correct execution of parallel programs
that share memory. TOPLAS 10, 2 (Apr. 1988), 282–312.
28. Zhang, N., Kusano, M., and Wang, C. Dynamic partial order reduction for
relaxed memory models. In Proceedings of the 36th ACM SIGPLAN Conference on
Programming Language Design and Implementation (New York, NY, USA, 2015),
PLDI 2015, ACM, pp. 250–259.
