Context-Bounded Analysis of TSO Systems
Mohamed Faouzi Atig1, Ahmed Bouajjani2, Gennaro Parlato3
1 Uppsala University, Sweden
2 LIAFA, Universit e Paris Diderot & Institut Universitaire de France, France
3 School of Electronics and Computer Science, University of Southampton, UK
Abstract. We address the state reachability problem in concurrent programs
running over the TSO weak memory model. This problem has been shown to be
decidable with non-primitive recursive complexity in the case of finite-state
threads. For recursive threads this problem is undecidable. The aim of this
paper is to provide under-approximate analyses for TSO systems that are
decidable and have better (elementary) complexity. We propose three bounding
concepts for TSO behaviors that are inspired by the concept of bounding the
number of context switches introduced by Qadeer and Rehof for the sequentially
consistent (SC) model. We investigate the decidability and the complexity of
the state reachability problems under these three bounding concepts for TSO,
and provide reductions of these problems to known reachability problems of
concurrent systems under the SC semantics.
1 Introduction
Sequential consistency is the standard interleaving model for shared memory
concurrent programs, where computations of a concurrent program are
interleaved sequences of actions of the different threads, performed in the
same order as they appear in the program. However, for performance reasons,
modern multi-processors do not in general preserve the program order, that is,
they may actually reorder actions executed by the same thread. This leads to
so-called weak or relaxed memory models. One such model is TSO (Total Store
Order), which is adopted for instance in x86 machines [36]. In TSO, write
operations can be delayed and overtaken by read operations. This corresponds
to the use of FIFO store buffers, one per processor, where write operations
wait until they are committed to the main memory. Writes are therefore not
visible immediately, which may lead to undesirable behaviors since older
values than expected may be read along program computations.
Actually, for data-race free programs it can be shown that weak memory models
such as TSO induce the same semantics as SC, that is, all possible
computations under TSO are also possible under SC [35,4,5,9,18,31,21].
However, data-race-freedom cannot be ensured in all situations. This is for
instance the case for low level lock-free programs used in many concurrency
libraries and other performance-critical system services. The design of such
algorithms, which must be aware of the underlying memory model, is in general
extremely difficult due to the unintuitive and hard to predict effects of the
weak memory models. Therefore, it is important to develop automatic
verification techniques for programs running on such memory models.
In this paper, we focus on the TSO model and we address the state
reachability problem, i.e., whether a state of the program (composed of the
control locations of the threads and the memory state) is reachable from an
initial state. This problem is of course relevant for checking (violations of)
safety properties. To reason about programs running over TSO, we adopt an
operational model based on parallel automata with unbounded FIFO queues
representing the store buffers. The automata model the threads running on each
of the processors. These automata are finite-state when programs do not have
recursive procedure calls. For the case of recursive programs, threads are
modeled using pushdown automata (automata with unbounded stacks). Note that
our models have unbounded stacks and unbounded queues. In fact, although these
structures are necessarily finite in actual machines, we may not assume any
fixed bound on their size, so a finite-state model would not be sufficient to
reason about the correctness of a general algorithm for all possible values of
these bounds.
Even for finite-state processor threads, the decidability of the state
reachability problem under TSO is not trivial due to the unboundedness of the
queues. However, it has been shown that this problem is actually decidable,
but unfortunately with very high complexity [11]. Indeed, the complexity of
state reachability jumps from PSPACE for SC to non-primitive recursive for
TSO. As for the case of recursive programs, it is easy to prove that the
problem is undecidable, as it is for SC. Therefore, it is important to
investigate conditions under which the complexity of this problem becomes
elementary, and for which decidability can be obtained even in the case of
recursive programs. The approach we adopt in this paper for this purpose is
based on the idea of bounding the number of context switches that has been
used for the analysis of shared memory concurrent programs under SC [34].
An important issue is to define a suitable notion of context in the case of
TSO systems that offers a good trade-off between coverage, decidability and
complexity. The direct transposition of the definition for SC to this case
consists in considering that a context is a computation segment where only one
processor thread is active. This processor-centric definition does not
restrict the behavior of the memory manager, which can execute at any time
write operations taken from any store buffer. A memory-centric definition,
that is the dual of the previous one, considers that in a context only one
store buffer is used for memory updates, without restricting the behaviors of
the processor threads. Finally, a combination of the two previous definitions
leads to a notion of context where only one processor thread is active, and
only its store buffer can be used for memory updates. Notice that the three
definitions above coincide with the one for SC when all write operations are
immediately executed (i.e., the store buffers are of size 0).
We study the decidability and complexity of the analyses corresponding to
these three definitions, named pc-CBA, mc-CBA, and pmc-CBA, for processor,
memory, and processor-memory centric context-bounded analysis, respectively.
In terms of behavior coverage, pc-CBA and mc-CBA are incomparable, and both of
them clearly subsume pmc-CBA.
Actually, pmc-CBA coincides with the analysis that we have introduced and
studied in [13]. Interestingly, this analysis can be reduced linearly to the
context-bounded analysis for SC, and therefore both analyses have the same
decidability and complexity characteristics. In addition to the fact that this
analysis is decidable and has an elementary complexity (as opposed to the
general TSO reachability analysis which is non-primitive recursive as
mentioned above), a nice feature of this reduction is that the resulting
analysis does not need an explicit representation of the contents of the
queues. It is possible to show that the content of the queue can be simulated
in this case by adding a linear number of additional copies of the global
variables. Also, our result allows us to use, for analyzing programs under
TSO, all the techniques and tools developed for SC context-bounded analysis,
especially those based on code-to-code translations to sequential programs
[28,26].
Then, the main contributions of this paper concern the decidability and
complexity of the other two more powerful analyses pc-CBA and mc-CBA. First,
we prove that in the case of finite-state processor threads, the pc-CBA is
decidable with an elementary complexity. The complexity upper bound we have is
polynomial in the size of the state space of the program (product of the
thread automata and the memory state) and doubly exponential in the number of
contexts. The proof is based on a reduction to the reachability problem of
bounded-reverse-phase multiply pushdown automata (brp-MPDA). These models are
multi-stack automata where all computations have a bounded number of
computation segments called reverse-phases, and within each of these segments
only one stack can be used in a non-restricted way, while all the others can
only be used for pop operations [32]. The name reverse-phase is chosen in
opposition to the name phase, used in a preceding work introducing
bounded-phase multiply pushdown automata (bp-MPDA) [24], where again only one
stack is unrestricted while the others can only be used for push operations.
The decidability of the reachability problem in bp-MPDA and brp-MPDA has been
established in [24] and [32], respectively.
The reduction from TSO systems to brp-MPDA is far from being trivial. The
difficulty is, for each context, to simulate with a stack the FIFO queue
representing the store buffer of the active thread. A naive way to do it would
use an unbounded number of reverse-phases (for stack rotations). We show, and
this is the tricky part of the proof, that this is actually possible with only
one stack rotation for each context, due to the particular semantics of the
store buffers. For the case of recursive threads, however, we prove that the
pc-CBA is, surprisingly, undecidable. Furthermore, we prove that the mc-CBA
has the same decidability and complexity characteristics as the pc-CBA. The
decidability is in this case obtained by a reduction to the bp-MPDA mentioned
above, and the undecidability is established following the same lines as in
the previous case.
Related work: Context-bounded analysis has been introduced in [34] as an
under-approximate analysis for bug detection in multithreaded programs. It has
been subsequently widely studied and extended in several works, e.g., in
[28,25,26,14,16]. All these works consider the SC semantics. Our work extends
this kind of analysis to programs running over weak memory models.
The decidability and the complexity of the state reachability problem for
TSO (without restriction on the behaviors) and for other weak memory models
(such as PSO) have been established in [11,12]. We are not aware of other work
investigating the decidability and complexity results of the state reachability
problem for weak memory models.
Testing and bounded model checking algorithms have been proposed for TSO in
[19,20,7]. These methods cannot cover sets of behaviors for arbitrary sizes of
the store buffers. Algorithmic methods based on abstractions or on bounding
the size of store buffers are proposed in [23,6,2,1,3]. In [30], a regular
model checking-based approach, using finite-state automata for representing
sets of store buffer contents, is proposed. The analysis delivers the precise
set of reachable configurations when it terminates, but termination is not
guaranteed in general.
Checking (trace-)robustness against TSO, i.e., whether all traces of a given
program running over TSO are also traces of computations over SC, has been
addressed in [33,8,17,15]. This problem has been shown to be decidable in [17]
and to be polynomially reducible to state reachability for SC in [15].
Trace-robustness and (safety-)correctness for SC imply correctness for TSO,
but the converse is not true.
2 Concurrent Pushdown Systems
In this section we define concurrent pushdown systems (Cpds) with two
semantics: Sequential Consistency (Sc) and Total-Store-Order (Tso). Moreover,
we define a behaviour-language reachability problem for them.
2.1 Memory model
A (shared) memory model is a tuple $M = (\mathit{Var}, D, \eta_0, T)$, where
$\mathit{Var}$ is a finite set of variable names, $D$ is a finite domain of
all variables in $\mathit{Var}$, $\eta_0 : \mathit{Var} \to D$ is an initial
valuation, and $T$ is a finite set of thread names. The set of memory
operations $\mathit{Mop}$ is defined as the smallest set containing the
following: $\mathit{nop}$ (no-operation), $r(x,d)$ (read), $w(x,d)$ (write),
$\mathit{arw}(x,d,d')$ (atomic read-write), for every $x \in \mathit{Var}$ and
$d, d' \in D$.
We define the action function
$\mathit{act}_M : \mathit{Mop} \to \{\mathit{nop}, \mathit{read}, \mathit{write}, \mathit{atomicRW}\}$
that maps each memory operation to its type. The size of a memory model $M$,
denoted $|M|$, is $|\mathit{Mop}| + |D| + |\mathit{Var}|$.
Below, we give the Sc and Tso semantics for a memory model.
Sequential Consistency (Sc): An Sc-configuration of a memory model $M$
consists of a valuation map $\eta : \mathit{Var} \to D$. A configuration
$\eta$ is initial if $\eta = \eta_0$. Given two Sc-configurations $\eta$ and
$\eta'$ of $M$, there is an Sc-transition from $\eta$ to $\eta'$ on an
operation $\mathit{op} \in \mathit{Mop}$ performed by thread $t$, denoted
$\eta \xrightarrow{\mathit{op}}_{\mathrm{Sc},M,t} \eta'$, if one of the
following holds:
[nop] $\mathit{op} = \mathit{nop}$, and $\eta = \eta'$;
[read] $\mathit{op} = r(x,d)$, $\eta(x) = d$, and $\eta = \eta'$;
[write] $\mathit{op} = w(x,d)$, $\eta'(x) = d$, and $\eta'(y) = \eta(y)$ for every $y \in (\mathit{Var} \setminus \{x\})$;
[atomic-read-write] $\mathit{op} = \mathit{arw}(x,d,d')$, and $\eta \xrightarrow{r(x,d)}_{\mathrm{Sc},M,t} \eta \xrightarrow{w(x,d')}_{\mathrm{Sc},M,t} \eta'$.
Total Store Order (Tso): In Tso, each thread $t \in T$ is equipped with a FIFO
queue $\sigma_t$ to store write operations performed by $t$. When $t$ writes
value $d$ into variable $x$, the pair $(x,d)$ is enqueued into $\sigma_t$.
Write operations stored in queues will affect the content of the shared
variables only later in time: a pair $(x,d)$ is non-deterministically dequeued
from one of the queues and only at that time $d$ is written into $x$, hence
visible to all the other threads. Conversely, when $t$ reads from $x$, the
value that $t$ recovers is the last value that $t$ has written into $x$,
provided that this operation is still pending in $\sigma_t$; otherwise, the
returned value for $x$ is that stored in the memory.
Formally, a Tso-configuration of $M$ is a tuple
$C_M = \langle \eta, \{\sigma_t\}_{t \in T} \rangle$, where
$\eta : \mathit{Var} \to D$ is a valuation map, and
$\sigma_t \in (\mathit{Var} \times D)^*$ for every $t \in T$. $C_M$ is initial
if $\eta = \eta_0$ and $\sigma_t = \epsilon$ for every $t \in T$ (where
$\epsilon$ denotes the empty word).
Let $C = \langle \eta, \{\sigma_t\}_{t \in T} \rangle$ and
$C' = \langle \eta', \{\sigma'_t\}_{t \in T} \rangle$ be two
Tso-configurations of $M$. There is a Tso-transition from $C$ to $C'$ on
$\mathit{op} \in (\mathit{Mop} \cup \{\mathit{mem}\})$ performed by thread
$t$, denoted $C \xrightarrow{\mathit{op}}_{\mathrm{Tso},M,t} C'$, if one of
the following holds:
[nop] $\mathit{op} = \mathit{nop}$, $\eta' = \eta$, and $\sigma'_h = \sigma_h$ for every $h \in T$;
[read] $\mathit{op} = r(x,d)$, $C' = C$, and either $\sigma_t = \sigma_1.(x,d).\sigma_2$ for some $\sigma_1 \in ((\mathit{Var} \times D) \setminus (\{x\} \times D))^*$, or $\sigma_t \in ((\mathit{Var} \times D) \setminus (\{x\} \times D))^*$ and $\eta(x) = d$;
[write] $\mathit{op} = w(x,d)$, $\eta' = \eta$, $\sigma'_t = (x,d).\sigma_t$, and $\sigma'_h = \sigma_h$ for every $h \in (T \setminus \{t\})$;
[atomic-read-write] $\mathit{op} = \mathit{arw}(x,d,d')$, $\sigma'_t = \sigma_t = \epsilon$, $\eta(x) = d$, $\eta'(x) = d'$, $\eta'(y) = \eta(y)$ for every $y \in (\mathit{Var} \setminus \{x\})$, and $\sigma'_h = \sigma_h$ for every $h \in T$;
[memory] $\mathit{op} = \mathit{mem}$, $\sigma_t = \sigma'_t.(x,d)$, $\eta'(x) = d$, $\eta'(y) = \eta(y)$ for every $y \in (\mathit{Var} \setminus \{x\})$, and $\sigma'_h = \sigma_h$ for every $h \in (T \setminus \{t\})$.
2.2 Concurrent Pushdown Systems
We start with pushdown systems which are meant to model a recursive thread.
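Pushdown Systems: A pushdown system (pds) is a tuple
$A = (Q, q_0, \Gamma, \Delta)$ where $Q$ is a finite set of control states,
$q_0 \in Q$ is the initial state, $\Gamma$ is a finite stack alphabet, and
$\Delta = \Delta_{\mathit{int}} \cup \Delta_{\mathit{push}} \cup \Delta_{\mathit{pop}}$
is the set of $A$ moves, with $\Delta_{\mathit{int}} \subseteq Q \times Q$,
$\Delta_{\mathit{push}} \subseteq Q \times Q \times \Gamma$, and
$\Delta_{\mathit{pop}} \subseteq Q \times \Gamma \times Q$.
A configuration of a pds $A$ is a pair in $Q \times \Gamma^*$. A configuration
$\langle q, \gamma \rangle$ is initial if $q = q_0$ and $\gamma = \epsilon$.
There is a transition from $\langle q, \gamma \rangle$ to a configuration
$\langle q', \gamma' \rangle$ on $\delta \in \Delta$, denoted
$\langle q, \gamma \rangle \xrightarrow{\delta}_A \langle q', \gamma' \rangle$,
if one of the following holds:
[internal move] $\delta = (q,q') \in \Delta_{\mathit{int}}$ and $\gamma' = \gamma$;
[push move] $\delta = (q,q',a) \in \Delta_{\mathit{push}}$ and $\gamma' = a.\gamma$;
[pop move] $\delta = (q,a,q') \in \Delta_{\mathit{pop}}$ and $\gamma = a.\gamma'$.
We define an action map
$\mathit{act}_A : \Delta \to \{\mathit{int}, \mathit{push}, \mathit{pop}\}$
where $\mathit{act}_A(\delta) = a$ iff $\delta \in \Delta_a$. The size of a
pds $A = (Q, q_0, \Gamma, \Delta)$, denoted $|A|$, is $|Q| + |\Delta|$.
A pds $A$ is a finite state system (fss) if
$\mathit{act}_A(\delta) = \mathit{int}$ for every $\delta \in \Delta$.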
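The three kinds of pds moves can be sketched as a single step function. This is a minimal illustration, not part of the formal development: the name `pds_step` and the tagged-tuple encoding of moves are assumptions made for the example, and the stack is a Python tuple whose first element is the top symbol.

```python
def pds_step(config, move):
    """Apply one pds move to a configuration (q, stack); None if not enabled.

    Moves are tagged tuples (an encoding chosen for this sketch):
    ("int", q, q2), ("push", q, q2, a), ("pop", q, a, q2).
    """
    q, stack = config
    kind = move[0]
    if kind == "int":
        _, src, dst = move
        return (dst, stack) if q == src else None
    if kind == "push":
        _, src, dst, a = move
        return (dst, (a,) + stack) if q == src else None
    if kind == "pop":
        _, src, a, dst = move
        if q == src and stack and stack[0] == a:
            return (dst, stack[1:])
        return None
    raise ValueError("unknown move kind: %r" % (kind,))


c0 = ("q0", ())                                  # initial configuration
c1 = pds_step(c0, ("push", "q0", "q1", "a"))     # push a
c2 = pds_step(c1, ("int", "q1", "q2"))           # internal move
c3 = pds_step(c2, ("pop", "q2", "a", "q0"))      # pop a
blocked = pds_step(c0, ("pop", "q0", "a", "q1")) # pop on an empty stack
print(c1, c2, c3, blocked)
```

A pop move is enabled only when the symbol to pop is on top of the stack, which is why the last call returns `None`. A finite state system is the special case in which only `"int"` moves occur.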
Concurrent Pushdown Systems: A concurrent pushdown system (Cpds) is composed
of a finite number of pds, one per thread, which communicate through a memory
model $M$ according to the Sc or the Tso semantics.
Syntax. A Cpds over a finite set of thread names $T$ and memory model
$M = (\mathit{Var}, D, \eta_0, T)$ is a set of tuples
$A = \{(Q_t, q^0_t, \Gamma_t, \Delta^M_t)\}_{t \in T}$, where
$A_t = (Q_t, q^0_t, \Gamma_t, \Delta_t)$ is a pds (called the thread $t$ of
$A$), and $\Delta^M_t \subseteq (\Delta_t \times \mathit{Mop})$.
The size of a Cpds $A$ with memory $M$ is $|M| \cdot \prod_{t \in T} |A_t|$.
A Cpds $A$ over $T$ is a concurrent finite state system (Cfss) if for every
$t \in T$, thread $A_t$ of $A$ is a fss.
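Semantics. For Mem $\in$ {Sc, Tso}, a Mem-configuration of $A$ is a pair
$C = \langle \{C_t\}_{t \in T}, C_M \rangle$, where $C_t$ is an $A_t$
configuration and $C_M$ is a Mem-configuration of $M$. Further, $C$ is initial
if for every $t \in T$, $C_t$ is the initial configuration of $A_t$, and $C_M$
is the initial Mem-configuration of $M$.
Define $\mathit{Act}_T = \{\mathit{int}, \mathit{push}, \mathit{pop}\}$ and
$\mathit{Act}_M = \{\mathit{nop}, \mathit{read}, \mathit{write}, \mathit{atomicRW}\}$.
Let $\mathit{Act} = (\mathit{Act}_T \times \mathit{Act}_M \times T) \cup (\{(\mathit{nop}, \mathit{mem})\} \times T)$.
There is a Mem-transition from
$C = \langle \{C_t\}_{t \in T}, C_M \rangle$ to
$C' = \langle \{C'_t\}_{t \in T}, C'_M \rangle$ on an action
$(a,b,t) \in \mathit{Act}$, denoted
$C \xrightarrow{(a,b,t)}_{\mathrm{Mem},A} C'$, if $C_h = C'_h$ for every
$h \in (T \setminus \{t\})$,
$C_M \xrightarrow{\mathit{op}}_{\mathrm{Mem},M,t} C'_M$, and one of the
following holds:
[thread & memory transition] $(\delta, \mathit{op}) \in \Delta^M_t$ with $a = \mathit{act}_{A_t}(\delta)$ and $b = \mathit{act}_M(\mathit{op})$, and $C_t \xrightarrow{\delta}_{A_t} C'_t$;
[memory transition only] $a = \mathit{nop}$, $b = \mathit{op} = \mathit{mem}$, and $C_t = C'_t$.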
2.3 Reachability problem
A Mem-run of $A$ is a sequence
$\rho = C_0 \xrightarrow{(a_1,b_1,t_1)}_{\mathrm{Mem},A} C_1 \xrightarrow{(a_2,b_2,t_2)}_{\mathrm{Mem},A} \cdots \xrightarrow{(a_n,b_n,t_n)}_{\mathrm{Mem},A} C_n$
for some $n \in \mathbb{N}$, where $C_0$ is the initial Mem-configuration of
$A$. We define the behaviour of $\rho$ as the sequence
$\mathit{beh}(\rho) = (a_1,b_1,t_1)(a_2,b_2,t_2) \ldots (a_n,b_n,t_n)$. For a
behaviour language $B \subseteq \mathit{Act}^*$, a Mem-configuration $C$ of
$A$ is $B$-reachable if there exists a Mem-run $\rho$ of $A$ such that
$C = C_n$ and $\mathit{beh}(\rho) \in B$. We say that $C$ is reachable in $A$
if $C$ is $(\mathit{Act}^*)$-reachable in $A$.
Reachability problems for Cpds. Given a Cpds $A$, a Mem-configuration $C$ of
$A$ with Mem $\in$ {Sc, Tso}, and a behaviour language
$B \subseteq \mathit{Act}^*$, the reachability problem asks whether $C$ is
$B$-reachable in $A$.
It is well known that the reachability problem is undecidable for
Sc-configurations with behaviour language $\mathit{Act}^*$, as 2 stacks
suffice to simulate Turing machines. Furthermore, since Cpds with Tso
semantics can simulate Cpds with Sc semantics, the reachability problem is
also undecidable for Tso-configurations (and behaviour language
$\mathit{Act}^*$). However, if we restrict to Cfss, the reachability problem
under Tso is decidable, with non-primitive recursive complexity [11].
In the rest of the paper we consider several behaviour languages $B$ in which
we study the decidability and complexity of the reachability problem.
3 Processor-centric Context-Bounded Analysis
In this section we consider processor-centric context-bounded analysis (pc-CBA)
for Cpds with Tso semantics. A pc-context of a Cpds A is a contiguous part
of an A run where only transitions from one thread and the memory are al-
lowed. We study both the decidability and the complexity of the reachability
problem for Cpds and Cfss under the Tso semantics up to a given number of
pc-contexts. We show that the problem is undecidable for Cpds, and decidable
with elementary complexity for Cfss.
Formally, let $A$ be a Cpds over a set of thread names $T$ and shared-memory
$M$, and let $k$ be a positive integer. For $t \in T$ we define $L_t$ as the
pc-context behaviour language
$((\mathit{Act}_T \times \mathit{Act}_M \times \{t\}) \cup (\{(\mathit{nop}, \mathit{mem})\} \times T))^*$
for thread $t$. A $k$ pc-context behaviour language over $T$, denoted $L^k_T$,
is the set of all words $w \in \mathit{Act}^*$ which can be factorized as
$w_1 w_2 \ldots w_k$, where for every $i \in [k]$, $w_i \in L_{t_i}$ for some
thread $t_i \in T$. Given a Tso-configuration $C$ of $A$, the $k$ pc-context
reachability problem is the problem of deciding whether $C$ is
$L^k_T$-reachable in $A$.
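Membership of a behaviour in the k pc-context behaviour language can be checked by a greedy factorization, as sketched below. The function name `pc_factorize` and the encoding of actions as triples `(a, b, t)` of strings are assumptions made for this illustration. Since every factor may be empty, a behaviour belongs to the k pc-context language exactly when the greedy factorization uses at most k factors.

```python
def pc_factorize(behaviour):
    """Greedily split a behaviour into pc-contexts: [(thread, factor), ...]."""
    factors = []
    for action in behaviour:
        a, b, t = action
        if b == "mem":
            # Memory updates of any thread may join the current pc-context.
            if not factors:
                factors.append((None, []))
            factors[-1][1].append(action)
        elif factors and factors[-1][0] in (None, t):
            # Same active thread (or none chosen yet): extend the context.
            factors[-1] = (t, factors[-1][1] + [action])
        else:
            # A different thread moves: a new pc-context starts.
            factors.append((t, [action]))
    return factors


w = [("int", "write", "t1"), ("nop", "mem", "t2"), ("int", "read", "t1"),
     ("int", "read", "t2"), ("nop", "mem", "t1")]
fs = pc_factorize(w)
print([t for t, _ in fs], len(fs))
```

On this behaviour the factorization has two factors, for `t1` and then `t2`, so the word is a 2 pc-context behaviour but not a 1 pc-context behaviour. The memory updates of `t2` inside the context of `t1` (and conversely) do not cause a context switch.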
In the rest of the section we prove the following 2 theorems.
Theorem 1. For any $k \in \mathbb{N}$ with $k \geq 5$, the $k$ pc-context
reachability problem for Cpds under Tso is undecidable.
Theorem 2. For any $k \in \mathbb{N}$, the $k$ pc-context reachability problem
for a Cfss $A$ under Tso is solvable in double exponential time in the size of
$A$ and $k$.
3.1 Proof of Theorem 1
The undecidability result is given by a reduction from the emptiness problem
of the intersection of two context-free languages [22]: for any two pda $A_1$
and $A_2$, we define a Cpds $A$ that can reach under Tso a special control
state within 5 pc-contexts iff there is a word accepted by both $A_1$ and
$A_2$.
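A pushdown automaton (pda) over a finite alphabet $\Sigma$ is a tuple
$B = (Q, q_0, \Gamma, \Lambda, F)$, where $\Lambda \subseteq \Delta \times \Sigma$,
$E = (Q, q_0, \Gamma, \Delta)$ is a pds, and $F \subseteq Q$. A word
$w = a_1 a_2 \ldots a_n \in \Sigma^*$ is accepted by $B$ iff there is a
sequence
$C_0 \xrightarrow{\delta_1}_E C_1 \xrightarrow{\delta_2}_E \cdots C_{n-1} \xrightarrow{\delta_n}_E C_n$
such that $C_0$ is the initial configuration of $E$,
$(\delta_i, a_i) \in \Lambda$ for every $i \in [n]$, and
$C_n = \langle q_f, \gamma \rangle$ for some $q_f \in F$ and
$\gamma \in \Gamma^*$. Define $L(B)$ to be the set of all words in $\Sigma^*$
accepted by $B$.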
Let $A_1$ and $A_2$ be two pda over $\Sigma$. For simplicity's sake, we assume
that $\epsilon \notin L(A_1) \cup L(A_2)$ and that in any word
$w \in L(A_1) \cup L(A_2)$ there are no two consecutive identical symbols. We
define the Cpds $A$ with memory model $M$ and four threads
$T = \{t_1, t_2, t_3, t_4\}$ having the property that a configuration in which
all threads are in the special control state, say @, is reachable iff there is
a word $w \in L(A_1) \cap L(A_2)$;
$M = (\mathit{Var}, \Sigma \cup \{\$\}, \eta_0, T)$ with
$\mathit{Var} = \{x_1, x_2, x_3, x_4\}$, and $\eta_0(x_i) = \$$ for every
$i \in [4]$.
Below we give a concise description of each thread ti. We assume that all
threads (1) never read or write $ into a variable, and (2) never read consecutively
the same symbol from the same variable.
- The description of $t_1$ is split in two stages. In the first stage, $t_1$
non-deterministically generates a word $w_1 = a_1 a_2 \ldots a_n \in \Sigma^+$,
one symbol at a time. Each symbol is also pushed onto $t_1$'s stack and
simultaneously written into $x_1$. After the first stage, $t_1$ has $w_1^R$
stored in its own stack.
- Thread $t_2$, reading symbols from $x_1$, simulates the pda $A_1$. Every
symbol read from $x_1$ is also written into variable $x_2$.
Non-deterministically, $t_2$ stops the simulation whenever $A_1$ reaches a
final state and enters the special control state @. Let $w_2$ be the word
composed by the sequence of symbols read by $t_2$ from $x_1$. Note that $w_2$
is a sub-word of $w_1$ ($w_2 \preceq w_1$).
- Thread $t_3$ acts the same as $t_2$ except that it simulates $A_2$ and reads
from variable $x_2$ and writes into $x_3$. Let $w_3$ be the word read by $t_3$
from $x_2$. It is easy to see that $w_3 \preceq w_2$.
- Thread $t_4$ reads a word $w_4$ from $x_3$ and rewrites $w_4^R$ into $x_4$
using its stack, and finally enters the control state @. Again,
$w_4 \preceq w_3$.
- In the second stage, $t_1$ checks whether it can read $w_1^R$ from $x_4$,
where $w_1^R$ is the content of its stack. If this is the case, $t_1$ enters
the control state @.
From above, it is easy to see that when all threads are in the state @ the
following property holds: $w_4 \preceq w_3 \preceq w_2 \preceq w_1$ and
$w_1 = w_4$, which is true iff $w_1 = w_2 = w_3 = w_4$. Furthermore,
$w_2 = w_3$ is also accepted by both $A_1$ and $A_2$. Thus,
$L(A_1) \cap L(A_2) \neq \emptyset$ iff $A$ reaches in 5 pc-contexts a
configuration where all threads are in the control state @, and this concludes
the proof.
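The equality of the four words rests on a length argument, which can be spelled out as the short derivation below. Here $\preceq$ denotes the sub-word relation and $|w|$ the length of $w$; the generic words $u$ and $v$ are names introduced only for this derivation.

```latex
\begin{align*}
w_4 \preceq w_3 \preceq w_2 \preceq w_1
  &\implies |w_4| \le |w_3| \le |w_2| \le |w_1|, \\
w_1 = w_4
  &\implies |w_1| = |w_2| = |w_3| = |w_4|, \\
u \preceq v \ \text{and}\ |u| = |v|
  &\implies u = v .
\end{align*}
```

Applying the last implication to each consecutive pair in the chain gives $w_1 = w_2 = w_3 = w_4$.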
3.2 Proof of Theorem 2
The proof is given by a reduction to the reachability problem for Cpds under
Sc semantics constrained to the bounded-reverse-phase behaviour language. A
bounded-reverse-phase language is defined as follows. For a thread $t \in T$,
define
$L_t = ((\mathit{Act}_T \times \mathit{Act}_M \times \{t\}) \cup ((\mathit{Act}_T \setminus \{\mathit{push}\}) \times \mathit{Act}_M \times T))^*$.
A word in $L_t$ describes Cpds sub-runs in which only thread $t$ is allowed to
take all its transitions, while the other threads are forbidden to use push
transitions. For $h \in \mathbb{N}$, an $h$-reverse-phase word $w$ is such
that $w \in \mathit{Act}^*$ and can be factorized as $w_1 w_2 \ldots w_h$,
where for every $i \in [h]$, $w_i \in L_{t_i}$ for some $t_i \in T$. The
$h$-reverse-phase behaviour language is the set of all $h$-reverse-phase
words. For any given $k \in \mathbb{N}$, the $k$-reverse-phase reachability
problem for Sc is decidable in double exponential time, as shown below.
Theorem 3. For any $k \in \mathbb{N}$, the $k$-reverse-phase reachability
problem for a Cpds $A$ under Sc is solvable in double-exponential time in $k$
and $|A|$, where $|A|$ is the size of $A$.
Proof. The upper bound can be shown by a straightforward reduction to the
emptiness problem of $k$-reverse-phase multi-pushdown automata (introduced in
[32]) where there is a shared control state between all the stacks. The latter
problem is known to be solvable in double-exponential time in $k$ and
exponential time in the size of the model [32,27]. There is a trivial
reduction from the $k$-reverse-phase reachability problem for a Cpds $A$ under
Sc to the emptiness problem of a $k$-reverse-phase multi-pushdown automaton
$B$ by converting $A$ into an automaton without variables and process states
(this can be done by encoding the variable valuation and process states in the
shared state of $B$). This results in an exponential blow-up, and so the
$k$-reverse-phase reachability problem for $A$ can be solved in
double-exponential time in $k$ and $|A|$ (since the size of $B$ is exponential
in $|A|$). $\Box$
The reduction is as follows. Let $T$ be the set of thread names of $A$. We
define a Cpds $D$ that non-deterministically simulates $A$ along any bounded
pc-context run using the Sc semantics. More specifically, $D$ simulates
consecutively each pc-context of $A$ using 2 reverse-phases. Below we only
describe the simulation of a single pc-context.
Invariant. At the beginning and the end of the simulation of each pc-context
of $A$, $D$ encodes the configuration of $A$ as follows. $D$ has all threads
of $A$, where for every thread $t \in T$, $t$ encodes the configuration of the
thread with the same name in $A$ along with its FIFO queue. More specifically,
the control state of $t$ in $A$ is stored in the control state of $t$ in $D$,
and since $t$ does not use its stack at all (as $A$ is a Cfss), the stack of
$t$ in $D$ is used to store the FIFO queue $\sigma_t$ in $A$ with the head
pair on the top of the stack. Moreover, the valuation of the shared variables
in $A$ is encoded in the shared variables of $D$. The shared variables of $D$
also include an auxiliary variable used to keep track of whether the automaton
is in a pc-context simulation phase. During the simulation of a pc-context an
auxiliary thread $s \notin T$ is used. We guarantee that the stack of $s$ is
empty whenever $D$ is not in a simulation phase.
Below we describe the 3 steps for the simulation of a pc-context. Along the
description we also argue correctness, showing that the invariant above holds
after the simulation of a pc-context, provided it holds at the very beginning
of that pc-context simulation.
Pre-simulation. $D$ non-deterministically selects a thread $t \in T$ that is
allowed to progress in the pc-context under simulation. Then, it reverses the
content of the stack of $t$ into the stack of $s$. Note that the last pair
written in the queue of $t$ (in $A$) is now on the top of the stack of $s$.
As D copies the stack content, it also computes two pieces of information
that are stored in the control state of s.
The first piece of information consists in collecting for each shared variable
$x \in \mathit{Var}$ the value corresponding to the last write pair for $x$,
if any, that still resides in the queue. We compute this information to avoid
inspecting the stack of $s$ to simulate read operations from $t$.
The second piece of information $\lambda$ is used to simulate memory
operations concerning thread $t$, again to avoid accessing the stack of $s$.
It consists of a sequence of write pairs whose length is bounded by the number
of variables of $\mathit{Var}$. This sequence is defined by the map
$\mathit{lastseq}$. For
$\sigma = (x_1,d_1) \ldots (x_n,d_n) \in (\mathit{Var} \times D)^*$,
$\mathit{lastseq}(\sigma)$ is the subsequence of $\sigma$ in which we remove
all pairs $(x_j,d_j)$ such that $x_j = x_i$ for some $i > j$. For example, for
$\sigma = (y,5)(z,2)(y,4)(x,2)(z,3)(x,1)$,
$\mathit{lastseq}(\sigma) = (y,4)(z,3)(x,1)$. $\lambda$ is defined as follows.
The queue content $\gamma^R$ of $t$ in $A$, where $\gamma$ is the stack
content of $t$ in $D$ at the beginning of the simulation, can be split in two
subsequences $\sigma_1 \sigma_2$, where $\sigma_2$ is the portion of the queue
that is dequeued by means of memory operations of $t$ by the end of the
simulation of the current pc-context. This partition is not known at the
beginning of the simulation, and is non-deterministically guessed by $D$. We
define $\lambda$ as the sequence $\mathit{lastseq}(\sigma_2)$. Again, $s$ uses
$\lambda$ to simulate the memory operations from $t$ without using the stack
of $s$. The idea is that only the elements in $\lambda$ are relevant for the
simulation, as the remaining write pairs will be overwritten by pairs in
$\lambda$ by the end of the simulation, hence not visible to the other
threads.
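The map lastseq is easy to state in code. The sketch below is illustrative only: it represents a sequence of write pairs as a Python list of `(variable, value)` tuples, and the helper `apply_writes` is a name introduced here to check that replaying lastseq of a sequence leaves the memory in the same final state as replaying the whole sequence.

```python
def lastseq(sigma):
    """Keep, for each variable, only its last write pair, preserving order."""
    last_index = {x: i for i, (x, _) in enumerate(sigma)}
    return [pair for i, pair in enumerate(sigma) if last_index[pair[0]] == i]


def apply_writes(memory, sigma):
    """Apply the write pairs of sigma to a copy of memory, left to right."""
    result = dict(memory)
    for x, d in sigma:
        result[x] = d
    return result


sigma = [("y", 5), ("z", 2), ("y", 4), ("x", 2), ("z", 3), ("x", 1)]
lam = lastseq(sigma)
memory = {"x": 0, "y": 0, "z": 0}
print(lam)
print(apply_writes(memory, sigma) == apply_writes(memory, lam))
```

For the example sequence from the text, `lastseq` returns `[('y', 4), ('z', 3), ('x', 1)]`, and the two replays agree. The result contains at most one pair per variable, which is why it fits in the finite control state of the auxiliary thread.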
Since we do not remove elements from the queue (stack of $s$) during the
simulation, we eliminate them only at the end of the simulation when we copy
the queue content from the stack of $s$ to that of $t$. Thus, when the content
of the stack of $t$ is reversed into the stack of $s$ at the beginning of the
simulation, $D$ non-deterministically guesses the intermediate point between
$\sigma_1$ and $\sigma_2$ and inserts in the stack a separation symbol $\$$ to
remember which part must be discarded. As a remark, it may happen that all
pairs in the queue are used to update the memory in the current pc-context and
thus no $\$$ is inserted in this phase. If this is the case, we need to update
the sequence $\lambda$ to keep it consistent as we simulate write operations.
Simulation. After the pre-simulation step, D non-deterministically simulates a
sequence of A transitions that may include moves from t and memory transitions
of all threads.
A write operation performed by $t$ in $A$ is simulated by pushing the
corresponding write pair $(x,d)$ onto the stack of $s$. Simultaneously, $s$
updates its control state to keep track of the last written value for $x$.
Finally, if $\$$ has not been pushed onto the stack yet, this pair is also
used to update the sequence $\lambda$ by concatenating $(x,d)$ to $\lambda$
and then removing any other existing pair in $\lambda$ for $x$. After that,
$s$ may non-deterministically decide to push $\$$ onto the stack of $s$.
A read transition performed by $t$ in $A$, say on variable $x$, is simulated
by using the last written value for $x$ stored in the control state of $s$, if
any; otherwise the value of $x$ in the shared memory is used. Note that, when
we read a variable whose last written value is tracked in the control state of
$s$, it may be the case that the corresponding write pair has already been
used to update the shared memory, so that the value in the shared memory
should be used instead. However, if this is the case these two values
coincide, as the shared memory cannot be overwritten by any other thread since
they are idle in the current pc-context.
A memory transition from the queue of $t'$, for $t' \in T \setminus \{t\}$, is
simulated by popping the pair on top of the stack of $t'$, which is the head
pair of the queue of $t'$, and then by updating the shared memory accordingly.
To simulate memory transitions from $t$'s queue, we use the sequence
$\lambda$, stored in the control state of $s$. We remove the leftmost pair
from $\lambda$ and update the shared memory according to it. It is easy to see
that some write pairs are not simulated at all; in particular, we do not
simulate the write operations that are not captured by $\lambda$. However, in
terms of correctness this is not an issue, as all these values will be
overwritten in the shared memory by the end of the simulation of the current
pc-context by some pair in $\lambda$.
Restoring the encoding of the reached $A$ configuration. The simulation of the
pc-context can non-deterministically end, provided that the sequence $\lambda$
is empty. We restore the configuration of $t$ by copying back the control
state of $s$ into $t$ and the content of the stack of $s$ into the stack of
$t$ up to the symbol $\$$ (while discarding the remaining stack content of
$s$).
Two reverse-phases for each pc-context. From the above description it is easy
to see that the number of reverse-phases needed to simulate one pc-context is
2: one is required in the first macro step to copy the stack content from $t$
to $s$; in the second macro step we only push onto $s$'s stack and hence do
not need any extra reverse-phase; and the last step consumes another
reverse-phase for the copy of $s$'s stack into the one of $t$.
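The stack rotation behind this count can be sketched on plain Python lists. The sketch below is a simplification under stated assumptions: it ignores control states, the guessed separator and the bounded sequence used for memory updates, and therefore covers only a pc-context in which no pair of the active thread's queue is dequeued. The function name `simulate_pc_context` is hypothetical. Lists are used as stacks, with the last element as the top.

```python
def simulate_pc_context(stack_t, writes):
    """Simulate the writes of the active thread t with an auxiliary stack s.

    stack_t: stack of t (last element = top = head of t's queue).
    writes:  write pairs issued by t during the context, in program order.
    Returns the new stack of t and the number of reverse-phases used.
    """
    stack_t = list(stack_t)   # work on a copy
    stack_s = []
    phases = 0

    # Pre-simulation: reverse t's stack into s. Only s is pushed.
    phases += 1
    while stack_t:
        stack_s.append(stack_t.pop())
    top_after_reversal = stack_s[-1] if stack_s else None

    # Simulation: each write of t is a push on s, still in the same phase.
    for pair in writes:
        stack_s.append(pair)

    # Restoring: copy s back into t. Only t is pushed.
    phases += 1
    while stack_s:
        stack_t.append(stack_s.pop())
    return stack_t, phases, top_after_reversal


# Queue of t from oldest to newest: (x,1), (y,2), (x,3); head (x,1) is on top.
old_stack = [("x", 3), ("y", 2), ("x", 1)]
new_stack, phases, top_s = simulate_pc_context(old_stack, [("z", 4), ("x", 5)])
print(new_stack[-1], top_s, phases)
print(new_stack)
```

After the reversal the most recently written pair `('x', 3)` is on top of the auxiliary stack, so new writes can be pushed directly. After restoring, the head `('x', 1)` is again on top of the stack of the active thread and the two new writes sit at the bottom, which matches the FIFO order of the queue. Two reverse-phases are used in total.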
4 Memory-centric Context-Bounded Analysis
In this section, we consider memory-centric context-bounded analysis (mc-CBA)
for Cpds and Cfss under Tso semantics. An mc-context of a Cpds A is a
contiguous part of an A run where only memory transitions concerning the queue
of one thread can be performed and no restrictions are posed on the actions of
all the threads. We study both the decidability and the complexity of the reachability
problem for Cpds and Cfss under the Tso semantics up to a given number of
mc-contexts. We show that the problem is undecidable for Cpds, and decidable
with elementary complexity for Cfss.
Formally, let A be a Cpds over a set of thread names T and shared memory
M, and let k be a positive integer. For $t \in T$, we define the mc-context
behaviour language for thread t as
$L_t = ((Act_T \times Act_M \times T) \cup (\{(nop, mem)\} \times \{t\}))^*$.
A k mc-context behaviour language over T, denoted $L^k_T$, is the set of all
words $w \in Act^*$ which can be factorized as $w_1 w_2 \cdots w_k$, where for
every $i \in [k]$, $w_i \in L_{t_i}$ for some thread $t_i \in T$. Given a
Tso-configuration C of A, the k mc-context reachability problem is the problem
of deciding whether C is $L^k_T$-reachable in A.
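As an illustration of this definition, the following Python sketch computes the
smallest k for which a word factorizes into k mc-contexts. It rests on
assumptions that are not part of the paper: a letter is encoded as a triple
(thread action, memory action, thread name), a memory update from the queue of
t is exactly the letter ("nop", "mem", t), and the empty word counts as an
mc-context. The function names are hypothetical.

```python
from typing import List, Tuple

# Assumed encoding of a letter: (thread action, memory action, thread name).
Letter = Tuple[str, str, str]


def is_memory_update(letter: Letter) -> bool:
    """True for a letter ((nop, mem), t): a write leaves the queue of t."""
    thread_action, memory_action, _thread = letter
    return thread_action == "nop" and memory_action == "mem"


def min_mc_contexts(word: List[Letter]) -> int:
    """Smallest k >= 1 such that `word` factorizes into k mc-contexts."""
    contexts = 1
    owner = None  # thread whose queue is served by the current mc-context
    for letter in word:
        if not is_memory_update(letter):
            continue  # thread actions are unrestricted inside an mc-context
        thread = letter[2]
        if owner is None:
            owner = thread
        elif thread != owner:
            contexts += 1  # another queue is flushed: a new context starts
            owner = thread
    return contexts


def in_k_mc_language(word: List[Letter], k: int) -> bool:
    """Membership in the k mc-context behaviour language (empty factors allowed)."""
    return min_mc_contexts(word) <= k
```

Only the memory updates matter for the count: a new mc-context is forced
exactly when two consecutive memory updates come from different queues.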
In the rest of the section we prove the following two theorems.
Theorem 4. For any $k \in \mathbb{N}$ with $k \geq 5$, the k mc-context
reachability problem for Cpds under Tso is undecidable.
Theorem 5. For any $k \in \mathbb{N}$, the k mc-context reachability problem
for a Cfss A under Tso is solvable in double exponential time in the size of A
and k.
4.1 Proof of Theorem 4
We exploit the construction given in the proof of Theorem 1 to prove the
undecidability of the problem.
We show that the Cpds constructed to decide whether the languages accepted by
the pushdown automata A1 and A2 intersect also has a 5 mc-context run
witnessing the existence of a common word accepted by both A1 and A2, if
any. Thread t1 runs first, until it finishes its first context. Then,
synchronously, we interleave the memory transitions on t1's queue with the
transitions of thread t2 so that it reads the entire word w from x1 and writes
it into its queue on the variable x2. The same is done for threads t2 and t3,
and then for t3 and t4. Finally, actions by t1 are synchronised with the memory
transitions from t4's queue. It is easy to see that such a schedule leads to a
5 mc-context run, and this concludes the proof.
4.2 Proof of Theorem 5
The proof is given by a reduction to the reachability problem for Cpds under
Sc semantics constrained to the bounded-phase behaviour language. A phase
captures the dual notion of a reverse-phase: it represents a contiguous segment
of a run in which only one thread can use its stack with no restrictions,
while all the other threads can only push onto (and never pop from) their own
stacks. Formally, a bounded-phase language is defined as follows. For a thread
$t \in T$, define
$L'_t = ((Act_T \times Act_M \times \{t\}) \cup ((Act_T \setminus \{pop\}) \times Act_M \times T))^*$.
A word in $L'_t$ describes Cpds sub-runs in which only thread t is allowed to
take all its transitions, while the other threads are forbidden to use pop
transitions. For $h \in \mathbb{N}$, an h-phase word w is such that
$w \in Act^*$ and can be factorized as $w_1 w_2 \cdots w_h$, where for
every $i \in [h]$, $w_i \in L'_{t_i}$ for some $t_i \in T$. An h-phase
behaviour language is the set of all h-phase words. For any given
$k \in \mathbb{N}$, the k-phase reachability problem for Cpds under Sc is
decidable in double exponential time, as is the k-reverse-phase reachability
problem for Sc (see Theorem 3).
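The definition of a phase can be made concrete with a short Python sketch that
computes the least number of phases into which a word can be cut. The triple
encoding of letters and the action name "pop" are assumptions made for this
illustration only, and `min_phases` is a hypothetical helper.

```python
from typing import Iterable, Tuple

# Assumed encoding of a letter: (thread action, memory action, thread name).
Letter = Tuple[str, str, str]


def min_phases(word: Iterable[Letter]) -> int:
    """Smallest h >= 1 such that `word` is an h-phase word."""
    phases = 1
    popper = None  # the one thread allowed to pop in the current phase
    for thread_action, _memory_action, thread in word:
        if thread_action != "pop":
            continue  # non-pop transitions are allowed for every thread
        if popper is None:
            popper = thread
        elif thread != popper:
            phases += 1  # a different thread pops: a new phase must start
            popper = thread
    return phases
```

A word is then an h-phase word, in the sense above, whenever `min_phases`
returns a value of at most h (assuming empty factors are allowed).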
Theorem 6. For any $k \in \mathbb{N}$, the k-phase reachability problem for a
Cpds A under Sc is solvable in double-exponential time in k and $|A|$, where
$|A|$ is the size of A.
Proof. The upper bound can be shown by a straightforward reduction to the
emptiness problem of k-phase multi-pushdown automata (introduced in [24])
where there is a shared control state between all the stacks. The latter problem
is known to be solvable in double-exponential time in k and exponential time
in the size of the model [24,32,10]. There is a trivial reduction from the
k-phase reachability problem for a Cpds A under Sc to the emptiness problem
of a k-phase multi-pushdown automaton B, obtained by converting A into an
automaton without variables and process states (this can be done by encoding
the variable valuation and process states in the shared state of B). This
results in an exponential blow-up, and so the k-phase reachability problem for
A can be solved in double-exponential time in k and $|A|$ (since the size of B
is exponential in that of A).
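To see where the exponential blow-up comes from, the sketch below enumerates
the shared states obtained by pairing one control state per thread with one
valuation of the variables. It is a minimal illustration under assumed inputs
(a dictionary of control states per thread, a list of variable names, and a
finite data domain); none of these names come from the paper.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple


def shared_states(
    control_states: Dict[str, List[str]],
    variables: List[str],
    domain: List[int],
) -> Iterator[Tuple[Tuple[str, ...], Tuple[int, ...]]]:
    """Yield every (control states of all threads, variable valuation) pair."""
    threads = sorted(control_states)
    for controls in product(*(control_states[t] for t in threads)):
        for valuation in product(domain, repeat=len(variables)):
            yield controls, valuation


def count_shared_states(
    control_states: Dict[str, List[str]],
    variables: List[str],
    domain: List[int],
) -> int:
    """Product of the per-thread state counts times |domain| ** |variables|."""
    count = 1
    for states in control_states.values():
        count *= len(states)
    return count * len(domain) ** len(variables)
```

The count is exponential in the number of threads and in the number of
variables, which matches the blow-up mentioned in the proof.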
Before giving the reduction to the bounded-phase reachability problem for
Cpds under Sc semantics, we show that any mc-context can be rewritten such
that: (1) in the first part of the run, only one thread $t \in T$ is allowed to
perform actions and no memory transitions are allowed for any thread, and (2)
in the second part, only memory transitions concerning the queue of t are
allowed and no restrictions are imposed on the actions of all threads except
the thread t (which is not allowed to perform any action). Formally, for
$t \in T$ we define the restricted mc-context behaviour language for thread t as
$B_t = (Act_T \times Act_M \times \{t\})^* \, ((Act_T \times Act_M \times (T \setminus \{t\})) \cup (\{(nop, mem)\} \times \{t\}))^*$.
A k restricted mc-context behaviour language over T, denoted $B^k_T$, is the
set of all words $w \in Act^*$ which can be factorized as
$w_1 w_2 \cdots w_k$, where for every $i \in [k]$, $w_i \in B_{t_i}$ for some
thread $t_i \in T$. Given a Tso-configuration C of A, the k restricted
mc-context reachability problem is the problem of deciding whether C is
$B^k_T$-reachable in A.
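The two-part shape of a restricted mc-context can be sketched in Python as
follows. The code only performs the syntactic reordering (the actions of t
first, everything else afterwards, each group in its original order) and
checks the resulting shape; that the reordering preserves reachability is the
claim argued in the text, not something the code verifies. The triple encoding
of letters and all function names are assumptions of this sketch, and the
input is assumed to be an mc-context of t.

```python
from typing import List, Tuple

# Assumed encoding of a letter: (thread action, memory action, thread name).
Letter = Tuple[str, str, str]


def is_memory_update(letter: Letter) -> bool:
    """True for a letter ((nop, mem), t): a write leaves the queue of t."""
    return letter[0] == "nop" and letter[1] == "mem"


def to_restricted(context: List[Letter], t: str) -> List[Letter]:
    """Move the actions of t to the front, keeping relative orders."""
    own_actions = [l for l in context if l[2] == t and not is_memory_update(l)]
    others = [l for l in context if l[2] != t or is_memory_update(l)]
    return own_actions + others


def is_restricted(context: List[Letter], t: str) -> bool:
    """Check the shape: actions of t, then updates from t's queue and other threads."""
    split = 0
    while (split < len(context) and context[split][2] == t
           and not is_memory_update(context[split])):
        split += 1
    for letter in context[split:]:
        from_t_queue = is_memory_update(letter) and letter[2] == t
        by_other_thread = letter[2] != t and not is_memory_update(letter)
        if not (from_t_queue or by_other_thread):
            return False
    return True
```

For example, a context in which t acts again after one of its writes has
reached memory is not restricted, but its reordering by `to_restricted` is.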
Let us assume that in an mc-context, we are only performing memory transitions
concerning the queue of one process $t \in T$. Then it is easy to see that the
execution of t can never be affected by any other thread (since they do not
update the memory). Other threads could affect t's execution only if they were
able to change the configuration of the shared memory. However, this is not the
case, as memory transitions can occur only from t's queue. Memory transitions
from t's queue do change the state of the shared memory but, as we now argue,
they cannot alter the course of t's execution. Recall that the behaviour of t
depends on: (1) the value of a variable in the memory if there is no pending
write for this variable in the queue of t, and (2) the last write operations
that still reside in the queue of t. This means that performing (or not)
memory transitions from the queue of t will not affect the behaviour of t. This
implies that any mc-context can be reordered such that we first execute the
sequence of actions of the process t and then execute the sequence of memory
transitions and actions of all the other threads. Hence the k mc-context
reachability problem for a Tso-configuration C of A can be reduced to the k
restricted mc-context reachability problem for C, as stated by the following
lemma:
Lemma 1. Given a Tso-configuration C of A, C is $B^k_T$-reachable in A iff C is
$L^k_T$-reachable in A.
Next, we show that it is possible to reduce the k restricted mc-context
reachability problem for a Cfss under Tso to the k-phase reachability problem
for a Cpds under Sc. The reduction we propose is similar in spirit to the one
for the processor-centric case (see Proof of Theorem 2), and here we only
sketch the differences. We define a Cpds B that simulates every restricted
mc-context of A with 3 phases. The set of thread names of B is
$T \cup \{s\}$, where $s \notin T$ is an auxiliary thread which is employed for
the simulation. The invariant we maintain is the following: when the simulation
starts and ends, thread s is in an idle state, meaning that it is in a special
control state, say @, and its stack is empty. Furthermore, every other thread
$t \in T$ of B encodes in its configuration the one of t in A: its control
state encodes the control state of t in A and the finite sequence last(t)
derived from the content of t's queue in A, and its stack stores the content of
that queue with the head write pair placed on top of the stack.
The simulation goes as follows. Initially s guesses the thread t from which
memory transitions can be executed. The content of t's stack is transferred into
s's stack, so that the tail of t's queue is now stored on the top of s's stack.
Also the control state of t as well as last(t) is copied into the control state
of s.
Now, we first simulate the moves of t, and only afterwards the moves of the
remaining threads along with the memory transitions concerning t's queue (in
order to respect the definition of restricted mc-context). The simulation of t
is as follows. For write operations we update last(t) as described in Section
3.2, and push the produced pair on the stack of s. Read operations, instead,
consult last(t) to get the value of the read variable if it has a pending
write; otherwise they recover the value from the memory.
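The treatment of writes and reads described above can be sketched in Python as
follows. This is an illustration under simplifying assumptions, not the
construction of the paper: last(t) is modelled as a dictionary from variables
to the value of their newest pending write, the stack of s is a Python list
whose end is the top, and the class and method names are hypothetical.

```python
from typing import Dict, List, Tuple


class SimulatedThread:
    """Simulation of the moves of thread t, carried out on the stack of s."""

    def __init__(self, memory: Dict[str, int]) -> None:
        self.memory = memory                    # shared memory: variable -> value
        self.stack: List[Tuple[str, int]] = []  # pending writes, newest on top
        self.last: Dict[str, int] = {}          # newest pending value per variable

    def write(self, var: str, value: int) -> None:
        """A write updates last(t) and pushes the pair; memory is untouched."""
        self.last[var] = value
        self.stack.append((var, value))

    def read(self, var: str) -> int:
        """A read takes the newest pending write if any, else the memory value."""
        if var in self.last:
            return self.last[var]
        return self.memory[var]
```

In this sketch a write never changes the shared memory, so other threads keep
reading the old value until a memory transition is simulated.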
In the second stage of the simulation we restore back into t's stack the content
of s's stack as well as the control state. The sequence last(t) is not copied,
as it will change once memory operations are performed. This sequence is
reconstructed at the end of the simulation.
We now simulate all the other threads and memory updates in arbitrary
order. Memory transitions are simulated as expected by popping pairs from t's
stack and updating the memory accordingly. Transitions of other threads, say
$\hat{t}$, are simulated straightforwardly by using last($\hat{t}$) and the
shared memory in a similar way as we have done for t.
The simulation ends non-deterministically, and the invariant is re-established
by computing last(t). For this purpose we need to inspect t's stack entirely.
Thus we copy it back and forth to s's stack, paying one more phase.
Finally s enters the special control state @ and the simulation ends.
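The memory updates and the final recomputation of last(t) can be illustrated
with plain Python lists used as stacks (the end of the list is the top). This
is only a sketch under assumptions: last(t) is modelled as a dictionary, the
head of the queue sits on top of t's stack as in the invariant, and both
function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, int]  # a pending write (variable, value)


def apply_memory_update(stack_t: List[Pair], memory: Dict[str, int]) -> None:
    """Pop the head of t's queue (top of t's stack) and write it to memory."""
    var, value = stack_t.pop()
    memory[var] = value


def recompute_last(stack_t: List[Pair]) -> Dict[str, int]:
    """Inspect t's whole stack by moving it to an auxiliary stack and back."""
    stack_s: List[Pair] = []
    last: Dict[str, int] = {}
    while stack_t:
        var, value = stack_t.pop()  # pairs arrive from oldest to newest
        last[var] = value           # so a newer write overrides an older one
        stack_s.append((var, value))
    while stack_s:
        stack_t.append(stack_s.pop())  # restore t's stack, head back on top
    return last
```

After the round trip the stack of t is exactly as before, which is what lets
the invariant be re-established.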
By using the same argument as in Section 3.2 we can show that the above
construction of B allows us to reduce in polynomial time the k restricted
mc-context reachability problem for Cfss under Tso to the 3k-phase reachability
problem for Cpds under Sc. Thus, from Lemma 1 and Theorem 6 we can state the
main result of the section.
Theorem 7. For any positive integer k, the k mc-context reachability problem
for a Cfss A under Tso is solvable in double exponential time in $|A|$ and k.
5 Process-memory centric context-bounded analysis
In this section, we consider process-memory centric context-bounded analysis
(pmc-CBA) for Cpds with Tso semantics. A context, in this case called a
pmc-context, of a Cpds A is a contiguous part of an A computation where only one
processor thread is active, and only its store buffer can be used for memory
updates. We consider here the reachability problem for Cpds up to a bounded
number of pmc-contexts. We recall that the pmc-CBA for Cpds (resp. Cfss)
with Tso semantics is reducible to the standard context-bounded analysis for
Cpds (resp. Cfss) with Sc semantics, which is known to be decidable [34].
Next, we formally define the bounded pmc-reachability problem for Tso-Cpds.
Let A be a Cpds over thread names T and shared-memory model M,
and let k be a positive integer. The pmc-context language $L_t$ of a thread
$t \in T$ is the set
$L_t = (((Act_T \times Act_M) \cup \{(nop, mem)\}) \times \{t\})^*$.
A k pmc-context behaviour language over T, denoted $L^k_T$, is the set of all
words $w \in Act_A^*$ which can be factorized as $w_1 w_2 \cdots w_k$, where
for every $i \in [k]$, $w_i \in L_{t_i}$ for some thread $t_i \in T$.
Given Mem $\in$ {Sc, Tso} and a Mem-configuration C of A, the k pmc-context
reachability problem is the problem of deciding whether C is $L^k_T$-reachable
in A.
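Because every letter of a pmc-context carries the same thread name, the least
number of pmc-contexts of a word is simply the number of maximal blocks of
consecutive letters with the same thread. The Python sketch below illustrates
this; the triple encoding of letters and the function names are assumptions of
the sketch, not notation from the paper.

```python
from itertools import groupby
from typing import List, Tuple

# Assumed encoding of a letter: (thread action, memory action, thread name).
Letter = Tuple[str, str, str]


def pmc_contexts(word: List[Letter]) -> List[Tuple[str, List[Letter]]]:
    """Split `word` into maximal blocks of letters tagged with the same thread."""
    return [(thread, list(block))
            for thread, block in groupby(word, key=lambda letter: letter[2])]


def min_pmc_contexts(word: List[Letter]) -> int:
    """Smallest k >= 1 such that `word` factorizes into k pmc-contexts."""
    return max(1, len(pmc_contexts(word)))
```

Compared with an mc-context, a switch is counted here as soon as a different
thread acts or a different store buffer is flushed.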
Theorem 8 ([13]). For any $k \in \mathbb{N}$, the k pmc-context reachability
problem for Cpds (resp. Cfss) under Tso is reducible to the k pmc-context
reachability problem for Cpds (resp. Cfss) under Sc semantics.
Moreover, we have:
Theorem 9. For any $k \in \mathbb{N}$, the k pmc-context reachability problem
for Cpds (resp. Cfss) under Sc semantics is solvable in nondeterministic
exponential time in k and $|A|$, where $|A|$ is the size of A.
Proof. The upper bound can be shown by a straightforward reduction to the
reachability problem of k-context multi-pushdown systems (introduced in [34])
where there is a shared control state between all the stacks. The latter problem
is known to be solvable in nondeterministic polynomial time in k and the size
of the system [29]. It is easy to see that there is a trivial reduction from the
k pmc-context reachability problem for a Cpds A under Sc to the reachability
problem of a k-context multi-pushdown system B, obtained by encoding all the
process states and the valuation of the memory into one single state. This
results in an exponential blow-up, and so the k pmc-context reachability
problem for A can be solved in nondeterministic exponential time in k and
$|A|$ (since the size of B is exponential in that of A).
As an immediate corollary of Theorem 8 and Theorem 9, we obtain:
Theorem 10. For any $k \in \mathbb{N}$, the k pmc-context reachability problem
for a Cpds (resp. Cfss) A under Tso is decidable and can be solved in
nondeterministic exponential time in k and $|A|$.
6 Conclusion
We have considered three different notions of context-bounded analysis for TSO
computations, depending on whether a processor-centric, a memory-centric, or a
processor-and-memory-centric view is adopted. We have shown that each of these
three notions drastically cuts the complexity of checking state reachability
w.r.t. the unrestricted case, although of course the analysis is
under-approximate. The work we present in this paper improves our understanding
of the trade-offs between expressiveness, decidability, and complexity of
checking state reachability under TSO semantics.
While pmc-CBA was already introduced in our previous work [13], this work
introduces two other natural and more general concepts, pc-CBA and mc-CBA,
for which the complexity of the TSO state reachability problem is still
elementary. In terms of coverage, pc-CBA and mc-CBA are incomparable, while
both of them are strictly more general than pmc-CBA. Indeed, these two analyses
allow us to capture, with a given bound on the pc/mc context switches, sets of
behaviors that would need an unbounded number of pmc context switches. However,
this increase in power comes at a price. First, while pmc-CBA is decidable even
for recursive programs (pushdown threads), both pc-CBA and mc-CBA are
undecidable in this case. Second, for programs without recursive procedures,
pmc-CBA is in NEXPTIME while both pc-CBA and mc-CBA are in 2EXPTIME.
An interesting question left for future work is whether the analyses presented
here for TSO can be extended to other weak memory models.
References
1. P. A. Abdulla, M. F. Atig, Y.-F. Chen, C. Leonardsson, and A. Rezine. Automatic
fence insertion in integer programs via predicate abstraction. In A. Miné and
D. Schmidt, editors, SAS, volume 7460 of Lecture Notes in Computer Science,
pages 164-180. Springer, 2012.
2. P. A. Abdulla, M. F. Atig, Y.-F. Chen, C. Leonardsson, and A. Rezine.
Counter-example guided fence insertion under tso. In C. Flanagan and B. König,
editors, TACAS, volume 7214 of Lecture Notes in Computer Science, pages 204-219.
Springer, 2012.
3. P. A. Abdulla, M. F. Atig, Y.-F. Chen, C. Leonardsson, and A. Rezine. Memorax, a
precise and sound tool for automatic fence insertion under tso. In TACAS, volume
7795 of Lecture Notes in Computer Science, pages 530-536. Springer, 2013.
4. S. V. Adve and M. D. Hill. A unified formalization of four shared-memory models.
IEEE Trans. Parallel Distrib. Syst., 4(6):613-624, 1993.
5. M. Ahamad, G. Neiger, J. E. Burns, P. Kohli, and P. W. Hutto. Causal memory:
Definitions, implementation, and programming. Distributed Computing,
9(1):37-49, 1995.
6. J. Alglave, D. Kroening, V. Nimal, and M. Tautschnig. Software verification for
weak memory via program transformation. In ESOP, volume 7792 of Lecture Notes
in Computer Science, pages 512-532. Springer, 2013.
7. J. Alglave, D. Kroening, and M. Tautschnig. Partial orders for efficient bounded
model checking of concurrent software. In CAV, volume 8044 of Lecture Notes in
Computer Science, pages 141-157. Springer, 2013.
8. J. Alglave and L. Maranget. Stability in weak memory models. In CAV, volume
6806 of Lecture Notes in Computer Science, pages 50-66. Springer, 2011.
9. D. Aspinall and J. Ševčík. Formalising java's data race free guarantee. In
K. Schneider and J. Brandt, editors, TPHOLs, volume 4732 of Lecture Notes in
Computer Science, pages 22-37. Springer, 2007.
10. M. F. Atig, B. Bollig, and P. Habermehl. Emptiness of multi-pushdown automata
is 2etime-complete. In M. Ito and M. Toyama, editors, Developments in Language
Theory, volume 5257 of Lecture Notes in Computer Science, pages 121-133.
Springer, 2008.
11. M. F. Atig, A. Bouajjani, S. Burckhardt, and M. Musuvathi. On the verification
problem for weak memory models. In M. V. Hermenegildo and J. Palsberg, editors,
POPL, pages 7-18. ACM, 2010.
12. M. F. Atig, A. Bouajjani, S. Burckhardt, and M. Musuvathi. What's decidable
about weak memory models? In ESOP, volume 7211 of Lecture Notes in Computer
Science, pages 26-46. Springer, 2012.
13. M. F. Atig, A. Bouajjani, and G. Parlato. Getting rid of store-buffers in tso
analysis. In CAV, volume 6806 of Lecture Notes in Computer Science, pages
99-115. Springer, 2011.
14. M. F. Atig, A. Bouajjani, and S. Qadeer. Context-bounded analysis for concurrent
programs with dynamic creation of threads. In TACAS, volume 5505 of Lecture
Notes in Computer Science, pages 107-123. Springer, 2009.
15. A. Bouajjani, E. Derevenetc, and R. Meyer. Checking and enforcing robustness
against tso. In ESOP, volume 7792 of Lecture Notes in Computer Science, pages
533-553. Springer, 2013.
16. A. Bouajjani, M. Emmi, and G. Parlato. On sequentializing concurrent
programs. In SAS, volume 6887 of Lecture Notes in Computer Science, pages 129-145.
Springer, 2011.
17. A. Bouajjani, R. Meyer, and E. Möhlmann. Deciding robustness against total store
ordering. In ICALP (2), volume 6756 of Lecture Notes in Computer Science, pages
428-440. Springer, 2011.
18. G. Boudol and G. Petri. Relaxed memory models: an operational approach. In
Z. Shao and B. C. Pierce, editors, POPL, pages 392-403. ACM, 2009.
19. S. Burckhardt and M. Musuvathi. Effective program verification for relaxed
memory models. In CAV, volume 5123 of Lecture Notes in Computer Science, pages
107-120. Springer, 2008.
20. J. Burnim, K. Sen, and C. Stergiou. Testing concurrent programs on relaxed
memory models. In ISSTA, pages 122-132. ACM, 2011.
21. R. Friedman. Consistency Conditions for Distributed Shared Memories. PhD
thesis, Technion: Israel Institute of Technology, 1994.
22. J. E. Hopcroft, R. Motwani, and J. D. Ullman. Introduction to automata theory,
languages, and computation - international edition (2. ed). Addison-Wesley, 2003.
23. M. Kuperstein, M. T. Vechev, and E. Yahav. Partial-coherence abstractions for
relaxed memory models. In PLDI, pages 187-198. ACM, 2011.
24. S. La Torre, P. Madhusudan, and G. Parlato. A robust class of context-sensitive
languages. In LICS, pages 161-170. IEEE Computer Society, 2007.
25. S. La Torre, P. Madhusudan, and G. Parlato. Analyzing recursive programs using
a fixed-point calculus. In PLDI, pages 211-222. ACM, 2009.
26. S. La Torre, P. Madhusudan, and G. Parlato. Reducing context-bounded concurrent
reachability to sequential reachability. In CAV, volume 5643 of Lecture Notes
in Computer Science, pages 477-492. Springer, 2009.
27. S. La Torre, M. Napoli, and G. Parlato. On the complement of multi-stack visibly
pushdown languages. Technical report, 2014.
28. A. Lal and T. W. Reps. Reducing concurrent analysis under a context bound to
sequential analysis. In CAV, volume 5123 of Lecture Notes in Computer Science,
pages 37-51. Springer, 2008.
29. A. Lal, T. Touili, N. Kidd, and T. W. Reps. Interprocedural analysis of concurrent
programs under a context bound. In C. R. Ramakrishnan and J. Rehof, editors,
TACAS, volume 4963 of Lecture Notes in Computer Science, pages 282-298.
Springer, 2008.
30. A. Linden and P. Wolper. An automata-based symbolic approach for verifying
programs on relaxed memory models. In SPIN, volume 6349 of Lecture Notes in
Computer Science, pages 212-226. Springer, 2010.
31. V. Luchangco. Memory Consistency Models for High Performance Distributed
Computing. PhD thesis, Massachusetts Institute of Technology, 2001.
32. P. Madhusudan and G. Parlato. The tree width of auxiliary storage. In T. Ball
and M. Sagiv, editors, POPL, pages 283-294. ACM, 2011.
33. S. Owens. Reasoning about the implementation of concurrency abstractions on
x86-tso. In ECOOP, volume 6183 of Lecture Notes in Computer Science, pages
478-503. Springer, 2010.
34. S. Qadeer and J. Rehof. Context-bounded model checking of concurrent software.
In TACAS, volume 3440 of Lecture Notes in Computer Science. Springer, 2005.
35. V. A. Saraswat, R. Jagadeesan, M. M. Michael, and C. von Praun. A theory of
memory models. In K. A. Yelick and J. M. Mellor-Crummey, editors, PPOPP,
pages 161-172. ACM, 2007.
36. P. Sewell, S. Sarkar, S. Owens, F. Z. Nardelli, and M. O. Myreen. x86-tso: a
rigorous and usable programmer's model for x86 multiprocessors. Commun. ACM,
53(7):89-97, 2010.