Abstract. We address the problem of verifying safety properties of concurrent programs running over the Total Store Order (TSO) memory model. Known decision procedures for this model are based on complex encodings of store buffers as lossy channels. These procedures assume that the number of processes is fixed. However, it is important in general to prove correctness of a system/algorithm in a parametric way with an arbitrarily large number of processes.
Introduction
Most modern processor architectures execute instructions out of order to gain efficiency. In sequential programming, this out-of-order execution is transparent to the programmer, who can still reason under the Sequential Consistency (SC) model [Lam79]. This is no longer true for concurrent processes that share memory: concurrent algorithms such as mutual exclusion and producer-consumer protocols may then behave incorrectly. Program verification is therefore a relevant (and difficult) task for proving correctness under the new semantics. The inadequacy of the interleaving semantics has led to new program semantics, the so-called Weak (or relaxed) Memory Models (WMM), which allow permutations between certain types of memory operations [AG96, DSB86, AH90]. Total Store Ordering (TSO) is one of the most common models. It corresponds to the relaxation adopted by Sun's SPARC multiprocessors [WG94] and to formalizations of the x86-tso memory model [OSS09, SSO+10]. These models place an unbounded perfect (non-lossy) store buffer between each process and the main memory, where the store buffer holds the pending store operations of the process. When a process performs a store operation, it appends it to the end of its buffer. These operations are propagated to the shared memory non-deterministically, in FIFO order. When a process reads a variable, it searches its buffer for a pending store operation on that variable. If no such store operation exists, it fetches the value of the variable from the main memory. Verifying programs running on the TSO memory model is difficult because the unboundedness of the buffers makes the state space of the system infinite, even when the input program is finite-state.
Decidability of safety properties has been obtained by constructing equivalent models that replace the perfect store buffers by lossy channels [ABBM10, ABBM12, AAC+12a]. However, these constructions are complicated and involve several ingredients that lead to inefficient verification procedures. For instance, they require each message inside a lossy channel to carry (instead of a single store operation) a full snapshot of the memory, representing the process's local view of the memory contents. Furthermore, the reductions involve non-deterministically guessing the lossy channel contents. The guessing is then resolved either by consistency checking [ABBM10] or by using explicit pointer variables (one per process) inside the buffers [AAC+12a], causing a serious state-space explosion.
In this paper, we introduce a novel semantics which we call the Dual TSO semantics. Our aim is to provide an alternative (and equivalent) semantics that is more amenable to efficient algorithmic verification. The main idea is to have load buffers that contain pending load operations (more precisely, values that will potentially be taken by forthcoming load operations) rather than store buffers (that contain store operations). The flow of information is now in the reverse direction: store operations are performed by the processes atomically on the main memory, while values of variables are propagated non-deterministically from the memory to the load buffers of the processes. When a process performs a load operation, it can fetch the value of the variable from the head of its load buffer. We show that the dual semantics is equivalent to the original one in the sense that any given set of processes will reach the same set of local states under both semantics. The dual semantics allows us to understand the TSO model in a way that is totally different from the classical semantics. Furthermore, the dual semantics offers several important advantages from the point of view of formal reasoning and program verification. First, it allows the load buffers to be transformed into lossy channels without the costly overhead that was necessary in the case of store buffers. This means that we can apply the theory of well-structured systems [Abd10, ACJT96, FS01] in a straightforward manner, leading to a much simpler proof of decidability of safety properties. Second, the absence of extra overhead yields more efficient algorithms and better scalability (as shown by our experimental results). Finally, the dual semantics allows the framework to be extended to parameterized verification, which is an important paradigm in concurrent program verification.
Here, we consider systems, e.g., mutual exclusion protocols, that consist of an arbitrary number of processes. The aim of parameterized verification is to prove correctness of the system regardless of the number of processes. It is not obvious how to perform parameterized verification under the classical semantics. For instance, extending the framework of [AAC+12a] would involve an unbounded number of pointer variables, thus leading to channel systems with unbounded message alphabets. In contrast, as we show in this paper, the simple nature of the dual semantics allows a straightforward extension of our verification algorithm to parameterized verification. This is the first time a decidability result has been established for the parameterized verification of programs running over WMM. Notice that this result takes into account two sources of infinity: the number of processes and the size of the buffers.
Based on our framework, we have implemented a tool and applied it to a large set of benchmarks. The experiments demonstrate the efficiency of the dual semantics compared to the classical one (by two orders of magnitude on average), and the feasibility of parameterized verification in the former case. In fact, besides its theoretical generality, parameterized verification is crucial in practice in this setting: as our experiments show, it is much more efficient than verification of bounded-size instances (starting from 3 or 4 components), especially concerning memory consumption (which is the critical resource).
Related Work. There is a large body of work on the analysis of programs running under WMM (e.g., [LNP+12, KVY10, KVY11, DMVY13, AAC+12a, BM08, BSS11, BDM13, BAM07, YGLS04, AALN15, AAC+12b, AAJL16, DMVY17, TW16, LV16, LV15, Vaf15, HVQF16]). Some of these works propose precise analysis techniques for checking safety properties or stability of finite-state programs under WMM (e.g., [...]). There are also a number of efforts to design bounded model checking techniques for programs under WMM (e.g., [AKNT13, AKT13, YGLS04, BAM07]), which encode the verification problem in SAT/SMT.
The closest works to ours are those presented in [AAC+12a, ABBM10, AAC+13, ABBM12], which provide precise and sound techniques for checking safety properties of finite-state programs running under TSO. However, as stated in the introduction, these techniques are complicated and cannot be extended in a straightforward manner to the verification of parameterized systems (as is the case for the techniques we develop for the Dual TSO semantics).
In Section 7, we experimentally compare our techniques with Memorax [AAC+12a, AAC+13], which is the only precise and sound tool for checking safety properties of concurrent programs under TSO.
Preliminaries
Let Σ be a finite alphabet. We use Σ* (resp. Σ+) to denote the set of all words (resp. non-empty words) over Σ. Let ε be the empty word. The length of a word w ∈ Σ* is denoted by |w| (in particular, |ε| = 0). For every i : 1 ≤ i ≤ |w|, let w(i) be the symbol at position i in w. For a ∈ Σ, we write a ∈ w if a appears in w, i.e., a = w(i) for some i : 1 ≤ i ≤ |w|.
Given two words u and v over Σ, we write u ⪯ v to denote that u is a (not necessarily contiguous) subword of v, i.e., there is an injection h : {1, ..., |u|} → {1, ..., |v|} such that: (1) h(i) < h(j) for all i < j, and (2) u(i) = v(h(i)) for every i ∈ {1, ..., |u|}.
Given a subset Σ′ ⊆ Σ and a word w ∈ Σ*, we use w|Σ′ to denote the projection of w onto Σ′, i.e., the word obtained from w by erasing all the symbols that are not in Σ′.
Let A and B be two sets and let f : A → B be a total function from A to B. We use f[a ← b] to denote the function g such that g(a) = b and g(x) = f(x) for all x ≠ a.
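The word operations above can be executed directly. The following Python sketch is illustrative only: words are Python sequences (strings in the examples), and the helper names `is_subword`, `project`, and `update` are hypothetical. The greedy scan in `is_subword` matches each u(i) to the earliest usable position of v, which yields a strictly increasing injection h whenever one exists.

```python
from typing import Dict, Sequence, Set, Tuple


def is_subword(u: Sequence, v: Sequence) -> bool:
    """u ⪯ v: match u's symbols left to right at the earliest possible positions of v."""
    i = 0
    for sym in v:
        if i < len(u) and u[i] == sym:
            i += 1  # h(i) is the current position of v
    return i == len(u)


def project(w: Sequence, sigma_prime: Set) -> Tuple:
    """w|Σ': erase every symbol of w that is not in Σ'."""
    return tuple(a for a in w if a in sigma_prime)


def update(f: Dict, a, b) -> Dict:
    """f[a ← b]: a copy of f that maps a to b and agrees with f elsewhere."""
    g = dict(f)
    g[a] = b
    return g
```

For example, `is_subword("ace", "abcde")` holds while `is_subword("aec", "abcde")` does not, and `update` leaves its argument unchanged.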
A transition system T is a tuple ⟨C, Init, Act, ∪_{a∈Act} →_a⟩ where C is a (potentially infinite) set of configurations; Init ⊆ C is a set of initial configurations; Act is a set of actions; and, for every a ∈ Act, →_a ⊆ C × C is a transition relation.
A concurrent system is a tuple P = (A_1, A_2, ..., A_n) where, for every p : 1 ≤ p ≤ n, A_p is a finite-state automaton describing the behavior of the process p. The automaton A_p is defined as a triple ⟨Q_p, q_p^init, ∆_p⟩ where Q_p is a finite set of local states, q_p^init ∈ Q_p is the initial local state, and ∆_p ⊆ Q_p × Ω(X, V) × Q_p is a finite set of transitions. We define P := {1, ..., n} to be the set of process IDs, Q := ∪_{p∈P} Q_p to be the set of all local states, and ∆ := ∪_{p∈P} ∆_p to be the set of all transitions.
3.2. Classical TSO Semantics. In the following, we recall the semantics of concurrent systems under the classical TSO model as formalized in [OSS09, SSO+10]. To do so, we define the set of configurations and the induced transition relation. Let P = (A_1, A_2, ..., A_n) be a concurrent system.
TSO-configurations.
A TSO-configuration c is a triple (q, b, mem) where: (1) q : P → Q is the global state of P, mapping each process p ∈ P to a local state in Q_p; (2) b : P → (X × V)* gives the content of the store buffer of each process; (3) mem : X → V defines the value of each shared variable. Observe that the store buffer of each process contains a sequence of write operations, where each write operation is defined by a pair, namely a variable x and a value v that is assigned to x.
The initial TSO-configuration c_init is defined by the tuple (q_init, b_init, mem_init) where, for all p ∈ P and x ∈ X, we have q_init(p) = q_p^init, b_init(p) = ε, and mem_init(x) = 0. In other words, each process is in its initial local state, all the buffers are empty, and all the variables in the shared memory are initialized to 0.
We use C TSO to denote the set of TSO-configurations.
TSO-transition Relation. The transition relation →_TSO between TSO-configurations is given by a set of rules, described in Figure 1. Here, we informally explain these rules. A nop transition (q, nop, q′) ∈ ∆_p changes only the local state of the process p from q to q′. A write transition (q, w(x, v), q′) ∈ ∆_p adds a new message (x, v) to the tail (i.e., the leftmost position) of the store buffer of the process p. A memory update transition update_p can be performed at any time: it removes the (oldest) message at the head (i.e., the rightmost position) of the store buffer of the process p and updates the memory accordingly. For a read transition (q, r(x, v), q′) ∈ ∆_p, if the store buffer of the process p contains some write operations to x, then the read value v must be the value of the most recent (i.e., the leftmost) such write operation. Otherwise, the value v of x is fetched from the memory. A fence transition (q, fence, q′) ∈ ∆_p may be performed by the process p only if its store buffer is empty. Finally, an atomic read-write transition (q, arw(x, v, v′), q′) ∈ ∆_p can be performed by the process p only if its store buffer is empty; it checks whether the value of x is v and then changes it to v′. Let ∆′ = {update_p | p ∈ P}, i.e., ∆′ contains all memory update transitions. We use c →_TSO c′ to denote that c →^t_TSO c′ for some t ∈ ∆ ∪ ∆′. The transition system induced by P under the classical TSO semantics is then given by T_TSO = ⟨C_TSO, {c_init}, ∆ ∪ ∆′, →_TSO⟩.
Figure 1. The transition relation →_TSO under TSO. Here process p ∈ P and transition t ∈ ∆_p ∪ {update_p}, where update_p is a transition that updates the memory using the oldest message in the buffer of the process p.
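A minimal Python sketch of the TSO rules just described, for one process at a time. It assumes a hypothetical encoding: instructions are tuples such as `("w", x, v)`, `("r", x, v)`, `("fence",)` or `("arw", x, v, v2)`, `("update",)` stands for update_p, and local states are left to the caller. Each store buffer is a list whose index 0 is the tail (leftmost, newest) and whose last element is the head (rightmost, oldest). The function `tso_step` is illustrative, not the formal rule set of Figure 1.

```python
import copy
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (buffers, memory); buf[p] lists (x, v) store messages.
Config = Tuple[Dict[int, List[Tuple[str, int]]], Dict[str, int]]


def tso_step(c: Config, p: int, instr: tuple) -> Optional[Config]:
    """Apply one TSO rule for process p; None means the rule is disabled."""
    buf, mem = copy.deepcopy(c)
    op = instr[0]
    if op == "nop":
        return buf, mem
    if op == "w":                      # append (x, v) at the tail of p's buffer
        _, x, v = instr
        buf[p].insert(0, (x, v))
        return buf, mem
    if op == "update":                 # flush the oldest message of p to memory
        if not buf[p]:
            return None
        x, v = buf[p].pop()
        mem[x] = v
        return buf, mem
    if op == "r":                      # newest own store on x, else memory
        _, x, v = instr
        own = [val for (y, val) in buf[p] if y == x]
        seen = own[0] if own else mem[x]
        return (buf, mem) if seen == v else None
    if op == "fence":                  # enabled only with an empty store buffer
        return (buf, mem) if not buf[p] else None
    if op == "arw":                    # empty buffer and mem[x] == v, then set v2
        _, x, v, v2 = instr
        if buf[p] or mem[x] != v:
            return None
        mem[x] = v2
        return buf, mem
    raise ValueError(f"unknown instruction {instr!r}")
```

After process 0 executes `("w", "x", 1)`, process 1 can still read `x = 0` from memory, process 0 reads its own pending value 1, and a fence of process 0 stays disabled until an `("update",)` step empties its buffer.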
The TSO Reachability Problem. A global state q_target is said to be reachable in T_TSO if and only if there is a TSO-configuration c of the form (q_target, b, mem), with b(p) = ε for all p ∈ P, such that c is reachable in T_TSO.
The reachability problem for the concurrent system P under the TSO semantics asks, for a given global state q_target, whether q_target is reachable in T_TSO. Observe that, in the definition of the reachability problem, we require the buffers of the configuration c to be empty rather than arbitrary. This is only for the sake of simplicity and does not constitute a restriction. Indeed, we can easily show that the "arbitrary buffer" reachability problem is reducible to the "empty buffer" reachability problem.
3.3. Dual TSO Semantics. In this section, we define the Dual TSO semantics. The model has a perfect FIFO load buffer between the main memory and each process. This load buffer stores potential read operations that will be performed by the process. Each message in the load buffer of a process p is either a pair of the form (x, v) or a triple of the form (x, v, own), where x ∈ X and v ∈ V. A message of the form (x, v) records that x has had the value v in the shared memory, whereas a message (x, v, own) records that the process p has written the value v to x.
A write operation w(x, v) of the process p immediately updates the shared memory and then appends a new message of the form (x, v, own) to the tail (i.e., the leftmost position) of the load buffer of p. Read propagation is performed by non-deterministically choosing a variable x, whose value in the shared memory is, say, v, and appending the new message (x, v) to the tail of the load buffer of p. This propagation operation speculates on a read operation of p on x that will be performed later on. Moreover, a delete operation of the process p can remove the message at the head (i.e., the rightmost position) of the load buffer of p at any time. A read operation r(x, v) of the process p can be executed if the (oldest) message at the head of the load buffer of p is of the form (x, v) and there is no pending message of the form (x, v′, own). If the load buffer contains some messages belonging to p (i.e., of the form (x, v′, own)), the read value must be the value of the most recent (i.e., the leftmost) such message. This implicitly simulates the read-own-write transitions of the TSO semantics. A fence operation requires the load buffer of p to be empty before p can continue. Finally, an atomic read-write operation arw(x, v, v′) requires the load buffer of p to be empty and the value of x in the memory to be v before p can continue; it then sets x to v′.
DTSO-configurations.
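A DTSO-configuration c is a triple (q, b, mem) where: (1) q : P → Q is the global state of P, mapping each process p ∈ P to a local state in Q_p; (2) b : P → ((X × V) ∪ (X × V × {own}))* is the content of the load buffer of each process; (3) mem : X → V gives the value of each shared variable.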
The initial DTSO-configuration c^D_init is defined by (q_init, b_init, mem_init) where, for all p ∈ P and x ∈ X, we have q_init(p) = q_p^init, b_init(p) = ε, and mem_init(x) = 0. We use C_DTSO to denote the set of DTSO-configurations.
DTSO-transition Relation. The transition relation →_DTSO between DTSO-configurations is given by a set of rules, described in Figure 2. This relation is induced by the members of ∆ ∪ ∆_aux, where ∆_aux := {propagate^x_p, delete_p | p ∈ P, x ∈ X}. We informally explain the rules. The propagate transition propagate^x_p speculates on a read operation of p over x that will be executed later. This is done by appending a new message (x, v) to the tail (i.e., the leftmost position) of the load buffer of p, where v is the current value of x in the shared memory. The delete transition delete_p removes the (oldest) message at the head (i.e., the rightmost position) of the load buffer of the process p. A write transition (q, w(x, v), q′) ∈ ∆_p updates the memory and appends a new message (x, v, own) to the tail of the load buffer. A read transition (q, r(x, v), q′) ∈ ∆_p first checks whether the load buffer of p contains a message of the form (x, v′, own). In that case, the read value v must be the value of the most recent (i.e., the leftmost) message of that form. If there is no such message on the variable x in the load buffer of p, then the value v of x is fetched from the message at the head of the load buffer of the process p.
We use c →_DTSO c′ to denote that c →^t_DTSO c′ for some t ∈ ∆ ∪ ∆_aux. The transition system induced by P under the Dual TSO semantics is then given by T_DTSO = ⟨C_DTSO, {c^D_init}, ∆ ∪ ∆_aux, →_DTSO⟩.
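The Dual TSO rules can be sketched in the same style. This Python sketch assumes a hypothetical encoding in which every load-buffer message is a triple `(x, v, own)` with `own` a boolean (so a plain message (x, v) is stored as `(x, v, False)`), index 0 is the tail (leftmost, newest), and `("propagate", x)` and `("delete",)` stand for propagate^x_p and delete_p. Local states are again left to the caller. A read does not consume the head message; removing it is the separate delete transition.

```python
import copy
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (buffers, memory); buf[p] lists (x, v, own) messages.
Config = Tuple[Dict[int, List[Tuple[str, int, bool]]], Dict[str, int]]


def dtso_step(c: Config, p: int, instr: tuple) -> Optional[Config]:
    """Apply one Dual TSO rule for process p; None means the rule is disabled."""
    buf, mem = copy.deepcopy(c)
    op, b = instr[0], buf[p]
    if op == "nop":
        return buf, mem
    if op == "propagate":              # propagate^x_p: (x, mem[x]) at the tail
        x = instr[1]
        b.insert(0, (x, mem[x], False))
        return buf, mem
    if op == "delete":                 # delete_p: drop the head message
        if not b:
            return None
        b.pop()
        return buf, mem
    if op == "w":                      # update memory now, log an own-message
        _, x, v = instr
        mem[x] = v
        b.insert(0, (x, v, True))
        return buf, mem
    if op == "r":
        _, x, v = instr
        own = [val for (y, val, o) in b if o and y == x]
        if own:                        # the newest own-message on x decides
            return (buf, mem) if own[0] == v else None
        return (buf, mem) if b and b[-1] == (x, v, False) else None
    if op == "fence":                  # load buffer must be empty
        return (buf, mem) if not b else None
    if op == "arw":                    # empty buffer and mem[x] == v, then set v2
        _, x, v, v2 = instr
        if b or mem[x] != v:
            return None
        mem[x] = v2
        return buf, mem
    raise ValueError(f"unknown instruction {instr!r}")
```

If process 1 propagates `x = 0` before process 0 writes `x = 1`, process 1 can still read 0 from the head of its load buffer even though memory already holds 1. This is the dual view of a store still sitting in a TSO store buffer.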
The Dual TSO Reachability Problem. The reachability problem for P under the Dual TSO semantics is defined in a similar manner to the case of TSO. A global state q_target is said to be reachable in T_DTSO if and only if there is a DTSO-configuration c of the form (q_target, b, mem), with b(p) = ε for all p ∈ P, such that c is reachable in T_DTSO. The reachability problem then consists in checking whether q_target is reachable in T_DTSO.
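3.4. Relation of TSO and Dual TSO Reachability Problems. The following theorem states the equivalence of the reachability problems under the TSO and Dual TSO semantics.
Theorem 3.1. A global state q_target is reachable in T_DTSO if and only if q_target is reachable in T_TSO.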
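Theorem 3.1 can be observed on a concrete instance. The Python sketch below is an illustration, not a proof. It exhaustively explores a hypothetical store-buffering (SB) litmus test under both semantics and collects the register values reachable with both programs finished and all buffers empty. Two assumptions apply: programs are straight-line lists of writes and reads, and load buffers are capped at `BOUND = 2` messages so that the Dual TSO search is finite. This is an under-approximation, but it suffices for this program.

```python
from collections import deque

# Hypothetical SB litmus test: p0 runs "x := 1; r0 := y", p1 runs "y := 1; r1 := x".
PROGS = ((("w", "x", 1), ("r", "y")), (("w", "y", 1), ("r", "x")))
VARS = ("x", "y")
BOUND = 2  # assumed cap on load-buffer length, keeps the Dual TSO search finite


def set_at(t, i, v):
    """Return tuple t with position i replaced by v."""
    return t[:i] + (v,) + t[i + 1:]


def tso_succ(s):
    """Classical TSO: store buffers of (x, v); index 0 = tail, -1 = head."""
    pcs, regs, bufs, mem = s
    for p, prog in enumerate(PROGS):
        if bufs[p]:  # update_p: flush the oldest message to memory
            x, v = bufs[p][-1]
            yield pcs, regs, set_at(bufs, p, bufs[p][:-1]), set_at(mem, VARS.index(x), v)
        if pcs[p] == len(prog):
            continue
        ins, nxt = prog[pcs[p]], set_at(pcs, p, pcs[p] + 1)
        if ins[0] == "w":  # buffer the store at the tail
            yield nxt, regs, set_at(bufs, p, ((ins[1], ins[2]),) + bufs[p]), mem
        else:  # newest own store on x, else memory
            own = [v for (y, v) in bufs[p] if y == ins[1]]
            val = own[0] if own else mem[VARS.index(ins[1])]
            yield nxt, set_at(regs, p, val), bufs, mem


def dtso_succ(s):
    """Dual TSO: load buffers of (x, v, own); index 0 = tail, -1 = head."""
    pcs, regs, bufs, mem = s
    for p, prog in enumerate(PROGS):
        b = bufs[p]
        if len(b) < BOUND:  # propagate^x_p for every variable x
            for i, x in enumerate(VARS):
                yield pcs, regs, set_at(bufs, p, ((x, mem[i], False),) + b), mem
        if b:  # delete_p: drop the oldest message
            yield pcs, regs, set_at(bufs, p, b[:-1]), mem
        if pcs[p] == len(prog):
            continue
        ins, nxt = prog[pcs[p]], set_at(pcs, p, pcs[p] + 1)
        if ins[0] == "w":
            if len(b) < BOUND:  # write memory at once, log an own-message
                i = VARS.index(ins[1])
                yield (nxt, regs, set_at(bufs, p, ((ins[1], ins[2], True),) + b),
                       set_at(mem, i, ins[2]))
        else:
            own = [v for (y, v, o) in b if o and y == ins[1]]
            if own:  # the newest own-message on x decides the value
                yield nxt, set_at(regs, p, own[0]), bufs, mem
            elif b and b[-1][0] == ins[1] and not b[-1][2]:  # else read the head
                yield nxt, set_at(regs, p, b[-1][1]), bufs, mem


def outcomes(succ):
    """Register values reachable with both programs finished and empty buffers."""
    init = ((0, 0), (None, None), ((), ()), (0, 0))
    seen, todo, result = {init}, deque([init]), set()
    while todo:
        s = todo.popleft()
        pcs, regs, bufs, _ = s
        if all(pc == len(pr) for pc, pr in zip(pcs, PROGS)) and not any(bufs):
            result.add(regs)
        for t in succ(s):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return result
```

Both `outcomes(tso_succ)` and `outcomes(dtso_succ)` contain the non-SC outcome (0, 0) together with (0, 1), (1, 0), and (1, 1).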
Proof. The rest of this section is devoted to the proof of the theorem: we first show its only-if direction and then its if direction. In the following, for a TSO- (or DTSO-)configuration c = (q, b, mem), we use states(c), buffers(c), and mem(c) to denote q, b, and mem, respectively.
From Dual TSO to TSO. We show the only-if direction of Theorem 3.1. Consider a Dual TSO-computation
$$\pi_{DTSO} = c_0 \xrightarrow{t_1} c_1 \xrightarrow{t_2} \cdots \xrightarrow{t_n} c_n$$
where c_0 = c^D_init and c_i is of the form (q_i, b_i, mem_i) for all i : 1 ≤ i ≤ n, with q_n = q_target. We will derive a TSO-computation π_TSO such that target(π_TSO) is of the form (states(c_n), b, mem(c_n)), where b(p) = ε for all p ∈ P.
First, we define some functions that we will use in the construction of the computation π TSO . Then, we define a sequence of TSO-configurations that appear in π TSO . Finally, we show that the TSO-computation π TSO exists. In particular, π TSO starts from an initial TSO-configuration and its target configuration has the same local states as the target c n of the DTSO-computation π DTSO .
Let 1 ≤ i_1 < i_2 < ⋯ < i_k ≤ n be the sequence of indices such that t_{i_1} t_{i_2} ⋯ t_{i_k} is the sequence of write or atomic read-write operations occurring in the computation π_DTSO. In the following, we assume that i_0 = 0.
For each j : 0 ≤ j ≤ n, we associate a mapping index_j : P → {0, ..., k}* that assigns, to each process p ∈ P and each message pending in its load buffer at position ℓ : 1 ≤ ℓ ≤ |buffers(c_j, p)|, the memory view index index_j(p, ℓ), i.e., the index of the last write or atomic read-write operation at the moment this message was added to the buffer. In other words, the memory view of a pending message at position ℓ is given by the index stored at position ℓ of the word index_j(p). Formally, we define index_j by induction on j as follows:
• Inductive Case. Let us assume that c j
with i r = j + 1.
We associate for each process p ∈ P and j : 0 ≤ j ≤ n, the memory view index view p (c j ) of the process p in the configuration c j as follows:
• If buffers(c_j, p) = ε, then view_p(c_j) := r, where r : 0 ≤ r ≤ k is the maximal index such that i_r ≤ j.
Let ≺ be an arbitrary total order on the set of processes. We use p_min and p_max to denote the smallest and largest elements of ≺, respectively. For p ≠ p_max, we define succ(p) to be the successor of p w.r.t. ≺, i.e., p ≺ succ(p) and there is no p′ with p ≺ p′ ≺ succ(p). We define prev(p) for p ≠ p_min analogously.
The computation π_TSO will consist of k + 1 phases (henceforth referred to as phases 0, 1, 2, ..., k). In fact, π_TSO will have the same sequence of memory updates as π_DTSO. In phase r, the computation π_TSO simulates the moves of the processes whose memory view index is r. The order in which the processes are simulated during phase r is defined by the ordering ≺. First, process p_min performs a sequence of transitions, identical to the sequence of transitions it performs in π_DTSO while its memory view index is r. Then, the next process performs its transitions, and so on, until p_max has made all its transitions. When all processes have performed their transitions in phase r, phase r + 1 starts with p_min executing its transitions, and so on. Formally, we define a scheduling function α(r, p, ℓ) that gives, for each r : 0 ≤ r ≤ k, p ∈ P, and ℓ ≥ 1, a natural number j : 0 ≤ j ≤ n such that process p executes the transition t_j as its ℓ-th transition during phase r. The scheduling function α is defined as follows, where r : 0 ≤ r ≤ k, p ∈ P, and ℓ ≥ 0:
is defined below. Phase r starts for process p at the point where its memory view index becomes equal to r. Notice that α(0, p, 0) = 0 for all p ∈ P, since all processes are initially in phase 0. Moreover, for r > 0, the transition t_{α(r,p,0)} is a delete transition of the process p if the corresponding buffer is not empty, and a write or an atomic read-write transition otherwise.
• α(r, p, ℓ + 1) is defined to be the smallest j such that α(r, p, ℓ) < j, t_j ∈ ∆_p, and view_p(c_j) = r. Intuitively, the (ℓ + 1)-th transition of process p during phase r is the next transition after t_{α(r,p,ℓ)} that belongs to ∆_p. Notice that α(r, p, ℓ + 1) is defined only for finitely many ℓ.
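Operationally, α replays the process transitions of π_DTSO sorted by phase, then by ≺ within a phase, and then by their original position. The following minimal Python sketch shows only that ordering. It assumes that the memory view index view_p(c_j) of every step is already known, because the non-empty-buffer case of view_p is not reproduced here. `view_if_empty` implements the empty-buffer case stated above. Both helper names are hypothetical.

```python
from bisect import bisect_right
from typing import Hashable, List, Sequence, Tuple


def view_if_empty(write_indices: Sequence[int], j: int) -> int:
    """view_p(c_j) for an empty load buffer: the maximal r with i_r <= j (i_0 = 0),
    i.e. the number of write/arw indices i_1 < ... < i_k that are <= j."""
    return bisect_right(write_indices, j)


def phase_schedule(steps: List[Tuple[int, Hashable, int]],
                   order: List[Hashable]) -> List[int]:
    """steps: (j, p, view) for the process transitions t_j of pi_DTSO.
    order: the processes listed from p_min to p_max (the total order ≺).
    Returns the indices j in the order the TSO computation replays them."""
    rank = {p: n for n, p in enumerate(order)}
    ordered = sorted(steps, key=lambda s: (s[2], rank[s[1]], s[0]))
    return [j for j, _, _ in ordered]
```

With `order = ["p", "q"]`, phase 0 first replays every view-0 step of `p` and then those of `q`, before any view-1 step is scheduled.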
In order to define π_TSO, we first define the set of configurations that appear in π_TSO. More precisely, for each r : 0 ≤ r ≤ k, p ∈ P, and each ℓ ≥ 0 for which α(r, p, ℓ) is defined, we define a TSO-configuration d_{r,p,ℓ} based on the DTSO-configurations that appear in π_DTSO. We define d_{r,p,ℓ} by defining its local states, buffer contents, and memory state.
Firstly, we define the local states of the processes as follows:
After process p has performed its ℓ-th transition during phase r, its local state is identical to its local state in the corresponding DTSO-configuration c_{α(r,p,ℓ)}.
That is, the state of p′ does not change while p is making its moves. This state is given by the local state of p′ after it made its last move during phase r.
That is, the local state of p′ does not change while p is making its moves. This state is given by the local state of p′ when it entered phase r (before it made any moves during phase r). Secondly, to define the buffer contents, we need some more definitions. For a DTSO-message a of the form (x, v), we define DTSO2TSO(a) to be ε. For a DTSO-message a of the form (x, v, own), we define DTSO2TSO(a) to be (x, v). From that, we define DTSO2TSO(ε) = ε and DTSO2TSO(a_1 a_2 ⋯ a_n) := DTSO2TSO(a_1) · DTSO2TSO(a_2) ⋯ DTSO2TSO(a_n), i.e., we concatenate the results of applying the operation to each a_i individually. Moreover, we define DTSO2TSO^+(w) for a word w ∈ ((X × V) ∪ (X × V × {own}))* as follows: if |w| = 0, then DTSO2TSO^+(w) := ε; otherwise, DTSO2TSO^+(w) := DTSO2TSO(w(1) w(2) ⋯ w(|w| − 1)). In the following, we give the definition of the buffer contents of d_{r,p,ℓ}:
After process p has performed its ℓ-th transition during phase r, the content of its buffer is defined by (i) considering the buffer of the corresponding DTSO-configuration c_{α(r,p,ℓ)} and (ii) keeping only the messages belonging to p (i.e., of the form (x, v, own)).
In a similar manner to the case of states, if p′ ≺ p, then the buffer of p′ does not change while p is making its moves.
In a similar manner to the case of states, if p ≺ p′, then the buffer of p′ does not change while p is making its moves. Finally, we define the memory state as follows:
We define mem(d_{r,p,ℓ}) := mem(c_{i_r}). This definition is consistent with the fact that all processes have identical views of the memory when they are in the same phase r; this view is given by the memory component of c_{i_r}.
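The translation DTSO2TSO from load-buffer words to store-buffer words is a simple filter. The following Python sketch assumes a hypothetical encoding in which (x, v) is a pair and (x, v, own) is a triple whose last component is the string `"own"`. List index 0 plays the role of w(1), the leftmost and most recent message, so the TSO store buffer keeps the same newest-first convention.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: (x, v) is a propagated value, (x, v, "own") an own write.
Msg = Union[Tuple[str, int], Tuple[str, int, str]]


def dtso2tso(w: List[Msg]) -> List[Tuple[str, int]]:
    """Map (x, v, own) to the TSO store (x, v) and (x, v) to ε, then concatenate."""
    return [(a[0], a[1]) for a in w if len(a) == 3 and a[2] == "own"]


def dtso2tso_plus(w: List[Msg]) -> List[Tuple[str, int]]:
    """DTSO2TSO^+: ε on ε, otherwise DTSO2TSO of w(1)...w(|w|-1) (drops the head w(|w|))."""
    return [] if not w else dtso2tso(w[:-1])
```

For `w = [("x", 1, "own"), ("y", 0), ("x", 2, "own")]`, `dtso2tso(w)` keeps both own writes, while `dtso2tso_plus(w)` also drops the head message `("x", 2, "own")`.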
The following lemma shows the existence of a TSO-computation π_TSO that starts from an initial TSO-configuration and whose target has the same local states as the target c_n of the DTSO-computation π_DTSO.
Proof. The proof of the lemma is given in Appendix A.1.
This concludes the proof of the only if direction of Theorem 3.1.
From TSO to Dual TSO. We show the if direction of Theorem 3.1. Consider a TSO-computation
$$\pi_{TSO} = c_0 \xrightarrow{t_1} c_1 \xrightarrow{t_2} \cdots \xrightarrow{t_n} c_n$$
where c_0 = c_init and c_i is of the form (q_i, b_i, mem_i) for all i : 1 ≤ i ≤ n, with q_n = q_target. In the following, we will derive a DTSO-computation π_DTSO such that states(target(π_DTSO)) = states(c_n), i.e., the runs π_TSO and π_DTSO reach the same local states at the end.
First, we define some functions that we will use in the construction of the computation π DTSO . Then, we define a sequence of DTSO-configurations that appear in π DTSO . Finally, we show that the DTSO-computation π DTSO exists. In particular, π DTSO starts from an initial DTSO-configuration and its target configuration has the same local states as the target c n of the TSO-computation π TSO .
For every p ∈ P, let ∆_p^{w,arw} (resp. ∆_p^{u,arw}) be the set of write (resp. update) and atomic read-write transitions that can be performed by process p. Let ∆_p^r be the set of read transitions that can be performed by the process p. Let I = i_1 ⋯ i_m be the maximal sequence of indices such that 1 ≤ i_1 < i_2 < ⋯ < i_m ≤ n and, for every j : 1 ≤ j ≤ m, t_{i_j} is an update transition or an atomic read-write transition (i.e., t_{i_j} ∈ ∪_{p∈P} ∆_p^{u,arw}). In the following, we assume that i_0 = 0. Let I_p be the maximal subsequence of I such that all transitions with indices in I_p belong to process p.
Let I′ = i′_1 ⋯ i′_m be the maximal sequence of indices such that 1 ≤ i′_1 < i′_2 < ⋯ < i′_m ≤ n and, for every j : 1 ≤ j ≤ m, t_{i′_j} is a write transition or an atomic read-write transition (i.e., t_{i′_j} ∈ ∪_{p∈P} ∆_p^{w,arw}). Let I′_p be the maximal subsequence of I′ such that all transitions with indices in I′_p belong to process p. Observe that
For every j : 1 ≤ j ≤ m, let proc(j) be the process that performs the update or atomic read-write transition t_{i_j}, with i_j ∈ I. We define match(i_j) to be the index of the write (resp. atomic read-write) transition t_{match(i_j)} that corresponds to the update (resp. atomic read-write) transition t_{i_j}. Formally, match(i_j) := l where ∃k :
For every j : 1 ≤ j ≤ n such that t_j = (q, r(x, v), q′) ∈ ∆_p^r is a read transition of process p, we define fromMem(t_j) as a predicate that holds if and only if (x, v′) ∉ buffers(c_{j−1}, p) for all v′ ∈ V, i.e., the value is read from the memory because p has no pending store on x.
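The formal definition of match is truncated above. The Python sketch below rests on an assumption: because store buffers are FIFO, the k-th update transition of a process p corresponds to the k-th write of p, while an atomic read-write corresponds to itself. The trace format, `fifo_match`, and `from_mem` are hypothetical helpers. They are not the paper's definitions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


def fifo_match(trace: List[Tuple[int, str, str]]) -> Dict[int, int]:
    """trace: (j, p, kind) for the write ('w'), update ('u') and atomic read-write
    ('arw') transitions of pi_TSO, in increasing order of j.
    Maps each update/arw index to the index of the write/arw it corresponds to."""
    pending: Dict[str, Deque[int]] = defaultdict(deque)
    match: Dict[int, int] = {}
    for j, p, kind in trace:
        if kind == "w":
            pending[p].append(j)              # the store enters p's buffer
        elif kind == "u":
            match[j] = pending[p].popleft()   # FIFO: the oldest pending write of p
        elif kind == "arw":
            match[j] = j                      # an arw corresponds to itself
    return match


def from_mem(store_buffer: List[Tuple[str, int]], x: str) -> bool:
    """fromMem for a read of x: no pending (x, v') in the reader's store buffer."""
    return all(y != x for y, _ in store_buffer)
```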
For every j : 1 ≤ j ≤ n and p ∈ P , we define the function label p as follows:
Below, we show how to simulate all transitions of the TSO-computation π_TSO by corresponding transitions in the DTSO-computation π_DTSO. The idea is to divide the DTSO-computation into m + 1 phases. For 0 ≤ r < m, phase r ends at the configuration d_{r+1} with the simulation of the transition t_{match(i_{r+1})} of π_TSO. Moreover, in phase r : 0 ≤ r < m, we call the process proc(r + 1) the active process and the other processes inactive. We execute only the DTSO-transitions of the active process p = proc(r + 1) in its active phases. For the other processes p′ ≠ p, we only change the content of their buffers during the active phase of p. In the final phase r = m, all processes are considered active, because the index i_{m+1} is not defined in the sequence I. The DTSO-computation π_DTSO ends at the configuration d_{m+1}.
For every r : −1 ≤ r < m and p ∈ P, we define the function pos(r, p) by induction on r:
In other words, pos(r, p) is the index in π_TSO of the last transition of process p that has been simulated by the end of phase r. Moreover, pos(−1, p) denotes the index of the starting transition of process p before phase 0.
We define the sequence of DTSO-configurations d 1 , . . . , d m , d m+1 by defining their local states, buffer contents, and memory states as follows:
• For every configuration d r where 0 ≤ r < m:
Lemma 3.3 shows the existence of a DTSO-computation π_DTSO that starts from an initial DTSO-configuration and whose target has the same local states as the target c_n of the TSO-computation π_TSO. To simplify the proof, below we treat a fence transition t_j (1 ≤ j ≤ n) as an atomic read-write transition of the form (q, arw(x, v, v), q′), where v ∈ V is the memory value of the variable x ∈ X at the transition t_j. Since the TSO-computation is known, the value v can be computed. The if direction of Theorem 3.1 follows directly from Lemma 3.3. This concludes the proof of the if direction of Theorem 3.1.
• For every r : 0 ≤ r < m,
Proof. The proof of the lemma is given in Appendix A.2.
This concludes the proof of Theorem 3.1.
The Dual TSO Reachability Problem
In this section, we show the decidability of the Dual TSO reachability problem by making use of the framework of Well-Structured Transition Systems (Wsts) [ACJT96, FS01] . First, we briefly recall the framework of Wsts. Then, we instantiate it to show the decidability of the Dual TSO reachability problem.
Let T = ⟨C, Init, Act, ∪_{a∈Act} →_a⟩ be a transition system, and write → for ∪_{a∈Act} →_a. Let ⪯ be a well-quasi ordering on C. Recall that a well-quasi ordering ⪯ on C is a binary relation over C that is reflexive and transitive, and such that for every infinite sequence (c_i)_{i≥0} of elements of C there exist i, j ∈ N with i < j and c_i ⪯ c_j.
A set U ⊆ C is called upward closed if for every c ∈ U and c′ ∈ C with c ⪯ c′, we have c′ ∈ U. It is known that every upward closed set U can be characterised by a finite minor set M ⊆ U such that: (i) for every c ∈ U, there is c′ ∈ M such that c′ ⪯ c; and (ii) if c, c′ ∈ M and c ⪯ c′, then c = c′. We use min to denote the function which, for a given upward closed set U, returns its minor set.
Let D ⊆ C. The upward closure of D is defined as D↑ := {c ∈ C | ∃c′ ∈ D with c′ ⪯ c}. We also define the set of predecessors of D as Pre(D) := {c ∈ C | ∃c′ ∈ D with c → c′}.
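Minor sets make upward closed sets finitely representable. The following Python sketch illustrates this for words under the subword ordering ⪯, which is a well-quasi ordering by Higman's lemma. The helper names are hypothetical. Because a proper subword is strictly shorter, scanning the words by increasing length and keeping only those not above an already kept word yields exactly the ⪯-minimal elements.

```python
from typing import Iterable, List


def is_subword(u: str, v: str) -> bool:
    """u ⪯ v; each `ch in it` consumes the iterator up to the first match."""
    it = iter(v)
    return all(ch in it for ch in u)


def minor_set(words: Iterable[str]) -> List[str]:
    """The minor set of D↑ for a finite D: the ⪯-minimal words of D."""
    minors: List[str] = []
    for w in sorted(set(words), key=len):   # proper subwords come first
        if not any(is_subword(m, w) for m in minors):
            minors.append(w)
    return minors


def in_upward_closure(minors: List[str], w: str) -> bool:
    """w ∈ M↑ iff some m ∈ M satisfies m ⪯ w."""
    return any(is_subword(m, w) for m in minors)
```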
The transition relation → is said to be monotonic w.r.t. the order ⪯ if, given c_1, c_2, c_3 ∈ C with c_1 → c_2 and c_1 ⪯ c_3, we can compute a configuration c_4 ∈ C and a run π such that c_3 →^π c_4 and c_2 ⪯ c_4. The pair (T, ⪯) is called a monotonic transition system if → is monotonic w.r.t. ⪯.
Given a finite set of configurations M ⊆ C, the coverability problem of M in the monotonic transition system (T, ⪯) asks whether the set M↑ is reachable in T, i.e., whether there exist two configurations c_1 and c_2 such that c_1 ∈ M, c_1 ⪯ c_2, and c_2 is reachable in T.
For the decidability of this problem, the following three conditions are sufficient:
(1) For every two configurations c_1 and c_2, it is decidable whether c_1 ⪯ c_2.
(2) For every c ∈ C, we can check whether {c}↑ ∩ Init = ∅.
(3) For every c ∈ C, the set minpre({c}) is finite and computable.
The solution to the coverability problem suggested in [ACJT96, FS01] is based on a backward analysis. It is shown that, starting from a finite set M_0 ⊆ C, the sequence (M_i)_{i≥0} with M_{i+1} := min(M_i ∪ minpre(M_i)), for i ≥ 0, reaches a fixpoint and is computable.
4.2. Dual TSO Transition System is a Wsts. In this section, we instantiate the framework of Wsts to show the following result:
Theorem 4.1. The Dual TSO reachability problem is decidable.
Proof. The rest of this section is devoted to the proof of the above theorem. Let P = (A 1 , A 2 , . . . , A n ) be a concurrent system (as defined in Section 3). Moreover, let T DTSO = C DTSO , {c D init }, ∆ ∪ ∆ aux , − → DTSO be the transition system induced by P under the Dual TSO semantics (as defined in Section 3.3).
In the following, let ⪯ be a well-quasi ordering. We will show that the DTSO-transition system T_DTSO is monotonic w.r.t. the order ⪯. Then, we will show the three sufficient conditions for the decidability of the coverability problem for (T_DTSO, ⪯) (as stated in Section 4.1).
(1) We first define the ordering on the set of DTSO-configurations.
(2) Then, we show that the transition system induced under the Dual TSO semantics is monotonic wrt. to the order (see Lemma 4.2). (3) For the first sufficient condition, we show that is a well-quasi ordering; and that for every two configurations c 1 and c 2 , it is decidable whether c 1 c 2 (see Lemma 4.3). (4) The second sufficient condition (i.e., checking whether the upward closed set {c} ↑, with c is a DTSO-configuration, contains an initial configuration) is trivial. This check boils down to verify whether c is an initial configuration. (5) For the third sufficient condition, we show that we can calculate the set of minimal DTSO-configurations for the set of predecessors of any upward closed set (see Lemma 4.4). (6) Finally, we will show also that the Dual TSO reachability problem for P can be reduced to the coverability problem in the monotonic transition system (T DTSO , ) (see Lemma 4.5). Observe that this reduction is needed since we require that the load buffers are empty when defining the Dual TSO reachability problem. This concludes the proof of Theorem 4.1.
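Ordering $\sqsubseteq$. In the following, we define an ordering $\sqsubseteq$ on $C_{DTSO}$. Let us first introduce some notations and definitions. Consider a word $w \in ((X \times V) \cup (X \times V \times \{own\}))^*$ representing the content of a load buffer. We define an operation that divides $w$ into a number of fragments according to the most recent own-messages concerning each variable. We define
$[w]_{own} := (w_1, (x_1, v_1, own), w_2, \ldots, w_r, (x_r, v_r, own), w_{r+1})$
where the following conditions are satisfied:
(1) $x_i \neq x_j$ for all $i \neq j$ (i.e., each variable contributes at most one most recent own-message).
(2) If $(x, v, own) \in w_i$, then $x = x_j$ for some $j < i$ (i.e., the most recent own-message on $x_j$ occurs at position $j$).
(3) $w = w_1 \cdot (x_1, v_1, own) \cdot w_2 \cdots w_r \cdot (x_r, v_r, own) \cdot w_{r+1}$ (i.e., the fragments correspond to the given word $w$).
Let $w, w' \in ((X \times V) \cup (X \times V \times \{own\}))^*$ be two words. Let us assume that: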
$[w]_{own} = (w_1, (x_1, v_1, own), w_2, \ldots, w_r, (x_r, v_r, own), w_{r+1})$
$[w']_{own} = (w'_1, (x'_1, v'_1, own), w'_2, \ldots, w'_m, (x'_m, v'_m, own), w'_{m+1})$.
We write $w \sqsubseteq w'$ to denote that the following conditions are satisfied: (i) $r = m$, (ii) $x_i = x'_i$ and $v_i = v'_i$ for all $i : 1 \leq i \leq m$, and (iii) $w_i \preceq w'_i$ for all $i : 1 \leq i \leq m + 1$, where $\preceq$ denotes the (not necessarily contiguous) subword relation on finite words.
Consider two DTSO-configurations $c = (q, b, mem)$ and $c' = (q', b', mem')$. We extend the ordering $\sqsubseteq$ to configurations as follows: $c \sqsubseteq c'$ if and only if the following conditions are satisfied:
• $q = q'$,
• $b(p) \sqsubseteq b'(p)$ for all processes $p \in P$, and
• $mem = mem'$.
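The ordering on load buffers and configurations can be checked mechanically. The Python sketch below assumes, as suggested by the Propagate case of Lemma 4.4 (where $b(p) = (x, v) \cdot w$), that the most recent message is the leftmost one, so index 0 of a list holds the newest message. Messages are tuples `(x, v)` or `(x, v, "own")`, and a configuration is a triple `(states, buffers, mem)` with `buffers` mapping each process to its load buffer; these encodings and all function names are hypothetical. `split_own` computes $[w]_{own}$ by scanning from the newest message and taking the first own-message seen for each variable as a separator.

```python
from typing import List, Tuple

Msg = Tuple  # (x, v) or (x, v, "own"); index 0 = most recent message (assumed)


def is_own(m: Msg) -> bool:
    return len(m) == 3 and m[2] == "own"


def split_own(w: List[Msg]):
    """Compute [w]_own: fragments w_1..w_{r+1} and separators (x_i, v_i, own)."""
    fragments, seps, current, seen = [], [], [], set()
    for m in w:  # scan from the most recent message
        if is_own(m) and m[0] not in seen:  # most recent own-message on m[0]
            seen.add(m[0])
            fragments.append(current)
            seps.append(m)
            current = []
        else:
            current.append(m)
    fragments.append(current)
    return fragments, seps


def is_subword(u, v) -> bool:
    """Scattered subword test; `a in it` consumes the iterator up to a match."""
    it = iter(v)
    return all(a in it for a in u)


def buffer_leq(w: List[Msg], w2: List[Msg]) -> bool:
    f1, s1 = split_own(w)
    f2, s2 = split_own(w2)
    # identical own-separators (hence r = m) and fragment-wise subword relation
    return s1 == s2 and all(is_subword(a, b) for a, b in zip(f1, f2))


def config_leq(c, c2) -> bool:
    """c = (states, buffers, mem); buffers maps each process to its load buffer."""
    (q, b, mem), (q2, b2, mem2) = c, c2
    return q == q2 and mem == mem2 and all(buffer_leq(b[p], b2[p]) for p in b)
```

For example, `[("x", 1, "own")]` is below `[("y", 0), ("x", 1, "own"), ("x", 2)]`: both have the single separator (x, 1, own), and each fragment of the first word is a subword of the corresponding fragment of the second.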
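Let us assume that $c_1 \xrightarrow{t}_{DTSO} c_2$ for some $t \in \Delta_p \cup \{propagate^x_p, delete_p \mid x \in X\}$ with $p \in P$, and $c_1 \sqsubseteq c_3$. We will show that it is possible to compute a configuration $c_4 \in C_{DTSO}$ and a run $\pi$ such that $c_3 \xrightarrow{\pi}_{DTSO} c_4$ and $c_2 \sqsubseteq c_4$.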
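We define the word $w \in ((X \times V) \cup (X \times V \times \{own\}))^*$ to be the longest word such that $w'_{m+1} = w'' \cdot w$ with $w_{m+1} \preceq w''$, where $w_{m+1}$ and $w'_{m+1}$ are the last fragments of $[b_1(p)]_{own}$ and $[b_3(p)]_{own}$, respectively. Observe that in this case we have either $w_{m+1} = w'' = \epsilon$ or $w''(|w''|) = w_{m+1}(|w_{m+1}|)$. Then, after executing $|w|$ $delete_p$ transitions from the configuration $c_3$, one can obtain a configuration $c'_3 = (q'_3, b'_3, mem'_3)$ such that $q'_3 = q_3$, $mem'_3 = mem_3$, and $b'_3$ is obtained from $b_3$ by removing the suffix $w$ of $b_3(p)$.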
As a consequence, we have $c_1 \sqsubseteq c'_3$. Furthermore, since $c_1$ and $c'_3$ have the same global state, the same memory valuation, the same sequence of most recent own-messages concerning each variable, and the same last message in the load buffer of $p$, $c'_3$ can perform the transition $t$ and reach a configuration $c_4$ such that $c_2 \sqsubseteq c_4$.
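The choice of the deleted suffix in the argument above can be computed greedily. Matching the letters of the last fragment of $b_1(p)$ as early as possible inside the last fragment of $b_3(p)$ yields the shortest prefix that still contains it as a subword, hence the longest suffix that may be deleted, and (when the fragment is non-empty) that prefix ends with the same letter as the fragment. A minimal Python sketch, with words written as strings and hypothetical function names:

```python
from typing import Optional, Sequence


def shortest_embedding_prefix(u: Sequence, v: Sequence) -> Optional[int]:
    """Length k of the shortest prefix v[:k] having u as a subword (None if none)."""
    k = 0
    for a in u:
        while k < len(v) and v[k] != a:  # greedy: match each letter as early as possible
            k += 1
        if k == len(v):
            return None
        k += 1
    return k


def longest_deletable_suffix(u: Sequence, v: Sequence) -> Optional[Sequence]:
    """Longest suffix of v whose removal keeps u a subword of the remaining prefix."""
    k = shortest_embedding_prefix(u, v)
    return None if k is None else v[k:]
```

For instance, for the fragments `ab` and `xaybz`, the shortest embedding prefix is `xayb`, so only `z` is deleted and the remaining prefix ends with `b`, the last letter of `ab`.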
The following lemma shows that $(T_{DTSO}, \sqsubseteq)$ is a monotonic transition system.
Lemma 4.2. The relation $\rightarrow_{DTSO}$ is monotonic wrt. $\sqsubseteq$.
Proof. Let us assume that $c_1 \xrightarrow{t}_{DTSO} c_2$ for some $t \in \Delta_p \cup \{propagate^x_p, delete_p \mid x \in X\}$ and $p \in P$, and that $c_1 \sqsubseteq c_3$. We will define $c_4 = (q_4, b_4, mem_4)$ such that $c_3 \xrightarrow{*}_{DTSO} c_4$ and $c_2 \sqsubseteq c_4$. We consider the following cases depending on $t$:
• Nop: $t = (q_1, nop, q_2)$. Define $q_4 := q_2$, $b_4 := b_3$, and $mem_4 := mem_2 = mem_3 = mem_1$.
We have $c_3 \xrightarrow{t}_{DTSO} c_4$.
• Write to memory: $t = (q, w(x, v), q')$. Define $q_4 := q_2$, $b_4 := b_3[p \leftarrow (x, v, own) \cdot b_3(p)]$, and $mem_4 := mem_2$. We have $c_3 \xrightarrow{t}_{DTSO} c_4$.
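• Propagate: $t = propagate^x_p$. Define $q_4 := q_2$, $mem_4 := mem_2 = mem_3 = mem_1$, and $b_4 := b_3[p \leftarrow (x, mem_3(x)) \cdot b_3(p)]$. We have $c_3 \xrightarrow{t}_{DTSO} c_4$.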
• Delete: $t = delete_p$. Define $q_4 := q_2$ and $mem_4 := mem_2 = mem_3 = mem_1$. Define $b_4$ according to one of the following cases.
In other words, we define c 4 := c 3 .
We can perform the following sequence of transitions c 3
In other words, we reach the configuration $c_4$ from $c_3$ by first deleting $|b_3(p)| - i$ messages from the head of $b_3(p)$.
• Fence: $t = (q, fence, q')$. Define $q_4 := q_2$, $b_4 := b_3[p \leftarrow \epsilon]$, and $mem_4 := mem_2$. We can perform the following sequence of transitions $c_3$
In other words, we reach the configuration $c_4$ from $c_3$ by first emptying the content of $b_3(p)$ and then performing $t$.
• ARW: $t = (q, arw(x, v, v'), q')$. Define $q_4 := q_2$, $b_4 := b_3[p \leftarrow \epsilon]$, and $mem_4 := mem_2$. We can reach the configuration $c_4$ from $c_3$ in a similar manner to the case of the fence transition. This concludes the proof of Lemma 4.2.
Ordering $\sqsubseteq$ is Well-quasi. The following lemma shows that $\sqsubseteq$ is indeed a well-quasi ordering.
Lemma 4.3. The relation $\sqsubseteq$ is a well-quasi ordering over $C_{DTSO}$. Furthermore, for every two DTSO-configurations $c_1$ and $c_2$, it is decidable whether $c_1 \sqsubseteq c_2$.
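Proof. Let $w, w' \in ((X \times V) \cup (X \times V \times \{own\}))^*$ be two words. Let us assume that
$[w]_{own} = (w_1, (x_1, v_1, own), w_2, \ldots, w_r, (x_r, v_r, own), w_{r+1})$ and
$[w']_{own} = (w'_1, (x'_1, v'_1, own), w'_2, \ldots, w'_m, (x'_m, v'_m, own), w'_{m+1})$. First we show that the ordering $\sqsubseteq$ on words is a well-quasi ordering. This is an immediate consequence of the facts that (i) the subword relation is a well-quasi ordering on finite words over a finite alphabet [Hig52], and that (ii) since $X$ and $V$ are finite and the variables $x_1, \ldots, x_r$ are pairwise distinct, there are only finitely many possible sequences of own-messages $(x_1, v_1, own), \ldots, (x_r, v_r, own)$, which must be equal. For a fixed such sequence, the ordering compares the fragments $(w_1, \ldots, w_{r+1})$ componentwise, and a finite product of well-quasi orderings is again a well-quasi ordering.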
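Given two DTSO-configurations $c = (q, b, mem)$ and $c' = (q', b', mem')$, we define three orders $\sqsubseteq_{state}$, $\sqsubseteq_{mem}$, and $\sqsubseteq_{buffer}$ over $C_{DTSO}$: $c \sqsubseteq_{state} c'$ iff $q = q'$, $c \sqsubseteq_{mem} c'$ iff $mem = mem'$, and $c \sqsubseteq_{buffer} c'$ iff $b(p) \sqsubseteq b'(p)$ for all processes $p \in P$. Each of these three orderings is a well-quasi ordering: the first two are equalities over finite sets (there are finitely many global states and memory valuations), and the third is a finite product of well-quasi orderings.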
Next we prove that $\sqsubseteq$ is a well-quasi ordering over $C_{DTSO}$ by showing that every infinite sequence $seq = c_1, c_2, \ldots, c_n, \ldots$ of configurations of $C_{DTSO}$ contains an infinite ascending subsequence. First, we apply the order $\sqsubseteq_{state}$ to the sequence $seq$. Since $\sqsubseteq_{state}$ is a well-quasi ordering, $seq$ contains an infinite subsequence $seq_{state}$ that is ascending wrt. $\sqsubseteq_{state}$. Next, we apply the order $\sqsubseteq_{mem}$ to $seq_{state}$. Since $\sqsubseteq_{mem}$ is a well-quasi ordering, $seq_{state}$ contains an infinite subsequence $seq_{state,mem}$ that is ascending wrt. $\sqsubseteq_{mem}$. Finally, we apply the order $\sqsubseteq_{buffer}$ to $seq_{state,mem}$. Since $\sqsubseteq_{buffer}$ is a well-quasi ordering, $seq_{state,mem}$ contains an infinite subsequence $seq_{state,mem,buffer}$ that is ascending wrt. $\sqsubseteq_{buffer}$. Since a subsequence of an ascending sequence is still ascending, $seq_{state,mem,buffer}$ is ascending wrt. all three orders, i.e., wrt. their intersection $\sqsubseteq$, and it is a subsequence of $seq$. Therefore, $\sqsubseteq$ is a well-quasi ordering over $C_{DTSO}$.
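Finally, it is decidable whether $c_1 \sqsubseteq c_2$: comparing the global states and the memory valuations is trivial, and for each of the finitely many processes, the decomposition $[\cdot]_{own}$ of the two load buffers is obtained by a single scan, after which the (finitely many) own-message separators are compared for equality and the corresponding fragments are checked for the subword relation.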
This concludes the proof of Lemma 4.3.
Conditions for Decidability. We show the three conditions for the decidability of the coverability problem for $(T_{DTSO}, \sqsubseteq)$.
The following lemma shows that we can calculate the set of minimal configurations for the set of predecessors of any upward closed set.
Lemma 4.4. For any DTSO-configuration $c$, we can compute $minpre(\{c\})$.
Proof. Consider a DTSO-configuration $c = (q, b, mem)$. For $t \in \Delta \cup \Delta_{aux}$, we select $\min\{c' \mid c' \xrightarrow{t} c\}$ to be the set of minimal DTSO-configurations of the form $c' = (q', b', mem')$ such that one of the following properties is satisfied:
, and one of the following properties is satisfied:
for some $v \in V$, where $w_1 \cdot w_2 = w$ and $(x, v', own) \notin w_1$ for all $v' \in V$.
• Propagate: $t = propagate^x_p$ for some $p \in P$, $mem(x) = v$, $q' = q$, $mem' = mem$, $b(p) = (x, v) \cdot w$ for some $w$, and $b' = b[p \leftarrow w]$.
• Read: $t = (q_1, r(x, v), q_2)$, $q(p) = q_2$ for some $p \in P$, $q' = q[p \leftarrow q_1]$, $mem' = mem$, and one of the following two conditions is satisfied:
- Read-own-write: there is an $i : 1 \leq i \leq |b(p)|$ such that $b(p)(i) = (x, v, own)$, there are no $j : 1 \leq j < i$ and $v' \in V$ such that $b(p)(j) = (x, v', own)$, and $b' = b$.
- Read from buffer:
and $mem' = mem$.
This concludes the proof of Lemma 4.4.
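The two predecessor cases that are fully stated above, Propagate and Read-own-write, translate directly into code. The following Python sketch is limited to those two cases and only checks the stated conditions for a single configuration; it does not implement the remaining cases of Lemma 4.4 or the minimization step. It assumes, as the Propagate case suggests, that the most recent message of a load buffer is at index 0; the encodings (messages as tuples, a read $(q_1, r(x, v), q_2)$ written as `(q1, (x, v), q2)`) and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Msg = Tuple  # (x, v) or (x, v, "own"); index 0 = most recent message (assumed)
Config = Tuple[List[str], List[List[Msg]], Dict[str, int]]  # (q, b, mem)


def pre_propagate(c: Config, p: int, x: str) -> Optional[Config]:
    """Propagate case: b(p) = (x, v).w with mem(x) = v gives b'(p) = w."""
    q, b, mem = c
    if not b[p]:
        return None
    head = b[p][0]
    if len(head) != 2 or head[0] != x or mem[x] != head[1]:
        return None
    b_pre = [list(buf) for buf in b]
    b_pre[p] = b_pre[p][1:]  # remove the propagated message
    return list(q), b_pre, dict(mem)


def pre_read_own_write(c: Config, p: int, t) -> Optional[Config]:
    """Read-own-write case for t = (q1, (x, v), q2), standing for (q1, r(x, v), q2)."""
    q1, (x, v), q2 = t
    q, b, mem = c
    if q[p] != q2:
        return None
    # most recent own-message on x in b(p), i.e. the leftmost one
    own = next((m for m in b[p] if len(m) == 3 and m[2] == "own" and m[0] == x), None)
    if own is None or own[1] != v:
        return None
    q_pre = list(q)
    q_pre[p] = q1  # only the local state of p changes
    return q_pre, [list(buf) for buf in b], dict(mem)
```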
From Reachability to Coverability. Let $q_{target}$ be a global state of $\mathcal{P}$ and let $M_{target}$ be the set of DTSO-configurations of the form $(q_{target}, b, mem)$ with $b(p) = \epsilon$ for all $p \in P$. Next, we show that the reachability problem of $q_{target}$ in $T_{DTSO}$ can be reduced to the coverability problem of $M_{target}$ in $(T_{DTSO}, \sqsubseteq)$.
Recall that $q_{target}$ is reachable in $T_{DTSO}$ if and only if $M_{target}$ is reachable in $T_{DTSO}$. Let us assume that $M_{target}{\uparrow}$ is reachable in $T_{DTSO}$. This means that there is a configuration $c \in M_{target}{\uparrow}$ which is reachable in $T_{DTSO}$. Since $c$ is above some element of $M_{target}$, it has the same global state, so $c$ is of the form $(q_{target}, b, mem)$. Then, from the configuration $c$, it is possible to reach the configuration $c' = (q_{target}, b', mem)$, with $b'(p) = \epsilon$ for all $p \in P$, by performing a sequence of $delete_p$ transitions to empty the load buffer of each process. Hence $c' \in M_{target}$, and so $M_{target}$ is reachable in $T_{DTSO}$. The other direction of the following lemma is trivial since $M_{target} \subseteq M_{target}{\uparrow}$.
Lemma 4.5. $M_{target}{\uparrow}$ is reachable in $T_{DTSO}$ iff $M_{target}$ is reachable in $T_{DTSO}$.
Parameterized Concurrent Systems
Let $V$ be a finite data domain and $X$ be a finite set of variables ranging over $V$. A parameterized concurrent system (or simply a parameterized system) consists of an unbounded number of identical processes running under the Dual TSO semantics. Formally, a parameterized system $S$ is defined by an extended finite-state automaton $A = \langle Q, q_{init}, \Delta \rangle$ uniformly describing the behavior of each process.
An instance of $S$ is a concurrent system $\mathcal{P} = (A_1, A_2, \ldots, A_n)$, for some $n \in \mathbb{N}$, where $A_p = A$ for every $p : 1 \leq p \leq n$. In other words, it consists of a finite set of processes, each running the same code defined by $A$. We use $Inst(S)$ to denote all possible instances of $S$. We use $T_{\mathcal{P}} = (C_{\mathcal{P}}, Init_{\mathcal{P}}, Act_{\mathcal{P}}, \rightarrow_{\mathcal{P}})$ to denote the transition system induced by an instance $\mathcal{P}$ of $S$ under the Dual TSO semantics.
A parameterized configuration α is a pair (P, c) where P = {1, . . . , n}, with n ∈ N, is the set of process IDs and c is a DTSO-configuration of an instance P = (A 1 , A 2 , . . . , A n ) of S. The parameterized configuration α = (P, c) is said to be initial if c is an initial configuration of P (i.e., c ∈ Init P ). We use C (resp. Init) to denote the set of all the parameterized configurations (resp. initial configurations) of S.
Let Act denote the set of actions of all possible instances of S (i.e., Act = ∪ P∈Inst(S) Act P ).
We define a transition relation $\rightarrow$ on parameterized configurations such that $(P, c) \xrightarrow{t} (P', c')$ for some action $t \in Act$ iff $P = P'$ and there is an instance $\mathcal{P}$ of $S$ such that $t \in Act_{\mathcal{P}}$ and $c \xrightarrow{t}_{\mathcal{P}} c'$. The transition system induced by $S$ is given by $T = (C, Init, Act, \rightarrow)$. In the following, we extend the definition of the Dual TSO reachability problem to the case of parameterized systems. A global state $q_{target} : P \rightarrow Q$ is said to be reachable in $T$ if and only if there exists a parameterized configuration $\alpha = (P', (q, b, mem))$, with $b(p) = \epsilon$ for all $p \in P'$, such that $\alpha$ is reachable in $T$ and $q_{target}(1) \cdots q_{target}(|P|) \preceq q(1) \cdots q(|P'|)$, where $\preceq$ denotes the subword relation. Then, the reachability problem consists of checking whether $q_{target}$ is reachable in $T$. In other words, the Dual TSO reachability problem for parameterized systems asks whether there is an instance of the parameterized system that reaches a configuration, with empty load buffers, in which a given number of processes are in certain given local states.
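Operationally, the reachability condition above asks for a reachable configuration with empty load buffers whose sequence of local states contains $q_{target}(1) \cdots q_{target}(|P|)$ as a (scattered) subword. A small Python sketch of this final check, with hypothetical names and local states represented as strings:

```python
from typing import List, Sequence


def is_subword(u: Sequence, v: Sequence) -> bool:
    """Scattered subword: u can be obtained from v by deleting letters."""
    it = iter(v)
    return all(a in it for a in u)


def witnesses_target(q_target: Sequence[str], states: Sequence[str],
                     buffers: Sequence[List]) -> bool:
    """Does a reached configuration (states, buffers) witness reachability of q_target?"""
    # all load buffers must be empty, and q_target(1)...q_target(|P|) must embed
    # into q(1)...q(|P'|) in an order-preserving way
    return all(len(b) == 0 for b in buffers) and is_subword(q_target, states)
```

For example, the target (cs, cs) is witnessed by the state sequence (idle, cs, wait, cs) with empty buffers, but not by (cs, idle).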
Decidability of the Parameterized Verification Problem
We prove hereafter the following theorem:
Theorem 6.1. The Dual TSO reachability problem for parameterized systems is decidable.
Proof. Let $S = \langle Q, q_{init}, \Delta \rangle$ be a parameterized system and $T = (C, Init, Act, \rightarrow)$ be its induced transition system. The proof of Theorem 6.1 is done by instantiating the framework of Wsts.
In the following, let $\sqsubseteq$ be the ordering on the set of parameterized configurations defined below. We will show that the parameterized transition system $T$ is monotonic wrt. $\sqsubseteq$. Then, we will establish the three sufficient conditions for the decidability of the coverability problem for $(T, \sqsubseteq)$ (as stated in Section 4.1).
(1) We first define the ordering $\sqsubseteq$ on the set of parameterized configurations.
(2) Then, we show that $T$ is monotonic wrt. $\sqsubseteq$ (see Lemma 6.2).
(3) For the first sufficient condition, we show that $\sqsubseteq$ is a well-quasi ordering and that, for every two parameterized configurations $\alpha$ and $\alpha'$, it is decidable whether $\alpha \sqsubseteq \alpha'$ (see Lemma 6.3).
(4) The second sufficient condition for the decidability of the coverability problem (i.e., checking whether the upward closed set $\{\alpha\}{\uparrow}$, where $\alpha$ is a parameterized configuration, contains an initial configuration) is trivial. This check boils down to verifying whether the configuration $\alpha$ is initial.
(5) For the third sufficient condition, we show that we can calculate the set of minimal parameterized configurations for the set of predecessors of any upward closed set (see Lemma 6.4).
(6) Finally, we show that the Dual TSO reachability problem for $S$ can be reduced to the coverability problem in the monotonic transition system $(T, \sqsubseteq)$ (see Lemma 6.5). This concludes the proof of Theorem 6.1.
Ordering $\sqsubseteq$. Let $\alpha = (P, (q, b, mem))$ and $\alpha' = (P', (q', b', mem'))$ be two parameterized configurations. We define the ordering $\sqsubseteq$ on the set of parameterized configurations as follows: $\alpha \sqsubseteq \alpha'$ if and only if the following conditions are satisfied: (1) $mem = mem'$. (2) There is an injection $h : \{1, \ldots, |P|\} \rightarrow \{1, \ldots, |P'|\}$ such that (i) $p < p'$ implies $h(p) < h(p')$; and (ii) for every $p \in \{1, \ldots, |P|\}$, $q(p) = q'(h(p))$ and $b(p) \sqsubseteq b'(h(p))$.
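Deciding whether an order-preserving injection $h$ exists is a subsequence-embedding problem, and a greedy left-to-right matching suffices: mapping each process of $\alpha$ to the earliest compatible process of $\alpha'$ never rules out a solution. The Python sketch below takes the per-process comparison as a parameter so that the load-buffer ordering $\sqsubseteq$ of Section 4.2 can be plugged in; `simple_proc_leq` is only a simplified stand-in (plain subword on buffers, ignoring own-fragments), process indices are 0-based, and all names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def find_monotone_injection(small: Sequence, large: Sequence,
                            proc_leq: Callable) -> Optional[List[int]]:
    """Greedy search for h with i < j => h[i] < h[j] and proc_leq(small[i], large[h[i]])."""
    h, j = [], 0
    for item in small:
        # map each process to the earliest compatible process still available
        while j < len(large) and not proc_leq(item, large[j]):
            j += 1
        if j == len(large):
            return None
        h.append(j)
        j += 1
    return h


def param_leq(alpha, beta, proc_leq: Callable) -> bool:
    """alpha = (procs, mem), procs being a list of (local state, load buffer) pairs."""
    procs_a, mem_a = alpha
    procs_b, mem_b = beta
    return mem_a == mem_b and find_monotone_injection(procs_a, procs_b, proc_leq) is not None


def simple_proc_leq(a: Tuple, b: Tuple) -> bool:
    """Simplified stand-in: equal local states and plain subword on buffers."""
    it = iter(b[1])
    return a[0] == b[0] and all(m in it for m in a[1])
```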
Monotonicity. We assume that three parameterized configurations $\alpha_1 = (P, (q_1, b_1, mem_1))$, $\alpha_2 = (P, (q_2, b_2, mem_2))$ and $\alpha_3 = (P', (q_3, b_3, mem_3))$ are given. Furthermore, we assume that $\alpha_1 \sqsubseteq \alpha_3$ and $\alpha_1 \xrightarrow{t} \alpha_2$ for some transition $t$. We will show that it is possible to compute a parameterized configuration $\alpha_4$ and a run $\pi$ such that $\alpha_3 \xrightarrow{\pi} \alpha_4$ and $\alpha_2 \sqsubseteq \alpha_4$. Since $\alpha_1 \sqsubseteq \alpha_3$, there is an injection $h : \{1, \ldots, |P|\} \rightarrow \{1, \ldots, |P'|\}$ such that (i) $p < p'$ implies $h(p) < h(p')$, and (ii) for every $p \in \{1, \ldots, |P|\}$, $q_1(p) = q_3(h(p))$ and $b_1(p) \sqsubseteq b_3(h(p))$. We define the parameterized configuration $\alpha'$ from $\alpha_3$ by only keeping the local states and load buffers of the processes in $h(P)$. Formally, $\alpha' = (P, (q', b', mem'))$ is defined as follows: (i) $mem' = mem_3$; and (ii) for every $p \in \{1, \ldots, |P|\}$, $q'(p) = q_3(h(p))$ and $b'(p) = b_3(h(p))$.
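By construction, $(q_1, b_1, mem_1) \sqsubseteq (q', b', mem')$. Since the relation $\rightarrow_{DTSO}$ is monotonic wrt. the ordering $\sqsubseteq$ (see Lemma 4.2), there is a DTSO-configuration $(q'', b'', mem'')$ such that $(q', b', mem') \xrightarrow{*}_{DTSO} (q'', b'', mem'')$ and $(q_2, b_2, mem_2) \sqsubseteq (q'', b'', mem'')$.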
Consider now the parameterized configuration $\alpha_4 = (P', (q_4, b_4, mem_4))$ such that (i) $mem_4 = mem''$; (ii) for every $p \in \{1, \ldots, |P|\}$, $q''(p) = q_4(h(p))$ and $b''(p) = b_4(h(p))$; and (iii) for every $p \in \{1, \ldots, |P'|\} \setminus \{h(1), \ldots, h(|P|)\}$, we have $q_4(p) = q_3(p)$ and $b_4(p) = b_3(p)$. It is then easy to see that $\alpha_2 \sqsubseteq \alpha_4$ and $\alpha_3 \xrightarrow{*} \alpha_4$. The following lemma shows that $(T, \sqsubseteq)$ is a monotonic transition system.
Lemma 6.2. The relation $\rightarrow$ is monotonic wrt. $\sqsubseteq$.
Proof. We show that if $\alpha_1 \xrightarrow{t} \alpha_2$ and $\alpha_1 \sqsubseteq \alpha_3$ for some $t \in \Delta_p \cup \{propagate^x_p, delete_p \mid x \in X\}$ and $p \in P_1$ (note that $P_1 = P_2$), then there exists a configuration $\alpha_4$ such that $\alpha_3 \xrightarrow{*} \alpha_4$ and $\alpha_2 \sqsubseteq \alpha_4$. First we define $P_4 := P_3$. Since $\alpha_1 \sqsubseteq \alpha_3$, there exists an injection $h : P_1 \rightarrow P_3$ witnessing $\alpha_1 \sqsubseteq \alpha_3$. We use the same injection $h' := h : P_2 \rightarrow P_4$ to witness $\alpha_2 \sqsubseteq \alpha_4$. Moreover, for $p \in P_4$, let $q_4(p) := q_2(h'^{-1}(p))$ if $p \in h'(P_2)$, and $q_4(p) := q_3(p)$ otherwise. We define $\alpha_4$ depending on the different cases of $t$:
• Nop: $t = (q_1, nop, q_2)$. Define $b_4 := b_3$ and $mem_4 := mem_2 = mem_3 = mem_1$. We have $\alpha_3 \xrightarrow{t} \alpha_4$.
• Delete: t = delete p . Define mem 4 := mem 2 = mem 3 = mem 1 . Define b 4 according to one of the following cases:
In other words, we have α 4 = α 3 .
and (x, v , own) ∈ b 2 (p) for some v ∈ V , then define b 4 := b 3 . In other words, we have α 4 = α 3 .
and there is no $v' \in V$ with $(x, v', own) \in b_2(p)$, then, since $b_1(p) \sqsubseteq b_3(h(p))$, we know that there is an $i$, and therefore a smallest $i$, such that $b_3(h(p))(i) = (x, v, own)$. Define
We can perform the following sequence of transitions α 3
In other words, we reach the configuration $\alpha_4$ from $\alpha_3$ by first deleting $|b_3(h(p))| - i$ messages from the head of $b_3(h(p))$.
• Read: t = (q, r(x, v), q ). Define mem 4 := mem 2 . We define b 4 according to one of the following cases: -Read-own-write: If there is an i : 1 ≤ i ≤ |b 1 (p)| such that b 1 (p)(i) = (x, v, own), and there are no 1 ≤ j < i and v ∈ V such that b 1 (p)(j) = (x, v , own). Since In other words, we have that α 4 = α 3 .
then let i be the largest i :
, we know that such an i exists. Define
We can reach the configuration $\alpha_4$ from $\alpha_3$ in a similar manner to the last case of the delete transition.
• Fence: $t = (q, fence, q')$. Define $b_4 := b_3[h(p) \leftarrow \epsilon]$ and $mem_4 := mem_2$. We can perform the following sequence of transitions $\alpha_3$
In other words, we can reach the configuration $\alpha_4$ from $\alpha_3$ by first emptying the content of $b_3(h(p))$ and then performing $t$.
• ARW: $t = (q, arw(x, v, v'), q')$. Define $b_4 := b_3[h(p) \leftarrow \epsilon]$ and $mem_4 := mem_2$. We can reach the configuration $\alpha_4$ from $\alpha_3$ in a similar manner to the case of the fence transition. This concludes the proof of Lemma 6.2.
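The constructions of $\alpha'$ and $\alpha_4$ used in the monotonicity argument (keeping only the processes in $h(P)$, running the projected system, and writing the results back at positions $h(p)$) amount to simple index manipulations. A Python sketch with 0-based process indices and hypothetical names; the memory of $\alpha_4$ is taken from the run of the projected system and is not handled here:

```python
from typing import List, Sequence, Tuple


def project(states3: Sequence, buffers3: Sequence, h: List[int]) -> Tuple[list, list]:
    """alpha': keep only the processes h(1), ..., h(|P|) of alpha_3, in order."""
    return [states3[i] for i in h], [list(buffers3[i]) for i in h]


def lift(states3: Sequence, buffers3: Sequence, h: List[int],
         sub_states: Sequence, sub_buffers: Sequence) -> Tuple[list, list]:
    """alpha_4: process h(p) takes its evolved state/buffer; all others keep alpha_3's."""
    states4, buffers4 = list(states3), [list(b) for b in buffers3]
    for p, i in enumerate(h):
        states4[i], buffers4[i] = sub_states[p], list(sub_buffers[p])
    return states4, buffers4
```

The processes outside $h(P)$ are left untouched, which mirrors condition (iii) in the definition of $\alpha_4$.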
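Ordering $\sqsubseteq$ is Well-quasi. The following lemma states that $\sqsubseteq$ is indeed a well-quasi ordering:
Lemma 6.3. The relation $\sqsubseteq$ is a well-quasi ordering over $C$. Furthermore, for every two parameterized configurations $\alpha$ and $\alpha'$, it is decidable whether $\alpha \sqsubseteq \alpha'$.
Proof. The lemma follows by an argument similar to the proof of Lemma 4.3: the ordering compares the sequences $(q(1), b(1)) \cdots (q(|P|), b(|P|))$ via order-preserving injections, so by Higman's lemma [Hig52] it is a well-quasi ordering, since the ordering on pairs (equality of local states and $\sqsubseteq$ on load buffers) is one and the memory valuations range over a finite set.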
Conditions for Decidability. We show the three conditions for the decidability of the coverability problem for $(T, \sqsubseteq)$.
The following lemma shows that we can calculate the set of minimal parameterized configurations for the set of predecessors of any upward closed set.
Lemma 6.4. For any parameterized configuration α, we can compute minpre({α}).
Proof. Consider a parameterized configuration $\alpha = (P, c)$ with $c = (q, b, mem)$. We recall the definition of $minpre(\{\alpha\})$: $minpre(\{\alpha\}) := \min(Pre_T(\{\alpha\}{\uparrow}) \cup \{\alpha\}{\uparrow})$. We observe that
For $t \in \Delta \cup \Delta_{aux}$, we select $\min\{\alpha' \mid \alpha' \xrightarrow{t} \alpha\}$ to be the set of minimal parameterized configurations of the form $\alpha' = (P', c')$ with $c' = (q', b', mem')$ such that one of the following properties is satisfied:
• Write: $t = (q_1, w(x, v), q_2)$, $mem(x) = v$ for some $v \in V$, $mem'(y) = mem(y)$ if $y \neq x$, and one of the following conditions is satisfied:
$w_1 \cdot w_2 = w$ and $(x, v'', own) \notin w_1$ for all $v'' \in V$. In other words, $(x, v', own)$ is the most recent own-message on variable $x$ belonging to $p$ in the buffer $b'(p)$. This condition corresponds to the case where some messages $(x, v', own)$ are hidden by the message $(x, v, own)$ in the buffer
In other words, we add one more process p to the configuration α .
• Propagate: t = propagate x p for some p ∈ P , mem(x) = v, P = P , q = q, mem = mem, 
This condition corresponds to the case where some messages $(x, v)$ are not explicitly present at the head of the buffer $b(p)$.
• Fence: $t = (q_1, fence, q_2)$, $q(p) = q_2$ for some $p \in P$, $b(p) = \epsilon$, $P' = P$, $q' = q[p \leftarrow q_1]$, $b' = b$, and $mem' = mem$.
, and one of the following conditions is satisfied:
This concludes the proof of Lemma 6.4.
From Reachability to Coverability. Let $q_{target} : P \rightarrow Q$ be a global state. Let $M_{target}$ be the set of parameterized configurations of the form $\alpha = (P, (q_{target}, b, mem))$ with $b(p) = \epsilon$ for all $p \in P$. In the following, we show that $M_{target}{\uparrow}$ is reachable in $T$ if and only if there is a parameterized configuration $\alpha = (P', (q, b, mem))$, with $b(p) = \epsilon$ for all $p \in P'$, such that $\alpha$ is reachable in $T$ and $q_{target}(1) \cdots q_{target}(|P|) \preceq q(1) \cdots q(|P'|)$.
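Let us assume that there is a parameterized configuration $\alpha = (P', (q, b, mem))$, with $b(p) = \epsilon$ for all $p \in P'$, such that $\alpha$ is reachable in $T$ and $q_{target}(1) \cdots q_{target}(|P|) \preceq q(1) \cdots q(|P'|)$. Then $\alpha \in M_{target}{\uparrow}$: the embedding of $q_{target}(1) \cdots q_{target}(|P|)$ into $q(1) \cdots q(|P'|)$ yields the order-preserving injection required by $\sqsubseteq$, all the load buffers involved are empty, and $M_{target}$ contains a configuration with memory valuation $mem$. Now let us assume that there is a parameterized configuration $\alpha' = (P', (q', b', mem')) \in M_{target}{\uparrow}$ which is reachable in $T$. From the configuration $\alpha'$, it is possible to reach the configuration $\alpha'' = (P', (q', b'', mem'))$, with $b''(p) = \epsilon$ for all $p \in P'$, by performing a sequence of $delete_p$ transitions to empty the load buffer of each process. Since $\alpha' \in M_{target}{\uparrow}$, there is an order-preserving injection $h$ with $q_{target}(p) = q'(h(p))$ for all $p \in P$, and hence $q_{target}(1) \cdots q_{target}(|P|) \preceq q'(1) \cdots q'(|P'|)$.
Hence, $\alpha''$ is a witness of the state reachability problem.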
Lemma 6.5. $q_{target}$ is reachable in $T$ iff $M_{target}{\uparrow}$ is reachable in $T$.
Experimental Results
We have implemented the techniques described in Section 4 and Section 6 in an open-source tool called Dual-TSO. The tool checks the state reachability problem for (parameterized) concurrent systems under the Dual TSO semantics. Observe that by applying the technique of Section 6, we can check the reachability problem for parameterized concurrent systems with an unbounded number of processes, where the behavior of each process is described by an extended finite-state automaton from a fixed set of automata. We compare our tool with Memorax [AAC+12a, AAC+13], which is the only precise and sound tool for deciding the state reachability problem of concurrent systems under TSO. Observe that Memorax cannot handle parameterized verification. All experiments are performed on an Intel x86-32 Core2 2.4 GHz machine with 4 GB of RAM.
In the following, we present two sets of results. The first set concerns the comparison of Dual-TSO with Memorax (see Table 1). The second set shows the benefit of parameterized verification compared to checking state reachability for fixed instances when increasing the number of processes (see Table 2 and Figure 3).
Table 2. Parameterized verification with Dual-TSO.
Table 1 presents a comparison between Dual-TSO and Memorax on a representative sample of 20 benchmarks. In all these examples, Dual-TSO and Memorax return the same result for the state reachability problem, except for 6 examples where Memorax runs out of time. In the examples where both tools terminate, Dual-TSO outperforms Memorax and generates fewer configurations (and so uses less memory): on average, Dual-TSO is 600 times faster than Memorax and generates 277 times fewer configurations.
The second set compares the scalability of Memorax and Dual-TSO when increasing the number of processes. The results are given in Figure 3. We observe that Dual-TSO scales better than Memorax in all these examples. In fact, Memorax can only handle the examples with at most 5 processes. Table 2 presents the running time and the number of generated configurations when checking the state reachability problem for the parameterized version of these examples. We observe that the verification of these parameterized systems is much more efficient than the verification of bounded-size instances (starting from 3 or 4 processes), especially in terms of memory consumption (measured as the number of generated configurations). The reason is that the minor sets generated during the analysis of a parameterized system are usually smaller than the sets of configurations generated during the analysis of an instance of the system with a large number of processes.
Conclusion
In this paper, we have presented an alternative (yet equivalent) semantics to the classical one for the TSO model that is more amenable to efficient algorithmic verification and to extension to parametric verification. This new semantics allows us to understand the TSO model in a totally different way compared to the classical semantics. Furthermore, the proposed semantics offers several important advantages from the point of view of formal reasoning and program verification. First, the dual semantics allows transforming the load buffers into lossy channels without adding the costly overhead that was necessary in the case of store buffers. This means that we can apply the theory of well-structured systems [Abd10, ACJT96, FS01] in a straightforward manner, leading to a much simpler proof of decidability of safety properties. Second, the absence of extra overhead means that we obtain more efficient algorithms and better scalability (as shown by our experimental results). Finally, the dual semantics allows extending the framework to perform parameterized verification, which is an important paradigm in concurrent program verification.
Table 1. Comparison between Dual-TSO and Memorax: the columns Safe under SC and Safe under TSO indicate whether the benchmark is safe wrt. its specification under SC and under TSO, respectively. The columns #P, #T and #C give the number of processes, the running time in seconds and the number of generated configurations, respectively. If a tool runs out of time, we put t/o in the #T column and • in the #C column.
In the future, we plan to apply our techniques to more memory models and to combine them with predicate abstraction in order to handle programs with unbounded data domains.
Appendix A. Reachability Equivalence: Dual TSO -TSO
In this section, we show the remaining proofs of the equivalence of the reachability problems under the TSO and Dual TSO semantics in Theorem 3.1.
A.1. From Dual TSO to TSO. This section is devoted to the proof of Lemma 3.2.
Proof. Lemmas A.4-A.7 establish the existence of the computation $\pi_{TSO}$. Lemma A.8 and Lemma A.9 establish the conditions on the initial and target configurations, respectively.
First, we start by establishing Lemma A.1, Lemma A.2, and Lemma A.3 that we will use later.
Lemma A.1. For every $j : 0 \leq j \leq n$ and process $p \in P$, the following properties hold:
Proof. The lemma is an immediate consequence of the definition of $index_j$.
Lemma A.2. For every process p ∈ P and index j : 0 ≤ j < n,
Proof. The lemma is an immediate consequence of the definitions of $view_p$ and $index_j$.
Lemma A.3. For every natural number j such that α(r, p, ) ≤ j < α(r, p,
Proof. The proof is by contradiction. Let us assume that there is some $j : \alpha(r, p, \ell) \leq j < \alpha(r, p, \ell+1) - 1$ such that $DTSO2TSO^+(buffers(c_j, p)) = DTSO2TSO^+(buffers(c_{j+1}, p))$.
Observe that the only three operations that can change the content of the load buffer of the process p are write, delete and propagation operations. Since t j / ∈ ∆ p (and so no write operation has been performed) and propagation will append messages of the form (x, v), this implies that t j is a delete transition of the process p (i.e., t j = delete p ). Now, the only case when DTSO2TSO + (buffers (c j , p)) = DTSO2TSO + (buffers (c j+1 , p) ) is where buffers (c j , p) is of the form w · (y, v , own) · m with m ∈ {(x, v), (x, v, own) | x ∈ X, v ∈ V }. This implies that buffers (c j+1 , p) = w·(y, v , own). Now we can use the third case of Lemma A.1 to prove that view p (c j+1 ) > view p (c j ). This contradicts the fact that α(r,p, ) ) (see Lemma A.2) and view p (c j+1 ) > view p (c j ). Now we can start proving the existence of the computation π TSO by showing that we can move from the configuration d r,p, to d r,p, +1 using the transition t α(r,p, +1) .
Proof. We recall that $t_{\alpha(r,p,\ell+1)} \in \Delta_p$ by definition. Therefore, $t_{\alpha(r,p,\ell+1)}$ is neither a propagation transition nor a delete transition. Furthermore, suppose that $t_{\alpha(r,p,\ell+1)}$ is an atomic read-write transition. Then $view_p(c_{\alpha(r,p,\ell+1)}) > view_p(c_{\alpha(r,p,\ell)})$, contradicting the assumption that we are in phase $r$. Hence, $t_{\alpha(r,p,\ell+1)}$ is not an atomic read-write transition.
Let t α(r,p, +1) ∈ ∆ p be of the form (q, op, q ). To prove the lemma, we will prove the following properties: We prove the property (2). We see from the definitions of d α(r,p, ) and
p ). This concludes the property (2).
We prove the properties (3) and (4). In a similar manner to the case of states, we can show the property (3). By the definitions of d α(r,p, ) and d α(r,p, +1) and the fact that
). This concludes the property (4). Now, it remains to prove the property (5). We consider the cases where op is a write or a read operation. The other cases can be treated in a similar way.
• op = w(x, v): We see from Lemma A.3 that for all: j : α(r, p, ) < j < α(r, p, + 1)
In particular, we have
We will show that $buffers(c_{\alpha(r,p,\ell+1)-1}, p) = \epsilon$ by contradiction. Let us suppose that $buffers(c_{\alpha(r,p,\ell+1)-1}, p) \neq \epsilon$. By definition, we have $view_p(c_{\alpha(r,p,\ell+1)}) = r'$ such that $i_{r'} = \alpha(r, p, \ell+1)$. Furthermore, by applying Lemma A.2 to $c_{\alpha(r,p,\ell)}$, we know that $i_r \leq \alpha(r, p, \ell)$. Then, since $\alpha(r, p, \ell) < \alpha(r, p, \ell+1)$ by definition, we have $i_r < i_{r'}$. This contradicts the fact that $view_p(c_{\alpha(r,p,\ell+1)}) = r$ by definition. Therefore, we have $buffers(c_{\alpha(r,p,\ell+1)-1}, p) = \epsilon$.
As a consequence of the fact that buffers c α(r,p, +1)−1 , p = , we know that 
We see from Lemma A.3 that for all j : α(r, p, ) < j < α(r, p, + 1):
In particular, we have 
From the definition of α, it follows that t j ∈ ∆ p for all j : α(r, p, # (r, p)) ≤ j < α(r + 1, p, 0). This implies that states c α (r+1,p,0)−1 , p = states c α(r,p,#(r,p) ) , p . Now we have two cases:
• {j | view p (c j ) = r + 1} = ∅: We see that α(r + 1, p, 0) = α(r, p, # (r, p)), and hence that states d r,pmax ,#(r,pmax ) , p = states (d r+1,p min ,0 , p).
• $\{j \mid view_p(c_j) = r + 1\} \neq \emptyset$: Since $view_p(c_{\alpha(r+1,p,0)-1}) = r$, we can show that $t_{\alpha(r+1,p,0)} \notin \Delta_p$. This is done by contradiction as follows. If $t_{\alpha(r+1,p,0)} \in \Delta_p$, then it is either a write transition or an atomic read-write transition. In both cases, this implies that $buffers(c_{\alpha(r+1,p,0)-1}, p) = \epsilon$ and that $view_p(c_{\alpha(r+1,p,0)}) = \alpha(r + 1, p, 0)$. Hence, we have $\alpha(r + 1, p, 0) = r + 1$, and this leads to a contradiction since $t_{i_{r+1}} \in \Delta_{p_u}$. Thus, we have $states(d_{r,p_{max},\#(r,p_{max})}, p) = states(d_{r+1,p_{min},0}, p)$.
In a similar manner to the case of states, we can show the property (2). Now we show the properties (3)-(4). Using a similar reasoning as for the process p, we know that This concludes the proof of Lemma A.6.
Lemma A.7. If r < k and
Proof. To prove the lemma, we will prove the following properties:
• $\{j \mid view_p(c_j) = r + 1\} \neq \emptyset$: Since $view_p(c_{\alpha(r+1,p,0)-1}) = r$, we can show that $t_{\alpha(r+1,p,0)} \notin \Delta_p$. This is done by contradiction as follows. If $t_{\alpha(r+1,p,0)} \in \Delta_p$, then it is either a write transition or an atomic read-write transition. In both cases, this implies that $buffers(c_{\alpha(r+1,p,0)-1}, p) = \epsilon$ and that $view_p(c_{\alpha(r+1,p,0)}) = \alpha(r + 1, p, 0)$. Hence, we have $\alpha(r + 1, p, 0) = r + 1$, and this leads to a contradiction since $t_{i_{r+1}} \in \Delta_{p_u}$. Thus, we have $states(d_{r,p_{max},\#(r,p_{max})}, p) = states(d_{r+1,p_{min},0}, p)$.
In a similar manner to the case of states, we can show the property (2). Now we show the property (3). Using a similar reasoning as for the process p, we know that ,p u ,0) . Furthermore, from the fact that view p (c α(r+1,p u ,0) ) = r + 1 and view p (c α(r+1,p u ,0)−1 ) < r + 1, we have two cases to consider:
• buffers c α(r+1,p u ,0)−1 , p u = : It follows from the conditions for view p (c α(r+1,p u ,0) ) and The following lemma shows that the TSO-computation π TSO starts from an initial TSO-configuration. The following lemma shows that the target of the TSO-computation π TSO has the same local process states as the target c n of the DTSO-computation π DTSO .
Lemma A.9. $states(d_{k,p_{max},\#(k,p_{max})}) = states(c_n)$.
Proof. Let us take any $p \in P$. By the definitions of $d_{k,p_{max},\#(k,p_{max})}$ and $\alpha(k, p, \#(k, p))$, we know that $t_j \notin \Delta_p$ for all $j : \alpha(k, p, \#(k, p)) < j \leq n$. Therefore, we have $states(c_j, p) = states(c_n, p)$ for all $j : \alpha(k, p, \#(k, p)) \leq j < n$. In particular, we have $states(c_{\alpha(k,p,\#(k,p))}, p) = states(c_n, p)$. Hence, we have $states(d_{k,p_{max},\#(k,p_{max})}, p) = states(c_n, p)$. This concludes the proof of Lemma A.9.
A.2. From TSO to Dual TSO. We give the proof of Lemma 3.3.
We consider the active process $p = proc(r + 1)$ for the case where the transition $t_{match(i_{r+1})}$ is a write transition. By executing the same transition, we add an own-message to the buffer of process $p$ and update the memory.
• Since the transition $t_{\mathit{match}(i_{r+1})}$ belongs to the active process, we have $\mathit{states}(d_{r+1}, p) = \mathit{states}(c_{\beta(r,l)}, p)$. Moreover, from $\beta(r,l) = \mathit{match}(i_{r+1})$ and the definition of $\mathit{pos}(r,p)$, it follows that $\mathit{states}(c_{\beta(r,l)}, p) = \mathit{states}(c_{\mathit{match}(i_{r+1})}, p) = \mathit{states}(c_{\mathit{pos}(r,p)}, p)$.
We consider the active process $p = \mathit{proc}(r+1)$ in the case where the transition $t_{\mathit{match}(i_{r+1})}$ is an atomic read-write. In the simulation, we execute the same transition and update the memory.
• Since the transition $t_{\mathit{match}(i_{r+1})}$ belongs to the active process, we have $\mathit{states}(d_{r+1}, p) = \mathit{states}(c_{\beta(r,l)}, p)$. Moreover, from $\beta(r,l) = \mathit{match}(i_{r+1})$ and the definition of $\mathit{pos}(r,p)$, it follows that $\mathit{states}(c_{\beta(r,l)}, p) = \mathit{states}(c_{\mathit{match}(i_{r+1})}, p) = \mathit{states}(c_{\mathit{pos}(r,p)}, p)$.
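To make the two simulation steps concrete, here is a minimal Python sketch of a Dual TSO write and atomic read-write as they are used above. It assumes a simplified encoding that is not taken from the paper: a buffer is a Python list whose index 0 is the leftmost (newest) end, messages are tuples (variable, value, own), and the helper names are hypothetical. The write pushes an own message and updates memory at once; the atomic read-write is assumed to require an empty buffer, as in TSO.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical message encoding: (variable, value, own-flag).
Message = Tuple[str, int, bool]


@dataclass
class DTSOConfig:
    states: Dict[str, str]              # local state of each process
    buffers: Dict[str, List[Message]]   # index 0 = leftmost (newest), index -1 = rightmost (oldest)
    mem: Dict[str, int]                 # shared memory


def dtso_write(c: DTSOConfig, p: str, x: str, v: int, target: str) -> None:
    """Write x := v by p: add an own message on the left and update memory immediately."""
    c.buffers[p].insert(0, (x, v, True))
    c.mem[x] = v
    c.states[p] = target


def dtso_atomic_rw(c: DTSOConfig, p: str, x: str, expected: int, new: int, target: str) -> None:
    """Atomic read-write: assumed enabled only if p's buffer is empty and mem[x] == expected."""
    if c.buffers[p]:
        raise ValueError("atomic read-write requires an empty buffer")
    if c.mem[x] != expected:
        raise ValueError("memory does not hold the expected value")
    c.mem[x] = new
    c.states[p] = target


c = DTSOConfig(states={"p": "q0"}, buffers={"p": []}, mem={"x": 0})
dtso_write(c, "p", "x", 1, "q1")
print(c.buffers["p"], c.mem["x"])  # [('x', 1, True)] 1
```

In both cases the local state of the active process is set to the target of the simulated transition, which mirrors the equalities on $\mathit{states}(\cdot, p)$ stated in the bullets.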
We consider the active process $p = \mathit{proc}(r+1)$ in the case where the transition $t_{\beta(r,l)}$ is a read-from-memory transition. From the simulation of $t_{\beta(r,l)}$, the configuration reached by the simulation agrees with $c_{\beta(r,l+1)-1}$ on the local state of $p$. By the definition of $d_{r,l+1}$, we also have $\mathit{states}(d_{r,l+1}, p) = \mathit{states}(c_{\beta(r,l+1)-1}, p)$. Furthermore, because we delete the rightmost message in the buffer of process $p$ after executing the read transition, it follows from the definition of $d_{r,l+1}$ that $\mathit{buffers}(d_{r,l+1}, p) = \mathit{label}\ldots$ Hence, the configuration reached by the simulation coincides with $d_{r,l+1}$.
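The read-from-memory case can be pictured with a small Python sketch: a value propagated from memory sits at the rightmost end of the buffer, the read takes it, and a delete then removes it. The list encoding (index 0 newest, index -1 oldest), the (variable, value, own) tuples, the precondition on the rightmost message, and all function names are hypothetical simplifications, not the paper's definitions.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, int, bool]  # hypothetical: (variable, value, own-flag)


def dtso_propagate(buf: List[Message], mem: Dict[str, int], x: str) -> None:
    """Propagate the current memory value of x to the left (newest) end of the buffer."""
    buf.insert(0, (x, mem[x], False))


def dtso_read_oldest(buf: List[Message], x: str) -> int:
    """Read x from the rightmost (oldest) message, assumed to be a propagated message on x."""
    if not buf:
        raise ValueError("empty buffer")
    var, val, own = buf[-1]
    if var != x or own:
        raise ValueError("rightmost message is not a propagated message on x")
    if any(m_var == x and m_own for m_var, _, m_own in buf):
        raise ValueError("process has a pending own write on x")
    return val


def dtso_delete(buf: List[Message]) -> Message:
    """Delete transition: remove and return the rightmost (oldest) message."""
    return buf.pop()


def simulate_tso_memory_read(buf: List[Message], x: str) -> int:
    """Simulate a TSO read-from-memory: read the rightmost message, then delete it."""
    v = dtso_read_oldest(buf, x)
    dtso_delete(buf)
    return v


mem = {"x": 5, "y": 7}
buf: List[Message] = []
dtso_propagate(buf, mem, "x")
dtso_propagate(buf, mem, "y")   # buf == [("y", 7, False), ("x", 5, False)]
print(simulate_tso_memory_read(buf, "x"), buf)  # 5 [('y', 7, False)]
```

Messages propagated earlier end up further to the right, so they are consumed first; this is why deleting the rightmost message after the read keeps the buffer aligned with the remaining reads.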
We consider the active process $p = \mathit{proc}(r+1)$ in the case where the transition $t_{\beta(r,l)}$ is an update.
Proof. We are in the final phase $r = m$. Observe that this phase contains no write or atomic read-write transitions. Since the memory does not change from configuration $d_m$ until the end of the TSO-computation, all memory-read transitions of a process $p \in P$ after transition $t_{i_m}$ obtain their values from $\mathit{mem}(d_m)$. Therefore, we can execute a sequence of propagation transitions from the memory to the buffer of each process $p$, filling it with all the messages needed to satisfy the memory-read transitions of $p$ after $t_{i_m}$. We propagate to the processes according to the order $\prec$: first to process $p_{\min}$ and last to process $p_{\max}$. The simulation then proceeds as follows:
• To simulate a memory-read transition, we execute the same read transition, followed by a delete transition that removes the oldest (rightmost) message from the buffer of process $p$.
• To simulate a read-own-write transition, we execute the same read transition.
• To simulate an update transition, we execute a delete transition that removes the oldest (rightmost) message from the buffer of process $p$.
• To simulate a nop transition, we execute the same transition in the DTSO-computation.
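The final-phase construction can be summarised as a translation from TSO transitions to sequences of DTSO transitions. The Python sketch below is only an illustration under stated assumptions: transitions are hypothetical `Step` records, processes are listed in the order $\prec$, and the propagation prefix pushes the value of $x$ in $\mathit{mem}(d_m)$ once for every memory read of $x$ performed by a process, visiting $p_{\min}$ first and $p_{\max}$ last. It only produces the transition sequence; it does not check feasibility, which the text argues via Lemmas A.11 to A.13.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    """Hypothetical record for a TSO transition after t_{i_m}."""
    proc: str
    kind: str                  # "memory-read", "read-own-write", "update" or "nop"
    var: Optional[str] = None  # variable read, if any


# DTSO transitions simulating each kind (no writes or ARWs occur in the final phase).
SIMULATION: Dict[str, List[str]] = {
    "memory-read": ["read", "delete"],
    "read-own-write": ["read"],
    "update": ["delete"],
    "nop": ["nop"],
}

DOp = Tuple[str, str, Optional[str], Optional[int]]  # (op, process, variable, propagated value)


def simulate_final_phase(steps: List[Step], order: List[str], mem_dm: Dict[str, int]) -> List[DOp]:
    ops: List[DOp] = []
    # 1. Propagations from mem(d_m), process by process in the order p_min, ..., p_max.
    for p in order:
        for s in steps:
            if s.proc == p and s.kind == "memory-read":
                ops.append(("propagate", p, s.var, mem_dm[s.var]))
    # 2. Replay each TSO transition by its DTSO counterpart(s).
    for s in steps:
        if s.kind not in SIMULATION:
            raise ValueError(f"unexpected transition in the final phase: {s.kind}")
        for op in SIMULATION[s.kind]:
            ops.append((op, s.proc, s.var, None))
    return ops


steps = [Step("q", "memory-read", "x"), Step("p", "update"), Step("p", "memory-read", "y")]
for op in simulate_final_phase(steps, order=["p", "q"], mem_dm={"x": 1, "y": 2}):
    print(op)
```

Because the memory is constant after $d_m$, all propagations can be scheduled up front; the per-process order of propagations follows the order of that process's reads, so the oldest propagated message is the one the next read consumes.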
Following the same argument as in Lemmas A.11, A.12, and A.13, we can show that all simulated transitions are feasible. Consequently, from configuration $d_m$ we reach the configuration $d_{m+1}$ such that for all $p \in P$: $\mathit{states}(d_{m+1}, p) = \mathit{states}(c_n, p)$, $\mathit{buffers}(d_{m+1}, p) = \varepsilon$, and $\mathit{mem}(d_{m+1}) = \mathit{mem}(c_n)$.
This concludes the proof of Lemma A.14. 
