The reads-from equivalence for the TSO and PSO memory models by Bui, Truc Lam et al.
The Reads-From Equivalence for the TSO and PSO Memory Models
TRUC LAM BUI∗, Comenius University, Slovakia
KRISHNENDU CHATTERJEE, IST Austria, Austria
TUSHAR GAUTAM∗, IIT Bombay, India
ANDREAS PAVLOGIANNIS, Aarhus University, Denmark
VIKTOR TOMAN, IST Austria, Austria
The verification of concurrent programs remains an open challenge due to the non-determinism in
inter-process communication. One recurring algorithmic problem in this challenge is the consistency verification
of concurrent executions. In particular, consistency verification under a reads-from map allows computing
the reads-from (RF) equivalence between concurrent traces, with direct applications to areas such as Stateless
Model Checking (SMC). Importantly, the RF equivalence was recently shown to be coarser than the standard
Mazurkiewicz equivalence, leading to impressive scalability improvements for SMC under SC (sequential
consistency). However, for the relaxed memory models of TSO and PSO (total/partial store order), the algorithmic
problem of deciding the RF equivalence, as well as its impact on SMC, has been elusive.
In this work we solve the algorithmic problem of consistency verification for the TSO and PSO memory
models given a reads-from map, denoted VTSO-rf and VPSO-rf, respectively. For an execution of 𝑛 events over
𝑘 threads and 𝑑 variables, we establish novel bounds that scale as $n^{k+1}$ for TSO and as
$n^{k+1} \cdot \min(n^{k^{2}}, 2^{k \cdot d})$
for PSO. Moreover, based on our solution to these problems, we develop an SMC algorithm under TSO and
PSO that uses the RF equivalence. The algorithm is exploration-optimal, in the sense that it is guaranteed
to explore each class of the RF partitioning exactly once, and spends polynomial time per class when 𝑘 is
bounded. Finally, we implement all our algorithms in the SMC tool Nidhugg, and perform a large number of
experiments over benchmarks from existing literature. Our experimental results show that our algorithms
for VTSO-rf and VPSO-rf provide significant scalability improvements over standard alternatives. Moreover,
when used for SMC, the RF partitioning is often much coarser than the standard Shasha–Snir partitioning for
TSO/PSO, which yields a significant speedup in the model checking task.
CCS Concepts: • Theory of computation → Verification by model checking; • Software and its
engineering → Formal software verification.
Additional Key Words and Phrases: concurrency, relaxed memory models, execution-consistency verification,
stateless model checking
ACM Reference Format:
Truc Lam Bui, Krishnendu Chatterjee, Tushar Gautam, Andreas Pavlogiannis, and Viktor Toman. 2021. The
Reads-From Equivalence for the TSO and PSO Memory Models. Proc. ACM Program. Lang. 5, OOPSLA,
Article 164 (October 2021), 30 pages. https://doi.org/10.1145/3485541
∗Work done while the author was an intern at IST Austria.
Authors’ addresses: Truc Lam Bui, Comenius University, Mlynská dolina, Bratislava, 842 48, Slovakia, bujtuclam@gmail.com;
Krishnendu Chatterjee, IST Austria, Am Campus 1, Klosterneuburg, 3400, Austria, krishnendu.chatterjee@ist.ac.at; Tushar
Gautam, IIT Bombay, Main Gate Rd, IIT Area, Powai, Mumbai, 400076, India, tushargautam.gautam@gmail.com; Andreas
Pavlogiannis, Aarhus University, Nordre Ringgade 1, Aarhus, 8000, Denmark, pavlogiannis@cs.au.dk; Viktor Toman, IST
Austria, Am Campus 1, Klosterneuburg, 3400, Austria, viktor.toman@ist.ac.at.
© 2021 Copyright held by the owner/author(s).
2475-1421/2021/10-ART164
https://doi.org/10.1145/3485541
Proc. ACM Program. Lang., Vol. 5, No. OOPSLA, Article 164. Publication date: October 2021.
This work is licensed under a Creative Commons Attribution 4.0 International License.
1 INTRODUCTION
The formal analysis of concurrent programs is a key problem in program analysis and verification.
Scheduling non-determinism makes programs both hard to write correctly, and to analyze formally,
as both the programmer and the model checker need to account for all possible communication
patterns among threads. This non-determinism incurs an exponential blow-up in the state space of
the program, which in turn yields a significant computational cost on the verification task.
Traditional verification has focused on concurrent programs adhering to sequential
consistency [Lamport 1979]. Programs operating under relaxed memory semantics exhibit additional
behavior compared to sequential consistency. This makes it exceptionally hard to reason about
correctness, as, besides scheduling subtleties, the formal reasoning needs to account for buffer/caching
mechanisms. Two of the most standard operational relaxed memory models in the literature are
Total Store Order (TSO) and Partial Store Order (PSO) [Adve and Gharachorloo 1996; Alglave 2010;
Alglave et al. 2017; Owens et al. 2009; Sewell et al. 2010; SPARC International 1994].
On the operational level, both models introduce subtle mechanisms via which write operations
become visible to the shared memory and thus to the whole system. Under TSO, every thread
is equipped with its own buffer. Every write to a shared variable is pushed into the buffer, and
thus remains hidden from the other threads. The buffer is flushed non-deterministically to the
shared memory, at which point the writes become visible to the other threads. The semantics
under PSO are even more involved, as now every thread has one buffer per shared variable, and

Fig. 1. A TSO example (left) and a PSO example (right).
To illustrate the intricacies under TSO and PSO, consider the examples in Figure 1. On the left,
under SC, in every execution at least one of r(𝑦) and r′(𝑥) will observe the corresponding 𝑤′(𝑦)
and 𝑤(𝑥). Under TSO, however, the write events may become visible on the shared memory only
after the read events have executed, and hence both write events go unobserved. Executions under
PSO are even more involved; see Figure 1 (right). Under either SC or TSO, if r(𝑦) observes 𝑤′(𝑦),
then r′(𝑥) must observe 𝑤(𝑥), as 𝑤(𝑥) becomes visible on the shared memory before 𝑤′(𝑦). Under
PSO, however, there is a single local buffer for each variable. Hence the order in which 𝑤(𝑥) and
𝑤′(𝑦) become visible in the shared memory can be reversed, allowing r(𝑦) to observe 𝑤′(𝑦) while
r′(𝑥) does not observe 𝑤(𝑥).
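The two behaviors described above can be reproduced with a small exhaustive simulator. The following is a minimal sketch, not the paper's tooling: the function `outcomes`, the tuple encoding of operations, and the two programs `SB` and `MP` are illustrative assumptions, chosen to match the shape of the Figure 1 examples as described in the text (all variables start at 0).

```python
def outcomes(program, model):
    """Collect final register values over all executions of `program`.

    program: list of threads; each thread is a list of operations
             ("w", var, value) or ("r", var, register).
    model:   "SC", "TSO" or "PSO" (hypothetical encoding for this sketch).
    """
    variables = sorted({op[1] for thread in program for op in thread})
    results, seen = set(), set()

    def store(mem, var, val):
        return tuple((x, val if x == var else v) for x, v in mem)

    def replace(tup, i, item):
        return tup[:i] + (item,) + tup[i + 1:]

    def explore(pcs, bufs, mem, regs):
        state = (pcs, bufs, mem, regs)
        if state in seen:
            return
        seen.add(state)
        moved = False
        for t, ops in enumerate(program):
            if pcs[t] < len(ops):  # next thread event of thread t
                moved = True
                kind, var, arg = ops[pcs[t]]
                nxt = replace(pcs, t, pcs[t] + 1)
                if kind == "w" and model == "SC":
                    explore(nxt, bufs, store(mem, var, arg), regs)
                elif kind == "w":  # buffer-write: hidden from other threads
                    explore(nxt, replace(bufs, t, bufs[t] + ((var, arg),)), mem, regs)
                else:  # read: own buffer first, then shared memory
                    pending = [v for x, v in bufs[t] if x == var]
                    val = pending[-1] if pending else dict(mem)[var]
                    explore(nxt, bufs, mem, tuple(sorted(regs + ((arg, val),))))
            # Memory-write: TSO dequeues the oldest entry overall,
            # PSO the oldest entry of any single variable.
            heads = {}
            for i, (x, _) in enumerate(bufs[t]):
                heads.setdefault(x, i)
            idxs = sorted(heads.values())
            if model == "TSO":
                idxs = idxs[:1]
            for i in idxs:
                moved = True
                var, val = bufs[t][i]
                explore(pcs, replace(bufs, t, bufs[t][:i] + bufs[t][i + 1:]),
                        store(mem, var, val), regs)
        if not moved:  # all threads finished and all buffers drained
            results.add(regs)

    k = len(program)
    explore((0,) * k, ((),) * k, tuple((x, 0) for x in variables), ())
    return results


# Left example: each thread writes one variable, then reads the other.
SB = [[("w", "x", 1), ("r", "y", "a")], [("w", "y", 1), ("r", "x", "b")]]
# Right example: one thread writes x then y, the other reads y then x.
MP = [[("w", "x", 1), ("w", "y", 1)], [("r", "y", "a"), ("r", "x", "b")]]
both_miss = (("a", 0), ("b", 0))   # neither read observes the remote write
reordered = (("a", 1), ("b", 0))   # y observed as written, x not
```

Checking membership of `both_miss` in `outcomes(SB, model)` and of `reordered` in `outcomes(MP, model)` for each model separates SC from TSO, and TSO from PSO, respectively.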
The great challenge in verification under relaxed memory is to systematically, yet efficiently,
explore all such extra behaviors of the system, i.e., account for the additional non-determinism that
comes from the buffers. In this work we tackle this challenge for two verification tasks under TSO
and PSO, namely, for verifying the consistency of executions, and for stateless model checking.
Verifying execution consistency with a reads-from function. One of the most basic
problems for a given memory model is the verification of the consistency of program executions with
respect to the given model [Chini and Saivasan 2020]. The input is a set of thread executions, where
each execution performs operations accessing the shared memory. The task is to verify whether the
thread executions can be interleaved to a concurrent execution, which has the property that every
read observes a specific value written by some write [Gibbons and Korach 1997]. The problem is of
foundational importance to concurrency, and has been studied heavily under SC [Cain and Lipasti
2002; Chen et al. 2009; Hu et al. 2012].
The input is often enhanced with a reads-from (RF) map, which further specifies for each read
access the write access that the former should observe. Under sequential consistency, the
corresponding problem VSC-rf was shown to be NP-hard in the landmark work of Gibbons and Korach
[1997], while it was recently shown to be W[1]-hard [Mathur et al. 2020]. The problem lies at the heart
of many verification tasks in concurrency, such as dynamic analyses [Kini et al. 2017; Mathur et al.
2020, 2021; Pavlogiannis 2019; Roemer et al. 2020; Smaragdakis et al. 2012], linearizability and
transactional consistency [Biswas and Enea 2019; Herlihy and Wing 1990], as well as SMC [Abdulla
et al. 2019; Chalupa et al. 2017; Kokologiannakis et al. 2019b].
Executions under relaxed memory. The natural extension of verifying execution consistency with
an RF map is from SC to relaxed memory models such as TSO and PSO; we denote the respective
problems by VTSO-rf and VPSO-rf. Given the importance of VSC-rf for SC, and the success in
establishing both upper and lower bounds, the complexity of VTSO-rf and VPSO-rf is a very
natural question and of equal importance. The verification problem is known to be NP-hard for
most memory models [Furbach et al. 2015], including TSO and PSO; however, no other bounds are
known. Some heuristics have been developed for VTSO-rf [Manovit and Hangal 2006; Zennou et al.
2019], while other works study TSO executions that are also sequentially consistent [Bouajjani
et al. 2013, 2011].
Stateless Model Checking. The most standard solution to the space-explosion problem is
stateless model checking [Godefroid 1996]. Stateless model-checking methods typically explore
traces rather than states of the analyzed program. The depth-first nature of the exploration enables
it to be both systematic and memory-efficient, by storing only a few traces at any given time.
Stateless model-checking techniques have been employed successfully in several well-established
tools, e.g., VeriSoft [Godefroid 1997, 2005] and CHESS [Madan Musuvathi 2007].
As there are exponentially many interleavings, a trace-based exploration typically has to explore
exponentially many traces, which is intractable in practice. One standard approach is the partitioning
of the trace space into equivalence classes, and then attempting to explore every class via a single
representative trace. The most successful adoption of this technique is in dynamic partial order
reduction (DPOR) techniques [Clarke et al. 1999; Flanagan and Godefroid 2005; Godefroid 1996;
Peled 1993]. The great advantage of DPOR is that it handles indirect memory accesses precisely
without introducing spurious interleavings. The foundation underpinning DPOR is the famous
Mazurkiewicz equivalence, which constructs equivalence classes based on the order in which
traces execute conflicting memory access events. This idea has led to a rich body of work, with
improvements using symbolic techniques [Kahlon et al. 2009], context-sensitivity [Albert et al.
2017], unfoldings [Rodríguez et al. 2015], effective lock handling [Kokologiannakis et al. 2019a], and
others [Albert et al. 2018; Aronis et al. 2018; Chatterjee et al. 2019]. The work of Abdulla et al. [2014]
developed an SMC algorithm that is exploration-optimal for the Mazurkiewicz equivalence, in the
sense that it explores each class of the underlying partitioning exactly once. Finally, techniques
based on SAT/SMT solvers have been used to construct even coarser partitionings [Demsky and
Lam 2015; Huang 2015; Huang and Huang 2017].
The reads-from equivalence for SMC. A new direction of SMC techniques has been recently
developed using the reads-from (RF) equivalence to partition the trace space. The key principle is
to classify traces as equivalent based on whether read accesses observe the same write accesses.
The idea was initially explored for acyclic communication topologies [Chalupa et al. 2017], and has
been recently extended to all topologies [Abdulla et al. 2019]. As the RF partitioning is guaranteed
to be (even exponentially) coarser than the Mazurkiewicz partitioning, SMC based on RF has
shown remarkable scalability potential [Abdulla et al. 2019, 2018; Kokologiannakis et al. 2019b;
Kokologiannakis and Vafeiadis 2020]. The key technical component for SMC using RF is the
verification of execution consistency, as presented in the previous section. The success of SMC
using RF under SC has thus rested upon new efficient methods for the problem VSC-rf.
SMC under relaxed memory. The SMC literature has taken up the challenge of model checking
concurrent programs under relaxed memory. Extensions to SMC for TSO/PSO have been
considered by Zhang et al. [2015] using shadow threads to model memory buffers, as well as by Abdulla
et al. [2015] using chronological traces to represent the Shasha–Snir notion of trace under
relaxed memory [Shasha and Snir 1988]. Chronological/Shasha–Snir traces are the generalization
of Mazurkiewicz traces to TSO/PSO. Further extensions have also been made to other memory
models, namely by Abdulla et al. [2018] for the release-acquire fragment of C++11, Kokologiannakis
et al. [2017, 2019b] for the RC11 model [Lahav et al. 2017], and Kokologiannakis and Vafeiadis
[2020] for the IMM model [Podkopaev et al. 2019], but notably none for TSO and PSO using the RF
equivalence. Given the advantages of the RF equivalence for SMC under SC [Abdulla et al. 2019],
release-acquire [Abdulla et al. 2018], RC11 [Kokologiannakis et al. 2019b] and IMM
[Kokologiannakis and Vafeiadis 2020], a very natural standing question is whether RF can be used for effective
SMC under TSO and PSO. Here we tackle this challenge.
1.1 Our Contributions
Here we outline the main results of our work. We refer to Section 3 for a formal presentation.
Verifying execution consistency for TSO and PSO. Our first set of results and the main
contribution of this paper is on the problems VTSO-rf and VPSO-rf for verifying TSO- and PSO-
consistent executions, respectively. Consider an input to the corresponding problem that consists
of 𝑘 threads and 𝑛 operations, where each thread executes write and read operations, as well as
fence operations that flush each thread-local buffer to the main memory. Our results are as follows.
(1) We present an algorithm that solves VTSO-rf in $O(k \cdot n^{k+1})$ time. The case of VSC-rf is solvable
in $O(k \cdot n^{k})$ time [Abdulla et al. 2019; Biswas and Enea 2019; Mathur et al. 2020]. Although
for TSO there are 𝑘 additional buffers, our result shows that the complexity is only mildly
impacted, by an additional factor $n$ as opposed to $n^{k}$.
(2) We present an algorithm that solves VPSO-rf in $O(k \cdot n^{k+1} \cdot \min(n^{k \cdot (k-1)}, 2^{k \cdot d}))$ time, where 𝑑 is
the number of variables. Note that even though there are 𝑘 · 𝑑 buffers, one of our two bounds
is independent of 𝑑 and thus yields polynomial time when the number of threads is bounded.
Moreover, our bound collapses to $O(k \cdot n^{k+1})$ when there are no fences, and hence this case is
no more difficult than VTSO-rf.
Stateless model checking for TSO and PSO using the reads-from equivalence (RF). Our
second contribution is an algorithm RF-SMC for SMC under TSO and PSO using the RF equivalence.
The algorithm is based on the reads-from algorithm for SC [Abdulla et al. 2019] and uses our
solutions to VTSO-rf and VPSO-rf for visiting each class of the respective partitioning. Moreover,
RF-SMC is exploration-optimal, in the sense that it explores only maximal traces and further it is
guaranteed to explore each class of the RF partitioning exactly once. For the complexity statements,
let 𝑘 be the total number of threads and 𝑛 be the number of events of the longest trace. The time
spent by RF-SMC per class of the RF partitioning is
(1) $n^{O(k)}$ time, for the case of TSO, and
(2) $n^{O(k^{2})}$ time, for the case of PSO.
Note that the time complexity per class is polynomial in 𝑛 when 𝑘 is bounded.
Implementation and experiments. We have implemented RF-SMC in the stateless model
checker Nidhugg [Abdulla et al. 2015], and performed an evaluation on an extensive set of
benchmarks from the recent literature. Our results show that our algorithms for VTSO-rf and VPSO-rf
provide significant scalability improvements over standard alternatives, often by orders of
magnitude. Moreover, when used for SMC, the RF partitioning is often much coarser than the standard
Shasha–Snir partitioning for TSO/PSO, which yields a significant speedup in the model checking
task.
2 PRELIMINARIES
General notation. Given a natural number 𝑖 ≥ 1, we let [𝑖] be the set {1, 2, . . . , 𝑖}. Given a map
𝑓 : 𝑋 → 𝑌 , we let dom(𝑓 ) = 𝑋 and img(𝑓 ) = 𝑌 denote the domain and image of 𝑓 , respectively.
We represent maps 𝑓 as sets of tuples {(𝑥, 𝑓 (𝑥))}𝑥 . Given two maps 𝑓1, 𝑓2 over the same domain
𝑋, we write 𝑓1 = 𝑓2 if for every 𝑥 ∈ 𝑋 we have 𝑓1(𝑥) = 𝑓2(𝑥). Given a set 𝑋′ ⊂ 𝑋, we denote by
𝑓|𝑋′ the restriction of 𝑓 to 𝑋′. A binary relation ∼ on a set 𝑋 is an equivalence iff ∼ is reflexive,
symmetric and transitive. We denote by 𝑋/∼ the quotient (i.e., the set of all equivalence classes) of
𝑋 under ∼.
2.1 Concurrent Model under TSO/PSO
Here we describe the computational model of concurrent programs with shared memory under
the Total Store Order (TSO) and Partial Store Order (PSO) memory models. We follow a standard
exposition, similarly to Abdulla et al. [2015] and Huang and Huang [2016]. We first describe TSO and
then extend our description to PSO.
Concurrent program with Total Store Order. We consider a concurrent program
$\mathcal{P} = \{\mathsf{thr}_i\}_{i=1}^{k}$ of 𝑘 threads. The threads communicate over a shared memory G of global variables.
Each thread additionally owns a store buffer, which is a FIFO queue for storing updates of variables
to the shared memory. Threads execute events of the following types.
(1) A buffer-write event wB enqueues into the local store buffer an update that wants to write a
value 𝑣 to a global variable 𝑥 ∈ G.
(2) A read event r reads the value 𝑣 of a global variable 𝑥 ∈ G. The value 𝑣 is the value of the most
recent local buffer-write event, if one still exists in the buffer, otherwise 𝑣 is the value of 𝑥 in the
shared memory.
Additionally, whenever a store buffer of some thread is nonempty, the respective thread can execute
the following.
(3) A memory-write event wM that dequeues the oldest update from the local buffer and performs
the corresponding write-update on the shared memory.
Threads can also flush their local buffers into the memory using fences.
(4) A fence event fnc blocks the corresponding thread until its store buffer is empty.
Finally, threads can execute local events that are not modeled explicitly, as usual. We refer to
all non-memory-write events as thread events. Following the typical setting of stateless model
checking [Abdulla et al. 2014, 2015; Chalupa et al. 2017; Flanagan and Godefroid 2005], each thread
of the program 𝒫 is deterministic, and further 𝒫 is bounded, meaning that all executions of 𝒫 are
finite and the number of events of 𝒫’s longest execution is a parameter of the input.
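The four event types above can be sketched as methods of a per-thread store buffer. This is only an illustrative model under stated assumptions: the class name `TsoThread`, its method names, and the use of a plain dictionary for the shared memory are hypothetical and not taken from the paper or from Nidhugg.

```python
from collections import deque


class TsoThread:
    """One thread with its FIFO store buffer over a shared memory (a dict)."""

    def __init__(self, memory):
        self.memory = memory    # shared: variable -> value
        self.buffer = deque()   # pending (variable, value) updates, oldest first

    def buffer_write(self, var, val):
        # (1) Enqueue the update; other threads cannot see it yet.
        self.buffer.append((var, val))

    def read(self, var):
        # (2) Most recent local buffered write on var, else shared memory.
        for x, v in reversed(self.buffer):
            if x == var:
                return v
        return self.memory[var]

    def memory_write(self):
        # (3) Dequeue the oldest update and apply it to shared memory.
        var, val = self.buffer.popleft()
        self.memory[var] = val

    def fence_enabled(self):
        # (4) A fence can proceed only once the buffer is empty.
        return not self.buffer


memory = {"x": 0}
t1, t2 = TsoThread(memory), TsoThread(memory)
t1.buffer_write("x", 1)
local_view, remote_view = t1.read("x"), t2.read("x")  # the two views differ here
```

A PSO variant would, under the same assumptions, keep one such queue per variable and let `memory_write` pick any nonempty queue.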
Given an event 𝑒, we denote by thr(𝑒) its thread and by var(𝑒) its global variable. We denote by E
the set of all events, by R the set of read events, by W𝐵 the set of buffer-write events, by W𝑀 the
set of memory-write events, and by F the set of fence events. Given a buffer-write event wB ∈ W𝐵
and its corresponding memory-write wM ∈ W𝑀, we let w = (wB, wM) be the two-phase write
event, and we denote thr(w) = thr(wB) = thr(wM) and var(w) = var(wB) = var(wM). We denote
by W the set of all such two-phase write events. Given two events 𝑒1, 𝑒2 ∈ R ∪ W𝑀, we say that
they conflict, denoted 𝑒1 ⋈ 𝑒2, if they access the same global variable and at least one of them is a
memory-write event.
Proper event sets. Given a set of events 𝑋 ⊆ E, we write R(𝑋) = 𝑋 ∩ R for the set of read
events of 𝑋, and similarly W𝐵(𝑋) = 𝑋 ∩ W𝐵 and W𝑀(𝑋) = 𝑋 ∩ W𝑀 for the buffer-write and
memory-write events of 𝑋, respectively. We also denote by L(𝑋) = 𝑋 \ W𝑀(𝑋) the thread events
(i.e., the non-memory-write events) of 𝑋. We write W(𝑋) = (𝑋 × 𝑋) ∩ W for the set of two-phase
write events in 𝑋. We call 𝑋 proper if wB ∈ 𝑋 iff wM ∈ 𝑋 for each (wB, wM) ∈ W. Finally, given a
set of events 𝑋 ⊆ E and a thread thr, we denote by 𝑋thr and 𝑋≠thr the events of thr, and the events
of all other threads in 𝑋, respectively.
Sequences and Traces. Given a sequence of events 𝜏 = 𝑒1, . . . , 𝑒𝑗, we denote by E(𝜏) the set
of events that appear in 𝜏. We further denote R(𝜏) = R(E(𝜏)), W𝐵(𝜏) = W𝐵(E(𝜏)), W𝑀(𝜏) =
W𝑀(E(𝜏)), and W(𝜏) = W(E(𝜏)). Finally we denote by 𝜖 an empty sequence.
Given a sequence 𝜏 and two events 𝑒1, 𝑒2 ∈ E(𝜏), we write 𝑒1 <𝜏 𝑒2 when 𝑒1 appears before 𝑒2
in 𝜏 , and 𝑒1 ≤𝜏 𝑒2 to denote that 𝑒1 <𝜏 𝑒2 or 𝑒1 = 𝑒2. Given a sequence 𝜏 and a set of events 𝐴,
we denote by 𝜏 |𝐴 the projection of 𝜏 on 𝐴, which is the unique sub-sequence of 𝜏 that contains
all events of 𝐴 ∩ E(𝜏), and only those. Given a sequence 𝜏 and an event 𝑒 ∈ E(𝜏), we denote by
pre𝜏(𝑒) the prefix up until and including 𝑒, formally 𝜏|{𝑒′ ∈ E(𝜏) | 𝑒′ ≤𝜏 𝑒}. Given two sequences
𝜏1 and 𝜏2, we denote by 𝜏1 ◦ 𝜏2 the sequence that results from appending 𝜏2 after 𝜏1.
A (concrete, concurrent) trace is a sequence of events 𝜎 that corresponds to a concrete valid
execution of 𝒫 under standard semantics [Shasha and Snir 1988]. We let enabled(𝜎) be the set of
enabled events after 𝜎 is executed, and call 𝜎 maximal if enabled(𝜎) = ∅. A concrete local trace 𝜌
is a sequence of thread events of the same thread.
Reads-from functions. Given a proper event set 𝑋 ⊆ E, a reads-from function over 𝑋 is a
function that maps each read event of 𝑋 to some two-phase write event of 𝑋 accessing the same
global variable. Formally, RF : R(𝑋 ) → W(𝑋 ), where var(r) = var(RF(r)) for all r ∈ R(𝑋 ).
Given a buffer-write event wB (resp. a memory-write event wM), we write RF(r) = (wB, _) (resp.
RF(r) = (_,wM)) to denote that RF(r) is a two-phase write for which wB (resp. wM) is the
corresponding buffer-write (resp. memory-write) event.
Given a sequence of events 𝜏 where the set E(𝜏) is proper, we define the reads-from function
of 𝜏, denoted RF𝜏 : R(𝜏) → W(𝜏), as follows. Given a read event r ∈ R(𝜏), consider the set
Upd of enqueued conflicting updates in the same thread that have not yet been dequeued, i.e.,
Upd = {(wB, wM) ∈ (W(𝜏))thr(r) | wM ⋈ r, wB <𝜏 r <𝜏 wM}. Then, RF𝜏(r) = (wB′, wM′),
where one of the two cases happens:
• Upd ≠ ∅, and (wB′, wM′) ∈ Upd is the latest in 𝜏, i.e., for each (wB′′, wM′′) ∈ Upd we have
wB′′ ≤𝜏 wB′.
• Upd = ∅, and wM′ ∈ W𝑀(𝜏), wM′ ⋈ r, wM′ <𝜏 r is the latest memory-write (of any thread)
conflicting with r and occurring before r in 𝜏, i.e., for each wM′′ ∈ W𝑀(𝜏) such that wM′′ ⋈ r
and wM′′ <𝜏 r, we have wM′′ ≤𝜏 wM′.
Notice how relaxed memory comes into play in the above definition, as RF𝜏 (r) does not record
which of the two above cases actually happened.
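The definition of RF𝜏 can be followed step by step with a single pass over the sequence. The sketch below is an illustration under assumptions: events are encoded as hypothetical tuples `(kind, thread, var, ident)`, a two-phase write is identified by the `ident` shared between its buffer-write and memory-write, and a read with no earlier write yields `None` (the paper instead assumes every read has a write to observe).

```python
def reads_from(tau):
    """Compute the reads-from map of a sequence under buffer semantics.

    tau: list of ("wB" | "wM" | "r", thread, var, ident) tuples, where the
         set of events is proper (each wB has its wM in tau and vice versa).
    Returns a dict: read ident -> write ident (or None if nothing to read).
    """
    pending = {}    # thread -> buffered (var, write ident), oldest first
    last_mem = {}   # var -> ident of the latest memory-write so far
    rf = {}
    for kind, thread, var, ident in tau:
        if kind == "wB":
            pending.setdefault(thread, []).append((var, ident))
        elif kind == "wM":
            pending[thread].remove((var, ident))
            last_mem[var] = ident
        elif kind == "r":
            # Upd: own writes on var that are buffered but not yet dequeued.
            upd = [w for x, w in pending.get(thread, []) if x == var]
            rf[ident] = upd[-1] if upd else last_mem.get(var)
    return rf


# Thread 1 reads its own write from the buffer (first case) ...
tau_a = [("wB", 1, "x", "w1"), ("r", 1, "x", "r1"),
         ("wM", 1, "x", "w1"), ("r", 2, "x", "r2")]
# ... or from shared memory (second case); the resulting map is the same.
tau_b = [("wB", 1, "x", "w1"), ("wM", 1, "x", "w1"),
         ("r", 1, "x", "r1"), ("r", 2, "x", "r2")]
```

Both sequences produce the same map, which matches the remark that RF𝜏 does not record which of the two cases occurred.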
Partial Store Order and Sequential Consistency. The memory model of Partial Store Order
(PSO) is more relaxed than TSO. On the operational level, each thread is equipped with a store
buffer for each global variable, rather than a single buffer for all global variables. Then, at any
point during execution, a thread can non-deterministically dequeue and perform the oldest update
from any of its nonempty store buffers. The notions of events, traces and reads-from functions
remain the same for PSO as defined for TSO. The Sequential Consistency (SC) memory model can
be simply thought of as a model where each thread flushes its buffer immediately after a write
event, e.g., by using a fence.
Concurrent program semantics. The semantics of 𝒫 are defined by means of a transition
system over a state space of global states. A global state consists of (i) a memory function that
maps every global variable to a value, (ii) a local state for each thread, which contains the values
of the local variables of the thread, and (iii) a local state for each store buffer, which captures the
contents of the queue. We consider the standard setting with the TSO/PSO memory model, and
refer to Abdulla et al. [2015] for formal details. As usual in stateless model checking, we focus on
concurrent programs with acyclic state spaces.
Reads-from trace partitioning. Given a concurrent program 𝒫 and a memory model M ∈
{SC, TSO, PSO}, we denote by TM the set of maximal traces of the program 𝒫 under the respective
memory model. We call two traces 𝜎1 and 𝜎2 reads-from equivalent if E(𝜎1) = E(𝜎2) and RF𝜎1 = RF𝜎2.
The corresponding reads-from equivalence ∼RF partitions the trace space into equivalence classes
TM/∼RF and we call this the reads-from partitioning (or RF partitioning). Traces in the same
class of the RF partitioning visit the same set of local states in each thread, and thus the RF
partitioning is a sound partitioning for local state reachability [Abdulla et al. 2019; Chalupa et al.
2017; Kokologiannakis et al. 2019b].
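To see how much coarser the RF partitioning can be, one can count classes on a tiny program. The sketch below is a simplification made for brevity: it works under SC with atomic writes rather than TSO/PSO, the event encoding `(thread, kind, var, index)` and all function names are hypothetical, and Mazurkiewicz classes are identified by the order of cross-thread conflicting pairs.

```python
from itertools import combinations


def interleavings(t1, t2):
    """Yield every interleaving of two threads, keeping each thread's order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]


def mazurkiewicz_key(trace):
    """Order of all cross-thread conflicting pairs (same var, one is a write)."""
    key = set()
    for i, a in enumerate(trace):
        for b in trace[i + 1:]:
            if a[0] != b[0] and a[2] == b[2] and "w" in (a[1], b[1]):
                key.add((a, b))
    return frozenset(key)


def rf_key(trace):
    """Reads-from map under SC: each read observes the latest earlier write."""
    last, rf = {}, {}
    for e in trace:
        if e[1] == "w":
            last[e[2]] = e
        else:
            rf[e] = last.get(e[2])
    return frozenset(rf.items())


# Thread 1: two writes to x, then a read of x. Thread 2: two writes to x.
t1 = [(1, "w", "x", 0), (1, "w", "x", 1), (1, "r", "x", 2)]
t2 = [(2, "w", "x", 0), (2, "w", "x", 1)]
traces = list(interleavings(t1, t2))
mazurkiewicz_classes = len({mazurkiewicz_key(t) for t in traces})
rf_classes = len({rf_key(t) for t in traces})
```

In this example every interleaving is its own Mazurkiewicz class, whereas the RF classes only distinguish which write the single read observes.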
2.2 Partial Orders
Here we present relevant notation around partial orders.
Partial orders. Given a set of events 𝑋 ⊆ E, a (strict) partial order 𝑃 over 𝑋 is an irreflexive,
antisymmetric and transitive relation over 𝑋 (i.e., <𝑃 ⊆ 𝑋 × 𝑋 ). Given two events 𝑒1, 𝑒2 ∈ 𝑋 , we
write 𝑒1 ≤𝑃 𝑒2 to denote that 𝑒1 <𝑃 𝑒2 or 𝑒1 = 𝑒2. Two distinct events 𝑒1, 𝑒2 ∈ 𝑋 are unordered by 𝑃 ,
denoted 𝑒1 ∥𝑃 𝑒2, if neither 𝑒1 <𝑃 𝑒2 nor 𝑒2 <𝑃 𝑒1, and ordered (denoted 𝑒1 ∦𝑃 𝑒2) otherwise. Given
a set 𝑌 ⊆ 𝑋 , we denote by 𝑃 |𝑌 the projection of 𝑃 on the set 𝑌 , where for every pair of events
𝑒1, 𝑒2 ∈ 𝑌 , we have that 𝑒1 <𝑃 |𝑌 𝑒2 iff 𝑒1 <𝑃 𝑒2. Given two partial orders 𝑃 and 𝑄 over a common
set 𝑋 , we say that 𝑄 refines 𝑃 , denoted by 𝑄 ⊑ 𝑃 , if for every pair of events 𝑒1, 𝑒2 ∈ 𝑋 , if 𝑒1 <𝑃 𝑒2
then 𝑒1 <𝑄 𝑒2. A linearization of 𝑃 is a total order that refines 𝑃 .
Lower sets. Given a pair (𝑋, 𝑃), where 𝑋 is a set of events and 𝑃 is a partial order over 𝑋 , a
lower set of (𝑋, 𝑃) is a set 𝑌 ⊆ 𝑋 such that for every event 𝑒1 ∈ 𝑌 and event 𝑒2 ∈ 𝑋 such that
𝑒2 ≤𝑃 𝑒1, we have 𝑒2 ∈ 𝑌 .
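The notions of lower set and linearization translate directly into short checks. This is a minimal sketch under assumptions: the partial order is given as a hypothetical set `less` of pairs `(a, b)` meaning a <𝑃 b over the events of 𝑋, and the function names are illustrative.

```python
def is_lower_set(Y, X, less):
    """Y is a lower set of (X, P) if it is closed under P-predecessors in X.

    Checking the given pairs suffices even if `less` lists only a generating
    relation: closure under direct predecessors implies closure under all.
    """
    return Y <= X and all(a in Y for (a, b) in less if b in Y and a in X)


def is_linearization(seq, X, less):
    """seq is a total order on exactly the events of X that respects P."""
    pos = {e: i for i, e in enumerate(seq)}
    return (len(seq) == len(X) and set(seq) == X
            and all(pos[a] < pos[b] for (a, b) in less))


# Two events of one thread ordered a1 < a2, plus an unrelated event b1.
X = {"a1", "a2", "b1"}
less = {("a1", "a2")}
```

With these inputs, `{"a1", "b1"}` is a lower set while `{"a2"}` is not, since it misses the predecessor `a1`.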
The program order PO. The program order PO of 𝒫 is a partial order <PO ⊆ E × E that defines
a fixed order between some pairs of events of the same thread. Given any (concrete) trace 𝜎 and
thread thr, the buffer-writes, reads, and fences of thr that appear in 𝜎 are fully ordered in PO the
same way as they are ordered in 𝜎 . Further, for each thread thr, the program order PO satisfies the
following conditions:
• wB <PO wM for each (wB,wM) ∈ Wthr.
• wB <PO fnc iff wM <PO fnc for each (wB,wM) ∈ Wthr and fence event fnc ∈ Fthr.
• wB1 <PO wB2 iff wM1 <PO wM2 for each (wB𝑖 ,wM𝑖 ) ∈ Wthr, 𝑖 ∈ {1, 2}. In PSO, this condition
is enforced only when var((wB1,wM1)) = var((wB2,wM2)).
A sequence 𝜏 is well-formed if it respects the program order, i.e., 𝜏 ⊑ PO|E(𝜏). Naturally, every
trace 𝜎 is well-formed, as it corresponds to a concrete valid program execution.
3 SUMMARY OF RESULTS
Here we present formally the main results of this paper. In later sections we present the details,
algorithms and examples. The proofs appear in the appendix of Bui et al. [2021].
Verifying execution consistency for TSO and PSO. Our first set of results and the main
contribution of this paper is on the problems VTSO-rf and VPSO-rf for verifying TSO- and PSO-
consistent executions, respectively. The corresponding problem VSC-rf for Sequential Consistency
(SC) was recently shown to be in polynomial time for a constant number of threads [Abdulla et al.
2019; Biswas and Enea 2019]. The solution for SC is obtained by essentially enumerating all the
$n^{k}$ possible lower sets of the program order (𝑋, PO), where 𝑘 is the number of threads, and hence
yields a polynomial when 𝑘 = 𝑂(1). For TSO, the number of possible lower sets is $n^{2 \cdot k}$, since there
are 𝑘 threads and 𝑘 buffers (one for each thread). For PSO, the number of possible lower sets is
$n^{k \cdot (d+1)}$, where 𝑑 is the number of variables, since there are 𝑘 threads and 𝑘 · 𝑑 buffers (𝑑 buffers for
each thread). Hence, following an approach similar to Abdulla et al. [2019] and Biswas and Enea [2019]
would yield a running time of a polynomial with degree 2 · 𝑘 for TSO, and with degree 𝑘 · (𝑑 + 1)
for PSO (thus the solution for PSO is not polynomial-time even when the number of threads is
bounded). In this work we show that both problems can be solved significantly faster.
Theorem 3.1. VTSO-rf for 𝑛 events and 𝑘 threads is solvable in $O(k \cdot n^{k+1})$ time.
Theorem 3.2. VPSO-rf for 𝑛 events, 𝑘 threads and 𝑑 variables is solvable in
$O(k \cdot n^{k+1} \cdot \min(n^{k \cdot (k-1)}, 2^{k \cdot d}))$ time. Moreover, if there are no fences, the problem is solvable in $O(k \cdot n^{k+1})$ time.
Novelty. For TSO, Theorem 3.1 yields an improvement of order $n^{k-1}$ compared to the naive $n^{2 \cdot k}$
bound. For PSO, perhaps surprisingly, the first upper-bound of Theorem 3.2 does not depend on
the number of variables. Moreover, when there are no fences, the cost for PSO is the same as for
TSO (with or without fences).
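To get a feeling for the gap between these bounds, the following worked arithmetic plugs in sample values. The numbers 𝑛 = 100, 𝑘 = 3 and 𝑑 = 10 are chosen purely for illustration and do not come from the paper's experiments; constants hidden by the 𝑂-notation are ignored.

```latex
% Illustrative values only: n = 100 events, k = 3 threads, d = 10 variables.
\begin{align*}
\text{naive TSO enumeration:} \quad & n^{2 \cdot k} = 100^{6} = 10^{12} \\
\text{Theorem 3.1 (TSO):} \quad & k \cdot n^{k+1} = 3 \cdot 100^{4} = 3 \cdot 10^{8} \\
\text{naive PSO enumeration:} \quad & n^{k \cdot (d+1)} = 100^{33} = 10^{66} \\
\text{Theorem 3.2 (PSO):} \quad & k \cdot n^{k+1} \cdot \min\bigl(n^{k \cdot (k-1)},\, 2^{k \cdot d}\bigr)
  = 3 \cdot 10^{8} \cdot \min\bigl(10^{12},\, 2^{30}\bigr) \approx 3.2 \cdot 10^{17}
\end{align*}
```

In this instance the $2^{k \cdot d}$ term is the smaller one; for programs with many variables the $n^{k \cdot (k-1)}$ term, which does not depend on 𝑑, would take over.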
Stateless Model Checking for TSO and PSO. Our second result concerns stateless model
checking (SMC) under TSO and PSO. We introduce an SMC algorithm RF-SMC that explores the
RF partitioning in the TSO and PSO settings, as stated in the following theorem.
Theorem 3.3. Consider a concurrent program 𝒫 with 𝑘 threads and 𝑑 variables, under a memory
model M ∈ {TSO, PSO} with trace space TM and 𝑛 being the number of events of the longest trace in
TM. RF-SMC is a sound, complete and exploration-optimal algorithm for local state reachability in 𝒫,
i.e., it explores only maximal traces and visits each class of the RF partitioning exactly once. The time
complexity is 𝑂(𝛼 · |TM/∼RF|), where
(1) 𝛼 = $n^{O(k)}$ under M = TSO, and
(2) 𝛼 = $n^{O(k^{2})}$ under M = PSO.
An algorithm with RF exploration-optimality in SC is presented by Abdulla et al. [2019]. Our
RF-SMC algorithm generalizes the above approach to achieve RF exploration-optimality in the
relaxed memory models TSO and PSO. Further, the time complexity of RF-SMC per class of RF
partitioning is equal between PSO and TSO for programs with no fence instructions.
RF-SMC uses the verification algorithms developed in Theorem 3.1 and Theorem 3.2 as
black-boxes to decide whether any specific class of the RF partitioning is TSO- or PSO-consistent,
respectively. We remark that these theorems can potentially be used as black-boxes to other SMC
algorithms that explore the RF partitioning (e.g., Chalupa et al. [2017]; Kokologiannakis et al.
[2019b]; Kokologiannakis and Vafeiadis [2020]).
4 VERIFYING TSO AND PSO EXECUTIONS WITH A READS-FROM FUNCTION
In this section we tackle the verification problems VTSO-rf and VPSO-rf. In each case, the input is
a pair (𝑋,RF), where 𝑋 is a proper set of events of 𝒫, and RF : R(𝑋 ) → W(𝑋 ) is a reads-from
function. The task is to decide whether there exists a trace 𝜎 that is a linearization of (𝑋, PO)
with RF𝜎 = RF, where RF𝜎 is with respect to the TSO/PSO memory semantics. In case such 𝜎 exists, we say that
(𝑋,RF) is realizable and 𝜎 is its witness trace. We first define some relevant notation, and then
establish upper bounds for VTSO-rf and VPSO-rf, i.e., Theorem 3.1 and Theorem 3.2.
Held variables. Given a trace 𝜎 and a memory-write wM ∈ W𝑀 (𝜎) present in the trace, we
say that wM holds variable 𝑥 = var(wM) in 𝜎 if the following hold.
(1) wM is the last memory-write event of 𝜎 on variable 𝑥 .
(2) There exists a read event r ∈ 𝑋 \ E(𝜎) such that RF(r) = (_,wM).
We similarly say that the thread thr(wM) holds 𝑥 in 𝜎 . Finally, a variable 𝑥 is held in 𝜎 if it is held by
some thread in 𝜎 . Intuitively, wM holds 𝑥 until all reads that need to read-from wM get executed.
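The definition of held variables can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the paper: events are encoded as hypothetical tuples (kind, thread, variable, name), and the reads-from function is a dictionary from reads to (buffer-write, memory-write) pairs.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding (an assumption of this sketch, not the paper's):
# an event is (kind, thread, variable, name), kind in {"wB", "wM", "r", "fnc"}.
Event = Tuple[str, str, str, str]


def held_variables(trace: List[Event], X: Set[Event],
                   RF: Dict[Event, Tuple[Event, Event]]) -> Dict[str, Event]:
    """Map every variable held in `trace` to the memory-write holding it."""
    executed = set(trace)
    last_wm: Dict[str, Event] = {}
    for e in trace:
        if e[0] == "wM":
            last_wm[e[2]] = e          # condition (1): last memory-write on the variable
    held: Dict[str, Event] = {}
    for r, (_wb, wm) in RF.items():
        # condition (2): some read outside the trace still has to read from wm
        if r in X and r not in executed and last_wm.get(wm[2]) == wm:
            held[wm[2]] = wm
    return held
```

In this sketch a variable stops being held as soon as the last read that depends on the holding memory-write has been appended to the trace.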
Witness prefixes. Throughout this section, we use the notion of witness prefixes. Formally,
a witness prefix is a trace 𝜎 that can be extended to a trace 𝜎∗ that realizes (𝑋,RF), under the
respective memory model. Our algorithms for VTSO-rf and VPSO-rf operate by constructing traces
𝜎 such that if (𝑋,RF) is realizable, then 𝜎 is a witness prefix that can be extended with the remaining
events and finally realize (𝑋,RF).
Throughout, we assume wlog that whenever RF(r) = (wB,wM) with thr(r) = thr(wB), then
wB is the last buffer-write on var(wB) before r in their respective thread. Clearly, if this condition
does not hold, then the corresponding pair (𝑋,RF) is realizable in neither TSO nor PSO.
4.1 Verifying TSO Executions
In this section we establish Theorem 3.1, i.e., we present an algorithm VerifyTSO that solves
VTSO-rf in 𝑂(𝑘 · 𝑛^{𝑘+1}) time. The algorithm relies crucially on the notion of TSO-executable events,
defined below. Throughout this section we consider fixed an instance (𝑋,RF) of VTSO-rf, and all
traces 𝜎 considered in this section are such that E(𝜎) ⊆ 𝑋 .
TSO-executable events. Consider a trace 𝜎 . An event 𝑒 ∈ 𝑋 \ E(𝜎) is TSO-executable (or
executable for short) in 𝜎 if E(𝜎) ∪ {𝑒} is a lower set of (𝑋, PO) and the following conditions hold.
(1) If 𝑒 is a read event r, let RF(r) = (wB,wM). If thr(r) ≠ thr(wM), then wM ∈ 𝜎 .
(2) If 𝑒 is a memory-write event wM then the following hold.
(a) Variable var(wM) is not held in 𝜎 .
(b) Let r ∈ R(𝑋 ) be an arbitrary read with RF(r) = (wB,wM) and thr(r) ≠ thr(wM). For each
two-phase write (wB′,wM′) with var(r) = var(wB′) and wB′ <PO r, we have wM′ ∈ 𝜎 .
(a) The reads r1 and r4 are TSO-executable. The read r2 is not TSO-executable, because E(𝜎) ∪ {r2} is not a lower set; neither is the
(b) The memory-write wM4 is TSO-executable. The other memory-writes are not: E(𝜎) ∪ {wM3} is not a lower set, and for wM5 resp. wM6, the blue dotted arrows show the events that they have to wait for, because of Item 2a resp. Item 2b (some buffer-writes are not displayed here for brevity).
Fig. 2. TSO-executability. The already executed events (i.e., E(𝜎)) are in the gray zone, the remaining events
are outside the gray zone. The buffer threads are gray and thin, the main threads are black and thick.
Intuitively, the conditions of executable events ensure that executing an event does not immediately
create an invalid witness prefix. The lower-set condition ensures that the program order PO
is respected. This is a sufficient condition for a buffer-write or a fence (in particular, for a fence
this implies that the respective buffer is currently empty). The extra condition for a read ensures
that its reads-from constraint is satisfied. The extra conditions for a memory-write prevent it from
causing some reads-from constraint to become unsatisfiable.
Figure 2 illustrates the notion of TSO-executability on several examples. Observe that if 𝜎 is
a valid trace, extending 𝜎 with an executable event (i.e., 𝜎 ◦ 𝑒) also yields a valid trace that is
well-formed, as, by definition, E(𝜎) ∪ {𝑒} is a lower set of (𝑋, PO).
Algorithm VerifyTSO. We are now ready to describe our algorithm VerifyTSO for the problem
VTSO-rf. At a high level, the algorithm enumerates all lower sets of (W𝑀(𝑋), PO) by constructing
a trace 𝜎 with W𝑀(𝜎) = 𝑌 for every lower set 𝑌 of (W𝑀(𝑋), PO). The crux of the algorithm is to
maintain the following. Each constructed trace 𝜎 is maximal in the set of thread events, among
all witness prefixes with the same set of memory-writes. That is, for every witness prefix 𝜎′ with
W𝑀(𝜎′) = W𝑀(𝜎), we have that L(𝜎) ⊇ L(𝜎′). Thus, the algorithm will only explore 𝑛^𝑘 traces,
as opposed to 𝑛^{2·𝑘} from a naive enumeration of all lower sets of (𝑋, PO).
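The gap between the two counts can be made explicit with a short calculation. The sketch below is only an upper-bound argument; the symbols m_i (number of memory-writes of thread i) and t_i (number of thread events of thread i) are introduced here for illustration and do not appear in the paper.

```latex
% A lower set of a disjoint union of chains is a choice of one prefix per chain.
% Under TSO, (W^M(X), PO) consists of k chains, one per buffer thread:
\[
  \#\{\text{lower sets of } (W^M(X), PO)\}
  \;\le\; \prod_{i=1}^{k} (m_i + 1)
  \;\le\; (n + 1)^{k},
  \qquad \sum_{i=1}^{k} m_i \le n .
\]
% In (X, PO) every thread contributes two chains (thread events and buffer),
% and cross-chain orderings can only remove lower sets:
\[
  \#\{\text{lower sets of } (X, PO)\}
  \;\le\; \prod_{i=1}^{k} (t_i + 1)(m_i + 1)
  \;\le\; \Bigl(\tfrac{n}{2k} + 1\Bigr)^{2k},
  \qquad \sum_{i=1}^{k} (t_i + m_i) = n .
\]
```

The second bound uses the AM-GM inequality on the 2𝑘 factors; both estimates match the 𝑛^𝑘 and 𝑛^{2·𝑘} orders stated above up to lower-order terms.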
The formal description of VerifyTSO is in Algorithm 1. The algorithm maintains a worklist S
of prefixes and a set Done of already-explored lower sets of (W𝑀 (𝑋 ), PO). In each iteration, the
Line 4 loop makes the prefix maximal in the thread events, then Line 6 checks if we are done,
otherwise the loop in Line 7 enumerates the executable memory-writes to extend the prefix with.
Algorithm 1: VerifyTSO
Input: An event set 𝑋 and a reads-from function RF : R(𝑋) → W(𝑋)
Output: A witness 𝜎 that realizes (𝑋,RF) if (𝑋,RF) is realizable under TSO, else ⊥
1  S ← {𝜖}; Done ← {∅}
2  while S ≠ ∅ do
3    Extract a trace 𝜎 from S
4    while ∃ thread event 𝑒 TSO-executable in 𝜎 do
5      𝜎 ← 𝜎 ◦ 𝑒 // Execute the thread event 𝑒
6    if E(𝜎) = 𝑋 then return 𝜎 // Witness found
7    foreach memory-write wM that is TSO-executable in 𝜎 do
8      𝜎wM ← 𝜎 ◦ wM // Execute wM
9      if ∄𝜎′ ∈ Done s.t. W𝑀(𝜎wM) = W𝑀(𝜎′) then
10       Insert 𝜎wM in S and in Done // Continue from 𝜎wM
11 return ⊥
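To make the control flow of Algorithm 1 concrete, the following is a simplified Python sketch of it. It is not the Nidhugg implementation: the program encoding (`prog`, `rf`, the `("T", …)`/`("M", …)` event names) is hypothetical, it assumes the wlog condition on local reads already holds, and it recomputes the state from scratch at every step, so it does not attain the bound of Theorem 3.1.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding (assumption of this sketch): prog[t] lists the thread
# events of t as (kind, var) with kind "w" (buffer-write), "r" (read), "f" (fence).
# ("T", t, i) is the i-th thread event of t; ("M", t, i) is the memory-write of
# buffer-write ("T", t, i). rf[(t, i)] = (t2, j): the read reads from write j of t2.
Ev = Tuple[str, str, int]
Prog = Dict[str, List[Tuple[str, str]]]
RF = Dict[Tuple[str, int], Tuple[str, int]]


def verify_tso(prog: Prog, rf: RF) -> Optional[List[Ev]]:
    writes = {t: [i for i, (k, _) in enumerate(es) if k == "w"] for t, es in prog.items()}
    total = sum(len(es) for es in prog.values()) + sum(len(w) for w in writes.values())

    def counts(sigma: List[Ev], tag: str) -> Dict[str, int]:
        return {t: sum(1 for e in sigma if e[0] == tag and e[1] == t) for t in prog}

    def held(sigma: List[Ev]) -> set:
        done, last = set(sigma), {}
        for e in sigma:
            if e[0] == "M":
                last[prog[e[1]][e[2]][1]] = (e[1], e[2])
        return {x for x, w in last.items()
                if any(src == w and ("T",) + r not in done for r, src in rf.items())}

    def next_thread_event(sigma: List[Ev]) -> Optional[Ev]:
        done, nxt, fl = set(sigma), counts(sigma, "T"), counts(sigma, "M")
        for t, es in prog.items():
            i = nxt[t]
            if i == len(es):
                continue
            kind = es[i][0]
            if kind == "f" and fl[t] < sum(1 for j in writes[t] if j < i):
                continue                       # lower set: the buffer must be empty
            if kind == "r":
                wt, wi = rf[(t, i)]
                if wt != t and ("M", wt, wi) not in done:
                    continue                   # condition (1): remote write not yet in memory
            return ("T", t, i)
        return None

    def executable_mem_writes(sigma: List[Ev]) -> List[Ev]:
        done, nxt, fl, h = set(sigma), counts(sigma, "T"), counts(sigma, "M"), held(sigma)
        out = []
        for t in prog:
            if fl[t] == len(writes[t]):
                continue
            i = writes[t][fl[t]]               # FIFO buffer: next write to flush
            x = prog[t][i][1]
            if i >= nxt[t] or x in h:          # buffer-write missing, or condition (2a)
                continue
            blocked = any(                     # condition (2b)
                src == (t, i) and rt != t and j < ri
                and prog[rt][j][1] == x and ("M", rt, j) not in done
                for (rt, ri), src in rf.items() for j in writes[rt])
            if not blocked:
                out.append(("M", t, i))
        return out

    work: List[List[Ev]] = [[]]
    seen = {frozenset()}                       # explored sets of memory-writes
    while work:
        sigma = work.pop()
        while True:                            # make sigma maximal on thread events
            e = next_thread_event(sigma)
            if e is None:
                break
            sigma = sigma + [e]
        if len(sigma) == total:
            return sigma                       # witness found
        for wm in executable_mem_writes(sigma):
            ext = sigma + [wm]
            key = frozenset(ev for ev in ext if ev[0] == "M")
            if key not in seen:
                seen.add(key)
                work.append(ext)
    return None
```

As a usage example, a store-buffering program with an initializing thread (both reads observe the initial writes) is realizable under TSO in this sketch, whereas a thread that reads write a, then write b, then write a again on the same variable is not.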
We now provide the insights behind the correctness of VerifyTSO. The correctness proof has
two components: (i) soundness and (ii) completeness, which we present below.
Soundness. The soundness follows directly from the definition of TSO-executable events. In
particular, when the algorithm extends a trace 𝜎 with a read r, where RF(r) = (wB,wM), the
following hold.
(1) If thr(r) ≠ thr(wB), then wM ∈ 𝜎 , since r became executable. Moreover, when wM appeared in
𝜎 , the variable 𝑥 = var(wM) became held by wM, and remained held at least until the current
step where r is executed. Hence, no other memory-write wM′ with var(wM′) = 𝑥 could have
become executable in the meantime, to violate the observation of r. Moreover, r cannot read-from
a local buffer write wB′ with var(wB′) = 𝑥 , as by definition, when wM became executable, all
buffer-writes on 𝑥 that are local to r and precede r must have been flushed to the main memory
(i.e., wM′ must have also appeared in the trace).
(2) If thr(r) = thr(wB), then either wM has not appeared already in 𝜎 , in which case r reads-from
wB from its local buffer, or wM has appeared in the trace and held its variable until r is executed,
as in the previous item.
Completeness. Let 𝜎′ be an arbitrary witness prefix. VerifyTSO constructs a trace 𝜎 such that
W𝑀(𝜎) = W𝑀(𝜎′) and L(𝜎) ⊇ L(𝜎′). This is because VerifyTSO constructs for every lower set
𝑌 of (W𝑀(𝑋), PO) a single representative trace 𝜎 with W𝑀(𝜎) = 𝑌 . The key is to make 𝜎 maximal
on the thread events, i.e., L(𝜎) ⊇ L(𝜎′) for any witness prefix 𝜎′ with W𝑀(𝜎′) = W𝑀(𝜎), and
thus any memory-write wM that is executable in 𝜎′ is also executable in 𝜎 .
We now present the above insight in detail. Indeed, if wM is not executable in 𝜎 , one of the
following holds. Let var(wM) = 𝑥 .
(1) 𝑥 is already held in 𝜎 . But sinceW𝑀 (𝜎 ′) =W𝑀 (𝜎) and any read of 𝜎 ′ also appears in 𝜎 , the
variable 𝑥 is also held in 𝜎 ′, thus wM is not executable in 𝜎 ′ either.
(2) There is a later read r ∉ 𝜎 that must read-from wM, but r is preceded by a local write (wB′,wM′)
(i.e., wB′ <PO r) also on 𝑥 , for which wM′ ∉ 𝜎 . Since L(𝜎) ⊇ L(𝜎′), we have r ∉ 𝜎′, and as
W𝑀(𝜎′) = W𝑀(𝜎), also wM′ ∉ 𝜎′. Thus wM is also not executable in 𝜎′.
The final insight is on how the algorithm maintains the maximality invariant as it extends 𝜎 with
new events. This holds because read events become executable as soon as their corresponding
remote observation wM appears in the trace, and hence all such reads are executable for a given
lower set of (W𝑀 (𝑋 ), PO). All other thread events are executable without any further conditions.
Figure 3 illustrates the intuition behind the maximality invariant.
Fig. 3. VerifyTSO maximality invariant. The gray zone shows the events of some witness prefix 𝜎′; the
lighter gray shows the events of the corresponding trace 𝜎 , constructed by the algorithm, which is maximal
on thread events. Yellow writes (wM2 and wM4) are those that are TSO-executable in 𝜎 but not in 𝜎′.
Green writes (wM3) and red writes (wM5) are TSO-executable and non-TSO-executable, respectively.
The following lemma states the correctness of VerifyTSO.
Lemma 4.1. (𝑋,RF) is realizable under TSO iff VerifyTSO returns a trace 𝜎 ≠ 𝜖 .
4.2 Verifying PSO Executions
In this section we show Theorem 3.2, i.e., we present an algorithm VerifyPSO that solves VPSO-rf
in 𝑂(𝑘 · 𝑛^{𝑘+1} · min(𝑛^{𝑘·(𝑘−1)}, 2^{𝑘·𝑑})) time, while the bound becomes 𝑂(𝑘 · 𝑛^{𝑘+1}) when there are no
fences. Similarly to the case of TSO, the algorithm relies on the notion of PSO-executable events,
defined below. We first introduce some relevant notation that makes our presentation simpler.
Spurious and pending writes. Consider a trace 𝜎 with E(𝜎) ⊆ 𝑋 . A memory-write wM ∈
W𝑀 (𝑋 ) is called spurious in 𝜎 if the following conditions hold.
(1) There is no read r ∈ R(𝑋 ) \ 𝜎 with RF(r) = (_,wM)
(informally, no remaining read wants to read-from wM).
(2) If wM ∈ 𝜎 , then for every read r ∈ 𝜎 with RF𝜎 (r) = (_,wM) we have r <𝜎 wM
(informally, reads in 𝜎 that read-from this write read it from the local buffer).
Note that if wM is a spurious memory-write in 𝜎 then wM is spurious in all extensions of 𝜎 . We
denote by SW𝑀 (𝜎) the set of memory-writes of 𝜎 that are spurious in 𝜎 . A memory-write wM
is pending in 𝜎 if wB ∈ 𝜎 and wM ∉ 𝜎 , where wB is the corresponding buffer-write of wM. We
denote by PW𝑀 (𝜎, thr) the set of all pending memory-writes wM in 𝜎 with thr(wM) = thr. See
Figure 4 for an intuitive illustration of spurious and pending memory-writes.
(a) Linearization where wM1 is spurious. The table shows the spurious and pending writes after each step.
(b) Linearization where wM1 is not spurious; here RF𝜎(r1) = (_,wM1) and wM1 <𝜎 r1.
Fig. 4. Illustration of spurious and pending writes.
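The definitions of spurious and pending memory-writes translate into two small checks. The sketch below uses a hypothetical tuple encoding of events and additionally assumes that the trace agrees with RF on the reads it contains, so that RF𝜎 can be replaced by RF in condition (2); neither assumption comes from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (kind, thread, name); a buffer-write and its
# memory-write share the same name.
Event = Tuple[str, str, str]


def pending_writes(trace: List[Event], thr: str) -> List[Event]:
    """Memory-writes of `thr` whose buffer-write is in the trace but which are not."""
    in_trace = set(trace)
    return [("wM", t, n) for (k, t, n) in trace
            if k == "wB" and t == thr and ("wM", t, n) not in in_trace]


def is_spurious(wm: Event, trace: List[Event], rf: Dict[Event, Event]) -> bool:
    """rf maps each read to the memory-write component of its reads-from pair."""
    pos = {e: i for i, e in enumerate(trace)}
    for r, src in rf.items():
        if src != wm:
            continue
        if r not in pos:
            return False                  # (1) a remaining read still wants wm
        if wm in pos and pos[wm] < pos[r]:
            return False                  # (2) a read observed wm from memory
    return True
```

The two orders checked in the test below correspond to the two linearizations of Figure 4: reading from the local buffer before the flush keeps the write spurious, reading after the flush does not.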
PSO-executable events. Similarly to the case of VTSO-rf, we define the notion of PSO-executable
events (executable for short). An event 𝑒 ∈ 𝑋 \ E(𝜎) is PSO-executable in 𝜎 if the
following conditions hold.
(1) If 𝑒 is a buffer-write or a memory-write, then the same conditions apply as for TSO-executable.
(2) If 𝑒 is a fence fnc, then every pending memory-write from thr(fnc) is PSO-executable in 𝜎 ,
and these memory-writes together with fnc and E(𝜎) form a lower set of (𝑋, PO).
(3) If 𝑒 is a read r, let RF(r) = (wB,wM). We have wB ∈ 𝜎 , and the following conditions.
(a) if thr(r) = thr(wB), then E(𝜎) ∪ {r} is a lower set of (𝑋, PO).
(b) if thr(r) ≠ thr(wB), then E(𝜎) ∪ {wM, r} is a lower set of (𝑋, PO)
and further either wM ∈ 𝜎 or wM is PSO-executable in 𝜎 .
Figure 5 illustrates several examples of PSO-(un)executable events. Similarly to the case of TSO,
the PSO-executable conditions ensure that we do not execute events creating an invalid witness
prefix. The executability conditions for PSO are different (e.g., there are extra conditions for a
fence), since our approach for VPSO-rf fundamentally differs from the approach for VTSO-rf.
Fig. 5. PSO-executability. The green events are PSO-executable; the red events are not. The memory-write
wM2(𝑥) is executable, and thus so are r1(𝑥) and fnc1. The memory-write wM3(𝑦) is not executable, as the
variable 𝑦 is held by wM1(𝑦) until r2(𝑦) is executed. Consequently, fnc2 and r3(𝑦) are not executable.
Fence maps. We define a fence map as a function FMap𝜎 : Threads × Threads → [𝑛] as follows.
First, FMap𝜎(thr, thr) = 0 for all thr ∈ Threads. In addition, if thr does not have a fence unexecuted
in 𝜎 (i.e., a fence fnc ∈ (𝑋thr \ E(𝜎))), then FMap𝜎(thr, thr′) = 0 for all thr′ ∈ Threads. Otherwise,
consider the set of all reads 𝐴thr,thr′ such that every r ∈ 𝐴thr,thr′ with RF(r) = (wB,wM) satisfies
the following conditions.
(1) thr(r) = thr′ and r ∉ 𝜎 .
(2) thr(wB) ∉ {thr, thr′}, and var(r) is held by wM in 𝜎 , and there is a pending memory write wM′
in 𝜎 with thr(wM′) = thr and var(wM′) = var(r).
If 𝐴thr,thr′ = ∅ then we let FMap𝜎(thr, thr′) = 0, otherwise FMap𝜎(thr, thr′) is the largest index
of a read in 𝐴thr,thr′ . Given two traces 𝜎1, 𝜎2, FMap𝜎1 ≤ FMap𝜎2 denotes that
FMap𝜎1(thr, thr′) ≤ FMap𝜎2(thr, thr′) for all thr, thr′ ∈ [𝑘].
The intuition behind fence maps is as follows. Given a trace 𝜎 , the index FMap𝜎(thr, thr′) points
to the latest (wrt PO) read r of thr′ that must be executed in any extension of 𝜎 before thr can
execute its next fence. This occurs because the following hold in 𝜎 .
(1) The variable var(r) is held by the memory-write wM ∈ 𝜎 with RF(r) = (_,wM).
(2) Thread thr has executed some buffer-write wB′ ∈ 𝜎 with var(wB′) = var(r) = var(wM), but the
corresponding memory-write wM′ has not yet been executed in 𝜎 . Hence, thr cannot flush its
buffers in any extension of 𝜎 that does not contain r (as wM′ will not become executable until r
gets executed).
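A single entry of the fence map can be computed directly from its definition. The sketch below is an illustration under assumptions not made in the paper: events are hypothetical dataclass records, a buffer-write and its memory-write share the same `idx`, the `idx` of a read is its 1-based index (so 0 can mean "no read"), and `rf` maps each read to the memory-write component of its reads-from pair.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Ev:
    kind: str   # "wB", "wM", "r" or "fnc" (hypothetical encoding)
    thr: str
    var: str    # "" for fences
    idx: int    # 1-based index; shared by a buffer-write and its memory-write


def fence_map(trace: List[Ev], X: Set[Ev], rf: Dict[Ev, Ev], thr: str, thr2: str) -> int:
    done = set(trace)
    has_fence = any(e.kind == "fnc" and e.thr == thr and e not in done for e in X)
    if thr == thr2 or not has_fence:
        return 0
    last_wm: Dict[str, Ev] = {}
    for e in trace:
        if e.kind == "wM":
            last_wm[e.var] = e
    # variables on which thr has a pending memory-write
    pending_vars = {e.var for e in trace
                    if e.kind == "wB" and e.thr == thr
                    and Ev("wM", e.thr, e.var, e.idx) not in done}
    best = 0
    for r, wm in rf.items():
        # r is unexecuted and reads from wm, so "var(r) is held by wm" reduces
        # to wm being the last memory-write on var(r) in the trace
        if (r.thr == thr2 and r not in done and wm.thr not in (thr, thr2)
                and last_wm.get(r.var) == wm and r.var in pending_vars):
            best = max(best, r.idx)
    return best
```

In the test scenario, thread t3 has buffered a write on 𝑥 and still has a fence to execute, while 𝑥 is held for a read of t2, so the entry for (t3, t2) points at that read until it is executed.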
The following lemmas state two key monotonicity properties of fence maps.
Lemma 4.2. Consider two witness prefixes 𝜎1, 𝜎2 such that 𝜎2 = 𝜎1 ◦wM for some memory-write
wM executable in 𝜎1. We have FMap𝜎1 ≤ FMap𝜎2 . Moreover, if wM is a spurious memory-write in 𝜎1,
then FMap𝜎1 = FMap𝜎2 .
Lemma 4.3. Consider two witness prefixes 𝜎1, 𝜎2 such that (i) L(𝜎1) = L(𝜎2), (ii) FMap𝜎1 ≤ FMap𝜎2 ,
and (iii) W𝑀(𝜎1) \ SW𝑀(𝜎1) ⊆ W𝑀(𝜎2). Let 𝑒 ∈ L(𝑋) be a thread event that is executable in 𝜎𝑖
for each 𝑖 ∈ [2], and let 𝜎′𝑖 = 𝜎𝑖 ◦ 𝑒 , for each 𝑖 ∈ [2]. Then FMap𝜎′1 ≤ FMap𝜎′2 .
Note that there exist in total at most 𝑛^{𝑘·𝑘} different fence maps. Further, the following lemma
gives a bound on the number of different fence maps among witness prefixes that contain the same
thread events.
Lemma 4.4. Let 𝑑 be the number of variables. There exist at most 2^{𝑘·𝑑} distinct witness prefixes 𝜎1, 𝜎2
such that L(𝜎1) = L(𝜎2) and FMap𝜎1 ≠ FMap𝜎2 .
Algorithm VerifyPSO. We are now ready to describe our algorithm VerifyPSO for the problem
VPSO-rf. At a high level, the algorithm enumerates all lower sets of (L(𝑋), PO), i.e., the lower sets
of the thread events. The crux of the algorithm is to guarantee that for every witness prefix 𝜎′, the
algorithm constructs a trace 𝜎 such that (i) L(𝜎) = L(𝜎′), (ii) W𝑀(𝜎) \ SW𝑀(𝜎) ⊆ W𝑀(𝜎′),
and (iii) FMap𝜎 ≤ FMap𝜎′ . To achieve this, for a given lower set 𝑌 of (L(𝑋), PO), the algorithm
examines at most as many traces 𝜎 with L(𝜎) = 𝑌 as the number of different fence maps of witness
prefixes with the same set of thread events. Hence, the algorithm examines significantly fewer
traces than the 𝑛^{𝑘·(𝑑+1)} lower sets of (𝑋, PO).
Algorithm 2 presents a formal description of VerifyPSO. The algorithm maintains a worklist S
of prefixes, and a set Done of explored pairs “(thread events, fence map)”. Consider an iteration
of the main loop in Line 2. First in the loop of Line 4 all spurious executable memory-writes are
executed. Then Line 6 checks whether the witness is complete. In case it is not complete, the loop
in Line 7 enumerates the possibilities to extend with a thread event. Crucially, the condition in
Line 16 ensures that there are no duplicates with the same pair “(thread events, fence map)”.
Algorithm 2: VerifyPSO
Input: An event set 𝑋 and a reads-from function RF : R(𝑋) → W(𝑋)
Output: A witness 𝜎 that realizes (𝑋,RF) if (𝑋,RF) is realizable under PSO, else 𝜎 = ⊥
1  S ← {𝜖}; Done ← {∅}
2  while S ≠ ∅ do
3    Extract a trace 𝜎 from S
4    while ∃ spurious wM PSO-executable in 𝜎 do
5      𝜎 ← 𝜎 ◦ wM // Flush spurious memory-write wM
6    if E(𝜎) = 𝑋 then return 𝜎 // Witness found
7    foreach thread event 𝑒 PSO-executable in 𝜎 do
8      Let 𝜎𝑒 ← 𝜎
9      if 𝑒 is a read event with RF(r) = (wB,wM) then
10       if thr(r) ≠ thr(wB) and wM ∉ 𝜎𝑒 then
11         𝜎𝑒 ← 𝜎𝑒 ◦ wM // Execute the reads-from of 𝑒
12     else if 𝑒 is a fence event then
13       Let 𝜇 ← any linearization of (PW𝑀(𝜎, thr(𝑒)), PO)
14       𝜎𝑒 ← 𝜎𝑒 ◦ 𝜇 // Execute pending memory writes
15     𝜎𝑒 ← 𝜎𝑒 ◦ 𝑒 // Finally, execute 𝑒
16     if ∄𝜎′ ∈ Done s.t. L(𝜎𝑒) = L(𝜎′) and FMap𝜎𝑒 = FMap𝜎′ then
17       Insert 𝜎𝑒 in S and in Done // Continue from 𝜎𝑒
18 return ⊥
Soundness. The soundness of VerifyPSO follows directly from the definition of PSO-executable
events, and is similar to the case of VerifyTSO.
Completeness. For each witness prefix 𝜎′, algorithm VerifyPSO generates a trace 𝜎 with
(i) L(𝜎) = L(𝜎′), (ii) W𝑀(𝜎) \ SW𝑀(𝜎) ⊆ W𝑀(𝜎′), and (iii) FMap𝜎 ≤ FMap𝜎′ . This fact
directly implies completeness, and it is achieved by the following key invariant. Consider that the
algorithm has constructed a trace 𝜎 , and is attempting to extend 𝜎 with a thread event 𝑒 . Further,
let 𝜎′ be an arbitrary witness prefix with (i) L(𝜎) = L(𝜎′), (ii) W𝑀(𝜎) \ SW𝑀(𝜎) ⊆ W𝑀(𝜎′),
and (iii) FMap𝜎 ≤ FMap𝜎′ . If 𝜎′ can be extended so that the next thread event is 𝑒 , then 𝑒 is also
executable in 𝜎 , and (by Lemma 4.2 and Lemma 4.3) the extension of 𝜎 with 𝑒 maintains the
invariant. In Figure 6 we provide an intuitive illustration of the completeness idea.
We now prove the argument in detail for the above 𝜎 , 𝜎′ and thread event 𝑒 . Assume that 𝜎′ ◦ 𝜅 ◦ 𝑒
is a witness prefix as well, for a sequence of memory-writes 𝜅. Consider the following cases.
(1) If 𝑒 is a read event, let w = (wB,wM) = RF(𝑒). If it is a local write (i.e., thr(w) = thr(𝑒)),
necessarily wB ∈ 𝜎 ′ ◦ 𝜅, and since the traces agree on thread events, we have wB ∈ 𝜎 ; thus
𝑒 is executable in 𝜎 . Otherwise, w is a remote write (i.e., thr(w) ≠ thr(𝑒)). Assume towards
contradiction that 𝑒 is not executable in 𝜎 ; this can happen in two cases.
In the first case, the variable 𝑥 = var(𝑒) is held by another (non-spurious) memory-write wM′
in 𝜎 . SinceW𝑀 (𝜎) \ SW𝑀 (𝜎) ⊆ W𝑀 (𝜎 ′), and L(𝜎) = L(𝜎 ′), the variable 𝑥 is also held by
wM′ in 𝜎 ′ ◦ 𝜅. But then, both wM and wM′ hold 𝑥 in 𝜎 ′ ◦ 𝜅, a contradiction.
In the second case, there is a write (wB′,wM′) with var(wM′) = var(𝑒) and wB′ <PO 𝑒 and
wM′ ∉ 𝜎 . If wM′ ∉ 𝜎′ ◦ 𝜅 , then 𝑒 would read-from wB′ from the buffer in 𝜎′ ◦ 𝜅 ◦ 𝑒 , contradicting
RF(𝑒) = (_,wM). Thus wM′ ∈ 𝜎′ ◦ 𝜅, and further wM ∈ 𝜎′ ◦ 𝜅 with wM′ <𝜎′◦𝜅 wM. Since
𝜎′ ◦ 𝜅 ◦ 𝑒 is a witness prefix and wB′ <PO 𝑒 , we have wB′ ∈ 𝜎′. From this and L(𝜎) = L(𝜎′)
we have that wB′ ∈ 𝜎 and wM′ is pending in 𝜎 . This together gives us that wM′ is spurious
in 𝜎 . Consider the earliest memory-write pending in 𝜎 on the same buffer (i.e., thr(wM′) and
var(wM′)), denote it wM′′. We have that wM′′ ≤PO wM′ and wM′′ is spurious in 𝜎 . Further,
wM′′ is executable in 𝜎 . But then it would have been added to 𝜎 in the while loop of Line 4, a
contradiction.
Fig. 6. VerifyPSO completeness idea. Consider the witness prefix 𝜎′ (lighter gray) and the corresponding
trace 𝜎 constructed by the algorithm (darker gray). The fence fnc1 is PSO-executable in 𝜎 but not in 𝜎′,
since in the latter, thr(fnc1) has non-empty buffers, but the variables 𝑥 and 𝑦 are held by wM1 and wM2,
respectively. This is equivalent to waiting until after r1 and r2 have been executed. Since executing r2 implies
having executed r1, the fence map FMap𝜎′(thr3, thr2) compresses this information by only pointing to r2.
(2) Assume that 𝑒 is a fence event, and let wM1, . . . ,wM𝑗 be the pending memory-writes of thr(𝑒) in
𝜎 . Suppose towards contradiction that 𝑒 is not executable. Then one of the wM𝑖 is not executable,
let 𝑥 = var(wM𝑖 ). Similarly to the above, there can be two cases where this might happen.
The first case is when wM𝑖 must be read-from by some read event r ∉ 𝜎 , but r is preceded by a
local write (wB,wM) (i.e., wB <PO r) on the same variable 𝑥 while wM ∉ 𝜎 . A similar analysis
to the previous case shows that the earliest pending write on thr(wM) for variable 𝑥 is spurious,
and thus already added to 𝜎 due to the while loop in Line 4, a contradiction.
The second case is when the variable 𝑥 is held in 𝜎 . Since FMap𝜎 ≤ FMap𝜎′ , the variable 𝑥 is also
held in 𝜎′, and thus wM𝑖 is not executable in 𝜎′ either. But then 𝜎′ ◦ 𝜅 ◦ 𝑒 cannot be a witness
prefix, a contradiction.
The following lemma states the correctness of VerifyPSO, which together with the complexity
argument establishes Theorem 3.2.
Lemma 4.5. (𝑋,RF) is realizable under PSO iff VerifyPSO returns a trace 𝜎 ≠ 𝜖 .
We conclude this section with some insights on the relationship between VTSO-rf and VPSO-rf.
Relation between TSO and PSO verification. At a high level, TSO might be perceived as a
special case of PSO, where every thread is equipped with one buffer (TSO) as opposed to one
buffer per global variable (PSO). However, the communication patterns between TSO and PSO are
drastically different. As a result, our algorithm VerifyPSO is not applicable to TSO, and we do not
see an extension of VerifyTSO for handling PSO efficiently. In particular, the minimal strategy of
VerifyPSO on memory-writes is based on the following observation: for a read r observing a remote
memory-write wM, it always suffices to execute wM exactly before executing r (unless wM has
already been executed). This holds because the corresponding buffer contains memory-writes only
on the same variable, and thus all such memory-writes that precede wM cannot be read-from by
any subsequent read. This property does not hold for TSO: as there is a single buffer, wM might be
executed as a result of flushing the buffer of thread thr(wM) to make another memory-write wM′
visible, on a different variable than var(wM), and thus wM′ might be observable by a subsequent
read. Hence the minimal strategy of VerifyPSO on memory-writes does not apply to TSO. On the
other hand, the maximal strategy of VerifyTSO is not effective for PSO, as it requires enumerating
all lower sets of (W𝑀(𝑋), PO), which are 𝑛^{𝑘·𝑑} many in PSO (where 𝑑 is the number of variables),
and thus this leads to worse bounds than the ones we achieve in Theorem 3.2.
4.3 Closure for VerifyTSO and VerifyPSO
In this section we introduce closure, a practical heuristic to efficiently detect whether a given
instance (𝑋,RF) of the verification problem VTSO-rf resp. VPSO-rf is unrealizable. Closure is
sound, meaning that a realizable instance (𝑋,RF) is never declared unrealizable by closure. Further,
closure is not complete, which means there exist unrealizable instances (𝑋,RF) not detected as
such by closure. Finally, closure can be computed in time polynomial with respect to the number of
events (i.e., size of 𝑋 ), irrespective of the underlying number of threads and variables.
Given an instance (𝑋,RF), any solution of VTSO-rf/VPSO-rf (𝑋,RF) respects PO|𝑋 , i.e., the
program order upon 𝑋 . Closure constructs the weakest partial order 𝑃 (𝑋 ) that refines the program
order (i.e., 𝑃 ⊑ PO|𝑋 ) and further satisfies for each read r ∈ R(𝑋 ) with RF(r) = (wB,wM):
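(1) If thr(r) ≠ thr(RF(r)), then (i) wM <𝑃 r and (ii) \overline{wM} <𝑃 wM for any (\overline{wB}, \overline{wM}) ∈ W(𝑋thr(r))
such that \overline{wM} ⋈ r and \overline{wB} <PO r.
(2) For any \overline{wM} ∈ W𝑀(𝑋≠thr(r)) such that \overline{wM} ⋈ r and \overline{wM} ≠ wM, \overline{wM} <𝑃 r implies \overline{wM} <𝑃 wM.
(3) For any \overline{wM} ∈ W𝑀(𝑋≠thr(r)) such that \overline{wM} ⋈ r and \overline{wM} ≠ wM, wM <𝑃 \overline{wM} implies r <𝑃 \overline{wM}.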
If no such 𝑃 exists, the instance VTSO-rf/VPSO-rf (𝑋,RF) provably has no solution. In case 𝑃
exists, each solution 𝜎 of VTSO-rf/VPSO-rf (𝑋,RF) provably respects 𝑃 (formally, 𝜎 ⊑ 𝑃 ).
(a) Rule Item 1. Both new orderings are necessary, as a reversal of either of them would “hide” wM
from r, making it impossible for r to read-from (wB,wM).
(b) Rule Item 2. The new ordering is necessary; its reversal would make \overline{wM} appear between
(wB,wM) and r, making it impossible for r to read-from (wB,wM).
(c) Rule Item 3. The new ordering is necessary; its reversal would make \overline{wM} appear between
(wB,wM) and r, making it impossible for r to read-from (wB,wM).
Fig. 7. Illustration of the three closure rules. In each example, the read r has to read-from the write (wB,wM),
i.e., RF(r) = (wB,wM). All depicted events are on the same variable (which is omitted for clarity). The gray
solid edges illustrate orderings already present in the partial order, and the red dashed edges illustrate the
resulting new orderings enforced by the specific rule.
The intuition behind closure is as follows. The construction starts with the program order PO|𝑋 ,
and then, utilizing the above rules Item 1, Item 2 and Item 3, it iteratively adds further event
orderings such that every witness execution provably has to follow the orderings. Consequently,
if the added orderings induce a cycle, this serves as a proof that there exists no witness of the
input instance (𝑋,RF). The rules Item 1, Item 2 and Item 3 can intuitively be thought of as simple
reasoning arguments why specific orderings have to be present in each witness of (𝑋,RF), and
Figure 7 provides an illustration of the rules.
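The iterative construction described above can be sketched as a naive fixpoint computation. The code below is not the construction algorithm used in the paper or in Nidhugg: event names are plain strings, the input dictionaries (`reads`, `mem_writes`, `local_before`) are a hypothetical encoding, and the transitive closure is recomputed from scratch, which is polynomial but far from efficient.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]


def transitive_closure(edges: Set[Pair]) -> Set[Pair]:
    closed = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closed), repeat=2):
            if b == c and (a, d) not in closed:
                closed.add((a, d))
                changed = True
    return closed


def closure(po: Set[Pair],
            reads: Dict[str, Tuple[str, str, str]],      # r -> (thread, var, wM it reads from)
            mem_writes: Dict[str, Tuple[str, str]],      # wM -> (thread, var)
            local_before: Dict[str, List[str]]) -> Optional[Set[Pair]]:
    """local_before[r]: memory-writes of r's own thread on var(r) whose
    buffer-write precedes r in program order (used by rule 1(ii))."""
    P = transitive_closure(po)
    while True:
        new: Set[Pair] = set()
        for r, (thr_r, var, wm) in reads.items():
            if mem_writes[wm][0] != thr_r:               # rule (1): remote observation
                new.add((wm, r))
                new.update((lb, wm) for lb in local_before.get(r, []))
            for other, (thr_o, var_o) in mem_writes.items():
                if thr_o == thr_r or var_o != var or other == wm:
                    continue
                if (other, r) in P:
                    new.add((other, wm))                 # rule (2)
                if (wm, other) in P:
                    new.add((r, other))                  # rule (3)
        if new <= P:
            break
        P = transitive_closure(P | new)
    # a cycle shows up as a reflexive pair after transitive closure
    return None if any(a == b for a, b in P) else P
```

A `None` result plays the role of "no closure exists", in which case the instance is unrealizable; otherwise the returned set of pairs is the partial order that every witness has to respect.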
We leverage the guarantees of closure by computing it before executing VerifyTSO resp.
VerifyPSO. If no closure 𝑃 of (𝑋,RF) exists, the algorithm VerifyTSO resp. VerifyPSO does not
need to be executed at all, as we already know that (𝑋,RF) is unrealizable. Otherwise we obtain
the closure 𝑃 , we execute VerifyTSO/VerifyPSO to search for a witness of (𝑋,RF), and we restrict
VerifyTSO/VerifyPSO to only consider prefixes 𝜎 ′ respecting 𝑃 (formally, 𝜎 ′ ⊑ 𝑃 |E(𝜎 ′)), since we
know that each solution of VTSO-rf/VPSO-rf (𝑋,RF) has to respect 𝑃 .
The notion of closure, its beneficial properties, as well as construction algorithms are well-studied
for the SC memory model [Abdulla et al. 2019; Chalupa et al. 2017; Pavlogiannis 2019].
Our conditions above extend this notion to TSO and PSO. Moreover, the closure we introduce here
is complete for concurrent programs with two threads, i.e., if 𝑃 exists then there is a valid trace
realizing (𝑋,RF) under the respective memory model.
4.4 Verifying Executions with Atomic Primitives
For clarity of presentation of the core algorithmic concepts, we have thus far neglected more
involved atomic operations, namely atomic read-modify-write (RMW) and atomic compare-and-swap
(CAS). In this section we show how our approach handles verification of TSO and PSO executions
that also include RMW and CAS operations. Importantly, our treatment retains
the complexity bounds established in Theorem 3.1 and Theorem 3.2.
Atomic instructions. We consider the concurrent program under the TSO resp. PSO memory
model, which can further atomically execute the following types of instructions.
(1) A read-modify-write instruction rmw executes atomically the following sequence. It (i) reads,
with respect to the TSO resp. PSO semantics, the value 𝑣 of a global variable 𝑥 ∈ G, then (ii) uses
𝑣 to compute a new value 𝑣 ′, and finally (iii) writes the new value 𝑣 ′ to the global variable 𝑥 . An
example of a typical rmw computation is fetch-and-add (resp. fetch-and-sub), where 𝑣 ′ = 𝑣 + 𝑐
for some positive (resp. negative) constant 𝑐 .
(2) A compare-and-swap instruction cas executes atomically the following sequence. It (i) reads,
with respect to the TSO resp. PSO semantics, the value 𝑣 of a global variable 𝑥 ∈ G, (ii) compares
it with a value 𝑐 , and (iii) if 𝑣 = 𝑐 then it writes a new value 𝑣 ′ to the global variable 𝑥 .
Each instruction of the above two types blocks (i.e., it cannot get executed) until the buffer of its
thread is empty (resp. all buffers of its thread are empty in PSO). Finally, the instruction specifies
the nature of its final write. This write is either enqueued into its respective buffer (to be dequeued
into shared memory at a later point), or it gets immediately flushed into the shared memory.
Atomic instructions modeling. In our approach we handle atomic RMW and CAS instructions
without introducing them as new event types. Instead, we model these instructions as sequences of
already considered events, i.e., reads, buffer-writes, memory-writes, and fences. We annotate some
events of an atomic instruction to constitute an atomic block, which intuitively indicates that the
event sequence of the atomic block cannot be interleaved with other events, thus respecting the
semantics of the instruction.
(1) A read-modify-write instruction rmw on a variable 𝑥 is modeled as a sequence of four events: (i)
a fence event, (ii) a read of 𝑥 , (iii) a buffer-write of 𝑥 , and (iv) a memory-write of 𝑥 . The read and
buffer-write events (ii)+(iii) are annotated as constituting an atomic block; in case the write of
rmw is specified to proceed immediately to the shared memory, the memory-write event (iv) is
also part of the atomic block.
(2) For a compare-and-swap instruction cas we consider separately the following two cases. A
successful cas (i.e., the write proceeds) is modeled the same way as a read-modify-write. A failed
cas (i.e., the write does not proceed) is modeled simply as a fence followed by a read, with no
atomic block.
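The modeling of the two instruction types can be written down as small constructors. The sketch below uses a hypothetical tuple encoding of events and hypothetical function names; it only mirrors the event sequences and atomic blocks described above.

```python
from typing import List, Tuple

Event = Tuple[str, str, str]   # hypothetical encoding: (kind, thread, variable)


def model_rmw(thr: str, var: str, flush_immediately: bool) -> Tuple[List[Event], List[Event]]:
    """Return (event sequence, atomic block) for a read-modify-write on `var`."""
    fnc, r = ("fnc", thr, ""), ("r", thr, var)
    wb, wm = ("wB", thr, var), ("wM", thr, var)
    # the read and buffer-write always form the block; the memory-write joins
    # it only when the write goes to shared memory immediately
    block = [r, wb] + ([wm] if flush_immediately else [])
    return [fnc, r, wb, wm], block


def model_cas(thr: str, var: str, succeeds: bool,
              flush_immediately: bool) -> Tuple[List[Event], List[Event]]:
    """A successful cas is modeled like an rmw; a failed one is fence + read."""
    if succeeds:
        return model_rmw(thr, var, flush_immediately)
    return [("fnc", thr, ""), ("r", thr, var)], []
```

Whether a cas succeeds is known here because the instance (𝑋,RF) describes one concrete execution; the sketch takes that outcome as an input flag.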
Executable atomic blocks. Here we describe the TSO- and PSO-executability conditions for
an atomic block. No further additions for executability are required, since no new event types are
introduced to handle RMW and CAS instructions.
Consider an instance (𝑋,RF) of VTSO-rf, and a trace 𝜎 with E(𝜎) ⊆ 𝑋 . An atomic block
containing a sequence of events 𝑒1, ..., 𝑒 𝑗 is TSO-executable in 𝜎 if:
(1) for each 1 ≤ 𝑖 ≤ 𝑗 we have that 𝑒𝑖 ∈ 𝑋 \ E(𝜎), and
(2) for each 1 ≤ 𝑖 ≤ 𝑗 we have that 𝑒𝑖 is TSO-executable in 𝜎 ◦ 𝑒1...𝑒𝑖−1.
Intuitively, an atomic block is TSO-executable if it can be executed as a sequence at once (i.e.,
without other events interleaved), and the TSO-executable conditions of each event (i.e., a read or a
buffer-write or a memory-write or a fence) within the block are respected.
The PSO-executable conditions are analogous. Given an instance (𝑋,RF) of VPSO-rf and a trace
𝜎 with E(𝜎) ⊆ 𝑋 , an atomic block of events 𝑒1, ..., 𝑒 𝑗 is PSO-executable in 𝜎 if:
(1) for each 1 ≤ 𝑖 ≤ 𝑗 we have that 𝑒𝑖 ∈ 𝑋 \ E(𝜎), and
(2) for each 1 ≤ 𝑖 ≤ 𝑗 we have that 𝑒𝑖 is PSO-executable in 𝜎 ◦ 𝑒1...𝑒𝑖−1.
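Both definitions follow the same pattern, so a single generic check suffices once a per-event executability predicate is available. In the sketch below the predicate `executable(e, prefix)` is a hypothetical parameter standing in for TSO- or PSO-executability; it is not defined here.

```python
from typing import Callable, Hashable, List, Sequence, Set


def block_executable(block: Sequence[Hashable], trace: List[Hashable], X: Set[Hashable],
                     executable: Callable[[Hashable, List[Hashable]], bool]) -> bool:
    executed = set(trace)
    # condition (1): every event of the block is in X and not yet executed
    if any(e not in X or e in executed for e in block):
        return False
    # condition (2): each e_i is executable after trace followed by e_1 ... e_{i-1}
    prefix = list(trace)
    for e in block:
        if not executable(e, prefix):
            return False
        prefix.append(e)
    return True
```

Passing the predicate as a parameter keeps the block check identical for both memory models, which matches the observation that no new event types are needed.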
Execution verification. Given the above executable conditions, the execution verification
algorithms VerifyTSO and VerifyPSO only require minor technical modifications to verify executions
including RMW and CAS instructions.
The core idea of the VerifyTSO resp. VerifyPSO modifications is to not extend prefixes with
single events that are part of some atomic block, and instead extend the atomic blocks fully. This
way, a lower set of (𝑋, PO) is considered only if for each atomic block, the block is either fully
present or fully not present in the lower set.
In VerifyTSO (Algorithm 1), in Line 4 we additionally consider each TSO-executable atomic block
𝑒1, ..., 𝑒𝑗 not containing any memory-write event, and then in Line 5 we extend the prefix with the
entire atomic block, i.e., 𝜎 ← 𝜎 ◦ 𝑒1, ..., 𝑒𝑗 . Similarly, in Line 7 we additionally consider each TSO-executable
atomic block 𝑒1, ..., 𝑒𝑗 containing a memory-write event, and in Line 8 we then extend the prefix
with the whole atomic block, i.e., 𝜎 ← 𝜎 ◦ 𝑒1, ..., 𝑒𝑗 .
In VerifyPSO (Algorithm 2), in the loop of Line 7 we further consider each PSO-executable atomic
block. Consider a fixed iteration of this loop with an atomic block 𝑒1, ..., 𝑒 𝑗 . The first event of the
atomic block 𝑒1 is a read, thus the condition in Line 9 is evaluated true with 𝑒1 and the control flow
moves to Line 10. Later, the condition in Line 12 is evaluated false (since 𝑒1 is a read). Finally, in
Line 15 the prefix is extended with the whole atomic block, i.e., 𝜎𝑒 ← 𝜎𝑒 ◦ 𝑒1, ..., 𝑒 𝑗 .
For VerifyTSO the argument of maintaining maximality in the set of thread events applies also
in the presence of RMW and CAS, and thus the bound of Theorem 3.1 is retained. Similarly, for
VerifyPSO the enumeration of fence maps and the maximality in the spurious writes is preserved
also with RMW and CAS, and hence the bound of Theorem 3.2 holds.
Closure. When verifying executions with RMW and CAS instructions, while the closure retains
its guarantees as is, it can more effectively detect unrealizable instances with additional rules.
Specifically, the closure 𝑃 of (𝑋,RF) satisfies the rules 1–3 described in Section 4.3, and additionally,
given an event 𝑒 and an atomic block 𝑒1, ..., 𝑒 𝑗 , 𝑃 satisfies the following.
(4) If 𝑒𝑖 <𝑃 𝑒 for any 1 ≤ 𝑖 ≤ 𝑗 , then 𝑒 𝑗 <𝑃 𝑒 (i.e., if some part of the block is before 𝑒 then the entire
block is before 𝑒).
(5) If 𝑒 <𝑃 𝑒𝑖 for any 1 ≤ 𝑖 ≤ 𝑗 , then 𝑒 <𝑃 𝑒1 (i.e., if 𝑒 is before some part of the block then 𝑒 is before
the entire block).
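Rules (4) and (5) can be illustrated with a small fixpoint computation over an explicit set of ordered pairs. This is only a sketch under stated assumptions: the function names are hypothetical, rules 1–3 of Section 4.3 are not modeled, the event 𝑒 is assumed to lie outside the block, and a cycle in the resulting relation is taken as evidence that the instance is unrealizable.

```python
from itertools import product


def transitive_closure(pairs):
    """Naive transitive closure of a set of (a, b) pairs meaning a < b."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closed), repeat=2):
            if b == c and (a, d) not in closed:
                closed.add((a, d))
                changed = True
    return closed


def close_with_blocks(pairs, blocks):
    """Apply rules (4) and (5) until a fixpoint; return None on a cycle."""
    order = transitive_closure(pairs)
    while True:
        new = set()
        for block in blocks:
            members = set(block)
            first, last = block[0], block[-1]
            for a, b in order:
                if a in members and b not in members:
                    new.add((last, b))   # rule (4): whole block before b
                if b in members and a not in members:
                    new.add((a, first))  # rule (5): a before whole block
        if new <= order:
            break
        order = transitive_closure(order | new)
    if any(a == b for a, b in order):
        return None  # cyclic ordering: no witness can respect it
    return order
```

For example, with the block `("r1", "w1")` and an outside event `x` ordered after `r1` but before `w1`, the two rules force `x` both after and before the block, so the sketch returns `None`.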
5 READS-FROM SMC FOR TSO AND PSO
In this section we present RF-SMC, an exploration-optimal reads-from SMC algorithm for TSO and
PSO. The algorithm RF-SMC is based on the reads-from algorithm for SC [Abdulla et al. 2019], and
adapted in this work to handle the relaxed memory models TSO and PSO. The algorithm uses as
subroutines VerifyTSO (resp. VerifyPSO) to decide whether any given class of the RF partitioning
is consistent under the TSO (resp. PSO) semantics.
RF-SMC is a recursive algorithm. Each call of RF-SMC takes as argument a tuple (𝜏,RF, 𝜎,mrk)
where the following points hold:
• 𝜏 is a sequence of thread events. Let 𝑋 denote the set of events of 𝜏 together with their memory-
write counterparts, formally 𝑋 = E(𝜏) ∪ {wM : ∃(wB,wM) ∈ W such that wB ∈ W𝐵 (𝜏)}.
• RF : R(𝑋 ) →W(𝑋 ) is a desired reads-from function.
• 𝜎 is a concrete valid trace that is a witness of (𝑋,RF), i.e., E(𝜎) = 𝑋 and RF𝜎 = RF.
• mrk ⊆ R(𝜏) is a set of reads that are marked to be committed to the source they read-from in 𝜎 .
Further, a globally accessible collection of schedule sets, called schedules, is maintained throughout
the recursion. It is initialized empty (schedules = ∅), and the initial call of the algorithm
takes empty sequences and sets as arguments: RF-SMC(𝜖, ∅, 𝜖, ∅).
Algorithm 3: RF-SMC(𝜏, RF, 𝜎, mrk)
Input: Sequence 𝜏, desired reads-from RF, valid trace 𝜎 such that RF_𝜎 = RF, marked reads mrk.
1 𝜎̃ ← 𝜎 ◦ 𝜎̂ where 𝜎̂ is an arbitrary maximal extension of 𝜎 // Maximally extend trace 𝜎
2 𝜏̃ ← 𝜏 ◦ 𝜎̂|L(𝜎̂) // Extend 𝜏 with the thread-events subsequence of the extension 𝜎̂
3 foreach r ∈ R(𝜎̂) do // Reads of the extension 𝜎̂
4   schedules(pre_𝜏̃(r)) ← ∅ // Initialize new schedule set
5 foreach r ∈ R(𝜏̃) \ mrk do // Unmarked reads
6   𝑃 ← PO|E(𝜎̃) // Program order on all the events of 𝜎̃
7   foreach r′ ∈ R(𝜏̃) \ {r} with thr(r′) ≠ thr(RF_𝜎̃(r′)) do // Different-thread-RF_𝜎̃ reads except r
8     insert wM → r′ into 𝑃 where RF_𝜎̃(r′) = wM // Add the reads-from ordering into 𝑃
9   mutations ← {(wB,wM) ∈ W(𝜎̃) | r ⋈ wM} \ {RF_𝜎̃(r)} // All different writes r may read-from
10  if r ∉ R(𝜎̂) then // If r is not part of the extension then
11    mutations ← mutations ∩ W(𝜎̂) // Only consider writes of the extension
12  foreach (wB,wM) ∈ mutations do // Considered mutations
13    causesafter ← {𝑒 ∈ E(𝜏̃) | r <_𝜏̃ 𝑒 and 𝑒 ≤_𝑃 wB} // Causal past of wB after r in 𝜏̃
14    𝜏′ ← pre_𝜏̃(r) ◦ 𝜏̃|causesafter // r-prefix followed by causesafter
15    𝑋′ ← E(𝜏′) ∪ {wM′ : (wB′,wM′) ∈ W(𝜎̃) and wB′ ∈ W^B(𝜏′)} // Event set for this mutation
16    RF′ ← {(r′, RF_𝜎̃(r′)) : r′ ∈ R(𝜏′) and r′ ≠ r} ∪ {(r, (wB,wM))} // Reads-from for this mutation
17    if (𝜏′,RF′, _, _) ∉ schedules(pre_𝜏̃(r)) then // If this is a new schedule
18      𝜎′ ← Witness(𝑋′,RF′) // VerifyTSO (Algorithm 1) or VerifyPSO (Algorithm 2)
19      if 𝜎′ ≠ ⊥ then // If the mutation is realizable
20        mrk′ ← (mrk ∩ R(𝜏′)) ∪ R(causesafter) // Reads in causesafter get newly marked
21        add (𝜏′,RF′, 𝜎′,mrk′) to schedules(pre_𝜏̃(r)) // Add the successful new schedule
22 foreach r̂ ∈ R(𝜎̂) in the reverse order of <_𝜎̂ do // Extension reads starting from the end
23   foreach (𝜏′,RF′, 𝜎′,mrk′) ∈ schedules(pre_𝜏̃(r̂)) do // Collected schedules mutating r̂
24     RF-SMC(𝜏′,RF′, 𝜎′,mrk′) // Recursive call on the schedule
25   delete schedules(pre_𝜏̃(r̂)) // This schedule set has been fully explored, hence it can be deleted
[Figure 8: tree of RF-SMC calls. (A) init || wB1 wM1 r1 wB2 wM2 r2; (B) init wB2 wM2 wB1 r1 wM1 || r2; (C) init wB2 r2 wB1 r1 wM1 wM2 ||.]
Fig. 8. RF-SMC (Algorithm 3). The gray boxes represent individual calls to RF-SMC. The sequence of events
inside a gray box is the extended trace 𝜎̃; the part left of the ||-separator is 𝜎 (before extending), and to the right is
𝜎̂ (the extension). The red dashed arrows represent the reads-from function of the extended trace. Each black solid arrow
represents a recursive call, where the arrow’s outgoing tail and label describe the corresponding mutation.
Algorithm 3 presents the pseudocode of RF-SMC. In each call of RF-SMC, a number of possible
changes (or mutations) of the desired reads-from function RF is proposed in iterations of the loop in
Line 5. Consider the read r of a fixed iteration of the Line 5 loop. First, in Lines 6–8 a partial order
𝑃 is constructed to capture the causal past of write events. In Lines 9–11 the set of mutations for r
is computed. Then in each iteration of the Line 12 loop a mutation is constructed (Lines 13–16).
Here the partial order 𝑃 is utilized in Line 13 to help determine the event set of the mutation. The
constructed mutation, if deemed novel (checked in Line 17), is probed for realizability (in
Line 18). If it is realizable, it is added to schedules in Line 21. After all the mutations
have been proposed, Lines 22–25 perform a number of recursive calls of RF-SMC, each taking
as arguments one of the retrieved schedules.
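The construction of a single mutation (Lines 13–16) can be sketched in Python as follows. This is an illustrative reading of the pseudocode rather than the Nidhugg implementation: `build_mutation`, `leq_p` and `is_read` are hypothetical names, the r-prefix is assumed to include r itself, and the event set 𝑋′ of Line 15 is omitted for brevity.

```python
def build_mutation(tau, r, w_buf, w_mem, rf, leq_p, is_read):
    """Build the sequence and reads-from map of one mutation of read r.

    tau     -- list of thread events (the extended sequence)
    r       -- the read being mutated
    w_buf   -- buffer-write of the new source of r
    w_mem   -- memory-write of the new source of r
    rf      -- dict mapping each read to its current source
    leq_p   -- hypothetical predicate (e, w) -> bool for e <=_P w
    is_read -- hypothetical predicate telling whether an event is a read
    """
    idx = tau.index(r)
    prefix = tau[: idx + 1]  # assumed to include r itself
    # Line 13: events after r that lie in the causal past of the new source.
    causes_after = [e for e in tau[idx + 1:] if leq_p(e, w_buf)]
    # Line 14: the r-prefix followed by those events.
    tau_new = prefix + causes_after
    # Line 16: retained reads keep their source, r gets the new one.
    rf_new = {x: rf[x] for x in tau_new if is_read(x) and x != r}
    rf_new[r] = (w_buf, w_mem)
    return tau_new, rf_new, causes_after
```

Reads that occur after r and are not in the causal past of the new source are dropped from the mutation, which is why they reappear later as part of an extension.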
Figure 8 illustrates the run of RF-SMC on a simple concurrent program (the run is identical
under both TSO and PSO). An initial trace (A) is obtained where r1(𝑦) reads-from the initial event
and r2(𝑥) reads-from 𝑤1(𝑥). Here two mutations are probed and both are realizable. In the first
mutation (B), r1(𝑦) is mutated to read-from 𝑤2(𝑦) and r2(𝑥) is not retained (since it appears after
r1(𝑦) and it is not in the causal past of 𝑤2(𝑦)). In the second mutation (C), r2(𝑥) is mutated to
read-from the initial event and r1(𝑦) is retained (since it appears before r2(𝑥)) with the initial event as
its reads-from. After both mutations are added to schedules, recursive calls are performed in the
reverse order of reads appearing in the trace, thus starting with (C). Here no mutations are probed
since there are no events in the extension, so the algorithm backtracks to (A) and a recursive call to
(B) is performed. Here one mutation (D) is added, where r2(𝑥) is mutated to read-from the initial
event and r1(𝑦) is retained (it appears before r2(𝑥)) with 𝑤2(𝑦) as its reads-from. The call to (D) is
performed and here no mutations are probed (there are no events in the extension). The algorithm
backtracks and concludes, exploring four RF partitioning classes in total.
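As a cross-check of the four classes in this run, the following Python sketch enumerates every TSO execution of a two-thread program by brute force and collects the distinct reads-from maps. It is not RF-SMC (it explores all interleavings rather than one trace per class), the function name `rf_classes_tso` is hypothetical, and the program shape (thread 1 writes 𝑥 then reads 𝑦, thread 2 writes 𝑦 then reads 𝑥) is our reading of the traces in Figure 8.

```python
def rf_classes_tso(threads):
    """Enumerate the reads-from maps of all TSO executions by brute force.

    threads -- list of programs; each program is a list of
               ("w", var, write_id) or ("r", var, read_id) tuples.
    Each thread owns one FIFO store buffer; a read takes the latest
    buffered write of its own thread to the variable, else main memory.
    """
    classes = set()

    def explore(pcs, buffers, memory, rf):
        progressed = False
        for t, prog in enumerate(threads):
            # Option 1: thread t executes its next thread event.
            if pcs[t] < len(prog):
                progressed = True
                kind, var, ident = prog[pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if kind == "w":
                    grown = buffers[t] + ((var, ident),)
                    new_buf = buffers[:t] + (grown,) + buffers[t + 1:]
                    explore(new_pcs, new_buf, memory, rf)
                else:
                    pending = [w for v, w in buffers[t] if v == var]
                    source = pending[-1] if pending else memory.get(var, "init")
                    explore(new_pcs, buffers, memory, rf + ((ident, source),))
            # Option 2: the oldest buffered write of thread t reaches memory.
            if buffers[t]:
                progressed = True
                var, ident = buffers[t][0]
                new_buf = buffers[:t] + (buffers[t][1:],) + buffers[t + 1:]
                explore(pcs, new_buf, {**memory, var: ident}, rf)
        if not progressed:
            classes.add(frozenset(rf))  # all events done, buffers empty

    explore((0,) * len(threads), ((),) * len(threads), {}, ())
    return classes


program = [
    [("w", "x", "w1"), ("r", "y", "r1")],
    [("w", "y", "w2"), ("r", "x", "r2")],
]
classes = rf_classes_tso(program)
```

On this program the enumeration yields the same four reads-from maps as the classes (A) to (D), including the one where both reads observe the initial event, which is not possible under SC.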
RF-SMC is sound, complete and exploration-optimal, and we formally state this in Theorem 3.3.
Extension from SC to TSO and PSO. The fundamental challenge in extending the SC algorithm
of Abdulla et al. [2019] to TSO and PSO is verifying execution consistency for TSO and PSO, which
we address in Section 4 (Line 18 of Algorithm 3 calls our algorithms VerifyTSO and VerifyPSO).
The main remaining challenge is then to ensure that the exploration optimality is preserved. To
that end, we have to exclude certain events (in particular, memory-write events) from subsequences
and event subsets that guide the exploration of Algorithm 3. Specifically, the sequences 𝜏, 𝜏′, and 𝜏̃ (the extended sequence)
invariantly contain only the thread events, which is ensured in Line 2, Line 13 and Line 14, and
then in Line 15 the absent memory-writes are reintroduced. No such distinction is required under
SC.
Remark 1 (Handling locks and atomic primitives). For clarity of presentation, so far we
have neglected locks in our model. However, lock events can be naturally handled by our approach
as follows. We consider each lock-release event release as an atomic write event (i.e., its effects are
not deferred by a buffer but instead are instantly visible to each thread). Then, each lock-acquire
event acquire is considered as a read event that accesses the unique memory location.
In SMC, we enumerate the reads-from functions that also consider locks, thus having constraints
of the form RF(acquire) = release. This treatment totally orders the critical sections of each lock,
which naturally solves all reads-from constraints of locks, and further ensures that no thread
acquires an already acquired (and so-far unreleased) lock. Therefore VerifyTSO/VerifyPSO need
not take additional care for locks. The approach to handle locks by Abdulla et al. [2019] directly
carries over to our exploration algorithm RF-SMC.
The atomic operations read-modify-write (RMW) and compare-and-swap (CAS) are modeled as
in Section 4.4, except for the fact that the atomic blocks are not necessary for SMC. Then RF-SMC
can handle programs with such operations as described by Abdulla et al. [2019]. In particular, the
modification of RF-SMC (Algorithm 3) to handle RMW and CAS operations is as follows.
Consider an iteration of the loop in Line 5 where r is the read-part of either a RMW or a successful
CAS, denoted 𝑒, and let (wB′′,wM′′) = RF_𝜎̃(r), where 𝜎̃ is the extended trace of the current call.
Then, in Line 9 we additionally consider as an extra mutation each atomic instruction 𝑒′ satisfying:
(1) The read-part r′ of 𝑒′ reads-from the write-part (wB,wM) of 𝑒 (i.e., RF_𝜎̃(r′) = (wB,wM)), and
(2) 𝑒′ is either a RMW, or it will be a successful CAS when it reads-from (wB′′,wM′′). In this case,
let (wB′,wM′) denote the write-part of 𝑒′.
When considering the above mutation in Line 12, we set RF′(r′) = (wB′′,wM′′) and RF′(r) =
(wB′,wM′) in Line 16, which intuitively aims to "reverse" 𝑒 and 𝑒′ in the trace.
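The two reads-from updates of this extra mutation amount to swapping the order of the two atomic instructions. A minimal sketch, assuming the reads-from function is held in a dictionary and using the hypothetical name `reverse_atomic_pair`; the restriction of RF′ to the reads of 𝜏′ (Line 16) is not modeled:

```python
def reverse_atomic_pair(rf, r, w, r_prime, w_prime):
    """Swap two adjacent atomic instructions e = (r, w) and e' = (r', w').

    Before: r reads from some write w'', and r' reads from w.
    After:  r' reads from w'', and r reads from w'.
    """
    assert rf[r_prime] == w, "e' must currently read from the write-part of e"
    mutated = dict(rf)
    mutated[r_prime] = rf[r]  # RF'(r') = (wB'', wM'')
    mutated[r] = w_prime      # RF'(r)  = (wB', wM')
    return mutated
```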
6 EXPERIMENTS
In this section we report on an experimental evaluation of the consistency verification algorithms
VerifyTSO and VerifyPSO, as well as the reads-from SMC algorithm RF-SMC. We have implemented
our algorithms as an extension in Nidhugg [Abdulla et al. 2015], a state-of-the-art stateless model
checker for multithreaded C/C++ programs with pthreads library, operating on LLVM IR.
Benchmarks. For our experimental evaluation of both the consistency verification and SMC,
we consider 109 benchmarks coming from four different categories, namely: (i) SV-COMP bench-
marks, (ii) benchmarks from related papers and works [Abdulla et al. 2015, 2019; Chatterjee et al.
2019; Huang and Huang 2016], (iii) mutual-exclusion algorithms, and (iv) dynamic-programming
benchmarks of Chatterjee et al. [2019]. Although the consistency and SMC algorithms can be
extended to support atomic compare-and-swap and read-modify-write primitives (cf. Remark 1),
our current implementation does not support these primitives. Therefore, we used all benchmarks
without such primitives that we could obtain (e.g., we include every benchmark of the relevant SC
reads-from work [Abdulla et al. 2019] except the one benchmark with compare-and-swap). Each
benchmark comes with a scaling parameter, called the unroll bound, which controls the bound on
the number of iterations in all loops of the benchmark (and in some cases it further controls the
number of threads).
6.1 Experiments on Execution Verification for TSO and PSO
In this section we perform an experimental evaluation of our execution verification algorithms
VerifyTSO and VerifyPSO. For the purpose of comparison, we have also implemented within
Nidhugg the naive lower-set enumeration algorithm of Abdulla et al. [2019]; Biswas and Enea
[2019], extended to TSO and PSO. Intuitively, this approach enumerates all lower sets of the
program order restricted to the input event set, which yields a better complexity bound than
enumerating write-coherence orders (even with just one location). The extensions to TSO and PSO
are called NaiveVerifyTSO and NaiveVerifyPSO, respectively, and their worst-case complexity is
𝑛^{2·𝑘} and 𝑛^{𝑘·(𝑑+1)}, respectively (as discussed in Section 3). Further, for each of the above verification
algorithms, we consider two variants, namely, with and without the closure heuristic of Section 4.3.
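To give a feel for the gap between these worst-case bounds, the sketch below evaluates them for concrete parameters. The bounds for VerifyTSO and VerifyPSO are the ones stated in the conclusions of this paper; constants hidden by the 𝑂-notation are ignored, so the numbers are only indicative, and the function names are hypothetical.

```python
def verify_tso_bound(n, k):
    """k * n^(k+1), the bound stated for VerifyTSO."""
    return k * n ** (k + 1)


def verify_pso_bound(n, k, d):
    """k * n^(k+1) * min(n^(k*(k-1)), 2^(k*d)), the bound stated for VerifyPSO."""
    return k * n ** (k + 1) * min(n ** (k * (k - 1)), 2 ** (k * d))


def naive_tso_bound(n, k):
    """n^(2*k), the bound of NaiveVerifyTSO."""
    return n ** (2 * k)


def naive_pso_bound(n, k, d):
    """n^(k*(d+1)), the bound of NaiveVerifyPSO."""
    return n ** (k * (d + 1))


# Example: n = 100 events, k = 3 threads, d = 2 variables.
tso_pair = (verify_tso_bound(100, 3), naive_tso_bound(100, 3))
pso_pair = (verify_pso_bound(100, 3, 2), naive_pso_bound(100, 3, 2))
```

For these parameters the TSO bounds are 3·10^8 versus 10^12, and the PSO bounds are about 1.9·10^10 versus 10^18.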
Setup. We evaluate the verification algorithms on execution consistency instances induced
during SMC of the benchmarks. For TSO we have collected 9400 instances, 1600 of which are
not realizable. For PSO we have collected 9250 instances, 1400 of which are not realizable. The
collection process is described in detail in Appendix C.1 of Bui et al. [2021]. For each instance, we
run the verification algorithms subject to a timeout of one minute, and we report the average time
achieved over 5 runs.
Below we present the results using logarithmically scaled plots, where the opaque and semi-
transparent red lines represent identity and an order-of-magnitude difference, respectively.
Results – algorithms with closure. Here we evaluate the verification algorithms that execute
the closure as the preceding step. The plots in Figure 9 present the results for TSO and PSO.
Fig. 9. Consistency verification comparison on TSO (left) and PSO (right) when using closure.
In TSO, our algorithm VerifyTSO is similar to or faster than NaiveVerifyTSO on the realizable
instances (blue dots), and the improvement is mostly within an order of magnitude. All unrealizable
instances (green dots) were detected as such by closure, and hence the closure-using VerifyTSO
and NaiveVerifyTSO coincide on these instances.
We make similar observations in PSO, where VerifyPSO is similar or superior to NaiveVerifyPSO
for the realizable instances, and the algorithms are identical on the unrealizable instances, since
these are all detected as unrealizable by closure.
Fig. 10. Consistency verification comparison on TSO (left) and PSO (right) without the closure.
Results – algorithms without closure. Here we evaluate the verification algorithms without
the closure. The plots in Figure 10 present the results for TSO and PSO.
In TSO, the algorithm VerifyTSO outperforms NaiveVerifyTSO on most of the realizable instances
(blue dots). Further, VerifyTSO significantly outperforms NaiveVerifyTSO on the unrealizable
instances (green dots). This is because without closure, a verification algorithm can declare an instance
unrealizable only after an exhaustive exploration of its respective lower-set space. VerifyTSO
explores a significantly smaller space compared to NaiveVerifyTSO, as outlined in Section 3.
Similar observations as above hold in PSO for the algorithms VerifyPSO and NaiveVerifyPSO
without closure, both for the realizable and the unrealizable instances.
Results – effect of closure. Here we comment on the effect of closure for the verification
algorithms; Appendix C.2 of Bui et al. [2021] presents the detailed analysis. Recall that closure
constructs a partial order that each witness has to satisfy, and declares an instance unrealizable
when it detects that the partial order cannot be constructed for this instance (we refer to Section 4.3
for details).
For each verification algorithm, its version without closure is faster on most instances that are
realizable (i.e., a witness exists). This means that the overhead of computing the closure typically
outweighs the subsequent benefit of the verification being guided by the partial order.
On the other hand, for each verification algorithm, its version with closure is significantly faster
on the unrealizable instances (i.e., no witness exists). This is because a verification algorithm has to
enumerate all its lower sets before declaring an instance unrealizable, and this is much slower than
the polynomial closure computation.
Results – verification with atomic operations. Here we present additional experiments to
evaluate TSO verification algorithms VerifyTSO and NaiveVerifyTSO on executions containing
atomic operations read-modify-write (RMW) and compare-and-swap (CAS). To that end, we con-
sider 1088 verification instances (779 realizable and 309 not realizable) that arise during stateless
model checking of benchmarks containing RMW and CAS, namely:
• synthetic benchmarks casrot [Abdulla et al. 2019] and cinc [Kokologiannakis et al. 2019b],
• data structure benchmarks barrier, chase-lev, ms-queue and linuxrwlocks [Kokologiannakis
et al. 2019b; Norris and Demsky 2013], and
Fig. 11. Consistency verification comparison of VerifyTSO and NaiveVerifyTSO with closure (left) and
without closure (right) on verification instances that contain RMW and CAS instructions.
The results are presented in Figure 11. The left plot depicts the results for VerifyTSO and
NaiveVerifyTSO when closure is used as a preceding step. Here the results are all within an order-
of-magnitude difference, and they are identical for unrealizable instances, since all of them were
detected as unrealizable already by the closure. The right plot depicts the results for VerifyTSO and
NaiveVerifyTSO without using the closure. Here the difference for realizable instances is also within
an order of magnitude, but for some unrealizable instances the algorithm VerifyTSO is significantly
faster. Generally, the observed improvement of our VerifyTSO as compared to NaiveVerifyTSO is
somewhat smaller in Figure 11, which could be due to the fact that executions with RMW and
CAS instructions typically have fewer concurrent writes (indeed, in an execution where each write
event is a part of a RMW/CAS instruction, each conflicting pair of writes is inherently ordered
by the reads-from orderings together with PO). Finally, in Appendix C.2 of Bui et al. [2021] the
effect of closure is evaluated for both verification algorithms VerifyTSO and NaiveVerifyTSO on
instances with RMW and CAS.
6.2 Experiments on SMC for TSO and PSO
In this section we focus on assessing the advantages of utilizing the reads-from equivalence for
SMC in TSO and PSO. We have used RF-SMC for stateless model checking of 109 benchmarks
under each memory model M ∈ {SC, TSO, PSO}, where SC is handled in our implementation as
TSO with a fence inserted after each thread event. Appendix C.3 of Bui et al. [2021] provides further
details on our SMC setup.
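The treatment of SC as fenced TSO can be pictured with the following sketch, which inserts a fence after every thread event of a thread's program. The list-of-strings program representation and the name `sc_as_fenced_tso` are assumptions made for illustration only; the actual implementation operates on LLVM IR inside Nidhugg.

```python
def sc_as_fenced_tso(thread_events, fence="fence"):
    """Return the thread's events with a fence inserted after each one.

    Under TSO, a fence forces the thread's store buffer to be flushed,
    so a program fenced this way exhibits only SC behavior.
    """
    fenced = []
    for event in thread_events:
        fenced.append(event)
        fenced.append(fence)
    return fenced
```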
Comparison. As a baseline for comparison, we have also executed Source-DPOR [Abdulla et al.
2014], which is implemented in Nidhugg and explores the trace space using the partitioning based
on the Shasha–Snir equivalence. In SC, we have further executed rfsc, the Nidhugg implementation
of the reads-from SMC algorithm for SC by Abdulla et al. [2019], and the full comparison that
includes rfsc for SC is in Appendix C.4 of Bui et al. [2021]. Both rfsc and Source are well-optimized,
and recently started using advanced data-structures for SMC [Lång and Sagonas 2020]. The works
of Kokologiannakis et al. [2019b]; Kokologiannakis and Vafeiadis [2020] provide a general interface
for reads-from SMC in relaxed memory models. However, they handle a given memory model
assuming that an auxiliary consistency verification algorithm for that memory model is provided.
No such consistency algorithm for TSO or PSO is presented by Kokologiannakis et al. [2019b];
Kokologiannakis and Vafeiadis [2020], and, to our knowledge, the tool implementations of
Kokologiannakis et al. [2019b]; Kokologiannakis and Vafeiadis [2020] also lack a consistency algorithm for
both TSO and PSO. Thus these tools are not included in the evaluation.¹
Evaluation objective. Our objective for the SMC evaluation is three-fold. First, we want to
quantify how each memory model M ∈ {SC, TSO, PSO} impacts the size of the RF partitioning.
Second, we are interested to see whether, as compared to the baseline Shasha–Snir equivalence,
the RF equivalence leads to coarser partitionings for TSO and PSO, as it does for SC [Abdulla et al.
2019]. Finally, we want to determine whether a coarser RF partitioning leads to faster exploration.
Theorem 3.3 states that RF-SMC spends polynomial time per partitioning class, and we aim to see
whether this is a small polynomial in practice.
Results. We illustrate the obtained results with several scatter plots. Each plot compares two
algorithms executing under specified memory models. For each benchmark, we consider the
highest attempted unroll bound where both compared algorithms finish before the one-hour
timeout. Green dots indicate that a trace reduction was achieved on the underlying benchmark
by the algorithm on the y-axis as compared to the algorithm on the x-axis. Benchmarks with no
trace reduction are represented by the blue dots. All scatter plots are in log scale; the opaque and
semi-transparent red lines represent identity and an order-of-magnitude difference, respectively.
¹ Another related work is MCR [Huang and Huang 2016]; however, the corresponding tool operates on Java programs and
uses heavyweight SMT solvers that require fine tuning, and thus is beyond the experimental scope of this work.
Fig. 12. Traces comparison as RF-SMC moves from SC to TSO (left) and from TSO to PSO (right).
The plots in Figure 12 illustrate how the size of the RF partitioning explored by RF-SMC changes
as we move to more relaxed memory models (SC to TSO to PSO). The plots in Figure 13 capture
how the size of the RF partitioning explored by RF-SMC relates to the size of the Shasha–Snir
partitioning explored by Source. Finally, the plots in Figure 14 demonstrate the time comparison of
RF-SMC and Source when there is some (green dots) or no (blue dots) RF-induced trace reduction.
Below we discuss the observations on the obtained results. Table 1 captures detailed results on
several benchmarks that we refer to as examples in the discussion.
Fig. 13. Traces comparison for RF-SMC and Source on the TSO (left) and PSO (right) memory model.
Discussion. We notice that the analysed programs can often exhibit additional behavior in
relaxed memory settings. This causes an increase in the size of the partitionings explored by SMC
algorithms (see 27_Boop4 in Table 1 as an example). Figure 12 illustrates the overall phenomenon for
RF-SMC, where the increase of the RF partitioning size (and hence the number of traces explored)
is sometimes beyond an order of magnitude when moving from SC to TSO, or from TSO to PSO.
We observe that across all memory models, the reads-from equivalence can offer significant
reduction in the trace partitioning as compared to Shasha–Snir equivalence. This leads to fewer
traces that need to be explored, see the plots of Figure 13. As we move towards more relaxed
memory (SC to TSO to PSO), the reduction of RF partitioning often becomes more prominent
(see 27_Boop4 in Table 1). Interestingly, in some cases the size of the Shasha–Snir partitioning
explored by Source increases as we move to more relaxed settings, while the RF partitioning remains
unchanged (cf. fillarray_false in Table 1). All these observations signify advantages of RF for
analysis of the more complex program behavior that arises due to relaxed memory.
Fig. 14. Times comparison for RF-SMC and Source on the TSO (left) and PSO (right) memory model.
Table 1. SMC results on several benchmarks. U denotes the unroll bound. The timeout of one hour is
indicated by "-". Bold-font entries indicate the smallest numbers for the respective memory model.
                          Seq. Consistency     Total Store Order    Partial Store Order
Benchmark         U       RF-SMC    Source     RF-SMC    Source     RF-SMC    Source
27_Boop4
  Traces          1       2902      21948      3682      36588      8233      572436
  Traces          4       197260    3873348    313336    9412428    1807408   -
  Times           1       1.22s     1.74s      1.46s     6.18s      4.40s     169s
eratosthenes
  Traces          17      4667      100664     29217     4719488    253125    -
  Traces          21      19991     1527736    223929    -          -         -
  Times           17      6.70s     46s        32s       2978s      475s      -
fillarray_false
  Traces          3       14625     47892      14625     59404      14625     63088
  Traces          4       471821    2278732    471821    3023380    471821    3329934
  Times           3       12s       6.18s      12s       12s        18s       39s
  Times           4       553s      331s       547s      778s       930s      2844s
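The growing advantage of the RF partitioning can be read off the trace counts of Table 1 directly. The sketch below computes the ratio of traces explored by Source to traces explored by RF-SMC for the first block of the table at unroll bound 1. Attributing that block to 27_Boop4 is an inference from the discussion above (the benchmark labels did not survive in the table as extracted), and the variable names are hypothetical.

```python
# (RF-SMC traces, Source traces) per memory model, first block of Table 1, U = 1.
trace_counts = {
    "SC": (2902, 21948),
    "TSO": (3682, 36588),
    "PSO": (8233, 572436),
}

# Reduction factor: how many Source traces correspond to one RF-SMC trace.
reduction = {
    model: source / rf_smc
    for model, (rf_smc, source) in trace_counts.items()
}
```

The factor grows from roughly 7.6 under SC to roughly 9.9 under TSO and roughly 69.5 under PSO, in line with the observation that the reduction becomes more prominent as the memory model is relaxed.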
We now discuss how trace partitioning coarseness affects execution time, observing the plots
of Figure 14. We see that in cases where RF partitioning is coarser (green dots), our RF algorithm
RF-SMC often becomes significantly faster than the Shasha–Snir-based Source, allowing us to
analyse programs scaled several levels further (see eratosthenes in Table 1). In cases where
the sizes of the RF partitioning and the Shasha–Snir partitioning coincide (blue dots), the
well-engineered Source outperforms our RF-SMC implementation. The time differences in these cases
are reasonably moderate, suggesting that the polynomial overhead incurred to operate on the RF
partitioning is small in practice.
Appendix C.4 of Bui et al. [2021] contains the complete results on all 109 benchmarks, as well
as further scatter plots, illustrating (i) comparison of RF-SMC with Source and rfsc in SC, (ii) time
comparison of RF-SMC across memory models, and (iii) the effect of using closure in the consistency
checking during SMC.
7 CONCLUSIONS
In this work we have solved the consistency verification problem under a reads-from map for
the TSO and PSO relaxed memory models. Our algorithms scale as 𝑂(𝑘 · 𝑛^{𝑘+1}) for TSO, and as
𝑂(𝑘 · 𝑛^{𝑘+1} · min(𝑛^{𝑘·(𝑘−1)}, 2^{𝑘·𝑑})) for PSO, for 𝑛 events, 𝑘 threads and 𝑑 variables. Thus, they both
become polynomial-time for a bounded number of threads, similar to the case for SC that was
established recently [Abdulla et al. 2019; Biswas and Enea 2019]. In practice, our algorithms perform
much better than the standard baseline methods, offering significant scalability improvements.
Encouraged by these scalability improvements, we have used these algorithms to develop, for the
first time, SMC under TSO and PSO using the reads-from equivalence, as opposed to the standard
Shasha–Snir equivalence. Our experiments show that the underlying reads-from partitioning is
often much coarser than the Shasha–Snir partitioning, which yields a significant speedup in the
model checking task.
We remark that our consistency-verification algorithms have direct applications beyond SMC. In
particular, most predictive dynamic analyses solve a consistency-verification problem in order to
infer whether an erroneous execution can be generated by a concurrent system (see, e.g., Kini et al.
[2017]; Mathur et al. [2020]; Smaragdakis et al. [2012]). Hence, the results of this work allow
predictive analyses to be extended to TSO/PSO in a scalable way that does not sacrifice precision. We will
pursue this direction in our future work.
ACKNOWLEDGMENTS
The research was partially funded by the ERC CoG 863818 (ForM-SMArt) and the Vienna Science
and Technology Fund (WWTF) through project ICT15-003.
REFERENCES
Parosh Abdulla, Stavros Aronis, Bengt Jonsson, and Konstantinos Sagonas. 2014. Optimal Dynamic Partial Order Reduction
(POPL). https://doi.org/10.1145/2578855.2535845
Parosh Aziz Abdulla, Stavros Aronis, Mohamed Faouzi Atig, Bengt Jonsson, Carl Leonardsson, and Konstantinos Sagonas.
2015. Stateless Model Checking for TSO and PSO. In TACAS. https://doi.org/10.1007/978-3-662-46681-0_28
Parosh Aziz Abdulla, Mohamed Faouzi Atig, Bengt Jonsson, Magnus Lång, Tuan Phong Ngo, and Konstantinos Sagonas.
2019. Optimal Stateless Model Checking for Reads-from Equivalence under Sequential Consistency. Proc. ACM Program.
Lang. 3, OOPSLA, Article 150 (Oct. 2019), 29 pages. https://doi.org/10.1145/3360576
Parosh Aziz Abdulla, Mohamed Faouzi Atig, Bengt Jonsson, and Tuan Phong Ngo. 2018. Optimal stateless model checking
under the release-acquire semantics. Proc. ACM Program. Lang. 2, OOPSLA (2018), 135:1–135:29.
https://doi.org/10.1145/3276505
S. V. Adve and K. Gharachorloo. 1996. Shared memory consistency models: a tutorial. Computer 29, 12 (Dec 1996), 66–76.
https://doi.org/10.1109/2.546611
Elvira Albert, Puri Arenas, María García de la Banda, Miguel Gómez-Zamalloa, and Peter J. Stuckey. 2017. Context-Sensitive
Dynamic Partial Order Reduction. In Computer Aided Verification, Rupak Majumdar and Viktor Kunčak (Eds.). Springer
International Publishing, Cham, 526–543. https://doi.org/10.1007/978-3-319-63387-9_26
Elvira Albert, Miguel Gómez-Zamalloa, Miguel Isabel, and Albert Rubio. 2018. Constrained Dynamic Partial Order Reduction.
In Computer Aided Verification, Hana Chockler and Georg Weissenbacher (Eds.). Springer International Publishing, Cham,
392–410. https://doi.org/10.1007/978-3-319-96142-2_24
Jade Alglave. 2010. A Shared Memory Poetics. Ph.D. Dissertation. Paris Diderot University.
Jade Alglave, Patrick Cousot, and Caterina Urban. 2017. Concurrency with Weak Memory Models (Dagstuhl Seminar 16471).
Dagstuhl Reports 6, 11 (2017), 108–128. https://doi.org/10.4230/DagRep.6.11.108
Stavros Aronis, Bengt Jonsson, Magnus Lång, and Konstantinos Sagonas. 2018. Optimal Dynamic Partial Order Reduction
with Observers. In Tools and Algorithms for the Construction and Analysis of Systems, Dirk Beyer and Marieke Huisman
(Eds.). Springer International Publishing, Cham, 229–248. https://doi.org/10.1007/978-3-319-89963-3_14
Ranadeep Biswas and Constantin Enea. 2019. On the complexity of checking transactional consistency. Proc. ACM Program.
Lang. 3, OOPSLA (2019), 165:1–165:28. https://doi.org/10.1145/3360591
Ahmed Bouajjani, Egor Derevenetc, and Roland Meyer. 2013. Checking and Enforcing Robustness against TSO. In
Programming Languages and Systems, Matthias Felleisen and Philippa Gardner (Eds.). Springer Berlin Heidelberg, Berlin,
Heidelberg, 533–553. https://doi.org/10.1007/978-3-642-37036-6_29
Ahmed Bouajjani, Roland Meyer, and Eike Möhlmann. 2011. Deciding Robustness against Total Store Ordering. In Automata,
Languages and Programming, Luca Aceto, Monika Henzinger, and Jiří Sgall (Eds.). Springer Berlin Heidelberg, Berlin,
Heidelberg, 428–440. https://doi.org/10.1007/978-3-642-22012-8_34
Truc Lam Bui, Krishnendu Chatterjee, Tushar Gautam, Andreas Pavlogiannis, and Viktor Toman. 2021. The Reads-From
Equivalence for the TSO and PSO Memory Models. CoRR abs/2011.11763 (2021). arXiv:2011.11763
https://arxiv.org/abs/2011.11763
Harold W. Cain and Mikko H. Lipasti. 2002. Verifying Sequential Consistency Using Vector Clocks. In Proceedings of the
Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures (Winnipeg, Manitoba, Canada) (SPAA ’02).
Association for Computing Machinery, New York, NY, USA, 153–154. https://doi.org/10.1145/564870.564897
Marek Chalupa, Krishnendu Chatterjee, Andreas Pavlogiannis, Nishant Sinha, and Kapil Vaidya. 2017. Data-centric Dynamic
Partial Order Reduction. Proc. ACM Program. Lang. 2, POPL, Article 31 (Dec. 2017), 30 pages.
https://doi.org/10.1145/3158119
Krishnendu Chatterjee, Andreas Pavlogiannis, and Viktor Toman. 2019. Value-Centric Dynamic Partial Order Reduction.
Proc. ACM Program. Lang. 3, OOPSLA, Article 124 (Oct. 2019), 29 pages. https://doi.org/10.1145/3360550
Y. Chen, Yi Lv, W. Hu, T. Chen, Haihua Shen, Pengyu Wang, and Hong Pan. 2009. Fast complete memory consistency
verification. In 2009 IEEE 15th International Symposium on High Performance Computer Architecture. 381–392.
https://doi.org/10.1109/HPCA.2009.4798276
Peter Chini and Prakash Saivasan. 2020. A Framework for Consistency Algorithms. In 40th IARCS Annual Conference on
Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2020, December 14-18, 2020, BITS Pilani,
K K Birla Goa Campus, Goa, India (Virtual Conference) (LIPIcs, Vol. 182), Nitin Saxena and Sunil Simon (Eds.). Schloss
Dagstuhl - Leibniz-Zentrum für Informatik, 42:1–42:17. https://doi.org/10.4230/LIPIcs.FSTTCS.2020.42
E.M. Clarke, O. Grumberg, M. Minea, and D. Peled. 1999. State space reduction using partial order techniques. STTT 2, 3
(1999), 279–287. https://doi.org/10.1007/s100090050035
Brian Demsky and Patrick Lam. 2015. SATCheck: SAT-directed Stateless Model Checking for SC and TSO (OOPSLA). ACM,
New York, NY, USA, 20–36. https://doi.org/10.1145/2814270.2814297
Cormac Flanagan and Patrice Godefroid. 2005. Dynamic Partial-order Reduction for Model Checking Software. In POPL.
https://doi.org/10.1145/1040305.1040315
Florian Furbach, Roland Meyer, Klaus Schneider, and Maximilian Senftleben. 2015. Memory-Model-Aware Testing: A Unified
Complexity Analysis. ACM Trans. Embed. Comput. Syst. 14, 4, Article 63 (Sept. 2015), 25 pages.
https://doi.org/10.1145/2753761
Phillip B. Gibbons and Ephraim Korach. 1997. Testing Shared Memories. SIAM J. Comput. 26, 4 (Aug. 1997), 1208–1244.
https://doi.org/10.1137/S0097539794279614
P. Godefroid. 1996. Partial-Order Methods for the Verification of Concurrent Systems: An Approach to the State-Explosion
Problem. Springer-Verlag, Secaucus, NJ, USA. https://doi.org/10.1007/3-540-60761-7
Patrice Godefroid. 1997. Model Checking for Programming Languages Using VeriSoft. In POPL.
https://doi.org/10.1145/263699.263717
Patrice Godefroid. 2005. Software Model Checking: The VeriSoft Approach. FMSD 26, 2 (2005), 77–101.
https://doi.org/10.1007/s10703-005-1489-x
Maurice P. Herlihy and Jeannette M. Wing. 1990. Linearizability: A Correctness Condition for Concurrent Objects. ACM
Trans. Program. Lang. Syst. 12, 3 (July 1990), 463–492. https://doi.org/10.1145/78969.78972
W. Hu, Y. Chen, T. Chen, C. Qian, and L. Li. 2012. Linear Time Memory Consistency Verification. IEEE Trans. Comput. 61, 4
(2012), 502–516. https://doi.org/10.1109/TC.2011.41
Jeff Huang. 2015. Stateless Model Checking Concurrent Programs with Maximal Causality Reduction. In Proceedings of the
36th ACM SIGPLAN Conference on Programming Language Design and Implementation.
https://doi.org/10.1145/2737924.2737975
Shiyou Huang and Jeff Huang. 2016. Maximal Causality Reduction for TSO and PSO. SIGPLAN Not. 51, 10 (Oct. 2016),
447–461. https://doi.org/10.1145/3022671.2984025
Shiyou Huang and Jeff Huang. 2017. Speeding Up Maximal Causality Reduction with Static Dependency Analysis. In
31st European Conference on Object-Oriented Programming, ECOOP 2017, June 19-23, 2017, Barcelona, Spain. 16:1–16:22.
https://doi.org/10.4230/LIPIcs.ECOOP.2017.16
Vineet Kahlon, Chao Wang, and Aarti Gupta. 2009. Monotonic Partial Order Reduction: An Optimal Symbolic Partial Order
Reduction Technique. In Proceedings of the 21st International Conference on Computer Aided Verification (Grenoble, France)
(CAV ’09). Springer-Verlag, Berlin, Heidelberg, 398–413. https://doi.org/10.1007/978-3-642-02658-4_31
Dileep Kini, Umang Mathur, and Mahesh Viswanathan. 2017. Dynamic Race Prediction in Linear Time. In Proceedings of the
38th ACM SIGPLAN Conference on Programming Language Design and Implementation (Barcelona, Spain) (PLDI 2017).
ACM, New York, NY, USA, 157–170. https://doi.org/10.1145/3062341.3062374
Michalis Kokologiannakis, Ori Lahav, Konstantinos Sagonas, and Viktor Vafeiadis. 2017. Effective Stateless Model Checking
for C/C++ Concurrency. Proc. ACM Program. Lang. 2, POPL, Article 17 (Dec. 2017), 32 pages.
https://doi.org/10.1145/3158105
Michalis Kokologiannakis, Azalea Raad, and Viktor Vafeiadis. 2019a. Effective Lock Handling in Stateless Model Checking.
Proc. ACM Program. Lang. 3, OOPSLA, Article 173 (Oct. 2019), 26 pages. https://doi.org/10.1145/3360599
Michalis Kokologiannakis, Azalea Raad, and Viktor Vafeiadis. 2019b. Model Checking for Weakly Consistent Libraries. In
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (Phoenix, AZ,
USA) (PLDI 2019). ACM, New York, NY, USA, 96–110. https://doi.org/10.1145/3314221.3314609
Michalis Kokologiannakis and Viktor Vafeiadis. 2020. HMC: Model Checking for Hardware Memory Models. In ASPLOS
’20: Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, March 16-20, 2020,
James R. Larus, Luis Ceze, and Karin Strauss (Eds.). ACM, 1157–1171. https://doi.org/10.1145/3373376.3378480
Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and Derek Dreyer. 2017. Repairing sequential consistency in
C/C++11. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation,
PLDI 2017, Barcelona, Spain, June 18-23, 2017, Albert Cohen and Martin T. Vechev (Eds.). ACM, 618–632.
https://doi.org/10.1145/3062341.3062352
L. Lamport. 1979. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. IEEE Trans.
Comput. 28, 9 (1979), 690–691. https://doi.org/10.1109/TC.1979.1675439
Magnus Lång and Konstantinos Sagonas. 2020. Parallel Graph-Based Stateless Model Checking. In Automated Technology
for Verification and Analysis - 18th International Symposium, ATVA 2020, Hanoi, Vietnam, October 19-23, 2020, Proceedings
(Lecture Notes in Computer Science, Vol. 12302), Dang Van Hung and Oleg Sokolsky (Eds.). Springer, 377–393.
https://doi.org/10.1007/978-3-030-59152-6_21
Madan Musuvathi, Shaz Qadeer, and Tom Ball. 2007. CHESS: A systematic testing tool for concurrent software. Technical Report.
C. Manovit and S. Hangal. 2006. Completely verifying memory consistency of test program executions. In The Twelfth
International Symposium on High-Performance Computer Architecture, 2006. 166–175.
https://doi.org/10.1109/HPCA.2006.1598123
Umang Mathur, Andreas Pavlogiannis, and Mahesh Viswanathan. 2020. The Complexity of Dynamic Data Race Prediction.
In Proceedings of the 35th Annual ACM/IEEE Symposium on Logic in Computer Science (Saarbrücken, Germany) (LICS ’20).
Association for Computing Machinery, New York, NY, USA, 713–727. https://doi.org/10.1145/3373718.3394783
Umang Mathur, Andreas Pavlogiannis, and Mahesh Viswanathan. 2021. Optimal Prediction of Synchronization-Preserving
Races (POPL). https://doi.org/10.1145/3434317
Brian Norris and Brian Demsky. 2013. CDSchecker: checking concurrent data structures written with C/C++ atomics. In
OOPSLA, Antony L. Hosking, Patrick Th. Eugster, and Cristina V. Lopes (Eds.). ACM, 131–150.
https://doi.org/10.1145/2509136.2509514
Scott Owens, Susmit Sarkar, and Peter Sewell. 2009. A Better x86 Memory Model: x86-TSO. In Theorem Proving in Higher
Order Logics, Stefan Berghofer, Tobias Nipkow, Christian Urban, and Makarius Wenzel (Eds.). Springer Berlin Heidelberg,
Berlin, Heidelberg, 391–407. https://doi.org/10.1007/978-3-642-03359-9_27
Andreas Pavlogiannis. 2019. Fast, Sound, and Effectively Complete Dynamic Race Prediction. Proc. ACM Program. Lang. 4,
POPL, Article 17 (Dec. 2019), 29 pages. https://doi.org/10.1145/3371085
Doron Peled. 1993. All from One, One for All: On Model Checking Using Representatives. In CAV.
https://doi.org/10.1007/3-540-56922-7_34
Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. 2019. Bridging the gap between programming languages and hardware
weak memory models. Proc. ACM Program. Lang. 3, POPL (2019), 69:1–69:31. https://doi.org/10.1145/3290382
César Rodríguez, Marcelo Sousa, Subodh Sharma, and Daniel Kroening. 2015. Unfolding-based Partial Order Reduction. In
CONCUR. https://doi.org/10.4230/LIPIcs.CONCUR.2015.456
Jake Roemer, Kaan Genç, and Michael D. Bond. 2020. SmartTrack: Efficient Predictive Race Detection. In Proceedings of
the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (London, UK) (PLDI 2020).
Association for Computing Machinery, New York, NY, USA, 747–762. https://doi.org/10.1145/3385412.3385993
Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, and Magnus O. Myreen. 2010. X86-TSO: A Rigorous
and Usable Programmer’s Model for x86 Multiprocessors. Commun. ACM 53, 7 (July 2010), 89–97.
https://doi.org/10.1145/1785414.1785443
Dennis Shasha and Marc Snir. 1988. Efficient and Correct Execution of Parallel Programs That Share Memory. ACM Trans.
Program. Lang. Syst. 10, 2 (April 1988), 282–312. https://doi.org/10.1145/42190.42277
Yannis Smaragdakis, Jacob Evans, Caitlin Sadowski, Jaeheon Yi, and Cormac Flanagan. 2012. Sound Predictive Race Detection
in Polynomial Time. In Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages (Philadelphia, PA, USA) (POPL ’12). ACM, New York, NY, USA, 387–400.
https://doi.org/10.1145/2103656.2103702
SPARC International, Inc. 1994. The SPARC Architecture Manual (Version 9). Prentice-Hall, Inc., Upper Saddle
River, NJ, USA.
Rachid Zennou, Ahmed Bouajjani, Constantin Enea, and Mohammed Erradi. 2019. Gradual Consistency Checking. In
Computer Aided Verification - 31st International Conference, CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings,
Part II (Lecture Notes in Computer Science, Vol. 11562), Isil Dillig and Serdar Tasiran (Eds.). Springer, 267–285.
https://doi.org/10.1007/978-3-030-25543-5_16
Naling Zhang, Markus Kusano, and Chao Wang. 2015. Dynamic Partial Order Reduction for Relaxed Memory Models. In
PLDI. https://doi.org/10.1145/2737924.2737956
