Portability Analysis for Axiomatic Memory Models. PORTHOS: One Tool for
  all Models by Ponce-de-León, Hernán et al.
arXiv:1702.06704v2 [cs.PL] 28 Apr 2017
Portability Analysis for Weak Memory Models
porthos: One Tool for all Models
Hernán Ponce-de-León1⋆, Florian Furbach2, Keijo Heljanko3, and Roland Meyer4⋆
1fortiss GmbH, Germany 2TU Kaiserslautern, Germany 3Aalto University and
HIIT, Finland 4TU Braunschweig, Germany
ponce@fortiss.org, furbach@cs.uni-kl.de, keijo.heljanko@aalto.fi,
roland.meyer@tu-braunschweig.de
Abstract. We present porthos, the first tool that discovers porting
bugs in performance-critical code. porthos takes as input a program
and the memory models of the source architecture for which the program
has been developed and the target model to which it is ported. If the
code is not portable, porthos finds a bug in the form of an unexpected
execution — an execution that is consistent with the target but inconsis-
tent with the source memory model. Technically, porthos implements
a bounded model checking method that reduces the portability analysis
problem to satisfiability modulo theories (SMT). There are two main
problems in the reduction that we present novel and efficient solutions
for. First, the formulation of the portability problem contains a quanti-
fier alternation (consistent + inconsistent). We introduce a formula that
encodes both in a single existential query. Second, the supported mem-
ory models (e.g., Power) contain recursive definitions. We compute the
required least fixed point semantics for recursion (a problem that was left
open in [47]) efficiently in SMT. Finally we present the first experimental
analysis of portability from TSO to Power.
1 Introduction
Porting code from one architecture to another is a routine task in system devel-
opment. Given that no functionality has to be added, porting is rarely consid-
ered interesting from a programming point of view. At the same time, porting
is non-trivial as the hardware influences both the semantics and the compila-
tion of the code in subtle ways. The unfortunate combination of being routine
and yet subtle makes porting prone to mistakes. This is particularly true for
performance-critical code that interacts closely with the execution environment.
Such code often has data races and thus exposes the programmer to the details
of the underlying hardware. When the architecture is changed, the code may
have to be adapted to the primitives of the target hardware.
We tackle the problem of porting performance-critical code among hardware
architectures. Our contribution is the new (and to the best of our knowledge first)
tool porthos to fight porting bugs. It takes as input a piece of code, a model of
the source architecture for which the code has been developed, and a model of the
target architecture to which the code is to be ported. porthos automatically
checks whether every behaviour of the code on the target architecture is also
allowed on the source platform. This guarantees that correctness of the program
in terms of safety properties (in particular properties like mutual exclusion) carries
over to the targeted hardware, and the program remains correct after porting.
⋆ This work was carried out when the author was at Aalto University.
Portability requires an analysis method that is hardware-architecture-aware
in the sense that a description of the memory models of source and target plat-
forms has to be part of the input. A language for memory models, called CAT [4],
has been developed only recently. In CAT, memory models are defined in terms
of relations between memory operations of a program. There are some relations
(program order, reads from, coherence) that are common to all memory mod-
els. A memory model may define further so-called derived relations by restricting
and composing base relations. The memory model specifies axioms in the form of
acyclicity and irreflexivity constraints over relations. An execution is consistent
if it satisfies all axioms. Our work builds on the CAT language.
There are three problems that make portability different from most common
verification tasks.
(i) We have to deal with user-defined memory models. These models may define
derived relations as least fixed points.
(ii) The formulation of portability involves an alternation (consistent + incon-
sistent) of quantifiers.
(iii) High-level code may be compiled into different low-level code depending on
the architecture.
Concerning the first problem, we implement in SMT the operations that CAT
defines on relations. Notably, we propose an encoding for derived relations that
are defined as least fixed points. Such least fixed points are prominently used
in the Power memory model [8] and their computation was identified as a key
problem in [47]. To quote the authors: "[...] the proper fixpoint construction [...] is
much more expensive than a fixed unrolling." We show that, with our encoding,
this is not the case. A naive approach would implement the Kleene iteration in
SAT by introducing copies of the variables for each iteration step, resulting in a
very large encoding. We show how to employ SAT + integer difference logic [19]
to compactly encode the Kleene iteration process. Notably, every bounded model
checking technique reasoning about complex memory models defined in CAT will
face the problem of dealing with recursive definitions and can make use of our
technique to solve it efficiently.
The second problem is to encode the quantifier alternation underlying the
definition of portability. A porting bug is an execution that is consistent with the
target but inconsistent with the source memory model. We capture this alterna-
tion with a single existential query. Consistency is specified in terms of acyclicity
(and irreflexivity) of relations. Hence, an execution is inconsistent if a derived
relation of the (source) memory model contains a cycle (or is not irreflexive).
The naive idea would be to model cyclicity by unsatisfiability. Instead, we reduce
cyclicity to satisfiability by introducing auxiliary variables that guess the cycle.
The reader may criticise our definition of portability: one could claim that all
that matters is whether safety is preserved, even if the executions differ. To be
precise, a state-based notion of portability requires that every state computable
under the target architecture is already computable on the source platform. We
study state portability and come up with two results.
(a) Algorithmically, state portability is beyond SAT.
(b) Empirically, there is little difference between state portability and our notion.
The third problem is that the same high-level program is compiled to dif-
ferent assembly programs depending on the source and the target architec-
tures. Even the number of registers and the semantics of the synchronisation
primitives provided by those architectures usually differ. Consider the program
from Fig. 1, written in C++11 and compiled to x86 and Power. The observation
is this. Even if the assembly programs differ, one can map every assembly
memory access to the corresponding read or write operation in the high-level
code. In the example, clearly “MOV [y],$1” and “stw r1,y” correspond to
“y.store(memory_order_relaxed, 1)”. This allows us to relate low-level and
high-level executions and to compare executions of both assembly programs by
checking if they map to the same high-level execution. With this observation, our
analysis can be extended by translating an input program into two correspond-
ing assembly programs and making explicit the relation among the low-level and
high-level executions. While this relation among executions is not studied in the
present paper, details of how to construct it and how to incorporate it into our
approach are given in Appendix D.
In summary, we make the following contributions.
1. We present the first SMT-based implementation of a core subset of CAT
which can handle recursive definitions efficiently.
2. We formulate the portability problem based on the CAT language.
3. We develop a bounded analysis for portability. Despite the apparent alterna-
tion of quantifiers, our SMT encoding is a satisfiability query of polynomial
size and optimal in the complexity sense.
4. We compare our notion of portability to a state-based notion and show that
the latter does not afford a polynomial SAT encoding.
5. We present experiments showing that (i) in a large majority of cases both
notions of portability coincide, and (ii) mutual exclusion algorithms are often
not portable. In particular, we perform the first analysis from TSO to Power.
2 Portability Analysis on an Example
Consider program IRIW in Fig. 1, written in C++11 and using the atomic
operator memory_order_relaxed, which provides no guarantees on how memory
accesses in different threads are ordered. When porting, the program is compiled
to two different architectures. The corresponding low-level programs behave
differently on x86 and on IBM’s Power. On TSO, the memory model implemented
by x86, each thread has a store buffer of pending writes. A thread can see its
own writes before they become visible to other threads (by reading them from
its buffer), but once a write hits the memory it becomes visible to all other
threads simultaneously: TSO is a multi-copy-atomic model [18]. Power on the
other hand does not guarantee that writes become visible to all threads at the
same point in time. Think of each thread as having its own copy of the memory.

thread t0                            thread t1
y.store(memory_order_relaxed, 1)     x.store(memory_order_relaxed, 1)
thread t2                            thread t3
r1 = x.load(memory_order_relaxed);   r1 = y.load(memory_order_relaxed);
r2 = y.load(memory_order_relaxed)    r2 = x.load(memory_order_relaxed)

x86 Assembly
thread t0     thread t1     thread t2      thread t3
MOV [y],$1    MOV [x],$1    MOV EAX,[x]    MOV EAX,[y]
                            MOV EAX,[y]    MOV EAX,[x]

Power Assembly
thread t0     thread t1     thread t2      thread t3
li r1,1       li r1,1       lwz r1,x       lwz r1,y
stw r1,y      stw r1,x      lwz r3,y       lwz r3,x

[Execution graph: events Ix0, Wx1, Rx1, Ry0, Iy0, Wy1, Ry1, Rx0; edges labelled po, rf, rfe, fr, co.]
Fig. 1: Portability of program IRIW from TSO to Power.
With these two architectures in mind, consider the execution in Fig. 1. Thread
t2 reads x = 1, y = 0 and thread t3 reads x = 0, y = 1, indicated by the solid
edges rfe and rf . Since under TSO every execution has a unique global view of
all operations, no interleaving allows both threads to read the above values of
the variables. Under Power, this is possible. Our goal is to automatically detect
such differences when porting a program from one architecture to another, here
from TSO to Power.
Our tool porthos applies to various architectures, and we not only have a
language for programs but also a language for memory models. The semantics of a
program on a memory model is defined axiomatically, following two steps [8,47].
We first associate with the program (and independent of the memory model)
a set of executions which are candidates for the semantics. An execution is a
graph (Fig. 1) whose nodes (events) are program instructions and whose edges
are basic dependencies: the program order po, the reads-from relation rf (giving
the write that a load reads from), and the coherence order co (stating the order
in which writes take effect). The memory model then defines which executions
are consistent and thus form the semantics of the program on that model.
ConsistentTSO
(1) acyclic((po ∩ sloc) ∪ rf ∪ fr ∪ co)
(2) acyclic(rfe ∪ co ∪ fr ∪ (po \ (W × R)) ∪ mfence)
fr := rf⁻¹; co
rfe := rf \ sthd
Fig. 2: TSO.
We describe memory models in the recently proposed language CAT [4].
Besides the base relations, a model may define so-called derived relations. The
consistency requirements are stated in terms of acyclicity and irreflexivity axioms
over these (base and derived) relations. The CAT formalisation of TSO is given
in Fig. 2. It forbids executions forming a cycle over rfe ∪ fr ∪ (po \ (W × R)). The
red edges in Fig. 1 yield such a cycle; the execution is not consistent with TSO.
Power further relaxes the program order (Fig. 6): the dotted lines are no longer
considered for cycles and thus the execution is consistent. Hence, IRIW has
executions consistent with Power but not with TSO, and is therefore not portable.
Our contribution is a bounded analysis for portability implemented in the
porthos tool (http://github.com/hernanponcedeleon/PORTHOS). First, the
program is unrolled up to a user-specified bound. Within this bound, porthos
is guaranteed to find all portability bugs. It will neither see bugs beyond the
bound nor will it be able to prove a cyclic program portable. The unrolled
program, together with the CAT models, is transformed into an SMT formula
where satisfying assignments correspond to bugs.
A bug is an execution consistent with the target memory model MT but
inconsistent with the source MS . We express this combination of consistency
and inconsistency with only one existential quantification. The key observation
is that the derived relations, which may differ in MT and MS , are fully de-
fined by the execution. Hence, by guessing an execution we also obtain the
derived relations (there is nothing more to guess). Checking consistency for
MT is then an acyclicity (or irreflexivity) constraint on the derived relations
that immediately yields an SMT query. Inconsistency for MS requires cyclicity.
The trick is to explicitly guess the cycle. We introduce Boolean variables
for every event and every edge that could be part of the cycle. In Fig. 1,
if Rx1 is on the cycle, indicated by the variable C(Rx1) being set, then
there should be one incoming and one outgoing edge also in the cycle.
Besides the incoming edge shown in the graph, Rx1 could read from the initial
value Ix0. Since there are two possible incoming edges but only one outgoing
edge, we obtain $C(Rx1) \Rightarrow ((C_{rfe}(Wx1, Rx1) \lor C_{rf}(Ix0, Rx1)) \land C_{po}(Rx1, Ry0))$.
If a relation is on the cycle, then also both end-points should be part of
the cycle and the relation should belong to the execution:
$C_{po}(Rx1, Ry0) \Rightarrow (C(Rx1) \land C(Ry0) \land po(Rx1, Ry0))$. Finally, at least one event has to be part of
the cycle: $C(Ix0) \lor C(Wx1) \lor C(Rx1) \lor C(Rx0) \lor C(Iy0) \lor C(Wy1) \lor C(Ry1) \lor C(Ry0)$.
The execution in Fig. 1 contains the relations marked in red and forms a cycle
which violates Axiom 2 in TSO. The assignment respects the axioms of Power
(Fig. 6), showing the existence of a portability bug in IRIW from TSO to Power.
The other challenge is to capture relations that are defined recursively. The
Kleene iteration process [42] starts with the empty relation and repeatedly
adds pairs of events according to the recursive definitions. We encode this into
(quantifier-free) integer difference logic [19]. For every recursive relation r and
every pair of events (e1, e2), we introduce an integer variable $\Phi^{r}_{e_1,e_2}$ representing
the iteration step in which the pair entered the value of r. A Kleene iteration
then corresponds to a total ordering on these integer variables. Crucially, we
only have one Boolean variable r(e1, e2) per pair rather than one per iteration
step. We illustrate the encoding on a simplified version of the preserved program
order for Power defined as ppo := ii ∪ ic (cf. Fig. 6 for the full definition).
The relation is derived from the mutually recursive relations ii := dd ∪ ic and
ic := cd ∪ ii , where dd and cd represent data and control dependencies. Call
Rx1 and Ry0 respectively e1 and e2. The encoding is
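$$ii(e_1,e_2) \Leftrightarrow \big(dd(e_1,e_2) \land (\Phi^{ii}_{e_1,e_2} > \Phi^{dd}_{e_1,e_2})\big) \lor \big(ic(e_1,e_2) \land (\Phi^{ii}_{e_1,e_2} > \Phi^{ic}_{e_1,e_2})\big)$$
$$ic(e_1,e_2) \Leftrightarrow \big(cd(e_1,e_2) \land (\Phi^{ic}_{e_1,e_2} > \Phi^{cd}_{e_1,e_2})\big) \lor \big(ii(e_1,e_2) \land (\Phi^{ic}_{e_1,e_2} > \Phi^{ii}_{e_1,e_2})\big).$$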
The pair (e1, e2) that belongs to relation dd in step $\Phi^{dd}_{e_1,e_2}$ of the Kleene iteration
can be added to relation ii at a later step $\Phi^{ii}_{e_1,e_2} > \Phi^{dd}_{e_1,e_2}$. As ii := dd ∪ ic, the
disjunction allows us to also add the elements of ic to ii. Since dd and cd are
empty for IRIW, the relations ii and ic have to be identical. Identical non-empty
relations will not yield a solution: the integer variables cannot satisfy
($\Phi^{ii}_{e_1,e_2} > \Phi^{ic}_{e_1,e_2}$) and ($\Phi^{ic}_{e_1,e_2} > \Phi^{ii}_{e_1,e_2}$) at the same time. Hence, the only
satisfying assignment is the one where both ii and ic are the empty relation, which
implies that ppo is empty. This is consistent with the preserved program order
of Power for IRIW.
3 Programs and Memory Models
We introduce our language for programs and the core of the language CAT. The
presentation follows [4,47] and we refer the reader to those works for details.
Programs. Our language for shared memory concurrent programs is given
in Fig. 3. Programs consist of a finite number of threads from a while-language.
The threads operate on assembly level, which means they explicitly read from
the shared memory into registers, write from registers into memory, and support
local computations on the registers. The language has various fence instructions
(sync, lwsync, and isync on Power and mfence on x86) that enforce ordering
and visibility constraints among instructions. We refrain from explicitly defin-
ing the expressions and predicates used in assignments and conditionals. They
will depend on the data domain. For our analysis, we only require the domain
to admit an SMT encoding in a logic which has its satisfiability problem in
NP. For the rest of the paper we will assume that programs are acyclic: any
while statement is removed by unrolling the program to a depth specified by
the user. Since verification is generally undecidable for while-programs [38], this
under-approximation is necessary for cyclic programs.
〈prog〉 ::= program 〈thrd〉∗
〈thrd〉 ::= thread 〈tid〉 〈inst〉
〈inst〉 ::= 〈atom〉 | 〈inst〉; 〈inst〉
| while 〈pred〉 〈inst〉
| if 〈pred〉 then 〈inst〉
else 〈inst〉
〈atom〉 ::= 〈reg〉 ← 〈exp〉 | 〈reg〉 ← 〈loc〉
| 〈loc〉 := 〈reg〉 | 〈mfence〉
| 〈sync〉 | 〈lwsync〉 | 〈isync〉
Fig. 3: Programming language.
〈MCM〉 ::= 〈assert〉 | 〈rel〉 | 〈MCM〉 ∧ 〈MCM〉
〈assert〉 ::= acyclic(〈r〉) | irreflexive(〈r〉)
〈r〉 ::= 〈b〉 | 〈r〉 ∪ 〈r〉 | 〈r〉 ∩ 〈r〉 | 〈r〉 \ 〈r〉
| 〈r〉⁻¹ | 〈r〉⁺ | 〈r〉* | 〈r〉; 〈r〉
〈b〉 ::= po | rf | co | ad | dd | cd | sthd | sloc
| mfence | sync | lwsync | isync
| id(〈set〉) | 〈set〉 × 〈set〉 | 〈name〉
〈set〉 ::= E | W | R
〈rel〉 ::= 〈name〉 := 〈r〉
Fig. 4: Core of CAT [4].
Executions. The semantics of a program is given in terms of executions, partial
orders where the events represent occurrences of the instructions and the ordering
edges represent dependencies. The definition is given in Fig. 5. An execution
consists of a set X of executed events and so-called base and induced relations
satisfying Axioms (3)-(18). Base relations rf and co and the set X define an
execution (they are the ones to be guessed). Induced relations can be extracted
directly from the source code of the program. The axioms in Fig. 5 are common
to all memory models and are natively implemented by our tool. To state them,
let E represent the memory events coming from program instructions accessing the
memory. Memory accesses are either reads or writes: E := R ∪ W. By Rl and Wl
we refer respectively to the reads and writes that access location l. The events
of thread t form the set Et. Relations sthd and sloc are equivalences relating
events belonging to the same thread (3) and accessing the same location (4).
Relations po, ad, dd and cd represent program order and address/data/control
dependencies. Axiom (5) states that the program order po is an intra-thread relation
which (6) forms a total order when projected to events in the same thread (predicate
total(r, A) holds if r is a total order on the set A). Address dependencies are
either read-to-read or read-to-write (7), data dependencies are read-to-write (8),
and control dependencies originate from reads (9). Fence relations are
architecture-specific and relate only events in program order (10)-(13). Axiom (14), which
we do not make explicit, requires the executed events X to form a path in the
threads’ control flow. By Axioms (15) and (16), the reads-from relation rf gives for
each read a unique write to the same location from which the read obtains its
value. Here, r1; r2 := {(x, y) | ∃z : (x, z) ∈ r1 and (z, y) ∈ r2} is the composition
of the relations r1 and r2. We write r⁻¹ := {(y, x) | (x, y) ∈ r} for the inverse of
relation r. Finally, id(A) is the identity relation on the set A. By Axioms (17) and
(18), the coherence relation co relates writes to the same location, and it forms
a total order for each location. We assume the existence of an initial write
event for each location which assigns value 0 to the location. This event is first
in the coherence order.
sloc, sthd ⊆ E × E              same location, same thread
po, ad, dd, cd ⊆ E × E          program order, address/data/control dependency
mfence ⊆ E × E                  fences in x86
sync, lwsync, isync ⊆ E × E     fences in Power
(3) equiv(sthd, E)   (4) equiv(sloc, M)   (5) po ⊆ sthd   (6) total(po, Et)
(7) ad ⊆ (R × M) ∩ po   (8) dd ⊆ (R × W) ∩ po   (9) cd ⊆ (R × E) ∩ po
(10) sync ⊆ po   (11) lwsync ⊆ po   (12) isync ⊆ po   (13) mfence ⊆ po
X ⊆ E                           executed events
rf, co ⊆ E × E                  reads-from, coherence order
(14) path   (15) rf ⊆ (W × R) ∩ sloc   (16) rf; rf⁻¹ = id(E)
(17) co ⊆ ((W × W) ∩ sloc) \ id(E)   (18) total(co, Wl)
Fig. 5: Executions; adapted from [47].
Memory Consistency Models. We give in Fig. 4 a core subset of the CAT
language for memory consistency models (MCMs). A memory model is a con-
straint system over so-called derived relations. Derived relations are built from
the base and induced relations in an execution, hand-defined relations that refer
to the different sets of events, and named relations that we will explain in a
moment. The assertions are acyclicity and irreflexivity constraints over derived
relations. CAT also supports recursive definitions of relations. We assume a set
〈name〉 of relation names (different from the predefined relations) and require
that each name used in the memory model has an associated defining equation
〈name〉 := 〈r〉. Notably, 〈r〉 may again contain relation names, making the system
of defining equations recursive. The actual relations that are denoted by the
names are defined to be the least solution to this system of equations. We can
compute the least solution with a standard Kleene iteration [42] starting from
the empty relations and iterating until the least fixed point is reached.
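The Kleene iteration just described can be sketched in a few lines of Python. This is only an illustration on explicit sets of pairs, not the SMT encoding used by the tool; the helper names `compose` and `kleene` and the representation of a defining equation as a Python function are assumptions made for this sketch.

```python
def compose(r1, r2):
    """Relational composition r1; r2 on sets of pairs."""
    return {(x, y) for (x, z) in r1 for (z2, y) in r2 if z == z2}


def kleene(equations, base):
    """Least solution of the system `name := rhs(env)`.

    `base` maps base-relation names to sets of pairs; `equations` maps each
    defined name to a monotone function of the environment. All right-hand
    sides are re-evaluated simultaneously in every round.
    """
    env = dict(base)
    env.update({name: set() for name in equations})  # start from empty relations
    while True:
        new = {name: set(rhs(env)) for name, rhs in equations.items()}
        if all(new[name] == env[name] for name in equations):
            return env  # nothing changed: the least fixed point is reached
        env.update(new)


# Simplified preserved program order from Section 2: ii := dd ∪ ic, ic := cd ∪ ii.
ppo_equations = {
    "ii": lambda env: env["dd"] | env["ic"],
    "ic": lambda env: env["cd"] | env["ii"],
}
empty = kleene(ppo_equations, {"dd": set(), "cd": set()})
one_dep = kleene(ppo_equations, {"dd": {("e1", "e2")}, "cd": set()})

# A recursive definition using composition: path := edge ∪ (path; path).
paths = kleene(
    {"path": lambda env: env["edge"] | compose(env["path"], env["path"])},
    {"edge": {("a", "b"), ("b", "c")}},
)
```

With empty dependency relations both ii and ic stay empty, matching the discussion of IRIW in Section 2. Termination relies on the right-hand sides being monotone over a finite set of events.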
In Section 6 we study portability to Power; we use its formalization [8] in the
core of CAT as given in Fig. 6. Power is a highly relaxed memory model that sup-
ports program-order relaxations depending on address and data dependencies,
that is non-multi-copy atomic, and that has a complex set of fence instructions.
The axioms defining Power are uniproc 1 and the constraints 19 to 21 . The
model relies on the recursively defined relations ii , ci , ic, and cc.
4 Portability Analysis
Let $cons_{M}(P)$ be the set of executions of program P consistent with M. Given
a program P and two MCMs MS and MT, our goal is to find an execution X
which is consistent with the target ($X \in cons_{MT}(P)$) but not with the source
($X \notin cons_{MS}(P)$). In such a case P is not portable from MS to MT.
ConsistentPower
(1) acyclic((po ∩ sloc) ∪ rf ∪ fr ∪ co)    (19) acyclic(hb)
(20) irreflexive(fre; prop; hb*)           (21) acyclic(co ∪ prop)

Preserved Program Order
dp := ad ∪ dd    rdw := (po ∩ sloc) ∩ (fre; rfe)    detour := (po ∩ sloc) ∩ (coe; rfe)
ii0 := dp ∪ rdw ∪ rfi                        ci0 := cd-isync ∪ detour
ic0 := ∅                                     cc0 := dp ∪ (po ∩ sloc) ∪ cd ∪ (ad; po)
ii := ii0 ∪ ci ∪ (ic; ci) ∪ (ii; ii)         ci := ci0 ∪ (ci; ii) ∪ (cc; ci)
ic := ic0 ∪ ii ∪ cc ∪ (ic; cc) ∪ (ii; ic)    cc := cc0 ∪ ci ∪ (ci; ic) ∪ (cc; cc)
ppo := ((R × R) ∩ ii) ∪ ((R × W) ∩ ic)

Fences
fence := sync ∪ (lwsync \ (W × R))

Thin Air
hb := ppo ∪ fence ∪ rfe

Propagation
prop-base := (fence ∪ (rfe; fence)); hb*
prop := ((W × W) ∩ prop-base) ∪ (com*; prop-base*; sync; hb*)

Fig. 6: Power [8].
Definition 1 (Portability). Let MS, MT be two MCMs. A program P is
portable from MS to MT if $cons_{MT}(P) \subseteq cons_{MS}(P)$.
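To make the definition concrete, the following sketch enumerates the candidate executions of IRIW explicitly and lists those that are consistent with a target model but not with the source. It is an illustration under stated assumptions, not the tool's SMT-based method: the event names, the treatment of the initial writes as a separate "init" thread, and the stand-in target model that keeps only Axiom (1) are all hypothetical choices for this example (the stand-in is not the Power model of Fig. 6). The source model follows Axioms (1) and (2) of Fig. 2 with an empty mfence relation.

```python
from itertools import product

# IRIW events: initial writes, the two stores, and the four loads a, b, c, d.
LOC = {"Ix": "x", "Iy": "y", "Wx": "x", "Wy": "y",
       "a": "x", "b": "y", "c": "y", "d": "x"}
THREAD = {"Ix": "init", "Iy": "init", "Wx": "t1", "Wy": "t0",
          "a": "t2", "b": "t2", "c": "t3", "d": "t3"}
WRITES = {"Ix", "Iy", "Wx", "Wy"}
READS = ["a", "b", "c", "d"]
PO = {("a", "b"), ("c", "d")}        # program order inside t2 and t3
CO = {("Ix", "Wx"), ("Iy", "Wy")}    # initial writes are coherence-first


def acyclic(r):
    """Depth-first search for a back edge in the graph of relation r."""
    succ, state = {}, {}
    for x, y in r:
        succ.setdefault(x, set()).add(y)

    def visit(n):
        if state.get(n) == "open":
            return False              # back edge: cycle found
        if state.get(n) == "done":
            return True
        state[n] = "open"
        ok = all(visit(m) for m in succ.get(n, ()))
        state[n] = "done"
        return ok

    return all(visit(n) for n in list(succ))


def candidates():
    """Each load reads either the initial write (0) or the store (1)."""
    for choice in product([0, 1], repeat=4):
        rf = {(("W" if v else "I") + LOC[r], r) for r, v in zip(READS, choice)}
        yield choice, rf


def derived(rf):
    fr = {(r, w2) for (w, r) in rf for (w1, w2) in CO if w1 == w}  # rf⁻¹; co
    rfe = {(w, r) for (w, r) in rf if THREAD[w] != THREAD[r]}      # rf \ sthd
    return fr, rfe


def uniproc(rf):
    """Axiom (1); also used alone as the hypothetical weak target model."""
    fr, _ = derived(rf)
    po_loc = {(x, y) for (x, y) in PO if LOC[x] == LOC[y]}
    return acyclic(po_loc | rf | fr | CO)


def tso(rf):
    """Axioms (1) and (2) of Fig. 2, with no mfence instructions."""
    fr, rfe = derived(rf)
    kept_po = {(x, y) for (x, y) in PO if not (x in WRITES and y in READS)}
    return uniproc(rf) and acyclic(rfe | CO | fr | kept_po)


def porting_bugs(source, target):
    """Executions in cons_target(P) that are not in cons_source(P)."""
    return [c for c, rf in candidates() if target(rf) and not source(rf)]


bugs = porting_bugs(source=tso, target=uniproc)
```

Here `bugs` contains the single read pattern in which t2 sees x = 1, y = 0 and t3 sees y = 1, x = 0, which is the execution of Fig. 1. Swapping the two models yields no bug, since every TSO-consistent execution also satisfies Axiom (1).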
Our method finds non-portable executions as satisfying assignments to an
SMT formula. Recall that an execution is uniquely represented by the set X
and the relations rf and co, which need to be guessed by the solver. All other
relations are derived from these guesses, the source code of the program, and
the MCMs in question. Thus, we also have to encode the derived relations of the
two MCMs defined in the language of Fig. 4. As the last part, we encode the
assertions expressed in the language of Fig. 4 on these relations in such a way that
the guessed execution is allowed by MT (all the assertions stated for MT hold)
while the same execution is not allowed by MS (at least one of the axioms of MS
is violated). The full SMT formula is of the form $\varphi_{CF} \land \varphi_{DF} \land \varphi_{MT} \land \varphi_{\neg MS}$.
Here, $\varphi_{CF}$ and $\varphi_{DF}$ encode the control flow and data flow of the executions,
$\varphi_{MT}$ encodes the derived relations and all assertions of MT, and $\varphi_{\neg MS}$ encodes
the derived relations of MS and a violation of at least one of the assertions of the
source memory model. The control-flow and data-flow encodings are standard
for bounded model checking [17]. The rest of the section focuses on the parts that
are new in this work: how to encode the derived relations needed for representing
both MCMs, how to encode assertions for the target memory model, and how
to encode an assertion violation in the source memory model.
Encoding Derived Relations. For any pair of events e1, e2 ∈ E and relation
r ⊆ E × E we use a Boolean variable r(e1, e2) representing the fact that $e_1 \xrightarrow{r} e_2$
holds. We similarly use fresh Boolean variables to represent the derived relations,
using the encoding to force their values as follows. For the union (resp. intersection)
of two relations, at least one of them (resp. both of them) should hold; set
difference requires that the first relation holds and the second one does not; for
the composition of relations we iterate over a third event and check if it belongs
to the range of the first relation and the domain of the second. Computing a
reverse relation requires reversing the events. We define the transitive closure
of r recursively, where the base case tc0 holds if events are related according to
r and the recursive case uses a relation composition. These are computed with
the iterative squaring technique using the relation composition. Finally, the reflexive
and transitive closure checks if the events are the same or are related by r+. The
encodings are summarized below.
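$$(r_1 \cup r_2)(e_1,e_2) \Leftrightarrow r_1(e_1,e_2) \lor r_2(e_1,e_2)$$
$$(r_1 \cap r_2)(e_1,e_2) \Leftrightarrow r_1(e_1,e_2) \land r_2(e_1,e_2)$$
$$(r_1 \setminus r_2)(e_1,e_2) \Leftrightarrow r_1(e_1,e_2) \land \neg r_2(e_1,e_2)$$
$$r^{-1}(e_1,e_2) \Leftrightarrow r(e_2,e_1)$$
$$(r_1; r_2)(e_1,e_2) \Leftrightarrow \bigvee_{e_3 \in E} r_1(e_1,e_3) \land r_2(e_3,e_2)$$
$$r^{*}(e_1,e_2) \Leftrightarrow r^{+}(e_1,e_2) \lor (e_1 = e_2)$$
$$r^{+}(e_1,e_2) \Leftrightarrow tc_{\lceil \log |E| \rceil}(e_1,e_2), \text{ where}$$
$$tc_0(e_1,e_2) \Leftrightarrow r(e_1,e_2), \text{ and}$$
$$tc_{i+1}(e_1,e_2) \Leftrightarrow r(e_1,e_2) \lor (tc_i; tc_i)(e_1,e_2).$$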
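The same operators can be evaluated directly on finite sets of pairs, which is a convenient way to sanity-check the definitions above. The sketch below is not the Boolean encoding itself; the function names are hypothetical, and the logarithm is taken to base 2, an assumption that matches the squaring argument (after i rounds, tc_i covers all paths of length at most 2^i).

```python
def compose(r1, r2):
    """(r1; r2): some middle event is in the range of r1 and the domain of r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def inverse(r):
    """r⁻¹ swaps the two events of every pair."""
    return {(b, a) for (a, b) in r}


def transitive_closure(r, events):
    """r+ by iterative squaring: tc_0 = r, tc_{i+1} = r ∪ (tc_i; tc_i)."""
    rounds = (len(events) - 1).bit_length()  # equals ceil(log2 |E|) for |E| >= 1
    tc = set(r)
    for _ in range(rounds):
        tc = set(r) | compose(tc, tc)
    return tc


def reflexive_transitive_closure(r, events):
    """r* adds the identity on the events to r+."""
    return transitive_closure(r, events) | {(e, e) for e in events}


events = ["a", "b", "c", "d"]
chain = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(chain, events)
```

For the four-event chain two squaring rounds suffice, whereas a step-by-step closure would need three compositions.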
Recall that some of the relations (e.g. ii and ic of Power) can be defined
mutually recursively, and that we are using the least fixed point (smallest so-
lution) semantics for cyclic definitions. A classical algorithm for solving such
equations is the Kleene fixpoint iteration. The iteration starts from the empty
relations as initial approximation and on each round computes a new approxi-
mation until the (least) fixed point is reached. Such an iterative algorithm can
be easily encoded into SAT. The problem of such an encoding is the potentially
large number of iterations needed, and thus the resulting formula size can grow
to be large. A cleverer way to encode this is an approach that has
already been used in earlier work on encoding mutually recursive monotone equation
systems with nested least and greatest fixpoints [30]. The encoding of this paper
uses an extension of SAT with integer difference logic (IDL), a logic that is still
NP-complete. A SAT encoding is also possible but incurs an overhead in the
encoding size: if the SMT encoding is of size O(n), the SAT encoding is of size
O(n log n) [30]. We chose IDL since our experiments showed the encoding to be
the most time-consuming of the tasks.
The basic idea of the encoding is to guess a certificate that contains the
iteration number in which a tuple would be added to the relation in the Kleene
iteration. For this we use additional integer variables and enforce that they
actually locally follow the propagations made by the fixed point iteration algorithm.
Thus, for any pair of events e1, e2 ∈ E and relation r ⊆ E × E we introduce an
integer variable $\Phi^{r}_{e_1,e_2}$ representing the round in which r(e1, e2) would be set by
the Kleene iteration algorithm. Using these new variables we guess the execution
of the Kleene fixed point iteration algorithm, and then locally check that
every guess that was made is also a valid propagation of the fixed point iteration
algorithm. For a simple example of how the encoding for the union of relations
needs to be modified to also handle recursive definitions, consider a definition
where r1 := r2 ∪ r3 and r2 := r1 ∪ r4. The encoding is as follows:
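$$r_1(e_1,e_2) \Leftrightarrow \big(r_2(e_1,e_2) \land (\Phi^{r_1}_{e_1,e_2} > \Phi^{r_2}_{e_1,e_2})\big) \lor \big(r_3(e_1,e_2) \land (\Phi^{r_1}_{e_1,e_2} > \Phi^{r_3}_{e_1,e_2})\big)$$
$$r_2(e_1,e_2) \Leftrightarrow \big(r_1(e_1,e_2) \land (\Phi^{r_2}_{e_1,e_2} > \Phi^{r_1}_{e_1,e_2})\big) \lor \big(r_4(e_1,e_2) \land (\Phi^{r_2}_{e_1,e_2} > \Phi^{r_4}_{e_1,e_2})\big).$$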
A pair (e1, e2) is added to r1 by the Kleene iteration in step $\Phi^{r_1}_{e_1,e_2}$. It comes
from either r2 or r3. If it came from r2 then it is of course also in r2 and it was
added to r2 in an earlier iteration $\Phi^{r_2}_{e_1,e_2}$ and thus ($\Phi^{r_1}_{e_1,e_2} > \Phi^{r_2}_{e_1,e_2}$). It is similar
if it came from r3. The only satisfying assignment for the encoding is one where
both r1 and r2 are the union of r3 and r4.
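The effect of the step variables can be explored by brute force on a single pair of events. The sketch below is written in the spirit of the encoding above and makes two assumptions that go beyond the printed formula: the equivalence is read as two directions (membership follows from either operand, and membership must be justified by an operand with a strictly smaller step), and the step variables range over the small domain {0, 1, 2}, which is enough for this two-equation system. All names are hypothetical.

```python
from itertools import product


def solutions(r3, r4, use_ranks=True, max_step=3):
    """Values of (r1, r2) admitted for r1 := r2 ∪ r3 and r2 := r1 ∪ r4.

    r3 and r4 are the truth values of the base relations on one fixed pair.
    With use_ranks=False the step comparison is dropped, which yields the
    plain fixed-point equations without the least-solution requirement.
    """
    found = set()
    for r1, r2 in product([False, True], repeat=2):
        for p1, p2, p3, p4 in product(range(max_step), repeat=4):
            if use_ranks:
                # A justification must come from a strictly earlier step.
                just1 = (r2 and p1 > p2) or (r3 and p1 > p3)
                just2 = (r1 and p2 > p1) or (r4 and p2 > p4)
            else:
                just1 = r2 or r3
                just2 = r1 or r4
            complete1 = (not (r2 or r3)) or r1   # operands force membership
            complete2 = (not (r1 or r4)) or r2
            justified1 = (not r1) or just1       # membership needs a reason
            justified2 = (not r2) or just2
            if complete1 and complete2 and justified1 and justified2:
                found.add((r1, r2))
    return found


least_only = solutions(False, False)
any_fixed_point = solutions(False, False, use_ranks=False)
```

When r3 and r4 are both empty on the pair, the plain equations also admit the self-supporting solution in which r1 and r2 both hold; the step comparison rules it out and leaves only the least solution.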
Encoding Target Memory Model Assertions. For the target architecture
we need to encode all acyclicity and irreflexivity assertions of the memory
model. For handling acyclicity we again use non-Boolean variables in our SMT
encoding for compactness reasons. One can encode that a relation is acyclic
by adding a numerical variable $\Psi_e \in \mathbb{N}$ for each event e in the relation we
want to be acyclic. Then acyclicity of relation r is encoded as
$$acyclic(r) \Leftrightarrow \bigwedge_{e_1,e_2 \in E} \big(r(e_1,e_2) \Rightarrow (\Psi_{e_1} < \Psi_{e_2})\big).$$
Notice that we can impose a total order with $\Psi_{e_1} < \Psi_{e_2}$ only if there is no cycle.
Our encoding is the same as the SAT + IDL
encoding in [28] where more discussion of SAT modulo acyclicity can be found.
The irreflexive constraint is simply encoded as
$$irreflexive(r) \Leftrightarrow \bigwedge_{e \in E} \neg r(e,e).$$
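The role of the Ψ variables can be illustrated outside a solver. In the sketch below a topological sort (Kahn's algorithm) plays the part of the SMT solver: it either produces values Ψ with Ψ(e1) < Ψ(e2) for every pair in r, or reports that no such assignment exists because r is cyclic. The function names are hypothetical and the code is a stand-in for the constraint, not the tool's implementation.

```python
def ranks(events, r):
    """Return a dict psi with psi[e1] < psi[e2] for all (e1, e2) in r, or None."""
    indegree = {e: 0 for e in events}
    successors = {e: [] for e in events}
    for a, b in r:
        successors[a].append(b)
        indegree[b] += 1
    ready = [e for e in events if indegree[e] == 0]
    psi = {}
    while ready:
        e = ready.pop()
        psi[e] = len(psi)            # next unused natural number
        for n in successors[e]:
            indegree[n] -= 1
            if indegree[n] == 0:
                ready.append(n)
    # Events on a cycle never reach indegree 0 and stay unranked.
    return psi if len(psi) == len(events) else None


def irreflexive(events, r):
    """No event is related to itself."""
    return all((e, e) not in r for e in events)


events = ["a", "b", "c"]
psi = ranks(events, {("a", "b"), ("b", "c")})
no_psi = ranks(events, {("a", "b"), ("b", "a")})
```

For the chain the returned values increase along every edge, while the two-event cycle admits no assignment.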
Encoding Source Memory Model Assertions. For the source architecture
we have to encode that one of the derived relations does not fulfill its assertions.
On the top level this can be encoded as a simple disjunction over all the assertions
of the source memory model, forcing at least one of the irreflexivity or acyclicity
constraints to be violated.
For the irreflexivity violation, we can reuse the same encoding as for the
target memory model simply as ¬irreflexive(r). What remains to be encoded is
cyclic(r), which requires the relation r to be cyclic. Here, we give an encoding
that uses only Boolean variables. We add Boolean variables C(e) and Cr(e1, e2),
which guess the edges and nodes constituting the cycle. We ensure that for every
event in the cycle, there should be at least one incoming edge and at least one
outgoing edge that are also in the cycle:
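$$c_n = \bigwedge_{e_1 \in E} \Big( C(e_1) \Rightarrow \Big( \bigvee_{e_2 \xrightarrow{r} e_1} C_r(e_2,e_1) \;\land \bigvee_{e_1 \xrightarrow{r} e_2} C_r(e_1,e_2) \Big) \Big).$$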
If an edge is guessed to be in a cycle, the edge must belong to relation r, and
both events must also be guessed to be on the cycle:
$$c_e = \bigwedge_{e_1,e_2 \in E} \big( C_r(e_1,e_2) \Rightarrow (r(e_1,e_2) \land C(e_1) \land C(e_2)) \big).$$
A cycle exists if these formulas hold and there is an event in the cycle:
$$cyclic(r) \Leftrightarrow \Big( c_e \land c_n \land \bigvee_{e \in E} C(e) \Big).$$
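The cycle-guessing constraints can be checked by exhaustive search on small relations. The sketch below enumerates all assignments to the node variables C(e) and the edge variables C_r(e1, e2) and tests the three conjuncts. Exhaustive enumeration stands in for the SAT solver and is only feasible for a handful of events; edge variables are created only for pairs in r, which already accounts for the requirement that a guessed edge belongs to r. The function name is hypothetical.

```python
from itertools import product


def cyclic_witness(events, r):
    """Return a set of events satisfying c_e, c_n and 'some C(e)', or None."""
    edges = sorted(r)
    for node_bits in product([False, True], repeat=len(events)):
        C = dict(zip(events, node_bits))
        if not any(C.values()):
            continue                  # at least one event must be on the cycle
        for edge_bits in product([False, True], repeat=len(edges)):
            Cr = dict(zip(edges, edge_bits))
            # c_e: a guessed edge has both end-points guessed as well.
            ce = all((not Cr[(a, b)]) or (C[a] and C[b]) for (a, b) in edges)
            # c_n: a guessed event has a guessed incoming and outgoing edge.
            cn = all(
                (not C[e])
                or (any(Cr[(a, b)] for (a, b) in edges if b == e)
                    and any(Cr[(a, b)] for (a, b) in edges if a == e))
                for e in events
            )
            if ce and cn:
                return {e for e in events if C[e]}
    return None


events = ["a", "b", "c", "d"]
with_cycle = cyclic_witness(events, {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")})
without_cycle = cyclic_witness(events, {("a", "b"), ("b", "c"), ("c", "d")})
```

In the first relation the event d has no outgoing edge, so it cannot be guessed, and the witness consists of exactly the three events on the cycle. In general a witness may be a union of cycles, which is still sufficient to show that r is cyclic.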
5 State Portability
Portability from MS to MT requires that there are no new executions in MT
that did not occur in MS . One motivation to check portability is to make sure
that safety properties of MS carry over to MT . Safety properties only depend
on the values that can be computed, not on the actual executions. Therefore, we
now study a more liberal notion of so-called state portability: MT may admit
new executions as long as they do not compute new states. Admitting more
executions means we require less synchronization (fences) to consider a ported
program correct, and thus state portability promises more efficient code. The
notion has been used in [31].
The main finding in this section is negative: a polynomial encoding of state
portability to SAT does not exist (unless the polynomial hierarchy collapses).
Phrased differently, state portability does not admit an efficient bounded analysis
(like our method for portability). We remind the reader that we restrict our input
to acyclic programs (that can be obtained from while-programs with bounded
unrolling). For while-programs, verification tasks are generally undecidable [38].
Fortunately, our experiments indicate that new executions often compute
new states. This means portability is not only a sufficient condition for state
portability but, in practice, the two notions coincide on most programs.
Combined with the better algorithmic properties of portability, we do not see a
good motivation to move to state portability.
A state is a function that assigns a value to each location and register. An
execution X computes the state state(X) defined as follows: a location receives
the value of the last write event (according to co) accessing it; for a register, its
value depends on the last event in po that writes to it. The relationship between
the notions is as in Lemma 1.
Definition 2 (State Portability). Let MS, MT be MCMs. Program P is
state portable from MS to MT if state(consMT (P )) ⊆ state(consMS (P )).
Lemma 1. (1) Portability implies state portability. (2) State portability does
not imply portability.
For Lemma 1.(2), consider a variant of IRIW (Fig. 1) where all written values
are 0. The program is trivially state portable from TSO to Power, but like IRIW,
not portable.
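The difference between the two notions can be spelled out on explicit sets. The Python sketch below is a toy illustration, not the IRIW executions themselves: executions are opaque labels, the state function is a dictionary lookup, and all names are hypothetical.

```python
def portable(cons_source, cons_target):
    """Portability: every target-consistent execution is source-consistent."""
    return cons_target <= cons_source

def state_portable(cons_source, cons_target, state):
    """State portability: every state computed under the target is computed under the source."""
    return {state(x) for x in cons_target} <= {state(x) for x in cons_source}

# Toy data (assumed): X2 is a new execution in the target that computes
# the same final state as X1, e.g. because all written values are 0.
states = {"X1": (("x", 0), ("y", 0)), "X2": (("x", 0), ("y", 0))}
cons_s = {"X1"}
cons_t = {"X1", "X2"}
```

With this data, portable(cons_s, cons_t) is False while state_portable(cons_s, cons_t, states.get) is True, matching Lemma 1.(2). Since state is applied to a subset on the left, portability always implies state portability, matching Lemma 1.(1).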
We turn to the hardness argument. To check state portability, every
MT -computable state seems to need a formula checking whether some MS-
consistent execution computes it. The result would be an exponential blow-up
or a quantified Boolean formula, which is not practical. But can this exponential
blow-up or quantification be avoided by some clever encoding trick? The answer
is no! Theorem 1 shows that state portability is in a higher class of the polynomial
hierarchy than portability. So state portability is indeed harder to check than
portability.
The polynomial hierarchy [41] contains complexity classes between NP and
PSPACE. Each class is represented by the problem of checking validity of a
Boolean formula with a fixed number of quantifier alternations. We need here
the classes co-NP = $\Pi^P_1 \subseteq \Pi^P_2$. The tautology problem (validity
of a closed Boolean formula with a universal quantifier,
$\forall x_1 \ldots x_n : \psi$) is a $\Pi^P_1$-complete problem. The higher
class $\Pi^P_2$ allows for a second quantifier: validity of a formula
$\forall x_1 \ldots x_n \exists y_1 \ldots y_n : \psi$ is a $\Pi^P_2$-complete
problem. Theorem 1 refers to a class of common memory models that we define in a
moment. Moreover, we assume that the given pair of memory models MS and MT is
non-trivial in the sense that consMT (P ) ⊆ consMS (P ) fails for some program,
and similarly for state portability.
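The two canonical problems can be written down as brute-force evaluators. The Python sketch below only illustrates the quantifier structure (one block of universal variables versus a universal block followed by an existential block); it assumes ψ is given as a Python predicate over tuples of Booleans, and the function names are hypothetical.

```python
from itertools import product

def forall(psi, n):
    """Validity of 'forall x1..xn : psi' (the canonical Pi^P_1 problem)."""
    return all(psi(xs) for xs in product([False, True], repeat=n))

def forall_exists(psi, n, m):
    """Validity of 'forall x1..xn exists y1..ym : psi' (the canonical Pi^P_2 problem)."""
    return all(
        any(psi(xs, ys) for ys in product([False, True], repeat=m))
        for xs in product([False, True], repeat=n)
    )
```

For example, forall_exists(lambda xs, ys: xs[0] != ys[0], 1, 1) holds because y can always be chosen as the negation of x. A single existential SAT query can refute the first kind of formula by finding one falsifying assignment, but the nested any inside all has no such direct counterpart.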
Theorem 1. Let MS, MT be a non-trivial pair of common MCMs. (1) Portability
from MS to MT is $\Pi^P_1$-complete. (2) State portability is $\Pi^P_2$-complete.
By Theorem 1.(2), state portability cannot be solved efficiently. The first
part says that our portability analysis is optimal. We focus on this lower bound
to give a taste of the argumentation: given a non-trivial pair of memory models,
we know there is a program that is not portable. Crucially, we do not know
the program but give a construction that works for any program. The proof of
Theorem 1.(2) is along similar lines but more involved.
Definition 3. We call an MCM common1 if
(i) the inverse operator is only used in the definition of fr ,
(ii) the constructs sthd, sloc, and 〈set〉 × 〈set〉 are only used to restrict (in a
conjunction) other relations,
(iii) it satisfies uniproc (Axiom 1 ) , and
(iv) every program is portable from this MCM to SC.
We explain the definition. When formulating an MCM, one typically forbids
well-chosen cycles of base relations (and fr). To this end, derived relations are
introduced that capture the paths of interest, and acyclicity constraints are
imposed on the derived relations. The operators inverse and 〈set〉 × 〈set〉 may do
the opposite: they add relations that do not correspond to paths of base relations
(and fr). Besides stating what is common in MCMs, Properties (i) and (ii) help
us compose programs (cf. next paragraph). Uniproc is a fundamental property
without which an MCM is hard to program. Since the purpose of an MCM is to
capture SC relaxations, we can assume MCMs to be weaker than SC. Properties
(iii) and (iv) guarantee that the program Pψ given below is portable between
any common MCMs.
The crucial property of common MCMs is the following. For every pair of
events e1, e2 in a derived relation, (1) there are (potentially several) sequences
of base relations (and fr) that connect e1 and e2, and (2) the derived relation
only depends on these sequences. The property ensures that if we append a
program P′ to a location-disjoint program P, consistency of composed executions
is preserved.
1 Notice that all memory models considered in [8] and in this paper are common ones.
It remains to prove $\Pi^P_1$-hardness of portability. We first introduce the
program Pψ that generates some assignment and checks if it satisfies the Boolean
formula ψ(x1 . . . xm) (over the variables x1 . . . xm). The program Pψ := t1 ‖ t2
consists of the two threads t1 and t2 defined below. Note that we cannot directly
write a constant i to a location, so we first assign i to register rc,i.

  thread t1                              thread t2
  rc,0 ← 0; rc,1 ← 1; rc,2 ← 2           rc,1 ← 1;
  x1 := rc,0 . . . xm := rc,0;           x1 := rc,1 . . . xm := rc,1;
  r1 ← x1 . . . rm ← xm;
  if ψ(r1 . . . rm) then
    y := rc,2;
  else y := rc,1;

We reduce checking whether ∀x1 . . . xm : ψ(x1 . . . xm) holds to portability of a
program P∀ψ. The idea for P∀ψ is this. First Pψ is run, it guesses and evaluates an
assignment for ψ. If ψ is not satisfied (y = 1), then some non-portable program
Pnp is executed. The program P∀ψ is portable iff the non-portable part is never
executed. This is the case iff ψ is satisfied by all assignments.
Let MS, MT be common and non-trivial. By non-triviality, there is a program
$P_{np} = t'_1 \parallel \cdots \parallel t'_k$ that is not portable from MS to MT.
We can assume Pnp has no registers or locations in common with Pψ. Program P∀ψ
prepends Pψ to the first two threads of Pnp. Once y = 1, Pnp starts running.
Formally, let t1 and t2 be the threads in Pψ and let ti := skip for 3 ≤ i ≤ k.
We define $P_{\forall\psi} := t''_1 \parallel \cdots \parallel t''_k$ with
$t''_i := t_i;\ r \leftarrow y;\ \text{if}\,(r = 1)\ \text{then}\ t'_i$.
We show that P∀ψ is portable iff ψ is satisfied for every assignment by proving
the following: if P∀ψ is not portable then ψ has an unsatisfying assignment and
vice versa.
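The reduction can be mimicked at the level of final values of y. The Python sketch below is a simplification under a stated assumption: following the description of Pψ as a program that guesses an assignment, each read ri may observe either 0 (from t1) or 1 (from t2), independently of the others. It does not model executions or memory models, and the function names are hypothetical.

```python
from itertools import product

def final_y_values(psi, m):
    """Values of y that P_psi may compute, one per assignment observed by the reads."""
    values = set()
    for assignment in product([0, 1], repeat=m):
        values.add(2 if psi(assignment) else 1)  # y := 2 if psi holds, else y := 1
    return values

def runs_nonportable_part(psi, m):
    """P_np is triggered in some run iff some assignment falsifies psi (y = 1)."""
    return 1 in final_y_values(psi, m)
```

Under this reading, the non-portable part is unreachable exactly when ψ holds for all assignments, which is the equivalence that the hardness proof establishes for P∀ψ.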
6 Experiments
The encoding from Section 4 has been implemented in a tool called porthos.
We evaluate porthos on benchmark programs on a wide range of well-known
MCMs. For SC, TSO, PSO, RMO and Alpha (henceforth called traditional
architectures) we use the formalizations from [3]; for Power the one in Fig. 6. We
divide our results into three categories: portability of mutual exclusion algorithms,
portability of litmus tests, and performance of the tool.
Portability of Mutual Exclusion Algorithms. Most of the tools that are
MCM-aware [8,35,44,47,48] accept only litmus tests as inputs. porthos, how-
ever, can analyze cyclic programs with control flow branching and merging by
unrolling them into acyclic form. In order to show the broad applicability of our
method, we tested portability of several mutual exclusion algorithms: Lamport’s
bakery [32], Burns’ protocol [15], Dekker’s [23], Lamport’s fast mutex [33],
Peterson’s [37] and Szymanski’s [43]. The benchmarks also include previously known
fenced versions for TSO (marked as x86) and new versions we introduced using
Power fences (marked as Power). The mutual exclusion program loops were
unrolled once in all the experiments to obtain an acyclic program, and the
discussion in what follows is for the portability analysis of this acyclic program.

  Benchmark          SC-TSO   SC-Power   TSO-Power
  Bakery               ✗         ✗          ✗
  Bakery x86           ✔         ✗          ✗
  Bakery Power         ✔         ✔          ✔
  Burns                ✗         ✗          ✗
  Burns x86            ✔         ✗          ✗
  Burns Power          ✔         ✔          ✔
  Dekker               ✗         ✗          ✗
  Dekker x86           ✔         ✗          ✗
  Dekker Power         ✔         ✔          ✔
  Lamport              ✗         ✗          ✗
  Lamport x86          ✔         ✗          ✗
  Lamport Power        ✔         ✔          ✔
  Peterson             ✗         ✗          ✗
  Peterson x86         ✔         ✗          ✗
  Peterson Power       ✔         ✔          ✔
  Szymanski            ✗         ✗          ✗
  Szymanski x86        ✔         ✗          ✗
  Szymanski Power      ✔         ✔          ✔

                                                  Deadness
  MCM pair          ✗✗       ✔✔       ✗✔       ✔✔       ✗✔
  SC-TSO            27       898       75       933       40
  SC-PSO            27       777      196       836      137
  SC-RMO            27       737      236       780      193
  SC-Alpha          27       846      127       887       86
  TSO-PSO            0       833       67       883       27
  TSO-RMO            0       760      240       798      202
  TSO-Alpha          0       877      133       912       88
  PSO-RMO            0       831      169       844      156
  PSO-Alpha          0       968       32       973       27
  RMO-Alpha          0       999        1       999        1
  Alpha-RMO          0       856      144       864      136
                  0.98%    85.29%   13.73%    88.26%   10.73%
  SC-Power        1477       898       52       936       14
  TSO-Power        917      1132      378      1166      344
  PSO-Power        502      1880       45      1892       33
  RMO-Power         40      2227      160      2239      148
  Alpha-Power        0      2427        0      2427        0
                 24.20%    70.57%    5.23%    71.35%    4.45%

Table 1: (Left) Bounded portability analysis of mutual exclusion algorithms: portable
(✔), non-portable (✗). (Right) Portability vs. State Portability on litmus tests.
While these algorithms have been proven correct for SC, it is well known that
they do not guarantee mutual exclusion when ported to weaker architectures.
The effects of relaxing the program order have been widely studied; there are
techniques that even place fences automatically to guarantee portability, but
they assume SC as the source architecture [5,12]. In Table 1 (left) we do not
only confirm that fenceless versions of the benchmarks are not portable from
SC to TSO and fenced versions of them are, we also show that those fences are
not enough to guarantee mutual exclusion when ported from TSO to Power.
We have used porthos to find portability bugs when porting from TSO to
Power and manually added fences to forbid such executions (see benchmarks
marked as Power). To the best of our knowledge these are the first results
about portability of mutual exclusion algorithms from memory models weaker
than SC to the Power architecture.
Checking Portability on Litmus Tests. We compare the results of porthos
(which implements portability) against Herd7 (http://diy.inria.fr/herd)
which reasons about state reachability and can be used to test state portability.
Herd7 systematically constructs all consistent executions of the program and
exhaustively enumerates all possible computable states. Such enumeration can
be very expensive for programs with lots of computable states, e.g. for many
programs with a very large level of concurrency. Since Herd7 can only
reason about one memory model at a time, for each test we run the tool twice
(once for each MCM) and compare the sets of computable states. The program is
not state portable if the target MCM generates computable states that are not
computable states of the source MCM.

  (22) domain(cd) ⊆ range(rf)
  (23) imm(co); imm(co); imm(co^{-1}) ⊆ rf?; (po; (rf^{-1})?)?
  where imm(r) := r \ (r; r^{+})
Fig. 7: Syntactic Deadness [47].
Our experiments contain two test suites: TS 1 contains 1000 randomly gen-
erated litmus tests in x86 assembly (to test traditional architectures) and TS 2
contains 2427 litmus tests in Power assembly taken from [36]. Each test contains
between 2 and 4 threads and between 4 and 20 instructions. Table 1 (right)
reports the number of non-portable (w.r.t. both definitions) litmus tests (✗✗),
the number of portable and state-portable litmus tests (✔✔) and the number of
litmus tests that are not portable but are still state portable (✗✔). In the last
case the new executions allowed by the target memory model do not result in
new computable states of the program. We show that in many cases both notions
of portability coincide. For TS 1 on traditional architectures, the number of non
state-portable tests is very low (0.98%), while the non portability of the program
does not generate a new computable state in 13.73% of the cases. For TS 2 from
traditional architectures to Power, the number of non state-portable litmus tests
rises to 24.20%, while only in 5.23% of cases the two notions of portability do
not match because the new executions do not result in a new computable state
for the program.
In order to remove some executions that do not lead to new computable
states, porthos optionally supports the use of syntactic deadness, which has
been recently proposed in [47]. Dead executions are either consistent or lead to
states that are not computable. Formally, an execution X is dead if X ∉ consM(P )
implies that state(X) ≠ state(Y ) for all Y ∈ consM(P ). Instead of looking for any
execution which is not consistent for the source architecture, we want to restrict
the search to non-consistent and dead executions of MS . This is equivalent to
checking state portability. As shown by Wickerson et al. [47], dead executions
can be approximated with constraints 22 and 23 given in Fig. 7, where r? is the
reflexive closure of r . These constraints can be easily encoded into SAT. Our
tool has an implementation which rules out quite a few executions not computing
new states. The last two columns of the table show that by restricting the
search to (syntactically) dead executions, the ratio of litmus tests that the tool
reports as non-portable but that are actually state portable is reduced to 10.73%
for traditional architectures and to 4.44% for Power.
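The relational operators used in the two deadness constraints are simple enough to evaluate on an explicit execution. The Python sketch below assumes relations are finite sets of event pairs; it is an explicit-state illustration rather than the SAT encoding used by porthos, and all function names are hypothetical.

```python
def compose(r, s):
    """Relational composition r; s."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def transitive_closure(r):
    """r+ computed by iterating composition until a fixed point is reached."""
    closure = set(r)
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger

def inverse(r):
    return {(b, a) for (a, b) in r}

def optional(r, events):
    """r? : the reflexive closure of r over the given events."""
    return set(r) | {(e, e) for e in events}

def imm(r):
    """imm(r) := r \\ (r; r+), the immediate edges of r."""
    return set(r) - compose(r, transitive_closure(r))

def syntactically_dead(events, po, rf, co, cd):
    # Constraint (22): domain(cd) is contained in range(rf)
    c22 = {a for (a, _) in cd} <= {b for (_, b) in rf}
    # Constraint (23): imm(co); imm(co); imm(co^-1) is contained in rf?; (po; (rf^-1)?)?
    lhs = compose(compose(imm(co), imm(co)), imm(inverse(co)))
    inner = compose(po, optional(inverse(rf), events))
    rhs = compose(optional(rf, events), optional(inner, events))
    return c22 and lhs <= rhs
```

For a coherence order over three writes, co = {(1, 2), (2, 3), (1, 3)}, imm(co) keeps only the two immediate edges (1, 2) and (2, 3).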
The experiments above show that in most cases both notions of portability
coincide, especially when using dead executions. Cases where the two notions
differ are rare, especially when porting to Power. To test state
portability, our method can be complemented with an extra query to check if
the final state of the counter-example execution is also reachable in the source
model by another execution. However, as shown in Section 5, the price to obtain
such a result is to go one level higher in the polynomial hierarchy, which affects
the performance of the analyses.

[Figure: bar chart of solving times (axis from 0 to 30) for SC-TSO, SC-Power and
TSO-Power on each benchmark: Bakery, Burns, Dekker, Lamport, Peterson and
Szymanski, each in its plain, x86 and Power version.]
Fig. 8: Solving times (in secs.) for portability of mutual exclusion algorithms.
Performance. For small litmus tests, Herd7 outperforms porthos. However, as
soon as the programs become bigger, Herd7 does not
perform as well as porthos. We believe this is due to the use of efficient search
techniques in the SMT solver. The impact on efficiency is manifested as the
number of executions Herd7 has to explicitly simulate by enumeration grows. We
evaluate the solving times of our tool on the mutual exclusion benchmarks. Our
prototype encoding implementation is done in Python; the encoding generation
times have a minimum of 13 secs and a maximum of 303 secs. The encodings
involving Power are usually more time consuming than traditional models since
Power has both transitive closures and least fixed points in its encoding. We
expect that the encoding times could be vastly improved by a careful C/C++
implementation of the encoding. Fig. 8 presents the solving times of porthos
for the mutual exclusion algorithms, which are actually much lower than the
encoding times for our prototype implementation.
7 Related Work
Semantics and verification under weak memory models have been the subject of
study at least since 2007. Initially, the behavior of x86 and TSO has been clari-
fied [13,40], then the Power architecture has been addressed [36,39], now ARM is
being tackled [26]. The study also looks beyond hardware, in particular C++11
received considerable attention [10,11]. Research in semantics goes hand in hand
with the development of verification methods. They come in two flavors: pro-
gram logics [45,46] and algorithmic approaches [1,2,6,8,9,12,14,21,22]. Notably,
each of these methods and tools is designed for a specific memory model and
hence is not directly able to handle porting tasks.
The problem of verifying consistency under weak memory models has been
extensively studied. Multiple methods and the complexity of the corresponding
problems have been analyzed [16,24,25]. A prominent approach is testing where
an execution is (partially) given and consistency is tested for a specified model
[27,29]. In this line we showed that state portability (formulated as a bounded
analysis for cyclic programs) is $\Pi^P_2$-complete. This means there is no hope for
a polynomial encoding into SAT (unless the polynomial hierarchy collapses). In
contrast, our execution-based notion of portability is co-NP-complete (we look
for a violation to portability), which in particular means that our portability
analysis is optimal in the complexity sense. Our experiments show that in most
of the cases both notions of portability coincide.
A problem less general than portability is solved in [12] where non-portable
traces from SC to TSO are characterized. The problem is reduced to state
reachability under the SC semantics in an instrumented program and a minimal number
of fences is synthesized to enforce portability. One step further, one can enforce
portability not only to TSO, but also to weaker memory models. The offence
tool [7] does this, but can only analyze litmus tests and is limited to restoring
SC. Checking the existence of critical cycles (i.e. portability bugs) on complex
programs has been tackled in [5], where such cycles are broken by automatically
introducing fences. The cost of different types of fences is considered and the
task is encoded as an optimization problem. The musketeer tool analyzes C
programs and has been shown to scale up to programs with thousands of lines of code,
but the implementation is also restricted to the case where the source model
is SC. Fence insertion can also be used to guarantee safety properties (rather
than restoring SC behaviors). The Fender and DFence tools [31,34] can verify
real-world C code, but they are restricted to TSO, PSO, and RMO.
8 Conclusion and Outlook
We introduce the first method that tests portability between any two axiomatic
memory models defined in the CAT language. The method reduces portability
analysis to satisfiability of an SMT formula in SAT + integer difference logic. We
propose efficient solutions for two crucial tasks: reasoning about two user-defined
MCMs at the same time and encoding recursively defined relations (needed for
Power) into SMT. The latter can be re-used by any bounded model checking
technique reasoning about complex memory models.
Our complexity analysis and experimental results both suggest that our def-
inition of portability is preferable over the state-based notion of portability. If
state-based portability is required, the complexity results show that it cannot be
done with a single SMT solver query, unlike the approach to portability analysis
suggested in this paper. We also show that our method is not restricted to
litmus tests: for the first time, we report on automated tool-based portability
analysis of mutual exclusion algorithms from several axiomatic memory
models to Power.
Acknowledgements
We thank John Wickerson for his explanations about dead executions, Luc
Maranget for several discussions about CAT models and Egor Derevenetc for
providing help with the mutual exclusion benchmarks. This work was partially
supported by the Academy of Finland project 277522. Florian Furbach was sup-
ported by the DFG project R2M2: Robustness against Relaxed Memory Models.
References
1. Parosh A. Abdulla, Stavros Aronis, Mohamed Faouzi Atig, Bengt Jonsson, Carl
Leonardsson, and Konstantinos F. Sagonas. Stateless model checking for TSO and
PSO. In TACAS, volume 9035 of LNCS, pages 353–367. Springer, 2015.
2. Parosh A. Abdulla, Mohamed Faouzi Atig, Bengt Jonsson, and Carl Leonardsson.
Stateless model checking for POWER. In CAV, volume 9780 of LNCS, pages
134–156. Springer, 2016.
3. Jade Alglave. A Shared Memory Poetics. Thèse de doctorat, L’université Paris
Denis Diderot, 2010.
4. Jade Alglave, Patrick Cousot, and Luc Maranget. Syntax and semantics of the weak
consistency model specification language CAT. CoRR, abs/1608.07531, 2016.
5. Jade Alglave, Daniel Kroening, Vincent Nimal, and Daniel Poetzl. Don’t sit on the
fence - A static analysis approach to automatic fence insertion. In CAV, volume
8559 of LNCS, pages 508–524. Springer, 2014.
6. Jade Alglave, Daniel Kroening, and Michael Tautschnig. Partial orders for efficient
bounded model checking of concurrent software. In CAV, volume 8044 of LNCS,
pages 141–157. Springer, 2013.
7. Jade Alglave and Luc Maranget. Stability in weak memory models. In CAV,
volume 6806 of LNCS, pages 50–66. Springer, 2011.
8. Jade Alglave, Luc Maranget, and Michael Tautschnig. Herding cats: Modelling,
simulation, testing, and data mining for weak memory. ACM Trans. Program.
Lang. Syst., 36(2):7:1–7:74, 2014.
9. Mohamed Faouzi Atig, Ahmed Bouajjani, Sebastian Burckhardt, and Madanlal
Musuvathi. On the verification problem for weak memory models. In POPL,
pages 7–18. ACM, 2010.
10. Mark Batty, Alastair F. Donaldson, and John Wickerson. Overhauling SC atomics
in C11 and OpenCL. In POPL, pages 634–648. ACM, 2016.
11. Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. Mathe-
matizing C++ concurrency. In POPL, pages 55–66. ACM, 2011.
12. Ahmed Bouajjani, Egor Derevenetc, and Roland Meyer. Checking and enforcing
robustness against TSO. In ESOP, volume 7792 of LNCS, pages 533–553. Springer,
2013.
13. Sebastian Burckhardt, Rajeev Alur, and Milo M. K. Martin. CheckFence: Checking
consistency of concurrent data types on relaxed memory models. In PLDI, pages
12–21. ACM, 2007.
14. Sebastian Burckhardt and Madanlal Musuvathi. Effective program verification for
relaxed memory models. In CAV, volume 5123 of LNCS, pages 107–120. Springer,
2008.
15. James E. Burns and Nancy A. Lynch. Bounds on shared memory for mutual
exclusion. Information and Computation, 107(2):171 – 184, 1993.
16. Jason F. Cantin, Mikko H. Lipasti, and James E. Smith. The complexity of ver-
ifying memory coherence and consistency. IEEE Trans. Parallel Distrib. Syst.,
16(7):663–671, 2005.
17. Hélène Collavizza and Michel Rueher. Exploration of the capabilities of constraint
programming for software verification. In TACAS, volume 3920 of LNCS, pages
182–196. Springer, 2006.
18. William W. Collier. Reasoning about parallel architectures. Prentice Hall, 1992.
19. Scott Cotton, Eugene Asarin, Oded Maler, and Peter Niebert. Some progress in
satisfiability checking for difference logic. In FORMATS, volume 3253 of LNCS,
pages 263–276. Springer, 2004.
20. Daniela Carneiro da Cruz, Maria João Frade, and Jorge Sousa Pinto. Verification
conditions for single-assignment programs. In SAC, pages 1264–1270. ACM, 2012.
21. Andrei M. Dan, Yuri Meshman, Martin T. Vechev, and Eran Yahav. Predicate
abstraction for relaxed memory models. In SAS, volume 7935 of LNCS, pages
84–104. Springer, 2013.
22. Andrei M. Dan, Yuri Meshman, Martin T. Vechev, and Eran Yahav. Effective
abstractions for verification under relaxed memory models. In VMCAI, volume
8931 of LNCS, pages 449–466. Springer, 2015.
23. Edsger W. Dijkstra. Cooperating sequential processes. In The Origin of Concurrent
Programming, pages 65–138. Springer-Verlag New York, Inc., 2002.
24. Constantin Enea and Azadeh Farzan. On atomicity in presence of non-atomic
writes. In TACAS, volume 9636 of LNCS, pages 497–514. Springer, 2016.
25. Azadeh Farzan and P. Madhusudan. Monitoring atomicity in concurrent programs.
In CAV, volume 5123 of LNCS, pages 52–65. Springer, 2008.
26. Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit Sarkar, Ali Sezgin, Luc
Maranget, Will Deacon, and Peter Sewell. Modelling the ARMv8 architecture,
operationally: concurrency and ISA. In POPL, pages 608–621. ACM, 2016.
27. Florian Furbach, Roland Meyer, Klaus Schneider, and Maximilian Senftleben.
Memory-model-aware testing: A unified complexity analysis. ACM Trans. Em-
bedded Comput. Syst., 14(4):63, 2015.
28. Martin Gebser, Tomi Janhunen, and Jussi Rintanen. SAT modulo graphs: Acyclic-
ity. In JELIA, pages 137–151, 2014.
29. Phillip B. Gibbons and Ephraim Korach. Testing shared memories. SIAM Journal
on Computing, 26:1208–1244, 1997.
30. Keijo Heljanko, Misa Keinänen, Martin Lange, and Ilkka Niemelä. Solving parity
games by a reduction to SAT. J. Comput. Syst. Sci., 78(2):430–440, 2012.
31. Michael Kuperstein, Martin T. Vechev, and Eran Yahav. Automatic inference of
memory fences. SIGACT News, 43(2):108–123, 2012.
32. Leslie Lamport. A new solution of Dijkstra’s concurrent programming problem.
Commun. ACM, 17(8):453–455, 1974.
33. Leslie Lamport. A fast mutual exclusion algorithm. ACM Trans. Comput. Syst.,
5(1):1–11, 1987.
34. Feng Liu, Nayden Nedev, Nedyalko Prisadnikov, Martin T. Vechev, and Eran Ya-
hav. Dynamic synthesis for relaxed memory models. In PLDI, pages 429–440.
ACM, 2012.
35. Sela Mador-Haim, Rajeev Alur, and Milo M. K. Martin. Generating litmus tests
for contrasting memory consistency models. In CAV, volume 6174 of LNCS,
pages 273–287. Springer, 2010.
36. Sela Mador-Haim, Luc Maranget, Susmit Sarkar, Kayvan Memarian, Jade Alglave,
Scott Owens, Rajeev Alur, Milo M. K. Martin, Peter Sewell, and Derek Williams.
An axiomatic memory model for POWER multiprocessors. In CAV, volume 7358
of LNCS, pages 495–512. Springer, 2012.
37. Gary L. Peterson. Myths about the mutual exclusion problem. Inf. Process. Lett.,
12(3):115–116, 1981.
38. Henry G. Rice. Classes of recursively enumerable sets and their decision problems.
Transactions of the American Mathematical Society, 74(2):358–366, 1953.
39. Susmit Sarkar, Peter Sewell, Jade Alglave, Luc Maranget, and Derek Williams.
Understanding POWER multiprocessors. In PLDI, pages 175–186. ACM, 2011.
40. Susmit Sarkar, Peter Sewell, Francesco Zappa Nardelli, Scott Owens, Tom Ridge,
Thomas Braibant, Magnus O. Myreen, and Jade Alglave. The semantics of x86-CC
multiprocessor machine code. In POPL, pages 379–391. ACM, 2009.
41. Larry J. Stockmeyer. The polynomial-time hierarchy. Theor. Comput. Sci., 3(1):1–
22, 1976.
42. Viggo Stoltenberg-Hansen, Edward R. Griffor, and Ingrid Lindstrom. Mathemati-
cal Theory of Domains. Cambridge Tracts in Theoretical Computer Science. Cam-
bridge University Press, 1994.
43. Boleslaw K. Szymanski. A simple solution to Lamport’s concurrent programming
problem with linear wait. In ICS, pages 621–626. ACM, 1988.
44. Emina Torlak, Mandana Vaziri, and Julian Dolby. MemSAT: checking axiomatic
specifications of memory models. In PLDI, pages 341–350. ACM, 2010.
45. Aaron Turon, Viktor Vafeiadis, and Derek Dreyer. GPS: navigating weak memory
with ghosts, protocols, and separation. In OOPSLA, pages 691–707. ACM, 2014.
46. Viktor Vafeiadis and Chinmay Narayan. Relaxed separation logic: a program logic
for C11 concurrency. In OOPSLA, pages 867–884. ACM, 2013.
47. John Wickerson, Mark Batty, Tyler Sorensen, and George A. Constantinides. Au-
tomatically comparing memory consistency models. In POPL, pages 190–204.
ACM, 2017.
48. Yue Yang, Ganesh Gopalakrishnan, Gary Lindstrom, and Konrad Slind. Nemos:
A framework for axiomatic and executable specifications of memory consistency
models. In IPDPS. IEEE Computer Society, 2004.
A Rest of the encoding
This section details the remaining two sub formulas for the portability encoding,
i.e. the control-flow and the data-flow.
A.1 Control-flow
Instead of representing the branching of the program with a tree [20], we use
a directed acyclic graph (DAG) capturing the branches of the program and how
those merge again. This keeps the size of the control-flow formula linear
w.r.t. the (unfolded) program. The tree representation can be exponential if the
program has several if statements. We encode this DAG in the formula φCF .
For each instruction2 i we use a Boolean variable cfi representing the fact
that the instruction is actually executed by the execution. For a sequence i1 :=
i2; i3, instruction i1 belongs to the execution iff both i2 and i3 belong too (1).
Assignments (local computations, loads and stores) and fences do not impose any
restriction in the control-flow encoding (2)-(5); belonging or not to the execution
depends on them being part of the body of some if statement at a higher level of
the recursive definition. Given an instruction i1 := if b then i2 else i3, we use
three control-flow variables cfi1 , cfi2 , cfi3 ; then i1 is executed iff one of i2, i3 is
performed (6), which one actually depends on the value of b and this is encoded
in the data-flow formula φDF . These restrictions are encoded recursively by the
following constraints:
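\begin{align}
\phi_{CF}(i_2; i_3) &= cf_{i_1} \Leftrightarrow (cf_{i_2} \wedge cf_{i_3}) \wedge \phi_{CF}(i_2) \wedge \phi_{CF}(i_3) \tag{1}\\
\phi_{CF}(r \leftarrow e) &= cf_{r \leftarrow e} \tag{2}\\
\phi_{CF}(r \leftarrow l) &= cf_{r \leftarrow l} \tag{3}\\
\phi_{CF}(l := r) &= cf_{l := r} \tag{4}\\
\phi_{CF}(\text{fence}) &= cf_{\text{fence}} \tag{5}\\
\phi_{CF}(\text{if } b \text{ then } i_2 \text{ else } i_3) &= cf_{i_1} \Leftrightarrow (cf_{i_2} \vee cf_{i_3}) \wedge \phi_{CF}(i_2) \wedge \phi_{CF}(i_3) \tag{6}
\end{align}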
A.2 Data-flow
We encode the data flow in static single assignment (SSA) form using the method of [17].
Formula φDF represents how the data flows between locations and registers; we first
focus on how the data-flow of the local thread behavior is encoded (sub-formula φDFthrd ).
For each location of the program (resp. register) we use several integer variables (one
for each variable in the SSA form of the program) representing the value carried by
that location (resp. register) in the execution. For loads, stores and local computations,
if the instruction is part of the execution (i.e. its control-flow variable is True) then
both sides of the assignment should coincide (7)-(9). For a sequence, the formula is the
conjunction of the encoding of the corresponding instructions (10).
2 Notice that instructions are defined recursively and thus the term “instruction” may
represent a sequence of events accessing the memory.
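Suppose register r and location l have been already assigned p and q times
respectively, then:
\begin{align}
\phi_{DF_{thrd}}(r \leftarrow e) &= cf_{r \leftarrow e} \Rightarrow (r_{p+1} = e) \tag{7}\\
\phi_{DF_{thrd}}(r \leftarrow l) &= cf_{r \leftarrow l} \Rightarrow (r_{p+1} = l_{q+1}) \tag{8}\\
\phi_{DF_{thrd}}(l := r) &= cf_{l := r} \Rightarrow (l_{q+1} = r_p) \tag{9}\\
\phi_{DF_{thrd}}(i_1; i_2) &= \phi_{DF_{thrd}}(i_1) \wedge \phi_{DF_{thrd}}(i_2) \tag{10}
\end{align}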
Following the SSA form, the left hand side of each assignment introduces new
variables; for registers in the right hand side, indexes are not updated so they match
with the last value which can only be modified by the same thread (9). However for
locations in the right hand side, the index is also updated (8) to allow variables to
match not only with the last assignment done by that thread, but also from other
threads (see the sub-formula φDFmem below).
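The index bookkeeping of constraints (7)-(10) can be sketched for a straight-line thread as follows. This Python fragment is an illustration and not the porthos implementation: it assumes instructions are given as hypothetical (kind, lhs, rhs) triples, treats the expression e of a local computation as an opaque string, names control-flow variables cf_0, cf_1, ... by position, and emits constraints as plain strings instead of solver terms.

```python
def thread_dataflow(instructions):
    """Emit SSA data-flow constraints for a sequence of instructions."""
    index = {}        # variable name -> number of assignments seen so far
    constraints = []  # a sequence is the conjunction of its parts (10)
    for k, (kind, lhs, rhs) in enumerate(instructions):
        cf = f"cf_{k}"
        if kind == "local":    # r <- e, constraint (7)
            index[lhs] = index.get(lhs, 0) + 1
            constraints.append(f"{cf} => ({lhs}_{index[lhs]} = {rhs})")
        elif kind == "load":   # r <- l, constraint (8): both indices are fresh
            index[lhs] = index.get(lhs, 0) + 1
            index[rhs] = index.get(rhs, 0) + 1
            constraints.append(f"{cf} => ({lhs}_{index[lhs]} = {rhs}_{index[rhs]})")
        elif kind == "store":  # l := r, constraint (9): register index is reused
            index[lhs] = index.get(lhs, 0) + 1
            constraints.append(f"{cf} => ({lhs}_{index[lhs]} = {rhs}_{index.get(rhs, 0)})")
        else:
            raise ValueError(f"unknown instruction kind: {kind}")
    return constraints

example = thread_dataflow([("local", "r", "1"), ("store", "x", "r"), ("load", "s", "x")])
```

In the example, the load of x introduces the fresh variable x_2 rather than reusing x_1, which leaves room for the value to come from another thread.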
If statements may have a different number of assignments in their branches for
certain variables. The idea here is to insert dummy assignments to ensure that both
branches have the same number of assignments. We show the encoding for the simple
case where each branch consists only of local computations to a register r. The same
process is applied individually for each register and location assigned in a branch.
If the branches contain if statements, the procedure must be applied recursively to
each of them. Consider the if statement if b then i1 else i2 where the first branch
has fewer assignments to r than the second one, i.e. i1 := r ← e1,1; . . . ; r ← e1,p and
i2 := r ← e2,1; . . . ; r ← e2,q with p < q (the encoding is symmetric for q < p). Assume
r has been already assigned x times, then the encoding of the instruction contains the
following constraint:
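\begin{align}
\phi_{DF_{thrd}}(\text{if } b \text{ then } i_1 \text{ else } i_2) = {} & (b \Rightarrow cf_{i_1}) \wedge (\neg b \Rightarrow cf_{i_2}) \tag{11}\\
& \wedge\ cf_{i_1} \Rightarrow (r_{x+1} = e_{1,1}) \tag{12}\\
& \ \ \vdots \notag\\
& \wedge\ cf_{i_1} \Rightarrow (r_{x+p} = e_{1,p}) \tag{13}\\
& \ \ \vdots \notag\\
& \wedge\ cf_{i_1} \Rightarrow (r_{x+q} = r_{x+p}) \tag{14}\\
& \wedge\ cf_{i_2} \Rightarrow (r_{x+1} = e_{2,1}) \tag{15}\\
& \ \ \vdots \notag\\
& \wedge\ cf_{i_2} \Rightarrow (r_{x+q} = e_{2,q}) \tag{16}
\end{align}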
Constraint (11) imposes that which branch is followed depends on the value of the
predicate. Next, we specify how the value of r is updated depending on the branch: if the
first branch is taken, then the value of r is updated according to the expressions the first
p times (12)-(13) and it remains unchanged for the remaining q − p assignments (14);
if the second branch is taken, the value of r is updated according to the corresponding
expressions in that branch (15)-(16). By adding constraints which keep assignments
unchanged, we can easily model how branches merge again since any variable assigned
after the if statement would be matched with the last value assigned to that variable.
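The padding with dummy assignments can be sketched as follows. This is a minimal illustration of constraints (11)-(16) for a single register, assuming that each branch is given as a list of opaque expression strings; the function name `encode_if` and the textual constraint syntax are hypothetical.

```python
from typing import List


def encode_if(reg: str, x: int, then_exprs: List[str], else_exprs: List[str]) -> List[str]:
    """Constraints for `if b then i1 else i2` where both branches only assign
    to `reg`, and `reg` has been assigned x times before the if statement."""
    n = max(len(then_exprs), len(else_exprs))
    out = ["(b => cf_i1) & (!b => cf_i2)"]                 # (11)
    for cf, exprs in (("cf_i1", then_exprs), ("cf_i2", else_exprs)):
        # Real assignments of the branch, cf. (12)-(13) and (15)-(16).
        for k, e in enumerate(exprs, start=1):
            out.append(f"{cf} => ({reg}_{x + k} = {e})")
        # Dummy assignments keep the last value of the shorter branch, cf. (14).
        for k in range(len(exprs) + 1, n + 1):
            out.append(f"{cf} => ({reg}_{x + k} = {reg}_{x + len(exprs)})")
    return out


if __name__ == "__main__":
    for c in encode_if("r", 3, ["a"], ["b1", "b2"]):
        print(c)
```

After the if statement both branches have advanced the index of r by the same amount, so any later use of r refers to the same SSA variable regardless of the branch taken.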
Since fresh variables are added for locations in both sides of the assignments (8),
their values are not yet constrained. We now specify how the data flows between in-
structions that access locations in the shared memory (possibly in different threads).
This depends on where the values are read-from (i.e. the rf relation) and is encoded
by constraints $\phi_{DF_{mem}}(i_1, i_2)$. A write instruction l := r generates the data-flow constraint
$cf_{l := r} \Rightarrow (l_i = r)$ and a read r ← l is encoded by $cf_{r \leftarrow l} \Rightarrow (r = l_j)$. The variables $l_i$, $l_j$
remain unconstrained. If both instructions (call them $i_1$ and $i_2$) are related by the rf
relation, then their values need to match:
$\phi_{DF_{mem}}(i_1, i_2) = rf(i_1, i_2) \Rightarrow (l_i = l_j)$
Finally, the data-flow between registers and locations, either within or between threads,
is encoded as:
$\phi_{DF} = \bigwedge_{(i_1, i_2) \in (W \times R) \cap sloc} \phi_{DF_{mem}}(i_1, i_2) \ \wedge\ \bigwedge_{t \in T} \phi_{DF_{thrd}}(i_t)$
where $i_t$ represents the instruction at the highest recursive level of thread t.
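The memory part of this conjunction ranges over all write/read pairs on the same location. A minimal sketch, assuming events are given as hypothetical tuples (name, kind, location, SSA variable):

```python
from typing import List, Tuple

# (event name, "W" or "R", location, SSA variable attached to the access)
Event = Tuple[str, str, str, str]


def encode_dfmem(events: List[Event]) -> List[str]:
    """One constraint rf(w, r) => (l_i = l_j) per pair in (W x R) ∩ sloc."""
    out = []
    for (w, w_kind, w_loc, w_var) in events:
        if w_kind != "W":
            continue
        for (r, r_kind, r_loc, r_var) in events:
            if r_kind == "R" and r_loc == w_loc:  # same-location pairs only
                out.append(f"rf({w},{r}) => ({w_var} = {r_var})")
    return out


if __name__ == "__main__":
    evs = [("w", "W", "x", "x_1"), ("r", "R", "x", "x_2"), ("s", "R", "y", "y_1")]
    print(encode_dfmem(evs))
```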
B Complexity Proofs
We recall the main theorem and the program Pψ := t1 ‖ t2 from the paper:
Theorem 1. Let MS, MT be a non-trivial pair of common MCMs. (1) Portability
from MS to MT is $\Pi^P_1$-complete. (2) State portability is $\Pi^P_2$-complete.
t1
1 rc,0 ← 0; rc,1 ← 1; rc,2 ← 2;
2 x1 := rc,0; . . . ; xm := rc,0 ; // Writes w1,0 . . . wm,0
3 r1 ← x1; . . . ; rm ← xm ; // Reads r1 . . . rm
4 if ψ(r1, . . . , rm) then // If A satisfies ψ,
5 y := rc,2 ; // return 2.
6 else
7 y := rc,1 ; // If it doesn’t, return 1.
t2
1 rc,1 ← 1;
2 x1 := rc,1; . . . xm := rc,1 ; // Writes w1,1 . . . wm,1
We give some technical results and show $\Pi^P_1$-completeness of portability for
common MCMs.
Lemma 2. Pψ is portable from every common MCM to another common MCM.
Proof. According to property (iv), any common MCM is portable to SC. In SC, an
execution corresponds to an interleaving of the two thread executions. First, the threads
create some variable assignment A which is read by t1. Then, t1 checks whether the
assignment satisfies ψ. If it does, y is set to 2, otherwise y is set to 1.
We show that any consistent execution of some common MCM is SC-consistent by
examining possible executions.
– If $w_{i,1} \xrightarrow{co} w_{i,0}$ and $w_{i,0} \xrightarrow{rf} r_i$, then this corresponds to an interleaving where $w_{i,1}$
occurs first, then t1 writes $w_{i,0}$ and reads 0 ($r_i$).
– If $w_{i,0} \xrightarrow{co} w_{i,1}$ and $w_{i,0} \xrightarrow{rf} r_i$, then $w_{i,0}$ and $r_i$ in t1 occur first and then $w_{i,1}$.
– If $w_{i,0} \xrightarrow{co} w_{i,1}$ and $w_{i,1} \xrightarrow{rf} r_i$, then t1 writes $w_{i,0}$, t2 overwrites this with $w_{i,1}$ and
afterward t1 reads 1 with $r_i$.
– If $w_{i,1} \xrightarrow{co} w_{i,0}$ and $w_{i,1} \xrightarrow{rf} r_i$, then the derived relation $fr := rf^{-1}; co$ satisfies
$r_i \xrightarrow{fr} w_{i,0}$. Since $w_{i,0}$ and $r_i$ are related by po and access the same location, there is
a cycle $w_{i,0} \xrightarrow{po \cap sloc} r_i \xrightarrow{fr} w_{i,0}$. This is a violation of uniproc, so the situation cannot
occur in a common MCM.
So we can construct a corresponding interleaving for any execution of a common MCM.
It follows that every execution of a common MCM is SC-consistent and according to
property (iv) consistent with any common MCM.
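The case distinction above can be replayed mechanically. The sketch below checks the four combinations of co and rf for a single location, assuming that uniproc is the acyclicity of po ∩ sloc ∪ rf ∪ co ∪ fr (the usual per-location coherence axiom; this formulation is an assumption of the example) and ignoring the initial write. Event names `w0`, `w1`, `r` stand for $w_{i,0}$, $w_{i,1}$, $r_i$.

```python
def compose(r1, r2):
    """Relational composition r1; r2 on sets of pairs."""
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}


def acyclic(edges):
    """True iff the directed graph given by the edge set has no cycle."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    state = {}  # node -> "open" (on the DFS stack) or "done"

    def visit(n):
        if state.get(n) == "done":
            return True
        if state.get(n) == "open":
            return False  # back edge: a cycle exists
        state[n] = "open"
        ok = all(visit(m) for m in succ.get(n, ()))
        state[n] = "done"
        return ok

    return all(visit(n) for n in list(succ))


def uniproc_ok(po_loc, rf, co):
    # fr := rf^{-1}; co
    fr = compose({(b, a) for (a, b) in rf}, co)
    return acyclic(po_loc | rf | co | fr)


po_loc = {("w0", "r")}  # w_{i,0} precedes r_i in t1, same location
cases = {
    (co, src): uniproc_ok(po_loc, {(src, "r")}, {co})
    for co in (("w0", "w1"), ("w1", "w0"))
    for src in ("w0", "w1")
}

if __name__ == "__main__":
    for key, ok in sorted(cases.items()):
        print(key, "allowed" if ok else "violates uniproc")
```

Only the combination in which $r_i$ reads from $w_{i,1}$ although $w_{i,1}$ is coherence-before $w_{i,0}$ is rejected, matching the fourth item of the list.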
We use the following technical lemmas to show hardness. We call the relations
po, rf , co, ad , dd , cd and fr basic. Given a common MCM, we define the violating cycles
as follows: For an assertion acyclic(r), any cycle of r is violating. For an assertion
irreflexive(r), any cycle of the form $e \xrightarrow{r} e$ is violating.
The following lemma shows that an execution is not consistent if it contains a
violating cycle.
Lemma 3. Let M be common. An execution X is consistent with M iff X contains
no violating cycle of M.
This follows directly from the definition of violating cycles.
We say a relation r satisfies the path condition if $e_1 \xrightarrow{r} e_2$ implies $e_1 \xrightarrow{b}{}^{*}\, e_2$ with
$b := po \cup rf \cup co \cup ad \cup dd \cup cd \cup fr$.
Lemma 4. Any relation r of a common MCM satisfies the path condition.
Proof. Note that the recursively defined relations of a common MCM can be obtained
with a Kleene iteration. We use a structural induction over the Kleene iteration.
Induction Basis: Any named relation is initially the empty relation and trivially
satisfies the path condition.
Induction Step: Assume all named relations satisfy the path condition. A named
relation is updated according to its defining equation using the current assignments
of the named relations and basic relations. To simplify this proof, we can assume
that any equation only contains one operator or a relation in base. Any basic relation
trivially satisfies the condition. Let r1 and r2 satisfy the path condition. We examine the
operations that can be applied to them according to properties (i) and (ii) of common
MCMs. We see that r1∪r2, r1∩r2, r1∩sloc, r1∩sthd, r1∩〈set〉×〈set〉, r1 \r2 all satisfy
the path condition. The relation r1; r2(e1, e2) requires an event e3 with r1(e1, e3) and
r2(e3, e2) and since r1 and r2 satisfy the path condition there is a path from e1 to e3
and from e3 to e2 and thus r1; r2 also satisfies the path condition. Similarly, $r_1^+$ adds a
relation only where there already is a path and satisfies the path condition. Relation $r_1^*$
consists of $r_1^+$ and the identity relation, which also satisfies the path condition (there
is always a path of length 0 from some e1 to e1). A named relation still satisfies the
path condition after a Kleene iteration step.
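The path condition is easy to check on a concrete execution. The following sketch, under the assumption that relations are finite sets of event pairs, computes the reflexive-transitive closure of the basic relations and tests whether a derived relation is contained in it; all function and event names are illustrative.

```python
def compose(r1, r2):
    """Relational composition r1; r2."""
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}


def transitive_closure(r):
    """Least relation containing r that is closed under composition."""
    closure = set(r)
    while True:
        new = closure | compose(closure, closure)
        if new == closure:
            return closure
        closure = new


def satisfies_path_condition(r, basic, events):
    """e1 -r-> e2 implies e1 -b->* e2, where b is the union of basic relations."""
    reach = transitive_closure(basic) | {(e, e) for e in events}
    return r <= reach


if __name__ == "__main__":
    events = {"a", "b", "c", "d"}
    po = {("a", "b"), ("c", "d")}
    rf = {("b", "c")}
    basic = po | rf
    print(satisfies_path_condition(compose(po, rf), basic, events))       # po; rf
    print(satisfies_path_condition(transitive_closure(basic), basic, events))
    print(satisfies_path_condition({("d", "a")}, basic, events))          # not derivable
```

The operators considered in the proof never create an edge against the direction of the basic paths; a relation such as {(d, a)} in the example could only arise from an operator like inversion, which is not among them.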
Note that according to Lemma 4, a violating cycle implies a cycle of basic relations
(called a basic path) that contains all events of the violating cycle. We can argue about
consistency of an execution by examining the paths of basic relations.
Lemma 5. Given a relation r of a common MCM and events e1, e2 of an execution,
whether r(e1, e2) holds is determined by the basic paths from e1 to e2.
Proof. We use a structural induction over the Kleene iteration.
Induction Basis: Any named relation is initially the empty relation and trivially
satisfies the condition. Any basic relation trivially satisfies the condition.
Induction Step: Assume all current assignments of relations are determined by the
basic paths between related events. Two events e1 and e2 are related by some relation
r iff the basic paths from e1 to e2 satisfy some property. A named relation is updated
according to its defining equation using the current assignments of the named relations
and basic relations. Let r1, r2 be either named or basic relations. Events e1, e2 are
related by r1 ∪ r2 (r1 ∩ r2) if the basic paths from e1 to e2 satisfy the property of r1 or
r2 (resp. r1 and r2). Similarly, events are related by r1 \ r2 if the basic paths satisfy the
property of r1 but not r2. For r1 ∩ sloc, r1 ∩ sthd, r1 ∩ 〈set〉 × 〈set〉, two events e1, e2 are
related if the property of r1 is satisfied and e1 and e2 satisfy an additional condition.
The condition for the events e1, e2 can be expressed as an additional condition for paths
from e1 to e2.
The relation r1; r2 relates e1 to e2 if there is an e3 such that the basic paths from
e1 to e3 ensure r1(e1, e3) and the basic paths from e3 to e2 ensure r2(e3, e2). It follows
that r1; r2(e1, e2) depends on the basic paths from e1 to e2 over some e3. The relations
$r_1^*$ and $r_1^+$ are derived using previously examined operators over relations and thus they
satisfy the condition.
A named relation still satisfies the condition after a Kleene iteration step.
Proof (Proof of Theorem 1.1). We show that P∀ψ is not portable iff ψ has an unsatis-
fying assignment.
(⇒): We assume an execution X exists that is MT -consistent but not MS-
consistent. According to Lemma 3, the execution has a violating cycle for MS. We
assume towards contradiction that no event of Pnp is executed. The read r ← y in
t′′1 has to read from the write in t1 (in Pψ) according to uniproc (the execution is
MT -consistent). It occurs after t1 in the program order. The read has incoming basic
relations from events in Pψ but no outgoing relations to some event in Pψ. Any read
r ← y in another thread can either read from the write in Pψ (it has an incoming rf
relation from Pψ) or it reads the initial value which results in an outgoing from-read
relation to Pψ. There are no other basic relations between a read r ← y and some
event in Pψ. So any read r ← y has either basic incoming relations from Pψ (rf , po) or
outgoing to Pψ (fr), not both.
It follows from Lemma 4, that no read r ← y has both incoming and outgoing
derived relations and so no read r ← y is in a violating cycle. Any violating cycle is in
Pψ. Further, there is no basic path from some event e1 in Pψ to one of the reads and
back to some e2 in Pψ. The reads do not affect the basic paths between events in Pψ.
So the violating cycle for MS is still present if we remove the reads and thus restrict
the execution to events of Pψ. Since removing the reads does not affect the basic paths
in Pψ, no violating cycle for MT has been added. It follows that the execution of Pψ is
consistent with MT but not MS. This is a contradiction to Pψ being always portable
for common MCMs (Lemma 2).
It follows that any violating cycle requires events from Pnp to be executed. Since an
event of Pnp can only be executed if y = 1 was read in the thread (and thus written by
t1), the if-condition in Line 8 was not satisfied. It follows that there is an assignment
that does not satisfy ψ.
(⇐): We now assume that there is an assignment that doesn’t satisfy ψ. There is
an SC-consistent execution Z of Pψ that executes the write y := rc,1. We extend this
execution of Pψ to an execution X of P∀ψ: We ensure that all reads r ← y read from
y := rc,1 and thus Pnp is executed. Let Y be an execution of Pnp that is MT-consistent
but not MS-consistent. This results in an execution X ⊃ Y ∪ Z of P∀ψ that contains
the executions of Pψ and Pnp , the read from relation for the reads of y and the required
program order additions. Since Pnp has no registers or locations in common with the
rest of the program and occurs last in po, no basic relation of X leaves Y . It follows
that X contains the same basic paths between events of Y as Y . Thus the violating
cycle for MS in Y is also in X. It follows that X is not MS-consistent. We show that
X is still MT -consistent. We assume towards contradiction that there is a violating
cycle for MT : As before, it holds that a violating cycle must contain an event from Y .
Since Y is never left, any basic cycle of X with events in Y must be contained entirely
in Y . It follows from (Lemma 4) that a violating cycle for MT must be entirely in Y .
This is a contradiction to the execution of Pnp being MT -consistent.
Since P∀ψ is a polynomial-time reduction, portability is $\Pi^P_1$-hard.
B.1 $\Pi^P_2$-Completeness of State Portability
We introduce Lemma 6 and Theorem 2 in order to show that state portability is both
in $\Pi^P_2$ and $\Pi^P_2$-hard. It follows that state portability is $\Pi^P_2$-complete for common
MCMs and thus Theorem 1.2 is correct.
Lemma 6. State-based portability is in $\Pi^P_2$ for all MCMs.
Proof. We encode the state portability property in a closed formula (i.e. all variables
are quantified) of the form ∀∃ψ. We have already shown how to encode consistency of
an execution X with an MCM M as a formula (X ∈ consM(P )) in Section 4. Again
we encode numbers as sequences of Boolean variables. Let val(e) be the value that is
read/written by a read or write event e and loc(e) the location it accesses. We can
encode the property state(X) = σ in a Boolean formula as follows. If a write has no
outgoing co relation, then it must have the same value as the location in the state:
$\bigwedge_{w \in W} \Big( \bigwedge_{w' \in W} \neg (w \xrightarrow{co} w') \Big) \rightarrow val(w) = \sigma(loc(w)).$
In a similar way we can ensure that the last operation in the program order on a register
r has the value σ(r). This means we can construct Boolean formulas for properties of
the form X ∈ consM(P ) and state(X) = σ. With this, we can construct the following
formula:
$\forall X\ \exists Y : (X \in cons_{M_T}(P)) \Rightarrow (Y \in cons_{M_S}(P) \wedge state(X) = state(Y)).$
This is equivalent to state portability (see Definition 2). The state portability problem
from MS to MT can be expressed as a closed quantified formula of the form ∀∃ψ
and thus state-based portability is in $\Pi^P_2$.
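The condition used above for state(X) = σ, namely that a write without outgoing co edge determines the final value of its location, can be illustrated on explicit data. The sketch assumes a hypothetical representation in which `writes` maps each write event to its location and value and `co` is a set of pairs; registers are omitted.

```python
from typing import Dict, Set, Tuple


def final_state(writes: Dict[str, Tuple[str, int]], co: Set[Tuple[str, str]]) -> Dict[str, int]:
    """Return the location part of state(X): for each location, the value of
    the write that has no outgoing co edge (it is last in coherence order)."""
    has_successor = {w for (w, _) in co}
    return {loc: val for w, (loc, val) in writes.items() if w not in has_successor}


if __name__ == "__main__":
    writes = {"w10": ("x1", 0), "w11": ("x1", 1), "wy": ("y", 1)}
    co = {("w10", "w11")}
    print(final_state(writes, co))
```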
We now introduce the program Pψ and examine its behavior. We will then use Pψ
in order to prove $\Pi^P_2$-hardness.
Let ψ(x1 . . . xm) be a Boolean formula over variables x1 . . . xm. The concurrent
program Pψ := t1 ‖ t2 with two threads t1 and t2 is defined below. The program
is similar to the program in the previous section. It contains additional synchroniza-
tion in order to ensure that the formula assignment and the computed state match.
The program either computes a satisfying assignment of ψ (y = 1), an unsatisfying
assignment (y = 0) or it ends with an error (y = 2).
We use the value 0 to encode the Boolean value false. To avoid confusion, we assume
that the variables are initialized with some other unused value, e.g. 3. This does not
interfere with the validity of the proofs since our program only assigns constants and
thus 0 and 3 are interchangeable.
We will see that it is sufficient to examine the program under SC (the strongest
common MCM) where an execution is an interleaving of the two thread executions.
The threads first create some variable assignment A; thread t1 assigns 0 to the variables
and t2 assigns 1. The assignment A is determined by the interleaving of those writes.
If the write xi := 0 of t1 is followed by xi := 1 of t2 ($w_{i,0} \xrightarrow{co} w_{i,1}$), then xi is set to
1. Then t1 ensures that t2 has executed all its writes (so that the assignment doesn’t
change anymore) by using the synchronization variables x′1 . . . x′m to check that t1 and
t2 read the same assignment. If that is not the case, then some writes of t2 have not
occurred yet and y is set to 2 in Line 6. If all the writes from t2 have occurred, t1
checks whether A satisfies ψ (y is set to 1) or not (y is set to 0).
t1
1 rc,0 ← 0; rc,1 ← 1; rc,2 ← 2;
2 x1 := rc,0; . . . xm := rc,0 ; // w1,0 . . . wm,0
3 r′1 ← x′1; . . . r′m ← x′m ; // r′1 . . . r′m
4 r1 ← x1; . . . rm ← xm ; // r1 . . . rm
5 if ¬(r1 = r′1 ∧ · · · ∧ rm = r′m) then // If A is incomplete
6   y := rc,2 ; // exit with error.
7 else
8   if ψ(r1 . . . rm) then // If A satisfies ψ,
9     y := rc,1 ; // return 1.
10  else
11    y := rc,0 ; // If it does not, return 0.
To simplify our study we will define a state only over its locations, not the registers.
This does not change the complexity of the state portability problem: We could simply
add instructions to our input programs that write all registers to locations in the end.
We can use our simpler notion of states without registers on the input program with
the added instructions to solve the original state portability problem with registers.
An assignment A of a set of Boolean variables V is a function A ⊂ V × {0, 1} that
assigns either 0 or 1 to a variable.
t2
1 rc,1 ← 1; rc,3 ← 3;
2 x1 := rc,1; . . . xm := rc,1 ; // w1,1 . . . wm,1
3 r1 ← x1; . . . rm ← xm ; // r̄1 . . . r̄m
4 x′1 := r1; . . . x′m := rm ; // w̄1 . . . w̄m
5 x′1 := rc,3; . . . x′m := rc,3 ;
Definition 4. Given an assignment A of x1 . . . xn, locations y1 . . . ym and a number
a ∈ N, let σ[A; y1 . . . ym ← a] denote the state with
– σ[A; y1 . . . ym ← a](xi) = A(xi) for i ≤ n,
– σ[A; y1 . . . ym ← a](yj) = a for j ≤ m, and
– σ[A; y1 . . . ym ← a](z) = v for any other location z, where v is the initial value (we use
3).
We lift the definition accordingly to
σ[A; y1 . . . ym ← a; z1 . . . zl ← b].
The program Pψ can compute some assignment A with y = 1 or y = 0 depending on
whether A satisfies ψ.
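The behavior of Pψ under SC can be explored exhaustively for small m. The following sketch enumerates all interleavings of t1 and t2, assuming the SC interleaving semantics described above, the initial value 3, and a formula given as a Python predicate; the operation encoding and all identifiers are hypothetical and only meant to illustrate the construction.

```python
from itertools import product

INIT = 3  # unused initial value of all locations


def final_states(m, psi):
    """Explore all SC interleavings of P_psi and return the set of final
    states, each given as ((x_1, ..., x_m), y)."""
    t1 = ([("store", f"x{i}", 0) for i in range(m)] +             # line 2
          [("load", f"x'{i}", f"r'{i}") for i in range(m)] +      # line 3
          [("load", f"x{i}", f"r{i}") for i in range(m)] +        # line 4
          [("decide", "y", None)])                                # lines 5-11
    t2 = ([("store", f"x{i}", 1) for i in range(m)] +             # line 2
          [("load", f"x{i}", f"rb{i}") for i in range(m)] +       # line 3
          [("store_reg", f"x'{i}", f"rb{i}") for i in range(m)] + # line 4
          [("store", f"x'{i}", INIT) for i in range(m)])          # line 5
    threads = (t1, t2)

    def step(op, mem, regs):
        kind, loc, arg = op
        mem, regs = dict(mem), dict(regs)
        if kind == "store":
            mem[loc] = arg
        elif kind == "store_reg":
            mem[loc] = regs[arg]
        elif kind == "load":
            regs[arg] = mem[loc]
        else:  # "decide": error if A is incomplete, otherwise evaluate psi
            if any(regs[f"r{i}"] != regs[f"r'{i}"] for i in range(m)):
                mem["y"] = 2
            else:
                rs = [bool(regs[f"r{i}"]) for i in range(m)]
                mem["y"] = 1 if psi(*rs) else 0
        return mem, regs

    locs = [f"x{i}" for i in range(m)] + [f"x'{i}" for i in range(m)] + ["y"]
    start = (0, 0, tuple(sorted((l, INIT) for l in locs)), ())
    seen, stack, finals = {start}, [start], set()
    while stack:
        pc1, pc2, mem, regs = stack.pop()
        if pc1 == len(t1) and pc2 == len(t2):
            d = dict(mem)
            finals.add((tuple(d[f"x{i}"] for i in range(m)), d["y"]))
            continue
        for tid, pc in ((0, pc1), (1, pc2)):
            if pc < len(threads[tid]):
                new_mem, new_regs = step(threads[tid][pc], mem, regs)
                nxt = (pc1 + (tid == 0), pc2 + (tid == 1),
                       tuple(sorted(new_mem.items())),
                       tuple(sorted(new_regs.items())))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return finals


if __name__ == "__main__":
    psi = lambda a, b: a or b
    for xs, y in sorted(final_states(2, psi)):
        print(xs, "y =", y)
    print(sorted(product((0, 1), repeat=2)))  # all assignments, for comparison
```

For this small instance, every final state with y = 1 carries an assignment satisfying ψ, every final state with y = 0 carries one that does not, and each assignment occurs with y ∈ {0, 1}; states with y = 2 correspond to runs in which t1 read the synchronization variables too early.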
The following lemmas show that Pψ behaves similarly for all common MCMs.
Lemma 7. Let M contain uniproc and A be an assignment of ψ. It holds A |= ψ
(resp. A ⊭ ψ) only if
∃X ∈ consM(Pψ) : state(X) = σ[A; y ← 1]
(resp. state(X) = σ[A; y ← 0]).
Proof. We show that any M-consistent execution with y = 0 or y = 1 computes a
desired state. Let X be an execution that satisfies uniproc and computes some σ with
σ(y) = 1 (σ(y) = 0 is analogous). Since y := 1 is executed in t1, it follows that ψ is
satisfied by the values of x1 . . . xm read by r1 . . . rm in Line 4 and also r′1, . . . , r′m in
Line 3 read the same values. We call this assignment A. Since the writes of Line 5 of t2
occur after the writes w̄1 . . . w̄m, they are ordered last in co according to uniproc. The
execution computes 3 for x′1 . . . x′m.
It remains to show that the writes accessed by r1 . . . rm are indeed computed by
X, meaning they are ordered last in co. Towards contradiction, we assume this is not
the case. Then, there is a write wi,1 or wi,0 that is accessed by a read but its value is
not computed (it is not last in co).
Case 1: Assume that this write is $w_{i,1}$. The write is accessed by a read ($w_{i,1} \xrightarrow{rf} r_i$) and
it is not last in the coherence order ($w_{i,1} \xrightarrow{co} w_{i,0}$). According to $fr := rf^{-1}; co$, it holds that
$r_i \xrightarrow{fr} w_{i,0}$. Since $w_{i,0}$ occurs before $r_i$ in t1 and they access the same location, they are
related w.r.t. po and sloc and thus $w_{i,0} \xrightarrow{po \cap sloc} r_i \xrightarrow{fr} w_{i,0}$. This cycle is a contradiction
to X satisfying uniproc, which is property (iii) of common MCMs.
Case 2: Assume that there is a write $w_{i,0}$ that is read ($w_{i,0} \xrightarrow{rf} r_i$) and its value is not
computed ($w_{i,0} \xrightarrow{co} w_{i,1}$). It follows that $r_i$ reads the value 0. Since y = 1 is computed,
the condition in line 5 is not satisfied, so $val(r_i) = val(r'_i)$ and $r'_i$ also
reads 0. This means $r'_i$ does not read the initial value 3, so it must read from $\bar{w}_i$. For the
write $\bar{w}_i$ that $r'_i$ reads from ($\bar{w}_i \xrightarrow{rf} r'_i$) it follows that $val(\bar{w}_i) = val(r'_i) = 0$. According
to the data-flow, $\bar{w}_i$ writes the value that was obtained by the previous read $\bar{r}_i$, which
must be 0 ($val(\bar{r}_i) = val(\bar{w}_i) = 0$). Since $\bar{r}_i$ reads 0, it must read from the only write
of 0 ($w_{i,0} \xrightarrow{rf} \bar{r}_i$). From $w_{i,0} \xrightarrow{rf} \bar{r}_i$ and $w_{i,0} \xrightarrow{co} w_{i,1}$ follows $\bar{r}_i \xrightarrow{fr} w_{i,1}$. This leads to the
cycle $w_{i,1} \xrightarrow{po \cap sloc} \bar{r}_i \xrightarrow{fr} w_{i,1}$, which is a contradiction to X satisfying uniproc.
So the writes accessed by r1 . . . rm are ordered last by co and thus the assignment
that ψ is checked against is computed by the execution. It follows that σ[A; y ← 1] is
computed.
Lemma 8. Let M be a common MCM and A be an assignment of ψ. It holds A |= ψ
(resp. A ⊭ ψ) iff
∃X ∈ consM(Pψ) : state(X) = σ[A; y ← 1]
(resp. state(X) = σ[A; y ← 0]).
Proof. Given some assignment A, we can easily construct an SC-consistent execution
where the writes to x1 . . . xn are interleaved according to the assignment ($w_{i,0} \xrightarrow{co} w_{i,1}$
if xi is satisfied). Then t2 reads those values and writes them to x′1 . . . x′n. Now t1 reads
x′1 . . . x′n and the if-condition in line 5 is not satisfied, so we enter the else-branch. So t1
sets y to 0 or 1 depending on whether A |= ψ and t2 sets x′1 . . . x′n back to the initial
value 3. It follows that for every assignment A, there is an SC-consistent execution X
that computes σ[A; y ← 0] or σ[A; y ← 1] depending on whether A satisfies ψ. Since
any SC-consistent execution is consistent with all common MCMs, we can compute
the desired state for any assignment of ψ. The other direction follows directly from
Lemma 7.
In order to show $\Pi^P_2$-hardness, we reduce validity of a closed formula
∀x1 . . . xn∃y1 . . . ym : ψ to state portability. The idea is to construct a program that
uses Pψ in order to check if some assignment satisfies ψ and then overwrite y1 . . . ym
with 1 so that the assignment of y1 . . . ym is not given by the computed state. If ψ was
not satisfied, the non-portable component Pnp is executed. If the execution of Pnp is
MT -consistent but not MS-consistent, then it pretends that the formula was satisfied
by setting y to 1. This means, that under MT , any assignment of x1 . . . xn with y = 1
can be computed. It follows that the program is portable if any assignment of x1 . . . xn
with y = 1 can be computed under MS. Under MS, however, pretending is not possible.
Here y can only be set to 1 by Pψ. So under MS, an assignment of x1 . . . xn
and y = 1 can only be computed if there is some assignment of y1 . . . ym so that ψ is
satisfied. The program is portable if ∀x1 . . . xn∃y1 . . . ym : ψ holds.
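For intuition, validity of the source formula ∀x1 . . . xn∃y1 . . . ym : ψ can be checked by brute force on small instances. The sketch below is exponential and only illustrates the quantifier alternation that the reduction encodes; the function name and the convention that ψ takes two tuples of Booleans are assumptions of the example.

```python
from itertools import product


def forall_exists(psi, n, m):
    """True iff for every assignment of x_1..x_n there is an assignment of
    y_1..y_m such that psi(xs, ys) holds."""
    bools = (False, True)
    return all(
        any(psi(xs, ys) for ys in product(bools, repeat=m))
        for xs in product(bools, repeat=n)
    )


if __name__ == "__main__":
    # Valid: y_1 can always be chosen different from x_1.
    print(forall_exists(lambda xs, ys: xs[0] != ys[0], 1, 1))
    # Not valid: for x_1 = False no choice of y_1 helps.
    print(forall_exists(lambda xs, ys: xs[0] and ys[0], 1, 1))
```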
We want a simple non-portable program that always computes the initial state
except under MT, where it can set a location z to 1. We use a program Pnp = t′1 ‖
· · · ‖ t′k with the following properties: Any execution consistent with MS computes 3
for all its locations and contains no write that sets z to 1. The MT-consistent executions
compute either z = 1 or z = 3 and 3 for all other locations. The program contains only
one write to z, which is in t′1.
We assume the state portability problem from MS to MT is not trivial and a
program exists that has an MT -consistent execution such that no MS-consistent ex-
ecution computes the same state σ. We can assume that the program only assigns
constant values and has no write on z.
Similarly to the synchronization of Pψ, we can add a mechanism at the end of all
threads that does the following: all threads check if they read σ; if so, they communicate
that to t1 which sets z accordingly and then all threads set all other locations back to
3. It follows that a program Pnp with the required properties exists.
Given a formula ∀x1 . . . xn∃y1 . . . ym : ψ, we use Pψ = t1 ‖ t2 and Pnp = t′1 ‖ · · · ‖ t′k
to construct a program $P_s := t^s_1 \parallel \cdots \parallel t^s_k$. We define the threads below.
$t^s_1$
1 t1 ; // Try some assignment.
2 ry ← y;
3 if ry = 0 then // If ψ was not satisfied,
4 t′1 ; // execute Pnp.
5 rz ← z;
6 if rz = 1 then // If not MS-consistent,
7 z := rc,3 ; // pretend it is MS-consistent
8 y := rc,1 ; // and pretend ψ was satisfied.
9 y1 := rc,1; · · · ym := rc,1 ; // Overwrite y1..ym assignment.
Let ti := skip for 3 ≤ i ≤ k (t1 and t2 are from Pψ). The threads are defined for
2 ≤ i ≤ k as
$t^s_i := t_i;\ r_y \leftarrow y;\ \text{if } (r_y = 0) \text{ then } t'_i.$
In general terms, Ps does the following: First, it executes Pψ. If the state computed
by Pψ did not satisfy ψ (y = 0), then it executes Pnp, which is not portable. If the
execution of Pnp is MT-consistent but not MS-consistent (z = 1), then Ps pretends that the
formula was satisfied by setting y to 1. Afterward, y1 . . . ym are set to 1, so that their
former assignment checked in Pψ is no longer given by the computed state.
Lemma 9. For every assignment A of x1 . . . xn it holds that
∃X ∈ consMT (Ps) : state(X) = σ[A; y, y1 . . . ym ← 1].
Proof. According to Lemma 8, the following holds: For every assignment A of x1 . . . xn
and A′ of y1 . . . ym, there is an MT -consistent execution X of Pψ, such that either
state(X) = σ[A ∪ A′; y ← 1] or state(X) = σ[A ∪A′; y ← 0]. We examine both cases:
Case 1: If state(X) = σ[A ∪ A′; y ← 1], then A ∪ A′ |= ψ and according to Lemma 8
there is an SC-consistent execution of Pψ that computes σ[A ∪ A′; y ← 1]. We can
easily extend this to an SC-consistent execution X′ (represented by an interleaving)
of Ps. After Pψ is executed, we read 1 with reads ry ← y of all threads. So the
subsequent if conditions are not satisfied. Then we execute the writes in Line 9. So
y1 := rc,1 . . . ym := rc,1 are the only writes executed outside of Pψ. These writes are
ordered after the assignments of 0 to y1 . . . ym in co and thus state(X′)(yi) = 1 for
i ≤ m. Since there are no further executed writes outside of Pψ, the computed state
otherwise coincides with σ[A ∪ A′; y ← 1] computed by X. The extension of X computes
σ[A; y, y1 . . . ym ← 1].
Case 2: If state(X) = σ[A ∪ A′; y ← 0], then write y := rc,0 is executed in t1. We
construct an execution X′ ⊃ X of Ps in the following way: We ensure all reads ry ← y
of threads $t^s_i$ with i ≤ k read from y := rc,0 and thus all threads t′i are executed. This
means Pnp is executed. According to the definition of Pnp, there is an MT-consistent
execution Y of Pnp such that state(Y)(z) = 1 is written by some write wz in t′1 and Y
computes the initial value for all other locations of Pnp. We enforce X′ ⊇ Y and ensure
the read rz ← z reads from the write in Pnp (z = 1) and thus the following if condition
is satisfied and the writes z := 3 and y := 1 are executed. We order them last in co. It
follows that X′ computes σ[A; y, y1 . . . ym ← 1].
We show that X′ is still MT-consistent. We partition the events of X′ into sets E1,
E2 and E3 and show that there is no violating cycle of MT inside or between these
sets.
Set E1 consists of events in X and the subsequent reads ry ← y of all threads.
Set E2 consists of the events of Y . Set E3 contains the events rz ← z in t1 and the
following writes z := rc,3, y := rc,1 and y1 := rc,1; · · · ym := rc,1.
The following two conditions hold: (i) E2 is only left by basic relations leading to
E3. This is the case since E1 precedes E2 in the program order and has no common
registers. (ii) There are no basic relations leading from E3 to some other set: The
events in E3 are ordered last in the program order and the writes are ordered last in
the coherence order. The read rz ← z has no outgoing fr relation, since there is only
one write to z.
From (i) and (ii) follows that X′ contains no basic cycle that contains events from
more than one of the sets. According to the path property (Lemma 4), there exists no violating
cycle for MT in X′ that contains events from more than one of the sets. So any violating
cycle must be contained entirely in one of the sets.
From (i) and (ii) follows (iii): If two events e1, e2 are in the same set, then there is
no basic path from e1 to e2 that leaves the set.
We consider E1: There are read-from relations from y := rc,0 in t1 to all reads
ry ← y in $t^s_i$ with i ≤ k. However, there are no outgoing basic relations from any of
those reads to other events in E1. It follows from (iii) that there is no basic path from
a read ry ← y to another event in E1 and according to the path property, there is
no relation from a read ry ← y to some other element in E1. It follows that a read
ry ← y cannot occur in a violating cycle. From (iii) follows that the basic paths in
X′ between events of E1 (and thus the violating cycles for MT) are the same as in
X. Since X is MT-consistent, it has no violating cycle for MT and thus X′ has no
violating cycle for MT in E1.
We consider E2: From (iii) follows that the basic paths in X′ between events in E2
are the same as in Y. According to Lemma 5, X′ has a violating cycle for MT in E2
iff Y has a violating cycle for MT. The execution Y is defined to be MT-consistent so
there is no violating cycle in E2.
We consider E3: The set contains no basic cycle and no basic relation leaves E3
according to (ii). It follows that there is no basic cycle that contains events of E3.
According to the path property, there is no violating cycle in E3. It follows that X′ is
MT-consistent.
Lemma 10. There is an MS-consistent execution of Ps that computes
σ[A; y, y1 . . . ym ← 1] with some assignment A of x1 . . . xn iff there is an assignment
A′ of y1 . . . ym such that A ∪ A′ |= ψ.
Proof. If a program P′ is contained in a program P, we can restrict an execution X of
P to an execution of P′. We simply remove all events, and all relation edges on events, that are
not in P′. We denote X restricted to P′ as X[P′].
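The restriction X[P′] amounts to filtering events and relation edges. A minimal sketch, assuming an execution is represented by a set of events and a dictionary of named relations (a representation chosen for this example only):

```python
def restrict(events, relations, keep):
    """Restrict an execution to the events selected by the predicate `keep`;
    relation edges with an endpoint outside the kept events are dropped."""
    kept = {e for e in events if keep(e)}
    restricted = {
        name: {(a, b) for (a, b) in rel if a in kept and b in kept}
        for name, rel in relations.items()
    }
    return kept, restricted


if __name__ == "__main__":
    events = {"a", "b", "c"}
    relations = {"po": {("a", "b"), ("b", "c")}, "rf": {("a", "c")}}
    print(restrict(events, relations, lambda e: e != "c"))
```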
(⇒): Let X be an MS-consistent execution of Ps that computes state(X)(y) = 1.
We assume towards contradiction that X contains a write that sets z to 1. According
to the definition of Pnp the execution X[Pnp ] contains a violating cycle for MS . As
with the proof of Lemma 9, we can show that no basic path in X between events of
X[Pnp ] leaves X[Pnp ]. It follows that the basic paths between events in X[Pnp ] are the
same in X and X[Pnp ] and thus X also contains the violating cycle for MS. This is a
contradiction to X being MS-consistent.
It follows that X cannot read z = 1 in Line 5 and the write in Line 8 is not executed.
So y := rc,1 must have been executed in Pψ for X to compute y = 1.
The execution X satisfies uniproc since MS is common. It has no basic cycle that
violates uniproc. It follows that X[Pψ ] has no such basic cycle either and thus satisfies
uniproc. It follows from Lemma 7, that X[Pψ] computes a satisfying assignment of
x1 . . . ym. Since nothing is written to x1 . . . xn outside of Pψ, X computes the same
values for x1 . . . xn as X[Pψ]. A write yi := rc,0 in t1 is related to the write yi := rc,1
in line 9 of $t^s_1$ in po ∩ sloc and according to uniproc also in co. This implies that X
computes 1 for y1 . . . ym. For the synchronization variables of Pψ, uniproc ensures that
X computes the initial values.
It follows that if there is an MS-consistent execution that computes
σ[A; y, y1 . . . ym ← 1] for some assignment A of x1 . . . xn, then there is an assignment
A′ of y1 . . . ym such that A ∪ A′ |= ψ.
(⇐): According to Lemma 8, for each assignment A of x1 . . . xn and A′ of y1 . . . ym
where A ∪ A′ satisfies ψ, there is an SC-consistent execution X of Pψ such that
state(X) = σ[A ∪ A′; y ← 1]. This SC execution of Pψ can be represented as an
interleaving of the local executions. To this interleaving, we append the subsequent
reads ry ← y. They read the value 1 and the next if conditions are not satisfied. Then
we append the remaining writes in line 9. The resulting interleaving represents an
SC-consistent execution X′ of Ps.
The only writes not in Pψ that are executed by X′ are in Line 9 (they are last in
the program order and thus also last in po ∩ sloc) and according to uniproc, they must
be ordered after any write yi := 0 in co. It follows that state(X′) = σ[A; y, y1 . . . ym ← 1].
Theorem 2. State-based portability is $\Pi^P_2$-hard for common MCMs.
Proof. Given common MCMs MS and MT, we argue that Ps is a reduction of validity
of a formula ∀x1 . . . xn∃y1 . . . ym : ψ(x1 . . . ym) to state portability from MS to MT.
Note that any execution of Ps satisfying uniproc computes 1 for y1 . . . ym and the
initial values for the synchronization variables. We show that the following properties
(i)-(iii) hold:
(i) Any state with y = 2 that is computable with an MT -consistent execution of Ps
can be computed by an MS-consistent computation of Ps. It is easy to see that
any state with y = 2 that can be computed by an execution satisfying uniproc,
can be computed by an SC-consistent execution.
(ii) Any state with y = 0 that is computed by an MT -consistent execution X of Ps
can be computed by an MS-consistent computation. No event of Pnp is executed,
the initial value is computed for locations of Pnp. According to uniproc, any such
state has the form σ[A; y ← 0; y1 . . . ym ← 1] for some assignment A of x1 . . . xn.
It is easy to see that A and y = 0 is computed by X[Pψ] and X[Pψ] satisfies
uniproc. According to Lemma 7, there is an assignment A′ of y1 . . . ym such that
A ∪ A′ ⊭ ψ. It follows from Lemma 8 that there is an SC-consistent execution of Pψ
that computes σ[A ∪ A′; y ← 0]. This can be easily extended to an SC-consistent
execution of Ps that computes σ[A; y ← 0; y1 . . . ym ← 1]. Since MS is common,
σ[A; y ← 0; y1 . . . ym ← 1] is also computable under MS.
(iii) Any state with y = 1 computed by an MT -consistent execution X has the form
σ[A; y, y1 . . . ym ← 1] for some assignment A of x1 . . . xn. This holds since X[Pnp ] is
also MT -consistent (the basic paths between its events are the same as in X) and
according to the definition of Pnp , it computes the initial value for all its locations
except maybe 1 for z. If that is the case, then the write z := rc,3 in $t^s_1$ is executed
and according to uniproc ordered later in co.
From (i)-(iii) it follows that Ps is state portable from MS to MT iff any
MT-computable state σ[A; y, y1 . . . ym ← 1] can be computed under MS. Lemma 9 implies
that Ps is state portable from MS to MT iff any state σ[A; y, y1 . . . ym ← 1] can be
computed under MS. According to Lemma 10, this is the case iff for every assignment
A of x1 . . . xn there is an assignment A′ of y1 . . . ym with A ∪ A′ ⊨ ψ. This is the case
iff ∀x1 . . . xn∃y1 . . . ym : ψ(x1 . . . ym) is valid.
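The validity condition that closes the reduction can be made concrete with a small brute-force check. The following Python sketch is only an illustration of what validity of ∀x1 . . . xn∃y1 . . . ym : ψ means; it is not part of porthos, and the function name forall_exists_valid as well as the encoding of ψ as a Python callable over two tuples of Booleans are assumptions made for this example.

```python
from itertools import product

def forall_exists_valid(n, m, psi):
    """Return True iff every assignment of x1..xn has some assignment
    of y1..ym such that psi(xs, ys) holds (exhaustive enumeration)."""
    for xs in product([False, True], repeat=n):
        # Look for a witness assignment of the existential variables.
        if not any(psi(xs, ys) for ys in product([False, True], repeat=m)):
            return False  # this xs has no witness, so the formula is not valid
    return True

# psi = (x1 xor y1): valid, choose y1 = not x1.
valid_example = forall_exists_valid(1, 1, lambda xs, ys: xs[0] != ys[0])
# psi = (x1 and y1): not valid, x1 = False has no witness.
invalid_example = forall_exists_valid(1, 1, lambda xs, ys: xs[0] and ys[0])
```

The enumeration is exponential in n + m and serves only to illustrate the quantifier alternation that the reduction relates to state portability.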
C Complete Experiments
Benchmark         SC-TSO  SC-PSO  SC-Power  TSO-PSO  TSO-Power  PSO-Power  RMO-Alpha  RMO-Power  Alpha-RMO  Alpha-Power
Bakery            ✗       ✗       ✗         ✗        ✗          ✗          ✗          ✔          ✗          ✔
Bakery x86        ✔       ✗       ✗         ✗        ✗          ✗          ✗          ✗          ✗          ✗
Bakery Power      ✔       ✗       ✔         ✗        ✔          ✔          ✗          ✔          ✗          ✔
Burns             ✗       ✗       ✗         ✔        ✗          ✗          ✗          ✔          ✗          ✔
Burns x86         ✔       ✔       ✗         ✔        ✗          ✗          ✗          ✗          ✗          ✗
Burns Power       ✔       ✔       ✔         ✔        ✔          ✔          ✗          ✔          ✗          ✔
Dekker            ✗       ✗       ✗         ✗        ✗          ✗          ✔          ✔          ✗          ✔
Dekker x86        ✔       ✗       ✗         ✗        ✗          ✗          ✗          ✗          ✗          ✗
Dekker Power      ✔       ✗       ✔         ✗        ✔          ✔          ✗          ✔          ✗          ✔
Lamport           ✗       ✗       ✗         ✗        ✗          ✗          ✗          ✗          ✗          ✔
Lamport x86       ✔       ✔       ✗         ✔        ✗          ✗          ✗          ✗          ✔          ✗
Lamport Power     ✔       ✔       ✔         ✔        ✔          ✔          ✗          ✔          ✔          ✔
Parker            ✗       ✗       ✗         ✗        ✗          ✔          ✔          ✔          ✔          ✔
Parker x86        ✔       ✗       ✗         ✗        ✗          ✔          ✔          ✔          ✔          ✔
Parker Power      ✔       ✗       ✔         ✗        ✔          ✔          ✔          ✔          ✔          ✔
Peterson          ✗       ✗       ✗         ✗        ✗          ✗          ✔          ✔          ✗          ✔
Peterson x86      ✔       ✗       ✗         ✗        ✗          ✗          ✔          ✗          ✗          ✗
Peterson Power    ✔       ✗       ✔         ✗        ✔          ✔          ✔          ✔          ✗          ✔
Szymanski         ✗       ✗       ✗         ✔        ✗          ✗          ✗          ✔          ✗          ✔
Szymanski x86     ✔       ✔       ✗         ✔        ✗          ✗          ✗          ✗          ✗          ✗
Szymanski Power   ✔       ✔       ✔         ✔        ✔          ✔          ✗          ✔          ✗          ✔
Table 2: Bounded portability analysis of mutual exclusion algorithms: portable (✔),
non-portable (✗)
This section presents the complete set of experiments for portability of mutual
exclusion algorithms. Besides the portability analysis between SC, TSO and Power shown
in Table 1, we report on several further combinations of MCMs. The results of the
portability analysis are shown in Table 2. The complete sets of encoding and solving
times are shown in Fig. 9 and Fig. 10, respectively.
[Figure: two bar-chart panels over the benchmarks Bakery, Burns, Dekker, Lamport, Peterson and Szymanski, each in its plain, x86 and Power variant. First panel: series SC-TSO, SC-PSO, TSO-PSO, RMO-Alpha, Alpha-RMO; axis from 0 to 200. Second panel: series SC-Power, TSO-Power, PSO-Power, RMO-Power, Alpha-Power; axis from 0 to 300.]
Fig. 9: Encoding times (in secs.) for portability of mutual exclusion algorithms.
D Common Executions
For a given program P , let exec(P ) be the set of its (consistent and inconsistent)
executions. We show how to check portability of a high-level program PH based on
the executions of two different low-level programs PS and PT that were compiled from
PH to different architectures. The following concept of high-level portability is more
involved than Definition 1 since programs compiled towards different architectures may
differ greatly and it is often difficult to directly compare their executions. The definition
relies on a formula execproj (XL, XH) which holds iff the high-level execution XH is the
projection (see below) of the low-level one XL according to the compilation mapping.
We consider executions of PS and PT to be similar if they have the same projection
over PH, and we adapt our method accordingly. Instead of looking for a single low-level
execution that is consistent with MT but not with MS, we now look for a high-level
execution that is the projection of two low-level executions: an execution of PS not
consistent with MS and an execution of PT consistent with MT. In order to still be able
to encode portability as an existential SMT query, the formula execproj(XL, XH) has to
be existential as well.
Definition 5 (High-level Portability). Let MS, MT be two MCMs, PH a high-level
program, and PS, PT the two low-level programs after compiling to the corresponding
architectures. Program PH is not portable from MS to MT if there are executions
XH ∈ exec(PH), XS ∈ exec(PS) and XT ∈ exec(PT) such that
execproj(XS, XH) ∧ execproj(XT, XH) ∧ XS ∉ consMS(PS) ∧ XT ∈ consMT(PT).
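To illustrate the shape of Definition 5, the following Python sketch searches explicitly given, finite sets of executions for a witness of non-portability. It is a toy enumeration and not the SMT encoding used by the tool; the function name find_portability_bug, the representation of executions as opaque values, and the predicates passed as arguments are all assumptions of this example.

```python
from itertools import product

def find_portability_bug(exec_h, exec_s, exec_t, execproj, cons_s, cons_t):
    """Return a triple (x_h, x_s, x_t) witnessing non-portability in the
    sense of Definition 5, or None if no such triple exists.

    exec_h, exec_s, exec_t: finite iterables of executions (hypothetical).
    execproj(x_l, x_h):     True iff x_h is the projection of x_l.
    cons_s(x), cons_t(x):   consistency with the source / target model.
    """
    for x_h, x_s, x_t in product(exec_h, exec_s, exec_t):
        if (execproj(x_s, x_h) and execproj(x_t, x_h)
                and not cons_s(x_s) and cons_t(x_t)):
            return (x_h, x_s, x_t)
    return None

# Toy data: executions are just names, the projection is a lookup table.
projection = {"s0": "h0", "s1": "h1", "t0": "h0", "t1": "h1"}
toy_proj = lambda x_l, x_h: projection[x_l] == x_h

bug = find_portability_bug(
    ["h0", "h1"], ["s0", "s1"], ["t0", "t1"],
    toy_proj,
    cons_s=lambda x: x == "s0",   # s1 is inconsistent with the source model
    cons_t=lambda x: True,        # every target execution is consistent
)
no_bug = find_portability_bug(
    ["h0", "h1"], ["s0", "s1"], ["t0", "t1"],
    toy_proj,
    cons_s=lambda x: x == "s0",
    cons_t=lambda x: x == "t0",   # t1 is ruled out by the target model
)
```

In the toy data, the high-level execution h1 is the common projection of the source-inconsistent s1 and the target-consistent t1, so the first call reports a bug; in the second call no such pair exists.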
For compilers doing complex optimisations such as common subexpression elimination,
speculative execution, etc., creating the formula execproj(XL, XH) is not trivial, but
can still be done. Wickerson et al. have already studied the construction of projected
high-level executions for litmus tests [47]. We leave the concrete details of the
implementation of such a mapping for, e.g., LLVM-compiler-generated x86/Power assembly
code, for future research. However, we claim this can be done with a reasonable amount
of implementation effort. For simpler compilers where we can obtain a function from
low-level instructions to high-level ones (such a function exists even in the presence
of instruction reordering), we give below a concrete formula execproj(XL, XH). This
relation between executions uses a direct mapping between events; it does not depend
on the program order, which allows us to handle any instruction reordering without
difficulty.
Given a program P, we denote its set of instructions as 𝕀P. A low-level program PL
is obtained by compiling an unrolled high-level program PH. Each memory event of an
execution XL corresponds to a low-level instruction, which in turn was compiled from
a high-level instruction. We define a function hlinst : EL → 𝕀H that assigns to each
memory event in the low-level execution the corresponding high-level instruction. We
use hlinst to relate executions of the low-level and high-level programs. An execution
XH of a high-level program PH consists of a set of executed instructions IH ⊆ 𝕀H
(a subset of the instructions of PH) and relations rf and co between them.
Definition 6 (Execution Projections). Let XL, XH be respectively executions of
a low-level program PL and a high-level program PH and let hlinst : EL → IH be a
function. We define
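\begin{align}
\mathit{execproj}(X_L, X_H) := {} & \bigwedge_{e \in X_L} \mathit{hlinst}(e) \in I_H \;\wedge \tag{17}\\
& \bigwedge_{\substack{e_1, e_2 \in X_L \\ \xrightarrow{rel} \in \{\xrightarrow{rf}, \xrightarrow{co}\}}} \bigl(e_1 \xrightarrow{rel} e_2 \Rightarrow \mathit{hlinst}(e_1) \xrightarrow{rel} \mathit{hlinst}(e_2)\bigr) \;\wedge \tag{18}\\
& \bigwedge_{i_1, i_2 \in I_H} \Bigl(i_1 \xrightarrow{rel} i_2 \Rightarrow \bigvee_{e_1, e_2 \in X_L} \bigl(e_1 \xrightarrow{rel} e_2 \wedge i_1 = \mathit{hlinst}(e_1) \wedge i_2 = \mathit{hlinst}(e_2)\bigr)\Bigr) \tag{19}
\end{align}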
We ensure that each executed event is mapped to a unique executed high-level instruc-
tion (17), that events related with rf or co are mapped to instructions that are related
by the same relation (18), and that if two instructions are related, then there are two
corresponding events in the low-level program with the same relation (19). Note that
one high-level instruction may correspond to multiple memory events.
The function hlinst can be easily constructed by keeping track of the original
instructions during the compilation. Definition 6 can be encoded in SAT and thus
portability of a high-level program can be encoded in an existential SMT formula
according to Definitions 5 and 6.
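The three conjuncts of Definition 6 can be checked directly on small, explicitly given executions. The Python sketch below is a minimal illustration under stated assumptions: an execution is represented as a pair of a set (events or executed instructions) and a dictionary mapping "rf" and "co" to sets of pairs over that set, and hlinst is a plain dictionary. This representation and the event and instruction names are hypothetical; the tool itself encodes the definition symbolically rather than evaluating it on concrete data.

```python
def execproj(x_l, x_h, hlinst):
    """Check conditions (17)-(19) of Definition 6 on concrete executions.

    x_l = (events, rel_l), x_h = (instrs, rel_h); rel_l and rel_h map
    "rf" and "co" to sets of pairs over events / executed instructions.
    hlinst maps each low-level event to its high-level instruction.
    """
    events, rel_l = x_l
    instrs, rel_h = x_h
    # (17): every executed event maps to an executed high-level instruction.
    if any(hlinst[e] not in instrs for e in events):
        return False
    for rel in ("rf", "co"):
        # Images of all low-level rel-edges under hlinst.
        images = {(hlinst[e1], hlinst[e2]) for (e1, e2) in rel_l[rel]}
        # (18): related events map to instructions related by the same relation.
        if any(pair not in rel_h[rel] for pair in images):
            return False
        # (19): every high-level rel-edge is the image of some low-level edge.
        if any(pair not in images for pair in rel_h[rel]):
            return False
    return True

# One high-level read compiled into two low-level read events (e2, e3).
hlinst = {"e1": "w_x", "e2": "r_x", "e3": "r_x"}
x_low = ({"e1", "e2", "e3"}, {"rf": {("e1", "e2")}, "co": set()})
x_high = ({"w_x", "r_x"}, {"rf": {("w_x", "r_x")}, "co": set()})
x_high_no_rf = ({"w_x", "r_x"}, {"rf": set(), "co": set()})

matches = execproj(x_low, x_high, hlinst)
mismatch = execproj(x_low, x_high_no_rf, hlinst)
```

The example also shows the remark that one high-level instruction may correspond to multiple memory events: both e2 and e3 map to r_x, and a single low-level rf edge is enough to justify the high-level one.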
[Figure: two bar-chart panels over the benchmarks Bakery, Burns, Dekker, Lamport, Peterson and Szymanski, each in its plain, x86 and Power variant. First panel: series SC-TSO, SC-PSO, TSO-PSO, RMO-Alpha, Alpha-RMO; axis from 0 to 1.5. Second panel: series SC-Power, TSO-Power, PSO-Power, RMO-Power, Alpha-Power; axis from 0 to 35.]
Fig. 10: Solving times (in secs.) for portability of mutual exclusion algorithms.
