A denotational account of C11-style memory by Kavanagh, Ryan & Brookes, Stephen
arXiv:1804.04214v1 [cs.PL] 11 Apr 2018
Submitted to MFPS 2018
A denotational account of C11-style memory
Ryan Kavanagh1 Stephen Brookes2
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA, USA
Abstract
We introduce a denotational semantic framework for shared-memory concurrent programs in a C11-style
memory model. This denotational approach is an alternative to techniques based on “execution graphs”
and axiomatizations, and it allows for compositional reasoning. Our semantics generalizes from traces
(sequences of actions) to pomsets (partial orders of actions): instead of traces and interleaving, we embrace
“true” concurrency. We build on techniques from our prior work that gives a denotational semantics to
SPARC TSO. We add support for C11’s wider range of memory orderings, e.g., acquire-release and relaxed,
and support for local variables and various synchronization primitives, while eliminating significant amounts
of technical bookkeeping. Our approach features two main components. We first give programs a syntax-
directed denotation in terms of sets of pomsets of memory actions. We then give a race-detecting executional
interpretation of pomsets using footprints and a local view of state.
Keywords: C11, denotational semantics, pomsets, concurrency, weak memory models.
1 Introduction
A memory model specifies which values can be read by memory accesses. C11-
style memory models allow the programmer to specify if a given memory location
should be acted on atomically or non-atomically [10]. Atomic memory locations
are intended to be used for inter-thread communication and synchronization. Every
atomic memory action has a programmer-chosen memory ordering tag. The action’s
memory ordering specifies the visibility of actions sequenced before or after it to
other actions that synchronize with it. Intuitively, the memory ordering stipulates
how the memory action can be reordered with other actions in the same thread.
Following Lahav et al. [10], we differ from C11 and do not treat unsequenced races
between atomic accesses to the same location as undefined behaviour. In contrast,
multiple concurrent accesses to a non-atomic location, at least one of which is a
write, constitute a race and are regarded as undefined behaviour, because reading
1 Email: rkavanagh@cs.cmu.edu. Funded in part by a Natural Sciences and Engineering Research Council
of Canada Postgraduate Scholarship.
2 Email: brookes@cs.cmu.edu
Kavanagh and Brookes
from a non-atomic location could retrieve a value from an intermediate state. Mem-
ory actions on non-atomic locations can be compiled to normal memory accesses,
cheaper to perform than their atomic counterparts.
We provide a denotational framework for exploring C11-style memory models.
We did not set out to exactly capture any particular account of the C11 memory
model for two reasons. First, because the literature presents many different accounts
of the C11 memory model (e.g., [2,10,13]), each addressing various shortcomings,
we believe it is better to develop a generic framework in which we can study various
formulations. Second, C11 has some features, such as consume accesses, that we
deem premature or that introduce excessive complexity with little gain.
Our framework has two major components. In Section 3, we give a denotational
(and hence compositional) account of a C11-style memory model. Each program is
given a set of pomsets (a generalization of traces) as its denotation using various
composition operators designed to capture exactly the per-thread memory reorder-
ings permitted by the memory model. In Section 4, we give these pomsets an
executional interpretation inductively defined on the structure of the pomset, us-
ing a local view of state. This interpretation is carefully constructed to respect
synchronization constraints provided by the memory model and it is race-detecting.
2 An Informal Account
Each type of memory location has an associated set of actions. Non-atomic locations
can be read from and written to. Atomic locations can additionally be acted on
with atomic read-modify-write actions. There are memory operations that involve
no locations. For example, a fence is a special atomic memory action that acts
as a barrier against reordering. All atomic operations have an associated memory
ordering tag chosen by the programmer.
The strongest ordering on atomic actions is sequential consistency, denoted by
the tag sc. An sc action cannot be reordered with any other action, and every
execution induces a total order on the sc actions. Though sc actions are expensive
to implement, they allow the programmer to reason via an interleaving semantics.
All atomic memory actions can use this ordering.
The release-acquire memory ordering paradigm gives lightweight synchronization
between threads. No memory action sequenced before a release (rel ) write can be
reordered to after the write. Symmetrically, no action sequenced after an acquire
(acq) read can be reordered to before the read. The intended semantics is that
any action after an acquire read that “synchronizes with” a release write to the
same atomic location sees the effects of all actions that occurred before the write.
Fences and atomic read-modify-write actions, such as locking primitives, can use
the acquire-release (ar ) ordering. These actions behave both as an acquire read and
a release write. A key difference from sc is that executions need not induce a total
order on ar actions.
The weakest memory ordering we consider is relaxed (rlx ). It imposes no addi-
tional constraints on reordering and only guarantees atomicity.
It is helpful to visualize the relative strength of these memory orderings using
the following diagram by Lahav et al. [10], in which each arrow µ → µ′ stands for µ ≤ µ′:
                 acq
               ↗     ↘
    na → rlx           ar → sc
               ↘     ↗
                 rel
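The ordering in the diagram can be encoded directly. The following Python sketch is purely illustrative and not part of the formal development; the names `COVERS` and `leq` are hypothetical. It records the six arrows of the diagram and takes their reflexive-transitive closure.

```python
# Arrows of the diagram, as (weaker, stronger) pairs.
COVERS = {
    ("na", "rlx"),
    ("rlx", "acq"), ("rlx", "rel"),
    ("acq", "ar"), ("rel", "ar"),
    ("ar", "sc"),
}

def leq(m1: str, m2: str) -> bool:
    """Return True if ordering m1 is at most as strong as m2.

    This is the reflexive-transitive closure of COVERS; the recursion
    terminates because the diagram has no cycles.
    """
    if m1 == m2:
        return True
    return any(a == m1 and leq(b, m2) for (a, b) in COVERS)
```

Under this encoding, acq and rel are incomparable, while both lie below ar and sc.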
A key desideratum is that executing a single sequential thread under our memory
model should produce the same result as execution without any reorderings. This
implies that at no point may we reorder memory actions to the same location within
a given thread. We also want coherence, i.e., the property that writes to the same
location appear in the same order to all threads. Our model should further respect
data dependencies: whenever we write the value of an expression to a location, any
reads required to evaluate that expression must be ordered before the write.
We illustrate these principles and the interplay between the release and acquire
orderings using a simple message-passing example. Consider executing the program
x :=na 42 ; y :=rel 1 ‖ (while yacq = 0 do skip) ; z :=rlx xna (1)
from an initial state where all locations are initialized to 0. The fact that the write
to y is a release and the reads from y are acquires guarantees that after the while
loop terminates, the read from x will see the value 42. We make these dependencies
explicit by means of diagrams, where an arrow a → b indicates that memory action
a is sequenced before b, and a ⇢ a abbreviates a → a → · · · → a:
x :=rlx 42 → y :=rel 1        y=acq 0 ⇢ y=acq 0 → x=rlx 42 → z :=rlx 42
Had all of the actions been tagged with the rlx ordering, the compiler could have
reordered the writes to x and y because they do not depend on each other, and then
the read from x could have returned 0 instead of 42, giving us
x :=rlx 42        y :=rlx 1        y=rlx 0 ⇢ y=rlx 0 → x=rlx 42 → z :=rlx 42
We pause to remark that the reads from y are still ordered before the read from x:
this is because there is a control-flow dependency between the read yrlx from the
loop test and the write z :=rlx xrlx immediately following. Preserving control-flow
dependencies will be important for eliminating various thin air behaviours.
3 Denotational Semantics
We make our informal account precise using a denotational semantics. Its chief
advantage is compositionality : the meaning (or denotation) of a program in the
memory model is determined by the denotations of its subphrases. This allows for
modular reasoning and the validation of various program-level optimizations. The
denotations of programs will be order structures on syntactic objects called “mem-
ory actions”. These order structures, called partially-ordered multisets (pomsets),
generalize traces and are an abstract description of the program’s memory accesses.
We specify a simple imperative language with while loops, fences, local variables,
and atomic read-modify-write actions. These features were chosen to illustrate
the principles underlying our approach, but the details of the language are not
important. We assume meta-variables for disjoint sets of identifiers a ∈ Ideat
(atomic assignable identifiers) and n ∈ Idena (non-atomic assignable identifiers)
and we let x, y range over the set Ide = Ideat ∪ Idena of all identifiers. Finally, we
let v ∈ V = Z (integer values) and f range over partial functions of type V ⇀ V.
The abstract syntax of our language is given by the following grammar:
α ::= rlx | rel | acq | ar | sc
µ ::= na | α
e ::= v | a_{α ∉ {rel, ar}} | nna | rmwα(a; f) | e1 + e2 | · · ·
b ::= true | ¬b | b1 ∨ b2 | e1 < e2 | · · ·
c ::= skip | a :=_{α ∉ {acq, ar}} e | n :=na e | fence_{>rlx} | rmwα(a; f)
| local nna = v in c | c1 ; c2 | if b then c1 else c2 | while b do c
p ::= c | p ‖ c
The meta-variables are e ∈ Expint (integer expressions), b ∈ Expbool (boolean ex-
pressions), c ∈ Cmd (commands), and p ∈ Prog (programs). We abuse notation
and use subscripts such as >µ to mean all memory orderings µ′ such that µ′ > µ.
Though the phrase rmwα(a; f) is used both as an expression and a command, context
will make its syntactic class unambiguous. In examples, we let the associated
memory order tag determine whether an identifier corresponds to an atomic or
non-atomic assignable.
Our semantic clauses will transform our language’s syntactic phrases into sets
of memory action pomsets. A (memory) action λ is a syntactic object representing
an action on the store. Memory actions are given by the following:
λ ::= δ                  (no-op)
    | n :=na v           (non-atomic write)
    | a :=α v            (atomic write, α ∈ {rlx, rel, sc})
    | n=na v             (non-atomic read)
    | a=α v              (atomic read, α ∈ {rlx, acq, sc})
    | fenceα             (atomic fence, α > rlx)
    | rmwα(a; v; v′)     (atomic read-modify-write: read v and write v′).
Let A be the set of all actions and let Mo and Ide be the (partial) projections from
A to memory orderings and Ide, respectively. Given a predicate π on A, let Aπ be
the set of actions satisfying π. For example, A<µ is the set of all actions λ such
that Mo(λ) < µ. Special cases are the set Aµ of all memory actions with memory
ordering µ, the set Ar of all reading actions (i=µ v and rmwα(a; v; v′)), the set Aw
of all writing actions (i :=µ v and rmwα(a; v; v′)), the set Af of fence actions, and
the set Ax of all actions involving the identifier x.
A pomset over a set of labels L is a triple (P,<,Φ) where (P,<) is a strict partial
order satisfying the finite-height property and Φ : P → L is a labelling function. A
partial order (P,<) satisfies the finite-height property if for all p ∈ P , the set {q ∈
P | q < p} is finite. Because our pomsets describe orderings between a program’s
memory actions, the finite-height property implies that we have no unreachable
actions in our pomset, i.e., that every action could in principle be executed. As is
typical with mathematical structures, we call a pomset by its underlying set, and
given a pomset P , we let <P and ΦP denote its obvious components. We denote
by Pom(L) the set of pomsets over L. We typically refer to pomset elements
by their labels, relying on context to disambiguate which underlying element we
mean. In fact, the underlying elements carry no meaning, and we identify pomsets
P,Q ∈ Pom(L) whenever there exists an order-isomorphism Ψ : (P,<P )→ (Q,<Q)
respecting labels, i.e., satisfying ΦQ◦Ψ = ΦP . We can identify pomsets and labelled
directed acyclic graphs, as we did in Section 2, where we have an edge Φ(a) → Φ(b)
if and only if a < b. We say a pomset P is linear if <P is a total order.
We further identify non-empty pomsets P,Q ∈ Pom(A) whenever there exists a
non-empty pomset R ∈ Pom(A) such that R can be obtained from P and from Q
by deleting finitely many δ actions.³ This is akin to closure under stuttering and
mumbling in trace semantics (cf. [6]), and our semantics is well-defined relative to
it.
Our semantic clauses assign sets of pomsets to syntactic phrases, and composi-
tionality requires us to be able to compose the pomsets from subphrases to form
the denotation of a phrase. The sequential composition (P1, <1,Φ1) ; (P2, <2,Φ2)
of pomsets P1 and P2 is (P1 ⊎ P2, <,Φ1 ⊎Φ2) when P1 is finite, where (i, p) < (j, q)
if and only if i = j and p <i q, or i < j, and where (Φ1 ⊎ Φ2)(i, p) = Φi(p). Intuitively,
this orders everything in P1 before everything in P2 while preserving their
internal orderings. When P1 is infinite, P1 ; P2 = P1. The finiteness check on
P1 ensures P1 ; P2 satisfies the finite-height property. The parallel composition
(P1, <1,Φ1) ‖ (P2, <2,Φ2) of pomsets P1 and P2 is given by (P1 ⊎ P2, <,Φ1 ⊎ Φ2)
where (i, p) < (j, q) if and only if i = j and p <i q. It is straightforward to check
that these compositions are all associative with the empty pomset 0 = (∅,∅,∅) as
their unit. They lift to sets of pomsets in the obvious manner.
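The two compositions can be sketched concretely for finite pomsets. The Python fragment below is an illustration under simplifying assumptions, not the paper's formal definition: a pomset is represented as a pair `(labels, order)`, where `labels` maps elements to labels and `order` is a set of pairs `(p, q)` meaning p < q. The infinite case P1 ; P2 = P1 is not modelled, and all function names are hypothetical.

```python
def singleton(label):
    """A one-element pomset carrying the given label."""
    return ({0: label}, set())

def _tag(pomset, i):
    """Rename each element p to (i, p) so that unions are disjoint."""
    labels, order = pomset
    return ({(i, p): lab for p, lab in labels.items()},
            {((i, p), (i, q)) for (p, q) in order})

def par(p1, p2):
    """Parallel composition: disjoint union with no new ordering."""
    (l1, o1), (l2, o2) = _tag(p1, 1), _tag(p2, 2)
    return ({**l1, **l2}, o1 | o2)

def seq(p1, p2):
    """Sequential composition: all of p1 is ordered before all of p2."""
    labels, order = par(p1, p2)
    cross = {(p, q) for p in labels for q in labels
             if p[0] == 1 and q[0] == 2}
    return (labels, order | cross)

def is_linear(pomset):
    """A pomset is linear when any two distinct elements are comparable."""
    labels, order = pomset
    return all(p == q or (p, q) in order or (q, p) in order
               for p in labels for q in labels)
```

For instance, `seq(singleton("x :=na 42"), singleton("y :=rel 1"))` is linear, whereas the `par` of the same two pomsets has an empty order.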
The denotation of an integer expression is a subset of Pom(A)×V inductively
defined on the syntax of the expression:
P(v) = {({δ}, v)}
P(xµ) = {({x=µ v}, v) | v ∈ V }
P(rmwα(a; f)) = {({rmwα(a; v; v′)}, v′) | (v, v′) ∈ graph(f)}
P(e1 + e2) = {(P1 ‖ P2, v1 + v2) | (Pi, vi) ∈ P(ei)}
The xµ clause has a pomset {x=µ v} for each possible value v that could be read
from x. We must allow for all possible values to get compositionality: we do not
know a priori with which writes an expression may be composed, and hence do
not know what values might be read from x. The rmwα(a; f) clause is analogous
and captures the atomic nature of the read-modify-write by treating it as a single
memory action, rather than a sequenced read-write pair. We indicate that we
3 Formally, the deletion of S ⊆ P from P is given by (P \ S, <P ∩ ((P \ S) × (P \ S)), Φ ↾ (P \ S)). Deleting
finitely many δ actions from P means deleting a finite subset of Φ_P^{−1}(δ) from P.
compute the ei in e1 + e2 in parallel by combining the memory actions Pi with a
parallel composition.
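The expression clauses can be sketched as follows. This is an illustrative Python fragment under stated assumptions: `VALUES` is a small finite stand-in for V = Z, expressions are encoded as hypothetical tagged tuples, and, because the pomsets produced by these particular clauses carry no ordering, a sorted tuple of actions stands in for the pomset. The identification of pomsets up to deletion of δ actions is not modelled.

```python
from itertools import product

VALUES = range(0, 3)   # assumption: finite stand-in for the value set V

def denote(e):
    """Return the set of (actions, value) pairs denoted by expression e.

    Expressions: ("const", v) | ("read", x, mo) | ("add", e1, e2).
    """
    kind = e[0]
    if kind == "const":
        return {((("delta",),), e[1])}
    if kind == "read":
        _, x, mo = e
        # One pomset per value that the read could return.
        return {((("read", x, mo, v),), v) for v in VALUES}
    if kind == "add":
        # Operands are evaluated in parallel: merge the unordered actions.
        return {(tuple(sorted(p1 + p2)), v1 + v2)
                for (p1, v1), (p2, v2) in product(denote(e[1]), denote(e[2]))}
    raise ValueError(f"unknown expression: {e!r}")
```

With three candidate values, `denote(("add", ("read", "x", "na"), ("read", "y", "na")))` contains nine pairs, one for each combination of values read from x and y.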
The denotation of a boolean expression is a subset of Pom(A) × Bool and is
defined analogously. To simplify the clauses with conditionals, we introduce the
helper definitions P(b)true = {P | (P, true) ∈ P(b)} and the analogous P(b)false .
The denotation of a program p is a subset of Pom(A) inductively defined on
its syntax: P(p ‖ c) = P(p) ‖ P(c). The denotation of a command c is a subset of
Pom(A), also inductively defined on its syntax. The basic commands are given by:
P(skip) = {{δ}}
P(x :=µ e) = {P ; {x :=µ v} | (P, v) ∈ P(e)}
P(fenceα) = {{fenceα}}
P(rmwα(a; f)) = {{rmwα(a; v; v′)} | (v, v′) ∈ graph(f)}
The only interesting clause here is for x :=µ e, where data dependency requires that
the corresponding write be sequenced after all actions performed in computing e.
Before we can give semantic clauses for compound commands, we must introduce
the relaxed sequential composition. The relaxed composition of two pomsets orders
actions from the first before those of the second only when required by the memory
model. To make this precise, we introduce the following predicates. IsAcq(λ) holds
if and only if λ ∈ Ar ∪ Af and Mo(λ) ≥ acq. IsRel(λ) holds if and only if λ ∈ Aw ∪ Af
and Mo(λ) ≥ rel. Actions λ and λ′ are memory-ordered, Ord(λ, λ′), if and only
if Ide(λ) = Ide(λ′), IsAcq(λ), or IsRel(λ′). The relaxed sequential composition
(P1, <1, Φ1) ⋮ (P2, <2, Φ2) of pomsets is (P1 ⊎ P2, <+, Φ1 ⊎ Φ2) when P1 is finite, where
(i, p) < (j, q) if and only if i = j and p <i q, or i = 1, j = 2, and Ord(Φ1(p), Φ2(q)),
and <+ is the transitive closure of <. When P1 is infinite, P1 ⋮ P2 = P1. Relaxed
sequential composition is also associative with 0 as its unit.
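A minimal sketch of the relaxed sequential composition for finite pomsets follows. It is illustrative only. Actions are encoded as hypothetical tuples `(kind, ident, mo, ...)` with `ident` set to `None` for fences and δ, and a pomset is a pair `(labels, order)`. One assumption is made explicit: since Ide is partial, the sketch treats Ide(λ) = Ide(λ′) as holding only when both sides are defined.

```python
ACQ = {"acq", "ar", "sc"}   # orderings at least as strong as acq
REL = {"rel", "ar", "sc"}   # orderings at least as strong as rel

def is_acq(a):
    return a[0] in {"read", "rmw", "fence"} and a[2] in ACQ

def is_rel(a):
    return a[0] in {"write", "rmw", "fence"} and a[2] in REL

def ordered(a, b):
    """Ord(a, b): same location, or a is an acquire, or b is a release."""
    same_loc = a[1] is not None and a[1] == b[1]   # assumption: both defined
    return same_loc or is_acq(a) or is_rel(b)

def transitive_closure(rel):
    closure = set(rel)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

def relaxed_seq(p1, p2):
    """Order an action of p1 before one of p2 only when Ord requires it."""
    labels = {(1, p): a for p, a in p1[0].items()}
    labels.update({(2, q): a for q, a in p2[0].items()})
    order = {((1, p), (1, q)) for (p, q) in p1[1]}
    order |= {((2, p), (2, q)) for (p, q) in p2[1]}
    order |= {(p, q) for p in labels for q in labels
              if p[0] == 1 and q[0] == 2 and ordered(labels[p], labels[q])}
    return labels, transitive_closure(order)
```

On the message-passing example, composing the write `("write", "x", "na", 42)` with a release write to y yields one ordering edge, whereas composing it with a relaxed write to y yields none.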
The sequencing, looping, and conditional clauses are given by:
P(c1 ; c2) = P(c1) ⋮ P(c2)
P(if b then c1 else c2) = (P(b)true ; P(c1)) ∪ (P(b)false ; P(c2))
P(while b do c) = (⋃_{n=0}^{∞} I^n(b, c)) ∪ I^ω(b, c)
I^0(b, c) = P(b)false
I^{n+1}(b, c) = P(b)true ; (P(c) ⋮ I^n(b, c))
where I^ω(b, c) is taken to be the evident infinite unfolding. There are a few
subtleties in these clauses. In the clause for if b then c1 else c2, we use sequential
compositions because there is a control-flow dependency between the memory actions
for b and those for the ci. Respecting this dependency is important to eliminating
“thin-air” behaviours. Indeed, suppose we had used the relaxed composition instead,
and consider the program if yrlx = 1 then x :=rlx 1 ‖ if xrlx = 1 then y :=rlx 1.
Then it would have a pomset of the form {y=rlx 1 x :=rlx 1 x=rlx 1 y :=rlx 1}
in its denotation, and none of the memory actions would be ordered because the
boolean expressions and the commands in the conditional branches
involve different locations. One could then perform the write actions before the
read actions and execute the whole program, even from a state where y and x are
initialized to 0. By instead using sequential composition, the reads are sequenced
before the branch’s writes, and the program is not executable from this state. In
contrast, in the clause for c1 ;c2, we should be permitted to reorder memory accesses
if the memory model allows it, and so we use the relaxed sequential composition.
The last clause is for local assignables, which can be thought of as registers.
Given a command local nna = v in c, the intention is that the assignable n should
be initialized to v and be visible only to c. Consequently, any other commands
c′ should not be able to observe c’s effects on n, even if n appears free in c′. We
must, however, be able to observe that c did an action whenever it does an n-
action: the program local nna = 0 in (while nna = 0 do n :=na 0) should be non-
terminating. To satisfy these desiderata we take all of the pomsets of c whose uses
of the location n are internally consistent and then replace all n-actions with no-op
δ actions. We formally accomplish this by introducing additional operations on
pomsets. To ensure internal consistency on n, we need to restrict our attention to
n-actions. The restriction of a pomset (P,<,Φ) to a subset L′ ⊆ L is the pomset
P ↾ L′ = (Φ−1(L′), < ∩ L′ × L′,Φ ↾ Φ−1(L′)) obtained by discarding all elements
whose label is not in L′. To make sure they are internally consistent, we check that
they are sequentially executable. This is accomplished with a predicate SeqExecn(P )
that holds if and only if P is n :=na v followed by zero or more occurrences of n=na v,
or if P = P1 ; P2 with SeqExecn(P1) and SeqExecn(P2). This syntactic check is
equivalent to the sequential executions of Section 4. Finally, to replace all n-actions
by δ actions, we need a substitution operation. The substitution of l for L′ ⊆ L in
P, [l/L′]P, is given by (P, <P, Φ) where Φ(p) = l if ΦP(p) ∈ L′, and Φ(p) = ΦP(p)
otherwise. Combining these ingredients, we get the clause
P(local nna = v in c) = {[δ/An]P | P ∈ P(c), SeqExecn({n :=na v} ; P ↾ An)}.
This definition satisfies various desirable equivalences, such as
local nna = 0 in n :=na 42 ≡P skip
local nna = 0 in (while nna = 0 do n :=na 0) ≡P while true do skip,
where p ≡P p′ is program equivalence, defined to hold if and only if P(p) = P(p′).
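The three ingredients of the local clause (restriction, the SeqExec check, and substitution) can be sketched on a simplified representation. The fragment below is illustrative and makes a strong assumption: the pomset is linear and is given as a Python list of actions `(kind, ident, mo, value)` in order. The names `seq_exec` and `local` are hypothetical.

```python
DELTA = ("delta", None, None, None)

def seq_exec(n_actions):
    """SeqExec_n on a linearly ordered list of n-actions: a write of v
    followed by reads of v, possibly repeated with later writes."""
    current, seen_write = None, False
    for kind, _ident, _mo, value in n_actions:
        if kind == "write":
            current, seen_write = value, True
        elif kind == "read" and seen_write and value == current:
            continue
        else:
            return False
    return seen_write

def local(n, init, actions):
    """Sketch of the clause for `local n = init in c` on one linear pomset.

    Returns the pomset with every n-action replaced by delta, or None if
    the uses of n are not internally consistent.
    """
    n_trace = [("write", n, "na", init)] + [a for a in actions if a[1] == n]
    if not seq_exec(n_trace):
        return None
    return [DELTA if a[1] == n else a for a in actions]
```

For example, the single pomset of n :=na 42 under `local("n", 0, ...)` becomes a lone δ action, in line with the first equivalence above, while a pomset that reads 1 from n without a preceding write of 1 is discarded.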
To illustrate our semantic clauses, we observe that the command
while i < 2 do (x :=na xna + 1 ; i :=na ina + 1)
includes pomsets of the following form, for each v ∈ V , in its denotation:
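[Diagram: a pomset drawn in three rows.
 Top row:    i=na 0 → i :=na 1 → i=na 1 → i :=na 2
 Middle row: i=na 0        i=na 1
 Bottom row: x=na v → x :=na v + 1 → x=na v + 1 → x :=na v + 2.
 Cross-row arrows: middle i=na 0 → top i=na 0; middle i=na 0 → x=na v;
 i :=na 1 → middle i=na 1; middle i=na 1 → top i=na 1; middle i=na 1 → x=na v + 1.]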
In contrast, the command
local ina = 0 in while i < 2 do (x :=na xna + 1 ; i :=na ina + 1)
has executable pomsets of the form
x=na v → x :=na v + 1 → x=na v + 1 → x :=na v + 2.
4 Executional Interpretation
We give a race-detecting input-output interpretation to the abstract denotations
of Section 3. This interpretation serves three main purposes. First, it gives us a
notion of “running” the executions a pomset describes, and it tells us the initial
states from which we can do so, along with the corresponding effects on state.
Second, it gives us a means of detecting which syntactic races are meaningful, and
which cannot occur. For example, the program (1) of Section 2 has a syntactic race
on the non-atomic location x, but this race can never occur during an execution
starting from a zero-initialized state because of the synchronization via the atomic
location y. Finally, it allows us to rule out various pomsets assigned to commands
that are not executable alone, but that are included for the sake of compositionality
and that are executable in a larger environment. Consider, for example, the pomset
x :=sc 2 → x=sc 1 → y :=sc 1 belonging to the program x :=sc 2 ; if xsc = 1 then y :=sc 1.
It is not executable, but it would be if we were to compose it with x :=sc 1.
We use two kinds of state: proper and overdefined. Proper states are finite
partial functions from identifiers to values, specifically elements of Ide ⇀fin V⊥.
We include a least element ⊥ in the codomain to denote an unconstrained value.
Its purpose will be made clear when we define footprints of actions below. We
use the notation [x1 : u1, . . . , xn : un ] to mean the proper state whose graph is
{(x1, u1), . . . , (xn, un)}. Given proper states σ and σ′, let σ ⊑ σ′ if and only if for
all x ∈ dom(σ), σ(x) ⊑ σ′(x). The symbol ⊤ is the overdefined state, which is the
result of a race. Let Σ be the set of all states, ranged over by σ.
We proceed in two stages. We first assign an executional meaning to individual
memory actions. Then, we assign an executional meaning to action pomsets. Be-
cause we are in a weak memory setting in which a single location can be acted on
concurrently, the concept of a “global state” is not well-defined. Indeed, hardware
features such as write buffers could cause different threads to read different values
from the same location at the same time. Instead, we use a local notion of state
called a footstep. A footstep of an action λ is a pair (σ, τ) of states, where σ is
a minimal piece of state enabling λ to be performed, and τ describes the effect of
performing λ from σ. The footprint ⟦λ⟧ of λ is the set of all of its footsteps. We
define the footprints of memory actions as follows:
⟦x=µ v⟧ = {([x : v], [ ])}                 ⟦δ⟧ = {([ ], [ ])}
⟦x :=µ v⟧ = {([x : ⊥], [x : v])}           ⟦fenceα⟧ = {([ ], [ ])}
⟦rmwα(a; v; v′)⟧ = {([a : v], [a : v′])}
Informally, it should be clear that none of the above actions cause any allocation:
whenever (σ, τ) ∈ ⟦λ⟧ for some action λ, dom(τ) ⊆ dom(σ). We use the ⊥ value
in the codomain of proper states to indicate that, though a write action x :=µ v
requires that the location x appear in the initial state, it is indifferent to its value.
The footprint of an action is also independent of the action’s memory ordering tag.
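The footprints of individual actions translate directly into code. In the illustrative Python sketch below, proper states are dictionaries, the hypothetical sentinel `BOT` stands for ⊥, and actions are hypothetical tagged tuples; none of these encodings come from the paper.

```python
BOT = None   # stands for the unconstrained value ⊥

def footprint(action):
    """Return the list of footsteps (sigma, tau) of a single action.

    Actions: ("read", x, mo, v) | ("write", x, mo, v) |
             ("rmw", a, mo, v, v2) | ("fence", mo) | ("delta",).
    """
    kind = action[0]
    if kind == "read":
        _, x, _mo, v = action
        return [({x: v}, {})]            # needs x to hold v, no effect
    if kind == "write":
        _, x, _mo, v = action
        return [({x: BOT}, {x: v})]      # needs x to exist, any value
    if kind == "rmw":
        _, a, _mo, v, v2 = action
        return [({a: v}, {a: v2})]       # reads v and writes v2 atomically
    if kind in ("delta", "fence"):
        return [({}, {})]
    raise ValueError(f"unknown action: {action!r}")
```

In every case the domain of the effect is contained in the domain of the initial state, and the memory ordering tag is ignored.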
We can give pomsets an analogous notion of footprint. We will do so by recursing
on the structure of the pomset, considering three principal cases: when the pomset is
a single action, when the pomset can be decomposed into a pair of parallel pomsets,
and when the pomset has an executable prefix.
We first specify structural conditions for when two pomsets can be run concur-
rently and whether doing so constitutes a race. Because we want a total order
on all sc actions, we cannot run two pomsets containing sc actions concurrently.
So we say concurrently executing P1 and P2 respect sc actions, P1 rsc P2, if and
only if only one of them performs sc actions, i.e., if and only if P1 ↾ Asc = ∅
or P2 ↾ Asc = ∅. We say that pomsets P1 and P2 have a data race⁴ on n if
n ∈ RaceLocs(P1, P2) = ⋃_{1 ≤ i ≠ j ≤ 2} Ide(Pi ↾ (Ana ∩ Aw)) ∩ Ide(Pj ↾ Ana) and that they have
a data race, P1 dr P2, if they have one on some n. Intuitively, n ∈ RaceLocs(P1, P2)
means that P1 and P2 both act on n with at least one of them writing to n.
Pomsets P1 and P2 are consistent, P1 co P2, if (i) ¬(P1 dr P2), (ii) P1 rsc P2,
and (iii) Ide(P1 ↾ Aw) ∩ Ide(P2 ↾ Aw) = ∅. Consistency means that there is no
syntactic constraint preventing us from considering concurrent execution of P1 and
P2. The third condition means we do not have any write-write races between P1
and P2, and is required to totally order writes on a per-location basis. In contrast,
pomsets P1 and P2 could race, P1 rc P2, if (i) P1 dr P2, (ii) P1 rsc P2, and
(iii) Ide(P1 ↾ (Aw\Ana))∩Ide(P2 ↾ (Aw\Ana )) = ∅. The third condition means that
we do not have any atomic write-write races. The intention is whenever P1 rc P2,
we should be able to regain consistency by deleting all of the data races.
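The structural conditions depend only on the actions of each pomset, not on their ordering, so they can be sketched over plain collections of actions. The Python fragment below is illustrative; actions are hypothetical tuples `(kind, ident, mo, ...)` and the function names are ours.

```python
def is_na(a):
    return a[2] == "na"

def is_write(a):
    return a[0] in ("write", "rmw")

def ides(actions, pred):
    """Identifiers of the actions satisfying pred."""
    return {a[1] for a in actions if pred(a)}

def race_locs(p1, p2):
    """Locations with a non-atomic write on one side and any non-atomic
    access on the other side."""
    locs = set()
    for pi, pj in ((p1, p2), (p2, p1)):
        locs |= ides(pi, lambda a: is_na(a) and is_write(a)) & ides(pj, is_na)
    return locs

def respects_sc(p1, p2):
    """At most one of the two pomsets performs sc actions."""
    return (not any(a[2] == "sc" for a in p1)
            or not any(a[2] == "sc" for a in p2))

def co(p1, p2):
    """Consistent: no data race, sc respected, no write-write overlap."""
    return (not race_locs(p1, p2) and respects_sc(p1, p2)
            and not (ides(p1, is_write) & ides(p2, is_write)))

def rc(p1, p2):
    """Could race: a data race, sc respected, no atomic write-write overlap."""
    def atomic_write(a):
        return is_write(a) and not is_na(a)
    return (bool(race_locs(p1, p2)) and respects_sc(p1, p2)
            and not (ides(p1, atomic_write) & ides(p2, atomic_write)))
```

Applied to the two threads of program (1), `race_locs` returns `{"x"}`, so the threads are not consistent but could race; whether that race is reachable is decided by the footprint rules, not by these syntactic checks.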
Next, we need a notion of splitting a pomset into a prefix and a suffix that can
sequentially be executed. We say that a subset Q of a pomset P is downward-closed
if whenever p <P q and q ∈ Q, then p ∈ Q. We write P1✁P P2 to mean that P1 is a
finite downward-closed subset of P and that P2 is the remainder of P . In this case,
we call P1 a prefix of P and P2 a suffix of P . Observe that if P = P1 ‖ P2 and P1
is finite, then P1 ✁P P2; finiteness is needed to guarantee fairness.
When executing two threads in parallel, we need only consider footsteps starting
from consistent states. We say two proper states σ, σ′ ∈ Σ are consistent, σ ⇑ σ′,
if σ ⊔ σ′ exists. This means that for all x ∈ dom(σ) ∩ dom(σ′), if σ(x) ≠ ⊥ and
σ′(x) ≠ ⊥, then σ(x) = σ′(x). The overdefined state ⊤ is consistent with no state.
Given a set S and a state σ, we let σ \ S be ⊤ when σ = ⊤, and {(x, v) ∈ σ | x ∉ S}
otherwise. Given proper states σ and σ′, updating σ by σ′ gives us a new state
[σ | σ′] = σ′ ⊔ (σ \ dom σ′). Explicitly, [σ | σ′](x) is σ′(x) whenever x ∈ dom(σ′),
and σ(x) whenever x ∉ dom(σ′). If σ or σ′ is ⊤, then [σ | σ′] is defined to be ⊤.
To combine the initial states of two pomsets data racing on R ⊆ Idena , we define
the racy product σ1 ⊗R σ2 = [σ1 ⊔ σ2 | (σ1 ⊓ σ2) ↾ R ], explained below.
We let the footprint ⟦P⟧ of a pomset P be inductively defined as the least set
given by the following rules, which are explained below:
4 One could replace Ana with Aπ throughout to instead consider races between actions satisfying π.
(Act) If P = {λ}, then (σ, τ) ∈ ⟦P⟧ for all (σ, τ) ∈ ⟦λ⟧.
(Seq) If P1 ✁P P2, (σi, τi) ∈ ⟦Pi⟧, [σ1 | τ1] ⇑ σ2, and ⊤ ∉ {τ1, τ2},
then (σ1 ⊔ (σ2 \ dom τ1), [τ1 | τ2]) ∈ ⟦P⟧.
(Par) If P = P1 ‖ P2, P1 co P2, (σi, τi) ∈ ⟦Pi⟧, σ1 ⇑ σ2, and ⊤ ∉ {τ1, τ2},
then (σ1 ⊔ σ2, τ1 ⊔ τ2) ∈ ⟦P⟧.
(Race) If P = P1 ‖ P2, P1 rc P2, (σi, τi) ∈ ⟦Pi⟧, σ1 ⇑ σ2, and ⊤ ∉ {τ1, τ2},
then (σ1 ⊗RaceLocs(P1,P2) σ2, ⊤) ∈ ⟦P⟧.
(RaceP) If P1 ✁P P2 and (σ1, ⊤) ∈ ⟦P1⟧, then (σ1, ⊤) ∈ ⟦P⟧.
(RaceS) If P1 ✁P P2, (σi, τi) ∈ ⟦Pi⟧, [σ1 | τ1] ⇑ σ2, and τ2 = ⊤,
then (σ1 ⊔ (σ2 \ dom τ1), ⊤) ∈ ⟦P⟧.
The set of executions of a pomset P is E(P) = {(σ, [σ | τ]) | σ ∈ Σ, (σ′, τ) ∈
⟦P⟧, σ′ ⊑ σ, Im(σ) ⊆ V}, and the set of executions of a program p is E(p) =
⋃_{P ∈ P(p)} E(P). Executions capture running programs on “real” states, so we require
initial states to have a specific value for each location. We say P is executable from
σ if (σ, τ) ∈ E(P) for some τ, and P is racy if (σ, ⊤) ∈ ⟦P⟧ for some σ.
We explain each of the rules in turn. The (Seq) rule captures sequential exe-
cution of a prefix P1 before a suffix P2. The consistency condition [σ1 | τ1 ] ⇑ σ2
tells us that, if we start from a state satisfying σ1 and update it with the effects τ1
of performing P1, the resulting state doesn’t disable the execution of P2. The state
σ2 \ dom τ1 contributes the initial state required by P2 that isn’t provided by P1.
The (Par) rule tells us that whenever P1 and P2 cannot race and agree on their
initial states, then we can run them in parallel with resulting effect τ1⊔τ2. The effect
τ1 ⊔ τ2 is a well-defined proper state because the assumption P1 co P2 guarantees
that P1 and P2 do not write to the same location, i.e., that dom(τ1)∩ dom(τ2) = ∅.
The (RaceP) rule handles races in a pomset’s prefix. The rule tells us that if
σ1 is sufficient to reach a race in P1 and P1 is a prefix of P , then σ1 is sufficient
to reach a race in P . This captures the viewpoint that if we ever encounter a race,
we do not need to execute the rest of the program. The (RaceS) rule deals with a
race in a suffix of a pomset, and is analogous to the (Seq) rule.
The (Race) rule resembles the (Par) rule. The key difference is how we form the
initial state. Before considering motivating examples, we first unpack the definition
of the racy product σ = σ1 ⊗RaceLocs(P1,P2) σ2, assuming σ1 ⇑ σ2. If x ∉ dom(σ2),
then σ(x) = σ1(x), and vice-versa. Now consider x ∈ dom(σ1) ∩ dom(σ2). If
x ∈ RaceLocs(P1, P2), i.e., if P has a race on the non-atomic location x, then
σ(x) = σ1(x) ⊓ σ2(x), i.e., σ(x) = ⊥ if either of the σi(x) is ⊥. As we will see in
the example Qµ below, this captures a race where the action writing to x does not
depend on a prior read from x in order to be executable. If x ∉ RaceLocs(P1, P2),
then σ(x) = σ1(x) ⊔ σ2(x). For example, [x : 0, a : 1, n : 2, n′ : 3, n′′ : 4] ⊗{a,n,n′}
[a : 1, n : ⊥, n′ : 3, n′′ : ⊥] is [x : 0, a : 1, n : ⊥, n′ : 3, n′′ : 4].
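The state operations used by the rules, including the racy product, can be sketched for proper states as follows. This Python fragment is illustrative: states are dictionaries, the hypothetical sentinel `BOT` stands for ⊥, the overdefined state ⊤ is not modelled, and `join` assumes its arguments are consistent.

```python
BOT = None   # stands for ⊥

def consistent(s1, s2):
    """s1 ⇑ s2: wherever both states give a proper value, the values agree."""
    return all(s1[x] is BOT or s2[x] is BOT or s1[x] == s2[x]
               for x in s1.keys() & s2.keys())

def join(s1, s2):
    """s1 ⊔ s2, assuming consistent(s1, s2)."""
    out = dict(s1)
    for x, v in s2.items():
        if out.get(x, BOT) is BOT:
            out[x] = v
    return out

def meet(s1, s2):
    """s1 ⊓ s2: defined on the common domain, ⊥ where the values differ."""
    return {x: (s1[x] if s1[x] == s2[x] else BOT)
            for x in s1.keys() & s2.keys()}

def update(s, s2):
    """[s | s2]: s2 overrides s on dom(s2)."""
    return {**s, **s2}

def racy_product(s1, s2, race_locs):
    """[s1 ⊔ s2 | (s1 ⊓ s2) restricted to race_locs]."""
    restricted = {x: v for x, v in meet(s1, s2).items() if x in race_locs}
    return update(join(s1, s2), restricted)
```

Running `racy_product` on the two states of the example above, with the hypothetical keys `"n1"` and `"n2"` standing for n′ and n′′, reproduces the stated result: only n is weakened to ⊥.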
To begin with, consider the pomset Qµ = {x :=µ v ← x :=µ 0 → x=µ u}, and assume
first that µ = na. We argue that this pomset should be executable and racy
for all values of u and v: it has a non-atomic write of v to x that is not sequenced
with the non-atomic read of u from x, and so the values should not matter. Even
when u ∉ {0, v}, it could be that u is the value read from an intermediate hardware
state caused by the writing of x. Our semantics validates this desideratum: we can
apply (Race) to the unsequenced actions to get the footstep ([x : ⊥ ],⊤), and then
using (RaceS) we get that ([x : ⊥], ⊤) ∈ ⟦Qna⟧. Now assume that µ ≠ na. Then
Qµ is not na-racy: it has no na actions on which it can race. Moreover, Qµ has a
non-empty footprint only if u ∈ {0, v}, and ⟦Qµ⟧ = {([x : ⊥], [x : v])}.
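The derivations for Qµ can be replayed mechanically. The Python sketch below is illustrative and simplified: it implements the footstep-level part of the (Seq), (RaceP), (RaceS), (Par) and (Race) rules as two hypothetical combinators, and it assumes the caller has already checked the structural side conditions (prefix/suffix split, co, rc). States are dictionaries, `BOT` stands for ⊥ and `TOP` for ⊤.

```python
BOT, TOP = None, "TOP"   # ⊥ (unconstrained value) and ⊤ (overdefined state)

def consistent(s1, s2):
    return all(s1[x] is BOT or s2[x] is BOT or s1[x] == s2[x]
               for x in s1.keys() & s2.keys())

def join(s1, s2):
    out = dict(s1)
    for x, v in s2.items():
        if out.get(x, BOT) is BOT:
            out[x] = v
    return out

def seq_rule(step1, step2):
    """(Seq), (RaceP) and (RaceS): a prefix footstep followed by a suffix one."""
    (s1, t1), (s2, t2) = step1, step2
    if t1 == TOP:                                   # (RaceP)
        return s1, TOP
    if not consistent({**s1, **t1}, s2):            # [s1 | t1] ⇑ s2 fails
        return None
    needed = {x: v for x, v in s2.items() if x not in t1}
    effect = TOP if t2 == TOP else {**t1, **t2}     # (RaceS) or (Seq)
    return join(s1, needed), effect

def par_rule(step1, step2, race_locs=frozenset()):
    """(Par) when race_locs is empty, (Race) otherwise."""
    (s1, t1), (s2, t2) = step1, step2
    if TOP in (t1, t2) or not consistent(s1, s2):
        return None
    init = join(s1, s2)
    if not race_locs:
        return init, {**t1, **t2}
    for x in race_locs:
        if x in s1 and x in s2:                     # racy product on x
            init[x] = s1[x] if s1[x] == s2[x] else BOT
    return init, TOP
```

With the footsteps `({"x": BOT}, {"x": 0})` for the initial write, `({"x": BOT}, {"x": 7})` for the write of v = 7 and `({"x": 5}, {})` for the read of u = 5, applying `par_rule` with `race_locs={"x"}` and then `seq_rule` yields `({"x": BOT}, TOP)`, matching the non-atomic case. Without race locations, the same read is rejected and only reads of 0 or 7 can be sequenced after a suitable prefix, matching the atomic case.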
Next consider the pomset Rµ = {x :=µ v ← x=µ w ← x :=µ 0 → x=µ u}, and assume
first that µ = na. We argue that this pomset should be executable only when
w = 0: when w ≠ 0, there is no write supplying the value w required by the read,
and so the data race on x should not be able to manifest itself. However, when
w = 0, this pomset should be racy, because we have an unsequenced non-atomic
write and read from x. Our semantics captures these behaviours. For example,
⟦x :=µ v ← x=µ w⟧ = {([x : w], [x : v])} and ⟦x=µ u⟧ = {([x : u], [ ])}, and the
states [x : w] and [x : u] are consistent only if w = u. When this is the case,
we get ([x : w], ⊤) as the sole footstep for the parallel composition. But we can
apply the (RaceS) rule only when [x : ⊥ | x : 0] ⇑ [x : w], which holds only
when w = 0. Decomposing the pomset differently provides the same constraint. In
contrast, when µ ≠ na, Rµ is not racy and it is executable only when w = 0 and
u ∈ {0, v}.
Finally, consider the pomset S = {x :=na v → y :=rel 1        y=acq 1 → x=na u}. We
get the footsteps ([x : u, y : 1 ], [x : v, y : 1 ]) and ([x : ⊥, y : 1 ],⊤) for all u
and v. The first footstep corresponds to a sequential execution with all the reads
before the writes. The second footstep corresponds to y reading 1 from the initial
state, and then the program racing on x. When u = v, we also get the footstep
([x : ⊥, y : ⊥ ], [x : v, y : 1 ]), which describes an execution where all reads are
performed after all writes.
We show that our definition of execution validates various other litmus tests
in the way we expect. The sb (store buffering) test is the simplest example of
behaviour that is not sequentially consistent. When α ≠ sc, we can execute the
pomset {x :=α 1 → y=α 0    y :=α 1 → x=α 0} starting from the state [x : 0, y : 0 ] by
using the (Par) rule.
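To see concretely why the outcome in which both reads return 0 is not sequentially consistent, one can enumerate every interleaving of the two threads over a single shared memory. The following Python sketch is only an illustration and is not part of our semantics: the helper names (`interleavings`, `run`, `sc_outcomes`) and the register names `r1` and `r2` are assumptions introduced here.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's own order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        slots = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(total)]

def run(trace, initial):
    """Execute a trace against one shared memory and return the values read."""
    memory = dict(initial)
    registers = {}
    for kind, location, operand in trace:
        if kind == "write":
            memory[location] = operand
        else:  # "read": operand names the register that receives the value
            registers[operand] = memory[location]
    return registers

# Store buffering: each thread writes one location, then reads the other.
THREAD_1 = [("write", "x", 1), ("read", "y", "r1")]
THREAD_2 = [("write", "y", 1), ("read", "x", "r2")]

def sc_outcomes(initial):
    """All (r1, r2) pairs reachable by interleaving the two threads."""
    outcomes = set()
    for trace in interleavings(THREAD_1, THREAD_2):
        registers = run(trace, initial)
        outcomes.add((registers["r1"], registers["r2"]))
    return outcomes

SB_OUTCOMES = sc_outcomes({"x": 0, "y": 0})
```

Under these assumptions, the pair (0, 0) never appears in `SB_OUTCOMES`: in every interleaving at least one write precedes the read of the same location. This is exactly the outcome that the pomset execution above permits when α ≠ sc.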
We can also validate the iriw (independent reads of independent writes) test.
Indeed, we can execute the pomset
x :=rel 1    y=acq 1 → x=acq 0    y :=rel 1    x=acq 1 → y=acq 0
starting from the initial state [x : 0, y : 0 ] by splitting the pomset down the middle
and applying the (Par) rule. This shows that we are weaker than the TSO memory
model, because we do not impose a total order on all writes. Though we can read
writes to different locations in different orders, the co restriction on the (Par) rule
ensures that we have a per-location total order on writes. Then all threads see the
writes to the same location in the same order, i.e., we guarantee coherence.
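The claim that this outcome is incompatible with a total order on all writes can be checked by brute force. The Python sketch below is an illustration only, under a simplifying assumption that is not our formal definition: each reading thread sees memory as a prefix of one global write order, and that prefix can only grow as the thread advances. The names `memory_after`, `explains`, `READER_A` and `READER_B` are hypothetical.

```python
from itertools import combinations_with_replacement, permutations

INITIAL = {"x": 0, "y": 0}
WRITES = [("x", 1), ("y", 1)]

# Values observed by the two reading threads, in program order.
READER_A = [("y", 1), ("x", 0)]
READER_B = [("x", 1), ("y", 0)]

def memory_after(order, count):
    """Memory once the first `count` writes of `order` have taken effect."""
    memory = dict(INITIAL)
    for location, value in order[:count]:
        memory[location] = value
    return memory

def explains(order, reads):
    """Can one thread observe `reads` while the writes take effect in `order`?"""
    prefix_lengths = range(len(order) + 1)
    # combinations_with_replacement yields non-decreasing tuples, which models
    # a view of memory that never moves backwards between successive reads.
    for points in combinations_with_replacement(prefix_lengths, len(reads)):
        if all(memory_after(order, k)[loc] == val
               for k, (loc, val) in zip(points, reads)):
            return True
    return False

ORDERS = list(permutations(WRITES))
A_ALONE = any(explains(o, READER_A) for o in ORDERS)
B_ALONE = any(explains(o, READER_B) for o in ORDERS)
BOTH = any(explains(o, READER_A) and explains(o, READER_B) for o in ORDERS)
```

Each reader's observations can be explained on its own, but only by opposite orders of the two writes, so `BOTH` is false. Since each location is written exactly once in this test, the per-location order on writes is trivially respected.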
5 Related Work
Most work on formalizing weak memory models so far uses “execution graphs” or
“candidate executions”, in which nodes are labeled with actions and there are multiple
kinds of edge, usually characterized as po (program order), rf (reads-from),
mo (modification order), and so on. There is a substantial body of well-established
research in this vein, including [1,3,5,11,12]. In work aiming to formalize C/C++11
such as [4], axioms are imposed to rule out candidate executions involving undesirable
cycles of composite edges, primarily to avoid issues with so-called thin-air
reads, and to ensure that each read action is justified by a suitable write. It has
proven to be difficult to strike the right balance between ruling out bad cases and
still allowing intended behaviours.
Our memory model is heavily based on the RC11 (“Repaired C11”) model presented
by Lahav et al. [10]. The RC11 model repairs compilation to Power by
providing a better semantics for SC accesses. Like RC11, our model differs from
C11 by not treating races between atomic accesses as undefined behaviour. Our
approach to eliminating “thin-air” behaviour is similar to theirs: we prohibit violations
of data and control-flow dependency, which amounts to prohibiting cycles in
the execution (“hb”) order between reads and their corresponding writes.
In prior work [9], we give a semantics to the SPARC TSO weak memory model
using pomset denotations and executions, and buffered state. The semantics developed
in this paper is a significant step forward toward greater generality and wider
applicability. Rather than attempt to model out-of-order execution by explicitly
modelling buffers, both in pomset generation and in execution, we use the relaxed
composition operator and leverage pomset structure. To enforce a total order on
writes during execution, TSO executions were parametrized with linearizations of
writes. In contrast, C11 executions totally order sc actions via the co relation,
thereby reducing technical overhead.
Jeffrey and Riely [8] and Castellan [7] give a denotational semantics using event
structures and exploit game-theoretic ideas in formulating a notion of execution.
We prefer to work with a set of pomsets, obtained by taking account of possible
relaxations and executions, rather than using a single structure combining these
possibilities into one object; it seems less cumbersome to work with sets-of-pomsets.
6 Conclusion
Our extension to incorporate the wider range of C11-style memory orderings required
significant technical development in order to produce a denotational account
that is faithful to operational intuitions, and improves on the foundations laid by
our prior denotational account of TSO. A key new idea presented here is the relaxed
sequential composition for pomsets, a form of sequential composition that takes account
of the memory model’s support for reordering of memory actions. We are
careful to avoid reordering of actions for which there is a control-flow dependency,
arguing that this eliminates a major class of thin-air behaviours. We discussed a
selection of “litmus test” examples to show that our semantics and our notion of
pomset execution yield results consistent with the literature. We plan a more comprehensive
cataloguing of litmus tests to solidify this claim. Indeed, we expect to
prove a pomset analogue of the DRF-SC property, which provides programmers a
sufficient condition for ensuring their programs are not executionally racy.
References
[1] Batty, M., M. Dodds and A. Gotsman, Library abstraction for C/C++ concurrency, in: Proceedings
of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
POPL ’13 (2013), pp. 235–248.
URL http://doi.acm.org/10.1145/2429069.2429099
[2] Batty, M., A. F. Donaldson and J. Wickerson, Overhauling SC atomics in C11 and OpenCL, in:
Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, POPL ’16 (2016), pp. 634–648.
URL http://doi.acm.org/10.1145/2837614.2837637
[3] Batty, M., K. Memarian, S. Owens, S. Sarkar and P. Sewell, Clarifying and compiling C/C++
concurrency: From C++11 to POWER, in: Proceedings of the 39th Annual ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages, POPL ’12 (2012), pp. 509–520.
URL http://doi.acm.org/10.1145/2103656.2103717
[4] Batty, M., S. Owens, S. Sarkar, P. Sewell and T. Weber, Mathematizing C++ concurrency, in:
Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, POPL ’11 (2011), pp. 55–66.
URL http://doi.acm.org/10.1145/1926385.1926394
[5] Boehm, H.-J. and S. V. Adve, Foundations of the C++ concurrency memory model, in: Proceedings of
the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI
’08 (2008), pp. 68–78.
URL http://doi.acm.org/10.1145/1375581.1375591
[6] Brookes, S., Full abstraction for a shared-variable parallel language, Information and Computation 127
(1996), pp. 145–163.
URL http://dx.doi.org/10.1006/inco.1996.0056
[7] Castellan, S., Weak memory models using event structures, in: Vingt-septième Journées Francophones
des Langages Applicatifs (JFLA 2016), 2016.
URL https://hal.inria.fr/hal-01333582
[8] Jeffrey, A. and J. Riely, On thin air reads: towards an event structures model of relaxed memory, in:
Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’16 (2016),
pp. 759–767.
URL http://doi.acm.org/10.1145/2933575.2934536
[9] Kavanagh, R. and S. Brookes, A denotational semantics for SPARC TSO, in: Proceedings of the 33rd
Conference on the Mathematical Foundations of Programming Semantics (MFPS XXXIII), Electronic
Notes in Theoretical Computer Science, 2017, to appear.
URL http://coalg.org/mfps-calco2017/mfps-papers/14-kavanagh.pdf
[10] Lahav, O., V. Vafeiadis, J. Kang, C.-K. Hur and D. Dreyer, Repairing sequential consistency in
C/C++11, in: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design
and Implementation, PLDI 2017 (2017), pp. 618–632.
URL http://doi.acm.org/10.1145/3062341.3062352
[11] Sarkar, S., P. Sewell, F. Z. Nardelli, S. Owens, T. Ridge, T. Braibant, M. O. Myreen and J. Alglave, The
semantics of x86-cc multiprocessor machine code, in: Proceedings of the 36th Annual ACM SIGPLAN-
SIGACT Symposium on Principles of Programming Languages, POPL ’09 (2009), pp. 379–391.
URL http://doi.acm.org/10.1145/1480881.1480929
[12] Sewell, P., S. Sarkar, S. Owens, F. Z. Nardelli and M. O. Myreen, X86-TSO: A rigorous and usable
programmer’s model for x86 multiprocessors, Commun. ACM 53 (2010), pp. 89–97.
URL http://doi.acm.org/10.1145/1785414.1785443
[13] Vafeiadis, V., T. Balabonski, S. Chakraborty, R. Morisset and F. Zappa Nardelli, Common compiler
optimisations are invalid in the C11 memory model and what we can do about it, in: Proceedings
of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
POPL ’15 (2015), pp. 209–220.
URL http://doi.acm.org/10.1145/2676726.2676995
