Reconciling Event Structures with Modern Multiprocessors by Moiseenko, Evgenii et al.
Reconciling Event Structures with Modern
Multiprocessors
Evgenii Moiseenko
St. Petersburg University, Russia and JetBrains Research, Russia
e.moiseenko@2012.spbu.ru
Anton Podkopaev
National Research University Higher School of Economics, Russia and MPI-SWS, Germany and
JetBrains Research, Russia
podkopaev@mpi-sws.org
Ori Lahav
Tel Aviv University, Israel
orilahav@tau.ac.il
Orestis Melkonian
University of Edinburgh, UK
melkon.or@gmail.com
Viktor Vafeiadis
MPI-SWS, Germany
viktor@mpi-sws.org
Abstract
Weakestmo is a recently proposed memory consistency model that uses event structures to resolve
the infamous “out-of-thin-air” problem and to enable efficient compilation to hardware. Nevertheless,
this latter property—compilation correctness—has not yet been formally established.
This paper closes this gap by establishing correctness of the intended compilation schemes from
Weakestmo to a wide range of formal hardware memory models (x86, POWER, ARMv7, ARMv8) in
the Coq proof assistant. Our proof is the first that establishes correctness of compilation of an
event-structure-based model that forbids “out-of-thin-air” behaviors, as well as the first mechanized
compilation proof of a weak memory model supporting sequentially consistent accesses to such a
range of hardware platforms. Our compilation proof goes via the recent Intermediate Memory Model
(IMM), which we suitably extend with sequentially consistent accesses.
2012 ACM Subject Classification Theory of computation → Logic and verification; Software and
its engineering → Concurrent programming languages
Keywords and phrases Weak Memory Consistency, Event Structures, IMM, Weakestmo.
Digital Object Identifier 10.4230/LIPIcs.ECOOP.2020.5
1 Introduction
A major research problem in concurrency semantics is to develop a weak memory model that
allows load-to-store reordering (a.k.a. load buffering, LB) and compiler optimizations (e.g.,
elimination of fake dependencies), while forbidding “out-of-thin-air” behaviors [19, 11, 5, 14].
The problem can be illustrated with the following two programs, which access locations
x and y initialized to 0. The annotated outcome a = b = 1 ought to be allowed for LB-fake
because 1 + a ∗ 0 can be optimized to 1 and then the instructions of thread 1 executed out of
order. In contrast, it should be forbidden for LB-data, since no optimizations are applicable.
a := [x] // 1          ‖   b := [y] // 1
[y] := 1 + a ∗ 0       ‖   [x] := b            (LB-fake)

a := [x] // 1          ‖   b := [y] // 1
[y] := a               ‖   [x] := b            (LB-data)
© Evgenii Moiseenko, Anton Podkopaev, Ori Lahav, Orestis Melkonian, Viktor Vafeiadis;
licensed under Creative Commons License CC-BY
34th European Conference on Object-Oriented Programming (ECOOP 2020).
Editors: Robert Hirschfeld and Tobias Pape; Article No. 5; pp. 5:1–5:34
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
arXiv:1911.06567v2 [cs.PL] 28 May 2020
Among the proposed models that correctly distinguish between these two programs is the
recent Weakestmo model [6]. Weakestmo was developed in response to certain limitations of
earlier models, such as the “promising semantics” of Kang et al. [12], namely that (i) they
did not cover the whole range of C/C++ concurrency features and that (ii) they did not
support the intended compilation schemes to hardware.
Being flexible in its design, Weakestmo addresses the former point. It supports all usual
features of the C/C++11 model [3] and can easily be adapted to support any new concurrency
features that may be added in the future. It does not, however, fully address the latter
point. Due to the difficulty of establishing correctness of the intended compilation schemes
to hardware architectures that permit load-store reordering (i.e., POWER, ARMv7, ARMv8),
Chakraborty and Vafeiadis [6] only establish correctness of suboptimal schemes that add
(unnecessary) explicit fences to prevent load-store reordering.
In this paper, we address this major limitation of the Weakestmo paper. We establish in
Coq correctness of the intended compilation schemes to a wide range of hardware architectures
that includes the major ones: x86-TSO [18], POWER [1], ARMv7 [1], ARMv8 [22]. The compilation
schemes, whose correctness we prove, do not require any fences or fake dependencies
for relaxed accesses. Because of a technical limitation of our setup (see §6), however, compilation
of read-modify-write (RMW) accesses to ARMv8 uses a load-reserve/store-conditional
loop (similar to that of ARMv7 and POWER) as opposed to the newly introduced ARMv8
instructions for certain kinds of RMWs.
The main challenge in this proof is to reconcile the different ways in which hardware
models and Weakestmo allow load-store reordering. Unlike most models at the programming
language level, hardware models (such as ARMv8) do not execute instructions in sequence;
they instead keep track of dependencies between instructions and ensure that no dependency
cycles ever arise in a single execution. In contrast, Weakestmo executes instructions in order,
but simultaneously considers multiple executions to justify an execution where a load reads
a value that indirectly depends upon a later store. Technically, these multiple executions
together form an event structure, upon which Weakestmo places various constraints.
Figure 1 Results proved in this paper. [Diagram: nodes C11, Weakestmo, IMMSC, x86-TSO, POWER, ARMv7, and ARMv8, connected by compilation arrows.]
The high-level proof structure is shown in
Fig. 1. We reuse IMM, an intermediate memory
model, introduced by Podkopaev et al. [20] as
an abstraction over all major existing hardware
memory models. To support Weakestmo compilation,
we extend IMM with sequentially consistent
(SC) accesses following the RC11 model [14]. As
IMM is very much a hardware-like model (e.g., it
tracks dependencies), the main result is compilation from Weakestmo to IMM (indicated by
the bold arrow). The other arrows in the figure are extensions of previous results to account
for SC accesses, while double arrows indicate results for two compilation schemes.
The complexity of the proof is also evident from the size of the Coq development. We
have written about 30K lines of Coq definitions and proof scripts on top of an existing
infrastructure of about another 20K lines (defining IMM, the aforementioned hardware models
and many lemmas about them). As part of developing the proof, we also had to mechanize
the Weakestmo definition in Coq and to fix some minor deficiencies in the original definition,
which were revealed by our proof effort.
To the best of our knowledge, our proof is the first proof of correctness of compilation of
an event-structure-based memory model. It is also the first mechanized compilation proof
of a weak memory model supporting sequentially consistent accesses to such a range of
hardware architectures. The latter, although fairly straightforward in our case, has had a
history of wrong compilation correctness arguments (see [14] for details).
Figure 2 Executions of LB and LB-data/LB-fake with outcome a = b = 1. Both panels show the
events Init, R(x, 1), W(y, 1), R(y, 1), W(x, 1) with po and rf edges: (a) GLB, the execution graph
of LB, with a ppo edge in thread 2 only; (b) the execution of LB-data and LB-fake, with ppo
edges in both threads.
Outline We start with an informal overview of IMM, Weakestmo, and our compilation proof
(§2). We then present a fragment of Weakestmo formally (§3) and its compilation proof (§4).
Subsequently, we extend these results to cover SC accesses (§5), discuss related work (§6)
and conclude (§7). The associated proof scripts and supplementary material for our paper
are publicly available at http://plv.mpi-sws.org/weakestmoToImm/.
2 Overview of the Compilation Correctness Proof
To get an idea about the IMM and Weakestmo memory models, consider a version of the
LB-fake and LB-data programs from §1 with no dependency in thread 1:
a := [x] // 1          ‖   b := [y] // 1
[y] := 1               ‖   [x] := b            (LB)
As we will see, the annotated outcome is allowed by both IMM and Weakestmo, albeit in
different ways. The different treatment of load-store reordering affects the outcomes of other
programs. For example, IMM forbids the annotated outcome of LB-fake by treating it exactly
as LB-data, whereas Weakestmo allows the outcome by treating LB-fake exactly as LB.
2.1 An Informal Introduction to IMM
IMM is a declarative (also called axiomatic) model identifying a program’s semantics with a
set of execution graphs, or just executions. As an example, Fig. 2a contains GLB, an IMM
execution graph of LB corresponding to an execution yielding the annotated behavior.
Vertices of execution graphs, called events, represent memory accesses either due to the
initialization of memory or to the execution of program instructions. Each event is labeled
with the type of the access (e.g., R for reads, W for writes), the location accessed, and the
value read or written. Memory initialization consists of a set of events labeled W(x, 0) for
each location x used in the program; for conciseness, however, we depict the initialization
events as a single event with label Init.
Edges of execution graphs represent different relations on events. In Fig. 2, three different
relations are depicted. The program order relation (po) totally orders events originated from
the same thread according to their order in the program, as well as the initialization event(s)
before all other events. The reads-from relation (rf) relates a write event to the read events
that read from it. Finally, the preserved program order (ppo) is a subset of the program
order relating events that cannot be executed out of order. Such ppo edges arise whenever
there is a dependency chain between the corresponding instructions (e.g., a write storing the
value read by a prior read).
Because of the syntactic nature of ppo, IMM conflates the executions of LB-data and
LB-fake leading to the outcome a = b = 1 (see Fig. 2b). This choice is in line with hardware
memory models; it means, however, that IMM is not suitable as a memory model for a
programming language (because, as argued in §1, LB-fake can be transformed to LB by an
optimizing compiler).
The executions of a program are constructed in two steps.1 First, a thread-local semantics
determines the sequential executions of each thread, where the values returned by each
read access are chosen non-deterministically (among the set of all possible values), and the
executions of different threads are combined into a single execution. Then, the execution
graphs are filtered by a consistency predicate, which determines which executions are allowed
(i.e., are IMM-consistent). These IMM-consistent executions form the program’s semantics.
IMM-consistency checks three basic constraints:
Completeness: Every read event reads from precisely one write with the same location and
value;
Coherence: For each location x, there is a total ordering of x-related events extending the
program order so that each read of x reads from the most recent prior write according to
that total order; and
Acyclic dependency: There is no cycle consisting only of ppo and rf edges.
The final constraint disallows executions in which an event recursively depends upon itself,
as this pattern can lead to “out-of-thin-air” outcomes. Specifically, the execution in Fig. 2b,
which represents the annotated behavior of LB-fake and LB-data, is not IMM-consistent
because of the (ppo ∪ rf)-cycle. In contrast, GLB is IMM-consistent.
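The acyclic-dependency check on the two executions of Fig. 2 can be illustrated with a small sketch. The Python snippet below is only an illustration and not part of the Coq development; the event names (r_x, w_y, r_y, w_x) and the hand-written edge sets are assumptions made for this example. It tests whether ppo ∪ rf contains a cycle.

```python
def has_cycle(edges):
    """Return True if the directed graph given as a set of (src, dst) pairs has a cycle."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(n):
        # Depth-first search; reaching a GREY node again means a back edge, i.e. a cycle.
        color[n] = GREY
        for m in succ.get(n, ()):
            c = color.get(m, WHITE)
            if c == GREY or (c == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(succ))


# Hypothetical event names: r_x = R(x, 1), w_y = W(y, 1), r_y = R(y, 1), w_x = W(x, 1).
rf = {("w_y", "r_y"), ("w_x", "r_x")}
ppo_lb = {("r_y", "w_x")}                       # Fig. 2a: dependency in thread 2 only
ppo_lb_data = {("r_x", "w_y"), ("r_y", "w_x")}  # Fig. 2b: dependencies in both threads

lb_has_cycle = has_cycle(ppo_lb | rf)
lb_data_has_cycle = has_cycle(ppo_lb_data | rf)
```

Under these assumptions, only the second edge set yields a cycle, which matches the claim that GLB is IMM-consistent while the execution of Fig. 2b is not. Completeness and coherence are not checked here.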
2.2 An Informal Introduction to Weakestmo
We move on to Weakestmo, which also defines the program’s semantics as a set of execution
graphs. However, they are constructed differently—extracted from a final event structure,
which Weakestmo incrementally builds for a program.
An event structure represents multiple executions of a program in a single graph. Like
execution graphs, event structures contain a set of events and several relations among them.
Like execution graphs, the program order (po) orders events according to each thread’s
control flow. However, unlike execution graphs, po is not necessarily total among the events
of a given thread. Events of the same thread that are not po-ordered are said to be in conflict
(cf) with one another, and cannot belong to the same execution. Such conflict events arise
when two read events originate from the same read instruction (e.g., representing executions
where the reads return different values). Moreover, cf “extends downwards”: events that
depend upon conflicting events (i.e., have conflicting po-predecessors) are also in conflict
with one another. In pictures, we typically show only the immediate conflict edges (between
reads originating from the same instruction) and omit the conflict edges between events
po-after immediately conflicting ones.
Event structures are constructed incrementally starting from an event structure consisting
only of the initialization events. Then, events corresponding to the execution of program
instructions are added one at a time. We start by executing the first instruction of a
program’s thread. Then, we may execute the second instruction of the same thread or the
first instruction of another thread, and so on.
1 For a detailed formal description of the graphs and their construction process we refer the reader to [20,
§2.2].
Figure 3 A run of Weakestmo witnessing the annotated outcome of LB. Each panel extends the previous one (edges shown in the figure are labeled jf, cf, and ew):
(a) Sa: Init, e111 : R(x, 0).
(b) Sb with execution Xb selected: adds e121 : W(y, 1).
(c) Sc: adds e21 : R(y, 1).
(d) Sd with execution Xd selected: adds e22 : W(x, 1).
(e) Se: adds e112 : R(x, 1), in conflict (cf) with e111.
(f) Sf with execution Xf selected: adds e122 : W(y, 1), an equal write (ew) of e121.
As an example, Fig. 3 constructs an event structure for LB. Fig. 3a depicts the event
structure Sa obtained from the initial event structure by executing a := [x] in LB’s thread 1.
As a result of the instruction execution, a read event e111 : R(x, 0) is added.
Whenever the event added is a read, Weakestmo has to justify the returned value from an
appropriate write event. In this case, there is only one write to x—the initialization write—
and so Sa has a justified from edge, denoted jf, going to e111 in Sa. This is a requirement of
Weakestmo: each read event in an event structure has to be justified from exactly one write
event with the same value and location. (This requirement is analogous to the completeness
requirement in IMM-consistency for execution graphs.) Since events are added in program
order and read events are always justified from existing events in the event structure, po∪ jf
is guaranteed to be acyclic by construction.
The next three steps (Figures 3b to 3d) simply add a new event to the event structure.
Notice that unlike IMM executions, Weakestmo event structures do not track syntactic
dependencies, e.g., Sd in Fig. 3d does not contain a ppo edge between e21 and e22. This is
precisely what allows Weakestmo to assign the same behavior to LB and LB-fake: they
have exactly the same event structures. As a programming-language-level memory model,
Weakestmo supports optimizations removing fake dependencies.
The next step (Fig. 3e) is more interesting because it showcases the key distinction
between event structures and execution graphs, namely that event structures may contain
more than one execution for each thread. Specifically, the transition from Sd to Se reruns
the first instruction of thread 1 and adds a new event e112 justified from a different write
event. We say that this new event conflicts (cf) with e111 because they cannot both occur
in a single execution. Because of conflicts, po in event structures does not totally order all
events of a thread; e.g., e111 and e112 are not po-ordered in Se. Two events of the same thread
are in conflict precisely when they are not po-ordered.
Figure 4 Traversal configurations for GLB. Panels (a) TCa to (f) TCf each show the events Init,
e11 : R(x, 1), e12 : W(y, 1), e21 : R(y, 1), e22 : W(x, 1) with ppo and rf edges, and differ in
which events are marked as issued and covered.
The final construction step (Fig. 3f) demonstrates another Weakestmo feature. Conflicting
write events writing the same value to the same location (e.g., e121 and e122 in Sf) may be
declared equal writes, i.e., connected by an equivalence relation ew.2
The ew relation is used to define Weakestmo’s version of the reads-from relation, rf,
which relates a read to all (non-conflicted) writes equal to the write justifying the read. For
example, e21 reads from both e121 and e122.
Weakestmo’s rf relation is used for extracting program executions. An execution
graph G is extracted from an event structure S, denoted S ▷ G, if G is a maximal conflict-free
subset of S, it contains only visible events (to be defined in §3), and every read event in G
reads from some write in G according to S.rf. Two execution graphs can be extracted from
Sf: {Init, e111, e121, e21, e22} and {Init, e112, e122, e21, e22}, representing the outcomes a = 0 ∧ b = 1
and a = b = 1, respectively.
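As a sanity check of the extraction condition, the following Python sketch enumerates the subsets of Sf by brute force. It is an illustration under simplifying assumptions, not the paper's definition: the visibility requirement is ignored, the conflict pairs and ew classes are written out by hand from Fig. 3f, and rf is approximated as "any write equal to the justifying write".

```python
from itertools import combinations

events = ["Init", "e111", "e121", "e112", "e122", "e21", "e22"]
reads = {"e111", "e112", "e21"}

# Conflicts between the two branches of thread 1 (hand-written, symmetric).
cf_pairs = {frozenset(p) for p in [("e111", "e112"), ("e111", "e122"),
                                   ("e121", "e112"), ("e121", "e122")]}
jf = {"e111": "Init", "e112": "e22", "e21": "e121"}  # read -> justifying write
ew_classes = [{"e121", "e122"}]                      # non-trivial equal-writes classes


def equal_writes(w):
    """All writes equal to w (ew is reflexive, so w itself is included)."""
    for cls in ew_classes:
        if w in cls:
            return cls
    return {w}


def conflict_free(xs):
    return all(frozenset((a, b)) not in cf_pairs for a, b in combinations(xs, 2))


def extracted_executions():
    result = []
    for k in range(len(events) + 1):
        for xs in map(set, combinations(events, k)):
            if not conflict_free(xs):
                continue
            # Maximality: no further event can be added without creating a conflict.
            if any(conflict_free(xs | {e}) for e in events if e not in xs):
                continue
            # Every read must read from some write inside the subset.
            if all(equal_writes(jf[r]) & xs for r in reads & xs):
                result.append(xs)
    return result


executions = extracted_executions()
```

Under these assumptions the enumeration yields exactly the two event sets listed above. Note that e21 is accepted in the second set only because e122 is an equal write of its justifying write e121.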
2.3 Weakestmo to IMM Compilation: High-Level Proof Structure
In this paper, we assume that Weakestmo is defined for the same assembly language as IMM
(see [20, Fig. 2]) extended with SC accesses and refer to this language as L. Having that, we
show the correctness of the identity mapping as a compilation scheme from Weakestmo to
IMM in the following theorem.
Theorem 1. Let prog be a program in L, and G be an IMM-consistent execution graph of
prog. Then there exists an event structure S of prog under Weakestmo such that S ▷ G.
To prove the theorem, we must show that Weakestmo may construct the needed event
structure in a step by step fashion. If the IMM-consistent execution graph G contains no
po ∪ rf cycles, then the construction is completely straightforward: G itself is a Weakestmo-
consistent event structure (setting jf to be just rf), and its events can be added in any
order extending po ∪ rf.
The construction becomes tricky for IMM-consistent execution graphs, such as GLB, that
contain po ∪ rf cycles. Due to the cycle(s), G cannot be directly constructed as a (conflict-free)
Weakestmo event structure. We must instead construct a larger event structure S containing
multiple executions, one of which will be the desired graph G. Roughly, for each po ∪ rf
cycle in G, we have to construct an immediate conflict in the event structure.
2 In this paper, we take ew to be reflexive, whereas it is irreflexive in Chakraborty and Vafeiadis [6].
Our ew is the reflexive closure of the one in [6].
To generate the event structure S, we rely on a basic property of IMM-consistent execution
graphs shown by Podkopaev et al. [20, §§6,7], namely that an execution graph can be traversed
in a certain order, i.e., its events can be issued and covered in that order, so that in the
end all events are covered. The traversal captures a possible execution order of the program
that yields the given execution. In that execution order, events are not added according to
program order, but rather according to preserved program order (ppo) in two steps. Events
are first issued when all their dependencies have been resolved, and are later covered when
all their po-prior events have been covered.
In more detail, a traversal of an IMM-consistent execution graph G is a sequence of
traversal steps between traversal configurations. A traversal configuration TC of an execution
graph G is a pair of sets of events, 〈C, I〉, called the covered and issued set respectively. As
an example, Fig. 4 presents all six traversal configurations of the execution graph GLB of LB
from Fig. 2a except for the initial configuration. In each panel, the issued set and the
covered set are marked with distinct symbols.
A traversal might be seen as an execution of an abstract machine that can execute write
instructions early but has to execute everything else in order. The first option corresponds
to issuing a write event, and the second option to covering an event. The traversal strategy
has certain constraints. To issue a write event, all external reads that it depends upon
must be resolved; i.e., they must read from already issued events. To cover an event, all
its po-predecessors must also be covered.3 For example, in Fig. 4, a traversal cannot issue
e22 : W(x, 1) before issuing e12 : W(y, 1) nor cover e11 : R(x, 1) before issuing e22 : W(x, 1).
According to Podkopaev et al. [20, Prop. 6.5], every IMM-consistent execution graph G
has a full traversal of the following form:
G ⊢ TCinit(G) −→ TC1 −→ TC2 −→ ... −→ TCfinal(G)
where the initial configuration, TCinit(G) ≜ 〈G.Init, G.Init〉, has issued and covered only G’s
initial events and the final configuration, TCfinal(G) ≜ 〈G.E, G.W〉, has covered all G’s events
and issued all its write events.
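The traversal constraints described above can be sketched as a small checker for GLB. The Python code below is a simplified illustration, not the traversal definition of Podkopaev et al. [20]: it assumes that the initialization events are already covered and issued, that covering a write requires it to be issued, and that covering a read requires the write it reads from to be issued. The order of the last three steps in `fig4` is one possible order consistent with the text.

```python
writes = {"e12", "e22"}
po_pred = {"e11": set(), "e12": {"e11"}, "e21": set(), "e22": {"e21"}}
rf_src = {"e11": "e22", "e21": "e12"}    # read -> write it reads from in GLB
deps = {"e12": set(), "e22": {"e21"}}    # reads each write depends on (ppo)


def can_issue(e, covered, issued):
    # A write may be issued once every read it depends on reads from an issued write.
    return e in writes and e not in issued and all(rf_src[r] in issued for r in deps[e])


def can_cover(e, covered, issued):
    # All po-predecessors must be covered first (assumed extra conditions below).
    if e in covered or not po_pred[e] <= covered:
        return False
    if e in writes:
        return e in issued
    return rf_src[e] in issued


def replay(steps):
    """Check that `steps` is a full traversal starting from the initial configuration."""
    covered, issued = set(), set()
    for kind, e in steps:
        ok = can_issue(e, covered, issued) if kind == "issue" else can_cover(e, covered, issued)
        if not ok:
            return False
        (issued if kind == "issue" else covered).add(e)
    return covered == set(po_pred) and issued == writes


fig4 = [("issue", "e12"), ("issue", "e22"), ("cover", "e11"),
        ("cover", "e12"), ("cover", "e21"), ("cover", "e22")]
fig4_is_full_traversal = replay(fig4)
```

With these rules, issuing e22 first is rejected because e21 would read from the not yet issued e12, and covering e11 before issuing e22 is rejected as well, in line with the example given in the text.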
We construct the event structure S following a full traversal of G. We define a simulation
relation, I(prog, G, TC, S,X), between the program prog, the current traversal configuration
TC of execution G and the current event structure’s state 〈S,X〉, where X is a subset of
events corresponding to a particular execution graph extracted from the event structure S.
Our simulation proof is divided into the following three lemmas, which state that the
initial states are simulated, that simulation extends along traversal steps, and that the
simulation of final states means that G can be extracted from the generated event structure.
Lemma 2 (Simulation Start). Let prog be a program of L, and G be an IMM-consistent
execution graph of prog. Then I(prog, G, TCinit(G), Sinit(prog), Sinit(prog).E) holds.
Lemma 3 (Weak Simulation Step). If I(prog, G, TC, S, X) and G ⊢ TC −→ TC′ hold,
then there exist S′ and X′ such that I(prog, G, TC′, S′, X′) and S −→∗ S′ hold.
Lemma 4 (Simulation End). If I(prog, G, TCfinal(G), S, X) holds, then the execution graph
associated with X is isomorphic to G.
3 For readers familiar with PS [12], issuing a write event corresponds to promising a message, and covering
an event to normal execution of an instruction.
The proof of Theorem 1 then proceeds by induction on the length of the traversal
G ⊢ TCinit(G) −→∗ TCfinal(G). Lemma 2 serves as the base case, Lemma 3 is the induction
step simulating each traversal step with a number of event structure construction steps, and
Lemma 4 concludes the proof.
The proofs of Lemmas 2 and 4 are technical but fairly straightforward. (We define I in a
way that makes these lemmas immediate.) In contrast, Lemma 3 is much more difficult to
prove. As we will see, simulating a traversal step sometimes requires us to construct a new
branch in the event structure, i.e., to add multiple events (see §4.3).
2.4 Weakestmo to IMM Compilation Correctness by Example
Before presenting any formal definitions, we conclude this overview section by showcasing
the construction used in the proof of Lemma 3 on execution graph GLB in Fig. 2a following
the traversal of Fig. 4. We have actually already seen the sequence of event structures
constructed in Fig. 3. Note that, even though Figures 3 and 4 have the same number of
steps, there is no one-to-one correspondence between them as we explain below.
Consider the last event structure Sf from Fig. 3. A subset of its events, Xf (the execution selected in Fig. 3f),
which we call a simulated execution, is a maximal conflict-free subset of Sf and all read events
in Xf read from some write in Xf (i.e., are justified from a write deemed “equal” to some
write in Xf). Then, by definition, Xf is extracted from Sf. Also, the execution graph induced
by Xf is isomorphic to GLB. That is, the construction of Sf for LB shows that Weakestmo can
exhibit the same behavior as GLB. Now, we explain how we construct Sf and
choose Xf.
During the simulation, we maintain the relation I(prog, G, TC, S,X) connecting a program
prog, its execution graph G, its traversal configuration TC, an event structure S, and a
subset of its events X. Among other properties (presented in §4.2), the relation states that all
issued and covered events of TC have exact counterparts in X, and that X can be extracted
from S.
The initial event structure and XInit consist of only initial events. Then, following issuing
of event e12 : W(y, 1) in TCa (see Fig. 4a), we need to add a branch to the event structure that
has W(y, 1) in it. Since Weakestmo requires adding events according to program order, we
first need to add a read event corresponding to ‘a := [x]’ of LB’s thread 1. Each read event
in an event structure has to be justified from somewhere. In this case, the only write event to
location x is the initial one. That is, the added read event e111 is justified from it (see Fig. 3a).
In the general case, when there is more than one option, we choose a ‘safe’ write event
to justify the added read event, i.e., one that the corresponding branch is already
‘aware’ of and that, when used as a justification, does not break consistency of the event
structure. After that, a write event e121 : W(y, 1) can be added po-after e111 (see Fig. 3b), and
I(LB, GLB, TCa, Sb, Xb) holds for Xb = {Init, e111, e121}.
Next, we need to simulate the second traversal step (see Fig. 4b), which issues W(x, 1). As
with the previous step, we first need to add a read event related to the first read instruction
of LB’s thread 2 (see Fig. 3c). However, unlike the previous step, the added event e21 has to
get value 1, since there is a dependency between instructions in thread 2. As we mentioned
earlier, the traversal strategy guarantees that e12 : W(y, 1) is issued at the moment of issuing
e22 : W(x, 1), so there is the corresponding event in the event structure to justify the read
event e21 from. Now, the write event e22 : W(x, 1) representing e22 can be added to the event
structure (see Fig. 3d) and I(LB, GLB, TCb, Sd, Xd) holds for Xd = {Init, e111, e121, e21, e22}.
In the third traversal step (see Fig. 4c), the read event e11 : R(x, 1) is covered. To have
a representative event for e11 in the event structure, we add e112 (see Fig. 3e). It is justified
from e22, which writes the needed value 1. Also, e112 represents an alternative execution (with respect to e111)
of the first instruction of thread 1, so the two events are in conflict.
However, we cannot choose a simulated execution X related to TCc and Se by the
simulation relation since X has to contain e112 and a representative for e12 : W(y, 1) (in Se it is
represented by e121) while being conflict-free. Thus, the event structure has to make one other
step (see Fig. 3f) and add the new event e122 to represent e12 : W(y, 1). Now, the simulated
execution contains everything needed, Xf = {Init, e112, e122, e21, e22}.
Since Xf has to be extracted from Sf , every read event in X has to be connected via an
rf edge to an event in X.4 To preserve the requirement, we connect the newly added event
e122 and e121 via an ew edge, i.e., marking them to be equal writes.5 This induces an rf edge
between e122 and e21. That is, I(LB, GLB, TCc, Sf , Xf) holds.
To simulate the remaining traversal steps (Figures 4d to 4f), we do not need to modify
Sf because it already contains counterparts for the newly covered events and, moreover, the
execution graph associated with Xf is isomorphic to GLB. That is, we just need to show that
I(LB, GLB, TCd, Sf , Xf), I(LB, GLB, TCe, Sf , Xf), and I(LB, GLB, TCf , Sf , Xf) hold.
3 Formal Definition of Weakestmo
In this section, we introduce the notation used in the rest of the paper and define the
Weakestmo memory model. For simplicity, we present only a minimal fragment of Weakestmo
containing only relaxed reads and writes. For the definition of the full Weakestmo model, we
refer the readers to Chakraborty and Vafeiadis [6] and to our Coq development [17].
Notation Given relations R1 and R2, we write R1 ; R2 for their sequential composition.
Given a relation R, we write R?, R+ and R∗ to denote its reflexive, transitive and reflexive-transitive
closures. We write id to denote the identity relation (i.e., id ≜ {〈x, x〉}). For a set
A, we write [A] to denote the identity relation restricted to A (that is, [A] ≜ {〈a, a〉 | a ∈ A}).
Hence, for instance, we may write [A] ; R ; [B] instead of R ∩ (A × B). We also write [e] to
denote [{e}] if e is not a set.
Given a function f : A → B, we denote by =f the set of f-equivalent elements
(=f ≜ {〈a, b〉 ∈ A × A | f(a) = f(b)}). In addition, given a relation R, we denote by R|=f the
restriction of R to f-equivalent elements (R|=f ≜ R ∩ =f), and by R|≠f the restriction of
R to non-f-equivalent elements (R|≠f ≜ R \ =f).
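To make the notation concrete, here is a minimal Python sketch that represents relations as sets of pairs. The function names (compose, ident, transitive_closure, restrict_eq) and the sample sets are assumptions chosen for this illustration; they do not come from the paper's Coq development.

```python
def compose(r1, r2):
    """Sequential composition R1 ; R2 of two relations given as sets of pairs."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def ident(s):
    """[A]: the identity relation restricted to the set A."""
    return {(a, a) for a in s}


def transitive_closure(r):
    """R+: the least transitive relation containing R."""
    closure = set(r)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra


def restrict_eq(r, f):
    """R|=f: the pairs of R whose components have equal images under f."""
    return {(a, b) for (a, b) in r if f(a) == f(b)}


A, B = {1, 2}, {3}
R = {(1, 3), (2, 4), (5, 3)}

# [A] ; R ; [B] coincides with R ∩ (A × B).
lhs = compose(compose(ident(A), R), ident(B))
rhs = {(a, b) for (a, b) in R if a in A and b in B}
```

The reflexive closure R? would be `r | ident(domain)` for the carrier set in question, and R|≠f is simply `r - restrict_eq(r, f)`.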
3.1 Events, Threads and Labels
Events, e ∈ Event, and thread identifiers, t ∈ Tid, are represented by natural numbers. We
treat the thread with identifier 0 as the initialization thread. We let x ∈ Loc range over
locations, and v ∈ Val over values.
A label, l ∈ Lab, takes one of the following forms:
R(x, v) — a read of value v from location x.
W(x, v) — a write of value v to location x.
4 Actually, it is easy to show that there could be only one such event since equal writes are in conflict
and X is conflict-free.
5 Note that we could have left e122 without any outgoing ew edges since the choice of equal writes for
newly added events in Weakestmo is non-deterministic. However, that would not preserve the simulation
relation.
Given a label l, the functions typ, loc, val return (when applicable) its type (i.e., R or W),
location and value, respectively. When a specific function assigning labels to events is
clear from the context, we abuse notation and write R and W to denote the sets of all events labelled
with the corresponding type. We also use subscripts to further restrict this set to a specific
location (e.g., Wx denotes the set of write events operating on location x).
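One possible encoding of labels and of the derived event sets is sketched below in Python. The `Label` class, the `events_of` helper, and the sample labeling are hypothetical and serve only to illustrate how R, W, and Wx are read off a labeling function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    typ: str  # "R" for a read, "W" for a write
    loc: str  # accessed location
    val: int  # value read or written


# A sample labeling function, given as a dictionary from events to labels.
lab = {
    "e111": Label("R", "x", 0),
    "e121": Label("W", "y", 1),
    "e22": Label("W", "x", 1),
}


def events_of(lab, typ, loc=None):
    """Events labelled with the given type, optionally restricted to one location."""
    return {e for e, l in lab.items() if l.typ == typ and (loc is None or l.loc == loc)}


W = events_of(lab, "W")
W_x = events_of(lab, "W", "x")
```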
3.2 Event Structures
An event structure S is a tuple 〈E, tid, lab, po, jf, ew, co〉 where:
E is a set of events, i.e., E ⊆ Event.
tid : E → Tid is a function assigning a thread identifier to every event. We treat events
with the thread identifier equal to 0 as initialization events and denote them as Init, that
is, Init ≜ {e ∈ E | tid(e) = 0}.
lab : E→ Lab is a function assigning a label to every event in E.
po ⊆ E × E is a strict partial order on events, called program order, that tracks their
precedence in the control flow of the program. Initialization events are po-before all other
events, whereas non-initialization events can only be po-before events from the same
thread.
Not all events of a thread are necessarily ordered by po. We call such po-unordered
non-initialization events of the same thread conflicting events. The corresponding binary
relation cf is defined as follows:
cf ≜ ([E \ Init] ; =tid ; [E \ Init]) \ (po ∪ po⁻¹)?
jf ⊆ [E ∩ W] ; (=loc ∩ =val) ; [E ∩ R] is the justified from relation, which relates a write event
to the reads it justifies. We require that reads are not justified by conflicting writes (i.e.,
jf ∩ cf = ∅) and that jf⁻¹ be functional (i.e., whenever 〈w1, r〉, 〈w2, r〉 ∈ jf, then w1 = w2).
We also define the notion of external justification: jfe ≜ jf \ po. A read event is
externally justified from a write if the write is not po-before the read.
ew ⊆ [E ∩ W] ; (cf ∩ =loc ∩ =val)? ; [E ∩ W] is an equivalence relation called the equal-writes
relation. Equal writes have the same location and value, and (unless identical) are in
conflict with one another.
co ⊆ [E ∩ W] ; (=loc \ ew) ; [E ∩ W] is the coherence order, a strict partial order that relates
non-equal write events with the same location. We require that coherence be closed with
respect to equal writes (i.e., ew ; co ; ew ⊆ co) and total with respect to ew on writes to
the same location:
∀x ∈ Loc. ∀w1, w2 ∈ Wx. 〈w1, w2〉 ∈ ew ∪ co ∪ co⁻¹
Given an event structure S, we use “dot notation” to refer to its components (e.g.,
S.E, S.po). For a set A of events, we write S.A for the set A ∩ S.E (for instance, S.Wx =
{e ∈ S.E | typ(S.lab(e)) = W ∧ loc(S.lab(e)) = x}). Further, for e ∈ S.E, we write S.typ(e)
to retrieve typ(S.lab(e)). Similar notation is used for the functions loc and val. Given a
set of thread identifiers T , we write S.thread(T ) to denote the set of events belonging to one
of the threads in T , i.e., S.thread(T ) ≜ {e ∈ S.E | S.tid(e) ∈ T}. When T = {t}
is a singleton, we often write S.thread(t) instead of S.thread({t}).
We define the immediate po and cf edges of an event structure as follows:
S.poimm ≜ S.po \ (S.po ; S.po)
S.cfimm ≜ S.cf ∩ (S.poimm⁻¹ ; S.poimm)
An event e1 is an immediate po-predecessor of e2 if e1 is po-before e2 and there is no event
po-between them. Two conflicting events are immediately conflicting if they have the same
immediate po-predecessor.6
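The definitions of cf, poimm, and cfimm can be replayed on a small example. The following is a minimal sketch, assuming a hypothetical event structure in which thread 1 has two conflicting three-event branches; the event names and the encoding of relations as Python sets of pairs are illustrative choices, not part of the formal development.

```python
# Toy event structure: after Init, thread 1 has two conflicting branches.
# Event names and the thread layout are illustrative assumptions.
tid = {"Init": 0, "e111": 1, "e121": 1, "e131": 1,
       "e112": 1, "e122": 1, "e132": 1}
events = set(tid)

def chain(*es):
    """All ordered pairs along a po chain (already transitively closed)."""
    return {(es[i], es[j]) for i in range(len(es)) for j in range(i + 1, len(es))}

def inverse(r):
    return {(b, a) for (a, b) in r}

def compose(r, s):
    """Relational composition r ; s."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

po = chain("Init", "e111", "e121", "e131") | chain("Init", "e112", "e122", "e132")

# cf = ([E \ Init] ; =tid ; [E \ Init]) \ (po ∪ po^-1)?
non_init = {e for e in events if tid[e] != 0}
same_thread = {(a, b) for a in non_init for b in non_init if tid[a] == tid[b]}
cf = same_thread - (po | inverse(po) | {(e, e) for e in events})

# po_imm = po \ (po ; po)   and   cf_imm = cf ∩ (po_imm^-1 ; po_imm)
po_imm = po - compose(po, po)
cf_imm = cf & compose(inverse(po_imm), po_imm)

print(sorted(cf_imm))
```

In this sketch every event of one branch conflicts with every event of the other, but only the two branch heads, which share Init as their immediate po-predecessor, are in immediate conflict.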
3.3 Event Structure Construction
Given a program prog, we construct its event structures operationally in a way that guarantees
completeness (i.e., that every read is justified from some write) and po ∪ jf acyclicity. We
start with an event structure containing only the initialization events and add one event at a
time following each thread’s semantics.
For the thread semantics, we assume reductions of the form σ −e→ σ′ between thread
states σ, σ′ ∈ ThreadState, labeled by the event e ∈ E generated by that execution
step. Given a thread t and a sequence of events e1, ... , en ∈ S.thread(t) in immediate po
succession (i.e., 〈ei, ei+1〉 ∈ S.poimm for 1 ≤ i < n) starting from a first event of thread t (i.e.,
dom(S.po ; [e1]) ⊆ Init), we can add an event e po-after that sequence of events provided that
there exist thread states σ1, ... , σn and σ′ such that
prog(t) −e1→ σ1 −e2→ σ2 ⋯ −en→ σn −e→ σ′,
where prog(t) is the initial thread state of thread t of the program prog. By construction,
this means that the newly added event e will be in conflict with all other events of thread t
besides e1, ... , en.
Further, when the new event e is a read event, it has to be justified from an existing
write event, so as to ensure completeness and prevent “out-of-thin-air” values. The write
event is picked non-deterministically from all non-conflicting writes with the same location
as the new read event. Similarly, when e is a write event, its position in co order should be
chosen. It can be done by either picking an ew equivalence class and including the new write
in it, or by putting the new write immediately after some existing write in co order. At each
step, we also check for event structure consistency (to be defined in Def. 5): If the event
structure obtained after the addition of the new event is inconsistent, it is discarded.
3.4 Event Structure Consistency
To define consistency, we first need a number of auxiliary definitions. The happens-before
order S.hb is a generalization of the program order. Besides the program order edges, it
includes certain synchronization edges (captured by the synchronizes with relation, S.sw).
S.hb ≜ (S.po ∪ S.sw)⁺
For the fragment covered in this section, there are no synchronization edges (i.e., sw = ∅),
and so hb and po coincide. In the full model,7 however, certain justification edges (e.g.,
between release/acquire accesses) contribute to sw and hence to hb.
The extended conflict relation S.ecf extends the notion of conflicting events to account
for hb; two events are in extended conflict if they happen after conflicting events.
S.ecf ≜ (S.hb⁻¹)? ; S.cf ; S.hb?
As already mentioned in §2, the reads-from relation, S.rf, of a Weakestmo event structure
is derived. It is defined as an extension of S.jf to all S.ew-equivalent writes.
S.rf ≜ (S.ew ; S.jf) \ S.cf
6 Our definition of immediate conflicts differs from that of [6] and is easier to work with. The two
definitions are equivalent if the set of initialization events is non-empty.
7 The full model is presented in [6] and also in our Coq development [17].
Note that unlike S.jf⁻¹, the relation S.rf⁻¹ is not functional. This does not cause any
problems, however, since all the writes from which a read reads have the same location and
value and are in conflict with one another.
The relation S.fr, called from-read or reads-before, places read events before subsequent
writes.
S.fr ≜ S.rf⁻¹ ; S.co
The extended coherence S.eco is a strict partial order that orders events operating on the
same location. (It is almost total on accesses to a given location, except that it does not
order equal writes nor reads reading from the same write.)
S.eco ≜ (S.co ∪ S.rf ∪ S.fr)⁺
We observe that in our model, eco is equal to rf ∪ co ; rf? ∪ fr ; rf?, similar to the corresponding
definitions about execution graphs in the literature.8
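To make the derived relations concrete, the sketch below computes rf, fr, and eco for a small hypothetical event structure on a single location z, and compares eco with the closed form stated above. The events (init, two conflicting equal writes w1 and w2, and two reads r0 and r) are assumptions chosen for illustration.

```python
def inverse(r):
    return {(b, a) for (a, b) in r}

def compose(r, s):
    """Relational composition r ; s."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def transitive_closure(r):
    closure = set(r)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra

# Hypothetical structure on one location z:
#   init : W(z, 0); thread 1 has two conflicting equal writes w1, w2 : W(z, 1);
#   thread 2 reads r0 : R(z, 0) and then r : R(z, 1).
writes = {"init", "w1", "w2"}
events = writes | {"r0", "r"}
cf = {("w1", "w2"), ("w2", "w1")}
ew = {(w, w) for w in writes} | cf        # w1 and w2 are equal writes
jf = {("init", "r0"), ("w1", "r")}
co = {("init", "w1"), ("init", "w2")}

rf = compose(ew, jf) - cf                 # rf = (ew ; jf) \ cf
fr = compose(inverse(rf), co)             # fr = rf^-1 ; co
eco = transitive_closure(co | rf | fr)    # eco = (co ∪ rf ∪ fr)+

identity = {(e, e) for e in events}
rf_opt = rf | identity                    # rf?
closed_form = rf | compose(co, rf_opt) | compose(fr, rf_opt)

print(sorted(rf))
print(eco == closed_form)
```

Here the read r obtains two rf-predecessors, w1 and w2, which illustrates why rf⁻¹ need not be functional even though jf⁻¹ is.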
The last ingredient that we need for event structure consistency is the notion of visible
events, which will be used to constrain external justifications. We define it in a few steps.
Let e be some event in S. First, consider all write events used to externally justify e or
one of its justification ancestors. The relation S.jfe ; (S.po ∪ S.jf)∗ defines this connection
formally. Among that set of write events, restrict attention to those conflicting with e, and
call that set M . That is, M ≜ dom(S.cf ∩ (S.jfe ; (S.po ∪ S.jf)∗) ; [e]). Event e is visible if
all writes in M have an equal write that is po-related with e. Formally,9
S.Vis ≜ {e ∈ S.E | S.cf ∩ (S.jfe ; (S.po ∪ S.jf)∗) ; [e] ⊆ S.ew ; (S.po ∪ S.po⁻¹)?}
Intuitively, visible events cannot depend on conflicting events: for every such justification
dependence, there ought to be an equal non-conflicting write.
Consistency places a number of additional constraints on event structures. First, it checks
that there is no redundancy in the event structure: immediate conflicts arise only because
of read events justified from non-equal writes. Second, it extends the constraints about cf
to the extended conflict ecf; namely that no event can conflict with itself or be justified
from a conflicting event. Third, it checks that reads are justified either from events of the
same thread or from visible events of other threads. Finally, it ensures coherence, i.e., that
executions restricted to accesses on a single location do not have any weak behaviors.
Definition 5. An event structure S is said to be consistent if the following conditions hold.
dom(S.cfimm) ⊆ S.R (cfimm-read)
S.jf ; S.cfimm ; S.jf−1 ; S.ew is irreflexive. (cfimm-justification)
S.ecf is irreflexive. (ecf-irreflexivity)
S.jf ∩ S.ecf = ∅ (jf-non-conflict)
dom(S.jfe) ⊆ S.Vis (jfe-visible)
S.hb ; S.eco? is irreflexive. (coherence)
8 This equivalence does not hold in the original Weakestmo model [6]. To make the equivalence
hold, we made ew transitive and required ew ; co ; ew ⊆ co.
9 Note that in [6] the definition of the visible events is slightly more verbose. We proved in Coq [17] that
our simpler definition is equivalent to the one given there.
3.5 Execution Extraction
The last part of Weakestmo is the extraction of executions from an event structure. An
execution is essentially a conflict-free event structure.
Definition 6. An execution graph G is a tuple 〈E, tid, lab, po, rf, co〉 where its components
are defined similarly as in the case of an event structure with the following exceptions:
po is required to be total on the set of events from the same thread. Thus, execution
graphs have no conflicting events, i.e., cf = ∅.
The rf relation is given explicitly instead of being derived. Also, there are no jf and ew
relations.
co totally orders write events operating on the same location.
All derived relations are defined similarly as for event structures. Next we show how to
extract an execution graph from the event structure.
Definition 7. A set of events X is called extracted from S if the following conditions are
met:
X is conflict-free, i.e., [X] ; S.cf ; [X] = ∅.
X is S.rf-complete, i.e., X ∩ S.R ⊆ codom([X] ; S.rf).
X contains only visible events of S, i.e., X ⊆ S.Vis.
X is hb-downward-closed, i.e., dom(S.hb ; [X]) ⊆ X.
Given an event structure S and an extracted subset of its events X, it is possible to associate
with X an execution graph G simply by restricting the corresponding components of S to X:
G.E = X        G.tid = S.tid|X        G.lab = S.lab|X
G.po = [X] ; S.po ; [X]        G.rf = [X] ; S.rf ; [X]        G.co = [X] ; S.co ; [X]
We say that such an execution graph G is associated with X and that it is extracted from the
event structure: S ▷ G.
Weakestmo additionally defines another consistency predicate to further filter out some
of the extracted execution graphs. In the Weakestmo fragment we consider, this additional
consistency predicate is trivial—every extracted execution satisfies it—and so we do not
present it here. In the full model, execution consistency checks atomicity of read-modify-write
instructions, and sequential consistency for SC accesses.
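The four conditions of Def. 7 translate directly into a check on a candidate set X. The sketch below is a minimal illustration, assuming a hypothetical event structure with two conflicting equal writes w1 and w2 and two reads r0 and r; the set of visible events is taken as given rather than computed, and hb is assumed to coincide with po.

```python
def is_extracted(X, reads, cf, rf, hb, visible):
    """Check the four conditions of Def. 7 for a candidate set X."""
    conflict_free = not any(a in X and b in X for (a, b) in cf)
    rf_complete = all(any(w in X for (w, r2) in rf if r2 == r) for r in X & reads)
    only_visible = X <= visible
    hb_closed = all(a in X for (a, b) in hb if b in X)
    return conflict_free and rf_complete and only_visible and hb_closed

# Hypothetical structure: init; thread 1 has conflicting equal writes w1, w2;
# thread 2 performs the reads r0 and then r.
events = {"init", "w1", "w2", "r0", "r"}
reads = {"r0", "r"}
cf = {("w1", "w2"), ("w2", "w1")}
rf = {("init", "r0"), ("w1", "r"), ("w2", "r")}
hb = {("init", "w1"), ("init", "w2"), ("init", "r0"), ("init", "r"), ("r0", "r")}
visible = set(events)   # assumption: every event is visible

X = {"init", "w1", "r0", "r"}
print(is_extracted(X, reads, cf, rf, hb, visible))

# Restricting rf to X yields the rf component of the associated graph G.
G_rf = {(a, b) for (a, b) in rf if a in X and b in X}
print(sorted(G_rf))
```

Adding w2 to X violates conflict-freedom, dropping w1 violates rf-completeness, and dropping r0 violates hb-downward-closedness, so each of those variants is rejected by the same function.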
4 Compilation Proof for Weakestmo
In this section, we outline our correctness proof for the compilation from Weakestmo to the
various hardware models. As already mentioned, our proof utilizes IMM [20]. In the following,
we briefly present IMM for the fragment of the model containing only relaxed reads and
writes (§4.1), our simulation relation (§4.2) for the compilation from Weakestmo to IMM,
and outline the argument as to why the simulation relation is preserved (§4.3). Mapping
from IMM to the hardware models has already been proved correct by Podkopaev et al. [20],
so we do not present this part here. Later, in §5, we will extend the IMM mapping results to
cover SC accesses.
As a further motivating example for this section consider yet another variant of the load
buffering program shown in Fig. 5. As we will see, its annotated weak behavior is allowed by
IMM and also by Weakestmo, albeit in a different way. The argument for constructing the
Weakestmo event structure that exhibits the weak behavior from the given IMM execution
graph is non-trivial.
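Thread 1:  r1 := [x] // 1 ;  [y] := r1 ;  [z] := 1
Thread 2:  r2 := [y] // 1 ;  r3 := [z] // 1 ;  [x] := r3
Events of G:  Init;  e11 : R(x, 1),  e12 : W(y, 1),  e13 : W(z, 1);  e21 : R(y, 1),  e22 : R(z, 1),  e23 : W(x, 1)
rf edges:  e12 → e21,  e13 → e22,  e23 → e11
ppo edges:  e11 → e12,  e22 → e23
Figure 5 A variant of the load-buffering program (the two threads listed first) and the IMM graph G
corresponding to its annotated weak behavior (the events and edges listed after).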
4.1 The Intermediate Memory Model IMM
In order to discuss the proof, we briefly present a simplified version of the formal IMM
definition, where we have omitted constraints about RMW accesses and fences.
Definition 8. An IMM execution graph G is an execution graph (Def. 6) extended with
one additional component: the preserved program order ppo ⊆ [R] ; po ; [W].
Preserved program order edges correspond to syntactic dependencies guaranteed to be
preserved by all major hardware platforms. For example, the execution graph in Fig. 5 has
two ppo edges corresponding to the data dependencies via registers r1 and r3. (The full
IMM definition [20] distinguishes between the different types of dependencies (control, data,
and address) and includes them as separate components of execution graphs. In the full model,
ppo is actually derived from the more basic dependencies.)
IMM-consistency checks completeness, coherence, and acyclicity:10
Definition 9. An IMM execution graph G is IMM-consistent if
codom(G.rf) = G.R, (completeness)
G.hb ; G.eco? is irreflexive, and (coherence)
G.rf ∪ G.ppo is acyclic. (no-thin-air)
As we can see, the execution graph G of Fig. 5 is IMM-consistent because every read of
the graph reads from some write event and, moreover, the coherence and no-thin-air
properties hold.
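The three conditions of Def. 9 can be checked mechanically for the graph of Fig. 5. The following sketch assumes one initialization write per location (init_x, init_y, init_z), takes hb to coincide with po as in the relaxed fragment, and encodes the rf, co, and ppo edges as read off from the event labels; the helper names are illustrative.

```python
def inverse(r):
    return {(b, a) for (a, b) in r}

def compose(r, s):
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def transitive_closure(r):
    closure = set(r)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra

def irreflexive(r):
    return all(a != b for (a, b) in r)

def chain(*es):
    return {(es[i], es[j]) for i in range(len(es)) for j in range(i + 1, len(es))}

# Graph G of Fig. 5, with one initialization write per location (an assumption).
inits = {"init_x", "init_y", "init_z"}
reads = {"e11", "e21", "e22"}
events = inits | {"e11", "e12", "e13", "e21", "e22", "e23"}

po = ({(i, e) for i in inits for e in events - inits}
      | chain("e11", "e12", "e13") | chain("e21", "e22", "e23"))
rf = {("e23", "e11"), ("e12", "e21"), ("e13", "e22")}
co = {("init_x", "e23"), ("init_y", "e12"), ("init_z", "e13")}
ppo = {("e11", "e12"), ("e22", "e23")}
hb = po   # no synchronization edges in the relaxed fragment

fr = compose(inverse(rf), co)
eco = transitive_closure(co | rf | fr)

complete = {r for (_, r) in rf} == reads                  # codom(rf) = R
coherent = irreflexive(hb | compose(hb, eco))             # hb ; eco? irreflexive
no_thin_air = irreflexive(transitive_closure(rf | ppo))   # rf ∪ ppo acyclic

print(complete, coherent, no_thin_air)

# If the po edge from e21 to e22 were also preserved, rf ∪ ppo would be cyclic.
stronger_ppo = ppo | {("e21", "e22")}
print(irreflexive(transitive_closure(rf | stronger_ppo)))
```

The last check shows why the weak behavior hinges on the absence of a dependency between the two reads of thread 2: with that extra edge, the graph would contain the cycle e21, e22, e23, e11, e12, e21.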
4.2 Simulation Relation for Weakestmo to IMM Proof
In this section, we define the simulation relation I 11, which is used for the simulation of a
traversal of an IMM-consistent execution graph by a Weakestmo event structure presented in
§2.3.
The way we define I(prog, G, 〈C, I〉, S,X) induces a strong connection between events in
the execution graph G and the event structure S. We make this connection explicit with the
function s2gG,S : S.E→ G.E, which maps events of the event structure S into the events of
the execution graph G, such that e and s2gG,S(e) belong to the same thread and have the
10 Again, this is a simplified presentation for a fragment of the model. We refer the reader to Podkopaev
et al. [20] for the full definition, which further distinguishes between internal and external rf edges.
11 A refined version of the simulation relation for the full Weakestmo model can be found in [17, Appendix A].
same po-position in the thread.12 Note that s2gG,S is defined for all events e ∈ S.E, meaning
that the event structure S does not contain any redundant events that do not correspond to
events in the IMM execution graph G. The function s2gG,S , however, does not have to be
injective: in particular, events e and e′ that are in immediate conflict in S have the same
s2gG,S-image in G. In the rest of the paper, whenever G and S are clear from the context,
we omit the G,S subscript from s2g.
In the context of a function s2g (for some G and S), we also use V·W and T·U to lift s2g
to sets and relations:
for AS ⊆ S.E : VASW ≜ {s2g(e) | e ∈ AS}
for AG ⊆ G.E : TAGU ≜ {e ∈ S.E | s2g(e) ∈ AG}
for RS ⊆ S.E × S.E : VRSW ≜ {〈s2g(e), s2g(e′)〉 | 〈e, e′〉 ∈ RS}
for RG ⊆ G.E × G.E : TRGU ≜ {〈e, e′〉 ∈ S.E × S.E | 〈s2g(e), s2g(e′)〉 ∈ RG}
For example, TCU denotes a subset of S’s events whose s2g-images are covered events in G,
and VS.rfW denotes a relation on events in G whose s2g-preimages in S are related by S.rf.
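The four lifting operations are plain images and preimages under s2g. The sketch below illustrates them on a hypothetical mapping in which two conflicting copies of each event of thread 1 map to the same graph event; the function names (image_set, preimage_set, image_rel, preimage_rel) are illustrative and do not come from the Coq development.

```python
# Hypothetical s2g: two conflicting branches of thread 1 in S map onto
# the same events e11 and e12 of G.
s2g = {"Init": "Init", "e111": "e11", "e112": "e11",
       "e121": "e12", "e122": "e12"}

def image_set(A_S):
    """Image of a set of S-events under s2g."""
    return {s2g[e] for e in A_S}

def preimage_set(A_G):
    """All S-events whose s2g-image lies in A_G."""
    return {e for e in s2g if s2g[e] in A_G}

def image_rel(R_S):
    """Image of a relation on S-events under s2g."""
    return {(s2g[a], s2g[b]) for (a, b) in R_S}

def preimage_rel(R_G):
    """All pairs of S-events whose images are related by R_G."""
    return {(a, b) for a in s2g for b in s2g if (s2g[a], s2g[b]) in R_G}

print(sorted(preimage_set({"e11"})))
print(sorted(image_rel({("e111", "e121"), ("e112", "e122")})))

# The preimage of the identity on G relates exactly the events with equal
# images; here these are identical or conflicting events of S.
id_G = {(g, g) for g in set(s2g.values())}
branch1, branch2 = {"e111", "e121"}, {"e112", "e122"}
cf = {(a, b) for a in branch1 for b in branch2} | {(b, a) for a in branch1 for b in branch2}
id_S = {(e, e) for e in s2g}
print(preimage_rel(id_G) <= cf | id_S)
```

The final check corresponds to the shape of a condition relating the identity on G to identity or conflict in S, and holds in this example because distinct events with the same image always sit in different branches.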
We define the relation I(prog, G, 〈C, I〉, S,X) to hold if the following conditions are met:
1. G is an IMM-consistent execution of prog.
2. S is a Weakestmo-consistent event structure of prog.
3. X is an extracted subset of S.
4. S and X correspond precisely to all covered and issued events and their po-predecessors:
VS.EW = VXW = C ∪ dom(G.po? ; [I])
(Note that C is closed under po-predecessors, so dom(G.po? ; [C]) = C.)
5. Each S event has the same thread, type, modifier, and location as its corresponding
G event. In addition, covered and issued events in X have the same value as their
corresponding ones in G.
a. ∀e ∈ S.E. S.{tid, typ, loc, mod}(e) = G.{tid, typ, loc, mod}(s2g(e))
b. ∀e ∈ X ∩ TC ∪ IU. S.val(e) = G.val(s2g(e))
6. Program order in S corresponds to program order in G:
VS.poW ⊆ G.po
7. Identity relation in G corresponds to identity or conflict relation in S:
TidU ⊆ S.cf?
8. Reads in S are justified by writes that have already been observed by the corresponding
events in G. Moreover, covered reads in X are justified by a write corresponding to the one
read by the corresponding read in G:
a. VS.jfW ⊆ G.rf? ; G.hb?
12 Here we assume existence and uniqueness of such a function. In our Coq development [17], we have a
different representation of execution graph events (but the same for events of event structures), which
makes the existence and uniqueness questions trivial.
More specifically, we follow Podkopaev et al. [20, §2.2]. There, each non-initializing event e of an execution
graph G is encoded as a pair 〈t, n〉 where t is e’s thread and n is a serial number of e in thread t, i.e., a
position of e in G.po restricted to events of thread t; each initializing event is encoded by the corresponding
location: 〈init l〉.
In this representation, the function s2gG,S for an event e returns (i) e’s thread and the number of
non-initial events that S.po-precede e if e is non-initializing, or (ii) its location if it is initializing:
s2gG,S(e) ≜ 〈S.tid(e), |dom([S.E \ S.Init] ; S.po ; [e])|〉 for e ∉ S.Init
s2gG,S(e) ≜ 〈init S.loc(e)〉 for e ∈ S.Init
Execution graph G with traversal configuration TCa:  Init;  e11 : R(x, 1),  e12 : W(y, 1),  e13 : W(z, 1);  e21 : R(y, 1),  e22 : R(z, 1),  e23 : W(x, 1);  two ppo edges.
Event structure Sa with selected execution Xa:  Init;  e111 : R(x, 0),  e121 : W(y, 0),  e131 : W(z, 1);  one jf edge.
Figure 6 The execution graph G, its traversal configuration TCa, the related event structure Sa,
and the selected execution Xa. Covered events, issued events, and events belonging to the selected
execution are each marked with a distinct symbol.
b. VS.jf ; [X ∩ TCU]W ⊆ G.rf
9. Every write event justifying some external read event should be S.ew-equal to some issued
write event in X:
dom(S.jfe) ⊆ dom(S.ew ; [X ∩ TIU])
10. Equal writes in S correspond to the same write event in G:
VS.ewW ⊆ id
11. Every non-trivial S.ew equivalence class contains an issued write in X:
S.ew ⊆ (S.ew ; [X ∩ TIU] ; S.ew)?
12. Coherence edges in S correspond to coherence or identity edges in G. (We will explain in
§4.3 why a coherence edge in S might correspond to an identity edge in G.)
VS.coW ⊆ G.co?
As an example, consider the execution G from Fig. 5, the traversal configuration
TCa ≜ 〈{Init}, {Init, e13}〉, and the event structure Sa shown in Fig. 6. We will show that
I(prog, G, TCa, Sa, Xa), where Xa ≜ Sa.E, holds.
Take s2gG,Sa = {Init ↦ Init, e111 ↦ e11, e121 ↦ e12, e131 ↦ e13}. Given that cf = ew = ∅, the
consistency constraints hold immediately. For example, condition 8 holds because e111 is
justified by Init, which happens before it. Finally, note that only e131 and e13 are required to
have the same value by constraint 5; the other related thread events only need to have the
same type and location.
The definition of the simulation relation I renders the proofs of Lemmas 2 and 4
straightforward. Specifically, for Lemma 2, the initial configuration TCinit(G) containing only the
initialization events is simulated by the initial event structure Sinit as all the constraints are
trivially satisfied (Sinit.po = Sinit.jf = Sinit.ew = Sinit.co = ∅).
For Lemma 4, since TCfinal(G) covers all events of G, property 5 implies that the labels
of the events in X are equal to the corresponding events of G; property 6 means that po is
the same between them; property 8 means that rf is the same between them; properties 7
and 12 together mean that co is the same. Therefore, G and the execution corresponding to
X are isomorphic.
4.3 Simulation Step Proof Outline
We next outline the proof of Lemma 3, which states that the simulation relation I can be
restored after a traversal step.
Traversal configuration TCb on G:  Init;  e11 : R(x, 1),  e12 : W(y, 1),  e13 : W(z, 1);  e21 : R(y, 1),  e22 : R(z, 1),  e23 : W(x, 1);  vf edges Init → e11, Init → e21, e13 → e22;  two ppo edges.
Event structure Sb with selected execution Xb:  Init;  e111 : R(x, 0),  e121 : W(y, 0),  e131 : W(z, 1);  e211 : R(y, 0),  e221 : R(z, 1),  e231 : W(x, 1);  three jf edges.
Figure 7 The traversal configuration TCb, the related event structure Sb, and the selected
execution Xb.
Suppose that I(prog, G, TC, S,X) holds for some prog, G, TC, S, and X, and we need
to simulate a traversal step TC −→ TC ′ that either covers or issues an event of thread
t. Then we need to produce an event structure S′ and a subset of its events X ′ such that
I(prog, G, TC ′, S′, X ′) holds. Whenever thread t has any uncovered issued write events,
Weakestmo might need to take multiple steps from S to S′ so as to add any missing events
po-before the uncovered issued writes of thread t. Borrowing the terminology of the “promising
semantics” [12], we refer to these steps as constructing a certification branch for the issued
write(s).
Before we present the construction, let us return to the example of Fig. 5. Consider
the traversal step from configuration TCa to configuration TCb ≜ 〈{Init}, {Init, e13, e23}〉 by
issuing the event e23 (see Fig. 7). To simulate this step, we need to show that it is possible
to execute instructions of thread 2 and extend the event structure with a set of events Brb
matching these instructions. As we have already seen, the labels of the new events can differ
from their counterparts in G—they only have to agree for the covered and issued events. In
this case, we set Brb = {e211, e221, e231}, and adding them to the event structure Sa gives us
event structure Sb shown in Fig. 7.
In more detail, we need to build a run of thread-local semantics prog(2) −e211→ −e221→ −e231→ σ′
such that (1) it contains events corresponding to all the events of thread 2 up to e23 (i.e.,
e21, e22, e23) with the same location, type, and thread identifier and (2) any events corresponding
to covered or issued events (i.e., e23) should also have the same value as the corresponding
event in G.
Then, following the run of the thread-local semantics, we should extend the event structure
Sa to Sb by adding new events Brb, and ensure that the constructed event structure Sb is
consistent (Def. 5) and simulates the configuration TCb. In particular, it means that:
for each read event in Brb we need to pick a justification write event, which is either
already present in S or po-precedes the read event;
for each write event in Brb we should determine its position in co order of the event
structure.
Finally, we need to update the selected execution by replacing all events of thread 2 by the
new events Brb: Xb ≜ (Xa \ S.thread({2})) ∪ Brb.
4.3.1 Justifying the New Read Events
In order to determine whence these read events should be justified (and hence what value
they should return), we have adopted the approach of Podkopaev et al. [20] for a similar
problem with certifying promises in the compilation proof from PS to IMM. The construction
relies on several auxiliary definitions.
First, given an execution G and a traversal configuration 〈C, I〉, we define the set of
determined events to be those events of G that must have equal counterparts in S. In
particular, this means that S should assign to these events the same label as G, and thus the
same reads-from source for the read events.
G.determined〈C,I〉 ≜ C ∪ I ∪ dom((G.rf ∩ G.po)? ; G.ppo ; [I]) ∪ codom([I] ; (G.rf ∩ G.po))
Besides covered and issued events, the set of determined events also contains the ppo-prefixes
of issued events, since issued events may depend on their values, as well as any internal reads
reading from issued events, since their values are also determined by the issued events.
For the graph G and traversal configuration TCb, the set of determined events contains
events e13, e22, and e23. (The events e13 and e23 are issued, whereas e22 has a ppo edge to e23.)
In contrast, events e11, e12, and e21 are not determined, since their corresponding events in S
read/write a different value.
Second, we introduce the viewfront relation (vf) to contain all the writes that have been
observed at a certain point in the graph. That is, the edge 〈w, e〉 ∈ G.vfTC indicates that
the write w either happens before e, is read by a covered event happening before e, or is
read by a determined read earlier in the same thread as e.
G.vf〈C,I〉 ≜ [G.W] ; (G.rf ; [C])? ; G.hb? ∪ G.rf ; [G.determined〈C,I〉] ; G.po?
Figure 7 depicts three G.vfTCb edges. Since G.vfTC ;G.po ⊆ G.vfTC , the other incoming
viewfront edges to thread 2 can be derived. Note that there is no edge from e12 to thread 2,
since e12 neither happens before any event in thread 2 nor is read by any determined read.
Finally, we construct the stable justification relation (sjf) that helps us justify the read
events in Brb in the event structure:
G.sjfTC ≜ ([G.W] ; (G.vfTC ∩ =G.loc) ; [G.R]) \ (G.co ; G.vfTC)
It relates a read event r to the co-last ‘observed’ write event with the same location. Assuming
that G is IMM-consistent, it can be shown that G.sjf agrees with G.rf on the set of
determined reads.
G.sjfTC ; [G.determinedTC] ⊆ G.rf
For the graph G and traversal configuration TCb shown in Fig. 7 the sjf relation coincides
with the depicted vf edges: i.e., we have 〈Init, e11〉, 〈Init, e21〉, 〈e13, e22〉 ∈ G.sjfTCb .
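The determined set, the viewfront relation, and the stable justification relation for G and TCb can be recomputed from their definitions. The sketch below assumes one initialization write per location (so Init is split into init_x, init_y, init_z), takes hb to coincide with po, and uses illustrative helper names; under these assumptions the computed sjf consists of the three edges listed above, with Init read as the initialization write of the matching location.

```python
def compose(r, s):
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def chain(*es):
    return {(es[i], es[j]) for i in range(len(es)) for j in range(i + 1, len(es))}

loc = {"init_x": "x", "init_y": "y", "init_z": "z",
       "e11": "x", "e12": "y", "e13": "z", "e21": "y", "e22": "z", "e23": "x"}
events = set(loc)
inits = {"init_x", "init_y", "init_z"}
reads = {"e11", "e21", "e22"}
writes = events - reads
identity = {(e, e) for e in events}

po = ({(i, e) for i in inits for e in events - inits}
      | chain("e11", "e12", "e13") | chain("e21", "e22", "e23"))
rf = {("e23", "e11"), ("e12", "e21"), ("e13", "e22")}
co = {("init_x", "e23"), ("init_y", "e12"), ("init_z", "e13")}
ppo = {("e11", "e12"), ("e22", "e23")}
hb = po   # assumption: no synchronization edges in the relaxed fragment

# Traversal configuration TC_b = <C, I>
C, I = set(inits), inits | {"e13", "e23"}

# determined = C ∪ I ∪ dom((rf ∩ po)? ; ppo ; [I]) ∪ codom([I] ; (rf ∩ po))
rf_po = rf & po
determined = (C | I
              | {a for (a, b) in compose(rf_po | identity, ppo) if b in I}
              | {b for (a, b) in rf_po if a in I})

# vf = [W] ; (rf ; [C])? ; hb?  ∪  rf ; [determined] ; po?
rf_to_C = {(a, b) for (a, b) in rf if b in C}
rf_to_det = {(a, b) for (a, b) in rf if b in determined}
vf = ({(a, b) for (a, b) in compose(rf_to_C | identity, hb | identity) if a in writes}
      | compose(rf_to_det, po | identity))

# sjf = ([W] ; (vf ∩ =loc) ; [R]) \ (co ; vf)
sjf = ({(w, r) for (w, r) in vf if w in writes and r in reads and loc[w] == loc[r]}
       - compose(co, vf))

print(sorted(determined - inits))
print(sorted(sjf))
```

The edge from init_z to e22 is a viewfront edge but not a stable justification edge, because e13 is co-later than init_z and is also observed at e22.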
Having sjfTCb as a guide for values read by instructions in the certification run, we
construct the steps of the thread-local operational semantics prog(2) −→∗ σ′ using the
receptiveness property of the thread’s semantics. It essentially says that, given an execution
trace τ = e1, ... , en of the thread semantics and a subset of events K ⊆ {e1, ... , en−1} along
that trace that have no ppo-successors in the graph, we can arbitrarily change the values of read
events in K, and there exist values for the write events in K such that the updated execution
trace is also a trace of the thread semantics.13
13 The formal definition of the receptiveness property is quite elaborate. For the detailed definition we
refer the reader to the Coq development of IMM [7].
Traversal configuration TCc on G:  Init;  e11 : R(x, 1),  e12 : W(y, 1),  e13 : W(z, 1);  e21 : R(y, 1),  e22 : R(z, 1),  e23 : W(x, 1);  two ppo edges.
Event structure Sc with selected execution Xc:  Init;  e111 : R(x, 0),  e121 : W(y, 0),  e131 : W(z, 1);  e112 : R(x, 1),  e122 : W(y, 1),  e132 : W(z, 1);  e211 : R(y, 0),  e221 : R(z, 1),  e231 : W(x, 1);  a cf edge between e111 and e112, a co edge from e122 to e121, and an ew edge between e131 and e132.
Figure 8 The traversal configuration TCc, the related event structure Sc, and the selected
execution Xc.
The relation sjfTCb is also used to pick justification writes for the read events in Brb. We
have proved that each sjf edge either starts in some issued event (of the previous traversal
configuration) or it connects two events that are related by po:
G.sjfTCb ⊆ [Ia] ; G.sjfTCb ∪ G.po
In the former case, thanks to property 4 of our simulation relation, we can pick a
write event from Xa corresponding to the issued write (e.g., for Fig. 7, it is the event e131,
corresponding to the issued write e13). In the latter case, we pick either the initial write or
some Sb.po-preceding write belonging to Brb.
4.3.2 Ordering the New Write Events
In order to pick the Sb.co position of the new write events in the updated event structure, we
generally follow the original G.co order of the IMM graph. Because of the conflicting events,
however, it is not always possible to preserve the inclusion between the relations. This is
why we relax the inclusion to VS.coW ⊆ G.co? in property 12 of the simulation relation.
To see the problem let us return to the example. Suppose that the next traversal step
covers the read e11. To simulate this step, we build an event structure Sc (see Fig. 8). It
contains the new events Brc ≜ {e112, e122, e132}.
Consider the write events e121 and e122 of the event structure. Since the events have
different labels, we cannot make them ew-equivalent. And since Sc.co should be total among
all writes to the same location (with respect to Sc.ew), we must put a co edge between these
two events in one direction or another. Note that events e121 and e122 correspond to the same
event e12 in the graph, thus we cannot use the coherence order of the graph G.co to guide
our decision.
In fact, the co-order between these two events does not matter, so we could pick either
direction. For the purposes of our proofs, however, we found it more convenient to always
put the new events earlier in the co order (thus we have 〈e122, e121〉 ∈ Sc.co). Thereby we can
show that the co edges of the event structure ending in the new events, have corresponding
edges in the graph: VSc.co ; [Brc]W ⊆ G.co.
Now consider the events e131 and e132. Since these events have the same label and correspond
to the same event in G, we make them ew-equivalent. In fact, this choice is necessary for the
correctness of our construction. Otherwise, the new events Brc would be deemed invisible,
because of the Sc.cf ∩ (Sc.jfe ; (Sc.po ∪ Sc.jf)∗) path between e131 and e112. Recall that only
the visible events can be used to extract an execution from the event structure (Def. 7).
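The role of the ew edge between e131 and e132 can be checked by computing the visible events of Sc with and without it. The sketch below assumes the jf edges implied by the event labels of Fig. 8 (in particular, e112 is justified by e231 and e221 by e131); the function name visible_events and the set-of-pairs encoding are illustrative.

```python
def inverse(r):
    return {(b, a) for (a, b) in r}

def compose(r, s):
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def transitive_closure(r):
    closure = set(r)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra

def chain(*es):
    return {(es[i], es[j]) for i in range(len(es)) for j in range(i + 1, len(es))}

branch1 = ("e111", "e121", "e131")   # first run of thread 1
branch2 = ("e112", "e122", "e132")   # new events Br_c
thread2 = ("e211", "e221", "e231")
events = {"Init", *branch1, *branch2, *thread2}
writes = {"Init", "e121", "e131", "e122", "e132", "e231"}
identity = {(e, e) for e in events}

po = ({("Init", e) for e in events - {"Init"}}
      | chain(*branch1) | chain(*branch2) | chain(*thread2))
cf = ({(a, b) for a in branch1 for b in branch2}
      | {(b, a) for a in branch1 for b in branch2})
# jf edges inferred from the labels (an assumption of this sketch)
jf = {("Init", "e111"), ("Init", "e211"), ("e131", "e221"), ("e231", "e112")}

def visible_events(ew):
    jfe = jf - po
    reach = transitive_closure(po | jf) | identity
    depends = compose(jfe, reach)                        # jfe ; (po ∪ jf)*
    excused = compose(ew, po | inverse(po) | identity)   # ew ; (po ∪ po^-1)?
    blocked = {e for (w, e) in (cf & depends) - excused}
    return events - blocked

ew_trivial = {(w, w) for w in writes}
ew_merged = ew_trivial | {("e131", "e132"), ("e132", "e131")}

print(sorted(events - visible_events(ew_trivial)))
print(visible_events(ew_merged) == events)
```

Without the ew edge, the three new events depend on the conflicting write e131 through e221, e231, and the jf edge into e112, and none of them is visible; with the ew edge, e132 acts as the equal write po-related to each of them.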
In general, assuming that I(prog, G, 〈C, I〉, S,X) holds, we attach the new write event e
to an S.ew equivalence class represented by the write event w, s.t. (i) w has the same s2g
image as e, i.e., s2g(w) = s2g(e); (ii) w belongs to X and its s2g image is issued, that is
w ∈ X ∩ TIU. If there is no such event w, we put e S.co-after events such that their s2g
images are ordered G.co-before s2g(e), and S.co-before events such that their s2g images
are equal to s2g(e) or ordered G.co-after it. Note that thanks to property 9 of the simulation
relation, that is dom(S.jfe) ⊆ dom(S.ew ; [X ∩ TIU]), our choice of ew guarantees that all
new events will be visible.
4.3.3 Construction Overview
To sum up, to prove Lemma 3, we consider the events of G.thread({t}) where t is the
thread of the event issued or covered by the traversal step TC −→ TC ′, together with the
sjf relation determining the values of the read events. At this point, we can show that
I-conditions for the new configuration TC ′ hold for all events except for those in thread t.
Because of receptiveness, there exists a sequence of the thread steps prog(t) −→∗ σ′ for
some thread state σ′ such that the labels on this sequence match the events G.thread({t})
with the labels determined by sjf, and include an event with the same label as the one
issued or covered by the traversal step TC −→ TC ′.
We then do an induction on this sequence of steps, and add each event to the event
structure S and to its selected subset of events X (unless already there), showing along the
way that the I-conditions also hold for the updated event structure, selected subset, and
the events added. At the end, when we have considered all the events generated by the
step sequence, we will have generated the event structure S′ and execution X ′ such that
I(prog, G, TC ′, S′, X ′) holds.
5 Handling SC Accesses
In this section, we briefly describe the changes needed in order to handle the compilation
of Weakestmo’s sequentially consistent (SC) accesses. The purpose of SC accesses is to
guarantee sequential consistency for the simple programming pattern that uses exclusively
SC accesses to communicate between threads. As Lahav et al. [14] showed, however, their
semantics is quite complicated because they can be freely mixed with non-SC accesses.
We first define an extension of IMM, which we call IMMSC. Its consistency extends that
of IMM with an additional acyclicity requirement concerning SC accesses, which is taken
directly from RC11-consistency [14, Definition 1].
Definition 10. An execution graph G is IMMSC-consistent if it is IMM-consistent [20,
Definition 3.11] and G.pscbase ∪ G.pscF is acyclic, where:14
G.scb ≜ G.po ∪ G.po|≠G.loc ; G.hb ; G.po|≠G.loc ∪ G.hb|=loc ∪ G.co ∪ G.fr
G.pscbase ≜ ([G.Esc] ∪ [G.Fsc] ; G.hb?) ; G.scb ; ([G.Esc] ∪ G.hb? ; [G.Fsc])
G.pscF ≜ [G.Fsc] ; (G.hb ∪ G.hb ; G.eco ; G.hb) ; [G.Fsc]
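As an illustration of the additional acyclicity requirement, the sketch below evaluates pscbase on a standard store-buffering shape, which is not an example from the text: thread 1 writes x and reads y, thread 2 writes y and reads x, and both reads return the initial value. It assumes that there are no fences (so pscF is empty and the fence components of pscbase vanish) and that hb coincides with po, since no read reads from a non-initial write; all event names are hypothetical.

```python
def inverse(r):
    return {(b, a) for (a, b) in r}

def compose(r, s):
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def transitive_closure(r):
    closure = set(r)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra

# Store buffering: T1: [x] := 1 ; a := [y] // 0    T2: [y] := 1 ; b := [x] // 0
loc = {"init_x": "x", "init_y": "y",
       "w1x": "x", "r1y": "y", "w2y": "y", "r2x": "x"}
inits = {"init_x", "init_y"}
events = set(loc)

po = ({(i, e) for i in inits for e in events - inits}
      | {("w1x", "r1y"), ("w2y", "r2x")})
rf = {("init_y", "r1y"), ("init_x", "r2x")}
co = {("init_x", "w1x"), ("init_y", "w2y")}
fr = compose(inverse(rf), co)
hb = po   # assumption: both reads read from initialization writes, so sw = ∅

def psc_base_acyclic(sc_events):
    """Check acyclicity of psc_base for a fence-free graph."""
    po_neq = {(a, b) for (a, b) in po if loc[a] != loc[b]}
    hb_eq = {(a, b) for (a, b) in hb if loc[a] == loc[b]}
    scb = po | compose(compose(po_neq, hb), po_neq) | hb_eq | co | fr
    # Without fences, psc_base = [E_sc] ; scb ; [E_sc].
    psc_base = {(a, b) for (a, b) in scb if a in sc_events and b in sc_events}
    return all(a != b for (a, b) in transitive_closure(psc_base))

print(psc_base_acyclic({"w1x", "r1y", "w2y", "r2x"}))   # all four accesses SC
print(psc_base_acyclic(set()))                          # all accesses relaxed
```

When all four accesses are SC, the po and fr edges form a pscbase cycle and the outcome is rejected; when they are relaxed, pscbase is empty and the condition holds trivially.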
The scb, pscbase and pscF relations were carefully designed by Lahav et al. [14] (and
recently adopted by the C++ standard), so that they provide strong enough guarantees for
14 In IMMSC, event labels include an “access mode”, where sc denotes an SC access. The set G.Esc
consists of all SC accesses (reads, writes and fences) in G, and G.Fsc consists of all SC fences in G.
programmers while being weak enough to support the intended compilation of SC accesses
to commodity hardware. In particular, a previous (simpler) proposal in [2], which essentially
includes G.hb between SC accesses in the relation required to be acyclic, is too strong
for efficient compilation to the POWER architecture. Indeed, the compilation schemes to
POWER do not enforce a strong barrier on hb-paths between SC accesses, but rather on
G.po ;G.hb ;G.po-paths between SC accesses.
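To make the extra acyclicity requirement of Definition 10 concrete, the following Python sketch evaluates it on a hand-built store-buffering execution. It is an illustration only and is not part of the Coq development: the relation encoding (sets of pairs), the helper names, and the treatment of events without a location as having a "different" location are all assumptions of this sketch.

```python
def compose(*rels):
    """Sequential composition r1 ; r2 ; ... of relations given as sets of pairs."""
    result = set(rels[0])
    for nxt in rels[1:]:
        result = {(a, d) for (a, b) in result for (c, d) in nxt if b == c}
    return result

def ident(events):
    """The identity relation [A] on a set of events."""
    return {(e, e) for e in events}

def acyclic(rel):
    """True iff the transitive closure of rel is irreflexive."""
    closure = set(rel)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return all(a != b for (a, b) in closure)
        closure |= extra

def psc_axiom_holds(events, po, hb, eco, co, fr, loc, e_sc, f_sc):
    """Check acyclicity of pscbase ∪ pscF, following Definition 10."""
    def same_loc(a, b):
        # Assumption of this sketch: fences carry no location.
        return loc.get(a) is not None and loc.get(a) == loc.get(b)
    po_neq = {(a, b) for (a, b) in po if not same_loc(a, b)}
    hb_loc = {(a, b) for (a, b) in hb if same_loc(a, b)}
    scb = po | compose(po_neq, hb, po_neq) | hb_loc | co | fr
    hb_opt = hb | ident(events)
    left = ident(e_sc) | compose(ident(f_sc), hb_opt)
    right = ident(e_sc) | compose(hb_opt, ident(f_sc))
    psc_base = compose(left, scb, right)
    psc_f = compose(ident(f_sc), hb | compose(hb, eco, hb), ident(f_sc))
    return acyclic(psc_base | psc_f)

# Store buffering: thread 1 runs W(x,1); R(y,0), thread 2 runs W(y,1); R(x,0).
# Initialization writes are left out, so fr is given directly.
events = {1, 2, 3, 4}
loc = {1: "x", 2: "y", 3: "y", 4: "x"}
po = {(1, 2), (3, 4)}
hb = set(po)
fr = {(2, 3), (4, 1)}

sb_all_sc = psc_axiom_holds(events, po, hb, set(), set(), fr, loc,
                            e_sc=events, f_sc=set())
sb_all_rlx = psc_axiom_holds(events, po, hb, set(), set(), fr, loc,
                             e_sc=set(), f_sc=set())
```

With all four accesses marked SC, pscbase contains the cycle 1, 2, 3, 4, 1, so the weak outcome is rejected; with relaxed accesses pscbase is empty and the requirement holds trivially.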
▶ Remark 11. The full IMM model (i.e., including release/acquire accesses and SC fences, as
defined by Podkopaev et al. [20]) forbids cycles in rfe ∪ ppo ∪ bob ∪ pscF, where bob is (similar
to ppo) a subset of the program order that must be preserved due to the presence of a memory
fence or release/acquire access. Since pscF is already included in IMM’s acyclicity constraint,
one may consider the natural option of including pscbase in that acyclicity constraint as well.
However, it leads to a model that is too strong, as it forbids the following behavior:
    a := [x]rlx  // 2   ∥   [y]sc := 2   ∥   b := [y]rlx  // 2
    [y]sc := 1                               [x]rlx := b

    Rrlx(x, 2) --bob--> Wsc(y, 1) --coe, pscbase--> Wsc(y, 2) --rfe--> Rrlx(y, 2) --ppo--> Wrlx(x, 2) --rfe--> Rrlx(x, 2)
This behavior is allowed by POWER (under either of the two intended compilation schemes for
SC accesses; see §5.1.2).
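The effect of adding pscbase to IMM's acyclicity constraint can be replayed mechanically. The sketch below encodes the five events of the example and the edges between them; the edge assignment is our reading of the execution graph, and the event names and helper function are assumptions of this illustration.

```python
def acyclic(edges):
    """True iff the transitive closure of the edge set is irreflexive."""
    closure = set(edges)
    while True:
        extra = {(a, d) for (a, b) in closure
                 for (c, d) in closure if b == c} - closure
        if not extra:
            return all(a != b for (a, b) in closure)
        closure |= extra

# Events of the example in Remark 11.
bob = {("Rrlx(x,2)", "Wsc(y,1)")}
psc_base = {("Wsc(y,1)", "Wsc(y,2)")}
rfe = {("Wsc(y,2)", "Rrlx(y,2)"), ("Wrlx(x,2)", "Rrlx(x,2)")}
ppo = {("Rrlx(y,2)", "Wrlx(x,2)")}
psc_f = set()  # the example has no SC fences

imm_constraint = acyclic(rfe | ppo | bob | psc_f)
separate_sc_constraint = acyclic(psc_base | psc_f)
merged_constraint = acyclic(rfe | ppo | bob | psc_f | psc_base)
```

Here `imm_constraint` and `separate_sc_constraint` both hold, so the execution is IMMSC-consistent, whereas `merged_constraint` fails because the pscbase edge closes the cycle.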
Adapting the compilation from Weakestmo to IMMSC to cover SC accesses is straightfor-
ward because the full definition of Weakestmo [6] does not have any additional constraints
about SC accesses at the level of event structures. It only has an SC constraint at the level of
extracted executions which is actually the same as in RC11, which we took as is for IMMSC.
5.1 Compiling IMMSC to Hardware
In this section, we describe the extension of the results of [20] to support SC accesses
with their intended compilation schemes to the different architectures.
As was done in [20], since IMMSC and the models of hardware we consider are all
defined in the same declarative framework (using execution graphs), we formulate our
results on the level of execution graphs. Thus, we actually consider the mapping of IMMSC
execution graphs to target architecture execution graphs that is induced by compilation
of IMMSC programs to machine programs. Hence, roughly speaking, for each architecture
α ∈ {TSO,POWER,ARMv7,ARMv8}, our (mechanized) result takes the following form:
If the α-execution-graph Gα corresponds to the IMMSC-execution-graph G, then
α-consistency of Gα implies IMMSC-consistency of G.
Since the mapping from Weakestmo to IMMSC (on the program level) is the identity mapping
(Theorem 1), we obtain as a corollary the correctness of the compilation from Weakestmo to
each architecture α that we consider. The exact notions of correspondence between Gα and
G are presented in [17, Appendices B, C and D].
The mapping of IMMSC to each architecture follows the intended compilation scheme
of C/C++11 [16, 14], and extends the corresponding mappings of IMM from Podkopaev
et al. [20] with the mapping of SC reads and writes. Next, we schematically present these
extensions.
5.1.1 TSO
There are two alternative sound mappings of SC accesses to x86-TSO:
    Fence after SC writes            Fence before SC reads
    (|Rsc|)   ≜ mov                  (|Rsc|)   ≜ mfence;mov
    (|Wsc|)   ≜ mov;mfence           (|Wsc|)   ≜ mov
    (|RMWsc|) ≜ (lock) xchg          (|RMWsc|) ≜ (lock) xchg
The first, which is implemented in mainstream compilers, inserts an mfence after every SC
write; whereas the second inserts an mfence before every SC read. Importantly, one should
globally apply one of the two mappings to ensure the existence of an mfence between every
SC write and following SC read.
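The requirement that one mapping be applied globally can be illustrated with a small Python sketch. It models only loads and stores (RMWs are omitted), and the access encoding, function names and the two boolean switches are assumptions made for this illustration.

```python
def compile_tso(accesses, fence_after_sc_write, fence_before_sc_read):
    """Map a thread's accesses, given as (kind, mode) pairs, to x86 instructions.

    Returns a list of (instruction, source_index) pairs; fences have no source.
    """
    code = []
    for index, (kind, mode) in enumerate(accesses):
        is_sc = mode == "sc"
        if kind == "R" and is_sc and fence_before_sc_read:
            code.append(("mfence", None))
        code.append(("mov", index))
        if kind == "W" and is_sc and fence_after_sc_write:
            code.append(("mfence", None))
    return code

def sc_write_read_fenced(accesses, code):
    """Check that an mfence separates every SC write from each later SC read."""
    pos = {src: i for i, (_, src) in enumerate(code) if src is not None}
    fences = [i for i, (ins, _) in enumerate(code) if ins == "mfence"]
    for w, w_label in enumerate(accesses):
        for r, r_label in enumerate(accesses):
            if w < r and w_label == ("W", "sc") and r_label == ("R", "sc"):
                if not any(pos[w] < f < pos[r] for f in fences):
                    return False
    return True

accesses = [("W", "sc"), ("R", "rlx"), ("R", "sc")]
scheme_1 = compile_tso(accesses, True, False)   # fence after SC writes
scheme_2 = compile_tso(accesses, False, True)   # fence before SC reads
mixed = compile_tso(accesses, False, False)     # writes from scheme 2, reads from scheme 1
```

Both uniform schemes place an mfence between the SC write and the later SC read, while the mixed choice emits no fence at all and so loses the property.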
5.1.2 POWER
There are two alternative sound mappings of SC accesses to POWER:
    Leading sync                     Trailing sync
    (|Rsc|)   ≜ sync;(|Racq|)        (|Rsc|)   ≜ ld;sync
    (|Wsc|)   ≜ sync;st              (|Wsc|)   ≜ (|Wrel|);sync
    (|RMWsc|) ≜ sync;(|RMWacq|)      (|RMWsc|) ≜ (|RMWrel|);sync
The first scheme inserts a sync before every SC access, while the second inserts a sync
after every SC access. Importantly, one should globally apply one of the two mappings to
ensure the existence of a sync between every two SC accesses.
Observing that sync is the result of mapping an SC-fence to POWER, we can reuse the
existing proof for the mapping of IMM to POWER. To handle the leading sync (respectively,
trailing sync) scheme, we introduce a preceding step, in which we prove that globally splitting
each SC access in the execution graph into a pair of an SC fence followed (respectively,
preceded) by a release/acquire access is a sound transformation under IMMSC. That is, this
global execution graph transformation cannot make an inconsistent execution consistent:
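▶ Theorem 12. Let G be an execution graph such that
$$[\mathtt{R}^{\mathtt{sc}} \cup \mathtt{W}^{\mathtt{sc}}] \,;\, (G.\mathtt{po}' \cup G.\mathtt{po}' \,;\, G.\mathtt{hb} \,;\, G.\mathtt{po}') \,;\, [\mathtt{R}^{\mathtt{sc}} \cup \mathtt{W}^{\mathtt{sc}}] \subseteq G.\mathtt{hb} \,;\, [\mathtt{F}^{\mathtt{sc}}] \,;\, G.\mathtt{hb},$$
where $G.\mathtt{po}' \triangleq G.\mathtt{po} \setminus G.\mathtt{rmw}$. Let G′ be the execution graph obtained from G by weakening
the access modes of SC write and read events to release and acquire modes respectively. Then,
IMMSC-consistency of G follows from IMM-consistency of G′.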
Having this theorem, we can think about mapping of IMMSC to POWER as if it consists
of three steps. We establish the correctness of each of them separately.
1. At the IMMSC level, we globally split each SC-access to an SC-fence and release/acquire
access. Correctness of this step follows by Theorem 12.
2. We map IMM to POWER, whose correctness follows by the existing results of [20], since
we do not have SC accesses at this stage.
3. We remove any redundant fences introduced by the previous step. Indeed, following the
leading sync scheme, we will obtain sync;lwsync;st for an SC write. The lwsync is
redundant here, since sync provides stronger guarantees than lwsync, and can therefore be
removed. Similarly, following the trailing sync scheme, we will obtain ld;cmp;bc;isync;sync
for an SC read. Again, the sync makes the other synchronization instructions redundant.
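The three steps above can be sketched for single SC loads and stores as follows. The instruction sequences for fences and release/acquire accesses are taken from Fig. 11; the label encoding, the function names and the two peephole rules are assumptions of this sketch rather than the mechanized proof structure.

```python
# Step 2 table: IMM labels to POWER instruction sequences (as in Fig. 11).
IMM_TO_POWER = {
    ("F", "sc"): ["sync"],
    ("R", "acq"): ["ld", "cmp", "bc", "isync"],
    ("W", "rel"): ["lwsync", "st"],
}

def split_sc_access(kind, scheme):
    """Step 1: replace an SC access by an SC fence plus a release/acquire access."""
    weak = ("R", "acq") if kind == "R" else ("W", "rel")
    fence = ("F", "sc")
    return [fence, weak] if scheme == "leading" else [weak, fence]

def map_to_power(labels):
    """Step 2: map each IMM label to its POWER instructions."""
    return [ins for label in labels for ins in IMM_TO_POWER[label]]

def remove_redundant(instrs):
    """Step 3: drop synchronization that an adjacent sync already subsumes."""
    out = []
    for ins in instrs:
        if ins == "lwsync" and out and out[-1] == "sync":
            continue  # sync;lwsync  ->  sync
        if ins == "sync" and out[-3:] == ["cmp", "bc", "isync"]:
            del out[-3:]  # cmp;bc;isync;sync  ->  sync
        out.append(ins)
    return out

def compile_sc_access(kind, scheme):
    return remove_redundant(map_to_power(split_sc_access(kind, scheme)))

leading_write = compile_sc_access("W", "leading")
trailing_read = compile_sc_access("R", "trailing")
```

Under these assumptions the pipeline reproduces the SC rows of Fig. 11, for instance `sync;st` for a leading-sync SC write and `ld;sync` for a trailing-sync SC read.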
5.1.3 ARMv7
The ARMv7 model [1] is very similar to the POWER model with the main difference being
that it has a weaker preserved program order than POWER. However, Podkopaev et al. [20]
proved IMM to POWER compilation correctness without relying on POWER’s preserved
program order explicitly, but assuming the weaker version of ARMv7’s order. Thus, their
proof also establishes correctness of compilation from IMM to ARMv7.
Extending the proof to cover SC accesses follows the same scheme discussed for POWER,
since the two intended mappings of SC accesses for ARMv7 are the same except for replacing
POWER’s sync fence with ARMv7’s dmb:
    Leading dmb                      Trailing dmb
    (|Rsc|)   ≜ dmb;(|Racq|)         (|Rsc|)   ≜ ldr;dmb
    (|Wsc|)   ≜ dmb;str              (|Wsc|)   ≜ (|Wrel|);dmb
    (|RMWsc|) ≜ dmb;(|RMWacq|)       (|RMWsc|) ≜ (|RMWrel|);dmb
5.1.4 ARMv8
Since ARMv8 has added dedicated instructions to support C/C++-style SC accesses, we
have established the correctness of a mapping employing these new instructions:
    (|Rsc|)    ≜ LDAR
    (|Wsc|)    ≜ STLR
    (|FADDsc|) ≜ L:LDAXR;STLXR;BC L
    (|CASsc|)  ≜ L:LDAXR;CMP;BC Le;STLXR;BC L;Le:
We note that in this mapping, we follow Podkopaev et al. [20] and compile RMW opera-
tions to loops with load-linked and store-conditional instructions (LDX/STX). An alternative
mapping for RMWs would be to use single hardware instructions, such as LDADD and CAS, that
directly implement the required functionality. Unfortunately, however, due to a limitation of
the current IMM setup and a lack of clarity about the exact semantics of the CAS instruction,
we are not able to prove the correctness of the alternative mapping employing these instructions.
The problem is that IMM assumes that every po-edge from a RMW instruction is preserved,
which holds for the mapping of CAS using the aforementioned loop, but not necessarily using
the single instruction.
6 Related Work
While there are several memory model definitions both for hardware architectures [1, 10, 18,
22, 23] and programming languages [3, 4, 11, 15, 19, 21] in the literature, there are relatively
few compilation correctness results [6, 9, 12, 14, 20, 25].
Most of these compilation results do not tackle any of the problems caused by po∪rf cycles,
which are the main cause of complexity in establishing correctness of compilation mappings
to hardware architectures. A number of papers (e.g., [6, 12, 25]) consider only hardware
models that forbid such cycles, such as x86-TSO [18] and “strong POWER” [13], while others
(e.g., [9]) consider compilation schemes that introduce fences and/or dependencies so as to
prevent po∪ rf cycles. The only compilation results where there is some non-trivial interplay
of dependencies are by Lahav et al. [14] and by Podkopaev et al. [20].
The former paper [14] defines the RC11 model (repaired C11), and establishes a number
of results about it, most of which are not related to compilation. The only relevant result
is its pencil-and-paper correctness proof of a compilation scheme from RC11 to POWER
that adds a fence between relaxed reads and subsequent relaxed writes, but not between
non-atomic accesses. As such, the only po ∪ rf cycles possible under the compilation scheme
involve a racy non-atomic access. Since non-atomic races have undefined semantics in RC11,
whenever there is such a cycle, the proof appeals to receptiveness to construct a different
acyclic execution exhibiting the race.
The latter paper [20] introduced IMM and used it to establish correctness of compilation
from the “promising semantics” (PS) [12] to the usual hardware models. As already men-
tioned, IMM’s definition catered precisely for the needs of the PS compilation proof, and
so did not include important features such as sequentially consistent (SC) accesses. Our
compilation proof shares some infrastructure with that proof—namely, the definition of
IMM and traversals—but also has substantial differences because PS is quite different from
Weakestmo. The main challenges in the PS proof were (1) to encode the various orders of
the IMM execution graphs with the timestamps of the PS machine, and (2) to construct the
certification runs for each outstanding promise. In contrast, the main technical challenge in
the Weakestmo compilation proof is that event structures represent several possible executions
of the program together, and that Weakestmo consistency includes constraints that correlate
these executions, allowing one execution to affect the consistency of another.
7 Conclusion
In this paper, we presented the first correctness proof of mapping from the Weakestmo
memory model to a number of hardware architectures. As a way to show correctness of
Weakestmo compilation to hardware, we employed IMM [20], which we extended with SC
accesses, from which compilation to hardware follows.
Although relying on IMM modularizes the compilation proof and makes it easy to extend
to multiple architectures, it does have one limitation. As was discussed in §5.1.4, IMM
enforces ordering between RMW events and subsequent memory accesses, while one desirable
alternative compilation mapping of RMWs to ARMv8 does not enforce this ordering, which
means that we cannot prove soundness of that mapping via the current definition of IMM.
We are investigating whether one can weaken the corresponding IMM constraint, so that we
can establish correctness of the alternative ARMv8 mapping as well.
Another way to establish correctness of this alternative mapping to ARMv8 may be to use
the recently developed Promising-ARM model [23]. Indeed, since Promising-ARM is closely
related to PS [12], it should be relatively easy to prove the correctness of compilation from
PS to Promising-ARM. Establishing compilation correctness of Weakestmo to Promising-
ARM, however, would remain unresolved because Weakestmo and PS are incomparable [6].
Moreover, a direct compilation proof would probably also be quite difficult because of the
rather different styles in which these models are defined.
Acknowledgments. Evgenii Moiseenko and Anton Podkopaev were supported by RFBR
(grant number 18-01-00380). Ori Lahav was supported by the Israel Science Foundation
(grant number 5166651), by Len Blavatnik and the Blavatnik Family foundation, and by the
Alon Young Faculty Fellowship.
References
1 Jade Alglave, Luc Maranget, and Michael Tautschnig. Herding cats: Modelling, simulation,
testing, and data mining for weak memory. ACM Trans. Program. Lang. Syst., 36(2):7:1–7:74,
July 2014. URL: http://doi.acm.org/10.1145/2627752, doi:10.1145/2627752.
2 Mark Batty, Alastair F. Donaldson, and John Wickerson. Overhauling SC atomics in C11 and
OpenCL. In POPL 2016, pages 634–648. ACM, 2016.
3 Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. Mathematizing C++
concurrency. In POPL 2011, pages 55–66, New York, 2011. ACM. doi:10.1145/1925844.1926394.
4 John Bender and Jens Palsberg. A formalization of Java’s concurrent access modes. Proc.
ACM Program. Lang., 3(OOPSLA):142:1–142:28, October 2019. URL:
http://doi.acm.org/10.1145/3360568, doi:10.1145/3360568.
5 Hans-J. Boehm and Brian Demsky. Outlawing ghosts: Avoiding out-of-thin-air results. In
MSPC 2014, pages 7:1–7:6. ACM, 2014. doi:10.1145/2618128.2618134.
6 Soham Chakraborty and Viktor Vafeiadis. Grounding thin-air reads with event structures.
Proc. ACM Program. Lang., 3(POPL):70:1–70:27, 2019. doi:10.1145/3290383.
7 The Coq development of IMM, available at http://github.com/weakmemory/imm, 2019.
8 Will Deacon. The ARMv8 application level memory model, 2017. URL:
https://github.com/herd/herdtools7/blob/master/herd/libdir/aarch64.cat.
9 Stephen Dolan, KC Sivaramakrishnan, and Anil Madhavapeddy. Bounding data races in space
and time. In PLDI 2018, pages 242–255, New York, 2018. ACM. URL:
http://doi.acm.org/10.1145/3192366.3192421, doi:10.1145/3192366.3192421.
10 Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit Sarkar, Ali Sezgin, Luc Maranget,
Will Deacon, and Peter Sewell. Modelling the ARMv8 architecture, operationally: Concurrency
and ISA. In POPL 2016, pages 608–621, New York, 2016. ACM. URL:
http://doi.acm.org/10.1145/2837614.2837615, doi:10.1145/2837614.2837615.
11 Alan Jeffrey and James Riely. On thin air reads towards an event structures model of relaxed
memory. In LICS 2016, pages 759–767, New York, 2016. ACM. URL:
http://doi.acm.org/10.1145/2933575.2934536, doi:10.1145/2933575.2934536.
12 Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. A promising
semantics for relaxed-memory concurrency. In POPL 2017, pages 175–189, New York, 2017.
ACM. doi:10.1145/3009837.3009850.
13 Ori Lahav and Viktor Vafeiadis. Explaining relaxed memory models with program transfor-
mations. In FM 2016. Springer, 2016. doi:10.1007/978-3-319-48989-6_29.
14 Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and Derek Dreyer. Repairing
sequential consistency in C/C++11. In PLDI 2017, pages 618–632, New York, 2017. ACM.
doi:10.1145/3062341.3062352.
15 Jeremy Manson, William Pugh, and Sarita V. Adve. The Java memory model. In POPL 2005,
pages 378–391, New York, 2005. ACM. doi:10.1145/1040305.1040336.
16 C/C++11 mappings to processors, 2016. URL:
http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html.
17 Evgenii Moiseenko, Anton Podkopaev, Ori Lahav, Orestis Melkonian, and Viktor Vafeiadis.
Coq proof scripts and supplementary material for this paper, available at
http://plv.mpi-sws.org/weakestmoToImm/, 2020.
18 Scott Owens, Susmit Sarkar, and Peter Sewell. A better x86 memory model: x86-TSO. In
TPHOLs 2009, volume 5674 of LNCS, pages 391–407, Heidelberg, 2009. Springer.
19 Jean Pichon-Pharabod and Peter Sewell. A concurrency semantics for relaxed atomics that
permits optimisation and avoids thin-air executions. In POPL 2016, pages 622–633, New York,
2016. ACM. doi:10.1145/2837614.2837616.
20 Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. Bridging the gap between programming
languages and hardware weak memory models. Proc. ACM Program. Lang., 3(POPL):69:1–
69:31, 2019. doi:10.1145/3290382.
21 Anton Podkopaev, Ilya Sergey, and Aleksandar Nanevski. Operational aspects of C/C++
concurrency. CoRR, abs/1606.01400, 2016. URL: http://arxiv.org/abs/1606.01400.
22 Christopher Pulte, Shaked Flur, Will Deacon, Jon French, Susmit Sarkar, and Peter Sewell.
Simplifying ARM concurrency: multicopy-atomic axiomatic and operational models for ARMv8.
Proc. ACM Program. Lang., 2(POPL):19:1–19:29, 2018. doi:10.1145/3158107.
23 Christopher Pulte, Jean Pichon-Pharabod, Jeehoon Kang, Sung-Hwan Lee, and Chung-
Kil Hur. Promising-ARM/RISC-V: a simpler and faster operational concurrency model.
In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design
and Implementation, PLDI 2019, pages 1–15, New York, NY, USA, 2019. ACM. URL:
http://doi.acm.org/10.1145/3314221.3314624, doi:10.1145/3314221.3314624.
24 Viktor Vafeiadis, Thibaut Balabonski, Soham Chakraborty, Robin Morisset, and Francesco
Zappa Nardelli. Common compiler optimisations are invalid in the C11 memory model
and what we can do about it. In POPL 2015, pages 209–220, New York, 2015. ACM.
doi:10.1145/2676726.2676995.
25 Jaroslav Ševčík, Viktor Vafeiadis, Francesco Zappa Nardelli, Suresh Jagannathan, and Peter
Sewell. CompCertTSO: A verified compiler for relaxed-memory concurrency. J. ACM, 60(3):22,
2013. doi:10.1145/2487241.2487248.
A Simulation Relation for the complete Weakestmo model
Here we present the simulation relation IT (prog, G, TC, S,X) and the auxiliary relation
Icert(prog, G, 〈C, I〉, 〈C ′, I ′〉, S,X, t, Br, σ, σ′) for the complete Weakestmo memory model.
In addition to the relaxed accesses the full versions of the relations handle fences, read-
modify-write pairs, release, acquire and sequentially consistent accesses.
We define the relation IT(prog, G, 〈C, I〉, S, X) to hold if the following conditions are met:
1. G is an IMMSC-consistent execution of prog.
2. S is a Weakestmo-consistent event structure of prog.
3. X is an extracted subset of S.
4. The s2g-image of X is equal to the union of the covered and issued events and the events
which po-precede the issued ones:
⌊⌊X⌋⌋ = C ∪ dom(G.po? ; [I])
5. The s2g-image of every event from a thread t ∈ T lies in C ∪ dom(G.po? ; [I]):
⌊⌊S.thread(T)⌋⌋ ⊆ C ∪ dom(G.po? ; [I])
6. The s2g-image of S’s event has the same thread, type, modifier, and location. Additionally,
the s2g-image of X’s event which is covered or issued has the same value:
a. ∀e ∈ S.E. S.{tid, typ, loc, mod}(e) = G.{tid, typ, loc, mod}(s2g(e))
b. ∀e ∈ X ∩ ⌈⌈C ∪ I⌉⌉. S.val(e) = G.val(s2g(e))
7. The s2g-image of S.po is a subset of the G.po relation:
⌊⌊S.po⌋⌋ ⊆ G.po
8. The identity relation in G corresponds to the identity or conflict relation in S:
⌈⌈id⌉⌉ ⊆ S.cf?
9. The s2g-image of a justification edge is included in paths in G representing observation
of the corresponding thread. The s2g-image of a justification edge is in G.rf if the
edge ends in the domain of S.rmw, ends in an acquire access, or is followed by an acquire fence.
Moreover, the s2g-image of S.jf ending in X matches the simulation reads-from relation:
a. ⌊⌊S.jf⌋⌋ ⊆ G.rf? ; (G.hb ; [G.Fsc])? ; G.pscF? ; G.hb?
b. ⌊⌊S.jf ; S.rmw⌋⌋ ⊆ G.rf ; G.rmw
c. ⌊⌊S.jf ; (S.po ; [S.F])? ; [S.E⊒acq]⌋⌋ ⊆ G.rf ; (G.po ; [S.F])? ; [G.E⊒acq]
d. ⌊⌊S.jf ; [X]⌋⌋ ⊆ G.sjf(TC)
Using the last property it is possible to derive that ⌊⌊S.jf ; [X ∩ ⌈⌈C⌉⌉]⌋⌋ ⊆ G.rf.
10. Each write event in S which justifies some read event externally should be S.ew-equal to
a write event in X whose s2g-image is issued:
dom(S.jfe) ⊆ dom(S.ew ; [X ∩ ⌈⌈I⌉⌉])
11. The s2g-image of S.ew is a subset of the identity relation:
⌊⌊S.ew⌋⌋ ⊆ id
12. Let w and w′ be different events in one S.ew equivalence class. Then, there is w′′ in this
equivalence class s.t. w′′ is in X and s2g(w′′) is issued:
S.ew ⊆ (S.ew ; [X ∩ ⌈⌈I⌉⌉] ; S.ew)?
13. The s2g-image of S.co lies in the reflexive closure of G.co. Additionally, s2g-images of
S.co-edges ending in X ∩ S.thread(T) lie in G.co:
a. ⌊⌊S.co⌋⌋ ⊆ G.co?
b. ⌊⌊S.co ; [X ∩ S.thread(T)]⌋⌋ ⊆ G.co
14. The s2g-image of S.rmw is in G.rmw. Vice versa, G.rmw ending in the covered set is in
the s2g-image of S.rmw ending in X.
a. ⌊⌊S.rmw⌋⌋ ⊆ G.rmw
b. G.rmw ; [C] ⊆ ⌊⌊S.rmw ; [X]⌋⌋
15. Let e, w, and w′ be events in S s.t. (i) 〈e, w〉 is an S.release edge, (ii) w and w′ are in the
same S.ew equivalence class, (iii) w′ is in X, and (iv) s2g(w′) is issued. Then e is in X:
dom(S.release ; S.ew ; [X ∩ ⌈⌈I⌉⌉]) ⊆ X
This property is needed to show that dom(S.hb \ S.po) is included in X.
16. Let r, r′, w, and w′ be events in S s.t. (i) r and r′ are in immediate conflict and justified
from w and w′ respectively, and (ii) r′ is in X and its thread is in T. Then s2g(w) is
G.co-less than s2g(w′):
⌊⌊S.jf ; S.cfimm ; [X ∩ S.thread(T)] ; S.jf−1⌋⌋ ⊆ G.co
This property is needed to prove cfimm-justification on the simulation step.
17. For all t ∈ T there exists σ s.t. S.KC(t) −→∗t σ and the thread-local execution graph σ.G
is equivalent modulo rf and co components to the restriction of G to the thread t.
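Several of the conditions above are plain set inclusions once s2g is lifted to sets and relations. The following Python sketch checks conditions 4, 7 and 8 on a tiny hand-made event structure; the event names, the dictionary encoding of s2g and the helper functions are assumptions of this illustration and do not mirror the Coq definitions.

```python
def image_set(s2g, events):
    """Lift s2g to a set of event-structure events."""
    return {s2g[e] for e in events}

def image_rel(s2g, rel):
    """Lift s2g to a relation on event-structure events."""
    return {(s2g[a], s2g[b]) for (a, b) in rel}

def check_conditions(s2g, s_po, s_cf, x, g_po, covered, issued):
    # Condition 4: image of X equals C ∪ dom(G.po? ; [I]).
    po_before_issued = issued | {a for (a, b) in g_po if b in issued}
    cond4 = image_set(s2g, x) == covered | po_before_issued
    # Condition 7: image of S.po is included in G.po.
    cond7 = image_rel(s2g, s_po) <= g_po
    # Condition 8: events with the same image are equal or in conflict.
    cond8 = all(a == b or (a, b) in s_cf
                for a in s2g for b in s2g if s2g[a] == s2g[b])
    return cond4, cond7, cond8

# Two conflicting reads e_r1 and e_r2 are both mapped to the graph event r.
s2g = {"e_w": "w", "e_r1": "r", "e_r2": "r", "e_w2": "w2"}
s_po = {("e_r2", "e_w2")}
s_cf = {("e_r1", "e_r2"), ("e_r2", "e_r1"), ("e_r1", "e_w2"), ("e_w2", "e_r1")}
g_po = {("r", "w2")}
covered, issued = {"w"}, {"w", "w2"}

good = check_conditions(s2g, s_po, s_cf, {"e_w", "e_r2", "e_w2"},
                        g_po, covered, issued)
bad = check_conditions(s2g, s_po, s_cf, {"e_w", "e_w2"},
                       g_po, covered, issued)
```

In the second call the extracted set omits the read that po-precedes the issued write, so condition 4 fails while the other two still hold.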
In addition to I, we also define a version of the simulation relation, Icert, which holds during
the construction of a certification branch.
We define the relation Icert(prog, G, 〈C, I〉, 〈C′, I′〉, S, X, t, Br, σ, σ′) to hold if the following
conditions are met:
1. IT\{t}(prog, G, 〈C, I〉, S, X) holds.
2. G ⊢ 〈C, I〉 −→t 〈C′, I′〉 holds.
3. σ and σ′ are thread states s.t. σ′ is reachable from σ, σ corresponds to the S.po-last
event in Br and the partial execution graph of σ′ contains covered and issued events up
to the G.po-last issued write in the thread t:
a. σ −→∗t σ′
b. σ.G.E = ⌊⌊Br⌋⌋
c. σ′.G.E = G.thread(t) ∩ (C′ ∪ dom(G.po? ; [I′]))
4. The set Br consists of the events from the thread t and covered prefixes of Br and X
restricted to thread t coincide:
a. Br ⊆ S.thread(t)
b. Br ∩ ⌈⌈C⌉⌉ = X ∩ S.thread(t) ∩ ⌈⌈C⌉⌉
5. The partial execution graph of σ′ assigns the same thread identifier, type, location and mode
as the full execution graph G does. Additionally, it assigns the same value as G to
determined events.
a. ∀e ∈ σ′.G.E. σ′.G.{tid, typ, loc, mod}(e) = G.{tid, typ, loc, mod}(e)
b. ∀e ∈ σ′.G.E ∩ G.determined(〈C′, I′〉). σ′.G.val(e) = G.val(e)
6. The s2g-image of the jf relation ending in Br is included in G.sjf(〈C′, I′〉):
⌊⌊S.jf ; [Br]⌋⌋ ⊆ G.sjf(〈C′, I′〉)
7. For every issued event from Br there exists an S.ew-equivalent in X. And, symmetrically,
every issued event from X within the processed part of the certification branch has an
S.ew-equivalent in Br.
a. Br ∩ ⌈⌈I⌉⌉ ⊆ dom(S.ew ; [X])
b. X ∩ ⌈⌈I ∩ σ.G.E⌉⌉ ⊆ dom(S.ew ; [Br])
8. The s2g-image of S.co ending in Br lies in G.co. The s2g-image of S.co ending in
X ∩ S.thread(t) and not in the processed part of the certification branch lies in G.co.
a. ⌊⌊S.co ; [Br]⌋⌋ ⊆ G.co
b. ⌊⌊S.co ; [X ∩ S.thread(t) \ ⌈⌈σ.G.E⌉⌉]⌋⌋ ⊆ G.co
9. Each G.rmw edge ending in the processed part of the certification branch is the s2g-image
of some S.rmw edge ending in Br.
G.rmw ; [C′ ∩ σ.G.E] ⊆ ⌊⌊S.rmw ; [Br]⌋⌋
10. Suppose w, w′, r, and r′ are S’s events s.t. (i) r and r′ are justified from w and w′
respectively, and (ii) r and r′ are in immediate conflict and belong to thread t. Then
s2g(w′) is G.co-greater than s2g(w) if either r′ is in Br:
⌊⌊S.jf ; S.cfimm ; [Br] ; S.jf−1⌋⌋ ⊆ G.co
or r is not in Br and r′ is in X ∩ S.thread(t):
⌊⌊S.jf ; [S.E \ Br] ; S.cfimm ; [X ∩ S.thread(t)] ; S.jf−1⌋⌋ ⊆ G.co

    (|r := [e]rlx|) ≈ “ldr”                 (|[e1]rlx := e2|) ≈ “str”
    (|r := [e]⊒acq|) ≈ “ldar”               (|[e1]⊒rel := e2|) ≈ “stlr”
    (|fenceacq|) ≈ “dmb.ld”                 (|fence≠acq|) ≈ “dmb.sy”
    (|r := FADDo(e1, e2)|) ≈ “L:” ++ ld(o) ++ st(o) ++ “bc L”
    (|r := CASo(e, eR, eW)|) ≈ “L:” ++ ld(o) ++ “cmp;bc Le;” ++ st(o) ++ “bc L;Le:”
    ld(o) ≜ o ⊒ acq ? “ldaxr;” : “ldxr;”     st(o) ≜ o ⊒ rel ? “stlxr.;” : “stxr.;”
Figure 9 Compilation scheme from IMMSC to ARMv8.
B From IMMSC to ARMv8
The intended mapping of IMM to ARMv8 is presented schematically in Fig. 9 and follows [16].
Note that acquire and SC loads are compiled to the same instruction (ldar) as well as release
and SC stores (stlr). In ARM assembly, RMWs are represented as pairs of instructions:
an exclusive load (ldxr) followed by an exclusive store (stxr). These instructions also have
stronger (SC) counterparts, ldaxr and stlxr.
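The RMW rows of Fig. 9 are string templates parameterized by the access mode, which the following Python sketch spells out. The encoding of the mode order as two sets (acq, acqrel and sc are at least acquire; rel, acqrel and sc are at least release) is an assumption of this sketch based on the usual ordering of access modes.

```python
# Assumed encoding of "o ⊒ acq" and "o ⊒ rel" for the access modes.
AT_LEAST_ACQ = {"acq", "acqrel", "sc"}
AT_LEAST_REL = {"rel", "acqrel", "sc"}

def ld(o):
    """Exclusive load for mode o, as in Fig. 9."""
    return "ldaxr;" if o in AT_LEAST_ACQ else "ldxr;"

def st(o):
    """Exclusive store for mode o, as in Fig. 9."""
    return "stlxr.;" if o in AT_LEAST_REL else "stxr.;"

def compile_fadd(o):
    return "L:" + ld(o) + st(o) + "bc L"

def compile_cas(o):
    return "L:" + ld(o) + "cmp;bc Le;" + st(o) + "bc L;Le:"

fadd_sc = compile_fadd("sc")
cas_rlx = compile_cas("rlx")
```

An SC fetch-and-add thus uses both the acquire load and the release store, while a relaxed CAS uses the plain exclusive pair.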
We use the ARMv8 declarative model [8] (see also [22]).15 Its labels are given by:
ARM read label: RoR(x, v) where x ∈ Loc, v ∈ Val, oR ∈ {rlx, Q, A}, and rlx ⊏ Q ⊏ A.
ARM write label: WoW(x, v) where x ∈ Loc, v ∈ Val, oW ∈ {rlx, L}, and rlx ⊏ L.
ARM fence label: FoF where oF ∈ {ld, sy} and ld ⊏ sy.
In turn, ARM’s execution graphs are defined as IMMSC’s ones, except for the CAS dependency,
casdep, which is not present in ARM executions.
The definition of ARMv8-consistency requires the following derived relations (see [22] for
further explanations and details):
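$$\begin{aligned}
\mathtt{obs} &\triangleq \mathtt{rfe} \cup \mathtt{fre} \cup \mathtt{coe} && \text{(observed-by)} \\
\mathtt{dob} &\triangleq (\mathtt{addr} \cup \mathtt{data}) ; \mathtt{rfi}^{?} \cup (\mathtt{ctrl} \cup \mathtt{data}) ; [\mathtt{W}] ; \mathtt{coi}^{?} \cup \mathtt{addr} ; \mathtt{po} ; [\mathtt{W}] && \text{(dependency-ordered-before)} \\
\mathtt{aob} &\triangleq \mathtt{rmw} \cup [\mathtt{W}_{\mathtt{ex}}] ; \mathtt{rfi} ; [\mathtt{R}^{\sqsupseteq \mathtt{Q}}] && \text{(atomic-ordered-before)} \\
\mathtt{bob} &\triangleq \mathtt{po} ; [\mathtt{F}^{\mathtt{sy}}] ; \mathtt{po} \cup [\mathtt{R}] ; \mathtt{po} ; [\mathtt{F}^{\mathtt{ld}}] ; \mathtt{po} \cup [\mathtt{R}^{\sqsupseteq \mathtt{Q}}] ; \mathtt{po} \cup \mathtt{po} ; [\mathtt{W}^{\mathtt{L}}] ; \mathtt{coi}^{?} \cup [\mathtt{W}^{\mathtt{L}}] ; \mathtt{po} ; [\mathtt{R}^{\mathtt{A}}] && \text{(barrier-ordered-before)}
\end{aligned}$$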
▶ Definition 13. An ARMv8 execution graph Ga is called ARMv8-consistent if the following
hold:
codom(Ga.rf) = Ga.R.
For every location x ∈ Loc, Ga.co totally orders Ga.W(x).
Ga.po|loc ∪ Ga.rf ∪ Ga.fr ∪ Ga.co is acyclic. (sc-per-loc)
Ga.rmw ∩ (Ga.fre ; Ga.coe) = ∅.
Ga.obs ∪ Ga.dob ∪ Ga.aob ∪ Ga.bob is acyclic. (external)
15 We only describe the fragment of the model that is needed for mapping of IMMSC, thus excluding isb
fences.

    (|r := [e]≠sc|) ≈ “mov”                 (|fence≠sc|) ≈ “”
    (|[e1]≠sc := e2|) ≈ “mov”               (|fencesc|) ≈ “mfence”
    (|r := FADDo(e1, e2)|) ≈ “(lock) xadd”
    (|r := CASo(e, eR, eW)|) ≈ “(lock) cmpxchg”
    Alt. 1: (|r := [e]sc|) ≈ “mov”                  Alt. 2: (|r := [e]sc|) ≈ “mfence;mov”
            (|[e1]sc := e2|) ≈ “mov;mfence”                 (|[e1]sc := e2|) ≈ “mov”
Figure 10 Compilation scheme from IMMSC to TSO.
We interpret the intended compilation on execution graphs:
▶ Definition 14. Let G be an IMM execution graph. An ARM execution graph Ga corresponds
to G if the following hold:
Ga.E = G.E and Ga.po = G.po
Ga.lab = {e ↦ (|G.lab(e)|) | e ∈ G.E} where:
    (|Rrlx_s(x, v)|) ≜ Rrlx(x, v)        (|Wrlx(x, v)|) ≜ Wrlx(x, v)
    (|Racq_s(x, v)|) ≜ RQ(x, v)          (|W⊒rel(x, v)|) ≜ WL(x, v)
    (|Rsc_s(x, v)|) ≜ RA(x, v)
    (|Facq|) ≜ Fld                       (|Frel|) = (|Facqrel|) = (|Fsc|) ≜ Fsy
G.rmw = Ga.rmw, G.data = Ga.data, and G.addr = Ga.addr
(the compilation does not change RMW pairs and data/address dependencies)
G.ctrl ⊆ Ga.ctrl
(the compilation only adds control dependencies)
[G.Rex] ;G.po ⊆ Ga.ctrl ∪Ga.rmw ∩Ga.data
(exclusive reads entail a control dependency to any future event, except for their immediate
exclusive write successor if arose from an atomic increment)
G.casdep ;G.po ⊆ Ga.ctrl
(CAS dependency to an exclusive read entails a control dependency to any future event)
We state our theorem that ensures IMMSC-consistency if the corresponding ARMv8
execution graph is ARMv8-consistent.
▶ Theorem 15. Let G be an IMM execution graph with whole serial numbers (sn[G.E] ⊆ ℕ),
and let Ga be an ARMv8 execution graph that corresponds to G. Then, ARMv8-consistency
of Ga implies IMMSC-consistency of G.
Outline. IMM-consistency of G follows from [20, Theorem 4.5]. That is, we only need to show
that acyclicity of G.pscbase ∪ G.pscF holds. We start by showing that Ga.obs′ ∪ Ga.dob ∪
Ga.aob ∪ Ga.bob′ is acyclic, where
    obs′ ≜ rfe ∪ fr ∪ co
    bob′ ≜ bob ∪ [R] ; po ; [Fld] ∪ po ; [Fsy] ∪ [F⊒ld] ; po
Then, we finish the proof by showing that Ga.pscbase ∪ Ga.pscF is included in (Ga.obs′ ∪
Ga.dob ∪ Ga.aob ∪ Ga.bob′)+. ◀
C From IMMSC to TSO
The intended mapping of IMMSC to TSO is presented schematically in Fig. 10. There are
two possible alternatives for compiling SC accesses (see the bottom of Fig. 10): to compile
an SC store to a store followed by a fence or to compile an SC load to a load preceded by a
fence. Both schemes guarantee that in the compiled code there is a fence between every
store and every subsequent load instruction originating from SC accesses. Regarding compilation
schemes of SC accesses, our proof of the compilation correctness from IMMSC to TSO depends
only on this property. Hence, in this section, we concentrate only on the compilation alternative
which compiles SC stores using fences.
As a model of the TSO architecture, we use a declarative model from [1]. Its labels are
given by:
TSO read label: R(x, v) where x ∈ Loc and v ∈ Val.
TSO write label: W(x, v) where x ∈ Loc and v ∈ Val.
TSO fence label: MFENCE.
In turn, TSO’s execution graphs are defined as IMMSC’s ones. Below, we interpret the
compilation on execution graphs.
▶ Definition 16. Let G be an IMM execution graph with whole identifiers (G.E ⊆ ℕ). A
TSO execution graph Gt corresponds to G if the following hold:
Gt.E = G.E \ G.F≠sc ∪ {n + 0.1 | n ∈ G.Wsc}
(non-SC fences are removed)
Gt.tid(e) = G.tid(⌊e⌋ + 0.1) for all e in Gt
Gt.po =
[Gt.E] ; (G.po ∪ {〈a, n + 0.1〉 | 〈a, n〉 ∈ G.po?} ∪ {〈n + 0.1, a〉 | 〈n, a〉 ∈ G.po}) ; [Gt.E]
(new events are added after SC writes)
Gt.lab = {e ↦ (|G.lab(e)|) | e ∈ G.E \ G.F≠sc} ∪ {e ↦ MFENCE | e ∈ Gt.E \ G.E} where:
    (|RoR_s(x, v)|) ≜ R(x, v)     (|WoW(x, v)|) ≜ W(x, v)     (|Fsc|) ≜ MFENCE
G.rmw = Gt.rmw, G.data = Gt.data, and G.addr = Gt.addr
(the compilation does not change RMW pairs and data/address dependencies)
G.ctrl ; [G.E \ G.F≠sc] ⊆ Gt.ctrl
(the compilation only adds control dependencies)
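The construction of Gt.E and Gt.po in Definition 16 can be sketched in Python as follows. Exact rationals are used for the n + 0.1 identifiers to avoid floating-point comparisons; the function name and the three-event example graph are assumptions of this illustration.

```python
from fractions import Fraction

TENTH = Fraction(1, 10)

def tso_events_and_po(g_events, g_po, non_sc_fences, sc_writes):
    """Build Gt.E and Gt.po from G, following the first clauses of Definition 16."""
    new_fences = {n + TENTH for n in sc_writes}
    gt_events = (g_events - non_sc_fences) | new_fences
    po_refl = g_po | {(e, e) for e in g_events}  # G.po?
    candidates = (set(g_po)
                  | {(a, n + TENTH) for (a, n) in po_refl}
                  | {(n + TENTH, a) for (n, a) in g_po})
    # The surrounding [Gt.E] ; ... ; [Gt.E] keeps only edges between Gt events.
    gt_po = {(a, b) for (a, b) in candidates
             if a in gt_events and b in gt_events}
    return gt_events, gt_po

# G: event 1 is an SC write, 2 a non-SC fence, 3 an SC read, all in one thread.
g_events = {1, 2, 3}
g_po = {(1, 2), (1, 3), (2, 3)}
gt_events, gt_po = tso_events_and_po(g_events, g_po,
                                     non_sc_fences={2}, sc_writes={1})
mfence = Fraction(11, 10)
```

In the result the non-SC fence has disappeared and the new MFENCE event 1.1 sits in program order between the SC write and the SC read, which is the property the TSO proof relies on.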
The following derived relations are used to define the TSO-consistency predicate.
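$$\begin{aligned}
\mathtt{ppo}_{\mathtt{TSO}} &\triangleq [\mathtt{R} \cup \mathtt{W}] ; \mathtt{po} ; [\mathtt{R} \cup \mathtt{W}] \setminus [\mathtt{W}] ; \mathtt{po} ; [\mathtt{R}] \\
\mathtt{fence}_{\mathtt{TSO}} &\triangleq [\mathtt{R} \cup \mathtt{W}] ; \mathtt{po} ; [\mathtt{MFENCE}] ; \mathtt{po} ; [\mathtt{R} \cup \mathtt{W}] \\
\mathtt{implied\_fence}_{\mathtt{TSO}} &\triangleq [\mathtt{W}] ; \mathtt{po} ; [\mathit{dom}(\mathtt{rmw})] \cup [\mathit{codom}(\mathtt{rmw})] ; \mathtt{po} ; [\mathtt{R}] \\
\mathtt{hb}_{\mathtt{TSO}} &\triangleq \mathtt{ppo}_{\mathtt{TSO}} \cup \mathtt{fence}_{\mathtt{TSO}} \cup \mathtt{implied\_fence}_{\mathtt{TSO}} \cup \mathtt{rfe} \cup \mathtt{co} \cup \mathtt{fr}
\end{aligned}$$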
▶ Definition 17. G is called TSO-consistent if the following hold:
codom(G.rf) = G.R. (rf-completeness)
For every location x ∈ Loc, G.co totally orders G.W(x). (co-totality)
po|loc ∪ rf ∪ fr ∪ co is acyclic. (sc-per-loc)
G.rmw ∩ (G.fre ;G.coe) = ∅. (atomicity)
G.hbTSO is acyclic. (tso-no-thin-air)
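As an illustration of the last condition, the Python sketch below builds hbTSO for the store-buffering pattern with and without MFENCE events. The relation encoding, the helper names and the omission of initialization writes (fr is given directly) are assumptions of this sketch.

```python
def compose(r1, r2):
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}

def restrict(rel, dom, cod):
    """The relation [dom] ; rel ; [cod]."""
    return {(a, b) for (a, b) in rel if a in dom and b in cod}

def acyclic(rel):
    closure = set(rel)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return all(a != b for (a, b) in closure)
        closure |= extra

def hb_tso(kind, po, rfe, co, fr, rmw=frozenset()):
    """hbTSO built from the derived relations listed above."""
    reads = {e for e, k in kind.items() if k == "R"}
    writes = {e for e, k in kind.items() if k == "W"}
    fences = {e for e, k in kind.items() if k == "F"}
    accesses = reads | writes
    ppo = restrict(po, accesses, accesses) - restrict(po, writes, reads)
    fence = compose(restrict(po, accesses, fences),
                    restrict(po, fences, accesses))
    rmw_dom = {a for (a, _) in rmw}
    rmw_cod = {b for (_, b) in rmw}
    implied = restrict(po, writes, rmw_dom) | restrict(po, rmw_cod, reads)
    return ppo | fence | implied | rfe | co | fr

# Store buffering: W(x,1); R(y,0)  ∥  W(y,1); R(x,0).
fr = {(2, 3), (4, 1)}
plain = hb_tso({1: "W", 2: "R", 3: "W", 4: "R"},
               {(1, 2), (3, 4)}, set(), set(), fr)
fenced = hb_tso({1: "W", 5: "F", 2: "R", 3: "W", 6: "F", 4: "R"},
                {(1, 5), (5, 2), (1, 2), (3, 6), (6, 4), (3, 4)},
                set(), set(), fr)
sb_allowed_without_fences = acyclic(plain)
sb_allowed_with_fences = acyclic(fenced)
```

Without fences the write-to-read program-order edges are dropped from ppoTSO, so hbTSO is acyclic and the weak outcome is allowed; with an MFENCE in each thread, fenceTSO restores those edges and the cycle appears.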
Next, we state our theorem that ensures IMMSC-consistency if the corresponding TSO
execution graph is TSO-consistent.
▶ Theorem 18. Let G be an IMMSC execution graph with whole identifiers (G.E ⊆ ℕ), and
let Gt be a TSO execution graph that corresponds to G. Then, TSO-consistency of Gt
implies IMMSC-consistency of G.
Outline. Since Gt corresponds to G, we know that
[G.Wsc] ; G.po ; [G.Rsc] ⊆ Gt.po ; [Gt.MFENCE] ; Gt.po
by the aforementioned property of the compilation scheme. We show that
Gt.ehbTSO ≜ Gt.hbTSO ∪ [Gt.MFENCE] ; Gt.po ∪ [Gt.MFENCE] ; Gt.po
is acyclic. Then, we show that G.pscbase ∪ G.pscF is included in Gt.ehbTSO+. It follows that
G.pscbase ∪ G.pscF is acyclic, and it remains to prove that G is IMM-consistent.
That is done by standard relational techniques (see [7]). ◀
    (|r := [e]rlx|) ≈ “ld”                       (|[e1]rlx := e2|) ≈ “st”
    (|r := [e]acq|) ≈ “ld;cmp;bc;isync”          (|[e1]rel := e2|) ≈ “lwsync;st”
    (|fence≠sc|) ≈ “lwsync”                      (|fencesc|) ≈ “sync”
    (|r := FADDo(e1, e2)|) ≈ wmod(o) ++ “L:lwarx;stwcx.;bc L” ++ rmod(o)
    (|r := CASo(e, eR, eW)|) ≈ wmod(o) ++ “L:lwarx;cmp;bc Le;stwcx.;bc L;Le:” ++ rmod(o)
    wmod(o) ≜ o ⊒ rel ? “lwsync;” : “”           rmod(o) ≜ o ⊒ acq ? “;isync” : “”
    Leading sync:                                Trailing sync:
    (|r := [e]sc|) ≈ “sync;ld;cmp;bc;isync”      (|r := [e]sc|) ≈ “ld;sync”
    (|[e1]sc := e2|) ≈ “sync;st”                 (|[e1]sc := e2|) ≈ “lwsync;st;sync”
Figure 11 Compilation scheme from IMM to POWER.
D From IMMSC to POWER
Here we use the same mapping of IMM to POWER (see Fig. 11) as in [20] for all instructions
except for SC accesses. For the latter, there are two standard compilation schemes [16]
presented at the bottom of Fig. 11: with leading and trailing sync fences.
The next definition presents the correspondence between IMM execution graphs and their
mapped POWER ones following the leading compilation scheme in Fig. 11 with elimination
of the aforementioned redundancy of SC write compilation.
I Definition 19. Let G be an IMM execution graph with whole identifiers (G.E ⊆ N). A
POWER execution graph Gp corresponds to G if the following hold:
Gp.E = G.E ∪ {n + 0.1 | n ∈ (G.R⊒acq \ dom(G.rmw)) ∪
codom([G.R⊒acq];G.rmw)}
∪ {n − 0.1 | n ∈ (G.E⊒rel \ dom(G.rmw)) ∪
dom(G.rmw;[G.W⊒rel])}
(new events are added after acquire reads and acquire RMW pairs and before SC accesses
and SC RMW pairs)
Gp.tid(e) = G.tid(⌊e + 0.1⌋) for all e in Gp
Gp.po = G.po ∪ ((Gp.E×Gp.E) ∩
({〈a, n− 0.1〉 | 〈a, n〉 ∈ G.po} ∪
{〈n− 0.1, a〉 | 〈n, a〉 ∈ G.po?} ∪
{〈a, n+ 0.1〉 | 〈a, n〉 ∈ G.po?} ∪
{〈n+ 0.1, a〉 | 〈n, a〉 ∈ G.po}))
Gp.lab = {e ↦ (|G.lab(e)|) | e ∈ G.E} ∪
{n + 0.1 ↦ Fisync | n + 0.1 ∈ Gp.E ∧ n ∈ N} ∪
{n − 0.1 ↦ Flwsync | n − 0.1 ∈ Gp.E ∧ n ∈ N ∧
n ∉ G.Esc ∪ dom(G.rmw;[G.Wsc])} ∪
{n − 0.1 ↦ Fsync | n − 0.1 ∈ Gp.E ∧ n ∈ N ∧
n ∈ G.Esc ∪ dom(G.rmw;[G.Wsc])}
where:
(|RoRs(x, v)|) ≜ R(x, v)
(|WoW(x, v)|) ≜ W(x, v)
(|Facq|) = (|Frel|) = (|Facqrel|) ≜ Flwsync
(|Fsc|) ≜ Fsync
G.rmw = Gp.rmw, G.data = Gp.data, and G.addr = Gp.addr
(the compilation does not change RMW pairs and data/address dependencies)
G.ctrl ⊆ Gp.ctrl
(the compilation only adds control dependencies)
[G.R⊒acq];G.po ⊆ Gp.rmw ∪ Gp.ctrl
(a control dependency is placed from every acquire or SC read)
[G.Rex];G.po ⊆ Gp.ctrl ∪ (Gp.rmw ∩ Gp.data)
(exclusive reads entail a control dependency to any future event, except for their immediate
exclusive write successor if it arose from an atomic increment)
G.data ; [codom(G.rmw)] ;G.po ⊆ Gp.ctrl
(data dependency to an exclusive write entails a control dependency to any future event)
G.casdep ;G.po ⊆ Gp.ctrl
(CAS dependency to an exclusive read entails a control dependency to any future event)
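To make the event and label clauses of the leading scheme concrete, the following sketch computes the fence events added around plain (non-RMW) accesses. It is an illustration under assumptions: the dictionary encoding of labels, the string mode names, and the function name `leading_fences` are hypothetical, RMW pairs and fence events of G are left out, and only Gp.E and Gp.lab for the new events are covered (not Gp.po or the dependency clauses). Exact fractions stand in for the identifiers n ± 0.1.

```python
from fractions import Fraction

TENTH = Fraction(1, 10)
AT_LEAST_ACQ = {"acq", "acqrel", "sc"}  # modes o with o ⊒ acq
AT_LEAST_REL = {"rel", "acqrel", "sc"}  # modes o with o ⊒ rel


def leading_fences(lab):
    """lab maps an integer event id to (kind, mode), kind in {"R", "W"}.

    Returns the labels of the events added by the leading scheme,
    assuming G has no RMW pairs (so dom(G.rmw) is empty).
    """
    fences = {}
    for n, (kind, mode) in lab.items():
        # n + 0.1: isync after every read whose mode is at least acquire.
        if kind == "R" and mode in AT_LEAST_ACQ:
            fences[n + TENTH] = "Fisync"
        # n - 0.1: sync before SC accesses, lwsync before other events
        # whose mode is at least release.
        if mode in AT_LEAST_REL:
            fences[n - TENTH] = "Fsync" if mode == "sc" else "Flwsync"
    return fences


# Hypothetical single-thread graph: five accesses in program order.
example = {
    1: ("W", "rlx"),
    2: ("W", "rel"),
    3: ("R", "acq"),
    4: ("R", "sc"),
    5: ("W", "sc"),
}

for ident, label in sorted(leading_fences(example).items()):
    print(float(ident), label)
```

On this example the result agrees with the leading column of Fig. 11: a release write is preceded by lwsync, an acquire read is followed by isync, and SC accesses are preceded by sync.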
The correspondence between IMM and POWER execution graphs which follows the trailing
compilation scheme may be presented similarly, with two main differences. First, obviously,
SC accesses are compiled to release and acquire accesses followed by SC fences:
Gp.E = G.E ∪ {n + 0.1 | n ∈ (G.E⊒acq \ dom(G.rmw)) ∪
codom([G.R⊒acq];G.rmw)}
∪ {n − 0.1 | n ∈ (G.W⊒rel \ dom(G.rmw)) ∪
dom(G.rmw;[G.W⊒rel])}
Gp.lab = {e ↦ (|G.lab(e)|) | e ∈ G.E} ∪
{n + 0.1 ↦ Fisync | n + 0.1 ∈ Gp.E ∧ n ∈ N ∧
n ∈ G.Racq ∪ codom([G.Racq];G.rmw)} ∪
{n + 0.1 ↦ Fsync | n + 0.1 ∈ Gp.E ∧ n ∈ N ∧
n ∈ G.Esc ∪ codom([G.Rsc];G.rmw)} ∪
{n − 0.1 ↦ Flwsync | n − 0.1 ∈ Gp.E ∧ n ∈ N+}
Second, [G.R⊒acq];G.po has to be included in Gp.rmw ∪ Gp.ctrl ∪ Gp.po;[Gp.Flwsync];Gp.po?,
not just in Gp.rmw ∪ Gp.ctrl, to allow for elimination of the aforementioned SC read
compilation redundancy.
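The trailing variant of the event and label clauses can be sketched in the same spirit. As before, this is an illustration under assumptions: the label encoding and the name `trailing_fences` are hypothetical, G is taken to contain only plain reads and writes (no RMW pairs, no fences), and exact fractions stand in for the identifiers n ± 0.1.

```python
from fractions import Fraction

TENTH = Fraction(1, 10)
AT_LEAST_REL = {"rel", "acqrel", "sc"}  # modes o with o ⊒ rel


def trailing_fences(lab):
    """lab maps an integer event id to (kind, mode), kind in {"R", "W"}.

    Returns the labels of the events added by the trailing scheme,
    assuming G has no RMW pairs.
    """
    fences = {}
    for n, (kind, mode) in lab.items():
        # n + 0.1: sync after every SC access, isync after acquire reads.
        if mode == "sc":
            fences[n + TENTH] = "Fsync"
        elif kind == "R" and mode == "acq":
            fences[n + TENTH] = "Fisync"
        # n - 0.1: lwsync before every write whose mode is at least release.
        if kind == "W" and mode in AT_LEAST_REL:
            fences[n - TENTH] = "Flwsync"
    return fences


# Hypothetical single-thread graph: five accesses in program order.
example = {
    1: ("W", "rlx"),
    2: ("W", "rel"),
    3: ("R", "acq"),
    4: ("R", "sc"),
    5: ("W", "sc"),
}

for ident, label in sorted(trailing_fences(example).items()):
    print(float(ident), label)
```

On this example an SC read is followed by sync and an SC write is surrounded by lwsync and sync, which agrees with the trailing column of Fig. 11.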
The next theorem ensures IMMSC-consistency if the corresponding POWER execution
graph is POWER-consistent.
▶ Theorem 20. Let G be an IMM execution graph with whole identifiers (G.E ⊆ N), and let
Gp be a POWER execution graph that corresponds to G. Then, POWER-consistency of Gp
implies IMMSC-consistency of G.
Outline. We construct an IMM execution graph G′ by inserting SC fences before SC accesses
in G. We also construct GNoSC from G′ by replacing SC write and read accesses of G′
with release write and acquire read ones respectively. Obviously, IMMSC-consistency of G
follows from IMMSC-consistency of G′, which, in turn, follows from IMM-consistency of GNoSC
by Theorem 12. We construct an IMM execution graph G′′ from GNoSC by inserting release
fences before release writes, and then an IMM execution graph GNoRel from G′′ by weakening
the access modes of release write events to a relaxed mode. As in the previous proof step,
IMM-consistency of GNoSC follows from IMM-consistency of G′′, which in turn follows from
IMM-consistency of GNoRel by [20, Theorem 4.1].
Thus, to prove the theorem, we need to show that GNoRel is IMM-consistent. Note that Gp,
the POWER execution graph corresponding to G, also corresponds to GNoRel by construction
of GNoRel. Hence, IMM-consistency of GNoRel follows from POWER-consistency of Gp by [20,
Theorem 4.3], since GNoRel contains no SC read or write access events and no release
write access events. ◀
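Read from the hardware side upward, the outline amounts to the following chain of implications. This is only a restatement of the steps above in LaTeX; the predicate names are informal shorthand rather than notation from the paper.

```latex
\begin{align*}
\text{POWER-consistent}(G_p)
  &\implies \text{IMM-consistent}(G_{\mathit{NoRel}})
     && \text{by [20, Theorem 4.3]} \\
  &\implies \text{IMM-consistent}(G'')
     && \text{by [20, Theorem 4.1]} \\
  &\implies \text{IMM-consistent}(G_{\mathit{NoSC}})
     && \text{($G''$ adds release fences before release writes)} \\
  &\implies \text{IMMSC-consistent}(G')
     && \text{by Theorem 12} \\
  &\implies \text{IMMSC-consistent}(G)
     && \text{($G'$ adds SC fences before SC accesses)}
\end{align*}
```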
