Weakestmo is a recently proposed memory consistency model that uses event structures to resolve the infamous "out-of-thin-air" problem. Although it has been shown to have important benefits over other memory models, its established compilation schemes are suboptimal in that they add more fences than necessary. In this paper, we prove the correctness in Coq of the intended compilation schemes for Weakestmo to a range of hardware memory models (x86, POWER, ARMv7, ARMv8, RISC-V). Our proof is the first that establishes correctness of compilation of an event-structure-based model that forbids "thin-air" behaviors, as well as the first mechanized compilation proof of a weak memory model supporting sequentially consistent accesses to such a range of hardware platforms. Our compilation proof goes via the recent Intermediate Memory Model (IMM), which we suitably extend with sequentially consistent accesses.
Introduction
A large body of research on weak memory models has recently been devoted to developing models that allow load-to-store reordering (a.k.a. load buffering, LB) combined with compiler optimizations (e.g., elimination of fake dependencies), while forbidding "out-of-thin-air" behaviors [16, 10, 4, 12]. For example, under the assumption that locations x and y are initialized to 0, it is desirable to allow the annotated outcome a = b = 1 for LB-fake, while forbidding it for LB-data. The most established model that meets this desideratum is the promising semantics of Kang et al. [11] (henceforth, PS), an operational model based on timestamps and 'promises' of future writes. Podkopaev et al. [17] recently proved the correctness of compilation from PS to hardware memory models by introducing an intermediate memory model, called IMM [17], that abstracts over the major existing hardware models (x86-TSO [15], POWER [1], ARMv7 [1], ARMv8 [19], RISC-V [21, 22]), thereby modularizing the proof.
Nevertheless, PS has some drawbacks pertaining to its compilation: (1) it does not support sequentially consistent (SC) accesses, the default C/C++ atomic access mode; (2) its compilation of read-modify-write (RMW) operations (e.g., compare-and-swap and fetch-and-add) to ARMv8 requires an extra fence. Moreover, PS is not very flexible in that it is not easy to adapt its definition.
For these reasons, Chakraborty and Vafeiadis [5] introduced the Weakestmo memory model based on event structures. Being largely declarative, Weakestmo is flexible and supports SC accesses, and was conjectured to support optimal compilation to hardware memory models, i.e., not requiring any fences or fake dependencies for relaxed accesses (incl. RMWs). Because of the difficulty of establishing such a result, Chakraborty and Vafeiadis only established correctness of the intended compilation scheme to x86-TSO, as well as correctness of two suboptimal compilation schemes to POWER and ARMv7 that involve more expensive fences than intended. Compilation to ARMv8 was not established.
In this paper, we formalize Weakestmo in Coq and establish the correctness of the intended compilation schemes from Weakestmo to the aforementioned hardware architectures with a mechanized Coq proof of about 30K lines on top of an existing infrastructure of about 19K lines. Our proof also revealed some minor deficiencies in the Weakestmo model, which we have addressed. To the best of our knowledge, our proof is the first mechanized compilation proof of an event-structure-based memory model, as well as the first mechanized compilation proof of a weak memory model supporting SC accesses. The latter, perhaps counterintuitively, are not easy to support and have had a history of wrong compilation correctness arguments (see [12] for details). To achieve this, we introduce IMM SC , an extension of IMM with SC accesses, which we consider a valuable secondary contribution, since IMM SC is already used for proving the correctness of compilation of the OCaml and JavaScript models. The compilation proof is structured as shown in Fig. 1, with the bold arrow representing the main result, the others being extensions of previous results, and double arrows denoting results for two compilation schemes.
Outline. We start with an overview of IMM, Weakestmo, and our compilation proof (§2); we then present Weakestmo (§3), IMM SC (§4), the simulation relation used in the compilation correctness proof (§5), and the proof of the simulation step (§6). The associated proof scripts can be found at http://github.com/weakmemory/weakestmoToImm and http://github.com/weakmemory/imm.
Overview of the Compilation Correctness Proof
To get an idea about the IMM and Weakestmo memory models, consider a version of the LB-fake and LB-data programs from §1 with no dependency in thread I, which we refer to as LB. IMM is a declarative (also called axiomatic) model identifying a program's semantics with a set of execution graphs, or just executions. As an example, Fig. 2a contains G LB , an IMM execution graph of LB corresponding to an execution yielding the annotated behavior. Vertices of execution graphs, called events, represent memory accesses that are either initializations of memory or generated by the execution of program instructions. Each non-initialization event is labeled with the type of the access (e.g., R for reads, W for writes), the location accessed, and the value read or written. Memory initialization consists of a set of events labeled W(x, 0) for each location x used in the program; for conciseness, however, we depict it as a single event Init.
The edges represent different relations on events. In Fig. 2, three different relations are depicted. The program order relation (po) totally orders events originating from the same thread according to their order in the program, and places the initialization event(s) before all other events. The reads-from relation (rf) relates a write event to the read events that read from it. The data relation represents a syntactic data dependency between events of the same thread (e.g., a write storing the value read by a prior read). In examples, we depict only immediate po edges and omit po edges between events connected by another thread-local relation like data.
As a declarative memory model, IMM employs a consistency predicate to define which executions are allowed (i.e., are IMM-consistent). The predicate is defined as a collection of constraints forbidding cycles of different shapes in execution graphs. For example, IMM forbids cycles consisting only of rf and data edges. That is, the execution in Fig. 2b, which represents the annotated behavior of LB-fake and LB-data, is not IMM-consistent. In contrast, G LB is IMM-consistent, and so IMM allows the annotated behavior of LB, but forbids those of LB-fake and LB-data.
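The constraint just mentioned can be illustrated with a minimal Python sketch. It encodes only the rf and data edges of the two executions and checks for a cycle. The event names (`r_x`, `w_y`, and so on) and the helper `has_cycle` are hypothetical, and the check covers this single constraint, not the full IMM-consistency predicate.

```python
def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as (src, dst) pairs."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
    on_path, done = set(), set()

    def visit(node):
        on_path.add(node)
        for nxt in succ.get(node, ()):
            if nxt in on_path or (nxt not in done and visit(nxt)):
                return True
        on_path.discard(node)
        done.add(node)
        return False

    # Only nodes with outgoing edges can lie on a cycle.
    return any(n not in done and visit(n) for n in list(succ))


# Thread I: r_x = R(x, 1); w_y = W(y, 1).  Thread II: r_y = R(y, 1); w_x = W(x, 1).
rf = {("w_x", "r_x"), ("w_y", "r_y")}
data_lb = {("r_y", "w_x")}                  # LB: dependency only in thread II
data_lb_dep = data_lb | {("r_x", "w_y")}    # LB-data / LB-fake: dependency in both threads

lb_allowed = not has_cycle(rf | data_lb)
lb_dep_allowed = not has_cycle(rf | data_lb_dep)
```

Under this encoding, the LB execution passes the check, whereas the execution with a dependency in both threads contains an rf/data cycle and is rejected.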
Execution graphs of a program are constructed in three steps in IMM. First, sequential executions are built for each thread in accordance with a thread-local semantics, which non-deterministically picks for each read access the value being read (among the set of all possible values). Second, the executions of different threads are combined into a single complete execution graph. An execution graph is complete if each read event in it is connected to some write of the same location and value by the rf relation. Third, the execution graphs are filtered by the IMM-consistency predicate. The IMM-consistent ones form the program's semantics.
Since Weakestmo supports C11-style sequentially consistent (SC) accesses, which IMM does not support, we extend IMM with SC accesses following their axiomatization by Lahav et al. [12]. Our extended model, IMM SC , adds some constraints to its consistency definition for SC accesses. The extension is conservative in that programs without SC accesses have the same semantics under IMM and IMM SC .
An Informal Introduction to Weakestmo
We move on to the Weakestmo model, which also defines a program's semantics as a set of execution graphs. However, they are constructed differently: they are extracted from a final event structure, which Weakestmo builds for a program. The event structures themselves are built operationally. For example, a sequence of event structures that Weakestmo constructs for LB is presented in Fig. 3.
The initial event structure consists of only initial events. Then, Weakestmo may add an event representing the execution of the first instruction of a program's thread. Then, Weakestmo may execute the second instruction of this thread or the first instruction of another thread, etc. Fig. 3a depicts the event structure S a obtained from the initial event structure by executing a := [x] in LB's thread I. As a result of the instruction execution, a read event e 1 11 : R(x, 0) is added. Whenever the event added is a read, Weakestmo selects a write event to the same location to justify the value returned by the read. In this case, there is only one write to x, the initialization write, and so S a has a justified-from edge, denoted jf, going to e 1 11 . This is a requirement of Weakestmo: each read event in an event structure has to be justified from exactly one write event with the same value and location. As a consequence of how Weakestmo event structures are constructed, po ∪ jf is guaranteed to be acyclic.
Note that Weakestmo event structures do not track syntactic dependencies, e.g., S d in Fig. 3d does not contain a data edge between e 2 11 and e 2 21 . The reason is that Weakestmo is a programming-language-level memory model and supports optimizations removing fake dependencies.
The next step (Fig. 3e) is more interesting because it showcases the key distinction between event structures and execution graphs, namely that event structures may contain more than one execution for each thread. Specifically, the transition from S d to S e reruns the first instruction of thread I and adds a new event e 1 12 justified by a different write event. We say that this new event conflicts with e 1 11 because they cannot both occur in a single execution. Technically, conflicting events are represented by a symmetric "conflict" relation cf. Because of conflicts, po in event structures does not totally order all events of a thread; e.g., e 1 11 and e 1 12 are not po-ordered in S e .
Two events of the same thread are in conflict iff they are not ordered by po. By construction, cf "extends downwards": po-successors of conflicting events are also in conflict with one another.
The final construction step (Fig. 3f) illustrates a further feature of Weakestmo: conflicting writes of the same value to the same location may be announced equal writes, i.e., connected by an equivalence relation ew, e.g., events e 1 21 and e 1 22 in S f . The ew relation is used to define Weakestmo's version of the reads-from relation, rf, which relates a read to all (non-conflicting) writes equal to the write justifying the read. For example, e 2 11 reads from both e 1 21 and e 1 22 . Weakestmo's rf relation is used for the extraction of program executions. An execution graph G is extracted from an event structure S, denoted S ⊲ G, if, among other requirements, G is a maximal conflict-free subset of S such that each read event in G reads from some write in G according to S.rf. Two execution graphs can be extracted from S f : {Init, e 1 11 , e 1 21 , e 2 11 , e 2 21 } and {Init, e 1 12 , e 1 22 , e 2 11 , e 2 21 }, representing the outcomes a = 0 ∧ b = 1 and a = b = 1, respectively.
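The extraction of the two execution graphs from S f can be replayed with a small Python sketch. It is a simplification under stated assumptions: only the two requirements named above (maximal conflict-free subset, every read has an rf source inside the subset) are checked, rf is computed as ew composed with jf minus conflicting pairs, and identifiers such as `e1_11` (standing for e 1 11) and `extracted_candidates` are hypothetical.

```python
from itertools import combinations

events = ["Init", "e1_11", "e1_21", "e1_12", "e1_22", "e2_11", "e2_21"]
reads = {"e1_11", "e1_12", "e2_11"}

# Thread I has two conflicting branches; cf is symmetric and extends downwards.
branch_a, branch_b = {"e1_11", "e1_21"}, {"e1_12", "e1_22"}
cf = {(a, b) for a in branch_a for b in branch_b}
cf |= {(b, a) for (a, b) in cf}

jf = {("Init", "e1_11"), ("e1_21", "e2_11"), ("e2_21", "e1_12")}
# ew: identity plus the pair of equal writes (reflexivity on non-writes is harmless here).
ew = {(e, e) for e in events} | {("e1_21", "e1_22"), ("e1_22", "e1_21")}
# rf relates a read to every non-conflicting write equal to its justifying write.
rf = {(w2, r) for (w2, w1) in ew for (w, r) in jf if w1 == w} - cf


def conflict_free(xs):
    return not any((a, b) in cf for a in xs for b in xs)


def extracted_candidates():
    subsets = [frozenset(c) for n in range(len(events) + 1)
               for c in combinations(events, n)]
    free = [s for s in subsets if conflict_free(s)]
    maximal = [s for s in free if not any(s < t for t in free)]
    # Keep the subsets in which every read has an rf source inside the subset.
    return [s for s in maximal
            if all(any((w, r) in rf for w in s) for r in s & reads)]
```

Brute-force enumeration is viable only because S f has seven events; it is meant to mirror the prose, not to suggest how extraction is handled in the Coq development.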
Weakestmo to IMM SC Compilation: High-Level Proof Structure
In this paper, we assume that Weakestmo is defined for the same assembly language as IMM (see [17, Fig. 2]) extended with SC accesses and refer to the language as L. Having that, we show the correctness of the identity mapping as a compilation scheme from Weakestmo to IMM SC in the following theorem.
Theorem 1. Let prog be a program in L, and G be an IMM SC -consistent execution graph of prog. Then there exists an event structure S of prog under Weakestmo such that S ⊲ G.
To prove the theorem, we show that Weakestmo may construct the needed event structure in a step-by-step fashion by following a traversal of the IMM SC -consistent execution graph from [17, §§6,7].
A traversal of an IMM SC -consistent execution graph G is a sequence of traversal steps between traversal configurations. A traversal configuration T C of an execution graph G is a pair of sets of events, ⟨C, I⟩, called the covered and issued sets, respectively. As an example, Fig. 4 presents all traversal configurations of the execution graph G LB of LB from Fig. 2a except for the initial one (six in total), with the issued and covered sets marked. A traversal might be seen as an execution of an abstract machine that is allowed to perform write instructions out of order but has to execute everything else in order. The first option corresponds to issuing a write event, and the second option to covering an event. The traversal strategy has certain constraints. To issue a write event, all external reads that it depends upon must read from issued events, while to cover an event, all its po-predecessors must also be covered. For example, a traversal cannot issue e 2 2 : W(x, 1) before issuing e 1 2 : W(y, 1) in Fig. 4, or cover e 1 1 : R(x, 1) before issuing e 2 2 : W(x, 1). These constraints allow us to simulate the traversal by constructing a Weakestmo event structure.
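As a rough illustration of these constraints, the following Python sketch enumerates enabled traversal steps for G LB and runs one traversal. The rules are a simplified assumption rather than the definition from [17]: a write can be issued once every read it depends on reads from an issued write (all rf edges in LB are external), and an event can be covered once its po-predecessors are covered and, additionally, a write is issued or a read's rf source is issued. Names such as `e1_1` (for e 1 1), `enabled`, and `traverse` are hypothetical, and initialization events are omitted.

```python
EVENTS = ["e1_1", "e1_2", "e2_1", "e2_2"]    # R(x,1), W(y,1), R(y,1), W(x,1)
WRITES = {"e1_2", "e2_2"}
PO = {("e1_1", "e1_2"), ("e2_1", "e2_2")}
RF = {("e2_2", "e1_1"), ("e1_2", "e2_1")}
DEPS = {("e2_1", "e2_2")}                    # data dependency in thread II only


def rf_source(read):
    return next(w for (w, r) in RF if r == read)


def enabled(covered, issued):
    """List the steps allowed from configuration <covered, issued> (issue steps first)."""
    steps = []
    for w in sorted(WRITES - issued):
        dep_reads = [r for (r, w2) in DEPS if w2 == w]
        if all(rf_source(r) in issued for r in dep_reads):
            steps.append(("issue", w))
    for e in EVENTS:
        if e in covered:
            continue
        preds_covered = all(p in covered for (p, e2) in PO if e2 == e)
        ready = e in issued if e in WRITES else rf_source(e) in issued
        if preds_covered and ready:
            steps.append(("cover", e))
    return steps


def traverse():
    """Greedy traversal: always take the first enabled step."""
    covered, issued, trace = set(), set(), []
    while covered != set(EVENTS):
        kind, e = enabled(covered, issued)[0]
        (issued if kind == "issue" else covered).add(e)
        trace.append((kind, e))
    return trace
```

With this encoding, the greedy traversal takes six steps, two issue steps followed by four cover steps, which is the number of non-initial configurations mentioned above.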
According to [17, Prop. 6.5], every IMM-consistent execution graph G has a full traversal of the following form:
$$G \vdash TC_{\mathit{init}}(G) \longrightarrow^{*} TC_{\mathit{final}}(G)$$
where the initial configuration, T C init (G) ≜ ⟨G.Init, G.Init⟩, has covered/issued only G's initial events and the final configuration, T C final (G) ≜ ⟨G.E, G.W⟩, has covered all G's events and issued all its write events.
Our simulation proof is divided into the following three lemmas, which establish a simulation relation, I(prog, G, T C, S, X), between the current traversal configuration T C of execution G and the current event structure's state ⟨S, X⟩, where X is a subset of events corresponding to a particular execution graph extracted from the event structure S.
Lemma 1. Let prog be a program of L, and G be an IMM SC -consistent execution graph of prog. Then I(prog, G, T C init (G), S init (prog), S init (prog).E) holds.
Lemma 2. If I(prog, G, T C, S, X) and G ⊢ T C −→ T C ′ hold, then there exist S ′ and X ′ such that I(prog, G, T C ′ , S ′ , X ′ ) and S −→* S ′ hold.
Lemma 3. If I(prog, G, T C final (G), S, X) holds, then the execution graph associated with X is isomorphic to G.
The proof of Theorem 1 then proceeds by induction on the length of the traversal G ⊢ T C init (G) −→* T C final (G). Lemma 1 serves as the base case, Lemma 2 is the induction step, simulating each traversal step with a number of event structure construction steps, and Lemma 3 concludes the proof.
The proofs of Lemmas 1 and 3 are technical but fairly straightforward. In contrast, Lemma 2 is much more difficult to prove. As we will see, simulating a traversal step sometimes requires constructing a new branch in the event structure, i.e., adding multiple events. For this reason, we introduce an intermediate simulation relation that holds throughout that construction (see §6).
Weakestmo to IMM SC Compilation Correctness by Example
Before presenting any formal definitions, we conclude this overview section by showcasing the construction used in the proof of Lemma 2 on execution graph G LB in Fig. 2a following the traversal of Fig. 4 . We have actually already seen the sequence of event structures constructed in Fig. 3 . Note that, even though Figures 3 and 4 have the same number of steps, there is no one-to-one correspondence between them as we explain below.
Consider the last event structure S f from Fig. 3. A subset of its events X f ≜ {Init, e 1 12 , e 1 22 , e 2 11 , e 2 21 }, which we call a simulated execution, is a maximal conflict-free subset of S f , and all read events in X f are in the codomain of S f .rf restricted to X f . Then, by definition, X f is extracted from S f . Also, the execution graph induced by X f is isomorphic to G LB . That is, the construction of S f for LB shows that in Weakestmo it is possible to observe the same behavior as G LB . Now, we explain how we construct S f and choose X f .
During the simulation, we maintain the relation I(prog, G, T C, S, X) connecting a program prog, its execution graph G, its traversal configuration T C, an event structure S, and a subset of its events X. Among other properties, the relation states that all issued and covered events of T C have exact counterparts in X. Also, we require X to be extracted from S.
The initial event structure and X Init consist of only initial events. Then, following the issuing of event e 1 2 : W(y, 1) in T C a (see Fig. 4a), we need to add a branch to the event structure such that it has W(y, 1) in it. Since Weakestmo requires events to be added according to the program order, we first need to add a read event related to 'a := [x]' of LB's thread I. Each read event in an event structure has to be justified from somewhere. In this case, the only write event to location x is the initial one. That is, the added read event e 1 11 is justified from it (see Fig. 3a). In the general case, having more than one option, we would choose a 'safe' write event for an added read event to be justified from, i.e., one which the corresponding branch is already 'aware' of and being justified from which would not break consistency of the event structure. After that, a write event e 1 21 : W(y, 1) can be added po-after e 1 11 (see Fig. 3b), and I(LB, G LB , T C a , S b , X b ) holds for X b = {Init, e 1 11 , e 1 21 }.
Next, we need to simulate the second traversal step (see Fig. 4b), which issues W(x, 1). As with the previous step, we first need to add a read event related to the first read instruction of LB's thread II (see Fig. 3c). However, unlike the previous step, the added event e 2 11 has to get value 1, since there is a dependency between instructions in thread II. As we mentioned earlier, the traversal strategy guarantees that e 1 2 : W(y, 1) is issued at the moment of issuing e 2 2 : W(x, 1), so there is a corresponding event in the event structure to justify the read event e 2 11 from. Now, the write event e 2 21 : W(x, 1) representing e 2 2 can be added to the event structure (see Fig. 3d), and I(LB, G LB , T C b , S d , X d ) holds for X d = {Init, e 1 11 , e 1 21 , e 2 11 , e 2 21 }.
In the third traversal step (see Fig. 4c), the read event e 1 1 : R(x, 1) is covered. To have a representative event for e 1 1 in the event structure, we add e 1 12 (see Fig. 3e). It is justified from e 2 21 , which writes the needed value 1. Also, e 1 12 represents an execution of the first instruction of thread I alternative to e 1 11 , so the two events are in conflict.
However, we cannot choose a simulated execution X related to T C c and S e by the simulation relation, since X has to contain e 1 12 and a representative for e 1 2 : W(y, 1) (in S e it is represented by e 1 21 ) while being conflict-free. Thus, the event structure has to make one more step (see Fig. 3f) and add the new event e 1 22 to represent e 1 2 : W(y, 1). Now, the simulated execution contains everything needed, X f = {Init, e 1 12 , e 1 22 , e 2 11 , e 2 21 }. Since X f has to be extracted from S f , every read event in X f has to be connected via an rf edge to an event in X f . To preserve this requirement, we connect the newly added event e 1 22 and e 1 21 via an ew edge, i.e., mark them as equal writes. This induces an rf edge between e 1 22 and e 2 11 . That is, I(LB, G LB , T C c , S f , X f ) holds.
To simulate the remaining traversal steps (see Figures 4d to 4f), we do not need to modify S f since the execution graph associated with X f is isomorphic to G LB . That is, in the proof we just need to show that the simulation relation I continues to hold for T C d , T C e , and T C f with the same S f and X f .
Formal Definition of Weakestmo
In this section, we introduce the notation used in the rest of the paper and define the Weakestmo memory model formally.
Notation. Given relations R 1 and R 2 , we write R 1 ; R 2 for their sequential composition. Given a relation R, we write R ? , R + , and R * to denote its reflexive, transitive, and reflexive-transitive closures. For a set A, we write [A] to denote the identity relation on A (that is, [A] ≜ {⟨a, a⟩ | a ∈ A}). Hence, for instance, we may write [A] ; R ; [B] instead of R ∩ (A × B).
Given a function f, we denote by =f the set of f-equivalent elements:
$$=_{f} \;\triangleq\; \{\langle a, b \rangle \mid f(a) = f(b)\}$$
In addition, given a relation R, we denote by R|=f the restriction of R to f-equivalent elements (R|=f ≜ R ∩ =f), and by R|≠f the restriction of R to non-f-equivalent elements (R|≠f ≜ R \ =f).
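For readers who prefer executable notation, the relational operators above can be mirrored on finite relations represented as Python sets of pairs. This is only an illustrative sketch; the function names (`compose`, `identity`, `reflexive`, `transitive`, `restrict_eq`, `restrict_neq`) are hypothetical and unrelated to the Coq development. It also shows that [A] ; R ; [B] computes the restriction of R to pairs in A × B.

```python
def compose(r1, r2):
    """Sequential composition R1 ; R2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def identity(s):
    """Identity relation [A] on the set A."""
    return {(a, a) for a in s}


def reflexive(r, carrier):
    """Reflexive closure R? over the given carrier set."""
    return set(r) | identity(carrier)


def transitive(r):
    """Transitive closure R+, computed as a fixpoint."""
    closure = set(r)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra


def restrict_eq(r, f):
    """Restriction of R to f-equivalent elements."""
    return {(a, b) for (a, b) in r if f(a) == f(b)}


def restrict_neq(r, f):
    """Restriction of R to non-f-equivalent elements."""
    return {(a, b) for (a, b) in r if f(a) != f(b)}


R = {(1, 2), (2, 3), (3, 5)}
A, B = {1, 2}, {3}
restricted = compose(compose(identity(A), R), identity(B))   # [A] ; R ; [B]
```

The reflexive-transitive closure R * is then `reflexive(transitive(R), carrier)` for a carrier set containing all elements of interest.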
Events, Threads and Labels
Events, e ∈ E, and thread identifiers, t ∈ Tid, are represented just by unique natural numbers. We treat the thread with identifier 0 as the initialization thread.
We let x ∈ Loc range over locations, and v ∈ Val over values.
Each memory access has a mode, which is either relaxed (rlx), release (rel), acquire (acq), acquire-release (acqrel), or sequentially consistent (sc). The modes are partially ordered by ⊏ as follows: rlx ⊏ {acq, rel} ⊏ acqrel ⊏ sc.
Given a label l, the functions typ, loc, val, and mod return (when applicable) its type (i.e., R, W, or F), location, value, and mode, respectively. By abuse of notation, we also use R, W, F for the set of all events with the corresponding type as well as, for example, RW for R ∪ W. We also use subscripts and superscripts to further restrict this set (e.g., W ⊒rel x denotes the set of write events operating on location x with mode at least as strong as rel).
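The restricted event sets can be sketched in Python as a filter over labelled events. The sketch assumes the usual C11-style mode order rlx ⊏ {acq, rel} ⊏ acqrel ⊏ sc, in which acq and rel are incomparable; the table `WEAKER_MODES`, the tuple layout of events, and the function names are hypothetical.

```python
# WEAKER_MODES[m] lists the modes strictly below m (assumed order:
# rlx below acq and rel, both below acqrel, which is below sc).
WEAKER_MODES = {
    "rlx": set(),
    "acq": {"rlx"},
    "rel": {"rlx"},
    "acqrel": {"rlx", "acq", "rel"},
    "sc": {"rlx", "acq", "rel", "acqrel"},
}


def at_least(mode, bound):
    """True if `mode` is at least as strong as `bound` in the partial order."""
    return mode == bound or bound in WEAKER_MODES[mode]


def writes_at_least(events, loc, bound):
    """Write events on location `loc` whose mode is at least as strong as `bound`."""
    return {name for (name, typ, l, val, mode) in events
            if typ == "W" and l == loc and at_least(mode, bound)}


# Events are (name, type, location, value, mode) tuples.
sample = [
    ("a", "W", "x", 1, "rlx"),
    ("b", "W", "x", 2, "rel"),
    ("c", "W", "x", 3, "sc"),
    ("d", "W", "y", 1, "sc"),
    ("e", "R", "x", 1, "acq"),
]
release_writes_x = writes_at_least(sample, "x", "rel")
```

Because the order is partial, `at_least("acq", "rel")` is false even though neither mode is weaker than the other.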
Event Structures
An event structure S is a tuple ⟨E, tid, lab, po, rmw, jf, ew, co, K init , K⟩ where:
- E is a set of events.
- tid : E → Tid is a function that assigns a thread identifier to every event. We treat events with the thread identifier equal to 0 as initialization events and denote them as Init, that is, Init ≜ {e ∈ E | tid(e) = 0}.
- lab : E → Lab is a function that assigns a label to every event.
- po ⊆ E × E is a strict partial order on events, called program order, that tracks their precedence in the control flow of the program. Initialization events are po-before all other events, and po edges relate non-initialization events only when they are from the same thread. po does not necessarily totally order all events of the same thread. Non-initialization events of the same thread that are not related by program order are called conflicting events. The corresponding binary relation cf is defined as follows:
$$\mathit{cf} \triangleq ([E \setminus \mathit{Init}] \,;\, =_{\mathit{tid}} \,;\, [E \setminus \mathit{Init}]) \setminus (\mathit{po} \cup \mathit{po}^{-1})^{?}$$
We say that an event e 1 is an immediate po-predecessor of e 2 if e 1 is a po-predecessor of e 2 and there is no event po-between them. We also define the notion of immediate conflict.
- jf is the justified from relation, which relates a write event to the reads it justifies. We require that a read not be justified by a conflicting write (i.e., jf ∩ cf ⊆ ∅) and that jf −1 be functional (i.e., whenever ⟨w 1 , r⟩, ⟨w 2 , r⟩ ∈ jf, then w 1 = w 2 ). We also define the notions of internal and external justification:
$$\mathit{jfi} \triangleq \mathit{jf} \cap \mathit{po} \qquad \mathit{jfe} \triangleq \mathit{jf} \setminus \mathit{po}$$
- ew is an equivalence relation called the equal-writes relation. Note that equal writes have the same location and value, and non-reflexive edges of ew relate only conflicting relaxed writes.
- co is the coherence order, a strict partial order that relates non-equal write events with the same location. We require that coherence be closed with respect to equal writes (i.e., ew ; co ; ew ⊆ co) and total with respect to ew on writes to the same location:
$$[E \cap W] \,;\, =_{\mathit{loc}} \,;\, [E \cap W] \subseteq \mathit{ew} \cup \mathit{co} \cup \mathit{co}^{-1}$$
- K init and K are components related to the process of event structure construction, which is explained in more detail later in this section. They are functions whose codomain is a set of thread states.
  • K init : Tid → ThreadState is a function that, given a thread identifier t, returns the initial state σ 0 of the thread.
  • K : E \ dom(rmw) → ThreadState is a function that assigns a state σ to every event e that is not the read part of some read-modify-write pair. This state corresponds to the thread state after the effect of the event e has been performed.
Given an event structure S, we use the S.x notation to refer to its components (e.g., S.E, S.po, etc.). For a set A of events, we write S.A for the set A ∩ S.E (e.g., S.W x ). Further, for e ∈ S.E, we write S.typ(e) to retrieve typ(S.lab(e)). Similar notation is used for the functions loc, val, and mod. Given a set of thread identifiers T, we also use the notation S.thread(T) to denote the set of events belonging to one of the threads from T, i.e., S.thread(T) ≜ {e ∈ S.E | S.tid(e) ∈ T}. By abuse of notation, we often write S.thread(t) instead of S.thread({t}), assuming t is a single thread identifier.
Derived Sets and Relations
First, the reads-from relation, S.rf, of a Weakestmo event structure is derived. It is defined as an extension of S.jf to all S.ew-equivalent, non-conflicting writes:
$$S.\mathit{rf} \triangleq (S.\mathit{ew} \,;\, S.\mathit{jf}) \setminus S.\mathit{cf}$$
Note that, unlike S.jf −1 , the S.rf −1 relation is not functional.
The relation S.fr, called from-reads or reads-before, places read events before subsequent writes.
$$S.\mathit{fr} \triangleq S.\mathit{rf}^{-1} \,;\, S.\mathit{co}$$
The extended coherence S.eco is a strict partial order that orders write-write, write-read, and read-write pairs of events operating on the same location.
$$S.\mathit{eco} \triangleq (S.\mathit{co} \cup S.\mathit{rf} \cup S.\mathit{fr})^{+}$$
We observe that in our model, eco is equal to rf ∪ co ; rf ? ∪ fr ; rf ? , similar to the corresponding definitions on execution graphs in the literature (footnote 14). Next, we define the synchronizes-with S.sw and happens-before S.hb relations, using the auxiliary notions of release sequence S.rs and release prefix S.release. These definitions coincide with the conventional definitions except that in the Weakestmo case the jf relation is used in place of rf. We say that two events are in extended conflict if they happen after some conflicting events.
$$S.\mathit{ecf} \triangleq (S.\mathit{hb}^{-1})^{?} \,;\, S.\mathit{cf} \,;\, S.\mathit{hb}^{?}$$
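The from-reads relation and the observation about eco can be checked on a toy example with a few lines of Python. The sketch assumes that eco is the transitive closure of co ∪ rf ∪ fr (the conventional definition) and uses a conflict-free example in which rf is given directly; the event names and helper functions are hypothetical.

```python
def compose(r1, r2):
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def inverse(r):
    return {(b, a) for (a, b) in r}


def optional(r, carrier):
    """Reflexive closure R? over the carrier set."""
    return set(r) | {(a, a) for a in carrier}


def transitive(r):
    closure = set(r)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra


# Two writes to x ordered by co; r1 reads from w1 and r2 reads from w2.
events = {"w1", "w2", "r1", "r2"}
co = {("w1", "w2")}
rf = {("w1", "r1"), ("w2", "r2")}

fr = compose(inverse(rf), co)                # rf^-1 ; co
eco = transitive(co | rf | fr)               # assumed definition of eco
eco_expanded = (rf
                | compose(co, optional(rf, events))
                | compose(fr, optional(rf, events)))
```

Here fr places r1 before w2, and the closed form rf ∪ co ; rf ? ∪ fr ; rf ? coincides with the transitive closure on this example.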
The last ingredient that we need for event structure consistency is the notion of visible events. We define it in a few steps. First, consider an event and all the write events that were used to externally justify it or one of its S.po ∪ S.jf ancestors. The relation S.jfe ; (S.po ∪ S.jf) * defines this connection formally. Next, consider only those writes that are in conflict with the event they 'recursively' justify: S.cf ∩ S.jfe ; (S.po ∪ S.jf) * . We say that an event is visible if all such writes are ew-equivalent to some write event in the same control-flow branch of the program (footnote 15):
$$S.\mathit{Vis} \triangleq \{\, e \in S.E \mid S.\mathit{cf} \cap (S.\mathit{jfe} \,;\, (S.\mathit{po} \cup S.\mathit{jf})^{*} \,;\, [e]) \subseteq S.\mathit{ew} \,;\, (S.\mathit{po} \cup S.\mathit{po}^{-1})^{?} \,\}$$
Footnote 14. This equivalence does not hold in the original Weakestmo model [5]. To make the equivalence hold, we made ew transitive and required ew ; co ; ew ⊆ co.
Footnote 15. Note that in [5] the definition of visible events is slightly more verbose. We proved in Coq that our simpler definition is equivalent to the one given there.
Event Structure Consistency
As in axiomatic memory model definitions, Weakestmo further restricts the semantics of a program by requiring the event structure to satisfy a consistency predicate.
Definition 1. An event structure S is said to be consistent if the following conditions hold.
In brief, consistency requires that (1) no event happen after two conflicting events, (2) reads not be justified by a write in extended conflict, (3) reads be justified only by po-prior or visible events, (4) the execution be coherent, (5) immediate conflicts arise only because of read events, and (6) there exist no duplicate read events (i.e., in immediate conflict and justified by equal writes).
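One possible formal reading of the six conditions summarized above is sketched below. The formulas are assumptions reconstructed from the prose summary, not quotations of the definition: S.cf_imm is taken to denote the immediate-conflict relation, S.Vis the set of visible events, and S.jfe the external part of S.jf.

```latex
\begin{align*}
&(1)\quad S.\mathit{ecf} \text{ is irreflexive} \\
&(2)\quad S.\mathit{jf} \cap S.\mathit{ecf} = \emptyset \\
&(3)\quad \mathrm{dom}(S.\mathit{jfe}) \subseteq S.\mathit{Vis} \\
&(4)\quad S.\mathit{hb} \,;\, S.\mathit{eco}^{?} \text{ is irreflexive} \\
&(5)\quad \mathrm{dom}(S.\mathit{cf}_{\mathit{imm}}) \subseteq S.\mathit{R} \\
&(6)\quad S.\mathit{jf} \,;\, S.\mathit{cf}_{\mathit{imm}} \,;\, S.\mathit{jf}^{-1} \,;\, S.\mathit{ew} \text{ is irreflexive}
\end{align*}
```

For instance, under this reading condition (1) unfolds to: there are no events a, b, e with a and b in conflict and both equal to or hb-before e.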
Execution Extraction
We move on to the extraction of executions from an event structure.
First, we define an execution graph.
Definition 2. An execution graph G is a tuple ⟨E, tid, lab, po, rmw, rf, co⟩ where its components are defined similarly to those of an event structure, with the following exceptions:
- po is required to be total on the set of events from the same thread. Thus, execution graphs have no conflicting events, i.e., cf ≡ ∅.
- The rf relation is given explicitly instead of being derived. Also, there are no jf and ew relations.
- co totally orders write events operating on the same location.
All derived relations are defined similarly as for event structures except for rs, release, sw, and hb which are defined with rf instead of jf:
Following [12], we also define the SC-before relation scb and the partial SC relations psc base and psc F .
Next we show how to extract an execution graph from the event structure.
Definition 3. A set of events X is called extracted from S if the following conditions are met:
-X contains only visible events of S, i.e., X ⊆ S.Vis.
-X is hb-downward-closed, i.e., dom(S.hb ; [X]) ⊆ X.
Given an event structure S and extracted subset of its events X, it is possible to associate with X an execution graph G simply by restricting the corresponding components of S to X:
We say that such an execution graph G is associated with X and that it is extracted from the event structure: S ⊲ G. Finally, we define the consistency of an execution graph.
Definition 4. An execution graph G is consistent if the following conditions hold:
- G.hb ; G.eco ? is irreflexive. (coherence)
- G.rmw ∩ (G.fr ; G.co) ⊆ ∅. (atomicity)
- G.psc base ∪ G.psc F is acyclic. (sc)
We say that only the consistent execution graphs extracted from a Weakestmo-consistent event structure constitute the set of program executions under the Weakestmo memory model.
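As a concrete illustration of the atomicity condition G.rmw ∩ (G.fr ; G.co) ⊆ ∅, the following Python sketch evaluates it on two tiny graphs. It assumes fr is defined as rf⁻¹ ; co, as for event structures; the event names and helper functions are hypothetical, and the coherence and sc conditions are not modelled.

```python
def compose(r1, r2):
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def inverse(r):
    return {(b, a) for (a, b) in r}


def atomicity_holds(rmw, rf, co):
    """Check rmw ∩ (fr ; co) = ∅ with fr = rf^-1 ; co."""
    fr = compose(inverse(rf), co)
    return not (rmw & compose(fr, co))


# An RMW whose read part r reads from w0 and whose write part is w.
rf = {("w0", "r")}
rmw = {("r", "w")}

co_ok = {("w0", "w")}                                 # w immediately follows w0
co_bad = {("w0", "w1"), ("w1", "w"), ("w0", "w")}     # w1 intervenes between w0 and w

ok = atomicity_holds(rmw, rf, co_ok)
bad = atomicity_holds(rmw, rf, co_bad)
```

In the second graph the write w1 sits in coherence order between the write read by the RMW and the RMW's own write, which is exactly what the condition rules out.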
Event Structure Construction
The event structure is constructed operationally, in a way that guarantees po ∪ jf to be acyclic. Thus, the Weakestmo model prevents the appearance of thin-air reads by construction. The operational semantics is relatively complicated and proceeds in several stages. First, we assume there is a small-step operational semantics $\sigma \xrightarrow{es}_{t} \sigma'$ defining the sequential execution of the thread with identifier t. It is defined on the set of thread states σ, σ ′ : ThreadState. A thread state σ contains, among other components, a list of instructions instrs, a program counter pc ∈ N, which points to the next instruction to be executed, and a partial execution graph G, corresponding to the execution of the thread up to the current state. A partial execution graph is simpler than a full execution graph: it contains only the events of the given thread and does not record the reads-from and coherence order relations (σ.G.E ⊆ G.thread(t) and σ.G.rf = σ.G.co = ∅).
A step of the thread's sequential semantics is labeled by a list of events es, which is either empty (when the thread performs some local action, like a conditional jump), contains a single event (when executing a load, store, or fence instruction), or contains a pair of events (when a read-modify-write instruction is executed).
Second, there is a relation $S \overset{es}{\Longrightarrow} S'$, called the basic step, defined on event structures, which is mainly responsible for updating the set of events E, the program order po, and the read-modify-write pairs rmw. Internally, a basic step performs several thread-local steps $\sigma \rightarrow^{*}_{t} \sigma'$ until one of these steps produces a non-empty list of events es. These events are added to the event structure by the step (i.e., S ′ .E = S.E ∪ es).
Next, there are three relations S w,r The set W should be prefix-closed with respect to S.co and disjoint from W. The event w will be placed S.co-after the events from W in S ′ and before the S.co-complement of this set (footnote 16). The relation $S \overset{es}{\rightharpoonup} S'$ defines the whole transition relation using the auxiliary relations mentioned above. It consists of four cases, which correspond to the execution of fence, load, store, or atomic update instructions. The relation $S \xrightarrow{es} S'$ additionally ensures that S ′ is consistent.
Because of space constraints, we refer the reader to [5] and our Coq developments for the full formal definitions related to the event structure construction.
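The claim that the construction keeps po ∪ jf acyclic can be illustrated with a deliberately simplified, append-only Python sketch. It is not the operational semantics described above: thread states, ew, co, rmw, and the consistency check are all omitted, only immediate po edges are recorded, and the class `EventStructure` with its `add` method is hypothetical. The point is only that a new event may refer solely to already existing events, so every po and jf edge goes from an older event to a newer one.

```python
class EventStructure:
    """Append-only sketch: events are numbered in creation order; 0 is Init."""

    def __init__(self):
        # Init is modelled as a write of 0 to every location (loc = None).
        self.labels = {0: ("W", None, 0)}
        self.po, self.jf = set(), set()

    def _justifies(self, write, loc, val):
        typ, wloc, wval = self.labels[write]
        return typ == "W" and wloc in (None, loc) and wval == val

    def add(self, typ, loc, val, po_pred, jf_from=None):
        if po_pred not in self.labels:
            raise ValueError("the po-predecessor must already exist")
        if typ == "R" and (jf_from not in self.labels
                           or not self._justifies(jf_from, loc, val)):
            raise ValueError("a read needs an existing write with the same location and value")
        event = len(self.labels)
        self.labels[event] = (typ, loc, val)
        self.po.add((po_pred, event))
        if typ == "R":
            self.jf.add((jf_from, event))
        return event


# Replaying the six construction steps for LB.
s = EventStructure()
e1_11 = s.add("R", "x", 0, po_pred=0, jf_from=0)
e1_21 = s.add("W", "y", 1, po_pred=e1_11)
e2_11 = s.add("R", "y", 1, po_pred=0, jf_from=e1_21)
e2_21 = s.add("W", "x", 1, po_pred=e2_11)
e1_12 = s.add("R", "x", 1, po_pred=0, jf_from=e2_21)   # conflicts with e1_11
e1_22 = s.add("W", "y", 1, po_pred=e1_12)
```

Since every edge points from a smaller to a larger event number, no cycle in po ∪ jf can arise, which is the intuition behind the absence of thin-air reads.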
IMM SC : IMM extended with SC accesses
Unlike Weakestmo, IMM tracks syntactic dependencies in its execution graphs, and uses them to forbid "out-of-thin-air" behaviors.
Footnote 16. Since co forms a total order on the set of writes to the same location, it is sufficient to pick just a single write event and place the event w co-after it. However, we found it technically more convenient to define the position by the set of write events.
Definition 5. An IMM execution graph G is a tuple ⟨E, tid, lab, po, rmw, rf, co, data, addr, ctrl, casdep⟩, where ⟨E, tid, lab, po, rmw, rf, co⟩ is an execution graph (Def. 2), and data, addr, ctrl, and casdep are relations that represent data, address, control, and CAS dependencies, respectively. They should satisfy the following constraints:
IMM-consistency is defined as follows [17] :
Definition 6. An execution graph G is IMM-consistent if the following hold:
- ... (atomicity)
- G.ar is acyclic. (no-thin-air)
To handle SC accesses, we define an extension of IMM, which we call IMM SC . Its consistency predicate is defined as follows:
Definition 7. An execution graph G is IMM SC -consistent if, in addition to the conditions in Def. 6, the following holds:
- G.psc base ∪ G.psc F is acyclic. (sc)
The sc constraint is taken as is from Weakestmo-consistency and RC11-consistency [12, Definition 1]. Since psc F is already included in IMM's ar relation, one may consider the natural option of including psc base in ar as well. However, this leads to a model that is too strong, as it forbids the following behaviour:
This behaviour is allowed by POWER (using any of the two intended compilation schemes for SC accesses).
Compiling IMM SC to Hardware
The main benefit of IMM is its use to simplify compilation correctness proofs by breaking them into two parts: (i) correctness of mapping from the high-level language to IMM; and (ii) correctness of mapping from IMM to the different multiprocessor architectures. In this section, we establish part (ii) for IMM SC by extending the results of [17] to support SC accesses with their intended compilation schemes to the different architectures.
As was done in [17] , since IMM SC and the models of hardware we consider are all defined in the same declarative framework (using execution graphs), we formulate our results on the level of execution graphs. Thus, we actually consider the mapping of IMM SC execution graphs to target architecture execution graphs that is induced by compilation of IMM SC programs to machine programs. Hence, roughly speaking, for each architecture α ∈ {TSO, POWER, ARMv7, ARMv8, RISC-V}, our (mechanized) result takes the following form:
If the α-execution-graph G α corresponds to the IMM SC -execution-graph G, then α-consistency of G α implies IMM SC -consistency of G.
Since the mapping from Weakestmo to IMM SC (on the program level) is the identity mapping (Theorem 1), we obtain as a corollary the correctness of the compilation from Weakestmo to each architecture α that we consider. The exact notions of correspondence between G α and G are presented in Appendices A to C.
The mapping of IMM SC to each architecture follows the intended compilation scheme of C/C++11 in [14, 12] , and extends the corresponding mappings of IMM from [17] with the mapping of SC reads and writes. Next, we schematically present these extensions.
TSO There are two alternative sound mappings of SC accesses to x86-TSO:
Fence after SC writes
Fence before SC reads
The first, which is implemented in mainstream compilers, inserts an mfence after every SC write, while the second inserts an mfence before every SC read. Importantly, one should globally apply one of the two mappings, to ensure the existence of an mfence between every SC write and following SC read.
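The need to apply one mapping globally can be seen in a small Python sketch. The tuple encoding of accesses and instructions, the scheme names, and the use of a plain mov for every access are assumptions made for illustration; only the placement of mfence follows the text, and RMWs are not modelled.

```python
MFENCE = ("mfence",)

def compile_tso(accesses, scheme):
    """Map (kind, mode) accesses of one thread to a list of instructions.

    scheme is "fence_after_write" or "fence_before_read" (hypothetical names).
    """
    out = []
    for kind, mode in accesses:
        if scheme == "fence_before_read" and (kind, mode) == ("R", "sc"):
            out.append(MFENCE)
        out.append(("mov", kind, mode))
        if scheme == "fence_after_write" and (kind, mode) == ("W", "sc"):
            out.append(MFENCE)
    return out

def sc_pairs_fenced(code):
    """Check that an mfence separates every SC write from each later SC read."""
    pending = False                      # an SC write not yet followed by a fence
    for ins in code:
        if ins == MFENCE:
            pending = False
        elif ins[1:] == ("W", "sc"):
            pending = True
        elif ins[1:] == ("R", "sc") and pending:
            return False
    return True

thread = [("W", "sc"), ("R", "rlx"), ("R", "sc")]
uniform = [sc_pairs_fenced(compile_tso(thread, s))
           for s in ("fence_after_write", "fence_before_read")]
# Mixing the schemes: the write uses one scheme and the read uses the other.
mixed = (compile_tso(thread[:1], "fence_before_read")
         + compile_tso(thread[2:], "fence_after_write"))
```

Under either scheme applied uniformly the check succeeds, whereas the mixed compilation leaves the SC write and the SC read with no fence in between.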
POWER There are two alternative sound mappings of SC accesses to POWER:
Access        Leading sync            Trailing sync
(|R sc |)     sync;(|R acq |)         ld;sync
(|W sc |)     sync;st                 (|W rel |);sync
(|RMW sc |)   sync;(|RMW acq |)       (|RMW rel |);sync
The first scheme inserts a sync before every SC access, while the second inserts a sync after every SC access. Importantly, one should globally apply one of the two mappings, to ensure the existence of a sync between every two SC accesses.
Observing that sync is the result of mapping an SC fence to POWER, we can reuse the existing proof for the mapping of IMM to POWER. To handle the leading sync (respectively, trailing sync) scheme, we introduce a preceding step, in which we prove that splitting each SC access in the whole execution graph into a pair of an SC fence followed (respectively, preceded) by a release/acquire access is a sound transformation under IMM SC . That is, this global execution graph transformation cannot make an inconsistent execution consistent:
Theorem 2. Let G be an execution graph such that
where G.po ′ ≜ G.po \ G.rmw. Let G ′ be the execution graph obtained from G by weakening the access modes of SC write and read events to release and acquire modes respectively. Then, IMM SC -consistency of G follows from IMM-consistency of G ′ .
Having this theorem, we can think about mapping of IMM SC to POWER as if it consists of three steps. We establish the correctness of each of them separately.
1. At the IMM SC level, we globally split each SC access into an SC fence and a release/acquire access. Correctness of this step follows by Theorem 2.
2. We map IMM to POWER, whose correctness follows by the existing results of [17] , since we do not have SC accesses at this stage.
3. We remove any redundant fences introduced by the previous step. Indeed, following the leading sync scheme, we will obtain sync;lwsync;st for an SC write. The lwsync is redundant here, since sync provides stronger guarantees than lwsync, and can be removed. Similarly, following the trailing sync scheme, we will obtain ld;cmp;bc;isync;sync for an SC read. Again, the sync makes the other synchronization instructions redundant.
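The three steps above can be sketched in Python as follows. The table IMM_TO_POWER contains only the rows implied by the instruction sequences quoted in the text, RMWs are omitted, and all function names are hypothetical. The clean-up pass is a deliberately naive peephole that is meant to be applied only to the code generated for a single access.

```python
IMM_TO_POWER = {            # rows implied by the text; other accesses omitted
    ("F", "sc"): ["sync"],
    ("W", "rel"): ["lwsync", "st"],
    ("R", "acq"): ["ld", "cmp", "bc", "isync"],
}

def split_sc(access, scheme):
    """Step 1: split an SC access into an SC fence and a rel/acq access."""
    kind, mode = access
    if mode != "sc" or kind == "F":
        return [access]
    weak = (kind, "rel" if kind == "W" else "acq")
    fence = ("F", "sc")
    return [fence, weak] if scheme == "leading" else [weak, fence]

def drop_redundant(code):
    """Step 3: remove synchronization subsumed by an adjacent sync."""
    out = []
    for ins in code:
        if ins == "lwsync" and out and out[-1] == "sync":
            continue                      # sync;lwsync -> sync
        if ins == "sync":
            while out and out[-1] in ("cmp", "bc", "isync"):
                out.pop()                 # cmp;bc;isync;sync -> sync
        out.append(ins)
    return out

def compile_power(access, scheme):
    """Steps 1 to 3 for one access; scheme is "leading" or "trailing"."""
    code = [i for a in split_sc(access, scheme) for i in IMM_TO_POWER[a]]  # step 2
    return drop_redundant(code)

sc_write_leading = compile_power(("W", "sc"), "leading")
sc_read_trailing = compile_power(("R", "sc"), "trailing")
```

For the two cases discussed in step 3, the sketch yields sync;st and ld;sync, which agree with the corresponding rows of the leading and trailing sync mappings.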
ARMv7 The ARMv7 model [1] is very similar to the POWER model with the main difference being that it has a weaker preserved program order than POWER. However, Podkopaev et al. [17] proved IMM to POWER compilation correctness without relying on POWER's preserved program order explicitly but assuming the weaker version of ARMv7's order. Thus, their proof also establishes correctness of compilation from IMM to ARMv7.
Extending the proof to cover SC accesses follows the same scheme discussed for POWER, since the two intended mappings of SC accesses for ARMv7 are the same except for replacing POWER's sync fence with ARMv7's dmb:
Leading dmb
Trailing dmb
ARMv8 The mapping to ARMv8 [19] is defined in a straightforward fashion, since ARMv8 has dedicated instructions for SC accesses:
(|R sc |)      ldar
(|W sc |)      stlr
(|FADD sc |)   L:ldaxr;stlxr;bc L
(|CAS sc |)    L:ldaxr;cmp;bc Le;stlxr;bc L;Le:
RISC-V The RISC-V model [21, 22] is stronger than the ARMv8 model. Therefore, soundness of mapping to RISC-V follows from soundness of mapping to ARMv8.
Compiling C11 and RC11 to IMM SC
Podkopaev et al. [17] also proved the correctness of the mapping from RC11 without SC and non-atomic accesses to IMM. The extension of this result to cover SC accesses using IMM SC is straightforward since IMM SC has the same sc axiom as RC11. Similarly, the correctness of the mapping from C11, including SC and non-atomic accesses (after applying the fixes of [23] and [12] ), to IMM SC is trivial.
Simulation Relation for Weakestmo to IMM SC Proof
In this section, we define the relation I, which is used for the simulation of a traversal of an IMM SC -consistent execution graph by a Weakestmo event structure presented in §2.3. The way we define I(prog , G, C, I , S, X) induces a strong connection between events in the execution graph G and the event structure S. We make this connection explicit with the function s2g G,S : S.E → G.E, which is defined in a way that satisfies the following predicate:
That is, e and s2g G,S (e) belong to the same thread and have the same po-position in the thread. Note that s2g G,S does not have to be injective: if events e and e ′ are in immediate conflict in S, they have the same s2g G,S -image in G.
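A minimal Python sketch of such a function is given below. It assumes a hypothetical encoding in which an event of S records its thread, its po-position within the thread, and the branch it belongs to, while an event of G records only the thread and the position; this encoding is an assumption made for illustration and is not the one used in the Coq development.

```python
from collections import namedtuple

# Hypothetical encodings of events in the event structure S and the graph G.
SEvent = namedtuple("SEvent", "tid pos branch")
GEvent = namedtuple("GEvent", "tid pos")

def s2g(e):
    """Forget the branch: keep the thread and the po-position."""
    return GEvent(e.tid, e.pos)

# Two conflicting reads of thread 1 at position 1, from different branches.
r_first_branch = SEvent(tid=1, pos=1, branch=1)
r_second_branch = SEvent(tid=1, pos=1, branch=2)
same_image = s2g(r_first_branch) == s2g(r_second_branch)
```

The two reads are distinct events of S but have the same image in G, which illustrates why the function need not be injective.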
In combination with s2g G,S , we often use functorial · f and co-functorial · f maps for sets and relations:
denotes a subset of S's events whose s2g-images are covered events in G, and S.rmw rel s2g G,S denotes a relation on events in G whose s2g-preimages in S are related by S.rmw.
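The two kinds of lifting can be sketched in Python as follows: taking the image of a set or relation under a map, and taking the preimage of a set. The helper names and the tuple encoding of events are hypothetical.

```python
def image_set(f, xs):
    """Image of a set under f."""
    return {f(x) for x in xs}

def image_rel(f, rel):
    """Image of a relation under f, applied to both components."""
    return {(f(a), f(b)) for (a, b) in rel}

def preimage_set(f, domain, ys):
    """Elements of domain whose f-image lies in ys."""
    return {x for x in domain if f(x) in ys}

def s2g(e):               # hypothetical encoding: (thread, position, branch)
    tid, pos, _branch = e
    return (tid, pos)

S_E = {(1, 1, 1), (1, 2, 1), (1, 1, 2), (1, 2, 2)}
covered = {(1, 1)}                       # covered events of G
S_rmw = {((1, 1, 2), (1, 2, 2))}         # an rmw pair in S

covered_in_S = preimage_set(s2g, S_E, covered)   # events of S with covered images
rmw_in_G = image_rel(s2g, S_rmw)                 # relation on G induced by S.rmw
```

Here covered_in_S contains both events at position 1 of thread 1, one from each branch, and rmw_in_G relates the two positions of thread 1.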
In the rest of the paper, we do not write the subscript for s2g since the parameters for this function are usually deducible from the context. Moreover, we use the notation · and · only in combination with s2g and we omit the subscript (i.e., set and rel) since it can be deduced from the context (e.g., we write just S.rf instead of S.rf rel s2g ). In the simulation relation I, we also use a function S.K C : Tid → ThreadState parameterized by a covered set C:
Intuitively, this function returns the state of thread t's local semantics that represents thread t's covered part of the simulated execution X.
To define the simulation relation I(prog, G, T C, S, X), we introduce an auxiliary relation, I T (prog, G, T C, S, X), parameterized by a set of threads T . The relation I itself is defined to be I Tid where Tid is the complete set of threads. The auxiliary relation I T is used later in the proof of Lemma 2 in §6, where a new branch in the event structure has to be constructed via multiple steps. In the middle of these steps, the full simulation relation I may be temporarily broken for the thread for which the branch is constructed.
We define the relation I T (prog, G, C, I , S, X) to hold if the following conditions are met:
13. Let e, w, and w ′ be events in S s.t. (i) e, w is an S.release edge, (ii) w and w ′ are in the same S.ew equivalence class, (iii) w ′ is in X, and (iv) s2g(w ′ ) is issued. Then e is in X:
dom(S.release ; S.ew ; [X ∩ I ]) ⊆ X
This property is needed to show that dom(S.hb \ S.po) is included in X.
14. Let r, r ′ , w, and w ′ be events in S s.t. (i) r and r ′ are in immediate conflict and justified from w and w ′ respectively, and (ii) r ′ is in X and its thread is in T . Then s2g(w) is G.co-less than s2g(w ′ ):
This property is needed to prove cf imm -justification on the simulation step.
15. For all t ∈ T there exists σ s.t. S.K C (t) − → * t σ and the thread-local execution graph σ.G is equivalent modulo rf and co components to the restriction of G to the thread t.
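To make the relational notation of property 13 concrete, the following Python sketch evaluates dom(S.release ; S.ew ; [X ∩ I ]) ⊆ X on explicit finite relations. The helper names are hypothetical, and the issued set is assumed to be already expressed in terms of events of S (that is, pulled back along s2g).

```python
def seq(r1, r2):
    """Sequential composition r1 ; r2 of two relations."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def ident(s):
    """The identity relation [s] on a set s."""
    return {(a, a) for a in s}

def dom(r):
    """The domain of a relation."""
    return {a for (a, _) in r}

def property_13(release, ew, X, issued_in_S):
    """dom(release ; ew ; [X ∩ I]) ⊆ X, with I given on events of S."""
    return dom(seq(seq(release, ew), ident(X & issued_in_S))) <= X

# Toy instance: e releases to w, and w is ew-equivalent to w2.
release = {("e", "w")}
ew = {("w", "w"), ("w2", "w2"), ("w", "w2"), ("w2", "w")}
issued_in_S = {"w", "w2"}

violated = property_13(release, ew, {"w2"}, issued_in_S)
satisfied = property_13(release, ew, {"e", "w2"}, issued_in_S)
```

In the first call w2 is in X and issued but e is not in X, so the inclusion fails; adding e to X restores it.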
Simulation Step Proof
In this section, we present an outline of our proof of Lemma 2, which states that the simulation relation I can be restored after a traversal step.
Suppose that I(prog , G, T C, S, X) holds for some prog, G, T C, S, and X, and we need to simulate a traversal step G ⊢ T C −→ t T C ′ s.t. it either covers or issues an event in thread t. Then we need to provide an event structure S ′ and a subset of its events X ′ s.t. I(prog , G, T C ′ , S ′ , X ′ ) holds. Weakestmo might need to take multiple steps from S to S ′ , i.e., to construct a new so-called certification branch to have representatives of all issued write events from T C ′ .
For example, consider construction of the certification branch to simulate issuing of e 1 2 : W(y, 1), i.e., the traversal step
ending in Fig. 4a . Before the step,
holds. To simulate the step, we need to start by showing that it is possible to execute instructions of LB's thread I, which are 'a := [x]; [y] := 1', in a way that they would produce a sequence of labels satisfying two properties:
1. For each read label in the sequence, it would be possible to find either a related write event in S init (LB) or a related write label in the sequence.
2. The sequence would contain labels of all issued write events in G.thread(1)∩ (G LB .Init ∪ {e 1 2 }).
In our case, the sequence is [R(x, 0), W(y, 1)]. The first requirement arises from the fact that, for all read events to be added to the certification branch, we need to have write events to justify from. The second one means that, regardless of changing values read by some instructions, it is still possible to write the same values to the same locations as in the issued set after the traversal step.
In the general case, to construct such a sequence, we follow the approach of [17] for a similar problem with certifying promises in the compilation proof from PS to IMM. For each read event r in the part of the graph for which we construct the certification branch (i.e., r ∈ G.thread(t)∩(C ′ ∪dom(G.po ; [I ′ ]))), we choose the write event from which the certification version of r will read. In order to do that, we introduce several auxiliary definitions.
First, we define the set of determined events, which depends on the execution graph G and traversal configuration C ′ , I ′ :
Intuitively, G.determined( C ′ , I ′ ) represents the events which should be equal to their counterparts in the certification branch. In particular, it means that these events should have the same label in graph G and in the certification branch, and additionally the read events should have the same reads-from source. Note that the set of determined events contains dom(G.rfi ? ; G.ppo ; [I ′ ]), a set of events whose values the issued write events depend on, and codom(G.rfe ; [G.E ⊒acq ]), a set of read events with mode equal to or stronger than acq that read a value from another thread.
Second, we introduce the viewfront relation:
For any event e ∈ G.E the set dom(G.vf( C ′ , I ′ ) ; [e]) contains write events that are 'observable' by e.
Finally, we construct the simulation read from relation denoted sim rf:
It relates a read event r to the co-last 'observable' write event with the same location. Assuming that G is IMM SC -consistent, it can be shown that G.sim rf agrees with G.rf for the determined reads.
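The choice of the co-last 'observable' write can be sketched in Python as follows. The function signature, the encoding of the viewfront relation as a set of (write, event) pairs, and the event names are assumptions for illustration; the sketch only captures the informal description above, not the exact formal definition.

```python
def sim_rf(reads, vf, co, loc):
    """Map each read to the co-last write it can observe at its own location.

    vf is a set of (write, event) pairs, co is a set of (earlier, later)
    write pairs assumed total and transitive per location, and loc maps
    events to locations.
    """
    result = {}
    for r in reads:
        visible = {w for (w, e) in vf if e == r and loc[w] == loc[r]}
        last = [w for w in visible
                if not any((w, w2) in co for w2 in visible)]
        if last:
            result[r] = last[0]
    return result

# Toy instance in the spirit of LB: a read of x in thread 1, the initial
# write to x, and a write of 1 to x by thread 2.
loc = {"init_x": "x", "w_x1": "x", "r_x": "x"}
co = {("init_x", "w_x1")}

early = sim_rf({"r_x"}, {("init_x", "r_x")}, co, loc)
late = sim_rf({"r_x"}, {("init_x", "r_x"), ("w_x1", "r_x")}, co, loc)
```

While only the initial write is observable, the read is directed to it; once the write of thread 2 becomes observable, the co-last choice switches to that write.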
Having sim rf as a guide for values read by instructions in the certification run, we construct the steps of the thread-local operational semantics
with the required sequence of labels using the receptiveness property 19 of the thread's semantics. Here σ is a thread state corresponding to the S.po-last covered event from X with thread identifier t, that is σ = S.K C (t) and σ.G.E is equal to G.thread(t) ∩ C. Then, the proof of Lemma 2 is done by induction on σ − → * t σ ′ using an auxiliary relation I cert . Formally, this is stated in the following lemmas. Here Lemma 4 constructs σ − → * t σ ′ and serves as the base case of induction. Lemma 5 is the induction step. Lemma 6 concludes the proof of Lemma 2 by restoring the simulation relation I.
We define the relation I cert (prog, G, C, I , C ′ , I ′ , S, X, t, Br, σ, σ ′ ), where the parameter Br represents the already constructed part of the certification branch, to hold if the following conditions are met:
3. σ and σ ′ are thread states s.t. σ corresponds to the S.po-last event in Br and σ ′ is reachable from σ, i.e., σ − → * t σ ′ .
4. The partial execution graph of σ ′ contains covered and issued events up to the G.po-last issued write in thread t:
5. The partial execution graph of σ ′ assigns the same type, location, and mode as the full execution graph G does. Additionally, it assigns the same value as G to determined events.
6. The set Br consists of initial events plus the events from the thread t, and the covered prefixes of Br and X restricted to thread t coincide:
Let us again consider the program LB from §2 together with its IMM SC execution graph depicted in Fig. 2a . We showcase the construction of the event structure following the traversal of Fig. 4 once again, but this time paying more attention to some particular properties preserved by I and I cert .
Initially, T C init (G LB ) ≜ ⟨G LB .Init, G.Init⟩ , and X Init ≜ S init (prog).Init (it is easy to see that Init forms a valid extracted subset, since all the constraints of Def. 3 are met). Since for S init (prog ) we have rmw = jf = co = ∅ and ew = [W] ; id ; [W], most of the properties of I hold trivially.
At the first step of the traversal T C a the event e 1 2 : W(y, 1) is issued. Steps Fig. 3a-Fig. 3b build the corresponding certification branch. The G.sim rf(T C a ) relation connects the read event e 1 1 with the initial write event G.Init since it is the only 'observable' event for this read. Thus, on the step Fig. 3a the corresponding write event S.Init is chosen to justify the read event e 1 11 (note that s2g(S a .Init) = G LB .Init, s2g(e 1 11 ) = e 1 1 and consequently S a .jf ⊆ G LB .sim rf(T C a )). At the step Fig. 3b the certification branch Br a ≜ {S b .Init, e 1 11 , e 1 21 } is fully constructed. It constitutes a valid extracted subset of S b (denoted as X b ) since it has no conflicting events, is rf-complete, contains only visible events, and is hb-downward-closed.
Let us check that I(prog, G LB , T C a , S b , X b ) indeed holds. Given that s2g(S a .Init) = G.Init, s2g(e 1 11 ) = e 1 1 , s2g(e 1 21 ) = e 1 2 , it is easy to see that properties 4 and 5 hold. Properties 6a and 7 hold trivially since there are neither covered reads nor external justification edges. Properties 8 and 9 also hold because S b contains only reflexive ew edges. There is a single co edge in S b : S b .Init, e 1 21 ∈ S b .co. Its s2g-image lies in G.co: G.Init, e 1 2 ∈ G.co. Thus 10a and 10b also hold.
At the next step of the traversal T C b another write event e 2 2 : W(x, 1) is issued. Steps Fig. 3c-Fig. 3d build the certification branch similarly to the previous step. The only difference is that this time the read event e 2 1 is determined (since e 2 1 , e 2 2 ∈ G LB .ppo ; [I b ]). Because of that, the simulation read-from relation G LB .sim rf(T C b ) picks the write event e 1 2 as a source for e 2 1 . The corresponding write event in S d is e 1 21 . This write belongs to X b and its s2g image is issued; therefore, constraint 7 is satisfied.
The next traversal step T C c is challenging. The read event e 1 1 : R(x, 1) is covered. Since the event structure S d does not contain read events with s2g image equal to e 1 1 and label equal to R(x, 1), we have to 're-certify' the execution of thread I. The new certification branch is constructed during the steps Fig. 3e - Fig. 3f . Notice that at this point of the traversal the event e 1 1 becomes determined and thus e 2 2 , e 1 1 ∈ G LB .sim rf(T C c ). It means that in the event structure the write event e 2 21 should be chosen to justify the read e 1 12 . Note that after the construction of the certification branch ends, it is not possible to just add the events e 1 12 and e 1 22 to the extracted subset, since they are in conflict with the events e 1 11 and e 1 21 . Thus, in order to form the new extracted subset X f , we have to replace the latter events with the former.
Notice that property 6a holds, because the only read event in X f with a covered s2g image is e 1 12 . Since its s2g image e 1 1 belongs to the set of determined events, G LB .sim rf(T C c ) and G LB .rf assign the same write to it. Property 7 also holds, because e 2 21 belongs to X f and its s2g image is issued.
The event structure S f has a single non-reflexive ew edge: e 1 21 , e 1 22 ∈ S f .ew. These two events form an ew equivalence class. Both of them have the same s2g image, the event e 1 2 , which is issued, and moreover the event e 1 22 belongs to X f . Thus 8 and 9 hold. Notice that the S f .ew edge e 1 21 , e 1 22 induces an S f .rf edge e 1 22 , e 2 11 ; therefore, the rf-completeness constraint is satisfied for X f . Also consider the S f .cf ∩ (S f .jfe ; (S f .po ∪ S f .jf) * ) path between the events e 1 21 and e 1 12 . Without the edge e 1 21 , e 1 22 ∈ S f .ew the existence of this path would make the event e 1 12 invisible, thus violating the X f ⊆ S f .Vis constraint. Since S f does not have any new co edges, properties 10a and 10b still hold.
On the traversal step T C d the event e 1 2 : W(y, 1) is covered. However, the event structure S d is not updated since it already 'fits' the new traversal configuration. The same applies to the traversal steps T C e and T C f .
Related Work and Conclusion
While there are several memory model definitions both for hardware architectures [1, 15, 9, 19, 20] and programming languages [2, 3, 16, 18, 13, 10] in the literature, there are relatively few compilation correctness results [11, 12, 24, 8, 17] .
As a way to show correctness of Weakestmo compilation to hardware, we employed IMM, which we extended with SC accesses, from which compilation to hardware follows. The only limitation of this approach is that IMM enforces ordering between RMW events and subsequent memory accesses. This ordering is also typically enforced in hardware architectures (x86-TSO, POWER, ARMv7), but not always. The exception is ARMv8, which has two ways of compiling RMWs: one is via a pair of load-linked and store-conditional (LDX/STX) instructions in a loop, which naturally induces a dependency from RMWs to subsequent accesses, and the other is via hardware instructions, such as CAS, LDADD, LDCLR, LDMAX, LDMIN, which do not necessarily induce dependencies to subsequent instructions.
An alternative for compiling to ARMv8 would have been to use the recently developed Promising-ARM model [20] . Indeed, since Promising-ARM is closely related to PS [11] , it should be relatively easy to prove the correctness of compilation from PS to Promising-ARM. Establishing compilation correctness of Weakestmo to Promising-ARM, however, would remain unresolved because Weakestmo and PS are incomparable [5] . Moreover, a direct compilation proof would probably also be quite difficult because of the rather different styles in which these models are defined.
A From IMM SC to ARMv8
The intended mapping of IMM to ARMv8 is presented schematically in Fig. 5 and follows [14] . Note that acquire and SC loads are compiled to the same instruction (ldar), as are release and SC stores (stlr). In ARM assembly, RMWs are represented as pairs of instructions, namely an exclusive load (ldxr) followed by an exclusive store (stxr), and these instructions also have their stronger (SC) counterparts, ldaxr and stlxr.
We use the ARMv8 declarative model [7] (see also [19] ). 20 Its labels are given by:
In turn, ARM's execution graphs are defined as IMM SC 's ones, except for the CAS dependency, casdep, which is not present in ARM executions.
The definition of ARMv8-consistency requires the following derived relations (see [19] for further explanations and details):
- G a .po| loc ∪ G a .rf ∪ G a .fr ∪ G a .co is acyclic.
We interpret the intended compilation on execution graphs:
Let G be an IMM execution graph. An ARM execution graph G a corresponds to G if the following hold:
E} where:
data, and G.addr = G a .addr (the compilation does not change RMW pairs and data/address dependencies)
- G.ctrl ⊆ G a .ctrl (the compilation only adds control dependencies)
(exclusive reads entail a control dependency to any future event, except for their immediate exclusive write successor if it arose from an atomic increment)
- G.casdep ; G.po ⊆ G a .ctrl (a CAS dependency to an exclusive read entails a control dependency to any future event)
We state our theorem that ensures IMM SC -consistency if the corresponding ARMv8 execution graph is ARMv8-consistent.
Theorem 3. Let G be an IMM execution graph with whole serial numbers (sn[G.E] ⊆ N), and let G a be an ARMv8 execution graph that corresponds to G. Then, ARMv8-consistency of G a implies IMM SC -consistency of G.
Proof (Outline). IMM-consistency of G follows from [17, Theorem 4.5] . That is, we only need to show that the sc axiom holds for G. We start by showing that
Then, we finish the proof by showing that G a .psc base ∪ G a .psc F is included in 
B From IMM SC to TSO
The intended mapping of IMM SC to TSO is presented schematically in Fig. 6 . There are two possible alternatives for compiling SC accesses (see the bottom of Fig. 6 ): compiling an SC store to a store followed by a fence, or compiling an SC load to a load preceded by a fence. Both schemes guarantee that, in the compiled code, there is a fence between every store instruction and every subsequent load instruction that originate from SC accesses. As far as SC accesses are concerned, our proof of compilation correctness from IMM SC to TSO depends only on this property. Therefore, in this section, we concentrate only on the alternative that compiles SC stores using fences.
As a model of the TSO architecture, we use a declarative model from [1] . Its labels are given by:
The following derived relations are used to define the TSO-consistency predicate.
hb TSO ≜ ppo TSO ∪ fence TSO ∪ implied fence TSO ∪ rfe ∪ co ∪ fr
Definition 11. G is called TSO-consistent if the following hold:
- codom(G.rf) = G.R. (rf-completeness)
- For every location x ∈ Loc, G.co totally orders G.W(x). (co-totality)
- po| loc ∪ rf ∪ fr ∪ co is acyclic. (sc-per-loc)
- G.rmw ∩ (G.fre ; G.coe) = ∅. (atomicity)
- G.hb TSO is acyclic. (tso-no-thin-air)
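The first four conditions can be checked on small explicit graphs with the following Python sketch. The function names and the dictionary-based encoding are hypothetical, fr is derived in the standard way as the inverse of rf composed with co, and the tso-no-thin-air condition is omitted because the components of hb TSO are not spelled out here.

```python
def acyclic(rel):
    """Depth-first check that a finite relation has no cycle."""
    succ = {}
    for a, b in rel:
        succ.setdefault(a, set()).add(b)
    state = {}            # node -> "open" while on the DFS stack, else "done"
    def visit(n):
        if state.get(n) == "done":
            return True
        if state.get(n) == "open":
            return False  # back edge: a cycle was found
        state[n] = "open"
        ok = all(visit(m) for m in succ.get(n, ()))
        state[n] = "done"
        return ok
    return all(visit(n) for n in list(succ))

def basic_tso_checks(R, W, loc, tid, po, rf, co, rmw):
    """Evaluate rf-completeness, co-totality, sc-per-loc, and atomicity."""
    fr = {(r, w2) for (w, r) in rf for (w1, w2) in co if w1 == w}
    ext = lambda rel: {(a, b) for (a, b) in rel if tid[a] != tid[b]}
    po_loc = {(a, b) for (a, b) in po
              if loc.get(a) is not None and loc.get(a) == loc.get(b)}
    fre_coe = {(a, c) for (a, b) in ext(fr) for (b2, c) in ext(co) if b == b2}
    return {
        "rf-completeness": {r for (_, r) in rf} == set(R),
        "co-totality": all((a, b) in co or (b, a) in co
                           for a in W for b in W
                           if a != b and loc[a] == loc[b]),
        "sc-per-loc": acyclic(po_loc | rf | fr | co),
        "atomicity": not (set(rmw) & fre_coe),
    }

# One thread writes x and then reads x; "i" is the initial write to x.
loc = {"i": "x", "w": "x", "r": "x"}
tid = {"i": 0, "w": 1, "r": 1}
common = dict(R={"r"}, W={"i", "w"}, loc=loc, tid=tid,
              po={("w", "r")}, co={("i", "w")}, rmw=set())

stale = basic_tso_checks(rf={("i", "r")}, **common)   # reads the overwritten value
fresh = basic_tso_checks(rf={("w", "r")}, **common)   # reads its own write
```

In the first graph the read observes the initial value after its own thread has overwritten it, which creates a po| loc ; fr cycle and violates sc-per-loc; in the second graph all four conditions hold.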
Next, we state our theorem that ensures IMM SC -consistency if the corresponding TSO execution graph is TSO-consistent.
Theorem 4. Let G be an IMM SC execution graph with whole identifiers (G.E ⊆ N), and let G t be a TSO execution graph that corresponds to G. Then, TSO-consistency of G t implies IMM SC -consistency of G.
Proof (Outline). Since G t corresponds to G, we know that [G.W sc ]; G.po; [G.R sc ] ⊆ G t .po; [G t .MFENCE]; G t .po by the aforementioned property of the compilation scheme. We show that G t .ehb TSO ≜ G t .hb TSO ∪ [G t .MFENCE]; G t .po ∪ G t .po; [G t .MFENCE] is acyclic. Then, we show that G.psc base ∪ G.psc F is included in G t .ehb + TSO . It means that the sc axiom holds for G, and it remains to prove that G is IMM-consistent. That is done by standard relational techniques (see [6] ).
The correspondence between IMM and POWER execution graphs that follows the trailing compilation scheme may be presented similarly, with two main differences. First, obviously, SC accesses are compiled to release and acquire accesses followed by SC fences:
Theorem 5. Let G be an IMM execution graph with whole identifiers (G.E ⊆ N), and let G p be a POWER execution graph that corresponds to G. Then, POWER-consistency of G p implies IMM SC -consistency of G.
Proof (Outline). We construct an IMM execution graph G ′ by inserting SC fences before SC accesses in G. We also construct G NoSC from G ′ by replacing SC write and read accesses of G ′ with release write and acquire read ones respectively.
Obviously, IMM SC -consistency of G follows from IMM SC -consistency of G ′ , which, in turn, follows from IMM-consistency of G NoSC by Theorem 2. We construct an IMM execution graph G ′′ from G NoSC by inserting release fences before release writes, and then an IMM execution graph G NoRel from G ′′ by weakening the access modes of release write events to a relaxed mode. As in the previous proof step, IMM-consistency of G NoSC follows from IMM-consistency of G ′′ , which in turn follows from IMM-consistency of G NoRel by [17, Theorem 4.1] .
Thus to prove the theorem we need to show that G NoRel is IMM-consistent. Note that G p -the POWER execution graph corresponding to G-also corresponds to G NoRel by construction of G NoRel . That is, IMM-consistency of G NoRel follows from POWER-consistency of G p by [17, Theorem 4.3] since G NoRel does not contain SC read and write access events as well as release write access events.
□
