This paper develops an operational semantics for a release-acquire fragment of the C11 memory model with relaxed accesses. We show that the semantics is both sound and complete with respect to the axiomatic model of Batty et al. The semantics relies on a per-thread notion of observability, which allows one to reason about a weak memory C11 program in program order. On top of this, we develop a proof calculus for invariant-based reasoning, which we use to verify the release-acquire version of Peterson's mutual exclusion algorithm.
Introduction
Intensive research on the correctness of shared-memory concurrent programs over the last three decades has resulted in a variety of tools and techniques. However, the vast majority of these have been developed on the assumption of sequential consistency [20]. Programs running on modern hardware execute under weak memory models [2], requiring many of these techniques to be reworked. This paper focuses on the C11 memory model, which has been the topic of several recent papers (e.g., [4-6, 8, 11, 13, 14, 16, 19, 21, 22]). The C11 memory model is typically described using an axiomatic semantics [4-6, 19] via a two-step procedure: (1) construct candidate executions of a program comprising low-level operations (e.g., reads and writes), in which reads may return arbitrary values; (2) apply a number of axioms over the memory model to rule out invalid candidate executions. Such axioms may state, for instance, that every read is validated by a write that has written the value read. Of particular interest are axioms that exclude certain cycles from arising. However precise, axiomatic definitions are unsuitable for program verification (in particular, verification involving invariant-based reasoning), which requires one to consider the step-wise execution of a program. There has therefore been a substantial effort to develop operational semantics, both for weak memory models in general [13, 14, 17] and for C11 specifically [21, 22].
Our key goal in this paper is to develop an operational model that supports verification of weak memory C11 programs. Like many programming languages, C11 has several advanced features, e.g., speculation, that contribute to the complexity of logics for reasoning about them. Some operational models (e.g., [22]) attempt to deal with the full complexity of the language and its behaviour. Other models focus on a well-behaved and well-understood fragment (e.g., [14, 17]). In order to support an intuitive verification method, we take the latter course. We do not handle some forms of speculation (load buffering), release sequences, non-atomic accesses or sequentially consistent accesses. This leaves us with the so-called RAR fragment [5] of C11 (see Section 4.1), in which sb ∪ rf is acyclic, and thus dependencies between operations are easier to manage. All read/write/update operations are either relaxed or synchronised via release-acquire annotations. Acyclicity of sb ∪ rf precludes behaviours allowed by hardware architectures such as ARMv8 [25]. Thus, to ensure that programs proved correct by our logic remain correct on such hardware, one must ensure adequate fencing of independent instructions during compilation (see [18] for details).
This paper comprises three main contributions. The first contribution is an operational semantics for the RAR fragment that we prove to be both sound and complete with respect to the axiomatic definition. Our semantics (like [14, 24] ) allows each thread to have its own (per-thread) observations of memory. We build on the recently proposed extended coherence order [19] (which is the transitive closure of the communication relation in [3] ). The extended coherence order describes the order of reads and writes to a variable (see Example 3.3), which in turn enables one to define how events may be introduced in a valid C11 execution without violating validity of the axioms.
We combine the extended coherence order with the causality relation of C11 (formalised by happens-before) to define the set of writes already encountered by each thread. This set is in turn used to define the writes observable by the thread (see Section 3.2). Our operational semantics naturally builds on observability: reads are validated on the fly (as opposed to post hoc, as in the axiomatic semantics). Thus, each state constructed using the transition relations of our operational semantics is a valid C11 state (see Section 4.2). Moreover, we show that any candidate execution that is valid according to the axiomatic semantics can be generated by our operational semantics.
The second contribution is a verification technique that builds on the operational semantics to enable inductive reasoning over the program steps. One difficulty in using an operational semantics of weak-memory to support verification is the fact that the state spaces of such operational models are far more complicated than the state space that one would use for a verification over sequentially consistent memory, where the shared store can be represented using a simple mapping from variables to values. We address this issue by developing a notation that builds on conventional reasoning (over sequentially consistent memory). For example, we include assertions that ensure a thread will read a particular value in a C11 state and assertions that ensure happens-before order between writes to different variables. The former is analogous to equations on variables and their values in the conventional setting; the latter has no direct analogue in a sequentially consistent setting (the closest analogue is the use of auxiliary variables [23] to record whether certain operations have already occurred).
Our third contribution is the demonstration of the utility of our verification method by proving the mutual exclusion property of a C11 version of Peterson's algorithm [28] .
Command Language
This section describes our command language and defines its uninterpreted operational semantics, namely, an operational semantics that generates the read, write or update action for each step of the corresponding command. These actions are in turn used to generate state transitions in Section 3, where the reads and writes are interpreted in a C11 state. This decoupled approach is inspired by that of Lahav et al. [16].
Syntax
The syntax of commands (for a single thread) is defined by the following grammar, where Exp and Com define expressions and commands, respectively. We assume that ⊖ is a unary operator (e.g., ¬), ⊗ is a binary operator (e.g., ∧, ∨), B is an expression (of type Exp) that evaluates to a boolean, x is a variable (of type Var) and n is a value (of type Val).
Commands have their standard meanings. The only exceptions are the synchronising annotations, release R, acquire A and release-acquire RA (which we describe in detail below), and the command swap, which generates a read-modify-write update event, atomically swapping the variable x with value n. Note that, for simplicity, we only present a release-acquire version of the swap operation, but leave in the RA annotation for emphasis. Furthermore, we assume unannotated accesses are relaxed, i.e., data races do not give rise to undefined behaviour; however, it is straightforward to extend the semantics to incorporate non-atomic accesses (which potentially generate undefined behaviour).
Example 2.1 (Peterson's algorithm). The running example for this paper is the classic Peterson's mutual exclusion algorithm for two threads (see Algorithm 1), implemented using release-acquire annotations (this algorithm is taken from [28]). As in Peterson's original algorithm, variable flag_i is used to indicate whether thread i intends to enter its critical section, and a shared variable turn is used to "give way" when both threads intend to enter their critical sections at the same time.
The difference in the C11 implementation is with the synchronisation annotations. The flag variable is set to true (line 2) using relaxed atomics (which does not induce any synchronisation), but is set to false (line 6) using a release annotation. The intention of the latter is to synchronise this write to flag with the read of flag at line 4 in the other thread. The value of turn is set using a swap command, which induces release-acquire synchronisation. Note that the read of turn within the guard of the busy wait loop (line 4) is relaxed. However, as we shall see, the algorithm still satisfies the mutual exclusion property.
Uninterpreted semantics
The uninterpreted operational semantics of commands is given by a relation $\longrightarrow \subseteq Com \times Act_\tau \times Com$, where

$Act = \bigcup_{x \in Var;\, m, n \in Val} \{rd(x, n),\ rd^A(x, n),\ wr(x, n),\ wr^R(x, n),\ upd^{RA}(x, m, n)\}$,

$\tau \notin Act$ is a silent action and $Act_\tau = Act \cup \{\tau\}$. We write $C \xrightarrow{a} C'$ for $(C, a, C') \in \longrightarrow$. An expression evaluation step is formalised by a relation eval(E, a, E′), where E, E′ are expressions and a is a read action that is generated by the evaluation step (see Figure 1). We assume fv(E) returns the set of free variables in E. Note that eval(E, a, E′) is only defined when fv(E) ≠ ∅. Moreover, in the presence of a binary operator, expression evaluation is assumed to take place from left to right. The notation E[n/x] stands for expression E with variable x replaced by value n.
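To make the left-to-right evaluation concrete, the following Python sketch models one eval(E, a, E′) step. It is a hypothetical encoding rather than the rules of Figure 1: expressions are small dataclasses (`Val`, `Var`, `Un`, `Bin`, names chosen for illustration), the read is assumed to target the leftmost free variable x, and the result is E[n/x]. The value n is passed in as a parameter, mirroring the fact that the uninterpreted semantics lets a read return any value.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical AST for Exp (illustration only).
@dataclass(frozen=True)
class Val:
    n: int

@dataclass(frozen=True)
class Var:
    x: str

@dataclass(frozen=True)
class Un:
    op: str
    arg: "Exp"

@dataclass(frozen=True)
class Bin:
    op: str
    left: "Exp"
    right: "Exp"

Exp = Union[Val, Var, Un, Bin]


def leftmost_var(e):
    """Leftmost free variable of e, or None if fv(e) is empty."""
    if isinstance(e, Var):
        return e.x
    if isinstance(e, Un):
        return leftmost_var(e.arg)
    if isinstance(e, Bin):
        first = leftmost_var(e.left)
        return first if first is not None else leftmost_var(e.right)
    return None


def subst(e, x, n):
    """E[n/x]: replace every occurrence of variable x by value n."""
    if isinstance(e, Var):
        return Val(n) if e.x == x else e
    if isinstance(e, Un):
        return Un(e.op, subst(e.arg, x, n))
    if isinstance(e, Bin):
        return Bin(e.op, subst(e.left, x, n), subst(e.right, x, n))
    return e


def eval_step(e, n):
    """One eval(E, rd(x, n), E[n/x]) step; undefined when fv(E) is empty."""
    x = leftmost_var(e)
    if x is None:
        raise ValueError("eval is undefined for variable-free expressions")
    return ("rd", x, n), subst(e, x, n)


# A Peterson-style guard, flag2 AND turn == 2, read once per variable.
guard = Bin("and", Var("flag2"), Bin("eq", Var("turn"), Val(2)))
a1, g1 = eval_step(guard, 1)   # reads flag2
a2, g2 = eval_step(g1, 2)      # then reads turn
```

Each step emits exactly one read action, so a guard over two variables produces two reads, in left-to-right order.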
The uninterpreted operational semantics for commands is given in Figure 2. Again, most of these rules are straightforward. We assume ⟦E⟧ denotes the value of the (variable-free) expression E. An assignment x := E generates a read action whenever fv(E) ≠ ∅ and a write action whenever fv(E) = ∅. A swap command generates an update action, and guard evaluation generates either a read or a silent action.
Note that the uninterpreted operational semantics allows any value to be read. Thus, we have the following property.
For simplicity, we assume concurrency at the top level only. We let T be the set of all threads and model a program comprising multiple threads as a function of type Prog = T → Com. The uninterpreted operational semantics of a program is given by a relation −→ ⊆ Prog × T × Act_τ × Prog (using overloading). As before, we write $P \xrightarrow{a}_t P'$ for (P, t, a, P′) ∈ −→. An evaluation step of a program P is given by the rule Prog (Figure 2), which relies on the uninterpreted operational semantics of a command to generate an action a and command C from the command P(t). The program after taking a transition is P, but with t mapped to the new command C.
Since threads execute independently in the uninterpreted semantics, actions of distinct threads commute.
An Operational Semantics for RAR C11
We now extend the semantics from Section 2 and interpret read, write and update actions in the C11 memory model. We develop an operational semantics that takes inspiration from the axiomatic descriptions [5, 6, 19] . In Section 4.2, we show that the operational model is in fact equivalent to a reformulation (inspired by [19] ) of the RAR fragment of the RC11 semantics [5] .
We formalise C11 states in Section 3.1 and define an operational event semantics based on observability (Section 3.2). This event semantics in turn gives rise to an interpreted semantics (Section 3.3).
C11 States and Basic Orders
The formalisation in this section follows the existing literature on axiomatic C11 semantics [6, 19]. First we give some preliminary definitions.
Notation. For an action a ∈ Act, we let var(a) ∈ Var be the variable read (or written to), rdval(a) ∈ Val be the value read and wrval(a) ∈ Val be the value written. We extend actions to events of type Evt = G × Act_τ × T, where G is the set of tags used to uniquely identify events in an execution. For an event e = (g, a, t), where g is a tag, a is an action, and t is a thread identifier, we define tag(e) = g, act(e) = a, tid(e) = t, and (using lifting) var(e) = var(act(e)), wrval(e) = wrval(act(e)) and rdval(e) = rdval(act(e)). For a relation R ⊆ Evt × Evt, we let R|_t and R|_v be the restriction of R to events of thread t and variable v, respectively.
We let U denote the set of RMW update events, and distinguish the sets $Wr^R \supseteq U$ (write release), $Rd^A \supseteq U$ (read acquire), $Wr^X$ (write relaxed) and $Rd^X$ (read relaxed). Finally, we define $Rd = Rd^A \cup Rd^X$ (all reads) and $Wr = Wr^R \cup Wr^X$ (all writes).
A C11 state is a tuple ((D, sb), rf, mo) comprising a set of events D paired with a sequenced-before relation sb ⊆ D × D, a reads-from relation rf ⊆ Wr × Rd and a modification order mo ⊆ Wr × Wr.
We let Σ denote the set of all C11 states. The three relations in a C11 state ((D, sb), rf, mo) reflect different relationships between operations. The sequenced-before relation sb records the program order within one thread; sb|_t is a strict total order for each thread t. The reads-from relation rf provides the justification for the values being read: every read must have a corresponding action that writes the value read.

Weak memory models are often defined in terms of a happens-before order (denoted hb), which formalises a notion of causality. In C11, an event occurring in a thread before another event in the same thread induces sequenced-before order (denoted sb), which in turn induces happens-before order. Moreover, reads-from edges induce happens-before order when the corresponding actions in the edge are synchronising actions (i.e., a release and an acquire). This is formalised by an additional synchronises-with relation (denoted sw). Formally, we define

$sw = rf \cap (Wr^R \times Rd^A) \qquad hb = (sb \cup sw)^+$
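As a small executable reading of these definitions, the Python sketch below represents relations as sets of pairs and computes sw and hb on a message-passing fragment. The event names are illustrative, initialising writes are omitted for brevity, and the naive fixpoint closure is chosen for clarity rather than efficiency.

```python
def compose(r, s):
    """Relational composition r ; s over sets of pairs."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def transitive_closure(r):
    """Least transitive relation containing r (naive fixpoint iteration)."""
    result = set(r)
    while True:
        step = result | compose(result, result)
        if step == result:
            return result
        result = step


def synchronises_with(rf, wr_release, rd_acquire):
    """sw = rf ∩ (Wr^R × Rd^A): rf edges from a release write to an acquire read."""
    return {(w, r) for (w, r) in rf if w in wr_release and r in rd_acquire}


def happens_before(sb, rf, wr_release, rd_acquire):
    """hb = (sb ∪ sw)^+."""
    return transitive_closure(sb | synchronises_with(rf, wr_release, rd_acquire))


# Thread 1 writes x, then writes y; thread 2 reads y (from thread 1), then reads x.
sb = {("wr1_x", "wr1_y"), ("rd2_y", "rd2_x")}
rf = {("wr1_y", "rd2_y")}
hb_sync = happens_before(sb, rf, wr_release={"wr1_y"}, rd_acquire={"rd2_y"})
hb_relaxed = happens_before(sb, rf, wr_release=set(), rd_acquire=set())
```

With the release/acquire pair on y, the write to x happens-before thread 2's read of x; with relaxed accesses the same rf edge induces no hb edge.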
As is standard in the literature, we assume all variables are initialised by a special thread 0 ∈ T. Define the set of initialising writes to be IWr = {w ∈ Wr | tid(w) = 0}. The initial states of our operational model are those of the form σ_0 = ((I, ∅), ∅, ∅), where I ⊆ IWr and, for each variable x, there is exactly one write w ∈ I such that var(w) = x.
The relation $fr = (rf^{-1}; mo) \setminus Id$ (where ; is relational composition) is the "from-read" relation 1 that relates each read to all writes that are mo-after the write the read has read from. We must subtract identity (Id) edges from $rf^{-1}; mo$ to cope with update events, which have the potential to induce reflexivity in fr [5, 19].
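The role of the Id subtraction can be checked directly. In this Python sketch (event names are illustrative), an update u reads from the initialising write w0 and is mo-after it, so rf⁻¹; mo contains the reflexive pair (u, u), which the definition of fr removes; the plain read r keeps its from-read edge to u.

```python
def compose(r, s):
    """Relational composition r ; s over sets of pairs."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def from_read(rf, mo):
    """fr = (rf^-1 ; mo) minus the identity relation Id."""
    rf_inv = {(r, w) for (w, r) in rf}
    return {(a, b) for (a, b) in compose(rf_inv, mo) if a != b}


# w0 initialises x; update u reads w0 and is mo-after it; plain read r also reads w0.
rf = {("w0", "u"), ("w0", "r")}
mo = {("w0", "u")}
raw = compose({(r, w) for (w, r) in rf}, mo)   # contains the spurious (u, u)
fr = from_read(rf, mo)
```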
Example 3.2. An example C11 state is given below, where threads 1-4 have executed some actions. Since the actions are unique, we elide the tags from each event, and we identify the thread id with the action itself, e.g., wr 1 (y, 1) is the action wr (y, 1) executed by thread 1.
1 fr is also referred to as "reads-before" [19] .
The initialising writes are sb-before all thread actions, but are not ordered amongst themselves. Relation sb also describes the order for each thread. Relation mo describes the order of modifications for each variable. The unsynchronised read rd 4 (z, 3) is justified by the rf from wr 3 (z, 3), whereas the synchronised read rd A 3 (x, 2) is justified by the sw from wr R 2 (x, 2) and fixed before upd RA 1 (x, 2, 4) via the fr relation. Update events are related by both mo and rf to the immediately preceding write, and possibly related to later writes/updates by mo and fr. If the write being read is releasing, then an update induces an sw (e.g., see upd RA 1 (x, 2, 4)). □
In addition, our semantics uses the extended coherence order 2 [19], denoted eco, which fixes the order of reads and writes to each variable (see Example 3.3 below). Formally, we define $eco = (fr \cup mo \cup rf)^+$.

Example 3.3. Reads r_1, r′_1 and r″_1 read from the write w_1, inducing from-read edges to w_2 (the write that immediately follows w_1 in mo). The update u induces an rf from w_3 (the write event immediately before u in mo) and an fr to w_4 (the write event immediately after u in mo). □

Figure 3. Event semantics, assuming σ = ((D, sb), rf, mo), e = (g, a, t) and g ∉ tags(D).
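The eco definition can be exercised on a single-variable state shaped like Example 3.3. This Python sketch assumes the mo chain w1 < w2 < w3 < u < w4 with one read r1 of w1 (the event names are illustrative); it computes fr and then eco as the transitive closure of fr ∪ mo ∪ rf.

```python
def compose(r, s):
    """Relational composition r ; s over sets of pairs."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def transitive_closure(r):
    """Least transitive relation containing r (naive fixpoint iteration)."""
    result = set(r)
    while True:
        step = result | compose(result, result)
        if step == result:
            return result
        result = step


def from_read(rf, mo):
    """fr = (rf^-1 ; mo) minus Id."""
    rf_inv = {(r, w) for (w, r) in rf}
    return {(a, b) for (a, b) in compose(rf_inv, mo) if a != b}


# One variable; mo is the total chain w1 < w2 < w3 < u < w4, where u is an update.
chain = ["w1", "w2", "w3", "u", "w4"]
mo = {(a, b) for i, a in enumerate(chain) for b in chain[i + 1:]}
rf = {("w1", "r1"), ("w3", "u")}   # r1 reads w1; u reads w3
fr = from_read(rf, mo)
eco = transitive_closure(fr | mo | rf)
```

The read r1 is placed between w1 and w2, and the update u reads w3 while being fr-before w4, so eco totally fixes where each access sits.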
Event Semantics and Observability
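Recalling that Σ denotes the set of all possible C11 states and Wr is the set of all writes (including updates), each step of the event semantics is formalised by the transition relation $\Rightarrow_{RA} \subseteq \Sigma \times Wr_\perp \times Evt \times \Sigma$ (see Figure 3), where $Wr_\perp = Wr \cup \{\perp\}$ and $\perp \notin Wr$. Again, we write $\sigma \xRightarrow{w, e}_{RA} \sigma'$ for $(\sigma, w, e, \sigma') \in \Rightarrow_{RA}$.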
For each rule $\sigma \xRightarrow{w, e}_{RA} \sigma'$, w is the write being observed by the event e. Strictly speaking, the event semantics could be defined without the w. However, making this observed write explicit is useful for the verification (Section 5).
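We now describe each of the rules in Figure 3. Executing each event e updates (D, sb) to:

$(D, sb) + e = (D \cup \{e\},\ sb \cup \{(e', e) \mid e' \in D \wedge (tid(e') = tid(e) \vee tid(e') = 0)\})$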
Thus, the initial writes are sb-prior to every non-initialising event. Relations rf and mo are updated according to the write events in D that are observable to the thread executing the given event. To this end, we must distinguish three sets of writes: encountered writes and observable writes, which are specific to each thread, and covered writes, which are the set of writes that are immediately followed, in reads-from order, by an update event.
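The (D, sb) + e update can be sketched in Python as follows. The sketch assumes, as described above, that the new event becomes sb-after every existing event of its own thread and every initialising write (thread 0); the `Event` tuple and the tag check (tags must be fresh, as in the caption of Figure 3) are illustrative encodings, not part of the paper's formalisation.

```python
from typing import NamedTuple


class Event(NamedTuple):
    tag: int      # unique tag g
    act: tuple    # e.g. ("wr", "x", 1) or ("rd", "y", 0)
    tid: int      # thread identifier; 0 is the initialising thread


def add_event(D, sb, e):
    """(D, sb) + e: e is sb-after its own thread's events and all initialising writes."""
    if any(d.tag == e.tag for d in D):
        raise ValueError("tag of e must be fresh")
    new_edges = {(d, e) for d in D if d.tid in (e.tid, 0)}
    return D | {e}, sb | new_edges


init_x = Event(0, ("wr", "x", 0), 0)
init_y = Event(1, ("wr", "y", 0), 0)
D, sb = frozenset({init_x, init_y}), frozenset()
e1 = Event(2, ("wr", "x", 1), 1)
e2 = Event(3, ("rd", "y", 0), 2)
e3 = Event(4, ("rd", "x", 1), 1)
for e in (e1, e2, e3):
    D, sb = add_event(D, sb, e)
```

Because each new event is linked to all earlier events of its thread (not just the latest), sb stays transitive without a separate closure step.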
The set of encountered writes are the writes that thread t is aware of (either directly or indirectly) in state σ = ((D, sb), rf, mo), and are given by:

$EW_\sigma(t) = \{w \in Wr \cap D \mid \exists e \in D.\ tid(e) = t \wedge (w, e) \in eco^?; hb^?\}$

where R^? is the reflexive closure of relation R. Thus, for each w ∈ EW_σ(t), there must exist an event e of thread t such that w is eco-prior, hb-prior or eco;hb-prior to e, or w is e itself. Note that EW_σ(t) = ∅ if thread t has not executed any actions; as soon as the thread executes its first action, we have I ⊆ EW_σ(t).
From these, we determine the observable writes, which are the writes that thread t can observe in its next read. These are defined as:

$OW_\sigma(t) = \{w \in Wr \cap D \mid \forall w' \in EW_\sigma(t).\ (w, w') \notin mo\}$

Thus, observable writes are not succeeded by any encountered write in modification order, i.e., the thread has not seen another write overwriting the value being read. Finally, to guarantee atomicity of the update events, there cannot be any write operations (in modification order) between the write that an update reads from and the write of the update itself. We therefore define the set of covered writes as follows:

$CW_\sigma = \{w \in Wr \cap D \mid \exists u \in U \cap D.\ (w, u) \in rf\}$

Given that I = {wr 0 (x, 0), wr 0 (y, 0), wr 0 (z, 0)} is the set of initialising writes, the encountered writes for each thread are as follows:
Observable writes are used to resolve the read events in each thread. Namely, a thread t can read from any write event in OW σ (t). This is reflected in the Read rule, where the rf component is updated to record an rf from some observable write w to the read event e, provided w writes to the variable that e reads and the value read matches the value written.
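The following Python sketch computes encountered, observable and covered writes on a message-passing state: thread 1 writes x and then y, and thread 2 reads y from thread 1. Event names and the `observability` helper are hypothetical; the sketch only shows how the set definitions fit together, comparing the case where the y-accesses are release/acquire with the case where they are relaxed.

```python
def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def closure(r):
    result = set(r)
    while True:
        step = result | compose(result, result)
        if step == result:
            return result
        result = step


def from_read(rf, mo):
    rf_inv = {(r, w) for (w, r) in rf}
    return {(a, b) for (a, b) in compose(rf_inv, mo) if a != b}


def encountered(t, D, writes, tid, eco, hb):
    """EW(t): writes w with (w, e) in eco?;hb? for some event e of thread t."""
    refl = {(d, d) for d in D}
    path = compose(eco | refl, hb | refl)
    mine = {e for e in D if tid[e] == t}
    return {w for w in writes if any((w, e) in path for e in mine)}


def observable(t, D, writes, tid, eco, hb, mo):
    """OW(t): writes that are not mo-before any write thread t has encountered."""
    ew = encountered(t, D, writes, tid, eco, hb)
    return {w for w in writes if not any((w, w2) in mo for w2 in ew)}


def covered(rf, updates):
    """CW: writes that some update reads from."""
    return {w for (w, u) in rf if u in updates}


def observability(release, acquire):
    """OW(2) in the message-passing state; release/acquire pick the synchronising accesses."""
    tid = {"ix": 0, "iy": 0, "w1x": 1, "w1y": 1, "r2y": 2}
    D, writes = set(tid), {"ix", "iy", "w1x", "w1y"}
    sb = {(i, e) for i in ("ix", "iy") for e in ("w1x", "w1y", "r2y")} | {("w1x", "w1y")}
    rf, mo = {("w1y", "r2y")}, {("ix", "w1x"), ("iy", "w1y")}
    sw = {(w, r) for (w, r) in rf if w in release and r in acquire}
    hb = closure(sb | sw)
    eco = closure(from_read(rf, mo) | mo | rf)
    return observable(2, D, writes, tid, eco, hb, mo)
```

With synchronisation, thread 2 can no longer observe the initial value of x; with relaxed accesses it still can.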
To explain the write and update semantics, we require some more formal machinery. The observable and covered writes together determine the allowable updates to the mo relation after executing a write event. Unlike under SC, a write event to variable x is not simply appended to the end of mo|_x. Instead, we allow a thread t that performs a write e (or update) to x to insert e after any observable write w in mo|_x that is not a covered write. This condition is sufficient to ensure that no cyclic dependencies arise as a result of performing the write.
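Given that R[x] is the relational image of x in R, we define $R{\Downarrow}x = \{x\} \cup R^{-1}[x]$ to be the set containing x and all elements that are R-related to x. The insertion of a write event e directly after a write w in mo is given by

$mo[w, e] = mo \cup \{(w', e) \mid w' \in mo{\Downarrow}w\} \cup \{(e, w'') \mid (w, w'') \in mo\}$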
The rules Write and RMW update mo in the same way.
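A Python sketch of the insertion follows, written from the prose description: e is placed after w and all of w's mo-predecessors, and before every mo-successor of w. The helper names (`down`, `insert_after`, `may_insert_after`) are illustrative, and the side-condition check only restates the "observable and not covered" requirement described above.

```python
def down(r, x):
    """R⇓x = {x} ∪ R^-1[x]: x together with every element R-related to x."""
    return {x} | {a for (a, b) in r if b == x}


def insert_after(mo, w, e):
    """mo[w, e]: e goes after w (and its mo-predecessors) and before w's mo-successors."""
    before = {(p, e) for p in down(mo, w)}
    after = {(e, s) for (a, s) in mo if a == w}
    return mo | before | after


def may_insert_after(w, observable, covered):
    """Write/RMW side condition: w is observable to the writing thread and not covered."""
    return w in observable and w not in covered


mo = {("a", "b"), ("a", "c"), ("b", "c")}   # a < b < c, all writes to one variable
mo_mid = insert_after(mo, "a", "e")          # a < e < b < c
mo_end = insert_after(mo, "c", "e")          # a < b < c < e
```

Inserting after the mo-last write reproduces the sequentially consistent "append" as a special case.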
Interpreted Semantics
We now combine the event semantics with the uninterpreted semantics to give an interpreted semantics for the language in Section 2 overall. We give two generic rules that allow different memory models to be plugged in for the event semantics.
To this end, we define a configuration to be a pair (P, σ ), consisting of a program P and a state σ of the memory model. The command part of a configuration triggers events that are agnostic to values. However, the memory model will only allow certain values in read events. This idea is captured by the following two rules combining the uninterpreted program semantics (i.e., rule Prog) from Section 2.2 and an event semantics in some memory model M:
The first rule describes a τ-step and does not change the state. The second states that a thread can execute action a in the current state σ only if the event semantics of the memory model under consideration permits it.
In the state without the boxed event, thread 2 can read from wr 0 (turn, 1) via a read event, but it cannot do so via an update because wr 0 (turn, 1) is covered by the existing update upd RA 1 (turn, 1, 2). Hence the update of thread 2 (the boxed event) updates turn from 2 to 1, which creates mo, sw and fr edges from upd RA 1 (turn, 1, 2) to the new update. Now consider a continuation from the state with the boxed event, where the threads read the values in their respective guards. Thread 2 has encountered wr 1 (flag 1 , true), and hence is no longer able to observe wr 0 (flag 1 , false). Similarly, since thread 2 has encountered upd RA 2 (turn, 2, 1), it is no longer able to observe wr 0 (turn, 1) or upd RA 1 (turn, 1, 2). We therefore conclude that thread 2's guard will evaluate to true, causing it to spin at line 4. In contrast, thread 1 can read from either wr 0 (flag 2 , false) or wr 2 (flag 2 , true), since it has not yet encountered the event wr 2 (flag 2 , true). Similarly, since it has not yet encountered upd RA 2 (turn, 2, 1), it can read from both upd RA 1 (turn, 1, 2) and upd RA 2 (turn, 2, 1). Thread 1 therefore could either spin at line 4 or exit the busy loop. Note that once it has read a new value for flag 2 or turn, the previous value (in mo-order) can no longer be read.
This example demonstrates how the basic synchronisation principle of Peterson's algorithm is guaranteed by the release-acquire annotations. Namely, (1) the updates on turn are totally ordered via hb due to the release-acquire annotation on the swap statement, and (2) the thread that is first to execute swap may fail to see that the other thread has set its flag.
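The observability claims in the Peterson example can be checked mechanically. The Python sketch below hard-codes our own encoding of the state after the boxed event (event names such as `u2_t` are hypothetical labels for the actions in the text; initialising writes are treated as relaxed and updates as both release writes and acquire reads) and computes, per thread and variable, the writes that thread can observe.

```python
def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def closure(r):
    result = set(r)
    while True:
        step = result | compose(result, result)
        if step == result:
            return result
        result = step


TID = {"i_f1": 0, "i_f2": 0, "i_t": 0,   # wr0(flag1,false), wr0(flag2,false), wr0(turn,1)
       "w1_f1": 1, "u1_t": 1,            # wr1(flag1,true), updRA1(turn,1,2)
       "w2_f2": 2, "u2_t": 2}            # wr2(flag2,true), updRA2(turn,2,1): the boxed event
VAR = {"i_f1": "flag1", "w1_f1": "flag1", "i_f2": "flag2", "w2_f2": "flag2",
       "i_t": "turn", "u1_t": "turn", "u2_t": "turn"}
UPDATES = {"u1_t", "u2_t"}                # updates are release writes and acquire reads

inits = {e for e in TID if TID[e] == 0}
sb = ({(i, e) for i in inits for e in TID if TID[e] != 0}
      | {("w1_f1", "u1_t"), ("w2_f2", "u2_t")})
rf = {("i_t", "u1_t"), ("u1_t", "u2_t")}
mo = {("i_f1", "w1_f1"), ("i_f2", "w2_f2"),
      ("i_t", "u1_t"), ("i_t", "u2_t"), ("u1_t", "u2_t")}
sw = {(w, r) for (w, r) in rf if w in UPDATES and r in UPDATES}
hb = closure(sb | sw)
fr = {(a, b) for (a, b) in compose({(r, w) for (w, r) in rf}, mo) if a != b}
eco = closure(fr | mo | rf)


def observable(t, var):
    """OW(t) restricted to var; every event in this state is a write or an update."""
    refl = {(e, e) for e in TID}
    path = compose(eco | refl, hb | refl)
    ew = {w for w in TID if any((w, e) in path for e in TID if TID[e] == t)}
    return {w for w in TID if VAR[w] == var and not any((w, w2) in mo for w2 in ew)}
```

Thread 2 sees only the newest flag1 and turn values, whereas thread 1 still has two candidates for both flag2 and turn, matching the discussion above.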
Validation of Operational Semantics
We now justify our operational semantics by showing it to be sound and complete with respect to an existing axiomatic version of the C11 memory model. There are several versions of the C11 axiomatic semantics that might be regarded as both standard and complete [5, 6, 19] . Our semantics deals only with the release, acquire and relaxed annotations on operations. We call this the RAR fragment of C11. The standard C11 semantics also specifies the behaviour of operations carrying sequentially consistent and non-atomic annotations. We ignore these annotations here. Our semantics closely resembles the RAR fragment of [5] and [19] . Like [5] , we use the convention that update operations are represented as a single event, rather than a read/write pair. Like [19] , we adopt the constraint that sb ∪ rf is acyclic, and make use of the extended coherence order. 3 The axiomatic semantics is given in Section 4.1. Soundness and completeness of the memory model is presented in Section 4.2.
Background: RAR Fragment of RC11
Axiomatic semantics start with pre-executions, which are candidates for valid C11 executions. A number of axioms are used to define which of these candidates are considered real executions. Pre-executions only contain a set of events and program order (as represented by the sequenced-before relation). We call such a pair (D, sb) a pre-execution state. New events can be added to pre-execution states using the + operator in the same way as in Figure 3. Thus, if (D, sb)
Once a candidate pre-execution (D, sb) is computed, it is augmented with the relations rf and mo.
3 In the full version of this paper [7], we prove that our axiomatic model is equivalent to a variant of the RAR fragment of [5]. This proof is supported by a mechanisation in Memalloy [27], which shows that our model is equivalent to the RAR fragment for models of size up to 7. The associated .cat files have been submitted as supplementary material.
SB-Total. Sequenced-before is a total order over the events of each (non-initialising) thread and orders all initialising writes before all other events. Formally, for any e, e′ ∈ D,
$((e, e') \in sb \Rightarrow tid(e) = 0 \vee tid(e) = tid(e')) \wedge (tid(e) = 0 \wedge tid(e') \neq 0 \Rightarrow (e, e') \in sb) \wedge (tid(e) \neq 0 \wedge tid(e) = tid(e') \wedge e \neq e' \Rightarrow (e, e') \in sb \cup sb^{-1})$.
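MO-Valid. Modification order is a strict order on Wr ∩ D consisting of a disjoint union of relations {mo|_x}_{x ∈ Var}, which are themselves total. That is, for any w, w′ ∈ Wr ∩ D,

$((w, w') \in mo \Rightarrow var(w) = var(w')) \wedge (var(w) = var(w') \wedge w \neq w' \Rightarrow (w, w') \in mo \cup mo^{-1})$.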
RF-Complete. Each read matches exactly one write in the execution, i.e., for every e ∈ Rd ∩ D there is exactly one w ∈ Wr ∩ D such that (w, e) ∈ rf, and for every (e, e ′ ) ∈ rf, e ∈ Wr ∧ e ′ ∈ Rd ∧ var(e) = var(e ′ ) ∧ wrval(e) = rdval(e ′ ).
No-Thin-Air. The relation sb ∪ rf is acyclic.
Coherence. The relations $hb; eco^?$ and $eco$ are irreflexive.
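The two cycle-excluding axioms can be sketched as an executable check. The Python function below (a hypothetical helper, not the paper's mechanisation) assumes that SB-Total, MO-Valid and RF-Complete already hold, and tests No-Thin-Air (acyclicity of sb ∪ rf, i.e., irreflexivity of its transitive closure) and Coherence on two small candidates.

```python
def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def closure(r):
    result = set(r)
    while True:
        step = result | compose(result, result)
        if step == result:
            return result
        result = step


def irreflexive(r):
    return all(a != b for (a, b) in r)


def from_read(rf, mo):
    rf_inv = {(r, w) for (w, r) in rf}
    return {(a, b) for (a, b) in compose(rf_inv, mo) if a != b}


def satisfies_axioms(events, sb, rf, mo, release=frozenset(), acquire=frozenset()):
    """No-Thin-Air and Coherence only; the other axioms are assumed to hold."""
    if not irreflexive(closure(sb | rf)):          # No-Thin-Air: sb ∪ rf acyclic
        return False
    sw = {(w, r) for (w, r) in rf if w in release and r in acquire}
    hb = closure(sb | sw)
    eco = closure(from_read(rf, mo) | mo | rf)
    eco_opt = eco | {(e, e) for e in events}       # eco? (reflexive closure)
    return irreflexive(eco) and irreflexive(compose(hb, eco_opt))


# Coherence: a thread writes x (w1) and then reads x; w0 is the initial write.
events = {"w0", "w1", "r1"}
sb = {("w0", "w1"), ("w0", "r1"), ("w1", "r1")}
mo = {("w0", "w1")}
stale = satisfies_axioms(events, sb, {("w0", "r1")}, mo)   # reads the overwritten value
fresh = satisfies_axioms(events, sb, {("w1", "r1")}, mo)   # reads its own write

# No-Thin-Air: each thread reads what the other writes later in program order.
lb = satisfies_axioms({"r1", "w1", "r2", "w2"}, {("r1", "w1"), ("r2", "w2")},
                      {("w2", "r1"), ("w1", "r2")}, set())
```

Reading the stale value creates an fr edge from r1 back to w1, which closes a cycle with the sb edge from w1 to r1.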
Definition 4.3. A pre-execution state γ is justifiable iff there exist relations rf and mo such that (γ, rf, mo) is valid.
Soundness and Completeness
Having defined a new operational semantics for C11, the next step is to compare it with the existing axiomatic semantics. In the following, we prove the operational and axiomatic semantics given above to be equivalent. We start by showing that the executions of the operational semantics are all consistent. We next show that all consistent executions of a program are reachable in our operational semantics. We do so in two steps. First, we consider the runs of a program on the memory model. Since the axiomatic semantics allows reads to occur before the corresponding writes in its pre-executions, not every sequence of events possible for pre-executions is also possible in the operational semantics.
Example 4.5. Consider the program given by the mapping P_0 = {1 ↦ z := x, 2 ↦ x := 5}. The following pre-execution is possible:
where δ_i = (P_i, γ_i). The pre-execution state δ_3 can be justified using the following C11 state
The sequence of events is however not possible in the RA semantics since we cannot have a read without the prior write that it reads from, and hence the first transition cannot be emulated. Still, the operational semantics can reach the same final C11 state by executing
which is also a sequence of steps in $\Longrightarrow_{PE}$. □
The "reordering" of events described in Example 4.5 is always possible: for every sequence of steps of pre-executions, we can find a corresponding permutation of these steps in which reads are ordered after their writes (and the program order within threads is preserved). Putting together Propositions 2.3 and 4.1, we have the following result.
$\Longrightarrow_{PE} (P', \gamma')$ where tid(e_1) ≠ tid(e_2), then there exists a program P_2 and a pre-execution state γ_2 such that (P, γ)
and (P_2, γ_2)
This proposition is used to prove a permutation theorem for independent elements. We say that sequence e 1 e 2 . . . e n is a linearization of a strict order ≺ iff dom(≺) ∪ ran(≺) = {e 1 , e 2 , . . . , e n } and for any e i , e j , we have e i ≺ e j ⇒ i < j.
Then for every linearization f_1, …, f_k of sb_k, there exist programs P′_1, …, P′_{n−1} and pre-execution states γ′_1, …, γ′_{n−1} such that
We now show that for every justifiable pre-execution there is an execution of the C11 semantics that ends in the C11 state justifying the pre-execution. The theorem uses a notion that restricts pre-executions and C11 executions to a set of events. For a set of events E ⊆ D, we define:
In the completeness proof, we assume that the given pre-execution sequence $(P_0, \gamma_0) \xRightarrow{e_1}_{PE} \cdots \xRightarrow{e_k}_{PE} (P_k, \gamma_k)$ has been reordered such that e_1 … e_k is a linearization of sb_k ∪ rf_k, where rf_k is the reads-from relation used in the justification of γ_k. Such a linearization is possible since sb_k ∪ rf_k is acyclic (axiom No-Thin-Air).
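The reordering step can be illustrated with a topological sort. This Python sketch uses Kahn's algorithm (our choice of technique, with alphabetical tie-breaking so the output is deterministic) to produce a linearization of sb ∪ rf for the program of Example 4.5, and checks it against the definition of linearization given earlier; the event names are illustrative.

```python
from collections import deque


def is_linearization(seq, order):
    """seq lists dom ∪ ran of the strict order exactly once, and a ≺ b implies a before b."""
    elems = {a for (a, _) in order} | {b for (_, b) in order}
    if len(seq) != len(set(seq)) or set(seq) != elems:
        return False
    pos = {e: i for i, e in enumerate(seq)}
    return all(pos[a] < pos[b] for (a, b) in order)


def linearize(events, rel):
    """Kahn's algorithm: a linearization of rel, or ValueError if rel is cyclic."""
    indeg = {e: 0 for e in events}
    succ = {e: [] for e in events}
    for a, b in rel:
        succ[a].append(b)
        indeg[b] += 1
    ready = deque(sorted(e for e in events if indeg[e] == 0))
    out = []
    while ready:
        e = ready.popleft()
        out.append(e)
        for s in sorted(succ[e]):
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    if len(out) != len(events):
        raise ValueError("relation is cyclic: no linearization exists")
    return out


# Example 4.5: P0 = {1 -> z := x, 2 -> x := 5}; thread 1 reads x = 5 from thread 2.
events = {"i_x", "i_z", "rd1_x", "wr1_z", "wr2_x"}
sb = ({(i, e) for i in ("i_x", "i_z") for e in ("rd1_x", "wr1_z", "wr2_x")}
      | {("rd1_x", "wr1_z")})
rf = {("wr2_x", "rd1_x")}
pre_execution_order = ["i_x", "i_z", "rd1_x", "wr1_z", "wr2_x"]
order = linearize(events, sb | rf)
```

The pre-execution order respects sb but places the read before the write it reads from; the computed order moves thread 2's write first, as the operational semantics requires.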
Verification
We now describe our verification method (Section 5.1), building on the operational semantics. In Section 5.2, we apply it to our case study, Peterson's mutual exclusion algorithm.
Verification Method
Our verification method is built around two kinds of assertions for describing states of the operational semantics. The first kind, determinate-value assertions, describe the values that a read operation might return. As such, these assertions are analogous to equations that specify the values of variables in a conventional (i.e., sequentially consistent) setting, in which the state of an algorithm can be represented as a store that maps variables to values. The second kind, variable-ordering assertions, has no direct analogue in the conventional setting. Variable-ordering assertions provide a way to describe how information about a variable propagates between threads.
Determinate-values. In the following, we assume that σ = ((D, sb), rf, mo) is a valid C11 state. We let σ.last(x) be the write or update to x in D that is not succeeded by another write or update in mo|_x. Note that σ.last(x) is well-defined in any valid state σ. When X is a set of operations and x is a variable, X|_x = {e ∈ X | var(e) = x}. For the determinate-value assertions, consider some thread t and variable x. In some states of the operational semantics, there is exactly one write that t can read from when reading x. This is true precisely when OW_σ(t)|_x = {σ.last(x)} (recall that σ.last(x) is never covered, and so σ.last(x) can always be observed in a transition). Under this condition, the value returned by a read of x in thread t must be wrval(σ.last(x)). This ultimately provides us with a weak memory analogue of an equation asserting that a given variable has a given value in a conventional sequentially consistent setting.
Definition 5.1. Let t be a thread, σ a state and v a value. The determinate-value assertion $x \overset{\sigma}{=}_t v$ holds iff
Condition (2) states that σ .last(x) is either an operation of the initialising thread, an operation of t, or happens-before an operation of t. This condition implies that t can only observe the last write to x. Formally,
Example 5.2. To illustrate the determinate-value assertion, consider the two states below. In each case, assume there are writes (not shown) to variable x that are mo-prior to the write to x. Also assume that each write is the last write in mo order.
For the state on the left, after the boxed operation, thread 2 satisfies $x \overset{\sigma}{=}_2 2$, but for the state on the right, thread 2 does not. In each case, the only write to x that thread 2 can observe is the illustrated write to x, but thread 2 satisfies the corresponding determinate-value assertion only in the left state. This is because on the left we have (wr 1 (x, 2), rd 2 (x, 2)) ∈ hb, whereas the unsynchronised rf edge on the right means that there is no analogous hb edge.
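The contrast in Example 5.2 can be reproduced with a small Python sketch. Since the body of Definition 5.1 is not reproduced here, the check below is an assumption: it tests only that σ.last(x) writes v and that Condition (2) holds (last(x) belongs to thread 0 or t, or happens-before an operation of t). The two states are our own construction in the spirit of the example: thread 2 either acquires y from a releasing write, or reads x with a relaxed read.

```python
def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def closure(r):
    result = set(r)
    while True:
        step = result | compose(result, result)
        if step == result:
            return result
        result = step


def last_write(x, writes, var, mo):
    """σ.last(x): the write to x with no mo-successor."""
    cands = [w for w in writes if var[w] == x and not any(a == w for (a, _) in mo)]
    assert len(cands) == 1, "in a valid state mo|x is total"
    return cands[0]


def determinate(t, x, v, D, writes, tid, var, wrval, hb, mo):
    """Assumed reading: last(x) writes v and satisfies Condition (2)."""
    last = last_write(x, writes, var, mo)
    cond2 = tid[last] in (0, t) or any(tid[e] == t and (last, e) in hb for e in D)
    return wrval[last] == v and cond2


def state(synchronised):
    """Thread 1: wr(x, 2); wr^R(y, 1). Thread 2: rd^A(y, 1) or a relaxed rd(x, 2)."""
    tid = {"ix": 0, "iy": 0, "w1x": 1, "w1y": 1, "r2": 2}
    var = {"ix": "x", "iy": "y", "w1x": "x", "w1y": "y",
           "r2": "y" if synchronised else "x"}
    wrval = {"ix": 0, "iy": 0, "w1x": 2, "w1y": 1}
    sb = {(i, e) for i in ("ix", "iy") for e in ("w1x", "w1y", "r2")} | {("w1x", "w1y")}
    sw = {("w1y", "r2")} if synchronised else set()   # a relaxed rf adds no hb edge
    mo = {("ix", "w1x"), ("iy", "w1y")}
    return set(tid), set(wrval), tid, var, wrval, closure(sb | sw), mo
```

In both states thread 2 can only observe the write of 2 to x, yet only the synchronised state yields the hb edge that Condition (2) demands.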
In our verification, determinate-value assertions support clean interaction with variable-ordering assertions, which we describe shortly. Note that because our operational model prevents update operations from reading from covered writes (see Section 3), i.e., updates are more restricted than reads, an update operation on a variable x may only be able to read from the last write to x even if $x \overset{\sigma}{=}_t v$ is false for all v. Below, we show how to handle important instances of this situation.
The next two lemmas are immediate from the definition of $\overset{\sigma}{=}_t$. Lemma 5.3 below ensures that the value returned by a reading transition using the semantics in Figure 3 is consistent with the determinate-value assertion. Lemma 5.4 ensures that when a determinate-value assertion holds for two threads reading from the same variable, they return the same values for the variable. Determinate-value assertions differ from their conventional counterparts in that they are relative to a particular thread. It is almost definitive of weak memory systems that distinct threads can have different views of the memory state.
Variable-ordering. How can we ensure that distinct threads agree on (or share) sufficient determinate-value assertions to support a verification? We address this problem using another class of assertion: variable-order assertions, which order two variables whenever the last writes to the variables are causally (i.e., hb) ordered. value assertion $x \overset{\sigma}{=}_t v$ can be "copied" to another thread t′, whenever t′ performs an acquiring read that reads from the last modification of y and this write is releasing. It is easy to see that in a state σ′ after such a synchronisation, σ′.last(x) happens-before an operation of t′, and thus
Inference rules. Figure 4 presents a set of rules that precisely captures the reasoning principles for determinate-value and variable-order assertions. The "copying" of determinate-value assertions is captured by rule Transfer.⁴ For the left state in Example 5.2 we can see this copying: when the boxed event $\mathit{rd}^A_2(y, 1)$ occurs (leading to state $\sigma'$), the determinate-value assertion $x \stackrel{\sigma}{=}_1 2$ is "copied" to thread 2, giving $x \stackrel{\sigma'}{=}_2 2$ by rule Transfer. Rule WOrd shows how we introduce variable-order assertions: a variable-order assertion can be introduced every time a thread writes to one variable (y in the rule) while holding a determinate-value assertion on another variable (x in the rule). Note that this rule would not be sound without Condition (2) of Definition 5.1, since that condition is needed to guarantee the existence of an hb edge from $\sigma'.\mathit{last}(x)$ to $\sigma'.\mathit{last}(y)$.

Last modification transitions. Observe that the rules in Figure 4 are all conditioned on the modification observed in the transition being the last modification to the given variable. Thus, we must be able to prove that a given read or update observes the last modification. There are several ways to do this. It is easy to see that if $x \stackrel{\sigma}{=}_t v$ for some thread t in some state σ, then t can only read the last write to x. We formalise this claim in Lemma 5.6 below, and in our case study we show how to use it in verification.
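The interplay of WOrd and Transfer on the left state of Example 5.2 can be sketched with the same kind of toy hb computation. As an assumption based only on the informal description above, a variable-order assertion is modelled as the test $(\mathit{last}(x), \mathit{last}(y)) \in \mathit{hb}$; the event names, the `LAST` table and the `var_ordered` helper are hypothetical, and the actual side conditions of the rules in Figure 4 are not modelled.

```python
from itertools import product


def transitive_closure(edges):
    """Smallest transitive relation containing the given (a, b) pairs."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new


# Illustrative events for the left state of Example 5.2.
WX, WY, RY = "wr1(x,2)", "wrR1(y,1)", "rdA2(y,1)"
LAST = {"x": WX, "y": WY}   # last modification of each variable in mo
SB = {(WX, WY)}             # thread 1 writes x, then performs a releasing write to y


def var_ordered(hb, x, y):
    """Toy variable-order assertion: last(x) happens-before last(y)."""
    return (LAST[x], LAST[y]) in hb


# WOrd: once thread 1 (which wrote x = 2 itself) writes y, x is ordered before y.
hb_before = transitive_closure(SB)
# Transfer: thread 2's acquiring read of the releasing last(y) adds an sw edge.
hb_after = transitive_closure(SB | {(WY, RY)})

print(var_ordered(hb_before, "x", "y"))   # True
print((LAST["x"], RY) in hb_before)       # False: thread 2 has not synchronised
print((LAST["x"], RY) in hb_after)        # True: the hb edge Transfer relies on
```

The variable order alone gives thread 2 nothing; it is the synchronising read of last(y) that, by transitivity, places last(x) before an event of thread 2.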
⁴ We show soundness of these proof rules in the full version.

Update operations provide another way to guarantee that a given operation observes the last modification to a given variable. Given a C11 state $((D, \_), \_, \mathit{mo})$, an update-only variable is any variable x such that for all modifications $m \in D$ with $\mathit{var}(m) = x$, either m is an update or $m \in \mathit{IWr}$. Note that initially every variable is update-only. In the operational semantics, update-only variables have the property that any new update or write can only be added to the end of the modification order. This is a consequence of the fact that, for such a variable, every modification except the last is covered. Thus, we have the following lemma.
Lemma 5.6 (Last Modification Transition). Let $t = \mathit{tid}(e)$ and $x = \mathit{var}(e)$ for some event e. For any reachable transition $(P, \sigma) \stackrel{m,e}{\Longrightarrow}_{RA} (P', \sigma')$, we have $m = \sigma.\mathit{last}(x)$ if either $x \stackrel{\sigma}{=}_t v$ for some value v, or x is an update-only variable in σ.
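The update-only case of Lemma 5.6 can be sketched under a simplified covering rule, which is an assumption rather than the full semantics of Section 3: a modification of a single variable becomes covered when an update reads from it, and an update always reads from its immediate mo-predecessor. Modification orders are Python lists, and the names `is_update_only`, `covered` and `update_candidates` are hypothetical.

```python
from typing import List, Set, Tuple

# A modification is (event name, kind), kind in {"init", "write", "update"}.
Mod = Tuple[str, str]


def is_update_only(mo: List[Mod]) -> bool:
    """Every modification of the variable is an update or an initialising write."""
    return all(kind in ("init", "update") for _, kind in mo)


def covered(mo: List[Mod]) -> Set[str]:
    """Toy covering rule: an update reads from its immediate mo-predecessor,
    which thereby becomes covered."""
    return {mo[i - 1][0] for i in range(1, len(mo)) if mo[i][1] == "update"}


def update_candidates(mo: List[Mod]) -> List[str]:
    """Modifications a new update could read from: the uncovered ones.
    (The real semantics also requires them to be observable; ignored here.)"""
    cov = covered(mo)
    return [name for name, _ in mo if name not in cov]


only_updates = [("init", "init"), ("u1", "update"), ("u2", "update")]
mixed = [("init", "init"), ("w", "write"), ("u", "update")]

print(is_update_only(only_updates), update_candidates(only_updates))  # True ['u2']
print(is_update_only(mixed), update_candidates(mixed))                # False ['init', 'u']
```

For the update-only list, every modification but the last is covered, so the last modification is the only one a new update can read from. Once a plain write appears, an earlier uncovered modification becomes a candidate again.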
In other cases, other kinds of invariants can be used to guarantee this last-modification property.
Example 5.7. Consider the following message-passing interaction between two threads:
Init:
Thread 1: 1: d := 5; 2: f :=^R 1;
Thread 2: 1: while !f^A do skip; 2: r := d;

Here, thread 1 sets the data variable d to 5 and then indicates that the data is ready by setting the flag variable f to 1 with a releasing write. Thread 2 awaits this condition, using acquiring reads of f, and then consumes the data. To show that this simple program is correct, we must prove that thread 2 always reads the value 5 from d at line 2.
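The hb edge this argument needs can be checked in a toy model. It is not the paper's semantics: observability here is a simplification based on hb alone (the paper's definition also involves eco), initial values d = 0 and f = 0 are assumed because the program's Init line is not shown, initial writes are assumed to happen-before every other event, and all event and helper names are hypothetical.

```python
from itertools import product


def transitive_closure(edges):
    """Smallest transitive relation containing the given (a, b) pairs."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new


# Hypothetical initial values d = 0 and f = 0; thread id 0 marks initial writes.
ID, IF = "init(d,0)", "init(f,0)"
WD, WF = "wr1(d,5)", "wrR1(f,1)"   # thread 1: d := 5; f :=^R 1
RF = "rdA2(f,1)"                    # thread 2 leaves its loop after reading f = 1
TID = {ID: 0, IF: 0, WD: 1, WF: 1, RF: 2}
MO_D = [ID, WD]                     # modification order of d


def observable(hb, mo, t):
    """Simplified observability: a write is hidden from thread t if some
    mo-later write to the same variable is, or happens-before, an event of t."""
    events_t = [e for e, th in TID.items() if th == t]
    reached = {w for w in mo for e in events_t if w == e or (w, e) in hb}
    return [w for i, w in enumerate(mo) if not any(v in reached for v in mo[i + 1:])]


init_edges = {(i, e) for i in (ID, IF) for e in (WD, WF, RF)}  # assumed
sb = {(WD, WF)}
hb_ra = transitive_closure(init_edges | sb | {(WF, RF)})  # rf on f is also sw
hb_rlx = transitive_closure(init_edges | sb)              # relaxed f: no sw edge

print(observable(hb_ra, MO_D, 2))    # ['wr1(d,5)']
print(observable(hb_rlx, MO_D, 2))   # ['init(d,0)', 'wr1(d,5)']
```

With the release-acquire flag, thread 2 can observe only wr1(d,5), and that write happens-before thread 2's read of f. This is the shape a determinate-value assertion for d requires. With a relaxed flag, the initial write to d stays observable.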
We sketch a proof that for any state $\sigma'$ in which thread 2 is at line 2, we have $d \stackrel{\sigma'}{=}_2 5$.

Equipped with these techniques, we now show that Peterson's algorithm, with the synchronisation annotations given in Section 2, guarantees mutual exclusion.
An Example Verification: Peterson's Algorithm
We turn now to the verification of the version of Peterson's Mutual Exclusion algorithm given in Algorithm 1. Our verification consists of proving a mutual exclusion invariant (Theorem 5.8) stating that there is no reachable state in which both processes are in their respective critical sections.
To state our invariants, we make use of an auxiliary program counter function, which, for each thread, returns the line number of Algorithm 1 that the thread is currently executing. More precisely, for each configuration $(P, \sigma)$ of Peterson's algorithm and each thread $t \in \{1, 2\}$, the expression $P.\mathit{pc}_t$ returns i when P(t) is the part of the program starting on line i.
The mutual exclusion property for Algorithm 1 is proved in Theorem 5.8, which relies on the following invariants.
turn is an update-only location (4)
turn
As in the classical (sequentially consistent) setting, we prove that these invariants hold for the initial configuration and are preserved by each transition of the algorithm. For space reasons we provide details for only one of these cases: the case where the first test at line 4 evaluates to false, causing thread t to enter the critical section.⁵

Proof. We consider the first test at line 4, $\mathit{flag}_{\bar t} = \mathit{false}$, in the case where the test fails (the success case is very simple). Let $(P', \sigma')$ be the configuration after the step in question. Assume that $P.\mathit{pc}_t = 4$, $P'.\mathit{pc}_t = 5$, and $e = R_t(\mathit{flag}_{\bar t}, \mathit{false})$.

Because e is not a write and the value of $\mathit{pc}_{\bar t}$ does not change, it is easy to use the NoMod and NoModOrd rules to show that each invariant except (9) is preserved. We now prove that (9) is preserved by proving that $\mathit{turn} \stackrel{\sigma'}{=}_{\bar t} t$ under the assumption that $P'.\mathit{pc}_{\bar t} \in \{4, 5, 6\}$. Because $P.\mathit{pc}_{\bar t} = P'.\mathit{pc}_{\bar t}$, we have $P.\mathit{pc}_{\bar t} \in \{4, 5, 6\}$. Furthermore, by Lemma 5.3 and the facts that $e = R_t(\mathit{flag}_{\bar t}, \mathit{false})$ and $m = \sigma.\mathit{last}(\mathit{flag}_{\bar t})$, the assertion $\mathit{flag}_{\bar t} \stackrel{\sigma}{=}_t \mathit{true}$ is false. Thus, by Invariant 8, we have $\mathit{turn} \stackrel{\sigma}{=}_{\bar t} t$. Then, from rule NoMod and the fact that e is not a write, we have $\mathit{turn} \stackrel{\sigma'}{=}_{\bar t} t$, as required. □

These invariants are sufficient to prove that Peterson's algorithm satisfies the mutual exclusion property.

⁵ The full proof is available in the extended version [7].
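For comparison with the classical (sequentially consistent) setting mentioned above, the following sketch exhaustively explores the reachable states of the textbook SC version of Peterson's algorithm and checks mutual exclusion in each. This is a swapped-in technique (explicit-state exploration rather than the invariant-based proof used here). The program is the classical algorithm with hypothetical line numbers, not Algorithm 1 with its release-acquire annotations, so the check says nothing about behaviour under the RAR fragment.

```python
from collections import deque

# Classical SC Peterson for threads 0 and 1 (hypothetical line numbers):
#   2: flag[t] := True   3: turn := other
#   4: await not (flag[other] and turn == other)
#   5: critical section  6: flag[t] := False, then back to line 2
# A state is (pcs, flags, turn); every step is atomic, as under SC.
INIT = ((2, 2), (False, False), 0)


def step(state, t):
    """Successor when thread t takes one step, or None while it busy-waits."""
    pcs, flags, turn = list(state[0]), list(state[1]), state[2]
    other = 1 - t
    pc = pcs[t]
    if pc == 2:
        flags[t], pcs[t] = True, 3
    elif pc == 3:
        turn, pcs[t] = other, 4
    elif pc == 4:
        if flags[other] and turn == other:
            return None  # spinning does not change the state
        pcs[t] = 5
    elif pc == 5:
        pcs[t] = 6
    else:  # pc == 6
        flags[t], pcs[t] = False, 2
    return (tuple(pcs), tuple(flags), turn)


def reachable(init):
    """Breadth-first exploration of every interleaving."""
    seen, todo = {init}, deque([init])
    while todo:
        state = todo.popleft()
        for t in (0, 1):
            nxt = step(state, t)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def mutual_exclusion(state):
    """Both threads are never at line 5 (the critical section) together."""
    return state[0] != (5, 5)


STATES = reachable(INIT)
print(all(mutual_exclusion(s) for s in STATES))  # True
```

Under SC, a finite state space makes brute-force exploration feasible. Under weak memory, the set of writes each thread may observe grows with the execution, which is one motivation for the thread-relative invariants used in this section.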
Conclusion and Related Work
We have developed an operational semantics for the RAR fragment of the C11 memory model and shown it to be both sound and complete with respect to the axiomatic description. Thus, every state generated by the operational semantics is guaranteed to be one allowed by the axiomatic semantics. Moreover, any execution that is valid with respect to the axiomatic semantics can be generated by the operational semantics. Our semantics relies on a thread-local view of observability,⁶ which is defined in terms of the eco and hb orders. We have developed a proof technique for our operational semantics whose notation follows conventional proofs for sequentially consistent memory as closely as possible. Finally, we have applied this technique to an example verification.
There is a large body of related work; here, we provide a brief snapshot. There are several works aimed at providing operational semantics for a larger subset of C11, including models that aim to address the so-called thin-air problem (that we rule out by the No-Thin-Air axiom), which invariably lead to more complex semantics. Nienhuis et al. [22] provide a semantics that supports inductive reasoning, but they are forced to consider an order that does not include sb. This complicates a verification technique that follows program order. Kang et al. [14] develop an operational model aimed at handling cycles in sb ∪ rf. Again, their sophisticated model handles a larger subset of the C11 language, but at the cost of a more complicated state space and transition relation. Lahav et al. [16] provide an operational model for a stronger release-acquire model, where sb ∪ rf ∪ mo is required to be acyclic.
Kang et al. [14] provide a basic program logic for proving invariants; using their semantics in verification remains an open problem. Jagadeesan et al. [12] develop an operational semantics capable of coping with out-of-order executions for the Java memory model. However, their work focusses on supporting Java compiler optimisations and they do not consider program verification. One avenue for future work is to see how our notions of determinate-value and variable-ordering assertions might be applied to verification in a more sophisticated semantics [12, 14].
Concurrent separation logic (CSL) provides a different approach to verification, and several frameworks have been developed for dealing with C11-style weak memory. [13, 26] handle a fragment of C11 incomparable with ours, ignoring relaxed accesses but modelling so-called non-atomic accesses. [8, 9, 11] additionally handle both relaxed accesses and fence operations. Weak-memory CSL has been a very active area of research for several years, and we refer the reader to the introduction of [13] for an excellent review.
Finally, recent works have focused on model checking approaches [1, 15] , where validation is aimed at efficient consistency checking of the standard axiomatic semantics.
