Abstract. Linearizability is the standard correctness notion for concurrent objects, i.e., objects designed to be accessed simultaneously by multiple threads. In this paper, we explain why the original definition of linearizability is not applicable to code running on the weak memory models supported by modern multicore architectures, and provide an alternative definition which is applicable. In contrast to earlier work, our definition is proved to be both sound and complete. Furthermore, while earlier work has focussed on linearizability for the TSO (Total Store Order) architecture, ours is applicable to any existing weak memory model. We illustrate how it can be used to prove the correctness of a simple case study on the widely used TSO, Power and ARM architectures.
Introduction
Linearizability [17] is widely accepted as the standard correctness criterion for concurrent objects, i.e., objects designed to be accessed simultaneously by multiple threads [19] . At the level of implementation, operations on an object take time and hence they may overlap in a multi-threaded program. This is obviously difficult to reason about. Linearizability allows us to prove, however, that the behaviour of such an object implementation is identical to that of a specification in which operations are atomic, and hence cannot overlap. The key concept is the notion of a linearization point, a point where an operation in the implementation can be thought of as taking effect atomically. Choosing such a point for each operation in a concurrent history of an implementation allows us to match that history with a sequential history of the specification. The sequential history is referred to as a linearization of the concurrent history. Fig. 1 shows two linearizations of a concurrent history in which the operation B of thread T1 overlaps with the operation C of thread T2. There are three important things to note. Firstly, the linearization point of an operation must occur somewhere between its invocation, i.e., when it is called, and its response, i.e., when it returns. Secondly, overlapping operations in the concurrent history may occur in either order in the linearization, regardless of the order they were invoked or responded. Thirdly, operations which do not overlap occur in the order they were invoked. A concurrent history satisfies a specification, provided one of its possible linearizations is a history of the specification. A concurrent object is considered correct when each of its finite histories linearizes with a sequential history of the specification. Hence, linearizability only checks safety properties of an object implementation, not liveness properties [16, 23] .
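The three observations above can be illustrated with a small sketch. It is not part of the formal development: each operation is given hypothetical integer invocation and response times, chosen so that A responds before B and C are invoked while B and C overlap, as in the history described for Fig. 1.

```python
from itertools import permutations

# Each operation is (name, thread, invocation_time, response_time).
# The times are illustrative assumptions, not data from the paper.
def linearizations(ops):
    """Return every sequential order of ops that respects real-time order:
    if a responds before b is invoked, a must precede b."""
    found = []
    for perm in permutations(ops):
        position = {op[0]: i for i, op in enumerate(perm)}
        respects_order = all(
            position[a[0]] < position[b[0]]
            for a in ops
            for b in ops
            if a[3] < b[2]  # a's response precedes b's invocation
        )
        if respects_order:
            found.append([op[0] for op in perm])
    return sorted(found)


ops = [("A", "T1", 0, 1), ("B", "T1", 2, 5), ("C", "T2", 3, 4)]
result = linearizations(ops)
print(result)
```

Under these assumed times, only the two orders in which A comes first survive, since B and C overlap and may be swapped freely.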
Recent work [6, 15, 9, 24, 10] has begun examining the applicability of linearizability in the context of weak memory models of modern multicore architectures [22, 20, 18, 2, 3, 14] . These memory models improve hardware efficiency by limiting accesses to global memory. Individual threads operate on local copies of global variables, updates to the global memory being made by the hardware and largely out of the programmer's control. This can cause threads executing on different cores to get out of sync with respect to the values of global variables.
For example, on the TSO (Total Store Order) architecture [22] a thread updating a global variable x stores the new value in a per-core FIFO buffer. Threads executing on that core will then read x from the buffer, rather than the global memory, until the new value is flushed from the buffer by the hardware. In the meantime, threads on other cores read the value of x from the global memory or from their own core's buffer when it has a value for x .
Consider the example of Fig. 1 when operation A writes to a global variable x , and both operations B and C read and return the value of x . If T1 and T2 are executing on different cores and the flush of A's write occurs after C's response then B will return the new value of x (which it reads from the buffer of the core that T1 is running on), and C will read the old value of x (which it reads from global memory). Neither linearization will satisfy a specification in which the old value of x cannot be read after x has been updated. Hence, using the standard definition of linearizability, we would conclude that the implementation is incorrect.
However, the client program that runs threads T1 and T2 cannot observe that A occurs before C. To do so, it would require a point of synchronisation after A has responded and before C has been invoked. For example in the client program of Fig. 2 , T2 can only call C after z is assigned 1 in T1. Such a point of synchronisation requires writing to a global variable, such as z, and waiting for it to be flushed. On TSO, since the per-core buffer is FIFO, this would also mean the value of x (written earlier as part of A) is flushed, and hence C would no longer read the old value of x . Hence, from the client perspective the concurrent history does satisfy the specification. Hence, standard linearizability does not provide a notion of correctness of objects on weak memory models.
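The TSO scenario of the preceding paragraphs can be mimicked with a toy model. This is a minimal sketch under strong assumptions: the class `TSOMachine`, its method names, and the core names are hypothetical, each thread runs on its own core, and the model ignores fences and atomic instructions.

```python
from collections import deque


class TSOMachine:
    """Toy TSO model: one FIFO store buffer per core plus a shared memory."""

    def __init__(self, cores, initial):
        self.memory = dict(initial)
        self.buffers = {core: deque() for core in cores}

    def write(self, core, var, value):
        # Writes are buffered locally and not yet visible to other cores.
        self.buffers[core].append((var, value))

    def read(self, core, var):
        # A core sees its own most recent buffered write, else global memory.
        for buffered_var, value in reversed(self.buffers[core]):
            if buffered_var == var:
                return value
        return self.memory[var]

    def flush_one(self, core):
        # The oldest buffered write is flushed first (FIFO).
        if self.buffers[core]:
            var, value = self.buffers[core].popleft()
            self.memory[var] = value


# Scenario 1: no synchronisation between A and C.
m = TSOMachine(["core1", "core2"], {"x": 0, "z": 0})
m.write("core1", "x", 1)          # operation A on T1
b_result = m.read("core1", "x")   # operation B on T1 reads its own buffer
c_result = m.read("core2", "x")   # operation C on T2 reads global memory

# Scenario 2: T1 also writes z, and T2 awaits z = 1 before calling C.
m.write("core1", "z", 1)
while m.read("core2", "z") != 1:  # await(z=1) on T2
    m.flush_one("core1")          # x leaves the buffer before z does
c_synced = m.read("core2", "x")

print(b_result, c_result, c_synced)
```

In the first scenario B sees the new value and C the old one; once T2 has waited for z, the FIFO buffer guarantees that the write to x has also been flushed, so C reads the new value.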
To address this problem, there have been several attempts at redefining linearizability for TSO. Burckhardt et al. [6] include a notion of buffers in the specification of a concurrent object, and associate two atomic steps with each specification operation: one where the effect of the operation updates the buffer, and a subsequent one where it takes effect in the global memory. Gotsman et al. [15] introduce nondeterminism into the specification to model that a thread may, or may not, have seen a recent update. Both of these approaches change the specification that the implementation needs to satisfy. The resulting specifications are less intuitive and do not correspond to specifications that would normally be found as part of a software library.
Derrick et al. [9, 11, 8] take a different approach, leaving the specification unchanged and instead changing the definition of linearizability. In particular, they do not require that the linearization point of an operation occurs before its response; it can occur anytime up to the final flush associated with the operation. This allows operation A of Fig. 1 to linearize after C in the case when its flush occurs after C. Hence, we can match the history to a sequential one of the intended specification in which C occurs returning the initial value of x , followed by A and then B occurring, the latter with the new value of x from A.
A similar approach is used by Travkin et al. [24, 25] when developing tool support for proving linearizability on TSO. None of these papers, however, attempt to prove that the modified definitions of linearizability are sound or complete. A variant of Derrick et al.'s definition proposed by Doherty and Derrick [10] is proved sound with respect to a notion of correctness from a client program's point of view. This definition is limited, however, to client programs that are free from operation races (these are like data races but at the level of operations rather than individual lines of code), and hence is not complete.
In this paper, we propose a general definition of linearizability for any existing memory model (not just TSO) which we prove to be both sound and complete. In Sect. 2, we provide a formal definition of correctness from a client program's point of view. This definition is based on the standard notion of trace refinement of programs [4, 5, 1] , adapted to a weak-memory context. In Sect. 3, we provide our definition of linearizability along with proofs of soundness and completeness. In Sect. 4, we illustrate how our definition can be used to prove correctness of a case study on TSO, and in Sect. 5 we consider the case study on the significantly weaker Power and ARM memory models [2, 20, 18, 3, 14] . We conclude with a brief summary of related work and ideas for future work in Sect. 6.
Correctness
To be able to prove soundness and completeness of our definition of linearizability in Sect. 3, we begin by defining correctness of an object in terms of trace refinement of a client program using that object. Such a program P is defined in terms of the set of events it can undergo, and the partial order on the occurrence of those events enforced by P on a given memory model.
Events can be program steps, such as assignments, conditional branch instructions (e.g., of if or while statements) or higher-level instructions which can, in most cases, be defined in terms of assignments and/or conditional branches. For example, the await(z=1) statement of Fig. 2 could be defined as while(z ≠ 1) {}. To enable overlapping of operations on different threads, we also have invocation and response events for each operation call. All program steps and operations are deterministic; non-determinism in a program results from the interleaving of events on different threads varying between executions.
Central to our definitions of trace refinement and linearizability is the notion of an additional type of event called observation events. Such events denote the point in an execution where a program step which writes to one or more global variables, or the termination of an operation (whether it writes to global variables or not) can be deemed to have been observed by all threads. An observation will occur when each thread in the program can either (a) access the new values of global variables written by the program step or operation, or (b) access writes to those variables that occurred later in the execution (as may occur in non-multi-copy atomic architectures [2] ).
In both cases, the observation of an operation will not be before its response. In the case of an operation which does not write to any global variables, the observation will occur at the same time as the next observation that, due to the order enforced by the program, must have occurred later. Importantly, the point where an observation occurs depends on the memory model and affects the definition of correctness.
On a sequentially consistent (SC) architecture (i.e., one without a weak memory model), writes to global variables occur instantaneously. Hence, the observation event for a program step occurs immediately after the program step, and that of an operation which writes to global variables immediately after its response. For example, for the program of Fig. 2 on SC, the event order the program enforces is the transitive closure of {(obs_A, z = 1), (obs_{z=1}, inv_B), (obs_{z=1}, await(z = 1)), (await(z = 1), inv_C)}. Note that we exclude from the order event pairs such as (inv_A, res_A), (res_A, obs_A) and (z = 1, obs_{z=1}) which represent event orders which hold for any program on any memory model.
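As a rough illustration of the SC event order just described, the following sketch computes the transitive closure of the four enforced pairs. The string encoding of events is an assumption made for this example. It also shows that orderings such as obs_A before inv_C only follow once the universally valid pairs, which the enforced order deliberately omits, are added back.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing the given set of pairs."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


# Pairs enforced by the program of Fig. 2 on SC, as listed in the text.
enforced = {
    ("obs_A", "z=1"),
    ("obs_z=1", "inv_B"),
    ("obs_z=1", "await(z=1)"),
    ("await(z=1)", "inv_C"),
}
order = transitive_closure(enforced)

# Pairs that hold for any program on any memory model (excluded above).
universal = {("inv_A", "res_A"), ("res_A", "obs_A"), ("z=1", "obs_z=1")}
full = transitive_closure(enforced | universal)

print(sorted(order - enforced))
print(("obs_A", "inv_C") in order, ("obs_A", "inv_C") in full)
```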
On TSO, writes to global variables become available to threads on other cores when they are flushed. Hence, the observation event of a program step or an operation that writes to global variables occurs at the final flush associated with its writes, or, in the case of an operation, immediately after the response when the final flush occurs before the response. The partial order on events of the program of Fig. 2 
We formalise the semantics of programs as follows. Let T be the set of all thread ids, and Call the set of all operation calls. An operation is then defined as a call by a particular thread.
Op = T × Call
Let PS denote the set of all program step events, and Val the set of all values (of input and output parameters) including a special element ⊥ meaning 'no value'. The set of all events is defined as follows where each invocation is associated with an input, and each response and observation with an output.
Event ::= inv(Op, Val) | res(Op, Val) | obs(Op, Val) | step(PS) | obs(PS)
We refer to inv (Op, Val ), res(Op, Val ), and obs(Op, Val ) as operation events, and to step(PS ) and obs(PS ) as program events.
A program P has a set of events, events(P ), such that
and a partial order on events < PM which is enforced by P on memory model M . A point of synchronisation in a program cannot occur during an operation (whose code is not part of the program), and existing memory models only enforce an order on observations on the same thread. Hence, < PM cannot enforce an order between an invocation of one thread and an event of another, or an event of one thread and a response or observation of another. Furthermore, existing memory models, and hence < PM , do not enforce an order between a response of an operation and an observation of a different operation. The partial order also does not include event pairs such as (inv (op, in), res(op, out)), (res(op, out), obs(op, out)) and (step(p), obs(p)) which represent event orders which hold for all programs on all memory models (reducing the transitive closure of < PM to what is effectively enforced by P and M ). Hence, there are no pairs in the order which start with an invocation, end with a response, or end with an observation on a different thread, and no pairs that start with a response and end with an observation. Let τ (a) denote the thread on which an event a occurs.
Since the observation of an operation can occur immediately after its response (or when the operation does not write to global variables, at the same time as the observation of a subsequent event), < PM cannot enforce the observation of a later operation to occur before an earlier one.
The semantics of a program P is a set of finite sequences of events, referred to as traces. Since we are interested in defining linearizability, and hence only safety properties, we do not consider infinite sequences of events in our semantics. For each trace t, each event is unique (similar events, e.g., calls to the same operation, may be annotated by their relative position in the trace), and an invocation of an operation always occurs before the associated response, which in turn occurs before the associated observation. Similarly, a program step always occurs before its observation, if any.
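The well-formedness conditions on traces stated above can be checked mechanically. The sketch below assumes a hypothetical encoding in which an event is a pair (kind, id), with kinds "inv", "res" and "obs" for operations and "step" and "obs_step" for program steps; these names are not taken from the paper.

```python
# Each kind of event that must be preceded by another event with the same id.
PREDECESSOR = {"res": "inv", "obs": "res", "obs_step": "step"}


def is_well_formed(trace):
    """Check that events are unique, and that every response follows its
    invocation, every operation observation follows its response, and every
    step observation follows its step."""
    if len(set(trace)) != len(trace):
        return False
    seen = set()
    for kind, ident in trace:
        needed = PREDECESSOR.get(kind)
        if needed is not None and (needed, ident) not in seen:
            return False
        seen.add((kind, ident))
    return True


good = [("inv", "a"), ("res", "a"), ("step", "p"), ("obs", "a"), ("obs_step", "p")]
early_obs = [("inv", "a"), ("obs", "a"), ("res", "a")]
duplicate = [("inv", "a"), ("inv", "a")]
print(is_well_formed(good), is_well_formed(early_obs), is_well_formed(duplicate))
```

Pending invocations and steps without an observation are accepted, matching the text's allowance for operations that have not yet responded and steps that have no observation.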
The events of a trace and the order on these events are defined as follows.
The semantics of program P on memory model M is then defined as the set of traces using only events from P and whose order is allowed by P on M .
where < PM < t means t is allowed by P on M and is defined as
Trace refinement
To define trace refinement between client programs using objects, we need to constrain the behaviour of the program to a particular object. We define the history of an object to be a trace with only object events.
where t |o denotes the trace t restricted to object events.
An object implementation C has a set of object events, events(C), and, on a particular memory model M, a prefix-closed set of histories made up of those events, [[C]]_M. Observation events are not controlled by the object and hence can occur at any time after the associated response that the memory model allows.
For any object implementation C , P [C ] denotes the object C operating in program P . It is only defined when all object events of P are events of C . The traces of P [C ] on memory model M are those of P on M whose object events correspond to a history of C on M .
An object specification A similarly has a set of object events, events(A), and a prefix-closed set of histories, [[A]]. Since A represents a typical specification found in a software library, its set of histories is independent of the memory model. Any weak memory model behaviour is absent from its histories due to its operations being atomic, i.e., they occur without interference from other operations.
To capture this in our semantics, the histories of A are restricted to those where only operations on one thread are active at a time. For example, suppose the specification of a lock object, lock, has an operation acquire which waits until the value of a variable of the object, x, is 1 and sets it to 0, i.e., acquire is specified as await(x=1); x=0. Assuming x is initially 1, in the program of Fig. 3 the intention would be that only one of y or z would be set to 1. On SC, this intention is achieved when the invocation of acquire which occurs second does not happen until after the response of the acquire which occurs first. On TSO and assuming T1 and T2 are running on different cores, the intention is only achieved when the invocation of acquire which occurs second does not happen until after the flush of x from the acquire which occurs first. In both cases, the intention is met when the second occurrence of acquire is not invoked before the observation event of the first occurrence. In general, for any object specification to behave as intended, an operation invocation on one thread does not occur before the observation event of a previously invoked operation on any other thread. (The same does not need to hold for operations on the same thread, as values are read locally.)
Additionally, since operations in a specification are intended to always return, for each history h of A with pending invocations, i.e., invocations for which there is no response, the history which extends h with the missing events is also in [[A]].
This is not the case for object implementations which may have operations which never return (e.g., due to an infinite loop). Provided all object events of a program P are events of a specification A, P [A] denotes the program P operating with an object whose behaviour satisfies A.
Correctness of an object is defined from the client program's point of view. Such a program can only observe changes to program variables, i.e., variables that are not defined locally on a thread or as part of an object. Let t |global denote the observable behaviour of a trace t, i.e., the sequence of observation events of program steps which write to program variables. An object implementation C is correct with respect to an object specification A when any observable behaviour of any program P using C on memory model M is a possible behaviour of P using A on M. We refer to this property as trace refinement.
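For finite sets of traces, the refinement condition described above reduces to a containment check on observable behaviours. The following sketch assumes a hypothetical event encoding (kind, name, value), with kind "obs_step" marking the observation of a program step; the example traces are made up for illustration and are not taken from the case study.

```python
def observable(trace, program_vars):
    """t|global: the observations of steps that write to program variables."""
    return tuple(
        (name, value)
        for kind, name, value in trace
        if kind == "obs_step" and name in program_vars
    )


def refines(impl_traces, spec_traces, program_vars):
    """True when every observable behaviour of the implementation traces
    is also an observable behaviour of some specification trace."""
    allowed = {observable(t, program_vars) for t in spec_traces}
    return all(observable(t, program_vars) in allowed for t in impl_traces)


program_vars = {"z", "y"}
spec_traces = [
    [("inv", "op", None), ("res", "op", 1), ("obs_step", "z", 1), ("obs_step", "y", 1)],
]
impl_ok = [
    [("inv", "op", None), ("obs_step", "z", 1), ("res", "op", 1), ("obs_step", "y", 1)],
]
impl_bad = [
    [("inv", "op", None), ("res", "op", 0), ("obs_step", "z", 1), ("obs_step", "y", 0)],
]
print(refines(impl_ok, spec_traces, program_vars))
print(refines(impl_bad, spec_traces, program_vars))
```

Operation events are ignored by the comparison, reflecting that the client can only observe changes to program variables.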
Linearizability
Linearizability relates histories of an object implementation, which may have pending invocations, to histories of an object specification which do not [17] . To do this, it needs to complete the implementation histories. This can be done by adding a response when a pending invocation is deemed to have taken effect, and removing the invocation when it has not [17] . For example, consider a history comprising a read operation of a variable x occurring on a thread T1 concurrently with a write operation to x on thread T2 where the latter has not yet responded. If the read operation returns the value from the write operation, we can assume the write operation has taken effect and hence we add a response event to the history. If the read operation returns the value of x from before the write operation, we can assume the latter has not taken effect and remove its pending invocation.
To define linearizability, therefore, we need to define functions for adding responses and removing invocations from histories. The function ext returns the set of traces which extend a given trace with a sequence of response events such that the result is still a trace, i.e., responses are only added for pending invocations.
The function comp returns the trace resulting from the removal of all pending invocations from a given trace.
where
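The informal descriptions of ext and comp can be sketched as follows. The encoding is an assumption: events are triples (kind, op, value), and ext takes a hypothetical map `candidate_outputs` giving the output values to try for each pending operation, since the formal definition quantifies over all possible responses.

```python
from itertools import combinations, permutations, product


def pending(trace):
    """Invocation events in the trace that have no matching response."""
    responded = {op for kind, op, _ in trace if kind == "res"}
    return [e for e in trace if e[0] == "inv" and e[1] not in responded]


def ext(trace, candidate_outputs):
    """All extensions of the trace by responses to some pending invocations."""
    ops = [op for _, op, _ in pending(trace)]
    extensions = []
    for size in range(len(ops) + 1):
        for chosen in combinations(ops, size):
            for order in permutations(chosen):
                for outputs in product(*(candidate_outputs[op] for op in order)):
                    added = [("res", op, out) for op, out in zip(order, outputs)]
                    extensions.append(trace + added)
    return extensions


def comp(trace):
    """Remove every pending invocation from the trace."""
    dropped = set(pending(trace))
    return [e for e in trace if e not in dropped]


# The read/write example: the write on T2 is pending, the read returns 1.
write = ("T2", "write")
read = ("T1", "read")
history = [("inv", write, 1), ("inv", read, None), ("res", read, 1)]
extended = ext(history, {write: [None]})
print(len(extended))
print(comp(history))
print(comp(extended[1]))
```

Completing the unextended history drops the pending write, while completing the extended one keeps it, which corresponds to the two cases discussed for the read/write example.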
The following formalisation of the standard definition of linearizability is based on that of Derrick et al. [7] which has been proved, in [7] , to correspond to the original definition by Herlihy and Wing [17] . Here we view it in the context of memory model M .
where t ∼ t′ denotes that t and t′ are thread equivalent, i.e., when restricted to the events of any one thread they have the same sequence of invocations and responses, and the response-invocation order of a trace t comprises the pairs (res(c), inv(d)) in <_t, i.e., it captures the order between responses and invocations in t. Note that h′ must be a complete history (as required in [17]) because it is thread equivalent to the completion of h. The intuition behind the definition is that operations which are overlapping in h will not be ordered by the response-invocation order of h and hence can occur in any order in h′ (since the response-invocation order of h′ is a superset of that of h). For example, the overlapping operations B and C of the implementation history of Fig. 1 can occur in any order in a linearization of that history. This is equivalent to letting the linearization points of B and C occur anytime between the respective operations' invocations and responses.
Importantly, the definition is compositional (this property is referred to as locality in [17]). In the case when the object implementation C is a collection of interacting objects, compositionality allows us to prove that C is linearizable to a specification of a similar collection of interacting objects by proving each individual object implementation is linearizable to the corresponding object specification.
As discussed in Sect. 1, this definition does not work for weak memory models such as TSO. It has been argued that in such memory models, the linearization point of an operation can occur after its response [9, 11, 8, 24, 25] . Specifically, for TSO it can occur anytime between the operation's invocation and the final flush of a variable value written by that operation.
To capture this idea in a more general definition of linearizability, we replace the response-invocation order of the standard definition with ≺_t defined as follows.
The effect of this is that the order between operations on different threads is between observation events on one and invocations on the other (as opposed to between responses and invocations). This matches the intuition that a synchronisation point (and hence observation event) is required to order operations on different threads. The definition therefore allows operations to occur in any order in the specification history not only when the operations overlap in the implementation history, but also whenever one operation overlaps with the time between the invocation and observation event of another. The order between operations on a single thread is maintained by the thread equivalence condition (as for standard linearizability).
Definition 3. Generalised linearizability (for weak memory models)
This definition is equivalent to the standard definition of linearizability when h (in the standard definition) is replaced by a transformation trans(h) which replaces observation events with their corresponding responses (and removes the original responses). As shown in [8] , such a definition maintains the property of compositionality of the standard definition when trans(h |x ) = trans(h) |x , where h |x denotes the restriction of a history to events of a particular object x . This trivially follows from the definition of trans and hence our general definition is compositional (see Appendix A for the formal definition and full proof).
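The transformation trans and the commutation property used for compositionality can be sketched directly from the description above. The encoding is an assumption: events are pairs (kind, op) with op = (thread, object, call), and the second object `queue` is hypothetical, introduced only to make restriction non-trivial.

```python
def trans(history):
    """Replace each observation by the matching response and drop the
    original responses, as described in the text."""
    result = []
    for kind, op in history:
        if kind == "res":
            continue
        result.append(("res", op) if kind == "obs" else (kind, op))
    return result


def restrict(history, obj):
    """h|x: the events of the history that belong to object obj."""
    return [(kind, op) for kind, op in history if op[1] == obj]


acq = ("T1", "lock", "acquire")
rel = ("T1", "lock", "release")
put = ("T2", "queue", "put")  # hypothetical operation on a second object

h = [
    ("inv", acq), ("inv", put), ("res", acq), ("res", put), ("obs", acq),
    ("inv", rel), ("res", rel), ("obs", put), ("obs", rel),
]

print(trans(h))
for obj in ("lock", "queue"):
    print(obj, trans(restrict(h, obj)) == restrict(trans(h), obj))
```

Because trans treats each event independently of the others, filtering by object before or after the transformation gives the same result on this example.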
In the following sections, we show that the definition is sound and complete with respect to the definition of trace refinement in Sect. 2.1. To do so, we first introduce the lemmas below whose proofs are included in Appendix B.
Lemma 1. The ≺ order of a completion of a trace t is a subset of that of t.
Lemma 2. If the events of a trace t are events of a program P then so are the events of any completion of t.
Lemma 3. If a trace t is allowed by a program P on memory model M then so is any completion of t.
Lemma 4. If t |o = h, for a trace t and history h, then for any completion of h there is a completion of t which matches the completion of h. ∀ t : Trace; h :
Lemma 5. If an object implementation C linearizes with a specification A then any history h of C linearizes with a history h′ of A in which res(c) comes before inv(d), for each response res(c) which comes before invocation inv(d) in h, where h g-lin_M h′ denotes that h linearizes to h′ on memory model M.
Soundness
Soundness of the definition follows from the observation that if C g-lin_M A then for any trace t of a program P[C] on memory model M, we can (a) construct a history h′ of A which linearizes with the history of a completion of t, and (due to Lemma 5) has the same or a stronger order between responses and invocations, and has all observations of operations ordered according to the order of operations enforced by P on M (the latter is always possible since observations of operations can be moved earlier in a linearization of a history to ensure such an ordering, resulting in a history which has a stronger ≺ order and is hence still a linearization), and (b) construct a trace t′ for this history h′ which has the same events as the completion of t, and the same event order as the completion of t between program events, and between program events and operation events, i.e., the part of the event order that does not affect h′.
Such a trace t′ will be a trace of P on M since it has the same events as the completion of t and the same order or a stronger order on all events enforced by P on M. Also, due to having the same events and same order on program events, t′ will have the same observable behaviour as t.
The proofs in this and the following section are formatted as a numbered sequence of properties, together with the justification that the property holds. The justification consists of definitions, lemmas, axioms (numbered (1), (2), etc. throughout the paper) and/or preceding lines in the proof on which it depends.
Theorem 1. If an object implementation C linearizes with an object specification A on memory model M then for all programs P, P[C] is a trace refinement of P[A] on M.
Proof Expanding g-lin M and M we have
Assume the antecedent holds. For any P , either:
In the following proof, lines 4 to 10 refer (directly or indirectly) to 3a. and 3b. and hence are interpreted within the context of the declaration of t + (in line 3). Similarly, lines 5 to 10 are interpreted in the context of the declaration of h (in line 4) and lines 6 to 10 in the context of the declaration of t (in line 5). S \ T denotes set S minus the elements of set T .
9, 5b., 5c.
Note that property 6f. can be derived as follows:
, and (obs(d ), obs(c)) ∈ < t implies (res(d ), inv (c)) ∈ < h (from 4c.) which implies (res(d ), inv (c)) ∈ < PM (since each thread of t |o respects < PM due to 5a. (1 st conj.), 4a. (1 st conj.), and 1). Hence, from axiom (3) we can deduce that (obs(c), obs(d )) ∈ < PM .
Completeness
Completeness of the definition follows from the fact that we can construct a program P which (a) records, in a program variable, the sequence of invocations and responses on each thread as they occur (i.e., immediately before each invocation and after each response), and (b) only allows operations of a given trace t ∈ [[P [C ] ]] M to occur in the order prescribed by t. If t does not prescribe an order on two operations (i.e., they overlap) then P allows them to occur in either order on M .
, there will be a t ∈ [[P [A] ]] M such that t |global = t |global which has the same events as t and whose operation observations each occur before the next invocation. The latter is possible due to the event recording the response in a program variable allowing operations which do not write to global variables to be observed, and ensures ≺ t ⊆ ≺ t . Since (a) also ensures t ∼ t , we can deduce that the history of C corresponding to any completion of t is linearizable to the history of A corresponding to the same completion of t .
Theorem 2. If we have an object implementation C and specification A such that, for all programs P , P [C ] is a trace refinement of P [A] on memory model M then C linearizes with A on M .
(ii) P records, in a program variable, the sequence of invocations and responses on each thread as they occur, and
and given t ∈ [[P [A] ]] M such that the antecedent holds and
11, defns of ≺ t and ≺ 
Linearizability on TSO
Consider a lock object with operations acquire, release and tryAcquire specified as follows. where TAS(x,a,b) is the atomic hardware primitive test-and-set which, when x is a, sets x to b and returns 1, and otherwise returns 0. This implementation is known to also work on TSO [22], although it is not linearizable with the specification using the standard definition. For example, according to the semantics of TSO [22], one possible history of the object is inv(T1, lock.acquire, ⊥), res(T1, lock.acquire, ⊥), obs(T1, lock.acquire, ⊥), inv(T1, lock.release, ⊥), res(T1, lock.release, ⊥), inv(T2, lock.tryAcquire, ⊥), res(T2, lock.tryAcquire, 0), obs(T2, lock.tryAcquire, 0), obs(T1, lock.release, ⊥) corresponding to an execution in which T1 and T2 run on different cores, and the flush of the value of x written by release is delayed until after tryAcquire occurs. Since release and tryAcquire do not overlap, standard linearizability cannot be used to show correctness with respect to the specification which, due to axiom (4), requires tryAcquire to always return 1 after a release operation. However, correctness can be proved using our definition of generalised linearizability due to its weaker order, ≺, between operations on different threads.
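The difference between the two orders on this history can be explored with a brute-force check. This is only a sketch under stated assumptions: the sequential lock specification coded in `spec_accepts` (x = 1 means free, acquire requires a free lock, release frees it, tryAcquire returns 1 exactly when it takes a free lock) is a guess at the intended specification, the history is assumed complete, and the generalised order is approximated by ordering an operation before another whenever its observation precedes the other's invocation, with same-thread order preserved separately.

```python
from itertools import permutations


def spec_accepts(sequence):
    """Assumed sequential lock specification; x = 1 means the lock is free."""
    x = 1
    for _thread, call, out in sequence:
        if call == "acquire":
            if x != 1:
                return False  # acquire would block in a sequential history
            x = 0
        elif call == "release":
            x = 1
        elif call == "tryAcquire":
            if out != (1 if x == 1 else 0):
                return False
            if x == 1:
                x = 0
    return True


def linearizes(history, marker):
    """marker='res' uses the standard order, 'obs' the generalised one."""
    pos = {event: i for i, event in enumerate(history)}
    ops = [op for kind, op in history if kind == "inv"]
    for perm in permutations(ops):
        idx = {op: i for i, op in enumerate(perm)}
        thread_order = all(
            idx[c] < idx[d] for c in ops for d in ops
            if c[0] == d[0] and pos[("inv", c)] < pos[("inv", d)]
        )
        cross_order = all(
            idx[c] < idx[d] for c in ops for d in ops
            if c != d and pos[(marker, c)] < pos[("inv", d)]
        )
        if thread_order and cross_order and spec_accepts(perm):
            return True
    return False


acq = ("T1", "acquire", None)
rel = ("T1", "release", None)
try_acq = ("T2", "tryAcquire", 0)
h = [
    ("inv", acq), ("res", acq), ("obs", acq), ("inv", rel), ("res", rel),
    ("inv", try_acq), ("res", try_acq), ("obs", try_acq), ("obs", rel),
]
print(linearizes(h, "res"), linearizes(h, "obs"))
```

With the response-based order, release must precede tryAcquire and the specification rejects the returned 0; with the observation-based order, tryAcquire may be placed between acquire and release.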
Linearizability on Power and ARM
On the Power and ARM architectures [2, 20, 18, 3, 14], writes to variables are, like on TSO, local to the core on which they occur and are made available to other cores by the hardware or the use of fence instructions in the program. However, they are not necessarily made available in FIFO order. A write to variable x may be made available globally after a write to variable y even when the write to x occurs before the write to y in program order. One effect of this is that a thread can detect that an operation has completed on another thread before the variables the operation has written are available globally. For example, in the following client program using the lock object of Section 4, it is possible that z is updated in global memory before the value of x written by release.
T1: lock.acquire; lock.release; z=1
T2: await(z=1); y=lock.tryAcquire
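The out-of-order propagation in this client program can be mimicked with a drastically simplified model. Everything here is an assumption made for illustration: the class `RelaxedMachine` and its methods are hypothetical, test-and-set is modelled as acting directly on global memory, release is modelled as a plain write of 1 to x, and real Power and ARM behaviour (including non-multi-copy atomicity and fences) is far richer than this sketch.

```python
class RelaxedMachine:
    """Toy model: per-core pending writes that may become globally visible
    in any order, unlike the FIFO buffers of TSO."""

    def __init__(self, cores, initial):
        self.memory = dict(initial)
        self.pending = {core: [] for core in cores}

    def write(self, core, var, value):
        self.pending[core].append((var, value))

    def read(self, core, var):
        # A core sees its own latest pending write, else global memory.
        for pending_var, value in reversed(self.pending[core]):
            if pending_var == var:
                return value
        return self.memory[var]

    def propagate(self, core, var):
        # Make the oldest pending write to var globally visible, even if
        # older writes to other variables are still pending.
        for i, (pending_var, value) in enumerate(self.pending[core]):
            if pending_var == var:
                self.memory[var] = value
                del self.pending[core][i]
                return

    def test_and_set(self, var, expected, new):
        # Assumed to operate atomically on global memory.
        if self.memory[var] == expected:
            self.memory[var] = new
            return 1
        return 0


m = RelaxedMachine(["core1", "core2"], {"x": 1, "z": 0})
acquired = m.test_and_set("x", 1, 0)  # T1: lock.acquire succeeds
m.write("core1", "x", 1)              # T1: lock.release (assumed plain write)
m.write("core1", "z", 1)              # T1: z=1
m.propagate("core1", "z")             # z overtakes the earlier write to x
z_seen = m.read("core2", "z")         # T2: await(z=1) can now proceed
y = m.test_and_set("x", 1, 0)         # T2: tryAcquire still sees the lock held
m.propagate("core1", "x")             # the release becomes visible too late
print(acquired, z_seen, y)
```

In this run T2 observes z = 1 and yet tryAcquire returns 0, which is the behaviour the client can detect.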
The observation of a program step or operation on Power or ARM occurs when all threads can access all values written to global variables by the program step or operation, or can access values written to those variables later in the execution. As with TSO, when the final value written by an operation is available before its response, the observation occurs immediately after the response.
Consider the following trace of the client program using the implementation of Section 4.
inv(T1, lock.acquire, ⊥), res(T1, lock.acquire, ⊥), obs(T1, lock.acquire, ⊥), inv(T1, lock.release, ⊥), res(T1, lock.release, ⊥), step(z = 1), obs(z = 1), step(await(z = 1)), inv(T2, lock.tryAcquire, ⊥), res(T2, lock.tryAcquire, 0), obs(T2, lock.tryAcquire, 0), obs(T1, lock.release, ⊥), step(y = 0), obs(y = 0)
Since the observable behaviour obs(z = 1), obs(y = 0) of this trace is not possible using the specification, the implementation is not correct on Power or ARM. This can (trivially) be shown using our generalised definition of linearizability since there is no history of the specification where tryAcquire returns 0 after a release (and hence thread equivalence with the history of the above trace fails).
Conclusion
In this paper, we have provided a definition of linearizability which is sound and complete with respect to trace refinement on client programs using an object, and can be used on any weak memory model. Other work relating linearizability and refinement includes that of Filipović et al. [13] , Dongol and Groves [12] and Smith and Winter [23] . The latter paper examines object refinement whereas the former two refer to refinement of client programs as in this paper. However, none of these papers consider weak memory models. There has also been work on semantics of weak memory models including TSO [22] , and Power and ARM [2, 20, 18, 3, 14] . These are aimed at understanding programs running on the memory models, but not their correctness with respect to an abstract specification.
Our definition provides a solid basis on which to develop proof methods, and associated support tools, for proving the correctness of objects running on weak memory models. Given that it is structurally similar to that of standard linearizability, as well as compositional, a promising way forward would be to adapt existing proof methods and tools developed for proving standard linearizability (e.g., [7, 21] ).
A Proof of compositionality
The definition of g-lin M is equivalent to the standard definition of linearizability, lin M , when h (in the standard definition) is replaced by a transformation trans(h) which replaces observation events with their corresponding responses (and removes the original responses).
As shown below, such a definition maintains the property of compositionality of the standard definition when trans(h |x ) = trans(h) |x , where h |x denotes the restriction of a history to events of a particular object x . This trivially follows from the definition of trans and hence g-lin M is compositional.
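The commutation property trans(h |x ) = trans(h) |x can be illustrated with a small sketch in Python. The Event record, the helper restrict, and the second object cnt are hypothetical choices made for this illustration; only the behaviour of trans (drop the original responses, then turn each observation into a response) is taken from the description above.

```python
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    kind: str             # "inv", "res" or "obs"
    thread: str
    obj: str              # object the event belongs to
    op: str
    value: Optional[int]


History = List[Event]


def trans(h: History) -> History:
    """Remove the original responses and relabel observations as responses."""
    return [e._replace(kind="res") if e.kind == "obs" else e
            for e in h if e.kind != "res"]


def restrict(h: History, x: str) -> History:
    """h|x: the events of h that belong to object x."""
    return [e for e in h if e.obj == x]


# A history over two objects; the object "cnt" is made up for the example.
h = [
    Event("inv", "T1", "lock", "acquire", None),
    Event("inv", "T2", "cnt", "inc", None),
    Event("res", "T1", "lock", "acquire", None),
    Event("res", "T2", "cnt", "inc", None),
    Event("obs", "T2", "cnt", "inc", None),
    Event("obs", "T1", "lock", "acquire", None),
]

for x in ("lock", "cnt"):
    print(x, trans(restrict(h, x)) == restrict(trans(h), x))
```

Both functions act on each event independently and trans never changes the object of an event, which is why the two orders of application coincide in this encoding.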
i.e., that h linearizes to h′ on memory model M, and let X denote the set of objects comprising C and A.
The proof follows from the proof of compositionality of standard linearizability of Herlihy and Wing [17] . It is formatted in the same style as the proofs in the paper.
Theorem 3. Compositionality of generalised linearizability
∀ h : [[C]] M ; h′ : [[A]] • (∀ x : X • h|x g-lin M h′|x) ⇔ h g-lin M h′

Proof
For any h ∈ [[C]] M and h′ ∈ [[A]]

⇒ direction: Assume ∀ x : X • h|x g-lin M h′|x
1  ∀ x : X • trans(h|x) lin M h′|x      defn of g-lin M
2  ∀ x : X • trans(h)|x lin M h′|x      trans(h)|x = trans(h|x)
3  trans(h) lin M h′                    [17] (compositionality)
4  h g-lin M h′                         defn of g-lin M

⇐ direction: Assume h g-lin M h′
1  trans(h) lin M h′                    defn of g-lin M
2  ∀ x : X • trans(h)|x lin M h′|x      [17] (compositionality)
3  ∀ x : X • trans(h|x) lin M h′|x      trans(h)|x = trans(h|x)
4  ∀ x : X • h|x g-lin M h′|x           defn of g-lin M
B Proof of lemmas
We format proofs in the same style as proofs in the paper.
Lemma 1a. Adding a response event to a trace t does not affect the order ≺ t .
Following Lemma 1a, extending a trace t with a sequence of response events does not affect ≺ t .
Lemma 1b. If, after removing an invocation from a trace, the resulting event sequence is still a trace, i.e., the invocation was a pending invocation, then its ≺ order is a subset of that of the original trace.
Following Lemma 1b, removing all pending invocations from a trace results in a trace whose ≺ order is a subset of that of the original trace.
The ≺ order of a completion of a trace t is a subset of that of t.
For any t ∈ Trace and t⁺ ∈ ext(t)
Lemma 2. If the events of a trace t are events of a program P then so are the events of any completion of t.
                                                              2a., defn of Trace
5  events(t) ⊆ events(P) ⇒ events(tr) ⊆ events(P)             4, (1)
6  events(t) ⊆ events(P) ⇒ events(t⁺) ⊆ events(P)             5, 3
7  events(t) ⊆ events(P) ⇒ events(comp(t⁺)) ⊆ events(P)       6, 1
Note that lines 3 to 7 refer (directly or indirectly) to lines 2a., 2b. and 2c. and hence are interpreted within the context of the declaration of tr (in line 2).
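The notions of extension and completion used in these lemmas can be made concrete with a short Python sketch. The event encoding and the helper names pending, extend and comp are assumptions made for illustration. The sketch further assumes that each thread has at most one outstanding operation at a time, so that a response matches the most recent invocation of the same thread.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (kind, thread, operation, value); kinds other than "inv"/"res" pass through untouched.
Event = Tuple[str, str, str, Optional[int]]


def pending(t: List[Event]) -> List[int]:
    """Indices of the invocations in t that have no matching response."""
    open_inv: Dict[str, int] = {}  # thread -> index of its unanswered invocation
    for i, (kind, thread, _op, _val) in enumerate(t):
        if kind == "inv":
            open_inv[thread] = i
        elif kind == "res":
            open_inv.pop(thread, None)
    return sorted(open_inv.values())


def extend(t: List[Event], responses: Iterable[Event]) -> List[Event]:
    """One element of ext(t): t followed by a sequence of response events."""
    return t + list(responses)


def comp(t: List[Event]) -> List[Event]:
    """Remove all pending invocations from t."""
    drop = set(pending(t))
    return [e for i, e in enumerate(t) if i not in drop]


t = [
    ("inv", "T1", "acquire", None),
    ("res", "T1", "acquire", None),
    ("inv", "T2", "tryAcquire", None),
    ("inv", "T1", "release", None),
]
t_plus = extend(t, [("res", "T2", "tryAcquire", 0)])
completion = comp(t_plus)

print(completion)
print(set(completion) <= set(t_plus))
```

In this encoding the completion only ever deletes events of the extended trace, which is the set-level fact that the last step of the proof of Lemma 2 relies on.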
In the following proofs, S \ T denotes the set S minus the elements of set T .
Lemma 3a. If a trace t is allowed by P on M , then so is any trace formed by adding a response to t.
Following Lemma 3a, if a trace t is allowed by a program P on memory model M , the trace formed by extending t with a sequence of response events is allowed by P on M .
Corollary 3. ∀ P, M • ∀ t : Trace • ∀ t⁺ : ext(t) • < PM < t ⇒ < PM < t⁺

Lemma 3b. If, after removing an invocation from a trace, the resulting event sequence is still a trace, i.e., the invocation was a pending invocation, then if the original trace is allowed by a program P on memory model M , the resulting trace is allowed by P on M .
∀ P, M • ∀ t inv(c) t′ : Trace • t t′ ∈ Trace ∧ < PM < t inv(c) t′ ⇒ < PM < t t′

Proof
For any P, M and t inv(c) t′ ∈ Trace, assume t t′ ∈ Trace.
1  ∀(a, b) : < t inv(c) t′ \ < t t′ • inv(c) ∈ {a, b}      defn of <

Following Lemma 3b, if a trace t is allowed by a program P on memory model M , the trace formed by removing all pending invocations from t is allowed by P on M .
Corollary 4. ∀ P, M • ∀ t : Trace • < PM < t ⇒ < PM < comp(t)

Lemma 3. If a trace t is allowed by a program P on memory model M then so is any completion of t.
For any P, M, t ∈ Trace and t⁺ ∈ ext(t)
1  < PM < t⁺ ⇒ < PM < comp(t⁺)      Corollary 4
2  < PM < t ⇒ < PM < comp(t⁺)       1, Corollary 3
Lemma 4. If t |o = h, for a trace t and history h, then for any completion of h there is a completion of t which, when restricted to object events, is equal to the completion of h. 
