Abstract. The Total Store Order memory model is widely implemented by modern multicore architectures such as x86, where local buffers are used for optimisation, allowing limited forms of instruction reordering. The presence of buffers and hardware-controlled buffer flushes increases the level of non-determinism beyond that specified by a program, complicating the already difficult task of concurrent programming. This paper presents a new notion of refinement for weak memory models, based on the observation that pending writes to a process' local variables may be treated as if the effect of the update has already occurred in shared memory. We develop an interval-based model with algebraic rules for various programming constructs. In this framework, we develop several decomposition rules for our new notion of refinement. We apply our approach to verify the spinlock algorithm from the literature.
Introduction
Logics for reasoning about concurrency in shared-memory systems are based on the assumption that hardware is sequentially consistent [18], guaranteeing that instructions within each process are never executed out of order in memory. However, modern processors have abandoned sequential consistency in favour of weaker memory guarantees, using local buffers to offer greater scope for optimisation. There are several different weak memory models [1, 2, 23]; in this paper, we focus on the most restrictive of these: the Total Store Order (TSO) memory model, which is implemented by architectures such as x86 (see Fig. 1). Under TSO, instead of committing writes immediately to main memory, the process executing a write stores it as a pending write in its local buffer. Pending writes are not visible to other processes until they are flushed, which commits the write to shared memory. A flush is either programmer controlled (via instructions such as fence or lock) or hardware controlled. Programmer-controlled flushes are expensive, so one would like to keep them to a minimum. On the other hand, reasoning about hardware-controlled flushes is difficult because they increase the non-determinism of a program's behaviour.
Several approaches to program verification under TSO have been developed; we provide a brief survey. Researchers have considered direct methods, such as executable memory models [22], theorems for reduction [9], and identification of race conditions [20]. Others have linked programs executing under TSO to an abstract specification using linearizability [7, 16]; however, these use abstract specifications that differ from the natural abstractions one would expect: [7] requires buffers to be present in the abstract specification, while [16] uses a non-deterministic abstract specification. An issue with many existing approaches is that program semantics is given at the low level of abstraction of individual reads and writes, which means programs must be understood and analysed using a verbose representation. Our work is motivated by the desire to lift reasoning to higher levels of abstraction [15], which in turn improves scalability. To this end, we develop an interval-based semantics by adapting Interval Temporal Logic [19]. This approach has two distinct advantages: (a) it allows one to define truly concurrent executions [10, 11], providing a more accurate model of TSO-based hardware; and (b) it is amenable to algebraic reasoning [3, 13], which enables one to develop algebraic laws for syntactically manipulating formulas representing program behaviour. In this paper, we develop algebraic rules to verify refinement between a concrete program and its abstract representation. The development of algebraic laws is non-trivial; however, once available, they provide high-level reusable theories for verification. We do not claim to have a complete set of laws (this is a topic of future work); instead, we provide the set of rules required to verify the spinlock example.
Within this interval-based logic, we develop a framework for reasoning about TSO, simplifying our existing semantics [15] and introducing enhancements specifically designed for reasoning about buffer-based programs. These include a simplified permission framework (Section 3.1), a novel methodology for evaluating expressions in the presence of local buffers (Section 4.3) and a novel notion of local buffer refinement (Section 5.2). Local buffer refinement is based on the following observation: to show that a command C refines another command A with respect to a process p, the pending writes to local variables of p may be treated as if they have already taken effect in A. Thus, local updates at the concrete level may be treated as if they occur in program order (without waiting for their flush to occur). This benefits verification because the non-determinism arising from flushes of local variables is resolved earlier. We develop a number of algebraic transformation laws for both refinement and local buffer refinement.
Background

Total Store Order Example
Total Store Order (TSO) memory allows a process to store a write in its local buffer and continue processing without waiting for this write to be committed to memory (i.e., while the write is pending). The values in the buffer are flushed in FIFO order. To see the effect of this, consider the following classic example with processes p and q that modify shared variables x and y, both initialised to 0. In this paper, we assume maximum parallelism and that each thread resides in exactly one core; therefore, the words process and core are used synonymously.
word x=0, y=0;
p { p1: x := 1 ; p2: r1 := y }
q { q1: y := 1 ; q2: r2 := x }
Under sequentially consistent memory, at the end of execution, at least one of r1 or r2 would have value 1. However, in TSO memory, it is possible to end execution so that both r1 and r2 read the original values of x and y, i.e., both r1 and r2 are 0 at termination. One such execution is p1, p2, q1, q2, flush(p), flush(q), where flush(p) denotes a (hardware-controlled) flush event for process p. The write to x at p1 is not seen by process q until p's buffer is flushed, and symmetrically for the write to y at q1. Hence, it is possible for q to read value 0 for x at q2 even though q2 is executed after p1. In addition to the above behaviour, each TSO process reads pending writes from its own buffer if possible, and hence may obtain values that are not yet globally visible to other processes; e.g., if p2 is replaced with r1 := x, process p reads x = 1 even if the write to x is still pending. If there are multiple pending writes to the same location, then the value of the last pending write is returned.
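The behaviours just described can be reproduced by brute force. The following Python sketch (an illustration, not part of the paper's formal model) enumerates every interleaving of the two programs together with every placement of hardware-controlled flushes. In SC mode a write goes straight to memory; in TSO mode it is appended to the writer's FIFO buffer, and a read returns the process's last pending write to that variable if one exists. The instruction encoding and the names PROGRAMS, read and outcomes are our own assumptions.

```python
# Instructions: ("write", var, value) or ("read", var, register).
PROGRAMS = {
    "p": (("write", "x", 1), ("read", "y", "r1")),
    "q": (("write", "y", 1), ("read", "x", "r2")),
}


def read(mem, buf, var):
    """Return the last pending write to var in buf, or the value in memory."""
    for v, k in reversed(buf):
        if v == var:
            return k
    return mem[var]


def outcomes(tso):
    """Final (r1, r2) pairs over all interleavings (and, under TSO, all flush points)."""
    results = set()

    def explore(mem, bufs, pcs, regs):
        done = True
        for proc, prog in PROGRAMS.items():
            pc, buf = pcs[proc], bufs[proc]
            if pc < len(prog):  # proc executes its next instruction
                done = False
                op, var, arg = prog[pc]
                new_mem, new_bufs, new_regs = dict(mem), dict(bufs), dict(regs)
                if op == "write" and tso:
                    new_bufs[proc] = buf + ((var, arg),)  # becomes a pending write
                elif op == "write":
                    new_mem[var] = arg  # SC: the write reaches memory at once
                else:
                    new_regs[arg] = read(mem, buf, var)
                explore(new_mem, new_bufs, {**pcs, proc: pc + 1}, new_regs)
            if buf:  # hardware-controlled flush of the oldest pending write
                done = False
                (var, val), rest = buf[0], buf[1:]
                explore({**mem, var: val}, {**bufs, proc: rest}, pcs, regs)
        if done:  # both programs finished and both buffers are empty
            results.add((regs["r1"], regs["r2"]))

    explore({"x": 0, "y": 0}, {"p": (), "q": ()}, {"p": 0, "q": 0}, {})
    return results


print(sorted(outcomes(tso=False)))  # [(0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(tso=True)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Only the TSO run admits (0, 0), matching the execution p1, p2, q1, q2 followed by both flushes.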
Case Study: Spinlock
Spinlock [4] is a locking mechanism designed to avoid operating system overhead associated with process scheduling and context switching. A typical implementation of spinlock is shown in Fig. 2 , where a global variable x represents the lock and is set to 0 when the lock is held by a process, and 1 otherwise. The lock x is itself acquired using a secondary hardware lock (see Fig. 1 ), and this hardware lock is acquired/released using lock/unlock instructions. A process trying to acquire the lock x spins, i.e., waits in a loop and repeatedly checks the lock for availability.
Operation acquire only terminates if it successfully obtains the lock x. It first locks the hardware so that no other process can access x. If another process has already acquired x (i.e., x = 0), then it releases the hardware lock at a7 and spins at a8, i.e., loops in the while-loop until x becomes free, before starting over from a2. Otherwise, it acquires the lock at a4 by setting x to 0, releases the hardware lock at a5 and returns at a6. The operation release releases the lock by setting x to 1. The operation tryacquire is similar to acquire, but unlike acquire, it only makes one attempt to acquire the lock. If this attempt is successful, it returns true; otherwise, it returns false. Under TSO, a process p executing an assignment (e.g., x := 0) places a pending write in p's local buffer, which is not visible to other processes until the buffer is flushed.
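To make the control flow of acquire, release and tryacquire concrete, here is a Python sketch reconstructed from the description above rather than copied from Fig. 2. The class name Spinlock, the use of threading.Lock as a stand-in for the hardware lock, the placement of a1 at the outer loop head, and the demo function run_clients are our own assumptions. Python threads do not expose TSO store buffers, so the sketch shows only the structure of the algorithm, not its buffer behaviour.

```python
import threading
import time


class Spinlock:
    """Hypothetical sketch of the spinlock; x = 1 means free, x = 0 means held."""

    def __init__(self):
        self._hw = threading.Lock()  # stands in for the hardware lock (lock/unlock)
        self.x = 1

    def acquire(self):
        while True:                 # outer loop (assumed to be a1)
            self._hw.acquire()      # a2: lock
            if self.x == 1:         # a3: the lock x is free
                self.x = 0          # a4: take it
                self._hw.release()  # a5: unlock
                return              # a6
            self._hw.release()      # a7: unlock
            while self.x == 0:      # a8: spin until x becomes free
                time.sleep(0)       # yield to other threads while spinning

    def release(self):
        self.x = 1                  # release: set x to 1

    def tryacquire(self):
        """Single attempt: True if the lock was obtained, False otherwise."""
        with self._hw:
            if self.x == 1:
                self.x = 0
                return True
            return False


def run_clients(n_threads, iterations):
    """Hypothetical demo: each client increments a shared counter in its critical section."""
    lock, count = Spinlock(), [0]

    def client():
        for _ in range(iterations):
            lock.acquire()
            count[0] += 1  # critical section
            lock.release()

    threads = [threading.Thread(target=client) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count[0]


print(run_clients(4, 200))  # 800
```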
We refer to processes that use spinlock to provide mutual exclusion to a critical section of code as its clients. Here, as in [22], we assume that clients of the spinlock behave either as the program in Fig. 3 or as that in Fig. 4. Thus, one can assume that a client only calls a release operation when it holds the lock. Note, however, that the behaviours in Fig. 3 and Fig. 4 are not exhaustive. To admit other behaviours, one may formalise the additional client code, then apply the proof methods in this paper to verify this additional behaviour.
Clients can ensure mutual exclusion in the critical sections if, in place of acquire, release and tryacquire, they use the abstract operations AAcq, ARel and ATry below, respectively, which do not use buffers. We refer to such clients as abstract clients. Here, statement await denotes a blocking atomic test-and-set statement, e.g., AAcq() can only execute if x = 1 holds, and its execution atomically sets the value of x to 0. Unlike the concrete program in Fig. 2, all reads and writes occur directly on main memory; we use assignments of the form x <== 0 (which directly update the value of x in memory) to highlight this difference. If x = 0, then AAcq() blocks and cannot proceed until another process makes its guard x = 1 true. Operation ATry() attempts to update x to 0 using a non-blocking atomic compare-and-swap operation CAS, and returns 1 if the operation is successful and 0 otherwise.
Our notion of correctness of the spinlock is that every possible execution of a spinlock client is a possible execution of an abstract client. To this end, we prove refinement between the behaviours of the two (see Section 5.3). Proving refinement under TSO is difficult; one must not only verify concurrency effects, but also consider the effect of accessing the buffer during a program's execution. Furthermore, these effects are visible at a fine-grained level of atomicity, namely that of individual reads and writes. This paper develops a high-level approach to proving refinement that avoids the need to consider low-level (fine-grained) effects whenever possible, by developing an interval-based semantics for programs under TSO. This allows one to view the concurrent execution of two processes as the conjunction of their behaviours over an interval (as opposed to an interleaving of their traces), reducing the impact of non-determinism due to concurrency.
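As an illustration of why abstract clients are mutually exclusive, the following Python sketch explores every reachable state of n abstract clients. It is our own bounded model and uses interleaving, not the interval semantics of this paper. Each client is either in its non-critical section ("ncs") or its critical section ("cs"); AAcq, ARel and ATry are atomic transitions on the shared x, with ATry modelled as CAS(x, 1, 0). The state encoding and all function names are hypothetical.

```python
from collections import deque


def a_acq(x):
    """AAcq: await x = 1, then x <== 0 atomically; None means the operation is blocked."""
    return 0 if x == 1 else None


def a_rel(x):
    """ARel: x <== 1 (only called by the lock holder)."""
    return 1


def a_try(x):
    """ATry: CAS(x, 1, 0); returns the new x and 1 on success, 0 on failure."""
    return (0, 1) if x == 1 else (x, 0)


def successors(state):
    """All states reachable in one atomic step of some client."""
    x, locs = state
    for i, loc in enumerate(locs):
        def moved(new_loc):
            return locs[:i] + (new_loc,) + locs[i + 1:]
        if loc == "ncs":
            new_x = a_acq(x)
            if new_x is not None:  # AAcq is enabled only when x = 1
                yield new_x, moved("cs")
            new_x, ok = a_try(x)
            yield new_x, moved("cs" if ok else "ncs")
        else:  # in the critical section: release
            yield a_rel(x), moved("ncs")


def reachable(n):
    """Breadth-first search from x = 1 with all clients outside the critical section."""
    init = (1, ("ncs",) * n)
    seen, queue = {init}, deque([init])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


states = reachable(3)
print(len(states))                                       # 4
print(all(locs.count("cs") <= 1 for _, locs in states))  # True
```

The concrete spinlock has no such direct check because its buffers multiply the state space; that gap is what the refinement proof closes.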
3 Interval-Based Reasoning
Permission Monitoring
Using an interleaved execution semantics, one can guarantee that a variable will not be simultaneously written, or read and written, as part of the same transition. This is not true for true concurrency over shared memory, where one must model variable access by the different processes (e.g., two processes simultaneously modifying variable x in Fig. 2).
Our solution is to explicitly define read/write permissions. To this end, we assume that programs are executed by processes from a set Proc; each process represents a concurrent thread which modifies a set of variables from a set Var. The TSO architecture uses sophisticated coherence protocols to provide an illusion of shared memory. One may assume the following about read and write instructions:
-Two simultaneous writes (by different processes) to the same variable do not occur.
-A simultaneous read and write of the same variable does not occur.
-A process never has access permission to the local variable of another process.
As we shall see in Sections 4.2 and 4.4, permissions also provide a convenient mechanism for formalising the effect of a lock-unlock block.
In previous work [11], we have modelled permissions using a fractional encoding (inspired by [5]). Here, we simplify these general notions and define the permission space as Perm ≙ Proc → Var → ℙ{wr, rd}, where wr and rd denote write and read permission, respectively. Using '.' for function application, given π ∈ Perm, we interpret wr ∈ π.p.v (resp., rd ∈ π.p.v) as p ∈ Proc having permission to write to (resp., read the value of) v ∈ Var.
A system at any time is described by a state of type State ≙ Var → Val, where Val is the set of values. The system over time is formalised by a stream, which is a total function of type Stream ≙ ℤ → (State × Perm). Therefore, for each time in ℤ, a stream formalises the state of the system and the permissions for each process and variable. Properties of a system are given by predicates; a predicate over type T is a member of PT ≙ T → 𝔹, e.g., PState, PStream and PPerm are state, stream and permission predicates, respectively. We assume pointwise lifting of boolean operators over predicates in the normal manner.
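For p ∈ Proc, v ∈ Var and π ∈ Perm, let W.p.v.π ≙ wr ∈ π.p.v, R.p.v.π ≙ rd ∈ π.p.v and N.p.v.π ≙ (π.p.v = ∅) denote permission predicates that hold iff process p has write, read or no access to v in the permission space π, respectively.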
Then, W.p.u.π (p has write permission to u), (R.p ∧ R.q).v.π (both p and q have read permission to v) and N.q.u.π (q has no permission to u) hold in space π. Note that due to pointwise lifting, (R.p ∧ R.q).v.π = R.p.v.π ∧ R.q.v.π. The assumptions on reads and writes above are then taken into account by assuming that each valid permission space π satisfies the following, where p and q are processes such that p ≠ q, v is a variable, and uₚ is a local variable of process p.
Note that the third conjunct combined with the first ensures that q has neither read nor write permission to any local variable of p. This is lifted to the level of streams by defining a valid stream to be one in which each s.t is valid for t ∈ ℤ. For the rest of the paper, we assume each stream is valid.
To simplify the notation, for a state predicate b and permission predicate z, we assume b.(σ, π) = b.σ and z.(σ, π) = z.π, where σ and π are a state and a permission space, respectively. We assume '↾' denotes a projection operator, e.g., (x, y) ↾ 1 = x.
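The permission predicates and the validity assumptions can be prototyped directly. The Python sketch below represents π as nested dictionaries (process → variable → set of permissions) and checks the three informal assumptions listed earlier in this section. Because the formal validity condition is not reproduced here, the function valid is our own hypothetical reading of those assumptions: a write excludes any read or write of the same variable by a different process, and only the owner may access a local variable. The functions lift_state and lift_perm mirror b.(σ, π) = b.σ and z.(σ, π) = z.π.

```python
from itertools import permutations

RD, WR = "rd", "wr"


def W(p, v, pi):
    """p has write permission to v in pi."""
    return WR in pi[p][v]


def R(p, v, pi):
    """p has read permission to v in pi."""
    return RD in pi[p][v]


def N(p, v, pi):
    """p has no permission to v in pi."""
    return not pi[p][v]


def valid(pi, owner):
    """Hypothetical validity check; owner maps each local variable to its process."""
    procs = list(pi)
    variables = {v for p in procs for v in pi[p]}
    for v in variables:
        for p, q in permutations(procs, 2):  # ordered pairs with p != q
            if W(p, v, pi) and (W(q, v, pi) or R(q, v, pi)):
                return False  # write/write or write/read by different processes
        if v in owner and any(not N(q, v, pi) for q in procs if q != owner[v]):
            return False  # another process can access a local variable
    return True


def lift_state(b):
    """Lift a state predicate to (state, permission) pairs."""
    return lambda sigma_pi: b(sigma_pi[0])


def lift_perm(z):
    """Lift a permission predicate to (state, permission) pairs."""
    return lambda sigma_pi: z(sigma_pi[1])


owner = {"u_p": "p"}
pi = {"p": {"x": {WR}, "u_p": {RD, WR}}, "q": {"x": set(), "u_p": set()}}
bad = {"p": {"x": {WR}, "u_p": {RD, WR}}, "q": {"x": {RD}, "u_p": set()}}
print(valid(pi, owner), valid(bad, owner))  # True False
```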
Interval Predicates
In this section, we provide the basics of interval predicates, which form the logical basis of our program semantics. Our logic is an adaptation of Interval Temporal Logic [19]. An interval is a contiguous set of integers (where ℤ denotes the set of integers), and hence the set of all intervals is Intv ≙ {∆ ⊆ ℤ | ∀t, t′: ∆ • ∀u: ℤ • t ≤ u ≤ t′ ⇒ u ∈ ∆}.
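We let lub.∆ and glb.∆ denote the least upper and greatest lower bounds of an interval ∆, respectively. Furthermore, we define inf.∆ ≙ (lub.∆ = ∞), fin.∆ ≙ ¬inf.∆, and ε.∆ ≙ (∆ = ∅). We define an ordering on intervals by ∆₁ < ∆₂ ≙ ∀t₁: ∆₁, t₂: ∆₂ • t₁ < t₂, and say that ∆₁ and ∆₂ are adjoining iff ∆₁ < ∆₂ and ∆₁ ∪ ∆₂ ∈ Intv.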
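These interval notions can be made concrete as follows. In this Python sketch (an illustration only), an interval is stored by its bounds lo and hi, where hi may be math.inf and lo > hi encodes the empty interval; the class Intv and the function names are our own assumptions, and glb and lub are meaningful only for non-empty intervals.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Intv:
    """The contiguous integer set {lo, ..., hi}; hi may be math.inf, and lo > hi means empty."""
    lo: float
    hi: float


def eps(d):  # ε.∆: the empty interval
    return d.lo > d.hi


def glb(d):  # greatest lower bound (non-empty intervals only)
    return d.lo


def lub(d):  # least upper bound (non-empty intervals only)
    return d.hi


def is_inf(d):  # inf.∆ ≙ (lub.∆ = ∞)
    return not eps(d) and lub(d) == math.inf


def fin(d):  # fin.∆ ≙ ¬inf.∆
    return not is_inf(d)


def before(d1, d2):
    """∆1 < ∆2: every time in ∆1 precedes every time in ∆2 (vacuous if either is empty)."""
    return eps(d1) or eps(d2) or lub(d1) < glb(d2)


def adjoining(d1, d2):
    """∆1 < ∆2 and ∆1 ∪ ∆2 is again an interval, i.e. there is no gap between them."""
    return before(d1, d2) and (eps(d1) or eps(d2) or lub(d1) + 1 == glb(d2))


a, b = Intv(0, 4), Intv(5, math.inf)
print(adjoining(a, b), fin(a), is_inf(b))  # True True True
print(adjoining(Intv(0, 2), Intv(5, 7)))   # False: the gap {3, 4}
```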
To facilitate reasoning about specific parts of a stream, we use interval predicates, which have type IntvPred ≙ Intv → PStream [11, 13].
Example 2. Given Var, Proc and π as defined in Example 1, we define … 3].s and g.[10, 100).s hold.
We define universal implication g₁ ⇛ g₂ ≙ ∀∆: Intv, s: Stream • g₁.∆.s ⇒ g₂.∆.s for interval predicates g₁ and g₂, and write g₁ ≡ g₂ iff both g₁ ⇛ g₂ and g₂ ⇛ g₁ hold.
4 Concurrent Programming with Intervals
4.1 Operators to Model Programming Constructs
In this section, we introduce interval predicate operators used to formalise common programming constructs: sequential composition, branching, loops, and parallel composition. To model sequential composition, we define the chop operator [19, 13] . Unlike Interval Temporal Logic, which requires adjoining intervals to overlap at a single point, adjoining intervals in our logic are disjoint.
(g₁ ; g₂).∆.s ≙ (∃∆₁, ∆₂: Intv • ∆ = ∆₁ ∪ ∆₂ ∧ ∆₁ < ∆₂ ∧ g₁.∆₁.s ∧ g₂.∆₂.s) ∨ (inf ∧ g₁).∆.s
Thus, (g₁ ; g₂).∆.s holds iff either interval ∆ may be split into two adjoining parts ∆₁ and ∆₂ so that g₁ holds for ∆₁ and g₂ holds for ∆₂ in s, or the least upper bound of ∆ is ∞ and g₁ holds for ∆ in s. Inclusion of the second disjunct (inf ∧ g₁).∆.s enables g₁ to model an infinite (divergent or non-terminating) program. We assume that ';' binds tighter than all other binary operators, e.g., g₁ ; g₂ ∧ h₁ ; h₂ is parsed as (g₁ ; g₂) ∧ (h₁ ; h₂).
Iteration g* and gω are the least and greatest fixed points of λz • g ; z ∨ ε, respectively [14], where g* allows empty and finite iterations and gω allows empty, finite and infinite iterations of g. We also define strictly finite and possibly infinite positive iterations.
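For finite intervals, chop and finite iteration can be executed directly. The following Python sketch is our own and is restricted to finite intervals, so the second disjunct (inf ∧ g₁) of chop never applies. An interval is given by inclusive bounds lo and hi, a stream by a list of states indexed by time, and an interval predicate by a function of (lo, hi, s); the helper holds_throughout is hypothetical. In star, empty pieces contribute nothing to the least fixed point, so only non-empty prefixes are considered.

```python
def holds_throughout(var, val):
    """Hypothetical atomic interval predicate: ∆ is non-empty and var = val at every time in ∆."""
    return lambda lo, hi, s: lo <= hi and all(s[t][var] == val for t in range(lo, hi + 1))


def chop(g1, g2):
    """(g1 ; g2) on a finite interval [lo, hi]: split it into adjoining [lo, m-1] and [m, hi]."""
    return lambda lo, hi, s: any(g1(lo, m - 1, s) and g2(m, hi, s) for m in range(lo, hi + 2))


def star(g):
    """g*: the interval is empty, or a non-empty prefix satisfies g and the rest satisfies g*."""
    def h(lo, hi, s):
        if lo > hi:
            return True
        return any(g(lo, m, s) and h(m + 1, hi, s) for m in range(lo, hi + 1))
    return h


s = [{"x": 0}, {"x": 0}, {"x": 0}, {"x": 1}, {"x": 1}]
zero, one = holds_throughout("x", 0), holds_throughout("x", 1)
print(chop(zero, one)(0, 4, s))                  # True: split between times 2 and 3
print(chop(one, zero)(0, 4, s))                  # False
print(star(zero)(0, 2, s), star(zero)(0, 4, s))  # True False
```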
A thorough algebraic treatment of loops using iteration is given in [3]. We are interested in modelling true concurrency and therefore simply treat the parallel composition of two or more processes as (lifted) logical conjunction. For example, the behaviour of g₁ ; g₂ in parallel with h₁ ; h₂ over an interval ∆ in stream s is given by (g₁ ; g₂ ∧ h₁ ; h₂).∆.s. Using pointwise lifting, this is equivalent to (g₁ ; g₂).∆.s ∧ (h₁ ; h₂).∆.s, which holds iff (a) ∆ can be split into adjoining intervals ∆₁ and ∆₂ such that g₁.∆₁.s ∧ g₂.∆₂.s holds; and (b) ∆ can also be split into adjoining intervals ∆₃ and ∆₄ such that h₁.∆₃.s ∧ h₂.∆₄.s holds. Note that there is no immediate correlation between the lengths of ∆₁ and ∆₃, i.e., g₁ could terminate earlier than h₁, and vice versa.
Modelling tests. Interval predicates provide a flexible approach to non-deterministic state predicate evaluation [17], where expression evaluation is assumed to take time (as opposed to being instantaneous). In this paper, guards and assignments are restricted to contain at most one shared variable. Given that c is either a state or permission predicate, and ∆ and s are an interval and stream, respectively, we define (⊡c).∆.s ≙ ¬ε.∆ ∧ ∀t: ∆ • c.(s.t).
Thus, (⊡c).∆.s holds iff ∆ is non-empty and c holds for each state of s within ∆. For example, ¬ε ∧ g ≡ ⊡(u ≥ 300), where g is the interval predicate defined in Example 2.
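The following Python sketch (finite intervals only; names are our own) implements ⊡c and treats parallel composition as conjunction over the same interval. It illustrates that the two conjuncts of (g₁ ; g₂ ∧ h₁ ; h₂) may split the interval at different points: here x changes at time 2, while y changes only at time 4.

```python
def always(c):
    """⊡c: the interval is non-empty and c holds in every state of s within it."""
    return lambda lo, hi, s: lo <= hi and all(c(s[t]) for t in range(lo, hi + 1))


def chop(g1, g2):
    """(g1 ; g2) on a finite interval [lo, hi]."""
    return lambda lo, hi, s: any(g1(lo, m - 1, s) and g2(m, hi, s) for m in range(lo, hi + 2))


def par(f, g):
    """Parallel composition as lifted conjunction: both behaviours hold over the same interval."""
    return lambda lo, hi, s: f(lo, hi, s) and g(lo, hi, s)


def split_points(g1, g2, lo, hi, s):
    """Start times m of the second part for which (g1 ; g2) splits [lo, hi]."""
    return [m for m in range(lo, hi + 2) if g1(lo, m - 1, s) and g2(m, hi, s)]


s = [{"x": 0, "y": 0}, {"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 1, "y": 0}, {"x": 1, "y": 1}]
g1, g2 = always(lambda st: st["x"] == 0), always(lambda st: st["x"] == 1)
h1, h2 = always(lambda st: st["y"] == 0), always(lambda st: st["y"] == 1)
print(par(chop(g1, g2), chop(h1, h2))(0, 4, s))                         # True
print(split_points(g1, g2, 0, 4, s), split_points(h1, h2, 0, 4, s))    # [2] [4]
print(always(lambda st: True)(3, 2, s))  # False: ⊡c never holds on the empty interval
```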
Reasoning about pre/post assertions. One may define several additional interval predicate operators [13] . For the purposes of this paper, we find it useful to reason about properties that hold in the immediately preceding interval. We therefore define
where prev.∆ ≙ {t: ℤ | ∀u: ∆ • t < u} is the interval of all times before ∆. If c is a state or permission predicate, we use the notation c⃗ ≙ true ; ⊡c, where c⃗.∆ states that c holds at the end of ∆ whenever ∆ is finite (for an infinite ∆, c⃗.∆ holds trivially via the second disjunct of chop). Additionally, we define the following notation to reason about assertions that immediately precede, or are a result of, a computation.
Such a definition of a pre-assertion is necessary because we assume adjoining intervals do not overlap (unlike [19] ). We have the following useful properties, which can be proved in a straightforward manner.
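As a sanity check on the notation c⃗ ≙ true ; ⊡c introduced above, the following Python sketch (finite intervals only; helper names are our own) confirms on a small stream that true ; ⊡c agrees with the direct test "∆ is non-empty and c holds at lub.∆" on every sub-interval, including the empty ones.

```python
def always(c):
    """⊡c on a finite interval [lo, hi]."""
    return lambda lo, hi, s: lo <= hi and all(c(s[t]) for t in range(lo, hi + 1))


def chop(g1, g2):
    """(g1 ; g2) on a finite interval [lo, hi]."""
    return lambda lo, hi, s: any(g1(lo, m - 1, s) and g2(m, hi, s) for m in range(lo, hi + 2))


def true_(lo, hi, s):
    return True


def at_end(c):
    """true ; ⊡c, restricted to finite intervals."""
    return chop(true_, always(c))


s = [{"x": v} for v in (0, 1, 1, 0, 1)]
c = lambda st: st["x"] == 1

agree = all(
    at_end(c)(lo, hi, s) == (lo <= hi and c(s[hi]))
    for lo in range(len(s))
    for hi in range(lo - 1, len(s))
)
print(agree)  # True
```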
Abstract Commands
Using the interval-based semantic basis from the previous sections, we formalise commands, which describe the behaviours of the system processes. Formally, a command is of type Cmd ≙ ℙ₁Proc → IntvPred, mapping non-empty sets of processes to an interval predicate representing their behaviour. We use C.p as shorthand for C.{p}, where C is a command and p is a process. The semantics of sequential composition, iteration, non-deterministic choice and parallel composition of commands are defined by pointwise lifting of the interval predicate operators, and hence are given in the same syntax, e.g., (C₁ ; C₂).p = C₁.p ; C₂.p. What remains is to define the commands that model, say, guard evaluation and assignment.
We first present some basic commands that may be used to model the abstract (sequentially consistent) specification. In particular, we define idling (denoted id), abstract guard evaluation (denoted …), … nid.p ≙ ∀v: …
Example 4. The abstract specification is formalised as follows, where AAcq and ARel specify operations AAcq and ARel, respectively, while ATryOK and ATryFl specify executions of the ATry operation that succeed and fail to acquire the lock, respectively. We abbreviate x = 1 and x = 0 to x and ¬x, respectively. The return value of an execution of tryacquire in process p is modelled by a local variable rₚ.
The concurrent execution of abstract clients is modelled by Spec, which begins in a state in which the lock x is available (i.e., x holds) and consists of a number of (truly) parallel processes. We assume that each client of the spinlock behaves as either Fig. 3 or Fig. 4, and furthermore, that the critical and non-critical sections do not modify variables x and rₚ; hence, both the critical and non-critical sections are modelled by id. Therefore, id ; AExec models a single call to the abstract spinlock. Each process may make zero or more (possibly infinitely many) calls, followed by idling, and hence, all possible behaviours of an abstract client are given by (id ; AExec)ω ; id.
We now explain how each operation is modelled. If AAcq.p.∆.s holds for interval ∆ and stream s, then only process p has access to x (i.e., no process q ≠ p may read or write x) and either (i) ∆ can be partitioned into ∆₁ and ∆₂ with ∆₁ < ∆₂ such that x holds in s throughout ∆₁ and x is updated to 0 in s within ∆₂, or (ii) ∆ is infinite and x (i.e., x = 1) holds in s throughout ∆. Because await b blocks until test.b becomes true, there are no behaviours for AAcq.p when test.(¬b) holds. Operation ARel.p immediately sets x to 1, and by the definition of ⇐ together with assumption (1), no other process reads or writes x while this update occurs. Operation ATryOK.p behaves as AAcq.p, performs some idling, then updates rₚ to true. The idling between AAcq.p and the update to rₚ provides scope for potential stuttering at the concrete level. Operation ATryFl.p starts by behaving as x • [¬x], which implies that x is not accessed by any process q ≠ p and that ¬x holds throughout the given interval. Then, ATryFl.p performs some idling and updates the return value rₚ to false.
Reading Variables for Expression Evaluation with Buffer Effects
Section 4.2 provided an interval-based semantics for commands without buffers, which were in turn used to model the abstract specification. The concrete program executes under TSO memory and contains local buffers, whose effects on the program's behaviour must be formalised. In this section, we present a method for evaluating expressions, i.e., when processes read variables, in the presence of local buffers. In particular, we formalise the fact that a TSO process first checks its buffer for pending writes; if a pending write exists, the last pending value is returned, and otherwise the value from memory is returned. Using interval-based methods enables one to formalise the effects of a buffer on the value of an expression at a high level of abstraction [15] .
We assume that Bₚ ∈ Var denotes the buffer for process p, whose value, of type seq.(Var × Val), is the sequence of pending writes of p. Each buffer may contain multiple pending writes to the same location, and hence, we define a function cover that returns a set of mappings to the last pending write in a given buffer. Because each z ∈ seq.X is a partial function of type ℕ → X, we may use dom.z to refer to the indices of z.
When a process evaluates an expression, the values of pending writes in the process' buffer mask those in memory, which is modelled formally using functional override '⊕' (see [24] for a formal definition). … w < v) .B).σ hold, but (w < v).σ does not.
Processes evaluate state predicates (e.g., as part of a guard); however, in the presence of permissions and local buffers, evaluation is non-trivial. Firstly, one must ensure that a process p evaluating state predicate b is able to obtain read permission to each variable of b whenever the variable's value is fetched from memory. Note that this is only potentially problematic if the variable in question is shared (i.e., not a local variable of p) and not in p's buffer (p can always access its local buffer). Secondly, the value of a variable v read by p must be the last value of v in p's buffer if it exists, and the value of v in memory otherwise.
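The two requirements above can be sketched in Python as follows. This is an illustration only: the parameter readable, standing for the set of variables the process may currently read from memory (including its local variables), is our own simplification of the permission check. The function cover keeps the last pending write for each variable, and the buffered values override memory by dictionary merging, which plays the role of ⊕.

```python
def cover(buf):
    """Map each buffered variable to the value of its last pending write."""
    last = {}
    for var, val in buf:  # later writes overwrite earlier ones
        last[var] = val
    return last


def override(mem, m):
    """Functional override mem ⊕ m: entries of m take precedence."""
    return {**mem, **m}


def evaluate(pred, variables, mem, buf, readable):
    """Evaluate pred as a process with buffer buf would.

    readable (hypothetical) is the set of variables the process may currently
    read from memory, including its local variables."""
    pending = cover(buf)
    for v in variables:
        if v not in pending and v not in readable:
            raise PermissionError(f"no read permission for {v}")
    return pred(override(mem, pending))


mem = {"w": 5, "v": 3}
buf = (("v", 4), ("v", 7))  # two pending writes to v; the last one wins
less = lambda st: st["w"] < st["v"]
print(cover(buf))                                              # {'v': 7}
print(less(mem), evaluate(less, {"w", "v"}, mem, buf, {"w"}))  # False True
```

Here w < v is false in memory but true for the process holding the pending writes, which is exactly the masking effect that ⊕ captures.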
Commands under TSO
As already mentioned, processes that execute under TSO write only to their local buffers. The effects of these writes are not seen by other processes until a buffer is flushed, which moves the pending write from a buffer to shared memory. TSO buffers operate in a FIFO order, and hence, we define the following commands, where Φ models a single flush, Φ models a flush or a non-empty idle, and Φall models a complete buffer flush.
Due to the fine granularity of the concrete implementation, seemingly atomic statements become compound commands under TSO memory. Evaluation of a boolean expression b (e.g., a guard in an if-then-else block) is a compound statement that flushes or idles (zero or more times), evaluates b using the buffer-based evaluation semantics defined in Section 4.3, then flushes or idles again (zero or more times). A write of v with value k appends the pair (v, k) to the end of the local buffer. An assignment of a constant value k to v potentially flushes or idles (zero or more times), appends the pair (v, k) to the buffer, then potentially flushes or idles (zero or more times). An assignment of a complex expression e first evaluates the expression to a value k, then assigns k to v. Thus, we define:
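The effect of these commands on a single process's buffer can be sketched in Python, under our own encoding: memory is a dict, the buffer a tuple of (variable, value) pairs, and idling leaves the state unchanged. Because flushes are FIFO, the states reachable by "flush or idle, zero or more times" are exactly those obtained by flushing some prefix of the buffer. The function assign enumerates every outcome of v := k for a constant k; all names are hypothetical.

```python
def flush(mem, buf):
    """Φ: commit the oldest pending write (FIFO) to shared memory."""
    (var, val), rest = buf[0], buf[1:]
    return {**mem, var: val}, rest


def flush_or_idle_star(mem, buf):
    """All states reachable by flushing or idling zero or more times."""
    states = [(mem, buf)]
    while buf:
        mem, buf = flush(mem, buf)
        states.append((mem, buf))
    return states


def flush_all(mem, buf):
    """Φall: flush until the buffer is empty."""
    return flush_or_idle_star(mem, buf)[-1]


def assign(mem, buf, var, k):
    """v := k as (flush | idle)* ; append (v, k) ; (flush | idle)*; returns every outcome."""
    outcomes = set()
    for m1, b1 in flush_or_idle_star(mem, buf):
        for m2, b2 in flush_or_idle_star(m1, b1 + ((var, k),)):
            outcomes.add((tuple(sorted(m2.items())), b2))
    return outcomes


print(sorted(assign({"x": 1}, (), "x", 0)))
# [((('x', 0),), ()), ((('x', 1),), (('x', 0),))]
print(flush_all({"x": 1}, (("x", 0), ("y", 2))))  # ({'x': 0, 'y': 2}, ())
```

Starting from an empty buffer, the assignment either leaves (x, 0) pending or has already flushed it, which is the intuition behind law (14) in Section 5.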
There are several TSO instructions that force the entire buffer to be flushed. These additionally may lock certain variables from being accessed while the flush all is being executed. We therefore define commands pre Φ .v.C.p and post Φ .v.C.p, where pre Φ .v.C.p flushes the entire buffer (locking v) before C is executed (and similarly post Φ .v.C.p flushes all after C).
Some TSO instructions do not lock the memory while the buffer is being flushed. These may be modelled using pre Φ .∅.C and post Φ .∅.C, which we abbreviate to pre Φ .C and post Φ .C, respectively. A lock (e.g., a2 in Fig. 2) acquires the memory lock, then flushes the entire buffer; an unlock (e.g., a5 in Fig. 2) flushes the entire buffer, then releases the memory lock. Therefore, executing a command C within a lock-unlock block is modelled by
which executes C and ensures the buffer is empty before and after executing C. In addition, it ensures that no reads or writes to v by other processes occur while C is being executed. Note, however, that if a process p executes v • C Φ and a process q ≠ p has a pending write to v in its local buffer, then q may read this value of v even while p is executing v • C Φ.
… Fig. 2). Command … models the while loop at a8. Therefore, the outermost ω iteration in Acq.p models executions of the outermost loop of acquire that fail to acquire the lock. Command Lck.p models the lock at a2, the successful test at a3, the assignment at a4, and the unlock at a5, followed by the return at a6. The other operations are similar.
Refinement and Local Refinement for TSO
Interval-Based Refinement
In this section, we develop a theory for proving that a command C refines another command A, providing a formal link between the behaviours of C and A. Here, A is an abstraction and therefore admits more behaviours than C; conversely, any behaviour of C must also be a behaviour of A. In an interval-based setting, we use the following definition of refinement [11]. In the context of our example, if refinement holds, then whenever a spinlock client is able to enter its critical section, it must also be possible for the abstract client to enter its critical section.
Definition 1. If C and A are commands, then C refines A with respect to P ⊆ Proc (denoted C ⊒_P A) iff C.P ⇛ A.P, i.e., C.P.∆.s ⇒ A.P.∆.s for every interval ∆ and stream s. We say C is equivalent to A with respect to P (denoted C ⊑⊒_P A) iff both C ⊒_P A and A ⊒_P C.
Refinement is defined in terms of implication, and hence, relation ⊒_P is both reflexive and transitive. In this paper, we use ⊒_P as a basis for transforming the abstract specification Spec and the concrete program Prog individually. We use a notion of local buffer refinement (Definition 2) to relate concrete behaviours (with buffers) to abstract behaviours (without buffers).
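Because refinement is universal implication, it can be tested (though not proved) by brute force on a bounded universe. The Python sketch below is our own finite approximation: it fixes the process set, considers only finite streams over a single variable x, and checks C.∆.s ⇒ A.∆.s on every sub-interval, with chop and ⊡ restricted to finite intervals. In the example, C (x is 0 for a while, then 1 until the end) refines A (x is 1 at the end), but not conversely.

```python
from itertools import product


def always(c):
    """⊡c on a finite interval [lo, hi]."""
    return lambda lo, hi, s: lo <= hi and all(c(s[t]) for t in range(lo, hi + 1))


def chop(g1, g2):
    """(g1 ; g2) on a finite interval [lo, hi]."""
    return lambda lo, hi, s: any(g1(lo, m - 1, s) and g2(m, hi, s) for m in range(lo, hi + 2))


def true_(lo, hi, s):
    return True


def refines(C, A, values=(0, 1), length=4):
    """Bounded check of C ⇛ A: C.∆.s implies A.∆.s for every stream of the given
    length over values and every sub-interval ∆ (including empty ones)."""
    for vals in product(values, repeat=length):
        s = [{"x": v} for v in vals]
        for lo in range(length):
            for hi in range(lo - 1, length):
                if C(lo, hi, s) and not A(lo, hi, s):
                    return False
    return True


is0, is1 = (lambda st: st["x"] == 0), (lambda st: st["x"] == 1)
C = chop(always(is0), always(is1))  # x is 0 for a while, then 1 until the end
A = chop(true_, always(is1))        # x is 1 at the end
print(refines(C, A), refines(A, C), refines(C, C))  # True False True
```

A bounded check of this kind can only refute a refinement; establishing one over all intervals and streams is what the algebraic laws below are for.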
Example 7.
We transform the TSO implementation Prog into a form that is more amenable to verification. A difficulty encountered when verifying Prog in Example 6 directly is that for each process p, command Exec.p is not guaranteed to end in a flush, and hence changes to x may not be globally visible until the start of the next iteration (which starts with a lock that performs a flush all). For example, it is not immediately possible to match the behaviour of Rel with abstract ARel, because Rel only places a pending write in the buffer, whereas ARel modifies the value of x in memory. Therefore, we aim to transform Prog to Prog′ below (see Proposition 1), where the flush occurs at the end of execution. We use notation …
Clearly, transforming Prog to Prog′ by reasoning at the trace-based level of Definition 1 is infeasible. Therefore, we develop a number of refinement laws that are applied to our example. First, we have the following; the proof of each equivalence is straightforward.
Law 2 If p ∈ Proc, C and D are commands, each C i is a command and v is a vector of variables, then
To transform Prog to Prog ′ , we develop a leapfrog theorem analogous to Law 1, whose proof uses the equivalences defined in Law 2 as well as 
The left-hand side of Theorem 1 contains a disjunction that executes v • Cᵢ Φ, which ensures the buffer is empty (via flushes if necessary) both before and after execution of Cᵢ. After the end of the iteration, command PF.v is executed, which ensures the buffer is empty when the process terminates; flushes may be necessary due to the behaviour of Dᵢ. On the right-hand side, each iteration is guaranteed to start with an empty buffer and each disjunct starts with the weaker v • Cᵢ Φ, which only flushes the buffer at the end of execution. However, each iteration ends with PF.v. For the proof in Section 5.3, we find the following laws to be useful, each of which is proved in a straightforward manner. Note that for (11) and (14), the refinement only holds in one direction. Of these, (14) states that an assignment {Bₚ = ⟨⟩} v := k either ends with Bₚ = ⟨(v, k)⟩, or the buffer Bₚ is flushed as part of the assignment semantics.
… in contrast to existing methods, which require global conditions to be checked, e.g., [20] checks race conditions, [7, 16, 25] check linearizability, and [9] checks reduction. We conjecture that more complex examples will indeed require consideration of the behaviour of other processes. To this end, we will integrate compositional methods such as rely/guarantee into our framework [13, 11].
Conclusions
Existing approaches to relaxed memory verification (e.g., [6, 21, 22, 9, 20, 7, 16] ) focus on a low-level language (i.e., individual reads/writes), and hence, to perform a verification, programs need to be observed and understood in their (verbose) low-level representation. We are not aware of any approach that tries to lift memory model effects to a higher level of abstraction; our work here is hence unique in this sense [15] .
The basic idea is to think of statements as being executed over an interval of time, or an execution window. Such execution windows can overlap if programs are executed concurrently, and overlapping windows correspond to program instructions that can be executed in any order, representing the effect of concurrent executions and reorderings due to TSO. Overlapping execution windows may also interfere with each other, and fixing the outcome of an execution within one window can influence the outcome within another. This paper presents several advances to the semantics in [15] by simplifying the interval logic and the program semantics, as well as developing buffer-specific rules for expression evaluation and refinement. The underlying rules are algebraic in nature, and hence, we provide generic transformation laws, which are in turn applied to our running example.
A difficulty when reasoning about TSO memory is that, in addition to the normal non-determinism caused by concurrency, an additional level of non-determinism is introduced via the use of local buffers. The methods in this paper allow one to reduce the non-determinism that must be considered when reasoning about local updates. In particular, we develop a notion of local buffer refinement, which allows one to proceed as if pending writes to local variables have already occurred at the abstract level. This means that local writes do not appear out of order. A similar observation is used for local transformations in the context of compilers for weak memory [8]; however, these transformations do not consider higher-level synchronisation instructions such as lock.
As part of future work, we aim to study the connections between local buffer refinement and existing notions such as triangular race freedom [20] and reduction [9].
