Reasoning algebraically about refinement on TSO architectures by Dongol, B et al.
Reasoning Algebraically about Refinement on TSO
Architectures
Brijesh Dongol1, John Derrick1, and Graeme Smith2
1 Department of Computing, University of Sheffield, UK
2 School of Information Technology and Electrical Engineering,
The University of Queensland, Australia
Abstract. The Total Store Order memory model is widely implemented by mod-
ern multicore architectures such as x86, where local buffers are used for optimisa-
tion, allowing limited forms of instruction reordering. The presence of buffers and
hardware-controlled buffer flushes increases the level of non-determinism from
the level specified by a program, complicating the already difficult task of con-
current programming. This paper presents a new notion of refinement for weak
memory models, based on the observation that pending writes to a process’ lo-
cal variables may be treated as if the effect of the update has already occurred
in shared memory. We develop an interval-based model with algebraic rules for
various programming constructs. In this framework, several decomposition rules
for our new notion of refinement are developed. We apply our approach to verify
the spinlock algorithm from the literature.
1 Introduction
Logics for reasoning about concurrency in shared memory systems are based on the
assumption that hardware is sequentially consistent [18], guaranteeing that instructions
within each process are never executed out of order in memory. However, modern pro-
cessors have abandoned sequential consistency in favour of weaker memory guarantees,
using local buffers to offer greater scope for optimisation. There are several different
weak memory models [1, 2, 23]; in this paper, we focus on the most restricted of these:
the Total Store Order (TSO) memory model, which is implemented by architectures
such as x86 (see Fig. 1). Under TSO, instead of committing writes immediately to main
memory, the process executing the write stores it as a pending write in its local buffer.
Pending writes are not visible to other processors until they are flushed, which commits
the write to shared memory. A flush is either programmer controlled (via instructions
such as fence or lock) or hardware controlled. Programmer-controlled flushes are ul-
timately expensive (and inefficient), hence, one would like to keep these to a minimum.
On the other hand, reasoning about hardware-controlled flushes is difficult due to the
increase in non-determinism of a program’s behaviour.
Several approaches to program verification under TSO have been developed; we
provide a brief survey. Researchers have considered direct methods, such as executable
memory models [22], theorems for reduction [9], and identification of race conditions
[20]. Others have linked programs under TSO executions to an abstract specification
using linearizability [7, 16], however, these use abstract specifications different from the
Core 1
. . . . .
...Shared memory... lock
B
uffer
1
Core n
B
uffern
Fig. 1. TSO hardware overview
natural abstractions one would expect; [7] requires buffers to be present in the abstract
specification, while [16] uses a non-deterministic abstract specification.
An issue with many existing approaches is that program semantics is given at a
low level of abstraction of individual read and writes, which means programs must be
understood and analysed using a verbose representation. Our work is based on the desire
to lift reasoning to higher levels of abstraction [15], which in turn improves scalability.
To this end, we develop an interval-based semantics by adapting Interval Temporal
Logic [19]. Such an approach has two distinct advantages: (a) it allows one to define
truly concurrent executions [10, 11], providing a more accurate model of TSO-based
hardware; and (b) it is amenable to algebraic reasoning [3, 13], which enables one to
develop algebraic laws for syntactically manipulating formulas representing program
behaviour. In this paper, we develop algebraic rules to verify refinement between a
concrete program and its abstract representation. The development of algebraic laws
is non-trivial. However, once available, they provide high-level reusable theories for
verification. We do not claim to have a complete set of laws (this is a topic of future
work), instead, we provide a set of rules that are required for proving the spinlock
example we verify.
Within this interval-based logic, we develop a framework for reasoning on TSO,
simplifying our existing semantics [15] and introducing enhancements specifically de-
signed for reasoning about buffer-based programs. This includes a simplified permis-
sion framework (Section 3.1), a novel methodology for evaluating expressions in the
presence of local buffers (Section 4.3) and a novel notion of local buffer refinement
(Section 5.2). Local buffer refinement is based on the observation that: To show a com-
mand C refines another command A with respect to a process p, the pending writes to
local variables of p may be treated as if they have already taken effect in A. Thus, lo-
cal updates at the concrete level may be treated as if they occur in their program order
(without waiting for their flush to occur). This benefits verification because the non-
determinism from flushes of local variables is resolved earlier. We develop a number of
algebraic transformation laws for both refinement and local buffer refinement.
2 Background
2.1 Total Store Order Example
Total Store Order (TSO) memory allows a process to store a write in its local buffer
and continue processing without waiting for this write to be commited to memory (i.e.,
while the write is pending). The values in the buffer are flushed in a FIFO order. To see
2
word x = 1;
void acquire() {
a1 while(1) {
a2 lock;
a3 if (x = 1) {
a4 x := 0;
a5 unlock;
a6 return }
a7 unlock;
a8 while(x = 0);
} }
void release() {
r1 x := 1; }
bool tryacquire() {
t1 lock;
t2 if (x = 1) {
t3 x := 0;
t4 unlock;
t5 return true }
t6 unlock;
t7 return false; }
Fig. 2. Spinlock algorithm
Noncritical section ;
acquire();
Critical section ;
release();
Fig. 3. Spinlock client (a)
Noncritical section;
if tryacquire() {
Critical section ;
release() ;
}
Fig. 4. Spinlock client (b)
the effect of this, consider the following classic example with processes p and q that
modify shared variables x and y, which are initialised to 0. In this paper, we assume
maximum parallelism and that each thread resides in exactly one core, therefore, the
words process and core are used synonymously.
word x=0, y=0;
p { p1: x := 1 ;
p2: r1 := y }
q { q1: y := 1 ;
q2: r2 := x }
Under sequentially consistent memory, at the end of execution, at least one of r1 or r2
would have a value 1. However, in TSO memory, it is possible to end execution so that
both r1 and r2 read the original values of x and y, i.e., both r1 and r2 are 0 at termina-
tion. One such execution is 〈p1, p2, q1, q2, flush(p), flush(p), flush(q), flush(q)〉,
where flush(p) denotes a (hardware-controlled) flush event for process p. The write
to x at p1 is not seen by process q until p’s buffer is flushed, and symmetrically for the
write to y at q1. Hence, it is possible for q to read a value 0 for x at q2 even though q2
is executed after p1.
In addition to the above behaviour, each TSO process reads pending writes from its
own buffer if possible, and hence, may obtain values that are not yet globally visible to
other processes, e.g., if p2 is replaced with r1 := x, process p would read x = 1 even
if the write to x is pending. If there are multiple pending writes to the same location,
then the write value corresponding to the last pending write is returned.
2.2 Case Study: Spinlock
Spinlock [4] is a locking mechanism designed to avoid operating system overhead as-
sociated with process scheduling and context switching. A typical implementation of
spinlock is shown in Fig. 2, where a global variable x represents the lock and is set to 0
when the lock is held by a process, and 1 otherwise. The lock x is itself acquired using a
secondary hardware lock (see Fig. 1), and this hardware lock is acquired/released using
lock/unlock instructions. A process trying to acquire the lock x spins, i.e., waits in a
loop and repeatedly checks the lock for availability.
3
Operation acquire only terminates if it successfully obtains the lock x. It will first
lock the hardware so that no other process can access x. If, another process has already
acquired x (i.e., x = 0) then it will release the hardware lock at a7 and spin at a8, i.e.,
loop in the while-loop until x becomes free, before starting over from a2. Otherwise,
it acquires the lock at a4 by setting x to 0, releases the hardware lock at a5 and re-
turns at a6. The operation release releases the lock by setting x to 1. The operation
tryacquire is similar to acquire, but unlike acquire it only makes one attempt to
acquire the lock. If this is successful it returns true, otherwise it returns false. Under
TSO, a process p executing an assignment (e.g., x := 0) places a pending write in p’s
local buffer, which is not visible to other processes until the buffer is flushed.
We refer to processes that use spinlock to provide mutual exclusion to a critical
section of code as its clients. Here, as in [22], we assume that clients of the spinlock
behave either as the program in Fig. 3 or Fig. 4. Thus, one can assume that a client only
calls a release operation when it holds the lock.3 Note however, that the behaviours
in Fig. 3 and Fig. 4 are not exhaustive. To admit other behaviours, one may formalise
the additional client code, then apply our proof methods in this paper to verify this
additional behaviour.
Clients can ensure mutual exclusion in the critical sections if in place of acquire,
release and tryacquire, they use abstract operations AAcq, ARel and ATry below,
respectively, which do not use buffers. We will refer to such clients as abstract clients.
word x = 1;
void AAcq() {
await (x = 1) {
x <== 0
} }
void ARel() {
x <== 1
}
bool ATry() {
if CAS(x, 1, 0) {
return true
else return false } }
Here, statement await denotes a blocking atomic test-and-set statement, e.g., AAcq()
can only execute if x = 1 holds, and its execution atomically sets the value of x to 0.
Unlike the concrete program in Fig. 2, all reads and writes occur directly with main
memory; we use assignments of the form x <== 0 (which directly updates the value of
x in memory) to distinguish this difference. If x = 0, then AAcq() blocks and cannot
execute further until its guard x = 1 is set to true by another process. Operation ATry()
attempts to update x to 0 using a non-blocking atomic compare-and-swap operation
CAS, and returns 1 if the operation is successful and 0, otherwise.4
Our notion of correctness of the spinlock will be to show that every possible exe-
cution of a spinlock client is a possible execution of an abstract client. To this end, we
prove refinement between the behaviour of the two executions (see Section 5.3). Prov-
ing refinement under TSO is difficult; one must not only verify concurrency effects, but
additionally consider the effect of accessing the buffer during a program’s execution.
Furthermore, the level of atomicity at which these effects are visible is fine-grained,
3 Such restrictions on client behaviour must be made due to the simplicity of the spinlock al-
gorithm in Fig. 2. Arbitrary client behaviour e.g., two consecutive calls to release without
acquiring the lock will result in incorrect behaviour under TSO.
4 CAS(a, b, c) is equivalent to atomic { if a = b then a := c ; return true
else return false}.
4
occuring at the level of individual reads and writes. This paper develops a high-level ap-
proach for proving refinement that avoids the need to consider low-level (fine-grained)
effects whenever possible by developing an interval-based semantics for programs un-
der TSO. This allows one to view the concurrent execution of two processes as the
conjunction of their behaviours over an interval (as opposed to an interleaving of their
traces), reducing the impact of non-determinism due to concurrency.
3 Interval-Based Reasoning
3.1 Permission Monitoring
Using an interleaved execution semantics, one can guarantee that a variable will not be
simultaneously written, or read and written as part of the same transition. This is not
true for shared-memory true concurrency, where one must model variable access by the
different processes (e.g., two processes simultaneousy modifying variable x in Fig. 2).
Our solution is to explicitly define read/write permissions. To this end, we assume
that programs are executed by processes from a set Proc; each process represents a
concurrent thread which modifies a set of variables from a set Var. The TSO architecture
uses sophisticated coherence protocols to provide an illusion of shared memory. One
may assume the following about read and write instructions:
– Two simultaneous writes (by different processes) to the same variable do not occur.
– A simultaneous read and write of the same variable does not occur.
– A process never has access permission to the local variable of another process.
As we shall see in Sections 4.2 and 4.4, permissions also provide a convenient mecha-
nism for formalising the effect of a lock-unlock block.
In previous work [11], we have modelled permissions using a fractional encoding
(inspired by [5]). Here, we simplify these general notions and define the permission
space as Perm “= Proc → Var → P{wr, rd}, where wr and rd denote write and read
permission, respectively. Using ‘.’ for function application, given pi ∈ Perm, we inter-
pret wr ∈ pi.p.v (resp., rd ∈ pi.p.v) as p ∈ Proc has permission to write to (resp., read
the value of) v ∈ Var.
A system at any time is described by a state of type State “= Var → Val, where
Val is the set of values. The system over time is formalised by a stream, which is a
total function of type Stream “= Z → (State × Perm). Therefore, for each time in
Z, a stream formalises the state of the system and the permissions for each process
and variable. Properties of a system are given by predicates; a predicate of type T is
a member of PT “= T → B, e.g., PState, PStream, and PPerm are state, stream, and
permission predicates, respectively. We assume pointwise lifting of boolean operators
over predicates in the normal manner.
If pi ∈ Perm, then W.p.v.pi “= (wr ∈ pi.p.v), R.p.v.pi “= (rd ∈ pi.p.v) and
N .p.v.pi “= (pi.p.v = ∅) denote permission predicates that hold iff process p has write,
read or no access to v in the permission space pi, respectively.
Example 1. Suppose Var = {u, v}, Proc = {p, q} and
pi “= {p 7→ {u 7→ {rd,wr}, v 7→ {rd}}, q 7→ {u 7→ ∅, v 7→ {rd}}}
5
Then,W.p.u.pi (p has write permission to u), (R.p ∧ R.q).v.pi (both p and q have read
permission to v) and N .q.u.pi (q has no permission to u) in space pi. Note that due to
pointwise lifting (R.p ∧ R.q).v.pi = R.p.v.pi ∧ R.q.v.pi. 
The assumptions on reads and writes above are then taken into account by assuming
that each valid permission space pi satisfies the following, where p and q such that p 6= q
are processes, v is a variable, and up is a local variable of process p.
(W.p.v.pi ⇒ N .q.v.pi) ∧ (R.p.v.pi ⇒ ¬W.q.v.pi) ∧ pi.p.up = {wr, rd} (1)
Note that the third conjunct combined with the first ensures that q does not have read
nor write permission to any local variable of p. This is lifted to the level of streams by
defining a valid stream to be one in which each s.t is valid for t ∈ Z. For the rest of the
paper, we assume each stream is valid.
To simplify the notation, for a state predicate b and permission predicate z, we
assume b.(σ, pi) = b.σ and z.(σ, pi) = z.pi, where σ and pi are a state and a permission
state, respectively. We assume ‘’ denotes a projection operator, e.g., (x, y)  1 = x.
3.2 Interval Predicates
In this section, we provide the basics of interval predicates, which forms the logical
basis of our program semantics. Our logic is an adaptation of Interval Temporal Logic
[19]. An interval is a contiguous set of integers (denoted Z), and hence the set of all
intervals is Intv “= {∆ ⊆ Z | ∀t1, t2:∆ • ∀t:Z • t1 ≤ t ≤ t2 ⇒ t ∈ ∆}.
We let lub.∆ and glb.∆ denote the least upper and greatest lower bounds of an in-
terval ∆, respectively. Furthermore, we define inf.∆ “= (lub.∆ =∞), fin.∆ “= ¬inf.∆,
and ε.∆ “= (∆ = ∅). We define an ordering ∆1 < ∆2 “= ∀t1:∆1, t2:∆2 • t1 < t2. To
facilitate reasoning about specific parts of a stream, we use interval predicates, which
have type IntvPred “= Intv→ PStream [11, 13].
Example 2. Given Var, Proc and pi as defined in Example 1, we define
σ1 “= {u 7→ 500, v 7→ 42} s “= λ t • if t ≥ 10 then (σ1, pi) else (σ2, pi)
σ2 “= {u 7→ 0, v 7→ 1} g “= λ∆ • λ s • ∀t:∆ • ((s.t)  1).u ≥ 300
b “= λσ • σ.u < σ.v
Then, σ1, σ2 are states, b a state predicate, s is a stream and g is an interval predicate.
Each of (¬b).(σ1, pi), b.(σ2, pi), ¬g.[−3, 3].s and g.[10, 100).s hold.5 
We define universal implication g1 V g2 “= ∀∆: Intv, s: Stream • g1.∆.s ⇒ g2.∆.s for
interval predicates g1 and g2, and write g1 ≡ g2 iff both g1 V g2 and g2 V g1 hold.
4 Concurrent Programming with Intervals
4.1 Operators to Model Programming Constructs
In this section, we introduce interval predicate operators used to formalise common
programming constructs: sequential composition, branching, loops, and parallel com-
position. To model sequential composition, we define the chop operator [19, 13]. Unlike
5 Here, [−3, 3] is the closed interval from −3 to 3 and [10, 100) is the right-open interval from
10 to 100.
6
Interval Temporal Logic, which requires adjoining intervals to overlap at a single point,
adjoining intervals in our logic are disjoint.
(g1 ; g2).∆.s “= (∃∆1, ∆2 • (∆1 ∪∆2 = ∆) ∧ ∆1 < ∆2 ∧ g1.∆1.s ∧ g2.∆2.s) ∨
(inf ∧ g1).∆.s
Thus, (g1 ; g2).∆.s holds iff either interval ∆ may be split into two adjoining parts ∆1
and ∆2 so that g1 holds for ∆1 and g2 holds for ∆2 in s, or the least upper bound of ∆
is∞ and g1 holds for ∆ in s. Inclusion of the second disjunct (inf ∧ g1).∆.s enables g1
to model an infinite (divergent or non-terminating) program. We assume that ‘;’ binds
tighter than all other binary operators, e.g., g1 ; g2 ∨ h = (g1 ; g2) ∨ h.
Example 3. For b, g and s as defined in Example 2, if h “= λ∆ • λ s • ∃t:∆ • b.(s.t),
then (h ; g).[0, 100).s holds because both h.[0, 10).s and g.[10, 100).s hold. Note that
there may be more than one possible way to split up an interval when applying the
definition of chop. 
Non-deterministic choice is modelled by (lifted) disjunction, and hence, for exam-
ple, the behaviour of if b then S1 else S2 is given by test.b ; beh.S1 ∨ test.(¬b) ;
beh.S2, where test.b and beh.S1 are interval predicate formalisations of evaluating b
and executing S1, respectively. The precise value of test.b depends on the atomicity as-
sumptions of the program under consideration, and hence, the interpretation of test.b is
non-trivial (see [17, 13, 11]). The value of test.b is modelled by command [b] (see Sec-
tion 4.2), whereas at the concrete level its value is formalised by a different command
[[b]], which takes the effect of the buffer into account (see Section 4.3).
Iteration g∗ and gω are the least and greatest fixed points of λ z • gz ∨ ε, respectively
[14], where g∗ allows empty and finite iterations and gω allows empty, finite and infinite
iterations of g. We also define strictly finite and possibly infinite positive iterations.
g∗ “= µ z • ((g ; z) ∨ ε) g+ “= g ; g∗
gω “= νz • ((g ; z) ∨ ε) gω+ “= g ; gω
A thorough algebraic treatment of loops using iteration is given in [3]. For example,
program code while b do S is modelled by (test.b ; beh.S)ω ; test.(¬b). In this paper,
we use the following rule.
Law 1 (Leapfrog [3]) For interval predicates g and h, both g ; (h ; g)ω ≡ (g ; h)ω ; g
and g ; (h ; g)∗ ≡ (g ; h)∗ ; g hold.
We are interested in modelling true concurrency and therefore simply treat the par-
allel composition of two or more processes using (lifted) logical conjunction. For ex-
ample, the behaviour of g1 ; g2 in parallel with h1 ; h2 over an interval ∆ in stream
s is given by (g1 ; g2 ∧ h1 ; h2).∆.s. Using pointwise lifting, this is equivalent to
(g1 ; g2).∆.s ∧ (h1 ; h2).∆.s, which holds iff (a) ∆ can be split into adjoining intervals
∆1 and ∆2 such that g1.∆1.s ∧ g2.∆2.s holds; and (b) ∆ can also be split into adjoin-
ing intervals ∆3 and ∆4 such that h1.∆3.s ∧ h2.∆4.s. Note that there is no immediate
correlation between the lengths of ∆1 and ∆3, i.e. g1 could terminate earlier than h1,
and vice versa.
Modelling tests. Interval predicates provide a flexible approach to non-deterministic
state predicate evaluation [17], where expression evaluation is assumed to take time (as
opposed to being instantaneous). In this paper, guards and assignments are restricted
7
to contain at most one shared variable.6 Given that c is either a state or permission
predicate, and ∆ and s are an interval and stream, respectively, we define:
( c).∆.s “= ¬ε.∆ ∧ ∀t:∆ • c.(s.t)
Thus, ( c).∆.s holds iff ∆ is non-empty and c holds for each state of s within ∆. For
example, ¬ε ∧ g ≡ (u ≥ 300), where g is the interval predicate defined in Example 2.
Reasoning about pre/post assertions. One may define several additional interval pred-
icate operators [13]. For the purposes of this paper, we find it useful to reason about
properties that hold in the immediately preceding interval. We therefore define
( g).∆.s “= ¬ε.∆ ∧ glb.∆ 6= −∞ ∧ g.(prev.∆).s
where prev.∆ “= {t:Z | ∀u:∆ • t < u} is the interval of all times before∆. If c is a state
or permission predicate, we use notation−→c “= true ;  c, where−→c .∆ states that c holds
at the end of ∆ whenever inf.∆ 6=∞. Additionally, we define the following notation to
reason about assertions that immediately precede, or are a result of a computation.
{c}g “= −→c ∧ g g{c} “= g ∧ −→c
Such a definition of a pre-assertion is necessary because we assume adjoining intervals
do not overlap (unlike [19]). We have the following useful properties, which can be
proved in a straightforward manner.
{c}(g1 ∨ g2) ≡ ({c}g1) ∨ ({c}g2) (2)
g1{c} ; g2 ≡ g1 ; {c}g2 provided g1 ∨ g2 V ¬ε (3)
4.2 Abstract Commands
Using the interval-based semantic basis from the previous sections, we formalise com-
mands, which describe the behaviours of the system processes. Formally, a command is
of type Cmd “= P1 Proc→ IntvPred, mapping non-empty sets of processes to an inter-
val predicate representing their behaviour. We use C.p as shorthand for C.{p}, where C
is a command and p is a process.
The semantics of sequential composition, iteration, non-deterministic choice and
parallel composition of commands are defined pointwise lifting of the interval predicate
operators, and hence, are given in the same syntax, e.g., (C1 ; C2).p = C1.p; C2.p. What
remains is to define the commands to model, say, guard evaluation and assignment.
We first present some basic commands that may be used to models the abstract (se-
quentially consistent) specification. In particular, we define idling (denoted id), abstract
guard evaluation (denoted [b]), memory update (denoted v⇐\ e) and locked access (de-
noted v • C ), where v and e denote vectors of variables and expressions, respectively.
We define Deny.v.p “= (∀q: Proc\{p}, v: v • (¬W ∧ ¬R).q.v), which states that the
variables in v are not accessed by processes other than p. We assume that vars.b denotes
the free variables in b.
nid.p “= ∀v: Var • ¬W.p.v
id.p “= ¬ε⇒ nid.p
[b].p “= b ∧ nid.p ∧
∀v: vars.b • R.p.v
(v⇐ \ e).p “= ∃k • {e = k}(v = k) ∧ (∀v: v •W.p.v)
v • C .p “= (v 6= ∅⇒ Deny.v.p) ∧ C.p ∧
(∀v: v •(W ∧ R).p.v)
6 This can be extended to handle more complex expressions [17].
8
Thus nid.p states that p is write idle i.e., p does not have write access to any variable
during the given (non-empty) interval; id.p states that either p is write idle or the interval
under consideration is empty; [b].p holds iff b holds throughout the given interval and
p is idle; v⇐ \ e denotes an instantaneous update, which holds iff e evaluates to a vector
of values k as a pre-assertion, and v is updated to k, where p has write permission to
each v ∈ v; and v • C .p holds iff C.p holds and additionally no process other than p
has permission to access v ∈ v.
Example 4. The abstract specification is formalised as follows, where AAcq and ARel
specify operations AAcq and ARel, respectively, while ATryOK and ATryFl specify ex-
ecution of the ATry operation that succeed and fail to acquire the lock, respectively. We
abbreviate x = 1 and x = 0 to x and ¬x, respectively. The return value of an execution
of tryacquire in process p is modelled by a local variable rp.
AAcq.p “= x • [x] ; (x⇐ \ 0) ATryOK.p “= AAcq.p ; id ; (rp ⇐\ true)
ARel.p “= x⇐ \ 1 ATryFl.p “= x • [¬x] ; id ; (rp ⇐ \ false)
AExec.p “= ((AAcq ; id ; ARel) ∨ (ATryOk ; id ; ARel) ∨ ATryFl).p
Spec.P “= {x}∧p:P((id ; AExec)ω ; id).p
The concurrent execution of abstract clients is modelled by Spec, which begins in a
state in which the lock x is available (i.e., x holds) and consists of a number of (truly)
parallel processes. We assume that each client of the spinlock behaves as either Fig. 3
or Fig. 4, and furthermore, that the critical and non-critical sections do not modify
variables x and rp, and hence, both the critical and non-critical sections are modelled
by id. Therefore, id ; AExec models a single call to the abstract spinlock. Each process
may make multiple (zero or more) calls, followed by no calls, and hence, all possible
behaviours of an abstract client is given by (id ; AExec)ω ; id.
We now explain how each operation is modelled. If AAcq.p.∆.s holds for interval
∆ and stream s, then only process p has access to x (i.e., no process q 6= p may read or
write to x) and either (i) ∆ can be partitioned into ∆1 and ∆2 with ∆1 < ∆2 such that
x holds in s throughout ∆1 and x is updated to 0 in s within ∆2, or (ii) ∆ is infinite and
x (i.e., x = 1) holds in s throughout ∆. Because await b blocks until test.b becomes
true, there are no behaviours for AAcq.p when test.(¬b) holds. Operation ARel.p im-
mediately sets x to 1, and by the definition of⇐ \ together with assumption (1), we have
that no other process reads or writes to x while this update occurs. Operation ATryOK.p
behaves as AAcq.p, performs some idling, then updates rp to true. The idling between
AAcq.p and update to rp provides scope for potential stuttering at the concrete level. Op-
eration ATryFl.p starts by behaving as x • [¬x] , which implies that x is not accessed
by any process q 6= p and that ¬x holds throughout the given interval. Then, ATryFl.p
performs some idling and updates the return value rp to false. 
4.3 Reading Variables for Expression Evaluation with Buffer Effects
Section 4.2 provided an interval-based semantics for commands without buffers, which
were in turn used to model the abstract specification. The concrete program executes un-
der TSO memory and contains local buffers, whose effects on the program’s behaviour
9
must be formalised. In this section, we present a method for evaluating expressions,
i.e., when processes read variables, in the presence of local buffers. In particular, we
formalise the fact that a TSO process first checks its buffer for pending writes; if a
pending write exists, the last pending value is returned, and otherwise the value from
memory is returned. Using interval-based methods enables one to formalise the effects
of a buffer on the value of an expression at a high level of abstraction [15].
We assume that Bp ∈ Var denotes the buffer for process p, whose value is of type
seq.(Var×Val), representing a pending write. Each buffer may contain multiple pending
writes to the same location, and hence, we define a function cover that returns a set of
mappings to the last pending write in a given buffer. Because seq.X is a partial function
of type N 7→ X, we may use dom.z to refer to the indices of z ∈ seq.X.
cover.B “= {B.i | i: dom.B ∧ ∀j: dom.B • j > i⇒ (B.i  1) 6= (B.j  1)}
When a process evaluates an expression, the values of pending writes in a process’
buffer mask those in memory, which is modelled formally using functional override
‘⊕’ (see [24] for a formal definition).
Example 5. Suppose B and BB are buffers (BB is not shown below), p and q are pro-
cesses, u, v, and w are variables, and σ is a state such that
B “= 〈(v, 11), (w, 33), (v, 44)〉 σ “= {(Bp,B), (Bq,BB), (u, 0), (v, 1), (w, 2)}
Then we have cover.B = {(w, 33), (v, 44)}, i.e., for each variable in B its last corre-
sponding value in B is picked. Hence, we have
σ ⊕ cover.B = {(Bp,B), (Bq,BB), (u, 0), (v, 44), (w, 33)}
which replaces each mapping in σ by those in cover.B. 
We lift buffer effects to state predicates using (mask.b.B).σ “= b.(σ ⊕ cover.B) , which
states that b holds in a state σ covered by B. For the definitions in Example 5, both
(mask.(u = 0).B).σ and (mask.(w < v).B).σ hold, but (w < v).σ does not.
Processes evaluate state predicates (e.g., as part of a guard), however, in the pres-
ence of permissions and local buffers, evaluation is non-trivial. Firstly, one must ensure
that a process p evaluating state predicate b is able to obtain read permission to each
variable of b whenever the variable’s value is fetched from memory. Note that this is
only potentially problematic if the variable in question is shared (i.e., not a local vari-
able of p) and not in p’s buffer (p may can always access its local buffer). Secondly, the
value of a variable v read by p must be the last value of v in p’s buffer if it exists, and
the value of v in memory, otherwise. Assuming vars.B “= {(B.i)  1 | i ∈ dom.B} is the
set of all variables in B ∈ seq.(Var × Val), we define:p b “= (mask.b.Bp ∧ ∀v: vars.b\vars.Bp •R.p.v)
Thus, (p b).∆.s models the evaluation of state predicate b by the process p, by either
reading variables from p’s buffer (if possible) or from main memory. Here (p b).∆.s
holds iff (i) b masked by Bp holds in s throughout ∆, and (ii) p has read permission
to the free variables of b not in vars.Bp throughout ∆ in s.7 In Section 4.4, (p b) is
7 In general, if b contains multiple shared variables, p b is not an accurate model of evaluation
because the variables in b may be read at different instants [17, 10, 12]. However, here, we
assume that each expression/guard of each program under consideration contains at most one
shared variable, in which case p b is accurate (see [17, 10, 12]).
10
used to define expression evaluation, which in turn is used to model guards and the right
hand side of assignments.
4.4 Commands under TSO
As already mentioned, processes that execute under TSO write only to their local buffers.
The effects of these writes are not seen by other processes until a buffer is flushed, which
moves the pending write from a buffer to shared memory. TSO buffers operate in a FIFO
order, and hence, we define the following commands, where Φ models a single flush, Φ˜
models a flush or a non-empty idle, and Φall models a complete buffer flush.
Φ .p “= ∃k • {Bp = k ∧ k 6= 〈 〉}
Bp, (k.0  1)⇐ \ tail.k, (k.0  2) Φ˜ “= Φ ∨ nidΦall .p “= Φ+.p {Bp = 〈 〉}
Due to the fine-granularity of the concrete implementation, seemingly atomic state-
ments become compound commands under TSO memory. Evaluation of a boolean ex-
pression b (e.g. a guard in an if-then-else block) is a compound statement that flushes
or idles (zero or more times), evaluates b using the buffer-based evaluation semantics
defined in Section 4.3, then flushes or idles again (zero or more times). A write of v
with value k appends the pair (v, k) to the end of the local buffer. An assignment to a
constant value k, potentially flushes or idles (zero or more times), appends the value
to the buffer, then potentially flushes or idles (zero or more times). An assignment to
a complex expression e, first evaluates the expression to a value k, then assigns k to v.
Thus, we define:
[[b]].p “= Φ˜∗.p ; (p b ∧ nid.p) ; Φ˜∗.p
(v← [ k).p “= Bp ⇐\ Bp a 〈(v, k)〉
v := e “= if e ∈ Val then Φ˜∗ ; (v← [ e) ; Φ˜∗ else ∃k: Val • [[e = k]] ; v := k
There are several TSO instructions that force the entire buffer to be flushed. These addi-
tionally may lock certain variables from being accessed while the flush all is being exe-
cuted. We therefore define commands preΦ.v.C.p and postΦ.v.C.p, where preΦ.v.C.p
flushes the entire buffer (locking v) before C is executed (and similarly postΦ.v.C.p
flushes all after C).
preΦ.v.C.p “= {Bp = 〈 〉}C.p ∨ ( v • Φall ; C).p
postΦ.v.C.p “= C.p{Bp = 〈 〉} ∨ (C ; v • Φall ).p
Some TSO instructions do not lock the memory while the buffer is being flushed.
These may be modelled using preΦ.∅.C and postΦ.∅.C, which we abbreviate to preΦ.C
and postΦ.C, respectively. A lock (e.g., a2 in Fig. 2) acquires the memory lock then
flushes the entire buffer; an unlock (e.g., a5 in Fig. 2) flushes the entire buffer then
releases the memory lock. Therefore, executing a command C within a lock-unlock
block is modelled by
v • C
Φ
“= preΦ.v.(postΦ.v. v • C )
which executes C and ensures the buffer is empty before and after executing C. In
addition it ensures that no reads and writes to v by other processes occur while C is
being executed. Note however, that if a process p executes v • C
Φ
and a process q 6= p
has a pending write to v in its local buffer, then q may read this value of v even while p
is executing v • C
Φ
.
11
Example 6. Our modelling notation is used to formalising the behaviour of the concrete
implementation as follows, where Lck.p “= x • [[x]] ; (x := 0)
Φ
.
Acq.p =̂
Ä
x • [[¬x]]
Φ
; [[¬x]]ω ; [[x]]
äω
; Lck.p Rel.p =̂ x := 1
TryOK.p =̂ Lck.p ; (rp := true) TryFl.p =̂ x • [[¬x]] Φ ; (rp := false)
Exec.p =̂ (Acq ; id ; Rel ∨ TryOK ; id ; Rel ∨ TryFl).p
Prog.P =̂ {x}∧p:P ({Bp = 〈 〉}(Φ˜+ ; Exec)ω ; postΦ.x.(Φ˜+)).p
Within Acq.p, command x • [[¬x]]
Φ
models an execution consisting of the lock at a2,
failed test at a3, unlock at a7 (see Fig. 2). Command [[¬x]]ω ; [[x]] models the while loop
at a8. Therefore, the outermost ω iteration in Acq.p models executions of the outermost
loop of acquire that fail to acquire the lock. Command Lck.p models the lock at a2,
successful test at a3, assignment at a4, and unlock at a5 followed by the return at a6.
The other operations are similar. 
5 Refinement and Local Refinement for TSO
5.1 Interval-Based Refinement
In this section, we develop a theory for proving that a command C refines another
command A, providing a formal link between the behaviours of C and A. Here, A is an
abstraction and therefore admits more behaviours than C, or conversely, any behaviour
of C must also be a behaviour of A. In an interval-based setting, we use the following
definition of refinement [11]. In the context of our example, if refinement holds, then
whenever a spinlock client is able to enter its critical section, it must also be possible
for the abstract client to enter the critical section.
Definition 1. If C and A are commands, then C refines A with respect to P ⊆ Proc, (de-
noted C wP A) iff for any interval ∆ and stream s, C.PV A.P. We say C is equivalent
to A with respect to P (denoted C vwP A) iff both C wP A and A wP C.
Refinement is defined in terms of implication, and hence, relation wP is both reflexive
and transitive. In this paper, we use wP as a basis for transforming the abstract spec-
ification Spec and the concrete program Prog individually. We use a notion of local
buffer refinement (Definition 2) to relate concrete behaviours (with buffers) to abstract
behaviours (without buffers).
Example 7. We transform the TSO implementation Prog into a form that is more ame-
nable to verification. In particular, a difficulty encountered when verifying Prog in Ex-
ample 6 directly is that for each process, p, command Exec.p is not guaranteed to end
in a flush, and hence changes to x may not be globally visible until the start of the next
iteration (which starts with a lock that performs a flush all). In particular, it is not im-
mediately possible to match the behaviour of Rel with abstract ARel because Rel only
places a pending write in the buffer, whereas ARel modifies the value of x in memory.
Therefore, we aim to transform Prog to Prog′ below (see Proposition 1), where the flush
occurs at the end of execution. We use notation
v • C
Φ
“= postΦ.v. v • C
12
Unlike v • C
Φ
, command v • C
Φ
only flushes the buffer at the end of execution.
Below, we have used the property gω ≡ ε ∨ gω+ to split Acq into two cases. Note that
Acq′1 is defined in terms of Acq.
Acq′1.p =̂ x • [[¬x]] Φ ; [[¬x]]
∗ ; [[x]] ; Acq.p Acq′2.p =̂ x • [[x]] ; (x := 0) Φ
TryOK′.p =̂ x • [[x]] ; (x := 0)
Φ
; (rp := true) TryFl′.p =̂ x • [[¬x]] Φ ; (rp := false)
Exec′.p =̂ ((Acq′1 ; id ; Rel) ∨ (Acq′2 ; id ; Rel) ∨ (TryOK′ ; id ; Rel) ∨ TryFl′).p
Prog′.P =̂ {x}∧p:P Äid ; Ä{Bp = 〈 〉}Exec′ ; postΦ.x.(Φ˜+)äωä .p 
Clearly, transforming Prog to Prog′ by reasoning at trace-based level of Definition 1 is
infeasible. Therefore, we develop a number of refinement laws that are applied to our
example. First, we have the following; the proof of each equivalence is straightforward.
Law 2 If p ∈ Proc, C and D are commands, each Ci is a command and v is a vector of
variables, then
v • C
Φ
vwp preΦ.v. v • C Φ (4)
preΦ.C vwp preΦ.({Bp = 〈 〉}C) (5)
preΦ.v. v • C Φ ; D vwp preΦ.v.
Ä
v • C
Φ
; D
ä
provided C.pV ¬ε (6)∨
i preΦ.Ci vwp preΦ. (
∨
i Ci) (7)
C ; preΦ.D vwp postΦ.C ; D provided C.p ∨ D.pV ¬ε (8)
To transform Prog to Prog′, we develop a leapfrog theorem analogous to Law 1,
whose proof uses the equivalences defined in Law 2 as well as PF.v “= postΦ.v.Φ˜+.
Theorem 1 (Leapfrog flush). Suppose p ∈ Proc, each Ci and Di is a command such
that Ci.pV ¬ε, and v is a vector of variables. ThenÄ
Φ˜
+
;
Ä∨
i v • Ci Φ ; Di
ääω
; PF.v vwp PF.v ;
Ä
{Bp = 〈 〉}
Ä∨
i v • Ci Φ ; Di
ä
; PF.v
äω
The left hand side of Theorem 1 contains a disjunction that executes v • Ci Φ , which
ensures the buffer is empty (via flushes if necessary) both before and after execution of
Ci. After the end of the iteration, command PF.v is executed, which ensures the buffer
is empty when the process terminates; flushes may be necessary due to the behaviour
of Di. On the right hand side, each iteration is guaranteed to start with an empty buffer
and each disjunct starts with the weaker v • Ci Φ , which only flushes the buffer at the
end of execution. However, each iteration ends with PF.v. Further note that on the right
hand side, each iteration is guaranteed to begin in a state where the buffer of p is empty.
Proposition 1. Prog wP Prog′
Proof. Applying Theorem 1 to Prog, we obtain:
Prog′′.P “= {x}∧p:P ({Bp = 〈 〉}PF.x ; ({Bp = 〈 〉}Exec′ ; PF.x)ω) .p
Then, because {Bp = 〈 〉}PF.x vp id, we have Prog′′ vP Prog′. 
For the proof in Section 5.3, we find the following laws to be useful, each of which
is proved in a straightforward manner. Note that for (11) and (14), the refinement only
holds in one direction. Of these, (14) states that an assignment {Bp = 〈 〉}v := k either
ends with Bp = 〈(v, k)〉, or the buffer Bp is flushed as part of the assignment semantics.
13
Law 3 Suppose C and D are commands, p is a process, v is a variable and k a value.
Then each of the following holds:
{Bp = 〈 〉} v • C Φ vwp {Bp = 〈 〉} v • C Φ (9)
v • C ; D
Φ
vwp v • C ; v • D Φ provided C.p ∨ D.pV ¬ε (10)
{Bp = 〈 〉} v • v := k Φ wp v • id ; v⇐\ k ; id {Bp = 〈 〉} (11)
v • C ; v • D vwp v • C ; D (12)
{Bp = 〈 〉} v • [[b]] vwp {Bp = 〈 〉} v • [b] (13)
{Bp = 〈 〉}v := k wp (id ; v← [ k ; id){Bp = 〈(v, k)〉} ; (ε ∨ (Φ ; id)) (14)
5.2 Local Buffer Refinement
In this section, we develop a novel method for proving refinement for TSO architec-
tures, where buffer effects are taken into account. The method allows one to prove that
a (concrete) process with buffer effects has the same behaviour as an (abstract) pro-
cess without the buffer. In essence, one may pretend that the effect of local changes
have already been flushed at the abstract level. This is allowed because in the context
of the overall behaviour of a program, it makes no difference whether a variable local
to process p has a pending write in p’s local buffer, or in shared memory. This essen-
tially removes the potential non-determinism that arises from reasoning about flushes
for local variables. We let
LCover.P.(σ, pi) “= (σ ⊕⋃p:P LVar.pC cover.(σ.Bp), pi)
LBuffer.P.s “= λ t:Z • LCover.P.(s.t)
Here cover.(σ.Bp) generates a set of pairs from σ.Bp and LVar.pCcover.(σ.Bp) restricts
these to the local variables of p. Within LCover.P.(σ, pi) such a localised mapping is
generated for each p ∈ P, and then, σ is overwritten by this mapping.
Definition 2. If C and A are commands, and P is a non-empty set of processes, we say
C buffer refines A with respect to P, (denoted CcP A) iff for any interval ∆ and stream
s, (C.P).∆.s⇒ (A.P).∆.(LBuffer.P.s).
Thus, whenever C holds for a set of processes P over interval ∆ in stream s, command
A must hold for P in ∆ for the masked stream LBuffer.P.s. In particular, this implies
that A behaves as if the local buffer effects of the concrete command have already been
applied. We write cp for c{p}.
Clearly, reasoning at the level of Definition 2 is infeasible. Instead, we explore some
higher level properties for the programming constructs we use. In general, buffer re-
finement is neither reflexive nor transitive8, e.g., if C always reads from main memory
regardless of the state of the buffer, then reflexivity does not hold. However, cP may
be combined with standard refinement as follows, which holds trivially after expanding
the definitions of wP and cP.
Theorem 2. If C′ wP C, C cP A and A wP A′ then C′ cP A′.
8 Fully exploring the properties of cP lie outside the scope of this paper.
14
We immediately have the following laws, which help simplify verification of assign-
ments and buffer flushes.
Law 4 If v ∈ Var, p ∈ Proc, k ∈ Val, P “= Bp 6= 〈 〉 ∧ (Bp.0  1) ∈ LVar.p, and
Q “= Bp 6= 〈 〉 ∧ (Bp.0  1) 6∈ LVar.p, then
{v ∈ LVar.p}v←[ k cp v⇐\ k (15)
{v 6∈ LVar.p}v←[ k cp id (16) {P}Φ cp id (17){Q}Φ cp (Bp.0  1)⇐\ (Bp.0  2) (18)
By conditions (15) and (16), adding a pending write (v, k) to p’s buffer is a local buffer
refinement of a global update to v whenever v ∈ LVar.p, and of id, otherwise. On the
other hand, (17) states that flushing a local variable of process p from p’s buffer has
the same effect as executing id abstractly (because the effect of the variable has already
occurred), and condition (18) states that flushing a global variable has the same effect
as executing the corresponding write in memory at the abstract level.
The laws below allow local buffer refinement to be decomposed, and the pre/post
assertions under local buffer refinement to strengthened.
Law 5 If C cP A, C1 cP A1 and C2 cP A2 for a set of processes P, and b, c are state
predicates such that b⇒ c and (vars.b ∪ vars.c) ∩ (⋃p:P LVar.p) = ∅, then
C1 ; C2 cP A1 ; A2 (19)
Cω cP Aω (20)
C1 ∧ C2 cP A1 ∧ A2 (21)
C1 ∨ C2 cP A1 ∨ A2 (22)
{b}C cP {c}A (23)
C{b} cP A{c} (24)
Law 6 (Parallel composition) If C′.P “= ∧q:P C.q, A′.P “= ∧q:P A.q and Ccq A holds
for each for each q ∈ P, then C′ cP A′.
5.3 Application: Spinlock Example
We now apply our rules to the running spinlock example (modelled by Prog), and prove
it to be a refinement of the abstract program (modelled by Spec). Our notion of refine-
ment is local buffer refinement (Definition 2), i.e., we show
ProgcP Spec (25)
for an arbitrarily chosen non-empty set of processes P. Various refinement rules may
be introduced to generalise the theory as needed. By Theorem 2 and Proposition 1, (25)
immediately reduces to a proof of Prog′cP Spec. Using (23), followed by Law 6, proof
of Prog′ cP Spec again reduces to the following for some arbitrarily chosen p ∈ P.
id ;
Ä
{Bp = 〈 〉}Exec′ ; postΦ.x.(Φ˜
+
)
äω
cp (id ; AExec)ω ; id (26)
We use id vwp id ; id to split the first id on the right hand side of (26), then Law 1,
to obtain id ; (id ; AExec.p ; id)ω . Hence, using (19) followed by (20), the proof of (26)
reduces again to a proof of
{Bp = 〈 〉}Exec′ ; postΦ.x.(Φ˜
+
)cp id ; AExec ; id (27)
Then using (22), condition (2) and the fact that ; distributes over ∨, we are left with a
number of proof obligations for each disjunct of Exec′. Of these, we present the most
complex: the proof obligation for Acq′1.
15
{Bp = 〈 〉}Acq′1 ; id ; Rel ; postΦ.x.(Φ˜
+
) cp id ; AAcq ; id ; ARel ; id (28)
It is trivial to show {Bp = 〈 〉} x • [[¬x]] Φ ; [[¬x]]
∗ ; [[x]] ; Acq wp id ; {Bp = 〈 〉}Acq,
and using id vwp id ; id the proof of (28) reduces as follows, where the initial part of
Acq′1 is refined to id.
{Bp = 〈 〉}Acq ; id ; Rel ; postΦ.x.(Φ˜
+
) cp id ; AAcq ; id ; ARel ; id (29)
We now focus on the initial part of the left hand side, where we distinguish between
ALoop “= x • [[¬x]]
Φ
; [[¬x]]ω ; [[x]] and ADo “= x • [[x]] ; (x := 0)
Φ
.
{Bp = 〈 〉}Acq
vwp {Bp = 〈 〉}ADo ∨ {Bp = 〈 〉}ALoopω+ ; ADo defn of Acq, then gω ≡ ε ∨ gω+
Using {Bp = 〈 〉}ALoopω+ cp id{Bp = 〈 〉} (proof elided) condition (29) reduces to
{Bp = 〈 〉}ADo ; id ; Rel ; postΦ.x.(Φ˜
+
) cp AAcq ; id ; ARel ; id (30)
Again focusing on the initial part of the left hand side, we have:
{Bp = 〈 〉} ADo
vwp {Bp = 〈 〉} x • [[x]] ; (x := 0) Φ defn of ADo and by (9)
vwp {Bp = 〈 〉} x • [[x]] ; x • (x := 0) Φ by (10)
wp {Bp = 〈 〉} x • [[x]] ; {Bp = 〈 〉} x • (x := 0) Φ by logic
wp {Bp = 〈 〉} x • [[x]] ; x • id ; (x⇐ \ 0) ; id {Bp = 〈 〉} by (11)
vwp {Bp = 〈 〉} x • [[x]] ; id ; x • (x⇐ \ 0) ; id {Bp = 〈 〉} using (12) twice
wp {Bp = 〈 〉} x • [[x]] ; x • (x⇐ \ 0) ; id {Bp = 〈 〉} [[x]] ; id vwp [[x]]
wp {Bp = 〈 〉} x • [x] ; x • (x⇐ \ 0) ; id {Bp = 〈 〉} by (13)
vwp x • [x] ; (x⇐\ 0) ; id {Bp = 〈 〉} by (12) and weakening
It is straightforward to show x • [x](x⇐ \ 0) ; id cp AAcq ; id, which expanding Rel
and using (14) further reduces (30) to a proof of:
(id ; x← [ 1 ; id){Bp = 〈(x, 1)〉} ; (ε ∨ Φ ; id) ; postΦ.x.(Φ˜+) cp id ; ARel ; id
Using id vwp id ; id the right hand side transforms to id ; id ; ARel ; id. Then, using
(16) and Theorem 2, we have id ; x ← [ 1 ; id cp id, and hence, using (19), we obtain
{Bp = 〈(x, 1)〉}(postΦ.x.(Φ˜
+
) ∨ (Φ ; id ; postΦ.x.(Φ˜
+
)) cp id ; ARel ; id. Finally,
because Φ ; id ; postΦ.x.(Φ˜
+
) wp postΦ.x.(Φ˜
+
), we are left with
{Bp = 〈(x, 1)〉}postΦ.x.(Φ˜
+
) cp id ; ARel ; id (31)
Because there is exactly one item in the buffer, and because postΦ.x.(Φ˜
+
) guarantees
that a flush occurs, the left hand side reduces to {Bp = 〈(x, 1)〉}id ; Φ ; id. Finally, using
the fact that Bp is a local variable of p and is not modified by id followed by (18), our
proof is completed.
Notable in our verification is that concurrency aspects hardly need to be consid-
ered. The fact that the locking mechanisms guarantee safety is understood at the level
of Spec. The refinement proof only requires consideration of the local buffer. This is in
16
contrast to existing methods which require global conditions to be checked, e.g., [20]
checks race conditions, [7, 16, 25] check linearizability, and [9] checks reduction. We
conjecture that more complex examples will indeed require consideration of the be-
haviour of other processes. To this end, we will integrate compositional methods such
as rely/guarantee into our framework [13, 11].
6 Conclusions
Existing approaches to relaxed memory verification (e.g., [6, 21, 22, 9, 20, 7, 16]) focus
on a low-level language (i.e., individual reads/writes), and hence, to perform a verifica-
tion, programs need to be observed and understood in their (verbose) low-level repre-
sentation. We are not aware of any approach that tries to lift memory model effects to a
higher level of abstraction; our work here is hence unique in this sense [15].
The basic idea is to think of a statements as being executed over an interval of time
or an execution window. Such execution windows can overlap if programs are executed
concurrently and overlapping windows correspond to program instructions that can be
executed in any order, representing the effect of concurrent executions and reorderings
due to TSO. Overlapping execution windows may also interfere with each other and
fixing the outcome of an execution within a window can influence the outcome within
another. This paper presents several advances to the semantics in [15] by simplifying
the interval logic, and program semantics, as well as developing buffer-specific rules
for expression evaluation and refinement. The underlying rules are algebraic in nature,
and hence, we provide generic transformation laws, which are in turn applied to our
running example.
A difficulty when reasoning about TSO memory is that in addition to the normal
non-determinism caused by concurrency, an additional level of non-determinism is in-
troduced via use of local buffers. The methods in this paper allow one to reduce the
non-determinism that must be considered when reasoning about local updates. In par-
ticular, we develop a notion of local buffer refinement, which allows one to proceed
as if pending writes to local variables have already occurred in the abstract level. In
particular, this means that local writes do not appear out of order. A similar observa-
tion is used for local transformation in the context of compilers for weak memory [8],
however, these do not consider higher-level synchronisation instructions such as lock.
As part of future work, we aim to study the connections between local buffer refine-
ment, and existing notions such as triangular race freedom [20] and reduction [9].
References
1. S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE
Computer, 29(12):66–76, 1996.
2. J. Alglave. A formal hierarchy of weak memory models. Formal Methods in System Design,
41(2):178–210, 2012.
3. R. J. R. Back and J. von Wright. Reasoning algebraically about loops. Acta Informatica,
36(4):295–334, July 1999.
4. D. Bovet and M. Cesati. Understanding the Linux Kernel. OReilly, 3rd edition, 2005.
17
5. J. Boyland. Checking interference with fractional permissions. In R. Cousot, editor, SAS,
volume 2694 of LNCS, pages 55–72. Springer, 2003.
6. S. Burckhardt, R. Alur, and M. M. K. Martin. Checkfence: Checking consistency of concur-
rent data types on relaxed memory models. In PLDI, pages 12–21, 2007.
7. S. Burckhardt, A. Gotsman, M. Musuvathi, and H. Yang. Concurrent library correctness on
the TSO memory model. In H. Seidl, editor, ESOP, volume 7211 of LNCS, pages 87–107.
Springer, 2012.
8. S. Burckhardt, M. Musuvathi, and V. Singh. Verifying local transformations on relaxed
memory models. In R. Gupta, editor, CC, volume 6011 of LNCS, pages 104–123. Springer,
2010.
9. E. Cohen and B. Schirmer. From total store order to sequential consistency: A practical
reduction theorem. In M. Kaufmann and L. C. Paulson, editors, ITP, volume 6172 of LNCS,
pages 403–418. Springer, 2010.
10. B. Dongol and J. Derrick. Data refinement for true concurrency. In J. Derrick, E. A. Boiten,
and S. Reeves, editors, Refine, volume 115 of EPTCS, pages 15–35, 2013.
11. B. Dongol, J. Derrick, and I. J. Hayes. Fractional permissions and non-deterministic evalua-
tors in interval temporal logic. ECEASST, 53, 2012.
12. B. Dongol and I. J. Hayes. Deriving real-time action systems in a sampling logic. Sci.
Comput. Program., 78(11):2047–2063, 2013.
13. B. Dongol, I. J. Hayes, and J. Derrick. Deriving real-time action systems with multiple time
bands using algebraic reasoning. Sci. of Comp. Prog., 85, Part B(0):137 – 165, 2014.
14. B. Dongol, I. J. Hayes, L. Meinicke, and K. Solin. Towards an algebra for real-time programs.
In W. Kahl and T.G. Griffin, editors, RAMiCS, volume 7560 of LNCS, pages 50–65, 2012.
15. B. Dongol, O. Travkin, J. Derrick, and H. Wehrheim. A high-level semantics for program
execution under total store order memory. In Z. Liu, J. Woodcock, and H. Zhu, editors,
ICTAC, volume 8049 of LNCS, pages 177–194. Springer, 2013.
16. A. Gotsman, M. Musuvathi, and H. Yang. Show no weakness: Sequentially consistent spec-
ifications of TSO libraries. In M. K. Aguilera, editor, DISC, volume 7611 of LNCS, pages
31–45. Springer, 2012.
17. I. J. Hayes, A. Burns, B. Dongol, and C. B. Jones. Comparing degrees of non-determinism
in expression evaluation. Comput. J., 56(6):741–755, 2013.
18. L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess
programs. IEEE Trans. Computers, 28(9):690–691, 1979.
19. B. C. Moszkowski. A complete axiomatization of Interval Temporal Logic with infinite time.
In LICS, pages 241–252, 2000.
20. S. Owens. Reasoning about the implementation of concurrency abstractions on x86-TSO. In
T. D’Hondt, editor, ECOOP, volume 6183 of LNCS, pages 478–503. Springer, 2010.
21. S. Park and D. L. Dill. An executable specification, analyzer and verifier for RMO (relaxed
memory order). In SPAA, pages 34–41, 1995.
22. P. Sewell, S. Sarkar, S. Owens, F. Z. Nardelli, and M. O. Myreen. x86-TSO: a rigorous and
usable programmer’s model for x86 multiprocessors. Commun. ACM, 53(7):89–97, 2010.
23. D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache
Coherence. Synthesis Lectures on Computer Architecture. Morgan & Claypool, 2011.
24. J.M. Spivey. The Z Notation: A Reference Manual. Prentice Hall, 1992.
25. O. Travkin, A. Mütze, and H. Wehrheim. SPIN as a linearizability checker under weak
memory models. In V. Bertacco and A. Legay, editors, HVC’13, volume 8244 of LNCS,
pages 311–326. Springer, 2013.
18
