Inferring Fences in a Concurrent Program Using SC proof of Correctness by Narayan, Chinmay et al.
arXiv:1304.2936v1 [cs.LO] 10 Apr 2013
Inferring Fences in a Concurrent Program Using
SC proof of Correctness
Chinmay Narayan, Shibashis Guha, S.Arun-Kumar
Indian Institute of Technology Delhi
Abstract. Most proof systems for concurrent programs [Jon83] [O’H07] [VP07] assume the underlying memory model to be sequentially consistent (SC), an assumption that does not hold for modern multicore processors. For performance reasons, these processors implement relaxed memory models. As a result of this relaxation, a program proved correct on the SC memory model might execute incorrectly. To ensure its correctness under relaxation, fence instructions are inserted in the code. In this paper we show that the SC proof of correctness of an algorithm, carried out in the proof system of [Sou84], identifies per-thread instruction orderings sufficient for this SC proof. Further, to correctly execute this algorithm on an underlying relaxed memory model, it is sufficient to respect only these orderings by inserting fence instructions.
1 Introduction
The memory model of a processor defines the order in which memory opera-
tions, issued by a single processor, appear to execute from the point of view of
the memory subsystem. In a broad sense, it determines whether any two memory
access instructions issued by a processor within a single thread can be reordered.
The Sequentially Consistent (SC) memory model is the simplest but also the most restrictive of all: it does not allow any reordering of instructions within a thread. Modern multicore processors, in order to hide latencies, implement relaxed memory models that allow instructions within a thread to be reordered as long as they operate on different memory addresses. For example, Total Store Order (TSO), the memory model of x86 processors, allows a write instruction to be reordered with later reads, provided they operate on different memory locations.
As a result of these relaxations, a program may exhibit more behaviours than under SC, and some of these extra behaviours may violate a property that holds under SC. Peterson’s mutual exclusion algorithm, shown in Figure 1, illustrates this. The algorithm satisfies the mutual exclusion property under the SC model, but executing it on an Intel x86 processor may violate this property. This can happen if the read of flag2 at label 3 is reordered before the instructions at labels 1 and 2. With such a reordering Proc1 can enter the critical section, and Proc2, with or without the same reordering, can enter its critical section at the same time. This example shows how the memory model affects the correctness of an algorithm.
Proc1                                   Proc2
1. flag1:=true                          6. flag2:=true
2. turn:=2                              7. turn:=1
3. while(flag2 && turn = 2) do od       8. while(flag1 && turn = 1) do od
   Critical Section                        Critical Section
4. flag1:=0                             9. flag2:=0
Fig. 1. Peterson’s mutual exclusion algorithm
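To make the reordering concrete, the following Python sketch exhaustively explores Peterson’s entry protocol from Figure 1 on a TSO-style machine in which each thread has a single FIFO store buffer, reads first consult the thread’s own buffer, and buffered writes reach memory at arbitrary times. Thread 0 plays Proc1 and thread 1 plays Proc2. The operation encoding and all function names are hypothetical, the exit code (labels 4 and 9) is omitted, and the search only asks whether both threads can be in their critical sections at once. Putting a fence after the two writes of each thread is an assumption made here for illustration, not a placement derived by the paper’s method.

```python
from collections import deque

FLAG = (0, 1)  # memory slots holding flag1 and flag2 (threads 0 and 1)
TURN = 2       # memory slot holding turn

def program(i, use_fence):
    """Entry protocol of Peterson's algorithm for thread i as abstract ops."""
    j = 1 - i
    ops = [("write", FLAG[i], 1),   # flag_i := true
           ("write", TURN, j)]      # turn := j (the other thread)
    if use_fence:
        ops.append(("fence",))      # wait until the store buffer is empty
    ops += [("read", FLAG[j], 0),   # register 0 := flag_j
            ("read", TURN, 1),      # register 1 := turn
            ("spin", j),            # retry the reads while flag_j && turn = j
            ("cs",)]                # inside the critical section: no more steps
    return ops

def successors(state, progs):
    pcs, regs, bufs, mem = state
    for t in (0, 1):
        if bufs[t]:  # flush: the oldest buffered write of t reaches memory
            (var, val), rest = bufs[t][0], bufs[t][1:]
            m, b = list(mem), list(bufs)
            m[var], b[t] = val, rest
            yield (pcs, regs, tuple(b), tuple(m))
        op = progs[t][pcs[t]]
        p, r, b = list(pcs), [list(x) for x in regs], list(bufs)
        if op[0] == "write":            # buffer the write instead of storing it
            b[t] = bufs[t] + ((op[1], op[2]),)
            p[t] += 1
        elif op[0] == "fence":
            if bufs[t]:
                continue                # blocked until every write is flushed
            p[t] += 1
        elif op[0] == "read":           # own newest buffered value, else memory
            own = [v for (x, v) in bufs[t] if x == op[1]]
            r[t][op[2]] = own[-1] if own else mem[op[1]]
            p[t] += 1
        elif op[0] == "spin":
            first_read = next(k for k, o in enumerate(progs[t]) if o[0] == "read")
            p[t] = first_read if (r[t][0] and r[t][1] == op[1]) else p[t] + 1
        else:                           # "cs": the thread takes no more steps
            continue
        yield (tuple(p), tuple(tuple(x) for x in r), tuple(b), mem)

def both_in_critical_section(use_fence):
    """Breadth-first search for a state where both threads are at "cs"."""
    progs = [program(0, use_fence), program(1, use_fence)]
    cs = tuple(len(p) - 1 for p in progs)
    init = ((0, 0), ((0, 0), (0, 0)), ((), ()), (0, 0, 0))
    seen, todo = {init}, deque([init])
    while todo:
        state = todo.popleft()
        if state[0] == cs:
            return True
        for nxt in successors(state, progs):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False

print(both_in_critical_section(False))  # True: reads overtake buffered writes
print(both_in_critical_section(True))   # False: the fences restore exclusion
```

Without fences, each thread can read the other’s flag from memory while its own writes are still buffered, which is exactly the reordering described above.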
This problem arises because instruction reordering generates an extra execution that is not possible under the SC memory model. The problem also does not arise for a data-race-free program if the underlying relaxed memory model satisfies the data-race-freedom (DRF) property. It can be shown that any data-race-free program executed on a memory model satisfying DRF exhibits exactly the same set of behaviours as under the SC memory model. A program is data-race-free if, in every execution of the program, every pair of conflicting accesses (w-w or w-r) to the same variable by two different threads is separated by an unlock instruction. Therefore a data-race-free program that is correct under SC is guaranteed to execute correctly on a memory model that satisfies the DRF property. However, the algorithms we are interested in (lock-free and wait-free algorithms, lock implementations) do not use lock/unlock and hence do not fit this definition of data-race freedom. Therefore, their correctness under a relaxed memory model does not follow from their correctness under the SC memory model.
Another way to avoid extra executions is to prevent certain reorderings by inserting a special instruction, fence. A fence placed between two instructions in a thread prohibits their reordering. Executing a fully fenced program (a fence after every instruction in each thread) on a relaxed memory model generates exactly the same set of executions as under SC, so correctness under SC implies correctness under the relaxed memory model. However, this trivial placement strategy negates the performance benefits of relaxed memory models. An ideal placement of fence instructions should therefore preserve only those program orders that are sufficient to prove the properties of interest.
In this paper, we deal with parallel programs that satisfy some property under SC but are not data-race-free. The main contribution of this paper is to show that the proof of correctness of such a program under SC is useful in identifying per-thread instruction orderings sufficient to make the program correct on a relaxed memory model. Further, the locations of fence instructions can be inferred from these orderings and the underlying memory model. We are not aware of any other attempt to use the SC proof of correctness for fence inference in a concurrent program.
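The paper infers fence locations from the orderings identified by the SC proof and from the memory model; no concrete procedure is given at this point. As a hypothetical sketch for a straight-line thread, assume each required ordering is a pair (a, b) of instruction indices with a < b, that PSO may violate it only when instruction a is a write and b accesses a different variable, and that a fence anywhere between a and b enforces it. Choosing the fewest fences is then an interval-stabbing problem, which the classic greedy algorithm (sort by right end, place each fence as late as possible) solves exactly. The instruction encoding and the orderings in the example are made up for illustration.

```python
def relaxed_under_pso(first, second):
    """Can PSO let `first` (earlier in program order) take effect after `second`?"""
    kind1, var1 = first
    _, var2 = second
    # A write may be delayed past later reads and writes to a different variable;
    # reads are never reordered with later instructions.
    return kind1 == "W" and var1 != var2


def place_fences(prog, required):
    """Gap indices k (a fence goes right after instruction k) enforcing all orderings."""
    # Ordering (a, b) needs a fence in one of the gaps a, a+1, ..., b-1,
    # but only if PSO could otherwise reorder instructions a and b.
    intervals = [(a, b - 1) for a, b in required
                 if relaxed_under_pso(prog[a], prog[b])]
    fences, last = [], -1
    # Greedy interval stabbing: by right end, put each fence as late as possible.
    for lo, hi in sorted(intervals, key=lambda iv: iv[1]):
        if lo > last:          # this ordering is not yet separated by a fence
            last = hi
            fences.append(hi)
    return fences


# Hypothetical thread W x; W y; R z; R x with hypothetical required orderings.
prog = [("W", "x"), ("W", "y"), ("R", "z"), ("R", "x")]
print(place_fences(prog, [(0, 2), (1, 2), (0, 3)]))  # [1]
```

Here one fence after instruction 1 separates both writes from the read of z, while (0, 3) needs no fence under this assumption because a thread’s read of x is served from its own buffer for x.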
2 Related Work
All existing approaches for inferring fences for relaxed memory models can be divided into two main categories: model-checking-based approaches [HR07] [KVY11] [LW11] [AAC+12b] and proof-system-based approaches [Rid10] [Bor12] [BD12]. Model-checking-based approaches first explore the state space of a program under a given memory model using a buffer-based operational semantics and check the reachability of erroneous states. Once a reachable erroneous state is identified, the path leading to it is ruled out by inserting fences at appropriate places. [ABBM12] and [ABBM10] showed that the state reachability problem for the TSO and PSO memory models is decidable for finite-state programs, and that it becomes undecidable as soon as the read after write reordering is added to the memory model. By its nature, this approach is better suited to programs with finite data domains. We are aware of only one line of work [AAC+12a] that combines predicate abstraction with model checking to verify and correct programs over infinite data domains, such as Lamport’s bakery algorithm.
The second approach is to use a memory-model-specific proof system, as done in [Rid10], which presents a separation-logic-based proof system for the TSO memory model and shows that Simpson’s 4 slot algorithm does not satisfy the interference freedom property. Recently, [Bor12] and [BD12] studied proof systems derived from separation logic for verifying concurrent data structures on the POWER/ARM memory models. These memory models are more complex than TSO or PSO, mainly because of non-atomic writes.
Unlike the approaches of [Rid10], [Bor12] and [BD12], we do not propose a memory-model-specific proof system; we only look at the proof of correctness under the SC memory model and use it to infer the orderings sufficient for correctness. Unlike model-checking approaches, proof-system-based approaches can cover more than just reachability. This is evident in the example of Simpson’s 4 slot algorithm, where, apart from interference freedom, we also prove that the sequence of values observed by the reader is consistent, i.e. it forms a stuttering sequence of the values written by the writer. We are not aware of any line of work that has handled fence inference for Simpson’s 4 slot algorithm under the PSO memory model with respect to both the interference freedom and the consistent reads properties.
3 Language: Syntax and Semantics
Figure 2 shows the syntax of a simple parallel programming language without support for dynamic thread creation. The operator ‖ composes a finite number of programs in parallel. We explicitly distinguish local variables, ranged over by ℓvar and accessed only within a thread, from shared variables, ranged over by shvar and accessed by more than one thread. A local expression, ranged over by ℓexp, is constructed using only local variables, values and operators. A shared expression, ranged over by shexp, is constructed from exactly one shared variable, optionally combined with a local expression through an operator. The assignment command only allows
Program = Cmd; end
Cmd = shvar:=ℓexp | ℓvar:=ℓexp | ℓvar:=shexp | Cmd1;Cmd2 | Program1‖ · · · ‖Programn
| if ℓexp then Cmd1 else Cmd2 | while ℓexp do Cmd od | skip | fence
shexp = shvar | shvar ⊕ ℓexp
ℓexp = ℓvar | ℓexp⊕ ℓexp | val
val = N | B | T
⊕ = + | − | × | ÷
Fig. 2. A simple concurrent programming language with barrier support
assigning a local expression to a local variable, a shared expression to a local variable, or a local expression to a shared variable. An assignment of a shared expression to a shared variable can be broken down into assignments of these forms. Because of this restriction, every assignment command either reads at most one shared variable or writes at most one shared variable, but not both. This guarantees at most one memory load or store event per assignment command, which helps in reasoning about the memory model and the associated events.
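The restriction on assignments can be illustrated with a small Python sketch of the expression syntax that counts shared-variable occurrences on the right-hand side and reports which rule (GW, GR or LRW) an allowed assignment corresponds to. The class and function names are hypothetical; for simplicity the sketch accepts the shared variable anywhere in the expression, whereas the grammar of Figure 2 puts it first (shvar ⊕ ℓexp).

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Local:      # ℓvar
    name: str

@dataclass(frozen=True)
class Shared:     # shvar
    name: str

@dataclass(frozen=True)
class Val:        # constant value
    value: int

@dataclass(frozen=True)
class BinOp:      # e1 ⊕ e2
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Local, Shared, Val, BinOp]

def shared_occurrences(e: Expr) -> int:
    """Number of shared-variable occurrences, i.e. memory loads needed by e."""
    if isinstance(e, Shared):
        return 1
    if isinstance(e, BinOp):
        return shared_occurrences(e.left) + shared_occurrences(e.right)
    return 0

def classify(target, rhs: Expr) -> str:
    """Name of the rule (GW, GR or LRW) that an allowed `target := rhs` uses."""
    loads = shared_occurrences(rhs)
    if isinstance(target, Shared) and loads == 0:
        return "GW"    # shvar := ℓexp: one store, no load
    if isinstance(target, Local) and loads == 0:
        return "LRW"   # ℓvar := ℓexp: no memory access
    if isinstance(target, Local) and loads == 1:
        return "GR"    # ℓvar := shexp: one load, no store
    raise ValueError("not allowed: split it through a fresh local variable")

# x := y + 1 is rejected; t := y + 1; x := t is an allowed decomposition.
print(classify(Local("t"), BinOp("+", Shared("y"), Val(1))))  # GR
print(classify(Shared("x"), Local("t")))                      # GW
```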
3.1 SC Semantics
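Figure 3 shows the semantics of this language under SC. A state or configuration under SC is of the form (G, Tstore), where
$$G \stackrel{\text{def}}{=} \mathit{shvar} \to \mathit{val}, \qquad \mathit{Tstore} \stackrel{\text{def}}{=} \mathit{Tid} \to L \times \mathit{Program}, \qquad L \stackrel{\text{def}}{=} \ell\mathit{var} \to \mathit{val}$$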
The local store L and the global store G map local and shared variables, respectively, to their values. Each thread is represented by a unique thread id. The thread store maps a thread id to its local store and the program to be executed next. In our operational semantics we use the set representation of this function, i.e. Tstore is a set of tuples of the form (t, L, C). The function $[\![-]\!]_{G,L} \in \mathit{exp} \to G \to L \to \mathit{val}$, defined by
$$[\![\mathit{shvar}]\!]_{G,L} = G(\mathit{shvar}), \qquad [\![\ell\mathit{var}]\!]_{G,L} = L(\ell\mathit{var}), \qquad [\![e_1 \oplus e_2]\!]_{G,L} = [\![e_1]\!]_{G,L} \oplus [\![e_2]\!]_{G,L},$$
takes an expression and evaluates it to a value based on the mapping of variables in the global store G and the thread-local store L. This function is then used to define the reference semantics in Figure 3. For any function F : A → B, a′ ∈ A and b′ ∈ B, we write F[a′ := b′] for the function that is the same as F everywhere except at a′, where it evaluates to b′. The semantic rules for the conditional and looping constructs (ITE-T, ITE-F, WHL-T, WHL-F), local variable read/write (LRW) and global variable read/write (GR, GW) are straightforward.
In the parallel composition command, the parent thread stops its execution and
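$$\frac{C = \mathit{shvar}{:=}\ell\mathit{exp};C' \quad [\![\ell\mathit{exp}]\!]_{G,L} = v \quad G' = G[\mathit{shvar} := v]}{(G, \{(t,L,C)\} \cup T) \to (G', \{(t,L,C')\} \cup T)}\ \text{(GW)}$$
$$\frac{C = \ell\mathit{var}{:=}\mathit{shexp};C' \quad [\![\mathit{shexp}]\!]_{G,L} = v \quad L' = L[\ell\mathit{var} := v]}{(G, \{(t,L,C)\} \cup T) \to (G, \{(t,L',C')\} \cup T)}\ \text{(GR)}$$
$$\frac{C = \ell\mathit{var}{:=}\ell\mathit{exp};C' \quad [\![\ell\mathit{exp}]\!]_{G,L} = v \quad L' = L[\ell\mathit{var} := v]}{(G, \{(t,L,C)\} \cup T) \to (G, \{(t,L',C')\} \cup T)}\ \text{(LRW)}$$
$$\frac{C = \mathbf{if}\ \ell\mathit{exp}\ \mathbf{then}\ \mathit{Cmd}_1\ \mathbf{else}\ \mathit{Cmd}_2;C' \quad [\![\ell\mathit{exp}]\!]_{G,L} = \mathit{true}}{(G, \{(t,L,C)\} \cup T) \to (G, \{(t,L,\mathit{Cmd}_1;C')\} \cup T)}\ \text{(ITE-T)}$$
$$\frac{C = \mathbf{if}\ \ell\mathit{exp}\ \mathbf{then}\ \mathit{Cmd}_1\ \mathbf{else}\ \mathit{Cmd}_2;C' \quad [\![\ell\mathit{exp}]\!]_{G,L} = \mathit{false}}{(G, \{(t,L,C)\} \cup T) \to (G, \{(t,L,\mathit{Cmd}_2;C')\} \cup T)}\ \text{(ITE-F)}$$
$$\frac{C = \mathbf{while}\ \ell\mathit{exp}\ \mathbf{do}\ \mathit{Cmd}\ \mathbf{od};C' \quad [\![\ell\mathit{exp}]\!]_{G,L} = \mathit{true}}{(G, \{(t,L,C)\} \cup T) \to (G, \{(t,L,\mathit{Cmd};C)\} \cup T)}\ \text{(WHL-T)}$$
$$\frac{C = \mathbf{while}\ \ell\mathit{exp}\ \mathbf{do}\ \mathit{Cmd}\ \mathbf{od};C' \quad [\![\ell\mathit{exp}]\!]_{G,L} = \mathit{false}}{(G, \{(t,L,C)\} \cup T) \to (G, \{(t,L,C')\} \cup T)}\ \text{(WHL-F)}$$
$$\frac{C \in \{\mathbf{skip};C',\ \mathbf{fence};C'\}}{(G, \{(t,L,C)\} \cup T) \to (G, \{(t,L,C')\} \cup T)}\ \text{(SKP-SYC)}$$
$$\frac{C = \mathit{Program}_1 \parallel \cdots \parallel \mathit{Program}_n;C' \quad t_1,\dots,t_n \notin \mathrm{dom}(T) \cup \{t\} \quad T' = T \cup \{(t_1,\emptyset,\mathit{Program}_1),\dots,(t_n,\emptyset,\mathit{Program}_n)\}}{(G, \{(t,L,C)\} \cup T) \to (G, \{(t,L,\mathbf{join}(t_1);\cdots;\mathbf{join}(t_n);C')\} \cup T')}\ \text{(PARCOMP)}$$
$$\frac{C = \mathbf{join}(\ell\mathit{var});C' \quad [\![\ell\mathit{var}]\!]_{G,L} = t' \in \mathit{Tid} \quad t' \notin \mathrm{dom}(T) \cup \{t\}}{(G, \{(t,L,C)\} \cup T) \to (G, \{(t,L,C')\} \cup T)}\ \text{(Join)}$$
$$\frac{C = \mathbf{end}}{(G, \{(t,L,C)\} \cup T) \to (G, T)}\ \text{(END)}$$
Fig. 3. Reference semantics (SC) for the programming language of Figure 2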
waits for all its child threads to finish their execution. This is achieved by adding a join() command to the parent thread for each spawned thread. The rule for the join(tid) command ensures that this command can execute only after the thread corresponding to tid has finished its execution, so the parent thread waits for its children to complete before continuing. Under SC semantics, fence and skip behave as no-ops. Following the syntax of Figure 2, one process Proci, or thread, executes only one program Programi. Therefore we sometimes use Proci and Programi interchangeably to mean the same thing.
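The SC rules for shared reads and writes amount to interleaving whole instructions over a single global store. A minimal Python sketch, assuming straight-line threads that use only the GW and GR forms (control flow, join, skip and fence are left out), enumerates every interleaving and collects the final local stores. The instruction encoding is hypothetical, with an ℓexp represented as a Python function of the thread’s local store. On the classic store-buffering test, SC never yields r1 = r2 = 0.

```python
def sc_outcomes(threads, init):
    """Final local stores of every SC interleaving of straight-line threads.

    An instruction is ("GW", shvar, f) for shvar := f(L) or ("GR", lvar, shvar)
    for lvar := shvar; local variable names must differ between threads."""
    results = set()

    def run(G, locs, pcs):
        finished = True
        for t, prog in enumerate(threads):
            if pcs[t] == len(prog):
                continue
            finished = False
            kind, a, b = prog[pcs[t]]
            G2, L2 = dict(G), [dict(L) for L in locs]
            if kind == "GW":       # rule GW: G' = G[shvar := [[lexp]]_{G,L}]
                G2[a] = b(locs[t])
            else:                  # rule GR: L' = L[lvar := G(shvar)]
                L2[t][a] = G[b]
            run(G2, L2, pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:])
        if finished:               # every thread has reached its end
            results.add(tuple(sorted(kv for L in locs for kv in L.items())))

    run(dict(init), [{} for _ in threads], (0,) * len(threads))
    return results

# Store buffering: x := 1; r1 := y  ||  y := 1; r2 := x
sb = [[("GW", "x", lambda L: 1), ("GR", "r1", "y")],
      [("GW", "y", lambda L: 1), ("GR", "r2", "x")]]
print(sorted(sc_outcomes(sb, {"x": 0, "y": 0})))
# [(('r1', 0), ('r2', 1)), (('r1', 1), ('r2', 0)), (('r1', 1), ('r2', 1))]
```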
4 Logic
In the proof system of [Sou84], every process Proci executing a program Programi has a history variable $h_{Proc_i}$ that captures the interaction of this process with shared variables in terms of the values read from and written to them. $h_{Proc_i}$ is a sequence of elements of the form (?shvar, ph) or (!shvar, v). (?shvar, ph) is appended to the sequence when Proci reads a value from the shared variable shvar; similarly, (!shvar, v) is appended when Proci writes a value v to shvar. Local reasoning about a program Programi produces a triple of the form {Pi} Programi {Qi}, where Pi and Qi are assertions on the local state of the process as well as on the history $h_{Proc_i}$. The rest of this section describes the axioms of this proof system, explaining local reasoning in terms of the history variable and the parallel composition rule in terms of the Compat predicate.
Axioms. In our programming language all program constructs except GR and GW operate on local expressions and therefore do not read from or write to shared variables. The proof rules for these constructs are therefore the same as Hoare’s axioms in the sequential setting. In the following proof rules, the notation P[Q/Q′][R/R′] denotes the simultaneous substitution of Q′ for Q and R′ for R in P. The operator "." appends an element to a sequence and ε is the empty sequence. Given a sequence σ, |σ| is its length and σ[i] is its ith element. The proof rule for assignment to a shared variable is as follows:
$$\{P[h_{Proc_i}/h_{Proc_i}.(!\mathit{shvar}, \ell\mathit{exp})]\}\ \mathit{shvar}{:=}\ell\mathit{exp}\ \{P\} \qquad \text{(GWrite)}$$
As a result of this write, the element (!shvar, ℓexp) is appended to the history. The proof rule for reading a shared variable, ℓvar:=shexp_j, is
$$\{\forall ph.\ P[h_{Proc_i}/h_{Proc_i}.(?\mathit{shvar}_j, ph)][\ell\mathit{var}/\mathit{shexp}_j[\mathit{shvar}_j/ph]]\}\ \ell\mathit{var}{:=}\mathit{shexp}_j\ \{P\} \qquad \text{(GRead)}$$
This rule requires the value of the shared variable shvar_j in order to evaluate the expression shexp_j, but when reasoning locally this value is not known beforehand. Therefore, instead of the actual value of shvar_j, a placeholder variable ph is used, and this fact is recorded in $h_{Proc_i}$ by appending (?shvar_j, ph) to it. The expression shexp_j is then evaluated with ph in place of shvar_j before being assigned to ℓvar in the assertion. Here ph is universally quantified over all possible values.
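$$\frac{\{P_i \wedge h_{Proc_i} = \varepsilon\}\ \mathit{Program}_i\ \{Q_i\}, \quad i = 1,\dots,n}{\{P_1 \wedge \cdots \wedge P_n \wedge \mathit{shvar}_1 = v_1 \wedge \cdots \wedge \mathit{shvar}_m = v_m\}\ \mathit{Program}_1 \parallel \cdots \parallel \mathit{Program}_n\ \{Q_1 \wedge \cdots \wedge Q_n \wedge \mathit{Compat}(v_1,\dots,v_m,h_{Proc_1},\dots,h_{Proc_n})\}}\ \text{(ParComp)}$$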
In parallel composition, each process is analyzed in isolation, with the initial value of its history variable $h_{Proc_i}$ set to empty and some precondition Pi on its local state. This gives a post-condition Qi for each process Proci, which also contains assertions on $h_{Proc_i}$. The precondition of the parallel composition rule is the conjunction of the individual processes’ preconditions and the initial values of the shared variables shvar1 to shvarm. The post-condition of this rule is the conjunction of the individual processes’ post-conditions together with the predicate Compat, defined in Figure 4. $\mathit{Merge}(h_{Proc_1},\dots,h_{Proc_n})$ denotes the set of all interleavings of the histories $h_{Proc_1}$ to $h_{Proc_n}$ that preserve the order of elements within each history.
$$\mathit{Compat}(v_1,\dots,v_m,h_{Proc_1},\dots,h_{Proc_n}) \stackrel{\text{def}}{=} \begin{cases} \mathit{shvar}_1 = v_1 \wedge \cdots \wedge \mathit{shvar}_m = v_m & \text{if } h_{Proc_i} = \varepsilon,\ i = 1,\dots,n\\ \exists h \in \mathit{Merge}(h_{Proc_1},\dots,h_{Proc_n}).\ \forall j \le m.\ \big(\mathit{shvar}_j = f_j(v_j, h)\ \wedge\ \forall k \le |h|.\ (h[k] = (?\mathit{shvar}_j, ph_j) \Rightarrow ph_j = f_j(v_j, h[1:k-1]))\big) & \text{otherwise} \end{cases}$$
Fig. 4. Non-recursive definition of Compat predicate
The function f_j(v, h) returns the last value written to the variable shvar_j in history h, or v if h contains no such write. Essentially, Compat generates a set of equality predicates, one for each merged history. The first conjunct in the second case of the definition states that the final value of every shared variable shvar_j is the last value written to shvar_j in that merged history. The second conjunct relates the placeholder value ph_j of each read of shvar_j to the latest value written to shvar_j before this element in the merged history. This suffices to characterize all compatible merged histories, and therefore Compat plays the central role in the proofs of §6 and of Appendix ??.
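Compat can be explored mechanically for small histories. The Python sketch below is an illustration under assumptions, not part of the proof system: history elements are tuples ("!", shvar, v) and ("?", shvar, ph), a written value may be a function of placeholders bound earlier (as in (!token1, ph1 + 1) of §6.1), and instead of checking given placeholder values the sketch computes, for every merged history h, the values Compat forces: each placeholder gets f_j(v_j, h[1:k−1]) and each variable ends with f_j(v_j, h).

```python
def merges(hs):
    """Merge(h_1, ..., h_n): all interleavings preserving each history's order."""
    if all(len(h) == 0 for h in hs):
        yield ()
        return
    for i, h in enumerate(hs):
        if h:
            rest = hs[:i] + (h[1:],) + hs[i + 1:]
            for m in merges(rest):
                yield (h[0],) + m

def compat_models(init, hs):
    """(final shared values, placeholder values) for each merged history.

    Elements are ("!", shvar, v) for a write, where v may be a function of the
    placeholders bound so far, or ("?", shvar, ph) for a read with placeholder ph."""
    models = set()
    for h in merges(tuple(hs)):
        store, ph = dict(init), {}
        for kind, var, x in h:
            if kind == "?":
                ph[x] = store[var]     # ph_j = f_j(v_j, h[1:k-1])
            else:
                store[var] = x(ph) if callable(x) else x
        # store now maps every shvar_j to f_j(v_j, h)
        models.add((tuple(sorted(store.items())), tuple(sorted(ph.items()))))
    return models

h1 = (("!", "x", 1), ("?", "y", "a"))
h2 = (("!", "y", 1), ("?", "x", "b"))
reads = {phs for _, phs in compat_models({"x": 0, "y": 0}, [h1, h2])}
print(sorted(reads))
# [(('a', 0), ('b', 1)), (('a', 1), ('b', 0)), (('a', 1), ('b', 1))]
```

No merged history lets both reads see the initial value 0, which is the SC behaviour of the store-buffering pattern.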
Individual process histories contain placeholder variables for every read. In Figure 5 we define, rather informally, a set of predicates over these histories in order to represent them succinctly in our proofs. Given a sequence σ, σ⇃type denotes the subsequence of σ consisting of only the elements of that type. The predicate None! holds if the history does not contain any write to any shared variable. None!shvar and None?shvar hold if the history does not contain any write to, respectively any read from, the variable shvar. [P]∗ and [P]+ capture the regular structure of a history sequence by abstracting the placeholder variables. We also admit [P]+ ⇒ [P]∗ as a relaxation on history sequences, which is used in the consequence rule. We acknowledge that using these predicates without giving them a proper semantics is not fully justified; we do so solely to keep our proofs manageable, and leave a more formal treatment of these predicates to future work.
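The predicates of Figure 5 behave like regular-expression operators over histories. The following Python sketch encodes them as functions on sequences of ("!", shvar, v) and ("?", shvar, ph) tuples. The names are hypothetical, placeholders are kept as strings, and [P]∗ requires each unrolled prefix to be non-empty so that the recursion terminates; under a least-fixed-point reading the empty-prefix case of Figure 5 adds nothing, which is an assumption about the intended semantics. The example checks a loop-iteration shape like the one used for Proc1 in §6.1.

```python
def elem(p):
    """A one-element history whose element satisfies p (e.g. Q1 or P1)."""
    return lambda h: len(h) == 1 and p(h[0])

def seq(P, Q):
    """[P;Q]: h = h0.h1 with P(h0) and Q(h1)."""
    return lambda h: any(P(h[:k]) and Q(h[k:]) for k in range(len(h) + 1))

def star(P):
    """[P]*: h is empty, or a non-empty P-prefix followed by [P]* of the rest."""
    def holds(h):
        return len(h) == 0 or any(P(h[:k]) and holds(h[k:])
                                  for k in range(1, len(h) + 1))
    return holds

def plus(P):
    """[P]+: a non-empty P-prefix followed by [P]* of the rest."""
    return lambda h: any(P(h[:k]) and star(P)(h[k:]) for k in range(1, len(h) + 1))

def none_write(var=None):
    """None! (var is None) or None!shvar: no such write occurs in h."""
    return lambda h: not any(e[0] == "!" and var in (None, e[1]) for e in h)

def none_read(var):
    """None?shvar: no read of var occurs in h."""
    return lambda h: not any(e[0] == "?" and e[1] == var for e in h)

# One loop iteration of Proc1 in the shape [Q1;R1;T1;U1;None!;P1].
Q1 = elem(lambda e: e == ("!", "taking1", True))
R1 = elem(lambda e: e[:2] == ("?", "token2"))
T1 = elem(lambda e: e[:2] == ("!", "token1"))
U1 = elem(lambda e: e == ("!", "taking1", False))
P1 = elem(lambda e: e == ("!", "token1", 0))
iteration = seq(Q1, seq(R1, seq(T1, seq(U1, seq(none_write(), P1)))))

it = [("!", "taking1", True), ("?", "token2", "ph1"), ("!", "token1", "ph1+1"),
      ("!", "taking1", False), ("?", "taking2", "ph2"), ("?", "token1", "ph3"),
      ("?", "token2", "ph4"), ("!", "token1", 0)]
print(star(iteration)(it + it), plus(iteration)([]), star(iteration)(it[:-1]))
# True False False
```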
5 Relaxed Memory Model
For the rest of this paper we consider the PSO memory model, which allows a write instruction to be reordered with later reads and later writes that operate on different variables. To simulate the effect of PSO, every thread is equipped with one buffer per shared variable. These buffers store the values written by the thread to the corresponding variable in FIFO (queue) order. Buffering each write in a variable-specific queue simulates delaying the write past later reads and later writes to different memory locations. When a thread executes a read of a shared variable shvar, its local buffer for shvar is checked first. If this buffer is non-empty, the latest written value (at the tail) is returned; if it is empty, the value is read from the global state. This memory model also provides an explicit fence instruction that prevents the reordering of any two instructions of a thread that it separates. Operationally, this is achieved by flushing all the buffers of that thread: a fence can execute only when they are all empty.
$$\begin{aligned}
\mathit{None?}_{\mathit{shvar}} &\stackrel{\text{def}}{=} \lambda \mathit{elem}, h.\ h{\downharpoonright}_{(?\mathit{shvar},\_)} = \varepsilon\\
\mathit{None!}_{\mathit{shvar}} &\stackrel{\text{def}}{=} \lambda \mathit{elem}, h.\ h{\downharpoonright}_{(!\mathit{shvar},\_)} = \varepsilon\\
\mathit{None!} &\stackrel{\text{def}}{=} \lambda \mathit{elem}, h.\ h{\downharpoonright}_{(!\_,\_)} = \varepsilon\\
[P;Q] &\stackrel{\text{def}}{=} \lambda h.\ \exists h_0, h_1.\ h = h_0.h_1 \wedge P(h_0) \wedge Q(h_1)\\
[P]^* &\stackrel{\text{def}}{=} \lambda h.\ h = \varepsilon \vee \exists h_0, h_1.\ h = h_0.h_1 \wedge P(h_0) \wedge [P]^*(h_1)\\
[P]^+ &\stackrel{\text{def}}{=} \lambda h.\ \exists h_0, h_1.\ h_0 \neq \varepsilon \wedge h = h_0.h_1 \wedge P(h_0) \wedge [P]^*(h_1)
\end{aligned}$$
Fig. 5. Predicates on individual history variable hi
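For relaxed semantics we need some modifications to the notion of state. A state or configuration under this memory model is of the form (G, Tstore), where
$$G \stackrel{\text{def}}{=} \mathit{shvar} \to \mathit{val}, \qquad \mathit{Tstore} \stackrel{\text{def}}{=} \mathit{Tid} \to L \times \mathcal{B} \times \mathit{Program}, \qquad \mathcal{B} \stackrel{\text{def}}{=} \mathit{shvar} \to \sigma, \qquad L \stackrel{\text{def}}{=} \ell\mathit{var} \to \mathit{val}$$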
The only change with respect to the SC state is in the definition of the thread store: the range of this function now also contains a buffer store $\mathcal{B}$, which maps each shared variable to an ordered sequence of values, ranged over by σ. The function $[\![-]\!]^{rm}_{G,L,\mathcal{B}} \in \mathit{exp} \to G \to L \to \mathcal{B} \to \mathit{val}$, defined in Figure 7, evaluates an expression based on the values stored in the global store, the thread-local store and the buffer store. The relaxed semantics is defined in Figure 6. For the constructs not shown in Figure 6, the semantics is the same as the SC semantics of Figure 3, except that the evaluation function $[\![-]\!]_{G,L}$ is replaced by $[\![-]\!]^{rm}_{G,L,\mathcal{B}}$. In the relaxed semantics, a thread t executing a write to a shared variable shvar enqueues the value in t’s buffer for shvar. A read of a shared variable shvar returns the latest value in the thread’s buffer for shvar, if any; if this buffer is empty, the read returns the value from the global state (memory). The Flush operation non-deterministically dequeues the oldest element from some buffer of some thread and updates the global state accordingly. Further, in the parallel composition rule the requirement that the parent thread’s buffers be empty ensures that instructions before and after the parallel composition are ordered. The same holds for the end command.
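The buffer machinery of this section can be prototyped directly. The Python sketch below, whose instruction encoding is hypothetical, follows the rules of Figure 6 for straight-line threads: GW enqueues into a per-thread, per-variable buffer, GR reads the thread’s newest buffered value or else memory, Flush moves the oldest value of any buffer to memory, FENCE waits for all of the thread’s buffers to drain, and a thread finishes only with empty buffers. On a message-passing test it exhibits the write-write reordering that PSO permits (and that TSO, with a single FIFO buffer per thread, forbids); a fence between the two writes removes it.

```python
def pso_outcomes(threads, init):
    """Final local stores reachable under a PSO machine for straight-line threads.

    Instructions: ("GW", shvar, f) buffers f(L); ("GR", lvar, shvar) reads;
    ("FENCE", None, None) waits until all of the thread's buffers are empty."""
    results = set()

    def run(G, locs, bufs, pcs):
        moved = False
        for t, prog in enumerate(threads):
            for x, q in bufs[t].items():   # rule Flush: oldest value of one buffer
                if q:
                    G2, b2 = dict(G), [dict(bt) for bt in bufs]
                    G2[x], b2[t][x] = q[0], q[1:]
                    run(G2, locs, b2, pcs)
                    moved = True
            if pcs[t] == len(prog):
                continue
            kind, a, b = prog[pcs[t]]
            L2, b2 = [dict(L) for L in locs], [dict(bt) for bt in bufs]
            if kind == "GW":               # enqueue at the tail of t's buffer for a
                b2[t][a] = bufs[t].get(a, ()) + (b(locs[t]),)
            elif kind == "GR":             # newest own buffered value, else memory
                q = bufs[t].get(b, ())
                L2[t][a] = q[-1] if q else G[b]
            elif any(bufs[t].values()):    # FENCE blocked by a non-empty buffer
                continue
            run(G, L2, b2, pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:])
            moved = True
        if not moved:                      # all threads done, all buffers empty
            results.add(tuple(sorted(kv for L in locs for kv in L.items())))

    run(dict(init), [{} for _ in threads], [{} for _ in threads],
        (0,) * len(threads))
    return results

def one(L):
    return 1

writer = [("GW", "data", one), ("GW", "flag", one)]
reader = [("GR", "r1", "flag"), ("GR", "r2", "data")]
fenced = [writer[0], ("FENCE", None, None), writer[1]]
init = {"data": 0, "flag": 0}
print((("r1", 1), ("r2", 0)) in pso_outcomes([writer, reader], init))  # True
print((("r1", 1), ("r2", 0)) in pso_outcomes([fenced, reader], init))  # False
```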
6 Examples
We use our proof system to prove the correctness of Lamport’s bakery algorithm [Lam74] for two processes and of Simpson’s 4 slot algorithm [Sim90].
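$$\frac{C = \mathit{shvar}{:=}\ell\mathit{exp};C' \quad \mathcal{B}(\mathit{shvar}) = \sigma \quad [\![\ell\mathit{exp}]\!]^{rm}_{G,L,\mathcal{B}} = v \quad \mathcal{B}' = \mathcal{B}[\mathit{shvar} := \sigma.v]}{(G, \{(t,L,\mathcal{B},C)\} \cup T) \xrightarrow{M} (G, \{(t,L,\mathcal{B}',C')\} \cup T)}\ \text{(GW)}$$
$$\frac{C = \ell\mathit{var}{:=}\ell\mathit{exp};C' \quad [\![\ell\mathit{exp}]\!]^{rm}_{G,L,\mathcal{B}} = v \quad L' = L[\ell\mathit{var} := v]}{(G, \{(t,L,\mathcal{B},C)\} \cup T) \xrightarrow{M} (G, \{(t,L',\mathcal{B},C')\} \cup T)}\ \text{(LRW)}$$
$$\frac{C = \mathbf{join}(\ell\mathit{var});C' \quad [\![\ell\mathit{var}]\!]^{rm}_{G,L,\mathcal{B}} = t' \in \mathit{Tid} \quad t' \notin \mathrm{dom}(T) \cup \{t\}}{(G, \{(t,L,\mathcal{B},C)\} \cup T) \xrightarrow{M} (G, \{(t,L,\mathcal{B},C')\} \cup T)}\ \text{(Join)}$$
$$\frac{C = \ell\mathit{var}{:=}\mathit{shexp};C' \quad [\![\mathit{shexp}]\!]^{rm}_{G,L,\mathcal{B}} = v \quad L' = L[\ell\mathit{var} := v]}{(G, \{(t,L,\mathcal{B},C)\} \cup T) \xrightarrow{M} (G, \{(t,L',\mathcal{B},C')\} \cup T)}\ \text{(GR)}$$
$$\frac{C = \mathbf{end} \quad \forall \mathit{shvar} \in \mathrm{dom}(\mathcal{B}).\ \mathcal{B}(\mathit{shvar}) = \varepsilon}{(G, \{(t,L,\mathcal{B},C)\} \cup T) \xrightarrow{M} (G, T)}\ \text{(END)}$$
$$\frac{\exists \mathit{shvar} \in \mathrm{dom}(\mathcal{B}).\ \mathcal{B}(\mathit{shvar}) = v.\sigma \quad \mathcal{B}' = \mathcal{B}[\mathit{shvar} := \sigma] \quad G' = G[\mathit{shvar} := v]}{(G, \{(t,L,\mathcal{B},C)\} \cup T) \xrightarrow{M} (G', \{(t,L,\mathcal{B}',C)\} \cup T)}\ \text{(Flush)}$$
$$\frac{C = \mathbf{fence};C' \quad \forall \mathit{shvar} \in \mathrm{dom}(\mathcal{B}).\ \mathcal{B}(\mathit{shvar}) = \varepsilon}{(G, \{(t,L,\mathcal{B},C)\} \cup T) \xrightarrow{M} (G, \{(t,L,\mathcal{B},C')\} \cup T)}\ \text{(Fence)}$$
$$\frac{C = \mathit{Program}_1 \parallel \cdots \parallel \mathit{Program}_n;C' \quad \forall \mathit{shvar} \in \mathrm{dom}(\mathcal{B}).\ \mathcal{B}(\mathit{shvar}) = \varepsilon \quad t_1,\dots,t_n \notin \mathrm{dom}(T) \cup \{t\} \quad T' = T \cup \{(t_1,\emptyset,\mathit{Program}_1),\dots,(t_n,\emptyset,\mathit{Program}_n)\}}{(G, \{(t,L,\mathcal{B},C)\} \cup T) \xrightarrow{M} (G, \{(t,L,\mathcal{B},\mathbf{join}(t_1);\cdots;\mathbf{join}(t_n);C')\} \cup T')}\ \text{(PARCOMP)}$$
Fig. 6. Relaxed semantics for the programming language of Figure 2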
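$$[\![\mathit{shvar}]\!]^{rm}_{G,L,\mathcal{B}} \stackrel{\text{def}}{=} \begin{cases} v & \text{if } \mathcal{B}(\mathit{shvar}) = \sigma.v\\ G(\mathit{shvar}) & \text{otherwise} \end{cases}$$
$$[\![\ell\mathit{var}]\!]^{rm}_{G,L,\mathcal{B}} = L(\ell\mathit{var}), \qquad [\![e_1 \oplus e_2]\!]^{rm}_{G,L,\mathcal{B}} = [\![e_1]\!]^{rm}_{G,L,\mathcal{B}} \oplus [\![e_2]\!]^{rm}_{G,L,\mathcal{B}}$$
Fig. 7. $[\![-]\!]^{rm}_{G,L,\mathcal{B}}$ for relaxed semantics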
– Lamport’s algorithm has an unbounded data domain for the token variables, which makes its verification challenging for model-checking-based approaches. We prove that this algorithm satisfies the mutual exclusion property, i.e. it is never possible for both processes to be inside their critical sections simultaneously.
– Simpson’s 4 slot algorithm implements a wait-free and lock-free atomic register for a concurrent reader and writer. In the presence of interference, the reader and the writer use disjoint slots. We prove that this algorithm is safe in the sense that the concurrent reader and writer never use the same slot in the presence of interference. Further, we also prove that the values observed by successive reads are in the same order as written by the writer.
One important notation used in our proofs is the following. Let P and Q be assertions about individual elements of a history sequence; then P ≺ Q denotes the fact that the element satisfying the assertion P appears before the element satisfying the assertion Q in the history sequence.
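Read as a predicate on a concrete history, P ≺ Q can be sketched in Python as follows. History elements are tuples as in §4, placeholders are strings, and the sample history (the shape of the last iteration of Proc1 in §6.1) is illustrative. The sketch takes the existential reading, that some element satisfying P occurs before some element satisfying Q.

```python
def precedes(h, P, Q):
    """P ≺ Q on history h: an element satisfying P occurs before one satisfying Q."""
    # Using the first P-element is enough: any later P-element only shrinks the
    # suffix in which a Q-element has to be found.
    first_p = next((k for k, e in enumerate(h) if P(e)), None)
    return first_p is not None and any(Q(e) for e in h[first_p + 1:])

# Q1 R1 T1 U1 V1 W1 X1 of Proc1, with placeholders as strings.
h1 = [("!", "taking1", True), ("?", "token2", "ph1"), ("!", "token1", "ph1+1"),
      ("!", "taking1", False), ("?", "taking2", "ph2"), ("?", "token1", "ph3"),
      ("?", "token2", "ph4")]

def T1(e):
    return e[:2] == ("!", "token1")

def X1(e):
    return e == ("?", "token2", "ph4")

print(precedes(h1, T1, X1), precedes(h1, X1, T1))  # True False
```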
6.1 Example: Lamport’s Bakery Algorithm for Two Processes
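$$\mathit{Inv1} \stackrel{\text{def}}{=} \lambda h.\ \left(\begin{array}{l}
\mathbf{let}\ ph_1, ph_2, ph_3, ph_4, P_1, Q_1, R_1, T_1, U_1, V_1, W_1, X_1\ \mathbf{in}\\
P_1 = \lambda h.\,(!\mathit{token1}, 0)(h)\ \mathbf{in}\\
Q_1 = \lambda h.\,(!\mathit{taking1}, \mathit{true})(h)\ \mathbf{in}\\
R_1 = \lambda h.\,(?\mathit{token2}, ph_1)(h)\ \mathbf{in}\\
T_1 = \lambda h.\,(!\mathit{token1}, ph_1 + 1)(h)\ \mathbf{in}\\
U_1 = \lambda h.\,(!\mathit{taking1}, \mathit{false})(h)\ \mathbf{in}\\
V_1 = \lambda h.\,(?\mathit{taking2}, ph_2)(h)\ \mathbf{in}\\
W_1 = \lambda h.\,(?\mathit{token1}, ph_3)(h)\ \mathbf{in}\\
X_1 = \lambda h.\,(?\mathit{token2}, ph_4)(h)\ \mathbf{in}\\
[[Q_1;R_1;T_1;U_1;\mathit{None!};P_1]^*;Q_1;R_1;T_1;U_1;\mathit{None!};V_1;\mathit{None!};W_1;X_1](h)
\end{array}\right)$$
$$\mathit{Inv2} \stackrel{\text{def}}{=} \lambda h.\ \left(\begin{array}{l}
\mathbf{let}\ ph'_1, ph'_2, ph'_3, ph'_4, P_2, Q_2, R_2, T_2, U_2, V_2, W_2, X_2\ \mathbf{in}\\
P_2 = \lambda h.\,(!\mathit{token2}, 0)(h)\ \mathbf{in}\\
Q_2 = \lambda h.\,(!\mathit{taking2}, \mathit{true})(h)\ \mathbf{in}\\
R_2 = \lambda h.\,(?\mathit{token1}, ph'_1)(h)\ \mathbf{in}\\
T_2 = \lambda h.\,(!\mathit{token2}, ph'_1 + 1)(h)\ \mathbf{in}\\
U_2 = \lambda h.\,(!\mathit{taking2}, \mathit{false})(h)\ \mathbf{in}\\
V_2 = \lambda h.\,(?\mathit{taking1}, ph'_2)(h)\ \mathbf{in}\\
W_2 = \lambda h.\,(?\mathit{token1}, ph'_3)(h)\ \mathbf{in}\\
X_2 = \lambda h.\,(?\mathit{token2}, ph'_4)(h)\ \mathbf{in}\\
[[Q_2;R_2;T_2;U_2;\mathit{None!};P_2]^*;Q_2;R_2;T_2;U_2;\mathit{None!};V_2;\mathit{None!};W_2;X_2](h)
\end{array}\right)$$
Fig. 8. Invariants Inv1 and Inv2 of Figure 9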
In Lamport’s algorithm each process Proci operates on the shared variables tokeni and takingi, of type integer and boolean respectively. When a process Proci intends to enter the critical section, it first reads the value of the other process’s token, say v, and assigns v + 1 to its own token variable. ftok1 and ftok2 are local variables of Proc1 that hold the values of token1 and token2 respectively; similarly, stok1 and stok2 are the local variables of Proc2 for token1 and token2. (a, b) < (a′, b′) denotes the lexicographic less-than relation, i.e. (ftok2, 2) < (ftok1, 1) iff ftok2 < ftok1, and (stok1, 1) < (stok2, 2) iff stok1 ≤ stok2. The algorithm with inline assertions is shown in Figure 9, where Inv1 and Inv2 are as in Figure 8. The assertions are on the history hi and the local variables of the corresponding process, and the history hi is abstracted using the predicates of Figure 5. Note that V1, W1 and X1 do not appear explicitly inside the regular structure of the history abstracted by [Q1;R1;T1;U1;None!;P1]∗. They appear in the last iteration of the loop, and therefore the variables updated inside the loop, (ftok1, ftok2), are assigned the placeholders corresponding to these reads, i.e. ph2, ph3 and ph4. The same holds for the invariant of Proc2. Subsequently, these elements are abstracted to None! in order to establish the loop invariant.
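Python compares tuples lexicographically, so the two equivalences stated above for the loop guards can be spot-checked over a small range of token values; the range is arbitrary, and a bounded check is not a proof. On a tie, Proc1’s guard is false and Proc2’s guard is true, so Proc1 proceeds and Proc2 waits.

```python
R = range(6)  # a small, arbitrary range of token values

# (ftok2, 2) < (ftok1, 1)  iff  ftok2 < ftok1
assert all(((f2, 2) < (f1, 1)) == (f2 < f1) for f1 in R for f2 in R)
# (stok1, 1) < (stok2, 2)  iff  stok1 <= stok2
assert all(((s1, 1) < (s2, 2)) == (s1 <= s2) for s1 in R for s2 in R)

# Equal tokens: the second component breaks the tie in favour of Proc1.
print((3, 2) < (3, 1), (3, 1) < (3, 2))  # False True
```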
Mutual Exclusion proof. In order to prove the mutual exclusion property, we first state the assertion that captures this property:
$$\mathit{ME} \stackrel{\text{def}}{=} \big(\mathit{Inv1}''(h_1) \wedge \mathit{Inv2}''(h_2) \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2) = \mathit{false}\big)$$
token1:=0, token2:=0, taking1:=false, taking2:=false

Proc1
while(true) do
  {[Q1; R1; T1; U1; None!; P1]∗(h1)}
  1. taking1:=true
  {[[Q1; R1; T1; U1; None!; P1]∗; Q1](h1)}
  2. ftok2:=token2
  3. token1:=ftok2 + 1
  4. taking1:=false
  {[[Q1; R1; T1; U1; None!; P1]∗; Q1; R1; T1; U1](h1)}
  5. fdone2:=taking2
  6. while(fdone2 ≠ false) do
       fdone2:=taking2
     od
  7. ftok1:=token1
  8. ftok2:=token2
  {Inv1′(h1) ≝ Inv1(h1) ∧ ftok1 = ph3 ∧ ftok2 = ph4 ∧ ph2 = false}
  9. while(ftok2 ≠ 0 ∧ (ftok2, 2) < (ftok1, 1)) do
       ftok1:=token1
       ftok2:=token2
     od
  {Inv1′′(h1) ≝ Inv1′(h1) ∧ ¬(ftok2 ≠ 0 ∧ (ftok2, 2) < (ftok1, 1))}
  Critical Section
  10. token1:=0
  {[Q1; R1; T1; U1; None!; P1]+(h1)}
od

Proc2
while(true) do
  {[Q2; R2; T2; U2; None!; P2]∗(h2)}
  1′. taking2:=true
  {[[Q2; R2; T2; U2; None!; P2]∗; Q2](h2)}
  2′. stok1:=token1
  3′. token2:=stok1 + 1
  4′. taking2:=false
  {[[Q2; R2; T2; U2; None!; P2]∗; Q2; R2; T2; U2](h2)}
  5′. sdone1:=taking1
  6′. while(sdone1 ≠ false) do
        sdone1:=taking1
      od
  7′. stok1:=token1
  8′. stok2:=token2
  {Inv2′(h2) ≝ Inv2(h2) ∧ stok1 = ph′3 ∧ stok2 = ph′4 ∧ ph′2 = false}
  9′. while(stok1 ≠ 0 ∧ (stok1, 1) < (stok2, 2)) do
        stok1:=token1
        stok2:=token2
      od
  {Inv2′′(h2) ≝ Inv2′(h2) ∧ ¬(stok1 ≠ 0 ∧ (stok1, 1) < (stok2, 2))}
  Critical Section
  10′. token2:=0
  {[Q2; R2; T2; U2; None!; P2]+(h2)}
od
Fig. 9. Lamport’s Bakery Algorithm for Two Processes
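where Inv1′′(h1) and Inv2′′(h2) are the assertions inside the critical sections of Proc1 and Proc2 respectively. We prove it in two steps. First we show that
$$\mathit{Inv1}'(h_1) \wedge \mathit{Inv2}'(h_2) \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2) \Longrightarrow \mathit{Inter},$$
where
$$\mathit{Inter} \stackrel{\text{def}}{=} \mathit{Inter}_1 \wedge \mathit{Inter}_2 \wedge \mathit{Inter}_3,$$
$$\begin{aligned}
\mathit{Inter}_1 &\stackrel{\text{def}}{=} (\mathit{ftok2} = 0 \Longrightarrow \mathit{stok1} \neq 0 \wedge \mathit{stok1} < \mathit{stok2})\\
\mathit{Inter}_2 &\stackrel{\text{def}}{=} (\mathit{stok1} = 0 \Longrightarrow \mathit{ftok2} \neq 0 \wedge \mathit{ftok2} < \mathit{ftok1})\\
\mathit{Inter}_3 &\stackrel{\text{def}}{=} (\mathit{ftok2} \neq 0 \wedge \mathit{stok1} \neq 0 \Longrightarrow (\mathit{ftok1} \le \mathit{ftok2} \Longrightarrow \mathit{stok1} \le \mathit{stok2}))
\end{aligned}$$
and then it is easy to show that
$$\mathit{Inter} \wedge \neg(\mathit{ftok2} \neq 0 \wedge (\mathit{ftok2}, 2) < (\mathit{ftok1}, 1)) \wedge \neg(\mathit{stok1} \neq 0 \wedge (\mathit{stok1}, 1) < (\mathit{stok2}, 2)) = \mathit{false}.$$
Indeed, if ftok2 = 0, Inter1 gives stok1 ≠ 0 ∧ stok1 < stok2, which contradicts the exit condition of Proc2; if stok1 = 0, Inter2 gives ftok2 ≠ 0 ∧ ftok2 < ftok1, which contradicts the exit condition of Proc1; otherwise the exit condition of Proc1 gives ftok1 ≤ ftok2, so Inter3 yields stok1 ≤ stok2, again contradicting the exit condition of Proc2.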
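The final step, that Inter together with both loop exit conditions is unsatisfiable, can be spot-checked by brute force. The Python sketch below encodes Inter1 to Inter3 and the negated guards of loops 9 and 9′ as defined above and searches a bounded range of token values; the bound is an arbitrary assumption, so this is a sanity check rather than a proof over all naturals.

```python
from itertools import product

def inter(ftok1, ftok2, stok1, stok2):
    """Inter1 ∧ Inter2 ∧ Inter3, with each implication A ⇒ B written as ¬A ∨ B."""
    inter1 = ftok2 != 0 or (stok1 != 0 and stok1 < stok2)
    inter2 = stok1 != 0 or (ftok2 != 0 and ftok2 < ftok1)
    inter3 = not (ftok2 != 0 and stok1 != 0) or ftok1 > ftok2 or stok1 <= stok2
    return inter1 and inter2 and inter3

def proc1_exits(ftok1, ftok2):   # negation of the guard of loop 9
    return not (ftok2 != 0 and (ftok2, 2) < (ftok1, 1))

def proc2_exits(stok1, stok2):   # negation of the guard of loop 9′
    return not (stok1 != 0 and (stok1, 1) < (stok2, 2))

N = range(8)  # bounded token values (ftok1, ftok2, stok1, stok2)
witnesses = [v for v in product(N, repeat=4)
             if inter(*v) and proc1_exits(v[0], v[1]) and proc2_exits(v[2], v[3])]
print(witnesses)  # []
```

An empty list means no bounded assignment lets both processes leave their waiting loops while Inter holds.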
Proof of Inv1′(h1) ∧ Inv2′(h2) ∧ Compat(0, 0, false, false, h1, h2) ⟹ Inter1. We show this in two steps:
$$\mathit{Inv1}'(h_1) \wedge \mathit{Inv2}'(h_2) \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2) \wedge \mathit{ftok2} = 0 \Longrightarrow \mathit{stok1} \neq 0 \tag{1}$$
and
$$\mathit{Inv1}'(h_1) \wedge \mathit{Inv2}'(h_2) \wedge \mathit{Compat}(0, 0, \mathit{false}, \mathit{false}, h_1, h_2) \wedge \mathit{ftok2} = 0 \Longrightarrow \mathit{stok1} < \mathit{stok2} \tag{2}$$
Proof of (1). Assume that Inv1′(h1), Inv2′(h2), Compat(0, 0, false, false, h1, h2) and ftok2 = 0 hold. The only way to have ftok2 = ph4 = 0 is to place the last read of token2 in h1 (denoted by X1) in the merged history after P2 of, say, the (k−1)th iteration of [Q2;R2;T2;U2;None!;P2] and before T2 of the kth iteration. Inv1 implies T1 ≺ X1, hence the value of token1 visible at X1 is non-zero; this value is also visible at T2, because X1 is placed before T2 and the history hProc1 contains no write to token1 after X1. Further, since Inv2 implies T2 ≺ W2, the read of token1 at W2 in Proc2 also observes this non-zero value of token1 and assigns it to stok1. Therefore (1) holds. Of all the orderings of Proc1 and Proc2, only T1 ≺ X1 and T2 ≺ W2 are used in this part of the proof; hence these two are sufficient to prove (1).
Proof of (2). To have ftok2 = ph4 = 0, X1 must be placed in the merged history after P2 of, say, the (k−1)th iteration of [Q2;R2;T2;U2;None!;P2] and before T2 of the kth iteration. Further, Inv1′ implies ph2 = false, and therefore V1 (corresponding to the read of taking2 in the last iteration) must be placed after U2 of the (n−1)th iteration and before Q2 of the nth iteration, for some n ≤ k. Note that V1 cannot be placed after U2 of the kth iteration, because V1 ≺ X1 in the history of Proc1. Inv1 implies T1 ≺ V1, hence the value of token1 visible at V1 is non-zero, say v. As V1 is placed before Q2 of the nth iteration, where n ≤ k, the same value v of token1 is also visible at Q2 of the nth iteration. Inv2 implies Q2 ≺ R2, hence the same value v of token1 is also visible at R2 of all subsequent iterations. Inv2 implies R2 ≺ T2, and therefore T2 writes v + 1 to token2 in all subsequent iterations, resulting in stok2 = v + 1. Further, the value assigned to stok1 in Proc2 is still the last value written to token1 by Proc1, i.e. v. Therefore (2) holds. The only orderings used in this part of the proof are V1 ≺ X1, T1 ≺ V1, R2 ≺ T2 and Q2 ≺ R2. Hence, out of all the total orders in h1 and h2, these are sufficient to prove (2).
Proof of Inv1′ ∧ Inv2′ ∧ Compat(0, 0, false, false, h1, h2) ⟹ Inter2. This is symmetric to the previous proof and gives the following symmetric sufficient orderings: T2 ≺ X2, T1 ≺ W1, V2 ≺ X2, T2 ≺ V2, Q1 ≺ R1 and R1 ≺ T1.
Proof of Inv1′ ∧ Inv2′ ∧ Compat(0, 0, false, false, h1, h2) ⟹ Inter3. For ftok2 ≠ 0, X1 must read the non-zero value written to token2 at T2, and similarly, for stok1 ≠ 0, W2 must read the non-zero value written to token1 at T1. Let X1 read from T2 of iteration k2 and W2 read from T1 of iteration k1. The following cases arise, depending on whether k1 and k2 are the last iterations of Proc1 and Proc2.
k1 is not the last iteration and k2 is the last iteration: To have stok1 ≠ 0, W2 is placed in the merged history after T1 and before P1 of iteration k1. Inv2 implies T2 ≺ W2, and therefore the value written to token2 at T2 in the last iteration of Proc2 is some v ≠ 0, which flows to W2. Inv1 implies P1 ≺ X1, and because Proc1 does not write to token2, X1 also sees the value v ≠ 0 of token2. Inv1 further implies P1 ≺ R1 for R1 of every iteration after k1, and R1 ≺ T1, where R1 reads the value of token2 and T1 writes this value incremented by 1 back to token1. Therefore v + 1 is written to token1 at T1 in all iterations after k1. Further, T1 ≺ W1 implies that the same value v + 1 is also visible at W1. Therefore ftok1 = v + 1 > v = ftok2, so Inter3 holds vacuously.
k1 is the last iteration and k2 is not the last iteration: In order to have ftok2 ≠ 0, X1 is placed after T2 of the k2th iteration and before P2 of the k2th iteration in the merged history. Inv1 implies T1 ≺ X1, and therefore the value written to token1 at T1 in the last iteration of Proc1 is some v ≠ 0 that is visible at X1. Inv2 implies P2 ≺ R2 for R2 of any iteration greater than k2, and R2 ≺ T2, where R2 reads the value of token1 and T2 writes this value incremented by 1 back to token2. Therefore v + 1 is written to token2 at T2 in all iterations greater than k2. Inv2 also implies T2 ≺ X2, which gives stok2 = v + 1. Further, Inv2 implies P2 ≺ W2, hence the same value v of token1 that is visible at P2 is assigned to stok1. This results in stok1 < stok2, hence proved.
Both k1 and k2 are the last iterations: In this case X1 is placed after T2 of the last iteration, and because T2 ≺ X2, ftok2 = stok2. Similarly, W2 is placed after T1 of the last iteration, so from T1 ≺ W1 in Inv1 we have stok1 = ftok1. Therefore ftok1 ≤ ftok2 ⟹ stok1 ≤ stok2, hence proved.
Neither k1 nor k2 is the last iteration: We show that if ftok2 reads the value of token2 from the k2th iteration of Proc2, then it is not possible for stok1 to read the value of token1 from an iteration k1 of Proc1 that is not the last iteration. Let X1 be placed after T2 and before P2 of the k2th iteration. T1 ≺ X1 implies that the value of token1 visible at X1 is the one written at T1 in the last iteration of Proc1, which also becomes visible at P2 of the k2th iteration because of the placement of X1. Further, P2 ≺ W2 implies that W2 sees the same value of token1, written at T1 by the last iteration of Proc1, which is not k1. A similar argument applies to W2 as well. Hence no compatible merged history exists for this case.
Finally, we collect the sufficient orderings used to prove Inter3.
– T1 ≺ X1, T1 ≺ W1, P1 ≺ X1, P1 ≺ R1 and R1 ≺ T1 for Proc1.
– T2 ≺ X2, T2 ≺ W2, P2 ≺ W2, P2 ≺ R2 and R2 ≺ T2 for Proc2.
In P1 ≺ R1 and P2 ≺ R2, Ri comes from an iteration later than that of Pi.
Lamport’s bakery algorithm under PSO memory model. The PSO memory model allows a write instruction in a process to be reordered with later write and read instructions operating on different addresses. The following instruction orderings are sufficient to prove Inter, and hence mutual exclusion.
– T1 ≺ V1, T1 ≺ W1, T1 ≺ X1, P1 ≺ X1, V1 ≺ X1, Q1 ≺ R1, R1 ≺ T1
– T2 ≺ V2, T2 ≺ X2, T2 ≺ W2, P2 ≺ W2, V2 ≺ X2, Q2 ≺ R2, R2 ≺ T2
– P1 ≺ R1, where if P1 is from iteration k then R1 is from an iteration greater than k.
– P2 ≺ R2, where if P2 is from iteration k then R2 is from an iteration greater than k.
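To make the PSO relaxation concrete, the following Python sketch enumerates all outcomes of the classic message-passing litmus test under a simplified operational PSO model. The model is an illustrative assumption, not the formalisation used in this paper: each thread has one FIFO store buffer per variable, buffered stores reach memory in any order across variables, a read returns the thread's own newest buffered store if there is one, and a fence can only execute once all of the thread's buffers are empty. The variables x, y and the registers r1, r2 are hypothetical.

```python
from typing import List, Set, Tuple

VARS = ("x", "y")  # hypothetical shared variables, initially 0


def _put(tup: tuple, i: int, val) -> tuple:
    """Copy of `tup` with position i replaced by `val`."""
    return tup[:i] + (val,) + tup[i + 1:]


def pso_outcomes(threads: List[List[tuple]]) -> Set[Tuple[Tuple[str, int], ...]]:
    """Final register valuations of `threads` under the PSO sketch.

    Instructions: ("W", var, value), ("R", var, register) and ("F",).
    """
    n = len(threads)
    no_bufs = tuple(tuple(() for _ in VARS) for _ in range(n))
    start = ((0,) * n, no_bufs, (0,) * len(VARS), ())
    stack, seen, outcomes = [start], {start}, set()
    while stack:
        pcs, bufs, mem, regs = stack.pop()
        finished = all(pc == len(code) for pc, code in zip(pcs, threads))
        if finished and all(not q for qs in bufs for q in qs):
            outcomes.add(regs)
            continue
        succs = []
        for t in range(n):
            # Drain the oldest buffered store of any single variable.
            for v, q in enumerate(bufs[t]):
                if q:
                    succs.append((pcs, _put(bufs, t, _put(bufs[t], v, q[1:])),
                                  _put(mem, v, q[0]), regs))
            if pcs[t] == len(threads[t]):
                continue
            op, nxt = threads[t][pcs[t]], _put(pcs, t, pcs[t] + 1)
            if op[0] == "W":  # a store only enters the per-variable buffer
                v = VARS.index(op[1])
                q = bufs[t][v] + (op[2],)
                succs.append((nxt, _put(bufs, t, _put(bufs[t], v, q)), mem, regs))
            elif op[0] == "R":  # own newest buffered store, else memory
                v = VARS.index(op[1])
                val = bufs[t][v][-1] if bufs[t][v] else mem[v]
                succs.append((nxt, bufs, mem, tuple(sorted(regs + ((op[2], val),)))))
            elif all(not q for q in bufs[t]):  # fence: buffers must be empty
                succs.append((nxt, bufs, mem, regs))
        for s in succs:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return outcomes


# Message passing: x := 1; y := 1  ||  r1 := y; r2 := x
MP = [[("W", "x", 1), ("W", "y", 1)], [("R", "y", "r1"), ("R", "x", "r2")]]
MP_FENCED = [[("W", "x", 1), ("F",), ("W", "y", 1)],
             [("R", "y", "r1"), ("R", "x", "r2")]]

print(sorted(pso_outcomes(MP)))
print(sorted(pso_outcomes(MP_FENCED)))
```

Without the fence the outcome r1 = 1, r2 = 0 appears, because the store to y can drain before the store to x; with a fence between the two writes that outcome disappears, which is the write-to-write order a fence restores under PSO.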
The PSO memory model preserves the ordering of any two instructions that are data or control dependent. Therefore T1 ≺ W1, T2 ≺ X2, R1 ≺ T1 and R2 ≺ T2 are also satisfied by PSO. Further, PSO also preserves the order of two read instructions, i.e. V1 ≺ X1 and V2 ≺ X2. A fence between T1 and V1 also satisfies the ordering T1 ≺ X1, and symmetrically a fence between T2 and V2 also satisfies the ordering T2 ≺ W2. Further, a fence between Q1 and R1 also orders P1 of the kth iteration before R1 and X1 of the (k + 1)th and later iterations. Similarly, a fence between Q2 and R2 also orders P2 of the kth iteration before R2 and W2 of the (k + 1)th and later iterations. Therefore we need only two fence instructions per process, namely between Q1 and R1 and between T1 and V1 in Proc1, and symmetrically in Proc2.
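The coverage argument for Proc1 can be checked mechanically with a small Python sketch. It assumes (our reading, not stated as such above) that one iteration of Proc1 executes its labelled steps in the program order Q1, R1, T1, V1, W1, X1, P1, with P1 last, since the proof places P1 of an iteration after its T1; accordingly it treats P1 ≺ X1 as relating P1 to X1 of a later iteration. The pairs that PSO already respects are copied from the paragraph above rather than derived. A fence placed right after a label cuts a pair if it lies on the program-order path from the first step to the second, wrapping around the loop when the second step belongs to a later iteration.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Req = Tuple[str, str, bool]  # (x, y, y_is_in_a_later_iteration)

# Assumed program order of Proc1's labelled steps within one iteration.
ORDER = ["Q1", "R1", "T1", "V1", "W1", "X1", "P1"]

# Sufficient orderings for Proc1 listed in the text.
REQUIRED: List[Req] = [
    ("T1", "V1", False), ("T1", "W1", False), ("T1", "X1", False),
    ("P1", "X1", True), ("V1", "X1", False), ("Q1", "R1", False),
    ("R1", "T1", False), ("P1", "R1", True),
]

# Pairs the text says PSO already respects (dependences, read-read).
PSO_KEEPS = {("T1", "W1"), ("R1", "T1"), ("V1", "X1")}


def covers(fence_after: str, req: Req) -> bool:
    """Does a fence placed right after `fence_after` separate x from y?"""
    pos = {label: i for i, label in enumerate(ORDER)}
    f = pos[fence_after]
    x, y, later = req
    if not later:
        return pos[x] <= f < pos[y]
    # y lies in a later iteration: the path from x wraps around the loop.
    return f >= pos[x] or f < pos[y]


def fewest_fences() -> Optional[Tuple[str, ...]]:
    """Smallest set of fence positions cutting every pair PSO may reorder."""
    todo = [r for r in REQUIRED if (r[0], r[1]) not in PSO_KEEPS]
    for k in range(len(ORDER) + 1):
        for combo in combinations(ORDER, k):
            if all(any(covers(f, r) for f in combo) for r in todo):
                return combo
    return None


print(fewest_fences())  # ('Q1', 'T1'): fences after Q1 and after T1
```

Brute force over fence positions is adequate here because there are only seven positions. The result corresponds to the fences between Q1 and R1 and between T1 and V1; Proc2 is handled symmetrically.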
6.2 Example: Simpson’s 4 slot algorithm [Sim90]
This is a wait-free algorithm for concurrent access to a location by a single reader process and a single writer process. The location is simulated using a 2×2 array variable slot. If the reader is reading the data and the writer wants to write new data at the same time, then instead of waiting for the reader to complete, the writer writes to a different index of slot and indicates to the reader that it should read from this location in the subsequent read. Boolean variables reading and latest hold row indices (false as 0 and true as 1) into slot and into the array variable index. Variable index is a two-element boolean array such that for any s ∈ {true, false}, slot[s][index[s]] holds the latest data written by the writer in row slot[s]. Variables fwp and fwindex are local to the writer process, and variables srp and srindex are local to the reader process. The algorithm, with inline assertions on history and local variables, is shown in Figure 10. In order to simulate different invocations of the writer and the reader, each program is enclosed within a loop. We are interested in proving the following two properties of this algorithm.
Initially: reading := false, latest := false, slot[2][2] := {0}, index[2] := {false}

Inv1 ≝ λh. Let ph1, ph2, P1, Q1, R1, S1, T1 in
    P1 = λh. (?reading, ph1)(h) in
    Q1 = λh. (?index[¬ph1], ph2)(h) in
    R1 = λh. (!slot[¬ph1][¬ph2], _)(h) in
    S1 = λh. (!index[¬ph1], ¬ph2)(h) in
    T1 = λh. (!latest, ¬ph1)(h) in
    [P1; Q1; R1; S1; T1]∗(h) ∧ None!reading(h)

Writer
while (true) do
    Inv1(h1)
    fwp := ¬reading
    fwindex := ¬index[fwp]
    Inv1′ ≝ [[P1; Q1; R1; S1; T1]∗; (?reading, ph1); (?index[¬ph1], ph2)](h1)
            ∧ fwp = ¬ph1 ∧ fwindex = ¬ph2 ∧ None!reading(h1)
    Critical Section
    Write to slot[fwp][fwindex]
    index[fwp] := fwindex
    latest := fwp
    [P1; Q1; R1; S1; T1]+(h1)
od

Inv2 ≝ λh. Let ph′1, ph′2, P2, Q2, R2, S2 in
    P2 = λh. (?latest, ph′1)(h) in
    Q2 = λh. (!reading, ph′1)(h) in
    R2 = λh. (?index[ph′1], ph′2)(h) in
    S2 = λh. (?slot[ph′1][ph′2], _)(h) in
    [P2; Q2; R2; S2]∗(h) ∧ None!index(h) ∧ None!latest(h) ∧ None!slot(h)

Reader
while (true) do
    Inv2(h2)
    srp := latest
    reading := srp
    srindex := index[srp]
    Inv2′ ≝ [[P2; Q2; R2; S2]∗; (?latest, ph′1); (!reading, ph′1); (?index[ph′1], ph′2)](h2)
            ∧ srp = ph′1 ∧ srindex = ph′2 ∧ None!index(h2) ∧ None!latest(h2) ∧ None!slot(h2)
    Critical Section
    Read from slot[srp][srindex]
    [P2; Q2; R2; S2]+(h2)
od

Fig. 10. Simpson’s 4 slot algorithm
Interference Freedom. We want to show that at the entry points of the critical sections (of the writer and the reader) fwp ≠ srp, i.e. the reader and the writer use different rows of the slot variable, or fwp = srp ⇒ fwindex ≠ srindex, i.e. if both use the same row of the slot variable then they read from and write to different columns of that row. Therefore, we want to prove

$$\mathit{Inv}_1'(h_1) \wedge \mathit{Inv}_2'(h_2) \wedge \mathit{Compat}(\mathit{false}, \mathit{false}, \{0\}, \{\mathit{false}\}, h_1, h_2) \implies \mathit{fwp} \neq \mathit{srp} \vee (\mathit{fwp} = \mathit{srp} \Rightarrow \mathit{fwindex} \neq \mathit{srindex})$$

where Inv1′ and Inv2′ are the assertions just before the program points where the writer is going to write the data and the reader is going to read the data.
Proof. In any compatible merged history h of h1 and h2, the last write (!reading, ph′1) of h2 can be placed in two ways with respect to the last read (?reading, ph1) of h1.
– The last write (!reading, ph′1) of h2 is placed before the last read (?reading, ph1) of h1: As the writer does not write to reading, ph1 = ph′1, i.e. ¬fwp = srp (from the assertions fwp = ¬ph1 and srp = ph′1), and therefore fwp ≠ srp. Hence proved.
– The last write (!reading, ph′1) of h2 is placed after the last read (?reading, ph1) of h1: This, along with the assumption fwp = srp, implies ¬ph1 = ph′1. Therefore in Inv2′, (?index[ph′1], ph′2) is the same as (?index[¬ph1], ph′2). According to Inv1′, (?index[¬ph1], ph2) is in h1. We want to establish the relation between ph2 and ph′2. Inv1′ implies that S1 ≺ (?reading, ph1), hence there is no write to variable index beyond (?reading, ph1) in the merged history. Hence the value of index[¬ph1] is the same for both processes, i.e. ph2 = ph′2. From Inv1′ and Inv2′ we get fwindex = ¬ph2 and srindex = ph2, which implies fwindex ≠ srindex. Hence proved.
From the above proof we can see that the only ordering needed to prove this property is S1 ≺ P1, where P1 is from a later iteration than that of S1. Now we prove another property of interest and identify the program orderings used in that proof.
Consistent Reads. The second property of interest concerns the order of reads. It specifies that the values read by the reader form a stuttering sequence of the values written by the writer, i.e. if the writer writes the sequence 1, 2, 3, 4, 5 in subsequent invocations, then the reader cannot observe 1, 4, 3, 5 as its sequence of reads. It must observe a sequence that preserves the order of the writes, possibly interspersed with repetitions of the same data. First, we define some notation used in this section. Let MergedHist(Reader^R, Writer^W) be the set of all compatible merged histories consisting of R invocations of the reader process and W invocations of the writer process. Let Reader(n) and Writer(n) be the nth invocations of the reader and the writer respectively. Let D_k be the data written by the writer in the kth invocation, and let D(w) be the sequence D_1.D_2. · · · .D_w of values written by w consecutive iterations of the writer. For s ∈ {true, false}, ¬s denotes the negation of s. Let elem_{Reader_r} be an element elem ∈ {(?_, _), (!_, _)} of the merged history from the rth invocation of the reader. Similarly, elem_{Writer_w} denotes the same for the wth iteration of the writer.
Stuttering sequence. Let S_{r,w} be a sequence of length r constructed from the elements of D(w). S_{r,w} is a stuttering sequence of D(w) if, for any index i of S_{r,w} such that S_{r,w}[i − 1] = D_{k1} and S_{r,w}[i] = D_{k2}, we have k1 ≤ k2.
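As a small illustration of the definition, the following Python sketch represents S_{r,w} by the indices k of the data items D_k it contains and checks that consecutive indices never decrease. The function name and this index encoding are our own illustrative choices.

```python
from typing import Sequence


def is_stuttering(reads: Sequence[int], w: int) -> bool:
    """reads[i] == k means the i-th read returned D_k, with 1 <= k <= w.

    S_{r,w} is stuttering iff every element comes from D(w) and
    S[i-1] = D_k1, S[i] = D_k2 always implies k1 <= k2.
    """
    if any(not 1 <= k <= w for k in reads):
        return False
    return all(k1 <= k2 for k1, k2 in zip(reads, reads[1:]))


print(is_stuttering([1, 1, 2, 4, 4, 5], 5))  # True: order kept, repeats allowed
print(is_stuttering([1, 4, 3, 5], 5))        # False: D_4 is followed by D_3
```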
Some interesting properties of the reader and the writer processes. Only the writer writes to latest and reads from the variable reading. Further, only the reader writes to reading and reads from latest. Also, the value written to reading by the reader in any invocation is the same as the value it read from latest, and the value written to latest by the writer in any invocation is the negation of the value that it read from reading.
The following lemma characterizes the sequence of values written to reading in a segment of the merged history. This characterization is then used in the proof of Lemma 2.
Lemma 1. For all R, W, h ∈ MergedHist(Reader^R, Writer^W), r ≤ R, w ≤ W and s ∈ {true, false}: if Reader(r) reads the value of latest written by Writer(w) as s, then in the sequence of values written to variable reading between P1 of Writer(w) and P2 of Reader(r), s is never followed by ¬s.
[Figure 11 shows a merged history with three reads-from edges: A′′: (!latest, ¬s) of Writer(n′) is read by B′′: (?latest, ¬s) of Reader(r′); B′: (!reading, ¬s) of Reader(r′) is read by A′: (?reading, ¬s) of Writer(n + 1); and A: (!latest, s) of Writer(n + 1) is read by B: (?latest, s) of Reader(r).]
Fig. 11. Merged history for Lemma 1
Proof. We prove the lemma by induction on the iteration number of the writer process.
Base case, w = 0: If Reader(r) reads the initial value of latest (the 0th iteration of the writer process), say s, then all the invocations of the reader before Reader(r) also see this initial value s of latest, and therefore the sequence consists only of this value. Hence the base case satisfies the property.
Induction Hypothesis, w ≤ n: For all w ≤ n, if Reader(r) reads the value of latest as s, written by Writer(w), then the sequence of values written to reading between P1 of Writer(w) and P2 of Reader(r) satisfies the property.
Induction Step, w = n + 1: Consider the merged history of Figure 11, where Reader(r) reads the value of latest (denoted by B) from Writer(n + 1) (denoted by A). We want to prove the property for the sequence of values written to reading between A′ and B, where A′ denotes P1 of Writer(n + 1) in Figure 11. P1 ≺ T1 implies that A′ appears before A in the merged history. Let Writer(n + 1) read the value of reading from Reader(r′) (denoted by B′). P2 ≺ Q2 implies that P2 of Reader(r′) (denoted by B′′) appears before B′ in the merged history. Let Reader(r′) at P2 read the value of latest from Writer(n′) (denoted by A′′). Clearly n′ < n + 1, hence from the induction hypothesis we know that in the sequence of values written to reading between A′′ and B′′, ¬s is never followed by s. We use this knowledge to characterize the values written to latest between B′′ and A′. From the writer process we know that the value written to latest is the negation of the value read from the variable reading. Therefore, in the sequence of values written to latest between B′′ and A′, s is never followed by ¬s. Further, we also know that the reader process writes to reading the same value that it reads from the variable latest. Therefore, in the sequence of values written to reading between A′ and B, s is never followed by ¬s. Hence proved.
For any reader, we now relate its read of the variable latest to its read of the data. More formally:
Lemma 2. For all R, W, h ∈ MergedHist(Reader^R, Writer^W), r ≤ R, w ≤ W and s ∈ {true, false}: if Reader(r) reads the value of latest as s written by Writer(w) and reads the value of index[s] written by Writer(w′), then w′ ≥ w, Reader(r) reads the data D_{w′}, and all the invocations of the writer from w to w′ write the data in row s of the slot variable.
Proof. Let us assume that Reader(r) reads the value of latest as s written by Writer(w), which means that P2 of Reader(r) appears after T1 of Writer(w) and no write to latest appears between these two points in the merged history. As Writer(w) also writes to index[s], and S1 ≺ T1 and P2 ≺ R2, Reader(r) reads the value of index[s] either from Writer(w) or from some Writer(w′) such that w′ ≥ w. We prove the following two properties.
– Reader(r) reads the data written by Writer(w′) in slot[s][k], where k is the value written to index[s] by Writer(w′).
Proof: If Reader(r) reads index[s] as k ∈ {true, false} from Writer(w′), then R1 ≺ S1 implies that Writer(w′) has also written the data in slot[s][k]. We want to show that no subsequent invocation of the writer writes to slot[s][k] before S2 of Reader(r). The assumption that R2 of Reader(r) reads the value of index[s] from Writer(w′) implies that there is no write to index[s] between S1 of Writer(w′) and R2 of Reader(r). S1 ≺ P1 in Inv1′, where P1 is from a later iteration than that of S1, implies that the writer can be invoked at most once after iteration w′ and before R2 of Reader(r); if it were invoked more than once, a write to index[s] would appear after S1 of Writer(w′) and before R2 of Reader(r), contradicting our assumption. Further, if the next invocation after w′ happens, then it writes the data to column ¬k of row s of the slot variable, still satisfying the property. If any further invocation of the writer happens after R2 and before S2 of Reader(r), then because of Q2 ≺ R2 it observes the value of reading as s and therefore writes the data to row ¬s of the slot variable. Inv2′ implies that S2 of Reader(r) ≺ Q2 of subsequent invocations of the reader, and therefore no write to reading exists between R2 and S2 of Reader(r). Hence all invocations of the writer between R2 and S2 of Reader(r) observe the same value s of reading and write their data to row ¬s.
– All the invocations of the writer from w to w′ write the data in row s of the slot variable.
Proof: We prove an equivalent property: all invocations of the writer from w to w′ read the value of reading as ¬s. The equivalence follows from the fact that the writer writes in the row of the slot variable obtained by negating the value of reading. By assumption, Writer(w′) writes to index[s], and therefore it reads the value of reading as ¬s; the same holds for Writer(w). We need to show that this property holds for the remaining invocations between w and w′. We prove it by contradiction, assuming that there exists a w′′ such that w < w′′ < w′ and P1 of Writer(w′′) reads the value of reading as s. Clearly P1 of Writer(w′′) appears after P1 of Writer(w) and before P1 of Writer(w′). From the earlier argument we know that the value of reading visible at P1 of Writer(w) is ¬s. Hence P1 of Writer(w′′) cannot read the value of reading visible at P1 of Writer(w). Therefore, it must read the value s of reading from the sequence of values written to reading between P1 of Writer(w) and P2 of Reader(r). By Lemma 1, s is never followed by ¬s in the sequence of values written to reading between P1 of Writer(w) and P2 of Reader(r). This implies that P1 of Writer(w′) cannot read the value of reading as ¬s, a contradiction. Therefore all invocations of the writer from w to w′ read the value of reading as ¬s. Hence proved.
Now we use Lemma 2 to prove the following theorem.
Theorem 1 (Stuttering Sequence). For all n ∈ ℕ, Stutter(n) holds, where Stutter(n) ≝ for all w ∈ ℕ and h ∈ MergedHist(Reader^n, Writer^w), the sequence S_{n,w}, constructed from the elements of D(w), is a stuttering sequence of D(w).
Proof. Base case, n = 0: With no events of the reader in the merged history, the empty sequence trivially forms a stuttering sequence.
Induction Hypothesis: For all r ≤ n, Stutter(r) holds.
Induction Step, r = n + 1: Let Reader(n) read the value of latest from Writer(w_a). From Lemma 2, we know that Reader(n) reads the value D_{w′} such that w_a ≤ w′ ≤ w, and all the invocations of the writer from w_a to w′ write the data in the same row s of the slot variable. Further, the value of index[s] visible at R2 of Reader(n) is from S1 of Writer(w′). We know that the writes to latest from consecutive iterations are totally ordered, and the same holds for consecutive reads of latest. Therefore, Reader(n + 1) reads the value of latest from some Writer(w_b) such that w_b ≥ w_a. There are two possibilities, depending on whether this value is s or ¬s.
– Reader(n + 1) reads the value of latest as s from Writer(w_b) such that w_b ≥ w_a: From our assumption we know that Reader(n) sees the value of index[s] from Writer(w′); therefore the value of index[s] visible at R2 of Reader(n + 1) is from some w′′ such that w ≥ w′′ ≥ w′. By Lemma 2, the data read by Reader(n + 1) from slot[s][index[s]] is from Writer(w′′), and w′′ ≥ w′ implies that the resulting sequence S_{n+1,w} is still a stuttering sequence.
– Reader(n + 1) reads the value of latest as ¬s from Writer(w_b) such that w_b ≥ w_a: From Lemma 2, we know that all the invocations of the writer from w_a to w′ write the data in the same row s of slot, and because Writer(w_b) writes the data in row ¬s of slot, w_b > w′. By Lemma 2, Reader(n + 1) reads the data from some Writer(w′′) such that w′′ ≥ w_b. Combining these two facts we get w′′ > w′, where Reader(n) reads D_{w′} and Reader(n + 1) reads D_{w′′}. Therefore the resulting sequence S_{n+1,w} is still a stuttering sequence.
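Both SC properties can also be cross-checked by brute force for small bounds. The Python sketch below is not part of the proof: it explores every SC interleaving of a writer loop and a reader loop that follow Figure 10, treating each labelled access P1 to T1 and P2 to S2 as one atomic step and writing the invocation number as the datum (the initial slot contents count as datum 0). At every reachable state it checks (i) the interference-freedom condition when the writer is about to execute R1 and the reader is about to execute S2, and (ii) that the data read so far never decrease. The bounds n_writes and n_reads are arbitrary illustrative parameters.

```python
from typing import NamedTuple, Tuple


class State(NamedTuple):
    wpc: int                         # writer's next step: 0..4 = P1..T1
    wdone: int                       # completed writer invocations
    fwp: bool
    fwindex: bool
    rpc: int                         # reader's next step: 0..3 = P2..S2
    rdone: int                       # completed reader invocations
    srp: bool
    srindex: bool
    reading: bool
    latest: bool
    index: Tuple[bool, bool]
    slot: Tuple[int, int, int, int]  # slot[row][col] stored at 2*row+col
    seen: Tuple[int, ...]            # data values read so far


def writer_step(s: State) -> State:
    if s.wpc == 0:  # P1: fwp := ¬reading
        return s._replace(wpc=1, fwp=not s.reading)
    if s.wpc == 1:  # Q1: fwindex := ¬index[fwp]
        return s._replace(wpc=2, fwindex=not s.index[s.fwp])
    if s.wpc == 2:  # R1: write datum wdone+1 to slot[fwp][fwindex]
        i = 2 * s.fwp + s.fwindex
        return s._replace(wpc=3, slot=s.slot[:i] + (s.wdone + 1,) + s.slot[i + 1:])
    if s.wpc == 3:  # S1: index[fwp] := fwindex
        idx = list(s.index)
        idx[s.fwp] = s.fwindex
        return s._replace(wpc=4, index=tuple(idx))
    return s._replace(wpc=0, wdone=s.wdone + 1, latest=s.fwp)  # T1


def reader_step(s: State) -> State:
    if s.rpc == 0:  # P2: srp := latest
        return s._replace(rpc=1, srp=s.latest)
    if s.rpc == 1:  # Q2: reading := srp
        return s._replace(rpc=2, reading=s.srp)
    if s.rpc == 2:  # R2: srindex := index[srp]
        return s._replace(rpc=3, srindex=s.index[s.srp])
    i = 2 * s.srp + s.srindex  # S2: read slot[srp][srindex]
    return s._replace(rpc=0, rdone=s.rdone + 1, seen=s.seen + (s.slot[i],))


def check(n_writes: int, n_reads: int) -> Tuple[bool, bool]:
    """Return (interference_free, reads_stutter) over all SC interleavings."""
    start = State(0, 0, False, False, 0, 0, False, False,
                  False, False, (False, False), (0, 0, 0, 0), ())
    stack, visited = [start], {start}
    disjoint = stutter = True
    while stack:
        s = stack.pop()
        # Writer about to execute R1 while the reader is about to execute S2.
        if s.wpc == 2 and s.rpc == 3 and (s.fwp, s.fwindex) == (s.srp, s.srindex):
            disjoint = False
        if any(a > b for a, b in zip(s.seen, s.seen[1:])):
            stutter = False
        succs = []
        if s.wdone < n_writes:
            succs.append(writer_step(s))
        if s.rdone < n_reads:
            succs.append(reader_step(s))
        for t in succs:
            if t not in visited:
                visited.add(t)
                stack.append(t)
    return disjoint, stutter


print(check(3, 3))
```

A bounded search of this kind only covers the explored bounds; it complements, rather than replaces, the inductive argument above.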
Simpson’s 4 slot algorithm under PSO memory model. The following per-thread instruction orderings are used in the proofs of the interference freedom and the consistent reads properties.
– P1 ≺ T1 (from the proof of Lemma 1), S1 ≺ T1, R1 ≺ S1
– P2 ≺ Q2 (from the proof of Lemma 1), P2 ≺ R2, Q2 ≺ R2
– S1 ≺ P1, where P1 is from an iteration later than that of S1
– S2 ≺ Q2, where Q2 is from an iteration later than that of S2.
Out of all these orderings, P1 ≺ T1, P2 ≺ Q2 and P2 ≺ R2 are data-dependent orders, which are respected by the PSO memory model. S2 ≺ Q2, where Q2 is from an iteration later than that of S2, is also respected by the PSO memory model because S2 is a read and Q2 is a write instruction. Therefore, we have to enforce only R1 ≺ S1, S1 ≺ T1, Q2 ≺ R2 and S1 ≺ P1, where P1 is from an iteration later than that of S1. Following the semantics of fence, it is sufficient to put two fence instructions in the writer, one between R1 and S1 and another between S1 and T1; the latter also enforces S1 ≺ P1, since P1 of any later iteration follows T1 in program order. Further, we need one fence instruction between Q2 and R2 in the reader as well.
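The fence placement above can be re-derived mechanically. The Python sketch below encodes each labelled access of Figure 10 as a read or write of a variable, keeps a required pair only if a simplified PSO rule (assumed here) may reorder it, namely when the earlier access is a write and the two accesses touch different variables, and then searches for the fewest fence positions that cut every remaining pair. Array elements are treated at variable granularity, and a fence "after" a label sits between that label and the next one in program order. This syntactic rule is coarser than the data-dependence argument in the text but agrees with it on these pairs.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Access = Tuple[str, str]     # ("R" or "W", variable)
Req = Tuple[str, str, bool]  # (x, y, y_is_in_a_later_iteration)

# Dicts keep insertion order, which here is the program order of Fig. 10.
WRITER: Dict[str, Access] = {"P1": ("R", "reading"), "Q1": ("R", "index"),
                             "R1": ("W", "slot"), "S1": ("W", "index"),
                             "T1": ("W", "latest")}
READER: Dict[str, Access] = {"P2": ("R", "latest"), "Q2": ("W", "reading"),
                             "R2": ("R", "index"), "S2": ("R", "slot")}

WRITER_REQS: List[Req] = [("P1", "T1", False), ("S1", "T1", False),
                          ("R1", "S1", False), ("S1", "P1", True)]
READER_REQS: List[Req] = [("P2", "Q2", False), ("P2", "R2", False),
                          ("Q2", "R2", False), ("S2", "Q2", True)]


def pso_keeps(acc: Dict[str, Access], x: str, y: str) -> bool:
    """Assumed PSO rule: only write-then-different-variable may reorder."""
    (kx, vx), (_, vy) = acc[x], acc[y]
    return kx == "R" or vx == vy


def covers(order: List[str], fence_after: str, req: Req) -> bool:
    pos = {label: i for i, label in enumerate(order)}
    f = pos[fence_after]
    x, y, later = req
    if not later:
        return pos[x] <= f < pos[y]
    return f >= pos[x] or f < pos[y]  # path wraps to the next iteration


def fences(acc: Dict[str, Access],
           reqs: List[Req]) -> Tuple[List[Req], Optional[Tuple[str, ...]]]:
    order = list(acc)
    todo = [r for r in reqs if not pso_keeps(acc, r[0], r[1])]
    for k in range(len(order) + 1):
        for combo in combinations(order, k):
            if all(any(covers(order, f, r) for f in combo) for r in todo):
                return todo, combo
    return todo, None


print(fences(WRITER, WRITER_REQS))
print(fences(READER, READER_REQS))
```

For the writer the search keeps R1 ≺ S1, S1 ≺ T1 and S1 ≺ P1 and returns fences after R1 and after S1, i.e. between R1 and S1 and between S1 and T1; for the reader it keeps only Q2 ≺ R2 and returns a single fence after Q2.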
7 Conclusion and Future Work
In this paper we proved Simpson’s 4 slot algorithm correct under the SC memory model with respect to the interference freedom and consistent reads properties. Based on these proofs, we identified the locations of the fence instructions needed to satisfy these two properties under the PSO memory model. As future work, we still have to explore the use of this approach for more relaxed memory models that support non-atomic writes (POWER/ARM). This paper introduces predicates over history variables only to carry out proofs conveniently. No formal treatment is given to them, and this should be addressed with high priority. Even with predicates over history variables, the proof of Simpson’s 4 slot algorithm shows how difficult it is to carry out these proofs without tool support. We plan to use proof assistants for this in the near future. In all the examples we considered here, each program is executed by only one thread. This restriction needs to be lifted in order to handle concurrent data structures, where many threads can execute the same method of an object.
References
[AAC+12a] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Carl
Leonardsson, and Ahmed Rezine. Automatic fence insertion in integer
programs via predicate abstraction. In SAS, pages 164–180, 2012.
[AAC+12b] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Carl Leonardsson, and Ahmed Rezine. Counter-example guided fence insertion under TSO. In TACAS, pages 204–219, 2012.
[ABBM10] Mohamed Faouzi Atig, Ahmed Bouajjani, Sebastian Burckhardt, and
Madanlal Musuvathi. On the verification problem for weak memory mod-
els. SIGPLAN Not., 45(1):7–18, January 2010.
[ABBM12] Mohamed Faouzi Atig, Ahmed Bouajjani, Sebastian Burckhardt, and
Madanlal Musuvathi. What’s decidable about weak memory models? In
ESOP, pages 26–46, 2012.
[BD12] Richard Bornat and Mike Dodds. Abducing memory barriers. Draft, April 2012.
[Bor12] Richard Bornat. Barrier logic: a program logic for concurrency on PowerPC. 2012.
[HR07] Thuan Quang Huynh and Abhik Roychoudhury. Memory model sensitive
bytecode verification. Form. Methods Syst. Des., 31(3):281–305, December
2007.
[Jon83] C. B. Jones. Specification and design of (parallel) programs. In Proceedings
of IFIP’83, pages 321–332. North-Holland, 1983.
[KVY11] Michael Kuperstein, Martin Vechev, and Eran Yahav. Partial-coherence
abstractions for relaxed memory models. In Proceedings of the 32nd ACM
SIGPLAN conference on Programming language design and implementa-
tion, PLDI ’11, pages 187–198, New York, NY, USA, 2011. ACM.
[Lam74] Leslie Lamport. A new solution of Dijkstra’s concurrent programming problem. Commun. ACM, 17(8):453–455, August 1974.
[LW11] Alexander Linden and Pierre Wolper. A verification-based approach to
memory fence insertion in relaxed memory systems. In Proceedings of
the 18th international SPIN conference on Model checking software, pages
144–160, Berlin, Heidelberg, 2011. Springer-Verlag.
[O’H07] P.W. O’Hearn. Resources, concurrency, and local reasoning. Theoretical
Computer Science, 375(1):271–307, 2007.
[Rid10] Tom Ridge. A rely-guarantee proof system for x86-tso. In Proceedings
of the Third international conference on Verified software: theories, tools,
experiments, VSTTE’10, pages 55–70, Berlin, Heidelberg, 2010. Springer-
Verlag.
[Sim90] H.R. Simpson. Four-slot fully asynchronous communication mechanism. IEE Proceedings E: Computers and Digital Techniques, 137(1):17–30, January 1990.
[Sou84] N. Soundararajan. A proof technique for parallel programs. Theoretical Computer Science, 31(1–2):13–29, 1984.
[VP07] V. Vafeiadis and M. Parkinson. A marriage of rely/guarantee and separa-
tion logic. In CONCUR 2007, pages 256–271. Springer, 2007.
