Portland State University

PDXScholar
Computer Science Faculty Publications and
Presentations

Computer Science

6-2009

Operational Verification of a Relativistic Program
Robert T. Bauer
Portland State University

Follow this and additional works at: https://pdxscholar.library.pdx.edu/compsci_fac
Part of the OS and Networks Commons, and the Programming Languages and Compilers Commons

Citation Details
Bauer, Robert T., "Operational Verification of a Relativistic Program" (2009). Computer Science Faculty
Publications and Presentations. 214.
https://pdxscholar.library.pdx.edu/compsci_fac/214


Operational Verification of a Relativistic Program
Robert T. Bauer
Portland State University
Portland, Oregon
26 June 2009
TR–09–04

Abstract. Engineering efforts to achieve scalable multiprocessor performance for concurrent reader-writer programs have resulted in a family
of algorithms that are non-blocking and that tolerate interprocessor interference. Because these algorithms accept a unique frame of reference
for each processor’s accesses to memory, they typify a concurrent programming technique for shared memory multicore architectures called
relativistic programming.
Rigorous verification of these algorithms is not possible with existing
semantics-based approaches because the semantics underapproximate
multiprocessor behavior and the algorithms rely on abstruse interactions
with the operating system that are not reconciled with language semantics.
The Read-Copy Update (RCU) algorithm is the prototypical example of
relativistic programming; it is used in more than 2000 places in the Linux
kernel and has thus far resisted analysis. In this paper we define a simple language
for a sequentially consistent multiprocessor and implement
RCU in that language. Both the RCU implementation and the language
semantics are instrumented to prove that RCU does not collect live objects and that it is memory safe. We then introduce restrictions on the definition and use of
local RCU pointers that eliminate the need for the instrumentation.
Thus, RCU implementations that conform to these restrictions do not
collect live objects and are memory safe.
These restrictions are readily accommodated by program analysis tools
to certify RCU implementations.

1 Introduction

Engineering efforts to achieve scalable multiprocessor performance for concurrent reader-writer programs have resulted in a family of algorithms that are
non-blocking and that tolerate interprocessor interference[28]. Because these algorithms accept a unique frame of reference for each processor’s accesses to
memory, they typify a concurrent programming technique for shared memory
multicore architectures called relativistic programming[16].
The Read-Copy Update (RCU) algorithm[24–26] is the prototypical example of relativistic programming; it is used in more than 2000 places in the Linux kernel[27]
and achieves near-optimal multiprocessor scalability by allowing concurrent accessors and a mutator to access possibly different versions of a data object.
Despite RCU’s growing importance, verification of RCU use and implementation is almost entirely based on testing[23]. This hinders RCU use in trustworthy
kernels and mission-critical software.
RCU has resisted rigorous analysis for at least three reasons: 1) traditional
semantics-based approaches for reasoning about program correctness underapproximate modern processor-memory systems and tend to focus on data-race-free
programs[43], whereas RCU is a racy program; 2) RCU relies on abstruse operating system interactions that are not reconciled with language semantics; and
3) behavioral specifications for RCU-protected data structures have not been
established. Each of these issues represents a formidable challenge to
the existing state of the art.
At this point in time, a proof of correctness for a sophisticated implementation of RCU executing on a modern processor-memory system is not feasible.
However, as a first step towards that goal, this work tackles proving that an unsophisticated variant – RCU “Classic” – executing on a sequentially consistent
multiprocessor that supports multitasking does not collect live objects and is
memory safe.
Our approach to proving that RCU “Classic” has the live object and memory safety
properties is to define a minimal language capable of expressing RCU. This language incorporates context switch behavior and is defined using a small-step
operational semantics that maps language constructs to configuration transformations. We express the live object and memory safety properties in linear temporal logic.
The traces, that is, the sequences of configurations produced by simulating execution of the RCU program under the operational semantics,
serve as the models against which we determine the validity of these properties. We instrument
both the code and the language semantics to simplify the verification effort:
we prove that instrumented RCU executing under the instrumented semantics does not
collect live objects and is memory safe. We then introduce restrictions on the
definition and use of the local RCU pointers that allow us to eliminate the instrumentation; thus, RCU implementations that conform to these restrictions
do not collect live objects and are memory safe.
Our work advances the state-of-the-art by capturing a facet of the real behavior
of an operating system context switch in a way that exposes the intermediate
state of a system where one thread has been switched out and the next is yet to
be switched in. We show that this intermediate state is used by RCU to achieve
synchronization without the use of synchronization primitives.
We formalize restrictions for the definition and use of the RCU local pointer.
These restrictions can help certify RCU implementations and are readily accommodated by static analysis tools.
This work demonstrates that operational verification techniques are useful for
proving liveness and safety properties of concurrent, reactive programs. We give,
for the first time, proofs that RCU does not collect live objects and that it is
memory safe.
This work is organized as follows. Section 2 gives the background for this work
and describes our overall approach. Section 3 describes related work. Section
4 defines the machine abstraction for a sequentially consistent multiprocessor
that supports cooperative multitasking. The machine language syntax is given
in Section 5. The semantics are given in Section 6. RCU “Classic” is described in
Section 7. The properties are verified in Section 8. Section 9 gives our conclusions
and Section 10 describes future work.

2 Background

Mutual exclusion (locking) is the standard technique to prevent concurrent programs from corrupting shared memory structures[2, 46]. When access to the
shared data structure can be partitioned into reader and writer accesses (readers
don’t modify the data), reader-writer locks reduce latency by allowing concurrent readers and enforcing mutual exclusion only when a writer is accessing the
shared data structure[31]. To further improve performance, RCU allows a writer
to execute concurrently with possibly many readers; mutual exclusion is enforced
only on writers (i.e., at most one writer is executing at any given time).
Reader-writer mutual exclusion guarantees that concurrent programs always access data in a consistent state: readers cannot acquire an object in some
partially updated state and writers cannot partially overwrite the effects of another write. Owicki[36] showed that reasoning about such systems is
straightforward because the intermediate effects of a multistep update are invisible:
the critical region can be treated as “non-concurrent” code whose
effects are atomic and visible only after the writer exits its critical region. For example, updating an object using reader-writer mutual exclusion is accomplished
either by updating the object in place or by creating a new object, copying the
existing data, and then updating the new object. Since the update is done within
a critical region, it’s not possible that any reader (or writer) is using the object
being updated; thus, in the case where the old object is replaced by a new object,
the old object’s storage can be collected once the existing data is copied to the
new object.
RCU also guarantees that concurrent programs always access data in a consistent
state. Writers cannot partially overwrite the effects of another write and readers
cannot acquire an object in some partially updated state. However, because RCU
doesn’t use mutual exclusion, the intermediate effects of a multistep update are
visible. To prevent corruption, RCU updates an object by creating a new (local)
object, copying the existing data, updating the new object, and then atomically
storing a single pointer to simultaneously make the new object accessible and
the old one inaccessible. After the pointer update, there are at least two versions
of the object — the new one and the old one. Storage for the old object cannot
always be reclaimed immediately after the pointer update because some reader
may be using it.
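As an informal illustration of the copy, update, and publish steps just described, the following Python sketch models the global pointer as an attribute g of a small holder class. The names Cell and rcu_update are hypothetical and chosen only for this example, and a plain attribute assignment stands in for the atomic pointer store; none of this is taken from an actual RCU implementation.

```python
import copy


class Cell:
    """Holds the single global pointer g through which readers find the object."""

    def __init__(self, obj):
        self.g = obj


def rcu_update(cell, mutate):
    """Copy the current object, update the copy, then publish it with one store."""
    old = cell.g
    new = copy.deepcopy(old)  # copy existing data into a new, still-private object
    mutate(new)               # update the private copy; readers cannot see it yet
    cell.g = new              # publish: a single pointer store swaps old for new
    return old                # reclamation of the old version must be deferred


cell = Cell({"value": 1})
reader_ref = cell.g  # a reader picked up a reference before the update
old = rcu_update(cell, lambda obj: obj.update(value=2))
```

After the call, the reader’s reference and the published pointer name two distinct versions of the object, which is why the old version cannot be reclaimed immediately.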
An object’s storage is safe to reclaim only when the object is no longer live
(informally, not being used and inaccessible); RCU “Classic” claims it is safe to
reclaim the old object’s storage after the pointer is updated and every reader
has executed a context switch. This seems like a very easy claim to prove; however, traditional programming language semantics treat a context switch as a
“skip”: the semantics abstract away from the behavior of the scheduler by
treating yields to the kernel as semantic no-ops[37, 38]. In contrast, RCU treats
these yields as critical rendezvous points between processes and avoids adding
synchronization primitives by piggybacking on synchronization actions in the
kernel. Our semantics captures this and lets us formulate the claim “after the
pointer is updated and every reader has executed a context switch” as a predicate
on the assignment and rendezvous actions.
An operating system context switch suspends the execution of a process and
resumes the execution of one that had previously been suspended[4]. In a multiprocessor, a suspended process resumes on any available processor unless the
suspended process is tagged to run on a specific one[3]. The RCU writer program
is constructed so that after the pointer update, it runs on each processor before
collecting the old object’s storage. It does this straightforwardly: using
an operating system primitive, the RCU writer’s execution is suspended on one
processor and resumed on another; our operational semantics captures this with the
resched instruction.
For example, the RCU writer on an n-processor system executes:
Update Pointer
i <- current processor
resched 0
resched 1
...
resched n-1
resched i
Collect Old Object
The sequence of resched instructions directs the RCU writer to execute first on
Processor 0 (resched 0), then on Processor 1 (resched 1), and so on. After executing on the “last” processor, the RCU writer continues execution on the initial
processor and collects the old object’s storage.
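The writer’s step sequence can be generated mechanically for any number of processors. The Python sketch below does so; the function name writer_schedule and the step strings are assumptions made for illustration and simply mirror the listing above.

```python
def writer_schedule(n: int, start: int) -> list[str]:
    """Return the RCU writer's steps on an n-processor system.

    `start` is the processor on which the writer is initially running
    (the value saved as i in the listing).
    """
    if not 0 <= start < n:
        raise ValueError("start must name one of the n processors")
    steps = ["update pointer"]
    steps += [f"resched {p}" for p in range(n)]  # run on every processor in turn
    steps.append(f"resched {start}")             # return to the initial processor
    steps.append("collect old object")
    return steps


schedule = writer_schedule(2, 0)
```

For the two-processor machine used later in the paper, the schedule contains three resched steps between the pointer update and the collection.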
How does this ensure that it is safe to collect the old object? Consider what happens when resched j is executed. The writer executes resched j on Processor j−1,
so the RCU reader that had been executing on Processor j−1 must itself have executed a resched, for otherwise no context switch would have been possible. The RCU guidelines tell us that RCU readers
are not allowed to hold a reference to an RCU-protected object beyond the RCU
read-side critical section, and the read-side critical section must end before the
context switch. This effectively says that, from the perspective of the RCU reader
executing on Processor j−1, the old object is not live when the RCU writer executes resched j on Processor j−1. By simple induction, when the RCU writer is
ready to collect the old object’s storage, the old object is “effectively” not live
for every RCU reader, and it is safe to collect the old object’s
storage.
Note that RCU does not guarantee that the old object is not live — RCU is
warranting that if a particular coding rule is followed, the old object won’t be
live at the time its storage is collected. However, the published implementation
of RCU “Classic” shows that the old object can be live at the time it is collected.
Yet, in looking at the published implementation, one gets the intuition that it
is “safe” to collect the old object.
Our research is meant to replace informal arguments that RCU has certain properties with a formal proof. A formal proof that RCU has particular properties
gives us confidence that it actually has those properties and it tells us explicitly
what set of coding rules need to be followed to have this confidence.
This research focuses on proving that RCU doesn’t collect live objects (live
object safety) and the dual property that RCU doesn’t access objects that have
been collected (memory safety).
2.1 Approach

Our approach to proving the live object and memory safety properties is an
operational verification using pencil-and-paper proofs. We use pencil-and-paper
proofs because mechanizing a complex proof is a formidable challenge in and of itself and is usually predicated on the availability of the pencil-and-paper
proof.
An operational verification that program P has property S requires three things:
– An operational semantics of the implementation language used to express
program P
– A formal specification language in which to express S
– A deductive apparatus capable of carrying out the proof that P ⇒ S.
The remainder of this section describes our approach for proving the live object
and memory safety properties vis-a-vis our operational semantics, specification
of the properties, and the deductive apparatus.
Operational Semantics of Implementation Language
The operational semantics[38, 39] of a language describes the execution of programs in terms of sequences of configurations of an abstract machine. Our abstract machine is meant to be like a real multiprocessor, but with the minimum
machinery necessary to explain the semantic aspects of the instructions used
to implement RCU “Classic.” The abstract machine we define is quite simple.
Each processor has registers and a ready list. All processors share a common
memory. Registers have the same type as memory, except that they are local
to a processor and named differently so as to distinguish them from memory.
A processor’s ready list maintains processes that are eligible to run when the
processor executes a resched instruction.
In our semantics, processors sequentially execute their programs and each memory operation is completed before the next is initiated. Concurrency is represented as a non-deterministic choice of processor execution. Thus, our semantics
describe a sequentially consistent multiprocessor[18]. Sequential consistency is
a global property: all memory accesses form a single interleaving that is
consistent with the program order of each processor.
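To make the interleaving view concrete, the Python sketch below enumerates every interleaving of two instruction sequences that preserves each processor’s program order. The function name interleavings and the instruction labels are hypothetical; the sketch only illustrates the set of executions a sequentially consistent two-processor machine admits and is not part of the semantics defined later.

```python
from __future__ import annotations

from typing import Iterator, Sequence


def interleavings(p0: Sequence[str], p1: Sequence[str]) -> Iterator[tuple[str, ...]]:
    """Yield every merge of p0 and p1 that keeps each sequence in program order."""
    if not p0:
        yield tuple(p1)
        return
    if not p1:
        yield tuple(p0)
        return
    # Non-deterministic choice: the next step comes from processor 0 ...
    for rest in interleavings(p0[1:], p1):
        yield (p0[0],) + rest
    # ... or from processor 1.
    for rest in interleavings(p0, p1[1:]):
        yield (p1[0],) + rest


traces = list(interleavings(["a1", "a2"], ["b1", "b2"]))
```

Each element of traces is one candidate execution; a property holds of the program only if it holds of all of them.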
Our abstract machine differs little from modern sequential processors (e.g.,
ARM, Z80, 8096, etc.) and our semantics is faithful to them as well. Indeed,
our semantics is faithful to any multiprocessor comprised of processors that execute instructions in program order and that serialize access to memory. The set
of execution sequences realized by our operational semantics for any given program is compatible with the execution sequence realized by any real sequentially
consistent multiprocessor.
The difference between our abstract machine and a real sequentially consistent
multiprocessor is in our support of process rescheduling. Many processors and
most modern ones provide support for context switching while depending on the
operating system to determine the “next” thread that executes. Our semantics
represents the scheduling and context switch behavior in the resched instruction
and the information needed to effect a context switch is carried about in a
processor’s ready list. Processes in a real system are interrupted (either by preemption or voluntarily) so that the operating system can switch from one process
to another; the intermediate state where one process is switched out and the next
is yet to be switched in is captured in our semantics as the R state. If the ready
list is empty when a processor executes a resched instruction, the processor stays
in the R state until the ready list is loaded with a process. In a real operating
system, if the ready list is empty, the processor executes a null process.
Whereas traditional semantic approaches for non-deterministic program execution treat a context switch as a “skip”[35], our semantics includes the processor
ready lists and the operating system intermediate state in the abstract machine’s configuration. For this reason, the set of configuration sequences that
define RCU as determined by our operational semantics closely approximates
the real behavior of a sequentially consistent processor executing RCU.
Specification of Properties
Reactive programs, such as RCU, maintain an ongoing interaction with each
other (concurrency) and/or their environment rather than produce a final value
on termination[21, 22]. Reasoning about reactive programs requires specifying
and verifying behaviors[40]. Temporal logic[6] is a standard formalism for expressing the behavior of reactive systems[19].
Programs are traditionally specified with linear temporal logic (LTL)[17] because the “interleaving” model of concurrency quite naturally corresponds with
describing a computation as a set of executions and because the model-theoretic
semantics of temporal logic relate formulae in the logic to sets of models that
satisfy them.
The models in linear temporal logic take the form of sequences. Each configuration sequence in the set of executions determined by our operational semantics
is a model; this establishes a direct relation between programs (and in
particular our implementation of RCU) and the temporal logic formulae used to
specify properties.
Deductive Apparatus
Our deductive apparatus is first order linear temporal logic. Specifications are
written in linear temporal logic and the configuration sequences obtained by
applying our operational semantics to the implementation of RCU “Classic”
provide models for the verification effort.
The verification that RCU has a particular property is carried out as follows. Let
$\Pi_P = \{\pi_0, \pi_1, \ldots\}$ be the set of execution sequences that define $P$, and let $S$ be the
specification of the property being verified. Then $(\forall \pi \in \Pi_P .\ \pi \models S) \vdash P \Rightarrow S$.
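The shape of this check, that every execution sequence must be a model of the specification, can be sketched in Python. This is only an illustration under strong simplifying assumptions: traces are finite lists rather than the sequences used in linear temporal logic, only an "always" style property is handled, and the configuration layout (a dict with "held" and "freed" sets) and all function names are hypothetical.

```python
from typing import Callable, Iterable, Sequence

Config = dict  # hypothetical, simplified configuration: {"held": set, "freed": set}


def always(pred: Callable[[Config], bool], trace: Sequence[Config]) -> bool:
    """Finite-trace reading of 'always': pred holds in every configuration."""
    return all(pred(config) for config in trace)


def program_satisfies(traces: Iterable[Sequence[Config]],
                      spec: Callable[[Sequence[Config]], bool]) -> bool:
    """True when every trace in the set is a model of the specification."""
    return all(spec(trace) for trace in traces)


def no_live_object_collected(config: Config) -> bool:
    # No object that some reader still holds may appear in the freed set.
    return not (config["held"] & config["freed"])


def spec(trace: Sequence[Config]) -> bool:
    return always(no_live_object_collected, trace)


good = [{"held": {"a"}, "freed": set()}, {"held": set(), "freed": {"a"}}]
bad = [{"held": {"a"}, "freed": set()}, {"held": {"a"}, "freed": {"a"}}]
```

A single violating trace is enough to refute the property for the whole program, which is why the check quantifies over the entire set.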

3 Related Work

Liu and Moore[20] have applied operational techniques to the verification of
Java programs. Their approach attacks verification at the byte code level and
does not consider liveness and safety properties. They prove correctness of two
computational programs.
Atalli[1] uses operational techniques to create formal executable semantics of
Java; she mixes small-step semantics to describe concurrency with big-step semantics to describe Java object-oriented features. Meseguer[32] defines concurrency in terms of rewriting and relates the execution of non-deterministic operational semantics to rewriting.
Schacht[45] creates a framework for reasoning about the temporal properties
of Actor programs by defining the Actor concurrency model in terms of an
operational semantics. The operational semantics are used to create traces (of
configurations) that serve as models for validating linear temporal logic property
specifications.
Compton[5] uses operational techniques to verify a safety property of Stenning’s
(UDP) algorithm. He uses a fragment of the OCaml language (miniCaml), including the sockets library for UDP, to demonstrate the verification in a model
that accurately represents the operating system API and network environment
in which the program would execute. Compton creates an operational semantics
such that the network emits a label for every action performed, and a list of
these labels forms a trace. He shows that each step of a program’s execution corresponds to some step in the network trace. His safety theorem is given
as a property for each trace.
Ridge[42] uses operational reasoning techniques to prove correctness of Peterson’s algorithm on sequential and weak memory systems. He creates a small-step
semantics for a functional language and implements Peterson’s algorithm in that
language. He uses the theorem prover Isabelle to symbolically execute the program. He verifies that the two processes cannot be in the critical section at the
same time. His approach is to annotate trace information obtained from execution of the operational semantics so that it’s easy to determine when execution
for a thread is in a critical section.
Ridge approaches weak memory as follows. He maintains a history of writes
and updates a memory location from this history when a memory barrier for
that location is executed. He does not define a read — his proof relies on the
side-condition that a memory location has only one value at any time.
McKenney[29, 30] has used the SPIN model checker[15] to verify a variety of
RCU implementations; however, he does not tackle liveness and safety properties.

4 Multiprocessor Abstraction

The operational semantics of a language describe the execution of programs in
terms of sequences of configurations of an abstract machine[38, 39]. Our abstract
machine is much simpler than a real processor as we support only those instructions necessary to create an implementation of RCU “Classic”. Our abstract
machine supports the basic load and store operations (load and store registers
and immediate values), a load indirect operation, a resched instruction, and
a halt instruction; we use some syntactical sugar to implement looping. Our
abstract machine does not include a program counter, stack, machine status
register, interrupts, etc. Memory operations are serialized.
The basic operations (load, store, load indirect) and the looping construct are
straightforward. Load instructions put values in registers; store operations put
values into memory. The value returned by an indirect operation isn’t used in
our implementation of RCU “Classic” because we don’t use RCU as part of
a larger solution. For this reason, our semantics define a fail-stop semantics for
dereferencing collected memory; proving that RCU doesn’t collect live objects and is memory safe requires showing that null pointers aren’t dereferenced
and that freed memory isn’t accessed. To this end, we provide two semantics for
dereference operations. The processor semantics for a load indirect implement a
“skip” (the returned value isn’t used), while the instrumented semantics
indicate an error if a null pointer is accessed or if freed memory is accessed (see footnote 1).
These two semantics agree on computations that don’t fail (we prove this in
Section 8).
The resched instruction moves the sequence of instructions currently executing
on a processor and the processor’s registers to a ready list. This sequence of
instructions and register values is called a context. Next, the resched instruction
arbitrarily chooses a context from the processor’s ready list. The instruction
sequence of the selected context becomes the currently executing instruction
sequence and the processor registers are updated with the register values of the
selected context. Once selected, the context is removed from the ready list.
A context switch occurs only when a processor executes a resched instruction.
At that time, the program can reschedule itself to run on the other processor or
on the current processor. For example, suppose program A, running on processor 1, executes a resched
that places program A on processor 0’s ready list, and processor 1’s ready list is
empty. Once processor 1’s instruction stream and register values have been
placed in processor 0’s ready list, processor 1 is idle until another processor
executes a resched that puts a context in processor 1’s ready list.
This behavior abstracts how processors execute a context switch and how an
operating system dispatches processes to run on a particular processor. We now
give a formal description of the multiprocessor abstraction; its language syntax
and semantics are given in subsequent sections.
The multiprocessor has two processors, a common memory, a ReadyList for
processor 0, and one for processor 1. Processors have local registers that are
managed as an array. For example regs(0) refers to register 0. Registers map a
location to an integer value.
Multiprocessor = Processor0 × Processor1 × Mem × ReadyList0 × ReadyList1
Processor0 = regs
Processor1 = regs
regs = Array of val indexed by ℕ, Range(0 .. maxReg-1)
Mem maps locations to integer values. For example, Mem(20) refers to memory
location 20.
Mem = Array of val indexed by ℕ, Range(0 .. maxMem-1)
Footnote 1: Accessing previously freed or reallocated memory generally doesn’t raise an exception on a real processor, but the effect of using such memory is corruption that leads
to data inconsistency and program crashes.

Ready lists maintain the “context” for the execution of processes that aren’t
currently running. The context for a process’s execution consists of the values of the
processor’s registers and a list of instructions to be sequentially executed.
ReadyList0 = {(regs, cmds)}
ReadyList1 = {(regs, cmds)}
Values are integers:
val ∈ ℕ, 0 ≤ val ≤ maxInt

The multiprocessor’s instructions:
cmds = List of cmd
cmd = load + loadi + store + storei + dr + di + resched + forever + halt
σ ∈ Processor0 × Processor1 × Mem × ReadyList0 × ReadyList1
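One way to picture the state σ is as a record with one field per component of the product above. The Python sketch below is an assumption-laden rendering: the concrete sizes MAX_REG and MAX_MEM are invented for the example, commands are represented as plain strings, and ready lists are Python lists rather than the sets used in the formal definition.

```python
from dataclasses import dataclass, field

MAX_REG = 4   # assumed value for maxReg
MAX_MEM = 32  # assumed value for maxMem

# A context pairs saved register values with the commands still to execute.
Context = tuple[tuple[int, ...], tuple[str, ...]]


def _zeros(n):
    """Return a factory that builds a fresh list of n zeros."""
    return lambda: [0] * n


@dataclass
class Multiprocessor:
    regs0: list[int] = field(default_factory=_zeros(MAX_REG))  # Processor0
    regs1: list[int] = field(default_factory=_zeros(MAX_REG))  # Processor1
    mem: list[int] = field(default_factory=_zeros(MAX_MEM))    # Mem
    ready0: list[Context] = field(default_factory=list)        # ReadyList0
    ready1: list[Context] = field(default_factory=list)        # ReadyList1


sigma = Multiprocessor()
sigma.mem[20] = 7               # Mem(20)
sigma.regs0[0] = sigma.mem[20]  # the effect "load 0 20" would have on processor 0
```

Keeping the registers of the two processors in separate fields mirrors the fact that registers are local to a processor while memory is shared.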

5 Syntax

5.1 Syntax of Processor Instruction Set

cmd ::= 'load'    reg loc      // reg <-- contents of loc   (r0 := [0])
      | 'loadi'   reg val      // reg <-- val               (r0 := 2)
      | 'store'   loc reg      // loc <-- contents of reg   ([0] := r0)
      | 'storei'  loc val      // loc <-- val               ([0] := 3)
      | 'dr'      reg          // dereference reg
      | 'di'      loc          // dereference loc
      | 'resched' procId       // context switch; continue execution on processor procId
      | 'forever' cmds         // loop
      | 'halt'                 // terminate execution

reg    ::= digits
loc    ::= digits
val    ::= digits | 'Nil'      // 'Nil' is "instrumentation" discussed later
procId ::= digits

digits ::= digits digit
         | digit
digit  ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
5.2 Program Syntax

A multiprocessor program consists of two sequences of commands separated by
||. Commands to the left of || execute on processor 0 while those on the right
execute on processor 1.
mp   ::= cmds || cmds
cmds ::= cmds ';' cmd
       | cmd
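A reader who wants to experiment with this program syntax could split a program text into its two command sequences along the following lines. This Python sketch is a deliberately minimal assumption: the function name parse_program is hypothetical, commands are returned as lists of tokens without validation, and nested forever bodies are not handled.

```python
def parse_program(text: str) -> tuple[list[list[str]], list[list[str]]]:
    """Split 'cmds || cmds' into per-processor lists of tokenized commands."""
    left, sep, right = text.partition("||")
    if not sep:
        raise ValueError("a multiprocessor program needs the '||' separator")

    def parse_cmds(side: str) -> list[list[str]]:
        # Commands are separated by ';'; each command is split on whitespace.
        return [cmd.split() for cmd in side.split(";") if cmd.strip()]

    return parse_cmds(left), parse_cmds(right)


proc0, proc1 = parse_program("loadi 0 2; store 0 0 || halt")
```

Here proc0 holds the commands for processor 0 and proc1 those for processor 1, matching the left and right sides of the separator.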

6 Semantics

The small step semantics in this section formally describe the execution of multiprocessor programs as a transition system over configurations.
There are three forms of configurations:
$\langle c, \sigma \rangle$: A pair $(c, \sigma)$ with command $c \in cmd$ and “state” $\sigma$.
$\langle c_0 \parallel c_1, \sigma \rangle$: A triple $(c_0, c_1, \sigma)$ with Processor 0’s continuation, $c_0 \in cmds$;
Processor 1’s continuation, $c_1 \in cmds$; and the “state”, $\sigma$.
The third form is either of these two forms where the command is replaced with
R: a context switch instruction (resched) transitions the configuration to an
intermediate configuration whose continuation is R.
The state (as defined earlier) is σ ∈ Processor0 × Processor1 × Mem × ReadyList0 × ReadyList1.
The semantics of the dr and di instructions have two forms: Instrumented and
Processor. The instrumented semantics for these instructions gives transitions
that can lead to an error configuration. The processor semantics for these instructions do not have such transitions.
To support the instrumented semantics, we have instrumented the processor
instructions so that val can be a number (digits) or Nil.
6.1 Commands

The following subsections give the semantics for each command. To keep things
short, whenever there are processor-specific differences in the semantics, we give
the semantics with respect to Processor 0.
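load reg loc
Register reg is assigned the contents of memory location loc.
\[ \langle \mathtt{load}\ \mathit{reg}\ \mathit{loc},\ \sigma \rangle \to \sigma[\mathit{Processor}_0.\mathit{regs}(\mathit{reg}) = \mathit{Mem}(\mathit{loc})] \]
loadi reg val
Register reg is assigned val.
\[ \langle \mathtt{loadi}\ \mathit{reg}\ \mathit{val},\ \sigma \rangle \to \sigma[\mathit{Processor}_0.\mathit{regs}(\mathit{reg}) = \mathit{val}] \]
store loc reg
Memory location loc is assigned the contents of register reg.
\[ \langle \mathtt{store}\ \mathit{loc}\ \mathit{reg},\ \sigma \rangle \to \sigma[\mathit{Mem}(\mathit{loc}) = \mathit{Processor}_0.\mathit{regs}(\mathit{reg})] \]
storei loc val
Memory location loc is assigned val.
\[ \langle \mathtt{storei}\ \mathit{loc}\ \mathit{val},\ \sigma \rangle \to \sigma[\mathit{Mem}(\mathit{loc}) = \mathit{val}] \]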
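dr reg
Dereference memory location pointed to by reg.
Instrumented Semantics:
\[ \frac{\sigma.\mathit{Processor}_0.\mathit{regs}(\mathit{reg}) = \mathit{Nil}}{\langle \mathtt{dr}\ \mathit{reg},\ \sigma \rangle \to \langle \mathit{Error},\ \sigma \rangle} \]
\[ \frac{\sigma.\mathit{Processor}_0.\mathit{regs}(\mathit{reg}) \neq \mathit{Nil} \qquad \mathit{isFree}(\mathit{Processor}_0.\mathit{regs}(\mathit{reg}))}{\langle \mathtt{dr}\ \mathit{reg},\ \sigma \rangle \to \langle \mathit{Error},\ \sigma \rangle} \]
\[ \frac{\sigma.\mathit{Processor}_0.\mathit{regs}(\mathit{reg}) \neq \mathit{Nil} \qquad \neg\, \mathit{isFree}(\mathit{Processor}_0.\mathit{regs}(\mathit{reg}))}{\langle \mathtt{dr}\ \mathit{reg},\ \sigma \rangle \to \sigma} \]
Processor (uninstrumented) Semantics:
\[ \langle \mathtt{dr}\ \mathit{reg},\ \sigma \rangle \to \sigma \]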
Note: The auxiliary predicate isFree is discussed at the end of this section.
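The difference between the two semantics for dr can be sketched in Python as two functions over the same inputs. The representation is an assumption made for this example: registers are a list, the set freed plays the role of the isFree predicate, None stands in for Nil, and the string results "Error" and "ok" stand in for the error configuration and the unchanged state.

```python
NIL = None       # stand-in for the paper's Nil
ERROR = "Error"  # stand-in for the error configuration
OK = "ok"        # stand-in for "continue with the state unchanged"


def dr_instrumented(regs, freed, reg):
    """Instrumented semantics: fail on Nil or on a pointer to freed memory."""
    ptr = regs[reg]
    if ptr is NIL:
        return ERROR  # first rule: null pointer dereference
    if ptr in freed:
        return ERROR  # second rule: the location has been freed
    return OK         # third rule: the state is unchanged


def dr_processor(regs, freed, reg):
    """Processor (uninstrumented) semantics: dr always behaves as a skip."""
    return OK
```

The two functions return the same result whenever the instrumented one does not report an error, which reflects the agreement between the two semantics on computations that don’t fail.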
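di loc
Dereference memory location pointed to by loc.
Instrumented Semantics:
\[ \frac{\sigma.\mathit{Mem}(\mathit{loc}) = \mathit{Nil}}{\langle \mathtt{di}\ \mathit{loc},\ \sigma \rangle \to \langle \mathit{Error},\ \sigma \rangle} \]
\[ \frac{\sigma.\mathit{Mem}(\mathit{loc}) \neq \mathit{Nil} \qquad \mathit{isFree}(\mathit{Mem}(\mathit{loc}))}{\langle \mathtt{di}\ \mathit{loc},\ \sigma \rangle \to \langle \mathit{Error},\ \sigma \rangle} \]
\[ \frac{\sigma.\mathit{Mem}(\mathit{loc}) \neq \mathit{Nil} \qquad \neg\, \mathit{isFree}(\mathit{Mem}(\mathit{loc}))}{\langle \mathtt{di}\ \mathit{loc},\ \sigma \rangle \to \sigma} \]
Processor (uninstrumented) Semantics:
\[ \langle \mathtt{di}\ \mathit{loc},\ \sigma \rangle \to \sigma \]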
Note: The auxiliary predicate isFree is discussed at the end of this section.
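Error
Terminate execution.
\[ \langle \mathit{Error},\ \sigma \rangle \to \langle \mathit{Error},\ \sigma \rangle \]
halt
Terminate execution.
\[ \langle \mathtt{halt},\ \sigma \rangle \to \langle \mathtt{halt},\ \sigma \rangle \]
forever
Execute an instruction or sequence of instructions over and over again.
\[ \langle \mathtt{forever}\ c,\ \sigma \rangle \to \langle c\ ;\ \mathtt{forever}\ c,\ \sigma \rangle \]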
resched n
The sequence of instructions following resched are to be executed on processor n.
[Diagram: transitions labeled resched 0 and resched 1 around the intermediate configuration R.]
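\[ \langle \mathtt{resched}\ 0;\ c,\ \sigma \rangle \to \langle R,\ \sigma[\mathit{ReadyList}_0 \cup (\mathit{regs} = \mathit{Processor}_0.\mathit{regs},\ c)] \rangle \]
\[ \frac{(\mathit{regs}, c) \in \mathit{ReadyList}_0}{\langle R,\ \sigma \rangle \to \langle c,\ \sigma[\mathit{Processor}_0.\mathit{regs} = \mathit{regs},\ \mathit{ReadyList}_0 \setminus \{(\mathit{regs}, c)\}] \rangle} \]
\[ \langle \mathtt{resched}\ 1;\ c,\ \sigma \rangle \to \langle R,\ \sigma[\mathit{ReadyList}_1 \cup (\mathit{regs} = \mathit{Processor}_0.\mathit{regs},\ c)] \rangle \]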
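The two halves of a context switch, entering R and leaving R, can be sketched in Python as follows. The state layout (a dict with "regs", "ready", and "running" entries indexed by processor) and the function names resched and dispatch are assumptions made for this example. The sketch also takes the first available context, whereas the semantics allows an arbitrary choice from the ready list.

```python
def resched(state, proc, target, continuation):
    """Processor `proc` executes 'resched target; continuation' and enters R."""
    context = (tuple(state["regs"][proc]), tuple(continuation))
    state["ready"][target].append(context)  # save registers and continuation
    state["running"][proc] = "R"            # intermediate state: nothing switched in yet


def dispatch(state, proc):
    """Leave R by loading a context, if one is available; report whether it happened."""
    if state["running"][proc] != "R" or not state["ready"][proc]:
        return False                        # stay in R while the ready list is empty
    regs, cmds = state["ready"][proc].pop(0)
    state["regs"][proc] = list(regs)
    state["running"][proc] = list(cmds)
    return True


state = {
    "regs": [[7, 0], [0, 0]],
    "ready": [[], []],
    "running": [["resched 1", "halt"], ["resched 0", "halt"]],
}
resched(state, 0, 1, ["halt"])           # processor 0 moves its context to processor 1
stuck_in_r = not dispatch(state, 0)      # ReadyList0 is empty, so processor 0 stays in R
resched(state, 1, 0, ["halt"])           # processor 1 reschedules itself onto processor 0
resumed = dispatch(state, 0)             # now processor 0 can leave R
```

The interval during which stuck_in_r holds corresponds to the R configuration in which a processor waits for another processor to supply a context.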

isFree loc
The predicate isFree(loc) tells whether the location loc has been freed.
A future revision of this semantics adds type states (e.g., Unassigned, Assigned,
and Free) to memory locations and gives implementations of Free and Allocate.
An unAssignedList will track the memory locations that may be allocated.
Once a memory location is freed, it won’t be reallocated. Strategies that allow
memory locations to be reallocated make it difficult to detect situations where
a reference held by one thread is no longer valid because the memory to which
it refers has been freed and reallocated. A once freed, never reallocated policy
makes it relatively easy to detect invalid references.
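One possible reading of the once freed, never reallocated policy is sketched below in Python. The class name OnceFreedAllocator and its methods are hypothetical and are not the Free and Allocate implementations of the planned revision; the sketch only shows why the policy makes a stale reference easy to detect.

```python
class OnceFreedAllocator:
    """Tracks locations as unassigned, assigned, or freed; freed ones never return."""

    def __init__(self, size):
        self.unassigned = list(range(size))  # plays the role of the unAssignedList
        self.assigned = set()
        self.freed = set()

    def allocate(self):
        loc = self.unassigned.pop(0)  # raises IndexError when memory is exhausted
        self.assigned.add(loc)
        return loc

    def free(self, loc):
        if loc not in self.assigned:
            raise ValueError("only an assigned location can be freed")
        self.assigned.remove(loc)
        self.freed.add(loc)           # never moved back to the unassigned list

    def is_free(self, loc):
        return loc in self.freed


allocator = OnceFreedAllocator(2)
first = allocator.allocate()
allocator.free(first)
second = allocator.allocate()
```

Because first is never handed out again, any later dereference of it can be flagged by is_free, regardless of how many allocations have happened since.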
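6.2 Sequence

Seq1:
\[ \frac{\langle c_0,\ \sigma \rangle \to \sigma'}{\langle c_0 ; c_1,\ \sigma \rangle \to \langle c_1,\ \sigma' \rangle} \]

Seq2:
\[ \frac{\langle c_0,\ \sigma \rangle \to \langle c_0',\ \sigma' \rangle}{\langle c_0 ; c_1,\ \sigma \rangle \to \langle c_0' ; c_1,\ \sigma' \rangle} \]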

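6.3 Interleaving Concurrency

Int1:
\[ \frac{\langle c_0,\ \sigma \rangle \to \langle c_0',\ \sigma' \rangle}{\langle c_0 \parallel c_1,\ \sigma \rangle \to \langle c_0' \parallel c_1,\ \sigma' \rangle} \]

Int2:
\[ \frac{\langle c_0,\ \sigma \rangle \to \sigma'}{\langle c_0 \parallel c_1,\ \sigma \rangle \to \langle c_1,\ \sigma' \rangle} \]

Int3:
\[ \frac{\langle c_1,\ \sigma \rangle \to \langle c_1',\ \sigma' \rangle}{\langle c_0 \parallel c_1,\ \sigma \rangle \to \langle c_0 \parallel c_1',\ \sigma' \rangle} \]

Int4:
\[ \frac{\langle c_1,\ \sigma \rangle \to \sigma'}{\langle c_0 \parallel c_1,\ \sigma \rangle \to \langle c_0,\ \sigma' \rangle} \]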

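6.4 Cooperative Multitasking

Mul1:
\[ \frac{\langle \mathtt{resched}\ 0 ; c_0,\ \sigma \rangle \to \langle R,\ \sigma' \rangle}{\langle (\mathtt{resched}\ 0 ; c_0) \parallel c_1,\ \sigma \rangle \to \langle R \parallel c_1,\ \sigma' \rangle} \]

Mul2:
\[ \frac{\langle \mathtt{resched}\ 1 ; c_0,\ \sigma \rangle \to \langle R,\ \sigma' \rangle}{\langle (\mathtt{resched}\ 1 ; c_0) \parallel c_1,\ \sigma \rangle \to \langle R \parallel c_1,\ \sigma' \rangle} \]

Mul3:
\[ \frac{\langle R,\ \sigma \rangle \to \langle c_0,\ \sigma' \rangle}{\langle R \parallel c_1,\ \sigma \rangle \to \langle c_0 \parallel c_1,\ \sigma' \rangle} \]

Mul4:
\[ \frac{\langle \mathtt{resched}\ 0 ; c_0,\ \sigma \rangle \to \langle R,\ \sigma' \rangle}{\langle c_1 \parallel (\mathtt{resched}\ 0 ; c_0),\ \sigma \rangle \to \langle c_1 \parallel R,\ \sigma' \rangle} \]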
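Mul5:
\[ \frac{\langle \mathtt{resched}\ 1 ; c_0,\ \sigma \rangle \to \langle R,\ \sigma' \rangle}{\langle c_1 \parallel (\mathtt{resched}\ 1 ; c_0),\ \sigma \rangle \to \langle c_1 \parallel R,\ \sigma' \rangle} \]

Mul6:
\[ \frac{\langle R,\ \sigma \rangle \to \langle c_0,\ \sigma' \rangle}{\langle c_1 \parallel R,\ \sigma \rangle \to \langle c_1 \parallel c_0,\ \sigma' \rangle} \]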

7 RCU “Classic”

RCU allows a mutator, without using blocking synchronization, to update a
shared data structure concurrent with accessors utilizing that data structure.
Figure 1 illustrates RCU “Classic” and shows how the new object is published
by (atomically) writing a pointer (g). It is through this pointer that accessors
retrieve objects. Since the mutator publishes the new object concurrently with
accessor execution, accessors that retrieve a reference (by reading g) to the object
before the pointer update coexist with accessors that retrieve a reference after
the update. This means that the old object cannot be collected immediately after
the global pointer is updated. Instead, the RCU mutator collects the old object’s
storage only after all processors have passed through a quiescent state[28], which
is defined as a context switch, idle loop, or user code[44]. To prevent an accessor
from using the old object, a, after it has been collected (a use-after-free error),
accessors are not allowed to hold references to RCU-protected objects across
a quiescent state[24–26].
Figure 2 gives our implementation of RCU using the language we defined in
Sections 5 and 6. The RCU Mutator begins execution on Processor 0, but after
updating the global RCU pointer it reschedules itself to run on Processor 1. Since
ReadyList0 is empty, Processor 0 stays in the R configuration. Only after Processor 1 executes resched 0 does Processor 0 transition from the R configuration
to executing storei 0 0 and collecting the “old” object’s memory.
Our implementation of RCU is also discussed in Section 8, Definition 13 where
we give the initial configuration of the RCU “Classic” Program.

8 Verifying RCU Correctness

8.1 Definitions

Definition 1 (Atomic Configuration Formula).
A configuration is a triple (Section 6) comprising Processor 0’s continuation, Continuation0 ∈ cmds; Processor 1’s continuation, Continuation1 ∈ cmds; and the “state”, σ ∈ Processor0 × Processor1 × ReadyList0 × ReadyList1 × Mem.
[Figure 1: six panels showing the global pointer g, the old object a, and the new object b. 1: Allocate storage for new object. 2: Copy existing data to new object (Read a, Copy to b). 3: Update new object (Update). 4: Publish new object (RCU_Assign_Pointer). 5: All accessors pass through a quiescent point (Q). 6: Collect old object’s storage (collect a; a becomes free, b remains not free).]

Fig. 1. RCU “Classic”. g is the global pointer that references the shared object.
When the shared object (a) is to be modified, RCU creates b, the new object,
copies the old to-be-modified object to b, modifies b, and then “switches” the
global pointer so that it points to b (the new object). After all accessors pass
through a “quiescent” state, the RCU Mutator collects the old object’s storage.

An atomic configuration formula ranges over a configuration as given by
the following BNF:
p ::= Continuation0 = cmds
    | Continuation1 = cmds
    | Processor0.regs(loc) = val
    | Processor1.regs(loc) = val
    | ReadyList0 setop {rc | rc ∈ (regs, cmds)}
    | ReadyList1 setop {rc | rc ∈ (regs, cmds)}
    | Mem(loc) = val

setop ::= ⊂ | ⊆ | = | ≠ | · · ·
RCU Multiprocessor Implementation

RCU Mutator --- Processor 0:
  load 0 1     ; Reg(0) <-- Mem(1)    *** Read
  store 2 0    ; Mem(2) <-- Reg(0)    *** Copy
  storei 2 2   ; Mem(2) <-- 2         *** Update
  storei 3 2   ; Mem(3) <-- 2         *** RCU_Assign_Ptr
  resched 1    ; Context Switch       *** Run_on Processor 1
  resched 0    ; Context Switch       *** Run_on Processor 0
  storei 0 0   ; Mem(0) <-- 0         *** Collect (free) address 1
  halt

RCU Accessor --- Processor 1:
  forever
                                      *** RCU Lock
    load 0 3   ; Reg(0) <-- Mem(3)
    . . . . .
    . . . . .
                                      *** RCU Unlock
    resched 1  ; Context Switch       *** Run_on Processor 1

Summary of Memory Usage:
  Mem(0): Type state of Mem(1)
  Mem(1): "Old" Object
  Mem(2): "New" Object
  Mem(3): RCU Global Pointer

Fig. 2. RCU Multiprocessor Implementation.

Definition 2 (Configuration Formula).
A configuration formula ranges over atomic configuration formulae. The
following BNF defines a configuration formula in terms of atomic configuration formulae:
φ ::= (p) | (¬φ) | (φ ∨ φ) | (φ ∧ φ) | (φ ⇒ φ) | (φ ⇔ φ) | (∀i φ) | (∃i φ)
where p is any atomic configuration formula.
Definition 3 (Specification Formula).
Specifications are first order, linear temporal logic formulae that range
over configuration formulae. The following BNF gives the syntax of a
specification formula:
ψ ::= T | F | (φ) | (¬ψ) | (ψ ∨ ψ) | (ψ ∧ ψ) | (ψ ⇒ ψ) | (ψ ⇔ ψ) | (□ψ) | (♦ψ)
where φ is any configuration formula.

Definition 4 (Computation).
A computation is a sequence of configurations, ω0, . . . , ωn, such that each
configuration in the sequence, except for the first, is related to the previous configuration by the application of a semantic rule (Section 6):
ωi → ωi+1
We write ω0 →⁺ ωn to mean the computation with initial configuration
ω0 that leads to configuration ωn by one or more applications of the
semantic rules.
Since more than one semantic rule may apply to a given configuration, a configuration may have several possible successor configurations. Thus, there can be
many different computations that have a given initial configuration. Any particular computation is but one “branch” of the “tree” of executions. A computation
corresponds to what is commonly called an “interleaving”.
We write ΠP = {π0 , . . .} to mean the set of computations that describe all
executions of multiprocessor program P (i.e., the initial configuration of each
π ∈ ΠP is the program P ). We omit the subscript P when it’s clear by context
which program is being described.
Definition 5 (Racey Computations).
Let
Σ′ = {σ′ | ⟨C0 ∥ C1, σ⟩ →* ⟨C0′ ∥ C1′, σ′⟩}
be the set of possible outcomes of executing the multiprocessor program
that begins in the initial configuration ⟨C0 ∥ C1, σ⟩ and that reaches
configurations such that Processor 0’s continuation is C0′ and Processor
1’s continuation is C1′.
When |Σ′| = 1, the computation is not racey. A not racey multiprocessor
computation (program), ⟨C0 ∥ C1, σ⟩ →* ⟨C0′ ∥ C1′, σ′⟩, has the state-confluence property:
∀ C0″ C1″ σ″  ⟨C0 ∥ C1, σ⟩ →* ⟨C0″ ∥ C1″, σ″⟩ ⇒
⟨C0″ ∥ C1″, σ″⟩ →* ⟨C0′ ∥ C1′, σ′⟩
When |Σ′| > 1, the computation is racey. Racey computations do not
have the state-confluence property.
A multiprocessor program has a data race if two accesses conflict, at least
one of them is a write, and they are not ordered[41]. In the case of two
writes, the value observed after both writes depends on which was last;
in the case of a write and read, the value read depends on whether the
read happens before the write or after it. A multiprocessor program with
a data race is racey².
A data race is a sufficient condition for a computation to be racey. However, according to our definition of a racey computation, a data race is
not necessary — a computation may be racey because of other attributes
of the state and/or there may be mechanisms other than loads that alter
memory.
The two computations:
π1 = ⟨C1 ∥ C2, σ⟩ →* ⟨C1′ ∥ C2′, σ1′⟩
π2 = ⟨C1 ∥ C2, σ⟩ →* ⟨C1′ ∥ C2′, σ2′⟩
are racey if σ1′ ≠ σ2′.
These computations are locally racey if σ1′.Processor0.regs ≠ σ2′.Processor0.regs ∨
σ1′.Processor1.regs ≠ σ2′.Processor1.regs.
They are globally racey if σ1′.Mem ≠ σ2′.Mem.
Raceyness is often restricted to specific memory locations, registers, etc. For
example, stating that π1 and π2 above are globally racey with respect to memory
location x means σ1′.Mem(x) ≠ σ2′.Mem(x). In such a case, we say that the
initial configuration is racey (or leads to computations that are racey) with
respect to memory location x (or a processor’s register, etc.).
The configuration ⟨store 3 2; C0 ∥ load 0 3; C1, σ⟩ is locally racey with respect
to Processor 1’s register 0. There are exactly two computations that lead to
a configuration where Processor 0’s continuation is C0 and Processor 1’s
continuation is C1.
π0 = ⟨store 3 2; C0 ∥ load 0 3; C1, σ[Mem(3) = 1]⟩ →
     ⟨C0 ∥ load 0 3; C1, σ0′ = σ[Mem(3) = 2]⟩ →
     ⟨C0 ∥ C1, σ0″ = σ[Mem(3) = 2, P1.Reg(0) = 2]⟩
π1 = ⟨store 3 2; C0 ∥ load 0 3; C1, σ[Mem(3) = 1]⟩ →
     ⟨store 3 2; C0 ∥ C1, σ1′ = σ[Mem(3) = 1, P1.Reg(0) = 1]⟩ →
     ⟨C0 ∥ C1, σ1″ = σ[Mem(3) = 2, P1.Reg(0) = 1]⟩
As also depicted in Figure 3a, these two computations lead to different results:
σ0″.Processor1.regs(0) = 2
σ1″.Processor1.regs(0) = 1
Figure 3b depicts a not racey computation.
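The two programs of Figure 3 can be checked by enumerating every interleaving and collecting the values observed in Processor 1's register 0. This is a minimal sketch under assumptions: the store in the example is treated as writing the immediate value 2 (as the states shown in the example indicate), so it is encoded here as `storei`, and the helper names are hypothetical.

```python
def interleavings(p0, p1):
    """Yield every merge of p0 and p1 that preserves each program's own order."""
    if not p0 or not p1:
        yield [(0, c) for c in p0] + [(1, c) for c in p1]
        return
    for rest in interleavings(p0[1:], p1):
        yield [(0, p0[0])] + rest
    for rest in interleavings(p0, p1[1:]):
        yield [(1, p1[0])] + rest


def run(schedule, mem):
    """Execute one interleaving; returns the final memory and registers."""
    mem = dict(mem)
    regs = {0: {}, 1: {}}
    for proc, (op, a, b) in schedule:
        if op == "storei":    # Mem(a) <-- b
            mem[a] = b
        elif op == "load":    # Reg(a) <-- Mem(b)
            regs[proc][a] = mem[b]
        else:
            raise ValueError(op)
    return mem, regs


def outcomes(p0, p1, mem):
    """Set of values seen in Processor 1's register 0 over all interleavings."""
    return {run(s, mem)[1][1][0] for s in interleavings(p0, p1)}


initial = {2: 0, 3: 1}
racey = outcomes([("storei", 3, 2)], [("load", 0, 3)], initial)
not_racey = outcomes([("storei", 2, 2)], [("load", 0, 3)], initial)
print(racey, not_racey)
```

More than one observed value corresponds to |Σ′| > 1 restricted to Processor 1's register 0, which is the locally racey case; a single value corresponds to the not racey program of Figure 3b.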
² In particular, the computation delineated by the continuations where each processor
executes the conflicting access defines a racey computation.

[Figure 3: two panels, each starting from [3] = 1. (a) Racey Program: Processor 0 executes s [3] 2 and Processor 1 executes l reg0 [3]; depending on the order, reg0 = 1 or reg0 = 2. (b) Not Racey Program: Processor 0 executes s [2] 2 and Processor 1 executes l reg0 [3]; either order gives reg0 = 1.]

Fig. 3. Racey and Not Racey Computations. A locally racey computation is
shown in (a). A not racey computation is shown in (b). Not racey computations
can be represented as a commuting diagram because the “final” configurations
are equivalent.

Definition 6 (Fairness Constraint).
A fairness constraint is a configuration formula φ. Given a computation
π ∈ ΠP and a fairness constraint φ, π satisfies φ iff the fairness constraint
is true infinitely often along π (i.e., π |= □♦(φ)).
Definition 7 (Fair Execution).
Let Fc = {φ1, . . . , φn} be a set of n fairness constraints. Let Π =
{π0, . . .} be the set of computations that describe all executions of program P.
We call the set F = {π ∈ Π | π |= ⋀_{φi ∈ Fc} □♦(φi)} the fair executions of P with respect to Fc.
Interleaving semantics trade parallel/independent processor execution for a model
that allows only one processor, at a time, to perform a step in the computation.
In this model, it’s possible to produce a computation where only one processor
makes steps in the computation, even though this computation is not observable
by any real system. Fairness constraints and the resulting fair execution paths
are used to restrict verification to realizable computations.
Definition 8 (Roots).
The “Roots” of a program are a set of distinguished objects that are
reachable (accessible via the normal rules of memory access).
The objects that comprise the roots of RCU “Classic” are the global RCU
pointer, g, and the Accessor’s local RCU pointer, g1.
Definition 9 (Live Object).
An object is live if it is in the set of roots or is in the transitive closure
of objects reachable from any object in the set of roots.

Definition 10 (Memory Safe).
A computation is memory safe if it does not dereference Nil and it does
not access memory that has been freed. The two instructions that dereference pointers are di and dr; if either dereferences Nil, the resulting
continuation is Error. Also, if either accesses memory that has been
freed, the resulting continuation is Error. Thus, computation π is memory safe if it does not lead to the Error continuation:
Memory Safe π ⇔
π |= ¬♦ (Continuation0 = Error ∨ Continuation1 = Error)
Definition 11 (Variable Use/Def ).
The use of a variable is the set of positions of those statements where
the variable appears on the right-hand side of an expression.
The def of a variable is the set of positions of those statements where the
variable appears on the left-hand side of an expression. Since statements
are not labeled in our language, the position of a definition is the continuation
whose first command is that definition.
Definition 12 (Reaching Definition).
The reaching definition for a particular use of a variable is the set of definitions for the variable that “reach” that use: definition d of variable
t reaches a statement u that “uses” the variable t if there is a sequence
of configurations from the definition d to u in which no other
configuration defines t.
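Reaching definitions are usually computed as a fixed point over a control-flow graph. The sketch below is a standard iterative dataflow pass, not a construction taken from the report; the four-position loop it analyzes is a hypothetical accessor in which a dr 0 has been placed between the load and the loadi, and the variable name "r0" stands for Processor 1's register 0.

```python
def reaching_definitions(defs, succ):
    """Iterative reaching-definitions analysis.

    defs[i] is the variable defined at position i (or None);
    succ[i] lists the positions that may execute right after i.
    Returns, for each position, the set of (variable, position) definitions
    that reach its entry.
    """
    n = len(defs)
    in_sets = [set() for _ in range(n)]
    out_sets = [set() for _ in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            new_in = set().union(*(out_sets[p] for p in range(n) if i in succ[p]))
            if defs[i] is None:
                new_out = set(new_in)
            else:
                # A definition kills every earlier definition of the same variable.
                new_out = {d for d in new_in if d[0] != defs[i]} | {(defs[i], i)}
            if new_in != in_sets[i] or new_out != out_sets[i]:
                in_sets[i], out_sets[i] = new_in, new_out
                changed = True
    return in_sets


# Hypothetical loop: 0: load 0 3   1: dr 0   2: loadi 0 Nil   3: resched 1
defs = ["r0", None, "r0", None]
succ = {0: [1], 1: [2], 2: [3], 3: [0]}
reach = reaching_definitions(defs, succ)
print(reach[1])  # definitions of r0 that reach the dr 0 at position 1
```

In this example only the load at position 0 reaches the use at position 1, because the loadi at position 2 is killed by the load before control comes back around the loop.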
Definition 13 (RCU “Classic” Program).
P0 = load 0 1;store 2 0;storei 2 2;storei 3 2;resched 1;resched 0;storei 0 0;halt
P1 = forever (load 0 3;resched 1)
Initial configuration:
⟨P0 ∥ P1, σ⟩, where
σ = { Mem(0) = 1, Mem(1) = 1, Mem(2) = 0, Mem(3) = 1,
      ReadyList0 = ∅,
      ReadyList1 = ∅,
      Processor1.Regs(0) = Nil }
P0 is called the “mutator” program and it executes on Processor 0. Similarly, P1 is called the “accessor” program and it executes on Processor
1.
We also define an instrumented version of the accessor program:
I1 = forever (load 0 3;loadi 0 Nil;resched 1)

The instrumented version differs only by the addition of a load immediate
that “kills” the definition of register 0. The initial configuration:
hP0 k I1 , σi
is called Instrumented RCU “Classic”.
The implementations of RCU make use of certain memory locations in
particular ways:
M em(0) = 0 ⇒ M em(1) is free.
M em(1) is the “old” object.
M em(2) is the “new” object.
M em(3) is the global pointer, g.
Section 7 gives an overview of RCU “Classic”.
Definition 14 (Read-Side Critical Section).
The RCU Read-Side Critical Section is the code region in an RCU Accessor program where operations take place that involve RCU protected
objects. In a traditional locking environment, one would find this region
bounded by lock and unlock directives.
RCU Read-Side Critical Sections do not contain synchronizing events
such as the resched command.
8.2 Assumptions

Assumption 1 (Fair Progress Constraint)
Continuation0 = Halt
Let Π = {π0, . . .} be the set of computations that describe execution of RCU
(i.e., the initial configuration of each π ∈ Π is ⟨P0 ∥ P1, σ⟩).
The set of fair executions of RCU, Πfair, with respect to the fair progress
constraint is given by
Πfair = {π ∈ Π | π |= ♦ (Continuation0 = Halt)}
Similarly, let Γ = {γ0, . . .} be the set of computations that describe execution of
Instrumented RCU (i.e., the initial configuration of each γ ∈ Γ is ⟨P0 ∥ I1, σ⟩).
The set of fair executions of Instrumented RCU, Γfair, with respect to the
fair progress constraint is given by
Γfair = {γ ∈ Γ | γ |= ♦ (Continuation0 = Halt)}

8.3 Verification

In this section, we prove that RCU implementations that conform to particular
restrictions are memory safe and do not collect live objects. We also show how
RCU achieves synchronization without using synchronization primitives.
Our proof is structured as follows. First, we prove that every fair execution of
RCU reaches a state such that Processor 0 is “blocked” waiting for Processor 1. In
this state, Processor 1 can always allow Processor 0 to proceed. By instrumenting
the code, it is easy to prove that the “old” object is not live in this state. We
show that this state always precedes collection; thus, instrumented RCU cannot
collect live objects.
To make it easy to show that RCU is memory safe (doesn’t access nil pointers
or retrieve freed memory), we instrument the semantics to generate an error
continuation if RCU accesses freed memory or attempts to dereference a nil
pointer. We then prove that instrumented RCU cannot reach a configuration
where the continuation is error.
We next show that the restriction on fair executions can be eliminated, thus
proving that every execution of instrumented RCU using instrumented semantics
doesn’t collect live objects and is memory safe. Lastly, we introduce restrictions
on the local RCU pointer’s def/use chain that allow us to remove the instrumentation. This establishes that RCU implementations that conform to these
restrictions are memory safe and do not collect live objects.
8.3.1 Instrumented RCU does not collect live objects
Let
f ≜ Continuation0 = R ∧
    Mem = σ.Mem[Mem(2) = 2, Mem(3) = 2] ∧
    {(σ.Processor0[regs(0) = 1], resched 0;storei 0 0;halt)} ⊆ ReadyList1
be the configuration formula describing the configuration arising after the Mutator (Processor 0) executes the resched 1 instruction. This is the configuration
in which Processor 0 “blocks” waiting for Processor 1.
Lemma 1.
∀γ ∈ Γfair γ |= ♦ f
Proof:
By definition of sequencing, cmds are executed in sequence.
Since, for all fair executions, Processor 0 “reaches” the halt instruction
of P0, it must be the case that the instructions located before halt were
first executed. In particular, the continuation resched 1;resched 0;storei 0 0;halt
must have been reached. And since only Processor 0 updates memory,
Mem(2) = 2, Mem(3) = 2 and Processor 0’s Register 0 is 1.
By definition of the semantics of the resched 1 instruction, whenever
Processor 0 executes resched 1, Continuation0 = R. Thus,
∀γ ∈ Γfair γ |= ♦ f
∎
Let
g ≜ f ∧ Continuation1 = R ∧
    {(σ.Processor1[regs(0) = Nil], forever (load 0 3;loadi 0 Nil;resched 1))} ⊆ ReadyList1
be the configuration formula describing the configuration where Processor 0 is
“blocked” waiting for Processor 1 and Processor 1 is able to continue either with
the Accessor or with the Mutator. Note that the Mutator immediately reschedules
itself to run on Processor 0, thereby “unblocking” Processor 0.
Lemma 2.
∀γ ∈ Γfair γ |= □(f ⇒ ♦g)
Proof:
Processor 0 cannot transition out of the configuration where its continuation is R unless its ReadyList is not empty. ReadyList0 is populated
by the mutator continuation that executes on Processor 1 (i.e., resched
0;...). For that continuation to run, Processor 1’s accessor program
must have executed its resched instruction.
Since Processor 1 executes independently of Processor 0, at the time Processor 0 executes resched 1 and enters the R continuation, Processor 1’s
continuation can be any legal sequence; if it is R, the Lemma is proved.
If it is not R, then since the accessor program loops, it will eventually
execute the resched 1 instruction, causing it to enter the R continuation,
and it will have tasks on its ReadyList.
∎
Lemma 3.
(♦p ∧ □(p ⇒ ♦q)) ⇒ ♦q
Proof:
1. (♦p ∧ □(p ⇒ ♦q))    Assumption
2. ♦p                   (1), Logic: a ∧ b ⇒ a
3. □(p ⇒ ♦q)            (1), Logic: a ∧ b ⇒ b
4. p                    Definition of ♦, (2)
5. ♦q                   Definition of □, (4), (3), modus ponens
Discharging the assumption gives the lemma:
(♦p ∧ □(p ⇒ ♦q)) ⇒ ♦q
∎
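The same argument can be spelled out over positions of a computation. The derivation below is a sketch that assumes the usual LTL satisfaction relation $\pi, i \models \psi$ ("ψ holds at position i of π"); that indexed notation is an assumption of this sketch and is not introduced in the report.

```latex
\begin{align*}
\pi, 0 \models \Diamond p
  &\;\Rightarrow\; \exists i \ge 0.\ \pi, i \models p
  && \text{definition of } \Diamond \\
\pi, 0 \models \Box (p \Rightarrow \Diamond q)
  &\;\Rightarrow\; \pi, i \models (p \Rightarrow \Diamond q)
  && \text{definition of } \Box \text{, at that } i \\
  &\;\Rightarrow\; \pi, i \models \Diamond q
  && \text{modus ponens} \\
  &\;\Rightarrow\; \exists j \ge i.\ \pi, j \models q
  && \text{definition of } \Diamond \\
  &\;\Rightarrow\; \pi, 0 \models \Diamond q
  && \text{since } j \ge i \ge 0
\end{align*}
```

Step 4 of the proof corresponds to choosing the position i at which p holds; the conclusion ♦q is then read back at position 0.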
Lemma 4.
∀γ ∈ Γfair γ |= ♦ g
Proof:
∀γ ∈ Γfair γ |= ♦ f             Lemma 1
∀γ ∈ Γfair γ |= □(f ⇒ ♦ g)      Lemma 2
∀γ ∈ Γfair γ |= ♦ g             Lemma 1, Lemma 2, Lemma 3
∎
By Lemma 4, the configuration ⟨R ∥ R, σ″⟩, where
σ″ = { Processor0 = σ.Processor0[Regs(0) = 1],
       Processor1 = σ.Processor0[Regs(0) = 0],
       Mem = σ.Mem[Mem(2) = 2, Mem(3) = 2],
       ReadyList0 = ∅,
       ReadyList1 = {
         (σ.Processor0[regs(0) = 1],
          resched 0;storei 0 0;halt),
         (σ.Processor1[regs(0) = Nil],
          forever (load 0 3;loadi 0 Nil;resched 1))
       } }
is inevitable: ⟨P0 ∥ I1, σ⟩ →* ⟨R ∥ R, σ″⟩.
Lemma 5.
live x ⇔
Mem(3) = x ∨
∃(regs, c) ∈ ReadyList1 (regs(0) = x ∧ c = I1) ∨
(Continuation1 = I1 ∧ Processor1.regs(0) = x) ∨
(Continuation1 = loadi 0 Nil;resched 1;I1 ∧ Processor1.regs(0) = x) ∨
(Continuation1 = resched 1;I1 ∧ Processor1.regs(0) = x)
where x ∈ loc.
Proof:
An RCU object is live (Definition 9) only if it can be reached via the global
RCU pointer or the local RCU pointer (these two pointers are the only
objects in the set of roots; Definition 8).

(1) Prove the “⇐” part:
(Mem(3) = x) ∨
(∃(regs, c) ∈ ReadyList1 (regs(0) = x ∧ c = I1)) ∨
(Continuation1 = I1 ∧ Processor1.regs(0) = x) ∨
(Continuation1 = loadi 0 Nil;resched 1;I1 ∧ Processor1.regs(0) = x) ∨
(Continuation1 = resched 1;I1 ∧ Processor1.regs(0) = x)
⇒ live x, where x ∈ loc.
Since the global RCU pointer is memory location 3,
(1) Mem(3) = x ⇒ live x, x ∈ loc

The other way for an object to be live is via the local RCU pointer. Although the RCU local pointer is Processor 1’s Register 0, Processor 1’s
Register 0 is not always the local RCU pointer. When Processor 1 is
executing the RCU Accessor program (I1 ), Processor 1’s Register 0 is
the RCU local pointer. When Processor 1 is not executing the RCU Accessor program, the RCU local pointer is on ReadyList1 in the context
portion of the context-continuation pair associated with the RCU Accessor Program. Thus, when Processor 1 is not executing the RCU Accessor
Program,
(2) ∃(regs, c) ∈ ReadyList1 (regs(0) = x ∧ c = I1) ⇒ live x, x ∈ loc ³

The following equivalence (obtained by applying the definition of the
forever instruction to the RCU Accessor program) is used to define
the possible continuations of the RCU Accessor Program.
I1 = load 0 3;loadi 0 Nil;resched 1;I1
This gives rise to the following cases:
(a) (Continuation1 = I1 ∧ Processor1.regs(0) = x) ⇒ live x
(b) (Continuation1 = loadi 0 Nil;resched 1;I1 ∧ Processor1.regs(0) = x) ⇒ live x
(c) (Continuation1 = resched 1;I1 ∧ Processor1.regs(0) = x) ⇒ live x
x ∈ loc
Case (c) is unnecessary because the definition of register 0 is killed by
the previous instruction (i.e., writing a Nil to Processor 1’s Register 0
ensures that Register 0 cannot point to an object). However, we will later
consider real RCU (without instrumentation), so we won’t optimize
by removing this case.

³ (2) means that if the accessor’s register 0 “points” to x then x is live, even when the
accessor is switched out.

(1), (2), and Cases (a), (b), (c) give:
(Mem(3) = x) ∨
(∃(regs, c) ∈ ReadyList1 (regs(0) = x ∧ c = I1)) ∨
(Continuation1 = I1 ∧ Processor1.regs(0) = x) ∨
(Continuation1 = loadi 0 Nil;resched 1;I1 ∧ Processor1.regs(0) = x) ∨
(Continuation1 = resched 1;I1 ∧ Processor1.regs(0) = x)
⇒ live x, where x ∈ loc.
This proves the “⇐” part.
(2) Prove the “⇒” part:
live x ⇒
Mem(3) = x ∨
∃(regs, c) ∈ ReadyList1 (regs(0) = x ∧ c = I1) ∨
(Continuation1 = I1 ∧ Processor1.regs(0) = x) ∨
(Continuation1 = loadi 0 Nil;resched 1;I1 ∧ Processor1.regs(0) = x) ∨
(Continuation1 = resched 1;I1 ∧ Processor1.regs(0) = x)
where x ∈ loc.
Proof:
By definition (Definitions 8, 9),
live x ⇒ RCU Global pointer = x ∨ RCU Local pointer = x
In proving the “left” arrow portion of the proof, we showed that the RCU
Local pointer is Processor 1’s Register 0 when Processor 1’s continuation is the RCU Accessor program; otherwise the RCU Local pointer is
on ReadyList1 in the context portion of the context-continuation pair
associated with the RCU Accessor Program.
Since the RCU global pointer is Mem(3) and because the case analysis performed in the “left” arrow portion is exhaustive, we can reverse
the implications in (1), (2), and cases (a), (b), and (c), which proves the
“right” part.
Since we proved the left and right parts, the lemma follows:
live x ⇔
Mem(3) = x ∨
∃(regs, c) ∈ ReadyList1 (regs(0) = x ∧ c = I1) ∨
(Continuation1 = I1 ∧ Processor1.regs(0) = x) ∨
(Continuation1 = loadi 0 Nil;resched 1;I1 ∧ Processor1.regs(0) = x) ∨
(Continuation1 = resched 1;I1 ∧ Processor1.regs(0) = x)
where x ∈ loc.
∎

The Mutator collects the old object by “freeing” the memory associated with
the old object. As described earlier, memory location 0 tracks the type-state of
the old object (memory location 1). The storei 0 0 statement “frees” the old
object; if memory location 1 is live when the Mutator frees it, then a live object
has been collected. This gives the equivalence⁴:
collectLive x ⇔ (Continuation0 = storei 0 0;halt ∧ live x)
Lemma 6 (Fair executions of Instrumented RCU do not collect live
objects).
∀γ ∈ Γfair γ |= □¬(collectLive 1)
Proof:
Every fair execution of Instrumented RCU passes through the configuration defined in Lemma 4. This configuration is characterized by both
processors’ continuations being R. In this configuration, the global RCU
pointer references the “new” object and the local RCU pointer references
Nil. Thus, in this “⟨R ∥ R, σ″⟩” configuration, the old object (memory
location 1) is not live. Because the local RCU pointer is only defined
by the global pointer, any future definition of the local pointer cannot
reference the “old” object.
Since every configuration where the Mutator’s continuation equals storei 0 0;halt
is dominated by the “⟨R ∥ R, σ″⟩” configuration,
∀γ ∈ Γfair γ ⊭ ♦(collectLive 1)
The lemma
∀γ ∈ Γfair γ |= □¬(collectLive 1)
follows from □ ⇔ ¬♦¬ in LTL.
∎
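The claim can also be explored mechanically by enumerating every reachable configuration of the program in Definition 13 and evaluating collectLive 1 with the liveness characterization of Lemma 5. The sketch below is not the report's semantics; it rests on several assumptions: only register 0 of each processor is modeled, Processor 0's initial register is taken to be Nil, a processor in R resumes any entry of its own ReadyList (restoring the saved register), and the `LOOP` marker is a hypothetical encoding of forever.

```python
from collections import deque

NIL = None        # stands in for Nil
LOOP = ("loop",)  # hypothetical marker: "repeat the accessor body"

MUTATOR = (("load", 0, 1), ("store", 2, 0), ("storei", 2, 2), ("storei", 3, 2),
           ("resched", 1), ("resched", 0), ("storei", 0, 0), ("halt",))
COLLECT = MUTATOR[-2:]  # the continuation "storei 0 0; halt"


def accessor(instrumented):
    kill = (("loadi", 0, NIL),) if instrumented else ()
    return (("load", 0, 3),) + kill + (("resched", 1), LOOP)


def put(tup, i, value):
    return tup[:i] + (value,) + tup[i + 1:]


def successors(state, acc):
    conts, regs, ready, mem = state
    for p in (0, 1):
        cont = conts[p]
        if cont == "Halt":
            continue
        if cont == "R":  # assumed rule: resume any saved (register, continuation)
            for entry in ready[p]:
                yield (put(conts, p, entry[1]), put(regs, p, entry[0]),
                       put(ready, p, ready[p] - {entry}), mem)
            continue
        op = cont[0]
        rest = acc if cont[1:] == (LOOP,) else cont[1:]  # unfold forever
        nxt = put(conts, p, rest)
        if op[0] == "load":       # Reg(0) <-- Mem(addr)
            yield (nxt, put(regs, p, mem[op[2]]), ready, mem)
        elif op[0] == "loadi":    # Reg(0) <-- value
            yield (nxt, put(regs, p, op[2]), ready, mem)
        elif op[0] == "store":    # Mem(addr) <-- Reg(0)
            yield (nxt, regs, ready, put(mem, op[1], regs[p]))
        elif op[0] == "storei":   # Mem(addr) <-- value
            yield (nxt, regs, ready, put(mem, op[1], op[2]))
        elif op[0] == "resched":  # save context on the target ReadyList, enter R
            saved = ready[op[1]] | {(regs[p], rest)}
            yield (put(conts, p, "R"), regs, put(ready, op[1], saved), mem)
        elif op[0] == "halt":
            yield (put(conts, p, "Halt"), regs, ready, mem)


def live(x, state, acc):
    """Liveness of location x, following the disjuncts of Lemma 5."""
    conts, regs, ready, mem = state
    running_accessor = isinstance(conts[1], tuple) and conts[1][-1] == LOOP
    return (mem[3] == x
            or any(r == x and c == acc for r, c in ready[1])
            or (running_accessor and regs[1] == x))


def collects_live(instrumented):
    """True if some reachable configuration satisfies collectLive 1."""
    acc = accessor(instrumented)
    init = ((MUTATOR, acc), (NIL, NIL), (frozenset(), frozenset()), (1, 1, 0, 1))
    seen, todo = {init}, deque([init])
    while todo:
        state = todo.popleft()
        if state[0][0] == COLLECT and live(1, state, acc):
            return True
        for nxt in successors(state, acc):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False


print(collects_live(instrumented=True))
```

Under these assumptions the search visits all interleavings, fair or not, so for the instrumented accessor it also exercises the statement of Theorem 1. Running the same search with `instrumented=False` does report a configuration satisfying collectLive 1 in this sketch, because the accessor's saved context still holds the old pointer when the Mutator is about to collect; that stale definition is what the loadi 0 Nil instrumentation kills.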
Theorem 1 (Instrumented RCU does not collect live objects).
∀γ ∈ Γ γ |= □¬(collectLive 1)
Proof:
The Fair Progress Constraint (Assumption 1) partitions the set of Instrumented RCU computations such that every computation is either in
the set of fair executions or it is not:
∀γ ∈ Γ (γ ∈ Γfair ∨ γ ∉ Γfair)
Since no γ ∉ Γfair collects storage,
∀γ ∈ Γ γ |= ‘collect live object’ ⇒ γ ∈ Γfair
Thus, to prove the theorem that Instrumented RCU does not collect live
objects, it is sufficient to prove that no fair execution of Instrumented
RCU collects a live object, that is
∀γ1 ∈ Γfair γ1 |= □¬(collectLive 1) ⇒
(∀γ2 ∈ Γ γ2 |= □¬(collectLive 1))
Since the antecedent is given by Lemma 6, Modus Ponens and renaming
give the theorem:
∀γ ∈ Γ γ |= □¬(collectLive 1)
∎

⁴ As discussed earlier, a future version of the semantics will provide allocate and free;
collecting a live object becomes (Continuation0 = free(x); ∧ live x).
8.3.2 Instrumented RCU with restrictions is memory safe
A continuation is a sequence. Each command in the sequence has an implicit
position. Labeling a continuation gives an implicit position an explicit name.
For simplicity, the first command of the initial continuation of a multiprocessor
program is labeled ℓ0, the second ℓ1, and so on. For commands comprising the
body of a forever loop, only the body is labeled. For example, the instrumented
RCU Accessor continuation (I1) is labeled:
I1 = ℓ0 : load 0 3; ℓ1 : loadi 0 Nil; ℓ2 : resched 1; I1
Let ℓ0 ≺ ℓ1 mean that the program statement located at position ℓ0 is statically
located before the program statement located at position ℓ1. Given a continuation C and a command c, Pos C c gives the position (label) of c within C:
Pos C c ℓ ⇔ C = · · · ℓ : c · · ·
I1’s Read-Side Critical Section begins at ℓ0 and ends at ℓ1.
Applications of RCU need to dereference the RCU local pointer. Showing that
RCU is memory safe (Definition 10) requires showing that dereferencing the
RCU local pointer does not lead to Error as Processor 1’s continuation. To
ensure that dereferencing the RCU local pointer is safe, we give the following
restrictions:
Restriction 1
Local RCU pointer use occurs only within the Read-Side Critical Section.
∀ℓ Pos I1 dr 0 ℓ ⇒ (ℓ ⪰ ℓ0 ∧ ℓ ≺ ℓ1)
Restriction 2
For each use of the local RCU pointer, every reaching definition is statically located before the use.
∀ℓ Pos I1 dr 0 ℓ ⇒ ∀ℓ′ RDef Processor1.regs(0) ℓ ℓ′ ⇒ ℓ′ ≺ ℓ
Restriction 3
For each use of the local RCU pointer, every reaching definition uses
only the global RCU pointer.
∀ℓ Pos I1 dr 0 ℓ ⇒ ∀ℓ′ RDef Processor1.regs(0) ℓ ℓ′ ⇒ usesOnly Mem(3) ℓ′
Lemma 7 (¬ Restriction 1 ⇒ ¬ Memory Safe).
(∃ℓ Pos I1 dr 0 ℓ ∧ (ℓ ≺ ℓ0 ∨ ℓ ⪰ ℓ1)) ⇒
∃γ ∈ Γ γ |= ♦ (Continuation1 = Error)
Proof:
(1) ∃ℓ Pos I1 dr 0 ℓ ∧ ℓ ⪰ ℓ1 ⇒
∃γ ∈ Γ γ |= ♦ (Continuation1 = Error)
Since the command at ℓ1 assigns Nil to the local RCU pointer, placing
the dr 0 instruction immediately after ℓ1 ensures the transition ⟨dr, σ⟩ →
⟨Error, σ⟩.

(2) ∃ℓ Pos I1 dr 0 ℓ ∧ ℓ ≺ ℓ0 ⇒
∃γ ∈ Γ γ |= ♦ (Continuation1 = Error)
Since the initial value of the local RCU pointer is Nil (Definition 13),
dereferencing it before defining it ensures the transition ⟨dr, σ⟩ → ⟨Error, σ⟩.
Thus,
(∃ℓ Pos I1 dr 0 ℓ ∧ (ℓ ≺ ℓ0 ∨ ℓ ⪰ ℓ1)) ⇒
∃γ ∈ Γ γ |= ♦ (Continuation1 = Error)
∎
Lemma 8 (¬ Restriction 2 ⇒ ¬Memory Safe).
(∃ℓ Pos I1 dr 0 ℓ ∧
∃ℓ′ RDef Processor1.regs(0) ℓ ℓ′ ∧
ℓ′ ⪰ ℓ) ⇒
∃γ ∈ Γ γ |= ♦ (Continuation1 = Error)
Proof:
A definition located after a use of the local RCU pointer that reaches that
use must be located before the resched 1 instruction. Locations before
the resched 1 instruction can be partitioned into two regions: positions
before the local RCU pointer is assigned Nil (ℓ1) and positions after it
is assigned Nil.
Definitions located after ℓ1 can acquire an object via the global RCU
pointer that is “freed” by the RCU mutator before the accessor’s continuation transitions from R to I1.
Definitions before ℓ1 are “killed” by the definition at ℓ1 that assigns Nil
to the local RCU pointer.
In either case, the transition ⟨dr, σ⟩ → ⟨Error, σ⟩ is assured, giving
(∃ℓ Pos I1 dr 0 ℓ ∧
∃ℓ′ RDef Processor1.regs(0) ℓ ℓ′ ∧
ℓ′ ⪰ ℓ) ⇒
∃γ ∈ Γ γ |= ♦ (Continuation1 = Error)
∎
Lemma 9 (Restrictions 1, 2, 3 are necessary for Instrumented RCU
to be Memory Safe).
Memory Safe ⇒ R1 ∧ R2 ∧ R3
Proof:
Contrapositive:
¬R1 ∨ ¬R2 ∨ ¬R3 ⇒ ¬Memory Safe
Rewriting using the definition “⇒” ≜ (a ⇒ b ⇔ ¬a ∨ b):
(R1 ∧ R2 ∧ R3) ∨ (¬Memory Safe)
Distributing the “or” over the “and”:
(R1 ∨ ¬Memory Safe) ∧ (R2 ∨ ¬Memory Safe) ∧ (R3 ∨ ¬Memory Safe)
Rewriting using the definition of “⇒”:
(¬R1 ⇒ ¬Memory Safe) ∧
(¬R2 ⇒ ¬Memory Safe) ∧
(¬R3 ⇒ ¬Memory Safe)
This gives the equivalence:
(Memory Safe ⇒ R1 ∧ R2 ∧ R3) ⇔
  ((¬R1 ⇒ ¬Memory Safe) ∧
   (¬R2 ⇒ ¬Memory Safe) ∧
   (¬R3 ⇒ ¬Memory Safe))
By Lemma 7: ¬R1 ⇒ ¬Memory Safe
By Lemma 8: ¬R2 ⇒ ¬Memory Safe
If the local RCU pointer is defined to reference objects other than those
to which the global RCU pointer refers, then we cannot assert that RCU
is memory safe. So,
¬R3 ⇒ ¬Memory Safe
Thus,
(¬R1 ⇒ ¬Memory Safe) ∧
(¬R2 ⇒ ¬Memory Safe) ∧
(¬R3 ⇒ ¬Memory Safe)
Then, by the above equivalence:
(Memory Safe ⇒ R1 ∧ R2 ∧ R3)
∎
Lemma 10 (Restrictions 1, 2, 3 are sufficient for Instrumented RCU
to be Memory Safe).
R1 ∧ R2 ∧ R3 ⇒ Memory Safe
Proof:
Contrapositive
¬ Memory Safe ⇒ ¬ (R1 ∧ R2 ∧ R3)
Distribute ¬ over conjunction in consequent:
¬ Memory Safe ⇒ (¬ R1 ∨ ¬ R2 ∨ ¬ R3)
Rewrite, expanding definitions
∃γ ∈ Γ γ |= ♦ (Continuation1 = Error) ⇒
(∃ℓ Pos I1 dr 0 ℓ ⇒ ℓ ≺ ℓ0 ∨ ℓ ⪰ ℓ1) ∨
(∃ℓ Pos I1 dr 0 ℓ ⇒
∃ℓ′ RDef Processor1.regs(0) ℓ ℓ′ ⇒ ℓ′ ⪰ ℓ) ∨
(∃ℓ Pos I1 dr 0 ℓ ⇒
∃ℓ′ RDef Processor1.regs(0) ℓ ℓ′ ⇒
¬usesOnly Mem(3) ℓ′)
By definition, π |= ♦ (Continuation1 = Error) implies that the dr 0
command transitioned to “Error”; this can only happen if the local RCU
pointer was Nil when dereferenced or the object to which it pointed was
“Freed”.
A dereference operation occurs either inside the Read-Side Critical Section or outside it. If the dereference operation occurs outside of the critical region, Continuation1 = Error and Restriction 1 is violated.
If the dereference operation occurs within the critical section, then the
error continuation results only if the dereference operation attempts to
dereference Nil or the operation attempts to dereference a memory location that has been Freed. Dereferencing Nil can only occur if there is
no reaching definition for that use that is before the use. Dereferencing
a Freed location can only occur if there is a reaching definition that is
located after the use. Both of these violate Restriction 2.
This shows that it is not possible to reach the “Error” continuation if
the accessor meets Restrictions 1 and 2 and every definition of the local
RCU uses only the global RCU pointer. Thus, if Restrictions 1 and 2
are met, then if the accessor reaches the “Error” continuation, it must
be the case that some reaching definition for the local RCU pointer does
not reference the global RCU pointer. This violates Restriction 3.
Thus,
(¬ Memory Safe ⇒ ¬ R1) ∨
(¬ Memory Safe ⇒ ¬ R2) ∨
(¬ Memory Safe ⇒ ¬ R3)
Which gives the lemma
R1 ∧ R2 ∧ R3 ⇒ Memory Safe
∎
Theorem 2 (Restrictions 1, 2, 3 are necessary and sufficient for Instrumented RCU to be Memory Safe).
Memory Safe ⇔ R1 ∧ R2 ∧ R3
Proof:
Memory Safe ⇒ R1 ∧ R2 ∧ R3      Lemma 9
R1 ∧ R2 ∧ R3 ⇒ Memory Safe      Lemma 10
Memory Safe ⇔ R1 ∧ R2 ∧ R3      Definition of ⇔
∎
8.3.3 Uninstrumented RCU has the live object and (with restrictions)
memory safety properties
Let L be the uninstrumented (processor) semantics and L0 be the instrumented
semantics. As noted in Sections 5 and 6, the difference between the instrumented
and uninstrumented syntax and semantics are Nil as a val and the dr and di
instructions transitions to Error. Note that storing Nil to a memory location
(or register) is a different transition than storing digits to a memory location
(or register).
Recall that a computation is a sequence of configurations (Definition 4) such
that the transition from one configuration to the next is determined by the
application of a semantic rule. The semantics of a language is defined by the set
33

semantic rules that define the transitions — each semantic rule is a potential
transition in the execution of some program.
Let T = {t₀, …, tₙ} be the set of transitions for L (uninstrumented semantics).
Let the set of transitions T′ = {t₀, …, tₙ, …, tₘ} be those for the instrumented
semantics, L′. T ⊂ T′ ⇒ L ⊂ L′ and, in fact, T′ differs from T only in that T′
has the extra transitions tₙ₊₁, …, tₘ; that is, T′ = T ∪ {tₙ₊₁, …, tₘ}.
Let P be the uninstrumented RCU “Classic” program and P′ be the instrumented RCU “Classic” program. As described in Definition 13, the instrumented
RCU “Classic” program adds the instruction, load 0 Nil, at the end of the RCU
Read-side Critical Section.
L′(P′) = {h′₀, …} is the set of all executions of instrumented RCU (P′) with
respect to the instrumented semantics, L′. Similarly, L(P) = {h₀, …} is the
set of all executions of uninstrumented RCU with respect to the uninstrumented semantics, L. Since L and L′ differ only with respect to transitions
{tₙ₊₁, …, tₘ}, if no computation in the L′ executions of P′ uses those transitions, then L′(P′) = L(P).
Lemma 11.
(∀t ∈ {tₙ₊₁, …, tₘ}, ∀h′ ∈ L′(P′): h′ ⊭ ⟨c, σ⟩ →ᵗ ⟨c′, σ′⟩) ⇒
L′(P′) = L(P)
where ⟨c, σ⟩ →ᵗ ⟨c′, σ′⟩ denotes a step taken by transition t.
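The idea behind Lemma 11 can be illustrated on a toy transition system: adding rules whose source configurations are never reached leaves the set of executions unchanged, while adding a reachable rule does not. This is a minimal sketch under simplifying assumptions; the rule triples, configuration names such as c0, and the function executions are hypothetical and stand in for the report's semantic rules, and the same program is used on both sides.

```python
def executions(rules, start, max_steps=10):
    """Return the set of traces (tuples of configurations) from start.

    rules is a set of (label, source, target) triples. A trace ends when
    no rule applies or when max_steps transitions have been taken.
    """
    traces = set()

    def walk(trace):
        successors = [dst for (_, src, dst) in rules if src == trace[-1]]
        if not successors or len(trace) - 1 >= max_steps:
            traces.add(tuple(trace))
            return
        for dst in successors:
            walk(trace + [dst])

    walk([start])
    return traces


# Hypothetical uninstrumented rules T and two candidate sets of extra rules.
T = {("t0", "c0", "c1"), ("t1", "c1", "c2")}
unreachable_extra = {("t2", "c9", "Error")}  # source c9 is never reached
reachable_extra = {("t2", "c1", "Error")}    # source c1 is reached

same = executions(T | unreachable_extra, "c0") == executions(T, "c0")
differ = executions(T | reachable_extra, "c0") != executions(T, "c0")
print(same, differ)
```

In the report the unreachability of the extra transitions is not assumed but is what Restrictions 1, 2, and 3 are later shown to guarantee.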

Theorem 1 shows that an instrumented version of RCU executing an instrumented semantics does not collect live objects. Theorem 2 shows that an instrumented version of RCU executing an instrumented semantics is memory safe if
and only if certain restrictions are obeyed. By instrumenting the code and the
semantics, these properties were stated simply and the properties easily proved.
∀π′ ∈ L′(P′) π′ |= Q
To prove that the uninstrumented code executing the uninstrumented semantics
has the same properties, identify restrictions R such that
R(P′) ⇒ L′ = L
∧ R(P′) ⇒ P′ = P
Then,
R(P ) ⇒ ∀π ∈ L(P ) π |= Q
Lemma 12.
(∀π′ ∈ L′(P′) π′ |= Q ∧
(R(P′) ⇒ L′ = L ∧ R(P′) ⇒ P′ = P)) ⇒
R(P) ⇒ ∀π ∈ L(P) π |= Q
Proof:
Rewriting and renaming.
□
Lemma 13.
R1 ∧ R2 ∧ R3 ⇒ L = L′
Proof:
By Lemma 10, implementations of RCU that satisfy R1, R2, and
R3 cannot make transitions to the error continuation; thus, those transitions are not realizable. The transitions to the error continuation are
what distinguish the instrumented semantics from the uninstrumented
semantics.
Then, by Lemma 11, when the transitions to the error continuation cannot
be realized by a program P, the set of computations defining the execution
of P using the instrumented semantics, L′(P), is equal to the set of
computations obtained using the uninstrumented semantics, L(P). This gives
the lemma:
R1 ∧ R2 ∧ R3 ⇒ L = L′
□
Lemma 14.
R1 ∧ R2 ∧ R3 ⇒ P = P′
Proof:
Figure 4 illustrates the define/usage chains for the local RCU pointer
when the constraints identified by restrictions R1, R2, and R3 are satisfied.
def local RCU Ptr: Processor1.Regs(0) := Nil              (* INSTRUMENTATION *)
I1
def local RCU Ptr: Processor1.Regs(0) := Global RCU Ptr
Use local RCU ptr
def local RCU Ptr: Processor1.Regs(0) := Nil              (* INSTRUMENTATION *)
Resched 1
I1
def local RCU Ptr: Processor1.Regs(0) := Global RCU Ptr
Use local RCU ptr
def local RCU Ptr: Processor1.Regs(0) := Nil              (* INSTRUMENTATION *)

Fig. 4. R(P) ⇒ P = P′. Definitions provided by the instrumented code are
killed by definitions in the uninstrumented code. Between definitions there are
no intervening usages; thus, the instrumented definitions can be eliminated as
dead code.
It is easy to see that instrumented definitions for the local RCU pointer
can be eliminated as dead code (see footnote 5), since every “instrumented” definition is
followed by another “uninstrumented” definition without an intervening
usage [34]. Thus,
R1 ∧ R2 ∧ R3 ⇒ P = P′
□
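The dead-code argument of Lemma 14 can be sketched as a pass over a linearized definition/use chain: a definition is dropped when the next definition or use of the same variable is not a use. The event encoding and the function eliminate_dead_defs below are assumptions made for illustration; the chain mirrors the order of events in Figure 4 and is not taken from an actual compiler.

```python
def eliminate_dead_defs(chain):
    """Drop each definition that is not followed by a use before the next definition."""
    kept = []
    for i, (kind, arg) in enumerate(chain):
        if kind == "def":
            # Kind of the next event that defines or uses the variable, if any.
            following = next(
                (k for k, _ in chain[i + 1:] if k in ("def", "use")), None
            )
            if following != "use":
                continue  # killed by a later definition, or never used
        kept.append((kind, arg))
    return kept


# Hypothetical encoding of the instrumented chain for the local RCU pointer.
instrumented = [
    ("def", "Nil"), ("def", "Global RCU Ptr"), ("use", None), ("def", "Nil"),
    ("resched", None),
    ("def", "Global RCU Ptr"), ("use", None), ("def", "Nil"),
]
uninstrumented = [
    ("def", "Global RCU Ptr"), ("use", None),
    ("resched", None),
    ("def", "Global RCU Ptr"), ("use", None),
]

print(eliminate_dead_defs(instrumented) == uninstrumented)
```

Every definition removed in this example is a Nil definition, which is the sense in which the restricted instrumented program coincides with the uninstrumented one.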
Lemma 15 (Fair executions of RCU implementations that satisfy restrictions R1, R2, and R3 do not collect live objects).
R1 ∧ R2 ∧ R3 ⇒ ∀π ∈ L(P)_fair π |= □ ¬ (collectLive 1)
Proof:
R1 ∧ R2 ∧ R3                                    Assumption
L(P) = L′(P′)                                   Assumption, Lemma 13, Lemma 14
∀γ ∈ L′(P′)_fair γ |= □ ¬ (collectLive 1)       Lemma 6
∀π ∈ L(P)_fair π |= □ ¬ (collectLive 1)         Subst. “=” and renaming

Discharge assumption:
R1 ∧ R2 ∧ R3 ⇒ ∀π ∈ L(P)_fair π |= □ ¬ (collectLive 1)
□
Theorem 3 (RCU implementations that satisfy restrictions R1, R2,
and R3 do not collect live objects).
R1 ∧ R2 ∧ R3 ⇒ ∀π ∈ Π π |= □ ¬ (collectLive 1)
Proof:
The Fair Progress Constraint (Assumption 1) partitions the set of RCU
computations such that every computation is either in the set of fair
executions or it is not:
∀π ∈ Π (π ∈ Π_fair ∨ π ∉ Π_fair)
Since no π ∉ Π_fair collects storage,
∀π ∈ Π π |= “collect live object” ⇒ π ∈ Π_fair
Thus, to prove the theorem that RCU does not collect live objects, it is
sufficient to prove that no fair execution of RCU collects a live object,
that is
(∀π1 ∈ Π_fair π1 |= □ ¬ (collectLive 1)) ⇒
(∀π2 ∈ Π π2 |= □ ¬ (collectLive 1))
Assuming R1 ∧ R2 ∧ R3, the antecedent is given by Lemma 15;
Modus Ponens and renaming give:
∀π ∈ Π π |= □ ¬ (collectLive 1)

Footnote 5: When translated to SSA form, definitions without a following use show up as assignments to variables that are never used.

Discharging the assumption gives the theorem:
R1 ∧ R2 ∧ R3 ⇒ ∀π ∈ Π π |= □ ¬ (collectLive 1)
□
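The lifting step in the proof of Theorem 3 can be illustrated on finite trace prefixes: if unfair executions never collect storage, then checking the “always not collectLive” property on the fair executions is enough to conclude it for all executions. This is a rough sketch only; the dictionary keys collects and live, the sample traces, and the helper names are hypothetical, and a finite prefix can only approximate the temporal property over full computations.

```python
def always(pred, trace):
    """Finite-trace reading of the 'always' operator: pred holds in every configuration."""
    return all(pred(cfg) for cfg in trace)


def collect_live(cfg):
    """A configuration collects a live object if it collects while the object is live."""
    return cfg["collects"] and cfg["live"]


def no_live_collection(trace):
    return always(lambda cfg: not collect_live(cfg), trace)


def holds_for_all(fair, unfair):
    """Conclude the property for every trace from the fair traces alone.

    Relies on the premise that unfair traces never collect storage,
    which is checked explicitly here.
    """
    unfair_never_collect = all(
        always(lambda cfg: not cfg["collects"], t) for t in unfair
    )
    return unfair_never_collect and all(no_live_collection(t) for t in fair)


# Hypothetical sample traces.
fair = [[{"collects": False, "live": True}, {"collects": True, "live": False}]]
unfair = [[{"collects": False, "live": True}, {"collects": False, "live": True}]]

print(holds_for_all(fair, unfair))
```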
Theorem 4 (RCU implementations that satisfy restrictions R1, R2,
and R3 are Memory Safe).
R1 ∧ R2 ∧ R3 ⇒ ∀π ∈ L(P ) π |= Memory Safe
Proof:
R1 ∧ R2 ∧ R3                          Assumption
L(P) = L′(P′)                         Assumption, Lemma 13, Lemma 14
∀γ ∈ L′(P′) γ |= Memory Safe          Modus Ponens, Assumption, Lemma 10
∀π ∈ L(P) π |= Memory Safe            Subst. “=” and renaming

Discharge assumption:
R1 ∧ R2 ∧ R3 ⇒ ∀π ∈ L(P) π |= Memory Safe
□

9 Conclusions

RCU has resisted rigorous analysis because existing semantics-based approaches
underapproximate multiprocessor behavior and the algorithm relies on abstruse
interactions with the operating system that are not reconciled with language semantics. This work tackles the latter by incorporating the relevant operating system context switch behavior in a minimal language capable of expressing RCU.
We define the formal semantics of this language using a small-step operational
semantics that maps language constructs to configuration transformations. We
express the live object and memory safety properties in linear temporal logic and
use the sets of traces comprised of sequences of configurations that are created
by simulated execution of the RCU program via the operational semantics as
models to determine the validity of these properties. We instrument both the
code and the language semantics to simplify the verification effort — we prove
that instrumented RCU executing instrumented semantics does not collect live
objects and is memory safe. We then introduce restrictions on the definition and
use of the local RCU pointers that allow us to eliminate the instrumentation;
thus, RCU implementations that conform to these restrictions do not collect live
objects and are memory safe.
Our work advances the state-of-the-art by capturing the real behavior of an
operating system context switch in a way that exposes the intermediate state of a
system where one thread has been switched out and the next is yet to be switched
in. Our semantics captures this with the “R” configuration. We show that this
intermediate state is used by RCU to achieve synchronization without the use of
synchronization primitives because the operating system blocks execution until
a particular event occurs.
We formalize restrictions for the definition and use of the RCU local pointer.
These restrictions can help certify RCU implementations and are readily accommodated by static analysis tools.
This work demonstrates that operational verification techniques are useful for
proving liveness and safety properties of concurrent, reactive programs. We give,
for the first time, proofs that RCU does not collect live objects and that it is
memory safe.
Our proofs that RCU does not collect live objects and is memory safe use a
novel strategy. We instrument the code to make it simple to specify and prove
the properties. We next introduce restrictions on the program code that allow
us to remove the instrumentation; thus, code conforming to the restrictions has
the properties. This technique proved especially valuable in showing that RCU
is memory safe.
We confirmed our intuition that an operational semantics based on state transformation is easy for engineers to understand: as part of the review process,
we invited experts in RCU multiprocessor programming to look at our RCU
implementation and the language semantics. They readily understood the RCU
implementation and were able to follow the machine language semantics. They
confirmed that our implementation was “correct” and that our semantics were
realistic (see footnote 6).

10 Future work

Our processor model and language included only those “features” necessary to
prove the live object and memory safety properties of RCU “Classic” executing on a sequentially consistent multiprocessor. However, modern processors are
not sequentially consistent and deployed RCU implementations protect recursive
data structures such as a linked list. RCU linked lists are especially interesting
because they satisfy a safe-traversal property: RCU accessors traversing a linked
list always reach a next node or the end of the list. Other non-blocking synchronization techniques such as software transactional memory[47, 13, 12], hazard
pointers[33], and lock-free approaches[14, 7, 8] do not satisfy this property.
To develop appropriate machine abstractions for weakly consistent architectures,
we are studying how modern processors implement strong serialization properties
akin to acquire/release semantics[10, 11] and in particular how locks and memory
barriers ensure the visibility of the global RCU pointer. At this stage, we are
confident in the description of the weakest but still useful memory model[9].
Footnote 6: This tells why it is important to have a specification.

A natural next step is to extend our operational verification approach to RCU
linked list implementations that execute on weakly (the weakest) consistent
memory architectures.

References
1. I. Attali, D. Caromel, and M. Russo. A formal executable semantics for Java. In
Proceedings of Formal Underpinnings of Java Workshop, OOPSLA ’98, 1998.
2. T. Axford. Concurrent Programming: Fundamental Techniques for Real-Time and
Parallel Software Design. Wiley Series in Parallel Computing. Wiley, 1990.
3. B. A. Bowen and R. J. A. Buhr. The Logical Design of Multiple-Microprocessor
Systems. Prentice-Hall, 1980.
4. D. Comer. Operating System Design – The XINU Approach. Prentice-Hall, 1984.
5. M. Compton. Stenning’s protocol implemented in udp and verified in isabelle. In
CATS ’05: Proceedings of the 2005 Australasian symposium on Theory of computing, pages 21–30, Darlinghurst, Australia, Australia, 2005. Australian Computer
Society, Inc.
6. E. A. Emerson. Temporal and modal logic. In J. van Leeuwen, editor, Handbook of
Theoretical Computer Science, vol. B, pages 996–1072. Elsevier Science Publishers
/ MIT Press, 1990.
7. K. Fraser. Practical lock freedom. PhD thesis, Cambridge University, September
2003.
8. K. Fraser and T. Harris. Concurrent programming without locks. ACM Transactions on Computer Systems, 25(2), May 2007.
9. M. Frigo. The weakest reasonable memory model. Master’s thesis, Massachusetts
Institute of Technology, 1997.
10. K. Gharachorloo. Memory consistency models for shared-memory multiprocessors. Technical Report CSL-TR-95-685, Stanford University, Dept of EE and CS,
December 1995. http://www.hpl.hp.com/techreports/compag-dec/WRL-95-9.pdf.
11. K. Gharachorloo, D. Lenoski, J. Laudon, P. B. Gibbons, A. Gupta, and J. L.
Hennessy. Memory consistency and event ordering in scalable shared-memory
multiprocessors. In 25 Years ISCA: Retrospectives and Reprints, pages 376–387,
1998.
12. T. Harris, S. P. Jones, and M. Herlihy. Composable memory transactions. In ACM
Conference on Principles and Practice of Parallel Programming, Chicago, Illinois,
USA, June 2005.
13. M. Herlihy, V. Luchangco, and M. Moir. A flexible framework for implementing
software transactional memory. In OOPSLA, pages 253–262, 2006.
14. M. Herlihy and J. E. B. Moss. Transactional memory: Architectural support for
lock-free data structures. In Proceedings of the Twentieth Annual International
Symposium on Computer Architecture, 1993.
15. G. Holzmann. Spin Model Checker. Wiley, 2004.
16. P. Howard, J. Walpole, P. McKenney, and J. Triplett. The case for relativistic
programming. Technical report, Portland State University, January 2009.
17. M. Huth and M. Ryan. Logic in Computer Science: Modelling and Reasoning about
Systems. Cambridge University Press, 2nd edition, 2004.
18. L. Lamport. How to Make a Multiprocessor Computer that Correctly Executes
Multiprocess Programs. IEEE Transactions on Computers, 28:690–691, 1979.

19. L. Lamport. What good is temporal logic? In R. Mason, editor, Information
Processing 83: Proceedings of the IFIP 9th World Computer Congress, pages 657–
686, Amsterdam, 1983. North Holland.
20. H. Liu and J. S. Moore. Java program verification via a JVM deep embedding in
ACL2. In Theorem Proving in Higher Order Logics (TPHOLs ’04), volume 3223 of
Lecture Notes in Computer Science, pages 184–200. Springer-Verlag, 2004.
21. Z. Manna and A. Pnueli. The temporal logic of reactive and concurrent systems —
Specification. Springer-Verlag, 1992.
22. Z. Manna and A. Pnueli. The temporal logic of reactive and concurrent systems —
Safety properties. Springer-Verlag, 1995.
23. P. McKenney. Hierarchical RCU. Linux Weekly News, November 2008.
24. P. McKenney and J. Walpole. What is RCU, fundamentally? Linux Weekly News,
December 2007.
25. P. McKenney and J. Walpole. What is RCU? part 2: Usage. Linux Weekly News,
December 2007.
26. P. McKenney and J. Walpole. What is RCU? part 3: The RCU API. Linux Weekly
News, January 2008.
27. P. E. McKenney. Read-Copy Update Usage.
http://www.rdrop.com/users/paulmck/RCU/linuxusage.html (this site is
updated quarterly).
28. P. E. McKenney. Exploiting Deferred Destruction: An Analysis of Read-Copy
Update Techniques in Operating System Kernels. Ph.D. Dissertation; OGI, 2004.
http://www.rdrop.com/users/paulmck/RCU.
29. P. E. McKenney. Using Promela and Spin to verify parallel algorithms. Available:
http://lwn.net/Articles/243851/ [Viewed September 8, 2007], August 2007.
30. P. E. McKenney and S. Rostedt. Integrating and validating dynticks and preemptable RCU. Available: http://lwn.net/Articles/279077/ [Viewed April 24, 2008], April
2008.
31. J. M. Mellor-Crummey and M. L. Scott. Scalable reader-writer synchronization
for shared-memory multiprocessors. In PPOPP ’91: Proceedings of the third ACM
SIGPLAN symposium on Principles and practice of parallel programming, pages
106–113, New York, NY, USA, 1991. ACM Press.
32. J. Meseguer. Rewriting as a unified model of concurrency. In OOPSLA/ECOOP
’90: Proceedings of the workshop on Object-based concurrent programming, New
York, NY, USA, 1991. Association for Computing Machinery.
33. M. M. Michael. Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects.
IEEE Transactions on Parallel and Distributed Systems, 15(6):491–504, June 2004.
34. S. S. Muchnick. Advanced Compiler Design Implementation. Morgan Kaufmann
Publishers, Inc., 1997.
35. H. R. Nielson and F. Nielson. Semantics With Applications. John Wiley & Sons,
1992. Revised 1999 – http://www.www.daimi.au.dk/~hrn.
36. S. Owicki and D. Gries. Verifying Properties of Parallel Programs: An Axiomatic
Approach. Communications of the ACM, 19(5):279–285, 1976.
37. G. D. Plotkin. A powerdomain construction. SIAM Journal on Computing, 5:452–
487, September 1976.
38. G. D. Plotkin. Structural operational semantics. Lecture Notes DAIMI FN-19,
1981.
39. G. D. Plotkin. The origins of structural operational semantics. The Journal of
Logic and Algebraic Programming, 60:3–15, 2004.
40. A. Pnueli. The temporal logic of programs. In Proceedings of the 18th IEEE
Symposium on Foundations of Computer Science, pages 46–67, 1977.

41. W. Pugh, J. Manson, and S. V. Adve. The Java memory model. POPL, pages
378–391, January 2005.
42. T. Ridge. Operational reasoning for concurrent caml programs and weak memory
models. In TPHOLs, pages 278–293, 2007.
43. S. Sarkar, P. Sewell, F. Z. Nardelli, S. Owens, T. Ridge, T. Braibant, M. O. Myreen,
and J. Alglave. The semantics of x86-cc multiprocessor machine code. In POPL ’09:
Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles
of programming languages, pages 379–391, New York, NY, USA, 2009. ACM.
44. D. Sarma and K. Thomas. Read Copy Update HOWTO. Linux Documentation,
2001. http://lse.sourceforge.net/locking/rcu/HOWTO/index.hmtl.
45. S. Schacht. Proving properties of actor programs using temporal logic. In G.
Agha and F. De Cindo (Eds.), Proc. of the workshop on object-oriented programming and models of concurrency, 1995.
46. F. B. Schneider. On Concurrent Programming. Graduate Texts in Computer
Science. Springer-Verlag, 1997.
47. N. Shavit and D. Touitou. Software transactional memory. Distributed Computing,
10(2):99–116, February 1997.
