On proving register atomicity by Awerbuch, B. et al.
Centrum voor Wiskunde en Informatica
Centre for Mathematics and Computer Science 
B. Awerbuch, L.M. Kirousis, E. Kranakis, P.M.B. Vitanyi 
On proving register atomicity 
Computer Science/Department of Algorithmics & Architecture Report CS-R8707 May 
The Centre for Mathematics and Computer Science is a research institute of the Stichting 
Mathematisch Centrum, which was founded on February 11, 1946, as a nonprofit institution aim-
ing at the promotion of mathematics, computer science, and their applications. It is sponsored by 
the Dutch Government through the Netherlands Organization for the Advancement of Pure
Research (Z.W.O.). 
Copyright © Stichting Mathematisch Centrum, Amsterdam
On Proving Register Atomicity 
Baruch Awerbuch 
Massachusetts Institute of Technology 
Department of Mathematics and Laboratory for Computer Science 
Cambridge, Massachusetts (baruch@theory.lcs.mit.edu) 
Lefteris M. Kirousis 
University of Patras, Department of Mathematics, Patras, Greece 
and 
Centre for Mathematics and Computer Science 
Amsterdam, The Netherlands (lefteris@cwi.nl) 
Evangelos Kranakis 
Centre for Mathematics and Computer Science 
Amsterdam, The Netherlands (eva@cwi.nl) 
Paul M. B. Vitanyi 
Centre for Mathematics and Computer Science 
Amsterdam, The Netherlands (paulv@cwi.nl) 
Concurrent access of shared variables by asynchronous processes does not require 
mutual exclusion, but can be solved with no waiting. A fruitful paradigm in this context 
is the notion of a shared register satisfying a niceness condition called atomicity. The 
model is rigorously presented, and then a method is given for proving register atomicity. 
It is then used to give simple proofs of the atomicity of two register constructions, the 
matrix register and the Bloom register, without assuming the existence of a global clock. 
(The matrix construction shows how to implement an atomic, n-writer, n-reader register
with value domain V from $n^2$ atomic - or even regular - 1-writer, 1-reader registers with
value domain $N \times V$, with N the set of nonnegative integers. Bloom's construction shows
how to implement an atomic, 2-writer, n-reader register with value domain V, from two
atomic, 1-writer, n-1-reader registers with value domain $\{0,1\} \times V$. These constructions
are so simple that they may even be practical.)
1980 Mathematics Subject Classification: 68C05, 68C25, 68A05, 68B20. 
CR Categories: B.3.2, B.4.3, D.4.1., D.4.4. 
Keywords and Phrases: Register, run, atomic, regular, reader, writer, proof method 
Note: This paper is submitted for publication elsewhere. 
The work of the first author was supported in part by the Air Force Office of Scientific Research under
Contract TNDGAFOSR-86-0078. The work of the fourth author was supported in part by the Office of Naval
Research under Contract N00014-85-K-0168, by the Office of Army Research under Contract DAAG29-84-K-0058,
by the National Science Foundation under Grant DCR-83-02391, and by the Defence Advanced
Research Projects Agency (DARPA) under Contract N00014-83-K-0125.
Report CS-R8707 
Centre for Mathematics and Computer Science 
P.O. Box 4079, 1009 AB Amsterdam, The Netherlands
1. Introduction 
We are interested in true concurrency in the context of shared register access by asynchronous 
processors. Concurrency control of asynchronous processes is often realized by actively serializ-
ing concurrent actions, using synchronization primitives like mutual exclusion, semaphores, and 
locking. Thus, although it seems that the actions are executed concurrently, in the system they 
are actually executed serially in some order. It has been pointed out in [3] that to implement such 
primitives we first need interprocess communication through a shared memory unit, which we 
shall call a register, even if the processors communicate by message passing. This suggests that 
the problem of simultaneous memory access needs to be solved without recourse to synchronisa-
tion primitives. It is desired that such a solution involves no waiting by one operator for another 
one. Thus we kill two birds with one stone, since it is the waiting involved in synchronization 
methods to control the communication between asynchronous participants, which may make such 
solutions unacceptable. Note, that asynchrony need not be due solely to hardware, but can also 
be caused by multiple users on the various machines. The problem of providing general wait-free 
asynchronous communication interfaces becomes more acute, as more and more hardware from 
different technologies, scale and speed continue to be connected in computer networks and other 
complexes. The purpose of the present investigation is to examine the feasibility of such general 
interfaces. In particular, we analyse the problem of how to implement a shared register which 
can be read by different asynchronous processors (the readers) and be written by different asyn-
chronous processors (the writers) in a truly concurrent fashion. That is, without any restrictions 
to prevent simultaneous access and making no assumptions, either about the relative durations of 
the reads and writes, or about the actual timing of the lower level constituent operation execu-
tions. 
More precisely, we are given some registers with certain restrictions on their mode of 
operation, e.g. that only a certain number of operators are allowed to access each one of them. We 
are asked to construct a more powerful (compound) register without some of the original restric-
tions, while retaining some of the positive characteristics of the subregisters, e.g. their serializable 
mode of operation (otherwise called atomicity). These compound registers will comprise a set of 
registers (i.e. subregisters) and an operation execution on the compound register will consist of a 
sequence of operation executions on the subregisters that follow a given protocol. A well-known 
instance of the above problem is when the subregisters can be read or written by only one opera-
tor, and the compound register should be accessible to many readers or writers. In this case, each 
read (or write) on the compound register will comprise a sequence of reads and writes on the 
subregisters. Suppose now that the subregisters do have a serial (or atomic) mode of operation. 
Here we allow that the absence of concurrency in the subregisters is not actual, i.e. the results are 
as if the operations were executed serially. The atomicity of the subregisters does not imply a 
priori that the operation executions on the compound register are serializable. This is a conse-
quence of the fact that each operation on the compound register generally will comprise more 
than one operation on the subregisters. One way to make the compound register operate atomi-
cally is, essentially, to coerce it to operate consistently (i.e. in a serializable fashion) via the men-
tioned synchronization primitives. Serializability in concurrent databases is usually enforced in 
this way [6]. These methods, however, entail some sort of synchronization among the operators, 
which makes necessary the slowing down of the fast operators to the pace of the slow ones (see 
[9] for a detailed exegesis). 
Here, we assume complete asynchronicity among the operators, and no waiting. All we
require from the constructed protocol is that it guarantees the existence of some total (i.e., linear) 
order in which the operation executions on the compound register could have taken place (exter-
nal consistency). This order, in some sense, represents the succession these operations seemingly 
follow. This idea has been successfully applied by Bloom, Lamport, Peterson, and Vitanyi and 
Awerbuch (see [1], [3], [7], [9], [10]). Of course, for such a total order to be meaningful, it must 
satisfy certain additional requirements. For example, there should be no second write placed by 
this order between a read and the write it reads (internal consistency). If we assume that there is a 
global (i.e. referring to all registers) time-reference system (otherwise, a global clock), and if all 
subactions of an operation execution on the compound register precede in time all the subactions 
of a second operation execution on the compound register, then this order must place the second 
operation execution after the first one. In general, we have a relation on the operation executions 
which is naturally imposed by the problem (e.g., an acyclic relation that tells if an operation exe-
cution can have an influence on another), and we desire the existence of a total order that extends 
this relation, but without violating the above restrictions. If this is possible for each scenario of 
operation executions of a proposed register, then the register is atomic. Atomic access of com-
pound shared registers is related to version management in distributed systems. E.g., [3], [7] deal 
with techniques for distributing a shared variable, while [1], [3], [8], [9], [10] and the current
paper deal with some aspects of concurrency control for replicated shared variables. 
In the next section we present the model rigorously, and give two general atomicity criteria
that are suitable for proving register atomicity (the second criterion is a simple variation of
the first, but is more suitable for 1-writer registers). We use the general 'causality' model pro-
posed by Lamport [3], specialized towards shared registers and atomicity, and we have no need to 
assume the existence of a global clock. We also investigate how the assumption of global time 
affects these criteria, by proving a rather general shrinking function theorem. In Section 3 we 
prove the atomicity of a multiwriter, multireader compound register constructed from atomic 1-
writer, 1-reader subregisters. We also show that the construction is optimal in the number of 
subregisters used, and the fact that it suffices if the subregisters are 'regular' (regularity is a 
weaker requirement than atomicity). This 'matrix' register appeared first in [9]. In Section 4 we 
prove the atomicity of a 2-writer, n-reader compound register constructed from atomic 1-writer, 
n-reader registers. This 'Bloom' register appeared first in [3]. The atomicity proofs of the matrix
and Bloom registers given here are the first such proofs that are based solely on causality
considerations. They are influenced by the proofs in [1], [3], [9], [10].
Finding the 'right' formalism and level of rigour is of major importance in this complicated 
area. Ideally, as stated in [7], one designs protocols "sufficiently simple that there is no need to 
provide complicated, and possibly not understandable, formal proofs." Despite their apparent
simplicity, the protocols treated here are subtle, and their correctness is by no means obvious. 
We therefore need rigorous proof. However, excess of formality may prove counterproductive, 
make proofs incomprehensible and introduce errors. Our goal is to develop simple formal criteria 
and proof methods, and demonstrate their usefulness by proving these nontrivial protocols correct 
in a convincing manner. 
2. The Model 
A proto-register is an abstract data type, capable of holding values out of a given domain of 
values. Initially the proto-register is empty. The operations that can be performed on the proto-
register are writes and reads. A write of a value puts that value in the proto-register. A read 
reports a value from the domain. 
We assume that such values have an identity apart from a value. That is, values written by 
different write operation executions may have the same value, e.g. 0, but they are not identical. 
The identity id(v) of a value v written by a write w is defined by id(v) = w. (Thus, v = v' and
$id(v) \neq id(v')$ may be both true.) If a read operation execution reports a value v, then either
id(v) = w for a particular write w or else id(v) is undefined. If a read r reports v with id(v) = w
for some write w, then we say that r reports the value written by w. The proto-register can be
implemented by a multiset over a given domain. That is, an unordered list of elements, where the 
same element can occur more than once. A proto-register has associated with it a finite set of pro-
cessors called the writers and a finite set of processors called the readers. A processor can be 
both a reader and a writer. A write can only be performed by a writer, while a read can only be 
performed by a reader. 
A sequential register is a proto-register where all operations are executed in sequence, and 
an execution of a read operation reports the value written by the execution of the last write opera-
tion that precedes it. A sequential register can be implemented by a linear list. Originally, the 
list is empty. A write adds an element to the end of the list, and a read reports the element at the 
end of the list. 
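
The linear-list implementation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `SequentialRegister` and the integer counter used as the identity of each write are hypothetical and are not part of the paper's model.

```python
class SequentialRegister:
    """Sequential register kept as a linear list of (write_id, value) pairs."""

    def __init__(self):
        self._history = []   # originally, the list is empty
        self._next_id = 0    # hypothetical counter: gives every write an identity

    def write(self, value):
        """Add the value to the end of the list; return the identity of this write."""
        write_id = self._next_id
        self._next_id += 1
        self._history.append((write_id, value))
        return write_id

    def read(self):
        """Report the element at the end of the list (None while the list is empty)."""
        return self._history[-1] if self._history else None
```

Two writes of the same value, say 0, yield distinct identities, which mirrors the distinction between a value and its identity id(v) made above.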
We address the problems arising from true concurrency, where we allow simultaneous 
operation executions by different processors. However, simultaneous operation executions by the 
same processor are excluded. Informally, we aim at a specification of a general concurrent regis-
ter, register for short, which corresponds as closely as possible to that of the sequential register. 
In the atomic register defined below, the operations may be actually executed concurrently, yet it 
will seem as if they were executed in sequence. For read operations to report a value which was
written by a write operation to the register, there must be causal relations between the operations.
We define an 'apparent' precedence relation ($\to$) on the set of operation executions, which
captures the crucial aspect of the causal relations between operation executions to the same register.*
To define various degrees of niceness conditions on a register with simultaneous operation execu-
tions, we need some formal definitions first. We use 'action' as synonym for 'operation execu-
tion'. 
A run $\rho = (A, \to, \pi)$ consists of the following:
(R1) A finite or countably infinite set A of read and write actions. If R is the set of read actions
and W is the set of write actions, which were actually performed during the course of the
run, then $A = W \cup R$ and $W \cap R = \emptyset$.
(R2) A reading mapping which is a partial function $\pi : R \to W$.
(R3) An irreflexive partial order $\to$ on the set A of actions.† We call $\to$ a precedence relation.
* Lamport [3] defines two precedence relations $\longrightarrow$ and $\dashrightarrow$, the semantics of which are intended to be problem
independent. If a "precedes" relation on the subactions of a, b is defined, then $a \dashrightarrow b$ means "some subaction of a
precedes some subaction of b", and $a \longrightarrow b$ means "each subaction of a precedes each subaction of b." In the context
of shared register access there is always an intended way for the actions to interact, which ensures correctness of the
algorithm. Therefore, it is advantageous to reflect this essential causal relation between the actions of a particular
algorithm by a single made-to-measure precedence relation. This relation is our $\to$, not to be confused with Lamport's $\longrightarrow$,
and will have an algorithm dependent semantics.
† It is not a priori obvious that the precedence relation needs to be cycle free. Suppose we have defined $\to$ such that it
is transitive and contains cycles. In our framework this is supposed to have a physical interpretation. Accordingly, in
the following discussion, we use some well-known notions from modern physics, in particular the geometry of
space-time as in special relativity, as in [4]. Occurrence of events a, b such that $a \to b$ & $b \to a$, is intended to have the
physical meaning that subevents of a precede subevents of b, and subevents of b precede subevents of a, in space-time.
E.g. a and b involve the same point in space at alternating times, first a, then b, and then a again. In contrast,
occurrence of events a, b which are not related in $\to$ at all, i.e., $\neg(a \to b \text{ or } b \to a)$, can mean that no subevent of a pre-
If $a \to b$ then we say a precedes b. To initialize the run, there is an initial write that
precedes all other actions. We moreover require that, for each $a \in A$, there are only finitely
many $b \in A$ such that $\neg(a \to b)$. Informally, this means that a run begins at some point in
time, rather than extending in the infinite past [3], and that an action cannot be infinitely
long or infinitely small in duration.
Intuitively, $a \to b$ will imply that, in the aspect we deem important, a may influence b, but
b cannot influence a. Two actions a, b are called concurrent if $\neg(a \to b \text{ or } b \to a)$, i.e., if
they are incomparable in the relation $\to$. If w is a write and r is a read, then w directly
precedes r if $w \to r$ and there is no write w' such that $w \to w' \to r$.
Irreflexive orders. All orders in this paper are irreflexive. For convenience, "total order" 
and "partial order" will henceforth mean "irreflexive total order" and "irreflexive partial 
order," respectively. 
A run $\rho = (A, \to, \pi)$ can now be classified into the following categories according to how
well it behaves under concurrent operations. The definitions below closely follow the presenta-
tion of Lamport [3]. The 'normal' run is a new category we found advantageous to introduce. 
1. (safe) For each read r that has no concurrent writes, $\pi(r)$ is defined, and directly precedes
r.*
2. (normal) For each read r, $\pi(r)$ is defined, and $\pi(r)$ either precedes r or is concurrent with
r.
3. (regular) For each read r, $\pi(r)$ is defined, and $\pi(r)$ directly precedes r or is concurrent
with r. (Hence a regular run is both safe and normal.)
4. (atomic) A run is atomic if it is normal and there is a total order $\Rightarrow$, which we call an
atomic precedence relation, on the set A of actions, as follows.
(i) if $a \to b$ then $a \Rightarrow b$ (external consistency), and
(ii) for each read r, $\pi(r)$ is the write directly $\Rightarrow$-preceding r (internal consistency).
We say that $\Rightarrow$ atomically extends the precedence relation $\to$.
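
The safe, normal and regular conditions can be checked mechanically on a small finite run. The sketch below is an illustration, not part of the paper: it assumes the precedence relation is given as a transitively closed set `prec` of pairs (a, b) meaning a precedes b, and the reading mapping as a dict `pi`; all function and variable names are hypothetical.

```python
def concurrent(prec, a, b):
    """Two actions are concurrent if they are incomparable in the precedence relation."""
    return (a, b) not in prec and (b, a) not in prec

def directly_precedes(prec, writes, w, r):
    """w precedes r and no write w' lies strictly between them."""
    return (w, r) in prec and not any(
        (w, x) in prec and (x, r) in prec for x in writes)

def is_safe(prec, writes, reads, pi):
    for r in reads:
        if any(concurrent(prec, w, r) for w in writes):
            continue  # the safe condition only constrains reads without concurrent writes
        if r not in pi or not directly_precedes(prec, writes, pi[r], r):
            return False
    return True

def is_normal(prec, reads, pi):
    # pi(r) precedes r or is concurrent with r, i.e. r does not precede pi(r)
    return all(r in pi and (r, pi[r]) not in prec for r in reads)

def is_regular(prec, writes, reads, pi):
    return all(
        r in pi and (directly_precedes(prec, writes, pi[r], r)
                     or concurrent(prec, pi[r], r))
        for r in reads)

# Hypothetical run: w0 precedes w1, which precedes the read r (transitively closed).
writes, reads = ["w0", "w1"], ["r"]
prec = {("w0", "w1"), ("w1", "r"), ("w0", "r")}
stale = {"r": "w0"}   # r reports the overwritten value
fresh = {"r": "w1"}   # r reports the latest value
```

With the reading mapping `stale` the run is normal but neither safe nor regular, which gives a concrete instance of a run that is normal but not safe; with `fresh` it satisfies all three conditions.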
Without proof we state the hierarchy involved. Every atomic run is regular, but not every 
regular run is atomic. By definition, regular runs are exactly the ones which are both safe and nor-
mal. There are runs which are safe but not normal, and there are runs which are normal but not 
safe. 
A run is a possible set of operation executions by a register, a possible 'history'. We now tie 
up the notion of a register and the notion of a run. Intuitively, a register is a deterministic 'black 
cedes any subevent of b, and no subevent of b precedes any subevent of a, in space-time. E.g., a and b take place at
different points in space, and a light ray sent by a to b's point in space arrives after b has finished, while a light ray
sent by b to a's point in space arrives after a has finished. Both cases imply concurrency albeit of a different kind. For
us, it is not necessary to distinguish between them, due to the later restrictions (R7)-(R9) on the choice of $\to$. We
therefore replace a transitive $\to$ with cycles by a relation $\to'$, defined as: $a \to' b$ iff $(a \to b)$ & $\neg(b \to a)$. Then $\to'$ is an
irreflexive partial order. W.l.o.g., therefore, we define $\to$ from the outset as an irreflexive partial order.
* In the definition of a run we assumed that the reading function $\pi$ is partial, since it is perfectly possible in general to
have a read that returns a value that has not been written by a write. For an example, consider a safe register with a
domain of values {0,1} where all writes write a 0. If a read overlaps a write, then it is legitimate by our definitions to
have this read return the value 1, a value not written by any write. In most cases though, the protocols one designs
instruct a read to return the value written by a particular write. Similarly, a read can return a value, actually written by a
different write than the one it ostensibly ought to choose, as long as this value is the right one. This requires a more
elaborate definition of $\pi$ than we need here.
box' that reports a value in response to a read query. We can view the function of this black box
as associating a reading mapping with a given pair $(A, \to)$. Since $(A, \to)$ is a high-level
description, it is actually an equivalence class of different finer grained descriptions. These differences
may give rise to different responses to the same read query. Therefore, the register associates a
set of reading mappings with each $(A, \to)$. Let $\Pi$ be the set of all its possible reading mappings.
Formally, a register mapping $REG : \{(A, \to)\} \to 2^{\Pi}$ is a total mapping, that associates a nonempty
set of reading mappings $\pi$ with each pair $(A, \to)$ satisfying (R1), (R2) and (R3). With each
register we associate a register mapping. We assume that each processor actually executes operations
to the register serially. This assumption is embodied in requirement (R4) below. If K is a register
and $REG_K$ is its associated register mapping, then a run $\rho = (A, \to, \pi)$ of K satisfies
(R4) if a and b are different actions by the same processor then either $a \to b$ or $b \to a$, and this
total $\to$-order on the actions by the same processor is identical with the serial order in
which a processor executes its actions in A;
(R5) $\pi \in REG_K(A, \to)$; and
(R6) if a read r returns a value v and id(v) = w, then $\pi(r) = w$.
A register is atomic (respectively regular, normal, safe) if each of its runs is atomic 
(respectively regular, normal, safe). Obviously, the atomic register is the ideal register; the 
operations may be concurrent, yet they seem to be executed in a serial fashion, extending the 
given precedence relation (external consistency) and consistent with the reading function (inter-
nal consistency). The notion of register atomicity is closely related to what is called '(strict) seri-
alizability' in conventional concurrency control, in particular in the context of databases with 
concurrent 'transactions'.* See e.g. [6]. Given a particular implementation of a data structure,
and having selected the particular precedence relation $\to$ we wish to employ,† it is often simple
to check whether it is a safe, a normal, a regular register or none of these. However, proving 
atomicity using the given definition, or its 'shrinking' variant which we will meet below, turns 
out to be a difficult matter. Therefore, in the next section we propose simple criteria which are 
necessary and sufficient for atomicity. In later sections we show how to use these atomicity 
* Concurrent transactions are usually called atomic if they are both serializable and recoverable. Recoverability means 
that each transaction appears all-or-nothing: either it executes to completion (in which case we say that it commits) or 
it cannot influence other transactions (in which case we say that it aborts). Recoverability is a problem only in the pres-
ence of failures. We assume that registers are failure-free, so we do not consider recoverability. 
† In our definition, (R1) through (R6) do not preclude that $\to$ is empty, apart from the total $\to$-order on the set of
actions by each processor (R4). The sets of actions by different processors are necessarily disjoint. Suppose we construct
a register and choose $\to$ such that, for each associated run $\rho = (A, \to, \pi)$, we have:
(i) if a, b are actions by different processors, then $\neg(a \to b \text{ or } b \to a)$, and
(ii) if r is a read by processor p, then $\pi(r)$ is a write by processor q, $q \neq p$.
Then this register is vacuously atomic. This example shows clearly that the significance of the properties of a register
is derived from the meaningfulness of our precedence relation ($\to$). In section 2.2 we show that this abstract approach
suffices to cover the conventional global time semantics by a 'natural' choice for $\to$. In [3], Lamport proceeds
differently. As already noted in a previous footnote, the two precedence relations $\longrightarrow$ and $\dashrightarrow$ distinguished by him are
not chosen freely. Ultimately, his precedence relations have a physical foundation with its roots in the notion of
causality. The intuitive concept of register implies the possibility of communication by means of read and write operations to
the register. But this requires some causality ($\dashrightarrow$) relations between reads and writes of the register. For 1-writer
registers, Lamport restricts registerhood to constructions satisfying his axiom B1: "for any read r and write w to the
same register, either $r \dashrightarrow w$ or $w \dashrightarrow r$ (or both)". The causality relation $\dashrightarrow$ refers to subactions of the two
actions involved. Therefore, we defer the introduction of our version of B1 to the discussion of the compound register,
where it appears as (R9).
criteria for verifying that proposed constructions implement atomic registers. 
2.1. Atomicity Criteria 
For a proposed data structure K to be an atomic register, it suffices to prove that each of its runs,
as defined in (R1) through (R6), is atomic. Let $\rho = (A, \to, \pi)$ be a normal run of K, so $\pi$ is total.
We divide the set of actions A into equivalence classes induced by $\pi$. Each such equivalence
class, called a clan, is associated with a write. The clan associated with a write w is the set
$[w] = \{w\} \cup \{r \in R : \pi(r) = w\}$. For any two writes w, w' define $[w] \to_\pi [w']$ if and only if
$w \neq w'$ and there exist actions $a \in [w]$ and $a' \in [w']$ such that $a \to a'$. Note that $\to_\pi$ is not
necessarily acyclic. The following theorem is basic for proving the atomicity of runs.
Theorem 2.1. (Atomicity Criterion) Let $\rho = (A, \to, \pi)$ be a run. The following statements
are equivalent:
(1) $\rho$ is atomic.
(2) $\rho$ is normal and $\to_\pi$ is acyclic.
Proof. (1) implies (2). Let $\rho$ be atomic. By definition, atomicity implies normality, which
shows the first part of (2). To show the second part of (2), let $\Rightarrow$ be a total order that atomically
extends $\to$. I.e., for each read r, we have
(i) $\pi(r) \Rightarrow r$, and
(ii) there is no write w with $\pi(r) \Rightarrow w \Rightarrow r$.
We prove that $\to_\pi$ is extendible to a total order, which implies acyclicity of $\to_\pi$. It is enough to
show that for any two writes w, w', if $[w] \to_\pi [w']$ then $w \Rightarrow w'$.
Since $\Rightarrow$ is a total order, the negation of $w \Rightarrow w'$ is equivalent to $w' \Rightarrow w$. Therefore, we
only need to show that for any two writes w, w', if $w' \Rightarrow w$ then $\neg([w] \to_\pi [w'])$. Thus, suppose
$w' \Rightarrow w$. Exhaustive analysis of all cases shows that then the combination of (i) and (ii) implies
$w' \Rightarrow r' \Rightarrow w \Rightarrow r$, for all reads $r \in [w]$ and $r' \in [w']$. Hence, there are no $a \in [w]$ and $a' \in [w']$ such
that $a \to a'$. Therefore, $\neg([w] \to_\pi [w'])$.
(2) implies (1). Assume (2) holds. It is clear that the transitive closure of $\to_\pi$ is a partial
order, which in turn can be extended to a total order $\Rightarrow_\pi$. Since $\rho$ is normal, for each read
$r \in [w]$, we have $\neg(r \to w)$. Hence, there is a total order $\Rightarrow_{[w]}$ on each [w] atomically
extending $\to$ and such that $w \Rightarrow_{[w]} r$, for each read $r \in [w]$. Define a unique relation $\Rightarrow$ on the set A
as follows. For all $a, a' \in A$, $a \Rightarrow a'$ if and only if either
(i) $a, a' \in [w]$ and $a \Rightarrow_{[w]} a'$, or
(ii) $a \in [w]$, $a' \in [w']$, and $[w] \Rightarrow_\pi [w']$.
Clearly, $\Rightarrow$ is a total order atomically extending $\to$. It follows that $\rho$ is atomic. ∎
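
For a finite run, the criterion of Theorem 2.1 and the construction in the second half of its proof can be turned into a small procedure. The sketch below is an illustration under stated assumptions, not the authors' algorithm: `prec` is assumed to be a transitively closed irreflexive partial order given as a set of pairs, `pi` a dict from reads to writes, and the function name `atomic_order` is hypothetical. It uses `graphlib` from the Python standard library (Python 3.9 or later) for topological sorting.

```python
from graphlib import TopologicalSorter, CycleError

def atomic_order(writes, reads, prec, pi):
    """Return the actions in an atomic total order, or None if the run is not atomic."""
    # Normality: pi is total on the reads and no read precedes the write it reports.
    if any(r not in pi or (r, pi[r]) in prec for r in reads):
        return None
    # Each action belongs to the clan of a write: a write to its own, a read to pi(r).
    clan = {w: w for w in writes}
    clan.update(pi)
    # Clan relation: [w] -> [w'] whenever some a in [w] precedes some a' in [w'].
    # TopologicalSorter expects a mapping from each node to its set of predecessors.
    clan_preds = {w: set() for w in writes}
    for a, b in prec:
        if clan[a] != clan[b]:
            clan_preds[clan[b]].add(clan[a])
    try:
        clan_order = list(TopologicalSorter(clan_preds).static_order())
    except CycleError:
        return None  # the clan relation has a cycle, so the run is not atomic
    order = []
    for w in clan_order:
        # Inside a clan the write comes first; the reads follow in an order extending prec.
        preds = {r: {w} for r in reads if pi[r] == w}
        preds[w] = set()
        for a, b in prec:
            if a in preds and b in preds:
                preds[b].add(a)
        order.extend(TopologicalSorter(preds).static_order())
    return order
```

Because every clan is laid out contiguously with its write first, each read is directly preceded, among the writes, by the write it reports, which is the internal consistency condition; the topological order of the clans gives external consistency.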
The second atomicity theorem refers to registers with only one writer. In this case, for each
pair of different writes $w, w' \in A$, either $w \to w'$ or $w' \to w$, by (R4). The theorem is similar to a
corresponding theorem in Lamport [3].
Theorem 2.2. (1-writer Atomicity Criterion) Assume that K is a register with only one
writer. Then for each run $\rho = (A, \to, \pi)$ of K the following statements are equivalent:
(1) $\rho$ is atomic.
(2) $\rho$ is regular and $\pi$ is weakly monotonic (i.e., if $r \to r'$, then either $\pi(r) \to \pi(r')$ or
$\pi(r) = \pi(r')$).
Proof. (1) implies (2). Let $\rho$ be atomic. Atomicity implies regularity. Therefore we only
need to prove weak monotonicity of $\pi$.
Let $r, r' \in A$ be different reads with $r \to r'$. Since there is only one writer, we have by (R4)
that either $\pi(r) \to \pi(r')$ or $\pi(r) = \pi(r')$ or $\pi(r') \to \pi(r)$. Atomically extend $\to$ to a total order $\Rightarrow$, as
in the definition of an atomic run. Exhaustive case analysis shows that, by the properties of $\Rightarrow$,
either $\pi(r) \Rightarrow r \Rightarrow \pi(r') \Rightarrow r'$ or $\pi(r) = \pi(r')$. In the first case $\pi(r') \to \pi(r)$ is excluded, because
$\Rightarrow$ extends $\to$, and hence $\pi(r) \to \pi(r')$.
(2) implies (1). Let $\rho$ be regular and $\pi$ be weakly monotonic. By Theorem 2.1, if we prove
that $\to_\pi$ is acyclic, then we are done. Assume to the contrary, there is a cycle
$[w] \to_\pi [w'] \to_\pi \cdots \to_\pi [w]$.
Since $[w] \to_\pi [w']$, there are $a \in [w]$ and $a' \in [w']$, $w \neq w'$, such that $a \to a'$. If both a and a' are
writes then $w \to w'$. If a, a' are both reads, then by weak monotonicity of $\pi$, we have $w \to w'$. If a
is a read and a' = w', then $w' \to w$ would give $a \to w' \to w$ $(= \pi(a))$, which contradicts normality of $\rho$.
Therefore $w \to w'$ by (R4). If a = w and a' is a read, then $w' \to w$ would give $w' \to w \to a'$ $(\pi(a') = w')$,
which contradicts safety of $\rho$, and therefore $w \to w'$ by (R4) again. Hence, $[w] \to_\pi [w']$ implies $w \to w'$.
Since this argument holds for all pairs of adjacent clans in the cycle, we obtain a cycle
$w \to w' \to \cdots \to w$. This contradicts that $\to$ is a partial order. ∎
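
Weak monotonicity, the extra condition of Theorem 2.2, is easy to test on a finite run. The following sketch is illustrative only: `prec` is assumed to be a set of pairs (a, b) meaning a precedes b, `pi` a dict from reads to writes, and the function name is hypothetical.

```python
def is_weakly_monotonic(reads, prec, pi):
    """If r precedes r', then pi(r) precedes pi(r') or pi(r) equals pi(r')."""
    return all(
        pi[r] == pi[s] or (pi[r], pi[s]) in prec
        for r in reads for s in reads if (r, s) in prec)

# Hypothetical 1-writer run: w0 precedes w1, read r1 precedes read r2,
# and both reads are concurrent with both writes.
reads = ["r1", "r2"]
prec = {("w0", "w1"), ("r1", "r2")}
inverted = {"r1": "w1", "r2": "w0"}   # the later read reports the older write
ordered = {"r1": "w0", "r2": "w1"}
```

The mapping `inverted` fails the test, since the earlier read reports the newer write and the later read the older one; `ordered` passes.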
2.2. Global Time, Intervals and Shrinking 
In [1], [7], [8], [9], atomicity is related to the assumption of a global time reference frame (also
called global clock). We show that the theory as developed here, along the lines of [3], is more 
general. In particular, a register is atomic in global time if and only if it is atomic for a particular 
choice of the --+ precedence relation. This precedence relation turns out to be the interval order 
induced by the time intervals representing the actions. The notions developed and results proved 
in the rest of the present section are not needed for the results of the remainder of the paper. 
An important aspect of atomic runs is the following property. Although their actions have a 
duration on a global time scale, and such durations may overlap, each action may be considered 
to take place instantaneously, i.e., as if it happened completely at a particular time instant. If all 
of these time instants are distinct, then the apparent time instants of the actions order the actions
totally. This relates the order approach to atomicity with the global time approach. 
Time is represented by the set of real numbers $\mathbb{R}$, ordered as usual. Assume that every
action a is represented by an open time interval $(s(a), f(a))$, $s(a) < f(a)$, within the bounds of
which the action is supposed to have taken place. s(a) (respectively, f(a)) is a real number
called the starting (respectively, finishing) time of the action a.*
Define the precedence relation $\to$ as the natural relation $a \to b$ iff $f(a) \leq s(b)$. A relation
which is so induced by a set of intervals of the real line $\mathbb{R}$ satisfies the axioms of a special type of
partial order called interval order. Formally, an interval order on a set A is an irreflexive
relation $\to$ that satisfies
* To exclude some technical difficulties, there is usually an assumption that $s(a) \neq s(b)$, $s(a) \neq f(b)$ and $f(a) \neq f(b)$,
for any two distinct actions $a, b \in A$, and $s(a) \neq f(a)$ for each action $a \in A$. The fact that we should be allowed to
assume that no two starting or finishing times are equal, is justified by appeal to the sensibility of natural law [3]. "No
physical meaningful result could depend upon completely accurate knowledge of these times. (It makes no physical
sense to specify starting and finishing times of an operation execution down to the fraction of a micropicosecond.)" By
excluding the starting and finishing times from the duration associated with action a, to obtain the desired effect in the
mathematical framework, we may come closer to the spirit of physics. Thus, we choose to represent durations of
actions as open intervals.
$a \to b$ & $c \to d$ implies $a \to d$ or $c \to b$, for all $a, b, c, d \in A$ (see [2]).    (2.1)
Every interval order is a partial order, and hence the previously developed theory applies. Since 
not every (irreflexive) partial order is an interval order, the global time approach requires more 
from the precedence relation ($\to$) than the general approach in (R3). We now proceed with the
definitions of this more conventional global time approach to atomicity. The relation between the 
two approaches is analysed in Theorem 2.3. 
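The difference between interval orders and general partial orders can be checked mechanically on small examples. The following is a minimal sketch, not part of the original development: the function names and the sample data are hypothetical, chosen only for illustration. It tests axiom (2.1) by brute force, first on a relation induced by intervals via $f(a) \le s(b)$, and then on a partial order made of two disjoint two-element chains, which fails (2.1).

```python
from itertools import product

def precedes(x, y):
    """a -> b iff f(a) <= s(b), for intervals given as (s, f) pairs."""
    return x[1] <= y[0]

def satisfies_2_1(elements, rel):
    """Brute-force check of axiom (2.1): a->b and c->d imply a->d or c->b."""
    return all(
        rel(a, d) or rel(c, b)
        for a, b, c, d in product(elements, repeat=4)
        if rel(a, b) and rel(c, d)
    )

# Hypothetical (start, finish) pairs, chosen only for illustration.
intervals = [(0.0, 2.0), (1.0, 3.0), (2.5, 4.0), (3.5, 5.0)]
interval_ok = satisfies_2_1(intervals, precedes)

# A partial order that is not an interval order: two disjoint chains.
chain_pairs = {("x1", "y1"), ("x2", "y2")}
two_chains_ok = satisfies_2_1(
    ["x1", "y1", "x2", "y2"], lambda u, v: (u, v) in chain_pairs
)
```

In the second example, x1 precedes y1 and x2 precedes y2, but neither x1 precedes y2 nor x2 precedes y1, so (2.1) is violated.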
A shrinking function on the set of actions of a run $p=(A,\to,\pi)$, with $\to$ the interval order
induced by the set of intervals $\{(s(a),f(a)) \subseteq R : a \in A\}$, is a one-to-one function $\sigma$ that associates
with each action $a$ of the run a time instant (i.e., a real number) $\sigma(a)$ such that:
(S1) $\sigma(a)$ belongs to the interval $(s(a),f(a))$ of $a$.
A shrinking function gives a possible serialization of the actions. Condition (S1) enforces external
consistency of the serialization. In the order approach, external consistency follows from the
fact that the serialization extends $\to$. Define the precedence relation $\to_\sigma$, induced by $\sigma$, as $a \to_\sigma b$
iff $\sigma(a) < \sigma(b)$. Obviously, $\to_\sigma$ is a total order on $A$. Then (S1) implies that $\to_\sigma$ extends $\to$. A
shrinking function $\sigma$ is consistent with the reading mapping $\pi$ if
(S2) $(A,\to_\sigma,\pi)$ is atomic.
A run $p$ is shrinking atomic if there is a shrinking function $\sigma$ such that (S1) and (S2) are
satisfied.
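Condition (S1) and its consequence that the induced total order extends the interval order are easy to test on concrete data. The sketch below is an illustration under stated assumptions: the dictionaries, action names and function names are hypothetical, and condition (S2) is deliberately not checked, since it depends on the reading mapping.

```python
def is_shrinking_function(intervals, sigma):
    """Check that sigma is one-to-one and satisfies (S1).

    intervals: dict mapping each action to its open interval (s, f).
    sigma:     dict mapping each action to a real number.
    """
    one_to_one = len(set(sigma.values())) == len(sigma)
    inside = all(intervals[a][0] < sigma[a] < intervals[a][1] for a in intervals)
    return one_to_one and inside

def induced_order_extends(intervals, sigma):
    """Check that a -> b (i.e. f(a) <= s(b)) implies sigma(a) < sigma(b)."""
    return all(
        sigma[a] < sigma[b]
        for a in intervals
        for b in intervals
        if intervals[a][1] <= intervals[b][0]
    )

# Hypothetical run with three actions (names are illustrative only).
intervals = {"w1": (0.0, 2.0), "r1": (1.0, 3.0), "w2": (2.0, 4.0)}
good_sigma = {"w1": 1.5, "r1": 1.8, "w2": 2.5}
bad_sigma = {"w1": 1.5, "r1": 1.8, "w2": 5.0}  # 5.0 lies outside (2.0, 4.0)

good = is_shrinking_function(intervals, good_sigma)
bad = is_shrinking_function(intervals, bad_sigma)
extends = induced_order_extends(intervals, good_sigma)
```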
Theorem 2.3. (Shrinking Function Theorem) Let $p=(A,\to,\pi)$ be a run, and let $\to$ be the
interval order induced by a representation of open (time) intervals of the actions in $A$. The
following statements are equivalent:
(1) $p$ is atomic, and
(2) $p$ is shrinking atomic.
Proof. (1) implies (2). Suppose (1) holds. Let $\Rightarrow$ be a total order which atomically extends
$\to$. Define $Q(a)=\{b : \neg(a \to b)$ & $\neg(b \to a)\}$. Note that, by (R3), $Q(a)$ is finite, and that $Q(a)$ is
nonempty since $a \in Q(a)$. Define, by induction on $\Rightarrow$, $\sigma(a)$ to be a real number such that:
(i) if $b \Rightarrow a$ then $\sigma(a) > \sigma(b)$,
(ii) $\sigma(a) > s(a)$, and
(iii) $\sigma(a) < \mu$, with $\mu=\min\{f(b) : b \in Q(a)\}$.
Note that (ii) and (iii) imply (S1), and (i) implies (S2). Induction is possible if:
(a) if $b \Rightarrow a$ then $\sigma(b) < \mu$, and
(b) $s(a) < \mu$.
Let $b_{\min} \in Q(a)$ be an action such that $\mu=f(b_{\min})$.
Ad (a). Assume $b \Rightarrow a$. If $b \to b_{\min}$ then $\sigma(b) < f(b) \le s(b_{\min}) < f(b_{\min}) = \mu$.
If $\neg(b \to b_{\min})$ & $\neg(b_{\min} \to b)$ then, since $b \Rightarrow a$, we have $\neg(b_{\min} \to b)$. Therefore, $b_{\min} \in Q(b)$.
Then, $\sigma(b) < \min\{f(c) : c \in Q(b)\} \le f(b_{\min}) = \mu$.
If $b_{\min} \to b$ then $b_{\min} \to b \Rightarrow a$, contradicting $b_{\min} \in Q(a)$.
Ad (b). Since $\neg(a \to b_{\min})$ & $\neg(b_{\min} \to a)$ we have $s(a) < f(b_{\min}) = \mu$.
(2) implies (1). Assume (2). Since $\to_\sigma$ is a total order extending $\to$ and satisfying (S2),
atomicity of $p$ is immediate. •
2.3. Compound Register 
The most obvious approach to constructing a register is to build it from simpler ones. The 
existence of such a simpler register is either postulated, or it is constructed from still simpler 
registers. More precisely, a compound register consists of a finite number of registers, called 
subregisters. The set of readers and writers of the compound register is the union of the set of 
readers and writers of the subregisters. The subregisters are allowed to hold a value out of a 
given domain. We can distinguish essentially two cases. In one case the subregisters are simpler 
than the compound register in that their domain of values is smaller than the value domain of the 
compound register. Then the construction for the compound register distributes the value to be 
stored piecemeal over the subregisters. E.g., a positive integer n can be distributed in log n bits 
over log n boolean subregisters. In the other case the set of readers and writers associated with 
the compound register is larger than the set of readers and writers associated with each subregis-
ter. Then the construction for the compound register replicates the value to be stored as versions 
in several subregisters. A reader has to determine the 'latest' version among the versions it 
obtains from different subregisters. To make this possible, extra information such as a 'times-
tamp' is attached to each version. As a result, the value domain of each subregister has to be 
larger than the value domain of the compound register. The constructions of compound registers 
in this paper are of the latter type. To express their complexity we use the following cost measures.
Let $V$ be the value domain of the compound register, $v=|V|$, and let there be $n$ readers and
$m$ writers associated with the compound register. Let $S,T : V \times N \times N \to N$ be total cost functions,
with $N$ the set of nonnegative integers. Let the value domain of each subregister of the compound
register be (isomorphically) contained in $TAG \times V$, with $|TAG| = S(v,n,m)$ the number
of elements in $TAG$. Then the space complexity of the compound register is $\log S(v,n,m)$. The
processors execute read or write actions on the compound register, independently of each other 
but following a protocol. Let each read or write action by a given processor on the compound 
register consist of at most T (v ,n ,m) read and/or write actions on the subregisters. Then the time 
complexity of the compound register is T (v ,n ,m ). An action on the compound register is con-
sidered to be a higher-level operation execution of the same nature as its subactions. Thus, with 
each run of the compound register is associated a run of each subregister which constitutes the 
compound register. This means that we associate with each subregister a set of subactions related 
by a precedence relation. Sets of subactions associated with different subregisters are disjoint. 
We assume that a processor actually executes all its subactions in serial order. The disjoint pre-
cedence relations of the subactions on respective subregisters are related by the order in which 
each processor executes its subactions. For the compound register we define a transitive precedence
relation ($\twoheadrightarrow$) on the set of all subactions involved, as follows.
Let $K$ be a compound register comprising subregisters $K_1, \ldots, K_n$. Let $p=(A,\to,\pi)$ be a run
of $K$ and let $p_i=(A_i,\to_i,\pi_i)$ be the associated run of subregister $K_i$, $1 \le i \le n$. The precedence
relation $\twoheadrightarrow$ on the set $\bigcup_{i=1}^{n} A_i$ is defined as the minimal transitive relation that extends all
precedence relations $\to_i$, $1 \le i \le n$, such that
(R7) if $\alpha$ and $\beta$ are different subactions by the same processor, then either $\alpha \twoheadrightarrow \beta$ or $\beta \twoheadrightarrow \alpha$, but
not both, and this total $\twoheadrightarrow$-order on the subactions by the same processor is identical with
the actual serial order in which the processor executes these subactions; and
(R8) if $a,b \in A$ and for each subaction $\alpha$ of $a$ and each subaction $\beta$ of $b$ holds $\alpha \twoheadrightarrow \beta$, then
$a \to b$.
Lemma 2.4. $\twoheadrightarrow$ is a partial order.
Proof. Clearly, (R7) precludes $\twoheadrightarrow$-cycles containing two subactions by the same processor.
Therefore, since the sets of subactions on the same subregisters are disjoint, any $\twoheadrightarrow$-cycle
contains only subactions on the same subregister. But these subactions are partially ordered,
which contradicts such a $\twoheadrightarrow$-cycle. •
Finally, we need to express the 'registerhood' of the compound by suitably restricting the
choice of $\to$. That this is necessary can be seen from the following example. Let $K$ be a
compound register consisting of subregisters $K_1, K_2$. Let $p$ be a writer associated with subregister $K_1$,
and let $q$ be a reader associated with subregister $K_2$, $p \ne q$. Then there is no way that $q$ can read
what $p$ has written. Yet runs of $K$ can satisfy (R1) through (R8) and even be atomic. For example,
atomicity of $K_1, K_2$ implies atomicity of $K$. Such anomalies are due to the fact that we have
not yet required the existence of causal relations between actions by different processors. There
must be some causal relation between a write and a read, since otherwise a reader cannot report
what a writer wrote. There must be some causal relation between two writes, because otherwise a
writer cannot replace the value in the register by the value it wants to write. However, it is not
necessary to have a causal relation between two reads; this is because reads neither have to
change the value contained in the register, nor need to report what another read
wrote. The following condition expresses these requirements on the compound register in terms
of subregisters. Assuming the general setting above:
(R9) if $a,b \in A$ are not both reads, then there are subactions $\alpha$ of $a$ and $\beta$ of $b$, where $\alpha,\beta$ are not both
subreads, and some $i$ ($1 \le i \le n$), such that $\alpha,\beta \in A_i$. ($\alpha$ and $\beta$ act on the same subregister $K_i$.)
It follows that a choice of $\to$ satisfying (R9) is 'proper' if the choices of the $\to_i$'s are 'proper.'
This can be argued as follows. Assume that the ultimate subsub..subregister is atomic. If $a,b$ are
not both reads, then there are subactions $\alpha$ of $a$ and $\beta$ of $b$, not both reads, which act on the same
subregister, and so on. At the atomic subsub..subregister level the subsub..subactions involved
have an apparent total order. Choose this as the precedence relation. For convenience, let the
$K_i$'s be the basic atomic subregisters, so the $\to_i$'s are total orders. Then either $\alpha \to_i \beta$ or $\beta \to_i \alpha$,
but not both, by (R7). Suppose $\alpha \to_i \beta$. If $c,d \in A$, $c \to a$ and $b \to d$, then by (R8) we have $c \to d$.
Suppose $\beta \to_i \alpha$. If $c,d \in A$, $a \to c$ and $d \to b$, then by (R8) we have $d \to c$. Using the precedence
relations at the previous level, we induce in this fashion a 'coarse' precedence relation at each
next higher level compound register. Our choice of $\to$ is constrained to be an extension of this
coarse precedence relation. That is, (R7) through (R9) restrict the freedom of our choice of $\to$
appropriately, by ultimately reducing the constraints on our choice of precedence $\to$ to
precedence at the elemental level.
2.4. Naming of Registers 
Unfortunately, the naming conventions for types of registers are inconsistent. For a 1-writer 
register, the operator who writes can simply remember the value it wrote last. Therefore, the 
name '1-writer, 1-reader' register is used for a register that can be read by both writer and reader 
[3]. By analogy, we use '1-writer, (n-1)-reader' register for a register that can be written by one 
writer and read by n -1 readers that cannot write. The writer can always read as above. However, 
in an $m$-writer register, with $m>1$, while a writer can remember what it wrote last, this value can
have been overwritten by a later write of another writer. Hence, here we might as well have writers
that cannot read (in addition to readers that cannot write). We will, however, only consider
registers where the writers can also read. For us, an '$m$-writer, $n$-reader' register, $m>1$, designates
a register that can be written by $m$ processors, and read by $n$ processors including the $m$
writers ($n \ge m$).
3. The Matrix Register 
The matrix register is a compound $n$-writer, $n$-reader register constructed as a matrix of atomic
1-writer, 1-reader subregisters. The domain of values of the subregisters is the cartesian product
of the domain of values of the compound register with the nonnegative integers. The matrix register
was the first general atomic multiwriter register, and was initially defined in [9]. It may be a
register of practical importance, because of its simplicity, elegance and low complexity. Its
atomicity was derived in [9] by proving that four special conditions were satisfied, using global
time. The present proof is an application of the atomicity Theorem (Theorem 2.1). Shrinking
atomicity is an immediate corollary, by Theorem 2.3.
Architecture. Let $p_1, \ldots, p_n$ be $n$ processors and let $K$ be an $n \times n$ matrix register consisting
of $n^2$ atomic, 1-reader, 1-writer registers $K_{i,j}$, $i,j = 1, \ldots, n$. Each $p_i$ is a writer of (i.e., is
connected to the write terminal of) each $K_{i,j}$. Each $p_i$ is also a reader (i.e., is connected to the read
terminal) of each $K_{j,i}$. Let $V$ be the domain of values of the compound register. Then
$N \times \{1, \ldots, n\} \times V$, with $N$ the nonnegative integers, is the domain of values of each subregister. A
tag is a pair $(k,i)$, where $k$ is a nonnegative integer and $i \in \{1, \ldots, n\}$. We say that each subregister
can hold a tag, next to a value from the domain $V$ of the compound register. All subregisters
are initialized with tag $(0,1)$ and value 0. Moreover, each run of the compound register starts
with a write action, which precedes all other actions, as required by (R3). The architecture is
depicted in Figure 1.
[Figure 1: An action by processor $p_2$ in the 4-reader, 4-writer, matrix register.]
Protocol. The register $K$ obeys the following protocol.
$p_i$ writes the value $v$:
1. for all $j = 1, \ldots, n$ read $K_{j,i}$ (i.e., read the $i$th column);
2. determine the lexicographically largest tag $(k_{\max}, m)$;
3. set own tag to $(k_{\max}+1, i)$;
4. for all $j = 1, \ldots, n$ write on $K_{i,j}$ (i.e., write to the $i$th row) the new tag, as well as the value $v$.
$p_i$ reads:
1. for all $j = 1, \ldots, n$ read $K_{j,i}$ (i.e., read the $i$th column);
2. determine the lexicographically largest tag $(k_{\max}, m)$ and let $v_m$ be the value contained in a
register with such a tag;
3. set own tag to $(k_{\max}, m)$;
4. for all $j = 1, \ldots, n$ write to $K_{i,j}$ (i.e., write to the $i$th row) the new tag, as well as the value
$v_m$, which was determined in 2. (Also, report $v_m$.)
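The read and write protocols can be exercised with a small sequential simulation. This is only a sketch under strong assumptions: every action runs to completion before the next one starts, so no concurrency (and hence no atomicity argument) is modelled. The class and method names are hypothetical, and processors are indexed from 0 in the code while tags keep the 1-based processor numbers used in the text.

```python
class MatrixRegister:
    """Sequential sketch of the matrix register protocol (no concurrency)."""

    def __init__(self, n, initial_value=0):
        self.n = n
        # K[i][j] is written by processor i and read by processor j.
        # Each cell holds (tag, value); all cells start with tag (0, 1).
        self.K = [[((0, 1), initial_value) for _ in range(n)] for _ in range(n)]

    def _scan_column(self, i):
        # Steps 1-2: read column i and select a cell with the
        # lexicographically largest tag (only tags are compared).
        return max((self.K[j][i] for j in range(self.n)), key=lambda cell: cell[0])

    def write(self, i, v):
        (k_max, _), _ = self._scan_column(i)
        tag = (k_max + 1, i + 1)          # step 3: new tag (k_max + 1, i)
        for j in range(self.n):           # step 4: write row i
            self.K[i][j] = (tag, v)

    def read(self, i):
        tag, v_m = self._scan_column(i)   # steps 1-3: adopt the largest tag
        for j in range(self.n):           # step 4: write row i
            self.K[i][j] = (tag, v_m)
        return v_m                        # report v_m


reg = MatrixRegister(3)
reg.write(0, "a")
first = reg.read(1)
reg.write(2, "b")
second = reg.read(0)
third = reg.read(1)
```

In this sequential run the reads report the most recently written value, and the tag of the second write, (2, 3), is lexicographically larger than the tag of the first, (1, 1).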
Each action $a$ by processor $p_i$ consists of a set of subreads $R(a,1,i), \ldots, R(a,n,i)$ followed
by a set of subwrites $W(a,i,1), \ldots, W(a,i,n)$, where the last two indices $i,j$ indicate the
subregister $K_{i,j}$ on which the subaction took place. The order in which these subreads and
subwrites take place is arbitrary, but for the fact that each subread precedes each subwrite. Each
subregister $K_{i,j}$ ($1 \le i,j \le n$) of $K$ is atomic. Let $p=(A,\to,\pi)$ be a run of $K$. Let $A_i$ be the subset of
actions in $A$ that are executed by $p_i$ ($1 \le i \le n$). Define, for all $1 \le i,j \le n$, $p_{i,j}=(A_{i,j},\to_{i,j},\pi_{i,j})$, the
run of $K_{i,j}$ associated with $p$, where
  $R_{i,j}=\{R(a,i,j) : a \in A_j\}$,
  $W_{i,j}=\{W(a,i,j) : a \in A_i\}$;
$R_{i,j}$ is the set of subreads, and $W_{i,j}$ is the set of subwrites on subregister $K_{i,j}$. Since $K_{i,j}$ is
atomic, there is an atomic extension $\Rightarrow_{i,j}$ of $\to_{i,j}$, for all $1 \le i,j \le n$. This atomic extension is a
total order on the subactions executed on the subregister concerned. Moreover, if a subread reads
a subwrite (on a subregister), then there is no other subwrite placed between them by this order.
The orders on the disjoint sets of subactions associated with each subregister are related by the
orders on the disjoint sets of subactions by each processor. Let $\twoheadrightarrow$ be the minimal transitive
relation on the subactions in $\bigcup_{i,j=1}^{n} A_{i,j}$ extending the $\Rightarrow_{i,j}$'s and satisfying (R7). By (R7), the
subactions by the same processor $p$ are totally ordered by $\twoheadrightarrow$. This order is the serial execution
order of the subactions by $p$. In particular, $\twoheadrightarrow$ must satisfy:
  $R(a,i,j) \twoheadrightarrow W(a,j,k)$, (3.1)
for all $a \in A_j$ and all $1 \le i,j,k \le n$. We now define a precedence relation $\to$ on $A$. For any two
actions $a$ and $b$ on the compound register $K$, by $p_i$ and $p_j$, respectively, let $\to$ be the transitive
closure of $\to'$:
  $a \to' b$ iff $W(a,i,j) \twoheadrightarrow R(b,i,j)$. (3.2)
Clearly, $\to$ satisfies (R8) and (R9).
Lemma 3.1. $\to$ is a partial order on $A$.
Proof. Existence of a $\to$-cycle containing $a \in A_j$ implies $W(a,j,k) \twoheadrightarrow R(a,i,j)$, for some
$k,i$ ($1 \le k,i \le n$). This contradicts (3.1), since $\twoheadrightarrow$ is a partial order by Lemma 2.4. •
Remark. If $\twoheadrightarrow$ extends the original partial orders $\to_{i,j}$, instead of the apparent total orders
$\Rightarrow_{i,j}$, then Lemma 3.1 still holds for the $\to$ resulting from (3.2). This will be useful in the proof
of Theorem 3.5.
The following theorem is the main result of this section. 
Theorem 3.2. The matrix register $K$ is an atomic, $n$-writer, $n$-reader compound register,
which is implemented with $n^2$ atomic, 1-writer, 1-reader registers.
Proof.
Let $p=(A,\to,\pi)$ be a run of $K$. Examine the write protocol. For a write $w \in A$, let $v(w)$
be the value written by $w$ to the compound register, and, if $t(w)$ denotes the tag determined in
step 3 of $w$, let $(t(w),v(w))$ be the value written to the subregisters in step 4 of $w$'s execution.*
For a read $r \in A$, let $v(r)$ be the value reported by $r$ from the compound register, and, if $t(r)$ is
the tag associated with $v(r)$, let $(t(r),v(r))$ be the value written to the subregisters in step 4 of its
execution. The pair $(t(r),v(r))$ is selected in step 3 of the read protocol.
Claim. For each read $r$, there is a write $w$, such that
  $t(r)=t(w)$ & $id(v(r))=id(v(w))$ $(=w)$. (3.3)
If $id(v(r)) = w$ then $\pi(r) = w$. Hence, $\pi$ is total.
Proof of Claim. The subregisters are initialized with tag $(0,1)$, and there is a write preceding
all other actions, by (R3). Hence, there is an $a \in A$, such that $(t(r),v(r)) = (t(a),v(a))$. If $a$ is
a write then we are done, else $a$ is a read and we repeat the argument. If $r \to a$, then, by (3.2) and
atomicity of the subregisters, $r$ cannot read the value written by $a$ to the subregister involved.
There are only finitely many $a$ such that $\neg(r \to a)$, $\to$ is a partial order, and there is an initial
write preceding all other actions, by (R3). Hence, we need only finitely many repetitions of the
argument before we find a write $w$ such that (3.3) holds. If $id(v(r)) = w$ then $\pi(r) = w$ by (R6).
Since this holds for each read $r$, $\pi$ is total. This proves the Claim.
Let $<_{lx}$ be the irreflexive lexicographic order on pairs of integers. If $a,b \in A$ such that
$a \to b$ (therefore $a \ne b$), then it follows by (3.2) and the choosing of the new tag in step 3 of the
write and read protocols, that:
  $t(a) \le_{lx} t(b)$ ($t(a) <_{lx} t(b)$ if $b \in W$). (3.4)
To prove atomicity, by Theorem 2.1, we only need to prove that $p$ is normal and that there
is a total order extending the $\to_\pi$ relation among the clans. Intuitively, we proceed by first choosing
a plausible total order on the set of writes, and next showing that the corresponding total order
on the set of clans extends $\to_\pi$. In the matrix register, the obvious total order on the set of writes
is the lexicographical order of the associated tags. Proceeding this way, the conclusion of the
theorem follows from Theorem 2.1 and by the following lemma.
Lemma 3.3.
(1) $p$ is normal, and
(2) if $[w] \to_\pi [w']$, then $t(w) <_{lx} t(w')$. In particular, $\to_\pi$ is acyclic.
Proof. (1). By the Claim above, $\pi$ is a total function. Let $\pi(r)=w$ (i.e., $r \in [w]$). Then, by
(3.3), $t(r)=t(w)$. However, if $r \to w$, then by (3.4) we have $t(w) >_{lx} t(r)$, which is a contradiction.
Hence, $\neg(r \to w)$, i.e., $p$ is normal.
(2). Let $[w] \to_\pi [w']$. By definition of $\to_\pi$, there exist actions $a \in [w]$ and $b \in [w']$ such
that $a \to b$.
Suppose $b = w'$. Then by (3.3) and (3.4) it follows that $t(w') >_{lx} t(w)$, which is as claimed.
Suppose that $a = w$ and $b$ is a read. By (3.3) and (3.4), $t(w) \le_{lx} t(b) = t(w')$. If $w,w'$ are
writes by different processors, then their tags have different processor numbers; if they are writes
by the same processor then, since $w \ne w'$, one of them $\to$-precedes the other. Therefore, by (3.4),
they must have different tags. In both cases, $t(w) \ne t(w')$, which is as claimed.
Suppose both $a,b$ are reads. Then $t(a)=t(w) \le_{lx} t(b)=t(w')$, by (3.3) and (3.4). The
proof of $t(w) \ne t(w')$ is now exactly as before. This proves the lemma.
The proof of the theorem is finished. •
* In the next section on Bloom's algorithm, it is useful to distinguish between $t(a)$, the identity of a tag, and $val(t(a))$,
the value of a tag. In this algorithm, however, $val(t(a))=val(t(b))$ iff $t(a)=t(b)$, so we do not want to load the
notation unnecessarily.
Corollary. The matrix register is shrinking atomic.
Proof sketch. The argument goes as follows. Assume global time. The interval representations
of the subactions induce the $\to_{i,j}$ precedence relations on the subregisters. Each such relation
is therefore an interval order. The intervals associated with the subactions of each processor
are linearly ordered (do not overlap) by definition. Since each subregister $K_{i,j}$ is atomic, each run
$p_{i,j}=(A_{i,j},\to_{i,j},\pi_{i,j})$ has a shrinking function $\sigma_{i,j}$ such that $(A_{i,j},\to_{\sigma_{i,j}},\pi_{i,j})$ is shrinking atomic,
by Theorem 2.3. Since the associated intervals are open, we can always choose the $\sigma_{i,j}$'s such
that $\sigma' = \bigcup_{i,j=1}^{n} \sigma_{i,j}$ is one-to-one. Define $\twoheadrightarrow$ as the total order of the real images of the
subactions under $\sigma'$, i.e., $\twoheadrightarrow$ agrees with the usual total order $<$ on the reals. Then $\twoheadrightarrow$ satisfies (R7)
and (3.1). Define $a \to'' b$ iff $\sigma'(\alpha) < \sigma'(\beta)$ for all subactions $\alpha$ of $a$ and $\beta$ of $b$. Then $\to''$ is an
interval order. This satisfies (R8) and (R9), and $\to$ is a refinement of $\to''$. The proof of Theorem
3.2 goes through exactly as before, with interval order $\to''$ instead of $\to$, which implies that register
$K$ is shrinking atomic by the Shrinking Function theorem (Theorem 2.3). •
3.1. Complexity and Optimality 
The time complexity of the matrix register is 2n (or rather 2n-2, as follows from Theorem 3.4 
below) which seems to be as low as it can possibly be.* The space complexity of the matrix 
register is unbounded. In theory this is pretty bad. In practice, however, this solution uses far 
less space than many solutions which theoretically do better. For instance, in [9] a solution has 
been proposed where the space complexity of the compound register is Sn 2Iog n. However, we 
can assume that a system executes only a limited number of actions on the compound register in 
its total lifetime. If we set a generous bound of at most $2^{50}$ such actions, the matrix solution is
superior in terms of space complexity, with respect to the mentioned bounded space solution, for 
any number n ~ of associated processors. Thus the matrix register has effectively a lower space 
complexity than comparable solutions with bounded space complexity, even for solutions which 
solve only subproblems of the one addressed by the matrix solution. An exception is the Bloom
register in the next section (with only two writers), which both effectively and theoretically cannot
be improved in space and time complexity.
* Cf. related lower bounds on distributed match-making in [5].
Another complexity criterion is the number of subregisters of a certain type used in the 
compound register. Leaving out the subregisters on the main diagonal, which are redundant, the 
matrix solution is optimal in the number of 1-writer, 1-reader subregisters used. 
Theorem 3.4. (Optimality) The implementation of a compound safe $n$-writer, $n$-reader
register from 1-writer, 1-reader subregisters requires at least $n(n-1)$ such subregisters (atomic
or not). Register $K$, minus the subregisters on the main diagonal, is such an optimal implementation.
Proof. Suppose we have implemented a safe compound $n$-writer, $n$-reader register $R$, with
associated processors $p_1, \ldots, p_n$, from 1-writer, 1-reader subregisters. For each ordered pair of
processors $(p_i,p_j)$, $1 \le i,j \le n$ and $i \ne j$, we can consider a run $(\{w,r\},\to,\pi)$ of $R$, consisting solely
of two nonoverlapping operation executions: a write $w$ by $p_i$, followed by a read $r$ by $p_j$. Since
$R$ is safe, $\pi(r)=w$. Since $i \ne j$, there must be a subregister $R_{i,j}$, such that $p_i$ is the associated writer
and $p_j$ is the associated reader. There are $n(n-1)$ different ordered pairs $(p_i,p_j)$, $i \ne j$. In each
such ordered pair the first element is a writer and the second element is a reader. No subregister
$R_{i,j}$ can be associated with more than one such (writer, reader) pair, since the subregisters have
only one associated writer and one associated reader other than the writer. Hence, there must also
be $n(n-1)$ different subregisters $R_{i,j}$ in the compound register $R$. This is exactly achieved by the
presented matrix register $K$, noting that the subregisters on the main diagonal are superfluous.
I.e., $p_i$ can remember what it wrote last in $K_{i,i}$. •
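The counting step in the proof can be illustrated by enumerating the ordered (writer, reader) pairs directly. This is a small sketch with a hypothetical function name; it only confirms that the off-diagonal cells of the $n \times n$ matrix are exactly the $n(n-1)$ pairs counted in the proof.

```python
def off_diagonal_subregisters(n):
    """Ordered (writer, reader) pairs (i, j) with i != j, for processors 1..n."""
    return [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i != j]

pairs = off_diagonal_subregisters(4)
count = len(pairs)  # n(n-1) with n = 4
```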
The assumption of atomicity of the subregisters of the compound matrix register is con-
venient in the proof, but is it necessary? It turns out that regularity of subregisters suffices to 
obtain atomicity of the compound matrix register. First we note that the relation between the 
operation of the subregisters and the operation of the compound register can be expressed as follows.
Let $\max_{lx}$ mean the lexicographic maximum. The value written by a write subaction
satisfies:
  $v(W(a,i,j)) = (t(a), v(a))$, $1 \le j \le n$, with
  $t(a) = \max_{lx}\{t(b) : \pi_{k,i}(R(a,k,i)) = W(b,k,i),\ 1 \le k \le n\}$ if $a$ is a read, and
  $t(a) >_{lx} \max_{lx}\{t(b) : \pi_{k,i}(R(a,k,i)) = W(b,k,i),\ 1 \le k \le n\}$ if $a$ is a write. (3.5)
Theorem 3.5. (Regularity of Subregisters) The registers $K_{i,j}$ do not need to be atomic;
regularity is sufficient to guarantee atomicity of $K$.
Proof. Let $p$ and $p_{i,j}$ be defined as before. Define $\twoheadrightarrow$ as the minimal transitive relation on
the subactions $\bigcup_{i,j=1}^{n} A_{i,j}$ extending the $\to_{i,j}$'s and satisfying (R7). Then (3.1) holds for this new
$\twoheadrightarrow$, and we define a new $\to$ in terms of the new $\twoheadrightarrow$ as in (3.2). This satisfies (R8) and (R9)
again. The new $\to$ is a partial order on $A$, since the proof of Lemma 3.1 goes through unchanged
for the new relations $\twoheadrightarrow$ and $\to$.
Since the subregisters are regular, by Theorem 2.2 there is only one way in which associated
runs can fail to be atomic. Namely, if the reading mapping associated with such a run is not
weakly monotonic. We need to show that this does not result in a reading mapping that is
inconsistent with atomicity of the run of the compound register. The proof is by first assuming that all
subregisters are atomic, and then replacing the atomic subregisters by regular subregisters, one by
one, retaining an atomic compound register after each step. In the step involving subregister $K_{i,j}$,
we use induction on the ordered set of read subactions in $A_{i,j}$. So, assume first that all subregisters
$K_{i,j}$ ($1 \le i,j \le n$) of $K$ are atomic.
Base Case: $i+j \le 1$. Run $p$ using atomic subregisters is atomic by Theorem 3.2.
Induction: $n \ge i,j \ge 1$. Assume the register mapping induced by the compound register $K$
stays atomic (i.e., all runs are atomic) under replacement of the atomic subregisters $K_{p,q}$ by regular
ones, for all $p,q$ such that $(p,q) <_{lx} (i,j)$. By way of contradiction, assume that the register
mapping becomes nonatomic if we replace also $K_{i,j}$ by a regular subregister. I.e., there is a
nonatomic run $p=(A,\to,\pi)$. By the inductive assumption, the nonatomicity of $p$ must be due to a first
nonatomic event in $p_{i,j}$. By Theorem 2.2 this must be as follows.
By (R7), $\twoheadrightarrow$ is a total order on the subactions by the same processor. Let $R(a,i,j)$ and
$R(b,i,j)$ be the $\twoheadrightarrow$-least pair of consecutive read subactions by $p_j$ on $K_{i,j}$, such that
  $R(a,i,j) \to_{i,j} R(b,i,j)$ & $W(c,i,j) \to_{i,j} W(d,i,j)$, (3.6)
  $\pi_{i,j}(R(a,i,j)) = W(d,i,j)$ & $\pi_{i,j}(R(b,i,j)) = W(c,i,j)$.
For $p$ to be nonatomic as a result of (3.6), $\pi(b)$ must be different from a write it could have been
in case $p_{i,j}$ were atomic. By choice of $R(a,i,j)$ and $R(b,i,j)$, it follows by (R7) and (R8) that $a$
and $b$ must be $\to$-consecutive actions by $p_j$.
Since $a \to b$, if $p_{i,j}$ were atomic, then, by Theorem 2.2,
  if $\pi_{i,j}(R(a,i,j)) = W(d,i,j)$ then $\pi_{i,j}(R(b,i,j)) = W(e,i,j)$, (3.7)
for some action $e$ such that $d \to e$ or $d=e$. Now consider the subregisters on the main diagonal.
Processor $p_j$ is both the only writer and the only reader associated with $K_{j,j}$. In particular, regardless
of whether the subregisters are regular or atomic, we have for $p$:
  $\pi_{j,j}(R(b,j,j)) = W(a,j,j)$. (3.8)
Intuitively, even though $R(b,i,j)$ scans too old a value, the way $\pi(b)$ is determined as in
(3.5) will show no difference whether we assume (3.6) or (3.7), because of (3.8). More formally,
consider
  $S = \{t(x) : W(x,k,j) = \pi_{k,j}(R(b,k,j)),\ 1 \le k \le n\}$
  $S' = (S - \{t(c)\}) \cup \{t(e)\}$.
Then $S'$ is the set we would have instead of $S$, if $K_{i,j}$ were not replaced by a regular subregister.
By the induction hypothesis, $S'$ gives rise to an atomic compound register. But $S$ contains also
$t(a)$ by (3.8), and by (3.6) and (3.5) we have $t(a) \ge_{lx} t(d)$. Hence, $\max_{lx}(S)$ is invariant under
whether $\pi_{i,j}(R(b,i,j)) = W(c,i,j)$ or $\pi_{i,j}(R(b,i,j)) = W(d,i,j)$, i.e., whether (3.6) or (3.7) hold.
Consequently, atomicity of the compound register is invariant under (3.6) or (3.7) as well. This
ends the induction, and therefore the proof of the theorem. •
Using regular subregisters, one may get the impression that the main diagonal subregisters 
in the matrix register cannot be left out. However, again a processor can remember what it wrote 
last, and so we can dispense with the main diagonal. 
In the matrix construction, the reads on the compound register comprise writes on the
subregisters. A natural question to ask is whether we can implement a compound atomic register
such that the reads on the compound register do not contain writes on a subregister. In [3] it is
proved that to implement an atomic register from regular ones, the reads of the compound
register necessarily include writes of the subregisters. Hence, writes of subregisters are unavoidable
in reads of the atomic matrix register, assuming it is constructed from regular subregisters
(as in Theorem 3.5). In contrast, in Bloom's register below, where the subregisters are atomic,
reads of the compound register do not involve writes of a subregister.
4. The Bloom Register 
The Bloom register is a compound 2-writer, n -reader register, constructed as a pair of atomic 1-
writer, n -reader subregisters. The domain of values of the subregisters is the cartesian product 
{O,l}xV, with V the domain of values of the compound register. This register, the first atomic 
2-writer register, was designed by Bard Bloom, and is e.g. defined in [1]. It is remarkable in the 
simplicity of its protocol, and that the domain of values of the subregister is only twice the 
domain of values of the compound register. The argument for its atomicity, as given in [1], uses 
global time and is not easy to follow. It assigns a value cr(a) to the time interval I (a) associated 
with each action a (i.e., a shrinking value). The present proof is based only on causality con-
siderations. The atomicity of the register follows by Theorem 2.1. If we assume global time, then 
the precedence relation $\to$ defined below is an interval order. Hence, shrinking atomicity of


















[Figure 2: The 2-writer, n-reader, Bloom register.]
Throughout the present section $\oplus$ denotes modulo 2 addition. Let $p_0, p_1, \ldots, p_{n-1}$ be $n$
processors and let $K$ be a compound register consisting of two atomic, 1-writer, $n$-reader
subregisters, say $K_0$, $K_1$. Processor $p_0$ is a writer of $K_0$, and processor $p_1$ is a writer of $K_1$. All
processors $p_i$ are readers of both $K_0$ and $K_1$, $i = 0, \ldots, n-1$. Let $V$ be the value domain of the
compound register $K$. Then $\{0,1\} \times V$ is the value domain of each subregister. A tag is an element of
$\{0,1\}$. We say that each subregister can hold a tag, next to a value from the domain $V$ of the
compound register. Denote the tag variable in $K_i$ by, say, $t_i$, $i = 0,1$. Both registers are initialized with
tag 0 and value 0. Moreover, each run of the compound register starts with a write action, as
required by (R3). The architecture is depicted in Figure 2. It implements an $n$-reader, 2-writer
register $K$. The register obeys the following protocol.
$p_i$ writes the value $a$ ($i = 0,1$):
1. read $t_{i \oplus 1}$;
2. $t_i := i \oplus t_{i \oplus 1}$;
3. write $t_i$ and value $a$ in register $K_i$.
$p_i$ reads value:
1. read $t_0$ from $K_0$;
2. read $t_1$ from $K_1$;
3. read the value from register $K_j$, for $j = t_0 \oplus t_1$.
Ignoring step 2 of the write protocol, because it is no register access operation, every read or
write action $a$ by processor $p_i$ consists of two or three atomic read or write subactions. *
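The tag arithmetic of the protocol can be tried out in a short sequential simulation. As a sketch it assumes that each action completes before the next begins, so it shows only how the tags select the subregister that was written last, not the concurrent behaviour analysed below. The class and method names are hypothetical, and `^` is Python's exclusive-or, used here for modulo 2 addition.

```python
class BloomRegister:
    """Sequential sketch of the 2-writer Bloom protocol (no concurrency)."""

    def __init__(self, initial_value=0):
        # K[i] = (tag, value); K[i] is written only by processor i (i = 0, 1).
        # Both subregisters start with tag 0.
        self.K = [(0, initial_value), (0, initial_value)]

    def write(self, i, a):
        other_tag, _ = self.K[i ^ 1]   # step 1: read the other writer's tag
        t_i = i ^ other_tag            # step 2: t_i := i XOR t_{i XOR 1}
        self.K[i] = (t_i, a)           # step 3: write tag and value to K_i

    def read(self):
        t0, _ = self.K[0]              # step 1: read t_0 from K_0
        t1, _ = self.K[1]              # step 2: read t_1 from K_1
        j = t0 ^ t1                    # step 3: read the value from K_j
        return self.K[j][1]


reg = BloomRegister()
observed = []
for writer, value in [(0, "x"), (1, "y"), (0, "z"), (1, "u")]:
    reg.write(writer, value)
    observed.append(reg.read())
```

After a write by $p_0$ the two tags are equal, so a reader is directed to $K_0$; after a write by $p_1$ they differ, so a reader is directed to $K_1$.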
Let $\rho = (A, \to, \pi)$ be a run associated with $K$. Let $a \in A$. Denote by $p(a)$ the number of the
processor that executes $a$. Denote by $R(w, i \oplus 1)$ the subaction of a write $w$ that reads from
subregister $K_{i \oplus 1}$, and denote by $W(w)$ the subaction of $w$ that writes to subregister $K_i$, for
$i = p(w)$. Denote by $R_s(r,i)$ the first subaction of read $r$ that reads subregister $K_i$, $i = 0,1$, and
denote by $R_f(r,j)$ the final subaction of $r$ that reads $K_j$, where $j$ is either 0 or 1. Each
subregister $K_i$ ($i = 0,1$) of $K$ is atomic. Define, for $i = 0,1$, $\rho_i = (A_i, \to_i, \pi_i)$, the
run of $K_i$ associated with $\rho$. Since $K_i$ is atomic, we can extend $\to_i$ to a total atomic precedence
relation $\Rightarrow_i$, $i = 0,1$. This means that there is an apparent total order on the subactions executed on
the same subregister. Moreover, if a subread reads a subwrite (on a subregister) then there is no
other subwrite placed between them by this order.
Lemma 4.1. Let $\twoheadrightarrow$ be the minimal transitive relation on the subactions in $A_0 \cup A_1$,
extending $\Rightarrow_0$ and $\Rightarrow_1$, and satisfying (R7). Then $\twoheadrightarrow$ is a partial order on these subactions.
Proof. By Lemma 2.4. $\blacksquare$
Define the precedence relation $\to$ on $A$ by
$$a \to b \iff \alpha \twoheadrightarrow \beta, \tag{4.1}$$
for all subactions $\alpha$ of $a$ and $\beta$ of $b$. Obviously, relation $\to$ is a partial order and satisfies (R8)
and (R9). It is convenient to use the following property.
Lemma 4.2. $\to$ is an interval order on the set of writes $W$.
Proof. Since $\to$ is a partial order, it is irreflexive, and it remains to show that its restriction
to $W$ satisfies (2.1). Note that there are only two processors involved, the writers $p_0$ and $p_1$.
Suppose $a \to b$ and $c \to d$. Assume w.l.o.g. that all four actions are distinct. Using (R7),
if $a, c$ are by the same processor, then $a \to c \to d$ or $c \to a \to b$;
if $b, d$ are by the same processor, then $a \to b \to d$ or $c \to d \to b$;
if $a, d$ are by the same processor, then $a \to d$ or $c \to d \to a \to b$;
if $b, c$ are by the same processor, then $a \to b \to c \to d$ or $c \to b$.
W.l.o.g., the only remaining case to check is that $a, b$ are by $p_0$ and $c, d$ are by $p_1$. If
$W(a) \Rightarrow_0 R(d,0)$, then each subaction of $a$ $\twoheadrightarrow$-precedes each subaction of $d$, i.e., $a \to d$. If
$R(d,0) \Rightarrow_0 W(a)$, then each subaction of $c$ $\twoheadrightarrow$-precedes each subaction of $b$, i.e., $c \to b$. $\blacksquare$

* In the read protocol we can delete step 1 if $i = 0$ and step 2 if $i = 1$, because the executing processor $p_i$ can
remember the $t_i$ it has written last. Because $p_i$ can also remember the value it has written last, we can delete step 3
of the read protocol if $j = i$. Thus, such a fine-tuned read protocol may consist of only one atomic read of $t_{i \oplus 1}$
from $K_{i \oplus 1}$ by $p_i$, in case $t_0 \oplus t_1 = i$. Correctness follows because this version is obviously equivalent to
the original one.
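The interval order property established in Lemma 4.2 (for $a \to b$ and $c \to d$, either $a \to d$ or $c \to b$) can be checked by brute force on a finite relation. The following is a minimal sketch for experimentation only; the function `is_interval_order` and both example relations are hypothetical and not taken from the paper.

```python
from itertools import product


def is_interval_order(elements, precedes):
    """Check irreflexivity and: a<b and c<d imply a<d or c<b."""
    if any(precedes(x, x) for x in elements):
        return False
    for a, b, c, d in product(elements, repeat=4):
        if precedes(a, b) and precedes(c, d):
            if not (precedes(a, d) or precedes(c, b)):
                return False
    return True


# Hypothetical actions modelled as (start, end) intervals on a global clock.
intervals = {"a": (0, 2), "b": (3, 5), "c": (1, 4), "d": (6, 7)}


def before(x, y):
    """x precedes y if x ends before y starts."""
    return intervals[x][1] < intervals[y][0]


# Two unrelated chains a<b and c<d: the pattern an interval order forbids.
chains = {("a", "b"), ("c", "d")}


def in_chains(x, y):
    return (x, y) in chains


interval_ok = is_interval_order(list(intervals), before)
two_chains_ok = is_interval_order(["a", "b", "c", "d"], in_chains)
```

Here `interval_ok` is true because precedence between real-time intervals always has the property, while `two_chains_ok` is false: the relation is a partial order but has no cross relation between the two chains.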
Define the clans $[w]$ and the $\to_\pi$ relation between them as before.
Theorem 4.3. The Bloom register $K$ is an atomic, 2-writer, $n$-reader register, which is
implemented by two atomic, 1-writer, $(n-1)$-reader registers. The value domain of the subregisters
is the cartesian product of $\{0,1\}$ and the value domain of the compound register.
Proof. Let $\rho = (A, \to, \pi)$ be an arbitrary run according to the protocol. The conclusion of the
theorem follows immediately from Lemma 4.4, Lemma 4.5 and Theorem 2.1. Namely, by
Lemma 4.5 (1) $\rho$ is normal, and Lemma 4.5 (2) implies acyclicity of $\to_\pi$ by Lemma 4.4.
As in the previous section, the key idea is to totally order the write actions first. Denote
the identity of the tag written in step 3 of a write $w$ by $t(w)$, and its value (0 or 1) by $\mathit{val}(t(w))$.
Consider only the set $W$ of writes. Define a relation $\ll$ on the set of writes by $w \ll w'$ iff
$w \neq w'$ and either
(a) $w \to w'$, or
(b) $\neg(w \to w')$ & $\neg(w' \to w)$ and $\mathit{val}(t(w)) \oplus \mathit{val}(t(w')) = p(w')$.









Figure 3: $r$ reads $w_1$ since $t_0 \oplus t_1 = 1$.
It may come as a surprise that the writes can be ordered by $\ll$ in a way which contradicts the
timing of the final subwrites. See Figure 3 for an example where write $w_0$ ends later than write
$w_1$, and yet the value written by the former cannot be read by any read.
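One interleaving with this behaviour can be traced step by step. The sketch below assumes a particular timing (both writers perform their tag read before either performs its subwrite), which is consistent with the caption of Figure 3 but is not necessarily the exact schedule drawn there; the variable names are hypothetical.

```python
# Subregisters as [tag, value]; both start with tag 0 and value 0.
K = [[0, 0], [0, 0]]

# Step 1 of both writes happens first, so the two writes overlap.
seen_by_p0 = K[1][0]             # w0 reads t_1 = 0
seen_by_p1 = K[0][0]             # w1 reads t_0 = 0

tag_w0 = 0 ^ seen_by_p0          # step 2 for p_0: t_0 := 0 xor t_1
tag_w1 = 1 ^ seen_by_p1          # step 2 for p_1: t_1 := 1 xor t_0

K[1] = [tag_w1, "value of w1"]   # w1 performs its subwrite first
K[0] = [tag_w0, "value of w0"]   # w0 performs its subwrite later

# A subsequent read follows the three-step read protocol.
t0, t1 = K[0][0], K[1][0]
result = K[t0 ^ t1][1]

# Clause (b) of the definition: the writes are incomparable and
# val(t(w0)) xor val(t(w1)) equals p(w1) = 1, so w0 is placed before w1.
w0_before_w1 = (tag_w0 ^ tag_w1) == 1
```

Although $w_0$ finishes last in this schedule, the tags direct every later read to $K_1$, so `result` is the value of $w_1$ and clause (b) places $w_0$ before $w_1$.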
Lemma 4.4. $\ll$ is a total order on $W$.
Proof. It suffices to prove that $\ll$ is acyclic. Assume to the contrary that there is a
minimal length cycle:
$$w_0 \ll w_1 \ll \cdots \ll w_m \ll w_0.$$
Since writes by the same processor are totally ordered by $\to$, the cycle must contain writes by
both $p_0$ and $p_1$. Suppose the cycle contains writes $a \to b$ by $p_0$ and writes $c \to d$ by $p_1$. Since $\to$ is
an interval order on $W$, it follows that either $a \to d$ or $c \to b$, which contradicts that the cycle is of
minimal length. Therefore, the cycle contains precisely one write by one of the two processors,
say $w_0$ by $p_0$. Consequently, since the writes by $p_1$ are totally ordered by $\to$, we have
$w_1 \to \cdots \to w_m$. Since the cycle has minimal length, and $\to$ is transitive, it follows that $m = 1$ or $m = 2$.
Case 1. Suppose $m = 1$. Then $w_0 \ll w_1$ and $w_1 \ll w_0$, which contradicts the antisymmetry
of $\ll$.
Case 2. Suppose $m = 2$. Then
$$w_0 \ll w_1 \ll w_2 \ll w_0, \tag{4.2}$$
in which $w_1$ and $w_2$ are writes by $p_1$. Since $w_1 \to w_2$, the only way for this cycle to occur is if
$$\begin{aligned}
&\neg(w_0 \to w_1) \;\&\; \neg(w_1 \to w_0) \;\&\; \mathit{val}(t(w_0)) \oplus \mathit{val}(t(w_1)) = 1 \;\&\; \\
&\neg(w_0 \to w_2) \;\&\; \neg(w_2 \to w_0) \;\&\; \mathit{val}(t(w_0)) \oplus \mathit{val}(t(w_2)) = 0.
\end{aligned} \tag{4.3}$$
We now derive a contradiction by induction on $\to$. We only give one half of the argument. The
other half, with the roles of $p_0$ and $p_1$ reversed, is symmetric.
Base case. Assume $w_0$ is the first write by $p_0$. Then, since both $w_1, w_2$ are $\to$-incomparable
to $w_0$, they obtain the same initial tag from $K_0$. Therefore, the tags written by $w_1, w_2$ are equal,
i.e., $\mathit{val}(t(w_1)) = \mathit{val}(t(w_2))$, which contradicts (4.3).
Induction. Assume there is no 3-cycle like (4.2) containing a write $w$ by $p_0$, with $w \to w_0$.




Figure 4: Reducing to an earlier 3-cycle. 
Let $w$ be a write by $p_0$, with $w \to w_0$. Since $w_1 \to w_2$, and $\to$ is an interval order on $W$,
either $w \to w_2$ or $w_1 \to w_0$. Since $w_1, w_0$ are $\to$-incomparable, we must have $w \to w_2$. Assume that
$w$ is the write by $p_0$ directly preceding $w_0$. Then $w$ is also the write by $p_0$ which directly
precedes $w_2$. Therefore, $w_2$ reads $t(w)$ from $K_0$. Hence, in write $w_2$, processor $p_1$ chooses $t(w_2)$
such that $\mathit{val}(t(w)) \oplus \mathit{val}(t(w_2)) = 1$. Then, by (4.3),
$$\mathit{val}(t(w)) \oplus \mathit{val}(t(w_1)) = 0. \tag{4.4}$$
We can now finish the argument by showing that each way $w$ and $w_1$ can be $\to$-related
leads to a contradiction.
Suppose $w \to w_1$. Since both pairs $w_0, w_1$ and $w_0, w_2$ were $\to$-incomparable, both $w_1$ and $w_2$ must
read tag $t(w)$. Then $\mathit{val}(t(w_1)) = \mathit{val}(t(w_2))$, which contradicts (4.3).
Suppose $w_1 \to w$. Then $w_1 \to w_0$, which contradicts (4.3) again.
Suppose $w$ and $w_1$ are $\to$-incomparable. Then, by (4.4) and the definition of $\ll$, we have
$w_1 \ll w$. This, together with $w \to w_0$ and part of cycle (4.2), creates a new cycle
$$w \ll w_0 \ll w_1 \ll w.$$
However, this is a 3-cycle contradicting the assumption that (4.2) is the 3-cycle containing the
$\to$-least write by $p_0$. This concludes the induction and the proof of the lemma. $\blacksquare$
Lemma 4.5.
(1) $\rho$ is normal.
(2) If $[w] \to_\pi [w']$ then $w \ll w'$.
Proof. (1). By construction of the protocol, and because there is an initial write preceding
all other actions, each read reports a value written by a write. I.e., $\pi$ is a total function. If
$w = \pi(r)$, then $r$ returns $v$ with $\mathit{id}(v) = w$, by (R6). Without loss of generality, let $p(w) = 0$. Thus,
considering the run $\rho_0 = (A_0, \Rightarrow_0, \pi_0)$ of subregister $K_0$, it must be the case that $\pi_0(R_f(r,0)) = W(w)$,
and therefore $W(w) \Rightarrow_0 R_f(r,0)$ by the definition of atomicity. Hence, by definition (4.1) of $\to$,
we have $\neg(r \to \pi(r))$.
(2). It is convenient to prove the following Claim first.
Claim. If $[w] \to_\pi [w']$ and $\neg(w \to w')$, then $w \to r$ for some read $r \in [w']$.
Proof of the claim. If $[w] \to_\pi [w']$ then $a \to b$ with $a \in [w]$ and $b \in [w']$.
Case a. If $a = w$ and $b$ is a read, then there is nothing to prove.
Case b. Let $a = w$ and $b = w'$. This contradicts the assumption $\neg(w \to w')$.
Case c. Let $a$ be a read. I.e., $\pi(a) = w$. Without loss of generality, let $p(w) = 0$. Then
$\pi_0(R_f(a,0)) = W(w)$ and therefore $W(w) \Rightarrow_0 R_f(a,0)$. Therefore, by (4.1), we have $w \to b$. If $b = w'$
then this case reduces to Case b. If $b$ is a read then this case reduces to $w \to b$ with $\pi(b) = w'$,
which is what we had to prove. This completes the proof of the Claim.
To prove part 2 of the lemma, assume to the contrary that $[w] \to_\pi [w']$ and $w' \ll w$. Suppose
first that $w$ and $w'$ are executed by the same processor. Then, by definition of $\ll$, $w' \to w$.
By the claim, $w \to r$ for some read $r \in [w']$. Since the value written by $w$ overwrites the value
written by $w'$ in $K_0$, a read $r$ with $R_f(r,0)$ such that $W(w) \Rightarrow_0 R_f(r,0)$ cannot read the value
written in $K_0$ by $w'$, which contradicts $r \in [w']$. Therefore, we must assume that $w$ and $w'$ are
executed by different processors, say $p_0$ and $p_1$, respectively.
Case 1. Suppose $w \to w'$. Then it follows that $w \ll w'$, which is a contradiction.
Case 2. Suppose $w' \to w$. By the claim, for some $r$, we have $w \to r$ and $\pi(r) = w'$. Therefore,
$w' \to w \to r$. Since $\pi(r) = w'$, $p_1$ does not write in $K_1$ in between $W(w')$ and $R_f(r,1)$. Hence
when read $r$ scans the tag in register $K_1$, in subread $R_s(r,1)$, it must obtain $t(w')$. Each write $\hat{w}$
by $p_0$ (in particular $\hat{w} = w$, but also all other writes $\hat{w}$ satisfying the following conditions) such
that $w' \to \hat{w}$ and $W(\hat{w}) \Rightarrow_0 R_s(r,0)$, scanning the tag in $K_1$, also must obtain $t(w')$. Therefore, $p_0$
chooses $\mathit{val}(t(\hat{w}))$ equal to $\mathit{val}(t(w'))$ in write $\hat{w}$. Hence, subread $R_s(r,0)$ scans a tag $t(\hat{w})$ such
that $\mathit{val}(t(\hat{w})) = \mathit{val}(t(w'))$. The final subread in $r$, therefore, must read from $K_0$, i.e., must be
$R_f(r,0)$, since $\mathit{val}(t(\hat{w})) \oplus \mathit{val}(t(w')) = 0$. This contradicts the assumption that $r$ returns the value
written by $w'$ in $K_1$.
Case 3. Suppose neither $w' \to w$ nor $w \to w'$. Since $w' \ll w$, it must be the case that
$$\mathit{val}(t(w)) \oplus \mathit{val}(t(w')) = 0. \tag{4.5a}$$
By the Claim we have $w \to r$ with $\pi(r) = w'$. Thus, $r$ returns the value from $K_1$. This implies that
$r$ scans a tag $t(w_0)$, written by some write $w_0$ by $p_0$, and a tag $t(w_1)$, written by some write $w_1$
by $p_1$, such that
$$\mathit{val}(t(w_0)) \oplus \mathit{val}(t(w_1)) = 1. \tag{4.5b}$$
It turns out that any possible $\to$-relation between $w_0$ and $w$ (both by $p_0$), in combination with
any possible $\to$-relation between $w_1$ and $w'$ (both by $p_1$), leads to a contradiction.
We first observe that neither $w_0 \to w$ nor $w' \to w_1$. Namely, if $w_0 \to w$, then $w_0 \to w \to r$ and
$r$ cannot scan $t(w_0)$ in $K_0$, since it is already overwritten by $w$ when $r$ scans. Therefore, $w = w_0$
or $w \to w_0$. Secondly, because $r$ returns the value written by $w'$ in $K_1$, there can be no $W(w_1)$
between $W(w')$ and $R_f(r,1)$. Therefore, $w' = w_1$ or $w_1 \to w'$. Now we check off the remaining
possibilities.
Subcase 3.1. Suppose $\mathit{val}(t(w_0)) = \mathit{val}(t(w))$ and $\mathit{val}(t(w_1)) = \mathit{val}(t(w'))$. This
contradicts (4.5) straightaway.
Subcase 3.2. Suppose $\mathit{val}(t(w_1)) \neq \mathit{val}(t(w'))$. Then $w_1 \to w'$ and therefore $w_1 \ll w'$. Since
we have assumed $w' \ll w$, we have $w_1 \ll w$. If $w_1, w$ were $\to$-incomparable, then
$\mathit{val}(t(w_1)) \oplus \mathit{val}(t(w)) = 0$ by definition of $\ll$. Therefore $\mathit{val}(t(w')) \oplus \mathit{val}(t(w)) = 1$, which
contradicts (4.5). Hence $w_1 \to w$. However, we have assumed that $w \to r$ and $r$ scans $t(w_1)$. Then
$w$ must also obtain $t(w_1)$, and again $\mathit{val}(t(w_1)) \oplus \mathit{val}(t(w)) = 0$, contradicting (4.5).
Subcase 3.3. Suppose $\mathit{val}(t(w)) \neq \mathit{val}(t(w_0))$. Then $w \to w_0$ and therefore $w \ll w_0$. Since
$w' \ll w$ we have $w' \ll w_0$. If $w_0, w'$ were $\to$-incomparable, then $\mathit{val}(t(w_0)) \oplus \mathit{val}(t(w')) = 0$ by
definition of $\ll$. Then $\mathit{val}(t(w)) \oplus \mathit{val}(t(w')) = 1$, which contradicts (4.5). The only other way
to satisfy $w' \ll w_0$ is $w' \to w_0$. Since $r$ scans $t(w_0)$, we also have $W(w_0) \twoheadrightarrow R_s(r,0)$. But $r$
reports the value written by $w'$, and therefore there is no write on $K_1$ in between $W(w')$ and
$R_f(r,1)$. So $w_0$ must scan $t(w')$. Hence, $\mathit{val}(t(w')) \oplus \mathit{val}(t(w_0)) = 0$, contradicting (4.5).
This finishes the proof of the lemma and hence of the theorem. $\blacksquare$
Corollary. The Bloom register is shrinking atomic.
Proof sketch. This follows similarly as in the matrix construction by assuming that $\to$ is
an interval order, e.g., by assuming global time. Then we have 'shrinking' atomicity by Theorem
2.3. $\blacksquare$
5. Conclusion 
In this paper we propose a method of proving atomicity of shared registers in an order setting. It
seems to us that it can be applied in many cases where we have to prove atomicity. In outline:
1. Find an appropriate partial order $\to$ between the high level reads and writes, defined in terms
of the assumed partial order between the lower level reads and writes. Induce the $\to_\pi$
relation on the set of clans defined by the reading mapping.
2. Find a way to totally order the writes, using the intuition which makes you believe the
protocol works correctly. Use this total order to prove that $\to_\pi$ is acyclic.
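The second step rests on a simple observation: if every $\to_\pi$ edge between clans points forward in some total order of the writes, then $\to_\pi$ cannot contain a cycle. On a finite example this can be sanity-checked as in the sketch below, which is an illustration and not a substitute for the proof; the function `respects_total_order` and the example data are hypothetical.

```python
def respects_total_order(write_order, clan_edges):
    """True if every edge [w] -> [w'] goes forward in write_order.

    write_order: list of writes, earliest first (the chosen total order).
    clan_edges:  pairs (w, w2) meaning clan [w] precedes clan [w2].
    """
    position = {w: k for k, w in enumerate(write_order)}
    return all(position[w] < position[w2] for w, w2 in clan_edges)


# Hypothetical run with three writes, each naming its clan.
order = ["w0", "w1", "w2"]

forward_edges = [("w0", "w1"), ("w0", "w2"), ("w1", "w2")]
with_back_edge = forward_edges + [("w2", "w0")]

compatible = respects_total_order(order, forward_edges)
incompatible = respects_total_order(order, with_back_edge)
```

When the check succeeds, the total order itself is a linear extension of the clan relation, which is the role the order $\ll$ plays in Lemma 4.5 (2).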
Leslie Lamport has suggested expressing atomicity in terms of orders before. He treats the
single writer case in [3]. On a metalevel, our method of first totally ordering the writes as an
auxiliary construction for proving acyclicity of $\to_\pi$ may be viewed as a reduction to the single writer
case. The method is present in embryonic form in [9], and was used in its present form in the
presentation of that paper (and distributed) at that conference as [10]. Nancy Lynch has devised a
related proof for the matrix algorithm using time and Input/Output automata. The 'knowledge
graph' in [8] is the set of clans with the $\to_\pi$ relation induced by the natural interval order.
Acknowledgement 
Conversations with Bard Bloom, Leslie Lamport, Arjen Lenstra and Nancy Lynch are gratefully 
acknowledged. Lambert Meertens' comments had a profound influence on this paper. 
References 
[1] Bloom, B., Constructing Two-writer Atomic Registers, Manuscript, Massachusetts Institute
of Technology, June 1986.
[2] Fishburn, P.C., Interval Orders and Interval Graphs, Wiley, 1985.
[3] Lamport, L., On Interprocess Communication, Part I: Basic Formalism, Part II: Algorithms,
Distributed Computing, vol. 1, pp. 77-101, 1986.
[4] Lamport, L., The mutual exclusion problem, Part I - A theory of interprocess communication,
Journal ACM, vol. 33, pp. 313-326, 1986.
[5] Mullender, S.J., and P.M.B. Vitanyi, Distributed match-making for processes in computer
networks. In: Proceedings 4th ACM Symposium on Principles of Distributed Computing,
1985, pp. 261-271.
[6] Papadimitriou, C., The serializability of concurrent database updates, Journal ACM, vol.
26, pp. 631-653, 1979.
[7] Peterson, G.L., Concurrent Reading While Writing, ACM Transactions on Programming
Languages and Systems, vol. 5, no. 1, Jan. 1983, pp. 46-55.
[8] Peterson, G.L., and J.E. Burns, Concurrent Reading While Writing II, the Multiwriter Case,
Tech. Rept. GIT-ICS-86/26, Georgia Institute of Technology, December 1986.
[9] Vitanyi, P.M.B., and B. Awerbuch, Atomic Shared Register Access by Asynchronous
Hardware, Proceedings 27th IEEE Symposium on Foundations of Computer Science, 1986,
pp. 233-243.
[10] Vitanyi, P.M.B., and B. Awerbuch, Atomic Shared Register Access by Asynchronous
Hardware using Unbounded Tags, Manuscript, Massachusetts Institute of Technology,
October 1986.
