Time and space optimal implementations of atomic multi-writer register  by Israeli, Amos & Shaham, Amnon
Information and Computation 200 (2005) 62–106
Time and space optimal implementations of atomic
multi-writer register
Amos Israeli a,∗, Amnon Shaham b
a School for Computer Science and Mathematics, Netanya Academic College, Israel
b Intel, Israel
Received 20 September 2000; revised 3 July 2004
Available online 23 May 2005
Abstract
This paper addresses the wide gap in space complexity of atomic, multi-writer, multi-reader register implementations. While the space complexity of all previous implementations is linear, the lower bounds are logarithmic. We present three implementations which close this gap: the first implementation is sequential and its role is to present the ideas and data structures used in the second and third implementations. The second and third implementations are both concurrent: the second uses multi-reader physical registers, while the third uses single-reader physical registers. Both the second and third implementations are optimal with respect to the two most important complexity criteria: their space complexity is logarithmic and their time complexity is linear.
© 2005 Elsevier Inc. All rights reserved.
1. Introduction
1.1. Problem description
At the most basic level of asynchronous interprocessor communication, data are transferred
via shared memory. A register is a very simple model for shared memory-based communication.
☆ Partially supported by NWO through NFI Project ALADDIN under Contract No. NF 62-376. A preliminary version of this paper was presented at the 11th Annual ACM Symposium on Principles of Distributed Computing, August 1992, Vancouver, Canada.
∗ Corresponding author. Fax: +972 9 860 7825.
E-mail addresses: amos@netanya.ac.il (A. Israeli), amnon.shaham@intel.com (A. Shaham).
0890-5401/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.ic.2004.11.004
Processors access registers by executing read and write operations. Each register has a set of writer
processors, a set of reader processors and a set of permitted values, including a distinguished value
called the register’s initial value. A register is atomic if each operation is executed instantaneously
and each read operation returns the value written by the most recent, preceding, write operation,
or the initial value if no such preceding write operation occurred. Each atomic register is further
classified by the number of its writers, the number of its readers and the number of its permitted values. Once atomic registers are defined and classified, it is natural to compare the relative computational power of various kinds of registers. In this paper, we study the computational resources needed to implement a multi-writer atomic register using single-writer atomic registers.
Informally, an implementation of a logical register using a set of physical registers consists of a hardware arrangement of the physical registers and two programs that are called the writer protocol and the reader protocol. Both protocols are composed of operations on the physical registers (which are called physical operations) and constitute the operations of the logical register (which are called the logical operations). The set of processors is partitioned into logical writers and logical readers, that is, writers and readers of the logical register. For simplicity we assume that each processor is either a logical writer or a logical reader, though in reality a processor can function as both.
Whenever a processor “wishes” to execute a logical operation, it starts executing its protocol. The execution of a physical (logical) operation is called a physical (logical) action. The physical actions executed by a processor during a logical action may be interleaved with physical actions of other processors, and since the system is entirely asynchronous, there is no bound on the number of physical actions (of other processors) between any two consecutive physical actions of any processor. In particular, a processor may crash, that is, stop execution forever, in the middle of any logical action.
Informally, an execution of the system is a sequence of physical actions partitioned into logical actions. The correctness condition we use is called linearizability. A linearization of an execution of some register implementation is an assignment of a distinct linearization point to each logical action such that the induced sequence of logical actions preserves the order of non-overlapping actions, and each read operation returns the value written by the most recent preceding write operation, or the initial value if no such write operation occurred. An implementation is linearizable if all its executions are linearizable. To ensure resilience to crash faults, and to avoid mutual-exclusion techniques that eliminate concurrency, the reader and writer protocols are required to be wait-free; that is, the number of physical actions a processor executes during a single logical action is bounded from above, where the bound may depend on the number of processors in the system.
1.2. Complexity measures
Throughout this paper w and r are used for the number of writers and readers, respectively, and n = w + r is the total number of processors in the system. A register with w writers and r readers is denoted as a (w, r)-register. In label-based implementations, the physical registers are divided into two fields: a value field and a label field. The value field holds a permitted value, or a finite set of permitted values. The only operations which access the value field are copying a value to this field or from it. The label field holds all the coordination information needed for the implementation.
We deal with label-based implementations of two types, according to the physical registers used: in type (1, n), each logical writer owns¹ an atomic (1, n)-register,² via which it communicates with all other processors. In type (1, 1), each processor communicates with every other processor via a (1, 1) atomic register. It should be noted that if two registers of the same owner have the same set of readers, they can be joined into a single register in which the two registers are represented as fields; thus we assume that the sets of readers of every two registers of the same owner are distinct. Under this assumption, the complexity of label-based implementations is measured by:
1. Space complexity. The maximal size of a label field of any physical register. (This criterion is often called label-size.)
2. Time complexity. The maximal number of physical actions executed during a single logical read or write operation.
Note. The definition of space complexity enables us to ignore the size of the value set of the implemented register. In case the value field holds more than a single value, the number of values in the value field should also be considered and added to the space complexity. In the second and third implementations presented in this paper, the value field has two and four values, respectively. We regard 4 as a small enough constant and conveniently ignore it.
A third and much less common complexity measure is parallel-time complexity. In order to analyze the parallel-time complexity of an implementation, it is assumed that each processor P can write in (respectively, read from) k of its n registers (k ≤ n) in parallel, using a single operation. Furthermore, it is assumed that whenever P issues a parallel write operation, all k actions are executed independently, and any interleaving of these actions among themselves, as well as with actions of other processors, is possible. The only restriction is that P may not continue its execution until all k actions are completed. Under these assumptions we define:
3. Parallel-time complexity. The maximal number of physical actions, including Parallel Write and
Parallel Read, executed during a single logical read or write operation.
1.3. Results of this paper
We present three bounded, label-based implementations of an atomic, multi-writer, multi-reader register, with logarithmic space complexity. The first implementation is sequential and its role is to present the ideas and data structures used by the second and third implementations. The second and third implementations are the first concurrent implementations of a multi-writer register with logarithmic space complexity.
The second implementation is of type (1, n); in this implementation communication is one-sided: writers communicate with readers (and with other writers), while readers do not write. Each writer owns an atomic (1, n)-register which can be read by all processors, writers and readers. The space complexity of this implementation is Θ(log w); by the lower bounds of [3,19], this bound is optimal for label-based implementations. For general implementations, it is not hard to prove that the number of values in each of the registers used in any implementation is bounded below by the size of the value set of the implemented register. Hence, if the number of permitted values of the implemented register is polynomial in n, the register size is Ω(log n). Therefore, the space complexity of this implementation is optimal for any implementation of an atomic register whose value set size is polynomial in n. The time complexity of this implementation is Θ(w), which is optimal for any implementation. The parallel-time complexity of the (1, n) implementation is Θ(w).
¹ The writer of every physical register is called its owner.
² Throughout this paper we assume that each processor communicates with itself via a register. This assumption is not essential and is used to obtain simpler code and cleaner expressions, e.g., (1, n)-register instead of (1, n − 1)-register.
The third implementation is of type (1, 1); in this implementation communication is two-sided: each processor, writer or reader, communicates with each other processor via an atomic (1, 1)-register. In [9], it was proved that two-sided communication is necessary for implementations of type (1, 1). The space complexity of this implementation is Θ(log n) and the time complexity is Θ(n). In case w = Θ(n), the space complexity is optimal for label-based implementations. For general implementations, if the size of the value set is polynomial in n, then the space complexity is optimal. The time complexity is optimal for any implementation. The parallel-time complexity of the (1, 1) implementation is Θ(n).
One may wonder which of the two concurrent implementations is “better.” The answer depends on the type of physical registers at our disposal. If the available physical registers are of type (1, n), then they can be used as (1, 1)-registers in the (1, 1) implementation, but in this case the space and time complexity are Θ(log n) and Θ(n), respectively, instead of Θ(log w) and Θ(w), and the number of needed registers is n² instead of n. Thus, in this case the (1, n) implementation is clearly preferable.
If the available physical registers are of type (1, 1), then the (1, n) implementation can be used if the required (1, n)-registers are implemented from (1, 1)-registers. The best implementation of a (1, n)-register from (1, 1)-registers is obtained from the multi-writer implementation of [13]. Adapting this implementation for a single writer yields an implementation of a (1, n)-register in which the single writer owns (1, 1)-registers whose label-size is Θ(n), while the label-size of the readers' registers is Θ(1). Thus, its space complexity is Θ(n) and its time complexity is Θ(n) as well. In order to implement the w (1, n)-registers, each writer of the implemented (w, r)-register uses n (1, 1)-registers whose label-size is n + w − 1 bits. The space complexity of the combined implementation is Θ(n + w log w). The time complexity for reading or writing a single value is Θ(n); thus, the time complexity of the combined implementation is Θ(w · n). Obviously, when the available physical registers are (1, 1), the (1, 1) implementation, whose space and time complexity are Θ(log n) and Θ(n), respectively, is superior to the (1, n) implementation.
1.4. Previous work
Peterson, in [16], presented the first implementation of a register by another register; Misra, in [15], gave axioms for shared memory systems; and Lamport [10,11] was the first to formalize the notion of a register implementation. In [10,11], Lamport showed that an atomic (1, 1)-register with any value set can be implemented using very weak registers, namely binary, safe (1, 1)-registers. Several papers, motivated by the work of Lamport, studied the intriguing problem of implementing atomic, multi-writer, multi-reader registers. The simplest such implementation was presented by Vitányi and Awerbuch in [20]. They present a label-based implementation of an atomic (w, r)-register, using atomic (1, 1) physical registers. In the implementation of [20], the labels are unbounded counters used as time-stamps; hence this implementation has unbounded space complexity. The actual size of a label in any logical action is logarithmic in the number of write actions performed prior to that action. The time complexity of this implementation is linear in n, the total number of processors.
Some researchers have devised bounded label-based implementations of type (1, n) for atomic multi-writer, multi-reader registers. The first implementation was proposed in [20] and was found to be erroneous. The second implementation was presented by Peterson and Burns in [17]; that implementation has a bug which was discovered and corrected by Schaffer in [18]. Its space complexity is Θ(w) and its time complexity is Θ(w²). Israeli and Li, in [7], suggested bounded time-stamps as a bounded primitive to capture the precedence relationship among asynchronous processors. Using bounded time-stamps, they presented an implementation whose space and time complexity are Θ(n). The correctness of the [7] implementation was never fully proved. Another implementation, with higher complexity, was presented by Dolev and Shavit [4]. Abraham, in a manuscript dated 1991, presents an implementation whose space and time complexity are Θ(w). A later version of Abraham's manuscript, with stronger results that are influenced by the results of this paper, appears in [2]. The various (1, n) implementations are compared in Table 1. None of these (1, n) implementations has had its parallel-time complexity analyzed.
The only two implementations of type (1, 1) are the original unbounded implementation of [20], whose time complexity is Θ(n), and the implementation of Li, Tromp and Vitányi, presented in [13]. The space and time complexity of the [13] implementation are Θ(n), and its parallel-time complexity is a small constant. The various (1, 1) implementations are compared in Table 2.
Table 1
Implementations of type (1, n)
         Refs. [17,18]   Ref. [7]   Ref. [4]       This paper   Ref. [2]
Space    Θ(w)            Θ(n)       Θ(w)           Θ(log w)     Θ(log w)
Time     Θ(w²)           Θ(n)       Θ(w² log w)    Θ(w)         Θ(w)
Table 2
Implementations of type (1, 1)
                 Ref. [20]   Ref. [13]   This paper
Space            unbounded   Θ(n)        Θ(log n)
Time             Θ(n)        Θ(n)        Θ(n)
Parallel time    Θ(n)        Θ(1)        Θ(n)
1.5. Discussion
Prior to this work, all proposed bounded concurrent implementations had a space complexity which is linear in the number of writers, and in some cases even in the total number of processors. In [14], Li and Vitányi presented a sequential implementation (sequential implementations are correct only for sequential executions, in which the logical actions are executed one after another, without overlapping) with log w space complexity. The sequential implementation of Li and Vitányi uses the ids of the processes; in other words, it is not anonymous. For an anonymous implementation their method requires labels of size 2 log w, since the processors' ids are added to the labels.
Later, Cori and Sopena proved in [3] that an anonymous implementation for w writers must have at least 2w − 1 distinct labels. They also devised a sequential implementation with exactly 2w − 1 labels, which improved the space complexity of the sequential [14] implementation by a constant factor. In [19], Tromp proved that the sum of the sizes of the label fields in non-anonymous implementations is at least w log w, which translates to an Ω(log w) label size, assuming that all label fields have the same size.
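As a rough sanity check on these bounds (our own arithmetic, not taken from [3] or [19]), the number of bits needed to name 2w − 1 distinct labels, and the per-field bound implied by the sum bound of [19] when all w label fields share a common size s, work out as follows:

```latex
% 2w - 1 distinct labels need at least \lceil \log_2(2w-1) \rceil \ge \log_2 w bits, and for w \ge 1
\lceil \log_2 (2w-1) \rceil \;\le\; \lceil \log_2 (2w) \rceil \;=\; 1 + \lceil \log_2 w \rceil
% bits suffice, roughly half of the 2\log w of the anonymous version of [14].
% Sum bound of [19] with w label fields of common size s:
w \cdot s \;\ge\; w \log w \;\Longrightarrow\; s \;\ge\; \log w, \quad\text{i.e. } s = \Omega(\log w)
```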
These results leave an exponential gap between the lower and upper bounds on the space complexity of label-based implementations of atomic registers. In a way, this gap represents the cost of concurrency: the lower bounds of [3] and [19] consider only the combinatorial requirements for identifying the last written value. For this reason, these lower bounds hold for both sequential and concurrent implementations. The extra complexity of concurrent implementations seems to be incurred by the need to deal with concurrency aspects. All concurrent implementations before this paper use direct binary comparison between the labels of every pair of processors to determine their precedence relations, and therefore they all require (at least) linear space.
The significance of this gap is further emphasized when one considers a related problem, namely, an implementation of an atomic register which is correct for executions whose length is polynomial in the number of processors. One may motivate this definition by arguing that in real life, the probability of longer executions is so low that the cost of supporting them must be taken into account. For polynomially long executions, the space complexity of the [20] implementation is logarithmic, while the space complexity of all previous bounded protocols is linear. In other words, for a problem that might be considered more practical, the protocol of [20] outperforms all other implementations by an exponential factor. The results of this work show that boundedness can be achieved with no more than logarithmic space complexity, thus closing the exponential gap discussed above.
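The logarithmic bound for polynomially long executions can be checked directly under an assumption we make explicit: as in the unbounded counters of [20], each write picks a counter at most one larger than the largest counter it reads, so if an execution contains at most n^c logical writes, for some constant c, no counter exceeds n^c:

```latex
% Counters lie in \{0, 1, \ldots, n^c\}, and n^c + 1 \le 2n^c for n \ge 1, so a label needs at most
\lceil \log_2 (n^c + 1) \rceil \;\le\; \lceil \log_2 (2 n^c) \rceil \;=\; 1 + \lceil c \log_2 n \rceil \;=\; O(\log n)
% bits.
```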
The problem of implementing a multi-writer atomic register is well known. As we pointed out, several erroneous or incomplete solutions have been published, and debugging them was complex and controversial. Given the history of the problem, we believe that no implementation is worth anything without a complete, detailed correctness proof. In this work, we fulfill this obligation and present full proofs, without any shortcuts, for both concurrent implementations. Unfortunately, the proofs are often unintuitive, very technical and uninspiring. We regard the presentation of a formal verification system for register implementations as an important open problem.
1.6. Paper organization
The rest of this paper is organized as follows. In Section 2, we formally define the model of computation and the implementation problem. In Section 3, we explain the data structure used by our protocols and present a sequential implementation. The sequential implementation serves as an exposition of the ideas which are later used in the concurrent implementations. The (1, n) and the (1, 1) implementations are presented in Sections 4 and 5, respectively. Concluding remarks are presented in Section 6, and the correctness proof of the hand-shake mechanism appears in Appendix A.
2. Model and requirements
In this section, we define the model of computation and the atomic register implementation problem. A system consists of a set of processing entities called processors, and a set of memory entities called atomic registers (for brevity we use the term registers throughout this paper). Each register has a set of writer processors, a set of reader processors and a set of permitted values, including a distinguished value called the register's initial value. Processors access registers by executing read and write operations. A write operation to register REG is executed by some writer of REG; the operation gets a permitted value of REG as an input parameter and stores the value in REG. Analogously, a read operation from REG is executed by some reader of REG; the operation retrieves the (permitted) value stored in REG and returns it as an output parameter.
Register REG is atomic if each operation is executed instantaneously and each read operation returns the value written by the most recent preceding write operation, or REG's initial value if no such write operation exists. A processor is a finite state machine. For convenience, we describe a processor by its protocol, whose building blocks are instructions, where each instruction corresponds to a single state transition. Each instruction starts with an internal computation, which is followed by at most one operation. Each protocol has a distinguished instruction corresponding to the state machine's initial state, which is called the protocol's initial instruction.
System executions are described under the interleaving model: a system's global state is described by its configuration, a vector containing the state of each processor and the value of each register. The system's initial configuration contains the processors' initial states and the registers' initial values. The execution of an instruction is called an action. We assume that each processor repeatedly executes its protocol and that the actions of the system's processors are interleaved. A system execution is a sequence of configurations and actions $E = c_0, d_1, c_1, \ldots, c_{i-1}, d_i, c_i, \ldots$, where $c_0$ is the system's initial configuration and, for every $i > 0$, $c_i$ is obtained from $c_{i-1}$ by action $d_i$. Configuration $c_i$ is called the result configuration of action $d_i$. Whenever convenient we omit the $d_i$'s and describe an execution by the sequence $c_0, c_1, \ldots$
Now, we define the implementation problem for atomic registers. An atomic register is a concurrent object, see [5]. A concurrent object can be specified as an automaton, see [12], by a set of axioms, see [15], or by a set of sequential executions, see [6]. Regardless of the method of specifying the concurrent object, most works assume the interleaving model and describe executions in the way we presented above. Intuitively, a system is an implementation of a logical register REG if the system processors can be divided into writers and readers and a single execution of a writer (reader) protocol can be regarded as a write (read) action on REG. If the physical register(s) are equal to the logical register, the problem is trivial; thus, to make the problem meaningful, the implemented register should have a larger number of readers, a larger number of writers, or a larger set of values.
Instead of giving a formal definition of an implementation, we refer the reader to [5,1], in which the notion of implementation is discussed and formally defined. Here, we resort to an informal description which nevertheless states precisely what we do: an implementation of a logical register is a system whose registers are called physical registers. The operations that access the physical registers are called physical operations. A writer (reader) is defined by the writer protocol (reader protocol), whose building blocks are physical operations. The writer protocol has one input parameter, which is the value that should be stored in the logical register. The reader protocol exits through a return instruction, which does not contain any physical operation but rather returns a value, the result of the logical read action.
Consider an execution of an implementation in which each processor executes its protocol repeatedly and the actions of the processors are interleaved. Each execution of the writer (reader) protocol constitutes a logical write (logical read) action. Let a be a logical action of processor $P_i$. The actions of $P_i$ executed during a are the physical actions of a. Let $d_{i_1}, c_{i_1}, d_{i_2}, \ldots, d_{i_t}, c_{i_t}$ be the physical actions and resulting configurations of logical action a. Configuration $c_{i_1-1}$ is called a's initial configuration, configuration $c_{i_t}$ is called a's final configuration, and the interval $[c_{i_1-1}, \ldots, c_{i_t}]$ is called a's execution interval. It should be noted that the interval $[c_{i_1-1}, \ldots, c_{i_t}]$ may contain actions of all processors (and not necessarily of $P_i$ alone). We prove the correctness of the implementations under the linearizability correctness condition [6]. An execution is linearizable if each logical action can be assigned a linearization point such that the sequence of linearization points preserves the order of non-overlapping logical actions, and each logical read action r returns the value written by w, where w is the action that is linearized last among all logical write actions linearized before r (or the register's initial value if no write action is linearized before r). In all linearizations we present, every linearization point lies within the execution interval of its logical action, which trivially preserves the order of non-overlapping actions.
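Once linearization points have been chosen, the conditions above can be checked mechanically. The Python sketch below is our own illustration, not part of the paper's proofs; the LogicalAction record and the use of configuration indices as interval endpoints are assumptions. It verifies that the points are distinct, that each point lies inside its action's execution interval (which, as noted above, preserves the order of non-overlapping actions), and that every read returns the value of the write linearized last before it, or the initial value.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class LogicalAction:
    kind: str     # "write" or "read"
    value: Any    # value written, or value the read returned
    start: float  # index of the action's initial configuration
    end: float    # index of the action's final configuration
    point: float  # proposed linearization point

def check_linearization(actions: List[LogicalAction], initial: Any) -> bool:
    """Check a proposed linearization against the conditions above."""
    points = [a.point for a in actions]
    if len(set(points)) != len(points):
        return False  # linearization points must be distinct
    for a in actions:
        if not (a.start <= a.point <= a.end):
            return False  # point must lie inside the execution interval
    current = initial  # value of the write linearized last so far
    for a in sorted(actions, key=lambda x: x.point):
        if a.kind == "write":
            current = a.value
        elif a.kind == "read":
            if a.value != current:
                return False  # read must return the latest preceding write
        else:
            raise ValueError(f"unknown action kind: {a.kind!r}")
    return True

wr = LogicalAction("write", 7, 0, 4, 2)
r1 = LogicalAction("read", 0, 1, 3, 1)  # overlaps wr, linearized before it
r2 = LogicalAction("read", 7, 5, 6, 5)  # starts after wr has finished
print(check_linearization([wr, r1, r2], initial=0))  # True
```

Here the first read overlaps the write and is linearized before it, so returning the initial value 0 is legal. Searching for a valid assignment of points, rather than checking a given one, is a different and harder problem that this sketch does not attempt.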
An implementation is linearizable if all its executions are linearizable. An execution in which the
physical actions of each logical action are executed one after the other, with no interleaving with the
physical actions of any other logical actions, is called sequential. An implementation is sequential
if all its sequential executions are linearizable. A protocol is wait-free if all its executions consist of
a bounded number of physical actions, where the bound may depend on the number of processors
in the system. We require that the logical operations are wait-free.
3. Basic principles
In this section, the precedence graph method, used in all three implementations, is reviewed. Then, the actual precedence graph on which the implementations are based is presented. The section concludes by presenting a sequential implementation of a multi-writer, multi-reader register from single-writer, multi-reader registers. The sequential implementation is used to demonstrate the main features shared by all three implementations.
3.1. Time-stamps and the precedence graph method
Many concurrent protocols use the Natural Time-Stamps Scheme to represent temporal precedence relations among protocol actions: each action of the protocol is labeled with some natural number called the action's time-stamp, and the time-stamp of an action is always larger than the time-stamp of every preceding action. In [7], Israeli and Li observed that the natural time-stamp scheme can be viewed as a graph, where each time-stamp is a node and the node of every time-stamp dominates the nodes of all preceding time-stamps. Following that observation, they proposed to keep the basic idea, in which nodes of earlier actions are dominated by nodes of later actions, but to replace the infinite graph of the natural time-stamps scheme with some finite graph.
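For contrast with the finite graphs used later, here is a minimal Python sketch of the natural time-stamp scheme applied to a multi-writer register, valid for sequential executions only; the class and method names are hypothetical. Each writer owns one slot holding a (stamp, value) pair, a write picks a stamp larger than every stamp used so far, and a read returns the value carrying the largest stamp.

```python
from typing import Any, List, Tuple

class NaturalTimestampRegister:
    """Sequential-only sketch: writer i owns slot i, holding (stamp, value)."""

    def __init__(self, num_writers: int, initial: Any) -> None:
        # Stamp 0 marks the initial value; every write gets a stamp >= 1.
        self.slots: List[Tuple[int, Any]] = [(0, initial)] * num_writers

    def write(self, i: int, value: Any) -> None:
        # Pick a natural number larger than every stamp used so far.
        stamp = max(s for s, _ in self.slots) + 1
        self.slots[i] = (stamp, value)

    def read(self) -> Any:
        # The largest stamp belongs to the most recent write
        # (or all slots still hold the initial value with stamp 0).
        _, value = max(self.slots, key=lambda sv: sv[0])
        return value

reg = NaturalTimestampRegister(num_writers=3, initial=None)
reg.write(2, "a")
reg.write(0, "b")
print(reg.read())  # b
```

Because the stamps grow without bound, the label size grows logarithmically with the number of writes, which is the unboundedness a finite graph is meant to remove. In a concurrent setting two writers could pick equal stamps, so a tie-break (for example by writer id) would also be needed; the sketch does not handle that.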
The first problem to overcome when using a finite graph to represent temporal precedence relations is the fact that the number of time-stamps is finite. Let us elaborate on this problem: consider the situation when a protocol needs to pick a time-stamp for some new action during some execution. When the protocol uses the natural time-stamp scheme, it simply picks a number larger than any number used earlier in that execution. Thus, in this case, every logical action is stamped with a unique time-stamp, all time-stamps are ordered by the temporal precedence relation and no confusion can occur. Now, consider the same situation when the time-stamp set is finite: for long enough executions, the time-stamp collection is eventually depleted and old time-stamps must be reused. Call a time-stamp alive if it exists in the memory of some processor. Whenever a time-stamp is picked for some new action, the protocol obviously should not pick an alive time-stamp, but this is not enough: the protocol must find all alive time-stamps and pick a time-stamp that dominates all of them. In this way, the set of all alive time-stamps is always ordered by the temporal precedence relation. This problem was posed and solved in [7]. For concurrent time-stamp schemes, the problem was solved by several protocols; see, e.g., [4].
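To see why reuse is delicate, consider a deliberately naive bounded scheme (our own illustration, not a scheme from [7] or [4]): stamps are kept modulo a hypothetical bound K and compared with the ordinary < relation. Once the counter wraps around, the freshly picked stamp compares as older than every alive stamp, so it fails to dominate them.

```python
from typing import Iterable, List

K = 4  # hypothetical bound: stamps are drawn from {0, 1, ..., K-1}

def naive_next_stamp(alive: Iterable[int]) -> int:
    # Bounded imitation of "pick a number larger than any used so far":
    # one more than the largest alive stamp, wrapped modulo K.
    return (max(alive) + 1) % K

def naive_is_newer(a: int, b: int) -> bool:
    # Ordinary comparison, correct only for unbounded natural stamps.
    return a > b

alive: List[int] = [1, 2, 3]      # stamps of three successive writes
fresh = naive_next_stamp(alive)   # wraps around to 0
# The fresh stamp belongs to the newest action, yet it compares as older
# than every alive stamp, so it does not dominate them.
print(fresh, [naive_is_newer(fresh, s) for s in alive])  # 0 [False, False, False]
```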
By definition, each concurrent time-stamp scheme enables determination of the precedence relation between every pair of alive time-stamps by a direct binary comparison. Therefore, each concurrent time-stamp scheme can be used to implement an atomic multi-writer register. As was proved in [7], the space complexity of any such time-stamp scheme is at least linear. In this work we use the precedence graph method, but in order to achieve logarithmic space complexity we must give up binary precedence comparability. Instead, we resort to a new method of using precedence graphs. In this new method, precedence relations are represented using directed paths. All nodes on each directed path are temporally ordered, while the temporal order among nodes on different paths is not determinable. Though our new approach cannot be used to find a complete temporal order among all alive time-stamps, it is sufficient to enable each read action to determine the most recent preceding write action, which is exactly the requirement for implementing an atomic register. A second difference, of a technical nature, is the reversal of the domination order: in the new method, nodes corresponding to earlier actions dominate nodes corresponding to later actions.
3.2. The precedence graph and its sub-trees
In this section we present the precedence graph used by the three implementations and the way
in which it represents temporal precedence order:
Definition 1. Let ID be the set of processor ids, {0, 1, 2, ..., p}, and let AD be the set of addresses {0, 1, ..., max_add}. The number of ids, p + 1, is equal to the number of processors in the system plus 1. The number of addresses, max_add + 1, which determines the exact size of the precedence graph, is a function of the number of processors and differs from one implementation to another. The precedence graph P = (V, E) is defined as follows:
• Each node is a quadruple of natural numbers. The set of nodes is a subset of $(ID \times AD \times ID \times AD) \cup \{v_{00}\}$, where $v_{00} = (0, 0, 0, 0)$ is called the root node. The four components of node v are denoted by v.id, v.address, v.tail_id and v.tail_address. A pictorial description of a single node appears in Fig. 1.
• The edges of P are encoded by the names of the nodes: let u and v be two nodes. If u.id = v.tail_id and u.address = v.tail_address, then there is an edge emanating from u and incoming to v. This edge is denoted by (u, v), and u is called the tail node of v. To avoid non-trivial cycles we require that for every node v, v.id ≥ v.tail_id.
Fig. 1. A single node.
Note. The indegree of all nodes is 1. For convenience we ignore the self-loop that emanates from $v_{00}$ and regard its indegree as 0. For any other node v, v ≠ $v_{00}$, the edge incoming to v is called the edge of v. In most cases it holds that for every node v, v.id > v.tail_id. Self-loops are allowed only in special cases that will be described later.
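Definition 1 can be transcribed almost literally. The Python sketch below (an illustration with hypothetical helper names) encodes nodes as quadruples, derives edges from the tail fields, and tests whether two nodes lie on a common directed path, which, as explained in Section 3.1, is the only case in which their temporal order is determinable. We take the requirement on v.id and v.tail_id to be v.id ≥ v.tail_id; this is an assumption, consistent with the Note's statement that v.id > v.tail_id holds in most cases and that self-loops are allowed in special cases.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass(frozen=True)
class Node:
    id: int            # id of the processor that owns this node
    address: int
    tail_id: int       # (tail_id, tail_address) names the tail node
    tail_address: int

ROOT = Node(0, 0, 0, 0)  # the root node v00

def is_edge(u: Node, v: Node) -> bool:
    """(u, v) is an edge iff u's (id, address) equals v's tail fields."""
    if v == ROOT:
        return False  # the root's self-loop is ignored (indegree 0)
    return u.id == v.tail_id and u.address == v.tail_address

def well_formed(v: Node) -> bool:
    # Assumed reading of Definition 1: v.id >= v.tail_id.
    return v.id >= v.tail_id

def tail_of(v: Node, nodes: Iterable[Node]) -> Optional[Node]:
    # Return the first node in `nodes` that is v's tail, if any.
    for u in nodes:
        if is_edge(u, v):
            return u
    return None

def on_common_path(u: Node, v: Node, nodes: List[Node]) -> bool:
    """True iff one node is reachable from the other along directed edges,
    i.e. the two lie on one directed path and are therefore ordered."""
    def path_to(x: Node) -> List[Node]:
        # Walk tail edges backwards from x; stop when no tail is found
        # among `nodes` or when a node repeats (e.g. a self-loop).
        path = [x]
        while True:
            t = tail_of(path[-1], nodes)
            if t is None or t in path:
                return path
            path.append(t)
    return u in path_to(v) or v in path_to(u)

a = Node(1, 5, 0, 0)   # tail is the root
b = Node(2, 3, 1, 5)   # tail is a
c = Node(3, 1, 0, 0)   # another child of the root
print(on_common_path(ROOT, b, [ROOT, a, b, c]), on_common_path(b, c, [ROOT, a, b, c]))
```

In this small example b and the root lie on one path, while b and c hang off different branches and so are not temporally comparable.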
In all three implementations, the implemented register is a (w, r)-register, where w and r are the numbers of writers and readers, respectively, and n = w + r is the total number of processors in the system. The writers of the implemented logical register are denoted by $W_1, W_2, \ldots, W_w$ and the readers by $R_{w+1}, \ldots, R_{w+r}$. Execution number a of the writer protocol by $W_i$ is denoted by $L_i^a$; i and a are called the id and the index of $L_i^a$, respectively. Logical action $L_i^a$ computes a node, denoted by $v_i^a$, that is written into the current field of $\mathrm{REG}_i$ at the end of $L_i^a$. Node $v_i^a$ remains in (the current field of) $\mathrm{REG}_i$ until the end of $L_i^{a+1}$, when $v_i^{a+1}$ is written into (the current field of) $\mathrm{REG}_i$. During this interval, node $v_i^a$ is called the current node of $W_i$. Given an execution E, the current node of each logical action is completely determined. Note, however, that the processors cannot compute the indices of the actions that generated the processors' current nodes.
For every current node of $W_i$, $v_i^a$, it holds that $v_i^a$.id = i, and all these nodes are stored in $\mathrm{REG}_i$. Therefore, the id i is omitted from the explicit encoding of $v_i^a$ in $\mathrm{REG}_i$, and $v_i^a$ is specified by its address, tail_id and tail_address components. The logical value written in $L_i^a$ is attached to $v_i^a$, but the protocols never use the logical value, so it is omitted from the protocols' description.
All three implementations use a procedure called collect, whose code appears in Fig. 2. Procedure collect computes and returns a subgraph of P, called G, induced by the current nodes of all processors. In addition, collect computes and returns the set AD containing the tail_address components of all collected nodes. Procedure collect works as follows: the set V is initialized with the root node $v_0^0$. Then nodes are collected by reading the processors' registers in ascending order of the processors' ids. After all nodes are collected, the procedure computes the subgraph induced by all nodes in V.
Fig. 2. The code for procedure collect.
Recall that the indegree of every node of P, except the root node, is 1. Since G is a subgraph of P, the indegree of each node in G is at most 1, and since there are no non-trivial cycles, every collected graph is a forest of rooted trees. One of these trees is always rooted at the root node, while the other trees (if there are any) are rooted at other nodes. Since we require that for every node $v_i^a$, $v_i^a$.id ≥ $v_i^a$.tail_id, the nodes on each directed path of a precedence graph are ordered in ascending order of their ids. From now on we do not use the fixed precedence graph P explicitly; instead, we use the term precedence graphs to refer to the subgraphs of P computed by collect.
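Fig. 2 is not reproduced in this text. As a rough illustration only, the following Python sketch models collect under simplifying assumptions: the registers are given as a plain dictionary (the hypothetical name `registers`) mapping each writer id i to the triple (address, tail_id, tail_address) stored in $\mathrm{REG}_i$, and reads are not concurrent. The names `Node`, `ROOT` and `collect` are illustrative, not taken from the paper. The sketch prepends each owner's id, builds the induced subgraph using the edge rule of Definition 1, and returns the set AD of tail_address values.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

class Node(NamedTuple):
    id: int
    address: int
    tail_id: int
    tail_address: int

ROOT = Node(0, 0, 0, 0)  # the root node v_0^0

def collect(registers: Dict[int, Tuple[int, int, int]]):
    """Hypothetical sketch of collect.

    registers maps a writer id i to the triple (address, tail_id, tail_address)
    stored in REG_i; the id itself is not stored and is prepended here.
    Returns the collected nodes V, the edges of the induced subgraph G, and AD.
    """
    V: List[Node] = [ROOT]
    for i in sorted(registers):                 # read registers by ascending id
        V.append(Node(i, *registers[i]))
    edges: Set[Tuple[Node, Node]] = set()
    for v in V[1:]:                             # the root's self-loop is ignored
        for u in V:
            if u.id == v.tail_id and u.address == v.tail_address:
                edges.add((u, v))               # u is the tail node of v
    AD = {v.tail_address for v in V}            # includes 0, the root's tail_address
    return V, edges, AD

V, edges, AD = collect({1: (1, 0, 0), 2: (6, 0, 0)})
print(sorted((u.id, v.id) for u, v in edges))  # [(0, 1), (0, 2)]
```

In a real execution each register is read by a separate atomic action, so the result need not correspond to any single configuration; this sketch ignores that issue.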
The following example demonstrates the way in which the current graph is constructed. Assume that the following set of values was read from $\mathrm{REG}_1, \ldots, \mathrm{REG}_6$:
$\mathrm{REG}_1 = (1, 0, 0)$, $\mathrm{REG}_2 = (6, 0, 0)$, $\mathrm{REG}_3 = (8, 1, 3)$,
$\mathrm{REG}_4 = (7, 2, 6)$, $\mathrm{REG}_5 = (0, 3, 8)$, $\mathrm{REG}_6 = (1, 4, 7)$.
To obtain the actual nodes that were collected, the id of each processor is prepended as the first field of its node; thus the set V of collected nodes is:
$v_0^0 = (0, 0, 0, 0)$,
$v_1^a = (1, 1, 0, 0)$, $v_2^b = (2, 6, 0, 0)$, $v_3^c = (3, 8, 1, 3)$,
$v_4^d = (4, 7, 2, 6)$, $v_5^e = (5, 0, 3, 8)$, $v_6^f = (6, 1, 4, 7)$.
Note. The node $v_0^0$ is added to V at the beginning of collect. The upper indices a, . . . , f are not known to the processors and are not used in any protocol. The induced graph appears in Fig. 3.
Let us now explain the edges of G. Denote the nodes in $\mathrm{REG}_1$ and $\mathrm{REG}_2$ by $v_1^a$ and $v_2^b$, respectively. Since $v_1^a$.tail_id = $v_0^0$.id = 0 and $v_1^a$.tail_address = $v_0^0$.address = 0, G has an edge from $v_0^0$ to $v_1^a$, and this edge is denoted by $(v_0^0, v_1^a)$. For the same reason, $(v_0^0, v_2^b)$ is also an edge of G. On the other hand, no node $v_i^u$ satisfies $v_i^u$.tail_id = 1 and $v_i^u$.tail_address = 1; thus the outdegree of $v_1^a$ is 0. Using the same rule, the reader can now infer the rest of the edges of G.
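The edge rule can be checked mechanically on this example. The following self-contained Python snippet is an illustration, not the authors' code (`regs` and `example_edges` are hypothetical names). It prepends each owner's id to the six register values and lists every edge as a pair (tail id, head id).

```python
# Register contents from the example: writer id -> (address, tail_id, tail_address)
regs = {1: (1, 0, 0), 2: (6, 0, 0), 3: (8, 1, 3),
        4: (7, 2, 6), 5: (0, 3, 8), 6: (1, 4, 7)}

# Prepend each owner's id to obtain (id, address, tail_id, tail_address);
# the root node v_0^0 = (0, 0, 0, 0) is added first, as in collect.
nodes = [(0, 0, 0, 0)] + [(i,) + regs[i] for i in sorted(regs)]

def example_edges(nodes):
    """Return the edges (tail id, head id) of the induced graph, ignoring the root's self-loop."""
    edges = []
    for v in nodes[1:]:
        for u in nodes:
            if u[0] == v[2] and u[1] == v[3]:  # u.id == v.tail_id and u.address == v.tail_address
                edges.append((u[0], v[0]))
    return sorted(edges)

print(example_edges(nodes))  # [(0, 1), (0, 2), (2, 4), (3, 5), (4, 6)]
```

The output shows two trees. One is rooted at $v_0^0$ and contains $v_1^a$, $v_2^b$, $v_4^d$ and $v_6^f$. The other is rooted at $v_3^c$, whose tail node (id 1, address 3) is not among the collected nodes.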
Fig. 3. The induced precedence graph.
Now we describe the way precedence relations are represented in G. An edge of the precedence tree, $(v_j^b, v_i^a)$, reflects the fact that $L_j^b$ is linearized before $L_i^a$. If two edges $(v_i^a, v_j^b)$ and $(v_i^a, v_k^c)$ emanate from the same node $v_i^a$, then the order of actions $L_j^b$ and $L_k^c$ cannot be determined by the edges of the precedence tree. In this case, we order these actions in descending order of their ids (higher ids are linearized before lower ids):
Definition 2. Let $v_i^a$ and $v_j^b$, $i \neq j$, be two nodes with a common tail node. If i > j, we say that node $v_i^a$ locally precedes node $v_j^b$.
Note. The relation locally precedes is partial; it is defined only for nodes whose edges emanate from the same node in the precedence graph.
This definition enables linearizing nodes with lower ids after nodes with higher ids. Using this definition, we proceed to define, for each precedence graph, its frontal branch and its last node:
Definition 3. Let G be some precedence graph. The frontal branch of G, denoted by B, is a directed path defined inductively as follows:
• The first node in B is the root node $v_0^0$.
• Let π be a prefix of B and let $v_i^a$ be the last node of π. The next node in B is the node whose edge emanates from $v_i^a$ and which is locally preceded by all other nodes whose edges emanate from $v_i^a$.
• The last node in the frontal branch of G is called the last node of G.
Let $e_1 = (v_k^c, v_i^a)$ and $e_2 = (v_k^c, v_j^b)$ be two edges in some precedence graph G such that $e_1$ belongs to the frontal branch of G (that is, $v_j^b$ locally precedes $v_i^a$). In this case, we say that $e_1$ excludes $e_2$ from the frontal branch of G. A pictorial description of a precedence tree appears in Fig. 4, where the edges of the frontal branch appear as bold arrows.
Fig. 4. A precedence tree.
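Definitions 2 and 3 translate into a simple loop: starting from the root, repeatedly move to the child that is locally preceded by all its siblings, which under Definition 2 is the child with the smallest id. The sketch below is a minimal Python rendering under that reading. `Node`, `ROOT` and `frontal_branch` are illustrative names, and the graph is assumed to hold at most one node per writer, as collected graphs do. Applied to the register values of the example above, it yields the branch $v_0^0, v_1^a$, because $v_2^b$ locally precedes $v_1^a$ and $v_1^a$ has no outgoing edge.

```python
from typing import List, NamedTuple

class Node(NamedTuple):
    id: int
    address: int
    tail_id: int
    tail_address: int

ROOT = Node(0, 0, 0, 0)

def frontal_branch(nodes: List[Node]) -> List[Node]:
    """Frontal branch (Definition 3) of the graph induced by ROOT and nodes.

    By Definition 2 a higher id locally precedes a lower id, so the child
    that is locally preceded by all its siblings is the one with the smallest id.
    """
    branch = [ROOT]
    while True:
        last = branch[-1]
        children = [v for v in nodes
                    if v.id > last.id                      # skips self-loops
                    and v.tail_id == last.id and v.tail_address == last.address]
        if not children:
            return branch                                  # branch[-1] is the last node
        branch.append(min(children, key=lambda v: v.id))

regs = {1: (1, 0, 0), 2: (6, 0, 0), 3: (8, 1, 3),
        4: (7, 2, 6), 5: (0, 3, 8), 6: (1, 4, 7)}
nodes = [Node(i, *regs[i]) for i in sorted(regs)]
print([v.id for v in frontal_branch(nodes)])  # [0, 1]
```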
3.3. The sequential implementation
In this section we present a sequential implementation of a (w, r)-register. The role of this implementation is to demonstrate, in their purest form, the basic ideas used by all the implementations. In this implementation, each writer $W_i$ writes in a (1, n)-register called $\mathrm{REG}_i$, which can be read by all processors, writers and readers; readers do not write. Each register $\mathrm{REG}_i$ has a single field that holds the current node of $W_i$. Let c be some system configuration. The current graph of c is the graph induced by the root node and the set of all current nodes in c. The basic idea in all the implementations is:
At any configuration, the last node of the current graph belongs to the write action that was most recently linearized.
Fig. 5 presents the code of the protocols for $W_i$ (top) and for $R_u$ (bottom). The sequential writer protocol works as follows. Logical action $L_i^a$ starts with an invocation of procedure collect (see Fig. 2). The graph returned by collect and its frontal branch are denoted by $G_i^a$ and $B_i^a$, respectively. The i-prefix of $B_i^a$, denoted by $B_i^a/i$, is the prefix of $B_i^a$ containing all nodes whose id is less than i. Now, $W_i$ computes a new node, called new, that will become its current node at the end of $L_i^a$. When new is added to the current graph, it excludes the rest of $B_i^a$ and becomes the last node of the new frontal branch. To achieve this, $W_i$ chooses the last node of $B_i^a/i$, denoted by tail, as the tail node of new, that is: new.tail_id := tail.id and new.tail_address := tail.address. The address of $v_i^a$ is the minimal address that does not belong to AD; thus, the new address is different from the tail_address component of every current node. In this way, it is ensured that the outdegree of the new node is 0.
The sequential reader protocol works as follows: reader $R_u$ collects the current graph, computes its frontal branch and returns the last node of the frontal branch.
To complete the definition of the protocol, we need to specify initial values for all data structures. For each writer $W_i$, $\mathrm{REG}_i$ holds its initial node $v_i^0$, where $v_i^0$.address = 0, $v_i^0$.tail_id = i and $v_i^0$.tail_address = 0. Thus, at the system's initial state all edges of the current graph are self-loops and the initial frontal branch contains solely the root node $v_0^0$. (This is the only case in the sequential implementation in which self-loops are allowed.) The initial logical value is the value corresponding to $v_0^0$, which can be chosen freely from the permitted values of the logical register.
Fig. 5. The protocols for $W_i$ (top) and $R_u$ (bottom) in the sequential implementation.
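Putting the pieces together, the following Python sketch simulates the sequential protocols of Fig. 5 (which is not reproduced here), assuming that actions run one at a time, so that collect always sees the current graph. The dictionary `regs` is a hypothetical stand-in for the registers and maps each writer id to its current node. `sequential_write` chooses as tail the last node of the i-prefix and as address the minimal address outside AD, and `sequential_read` returns the last node of the frontal branch. All names are illustrative.

```python
from typing import Dict, List, NamedTuple

class Node(NamedTuple):
    id: int
    address: int
    tail_id: int
    tail_address: int

ROOT = Node(0, 0, 0, 0)

def frontal_branch(nodes: List[Node]) -> List[Node]:
    branch = [ROOT]
    while True:
        last = branch[-1]
        children = [v for v in nodes if v.id > last.id
                    and (v.tail_id, v.tail_address) == (last.id, last.address)]
        if not children:
            return branch
        branch.append(min(children, key=lambda v: v.id))  # Definition 2

def sequential_write(i: int, regs: Dict[int, Node]) -> None:
    """One logical write of W_i; regs[j] is the current node held in REG_j."""
    nodes = list(regs.values())                           # collect
    AD = {0} | {v.tail_address for v in nodes}            # 0: the root's tail_address
    tail = [v for v in frontal_branch(nodes) if v.id < i][-1]   # last node of B/i
    address = min(a for a in range(len(AD) + 1) if a not in AD)
    regs[i] = Node(i, address, tail.id, tail.address)     # the single write

def sequential_read(regs: Dict[int, Node]) -> Node:
    return frontal_branch(list(regs.values()))[-1]        # last node of the current graph

regs = {i: Node(i, 0, i, 0) for i in (1, 2, 3)}           # initial self-loop nodes v_i^0
sequential_write(2, regs)
sequential_write(1, regs)
print(sequential_read(regs))  # Node(id=1, address=1, tail_id=0, tail_address=0)
```

In the usage lines, $W_2$ writes first with the root as its tail. $W_1$'s later write also hangs off the root, and since $W_2$'s node locally precedes $W_1$'s, the reader returns $W_1$'s node, the most recently linearized write.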
In Figs. 6 and 7 we demonstrate a single execution of the writer's protocol, assuming that it is invoked in action $L_4^3$. (Note. The index 3 is neither known nor used.) The current graph before $L_4^3$ starts appears in Fig. 6, where the edges of the frontal branch are drawn as bold arrows. At the beginning of $L_4^3$, the current graph $G_4^3$ is collected. Then the frontal branch of $G_4^3$, $B_4^3$, is computed, and the last node of its 4-prefix, $B_4^3/4$, namely, the current node of $W_1$, is assigned to the variable tail. Now, node new is computed so that its edge emanates from tail. To do this, variables new.tail_id and new.tail_address are assigned the id and address of tail, namely 1 and 2, respectively. Following that, the address of new is chosen as the minimal address not used as tail_address in any other current node, namely 1. At this point, the computation of new is complete, and new is written into $\mathrm{REG}_4$, replacing $v_4^2$ as the current node of $W_4$. The resulting current graph appears in Fig. 7. Note that $v_4^3$ is the last node of the current graph and $v_6^5$ is excluded from the frontal branch.
Fig. 6. The sequential writer protocol: before $L_4^3$.
Fig. 7. The sequential writer protocol: after $L_4^3$.
Correctness of the sequential implementation is straightforward and is left to the reader. To compute the space complexity, note that component tail_id has w possible values; thus, its size is at most log w. Since the size of the set AD is at most w, there should be w + 1 possible addresses, including 0 for the root node, and the size of components address and tail_address is log w + O(1). Hence, the protocol's space complexity is 3 log w + O(1). The time complexity is straightforward: every execution of the writer protocol consists of w + 1 physical actions and every execution of the reader protocol consists of w actions; hence, the time complexity of the sequential implementation is w + O(1). The sequential implementation does not improve upon the complexity of previously known sequential implementations; its role is to present the ideas on which the other two implementations are based.
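The bit counts can be tabulated directly. The helper below is a sketch that assumes a field with k possible values occupies ⌈log₂ k⌉ bits (the paper states the sizes only up to O(1)). It adds the three stored fields of a node, using w values for tail_id and w + 1 values for the two address fields, and returns the step counts stated above. The function names are illustrative.

```python
from math import ceil, log2

def bits(k: int) -> int:
    """Bits needed to encode one of k distinct values (at least one bit)."""
    return max(1, ceil(log2(k)))

def sequential_register_bits(w: int) -> int:
    """Size of REG_i in the sequential implementation: one node without its id."""
    tail_id = bits(w)            # w possible values of tail_id
    address = bits(w + 1)        # w + 1 possible addresses, 0 reserved for the root
    return tail_id + 2 * address # address, tail_id and tail_address

def sequential_steps(w: int):
    """Physical actions per logical write and per logical read."""
    return w + 1, w              # writer: w reads in collect + 1 write; reader: w reads

for w in (4, 1024):
    print(w, sequential_register_bits(w), 3 * log2(w), sequential_steps(w))
```

The gap between the exact count and 3 log w stays bounded by a small constant, which is what the O(1) term expresses.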
4. Multi-writer registers using multi-reader registers
In this section, we present a concurrent implementation of a (w, r)-atomic register using physical (1, n)-atomic registers. The implementation is obtained by adjusting the sequential implementation to the concurrent environment while preserving its basic ideas and data structures. In this implementation, communication is again one-sided: readers do not write. An important design decision in this implementation is to linearize all logical write actions independently of the scheduling of the logical read actions. Under this decision, at any given configuration the implemented register has a well-defined value, which depends only on the execution of the writers. The linearization of the logical read actions should ensure that every read action returns the register's value at the action's linearization point. Accordingly, this section is divided into two subsections. The first subsection begins with a description of the writer protocol and the linearization of the logical write actions, and continues with a correctness proof for that linearization. The second subsection begins with a description of the reader protocol and the linearization of the logical read actions, and then proves the correctness of the linearization of the read actions. Together, the proofs imply the correctness of the entire implementation.
4.1. The writer protocol
4.1.1. Description
In this implementation, once more, each writer has a current node. In every configuration c, the writers' current nodes induce the current graph, whose last node is the node of the write action that is linearized last before c. The value of the last node is the value of the implemented register at c. Concurrency, however, requires a principal difference between the implementations: correctness of the sequential implementation crucially relies on the fact that procedure collect always returns the current graph. In a concurrent environment, the current node of each writer may change many times during a single invocation of collect, and the collected graph might differ from the current graph of every configuration reached during the invocation.
The writer protocol is changed as follows: node new is chosen as in the sequential implementation, but it is regarded as tentative and the choice is validated using a declaration mechanism. Writer $W_i$ declares the tentative new node to the other writers by writing it into the new field with which $\mathrm{REG}_i$ is augmented. Then $W_i$ rereads the tail node of the edge of new and, in case the tail node has not changed, node new becomes current. The writer protocol appears in Fig. 8.
Fig. 8. The protocol for $W_i$ in the (1, n) implementation.
Now we describe the protocol in more detail, assuming that $L_i^a$ is executed. The register of $W_i$, $\mathrm{REG}_i$, holds two nodes, denoted by $\mathrm{REG}_i$.current and $\mathrm{REG}_i$.new. First, $W_i$ collects the graph induced by all current nodes. Execution of collect takes w atomic physical actions, which are denoted by $r_i^a[1], \ldots, r_i^a[w]$. Procedure collect in this implementation is identical to collect in the sequential implementation, except that the set AD includes all addresses from component tail_address of all collected current and new nodes. Following collect, $W_i$ makes sure that its new address will not be equal to its current address by adding current.address (= $v_i^{a-1}$.address) to the set AD. Thus the final size of the set AD may reach 2w + 2, including the address of the root node (0) and current.address.
The graph collected during $L_i^a$ is denoted by $G_i^a$, and the frontal branch of $G_i^a$ is denoted by $B_i^a$. Let can_tail be the last node of $B_i^a/i$ and let cid be the id of (the owner of) can_tail. After $G_i^a$ is collected, $W_i$ chooses a tentative new node whose edge emanates from can_tail and whose address is not equal to any address in the set AD; thus the outdegree of the new node is 0. In addition, it is ensured that new.address ≠ $v_i^{a-1}$.address; thus a writer never uses the same address twice in a row. The chosen node is declared by $W_i$ by writing it into $\mathrm{REG}_i$.new, while $\mathrm{REG}_i$.current is not changed. The declaring write action is denoted by $p_i^a$.
Following $p_i^a$, $W_i$ rereads $\mathrm{REG}_{cid}$.current. This second read action is denoted by $\tilde r_i^a[cid]$. After $\tilde r_i^a[cid]$, $W_i$ computes $v_i^a$. The following notation is needed in order to describe the way $v_i^a$ is computed: let $v_j^b$ and $v_j^c$ be two nodes that are equal as records, namely $v_j^b$.address = $v_j^c$.address, $v_j^b$.tail_id = $v_j^c$.tail_id and $v_j^b$.tail_address = $v_j^c$.tail_address. The fact that $v_j^b$ and $v_j^c$ are equal (as records) is denoted by $v_j^b \cong v_j^c$.
Note. The fact that $v_j^b \cong v_j^c$ does not imply that b = c.
The (current) node of $L_i^a$, $v_i^a$, is chosen as follows. Let can_tail and tail be the two nodes read by $W_i$ in actions $r_i^a[cid]$ and $\tilde r_i^a[cid]$, respectively. If can_tail ≅ tail, then the new node is assigned to current; in this case we say that $L_i^a$ connects. If, however, can_tail ≇ tail, then current is chosen so that its edge is a self-loop directed from $v_i^a$ towards itself; in this case we say that $L_i^a$ loops. The logical write action $L_i^a$ is concluded by a final physical write action in which $W_i$ writes $v_i^a$ in $\mathrm{REG}_i$, replacing $v_i^{a-1}$ as its current node. This last, concluding write action is denoted by $w_i^a$.
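To make the declare-and-reread step concrete, here is a sketch of one logical write as a Python generator that yields after each physical action, so that a test harness can interleave several writers. It rests on assumptions not fixed by the text: registers are modeled as dictionaries with the hypothetical fields "current" and "new", each dictionary read or write stands for one atomic physical action, and when can_tail is the root the reread is skipped because the root node never changes (Fig. 8 may handle that case differently). For a looping action, the sketch keeps the address chosen for new. The harness reproduces a loop: $W_2$ declares a node whose tail is $W_1$'s current node, $W_1$ then completes another write, and $W_2$'s reread returns a node that is not equal (as a record) to can_tail.

```python
from typing import Dict, Iterator, List, NamedTuple

class Node(NamedTuple):
    id: int
    address: int
    tail_id: int
    tail_address: int

ROOT = Node(0, 0, 0, 0)

def frontal_branch(nodes: List[Node]) -> List[Node]:
    branch = [ROOT]
    while True:
        last = branch[-1]
        children = [v for v in nodes if v.id > last.id  # self-loops are skipped
                    and (v.tail_id, v.tail_address) == (last.id, last.address)]
        if not children:
            return branch
        branch.append(min(children, key=lambda v: v.id))  # Definition 2

def writer(i: int, regs: Dict[int, Dict[str, Node]], w: int) -> Iterator[None]:
    """One logical write L_i^a; yields after every physical action."""
    snap = {}
    for j in range(1, w + 1):                  # collect: r_i^a[1], ..., r_i^a[w]
        snap[j] = dict(regs[j])
        yield
    AD = {0, snap[i]["current"].address}       # root address and own current address
    AD |= {n.tail_address for r in snap.values() for n in r.values()}
    currents = [r["current"] for r in snap.values()]
    can_tail = [v for v in frontal_branch(currents) if v.id < i][-1]  # last of B_i^a/i
    address = min(a for a in range(len(AD) + 1) if a not in AD)
    new = Node(i, address, can_tail.id, can_tail.address)
    regs[i]["new"] = new                       # declaring write p_i^a
    yield
    if can_tail == ROOT:
        tail = ROOT                            # assumption: the root is never reread
    else:
        tail = regs[can_tail.id]["current"]    # reread ~r_i^a[cid]
        yield
    # Same owner and equal as records: L_i^a connects; otherwise it loops.
    current = new if tail == can_tail else Node(i, address, i, address)
    regs[i]["current"] = current               # concluding write w_i^a
    yield

w = 2
regs = {j: {"current": Node(j, 0, j, 0), "new": Node(j, 0, j, 0)} for j in (1, 2)}
for _ in writer(1, regs, w):       # W_1 writes alone
    pass
w2 = writer(2, regs, w)
for _ in range(3):                 # W_2: both collect reads and the declaring write
    next(w2)
for _ in writer(1, regs, w):       # W_1 completes another write in between
    pass
for _ in w2:                       # W_2 rereads REG_1, finds a different node, loops
    pass
print(regs[2]["current"])          # Node(id=2, address=1, tail_id=2, tail_address=1)
```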
To complete the definition of the protocol, we need to specify initial values for all data structures. As in the sequential implementation, both the current and the new nodes are initialized to hold the initial node $v_i^0$, where $v_i^0$.address = 0, $v_i^0$.tail_id = i and $v_i^0$.tail_address = 0. That is, at the initial configuration all edges of the current graph are self-loops, the initial frontal branch contains only the root node $v_0^0$, and the initial logical value is the value corresponding to $v_0^0$, which can be chosen freely from the set of all permitted values.
Component tail_id has w + 1 possible values and its size is log w + O(1). Since the size of the set AD is at most 2w + 2, there should be 2w + 3 possible addresses, and the size of components address and tail_address is also log w + O(1). Thus, the space required by each node is 3 log w + O(1), and since each register contains 2 nodes, the protocol's space complexity is 6 log w + O(1). The time complexity of the writer's protocol in this implementation is w + 3 (the w reads of collect, the declaring write $p_i^a$, the reread $\tilde r_i^a[cid]$ and the concluding write $w_i^a$), assuming that each writer reads its own register.
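The same style of tally applies here, again assuming ⌈log₂ k⌉ bits for a field with k possible values: w + 1 values for tail_id, 2w + 3 values for address and tail_address, and two nodes per register. The function names are illustrative.

```python
from math import ceil, log2

def bits(k: int) -> int:
    """Bits needed to encode one of k distinct values (at least one bit)."""
    return max(1, ceil(log2(k)))

def concurrent_register_bits(w: int) -> int:
    """Size of REG_i in the (1, n) implementation: two nodes, current and new."""
    node = bits(w + 1) + 2 * bits(2 * w + 3)  # tail_id; address and tail_address
    return 2 * node

def concurrent_writer_steps(w: int) -> int:
    # w reads in collect, the declaring write p_i^a, the reread and the write w_i^a
    return w + 3

for w in (4, 1024):
    print(w, concurrent_register_bits(w), concurrent_writer_steps(w))
# 4 22 7
# 1024 70 1027
```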
4.1.2. Linearization of the logical write actions
Now we fix some arbitrary execution of the system, $E = c_0, d_1, c_1, \ldots$, and all definitions and proofs are made with respect to E. Since E is arbitrary, the results hold for every system execution of the implementation. The logical write actions are linearized using a History graph, a precedence graph which reflects the execution of the system. The history graph, which is not computable by the processors, plays a key role in the correctness proof of our protocols.
Definition 4. Let $v_i^a$ be the node of logical write action $L_i^a$. The tail node of $v_i^a$ is defined as follows:
(1) If $L_i^a$ connects, then the tail node of $v_i^a$ is the node tail, read in $\tilde r_i^a[cid]$, which is ≅ to can_tail (read in $r_i^a[cid]$).
(2) If $L_i^a$ loops, then the tail node of $v_i^a$ is $v_i^a$ itself.
Definition 5. Let $E = c_0, d_1, c_1, \ldots$ be an execution of the system. For every configuration $c_t$ of E, the History graph of $c_t$, $H^{c_t}$ (also denoted by $H^{d_t}$), is defined as follows:
(1) $H^{c_0}$ is the History graph before the execution begins; it contains only the root node $v_0^0$.
(2) Let $c_t$ be an arbitrary configuration of E. If $d_t = w_i^a$ for some i and a, then $H^{c_t} = H^{w_i^a}$ is obtained from $H^{c_{t-1}}$ by adding node $v_i^a$ and an edge directed from the tail node of $v_i^a$ into $v_i^a$. If $d_t \neq w_i^a$ for all i and a, then $H^{c_t} = H^{c_{t-1}}$.
For any configuration $c_t$, the history graph $H^{c_t}$ is a precedence forest which consists of a single precedence tree and some disjoint self-loops. The number of nodes in $H^{c_t}$ is equal to the number of logical write actions completed until $c_t$, plus one (for the root node). It should be noted that, though each node of H corresponds to some node of the precedence graph P, the graph H is not a subgraph of P. Before we define the frontal branch of H, we must extend the relation locally precedes to accommodate the following situation: unlike the collected precedence graphs, the graph H may have a node $v_k^c$ with several edges directed from $v_k^c$ into several nodes of the same writer. Since the actions of each individual processor are temporally ordered by their indices, nodes with equal ids are ordered by their indices, where a node with a lower index locally precedes a node with a higher index.
Definition 6. Let $v_i^a$ and $v_j^b$ be two nodes with a common tail node. Node $v_i^a$ locally precedes node $v_j^b$ if either i > j, or i = j and a < b.
Under the new definition of locally precedes, the frontal branch of $H^{d_t}$, denoted by $B_H^{d_t}$, is defined just as before in Definition 3. The next definition is needed for the linearization of write actions:
Definition 7. Let $L_i^a$ be an arbitrary logical write action. Action $L_i^a$ is lasting if $v_i^a$ is last in $B_H^{w_i^a}$. If $L_i^a$ is not lasting, it is transient.
Obviously, if $L_i^a$ loops then it is transient, but there are many cases in which $L_i^a$ connects but is still transient. For example, consider a situation in which the last node of the current graph is $v_i^a$ and write actions $L_j^b$ and $L_k^c$, where i < j < k, are executed, while all other writers and readers are idle. The physical actions of $L_j^b$ and $L_k^c$ are executed as follows: first, all physical actions of $L_j^b$ except $w_j^b$ are executed; then all physical actions of $L_k^c$ except $w_k^c$ are executed. In this situation, both writers collect the same precedence graph (which is equal to the current graph), choose $v_i^a$ as tail, compute a new node whose edge emanates from $v_i^a$, declare their new nodes (actions $p_j^b$ and $p_k^c$), reread $\mathrm{REG}_i$ (actions $\tilde r_j^b[i]$ and $\tilde r_k^c[i]$) and connect. Now, assume that $w_j^b$ is executed. At this point, the edge $(v_i^a, v_j^b)$ is added to H and the last node of H is $v_j^b$, so $L_j^b$ is lasting. Following that, $w_k^c$ is executed and the edge $(v_i^a, v_k^c)$ is added to H. At this point, however, the edge $(v_i^a, v_j^b)$ is already in H, so $v_j^b$ excludes $v_k^c$ from $B_H^{w_k^c}$; that is, $L_k^c$ is transient.
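The scenario above can be replayed on a small model of the history graph. In the Python sketch below (a hypothetical helper, not part of the protocol, since H is not computable by the processors), nodes are keys (i, a) standing for $v_i^a$. Each concluding write $w_i^a$ adds its node together with an edge from its tail node, which the harness supplies by hand, and the frontal branch follows Definition 6. Classifying an action right after its write implements Definition 7.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (i, a) stands for node v_i^a; (0, 0) is the root v_0^0

class HistoryGraph:
    """Append-only model of H (Definition 5); edges are never deleted."""

    def __init__(self) -> None:
        self.children: Dict[Key, List[Key]] = {(0, 0): []}

    def add_write(self, node: Key, tail: Key) -> None:
        """Effect of w_i^a: add v_i^a and the edge from its tail node (Definition 4)."""
        self.children[node] = []
        if tail != node:                 # a looping action only contributes a self-loop
            self.children[tail].append(node)

    def frontal_branch(self) -> List[Key]:
        branch = [(0, 0)]
        while self.children[branch[-1]]:
            # Definition 6: a higher id locally precedes a lower id, and for equal ids
            # a lower index locally precedes a higher one; follow the child locally
            # preceded by all its siblings (smallest id, then largest index).
            branch.append(min(self.children[branch[-1]], key=lambda n: (n[0], -n[1])))
        return branch

    def classify(self, node: Key, tail: Key) -> str:
        """Apply w_i^a and report whether L_i^a is lasting or transient (Definition 7)."""
        self.add_write(node, tail)
        return "lasting" if self.frontal_branch()[-1] == node else "transient"

# The scenario above with i = 1, j = 2, k = 3 (all indices 1):
H = HistoryGraph()
print(H.classify((1, 1), (0, 0)))  # lasting: v_1^1 becomes the last node
print(H.classify((2, 1), (1, 1)))  # lasting: w_j^b executes first
print(H.classify((3, 1), (1, 1)))  # transient: v_2^1 excludes v_3^1
```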
Using the partition of logical write actions into lasting and transient actions, we linearize all the logical write actions. For every execution $E = c_0, d_1, c_1, \ldots$, linearization points can be specified by E's configurations or by its physical actions. When we say that logical action a is linearized at physical action $d_i$, we mean that the linearization point is after $d_i$ has taken effect; that is, a can be regarded as linearized at configuration $c_i$.
Definition 8. Let $L_i^a$ be an arbitrary logical write action. The linearization point of $L_i^a$ is defined as follows:
(1) If $L_i^a$ is lasting, then $L_i^a$ is linearized at $w_i^a$.
(2) If $L_i^a$ is transient and $v_j^b$ is the last node of $B_H^{w_i^a}$, then $L_i^a$ is linearized before $w_j^b$ and after any other physical action which precedes $w_j^b$. In this case, we say that $L_i^a$ is linearized by $L_j^b$.
(3) If two transient write actions, $L_i^a$ and $L_j^b$, are linearized by the same lasting write action $L_k^c$, then $L_i^a$ and $L_j^b$ are linearized in ascending order of their ids.
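Definition 8 fixes a total order on the logical writes once each action has been classified. The following Python sketch hypothetically assumes that this classification is already known. It receives the actions in the order of their concluding writes, each tagged with the action whose node is last in $B_H$ right after the write. It then places every transient action immediately before the lasting action that linearizes it, in ascending order of writer ids. The tie-breaking rule for two transient actions of the same writer is an assumption, since the definition does not address that case; the function name `linearize` is illustrative.

```python
from typing import Dict, List, Tuple

def linearize(writes: List[Tuple[str, int, str]]) -> List[str]:
    """Order logical writes per Definition 8.

    writes lists (name, writer_id, last) in the order of the concluding writes w,
    where last names the action whose node is last in B_H right after this write
    (last == name exactly when the action is lasting).
    """
    lasting: List[str] = []
    linearized_by: Dict[str, List[Tuple[int, str]]] = {}
    for name, wid, last in writes:
        if last == name:
            lasting.append(name)                  # (1): linearized at its own w
        else:
            linearized_by.setdefault(last, []).append((wid, name))
    order: List[str] = []
    for name in lasting:
        # (2) and (3): transient actions go just before the lasting write that
        # linearizes them, in ascending order of writer ids (the sort is stable,
        # so ties keep the order of their concluding writes; an assumption).
        order += [n for _, n in sorted(linearized_by.get(name, []), key=lambda p: p[0])]
        order.append(name)
    return order

# The lasting/transient example given earlier: L_3^1 is transient, linearized by L_2^1.
print(linearize([("L_1^1", 1, "L_1^1"), ("L_2^1", 2, "L_2^1"), ("L_3^1", 3, "L_2^1")]))
# ['L_1^1', 'L_3^1', 'L_2^1']
```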
Under the defined linearization, the value written by a lasting logical write $L_i^a$ is the value of the implemented register from the execution of $w_i^a$ until the next lasting logical write completes its execution, or until an infinitesimally short time before the next lasting logical write completes its execution. On the other hand, the value written by a transient logical write is the value of the implemented register only for an infinitesimally short period.
4.1.3. Correctness of the linearization of logical write actions
In the correctness proof, we have to prove that the linearization points satisfy the linearization requirements for atomic registers. In Theorem 10 we prove that every logical write action is linearized within its execution interval. In Theorem 11 we prove that all graphs collected by the writers are precedence graphs. We start the correctness proof with some technical lemmas.
Lemma 1. Let $d_t$ be an arbitrary physical action and let $v_i^a$ be an arbitrary node in $H^{d_t}$. If $v_i^a \notin B_H^{d_t}$, then for every u > t, $v_i^a \notin B_H^{d_u}$.
Proof. If $L_i^a$ loops, then for any u ≥ t, $v_i^a$ is not connected to the root in $H^{d_u}$. Assume that $L_i^a$ connects and denote the path from the root to $v_i^a$ in $H^{d_t}$ by π. Since $v_i^a \notin B_H^{d_t}$, let $(v_j^b, v_k^c)$ be the first edge which is in π but not in $B_H^{d_t}$, and let $(v_j^b, v_\ell^d)$ be the edge excluding $(v_j^b, v_k^c)$ from $B_H^{d_t}$. No edge of H is ever deleted; therefore, for any u > t, $(v_j^b, v_\ell^d)$ excludes the suffix of π, and in particular $v_i^a$, from $B_H^{d_u}$. □
Let $d_t$ and $d_u$ be two physical atomic actions; the fact that $d_t$ occurs before $d_u$, i.e., t < u, is denoted by $d_t \to d_u$.
Lemma 2. If $(v_j^b, v_i^a)$, i ≠ j, is an edge of H, then $w_j^b$ occurs before $w_i^a$ (i.e., $w_j^b \to w_i^a$).
Proof. According to the definition of the history graph, $(v_j^b, v_i^a)$, i ≠ j, is an edge of H only if the id of the last node of $B_i^a/i$ is equal to j and the tail node of $v_i^a$ (that is, the node read in $\tilde r_i^a[cid]$) is $v_j^b$. Therefore $w_j^b \to \tilde r_i^a[j]$. Since $\tilde r_i^a[j] \to w_i^a$, we get $w_j^b \to w_i^a$. □
Lemma 3. If, for some t > 0, $v_i^a \in B_H^{d_t}$, then $L_i^a$ is lasting.
Proof. By Lemma 1, $v_i^a \in B_H^{w_i^a}$. Lemma 2 implies that for any edge $(v_i^a, v_j^b)$ in H, $w_i^a \to w_j^b$. Hence the outdegree of $v_i^a$ in $H^{w_i^a}$ is 0, and $v_i^a$ is last in $B_H^{w_i^a}$. By Definition 7, $L_i^a$ is lasting. □
Lemma 4.
(a) The sequence of node ids along any directed path of the history graph, from the root towards any leaf, is strictly increasing.
(b) Every directed path of the history graph contains at most one node of every writer.
Proof. According to the protocol, the tail node of $v_i^a$ is ≅ to the last node of $B_i^a/i$. Since $B_i^a/i$ contains only nodes whose id is less than i, (a) follows. Part (b) is implied immediately by (a). □
The next four (quite technical) lemmas deal with the relations between the history graph H, its frontal branch $B_H$, the trees collected by the writers and the writers' current nodes. These lemmas are used in the proof of Theorem 10, in which the correctness of the linearization is proved. The next lemma demonstrates a principal difference between the sequential implementation and the concurrent implementation. In the sequential implementation, whenever a processor reads the nodes of two processors, say $v_j^b$ and $v_i^a$ (j < i), and discovers that there is an edge from $v_j^b$ to $v_i^a$, it can immediately conclude that the edge $(v_j^b, v_i^a)$ is part of the current graph. In a concurrent environment, the situation is very different. For example, $W_k$ may read $v = v_j^b$; then $W_j$ may compute several nodes until, in $L_j^{b+r}$, it reaches v again, that is, $v_j^b \cong v_j^{b+r}$. At this point, $W_i$ may execute $L_i^a$ and compute $v_i^a$ so that its edge is $(v_j^{b+r}, v_i^a)$. Now, if $W_k$ reads $v_i^a$, the edge $(v_j^b, v_i^a)$ is in $G_k^c$, while the current graph (and the history graph H) has the edge $(v_j^{b+r}, v_i^a)$. In the following lemma we prove that whenever $(v_j^b, v_i^a)$ is in $G_k^c$, either it is also in H or the edge of $G_k^c$ is caused by the scenario described above.
Lemma 5. If $(v_j^b, v_i^a)$ is an edge of $G_k^c$, then there exists an integer r, r ≥ 0, such that $(v_j^{b+r}, v_i^a)$ is an edge of $H^{w_i^a}$.
Proof. By the writer's protocol code, j ≤ i. If j = i, then $L_i^a$ loops and the lemma follows immediately. We continue the proof assuming that j < i, that is, $L_i^a$ connects. First, we prove that if $(v_j^b, v_i^a)$ is an edge of $G_k^c$, then $r_j^b[i] \to p_i^a$. Assume by way of contradiction that $p_i^a \to r_j^b[i]$; by the protocol, $r_j^b[i] \to w_j^b$. Since $v_j^b$ is a node in $G_k^c$, it holds that $w_j^b \to r_k^c[j]$. Since $G_k^c$ is collected in ascending order, it holds that $r_k^c[j] \to r_k^c[i]$. Thus we get that $p_i^a \to r_j^b[i] \to r_k^c[i]$. Since $v_i^a \in G_k^c$, $v_i^a$ is the current node of $W_i$ at $r_k^c[i]$, so we can conclude that during the interval $[p_i^a, r_k^c[i]]$, $v_i^a$ is either the new node of $W_i$ or its current node. Therefore, $v_i^a$.tail_address $\in AD_j^b$, where $AD_j^b$ is the set AD returned by procedure collect invoked during $L_j^b$. In this case, the code implies that $v_j^b$.address ≠ $v_i^a$.tail_address. On the other hand, the fact that $(v_j^b, v_i^a)$ is an edge of $G_k^c$ implies that $v_i^a$.tail_address = $v_j^b$.address, a contradiction.
Now, we continue the proof assuming $r_j^b[i] \to p_i^a$. Since $L_i^a$ connects, Definition 5 implies that there exists some integer r such that $v_j^{b+r}$ is the current node of $W_j$ at $\tilde r_i^a[j]$ and $(v_j^{b+r}, v_i^a)$ is an edge of H. To prove the lemma we have to show that r ≥ 0. Assume by way of contradiction that r < 0. Since $(v_j^b, v_i^a)$ is an edge of $G_k^c$, $v_i^a$.tail_address = $v_j^b$.address; since $(v_j^{b+r}, v_i^a)$ is an edge of H, $v_i^a$.tail_address = $v_j^{b+r}$.address; thus $v_j^b$.address = $v_j^{b+r}$.address. Since a writer never uses the same address twice in a row, we conclude that r < −1 and $w_j^{b+r} \to w_j^{b-1} \to r_j^b[i]$. Since $r_j^b[i] \to p_i^a$, we get that $w_j^{b+r} \to w_j^{b-1} \to r_j^b[i] \to p_i^a \to \tilde r_i^a[j]$, and in particular $w_j^{b+r} \to w_j^{b-1} \to \tilde r_i^a[j]$. Therefore, $v_j^{b+r}$ is not the current node of $W_j$ at $\tilde r_i^a[j]$, and the edge $(v_j^{b+r}, v_i^a)$ does not belong to H, a contradiction. □
In the next lemma, we prove that if $v_i^a$ belongs to the frontal branch of the history graph in some arbitrary configuration $c_t$, then $v_i^a$ is the current node of $W_i$ in $c_t$. In the proof we use the following notation: the fact that $v_i^a$ and $v_j^b$ are actually the same node, i.e., i = j and a = b, is denoted by $v_i^a \equiv v_j^b$. The fact that every node $v_j^b \in B_H^{c_t}$ is in $B_i^a$ and every node $v_k^c \in B_i^a$ is in $B_H^{c_t}$ is denoted by $B_H^{c_t} \equiv B_i^a$.
Lemma 6. If $v_i^a \in B_H^{c_t}$ for some configuration $c_t$, then $v_i^a$ is the current node of $W_i$ at $c_t$.
Proof. Assume by way of contradiction that there are nodes which do not satisfy the lemma. Let $v_i^a$ be the first node in E among these nodes; that is, there exists some configuration $c_t$ such that $w_i^{a+1}$ occurs before $c_t$, while $v_i^a \in B_H^{c_t}$. Under these conditions, Lemma 1 implies that $v_i^a \in B_H^{w_i^{a+1}}$. By the same lemma, $v_i^a \in B_H^{w_i^a}$; that is, $B_H^{w_i^a}/i \equiv B_H^{w_i^{a+1}}/i$. By the minimality of $v_i^a$, we get that all nodes in $B_H^{w_i^{a+1}}/i$ are the current nodes of their owners throughout $L_i^{a+1}$. Thus $B_H^{w_i^{a+1}}/i$ is a subgraph of $G_i^{a+1}$.
Now we prove that $B_i^{a+1}/i \equiv B_H^{w_i^{a+1}}/i$. Assume towards a contradiction that $B_i^{a+1}/i \not\equiv B_H^{w_i^{a+1}}/i$. Since $B_H^{w_i^{a+1}}$ is a subgraph of $G_i^{a+1}$, there exist a node of $B_H^{w_i^{a+1}}/i$ (and of $B_i^{a+1}/i$), $v_j^b$, and an edge of $G_i^{a+1}$, $(v_j^b, v_k^c)$, such that j < k < i, and the edge $(v_j^b, v_k^c)$ excludes the rest of $B_H^{w_i^{a+1}}$ from $B_i^{a+1}/i$. Since $v_j^b \in B_H^{w_i^{a+1}}/i$ and $(v_j^b, v_k^c)$ is an edge of $G_i^{a+1}$, Lemma 5 implies that in $H^{w_k^c}$ there exists an edge $(v_j^{b+r}, v_k^c)$ for some r ≥ 0. By the minimality of $v_i^a$, we get that $v_j^b$ is the current node of $W_j$ throughout $L_i^{a+1}$. Since $v_k^c$ is a node of $G_i^a$, we conclude that $v_j^b$ is the current node of $W_j$ throughout $L_k^c$, and in particular it is the node read in $\tilde r_k^c[j]$. By Definition 5, we conclude that $(v_j^b, v_k^c)$ is an edge of $H^{w_i^{a+1}}$. This, however, means that the edge $(v_j^b, v_k^c)$ excludes the rest of $B_H^{w_i^a}/i$ from $B_H^{w_i^{a+1}}/i$; hence $v_i^a \notin B_H^{w_i^{a+1}}$, a contradiction. We conclude that $B_i^{a+1}/i \equiv B_H^{w_i^{a+1}}/i$.
Since $B_i^{a+1}/i \equiv B_H^{w_i^{a+1}}/i$, $L_i^{a+1}$ connects and $v_i^{a+1}$ is last in $B_H^{w_i^{a+1}}$. This, however, means that $v_i^{a+1}$ excludes $v_i^a$ from $B_H^{w_i^{a+1}}$, in contradiction to the assumption that $v_i^a \in B_H^{c_t}$. The lemma follows. □
Lemma 7. If $L_i^a$ is transient, then the last node of $B_H^{w_i^a}/i$ $\not\equiv$ the last node of $B_i^a/i$, and hence $B_H^{w_i^a}/i \not\equiv B_i^a/i$.
Proof. Let $v_j^b$ be the last node of $B_H^{w_i^a}/i$ and assume by way of contradiction that $v_j^b$ is also the last node of $B_i^a/i$. Since $v_j^b$ is a node in $G_i^a$, we conclude that $w_j^b \to r_i^a[j]$. Since $v_j^b$ is a node in $B_H^{w_i^a}$, Lemma 6 implies that $v_j^b$ is the current node of $W_j$ at $w_i^a$; hence $v_j^b$ is the current node of $W_j$ during the interval $[r_i^a[j], w_i^a]$, and in particular at $\tilde r_i^a[j]$. Therefore, the nodes can_tail and tail read during $L_i^a$ are both ≡ to $v_j^b$; hence $L_i^a$ connects. Since $v_j^b$ is last in $B_H^{w_i^a}/i$, $v_i^a$ is last in $B_H^{w_i^a}$; that is, $L_i^a$ is lasting, a contradiction. The lemma follows. □
Though, in general, edges of collected graphs are not in H, the next corollary gives a sufficient condition for a collected edge to be in H:
Corollary 8. Let $(v_j^b, v_i^a)$ be an edge of $G_k^c$. If, for some configuration $c_t$ after $r_k^c[i]$, $v_j^b \in B_H^{c_t}$, then $(v_j^b, v_i^a)$ is an edge of $H^{c_t}$.
Proof. Since $(v_j^b, v_i^a)$ is an edge of $G_k^c$, Lemma 5 implies that for some r, r ≥ 0, $(v_j^{b+r}, v_i^a)$ is an edge of $H^{c_t}$. Since $v_j^b \in B_H^{c_t}$, Lemma 6 implies that $v_j^b$ is the current node of $W_j$ at $c_t$; therefore r = 0. □
In the next lemma and in the theorem that follows, we prove that the linearization point of each logical write action lies within its execution interval.
Lemma 9. If, for some configuration $c_t$ after $r_i^a[j]$, it holds that $B_H^{c_t}/(j+1) \not\equiv B_i^a/(j+1)$, then there exists a lasting logical write action $L_k^c$, for some k ≤ j, such that $w_k^c$ occurs within the interval starting at $r_i^a[k]$ and ending at $c_t$.
Proof. By the assumption of the lemma, $B_H^{c_t}/(j+1) \not\equiv B_i^a/(j+1)$. Let $v_\ell^d$, ℓ ≤ j, be the node with the maximal id in the common prefix of $B_H^{c_t}/(j+1)$ and $B_i^a/(j+1)$. Such a node always exists because the root, $v_0^0$, belongs to both branches.
First, we show that vd* is not the last node of B
ct
H /(j + 1) and identify action Lck : if vd* is the
last node of BctH /(j + 1) then since BctH /(j + 1) ≡ Bai /(j + 1), vd* is not the last node of Bai /(j + 1). Let
e1 = (vd* , vfp ), p < j + 1, be the edge emanating from vd* in Bai /(j + 1). Since vd* ∈ BctH , Corollary 8
implies that (vd* , v
f
p ) is an edge of Hct ; since p < j + 1, either e1 ∈ BctH /(j + 1) or there exists some
edge e2 ∈ BctH /(j + 1) excluding e1 from BctH /(j + 1), a contradiction to the assumption that vd* is the
last node of BctH /(j + 1). Since vd* is not the last node of BctH /(j + 1), let (vd* , vck), * < k  j, be the edge
emanating from vd* in B
ct
H /(j + 1).
Now we show that $L_k^c$ satisfies the requirements of the lemma; the situation is depicted in Fig. 9. Since $v_k^c \in B_H^{c_t}$, Lemma 3 implies that $L_k^c$ is lasting. Since $v_k^c \in H^{c_t}$, $w_k^c$ occurs before $c_t$. To complete the proof it remains to show that $r_i^a[k] \to w_k^c$. Assume, by way of contradiction, that $w_k^c \to r_i^a[k]$.
Now we show that $(v_\ell^d, v_k^c)$ is an edge of $G_i^a$ (but not of $B_i^a/(j+1)$). By its definition, $v_k^c$ is a node of $B_H^{c_t}$; therefore, by Lemma 6, $v_k^c$ is the current node of $W_k$ at $c_t$. By our assumption $w_k^c \to r_i^a[k]$; hence $v_k^c$ is a node of $G_i^a$ and $(v_\ell^d, v_k^c)$ is an edge of $G_i^a$. Since $k > \ell$ and $v_\ell^d$ is the maximal common node of $B_H^{c_t}/(j+1)$ and $B_i^a/(j+1)$, we conclude that $v_k^c \notin B_i^a/(j+1)$. Let $(v_\ell^d, v_m^e)$, $\ell < m < k$, be the edge excluding $(v_\ell^d, v_k^c)$ from $B_i^a/(j+1)$. By Corollary 8, $(v_\ell^d, v_m^e)$ is an edge in $H^{c_t}$. Since $m < k$, $(v_\ell^d, v_m^e)$ excludes $(v_\ell^d, v_k^c)$ from $B_H^{c_t}$, a contradiction to the assumption that $(v_\ell^d, v_k^c)$ is in $B_H^{c_t}/(j+1)$. The lemma follows. □
Theorem 10. Every logical write action is linearized within its execution interval.
Proof. Let $L_i^a$ be some logical write action. If $L_i^a$ is lasting then it is linearized at its concluding physical write $w_i^a$, which is within its execution interval. For the rest of the proof we assume that $L_i^a$ is transient. Let $v_j^b$ be the last node of $B_H^{w_i^a}$, by which $L_i^a$ is linearized. We have to prove that $L_j^b$ is linearized within the execution interval of $L_i^a$, which follows if we show $r_i^a[1] \to w_j^b \to w_i^a$. Since $v_j^b \in H^{w_i^a}$, it is clear that $w_j^b \to w_i^a$. Now we prove that $r_i^a[1] \to w_j^b$. Action $L_i^a$ is transient; therefore, Lemma 7 implies that $B_H^{w_i^a}/i \not\equiv B_i^a/i$. By Lemma 9 (with $j = i-1$), there exists a lasting logical write action $L_k^c$, $1 \le k < i$, such that $w_k^c$ occurs within the interval starting at $r_i^a[k]$ and ending at $w_i^a$. For $k = 1$, we get $r_i^a[1] \to w_k^c \to w_i^a$. For $k > 1$ we use the fact that $r_i^a[1] \to r_i^a[k]$, and once again get $r_i^a[1] \to w_k^c \to w_i^a$. If $L_j^b \equiv L_k^c$ then the proof follows. Otherwise, since $L_j^b$ and $L_k^c$ are both lasting, and since $v_j^b$ is the last node of $B_H^{w_i^a}$, we get that $w_k^c \to w_j^b$. Therefore $r_i^a[1] \to w_k^c \to w_j^b$, which implies $r_i^a[1] \to w_j^b$. □
Fig. 9. The graph for Lemma 9.
The fact that $L_i^a$ is linearized before $L_j^b$ is denoted by $L_i^a \Rightarrow L_j^b$. The next theorem says that the graphs collected by the processors are precedence graphs with respect to the relation $\Rightarrow$.
Theorem 11.
(a) If $(v_j^b, v_i^a)$, $j < i$, is an edge of $H$ then $L_j^b \Rightarrow L_i^a$.
(b) If $(v_j^b, v_i^a)$, $j < i$, is an edge of $G_k^c$ then $L_j^b \Rightarrow L_i^a$.
Proof of (a). By Lemma 2, $w_j^b \to w_i^a$. Consider the following cases:
Case 1: $L_i^a$ is lasting. In this case, $L_i^a$ is linearized at $w_i^a$. Since $L_j^b$ is not linearized after $w_j^b$ and, by Lemma 2, $w_j^b \to w_i^a$, we get $L_j^b \Rightarrow L_i^a$.
Case 2: $L_i^a$ and $L_j^b$ are both transient. Let $v_k^c$ and $v_\ell^d$ be the last nodes of $B_H^{w_i^a}$ and $B_H^{w_j^b}$, respectively. In this case $L_i^a$ is linearized by $L_k^c$ and $L_j^b$ is linearized by $L_\ell^d$. If $v_k^c \equiv v_\ell^d$, then by Definition 8, $L_i^a$ and $L_j^b$ are linearized by their ids; since $i > j$, $L_j^b \Rightarrow L_i^a$. Assume $v_k^c \not\equiv v_\ell^d$. By Definition 8, $L_\ell^d$ is the last lasting write action linearized before $w_j^b$ and $L_k^c$ is the last lasting write action linearized before $w_i^a$. Since $w_j^b \to w_i^a$, it holds that $w_\ell^d \to w_k^c$ and therefore $L_\ell^d \Rightarrow L_k^c$. Since $L_j^b$ is linearized by $L_\ell^d$ and $L_i^a$ is linearized by $L_k^c$, we get $L_j^b \Rightarrow L_i^a$.
Case 3: $L_i^a$ is transient and $L_j^b$ is lasting. First we show that $L_i^a$ is not linearized by $L_j^b$. If this were the case, $v_j^b$ would have been last in $B_H^{w_i^a}$, but this means that $L_i^a$ is lasting, a contradiction. The rest of the proof of this case is very similar to the proof of the previous case.
Proof of (b). By Lemma 5, there exists some $r \ge 0$ such that $(v_j^{b+r}, v_i^a)$ is an edge of $H$. By (a), $L_j^{b+r} \Rightarrow L_i^a$. If $r = 0$ then we get $L_j^b \Rightarrow L_i^a$. If $r > 0$ then, since $L_j^b \Rightarrow L_j^{b+r}$, we again get $L_j^b \Rightarrow L_i^a$. □
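Theorem 11 says that every edge of $H$, and of every collected graph, points from an action linearized earlier to an action linearized later. The following minimal Python sketch shows what that property means operationally. It assumes nodes are hashable labels (here hypothetical `(writer id, sequence number)` pairs) and that a candidate linearization is given as a list; the check passes exactly when no non-loop edge points backwards.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def respects_precedence(edges: Iterable[Tuple[Hashable, Hashable]],
                        order: Sequence[Hashable]) -> bool:
    """True iff every non-loop edge (u, v) has u strictly before v in `order`."""
    position = {node: idx for idx, node in enumerate(order)}
    for u, v in edges:
        if u == v:
            continue  # self-loops (e.g. initial nodes) impose no constraint
        if position[u] >= position[v]:
            return False
    return True
```

Every node that appears in `edges` must also appear in `order`; otherwise the lookup raises `KeyError`, which is a reasonable signal that the candidate linearization is incomplete.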
4.2. The reader protocol
4.2.1. Description
The basic idea in this implementation is to maintain a precedence graph whose last node at any configuration $c_t$ belongs to the logical write linearized last before $c_t$. Unfortunately, it is not sufficient for a reader to just collect a precedence graph and return the last node in the graph's frontal branch: due to concurrency, the current graph and its last node may change during the execution of collect by the reader. As a result, a node of a logical write action which should not be returned by a reader may appear as last in the reader's collected graph.
To overcome this problem, the reader collects three graphs. The first and third graphs are collected by procedure collect of the writer's protocol and are denoted by $G$ and $\hat G$, respectively. The second graph, denoted by $\overleftarrow{G}$, is collected by procedure $\overleftarrow{\text{collect}}$, which is identical to collect except that nodes are collected in reverse order, from $REG_w$ down to $REG_1$ (therefore most of the lemmas proven in the previous section do not hold for $\overleftarrow{G}$). After $G$, $\overleftarrow{G}$ and $\hat G$ are collected, they are analyzed, and a node satisfying the requirements of the reader protocol is identified and returned. The code of the reader protocol appears in Fig. 10.
Fig. 10. The protocol for $R_u$ in the (1, n) implementation.
Let $R_u$ be a reader executing the protocol. The physical actions executed by $R_u$ during the reader protocol are denoted by $r_u[1] \cdots r_u[w]$, in which $G$ is collected; $\overleftarrow{r}_u[w] \cdots \overleftarrow{r}_u[1]$, in which $\overleftarrow{G}$ is collected; and $\hat r_u[1] \cdots \hat r_u[w]$, in which $\hat G$ is collected. The notation $B_i^a \subset B_j^b$ is used when for each $v_k^c \in B_i^a$ there is a node $v_k^d \in B_j^b$ such that $v_k^c \cong v_k^d$. The notation $B_i^a \cong B_j^b$ is used when $B_i^a \subset B_j^b$ and $B_j^b \subset B_i^a$.
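The relations $\subset$ and $\cong$ on branches compare nodes writer by writer. The sketch below assumes a hypothetical encoding in which a branch is a list of `(writer id, fields)` pairs and two nodes of the same writer are treated as $\cong$ exactly when their stored fields are equal. Because every edge goes from a smaller id to a larger one, a branch holds at most one node per writer, so a dictionary keyed by writer id is enough.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (writer id, stored fields such as address/tail_id/tail_address).
Node = Tuple[int, Tuple[int, ...]]
Branch = List[Node]


def branch_sub(a: Branch, b: Branch) -> bool:
    """A ⊂ B: every node of A has a same-writer node with equal fields in B."""
    by_writer: Dict[int, Tuple[int, ...]] = {wid: fields for wid, fields in b}
    return all(by_writer.get(wid) == fields for wid, fields in a)


def branch_cong(a: Branch, b: Branch) -> bool:
    """A ≅ B: A ⊂ B and B ⊂ A."""
    return branch_sub(a, b) and branch_sub(b, a)
```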
4.2.2. Linearization of the logical read actions
Throughout this paper $S_u^a$ denotes the $a$th execution of the read protocol by $R_u$. The linearization point of any logical read action is determined by the linearization point of the logical write action whose value is returned by the read action, as follows.
Definition 9. Let $v_i^b$ be the node returned by $S_u^a$. Denote by $c_s$ and $c_t$ the result configurations of $r_u^a[1]$ and $\hat r_u^a[w]$, respectively. The linearization point of $S_u^a$ is defined as follows:
(1) If $L_i^b$ is linearized before $c_s$ then $S_u^a$ is linearized at $c_s$.
(2) If $L_i^b$ is linearized after $c_s$ then $S_u^a$ is linearized by $L_i^b$, that is, after $L_i^b$ and before any physical action which occurs after $L_i^b$ (or before any logical write which is linearized after $L_i^b$). In case two logical read actions are linearized by the same logical write action, they are linearized in ascending order of their ids.
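Definition 9 can be read as a sort key. The following sketch assumes, purely for illustration, that every linearization point is a real-valued time on a global clock with no ties between distinct events (the `Read` record and the labels are hypothetical). Each read is placed either at $c_s$ or immediately after the write it returns, and reads sharing that write are ordered by id.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Read:
    read_id: int
    cs: float          # time of c_s, the result of the reader's first physical read
    write_time: float  # linearization time of the write whose value is returned


def read_key(r: Read) -> Tuple[float, int, int]:
    if r.write_time < r.cs:
        return (r.cs, 1, r.read_id)      # case (1): linearized at c_s
    return (r.write_time, 1, r.read_id)  # case (2): right after the write, ties by id


def linearize(writes: List[Tuple[float, int]], reads: List[Read]) -> List[str]:
    # A write at time t gets key (t, 0, id), so it precedes reads linearized by it.
    events = [((t, 0, wid), f"W{wid}") for t, wid in writes]
    events += [(read_key(r), f"R{r.read_id}") for r in reads]
    return [label for _, label in sorted(events)]
```

For example, with writes at times 1.0 and 3.0, a read whose write precedes its $c_s$ lands at its own $c_s$, while two reads returning the second write follow it in id order.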
4.2.3. Correctness of the implementation
The correctness of the linearization scheme for logical read actions, and the correctness of the entire implementation, are proved by the following theorem.
Theorem 12. Let $(c_s, c_t)$ be the execution interval of $S_u$, let $v$ be the node returned by $S_u$, and let $L$ be the logical write action that produced the node $v$. Logical action $L$ satisfies one of the following two claims:
(1) Either $L$ is the last logical write action linearized before $c_s$ (and hence $v$ is last in $B_H^{c_s}$),
(2) or $L$ is linearized within the interval $(c_s, c_t)$.
Proof. Since $v$ is read during the execution of $S_u$, clearly $L$ terminates before $c_t$. Therefore, to conclude that (2) holds, it suffices to show that $L$ is linearized after $c_s$. Consider the following cases (which match the cases of the protocol):
Case 1: $B_u \cong \hat B_u \not\cong \overleftarrow{B}_u$.
Let $\ell$ be the smallest id such that $v_\ell^b \not\cong v_\ell^c$ and $v_\ell^c \not\cong v_\ell^d$, where $v_\ell^b$, $v_\ell^c$ and $v_\ell^d$ are nodes in $G_u$, $\overleftarrow{G}_u$ and $\hat G_u$, respectively. By Claim 1 such nodes always exist. According to the protocol, $S_u$ returns $v_\ell^d$. Since $v_\ell^b \not\cong v_\ell^c$ we get $b < c$ and $r_u[1] \to r_u[\ell] \to w_\ell^c$. Since $v_\ell^c \not\cong v_\ell^d$ we get $c < d$ and $w_\ell^c \to r_\ell^d[1]$. We conclude that $r_u[1] \to r_\ell^d[1]$; hence $L_\ell^d$ is linearized after $c_s$.
Claim 1. Under the conditions of Case 1, there exists an integer $\ell$, $1 \le \ell \le w$, such that $v_\ell^b \not\cong v_\ell^c \not\cong v_\ell^d$, where $v_\ell^b$, $v_\ell^c$ and $v_\ell^d$ are nodes in $G_u$, $\overleftarrow{G}_u$ and $\hat G_u$, respectively.
Proof of claim. By the conditions of Case 1 we have $B_u \cong \hat B_u \not\cong \overleftarrow{B}_u$. Consider the following cases:
Case 1.1: $B_u$ is a subgraph of $\overleftarrow{G}_u$.
Let $v_k^a$ be the last node on the common prefix of $B_u$ and $\overleftarrow{B}_u$, and let $(v_k^a, v_\ell^c)$ be the first edge of $\overleftarrow{B}_u$ not included in $B_u$. Such an edge always exists since, by the conditions of Cases 1 and 1.1, we have $B_u \not\cong \overleftarrow{B}_u$ and $B_u \subset \overleftarrow{G}_u$. Let $v_\ell^b$, $v_\ell^c$ and $v_\ell^d$ be the nodes of $W_\ell$ in $G_u$, $\overleftarrow{G}_u$ and $\hat G_u$, respectively. Obviously $v_\ell^b \not\cong v_\ell^c$, because otherwise the edge $(v_k^a, v_\ell^c)$ would have been included in $B_u$. Since $B_u \cong \hat B_u$ we also have $v_\ell^c \not\cong v_\ell^d$. The claim follows.
Case 1.2: $B_u$ is not a subgraph of $\overleftarrow{G}_u$.
Let $v_\ell^b$ be the first node on $B_u$ which does not belong to $\overleftarrow{B}_u$. Such a node always exists because $B_u$ is not a subgraph of $\overleftarrow{G}_u$. Let $v_\ell^b$, $v_\ell^c$ and $v_\ell^d$ be the nodes of $W_\ell$ in $G_u$, $\overleftarrow{G}_u$ and $\hat G_u$, respectively. Obviously $v_\ell^b \not\cong v_\ell^c$ and $v_\ell^c \not\cong v_\ell^d$. The claim follows. □
Case 2: $B_u \cong \hat B_u \cong \overleftarrow{B}_u$.
Let $v_\ell^d$ be the last node of $\hat B_u$. According to the protocol, $S_u$ returns $v_\ell^d$. Consider the following cases:
Case 2.1: $B_u \not\equiv \hat B_u$.
In this case there exist two distinct nodes $v_j^b$ and $v_j^c$ ($b < c$) of the same writer, $W_j$, such that $v_j^b \in B_u$ while $v_j^c \in \hat B_u$. Since $B_u \cong \hat B_u$, it holds that $v_j^b \cong v_j^c$, and in particular $v_j^b.\mathit{address} = v_j^c.\mathit{address}$. A writer does not use the same address twice in a row; hence $b + 1 < c$. Therefore $r_u[j] \to w_j^{b+1} \to r_j^c[1] \to w_j^c$, which implies that $L_j^c$ is linearized after $c_s$. If $v_j^c \not\equiv v_\ell^d$ then $\hat G_u$ has a path from $v_j^c$ to $v_\ell^d$, because $v_j^c$ is in $\hat B_u$ and $v_\ell^d$ is last in $\hat B_u$. Hence, by Theorem 11(b), $L_j^c \Rightarrow L_\ell^d$, and $L_\ell^d$ is linearized after $c_s$.
Case 2.2: $B_u \equiv \hat B_u$.
First, we show that $B_u$ is a path in $H^{r_u[w]}$. To prove this, we assume that $(v_j^b, v_k^c)$ is an arbitrary edge in $B_u$ and show that $(v_j^b, v_k^c)$ is an edge of $H^{r_u[w]}$. By Lemma 5, there exists an integer $r \ge 0$ such that $(v_j^{b+r}, v_k^c)$ is an edge in $H^{w_k^c}$, where $v_j^{b+r}$ is the node read in $\tilde r_k^c[j]$. Since $v_k^c$ is a node of $B_u$, we conclude that $w_k^c \to r_u[w]$, and since no edge is ever deleted from the history graph, we conclude that $(v_j^{b+r}, v_k^c)$ is an edge in $H^{r_u[w]}$. Since $v_j^b$ is a node of $B_u$, and since $B_u \equiv \hat B_u$, we conclude that $\hat r_u[j] \to w_j^{b+1}$; thus $v_j^b$ is the current node of $W_j$ at least until $\hat r_u[j]$. Since $w_k^c \to r_u[w]$, we conclude that $\tilde r_k^c[j] \to w_k^c \to r_u[w] \to \hat r_u[j] \to w_j^{b+1}$, and since $r \ge 0$, we conclude that $r = 0$ and $(v_j^b, v_k^c)$ is an edge of $H^{r_u[w]}$. The proof follows.
The assumptions $B_u \cong \overleftarrow{B}_u$ (Case 2) and $B_u \equiv \hat B_u$ (Case 2.2) imply that $B_u \equiv \overleftarrow{B}_u \equiv \hat B_u$. Since $B_u \equiv B_H^{r_u[w]}$ (which is proven in Claim 2), we get that $v_\ell^d$ is last in $B_H^{r_u[w]}$, which implies that $v_\ell^d$ is lasting and that it is linearized at $w_\ell^d$. Therefore, $v_\ell^d$ is either last in $B_H^{c_s}$ or it is linearized after $c_s$.
Claim 2. Under the assumptions of Case 2.2, it holds that $B_u \equiv B_H^{r_u[w]}$.
Proof of claim. Assume, by way of contradiction, that $B_u \not\equiv B_H^{r_u[w]}$. Since no edge is ever deleted from the history graph, it follows that for every configuration $c_t$ after $r_u[w]$, either $B_u \subset B_H^{c_t}$, or there exists some edge in $H^{c_t}$ that excludes a suffix of $B_u$ from $B_H^{c_t}$.
Let $S = c_{\overleftarrow{t}_w}, c_{\overleftarrow{t}_{w-1}}, \ldots, c_{\overleftarrow{t}_1}$ be the sequence of result configurations of actions $\overleftarrow{r}_u[w], \overleftarrow{r}_u[w-1], \ldots, \overleftarrow{r}_u[1]$, respectively. Define the function $EX(c_t)$ on $S$ as follows: the value of $EX(c_t)$ is $w$ if $B_u \subset B_H^{c_t}$; otherwise it is $k$, where $k$ is the id of the node in $B_H^{c_t}$ whose edge excludes a suffix of $B_u$ from $B_H^{c_t}$. For every $c_{\overleftarrow{t}_i}$, $1 \le i \le w$, we have $1 \le EX(c_{\overleftarrow{t}_i}) \le w$. Since no edge is ever deleted from $B_H$, the sequence of values $EX(c_{\overleftarrow{t}_w}), EX(c_{\overleftarrow{t}_{w-1}}), \ldots, EX(c_{\overleftarrow{t}_1})$ is non-increasing.
Now we use the function $EX$ to reach a contradiction by showing that, under these assumptions, $B_u \not\equiv \overleftarrow{B}_u$. Since $\overleftarrow{G}_u$ is collected in reverse order, from $w$ to $1$, and $EX$ is a non-increasing integer function into the interval $[1, w]$, there exists an integer $k$, $1 \le k \le w$, such that $c_{\overleftarrow{t}_k} \in S$ and $EX(c_{\overleftarrow{t}_k}) = k$. Let $(v_j^b, v_k^c)$ be the edge of $B_H^{c_{\overleftarrow{t}_k}}$ excluding a suffix of $B_u$ from $B_H^{c_{\overleftarrow{t}_k}}$. By this definition, $v_j^b \in B_u$; since $B_u \equiv \overleftarrow{B}_u$, $v_j^b \in \overleftarrow{B}_u$, hence $v_j^b \in \overleftarrow{G}_u$. Since $v_k^c$ belongs to $B_H^{c_{\overleftarrow{t}_k}}$, Lemma 6 implies that it is the current node of $W_k$ at $c_{\overleftarrow{t}_k}$, and therefore $v_k^c$ is a node in $\overleftarrow{G}_u$. Since $v_k^c \in \overleftarrow{G}_u$ and $v_j^b \in \overleftarrow{G}_u$, we get that $(v_j^b, v_k^c)$ is an edge in $\overleftarrow{G}_u$ which excludes a suffix of $B_u$ from $\overleftarrow{B}_u$, a contradiction to the assumption that $B_u \equiv \overleftarrow{B}_u$. □
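The step "there exists an integer $k$ with $EX(c_{\overleftarrow{t}_k}) = k$" is a discrete intermediate-value argument. Read in time order, $EX$ is non-increasing while the register index falls from $w$ to $1$, so $f(k) = EX(c_{\overleftarrow{t}_k})$ is non-decreasing in $k$ and maps $[1, w]$ into $[1, w]$; the largest $k$ with $f(k) \ge k$ is then a fixed point. The Python sketch below (helper names are hypothetical) scans in the order of the reverse collect and exhaustively confirms the fact for small $w$.

```python
from itertools import combinations_with_replacement
from typing import Callable


def find_fixed_index(ex: Callable[[int], int], w: int) -> int:
    # Visit k in the order the reverse collect reads REG_w, ..., REG_1.
    for k in range(w, 0, -1):
        if ex(k) == k:
            return k
    raise ValueError("no k with ex(k) == k: ex must map into [1, w] and be monotone")


def check_all_profiles(w: int) -> bool:
    # Each non-decreasing tuple (f(1), ..., f(w)) with values in [1, w] models
    # EX values that are non-increasing over time (k = w first, k = 1 last).
    for profile in combinations_with_replacement(range(1, w + 1), w):
        k = find_fixed_index(lambda i: profile[i - 1], w)
        if profile[k - 1] != k:
            return False
    return True
```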
Case 3: $B_u \not\cong \hat B_u$ and $B_u \subset \hat G_u$.
Let $v_i^a$ be the last node of $\hat B_u$. According to the protocol, $S_u$ returns $v_i^a$. Since $B_u \subset \hat G_u$ and $B_u \not\cong \hat B_u$, $\hat B_u$ has a suffix whose nodes are not in $B_u$. Let $v_j^c \in \hat B_u$ be the minimal node in that suffix and let $v_j^b$ be the node of $W_j$ in $G_u$. By this definition, $v_j^b \not\cong v_j^c$. By Claim 3 below, $L_j^c$ is linearized after $c_s$. If $L_j^c \equiv L_i^a$, we are done. Otherwise, $j < i$ and $\hat B_u$ has a path from $v_j^c$ to $v_i^a$; by Theorem 11(b), $L_j^c \Rightarrow L_i^a$.
Claim 3. Under the assumptions of Case 3, $L_j^c$ is linearized after $c_s$.
Proof of claim. Since $v_j^b \not\cong v_j^c$, clearly $r_u[j] \to w_j^c$, and therefore $w_j^c$ occurs after $c_s$. If $L_j^c$ is lasting then it is linearized at $w_j^c$ and the proof follows. If $L_j^c$ is transient then let $v_k^d$ be the last node of $B_H^{w_j^c}$, by which $L_j^c$ is linearized. If $L_k^d$ is linearized after $c_s$ then $L_j^c$ is also linearized after $c_s$ and we are done. We continue the proof assuming that $L_k^d$ is linearized before $c_s$, that is, $w_k^d \to r_u[1]$.
By this assumption, there are no lasting write actions that are linearized within the interval starting at $r_u[1]$ and ending at $w_j^c$. In this case, Lemma 9 implies that $B_H^{w_j^c}/(j+1) \equiv B_u/(j+1)$.
Consider $(v_\ell^e, v_j^c)$, the edge of $v_j^c$ in $\hat B_u$. By Lemma 5, there exists an integer $r \ge 0$ such that $(v_\ell^{e+r}, v_j^c)$ is an edge in $H^{w_j^c}$. By Theorem 11(a), $L_j^c$ is linearized after $L_\ell^{e+r}$. Now we show that $r > 0$. Assume, by way of contradiction, that $r = 0$, that is, $(v_\ell^e, v_j^c)$ is an edge in $H^{w_j^c}$. By the definition of $v_j^c$, $v_\ell^e \in B_u$. Since $B_u/(j+1) \equiv B_H^{w_j^c}/(j+1)$, we get that $v_\ell^e \in B_H^{w_j^c}/(j+1)$. Since $L_j^c$ is transient, let $(v_\ell^e, v_p^f)$ be the edge excluding $(v_\ell^e, v_j^c)$ from $B_H^{w_j^c}/(j+1)$. Since $B_u/(j+1) \equiv B_H^{w_j^c}/(j+1)$, the edge $(v_\ell^e, v_p^f)$ also belongs to $B_u/(j+1)$. Since $B_u \subset \hat G_u$, $(v_\ell^e, v_p^f)$ is an edge in $\hat G_u$, and it excludes $(v_\ell^e, v_j^c)$ from $\hat B_u$, a contradiction to the definition of $v_j^c$.
The proof is completed by showing that if $r > 0$ then $L_\ell^{e+r}$ is linearized after $c_s$, and therefore $L_j^c$ is also linearized after $c_s$. If $r > 0$ then $r > 1$, because a writer never uses the same address twice in a row. Since $v_\ell^e$ is the current node read at $\hat r_u[\ell]$, $\hat r_u[\ell] \to w_\ell^{e+1}$. Since $w_\ell^{e+1} \to r_\ell^{e+r}[1]$, we get that $\hat r_u[\ell] \to r_\ell^{e+r}[1]$, which implies that $L_\ell^{e+r}$ is linearized after $c_s$. Therefore, $L_j^c$ is also linearized after $c_s$. □
Case 4: $B_u \not\cong \hat B_u$ and $B_u \not\subset \hat G_u$.
Let $i$ be the smallest id such that $v_i^a$ is in $B_u$, $v_i^b \in \hat G_u$ and $v_i^a \not\cong v_i^b$. According to the reader protocol, $S_u$ returns $v_i^b$. Under this assumption, we show that $L_i^b$ is linearized after $c_s$. Since $v_i^b \not\cong v_i^a$, $w_i^a \to r_u[i] \to w_i^b \to \hat r_u[i]$; in particular, $r_u[1] \to w_i^b$. If $L_i^b$ is lasting, then it is linearized after $c_s$. Hence, for the rest of Case 4, we assume that $L_i^b$ is transient. In the sequel, we show that there exists a lasting write action $L_j^c$ which is linearized after $c_s$, and that $L_i^b$ is either linearized by $L_j^c$ or linearized after $L_j^c$. Consider the following cases:
Case 4.1: $v_i^a \in B_H^{r_u[i]}$.
Since $L_i^b$ is transient, $B_H^{w_i^b}/i \not\equiv B_i^b/i$; hence Lemma 9 (with $j = i-1$) implies that there exists a lasting write action $L_j^c$ such that $j < i$ and $r_i^b[j] \to w_j^c \to w_i^b$. Since $w_i^a \to r_i^b[j]$, we get that $w_i^a \to w_j^c \to w_i^b$.
Now we use $L_j^c$ to show that $L_i^b$ is linearized after $c_s$. Since $L_i^b$ is transient, it is linearized by the last node of $B_H^{w_i^b}$. Since $w_j^c \to w_i^b$, the last node of $B_H^{w_i^b}$ is either $v_j^c$ or the node of another logical write action which is linearized after $L_j^c$.
The proof is completed by showing that $L_j^c$ is linearized after $r_u[i]$, which implies that $L_i^b$ is also linearized after $r_u[i]$, and therefore after $c_s$. Since $L_j^c$ is lasting, it is linearized at $w_j^c$; therefore, it is enough to show that $r_u[i] \to w_j^c$. Note that since $w_i^a \to w_j^c$, $v_i^a \in H^{w_j^c}$. However, since $v_j^c$ is last in $B_H^{w_j^c}$ and $i > j$, Lemma 4(a) implies that $v_i^a \notin B_H^{w_j^c}$. Therefore, Lemma 1 implies that for any configuration $c_t$ after $w_j^c$, $v_i^a \notin B_H^{c_t}$. By the assumption of this case, $v_i^a \in B_H^{r_u[i]}$; therefore $r_u[i] \to w_j^c$.
Case 4.2: $v_i^a \notin B_H^{r_u[i]}$.
By the definition of $i$ in Case 4, $v_i^a \in B_u$. On the other hand, by Case 4.2, $v_i^a \notin B_H^{r_u[i]}$; therefore $B_H^{r_u[i]}/(i+1) \not\equiv B_u/(i+1)$. By Lemma 9, there exists a lasting write action $L_j^c$, $j \le i$, such that $r_u[j] \to w_j^c \to r_u[i]$; hence $j < i$. Since $L_j^c$ is lasting, it is linearized at $w_j^c$. Since $w_j^c \to r_u[i] \to w_i^b$, we get $w_j^c \to w_i^b$, and therefore $L_i^b$ is linearized either by $L_j^c$ or later. Since $w_j^c$ occurs after $c_s$, $L_i^b$ is linearized after $c_s$ too. □
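Fig. 10 is not reproduced in this text, so the following is only a reconstruction of the reader's decision rule from the four cases in the proof of Theorem 12, not the published code. It assumes a hypothetical encoding: branches are lists of `(writer id, fields)` pairs, each collected graph maps a writer id to the node read from that writer's register (one node per writer), and two nodes of the same writer are $\cong$ when their fields are equal.

```python
from typing import Dict, List, Tuple

Fields = Tuple[int, ...]            # hypothetical stored contents of a node
Branch = List[Tuple[int, Fields]]   # (writer id, fields) in path order
Graph = Dict[int, Fields]           # writer id -> node read from that writer


def _sub(a: Branch, b: Dict[int, Fields]) -> bool:
    # a ⊂ b: every node of a has a same-writer node with equal contents in b.
    return all(b.get(wid) == f for wid, f in a)


def _cong(a: Branch, b: Branch) -> bool:
    return _sub(a, dict(b)) and _sub(b, dict(a))


def choose_return(b: Branch, b_rev: Branch, b_hat: Branch,
                  g: Graph, g_rev: Graph, g_hat: Graph) -> Tuple[int, Tuple[int, Fields]]:
    """Return (case number, node to return), following the case split of Theorem 12."""
    if _cong(b, b_hat):
        if _cong(b, b_rev):
            return 2, b_hat[-1]                  # Case 2: last node of B-hat
        for wid in sorted(g):                    # Case 1: smallest id whose copies differ
            if g[wid] != g_rev[wid] and g_rev[wid] != g_hat[wid]:
                return 1, (wid, g_hat[wid])
        raise AssertionError("Claim 1 guarantees such an id")
    if _sub(b, g_hat):
        return 3, b_hat[-1]                      # Case 3: last node of B-hat
    for wid, f in sorted(b):                     # Case 4: smallest id on B that changed
        if g_hat[wid] != f:
            return 4, (wid, g_hat[wid])
    raise AssertionError("B not contained in G-hat guarantees such an id")
```

The three collected graphs are taken as inputs here; computing the frontal branches themselves follows the writer's protocol and is not repeated.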
5. Multi-writer registers using single-reader registers
5.1. Description
In this section, we present an implementation whose physical registers are atomic (1, 1)-registers. In this implementation, readers and writers use the same protocol, which is obtained by modifying the writer protocol of the (1, n) implementation. The ids of the readers are larger than the ids of the writers, and the reader protocol contains an extra return statement in which the read value is returned. Communication is two-sided: processor $P_i$ communicates with processor $P_j$, $i \ne j$, by writing into a (1, 1) atomic register denoted by $REG_{i,j}$, from which $P_j$ reads. A physical write action executed by $P_i$ to $REG_{i,j}$ is denoted by $w_i[j]$, while $r_j[i]$ denotes a physical read action by $P_j$ from $REG_{i,j}$. Logical action number $a$ of $P_i$, where $P_i$ is either a writer or a reader, is denoted by $L_i^a$. Since we use (1, 1) registers, some single physical actions of the (1, n) implementation are replaced by $n$ physical actions (for example, $w_i^a$ is replaced by $w_i^a[1] \cdots w_i^a[n]$). Each register contains five successive nodes called new, current, previous, old and ancient. Immediately after the occurrence of $w_i^a[j]$, the current node in $REG_{i,j}$ holds $v_i^a$, the node computed by $L_i^a$, while previous, old and ancient hold $v_i^{a-1}$, $v_i^{a-2}$ and $v_i^{a-3}$, respectively.
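The five node slots of a (1, 1) register behave like a short shift register. A minimal sketch, assuming nodes are opaque labels and that every slot starts out holding the initial node (written here as the hypothetical label `"v0"`): the inform stage overwrites new, and each concluding write $w_i^a[j]$ shifts current into previous, previous into old and old into ancient before storing $v_i^a$ as current.

```python
from dataclasses import dataclass


@dataclass
class Reg11:
    """Hypothetical model of one (1, 1) register REG_{i,j}; nodes are opaque labels."""
    new: str = "v0"
    current: str = "v0"
    previous: str = "v0"
    old: str = "v0"
    ancient: str = "v0"

    def inform(self, node: str) -> None:
        # Inform stage i_i^a[j]: announce the freshly chosen node in `new`.
        self.new = node

    def write_current(self, node: str) -> None:
        # Concluding write w_i^a[j]: shift the history down one slot, then store v_i^a.
        self.ancient, self.old, self.previous = self.old, self.previous, self.current
        self.current = node
```

After four logical actions the register holds $v_i^4, v_i^3, v_i^2, v_i^1$ in current, previous, old and ancient, matching the description above.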
The use of (1, 1) registers influences the protocols' design in two ways. On the one hand, information propagation is no longer atomic: a processor that wishes to pass some information to all other processors must write $n$ times, whereas in the (1, n) implementation a single atomic write would suffice. Consequently, information is passed to the processors gradually and not at once. On the other hand, since (1, 1) registers are used, each pair of processors can use a constant number of additional bits on top of those used for node encoding without increasing the $O(\log n)$ space complexity.
Now we discuss the influence of non-atomic information propagation on the protocol. Consider the situation in which node $v_j^b$ joins the system gradually during physical actions $w_j^b[1], \ldots, w_j^b[n]$. In these circumstances, the following problem may arise. Let $L_k^c$ be some action whose tail node is $v_j^b$, where $j < k$, and assume that $L_k^c$ terminates before $w_j^b[i]$, $k < i$, occurs. If at this point $P_\ell$, $\ell < i$, reads both $REG_{j,\ell}$ and $REG_{k,\ell}$ while executing $L_\ell^d$, then nodes $v_j^b$ and $v_k^c$ belong to $G_\ell^d$ and $(v_j^b, v_k^c)$ is an edge of $G_\ell^d$. On the other hand, if at this point $P_m$, $m \ge i$, reads $REG_{j,m}$ and $REG_{k,m}$ while executing $L_m^a$, then, since $w_j^b[m]$ has not occurred yet, $v_j^b$ is not a node of $G_m^a$ and $(v_j^b, v_k^c)$ is not included in $G_m^a$. This may cause a situation in which one reader returns one value while another reader returns a second value, and atomicity might be violated. To prevent this problem, the following adjustments are made:
(1) The protocol is augmented with the inform stage, in which $P_i$ informs all processors of its new node.
(2) During the invocation of collect in action $L_i^a$, all new nodes are scanned. Whenever $P_i$ sees that the new node in $REG_{j,i}$, $v_j^b$, is the tail node of some node $v_k^c$ in $G_i^a$, then $v_j^b$ is added to $G_i^a$ as well.
The code of the (1, 1) version of procedure collect, for $P_i$, appears in Fig. 11. In this implementation, the code depends on the id of $P_i$. Assume $G_i^a$ is collected: for every processor $P_j$, $G_i^a$ contains the nodes current, previous and old read from $REG_{j,i}$. When $G_i^a$ contains some node $v_k^c$ whose tail node is the new node of $P_j$, $v_j^b$ (that is, $v_j^b$ is the new field in $REG_{j,i}$), where $i > k > j$, $v_j^b$ is added to $G_i^a$ as well. The set AD contains the tail_address of all five nodes read from each register.
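The extra rule of the (1, 1) collect can be sketched as follows. Fig. 11 is not reproduced here, so the record layout, the test for "is the tail node of" (comparing a collected node's `tail_id` and `tail_address` with the candidate's owner and address) and the single pass over writer ids in descending order are all assumptions of this illustration, not the published procedure.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Node(NamedTuple):
    owner: int          # id of the processor that produced the node
    address: int
    tail_id: int
    tail_address: int


SLOTS = ("new", "current", "previous", "old", "ancient")


def collect_sketch(i: int, regs: Dict[int, Dict[str, Node]]) -> Tuple[Set[Node], Set[int]]:
    """regs[j] is a hypothetical snapshot of REG_{j,i} (slot name -> node)."""
    graph: Set[Node] = set()
    for snap in regs.values():
        graph.update(snap[s] for s in ("current", "previous", "old"))
    # Add P_j's new node when it is the tail of some collected v_k^c with i > k > j.
    for j in sorted(regs, reverse=True):
        cand = regs[j]["new"]
        if any(i > v.owner > j and v.tail_id == j and v.tail_address == cand.address
               for v in graph):
            graph.add(cand)
    # AD: tail addresses of all five nodes read from each register.
    ad = {snap[s].tail_address for snap in regs.values() for s in SLOTS}
    return graph, ad
```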
Fig. 11. Procedure collect for $P_i$ in the (1, 1) implementation.
The gradual departure of nodes from the system raises a similar problem. Let $E$ be an execution in which $v_j^b$ is the tail node of $v_i^a$ ($j < i$). In actions $w_i^{a+3}[k]$, $k = 1, 2, \ldots, n$, $v_i^a$ is moved from the old fields into the ancient fields of $REG_{i,k}$. Consider the situation in which actions $w_i^{a+3}[k]$, $k = 1, 2, \ldots, \ell$, $\ell < n$, were executed, but actions $w_i^{a+3}[k]$, $k = \ell+1, \ell+2, \ldots, n$, were not carried out yet. If, at this point, $P_j$ uses $v_j^b.\mathit{address}$ as the address of some new node $v_j^{b+r}$ ($r > 0$), where $L_j^{b+r}$ starts after $L_i^a$ terminates, the edge $(v_j^{b+r}, v_i^a)$ might be collected by some $P_m$, $\ell < m \le n$. Since in $E$, $L_i^a \Rightarrow L_j^{b+r}$, this edge violates the precedence relation. To prevent this problem we add the ancient node to the registers of the processors. The ancient node is never used as a tail node for new nodes; its role is to provide a window of time during which the old node leaves the system while reuse of its address is delayed.
Now we present the notion of enclosed actions. Intuitively, action $L_j^b$ is enclosed within $L_i^a$ if its execution interval $[r_j^b[1], w_j^b[n]]$ is contained in the execution interval of $L_i^a$, $[r_i^a[1], w_i^a[n]]$. The hand-shake mechanism is a fairly standard distributed protocol which enables $P_i$ to sometimes detect enclosed actions by processors with smaller ids. Whenever some enclosed actions are detected, the node of one of them is chosen as can_tail. In the next subsection, enclosed actions and enclosed-free actions are defined. The required properties of the hand-shake mechanism are stated in Lemma 13. The hand-shake mechanism is presented and Lemma 13 is proved in Appendix A.
The code for the protocol, without the details of the hand-shake mechanism, appears in Fig. 12.
Fig. 12. The protocol for $P_i$ in the (1, 1) implementation.
Now we describe the protocol, assuming $L_i^a$ is executed. Execution of $L_i^a$ starts with an invocation of collect, in which $G_i^a$ is collected. The physical actions of collect are $r_i^a[1] \cdots r_i^a[n]$. Following that, the frontal branch of $G_i^a$, $B_i^a$, is computed. Computation of $B_i^a$ is done just like before, where the relation locally precedes is defined so that each of old, previous, current and new locally precedes its successors on the list. After that, $P_i$ computes its tentative new node, whose tail node is denoted by can_tail, as follows: if an enclosed node is detected then can_tail is the enclosed node with maximal id; otherwise, can_tail is the last node of $B_i^a/i$. As before, cid denotes the id of (the owner of) can_tail.
The address of new is the minimal address not included in the set AD, which includes the components tail_address of all nodes read during collect. In addition, it is ensured that the address of new is not equal to any of the last four addresses used by $P_i$; thus, the addresses of every five consecutive nodes are distinct, and if $v_i^a \cong v_i^{a+r}$ and $r > 0$, then $r > 4$.
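The address rule for new fits in a few lines. A minimal sketch, assuming addresses are non-negative integers and that $P_i$ keeps the list of addresses it has used so far (the function name is hypothetical): excluding the last four addresses is what makes every five consecutive nodes of $P_i$ pairwise distinct in address.

```python
from typing import Iterable, Sequence


def choose_address(ad: Iterable[int], used: Sequence[int]) -> int:
    """Smallest non-negative address not in AD and not among the last four used."""
    forbidden = set(ad) | set(used[-4:])
    addr = 0
    while addr in forbidden:
        addr += 1
    return addr
```

Since collect reads at most $n$ registers with five nodes each, at most $5n + 4$ values are excluded, so the returned address is at most $5n + 4$; this is what keeps the address field within $\log n + O(1)$ bits.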
During execution of $L_i^a$, $P_i$ computes a node called Ret_node as follows: if $L_i^a$ connects then Ret_node is chosen as the tail node of $v_i^a$; if $L_i^a$ loops then Ret_node is equal to the node can_tail computed during $L_i^a$. Node Ret_node is used only if $P_i$ is a reader; in this case the value corresponding to Ret_node is returned by $L_i^a$, and it becomes the value corresponding to $v_i^a$. This is the only place in the implementation in which a value corresponding to one node is copied to another node.
In the next step, $P_i$ declares the new node to $P_{cid}$ exclusively, in action $p_i^a[cid]$. Then, in action $\tilde r_i^a[cid]$, $P_i$ rereads $REG_{cid,i}$. If $v_{cid}$ is $\cong$ to either current, previous or old, then $L_i^a$ connects and the tentative choice of tail is committed; otherwise ($L_i^a$ loops), the tail node of $v_i^a$ is $v_i^a$ itself. Thus, after this stage, the choice of node new is finalized. After choosing its final new node, $P_i$ executes the inform stage, in which it informs all other processors about the new value by writing it into component new of all its registers in physical actions $i_i^a[1] \cdots i_i^a[n]$. Logical action $L_i^a$ is concluded with physical actions $w_i^a[1] \cdots w_i^a[n]$, in which $v_i^a$ is written into the current field of all registers of $P_i$, one after the other.
To complete the definition of the protocol, we need to specify initial values for all data structures. As in the (1, n) implementation, all nodes in all registers are initialized to hold the initial node $v_i^0$, where $v_i^0.\mathit{address} = 0$, $v_i^0.\mathit{tail\_id} = i$ and $v_i^0.\mathit{tail\_address} = 0$. That is, at the initial configuration all edges of the current graph are self-loops, the initial frontal branch contains only the root node $v_0^0$, and the initial logical value is the value corresponding to $v_0^0$, which can be chosen freely from the set of all permitted values. The initial values for the bits implementing the hand-shake mechanism are presented in Appendix A.
Since there are five nodes in each register, and each node contains three fields, each of size $\log n + O(1)$ bits, the space complexity of the (1, 1) implementation is $\Theta(\log n)$. The time complexity, including the implementation of the hand-shake mechanism, is $5n + O(1)$.
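The space bound is a per-register count. Taking the stated sizes (five nodes per register, three fields of $\log n + O(1)$ bits per node) together with the three hand-shake bits per pair of processors from Lemma 13, the arithmetic for one (1, 1) register is:

```latex
\underbrace{5}_{\text{nodes/register}} \times \underbrace{3}_{\text{fields/node}} \times \bigl(\log n + O(1)\bigr)
  + \underbrace{3}_{\text{hand-shake bits}}
  \;=\; 15\log n + O(1) \;=\; O(\log n)
```

This is the upper-bound half of the $\Theta(\log n)$ claim.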
5.2. Linearization of the logical actions
As in the previous section, we fix some arbitrary execution of the system, $E = c_0, d_1, c_1, \ldots$, and all definitions and proofs are made with respect to $E$.
Definition 10. The tail node of $v_i^a$ is defined as follows:
(1) If $L_i^a$ connects then the tail node of $v_i^a$ is Ret_node, read in $\tilde r_i^a[cid]$, which is $\cong$ to can_tail.
(2) If $L_i^a$ loops then the tail node of $v_i^a$ is $v_i^a$ itself.
In this implementation, once more a history graph, $H$, is used to linearize the logical actions. For every processor $P_i$, writer or reader, each node $v_i^a$ is included in $H$. The configuration at which $v_i^a$ joins the history graph, which is called the joining configuration of $v_i^a$, should satisfy the following requirements:
(1) The joining configuration of $v_i^a$ lies within the interval $[w_i^a[i+1], w_i^a[n]]$.
(2) If $(v_j^b, v_i^a)$, $j < i$, is an edge in $H$ then the joining configuration of $v_i^a$ is not earlier than the joining configuration of $v_j^b$.
These requirements are fulfilled by the following definition.
Definition 11. Let $L_i^a$ be some logical action and let $DES_i^a$ be the set of actions whose nodes are descendants of $v_i^a$ in $H$, including $v_i^a$ itself. Let $L_j^b$ be the first logical action in $DES_i^a$ to complete its execution. The joining configuration of $v_i^a$ (and of all nodes on the directed path from $v_i^a$ to $v_j^b$) is the result configuration of $w_j^b[n]$. In case $v_i^a \ne v_j^b$, we say that $v_i^a$ joins the history graph by $v_j^b$.
Definition 12. Let $E$ be an execution of the system. The history graph of $E$, $H$, is defined as follows:
(1) $H^0$ is the history graph before the execution begins. It contains only the root node, $v_0^0$.
(2) Let $c_{t_j}$ be the result configuration of $w_j^b[n]$, and assume that $c_{t_j}$ is the joining configuration of a set of nodes $PTH_i^j$, $i \le j$. In this case we define $H^{c_{t_j}}$ as the graph obtained from $H^{c_{t_j-1}}$ by adding all nodes of $PTH_i^j$. For each node $v_k^c$ of $PTH_i^j$, $H^{c_{t_j}}$ contains an edge emanating from the tail node of $v_k^c$ and incoming to $v_k^c$. If $c_{t_j}$ is not a joining configuration of any node then $H^{c_{t_j}} \equiv H^{c_{t_j-1}}$.
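Definition 11 can be computed bottom-up over the tail edges. A minimal sketch, assuming the final graph is known in advance as a hypothetical `children` map (from each node to the nodes whose tail it is, self-loops omitted), together with the completion time of each action's last write $w[n]$. Since every edge goes from a smaller id to a larger one, the graph is acyclic and the recursion terminates. The joining time of a node is the earliest completion among the node and all its descendants, which automatically satisfies requirement (2).

```python
from functools import lru_cache
from typing import Dict, List


def joining_times(children: Dict[str, List[str]],
                  completes: Dict[str, float]) -> Dict[str, float]:
    """Joining time of each node: completion time of w[n] of the first action to
    finish among the node itself and all its descendants (Definition 11)."""
    @lru_cache(maxsize=None)
    def join(v: str) -> float:
        return min([completes[v]] + [join(c) for c in children.get(v, [])])

    return {v: join(v) for v in completes}
```

For example, if `b` is the tail of `c` and `a` is the tail of `b`, and `b` finishes first, then `a` joins the history graph by `b`, at the same time as `b`.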
Definition 13. Let $c_{t_i}$ be the joining configuration of $v_i^a$. Action $L_i^a$ is lasting if $v_i^a$ belongs to $B_H^{c_{t_i}}$. An action which is not lasting is transient.
Note. In this implementation, configuration $c_{t_i}$ may be the joining configuration of several nodes. Therefore, a node of a lasting action is only required to belong to $B_H^{c_{t_i}}$ (and not necessarily to be last in $B_H^{c_{t_i}}$).
Now we define enclosed actions. Intuitively, action $L_j^b$ is enclosed within $L_i^a$ if its execution interval $[r_j^b[1], w_j^b[n]]$ is contained in the execution interval of $L_i^a$, $[r_i^a[1], w_i^a[n]]$. Since, if $i \ne n$, $P_i$ cannot determine the occurrence time of $w_j^b[n]$, we define $L_j^b$ as enclosed in $L_i^a$ if the interval $[r_j^b[1], w_j^b[i]]$ is contained in the interval $[r_i^a[1], w_i^a[n]]$. Even under this permissive definition, asynchrony makes it impossible to detect every enclosed action. The most one can hope for is that if $P_j$ executes "enough" enclosed actions during $L_i^a$, then one of them is detected. In the following definition we use slightly abusive language and define an action as enclosed-free if no enclosed action is detected; that is, an enclosed-free action may actually have some (undetected) enclosed actions. Detection of enclosed actions is done by use of the hand-shake mechanism. A description of the mechanism and a formal proof of its properties appear in Appendix A.
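The permissive notion of enclosure only moves the right end point of the inner interval from $w_j^b[n]$ to $w_j^b[i]$, the write into $REG_{j,i}$ that $P_i$ can actually observe. A sketch assuming, for illustration only, that physical actions carry times from a global clock (the processors themselves have no such clock, and the function names are hypothetical):

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) on an assumed global clock


def contains(outer: Interval, inner: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def enclosed_permissive(outer_action: Interval, r_start: float, w_at_i: float) -> bool:
    """L_j^b counts as enclosed in L_i^a when [r_j^b[1], w_j^b[i]]
    lies inside [r_i^a[1], w_i^a[n]]."""
    return contains(outer_action, (r_start, w_at_i))
```

An action whose write into $REG_{j,i}$ falls inside $L_i^a$ but whose last write $w_j^b[n]$ falls after it counts as enclosed under the permissive definition, although it is not enclosed in the intuitive sense.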
Definition 14. $L_j^b$ and $v_j^b$ are enclosed within $L_i^a$, $i > j$, if the execution of $L_j^b$ begins after the execution of $L_i^a$ begins, and $v_j^b$ is the current node in $REG_{j,i}$ collected during $L_i^a$. Action $L_i^a$ is enclosed-free if it does not detect any enclosed action.
Now we define the linearization point of the logical actions. Logical write actions are linearized independently of the logical read actions, according to the following definition.
Definition 15. Let $L_i^a$ be a logical write action whose joining configuration is $c_{t_i}$.
(1) If $L_i^a$ is last in $B_H^{c_{t_i}}/(w+1)$ then $L_i^a$ is linearized at $c_{t_i}$.
(2) If $v_j^b$ is the last node of $B_H^{c_{t_i}}/(w+1)$ then $L_i^a$ is linearized by $L_j^b$, that is, before $L_j^b$ and after any other physical action that precedes the joining configuration of $v_j^b$. In case several logical write actions are linearized by the same logical action, they are linearized in ascending order of their ids.
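Definition 15 yields a total order on logical writes once each transient write is associated with the action by which it is linearized. A minimal sketch, assuming (hypothetically) that those anchor actions are lasting writes with known linearization times, and that each action's processor id is available for tie-breaking:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def order_writes(lasting: List[Tuple[float, str]],
                 linearized_by: Dict[str, str],
                 ids: Dict[str, int]) -> List[str]:
    """lasting: (linearization time, action) pairs for lasting writes.
    linearized_by: transient action -> action by which it is linearized.
    ids: processor id of each transient action, used to break ties."""
    waiting: Dict[str, List[str]] = defaultdict(list)
    for transient, anchor in linearized_by.items():
        waiting[anchor].append(transient)
    order: List[str] = []
    for _, action in sorted(lasting):
        # Transient writes go immediately before their anchor, in ascending id order.
        order.extend(sorted(waiting[action], key=lambda a: ids[a]))
        order.append(action)
    return order
```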
Read actions are linearized according to the nodes they return. The linearization point of a logical read action $L_i^a$ is defined as follows.
Definition 16. Let $L_i^a$ be a logical read action whose Ret_node is $v_j^b$. If $L_j^b$ is linearized before the beginning of the execution interval of $L_i^a$, then $L_i^a$ is linearized at the beginning of its execution interval. Otherwise, $L_i^a$ is linearized by $L_j^b$, that is, after $L_j^b$ and before any physical action that occurs after $L_j^b$ or any logical action that is linearized after $L_j^b$. In case several logical read actions are linearized by the same logical action, they are linearized in ascending order of their ids.
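Definition 16 composes naturally with such keys. In the sketch below (hypothetical names, same global-time assumption), a linearization point is a tuple compared lexicographically. Appending the reader's id to a key $K$ yields a point immediately after $K$ and before every later key that does not extend $K$, so reads linearized by the same action fall in ascending id order. Because a read may return a node of another reader, the rule applies recursively.

```python
from typing import Tuple

Key = Tuple[float, ...]  # linearization point, compared lexicographically


def read_lin_key(start_time: float, reader_id: int, ret_key: Key) -> Key:
    """Definition 16 for L_i^a, given the key of the action that wrote Ret_node."""
    start_key: Key = (start_time,)
    if ret_key < start_key:
        # Ret_node's action is linearized before L_i^a begins:
        # linearize L_i^a at the beginning of its execution interval.
        return start_key + (reader_id,)
    # Otherwise L_i^a is linearized "by" that action: right after it.
    return ret_key + (reader_id,)


w = (5.0, 1, 1)                    # e.g. a lasting write linearized at time 5
late = read_lin_key(9.0, 4, w)     # (9.0, 4)
r6 = read_lin_key(3.0, 6, w)       # (5.0, 1, 1, 6)
r5 = read_lin_key(2.0, 5, w)       # (5.0, 1, 1, 5)
nested = read_lin_key(4.0, 7, r5)  # Ret_node written by another reader
print(sorted([late, r6, r5, nested]))
# [(5.0, 1, 1, 5), (5.0, 1, 1, 5, 7), (5.0, 1, 1, 6), (9.0, 4)]
```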
5.3. Correctness proof
We begin the proof with several auxiliary lemmas which are used in the proof of Theorem 19 in
which it is proved that every logical action is linearized inside its execution interval. At the end of
the proof, in Theorem 22, we show that every logical read action returns the value written by the
last logical write action linearized before it, hence, the implementation is correct.
The first lemma deals with enclosed actions and with the properties of their detection mechanism.
A description of the mechanism and a proof for the lemma appear in Appendix A.
Lemma 13. Let $v_j^b$ be the last node of $P_j$ whose joining configuration is before $r_i^a[1]$. There exists a detection mechanism that requires three bits for each pair of processors such that if, for some $r \geq 0$, $v_j^{b+3+r}$ is the current node in $REG_{j,i}$ at $r_i^a[j]$, then it is detected as enclosed.
Lemma 14. Let $(v_j^b, v_i^a)$, $j < i$, be an edge in $G_k^c$, and let $ct_j$ and $ct_i$ be the joining configurations of $v_j^b$ and $v_i^a$, respectively. Under these conditions the following hold:
(1) There exists some integer $r \geq 0$ such that $(v_j^{b+r}, v_i^a)$ is an edge in $H^{ct_i}$.
(2) Configuration $ct_j$ is not later than $ct_i$.
Proof of 1. By the protocol, if $(v_j^b, v_i^a)$ is an edge of $G_k^c$, where $j < i$, then $L_i^a$ connects, and its tail node is some node of $P_j$, $v_j^{b+r}$. To prove the lemma we show that $r \geq 0$. Since $(v_j^b, v_i^a)$ is an edge of $G_k^c$ and $(v_j^{b+r}, v_i^a)$ is an edge of $H$, we conclude that $v_j^b.address = v_j^{b+r}.address$. In the next claim we assume, by way of contradiction, that $r < 0$ and prove that $p_i^a[j] \to r_j^b[i] \to w_i^{a+4}[j]$. Since $v_i^a$ is one of the five nodes in $REG_{i,j}$ throughout the interval $[p_i^a[j], w_i^{a+4}[j])$, we conclude that $v_i^a.tail\_address \in AD_j^b$, where $AD_j^b$ is the set AD returned by procedure collect invoked during $L_j^b$. By the protocol, $v_j^b.address \neq v_i^a.tail\_address$. On the other hand, the fact that $(v_j^b, v_i^a)$ is an edge of $G_k^c$ implies that $v_j^b.address = v_i^a.tail\_address$, a contradiction.
Claim 4. Under the conditions of the lemma, if $r < 0$ then $p_i^a[j] \to r_j^b[i] \to w_i^{a+4}[j]$.
Proof of claim. First, we show that $p_i^a[j] \to r_j^b[i]$. Since the addresses of every five consecutive nodes are distinct and $v_j^{b+r}.address = v_j^b.address$, the fact that $r < 0$ implies that $r < -4$. Nodes old, previous and current in $REG_{j,i}$ at $w_j^{b-2}[i]$ are $v_j^{b-4}$, $v_j^{b-3}$ and $v_j^{b-2}$, respectively. Since $r < -4$, $v_j^{b+r}$ is neither old nor previous nor current in $REG_{j,i}$ at $w_j^{b-2}[i]$. Since $L_i^a$ connects, $v_j^{b+r}$ is one of these nodes in $REG_{j,i}$ at $\tilde r_i^a[j]$; hence $\tilde r_i^a[j] \to w_j^{b-2}[i]$. By the protocol, $p_i^a[j] \to \tilde r_i^a[j]$ and $w_j^{b-2}[i] \to r_j^b[i]$; hence $p_i^a[j] \to r_j^b[i]$.
The fact that $r_j^b[i] \to w_i^{a+4}[j]$ is proved as follows:
(1) $r_j^b[i] \to i_j^b[k]$ (according to the protocol)
(2) $i_j^b[k] \to r_k^c[j]$ (since $v_j^b \in G_k^c$)
(3) $r_k^c[j] \to r_k^c[i]$ (since $i > j$ and $G_k^c$ is collected in ascending order)
(4) $r_k^c[i] \to w_i^{a+3}[k]$ (since $v_i^a \in G_k^c$)
(5) $w_i^{a+3}[k] \to w_i^{a+4}[j]$ ($P_i$ works in a sequential manner)
Since the right-hand side of each relation is the left-hand side of the next one, we get $r_j^b[i] \to w_i^{a+4}[j]$. In conclusion, $p_i^a[j] \to r_j^b[i] \to w_i^{a+4}[j]$. The claim follows. □
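Proof of 2. Let $ct_{j+r}$ be the joining configuration of $v_j^{b+r}$. By Definition 11, $ct_{j+r}$ is not later than $ct_i$. If $r = 0$ then $ct_{j+r} = ct_j$, and if $r > 0$ the protocol implies that $ct_{j+r}$ is later than $ct_j$; in both cases $ct_j$ is not later than $ct_i$. □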
Lemma 15. Let $ct_i$ be the joining configuration of $L_i^a$ and let $v_j^b$ be node can_tail computed by $L_i^a$. Under these conditions $v_j^b \in H^{ct_i}$.
Proof. Let $v_j^b$ be node can_tail in $L_i^a$. First we show that $v_j^b$ is not the new node in $REG_{j,i}$ at $r_i^a[j]$. If $L_i^a$ is enclosed-free, then $v_j^b$ is the last node in $B_i^a/i$. Since a new node of $P_\ell$ is added to $G_i^a$ only if it is the tail node of another node $v_k^c$, where $\ell < k < i$, we conclude that $v_j^b$ is not new in $REG_{j,i}$. If $L_i^a$ has an enclosed action then, by Definition 14, $v_j^b$ is the current node of $P_j$, where $j$ is the largest id of a detected enclosed action.
Now we continue the proof, assuming that $v_j^b$ is not new in $REG_{j,i}$. If $L_i^a$ loops, then $v_j^b$ is not among current, previous or old read from $REG_{j,i}$ in $\tilde r_i^a[j]$. Obviously $w_j^b[n] \to r_i^a[j] \to w_i^a[n]$, and we are done.
Assume that $L_i^a$ connects. In this case $v_j^b \preceq$ one of current, previous or old in $REG_{j,i}$ at $\tilde r_i^a[j]$. Assume $v_j^b \preceq v_j^c$, where $v_j^c$ is one of these three fields. If $v_j^b \not\equiv v_j^c$, we conclude that $w_j^b[n] \to i_j^c[j] \to \tilde r_i^a[j]$ and, once again, we are done. Assume now that $v_j^b \equiv v_j^c$. In this case, $v_j^b$ is the tail node of $v_i^a$ in $H$. By Definition 11, $v_j^b \in H^{ct_i}$. □
Lemma 16. Let $L_k^c$ be an enclosed-free action and let $ct_k$ be the joining configuration of $v_k^c$. Every node of $B_k^c/k$ belongs to $H^{ct_k}$.
Proof. Let the nodes of $B_k^c/k$ be $v_{j_0} (= v_i^a), v_{j_1}, \ldots, v_{j_p} (= v_0^0)$, where $v_i^a$ is last in $B_k^c/k$. We prove the lemma by induction on $q$.
Base:
We have to show that the last node of $B_k^c/k$, $v_i^a$, satisfies the lemma. Since $L_k^c$ is enclosed-free, $v_i^a$ is node can_tail computed during $L_k^c$. By Lemma 15, $v_i^a \in H^{ct_k}$.
Induction step:
Assume that all nodes $v_{j_0}, \ldots, v_{j_q}$ belong to $H^{ct_k}$. To complete the proof it suffices to show that $v_{j_{q+1}} \in H^{ct_k}$. Assume that $(v_{j_{q+1}}, v_{j_q}) = (v_j^b, v_\ell^e)$ for some processors $P_j$ and $P_\ell$. By Lemma 14, there exists some $r \geq 0$ such that $(v_j^{b+r}, v_\ell^e)$ is an edge in $H^{ct_k}$, that is, $v_j^{b+r} \in H^{ct_k}$. If $r = 0$, we are done. If $r > 0$, then since $L_j^b \to L_j^{b+r}$, clearly $v_j^b \in H^{ct_k}$ as well. □
Let $L_i^a$ be a logical action. The initial configuration of $L_i^a$ is the configuration that precedes the first physical action of $L_i^a$, that is, the configuration that precedes $r_i^a[1]$.
Lemma 17. Let $cs_i$ be the initial configuration of $L_i^a$ and let $ct_i$ be the joining configuration of $v_i^a$.
1. If $B_H^{cs_i}/i \equiv B_H^{ct_i}/i$, then
(1.a) $L_i^a$ is enclosed-free.
(1.b) $B_H^{cs_i}/i$ is a subgraph of $G_i^a$.
(1.c) $L_i^a$ is lasting.
2. For any configuration $c$ not earlier than $ct_i$, if $v_i^a \in B_H^c$ then $v_i^{a+1} \notin H^c$.
Proof. By induction on $i$, the ids of the processors.
Base: $i = 1$.
(1.a) Every logical action executed by $P_1$ is enclosed-free.
(1.b) For any configuration $c$, $B_H^c/1$ contains a single node, namely $v_0^0$.
(1.c) Every action of $P_1$ is lasting.
(2) Every node of $P_1$ excludes its previous node from the frontal branch of the history graph.
Step: Assume correctness for $j < i$. Now, we prove correctness for $i$.
Proof of (1.a). Assume by way of contradiction that $L_i^a$ is not enclosed-free. By Definition 14, this means that $L_i^a$ detects at least one enclosed action. Let $L_j^b$, $j < i$, be the action with maximal id detected as enclosed in $L_i^a$. By Definition 14, the initial configuration of $L_j^b$, $cs_j$, is after the initial configuration of $L_i^a$, $cs_i$. By the protocol, $v_j^b$ is node can_tail computed by $L_i^a$; hence, Lemma 15 implies that $v_j^b \in H^{ct_i}$. We get that the interval $[cs_j, ct_j]$ is contained within the interval $[cs_i, ct_i]$. Since $B_H^{cs_i}/i \equiv B_H^{ct_i}/i$, we get that $B_H^{cs_j}/j \equiv B_H^{ct_j}/j$. By induction assumption (1.c), $L_j^b$ is lasting. Hence, the frontal branch of the history graph is modified at $ct_j$, the joining configuration of $v_j^b$. Since $j < i$, $B_H^{cs_j}/i \not\equiv B_H^{ct_j}/i$, a contradiction.
Proof of (1.b). We have to show that every node of $B_H^{cs_i}/i$ belongs to $G_i^a$. First, we show that every such node is read during the collection of $G_i^a$ either as new or as current or as previous. Let $v_j^b$ be some node in $B_H^{cs_i}/i$. Clearly $i_j^b[i] \to cs_i \to r_i^a[j]$. Since $B_H^{cs_i}/i \equiv B_H^{ct_i}/i$, we get that $v_j^b \in B_H^{ct_i}/i$, which implies, by induction hypothesis (2), that $v_j^{b+1} \notin H^{ct_i}$. Therefore, $w_j^{b+1}[n]$ occurs after $ct_i$, which implies that $i_j^b[i] \to r_i^a[j] \to w_j^{b+1}[n]$. Thus, $v_j^b$ is either new or current or previous in $REG_{j,i}$ at $r_i^a[j]$. If $v_j^b$ is either current or previous, then it belongs to $G_i^a$ independently of the rest of the nodes of $G_i^a$. If, however, $v_j^b$ is new in $REG_{j,i}$ at $r_i^a[j]$, then it belongs to $G_i^a$ only if it is the tail node of another node in $G_i^a$. Thus, to show that all the nodes of $B_H^{cs_i}/i$ belong to $G_i^a$, it suffices to show that the last node of $B_H^{cs_i}/i$ is in $G_i^a$.
Let $v_\ell^d$ be the last node of $B_H^{cs_i}/i$. Now, we show that $w_\ell^d[i] \to r_i^a[\ell]$, which implies that $v_\ell^d$ is either current or previous in $REG_{\ell,i}$ at $r_i^a[\ell]$; hence, $v_\ell^d \in G_i^a$. If the joining configuration of $v_\ell^d$ is at $w_\ell^d[n]$, then $w_\ell^d[n] \to cs_i \to r_i^a[\ell]$. Otherwise, $v_\ell^d$ joins the history graph by another node $v_k^c$ whose tail node is $v_\ell^d$, where $\ell < i \leq k$. Since $v_\ell^d \in B_H^{cs_i}$ and $v_\ell^d$ joins the history graph by $v_k^c$, Definition 11 implies that $v_k^c \in H^{cs_i}$. The proof is completed by the following relations:
1.a. $w_\ell^d[i] = w_\ell^d[k]$ (if $i = k$), or
1.b. $w_\ell^d[i] \to w_\ell^d[k]$ (according to the protocol, assuming $i < k$).
2. $w_\ell^d[k] \to \tilde r_k^c[\ell]$ (since $v_\ell^d$ is the tail node of $v_k^c$).
3. $\tilde r_k^c[\ell] \to cs_i$ (since $v_k^c \in H^{cs_i}$).
4. $cs_i \to r_i^a[\ell]$ (according to the protocol).
These relations imply $w_\ell^d[i] \to r_i^a[\ell]$. Therefore, $v_\ell^d \in G_i^a$, and $B_H^{cs_i}/i$ is a subgraph of $G_i^a$.
Proof of (1.c). Assume by way of contradiction that $L_i^a$ is transient. Since $B_H^{cs_i}/i \equiv B_H^{ct_i}/i$, parts (1.a) and (1.b) for $i$ imply that $L_i^a$ is enclosed-free and that $B_H^{ct_i}/i$ is a subgraph of $G_i^a$. Since $L_i^a$ is enclosed-free, Lemma 16 implies that $B_i^a$ is a subgraph of $H^{ct_i}$.
Now, we show that $L_i^a$ does not loop. We claim that $B_i^a$ is a subgraph of $B_H^{ct_i}$. Assume towards a contradiction that $v_j^b$ is the last node in their common prefix and $(v_j^b, v_k^c)$ is the first edge of $B_i^a$ not in $B_H^{ct_i}$. Since $B_i^a$ is a subgraph of $H^{ct_i}$, the edge $(v_j^b, v_k^c)$ belongs to $H^{ct_i}$. Since $(v_j^b, v_k^c)$ is not an edge of $B_H^{ct_i}$, there exists an edge of $B_H^{ct_i}$, $(v_j^b, v_\ell^d)$, excluding $(v_j^b, v_k^c)$ from $B_H^{ct_i}$. Since $B_H^{ct_i}/i$ is a subgraph of $G_i^a$, we conclude that $(v_j^b, v_\ell^d)$ is an edge of $G_i^a$ and it excludes $(v_j^b, v_k^c)$ from $B_i^a$, a contradiction; thus $B_i^a$ is a subgraph of $B_H^{ct_i}$. Let $v_j^b$, $j < i$, be node can_tail selected during $L_i^a$. Since $B_i^a$ is a subgraph of $B_H^{ct_i}$ and $L_i^a$ is enclosed-free, $v_j^b$ is a node of $B_H^{ct_i}/i$. Since $\tilde r_i^a[j] \to ct_i$, $v_j^b$ is read either as new, or as previous or as old during $\tilde r_i^a[j]$; hence $L_i^a$ does not loop.
Since $L_i^a$ is enclosed-free, transient and does not loop, we get that $B_i^a/i \not\equiv B_H^{ct_i}/i$. Let $v_j^b$, $j < i$, be the node with the maximal id in their common prefix. Node $v_j^b$ is not last in $B_i^a/i$, since $B_H^{ct_i}/i$ is a subgraph of $G_i^a$ and $B_i^a/i \not\equiv B_H^{ct_i}/i$. Let $(v_j^b, v_k^c)$, $k < i$, be the edge emanating from $v_j^b$ in $B_i^a/i$. Now, we show that $(v_j^b, v_k^c)$ is an edge in $H^{ct_i}$: Since $L_i^a$ is enclosed-free, Lemma 16 implies that $v_k^c \in H^{ct_i}$. By Lemma 14, there exists some $r \geq 0$ such that $(v_j^{b+r}, v_k^c)$ is an edge in $H^{ct_i}$. Since $v_j^b \in B_H^{ct_i}$, induction hypothesis (2) implies that $v_j^{b+1} \notin H^{ct_i}$, that is, $r = 0$ and $(v_j^b, v_k^c)$ is an edge in $H^{ct_i}$. By the maximality of $v_j^b$, $(v_j^b, v_k^c) \notin B_H^{ct_i}/i$. Let $(v_j^b, v_\ell^d)$, $\ell < i$, be the edge excluding $(v_j^b, v_k^c)$ from $B_H^{ct_i}/i$. Since $B_H^{ct_i}/i$ is a subgraph of $G_i^a$, $(v_j^b, v_\ell^d) \in G_i^a$, and it excludes $(v_j^b, v_k^c)$ from $B_i^a/i$, a contradiction.
Proof of (2). Assume by way of contradiction that $v_i^a$ does not satisfy the lemma, that is, $v_i^a \in B_H^{cu_i}$, where $cu_i$ is the joining configuration of $v_i^{a+1}$. Since $v_i^a \in B_H^{cu_i}$, $L_i^{a+1}$ is transient. We reach the required contradiction by showing that $L_i^{a+1}$ is lasting. Since the joining configuration of $v_i^a$ is before $r_i^{a+1}[1]$ and $v_i^a \in B_H^{cu_i}$, there are no lasting actions with id smaller than $i$ in the interval $[r_i^{a+1}[1], cu_i]$. Hence, $B_H^{r_i^{a+1}[1]}/i \equiv B_H^{cu_i}/i$, which implies, by (1.c), that $L_i^{a+1}$ is lasting. □
Lemma 18. If $L_i^a$ is enclosed-free, then $B_H^{r_i^a[1]}/i$ is a subgraph of $G_i^a$.
Proof. In order to show that $B_H^{r_i^a[1]}/i$ is a subgraph of $G_i^a$, we show that every node of $B_H^{r_i^a[1]}/i$ belongs to $G_i^a$. Let $v_j^b$ be an arbitrary node in $B_H^{r_i^a[1]}/i$. Since $v_j^b \in B_H^{r_i^a[1]}/i$, $i_j^b[i] \to r_i^a[1] \to r_i^a[j]$. By Lemma 17(2), $v_j^b$ is the last node of $P_j$ whose joining configuration is before $r_i^a[1]$. Since $L_i^a$ is enclosed-free, Lemma 13 implies that the current node in $REG_{j,i}$ at $r_i^a[j]$ is $v_j^{b+r}$, $0 \leq r \leq 2$. Thus, $v_j^b$ is read in $r_i^a[j]$, during the collection of $G_i^a$, either as new or as current or as previous or as old in $REG_{j,i}$. If $v_j^b$ is either current or previous or old, then it belongs to $G_i^a$ independently of the rest of the nodes in $G_i^a$. Assume that $v_j^b$ is new in $REG_{j,i}$ at $r_i^a[j]$. The proof that $v_j^b \in G_i^a$ is identical to the proof of Lemma 17(1.b). □
Theorem 19. Every logical action is linearized within its execution interval.
Proof. Since logical write and read actions are linearized differently, we prove the theorem separately for each type of logical action.
Proof for logical write actions. Let $L_i^a$ be a logical write action whose initial configuration is $cs_i$. By Definition 15, every logical write action is linearized no later than its joining configuration. Therefore, we only have to show that $L_i^a$ is linearized after $cs_i$. Let $ct_i$ be the joining configuration of $v_i^a$. If $L_i^a$ is lasting, then it is linearized either at $ct_i$ or just before it, and the theorem follows.
Assume that $L_i^a$ is transient. By Lemma 17(1.c), $B_H^{cs_i}/i \not\equiv B_H^{ct_i}/i$; hence, there exists a lasting logical write action $L_j^b$, $j < i$, whose joining configuration is after $cs_i$ and before $ct_i$. Action $L_i^a$ is either linearized by $L_j^b$ or it is linearized by another lasting logical write action which is linearized after $L_j^b$ and before $ct_i$. The theorem follows.
Proof for logical read actions. Now, we prove by induction on $i$, the ids of the readers, that every logical read action of $P_i$ is linearized within its execution interval.
Base: $i = w + 1$.
Let $L_i^a$ be a logical read action of $P_i$ and let $cs_i$ and $ct_i$ be the initial configuration of $L_i^a$ and the joining configuration of $v_i^a$, respectively. By Definition 16, every logical read action is linearized after its initial configuration; therefore we only have to prove that $L_i^a$ is linearized no later than $ct_i$. Let $v_j^b$, $j < i$, be node Ret_node computed by $L_i^a$. Since $j < i$, $P_j$ is a writer; hence, the first part of the proof implies that $L_j^b$ is linearized within its execution interval. If $L_i^a$ connects, then by Definition 11, $v_j^b \in H^{ct_i}$. Therefore, Definition 16 implies that $L_i^a$ is linearized before $ct_i$. If $L_i^a$ loops, then obviously $L_j^b$ is completed before $ct_i$, so once again, $L_i^a$ is linearized before $ct_i$.
Induction step:
Assume that all logical actions of all readers whose ids are less than $i$ are linearized within their execution intervals, and let $L_i^a$ be an arbitrary action of $P_i$ whose Ret_node is $v_j^b$. Action $L_j^b$ is linearized within its execution interval, either by the first part of the theorem, if $P_j$ is a writer, or by the induction hypothesis, if $P_j$ is a reader. In both cases, the proof is identical to the proof of the base case. □
Theorem 20. If $(v_j^b, v_i^a)$, $j < i$, is an edge in $G_m^e$, then $L_j^b \Rightarrow L_i^a$.
Proof. Let $ct_i$ be the joining configuration of $v_i^a$. By Lemma 14, there exists some $r \geq 0$ such that $(v_j^{b+r}, v_i^a)$ is an edge of $H^{ct_i}$. We show that $L_j^{b+r} \Rightarrow L_i^a$. The theorem follows since, if $r > 0$, $L_j^b \Rightarrow L_j^{b+r}$.
Consider the following cases:
Case 1: $i \leq w$.
In this case $L_j^{b+r}$ and $L_i^a$ are both write actions. If $L_i^a$ is lasting, or if both $L_i^a$ and $L_j^{b+r}$ are transient, then the proof is implied immediately by Definition 15. If $L_j^{b+r}$ is lasting and $L_i^a$ is transient, then $L_i^a$ joins the history by another lasting logical write action, $L_k^c$, that satisfies $L_j^{b+r} \Rightarrow L_k^c$. By Definition 15, $L_j^{b+r} \Rightarrow L_i^a$.
Case 2: $j \leq w < i$.
In this case the value written by $L_j^{b+r}$ is returned by $L_i^a$. By Definition 16, $L_j^{b+r} \Rightarrow L_i^a$.
Case 3: $w < j$.
In this case, by Definition 16, $L_i^a$ is linearized by $L_j^b$. □
Lemma 21. Let $L_i^a$, $i > w$, be an enclosed-free read action and let $v_j^b$, $j \leq w$, be the last node of $B_i^a/(w+1)$. If $v_j^b \notin B_H^{r_i^a[1]}/(w+1)$, then $L_j^b$ is linearized after $r_i^a[1]$.
Proof. Since $v_j^b \in B_i^a/(w+1)$ but $v_j^b \notin B_H^{r_i^a[1]}/(w+1)$, $B_i^a/(w+1) \not\equiv B_H^{r_i^a[1]}/(w+1)$. Let $v_k^c$ be the node with the maximal id in the common prefix. Since $v_j^b \in B_i^a/(w+1)$, $v_k^c$ is not last in $B_i^a/(w+1)$. Let $(v_k^c, v_\ell^d)$ be the edge emanating from $v_k^c$ in $B_i^a/(w+1)$. Let $ct_\ell$ be the joining configuration of $v_\ell^d$. By the next claim, $ct_\ell$ is after $r_i^a[1]$. If $L_\ell^d \equiv L_j^b$, we are done. If $L_\ell^d \not\equiv L_j^b$, then there is a path from $v_\ell^d$ to $v_j^b$ in $G_i^a$ (because $v_j^b$ and $v_\ell^d$ are both in $B_i^a/i$ and $v_j^b$ is last in $B_i^a/i$). By Theorem 20, $L_\ell^d \Rightarrow L_j^b$; therefore $L_j^b$ is linearized after $r_i^a[1]$.
Claim 5. Under the conditions of the lemma, $ct_\ell$ is after $r_i^a[1]$.
Proof of claim. Assume by way of contradiction that $ct_\ell$ is before $r_i^a[1]$, that is, $v_\ell^d \in H^{r_i^a[1]}$. By Lemma 14, there exists some $r \geq 0$ such that $(v_k^{c+r}, v_\ell^d)$ is an edge of $H^{r_i^a[1]}$. Since $v_k^c \in B_H^{r_i^a[1]}/(w+1)$, Lemma 17(2) implies that $v_k^{c+1} \notin H^{r_i^a[1]}/(w+1)$. Hence, $r = 0$ and $(v_k^c, v_\ell^d)$ is an edge of $H^{r_i^a[1]}$. Let $(v_k^c, v_m^e)$, $m \leq \ell$, be the edge excluding $(v_k^c, v_\ell^d)$ from $B_H^{r_i^a[1]}$. Since $L_i^a$ is enclosed-free, Lemma 18 implies that $B_H^{r_i^a[1]}/i$ is a subgraph of $G_i^a$. Hence, $(v_k^c, v_m^e)$ is an edge of $G_i^a$ and it excludes $(v_k^c, v_\ell^d)$ from $B_i^a$, a contradiction. Therefore, the joining configuration of $v_\ell^d$, $ct_\ell$, is after $r_i^a[1]$.
Now, we show that $L_\ell^d$ is linearized after $r_i^a[1]$. If $B_H^{r_i^a[1]}/\ell \not\equiv B_H^{ct_\ell}/\ell$, then $L_\ell^d$ is linearized by the lasting write action that is linearized last before $ct_\ell$, that is, after $r_i^a[1]$. Assume that $B_H^{r_i^a[1]}/\ell \equiv B_H^{ct_\ell}/\ell$. The following three facts:
(1) $B_H^{r_i^a[1]}/i$ is a subgraph of $G_i^a$ (by Lemma 18),
(2) $v_k^c \in B_H^{r_i^a[1]}/i$ (by definition),
(3) $(v_k^c, v_\ell^d)$ is an edge in $B_i^a/i$ (by definition),
imply that $v_k^c$ is the last node of $B_H^{r_i^a[1]}/\ell$ (otherwise the edge emanating from $v_k^c$ in $B_H^{r_i^a[1]}/\ell$ excludes $(v_k^c, v_\ell^d)$ from $B_i^a/i$). Since $B_H^{r_i^a[1]}/\ell \equiv B_H^{ct_\ell}/\ell$, it holds that $v_k^c$ is last in $B_H^{ct_\ell}/\ell$. By Definition 15, $L_\ell^d$ is linearized at $ct_\ell$ (or just before), that is, after $r_i^a[1]$. The claim and the lemma follow. □
In the following theorem, we prove that the (1, 1) implementation is correct.
Theorem 22. Let $L_i^a$ be a logical read action, and let $L_j^b$ be the logical write action which wrote the value returned by $L_i^a$. Then $L_j^b$ is the most recent write action linearized before $L_i^a$.
Proof. Denote the initial configuration of $L_i^a$ by $cs_i$. We prove this theorem by induction on $i$, the ids of the logical readers.
Base: $i = w + 1$.
Let $v_j^{b-r}$, $r \geq 0$, be node can_tail computed in $L_i^a$. Consider the following cases:
Case 1: $L_i^a$ is enclosed-free.
In this case $v_j^{b-r}$ is the last node of $B_i^a/i$. By Lemma 18, $B_H^{cs_i}/i$ is a subgraph of $G_i^a$. We show that either $L_j^{b-r}$ is linearized after $cs_i$ or $L_j^{b-r}$ is the most recent write action linearized before $cs_i$. If $v_j^{b-r} \notin B_H^{cs_i}/i$, then by Lemma 21, $L_j^{b-r}$ is linearized after $cs_i$. If $v_j^{b-r} \in B_H^{cs_i}/i$, then since $B_H^{cs_i}/i$ is a subgraph of $G_i^a$, and since $v_j^{b-r}$ is the last node of $B_i^a/i$, $v_j^{b-r}$ is last in $B_H^{cs_i}/i$. Hence, $L_j^{b-r}$ is the most recent write action linearized before $cs_i$. Since $L_j^{b-r} \Rightarrow L_j^b$ (if $r > 0$), the proof follows.
Case 2: $L_i^a$ is not enclosed-free.
In this case, $v_j^{b-r}$ is the node with the maximal id detected as enclosed in $L_i^a$. By Theorem 19, $L_j^{b-r}$ is linearized within its execution interval. Since $L_j^{b-r}$ is enclosed within $L_i^a$, it is linearized after the beginning of the execution interval of $L_i^a$. Since $v_j^{b-r}$ is node can_tail of $L_i^a$, Lemma 15 implies that $L_j^{b-r}$ is linearized before the joining configuration of $L_i^a$. Hence $L_j^{b-r}$ is linearized within the execution interval of $L_i^a$. If $r = 0$ we are done. In particular, if $L_i^a$ loops, then Ret_node is can_tail, $r = 0$, and we are done. Assume $L_i^a$ connects and $r > 0$. In this case $L_j^b$ is linearized after the beginning of the execution interval of $L_i^a$ (since $L_j^{b-r} \Rightarrow L_j^b$) and before the joining configuration of $L_i^a$ (since $v_j^b$ is the tail node of $v_i^a$). In this case $L_j^b$ is linearized within the execution interval of $L_i^a$, and Definition 16 implies that $L_i^a$ is linearized by $L_j^b$. That is, $L_j^b$ is the most recent write action linearized before $L_i^a$.
Step: Assume correctness for $m$, $w < m < i$. Now, we prove it for $i$.
Recall that $L_j^b$ is the logical write action which wrote the value returned by $L_i^a$. If $v_j^b$ is node Ret_node of $L_i^a$, then the proof is identical to the proof of the induction base. Otherwise, let $v_k^c$, $w < k < i$, be node Ret_node of $L_i^a$, and let $v_k^{c-r}$, $r \geq 0$, be node can_tail computed during $L_i^a$. In this case $L_k^c$ is a logical read action which also returns the value written by $L_j^b$. Consider the following cases:
Case 1: $L_i^a$ is enclosed-free.
By the induction assumption, $L_j^b$ is the most recent write action linearized before $L_k^c$. If $L_k^c$ is linearized within the execution interval of $L_i^a$, then $L_i^a$ is linearized by $L_k^c$ and we are done. Assume that $L_k^c$ is linearized before $cs_i$. According to Definition 16, $L_i^a$ is linearized at $cs_i$. By the next claim, there exist no write actions linearized after $L_k^c$ and before $cs_i$; therefore the proof of this case follows.
Claim 6. Under the conditions of Case 1, there exist no write actions linearized after $L_k^c$ and before $cs_i$.
Proof of claim. Assume by way of contradiction that there exist some logical write actions linearized after $L_k^c$ and before $cs_i$. In this case, there exists at least one such lasting write action. Let $v_\ell^d$, $\ell \leq w$, be the last node of $B_H^{cs_i}/(w+1)$. By this definition, $L_j^b \Rightarrow L_k^c \Rightarrow L_\ell^d$. By Lemma 18, $B_H^{cs_i}/i$ is a subgraph of $G_i^a$. Since $v_\ell^d \in B_H^{cs_i}/(w+1)$, we get that $v_\ell^d \in G_i^a$. In both of the following cases we reach the required contradiction:
Case 1.1: $v_\ell^d \in B_i^a/i$.
Since $v_k^{c-r}$ is last in $B_i^a/i$, there is a path from $v_\ell^d$ to $v_k^{c-r}$ in $G_i^a$. Therefore, by Theorem 20, $L_\ell^d \Rightarrow L_k^{c-r}$. Since $r \geq 0$, we get that $L_\ell^d \Rightarrow L_k^c$, a contradiction.
Case 1.2: $v_\ell^d \notin B_i^a/i$.
In this case $G_i^a$ has some edge $(v_o^f, v_m^e)$, $o < m < \ell \leq w$, that excludes the suffix of $B_H^{cs_i}/i$ (and $v_\ell^d$) from $B_i^a/i$. Let $ct_m$ be the joining configuration of $v_m^e$. First, we show that $v_\ell^d \in H^{ct_m}$: Since $(v_o^f, v_m^e)$ is an edge of $G_i^a$, Lemma 14 implies that there exists an edge $(v_o^{f+r}, v_m^e)$ in $H^{ct_m}$. Assume that $ct_m$ is before $cs_i$. Since $v_o^f \in B_H^{cs_i}$, Lemma 17 implies that $r = 0$, and therefore $(v_o^f, v_m^e)$ excludes $v_\ell^d$ from $B_H^{cs_i}$, a contradiction. Thus $ct_m$ is after $cs_i$ and $v_\ell^d \in H^{ct_m}$.
By Lemma 14, there exists some $r \geq 0$ such that $(v_o^{f+r}, v_m^e)$ is an edge in $H^{ct_m}$. If $r = 0$, $(v_o^f, v_m^e)$ excludes $v_\ell^d$ from $B_H^{ct_m}$. If $r > 0$, $v_o^{f+r} \in H^{ct_m}$, and by Lemma 17(2), $v_o^f \notin B_H^{ct_m}$. Since $v_o^f$ is on the path from the root to $v_\ell^d$, $v_\ell^d \notin B_H^{ct_m}$. Since $v_\ell^d \in H^{ct_m}$, $L_m^e$ is either linearized at $ct_m$ or by another node whose joining configuration is after $ct_\ell$, the joining configuration of $v_\ell^d$; hence, $L_\ell^d \Rightarrow L_m^e$. Since $v_m^e \in B_i^a/i$ and $v_k^{c-r}$ is last in $B_i^a/i$, Theorem 20 implies that $L_m^e \Rightarrow L_k^{c-r}$. Therefore, we get $L_j^b \Rightarrow L_\ell^d \Rightarrow L_m^e \Rightarrow L_k^{c-r} \Rightarrow L_k^c$ and, in particular, $L_j^b \Rightarrow L_m^e \Rightarrow L_k^c$, a contradiction.
Case 2: $L_i^a$ is not enclosed-free.
In this case $v_k^{c-r}$ is the node with the maximal id detected as enclosed within $L_i^a$. Therefore, $L_k^{c-r}$ is linearized after $cs_i$, the initial configuration of $L_i^a$. Since $r \geq 0$, we get that $L_k^c$ is linearized after $cs_i$. By the induction assumption, $L_j^b$ is the most recent write action linearized before $L_k^c$. Since $L_i^a$ is linearized by $L_k^c$, $L_j^b$ is the most recent write action linearized before $L_i^a$. To show that $L_k^c$ is linearized before $ct_i$, note the following. If $L_i^a$ loops, then Ret_node is node can_tail of $L_i^a$, $r = 0$, and the proof follows. If $L_i^a$ connects, its tail node is $v_k^{c+r}$ for some $r \geq 0$, and if $r > 0$ then $L_k^c \Rightarrow L_k^{c+r}$. By Definition 16, $L_k^{c+r} \Rightarrow L_i^a$, and the proof follows. □
6. Concluding remarks
The second and third implementations presented in this paper are two novel label-based
implementations of a $(w, r)$-atomic register with logarithmic space complexity. The second implementation is of type (1, n), its space complexity is $\Theta(\log w)$, its time complexity is $\Theta(w)$, and communication is one-sided. The third implementation is of type (1, 1), its space complexity is $\Theta(\log n)$, its time complexity is $\Theta(n)$, and communication is two-sided. The logarithmic space
complexity of both implementations is asymptotically optimal for label-based implementations.
We conjecture that the space complexity of any (w, r) register is logarithmic, even for general
(non-label-based) implementations. Our proofs are very technical and complex. We view the prob-
lem of presenting a formal veriﬁcation system for register implementation as an important open
problem.
Acknowledgments
We thank Paul Vitányi for his tireless efforts to convince us that implementations with sublinear
space complexity are worth looking at. We also thank Yael Gafni, and Arie Rudich whose com-
ments on an earlier version have helped us in this presentation. Last but surely not least is John
Tromp whose continuous help during this work was invaluable in terms of both correctness and
style.
Appendix A. The hand-shake mechanism
In this appendix we describe the hand-shake mechanism in detail and prove that it satisfies the requirements of Lemma 13. The code of the protocol in which the mechanism is integrated appears in Fig. 13. For each pair of processors $P_j$ and $P_i$, $i > j$, the protocol requires that $P_i$ be able to detect enclosed actions of $P_j$. To do that, $REG_{j,i}$ is augmented with two bits, called $REG_{j,i}.s1$ and $REG_{j,i}.s2$, and $REG_{i,j}$ is augmented with one bit, called $REG_{i,j}.t$. Processor $P_i$ keeps a local image of each of these bits, where the local image of $REG_{j,i}.s1$ ($REG_{j,i}.s2$, respectively) is denoted by $\ell s1[j]$ ($\ell s2[j]$, respectively), while the local image of $REG_{i,j}.t$ is denoted by $\ell t[j]$. All s bits and their local images are initialized to 0, while all t bits and their local images are initialized to 1.
Fig. 13. The protocol for $P_i$ with the hand-shake mechanism.
The detection mechanism works as follows. At the beginning of $L_i^a$, $P_i$ initiates detection of actions of $P_j$, $i > j$, by reading bit $REG_{j,i}.s1$ and writing its inverted value in $REG_{i,j}.t$. Whenever $P_i$ wishes to check whether there exists an action of $P_j$ enclosed within $L_i^a$, $P_i$ reads $REG_{j,i}.s1$ and $REG_{j,i}.s2$. In case both bits are equal to bit $REG_{i,j}.t$, the current node of $P_j$ is detected as enclosed.
Now, we describe the role of $P_j$ in the detection mechanism. Assume that $P_j$ executes $L_j^b$. $P_j$ always tries to modify the value of $REG_{j,i}.s1$ (and its local image) so that it is equal to the most recently read value of $REG_{i,j}.t$. At the beginning of $L_j^b$, $P_j$ reads $REG_{i,j}.t$ and compares its value with (its local image of) $REG_{j,i}.s1$. If the bits are not equal, $P_j$ concludes that this is the first time $REG_{i,j}.t$ is read since it was last written; thus it is possible that $L_i^a$ has started after $L_j^b$. In this case $P_j$ assigns $REG_{i,j}.t$ to $REG_{j,i}.s1$ and the inverted value of $REG_{i,j}.t$ to $REG_{j,i}.s2$, which prevents $P_i$ from detecting $L_j^b$ as enclosed during $L_i^a$. On the other hand, if $REG_{i,j}.t$ and (the local image of) $REG_{j,i}.s1$ are equal, $P_j$ concludes that $L_i^a$ has started before $L_j^b$ and that the current value of $REG_{j,i}.s1$ was written during $L_j^{b-1}$. In this case, $P_j$ can conclude that $L_i^a$ has started before the end of $L_j^{b-1}$, and hence before the beginning of $L_j^b$. In this situation, $P_j$ assigns the value of $REG_{j,i}.s1$ to $\ell s2[i]$. This updated value of $\ell s2[i]$ is written in $REG_{j,i}.s2$ together with $v_j^b$ in action $w_j^b[i]$.
Note. If the detection part of $L_i^a$ occurs before this action, $L_i^a$ does not detect any action of $P_j$ as enclosed. If, however, the detection part occurs after this action, the current node of $P_j$ is detected as enclosed.
The modified code appears in Fig. 13. For convenience, we denote $P_i$'s local images of bits that enable $P_i$ to detect enclosed actions (of processors with lower ids) with capital letters, while local images of bits that help other processors (with higher ids) to detect actions of $P_i$ as enclosed are denoted with lower-case letters. To implement the mechanism, the code is modified in four different spots. The first change is to add two blocks of code, called down and up, before $G_i^a$ is collected. In block down, $P_i$ initiates detection of actions (of processors with lower ids) enclosed within $L_i^a$: for each $j < i$, $P_i$ reads bit $REG_{j,i}.s1$ (action $r_i^a[j].s$) and writes its inverted value in $REG_{i,j}.t$ and in $\ell T[j]$ (action $w_i^a[j].t$). In block up, $P_i$ enables detection of $v_i^a$ as enclosed within the actions of processors with higher ids. In this block, $P_i$ computes vectors $\ell s1$, $\ell s2$ and $\ell t$: for each $j > i$, $P_i$ reads $REG_{j,i}.t$ (action $r_i^a[j].t$), stores it in $\ell t[j]$ and compares it with $\ell s1[j]$. If the bits are not equal, $P_i$ assigns $REG_{j,i}.t$ and its inverted value to $REG_{i,j}.s1$ and to $REG_{i,j}.s2$, respectively (action $w_i^a[j].s$). If the bits are equal, $P_i$ assigns the value of $\ell s1[j]$ to $\ell s2[j]$ (and action $w_i^a[j].s$ is skipped).
The second change is the augmentation of procedure collect so that it returns vectors $\ell S1$ and $\ell S2$, where $\ell S1[j]$ ($\ell S2[j]$) contains bit $REG_{j,i}.s1$ ($REG_{j,i}.s2$) read in action $r_i^a[j]$. The third change is in computing predicate enclosed-free using vectors $\ell S1$, $\ell S2$ and $\ell T$. Node $v_j^b$ is detected as enclosed by $L_i^a$ if bits $REG_{j,i}.s1$ and $REG_{j,i}.s2$ read in $r_i^a[j]$ are both equal to bit $REG_{i,j}.t$ stored in $\ell T[j]$ during block down (and written in $w_i^a[j].t$). If no enclosed action is detected, then enclosed-free is true. The fourth change is to write bit $\ell s2[j]$, for every $j > i$, computed in block up, in $REG_{i,j}.s2$ in action $w_i^a[j]$. The additional physical actions required by the hand-shake mechanism are all executed in blocks up and down and are denoted by $r_i^a[1].s; w_i^a[1].t \cdots r_i^a[i-1].s; w_i^a[i-1].t$ and $r_i^a[i+1].t; w_i^a[i+1].s \cdots r_i^a[n].t; w_i^a[n].s$.
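The rules above can be exercised in a minimal single-pair model. The sketch assumes that each block runs atomically and that $P_i$ and $P_j$ take turns; the real protocol is asynchronous, and that interleaving is not modeled here. Class, method and field names are hypothetical; the fields mirror $REG_{j,i}.s1$, $REG_{j,i}.s2$ and $REG_{i,j}.t$, and the local image $\ell T[j]$ is stored as `T`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SharedBits:
    """Hand-shake bits of one pair P_j, P_i with j < i (initial values from the text)."""
    s1: int = 0    # REG_{j,i}.s1, written by P_j
    s2: int = 0    # REG_{j,i}.s2, written by P_j together with its node
    node: int = 0  # index b of the current node v_j^b in REG_{j,i}
    t: int = 1     # REG_{i,j}.t, written by P_i


class LowerProc:
    """P_j: block up for the pair (j, i), then the node write w_j^b[i]."""

    def __init__(self, reg: SharedBits) -> None:
        self.reg = reg
        self.s1, self.s2 = reg.s1, reg.s2  # local images ls1[i], ls2[i]
        self.b = 0

    def logical_action(self) -> None:
        t = self.reg.t                           # r_j^b[i].t
        if t != self.s1:                         # new t: reset so P_i cannot detect
            self.s1, self.s2 = t, 1 - t
            self.reg.s1, self.reg.s2 = t, 1 - t  # w_j^b[i].s
        else:                                    # same t seen again: arm detection
            self.s2 = self.s1
        self.b += 1
        self.reg.node, self.reg.s2 = self.b, self.s2  # w_j^b[i]


class HigherProc:
    """P_i: block down, and the enclosed test applied to what collect reads."""

    def __init__(self, reg: SharedBits) -> None:
        self.reg = reg
        self.T = reg.t                           # local image lT[j]

    def down(self) -> None:
        self.T = 1 - self.reg.s1                 # r_i^a[j].s
        self.reg.t = self.T                      # w_i^a[j].t

    def detect(self) -> Optional[int]:
        r = self.reg                             # r_i^a[j] reads node, s1, s2 together
        return r.node if r.s1 == r.s2 == self.T else None


def run_scenario() -> List[Optional[int]]:
    reg = SharedBits()
    pj, pi = LowerProc(reg), HigherProc(reg)
    pj.logical_action()        # v_j^1 is written before L_i^a starts
    pi.down()                  # L_i^a starts
    seen = [pi.detect()]       # v_j^1 is not detected
    pj.logical_action()        # v_j^2 sees the new t and resets s1, s2
    seen.append(pi.detect())
    pj.logical_action()        # v_j^3 sees the same t again: s1 = s2 = t
    seen.append(pi.detect())
    pi.down()                  # the next action L_i^{a+1} inverts t again
    seen.append(pi.detect())   # so v_j^3 is no longer reported as enclosed
    return seen


print(run_scenario())  # [None, None, 3, None]
```

In this turn-taking run the node written two actions after the last node that preceded $L_i^a$ is already detected. Lemma 24 budgets one more action because, in an asynchronous execution, $L_j^{b+1}$ may read the t bit before $w_i^a[j].t$. The final step shows why block down writes the inverted value: a node armed against the previous t bit is not reported for a new logical action.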
In the next two lemmas we prove that the hand-shake mechanism satisﬁes the requirements. In
Lemma 23, we prove that every label detected as enclosed satisﬁes Deﬁnition 14. In Lemma 24 we
prove that the implemented mechanism satisﬁes the requirements stated in Lemma 13:
A. Israeli, A. Shaham / Information and Computation 200 (2005) 62–106 105
Lemma 23. Let vbj be the current node read from REGj,i in action r
a
i [j], where i > j. If REGj,i.s1 and
REGj,i.s2(the bits read in rai [j]) are equal to REGi,j.t(the bit written in wai [j].t) then vbj is enclosed
within Lai .
Proof. Bits $s_1$ and $s_2$ are always written and read together. Since $REG_{j,i}.s_1 = REG_{j,i}.s_2$, we conclude that after $P_j$ read bit $REG_{i,j}.t$ during the up block of $L_j^b$, $P_j$ found that $*s_1[i] = *t[i]$. Since in this case $P_j$ does not change bit $REG_{j,i}.s_1$, we conclude that the value of $REG_{j,i}.s_1$ was not changed after action $w_j^{b-1}[i].s$. By the code, the value of bit $REG_{i,j}.t$ written in action $w_i^a[j].t$ is the complement of the value read from bit $REG_{j,i}.s_1$ in action $r_i^a[j].s$. Since the value of $REG_{j,i}.s_1$ read in $r_i^a[j]$ equals $REG_{i,j}.t$, bit $REG_{j,i}.s_1$ changed between $r_i^a[j].s$ and $r_i^a[j]$; as it did not change after $w_j^{b-1}[i].s$, we conclude that $r_i^a[j].s \to w_j^{b-1}[i].s$. $P_j$ works sequentially; therefore $w_j^{b-1}[i].s \to r_j^b[1].s$. Since $v_j^b$ is read in $r_i^a[j]$, we get $w_j^b[i] \to r_i^a[j]$. Hence $r_i^a[1].s \to r_j^b[1].s \to w_j^b[i] \to r_i^a[j]$ and, by Definition 14, $v_j^b$ is enclosed within $L_i^a$. □
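The precedence chain behind the last step of this proof can be collected into one line. This is a sketch that assumes $P_i$ performs its hand-shake reads in increasing index order, as the listing of its block actions suggests, so that $r_i^a[1].s$ precedes or coincides with $r_i^a[j].s$; the symbol $\preceq$ ("precedes or coincides with") is introduced here only for this purpose.

```latex
\[
r_i^a[1].s \;\preceq\; r_i^a[j].s
\;\to\; w_j^{b-1}[i].s
\;\to\; r_j^b[1].s
\;\to\; w_j^b[i]
\;\to\; r_i^a[j]
\]
```

Dropping the intermediate actions leaves exactly $r_i^a[1].s \to r_j^b[1].s \to w_j^b[i] \to r_i^a[j]$, the condition invoked from Definition 14.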
Lemma 24. Let $v_j^b$ be the last node of $P_j$ whose joining configuration is before $r_i^a[1]$. If $v_j^{b+3+r}$, $r \ge 0$, is the current node in $REG_{j,i}$ at $r_i^a[j]$, then $P_i$ detects $L_j^{b+3+r}$ as an enclosed action.
Proof. Since $v_j^b$ is the last node of $P_j$ that joined the history graph before $r_i^a[1]$, it holds that $r_i^a[1] \to w_j^{b+1}[n]$. According to the protocol, $w_i^a[j].t \to r_i^a[1]$. Therefore, $w_i^a[j].t \to w_j^{b+1}[n] \to r_j^{b+2}[i].t$, and hence $w_i^a[j].t \to r_j^{b+2}[i].t \to r_j^{b+3}[i].t$. According to the hand-shake protocol, $P_j$ copies the $t$ bit read in $r_j^{b+2}[i].t$ to $REG_{j,i}.s_1$ (in $w_j^{b+2}[i].s$) and to $REG_{j,i}.s_2$ (in $w_j^{b+3}[i]$). Therefore, when executing $r_i^a[j]$, $P_i$ finds $*S_1[j] = *S_2[j] = *T[j]$; hence $P_i$ detects $L_j^{b+3}$ as enclosed. □
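The ordering facts used in this proof can likewise be collected into a single chain. The last two steps hold because $w_j^{b+3}[i]$ either is $w_j^{b+3+r}[i]$ itself ($r = 0$) or precedes it, and $v_j^{b+3+r}$ is the current node read in $r_i^a[j]$; here $\preceq$ is introduced to mean "precedes or coincides with".

```latex
\[
w_i^a[j].t \to r_i^a[1] \to w_j^{b+1}[n] \to r_j^{b+2}[i].t
\to r_j^{b+3}[i].t \to w_j^{b+3}[i] \preceq w_j^{b+3+r}[i] \to r_i^a[j]
\]
```

Read left to right, the chain shows that both reads of $t$ by $P_j$, and the write of $s_2$ that follows them, fall between $P_i$'s write of $t$ and its read $r_i^a[j]$.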
References
[1] Y. Afek, H. Attiya, D. Dolev, E. Gafni, M. Merritt, N. Shavit, Atomic snapshots of shared memory, J. ACM 40 (4)
(1993) 873–890.
[2] U. Abraham, On interprocess communication and the implementation of a multi-writer atomic registers, Theor.
Comput. Sci. 149 (1995) 257–298.
[3] R. Cori, E. Sopena, Some combinatorial aspects of time stamp systems, Eur. J. Combin. 14 (1993) 95–102.
[4] D. Dolev, N. Shavit, Bounded concurrent time-stamping, SIAM J. Comput. 26 (2) (1997) 418–455.
[5] M.P. Herlihy, Impossibility and universality results for wait-free synchronization, in: Proceedings of the 7th ACM
Symposium on Principles of Distributed Computing, 1988, pp. 276–290.
[6] M. Herlihy, J. Wing, Axioms for concurrent objects, in: 14th ACM Symposium on Principles of Programming
Languages, 1987, pp. 13–26.
[7] A. Israeli, M. Li, Bounded time-stamps, Distrib. Comput. 6 (4) (1993) 205–209.
[9] A. Israeli, J. Tromp, P.M.B. Vitányi, personal communication.
[10] L. Lamport, On interprocess communication. Part I: basic formalism, Distrib. Comput. 1 (2) (1986) 77–85.
[11] L. Lamport, On interprocess communication. Part II: algorithms, Distrib. Comput. 1 (2) (1986) 86–101.
[12] N. Lynch, M. Tuttle, Hierarchical correctness proofs for distributed algorithms, in: Proceedings of the 6th ACM
Symposium on Principles of Distributed Computing, 1988, pp. 137–151.
[13] M. Li, J. Tromp, P.M.B. Vitányi, How to share concurrent wait-free variables, J. ACM 43 (1992) 107–112.
[14] M. Li, P.M.B. Vitányi, Optimality of wait-free atomic multiwriter variables, Inform. Process. Lett. 43 (1992) 107–
112.
[15] J. Misra, Axioms for memory access in asynchronous hardware systems, ACM Trans. Progr. Lang. Syst. 8 (1) (1986)
142–153.
[16] G.L. Peterson, Concurrent reading while writing, ACM Trans. Progr. Lang. Syst. 5 (1) (1983) 46–55.
106 A. Israeli, A. Shaham / Information and Computation 200 (2005) 62–106
[17] G.L. Peterson, J.E. Burns,Concurrent readingwhilewriting II: themultiwriter case, in: 28thAnnual IEEESymposium
on Foundations of Computer Science, 1987, pp. 383–392.
[18] R.W. Schaffer, On the correctness of atomic multi-writer registers, Technical ReportMIT/LCS/TM-364, Laboratory
for Computer Science, MIT.
[19] J. Tromp, On update-last schemes, Parallel Process. Lett. 44 (1) (1993) 25–28.
[20] P. Vitányi, B. Awerbuch, Atomic shared register access by asynchronous hardware, in: Proceedings of the 27th
Annual Symposium on Foundations of Computer Science, 1986, pp. 233–243.
