Fine-grained Analysis on Fast Implementations of Distributed
  Multi-writer Atomic Registers by Huang, Kaile et al.
Fine-grained Analysis on Fast Implementations of
Multi-writer Atomic Registers
Kaile Huang, Yu Huang, Hengfeng Wei
State Key Laboratory for Novel Software Technology
Nanjing University
mg1933024@smail.nju.edu.cn, {yuhuang, hfwei}@nju.edu.cn
Abstract
Distributed multi-writer atomic registers are at the heart of a large number of distributed
algorithms. While enjoying the benefits of atomicity, researchers further explore fast implementations
of atomic registers which are optimal in terms of data access latency. Though it is
proved that multi-writer atomic register implementations are impossible when both read and
write are required to be fast, it is still open whether implementations are impossible when only
write or read is required to be fast. This work proves the impossibility of fast write implementations
based on a series of chain arguments among indistinguishable executions. We also show the
necessary and sufficient condition for fast read implementations by extending the results in the
single-writer case. This work concludes a series of studies on fast implementations of distributed
atomic registers.
1 Introduction
Distributed storage systems employ replication to improve performance by routing data queries to
data replicas nearby [19, 1, 2]. System reliability is also improved due to the redundancy of data.
However, data replication is constrained by the intrinsic problem of maintaining data consistency
among different replicas [27]. The data consistency model acts as the “contract” between the
developer and the storage system. Only with this contract can the developers reason about and
program over the data items which actually exist as multiple replicas [26, 9].
Atomicity is a strong consistency model [20, 21, 17]. It allows concurrent processes to access
multiple replicas of logically the same data item, as if they were accessing one data item in a
sequential manner. This abstraction, usually named an atomic register, is fundamental in dis-
tributed computing and is at the heart of a large number of distributed algorithms [5, 23]. Though
the atomicity model greatly simplifies the development of upper-layer programs, it induces longer
data access latency. The latency of read and write operations is mainly decided by the number
of round-trips of communications between the reading and writing clients and the server replicas.
In the single-writer case, the read operation on an atomic register needs two round-trips of com-
munications [5]. In the multi-writer case, both write and read operations need two round-trips of
communications [23, 4].
In distributed systems, user-perceived latency is widely regarded as the most critical factor
for a large class of applications [22, 24, 19, 1, 2]. While enjoying the benefits of atomicity, re-
searchers further explore whether we can develop fast implementations for atomic registers. Since
two round-trips are sufficient to achieve atomicity, fast implementation means one round-trip of
communication, which is obviously optimal. In the single-writer case, it is proved that when the
number of reading clients exceeds a certain bound, fast read is impossible [12]. In the multi-writer
case, it is proved impossible when both read and write are required to be fast [12].
arXiv:2001.07855v5 [cs.DC] 18 Jun 2020
Table 1: Overview of contributions.
Design space        Impossibility            Implementation
W2R2 [23]           t ≥ S/2                  W ≥ 2, R ≥ 2, t < S/2
W1R2 [this work]    W ≥ 2, R ≥ 2, t ≥ 1      ∅
W2R1 [this work]    R ≥ S/t − 2              R < S/t − 2
W1R1 [12]           W ≥ 2, R ≥ 2, t ≥ 1      ∅
This leaves an important open problem when examining the design space of fast implementa-
tions of multi-writer atomic registers in a fine-grained manner. Specifically, we denote fast write
implementations as W1R2, meaning that the write operation finishes in one round-trip, while
the read operation finishes in two round-trips. Similarly, we denote fast read implementations as
W2R1 and fast read-write implementations as W1R1. Existing work only proves that fast read-
write (W1R1) implementations are impossible. It is still open whether fast write (W1R2) and fast
read (W2R1) are impossible. This impossibility result (yet to be proved) underlies the common
practice of quorum-replicated storage system design, e.g. the Cassandra data store [19]: when read
or write is required to finish in one round-trip, weak consistency has to be accepted.
This work thoroughly explores the design space of fast implementations of multi-writer atomic
registers. Specifically, for fast write (W1R2) implementations, we prove that it is impossible to
achieve atomicity. The impossibility proof is mainly based on the chain argument to construct
the indistinguishability between executions. Unlike the W1R1 case, the chain argument for W1R2
implementations faces two severe challenges:
• (Section 3) Since the read operation has one more round-trip (compared to the W1R1 case) to
discover differences between executions, it is more difficult to construct the indistinguishability
we need for the impossibility proof. To this end, we combine three consecutive rounds of
chain arguments, in order to hide the differences in the executions from the 2-round-trip read
operations.
• (Section 4) The first round-trip of a read operation might update information on the servers,
thus potentially affecting the return values of other read operations. The effect from the first
round-trip of read operations may also break the indistinguishability we try to construct.
To this end, we use sieve-based construction of executions to eliminate the effect of the first
round-trip of a read operation.
The impossibility proof for W1R2 implementations is the main contribution of this work.
For W2R1 implementations, we prove the impossibility when R ≥ S/t − 2. When R < S/t − 2, we
propose a W2R1 implementation. The proof and the implementation are extensions to the results
of the single-writer case [12].
The contributions in this work conclude a series of studies on fast implementations of distributed
atomic registers. The contributions of this work in light of results in the existing work are outlined
in Table 1.
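The conditions in Table 1 can be read as a small predicate over the system parameters. The following is a minimal sketch of that reading; the names `Config` and `implementable` are hypothetical and only restate the table, they are not part of any implementation discussed in this work. The condition R < S/t − 2 is rewritten as (R + 2)·t < S to avoid fractions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    S: int  # number of servers
    t: int  # maximum number of crashed servers
    W: int  # number of writers
    R: int  # number of readers


def implementable(design: str, c: Config) -> bool:
    """Return True if Table 1 lists an implementation for this design point."""
    if c.W < 2 or c.R < 2 or c.t < 1:
        raise ValueError("Table 1 is read here for W >= 2, R >= 2, t >= 1 only")
    if design == "W2R2":
        return 2 * c.t < c.S           # t < S/2
    if design == "W2R1":
        return (c.R + 2) * c.t < c.S   # R < S/t - 2, cleared of fractions
    if design in ("W1R2", "W1R1"):
        return False                   # impossible for all such configurations
    raise ValueError(f"unknown design point: {design}")
```

For example, with S = 9 and t = 2, a W2R1 implementation is listed for R = 2 (since 4·2 < 9) but not for R = 3 (since 5·2 ≥ 9).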
The rest of this work is organized as follows. In Section 2, we describe the preliminaries. Section
3 and Section 4 present the impossibility proof for W1R2 implementations. Section 5 outlines the
impossibility proof and the algorithm design of W2R1 implementations. Section 6 discusses the
related work. In Section 7, we conclude this work and discuss the future work.
2 Preliminaries
In this section, we first describe the system model and the definition of atomicity. Then we outline
the algorithm schema for multi-writer atomic register implementations.
2.1 Atomic Register Emulation in Message-passing Systems
We basically adopt the system model used in [12]. Specifically, a replicated storage system consid-
ered in this work consists of three disjoint sets of processes:
• the set Σsv of servers: Σsv = {s1, s2, · · · , sS}.
• the set Σrd of readers: Σrd = {r1, r2, · · · , rR}.
• the set Σwr of writers: Σwr = {w1, w2, · · · , wW }.
Here, S, R and W denote the cardinalities of Σsv, Σrd and Σwr respectively. The readers and the
writers are also called clients. We are concerned with multi-writer multi-reader implementations. Thus
we have W ≥ 2 and R ≥ 2. In a distributed message-passing system, we also have that S ≥ 2. The
clients and the servers communicate by asynchronous message passing, via a bidirectional reliable
communication channel, as shown in Fig. 1. There is no communication among the servers. For
the simplicity of presentation, we assume the existence of a discrete global clock, but the processes
cannot access the global clock. An implementation A of a shared register is a collection of automata.
Computation proceeds in steps of A. An execution is a finite sequence of steps of A. In any given
execution, any number of readers and writers, and t out of S servers may crash.
[Figure 1: System model of read/write register emulation. Readers (r1, r2, …, rR) and writers (w1, w2, …, wW) exchange read()/reply(value) and write(value)/ACK messages over message-passing channels with servers s1, s2, …, sS, each running a local replica emulator.]
An atomic register is a distributed data structure that may be concurrently accessed by multiple
clients, yet providing an “illusion of a sequential register” to the accessing processes. The atomic
register provides two types of operations. Only a writer can invoke the write operation write(v),
which stores v in the register. Only a reader can invoke the read operation read(), which returns the
value stored. We are concerned with wait-free implementations, where any read or write invocation
eventually returns independently of the status of other clients. Due to the locality property of
atomicity [17], we consider one single shared register.
We define an execution of the clients accessing the shared register as a sequence of events where
each event is either the invocation or the response of a read or write operation. Each event in
the execution is tagged with a unique timestamp from the global clock, and events appear in the
execution in increasing order of their timestamps. For execution σ, we can define the partial order
between operations. Let O.s and O.f denote the timestamps of the invocation and the response
events of operation O respectively. We define O1 ≺σ O2 if O1.f < O2.s. We define O1||O2 if neither
O1 ≺σ O2 nor O2 ≺σ O1 holds. An execution σ is sequential if σ begins with an invocation, and
each invocation is immediately followed by its matching response. An execution σ is well-formed
if for each client pi, σ|pi (the subsequence of σ restricted on pi) is sequential. Given the notations
above, we can define atomicity:
Definition 2.1. A shared register provides atomicity if, for each of its well-formed executions
σ, there exists a permutation π of all operations in σ such that π is sequential and satisfies the
following two requirements:
• [Real-time requirement] If O1 ≺σ O2, then O1 appears before O2 in π.
• [Read-from requirement] Each read returns the value written by the latest preceding write
in π.
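For very small executions, Definition 2.1 can be checked by brute force: enumerate every permutation of the operations and test the two requirements. The sketch below does exactly that. The `Op` record, the `initial` register value and the function names are assumptions made for illustration, and the enumeration is exponential, so this is only meant for hand-sized examples.

```python
from itertools import permutations
from typing import NamedTuple, Optional


class Op(NamedTuple):
    kind: str             # "write" or "read"
    value: Optional[int]  # value written, or value returned by the read
    s: int                # invocation timestamp (O.s)
    f: int                # response timestamp (O.f)


def respects_real_time(perm):
    """If O1 precedes O2 in real time (O1.f < O2.s), O1 must come first in perm."""
    return all(
        not (perm[j].f < perm[i].s)
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
    )


def respects_read_from(perm, initial=None):
    """Each read returns the value of the latest preceding write in perm."""
    current = initial
    for op in perm:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_atomic(ops, initial=None):
    """Brute-force search for a permutation satisfying both requirements."""
    return any(
        respects_real_time(perm) and respects_read_from(perm, initial)
        for perm in permutations(list(ops))
    )
```

With two non-concurrent writes write(1) then write(2), a later read returning 2 passes the check and a read returning 1 does not. With two concurrent writes followed by two non-concurrent reads, the check rejects any execution in which the reads return different values.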
2.2 Algorithm Schema for Multi-writer Atomic Register Implementations
When studying fast implementations of multi-writer atomic registers, the critical operation we
consider is the round-trip of communication between the client and the servers. In each round-trip,
the client can query all the servers, i.e., collect useful information from the servers. The client
can also update all the servers, i.e., send useful information to the servers. Upon receiving a query
request, the server replies to the client as required. Upon receiving an update request, the server first
stores the data sent from the client. Then it can reply with certain information if necessary, or it can simply
reply with an ACK. Exemplar implementations can be found in [5, 23, 4, 28, 18].
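To make the query and update round-trips concrete, here is a minimal in-memory sketch of a slow read-write (W2R2) register in which every operation is a query round-trip followed by an update round-trip. The (timestamp, writer id) tags, the `skip` parameter that models servers giving no response, and all names are assumptions of this sketch; it is not taken from any of the cited implementations and ignores real message passing.

```python
class Server:
    """A replica holding the value with the largest tag it has seen."""

    def __init__(self):
        self.tag = (0, 0)   # (timestamp, writer id), compared lexicographically
        self.value = None

    def query(self):
        return self.tag, self.value

    def update(self, tag, value):
        if tag > self.tag:
            self.tag, self.value = tag, value
        return "ACK"


def round_trip(servers, skip, action):
    """One round-trip: contact every server not in `skip` and collect the replies."""
    return [action(srv) for i, srv in enumerate(servers) if i not in skip]


def write(servers, writer_id, value, skip=frozenset()):
    replies = round_trip(servers, skip, lambda srv: srv.query())      # round-trip 1
    max_ts = max(tag[0] for tag, _ in replies)
    tag = (max_ts + 1, writer_id)
    round_trip(servers, skip, lambda srv: srv.update(tag, value))     # round-trip 2
    return tag


def read(servers, skip=frozenset()):
    replies = round_trip(servers, skip, lambda srv: srv.query())      # round-trip 1
    tag, value = max(replies, key=lambda reply: reply[0])
    round_trip(servers, skip, lambda srv: srv.update(tag, value))     # round-trip 2
    return value
```

In this sketch, a fast write would have to pick its tag without the first query round-trip, and a fast read would have to return without the write-back round-trip, which is where the design points W1R2 and W2R1 depart from W2R2.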
Tuning the number of round-trips in emulation of a multi-writer atomic register, we have four
possible types of implementations [25], as shown in Fig. 2. They are slow read-write implementation
(W2R2), fast write implementation (W1R2), fast read implementation (W2R1) and fast read-write
implementation (W1R1). Fig. 2 can be viewed as the Hasse Diagram of the partial order among
implementations. The partial order relation can be thought of as providing stronger consistency
guarantees or inducing less data access latency.
Note that, for atomic register implementations, when 2 round-trips are sufficient, we do not
consider implementations employing k round-trips for k ≥ 3. However, for impossibility of fast
implementations, we need to consider the impossibility of W1Rk and WkR1 implementations for
k ≥ 3. The impossibility proofs of W1Rk and WkR1 implementations are essentially the same as the
impossibility proofs of W1R2 and W2R1 implementations, as discussed in Section 3 and Section 5
respectively.
3 Fast Write (W1R2): Chain Arguments for Impossibility Proof
We first present the impossibility proof for fast write (W1R2) implementations. Specifically, we
prove the following theorem:
Theorem 1 (W1R2 impossibility). Let t ≥ 1, W ≥ 2 and R ≥ 2. There is no fast write (W1R2)
atomic register implementation.
[Figure 2: Algorithm schema for multi-writer atomic register implementations. Hasse diagram of W2R2, W2R1, W1R2 and W1R1, with axes for latency (high to low) and consistency (strong to weak).]
This impossibility result is proved by a chain argument [6], which is also used to prove the impossibility
of W1R1 implementations in [12]. The central issue in a chain argument is to construct certain
indistinguishability between executions. Compared to the impossibility proof of W1R1 implementations,
the read operations now have one more round-trip. This “one more round-trip” imposes
two critical challenges for constructing the indistinguishability:
1. Obtaining more information from the second round-trip, the read operations can now “beat”
the indistinguishability constructed in the W1R1 case. In our proof, we add one more read
operation and construct two more chains of executions, in order to obtain the indistinguisha-
bility even when facing two round-trips of read operations.
2. The first round-trip of a read operation may update information on the servers, thus possibly
affecting the return values of other read operations. The effect of the first round-trip may also
break the indistinguishability we plan to construct. To cope with this challenge, we propose
the sieve-based construction of executions. We sieve all the servers and eliminate those which
are affected by the first round-trip of a read operation. On the servers that remain after the
sieving, we show that the chain argument can still be successfully conducted.
This section addresses the first challenge and presents the chain argument. In Section 4, we address
the second challenge and discuss how to eliminate effects of the first round-trip. Note that the
impossibility proof of W1R2 implementations also applies to W1Rk implementations for k ≥ 3. We
can combine the round-trips 2, 3, · · · , k as if they were one single round-trip. The chain argument
still applies.
3.1 Overview
It suffices to show the impossibility in a system where S ≥ 3 (see footnote 2), W = 2, R = 2 and t = 1. In
the proof, we use two write operations W1 and W2 (issued by writers w1 and w2 respectively)
and two read operations R1 and R2 (issued by readers r1 and r2 respectively). Since t = 1, the
read operation must be able to return when one server gives no response. When constructing
an execution, we say one round-trip in an operation skips one server s, if the messages between
the client and the server are delayed a sufficiently long period of time (e.g. until the rest of the
execution has finished). If one round-trip of communication does not skip any server, we say it is
skip-free.
In a chain argument, we will construct a chain of executions, where two consecutive executions
in the chain differ only on one server. Since in the two end executions of the chain the read operations
return different values, there must be some “critical server”. The change on the critical server
results in the difference in the return values. We intentionally let the read operation skip the
critical server. This will construct the indistinguishability we need (as detailed in Section 3.2).
Footnote 2: In a replicated system, we have S ≥ 2. When S = 2 and t = 1, it is trivial to prove the impossibility.
In a chain argument, we may also utilize the relation between operations to construct the indis-
tinguishability (as detailed in Section 3.4). Specifically, one operation cannot notice the differences
in executions after it has finished. Moreover, the operation cannot notice the differences on one
server if it skips this server.
The indistinguishability makes the read operations return the same value in two executions.
However, construction of the chain of executions tells us that the two executions should return
different values (note that within one execution, two reads must return the same value, as required
by the definition of atomicity). This leads to contradiction.
To beat the ability of the read operation to employ two round-trips of communications, we need
to conduct a series of chain arguments. The proof will be presented in three phases, each phase
constructing one chain, as shown in Fig. 3. For the ease of presentation, we assume in the chain
argument that the first round-trip of a read operation will not affect the return values of other read
operations. In Section 4, we will explain how to lift this assumption.
[Figure 3: Proof overview. Phase 1: chain α from head α0 to tail αS, with the critical pair αi1−1 and αi1. Phase 2: candidate chains β′ (β′0 to β′S) and β′′ (β′′0 to β′′S), yielding chain β = (β0, β1, · · · , βS). Phase 3: chain γ = (γ0, γ1, · · · , γS−1) combined with chain β into the zigzag chain Z.]
3.2 Phase 1: Chain α and Critical Server si1
To construct chain α, we first construct the “head” and the “tail” executions:
• Head execution αhead consists of the following three non-concurrent operations: i) a skip-free
W1 = write(1), which precedes ii) a skip-free W2 = write(2), which precedes iii) a skip-free
R1 = read(). Note that all servers receive the three operations in this order. In αhead, R1
returns 2 (as required by the definition of atomicity).
• Tail execution αtail consists of the same three non-concurrent operations, but the temporal
order is W2, W1 and R1. In αtail, R1 returns 1.
Let α0 = αhead. From α0 we construct α1, the next execution in the chain, as follows. Execution α1
is identical to α0 except that server s1 receives W2 first, and then W1 and R1. That is, we “swap”
two write operations on s1 and everything else is unchanged. Continuing this “swapping” process,
we swap two write operations on si in αi−1 and obtain αi, for all 1 ≤ i ≤ S. Thus we obtain chain
α = (α0, α1, · · · , αS). Note that R1 cannot distinguish αS from αtail. Thus, R1 returns 1 in αS ,
while it returns 2 in α0.
Since R1 returns different values in two ends of the chain, there must exist two consecutive exe-
cutions αi1−1 and αi1 (1 ≤ i1 ≤ S), such that R1 returns 2 in αi1−1 and returns 1 in αi1 . Note that
αi1−1 and αi1 differ only on one “critical server” si1 , i.e., si1 receives W1 first in αi1−1, and receives
W2 first in αi1 . This critical server si1 will be intentionally skipped to obtain indistinguishability,
when constructing chain β in Phase 2 below.
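The search for the critical server in Phase 1 can be sketched as a scan along chain α. In the sketch below an execution is represented only by the order in which each server receives the two writes, and `toy_read` is a purely hypothetical decision rule standing in for R1. The argument itself needs only that the read returns 2 in α0 and 1 in αS; the majority rule is an assumption chosen to make the example runnable.

```python
def alpha(i, S):
    """Write orders on servers s1..sS in execution α_i: s1..si are swapped."""
    return [("W2", "W1") if j <= i else ("W1", "W2") for j in range(1, S + 1)]


def toy_read(execution):
    """Hypothetical rule for R1: return 2 iff most servers received W2 last."""
    last_is_w2 = sum(1 for order in execution if order[-1] == "W2")
    return 2 if 2 * last_is_w2 > len(execution) else 1


def critical_server(read, S):
    """Return i1 such that the read returns 2 in α_{i1-1} and 1 in α_{i1}."""
    assert read(alpha(0, S)) == 2 and read(alpha(S, S)) == 1
    for i in range(1, S + 1):
        if read(alpha(i - 1, S)) == 2 and read(alpha(i, S)) == 1:
            return i
    raise AssertionError("unreachable: the return value must flip somewhere")
```

Any deterministic rule that returns only 1 or 2 and satisfies the two end conditions makes the loop terminate with some i1, which is the discrete intermediate-value step used by the chain argument.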
3.3 Phase 2: Chain β Derived from Chain β′ and Chain β′′
In Phase 2 of our proof, we basically append the second read operation R2 to executions in chain
α and obtain chain β. We actually construct two candidate chains β′ and β′′, and modify one of
them to get chain β, depending on what the return value of R2 is. Chains β′ and β′′ stem from
executions αi1−1 and αi1 respectively, i.e., the two executions involved in the critical change on the
critical server.
Since the read operations consist of two round-trips, we denote the two round-trips of read
operation Ri as Ri^(1) and Ri^(2) (i = 1, 2). We extend execution αi1−1 with the second read operation
R2. We interleave the round-trips of R1 and R2 as follows: the four round-trips are non-concurrent
and the temporal order is R1^(1), R2^(1), R1^(2) and R2^(2) on all servers si (1 ≤ i ≤ S), as shown in Fig.
3. This execution is named β′head = β′0. To construct chain β′, we will swap R1^(2) and R2^(2) on one
server at a time. Specifically, for 1 ≤ i ≤ S, β′i is the same as β′i−1, except that server si receives
R1^(2) first in β′i−1, and receives R2^(2) first in β′i. The last execution of the chain is β′tail = β′S.
We then extend execution αi1 in the same way, and get β′′head = β′′0. We also do the swapping
in the same way and get executions β′′1, β′′2, · · · , β′′S. The only difference between chains β′ and β′′
is that chain β′ stems from execution αi1−1, while chain β′′ stems from αi1. Thus, R1 returns 2
in chain β′, while returning 1 in chain β′′. This is because the return value of R1 is decided by
executions αi1−1 and αi1. Appending the read operation R2 should not change the return value of
an existing read, as required by the definition of atomicity.
The only server which can tell the difference between β′ and β′′ is si1, the critical server in
chain α. Now we modify tail executions β′tail and β′′tail, in order to obtain the indistinguishability
we need. In both tail executions β′tail and β′′tail, we let R2 (both round-trips) skip server si1. Thus
the (modified) β′tail and β′′tail are indistinguishable to R2, and R2 returns the same value in both
modified tail executions.
To construct chain β, we must start from either β′ or β′′, and revise the chosen candidate chain
into chain β. The criterion for choosing a chain is that the candidate chain must enable us to make
the read operations in the two end executions β0 and βS have different return values. Without loss
of generality, we assume that R2 returns 1 in both β′tail and β′′tail (modified, with R2 skipping si1).
In β′0, since R1 returns 2, according to the definition of atomicity, we have that R2 must also return
2 in β′0. Thus, we choose chain β′.
We modify chain β′ to obtain chain β as follows. For every execution in chain β′, we let R2
(both round-trips) skip si1 and obtain every corresponding execution in chain β. That is, R2 in
chain β′ is skip-free while R2 in chain β skips si1 (if R2 returns 2 in both modified β′tail and β′′tail,
we will choose to revise chain β′′, and obtain chain β in the same way). Chain β serves as the
basis for construction of chains γ and Z in Phase 3 of our proof.
3.4 Phase 3: Zigzag Chain Z Combining Chain β and γ
Given chain β, we have that R1 and R2 both return 2 in β0, while both read operations return 1 in
βS . Now in Phase 3, we will first construct chain γ = (γ0, γ1, · · · , γS−1). Then we combine chain
β and γ, and obtain the zigzag chain Z, as shown in Fig. 3.
For any two executions x and x′ from chains β and γ, we define an equivalence relation: x ≈ x′
when R1 and R2 return the same value in both x and x′. Note that R1 and R2 must return the same
value in one execution, as required by the definition of atomicity. We will prove that all executions
in chain Z are connected by the ‘≈’ relation, i.e., β0 ≈ γ0 ≈ β1 ≈ γ1 ≈ · · · ≈ βS−1 ≈ γS−1 ≈ βS.
According to our construction in Phase 1 and 2, we have that β0 ≉ βS. This leads to a contradiction.
We first construct the horizontal links in chain Z, i.e., ∀0 ≤ k ≤ S− 1, βk ≈ γk in Section 3.4.1.
Then we construct the diagonal links, i.e., ∀0 ≤ k ≤ S − 1, βk+1 ≈ γk in Section 3.4.2.
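The shape of the zigzag chain Z can be sketched by listing its executions in order and checking that every consecutive pair is either a horizontal link (βk, γk) or a diagonal link (γk, βk+1). The tuple encoding of executions and the function names below are illustrative assumptions; the sketch only captures the combinatorial structure, not the indistinguishability arguments behind each link.

```python
def zigzag_chain(S):
    """Return chain Z as (chain_name, index) pairs: β0, γ0, β1, γ1, ..., βS."""
    chain = []
    for k in range(S):
        chain.append(("beta", k))
        chain.append(("gamma", k))
    chain.append(("beta", S))
    return chain


def links(S):
    """Horizontal links βk ≈ γk and diagonal links γk ≈ βk+1, for 0 <= k <= S-1."""
    horizontal = {(("beta", k), ("gamma", k)) for k in range(S)}
    diagonal = {(("gamma", k), ("beta", k + 1)) for k in range(S)}
    return horizontal, diagonal


def connects_ends(S):
    """Check that consecutive executions in Z are joined by one of the 2S links."""
    chain = zigzag_chain(S)
    horizontal, diagonal = links(S)
    steps = list(zip(chain, chain[1:]))
    return all(step in horizontal or step in diagonal for step in steps)
```

Since ≈ is an equivalence relation, establishing all 2S links gives β0 ≈ βS by transitivity.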
3.4.1 Horizontal link from βk to γk
We first construct execution γk from βk (0 ≤ k ≤ S − 1). The construction process implies
that βk ≈ γk. The key behind the process is still constructing certain indistinguishability. When
constructing γk, we need to utilize two sources of indistinguishability:
1. When R1^(2) finishes before R2^(2) on some server sx, and we modify R2^(2) on sx, R1^(2) will not
notice the change (behind its back).
2. When R2^(2) skips sx, and we modify R1 on sx, R2 will not notice the change.
The construction is shown in Fig. 4 and Fig. 5, from the reader’s view and the server’s view
respectively.
Before the construction of γk, we need to review the characteristics of all executions in chain β.
For every execution βk (0 ≤ k ≤ S), operation R1 (both round-trips) is skip-free, while R2 (both
round-trips) skips exactly one server si1 (the critical server obtained from chain α, see Section 3.2).
In the construction of γk, we will change the server R2^(2) skips. We will also let R1^(2) skip one server.
No other modifications will be made to βk. Also note that starting from β0, for k = 1, 2, · · · , S, we
do the swapping on sk and obtain execution βk. That is, for βk, sk+1 sees R1^(2) and then R2^(2) (not
swapped); while sk sees R2^(2) and then R1^(2) (swapped).
From βk, we will create γk as follows. In construction of γk, we only modify R1^(2) and R2^(2), i.e.,
the first round-trips of both operations are unchanged. In βk, the swapping takes place on sk and
we will pick the first not-swapped server, i.e., sk+1. Server sk+1 finishes R1^(2) before it receives R2^(2).
We create a temporary execution tempk which is the same as βk except that R2^(2) skips sk+1 and
does not skip si1. The only two servers affected are sk+1 and si1. Note that here we assume that
k + 1 ≠ i1. The case k + 1 = i1 (which is actually simpler) will be discussed separately below. For
the two servers affected, we verify the indistinguishability for R1 (see Fig. 5):
• For sk+1, R1 cannot see any difference since R1 finishes first.
• For si1, previously R2^(2) skips si1 (in βk) and now we add R2^(2) back on si1 (in tempk). We
can intentionally add R2^(2) after R1^(2) on si1. Thus R1 still cannot see any difference.
[Figure 4: Construction of the horizontal link (h-link): the reader’s view. βk and tempk are indistinguishable to R1, and tempk and γk are indistinguishable to R2; R1 = R2 within each execution is required by atomicity.]
Thus R1 cannot distinguish βk from tempk, and R1 will return the same value in both executions
(see Fig. 4). As required by the definition of atomicity, R2 will return the same value as R1,
thus returning the same value in both executions. This gives us that βk ≈ tempk.
Now we create execution γk which is the same as tempk except that R1^(2) skips sk+1 (note
that in βk and tempk, R1^(2) is skip-free). The only change takes place on sk+1. Since R2^(2) skips sk+1
in both tempk and γk, R2 cannot distinguish tempk from γk. Thus we have that R2 will return the
same value in tempk and γk (see Fig. 4). Also as required by the definition of atomicity, R1 will
return the same value in both executions. This gives us tempk ≈ γk.
Finally, combining the two links above (see Fig. 4 and Fig. 5), we have βk ≈ γk. Here note that
since R2^(2) skips sk+1, it seems unnecessary for R1 to skip sk+1 in γk. For the proof till now, it is
indeed unnecessary. However, we need to let R1 skip sk+1 here, in order to construct the diagonal
link between βk+1 and γk later in the following Section 3.4.2.
In the proof above, we left out the case k + 1 = i1, which is discussed here. When k + 1 = i1,
we create γk as follows. In βk, sk+1 only receives R1^(2) (since R2^(2) skips si1 = sk+1). We let R1^(2)
skip sk+1, and get γk. Since R2^(2) skips sk+1, R2 cannot distinguish βk from γk and will return the
same value. As required by the definition of atomicity, R1 will also return the same value in βk
and γk. Thus we still have βk ≈ γk when k + 1 = i1.
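The two indistinguishability steps of the horizontal link can be checked mechanically in a toy model. The sketch below represents an execution only by the order in which each server receives the second round-trips R1^(2) and R2^(2). It assumes that what a reader's second round-trip observes on a server is exactly the list of second round-trips that arrived there before it, or nothing if it skips the server. This observation model, like all function names, is an assumption of the sketch, and first round-trips are taken to be identical across executions, as in this section.

```python
R1, R2 = "R1", "R2"


def beta(k, S, i1):
    """Arrival order of second round-trips on servers s1..sS in β_k."""
    ex = {}
    for j in range(1, S + 1):
        order = [R2, R1] if j <= k else [R1, R2]   # s1..sk are swapped
        if j == i1:
            order = [R1]                           # R2 skips the critical server
        ex[j] = order
    return ex


def temp(k, S, i1):
    """temp_k: R2^(2) skips s_{k+1} instead of s_{i1} (no change if k+1 = i1)."""
    ex = beta(k, S, i1)
    if k + 1 != i1:
        ex[k + 1] = [R1]        # R2^(2) now skips s_{k+1}
        ex[i1] = [R1, R2]       # R2^(2) is added back on s_{i1}, after R1^(2)
    return ex


def gamma(k, S, i1):
    """γ_k: same as temp_k except that R1^(2) skips s_{k+1}."""
    ex = temp(k, S, i1)
    ex[k + 1] = [op for op in ex[k + 1] if op != R1]
    return ex


def view(reader, ex):
    """Per server: what arrived before the reader's round-trip, or None if skipped."""
    return tuple(
        tuple(order[:order.index(reader)]) if reader in order else None
        for _, order in sorted(ex.items())
    )


def horizontal_link_holds(k, S, i1):
    """β_k and temp_k look the same to R1; temp_k and γ_k look the same to R2."""
    return (view(R1, beta(k, S, i1)) == view(R1, temp(k, S, i1))
            and view(R2, temp(k, S, i1)) == view(R2, gamma(k, S, i1)))
```

In this model the check succeeds for every k and every position i1 of the critical server, including the case k + 1 = i1, where tempk coincides with βk.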
3.4.2 Diagonal link from βk+1 to γk
Now we construct the diagonal link. We will create from βk+1 executions temp′k and γ′k (0 ≤ k ≤
S − 1). The construction is essentially the same as the construction of the horizontal link. We
need to show that βk+1 ≈ temp′k ≈ γ′k. As for γ′k and γk, the executions on all servers, together
with the order among operations, are the same. It is straightforward to verify that γ′k ≈ γk (so
we do not show γ′k in Phase 3 in Fig. 3). Thus we can obtain the diagonal link, meaning that
βk+1 ≈ γk. Now we explain construction of the diagonal link in detail.
[Figure 5: Construction of the horizontal link: the server’s view. Panels βk, tempk and γk show the order of R1^(2) and R2^(2), and which of them is skipped, on servers sk, sk+1 and si1.]
First note that in βk+1, the “swapping” (see Section 3.3) takes place on sk+1. Thus sk+1 sees
R2^(2) first and then R1^(2). We create execution temp′k which is the same as βk+1, except that R1^(2)
skips sk+1. The only difference between temp′k and βk+1 is on sk+1. Since R2^(2) finishes first on sk+1,
we have that R2 cannot distinguish βk+1 from temp′k, as shown in Fig. 6. So R2 will return the
same value in βk+1 and temp′k. As required by the definition of atomicity, R1 will also return the
same value in βk+1 and temp′k. Thus we have βk+1 ≈ temp′k. The construction from the server’s
view is shown in Fig. 7.
[Figure 6: Construction of the diagonal link (d-link): the reader’s view. βk+1 and temp′k are indistinguishable to R2, and temp′k and γ′k are indistinguishable to R1; R1 = R2 within each execution is required by atomicity.]
Now we construct execution γ′k, which is the same as temp′k except that R2^(2) skips sk+1 and
does not skip si1 (see Fig. 7). Similar to the horizontal link case, here we assume that k + 1 ≠ i1.
We will discuss the simpler case “k + 1 = i1” below. We need to show that R1 cannot distinguish
temp′k from γ′k. The differences concern two servers sk+1 and si1:
• As for sk+1, since R1^(2) skips sk+1, R1 will not see the difference that R2^(2) skips sk+1.
• As for si1, now we add R2^(2) back on si1. We can add R2^(2) after R1^(2) on si1. Thus R1 finishes
first on si1, not being able to distinguish temp′k from γ′k.
[Figure 7: Construction of the diagonal link: the server’s view. Panels βk+1, temp′k and γ′k show the order of R1^(2) and R2^(2), and which of them is skipped, on servers sk, sk+1 and si1.]
Thus we have that R1 returns the same value in temp′k and γ′k. As required by the definition of
atomicity, R2 will also return the same value in both executions. This gives us that temp′k ≈ γ′k.
It is straightforward to check that behaviors of R1 and R2 on every server, as well as the order
among operations, in γk and γ′k are the same. Thus we have γ′k ≈ γk. Note that here we can see
the importance of the seemingly unnecessary change from tempk to γk (in Section 3.4.1): letting
R1^(2) skip sk+1. The “unnecessary” skipping of R1^(2) in the horizontal link helps us make γk and γ′k
behave essentially in the same way. Finally, this gives us βk+1 ≈ γk.
There is still the case “k + 1 = i1” left, which is also simpler. We create γ′k as follows. In βk+1,
sk+1 only receives R1^(2). Let R1^(2) skip sk+1, and we will get γ′k. Since R2 skips sk+1, R2 cannot
distinguish βk+1 from γ′k and will return the same value. As required by the definition of atomicity,
R1 will return the same value in βk+1 and γ′k too. Thus we still have βk+1 ≈ γ′k.
All the horizontal and diagonal links finally connect β0 and βS, meaning that R1 and R2 return
the same value in both executions. However, according to our construction of the chains, R1 and
R2 return different values in β0 and βS. This leads to a contradiction, which finishes our impossibility
proof.
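The diagonal link can be checked in a toy model where an execution is just the arrival order of R1^(2) and R2^(2) on each server, and a reader observes on each server only the second round-trips that arrived before its own (or nothing, if it skips the server). This observation model and all names are assumptions of the sketch. Here `gamma` is written in closed form, and in the case k + 1 = i1 the sketch lets temp′k coincide with γ′k, which matches the direct construction described for that case.

```python
R1, R2 = "R1", "R2"


def beta(k, S, i1):
    """Arrival order of second round-trips on servers s1..sS in β_k."""
    ex = {}
    for j in range(1, S + 1):
        order = [R2, R1] if j <= k else [R1, R2]   # s1..sk are swapped
        ex[j] = [R1] if j == i1 else order         # R2 skips the critical server
    return ex


def gamma(k, S, i1):
    """γ_k from the horizontal link, in closed form."""
    ex = beta(k, S, i1)
    ex[k + 1] = []                  # both second round-trips skip s_{k+1}
    if k + 1 != i1:
        ex[i1] = [R1, R2]           # R2^(2) arrives on s_{i1} after R1^(2)
    return ex


def temp_prime(k, S, i1):
    """temp'_k: same as β_{k+1} except that R1^(2) skips s_{k+1}."""
    ex = beta(k + 1, S, i1)
    ex[k + 1] = [op for op in ex[k + 1] if op != R1]
    return ex


def gamma_prime(k, S, i1):
    """γ'_k: R2^(2) skips s_{k+1} instead of s_{i1} (no change if k+1 = i1)."""
    ex = temp_prime(k, S, i1)
    if k + 1 != i1:
        ex[k + 1] = []
        ex[i1] = [R1, R2]
    return ex


def view(reader, ex):
    """Per server: what arrived before the reader's round-trip, or None if skipped."""
    return tuple(
        tuple(order[:order.index(reader)]) if reader in order else None
        for _, order in sorted(ex.items())
    )


def diagonal_link_holds(k, S, i1):
    """β_{k+1} ~ temp'_k for R2, temp'_k ~ γ'_k for R1, and γ'_k equals γ_k."""
    return (view(R2, beta(k + 1, S, i1)) == view(R2, temp_prime(k, S, i1))
            and view(R1, temp_prime(k, S, i1)) == view(R1, gamma_prime(k, S, i1))
            and gamma_prime(k, S, i1) == gamma(k, S, i1))
```

In this model the check succeeds for every k and every i1, and γ′k is literally the same per-server arrangement as γk, which is the role played by the seemingly unnecessary skip of R1^(2) in the horizontal link.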
4 Fast Write (W1R2): Sieve-based Construction of Executions
Informally speaking, it is reasonable to think that the first round-trip of a read operation should
not change the information stored on the servers, and thus should not be able to affect the return values of
other read operations. This is because in the first round-trip, the reader knows nothing about what
happens on the servers and other clients. It should not “blindly” affect the servers.
Following the intuition above, we prove that in our chain argument in Section 3, if R2^(1) affects
certain servers, such servers cannot affect our chain argument. Thus we sieve all the servers and
only those which actually decide the return values of R1 remain. We restrict our chain argument
in Section 3 to the remaining servers and can still obtain the contradiction.
Before sieving the servers, we need an abstract model which can capture the essence of the
interaction between clients and servers in W1R2 implementations. Only with this abstract model
can we discuss what the effect is when we say that the server is affected by the first round-trip of
a read. In analogy, the role of this abstract model is like that of the decision-tree model, which is
used to derive the lower bound of time complexity for comparison-based sorting algorithms [11].
We name this model the crucial-info model and present it in Section 4.1. With the crucial-info
model, we discuss in Section 4.2 how we can eliminate the servers which have no effect on the
return values of read operations. We further explain how our chain argument can be successfully
conducted on servers that remain.
4.1 The Crucial-Info Model
We first present the full-info model, which is the basis for presenting the crucial-info model. When
considering an atomic register implementation, we only care about the number of round-trips to
complete a read or write operation. To this end, we use a full-info model, where the server is
designed as an append-only log. The server just appends everything it receives from the writers
and readers to its log (never deleting any information). The clients can send arbitrary information
to the servers. The clients can also arbitrarily modify the information stored on the servers. The
server itself and the clients can always check the log to decide what data the server holds at any
moment in the execution.
When the client queries information from the server, the server just replies to the client with
the entire log it currently has. When the client obtains the full-info logs from multiple servers, it
derives from the logs what to do next, e.g., deciding a return value or issuing another round-trip
of communication. Since we only care about the number of round-trips required in an
implementation, we assume that the communication channel has sufficient bandwidth and the clients
and servers have sufficient computing power. Implementations following this model are called
full-info implementations.
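The full-info server described above can be illustrated with a minimal sketch. The class name `FullInfoServer`, the helper `round_trip`, and the message payloads are hypothetical names chosen for illustration and do not come from the paper; the sketch only assumes that a server appends every message to its log and replies with the whole log.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class FullInfoServer:
    """Hypothetical full-info server: an append-only log and nothing else."""
    log: List[Tuple[str, Any]] = field(default_factory=list)

    def handle(self, sender: str, payload: Any) -> List[Tuple[str, Any]]:
        # Append everything received; no entry is ever deleted.
        self.log.append((sender, payload))
        # Reply with a copy of the entire log held so far.
        return list(self.log)


def round_trip(servers, sender, payload, skip=()):
    """One round-trip: contact every server not in `skip` and collect its log."""
    return {i: s.handle(sender, payload)
            for i, s in enumerate(servers) if i not in skip}


servers = [FullInfoServer() for _ in range(4)]
round_trip(servers, "w1", ("write", 1))
# With t = 1, a reader may complete after hearing from S - t = 3 servers.
replies = round_trip(servers, "r1", ("read",), skip={3})
```

In this sketch a client "modifies" server state only by appending new entries, and the number of calls to `round_trip` is the only cost being counted.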
This full-info model is for the theoretical analysis of the lower bound on the number of
round-trips. Obviously, full-info implementations can be optimized to obtain practically efficient
implementations. Since no implementation will use fewer round-trips than the full-info implementation,
we only need to prove that there is no W1R2 full-info implementation of the atomic register. Based
on the full-info model, we can refine certain crucial information the servers must maintain. Such
crucial information must be stored, modified and disseminated among the clients and the servers,
as long as the implementation is a correct atomic register implementation. Specifically, in the
executions constructed in our impossibility proof (Section 3), when the writer writes the value “1” to
the servers, the server must store the crucial information “1”. Besides this crucial information, the
server can store any auxiliary information it needs, but we are not concerned with such non-crucial
information. In analogy, in comparison-based sorting, we only record which elements are compared
and what the results are in the decision tree. Other information is not of our concern when deriving
the lower bound of the time complexity of comparison-based sorting.
When two writers write “1” and “2”, no matter what the temporal relation between the two
write operations is, the server receives the crucial information in certain sequential order, and
we store this crucial information as “12” or “21”. In order to determine the return value, the
reader collects the crucial information “12” or “21” from no less than S − t servers. According to
the definition of atomicity, the reader needs to infer the temporal relation between the two write
operations W1 and W2. Then it can decide the return value. In executions we construct in our
proof, the only possible relations between W1 and W2 are:
• Rel1: W1 precedes W2.
• Rel2: W1 is concurrent with W2.
• Rel3: W2 precedes W1.
In the executions in chain α, β, γ and Z, there are two essential cases for the reader to decide a
return value:
• If the reader cannot differentiate Rel1 (or Rel3) from Rel2, then it must return 2 (or 1).
• If two readers both see Rel2, they need to coordinate (through the servers) to make sure
that they decide the same return value.
In other cases, the reader can obviously decide what it should return. Note that we only have
client-server interaction, i.e., the servers do not communicate with other servers and the clients do
not communicate with other clients.
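As a small illustration of the crucial information discussed above, the following sketch records, for each server, the order in which it received W1 and W2, and lets a reader collect this information from S − t servers. The arrival orders, the function names, and the choice S = 5, t = 1 are assumptions made only for this example.

```python
def crucial_info(arrival_order):
    """Order in which one server received the two writes: "12" or "21"."""
    return "".join(str(w) for w in arrival_order)


S, t = 5, 1
# Hypothetical arrival orders of W1 and W2 at each of the S servers.
arrivals = [(1, 2), (1, 2), (2, 1), (1, 2), (2, 1)]
servers = [crucial_info(a) for a in arrivals]


def collect(servers, skipped):
    """A reader gathers crucial info from the servers it does not skip."""
    return [info for i, info in enumerate(servers) if i not in skipped]


# The reader skips one server (t = 1) and sees S - t pieces of crucial info.
view = collect(servers, skipped={4})
```

The reader must infer Rel1, Rel2 or Rel3 from such a view alone, which is why two executions that yield the same view force the same return value.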
Given the crucial-info model, we can now describe how the first round-trip of a read operation
$R_i^{(1)}$ affects another read operation Rj . When $R_i^{(1)}$ affects Rj , $R_i^{(1)}$ must change the crucial
information on some servers, while such modified crucial information is obtained by Rj . Note that Rj
may be affected (i.e., the indistinguishability is broken) since the crucial information it obtains from
the servers changes, but Rj could still decide the same return value even if the crucial information
has changed.
In the executions in our proof, the reader only needs to derive the temporal relation between
W1 and W2. The only crucial information that can be stored on the server is the temporal order
between W1 and W2 the server sees. The possible values of the crucial information on the server are
“12” and “21”. The first round-trip of the reader can only affect the server by changing the crucial
information from “12” to “21” or vice versa, as long as the implementation correctly guarantees
atomicity.
Given the crucial-info model, we can explain how we sieve the servers, as well as how the chain
argument can be successfully conducted after the affected servers are eliminated.
4.2 Eliminating the Affected Servers
We conduct the sieving when we append $R_2^{(1)}$ to executions in chain α = (α0, α1, · · · , αS). From
α0, we append the second read operation R2, and discuss the effect of $R_2^{(1)}$ on the return value of
R1. Now we have three non-concurrent round-trips $R_1^{(1)}$, $R_2^{(1)}$ and $R_1^{(2)}$, as shown in Phase 2 of Fig.
3. We are concerned with what happens to our chain argument (in Section 3) if $R_2^{(1)}$ affects
(the crucial information on) some servers and thereby potentially affects the return value of R1 (more
specifically, $R_1^{(2)}$), thus breaking the indistinguishability we try to construct.
Considering the effect of $R_2^{(1)}$, we partition all servers Σsv into two subsets Σ1 and Σ2, as shown
in Fig. 8. Set Σ1 contains all servers whose crucial information is affected by $R_2^{(1)}$, while Σ2
contains all servers whose crucial information is not affected. Without loss of generality, we let
Σ2 = {s1, s2, · · · , sx} and Σ1 = {sx+1, sx+2, · · · , sS}. According to our construction of α0, every
server in Σ2 contains crucial information “12”. At first, the crucial information stored on servers
in Σ1 is also “12”. According to the crucial-info model, the only effect on the server which can
affect the return value of read operations is changing this “12” to “21”. So after servers in Σ1 are
affected by $R_2^{(1)}$, their crucial info is changed from “12” to “21”. We denote this execution where
servers in Σ1 are affected by $R_2^{(1)}$ as $\hat{\alpha}_0$.
In $\hat{\alpha}_0$, we have that R1 must return 2. It is because W1 precedes W2 by construction, and in
any correct atomic register implementation, read operations after W2 should return 2. Whatever
the effect of $R_2^{(1)}$ is, it should not prevent R1 from returning 2. For the chain argument, we need
to construct the other end of the chain. We still do the swapping one server at a time. Execution
$\hat{\alpha}_i$ is the same as $\hat{\alpha}_{i-1}$ except for si, for 1 ≤ i ≤ x. The crucial information on si is “12” in
$\hat{\alpha}_{i-1}$, while the crucial information on si is “21” in $\hat{\alpha}_i$. We do the swapping one server at a time
for all servers in Σ2. The tail execution of the chain is $\hat{\alpha}_{tail} = \hat{\alpha}_x$, as shown in Fig. 8. Note that
the chain becomes “shorter”. Servers in Σ1 are unchanged in all executions $\hat{\alpha}_0, \hat{\alpha}_1, \cdots, \hat{\alpha}_x$.
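The construction of the shortened chain can be sketched as follows. This is only an illustration under simplifying assumptions: each execution is reduced to the list of crucial info held by the servers after the first round-trips of the reads, servers with index below x stand for Σ2, and the function name `build_chain` is hypothetical.

```python
def build_chain(S, x):
    """Return the executions hat-alpha_0 .. hat-alpha_x as per-server crucial info.

    Servers with index < x model Sigma_2 (not affected by R_2^(1)).
    The remaining servers model Sigma_1, whose crucial info R_2^(1)
    has already changed from "12" to "21".
    """
    head = ["12"] * x + ["21"] * (S - x)
    chain = [head]
    for i in range(x):
        nxt = list(chain[-1])
        nxt[i] = "21"  # swap the crucial info on one Sigma_2 server
        chain.append(nxt)
    return chain


chain = build_chain(S=6, x=4)
```

Adjacent executions in `chain` differ on exactly one server, the entries for Σ1 never change, and the chain has x + 1 executions instead of S + 1.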
Now we describe the sieving process to eliminate servers in Σ1 from our chain argument. As
for execution $\hat{\alpha}_x$, consider the servers in Σ1. They do the computation the same way they do in
$\hat{\alpha}_0$, i.e., they first contain crucial info “12”, then are affected by $R_2^{(1)}$ and change their crucial
info to “21”. Note that the effect of $R_2^{(1)}$ is a “blind” effect because it does not obtain any
information from the outside world first. The servers in Σ1 and the reader r2 of round-trip $R_2^{(1)}$ will
not differentiate $\hat{\alpha}_x$ from $\hat{\alpha}_0$. Thus all servers in Σ1 behave the same way in both executions, and
they will have crucial info “21”.
As for servers in Σ2 in $\hat{\alpha}_x$, after W1 and W2, all servers in Σ2 have crucial information “21”.
This crucial information should remain “21” after $R_1^{(1)}$ and $R_2^{(1)}$. Assume for contradiction that
the crucial information on some server sy in Σ2 has been affected by $R_1^{(1)}$ and $R_2^{(1)}$, and is changed
from “21” to “12”. Combining the behavior of sy in both $\hat{\alpha}_0$ and $\hat{\alpha}_x$, we find that sy always ends
with crucial information “12” after $R_1^{(1)}$ and $R_2^{(1)}$, no matter what the write operations write on the
servers. Such servers obviously cannot decide the return value of R1 and can be safely eliminated.
So we can assume that all servers in Σ2 in $\hat{\alpha}_x$ have crucial information “21” after $R_1^{(1)}$ and $R_2^{(1)}$.
In this way, R1 will see that all servers have crucial info “21” in $\hat{\alpha}_x$, and R1 must return 1 in
execution $\hat{\alpha}_x$. We thus obtain the key property required for the chain argument: in the two end
executions of the chain $\hat{\alpha} = (\hat{\alpha}_0, \hat{\alpha}_1, \cdots, \hat{\alpha}_x)$, R1 returns different values. Note that the length
of the chain will not affect our chain argument in Section 3, as long as we have enough servers left
for the chain argument. Operation R1 uses crucial information only from servers in Σ2. Crucial
information on servers in Σ1 has been affected, and the change in this crucial information will not
affect the fact that R1 returns 2. Since t = 1 and servers in Σ2 can enable a correct atomic register
implementation (we have this assumption to derive the contradiction), we have at least 3 servers in Σ2.
Another threat to clarify is that when constructing chains β′, β′′ and β, the chains are based on
the swapping among all servers, i.e., chains β′, β′′ and β all have length S + 1 even after the sieving.
That is to say, the sieving is only conducted on executions in chain α, in order to obtain the critical
server si1 . This raises the potential threat that when constructing β0, β1, · · · , βS , what happens if
$R_1^{(1)}$ affects the return value of R2. Observe that in our proof, we only use the fact that, when R2
(both round-trips) skips si1 in executions β′S and β′′S , R2 returns the same value. This means that,
no matter what the effect of $R_1^{(1)}$ is, R2 still returns the same value in both β′S and β′′S , as long as
the critical server is skipped. Thus our chain argument can successfully go on as in Section 3.
5 Fast Read (W2R1): Impossibility and Implementation
In this section, we discuss the impossibility and implementation of fast read (W2R1) multi-writer
atomic registers. The necessary and sufficient condition for a W2R1 implementation is R < S/t − 2,
which is the same as that of the single-writer case [12]. The impossibility proof and the algorithm
design are also obtained by extending their counterparts in the single-writer case.

Figure 8: Eliminating servers affected by $R_2^{(1)}$. (The figure shows, for executions $\hat{\alpha}_0$ and $\hat{\alpha}_x$,
the crucial info on servers s1, · · · , sS after the first and the second round-trips of R1 and R2.
Servers in Σ2 are not affected and servers in Σ1 are affected by $R_2^{(1)}$. R1 and R2 return 2 in $\hat{\alpha}_0$
and return 1 in $\hat{\alpha}_x$.)
5.1 Impossibility when R ≥ S/t − 2
We need to prove that it is impossible to obtain a W2R1 implementation when R ≥ S/t − 2 in the
multi-writer case. It is sufficient to prove that, even if there is only one writer and this single writer
can employ two round-trips, W2R1 implementations are still impossible.
The proof in the single-writer case does not depend on how many round-trips a write operation
has. When we change all write operations in the impossibility proof in the single-writer case to
two (or more) round-trips, we let all the round-trips of a write operation take place
consecutively and precede all other operations, as shown in Fig. 9 (based on Fig. 6 of [12]). The
rest of the impossibility proof is not affected.
5.2 Implementation when R < S/t − 2
We derive the W2R1 implementation from the single-writer W1R1 implementation in [12]. Our
implementation is inspired by how multiple writers are handled in the W2R2 implementation [23],
which can also be viewed as a derivation from the single-writer W1R2 implementation [5]. The key
change in the design of a multi-writer implementation is that we use (ts, wi) to denote one value.
Here wi is the writer ID and ts is the version number denoting one value written by wi. Assuming
that the writer IDs are totally ordered, we can thus order all the values from multiple writers using
the lexicographical order when we have equal ts values.
The order among write values is further strengthened by the two round-trip write algorithm.
Specifically, before writing a value, the writer first queries all the servers and calculates the maxTS
in its first round-trip. Then the writer sends value (maxTS + 1, wi) to all servers in the second
round-trip. The two round-trip write algorithm guarantees that when write operations have the
same ts value, they must be concurrent.
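The two round-trip write can be sketched as below. This is a simplified, sequential illustration rather than the algorithm of Appendix A: servers are modeled as a list holding one (ts, writer ID) pair each, the empty string stands in for the initial writer ID ⊥, `reachable` is a hypothetical parameter naming the S − t servers that respond, and each server is assumed to keep the largest value it has seen.

```python
from typing import List, Tuple

Value = Tuple[int, str]  # (ts, writer_id), compared lexicographically


def write(servers: List[Value], writer_id: str, reachable: List[int]) -> Value:
    """Two round-trip write against the servers listed in `reachable`."""
    # Round-trip 1: query the servers and compute maxTS.
    max_ts = max(servers[i][0] for i in reachable)
    new_value = (max_ts + 1, writer_id)
    # Round-trip 2: send (maxTS + 1, wi); a server keeps its largest value.
    for i in reachable:
        servers[i] = max(servers[i], new_value)
    return new_value


servers = [(0, "")] * 5                  # initial value with ts = 0
v1 = write(servers, "w1", [0, 1, 2, 3])  # S - t = 4 servers respond
v2 = write(servers, "w2", [1, 2, 3, 4])  # follows v1, so it observes ts = 1
```

Because the second write starts after the first completes and any two sets of S − t servers intersect, it observes ts = 1 and picks ts = 2. Two writes can obtain the same ts only if their round-trips overlap, in which case the tuple comparison falls back to the writer ID, e.g. `(1, "w1") < (1, "w2")`.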
Figure 9: Fast read impossibility. (The figure shows server blocks B1 to B5 and the operations W, R1,
R2 and R3, where W is a two round-trip write.)
As for the single round-trip read operations, the reader first obtains multiple values from
the servers. It also uses the admissible(·) predicate (defined in the single-writer algorithm in [12])
to test all the values obtained. The admissible(·) predicate is designed to guarantee that: i) a
read never returns older values than that of a preceding write, and that ii) a read never returns
older values than that of a preceding read. Since there are multiple writers, the reader may obtain
multiple admissible values and needs to return one of them. Since all the values are totally ordered,
we simply let the reader return the largest admissible value. That is, when the reader needs to
choose from values with equal ts, it just chooses the value with the largest writer ID.
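The selection rule of the reader can be illustrated with a few lines. The sketch assumes that the set of admissible values has already been computed (the admissible(·) predicate itself is defined in [12] and is not reproduced here); the function name and the sample values are hypothetical.

```python
def choose_return_value(admissible_values):
    """Return the largest admissible (ts, writer_id) pair.

    Python compares tuples lexicographically, so ties on ts are
    broken by the larger writer ID.
    """
    return max(admissible_values)


admissible_values = {(3, "w1"), (4, "w1"), (4, "w2")}
result = choose_return_value(admissible_values)
```

Here the two candidates with ts = 4 differ only in the writer ID, and the reader returns the one written by `w2`.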
One potential threat to the correctness of our algorithm is the following. In the single-writer case,
all values are totally ordered by the single writer and it is trivial to choose the more up-to-date value.
In the multi-writer case, however, the two round-trip algorithm can order only non-concurrent write
operations from different writers. For concurrent writes, we can only use the (somewhat
arbitrary) order among writer IDs. We need to prove that using the writer ID order will not
compromise the correctness of our implementation.
Specifically, for two read operations R1 preceding R2, the predicate admissible(·) guarantees
that Sad(R1) ⊆ Sad(R2) [12]. Here Sad(Ri) (i = 1, 2) denotes the set of admissible values on
Ri. Denote the return values of R1 and R2 as val1 = max(Sad(R1)) and val2 = max(Sad(R2))
respectively. The potential threat to our multi-writer implementation is that R2 chooses a new
return value only due to the difference in writer ID while the ts values are the same. Specifically,
assume that val2 ≠ val1, but val2.ts = val1.ts and val2.writer-id > val1.writer-id. Since val2 ∉
Sad(R1) (otherwise R1 and R2 would choose the same return value), we have that val2 is not
admissible in R1’s view, but val2 is admissible in R2’s view. This ensures that R1 must be concurrent
with W2 (let Wi denote the write operation of vali for i = 1, 2). Thus W1, R1, W2, R2 is a correct
permutation of these operations as required by the definition of atomicity. This ensures that the
return value of R2 is correct.
For other cases, the correctness proof of our W2R1 implementation is essentially the same as
that of the W1R1 implementation in [12]. We present our W2R1 implementation and its detailed
proof of correctness in Appendix A. Note that impossibility results in the crash failure model directly
imply impossibility in the Byzantine failure model. However, for our W2R1 implementation, we
can further study whether it can be extended to tolerate Byzantine failures. The extension
is essentially the same as that in the single-writer case, as detailed in [12]. We thus omit detailed
discussions here.
6 Related Work
The importance of low latency data access in distributed storage systems motivates the study on
fast implementations of distributed atomic registers. Fast implementation in the single-writer case
is studied in [12], where the sufficient and necessary condition for fast implementation is derived.
As for the multi-writer case, only the impossibility of fast read-write implementations is presented.
When examined at a finer granularity, it is still open whether fast implementations are possible
when only read or write is required to be fast. The notion of semifast implementation is presented
in [14]. It is proved that semifast implementation is not possible for multi-writer atomic registers.
In this work, we consider implementations where the read can always be slow (using two or more
round-trips). The implementation we consider is strictly stronger than semifast implementations.
Thus our impossibility proof is more general and directly implies the impossibility of semifast
implementations. This work concludes this series of studies on fast implementations of distributed
atomic registers. An impossibility proof for fast write implementations is presented, and the
necessary and sufficient condition for fast read implementations is derived.
Our impossibility proof for fast write implementations is inspired by the classical result in a
shared-memory setting that “atomic reads must write” [20, 21, 7]. The CAP theorem [8, 15] and
the PACELC tradeoff [3] in distributed systems also inspire us to prove the impossibility of fast
(low latency, strongly consistent and fault-tolerant) implementations. Our use of the crucial-info
model is inspired by the CHT proof of the weakest failure detector for consensus [10, 13]. In the
CHT proof, a directed acyclic graph is used to store the failure detector outputs on all processes
as well as the temporal relations between them.
The study on atomic register implementations on the Oh-RAM model is closely related to our
work [16]. Both works use chain arguments [6] to prove the impossibility. The main difference lies
in the system model. In the Oh-RAM model, servers are allowed to exchange messages, while
in our client-server model, we only model communications between the client and the server. We
derive our system model from the existing work [5, 23, 4, 12], as well as from our study of popular
distributed storage systems [19, 1, 2].
7 Conclusion and Future Work
In this work, we study fast write and fast read implementations of multi-writer atomic registers.
For fast write implementations, we come up with the impossibility proof, which is based on a three-
phase chain argument. For fast read implementations, we provide the necessary and sufficient
condition for fast implementations, by extending the result of the single-writer case.
In our future work, we will study fast implementations for multi-writer atomic registers from
a different perspective. Specifically, we will fix fast implementations in the first place, and then
quantify how much data inconsistency will be introduced when strictly guaranteeing atomicity is
impossible. We also plan to introduce knowledge calculus to reason about quorum-based distributed
algorithms at a higher level of abstraction.
References
[1] Redis distributed data structure store. http://redis.io/, 2020.
[2] Riak distributed database. https://riak.com/, 2020.
[3] Abadi, D. J. Consistency tradeoffs in modern distributed database system design: Cap is
only part of the story. Computer 45, 2 (Feb 2012), 37–42.
[4] Aspnes, J. Notes on Theory of Distributed Systems. Yale University, CPSC 465/565, 2019.
[5] Attiya, H., Bar-Noy, A., and Dolev, D. Sharing memory robustly in message-passing
systems. J. ACM 42, 1 (Jan. 1995), 124–142.
[6] Attiya, H., and Ellen, F. Impossibility Results for Distributed Computing. Morgan &
Claypool, 2014.
[7] Attiya, H., and Welch, J. Distributed Computing: Fundamentals, Simulations and Ad-
vanced Topics. John Wiley & Sons, 2004.
[8] Brewer, E. A. Towards robust distributed systems (abstract). In Proceedings of the Nine-
teenth Annual ACM Symposium on Principles of Distributed Computing (New York, NY, USA,
2000), PODC’00, ACM, pp. 7–.
[9] Burckhardt, S., Gotsman, A., Yang, H., and Zawirski, M. Replicated data types:
Specification, verification, optimality. In Proceedings of the 41st ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages (New York, NY, USA, 2014), POPL ’14,
ACM, pp. 271–284.
[10] Chandra, T. D., Hadzilacos, V., and Toueg, S. The weakest failure detector for solving
consensus. J. ACM 43, 4 (July 1996), 685–722.
[11] Cormen, T., Leiserson, C., Rivest, R., and Stein, C. Introduction to Algorithms (third
edition). The MIT Press, 2009.
[12] Dutta, P., Guerraoui, R., Levy, R. R., and Vukolić, M. Fast access to distributed
atomic memory. SIAM J. Comput. 39, 8 (Dec. 2010), 3752–3783.
[13] Freiling, F. C., Guerraoui, R., and Kuznetsov, P. The failure detector abstraction.
ACM Comput. Surv. 43, 2 (Feb. 2011).
[14] Georgiou, C., Nicolaou, N. C., and Shvartsman, A. A. Fault-tolerant semifast imple-
mentations of atomic read/write registers. Journal of Parallel and Distributed Computing 69,
1 (2009), 62 – 79.
[15] Gilbert, S., and Lynch, N. A. Perspectives on the cap theorem. Computer 45, 2 (2012),
30–36.
[16] Hadjistasi, T., Nicolaou, N., and Schwarzmann, A. A. Oh-ram! one and a half round
atomic memory. In Networked Systems (Cham, 2017), A. El Abbadi and B. Garbinato, Eds.,
Springer International Publishing, pp. 117–132.
[17] Herlihy, M. P., and Wing, J. M. Linearizability: a correctness condition for concurrent
objects. ACM Transactions on Programming Languages and Systems 12 (July 1990), 463–492.
[18] Huang, K., Huang, Y., and Wei, H. Fine-grained analysis on fast implementations of
multi-writer atomic registers. Technical report, Institute of Computer Software, Nanjing Uni-
versity, 2020.
[19] Lakshman, A., and Malik, P. Cassandra: A decentralized structured storage system.
SIGOPS Oper. Syst. Rev. 44, 2 (Apr. 2010), 35–40.
[20] Lamport, L. On interprocess communication. part i: Basic formalism. Distributed Computing
1, 2 (1986), 77–85.
[21] Lamport, L. On interprocess communication. part ii: Algorithms. Distributed Computing 1,
2 (1986), 86–101.
[22] Lloyd, W., Freedman, M. J., Kaminsky, M., and Andersen, D. G. Stronger semantics
for low-latency geo-replicated storage. In Proceedings of the 10th USENIX Conference on
Networked Systems Design and Implementation (Berkeley, CA, USA, 2013), nsdi’13, USENIX
Association, pp. 313–328.
[23] Lynch, N. A., and Shvartsman, A. A. Robust emulation of shared memory using dynamic
quorum-acknowledged broadcasts. In Proceedings of IEEE 27th International Symposium on
Fault Tolerant Computing (June 1997), pp. 272–281.
[24] Moniz, H., Leitão, J., Dias, R. J., Gehrke, J., Preguiça, N., and Rodrigues,
R. Blotter: Low latency transactions for geo-replicated storage. In Proceedings of the 26th
International Conference on World Wide Web (Republic and Canton of Geneva, Switzerland,
2017), WWW ’17, International World Wide Web Conferences Steering Committee, pp. 263–
272.
[25] Ouyang, L., Huang, Y., Wei, H., and Lu, J. Enabling almost strong consistency for
quorum-replicated datastores. Technical report, Institute of Computer Software, Nanjing Uni-
versity, 2019.
[26] Sivaramakrishnan, K., Kaki, G., and Jagannathan, S. Declarative programming over
eventually consistent data stores. In Proceedings of the 36th ACM SIGPLAN Conference on
Programming Language Design and Implementation (New York, NY, USA, 2015), PLDI ’15,
ACM, pp. 413–424.
[27] Viotti, P., and Vukolić, M. Consistency in non-transactional distributed storage
systems. ACM Comput. Surv. 49, 1 (June 2016).
[28] Wei, H., Huang, Y., and Lu, J. Probabilistically-atomic 2-atomicity: Enabling almost
strong consistency in distributed storage systems. IEEE Trans. Comput. 66, 3 (Mar. 2017),
502–514.
A Correctness Proof of the W2R1 Implementation
A.1 Definitions
An execution satisfies atomicity if, for every history H′ of the execution, there is a history H that
completes H′ and satisfies the properties below. Let Π be the set of all operations in H. There
is an irreflexive partial ordering ≺π of all the operations in H such that (A1) if op1 precedes op2
in H, then it is not the case that op2 ≺π op1; (A2) if op1 is a write operation in Π and op2 is any
other operation (including another write operation) in Π, then either op2 ≺π op1 or op1 ≺π op2 in Π;
and (A3) the value returned by each read operation is the value written by the last preceding write
operation according to ≺π.
In a given execution, we denote by wrk,i the write that is preceded by exactly the write with
ts = k by the writer wi in the execution (note that wr0,⊥ is not invoked by the writer). Then we
say that an operation op returns a value (k,wi), if (a) op is wrk,i, or (b) op is a read that returns
the value stored by wrk,i.
For two values (ts1, wi) and (ts2, wj), we say (ts1, wi) < (ts2, wj) if and only if
(ts1 < ts2) ∨ (ts1 = ts2 ∧ wi < wj). Consider a relation ≺π such that op1 ≺π op2 if and only if the
value returned by op1 is smaller than the value returned by op2. Then it is straightforward to show
that ≺π is a partial ordering that satisfies properties A1 - A3 if the following properties are satisfied:
(MWA0) Let wr and wr′ be two different write operations, and v1 (resp., v2) is the value that wr
(resp., wr′) writes (note that v1 ≠ v2). If wr ≺σ wr′, then v1 < v2.
(MWA1) If a read returns, it returns a nonnegative timestamp and a wid of the writer proposing
the timestamp.
(MWA2) If a read rd returns value (l, wj) and rd follows write wrk,i, then (l, wj) ≥ (k,wi).
(MWA3) If a read rd returns value (k,wi), then rd does not precede wrk,i.
(MWA4) If reads rd1 and rd2 return value (k,wi) and (l, wj), respectively, and if rd2 follows rd1,
then (l, wj) ≥ (k,wi).
A.2 Implementation and Correctness of Algorithm 1 and 2
In [12], there is a W1R1 atomic register implementation for the single-writer case. We use this
implementation for reference and extend it to a W2R1 implementation, as shown in Algorithms 1
and 2. Since Algorithms 1 and 2 largely reuse the implementation in [12], we only give the pseudo
code and do not explain them in detail. Now we prove the correctness of Algorithms 1 and 2. Our
proof also largely reuses the proof in [12].
To prove that Algorithms 1 and 2 implement an atomic register, it suffices to prove MWA0-
MWA4. In the proof, we use the following notations.
Definition 1 rcvMsgop denotes the set of received READACK messages the reader collects in
read operation op.
Definition 2 Σop denotes the set of servers from which the reader received READACK messages
in rcvMsgop (in case op is a read), or the set of servers from which the writer received WRITEACK
messages of op (in case op is a write). Notice that for every operation op, |Σop| = S − t.
Definition 3 $maxV^{old}_{op}$ denotes the value a read op sends to all servers when it starts reading.
Moreover, $maxV_{op}$ denotes the value computed by the reader in line 23 in Algorithm 1, in op.
$maxTS^{old}_{op}$ denotes the timestamp in $maxV^{old}_{op}$ and $maxTS_{op}$ denotes the timestamp in $maxV_{op}$.
Definition 4 $\mu_{op,v,a}$ denotes, in case value v is admissible with degree a in op, the subset of
rcvMsgop, such that
(a) $|\mu_{op,v,a}| \geq S - at$,
(b) for all $m \in \mu_{op,v,a}$, m has v, and
(c) $|\bigcap_{m \in \mu_{op,v,a}} m.updated| \geq a$.
Definition 5 $\Pi_{op,v,a}$ denotes the set of servers $\bigcap_{m \in \mu_{op,v,a}} m.updated$.
Definition 6 $\Sigma_{op,v,a}$ denotes the set of servers that sent messages in $\mu_{op,v,a}$.
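Conditions (a)-(c) of Definition 4 can be checked by brute force, as in the following sketch. It is not the admissible(·) predicate of [12] itself: the `ReadAck` type, its fields, and the function name are hypothetical, and the check simply searches for a subset of the received messages that satisfies the three conditions for a given degree a.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, Tuple

Value = Tuple[int, str]


@dataclass(frozen=True)
class ReadAck:
    """Hypothetical READACK: values the server reports and its updated set."""
    values: FrozenSet[Value]
    updated: FrozenSet[str]


def admissible_with_degree(rcv_msg, v, a, S, t):
    """Check conditions (a)-(c) of Definition 4 by exhaustive search."""
    need = S - a * t
    if need < 1:
        return False  # degenerate case, excluded in this sketch
    holders = [m for m in rcv_msg if v in m.values]  # condition (b)
    # A larger subset can only shrink the intersection, so it is enough
    # to try subsets of exactly `need` messages for condition (a).
    for mu in combinations(holders, need):
        common = frozenset.intersection(*(m.updated for m in mu))
        if len(common) >= a:  # condition (c)
            return True
    return False


v = (1, "w1")
acks = [ReadAck(frozenset({v}), frozenset({"w1", "r1"}))] * 3
acks.append(ReadAck(frozenset({v}), frozenset({"w1"})))
```

With S = 5 and t = 1, the four messages in `acks` make v admissible with degree 1 and degree 2, but not with degree 3, since no two messages share three clients in their updated sets.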
We start with several simple observations that we use in the rest of the proof.
Lemma 0 (MWA0) Let wr and wr′ be two different write operations, and v1 (resp., v2) is the
value that wr (resp., wr′) writes (note that v1 ≠ v2). If wr ≺σ wr′, then v1 < v2.
Proof. The lemma trivially holds by the definition of value. □
Lemma 1 If a server gets variable x at time T , then every ACK the server sends after time T
contains x.
Proof. The lemma is proved by trivial server code inspection. □
Lemma 2 Read operation rd may only return a value whose timestamp is either maxTSrd or
maxTSrd − 1.
Proof. rd obtains maxVrd, whose timestamp is k, from several servers. Since wrk,i has started
proposing maxVrd, wi must have finished proposing (k − 1, wi). So (k − 1, wi) is admissible, and rd
returns a value no smaller than (k − 1, wi). □
Lemma 3 $maxV^{old}_{rd}$ is admissible in rd.
Proof. Recall that $maxV^{old}_{rd}$ denotes the value sent in the read message in rd (line 19 in Algorithm
1). By line 21 in Algorithm 1, every READACK message received by a reader in rd from some
server sj has value $maxV^{old}_{rd}$. Hence, $maxTS_{rd} \geq maxTS^{old}_{rd}$. Since |Σrd| = S − t, $maxV^{old}_{rd}$ is
admissible with degree a = 1 in rd. □
Using these simple observations, we can first prove MWA1 and MWA2.
Lemma 4 (MWA1) If a read returns, it returns a value with nonnegative timestamp.
Proof. To prove the lemma, it is sufficient to show that there is no read rd in which
$maxTS^{old}_{rd} < 0$ (then the lemma follows from Lemma 3).
To see this, assume by contradiction that there is a read rd by reader ri in which $maxTS^{old}_{rd} < 0$.
By line 17 in Algorithm 1, this is not the first read by ri, i.e., there is a read rd′ by ri that
(immediately) precedes rd such that $maxTS_{rd'} < 0$, i.e., S − t servers sent READACK messages in
rd′ with ts′ < 0. However, since server timestamps are initialized to 0, this contradicts Lemma 2. □
Lemma 5 (MWA2) Let a read rd which returns value (l, wj) follow write wrk,i; then (l, wj) ≥
(k,wi).
Proof. Denote by ri the reader that invoked rd and let $\Sigma' = \Sigma_{wr_{k,i}} \cap \Sigma_{rd}$. Since
$|\Sigma_{wr_{k,i}}| = S - t$ and $|\Sigma_{rd}| = S - t$, and there are S servers in total, we have
$|\Sigma'| \geq (S - t) + (S - t) - S = S - 2t$.
When a server sj in Σ′ (and, hence, in $\Sigma_{wr_{k,i}}$) replies to a write message from wrk,i at time T ,
(k,wi) will be in all messages sj sends to clients after T . Since wrk,i precedes rd, rd will see (k,wi)
in all servers in Σ′. So (k,wi) will be admissible with degree a = 2 in rd. And so rd must return a
value (l, wj) no smaller than (k,wi). □
Then we need to prove MWA3. The following lemma helps prove property MWA3.
Lemma 6 If maxVrd ≥ (k,wi), then rd does not precede wrk,i.
Proof. We focus on the case k ≥ 1, since the proof for k = 0 follows from the definition of
wr0,⊥. To prove the lemma, it is sufficient to show that no server has value (k,wi) before wrk,i is
invoked. Assume, by contradiction, that there is such a server si that is, moreover, the first server
to have a value greater than (k,wi) according to the global clock (at time T ); i.e., no server has
a value (l, wj) greater than (k,wi) before time T . It is obvious that wrk,i is invoked after T ; and
so is the second round-trip of wrl,j (which might be the same operation), or (k,wi) would be greater
than (l, wj). Hence, si must have obtained (k,wi) after receiving a read message in a read rd′
invoked by a reader rx which carries value (k,wi). Since (l, wj) ≥ (k,wi) > (0,⊥), there is a read
rd′′ by rx that immediately precedes rd′ and that has received a message containing (k,wi). Since
rd′′ precedes rd′, rd′′ completes before time T . So some server had value (l, wj) before rd′′
completed, a contradiction with the assumption that no server has value (l, wj) before time T . □
Lemma 6 has the following important corollary.
Corollary 1 If maxVrd = (l, wj) > (0,⊥), all wrk,i (k < l) complete before rd completes.
Lemma 7 (MWA3) If a read rd returns value (k,wi), then rd does not precede wrk,i.
Proof. The lemma is proved by Lemmas 2 and 6. By Lemma 2, rd will only return a value with
timestamp maxTSrd or maxTSrd − 1. If k = maxTSrd − 1, then (k,wi) < maxVrd; by Corollary 1,
wrk,i precedes rd. If k = maxTSrd, by Lemma 6, rd does not precede the write of maxVrd; and it is
obvious that the write of maxVrd (which may be the same write as wrk,i) does not precede wrk,i; so
rd does not precede wrk,i. □
Lemma 8 If v is admissible in rd1 and rd2 follows rd1, then v is admissible in rd2.
Proof. retrd1 must be admissible in rd1 with degree a. There are two cases:
(i) a ≤ R. Suppose v is admissible in rd1 but not admissible in rd2. We show that this case is
impossible by deriving a contradiction. By Lemma 10, there is at least one server
si ∈ Σµrd1,v,a ∩ Σrd2. Since rd1 precedes rd2, si replies with v to rd1 before si replies to rd2.
By Lemma 1, it follows that si replies to rd2 with v. Let µ1 be the set of READACK
messages sent to rd1 by servers in Σµrd1,v,a ∩ Σrd2, and denote ⋂_{m∈µ1} m.v.updated by Π1. Notice
that, by the definitions of µ1 and µrd1,v,a, µ1 ⊆ µrd1,v,a. Hence, we have Πrd1,v,a ⊆ Π1 and |Π1| ≥ a.
Let µ2 be the set of messages received by rd2 from servers in Σµrd1,v,a ∩ Σrd2. For any server
si ∈ Σµrd1,v,a ∩ Σrd2, let m1 and m2 be the messages sent by si in µ1 and µ2, respectively. Since
m1 is sent before m2, we have m1.v.updated ⊆ m2.v.updated. Hence, Π1 ⊆ ⋂_{m∈µ2} m.v.updated.
Since every server which replies to r2 in rd2 adds r2 to its updated set before replying to r2,
r2 ∈ ⋂_{m∈µ2} m.v.updated. Since r2 ∉ Π1, |⋂_{m∈µ2} m.v.updated| ≥ |Π1| + 1 ≥ a + 1. Since (a) the
number of messages in µ2 equals the number of servers in Σµrd1,v,a ∩ Σrd2, and (b) a + 1 ≤ R + 1,
by Lemma 10 and the definition of predicate admissible, we have that v is admissible in rd2 with
degree a+ 1.
(ii) a = R + 1. Since |{w, r1, · · · , rR}| = R + 1 and |⋂_{m∈µrd1,v,a} m.updated| ≥ a = R + 1, we have
r2 ∈ ⋂_{m∈µrd1,v,a} m.updated. By Lemma 9, Σµrd1,v,a contains at least t + 1 servers. Let rd′2 be the
last read by reader r2 which precedes rd1. Since |Σrd′2| = S − t, there is at least one server sk in
Σµrd1,v,a ∩ Σrd′2, such that the READACK message m sent by sk is received by r2 in rd′2. In the
following paragraph, we show that m contains retrd1.
By contradiction, assume m does not contain retrd1. Then there is a read rdα by r2 such that rdα
follows rd′2 and sk sends a READACK message mα to rdα before sk sends mk ∈ µrd1,v,a, i.e., before
rd1 is invoked. Hence, rd′2 is not the last read by reader r2 which precedes rd1, a contradiction.
Since m contains retrd1 and rd2 follows rd′2, it follows that r2 in rd2 sends read messages with
value ≥ maxValuerd1. So all servers in Σrd2 will send v to rd2, and v is admissible in rd2 with
degree a = 1. □
Lemma 8 has the following important corollary, which is MWA4.
Corollary 2 (MWA4) If reads rd1 and rd2 return values (k,wi) and (l, wj), respectively, and if
rd2 follows rd1, then (l, wj) ≥ (k,wi).
The following 2 auxiliary lemmas are related to the predicate admissible and to the sizes of the
relevant subsets of the set rcvMsg.
Lemma 9 If v is admissible in rd with degree a, then Σµrd,v,a contains at least t + 1
servers.
Proof. By Definition 1 - 6 and inequalities a ≤ R + 1 and R < S/t − 2, we have
|Σµrd,v,a| ≥ S − at > (R + 2)t − (R + 1)t = t. □
Lemma 10 Assume that maxVrd is admissible with degree a ∈ [1, R + 1] in some read rd and that
a complete read rd′ follows rd. Then the number of servers in Σµrd,v,a ∩ Σrd′ is at least S − (a + 1)t.
Moreover, Σµrd,v,a ∩ Σrd′ contains at least one server.
Proof. Since |Σµrd,v,a| = |µrd,v,a| ≥ S − at and |Σrd′| = S − t, it follows that |Σµrd,v,a ∩ Σrd′| ≥
S − (a + 1)t. Moreover, since a ∈ [1, R + 1] and t < S/(R + 2), we have S − (a + 1)t ≥ 1. □
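The arithmetic behind Lemmas 9 and 10 can be checked numerically. The following sketch is only a sanity check under the stated precondition t < S/(R + 2); the function name and the swept parameter ranges are hypothetical choices for illustration.

```python
def bounds_hold(S: int, t: int, R: int) -> bool:
    """Check the Lemma 9 and Lemma 10 bounds for every degree a in [1, R+1]."""
    if not t * (R + 2) < S:  # integer form of t < S/(R+2)
        raise ValueError("precondition t < S/(R+2) is violated")
    return all(
        S - a * t > t             # Lemma 9: more than t servers, i.e. at least t+1
        and S - (a + 1) * t >= 1  # Lemma 10: the intersection is non-empty
        for a in range(1, R + 2)
    )

# Sweep a small, arbitrary parameter range that satisfies the precondition.
checked = [
    bounds_hold(S, t, R)
    for S in range(1, 30)
    for t in range(0, S)
    for R in range(0, 10)
    if t * (R + 2) < S
]
print(all(checked), len(checked))
```

The worst case is a = R + 1, where both bounds reduce to S > (R + 2)t, which is exactly the precondition.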
Algorithm 1: Client logic
At each writer wi

procedure initialization:
    ts ← 0
procedure write()
    send (read, maxTS) to all servers
    wait until READACKs are received from S − t servers
    maxTS ← Max { ts in READACKs }
    ts ← maxTS + 1
    val ← (ts, wi)
    send (write, val) to all servers
    wait until WRITEACKs are received from S − t servers
    return OK

At each reader ri

procedure initialization:
    valQueue ← (0,⊥)
procedure read()
    send (read, valQueue) to all servers
    wait until READACKs are received from S − t servers
    rcvMsg ← { m | ri received READACK m }
    valQueue ← (⋃_{v∈rcvMsg} v) ∪ valQueue
    maxV ← Max { (ts,wid) in rcvMsg }
    while true do
        if ∃ a ∈ [1, R + 1] : admissible(maxV, rcvMsg, a) then
            return maxV
        else
            remove maxV from all msg in rcvMsg
            maxV ← Max { (ts,wid) in rcvMsg }
        end

admissible(v, Msg, a) ≡ ∃µ ⊆ Msg : (∀m ∈ µ : m has v) ∧ (|µ| ≥ S − at) ∧
    (|⋂_{m′∈µ} m′.updated| ≥ a)
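The predicate admissible can be evaluated without enumerating every subset µ: it suffices to look for a group of a clients that appears in the updated sets of at least S − at messages carrying v. The sketch below rests on assumptions that are not in the paper: each READACK is modeled as a dict from value to the updated set for that value, S and t are passed explicitly, and µ is required to be non-empty.

```python
from itertools import combinations

def admissible(v, msgs, a, S, t):
    """True if some non-empty subset of msgs, all carrying v, has size
    >= S - a*t and at least `a` clients common to all its updated sets."""
    # Updated sets for v, one per message that carries v (assumed encoding).
    carrying = [m[v] for m in msgs if v in m]
    need = max(S - a * t, 1)  # assumption: the witness set must be non-empty
    clients = set().union(*carrying)
    for group in combinations(clients, a):
        g = set(group)
        # The messages whose updated set contains `group` form the largest
        # candidate subset whose common updated set includes these a clients.
        if sum(1 for updated in carrying if g <= updated) >= need:
            return True
    return False

# Hypothetical run with S = 5 servers, t = 1, and replies from S - t = 4 servers.
v = (1, "w1")
msgs = [
    {v: {"w1", "r1"}},
    {v: {"w1", "r1"}},
    {v: {"w1", "r1"}},
    {(0, ""): set()},  # a server that has not yet seen v
]
print(admissible(v, msgs, 1, 5, 1), admissible(v, msgs, 2, 5, 1))
```

In this example v is carried by only three messages, so degree 1 (which needs four) fails, while degree 2 (which needs three messages sharing two clients) succeeds.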
Algorithm 2: Server logic
At each server si

procedure initialization:
    vali ← (0,⊥)
    valuevector ← {vali, updated}
    valuevector[vali].updated ← ∅
procedure update(val, c)
    if val > vali then
        valuevector ← valuevector ∪ {val, updated}
        valuevector[val].updated ← { c }
        vali ← val
    else
        valuevector[val].updated ← valuevector[val].updated ∪ { c }
    end
upon receive (write, val) from writer wk
    update(val, wk)
    send WRITEACK to wk
upon receive (read, valQueue) from reader rj
    update(val, rj) for all val in valQueue
    send READACK to rj
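The server logic of Algorithm 2 can be sketched in Python as follows. Several details are assumptions made for illustration and are not stated in the pseudocode: values are (timestamp, writer id) tuples compared lexicographically, the empty string stands in for ⊥, a READACK carries a copy of the whole valuevector, and an entry is created on demand when a smaller value is seen for the first time.

```python
class Server:
    """Illustrative model of one server si from Algorithm 2."""

    def __init__(self):
        self.val = (0, "")  # (timestamp, writer id); "" stands in for ⊥
        self.valuevector = {self.val: set()}  # value -> updated set

    def update(self, val, c):
        if val > self.val:
            # A larger value starts with a fresh updated set and becomes current.
            self.valuevector[val] = {c}
            self.val = val
        else:
            # Assumption: create the entry if this value was never stored.
            self.valuevector.setdefault(val, set()).add(c)

    def on_write(self, val, writer):
        self.update(val, writer)
        return "WRITEACK"

    def on_read(self, val_queue, reader):
        for val in val_queue:
            self.update(val, reader)
        # Assumption: the READACK carries a snapshot of the valuevector.
        return {val: set(updated) for val, updated in self.valuevector.items()}

# Hypothetical run: one write by w1, then a read by r1 that echoes both values.
s = Server()
s.on_write((1, "w1"), "w1")
ack = s.on_read([(0, ""), (1, "w1")], "r1")
print(s.val, ack)
```

Recording the reader in the updated set before replying is what lets a later read find one more client in the common updated set, which is the step used in case (i) of the proof of Lemma 8.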
