The cost of concurrent, low-contention Read&Modify&Write  by Busch, Costas et al.
Theoretical Computer Science 333 (2005) 373–400
The cost of concurrent, low-contention
Read&Modify&Write 
Costas Busch (a,1), Marios Mavronicolas (b,∗,2), Paul Spirakis (c,d)
(a) Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
(b) Department of Computer Science, University of Cyprus, Nicosia CY-1678, Cyprus
(c) Department of Computer Engineering and Informatics, University of Patras, Rion, 265 00 Patras, Greece
(d) Research and Academic Computer Technology Institute, P.O. Box 1122, 261 10 Patras, Greece
Received 8 October 2003; received in revised form 25 March 2004; accepted 15 April 2004
Abstract
This work addresses the possibility or impossibility, and the corresponding costs, of devising
concurrent, low-contention implementations of atomic Read&Modify&Write (or RMW) operations
in a distributed system. A natural class of monotone RMW operations associated with monotone
groups, a certain class of algebraic groups introduced here, is considered. The popular Fetch&Add
and Fetch&Multiply operations are examples from the class.
A Monotone Linearizability Lemma is proved and employed as a chief combinatorial instrument in
this work; it establishes inherent ordering constraints of linearizability for a certain class of executions
of any distributed system implementing a monotone RMW operation.
A preliminary version of this work appears in the Proceedings of the 10th International Colloquium on
Structural Information and Communication Complexity (Umeå, Sweden, June 2003), J.F. Sibeyn ed., pp. 57–72,
Proceedings in Informatics 17, Carleton Scientific, 2003. This work has been partially supported by the IST
Program of the European Union under contract numbers IST-1999-14186 (ALCOM-FT) and IST-2001-33116
(FLAGS), by funds from the Joint Program of Scientiﬁc and Technological Collaboration between Greece and
Cyprus, by the Greek General Secretariat for Research and Technology, and by research funds from Rensselaer
Polytechnic Institute and University of Cyprus.
(∗) Corresponding author.
E-mail addresses: buschc@cs.rpi.edu (C. Busch), mavronic@ucy.ac.cy (M. Mavronicolas), spirakis@cti.gr
(P. Spirakis).
(1) Part of the work of this author was performed while visiting the Department of Computer Science, University of
Cyprus.
(2) Part of the work of this author was performed while visiting the Faculty of Computer Science, Electrical
Engineering and Mathematics, University of Paderborn.
0304-3975/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.tcs.2004.04.018
The end results of this work specifically apply to implementations of (monotone) RMW operations
that are based on switching networks, a recent class of concurrent, low-contention data structures that
generalize counting networks (J. ACM 41(5) (1994) 1020–1048) (which implemented the traditional
Fetch&Increment operation). These results are negative; they are shown through the Monotone
Linearizability Lemma. In particular, the ﬁrst lower bounds on size (the number of switches in the
network) for any (non-trivial) switching network implementing a monotone RMW operation are
derived. It is proven that if the network incurs low contention, then its size must be inﬁnite, no
matter whether the number of states of each switch is ﬁnite or inﬁnite. Since Fetch&Increment is
implementable with counting networks of ﬁnite-size (J. ACM 41(5) (1994) 1020–1048), these lower
bounds imply a space complexity separation between Fetch&Increment and any monotone RMW
operation in the model of switching networks.
The presented lower bounds provide a mathematical explanation for the observed inability of
researchers over the last thirteen years to extend counting networks, while keeping their ﬁnite-size,
high-concurrency and low-contention, in order to perform tasks more complex than Fetch&Increment
but yet as simple as Fetch&Add.
Keywords: Distributed computing; Synchronization; Linearizability; Monotone Linearizability Lemma;
Switching networks; Lower bounds
1. Introduction
1.1. Background, motivation and framework
A Read&Modify&Write shared variable or register [8,11], henceforth abbreviated as
RMW, is an abstract variable type that allows reading its old value, computing via some
speciﬁc operator a new value as a function of the old one, and writing the new value back,
all in a single, atomic (indivisible) RMW operation. For example, a Fetch&Increment
register provides an operation that atomically adds one to its value and returns its prior
value; a Fetch&Add register provides an operation that adds any arbitrary integer to its
value and returns its prior value, while a Fetch&Multiply register does a corresponding
thing for multiplication.
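As a minimal illustration of these three registers, the following Python sketch models a RMW register whose atomicity comes from a single lock. The class name `RMWRegister` and its method `rmw` are hypothetical names chosen for this example; they do not appear in the paper. A lock-based register is the centralized, high-contention baseline, not one of the low-contention data structures studied here.

```python
import threading
from fractions import Fraction


class RMWRegister:
    """Hypothetical lock-based Read&Modify&Write register (illustration only)."""

    def __init__(self, initial, operator):
        self._value = initial          # current register value
        self._operator = operator      # binary operator applied on each RMW
        self._lock = threading.Lock()  # makes each RMW operation indivisible

    def rmw(self, argument):
        """Atomically set value to operator(value, argument); return the prior value."""
        with self._lock:
            old = self._value
            self._value = self._operator(old, argument)
            return old


# Fetch&Add: the operator is integer addition, starting from the identity 0.
# Fetch&Increment is the special case where the argument is always 1.
fetch_and_add = RMWRegister(0, lambda x, y: x + y)
# Fetch&Multiply: the operator is multiplication, starting from the identity 1.
fetch_and_multiply = RMWRegister(Fraction(1), lambda x, y: x * y)

add_results = [fetch_and_add.rmw(a) for a in (3, 5, 1)]
mul_results = [fetch_and_multiply.rmw(a) for a in (2, 3, 4)]
```

Each call returns the value held before the update, so the sequence of returned values records the order in which the operations took effect.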
Most RMW operations provide strong synchronization primitives that allow for the de-
sign of efﬁcient and transparent algorithms in the asynchronous shared memory model of
distributed computation. So, it is desirable to devise suitable distributed data structures for
the construction of highly concurrent, low-contention implementations of RMW registers.
Intuitively, the contention of an implementation measures the extent to which concurrent
processes access the same memory location simultaneously; it has been argued that con-
tention is a critical factor for the overall efﬁciency of (asynchronous) shared memory algo-
rithms (see, e.g., [4] and references therein). A counting network [2] is a particular class of
ﬁnite-size distributed data structures used to construct high-concurrency and low-contention
implementations of RMW registers that simultaneously support the Fetch&Increment and
Fetch&Decrement operations [1].
The fundamental question that has motivated this work is the possibility or impos-
sibility, and the corresponding incurred costs, of devising distributed data structures to
construct highly concurrent, low-contention implementations of general RMW registers.
In particular, is there, and at what costs, a generalization of counting networks to imple-
ment the general RMW operation while still retaining the nice properties of ﬁnite-size and
low-contention?
We focus on a specific class of RMW operations whose operators correspond to a certain
class of algebraic groups introduced and studied here, which we call monotone groups. A
monotone group has a total order and a monotone subdomain; the latter enjoys a signiﬁ-
cant monotonicity property, which we call Monotonicity under Composition: applying the
operator on an element from the monotone subdomain results in another element in the
monotone subdomain that strictly dominates the ﬁrst with respect to the total order. For
example, the Fetch&Add operation clearly falls into the context of monotone groups; so
also does the Fetch&Multiply operation, and so on. A monotone RMW operation is one
that is associated with a monotone group.
We consider switching networks [6,7], a class of distributed data structures that may be
used for concurrent, low-contention implementations of RMW registers; these are natu-
ral generalizations of counting networks [2]. Roughly speaking, a switching network is a
directed, acyclic graph made up of nodes called switches and output registers, and edges
called wires. A process issuing a RMW operation shepherds a token through the network;
the token traverses a path of switches till it is eventually returned a value upon exiting the
network. The size of a switching network is the total number of switches in it; its concur-
rency is the maximum number of concurrent processes that may simultaneously shepherd
a token through the network.
In order to model the low-contention property for switching networks, we introduce
register bottleneck and switch bottleneck; roughly speaking, both measure the minimum
number of network elements (either output registers or switches) that are accessed by
processes in any inﬁnite execution. Intuitively, if this number is small, some element will
become a bottleneck in some inﬁnite execution, and the network incurs high contention;
hence, a switching network is low-contention if register bottleneck and switch bottleneck
are sufﬁciently large.
1.2. Contribution and signiﬁcance
Our chief combinatorial instrument is a Monotone Linearizability Lemma
(Proposition 5.1). This establishes inherent ordering constraints of linearizability [10] for
a certain class of executions of any distributed system that implements a monotone RMW
operation. Recall that an execution is linearizable [10] if the values returned to operations
respect their real-time ordering.
The end results of our study are negative; they are shown through a modular use of the
Monotone Linearizability Lemma. These results are the ﬁrst lower bounds on size for any
highly concurrent, low-contention switching network that implements a monotone RMW
operation. For any such switching network (other than the trivial single-switch one), we
prove:
• If each switch has a ﬁnite number of states, then the network must contain an inﬁnite
number of switches, even if concurrency is restricted to remain bounded (Theorem 6.1).
• If each switch has an inﬁnite number of states, then the network must still contain
an inﬁnite number of switches if concurrency is now allowed to grow unbounded
(Theorem 6.2).
Our impossibility results settle in the negative the general question about the possibility
of devising distributed, low-contention data structures of finite size, as suitable extensions
to counting networks, to support synchronization operations other than Fetch&Increment
(originally supported by counting networks). This question was already stated in the seminal
work of Aspnes et al. [2] that introduced counting networks; however, it has remained tantalizingly
open, and progress on it has been so far limited to discovering that counting networks
themselves can also support Fetch&Decrement (simultaneously with Fetch&Increment)
[1]. Our results imply a space complexity separation between Fetch&Increment and any
monotone RMW operation in the model of switching networks.
In summary, our lower bounds imply that we cannot conveniently generalize counting
networks, while still retaining their ﬁnite-size, high-concurrency and low-contention, in
order to perform tasks more complex than just incrementing a counter by one but yet
as simple as adding an arbitrary value to a counter. Thus, our lower bounds provide a
mathematical explanation for the observed inability of researchers in the last thirteen years
or so (since the original conference publication of counting networks [2] in STOC 1991) to
achieve such generalizations.
Finally, we remark that linearizability has so far been studied as a required property
for a distributed system that best guarantees acceptable concurrent behavior. To the best
of our knowledge, our work is the ﬁrst to provide, through the Monotone Linearizability
Lemma, a (non-trivial) instance of a distributed system where linearizability is an inherent
property.
1.3. Related work and comparison
A particular switching network, called Read–Modify–Write network, is given in
[7, Section 4] that implements any general class of commutative functions; Fetch&Add
and Fetch&Multiply are two particular examples of such classes. This Read–Modify–Write
network contains an inﬁnite number of switches, and it has the same topology as a corre-
sponding linearizable counting network presented in [9]. The latency (maximum number of
switches traversed by a token) of this network is shown to be O(n) [7, Theorem 4.14], while
a corresponding lower bound of Ω(n) is also shown in [7, Theorem 3.2] for any general class
of functions with certain functional properties; this family encompasses both Fetch&Add
and Fetch&Multiply as special cases. In contrast, we deal, in this work, exclusively with
the size of switching networks.
A counting network is linearizable [9] if the values returned to tokens respect their real-
time orderings. Herlihy et al. [9, Theorem 5.1] show that any non-trivial (non-blocking)
linearizable counting network must have inﬁnite size. The structure of the proofs of our
impossibility results is inspired by that of the proof of [9, Theorem 5.1]. The requirement that
all executions be linearizable allows that proof to pick any arbitrary execution of choice
and force it to violate linearizability. In contrast, a switching network for a monotone RMW
operation need not guarantee linearizability in all executions; the role of the Monotone
Linearizability Lemma is to contribute executions that are necessarily linearizable.
1.4. Road map
Section 2 introduces monotone groups. Deﬁnitions for the model of a distributed sys-
tem appear in Section 3. Section 4 provides a framework for switching networks. The
Monotone Linearizability Lemma is the subject of Section 5. Lower bounds on the size of
switching networks implementing monotone groups are shown in Section 6. We conclude in
Section 7.
2. Monotone groups
In this section, we introduce and study monotone groups. We assume familiarity of the
reader with the very basic concepts from Group Theory, such as a group 〈I,⊕〉 and an
Abelian group. Denote e the identity element of the group 〈I,⊕〉. An elementary property
of groups that will be used in some of our later proofs is the Cancellation Law. It states that for
any group 〈I,⊕〉, for any triple of elements a, b, c ∈ I, a⊕b = a⊕c (resp., b⊕a = c⊕a)
implies b = c.
Throughout this section (and in the rest of the paper), denote $\mathbb{Z}$, $\mathbb{N}$ and $\mathbb{Q}$ the sets of integers,
natural numbers (including zero), and rational numbers (excluding zero), respectively.
We will use $+$ and $\cdot$ to denote the common (binary) operators of addition and multiplication,
respectively, on these sets. Denote $\leq$ the less-than-or-equal relation (total order) on these
sets.
Some composite operators are introduced in Section 2.1. Section 2.2 provides the basic
deﬁnitions for monotone groups. Section 2.3 treats n-wise independence.
2.1. Composite operators
We define two composite operators by applying the operator $\oplus$ a number of times. For
any integer $k$, define the unary operator $\bigoplus_k : I \to I$ as follows:
\[
\bigoplus_k a =
\begin{cases}
\underbrace{a \oplus a \oplus \cdots \oplus a}_{k \text{ times}} & \text{if } k > 0, \\
e & \text{if } k = 0, \\
\underbrace{a^{-1} \oplus a^{-1} \oplus \cdots \oplus a^{-1}}_{-k \text{ times}} & \text{if } k < 0.
\end{cases}
\]
Call $\bigoplus_k$ the power operator. It follows that for any element $a \in I$ and integer $k$,
$\bigoplus_k a = \bigoplus_{-k} a^{-1}$. We continue to state two elementary properties of the power operator that will
be used later; their proofs are omitted as straightforward.
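Property 2.1 (Superposition of powers). For any Abelian group $\langle I, \oplus \rangle$, fix any element
$a \in I$. Then, for any sequence of integers $k_1, k_2, \ldots, k_n$,
\[
\left( \bigoplus_{k_1} a \right) \oplus \left( \bigoplus_{k_2} a \right) \oplus \cdots \oplus \left( \bigoplus_{k_n} a \right) = \bigoplus_{\sum_{i=1}^{n} k_i} a.
\]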
Property 2.2 (Composition of powers). For any group $\langle I, \oplus \rangle$, fix any element $a \in I$.
Then, for any integer $k$ and natural number $l$, $\bigoplus_k \left( \bigoplus_l a \right) = \bigoplus_{k \cdot l} a$.
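The power operator and its two properties can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `power`, `int_power` and `rat_power` are hypothetical, the group is passed in as plain Python callables, and the two instances shown are the integers under addition and the non-zero rationals under multiplication.

```python
from fractions import Fraction


def power(k, a, op, inverse, identity):
    """Power operator: compose |k| copies of a (k > 0) or of its inverse (k < 0)."""
    base = a if k >= 0 else inverse(a)
    result = identity  # k = 0 yields the identity element
    for _ in range(abs(k)):
        result = op(result, base)
    return result


def int_power(k, a):
    """Integers under addition: identity 0, inverse is negation."""
    return power(k, a, lambda x, y: x + y, lambda x: -x, 0)


def rat_power(k, a):
    """Non-zero rationals under multiplication: identity 1, inverse is the reciprocal."""
    return power(k, Fraction(a), lambda x, y: x * y, lambda x: 1 / x, Fraction(1))
```

For instance, `int_power(2, 7) + int_power(-1, 7) + int_power(2, 7)` agrees with `int_power(3, 7)` (Property 2.1), and `rat_power(-2, rat_power(3, 2))` agrees with `rat_power(-6, 2)` (Property 2.2).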
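For any integer $n$, the operator $\biguplus_n : I^n \to I$ is $n$-ary.
• For $n = 0$, it assumes the constant value $\biguplus_0 = e$.
• For $n = 1$, $\biguplus_1 \{a\} = a$ for all elements $a \in I$. For $n = -1$, $\biguplus_{-1} \{a\} = a^{-1}$.
• For $|n| \geq 2$, $\biguplus_n$ takes as input an ordered multiset $\{a_1, a_2, \ldots, a_{|n|}\}$ of elements of $I$,
and it yields the result
\[
\biguplus_n \{a_1, a_2, \ldots, a_{|n|}\} =
\begin{cases}
a_1 \oplus a_2 \oplus \cdots \oplus a_{|n|} & \text{if } n \geq 2, \\
a_1^{-1} \oplus a_2^{-1} \oplus \cdots \oplus a_{|n|}^{-1} & \text{if } n \leq -2,
\end{cases}
\]
denoted also as $\biguplus_{i=1}^{n} a_i$. Note that, by associativity, the result of applying the operator
is well defined.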
Call $\biguplus_n$ the summation operator. Our definitions for the power and summation operators
immediately imply that for any element $a \in I$ and for any integer $n \neq 0$,
\[
\bigoplus_n a =
\begin{cases}
\biguplus_n \{ \underbrace{a, a, \ldots, a}_{n \text{ times}} \} & \text{if } n > 0, \\
\biguplus_n \{ \underbrace{a, a, \ldots, a}_{-n \text{ times}} \} & \text{if } n < 0,
\end{cases}
\]
since, for negative $n$, the summation operator itself inverts each of its inputs.
So, roughly speaking, the power operator is a special case of the summation operator
where all inputs are identical. The result $\biguplus_n \{a_1, a_2, \ldots, a_n\}$ of the summation operator
will sometimes be called a composite expression.
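A small sketch of the summation operator may help fix the sign convention. The function names `summation` and `rat_summation` are hypothetical; the group is again supplied as plain callables, and the instance shown uses the non-zero rationals under multiplication.

```python
from fractions import Fraction


def summation(n, elements, op, inverse, identity):
    """Summation operator: compose the |n| inputs (n > 0) or their inverses (n < 0)."""
    if len(elements) != abs(n):
        raise ValueError("expected exactly |n| input elements")
    result = identity  # the empty composition (n = 0) yields the identity
    for x in elements:
        result = op(result, x if n > 0 else inverse(x))
    return result


def rat_summation(n, elements):
    """Hypothetical instance for the non-zero rationals under multiplication."""
    return summation(n, [Fraction(x) for x in elements],
                     lambda x, y: x * y, lambda x: 1 / x, Fraction(1))
```

With positive $n$ the inputs are composed as given, so `rat_summation(3, [2, 3, 4])` is 24; with negative $n$ each input is inverted first, so `rat_summation(-2, [2, 4])` is 1/8.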
2.2. Monotone groups
Assume now that the set $I$ is totally ordered; thus, a total order $\preceq$ is defined on $I$. For
any pair of elements $a, b \in I$, write $a \prec b$ (and, equivalently, $b \succ a$) if $a \preceq b$ and $a \neq b$.
A monotone subdomain of $I$ is a subset $M \subseteq I$ that satisfies the following three
properties:
1. Closure: For any two elements $a, b \in M$, $a \oplus b \in M$.
2. Identity Lower Bound: For any element $a \in M$, $e \prec a$.
3. Monotonicity under Composition: For any pair of elements $a, b \in M$, both $a \prec a \oplus b$
and $b \prec a \oplus b$.
Notice that the Identity Lower Bound property implies that $e \notin M$, so that $M \subset I$. Notice
also that the Monotonicity under Composition property implies that $M$ is necessarily infinite.
A monotone group is a quadruple $\langle I, M, \oplus, \preceq \rangle$, where $\langle I, \oplus \rangle$ is an Abelian group, $\preceq$ is a
total order on $I$, and $M$ is a monotone subdomain of $I$.
We encourage the reader to verify that both quadruples $\langle \mathbb{Z}, \mathbb{N} \setminus \{0\}, +, \leq \rangle$ (called Integers
with Addition) and $\langle \mathbb{Q}, \mathbb{N} \setminus \{0, 1\}, \cdot, \leq \rangle$ (called Rationals with Multiplication) are monotone
groups. They are associated with the monotone Fetch&Add and Fetch&Multiply
operations, respectively. There follows an elementary, non-idempotency property of monotone
groups that will be used in some of our later proofs.
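The verification left to the reader can be approximated by a finite spot check. The sketch below tests the three monotone-subdomain properties on every pair drawn from a finite sample; it is not a proof, and the function name `satisfies_monotone_properties` and the sample ranges are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product


def satisfies_monotone_properties(sample, op, identity, in_subdomain):
    """Check the three monotone-subdomain properties on every pair from a finite sample."""
    for a, b in product(sample, repeat=2):
        c = op(a, b)
        if not in_subdomain(c):        # Closure
            return False
        if not (identity < a):         # Identity Lower Bound
            return False
        if not (a < c and b < c):      # Monotonicity under Composition
            return False
    return True


# Integers with Addition: M is the set of positive integers.
integers_ok = satisfies_monotone_properties(
    range(1, 30), lambda x, y: x + y, 0,
    lambda x: isinstance(x, int) and x >= 1)

# Rationals with Multiplication: M is the set of integers greater than 1.
rationals_ok = satisfies_monotone_properties(
    [Fraction(m) for m in range(2, 30)], lambda x, y: x * y, Fraction(1),
    lambda x: x.denominator == 1 and x >= 2)

# Counterexample: admitting 1 into the multiplicative subdomain breaks the properties.
with_one_ok = satisfies_monotone_properties(
    [Fraction(m) for m in range(1, 30)], lambda x, y: x * y, Fraction(1),
    lambda x: x.denominator == 1 and x >= 1)
```

The third check fails because the identity 1 is not strictly below itself, which is why 1 is excluded from the monotone subdomain of Rationals with Multiplication.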
Property 2.3 (No idempotent power). For any arbitrary monotone group $\langle I, M, \oplus, \preceq \rangle$,
fix any element $a \in M$. Then, for any integer $k$, $\bigoplus_k a = e$ implies $k = 0$.
The proof of Property 2.3 is straightforward; it is left as an exercise for the reader. We
only remark that Property 2.3 does not necessarily hold for a general group; so, it is no
coincidence that its proof relies on using the Monotonicity under Composition property that
holds speciﬁcally for monotone groups.
2.3. n-Wise independence
Fix any integer $n \geq 2$. Consider any $n$ distinct elements $a_1, a_2, \ldots, a_n \in I$ with $a_1,
a_2, \ldots, a_n \neq e$. Say that $a_1, a_2, \ldots, a_n$ are n-wise independent over $\langle I, \oplus \rangle$ if for any
sequence of $n$ integers $k_1, k_2, \ldots, k_n$, where $-1 \leq k_i \leq 2$ for $1 \leq i \leq n$, that are not all
simultaneously zero, $\biguplus_{i=1}^{n} \bigoplus_{k_i} a_i \neq e$. Say that the monotone group $\langle I, M, \oplus, \preceq \rangle$ is n-wise
independent if there are $n$ distinct elements $a_1, a_2, \ldots, a_n \in M$, with $a_1, a_2, \ldots, a_n \neq e$,
that are n-wise independent over $\langle I, \oplus \rangle$.
From the definition of n-wise independence, $n$ integers $a_1, a_2, \ldots, a_n \in \mathbb{Z}$, where $n \geq 2$,
are n-wise independent over $\langle \mathbb{Z}, + \rangle$ if for any sequence of $n$ integers $k_1, k_2, \ldots, k_n \in
\{-1, 0, 1, 2\}$, which are not all simultaneously zero, $\sum_{i=1}^{n} k_i \cdot a_i \neq 0$. We prove:
Lemma 2.4. For any integer $n \geq 2$, the monotone group $\langle \mathbb{Z}, \mathbb{N} \setminus \{0\}, +, \leq \rangle$ is n-wise
independent.
Proof. Fix any integer $\ell \geq 0$. Consider the $n$ natural numbers $2^{\ell}, 2^{\ell+2}, \ldots, 2^{\ell+2(n-1)} \in
\mathbb{N} \setminus \{0\}$, which are powers of two; we will prove that these $n$ natural numbers are n-wise
independent over $\langle \mathbb{Z}, + \rangle$. The proof is by induction on $n$.
For the basis case where $n = 2$, consider the natural numbers $2^{\ell}$ and $2^{\ell+2}$. Fix any
pair of integers $k_1, k_2 \in \{-1, 0, 1, 2\}$ that are not both simultaneously zero. Clearly,
$k_1 2^{\ell} + k_2 2^{\ell+2} = 2^{\ell}(k_1 + 4 k_2)$, which can be zero only if $k_1 = k_2 = 0$. So, the natural numbers
$2^{\ell}, 2^{\ell+2} \in \mathbb{N} \setminus \{0\}$ are 2-wise independent over $\langle \mathbb{Z}, + \rangle$. Hence, the monotone group
$\langle \mathbb{Z}, \mathbb{N} \setminus \{0\}, +, \leq \rangle$ is 2-wise independent. This completes the proof of the basis case.
Assume inductively that the $n - 1$ natural numbers $2^{\ell}, 2^{\ell+2}, \ldots, 2^{\ell+2((n-1)-1)} =
2^{\ell+2(n-2)} \in \mathbb{N} \setminus \{0\}$ are (n − 1)-wise independent over $\langle \mathbb{Z}, + \rangle$.
For the induction step, we will show that the $n$ natural numbers $2^{\ell}, 2^{\ell+2}, \ldots, 2^{\ell+2(n-1)}$ are
n-wise independent over $\langle \mathbb{Z}, + \rangle$. Assume, by way of contradiction, that they are not. Thus, there
exist $n$ integers $k_1, k_2, \ldots, k_n \in \{-1, 0, 1, 2\}$ which are not all simultaneously zero, such
that $\sum_{i=1}^{n} k_i 2^{\ell+2(i-1)} = 0$. We proceed by case analysis on the value of $k_n \in \{-1, 0, 1, 2\}$.
• Assume first that $k_n = -1$. Then, $\sum_{i=1}^{n-1} k_i 2^{\ell+2(i-1)} - 2^{\ell+2(n-1)} = 0$, or
$\sum_{i=1}^{n-1} k_i 2^{\ell+2(i-1)} = 2^{\ell+2(n-1)}$, or $\sum_{i=1}^{n-1} k_i 2^{2(i-1)} = 2^{2(n-1)}$. However, since $k_i \leq 2$ for all indices
$i$, $1 \leq i \leq n - 1$,
\[
\sum_{i=1}^{n-1} k_i 2^{2(i-1)} \leq 2 \sum_{i=1}^{n-1} 2^{2(i-1)} < 2 \cdot \sum_{i=0}^{2n-4} 2^{i} = 2 \left( 2^{2n-3} - 1 \right) < 2^{2n-2} = 2^{2(n-1)},
\]
a contradiction.
• Assume now that $k_n = 0$. Then, $\sum_{i=1}^{n-1} k_i 2^{\ell+2(i-1)} = 0$. Since the integers $k_1, k_2, \ldots, k_n$
are not all simultaneously zero while $k_n = 0$, it follows that the integers $k_1, k_2, \ldots, k_{n-1}$
are not all simultaneously zero. This implies that the $n - 1$ natural numbers $2^{\ell}, 2^{\ell+2}, \ldots,
2^{\ell+2(n-2)}$ are not (n − 1)-wise independent over $\langle \mathbb{Z}, + \rangle$, which contradicts the induction
hypothesis.
• Assume finally that $k_n \in \{1, 2\}$. Then, $\sum_{i=1}^{n-1} k_i 2^{\ell+2(i-1)} + k_n \cdot 2^{\ell+2(n-1)} = 0$, or,
equivalently, $-\sum_{i=1}^{n-1} k_i 2^{\ell+2(i-1)} = k_n \cdot 2^{\ell+2(n-1)}$, or $-\sum_{i=1}^{n-1} k_i 2^{2(i-1)} = k_n \cdot 2^{2(n-1)}$.
However, since $k_i \geq -1$ for all indices $i$, $1 \leq i \leq n - 1$,
\[
-\sum_{i=1}^{n-1} k_i 2^{2(i-1)} \leq \sum_{i=1}^{n-1} 2^{2(i-1)} < \sum_{i=0}^{2n-4} 2^{i} = 2^{2n-3} - 1 < 2^{2n-2} = 2^{2(n-1)} \leq k_n \cdot 2^{2(n-1)},
\]
a contradiction.
Since we obtained a contradiction in all possible cases, the proof is now complete.
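For small $n$, the statement just proved can also be confirmed by exhaustive search over all coefficient sequences. The sketch below does exactly that; the function names `n_wise_independent` and `lemma_2_4_witness`, and the parameter name `ell` for the non-negative exponent offset fixed at the start of the proof, are hypothetical names chosen for this illustration.

```python
from itertools import product

COEFFICIENTS = (-1, 0, 1, 2)  # admissible values of each k_i in the definition


def n_wise_independent(values):
    """Brute-force test of n-wise independence over the integers under addition."""
    for ks in product(COEFFICIENTS, repeat=len(values)):
        if any(ks) and sum(k * a for k, a in zip(ks, values)) == 0:
            return False  # a non-trivial combination collapses to the identity 0
    return True


def lemma_2_4_witness(n, ell=0):
    """The n powers of two whose exponents are ell, ell + 2, ..., ell + 2(n - 1)."""
    return [2 ** (ell + 2 * i) for i in range(n)]
```

The spacing of two between consecutive exponents matters: the consecutive powers 1 and 2 are not 2-wise independent, because 2·1 + (−1)·2 = 0.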
We ﬁnally prove that every monotone group is n-wise independent.
Lemma 2.5 (Every monotone group is n-wise independent). For any integer $n \geq 2$, the
monotone group $\langle I, M, \oplus, \preceq \rangle$ is n-wise independent.
Proof. Since the monotone group $\langle \mathbb{Z}, \mathbb{N} \setminus \{0\}, +, \leq \rangle$ is n-wise independent (Lemma 2.4),
there exist $n$ distinct natural numbers $l_1, l_2, \ldots, l_n \in \mathbb{N} \setminus \{0\}$ that are n-wise independent
over $\langle \mathbb{Z}, + \rangle$. Fix any element $a \in M$ and consider the $n$ elements $\bigoplus_{l_1} a, \bigoplus_{l_2} a, \ldots, \bigoplus_{l_n} a$
of $M$. Clearly, by the Monotonicity under Composition property of the monotone group
$\langle I, M, \oplus, \preceq \rangle$, these $n$ elements are distinct. We will prove that they are also n-wise
independent over $\langle I, \oplus \rangle$.
Assume, by way of contradiction, that the elements $\bigoplus_{l_1} a, \bigoplus_{l_2} a, \ldots, \bigoplus_{l_n} a$ are not
n-wise independent over $\langle I, \oplus \rangle$. Thus, there exist $n$ integers $k_1, k_2, \ldots, k_n \in \{-1, 0, 1, 2\}$,
which are not all simultaneously zero, such that
\[
\biguplus_{i=1}^{n} \left( \bigoplus_{k_i} \left( \bigoplus_{l_i} a \right) \right) = e.
\]
By Property 2.2, it follows that $\biguplus_{i=1}^{n} \left( \bigoplus_{k_i \cdot l_i} a \right) = e$ which, by the definition of the summation operator,
may be written as
\[
\left( \bigoplus_{k_1 \cdot l_1} a \right) \oplus \left( \bigoplus_{k_2 \cdot l_2} a \right) \oplus \cdots \oplus \left( \bigoplus_{k_n \cdot l_n} a \right) = e.
\]
By Property 2.1, it follows that $\bigoplus_{\sum_{i=1}^{n} k_i \cdot l_i} a = e$. Property 2.3 now implies that
$\sum_{i=1}^{n} k_i \cdot l_i = 0$. Since the integers $k_i$, $1 \leq i \leq n$, are from the set $\{-1, 0, 1, 2\}$, and they are
not all simultaneously zero, this implies that the $n$ natural numbers $l_1, l_2, \ldots, l_n$ are not
n-wise independent over $\langle \mathbb{Z}, + \rangle$. A contradiction.
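The reduction in this proof can be traced on a concrete instance. The sketch below takes Rationals with Multiplication, picks the element 3 of the monotone subdomain, raises it to the exponents 1, 4 and 16 (which are 3-wise independent over the integers by Lemma 2.4), and checks by exhaustive search that the resulting elements are 3-wise independent. The function name `independent_over_rationals` and the chosen values are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product


def independent_over_rationals(elements):
    """Brute-force n-wise independence over the non-zero rationals under multiplication."""
    for ks in product((-1, 0, 1, 2), repeat=len(elements)):
        if not any(ks):
            continue  # skip the all-zero coefficient sequence
        composite = Fraction(1)
        for k, x in zip(ks, elements):
            composite *= Fraction(x) ** k  # power operator for multiplication
        if composite == 1:
            return False  # a non-trivial composite expression equals the identity
    return True


a = 3                                   # any integer greater than 1 would do
exponents = [1, 4, 16]                  # 3-wise independent over the integers
elements = [a ** l for l in exponents]  # the elements constructed in the proof
```

By contrast, the elements 2 and 4 are not 2-wise independent under multiplication, since 2² · 4⁻¹ = 1; this mirrors the dependent exponents 1 and 2.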
We remark that the proof of Lemma 2.5 employs the n-wise independence of the monotone
group $\langle \mathbb{Z}, \mathbb{N} \setminus \{0\}, +, \leq \rangle$ (which was established in Lemma 2.4) in order to conclude the
n-wise independence of the arbitrary monotone group $\langle I, M, \oplus, \preceq \rangle$. So, this proof by
reduction indicates some kind of completeness of this group for the class of all monotone
groups.
3. System model
Section 3.1 provides basic definitions for a distributed system that implements a monotone
group. Deﬁnitions related to linearizability are given in Section 3.2.
3.1. Distributed systems implementing monotone groups
Our model of a distributed system is patterned after the one in [10, Section 2]; however,
that one is adjusted in order to incorporate the issue of implementing a monotone group
〈I,M,⊕,〉.
We consider a distributed system P consisting of a collection of sequential threads of
control, called processes. Processes are sequential, and each process applies a sequence of
operations to a distributed data structure, called the object, alternately issuing an invocation
and then receiving the associated response. Each invocation at process $p_i$ has the form
$\mathrm{Invoke}_i(a)$ for some value $a \in M$; each response at process $p_i$ has the form $\mathrm{Response}_i(b)$
for some value $b \in M \cup \{e\}$.
Formally, an execution of system P is a (possibly infinite) sequence $\alpha$ of invocation and
response events. We assume that for each invocation at process $p_i$ in execution $\alpha$, there is
a later response in $\alpha$ that matches it and no invocation at $p_i$ that precedes the matching
response in $\alpha$. Prefixes and suffixes of an execution are defined in the natural way. Say that
an execution $\alpha'$ extends a prefix $\beta$ of execution $\alpha$ if $\beta$ is a prefix of $\alpha'$ as well.
An operation at process $p_i$ in execution $\alpha$ is a matching pair $op_i = [\mathrm{Invoke}_i(a),
\mathrm{Response}_i(b)]$ of an invocation and response at $p_i$; we will sometimes say that $op_i$ is
of type $a$. For such an operation, we will write $a = \mathrm{In}(op_i)$ and $b = \mathrm{Out}(op_i)$; thus, $op_i$ has
input and output $a$ and $b$, respectively. We will sometimes write $\mathrm{In}_{\alpha}(op_i)$ and $\mathrm{Out}_{\alpha}(op_i)$ in
order to emphasize reference to execution $\alpha$.
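An execution $\alpha$ induces a partial order $\xrightarrow{\alpha}$ on the set of operations in $\alpha$ as follows. For
any two operations $op_{i_1} = [\mathrm{Invoke}_{i_1}(a_1), \mathrm{Response}_{i_1}(b_1)]$ and $op_{i_2} = [\mathrm{Invoke}_{i_2}(a_2),
\mathrm{Response}_{i_2}(b_2)]$ at processes $p_{i_1}$ and $p_{i_2}$, respectively, say that $op_{i_1}$ precedes $op_{i_2}$ in
execution $\alpha$, denoted $op_{i_1} \xrightarrow{\alpha} op_{i_2}$, if the response $\mathrm{Response}_{i_1}(b_1)$ precedes the invocation
$\mathrm{Invoke}_{i_2}(a_2)$. In particular, execution $\alpha$ induces, for each process $p_i$, a total order $\xrightarrow{\alpha}_i$ on
the set of operations at $p_i$ in $\alpha$ as follows: For any two operations $op_i^{(1)}$ and $op_i^{(2)}$,
$op_i^{(1)} \xrightarrow{\alpha}_i op_i^{(2)}$ if and only if $op_i^{(1)} \xrightarrow{\alpha} op_i^{(2)}$.
If, in execution $\alpha$, operation $op_{i_1}$ does not precede operation $op_{i_2}$, then we write
$op_{i_1} \overset{\alpha}{\nrightarrow} op_{i_2}$. If simultaneously $op_{i_1} \overset{\alpha}{\nrightarrow} op_{i_2}$ and $op_{i_2} \overset{\alpha}{\nrightarrow} op_{i_1}$, then we say that $op_{i_1}$ and $op_{i_2}$ are
parallel in execution $\alpha$, denoted as $op_{i_1} \parallel_{\alpha} op_{i_2}$.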
For any execution $\alpha$ of system P, a serialization $S(\alpha)$ [5] of execution $\alpha$ is a sequence
whose elements are the operations of $\alpha$, and each operation of $\alpha$ appears exactly once in
$S(\alpha)$. Thus, a serialization $S(\alpha)$ is a total order $\xrightarrow{S(\alpha)}$ on the set of operations in $\alpha$. Notice that
there may be, in general, many possible serializations of the execution $\alpha$. Say that a serialization
$S(\alpha)$ is valid for the monotone group $\langle I, M, \oplus, \preceq \rangle$ if the following two conditions
hold:
1. Valid Start: If $op_i = [\mathrm{Invoke}_i(a), \mathrm{Response}_i(b)]$ is the first operation in $S(\alpha)$, then
$b = e$.
2. Valid Composition: For any pair of operations $op_{i_1}^{(1)} = [\mathrm{Invoke}_{i_1}(a_1), \mathrm{Response}_{i_1}(b_1)]$
and $op_{i_2}^{(2)} = [\mathrm{Invoke}_{i_2}(a_2), \mathrm{Response}_{i_2}(b_2)]$ that are consecutive in $S(\alpha)$, $b_2 = b_1 \oplus a_1$.
Sometimes we shall simply refer to a valid serialization, and avoid explicit reference to the
monotone group when such is clear from context.
Say that system P implements the monotone group $\langle I, M, \oplus, \preceq \rangle$ if every execution
$\alpha$ of P has a serialization that is valid for the monotone group. Monotone RMW operations
are those associated in the natural way with monotone groups. Say that system
P implements a (monotone) operation whenever it implements the associated monotone
group.
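The two validity conditions translate directly into a check on a sequence of completed operations. The sketch below specializes them to Integers with Addition (Fetch&Add), representing each operation as a hypothetical `(input, output)` pair; the function names are illustrative. It relies on the observation that, since every input is a positive integer, outputs strictly increase along a valid serialization, so sorting by output yields the only candidate.

```python
from typing import List, Optional, Tuple

Operation = Tuple[int, int]  # (input a, output b) of one completed operation


def is_valid_serialization(ops: List[Operation], identity: int = 0) -> bool:
    """Check the Valid Start and Valid Composition conditions for Fetch&Add."""
    if not ops:
        return True
    if ops[0][1] != identity:                    # Valid Start: first output is e
        return False
    for (a1, b1), (_, b2) in zip(ops, ops[1:]):  # Valid Composition: b2 = b1 + a1
        if b2 != b1 + a1:
            return False
    return True


def find_valid_serialization(ops: List[Operation]) -> Optional[List[Operation]]:
    """Return the valid serialization of the given operations, or None if there is none."""
    candidate = sorted(ops, key=lambda op: op[1])  # outputs must strictly increase
    return candidate if is_valid_serialization(candidate) else None
```

For example, the operations `(5, 3)`, `(3, 0)` and `(2, 8)` serialize as `(3, 0)`, `(5, 3)`, `(2, 8)`, whereas `(3, 0)` and `(5, 4)` admit no valid serialization because 0 + 3 differs from 4.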
We continue to state and prove the Unique Serialization Lemma.
Lemma 3.1 (Unique Serialization Lemma). Assume that system P implements the monotone
group $\langle I, M, \oplus, \preceq \rangle$. Then, for any execution $\alpha$ of P, there is a unique valid
serialization $S(\alpha)$.
Proof. Assume, by way of contradiction, that there are two distinct valid serializations
$S^{(1)}(\alpha) = op^{(1.1)}, op^{(1.2)}, op^{(1.3)}, \ldots$ and $S^{(2)}(\alpha) = op^{(2.1)}, op^{(2.2)}, op^{(2.3)}, \ldots$ of execution
$\alpha$. Since $S^{(1)}(\alpha)$ and $S^{(2)}(\alpha)$ are distinct, there exists a least index $k \geq 1$ such that $op^{(1.k)}$
is different from $op^{(2.k)}$. Assume, without loss of generality, that $op^{(1.k)}$ appears at position
$l > k$ in the serialization $S^{(2)}(\alpha)$; that is, $op^{(1.k)} = op^{(2.l)}$, so that, in particular,
$\mathrm{Out}(op^{(1.k)}) = \mathrm{Out}(op^{(2.l)})$. Notice finally that for each $i < k$, $op^{(1.i)} = op^{(2.i)}$.
We proceed by case analysis on the possible values of $k$.
1. Assume first that $k = 1$. Since $S^{(1)}(\alpha)$ is a valid serialization of $\alpha$ and $k = 1$, the Valid
Start condition implies that $\mathrm{Out}(op^{(1.k)}) = e$. Since $S^{(2)}(\alpha)$ is a valid serialization of $\alpha$
and $l > k = 1$, the Valid Composition condition implies that
\[
\mathrm{Out}(op^{(2.l)}) = \mathrm{Out}(op^{(2.l-1)}) \oplus \mathrm{In}(op^{(2.l-1)}).
\]
The Monotonicity under Composition property implies that $\mathrm{Out}(op^{(2.l-1)}) \oplus \mathrm{In}(op^{(2.l-1)})
\succeq \mathrm{In}(op^{(2.l-1)})$. Since $\mathrm{In}(op^{(2.l-1)}) \in M$, the Identity Lower Bound property implies
that $\mathrm{In}(op^{(2.l-1)}) \succ e$. It follows that $\mathrm{Out}(op^{(2.l)}) \succ e$. A contradiction, since
$\mathrm{Out}(op^{(2.l)}) = \mathrm{Out}(op^{(1.k)}) = e$.
2. Assume now that $k > 1$. Since $S^{(1)}(\alpha)$ is a valid serialization of $\alpha$ and $k > 1$, the Valid
Composition condition implies that
\[
\mathrm{Out}(op^{(1.k)}) = \mathrm{Out}(op^{(1.k-1)}) \oplus \mathrm{In}(op^{(1.k-1)}).
\]
Since $S^{(2)}(\alpha)$ is a valid serialization of $\alpha$ and $l > k > 1$, the Valid Composition condition
implies that
\[
\mathrm{Out}(op^{(2.l)}) = \mathrm{Out}(op^{(2.k-1)}) \oplus \mathrm{In}(op^{(2.k-1)}) \oplus \cdots \oplus \mathrm{In}(op^{(2.l-2)}) \oplus \mathrm{In}(op^{(2.l-1)}).
\]
Since $\mathrm{Out}(op^{(1.k)}) = \mathrm{Out}(op^{(2.l)})$, it follows that
\[
\mathrm{Out}(op^{(1.k-1)}) \oplus \mathrm{In}(op^{(1.k-1)}) = \mathrm{Out}(op^{(2.k-1)}) \oplus \mathrm{In}(op^{(2.k-1)}) \oplus \cdots \oplus \mathrm{In}(op^{(2.l-2)}) \oplus \mathrm{In}(op^{(2.l-1)}).
\]
Since $\mathrm{Out}(op^{(1.k-1)}) = \mathrm{Out}(op^{(2.k-1)})$ and $\mathrm{In}(op^{(1.k-1)}) = \mathrm{In}(op^{(2.k-1)})$, it follows
that
\[
\mathrm{Out}(op^{(2.k-1)}) \oplus \mathrm{In}(op^{(2.k-1)}) = \mathrm{Out}(op^{(2.k-1)}) \oplus \mathrm{In}(op^{(2.k-1)}) \oplus \mathrm{In}(op^{(2.k)}) \oplus \cdots \oplus \mathrm{In}(op^{(2.l-2)}) \oplus \mathrm{In}(op^{(2.l-1)}).
\]
By the Cancellation Law, it follows that
\[
e = \mathrm{In}(op^{(2.k)}) \oplus \cdots \oplus \mathrm{In}(op^{(2.l-2)}) \oplus \mathrm{In}(op^{(2.l-1)}).
\]
The Monotonicity under Composition property implies that
\[
\mathrm{In}(op^{(2.k)}) \oplus \cdots \oplus \mathrm{In}(op^{(2.l-2)}) \oplus \mathrm{In}(op^{(2.l-1)}) \succeq \mathrm{In}(op^{(2.k)}).
\]
Since $\mathrm{In}(op^{(2.k)}) \in M$, the Identity Lower Bound property implies that $\mathrm{In}(op^{(2.k)}) \succ e$.
It follows that
\[
\mathrm{In}(op^{(2.k)}) \oplus \cdots \oplus \mathrm{In}(op^{(2.l-2)}) \oplus \mathrm{In}(op^{(2.l-1)}) \succ e.
\]
A contradiction.
Since we obtained a contradiction in all possible cases, the proof is now complete.
We remark that the proof of Lemma 3.1 relied heavily on the required properties for a
monotone group, namely the Monotonicity under Composition and Identity Lower Bound
properties. Since these properties do not necessarily hold for a general group, the same
follows for the Unique Serialization Lemma. We conclude this section with an immediate
consequence of the Valid Start and Valid Composition conditions assumed in the deﬁnition
of implementation of a monotone group.
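Property 3.2. Assume that system P implements the monotone group $\langle I, M, \oplus, \preceq \rangle$. Then,
for any operation $op$ in an execution $\alpha$ of P,
\[
\mathrm{Out}(op) = \biguplus_{\left| \{ op' \mid op' \xrightarrow{S(\alpha)} op \} \right|} \left\{ \mathrm{In}(op') \mid op' \xrightarrow{S(\alpha)} op \right\}.
\]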
3.2. Linearizability
Our definitions refer to a distributed system P implementing a monotone group $\langle I, M, \oplus,
\preceq \rangle$, and, in particular, to any arbitrary execution $\alpha$ of it and its (unique) valid serialization
$S(\alpha)$.
Say that execution $\alpha$ is linearizable [10] if the serialization $S(\alpha)$ extends $\xrightarrow{\alpha}$; that is,
for any pair of operations $op^{(1)}$ and $op^{(2)}$ such that $op^{(1)} \xrightarrow{\alpha} op^{(2)}$, $op^{(1)} \xrightarrow{S(\alpha)} op^{(2)}$. The
Valid Composition condition implies that for any two operations $op^{(1)}$ and $op^{(2)}$ such that
$op^{(1)} \xrightarrow{S(\alpha)} op^{(2)}$, $\mathrm{Out}(op^{(1)}) \prec \mathrm{Out}(op^{(2)})$. Thus, it follows that, if execution $\alpha$ is linearizable,
then for any pair of operations $op^{(1)}$ and $op^{(2)}$ such that $op^{(1)} \xrightarrow{\alpha} op^{(2)}$, $\mathrm{Out}(op^{(1)}) \prec \mathrm{Out}(op^{(2)})$.
Say that operation $op^{(1)}$ in execution $\alpha$ is non-linearizable in execution $\alpha$ if there is another
operation $op^{(2)}$ in execution $\alpha$ such that $op^{(2)} \xrightarrow{\alpha} op^{(1)}$ while $op^{(2)} \overset{S(\alpha)}{\nrightarrow} op^{(1)}$. Say that
operation $op$ in execution $\alpha$ is linearizable in execution $\alpha$ if it is not non-linearizable in
execution $\alpha$. It follows that execution $\alpha$ is linearizable if every operation in execution $\alpha$ is
linearizable in it. Finally, we say that system P is linearizable if all its executions are.
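These definitions can be exercised on small timed executions. The sketch below is an illustration for Fetch&Add under stated assumptions: the record type `Op` with explicit invocation and response times is hypothetical, and the valid serialization is taken to order operations by strictly increasing output, which holds whenever a valid serialization exists.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    invoke: float    # time of the invocation event
    response: float  # time of the matching response event
    inp: int         # input value, an element of the monotone subdomain
    out: int         # output value returned to the operation


def precedes(op1: Op, op2: Op) -> bool:
    """Real-time order: the response of op1 occurs before the invocation of op2."""
    return op1.response < op2.invoke


def non_linearizable_ops(ops: List[Op]) -> List[Op]:
    """Operations with a real-time predecessor that does not precede them in the serialization."""
    return [op1 for op1 in ops
            if any(precedes(op2, op1) and not op2.out < op1.out for op2 in ops)]


def is_linearizable(ops: List[Op]) -> bool:
    """An execution is linearizable if none of its operations is non-linearizable."""
    return not non_linearizable_ops(ops)


# Outputs 0, 3, 8 respect real-time order; the last two operations are parallel.
good = [Op(0, 1, 3, 0), Op(2, 3, 5, 3), Op(2.5, 4, 2, 8)]
# The first operation completes before the second starts, yet returns the larger output.
bad = [Op(0, 1, 3, 5), Op(2, 3, 5, 0)]
```

Both sample executions have a valid serialization; only the second violates linearizability, and the offending operation is the one that returns 0 after its predecessor has already returned 5.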
4. Switching networks
In this section, we present a framework for switching networks. Some of our deﬁnitions
are common with some from [6, Section 2] and [7, Section 2], while most of them reﬁne
and extend corresponding ones there. Some basic deﬁnitions are articulated in Section 4.1.
Processes, tokens, switches and wires are described in Section 4.2. Section 4.3 deﬁnes
states, conﬁgurations and executions. The outputs of switching networks are described in
Section 4.4. Section 4.5 introduces some contention measures for switching networks.
4.1. Basic deﬁnitions
A switching network [6], like a counting network [2], is a directed (acyclic) graph in
which the nodes are simple computing elements called switches, and the edges are called
wires.
More specifically, an $(f_{in}, f_{out})$-switch, or switch for short, is a routing element with $f_{in}$
input wires, $f_{out}$ output wires, and an internal state; $f_{in}$ and $f_{out}$ are called the switch's fan-in
and fan-out, respectively. A switch's internal state is a collection of variables, possibly with
initial values. In the initial state of a switch, all of its variables are set to their initial values.
The number of internal states of a switch may be either ﬁnite or inﬁnite, giving rise to a
ﬁnite-state or inﬁnite-state switch, respectively. In either case, a switch changes its internal
state according to its transition function.
A finite-state switching network is a switching network made up from finite-state switches;
an infinite-state switching network is a switching network made up from infinite-state
switches.
A (win, wout)-switching network N has win input wires and wout output wires, and it is
formed by connecting together switches; thus, we connect output wires of switches to input
wires of other switches. Some switches have input wires (resp., output wires) not connected
to other switches in the network, and these wires are the win input wires (resp., wout output
wires) of the switching network N .
The size s(N) of a switching network N is the total number of its switches. A network
N is finite-size if s(N) < ∞; else, it is infinite-size. The depth d(b) of a switch b in a
switching network N is deﬁned to be 0 if one of its input wires is an input wire of the
network, and $\max_j d(b_j) + 1$ otherwise, where the maximum is taken over all switches $b_j$ with output
wires connected to input wires of switch b. The depth d(N) of the network N is defined as
the maximum depth of any of its switches. The switching network N can naturally be divided
into d(N) layers, so that layer $\ell$ contains all switches of depth $\ell$, where $0 \le \ell \le d(N)$. A
path in a switching network is a sequence of switches each (other than the last) connected
to the next.
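The recursive definition of depth translates directly into a memoized traversal of the acyclic network. The following sketch assumes a hypothetical representation in which `predecessors` maps each switch to the switches whose output wires feed it, and `input_switches` is the set of switches having at least one input wire that is an input wire of the network.

```python
def switch_depths(predecessors, input_switches):
    """Return a dict mapping each switch to its depth.

    A switch has depth 0 if one of its input wires is a network input wire;
    otherwise its depth is 1 plus the maximum depth of its predecessors.
    """
    memo = {}

    def depth(switch):
        if switch not in memo:
            if switch in input_switches:
                memo[switch] = 0
            else:
                memo[switch] = 1 + max(depth(p) for p in predecessors[switch])
        return memo[switch]

    return {switch: depth(switch) for switch in predecessors}


def network_depth(predecessors, input_switches):
    """The depth of the network is the maximum depth of any of its switches."""
    return max(switch_depths(predecessors, input_switches).values())


# Hypothetical four-switch network: b0 and b1 are fed by network input wires.
example_predecessors = {"b0": [], "b1": [], "b2": ["b0", "b1"], "b3": ["b2", "b0"]}
example_inputs = {"b0", "b1"}
```

Grouping the switches of `example_predecessors` by the computed depth yields the layers of the network.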
4.2. Processes, tokens and switches
We assume a collection of asynchronous, non-failing processes that access a switching
network by shepherding tokens through it. A switching network may be accessed by many
tokens simultaneously, which traverse the network asynchronously; however, each process
has at most one token being shepherded through the network at any time. The concurrency of a
switching network is the maximum number of processes (and, therefore, tokens as well)
allowed to access the network simultaneously.
Unlike counting networks [2], each token has a state (a collection of variables) which is
allowed to change as the token traverses the network according to its transition function.
The state of a token includes its input value.
A token enters the switching network on one of the network’s win input wires. Then, the
token is instantaneously forwarded to the switch to which the wire belongs; the switch then
routes the token to one of its output wires from which the token enters the next switch in the
network, and so on. Both the switch’s and the token’s states change. The token continues
traversing the network in the same fashion until it reaches one of the wout output wires of
the network. At that point, the token exits the network and returns a value to the process
that owns it.
In more detail, when a token arrives on an input wire of a switch, the following events
occur in a single, atomic (indivisible) step:
The switch removes the token from the input wire and it changes state; the token
changes state and it is routed to an output wire of the switch.
For example, an $(f_{in}, f_{out})$-balancer is a finite-state switch with fan-in $f_{in}$ and fan-out $f_{out}$.
The kth token to arrive on any of its input wires is routed to the output wire $k \bmod f_{out}$. Thus,
the state of an $(f_{in}, f_{out})$-balancer encapsulates the number of tokens that have traversed the
switch modulo its fan-out $f_{out}$. The state of a token traversing an $(f_{in}, f_{out})$-balancer is not
affected. Such balancers have been used to construct counting networks (see, e.g., [2,9]).
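A balancer's routing rule can be sketched in a few lines. The class below is a minimal, single-threaded illustration, assuming output wires are numbered 0 to fan-out minus 1 and tokens are counted from 1; the class and method names are hypothetical, and the atomicity of each step is not modelled.

```python
class Balancer:
    """Finite-state switch that routes the k-th arriving token to wire k mod fan_out."""

    def __init__(self, fan_out):
        self.fan_out = fan_out
        # Internal state: number of tokens that have traversed, modulo fan_out.
        self.count = 0

    def route(self):
        """Process one arriving token and return the index of its output wire."""
        self.count = (self.count + 1) % self.fan_out
        return self.count


def route_tokens(fan_out, number_of_tokens):
    """Route a stream of tokens through a fresh balancer; return the wires used."""
    balancer = Balancer(fan_out)
    return [balancer.route() for _ in range(number_of_tokens)]
```

The input wire on which a token arrives does not influence the routing, so it is omitted from the sketch. Routing six tokens through a balancer with fan-out 3 spreads them evenly, two per output wire.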
4.3. States, conﬁgurations and executions
For each $(f_{in}, f_{out})$-switch, denote by $x_i$, $0 \le i \le f_{in} - 1$, the number of tokens that have
entered the switch on input wire i; similarly, denote by $y_j$ the number of tokens that have
exited the switch on output wire j.
A switch's state includes both its internal state and the collections of tokens on its input
and output wires. A switch is in a quiescent state if there are no tokens currently traversing
the switch; thus, in a quiescent state, all tokens that have arrived on the input wires
of the switch have exited the switch on its output wires, or
$$\sum_{i=0}^{f_{in}-1} x_i \;=\; \sum_{j=0}^{f_{out}-1} y_j .$$
A switch satisﬁes the following two conditions:
1. Safety condition: In any state, $\sum_{i=0}^{f_{in}-1} x_i \ge \sum_{j=0}^{f_{out}-1} y_j$; thus, a switch never creates tokens
spontaneously.
2. Liveness condition: Starting from any state, a switch eventually reaches a quiescent state.
An internal configuration of a switching network is a collection of the internal states of
its switches. Consider a finite-state switching network N with (finite) switches having S
internal states each. Then, clearly, the number of internal configurations of the network N is
finite and equal to $S^{s(N)}$. Note that the number of internal configurations of an infinite-state
switching network is no longer finite.
A conﬁguration of a switching network is the collection of the states of its switches; thus,
the conﬁguration of a switching network includes the states of all tokens currently traversing
the network as well. A configuration of a switching network is quiescent if all of its switches
are in a quiescent state. The safety and liveness properties for switches immediately imply
corresponding safety and liveness properties for a switching network.
For any token t and switch s, we denote by $\tau = \langle t, s\rangle$ the state transition in which the
token t passes (in a single atomic step) from an input wire to an output wire of switch s;
thus, in a state transition the state of a switch (including the states of tokens on its input and
output wires) changes according to the transition function of the switch (and the transition
functions of the tokens on its input and output wires). Although state transitions can occur
concurrently, it is convenient to treat them using a model of interleaving semantics.
An execution of a switching network is a finite or infinite sequence $\alpha = Q_0, \tau_1, Q_1, \tau_2,
Q_2, \ldots$ of alternating configurations and switch transitions such that:
1. $Q_0$ is the initial configuration, in which there are no tokens on input wires of switches
except for at least one token on input wires of the network, and all switches are in their
initial internal states.
2. For each triple $\langle Q_i, \tau_{i+1}, Q_{i+1}\rangle$, where $i \ge 0$, the switch transition $\tau_{i+1}$ carries the
configuration $Q_i$ to the configuration $Q_{i+1}$.
A finite execution ends with a configuration. A finite execution is complete if it results in
a quiescent configuration. An execution $\alpha$ is sequential if for any two transitions $\tau_i = \langle t, s_i\rangle$
and $\tau_j = \langle t, s_j\rangle$ that involve the same token t, all transitions (if any) between them also
involve that token. Loosely speaking, tokens traverse the network one completely after the
other in a sequential execution.
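Stated operationally, an execution is sequential exactly when the transitions of each token form one contiguous block. A minimal check, assuming an execution is abstracted as a list of hypothetical (token, switch) pairs with configurations omitted, could look as follows.

```python
def is_sequential(transitions):
    """Return True if each token's transitions are contiguous in the sequence.

    transitions: list of (token, switch) pairs in execution order.
    """
    finished = set()   # tokens whose block of transitions has already ended
    current = None     # token whose block is currently in progress
    for token, _switch in transitions:
        if token != current:
            if token in finished:
                # The token reappears after another token moved in between.
                return False
            if current is not None:
                finished.add(current)
            current = token
    return True


# Hypothetical executions over switches s1, s2 and tokens t1, t2.
sequential_run = [("t1", "s1"), ("t1", "s2"), ("t2", "s1"), ("t2", "s2")]
interleaved_run = [("t1", "s1"), ("t2", "s1"), ("t1", "s2"), ("t2", "s2")]
```

Here `sequential_run` lets `t1` traverse the whole network before `t2` moves, while `interleaved_run` does not.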
An execution sufﬁx of a switching network is a sufﬁx of some execution of the network
that starts with a conﬁguration. The deﬁnition of sequential executions can be extended to
sequential execution sufﬁxes in the natural way. So again, tokens traverse the network one
completely after the other in a sequential execution sufﬁx.
An execution fragment of a switching network is a ﬁnite (contiguous) subsequence of
some execution of the network that starts and ends with a conﬁguration. A pump of a
switching network is an execution fragment of it that starts and ends with the same quiescent
configuration. The concatenation $\alpha_1 \cdot \alpha_2$ of two execution fragments $\alpha_1$ and $\alpha_2$ is defined
when $\alpha_2$ follows $\alpha_1$ in the same execution of the network; thus, the end configuration of
$\alpha_1$ is the start configuration of $\alpha_2$. The concatenation is also an execution fragment; thus, it
does not repeat the common configuration of the two original execution fragments.
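Finding a pump inside an execution amounts to finding two occurrences of the same quiescent configuration. The sketch below assumes configurations are hashable values and that a caller-supplied predicate `is_quiescent` (a hypothetical name) recognises quiescent ones; it returns the indices delimiting the first pump it encounters.

```python
def find_pump(configurations, is_quiescent):
    """Return (start, end) indices of a pump, or None if there is none.

    A pump is an execution fragment that starts and ends with the same
    quiescent configuration.
    """
    first_seen = {}
    for index, configuration in enumerate(configurations):
        if not is_quiescent(configuration):
            continue
        if configuration in first_seen:
            return first_seen[configuration], index
        first_seen[configuration] = index
    return None


# Hypothetical configurations: (internal state, number of tokens in flight).
example_configurations = [(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]


def no_tokens_in_flight(configuration):
    """Assumed quiescence test for the example representation."""
    return configuration[1] == 0
```

In the example, the quiescent configuration `(0, 0)` occurs at both ends of the sequence, so the whole sequence is a pump.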
For an execution of a switching network, we say that concurrency is bounded if the
number of concurrent processes accessing the network in the execution is bounded. In an
(inﬁnite) execution, we say that concurrency is unbounded if the number of concurrent
processes accessing the network in the execution is unbounded (either ﬁnite or inﬁnite).
4.4. Outputs
The input and output values of token t in execution $\alpha$ will be denoted as $\mathrm{In}_{\alpha}(t)$ and
$\mathrm{Out}_{\alpha}(t)$, respectively.
For ﬁnite-state switching networks, we include an additional component on the output
wires of the switching network, namely the output registers. More speciﬁcally, there is an
output register associated with each output wire of the switching network. However, unlike
finite-state switches, each output register has an infinite number of states. Denote by or(N) the
number of output registers in a finite-state switching network N.
The output value for a token in a ﬁnite-state switching network is computed on the output
register residing on the network's output wire from which the token exits. When a token
arrives on an output register, the following events occur in a single, atomic (indivisible)
step:
1. The token computes its output value according to the output register’s state.
2. The state of the output register changes according to its previous state and the state of
the token (which includes its input value).
Note that the input value of a token does not affect its own output value, but it may well affect
the output values of tokens that later access the same output register.
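The two-step access to an output register can be sketched as follows. This is only an illustration under assumptions: the register is modelled as a single value, the group operation is passed in as a hypothetical `compose` function, and atomicity is implicit because the sketch is single-threaded.

```python
class OutputRegister:
    """Register attached to an output wire of a finite-state switching network."""

    def __init__(self, initial_state, compose):
        self.state = initial_state
        self.compose = compose

    def access(self, token_input):
        """Serve one token and return its output value.

        Step 1: the output value depends only on the register's current state.
        Step 2: the register's state is updated from its previous state and
        the token's input value.
        """
        output = self.state
        self.state = self.compose(self.state, token_input)
        return output


# Fetch&Add-style register (assumed group: integers under addition).
register = OutputRegister(0, lambda x, y: x + y)
first_output = register.access(5)
second_output = register.access(2)
```

The first token's input value 5 does not influence its own output, but it does determine the output returned to the second token.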
We remark that ﬁnite-state switching networks correspond more closely to traditional
counting networks [2], where a token fetching the counter’s value and incrementing the
counter by one obtains the value from the register attached to the outputwire it will exit from.
We also remark that output registers are necessary for this kind of switching networks, since
they provide an inﬁnite number of different output values to tokens, while ﬁnite switches,
used only for routing, are unable to do so.
For inﬁnite-state switching networks, there are no attached output registers and the output
value of a token is determined according to the state of the token when it exits the network.
4.5. Contention measures
In a switching network, contention represents the extent to which concurrent processes
access the same switch or output register simultaneously. We use two complexity-theoretic
measures to model contention in switching networks, namely register bottleneck and switch
bottleneck, which are introduced here for the ﬁrst time.
The deﬁnition of register bottleneck applies only to ﬁnite-state switching networks.
Deﬁnition 4.1 (Register bottleneck). The register bottleneck of a ﬁnite-state switching net-
work N is the minimum number of output registers, where the minimum is taken over all
inﬁnite executions of the network, that are accessed by tokens in some inﬁnite sufﬁx of an
inﬁnite execution of the network.
On the account of register bottleneck, a switching network is low-contention if its register
bottleneck is sufﬁciently large. A register bottleneck of 1 is the worst possible register
bottleneck, since it implies the existence of some execution of the network in which as
many tokens as processes participating in the execution will eventually accumulate in front
of the same output register, which thus becomes a “hot-spot”. Note that register bottleneck
is a trivial lower bound on the number of output registers of a ﬁnite-state switching network.
We prove:
Lemma 4.1. Assume that the register bottleneck of N is at least 2. Then, in any pump of
N , there exist at least two distinct tokens that access two different output registers.
Proof. Assume, by way of contradiction, that there is a pump $\pi$ of N in which all tokens
access the same output register. Clearly, the infinite sequence $\pi \cdot \pi \cdot \ldots$ of pumps is an
infinite suffix of an infinite execution of N in which all tokens access the same output
register. It follows that the register bottleneck of N is 1. A contradiction. □
The deﬁnition of switch bottleneck will be useful for inﬁnite-state switching networks.
Definition 4.2 (Switch bottleneck). The switch bottleneck of a switching network N is the
minimum number of switches, where the minimum is taken over all infinite executions of
the network, that are accessed by an infinite sequence of tokens exiting a switch connected
to them that has been accessed by an infinite number of tokens itself.
On the account of switch bottleneck, a switching network is low-contention if its switch
bottleneck is sufﬁciently large. A switch bottleneck of 1 is the worst possible switch bottle-
neck since it implies the existence of some inﬁnite execution of the network in which some
switch is accessed by an inﬁnite number of tokens and it outputs a ﬁnite number of tokens
on all but one of its output wires. Intuitively, such a switch does not effectively “balance”
the inﬁnite stream of tokens that access it, but it emits almost all of them (except for a
ﬁnite number) to the same switch in the next layer; this last switch will eventually become
a “hot-spot”.
Clearly, in the special case where switches are balancers which “balance” their input
tokens, the switch bottleneck is the least (over all balancers) number of output wires of a
balancer, which (usually) exceeds 1. Thus, the requirement that switch bottleneck be high
can also be seen as a generalization of the balancing property from balancers to general
switches.
Note that switch bottleneck is a trivial lower bound on the number of switches in any
layer (other than layer 1) of an inﬁnite-switch network. In our later proofs, we will also
assume that this is also a lower bound for layer 1.
Consider a switching network with a certain switch bottleneck. Consider now what hap-
pens when some tokens have been permanently “halted” in front of some switches of the
network in some inﬁnite sequence; this resulting sequence is not necessarily an execution
since it fails to guarantee liveness. Recall, however, that a switch operates locally: it changes
its state according to its state and the states of tokens that traverse it, and independently
of the operation of other tokens and switches in the network. This implies that the switch
bottleneck of the network is maintained also for such sequences. (This observation will be
used in the proof of Theorem 6.2.)
4.6. Switching networks implementing monotone groups
A switching network N can be used to implement a monotone group 〈I,M,⊕,≼〉 as
follows:
• Token t issued by process pi corresponds to an operation opi = [Invokei (a),
Responsei (b)] invoked by process pi , where a ∈ M and b ∈ M ∪ {e}. We say that
a is the input value or type of the token t, and b is the output value of the token t. The
input value of the token is part of the token’s (initial) state.
• For any execution $\alpha$, the invocation of operation $op_i$ corresponds to the first transition
$\tau_i = \langle t_i, s_i\rangle$ in execution $\alpha$, where $t_i = t$ and $s_i$ is an input switch of the network;
this transition occurs when the token enters the network. The response of operation $op_i$
corresponds to the latest transition $\tau_j = \langle t_j, s_j\rangle$ in execution $\alpha$, where $t_j = t$ and $s_j$ is
an output switch of the network; this transition occurs when the token exits the network.
• When token t exits the network, it carries, encapsulated in its state, the output value b that
is returned to operation $op_i$.
It is now straightforward to formally define when the switching network N implements
the monotone group 〈I,M,⊕,≼〉.
4.7. The covering technique
In some of our impossibility proofs, we will use a variant of the variable covering
technique originally introduced by Burns and Lynch [3] for proving lower bounds on the
number of read/write registers needed to solve (deadlock-free) mutual exclusion. Intuitively,
a token covers a switch if it is about to access the switch. We omit the formal definition
here, which can be immediately extended to tokens covering output registers as well.
5. The Monotone Linearizability Lemma
Throughout this section, we refer to a distributed system P implementing a monotone
group 〈I,M,⊕,≼〉. The main contribution of the section is to state and prove the Monotone
Linearizability Lemma, which establishes ordering constraints of linearizability on the
system P. Recall that, by Lemma 2.5, the monotone group 〈I,M,⊕,≼〉 is n-wise independent
for any integer $n \ge 2$. So, there are n distinct elements $a_1, a_2, \ldots, a_n \in M$, with
$a_1, a_2, \ldots, a_n \ne e$, which are n-wise independent over 〈I,⊕〉. The proof of the Monotone
Linearizability Lemma amounts to establishing a contradiction to n-wise independence for a
hypothetical non-linearizable execution, in which the types of the RMW operations issued
by the processes are $a_1, a_2, \ldots, a_n$. We are now ready to state and prove the Monotone
Linearizability Lemma.
Proposition 5.1 (Monotone Linearizability Lemma). Consider any execution $\alpha$ of system
P in which each process $p_i$ issues only operations of type $a_i$, where $1 \le i \le n$. Then, $\alpha$ is
linearizable.
Proof. We start with an informal outline of our proof. We will proceed by contradiction.
We will consider the earliest non-linearizable operation $op_k$ (at process $p_k$) in $\alpha$ and the
latest operation $op_l$ that precedes it. We will use these two operations to construct two
executions $\beta_1$ and $\beta_2$ that are indistinguishable to process $p_l$ with respect to operation $op_l$.
This indistinguishability implies that $op_l$ receives the same output in these two executions.
The contradiction will follow from the comparison of the two identical outputs, where we
use simple algebraic properties of (monotone) groups in order to contradict the assumed
n-wise independence. We now continue with the details of the formal proof.
Assume, by way of contradiction, that $\alpha$ is not linearizable. So, there is at least one
operation that is non-linearizable in execution $\alpha$. Consider the earliest such operation $op_k$
(occurring at process $p_k$), and let $op_l$ be the latest operation (occurring at process $p_l$) that
precedes $op_k$ in $\alpha$. So, $op_l \xrightarrow{\alpha} op_k$ while $op_k \xrightarrow{S(\alpha)} op_l$, where $S(\alpha)$ is the (unique) valid
serialization of $\alpha$.
In our proof, we will use the operations $op_k$ and $op_l$ in order to define and treat two finite
prefixes of execution $\alpha$:
• the finite prefix $\alpha_1$ of execution $\alpha$ that ends with the response for operation $op_k$, and
• the finite prefix $\alpha_2$ of execution $\alpha$ that ends with the response for operation $op_l$.
Clearly, $\alpha_2$ is a prefix of $\alpha_1$ as well. We first treat separately each of the two prefixes $\alpha_1$ and
$\alpha_2$ and a corresponding extension of it; we then treat them together.
Properties of the prefix $\alpha_1$ and its extension $\beta_1$: Consider a finite execution $\beta_1$, which is
an extension of $\alpha_1$ that includes no additional invocations by processes; so, $\alpha_1$ is extended
to only include responses to invocations that are pending in $\alpha_1$.
Since $\alpha_1$ is a prefix of both $\alpha$ and $\beta_1$, it follows that all operations whose responses are
included in $\alpha_1$ (or, in other words, they are not preceded in either $\alpha$ or $\beta_1$ by the response
for $op_k$) have identical outputs in $\alpha$ and $\beta_1$. In particular, $\mathrm{Out}_{\alpha}(op_l) = \mathrm{Out}_{\beta_1}(op_l)$ and
$\mathrm{Out}_{\alpha}(op_k) = \mathrm{Out}_{\beta_1}(op_k)$. Take now the (unique) valid serialization $S(\beta_1)$ of $\beta_1$.
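Since $op_k \xrightarrow{S(\alpha)} op_l$, the Valid Composition condition (for $S(\alpha)$) implies that
$\mathrm{Out}_{\alpha}(op_k) \prec \mathrm{Out}_{\alpha}(op_l)$. Since $\mathrm{Out}_{\alpha}(op_k) = \mathrm{Out}_{\beta_1}(op_k)$ and $\mathrm{Out}_{\alpha}(op_l) = \mathrm{Out}_{\beta_1}(op_l)$,
it follows that $\mathrm{Out}_{\beta_1}(op_k) \prec \mathrm{Out}_{\beta_1}(op_l)$. The Valid Composition condition (for $S(\beta_1)$)
implies now that $op_k \xrightarrow{S(\beta_1)} op_l$.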
For each process $p_i$, where $1 \le i \le n$, denote by $\kappa_i^{(1)}$ the number of operations at $p_i$ that
precede $op_l$ in the serialization $S(\beta_1)$. Assume that:
• $\kappa_{i,a}^{(1)}$ of those $\kappa_i^{(1)}$ operations have their responses followed in $\beta_1$ by that for $op_l$;
• the rest $\kappa_{i,b}^{(1)}$ of them have their responses preceded in $\beta_1$ by that for $op_l$.
So, $\kappa_i^{(1)} = \kappa_{i,a}^{(1)} + \kappa_{i,b}^{(1)}$. We next prove a simple property.
Property 5.2. For each process $p_i$, where $1 \le i \le n$, $\kappa_{i,b}^{(1)} \le 2$.
Proof. Consider the earliest (if any) operation op at process $p_i$ such that $op \xrightarrow{S(\beta_1)} op_l$, while
the response for op follows the one for $op_l$ in $\beta_1$. We proceed by case analysis on the order
of the responses for op and $op_k$ in $\beta_1$.
1. Assume first that the response for op follows the one for $op_k$ in $\beta_1$. Since $\beta_1$ includes no
invocations following the response for $op_k$, it follows that there is no other operation at
$p_i$ following op, so that $\kappa_{i,b}^{(1)} \le 1$ in this case.
2. Assume now that the response for op precedes the one for $op_k$ in $\beta_1$. Consider any other
operation $op'$ at $p_i$ that follows op in $\beta_1$, while still $op' \xrightarrow{S(\beta_1)} op_l$. We will prove that there
is at most one such additional operation.
• By construction of $\beta_1$, the invocation for $op'$ precedes the response for $op_k$ in $\beta_1$.
• Assume, by way of contradiction, that the response for $op'$ precedes the response for
$op_k$ in $\beta_1$. Thus, $op'$ is included in the prefix $\alpha_1$. Since $\alpha_1$ is a prefix of $\alpha$, it follows
that the response for $op'$ precedes the response for $op_k$ in $\alpha$ as well. This implies that
$\mathrm{Out}_{\beta_1}(op') = \mathrm{Out}_{\alpha}(op')$. Since $op' \xrightarrow{S(\beta_1)} op_l$, the Valid Composition condition (for
$S(\beta_1)$) implies that $\mathrm{Out}_{\beta_1}(op') \prec \mathrm{Out}_{\beta_1}(op_l)$. Since $\mathrm{Out}_{\beta_1}(op_l) = \mathrm{Out}_{\alpha}(op_l)$, it
follows that $\mathrm{Out}_{\alpha}(op') \prec \mathrm{Out}_{\alpha}(op_l)$. The Valid Composition condition (for $S(\alpha)$)
implies now that $op' \xrightarrow{S(\alpha)} op_l$.
Since the response for op follows the response for $op_l$ in $\beta_1$, while $op'$ follows op
in $\beta_1$, it follows that $op_l \xrightarrow{\beta_1} op'$. Since $op'$ is included in the prefix $\alpha_1$ of $\beta_1$, which
is also a prefix of $\alpha$, this implies that $op_l \xrightarrow{\alpha} op'$ as well. It follows that $op'$ is a
non-linearizable operation in $\alpha$. Since the response for $op'$ precedes the response for
$op_k$ in $\alpha$, it follows that $op'$ is a non-linearizable operation in $\alpha$ that is earlier than $op_k$.
A contradiction.
It follows that the response for $op'$ follows the response for $op_k$ in $\beta_1$.
Since $\beta_1$ includes no invocations following the response for $op_k$, it follows that there is
no other operation at $p_i$ following $op'$ in $\beta_1$, so that $\kappa_{i,b}^{(1)} \le 2$ in this case.
Thus, in all cases, $\kappa_{i,b}^{(1)} \le 2$, as needed. □
Since $op_k \xrightarrow{S(\beta_1)} op_l$ while the response for $op_l$ precedes the response for $op_k$ in $\beta_1$, a slight
strengthening of Property 5.2 for the particular case of process $p_k$ is now immediate:
Property 5.3. $1 \le \kappa_{k,b}^{(1)} \le 2$.
By Property 3.2, $\mathrm{Out}_{\beta_1}(op_l)$ is a composite expression involving, for each process $p_i$,
$1 \le i \le n$, $\kappa_i^{(1)}$ contributions of $a_i$. By the Commutativity property, these n types of
contributions can be separated from each other in the composite expression, so that
$$\mathrm{Out}_{\beta_1}(op_l) \;=\; \bigoplus_{i=1}^{n} \Bigl( \oplus_{\kappa_i^{(1)}}\, a_i \Bigr).$$
Properties of the prefix $\alpha_2$ and its extension $\beta_2$: Consider a finite execution $\beta_2$, which
is an extension of $\alpha_2$ that includes no additional invocations by processes; so, $\beta_2$ is an
extension that only includes responses to invocations that are pending in $\alpha_2$ (in addition to
the responses included in $\alpha_2$).
Since $\alpha_2$ is a prefix of both $\alpha$ and $\beta_2$, it follows that all operations whose responses are
included in $\alpha_2$ (hence, they are not preceded in either $\alpha$ or $\beta_2$ by the response for $op_l$)
have identical outputs in $\alpha$ and $\beta_2$. In particular, $\mathrm{Out}_{\alpha}(op_l) = \mathrm{Out}_{\beta_2}(op_l)$. Take now the
(unique) valid serialization $S(\beta_2)$ of $\beta_2$.
For each process $p_i$, where $1 \le i \le n$, denote by $\kappa_i^{(2)}$ the number of operations at $p_i$ that
precede $op_l$ in the serialization $S(\beta_2)$. Assume that:
• $\kappa_{i,a}^{(2)}$ of those $\kappa_i^{(2)}$ operations have their responses not preceded by that for $op_l$ in $\beta_2$;
• the rest $\kappa_{i,b}^{(2)}$ of them have their responses preceded in $\beta_2$ by that for $op_l$.
So, $\kappa_i^{(2)} = \kappa_{i,a}^{(2)} + \kappa_{i,b}^{(2)}$. We continue to prove a simple property:
Property 5.4. For each process $p_i$, where $1 \le i \le n$, $\kappa_{i,b}^{(2)} \le 1$.
Proof. Consider the earliest (if any) operation op at process $p_i$ such that $op \xrightarrow{S(\beta_2)} op_l$, while
the response for op follows the one for $op_l$ in $\beta_2$. Since $\beta_2$ includes no invocations following
the response for $op_l$, it follows that there is no other operation at $p_i$ following op in $\beta_2$, so
that $\kappa_{i,b}^{(2)} \le 1$, as needed. □
By Property 3.2, $\mathrm{Out}_{\beta_2}(op_l)$ is a composite expression involving, for each process $p_i$,
$1 \le i \le n$, $\kappa_i^{(2)} = \kappa_{i,a}^{(2)} + \kappa_{i,b}^{(2)}$ contributions of $a_i$. By the Commutativity property, these n
types of contributions can be separated from each other in the composite expression, so that
$$\mathrm{Out}_{\beta_2}(op_l) \;=\; \bigoplus_{i=1}^{n} \Bigl( \oplus_{\kappa_i^{(2)}}\, a_i \Bigr).$$
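Joint properties of the prefixes $\alpha_1$ and $\alpha_2$ and their extensions $\beta_1$ and $\beta_2$: Since
$\mathrm{Out}_{\alpha}(op_l) = \mathrm{Out}_{\beta_1}(op_l)$ and $\mathrm{Out}_{\alpha}(op_l) = \mathrm{Out}_{\beta_2}(op_l)$, it follows that
$\mathrm{Out}_{\beta_1}(op_l) = \mathrm{Out}_{\beta_2}(op_l)$.
We continue to prove two simple properties of the prefixes $\alpha_1$ and $\alpha_2$, and their extensions
$\beta_1$ and $\beta_2$. The first property relates $\kappa_{i,a}^{(1)}$ and $\kappa_{i,a}^{(2)}$, while the second one relates $\kappa_{i,b}^{(1)}$ and $\kappa_{i,b}^{(2)}$.
We start with the first.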
Property 5.5. For each process $p_i$, where $1 \le i \le n$, $\kappa_{i,a}^{(1)} = \kappa_{i,a}^{(2)}$.
Proof. We will prove that both $\kappa_{i,a}^{(1)} \le \kappa_{i,a}^{(2)}$ and $\kappa_{i,a}^{(2)} \le \kappa_{i,a}^{(1)}$.
1. To prove that $\kappa_{i,a}^{(1)} \le \kappa_{i,a}^{(2)}$, consider any operation op at process $p_i$ such that $op \xrightarrow{S(\beta_1)} op_l$,
while the response for op precedes the response for $op_l$ in $\beta_1$. So, op is included in
prefix $\alpha_2$.
• Since $\alpha_2$ is a prefix of $\beta_2$, and it ends with the response for operation $op_l$, it follows
that the response for op precedes the response for $op_l$ in $\beta_2$ as well.
• Since $\alpha_2$ is a prefix of both $\beta_1$ and $\beta_2$, it follows that $\mathrm{Out}_{\beta_1}(op) = \mathrm{Out}_{\beta_2}(op)$. Since
$op \xrightarrow{S(\beta_1)} op_l$, the Valid Composition condition for $S(\beta_1)$ implies that
$\mathrm{Out}_{\beta_1}(op) \prec \mathrm{Out}_{\beta_1}(op_l)$. Since $\mathrm{Out}_{\beta_1}(op_l) = \mathrm{Out}_{\beta_2}(op_l)$, it follows that
$\mathrm{Out}_{\beta_2}(op) \prec \mathrm{Out}_{\beta_2}(op_l)$. Thus, the Valid Composition condition for $S(\beta_2)$ implies
that $op \xrightarrow{S(\beta_2)} op_l$.
It follows that $\kappa_{i,a}^{(1)} \le \kappa_{i,a}^{(2)}$.
2. To prove that $\kappa_{i,a}^{(1)} \ge \kappa_{i,a}^{(2)}$, consider any operation op at process $p_i$ such that $op \xrightarrow{S(\beta_2)} op_l$,
while the response for op precedes the response for $op_l$ in $\beta_2$. So, op is included in
prefix $\alpha_2$.
• Since $\alpha_2$ is a prefix of $\beta_1$, and it ends with the response for operation $op_l$, it follows
that the response for op precedes the response for $op_l$ in $\beta_1$ as well.
• Since $\alpha_2$ is a prefix of both $\beta_1$ and $\beta_2$, it follows that $\mathrm{Out}_{\beta_1}(op) = \mathrm{Out}_{\beta_2}(op)$. Since
$op \xrightarrow{S(\beta_2)} op_l$, the Valid Composition condition for $S(\beta_2)$ implies that
$\mathrm{Out}_{\beta_2}(op) \prec \mathrm{Out}_{\beta_2}(op_l)$. Since $\mathrm{Out}_{\beta_1}(op_l) = \mathrm{Out}_{\beta_2}(op_l)$, it follows that
$\mathrm{Out}_{\beta_1}(op) \prec \mathrm{Out}_{\beta_1}(op_l)$. Thus, the Valid Composition condition for $S(\beta_1)$ implies
that $op \xrightarrow{S(\beta_1)} op_l$.
It follows that $\kappa_{i,a}^{(1)} \ge \kappa_{i,a}^{(2)}$.
So, in total, $\kappa_{i,a}^{(1)} = \kappa_{i,a}^{(2)}$, as needed. □
We continue with the second property.
Property 5.6. $\kappa_{k,b}^{(1)} - \kappa_{k,b}^{(2)} \ge 1$.
Proof. Recall from Property 5.3 that $1 \le \kappa_{k,b}^{(1)} \le 2$. We proceed by case analysis on $\kappa_{k,b}^{(1)}$.
1. Assume first that $\kappa_{k,b}^{(1)} = 1$. Since $op_k \xrightarrow{S(\beta_1)} op_l$ and the response for $op_k$ follows the
response for $op_l$ in $\beta_1$, it follows that $op_k$ counts for $\kappa_{k,b}^{(1)}$. Since $\kappa_{k,b}^{(1)} = 1$, this implies
that no operation (in $\beta_1$) other than $op_k$ counts for $\kappa_{k,b}^{(1)}$; that is, there is no operation $op'_k$
(other than $op_k$) at process $p_k$ in $\beta_1$ such that $op'_k \xrightarrow{S(\beta_1)} op_l$ while the response for $op'_k$
follows the response for $op_l$ in $\beta_1$.
We will prove that $\kappa_{k,b}^{(2)} = 0$ in this case. Assume, by way of contradiction, that
$\kappa_{k,b}^{(2)} \ne 0$. Thus, there is some operation $op'_k$ in $\beta_2$ such that $op'_k \xrightarrow{S(\beta_2)} op_l$ while the
response for $op'_k$ follows the response for $op_l$ in $\beta_2$. Since $\beta_2$ includes no invocations
following the response for $op_l$, it follows that the invocation for $op'_k$ precedes the response
for $op_l$ in $\beta_2$. So, the invocation for $op'_k$ is included in the prefix $\alpha_2$ of $\beta_2$. Since $\alpha_2$ is a
prefix of both $\alpha$ and $\beta_1$ as well, it follows that $op'_k$ is an operation in each of $\alpha$ and $\beta_1$ as
well such that its invocation precedes the response for $op_l$ in each of $\alpha$ and $\beta_1$.
Since the invocation for $op'_k$ precedes the response for $op_l$ in $\alpha$ (resp., $\beta_1$) and
$op_l \xrightarrow{\alpha} op_k$ (resp., $op_l \xrightarrow{\beta_1} op_k$), it follows that $op'_k \xrightarrow{\alpha} op_k$ (resp., $op'_k \xrightarrow{\beta_1} op_k$).
Since the response for $op'_k$ is not included in prefix $\alpha_2$, it follows that the response for
$op'_k$ follows the response for $op_l$ in each of $\alpha$ and $\beta_1$ as well. This implies that $op_l \xrightarrow{S(\beta_1)} op'_k$.
Since $op_l \xrightarrow{S(\beta_1)} op'_k$, the Valid Composition condition for $\beta_1$ implies that
$\mathrm{Out}_{\beta_1}(op_l) \prec \mathrm{Out}_{\beta_1}(op'_k)$. On the other hand, since $op'_k \xrightarrow{\beta_1} op_k$, the response for $op'_k$ is included
in prefix $\alpha_1$, which is a prefix of both $\alpha$ and $\beta_1$, so that $\mathrm{Out}_{\alpha}(op'_k) = \mathrm{Out}_{\beta_1}(op'_k)$.
Since also $\mathrm{Out}_{\alpha}(op_l) = \mathrm{Out}_{\beta_1}(op_l)$, while by assumption, $\mathrm{Out}_{\alpha}(op_k) \prec \mathrm{Out}_{\alpha}(op_l)$,
it follows that $\mathrm{Out}_{\alpha}(op_k) \prec \mathrm{Out}_{\alpha}(op'_k)$.
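Thus, in total, $\mathrm{Out}_{\alpha}(op_k) \prec \mathrm{Out}_{\alpha}(op'_k)$, $op'_k \xrightarrow{\alpha} op_k$, and the response for $op_k$
follows the response for $op_l$ in $\alpha$. So, $op_l$ is not the latest operation in $\alpha$ that precedes
$op_k$ in $\alpha$ and yet $\mathrm{Out}_{\alpha}(op_k) \prec \mathrm{Out}_{\alpha}(op_l)$. A contradiction.
The contradiction implies that $\kappa_{k,b}^{(2)} = 0$, so that $\kappa_{k,b}^{(1)} - \kappa_{k,b}^{(2)} = 1$ in this case.
2. Assume now that $\kappa_{k,b}^{(1)} = 2$. By Property 5.4, $\kappa_{k,b}^{(2)} \le 1$, so that $\kappa_{k,b}^{(1)} - \kappa_{k,b}^{(2)} \ge 1$ in this case.
Thus, in all cases, $\kappa_{k,b}^{(1)} - \kappa_{k,b}^{(2)} \ge 1$, as needed. □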
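Since $\mathrm{Out}_{\beta_1}(op_l) = \mathrm{Out}_{\beta_2}(op_l)$, we have that
$$\bigoplus_{i=1}^{n} \Bigl( \oplus_{\kappa_i^{(1)}}\, a_i \Bigr) \;=\; \bigoplus_{i=1}^{n} \Bigl( \oplus_{\kappa_i^{(2)}}\, a_i \Bigr).$$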
By Property 2.1, it follows that for each process $p_i$, where $1 \le i \le n$,
$$\oplus_{\kappa_i^{(1)}}\, a_i \;=\; \oplus_{\kappa_i^{(1)} - \kappa_i^{(2)}}\, a_i \;\oplus\; \oplus_{\kappa_i^{(2)}}\, a_i .$$
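It follows that
$$
\begin{aligned}
\bigoplus_{i=1}^{n} \Bigl( \oplus_{\kappa_i^{(1)}}\, a_i \Bigr)
&= \bigoplus_{i=1}^{n} \Bigl( \oplus_{\kappa_i^{(1)} - \kappa_i^{(2)}}\, a_i \oplus \oplus_{\kappa_i^{(2)}}\, a_i \Bigr) \\
&= \Bigl( \oplus_{\kappa_1^{(1)} - \kappa_1^{(2)}}\, a_1 \oplus \oplus_{\kappa_1^{(2)}}\, a_1 \Bigr) \oplus \cdots \oplus \Bigl( \oplus_{\kappa_n^{(1)} - \kappa_n^{(2)}}\, a_n \oplus \oplus_{\kappa_n^{(2)}}\, a_n \Bigr) \\
&\qquad \text{(by definition of the summation operator)} \\
&= \Bigl( \oplus_{\kappa_1^{(1)} - \kappa_1^{(2)}}\, a_1 \oplus \cdots \oplus \oplus_{\kappa_n^{(1)} - \kappa_n^{(2)}}\, a_n \Bigr) \oplus \Bigl( \oplus_{\kappa_1^{(2)}}\, a_1 \oplus \cdots \oplus \oplus_{\kappa_n^{(2)}}\, a_n \Bigr) \\
&\qquad \text{(by Commutativity and Associativity)} \\
&= \bigoplus_{i=1}^{n} \Bigl( \oplus_{\kappa_i^{(1)} - \kappa_i^{(2)}}\, a_i \Bigr) \oplus \bigoplus_{i=1}^{n} \Bigl( \oplus_{\kappa_i^{(2)}}\, a_i \Bigr) \\
&\qquad \text{(by definition of the summation operator)} \\
&= \bigoplus_{i=1}^{n} \Bigl( \oplus_{\kappa_i^{(1)} - \kappa_i^{(2)}}\, a_i \Bigr) \oplus \bigoplus_{i=1}^{n} \Bigl( \oplus_{\kappa_i^{(1)}}\, a_i \Bigr).
\end{aligned}
$$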
Hence, the Cancellation Law implies that
$$\bigoplus_{i=1}^{n} \Bigl( \oplus_{\kappa_i^{(1)} - \kappa_i^{(2)}}\, a_i \Bigr) \;=\; e.$$
Consider any index i, where $1 \le i \le n$. By Property 5.5, $\kappa_i^{(1)} - \kappa_i^{(2)} = \kappa_{i,b}^{(1)} - \kappa_{i,b}^{(2)}$. Now,
Properties 5.2 and 5.4 immediately imply that $-1 \le \kappa_{i,b}^{(1)} - \kappa_{i,b}^{(2)} \le 2$. On the other hand,
Property 5.6 implies that $\kappa_{k,b}^{(1)} - \kappa_{k,b}^{(2)} \ge 1$, so that not all differences $\kappa_{i,b}^{(1)} - \kappa_{i,b}^{(2)}$, where
$1 \le i \le n$, are simultaneously zero. It follows that the n elements $a_1, a_2, \ldots, a_n$ are not
n-wise independent over 〈I,⊕〉. A contradiction. □
6. The impossibility of ﬁnite-size switching networks
Finite-state and inﬁnite-state networks are considered in Sections 6.1 and 6.2,
respectively.
6.1. Finite-state networks
We show:
Theorem 6.1 (Impossibility result for finite-state network). There is no non-trivial finite-state
switching network N with concurrency $(or(N) + 1) \cdot (S^{size(N)} + 1)$ that has finite
size, incurs register bottleneck at least 2 and implements a monotone group 〈I,M,⊕,≼〉.
Proof. Assume, by way of contradiction, that there is such a switching network N. Recall
that the number of internal configurations of N is $S^{s(N)}$, where S is the number of internal
states of each switch.
Consider a sequential execution $\alpha$ of network N involving $(or(N) + 1) \cdot (S^{size(N)} + 1)$
tokens, whose types are $(or(N) + 1) \cdot (S^{size(N)} + 1)$-wise independent over 〈I,⊕,≼〉. By
the Monotone Linearizability Lemma, execution $\alpha$ is linearizable. Write
$\alpha = \alpha_1 \cdot \alpha_2 \cdot \ldots \cdot \alpha_{or(N)+1}$, where each execution fragment $\alpha_i$, $1 \le i \le or(N) + 1$, includes the
traversals of $S^{size(N)} + 1$ tokens.
Take now any execution fragment $\alpha_i$, where $1 \le i \le or(N) + 1$. Since each token traverses
at least one switch, $\alpha_i$ contains at least $S^{size(N)} + 1$ configurations; so, it contains at
least $S^{size(N)} + 1$ internal configurations. Since the total number of internal configurations
of N is $S^{size(N)}$, the Pigeonhole Principle implies that some internal configuration of N
is repeated in $\alpha_i$, so that $\alpha_i$ contains at least one pump. Lemma 4.1 implies that there are at
least two distinct tokens that access two different output registers in any such pump.
It follows that execution  contains at least or(N ) + 1 pumps, and the total number of
output registers (allowing repetitions) accessed in these pumps is at least 2 · (or(N )+ 1) >
2or(N ). The Pigeonhole Principle implies that there is at least one output register accessed
by tokens in at least three different pumps.
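The counting step can be checked exhaustively for small networks. The following Python sketch is illustrative only: the function names, and the modelling of a pump as the set of output registers it accesses, are assumptions made here and are not part of the proof. It enumerates every way in which or(N) + 1 pumps can each access two different output registers and confirms that some register is always accessed in at least three pumps.

```python
from itertools import combinations, product


def some_register_in_three_pumps(pumps):
    """pumps: list of sets; each set holds the output registers accessed in one pump."""
    counts = {}
    for registers in pumps:
        for r in registers:  # a pump counts each register at most once
            counts[r] = counts.get(r, 0) + 1
    return max(counts.values()) >= 3


def check_all(num_registers):
    """Try every assignment of two different registers to each of num_registers + 1 pumps."""
    pairs = [set(p) for p in combinations(range(num_registers), 2)]
    return all(
        some_register_in_three_pumps(choice)
        for choice in product(pairs, repeat=num_registers + 1)
    )
```

With only or(N) pumps the conclusion can fail: for three registers, the pumps {0, 1}, {1, 2}, {0, 2} touch every register exactly twice. This is why the argument needs or(N) + 1 pumps.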
So there are tokens $t_1$, $t_2$ and $t_3$, with
$t_1 \xrightarrow{\alpha} t_2 \xrightarrow{\alpha} t_3$, and pumps $\pi_1$, $\pi_2$ and $\pi_3$
of $\alpha$ such that token $t_i$ accesses the same output register $r$ in pump $\pi_i$ of
$\alpha$, where $1 \le i \le 3$. Consider also output register $r'$ accessed by token $t_2'$ in
pump $\pi_2$ of $\alpha$. Since $\alpha$ is a sequential execution,
$t_1 \xrightarrow{\alpha} t_2' \xrightarrow{\alpha} t_3$.
We now use execution $\alpha$ to construct another finite (but not sequential) sequence $\beta$
of alternating configurations and switch transitions, which involves the same tokens as
$\alpha$, with the same types and in the same order, except for the following change:
All switch transitions that involve output register $r$, starting with the one involving
token $t_1$ (and $r$) and preceding the one involving token $t_3$ (and $r$), are scheduled to
occur immediately after the switch transition involving token $t_3$ (and $r$), and in the
same order (as in $\alpha$).
So, roughly speaking, all tokens starting with $t_1$ and not following $t_3$ that access $r$ are
“halted” once they get to cover $r$ and until immediately after token $t_3$ accesses $r$.
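Clearly, the sequence $\beta$ is an execution of $N$, in which each token accesses the same
output register as in $\alpha$.
Since execution $\alpha$ is linearizable and
$t_1 \xrightarrow{\alpha} t_2' \xrightarrow{\alpha} t_3$, it follows that
$t_1 \xrightarrow{S(\alpha)} t_2' \xrightarrow{S(\alpha)} t_3$.
Thus, the Monotonicity under Composition property implies that
$\mathit{Out}_\alpha(t_1) \prec \mathit{Out}_\alpha(t_2') \prec \mathit{Out}_\alpha(t_3)$.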
Since $\beta$ uses tokens with the same types as $\alpha$, it follows that the types of the
tokens in $\beta$ are also $(\mathit{or}(N) + 1) \cdot (S^{\mathit{size}(N)} + 1)$-wise
independent over $\langle I, \oplus \rangle$. By the Monotone Linearizability Lemma, execution
$\beta$ is linearizable. By construction, $t_2' \xrightarrow{\beta} t_3$. It follows that
$t_2' \xrightarrow{S(\beta)} t_3$. Thus, the Monotonicity under Composition property implies
that $\mathit{Out}_\beta(t_2') \prec \mathit{Out}_\beta(t_3)$. However, by construction of
$\beta$, $\mathit{Out}_\beta(t_2') = \mathit{Out}_\alpha(t_2')$, while
$\mathit{Out}_\beta(t_3) = \mathit{Out}_\alpha(t_1)$. It follows that
$\mathit{Out}_\alpha(t_2') \prec \mathit{Out}_\alpha(t_1)$. A contradiction. □
We remark that the assumption of non-triviality is essential for Theorem 6.1. Since each
token can atomically invoke a computation on an output register, we can implement a
monotone RMW operation by a trivial switching network consisting of a single switch that
outputs tokens along one output wire, which has an associated register that maintains the
state of the RMW variable to be implemented. The switch serializes the operations (that
correspond to the tokens) so that they can be atomically invoked (by the tokens) on the
register.
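A minimal Python sketch of such a trivial network for Fetch&Add is given below. It rests on assumptions made only for illustration: a lock stands in for the serialization performed by the single switch, and the class and function names are hypothetical. Every operation passes through the one register, which is exactly the contention that a non-trivial network is meant to avoid.

```python
import threading


class SingleSwitchRegister:
    """One switch, one output register: all operations are serialized."""

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()  # models the serialization done by the switch

    def fetch_and_add(self, amount):
        with self._lock:  # the token atomically invokes its operation on the register
            old = self._value
            self._value = old + amount
            return old


def run(num_threads=8, ops_per_thread=100):
    """Issue concurrent Fetch&Add(1) operations; return all responses and the final value."""
    register = SingleSwitchRegister()
    responses = []
    responses_lock = threading.Lock()

    def worker():
        local = [register.fetch_and_add(1) for _ in range(ops_per_thread)]
        with responses_lock:
            responses.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(responses), register.fetch_and_add(0)
```

Because the operations are serialized, the responses are exactly 0, 1, ..., k − 1 for k operations, each returned once.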
Recall the Integers with Addition monotone group
$\langle \mathbb{Z}, \mathbb{N} \setminus \{0\}, +, < \rangle$ and the Rationals with
Multiplication monotone group, which are associated with the monotone Fetch&Add and
Fetch&Multiply operations, respectively. So, Theorem 6.1 immediately implies corresponding
impossibility results for switching networks implementing the Fetch&Add and Fetch&Multiply
operations.
6.2. Infinite-state networks
Clearly, the proof of Theorem 6.1 is not applicable to infinite-state networks since the
number of their possible internal configurations is no longer finite. Thus, we need to develop
new techniques in order to handle such networks. We show:
Theorem 6.2 (Impossibility result for infinite-state network). There is no non-trivial
infinite-state switching network with unbounded concurrency that has finite size, incurs
switch bottleneck at least 2 and implements a monotone group
$\langle I, M, \oplus, \prec \rangle$.
Proof. Assume, by way of contradiction, that there is such a switching network $N$. Partition
$N$ into layers $1, 2, \ldots, d(N)$ in the natural way. Assume, without loss of generality, that
any switch $b$ at layer $\ell$, where $2 \le \ell < d(N)$, has its input wires connected to
switches of layer $\ell - 1$ and its output wires connected to switches of layer $\ell + 1$
(see footnote 3). Since the switch bottleneck of $N$ is at least 2, there are at least two
switches in each of its layers.
We first construct an infinite non-sequential sequence $\alpha$ for network $N$. We caution the
reader that $\alpha$ is nearly an execution of network $N$ since it only fails the liveness
condition. However, as we discussed in Section 4.5, $\alpha$ maintains the switch bottleneck of
$N$, and this is all we will need of it. For clarity of exposition, we will abuse terminology
and still call $\alpha$ (and several sequences we will derive from it as well) an execution.
Construction of execution $\alpha$: The execution $\alpha$ involves an infinite sequence of
tokens $t_1, t_2, \ldots$ with associated types $a_1, a_2, \ldots$ that are issued by distinct
processes. The types are chosen so that for each (finite) prefix $t_1, \ldots, t_n$ of tokens,
the associated types $a_1, \ldots, a_n$ are $n$-wise independent over
$\langle I, \oplus \rangle$.
To construct the execution $\alpha$, we first define through a simultaneous induction two finite
sequences, each of length $d(N)$:
• a sequence of pairs of disjoint, infinite subsequences of the type sequence
$\mathbf{a} = a_1, a_2, \ldots$, denoted $\langle \mathbf{a}_{1.i}, \mathbf{a}_{2.i} \rangle$,
where $1 \le i \le d(N)$;
• a sequence of pairs of distinct switches from the same layer in the network, denoted
$\langle b_{1.i}, b_{2.i} \rangle$, where $1 \le i \le d(N)$.
The properties of the two sequences will be used inductively along the way. Specifically,
the induction proceeds as follows:
Basis case: Assume that $i = 1$.
• Fix $\mathbf{a}_{1.1}$ and $\mathbf{a}_{2.1}$ to be the odd and even (infinite) subsequences
of $\mathbf{a}$, respectively.
• Fix $b_{1.1}$ and $b_{2.1}$ to be any two distinct switches in layer 1 of the network.
Call tokens in sequences $\mathbf{a}_{1.1}$ and $\mathbf{a}_{2.1}$ the odd and even tokens,
respectively.
Induction hypothesis: Assume that we have defined all pairs
$\langle \mathbf{a}_{1.i}, \mathbf{a}_{2.i} \rangle$ and $\langle b_{1.i}, b_{2.i} \rangle$ for
all indices $i$, $1 \le i \le k$.
Induction step: We now define $\langle \mathbf{a}_{1.k+1}, \mathbf{a}_{2.k+1} \rangle$ and
$\langle b_{1.k+1}, b_{2.k+1} \rangle$.
Since the switch bottleneck is at least two, the switch $b_{1.k}$ and the infinite sequence
$\mathbf{a}_{1.k}$ determine two distinct switches $b_{1.k+1}$ and $b'_{1.k+1}$ and two disjoint
infinite sequences $\mathbf{a}_{1.k+1}$ and $\mathbf{a}'_{1.k+1}$ (that are subsequences of
$\mathbf{a}_{1.k}$). Correspondingly, the switch $b_{2.k}$ and the infinite sequence
$\mathbf{a}_{2.k}$ determine two distinct switches $b_{2.k+1}$ and $b'_{2.k+1}$ and two disjoint
infinite sequences $\mathbf{a}_{2.k+1}$ and $\mathbf{a}'_{2.k+1}$ (that are subsequences of
$\mathbf{a}_{2.k}$). Assume, without loss of generality, that the switches $b_{1.k+1}$ and
$b_{2.k+1}$ are distinct. Note also that the sequences $\mathbf{a}_{1.k+1}$ and
$\mathbf{a}_{2.k+1}$ are necessarily disjoint since they are subsequences of $\mathbf{a}_{1.k}$
and $\mathbf{a}_{2.k}$, respectively, which are disjoint by induction hypothesis.
So, for $i = k + 1$, we take the pairs to be
$\langle \mathbf{a}_{1.k+1}, \mathbf{a}_{2.k+1} \rangle$ and
$\langle b_{1.k+1}, b_{2.k+1} \rangle$, respectively.
Note that our inductive definition guarantees that for each index $i$, where
$1 \le i \le d(N)$, the sequences $\mathbf{a}_{1.i}$ and $\mathbf{a}_{2.i}$ contain only odd and
even tokens, respectively.
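The shape of this induction can be sketched in Python with lazy infinite sequences. The sketch makes assumptions that the proof does not: tokens are identified with their indices 1, 2, ..., and each refinement step simply keeps every other element, whereas in the proof the split is determined by where the switch routes its tokens. The helper name `split` is hypothetical.

```python
from itertools import count, islice


def split(make_sequence):
    """Split an infinite sequence into two disjoint infinite subsequences.

    make_sequence is a zero-argument function returning a fresh iterator,
    so that each subsequence can be traversed independently.
    """
    def first():
        return islice(make_sequence(), 0, None, 2)  # positions 0, 2, 4, ...

    def second():
        return islice(make_sequence(), 1, None, 2)  # positions 1, 3, 5, ...

    return first, second


def tokens():
    return count(1)  # token indices 1, 2, 3, ...


# Basis case: the odd and even subsequences.
odd, even = split(tokens)
# One induction step: each subsequence is refined into two disjoint infinite ones.
odd_next, odd_other = split(odd)
even_next, even_other = split(even)
```

Each refinement is a subsequence of the previous one, so the odd side only ever contains odd tokens and the even side only even tokens, and both stay infinite.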
We now continue with the construction of sequence $\alpha$. Write
$\alpha = \alpha_1 \cdot \alpha_2 \cdot \ldots$ as the concatenation of an infinite number of
execution fragments, where each execution fragment $\alpha_i$ is finite and includes switch
transitions involving token $t_i$, as follows:
For each token $t_i$, denote by $\mathit{last}(t_i)$ the largest integer $k$,
$1 \le k < d(N)$, such that token $t_i$ is in either $\mathbf{a}_{1.k}$ or $\mathbf{a}_{2.k}$,
but in neither $\mathbf{a}_{1.k+1}$ nor $\mathbf{a}_{2.k+1}$, or $d(N)$ if no such integer
exists. Then, the execution fragment $\alpha_i$ includes only the switch transitions involving
the token $t_i$ and a switch from each layer $\ell$, where $1 \le \ell \le \mathit{last}(t_i)$.
Intuitively, each token $t_i$ enters the network from either switch $b_{1.1}$ or switch
$b_{2.1}$; it traverses the network till either it exits the network or it is “halted” once it
gets to cover the switch immediately following the switch it has traversed in layer
$\mathit{last}(t_i)$ (in case $\mathit{last}(t_i) < d(N)$).
[Footnote 3] Note that this assumption is indeed with no loss of generality, since for wires
that connect non-consecutive layers, we can insert dummy switches in the missing layers, with
input and output width 1, which simply forward tokens (without routing them).
Note that the construction of execution $\alpha$ guarantees, in particular, that both sequences
$\mathbf{a}_{1.d(N)}$ and $\mathbf{a}_{2.d(N)}$ are infinite. Thus, it follows that an infinite
number of odd tokens traverses switch $b_{1.d(N)}$, and an infinite number of even tokens
traverses switch $b_{2.d(N)}$.
The construction of execution $\alpha$ induces an odd path
$\pi_1 = b_{1.1}, \ldots, b_{1.d(N)}$ and an even path $\pi_2 = b_{2.1}, \ldots, b_{2.d(N)}$.
The odd and even paths are traversed by odd and even tokens, respectively. Since the switches
$b_{1.i}$ and $b_{2.i}$ are distinct for all layers $i$, $1 \le i \le d(N)$, it follows that the
odd and even paths are disjoint. The rest of our proof will use the two possible ways of
ordering these two disjoint paths in order to create two corresponding executions. We will use
the fact that the two resulting executions are both still indistinguishable from $\alpha$ and
linearizable; this will lead to a contradiction. We now continue with the details of the formal
proof.
We proceed to use execution $\alpha$ in order to construct a finite execution $\beta$.
Construction of execution $\beta$: Fix $\beta$ to be the shortest prefix of $\alpha$ that
includes a switch transition involving an odd token at a switch from layer $d(N)$ and a switch
transition involving an even token at a switch from layer $d(N)$; thus, $\beta$ is a (not
necessarily complete) finite execution. Since $\beta$ is finite, it only involves a finite
number $n$ of tokens. By construction of execution $\alpha$, the types of these $n$ tokens are
$n$-wise independent over $\langle I, \oplus \rangle$. Moreover, for each token $t$ involved in
execution $\beta$, $\mathit{Out}_\beta(t) = \mathit{Out}_\alpha(t)$.
Denote by $t_1$ and $t_2$ the latest odd and even tokens, respectively, in execution $\beta$.
We will use $t_1$ and $t_2$ in order to construct from $\beta$ two distinct finite executions
$\beta_1$ and $\beta_2$.
Construction of executions $\beta_1$ and $\beta_2$: We permute switch transitions in execution
$\beta$ in order to obtain executions $\beta_1$ and $\beta_2$ as follows:
• In execution $\beta_1$, all switch transitions involving odd tokens precede the switch
transitions involving even tokens.
• In execution $\beta_2$, all switch transitions involving even tokens precede the switch
transitions involving odd tokens.
In both executions $\beta_1$ and $\beta_2$, the relative order of odd tokens (resp., even
tokens) is the same as the relative order of odd tokens (resp., even tokens) in execution
$\beta$.
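The two reorderings (odd-first and even-first) are stable partitions of the transition sequence. The Python sketch below illustrates this under assumptions made only for the example: a switch transition is modelled as a pair (token index, switch name), odd tokens have odd indices, and all function names are hypothetical.

```python
def is_odd(token_index):
    return token_index % 2 == 1


def permute(transitions, odd_first):
    """Stable partition: keep the relative order within odd and within even tokens."""
    odd = [tr for tr in transitions if is_odd(tr[0])]
    even = [tr for tr in transitions if not is_odd(tr[0])]
    return odd + even if odd_first else even + odd


def paths_disjoint(transitions):
    """True if no switch is visited by both an odd and an even token."""
    odd_switches = {sw for tok, sw in transitions if is_odd(tok)}
    even_switches = {sw for tok, sw in transitions if not is_odd(tok)}
    return odd_switches.isdisjoint(even_switches)


def visits(transitions, switch):
    """Order in which tokens reach a given switch."""
    return [tok for tok, sw in transitions if sw == switch]
```

When the odd and even paths are disjoint, every switch sees its tokens in the same order in the original sequence and in both reorderings, which is why the outputs of individual tokens are unaffected by the permutation.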
Since odd tokens (resp., even tokens) follow the odd path $\pi_1$ (resp., even path $\pi_2$) in
both executions $\beta_1$ and $\beta_2$, the paths $\pi_1$ and $\pi_2$ are disjoint, and the
relative order of odd and even tokens, respectively, is maintained in all executions $\beta$,
$\beta_1$ and $\beta_2$, it follows that
$\mathit{Out}_\beta(t_1) = \mathit{Out}_{\beta_1}(t_1) = \mathit{Out}_{\beta_2}(t_1)$ and
$\mathit{Out}_\beta(t_2) = \mathit{Out}_{\beta_1}(t_2) = \mathit{Out}_{\beta_2}(t_2)$.
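We finally use executions $\beta_1$ and $\beta_2$ in order to construct executions $\gamma_1$
and $\gamma_2$.
Construction of executions $\gamma_1$ and $\gamma_2$: We extend $\beta_1$ and $\beta_2$ to
complete executions $\gamma_1$ and $\gamma_2$, respectively.
Since $\gamma_1$ extends $\beta_1$ and the traversals of tokens $t_1$ and $t_2$ are both
completed in $\beta_1$, it follows that
$\mathit{Out}_{\gamma_1}(t_1) = \mathit{Out}_{\beta_1}(t_1)$ and
$\mathit{Out}_{\gamma_1}(t_2) = \mathit{Out}_{\beta_1}(t_2)$. Since $\gamma_2$ extends $\beta_2$
and the traversals of tokens $t_1$ and $t_2$ are both completed in $\beta_2$, it follows that
$\mathit{Out}_{\gamma_2}(t_1) = \mathit{Out}_{\beta_2}(t_1)$ and
$\mathit{Out}_{\gamma_2}(t_2) = \mathit{Out}_{\beta_2}(t_2)$. It follows that
$\mathit{Out}_{\gamma_1}(t_1) = \mathit{Out}_{\gamma_2}(t_1) = \mathit{Out}_\beta(t_1)$
and $\mathit{Out}_{\gamma_1}(t_2) = \mathit{Out}_{\gamma_2}(t_2) = \mathit{Out}_\beta(t_2)$.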
Since $\gamma_1$ is a complete execution in which the types of operations are $n$-wise
independent, the Monotone Linearizability Lemma implies that $\gamma_1$ is linearizable. Recall
that, by construction, $t_1 \xrightarrow{\gamma_1} t_2$. It follows that
$t_1 \xrightarrow{S(\gamma_1)} t_2$. Thus, the Monotonicity under Composition property implies
that $\mathit{Out}_{\gamma_1}(t_1) \prec \mathit{Out}_{\gamma_1}(t_2)$. Similarly, since
$\gamma_2$ is a complete execution in which the types of operations are $n$-wise independent,
the Monotone Linearizability Lemma implies that $\gamma_2$ is linearizable. Recall that, by
construction, $t_2 \xrightarrow{\gamma_2} t_1$. It follows that
$t_2 \xrightarrow{S(\gamma_2)} t_1$. Thus, the Monotonicity under Composition property implies
that $\mathit{Out}_{\gamma_2}(t_2) \prec \mathit{Out}_{\gamma_2}(t_1)$.
So, in total,
$\mathit{Out}_{\gamma_1}(t_1) \prec \mathit{Out}_{\gamma_1}(t_2) = \mathit{Out}_{\gamma_2}(t_2) \prec \mathit{Out}_{\gamma_2}(t_1) = \mathit{Out}_{\gamma_1}(t_1)$.
A contradiction. □
We remark that the assumption of a non-trivial switching network is essential for
Theorem 6.2 to hold: A switching network consisting of a single infinite-state switch with
n input wires and n output wires (where n is the number of concurrent processes) can im-
plement any RMW register as follows. The state of the variable is encoded by the state of
the switch. To invoke an operation on the variable, a process issues a token with a state
encoding the type of the operation. Such a token, when atomically processed by the switch,
will cause the natural changes to its state and to the state of the switch, so that the new state
of the switch is the new state of the variable, and the new state of the token is the response
of the variable to the operation invoked by the token.
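The single-switch construction can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's formal model: the variable holds an integer, the type of an operation is modelled as a function from the old value to the new value, each call to `process` is assumed to be one atomic switch transition, and the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    # Type of the RMW operation: maps the old value of the variable to the new one.
    op: Callable[[int], int]
    # Filled in by the switch: the response of the variable to this operation.
    response: Optional[int] = None


class SingleInfiniteStateSwitch:
    """Hypothetical model in which the state of the switch is the state of the variable."""

    def __init__(self, initial: int) -> None:
        self.state = initial

    def process(self, token: Token) -> Token:
        # Assumed to execute atomically, as a single switch transition.
        token.response = self.state        # read: the token records the old value
        self.state = token.op(self.state)  # modify and write: the switch takes the new value
        return token
```

For example, starting from state 1, a Fetch&Add(5) token receives response 1 and leaves the switch in state 6; a subsequent Fetch&Multiply(3) token receives response 6 and leaves the switch in state 18.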
Recall the Integers with Addition monotone group
$\langle \mathbb{Z}, \mathbb{N} \setminus \{0\}, +, < \rangle$ and the Rationals with
Multiplication monotone group, which are associated with the monotone Fetch&Add and
Fetch&Multiply operations, respectively. So, Theorem 6.2 immediately implies corresponding
impossibility results for switching networks implementing the Fetch&Add and Fetch&Multiply
operations.
7. Conclusion
We have studied the possibility or impossibility, and the corresponding costs, of devising
distributed implementations of any monotone RMW operation that achieve high concurrency and
low contention. Through our Monotone Linearizability Lemma, which may be of
independent interest, we identified inherent ordering constraints of linearizability for any
such implementation; we proposed exploiting this inherent linearizability in order to devise
impossibility proofs. We succeeded in doing so within the specific context of a switching
network implementing a monotone RMW operation, for which we derived the first lower
bounds on size. These negative end results establish the first space complexity separations
between Fetch&Increment and any monotone RMW operation in the model of switching
networks.
We remark that the proof of the impossibility result for infinite-state networks has required
unbounded concurrency. This is not the case for finite-state switching networks, even though
we have made similar assumptions on register bottleneck and switch bottleneck for the two
classes of switching networks, respectively, in our corresponding proofs. Thus, the two
impossibility results represent a trade-off between the strength of the switches (finite or
infinite number of states) and the concurrency of the network (bounded or unbounded), and
neither of them is implied by the other.
Finally, we mention that we are able to use our Monotone Linearizability Lemma to
prove a lower bound on latency for switching networks that implement monotone groups.
Specifically, we prove that any switching network (whether made up of switches with a
finite or infinite number of states) that implements a monotone RMW operation induces
executions with latency $\Omega(n)$, where $n$ is the number of concurrent processes
participating in the execution. This lower bound complements the corresponding lower bound on
latency shown in [7, Theorem 3.2].
Acknowledgements
We would like to thank the anonymous Theoretical Computer Science and SIROCCO
2003 reviewers for their helpful comments.
References
[1] W. Aiello, C. Busch, M. Herlihy, M. Mavronicolas, N. Shavit, D. Touitou, Supporting increment and
decrement operations in balancing networks, Chicago J. Theoretical Comput. Sci. 2000-4, December 14,
2000 (electronic).
[2] J. Aspnes, M. Herlihy, N. Shavit, Counting networks, J. ACM 41 (5) (1994) 1020–1048.
[3] J.E. Burns, N.A. Lynch, Bounds on shared memory for mutual exclusion, Inform. and Comput. 107 (2) (1993)
171–184.
[4] C. Dwork, M. Herlihy, O. Waarts, Contention in shared memory algorithms, J. ACM 44 (6) (1997) 779–805.
[5] K.P. Eswaran, J.N. Gray, R.A. Lorie, I.L. Traiger, The notions of consistency and predicate locks in a database
system, Commun. ACM 19 (11) (1976) 624–633.
[6] P. Fatourou, M. Herlihy, Adding networks, Proc. 15th Internat. Symp. on DIStributed Computing, J.L. Welch
(Ed.), Lecture Notes in Computer Science, Vol. 2180, Springer, Lisbon, Portugal, October 2001, pp. 330–342.
[7] P. Fatourou, M. Herlihy, Read–Modify–Write networks, Distributed Comput. 17 (2004) 33–46.
[8] J. Goodman, M. Vernon, P. Woest, Efficient synchronization primitives for large-scale, cache-coherent
multiprocessors, Proc. 3rd Internat. Conf. on Architectural Support for Programming Languages and
Operating Systems, April 1989, pp. 64–75.
[9] M. Herlihy, N. Shavit, O. Waarts, Linearizable counting networks, Distributed Comput. 9 (4) (1996)
193–203.
[10] M. Herlihy, J. Wing, Linearizability: a correctness condition for concurrent objects, ACM Trans. Programming
Languages and Systems 12 (3) (1990) 463–492.
[11] C.P. Kruskal, L. Rudolph, M. Snir, Efficient synchronization on multiprocessors with shared memory, Proc.
5th Annu. ACM Symp. on Principles of Distributed Computing, August 1986, pp. 218–228.
