In this work, we embark on a study of the possibility (or impossibility), and the corresponding costs, of devising concurrent, low-contention implementations of atomic Read-Modify-Write operations (abbreviated as RMW), in a distributed system. We consider a natural class of RMW operations which give rise to a certain class of algebraic groups that we introduce here and call monotone groups. Our chief combinatorial instrument is a Monotone Linearizability Lemma, which establishes inherent ordering constraints of linearizability for a certain class of executions of any distributed system that implements a monotone RMW operation.
• If the network is made up of switches with infinite state and it incurs low contention, then it must still contain an infinite number of switches if we now allow concurrency to grow unbounded.
• Any switching network induces executions with latency at least $\lceil \frac{n-1}{c-1} \rceil$, where n is the number of concurrent processes and c is the maximum number of processes that simultaneously access a switch.
Introduction
Motivation, Framework and Outline. A Read-Modify-Write shared variable [8, 13] , henceforth abbreviated as RMW, is an abstract variable type that allows reading its old value, and determining (via some specific operator) and writing a new value back to it in a single, atomic (indivisible) RMW operation (cf. [15, Example 9.4.2] ). A RMW operation is a strong synchronization primitive that allows for the design of efficient and transparent algorithms in the asynchronous shared memory model of distributed computation; see, e.g., the folklore algorithm for mutual exclusion described in [3, Section 4.3.2] , or the scalable ordered multicast protocol of Herlihy et al. [11] that is based on a modular use of the distributed Swap operation, a special case of RMW. Due to their fundamental importance as synchronization primitives, it is most desirable to devise suitable distributed data structures for the construction of concurrent, low-contention implementations of RMW variables. Intuitively, the contention of an implementation measures the extent to which concurrent processes access the same memory location simultaneously; it has been argued quite convincingly that contention is a critical factor for the overall efficiency of shared memory algorithms (cf. [6] ). The central question motivating this work is the possibility (or impossibility), and the corresponding incurred complexities, for concurrent, low-contention implementations of RMW shared variables.
We focus on a specific class of RMW operations whose associated operators give rise to a certain class of algebraic groups introduced and studied here, which we call monotone groups. A monotone group has a total order and a monotone subdomain associated with it; the latter enjoys a significant monotonicity property, which we call monotonicity under composition: applying the operator to two elements of the monotone subdomain results in another element of the monotone subdomain that strictly dominates each of them with respect to the total order. For example, the Fetch&Add operation (over the set of integers) clearly falls into the context of monotone groups, since adding a positive integer to a positive integer results in a larger positive integer; here, the monotone subdomain is the set of positive integers. So also does the Fetch&Multiply operation, and so on. A monotone RMW operation is one that is associated with a monotone group.
An abstract concept defined in relation to monotone groups is that of n-wise independence. Roughly speaking, n elements of a monotone group are n-wise independent if it is not possible to derive the identity element of the group through successive (specifically restricted though) applications of the operator on n of the elements or their inverses. A preliminary but significant property of monotone groups that we prove is that every monotone group is n-wise independent, in the sense of having n-wise independent elements. As we establish, the existence of n-wise independent elements in a monotone group is largely responsible for enforcing linearizability [12] for certain suitable executions of a distributed system that implements the corresponding (monotone) RMW operation; recall that an execution is linearizable [12] if the values returned to operations in it respect the real-time ordering of the operations.
As a consequence, the main conclusion of our work is that guaranteeing the inherent linearizability for these particular executions must incur a high cost in efficiency for a certain class of concurrent, low-contention implementations of (monotone) RMW that are based on switching networks; these are concurrent, low-contention data structures that were recently introduced [7] as a generalization of counting networks [2]. Roughly speaking, a switching network is a directed, acyclic graph made up of switches and output registers; whenever a process issues a RMW operation, it shepherds a token through the network, which traverses a path of switches until it is eventually returned a value (at an output register). Thus, concurrent processes are spatially dispersed in a switching network, which reduces their simultaneous crossings in front of the same memory location; this offers potential for low contention. The size of a switching network is the total number of switches in it; its latency is the maximum number of switches traversed by a token shepherding a RMW operation through the network. The concurrency of a switching network is the maximum number of concurrent processes that may shepherd a RMW operation through the network.
In order to model the low-contention property for switching networks, we introduce register bottleneck and layer bottleneck; roughly speaking, both register bottleneck and layer bottleneck measure the minimum number of network elements (either switches or output registers) that are accessed by processes in any infinite execution. (Layer bottleneck assumes partitioning the switches of the network into layers in the natural way.) Intuitively, if this minimum number is small, some network element will become a bottleneck (or a "hot-spot" in the pool of memory locations) in some infinite execution and the network incurs high contention; hence, a switching network is low-contention if register bottleneck and layer bottleneck are sufficiently large.
• If the switching network is made up of switches with finite state and it is low-contention, then it must contain an infinite number of switches, even if concurrency is restricted to remain bounded (Theorem 6.1).
• If the switching network is made up of switches with infinite state and it is low-contention, then it must still contain an infinite number of switches if concurrency is now allowed to grow unbounded (Theorem 6.2).
We note that our two lower bounds on the size of any switching network that implements a monotone RMW operation represent a trade-off between the strength of the switches (finite or infinite state) and the concurrency of the network (bounded or unbounded). Thus, neither of them is implied by the other. Our final result deals with latency. We obtain:
• Any switching network (whether made up of switches of finite or infinite state) that implements a monotone RMW operation induces executions with latency at least $\lceil \frac{n-1}{c-1} \rceil$, where n is the number of concurrent processes participating in the execution, and c, the network's capacity, is the maximum number of processes that simultaneously access a switch in any execution of the network.
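To get a feel for the latency bound, the following minimal sketch evaluates it for a few values of n and c. It assumes the bound is read as the ceiling of (n−1)/(c−1) with c ≥ 2; the function name `latency_lower_bound` is hypothetical and introduced only for illustration.

```python
def latency_lower_bound(n: int, c: int) -> int:
    """Evaluate ceil((n - 1) / (c - 1)).

    n: number of concurrent processes (assumed n >= 1).
    c: capacity of the network (assumed c >= 2, so the denominator is positive).
    """
    if n < 1 or c < 2:
        raise ValueError("expected n >= 1 and c >= 2")
    # Integer ceiling division, avoiding floating-point rounding.
    return -((n - 1) // -(c - 1))


if __name__ == "__main__":
    for n, c in [(8, 2), (8, 4), (10, 10), (1, 2)]:
        print(n, c, latency_lower_bound(n, c))
```

With capacity 2, the bound grows linearly in the number of processes; it drops to 1 only when the capacity is as large as the number of processes.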
Our impossibility results for switching networks indicate that inherent linearizability, necessitated by our Monotone Linearizability Lemma, is the crucial bottleneck that rules out efficiency (with respect to both size and latency) for any low-contention switching network that implements a monotone RMW operation. In fact, we believe that inherent linearizability is indeed the crucial efficiency bottleneck for any such class of distributed, low-contention implementations, but this remains to be seen. Finally, we remark that linearizability has so far been studied as a required property for a distributed system that best guarantees acceptable concurrent behavior. To the best of our knowledge, our work is the first to provide, through the Monotone Linearizability Lemma, a (non-trivial) instance of a distributed system where linearizability is an inherent property.
Related Work, Comparison and Significance. The notion of linearizability has been introduced by Herlihy and Wing [12] . Switching networks (and, in particular, adding networks) were recently studied in [7] , as an extension to counting networks [2] that accommodates the general Fetch&Add operation (as opposed to the Fetch&Increment and Fetch&Decrement operations that were supported before by counting networks [1, 2, 18] ); for more on counting networks, see, e.g., [4, 5, 10, 16, 17] .
Theorems 6.1 and 6.2 settle in the negative a far-reaching generalization of a specific open question articulated in [7, Section 5] about the existence of switching networks with a finite number of switches that implement the (monotone) Fetch&Add operation. (Two solutions, called adding networks, with an infinite number of switches were presented in [7, Section 4].) Indeed, the more general problem of devising finite network-based data structures, as suitable extensions to counting networks, to support synchronization operations other than Fetch&Increment (which was originally supported by counting networks) was already stated in the seminal work of Aspnes et al. [2] that introduced counting networks; however, it has remained essentially open: progress on this problem has so far been limited to discovering that counting networks themselves can also support Fetch&Decrement (concurrently with Fetch&Increment) [1, 18]. The impossibility results established in Theorems 6.1 and 6.2 provide a mathematical explanation for the apparent lack of progress on this problem; thus, they are significant since they explain the observed inability of researchers in the last decade or so (since the original conference publication of counting networks [2]) to operationally extend counting networks, while keeping them finite and low-contention, in order to perform tasks more complex than just incrementing a counter by one but yet as simple as adding an arbitrary value to a counter.
The structure of the proofs of Theorems 6.1 and 6.2 is inspired by that of the proof of a result of Herlihy et al. [10, Theorem 5.1], showing that any (non-blocking) counting network [2] (other than the trivial single-balancer one) must have an infinite number of balancers if all of its executions are to be linearizable. The requirement that all executions be linearizable allows the proof of [10, Theorem 5.1] to pick the execution of choice and force it to violate linearizability. However, a switching network for a monotone RMW operation need not guarantee linearizability in all executions; thus, the role of the Monotone Linearizability Lemma is to contribute to the proofs of Theorems 6.1 and 6.2 executions that are necessarily linearizable. Note also that although a counting network is a special case of a switching network, the lower bound on size established in [10, Theorem 5.1] for a linearizable counting network does not immediately apply to switching networks that implement a monotone RMW operation, since the proof of [10, Theorem 5.1] relies on the behavior of counting networks; instead, the proofs of Theorems 6.1 and 6.2 require far more delicate arguments that are specific to the behavior of switching networks.
Theorem 6.3 significantly extends and improves [7, Theorem 1] in the following ways: first, Theorem 6.3 applies to switching networks that implement any monotone RMW operation, while [7, Theorem 1] is specific to adding networks and the Fetch&Add operation; second, despite the enhanced generality of Theorem 6.3, its proof is far simpler, more natural and more succinct than that of [7, Theorem 1].
Monotone Groups
Basic Definitions. We start by reviewing some very basic definitions from Group Theory. (See [9] for a general background in Group Theory.) A (binary) operator (also called composition law) on a set $\Gamma$ is a mapping $\oplus : \Gamma \times \Gamma \to \Gamma$. A group $\langle \Gamma, \oplus \rangle$ is a set $\Gamma$ together with an operator $\oplus$ such that: (1) Closure Property: for all pairs of elements $a, b \in \Gamma$, $a \oplus b \in \Gamma$, (2) Associativity: for all triples of elements $a, b, c \in \Gamma$, $(a \oplus b) \oplus c = a \oplus (b \oplus c)$, (3) Identity Element: there is an element $e \in \Gamma$, called the identity element of $\Gamma$, such that for each element $a \in \Gamma$, $a \oplus e = e \oplus a = a$, and (4) Inverse Element: for each element $a \in \Gamma$, there is an element $a^{-1} \in \Gamma$, called the inverse of $a$, such that $a \oplus a^{-1} = a^{-1} \oplus a = e$. An Abelian group is a group $\langle \Gamma, \oplus \rangle$ which satisfies in addition the following property: (5) Commutativity: for all pairs of elements $a, b \in \Gamma$, $a \oplus b = b \oplus a$.
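Composite Operators. We proceed to define two composite operators by applying the operator $\oplus$ a number of times. For any integer $k$, define the unary operator $\bigodot_k : \Gamma \to \Gamma$ as follows:
$$\bigodot_k a = \begin{cases} \underbrace{a \oplus \cdots \oplus a}_{k \text{ times}} & \text{if } k > 0, \\ e & \text{if } k = 0, \\ \underbrace{a^{-1} \oplus \cdots \oplus a^{-1}}_{-k \text{ times}} & \text{if } k < 0. \end{cases}$$
Call $\bigodot_k$ the power operator. For any integer $n \geq 2$, the operator $\bigoplus^n$ is n-ary; it takes as input a sequence of elements $a_1, a_2, \ldots, a_n$, and it yields the result $\bigoplus^n (a_1, a_2, \ldots, a_n) = a_1 \oplus a_2 \oplus \cdots \oplus a_n$, denoted as $\bigoplus_{i=1}^{n} a_i$. (By associativity, the result is well defined.) Call $\bigoplus$ the summation operator.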
Monotone Groups. Assume now that the set $\Gamma$ is totally ordered; thus, a total order $\preceq$ is defined on $\Gamma$. For any pair of elements $a, b \in \Gamma$, write $a \prec b$ if $a \preceq b$ and $a \neq b$. A monotone subdomain of $\Gamma$ is a subset $\mathbb{M} \subseteq \Gamma$ that satisfies the following three properties: (1) Closure: for any two elements $a, b \in \mathbb{M}$, $a \oplus b \in \mathbb{M}$, (2) Identity Lower Bound: for any element $a \in \mathbb{M}$, $e \prec a$, and (3) Monotonicity under Composition: for any pair of elements $a, b \in \mathbb{M}$, both $a \prec a \oplus b$ and $b \prec a \oplus b$. Notice that $e \notin \mathbb{M}$. Notice also that $\mathbb{M}$ is necessarily infinite. A monotone group is a quadruple $\langle \Gamma, \mathbb{M}, \oplus, \preceq \rangle$, where $\langle \Gamma, \oplus \rangle$ is an Abelian group, $\preceq$ is a total order on $\Gamma$, and $\mathbb{M} \subseteq \Gamma$ is a monotone subdomain of $\Gamma$.
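The three properties of a monotone subdomain can be spot-checked on a finite sample. The sketch below is only illustrative: a finite sample cannot prove the properties for an infinite subdomain, and the function name `check_monotone_subdomain` and its parameters are hypothetical names introduced here.

```python
import operator
from itertools import product


def check_monotone_subdomain(sample, member, op, identity, strictly_less):
    """Spot-check the monotone-subdomain properties on a finite sample.

    sample:        finite iterable of candidate elements of the subdomain
    member:        predicate deciding membership in the (infinite) subdomain
    op:            the group operator
    identity:      the identity element e of the group
    strictly_less: the strict order derived from the total order
    """
    sample = list(sample)
    for a in sample:
        # Identity Lower Bound: e must be strictly below every element.
        if not member(a) or not strictly_less(identity, a):
            return False
    for a, b in product(sample, repeat=2):
        c = op(a, b)
        # Closure: the composition stays in the subdomain.
        if not member(c):
            return False
        # Monotonicity under Composition: c strictly dominates a and b.
        if not (strictly_less(a, c) and strictly_less(b, c)):
            return False
    return True


if __name__ == "__main__":
    # Positive integers under addition (identity 0).
    print(check_monotone_subdomain(range(1, 20), lambda x: x > 0,
                                   operator.add, 0, operator.lt))
    # Integers >= 2 under multiplication (identity 1).
    print(check_monotone_subdomain(range(2, 20), lambda x: x >= 2,
                                   operator.mul, 1, operator.lt))
```

Including 1 in the multiplicative sample makes the check fail, which matches the exclusion of the identity from the monotone subdomain.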
We proceed with some examples of monotone groups that will be used in our later analysis. Throughout, denote by $\mathbb{Z}$, $\mathbb{N}$ and $\mathbb{Q}$ the sets of integers, natural numbers (including zero), and rational numbers, respectively. We will use $+$ and $\cdot$ to denote the common (binary) operators of addition and multiplication, respectively, on these sets. Denote by $\leq$ the less-than-or-equal relation (total order) on these sets. The quadruple $\langle \mathbb{Z}, \mathbb{N} \setminus \{0\}, +, \leq \rangle$ is a monotone group (integers with addition). From the definition of the power operator $\bigodot_k$, for any integer $k$, we have that for any integer $a \in \mathbb{Z}$, $\bigodot_k a = k \cdot a$. From the definition of the summation operator $\bigoplus_{i=k_1}^{k_2}$, for any pair of integers $k_1$ and $k_2$, we have that for any sequence of $k_2 - k_1 + 1$ integers $a_{k_1}, a_{k_1+1}, \ldots, a_{k_2} \in \mathbb{Z}$, $\bigoplus_{i=k_1}^{k_2} a_i = \sum_{i=k_1}^{k_2} a_i$. The quadruple $\langle \mathbb{Q}, \mathbb{N} \setminus \{0, 1\}, \cdot, \leq \rangle$ is also a monotone group (rationals with multiplication). From the definition of the power operator $\bigodot_k$, for any integer $k$, we have that for any rational number $a \in \mathbb{Q}$, $\bigodot_k a = a^k$. From the definition of the summation operator $\bigoplus_{i=k_1}^{k_2}$, for any pair of integers $k_1$ and $k_2$, we have that for any sequence of $k_2 - k_1 + 1$ rational numbers $a_{k_1}, a_{k_1+1}, \ldots, a_{k_2} \in \mathbb{Q}$, $\bigoplus_{i=k_1}^{k_2} a_i = \prod_{i=k_1}^{k_2} a_i$.
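Independence. Fix any integer $n \geq 2$, and consider any $n$ distinct elements $a_1, a_2, \ldots, a_n \in \Gamma$ with $a_1, a_2, \ldots, a_n \neq e$. Say that $a_1, a_2, \ldots, a_n$ are n-wise independent in $\langle \Gamma, \oplus \rangle$ if for any sequence of $n$ integers $k_1, k_2, \ldots, k_n \in \{-1, 0, 1, 2\}$ that are not all simultaneously zero, $\bigoplus_{i=1}^{n} \bigodot_{k_i} a_i \neq e$, where $\bigodot_{k_i} a_i$ denotes the $k_i$-th power of $a_i$. Say that the monotone group $\langle \Gamma, \mathbb{M}, \oplus, \preceq \rangle$ is n-wise independent if there are $n$ distinct elements $a_1, a_2, \ldots, a_n \in \mathbb{M}$ which are n-wise independent in $\langle \mathbb{M}, \oplus \rangle$.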
By the definition of n-wise independence, $n$ integers $a_1, a_2, \ldots, a_n \in \mathbb{N} \setminus \{0\}$, where $n \geq 2$, are n-wise independent in $\langle \mathbb{N} \setminus \{0\}, + \rangle$ if for any sequence of $n$ integers $k_1, k_2, \ldots, k_n \in \{-1, 0, 1, 2\}$ that are not all simultaneously zero, $\sum_{i=1}^{n} k_i a_i \neq 0$. We are able to prove that for any integer $n \geq 2$, the monotone group $\langle \mathbb{Z}, \mathbb{N} \setminus \{0\}, +, \leq \rangle$ is n-wise independent. From the definition of n-wise independence, $n$ integers $a_1, a_2, \ldots, a_n \in \mathbb{N} \setminus \{0, 1\}$ are n-wise independent in $\langle \mathbb{N} \setminus \{0, 1\}, \cdot \rangle$ if for any sequence of $n$ integers $k_1, k_2, \ldots, k_n \in \{-1, 0, 1, 2\}$ that are not all simultaneously zero, $\prod_{i=1}^{n} a_i^{k_i} \neq 1$. Consider any $n$ distinct prime numbers $a_1, a_2, \ldots, a_n$. Then, $\prod_{i=1}^{n} a_i^{k_i}$ is a rational number whose numerator and denominator have no common factors and, since some $k_i \neq 0$, are not both equal to 1; so $\prod_{i=1}^{n} a_i^{k_i} \neq 1$, and the $n$ integers $a_1, a_2, \ldots, a_n$ are n-wise independent in $\langle \mathbb{N} \setminus \{0, 1\}, \cdot \rangle$. This implies that the monotone group $\langle \mathbb{Q}, \mathbb{N} \setminus \{0, 1\}, \cdot, \leq \rangle$ is n-wise independent.
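For small n, n-wise independence can be checked by brute force over all exponent sequences in {−1, 0, 1, 2}. The following is a minimal sketch under that reading of the definition; the function names and the sample element sets (powers of 3 for addition, small primes for multiplication) are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from itertools import product

EXPONENTS = (-1, 0, 1, 2)  # admissible values for each k_i


def independent_additive(elements):
    """True if no non-trivial combination sum(k_i * a_i) equals 0."""
    for ks in product(EXPONENTS, repeat=len(elements)):
        if any(ks) and sum(k * a for k, a in zip(ks, elements)) == 0:
            return False
    return True


def independent_multiplicative(elements):
    """True if no non-trivial product of a_i ** k_i equals 1."""
    for ks in product(EXPONENTS, repeat=len(elements)):
        if not any(ks):
            continue  # skip the all-zero exponent sequence
        value = Fraction(1)
        for k, a in zip(ks, elements):
            value *= Fraction(a) ** k  # exact rational arithmetic
        if value == 1:
            return False
    return True


if __name__ == "__main__":
    print(independent_multiplicative([2, 3, 5]))  # distinct primes
    print(independent_multiplicative([2, 4]))     # 2**2 * 4**-1 == 1
    print(independent_additive([1, 3, 9]))
    print(independent_additive([1, 2]))           # 2*1 - 1*2 == 0
```

The search examines 4^n exponent sequences, so it is only practical for small n; it serves as a sanity check rather than a proof.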
Independence of Monotone Groups. We are able to show that every monotone group is n-wise independent. The proof uses a reduction to the (already proven) n-wise independence of the monotone group $\langle \mathbb{Q}, \mathbb{N} \setminus \{0\}, +, \leq \rangle$. (Thus, this establishes some kind of completeness of the monotone group $\langle \mathbb{Q}, \mathbb{N} \setminus \{0\}, +, \leq \rangle$ for the class of all n-wise independent monotone groups.)
System Model
Systems that Implement Monotone Groups. Our model of a distributed system is patterned after the one in [12, Section 2], adjusted to incorporate the issue of implementing a monotone group $\langle \Gamma, \mathbb{M}, \oplus, \preceq \rangle$. We consider a distributed system $P$ consisting of a collection of sequential threads of control, called processes. Each process applies a sequence of operations to a distributed data structure, called the object, alternately issuing an invocation and then receiving the associated response. Each invocation at process $p_i$ has the form $Invoke_i(a)$ for some value $a \in \mathbb{M}$; each response at process $p_i$ has the form $Response_i(b)$ for some value $b \in \mathbb{M} \cup \{e\}$. Formally, an execution of system $P$ is a (possibly infinite) sequence $\alpha$ of invocation and response events. We assume that for each invocation at process $p_i$ in execution $\alpha$, there is a later response in $\alpha$ that matches it, and no other invocation at $p_i$ that precedes the matching response in $\alpha$. An operation at process $p_i$ in execution $\alpha$ is a matching pair $op_i = [Invoke_i(a), Response_i(b)]$ of an invocation and response at $p_i$; for such an operation, we will write $a = In(op_i)$ and $b = Out(op_i)$, and we will sometimes say that $op_i$ is of type $a$. An execution $\alpha$ induces a partial order $\xrightarrow{\alpha}$ on its operations: $op^{(1)} \xrightarrow{\alpha} op^{(2)}$ if the response of $op^{(1)}$ precedes the invocation of $op^{(2)}$ in $\alpha$.
For any execution $\alpha$ of system $P$, a serialization $S(\alpha)$ of execution $\alpha$ is a sequence whose elements are the operations of $\alpha$, and each operation of $\alpha$ appears exactly once in $S(\alpha)$. Thus, a serialization $S(\alpha)$ is a total order $\xrightarrow{S(\alpha)}$ on the set of operations in $\alpha$. Notice that there may be, in general, many possible serializations of the execution $\alpha$. Say that a serialization $S(\alpha)$ is valid for the monotone group $\langle \Gamma, \mathbb{M}, \oplus, \preceq \rangle$ if the following two conditions hold: (1) if $op = [Invoke_i(a), Response_i(b)]$ is the first operation in $S(\alpha)$, then $b = e$, and (2) Valid Composition: for any pair of operations $op^{(1)} = [Invoke_i(a), Response_i(b)]$ and $op^{(2)} = [Invoke_j(a'), Response_j(b')]$ such that $op^{(2)}$ immediately follows $op^{(1)}$ in $S(\alpha)$, $b' = b \oplus a$. Say that system $P$ implements the monotone group $\langle \Gamma, \mathbb{M}, \oplus, \preceq \rangle$ if every execution $\alpha$ of $P$ has a valid serialization. We prove the Unique Serialization Lemma, asserting that for any execution $\alpha$ of $P$ implementing a monotone group, there is a unique valid serialization $S(\alpha)$.
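As an illustration of valid serializations, the sketch below recovers the serialization of a set of Fetch&Add operations from their input and output values alone. It assumes that Valid Composition means each operation returns the output of its predecessor composed with the predecessor's input (fetch-and-op semantics), and that the first operation returns the identity; the function name `valid_serialization` is hypothetical.

```python
def valid_serialization(ops, identity=0, compose=lambda x, y: x + y):
    """Return the operations in valid serialization order, or None.

    ops: list of (input_value, output_value) pairs, one per operation.
    Inputs come from the monotone subdomain, so outputs strictly
    increase along a valid serialization; sorting by output is
    therefore the only candidate order.
    """
    ordered = sorted(ops, key=lambda op: op[1])
    expected = identity  # the first operation must return e
    for value_in, value_out in ordered:
        if value_out != expected:
            return None  # no valid serialization exists
        # The next operation must observe this output composed with this input.
        expected = compose(value_out, value_in)
    return ordered


if __name__ == "__main__":
    # Three Fetch&Add operations adding 5, 3 and 2.
    print(valid_serialization([(5, 3), (3, 0), (2, 8)]))
    # Outputs 0 and 4 cannot both arise when the first operation adds 3.
    print(valid_serialization([(3, 0), (5, 4)]))
```

Because sorting by output yields the only possible candidate, the sketch also hints at why a valid serialization, when it exists, is unique.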
Sometimes, we will write $In_\alpha(op)$ and $Out_\alpha(op)$ in order to emphasize reference to execution $\alpha$.
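Linearizable Executions. We consider a system $P$ that implements a monotone group $\langle \Gamma, \mathbb{M}, \oplus, \preceq \rangle$. Say that execution $\alpha$ is linearizable [12] if the (unique) valid serialization $S(\alpha)$ extends $\xrightarrow{\alpha}$; that is, for any pair of operations $op^{(1)}$ and $op^{(2)}$ such that $op^{(1)} \xrightarrow{\alpha} op^{(2)}$, $op^{(1)} \xrightarrow{S(\alpha)} op^{(2)}$. Since $P$ implements the monotone group $\langle \Gamma, \mathbb{M}, \oplus, \preceq \rangle$, for any two operations $op^{(1)}$ and $op^{(2)}$ such that $op^{(1)} \xrightarrow{S(\alpha)} op^{(2)}$, $Out(op^{(1)}) \prec Out(op^{(2)})$. Thus, it follows that, in a linearizable execution $\alpha$, for any pair of operations $op^{(1)}$ and $op^{(2)}$ such that $op^{(1)} \xrightarrow{\alpha} op^{(2)}$, $Out(op^{(1)}) \prec Out(op^{(2)})$. Say that operation $op^{(1)}$ in execution $\alpha$ is non-linearizable in execution $\alpha$ if there is another operation $op^{(2)}$ in execution $\alpha$ such that $op^{(2)} \xrightarrow{\alpha} op^{(1)}$ and $Out(op^{(1)}) \prec Out(op^{(2)})$. Say that operation $op$ in execution $\alpha$ is linearizable in execution $\alpha$ if it is not non-linearizable in execution $\alpha$. Clearly, execution $\alpha$ is linearizable if every operation in execution $\alpha$ is linearizable in it.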
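The definitions above can be turned into a small checker. The sketch below assumes that an operation is non-linearizable when some operation that completed before its invocation returned a strictly larger value, that events carry integer timestamps, and that outputs are integers ordered by the usual order; the class `Op` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    process: int    # index of the issuing process
    invoked: int    # timestamp of the invocation event
    responded: int  # timestamp of the matching response event
    output: int     # value returned by the operation


def precedes(first: Op, second: Op) -> bool:
    """Real-time order: first responds before second is invoked."""
    return first.responded < second.invoked


def non_linearizable_ops(ops):
    """Operations that returned less than some operation completed earlier."""
    return [late for late in ops
            if any(precedes(early, late) and late.output < early.output
                   for early in ops)]


def is_linearizable(ops) -> bool:
    return not non_linearizable_ops(ops)


if __name__ == "__main__":
    ordered = [Op(1, 0, 1, 0), Op(2, 2, 3, 5)]
    inverted = [Op(1, 0, 1, 5), Op(2, 2, 3, 0)]
    overlapping = [Op(1, 0, 3, 5), Op(2, 1, 2, 0)]
    print(is_linearizable(ordered), is_linearizable(inverted),
          is_linearizable(overlapping))
```

Overlapping operations are unordered by the real-time order, so they never make an execution non-linearizable on their own.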
Switching Networks
Basic Definitions. A switching network [7] is a directed acyclic graph in which the nodes are called switches and the edges are called wires. An $(f_{in}, f_{out})$-switch is a routing element with $f_{in}$ input wires, $f_{out}$ output wires, and an internal state. A $(w_{in}, w_{out})$-switching network has $w_{in}$ input wires and $w_{out}$ output wires, and it is formed by connecting together switches; thus, we connect output wires of switches to input wires of other switches. Some switches have input wires (resp., output wires) not connected to other switches in the network, and these wires are called the input wires (resp., output wires) of the network. The size of a switching network is the number of its switches. A path in a switching network is a sequence of switches, each connected to the next. The depth $d(b)$ of a switch $b$ in a switching network is defined to be 0 if one of its input wires is an input wire of the network, and $\max_j d(b_j) + 1$ otherwise, where the maximum is taken over all switches $b_j$ that are connected to switch $b$. The depth $d$ of the network is defined as the maximum depth of any switch. The switching network can naturally be divided into $d$ layers, so that layer $\ell$ contains all switches of depth $\ell$, where $0 \leq \ell \leq d$. A simple (4, 4)-switching network with depth 3, made from (2, 2)-switches, is depicted in Figure 1, where switches are drawn with vertical lines and wires with horizontal lines; for each switch, input wires appear on its left and output wires on its right, and similarly for the switching network.
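The recursive definition of depth translates directly into a memoized traversal of the network graph. The sketch below uses a hypothetical six-switch network (not the one of Figure 1); the representation (a dictionary from each switch to the switches feeding it, plus a set of switches that own a network input wire) and the function names are assumptions made for illustration.

```python
def switch_depths(predecessors, input_switches):
    """Compute d(b) for every switch b.

    predecessors:   dict mapping each switch to the switches whose
                    output wires feed it
    input_switches: set of switches having a network input wire
    Assumes the graph is acyclic and every non-input switch has at
    least one predecessor.
    """
    depths = {}

    def depth(b):
        if b not in depths:
            if b in input_switches:
                depths[b] = 0
            else:
                depths[b] = 1 + max(depth(p) for p in predecessors[b])
        return depths[b]

    for b in predecessors:
        depth(b)
    return depths


def layers(depths):
    """Group switches by depth; layer l holds the switches of depth l."""
    result = {}
    for b, d in depths.items():
        result.setdefault(d, []).append(b)
    return result


if __name__ == "__main__":
    preds = {"b1": [], "b2": [],
             "b3": ["b1", "b2"], "b4": ["b1", "b2"],
             "b5": ["b3", "b4"], "b6": ["b3", "b4"]}
    d = switch_depths(preds, {"b1", "b2"})
    print(d)
    print(layers(d))
```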
Tokens. Processes access the switching network by issuing tokens. In contrast to counting networks [2] , each token has a state (a set of variables) which can change as the token traverses the network. In particular, a token enters the switching network from one of the network's input wires; then, the token is forwarded to the switch to which the wire belongs, the switch then routes the token to one of its output wires from which the token enters the next switch in the network, and so on. The token continues traversing the network in the same fashion until it reaches an output wire of the network. Then, the token exits the network and returns to the process that issued it. When a token traverses a switch, the states of the token and the switch change atomically before the token is routed to an output wire of the switch. Note that the token and the switch have different transition functions for their states. A switching network may be accessed by many tokens simultaneously which traverse the network asynchronously; however, each process has at most one token traversing the network each time. The latency of the switching network is the maximum number of switches traversed by any token (thus, it does not exceed the depth of the network). The concurrency of a switching network is the maximum number of processes (hence, tokens) allowed to access the network simultaneously.
Configurations. A network configuration of a switching network is the concatenation of the current states of the network's switches. A total configuration of a switching network is the concatenation of the current states of the network's switches and the states of all tokens that are currently traversing the network. Say that a switching network is in a quiescent total configuration if there are no tokens traversing the network (that is, all tokens that have entered the network have exited it). Denote by $x_i$ the total number of tokens that have ever entered from input wire $i$ of the network, where $1 \leq i \leq w_{in}$, and denote by $y_j$ the total number of tokens that have left from output wire $j$ of the network, where $1 \leq j \leq w_{out}$. The network must satisfy the following two properties: (1) Safety Property: in any total configuration, it must be that $\sum_{i=1}^{w_{in}} x_i \geq \sum_{j=1}^{w_{out}} y_j$; thus, no new tokens are created in the network, and (2) Liveness Property: given any finite number of input tokens that traverse the network, the network will eventually reach a quiescent total configuration. In any quiescent configuration it must be that $\sum_{i=1}^{w_{in}} x_i = \sum_{j=1}^{w_{out}} y_j$. The safety and liveness properties must also be satisfied by every individual switch in the network.
Executions. We model executions of switching networks in the style of Herlihy et al. [10]. For any switch $b$ and token $t$, we denote by $e = \langle t, b \rangle$ the transition in which the token passes (in one atomic step) from an input wire to an output wire of switch $b$. An execution of a switching network is a finite or infinite sequence $s_0, e_1, s_1, e_2, \ldots$ of alternating total configurations and switch transitions such that for each triple $s_i, e_{i+1}, s_{i+1}$, the switch transition $e_{i+1}$ carries the total configuration $s_i$ to total configuration $s_{i+1}$. A finite execution is complete if it results in a quiescent total configuration. An execution $\alpha$ is sequential if for any two transitions $e_i = \langle t_i, b_i \rangle$ and $e_j = \langle t_j, b_j \rangle$, where $t_i$ and $t_j$ correspond to the same token, all transitions (if any) between them also involve that token. In other words, tokens traverse the network one completely after the other in a sequential execution. In an execution of a switching network, we say that concurrency is bounded if the number of concurrent processes accessing the network in the execution is bounded. In an (infinite) execution, we say that concurrency is unbounded if the number of concurrent processes accessing the network in the execution is unbounded (in which case it is either finite or infinite).
Implementations. A switching network $N$ can be used to implement a monotone group $\langle \Gamma, \mathbb{M}, \oplus, \preceq \rangle$. Each token $t$ issued by process $p_i$ corresponds to an operation $op_i = [Invoke_i(a), Response_i(v)]$ invoked by process $p_i$, where $a \in \mathbb{M}$ and $v \in \mathbb{M} \cup \{e\}$. We say that $a$ is the input value of the token $t$, and $v$ is the output value of the token $t$. The input value of the token is part of the token's initial state. In any execution $\alpha$, the invocation of operation $op_i$ corresponds to the first transition $e_i = \langle t_i, b_i \rangle$ where $t_i = t$ and $b_i$ is an input switch of the network (this transition occurs when the token enters the network); the response of $op_i$ corresponds to the last transition $e_j = \langle t_j, b_j \rangle$ in execution $\alpha$ such that $t_j = t$ (this transition occurs when the token exits the network). When token $t$ exits the network, it carries encapsulated in its state the output value $v$ that operation $op_i$ responds with. Use execution $\alpha$ to define its subsequence $\alpha'$ that contains only transitions that correspond to invocations and responses of the operations corresponding to tokens. The sequence $\alpha'$ induces an execution of a distributed system in the natural way. Denote by $P$ the distributed system that is determined by all such induced executions (one for each execution of the switching network $N$). Say now that switching network $N$ implements the monotone group $\langle \Gamma, \mathbb{M}, \oplus, \preceq \rangle$ if the system $P$ implements the monotone group $\langle \Gamma, \mathbb{M}, \oplus, \preceq \rangle$.
Finite and Infinite Switches. We examine two kinds of switching networks, corresponding to switches with finite or infinite state.
• Switching networks with finite switches: Each switch of the network has a finite number of states. For this kind of network, we include an additional component on the output wires of the switching network: the output registers. There is an output register associated with each output wire of the switching network. Unlike switches, each output register has an infinite number of states. The output value for a token's operation is computed on the output register residing on the network's output wire from which the token exits. At the exit, the following happen atomically: the token computes its output value according to the register's current state and the state of the register changes according to its previous state and the state of the token (which includes its input value). Notice that the input value of a token does not affect its output value, but only the output value of the next token that will access the same output register.
We remark that this kind of switching network corresponds more closely to traditional counting networks [2], where a token fetching the counter's value and incrementing the counter by one obtains the value from the register attached to the output wire it will arrive at. We also remark that output registers are necessary for this kind of switching network, since they provide an infinite number of different output values to tokens, while finite switches, used only for routing, are unable to do so.
• Switching networks with infinite switches: Each switch has an infinite number of states. For this kind of network, there are no attached output registers and the output value of a token is determined according to the state of the token when it exits the network.
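The behavior of an output register can be sketched for the simplest case, a single register serving Fetch&Add. This is only an illustration under stated assumptions: the class name `OutputRegister` and its method `exit` are hypothetical, the register's state is taken to be the accumulated value, and the atomicity of the exit step is implicit here because the sketch is single-threaded.

```python
import operator


class OutputRegister:
    """A register attached to one output wire (illustrative sketch)."""

    def __init__(self, identity=0, compose=operator.add):
        self.state = identity   # the register starts at the identity e
        self.compose = compose  # the group operator

    def exit(self, token_input):
        """One atomic exit step of a token carrying token_input.

        The token's output depends only on the register's current
        state; the token's input affects only what later tokens see.
        """
        output = self.state
        self.state = self.compose(self.state, token_input)
        return output


if __name__ == "__main__":
    register = OutputRegister()
    print([register.exit(a) for a in (3, 5, 2)])
    print(register.state)
```

A network consisting of this single register trivially implements Fetch&Add, but every token accesses the same register, which is exactly the hot-spot behavior that the contention measures are meant to rule out.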
Contention Measures. In a switching network, contention represents the extent to which concurrent processes access the same switch or output register simultaneously. We use the following complexity-theoretic measures to model contention in switching networks, the last of which was originally introduced by Dwork et al. [6] for the case of counting networks.
• The register bottleneck of a switching network N is the minimum number of output registers, the minimum being taken over all infinite executions, accessed by tokens in an infinite suffix of an infinite execution of N . (This definition applies only to switching networks with finite switches.) Intuitively, a switching network is low-contention if its register bottleneck is large; a register bottleneck of 1 is the worst, since then many tokens (as many as processes) may eventually accumulate in front of the same output register, which becomes a "hot-spot".
• Similarly, we define the layer bottleneck of a switching network N to be the minimum number of switches in the same layer, the minimum being taken over all layers and infinite executions, accessed by tokens in an infinite suffix of an infinite execution of N . (This definition will be useful for switching networks with infinite switches.) Intuitively, a switching network is low-contention if its layer bottleneck is large; a layer bottleneck of 1 is the worst, since then many tokens (as many as processes) may eventually accumulate in front of the same switch, which becomes a "hot-spot".
• The capacity c of a switching network N is the maximum number of processes that simultaneously access a particular switch in any execution of N .
The Monotone Linearizability Lemma
In this section, we state and prove the Monotone Linearizability Lemma, which establishes ordering constraints of linearizability on a system $P$ that implements a monotone group $\langle \Gamma, \mathbb{M}, \oplus, \preceq \rangle$. Since the monotone group $\langle \Gamma, \mathbb{M}, \oplus, \preceq \rangle$ is n-wise independent, there exist $n$ distinct elements $a_1, a_2, \ldots, a_n \in \mathbb{M}$, with $a_1, a_2, \ldots, a_n \neq e$, which are n-wise independent in $\langle \mathbb{M}, \oplus \rangle$. The proof of the Monotone Linearizability Lemma amounts to establishing a contradiction to n-wise independence for a hypothetical non-linearizable execution, in which the arguments of the RMW operations issued by the processes are $a_1, a_2, \ldots, a_n$. We show:
Proposition 5.1 (Monotone Linearizability Lemma) Consider any execution $\alpha$ of system $P$ in which each process $p_i$, $1 \leq i \leq n$, issues only operations of type $a_i$. Then, $\alpha$ is linearizable.
Sketch of proof:
Assume, by way of contradiction, that $\alpha$ is not linearizable. So, there is at least one operation that is non-linearizable in $\alpha$. Consider the non-linearizable operation $op_k$ at process $p_k$ with earliest response in execution $\alpha$ (among all non-linearizable operations in $\alpha$). Since $op_k$ is non-linearizable, there exists an operation $op_l$ at some process $p_l$ such that $op_l \xrightarrow{\alpha} op_k$ and $Out(op_k) \prec Out(op_l)$.
a i . We then prove (by case analysis) that for each process
Consider now the finite prefix $\beta_2$ of execution $\alpha$ that ends with the response for operation $op_l$. Consider a finite execution $\alpha_2$, which is an extension of $\beta_2$ that includes no additional invocations by processes; thus, $\alpha_2$ contains, in addition to those in $\beta_2$, only responses to invocations that are pending in $\beta_2$. Clearly, all operations whose responses precede or coincide with that of $op_l$ have identical outputs in $\alpha$ and $\alpha_2$. Note also that, by construction, there are no non-linearizable operations in $\alpha_2$. Consider the (unique) valid serialization $S(\alpha_2)$ of $\alpha_2$. For each process $p_i$, where $1 \leq i \leq n$, denote by $m_i$ the number of operations at process $p_i$ that precede $op_l$ in the serialization $S(\alpha_2)$. Partition these $m_i$ operations into two classes: (a) operations whose response in $\alpha_2$ precedes or coincides with that of $op_l$ (thus, $op_l$ falls into this class); assume there are $m_{i,a}$ of them; (b) operations whose response in $\alpha_2$ follows that of $op_l$; assume there are $m_{i,b}$ of them; thus, $m_i = m_{i,a} + m_{i,b}$. Since each operation at process $p_i$ is of type $a_i$, and since $S(\alpha_1)$ is a valid serialization of $\alpha_1$, we use the associativity and commutativity of operation $\oplus$ to obtain that
Clearly, all operations whose responses precede or coincide with that of op l have the same outputs in α and α 1 , and in α and α 2 , respectively. It follows that all such operations have the same outputs in α 1 and α 2 . Consider any such operation op. If op precedes op l in serialization S(α 1 ) then it must be that Out(op) ≺ Out(op l ). Subsequently, it must be that op precedes op l in serialization S(α 2 ) too, since otherwise the valid composition condition of the valid serialization for a monotone group would be violated. For the same reason, if op does not precede op l in serialization S(α 1 ) then it must be that op does not precede op l in serialization S(α 2 ) either. This implies that for all processes p i , 1 ≤ i ≤ n, m i,a = m i,a . We use the fact that operation op l has the same output in executions α 1 and α 2 to conclude (using the associativity and commutativity of ⊕) that a i = e that a 1 , a 2 , . . . , a n are not n-wise independent. A contradiction.
Impossibility Results and Lower Bounds
Lower Bounds on Size. We first consider switching networks with finite switches. We show:
Theorem 6.1 (Impossibility Result for Switching Networks with Finite Switches) There is no non-trivial switching network with finite switches that has finite size, incurs register bottleneck at least 2 and implements a monotone group ⟨IΓ, IMI, ⊕, ⪯⟩, when the concurrency is bounded.
Sketch of proof:
Assume, by way of contradiction that there is such a (non-trivial) switching network N with finite size. Since the number of switches in the network N is finite, and each switch has a finite number of states, the switching network has a finite number of network configurations, which we denote by s. The number of output registers is also finite, which we denote by m. Consider a sequential execution α of network N with s + 1 tokens. By the pigeon-hole principle, some network configuration of N is repeated in α; so take any pair of identical network configurations, and call phase the execution segment of α that appears between them. Since the register bottleneck of N is at least 2, it must be that in any phase φ, there exist at least two distinct tokens that access two different output registers (since, otherwise, we would be able to construct an infinite execution with register bottleneck 1 by repeating φ infinitely many times, so that only one output register would be accessed in the resulting infinite suffix).
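The pigeonhole step above can be illustrated with a small simulation. The following is only a sketch and is not taken from the proof: it assumes a hypothetical toy network of three two-state toggle switches feeding four output registers (so s = 8 network configurations), sends tokens through sequentially, and reports the first pair of identical network configurations together with the registers accessed in the phase between them. The names `route_token` and `find_phase` are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Config = Tuple[int, int, int]  # toggle bits of the three hypothetical switches


def route_token(config: Config) -> Tuple[Config, int]:
    """Send one token through the toy network; return (new config, output register)."""
    top, left, right = config
    if top == 0:
        # The token is routed to the left switch, which selects register 0 or 1.
        register = left
        left ^= 1
    else:
        # The token is routed to the right switch, which selects register 2 or 3.
        register = 2 + right
        right ^= 1
    top ^= 1
    return (top, left, right), register


def find_phase(initial: Config, num_tokens: int) -> Optional[Tuple[int, int, List[int]]]:
    """Return (start, end, registers accessed) for the first repeated configuration."""
    seen: Dict[Config, int] = {initial: 0}
    config = initial
    registers: List[int] = []
    for token in range(1, num_tokens + 1):
        config, register = route_token(config)
        registers.append(register)
        if config in seen:
            start = seen[config]
            return start, token, registers[start:token]
        seen[config] = token
    return None


s = 2 ** 3  # number of network configurations of the toy network
phase = find_phase((0, 0, 0), s + 1)
print(phase)  # (0, 4, [0, 2, 1, 3])
```

In this toy network the initial configuration recurs after four tokens, and the phase touches more than one output register, as the argument requires of any network with register bottleneck at least 2.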
Consider now a sequential execution α 1 involving 3m(s + 1) tokens, whose input values are taken to be any 3m(s + 1)-wise independent elements in ⟨IMI, ⊕⟩. Execution α 1 must contain at least 3m disjoint phases; since the number of output registers is m, and since at least two output registers are accessed in each phase, it follows that there exists some output register r 1 that is accessed in at least three different phases φ 1 , φ 2 and φ 3 (in this order). In φ 1 , register r 1 is accessed by some token t i . In φ 2 , besides r 1 , there must exist another register r 2 which is accessed by some token t j (since in each phase at least two distinct registers are accessed). In φ 3 , register r 1 is accessed by a token t k . Proposition 5.1 implies that execution α 1 is linearizable, so that Out α1 (t i ) ≺ Out α1 (t j ) ≺ Out α1 (t k ).
We construct now an execution α 2 which involves the same tokens (with same input values) as α 1 . Execution α 2 is identical with α 1 up to the point where t i is about to access register r 1 , when the following occur:
Token t i does not take its transition to r 1 , and we say that t i is halted on r 1 (or that we halt it on r 1 ). Whenever any subsequent token different from t k attempts to access r 1 , we halt it on r 1 ; otherwise, we let it exit the network. When it is t k 's turn, we let it take its transition step on r 1 . We then let t i take its transition on r 1 , and then the rest of the halted tokens take their transitions on r 1 . (The order of tokens is the same as in α 1 .) Notice that in execution α 2 , each token accesses the same output register as in α 1 . Thus, it follows that token t j does not access register r 1 in execution α 2 (but it accesses register r 2 ), and thus it returns the same output value as in α 1 ; hence, Out α1 (t j ) = Out α2 (t j ). Since token t i in execution α 1 and token t k in execution α 2 "see" the same state of register r 1 (that is, token t k bypasses the halted token t i and takes on the value that t i takes in α 1 ), Out α1 (t i ) = Out α2 (t k ). It follows that Out α2 (t k ) ≺ Out α2 (t j ). However, from Proposition 5.1, execution α 2 is linearizable, while by construction of α 2 , token t j completely precedes token t k in α 2 ; it follows that Out α2 (t j ) ≺ Out α2 (t k ). A contradiction.
We remark that the concurrency assumed in the proof of Theorem 6.1 is no more than the number of tokens involved in the proof, which is a bounded quantity (3m(s + 1)) depending only on parameters of the network N . Thus, the impossibility result in Theorem 6.1 holds even for networks with bounded concurrency. Finally, we argue that the assumption of a non-trivial switching network is essential for Theorem 6.1 to hold: since each token can atomically invoke a computation on an output register, we can implement a monotone RMW operation by a trivial switching network consisting of a single switch that outputs tokens along one output wire, which has an associated register that maintains the state of the RMW variable to be implemented. The switch sequences the operations (that correspond to the tokens) so that they can be atomically invoked (by the tokens) on the register.
We now turn to switching networks with infinite switches. Clearly, the proof of Theorem 6.1 is not applicable to switching networks with infinite switches, since the number of their possible network configurations is no longer finite. Thus, we need to develop new techniques to handle them. We show:*
Theorem 6.2 (Impossibility Result for Switching Networks with Infinite Switches) There is no non-trivial switching network with infinite switches that has finite size, incurs layer bottleneck at least 2 and implements a monotone group ⟨IΓ, IMI, ⊕, ⪯⟩, assuming concurrency is unbounded.
* The proof of Theorem 6.2 will consider (without loss of generality) so-called normalized switching networks, in which any switch b at layer ℓ has its input wires connected to switches of layer ℓ − 1 (assuming ℓ ≥ 2) and its output wires connected to switches of layer ℓ + 1 (assuming ℓ is less than the depth of the network). Thus, in a normalized switching network, there are no wires connecting switches in non-consecutive layers. Note that any switching network can be easily cast as a normalized one, if we intercept wires that connect non-consecutive layers with dummy switches with input and output width 1, which simply forward tokens (without routing them).
Assume, by way of contradiction, that there is such a switching network N with finite size. Assume that the depth of N is d, and that N is partitioned into layers ℓ 1 , . . . , ℓ d . Consider an execution α of the network N which involves an infinite number of tokens t 1 , t 2 , . . ., issued by distinct processes. In execution α, token t 1 is the first token which enters and traverses the network N . Afterwards, each token t i+1 , where i ≥ 1, enters the network when one of the following two events occurs to the previous token t i : token t i "finishes" its traversal in N by leaving from one of the network's output wires, or token t i "halts" on some switch in N without taking its transition on that switch (as explained below).
Let b . Note that we can always find such distinct switches at layer ℓ 2 , since otherwise, there would be only one switch at layer ℓ 2 accessed by an infinite number of tokens from both classes, which implies that there is an infinite suffix of α that violates the assumption that the layer bottleneck is at least 2.
In execution α, we let all class-1 tokens that access switch b 2 1 take their transitions on this switch, and the rest of the class-1 tokens halt on their layer ℓ 2 switch (which is different from b 2 1 ). We treat similarly the class-2 tokens on switch b 2 2 . Note that there is an infinite number of class-1 and class-2 tokens that exit from the second layer. Execution α continues in the same fashion for each subsequent layer. As a consequence, in execution α an infinite number of class-1 and class-2 tokens finish by leaving the network from an output wire. The class-1 tokens that finish follow the path b
Consider now the shortest prefix α′ of execution α in which a token t i1 of class-1 finishes and a token t i2 of class-2 finishes. Since α′ is a prefix of α, it must be that the number of tokens in α′ is bounded by some number m. We can choose the input values of the first m tokens of prefix α′ so that they are m-wise independent.
Next, we construct a bounded execution β from the prefix α′, such that in β all the class-1 tokens enter the network N before the class-2 tokens enter the network (within each class, the relative order of tokens is the same as in α′). Notice that in prefix α′ and in execution β the tokens behave in exactly the same way, since the paths of the tokens from the different classes do not intersect. Proposition 5.1 implies that execution β is linearizable, since the input values of the tokens are m-wise independent. Therefore, since in execution β the token t i2 enters the network after token t i1 finishes, it must be that Out β (t i1 ) ≺ Out β (t i2 ). Similar to β, we can construct an execution γ where now the class-2 tokens enter the network before the class-1 tokens. Execution γ must be linearizable, and since token t i1 enters the network after token t i2 finishes, it must be that Out γ (t i2 ) ≺ Out γ (t i1 ). Notice that all the tokens in executions α′, β and γ behave in exactly the same way, since tokens from class-1 and class-2 do not intersect. Therefore, Out β (t i1 ) = Out γ (t i1 ) and Out β (t i2 ) = Out γ (t i2 ). Consequently, Out β (t i2 ) ≺ Out β (t i1 ). A contradiction.
We remark that the proof of Theorem 6.2 requires unbounded (finite or infinite) concurrency. This stems from the assumption that the layer bottleneck is at least 2; thus, in any infinite suffix of an infinite execution, at least two switches in each layer are certain to be accessed by tokens. We do not know, however, how many tokens we will need (and, therefore, how many processes) before a second switch is accessed; hence, we cannot bound the concurrency needed in the execution used in the proof. This was not the case for switching networks with finite switches (even though the assumption on register bottleneck made there is quite similar), where the fact that configurations were repeated (since switches are finite) allowed us to get a token to visit a register for a second time by shepherding in a finite number of tokens. So, since the proof of Theorem 6.2 requires unbounded concurrency, it does not imply Theorem 6.1, which assumes bounded concurrency; the two results are incomparable and represent a trade-off.
Finally, we remark that the assumption of a non-trivial switching network is essential for Theorem 6.2 to hold: A switching network consisting of a single infinite-state switch with n input wires and n output wires (where n is the number of concurrent processes) can implement any RMW variable as follows. The state of the variable is encoded by the state of the switch. To invoke an operation on the variable, a process issues a token with a state encoding the argument of the operation. Such a token, when atomically processed by the switch, will cause the natural changes to its state and to the state of the switch, so that the new state of the switch is the new state of the variable, and the new state of the token is the response of the variable to the operation invoked by the token.
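The trivial single-switch construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's formal model: the class name `SingleSwitchRMW` is hypothetical, the group operation is assumed to be integer addition with identity 0, and a lock stands in for the atomic processing of one token at a time by the switch.

```python
import threading


class SingleSwitchRMW:
    """Hypothetical single-switch network: the switch state is the variable's value."""

    def __init__(self, identity=0, compose=lambda x, y: x + y):
        self._state = identity         # state of the switch = state of the variable
        self._compose = compose        # group operation (assumed: integer addition)
        self._lock = threading.Lock()  # models atomic processing of a single token

    def traverse(self, argument):
        """Process one token carrying `argument`; return the old value as its response."""
        with self._lock:
            old = self._state
            self._state = self._compose(old, argument)
            return old


switch = SingleSwitchRMW()
responses = [switch.traverse(a) for a in (1, 4, 16)]
print(responses)  # [0, 1, 5]
```

Every token passes through the same switch, which is exactly the contention that the non-triviality assumption rules out.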
Lower Bound on Time. We start with a definition that we will use in our proof. For any quiescent total configuration s of a switching network N , we say that token t i has preferred path π if t i follows the path π when it runs in isolation into the network, which is initially in the total configuration s, until token t i exits the network and responds with an output value v, which is its preferred value. We show:
Theorem 6.3 For any switching network N that implements a monotone group ⟨IΓ, IMI, ⊕, ⪯⟩, there is a sequential execution with n tokens such that each token traverses at least ⌈(n − 1)/(c − 1)⌉ switches.
Proof: Consider n tokens t 1 , t 2 , . . . , t n issued by distinct processes, with respective input values a 1 , a 2 , . . . , a n ∈ IMI (trivially, a 1 , a 2 , . . . , a n ≠ e) which are n-wise independent in ⟨IMI, ⊕⟩. First we show that the preferred paths of any two tokens, starting from the same total quiescent configuration, intersect. Consider the network N in a quiescent configuration s. Denote v the output value returned by N to the last token in the (unique) valid serialization of the execution fragment ending with total configuration s, and let a denote that last token's input value. Consider tokens t i and t j with input values a i and a j , respectively. Assume, by way of contradiction, that the preferred paths of t i and t j starting from total configuration s do not intersect. Since the network N provides a valid implementation of the monotone group ⟨IΓ, IMI, ⊕, ⪯⟩, and since the preferred paths do not intersect, the output values of t i and t j when they run sequentially into the network N with t i first and t j next, starting from s, are both equal to v ⊕ a (those that would be returned in separate executions where only one of the tokens would be running). However, the execution α′ in which both tokens are running is linearizable (by Proposition 5.1). Hence, the token t i is serialized before token t j in the (unique) valid serialization of execution α′. Since the network N provides a valid implementation of the monotone group ⟨IΓ, IMI, ⊕, ⪯⟩, the output values of t i and t j are v ⊕ a and v ⊕ a ⊕ a i , respectively. Since a i ≠ e, v ⊕ a ≠ v ⊕ a ⊕ a i , a contradiction. It follows that the preferred paths of t i and t j starting from total configuration s do intersect. By the definition of c, no more than c − 1 tokens (other than t i ) can access any switch along the preferred path of t i (starting from total configuration s).
Since the preferred path of every other token must intersect the preferred path of t i , it follows that the preferred path of t i must include at least ⌈(n − 1)/(c − 1)⌉ switches. Take now any sequential execution α of N , starting from any arbitrary quiescent total configuration s, in which each token t i issues only operations with input value a i . Since the preferred path of any token includes at least ⌈(n − 1)/(c − 1)⌉ switches, the first token to run will traverse at least ⌈(n − 1)/(c − 1)⌉ switches of N , and the network will return to another quiescent total configuration, for which the same argument applies inductively.
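To get a feel for the bound, the quantity ⌈(n − 1)/(c − 1)⌉ can be evaluated for a few sample parameters. The helper below is an illustrative sketch; the function name `latency_lower_bound` and the sample values of n and c are assumptions made for the example, and the bound is only defined here for c ≥ 2.

```python
def latency_lower_bound(n: int, c: int) -> int:
    """Return ceil((n - 1) / (c - 1)) using integer arithmetic only."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if c < 2:
        raise ValueError("c must be at least 2 for the bound to be defined")
    return -(-(n - 1) // (c - 1))


for n, c in [(9, 3), (10, 3), (64, 2), (64, 64)]:
    print(n, c, latency_lower_bound(n, c))
```

With c = 2 the bound is n − 1, so latency grows linearly in the number of processes; only when c approaches n does the bound drop to a single switch.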
Conclusion and Open Problems
We have studied the possibility, and the corresponding costs, of implementing a monotone RMW operation in a concurrent and low-contention manner. Our end results are lower bounds on size and latency for any non-trivial, low-contention switching network that implements a monotone RMW operation; these are shown by using the Monotone Linearizability Lemma, which may be of independent interest. It would be interesting to ask whether timing conditions may suffice to overcome the limitations we have shown; recall that timing conditions have been exploited in the work of Lynch et al. [16] for devising finite-size linearizable counting networks, while Herlihy et al. [10] establish that no finite-size (non-trivial) asynchronous linearizable counting network exists. For future work, we are also interested in establishing further limitations on various kinds of distributed systems (other than switching networks) that implement a monotone RMW operation. A natural candidate to consider is the message-passing system adopted in the work by Wattenhofer and Widmayer [19] ; that work showed a lower bound on the message complexity of implementing the Fetch&Increment operation in that system; we feel that similar limitations hold for implementations of any monotone RMW operation.
1 . It is straightforward to see that any two distinct prime numbers a 1 and a 2 greater than 1, are pairwise independent, which immediately implies that the monotone group Q, IN \ {0, 1}, ·, ≤ is pairwise independent.
Note finally that n-wise independence is not a generalization of pairwise independence since it imposes constraints on the integers k i , 1 ≤ i ≤ n.
B Proof that Every Monotone Group is Pairwise Independent
Fix any two distinct integers l 1 , l 2 ∈ IN \ {0} that are pairwise independent in ⟨IN \ {0}, +⟩. (Since ⟨IN \ {0}, +⟩ is pairwise independent, such integers exist.) Consider now any monotone group ⟨IΓ, IMI, ⊕, ⪯⟩. Fix any element a ∈ IMI. We will prove that the elements $l_1 a$ and $l_2 a$ are pairwise independent in ⟨IMI, ⊕⟩. (Clearly, by monotonicity under composition, these elements are distinct.)
Assume, by way of contradiction, that the elements $l_1 a$ and $l_2 a$ are not pairwise independent in ⟨IMI, ⊕⟩. Thus, there exists some integer k such that either $l_1 a = k\,(l_2 a)$ or $l_2 a = k\,(l_1 a)$. Assume, without loss of generality, that $l_1 a = k\,(l_2 a)$. Since ⊕ is associative, this implies that $l_1 a = (k \cdot l_2)\, a$. Multiplying from the right by $(-k \cdot l_2)\, a$, we obtain that $l_1 a \oplus (-k \cdot l_2)\, a = (k \cdot l_2)\, a \oplus (-k \cdot l_2)\, a$. Since ⊕ is associative, this implies that $(l_1 - k \cdot l_2)\, a = (k \cdot l_2 - k \cdot l_2)\, a$, or $(l_1 - k \cdot l_2)\, a = e$. Thus, it follows that l 1 − k · l 2 = 0, or l 1 = k · l 2 . Thus, the integers l 1 and l 2 are not pairwise independent in ⟨IN \ {0}, +⟩. A contradiction.
C Proof that Some Specific Group is n-Wise Independent
We prove that for any integer n ≥ 2, the monotone group Z, IN \ {0}, +, ≤ is n-wise independent.
Fix any integer ℓ ≥ 0. It suffices to prove that the n integers $2^{\ell}, 2^{\ell+2}, \ldots, 2^{\ell+2(n-1)}$ are n-wise independent in ⟨IN \ {0}, +⟩.
The proof is by induction on n. For the basis case where n = 2, consider the integers $2^{\ell}$ and $2^{\ell+2}$, and fix any pair of integers k 1 , k 2 ∈ {−1, 0, 1, 2} that are not both simultaneously zero. Clearly, $k_1 2^{\ell} + k_2 2^{\ell+2} = 2^{\ell}(k_1 + 4k_2)$, which can be zero only if k 1 = k 2 = 0. This completes the proof of the basis case.
Assume inductively that for all integers n′ < n, the n′ integers $2^{\ell}, 2^{\ell+2}, \ldots, 2^{\ell+2(n'-1)}$ are n′-wise independent in ⟨IN \ {0}, +⟩.
For the induction step, we will show that the n integers $2^{\ell}, 2^{\ell+2}, \ldots, 2^{\ell+2(n-1)}$ are n-wise independent in ⟨IN \ {0}, +⟩. Assume, by way of contradiction, that they are not. Thus, there exist n integers k 1 , k 2 , . . . , k n ∈ {−1, 0, 1, 2} that are not all simultaneously zero, such that
$$\sum_{i=1}^{n} k_i\, 2^{\ell+2(i-1)} = 0.$$
We proceed by case analysis on the value of k n ∈ {−1, 0, 1, 2}.
• Assume first that k n = −1. Then,
a contradiction.
• Assume now that k n = 0. Then, $\sum_{i=1}^{n-1} k_i\, 2^{\ell+2(i-1)} = 0$. Since k 1 , k 2 , . . . , k n are not all simultaneously zero while k n = 0, it follows that the integers k 1 , k 2 , . . . , k n−1 are not all simultaneously zero. This implies that the n − 1 integers $2^{\ell}, 2^{\ell+2}, \ldots, 2^{\ell+2(n-2)}$ are not (n − 1)-wise independent in ⟨IN \ {0}, +⟩, which contradicts the induction hypothesis.
• Assume finally that k n ∈ {1, 2}. Then,
Since we obtained a contradiction in all possible cases, the proof is now complete.
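The claim can also be checked mechanically for small n. The sketch below brute-forces the condition used in this appendix (no combination with coefficients in {−1, 0, 1, 2}, not all zero, sums to zero); it is an illustration only, and the function names are assumptions introduced for the example. The search is exponential in n, so it is only practical for small instances.

```python
from itertools import product


def is_n_wise_independent(elements, coefficients=(-1, 0, 1, 2)):
    """Brute-force check in (Z, +): no nontrivial combination sums to 0."""
    for ks in product(coefficients, repeat=len(elements)):
        if any(ks) and sum(k * a for k, a in zip(ks, elements)) == 0:
            return False
    return True


def candidate_elements(ell, n):
    """The integers 2^ell, 2^(ell+2), ..., 2^(ell+2(n-1))."""
    return [2 ** (ell + 2 * i) for i in range(n)]


print(is_n_wise_independent(candidate_elements(0, 5)))  # True
print(is_n_wise_independent([1, 2, 4]))                 # False, since 2*1 - 1*2 = 0
```

The second call shows why consecutive powers of two do not suffice: the exponents must be spaced by 2 so that the coefficient 2 cannot turn one element into the next.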
D Proof that Every Monotone Group is n-Wise Independent
Fix any n distinct integers l 1 , l 2 , . . . , l n ∈ IN \ {0} that are n-wise independent in ⟨IN \ {0}, +⟩. (Since ⟨IN \ {0}, +⟩ is n-wise independent, such integers exist.) Consider now any monotone group ⟨IΓ, IMI, ⊕, ⪯⟩. Fix any element a ∈ IMI. We will prove that the n elements $l_1 a, l_2 a, \ldots, l_n a$ of IMI are n-wise independent in ⟨IMI, ⊕⟩. (Clearly, by monotonicity under composition, these elements are distinct.) Assume, by way of contradiction, that the elements $l_1 a, l_2 a, \ldots, l_n a$ are not n-wise independent in ⟨IMI, ⊕⟩. Thus, there exist n integers k 1 , k 2 , . . . , k n ∈ {−1, 0, 1, 2}, that are not all simultaneously zero, such that
$$\bigoplus_{i=1}^{n} k_i\,(l_i a) = e.$$
Since ⊕ is associative, this implies that
$$\bigoplus_{i=1}^{n} (k_i \cdot l_i)\, a = e,$$
or, by the definition of the summation operation, $\left(\sum_{i=1}^{n} k_i \cdot l_i\right) a = e$. Hence, it follows that
$$\sum_{i=1}^{n} k_i \cdot l_i = 0.$$
Since the integers k 1 , k 2 , . . . , k n are in {−1, 0, 1, 2} and not all simultaneously zero, this implies that the integers l 1 , l 2 , . . . , l n are not n-wise independent in ⟨IN \ {0}, +⟩. A contradiction.
E Proof of Unique Serialization Lemma
Assume, by way of contradiction, that there are two distinct valid serializations S (1) (α) and S (2) (α) of execution α. Since S (1) (α) and S (2) (α) are distinct, there exists a least index k ≥ 1 such that op (1.k) is different from op (2.k) . Assume, without loss of generality, that op (1.k) appears at position l > k in the serialization S (2) (α) (that is, op (1.k) = op (2.l) ). Notice that for each i < k, op (1.i) = op (2.i) . We examine two cases: k = 1 and k > 1. First consider the case k = 1. Since S (1) (α) is a valid serialization of α, we have that Out(op (1.k) ) = e, and since S (2) (α) is a valid serialization of α, Out(op (2.l) ) = Out(op (2.l−1) ) ⊕ In(op (2.l−1) ) ≠ e (this follows from the identity lower bound for IMI). This implies that Out(op (1.k) ) ≠ Out(op (2.l) ), a contradiction. Now we consider the case k > 1. Since S (1) (α) is a valid serialization of α,
Since Out(op
Hence,
From the associativity of operation ⊕ we obtain e = e⊕In(op
). By the identity lower bound for IMI, e ≺ In(op
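The uniqueness argument suggests a direct way to recover the valid serialization from the inputs and outputs of the operations. The sketch below is only an illustration for the concrete group of integers under addition with strictly positive inputs; the function name `valid_serialization` and the (input, output) pair encoding are assumptions made for the example. Because each output equals the previous output composed with the previous input, outputs strictly increase along a valid serialization, so sorting by output yields the only candidate order.

```python
from typing import List, Optional, Tuple

Op = Tuple[int, int]  # (input value, output value) of a completed operation


def valid_serialization(ops: List[Op], identity: int = 0) -> Optional[List[Op]]:
    """Return the valid serialization of `ops` over (Z, +), or None if there is none.

    Assumes every input is a positive integer, so outputs strictly increase
    along any valid serialization and sorting by output gives the only candidate.
    """
    order = sorted(ops, key=lambda op: op[1])
    expected = identity
    for value_in, value_out in order:
        if value_out != expected:
            return None
        expected = value_out + value_in  # Out(next) = Out(current) + In(current)
    return order


print(valid_serialization([(4, 1), (1, 0), (16, 5)]))  # [(1, 0), (4, 1), (16, 5)]
print(valid_serialization([(4, 0), (1, 0)]))           # None
```

The second call fails because two operations both report the identity as their output, which no valid serialization permits when inputs differ from the identity.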
F The Monotone Sequential Consistency Lemma and its Proof
To provide the reader with useful intuition for the Monotone Linearizability Lemma and its proof, we first prove a corresponding Monotone Sequential Consistency Lemma, which applies to a somewhat simpler context. We consider a system P that implements a monotone group ⟨IΓ, IMI, ⊕, ⪯⟩. Say that a process p i is sequentially consistent in execution α [14] if the (unique) valid serialization S(α) extends α −→ i ; that is, for any pair of operations op
i . Since P implements the monotone group IΓ, IMI, ⊕, , for any two operations op (1) i and op
i ). Thus, it follows that for any process p i that is sequentially consistent in execution α, for any pair of operations op such that op
Say that an execution α is sequentially consistent [14] if every process p i is sequentially consistent in execution α. Say that operation op at process p i in execution α is sequentially consistent in execution α if it is not sequentially inconsistent in execution α. Clearly, process p i is sequentially consistent in execution α if every operation op i at process p i in execution α is sequentially consistent in it.
Proposition F.1 (Monotone Sequential Consistency Lemma) For any execution α of system P in which process p 1 issues only operations of type a 1 , while any other process p i , i ≠ 1, issues only operations of type a 2 , p 1 is sequentially consistent in execution α.
Proof: Assume, by way of contradiction, that process p 1 is not sequentially consistent in execution α. So, there is at least one operation at p 1 that is sequentially inconsistent in execution α; consider the earliest such operation op (1) 1 . It follows that for the operation op (2) 1 that immediately precedes op (1) 1 in α, Out(op
Consider the finite prefix β 1 of execution α that ends with the response for operation op (1) 1 . Consider now a finite execution α 1 that is an extension of β 1 that includes no additional invocations by processes; thus, α 1 contains only responses to invocations that are pending in β 1 . Clearly, all operations whose responses precede or coincide with that of op (1) 1 have identical outputs in α and α 1 . This implies, in particular, that op (1) 1 is the only sequentially inconsistent operation at process p 1 in execution α 1 . Take now the (unique) valid serialization S(α 1 ) of α 1 . Clearly, op 1 is sequentially consistent in execution α 1 , all these operations must precede op (2) 1 in S(α 1 ); recall that all operations at p 1 are of type a 1 . Denote l 1 the number of operations at other processes that precede op (2) 1 in the serialization S(α 1 ); recall that all these operations are of type a 2 . Since S(α 1 ) is a valid serialization of α 1 , we use the associativity and commutativity of operation ⊕ to obtain that Out op
Consider now the finite prefix β 2 of execution α that ends with the response for operation op (2) 1 . Consider now a finite execution α 2 that is an extension of β 2 that includes no additional invocations by processes; thus, α 2 contains only responses to invocations that are pending in β 2 . Clearly, all operations whose responses precede or coincide with that of op (2) 1 have identical outputs in α and α 2 . This implies, in particular, that there is no sequentially inconsistent operation at process p 1 in execution α 2 .
Take now the (unique) valid serialization S(α 2 ) of execution α 2 . Denote k ≥ 0 the number of operations at p 1 that precede op (2) 1 in execution α 1 . Since op (2) 1 is sequentially consistent in execution α 1 , all k operations at p 1 that precede op (2) 1 in execution α, and hence in execution α 2 as well, must also precede op (2) 1 in the serialization S(α 2 ); recall that all operations at p 1 are of type a 1 . Denote l 2 the number of operations at other processes that precede op (2) 1 in the serialization S(α 2 ); recall that all these operations are of type a 2 . Since S(α 2 ) is a valid serialization of α 2 , we use the associativity and commutativity of operation ⊕ to obtain that Out op
Recall that all operations whose responses precede or coincide with that of op in α have identical outputs in α and α 2 . In particular, these hold for the operation op (2) 1 , which implies that it has identical outputs in α 1 and α 2 . It follows that
Multiplying both sides by $(k\, a_1)^{-1}$ from the left, it follows that $a_1 \oplus l_1 a_2 = l_2 a_2$. Multiplying both sides by $(-l_1)\, a_2$ from the right, we obtain that $a_1 = l_2 a_2 \oplus (-l_1)\, a_2$.
We proceed by case analysis on how l 1 and l 2 compare to each other.
• Assume first that l 1 = l 2 . Then, clearly, a 1 = e, a contradiction.
• Assume now that l 1 < l 2 . Then, clearly, a 1 = l2−l1 a 2 . It follows that a 1 and a 2 are not pairwise independent, a contradiction.
• Assume finally that l 1 > l 2 . Then, clearly, a 1 = l1−l2 a −1 2 = l2−l1 a 2 . It follows that a 1 and a 2 are not pairwise independent, a contradiction.
G Proof of the Monotone Linearizability Lemma
Assume, by way of contradiction, that α is not linearizable. So, there is at least one operation that is non-linearizable in α. Consider the non-linearizable operation op k at process p k with earliest response in execution α (among all non-linearizable operations in α). Since op k is non-linearizable, there exists an operation op l at some process p l such that op l α −→ op k while Out (op k ) ≺ Out (op l ); fix op l to be the one with latest response in execution α.
Consider the finite prefix β 1 of execution α that ends with the response for operation op k . Consider now a finite execution α 1 , which is an extension of β 1 that includes no additional invocations by processes; thus, α 1 contains, in addition to those in β 1 , only responses to invocations that are pending in β 1 . Clearly, all operations whose responses precede or coincide with that of op k have identical outputs in α and α 1 . This implies, in particular, that the non-linearizable operations in α 1 are op k and possibly operations with responses following that of op k (but at most one such operation per process).
Take now the (unique) valid serialization S(α 1 ) of α 1 . Clearly, op k precedes op l in S(α 1 ). For each process p i , where 1 ≤ i ≤ n, denote m i the number of operations at p i that precede op l in the serialization S(α 1 ). Partition these m i operations into two classes:
(a) Operations whose response in α 1 precedes or coincides with that of op l ; assume there are m i,a of them.
(b) Operations whose response in α 1 follows that of op l ; assume there are m i,b of them; thus, m i = m i,a + m i,b .
We continue to prove a simple claim:
Claim G.1 For each process p i , 0 ≤ m i,b ≤ 2.
Proof:
We proceed by case analysis.
1. Assume first that i ∉ {k, l}, and consider any operation op i whose response follows that of op l . Hence, either op l α1 −→ op i or op l α1 op i . We proceed by case analysis.
• Assume first that op l α1 −→ op i . Since op i precedes op l in the valid serialization S(α 1 ), Out (op i ) ≺ Out (op l ). Thus, op i is non-linearizable in execution α 1 . It follows that the response for op i follows the one for op k . Since, by construction, α 1 contains no invocation following the response for op k , it follows that there can be at most one such operation op i .
• Assume now that op l α1 op i . Since the response for op i follows that of op l , it follows that there can be at most one such operation op i (since any other following the one such operation with earliest response would satisfy op l α1 −→ op i ).
It follows from the case analysis that 0 ≤ m i,b ≤ 2.
2. Assume now that i = k, and consider any operation op′ k at process p k whose response follows that of op l in α 1 . Hence, either op l α1 −→ op′ k or op l α1 op′ k . We proceed by case analysis.
• Assume first that op l α1 −→ op′ k . Clearly, op k is one such operation. Any other op′ k different from op k cannot have its invocation following the response for op k in α 1 , since α 1 does not contain any invocations following the response for op k . Hence, op′ k precedes op k in α 1 . It follows that op′ k is linearizable in execution α 1 . Since op l α1 −→ op′ k , it follows that Out (op l ) ≺ Out (op′ k ). Since S(α 1 ) is a valid serialization of α 1 , op′ k follows op l in S(α 1 ), a contradiction. This implies that no other operation op′ k exists.
• Assume now that op l α1 op′ k . Clearly, the invocation for operation op′ k precedes the response for operation op l , while the response for operation op′ k precedes the invocation for operation op k (since no two operations at the same process may be simultaneously pending). It follows that there can be at most one such operation op′ k .
It follows from the case analysis above that 1 ≤ m k,b ≤ 2.
3. Assume finally that i = l, and consider any operation op′ l at process p l whose response follows that of op l in α 1 . Since no two operations at the same process may be simultaneously pending, it follows that op l precedes op′ l in α 1 . Since op′ l precedes op l in S(α 1 ), it follows that op′ l is non-linearizable in execution α 1 . Hence, the response for op′ l follows that of op k in α 1 . Since there are no invocations following the response for op k in α 1 , it follows that there can be at most one such operation op′ l , and 0 ≤ m l,b ≤ 1.
The previous analysis implies that for any process p i , where 1 ≤ i ≤ n, 0 ≤ m i,b ≤ 2, as needed.
Consider now the finite prefix β 2 of execution α that ends with the response for operation op l . Consider a finite execution α 2 , which is an extension of β 2 that includes no additional invocations by processes; thus, α 2 contains, in addition to those in β 2 , only responses to invocations that are pending in β 2 . Clearly, all operations whose responses precede or coincide with that of op l have identical outputs in α and α 2 . Note also that, by construction, there are no non-linearizable operations in α 2 .
Consider the (unique) valid serialization S(α 2 ) of α 2 . For each process p i , where 1 ≤ i ≤ n, denote m′ i the number of operations at process p i that precede op l in the serialization S(α 2 ). Partition these m′ i operations into two classes:
(a) Operations whose response in α 2 precedes or coincides with that of op l (thus, op l falls into this class); assume there are m′ i,a of them.
(b) Operations whose response in α 2 follows that of op l ; assume there are m′ i,b of them; thus, m′ i = m′ i,a + m′ i,b .
Clearly, all operations whose responses precede or coincide with that of op l have the same outputs in α and α 1 , and in α and α 2 , respectively. It follows that all such operations have the same outputs in α 1 and α 2 . Consider any such operation op. If op precedes op l in serialization S(α 1 ), then it must be that Out(op) ≺ Out(op l ). Consequently, op must precede op l in serialization S(α 2 ) too, since otherwise the valid composition condition of the valid serialization for a monotone group would be violated. For the
