Abstract
Introduction
A shared counter can be easily implemented using a single shared Fetch&Increment variable. However, empirically, the time to access a shared variable grows at least linearly with the contention, the extent to which concurrent processors simultaneously attempt to access the variable. Aspnes et al. [3] suggested the counting network as an alternative approach for implementing shared counters.
Counting networks are constructed from simple elements called balancers in much the same way that sorting networks are constructed from comparators (see [4, 10]). Loosely speaking, a balancer can be thought of as a toggle mechanism with $p$ input and output wires that receives tokens on its input wires and forwards them to its output wires (see [3, 5, 8]). When a token appears on an input wire, it takes the output wire to which the toggle is currently set, and it advances the toggle so that the next token to arrive leaves on the next output wire. If the toggle was set to the last output wire, it is set back to the first output wire.
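The toggle mechanism described above can be sketched in a few lines of Python. This is only an illustrative model: the class name `Balancer` and the method `traverse` are hypothetical, and the sketch ignores input wires, since the toggle alone decides where a token leaves.

```python
class Balancer:
    """Toggle with p output wires; tokens leave on wires 0, 1, ..., p-1, 0, ..."""

    def __init__(self, p):
        self.p = p          # number of output wires
        self.toggle = 0     # output wire the next token will take

    def traverse(self):
        """Route one token: return its output wire and advance the toggle."""
        out = self.toggle
        self.toggle = (self.toggle + 1) % self.p  # wrap from the last wire to the first
        return out


b = Balancer(p=4)
exits = [b.traverse() for _ in range(6)]
```

Six tokens through a 4-output balancer leave on wires 0, 1, 2, 3 and then wrap around to 0 and 1.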
One can connect a collection of balancers to form a balancing network. This is done by connecting output wires of some balancers to input wires of others. The remaining unconnected input and output wires are the input and output wires, respectively, of the network. The number of input and output wires is the same and is called the width $t$ of the network. Like the balancer, the balancing network receives tokens on its inputs and forwards them to its outputs. A counting network is a balancing network that has the step property (see Section 2), a property which enables it to behave like a counter. A processor that wants to obtain a new value from the counter traverses the network by issuing a token, and it takes a value determined by the output wire on which its token leaves the network.
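The text does not spell out how a value is derived from the output wire. One common scheme in the counting network literature, assumed here for illustration, attaches a local counter to each output wire so that the $c$-th token ($c = 0, 1, \ldots$) leaving on wire $i$ of a width-$w$ output receives the value $i + cw$. The function name `assign_values` is hypothetical.

```python
def assign_values(exit_wires, w):
    """Give each token a counter value based on the output wire it left on.

    Assumed scheme: the c-th token (c = 0, 1, ...) on wire i gets i + c * w.
    """
    seen = [0] * w            # tokens that have already left on each wire
    values = []
    for i in exit_wires:
        values.append(i + seen[i] * w)
        seen[i] += 1
    return values


# Tokens leaving a width-4 counting network in step order.
values = assign_values([0, 1, 2, 3, 0, 1], w=4)
```

When the exit wires follow the step property, the values handed out are exactly 0, 1, 2, and so on, with no gaps or duplicates.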
In this work we deviate from the "traditional" approach and construct counting networks whose input and output widths differ (different numbers of input and output wires). In our construction the input width $t$ is smaller than or equal to the output width $w$. More specifically, we have $t = 2^k$, $w = p2^l$ and $k \le l$. Our counting network, denoted $C(t, w)$, is constructed from regular balancers with 2 inputs and 2 outputs and from balancers with different numbers of input and output wires. A $q$-input, $p$-output balancer behaves in the same way as a regular balancer, that is, a token is received on one of its $q$ inputs and is forwarded to one of its $p$ outputs using the same toggle mechanism with $p$ settings (see also [2, 11]). In $C(t, w)$ we use balancers with $q = 2$ and $p \ge 2$. Figure 1 shows the construction of $C(4, 8)$, where the balancers are drawn as vertical lines and the wires as horizontal lines. This construction uses 2-input, 2-output balancers and 2-input, 4-output balancers.
Our construction improves over all known practical constructions in terms of depth and contention, two important measures for balancing and counting networks. The depth of a balancing network is the maximal path length from an input wire to an output wire. The depth is important since the number of memory locations that a processor may have to access before its increment request is satisfied is at most the depth of the network. The contention is the extent to which concurrent processors access the same memory location (the balancer in our case) at the same time. The amortized contention, defined by Dwork et al. [7], measures contention in the worst case and in the limit when many processors access the balancing network concurrently. In order to achieve good performance in a counting network it is necessary to achieve both small depth and low contention (see [6]).
The traditional practical counting networks known so far achieve depth $O(\lg^2 t)$, where $t$ is the width of the network. Such networks are the bitonic and periodic counting networks [3], which use 2-input, 2-output balancers, and other constructions which use balancers of larger widths [1, 8, 9]. The amortized contention achieved by these networks is of the order $O((n \lg^2 t)/t)$, where $n$ is the processor concurrency. It is easy to see that in these networks low contention requires a large width. However, a larger width also increases the depth of the network. Therefore, there is a trade-off between depth and contention, both of which are governed by the width of the counting network.
Our construction, due to its irregular structure, achieves depth $O(\lg^2 t)$, which is independent of its output width. Simultaneously, the amortized contention is $O((n \lg^2 t)/w + (n \lg t)/t)$, which means that the contention drops as the output width increases. Therefore, for any fixed input width $t$ we can decrease the contention by increasing only the output width, while preserving the depth of the network. In this way we avoid the trade-off between depth and contention found in traditional networks. In fact, by taking $w$ to be of the order of $t \lg t$ we achieve amortized contention of the order $O((n \lg t)/t)$, which improves by a logarithmic factor over the best known practical counting network constructions.
The performance of our network can be explained as follows. A balancing network can be divided into layers, where each layer contains the balancers at a specific depth. In the traditional counting networks the width of every layer is equal to $t$, so the contention is the same for each layer. In our network, on the other hand, only the first $\lg t$ layers have width $t$, and the remaining layers, which are the majority, have width $w$. By taking $w$ to be larger than $t$, more balancers are available at each of the last layers. Thus the contention of the balancers in these layers decreases as $w$ increases, and so does the total contention.
The construction uses as a building block a network with a novel merging property, which we call a bounded difference $\delta$-merging network. This network merges the outputs of two counting networks whose sums differ by at most $\delta$. The contention is measured using the recursive method introduced in [9]. The rest of the paper is organized as follows. In Section 2 we give the necessary definitions, in Section 3 we present the construction of a bounded difference $\delta$-merging network, and in Section 4 we present the construction of our counting network. Finally, in Section 5 we give our concluding remarks and present some open problems.
Definitions
We denote an integer sequence with a capital letter, e.g. $X$, and its elements with small letters, e.g. $x_i$. The first index of a sequence is 0. Let $\Sigma(X)$ denote the sum of all the elements of $X$. From now on, whenever we say that we compare two sequences $X$ and $Y$ we mean that we compare their sums, that is, we actually compare $\Sigma(X)$ and $\Sigma(Y)$. Take a sequence $X$ of length (width) $p$. We say that $X$ has the step property, or alternatively $X$ is step, if $0 \le x_i - x_j \le 1$ for any $i, j$ with $0 \le i < j \le p - 1$. We say that $X$ has the $k$-smooth property, or alternatively $X$ is $k$-smooth, if $0 \le |x_i - x_j| \le k$ for any $0 \le i, j \le p - 1$.
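The two properties translate directly into predicates. The following Python sketch uses the hypothetical names `is_step` and `is_k_smooth` and assumes sequences are given as non-empty lists of token counts.

```python
def is_step(x):
    """Step property: 0 <= x[i] - x[j] <= 1 for every pair i < j."""
    return all(0 <= x[i] - x[j] <= 1
               for i in range(len(x)) for j in range(i + 1, len(x)))


def is_k_smooth(x, k):
    """k-smooth property: any two elements differ by at most k."""
    return max(x) - min(x) <= k


step_example = is_step([2, 2, 1, 1])        # non-increasing, spread of 1
not_step = is_step([1, 2, 1, 1])            # smooth, but a later element is larger
smooth_example = is_k_smooth([1, 2, 1, 1], 1)
```

Every step sequence is 1-smooth, but not conversely: `[1, 2, 1, 1]` is 1-smooth yet fails the step property because a later element exceeds an earlier one.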
Let $b$ be a $q$-input, $p$-output balancer. Let $X$ be the sequence (input sequence) of width $q$ such that $x_i$ is the number of tokens received on the $i$th input wire of $b$, for all $0 \le i \le q - 1$. In a similar way we define the output sequence $Y$ of width $p$.
As in the balancer, we require a balancing network $B$ to have the safety and liveness properties. For the rest of the paper we will assume that the balancing networks we consider are in a quiescent state.
A counting network is a balancing network such that its output sequence has the step property. We will denote a counting network of input width $t$ and output width $w$ as $C(t, w)$.
A bounded difference $\delta$-merging network is a balancing network of equal input and output width, whose input sequence can be divided into two equal-length subsequences $A$ and $B$ such that its output sequence has the step property whenever $A$ and $B$ both have the step property and $0 \le \Sigma(A) - \Sigma(B) \le \delta$. That is, the difference between $A$ and $B$ is bounded by $\delta$. We denote such a network of width $w$ as $M(w, \delta)$. We will refer to the sequences $A$ and $B$ as the first and second input sequence, respectively, of $M(w, \delta)$.
On an MIMD shared memory multiprocessor machine, a balancing network $B$ is implemented as a shared data structure, where balancers are records and wires are pointers from one record to another. Each of the machine's $n$ asynchronous processors runs a program that repeatedly traverses the data structure from some input pointer to some output pointer, each time shepherding a new token through the network. Tokens generated by processor $p_l$, $l \in \{0, \ldots, n - 1\}$, enter the network on input wire $l \bmod t$. The limitation on the number of concurrent processors implies a limitation on the number of tokens concurrently traversing the network at any given time.
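The "records and pointers" view can be sketched as follows. This is a minimal Python model, not the implementation used by the authors: the names `Balancer`, `step` and `traverse` are hypothetical, a per-record lock stands in for the atomic toggle update, and the small width-4 network is wired by hand following our reading of the construction of $C(4, 4)$ in Section 4.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Balancer:
    """A balancer record: a toggle plus one pointer per output wire."""
    outputs: list                       # each entry: next Balancer, or int = network output wire
    toggle: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock)

    def step(self):
        """Atomically follow the current output pointer and advance the toggle."""
        with self.lock:
            nxt = self.outputs[self.toggle]
            self.toggle = (self.toggle + 1) % len(self.outputs)
        return nxt


def traverse(entry):
    """Shepherd one token from an input pointer to a network output wire."""
    node = entry
    while isinstance(node, Balancer):
        node = node.step()
    return node


# Hand-wired width-4 example (an assumption for illustration).
m0, m1 = Balancer([0, 3]), Balancer([1, 2])      # last layer, pointing at output wires
c0, c1 = Balancer([m0, m1]), Balancer([m1, m0])  # middle layer
b0, b1 = Balancer([c0, c1]), Balancer([c0, c1])  # first layer
inputs = [b0, b0, b1, b1]                        # input wire -> first balancer on that wire

t = len(inputs)
exits = [traverse(inputs[wire % t]) for wire in [0, 2, 0, 2, 1, 3, 3, 0]]
```

In this sequential run the tokens leave on wires 0, 1, 2, 3, 0, 1, 2, 3 regardless of the input wires they entered on, which is the behaviour a counter needs.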
Consider an execution of $B$ entering a quiescent state after $m$ tokens pass through it. Each time a token passes through a balancer, all tokens pending at this balancer incur a stall step, modeling their delay due to contention with each other. The number of stall steps has been introduced in [7] as a measure of contention. The contention incurred by the traversal of $m$ tokens through the network $B$ at concurrency $n$, denoted $\mathrm{cont}(m, n, B)$, is the maximum number of stalls, over all possible executions, induced by an adversary scheduler. The amortized contention of the network $B$ at concurrency $n$, denoted $\mathrm{cont}(n, B)$, is the limit supremum of $\mathrm{cont}(m, n, B)$ divided by $m$, as $m$ goes to infinity.
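To make the stall-step measure concrete, the sketch below counts stalls at a single balancer under a simplified adversary. The adversary model is an assumption made for illustration: it keeps as many tokens pending at the balancer as the concurrency $n$ allows, so every passing token stalls all other pending tokens. The function name `stalls_at_balancer` is hypothetical.

```python
def stalls_at_balancer(m, n):
    """Stall steps when m tokens pass one balancer with at most n pending at once.

    Assumed adversary: as many tokens as possible are pending whenever a
    token passes, so each pass stalls every other pending token.
    """
    stalls, remaining = 0, m
    while remaining > 0:
        pending = min(n, remaining)   # tokens waiting at the balancer right now
        stalls += pending - 1         # the passing token stalls all the others
        remaining -= 1
    return stalls


total = stalls_at_balancer(m=1000, n=4)
amortized = total / 1000
```

Under this model the total never exceeds $m(n - 1)$, and the per-token average approaches $n - 1$ as $m$ grows, which is the amortized quantity the definition above captures.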
A Bounded Difference $\delta$-Merging Network
In this section we present the construction of a bounded difference $\delta$-merging network $M(w, \delta)$ where $w = p2^l$, $\delta = 2^k$, $p \ge 1$, $l \ge 2$, and $1 \le k < l$. Let $A$ and $B$ denote the first and second input sequences, respectively, of $M(w, \delta)$ and let $Y$ denote its output sequence.
The construction is by induction on $\delta$. For the base case we have $\delta = 2$, and the network $M(w, 2)$ consists of $w/2$ 2-input, 2-output balancers $b_0, \ldots, b_{w/2-1}$. The first and second input wires of balancer $b_i$ are connected to the input wires carrying $b_{i-1}$ and $a_i$ (elements of the input sequences $B$ and $A$), respectively, and its first and second output wires are connected to $y_{2i-1}$ and $y_{2i}$, respectively, for $1 \le i \le w/2 - 1$. The first and second input wires of balancer $b_0$ are connected to $a_0$ and $b_{w/2-1}$, respectively, and its first and second output wires are connected to $y_0$ and $y_{w-1}$, respectively.
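The base case wiring can be checked in the quiescent state with a short Python sketch. It assumes that a 2-output balancer that has received $s$ tokens has emitted $\lceil s/2 \rceil$ on its first output and $\lfloor s/2 \rfloor$ on its second; the names `merge2`, `step_sequence` and `is_step` are hypothetical.

```python
def step_sequence(total, length):
    """The step sequence of the given length whose elements sum to total."""
    return [(total - i + length - 1) // length for i in range(length)]


def is_step(y):
    return all(0 <= y[i] - y[j] <= 1
               for i in range(len(y)) for j in range(i + 1, len(y)))


def merge2(a, b):
    """Quiescent output of M(w, 2) for input sequences a and b of length w/2 each."""
    h = len(a)
    w = 2 * h
    y = [0] * w
    # Balancer b_0: inputs a_0 and b_{w/2-1}; outputs y_0 and y_{w-1}.
    s = a[0] + b[h - 1]
    y[0], y[w - 1] = (s + 1) // 2, s // 2
    # Balancer b_i, 1 <= i <= w/2-1: inputs b_{i-1} and a_i; outputs y_{2i-1}, y_{2i}.
    for i in range(1, h):
        s = b[i - 1] + a[i]
        y[2 * i - 1], y[2 * i] = (s + 1) // 2, s // 2
    return y


# Exhaustive check for w = 4, 8, 12: step inputs with 0 <= sum(a) - sum(b) <= 2.
all_step = all(
    is_step(merge2(step_sequence(sb + d, h), step_sequence(sb, h)))
    for h in (2, 4, 6)
    for sb in range(3 * h)
    for d in (0, 1, 2)
)
example = merge2([1, 1], [0, 0])
```

For instance, with $A = (1, 1)$ and $B = (0, 0)$ the output is $(1, 1, 0, 0)$, which is step even though simply concatenating or interleaving the inputs would not guarantee it in general.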
For the inductive case $\delta > 2$, the network $M(w, \delta)$ is constructed as follows (see figure 2). We take two copies of the network $M(w/2, \delta/2)$, denoted $M_0(w/2, \delta/2)$ and $M_1(w/2, \delta/2)$.
Next, we examine the inductive case $\delta > 2$. Since the difference between the sequences $A$ and $B$ is at most $\delta$, the difference between their even subsequences is at most $\delta/2$, and similarly the difference between their odd subsequences is at most $\delta/2$. Furthermore, these subsequences have the step property. Therefore, by the induction hypothesis, the outputs of networks $M_0(w/2, \delta/2)$ and $M_1(w/2, \delta/2)$ have the step property. Since in a step sequence the even subsequence is equal to or greater by one than the odd subsequence, the output sequence of network $M_0(w/2, \delta/2)$ is equal to or greater by at most two than the output sequence of network $M_1(w/2, \delta/2)$. Therefore the output sequence of network $M(w, 2)$ has the step property, as needed.
The construction guarantees concurrency $n/2$ for each of these networks.
We solve the recurrence
$$\mathrm{cont}(m, n, M(w, \delta)) \le \mathrm{cont}(m_0, n/2, M_0(w/2, \delta/2)) + \mathrm{cont}(m_1, n/2, M_1(w/2, \delta/2)) + \mathrm{cont}(m, n, M(w, 2)).$$
For the base case we have $\mathrm{cont}(m, n, M(w, 2)) = m(2n/w - 1)$, since the contention of a balancer with concurrency $n$ that is traversed by $m$ tokens is equal to $m(n - 1)$.
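The recurrence can be unrolled numerically on a per-token basis. The Python sketch below assumes $m_0 + m_1 = m$, so that the two sub-network terms together contribute one per-token bound at concurrency $n/2$ and width $w/2$. The closed form $(2n/w - 1)\lg\delta$ it is compared against is our own unrolling of the recurrence, not a formula quoted from the text; `merge_cont` is a hypothetical name.

```python
from fractions import Fraction


def merge_cont(n, w, delta):
    """Per-token contention bound of M(w, delta) at concurrency n, via the recurrence."""
    base = 2 * n / w - 1                 # cont(m, n, M(w, 2)) / m
    if delta == 2:
        return base
    # Sub-networks run at concurrency n/2 and width w/2; with m0 + m1 = m
    # their two terms add up to a single per-token bound.
    return merge_cont(n / 2, w / 2, delta // 2) + base


n, w, delta = Fraction(64), Fraction(32), 8
bound = merge_cont(n, w, delta)
closed_form = (2 * n / w - 1) * (delta.bit_length() - 1)   # (2n/w - 1) * lg(delta)
```

Because $n/w$ is unchanged when both are halved, every level of the recursion contributes the same term $2n/w - 1$, and there are $\lg\delta$ levels.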
A Counting Network
In this section, we present a counting network $C(t, w)$ where $t = 2^k$, $w = p2^l$, $p, l \ge 1$, and $1 \le k \le l$. Let $X$ and $Y$ denote the input and output sequences, respectively, of $C(t, w)$.
The construction is by induction on $t$. For the base case we have $t = 2$, and the network $C(2, w)$ is just a 2-input, $w$-output balancer. For the inductive case $t > 2$ the network $C(t, w)$ is constructed as follows (see figure 3). We take $t/2$ 2-input, 2-output balancers $b_0, \ldots, b_{t/2-1}$. The first and second input wires of balancer $b_i$ are connected to $x_{2i}$ and $x_{2i+1}$, respectively, for all $0 \le i \le t/2 - 1$. Next, we take two copies of $C(t/2, w/2)$, denoted $C_0(t/2, w/2)$ and $C_1(t/2, w/2)$. The first and second output wires of balancer $b_i$ are connected to the $i$th input of the networks $C_0(t/2, w/2)$ and $C_1(t/2, w/2)$, respectively, for all $0 \le i \le t/2 - 1$. Next, we take the bounded difference $t/2$-merging network $M(w, t/2)$ described in Section 3. The first and second input sequences of network $M(w, t/2)$ are connected to the output sequences of the networks $C_0(t/2, w/2)$ and $C_1(t/2, w/2)$, respectively. The output sequence of $M(w, t/2)$ is connected to the sequence $Y$. This completes the construction. Next, we show the correctness of $C(t, w)$, then we calculate its depth and estimate its contention.
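The recursive construction can be exercised in the quiescent state with the Python sketch below. Two assumptions are made and should be read as such: a 2-output balancer that received $s$ tokens has emitted $\lceil s/2 \rceil$ and $\lfloor s/2 \rfloor$ on its first and second outputs, and in the inductive case of $M(w, \delta)$ the even subsequences of $A$ and $B$ feed $M_0$, the odd subsequences feed $M_1$, and a final $M(w, 2)$ merges their outputs (this wiring is inferred from the proof in Section 3). The names `halves`, `merge` and `count` are hypothetical.

```python
from itertools import product


def is_step(y):
    return all(0 <= y[i] - y[j] <= 1
               for i in range(len(y)) for j in range(i + 1, len(y)))


def halves(s):
    """Quiescent outputs of a 2-output balancer that received s tokens."""
    return (s + 1) // 2, s // 2


def merge(a, b, delta):
    """M(w, delta) on first/second input sequences a, b (each of length w/2)."""
    if delta > 2:
        # Assumed wiring: even subsequences to M_0, odd subsequences to M_1.
        z0 = merge(a[0::2], b[0::2], delta // 2)
        z1 = merge(a[1::2], b[1::2], delta // 2)
        a, b = z0, z1                     # the last stage is M(w, 2) on these outputs
    h = len(a)
    y = [0] * (2 * h)
    y[0], y[-1] = halves(a[0] + b[h - 1])                 # balancer b_0
    for i in range(1, h):
        y[2 * i - 1], y[2 * i] = halves(b[i - 1] + a[i])  # balancer b_i
    return y


def count(x, w):
    """Quiescent output sequence of C(t, w) for input token counts x, t = len(x)."""
    t = len(x)
    if t == 2:
        s = x[0] + x[1]                   # a single 2-input, w-output balancer
        return [(s - i + w - 1) // w for i in range(w)]
    firsts, seconds = zip(*(halves(x[2 * i] + x[2 * i + 1]) for i in range(t // 2)))
    z0 = count(list(firsts), w // 2)      # C_0(t/2, w/2)
    z1 = count(list(seconds), w // 2)     # C_1(t/2, w/2)
    return merge(z0, z1, t // 2)          # M(w, t/2)


# Exhaustive check of C(4, w) for w = 4, 8, 12 with up to 3 tokens per input wire.
ok = all(
    is_step(y) and sum(y) == sum(x)
    for w in (4, 8, 12)
    for x in product(range(4), repeat=4)
    for y in [count(list(x), w)]
)
y8 = count([1, 0, 0, 0, 0, 0, 0, 0], 8)
```

The check confirms, for these small cases, that every distribution of tokens over the input wires yields a step output sequence carrying the same number of tokens.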
Sketch of proof:
For the base case $t = 2$ the network $C(2, w)$ is obviously a counting network. For the inductive case $t > 2$ we have the following. The balancers $b_0, \ldots, b_{t/2-1}$ ensure that the input sequence of network $C_0(t/2, w/2)$ is equal to or greater by at most $t/2$ than the input sequence of network $C_1(t/2, w/2)$. By the induction hypothesis, the output sequences of these two networks have the step property, and furthermore their output sequences have the same difference as their input sequences. Therefore, by Proposition 3.1, the output sequence of $M(w, t/2)$ has the step property, as needed.
Theorem 4.2 $\mathrm{depth}(C(t, w)) = (\lg^2 t + \lg t)/2$
We solve the recurrence $\mathrm{depth}(C(t, w)) = 1 + \mathrm{depth}(C(t/2, w/2)) + \mathrm{depth}(M(w, t/2))$ by using the result of Proposition 3.2.
For the base case we have $\mathrm{depth}(C(2, w)) = 1$.
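The depth recurrence can be checked against the closed form of Theorem 4.2 with a few lines of Python. The statement of Proposition 3.2 does not appear in this text, so the sketch assumes that $M(w, \delta)$ has depth $\lg\delta$: one layer for $M(w, 2)$ plus the depth of the recursive copies $M(w/2, \delta/2)$. The function names are hypothetical.

```python
def merge_depth(delta):
    """Assumed depth of M(w, delta): one layer per halving of delta, i.e. lg(delta)."""
    return 1 if delta == 2 else 1 + merge_depth(delta // 2)


def depth(t):
    """depth(C(t, w)) from the recurrence; it does not depend on w."""
    if t == 2:
        return 1
    return 1 + depth(t // 2) + merge_depth(t // 2)


# Compare with (lg^2 t + lg t) / 2 for t = 2, 4, ..., 1024.
matches = all(
    2 * depth(2 ** k) == k * k + k
    for k in range(1, 11)
)
```

For example, $C(4, w)$ has depth 3 and $C(8, w)$ has depth 6, whatever the output width.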
Theorem 4.3 $\mathrm{cont}(n, C(t, w)) \le (n/w - 1/2) \lg^2 t + (2n/t - n/w - 1/2) \lg t$
Let $m_0$ and $m_1$ denote the number of tokens that enter the networks $C_0(t/2, w/2)$ and $C_1(t/2, w/2)$, respectively. Let $L$ denote the first layer of balancers. We solve the recurrence
$$\mathrm{cont}(m, n, C(t, w)) \le \mathrm{cont}(m, n, L) + \mathrm{cont}(m_0, n/2, C_0(t/2, w/2)) + \mathrm{cont}(m_1, n/2, C_1(t/2, w/2)) + \mathrm{cont}(m, n, M(w, t/2)).$$
In order to do so we use Proposition 3.3. We also have $\mathrm{cont}(m, n, L) \le m(2n/t - 1)$. For the base case we have $\mathrm{cont}(m, n, C(2, w)) \le m(n - 1)$. Finally we take $\mathrm{cont}(n, C(t, w)) = \lim_{m \to \infty} \mathrm{cont}(m, n, C(t, w))/m$.
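As a sanity check, the recurrence can be unrolled per token and compared with the bound of Theorem 4.3. The sketch below is in Python with exact rational arithmetic. It assumes $m_0 + m_1 = m$ and takes the merging network's per-token bound to be $(2n/w - 1)\lg\delta$, which is our unrolling of the recurrence in Section 3 rather than a formula quoted from the text. All function names are hypothetical.

```python
from fractions import Fraction


def lg(x):
    """Exact base-2 logarithm of an integer power of two."""
    return x.bit_length() - 1


def merge_cont(n, w, delta):
    """Assumed per-token bound for M(w, delta) at concurrency n."""
    return (2 * n / w - 1) * lg(delta)


def count_cont(n, t, w):
    """Per-token bound for C(t, w) obtained by unrolling the recurrence."""
    if t == 2:
        return n - 1                          # base case: a single balancer
    first_layer = 2 * n / t - 1               # cont(m, n, L) / m
    return first_layer + count_cont(n / 2, t // 2, w / 2) + merge_cont(n, w, t // 2)


def theorem_4_3(n, t, w):
    k = lg(t)
    return (n / w - Fraction(1, 2)) * k * k + (2 * n / t - n / w - Fraction(1, 2)) * k


n, t = Fraction(256), 16
narrow = theorem_4_3(n, t, Fraction(16))      # w = t
wide = theorem_4_3(n, t, Fraction(64))        # w = t lg t
unrolled = count_cont(n, t, Fraction(64))
```

With $n = 256$ and $t = 16$, widening the output from $w = t = 16$ to $w = t \lg t = 64$ lowers the bound from 310 to 166 stalls per token while the depth stays the same.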
Concluding Remarks and Open Problems
We presented a counting network construction with $t$ input wires and $w$ output wires, where $t = 2^k$, $w = p2^l$, and $k \le l$. This is one of very few known constructions whose output width is not a power of two [1, 6, 9].
Several interesting questions remain. Is it possible to extend our construction to arbitrary input and output widths, other than multiples of a power of two? It follows from impossibility results in [1, 5] that appropriate sets of balancer types would have to be used for such an extension. Using such larger balancers is often expected to cause a reduction in depth (see [8, 9]). What would the trade-off between depth and contention be in this situation? Can the combinatorial techniques in [5] be used to show impossibility results on constructible widths for bounded difference $\delta$-merging networks?
We believe that our counting network will allow a significant improvement in performance when used in real shared memory multiprocessors, over previously used counting network constructions [1, 3, 6, 8, 9]. To verify this belief, we are currently implementing a software simulation of our counting network construction on a general asynchronous multiprocessor. We hope this simulation will enable us to evaluate the performance of our construction as measured by contention. In our simulations, we fix a particular input width and compare the counting network introduced in this paper to the bitonic and periodic counting networks [3] of the same width. Preliminary experimental investigations reveal that for appropriately chosen values of the parameter $w$, especially when it is taken to be of the order of $t \lg t$, the counting network resulting from our construction significantly outperforms the other two under identical concurrency conditions.
