We consider a failure-free, asynchronous message passing network, with n processors arranged on a ring or a chain.
More specifically, we prove several new lower bounds (and some simple upper bounds) that imply the following results. For the two processors case, BitC(Consensus) = 2 and BitC(Leader) = BitC(MaxF) = 2 log_2 M − O(1). For a chain, BitC(Consensus) = Θ(n) and BitC(MaxF) = Θ(n log M). When the length is even, BitC(Leader) = Θ(n), while if the length is odd, BitC(Leader) = Θ(n + log M). An important situation, considered in communication complexity, is a chain where the intermediate processors have no identities. For an odd length chain we prove BitC(Leader) = Θ(n log M), while if the length is even, BitC(Leader) = Θ(n).
For the ring topology, we prove lower bounds of Ω(n log M) for Leader and (hence) MaxF.
In our proofs we use both methods of distributed computing and of communication complexity theory establishing new links between the two areas.
1 Introduction
The basic problem in the area of communication complexity was introduced by Yao in 1979 [20]. This problem asks what is the number of bits that two processors, A and B, have to communicate to each other in order to compute a function f(x, y) of their respective private inputs, x (A's input) and y (B's input).
The model assumes that they send bits one at a time, starting with some bits sent by A, then some bits sent by B, and so on. At the end of the computation, both processors know the value of f(x, y). For example, if f(x, y) = max(x, y) and x, y are integers in Z_M = {1, ..., M}, then it is known that exactly ⌈log M⌉ bits must be transmitted to solve the problem. Traditionally, it is assumed in this model that each of the processors in the network has complete knowledge of the topology of the network and of the identities of all other processors. This justifies the assumption that there is only one processor starting the computation, and that there is transmission in only one direction of the link at any given time. A large number of results exist on Yao's basic problem and variants of it, some of which also consider more general networks (e.g. [19, 5, 11]). An introduction to the area appears in [13].
*Department of Computer Science, Ben-Gurion University of the Negev, POB 653, Beer-Sheva 84105, Israel. This work has been done mostly when the author was visiting Instituto de Matemáticas, UNAM, México. Email: dinitz@cs.bgu.ac.il.
†Department of Computer Science, The Technion, Haifa 32000, Israel. This work has been supported in part by the Bernard Elkin Chair for Computer Science. Part of the work was done while this author was at the University of Arizona, supported by US-Israel BSF grant 95-00238, while he was visiting the Instituto de Matemáticas, UNAM, México, and while he was visiting the Institute of Information Sciences, Academia Sinica, at Taipei. Email: moran@cs.technion.ac.il.
‡Instituto de Matemáticas, UNAM, Ciudad Universitaria, D.F. 04510, México. Partially supported by Conacyt and DGAPA-UNAM Projects. Email: rajsbaum@math.unam.mx.
Consider now a seemingly slight modification of the problem of computing max(x, y). This time, the two parties are not necessarily A and B, but any two, not known in advance, out of M distinct processors, each one identified with a value of Z_M; the processors have to find which of them has the larger identity. Note that in this scenario one cannot fix, ahead of time, a processor that will send the first bit in the communication.
Indeed, our results imply that at least 2⌊log M⌋ − 3 bits must be transmitted to solve the problem in this setting. This introduces a new parameter to the problem, namely, the information on the network known to the processors. In more general networks, we may also assume that the processors do not know the structure of the entire network, or even the number of processors in the network. These are the usual assumptions in the distributed computing literature (e.g. [1, 15, 18]). Moreover, in this literature, it is common that the problem which has to be solved is not necessarily a function of the processors' inputs, but is specified by a task, i.e. an input/output relation (several outputs are allowed for the same input, see [17]). Also, the output is not required to be the same for all processors. The inputs are usually the processors' identities, which are distinct and are not mutually known. When considering tasks, the differences in power between the two models can become quite dramatic: the Leader problem is solvable in the communication complexity model with 0 bits (A outputs 0, B outputs 1), while in the distributed computing model it still requires at least 2⌊log M⌋ − 3 bits.
The two most studied tasks in the distributed computing literature are probably Consensus and Leader. A large number of results about these tasks exist, including algorithms, lower bounds and applications, in a variety of distributed computing models. However, almost always, Consensus is studied in the shared memory setting, mainly from the fault tolerance point of view, while Leader is studied in the message passing setting, emphasizing message complexity bounds. We view both tasks as duals in the sense that in Consensus all processors have to agree on the same value, while in Leader one processor has to output a different value from the rest of the processors. In this sense, Consensus is about achieving symmetry, while Leader is about breaking symmetry. In this paper we study the communication cost of these two tasks and of the variant of Leader in which the elected processor must have the largest id in the system, called MaxF.
In the distributed computing and networking literature, the usual communication cost measure is message complexity. Under this measure, Leader and MaxF have been extensively studied in an asynchronous, message passing, failure-free distributed system. It is known that in general, Θ(m + n log n) messages are necessary and sufficient for both these tasks in a network of n processors and m communication links (e.g. see [9] for an upper bound and [4] for a lower bound), and this bound holds even when it is given that the processors are arranged in a ring. Some other special topologies (like trees or complete graphs) have smaller message complexities (O(n) and O(n log n), respectively), but also in these cases Leader and MaxF have the same message complexity.
To the best of our knowledge, the message complexity of Consensus was not studied prior to this work. It is easy to see that Consensus is not harder than Leader: in any topology, once a leader has been elected it can broadcast the value to be decided. Thus, only O(m) additional messages are needed to solve Consensus (in some cases, as in a complete graph topology, only O(n) additional messages are needed). We start by observing that the opposite is not true: for all the networks we discuss, the asymptotic message complexity of Leader is not reduced even if Consensus is given for free. In view of this, it may be surprising that, as we show, the message complexity of Consensus is the same as that of Leader.
This motivates the use of the following finer measure to distinguish between the communication costs of these tasks: the number of bits sent, which we call bit complexity. With respect to this measure, we can show that in chains Consensus is easier than Leader, which is easier than MaxF.
As mentioned above, bit complexity is the usual measure in the area of communication complexity. However, to the best of our knowledge, the bit complexity of asynchronous tasks like Consensus and Leader was not studied before in our model. In fact, as mentioned above, in the usual model of communication complexity, Leader is solvable with no communication whatsoever. Bit complexity in the distributed setting was studied earlier in [16, 3], where it was shown that computing any non-trivial function on a ring of n processors requires Ω(n log n) bits.
The model we consider consists of a failure-free, asynchronous message passing distributed system, with arbitrary but finite link delays and local computation times. We assume that processors have distinct ids, and that processors are identical in the sense that their programs depend only on their ids. There are n processors, and the ids are taken from the set Z_M = {1, ..., M}, for n << M.
The results
Message Complexity. First we show that all three tasks have the same message complexity on a ring, Θ(n log n). For this, we obtain an Ω(n log n) lower bound on the message complexity of Consensus in a ring, which matches the lower bound for Leader and MaxF [4, 2, 7]. We stress that the lower bounds for Leader and MaxF hold even if Consensus is given for free. Moreover, the same lower bound is obtained for a complete network, proving that the message complexity of Consensus in this topology is also Θ(n log n).
Hence, also in this topology, our three tasks have the same message complexity; the corresponding lower bound for Leader appears in [12]. For the case of a chain, it is easy to see that all three tasks have O(n) message complexity.
The rest of the results are on bit complexity. We consider three simple topologies: a pair of processors, a chain of processors, and a ring. In the first two topologies, simple arguments show that the bit complexities of Consensus are 2 and Θ(n), respectively. In the main part of the paper we concentrate on proving asymptotically larger bit complexities for breaking symmetry in these two topologies.
Bit Complexity for Two Processors. The following bounds hold for the bit complexities of MaxF and Leader: MaxF (and thus Leader) can be solved in 2⌈log M⌉ − 2 bits, and every Leader algorithm (and thus every MaxF algorithm) requires at least 2⌊log M⌋ − 3 bits. Notice that this is about twice the maximum possible bit complexity of computing any function f(x, y) of inputs x, y ∈ {1, ..., M} when the processors' identities are mutually known [13].
Bit Complexity in a Chain. When the chain is of even length, there exists an inherently distinguished processor, the middle one; indeed, electing a leader is of bit complexity only Θ(n). The case of odd length is more interesting. We show that in this case Leader is harder than Consensus (but easier than MaxF), by presenting an O(n + log M) bits algorithm and proving a matching Ω(n + log M) lower bound for Leader. The most interesting case is the model where the (distinct) ids are given only to the two terminals of an odd length chain, while the remaining processors are identical and anonymous. This case is an analogue of the linear array problem studied in the communication complexity literature [19, 11, 5]. The question there is whether the bit complexity of computing a function f(x, y) of two inputs located at the end processors of a chain of length n equals n times the complexity of computing f(x, y) on a chain of length one (i.e., a pair of processors). In terms of asymptotic complexity (i.e., ignoring constants), this question was settled affirmatively in [5]. We show that an analogue of the result in [5] holds for Leader in a chain of odd length, i.e. BitC(Leader) = Θ(n log M); such a result does not hold in a chain of even length, as follows from the O(n) algorithm mentioned above (which works also in this two-input setting).
It follows from [19] that the bit complexity of finding the maximal input in a chain when all the identities are mutually known is n log M − O(1).
For MaxF, we show that the Θ(n log M) complexity is valid also in our model, even when we consider the complexity under the best possible scheduler.
Rings. For the ring topology, the best known upper bound for Leader, MaxF and Consensus is implied by the algorithm of [10], with bit complexity O(n log n log M).
As for lower bounds, all these three tasks have the Ω(n log n) lower bound implied by the lower bounds on their message complexity (see [2, 4, 7]). For Leader (and hence also for MaxF) we improve this lower bound to Ω(n log M).
About Our Techniques. The main difficulty is in proving lower bounds. Our first technique is a modification of the partition-into-rectangles technique of [20, 19, 5] (see also [13]). Essential ingredients for this technique are (a) the algorithm should compute a function, and (b) all executions of the algorithm on a given input produce the same history on each communication link. In the communication complexity model, (a) is given for free, and (b) is achieved in [19, 5] for the chain topology by certain non-trivial restrictions on the algorithm.
In more general topologies, (b) is obtained in [19] by transforming each algorithm to a token algorithm, in a way that increases the complexity by a constant factor at most. To adapt these properties to our model, we consider executions of a leader election algorithm under a fixed, specific scheduler, and show that these executions compute some anti-symmetric function. Then we show that such a function cannot have a partition into a small number of rectangles. In the pair topology, this also suffices to obtain the above property (b). To obtain property (b) in the two-input odd-length chain case, we introduce a transformation of algorithms for the odd-length chain topology to a canonical form, called two tokens algorithm. This transformation is an extension of [19] which eliminates the assumption on a single initiator used in [19] (this assumption implies the existence of a predetermined leader, and hence cannot be applied to tasks like Leader). The above techniques relate two areas, distributed computing and communication complexity, which were studied independently.
Our second main tool, which we use for proving lower bounds on breaking symmetry in the two-processor and ring cases, consists of finding long symmetric executions. A contradiction is then obtained from the fact that no symmetric execution can break symmetry. This method can be viewed as a dual of the partition-into-rectangles technique, as it shows that certain short symmetric executions each define a square S × S, for some large set S ⊆ Z_M, such that all pairs of distinct inputs from S × S have the same history.
We note that in the ring case, there is a gap between our lower bound of Ω(n log M) and the known upper bound of O(n log n log M).
One can distinguish two different techniques for proving lower bounds for the ring topology: the technique of [4, 2, 7], which exploits the fact that in some executions the ring must be traversed by a long chain of messages, and the technique used here, which shows that many bits must be sent by each processor. It is possible that neither of these two techniques by itself could close the above gap, e.g., by proving an Ω(n log n log M) lower bound on the bit complexity of MaxF in the ring (improving the lower bounds for Leader and Consensus could be even harder). Improving these lower bounds, if at all possible, may require a new technique, which could be a combination of these two techniques.
Some of the proofs are omitted from this extended abstract because of lack of space.
2 Preliminaries
The model we consider consists of a failure-free, asynchronous message passing distributed system, with arbitrary but finite link delays and negligible local computation times. Thus, messages are sent by a processor only when it wakes up or as an immediate response to receiving a message. Messages are delivered in FIFO order. We assume that processors have distinct ids, and that processors are identical in the sense that their programs depend only on their ids and their number of incident links. There are n processors, and the ids are taken from the set Z_M = {1, ..., M}, for n << M. This is a standard model used in distributed computing (e.g. [1, 15, 18]), and is not described in detail in this extended abstract.
We consider two interconnection topologies: rings and chains.
A scheduler specifies the order in which processors take steps and messages are delivered. In particular, a scheduler defines which processors wake up spontaneously.
An execution is defined by a given scheduler in the obvious way: the next configuration is determined by the current configuration and the set of messages received. A scheduler is deterministic if it produces a unique execution for each initial configuration. We are interested in the following tasks:
1. Consensus: all processors output the same bit. If all the id's are odd they output 1; if all id's are even they output 0.
2. Leader: one process outputs 1, the rest output 0.
3. MaxF: the same as Leader, except that the process which outputs 1 is the one with the maximal id.
Thus, a task specifies the set of allowable output vectors for each input vector. In the three cases the inputs to the task are the processors' ids, and hence any vector of n distinct integers from Z_M is a possible input vector to the system. The output vectors are binary. In Sections 6 and 7 we consider also the case where only the end processors of the chain have inputs.
We consider worst case complexities, namely message complexity and bit complexity. For exact bounds on bit complexities, we will assume that each message is a single bit. This avoids a situation where the same sequence of bits can be sent in more than one way, by segmenting it into different message sequences. Thus, a processor can send more than one bit as a response to a message, but these bits are not guaranteed to be received at the same time. In Section 7.2 we consider best scheduler complexity, in which the worst case number of bits sent (over all inputs) is considered for any fixed deterministic scheduler, and this measure is minimized over all deterministic schedulers.
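The three tasks defined above can be read as predicates on (input vector, output vector) pairs. The following is a minimal sketch under that reading; the function names are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def valid_consensus(ids: Sequence[int], outs: Sequence[int]) -> bool:
    """All outputs are the same bit; all-odd ids force 1, all-even ids force 0."""
    if len(set(outs)) != 1 or outs[0] not in (0, 1):
        return False
    if all(i % 2 == 1 for i in ids):
        return outs[0] == 1
    if all(i % 2 == 0 for i in ids):
        return outs[0] == 0
    return True  # mixed parities: either common bit is allowed


def valid_leader(ids: Sequence[int], outs: Sequence[int]) -> bool:
    """Exactly one processor outputs 1, the rest output 0."""
    return all(o in (0, 1) for o in outs) and sum(outs) == 1


def valid_maxf(ids: Sequence[int], outs: Sequence[int]) -> bool:
    """Same as Leader, and the processor that outputs 1 holds the maximal id."""
    ids = list(ids)
    return valid_leader(ids, outs) and outs[ids.index(max(ids))] == 1
```

Note that several output vectors can be valid for one input vector (for Leader, any single 1), which is what distinguishes a task from a function.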
In this section we present a lower bound on the number of messages needed to solve Consensus. This is the only section where we study message complexity; in the rest of the paper we will deal with bit complexity only.
Any distributed algorithm for Leader in a ring can be modified to achieve also Consensus using O(n) additional messages: first elect a leader, and then the leader sends a message around the ring informing all processors that the consensus decision value is the parity of its id. (A similar technique works also in general graphs.) Since there is an O(n log n) messages algorithm for Leader [10], there is an O(n log n) messages algorithm for Consensus. Theorem 3.1 gives a matching Ω(n log n) lower bound for the message complexity of Consensus in rings. The proof of Theorem 3.1 is postponed to the complete version of this paper. A similar proof strategy can be used to prove:
Theorem 3.2 The message complexity of Consensus in a complete graph is Ω(n log n).
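The reduction from Consensus to Leader described above can be sketched as follows. This is an illustration only: it assumes a leader has already been elected, the function name is hypothetical, and the decision message is modeled as travelling once around the ring and returning to the leader.

```python
def consensus_from_leader(ring_ids, leader_index):
    """Return (outputs, extra_messages) once a leader is known on the ring."""
    n = len(ring_ids)
    decision = ring_ids[leader_index] % 2   # decision value: parity of the leader's id
    outputs = [None] * n
    outputs[leader_index] = decision
    messages = 0
    pos = leader_index
    for _ in range(n):                      # one hop per ring link
        pos = (pos + 1) % n
        messages += 1
        if pos != leader_index:             # the last hop only returns to the leader
            outputs[pos] = decision
    return outputs, messages
```

If all ids are odd the leader's parity is 1, and if all are even it is 0, so the common output satisfies the Consensus task at a cost of n additional messages.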
4 Two-processor bit complexity
Here we consider a chain of two processors connected by one link, which is the basic model in communication complexity. In Section 4.1 we study achieving symmetry, while in Section 4.2 we study breaking symmetry. In Section 4.3 we present a more general perspective of the two-processor bit complexity, and its relation to communication complexity.
4.1 Achieving symmetry with two processors
We start with a very simple result, whose proof already incorporates some of the ideas used later on in more sophisticated forms. These ideas are: considering specific schedulers (e.g., both processors wake up spontaneously), "cut and paste" of two executions to produce a third one, and partitioning the set of ids according to the first bit sent upon wake up.
Proof: To solve Consensus sending two bits, each processor sends the parity of its id to the other processor, and both decide on (say) the OR of these bits.
To prove the lower bound, consider an arbitrary algorithm solving Consensus. Observe that when a processor wakes up spontaneously, the first bit it sends (if any) is a function of its input. Partition the set of ids into S_0, S_1, S_∅, according to the first bit sent by a processor when it wakes up spontaneously: 0, 1 or nothing. Assume for contradiction that the bit complexity of Consensus is less than 2. Then |S_0 ∪ S_1| ≤ 1. Indeed, otherwise we can give two input ids from S_0 ∪ S_1 to the processors, and in an execution where both processors wake up spontaneously, at least 2 bits are sent.
Therefore |S_∅| ≥ 3, and there are at least two even ids in S_∅ or at least two odd ids in S_∅. Assume w.l.o.g. that x_0, y_0 ∈ S_∅ are even, and x_1 ∈ S_∅ is odd. Then, in an execution where the inputs are (x_0, y_0), both have to decide 0, without waiting to receive any bits. Thus, in an execution where the inputs are (x_1, y_0), also both have to decide 0, without waiting to receive any bits. This implies that when the inputs are (x_1, y_1), where y_1 is another odd input, both have to decide 0, and Consensus is not solved in this case.
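The two-bit upper bound used in the proof above can be sketched directly: each processor sends the parity of its id and both decide on the OR of the two parities. The function name is hypothetical, and the exchange is modeled as a plain function call rather than as an asynchronous execution.

```python
def two_processor_consensus(x: int, y: int):
    """Return (decision of A, decision of B, number of bits sent)."""
    bit_from_a = x % 2                       # A sends the parity of its id
    bit_from_b = y % 2                       # B sends the parity of its id
    decision_a = bit_from_a | bit_from_b     # both decide on the OR of the two bits
    decision_b = bit_from_b | bit_from_a
    return decision_a, decision_b, 2
```

With two odd ids the OR is 1 and with two even ids it is 0, as the task requires; for mixed parities both processors still agree.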
4.2 Breaking symmetry with two processors
The following algorithm solves MaxF, and thus Leader, sending 2⌈log_2 M⌉ − 2 bits: the two processors exchange their identities without their least significant bit; the larger abridged identity wins, and if they are equal, the unsent bit which is equal to 1 wins. Below we prove that 2⌊log_2 M⌋ − 3 bits are needed to solve Leader. Consider an algorithm for a chain of two processors, A and B (the names A and B are not known to the processors). We now define a deterministic scheduler called synchronous, which we use in some of our lower bound proofs. Evidently, a lower bound on the complexity of executions defined by any fixed scheduler is also a lower bound on the complexity of the distributed algorithm.
Synchronous scheduler:
• both processors wake up at the same time, say 0;
• at each integral time, every processor receives the first undelivered bit sent to it, if any, processes it, and sends messages according to its algorithm.
Note that the time is unknown to the processors; using time is just a convenient formal device to specify an execution.
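The MaxF algorithm described at the start of this subsection can be sketched as below. The sketch makes one assumption that the text does not spell out: ids in {1, ..., M} are encoded as id − 1 in ⌈log_2 M⌉ bits, most significant bit first. The function names are hypothetical.

```python
def abridged(ident: int, M: int):
    """Return (bits of the encoded id without its least significant bit, that bit)."""
    width = max((M - 1).bit_length(), 1)   # ceil(log2 M) bits encode ident - 1
    value = ident - 1                      # assumed encoding: ids 1..M map to 0..M-1
    high = [(value >> k) & 1 for k in range(width - 1, 0, -1)]
    return high, value & 1


def maxf_two_processors(x: int, y: int, M: int):
    """Return (output of A, output of B, total bits sent) for distinct ids x, y."""
    high_a, low_a = abridged(x, M)
    high_b, low_b = abridged(y, M)
    bits_sent = len(high_a) + len(high_b)  # each side sends its abridged identity

    def decide(own_high, own_low, other_high):
        if own_high != other_high:         # the larger abridged identity wins
            return 1 if own_high > other_high else 0
        return own_low                     # tie: the unsent bit equal to 1 wins

    return (decide(high_a, low_a, high_b),
            decide(high_b, low_b, high_a),
            bits_sent)
```

Because the ids are distinct, equal abridged identities imply different unsent bits, so exactly one processor outputs 1, and under the assumed encoding that processor holds the larger id.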
Our first lower bound technique shows that there exists a long synchronous execution (see Fig. 1). Let us represent the history, h, of a synchronous execution by a ternary sequence a_1 b_1 a_2 b_2 ... a_k b_k, where if a_i, b_i ∈ {0, 1} then they indicate the bits received by A and B, resp., at the moment i, and if a_i or b_i is equal to ⊥, no bit was received. In fact, we will construct a long symmetric synchronous execution, with a_i = b_i ≠ ⊥. If h corresponds to an execution with inputs (x, y), we denote it as h(x, y), and we let inputs(h) denote the set of input pairs (x, y) whose synchronous execution has history h.
Proof: Consider any Leader algorithm for a chain of two processors, A and B. Assume first that the last bit sent by each processor is its decision. Any valid algorithm can be extended, by sending two bits after decisions, to fit this. Hence, the lower bound under the assumption may be off by these two bits as compared with the general bound. We will show that there exists a long synchronous symmetric execution. Clearly, a Leader algorithm with the above property cannot terminate while the execution is symmetric.
We construct a symmetric history h = a_1 a_1 a_2 a_2 ... a_k a_k, a_i ∈ {0, 1}, inputs(h) ≠ ∅, as follows. At each step i of the construction we define a prefix of h called h_i. Initially, h_0 is empty and inputs(h_0) = {(x, y) | x, y ∈ Z_M, x ≠ y}. We will maintain the invariant that in any possible last configuration for h_i, i ≥ 1, there will be bits in transit in both directions of the link. Moreover, the first undelivered bit will be the same in both directions. We will extend h_i into h_{i+1} by delivering the same bit a_{i+1} from each processor to the other.
Partition the set S^0 of ids into S^1_0, S^1_1, S^1_∅, according to what the first bit sent by a processor is when it wakes up spontaneously: 0, 1 or nothing. Notice that there is at most one input in S^1_∅, because if there are two (distinct) such inputs, we give one to A and the other to B, resulting in an execution where no bits are sent, contradicting the assumption that the last bit sent is the decision.
Let S^1 be the largest of the sets S^1_0, S^1_1, and let a_1 be the corresponding bit sent. If |S^1| ≥ 2 then we fix h_1 = a_1 a_1; in h_1 we deliver the same bit a_1 to each of the processors. Thus, inputs(h_1) = {(x, y) | x, y ∈ S^1, x ≠ y}, i.e., if A and B get (distinct) inputs from S^1 then the corresponding execution prefix has history h_1. (Remark: in any execution which starts with inputs from S^1, the same first bit is sent, but it is possible that some additional bits are sent too.)
Notice that the only knowledge that each processor has at the end of h_1 is that the input of the other processor belongs to the same subset S^1 as its own input. Hence, given that a history begins with h_1, the second bit sent by A or B (if any) depends only on its own input. As before, there are three subsets of S^1, denoted by S^2_0, S^2_1, S^2_∅, according to the second bit sent (this bit could be sent either on wake up or after receiving the first bit a_1). Once more, |S^2_∅| ≤ 1. Let S^2 be the largest of the sets S^2_0, S^2_1, and let the corresponding bit sent be a_2. Then h_2 = a_1 a_1 a_2 a_2. We continue in this way while the subsets S^k contain at least two inputs, so that inputs(h_k) ≠ ∅.
Since |S^i_∅| ≤ 1 and S^i is the larger of the two remaining parts of S^{i-1}, we have |S^i| ≥ (|S^{i-1}| − 1)/2. Thus, the maximal k for which |S^k| ≥ 2 is at least ⌊log_2 M⌋ − 1. Since for each i = 1, ..., k two identical bits are delivered to the processors, and the last bits sent by the processors must be distinct (assuming the last bit sent by a processor is its decision), at least one more bit must be delivered. We get the lower bound 2⌊log_2 M⌋ − 1, and the claim follows.
For every input (x, y) there is a unique synchronous (final) execution and hence a unique (final) history. Let us denote by f(x, y) the value decided by A in the synchronous execution of the algorithm; notice that f is a function. Note that, assuming that the last bit sent by a processor is its decision, A decides the same value in all executions with history h. Thus, f assigns a binary value f(h) to each history h, which is the value decided by A on every execution starting with inputs(h). That is, the function f has the same value on the semi-rectangle inputs(h) of Lemma 4.5.
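The counting behind the lower bound can be checked numerically. The sketch below assumes the worst case allowed by the construction, in which each step loses one id to the "nothing sent" class and keeps only the larger of the two remaining classes, i.e. ⌈(s − 1)/2⌉ = ⌊s/2⌋ ids out of s. The function names are hypothetical.

```python
def guaranteed_symmetric_rounds(M: int) -> int:
    """Largest k such that a set of at least two ids survives k halving steps."""
    size, rounds = M, 0
    while size // 2 >= 2:      # worst case: the surviving set has floor(size / 2) ids
        size //= 2
        rounds += 1
    return rounds


def leader_lower_bound(M: int) -> int:
    """Bits forced by the argument: 2k + 1 under the assumption, minus the 2 extra bits."""
    k = guaranteed_symmetric_rounds(M)
    under_assumption = 2 * k + 1   # k symmetric rounds of two bits, plus one more bit
    return under_assumption - 2    # remove the two decision bits added by the assumption
```

For every M ≥ 4 the first function returns ⌊log_2 M⌋ − 1 and the second returns 2⌊log_2 M⌋ − 3, matching the bound stated for Leader.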
4.3 Two processor communication complexity approach
As observed in the introduction, one difficulty in applying communication complexity lower bound techniques like those of [20] to our model is that, unlike the model of [20], our model does not guarantee that there is a unique execution or even a unique output for each given input vector, independent of the specific message delays. We obtain such a functional dependency, which maps each input vector to a unique execution of the algorithm (and, hence, to a unique output vector), by considering a synchronous scheduler.
4.3.1 Anonymous schedulers and rectangles
We say that a deterministic scheduler S is anonymous if it is independent of the processor names. That is, if α is the execution produced by S when starting with input (x, y) (i.e., input x for A and input y for B), then ᾱ should be the execution starting from (y, x), where ᾱ is obtained from α by permuting the names A and B everywhere. Notice that a synchronous scheduler is anonymous. Since an anonymous scheduler is deterministic, it defines a function f that maps each input vector to an output vector, and f is a restriction of T, the task solved by the distributed algorithm.
Let S be an anonymous scheduler. For every input vector (x, y), there is a unique execution under S, and hence a unique corresponding output vector (x', y'). Moreover, if (x', y') is the output vector for (x, y), then (y', x') is the output vector for (y, x).
As an application, we give a different proof of the Ω(log M) lower bound on the bit complexity of breaking symmetry stated in Theorem 4.2. The bound is less tight, but the technique used is more general, in the sense that it applies to any algorithm which under some scheduler computes any function with large D_f.
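For Leader, the swap property above means that the decision of A on (x, y) is the complement of its decision on (y, x). The sketch below checks this anti-symmetry for one illustrative decision function, the one in which A wins exactly when its id is larger; this function and both names are assumptions made for the example, not the function induced by any particular algorithm in the paper.

```python
def decision_of_a(x: int, y: int) -> int:
    """Hypothetical decision function: A outputs 1 iff its id is the larger one."""
    return 1 if x > y else 0


def is_anti_symmetric(f, M: int) -> bool:
    """Check f(x, y) == 1 - f(y, x) for all distinct x, y in {1, ..., M}."""
    return all(
        f(x, y) == 1 - f(y, x)
        for x in range(1, M + 1)
        for y in range(1, M + 1)
        if x != y
    )
```

A constant function fails the check, which reflects the fact that a symmetric outcome cannot elect a leader.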
Consider the executions of a Leader algorithm under the synchronous scheduler, and let f be, as before, the function defined by the decision of processor A. By Proposition 4.3, the function f is anti-symmetric: f(x, y) = 1 − f(y, x), for all inputs x, y. Following Theorem 4.7, it suffices to show that for an arbitrary anti-symmetric function f, D_f = Ω(M). We use a variant of the rank method (see e.g. [13], p. 13). To this end, consider a partition of D into f-monochromatic semi-rectangles. Replace each semi-rectangle by the corresponding rectangle (i.e., include the diagonal elements). Let C be the resulting covering of D by rectangles. Then the rectangles in C partition all non-diagonal entries, and, in addition, cover also k diagonal entries, for some 0 ≤ k ≤ M.
Case 1: k = M (C covers all the elements in D = {d_ij}). No rectangle can contain two distinct diagonal elements d_ii, d_jj, since f(d_ij) ≠ f(d_ji). Thus, in this case D_f ≥ M.
Case 2: k = 0 (C does not cover any diagonal entry). For each rectangle R in C let A_R be the matrix in which the entries of R are 1 and the other entries are 0. Then Σ_R A_R = J − I, where J is the all-ones matrix and I is the identity matrix. Since each A_R has rank 1 and J − I has rank M, C contains at least M rectangles.
Below we show the lower bound Ω(log(M/n)) for this case. Note that, together with the above bound Ω(n), it implies BitC(Leader) = Ω(n + log M), as required.
Let us call a super-input the concatenation of (n+1)/2 ids in the chain from a terminal processor inward; thus, each instance of Leader has two super-inputs, from the sides of the two terminals.
Let us restrict the problem to the super-inputs 1 2 ... ((n+1)/2), ((n+1)/2+1) ((n+1)/2+2) ... (n+1), ...; there are ⌊2M/(n+1)⌋ such super-inputs. The idea is, given an algorithm A, to replace each half-chain by a single superprocessor which simulates in its local memory the computation of A in this half-chain.
Thus we obtain a two-processor Leader algorithm, over the middle edge, with ⌊2M/(n+1)⌋ possible (distinct) inputs. By Theorem 4.7, its bit complexity, and thus the bit complexity of A, is Ω(log(M/n)), as required. An implementation of this idea can be done similarly to the proof of Lemma 7.1.
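The restricted family of super-inputs can be enumerated explicitly. The sketch below assumes n is odd, so that each half-chain holds (n+1)/2 processors, and cuts {1, ..., M} into consecutive blocks of that size; the function name is hypothetical.

```python
def super_inputs(M: int, n: int):
    """Consecutive blocks of (n + 1) / 2 ids taken from {1, ..., M}, for odd n."""
    block = (n + 1) // 2
    return [
        tuple(range(start, start + block))
        for start in range(1, M - block + 2, block)   # keep only complete blocks
    ]
```

The number of blocks is ⌊M / ((n+1)/2)⌋ = ⌊2M/(n+1)⌋, and any two distinct blocks are disjoint, so two super-inputs given to the two half-chains never repeat an id.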
6 Canonical Algorithms in Odd Length Chains
In the previous section the easy cases for a chain were identified. In this section and the following one we consider the remaining case: the length of the chain is odd and only the two terminals have inputs. This is analogous to the linear array topology studied in the context of communication complexity [19, 5, 11]. In this section we develop our main tool for proving bit complexity lower bounds in this situation, and apply it to leader election in the next section.
A message sent to an outside (inside) neighbor is sent outwards (inwards, resp.). We prove our lower bound under the assumptions that each one of the middle processors knows which of its two ports is incident to the middle edge e, and each processor knows which of its two ports is directed outwards.
Clearly, such a lower bound is valid also for algorithms which do not use these assumptions.
To ease the discussion, we also assume in this section that it is possible to send and receive in one atomic step messages of arbitrary length. This can be simulated in our model where a message contains only one bit by sending an extra symbol at the end of each message, as we will show later.
6.1 Equivalent executions
We proceed to define an equivalence relation on the set of executions of a distributed algorithm, which holds for executions which are indistinguishable by the processors. This is done in terms of the usual partial order on events [14], adapted to our model. The partial order is defined on some events of an execution α, denoted relevant(α).
We will consider two types of events in relevant(α): reading and sending. Note that we distinguish between the event of receiving a message and the event of reading the message. This will enable us to simulate that a message arrived later than its actual arrival time, and it is easily achieved with an auxiliary variable. Once a processor p_i receives a message M from p_j it moves it to the auxiliary variable. Later it reads M from this variable, applies its transition function, and empties the variable. We call this procedure a read event.
For an event φ_i, φ_i ∈ relevant(α) if φ_i includes a read event, or if it is a send event, i.e., in φ_i a message is sent (an event can be both a send and a receive event).
Now, the causal order of an execution α, (relevant(α), →), is the transitive closure of the following order. If φ_1, φ_2 are two events of relevant(α), then φ_1 → φ_2 if (i) both events are of the same processor and φ_1 occurs before φ_2, or (ii) in φ_2 a message is read which was sent in φ_1. Notice that in one receive event a processor can read several messages accumulated in its buffer. Two executions α and β are equivalent if they have the same initial configuration, and there is a bijection from the relevant events of α to those of β, which maps each event in α to an identical event in β, and preserves the partial order of events. A causal prefix of an execution α is a subset of the events in α which is closed under causal precedence.
The causal history of a link is the restriction of the causal order of the execution to the events on the link. Notice that if two executions are equivalent; then they have the same causal histories on all links, and every processor passes through the same sequence of local states in both executions. In particular, every processor decides the same value in both executions.
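As an illustration, the causal order and the notion of a causal prefix can be computed mechanically for a small execution. The following Python sketch is ours and is not part of the construction. The event representation (pairs of an event name and a processor, listed in order of occurrence) and the map `reads_from` (from a reading event to the sending events whose messages it reads) are assumptions made only for this example.

```python
from itertools import product

def causal_order(events, reads_from):
    """Return the causal order as a set of pairs (a, b), meaning a precedes b.

    events: list of (event_name, processor) in order of occurrence (hypothetical format).
    reads_from: dict mapping a reading event to the send events it reads from.
    """
    names = [name for name, _ in events]
    order = set()
    # (i) two events of the same processor, ordered by occurrence
    for i, (a, pa) in enumerate(events):
        for b, pb in events[i + 1:]:
            if pa == pb:
                order.add((a, b))
    # (ii) a message read in b was sent in a
    for b, senders in reads_from.items():
        for a in senders:
            order.add((a, b))
    # transitive closure (Warshall): the intermediate event k is the outermost loop
    for k, i, j in product(names, names, names):
        if (i, k) in order and (k, j) in order:
            order.add((i, j))
    return order

def is_causal_prefix(subset, order):
    """A subset of events is a causal prefix if it is closed under causal precedence."""
    return all(a in subset for (a, b) in order if b in subset)

# Tiny example: p sends in s1, q reads it and replies in r1, p reads the reply in r2.
example_events = [("s1", "p"), ("r1", "q"), ("r2", "p")]
example_reads = {"r1": {"s1"}, "r2": {"r1"}}
example_order = causal_order(example_events, example_reads)
```

In this example {s1, r1} is a causal prefix, while {r1} alone is not, since r1 reads a message sent in s1.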
6.2 The outside-precedence scheduler
To facilitate the definition of the scheduler, we append (logically) to some messages (possibly empty) a $ symbol according to the next rule. Any time that a processor receives a message, and it does not want to send anything outwards, then it sends the token $ inwards. This token is appended to the last inwards message it sends, or it is sent alone if there are no such messages. In Section 6.3 we will actually send the symbol $, but in this section, it is just a formal device to identify certain events in an execution. We use another such formal device: assign to any message sent over the middle edge ē its sequential number among such messages sent over ē in the same direction. The rules for the outside-precedence scheduler are described next. (The description is in terms of time for notational convenience.)
Outside-precedence scheduler:
1. The terminal processors wake up simultaneously, and no other processor wakes up spontaneously.
2. Messages sent outwards are delivered in one time unit.
3. Messages not ending with $ which are sent inwards, are delayed and delivered together with the first subsequent message ending with $ sent in the same direction over the same edge.
4. Messages ending with $ sent inwards over an edge other than the middle edge ē are delivered in one time unit.
5. A message ending with $ which is sent over ē before the message with the same sequential number sent over ē in the opposite direction, is delayed and delivered simultaneously with that latter message.
Our two tokens algorithm Ā, presented next, is intended to simulate executions of A under the outside-precedence scheduler.
Note that in case A enters a deadlock, the middle processors in Ā may exchange $ symbols over the middle edge forever. We will show later that this is not the case when A is guaranteed to terminate.
6.3 The two tokens algorithm
The conversion of an algorithm A to an equivalent two tokens algorithm Ā is done in two steps.
Step 1: Using $ symbols. First we convert A to an algorithm A' which acts exactly as A does, but additionally sends the $ symbols as described above. An efficient implementation of such a codification using two symbols is described in Lemma 6.3.
Step 2: The two tokens algorithm.
We now show how to transform algorithm A' to obtain the algorithm Ā that simulates the outside-precedence scheduler.
The transformation has two rules.
1. A processor receiving a message from outside reads it only when the next $ arrives. That is, when the processor receives a message, it stores it in a queue, and when it receives a $, it processes all the messages stored in the queue, and empties the queue.
2. A middle processor counts the messages ending with $ that it sends and receives over the middle edge ē. It reads the queue of messages that arrived on ē only after it receives a message ending with $ and after it sends the message on ē ending with $ with the same sequential number.
Note that rule 1 and rule 2 separate the event of receiving a message from that of reading it. Recall that in defining the causal order of the events in an execution (Section 6.1), the relevant events are read and send. Under this definition, two executions with the same causal order are indistinguishable, in the sense that the underlying executions of A are indistinguishable.
It will be useful in the following discussion to assume an additional rule for the two tokens algorithm:
3. Every time a processor wants to send a message inwards, it stores it in a queue. If the message ends with $, it sends all the messages in the queue as a single message, and empties the queue.
This rule does not really change the behavior of the algorithm, but it eases the analysis of the algorithm.
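Rules 1 and 3 share one mechanism: messages are held in a queue and released together once a message ending with $ appears. A minimal Python sketch of this buffering follows. The class name `DollarBuffer` and the representation of messages as strings over {0, 1, $} are assumptions of the example, not part of the algorithm's definition.

```python
class DollarBuffer:
    """Holds messages until one ending with '$' arrives, then releases all of them."""

    def __init__(self):
        self.queue = []

    def push(self, message):
        """Store a message; return the released batch (empty unless message ends with '$')."""
        self.queue.append(message)
        if message.endswith("$"):
            released, self.queue = self.queue, []
            return released
        return []

# Rule 1 (receiver side): queued messages are read only when a '$' arrives.
inbox = DollarBuffer()
first = inbox.push("01")     # nothing is read yet
second = inbox.push("1$")    # both messages are read now

# Rule 3 (sender side): the released batch is sent as a single message.
outbox = DollarBuffer()
outbox.push("0")
single_message = "".join(outbox.push("11$"))
```

Under rule 3 every message actually sent inwards ends with $, which matches the assumption of Lemma 6.1 that at most one message is sent over a link in each atomic step.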
Properties of the two tokens algorithm
For the next lemma we need some definitions. Consider an outside-precedence execution of A, and let t be an integral time (note that the events occur only at integral times). We say that a processor p is quiescent at time t if at time t − 1/2 there were no pending messages destined to p, and p is not a terminal processor in its initial state. p is active at time t if t = 0 and it is a terminal processor in its initial state, or if it receives a message at time t, or if it is a middle processor which sent a message over the middle link which is yet undelivered at time t. The following lemma states the main properties of outside-precedence executions of Ā. The proof is a straightforward induction, and is postponed to the full version.
Lemma 6.1 For every outside-precedence execution of Ā, the following holds at each integral time t:
1. There are no undelivered messages on edges, except possibly on the middle edge ē.
2. There is at most one active processor at each half-chain, and all the processors outside the active processor are quiescent.
3. There is at most one undelivered message ending with $ on ē, whose sequential number is larger by one than the number of messages sent in the opposite direction over ē. If there is no undelivered message on ē, then the same number of messages were sent in both directions over ē.
We note that the above lemma requires that a processor sends at most one message over a given link in each atomic step. This can be implemented in a model where a message consists of only a single bit by requiring that processors append $ symbols also to messages sent outwards.
Lemma 6.2 If algorithm A never enters a deadlock, then in each outside-precedence execution of A', out of two messages with the same sequential number sent over ē, at most one contains only a $ symbol.
Proof:
If two such messages carry only a $ symbol, then in the corresponding execution of A there are no more undelivered messages, but the two middle processors have not terminated, which is a deadlock in A. □
Proof:
We use a simple encoding to implement A', which uses the 3 letters {0, 1, $}, so that it uses only the two letters {0, 1}. We encode the three letters with the uniquely decipherable code in which $ is encoded by 1, and 0 and 1 are encoded by 00 and 01. Thus, a message b_1 b_2 ... b_k $ of A' is encoded as 0b_1 0b_2 ... 0b_k 1. We have that at most 3 bits are sent in A' per each bit of A. The claim follows since each $ sent on an edge other than the middle edge ē, except the first one, corresponds to a message (with no $) sent previously outwards, and by Lemma 6.2 at least one bit is sent for each two $ symbols on the middle link. □
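The encoding used in the proof ($ as 1, 0 as 00, 1 as 01) is easy to check by direct computation. The Python sketch below is only an illustration. The function names `encode` and `decode` and the representation of messages as strings over {0, 1, $} are our assumptions.

```python
CODE = {"0": "00", "1": "01", "$": "1"}

def encode(message):
    """Encode a string over {'0', '1', '$'} as a string over {'0', '1'}."""
    return "".join(CODE[symbol] for symbol in message)

def decode(bits):
    """Invert encode: a leading '1' is a '$', a leading '0' announces one data bit."""
    out = []
    i = 0
    while i < len(bits):
        if bits[i] == "1":
            out.append("$")
            i += 1
        else:
            if i + 1 >= len(bits):
                raise ValueError("truncated code word")
            out.append(bits[i + 1])
            i += 2
    return "".join(out)

# A message b_1 ... b_k $ becomes 0 b_1 ... 0 b_k 1, that is, 2k + 1 bits.
encoded = encode("01$")
```

The code is uniquely decipherable because the first bit of each code word determines its length, so a message of k bits followed by $ costs exactly 2k + 1 bits.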
Consider an execution α* of Ā, defined by the outside-precedence scheduler. Let a "crossing" event in α* be an event in which a middle processor sends a message over the middle edge ē. Split the events of α* into blocks, where an event e of a processor at a half-chain C belongs to the i-th block if its causal order is between the (i − 1)-st and the i-th crossing events of the middle processor in C. Then, using Lemma 6.1 and rule 2 of the two tokens algorithm, a proof by induction shows the following on the causal order of the events.
Lemma 6.4 Any outside-precedence execution of A satisfies the following properties.
• Events in block i precede the events in block i + 1.
We have the main result of this section:
Theorem 6.5 For any algorithm A solving a task T with bit complexity C, there exists an algorithm Ā solving T, such that for any inputs, any execution α of Ā is equivalent to the outside-precedence execution α* of A on the same inputs. Moreover, BitC(Ā) < 4C + n.
Sketch of proof:
The second part of the theorem follows from Lemma 6.3. For the first part, let α_i be the prefix of the first i events in α. It suffices to prove that for each i, α_i is equivalent to a causal prefix of α*. The base is by the fact that in both executions, there are exactly two minimal events: the spontaneous wake up of the terminal processors (and these processors have the same inputs in both algorithms).
Assume now that the claim holds for i, and let α*_i be the causal prefix of α* which is equivalent to α_i. By induction, for each enabled event in α_i there is an identical enabled event in α*_i, and by the properties of the causal order of α*, in α*_i there is at most one enabled event in each half-chain. Thus, the (i + 1)-st event in α, say e, is identical to the (i + 1)-st event of a causal prefix α*_{i+1} of α*. Let this latter event be e*. It remains to show that the causal order of e in α_{i+1} is the same as that of e* in α*_{i+1}. The event e can be either an event in the i-th block of some half-chain, or a crossing event. In the first case, e has the same causal order in α_{i+1} as e* in α*_{i+1} by a straightforward induction. In the second case, this follows by induction using rule 2 of algorithm Ā.
In this section we use the canonical form construction of the previous section in order to prove that the complexity of these tasks is Ω(n log M). We use a variant of Dietzfelbinger's [5] proof strategy, which provides a general lower bound on the bit complexity of computing a function in a chain (in the communication complexity model).
Consider an arbitrary Leader algorithm, A, for an odd length chain, and let Ā be the corresponding canonical form algorithm as in Theorem 6.5. The history of an edge in an execution is the sequence of messages sent on it, ordered according to their causal order; this order is total on edges other than the middle edge ē.
Lemma 7.1 The number of distinct histories of Ā on ē, over all pairs of inputs, is at least M/2.
Proof:
By replacing each half-chain by a single superprocessor which simulates in its local memory the computation of A in this half-chain, we get a two-processors Leader algorithm, B, over the middle edge ē. This transformation is possible because of two reasons. First, the chain is of odd length, and hence the network is split into two symmetric parts, preserving the requirement that the two processors can be distinguished only by their identities. Second, each of the two processors can simulate its own half-chain and communicate with the other processor so that together they simulate an outside-precedence execution of Ā. Thus, in algorithm B, for each input x, y, the messages exchanged on ē are the same as in the (unique) outside-precedence execution of Ā on this input. The result follows by Lemma 4.8, because these are synchronous executions. □
Let e_1, e_2 be edges belonging to different half-chains. If in an execution their respective histories are h_1 and h_2, the bracket history of (e_1, e_2) is the concatenation h_1 h_2 of their histories. The following analogue of the Bracket Lemma [5, Lemma 2.5] holds:
Lemma 7.2 If two executions of Ā over the same input x, y have distinct histories on ē, then the bracket histories of (e_1, e_2) in these executions are also distinct.
Proof:
Assume for contradiction that the execution starting with x, y and the execution starting with x', y' have different histories h, h' on ē, but both executions have the same bracket history h_1 h_2 on e_1 and e_2. Then for input x, y' we get the same execution as with inputs x', y', by a standard cut and paste argument, and hence history h' on ē. Repeating, we get the same execution for x, y, and hence again history h' on ē. Thus we get two different executions for x, y, one where ē has history h and the other where it has history h', contradicting Theorem 6.5. □
Lemma 7.3 BitC(Ā) > (n/2) log_2 M.
Proof: As a consequence of Lemma 7.1, there are c, c ≥ M/2, different inputs that lead to c different histories on the middle edge ē. Let the inputs be (x_1, y_1), ..., (x_c, y_c), and the histories h(x_1, y_1), ..., h(x_c, y_c). By Lemma 7.2, these c inputs lead to different bracket histories at any pair of symmetric links (i.e., links at the same distance k from the terminals). Let the bracket history at symmetric distance k be h_k h_{-k}(x_i, y_i). The number of bits sent for each input x_i, y_i is the sum of the number of bits in the histories:
\[ |h(x_i, y_i)| + \sum_{k=1}^{(n-2)/2} |h_k h_{-k}(x_i, y_i)| . \]
Hence, the average of lengths of such histories gives a lower bound on BitC(Ā):
\[ \mathrm{BitC}(\bar{A}) \ge \frac{1}{c} \sum_{i=1}^{c} |h(x_i, y_i)| + \sum_{k=1}^{(n-2)/2} \frac{1}{c} \sum_{i=1}^{c} |h_k h_{-k}(x_i, y_i)| . \]
Each of the averages in this expression is at least log_2(M/2) − 1 (each such history is a sequence of bits, and there are at least M/2 of them). Thus BitC(Ā) > log_2(M/2) − 1 + ((n − 2)/2)(log_2(M/2) − 1), and the claim follows. □
Now, it remains to convert this bound for canonical algorithms Ā to a lower bound for an arbitrary leader election algorithm. This follows from Theorem 6.5, which says that every such algorithm can be converted to a canonical form algorithm with only a constant factor increase in the bit complexity, and from the fact that the total length of the edge histories is greater by at most a constant factor than the number of bits sent in the corresponding executions. This implies the next theorem.
The constructions in Section 5 can be used to show that, for any chain topology, there are Leader algorithms which, under certain "good" schedulers, are guaranteed to have small bit complexity. In this section we show that this is not the case for MaxF. Specifically, we show that for any algorithm A for MaxF in a chain, and for any deterministic scheduler, there is an execution of A under this scheduler in which Ω(n log M) bits are sent. For this we assume that only the two end processors have inputs, and prove a more general result, which holds for a distributed computation of any function f(x, y) in this model. This result holds even if the processors have distinct identities, A and B, which are mutually known.
To ease the analysis, assume that in each execution of A, at most one bit is received at any given moment. Then the history of an edge e can be represented by a sequence D_1 b_1 D_2 b_2 ... D_t b_t, where D_i is L(eft) or R(ight) and b_i is a bit, with the obvious meaning: b_i is the i-th bit delivered on e, and D_i is the direction in which b_i was delivered. Assume now that A is an algorithm which computes some function f(x, y) of the inputs x, y of the end processors, and that the last bit sent by each processor is its decision value. Let inputs(h) = {(x, y) | there exists an execution of A under which the history of e on input (x, y) is h}. Working with functions (instead of tasks) allows us to strengthen Lemma 4.5 as follows.
Lemma 7.5 Let h be a history on an edge e. Then inputs(h) is an f-semi-monochromatic rectangle.
Proof: Notice that any execution with the same history has the same outputs because the last bit sent is the decision. By Proposition 4.4, it suffices to show that the following holds: if there is an execution of A on input (x_1, y_1) for which the history of e is h, and there is an execution of A on input (x_2, y_2), x_1 ≠ y_2, under which the history of e is also h, then there exists also an execution of A on input (x_1, y_2) under which the history of e is h.
The construction of the third execution is done by a standard "cut and paste" technique, that simulates the execution which leads to history h on input (x_1, y_1) on the left side of e, and the execution which leads to history h on input (x_2, y_2) on the right side of e. □
A subset I of D = Z_M × Z_M is a fooling set ([13]) if for every pair of entries (x_1, y_1), (x_2, y_2) in I, there is no f-semi-monochromatic rectangle containing both (x_1, y_1) and (x_2, y_2). Let I_f denote the maximal cardinality of a fooling set.
Lemma 7.6 For any scheduler S, no two pairs (x_1, y_1), (x_2, y_2) in a fooling set I produce the same history on an edge e under executions of A using S.
Proof: By our assumptions, if there are two computations of A, one on input (x_1, y_1) and the other on input (x_2, y_2), in which e has the same history, then, by Lemma 7.5, (x_1, y_1) and (x_2, y_2) are both included in some f-semi-monochromatic rectangle, contradicting the definition of I. □
In view of Lemma 7.6, in the rest of this section we concentrate on computations of A under scheduler S restricted to the set of inputs in some fooling set, I.
Corollary 7.7 Under the assumptions of Lemma 7.6, the average length of a history produced by the execution of A under scheduler S on edge e, taken over all distinct input pairs (x, y) in the fooling set I, is at least ⌊log |I|⌋ − 1.
Proof: By Lemma 7.6 and the fact that the average length of ℓ distinct binary sequences is at least ⌊log ℓ⌋ − 1. □
Since the number of bits sent in a computation is half the length of the history of that computation, Corollary 7.7 implies that the average, taken over all distinct input pairs (x, y) in I, of the number of bits sent over an edge e during an execution of A under scheduler S is at least (⌊log |I|⌋ − 1)/2. Summing over all edges, we get that the average of the total number of bits sent during an execution of A under scheduler S is at least n times that number. This gives the desired result:
Theorem 7.8 For any scheduler S, there exists an execution of A under S in which at least n(⌊log I_f⌋ − 1)/2 bits are sent.
As a consequence of this theorem we get a lower bound for MaxF. Associate with MaxF a scalar function f which has value 1 if processor A has the greater id, and value 0 otherwise. It is easy to check that the set I of all the diagonal elements of D is a fooling set for f. Since |I| = M, we get BitC(MaxF) = Ω(n log M) for any scheduler.
Remark: For a function f, let D_f and I_f be the cardinalities of the minimum partition-into-rectangles and of the maximum fooling set of f, respectively. The above lemma implies that if D_f = I_f, then the best scheduler complexity of f is the same as its (worst case) bit complexity. It is known, however, that D_f is in some cases much larger than I_f [6]. It is therefore an interesting problem whether the best scheduler complexity equals the worst case complexity for all functions f.
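The claim that the diagonal of D is a fooling set for the function f associated with MaxF can be checked on small instances. The Python sketch below is an illustration under a simplifying assumption: it tests the standard (fully monochromatic) fooling-set condition of communication complexity, not the f-semi-monochromatic variant of Section 4, whose definition is not repeated here. The names `is_fooling_set` and `greater` are ours.

```python
def greater(x, y):
    """The scalar function associated with MaxF: 1 if the first input is greater, else 0."""
    return 1 if x > y else 0

def is_fooling_set(pairs, f):
    """Check that no two entries span a monochromatic 2x2 rectangle.

    Two entries (x1, y1), (x2, y2) lie in a common monochromatic rectangle exactly
    when f takes one value on (x1, y1), (x2, y2), (x1, y2) and (x2, y1).
    """
    pairs = list(pairs)
    for i, (x1, y1) in enumerate(pairs):
        for x2, y2 in pairs[i + 1:]:
            value = f(x1, y1)
            if f(x2, y2) == value and f(x1, y2) == value and f(x2, y1) == value:
                return False
    return True

M = 16  # small example size, chosen arbitrarily
diagonal = [(x, x) for x in range(1, M + 1)]
diagonal_is_fooling = is_fooling_set(diagonal, greater)
```

For two diagonal entries (x_1, x_1) and (x_2, x_2) with x_1 < x_2, f is 0 on both entries but 1 on (x_2, x_1), so no monochromatic rectangle contains both, and the diagonal has M entries.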
Sketch of proof:
The proof is very similar to the one of Theorem 4.2. The idea is the following. Consider an arbitrary algorithm for Leader in a ring, where the last bit sent by a processor is its decision. For a lower bound, we can assume that the ring is oriented: local numbering of links at all processors is in the same direction.
When a processor wakes up, at time 0, there are 5 possibilities: to send 0 or 1, clockwise or counter-clockwise, and to send ⊥ (nothing). We choose the largest one out of the five corresponding input subsets of Z_M, say, S_1. Provided its size is at least n, we give inputs from S_1 to all processors. The size of the subset corresponding to ⊥ must be less than n, since otherwise we can give inputs from this subset to all the processors, and then the algorithm terminates before processors sent their decision bits, contradicting the assumption above. Then, at time 1, each processor receives the same bit from the same side. We choose a subset S_2 in a similar way. We continue with such a construction as long as |S_t| ≥ n. Thus, |S_t| ≥ ⌈(|S_{t−1}| − n + 1)/4⌉, |S_0| = M, and we get that Ω(n log(M/n)) bits are sent. Combining this with the known Ω(n log n) lower bound (implied by the message complexity result of [4]) we get the required bound: Ω(n log(M/n)) + Ω(n log n) = Ω(n(log(M/n) + log n)) = Ω(n log M).
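To see how the recurrence on the subset sizes yields roughly log_4(M/n) rounds, one can iterate it numerically. The Python sketch below is ours. It takes the lower bound as if it held with equality, which is the worst case for the argument, and it assumes (as in the sketch of proof) that every one of the n processors sends one bit in each round for which the chosen subset still has at least n elements.

```python
def guaranteed_rounds(M, n):
    """Count the steps t >= 1 for which the lower bound on |S_t| is still at least n."""
    size = M      # |S_0| = M
    rounds = 0
    while True:
        # ceil((size - n + 1) / 4), computed with integer arithmetic
        next_size = -(-(size - n + 1) // 4)
        if next_size < n:
            return rounds
        size = next_size
        rounds += 1

def guaranteed_bits(M, n):
    """Each counted round contributes one bit per processor, that is, n bits."""
    return n * guaranteed_rounds(M, n)

# Example with n = 8 and M = 8 * 4**6: the bounds are 8191, 2046, 510, 126, 30, then 6 < 8.
example_rounds = guaranteed_rounds(8 * 4**6, 8)
```

In this example five rounds are guaranteed, in line with log_4(M/n) = 6 up to an additive constant.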
