Abstract. The computational capabilities of a system of indistinguishable processors arranged on a ring in the synchronous and asynchronous model of distributed computation are analyzed. A precise characterization of the functions that can be computed in such a setting is given. It is shown that any function can be computed in $O(n^2)$ messages in the asynchronous model, and $\Omega(n^2)$ lower bounds are given for elementary functions such as AND, SUM and orientation. In the synchronous model any function can be computed in $O(n \log n)$ messages. A ring can be oriented and synchronized within the same bounds. These bounds are tight.
1. Introduction.
The simple ring topology can exhibit many important facets of distributed computations. In a general distributed computation, each processor starts with some initial value; a function is computed that depends on all processors' values. This requires a certain amount of coordination and synchronization between the processors. One way to achieve coordination is to elect a leader of the ring that will centralize the computation; once a leader is elected it can collect all input values in a linear number of messages, and then compute every function. Angluin [A] has shown that if processors are indistinguishable, there exists no algorithm to elect a leader in a ring. Hence every leader election algorithm must assume that processors have unique labels; usually the algorithm determines the leader to be the one with extremal (minimal or maximal) identity number. However, the task of assigning unique identities to processors presupposes some form of preliminary coordination between the processors. In this paper we attack the computational problem directly: Which functions can be computed on a ring when processors do not have distinct id's (anonymous ring)? We want to find algorithms to compute these functions, and to explore the minimal cost of such algorithms.
A problem specific to the ring is the problem of orientation. We assume that a processor can tell its left from its right; however, these notions are not necessarily consistent among processors. We want to orient the ring consistently, i.e. have processors agree on what is left and what is right. This is important because algorithms generally use fewer messages on oriented rings.
† This research was supported by the United States-Israel Binational Science Foundation, grant no. 2439/82.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
In the general problem each processor has some initial value and we want all processors to compute some non-constant function of these values. We will be mainly interested in the case where the initial values are in $\{0,1\}$ and a Boolean function is computed, such as XOR, SUM or AND.
We give a complete characterization of functions that can be computed on an anonymous ring. It is shown that some knowledge of the ring size is necessary. Our subsequent discussion assumes that the number of processors on the ring is known.
We distinguish two computation models, the synchronous and the asynchronous. For the asynchronous ring, upper and lower bounds of order $n \lg n$ for finding the maximum of $n$ distinct inputs were proved [DKR, P, PKR]. Our first lower bound shows that $\Omega(n^2)$ messages are required to compute many elementary functions, in particular AND. As a corollary we obtain that if input values are not unique then extremum finding costs at least $\Omega(n^2)$ messages. The difference between the last two results is due to symmetries that may occur with identical inputs.
The best leader election algorithms consist of successive rounds where local maxima are selected, until a unique global maximum is found. When inputs are not distinct, symmetry may cause a deadlock situation. In the synchronous model we can detect such deadlock; we can then use the symmetry to obtain information about the ring configuration. In this way we achieve synchronous algorithms with message complexity $O(n \lg n)$ for orienting the ring and collecting all input values. The same algorithms can be applied in the asynchronous model if we settle for a weaker notion of termination, called eventual termination. These $O(n \lg n)$ algorithms are optimal, up to a constant factor.
In the next section we give basic definitions. In Section 3 we show various impossibility results for anonymous rings. Section 4 is dedicated to the asynchronous model. Tight upper and lower bounds of $\Theta(n^2)$ messages are given for a simple function such as AND. This lower bound holds for any Boolean function with the property that $f(0,\ldots,0) \ne f(1,\ldots,1)$, while the upper bound holds for arbitrary functions. Furthermore, we show that the message complexity of the orientation problem is $\Theta(n^2)$.
In Section 5 we show that more efficient algorithms can be obtained in the synchronous case: AND can be computed in $O(n)$ bit messages and cycles, which is clearly optimal. We present an algorithm which communicates all inputs to all processors using $O(n \log n)$ bit messages. The ring can also be oriented within the same bounds. The algorithms work even if the processors do not start synchronously; an $O(n \lg n)$ bit-message algorithm is given for processor synchronization.
In Section 6 we outline a proof of an $\Omega(n \lg n)$ lower bound on the number of messages required in the synchronous case to compute XOR. The same lower bound holds for orientation and synchronization. The full details are given elsewhere [ASW].
2. Definitions.
Consider a system of $n$ processors arranged on a bidirectional ring. Every processor $P$ has two distinguishable communication channels with its two direct neighbors on the ring, left(P) and right(P). If P = left(right(P)) for all processors $P$ of the ring then we say that the ring is oriented with direction right(P).
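The orientation condition P = left(right(P)) can be made concrete with a small sketch. The representation is an assumption made for illustration only: processors sit at positions 0..n-1 (labels the processors themselves cannot see), and a hypothetical flag `flipped[i]` records whether processor i's local "right" points towards position i-1 rather than i+1.

```python
def right(i, flipped):
    """Position of the processor that processor i calls its right neighbor."""
    n = len(flipped)
    return (i - 1) % n if flipped[i] else (i + 1) % n


def left(i, flipped):
    """Position of the processor that processor i calls its left neighbor."""
    n = len(flipped)
    return (i + 1) % n if flipped[i] else (i - 1) % n


def is_oriented(flipped):
    """True iff P == left(right(P)) holds for every processor P."""
    return all(left(right(i, flipped), flipped) == i
               for i in range(len(flipped)))
```

Under this representation, and for rings of at least three processors, a ring is oriented exactly when all flags agree; a single disagreeing processor already violates the condition for one of its neighbors.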
An algorithm specifies the behavior of each processor, modeled as a state machine. The initial state of the processor is determined by its input value. We distinguish two computation models:
In the synchronous model all processors start the computation simultaneously, and proceed in lockstep (we shall later relax the assumption of simultaneous start). At each step a processor may send a (possibly distinct) message to each of its neighbors. The state of P at the next step is determined by its current state, and the messages received from left(P) and right(P) at this step.
In the asynchronous model the transmission of a message incurs an unpredictable but finite delay. Messages sent on a link are received in the order they are sent. A state transition occurs at a processor only when it receives a message (we assume that a processor [...]).
When a processor enters a terminal state, it outputs an output value determined by its state, and halts. The computation terminates when all processors have halted. Note that in the asynchronous model the computation is not deterministic, so that there may be different executions (and different outcomes) for the same input.
For descriptive purposes we label the processors consecutively with labels $1,\ldots,n$; the labels, however, are not available to the processors. The ring is anonymous if all the processors run the same algorithm. In a labeled ring, processors with different labels may run different algorithms. In particular, in a ring with a leader, one distinguished processor runs an algorithm that may be distinct from the algorithm run by the remaining processors.
Let IN (respectively OUT) be the set of input values (respectively output values). In most of the problems discussed in this paper, IN = $\{0,1\}$ and OUT = $\{0,1\}$. An input configuration for a ring is an assignment of input values to the processors of the ring. We speak, similarly, of output and state configurations. A problem $\Pi$ is a mapping that assigns to each ring $R$ and each input configuration $I$ of $R$ a set $\Pi(R,I)$ of output configurations for $R$, the set of correct solutions for $R$ and $I$. An algorithm $A$ solves the problem $\Pi$ if a computation of $A$ on ring $R$ started on input configuration $I$ ends with $R$ in an output configuration that is a correct solution for $R$ and $I$, whenever such exists. The following two types of problems are considered in this paper:
Computing a Function. Let $f : \mathrm{IN}^n \to \mathrm{OUT}$. A ring $R$ computes $f$ if a computation started with input configuration $I$ ends with each processor outputting the result $f(I)$. Formally, the unique correct solution associated with input configuration $I$ is the output configuration $\langle f(I),\ldots,f(I) \rangle$.
Ring Orientation. In the Orientation Problem each processor is required to orient its connections so that the whole ring becomes oriented. Formally, each processor computes a Boolean output, so that if the processors with output 1 switch their left and right connections then the ring becomes oriented. No input values are required, and for each ring there are two correct solutions.
Our cost function will be either the number of messages sent or the number of bits sent, for some binary encoding of the messages. Lower bounds will be on the number of messages while algorithms will be analyzed by counting bit messages.
3. Functions that are Computable on an Anonymous Ring.
The first question that we address is that of characterizing the functions that can be computed distributively on an anonymous ring. Note that there is no difference for that purpose between the synchronous and asynchronous model. Clearly, a synchronous ring can simulate an execution of an asynchronous ring of the same size. Conversely, we can simulate a synchronous ring on an asynchronous ring using local synchronization: each processor sends at each cycle synchronization messages to both neighbors; a processor proceeds from the simulation of one cycle to the simulation of the next cycle only after it has received synchronization messages from both neighbors.
We shall, therefore, restrict our attention in this section to the synchronous model.
3.1. Knowledge of Ring Size.
We first remark that knowledge of the size of the ring is needed for algorithms operating on anonymous rings.
THEOREM 3.1. Let $f : \{0,1\}^* \to \{0,1\}$ be a nonconstant function. There is no distributed algorithm that computes $f$ correctly for infinitely many oriented rings.
PROOF: Given an algorithm, consider an input configuration $I_0$ where the answer is 0, and an input configuration $I_1$ where the answer is 1. Assume that the algorithm terminates in no more than $T$ steps on both configurations. Consider now a configuration of the form [...]. There will be a processor in the first segment that receives the same messages for $T$ cycles as a corresponding processor in the ring with configuration $I_0$. This processor will halt and output 0. Similarly, some processor in the second segment will output 1. ∎
Thus, any algorithm must have at least a bound on the ring size. For some simple functions the algorithm must "know" exactly the ring size.
THEOREM 3.2. There is no SUM algorithm that works correctly on two different sized oriented rings.
PROOF: Given a SUM algorithm, consider two configurations with all inputs 1, on two oriented rings of different sizes. These two configurations are equivalent: the processors receive exactly the same sequence of messages in both. Thus, the algorithm will output the same answer for both. ∎
A similar argument proves that a XOR algorithm cannot operate correctly on both even and odd size rings.
We shall henceforth assume that algorithms depend on the ring size $n$ (the algorithms given in this paper depend uniformly on $n$). Even then, not every function can be computed distributively. We have
THEOREM 3.3.
(i) There exists an algorithm that computes $f$ on a ring of size $n$ with fixed orientation iff $f$ is invariant under cyclic shifts of the inputs.
(ii) There exists an algorithm that computes $f$ on any ring of size $n$ iff $f$ is invariant under cyclic shifts and reversals of the inputs.
PROOF: The conditions in (i) and (ii) are clearly necessary. The sufficiency follows from the algorithms described in §4.1. ∎
3.3. Orientation
THEOREM 3.4. There is no orientation algorithm that works correctly for even length rings.
PROOF: The gist of the argument is that a deterministic algorithm cannot break a symmetric configuration, and achieve an asymmetric orientation. Given an orientation algorithm that works correctly on a ring of length $2n$, consider the configuration that consists of two oriented half rings.
Let $\varphi(i) = 2n+1-i$. For every processor $i$, $\varphi(\mathrm{left}(i)) = \mathrm{left}(\varphi(i))$ and $\varphi(\mathrm{right}(i)) = \mathrm{right}(\varphi(i))$. It follows by induction that $i$ and $\varphi(i)$ receive the same messages throughout the computation, and halt with the same output. Therefore, the processors $i$ and $2n+1-i$ either both switch their orientation, or both preserve their original orientation. However, processors $i$ and $2n+1-i$ have reverse initial orientations, so that $i$ should switch its orientation iff $2n+1-i$ does not switch it. ∎
The orientation problem can, therefore, be solved only for odd length rings. An argument similar to that given in the proof of Theorem 3.1 shows:
There is no algorithm that solves the orientation problem for infinitely many ring sizes.
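The invariance conditions of Theorem 3.3 can be tested by brute force for small ring sizes. The following is a minimal sketch; the helper names (`rotations`, `shift_invariant`, `reversal_invariant`) and the sample functions are illustrative assumptions, not part of the paper.

```python
from itertools import product


def rotations(x):
    """All cyclic shifts of the tuple x."""
    return [x[k:] + x[:k] for k in range(len(x))]


def shift_invariant(f, n, values=(0, 1)):
    """Condition (i): f takes the same value on every cyclic shift."""
    return all(f(r) == f(x)
               for x in product(values, repeat=n)
               for r in rotations(x))


def reversal_invariant(f, n, values=(0, 1)):
    """Extra requirement of condition (ii): f(x) equals f(reversed x)."""
    return all(f(x[::-1]) == f(x) for x in product(values, repeat=n))


def and_fn(x):
    return int(all(x))


def first_input(x):
    # Depends on which processor is "first", so it is not shift invariant.
    return x[0]


def is_cyclic_012(x):
    # 1 exactly on the cyclic shifts of (0, 1, 2); its mirror image differs.
    return int(x in rotations((0, 1, 2)))
```

Here `and_fn` satisfies both conditions, `first_input` fails condition (i), and `is_cyclic_012` (over the three-valued input set {0, 1, 2}) is shift invariant but not reversal invariant, so by the theorem it would be computable only on rings of fixed orientation.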
4. The Asynchronous Model.

4.1. Upper Bounds
We first show that any function that can be computed on a ring can be computed using $O(n^2)$ messages; if the inputs are Boolean then one bit messages are sufficient. This is true even if the ring is not oriented. The algorithm that performs that is trivial: Each processor P first sends a message in both directions consisting of its input value, and of a bit indicating the port label (e.g. 0 is sent to left(P) and 1 is sent to right(P)). Next, P forwards the following $\lfloor n/2 \rfloor - 1$ messages it receives from left(P) to right(P), and the following $\lfloor n/2 \rfloor - 1$ messages it receives from right(P) to left(P).
Each processor may reconstitute the input configuration and the orientation of the individual processors on the ring from the messages it receives, relative to its own location and orientation; it can use this information to compute locally the required function. Note that the algorithm needs $n(n-1)$ messages if $n$ is odd. When $n$ is even, $n(n-1)$ messages can be achieved by the following refinement: a processor forwards $n/2-1$ messages that were initially sent left and $n/2-2$ messages that were initially sent right.
If the ring length is odd, then this algorithm can be used to orient the ring: processors pick an orientation according to the majority of the individual orientations.
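The message count of the forwarding scheme can be checked with a small simulation. This is a sketch under simplifying assumptions: the ring is taken to be oriented (so the port-label bit is omitted), the even-size refinement is not applied, and a single global FIFO queue stands in for one admissible asynchronous schedule. The name `collect_inputs` is hypothetical.

```python
from collections import deque


def collect_inputs(inputs):
    """Simulate the quadratic collection scheme on an oriented ring.

    Returns (from_left, from_right, sent): the values each processor
    received from either side, nearest first, and the message total.
    """
    n = len(inputs)
    quota = n // 2 - 1                      # forwards per direction
    from_left = [[] for _ in range(n)]
    from_right = [[] for _ in range(n)]
    forwarded = [[0, 0] for _ in range(n)]  # [rightward, leftward] counts
    queue = deque()                         # entries: (receiver, step, value)
    sent = 0
    for i, v in enumerate(inputs):
        queue.append(((i + 1) % n, +1, v))  # travels right, arrives from left
        queue.append(((i - 1) % n, -1, v))  # travels left, arrives from right
        sent += 2
    while queue:
        dest, step, v = queue.popleft()
        (from_left if step == +1 else from_right)[dest].append(v)
        slot = 0 if step == +1 else 1
        if forwarded[dest][slot] < quota:   # still within the forwarding quota
            forwarded[dest][slot] += 1
            queue.append(((dest + step) % n, step, v))
            sent += 1
    return from_left, from_right, sent


inputs = [1, 0, 1, 1, 0]
from_left, from_right, sent = collect_inputs(inputs)
# Processor i can rebuild the ring as seen from its own position.
views = [[inputs[i]] + from_right[i] + from_left[i][::-1]
         for i in range(len(inputs))]
```

For odd n every processor sends 2 + 2(⌊n/2⌋ - 1) = n - 1 messages, which gives the n(n-1) total stated above; for even n the unrefined scheme sends n² messages.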
4.2. Lower Bound for AND
We shall show that a quadratic number of messages is required to compute even simple functions such as AND.
THEOREM 4.1. Any asynchronous algorithm for computing the AND of all inputs on a ring requires $\Omega(n^2)$ messages in the worst case; this holds true even if the ring is oriented.
PROOF: In the asynchronous model an adversary can manipulate the delays of the communication channels. We assume it schedules channels at discrete time steps.
When a communication channel is scheduled all messages sent during previous cycles on the communication channel reach their destination in the present cycle. We shall use an adversary that schedules all communication channels at each cycle. This "synchronizes" the computation, and keeps the ring configuration as symmetric as possible.
Let $A$ be an algorithm that computes the AND function on an oriented ring. Consider two input configurations: the first where all inputs are 1, and the second where all inputs are 1, with the exception of one 0 input at processor 1.
Consider the computations in the first and second configuration using the above adversary.
The first configuration is symmetric. Thus if some processor does not receive a message at some cycle, then no processor receives a message at that cycle, and no further state transitions occur. This implies that each processor receives a message at each cycle of the first computation except for the first cycle, until the computation terminates at some cycle $T$. At least $(T-1)n$ messages were sent during the computation on the first configuration.
We now compare the states of certain sets of processors in both computations. Let $E_r = \{r+1, r+2, \ldots, n+1-r\}$, for $1 \le r \le \lfloor n/2 \rfloor$. One can show by a simple inductive argument that the processors of $E_r$ are in the same state at the beginning of cycle $r$ in both computations. This holds because for each $r$ the processors in $E_r$ receive the same sequence of messages in both computations during the first $r-1$ cycles.
If $T \le \lfloor n/2 \rfloor$ then $E_T$ is non-empty. Thus the processors of $E_T$ would produce the same output in both computations. We conclude that $T > \lfloor n/2 \rfloor$, and therefore the number of messages sent in the first computation is at least $\lfloor n/2 \rfloor n$. ∎
COROLLARY 4.2. Any asynchronous algorithm for computing the maximum of all inputs on a ring requires in the worst case $\Omega(n^2)$ messages, when the inputs are not necessarily distinct.
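The counting step that closes the proof of Theorem 4.1 can be written out as a short chain of inequalities. It uses only the two facts established in the proof: the all-ones computation sends at least $(T-1)n$ messages, and $T > \lfloor n/2 \rfloor$.

```latex
\begin{align*}
T > \lfloor n/2 \rfloor
  &\;\Longrightarrow\; T - 1 \ge \lfloor n/2 \rfloor, \\
\#\text{messages}
  &\;\ge\; (T-1)\,n
  \;\ge\; \lfloor n/2 \rfloor\, n
  \;\ge\; \frac{n(n-1)}{2}
  \;=\; \Omega(n^2).
\end{align*}
```

The last inequality holds because $\lfloor n/2 \rfloor \ge (n-1)/2$ for every integer $n$.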
The last proof can be refined to yield a lower bound of $n(n-1)$ messages, which is tight. Using a similar argument, it is possible to prove an $\Omega(n^2)$ lower bound on the number of messages required to compute any function $f : \{0,1\}^n \to \{0,1\}$ with the property that $f(0,\ldots,0) \ne f(1,\ldots,1)$. On the other hand, it is possible to construct nonconstant functions that can be computed with a linear number of messages.
Consider the function $W$ defined on rings of odd size that is 1 on the configuration $0(01)^*$, 0 otherwise. The following algorithm computes $W$ on an oriented ring.
(1) Each processor checks the value of its left neighbor.
(2) The rightmost processor in a 00 pair sends right a 1 message; the rightmost processor in a 11 pair sends right a 0 message.
(3) A processor that did not issue any message forwards the messages it receives; a processor that issued a 0 message does not forward messages; a processor that issued a 1 message issues a 0 message if it receives a message.
Any ring configuration of odd length contains either a 00 pair or a 11 pair, so that some message is issued; each processor sends at most two messages, so that the total number of messages is linear. The value of $W$ is 1 iff there is a unique processor issuing a 1 message, in which case all processors receive that message; in all other cases all processors receive a 0 message.
The algorithm can be modified to work on an unoriented ring. We can also construct a non constant function that can be computed in a linear number of messages on even sized rings.
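The combinatorial fact behind the algorithm for W can be checked exhaustively on small odd rings. The sketch below does not simulate the message passing. It only compares two predicates, under the reading (an assumption) that W is 1 exactly on the cyclic shifts of 0(01)^k and that the winning case is "exactly one 00 pair and no 11 pair". The function names are hypothetical.

```python
from itertools import product


def is_shift_of_target(x):
    """True iff x is a cyclic shift of 0(01)^k, where len(x) = 2k + 1."""
    n = len(x)
    target = (0,) + (0, 1) * ((n - 1) // 2)
    return any(x[k:] + x[:k] == target for k in range(n))


def local_rule(x):
    """Value of W predicted from the messages issued in step (2)."""
    n = len(x)
    # (left neighbor's value, own value) for every processor, cyclically.
    pairs = [(x[i - 1], x[i]) for i in range(n)]
    ones_issued = pairs.count((0, 0))    # processors issuing a 1 message
    zeros_issued = pairs.count((1, 1))   # processors issuing a 0 message
    return int(ones_issued == 1 and zeros_issued == 0)


def rule_matches(n):
    """Exhaustive comparison over all binary configurations of size n."""
    return all(local_rule(x) == int(is_shift_of_target(x))
               for x in product((0, 1), repeat=n))
```

Calling `rule_matches(n)` for small odd n compares the two predicates on all 2^n configurations.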
4.3. Lower Bound for Orientation
An argument similar to that used in Theorem 4.1 yields an $\Omega(n^2)$ lower bound for the orientation problem.
THEOREM 4.3. Any asynchronous algorithm for orienting a ring requires in the worst case $\Omega(n^2)$ messages.
PROOF: Assume algorithm $A$ orients rings of length $n = 2m+1$. We consider three input configurations. In the first one, processors $1,\ldots,m$ are clockwise oriented (right(i) = i+1, for $i = 1,\ldots,m$) and the processors $m+1,\ldots,2m+1$ have the reverse orientation (right(i) = i-1, for $i = m+1,\ldots,2m+1$). In the second input configuration right(i) = i+1 for the first $m+1$ processors, and right(i) = i-1 for the remaining $m$ processors. The last input configuration is an oriented ring: [...]
[...] in the second; and the same for right.
We use, again, the same "synchronizing" adversary. Consider the execution of algorithm A on each of the three input configurations.
Processor $i$ in the first configuration will "behave" identically to processor $\varphi(i)$ in the second configuration; both will output the same value. If $i \ne m+1$, then processors $i$ and $\varphi(i)$ have in the second configuration opposite initial orientations; they halt, therefore, with distinct outputs. It follows that if $i \ne m+1$ then the output of processor $i$ in the first configuration is different from its output in the second configuration.
Let $T$ be the cycle of termination in the last configuration. Since this configuration is completely symmetric, at least $(T-1)n$ messages are sent. Let $E_r = \{r, r+1, \ldots, m-r+1\}$. With the same inductive argument as in the previous proof one can show that at the beginning of cycle $r$ the processors of $E_r$ are in the same state in each of the three computations.
It follows that $T > \lceil m/2 \rceil$. Thus, the number of messages sent in the last configuration is at least $\lceil m/2 \rceil n \ge n(n-1)/4$. ∎
A more refined analysis can be used to raise the last lower bound by a factor of two.
4.4. Rings with Leaders
The last results indicate that the bit complexity of nontrivial functions is equal to their message complexity, for anonymous, asynchronous rings. The situation is different for labeled rings, or even for rings with leaders.
In a ring with a leader every computable function can be computed in a linear number of messages using the following simple input collection algorithm: The leader initiates a message in one direction. Each processor that receives this message appends to it its input value and its orientation with respect to the message direction, and forwards the augmented message. When the leader receives back this message, it has a complete description of the input configuration. It can compute locally the output configuration, then propagate it around in a second message.
On the other hand, even on rings with leaders, it is easy to prove quadratic lower bounds on the number of bit transmissions required to compute certain functions, using information theoretic arguments. An example of a function with quadratic bit complexity is that of detecting squares (e.g. input configurations of the form zm).
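The linear input collection algorithm for a ring with a leader can be sketched as a single token pass followed by a result pass. The sketch assumes an oriented ring (orientation bits are omitted), counts n - 1 messages for the second pass, and also tallies how many input symbols travel in total, which shows why a linear message count does not by itself give a linear bit count for this particular algorithm. The name `leader_collect` is hypothetical.

```python
def leader_collect(inputs, f, leader=0):
    """Return (f(inputs as seen from the leader), messages, symbols sent)."""
    n = len(inputs)
    token = [inputs[leader]]
    messages = 0
    symbols = 0
    pos = leader
    for _ in range(n):               # first pass: the token circles the ring
        pos = (pos + 1) % n
        messages += 1
        symbols += len(token)        # the whole token travels on each hop
        if pos != leader:
            token.append(inputs[pos])
    result = f(tuple(token))         # the leader now holds the configuration
    messages += n - 1                # second pass: result reaches the others
    return result, messages, symbols


result, messages, symbols = leader_collect(
    [1, 0, 1, 1, 0], lambda x: int(all(x)))
```

With n = 5 the sketch uses 2n - 1 = 9 messages, while the token carries 1 + 2 + ... + n = 15 input symbols during the first pass.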
4.5. Eventual Termination
The quadratic lower bounds we obtained in this section are strongly dependent on the assumptions on termination. In our model we assume that a processor not only reaches the correct answer, but also "knows" that the computation has terminated: It decides on the output, halts, and does not accept any new messages. One can consider a weaker model, where there are no terminal states: A processor is always ready to accept a new message. A function is computed in this model if all processors eventually reach a state corresponding to the correct output, and do not transmit any new messages. However, the processors are not required to coordinate termination.
We call this the eventual termination model.
It is possible to compute in this model the AND function asynchronously using only a linear number of messages: A processor with initial bit 0 sends a message in both directions, and goes to state zero, where no further messages are generated. A processor with input bit 1 is initially in state one, where no messages are sent. If it receives a message, it forwards it, and goes to state zero (if it receives simultaneously two messages, it does not forward any). Eventually either all processors are in state zero, or all processors are in state one (according to the value of the AND function), and no further message is transmitted.
The total number of messages sent is O(n). Note, however, that a processor in state one does not "know" whether the computation has terminated.
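The eventual termination algorithm for AND can be simulated under arbitrary message delays. In this sketch (the function name, the seeded random scheduler and the link representation are all assumptions for illustration) each directed link is a FIFO queue, and the scheduler repeatedly delivers the head of a randomly chosen non-empty link. Messages are delivered one at a time, so the simultaneous-arrival case does not arise.

```python
import random
from collections import deque


def eventual_and(inputs, seed=0):
    """Return (final states, messages sent); state 1 = one, 0 = zero."""
    rng = random.Random(seed)
    n = len(inputs)
    state = list(inputs)
    links = {}      # (receiver, direction of travel) -> FIFO of pending tokens
    sent = 0

    def send(sender, step):
        nonlocal sent
        links.setdefault(((sender + step) % n, step), deque()).append(0)
        sent += 1

    for i, v in enumerate(inputs):
        if v == 0:                  # input 0: notify both neighbors
            send(i, +1)
            send(i, -1)
    while links:                    # run until no message is pending
        key = rng.choice(sorted(links))
        receiver, step = key
        links[key].popleft()
        if not links[key]:
            del links[key]
        if state[receiver] == 1:    # first message seen: forward, go to zero
            state[receiver] = 0
            send(receiver, step)
    return state, sent
```

Every processor sends at most two messages (a 0-input processor sends two and never forwards; a 1-input processor forwards at most once), so the total stays within 2n regardless of the schedule.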
5. The Synchronous Model.
5.1. AND Algorithm
A simple adversary argument will show that $\Omega(n)$ messages are needed to compute any nonconstant (one output) function. Indeed, consider two input configurations differing at one node only, such that the corresponding outputs differ. Then each of the remaining $n-1$ nodes must receive a message in at least one of the computations, otherwise it will output the same value in both computations.
The AND function can be computed with a linear number of messages: We run the eventual termination algorithm given at the end of the last section. If by cycle $\lfloor n/2 \rfloor$ a processor has received a message, the value of the AND is 0; otherwise the value is 1.
This simple algorithm exhibits the power of synchronism: a processor may gain information without receiving any message.
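The synchronous AND algorithm can be sketched as a lockstep simulation in which silence carries information. The sketch assumes that ⌊n/2⌋ delivery cycles suffice (every processor lies within that distance of any 0 input) and that the ring has at least two processors; the function name is hypothetical.

```python
def synchronous_and(inputs):
    """Return (outputs, messages sent) after floor(n/2) lockstep cycles."""
    n = len(inputs)
    state = list(inputs)
    # arriving[i]: travel directions of messages reaching i in the next cycle
    arriving = [set() for _ in range(n)]
    sent = 0
    for i, v in enumerate(inputs):
        if v == 0:
            arriving[(i + 1) % n].add(+1)
            arriving[(i - 1) % n].add(-1)
            sent += 2
    for _ in range(n // 2):
        nxt = [set() for _ in range(n)]
        for i in range(n):
            if arriving[i] and state[i] == 1:
                state[i] = 0
                if len(arriving[i]) == 1:   # two simultaneous: forward none
                    (step,) = arriving[i]
                    nxt[(i + step) % n].add(step)
                    sent += 1
        arriving = nxt
    # A processor still in state one has heard nothing, so it outputs 1.
    return state, sent
```

A processor that is still in state one after the last cycle halts with output 1 although it never received a message, which is the point made above.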
In general, any asynchronous algorithm that works in the eventual termination model can be modified to yield a valid synchronous algorithm, with the strong termination assumption: We run the algorithm synchronously. It is possible to give an a priori upper bound on the number of cycles an algorithm may run before reaching a configuration where no further messages are sent. We modify the algorithm so that processors halt after that number of cycles has elapsed.
Note that strict synchronization is not required for that purpose: it is sufficient to have an upper bound on transmission delays. The same remark applies to all the synchronous algorithms we present in this section: they use synchronism only to detect situations where there are no pending messages in the system. We shall use in this section this basic technique to detect situations where an algorithm deadlocks in a symmetric configuration.
All the synchronous algorithms given in this section can be run (with slight modifications) asynchronously, in the eventual termination model, using the same number of messages. This, however, is not the case for bit message complexity. In §5.4 we give an example of a function that can be computed synchronously using $O(n)$ bit messages, but requires $\Omega(n^2)$ bits in the asynchronous model, even with eventual termination.
5.2. Input Collection Algorithm
Not every function can be computed in a linear number of messages. We shall show, however, that any computable function can be computed in $O(n \log n)$ messages in the worst case. We present an algorithm that solves the input collection problem, that is the problem of distributing to each processor the entire input configuration. At the end of the computation, each processor holds an output string that consists of the sequence of input values at consecutive processors on the ring, starting from its own input value, followed by the value of its right neighbor, etc. That information is sufficient, according to Theorem 3.3, to compute locally any function that can be computed distributively.
In a ring with a leader it is possible to solve the input collection problem in $O(n)$ messages, using the linear input collection algorithm described in §4.4. If the processors have distinct labels, then a leader can be selected using $O(n \lg n)$ messages [DKR]. The leader is selected to be the processor with maximum label. The selection algorithm proceeds by rounds: Initially each processor is a prospective leader.
At each round remaining candidates that are neighbors exchange information, whereby a fixed proportion of them is disqualified. After $O(\lg n)$ rounds, each requiring $O(n)$ messages, only one leader remains.
We wish to combine the previous two algorithms in one algorithm. We run a leader choosing algorithm where a logarithmic number of rounds are performed.
At the start of each round processors are either active (leader candidates), or passive. Passive processors merely forward messages. The ring is partitioned into disjoint segments, each containing one active processor, that "represents" the segment: This processor stores a complete description of the input configuration on this segment. At each round a new, smaller set of active processors is chosen. Each of the chosen processors gathers information from the disqualified active processors in its neighborhood, by running the linear input collection algorithm, where it acts as a leader.
The information available at each active processor plays the role of the labels that are compared in the leader choosing algorithm. There is, however, no guarantee that these labels are distinct. The algorithm may deadlock in a symmetric configuration where all active processors have the same information. However, since the algorithm is synchronous, the processors can detect that such an event occurred, by waiting a sufficiently long time. If the ring is oriented then we are at this point in a situation where the input configuration is periodic, and each active processor holds a copy of that period. Since each active processor also knows the ring size it can, therefore, reconstruct the entire input.
However, we do not assume that the ring is oriented. If each active processor holds the same string $s$, then the ring is not necessarily periodic, as the active processors need not be consistently oriented. The ring configuration consists of a sequence of segments that are all equal either to $s$, or to $s^R$, the reversal of $s$.
We can at this stage run a new round of the leader choosing algorithm, using this time the directions of the active processors as labels. This can be used anew to break symmetry.
If this method fails, then the configuration is periodic, and each active processor may compute the period.
We describe now the algorithm in detail.
ALGORITHM. The algorithm proceeds through $O(\lg n)$ rounds. Each round takes a linear time, independent of the input, and requires, in the worst case, a linear number of messages. Each active processor holds a value which is a string describing the inputs on a segment of the ring, oriented according to the orientation of the processor, with a mark denoting the processor location. These segments form a partition of the ring. Initially each processor is active, and holds its own input. If an active processor holds a string of length $n$, then it has the entire ring configuration, and no other processor is active. The string is sent in one direction, and forwarded by the remaining processors. If an active processor holds a string of length $< n$, it starts a new round.
In the first part of the round leaders are selected among the active processors according to the lexicographic order of their values. An active processor is a local maximum if it has a value greater than or equal to the values of the nearest active processors on both its sides, and at least one inequality is strict. At most two thirds of the active processors may be local maxima. The first part of a round proceeds in three phases.
In the first phase active processors send their value in both directions; active processors decide whether they are local maxima by comparing their value to the values they receive.
In the second phase, each local maximum sends a message in both directions to assert leadership over a segment. An active processor forwards the first such message it sees; this message determines to what segment it belongs. If it receives a second message, then it is also an endpoint of this segment (this includes the case of a local maximum that receives a message).
In the third phase the active processors that are within a segment marked by a local maximum forward their value to it. A message is initiated by both endpoints of the segment. Each processor that belongs to the segment appends its value to such a message as it forwards it.
Each phase takes a fixed number of cycles, linear in $n$, and uses at most a linear number of messages.
If some local maximum was found then each active processor received a message during the second phase. The active processors that are not local maxima inactivate themselves at the end of the third phase, and the remaining active processors start a new round. The number of active processors has decreased by at least one third.
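The local-maximum rule and the two-thirds bound can be checked exhaustively for small numbers of active processors. In this sketch the values of consecutive active processors are given as a cyclic list. Plain integers stand in for the strings compared lexicographically in the algorithm, and the function names are hypothetical.

```python
from itertools import product


def local_maxima(values):
    """Indices whose value is >= both cyclic neighbors, with one strict."""
    k = len(values)
    result = []
    for i, v in enumerate(values):
        a, b = values[i - 1], values[(i + 1) % k]
        if v >= a and v >= b and (v > a or v > b):
            result.append(i)
    return result


def bound_holds(k, alphabet=(0, 1, 2)):
    """Check #maxima <= 2k/3 over all cyclic value sequences of length k."""
    return all(3 * len(local_maxima(vals)) <= 2 * k
               for vals in product(alphabet, repeat=k))


worst_case = local_maxima([1, 2, 2, 1, 2, 2])   # two thirds are maxima
deadlock = local_maxima([5, 5, 5, 5])           # all equal: no maximum
```

The bound follows because every local maximum has a strictly smaller neighbor, which is not a local maximum, and each such neighbor is adjacent to at most two maxima. When all values are equal no maximum exists, which is the symmetric deadlock that the second part of the round handles.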
If no local maximum was found in the first phase, then the active processors did not receive any message in the second phase. In that case the active processors start the second part of the round. Note that in this case all the active processors have the same value $s$. With respect to a fixed orientation, the configuration of the segment associated with each active processor is either $s$ or $s^R$.
We say that P points to Q if there is no active processor at the right of P between P and Q. An active processor is now said to be a local maximum if it is pointed at by its two neighbors. At most one half of the active processors are local maxima. The algorithm proceeds now through the same three phases that were executed in the first part (in the first phase a processor sends in both directions its orientation, relative to the direction of the message).
If no local maxima were discovered in this part then we have one of the following two cases:
(i) All the active processors have the same orientation. In that case the ring is periodic, with period s.
(ii) Every two consecutive active processors have reverse orientations. In that case the ring is periodic, with period s s^R.
The active processors can detect which condition holds, and proceed to distribute the ring's period within their segment.
If a local maximum was discovered, a new round is started.
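The second part of a round can be illustrated with a small centralized simulation. This is only a sketch and not the distributed protocol itself: it assumes a global view of the active processors listed in ring order (at least three of them), with orientation encoded as +1 when a processor's local "right" is the next processor in the list and -1 otherwise. The function names `pointed_at_maxima` and `classify_deadlock` are hypothetical.

```python
def pointed_at_maxima(orient):
    """Indices of active processors pointed at by both active neighbors.

    orient[i] is +1 if processor i's local 'right' is the next processor
    in the (cyclic) list, and -1 if it is the previous one.
    """
    m = len(orient)
    maxima = []
    for i in range(m):
        left_points_here = orient[(i - 1) % m] == +1
        right_points_here = orient[(i + 1) % m] == -1
        if left_points_here and right_points_here:
            maxima.append(i)
    return maxima


def classify_deadlock(orient):
    """Describe the configuration reached by the second part of a round."""
    if pointed_at_maxima(orient):
        return "local maximum found"
    if len(set(orient)) == 1:
        return "same orientation"        # case (i): period s
    return "alternating orientation"     # case (ii): period s s^R
```

For instance, the orientations `[1, 1, -1, -1]` yield two local maxima out of four active processors, which matches the "at most one half" bound.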
The last algorithm does not use orientation. If the ring is oriented, then it can be simplified. Also, if the ring is oriented, one can construct a similar algorithm that uses only one-sided communication.
Thus, any function that can be computed on a unidirectional ring can be computed synchronously in O(n lg n) messages.
Note that the last algorithm uses the local orientation of the processors to resolve conflicts when a processor simultaneously receives messages from both its neighbors. Thus, the algorithm run by each individual processor is not symmetric: if left is replaced by right, a different algorithm is obtained (which is also correct).
One can modify the algorithm to be symmetric. The only place where local orientation is needed is for a processor to decide to which segment it belongs, when it simultaneously receives a "leadership assertion" message from both directions. The decision can be avoided by pledging allegiance to both leaders: the processor will send its value in both directions. The segments owned by the active processors may now overlap. This, however, does not require significant changes in the algorithm.
Orientation Algorithm
The second part of each round in the previous algorithm uses only the processors' orientation in order to choose a leader. We can use this part in order to construct an orientation algorithm. Note, however, that we may reach, starting from an orientable ring, a configuration with an even number of active processors that are pairwise oriented in reverse directions, thus creating a deadlock situation.
According to Theorem 3.4, only odd length rings can be oriented. We shall take advantage of the "oddity" of the initial configuration: the algorithm is modified in such a way that the leader finding algorithm deadlocks only when the ring is oriented.
The algorithm proceeds in rounds. In each round the number of active processors is reduced by a constant ratio (here one third), and each round "costs" O(n) messages and time. To prevent the creation of a symmetric configuration, we ensure that the number of active processors is odd at each round.
The main difference between the orientation protocol and the input collection algorithm is in the way the ring is partitioned into segments and how the leaders of the segments are elected. Look at the graph with processors as vertices and, for each processor P, a directed edge from P to right(P). The number of weakly connected components is at most (n-1)/2 and at least one. The size of the ring is odd, hence there is an odd number of odd components (hence at least one such component).
Call a processor P such that right(left(P)) ≠ P an edge processor. Each component has two edge processors at its boundary (unless the ring is oriented and consists of one component). Call a processor that is at the same distance from the two edge processors of its component a center processor.
Note that there is one center processor in every odd component, and no center processor in even components. Center processors will be the active processors in the next round. Each odd length component will be replaced by one processor; all processors of even length components will become passive. The number of active processors at the next round is equal to the number of odd sized components in the current round, hence it is odd. Clearly the size of each component is at least two. Thus the smallest odd component is of size three, and therefore the number of active processors will be reduced to at most one third of its previous value.
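The component structure described above can be checked with a short centralized sketch. It is not the distributed protocol: it assumes a global view of a first round in which every processor is active, with `orient[i]` equal to +1 if right(i) is position i+1 (mod n) and -1 if it is position i-1. The names `components` and `center_processors` are hypothetical.

```python
def components(orient):
    """Weakly connected components of the graph with an edge P -> right(P)."""
    n = len(orient)
    # Positions i and i+1 are not joined exactly when i points away from
    # i+1 and i+1 points away from i.
    cuts = [i for i in range(n)
            if orient[i] == -1 and orient[(i + 1) % n] == +1]
    if not cuts:                          # consistently oriented ring
        return [list(range(n))]
    comps = []
    for j, c in enumerate(cuts):
        start = (c + 1) % n               # first processor after a cut
        end = cuts[(j + 1) % len(cuts)]   # last processor before the next cut
        size = (end - start) % n + 1
        comps.append([(start + d) % n for d in range(size)])
    return comps


def center_processors(orient):
    """Centers of the odd components; empty once the ring is oriented."""
    if len(set(orient)) == 1:
        return []
    return [comp[len(comp) // 2]
            for comp in components(orient) if len(comp) % 2 == 1]
```

On the ring `[1, 1, -1, 1, -1, 1, -1]` the components have sizes 3, 2 and 2, so a single center processor (position 1) stays active for the next round.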
It remains only to show that a round can be implemented with O(n) messages and time. The computation in each round is divided into three phases. In the first one, currently active processors draw arrows and create components; in the second, each edge processor sends a message in the direction of the other edge processor of its component; and in the third, center processors are detected: at a center processor the messages of the two edge processors of the component must meet. The algorithm appears in Figure 1.
Bit Complexity
The orientation algorithm can do with bit messages.
However, in the input collection algorithm we did not restrict the size of the messages sent. In synchronous algorithms it is always possible to replace arbitrary messages by one bit messages, at the expense of an increase in time (we assume that message size is bounded). The idea is to encode the message value by a delay. If messages may have k distinct values, then we replace each cycle of the original algorithm by k subcycles. If a message with value i was sent in the original algorithm, then a message is sent at subcycle i in the new algorithm. Note that we do not make use of the message value but just of its reception time.
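The delay encoding can be sketched as follows. This is a minimal illustration under the assumption that message values are numbered 0, ..., k-1 and that one original cycle is modeled as a list of k subcycles; the names `encode_as_delay` and `decode_from_delay` are hypothetical.

```python
def encode_as_delay(value, k):
    """Return the k subcycles that replace one original cycle.

    A single contentless message is sent in subcycle `value`;
    nothing is sent in the other subcycles.
    """
    if not 0 <= value < k:
        raise ValueError("value must lie in range(k)")
    return [i == value for i in range(k)]


def decode_from_delay(subcycles):
    """Recover the value from the subcycle in which a message arrived.

    Returns None if no message was sent during the original cycle.
    """
    for i, received in enumerate(subcycles):
        if received:
            return i
    return None
```

The receiver never inspects message contents; only the arrival subcycle carries information, at the price of a k-fold slowdown.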
This argument implies that the general input collection algorithm can be run with O(n lg n) bit messages and O(2^n n lg n) cycles, for Boolean inputs. A more refined analysis shows that messages sent during a round can be pipelined, so that we only have an additive factor of 2^n at each round. Thus, the algorithm can be run in O(2^n lg n) cycles. The last bound can be improved for special functions, where one can distribute a small function value, rather than a string encoding the input values. For example, the sum of the input bits can be computed in O(n lg n) cycles, with O(n lg n) bit messages. This holds true for the computation of a_1 * ... * a_n, where * is an arbitrary associative and commutative operation, with a range of size O(n).
The same technique does not work in the asynchronous case, even with eventual termination. Consider the problem of detecting a ring configuration of the form 0X0110X0, where 11 does not occur in X. A simple information theoretic argument shows that Ω(n^2) bits are required to solve this problem asynchronously. It can be solved asynchronously using O(n) messages, in the eventual termination model: an algorithm similar to the AND algorithm can be used to detect that 11 occurs only once; then, starting from its place of occurrence, one can run an input collection algorithm to solve the problem. It follows that this problem can be solved synchronously with O(n) bits, can be solved asynchronously with O(n) messages (in the eventual termination model), but requires Ω(n^2) bits if solved asynchronously.
Synchronization
Algorithms in this section were written assuming that processors start simultaneously. We now drop this assumption. We assume that the processors are all originally idle. A processor awakens either spontaneously, or when receiving a message.
The transitions are henceforth deterministic. We can introduce minor modifications in the previous algorithms of this section, so that they will run correctly in that weaker model.
Alternatively we can reduce this situation to the one that holds when processors start simultaneously, by showing that the processor synchronization problem can be solved in O(n lg n) messages: we can write an algorithm such that, if each processor starts running the algorithm at a (possibly) different time, then O(n lg n) cycles and messages after the first processor started, all processors will halt simultaneously. By prepending the synchronization algorithm to an algorithm that assumes simultaneous start, we obtain an algorithm that computes the same function, but does not require simultaneous start. As processor synchronization is an interesting problem on its own, we shall follow the second approach.
The synchronization algorithm will synchronize all processors to the time of the earliest starting processor(s).
It uses the same basic framework as our input collection algorithm. We run a leader choosing algorithm that chooses the earliest running processor. We assume that each processor keeps a count of the number of cycles that elapsed since it awoke. An active processor is a local maximum if its count is ahead of the count of its neighbors, and strictly ahead of the count of at least one neighbor. At most two thirds of the active processors survive after each round. If no local maximum is found then all processors have the same count, and thereby are synchronized.
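The local maximum rule can be made concrete with a small sketch. It assumes a global view of the active processors' counts in ring order (at least three of them), already corrected for travel time, and reads "ahead of" as "greater than or equal to"; that reading and the name `local_maxima` are assumptions made for illustration.

```python
def local_maxima(counts):
    """Indices of active processors whose count is at least that of both
    neighbors and strictly greater than that of at least one neighbor."""
    m = len(counts)
    result = []
    for i, c in enumerate(counts):
        left = counts[(i - 1) % m]
        right = counts[(i + 1) % m]
        if c >= left and c >= right and (c > left or c > right):
            result.append(i)
    return result
```

The counts `[2, 2, 1, 2, 2, 1]` give four local maxima out of six, which meets the two-thirds bound; three consecutive processors can never all qualify, since that would force three equal counts with none strictly ahead.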
A difficulty arises when neighboring active processors exchange their counts: While the message containing the count travels, the value of the count changes. This can be remedied by having each passive processor increase the count it forwards by one.
We give a formal description of this algorithm in [...] apart by at most ⌊n/2⌋. Let phase k consist of the cycles where processors forward messages which were initiated by active processors when their count was equal to 2kn. Since all messages reach their destination in less than n cycles, the last remark implies that distinct phases do not overlap.
When a phase ends all processors that received a message from a local maximum are synchronized with it. If one processor receives a message during a phase then all do. If no processor received a message during a phase then no local maximum was detected at the previous phase, and all processors are synchronized. This implies the correctness of the algorithm.
During a phase O(n) messages are sent. The number of active processors decreases at each phase by one third at least, so that the total number of phases is logarithmic.
The total number of messages sent is, therefore, O(n lg n).
Bit Complexity
It is possible to replace arbitrary messages in the synchronization algorithm by bit messages, with a constant factor increase in time and message complexity. Note that each processor knows, upon receiving a message, to which phase it pertains. The only required information is the time that elapsed from the issuing of the message to its reception, i.e., the distance separating the sender from the receiver. This distance can be computed by sending two messages, first a message traveling at speed 1, next a message traveling at speed 1/2: the first message is forwarded at each cycle, whereas the second message is delayed one cycle by each forwarding processor. The distance between sender and receiver equals the difference between the reception times of these two messages.
Suppose processor P has count c, fixed by the phase number k, and is to transmit a message. Then P sends the following two messages:
(1) One message of speed 1, at time c (the current time).
(2) One message of speed 1/2, at time c+1.
Assume P' receives messages from P at times t_1 and t_2. Then the distance between P and P' is t_2 - t_1; the count of P when P' receives its second message equals c + 2(t_2 - t_1). P' can compute c from the phase number; it computes the current count of P, and updates its count accordingly.
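The arithmetic behind the two-speed scheme can be traced hop by hop. The sketch below assumes that a message takes one cycle per link, that only intermediate forwarding processors (not the sender) delay the slow message, and that times are expressed on the sender's clock; the names `reception_times` and `recover` are hypothetical.

```python
def reception_times(c, d):
    """Arrival times of the two messages at a processor d >= 1 links away."""
    t1 = c                    # speed-1 message leaves the sender at time c
    for _ in range(d):
        t1 += 1               # one cycle per link
    t2 = c + 1                # speed-1/2 message leaves at time c + 1
    t2 += 1                   # first link
    for _ in range(d - 1):
        t2 += 2               # held one cycle by a forwarder, then one link
    return t1, t2


def recover(c, t1, t2):
    """What the receiver deduces from the two reception times."""
    distance = t2 - t1
    sender_count_now = c + 2 * distance
    return distance, sender_count_now
```

Only the difference t2 - t1 matters, so the receiver may measure both times on its own clock; the messages themselves need not carry any value.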
The time required for a communication between two active processors is bounded by 2n. We shall modify the algorithm so that 3n cycles are dedicated to each phase (an active processor sends a message only when its count is c = 3kn): active processors send messages using the method described above, and passive processors merely forward messages. The algorithm will require no more than 3n log_{3/2} n cycles and messages.
Note that each processor receives from each direction alternately messages of speed 1 and messages of speed 1/2, starting with speed 1 messages. Thus, the messages need not carry any information (not even their speed) and one can use empty messages.
Lower Bounds in the Synchronous Model
The main obstacle to a lower bound proof in the synchronous model is that information is obtained at each cycle even when no message is received. Use of such information is essential in our synchronous algorithms. We give in this section a short outline of a lower bound technique that overcomes this problem. Full details are given in [ASW].
Let A be an algorithm that distinguishes input configuration I_1 from the configuration I_2. Consider the computations of A on these two input configurations. Assume that a processor does not receive messages at some cycle of one computation. This yields new information that helps to distinguish the two inputs only if this processor receives a message at the same cycle in the other computation. Therefore, we can restrict our attention to "active cycles" only, i.e., cycles where some message is transmitted in at least one of these two computations.
The state of a processor at the end of the k-th active cycle depends only on input values at nodes at most k apart from it. Assume that I_1 and I_2 are strings of length n with the following property: if σ is a substring of length k < αn (α a constant) that appears either in I_1 or in I_2, then it occurs Ω(n/k) times both in I_1 and in I_2.
Since there is a linear length substring common to both I_1 and I_2, a decision cannot be reached in less than a linear number of active cycles. The previous condition on the strings implies that whenever a processor sends a message at the k-th active cycle, there are Ω(n/k) processors that send the same message too. This yields an Ω(n lg n) lower bound on the number of messages required to distinguish the two configurations.
An Ω(n lg n) lower bound for XOR is proved by building two such strings where XOR obtains distinct values. A similar technique is used for the other two problems.
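The counting step behind this bound can be written out as a harmonic sum. The derivation below is a sketch under the assumption that each of the first αn active cycles carries at least c·n/k messages, where the constants c > 0 and 0 < α < 1 are introduced here only for illustration.

```latex
\[
  \#\text{messages}
  \;\ge\; \sum_{k=1}^{\alpha n} \frac{c\,n}{k}
  \;=\; c\,n\,H_{\alpha n}
  \;\ge\; c\,n \ln(\alpha n)
  \;=\; \Omega(n \lg n),
\]
% H_m = 1 + 1/2 + ... + 1/m denotes the m-th harmonic number, and H_m >= ln m.
```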
