We consider the problem of routing in an asynchronous dynamically changing ring of processors using schemes that minimize the storage space for the routing information. In general, applying static techniques to a dynamic network would require significant re-computation. Moreover, the known dynamic techniques applied to the ring lead to inefficient schemes. In this paper we introduce a new technique, Dynamic Interval Routing, and we show tradeoffs between the stretch factor, the adaptation cost, and the size of the update messages used by routing schemes based upon it. We give three algorithms for rings of maximum size N: the first two are deterministic, one with adaptation cost zero but worst case stretch factor ⌊N/2⌋, the other with worst case adaptation cost O(N) update messages of O(log N) bits and stretch factor 1. The third algorithm is randomized, uses update messages of size O(k log N), has adaptation cost O(k), and expected stretch factor 1 + 1/k, for any integer k ≥ 3. All schemes require O(log N) bits per node for the routing information and all message headers are of O(log N) bits.

* This research was supported by an NSERC grant and by MIUR progetto "Matematica per le scienze e la tecnologia", Università di Trieste. Rajeev Raman's work was supported in part by EPSRC Grant GR/L92150. A preliminary version of this paper was presented at the IEEE 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing (IPPS/SPDP '99) [17].
Introduction
The design of routing schemes that minimize the space devoted to routing tables in a network is an active area of research [3], [7], [10]-[16], [18]-[21]. Most of the research done in this area has concerned static networks and has focused on the tradeoffs between the space required for routing tables and the quality of the routing paths those tables define. In general, to apply such static compact routing techniques to a dynamic network it would be necessary to perform a global re-computation of all the routing tables in the network. A much better approach is to consider schemes that require only a limited number of table updates (in the worst case or in an amortized sense) whenever a change occurs in the network [1], [2], [4]-[6], [8], [9].
A static routing scheme is composed of distributed routing tables (one at each node) and a routing procedure which uses the routing tables to perform the message delivery. A dynamic routing scheme also includes a distributed update procedure which updates the routing tables whenever a change occurs in the network. In this paper we assume the changes include processors going off-line or coming on-line, in a fault-free manner, as is the case when, e.g., users are logging in or out, processors are being taken off-line for maintenance, new processors are being added to the network, etc. (In particular, we assume that processors going off-line complete the update procedure first.) In the worst case a single change may require updating all the routing tables in the network. It is desirable to design dynamic routing schemes that limit the amount of updating that must occur per change. We are interested in finding tradeoffs between the length of the routing paths, the space requirements of the routing tables, and the worst case or amortized number of messages exchanged per topology change, for dynamic routing schemes.
An important example of static routing schemes are k-Interval Routing Schemes (k-IRSs) [18], [21]: in an N node network, every node is labeled with a different value in the set {0, . . . , N − 1}; every arc e_i leaving a node i is assigned a set of at most k disjoint intervals [a_1, b_1], . . . , [a_k, b_k] of values in the same set. Messages from i to j are forwarded through the arc labeled with the interval containing j. Interval routing schemes are an example of compact routing schemes. Note that the space required to store the routing table of a single node for a k-IRS is O(kd log N) bits, where d is the degree of the node. For small k and d this is a significant saving over the complete table, which requires O(N log d) bits per node in the worst case.
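For concreteness, the forwarding decision of a k-IRS can be sketched as follows; this is an illustrative Python sketch, and the table contents, port names, and function name are our own hypothetical choices, not from the paper.

```python
# Sketch of a k-IRS forwarding decision (illustrative only).
# Each outgoing arc is assigned at most k disjoint intervals [a, b];
# a message for destination j leaves through the arc whose interval
# set contains j. Table contents below are hypothetical.

def forward(arc_intervals, j):
    """Return the arc whose interval set contains destination j."""
    for arc, intervals in arc_intervals.items():
        if any(a <= j <= b for a, b in intervals):
            return arc
    raise ValueError(f"no interval contains destination {j}")

# Example: a degree-2 node in an 8-node network with k = 2.
table = {"left": [(1, 2), (5, 6)], "right": [(3, 4), (7, 7)]}
assert forward(table, 5) == "left"
assert forward(table, 3) == "right"
```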
As an example of a dynamic routing scheme we introduce Dynamic Interval Routing Schemes (DIRSs). A DIRS for a network with maximum size N is based on 1-IRSs (IRSs for short): nodes are labeled by distinct values in the set {0, . . . , N − 1} and arcs are labeled by disjoint intervals of values in the same set. However, not all of the processors may be on-line at all times, i.e., intervals may contain the labels of processors that are not on-line. As changes occur in the network, an update procedure modifies the routing tables dynamically, i.e., the ranges of the intervals assigned to the arcs.
We require the following definitions. A processor is said to be pending if it has come on-line or is going off-line but has not yet completed the update procedure associated with the change it has caused in the network. After completing the update, the processor is said to be active if it comes on-line and non-active if it goes off-line. We say that the system has reached quiescence if there are no topological updates pending, i.e., all processors are either active or non-active (see [1] ).
We say that the system delivers all messages correctly if:
1. A message travels only a bounded number of steps.
2. A receiver receives a message if it is active during the entire lifetime of the message.
We assume that the communication between two neighboring processors that are active or pending has cost 1. We consider the following complexity measures:
1. The space complexity of the routing scheme, i.e., the maximum number of bits stored in each processor for the routing information during the quiescent state.
2. The update message size, i.e., the size of the update messages in bits.
3. The adaptation cost, i.e., the number of update messages generated per insertion and deletion of processors (this cost is worst case or amortized over the number of processors that go on-line and off-line in the system).
4. The stretch factor of the routing scheme, i.e., the maximum ratio between the length of the routing path between any two active processors and the length of a shortest path between them. The stretch factor is computed only when the destination processor is active and when the system has reached a state of quiescence; otherwise it can trivially be shown to be Ω(N) [1].
In this paper we consider the problem of routing on an asynchronous bidirectional dynamically changing ring of processors with FIFO queues and with global orientation (i.e., all processors agree on the left and right direction).
Specifically, we assume that the ring has N switches, numbered consecutively {0, . . . , N − 1} in clockwise order, with switch i connected to the switches numbered (i − 1) mod N and (i + 1) mod N. There are N processors as well, each of which is associated with exactly one switch. The label of a processor is simply the number of the switch with which it is associated. Messages move between switches: if a switch is open, the associated processor is non-active and messages pass through at no cost; if a switch is closed, the associated processor is either active or pending and all messages are delivered to the processor for processing (see Figure 1). Each processor knows the topology of the network, the value N, and its own label, but must explicitly acquire information about the status of the other processors in the network (i.e., whether they are active, non-active, or pending).
In order to ensure correctness, and more precisely to ensure that messages travel only a bounded number of steps, we assume that there is a unique processor, without loss of generality, the one labeled with value 0, that is always active (see details given later). We use n to denote the number of active and pending processors at any particular time, i.e., n is the effective size of the ring. This model captures the common star-shaped ring Local Area Network (LAN) topology in which all connections pass through a wire center as well as rings consisting of virtual links in optical networks [22] .
Below we describe three different DIRSs for the ring network which show a tradeoff between the stretch factor, the adaptation cost, and the size of the update messages. The first two algorithms are deterministic, one with adaptation cost zero but worst case stretch factor ⌊N/2⌋, the other with worst case adaptation cost O(N) messages of O(log N) bits but with stretch factor 1. The third algorithm is randomized and has expected amortized adaptation cost O(k) with messages of size O(k log N) and expected stretch factor 1 + 1/k, for any integer k ≥ 3. All of our schemes use only O(log N) bits per node to store routing table information and require the message headers to contain at most O(log N) bits.
Regarding the space complexity note that, as in the study of compact routing schemes, the space considered is only that related to the routing scheme, i.e., to the storage of the routing table. In order to speed up routing table lookups, we assume that each table is stored in a cache memory inside the router. We also assume that all of the storage required by the update procedure resides in a different level of storage (either in the router or in the processor itself).
To the best of our knowledge these are the first dynamic routing schemes derived for the above model of the ring. Previous work on dynamic routing has concentrated on other classes of networks such as dynamically growing trees [1] or on general networks [2] , [5] , [6] , [9] . The results on general networks are mostly based upon spanning trees and cluster techniques. When applied to the ring the best of them require polylogarithmic adaptation cost and message size and result in schemes with a polylogarithmic stretch factor.
DIRS with Adaptation Cost Zero
In this section we describe a DIRS that has adaptation cost zero (i.e., it requires no messages to be sent when processors go on-line or off-line) but has stretch factor min{n − 1, ⌊N/2⌋}, where n is the effective size of the ring. The results of this section are straightforward but fundamental for understanding the background and the problems that will be faced and solved in the next sections.
We assume the classical node and arc labeling of the IRS for a ring of size N [18], [21]. Every node i is labeled with some unique integer in the set {0, . . . , N − 1}. Moreover it has two outgoing arcs: l_i, the left one in a clockwise orientation of the ring, and r_i, the right one. The (possibly wrapped-around) interval associated to l_i is [(i + 1) mod N, (i + ⌊N/2⌋) mod N], while r_i is assigned the complementary interval [(i + ⌊N/2⌋ + 1) mod N, (i − 1) mod N].
Hence every message destined by i to a node j ≠ i in the ring is unambiguously sent either through l_i or through r_i, as j is contained in exactly one of the two intervals. As an example consider Figure 1 and the intervals associated to node 0, i.e., [5, 7] to r_0 and [1, 4] to l_0. A message from 0 to 2 is therefore sent through arc l_0.
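A minimal sketch of this fixed labeling and the resulting forwarding decision (our own Python rendering, with helper names of our own choosing):

```python
# Static ring IRS: node i sends to the left arc all labels in
# [(i + 1) mod N, (i + floor(N/2)) mod N] and to the right arc the
# complementary (possibly wrapped-around) interval.

def in_wrapped_interval(x, a, b, N):
    """True if x lies in the interval [a, b] taken modulo N."""
    if a <= b:
        return a <= x <= b
    return x >= a or x <= b  # wrapped-around case

def direction(i, j, N):
    """Direction node i forwards a message destined to j (j != i)."""
    a, b = (i + 1) % N, (i + N // 2) % N
    return "left" if in_wrapped_interval(j, a, b, N) else "right"

# With N = 8 as in Figure 1: node 0 sends to 2 through its left arc.
assert direction(0, 2, 8) == "left"
assert direction(0, 6, 8) == "right"
```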
In the DIRS the data message is in the form M = (D, r, s, x) where D is the information to be exchanged, r is the name of the receiver, s is the name of the sender, and x is a value that denotes the number of times M has passed in front of processor 0 (which is always active).
The DIRS consists of an Update and a Routing procedure. The Update procedure (Algorithm 1 in the Appendix) is executed by a non-active processor that wants to go on-line. It consists of getting a fixed label i (line 1) and then using the classical optimal stretch IRS with the fixed arc intervals defined above (Algorithm 2 in the Appendix).

Theorem. The DIRS above delivers all messages correctly; moreover: (1) the space required for the routing tables is at most O(log N) bits per node; (2) the adaptation cost is zero (i.e., no update messages are sent); (3) the stretch factor is at most min{n − 1, ⌊N/2⌋}.
Proof. Consider the case in which a receiver j is active during the lifetime of the message M. In this case the DIRS behaves as a normal IRS in a static ring, therefore M is correctly delivered to j. Note also that every processor that goes on-line has a fixed label and the arc labeling is the one of the classical ring IRS. If j is non-active M is killed by the next active processor in the ring. Note that processor 0 is always active and kills messages that have passed in front of it twice, i.e., with x = 2.
The space complexity is O(log N) bits since every processor stores the value N, its own label, and at most two intervals of O(log N) bits, one for each outgoing arc. No update messages are sent and therefore the adaptation cost is zero. To compute the stretch factor observe that, no matter which sequence of changes occurs in the network (processors going on-line or off-line), the longest path a message from i has to travel is to processor j = (i + ⌊N/2⌋) mod N. This implies that at any time the stretch factor can be at most ((i + ⌊N/2⌋ − i) mod N)/1 = ⌊N/2⌋, since a message always travels in the same direction. Moreover, along that path the message can pass in front of at most n − 1 active or pending processors, while the shortest path has length at least 1; therefore the stretch factor is at most min{n − 1, ⌊N/2⌋}.
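To make the kill rules concrete, here is a hedged sketch of the routing step at a closed switch, for a message travelling clockwise only (our own reconstruction in Python; the skip test is ours, and Algorithm 2 in the Appendix is the authoritative version):

```python
# Sketch of the zero-adaptation-cost routing step at a closed switch i.
# Field names follow the message format M = (D, r, s, x) in the text.

N = 8  # maximum ring size (assumed for the example)

def skipped(i, r, s):
    """True if destination r lies strictly between sender s and the
    current processor i in clockwise order, i.e., the message has
    already passed in front of r's (open) switch."""
    return (r - s) % N < (i - s) % N

def handle(i, msg):
    D, r, s, x = msg
    if r == i:
        return "deliver"
    if skipped(i, r, s):
        return "kill"            # r is non-active: next active node kills M
    if i == 0:
        x += 1
        if x == 2:               # M has passed in front of processor 0 twice
            return "kill"
    return ("forward", (D, r, s, x))

# A message from 3 to 5 arriving at active processor 6 is killed: switch 5
# must have been open, so processor 5 is off-line.
assert handle(6, ("data", 5, 3, 0)) == "kill"
```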
Scheme with Linear Adaptation Cost
In this section we describe a DIRS that at quiescence routes with stretch factor 1 and requires O(log N ) bits of space on each processor. For any change to the network (processors going on-line or off-line) between two quiescent states, the worst case adaptation cost is O(n) update messages of O(log N ) bits where n is the maximum number of processors that are active or pending between these two states.
The Algorithm
As we have explained in the previous section, routing can easily be accomplished if every node i sends messages to the left or to the right only, depending on the destination value and on the fixed intervals associated to the arcs leaving i. The cost, though, is in the stretch factor of the routing path, which is ⌊N/2⌋ in the worst case. In this section we introduce a new routing scheme (still based on IRS) which drastically improves on this stretch factor by allowing a dynamic update of the intervals of the arcs leaving i. Informally, every such interval will on one side be delimited by the value of the processor which is precisely opposite to i (we call it op(i)), i.e., the numbers of active or pending processors on the left and on the right paths from i to op(i) are equal to within 1. All this is accomplished by ensuring that whenever a change occurs in the network, i.e., when a processor z goes on/off-line, z starts an update phase in which the opposite values of all processors in the ring are dynamically updated.
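For intuition, the invariant can be checked offline from a global view (our own sketch; the distributed update procedure described below maintains op(i) without any such global view):

```python
# Offline illustration of the opposite value op(i): the on-line processor
# such that the numbers of on-line processors on the left and right paths
# from i to op(i) differ by at most 1.

def opposites(online):
    """online: labels of on-line processors in clockwise order."""
    n = len(online)
    op = {}
    for idx, i in enumerate(online):
        # Walking left (clockwise) from i, op(i) is the floor(n/2)-th
        # on-line processor; the remaining ones lie on the right path.
        op[i] = online[(idx + n // 2) % n]
    return op

# Figure 1's situation (n = 5, processors 0, 2, 4, 5, 7 on-line): op(0) = 4.
assert opposites([0, 2, 4, 5, 7])[0] == 4
```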
The algorithm is divided into a Routing procedure and an Update procedure. The Routing procedure (Algorithm 3 in the Appendix) is very similar to Algorithm 2. Informally, the main differences are that the interval associated to every arc leaving a node i now contains a dynamically changing value op(i) (in Algorithm 2 this value was fixed) and that pending processors may only receive and forward messages (in Algorithm 2 a processor coming on-line instantaneously becomes active, i.e., there are no pending processors).
In more detail, an active processor s that wants to send a data message to a processor r forwards it through the arc whose (dynamically maintained) interval contains r, where the intervals leaving s are delimited by the current value op(s). As an example consider again Figure 1: here n = 5 and the intervals associated to node 0 are [5, 7] to r_0 and [1, 4] to l_0 (i.e., op(0) = 4), as there are two processors on-line both on the left path from 0 to 4 and on the right path from 0 to 5. A message from 0 to 2 is thus sent through arc l_0.
The update strategy (Algorithm 4 in the Appendix) is more complicated and is divided into three phases. We discuss the case where a processor goes on-line in detail. The case where a processor goes off-line is handled similarly. Recall that a pending processor cannot go off-line, i.e., it must first complete its update procedure.
Informally, whenever a processor wants to go on-line it becomes pending and gets its (fixed) label. For simplicity, let us first assume that a single processor, say i, goes on-line. Trivially, in this case the new (exact) opposite value that each processor will have to store will either remain unchanged or will become the old opposite value of its right or left neighbor. All this depends on the new oddness or evenness of the number n of processors in the ring and on the interval i belongs to. To obtain such an update, processor i starts sending messages so that every processor may collect the values of its right and left neighbors together with their opposite values. The last thing to observe is that when more than one processor goes on-line the updates have to be sequentialized. This is realized by dividing the update into different phases, letting only messages related to the update started by the processor with the maximal label and in the highest phase go through, and temporarily stopping and buffering all other messages.
In more detail, every processor i that goes on-line becomes pending and gets its fixed label i (line 2). It then sends a Phase 1 message (with values initialized to −1, line 4) in the direction of the orientation (e.g., to the left). This message collects the values of the left and right neighbors of i (called l_i and r_i, respectively), their opposite values (op(l_i) and op(r_i), respectively), and their current knowledge of the oddness or evenness of the ring (even(l_i) and even(r_i), where even(x) = 1 if x knows n is even, and 0 otherwise). If i gets the message back (line 28) it moves to the next phase, since it has won a "race" and is the only processor going to the next phase. The mechanism used to win the race is simple: if a processor in Phase 1 receives an update message started by some other processor, it lets it through only if the sender's label is bigger (lines 7-9 and 11) or if the sender is in a higher phase (lines 13-26). Every processor in a higher phase stops and buffers Phase 1 messages in a FIFO queue (lines 55 and 72). Therefore a unique processor (the one with the maximal label among pending processors) may move to Phase 2 (and consequently to Phase 3).
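The precedence rule can be condensed as follows (our own sketch; the lexicographic tuple comparison stands in for the line-by-line tests of Algorithm 4):

```python
# Sketch of the "race" rule: a processor in Phase 1 lets a competing
# Phase 1 message through only if it carries a bigger label, and always
# lets higher-phase messages through; processors in Phase 2 or 3 buffer
# all Phase 1 messages in a FIFO queue.

from collections import deque

def on_update_message(my_phase, my_label, msg_phase, msg_label, buffer: deque):
    if (msg_phase, msg_label) > (my_phase, my_label):
        return "forward"          # the sender wins the race so far
    buffer.append((msg_phase, msg_label))
    return "buffered"             # released when our own update completes

q = deque()
assert on_update_message(1, 4, 1, 6, q) == "forward"   # bigger label wins
assert on_update_message(2, 4, 1, 6, q) == "buffered"  # Phase 2 blocks Phase 1
```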
We assume i is the unique processor that receives back its Phase 1 message. At the very end of the algorithm, every processor j has to contain consistent and exact information on j, op(j), even(j), l_j, op(l_j), even(l_j), r_j, op(r_j), and even(r_j). Therefore, the aim of Phases 2 and 3 is to propagate the update from i to the remaining processors. Moreover, to be consistent, i will also have to update its right and left opposite values (op(r_i) and op(l_i)), as they may change during these new phases.
More precisely, the Phase 2 message first updates the opposite value of each processor, which may either remain the same or become the opposite value of the right or left neighbor, depending on the new oddness or evenness of the ring (lines 110-118 and 123-131). Moreover, both the Phase 2 and Phase 3 messages are used (one for each direction) to update, at every processor, the left and right neighbor values together with their related opposite and evenness values.
All this is done by sending messages containing the name of the local processor i, its opposite value op(i), and its evenness value even(i) (lines 47-53 and 66-70), and by dynamically updating them. In other words, if a message is sent to the right, then j, the next processor receiving it, will set its local variables r_j to i, op(r_j) to op(i), and even(r_j) to even(i) (lines 105-109 and 136-140), and will forward a message containing j, op(j), and even(j) (lines 133 and 150).
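One step of this sweep, rendered as a sketch (the dictionary stands in for processor j's local variables; the names are ours):

```python
# One step of the Phase 2/3 sweep (our sketch): processor j receives the
# triple (i, op(i), even(i)), refreshes the corresponding neighbor fields
# of its routing information, and forwards its own triple onward.

def sweep_step(state_j, i, op_i, even_i):
    state_j["r"], state_j["op_r"], state_j["even_r"] = i, op_i, even_i
    # j now forwards (j, op(j), even(j)) in the same direction
    return (state_j["label"], state_j["op"], state_j["even"])
```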
When i completes Phase 3 it becomes active, updates all the buffered messages, and lets them through. In this way the blocked updates may proceed (lines 87-89). This mechanism essentially sequentializes all insertions.

Theorem. The DIRS above is correct; moreover: (1) the stretch factor is 1; (2) the space required for the routing tables is at most O(log N) bits; and (3) the amortized adaptation cost per update is O(n) messages of O(log N) bits each.

Algorithm Analysis

Proof. The general correctness of the Routing procedure derives from the fact that the DIRSs are based on the classical IRSs. If the receiver r is not on-line, then there exists a pending or an active processor (the one immediately after r's position) that, by looking at the side the message comes from and at the sender's label, will realize r went off-line. In any case processor 0 is always active. On the other hand, if r is on-line it will eventually receive the message, and in the case where it is pending it will kill it. Therefore all data messages are delivered correctly. Since no pending processor can go off-line without completing its update procedure, update messages are all eventually delivered. To show that the Update procedure is correct it is sufficient to show that (i) at most one processor enters Phase 2 at a time, (ii) all pending processors eventually enter Phase 2 (and thereafter Phase 3, and complete their update), and (iii) at the completion of Phase 3 of an update the routing tables of all active processors are correct (ignoring pending processors).
Proof of (i). Assume by contradiction that two processors x and y both enter Phase 2. Then both completed Phase 1, i.e., both got their Phase 1 messages back. Without loss of generality, assume x < y. The message from y could pass in front of x, but the message from x could not pass in front of y (a Phase 1 processor stops messages from processors in the same phase with a smaller label); hence x's message must have passed y's position before y woke up. But then x entered Phase 2 before y's message reached x, and therefore y's message must have stopped and been stored in x's FIFO queue, contradicting the assumption that y completed Phase 1. The argument generalizes to many pending processors: messages either stop at a processor in Phase 2 or 3, or at a Phase 1 processor with a bigger label. Moreover, only processors in Phase 2 can move to Phase 3. When the winning processor has completed its update it releases the buffered messages (which can only belong to processors in Phase 1), and a new "race" can then start. Also observe that if at least one other processor sent a Phase 1 message, then i holds at least one other processor's Phase 1 message in its buffer, i.e., new updates can start.
Proof of (ii). A pending processor eventually sends a Phase 1 message. All Phase 1 messages travel around the ring in a single direction through FIFO queues. A Phase 1 message can be blocked either by a processor in Phase 2 or 3, or by another Phase 1 processor with a higher identity. In the first case the message is unblocked and makes progress as soon as the blocking processor completes its Phase 3. In the second case the blocking Phase 1 processor eventually enters Phase 2 and then Phase 3, and the message makes progress. Otherwise there would have to be a cycle of Phase 1 processors each blocking the next one's message, which cannot occur since the message from the Phase 1 processor with the highest identity always makes progress.
Proof of (iii). We now show that the routing tables of active processors are correct at the completion of an update. Assume that a processor i wakes up and sends a Phase 1 message M around. If it is the only pending processor, or the one with the largest identity, it will receive back the correct values of its right and left neighbors, since no other processor is in Phase 2 or 3. On the other hand, suppose M gets to another pending processor j with a bigger label. M will stop at j and i will eventually get it back after j's update, but the correct values of i's neighbors might have changed in the meantime. However, every time a processor starts Phases 2 and 3 it updates the values of every active or pending processor x in the ring: the value op(x) of the opposite processor, the values r_x and l_x of the right and left active neighbors, their opposite values op(r_x) and op(l_x), and the evenness values even(x), even(r_x), and even(l_x). Therefore i will have its values updated. Note that Phases 2 and 3 have to be sequentialized: in Phase 2 every processor updates its opposite value, and a Phase 3 message passing by earlier would collect a wrong value. The same holds if other updates start before i gets its Phase 1 message back, in which case the temporary variables are replaced as necessary. Therefore at the end i will be able to choose between old and new values, if both exist, and thus to take into account all the updates that took place in the meantime.
We are now ready to show points (1)-(3) in the statement of the theorem, namely: (1) the stretch factor is 1; (2) the space required for the routing tables is at most O(log N) bits; and (3) the amortized adaptation cost per update is O(n) messages of O(log N) bits.
Proof of (1). This follows immediately from point (iii) above. If at the end of each update the opposite values are correct and the system reaches quiescence, then the stretch factor is obviously 1.
Proof of (2). The space complexity is straightforward, since every processor stores a constant number of values of at most O(log N) bits each.
Proof of (3). Every processor that goes on/off-line generates at most O(n) messages, since the update consists of three phases, in each of which the size of the ring is at most n (as the update occurs between two quiescent states). Messages are O(log N) bits each. Therefore the worst case adaptation cost of any change to the network is O(n) messages of O(log N) bits each.
Finally note that if the system runs synchronously the worst case time complexity of the algorithm, computed between two quiescent states, is O(mn) steps, where m is the number of changes occurring between these two states and n is the maximum number of active or pending processors. The worst case occurs when all m updates are sequentialized as in this case each single update requires O(n) steps.
Scheme with Constant Expected Adaptation Cost
In this section we describe a randomized DIRS that at quiescence routes with expected stretch factor 1 + 1/k and requires, per change in the ring, an expected amortized O(k) messages of O(k log N) bits each, for any integer k ≥ 3. If k is chosen to be constant the expected adaptation cost is constant, with update messages of size O(log N). Achieving a smaller expected stretch factor is possible, but requires more messages of larger size.
The Algorithm
The routing scheme we use is again based upon the classical optimal stretch IRS for the ring. Every node i assigns to the left arc the interval [(i + 1) mod N, op(i)] and to the right arc [(op(i) + 1) mod N, (i − 1) mod N], where op(i) is an estimate of i's true opposite value accurate to within a factor of approximately 1/k. This value is updated with probability inversely proportional to an estimate of the number of active processors in the ring, by sending a constant number of messages of size O(k log N) all the way around the ring. The probability is chosen so that the expected adaptation cost is O(k) and the expected stretch factor is less than 1 + 1/k. We discuss the case where a processor goes on-line in detail; the case where a processor goes off-line is analogous and is only sketched below. Recall that a pending processor cannot go off-line, i.e., it must first complete its update procedure.
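A minimal sketch of the probabilistic trigger, assuming n is the estimate returned in the answer message described below (the function name is ours):

```python
import random

def starts_update(k, n):
    """A pending processor flips a coin with probability of heads
    min{1, 10k/n}; heads triggers a full three-phase update, tails
    means it simply adopts the possibly stale opposite value."""
    return random.random() < min(1.0, 10 * k / n)

# With k = 3 and n = 600 on-line processors, an update is started for
# roughly one change in twenty, so exact recomputation stays rare.
```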
The DIRS consists of two different algorithms: one used to route messages and one used for the update. The Routing procedure is the same as for the linear adaptation cost algorithm (see also Algorithm 3 in the Appendix) with the only difference that in this case op(i) is an estimate of i's true opposite value.
The Update procedure (Algorithm 5 in the Appendix) is more complicated. Informally, the general idea is that at every instant processors store opposite values which may not be perfectly accurate. Exact opposite values are recomputed only after a certain number of network changes, i.e., pending processors flip coins to decide whether or not to start a new update. This implies that the stretch factor may not always be 1, but it remains very small, and that the amortized cost of the updates is not too high. The update strategy is again divided into three phases, and all updates are sequentialized using a mechanism similar to the one presented for Algorithm 4. In Phase 1 a processor computes the number of on-line processors, in Phase 2 it collects a subset of their labels (roughly equally spaced among the on-line processors), and finally in Phase 3 it sends these values around the ring so that every processor can compute from them its new opposite value.
More formally, every pending processor i does the following (see Algorithm 5 in the Appendix): it sends a message R = (1, i) to the next active processor on the left. The first active processor j receiving R knows from the value 1 that it is a request for its n and opposite values (lines 4-5 and 102). Processor j replies by sending to the right an answer message A = (2, i, j, n, op(j)) that contains the requested values n and op(j), and the label 2 to state it is an answering message (lines 102-103). Processor i waits for this message and stores the received values (lines 33-34). It then flips a coin with probability of heads equal to min{1, 10k/n} (line 37). If it gets a tail it replies to all buffered messages and becomes active (lines 97-99). Otherwise, it starts an update similar to that of Algorithm 4 by sending a Phase 1 message U_1 = (3, n_0, n_1, i) that contains the value 3 to state it is a Phase 1 message, a counter n_0 for the number of active processors in Phase 1 and a counter n_1 for the processors pending during Phase 1 (both initialized to 0), and its name i (lines 39-40). The counters are increased by the receiving processors. Note that update messages are always forwarded in the same direction they came from, as in Algorithm 4.
Consider what happens when a single processor starts an update. In this case it gets back the Phase 1 message U_1 containing the number of active processors, n_0, and of pending processors, n_1, i.e., i's new estimate of n is now n_0 + n_1. It then sends a Phase 2 message U_2 = (4, ⌊(n_0 + n_1)/10k⌋, d, V, i) that contains the value 4 to state it is a Phase 2 message, the value ⌊(n_0 + n_1)/10k⌋ that defines the spacing of the labels to take, a counter d (initialized to 0), a vector V of processor labels (initially containing only i), and its value i (line 70). Message U_2 globally collects ⌈(n_0 + n_1)/⌊(n_0 + n_1)/10k⌋⌉ labels of processors that are active or pending during Phase 1. This is obtained by storing in U_2 the value ⌊(n_0 + n_1)/10k⌋ and the counter d, which is increased only by processors that are on-line during Phase 1 (lines 54-61 and 118-123). When d = ⌊(n_0 + n_1)/10k⌋ − 1 the receiving processor adds its label to U_2 and sets d := 0 (lines 55-56 and 119-120). Finally, during Phase 3, i sends an update message U_3 = (5, n, V, i) that contains the value 5 indicating it is a Phase 3 message, its n value, the vector V containing the labels gathered in Phase 2, and its label i (line 82). Whenever an active or pending processor j receives a U_3 message, it updates its op(j) and n values (lines 21-26, 62-67, and 124-129). A pending processor awaiting an answer message A (i.e., the values op(j) and n of its left active neighbor) that receives a Phase 1 message (lines 7-10) is counted in n_1 and gets activated at the end of Phase 3 without flipping a coin (i.e., its values will be updated by some other processor going on-line). A pending processor awaiting an answer message A that receives a Phase 2 (lines 17-20) or Phase 3 (lines 21-26) message waits until the completion of the update, at which point it will have collected the new n and opposite values; it then flips a coin (lines 36-37) and starts a new update if the result is heads (lines 39-99). Whenever it receives its A message it kills it (lines 42-43).
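Offline, the Phase 2 collection amounts to the following (our own sketch; `online` plays the role of the processors found on-line during Phase 1, listed clockwise from the initiator):

```python
# Sketch of Phase 2: starting from the initiator i, every D-th on-line
# processor (D = floor((n0 + n1) / 10k)) appends its label to the vector
# V, so V ends up holding roughly 10k evenly spaced labels.

def collect_labels(online, i, k):
    """online: labels in clockwise order starting at initiator i."""
    nu = len(online)              # n0 + n1 counted in Phase 1
    D = max(1, nu // (10 * k))    # spacing between collected labels
    V, d = [i], 0
    for label in online[1:]:
        d += 1
        if d == D:                # equivalent to the paper's test d = D - 1
            V.append(label)       # before incrementing the counter
            d = 0
    return V

# 40 on-line processors, k = 3: D = 1, so every label is collected.
assert len(collect_labels(list(range(40)), 0, 3)) == 40
```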
The case in which more than one processor sends a Phase 1 message is solved using an ordering of the requests based on the largest label value as in Algorithm 4. In this case the update performed by the "winning" processor acts as an update for all the other processors that entered Phase 1 and it is not necessary for them to continue to Phases 2 and 3 (lines 48-53).
The procedure for a processor i going off-line is analogous to the above. Processor i flips a coin with probability min{1, 10k/n} of heads (where n is the most recent estimate of the size of the ring). If it gets a tail it goes off-line. Otherwise, it begins an update procedure as before. Phase 1 counts the active and pending processors. Assuming i moves to Phase 2, processor labels are collected and, in Phase 3, the opposite values of the processors active and pending during Phase 1 are updated. At the completion of Phase 3, i goes off-line. Processors deciding to go off-line during an update wait until the completion of the update to flip their coin. If more than one processor enters Phase 1, the one with the largest label proceeds to Phase 2 and all such pending processors are updated together. Note that pending processors that are going off-line add −1 during the counting phase (i.e., during the computation of n_0 + n_1).

Theorem. The DIRS above is correct; moreover: (1) the expected stretch factor at quiescence is at most 1 + 1/k; (2) the space required for the routing tables is at most O(log N) bits; and (3) the expected amortized adaptation cost is O(k) messages of O(k log N) bits each.
Algorithm Analysis
Proof. The proof of correctness of the Routing procedure is the same as that for Algorithm 3 given above.
We now show that all pending processors eventually go on/off-line, depending upon the action they request. A processor wishing to become active sends a request message R. Four possible situations may arise: (a) It receives back an answer A to its R message and flips a coin with outcome tails. (Note that since processor 0 is always active, the R message is always received by some active processor.) In this case it immediately becomes active. (b) It receives back an answer A to its R message and flips a coin with outcome heads. In this case it enters Phase 1. By an argument similar to that given for Algorithm 4, at most one processor enters Phase 2 and its update runs to completion. At the completion of the update, all processors that were in Phase 1 at the beginning of the update become active. (c) It does not receive an answer A message but it receives a Phase 1 message. In this case it participates in the update and upon its completion becomes active. (d) It does not receive an answer A message but it receives a Phase 2 or Phase 3 message. In this case it waits until the completion of the update, then flips its coin, and the outcome ends up in case (a) or (b) above. A processor wishing to go off-line waits until any update in progress is completed and then flips its coin. At this point two situations can arise: (a) The coin flip outcome is tails, in which case it goes off-line. (b) The coin flip outcome is heads, in which case it enters Phase 1. As before, a single processor will eventually enter Phase 2 and upon completion inform the processors that they may go off-line.
Proof of (1). We now show that the routing tables of active processors are correct to within an expected stretch factor of 1 + 1/k. Consider the system at a quiescent state and assume that the last update was performed by processor i, i.e., i was the (unique) last processor to enter Phase 2 of the update procedure. Let n_0 be the number of active processors counted by i during its Phase 1, and let n_1 be the number of pending processors going on-line minus the number of pending processors going off-line counted by i during its Phase 1. (Note that active processors that wish to go off-line but have already received i's Phase 1 or 2 messages remain active until after the update is completed, and are counted in n_0.) Let n_2 be the change in the size of the ring since the end of i's Phase 1, i.e., the number of processors that became active minus the number that went off-line between the time that i's Phase 1 message passed by and the quiescent state we are examining. Note that all of these processors flip a coin with probability of heads min{1, 10k/(n_0 + n_1)}, and the result of all of these coin flips is tails; otherwise at least one of them would have initiated an update, a contradiction. Therefore the absolute value of the expected value of n_2 is at most (n_0 + n_1)/10k.
Letting ν = n_0 + n_1 and D = ⌊ν/10k⌋, we note that there are at most λ = ⌈ν/D⌉ labels in V, which we denote by v_0, . . . , v_{λ−1}. To calculate op(x), we let v_j, j ∈ {0, . . . , λ − 1}, be the first processor after x in clockwise order around the ring that belongs to V, if x ∉ V (if x ∈ V we let v_j = x), and we set op(x) = v_{(j+⌊λ/2⌋) mod λ}. We now calculate the minimum distance between x and op(x). The distance between x and op(x) can be considered to be made up of ⌊λ/2⌋ + 1 distances: the first from x to v_j, which is between 0 and D − 1 (0 if x = v_j), and the remainder of length exactly D, except that one may be as small as 1 (if D does not divide ν evenly). Thus, the minimum distance is at least (⌊λ/2⌋ − 1)D + 1.
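In code, the rule just described reads as follows (our own Python rendering; V is assumed to be listed in clockwise order, and the example values are illustrative only):

```python
# Computing the approximate opposite value from the vector V collected
# in Phase 2: find the first label v_j of V at or after x in clockwise
# order, then jump floor(lambda/2) positions further along V.

def approx_op(x, V, N):
    lam = len(V)
    # index of the first element of V reached from x going clockwise
    j = min(range(lam), key=lambda t: (V[t] - x) % N)
    return V[(j + lam // 2) % lam]

# lambda = 4 equally spaced labels in a ring of 40: op(7) is the label
# half of V away from the first collected label after 7, namely 30
# (within distance D of the true opposite, 27).
assert approx_op(7, [0, 10, 20, 30], 40) == 30
```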
In the worst case the longer distance between op(x) and x stays the same (respectively, is increased by n_2), while the shorter distance decreases by n_2, whose expected absolute value is at most ν/10k (respectively, remains the same). Thus the expected stretch is bounded by the ratio of the longer distance to the shorter distance so perturbed, and it is easy to verify that this ratio is bounded by 1 + 1/k for any integer k ≥ 3 (respectively, similar computations give the same bound).
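To see why, one possible reconstruction of the computation (ours, not the paper's displayed formula) proceeds under the simplifying assumption that D divides ν exactly, so that λ = 10k and the shorter distance is exactly ν/2 = 5kD:

```latex
\[
\frac{\text{longer}}{\text{shorter}}
  \;\le\; \frac{\nu/2 + \nu/(10k)}{\nu/2 - \nu/(10k)}
  \;=\; \frac{(5k+1)D}{(5k-1)D}
  \;=\; \frac{5k+1}{5k-1},
\]
and $\frac{5k+1}{5k-1} \le 1 + \frac{1}{k}
\iff k(5k+1) \le (k+1)(5k-1)
\iff 0 \le 3k - 1$,
which holds for every $k \ge 1$; the slack absorbs the rounding terms
ignored above, giving the claimed $1 + 1/k$ for $k \ge 3$.
```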
Proof of (3). We now prove that the expected amortized number of messages sent per update is O(k). Messages are of size O(k log N), since R, A, and Phase 1 messages (U_1) have at most 3 + 4 log N bits, while Phase 2 and 3 messages (U_2 and U_3) have at most 3 + (20k + 3) log N bits, as the number of labels collected in V is at most 20k. Indeed, for n_0 + n_1 ≥ 10k, writing n_0 + n_1 = q·10k + r with q and r integers and r < 10k, we have ⌈(n_0 + n_1)/⌊(n_0 + n_1)/10k⌋⌉ = ⌈(q·10k + r)/q⌉ = 10k + ⌈r/q⌉ ≤ 10k + r < 20k.
The expected number of messages sent per update can be bounded as follows. A pending processor is responsible for sending at most one R message and at most one A message. Moreover, it sends at most two Phase 1 messages, i.e., its own if it flips a coin, or that of some pending processor behind it if it does not, plus the Phase 1 message of the eventual "winner" of the Phase 1 "race", and at most one Phase 2 and one Phase 3 message. After flipping its coin, with probability of heads equal to min{1, 10k/n}, where n is the processor's estimate of the size of the ring determined during the previous successful update, it generates an update that may proceed through all three phases. Let n′ be the number of changes that have occurred in the ring since the value n was determined, including changes occurring up to the point where the processor receives back its "winning" Phase 1 message. The successful update is responsible for a total of 3(n + n′) messages over its three phases. During this period 1 + n′ processors go on-line or off-line. Note that processors arriving during Phase 2 or 3 are responsible for their own messages before they flip their coins.
Therefore the expected amortized cost per update is at most a constant number of messages per change, plus the 3(n + n′) messages of the successful update shared among the 1 + n′ changes of the period; since each change turns up heads with probability at most 10k/n, the expected value of 1 + n′ is at least n/10k, and the expected amortized cost is O(k).
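In outline, the computation behind this bound can be reconstructed as follows (our sketch, not the paper's displayed formula; p denotes the heads probability):

```latex
\[
\text{amortized cost per change}
  \;\le\; O(1) \;+\; \frac{3(n + n')}{1 + n'}
  \;=\; O(1) \;+\; \frac{3(n-1)}{1 + n'} \;+\; 3 ,
\]
where the number $1 + n'$ of changes sharing the successful update is
geometrically distributed with
$\mathbb{E}[1 + n'] \ge 1/p \ge n/(10k)$ for $p = \min\{1, 10k/n\}$,
so the expectation works out to $O(1) + O\!\left(n \big/ (n/10k)\right)
= O(k)$ messages per change.
```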
For the space complexity observe that every processor stores its label, the estimated value of n, and an opposite value all of O(log N ) bits plus some extra variables of O(1) bits. At run time it needs another O(log N ) bits for local computation.
Conclusions
In this paper we have considered the problem of routing in an asynchronous dynamically changing ring of processors. We introduced a new technique, Dynamic Interval Routing, and applied it to the ring. We presented three algorithms for rings of maximum size N: the first two are deterministic, one with adaptation cost zero but worst case stretch factor ⌊N/2⌋, the other with worst case adaptation cost O(N) messages of O(log N) bits and stretch factor 1. The third is a randomized algorithm that uses update messages of size O(k log N), has adaptation cost O(k), and expected stretch factor 1 + 1/k. All schemes require O(log N) bits per node for storing the routing information and all messages have headers of size O(log N) bits.
Observe that the techniques introduced can easily be extended to the case of rings of rings networks. It remains an open problem to study whether the tradeoffs established by our randomized algorithm hold in the deterministic setting, and to see to which other topologies the above techniques can be applied. It would also be interesting to find tight lower bounds for the problem.