A synchronizer is a compiler that transforms a program designed to run in a synchronous network into a program that runs in an asynchronous network. The behavior of a simple synchronizer, which also represents a basic mechanism for distributed computing and for the analysis of marked graphs, was studied in [ER1] and [ER2] under the assumption that message transmission delays and processing times are constant. In this paper we study the behavior of the simple synchronizer when processing times and transmission delays are random. Our main performance measure is the rate of a network, i.e., the average number of computational steps executed by a processor in the network per unit time. We analyze the effect of the topology and of the probability distributions of the random variables on the behavior of the network. For random variables with exponential distribution we provide tight (i.e., attainable) bounds and study the effect of a bottleneck processor on the rate.
INTRODUCTION
Consider a network of processors which communicate by sending messages along communication links. The network is synchronous if there is a global clock whose beats are heard by all the processors simultaneously, and the time interval between clock beats is long enough for all messages to reach their destinations and for local computational steps to be completed before the clock beats again. The network is asynchronous if there is no global clock, and the transmission times of messages are unpredictable.
In general, a program designed for a synchronous network will not run correctly in an asynchronous network. Instead of designing a new program for the asynchronous network, it is possible to use a synchronizer [A1], i.e., a compiler that converts a program designed for a synchronous network so that it runs correctly in an asynchronous network. Synchronizers provide a useful tool because programs for synchronous networks are easier to design, debug and test than programs for asynchronous networks. Furthermore, an important use of synchronizers is the design of more efficient asynchronous algorithms [A2]. The problem of designing efficient synchronizers has been studied in the past (e.g., [A1], [AP90], [PU89]).
The (worst case) time complexity of a distributed algorithm is usually computed assuming that processing times and message transmission delays are equal to some constant which represents an upper bound on these durations. The goal of this paper is to study the effect of random processing times and transmission delays on the performance of synchronous programs running in an asynchronous network under the control of a simple synchronizer. We compare the results with the deterministic case [ER1], [ER2], in which processing times, as well as message delays, are constant (or bounded).
The operation of the synchronizer is as follows: Each processor waits for a message to arrive on each of its incoming links before performing the next computational step. When a computational step is completed (after a random time), the processor sends one message on each of its outgoing links. The implementation of this synchronizer may require, for instance, that every message is followed by an end-of-message marker, even if the message is empty. These end-of-message markers model the flow of information that must exist between every pair of processors connected by a link in each computational step [A2]. This is how a processor knows whether it must wait for a message that was sent to it, or whether no message was sent.
We use this synchronizer in our analysis since it is very simple, yet it captures the essence of the synchronizer methodology, i.e., it ensures that a processor does not initiate a new phase of computation before knowing that all the messages sent to it during the previous phase have already arrived. Moreover, the synchronizer is equivalent to a marked graph (e.g., [CHEP]) in which the initial marking has one token per edge. In [Ra91] and [RM92] the relationship between synchronizers and marked graphs is studied, and it is shown how the simple synchronizer can model the behavior of any marked graph, of the synchronizers of [A1], and of distributed schedulers in [BG89], [MMZ88]. Thus, our work is closely related to problems in stochastic Petri nets, where, due to the huge size of the state space, the solution techniques often rely on simulation (e.g., [M1], [M2], [Ma89]).
Many distributed protocols are based on this simple synchronizer. Examples include the snapshot algorithm [CL85], clock synchronization algorithms (e.g., [BS88], [OG87]), the synchronizers of [A1], the distributed schedulers in [BG89], [MMZ88], and the optimistic synchronizer [GRST92]. The synchronizer is similar to synchronizer $\alpha$ in [A1], but can also be used in directed networks, as opposed to other synchronizers suggested in [A1] that require all links to be bidirectional. In [ER1] and [ER2] the benefits of using the synchronizer as an initialization procedure are described.
Main Results
This paper is devoted to the performance analysis of strongly connected directed networks controlled by the simple synchronizer, in which transmission delays, as well as the time it takes a processor to complete a computational step, are random variables. Our main performance measure is the rate of computation $R_v$, i.e., the average number of computational steps executed by a processor in the network per unit time. To facilitate the presentation, we first assume that the transmission delays are negligible, and only at the end of the paper describe how to extend the results to networks with non-negligible delays.
In Section 3 we study the case in which the random variables have general probability distributions. We consider two approaches. First (Section 3.1) we analyze the effect of the topology on the rate. We use stochastic comparison techniques to compare the rates of networks with different topologies. We give examples of networks with different topologies, but with the same rate. Then (Section 3.2) we analyze networks with the same topology but different processing times. By defining a partial order on the set of distributions, we show that deterministic (i.e., constant) processing times maximize the rate of computation. For this case, it is shown in [ER1] that if the processing times are equal to $\lambda^{-1}$, the rate of the network is $\lambda$, regardless of the number of processors in the network or its topology. In the next section we show that when the processing times are random and unbounded, the rate may be degraded by a logarithmic factor in the number of processors. This occurs in the case of exponentially distributed processing times. However, in this section we show that the exponential is the worst among a large and natural class of distributions (it yields the minimum rate within that class).
In Section 4 we concentrate on the case of processing times that are exponentially distributed random variables with mean $\lambda^{-1}$. We prove that the rate is between $\lambda / (4 \log(\Delta + 1))$ and $\lambda / \log(\delta + 1)$, where $\Delta$ ($\delta$) is the maximum (minimum) vertex in-degree or out-degree. Hence, for regular-degree (either in- or out-degree) networks, the rate is $\Theta(\lambda / \log(\Delta + 1))$. We compute the exact rate and the stationary probabilities for the extreme cases of a directed cycle and a complete graph. Finally, we study the effect of having one processor that runs slower than the rest of the processors, and we show that, in some sense, the directed cycle network is more sensitive to such a bottleneck processor than a complete network.
In the last section we show that it is easy to extend the results to networks with non-negligible transmission delays. We consider the exponential distribution case, and show that adding transmission delays to a regular-degree network may reduce its rate by at most a constant factor, provided that they are not larger (w.r.t. the partial order) than the processing times. In networks with processing times exponentially distributed with mean 1, and larger delays with mean $\lambda^{-1}$, we compare the results with those of [ER2], where it was shown that for the corresponding deterministic case the rate is $\lambda$. In the probabilistic case of a regular-degree network, the rate is at least $\Omega(\lambda / \log \Delta)$. Thus, in both cases (small and large delays), the rate of a bounded-degree network is reduced only by a constant factor.
Previous Work
There exist several results related to our results in Section 3.2 in the literature on stochastic Petri nets. For instance, dominance results for rather general stochastic Petri nets have been obtained in [Ba89] and more recently in [BL91] by using Subadditive Ergodic Theory (e.g., [K73]). It should be noted, however, that the proofs we provide for the simple synchronizer are different and much simpler, and do not require heavy mathematical tools. Other stochastic ordering studies exist. Papers on acyclic networks and fork-join queues are [PV89] and [BM89, BMS89, BMT89], respectively. For closed queueing networks, the effect of increasing the service rate of a subset of stations, for systems in which the distribution of the number of jobs in each station has a product form solution, is studied in [SY86].
A model similar to our model in Section 4 is considered in [BT89], where it is claimed that the rate is $\Theta(1/\log \delta_{out})$, for regular networks with out-degree equal to $\delta_{out}$, identically exponentially distributed transmission delays with mean 1, and negligible processing times.
In [BS88] only a lower bound of $\Omega(1/\log \delta_{in})$ on the rate is given, for regular networks with in-degree equal to $\delta_{in}$, with negligible transmission delays, and identically exponentially distributed processing times. Recently, it has been shown in [BK91] that subadditive ergodic theory can be used to derive more general lower bounds on the rate. A bottleneck problem related to ours has been considered in [B88], where an asymptotic analysis of cyclic queues as the number of customers grows is presented. Asymptotic performance of stochastic marked graphs as the number of tokens grows is studied in [M2]. The class of networks with exponentially distributed processing times belongs to the more general model of stochastic Petri nets (see [Ma89] for a survey), where it is usually assumed that the state space (of exponential size, in our case) is given.
THE MODEL
The network is modeled by a (finite) directed, strongly connected graph $G(V, E)$, where $V = \{1, 2, \ldots, n\}$ is the set of vertices of the graph and $E \subseteq V \times V$ is the set of directed edges. A vertex of the graph corresponds to a processor that is running its own program, and a directed edge $u \to v$ corresponds to a communication link from processor $u$ to processor $v$. In this case, we shall say that $u$ is an in-neighbor of $v$, and $v$ is an out-neighbor of $u$ in the network. The processors communicate by sending messages along the communication links. To facilitate the presentation, we assume that the message transmission delays are negligible. At the end we briefly discuss the case of non-negligible transmission delays.
Initially, all processors are in a quiescent state, in which they send no messages and perform no computations. Once a processor leaves the quiescent state, it never reenters it and is considered awake. When awakened, each processor operates in phases as described in the sequel. Assume that at an arbitrary time, $t(v)$, processor $v$ leaves the quiescent state and enters its first processing state, $PS_0$ (this may be caused by a message from another processor, or a signal from the outside world, not considered in our model). Then, processor $v$ remains in $PS_0$ for $\tau_0(v)$ units of time and then transits to its first waiting state, $WS_0$. From this time on, let $PS_k$ and $WS_k$, $k \ge 0$, denote the processing state and the waiting state, respectively, for the $k$-th phase. Observe that we are concerned with the rate of computation of the network; the nature of the computation is of no concern to us here. Thus we take the liberty of denoting with the same symbol the $k$-th processing state of all the processors.
The transition rules between states are as follows: If a processor $v$ transits from state $PS_k$ to $WS_k$, it sends one message on each of its outgoing edges. These messages are denoted by $M_k$. Note that this labeling is not needed for the implementation of the protocol; it is used only for its analysis. When $v$ sends the $M_k$ messages, we say that $v$ has completed its $k$-th processing step.
If a processor $v$ is in state $WS_k$, and has received a message ($M_k$) on each of its incoming edges, it removes one message from each of its incoming edges, transits to state $PS_{k+1}$, remains there for $\tau_{k+1}(v)$ units of time and then transits to state $WS_{k+1}$. Otherwise (if on at least one incoming edge $M_k$ has not yet arrived), processor $v$ remains in state $WS_k$ until it receives a message from each of its in-neighbors, and then operates as described above.
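The processing times, $\tau_k(v)$, correspond to the time it takes for processor $v$ to complete the $k$-th computation step. The processing times $\tau_k(v)$, $k \ge 0$, $v \in V$, are positive, real-valued random variables defined over some probability space. Let $t_k(v)$ denote the time at which processor $v$ completes its $k$-th processing step and sends $M_k$, and let $IN(v)$ denote the set of in-neighbors of $v$. After $v$ sends $M_k$, it waits until all processors with an edge to it send message $M_k$, and then starts its $(k+1)$-st computation step; that is, after the maximum of $t_k(u)$, $u \in IN(v)$, it starts the $(k+1)$-st computation step, which takes $\tau_{k+1}(v)$ units of time, and then sends out $M_{k+1}$. So that this maximum also accounts for the completion time $t_k(v)$ of $v$ itself, we shall assume in the rest of the paper that for each vertex $v$ the edge $v \to v$ is in $E$. The evolution of the network can be described by the following recursions:

$$t_0(v) = t(v) + \tau_0(v), \qquad t_{k+1}(v) = \max_{u \in IN(v)} t_k(u) + \tau_{k+1}(v), \quad k \ge 0. \qquad (1)$$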
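The evolution rule described above (wait for $M_k$ on every incoming edge, then process for a random time) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `completion_times` and the dictionary layout are assumptions made here, and the self-loop $v \to v$ is represented by listing $v$ among its own in-neighbors.

```python
from typing import Dict, List, Sequence


def completion_times(
    in_nbrs: Dict[int, List[int]],
    tau: Sequence[Dict[int, float]],
    wake: Dict[int, float],
) -> List[Dict[int, float]]:
    """Return t[k][v], the time at which processor v completes step k.

    in_nbrs[v] lists the in-neighbors of v (v itself included, i.e. the
    self-loop v -> v is assumed); tau[k][v] is the k-th processing time
    of v; wake[v] is the wake-up time t(v).
    """
    # Step 0 starts at the wake-up time and lasts tau[0][v].
    t = [{v: wake[v] + tau[0][v] for v in in_nbrs}]
    for k in range(1, len(tau)):
        prev = t[-1]
        # Step k starts once every in-neighbor has completed step k - 1.
        t.append({
            v: max(prev[u] for u in in_nbrs[v]) + tau[k][v]
            for v in in_nbrs
        })
    return t


# Directed 3-cycle 0 -> 1 -> 2 -> 0 with self-loops, unit processing times.
ring = {0: [0, 2], 1: [1, 0], 2: [2, 1]}
unit = [{v: 1.0 for v in ring} for _ in range(4)]
times = completion_times(ring, unit, {v: 0.0 for v in ring})
```

With constant unit processing times and simultaneous wake-up, every processor completes step $k$ at time $k + 1$, which is the deterministic behavior the random case is compared against.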
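It is interesting to note that the completion times $t_k(v)$ have a simple graph theoretic interpretation. For a path $P_k = v_0 \to v_1 \to \cdots \to v_k$ of length $k$, let $T(P_k) = t(v_0) + \sum_{i=0}^{k} \tau_i(v_i)$, and let $S_k(v)$ denote the set of paths of length $k$ ending at $v$.

Theorem 2.1 For every $v \in V$, $k \ge 0$, $t_k(v) = \max T(S_k(v))$.

Proof: Assume that the Theorem holds for $k \ge 0$. From the recursion above we have that $t_{k+1}(v) = \max_{u \in IN(v)} t_k(u) + \tau_{k+1}(v)$.
By the inductive hypothesis, $t_k(u) = \max T(S_k(u))$ for every $u \in IN(v)$, so that $t_{k+1}(v) = \max_{u \in IN(v)} \max T(S_k(u)) + \tau_{k+1}(v) = \max T(S_{k+1}(v))$,
which gives the desired result.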
The Performance Measures
The most important performance measures investigated in this paper are the completion times $t_k(v)$, $k \ge 0$, $v \in V$. A related performance measure of interest is the counting process $N^G_t(v)$ (or simply $N_t(v)$) associated with processor $v$, defined by $N_t(v) = \sup \{k : t_k(v) \le t\}$; that is, $N_t(v)$ is the number of computation steps (minus 1) completed by $v$ up to time $t$, or the highest index of an $M_k$ message that has been sent by $v$ up to time $t$. Similarly, $N_t = \sum_{v=1}^{n} N_t(v)$ denotes the total number of processing steps (minus $n$) executed in the network up to time $t$. The following claim (Claim 2.2) indicates that no processor can advance (in terms of executed processing steps) too far ahead of any other processor. The computation rate of processor $v$ is defined by $R(v) = \lim_{t \to \infty} N_t(v)/t$,
whenever the limit exists. Similarly, the computation rate of the network is defined by $R = \lim_{t \to \infty} N_t / t$. Claim 2.2 implies that for every $u, v \in V$, $R(u) = R(v)$, and therefore $R = n R(v)$.
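As a rough illustration of the rate as a limit of steps completed per unit time, the following Python sketch estimates it by simulation for exponentially distributed processing times. The function name `estimate_rate`, the seed, and the choice of returning `steps / t` for one fixed processor are assumptions made for this sketch; all wake-up times are taken to be 0 and each vertex lists itself among its in-neighbors.

```python
import random
from typing import Dict, List


def estimate_rate(in_nbrs: Dict[int, List[int]], lam: float,
                  steps: int, seed: int = 0) -> float:
    """Estimate the rate of one processor as (steps completed) / (elapsed time)."""
    rng = random.Random(seed)
    # Completion times of step 0 (wake-up times assumed to be 0).
    t = {v: rng.expovariate(lam) for v in in_nbrs}
    for _ in range(steps):
        # Each processor waits for all in-neighbors, then works for an
        # independent exponential time with mean 1 / lam.
        t = {v: max(t[u] for u in in_nbrs[v]) + rng.expovariate(lam)
             for v in in_nbrs}
    first = next(iter(in_nbrs))
    return steps / t[first]


# Two processors connected in a directed cycle, with self-loops.
two_cycle = {0: [0, 1], 1: [1, 0]}
rate_two = estimate_rate(two_cycle, lam=1.0, steps=20000)
```

Because all processors advance in lockstep up to a bounded lag, the estimate does not depend much on which processor is observed; for the two-processor cycle it should settle near the value 2/3 of the potential rate quoted later in the paper.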
GENERAL PROBABILITY DISTRIBUTIONS
In this section we compare the performance of different networks, with general distributions of the processing times $\tau_k(v)$. We first show that adding edges to a network with an arbitrary topology slows down the operation of each of the processors in the network. We show how the theory of graph embedding can be used to compare the rates of different networks. As an example we present graphs which have the same rate (up to a constant factor) for general distributions, although they have different topologies. Finally, we compare networks with the same (arbitrary) topology but different distributions of the processing times. Specifically, we show that determinism maximizes the rate, and exponential distributions minimize the rate, among a large class of distributions.
Topology of the Network

Monotonicity
Here we show that adding edges to a network with an arbitrary topology slows down the operation of each of the processors in the network. The basic methodology used is sample path comparison; that is, we compare the evolution of message transmissions in different networks for every instance, or realization, of the random variables $\tau_k(v)$. This yields a stochastic ordering between various networks [Ro83], [S84].
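The sample path comparison can be illustrated numerically: fix one realization of the processing times, run it on a graph and on the same graph with extra edges, and compare completion times processor by processor. The sketch below is only an illustration under assumptions made here (common wake-up time 0, a 4-cycle with added chords, the helper name `run`); it is not the proof.

```python
import random
from typing import Dict, List, Sequence


def run(in_nbrs: Dict[int, List[int]],
        tau: Sequence[Dict[int, float]]) -> Dict[int, float]:
    """Completion times of the last step for one realization tau."""
    t = {v: tau[0][v] for v in in_nbrs}
    for k in range(1, len(tau)):
        t = {v: max(t[u] for u in in_nbrs[v]) + tau[k][v] for v in in_nbrs}
    return t


rng = random.Random(1)
nodes = range(4)
# One realization of the processing times, shared by both networks.
tau = [{v: rng.expovariate(1.0) for v in nodes} for _ in range(50)]

# G: directed 4-cycle with self-loops; H: G plus the chords (v+2) -> v.
G = {0: [0, 3], 1: [1, 0], 2: [2, 1], 3: [3, 2]}
H = {v: sorted(set(G[v]) | {(v + 2) % 4}) for v in nodes}

t_G, t_H = run(G, tau), run(H, tau)
slower_everywhere = all(t_G[v] <= t_H[v] for v in nodes)
```

Since every maximum in H is taken over a superset of the corresponding set in G, the inequality holds for every realization, which is what makes the ordering a sample path (and hence stochastic) ordering.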
Theorem 3.1 Let $G(V, E)$ be a graph, and let $E' \subseteq V \times V$ be a set of directed edges. Let $H(V, E \cup E')$ be the graph obtained from $G$ by adding the edges $E'$. Assume that processor v,
Embedding
The theory of graph embedding has been used to model the notion of one network simulating another on a general computational task (see, for example, [R88]). Here we show how the notion of graph embedding can be helpful in comparing the behavior and the rates of different networks controlled by the synchronizer.
An embedding of graph $G$ in graph $H$ is specified by a one-to-one assignment $\alpha : V_G \to V_H$ of the nodes of $G$ to the nodes of $H$, and a routing $\rho : E_G \to \mathrm{Paths}(H)$ of each edge of $G$ along a distinct path in $H$. The dilation of the embedding is the maximum amount that the routing "stretches" any edge of $G$: $D = \max_{e \in E_G} |\rho(e)|$, where $|\rho(e)|$ is the length of the path $\rho(e)$.
The dilation is a measure of the delay incurred by the simulation according to the embedding.
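The definition of dilation can be made concrete with a small Python sketch. The function name `dilation`, the dictionary representation of the assignment and routing, and the example graphs are all hypothetical choices made for illustration; the sketch checks that the assignment is one-to-one and that each routed path stays inside H, but not that the paths are distinct.

```python
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def dilation(alpha: Dict[Hashable, Hashable],
             routing: Dict[Edge, List[Hashable]],
             h_edges: Set[Edge]) -> int:
    """Length of the longest routing path, after validating the embedding."""
    if len(set(alpha.values())) != len(alpha):
        raise ValueError("node assignment must be one-to-one")
    longest = 0
    for (u, v), path in routing.items():
        # The path for edge (u, v) must join the images of u and v.
        if path[0] != alpha[u] or path[-1] != alpha[v]:
            raise ValueError(f"path for {(u, v)} has wrong endpoints")
        # Every hop of the path must be an edge of H.
        if any((a, b) not in h_edges for a, b in zip(path, path[1:])):
            raise ValueError(f"path for {(u, v)} leaves H")
        longest = max(longest, len(path) - 1)
    return longest


# Hypothetical example: a directed triangle G embedded in a directed 4-cycle H.
alpha = {"a": 0, "b": 1, "c": 2}
h_edges = {(0, 1), (1, 2), (2, 3), (3, 0)}
routing = {("a", "b"): [0, 1], ("b", "c"): [1, 2], ("c", "a"): [2, 3, 0]}
D = dilation(alpha, routing, h_edges)
```

In this example only the edge from c to a is stretched, over two edges of H, so the dilation is 2.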
The following theorem is a generalization of Theorem 3.1. We are assuming a realization $\tau^G_k(u) = \tau^H_{kD}(\alpha(u))$, for every $u \in V_G$. It follows that for every path $P^G_k$ there exists a path $P^H_{kD}$ such that $T(P^G_k) \le T(P^H_{kD})$.
By Theorem 2.1, $t^G_k(v) \le t^H_{kD}(\alpha(v))$.

Proposition 3.6 For all $n \ge 1$: One can embed the order $n$ Shuffle-Exchange graph in the order $n$ deBruijn graph with dilation 2. One can embed the order $n$ deBruijn graph in the order $n$ Shuffle-Exchange graph with dilation 2.
Proposition 3.7 For all $n \ge 1$: One can embed the order $n$ Cube-Connected-Cycles graph in the order $n$ Butterfly graph with dilation 2. One can embed the order $n$ Butterfly graph in the order $n$ Cube-Connected-Cycles graph with dilation 2.
By Theorem 3.3, the average rates of the graphs of Proposition 3.6 (3.7) are equal up to a constant factor of 2, provided that the processing times of corresponding processors have the same distributions (regardless of what these distributions are).
Probability Distributions
Deterministic Processing Times
Now we compare networks, say $G(V, E)$ and $H(V, E)$, having the same (arbitrary) topology but operating with different distributions of the random variables $\tau_k(v)$. To that end, we assume that the processing times $\tau^G_k(v)$, $k \ge 0$, $v \in V$, are independent and have finite mean $E[\tau^G_k(v)] = \lambda_v^{-1}$.
We say that $\lambda_v$ is the potential rate of $v$, as this would be the rate of $v$ if it did not have to wait for messages from its in-neighbors. The processing times in H are distributed as in G From (1) we have that
By the induction hypothesis,

When all processing times in the network $H$ are deterministic, the computation of the network rate is no longer a stochastic problem, but a combinatorial one. Thus, a conclusion of Theorem 3.8 is that in this case, the computation rate of $H$, obtained via combinatorial techniques ([ER1] and [ER2]), yields an upper bound on the average rate of $G$. Furthermore, if the times $t^H_k(v)$ are computed, they give a lower bound on $E[t^G_k(v)]$, for every $k \ge 0$.
More Variable Processing Times
More generally, we study the effect on the rate of the network of replacing a random variable in the network (e.g., the processing time of a given processor, for a given computational step) that has a given distribution by a random variable with another distribution, and we define an ordering among probability distributions.
Recall that a function $h$ is convex if for all $0 < t < 1$ and all $x_1$, $x_2$, $h(t x_1 + (1 - t) x_2) \le t\,h(x_1) + (1 - t)\,h(x_2)$. A random variable $X$ with distribution $F_X$ is said to be more

Here we compare networks, say $G(V, E)$ and $H(V, E)$, having the same arbitrary topology, but in which some of the processing times in $G$ are more variable than the corresponding processing times in $H$, i.e., for some $k$'s and some $v$'s, G

In the next section we show that if the processing times are independent and have the same exponential distribution with mean $\lambda^{-1}$, then the rate of any network is at least $|V| / \log |V|$.
We conclude this subsection by characterizing a set of distributions for which the same lower bound holds.
Assume that the expected time until a processor finishes a processing step, given that it has already been working on that step for $a$ time units, is less than or equal to the original expected processing time for that step. Namely, we assume that the distributions of the processing times $\tau_k(v)$, for all $v \in V$, $k \ge 0$, are new better than used in expectation (NBUE) (e.g., [Ro83], [S84]), so that if $\tau$ is a processing time, then $E[\tau - a \mid \tau > a] \le E[\tau]$, $\forall a \ge 0$. Let $G_d(V, E)$ be a network with deterministic processing times, let $G_e(V, E)$ be a network with corresponding processing times with the same mean, but independent and exponentially distributed, and let $G(V, E)$ be a network with corresponding processing times with the same mean and independent, but with any NBUE distribution. The following theorem follows from the fact that the deterministic distribution is the minimum, while the exponential distribution is the maximum, with respect to the ordering $\le_c$, among all NBUE distributions [Ro83], [S84].
Theorem 3.11 For every $v \in V$, $k \ge 0$, it holds that $t^{G_d}_k(v) \le E[t^{G}_k(v)] \le E[t^{G_e}_k(v)]$.
Some examples of distributions which are less variable than the exponential (with appropriate parameters) are the Gamma, Weibull, Uniform and Normal.
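The NBUE condition $E[\tau - a \mid \tau > a] \le E[\tau]$ can be checked numerically for a given distribution through the identity $E[\tau - a \mid \tau > a] = \int_a^\infty S(x)\,dx / S(a)$, where $S$ is the survival function. The Python sketch below uses a plain trapezoid rule; the function name `mean_residual`, the truncation point `upper`, and the two example distributions (both with mean 1) are assumptions made for illustration.

```python
import math
from typing import Callable


def mean_residual(survival: Callable[[float], float], a: float,
                  upper: float, n: int = 20000) -> float:
    """E[tau - a | tau > a], computed as (integral of S over [a, upper]) / S(a)."""
    h = (upper - a) / n
    total = 0.5 * (survival(a) + survival(upper))
    total += sum(survival(a + i * h) for i in range(1, n))
    return total * h / survival(a)


def exp_survival(x: float) -> float:
    """Survival function of the exponential distribution with mean 1."""
    return math.exp(-x)


def uniform_survival(x: float) -> float:
    """Survival function of the uniform distribution on [0, 2] (mean 1)."""
    return max(0.0, 1.0 - x / 2.0)


# Residual means after a = 0.5 time units of work already done.
exp_residual = mean_residual(exp_survival, 0.5, upper=50.0)
uniform_residual = mean_residual(uniform_survival, 0.5, upper=2.0)
```

The exponential case is the boundary of the NBUE class: its residual mean stays equal to the original mean (memorylessness), whereas for the uniform distribution the residual mean drops to 0.75.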
We conclude this section by pointing out that the interested reader can find similar results for rather general stochastic Petri nets in [Ba89] and [BL91].
EXPONENTIAL DISTRIBUTIONS
In this section we assume that the processing times $\tau_k(v)$, $k \ge 0$, $v \in V$, are independent and exponentially distributed with mean $\lambda^{-1}$. We first consider general topologies and derive upper and lower bounds on the expected values of $t_k(v)$, and thus obtain upper and lower bounds on the rate of the network. These bounds depend on the in-degrees and out-degrees of processors in the network, but not on the number of processors itself. Then, exploring the Markov chain of the underlying process, we derive the exact rates of two extreme topologies: the directed ring and the fully connected (complete) network. For these two topologies we also study the effect of having a single slower processor within the network.

We assume the statement holds for $k \ge 0$, and prove it for $k + 1$. The proof of the basis is identical. Let $v_{k+1}$ be the processor for which the processing time during the $(k+1)$-th computational step is maximum among the out-neighbors of $v_k$, i.e., $\tau_{k+1}(v_{k+1}) = \max \{\tau_{k+1}(u) : v_k \to u \in E\}$.
Upper and Lower Bounds
Since $v_{k+1}$ will not start the $(k+1)$-st computational step before $v_k$ finishes the $k$-th computational step, we have that $t_{k+1}(v_{k+1}) - t_k(v_k) \ge \tau_{k+1}(v_{k+1})$, completing the proof of (i). The proof of part (ii) evolves along the same lines, except that we start from $v_k$ and move backwards along a path.
Remark 5 From its proof, one can see that Lemma 4.1 holds for any distribution $F$ of the processing times for which the expected value $m_c$ of the maximum of $c$ independent r.v. with distribution $F$ exists. In this case it implies that $R_v \le 1/m_c$, with $c = \delta_{out}$ or $c = \delta_{in}$.
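For the exponential distribution the quantity $m_c$ has a standard closed form: the expected maximum of $c$ independent exponentials with mean $\lambda^{-1}$ is $H_c/\lambda$, where $H_c = \sum_{i=1}^{c} 1/i$ is the $c$-th harmonic number. The Python sketch below evaluates the resulting bound $1/m_c$ on the rate; the function names are illustrative assumptions, not notation from the paper.

```python
import math
from fractions import Fraction


def harmonic(c: int) -> Fraction:
    """H_c = 1 + 1/2 + ... + 1/c, computed exactly."""
    return sum((Fraction(1, i) for i in range(1, c + 1)), Fraction(0))


def expected_max_exponential(c: int, lam: float = 1.0) -> float:
    """Expected maximum m_c of c i.i.d. exponentials with mean 1 / lam."""
    return float(harmonic(c)) / lam


def rate_upper_bound(c: int, lam: float = 1.0) -> float:
    """The bound 1 / m_c on the rate, for the exponential case."""
    return 1.0 / expected_max_exponential(c, lam)


# H_c lies between log(c + 1) and 1 + log(c), hence the logarithmic slowdown.
sandwich_ok = all(
    math.log(c + 1) <= float(harmonic(c)) <= 1 + math.log(c)
    for c in range(1, 51)
)
bound_for_two = rate_upper_bound(2)
```

For $c = 2$ the bound is $2\lambda/3$, and the inequality $H_c \ge \log(c + 1)$ is what turns $1/m_c$ into a bound that is logarithmic in the degree.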
Remark 6 Lemma 4.1 implies that for the exponential case, the slowdown of the rate is at least logarithmic in the maximum degree of $G$. By Remark 5, there are distributions (not NBUE, by Theorem 3.11) for which the slowdown is larger; an example is $F(x) = 1 - 1/x^2$, $x \ge 1$, for which the slowdown is at least the square root of the maximum degree of $G$ ([D70], p. 58).

(ii) For every $k \ge 1$, for every processor $v$, $E[t_{k-1}(v)] \le \max_{u \in V} t(u) + \lambda^{-1} \left[ \log |V| + 4 (1 + k \log \Delta_{out}) \right]$.
Proof: Again we restrict ourselves to the proof of part (i). Recall that Theorem 2.1 states that for every $v \in V$, $k \ge 0$, $t_k(v) = \max T(S_k(v))$. Also, for a path $P_k = v_0 \to v_1 \to \cdots \to v_k$, $T(P_k) = t(v_0) + \sum_{i=0}^{k} \tau_i(v_i)$, but for the moment let $t(v) = 0$, for every $v$.
Exact Computations
Theorem 4.3 implies the following bounds for the rate of a directed cycle $C_n$ ($\Delta = \delta = 2$) and of a complete graph $K_n$ ($\Delta = \delta = n$), where $n$ is the number of processors: $0.36\lambda \le R_{C_n}(v) \le \lambda$; $\frac{\lambda}{4 \log n} \le R_{K_n}(v) \le \frac{\lambda}{\log n}$. In this section we shall compute the exact values for the rates of $C_n$ and $K_n$. To that end we consider the Markov chain associated with the network. This Markov chain is denoted by $X(t) = (X_1(t), X_2(t), \ldots, X_m(t))$, where $X_i(t)$ is the number of messages stored in the buffer of edge $i$ at time $t$, and $m$ is the number of edges in the network. Note that a processor with a positive number of messages on each of its incoming edges is in a processing state. When such a processor completes its processing (after an exponential time), one message is deleted from each of its incoming edges and one message is put on each of its outgoing edges. We denote by $s_0$ the state in which $X_i(0) = 1$, $1 \le i \le m$. Thus, the network can be represented as a Marked Graph (e.g., [CHEP]).
The number of states in the Markov chain is finite, say $N$, because a transition of the chain does not change the total number of messages in a circuit in the network. Moreover, if the network is strongly connected, then the Markov chain is irreducible. Therefore, the limiting probabilities $P_i$, $1 \le i \le N$, of the states $s_i$ of the chain exist, they are all positive and their sum is equal to 1 (e.g., [C67], [Ro83]). However, as we shall see, $N$ can be exponential in $n$; therefore it is infeasible to compute the rate by directly solving the Markov chain. Here we show how to solve the Markov chain for two network classes, without having to produce the entire chain. We hope this combinatorial approach could be applied to other networks as well.
Let $G_X$ denote the transition diagram (directed graph) of the Markov chain $X$. Consider a BFS (breadth-first search) tree of $G_X$, rooted at $s_0$. The level $L(v)$ of a vertex $v$ is the distance from $s_0$ to $v$. Thus, $L(s_0) = 0$. Denote by $L_i$, $i \ge 0$, the set of vertices at level $i$, and by $L$ the number of levels of $G_X$.
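For very small networks the transition diagram and its BFS levels can be enumerated directly, which is a convenient way to check state counts. The Python sketch below plays the token game described above (an enabled processor removes one message from each incoming edge and adds one to each outgoing edge), starting from $s_0$. The function name `bfs_levels` and the edge-list representation are assumptions made for illustration, and self-loops are omitted since firing leaves their marking unchanged.

```python
from collections import Counter, deque
from typing import Dict, List, Tuple


def bfs_levels(n: int, edges: List[Tuple[int, int]]) -> Dict[tuple, int]:
    """Map each reachable marking (messages per edge) to its BFS level from s_0."""
    s0 = tuple(1 for _ in edges)
    level = {s0: 0}
    queue = deque([s0])
    while queue:
        state = queue.popleft()
        for p in range(n):
            in_idx = [i for i, (_, v) in enumerate(edges) if v == p]
            if not all(state[i] > 0 for i in in_idx):
                continue  # p is not enabled in this state
            nxt = list(state)
            for i, (u, v) in enumerate(edges):
                if v == p:
                    nxt[i] -= 1  # consume one message per incoming edge
                if u == p:
                    nxt[i] += 1  # emit one message per outgoing edge
            key = tuple(nxt)
            if key not in level:
                level[key] = level[state] + 1
                queue.append(key)
    return level


cycle3 = [(0, 1), (1, 2), (2, 0)]
complete3 = [(u, v) for u in range(3) for v in range(3) if u != v]
cycle_levels = bfs_levels(3, cycle3)
complete_levels = bfs_levels(3, complete3)
complete_sizes = dict(Counter(complete_levels.values()))
```

For three processors this yields 10 states for the directed cycle and 7 states for the complete graph; the enumeration grows quickly with $n$, which is why the paper avoids producing the entire chain.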
A Simple Directed Cycle
We study the performance of a simple directed cycle of $n$ processors, $C_n = p_1 \to p_2 \to \cdots \to p_n \to p_1$. It is not difficult to observe that the Markov chain associated with $C_n$ corresponds to that of a closed queueing network; we return to this approach later. Here we choose to use a combinatorial approach.

Theorem 4.4 (i) All the states of the Markov chain associated with $C_n$ have the same limiting probability. (ii) For any graph $G$ which is not a simple directed cycle, (i) does not hold.

Proof: (i) The proof follows from two observations. First, by symmetry, all the states in one level have the same probability. Second, the in-degree of any state in the transition diagram is equal to its out-degree. Then a simple inductive argument can be used to prove part (i).
(ii) If $G$ is not a cycle, then it has a node $v$ such that $d_{in}(v) > 1$. Let $v_1$, $v_2$ be two nodes with edges to $v$. Consider the state $s$, reached from $s_0$ by the processing completion (or, in marked graph terminology, firing) of vertex $v$. The out-degree of $s$ is equal to $n - 1$, because apart from $v$, all vertices are still enabled. But the in-degree of $s$ is at most $n - 2$, because it is not possible to reach $s$ by the firing of $v_1$ or of $v_2$, since in $s$ there are no messages on the edges from $v_1$ and $v_2$ to $v$. Therefore, we have proved that $d_{in}(s) \ne d_{out}(s)$.
Consider the balance equation that holds at state $s$: $P_1 + P_2 + \cdots + P_k = (n - 1) P_s$, where $P_i$, $1 \le i \le k$, are the limiting probabilities of the states that have an edge to $s$, $k = d_{in}(s)$, $P_s$ is the limiting probability of $s$, and $n - 1 = d_{out}(s)$. We have just proved that $k \ne n - 1$. It follows that it is not possible that all the probabilities in the last equation are equal.
The next theorem states that each processor of $C_n$ works at least at half of its potential rate $\lambda$, regardless of the value of $n$.

Proof: If $M$ is the number of states in which at least one message is in the edge going into a processor, say $v$, then the running rate will be $M/N$ times the expected firing rate. This is because $v$ is enabled whenever it has more than 0 messages on its input edge, and since all states have the same probability (Theorem 4.4), the fraction of the time that $v$ is enabled is simply $M/N$.
The number of ways of putting $n$ objects in $k$ places is $P(n, k) = \frac{(n + k - 1)!}{n!\,(k - 1)!}$. It is not difficult to see that $N = P(n, n)$ and $M = N - P(n, n - 1)$. Thus, $\frac{M}{N} = 1 - \frac{n - 1}{2n - 1}$, which gives the desired result.
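The counting argument can be verified with exact integer arithmetic. The Python sketch below computes $N = P(n, n)$ and $M = N - P(n, n - 1)$ with binomial coefficients and checks that $M/N$ simplifies to $n/(2n - 1)$, which is another way of writing $1 - (n - 1)/(2n - 1)$; the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def placements(n: int, k: int) -> int:
    """P(n, k) = (n + k - 1)! / (n! (k - 1)!): ways to put n objects in k places."""
    return comb(n + k - 1, n)


def cycle_enabled_fraction(n: int) -> Fraction:
    """Fraction M / N of equally likely states in which a given processor is enabled."""
    total = placements(n, n)                  # N: all states of the cycle
    blocked = placements(n, n - 1)            # states with an empty input edge
    return Fraction(total - blocked, total)   # M / N


fractions_by_n = {n: cycle_enabled_fraction(n) for n in range(2, 11)}
```

The fraction equals 2/3 for $n = 2$ and decreases toward 1/2 as $n$ grows, in line with the statement that each processor of the cycle works at least at half of its potential rate.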
A Complete Graph
Let $K_n$ be a complete graph with $n$ processors. Recall that $N$ is the number of states in the associated Markov chain, and let $s_0$ be the state in which each edge has one token. A state is at level $l$, $0 \le l \le n - 1$, if it can be reached from $s_0$ by the firing of $l$ processors. The limiting probability of a state at level $l$ is denoted by $P(l)$.
Theorem 4.6 The rate of a processor in $K_n$ is $\lambda \Big/ \sum_{i=1}^{n} \frac{1}{i}$.
Proof: A simpler proof can be derived as in the proof of Theorem 4.10; here we give a combinatorial proof which also yields the number and the limiting probabilities of the states of the associated Markov chain.
We consider a Markov chain $T$, similar to the Markov chain associated with network $K_n$. The root of $T$, $s_0$, is the state with a message in each edge. A state $s$ will have one son for each one of the enabled processors at state $s$; a son of $s$ corresponds to the state reached from $s$ by the firing (completion of a processing step) of one of the enabled processors in state $s$. Note that in chain $T$ there are several vertices corresponding to the same state of the chain associated with $K_n$.
In $T$, the number of states in level $l$ is $n!/(n - l)!$, because each time a processor fires it cannot fire again until the rest of the processors have fired. Thus, the number $N_T$ of states in $T$ is $N_T = \sum_{l=0}^{n-1} \frac{n!}{(n - l)!} = \sum_{i=1}^{n} \frac{n!}{i!}$.

The number of states in which a given processor is enabled at level $l$, $en(l)$ (edges from level $l$ to level $l + 1$), is $en(l) = \frac{1}{n} \cdot \frac{n!}{(n - l - 1)!}$, because at level $l$ there are $n!/(n - l - 1)!$ enabled processors, and by symmetry, each processor is enabled the same number of times at each level.
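Let us denote by $P^T_l$ the limiting probability of a state of $T$ in level $l$. One can show that $P^T_l = (n - l - 1)!/K$, where $K = n! \sum_{i=1}^{n} \frac{1}{i}$ is the normalization constant.
It follows that the fraction of time that a processor is enabled is $ut = \sum_{l=0}^{n-1} ut(l)$,
where $ut(l) = en(l) P^T_l$, and its rate is $\lambda \cdot ut$.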
Corollary 4.7 For a network $K_n$, $N = 2^n - 1$ and $P_l = \frac{l!\,(n - l - 1)!}{n! \sum_{i=1}^{n} \frac{1}{i}}$.
Proof: As noted before, it may be that two states of $T$ correspond to the same state, say $s$, of $K_n$. In fact, if a state of $T$ is reached from $s_0$ by firing a sequence of processors of length $k$, then all $k!$ permutations of the processors in this sequence constitute a valid firing sequence, which leads to the same state $s$. Thus, the limiting probability of a state $s$ at level $l$ is $P_l = l!\,P^T_l = \frac{l!\,(n - l - 1)!}{n! \sum_{i=1}^{n} \frac{1}{i}}$.
The number of different states at level $l$ is $n!/(l!\,(n - l)!)$, and the total number of different states is $\sum_{l=0}^{n-1} \frac{n!}{l!\,(n - l)!} = 2^n - 1$.
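The formulas of Corollary 4.7 can be checked with exact rational arithmetic. The Python sketch below verifies that the probabilities $P_l$, weighted by the number of states per level, sum to 1, and computes the fraction of time a given processor is enabled. The count of $\binom{n-1}{l}$ level-$l$ states in which a fixed processor has not yet fired is a derivation made for this sketch rather than a formula quoted from the paper, as are the function names.

```python
from fractions import Fraction
from math import comb, factorial


def harmonic(n: int) -> Fraction:
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))


def state_probability(n: int, l: int) -> Fraction:
    """P_l = l! (n - l - 1)! / (n! H_n) for a state at level l."""
    return Fraction(factorial(l) * factorial(n - l - 1), factorial(n)) / harmonic(n)


def total_probability(n: int) -> Fraction:
    """Sum of P_l over all states: C(n, l) states at each level l."""
    return sum(comb(n, l) * state_probability(n, l) for l in range(n))


def enabled_fraction(n: int) -> Fraction:
    """Probability that a fixed processor is enabled (has not fired yet)."""
    return sum(comb(n - 1, l) * state_probability(n, l) for l in range(n))


def number_of_states(n: int) -> int:
    """Total number of states, summing C(n, l) over levels 0..n-1."""
    return sum(comb(n, l) for l in range(n))
```

Under these assumptions the enabled fraction comes out as $1/H_n$, so a processor running at potential rate $\lambda$ completes steps at rate $\lambda / \sum_{i=1}^{n} \frac{1}{i}$, which is logarithmic in $n$.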
Corollary 4.8 Asymptotically, the rate of any network of $n$ processors is between $n/2$ and $n/\log n$.
Observe that the best possible rate of a processor is $2/3$ of the potential rate, in the case of a cycle of two processors; adding more processors can only lower this rate, but not below $1/2$. Yet, the rate of the network grows linearly with $n$. In the case of a complete graph, the rate of a processor decreases as $n$ grows, but here too the total number of computational steps executed per unit time ($n/\log n$) grows with $n$.
Bottlenecks
Suppose that the potential rate of all processors of a graph is $\lambda$, except for one, which has a lower rate $\mu$. We shall now show that such a bottleneck has a stronger effect in a network which is a directed cycle than in one which is a complete graph.
Consider the case of a simple directed cycle with $n$ vertices, $CB_n$, where $n - 1$ processors have rate $\lambda$ and one processor has rate $\mu$. Using standard techniques of queueing theory, we prove the following.

Proof: Let $X_i$, $1 \le i \le n$, be the number of messages in the buffer of the incoming edge to processor $i$. The total number of messages in the cycle is equal to $n$. Since this is a closed queueing system, the limiting probability of the system being in state $(x_1, x_2, \ldots, x_n)$ is given by the following product form [Ro]:
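$$\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = K \left(\frac{1}{\mu}\right)^{x_n} \prod_{j=1}^{n-1} \left(\frac{1}{\lambda}\right)^{x_j}, \qquad \sum_{j=1}^{n} x_j = n,$$

and is equal to 0 otherwise, where $K$ is a normalization constant that guarantees that the sum of all the above probabilities is equal to 1. Thus, the probability of having $l$ messages on the incoming edge to processor $n$ is

$$\Pr(X_n = l) = \Pr(X_1 + X_2 + \cdots + X_{n-1} = n - l,\ X_n = l) = \sum_{S'} K \left(\frac{1}{\mu}\right)^{l} \left(\frac{1}{\lambda}\right)^{n-l} = |S'|\, K \left(\frac{1}{\lambda}\right)^{n} \rho^{l}, \quad 0 \le l \le n,$$

where $\rho = \lambda/\mu$ and $S' = \{(x_1, x_2, \ldots, x_{n-1}) : x_j \ge 0,\ 1 \le j \le n - 1,\ \sum_{j=1}^{n-1} x_j = n - l\}$. Hence,

$$\Pr(X_n = l) = K' \binom{2n - l - 2}{n - l} \rho^{l}, \quad 0 \le l \le n,$$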
where K 0 is the normalization factor determined by the condition P n l=0 Pr(X n = l) = 1. Now, observe that the rate is simply n 1 ? Pr(X n = 0)], as the rate of the processor is while there are messages in the incoming edge to processor n.
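The closed form in the proof is easy to evaluate numerically. The sketch below computes the distribution of $X_n$ and the resulting network rate $n\mu[1 - \Pr(X_n = 0)]$ with exact fractions. It assumes $n \ge 2$, and the function names are illustrative rather than the paper's.

```python
from fractions import Fraction
from math import comb


def queue_length_distribution(n, rho):
    """Pr(X_n = l) for l = 0..n, proportional to C(2n-l-2, n-l) * rho**l."""
    if n < 2:
        raise ValueError("the cycle needs at least two processors")
    weights = [comb(2 * n - l - 2, n - l) * rho ** l for l in range(n + 1)]
    total = sum(weights)  # this is 1/K', the normalization
    return [w / total for w in weights]


def cycle_rate(n, lam, mu):
    """Network rate n * mu * (1 - Pr(X_n = 0)) of the cycle CB_n."""
    lam, mu = Fraction(lam), Fraction(mu)
    p_empty = queue_length_distribution(n, lam / mu)[0]
    return n * mu * (1 - p_empty)


if __name__ == "__main__":
    # Homogeneous case (lam == mu): per-processor rate as a fraction of mu.
    for n in (2, 5, 50):
        print(n, cycle_rate(n, 1, 1) / n)
    # A bottleneck ten times slower than the other processors.
    print(cycle_rate(5, 10, 1))
```

With $\lambda = \mu$ the per-processor rate evaluates to $n/(2n-1)$ of the potential rate, which is $2/3$ for $n = 2$ and tends to $1/2$, in agreement with the remark following Corollary 4.8.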
Several conclusions can be derived from Theorem 4.9. First, observe that the rate of the cycle cannot exceed $n\mu$; thus the slow processor bounds the rate of the network. Moreover, for a fixed $n$ and a very slow processor ($\mu \to 0$ or $\rho \to \infty$), the rate of the network is
$$n\mu \left[ 1 - \binom{2n-2}{n} \rho^{-n} \right],$$
namely, as $\rho$ increases, the rate approaches its upper bound $n\mu$.
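The limiting expression can be recovered from the distribution of $X_n$ given in the proof by writing out the normalization. The following is a sketch of that step for fixed $n \ge 2$; it is not taken verbatim from the original.

```latex
\Pr(X_n = 0)
  = \frac{\binom{2n-2}{n}}{\sum_{l=0}^{n} \binom{2n-l-2}{n-l}\,\rho^{l}}
  = \binom{2n-2}{n}\,\rho^{-n}\,\bigl(1 + O(\rho^{-1})\bigr)
  \qquad (\rho \to \infty),
```

since the term $l = n$ of the sum equals $\binom{n-2}{0}\rho^{n} = \rho^{n}$ and dominates all the others. Substituting into $n\mu[1 - \Pr(X_n = 0)]$ gives the stated rate.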
Next, we consider the case where the graph is a complete graph $KB_n$. We continue to assume that the rate of $n-1$ processors is $\lambda$ and the $n$-th processor is slower, operating at rate $\mu$. We shall show that, for fixed $\lambda$ and $\mu$, as the number of processors $n$ grows to infinity, the influence of the slow processor diminishes, and in the limit, the rate of the network is the same as that of a network with all processors running at the same rate $\lambda$.
Theorem 4.10 The rate of a processor in $KB_n$ is at least $\dfrac{\lambda}{\rho + \log n}$, where $\rho = \lambda/\mu$.
Proof: Suppose that the network is in state $s_0$ at a given time, and after some time $T_1$ it returns to that state; then after some time $T_2$ it returns again, and so on. Then, $\{T_i;\ i = 1, 2, \ldots\}$ is a sequence of non-negative independent random variables with a common distribution $F$ and expected value $E[T_i]$.
Denote by $N(t)$ the number of events (returns to $s_0$) by time $t$. The counting process $\{N(t),\ t \ge 0\}$ is a renewal process. Therefore, with probability 1,
$$\frac{N(t)}{t} \to \frac{1}{E[T_i]} \quad \text{as } t \to \infty$$
(see, for example, [Ro83]). Moreover, since each time the process returns to $s_0$ each processor of the network has completed exactly one computational step, it follows that the rate of the network is $1/E[T_i]$. We proceed to bound $E[T_i]$.
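The expected time $E[T_i]$ that it takes to return to $s_0$ is of the form
$$\frac{1}{(n-1)\lambda + \mu} + \frac{1}{(n-2)\lambda + \mu} + \cdots + \frac{1}{(n-j)\lambda + \mu} + \frac{1}{(n-j)\lambda} + \frac{1}{(n-j-1)\lambda} + \cdots + \frac{1}{\lambda},$$
for some $1 \le j \le n$, depending on when the slow processor completes a computational step. If the system leaves $s_0$ because the slow processor completed a computational step, $E[T_i]$ is $\theta_1$. In general, if the $j$-th ($1 \le j \le n$) processor to complete a computational step, after leaving $s_0$, is the slow processor, then $E[T_i]$ is $\theta_j$, the value of the above sum for that $j$.
The probability of $E[T_i]$ being equal to $\theta_j$ is not necessarily the same for every $j$, but for the case $\lambda > \mu$ it holds that $\theta_j < \theta_{j+1}$. Thus,
$$\theta_n = \sum_{k=0}^{n-1} \frac{1}{k\lambda + \mu}$$
gives an upper bound on the expected time $E[T_i]$ that it takes to return to $s_0$, and $1/\theta_n$ is a lower bound on the rate of a processor in the network.
We have
$$\theta_n = \frac{1}{\mu} + \sum_{k=1}^{n-1} \frac{1}{k\lambda + \mu} \le \frac{1}{\mu} + \frac{1}{\lambda} \sum_{k=1}^{n-1} \frac{1}{k}.$$
We see that $E[T_i] = O\!\left(\frac{1}{\mu} + \frac{1}{\lambda} \log n\right)$ and thus $R(KB_n)$ is at least
$$\frac{1}{\frac{1}{\mu} + \frac{1}{\lambda} \log n}.$$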
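The conditional return times in this proof can be tabulated directly. The sketch below evaluates the sum for each position $j$ of the slow processor, and compares the resulting lower bound on the rate with the cruder bound obtained by replacing $1/(k\lambda + \mu)$ by $1/(k\lambda)$. The name `theta` and the other function names are hypothetical labels chosen for this illustration; the harmonic-number bound is used here in place of the asymptotic $\log n$.

```python
from fractions import Fraction


def theta(n, j, lam, mu):
    """Expected return time to s_0 when the slow processor is the j-th to finish."""
    lam, mu = Fraction(lam), Fraction(mu)
    # While the slow processor is still running, n-k fast ones run beside it.
    with_slow = sum(1 / ((n - k) * lam + mu) for k in range(1, j + 1))
    # Afterwards only the remaining n-j fast processors are running.
    fast_only = sum(1 / (k * lam) for k in range(1, n - j + 1))
    return with_slow + fast_only


def rate_lower_bound(n, lam, mu):
    """1 / theta_n, the lower bound on the rate used in the proof."""
    return 1 / theta(n, n, lam, mu)


def harmonic_bound(n, lam, mu):
    """Weaker bound 1 / (1/mu + H_{n-1}/lam), H_{n-1} the harmonic number."""
    lam, mu = Fraction(lam), Fraction(mu)
    h = sum(Fraction(1, k) for k in range(1, n))
    return 1 / (1 / mu + h / lam)


if __name__ == "__main__":
    n, lam, mu = 8, 4, 1
    print([float(theta(n, j, lam, mu)) for j in range(1, n + 1)])
    print(float(rate_lower_bound(n, lam, mu)), float(harmonic_bound(n, lam, mu)))
```

For $\lambda > \mu$ the printed list is increasing in $j$, which is the monotonicity the argument relies on when it takes the last position, $j = n$, as the worst case.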
For fixed $\rho$, $R_v$ is $\Omega(\lambda/\log n)$, but observe that the rate of the network cannot exceed $\mu$. However, when the number of processors $n$ increases to infinity, the rate of the network decreases in proportion to $1/\log n$, as if the slower processor were not in the network.
NON-NEGLIGIBLE TRANSMISSION DELAYS
We briefly discuss the case of non-negligible transmission delays. In this model the processing times are random, as before, but the transmission delays are also random. Denote the transmission delay of message $M_k$, $k \ge 0$, along edge $u \to v$, by $\delta_k(u, v)$, and let $\delta_k(u, u) = 0$ for all $u \in V$. It follows that the behavior of the system is described by the recursions t 0 (v) = t(v) + 0 (v)
Note that this system is not equal to the one of [BT89], in which the processing times are negligible and the delays non-negligible, with a self-loop in each processor (to model its processing delay).
Let $P_k = v_0 \to v_1 \to \cdots \to v_k\ (= v)$ be a path of length $k$. It is easy to see how to modify the definition of $T(P_k)$:
Thus a theorem similar to Theorem 2.1 holds, and the corresponding results for general distributions follow.
Consider the case in which the processing times, as well as the transmission delays, are exponentially distributed with the same mean, say 1. It is easy to see that Lemma 4.1 still holds, and that Lemma 4.2 holds up to a factor of 2. Namely, by Theorem 3.9, a regular network with non-negligible delays runs at the same rate as the same network with negligible delays, up to a constant factor, provided that the delays are less than or equal to (in the convex order) the processing times. In [ER1] we show that for a network with negligible delays and deterministic processing times equal to 1, the rate of any network is equal to 1. Thus, in this case, random processing times degrade the rate by at most a logarithmic factor in the maximum degree of a processor. Now, consider the case in which all processing times have mean 1, but the delays have mean $\lambda^{-1}$ greater than 1, both exponentially distributed. The rate in the deterministic case is equal to $\lambda$ [ER2], and thus, by Theorem 3.3, in our case the rate is at most $\lambda$. One can prove (using Proposition D.2) that also in the case of non-negligible delays, the rate is degraded by at most a logarithmic factor in the maximum degree of a processor, with respect to the (optimal) deterministic case, for any NBUE distribution.
As for the exact computations for networks with processing times and delays exponentially distributed with mean 1, the rate of a simple cycle can be computed using the same tools of Queuing Theory that we used in the case of negligible delays. Computing the rate of a complete network $K_n$ is not as straightforward; the structure of the Markov process is more complicated, but by the arguments above, the rate is between $1/(8 \log n)$ and $1/\log n$. However, using the ideas of embedding, let us show that the rate of $K_n$ is at least $1/(4 \log n)$. Let $K'_n$ be a complete network with negligible delays. Construct $G$ from $K'_n$ by inserting one vertex in each of its edges. By Theorem 3.1 (or also 3.9), the rate of any processor in $G$ is at least
$$\frac{1}{\log(n + n(n-1))} = \frac{1}{2 \log n}.$$
One can show that the rate of any processor $v$ in $K_n$ is greater than or equal to half the rate of the corresponding processor in $G$, using the fact that there is an embedding of $K_n$ into $G$ of dilation 2. Therefore, the rate of $v$ in $K_n$ is at least $1/(4 \log n)$.
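Written as a single chain, with $R_v(\cdot)$ denoting the rate of processor $v$ in the given network (notation assumed here for the illustration), the embedding argument reads:

```latex
R_v(K_n) \;\ge\; \tfrac{1}{2}\,R_v(G)
         \;\ge\; \frac{1}{2}\cdot\frac{1}{\log\bigl(n + n(n-1)\bigr)}
         \;=\; \frac{1}{2}\cdot\frac{1}{\log n^{2}}
         \;=\; \frac{1}{4\log n}.
```

The first inequality is the dilation-2 embedding, the second is the bound for $G$, and the equalities use $n + n(n-1) = n^2$, the number of vertices of $G$ when each of the $n(n-1)$ directed edges receives one inserted vertex.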
CONCLUSIONS
In this paper we have studied the behavior of synchronizers in networks with random transmission delays and processing times. We attempted to present a self-contained, general study of the synchronizer performance, from the viewpoint of distributed algorithms, rather than providing a deep mathematical study of the underlying stochastic process. In particular, we were interested in comparing the behavior of synchronizers with random delays with the usual approach of analyzing distributed algorithms with bounded delays. Our main conclusion is that if the delays belong to the natural class of NBUE distributions, the rate of the network is only degraded by a small, local (vertex degree) factor.
We presented several properties of the behavior of the synchronizer for general probability distributions, and described techniques useful to compare the rate of the synchronizer running in networks with different topologies.
For exponential distributions we showed that the expected duration of a round of computation depends on the logarithm of a vertex degree, and hence, the rate of computation does not diminish with the number of processors in the network. We presented techniques to prove upper and lower bounds on the rate, and to obtain exact computations. We hope the combinatorial approach of these techniques, which was applied to rings, complete networks and regular degree networks, will be used in the future to obtain results for other topologies as well.
APPENDIX
The following proposition (similar to p. 672 in [BT89]) is used to prove the lower bounds on the rate of a network.
