In the theory of dissemination of information in interconnection networks (gossiping and broadcasting) one assumes that a message consists of a set of distinguishable, atomic pieces of information, and that one communication pattern is used for solving a task. In this paper, a close connection is established between this theory and a situation in which functions are computed in synchronous networks without restrictions on the type of message used and with possibly different communication patterns for different inputs. The following restriction on the way processors communicate turns out to be essential: (*) "Predictable reception": At the beginning of a step a processor knows whether it is to receive a message across one of its links or not. We show that if (*) holds then computing an n-ary function with a "critical input" (e.g., the OR of n bits) and distributing the result to all processors on an n-processor network G takes exactly as long as performing gossiping in G. Further we study the complexity of broadcasting one bit in a synchronous network, assuming that in one step a processor can send only one message, but without assuming (*), and broadcasting one bit on parallel random-access machines (PRAMs) and distributed memory machines (DMMs) with the ARBITRARY access resolution rule.
Introduction
The purpose of this paper is to demonstrate that the well-established theory of gossiping and broadcasting in interconnection networks can directly be applied to obtain lower bounds for algorithms that compute functions in synchronous networks consisting of processors connected by bidirectional links. In this introductory section, we informally discuss the relevant structures and describe the results. Formal definitions will be given in Section 2.
Gossiping and broadcasting
In gossiping theory one deals with the following basic situation: each node of a network initially has an atomic piece of information; the purpose of a gossiping scheme is to distribute all the information to all the nodes. For this, in rounds, the nodes send each other messages consisting of an arbitrary number of pieces of information. The standard restriction is that in one round a node can communicate with only one of its neighbors. One distinguishes 1-way (or half-duplex) mode, where in a round information can be sent through a link in only one direction, and 2-way (or full-duplex) mode, where in a round two nodes may exchange all their information through a link that connects them. The most intensively studied efficiency criterion in this theory is the number of rounds needed for disseminating all pieces of information to every node. The broadcasting problem is similar, except that only one piece of information, initially located at one node, is to be spread to all others. The accumulation problem is the converse of the broadcast problem: the aim is to collect all pieces of information, initially located at the individual nodes, in one distinguished node.
Since the actual contents of the ideal "pieces of information" are irrelevant for the task, the algorithms considered are communication schemes that do not depend on any input data.
For an account of the history of the area of gossiping and broadcasting and the intensive research devoted to it see the survey [16], the more recent surveys [13, 19], or other articles in the special issue of Discrete Applied Mathematics (vol. 53, 1994). In the fundamental paper [22], which also contains many lower bound arguments, the relevance of the gossip model for real multiprocessor systems is discussed.
Computing functions in processor networks
In this paper we consider the problem of computing functions in networks. The computational model is a network of $n$ processors, $P_1, \ldots, P_n$, that are connected by bidirectional links, according to a network graph $G = (V, E)$. The processor network is to compute an $n$-ary function $f : A_1 \times \cdots \times A_n \to A$, for arbitrary sets $A_1, \ldots, A_n$, and $A$, in the following way. Initially, processor $P_i$ knows the $i$th component $a_i$ of the input $a = (a_1, \ldots, a_n)$; at the end, all processors know the result $f(a)$ ("global output"). (For example, for realizing a synchronization barrier in the network, i.e., in order to find out whether all processors have finished certain subtasks from a set $T_1$ given to them and inform them whether to proceed to another set $T_2$ of subtasks that requires that all activities for $T_1$ are finished, the Boolean function OR must be computed with global output.) We will also consider the situation where only one processor has to know the result ("local output"). The processors work synchronously in lock-step, i.e., in global steps $t = 1, \ldots, T$. In one step, a processor may communicate with at most one of its neighbors. Different models are obtained by allowing only 1-way traffic on a link in a step (half-duplex mode) or 2-way traffic (full-duplex mode).
Fixed versus data dependent communication.
A gossiping scheme consists of a single, fixed communication pattern, whereas an algorithm that computes a function may use different communication patterns for different inputs. It is almost obvious that a gossiping scheme can be used to compute any function with global output (cf. Observation 3.1). In [22] it is stated that lower bounds for gossiping carry over directly to the synchronization problem or, even more generally, to computing any multiple-output function in which all output components depend on all input components, like matrix inversion or computing the discrete Fourier transform of a vector, on arbitrary processor networks. However, although the task of computing a function that depends on all input components with global output bears some resemblance to the gossiping problem, there does not seem to be an obvious way of identifying a gossiping scheme within an arbitrary algorithm for such a function.
To illustrate this, consider the function

$$h : \{0,1\}^n \to \{0,1\}, \quad (a_1, \ldots, a_n) \mapsto \bigvee_{1 \le i \le m} \; \bigwedge_{1 \le j \le m} a_{(i-1)m+j},$$

for some square number $n = m^2$. If $h(a_1, \ldots, a_n) = 1$, it is sufficient to tell all nodes that $a_{(i-1)m+1} = \cdots = a_{im} = 1$ for some $i$; if $h(a_1, \ldots, a_n) = 0$, it is sufficient to inform all nodes that $a_{1,s(1)} = \cdots = a_{m,s(m)} = 0$, for suitable $s(1), \ldots, s(m) \in \{1, \ldots, m\}$, where $a_{i,j}$ abbreviates $a_{(i-1)m+j}$. It is conceivable that there are network algorithms that employ different communication patterns for different inputs to compute this function in fewer steps than the number of steps in a gossiping scheme.
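The certificate argument for $h$ can be checked mechanically on small instances. The following is a minimal Python sketch, not part of the paper; the helper names `h`, `certificate`, and `forces_value` are hypothetical, and positions are 0-based, so block $i$ occupies positions $im, \ldots, (i+1)m - 1$.

```python
from itertools import product

def h(bits, m):
    """OR over the m blocks of the AND of each block (n = m*m bits)."""
    assert len(bits) == m * m
    return int(any(all(bits[i * m + j] for j in range(m)) for i in range(m)))

def certificate(bits, m):
    """Return m positions (0-based) whose values alone force h(bits)."""
    for i in range(m):
        block = range(i * m, (i + 1) * m)
        if all(bits[p] for p in block):
            return list(block)          # an all-ones block proves h = 1
    # h = 0: every block contains a zero; pick the first zero of each block
    return [next(p for p in range(i * m, (i + 1) * m) if bits[p] == 0)
            for i in range(m)]

def forces_value(bits, m, positions):
    """Brute-force check that fixing `positions` fixes the value of h."""
    target = h(bits, m)
    free = [p for p in range(m * m) if p not in positions]
    for values in product((0, 1), repeat=len(free)):
        b = list(bits)
        for p, v in zip(free, values):
            b[p] = v
        if h(b, m) != target:
            return False
    return True
```

For $m = 2$, `certificate((1, 0, 0, 1), 2)` picks one zero per block, and `forces_value` confirms that these two positions already determine the output. The check is exponential in $n$ and is meant only for tiny examples.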
In this paper we give precise sufficient conditions for when one can indeed find a gossiping scheme within a network algorithm, and thus provide a formal version of the statement from [22] mentioned above.
There is an essential restriction on the communication mode in the network, which turns out to be necessary for our results to be true; it can be phrased informally as follows.
(*) "Predictable reception": a processor knows at the beginning of a step whether it is to receive a message across one of its links in this step or not.
That is, a processor must not, in step $t$, wait for a message that may or may not arrive and then proceed differently depending on whether a message arrives. This restriction rules out transferring information by not sending a message in a certain round. The relevance of such a possibility in the context of computations on the parallel random-access machine (PRAM) was observed in [7], and investigated in more depth in [3, 9]. Apart from this restriction, the model is quite general, e.g., a processor may wait for a message from any one of its neighbors without specifying the sender or may make a message available to any neighboring processor that wants to receive it. As will be discussed later, algorithms that observe restriction (*) are suitable to be executed on asynchronous networks as well (Remark 2.9).
Results
It is almost obvious that if a network has a $T$-round gossiping scheme then it can solve the synchronization problem in $T$ steps with 1-bit messages and trivial computation, and, under the assumption that messages may be arbitrarily long and computation is free, it can compute any function with global output in $T$ steps. The main result of this paper essentially is that this is best possible. More specifically, we show the following for processor networks with "predictable reception".
• If an $n$-ary function $f$ that has a critical input $a$, i.e., an input with the property that each of the $n$ components of the input can be changed in such a way that the value of the function changes, can be computed on a processor network $G$ with global output in $T$ steps, then $G$ has a $T$-round gossiping scheme.
• If an $n$-ary function $f$ that has a critical input can be computed on a processor network $G$ in $T$ steps with local output at processor $P_{i_0}$, or a function $f$ that depends on component $a_{i_0}$ can be computed on $G$ in $T$ steps with global output, then $G$ has a $T$-round broadcasting scheme.
These results hold if both in the processor network and in the gossiping network 1-way resp. 2-way communication is assumed. Of course, the idea of the proof is to show that any communication pattern that the network produces on the critical input is in essence a gossiping scheme respectively a broadcasting scheme. Although this is quite intuitive, the result does not seem to be obvious, due to the fact that communication patterns for different inputs may be different. The role played by the restriction "predictable reception" in clearing away these different communication patterns becomes apparent only in the proof.
In Sections 4 and 5 we will consider networks without restriction (*). As for computing functions with global or local output, we shall see that the possibility of using different communication patterns on different inputs actually makes it possible to compute the OR faster than by a gossiping respectively an accumulation scheme. Still, it can be shown that in no network can the speedup be more than a factor of 4.
In Section 5, we will exactly characterize the complexity of broadcasting a bit in a synchronous network with very general communication rules in terms of a variant of broadcasting schemes in the gossiping type model. The only remaining restriction is that in one step a processor may send only one message that may not be duplicated by the communication system.
Applications
The consequences of our results for networks with "predictable reception" are threefold. First, as corollaries we obtain a host of lower-bound results for computing functions in networks of different topologies with algorithms that obey restriction (*), since all lower bounds proved for gossiping and broadcasting carry over. As is common in gossiping theory, many of these bounds are tight. E.g., we obtain the following:
• Computing the OR in 1-way mode with global output on a complete network of $n$ processors takes $1.44\ldots \cdot \log n \pm O(1)$ steps [11, 22, 23, 28].
• Computing the OR in 1-way mode with global output in a ring of $n$ processors takes time $n/2 + \sqrt{2n} \pm O(1)$ [18, 22].
• Computing the OR in 1-way or 2-way mode with global or local output on a butterfly network with $n = k \cdot 2^k$ nodes takes at least $1.7417k$ steps [21].
• Computing the OR in a complete $k$-ary tree of depth $d$ with local output at the root takes exactly $kd$ steps; with global output, $2kd$ steps are necessary and sufficient [19].
Second, it follows that the so-called minimum broadcast subgraphs (cf. [16]) of a network or the trees and schedules used to achieve optimal broadcast times are also the optimal communication patterns for computing functions with a critical input on the corresponding networks, with local output, under restriction (*). The analogous statement for minimum gossip graphs (cf. [16]) also holds.
Third, it has been an objection to upper bounds described in gossiping theory that the model used is somewhat unrealistic in that it allows messages containing an arbitrary number of atomic pieces of information to be sent from one node to another in one round. Thus, claims to the effect that a lower bound was optimal by giving the matching upper bound were not completely satisfying. (See [1] for an investigation of gossiping with bounded packet sizes.) In our network model, the corresponding upper and lower bounds hold for the problem of computing the OR function, which can be solved by sending messages consisting of one bit. Thus, both upper and lower bounds hold for a quite realistic model.
The model without "predictable reception" is general enough to make it possible to draw conclusions about the complexity of broadcasting a bit in some other models, viz., PRAMs and distributed memory machines (DMMs), with communication that obeys the EXCLUSIVE READ rule. Not surprisingly, the mode of writing is not relevant in this context. (See [12, 20] for information on the complexity of PRAM computations, and [10] for a specification of the DMM model with various access conflict resolution rules.)
Related work
In [8], the relevance of results for the gossiping problem for real multiprocessor systems was discussed in depth, but informally, i.e., without making the model for the multiprocessor system explicit. To the best of the author's knowledge, the problem considered here has not been studied before on a comparable technical level, with the exception of work done by Belting in his diploma thesis [4]. He combined the lower-bound method for the gossip problem on the complete network from [11, 22, 23, 28] with the degree technique for proving lower bounds for CREW PRAMs from [8] to show that computing the OR function in networks that obey restriction (*) and the exclusive-write property (it is forbidden that in one step, more than one message is sent to one processor) takes exactly as long as gossiping in this network. The proof method used in the present paper is completely different. Our technique is, however, related to the method introduced in [31] for analyzing computations on concurrent-read concurrent-write PRAMs with bounded communication width. The method for analyzing broadcasting algorithms in networks with unrestricted communication is a new and more general formalization of the idea of analyzing PRAM computations by keeping track of those cells and processors that are "affected" by some input bit, which has been used in [3, 7].
In [2], the problem of computing and distributing the value of a commutative and associative function in a complete network is considered, with the variant of communication mode that allows a processor to send one message and receive one message in one step. The lower bound $\log n$ claimed in that paper for the broadcasting problem is called "obvious", and it seems that a fixed communication pattern is assumed. Similarly, the lower bound mentioned in [5] for the more general "k-port" model seems to be based on this assumption.
Also, it should be mentioned that in the context of asynchronous communication in networks, Tel studied a family of algorithms that were all recognized to be equivalent (called "normal algorithms" in [29] and "wave algorithms" in [30] ). They would correspond to the problem of computing functions that have a critical input with "local output". However, because of the absence of a notion of time in these models, Tel's results do not have direct applications in our setting.
Structure of paper
The structure of the rest of this paper is as follows. In Section 2, we recall the relevant notions from gossiping theory and formally describe the network model. Section 3 contains the proof of the main theorem; Section 4 contains variants of the main result, in particular, extensions to functions that do not have a critical input, to multiple-valued functions, and to networks without restriction (*). Finally, in Section 5 we consider the problem of broadcasting one bit in unrestricted networks, and mention applications to other parallel computational models (exclusive-read PRAMs, and distributed memory machines (DMMs) with the ARBITRARY read and write conflict resolution rule).
Preliminaries
In this section, we first recall some basic definitions and facts from gossiping and broadcasting theory. Next, we give a precise description of the type of processor networks we will consider. For completeness, we also recall the definition of some well-known complexity measures for $n$-ary functions.
A formal view of gossiping and broadcasting
For the rigorous definitions of gossiping, broadcasting, and accumulation schemes we partly follow the survey paper [19]. (See that paper as well as [13] for more information on the subject, in particular for lower and upper bounds for specific networks, and for variations of the models.) Throughout this paper, by a graph $G = (V, E)$ we mean an undirected graph without loops or multiple edges, where, without loss of generality, $V = \{1, \ldots, n\}$ for some $n \in \mathbb{N}$. An undirected edge between nodes $i$ and $j$ is denoted by $\{i, j\}$, a directed edge from $i$ to $j$ by $(i, j)$.
Let $G = (V, E)$ be such a graph, and assume each node $j \in V$ has a piece $j$ of information. A communication scheme works in rounds $1 \le t \le T$, where in each round a node may send (copies of) all pieces of information it has collected so far to one of its neighbors. In gossiping, after round $T$ every node must know all pieces of information. Such a communication scheme may be compactly represented by fixing for each round $t$ the set $M_t$ of the edges along which messages are sent. For most of this paper, we only allow schemes in which in one round a node can communicate with only one of its neighbors. Quite naturally then, the edge sets $M_t$ are described as (directed or undirected) matchings in $G$, according as the communication along the edges is in 1-way or in 2-way mode. Node $i$ will send all its information to node $j$ in round $t$ if $(i, j) \in M_t$ (1-way mode) resp. $\{i, j\} \in M_t$ (2-way mode). For later use, we need notation for the pieces of information that the nodes have collected after $t$ rounds of the communication protocol have taken place. If we intend $K_t(i)$ to be the set of those indices $j \in V$ such that node $i$ knows $j$ by the end of round $t$, the following inductive definition is the straightforward formalization.
Definition 2.3. Let $M = (M_1, \ldots, M_T)$ be as in Definition 2.2. The sequence $K(M) = (K_0, K_1, \ldots, K_T)$ of mappings $K_t : V \to P(V)$, where $P(V)$ denotes the power set of $V$, is defined as follows: $K_0(i) = \{i\}$ for $i \in V$, and, for $1 \le t \le T$ and $i \in V$,

$$K_t(i) = \begin{cases} K_{t-1}(i) \cup K_{t-1}(l) & \text{if such an } l \text{ exists,} \\ K_{t-1}(i) & \text{otherwise,} \end{cases}$$

where $l$ denotes a node with $(l, i) \in M_t$ (1-way mode) resp. $\{l, i\} \in M_t$ (2-way mode).
be the sequence associated with $M$ as in the previous definition. One can also define broadcast and accumulation complexity in 2-way mode. However, it is easily seen that these do not differ from their 1-way counterparts. In fact, we even have the following. (For the simple proofs of these claims, see, e.g., [19].)
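The inductive definition of the knowledge sets $K_t(i)$ translates directly into a short simulation. The following Python sketch is an illustration only; the function names and the example schemes are assumptions, and the code does not check that the edges of a round belong to a given graph $G$.

```python
def knowledge_sets(n, rounds, two_way=False):
    """Return [K_0, ..., K_T], each a dict mapping node -> set of known pieces.

    rounds[t-1] is the matching M_t, given as pairs (i, j): node i sends to
    node j (1-way mode), or i and j exchange everything (2-way mode).
    """
    K = [{i: {i} for i in range(1, n + 1)}]      # K_0(i) = {i}
    for matching in rounds:
        touched = [v for edge in matching for v in edge]
        assert len(touched) == len(set(touched)), "M_t must be a matching"
        prev = K[-1]
        cur = {i: set(s) for i, s in prev.items()}
        for i, j in matching:
            cur[j] |= prev[i]        # j learns what i knew before the round
            if two_way:
                cur[i] |= prev[j]    # full duplex: i also learns from j
        K.append(cur)
    return K

def is_gossip_scheme(n, rounds, two_way=False):
    """True if every node knows every piece after the last round."""
    final = knowledge_sets(n, rounds, two_way)[-1]
    everything = set(range(1, n + 1))
    return all(final[i] == everything for i in everything)

# Hypothetical example schemes on four nodes (not claimed to be optimal).
ONE_WAY = [[(1, 2), (3, 4)], [(2, 4)], [(4, 2)], [(2, 1), (4, 3)]]
TWO_WAY = [[(1, 2), (3, 4)], [(1, 3), (2, 4)]]
```

With these example schemes, `is_gossip_scheme(4, ONE_WAY)` and `is_gossip_scheme(4, TWO_WAY, two_way=True)` both hold, whereas reading `TWO_WAY` as a 1-way scheme leaves some nodes uninformed.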
Specification of the network model
In this section, we describe the machine model that will be the basis of our considerations. The description will be slightly informal; a fully rigorous definition (involving abstract state sets, transition functions, read-address, write-address, and write-value functions, as well as input and output functions) can quite easily be constructed, e.g., along the lines of the formal description of a CREW PRAM in [7, 9]. We consider a network consisting of $n$ processors, $P_1, \ldots, P_n$, and of a set of bidirectional links, each of which joins two of the processors. The topology of the network is described by a graph $G = (V, E)$ with $V = \{1, \ldots, n\}$, the edges $\{i, j\} \in E$ representing the links. Assume the network is to compute a function $f : A_1 \times \cdots \times A_n \to A$.
At the beginning of the computation, input $a_i \in A_i$ is given to processor $P_i$, for $i \in V$; that is, the initial state of $P_i$ depends on $a_i$.
The computation proceeds synchronously in steps $t = 1, \ldots, T$. The actions of the network in one step are as follows. We first describe the 1-way case (half-duplex use of links). Processor $P_i$, $i \in V$, on the basis of its state after step $t - 1$, chooses one of the following two possibilities.
(S) Choose a message $m_{i,t}$ and a set $V_{i,t} \subseteq \{j \in V \mid \{i, j\} \in E\}$ representing possible recipients of the message. We say that $P_i$ SENDS a message in this step. (For notational convenience, we also allow that $V_{i,t} = \emptyset$, which means that $P_i$ does nothing in this step.)
(R) Choose a nonempty set $W_{i,t} \subseteq \{l \in V \mid \{l, i\} \in E\}$ of neighbors, representing possible senders from which $P_i$ wishes to RECEIVE a message.
Now, messages are transmitted, so as to satisfy the choices made by the processors, as described in the following. Consider the set

$$E_t = \{(i, j) \mid P_i \text{ sends with } j \in V_{i,t} \text{ and } P_j \text{ receives with } i \in W_{j,t}\},$$

representing all edges across which information may flow. It will be important that we may choose from $E_t$ disjoint edges to be used for communication in a greedy manner, and still are guaranteed that all processors that want to receive a message actually get one. For this, condition (*) is formulated technically as follows:
$(*)_1$ "Predictable reception (1-way)": For any directed matching $M' \subseteq E_t$ there is a directed matching $M$ with $M' \subseteq M \subseteq E_t$ that covers all recipients, i.e., if $P_j$ has chosen to receive a message in step $t$ then $(i, j) \in M$ for some $i \in V$.
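For very small instances, the 1-way condition can be tested by brute force straight from its wording. The sketch below is a hypothetical Python helper, not part of the paper: `E_t` is the set of directed edges (sender, recipient) for one step, `recipients` is the set of processors that chose to receive, and a directed matching is taken to be a set of edges in which no processor occurs twice.

```python
from itertools import combinations

def is_matching(edges):
    """No processor may occur in two edges (as sender or as recipient)."""
    nodes = [v for e in edges for v in e]
    return len(nodes) == len(set(nodes))

def predictable_reception(E_t, recipients):
    """Brute-force test of the 1-way condition for a single step.

    Every matching M' inside E_t must extend to a matching M inside E_t
    that delivers a message to every recipient.
    """
    E_t = list(E_t)
    matchings = [set(c) for r in range(len(E_t) + 1)
                 for c in combinations(E_t, r) if is_matching(c)]

    def covers(M):
        return set(recipients) <= {j for (_, j) in M}

    return all(any(Mp <= M and covers(M) for M in matchings)
               for Mp in matchings)
```

For instance, two senders addressing the same recipient, `{(1, 3), (2, 3)}` with recipient 3, satisfy the condition, while one sender offering a message to two waiting recipients, `{(1, 2), (1, 3)}` with recipients 2 and 3, violates it. The enumeration is exponential in the number of edges and is intended only as an illustration.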
Some matching $M \subseteq E_t$ that covers all recipients is chosen (by "the system"), and for each pair $(i, j) \in M$ message $m_{i,t}$ is delivered to $P_j$. Messages that are not delivered are discarded. We require that no matter which decision is made here by "the system", the output produced at the end of the computation is always correct. (This rule is analogous to the ARBITRARY write-conflict resolution rule for PRAMs, cf. [20].) Processors $P_i$ that send a message in step $t$ change their state only by noting that step $t$ is finished; those processors $P_j$ that have received a message $m_{i,t}$ assume a new state that also depends on this message. After the last step, $T$, the result $f(a)$ is known to all processors (in "global output" mode) or to one designated processor $P_{i_0}$ (in "local output" mode); that is, $f(a)$ is a function of the state that is finally reached by each processor resp. by processor $P_{i_0}$.
Remark 2.7. We sketch two special combinations of possible choices for the sets $V_{i,t}$ and $W_{j,t}$ that may make it clearer which variety of possibilities is covered by this model.
(a) All senders $P_i$ might specify a set $V_{i,t}$ with $|V_{i,t}| = 1$ or $V_{i,t} = \emptyset$, and all recipients $P_j$ might specify $W_{j,t} = \{i \mid \{i, j\} \in E\}$: this corresponds to the situation where each receiving processor has a "write window" into which other processors may write; conflicts are resolved by the ARBITRARY rule known from PRAMs. Requirement $(*)_1$ simplifies to the condition that for each recipient $P_j$ there must be at least one message actually addressed to it.
(b) All recipients $P_j$ might specify a set $W_{j,t}$ with $|W_{j,t}| = 1$, i.e., they are interested in seeing a message from a specific neighbor, and all senders $P_i$ might specify $V_{i,t} = \{j \mid \{i, j\} \in E\}$, i.e., make their respective message available to all their neighbors: this corresponds to a situation in which senders offer their information via a "read window" accessible to all their neighbors. Requirement $(*)_1$ turns into the EXCLUSIVE READ rule known from PRAMs.
In the 2-way (full-duplex) variant of the model, the basic structure of the computation is similar. However, here at the beginning of step $t$ processor $P_i$ fixes a message $m_{i,t}$ and a set $V_{i,t} \subseteq \{j \in V \mid \{i, j\} \in E\}$ of neighbors that are possible partners. (If $V_{i,t} = \emptyset$, the processor does not want to communicate.) Consider the graph $G_t = (V_t, E_t)$, where $V_t = \{i \mid V_{i,t} \neq \emptyset\}$ and $E_t = \{\{i, j\} \mid i, j \in V_t \text{ and } i \in V_{j,t} \text{ and } j \in V_{i,t}\}$. The condition "predictable reception" here takes on the following form.
$(*)_2$ "Predictable reception (2-way)": Any partial matching $M' \subseteq E_t$ can be extended to a perfect matching $M$ for $G_t$, i.e., to a matching in $G_t$ that covers all nodes in $V_t$ and includes $M'$.
The "system" (arbitrarily) chooses one such perfect matching $M$, and the processors communicate according to this matching, i.e., for every edge $\{i, j\} \in M$ message $m_{i,t}$ is delivered to $P_j$ and message $m_{j,t}$ is delivered to $P_i$. Processors that have received a message change their state accordingly.
Definition 2.8. Let $G = (V, E)$ be a graph specifying a processor network, and let $f$ be an $n$-ary function.
(a) A network algorithm (in 1-way or 2-way mode) is said to compute $f$ with "global output" ("g") if for all inputs $a$, after $T$ steps have been performed, all processors know $f(a)$.
(b) The 1-way network complexity $T^{1,g}_G(f)$ is the minimum number $T$ of steps of a 1-way algorithm with global output that computes $f$. The 2-way complexity $T^{2,g}_G(f)$ is defined analogously.
(c) A network algorithm in 1-way or 2-way mode is said to compute $f$ with "local output" ("l") at node $P_{i_0}$ if after step $T$ of the algorithm processor $P_{i_0}$ knows the result $f(a)$.
(d) The network complexities $T^{1,l}_{G,i_0}(f)$ and $T^{2,l}_{G,i_0}(f)$ are defined for local output at processor $P_{i_0}$ in analogy to the case of global output.
Remark 2.9. While restriction $(*)_1$ (respectively $(*)_2$ in the 2-way case) may not seem to be a natural assumption to make in fully synchronous networks, it arises quite naturally in connection with the problem of performing synchronous algorithms on fully asynchronous networks. Assume that in an asynchronous network internal computations of processors and delivery of messages may be delayed for indefinite but finite periods of time. Still, we want to perform an algorithm written for a synchronous network in which processors can either receive or send one message in one step. The obvious idea is to have each processor keep an internal (virtual) step counter, and to keep the exchange of messages synchronized by the use of time stamps. Conceptually, we assume that there is a "communication system" that manages a global message buffer. If $P_i$ wants to send a message $m_{i,t}$ to one of the processors $P_j$, $j \in V_{i,t}$, it places the message, extended by a time stamp $t$, and the list $V_{i,t}$ into this buffer, and proceeds to step $t + 1$ no matter what happens to the message. If $P_j$ wants to receive a message in step $t$ from one of the processors $P_i$, $i \in W_{j,t}$, it submits a corresponding request to the system, again with time stamp $t$. The system searches the buffer for a message that has time stamp $t$ and lists $P_j$ among its possible recipients. If one is found, it is delivered to $P_j$ and removed from the buffer; $P_j$ proceeds to step $t + 1$. Otherwise, $P_j$ is blocked until a message appears in the buffer that can be delivered to it. Synchronous algorithms that obey restriction $(*)_1$ are suited for being run in this way on an asynchronous network, in the following sense: no matter in which order messages are submitted or delivered, every receive request for step $t$ will finally be satisfied; it never happens that a processor waits indefinitely for a message that will not arrive. (In the terminology of asynchronous distributed systems, this is a "liveness" property; see [30].)
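The time-stamped buffer described in this remark can be sketched in a few lines of Python. This is a simplified single-threaded illustration under stated assumptions, not the paper's construction: the class name `MessageBuffer` and its methods are hypothetical, blocking is modelled by returning `None`, and the check that the sender belongs to the receiver's set $W_{j,t}$ is an added assumption.

```python
class MessageBuffer:
    """Global buffer of messages tagged with a virtual step number."""

    def __init__(self):
        # Each entry: (step, sender, possible recipients, payload).
        self._pending = []

    def send(self, step, sender, recipients, payload):
        """Deposit a message; the sender proceeds to step + 1 regardless."""
        self._pending.append((step, sender, frozenset(recipients), payload))

    def try_receive(self, step, receiver, senders):
        """Deliver one matching message, or return None (receiver stays blocked)."""
        for k, (t, i, recips, payload) in enumerate(self._pending):
            if t == step and receiver in recips and i in senders:
                del self._pending[k]      # a message is delivered at most once
                return i, payload
        return None
```

A receive request issued before the matching message has been deposited simply yields `None` and can be retried later, which mirrors the observation that the order of submission and delivery does not matter as long as a suitable message eventually arrives.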
Complexity measures for functions
We recall some definitions concerning functions $f : A_1 \times \cdots \times A_n \to A$.
Originally, these notions were formulated for Boolean functions, but they can readily be generalized to other domains (cf. [12] ). The concept of a critical input will be central for the main result. For discussion and applications of all these notions in the context of parallel computing on PRAMs see, e.g., [6, 7, 26, 31] , and in particular the survey [12] . To help readers to familiarize themselves with these concepts, we provide some simple examples.
(a) An input $a = (a_1, \ldots, a_n)$ is critical for $f$ if for every $i \in \{1, \ldots, n\}$ there is an input $b$ that differs from $a$ only in the $i$th component and satisfies $f(a) \neq f(b)$.
(b) The critical complexity $c(f)$ is the maximal $k$ such that there is an input $a$ with the property that for $k$ different indices $i \in \{1, \ldots, n\}$ there is an input $b$ that differs from $a$ only in the $i$th component and satisfies $f(a) \neq f(b)$.
(c) For each input $a$, the sensitive complexity (or certificate complexity) $s(f, a)$ of $f$ at $a$ is the minimal $k$ such that there is a set $I_a \subseteq \{1, \ldots, n\}$ of cardinality $k$ with the property that all inputs $b = (b_1, \ldots, b_n)$ with $\forall i \in I_a : b_i = a_i$ satisfy $f(b) = f(a)$. The sensitive complexity $s(f)$ of $f$ is the maximum of $s(f, a)$ over all inputs $a$.
(d) The function $f$ depends on input bit $i$ if there are inputs $a$ and $b$ that differ only in the $i$th component and satisfy $f(a) \neq f(b)$.
Example 2.11. (a) Input $(0, \ldots, 0)$ is critical for the $n$-ary OR function, but no other input is critical. For the $n$-ary PARITY function every input is critical.
(b) The critical complexity of the $n$-ary OR function is $n$; the critical complexity of the function $h$ defined in Section 1.2 is $m = \sqrt{n}$.
(c) If $a = (0, \ldots, 0)$ and $a' = (1, 0, \ldots, 0)$, then $s(\mathrm{OR}, a) = n$ and $s(\mathrm{OR}, a') = 1$. The sensitive complexity of the $n$-ary OR function is $n$. The sensitive complexity of the function $h$ from Section 1.2 is $m = \sqrt{n}$. (For each input $a$, it is sufficient to fix $m$ input bits to guarantee that the output is $h(a)$.)
(d) All functions mentioned in (a)-(c) depend on all their input bits.
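The claims about critical inputs and critical complexity in this example can be verified by exhaustive search for small $n$. The following Python sketch is illustrative only; the function names are hypothetical, inputs are tuples over the domain $\{0, 1\}$ by default, and `H4` is the function $h$ from Section 1.2 for $m = 2$.

```python
from itertools import product

def sensitive_positions(f, a, domain=(0, 1)):
    """Indices i such that changing only a[i] can change the value f(a)."""
    out = []
    for i in range(len(a)):
        for v in domain:
            if v != a[i] and f(a[:i] + (v,) + a[i + 1:]) != f(a):
                out.append(i)
                break
    return out

def critical_inputs(f, n, domain=(0, 1)):
    """All inputs at which every one of the n positions is sensitive."""
    return [a for a in product(domain, repeat=n)
            if len(sensitive_positions(f, a, domain)) == n]

def critical_complexity(f, n, domain=(0, 1)):
    """Maximum number of sensitive positions over all inputs."""
    return max(len(sensitive_positions(f, a, domain))
               for a in product(domain, repeat=n))

def OR(a):
    return int(any(a))

def PARITY(a):
    return sum(a) % 2

def H4(a):
    """The function h for m = 2: (a1 AND a2) OR (a3 AND a4)."""
    return int((a[0] and a[1]) or (a[2] and a[3]))
```

For $n = 4$ this search finds the all-zero input as the only critical input of OR, finds all 16 inputs critical for PARITY, and gives critical complexity 4 for OR and 2 for `H4`, in line with parts (a) and (b).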
Computing functions versus gossiping
This section contains the formulation and the proof of the main results concerning the relationship between gossip and accumulation protocols on the one hand and computing functions in networks with global respectively local output on the other hand.
Throughout this section, let $G = (V, E)$ be a graph with $V = \{1, \ldots, n\}$ for some $n \in \mathbb{N}$, and let $i_0 \in V$ be fixed. We assume alternately that $G$ represents a gossiping network and a processor network. The following is well known and an almost immediate consequence of the definitions.
If f is the OR of n bits, it can be computed within these time bounds by using 1-bit messages only.
Proof. We only consider (a); the other statements are proved similarly. Let $M = (M_1, \ldots, M_T)$ be a 1-way gossip protocol, and let $a = (a_1, \ldots, a_n)$ be the input. The sequence $K(M) = (K_0, K_1, \ldots, K_T)$ is defined as in Definition 2.3. In step $t$, $1 \le t \le T$, the processors send each other messages according to the communication pattern given by $M_t$. If $(i, j) \in M_t$, processor $P_i$ sends a message $m_{i,t}$ to $P_j$, where $m_{i,t}$ is a representation of the input fragment $(a_l)_{l \in K_{t-1}(i)}$. From the definition of $K(M)$ it follows by induction that the required information is available to $P_i$ in this step; since $M$ is a gossip protocol, after step $T$ each processor knows the complete input $a$ and can compute $f(a)$. Note that the longest message can be at most as long as $a$. If $f$ is the OR function, it is sufficient to send as message $m_{i,t}$ the single bit $\bigvee_{l \in K_{t-1}(i)} a_l$. (Note that the only properties of the OR function used here are that it is commutative, associative, and idempotent. In complete networks, idempotency is not required, see [2, 5].)
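The 1-bit variant of this argument is easy to simulate. The following Python sketch is an illustration under assumptions: `or_by_gossip` and the four-node scheme `SCHEME` are hypothetical, each processor keeps a single bit holding the OR of the inputs it has heard about, and every message consists of that one bit.

```python
def or_by_gossip(bits, rounds, two_way=False):
    """Run a fixed communication scheme in which every message is one bit.

    bits[i-1] is the input of processor P_i; rounds[t-1] lists the pairs
    (i, j) of round t, with P_i sending to P_j. Returns the final bit
    held by each processor.
    """
    state = dict(enumerate(bits, start=1))     # P_i starts with a_i
    for matching in rounds:
        prev = dict(state)                     # states after step t - 1
        for i, j in matching:
            state[j] = prev[j] | prev[i]       # P_j ORs in the received bit
            if two_way:
                state[i] = prev[i] | prev[j]   # full duplex exchange
    return state

# Hypothetical 1-way gossip scheme on four nodes (not claimed to be optimal).
SCHEME = [[(1, 2), (3, 4)], [(2, 4)], [(4, 2)], [(2, 1), (4, 3)]]
```

Running `or_by_gossip((0, 0, 1, 0), SCHEME)` leaves every processor holding 1. A pattern that is not a gossip scheme, such as the single round `[[(1, 2)]]`, can leave some processors with a wrong value.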
The main result of this paper is essentially that, under the restriction "predictable reception", these algorithms for computing $f$ are optimal if global respectively local output is required and $f$ has a critical input. If global output is required and $f$ is nonconstant, lower bounds for broadcast protocols apply.
Theorem 3.2. Let $G = (V, E)$ be a graph with node set $V = \{1, \ldots, n\}$, which represents a network of processors, and let $f$ be an $n$-ary function. If $f$ has a critical input, then
Proof of Theorem 3.2. We deal with part (a) in detail; the proofs of the other parts, being similar, will only be sketched. In view of Observation 3.1, only the inequality $T^{1,g}_G(f) \ge r(G)$ has to be proved. For this, assume that a 1-way algorithm for computing $f$ on $G$ in $T$ steps is given, and that $a^* = (a^*_1, \ldots, a^*_n)$ is a critical input for $f$. Obviously, it is sufficient to construct a 1-way gossip protocol for $G$ that has $T$ rounds.
The construction splits into two parts. First, we eliminate ambiguities from computations according to the algorithm, i.e., for each input $a$ we fix a computation $C_a$, which essentially corresponds to a communication pattern. In the second part, we show that $C_{a^*}$ induces a gossip protocol for $G$.
Part 1: Fix computations. Consider the network algorithm that computes f in T steps. First, we arbitrarily ÿx a computation C a * for input a * . Consider steps t =1; : : : ; T one after the other. Let E t (a * ) be the set of all possible pairs of senders and recipients determined on the basis of step t − 1 (cf. Section 2.2). Then an arbitrary matching M * t ⊆ E t (a * ) is chosen that covers all recipients, and messages are delivered according to M * t . Now, consider some input a = a * . We proceed by induction on t. Assume C a has been ÿxed up to step t − 1, and consider the graph (V; E t (a)) induced by the communication requests of the processors for step t. Let M t (a) = M * t ∩ E t (a) be the set of those edges in E t (a) that are used in step t of C a * . By restriction ( * ) 1 , we may extend M t (a) to some directed matching M t (a) ⊆ E t (a) that covers all recipients. Deliver messages according to M t (a). Proof. Let the sequence K(M * ) = (K 0 ; K 1 ; : : : ; K T ) be as in Deÿnition 2.3. We must show that K T (i) = V for all i ∈ V . For this, it is su cient to establish the following assertion (A t ) for t = T : (A t ) for all inputs a = (a 1 ; : : : ; a n ) and all i ∈ V :
$(\forall j \in K_t(i)\colon a_j = a^*_j) \Rightarrow P_i$ is in the same state in $C_{a^*}$ and $C_a$ after step $t$.
, then by $(A_T)$ processor $P_i$ is in the same state after step $T$ on input $a^*$ and on each input that differs from $a^*$ only in the $j$th component, which contradicts the assumption that $a^*$ is critical for $f$ and that the network computes the function $f$ with global output.
We prove $(A_t)$ by induction on $t$. For $t = 0$, the claim follows from the definitions: $K_0(i) = \{i\}$, for $i \in V$, and the initial state of $P_i$ only depends on the $i$th input component. Now assume $t > 0$, and that $(A_{t-1})$ is true. Let $i \in V$ and $a$ be an input so that $a_j = a^*_j$ for all $j \in K_t(i)$. There are two cases.

Case 1: There is no $l$ such that $(l, i) \in M^*_t$. Then, by definition, $K_t(i) = K_{t-1}(i)$, in particular $a_j = a^*_j$ for all $j \in K_{t-1}(i)$. By the induction hypothesis, $P_i$ is in the same state after step $t-1$ in both $C_{a^*}$ and $C_a$. Since no $l$ satisfies $(l, i) \in M^*_t$, no processor $P_l$ delivers a message to $P_i$ in step $t$ of $C_{a^*}$, i.e., by restriction $(*)_1$ ("predictable reception"), in step $t$ of $C_{a^*}$ processor $P_i$ is a sender. Since this decision is based on the state at the end of step $t-1$, $P_i$ decides in the same way in computation $C_a$, thus does not receive a message in $C_a$ in step $t$ either, and enters the same state in $C_a$ as in $C_{a^*}$.
Case 2: There is some (unique) $l$ such that $(l, i) \in M^*_t$. In this case, by definition, $K_t(i) = K_{t-1}(l) \cup K_{t-1}(i)$. We apply the induction hypothesis $(A_{t-1})$ to $a$ with respect to both $K_{t-1}(l)$ and $K_{t-1}(i)$ to conclude that $P_l$ is in the same state after step $t-1$ of $C_a$ and of $C_{a^*}$, and that the same is true for $P_i$. Since $(l, i) \in M^*_t$, the message $m_{l,t}$ sent by processor $P_l$ in this step is delivered to $P_i$ in step $t$ of computation $C_{a^*}$; in particular, $P_l$ is a sender with $i \in V_{l,t}$ and $P_i$ is a recipient with $l \in W_{i,t}$. Since both $P_i$ and $P_l$ are in the same state after step $t-1$ in $C_{a^*}$ and $C_a$, this will also be true in computation $C_a$; thus, edge $(l, i)$ is in $E_t(a)$ and in $M^*_t$, and $P_l$ sends message $m_{l,t}$ in step $t$ of $C_a$ as well. By the construction of $C_a$ described in Part 1, edge $(l, i)$ will be chosen to be in $M_t(a)$, and message $m_{l,t}$ will be delivered to $P_i$ in $C_a$. This implies that $P_i$ receives identical messages in step $t$ of $C_{a^*}$ and of $C_a$; hence $P_i$ will enter the same state at the end of step $t$ in these two computations. This finishes the induction step, and the proof of the main lemma.
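The update rule for the sets $K_t(i)$ used in Cases 1 and 2 can be made concrete with a small sketch. The function names, the representation of a protocol as a list of sets of directed edges, and the path example are illustrative assumptions, not notation from the paper.

```python
def knowledge_sets(n, protocol):
    """Return the final sets K_T(i) for nodes 1..n.

    `protocol` is a list of rounds; each round is a collection of directed
    edges (l, i), read as "l sends to i", assumed to form a matching.
    """
    K = {i: {i} for i in range(1, n + 1)}           # K_0(i) = {i}
    for matching in protocol:
        prev = {i: set(s) for i, s in K.items()}    # snapshot of K_{t-1}
        for l, i in matching:
            K[i] = prev[i] | prev[l]                # Case 2: union of both sets
        # Nodes that receive nothing keep K_{t-1}(i) (Case 1).
    return K


def is_gossip_protocol(n, protocol):
    """True if every node ends up with K_T(i) = V."""
    everyone = set(range(1, n + 1))
    return all(K_i == everyone for K_i in knowledge_sets(n, protocol).values())


# Hypothetical example: the path 1 - 2 - 3.
path_protocol = [{(1, 2)}, {(3, 2)}, {(2, 1)}, {(2, 3)}]
print(is_gossip_protocol(3, path_protocol))        # all four rounds
print(is_gossip_protocol(3, path_protocol[:3]))    # node 3 not yet informed
```

Taking a snapshot of the previous round before updating keeps the sketch faithful to the synchronous model: information received in round $t$ cannot be forwarded within the same round.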
We will not prove parts (b) and (c) of the theorem in detail, since the arguments are essentially the same as in part (a). We only point out the changes to be made.

In part (b), undirected matchings are used in place of directed matchings. We fix a computation $C_{a^*}$ arbitrarily, and show exactly as in the 1-way case that the sequence $M^* = (M_1(a^*), \ldots, M_T(a^*))$ of matchings that describe the pairs of processors that communicate in each step is a 2-way gossip protocol. Restriction $(*)_2$ is formulated in such a way that the argument from (a) carries over without any difficulties.

In part (c), the construction of the communication protocol is exactly the same as in part (a), respectively (b). The only difference in the proof is that, since it is only required that $P_{i_0}$ knows the result at the end, we can only conclude that $K_T(i_0) = V$, which means that we have constructed an accumulation protocol for $G$. Then Fact 2.6 and the remark preceding it yield (c).

Finally, we sketch the proof of part (d), for 1-way mode. Choose $a^*$ and $a$ such that $f(a^*) \ne f(a)$ and $a^*$ and $a$ differ only in the $i_0$th component. Computation $C_{a^*}$ and (afterwards) computation $C_a$, each with $T$ steps, are fixed exactly as in the proof of part (a); the sequence $M^* = (M_1(a^*), \ldots, M_T(a^*))$ is defined as above. Consider the resulting sequence $K(M^*) = (K_0, \ldots, K_T)$ (Definition 2.3). The assertion $(A_t)$: $\forall i \in V\, [\, i_0 \notin K_t(i) \Rightarrow P_i$ is in the same state after step $t$ of $C_{a^*}$ and $C_a\,]$ is proved by induction on $t$. Since all processors $P_i$ must be able to distinguish $a^*$ and $a$ after step $T$, we must have $i_0 \in K_T(i)$ for all $i \in V$; thus, $M^*$ is a broadcast protocol, which establishes the inequality $T \ge b(G, i_0)$.
Extensions and limitations
In this section, we consider possible extensions of the main result. First, we discuss functions that do not necessarily have a critical input; next, multiple-output functions are considered; finally, the situation of processor networks that do not necessarily satisfy restriction $(*)$ is discussed. The complexity of broadcasting a bit in networks with an even more general communication mode is treated in detail in Section 5.
Functions without critical input
What can be said about functions that do not have a critical input? We can use the approach taken in the proof of Theorem 3.2 to obtain the following. Assume a network $G = (V, E)$ computes a function $f$ with global output, and $a^* = (a^*_1, \ldots, a^*_n)$ is an arbitrary input. Define $M^* = (M$ of the complete network, in [4] it was established that the technique of [11, 22, 23, 28] can quite easily be adapted to prove a lower bound of about $\log_\varphi(s)$ for this case with $\varphi = \frac{1}{2}(1 + \sqrt{5})$, which is about $1.44\ldots \log(s)$.

Alternatively, one can consider the critical complexity of $f$ (Definition 2.10(b)). Let $a^*$ be an input and $I \subseteq V$ be a set of size $c(f)$ such that for every $i \in I$ there is an input $a$ that differs from $a^*$ only in the $i$th component and satisfies $f(a) \ne f(a^*)$. Then from $(A_T)$ it is clear that the communication protocol defined in the proof of Theorem 3.2 satisfies $I \subseteq K_T(j)$ for all $j \in V$. This relates to the problem of broadcasting from $k = c(f)$ fixed sources. Results for this problem for specific networks and specific placement of these sources have been obtained by Höltring in her diploma thesis [17].
Functions with multiple outputs
In applications, often functions $f\colon a \mapsto (f_1(a), \ldots, f_n(a))$ must be computed in such a way that the $j$th component $f_j(a)$ appears at processor $P_j$ at the end. In [8], matrix inversion, discrete Fourier transform, and sorting are listed as examples. Our methods apply to this situation directly if there is one input $a^*$ that is critical for all functions $f_1, \ldots, f_n$. This is the case, e.g., for the problem of sorting $n$ numbers $a_1, \ldots, a_n$ from the set $\{1, \ldots, m\}$ with $m \ge n + 2$, so that the number with rank $j$ appears at processor $P_j$ at the end, since the input $(2, \ldots, n + 1)$ is critical for all output positions. In many important cases, though, no single input can be found that is critical for all components of the output, e.g., for the problem of sorting $n$ bits. For the special case of the complete network, we can still obtain lower bounds in terms of the sensitive complexity of $f_1, \ldots, f_n$ on specific inputs, in the following way.

Proposition 4.2. If the multiple-output function $f\colon a \mapsto (f_1(a), \ldots, f_n(a))$ is computed on the complete network in 1-way mode in $T$ steps, under restriction $(*)_1$, and $a^*$ is any input, then
Proof. (Sketch.) Fix any input $a^*$. Just as in Section 4.1 we can see that the communication protocol $M^*$ induced by $a^*$ and the corresponding sequence $K(M^*) = (K_0, \ldots, K_T)$ must satisfy $|K_T(j)| \ge s(f_j, a^*)$, for $1 \le j \le n$. The proof method from [8, 12, 23, 28] can be applied to show that $T$ must satisfy
Combining these inequalities and taking logarithms yields the result.
Corollary 4.3. Sorting $n$ bits in 1-way mode on a complete network with restriction $(*)_1$ takes at least $1.44\ldots \log n - O(1)$ steps.
Proof. Assume the problem of sorting $n$ bits can be solved in $T$ steps. The function $f_j$ is the output bit that appears at processor $P_j$. As input, we choose $a^* = (0, \ldots, 0)$. Since $f_j(a) = 0$ if and only if $a$ contains at least $j$ zeroes, it is clear that $s(f_j, a$
We leave it to the reader to find inputs for the other problems mentioned (discrete Fourier transform or matrix inversion) that make it possible to show that these problems have high complexity in complete processor networks.
In order to apply results from the gossiping framework to the computation of multiple-output functions in other, sparse, networks, we would need more information on partial gossiping in these networks, in particular, the complexity of collecting any $k$ input pieces or a predetermined set of $k$ input pieces in every node.
Networks without "predictable reception"
In this section, we show that Theorem 3.2 becomes false if the network algorithm does not have to satisfy restriction $(*)$ ("predictable reception") but is still fully synchronous. In this case, we may collect information with a large fan-in, and distribute information faster using the phenomenon of "transmitting information by not sending a message", cf. [3, 7, 9]. Then, we show that even such networks cannot compute nonconstant functions with global output more than four times faster than gossip protocols.
Let us start with two examples, the tree network and the complete network.

(a) the OR of $n$ bits with local output at $P_1$ can be computed in $d$ steps,
(b) a bit $b$ located initially at $P_1$ can be broadcast to all nodes in $d$ steps, and
(c) the OR of $n$ bits with global output can be computed in $2d$ steps.
Proof.
(a) It is sufficient to note that (for $d = 1$) the OR of three bits $a_1$, $a_2$, and $a_3$ can be computed in 1 step, because then we can proceed iteratively, starting from the leaves. In one step, $P_2$ [$P_3$] sends a message to $P_1$ if and only if $a_2 = 1$ [$a_3 = 1$]. If $P_1$ gets any messages, the result is 1, otherwise it is equal to $a_1$.
(b) Again, we only have to show how this can be done for $d = 1$; in larger trees, we iterate, starting from the root. $P_1$ informs both of its sons in one step as follows: if $b = 0$ then it sends a message to $P_2$; if $b = 1$ then it sends a message to $P_3$. In each case, the processor that did not get a message can derive $b$ from just this fact. (The contents of the message are irrelevant; cf. [3, 10] for this trick.)
(c) Apply (a) and (b) one after the other.
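The two tricks from this proof can be simulated directly. The following is a minimal sketch under the assumption that the tree nodes are numbered heap-style (node $v$ has sons $2v$ and $2v+1$); this numbering and the function names are illustrative choices, not part of the paper.

```python
def or_at_root(a1, a2, a3):
    """One step of part (a): a leaf sends a message iff its bit is 1."""
    messages_received = int(a2 == 1) + int(a3 == 1)
    return 1 if messages_received > 0 else a1


def broadcast_in_tree(d, b):
    """Part (b) on a binary tree of depth d; returns {node: bit it derived}."""
    known = {1: b}
    for step in range(1, d + 1):
        for v in range(2 ** (step - 1), 2 ** step):      # nodes at depth step-1
            # The single (content-free) message goes left for 0, right for 1.
            target = 2 * v if known[v] == 0 else 2 * v + 1
            # Each son decodes only from whether a message arrived.
            known[2 * v] = 0 if target == 2 * v else 1
            known[2 * v + 1] = 1 if target == 2 * v + 1 else 0
    return known


informed = broadcast_in_tree(3, 1)
print(len(informed), set(informed.values()))
```

Each iteration of the outer loop corresponds to one synchronous step, so a tree of depth $d$ is informed after $d$ steps, as claimed in (b).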
The resulting running times should be contrasted with the fact that if $G$ is the binary tree of depth $d$ with root $P_1$ then $b(G, P_1) = 2d$ and $r(G) = 4d$ (cf. [19]). In the complete network, the situation is slightly different.
Proposition 4.5. Let $G$ be the complete network of size $n$. With synchronous algorithms in 1-way mode (which do not satisfy restriction $(*)_1$), in $G$:
(a) the OR of $n$ bits with local output can be computed in one step,
(b) a bit $c$ located initially at some node can be broadcast to all nodes in $\log_3 n \approx 0.63 \log n$ steps, and
(c) the OR of $n$ bits with global output can be computed in $1 + \log_3 n$ steps.
Proof.
(a) For $2 \le i \le n$, processor $P_i$ sends a message to $P_1$ if and only if $a_i = 1$. If $P_1$ gets a message, the result is 1, otherwise it is $a_1$.
(b) Using the same trick as in the proof of the previous proposition, in one step a processor that knows $c$ can inform two other processors of this value; thus, the number of processors that know $c$ can be tripled in one step.
(c) Combine (a) and (b).
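The tripling argument in (b) can be checked with exact integer arithmetic. This sketch assumes that the step count for non-powers of 3 is rounded up; the function names are illustrative.

```python
def broadcast_steps_complete(n):
    """Steps until all n nodes know the bit when informed nodes triple per step."""
    informed, steps = 1, 0
    while informed < n:
        informed = min(n, 3 * informed)   # each informed node informs two more
        steps += 1
    return steps


def global_or_steps_complete(n):
    """Part (c): one step to collect the OR at P_1, then broadcast it."""
    return 1 + broadcast_steps_complete(n)


for n in (3, 9, 10, 27, 1000):
    print(n, broadcast_steps_complete(n), global_or_steps_complete(n))
```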
The running times from this proposition should be compared with the values $a(G, i_0) = b(G, i_0) = \log n$ and $r(G) = 1.44\ldots \log n \pm O(1)$ for $G$ the complete network. Next, we note that for computing nonconstant functions with global output in any network $G$, at least $r(G)/4$ steps are needed.

Proof. The main part of the proof is postponed to Section 5, where we characterize the complexity of broadcasting a bit in synchronous networks that are even stronger than those considered here in terms of a variant of the broadcast complexity of the underlying graph. Clearly, if a network $G$ can compute a function $f$ that depends on input position $i_0$ in $T$ steps with global output, then it can broadcast one bit from source $P_{i_0}$ in $T$ steps. In Section 5, we show that broadcasting one bit from a node $i_0$ in $G$ takes at least $b_2(G, i_0)$ steps, which is the number of rounds in an optimal "2-broadcast" protocol, in which one node may send a message to two of its neighbors in one step. It is easy to see that $b(G, i_0) \le 2 b_2(G, i_0)$, hence $b(G, i_0) \le 2T$. By the trivial fact that $r(G) \le 2 b(G, i_0)$ (cf. [19]), we obtain $r(G) \le 4T$.
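Written as a single chain, the three inequalities used in this proof combine as follows, where $T$ denotes the number of steps of the given algorithm and $T \ge b_2(G, i_0)$ is the bound established in Section 5:

```latex
\[
  r(G) \;\le\; 2\, b(G, i_0) \;\le\; 4\, b_2(G, i_0) \;\le\; 4T .
\]
```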
We note that the bound given in the previous proposition is tight: Let $G$ be the binary tree of depth $d$. We can broadcast a bit from the root in $d$ steps (Proposition 4.4(b)), hence in $G$ some nonconstant function can be computed with global output in $d$ steps. On the other hand, $r(G) = 4d$ for this network, and $b(G, i_0) \ge 2d$ for every node $i_0$ in $G$.
Broadcasting a bit in a synchronous network
In this section, we study the complexity of broadcasting one bit in a synchronous network of processors that communicate by message passing, with hardly any restriction on the communication mode excepting that a processor must not send more than one message in one step. We show, in analogy to Theorem 3.2, that a well-known complexity measure from the theory of broadcast protocols for atomic pieces of information is appropriate for describing the complexity of this problem.
Note that the problem of broadcasting a bit captures the essence of all situations in which differences in the state of a single processor finally influence all others.
We treat communication in 1-way mode and in 2-way mode separately (Sections 5.3 and 5.4, respectively). The results of this section can be applied to other parallel models like EXCLUSIVE READ PRAMs and 1-ARBITRARY distributed memory machines, as will be indicated at the end of Section 5.4.
The general network model in 1-way mode
As before, we consider a network of $n$ processors, connected by bidirectional links according to a graph $G = (V, E)$. First, we focus on 1-way communication. One node $P_{i_0}$ is distinguished as the source of the broadcasting process, i.e., this processor can be in either one of two different states initially, representing inputs 0 and 1; the other processors are in an initial state that is independent of the input. In step $t$, a processor may send a message and receive several messages, as specified by the following rules. Depending on its state after step $t-1$, processor $P_i$ fixes a message $m_{i,t}$ that it wishes to send, and a set $V_{i,t} \subseteq \{j \mid \{i, j\} \in E\}$ of possible recipients. (As in Section 2.2, the choice $V_{i,t} = \emptyset$ indicates that $P_i$ does not send a message at all; with $|V_{i,t}| = 1$ a unique recipient can be specified.) Further, $P_i$ specifies a set $W_{i,t} \subseteq \{j \mid \{i, j\} \in E\}$ of processors from which it wishes to receive a message. Let $E_t = \{(i, j) \mid j \in V_{i,t} \text{ and } i \in W_{j,t}\}$. Some set $E'_t \subseteq E_t$ is selected arbitrarily such that for each $i$ that occurs as a first component in $E_t$ there is exactly one $j$ such that $(i, j) \in E'_t$. Then, for each $(i, j) \in E'_t$, message $m_{i,t}$ is delivered to $P_j$. Thus, a processor $P_j$ that has specified $W_{j,t} \ne \emptyset$ may receive messages from none, some, or all $P_i$ with $i \in W_{j,t}$. On the basis of all messages received, $P_j$ changes its state. After step $T$, all processors must know whether the input bit was 0 or 1. (The reader is invited to check against his or her intuition that alternative rules for dealing with surplus messages, like buffering for later delivery, combining, discarding, etc., are at most as strong as this scheme.)

Definition 5.1. The broadcast complexity $T_{G,i_0}$ of a processor network $G$ in 1-way mode with source $P_{i_0}$ is the smallest number of steps an algorithm for broadcasting a bit from $P_{i_0}$ can have.
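The delivery rule of this model can be sketched for a single step as follows. The dictionaries `V` and `W` stand for the sets $V_{i,t}$ and $W_{i,t}$; the function names, the use of a random choice to model "selected arbitrarily", and the three-node example are assumptions made for illustration only.

```python
import random


def request_edges(V, W):
    """E_t = {(i, j) : j in V[i] and i in W[j]}."""
    return {(i, j)
            for i, targets in V.items()
            for j in targets
            if i in W.get(j, set())}


def select_deliveries(E, rng=random):
    """Pick exactly one admissible recipient for every sender occurring in E."""
    by_sender = {}
    for i, j in sorted(E):
        by_sender.setdefault(i, []).append(j)
    return {(i, rng.choice(js)) for i, js in by_sender.items()}


# Hypothetical requests: P1 offers to P2 or P3, P2 offers to P3.
V = {1: {2, 3}, 2: {3}, 3: set()}
W = {1: set(), 2: {1}, 3: {1, 2}}
E_t = request_edges(V, W)
delivered = select_deliveries(E_t, random.Random(0))
print(sorted(E_t), sorted(delivered))
```

Note that nothing prevents two selected edges from entering the same node, so a recipient such as $P_3$ in the example may receive several messages in one step; only the out-degree of each sender is limited to one.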
2-broadcast protocols
Turning now to communication networks, we generalize broadcast protocols (see Section 2) to 2-broadcast protocols, in which one node may pass the information it has to two of its neighbors in one step. (This modification is discussed as "DMA-bound model" H2, an abbreviation for "half-duplex with outdegree 2", in [13, 22].) Our definition of 2-broadcast protocols differs only formally from the standard definition.

A 2-broadcast protocol can be used to broadcast a piece of information, initially located at node $i_0$, in the following obvious way: if edge $(i, j)$ is in $M_t$ and $i_0 \in K_t(i)$, then in round $t$ the piece of information is sent from $i$ to $j$. The definition makes sure that the node that has to send the piece of information actually has received it before and that every processor sends the piece of information to at most two of its neighbors in one step. As mentioned before, it is obvious that $b(G, i_0) \le 2 b_2(G, i_0)$ for arbitrary networks $G$ and nodes $i_0$ in $G$.
Processor networks versus 2-broadcast networks
By using the trick described in the proof of Proposition 4.4(b) for sending one bit to two recipients in one step, we can transform any 2-broadcast protocol for a graph $G$ into a broadcast algorithm in 1-way mode for the processor network with topology given by $G$.

Proof. Assume that a 2-broadcast protocol $M = (M_1, \ldots, M_T)$ for $G$ with $T$ rounds is given, and define $K(M)$ as before. We obtain an algorithm for the network as follows. If edges $(i, j_1)$ and $(i, j_2)$ are in $E_t$ and $i_0 \in K_t(i)$, then in step $t$ processor $P_i$ informs $P_{j_1}$ and $P_{j_2}$ of the value of $b$ with the trick described in the proof of Proposition 4.4(b). If there is only one edge leaving $i$ in $E_t$, the trick can be applied nonetheless. We note that in this way all processors can use the same message in all steps.
The purpose of this section is to show that this algorithm is optimal. This may be intuitively plausible, but to the best of the knowledge of the author a formal proof, especially for a network model as general as considered here, has not been available so far.
Proof. In view of Observation 5.3 we must only prove that $T_{G,i_0} \ge b_2(G, i_0)$. Let a broadcast algorithm for the processor network $G$ in 1-way mode with source $i_0$ be given that runs in $T$ steps. We construct a 2-broadcast protocol with at most $T$ steps.
We proceed similarly as in the proof of Theorem 3.2. However, here computations $C_0$ and $C_1$ on inputs 0 and 1 are fixed simultaneously by induction on $t$. The idea is to choose the actual transmissions for $C_0$ and $C_1$ in such a way that they have as many common elements as possible.
Thus, assume that $C_0$ and $C_1$ have been fixed up to step $t-1$. For $b = 0, 1$, the communication requests of the processors (as represented by the sets $V_{i,t}(b)$ and $W_{i,t}(b)$, for $i \in V$) induce a set $E_t(b)$ of directed edges across which messages may flow. We let $E_t = E_t(0) \cap E_t(1)$ and choose a subset $E'_t \subseteq E_t$ so that if $i$ appears as a first component in $E_t$ then $(i, j) \in E'_t$ for exactly one $j$. Then, for $b = 0, 1$ separately, a set $E'_t(b)$ with $E'_t \subseteq E'_t(b) \subseteq E_t(b)$ is chosen with the property that if $i$ appears as a first component in $E_t(b)$ then $(i, j) \in E'_t(b)$ for exactly one $j$, and in step $t$ of $C_b$ message $m_{i,t}$ is delivered to $P_j$ for all pairs $(i, j) \in E'_t(b)$.
The intuition behind the next definition is that we try to "peel off" inessential parts of the two computations, i.e., to identify "meaningless" messages and eliminate them. For this, we define a set $E_{0,1}$ of labeled, directed edges that run along some of the edges of $G$, and identify a 2-broadcast protocol as part of this set.
• Edge $(i, j)$, with label $t$, for $1 \le t \le T$, is in $E_{0,1}$ if and only if there is some $b \in \{0, 1\}$ such that in computation $C_b$ the message $m_{i,t}$ sent by processor $P_i$ is delivered to $P_j$, but in the other computation $C_{1-b}$ no message or a different one is sent from $P_i$ to $P_j$ in step $t$.
Note that parallel edges are possible, involving different time steps. Note also that identical messages that are sent across the same edge in both $C_0$ and $C_1$ are ignored in this definition, corresponding to the intuition that they are "meaningless". The following simple observations are crucial.
Claim 1. Let $i \ne i_0$ and $j \ne i_0$. If $(i, j)$ with label $t$ is in $E_{0,1}$, and no edge $(i', j)$ with a label $t' < t$ is in $E_{0,1}$, then there must be some $l \in V$ such that edge $(l, i)$ with label $t'$ is in $E_{0,1}$ for some $t' < t$.
(Intuitively speaking, in order to send a meaningful message in step $t$, a processor $P_i$ with $i \ne i_0$ must either
• have received a meaningful message in an earlier step in the same computation, or
• have noticed that a meaningful message that would have been due to arrive in the other computation has not turned up, or
• be treated differently by the intended recipient of the message in the two computations.)
Claim 2. If $i \ne i_0$ then there is at least one labeled edge that enters node $i$.
(Intuitively, if $P_i$, $i \ne i_0$, never gets a meaningful message in either computation, it cannot know the input bit at the end.)
Proof of Claim 1. Assume for a contradiction that $(i, j)$ with label $t$ is in $E_{0,1}$, but that in $E_{0,1}$ there is neither an edge $(i', j)$ nor an edge $(l, i)$ with a label $t' < t$. By symmetry, we may assume without loss of generality that in computation $C_0$ processor $P_i$ sends a message $m_{i,t}$, which is delivered to $P_j$, but that this message does not flow from $P_i$ to $P_j$ in $C_1$. This means that $(i, j) \in E_t(0)$, hence we have $j \in V_{i,t}(0)$ and $i \in W_{j,t}(0)$. The last two assumptions imply that processor $P_i$, which does not know $b$ initially since $i \ne i_0$, has received exactly the same messages in each of the steps $1, \ldots, t-1$ in both computations, and that the same is true for $P_j$. This entails that $P_i$ and $P_j$ both are in the same state after step $t-1$ in $C_0$ and $C_1$, hence $j \in V_{i,t}(1)$ and $i \in W_{j,t}(1)$, which implies $(i, j) \in E_t(1)$, and $P_i$ sends message $m_{i,t}$ in $C_1$ as well. By the definition of $E_t = E_t(0) \cap E_t(1)$, edge $(i, j)$ appears in this set. In computation $C_0$ edge $(i, j)$ is used for delivering $m_{i,t}$, which implies $(i, j) \in E'_t$. Hence, $(i, j) \in E'_t(1)$ as well, which means that message $m_{i,t}$ is delivered to $P_j$ in $C_1$ as well, a contradiction.
Proof of Claim 2. If no labeled edges enter $i$, then all messages received by $P_i$ in steps $t$, $1 \le t \le T$, are identical in $C_0$ and $C_1$. Thus, since $i \ne i_0$, in both computations $P_i$ will be in the same state at the end of step $T$. This contradicts the requirement that at the end all processors must know $b$.
Next, we eliminate some of the labeled edges, as follows. All labeled edges that enter $i_0$ are removed; all labeled edges that enter $i \ne i_0$, except the one with the smallest label $t$, are also removed. (This operation corresponds to the intuition that once a processor has received a meaningful message, or noted that it did not arrive at its due time, it knows $b$, and later messages are irrelevant.) The resulting edge set is called $E^*_{0,1}$. The digraph $G^* = (V, E^*_{0,1})$ has the following two properties:
(i) All nodes except $i_0$ have indegree 1 in $G^*$.
(ii) If $(i, j)$ with label $t$ is in $E^*_{0,1}$, then either $i = i_0$ or the (unique) edge that enters $i$ in $G^*$ has a label $t' < t$.
(Property (i) is immediate from Claim 2 and the definition of $E^*_{0,1}$. Property (ii) is a consequence of Claim 1 and the definition of $E^*_{0,1}$. Indeed, if $(i, j)$ with label $t$ is in $E^*_{0,1}$, then there cannot be an edge $(i', j)$ in $E_{0,1}$ with an "earlier" label $t' < t$, and $j \ne i_0$. We may assume that $i \ne i_0$, and apply Claim 1 to conclude that there is an edge in $E_{0,1}$ that enters $i$ and has a label $t' < t$. Again by the definition of $E^*_{0,1}$, the label of the unique edge $(l, i) \in E^*_{0,1}$ that enters $i$ must have a step number which is at most $t'$, thus smaller than $t$.)
Properties (i) and (ii) taken together say that $G^*$ is a directed spanning tree for $V$ with root $i_0$, and that the labels along directed paths in $G^*$ are strictly increasing with respect to their $t$-parts. Moreover, due to the definition of $E_{0,1}$, for each node $i$ and each $t$ there can be at most two edges leaving $i$ that are labeled with $t$. From this it easily follows that we obtain a 2-broadcast protocol for $G$ with at most $T$ rounds by defining $E_t = \{(i, j) \mid (i, j) \in E^*_{0,1} \text{ and } (i, j) \text{ has label } t\}$, for $1 \le t \le T$.
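The pruning step and the check of properties (i) and (ii) can be sketched as below. Labeled edges are represented as triples `(i, j, t)`; this representation, the function names, the arbitrary tie-breaking among equal labels, and the example edge set are all assumptions made for illustration.

```python
from collections import Counter


def prune(labeled_edges, source):
    """For each node other than the source keep one entering edge of smallest label."""
    best = {}
    for i, j, t in sorted(labeled_edges):
        if j == source:
            continue                          # edges entering i_0 are removed
        if j not in best or t < best[j][2]:
            best[j] = (i, j, t)
    return set(best.values())


def is_2_broadcast_tree(nodes, edges, source):
    """Check properties (i), (ii) and the out-degree bound of two per label."""
    parent = {j: (i, t) for i, j, t in edges}
    if len(parent) != len(edges) or set(parent) != set(nodes) - {source}:
        return False                          # property (i) fails
    for i, j, t in edges:
        if i != source and parent[i][1] >= t:
            return False                      # property (ii) fails
    out = Counter((i, t) for i, j, t in edges)
    return all(count <= 2 for count in out.values())


# Hypothetical set E_{0,1} on nodes 1..4 with source 1.
E01 = {(1, 2, 1), (1, 3, 1), (2, 3, 2), (2, 4, 2), (3, 4, 3), (4, 1, 3)}
tree = prune(E01, source=1)
rounds = {t: sorted((i, j) for i, j, s in tree if s == t) for t in (1, 2, 3)}
print(sorted(tree), is_2_broadcast_tree({1, 2, 3, 4}, tree, 1), rounds)
```

Because labels strictly increase along every path towards the leaves, properties (i) and (ii) already rule out cycles, which is why the sketch does not need a separate acyclicity test.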
Broadcasting in 2-way mode and in other parallel models
The complexity of broadcasting a bit on EREW PRAMs has been determined in [3]. Let us recall briefly how such a parallel model works. It consists of processors $Q_1, \ldots, Q_p$ and cells $C_1, \ldots, C_r$. In one step, a processor reads from a cell, performs some internal computation, and writes to a cell. It is forbidden that in any step more than one processor reads from the same cell or more than one processor writes to the same cell. Initially, one cell contains a bit, the others have a neutral content. EREW algorithms can be regarded as algorithms for processor networks with $n = p + r$ processors. Each PRAM processor and each PRAM cell is represented as a network processor. A write phase is represented as a step in which all PRAM processors are senders (with 0 or 1 possible recipients) and all cells are recipients (specifying all PRAM processors as possible senders). A read phase is represented as a step in which all PRAM processors are recipients (specifying 0 or 1 cell as possible sender). If we apply the technique from the previous section, we obtain a 2-broadcast tree in which levels alternately consist of PRAM cells and PRAM processors. It is not hard to show that the numbers $p_t$ of the cumulative size of the levels 1 through $t$ of processors and the numbers $c_t$ of the cumulative size of the levels 1 through $t$ of cells satisfy the following recurrence inequalities: $c_0 = 1$, $p_0 = 0$, $p_t \le p_{t-1} + c_{t-1}$, $c_t \le c_{t-1} + 2 p_t$ (cf. [3]). This implies that for all $p$ processors to be informed of the bit, $\log_{2+\sqrt{3}} p \pm O(1)$ steps must be made. Incidentally, we note that the EXCLUSIVE WRITE rule is not needed in this argument; rather, concurrent writing with any conflict resolution rule may be permitted.
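To see where the base $2 + \sqrt{3}$ comes from, one can iterate the recurrence with equality in both inequalities, which gives the fastest growth the inequalities allow. The following sketch does this; the function names are illustrative, and treating the inequalities as equalities is an assumption made only to exhibit the extremal growth rate.

```python
import math


def extremal_levels(steps):
    """Iterate p_t = p_{t-1} + c_{t-1}, c_t = c_{t-1} + 2 p_t from p_0 = 0, c_0 = 1."""
    p, c = 0, 1
    history = [(p, c)]
    for _ in range(steps):
        p = p + c          # processors that can have read an informed cell
        c = c + 2 * p      # cells informed by a write or by a missing write
        history.append((p, c))
    return history


def min_steps_bound(num_processors):
    """Smallest t for which the recurrence permits p_t >= num_processors."""
    t, p, c = 0, 0, 1
    while p < num_processors:
        p = p + c
        c = c + 2 * p
        t += 1
    return t


hist = extremal_levels(20)
print(hist[:5])
print(hist[20][0] / hist[19][0], 2 + math.sqrt(3))
```

Substituting the first equation into the second gives $c_t = 3c_{t-1} + 2p_{t-1}$, so the pair $(p_t, c_t)$ evolves under a matrix with characteristic polynomial $\lambda^2 - 4\lambda + 1$, whose larger root is $2 + \sqrt{3}$.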
We want to generalize this lower bound to a PRAM in which concurrent read accesses or write accesses are not forbidden but rather are resolved by the ARBITRARY rule.
ARBITRARY READ: If in a step several processors try to read from the same cell, an arbitrary one of them is given the contents of the cell, the other ones receive a negative acknowledgement ("reading failed"). ARBITRARY WRITE: If in a step several processors write to the same cell, an arbitrary one of them succeeds and is given an acknowledgement ("writing successful"), the other ones are given a negative acknowledgement ("writing failed").
The reader should note that for reading this is an unusual rule in the context of PRAMs. It has only been proposed in the context of distributed memory machines (DMMs) as a relaxation of the COLLISION access rule, see [10, 25]. The DMM with COLLISION access rule, also known as OCPC, has been under intensive investigation in recent years, and it has been shown that such machines are very strong in a randomized setting; in particular they are able to perform routing and simulate PRAMs in sublogarithmic time [14, 15, 25]. Here, we show that the deterministic version of the model has a significant weakness: the elementary task of broadcasting one bit to $p$ processors takes $\Omega(\log p)$ steps.
If we try to translate the rules for such ARBITRARY-READ ARBITRARY-WRITE PRAMs into communication rules for a network, as above for the EREW PRAM, we note that this time we must take 2-way communication into account, because of the acknowledgements received by the processors that write. Thus, we must describe communication rules for processor networks without restriction $(*)_2$ but allowing full-duplex use of links in one step. The rules for step $t$ in such a network are as follows. Depending on its state after step $t-1$, each processor $P_i$ fixes a message $m_{i,t}$ and a set $V_{i,t} \subseteq \{j \mid \{i, j\} \in E\}$ of possible partners. Let $V_t = \{i \in V \mid V_{i,t} \ne \emptyset\}$ and $E_t = \{\{i, j\} \mid i \in V_{j,t} \text{ and } j \in V_{i,t}\}$, and consider the graph $G_t = (V_t, E_t)$. A maximal (i.e., not extendible) matching $M_t \subseteq E_t$ is chosen arbitrarily, and for each edge $\{i, j\} \in M_t$ message $m_{i,t}$ is delivered to $P_j$ and $m_{j,t}$ is delivered to $P_i$. Processors that have received a message change their state based on this message; processors that have not received anything change their state based on this knowledge. At the beginning, i.e., before step 1, processor $P_{i_0}$ knows the bit $b$; after step $T$, all processors must know bit $b$ no matter in which way the sets $M_t$ were chosen.
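One step of this 2-way rule can be sketched as follows. The dictionary `V` stands for the partner sets $V_{i,t}$; the greedy selection is just one admissible way of choosing a maximal matching, and the names and the four-node example are illustrative assumptions.

```python
def two_way_requests(V):
    """E_t = {{i, j} : i in V[j] and j in V[i]}, edges as frozensets."""
    return {frozenset((i, j))
            for i, partners in V.items()
            for j in partners
            if i != j and i in V.get(j, set())}


def greedy_maximal_matching(E):
    """Scan the edges in a fixed order and keep every edge that is still free."""
    matched, M = set(), set()
    for e in sorted(E, key=sorted):
        if not (e & matched):
            M.add(e)
            matched |= e
    return M


def is_maximal_matching(M, E):
    """M is a matching inside E that cannot be extended by another edge of E."""
    covered = set()
    for e in M:
        if e not in E or (e & covered):
            return False
        covered |= e
    return all(e & covered for e in E)


# Hypothetical partner sets on the path 1 - 2 - 3 - 4.
V = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
E_t = two_way_requests(V)
M_t = greedy_maximal_matching(E_t)
print(sorted(sorted(e) for e in M_t))
```

A maximal matching need not be a maximum one: on this example the single edge $\{2, 3\}$ is also maximal, and a correct algorithm in this model has to work for every such choice.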
Definition 5.5. The 2-way broadcast complexity $T^2_{G,i_0}$ of a processor network $G$ with source $P_{i_0}$ is the smallest number of steps a 2-way algorithm for broadcasting a bit from $P_{i_0}$ can have.
Remark 5.6. In analogy to Remark 2.9 we note that this kind of algorithm can be performed on an asynchronous network without deadlocks and without processors waiting indefinitely for messages that will not arrive. As in the case of algorithms that obey restriction $(*)_2$, in its local step $t$ processor $P_i$ places its message $m_{i,t}$ and the list $V_{i,t}$ of its possible partners into a global buffer, together with a time stamp $t$. Whenever the buffer contains a matching pair of processors (i.e., $i \in V_{j,t}$ and $j \in V_{i,t}$), after some finite delay the system delivers the messages $m_{i,t}$ and $m_{j,t}$ and removes the communication requests. Ties are broken arbitrarily. Whenever it happens that $P_i$ has submitted a list $V_{i,t}$, but all $P_j$ with $j \in V_{i,t}$ either have specified $V_{j,t} = \emptyset$ or have already found other partners for communicating in step $t$, the system informs $P_i$ that its communication attempt in step $t$ has failed and $P_i$ proceeds to step $t + 1$. It is easily seen that the set $\{\{i, j\} \mid P_i \text{ exchanges messages with } P_j \text{ in step } t\}$ is a maximal matching $M_t$ in $E_t$, as required.
The 1-way algorithm of Observation 5.3 can also be run in 2-way mode, since no concurrent writing is used. Thus, we have the following.

However, one can also prove the (again intuitively obvious) fact that for the broadcast problem 2-way communication does not help.

Proof (Sketch). We must only prove that $b_2(G, i_0) \le T^2_{G,i_0}$. Let a 2-way algorithm be given that performs broadcast from $P_{i_0}$. We proceed similarly as in the proof of Theorem 5.4, and only indicate the changes. First, we consider undirected edges. Computations $C_0$ and $C_1$ are fixed simultaneously by induction on $t$. If $E_t(b) = \{\{i, j\} \mid i \in V_{j,t}(b) \text{ and } j \in V_{i,t}(b)\}$, for $b \in \{0, 1\}$, is the collection of possible pairs of processors to communicate in step $t$, we first fix a maximal matching $E'_t$ in $E_t = E_t(0) \cap E_t(1)$ and then let $E'_t(b) \supseteq E'_t$ be a maximal matching in $E_t(b)$, for $b = 0, 1$. Next, we choose labeled edges. Edge $\{i, j\}$ is put into $E_{0,1}$ with label $t$ if there is some $b \in \{0, 1\}$ such that in $C_b$ this edge is used for exchanging messages $m_{i,t}(b)$ and $m_{j,t}(b)$ but in the other computation $C_{1-b}$ this edge either is not used or different messages flow across it. We need the following observations.

Claim 1. If $i \ne i_0$ and $j \ne i_0$, and $\{i, j\}$ has label $t$, then there is some $l \in V$ such that either $\{l, i\}$ or $\{l, j\}$ is in $E_{0,1}$ with a label $t' < t$.

Claim 2. If $i \ne i_0$, then node $i$ is incident with some labeled edge.
The proofs of these claims are similar to those in the 1-way case. Now, we eliminate some labeled edges as follows. For each node $i \ne i_0$, choose the unique incident labeled edge with the smallest label, and direct this edge towards $i$. The resulting set of $n - 1$ directed edges is called $E^*_{0,1}$. Using Claim 1, it is easily seen that if $(i, j) \in E^*_{0,1}$ with label $t$, then either $i = i_0$ or there is an edge $(l, i) \in E^*_{0,1}$ with label $t' < t$. This implies that $E^*_{0,1}$ forms a directed spanning tree of $G$ with root $i_0$. Again, by the definition of $E_{0,1}$, at most two directed edges with the same time stamp can leave any node. This means that from $E^*_{0,1}$ we obtain a 2-broadcast protocol by defining $E_t = \{(i, j) \mid (i, j) \in E^*_{0,1} \text{ and } (i, j) \text{ has label } t\}$.
We leave it to the reader to work out the details of the following argument. We wish to apply Theorem 5.8 to PRAMs with ARBITRARY READ and ARBITRARY WRITE with acknowledgement. To this end, we model both processors and cells as processors in a network. The aim is to inform all $p$ processors of one bit initially located in one cell. If we carry out the construction of the edge set $E^*_{0,1}$ as in the previous proof, it turns out that the resulting tree splits into levels, which alternately correspond to PRAM processors only and to PRAM cells only. Due to the special properties of the read operation there can be at most one edge with time stamp $t$ running from a "cell node" to a "processor node" at a higher-numbered level, whereas from a "processor node" two edges with the same time stamp may emanate. This implies that the recurrence inequalities mentioned at the beginning of this section also hold here. In this way we obtain the following generalization of the main result from [3]. (A similar result for the related ERCW PRAM model has been stated in [24].)

Theorem 5.9. Broadcasting a bit from one cell to all $p$ processors in a PRAM with the ARBITRARY READ rule and the ARBITRARY WRITE rule with acknowledgement takes $\log_{2+\sqrt{3}} p \pm O(1)$ steps. The same bound holds for broadcasting a bit on a DMM with $p$ processors and $p$ memory modules.
Theorem 5.10. Any randomized algorithm for broadcasting a bit from one cell to all $p$ processors in a PRAM with the ARBITRARY READ rule and the ARBITRARY WRITE rule with acknowledgement takes $\Omega(\log p)$ steps. The same bound holds for broadcasting a bit on a DMM with $p$ processors and $p$ memory modules.
Proof (Sketch). From a randomized algorithm that performs broadcasting of one bit, such that with probability $1 - \varepsilon$ after $T$ steps all processors know the result, one obtains, by standard methods, a deterministic algorithm that broadcasts a bit to $(1 - 2\varepsilon)p$ processors in $T$ steps. To this algorithm we can apply the method for proving the previous theorem.
