This paper presents two algorithms for mutual exclusion on optical bus architectures, including the folded one-dimensional bus, the one-dimensional array with pipelined buses (1D APPB), and the two-dimensional array with pipelined buses (2D APPB). The first algorithm guarantees mutual exclusion, while the second guarantees both mutual exclusion and fairness. Both algorithms exploit the predictability of propagation delays in optical buses.
Introduction
Fiber-optic communication technologies have been used in Wide Area Networks because they offer higher bandwidth and lower error probability than other communication technologies. More recently, optical properties such as unidirectional propagation and predictable propagation delays have been touted as beneficial for building massively parallel processing systems that use optical buses to interconnect processing nodes. Time Division Multiplexing (TDM) enables the implementation of pipelined optical buses [1]. Alternatively, Wavelength Division Multiplexing (WDM) can be used to create multiple channels on which multiple messages can be transmitted simultaneously; these channels can be allocated to processors either statically or dynamically [2]. Exploiting the ability to transmit multiple messages simultaneously on optical buses (in pipelined fashion or over multiple channels), researchers have described efficient parallel algorithms based on the message-passing paradigm (see [3] for an excellent survey). To the best of our knowledge, the suitability of the shared-memory paradigm for optically interconnected parallel processing systems has not been investigated.
In this paper we describe how mutual exclusion can be implemented on pipelined optical buses. Mutual exclusion is critical to the shared-memory paradigm of parallel computation because the problem arises whenever several sites access shared resources concurrently [4]. For correctness, the shared resource must be accessed by only one site at a time (i.e., mutual exclusion). In addition to assuring mutual exclusion, techniques for mutual exclusion in distributed systems must exhibit the following characteristics:
(1) Freedom from deadlocks: Two or more sites should not wait endlessly for events that will never occur. An event can be, for example, the arrival of a message.
(2) Freedom from starvation: A site should not be forced to wait indefinitely to acquire a shared resource while other sites repeatedly acquire it. In other words, any site should be able to request and acquire a shared resource in a finite amount of time.
(3) Fairness: Requests must be executed in the order in which they are made, or in the order in which they arrive in the system.
In this paper we assume Time Division Multiplexing based optical waveguides that provide unidirectional pipelined buses. Our work builds on two related models for systems that use pipelined buses: the linear array with a reconfigurable pipelined bus system (LARPBS) [1, 5] and the array with reconfigurable buses (AROB) [6]. Many parallel algorithms have been proposed using these models. The LARPBS does not allow counting during a bus cycle; in fact, it allows no processing during a bus cycle except for setting switches at the beginning of the cycle. The AROB allows counting during a cycle. A more detailed description of the functional behavior of such pipelined buses, and of how "coincident pulse" methods can be used to achieve point-to-point, multicast, and broadcast data communication, can be found in [5].
In this paper, we make the following extensions to the pipelined bus models:
(1) Bus cycle: Most researchers define a bus cycle as the time needed for a message to travel the entire length of the bus. For our purposes, we define a cycle as the amount of time needed for a message to travel one segment of the pipelined bus (that is, the time to travel between two adjacent sites).
(2) Computation during a bus cycle: Atomicity, which is fundamental to the implementation of mutual exclusion, requires that computation be performed as events occur. This implies that delay loops may be required in pipelined optical bus segments to accommodate the computations required by the techniques described in this paper. In other words, we assume that during a cycle an optical message travels one segment of the bus, and a processing node performs some computation when a message is received. Alternatively, we can require two phases: communication and computation. During the communication phase, processors transmit (and receive) messages on the optical pipelined bus. A processor can place at most one message on the bus, and all messages move along the bus synchronously. The communication phase is complete only when all messages have traveled the entire length of the bus, assuring receipt by all processors. During the computation phase, processors examine the messages received during the communication phase and perform the computations needed to achieve mutual exclusion. The system alternates between communication and computation phases. To simplify the presentation, we assume that a bus cycle includes computation; however, our techniques also work in the second case, with alternating communication and computation cycles.
(3) Bus contention: Since we include computation in a bus cycle, we permit asynchronous and simultaneous requests from sites. In other words, processors are NOT required to transmit messages synchronously, and they can place a request at any time. This can lead to bus contention if other sites insert new requests while existing messages move down the pipelined bus. In this paper we assume that the buses are designed to eliminate such collisions: a processor refrains from placing a new message on the bus if that message would collide with a message already on the pipelined bus. This assumption is not needed in the alternating communication-computation model described above.
Optical Bus Architectures

The Folded One-Dimensional Bus
The folded one-dimensional bus model [7, 8] is shown in Figure 1. It consists of a sequence of $n$ equidistant processors connected by a folded bus. Each processor is connected to both the upper and lower segments of the bus; it transmits messages on the upper segment and receives messages from the lower segment. In Figure 1, a message originates on the upper segment and travels in the direction of the arrow. Assume that it takes 1 unit of time for a message to travel from one processor to its neighbor. This architecture permits a pipeline of messages to coexist simultaneously on an optical bus. Therefore, processors $P_0$ and $P_1$ can place messages on the bus at the same instant of time, and an arbitrary processor $P_i$ receives $P_0$'s message exactly 1 time unit after it receives $P_1$'s message. In general, a message from processor $P_i$ to $P_j$ takes $d_{ij} = 2n - 1 - i - j$ time units.
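The delay formula is easy to check numerically. Below is a minimal Python sketch, assuming processors indexed $0$ to $n-1$ and one time unit per hop; the extra unit in $2n - 1 - i - j$ is read here as the cost of turning around at the fold. The helper name `folded_delay` is hypothetical.

```python
def folded_delay(i: int, j: int, n: int) -> int:
    """Time units for a message from P_i to reach P_j on a folded bus of n processors.

    The message moves right on the upper segment, turns at the fold, and moves
    left on the lower segment, giving d_ij = 2n - 1 - i - j.
    """
    if not (0 <= i < n and 0 <= j < n):
        raise ValueError("processor index out of range")
    return 2 * n - 1 - i - j


n = 10
# P_0 and P_1 transmit at the same instant: every receiver P_k sees
# P_1's message exactly one time unit before P_0's.
assert all(folded_delay(0, k, n) - folded_delay(1, k, n) == 1 for k in range(n))
print(folded_delay(2, 9, n))  # 8
```

Setting $i = j$ gives $2n - 1 - 2i$, the time after which a processor's own message passes it again on the lower segment.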
As stated earlier, we assume that the optical transmission hardware on the upper segment can transmit a message conditionally, so that a transmission never causes a collision. This prevents $P_i$ from transmitting at time $t$ if processor $P_{i-1}$ transmitted at time $t - 1$.
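With one time unit per hop, the adjacent-processor condition generalizes: a message sent on the upper segment by any $P_k$, $k < i$, at time $t - (i - k)$ passes $P_i$ at time $t$. The sketch below encodes that check in Python; the function `may_transmit` and its set-of-pairs bookkeeping are illustrative assumptions, not part of the model's definition.

```python
from __future__ import annotations


def may_transmit(i: int, t: int, sent: set[tuple[int, int]]) -> bool:
    """Return True if P_i can start a message on the upper segment at time t.

    `sent` holds (k, t_k) pairs meaning P_k transmitted at time t_k. A message
    from P_k (k < i) passes P_i at t_k + (i - k), so P_i must stay silent then.
    The case k = i - 1, t_k = t - 1 is the one stated in the text.
    """
    return not any(k < i and t_k + (i - k) == t for k, t_k in sent)


sent = {(3, 4)}
print(may_transmit(4, 5, sent))  # False: P_3's message is passing P_4
print(may_transmit(6, 7, sent))  # False: the same message, three hops later
print(may_transmit(4, 6, sent))  # True: the message has already moved on
```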
The One-Dimensional Array with Pipelined Buses (1D APPB)
The 1D APPB is shown in Figure 2. It is similar to the folded bus above, except that processors can transmit and receive on either segment. For example, a message from $P_0$ to $P_2$ would be sent on the upper segment, while a message from $P_2$ to $P_0$ would be sent on the lower segment. Once again, we assume that processors can transmit messages conditionally so that there are no bus collisions. In this model, a message from $P_i$ to $P_j$ takes $d_{ij} = |i - j|$ time units.
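The routing rule can be sketched the same way. This Python sketch assumes, as the example above suggests, that the upper segment carries messages toward higher-numbered processors and the lower segment toward lower-numbered ones; `appb_route` is a hypothetical helper.

```python
from __future__ import annotations


def appb_route(i: int, j: int) -> tuple[str, int]:
    """Return (segment, delay) for a message from P_i to P_j on a 1D APPB.

    Messages to higher-numbered processors use the upper segment and messages
    to lower-numbered processors use the lower segment; d_ij = |i - j|.
    """
    if i == j:
        raise ValueError("source and destination must differ")
    segment = "upper" if i < j else "lower"
    return segment, abs(i - j)


print(appb_route(0, 2))  # ('upper', 2)
print(appb_route(2, 0))  # ('lower', 2)
```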
Mutual Exclusion Algorithms
Consider a multiprocessing environment where processors contend for a particular resource (e.g., a particular word in memory). We would like to ensure that at most one processor has access to the resource at any given instant. We achieve this below by assigning each contending processor a unique integer that indicates the order in which it must access the resource. This integer will be referred to as the processor's "turn". This contention phase is followed by an access phase in which each processor accesses the resource in the prescribed order. When a processor has completed its access, it broadcasts a message to this effect to the other processors, and the next processor proceeds to access the resource. The key to implementing this approach is to design the contention-phase algorithm so that different contending processors are guaranteed to be assigned different turns.
A processor $P_i$ first communicates its request for a given resource to all of the other processors. When processor $P_j$ receives $P_i$'s request, it immediately places a hold on any future requests that it might have for that resource until $P_i$'s turn is confirmed. The difficulty arises when $P_j$ places its request for the same resource before it receives $P_i$'s request. A mutual exclusion algorithm must resolve this situation consistently. We next present two mutual exclusion algorithms for optical bus architectures, described independently of the interconnection architecture.
A Window-Based Algorithm
Define the diameter $D$ of a network to be the longest delay (i.e., the time taken for a message to travel) between any pair of processors in the network, i.e., $D = \max_{i \neq j} d_{ij}$, where $d_{ij}$ is the delay between $P_i$ and $P_j$ as defined in Sections 2.1 and 2.2. Note that the diameter depends on the specific optical bus architecture being considered. Next, we define the window of vulnerability of each processor to be twice the diameter of the network, i.e., $2D$.
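Because $D$ is just a maximum over pairs, it can be computed by brute force for any delay function. A minimal Python sketch, assuming the two delay formulas given for the folded bus and the 1D APPB; all function names are hypothetical.

```python
from typing import Callable

# A delay model maps (i, j, n) to the time for a message from P_i to reach P_j.
DelayModel = Callable[[int, int, int], int]


def folded_delay(i: int, j: int, n: int) -> int:
    return 2 * n - 1 - i - j          # folded one-dimensional bus


def appb_delay(i: int, j: int, n: int) -> int:
    return abs(i - j)                 # 1D APPB


def diameter(n: int, delay: DelayModel) -> int:
    """D = max over all pairs i != j of d_ij."""
    return max(delay(i, j, n) for i in range(n) for j in range(n) if i != j)


def window_of_vulnerability(n: int, delay: DelayModel) -> int:
    return 2 * diameter(n, delay)


n = 10
print(diameter(n, folded_delay), window_of_vulnerability(n, folded_delay))  # 18 36
print(diameter(n, appb_delay), window_of_vulnerability(n, appb_delay))      # 9 18
```

For these two buses the maxima are $2n - 2$ (attained by $P_0$ and $P_1$) and $n - 1$ (attained by $P_0$ and $P_{n-1}$); passing a different delay function is one way to apply the same definitions to another architecture.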
We begin by outlining some assumptions and principles on which our algorithms are based.
1. Rule 1: A resource request is a message that contains the id of the processor making the request and the resource being requested.
2. Rule 2: When a processor makes a request for a resource, it may not make additional requests until the window of vulnerability for its request has elapsed.
3. Rule 3: A processor $P_j$ that receives a request for a resource from $P_i$ delays making a future request for that resource until $P_j$'s window of vulnerability has expired.
4. Rule 4: When requests from several processors overlap in time, processor priorities are used to resolve the requests. Without loss of generality, we assume that a lower-numbered processor has higher priority (i.e., $P_0$ has the highest priority).
Our mutual exclusion algorithm requires certain actions to be taken by a processor when it receives a resource-request message and when it wishes to send a resource request. Algorithm Receive Request (Figure 3) below describes the actions taken by processor $P_i$ when it receives a request for a resource at time $t$. For convenience, we pass the current time step ($t$) and the processor's ID ($i$) to the function as arguments. The Resource ID parameter ($r$) and the Time parameter ($t_i$) are assumed to be NULL if $P_i$ has no outstanding requests at time $t$; otherwise, they denote the resource that was requested and the time at which the request was sent.
Figure 3: Receive Request(TIME t, Proc …
Note that $t \le t_i + 2D$ must hold if $P_i$ has an outstanding request (Rule 2). Each processor holds a resource table (array) my_turn, and these arrays are assumed to initially contain the same values in every processor. For a resource $r$, the variable my_turn[r] in processor $P_i$ denotes the order in which $P_i$ will get access to $r$. Thus, when several processors request the same resource $r$, the mutual exclusion algorithm must ensure that each of them has a different local value of my_turn[r], indicating the order in which that processor will access the resource.
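Because the pseudocode of Figure 3 is not reproduced here, the following is only a hypothetical reading of Rules 1 through 4 for a single processor, written in Python. It assumes that every received request increments next_turn[r], that a higher-priority request arriving within the receiver's own window increments my_turn[r], and that earliest[r] is pushed to the end of the sender's window, whose start the receiver recovers as $t - d_{ji}$. The class `WindowProcessor` and its method names are illustrative, not the paper's.

```python
from dataclasses import dataclass, field


@dataclass
class WindowProcessor:
    """Per-processor state for the window-based algorithm (hypothetical sketch)."""
    pid: int
    D: int                                          # network diameter
    my_turn: dict = field(default_factory=dict)     # r -> this processor's turn
    next_turn: dict = field(default_factory=dict)   # r -> next unassigned turn
    earliest: dict = field(default_factory=dict)    # r -> earliest allowed request time
    sent_at: dict = field(default_factory=dict)     # r -> t_i of own latest request

    def send_request(self, r: str, t: int) -> bool:
        """Try to request resource r at time t; return False if blocked."""
        if t < self.earliest.get(r, 0):
            return False                            # Rules 2 and 3
        self.my_turn[r] = self.next_turn.get(r, 0)
        self.next_turn[r] = self.my_turn[r] + 1
        self.earliest[r] = t + 2 * self.D + 1       # own window of vulnerability
        self.sent_at[r] = t
        return True

    def receive_request(self, r: str, j: int, t: int, d_ji: int) -> None:
        """Handle P_j's request for r, arriving at time t after delay d_ji."""
        if j == self.pid:
            return                                  # own message passing by
        self.next_turn[r] = self.next_turn.get(r, 0) + 1
        # Hold future requests until P_j's window (started at t - d_ji) has elapsed.
        self.earliest[r] = max(self.earliest.get(r, 0), t - d_ji + 2 * self.D + 1)
        t_i = self.sent_at.get(r)
        if t_i is not None and t <= t_i + 2 * self.D and j < self.pid:
            self.my_turn[r] += 1                    # Rule 4: yield to higher priority


p0, p2 = WindowProcessor(0, D=18), WindowProcessor(2, D=18)
p0.send_request("r", 0)
p2.send_request("r", 0)
p0.receive_request("r", j=2, t=17, d_ji=17)
p2.receive_request("r", j=0, t=17, d_ji=17)
print(p0.my_turn["r"], p2.my_turn["r"])  # 0 1
```

The demo replays the first two requests of the example below ($P_0$ and $P_2$ on the folded bus with $n = 10$, so $D = 18$ and $d_{02} = d_{20} = 17$): $P_2$ yields to the higher-priority $P_0$.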
When a processor wishes to make a request for a resource $r$, it first checks that the current time is greater than that in …
Example: Consider a folded unidirectional bus architecture (Figure 1) with $n = 10$ processors. Suppose that processors $P_0$ and $P_2$ send resource requests at time 0, $P_7$ at time 8, $P_1$ at time 11, $P_9$ at time 16, and $P_8$ at time 22. Assume that all requests are for the same resource. We describe the step-by-step operation of our algorithms using Table 1.
At time 0, $P_0$ and $P_2$ send request messages. As per function Send Request, both processors set their my_turn variables to 0, increment their next_turn variables, and initialize earliest to $2D + 1 = 37$ (for this bus, $D = 2n - 2 = 18$). In the next 7 time steps, their messages travel to the right on the upper segment of the folded bus. At time step 8, $P_2$'s message reaches $P_9$'s receiver, causing $P_9$ to update its next_turn and earliest variables as outlined in function Receive Request. Simultaneously, in step 8, $P_7$ sends a request. In steps 9 through 17, $P_2$'s message travels to the left on the lower segment, causing variables in all the processors to be updated: each processor increments next_turn and, if necessary, updates earliest. Observe also that $P_7$ increments its my_turn variable in step 10, since it has a lower priority than $P_2$. However, $P_0$ does not change its turn in step 17, since it has a higher priority than $P_2$. In the meantime, $P_1$ sends a resource request in step 11, and its next_turn variable is subsequently updated when $P_2$'s message passes through in step 16. The messages from $P_0$, $P_7$, and $P_1$ arrive at $P_9$ at steps 10, 11, and 20, respectively. In subsequent steps, these messages travel to the left on the lower segment, updating variables as specified in Receive Request. Notice that when a message passes the processor from which it originated, no change is made to that processor's variables (e.g., $P_2$ in step 15, $P_0$ in step 19, $P_7$ in step 13, and $P_1$ in step 28). Finally, observe that the messages from $P_9$ and $P_8$, which were to be sent at steps 16 and 22, respectively, never get sent in our snapshot! This is because the value of the earliest variable in those processors at the specified times is greater than the time step (e.g., the earliest variable in $P_9$ at step 16 is 44, and $44 > 16$). On completion, all processors have a local next_turn value of 4 and an earliest value of 48.
The four processors that got their requests out before receiving any requests (i.e., $P_0$, $P_1$, $P_2$, and $P_7$) have been given unique turns according to their priorities.
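The example can be replayed with a small discrete-event simulation. This Python sketch is hypothetical and is not a transcription of Table 1: it assumes that every request increments next_turn at each receiver, that a higher-priority request arriving within the receiver's own window increments my_turn, and that earliest moves to the end of the latest sender's window. Arrivals are handled before new requests within a time step, and delays use $d_{ij} = 2n - 1 - i - j$. Because the update rule for earliest is assumed, intermediate values may differ from Table 1, but the final turns, next_turn values, and earliest values match the text.

```python
import heapq

N, D = 10, 18                          # folded bus with n = 10, so D = 2n - 2


def delay(i: int, j: int) -> int:
    return 2 * N - 1 - i - j           # d_ij on the folded bus


# Per-processor state for a single shared resource.
my_turn = {}                           # pid -> turn (only for processors that requested)
next_turn = [0] * N
earliest = [0] * N
sent_at = [None] * N

# Attempted requests from the example: (time, pid).
attempts = [(0, 0), (0, 2), (8, 7), (11, 1), (16, 9), (22, 8)]

# Event entries: (time, kind, pid, source, origin); kind 0 = arrival, 1 = attempt,
# so arrivals are handled before new requests in the same time step.
events = [(t, 1, p, p, t) for t, p in attempts]
heapq.heapify(events)
blocked = []

while events:
    t, kind, p, j, origin = heapq.heappop(events)
    if kind == 1:                      # P_p tries to send a request
        if t < earliest[p]:
            blocked.append(p)          # still inside some window of vulnerability
            continue
        my_turn[p] = next_turn[p]
        next_turn[p] += 1
        earliest[p] = t + 2 * D + 1
        sent_at[p] = t
        for k in range(N):             # the request reaches every processor P_k
            heapq.heappush(events, (t + delay(p, k), 0, k, p, t))
    elif j != p:                       # P_j's request arrives at P_p (own message ignored)
        next_turn[p] += 1
        earliest[p] = max(earliest[p], origin + 2 * D + 1)
        if sent_at[p] is not None and t <= sent_at[p] + 2 * D and j < p:
            my_turn[p] += 1            # higher-priority overlapping request

print(dict(sorted(my_turn.items())))   # {0: 0, 1: 1, 2: 2, 7: 3}
print(set(next_turn), set(earliest))   # {4} {48}
print(blocked)                         # [9, 8]
```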
Lemma 1 If the windows of vulnerability for two processors overlap, then both processors are scheduled consistently with respect to each other.
Proof Let $P_i$ and $P_j$ be two processors whose windows overlap. Without loss of generality, assume that $P_i$ makes its request first, at time $t_i$. Since $P_j$ made an overlapping request, that request must have been made at a time $t_j$ such that $t_i \le t_j \le t_i + d_{ij}$ (Rule 3). $P_j$'s request reaches processor $P_i$ at time $t_j + d_{ji} \le t_i + d_{ij} + d_{ji} \le t_i + 2D$. Therefore, $P_i$ receives $P_j$'s request within its ($P_i$'s) window of vulnerability, and vice versa. Since processors are prioritized consistently throughout the network, the turns allocated to the two processors are consistent with respect to each other; e.g., if $j < i$, then $P_i$'s turn will be incremented whereas $P_j$'s turn will remain the same.
Proof We show that different processors requesting the same resource are assigned "turns" for that resource such that each processor is assigned a different turn. First, we show that any set $S_j$ of requests for resource $j$ over some period can be partitioned into subsets $S_j^k$ such that the windows of vulnerability of the requests in each $S_j^k$ all overlap with each other and do not overlap with windows of vulnerability from any $S_j^l$, $l \neq k$. If this is not true, there must exist a triple of requests from $P_i$, $P_j$, and $P_k$ such that …
A Timestamp-Based Algorithm
Since priorities are hard-wired into the algorithm of the previous section, the mutual exclusion protocol described there is consistently biased against low-priority (i.e., higher-numbered) processors. Thus, the algorithm is not fair: even if the request from a lower-priority site originated earlier than that of a higher-priority site, the higher-priority site may be granted mutual exclusion before the lower-priority site. For example, every time $P_i$ and $P_j$, $i < j$, place requests so that their windows overlap, $P_i$ gets access to the resource before $P_j$, even if $t_j < t_i$.
In this section, we present an algorithm that operates on a First Come First Served (FCFS) basis, i.e., if $P_j$ places a request before $P_i$, it gets earlier access to the resource even if their windows overlap. We revert to the priority scheme of the previous section in the relatively unlikely event that several processors place a request at the same time. Algorithm Receive Request (Figure 5) uses the predictability of delays in an optical bus to determine when a request was made from the time the message is received at a processor. For example, if $P_i$ receives a request message from $P_j$ at time $t$, $P_j$'s message must have originated at time $t - d_{ji}$. If this quantity is less than $t_i$, the time at which $P_i$ sent its own message, $P_j$ gets access to the resource before $P_i$.
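The timestamp comparison can be sketched as a predicate that $P_i$ evaluates when $P_j$'s request arrives. This minimal Python sketch assumes the folded-bus delay $d_{ji} = 2n - 1 - j - i$ and breaks exact ties with the Rule 4 priorities; `precedes` is a hypothetical name, and the algorithm of Figure 5 may differ in detail.

```python
def folded_delay(i: int, j: int, n: int) -> int:
    return 2 * n - 1 - i - j


def precedes(j: int, t_recv: int, i: int, t_i: int, n: int) -> bool:
    """Evaluated at P_i: does P_j's request, received at t_recv, come before P_i's own?

    P_i recovers P_j's send time as t_recv - d_ji (delays are predictable), then
    compares (send time, processor id) so equal send times fall back on priority.
    """
    t_j = t_recv - folded_delay(j, i, n)
    return (t_j, j) < (t_i, i)


n = 10
# From the earlier example: P_7 sent at 8 and P_1 at 11; P_7's request reaches P_1 at 19.
print(precedes(7, 19, 1, 11, n))   # True: P_7 asked first, so it goes first
# P_1's request reaches P_7 at 22; the recovered send time 11 is later than 8.
print(precedes(1, 22, 7, 8, n))    # False
```

Under the window-based algorithm, $P_1$ would be served before $P_7$ because of its higher priority; the timestamp comparison reverses that order, which is the FCFS behavior described above.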
Theorem 2 The algorithms presented above guarantee mutual exclusion and fairness.
Proof Mutual exclusion is guaranteed by reasoning identical to that in Theorem 1. The timestamp technique described above clearly causes requests to be handled in an FCFS fashion. □
We note that the algorithms can easily be extended to other optical networks by choosing appropriate values for $d_{ij}$.
Performance Analysis
In this section, we analyze the performance of the timestamp algorithm of the previous section on the folded bus architecture. We begin by defining some performance measures for mutual exclusion algorithms. Our …
To determine the synchronization delay $SD$, we assume that when a processor releases a lock, it broadcasts a release message to all the other processors. On a folded bus, this takes a minimum of $n$ units (for $P_{n-1}$) and a maximum of $2n - 2$ units (for $P_0$ or $P_1$). In general, $P_0$ has $SD = 2n - 2$, while an arbitrary $P_i$, $i \ge 1$, has $SD = 2n - i - 1$. To determine the throughput $TH$, we assume that a processor spends $E$ time units in a critical section. Then $TH = 1/(SD + E)$, and appropriate values for $SD$ can be substituted to obtain the minimum and maximum throughput.
Next, we present best- and worst-case analyses of the response time $RT$ for an arbitrary processor $P_i$. The best case is when the lock is not held by any other processor and no other processor requests the lock during $P_i$'s window of vulnerability. In this case, $RT = 2D + E$. Alternatively, if we make the common assumption that in the best case $P_i$ has to wait for a single processor $P_j$ to release the lock, $RT$ also includes the remaining execution time of $P_j$ ($E/2$ on average) and the synchronization delay after $P_j$ releases the lock (approximately $3n/2$ on average). Then
$$RT = 2D + \tfrac{3}{2}E + \tfrac{3n}{2}.$$
In the worst case, all processors request the resource at the same instant of time and $P_i$'s request receives the lowest priority (i.e., $i = n - 1$). So, after waiting for the window of vulnerability to expire, $P_{n-1}$ must wait for the remaining $n - 1$ processors to execute and incur their synchronization delays before performing its own execution. Here, with the term $2n - 2$ accounting for $P_0$'s synchronization delay and the summation for those of $P_1, \dots, P_{n-2}$,
$$RT = 2D + nE + (2n - 2) + \sum_{i=1}^{n-2} (2n - i - 1) = 2D + nE + \tfrac{3}{2}n^2 - \tfrac{3}{2}n - 1.$$
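The worst-case closed form can be checked numerically. The Python sketch below computes $SD$ for each processor by brute force from $d_{ij} = 2n - 1 - i - j$, evaluates the summation exactly as written, and compares it with the closed form; the function names are hypothetical, and $D$ and $E$ are left as inputs.

```python
from fractions import Fraction


def sync_delay(i: int, n: int) -> int:
    """SD for P_i: time for its release broadcast to reach the farthest processor."""
    return max(2 * n - 1 - i - j for j in range(n) if j != i)


def throughput(sd: int, e: int) -> Fraction:
    return Fraction(1, sd + e)                     # TH = 1 / (SD + E)


def worst_case_rt(n: int, d: int, e: int) -> int:
    """2D + nE + SD(P_0) + sum of SD(P_i) for i = 1 .. n-2."""
    return 2 * d + n * e + sync_delay(0, n) + sum(sync_delay(i, n) for i in range(1, n - 1))


def worst_case_rt_closed(n: int, d: int, e: int) -> Fraction:
    return 2 * d + n * e + Fraction(3, 2) * n * n - Fraction(3, 2) * n - 1


n, e = 10, 5
d = 2 * n - 2                                      # folded-bus diameter
print(sync_delay(n - 1, n), sync_delay(0, n))      # 10 18
print(throughput(sync_delay(0, n), e))             # 1/23
print(worst_case_rt(n, d, e), worst_case_rt_closed(n, d, e))  # 220 220
```

The two values agree for every $n \ge 2$, since $\sum_{i=1}^{n-2}(2n - i - 1) = (n-2)(3n-1)/2$, and adding $2n - 2$ gives $(3n^2 - 3n - 2)/2$.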
