Time-division multiplexing (TDM) optical buses have applications in computer networks, high-performance routers and switches, and interprocessor communication subsystems of parallel computers. Three waveguides are used to implement the previously proposed TDM optical bus. One waveguide is used for data transmission, and the other two waveguides are used for processor addressing. The coincident pulse technique is used to implement a unary addressing scheme. This architecture has two drawbacks: the bandwidth of address waveguides is wasted and the scalability of the bus is limited due to the unary addressing. In this paper, the coincident pulse technique is generalized in order to derive compact addressing schemes for TDM optical buses. Based on this new technique, optimal addressing schemes are presented. Several general methods for designing new addressing schemes are proposed to achieve various trade-offs. Compared with the previously proposed addressing scheme, the new schemes significantly improve the utilization of the bandwidth of optical waveguides and the scalability of a TDM optical bus.
all-optical communications among its connected nodes. A node that is connected by more than one link is considered a routing node. Communications between two nodes that are not connected by a link are conducted along a path connecting the two nodes in the hypergraph corresponding to the network, and optical-to-electrical and electrical-to-optical conversions are performed at the routing nodes on the path. Routing in such an optical network consists of two components: self-routing in the optical domain for sending a packet from one node to another on the same link, and forwarding at a routing node by using a conventional routing method (such as table lookup). Figure 1 illustrates the structure of such an optical network. In parallel computing, pipelined TDM optical buses can be used to construct processor arrays. Many efficient parallel algorithms have been developed for such processor arrays (refer to [5, 9-11, 14, 15] for examples).
An addressing scheme based on the coincident pulse technique is used in the previously proposed TDM buses. Addressing in the optical domain increases the signal transmission rate of an optical bus. However, the available bandwidth of waveguides in previously proposed bus architectures is not fully utilized due to the following facts: the unary coding in the existing addressing scheme uses n pulse slots (n is the number of processors connected by a bus), which may take a large portion of a packet, and the address waveguides are only used for addressing purposes. In this paper, we first simplify the previous pipelined TDM optical bus by reducing the number of address waveguides and proposing to also use address waveguides for transmitting data. As a consequence, there is no need to distinguish waveguides for addressing and data transmission. Then we introduce a generalization of the known coincident pulse technique used in processor addressing. As a result of the combination of these two techniques, the utilization of the available bandwidth of all waveguides can be significantly increased. We demonstrate this by introducing several optimal addressing schemes. We also propose several general addressing scheme design methods that can be used to design a wide variety of addressing schemes achieving trade-offs among a few parameters. It is important to note that the applicability of the addressing schemes proposed in this paper is broad. They can be used in optical buses adopting either fixed-assignment TDM (synchronous TDM) multiaccess protocol or demand-assignment TDM (asynchronous TDM) multiaccess protocol, with or without pipelined transmission, even though our discussions focus on the pipelined optical bus using fixed-assignment TDM multiaccess protocol.
PIPELINED TDM OPTICAL BUS ARCHITECTURE
A pipelined optical bus consists of three folded waveguides, the message waveguide, the reference waveguide, and the select waveguide, connecting $n$ linearly ordered processors $P_0, P_1, \ldots, P_{n-1}$ (refer to Fig. 2). The message waveguide is used for carrying data. The reference waveguide and the select waveguide, which are also called address waveguides collectively, are used together for carrying address information encoded by using the coincident pulse technique [2, 8]. Each waveguide of the bus is divided into two segments, the transmitting segment, which is the upper half of the waveguide with taps originating from processors, and the receiving segment, which is the lower half of the waveguide with taps leading to processors. Let $d$ be the pulse duration in time and $v_l$ be the velocity of light in these waveguides. Define a unit delay to be the spatial length $d \cdot v_l$. The generic pipelined TDM optical bus configuration is given in Fig. 3, where only one waveguide is shown. In this configuration, the spatial separation of any two adjacent taps on a waveguide is $D$ unit delays. The bus architecture of Fig. 2 is obtained from the generic bus configuration by adding a loop between every two adjacent taps on the receiving segments of the reference waveguide and the message waveguide. Each loop is an extra fiber segment of a unit delay. These added delays are used for optical self-routing purposes.
Referring to Figs. 2 and 4a, we explain the coincident pulse technique. When a source processor sends a packet (which will be defined shortly), it sends a reference pulse and a select pulse. The select pulse is transmitted later than the reference pulse with an appropriate time delay so that the two pulses arrive at the destination processor at the same time. That is, the coincidence of the two pulses occurs at the desired destination. Whenever a processor detects a coincidence of a reference pulse and a select pulse, it starts to read from the message waveguide. More specifically, suppose that a processor is going to send a packet to processor $j$. We use $t_{ref}$ and $t_{sel}$ to denote the times at which the processor transmits its reference pulse and its select pulse, respectively. The two light pulses will coincide at processor $j$ if and only if $t_{sel} = t_{ref} + j$, with time measured in pulse durations.
We call the duration of each light pulse a pulse slot. Define a packet as a collection of information including an address frame and a data frame (see Fig. 4c). An address frame consists of a sequence of $n$ pulse slots, as shown in Fig. 4a. The pulse slots in the address frame are named $a_0, a_1, \ldots, a_{n-1}$. A reference pulse is sent in $a_0$ of the reference waveguide. The existence of a select pulse at pulse slot $a_j$ means that a packet is to be sent to processor $j$. The address frame for broadcasting can be easily implemented by setting a select pulse at each pulse slot of an address frame (see Fig. 4b). For easy reference, we call this addressing scheme the unary addressing scheme.
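The unary scheme can be illustrated with a minimal sketch. It assumes a simplified timing model that is not spelled out in this form in the text: the reference pulse leaves in slot $a_0$ and picks up one extra pulse slot of delay per processor it passes, so it reaches processor $j$ aligned with slot $a_j$ of the select waveguide. The function names are hypothetical.

```python
def select_frame(n, dest=None):
    """Pulse pattern on the select waveguide for one address frame.

    Slot a_j carries a pulse iff processor j is addressed; dest=None
    models a broadcast (a pulse in every slot).
    """
    if dest is None:
        return [1] * n
    return [1 if slot == dest else 0 for slot in range(n)]


def coinciding_processors(frame):
    """Processors at which the reference pulse meets a select pulse.

    Assumed model: the reference pulse starts in slot a_0 and is delayed
    by one pulse slot per processor passed, so at processor j it lines up
    with slot a_j of the select waveguide.
    """
    hits = []
    for j in range(len(frame)):
        reference_slot = 0 + j      # a_0 plus j unit delays (assumption)
        if frame[reference_slot]:   # a select pulse sits in that slot
            hits.append(j)
    return hits


print(coinciding_processors(select_frame(8, dest=5)))  # [5]
print(len(coinciding_processors(select_frame(8))))     # 8 (broadcast)
```

The address frame grows linearly with $n$ in this model, which is the cost that the later sections set out to reduce.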
The simplest multiaccess protocol for this bus is the classical fixed-assignment TDM (FA-TDM). Figure 3 illustrates this bus access method on a generic pipelined TDM optical bus configuration. Using this method, the data transmission is partitioned into a sequence of phases, each consisting of a sequence of $n$ packet slots, from $n-1$ down to 0 (see Fig. 3). Packet slot (PS for short) $PS_i$ is fixed for processor $i$. Imagine that a train of $n$ packet slots originates from the entry of a bus. If processor $i$ has a packet to send, it loads its packet into $PS_i$. Let $L[PS]$ be the packet length in terms of unit delays. In order to provide correct unary addressing and prevent packet overlaps, the condition $D \ge L[PS] > n$ must hold. Several variations of this bus can be found in [11, 14, 20]. The coincident pulse addressing method also works for processor addressing in a TDM optical bus without using pipelined transmission.
In order for the pipelined optical TDM bus to operate properly, precise timing is required for the synchronization of processor operations. One can use one or more separate waveguides to connect all processors to a single global clock to ensure that all processors virtually share an identical global time. Different clock distribution implementation methods based on predictable optical signal delays are discussed in [17]. Suppose that the global (system) clock provides starting pulses of packet slots. These global clock pulses serve as packet envelopes. Then each processor can generate finer clock signals in the way shown in Fig. 5, and these finer clock signals are used to synchronize the transmitters and receivers of the processor. The small circles in Fig. 5 represent the extra segments that are required to delay the starting pulse of each packet. For easy reference, we call the time interval between the rising edges of two adjacent packet starting pulses a packet period, and we call $t_i$ in such an interval the $i$th pulse of the packet period. Clearly, repeated subsequences of the finer clock signals can be extracted by optical hardware similar to the one shown in this figure. All discussions in the rest of this paper are based on precise timing and synchronization.
Note that this clock distribution method can also be applied to the optical links in a multihop TDM optical network. In such a network, the spatial separations of adjacent taps may not be the same. One can use one waveguide to distribute the clock signals. The rightmost processor of Fig. 2 (processor $n-1$) is called the head processor of the bus. If the packet envelope pulses are sent by the head processor, these pulses can be used to synchronize the transmitters and receivers of all processors. As long as all the spatial separations of adjacent taps are greater than the packet length $L[PS]$, pipelined packet transmission can be ensured.
IMPROVED UNARY ADDRESSING
Observe that it is wasteful for the select and reference waveguides to serve only the purpose of addressing. In fact, these two waveguides can also be used for data transmission.
Consider the bus architecture of Fig. 2, and suppose that processor $P_i$ wants to send data to processor $P_j$. Processor $P_i$ uses the address setting of Fig. 4a and loads the data frame of a packet slot on all three waveguides. The resulting situation is shown in Fig. 6a. The snapshot at the time when processor $P_j$ detects the coincidence of select and reference pulses is shown in Fig. 6b. At this time, processor $P_j$ expects to read a data frame from each of the reference and message waveguides after exactly $n-1$ additional pulse slots, and it expects to read a data frame from the select waveguide after exactly $n-j-1$ additional pulse slots. Actually, the unit delay loops on the receiving segment of the message waveguide (see Fig. 2) are not necessary. Assume that these loops are not present. Then, at the time processor $P_j$ detects the coincidence of select and reference pulses, it expects to read a data frame from each of the select and message waveguides after exactly $n-j-1$ additional pulse slots, and to read a data frame from the reference waveguide after exactly $n-1$ additional pulse slots, as shown in Fig. 6c. The subsequences of clock signals for sampling the data frame of each waveguide can be provided by optical hardware similar to that shown in Fig. 5. Consequently, the message waveguide becomes redundant and can be eliminated. Compared with the previously proposed method, this modified method not only reduces the number of required waveguides by one, but also doubles the data transmission rate.
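The waiting times quoted above can be tabulated with a small helper. This is only a sketch of the bookkeeping; the function name and the `message_has_loops` flag are hypothetical and simply encode the two cases discussed in the text (Fig. 6b with loops on the message waveguide, Fig. 6c without).

```python
def data_frame_wait(n, j, message_has_loops=True):
    """Pulse slots that P_j waits, after detecting the coincidence,
    before the data frame starts on each waveguide."""
    wait = {
        "reference": n - 1,      # reference pulse sits in a_0
        "select": n - j - 1,     # select pulse sits in a_j
    }
    # With unit delay loops the message waveguide behaves like the
    # reference waveguide; without them it behaves like the select one.
    wait["message"] = n - 1 if message_has_loops else n - j - 1
    return wait


print(data_frame_wait(8, 5))
# {'reference': 7, 'select': 2, 'message': 7}
print(data_frame_wait(8, 5, message_has_loops=False))
# {'reference': 7, 'select': 2, 'message': 2}
```

In the second case the message and select waveguides are sampled at the same offsets, which matches the observation that the message waveguide becomes redundant.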
GENERALIZED COINCIDENT PULSE TECHNIQUE FOR REDUCING ADDRESS FRAME LENGTH
The bandwidth utilization of the waveguides in a TDM optical bus is restricted by the unary addressing scheme because of the requirement that $L[PS] > n$. The length of unary addresses increases linearly with the number of processors, and this makes the packet size $L[PS]$ large for a bus system with many processors. Large packet size results in inefficiency. Furthermore, the requirement $D \ge L[PS]$ for a pipelined TDM bus makes the bus less scalable. Thus, reducing the address frame size is important for improving system efficiency and scalability.
In the rest of this paper, we introduce addressing schemes with reduced address frame lengths and discuss the hardware required for supporting these schemes. All these new addressing schemes are based on the generic TDM optical bus architecture depicted in Fig. 3 and the coincident pulse technique. One or more waveguides are used for both addressing and carrying data. Unlike the bus architecture described in Section 2, there are no delay loops on the receiving segments of the waveguides of the new bus architectures. We generalize the coincident pulse technique by using unit delay loops (if required) on the taps leading to a processor and optical logical gates, such as optical AND, XOR (exclusive OR), and NOR gates [7], to form an address detecting circuitry for the processor. This circuitry evaluates in the optical domain a Boolean expression associated with the processor. The values of the variables of this expression are transmitted sequentially on one or more waveguides.
Suppose that $w$ waveguides $A_0, A_1, \ldots, A_{w-1}$ are used to implement a TDM optical bus. We will not separate message waveguides from address waveguides. All waveguides are used for carrying address information as well as data. To facilitate our discussion, we generalize the notion of a packet. A two-dimensional packet slot $PS$ of length $L[PS]$ is a $w \times L[PS]$ array $F$ of pulse slots, which is divided into two parts, a $w \times p$ subarray $AF$ and a $w \times (L[PS]-p)$ subarray $DF$, called the (two-dimensional) address frame and the (two-dimensional) data frame, respectively. As before, the address frame and the data frame are used for addressing and carrying data, respectively. The $k$th column of $F$ corresponds to timing signal $t_k$ of a packet slot. An optical pulse in pulse slot $F_{r,s}$, $0 \le r \le w-1$ and $0 \le s \le L[PS]-1$, can only be injected by the sending processor at its time $t_s$ of the packet slot dedicated to the processor. The pulses in the same column of $F$ are injected into the waveguides at the same time, and the pulses in row $i$ of $F$ are transmitted over waveguide $A_i$. The state of $F_{r,s}$ can be detected by a receiving processor. If a processor wants to send a packet to processor $P_j$, it sends address information that encodes $j$ in the address frame $AF$ of the packet. Processor $P_j$ determines whether or not a packet is destined for it by checking the optical binary information carried in the address frame $AF$ of its enclosing packet using its optical address detecting circuitry, and the coincidence of a set of specified pulses indicates that the packet is addressed to $P_j$. Our coincident pulse technique generalizes the known technique by allowing one to detect the presence and absence of pulses transmitted over one or more waveguides.
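As a rough illustration of the two-dimensional packet slot, the sketch below represents $F$ as a list of $w$ rows, one per waveguide, with the first $p$ columns forming $AF$ and the remaining columns forming $DF$. The representation and the helper names are assumptions made for illustration only.

```python
def make_packet_slot(address_frame, data_frame):
    """Join a w x p address frame and a w x (L-p) data frame into F."""
    if len(address_frame) != len(data_frame):
        raise ValueError("both frames need one row per waveguide")
    return [a_row + d_row for a_row, d_row in zip(address_frame, data_frame)]


def split_packet_slot(F, p):
    """Recover (AF, DF) from F, given the address frame length p."""
    return [row[:p] for row in F], [row[p:] for row in F]


def column(F, s):
    """Pulses injected simultaneously at timing signal t_s, one per waveguide."""
    return [row[s] for row in F]


AF = [[1, 0], [0, 1]]               # w = 2 waveguides, p = 2 address slots
DF = [[1, 1, 0], [0, 1, 1]]         # 3 data slots per waveguide
F = make_packet_slot(AF, DF)
print(len(F), len(F[0]))            # 2 5
print(column(F, 0))                 # [1, 0]
print(split_packet_slot(F, 2) == (AF, DF))  # True
```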
Remarks. The delay loops are introduced on the waveguides of the bus structure of Section 2, and if we exchange the connecting positions of two processors on such a bus, then their addresses must be exchanged. Since the unit delay loops used in our generalized coincident pulse technique are not introduced on the waveguides but on the (input) taps leading to receiving processors, we can consider the input taps of a processor as part of the address-detecting circuitry of the processor. Then, if we exchange the connecting locations of two processors (along with their address-detecting circuits), their addresses remain unchanged. Thus, using the generalized coincident pulse technique, processor addresses do not depend on the relative physical positions of processors. Therefore, the addressing schemes based on our generalized coincident pulse technique provide more flexibility for bus construction than the previously known coincident pulse technique.
BASIC ADDRESSING SCHEME DESIGNS
We show how to use the generalized coincident pulse technique discussed in the previous section to design basic compact addressing schemes. In Section 5.1, we consider two extreme cases: addressing schemes with one pulse slot and addressing schemes with one waveguide. In Section 5.2, we consider addressing schemes that use multiple pulse slots and multiple waveguides. The generalization of these schemes will be discussed in Section 6.
Vertical and Horizontal Addressing Schemes
We call a $w \times 1$ addressing scheme a vertical addressing scheme and a $1 \times p$ addressing scheme a horizontal addressing scheme. A horizontal addressing scheme uses a minimum number of waveguides, and the length of a vertical addressing scheme is the shortest possible.
First, we introduce a simple vertical addressing scheme. Assume $n = 2^w$. Let $j_{w-1} j_{w-2} \cdots j_1 j_0$ denote the $w$-bit binary representation of integer $j$, and label each processor $P_j$, $0 \le j \le 2^w - 1$, by $j_{w-1} j_{w-2} \cdots j_1 j_0$. There are $2w$ waveguides, $A_0, A_1, \ldots, A_{2w-1}$. Relabel these waveguides by $B_i$, $\bar{B}_i$, for $0 \le i \le w-1$. Such a TDM optical bus connecting 8 processors (where $w = 3$) is shown in Fig. 7.
We label the only column in an address frame array by $a$. Suppose that a processor wants to send a packet to processor $P_j$. It sends pulses in pulse slot $a$ of a packet as follows: if $j_k = 0$, it sends a pulse in $a$ of $\bar{B}_k$, and if $j_k = 1$, it sends a pulse in $a$ of $B_k$. Any processor $P_j$ detects that a packet is addressed to it by sensing the coincidence of exactly $w$ pulses from $w$ of the address waveguides. The coincidence of $w$ pulses can be detected by using an optical AND gate in a processor. The inputs of the AND gate are as follows: if $j_k = 0$, then the AND gate has an input from $\bar{B}_k$, and if $j_k = 1$, then the AND gate has an input from $B_k$. Suppose that processor $P_3$ wants to send a packet to processor $P_5$ in the bus shown in Fig. 7. It sets the address pulses in the address frames of the six address waveguides as shown in Fig. 8a. All $2w$ waveguides can be used to transmit data.
The receiving circuitry of processor $P_5$ is shown in Fig. 9. In this figure, a sequence $T$ of timing signals $t_1, t_2, \ldots, t_{L[PS]-1}$, generated by hardware similar to that of Fig. 5, is used to sample the pulse slots of a data frame. The timing signal $t_0$, which corresponds to the pulse slot $a$, is used to detect the coincidence of three addressing pulses.
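The vertical scheme with complementary waveguide pairs can be sketched as follows. The polarity used here (a pulse on $B_k$ for $j_k = 1$ and on $\bar{B}_k$ for $j_k = 0$) is an assumption; the opposite convention works equally well as long as sender and receiver agree. The function names are hypothetical.

```python
def address_pulses(dest, w):
    """Pulses placed in the single address slot of the 2w waveguides.

    Returns (B, B_bar): B[k] = 1 iff bit k of dest is 1, and
    B_bar[k] = 1 iff bit k of dest is 0 (assumed polarity).
    """
    bits = [(dest >> k) & 1 for k in range(w)]
    return bits, [1 - b for b in bits]


def and_gate_fires(j, w, B, B_bar):
    """Optical AND at processor j: one input per bit, from B_k or B_bar_k."""
    return all(B[k] if (j >> k) & 1 else B_bar[k] for k in range(w))


w = 3
B, B_bar = address_pulses(5, w)      # e.g. P_3 sends a packet to P_5
print([j for j in range(2 ** w) if and_gate_fires(j, w, B, B_bar)])  # [5]

ones = [1] * w                       # broadcast: a pulse on every waveguide
print(all(and_gate_fires(j, w, ones, ones) for j in range(2 ** w)))  # True
```

Because each receiver only checks for the presence of pulses, placing a pulse on all $2w$ waveguides reaches every processor, as in the broadcast setting of Fig. 8b.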
Given an addressing scheme $AS$, let $W[AS]$ and $N[AS]$ denote the number of waveguides implementing $AS$ and the maximum number of processors addressable by $AS$, respectively. Denote the vertical addressing scheme discussed above by $AS_1^v$. Clearly, $W[AS_1^v] = 2w$ and $N[AS_1^v] = 2^w = 2^{W[AS_1^v]/2}$.
Though the address frame length of this addressing scheme is minimized, the utilization of waveguides in terms of the number of addressable processors is not maximized. We now show how to increase the number of processors connected by an optical bus using vertical addressing schemes.
FIG. 8. Address settings for a bus using 1-pulse addressing scheme: (a) sending a packet to processor $P_5$ and (b) broadcasting to all processors.
FIG. 9. The receiving circuitry of processor $P_5$ of the bus shown in Fig. 7.
The basic idea of $AS_1^v$ is to associate each processor $P_j$ with a unique subset $W_j$ of address waveguides. A waveguide $A_i$ is used to identify processor $P_j$ if and only if $A_i$ is in $W_j$. The coincidence of pulses from all waveguides in $W_j$ at time $t_0$ indicates that a packet is addressed to $P_j$. However, for two subsets $W_j$ and $W_k$ of a given set of waveguides to address different processors $P_j$ and $P_k$, they must satisfy $W_j \not\subseteq W_k$ and $W_k \not\subseteq W_j$; otherwise either $W_j$ or $W_k$ addresses both $P_j$ and $P_k$. If we use each $x$-subset of a given set of waveguides to address a unique processor, then the total number of distinct $x$-subsets of a set of $w$ waveguides is $\binom{w}{x}$, which is largest when $x = \lfloor w/2 \rfloor$. We denote by $AS_2^v$ the vertical addressing scheme that uses the $\lfloor w/2 \rfloor$-subsets in this way.
For $AS_1^v$ and $AS_2^v$ using the same number $W$ of waveguides, $N[AS_1^v] = 2^{W/2}$ and $N[AS_2^v] = \binom{W}{\lfloor W/2 \rfloor} \approx 2^{W} \sqrt{2/(\pi W)}$ (by Stirling's formula). Vertical addressing scheme $AS_2^v$ can connect many more processors than vertical addressing scheme $AS_1^v$. For example, using 6 address waveguides, $AS_2^v$ addresses 20 processors, whereas $AS_1^v$ addresses 8 processors.
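The two capacities can be compared numerically. The sketch below assumes that the second vertical scheme assigns each processor a distinct $\lfloor W/2 \rfloor$-subset of the $W$ waveguides, which is consistent with the 20-processor example for 6 waveguides; the function names are hypothetical, and the Stirling estimate is the standard approximation of the central binomial coefficient.

```python
from itertools import combinations
from math import comb, pi, sqrt


def capacity_binary_pairs(W):
    """Complementary-pair scheme: W/2 address bits, so 2^(W/2) processors."""
    return 2 ** (W // 2)


def capacity_subsets(W):
    """Each processor owns a distinct floor(W/2)-subset of the waveguides."""
    return comb(W, W // 2)


def subset_addresses(W):
    """One equal-size subset per processor; equal sizes rule out containment."""
    return list(combinations(range(W), W // 2))


def stirling_ratio(W):
    """Stirling estimate of capacity_subsets(W) / capacity_binary_pairs(W)."""
    return 2 ** (W / 2) * sqrt(2 / (pi * W))


print(capacity_binary_pairs(6), capacity_subsets(6))  # 8 20
print(len(subset_addresses(6)))                       # 20
```

For $W = 20$ the exact ratio is $184756 / 1024 \approx 180$, and the Stirling estimate lies within about two percent of it.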
In a $1 \times p$ horizontal addressing scheme, we can also use pulses in a subset of pulse slots in an address frame to address a processor. The selected pulses in different pulse slots of an address frame are delayed by appropriate numbers of unit delay loops so that they coincide at time $t_{p-1}$ of the timing sequence. For example, consider a horizontal address frame of length 4. Denote the $i$th pulse slot of an address frame by $AF_i$. For detecting whether or not there are optical pulses in pulse slots $AF_0$, $AF_2$, and $AF_3$, the circuitry shown in Fig. 10 can be used. By an argument similar to that for the vertical addressing scheme described above, the maximum number of processors that can be addressed by a $1 \times p$ addressing scheme using this technique is $\binom{p}{\lfloor p/2 \rfloor}$.
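The role of the delay loops can be mimicked in a discrete-time sketch. It assumes that the pulse of slot $i$ arrives at time index $i$ and that the tap for slot $i$ carries $p-1-i$ unit delay loops; the function names are hypothetical and the model ignores all optical effects.

```python
def after_delay_loops(frame, loops):
    """Signal seen at the end of a tap with `loops` unit delay loops."""
    return [0] * loops + list(frame)


def subset_detector(frame, selected):
    """AND, sampled at t_{p-1}, of the taps for the selected pulse slots.

    The tap for slot i has p-1-i loops, so a pulse in slot i (if present)
    shows up at time index p-1 on that tap.
    """
    p = len(frame)
    taps = [after_delay_loops(frame, p - 1 - i) for i in selected]
    return all(tap[p - 1] for tap in taps)


print(subset_detector([1, 0, 1, 1], [0, 2, 3]))  # True
print(subset_detector([1, 1, 0, 1], [0, 2, 3]))  # False
```

The first call corresponds to the length-4 example with pulses in $AF_0$, $AF_2$, and $AF_3$; the second frame lacks the pulse in $AF_2$, so no coincidence occurs.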
Given X . An optimal addressing scheme addresses a set of processors using minimum addressing information.
Consider the design of an optimal vertical addressing scheme that uses $w$ waveguides ($A_0, A_1, \ldots, A_{w-1}$) for addressing $2^w$ processors. Denote the only address pulse slot on address waveguide $A_i$ as $AF_{i,0}$. We use the binary representation $j_{w-1} j_{w-2} \cdots j_0$ to represent the address of processor $P_j$ and associate the presence and absence of a pulse in a pulse slot with binary values 1 and 0, respectively. Treating each pulse slot $AF_{i,0}$ in the address frame as a Boolean variable, a processor that sends a packet to $P_j$ sends a pulse in $AF_{i,0}$ (i.e., sets $AF_{i,0} = 1$), $0 \le i \le w-1$, if and only if $j_i = 1$. Processor $P_j$ detects a packet addressed to it if $(AF_{0,0} = j_0) \wedge (AF_{1,0} = j_1) \wedge \cdots \wedge (AF_{w-1,0} = j_{w-1}) = \bigwedge_{i=0}^{w-1} (AF_{i,0} = j_i) = 1$ at time $t_0$, where $\wedge$ is the logical AND operator. Processor $P_j$ detects coincidence of $w+1$ pulses by a circuitry constructed using optical AND and NOR gates. For example, for $w = 6$ and $j = 53$ (note: the binary representation of 53 is 110101), the coincidence detecting circuitry associated with $P_j$ is shown in Fig. 11a. The NOR gate plays the role of several inverters.
FIG. 11. Address detecting circuitry of processor $P_{53}$: (a) optimal vertical addressing scheme and (b) optimal horizontal addressing scheme.
Now we show how to design an optimal horizontal addressing scheme that uses one waveguide and address frames of length $p$ to address $2^p$ processors. Denote the $i$th pulse slot of an address frame by $AF_i$. We use the binary representation $j_{p-1} j_{p-2} \cdots j_0$ to represent the address of processor $P_j$ and associate the presence and absence of a pulse in a pulse slot with binary values 1 and 0, respectively. We treat each pulse slot $AF_i$ as a Boolean variable. When a processor sends a packet to $P_j$, it sends a pulse in $AF_i$ (i.e., sets $AF_i = 1$), $0 \le i \le p-1$, if and only if $j_i = 1$. Processor $P_j$ determines that a packet is addressed to it if it detects $\bigwedge_{i=0}^{p-1} (AF_i = j_i) = 1$. Processor $P_j$ detects this condition by a circuitry constructed using unit delay loops and optical AND and XOR gates. More specifically, an XOR gate plays the role of several inverters, and $(p-i-1)$ unit delay loops are needed to ensure that the delayed signal $u_i$ coincides with timing signal $t_{p-1}$. The pulse coincidence detecting circuitry associated with $P_{53}$ is shown in Fig. 11b.
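The matching condition used by both optimal schemes is the same Boolean expression, so one sketch covers the vertical case (one slot on each of $w$ waveguides) and the horizontal case ($p$ slots on one waveguide). The code below models only the logic, not the gates or delay loops, and the function names are hypothetical.

```python
def address_bits(j, width):
    """Bits j_0 .. j_{width-1} of processor index j (least significant first)."""
    return [(j >> i) & 1 for i in range(width)]


def exact_match(j, frame):
    """AND over i of (AF_i == j_i); slots with j_i = 0 pass through an inverter."""
    bits = address_bits(j, len(frame))
    u = [af if bit else 1 - af for af, bit in zip(frame, bits)]
    return all(u)


frame = address_bits(53, 6)                               # pulses sent to reach P_53
print(frame)                                              # [1, 0, 1, 0, 1, 1]
print([j for j in range(64) if exact_match(j, frame)])    # [53]
print([j for j in range(64) if exact_match(j, [1] * 6)])  # [63]
```

The last line shows why an all-pulse frame cannot serve as a broadcast here: with absence of a pulse carrying information, it is simply the address of $P_{63}$.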
Optimal schemes have several drawbacks, which include excessive splitting loss, additional optical gates and delay loops. Another disadvantage of optimal vertical and horizontal addressing schemes is that they cannot be used to broadcast a message to all processors effectively. Improved broadcasting can be accomplished by either using one more address waveguide or an extra address pulse slot. In the former case, the presence of a specified address pulse on the additional waveguide indicates that the messages on all waveguides are broadcast to all processors. In the latter case, the presence of a pulse in an additional address pulse slot indicates that the messages on all waveguides are broadcast to all processors. In either case, a processor receives the broadcast message(s) after detecting this designated pulse.
Block Addressing Schemes
In this section, we consider $w \times p$ addressing schemes with $w \ge 1$ and $p \ge 1$, which are called block addressing schemes. Vertical and horizontal addressing schemes are special cases of block schemes with $p = 1$ and $w = 1$, respectively.
Regarding an address frame as a Boolean array, we know that there are exactly $2^{wp}$ distinct $w \times p$ Boolean address patterns, and consequently at most $2^{wp}$ processors can be addressed by a $w \times p$ block addressing scheme. Thus, a $w \times p$ block addressing scheme $AS$ is an optimal block addressing scheme if it can be used to address $2^{wp}$ processors (note: $I[AS] = L[AS] \cdot W[AS] = w \cdot p$).
In the following, we will show how to design an optimal block addressing scheme.
We associate each processor $P_k$ with a unique $w \times p$ Boolean array $AP^k$ (an address pattern) as its address. Any processor that wants to send a packet to $P_k$ sends a pulse in the pulse slot $AF_{i,j}$ of the address frame in the packet if and only if $AP^k_{i,j} = 1$, and $P_k$ accepts a packet when every pulse slot of the address frame matches $AP^k$. Clearly, $2^{wp}$ processors can be addressed by this method. An optimal block addressing scheme shares the advantages and disadvantages of the optimal horizontal and vertical addressing schemes.
To remedy some of the disadvantages of optimal block addressing schemes, a sub-optimal block addressing scheme can be designed by requiring the address patterns to satisfy certain properties. For example, we may require that each row of a $w \times p$ address frame contains exactly one 1, and map each such address frame to a unique processor as follows: a processor's address is represented by a $w$-digit $p$-ary number (i.e., each row of the address frame encodes one digit through the position of its single 1). If a processor wants to broadcast a packet to all other processors, it can simply send pulses in all pulse slots in an address frame. It is easy to verify that this addressing scheme works properly. For easy reference, we call this class of addressing schemes base-$p$ block addressing schemes. For example, for $w = 4$, the architecture using the base-2 block scheme is shown in Fig. 12. The address settings for addressing $P_{10}$ and broadcasting are shown in Figs. 13a and 13b, respectively.
FIG. 13. Address settings for a bus using the base-2 block addressing scheme: (a) sending a packet to processor $P_{10}$ and (b) broadcasting to all processors.
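A base-$p$ block scheme can be sketched as follows. The mapping of digits to rows (row $i$ encodes the $i$th least significant base-$p$ digit, and column $c$ stands for digit value $c$) is an assumption made for illustration; the function names are hypothetical.

```python
def base_p_frame(j, w, p):
    """w x p address frame with exactly one 1 per row (assumed digit order)."""
    frame = []
    for i in range(w):
        digit = (j // p ** i) % p
        frame.append([1 if c == digit else 0 for c in range(p)])
    return frame


def broadcast_frame(w, p):
    """A pulse in every pulse slot of the address frame."""
    return [[1] * p for _ in range(w)]


def block_detects(j, frame):
    """AND of the w pulse slots that spell out the digits of j."""
    w, p = len(frame), len(frame[0])
    return all(frame[i][(j // p ** i) % p] for i in range(w))


w, p = 4, 2                                   # 2^4 = 16 processors
to_p10 = base_p_frame(10, w, p)
print([j for j in range(p ** w) if block_detects(j, to_p10)])    # [10]
print(all(block_detects(j, broadcast_frame(w, p)) for j in range(p ** w)))  # True
```

Each receiver needs only a $w$-input AND of pulse presences, which is why the all-pulse frame works as a broadcast, at the price of addressing $p^w$ rather than $2^{wp}$ processors.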
GENERALIZATIONS
The addressing schemes presented so far can be used as building blocks for constructing new schemes. In this section, we propose several general methods for designing new addressing schemes from existing ones. These methods can be used to achieve trade-offs among several parameters such as number of waveguides, address frame length, number of processors to be connected, and the complexity of address-detecting circuitry associated with processors. An additional important feature of the addressing schemes derived from these methods is the modular implementation of address-detecting circuitry: the address-detecting circuitry for each processor can be divided into several components (i.e., modules), and the address-detecting circuitry of more than one processor can have components of the same structure.
Grouped Addressing Schemes
We first introduce a class of block addressing schemes called grouped addressing schemes. There are two types of grouped addressing schemes, vertically grouped addressing schemes and horizontally grouped addressing schemes. A vertically (respectively, horizontally) grouped addressing scheme is constructed as follows. We partition a $w \times p$ address frame vertically (respectively, horizontally) into $k$ subframes $AF_m$, $0 \le m \le k-1$, such that $AF_m$ is a $w \times p_m$ (respectively, $w_m \times p$) subframe and $\sum_{m=0}^{k-1} p_m = p$ (respectively, $\sum_{m=0}^{k-1} w_m = w$). The processors are partitioned into $k$ groups accordingly, and subframe $AF_m$ addresses the processors of group $PG_m$. For example, consider a 16-processor TDM bus using a $3 \times 4$ addressing scheme that is vertically partitioned into two $3 \times 2$ addressing schemes $AF_0$ and $AF_1$. The processors are also partitioned into two groups $PG_0$ and $PG_1$, each of which has 8 processors. $AF_0$ is used to address processors in $PG_0$ and $AF_1$ is used to address processors in $PG_1$, according to base-2 block addressing schemes using only the rightmost 3 bits of the binary processor address. This bus architecture is shown in Fig. 14. The setting of the address frame for addressing processor $P_{10}$ (the binary representation of 10 is 1010) is shown in Fig. 16a. Now consider a 16-processor TDM bus using a $2 \times 8$ addressing scheme that is vertically partitioned into four $2 \times 2$ subframes $AF_0$, $AF_1$, $AF_2$, and $AF_3$. The processors are also partitioned into four groups $PG_0$, $PG_1$, $PG_2$, and $PG_3$, each of which has 4 processors. Subframe $AF_i$ is used to address processors in $PG_i$ according to base-2 block addressing schemes using only the rightmost 2 bits of the binary processor addresses. This bus architecture is shown in Fig. 15, and the setting of the address frame for addressing processor $P_{10}$ is shown in Fig. 16b. Both grouped addressing schemes can broadcast by sending a pulse in every pulse slot of an address frame.
FIG. 14. A 16-processor optical TDM bus architecture using a $3 \times 4$ vertically grouped block addressing scheme. There are two processor groups, each of which has 8 processors and is addressed by a $3 \times 2$ subscheme.
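The two 16-processor examples can be reproduced with a short sketch of a vertically grouped scheme built from base-2 subframes. It assumes that the group of a processor is given by the leading bits of its address and that subframe $g$ occupies columns $2g$ and $2g+1$; both choices, like the function names, are illustrative assumptions rather than details taken from the figures.

```python
def grouped_frame(j, w, groups):
    """w x (2*groups) address frame for destination j.

    The low w bits of j select one column per row inside the subframe
    of j's group; all other subframes stay empty.
    """
    g, local = divmod(j, 2 ** w)          # group index, position in group
    frame = [[0] * (2 * groups) for _ in range(w)]
    for i in range(w):
        frame[i][2 * g + ((local >> i) & 1)] = 1
    return frame


def grouped_detects(j, frame):
    """AND of the w pulse slots that processor j watches in its own subframe."""
    w = len(frame)
    g, local = divmod(j, 2 ** w)
    return all(frame[i][2 * g + ((local >> i) & 1)] for i in range(w))


f_3x4 = grouped_frame(10, w=3, groups=2)  # two groups of 8 processors
f_2x8 = grouped_frame(10, w=2, groups=4)  # four groups of 4 processors
print([j for j in range(16) if grouped_detects(j, f_3x4)])  # [10]
print([j for j in range(16) if grouped_detects(j, f_2x8)])  # [10]
```

Going from the $3 \times 4$ to the $2 \times 8$ variant removes one waveguide and one AND input per processor while doubling the address frame length.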
By comparing Figs. 14 and 15 with Fig. 12, and comparing Fig. 16 with Fig. 13, one can easily see the trade-offs between the number of waveguides and the address frame lengths. The address-detecting logic of the processors in the architecture shown in Fig. 15 is simpler than that of the processors in the architecture shown in Fig. 14, and the address-detecting logic of the processors in the architecture shown in Fig. 14 is simpler than that of the processors in the architecture shown in Fig. 12. Referring to Fig. 15, one can see that several processors have address-detecting circuitry of the same structure.
Hierarchical Addressing Schemes
We further generalize the idea of partitioning processors into groups and addressing a processor using combined information of a processor group that contains the processor and the position of the processor within its group by presenting another class of addressing schemes called hierarchical addressing schemes. The idea behind the hierarchical addressing schemes is to partition processors into groups in a recursive fashion.
First, all processors are partitioned into $k_1$ groups $PG_0, PG_1, \ldots, PG_{k_1-1}$. Then, each group $PG_i$ is partitioned into $k_2$ subgroups $PG_{i,0}, PG_{i,1}, \ldots, PG_{i,k_2-1}$. Such a process continues for $l$ levels until a group $PG_{i_1, i_2, \ldots, i_l}$ contains only one processor, and this processor can be uniquely denoted by $P_{i_1, i_2, \ldots, i_l}$. Let all processors constitute a level-0 group, and define group $PG_{i_1, i_2, \ldots, i_k}$ as a level-$k$ processor group. Let $C_{i_j}$ be a Boolean expression indicating that a processor is in the $i_j$th subgroup of a level-$(j-1)$ group if and only if $C_{i_j} = 1$. Then, the condition for a processor to be $P_{i_1, i_2, \ldots, i_l}$ is $C_{i_1, i_2, \ldots, i_l} = \bigwedge_{j=1}^{l} C_{i_j} = 1$. Thus, for a processor $P_{i_1, i_2, \ldots, i_l}$ to detect that a packet is uniquely destined for it, we must enforce that a unique address frame state is used in the packet so that the address-detecting circuitry associated with $P_{i_1, i_2, \ldots, i_l}$ detects that $C_{i_1, i_2, \ldots, i_l}$ is true.
FIG. 15. A 16-processor optical TDM bus architecture using a $2 \times 8$ vertically grouped block addressing scheme. There are four processor groups, each of which has 4 processors and is addressed by a $2 \times 2$ subscheme.
FIG. 16. Addressing processor $P_{10}$ (a) using a 2-group vertically grouped addressing scheme and (b) using a 4-group vertically grouped addressing scheme.
We can implement this idea in two ways, which lead to two different types of hierarchical addressing schemes, vertically oriented hierarchical addressing schemes and horizontally oriented hierarchical addressing schemes. An $l$-level vertically (respectively, horizontally) oriented hierarchical addressing scheme is constructed by partitioning a $w \times p$ address frame $AF$ vertically (respectively, horizontally) into $l$ subframes $AF_k$, $1 \le k \le l$, such that $AF_k$ contains $p_k$ columns (respectively, $w_k$ rows) of $AF$ and $\sum_{k=1}^{l} p_k = p$ (respectively, $\sum_{k=1}^{l} w_k = w$). Let $AS_k$ denote the addressing scheme that uses subframe $AF_k$ to distinguish the subgroups at level $k$. Clearly, the total number of addressable processors using such an addressing scheme is $\prod_{i=1}^{l} N[AS_i]$. The reusability of address-detecting logic components is an inherent property of hierarchical addressing schemes. The circuitry for detecting condition $C_{i_k}$ using our generalized coincident pulse technique is a component of the address-detecting circuitry of all processors in the same level-$k$ processor group. Processors in the $i_{k+1}$th group of different level-$k$ processor groups use the same circuitry for $C_{i_{k+1}}$ to detect their memberships in level-$(k+1)$ processor groups.
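The counting argument and the conjunction of per-level conditions can be sketched abstractly. The code below treats each level as a black box that signals one subgroup index, so it models only the group bookkeeping and not any particular subframe encoding; the mixed-radix numbering of processors and the function names are assumptions.

```python
from math import prod


def capacity(level_sizes):
    """Total addressable processors: product of the per-level capacities."""
    return prod(level_sizes)


def group_path(index, level_sizes):
    """Mixed-radix digits (i_1, ..., i_l) of a processor index, i_1 first."""
    path = []
    for size in reversed(level_sizes):
        index, digit = divmod(index, size)
        path.append(digit)
    return path[::-1]


def accepts(own_path, signalled_path):
    """C = AND over levels of C_{i_j}: every level must name this processor's group."""
    return all(own == seen for own, seen in zip(own_path, signalled_path))


levels = [2, 8]                               # two groups of eight processors
target = group_path(10, levels)
print(capacity(levels), target)               # 16 [1, 2]
print([j for j in range(capacity(levels))
       if accepts(group_path(j, levels), target)])   # [10]
```

All processors that share a prefix of the path share the checks for that prefix, which reflects the reuse of address-detecting components noted above.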
It is easy to see that the optimal block-addressing scheme presented in Section 6 can be implemented as a horizontally oriented hierarchical scheme. In the optimal block-addressing scheme, a processor $P_k$ detects a packet addressed to it by detecting condition $\bigwedge_{i=0}^{w-1} \bigwedge_{j=0}^{p-1} u_{i,j} = 1$, where $u_{i,j} = AF_{i,j}$ if $AP^{k}_{i,j} = 1$, and $u_{i,j} = \overline{AF_{i,j}}$ otherwise. We can partition $AF$ into subframes $AF_i$, $0 \le i \le w-1$, each being a row of $AF$. Subframe $AF_0$ is used to address $2^p$ level-1 processor groups by checking $\bigwedge_{j=0}^{p-1} u_{0,j}$. Subframe $AF_i$ is used to address the $2^p$ level-$(i+1)$ processor groups of a level-$i$ group defined by the state of the subarray consisting of $AF_0$ through $AF_{i-1}$, and each such level-$(i+1)$ group corresponds to a unique condition $\bigwedge_{j=0}^{p-1} u_{i,j}$. Similarly, the optimal block-addressing scheme can also be implemented as a vertically oriented hierarchical scheme. Readers can easily see that the base-$p$ block-addressing scheme can be implemented as a horizontally oriented hierarchical scheme.
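The detection condition and its row-by-row (horizontally oriented) evaluation can be compared in a short sketch. It models the address frame and a processor's address pattern as tuples of 0/1 rows and reads the "otherwise" case as the complement of the frame bit; the function names are hypothetical, and the code models only the Boolean condition, not the optical circuitry.

```python
from itertools import product


def u(af_bit, ap_bit):
    """u_{i,j}: the frame bit if the pattern bit is 1, else its complement."""
    return af_bit if ap_bit == 1 else 1 - af_bit


def detects(af, ap):
    """Flat condition: AND of u_{i,j} over every cell of the w x p frame."""
    return all(
        u(b, a) == 1
        for af_row, ap_row in zip(af, ap)
        for b, a in zip(af_row, ap_row)
    )


def detects_by_rows(af, ap):
    """Same condition, checked one row subframe (one level) at a time."""
    for af_row, ap_row in zip(af, ap):
        if not all(u(b, a) == 1 for b, a in zip(af_row, ap_row)):
            return False    # not a member of the addressed group at this level
    return True


def matching_frames(ap):
    """Enumerate every frame state that the processor with pattern ap detects."""
    w, p = len(ap), len(ap[0])
    frames = []
    for bits in product((0, 1), repeat=w * p):
        af = tuple(tuple(bits[i * p:(i + 1) * p]) for i in range(w))
        if detects(af, ap):
            frames.append(af)
    return frames
```

In this model each pattern is detected by exactly one frame state (the frame equal to the pattern), and the flat and row-wise evaluations agree on every frame, which is the sense in which the scheme can be viewed hierarchically.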
We can combine vertically and horizontally oriented addressing schemes to obtain new addressing schemes by partitioning an address frame horizontally and vertically in a recursive fashion. Such a recursive partition of an address frame corresponds to a recursive partition and addressing of processor groups.
Hybrid Addressing Schemes
Both grouped addressing schemes and hierarchical addressing schemes use the idea of partitioning an address frame into subframes and partitioning processors into groups. The difference between these two classes of addressing schemes lies in the ways the processor groups are addressed by subframes. In a grouped scheme, the separation of address subframes directly corresponds to the separation of processor groups: different subframes are used to address disjoint processor groups. We call this feature the direct separation property. In a hierarchical scheme, a processor group is addressed by a common state of a subset of address subframes: if a smaller processor group $G''$ is contained in a larger processor group $G'$, the subframes used to address $G'$ form a subset of the subframes addressing $G''$, and the state of the subframes for addressing $G'$ is also used for addressing $G''$ (with additional address information). We call this feature the inductive separation property. By combining the direct and inductive separation properties of grouped and hierarchical schemes, we obtain a new class of addressing schemes, called the hybrid addressing schemes. Basically, a hybrid addressing scheme is obtained by partitioning an address frame recursively in alternating directions and correspondingly partitioning processors into processor groups using direct and inductive separation properties.
For example, consider a hybrid addressing scheme using a $w \times p$ address frame $AF$. First, $AF$ is partitioned horizontally into three subframes $AF_1$, $AF_2$, and $AF_3$, where $AF_1$ and $AF_2$ each consist of one row and $AF_3$ consists of the remaining $w-2$ rows. Subframes $AF_1$ and $AF_2$ are used to address $2^{2p}$ processor groups (this is possible because of the inductive separation property and the optimal horizontal addressing for each of $AF_1$ and $AF_2$), and we use the vertically grouped addressing scheme $AF_3$ for addressing processors in each group. If we use a $(w-2) \times 1$ optimal vertical addressing scheme for each $AF_{3,i}$ (which is the $i$th column of $AF_3$), then each $AF_{3,i}$ can be used to address $2^{w-2}$ processors, and each group addressed by $AF_1$ and $AF_2$ has $2^{w-2} p$ processors (due to the direct separation property of $AF_3$). Then, the total number of processors addressable by this hybrid addressing scheme is $2^{2p+w-2} p$. The class of hybrid addressing schemes includes grouped and hierarchical addressing schemes as subclasses. This approach provides maximum freedom for finding trade-offs among various parameters such as cost, performance, and scalability of a TDM optical bus system.
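The count in this example can be checked by brute-force enumeration. The sketch below assumes the layout implied by the numbers in the example: two single-row subframes that together select one of $2^{2p}$ groups, and a third subframe of $w-2$ rows whose $p$ columns each address $2^{w-2}$ processors, with exactly one column in use per address. The tuple encoding and the function names are hypothetical.

```python
from itertools import product


def hybrid_states(w, p):
    """Yield one tuple (row1, row2, column index, column state) per address."""
    rows = list(product((0, 1), repeat=p))        # states of a 1 x p row
    cols = list(product((0, 1), repeat=w - 2))    # states of a (w-2) x 1 column
    for r1, r2 in product(rows, rows):            # inductive separation: pick a group
        for i in range(p):                        # direct separation: pick a column
            for c in cols:                        # processor within that column's set
                yield (r1, r2, i, c)


def hybrid_capacity(w, p):
    """Closed form from the example: 2^(2p + w - 2) * p."""
    return 2 ** (2 * p + w - 2) * p
```

For a $4 \times 2$ frame the enumeration yields 128 distinct addresses, matching `hybrid_capacity(4, 2)`.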
FINAL REMARKS
With the current precision in mechanical layout, subpicosecond time precision is achievable. Techniques employing controlled delay have many applications in data communication using optical waveguides [3]. Pipelined TDM optical buses are examples of such applications. We presented several new addressing schemes for pipelined optical buses using FA-TDM. These schemes improve the utilization of the bandwidth of all the waveguides used, compared with the previously proposed scheme.
The major disadvantage of fixed-assignment TDM (FA-TDM) is the requirement that each processor have a fixed allocation of packet slots regardless of whether it has a packet to transmit. A demand-assignment TDM method (DA-TDM) allocates packet slots to processors dynamically according to their demands and the traffic situation. It belongs to the class of asynchronous TDM (ATDM) multiaccess methods. The ATDM multiaccess methods were introduced to reduce the inefficiency associated with fixed slot assignments, at the cost of some overhead for polling and/or slot reservations. A pipelined ATDM optical bus with hardware slot reservation was presented in [20]. The new addressing schemes proposed in this paper can be easily incorporated into the pipelined ATDM optical bus of [20].
In parallel computing, the pipelined TDM optical bus with conditional delays proposed in [11] and the reconfigurable pipelined TDM optical bus described in [14] use unary coding to facilitate binary prefix-sums computation. Using unary coding, the prefix-sums operation over $n$ binary operands, one per processor, can be carried out in one bus cycle, which is the end-to-end propagation time of a light pulse on a waveguide used in the optical buses of [11, 14]. The binary prefix-sums operation has many applications such as sorting, data compaction, data partition, and processor reordering. Many efficient parallel algorithms based on efficient parallel binary prefix-sums have been developed (refer to [5, 9–11, 14, 15] for examples). The new addressing schemes proposed in this paper can be incorporated into the pipelined TDM buses of [11, 14] as follows: use a set of address waveguides for one-to-one and one-to-many communications, and use a pair of additional waveguides for binary prefix-sums computation, which is performed across the boundaries of packet slots. This method can reduce the address frame size and increase the bus scalability while accommodating efficient binary prefix-sums operations.
