Modern computer buses are typically organized by the three functions of data transfer, addressing, and arbitration control. In this paper we present a fiber based bus design which provides optical solutions for each of these functions. The design includes an all optical addressing system, based on coincident pulse addressing, which eliminates the latency contribution and bandwidth limitation associated with electronic address decoding. The control system uses time of flight relationships between a priority chain and a feedback waveguide to implement fully distributed, asynchronous, and self-timed bus arbitration.
Introduction
Buses are by far the most commonly implemented communication structure within a modern computer system. As optical technology moves from the realm of local area and wide area networks between computer systems to board level, multi-chip-module, or even chip level communications within computer systems, an optical solution to the fundamental issues in bus design must be devised. In this paper we present a design for a multiple access optical bus. The design includes optical solutions to the problems of data transfer, bus arbitration, and device addressing. The problem domain we have chosen is the backplane of a closely coupled multiprocessor system. In a closely coupled multiprocessor system the resources can be accessed via a single bus level operation without I/O transfers, and in a manner which is transparent to both systems and application software. Although the design is presented in this domain, it is also applicable to a variety of high speed bus applications. There are a number of defining characteristics for bus applications which distinguish them from other types of optical communications networks. Buses are multiple access links implemented either with tapped fibers or optical star couplers. The end-to-end length and propagation delay are relatively small. Bus level transactions consist of short messages which occur with a volume of distinct messages per source which is higher than is typically experienced in network environments.
Multiple access networks require both addressing and arbitration of accesses. However, with short messages and low end-to-end latency, the overhead for access arbitration and address decoding dominates the total message latency. At short distances the bandwidth required to be competitive with electronic implementations is substantially higher than in other optical network applications. Thus, in order to support the low latency and high bandwidth requirements of this application, it is imperative that the optical links provide more than simply a communication channel. A substantial portion of the bus control logic must also be implemented in optics.
Two unique properties of optical signals, unidirectional propagation and predictable path delay, make it possible to base a logic system on the time of flight and relative delay between two signals. We use these properties heavily in our implementation of addressing and control. For example, our optical address bus provides two paths by which signals may reach a node. Optical addressing is achieved by encoding an address as a difference in path length between the two paths and using the time of flight and relative signal delays as the address. Arbitration is similarly achieved by using the time of flight of an optical feedback wavefront in lieu of a clocking signal in an optical priority chain.
This paper presents a synthesis of several investigations into the key issues of optical bus design. Separate solutions have been previously devised for the problems of data transfer [1], device addressing [2], and arbitration control [3] of an optical bus. We present here a complete and operational bus design, which verifies the compatibility of these techniques and analyzes their combined performance.
The presentation is organized as follows. Section two outlines previous research on both networks and optical bus implementations. Section three describes the data bus. Section four introduces our solution to all optical address generation and decoding. Section five deals with the access arbitration problem and presents an electro-optical distributed solution. Section six shows the combined implementation for the bus. In Section seven we report on various experiments designed to determine the technological limits on the implementation.
Background
Optical interconnections offer the potential for gigahertz transfer rates in an environment free from capacitive bus loading and electromagnetic interference. The effectiveness of optical interconnections has been examined from both theoretic [4, 5] and practical perspectives [6, 7, 8, 9, 10].
Over the past decade, much of the research in optical communications networks has focused on applications to wide area networks (WANs) [11, 12] and [...] Device technology in electro-optics has also matured to a point where small, low power, and low cost devices exist which are suitable for use in bus level implementations [28]. Initial efforts have focused on direct technology substitution [29] in board level [30] and chassis to chassis [31] links. However, there are obvious limitations to such substitution. For example, any interface between electronics and optics limits the speed of that interface to the speed of electronics. Even though optical pulses as short as a few femtoseconds may be generated and detected [32, 33, 34, 35], such short pulses may not be useful to transmit data on an optical bus without an electro-optic interface of comparable speed. In other words, the speed of electronics bounds the transmission speed of optical buses.
In switched networks, time division switched (TDS) [36, 37, 38], space division switched (SDS) [39, 40], and wavelength division switched (WDS) [41, 42] implementations have been used to perform message routing in both HSLN and multiprocessor applications. However, since switching device technology has developed more slowly than technology for other components, many recent designs have implemented low latency "single hop networks". These networks are composed of groups of processors linked by multiple passive star couplers which efficiently use optical power and have simple control structures [43, 44, 45, 46, 47].
The work presented here shares the "single hop" concept of the multiple passive star networks but is specifically adapted to single backbone designs intended to compete with electronic buses and crossbar switches for parallel processors. Given that on a bus each message is broadcast, access arbitration, rather than switching, becomes the control problem. All such networks use control algorithms which are generically called multiple access and whose implementation falls into one of the following three classes. The first class is the carrier sense multiple access (CSMA/CD) control protocol [48]. Examples of this class are Fibernet [49, 50] and Fibernet II [51]. The primary motivation for optical CSMA/CD is compatibility with electronic ethernet systems. Thus, most of the proposed designs resort to electronic collision detection and their performance is bound by [...] Since all of these systems are designed for use in HSLN applications, where relatively long packets of information are sent for each transaction, they share the common characteristic of substantial overhead in controller complexity. In addition, depending on the control structure adopted, a significant time overhead is paid for either collision handling or control information appended to message packets.
On the other hand, bus applications are characterized by short messages, with a high volume of messages per source. Also, bus lengths are on the order of meters. Given these two characteristics, we are specifically motivated by two corresponding design requirements. The first is to minimize control latency since, in this application, control time dominates overall message latency. The second is to eliminate the additional latency imposed by electronic address decoding. The unique contribution of the work presented in this paper is that a substantial percentage of the overhead required for a bus implementation is processed in the optical domain.
The most common fiber based designs are based on either tapped fibers or optical star couplers, as shown in Figure 1. Both of these structures are functionally equivalent. However, their temporal and power distribution characteristics differ significantly. For example, the time of flight, $T$, for a message to traverse a star coupled system is $T = d/c_g$, where $d$ is the total length of fiber connected to both sides of the coupler and $c_g$ is the speed of light in the fiber. $T$ is thus independent of the number of transmitting and receiving nodes. The time of flight in a tapped fiber system is $T = (n_t + n_r)\,l/c_g$, where $n_t$ and $n_r$ are node numbers, counted from the end of the bus, for the transmitting and receiving processors respectively, and $l$ is the length of fiber between each node on the bus. Thus, in a star coupled system, each message arrives at all receivers simultaneously, while on a tapped fiber each message arrives at successive time intervals given by the difference in optical path length between the receivers. For this reason tapped fibers are often referred to as tapped delay lines.
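The two time of flight relations can be compared numerically. The following Python sketch is illustrative only: the function names, the group index of 1.5, and the 0.3 m node spacing are assumptions introduced here and are not values from the design.

```python
# Time of flight for the two bus structures, following T = d / c_g (star
# coupler) and T = (n_t + n_r) * l / c_g (tapped fiber).
# All numeric values are assumed for illustration only.

C_VACUUM = 299_792_458.0       # speed of light in vacuum, m/s
ASSUMED_GROUP_INDEX = 1.5      # hypothetical fiber group index
C_G = C_VACUUM / ASSUMED_GROUP_INDEX


def star_time_of_flight(d, c_g=C_G):
    """Seconds to cross a star coupled system with total fiber length d (meters)."""
    return d / c_g


def tapped_time_of_flight(n_t, n_r, l, c_g=C_G):
    """Seconds from transmitter n_t to receiver n_r; nodes are counted from the bus end."""
    return (n_t + n_r) * l / c_g


def arrival_times(n_t, n_receivers, l, c_g=C_G):
    """Arrival time at each receiver 1..n_receivers on a tapped fiber."""
    return [tapped_time_of_flight(n_t, n_r, l, c_g)
            for n_r in range(1, n_receivers + 1)]


if __name__ == "__main__":
    for n_r, t in enumerate(arrival_times(n_t=1, n_receivers=4, l=0.3), start=1):
        print(f"receiver {n_r}: {t * 1e9:.3f} ns")
```

On the tapped fiber, successive receivers see the message one segment delay `l / c_g` apart, which is the tapped delay line behavior described above. The star coupler time depends only on `d`.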
In its power distribution characteristics, the star coupler has a significant advantage over a tapped fiber. Each of the outputs from the star coupler sees an equal percentage of the optical power injected into the coupler by a transmitting node. If a star coupler has a fanout of $N$ fibers, the optical power in each of the output fibers is $P = (p - \delta)/N$, where $p$ is the input optical power from any source fiber and $\delta$ is the excess loss in the coupler. This compares with the power characteristics of a tapped fiber, in which $P = p(1 - k)^{n_r + n_t}$, where $k$ is the fraction of power removed at each tap. It is possible to emulate the temporal characteristics of a tapped fiber in a star coupler design by merely trimming the fibers so that their lengths differ by multiples of $l$. This retains the favorable power distribution characteristics of the star coupler. When such an implementation is not practical, multi-level taps can be used to reduce losses [2], or fiber amplifier segments can be introduced to restore power [64]. An analysis of the limits on power distribution in tapped fiber is presented in section 7.
Since temporal characteristics are a key attribute in our bus design, the remainder of this discussion uses both tapped fiber and star coupler structures as a representation of specific temporal characteristics but not as implementation requirements. Thus, where a tapped fiber is shown it represents the fact that an optical path length difference between the transmitters or the receivers must exist for proper operation. Similarly, connections shown through star couplers assume equal path lengths. However, in either case, temporally equivalent implementations could be substituted.
Figure 2 shows the structure of the data bus portion of the system bus implementation. The temporal differences between the input side and the output side of the bus are intentional in order to match the temporal characteristics of the address bus.
As we will show, data and addresses concurrently traverse identical path lengths and arrive in a fixed temporal relationship at each of the receiver sites.
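Returning to the power relations for the two coupling structures, a short numerical sketch shows how evenly each distributes power. It assumes the readings $P = (p - \delta)/N$ for the star coupler and $P = p(1 - k)^{n_r + n_t}$ for the tapped fiber. The function names and the sample values (1 mW input, 10% tap ratio) are hypothetical and chosen only for illustration.

```python
# Power distribution sketch. The formulas are assumed readings of the text:
#   star coupler:  P = (p - excess_loss) / fanout
#   tapped fiber:  P = p * (1 - k) ** (n_r + n_t)
# All sample values are illustrative and not taken from the paper.


def star_output_power(p, excess_loss, fanout):
    """Power in each output fiber of a star coupler; identical for every receiver."""
    return (p - excess_loss) / fanout


def tapped_output_power(p, k, n_t, n_r):
    """Power on a tapped fiber for transmitter n_t and receiver n_r, tap fraction k."""
    return p * (1.0 - k) ** (n_r + n_t)


def tapped_power_spread(k, n_receivers, n_t=1):
    """Ratio of the power seen by receiver 1 to that seen by the last receiver."""
    nearest = tapped_output_power(1.0, k, n_t, 1)
    farthest = tapped_output_power(1.0, k, n_t, n_receivers)
    return nearest / farthest


if __name__ == "__main__":
    print("star, per output (mW):", star_output_power(1.0, 0.1, 8))
    print("tapped, nearest/farthest ratio:", tapped_power_spread(0.1, 8))
```

The spread grows geometrically with the number of taps, whereas every star coupler output receives the same share. This is the uniformity advantage the text attributes to the star coupler.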
Optical Addressing using Coincident Pulse Logic
The address bus implementation exploits two properties of optical signals, unidirectional propagation and predictable path delay, which make it possible to encode an address as the relative timing of two optical signals. In this technique, called coincident pulse addressing, the address of a detector site is encoded as the delay between two optical pulses which traverse independent optical paths to the detector. The delay is encoded to correspond exactly to the difference between the two optical path lengths. Thus, pulse coincidence, a single pulse with power equal to the sum of the two addressing pulses, is seen at the selected detector site. Other detectors along the two optical paths, for which the delay did not equal the difference in path length, detect both pulses independently, separated in time.
Figure 3: Coincident pulse addressing structure
Consider the optical addressing structure shown in figure 3. It consists of an optical fiber with two optical pulse sources, $P_1$ and $P_2$, coupled to each end. Each source generates pulses of width $\tau$ and height $h$. Assume $l = \tau c_g$, where $c_g$ is the speed of light in the fiber. In other words, $l$ is the length of fiber corresponding to the pulse width. Using $2 \times 2$ passive couplers, $n$ detectors, labeled $D_1$ through $D_n$, are placed in the fiber with the two tap fibers from each coupler cut to equal lengths and joined at the detector site. The location of each coupler detector is carefully measured so that the $i$th detector is located at $il$. To uniquely address any detector, a specific delay between the pulses generated by $P_1$ and $P_2$ is chosen. If this delay is $(n - 2i + 1)\tau$, the two pulses will be coincident at detector $D_i$.
The same technique can be generalized to support parallel selections. Suppose the $P_1$ source generates a single pulse at time $t_1$ and the source $P_2$ generates a series of pulses at times $t_i$, $i \in \{1, \ldots, n\}$, with each $t_i$ timed relative to $t_1$. Then, according to the addressing equation above, to select a specific detector $i$ each $t_i$ will be in the range $-(n - 1)\tau \le t_1 - t_i \le (n - 1)\tau$. Therefore, any or all of the detectors can be uniquely addressed by a positionally distinguishable pulse from source $P_2$. For convenience, this pulse train is referred to as the select pulse train and the single pulse emanating from $P_1$ is called the reference pulse. Since the length of the select pulse train is $n$, and each pulse in the return to zero encoding is separated by $2\tau$, it follows that the system latency is $2n\tau$. Further, up to $n$ locations may be selected in parallel within a single latency period. Therefore the system throughput is $1/(2\tau)$. This simple technique can be used as the basis for a practical addressing mechanism for a system bus.
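The addressing rule can be exercised with a small model. The Python sketch below measures time in units of the pulse width and assumes a geometry consistent with a select delay of $(n - 2i + 1)$ pulse widths: $P_1$ at position 0, detector $D_i$ at position $i$, and $P_2$ at position $n + 1$, all in units of $l$. The function names are hypothetical.

```python
# Coincident pulse addressing, with time measured in units of the pulse width.
# Assumed geometry (inferred, not stated explicitly in the text): P1 at
# position 0, detector D_i at position i, and P2 at position n + 1, in units of l.


def select_delay(i, n):
    """Delay t1 - t2 (in pulse widths) that makes the pulses coincide at D_i."""
    return n - 2 * i + 1


def select_train(targets, n, t1):
    """Launch times at P2 of the select pulses that address the given detectors."""
    return sorted(t1 - select_delay(i, n) for i in targets)


def coincident_detectors(select_times, n, t1):
    """Detectors where some select pulse arrives together with the reference pulse."""
    hits = []
    for i in range(1, n + 1):
        reference_arrival = t1 + i                        # i segments from P1
        select_arrivals = {t2 + (n + 1 - i) for t2 in select_times}
        if reference_arrival in select_arrivals:
            hits.append(i)
    return hits


def latency(n):
    """Length of a full select train: n return to zero slots of two pulse widths."""
    return 2 * n


if __name__ == "__main__":
    n, t1 = 4, 10
    train = select_train([1, 3], n, t1)
    print("select pulses launched at:", train)
    print("detectors seeing coincidence:", coincident_detectors(train, n, t1))
```

Adjacent detectors map to select pulses two pulse widths apart, which matches the return to zero spacing and the latency of $2n$ pulse widths described above.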
Figure 4 shows a design in which the select and reference pulse generators in Figure 3 are replaced by star couplers. In this structure, each processor, when granted bus access, can independently generate select and reference pulses. Addresses are encoded at each node as relative delays between the reference and select pulses using the coincidence equations above. Coincidences resulting between the select pulses and the reference pulse may select one or more destination nodes for each message. Once selected, messages are read by the node from a separate data bus as shown in figure 2. Since the design uses multiple sources both for the reference and select pulse trains, only one node at a time may transmit on the bus. The arbitration of bus access is the subject of the next section.
Control and Arbitration
Bus control and arbitration is fully distributed among the nodes and no central bus arbiter is required. Each processor accesses the bus via an electronic control node. The optical output signal, RequestOut, indicates a pending request at the corresponding control node. The RequestIn optical input signal reflects the state of the RequestOut signals of all higher priority control nodes. The third optical signal, the AckIn input, is used as a feedback mechanism to trigger state transitions in the control node circuitry. A processor may request access to the bus by asserting its electronic request signal, BusRequest. Similarly, when the bus is made available to the processor, the corresponding control node electronically asserts a bus grant signal, BusGrant, to the processor. Both BusRequest and BusGrant are held active for the duration of the bus transfer cycle.
The design is asynchronous. Two waveguides, Request and Ack, form the optical control bus. At each control node, RequestIn and RequestOut respectively sample the upstream Request waveguide and drive the Request waveguide downstream. The AckIn input at each control node reads the state of the Ack waveguide. The substitution of the Ack waveguide for a global clock signal is accomplished by the feedback structure between the Request and Ack waveguides shown in figure 6.
The functions of the Request and Ack waveguides are as follows. The Ack waveguide defines two operating states for the control bus. When there is no light in the Ack waveguide, the control bus is in the batch-formation state. In this state, one or more control nodes make requests by injecting light into the Request waveguide. The feedback mechanism between the Request and Ack waveguides then causes a transition from dark to light in the Ack waveguide.
With light in the Ack waveguide the bus enters the batch-service state. In this state, the Request waveguide acts as a priority chain. Each control node with a pending request defers from bus access so long as there is light upstream in the Request waveguide. When there is no upstream signal, the control node grants the bus access to the attached processor and on completion removes the optical output from its RequestOut waveguide. Note that no control node may newly assert RequestOut during the batch-service state. Thus any new requests must be held pending by the control node until there is no light in the Ack waveguide. This organization has the effect of creating a batch from all pending requests at the time of the transition on the Ack waveguide. Batching eliminates the starvation problems which characterize other priority chain arbitration systems. Once a request enters a batch during the batch-formation state, it is guaranteed bus access in the next batch-service state.
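The effect of batching on starvation can be illustrated with an abstract model that ignores optical timing entirely. In the Python sketch below, time advances in grant slots, lower node numbers are farther upstream (higher priority), and requests arriving while a batch is in service are held for the next batch. The slot model, the function names, and the comparison against an unbatched priority chain are assumptions made for illustration and do not describe the control node circuitry.

```python
# Abstract model of batched arbitration versus a plain priority chain.
# Assumptions (not from the paper): one grant per time slot, node 0 is the
# farthest upstream and therefore has the highest priority.


def batched_grants(arrivals, slots):
    """Grant order when pending requests are frozen into batches."""
    pending, batch, grants = set(), [], []
    for t in range(slots):
        pending |= arrivals.get(t, set())     # new requests are held pending
        if not batch:                         # Ack dark: batch-formation state
            batch = sorted(pending)           # every pending node joins the batch
            pending = set()
        grants.append(batch.pop(0) if batch else None)  # batch-service state
    return grants


def priority_chain_grants(arrivals, slots):
    """Grant order when the highest priority pending request always wins."""
    pending, grants = set(), []
    for t in range(slots):
        pending |= arrivals.get(t, set())
        if pending:
            node = min(pending)
            pending.remove(node)
            grants.append(node)
        else:
            grants.append(None)
    return grants


if __name__ == "__main__":
    # Node 0 requests in every slot; node 2 requests once at the start.
    arrivals = {0: {0, 2}, 1: {0}, 2: {0}, 3: {0}}
    print("batched:       ", batched_grants(arrivals, 5))
    print("priority chain:", priority_chain_grants(arrivals, 5))
```

In this example the batched arbiter serves node 2 in the second slot, while the unbatched chain makes node 2 wait until node 0 stops requesting.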
Operating the control nodes in this fashion has a desirable side effect. Specifically, the control delay for arbitration of requests is now proportional to the optical path length between the two asserting control nodes. Only in the worst case, that is for a batch size of one, will this delay equal the round trip delay time of the Request and Ack waveguides. For any combination of multiple requests, the delay is always less than the round trip delay. In addition, for a high contention environment, where the number of pending requests, and thus the batch size, is large, the average control overhead per message will decrease, as the requests are grouped more closely on the bus. We show this effect in section 7.2, where we present a simulation analysis.
In this section we combine the data, address, and control buses from the previous sections into a combined implementation. The design is shown in Figure 7. The data, address, and control buses operate as described in the previous sections. Here we focus on the combined operation, specifically on the timing of each bus cycle and the latency of bus transfers.
In this description it is important to note a specific difference between the operation of optical and electronic bus implementations. In electronics, signal validity is implicitly assumed to be spatially invariant along the length of the bus. In other words, a bus signal is only considered valid when it is stable at all inputs, along the entire bus length. Variations in transport time are attributed to signal skew, and the worst case skew must be accounted for in the timing analysis. On an optical bus the transport time, or time of flight, of a signal is considered as part of the signal state. Thus, signal validity is both temporally and spatially variant. As a practical matter, this implies that a single timing diagram representing the state of a bus line cannot be drawn for an optical bus. Timing diagrams of bus signals are only meaningful when confined to a specific location on the bus, in other words, at the inputs or outputs of a specific node.
The accuracy and reproducibility of the transport time between the transitions of a bus signal at each node input is a key requirement in the design of the addressing and control structures. Optical signal propagation time is a function of the wavelength of the signal, the length of fiber, the signal mode, and the refractive index of the fiber medium. Although these parameters are subject to environmental effects and manufacturing tolerances, variation effects are on the order of picoseconds per kilometer. Bus lengths are typically on the order of meters. Thus, even for bandwidths of hundreds of gigahertz, time of flight variations are three orders of magnitude smaller than a bit time.
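The order of magnitude argument can be checked with a few lines of arithmetic. The specific values in the Python sketch below (1 ps per kilometer of variation, a 10 m bus, a 100 GHz bit rate) are illustrative assumptions chosen to sit at the scales named in the text. They are not measurements.

```python
# Back-of-the-envelope check of time of flight variation against bit time.
# All three sample values are assumptions at the scales the text mentions.


def flight_time_variation_ps(bus_length_m, variation_ps_per_km):
    """Time of flight variation over the bus, in picoseconds."""
    return variation_ps_per_km * bus_length_m / 1000.0


def bit_time_ps(bandwidth_ghz):
    """Duration of one bit at the given rate, in picoseconds."""
    return 1000.0 / bandwidth_ghz


if __name__ == "__main__":
    variation = flight_time_variation_ps(bus_length_m=10.0, variation_ps_per_km=1.0)
    bit = bit_time_ps(bandwidth_ghz=100.0)
    print(f"variation: {variation} ps, bit time: {bit} ps, ratio: {bit / variation:.0f}")
```

With these assumed values the bit time is about one thousand times the time of flight variation, consistent with the three orders of magnitude stated above.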
The timing diagram in figure 8 represents the states of bus signals at two arbitrary control nodes, node i and node j, such that node i is physically upstream on the Request waveguide and thus has a higher priority. Bus transfers consist of interleaved control and data transfer cycles in which the control cycle may be one of two types, a long control cycle or a short control cycle. Long control cycles correspond to the batch-formation state of the control bus; short cycles are control operations between nodes within a batch during the batch-service state. If the optical path length between each node on the Ack and Request waveguides is $l$, then the latency of a long control cycle is equal to 1.5 round trip propagation delay times on the control bus, in other words, $3Nl/c_g$ for an $N$ node bus. Short cycles vary in length from $l/c_g$ to $(N - 1)l/c_g$ depending on the relative position of the nodes in a batch. Assuming all nodes are equally active, the average latency of a short cycle is $(N - 1)l/(2c_g)$.
Figure 8 shows the timing for two bus transfers, one each from node i and node j, assuming that both transfers take place within a single batch. The top set of waveforms shows control, address, and data bus connections for node i and the lower set for node j. The time axis is in units of $l/c_g$. The bus is assumed to connect five nodes, with node i and node j separated by an optical path delay of $2l/c_g$ on the control bus. For simplicity, electronic delays within the control node circuitry are not represented. This is reasonable since optical delays in the design are only measured against other optical delays. While electronic delays add to the total latency, they do not invalidate the asynchronous handshaking based on the relative time of flight of the optical signals.
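The cycle lengths above suggest a simple estimate of control time per message. The Python sketch below works in units of $l/c_g$ and assumes that a batch costs one long control cycle of $3N$ units plus one short cycle between each pair of consecutive batch members, with each short cycle equal to the node separation. That accounting, and the function names, are assumptions made here for illustration. It is not the simulation model used later in the paper.

```python
# Control time estimate in units of l / c_g.
# Assumed accounting (not from the paper): one long cycle per batch, plus a
# short cycle between consecutive batch members equal to their separation.


def long_cycle(n_nodes):
    """Long control cycle: 1.5 round trips on an n_nodes bus."""
    return 3 * n_nodes


def short_cycle_average(n_nodes):
    """Average short cycle when all nodes are equally active."""
    return (n_nodes - 1) / 2


def control_time_per_message(batch_positions, n_nodes):
    """Total control time for one batch divided by the number of messages in it."""
    ordered = sorted(batch_positions)
    short_cycles = sum(b - a for a, b in zip(ordered, ordered[1:]))
    return (long_cycle(n_nodes) + short_cycles) / len(ordered)


if __name__ == "__main__":
    n = 5  # five node bus, as in the timing example
    for batch in ([2], [1, 3], [1, 2, 3, 4, 5]):
        print(batch, control_time_per_message(batch, n))
```

Under this accounting the per-message control time falls as the batch grows, because the long cycle is shared by more transfers and the short cycles between neighboring nodes are brief.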
As stated above, bus transfers consist of interleaved control and data transfer cycles. The bus activity represented here begins with transitions on the BusRequest input lines for node i and node j. These transitions, marked a and b respectively in the timing diagram, are shown to occur when the Ack input is high. In fact, the two requests would be placed in the same batch if they occur at any time during the data transfer or short control cycles of the previous batch or during the long control cycle of the current batch. Since at time a the Ack_i input for node i is high, the control node takes no action until the falling edge of the Ack input. Similarly, node j sees Ack_j high at time b and also holds BusRequest_j pending. The falling edge of Ack_i at time c marks the beginning of the long control cycle at node i. In response to the low level, node i asserts RequestOut_i. A delay of $2l/c_g$ later, node j sees the same low going transition on its Ack_j input and similarly asserts RequestOut_j. Both RequestOut signals traverse the feedback path into the Ack waveguide. The resulting rising edge at each Ack input ends the long control cycle at each node. At time e node i sees the rising edge on Ack_i. With no upstream control nodes asserting RequestOut, RequestIn_i is dark. A data transfer cycle thus begins at node i, which asserts BusGrant to the corresponding processor. Node j sees this same transition at time f as the edge traverses the Ack waveguide, but with RequestIn_j held high by the output from node i it defers bus access to node i. During the data transfer cycle, both the address and data outputs from node i are active.
The data output is a serial bit stream containing the message. The two address outputs generate reference and select pulses with the reference pulse aligned relative to the first bit of the data. Thus, the three operations of control, addressing, and data transfer are supported. We turn now to a validation of the system by experimental and simulation analysis and discuss the scalability limitations for such a design.
Performance and Scalability Limitations
As discussed above, there are three limitations to the large scale implementation of the proposed bus architecture. These are bandwidth limits, which determine the minimum pulse width, detector spacing, and temporal limits on coincidence; latency limits, which bound the acceptable delay for a bus transfer and are determined by the speed and complexity of the bus arbitration and control algorithms; and power budget, which sets the minimum amount of power required at each detector to provide acceptable bit error rates and noise margins.
Each of these limits has been separately characterized for our bus design. Temporal limits have been established experimentally by testing the tolerances for pulse overlap when detecting coincidence. Latency in the control bus design was characterized by simulation analysis for various synthetic traffic loads. Power distribution was characterized analytically for linear tapped fiber structures.
Temporal Limits
In this section, we present results from a laboratory experiment on a prototype of the address bus to investigate the relationship of coincident pulse power as a function of the synchronization of the arriving pulses. We first discuss the experimental structure, then we show typical coincident and non-coincident waveforms before we discuss the experiment itself.
Figure 9 is a diagram of the prototype structure. The fiber bus consists of a length of multimode fiber tapped three times using Gould 10 dB fiber couplers. Select and reference bit patterns are generated by modulating the 4 ns pulse output of a Tektronix PG502 pulse generator, shown in the diagram as clock, with the output of two ECL shift registers, one for select and one for reference, at gates G2 and G3. Gates G1 and G4 simultaneously hold the diode current for laser diodes P1 and P2 respectively at threshold while the outputs of G2 and G4 generate modulation current. The result is two 4-bit return to zero bit streams which encode the information in each of the shift registers. As explained above, this allows us to select any subset of the three detectors. The use of two shift registers allows us flexibility in the positioning of the reference pulse relative to the select pulse train.
Figure 9: Synchronization Experiment
Figure 10 shows waveforms for both coincidence and non-coincidence measured at detector D1. The left waveform shows a single double-height pulse as seen at the detector for the case that the reference and the select pulse arrive at the detector simultaneously. The right waveform shows two pulses, each of lower amplitude and separated in time, at the detector for the case that a different detector, D3, was chosen. Note that in this case, the non-coincident pulses are of unequal power. This is due to the fact that each pulse has passed through a different number of couplers and, hence, has been attenuated to a different level.
This shows that the relative power between coincident and non-coincident pulses is a function of the detector location, as discussed in section 7.3.
One of the limits to the bandwidth which can be supported on the bus comes from the synchronization error which can be tolerated while still detecting coincidence. Therefore, measurements were made to characterize the effect of synchronization error between the reference and select pulses on the power of the coincident pulse. Since this error can be characterized as a percentage of the pulse width, synchronization precision has a direct bearing on the absolute width and height of an addressing pulse that can be effectively detected. In this experiment, the reference and select pulse trains were configured to select $D_2$. In each step of the experiment synchronization error was introduced by adding successively longer lengths of fiber to the ends of the bus. Length was added first on the reference pulse end of the bus, and then on the select pulse end of the bus. The two pulses shown on the left of figure 11 show the pulse waveform at the end of the experiment, after sufficient delays were added to the fiber to bring the pulses completely apart.
The right half of figure 11 shows the reduction factor, $f$, of the coincident pulse power as a function of percent synchronization error. Percent synchronization error is the error, in time, introduced by each length of fiber divided by the pulse width. In other words, pulses at perfect coincidence (synchronization error = 0) yield a reduction factor of $f = 1.0$, which implies a coincident power equal to twice the single pulse power.
Synchronization error in either the select pulse, shown as positive error, or the reference pulse, shown as negative error, reduces this power by the factors shown. The solid line in figure 11 is the experimental result. The dotted line is an analytical result generated from the coincidence of two sinusoidal pulse waveforms. In both cases, the power falls off in roughly the shape of the coincident waveforms themselves.
In order to analyze this result, we must consider the sources of synchronization error. Assuming that tolerances for electronic components and errors in fiber length measurements can be compensated for by tuning the system, the primary sources of synchronization error will be thermal variations in both the optical characteristics of the fiber and in the performance of electronic components. For the former, studies [65] have shown that the variability of the index of refraction of the fiber versus temperature is on the order of 40 ps per kilometer per degree C, and that this is the dominant temperature effect. This represents a very minor variation in effective time delay. Consequently, the electronic variations with temperature will be the predominant source of synchronization error. However, from figure 11 we can see that a timing error of up to 50% only decreases the coincident pulse power to about 70% of its ideal value. Therefore, large variations in the electronics, on the order of one half of a pulse width, can be tolerated without significant degradation of the coincident signal.
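The figure of roughly 70% at a 50% timing error is what one simple analytical model predicts. The Python sketch below assumes half-sine pulses of unit height and unit width, and takes the reduction factor to be the peak of the summed waveform divided by its value at perfect coincidence. Both choices, and the function names, are assumptions introduced here. The text says only that the analytical curve comes from two sinusoidal pulse waveforms.

```python
import math

# Reduction factor versus synchronization error for two overlapping pulses.
# Assumptions (not from the paper): half-sine pulse shape, and reduction
# factor measured as the peak of the summed waveform relative to its peak
# at perfect coincidence.


def half_sine(t, width=1.0):
    """Unit-height half-sine pulse occupying 0 <= t <= width."""
    return math.sin(math.pi * t / width) if 0.0 <= t <= width else 0.0


def reduction_factor(error, width=1.0, samples=2001):
    """Peak of the two summed pulses, normalized to 1.0 at zero error.

    `error` is the synchronization error as a fraction of the pulse width;
    positive and negative errors are symmetric in this model.
    """
    offset = error * width
    start = min(0.0, offset)
    span = width + abs(offset)
    peak = 0.0
    for k in range(samples):
        t = start + span * k / (samples - 1)
        peak = max(peak, half_sine(t, width) + half_sine(t - offset, width))
    return peak / 2.0


if __name__ == "__main__":
    for percent in (0, 25, 50, 75, 100):
        print(f"{percent:3d}% error -> f = {reduction_factor(percent / 100):.3f}")
```

For errors up to about two thirds of a pulse width this model reduces to $f = \cos(\pi e / 2)$, where $e$ is the fractional error, giving $f \approx 0.71$ at 50%. Beyond that point the two pulses separate and the peak is that of a single pulse, $f = 0.5$.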
The experiment shows that the important system issues of latency and throughput, which are related to pulse width limits, are highly scalable. Based on current and near term technology, we have shown that synchronization error does not contribute significantly to the bounds calculated above. On the other hand, physical scalability issues such as the size of the bus and the number of detectors that can be supported are more severely restricted due to power distribution in a system built from passive couplers. In section 7.3 we discuss the implications of power distribution on scalability issues. First we consider the latency imposed by the control structure itself.
Control Latency
In this section we present a simulation study of the bus performance under various load conditions. Minimizing control overhead is one of our primary motivations in the design. Thus, we focus on an analysis of the time spent in control operations versus data transfers. One of the side effects of batching as it has been implemented here is that the average control time per message required to manage bus access decreases with increasing traffic. This is because the ratio of long control cycles to short control cycles becomes more favorable.
To analyze bus performance, we conducted a discrete event simulation study on an eight processor model. For simplicity w e assume in the model that the processors are arranged in a spiral in order to minimize the feedback path length. The physical separation of each processor, and hence the delay b e t w een processors i;j , is the same for all pairs of adjacent processors. Further, the round trip delay is equal to i;j the number of processors. While this topology is more restrictive than can be supported in general, it provides a convenient time unit for performance measurements which is independent of other parameters such as the number of processors.
Two parameters in the model determine the level of bus contention: average next request delay and average transfer length. Average next request delay, nrd, is the period that any processor will wait before issuing its next bus request after completion of a bus transfer cycle. Average transfer length, trans, is the period a processor will hold the bus once a bus grant is issued. For this simulation we have chosen a fixed value for trans. Thus the actual length of each simulated transfer was randomly generated within a small range bounded by trans/2. To simulate various levels of bus contention, nrd was varied in each simulation. We began with a relatively low demand environment and incrementally increased demand, by a proportional decrease in nrd, until bus saturation. In the final saturated test, new requests arrive at each processor at intervals shorter than the average transfer length. This assured that in the final simulation each new batch included all other processors.
Figure 12 shows clearly the reduction in overhead with increased contention. In this figure we identify three possible bus states: idle, busy, and overhead during any time unit. The bus is idle when no bus requests are pending and no transfers are in progress. The busy state is defined to be the period when the bus has been granted to a processor and the requested bus transfer is in progress. Overhead occurs between the termination of a busy state and the next grant, if a request is currently pending, or the time from request to grant if the bus is currently idle. We have accounted for and plotted in figure 12 the percentage of total time the bus spends in each of the three states versus increasing bus demand. The uppermost plot, busy = 2, increases as [...]
In the protocol we have proposed, it is at this point that batching becomes a dominant effect and control overhead begins a decrease proportional to further increases in the level of demand.
The decreasing overhead trace in this region corresponds to additional bus capacity provided by overhead reduction. It continues to decrease to actual bus saturation, where nrd < trans. At this point there is always a pending request at the neighboring processor upon completion of a bus transfer cycle. Thus control overhead reduces to its minimum, the delay between adjacent processors.
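The qualitative trend described in this section can be reproduced with a very small event-driven model. The sketch below is not the simulator used for figure 12 and does not implement the batching protocol itself. It assumes a simplified grant-passing rule: processors sit on a ring with one time unit between neighbors, a grant that follows a transfer travels downstream to the nearest pending requester, and a grant issued from an idle bus costs one full round trip. The exponential request delays, the 25 percent spread on transfer length, and all names and default values are assumptions made for illustration.

```python
import random


def simulate_bus(n_proc=8, nrd=100.0, trans=20.0, n_transfers=5000, seed=1):
    """Return the fraction of time spent idle, busy and in control overhead."""
    rng = random.Random(seed)
    hop = 1.0  # delay between adjacent processors, used as the time unit
    # Time at which each processor's next request becomes pending.
    next_req = [rng.expovariate(1.0 / nrd) for _ in range(n_proc)]
    now, holder = 0.0, 0
    idle = busy = overhead = 0.0
    for _ in range(n_transfers):
        pending = [p for p in range(n_proc) if next_req[p] <= now]
        if pending:
            # Short control cycle: grant moves downstream to the nearest requester.
            dist, nxt = min(((p - holder - 1) % n_proc + 1, p) for p in pending)
            ctrl = dist * hop
        else:
            # Bus is idle until the earliest request, then pays a full round trip.
            nxt = min(range(n_proc), key=lambda p: next_req[p])
            idle += next_req[nxt] - now
            now = next_req[nxt]
            ctrl = n_proc * hop
        length = rng.uniform(0.75 * trans, 1.25 * trans)  # assumed spread
        overhead += ctrl
        busy += length
        now += ctrl + length
        holder = nxt
        next_req[nxt] = now + rng.expovariate(1.0 / nrd)
    return {"idle": idle / now, "busy": busy / now, "overhead": overhead / now,
            "overhead_per_transfer": overhead / n_transfers}


# Sweep demand from light to saturated by shrinking the next request delay.
for nrd in (2000.0, 200.0, 20.0, 2.0):
    stats = simulate_bus(nrd=nrd)
    print(nrd, {name: round(value, 3) for name, value in stats.items()})
```

In this toy model the overhead per transfer falls toward one inter-processor delay as demand rises, because the neighboring processor almost always has a request pending when a transfer completes.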
Power Distribution
In this section we address the third limitation to the proposed bus organization, power distribution. We present an analysis of power distribution in a bi-directional tapped fiber as used in the experiment above and discussed in section 6. In this analysis, we use passive, bidirectional, 2×2, symmetric fiber couplers [66, 67] and assume no excess loss in the couplers.
Since the couplers are symmetric and bidirectional, either side can be considered the input or the output. The power distribution from the input to the output is:
A bound on the number of detectors, n, is determined by the sensitivity of the last detector on the bus. In other words, it is the bound for a detector to discriminate between "no pulse" and "pulse". If the last detector has a sensitivity Pmin, then the maximum number of detectors supportable is:
Equation 4 is shown graphically in figure 15 for a set of coupling ratios r = 90%, 95%, 97%, 98%, 99% and 0.01% ≤ Pmin ≤ 1% of the input power on a logarithmic scale. This graph confirms the intuition that by improving either the coupling ratio r, or the sensitivity of the detectors Pmin, we will be able to support more detectors on the bus. We also note the sharp drop in n for high values of Pmin and r, which reflects the situation where much of the available power flows off the end of the bus and is wasted.
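The dependence of n on r and Pmin can be explored numerically. The sketch below does not reproduce Equation 4. It assumes a simple lossless model in which each coupler passes a fraction r of the power along the bus and taps 1 - r to its detector, so that the k-th detector receives (1 - r) r^(k-1) of the input power. The function names are hypothetical.

```python
def tapped_power(r: float, k: int) -> float:
    """Fraction of input power reaching detector k (1-based) in the assumed model."""
    return (1.0 - r) * r ** (k - 1)


def max_detectors(r: float, p_min: float) -> int:
    """Largest n such that the n-th detector still receives at least p_min."""
    n = 0
    while tapped_power(r, n + 1) >= p_min:
        n += 1
    return n


print(max_detectors(0.90, 0.001))  # 44 under this assumed model
```

The same model also shows the sharp drop noted above: when Pmin exceeds the tapped fraction 1 - r, not even the first detector receives enough power, and most of the light simply passes every tap.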
However, it is clear that it is not the absolute power but rather the power margin which imposes a bound on the size of the system. We define the power margin Pm to be the amount of additional [...] Pm indicates the threshold level needed for a detector to discriminate between coincident and non-coincident pulses. That is, for each detector on the bus the threshold should be set at:
$$(P_m + 1)\,\max(p_1, p_2)/2$$
For any linear structure, Pm has its maximum value, Pm = 1, at the center of the bus, where each pulse is at equal power, and coincidence is reflected as a doubling of power seen by the detector. It is at its minimum value at the ends of the bus. Another constraint is that the configuration of the coincident address bus requires bidirectional propagation. Therefore, we are constrained to use a single tapping ratio, r, for all couplers.
Based on these two constraints of power margin and a single coupling ratio, the graph shown in figure 16, which is a plot of worst case power margin Pm versus 1 − r for various bus lengths, confirms that the power margin for the coincident structure bounds scalability more strongly than absolute power. We can see from figure 15 that using 90 percent couplers, and assuming we can tolerate a Pmin of 0.001 of input power, we could achieve bus lengths of about 50 detectors. However, figure 16 shows that for a power margin of Pm = 20% we could only reach lengths of 16 detectors.
Therefore, because of minimum power constraints, and more strongly because of power margin issues, the system scale is highly sensitive to the fixed value of r.
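The power margin bound can be checked with a short calculation. Because the definition of Pm is incomplete in the text above, the sketch below assumes that Pm at a detector is the ratio of the weaker to the stronger of the two counter-propagating pulses, with equal power launched from each end and a single through ratio r (0 < r < 1) at every coupler. This assumption gives Pm = 1 at the center of the bus and its minimum at the ends, as described, but it remains an interpretation; the function names are hypothetical.

```python
def power_margin(r: float, n: int, k: int) -> float:
    """Assumed margin at detector k (1-based) of n: weaker pulse / stronger pulse."""
    p_left = (1.0 - r) * r ** (k - 1)   # pulse launched from the left end
    p_right = (1.0 - r) * r ** (n - k)  # pulse launched from the right end
    return min(p_left, p_right) / max(p_left, p_right)


def max_bus_length(r: float, pm_min: float) -> int:
    """Largest n whose end detectors still see a margin of at least pm_min."""
    n = 1
    while power_margin(r, n + 1, 1) >= pm_min:
        n += 1
    return n


print(max_bus_length(0.90, 0.20))  # 16
```

Reading the quoted margin of 20 as 20 percent, this assumed model yields 16 detectors for 90 percent couplers, which agrees with the value read from figure 16. Raising r toward 1 lengthens the bus considerably, which illustrates the sensitivity to the fixed tapping ratio.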
There are three solutions to this problem. The first is the use of more sophisticated taps that use cladding modes to give tapping ratios as small as 36 dB [68]. The second is the use of non-linear fiber [...]
Summary
In this paper we have presented a complete design for an optical fiber bus suitable for applications such as multiprocessor backplanes or other systems applications. The design incorporates optical processing as well as data transfer into the communication links. The resulting system includes an all optical addressing system which eliminates the latency contribution and bandwidth limitation associated with electronic address decoding. The control system uses time of flight relationships between a priority chain and feedback waveguide to implement fully distributed asynchronous and self-timed bus arbitration.
