Abstract-This paper presents the architecture and implementation issues of an "almost-all" optical packet switch that does not rely on recirculating loops for storage implementation. The architecture is based on two rearrangeably nonblocking stages interconnected by optical delay lines with different amounts of delay. We investigate the probability of loss and the switch latency as a function of link utilization and of the size of the switch. In general, with proper setting of the number of delay lines, the switch can achieve an arbitrarily low probability of loss. Growability patterns and extension of the design to the dense wavelength division multiplexing (WDM) case are also shown. In particular, we discuss an extension to the architecture whereby, through the use of WDM, the switch capacity may be increased several times, with only minor changes to the switch design.
I. INTRODUCTION AND MOTIVATION
In "all-" and "almost-all" optical networks, the data path is fully optical. In other words, the data bits (the payload) remain in the optical domain during their journey through the network. In "almost-all" optical networks, the control of the switching operation is done electronically. In contrast, in all-optical networks, the switching control is fully optical. The current state of optical processing and computing does not allow full implementation of an all-optical network. Thus, our work concentrates on the design of an "almost-all" optical network.
In this paper, it is assumed that optical networks have some advantage over their electronic counterparts. One of the features provided by photonics is transparency, which refers to the fact that, except for the control information (i.e., the packet header), the payload may be encoded in an arbitrary format or at an arbitrary bit rate. This feature may have far-reaching consequences for the expansion and upgrading of future networks. Nevertheless, it is beyond the scope of this paper to argue for the superiority of photonics over electronics, especially because many crucial parameters, such as future cost, are still unknown.
The main problem in the implementation of packet-switched optical networks is the lack of random access optical memory ([1]). Some networks cope with this shortcoming by introducing special architectures that either eliminate the need for local buffering or reduce the buffers' size ([2]). Other networks, especially local area networks, rely on some (electronically based) reservation scheme that again avoids the use of optical buffers ([3]). However, this approach cannot be easily extended to a wider network span.1 Therefore, the challenge is to propose an optical switch architecture design with the constraint that large optical storage is not feasible.
Several recently proposed switching architectures2 are not suitable for optical implementation because of the limitation of optical buffering; i.e., the lack of random access optical memory. Nevertheless, some come quite close to coping with this limitation. For example, Starlite ([5]), by using the recirculation lines as (shared) buffers, could be optically implemented. However, with recirculation loop storage, every recirculation loop may require optical amplification because of the relatively large attenuation created by the switching element that controls the insertion (extraction) to (from) the recirculation loop.3 Here, we present a switch architecture that employs the inherent capability of optical fiber for storage, yet does not rely on recirculation: the Staggering Switch.4
The paper is organized as follows: In Section II, the basic concept and elements of the Staggering Switch design are described. Because of the limitation on the length of this paper, detailed performance figures are not fully presented. These will be reported in another publication. (An interested reader is also referred to [6].) Nevertheless, a limited performance analysis is included in Section III. Extensions to the basic architecture of the Staggering Switch are discussed in Section IV. In particular, growability of the switch, both in the space and in the wavelength domains, is addressed. Additionally, a packet synchronization scheme, the packet flipping scheme, is introduced. An example of the control design is shown in Section V. A number of implementation issues are also considered in this section. Finally, some concluding remarks are included in Section VI.
II. THE STAGGERING SWITCH ARCHITECTURE
The Staggering Switch is an output-collision-resolution scheme that is based on a set of delay lines of unequal delay and is targeted at (optical) packet-switched networks. The control part of the switch operates on the "almost-all" optical principle described in [7]. The optical energy of each one of the n switch input lines is split immediately after its arrival; a small fraction of the energy is passed to the detector, converted to an electrical signal, and forwarded to the Control. The Control reads the header bits to determine the required routing for the packet and drives the scheduling and the switching stages of the switch.
1 Though some attempts to do so have been considered ([4]).
2 The reason for this increased activity is the Asynchronous Transfer Mode effort.
3 Since an optical amplifier degrades the optical signal [...] recirculation loop [...]
4 The switch is named the Staggering Switch because of the structure of its [...]
Additional delay between the splitter and the scheduling module may be needed to compensate for the electronic Control processing time. It is assumed that:
1. all the packets are of fixed size7;
2. time is slotted; initially we assume that each slot duration [...] τ;
3. the arrival of all the packets to the switch is synchronized.
The scheduling stage and the switching stage may be implemented as rearrangeably nonblocking networks (e.g., Benes networks ([8] and [9])).8 In rearrangeably nonblocking networks, any permutation of the inputs can be achieved at the outputs without blocking. Dilation of switching fabrics may be used in optical implementation as a way to reduce the crosstalk, especially for a large number of inputs/outputs (e.g., the dilated Benes networks [10]).
Numerous configurations of the Staggering Switch architecture are possible. We discuss here only the basic one.
B. The Operation of the Staggering Switch
The scheduling stage distributes packets to the delay lines in such a way that, in any time slot, no two packets arrive at the switching stage destined for the same output. In other words, the output collisions present at the inputs are resolved by delaying the colliding packets by different numbers of slots, so that when they arrive at the switching module, there are no output collisions.
The scheduling is done by the Control. The Control receives the header information from all the arriving packets and attempts to allocate as many of them as possible into the delay lines, d_i. The scheduling may be done according to the algorithm presented in the next section. To perform this algorithm, the Control needs to maintain knowledge of the "content" of the delay lines at any time. Based on this information, the Control ensures that no collisions occur at the switching stage. Because of the statistical properties of the arrival process, some packets cannot be accommodated by the scheduling process without violating the "no collisions at the switching stage" principle. Thus, some packets will be lost. The fraction of lost packets corresponds to some probability of packet loss. Section III evaluates the loss performance of the switch.
The switching stage permutes the outputs of the delay lines, so that the packets emerge at the required switch output.
C. The Scheduling Algorithm
In its basic form, the scheduling algorithm is as follows9: [...] In other words, scanning the inputs sequentially, for each input packet the algorithm tries to insert the packet in the lowest possible line, subject to the two conditions:
1. that no previous packet was inserted in the delay line in this time slot (i.e., the output is free),
2. that no packet destined for the same output is already present in the column in which the packet is to be inserted.
This algorithm is referred to as the sequential biased assignment, since it gives higher priority to the lower numbered inputs by trying first to accommodate a packet from input 1, then from input 2, etc. Thus, the probability of blocking of input number 1 is 0 (i.e., Pblock(1) = 0, and Pblock(i) < Pblock(i + 1)). A variation of this algorithm is the sequential nonbiased assignment, where in each slot the order in which the inputs are accommodated is random. Other assignment possibilities exist. In particular, there is an optimal assignment based on the knowledge of the past and the future. Some of the more interesting possibilities are discussed in Appendix A.
5 [...] LiNbO3, for example, [...]
6 We differentiate between packets and slots. A packet is the data that needs to be switched, while a slot is a frame into which the logical packet is inserted. Slot duration need not equal packet duration.
7 Size, here, being the time duration. However, the number of bits in each packet may vary, depending on the actual data rate of the data field; i.e., the network may operate based on the field coding technique ([7]).
8 The complexity of routing in large rearrangeably non-blocking networks may be quite high. In these cases, non-blocking networks may be required.
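The sequential biased assignment can be illustrated with a short sketch. This is not the Control design of the paper, only a minimal software model under stated assumptions: delay line i (0-based here) delays a packet by i + 1 slots, inputs and outputs are numbered from 0, and the class and method names (`SequentialBiasedScheduler`, `schedule_slot`) are hypothetical.

```python
from collections import deque


class SequentialBiasedScheduler:
    """Software sketch of the sequential biased assignment.

    Assumption: delay line i (0-based) delays a packet by i + 1 slots.
    """

    def __init__(self, m_lines):
        self.m = m_lines
        # columns[d] holds the outputs already claimed by packets that will
        # reach the switching stage d slots from now.
        self.columns = deque(set() for _ in range(m_lines + 1))

    def schedule_slot(self, dests):
        """dests[i] is the output requested on input i, or None if idle.

        Returns, per input, the chosen delay-line index, or None when the
        input is idle or its packet could not be accommodated.
        """
        used_lines = set()      # condition 1: one packet per line per slot
        assignment = []
        for dest in dests:      # input 0 is served first (biased priority)
            chosen = None
            if dest is not None:
                for line in range(self.m):
                    delay = line + 1
                    # condition 2: no packet for `dest` in the target column
                    if line not in used_lines and dest not in self.columns[delay]:
                        chosen = line
                        used_lines.add(line)
                        self.columns[delay].add(dest)
                        break
            assignment.append(chosen)
        # Advance one slot: the head column leaves through the switching
        # stage and a fresh, empty column appears at the tail.
        self.columns.popleft()
        self.columns.append(set())
        return assignment


sched = SequentialBiasedScheduler(m_lines=4)
print(sched.schedule_slot([0, 0, 0, 0]))            # [0, 1, 2, 3]
print(sched.schedule_slot([0, None, None, None]))   # [3, None, None, None]
```

In the second slot the packet for output 0 must take the longest line, because every shorter line would place it in a column that already holds a packet for output 0.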
III. PERFORMANCE EVALUATION
Packet switching structures are compared based on a number of parameters. The main figure of merit is the loss probability or blocking probability (i.e., the probability that a packet cannot be switched because of contention) under some assumptions on traffic intensity and statistics (usually uniform distribution of destinations). The capacity of a switch is the maximum throughput, again under some assumptions on traffic statistics. The switch latency is the average delay of a packet in the switch, when the switch operates under specific traffic load.
In this section, the two major switch performance criteria are considered: probability of blocking10 and latency11. Both probability of blocking and latency depend on input line utilization12 p; both increase with an increase in line utilization. Probability of blocking, Pblock, is directly related to the normalized switch throughput, T, by:
$$T = p \, \bigl(1 - P_{block}(p)\bigr) \qquad (1)$$
The normalized switch capacity C is its maximum throughput. Thus, assuming that the maximum throughput is achieved at p_m, C = p_m (1 - Pblock(p_m)).13
10 Probability of blocking is defined as the probability that a randomly chosen packet cannot be scheduled and is dropped.
11 Latency is defined as the average time (in slots) that a packet is delayed in the switch.
12 Line utilization is the probability that an input slot contains a packet.
13 The results of our simulations strongly suggest that under the assumptions [...]
A. Probability of Blocking-Throughput
To evaluate the blocking performance of the switch, the following were assumed:
1. The destination addresses are uniformly distributed.
2. The traffic is not time-correlated, i.e., nonbursty traffic. (This restriction is alleviated later.)
3. There is no correlation between packets arriving on any two inputs; no arrival time correlation and no destination correlation.
4. Sequential biased scheduling is performed.
5. All packets have the same routing- and loss-priorities.
Note that since the traffic on each input is assumed to be uniformly distributed and uncorrelated in time, the throughput of the switch in the cases of sequential biased and sequential non-biased assignments is the same.
The probability of blocking is evaluated by simulation as a function of line utilization with the size of the switch n as a parameter. Fig. 3 shows the simulation results for switches of size 8 x 8, 16 x 16, 24 x 24, 32 x 32, and 64 x 64. As is evident from the graph, the larger the switch, the lower the loss probability for a given line utilization. Thus, for example, at p = 0.7, Pblock ≈ [...] for the 16 x 16 switch, while Pblock ≈ 10^-7 for the 32 x 32 switch.
An additional parameter is the number of delay lines m. As can be expected, increasing the number of delay lines lowers the loss probability.14 This is shown in Fig. 4, where n is kept constant at 16 and m is varied from 16 to 32. The effect of the increase in m is quite dramatic. For example, at p = 0.8, when m is increased from 16 to 32, Pblock decreases from 1.2·10^-3 to 6·10^-7 (i.e., about 3-4 orders of magnitude). Thus m can serve as a very effective design parameter to achieve a desirable level of performance.
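A Monte Carlo estimate of Pblock of the kind described in this subsection could be sketched as follows. The sketch is illustrative only and is not the simulator used for Figs. 3 and 4: it assumes delay lines of 1 to m slots, uniform and uncorrelated destinations, and sequential biased scheduling, and the function name `simulate_blocking` and its parameters are hypothetical.

```python
import random
from collections import deque


def simulate_blocking(n, m, utilization, n_slots, seed=0):
    """Estimate Pblock for an n x n switch with m delay lines.

    Assumptions: delay line i (0-based) is i + 1 slots long; each input slot
    carries a packet with probability `utilization`; destinations are uniform.
    """
    rng = random.Random(seed)
    # columns[d]: outputs claimed by packets reaching the switching stage
    # d slots from now.
    columns = deque(set() for _ in range(m + 1))
    offered = lost = 0
    for _ in range(n_slots):
        used = set()                       # lines already used in this slot
        for _inp in range(n):              # sequential biased: input 0 first
            if rng.random() >= utilization:
                continue                   # idle slot on this input
            offered += 1
            dest = rng.randrange(n)
            for line in range(m):
                delay = line + 1
                if line not in used and dest not in columns[delay]:
                    used.add(line)
                    columns[delay].add(dest)
                    break
            else:
                lost += 1                  # no admissible delay line
        columns.popleft()                  # advance time by one slot
        columns.append(set())
    return lost / offered if offered else 0.0


if __name__ == "__main__":
    for m in (16, 24, 32):
        print(m, simulate_blocking(n=16, m=m, utilization=0.8, n_slots=20000))
```

Loss probabilities as small as those quoted in the text require far more simulated slots than this example uses; a short run can only resolve probabilities well above one over the number of offered packets.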
Since the Staggering Switch operates in the deflection routing mode, packets are, in fact, not lost but misdirected. In other words, packets are accommodated in the delay lines in such a way as to maximize the number of packets that conform to the rule that "no two packets in the same column are destined for the same output." But, since the scheduling module only permutes the inputs, some packets will be forwarded to delay lines in violation of the above rule. These packets will be switched to a wrong output at the switching stage, which again simply permutes the delay lines to the outputs. Thus, even though we consider here such misdirected packets as lost, in practical, multihop networks, these packets may still be correctly delivered after subsequent nodes redirect them back to their original destination.
14 Since as m increases, there is more "buffering" available.
B. Bursty Traffic
The assumption of nonbursty traffic is unrealistic for future high-speed networks. The following set of simulation results shows the effect of time correlation (i.e., burstiness) on the switch performance. As a function of time, the input traffic is composed of bursts of packets destined to the same output. [...] 5, and 10). As can be noted, the performance degrades rapidly with an increase in the burst length. Thus, the basic Staggering Switch architecture is not well suited to time-correlated traffic. The following change to the switch design allows us to reduce the ill-effect of the time-correlated traffic considerably. The change is described in Fig. 6. Let Δi denote the difference in the length of the ith and the (i + 1)th delay line, i.e.,
$$\Delta_i = d_{i+1} - d_i, \qquad i = 1, \ldots, m-1.$$
Thus, in the basic Staggering Switch design, Δi = Δ = 1. Increasing Δi reduces the effect of traffic time correlation. This is shown in Fig. 5 and Figs. 7-9, where Δi = Δ takes values of 1, 2, 5, and 10. As can be observed from these figures, as Δ increases, the performance under bursty traffic approaches the performance of the uncorrelated case (l = 1). It should be noted that increasing Δ corresponds to an increase in the length of the delay lines but has no effect on the number of the delay lines or the size of the scheduling and the switching modules. This is an important fact, since the hardware cost is determined by the size of the scheduling and switching modules. Thus, by properly sizing Δ, the increase in probability of loss due to traffic correlation can be effectively eliminated. The penalties for increasing Δ are: the cost of fiber (which is quite small); the increased attenuation (which is probably negligible because of the dominant attenuation of the LiNbO3 modules); the increased chromatic dispersion for very high-speed design; and the increased switch latency (i.e., the switch latency is Δ times larger than in the basic design with Δ = 1). The scheduler operation (i.e., the size of the scheduling) remains unaffected by the change in the Δ's, as long as the number of delay lines remains constant.
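The bursty input traffic considered in this subsection can be mimicked with a simple generator. The sketch below is an assumption-laden illustration, not the traffic model of Appendix B: it assumes geometrically distributed burst lengths (mean l), geometrically distributed idle periods that may have length zero, and a uniformly chosen destination per burst. The function name `bursty_source` and its parameters are hypothetical.

```python
import random


def bursty_source(n_outputs, utilization, mean_burst, n_slots, rng):
    """Return a list with one entry per slot: a destination or None (idle).

    Assumptions: burst length is geometric on {1, 2, ...} with mean
    `mean_burst`; idle period is geometric on {0, 1, ...} with the mean that
    yields the requested long-run `utilization` (0 < utilization <= 1).
    """
    continue_burst = 1.0 - 1.0 / mean_burst
    mean_idle = mean_burst * (1.0 - utilization) / utilization
    continue_idle = mean_idle / (1.0 + mean_idle)
    slots = []
    while len(slots) < n_slots:
        dest = rng.randrange(n_outputs)      # one destination per burst
        slots.append(dest)
        while rng.random() < continue_burst:
            slots.append(dest)               # burst continues
        while rng.random() < continue_idle:
            slots.append(None)               # idle slot after the burst
    return slots[:n_slots]


if __name__ == "__main__":
    trace = bursty_source(16, 0.8, 5.0, 40, random.Random(1))
    print(trace)
```

With mean_burst = 1 every burst has exactly one packet, which corresponds to the uncorrelated case (l = 1) discussed above.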
C. Reduction in Column Dependency
The packet loss probability can be improved if some of the dependency between the columns can be reduced or eliminated. The dependency is created because the system has memory. In other words, if a packet is "rejected" from a column and accepted at a subsequent column, then future packets "rejected" from the first column will no longer be independently "rejected" or "accepted" at the second column. The Δ's can be staggered so that only once (during the "life" of the columns) will the same two columns "accept" packets at the same time. Such an arrangement reduces the column dependency and exists, for example, if
Δi = [...]    (2)
Other arrangements, with smaller Δ's than in formula (2), are possible. Fig. 10 demonstrates the improvement of the probability of loss for n = 4, when the Δ's are arranged according to formula (2) and uncorrelated traffic is assumed. In this figure, the curves are labeled n x m. Prefix b indicates that arrangement (2) was used. As can be observed from the figure, significant improvement can be achieved by the above arrangement, especially for low utilization. It should be pointed out that the improvement is not due to additional storage created by the arrangement of formula (2) since, as can be easily verified, an equal increase in all the Δ's (and thus an increase in the storage capability) provides no improvement in the probability of loss when uncorrelated traffic is assumed.
D. Switch Latency
The latency of the Staggering Switch, D, depends on p, n, and m.15 In general, an increase in p, n, and m results in [...] remains relatively constant as a function of m for small values of p. The reason is that, for low utilization, the "extra" delay lines are rarely occupied. However, as p increases, the longer delay lines become more and more occupied, and the average latency increases. Note that the latency for smaller values of m is lower. This happens because latency is averaged only over successful packets; also, for large values of m and high utilization, many packets are scheduled on longer delay lines, contributing to higher latency numbers. The maximum and minimum switch latencies are m and 1, respectively. The [...]
It should be emphasized that switch latency is not an issue in networks with wide span and small packet size, since the propagation delay of long distance fiber may be several orders of magnitude larger than the actual switching delay. For example, with a 64 x 64 switch size and 100 km between switching nodes, the propagation delay is 500 µs, while the maximum switch latency is 64 packets or approximately 27 µs, for 53-byte, 1 Gb/s packets. Of course, for a smaller network span, larger packets, or large values of the Δ's, latency may be an important factor.
IV. EXTENSIONS TO THE BASIC SWITCH ARCHITECTURE

A. Packet Resequencing
The Staggering Switch may resequence packets. Packets arriving on the same input may end up in the reverse order on the same output, because the packet arriving first may be placed in a longer delay line than the packet arriving later.
Packet resequencing may be solved either in the switch itself or on the higher protocol layers. The transport layer, for example, may assume the function of packet resequencing (e.g., Transmission Control Protocol [11]). For some applications, packet sequencing is not an issue, and no action is required.16 ATM networks require cell order to be preserved, since there is no internal mechanism in ATM to [...] Thus, for each input, a pointer must be maintained to indicate the range of the delay lines on which a packet arriving on this input may be placed. Alternatively, a pointer may be kept for each output, in which case condition (3) should be applied to output lines. However, Fig. 13 shows a considerable increase in the probability of loss for the above two schemes (the curves "input" and "output"). When a pointer is maintained for every input-output pair and condition (3) is preserved for each such input-output pair (the curve "inpout"), the performance degradation is minimal, as shown in the same figure. However, n^2 pointers are required in the latter case. Since, as pointed out before, some traffic may not require packet ordering, the routing may be based on the traffic type: traffic that does not require order preservation may be routed according to the regular scheduling algorithm, while order-preserved traffic will be scheduled based on the above pointer-based scheduling scheme.
B. Growing the Staggering Switch in the Space Domain
A large-sized Staggering Switch may not be easy to implement because of practical problems, such as attenuation and cross-talk. Also, in a large switch, the electronic control has a limited ability to process the large flux of incoming packets. Thus, a scheme is required that allows building a large switch using relatively smaller modules. One way to grow the Staggering Switch is shown in Fig. 14 [...] built out of 2q n x n switches. Since each path passes through two stages, the total loss probability is given in terms of the loss probability of a single n x n module by: [...]
Thus, for example (see [...] routing can be performed in a distributed manner (i.e., each switching module may perform the routing locally).
C. Growing the Staggering Switch in the Wavelength Domain
The Staggering Switch may be extended in the wavelength domain in two ways. First, in the WDM case where each one of the n trunks carries k wavelengths, each trunk is demultiplexed to k channels and the n channels of the same color are grouped together for switching in a single n x n switch (shown in Fig. 16). Following the switching operation, the outputs of the switches are multiplexed onto n outgoing trunks. In this arrangement, any single switch needs to operate only at a single color. A total of k switches are required. If inter-wavelength switching is required, wavelength converters must be included, as shown in Fig. 17.
Another way to incorporate wavelength multiplexing in the design of the Staggering Switch is by representing the packet simultaneously on several wavelengths. This scheme is shown in Fig. 18, where four colors are used in parallel to represent a single nibble of information. In this example, each one of the four wavelengths carries a 2.5 Gb/s stream, totaling 10 Gb/s per input line. Since the LiNbO3 optical bandwidth is large enough to accommodate four Dense-WDM wavelengths, parallel 4-bit switching can be performed at the same time. In this way, the capacity of the switch can be quadrupled with no change to the switch design. The required changes are in the peripheral devices that generate the four parallel channels and multiplex/demultiplex them on a single fiber. Practically, many more than four wavelengths can be accommodated, considerably increasing the switch capacity. Such an arrangement can be employed where the use of extremely fast transmitters/receivers is prohibitively expensive.
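The nibble-parallel representation can be pictured with a small bit-slicing sketch. It only illustrates the idea of carrying each group of four consecutive bits on four wavelengths in parallel; the bit ordering, the function names (`to_wavelengths`, `from_wavelengths`), and the use of Python lists to stand for optical streams are all assumptions made for illustration.

```python
def to_wavelengths(payload, k=4):
    """Spread the payload bits over k parallel streams (one per wavelength).

    Assumption: bits are taken most significant first, and bit j of every
    k-bit group is carried on wavelength j.
    """
    bits = [(byte >> (7 - b)) & 1 for byte in payload for b in range(8)]
    if len(bits) % k != 0:
        raise ValueError("payload length must fill whole k-bit groups")
    return [bits[w::k] for w in range(k)]


def from_wavelengths(streams):
    """Inverse of to_wavelengths: interleave the streams back into bytes."""
    k = len(streams)
    bits = [streams[i % k][i // k] for i in range(k * len(streams[0]))]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


streams = to_wavelengths(b"\xa5")
print(streams)                      # [[1, 0], [0, 1], [1, 0], [0, 1]]
print(from_wavelengths(streams))    # b'\xa5'
```

Each stream carries one quarter of the bits, which matches the example above: four 2.5 Gb/s wavelengths together carry a 10 Gb/s input line.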
D. Slot Synchronization
When the Staggering Switch is used as a part of an all-optical network, it is necessary either to ensure that all the inputs to the switch are synchronized (i.e., packets arriving at different switch inputs are aligned at the switch) or to operate the switch in an unsynchronized manner. If the switch is to be asynchronously operated, the switching and scheduling modules should be nonblocking elements, not reconfigurably nonblocking elements as in the synchronized case, since in asynchronous operation reconfiguration is not possible while packets are being forwarded to the delay lines. Thus, using reconfigurably nonblocking elements for an asynchronously operated switch results in an additional penalty because of the blocking in the scheduling and switching modules. Nonblocking modules with small crosstalk are harder and more expensive to manufacture. When the switch operates synchronously, the scheme in Fig. 19 can be used to synchronize packets arriving at different inputs.19 In this scheme, delay lines with delay equal to a fractional part of a slot are connected in tandem. The synchronization circuit, after comparing the packet time of arrival with the phase of the local slot clock, clock_slot, generates the appropriate setting of the delay line, so that the input stream is aligned with the local clock. If the delay variations in the network are slow, the adjustments to the delay line settings occur infrequently. This synchronization scheme suffers relatively large attenuation because the optical signal may travel in and out of the LiNbO3 wafer several times.
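How a synchronization circuit might choose the settings of tandem fractional delay lines can be sketched as follows. The text does not specify the individual delay values, so the sketch assumes binary-weighted stages (slot/2, slot/4, and so on) and measures time in integer units of the smallest stage; the function names `delay_settings` and `total_delay` are hypothetical.

```python
def delay_settings(arrival_offset, stages):
    """Choose which tandem delay stages to switch in.

    Assumptions: the slot is divided into 2**stages units; stage s (0-based)
    adds 2**(stages - 1 - s) units when enabled; `arrival_offset` is the
    packet arrival time after the last local slot boundary, in those units.
    Returns one 0/1 flag per stage, longest delay first.
    """
    slot = 1 << stages
    required = (-arrival_offset) % slot   # delay needed to reach a boundary
    return [(required >> (stages - 1 - s)) & 1 for s in range(stages)]


def total_delay(settings):
    """Total delay, in units, added by the enabled stages."""
    stages = len(settings)
    return sum(bit << (stages - 1 - s) for s, bit in enumerate(settings))


settings = delay_settings(arrival_offset=3, stages=3)
print(settings, total_delay(settings))    # [1, 0, 1] 5
```

With three stages the slot has eight units; a packet arriving three units late is delayed by five more units so that it lines up with the next local slot boundary.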
To overcome this shortcoming, we propose a new synchronization scheme, termed packet flipping. The scheme is based on time slots that are twice as long as the actual duration of the packet. Thus, the network capacity is reduced to 50%.20 Refer to Fig. 20 for the explanation of the scheme's operation. It is assumed that each switching node in the network has a local clock, clock_frame, with a period of 2τ, where τ is the duration of the packet. We call this 2τ-long slot a frame. The clocks at different switches (nodes) are unsynchronized with each other. Moreover, it is assumed that in each switching node (and network interface) only one packet can be placed in a (locally defined) frame. Within this frame, however, the packet may float. If such traffic is presented to the Staggering Switch with slot size equal to the frame size (i.e., double the slot size), the switch preserves the "one packet per frame" rule. However, if traffic coming from the output of the switch is received at the next switching node with an unsynchronized local frame clock, the "one packet per frame" rule may be violated, as shown in Fig. 20. To correct this, so that there again is a single packet per frame aligned with the new local frame clock, the hardware shown in Fig. 21 is introduced. The central component is the synchronizer.21
19 Other configurations are possible.
20 This is in line with the philosophy that some part of the enormous bandwidth can be "wasted" to provide simpler control or operation of the all-optical networks.
The synchronizer can be built from two delay modules, as shown in Fig. 22(a), where each module can either add no extra delay or a delay of τ to the optical signal. Alternatively, the synchronizer can be composed of three 2 x 2 modules interconnected by two sets of two delay lines; one with delay of τ and one with no delay, as shown in Fig. 22(b). Comparing the two synchronizer designs, the maximum loss of the two-module design is equivalent to four times the loss of a single 2 x 2 module, while the three-module design loss is only three times the single module loss. Moreover, the loss of the two-module design is not constant and depends on the total delay required. However, the cost of the three-module synchronizer hardware is considerably higher than that of the two-module one.
If a packet does not fall in between two frame clocks, or if there are two packets per frame, an extra delay is added to correct this situation. A proof that two packet delays can restore the "one packet per frame" characteristic is given in Appendix C. Intuitively, when the "original" frame clock of the traffic is offset by less than τ, a single packet delay will restore the traffic's "one packet per frame" characteristic; when the offset is longer, two packet delays are required. The packet flipping scheme reduces the number of times the optical signal needs to leave the LiNbO3 wafer compared with the scheme in Fig. 19, at the expense of reduced bandwidth.
21 The synchronizer is placed between the splitter and optical processing unit. The purpose of the optical processing unit is to condition the optical signal, so that it is suitable for the LiNbO3 modules; e.g., proper polarization.
V. IMPLEMENTATION ISSUES
A. Some Practical Considerations
As discussed in [12], 16 x 16 rearrangeably non-blocking modules were built with an optical power loss of 13.4 dB per module. Thus, connecting two of these modules22 yields attenuation of 29 dB. This is a satisfactory power budget at 2.5 Gb/s per line. However, if the switch is only one of several switches that a packet is going to pass through (as in a multi-hop network), some sort of amplification is required. Moreover, since packets arriving at a switch may have traveled on different routes, the header receivers may require considerable dynamic range. A possible solution is a single stage of optical amplification ([13]). Another solution is electrooptic amplification, described in [14], in which an optical signal is detected, amplified as an analog signal, and converted back to light. Threshold amplification is an additional possible feature of the electrooptic amplification technique, reducing the Control receiver's large dynamic range problem. Moreover, the problem of crosstalk accumulation23 is virtually eliminated by using this amplification technique.
Since the LiNbO3 modules are polarization dependent, either polarization maintaining fibers or polarization controllers inserted at the input to each one of the two modules are required. Alternatively, polarization independent switches may be used ([15], [16]).
The high-speed receivers at the switch output need to be properly designed. The required features are: DC-coupling (or use of another scheme to avoid the DC-wandering problem) and proper dynamic range (to compensate for variations in optical path lengths through the switching modules and through the different delay lines). Burst receivers described in [17] were designed to cope with these problems.
22 For m = 64, 1 Gb/s line rate, ATM size packets of 53 bytes, and fiber loss of 0.2 dB/km, the total fiber loss (≈5 km) is 1.0 dB. Adding additional 1 dB for connectors results in total interconnection loss of ≈2 dB.
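The delay and loss figures quoted in the paper (about 27 µs maximum switch latency, 500 µs propagation delay over 100 km, roughly 5 km of fiber, and a 29 dB power budget) can be cross-checked with a few lines of arithmetic. The sketch below assumes a signal speed in fiber of 2·10^5 km/s (refractive index of about 1.5), which the paper does not state, and a longest delay line of 64 slots; all function and constant names are hypothetical.

```python
MODULE_LOSS_DB = 13.4            # per 16 x 16 module, as quoted from [12]
FIBER_LOSS_DB_PER_KM = 0.2
CONNECTOR_LOSS_DB = 1.0
FIBER_SPEED_KM_PER_S = 2.0e5     # assumption: c / 1.5, not given in the text


def slot_duration_s(packet_bytes=53, line_rate_bps=1e9):
    """Duration of one packet slot in seconds."""
    return packet_bytes * 8 / line_rate_bps


def max_switch_latency_s(m_lines=64, packet_bytes=53, line_rate_bps=1e9):
    """Longest delay line: m slots (assuming delays of 1..m slots)."""
    return m_lines * slot_duration_s(packet_bytes, line_rate_bps)


def propagation_delay_s(distance_km):
    """One-way propagation delay over `distance_km` of fiber."""
    return distance_km / FIBER_SPEED_KM_PER_S


def power_budget_db(m_lines=64):
    """Two modules plus the longest delay line and connectors."""
    fiber_km = max_switch_latency_s(m_lines) * FIBER_SPEED_KM_PER_S
    interconnect = fiber_km * FIBER_LOSS_DB_PER_KM + CONNECTOR_LOSS_DB
    return 2 * MODULE_LOSS_DB + interconnect


if __name__ == "__main__":
    print(max_switch_latency_s() * 1e6, "us")      # about 27 us
    print(propagation_delay_s(100) * 1e6, "us")    # 500 us
    print(power_budget_db(), "dB")                 # about 29 dB
```

Under these assumptions the longest delay line is about 5.4 km, so the fiber contributes roughly 1 dB, consistent with the interconnection loss of about 2 dB given in the footnote.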
B. Hardware Implementation of the Scheduling Algorithm
An example of the Control implementation is shown in Fig. 23 and is composed of the data-base, the scheduler, the x- and y-converters, and the path hunting memories. The data-base stores the representations of the destinations of the packets that are in the delay lines. It is composed of an array of shift-registers that mimic the flow of the packets in the delay lines. The scheduler, which can be implemented as combinatorial logic or as an array of look-up memories, [...]
Because the operation of the Control hardware may require a considerable fraction of a time slot (especially for small, ATM-like packets and high bit-rate), the operation may be pipelined. In the simplest form, the calculation of the setting is done one step ahead (i.e., while the transmission of the current packet takes place, the Control computes the setting for the set of packets in the next slot). Thus, the vector that contains the destination information for the next slot is supplied at the beginning of a slot, and the computation of the next slot setting is completed before the end of the current slot. During the guard-bands, the actual reconfiguration of the modules takes place in parallel with updating the data-base with the information from the vector b.
VI. CONCLUDING REMARKS
We have presented here an "almost-all" optical switch architecture that does not rely on recirculating loops for storage implementation. Our architecture, the Staggering Switch, is based on two rearrangeably non-blocking stages interconnected by delay lines with different amounts of delay. We have investigated the probability of loss as a function of link utilization and the size of the switch. In general, with proper setting of the number of delay lines, the switch can achieve an arbitrarily low probability of loss. The latency characteristics of the switch were also investigated. Moreover, a scheme to reduce the effect of traffic burstiness was presented. Growability in the space and wavelength domains was discussed. A packet synchronization scheme was introduced. A possible design of the electronic Control circuit was shown. We believe that the Staggering Switch may become the fundamental building block of future, wide-area, all-optical networks as the network bit-rate increases and packet switching becomes the switching scheme of choice in these networks.
APPENDIX A THE EFFECT OF THE SCHEDULING ALGORITHM
The scheduling algorithms may be classified based on the amount of information fed into the algorithm. In particular, algorithms can be causal or non-causal, depending on whether or not the information that the algorithm relies on is restricted to past information. Thus, if an algorithm has knowledge about future arrivals, it is non-causal. Non-causal algorithms may be implemented by delaying the input information as in Fig. 24.
The scheduling algorithm may have a considerable effect on the performance of the switch, both in the probability of loss and in the latency. The scheduling algorithm described in Section II-C, referred to as the sequential algorithm, is a causal algorithm. It is, however, not an optimal algorithm for achieving the lowest probability of loss. The problem of maximizing the number of packets that can be accommodated in a given slot can be viewed as finding a maximal matching on an input/delay-line bipartite graph, as shown in Fig. 25. In this example, the four left nodes are the four inputs, and the eight right nodes are the eight delay lines in the $4 \times 4$ Staggering Switch with $m = 8$. There is a link from left node $i$ to right node $j$ if a packet from input $i$ can be accommodated in column $j$. A maximal matching on this bipartite graph corresponds to the maximum number of packets that can be accommodated in this slot. Finding a maximal matching will not, in general, yield the minimal loss probability, since some maximal matchings (i.e., the ones that place the packets on lower numbered delay lines) are better than others. This is shown in Fig. 26, which compares the performance of three algorithms: the sequential algorithm, the maximal algorithm, which chooses randomly among the maximal matchings, and the 1-maximal algorithm, which gives more weight to the maximal matchings with lower numbered delay lines. The 1-maximal algorithm is not necessarily the optimum. More extensive results on the comparison between the various scheduling algorithms will be reported in a separate future publication.
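The gap between a one-pass scheduler and a matching-based scheduler can be seen on a small example. The sketch below is illustrative only. `greedy_schedule` is a simplified stand-in that serves the inputs in order and gives each the lowest numbered free delay line; it is an assumption, not the sequential algorithm of Section II-C. `maximum_schedule` finds a largest matching (what the text calls a maximal matching) with the standard augmenting-path method and tries lower numbered delay lines first; it is not the 1-maximal algorithm of Fig. 26. All names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def greedy_schedule(admissible: List[List[int]]) -> Dict[int, int]:
    """One pass over the inputs; each takes its lowest numbered free line.

    admissible[i] lists the delay lines that can accept the packet of input i.
    """
    assignment: Dict[int, int] = {}
    used: Set[int] = set()
    for i, lines in enumerate(admissible):
        for j in sorted(lines):
            if j not in used:
                assignment[i] = j
                used.add(j)
                break
    return assignment


def maximum_schedule(admissible: List[List[int]], num_lines: int) -> Dict[int, int]:
    """Largest matching of inputs to delay lines via augmenting paths."""
    owner: List[Optional[int]] = [None] * num_lines  # owner[j] = input on line j

    def try_place(i: int, visited: Set[int]) -> bool:
        for j in sorted(admissible[i]):
            if j in visited:
                continue
            visited.add(j)
            # Take a free line, or move the current owner to another line.
            if owner[j] is None or try_place(owner[j], visited):
                owner[j] = i
                return True
        return False

    for i in range(len(admissible)):
        try_place(i, set())
    return {i: j for j, i in enumerate(owner) if i is not None}


# Input 0 may use lines 0 or 1; input 1 may use line 0 only.
example = [[0, 1], [0]]
greedy_result = greedy_schedule(example)
maximum_result = maximum_schedule(example, num_lines=2)
```

In this example the greedy pass places input 0 on line 0 and must drop the packet of input 1, whereas the matching moves input 0 to line 1 and accommodates both packets.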
APPENDIX B MODEL FOR THE BURSTY TRAFFIC
A realistic traffic model needs to account for some time correlation in the traffic stream. In other words, if the current packet on input $i$ is destined to output $j$, then there is some probability greater than $1/n$ that the next packet on input $i$ is also destined to $j$, where $n$ is the number of inputs. Thus, as a function of time, traffic on each input is composed of bursts of packets destined to the same output. These bursts are followed by an idle period that can be of length zero (i.e., two back-to-back bursts).
Time correlation of traffic on each input is modeled by the Markov chain shown in Fig. 27. The chain is composed of three states: idle, in burst I, and in burst II. The system is in the idle state when no packet arrives in the current slot. With probability $p_a$, no packet will arrive in the next slot either, while, with probability $1 - p_a$, a new burst will begin and the system will transfer to the state "in burst I." While in this state, with probability $p_b$, the next arrival will be part of the burst (i.e., destined to the same destination), or the burst will terminate with probability $1 - p_b$. The termination of the burst can occur in two ways: by starting a new burst (i.e., to another destination), in which case the system will transfer to the "in burst II" state, or by going to the idle state. The probability of the first case is $(1 - p_b)(1 - p_a)$, and the probability of the second case is $(1 - p_b)\,p_a$. The behavior of the system in the state "in burst II" is similar to that of the state "in burst I." Equation (9) corresponds to the steady-state solution of the above Markov chain ($\pi_i$ is the steady-state probability of state $i$, with $i \in \{\text{idle}, \text{I}, \text{II}\}$):
$$\pi_{\text{idle}} = \frac{p_a (1 - p_b)}{1 - p_a p_b}, \qquad \pi_{\text{I}} = \frac{1 - p_a}{(2 - p_a)(1 - p_a p_b)}, \qquad \pi_{\text{II}} = \frac{(1 - p_a)^2}{(2 - p_a)(1 - p_a p_b)}. \tag{9}$$
The average burst length, $l$, is calculated by observing that the probability of burst length $k$ is:
$$\Pr\{\text{burst length} = k\} = p_b^{\,k-1} (1 - p_b), \quad k = 1, 2, \ldots, \qquad \text{so that} \qquad l = \frac{1}{1 - p_b}.$$
APPENDIX C PROOF OF THE PACKET FLIPPING ALGORITHM
In this Appendix, we assume that all clocks are of equal period $2\tau$. We define the s-characteristic of traffic with respect to a specific clock as the following property: all the packets in the traffic stream are positioned between the clock ticks.
In other words, the traffic might have been generated in such a way that there is only one packet per frame ($2\tau$ long). The packets may, however, "float" within the frame.
In this Appendix, we prove that s-characteristic traffic with respect to any clock can be made s-characteristic with respect to a specific clock by the hardware shown in Fig. 21. It follows from this proof that the s-characteristic of traffic with respect to the local clock can be preserved at each switch by including this hardware. Time with respect to clock B is measured on axis $t'$. We will show how to convert the traffic so that it exhibits the s-characteristic with respect to clock B.
We label the offset of clock B from clock A by $\delta$ (i.e., $t' = t - \delta$). In the case when $\delta$ exceeds $\tau$, we use one of the delay lines of delay $\tau$ to offset the transmission, so that we can now assume that $0 \le \delta \le \tau$, as in case 2. What remains to be shown is that, when $0 \le \delta \le \tau$ (case 2), using a single delay line of delay $\tau$ restores the s-characteristic of the traffic with respect to clock B.
To prove the above, associate each frame of clock A with a frame of clock B, as shown in the figure (i.e., $A_1$ is associated with $B_1$, etc.). Now, the claim is that under the above conditions, a packet either fits in a frame of clock B, or can be delayed by $\tau$ so that it then fits in the frame.
If a packet fits in the clock B frame, nothing needs to be done. This will happen when $\delta \le t_i \le 2\tau$, where $t_i$ is the arrival time of packet $i$. On the other hand, if a packet "falls" on the B clock (i.e., when $0 \le t_i \le \delta \le \tau$), the packet needs to be delayed by $\tau$ ("flipped"). When this happens, the time of arrival of packet $i$ on the $t'$ axis will be $t'_i = t_i - \delta + \tau$.
But since $0 \le t_i \le \delta \le \tau$, it follows that $0 \le \tau - \delta \le t'_i \le \tau$.
Thus, after flipping, the packet will fit into the clock B frame.
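The flipping rule used in the proof can be checked numerically. The following is a minimal sketch, with a hypothetical function name, that takes an arrival time $t_i$ on the clock A axis and an offset $\delta$ with $0 \le \delta \le \tau$, applies the delay of $\tau$ only when $t_i < \delta$, and returns the arrival time on the $t'$ axis. Integer time units are used only to keep the arithmetic exact.

```python
from typing import Tuple


def align_to_clock_b(t_i: int, delta: int, tau: int) -> Tuple[int, bool]:
    """Return (arrival time on the t' axis, whether the packet was flipped).

    t_i   : arrival time of the packet, measured on the clock A axis
    delta : offset of clock B from clock A, assumed to satisfy 0 <= delta <= tau
    tau   : delay of the single delay line (half of the frame length)
    """
    if not 0 <= delta <= tau:
        raise ValueError("this sketch covers only the case 0 <= delta <= tau")
    if t_i < 0:
        raise ValueError("arrival time must be non-negative")
    if t_i >= delta:
        # The packet already lies inside the clock B frame.
        return t_i - delta, False
    # The packet falls on the B clock tick: delay it by tau ("flip" it).
    return t_i - delta + tau, True


# With tau = 10 and delta = 6, a packet arriving at t = 2 is flipped,
# while a packet arriving at t = 8 passes through unchanged.
flipped_case = align_to_clock_b(2, 6, 10)
direct_case = align_to_clock_b(8, 6, 10)
```

For every flipped packet, the returned time satisfies $\tau - \delta \le t'_i \le \tau$, which is the inequality derived above.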
The Staggering Switch with frames of size $2\tau$ will preserve the s-characteristic of the traffic with respect to the local clock. Thus, if the input traffic to the first switch in a series of switches is s-characteristic with respect to any clock, by
