We focus on the non-conflicting construction of an optical multistage feedforward network that emulates the N-to-1 output buffer multiplexer by using switched fiber delay lines (SDL). In [1], Y.T. Chen et al. presented a sufficient condition (an upper bound) for the number of delay lines required for such a multiplexer with variable length bursts. In this paper, we first give an improved upper bound. Then we develop a framework to construct an arrival case of bursts which can be used to achieve a necessary condition (a lower bound). These results are further extended to the feedforward construction of the N-to-N output buffer switch. Through simulation and performance comparison, we find that the new bounds can significantly decrease the hardware cost of constructing both the feedforward SDL-based multiplexer and the output buffer switch while still providing the same performance as the old ones.
key words: optical buffer, switched delay line (SDL), fiber delay line (FDL), multiplexer, output buffer switch
Introduction
All-optical switching is attractive since it eliminates the expensive optical-electronic-optical conversion and provides an opportunity to make good use of the enormous bandwidth of optical networks. Time-sliced (synchronous) optical switching without wavelength conversion is a simple yet cost-effective technology for implementing all-optical packet switching [2]-[6], where contending packets are buffered temporarily and forwarded later. Since optical RAM is not yet available, the fiber delay line (FDL) is usually adopted to delay packets. Unlike traditional electronic memories, where packets can be randomly read and written, a packet entering an FDL must propagate for a fixed time and cannot be removed before that time. As such, the design of FDL-based optical buffers with the same throughput and delay performance as electronic ones is still a significant issue.
Recently, a mathematical theory was developed by C.S. Chang et al. [7] for the exact emulation of First In First Out (FIFO) optical multiplexers based on the combination of (bufferless) Switches and fiber Delay Lines (SDL). Y.T. Chen et al. [1] then extended the work in [7] to more flexible feedforward SDL constructions of the multiplexer and output buffer switch (as illustrated in Fig. 1). Sufficient conditions on the number of fiber delay lines were derived in [1] for the emulation of both multiplexers and output-buffered switches.
In this paper, we first show in Sect. 2 that, for the N-to-1 multiplexer supporting variable length bursts, the sufficient condition proposed in [1] can be improved to a tighter upper bound. In Sect. 3, we provide a method to construct a practical arrival case, which serves as a lower bound for the multiplexer construction. These ideas are then extended to the N-to-N output buffer switch in Sect. 4. In Sect. 5, we provide the corresponding comparison and simulation, which indicate that our new bounds may significantly reduce the hardware cost with exactly the same performance as the old bounds. Finally, Sect. 6 concludes this paper.
The Feedforward Construction of Non-conflicting Multiplexers

Architecture and Routing
As shown in Fig. 1, the construction of the N-to-1 multiplexer is composed of $M$ stages of fiber delay lines connected by bufferless switches (e.g., crossbar switches). In particular, the first switch has $N-1$ extra ports for packet loss due to buffer overflow. The length of all the fiber delay lines in the $j$th bundle of stage $i$ is $j r^{i-1}$, where $1 \le i \le M$ and $0 \le j \le r-1$. From Fig. 1
For clarity, the notation used in this paper is listed in Table 1 (please refer to Fig. 1 for illustration).
Here we adopt the same assumptions as in [1]: time is partitioned into slots, a packet can be transmitted within one time slot, and an arriving burst consists of an integer number of fixed-size packets. Define a busy period as a period of time during which there are packets stored in the network. The transportation of packets satisfies the following three constraints:
(i) Conflict constraint: no more than one packet can be scheduled at the same input/output port of each crossbar switch at the same time.
(iii) Strong contiguity constraint: packets in the same burst should be routed through any fiber delay line contiguously.
The Non-conflicting Analysis
As the network in Fig. 1 is designed to emulate an N-to-1 FIFO queue, the delay $x$ of any newly arrived burst is determined by the current queue length, that is, the total length of the packets stored in the network. This network allows self-routing for arriving bursts because the delay path can be expressed by the r-ary representation of the assigned delay $x$, as long as $x$ is no larger than the maximum acceptable delay $r^M - 1$, i.e., the delay $x$ can be expressed as:
$$x = \sum_{i=1}^{M} I_i(x)\, r^{i-1}, \tag{1}$$
where $I_i(x) \in \{0, 1, 2, \ldots, r-1\}$. An example of the r-ary representation of delay $x$ is shown in Table 2, where $r = 2$ and $M = 4$. Following Table 2, if a burst is assigned delay 1, it will be routed to the 2nd bundle of FDLs in stage 1 and the 1st bundle of FDLs in the other stages. Now, the remaining problem is to calculate the value of $|D_{ij}|$ so that the construction in Fig. 1 can exactly emulate an N-to-1 multiplexer. A sufficient condition on $|D_{ij}|$ was found in [1], which is shown in Theorem 1. Here, we first introduce the following two lemmas established in [1], since they lay the foundation of the analysis.
Lemma 1: Define the $k$th packet of a busy period as the $k$th
Table 2 The r-ary representation of delay $x$ when $r = 2$ and $M = 4$.
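Since the entries of Table 2 are just the r-ary digits of each delay value, they can be regenerated mechanically. The following is a minimal sketch, assuming that stage 1 corresponds to the least significant digit (which matches the routing of delay 1 described above); the function name `rary_digits` is hypothetical and not taken from [1].

```python
def rary_digits(x, r, M):
    """Return [I_1(x), ..., I_M(x)]: the r-ary digits of delay x,
    least significant digit (stage 1) first."""
    if not 0 <= x <= r**M - 1:
        raise ValueError("delay exceeds the maximum acceptable delay r^M - 1")
    digits = []
    for _ in range(M):
        digits.append(x % r)   # bundle index used in the current stage
        x //= r
    return digits


def rary_table(r, M):
    """Map every admissible delay to its digit list (one row per delay)."""
    return {x: rary_digits(x, r, M) for x in range(r**M)}


if __name__ == "__main__":
    for delay, digits in rary_table(2, 4).items():
        print(delay, digits)
```

With $r = 2$ and $M = 4$, delay 1 yields the digits `[1, 0, 0, 0]`, i.e., bundle index 1 (the 2nd bundle) in stage 1 and bundle index 0 (the 1st bundle) in every other stage.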
We adopt Fig. 2 to explain the proof of Lemma 2, where the rows mark the packets arriving from different input ports and the columns mark the time slots. Assume that the packet departing at t of a burst with length $l_{max}$ for the worst case, we have that the packets that arrived earlier than $t_1 - l_{max}$ must leave before $t^d_1$. Thus, it is only necessary to consider those packets that arrive later than $t_1 - l_{max}$. Similarly, a packet that departs earlier than $t^d_2$ must arrive either at input $i_2$ before $t_2$ or at the other $N-1$ inputs before $t_2 + l_{max}$. Therefore, the number of packets that depart in the time interval [t
Based on Lemma 1 and Lemma 2, the upper bound can be derived by setting $t_1 = t - r^{i-1} + 1$, $t_2 = t$, and adopting the minimum departure time interval of packets, $c r^i |_{c=1}$, as follows:
Theorem 1: Suppose that the burst lengths are bounded by $l_{max}$ and that the feedforward network is started from an empty system. If
then under the self-routing rule it is a discrete-time N-to-1 FIFO multiplexer with buffer $r^M - 1$ for variable length bursts.
In (3), the first part is the maximum number of bursts that may depart from $D_{ij}$ to the output ports and the second part is the maximum number of bursts that may be routed to $D_{ij}$ from the $N$ input ports in one time slot.
A Tighter Upper Bound
Theorem 1 gives a sufficient condition for the number of delay lines needed in each bundle of the feedforward network to achieve exact emulation of an N-to-1 output queue. However, some cases show that the bound in Theorem 1 is not tight. For the first stage ($i = 1$), the packets routed to the delay lines in $D_{1j}$ must arrive at the same time, i.e., the arrival time interval of packets that may conflict with each other is $t_2 - t_1 + 1 = r^{i-1} = 1$. Consider Fig. 2, in which the first departing packet (resp. the last departing packet) arrived from input $i_1$ (resp. $i_2$); the packets that depart between the first and the last packets must arrive from the other $N-2$ input ports. Thus, the number of packets that depart between the first and the last packets is bounded by $(N-2) l_{max} + (l_{max} - 1)$. Adding the first and the last packets, we have $N + (N-1)(l_{max} - 1)$. From Lemma 1, there is at most one packet routed to $D_{ij}$ among every $r$ consecutive departing packets. Then we have at most
$$\left\lceil \frac{N + (N-1)(l_{max} - 1)}{r} \right\rceil \tag{4}$$
packets routed to $D_{1j}$ simultaneously. For instance, when $N = 2$, $r = 3$, $l_{max} = 2$, we get from (4) that the maximum number of FDLs needed in one bundle of the first stage is 1, while the result of Theorem 1 is
In addition, from (3) we can see that the upper bound of $|D_{ij}|$ is proportional to the maximum burst length $l_{max}$. In fact, as the maximum burst length increases, the time interval in Lemma 2 increases, but the number of bursts which can conflict with each other may decrease correspondingly. Thus, a tighter upper bound may exist which does not grow without limit as $l_{max}$ increases.
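The first-stage counting argument above can be checked numerically. The sketch below assumes the count is the ceiling of $N + (N-1)(l_{max}-1)$ divided by $r$, which follows from "at most one packet among every $r$ consecutive departing packets"; the function name `first_stage_count` is hypothetical.

```python
def first_stage_count(N, r, l_max):
    """Packets that can be routed to one first-stage bundle simultaneously,
    following the counting argument for i = 1."""
    # First and last departing packets plus everything that departs between them.
    departing = N + (N - 1) * (l_max - 1)
    # At most one of every r consecutive departing packets uses the bundle.
    return -(-departing // r)  # ceiling division on integers


if __name__ == "__main__":
    print(first_stage_count(N=2, r=3, l_max=2))
```

For the instance quoted in the text ($N = 2$, $r = 3$, $l_{max} = 2$) there are 3 departing packets to consider, so a single FDL per first-stage bundle suffices.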
Based on the above observation, we provide an improved upper bound as follows:
Theorem 2: Suppose the feedforward network is started from an empty system. The number of bursts that are routed to $D_{ij}$ in a time slot is bounded by
where
Proof. Consider the first stage of FDLs. A packet can only conflict with packets that arrive from the other $N-1$ input links at the same time. Thus, the number of bursts arriving simultaneously and routed to the same bundle in stage 1 can be obtained as follows:
Here, the first bound is the same as that in Theorem 1 for $i = 1$. The second one is obtained from Lemma 1, in that $r/l_{max}$ is the number of bursts needed to occupy the delays within $r$ interdeparture slots. Since at most $N$ bursts may arrive in the same time slot, we have $N r/l_{max}$ bursts which may arrive at the same bundle of the first stage at the same time.
Next, we show that the upper bound cannot grow without limit as the maximum burst length increases. Packets belonging to the same burst can never conflict with each other, i.e., the conflicting packets in $D_{ij}$ must come from different bursts. Assume one of these bursts is longer than $2r^i$; we mark it as the $k$th burst and denote its length by $l$. The $k$th burst will then occupy $l$ consecutive departure time slots at the output link. According to Theorem 1, this burst can bring i − 1 instead of $l_{max}$ in (3). For those bursts that are not routed to $D_{ij}$, we can apply the same idea because they do not contain any conflicting packets at all. By constraining the maximum length of a burst to at most $2r^i - 1$, Theorem 2 follows.
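One way to read the closing step of the proof is that, at stage $i$, the burst length entering the bound is capped at $2r^i - 1$. The sketch below illustrates only this cap and not the bound itself; the helper name `capped_burst_length` is hypothetical, and treating the cap as a simple minimum is an assumption.

```python
def capped_burst_length(i, r, l_max):
    """Burst length used at stage i when lengths are constrained
    to at most 2*r**i - 1 (assumed reading of the proof)."""
    return min(l_max, 2 * r**i - 1)


if __name__ == "__main__":
    r, l_max = 2, 38
    for stage in range(1, 7):
        print(stage, capped_burst_length(stage, r, l_max))
```

With $r = 2$ and $l_{max} = 38$, the cap is active only for stages 1 to 4, where $2r^i$ is still smaller than $l_{max}$; from stage 5 onwards the original $l_{max}$ applies.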
A Lower Bound
In this section, we develop a method to construct a real packet arrival case which can be used to obtain a lower bound on the number of delay lines needed in each bundle. According to Lemma 2,  Table 2 is the set of delays {0, 1, 4, 5, 8, 9, 12, 13}. From the r-ary representation of delay $x$, we can see that in stage $i$ the intervals between the delay values in $G_i^j$ are the same for different $j$. In addition, the N-to-1 multiplexer can be initialized with any queue length. Taking Table 2 as an example, we can start from an empty system for the calculation of $|D_{20}|$. Similarly, we can start from a multiplexer holding 2 packets for the calculation of $|D_{21}|$. Therefore, the number of packets routed to $D_{ij}$ simultaneously is the same for different $j$, i.e., the value of $|D_{ij}|$ depends only on the parameter $i$. For simplicity, we select the first bundle of fiber delay lines, $D_{i0}$, as the tagged bundle.
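The delay set quoted above can be reproduced from the r-ary representation. The sketch below assumes that the set for bundle $j$ of stage $i$ consists of all delays whose stage-$i$ digit equals $j$; this definition is an assumption, chosen because it is consistent with the set {0, 1, 4, 5, 8, 9, 12, 13} given for stage 2, bundle 0. The function name `delay_set` is hypothetical.

```python
def delay_set(i, j, r, M):
    """Delays x in [0, r**M - 1] whose stage-i r-ary digit equals j
    (assumed definition of the delay set of bundle j in stage i)."""
    return [x for x in range(r**M) if (x // r**(i - 1)) % r == j]


if __name__ == "__main__":
    print(delay_set(i=2, j=0, r=2, M=4))
    print(delay_set(i=2, j=1, r=2, M=4))
```

The set for $j = 1$ is the set for $j = 0$ shifted by $r^{i-1} = 2$, which illustrates why the spacing of delay values does not depend on $j$ and why starting from a queue of 2 packets covers the second bundle.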
The main idea of our method is to arrange that bursts arriving in earlier time slots experience a longer delay, and bursts arriving in later time slots experience a shorter delay, before entering the delay lines in the tagged bundle. By so doing, this construction can accommodate more bursts which may conflict in the same bundle $D_{ij}$ at the same time. We further require that the conflicting bursts are assigned consecutive delays in $G_i^j$. For $G_2^0$ in Table 2, if a burst is assigned a delay in {0, 1}, the next conflicting burst should preferably be assigned a delay in {4, 5}. Thus the arrival time interval of bursts can be further constrained as
, where $r^i$ is the distance between two consecutive delay sets in $G_i^j$. Given the burst's arrival time $t$ and the assigned delay $x$, we can obtain the next available insertion time slot (marked as $\tau(x)$). For the example of $G_2^0$ in Table 2, if a packet is assigned a delay in the set {1, 5, 9, 13} at $t = 1$, we can assign another packet a delay $x$ in the set {0, 4, 8, 12} at time $\tau(x) = 2$ so that they arrive at $D_{20}$ at the same time $t = 2$.
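The timing in this example can be verified with a short calculation. The sketch assumes that the switches add no delay, so a packet arriving at slot $t$ with delay $x$ reaches stage $i$ after spending the low-order part of $x$ (its digits for stages 1 to $i-1$) in the earlier stages. The function name `entry_time` is hypothetical.

```python
def entry_time(t, x, i, r):
    """Slot at which a packet arriving at slot t with assigned delay x
    reaches the delay lines of stage i (zero-delay switches assumed)."""
    # x % r**(i-1) is the total delay contributed by stages 1 .. i-1.
    return t + x % r**(i - 1)


if __name__ == "__main__":
    early = [entry_time(1, x, i=2, r=2) for x in (1, 5, 9, 13)]
    late = [entry_time(2, x, i=2, r=2) for x in (0, 4, 8, 12)]
    print(early, late)
```

Both groups reach stage 2 at slot 2, which is why a packet inserted one slot later with a delay from {0, 4, 8, 12} can collide in the tagged bundle with a packet inserted earlier with a delay from {1, 5, 9, 13}.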
The length of each arriving burst should be carefully designed so that the bursts arriving from other inputs or at future times can also be assigned delays whose routing paths contain the tagged bundle. To decide the length of a newly arrived burst, we need to know the next idle time slot among the other $N-1$ input ports (marked as $t_{next}$). After calculating the distance between the delay assigned at the current time slot and the next unassigned delay in $G_i^0$ (marked as $U_i(t)$), the length of the new burst should be as close as possible to, or equal to, this distance. More formally, the construction procedure of the burst arrival case can be given as follows:
Create a burst with length l
Set input k busy in [t,
To ease the understanding of the construction, we give a simple example to explain the process, where $N = 2$, $r = 2$, $M = 4$ and $l_{max} = 4$.
Example. Here, we calculate the value of $|D_{ij}|$ in the second stage ($i = 2$). Take the first bundle $D_{20}$ as the tagged bundle. Initially, all the input ports are idle and the output queue has $r^{i-1} - 1 = 1$ packet, since we want the first conflicting burst to experience a long delay before it enters the delay line in the tagged bundle (please refer to Table 2 and Fig. 3 for illustration).
Step 1. At time slot $t$, input port 1 is idle and $t_{next}$ is the current time slot $t$, which means the new burst will use delay 1 in $G_2^0$ and another burst will be created from input 2 at the same time. Besides that, we prefer to assign delay 5 in $G_2^0$ to the next burst. Thus, the assigned delay of the first burst is 1 and its length is $5 - 1 = 4$. This burst will occupy one delay line in the tagged bundle from time $t + 1$ to $t + 4$. Set $|D_{2j}| = 1$.
Step 2. At time slot $t$, input port 2 is idle, $t_{next}$ is $t + 4$ and the length of the output queue is 5, which means the new burst will be assigned delay 5, and we want the next arriving burst to use delay 8 at time $t + 4$. Thus, in order to guarantee that the length of the output queue is 8 at time $t + 4$, a burst with length $8 - 5 + (t + 4 - t) = 7$ would have to be inserted into the output queue. However, the maximum burst length is 4, so input 2 creates a burst whose length is 4. This burst will occupy one delay line in the tagged bundle from time $t + 1$ to $t + 4$. Set $|D_{2j}| = 2$.
Step 3. At time slot $t + 4$, input port 1 is idle, $t_{next}$ is $t + 4$ and the length of the output queue is 5, since 4 packets have already departed from the output port. This means the new burst will be assigned delay 5 and the next burst should use delay 8, which is the next unused delay in the tagged bundle. Thus a new burst with length $8 - 5 = 3$ is created so that the length of the output queue is 8 at $t_{next}$.
Step 4. At time slot $t + 4$, input port 2 is idle, $t_{next}$ is $t + 7$ and the length of the output queue is 8. This means the new burst can use delay 8 in $G_2^0$ and the next burst should use delay 12. Thus, the assigned delay of this burst is 8 and its length is 4. This burst will occupy one delay line in the tagged bundle from time $t + 4$ to $t + 7$. Set $|D_{2j}| = 3$. Now we have passed beyond the arrival time interval $[t, t + 4]$ for conflicting bursts. The process stops and returns $|D_{2j}| = 3$.
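The outcome of the four steps can be cross-checked by replaying the constructed bursts and counting how many of them occupy the tagged bundle in the same slot. This is only a verification sketch, not the construction procedure itself. It assumes zero-delay switches and that a burst of length $l$ occupies a delay line for $l$ consecutive slots starting when it reaches the stage, as in the occupancy intervals quoted in the steps; the function name `max_overlap` and the choice $t = 0$ are hypothetical.

```python
def max_overlap(bursts, i, j, r):
    """bursts: list of (arrival_slot, assigned_delay, length).
    Returns the largest number of bursts that occupy bundle j of stage i
    in the same time slot."""
    intervals = []
    for t, x, length in bursts:
        if (x // r**(i - 1)) % r != j:
            continue  # this burst is routed through a different bundle
        start = t + x % r**(i - 1)  # slot at which the burst reaches stage i
        intervals.append((start, start + length - 1))
    if not intervals:
        return 0
    first = min(s for s, _ in intervals)
    last = max(e for _, e in intervals)
    return max(
        sum(s <= slot <= e for s, e in intervals)
        for slot in range(first, last + 1)
    )


# Bursts created in Steps 1-4 with t = 0: (arrival slot, delay, length).
example = [(0, 1, 4), (0, 5, 4), (4, 5, 3), (4, 8, 4)]

if __name__ == "__main__":
    print(max_overlap(example, i=2, j=0, r=2))
```

The bursts of Steps 1, 2 and 4 all hold a delay line of the tagged bundle at slot $t + 4$, giving three simultaneous bursts, in agreement with the value returned by the example.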
Extensions to the N-to-N Output Buffer Switches
The Non-conflicting Construction
The N-to-N output buffer switch can be considered as an $N \times N$ switch fabric with $N$ parallel and independent queues at the output ports. Such an N-to-N output buffer switch can be built by connecting $N$ input ports to $N$ multiplexers directly via a full mesh network, as shown in Fig. 4. However, the direct construction is not efficient since the hardware in each multiplexer cannot be shared by packets destined for different outputs. For this reason, the N-to-N output buffer switch in Fig. 5 was proposed in [1], using a feedforward architecture similar to that of the multiplexer, except that there are $N$ output ports at the last stage. Thus, the length of each output queue is $r^M - 1$ and the routing path can be expressed by the r-ary representation of the assigned delay, as in (1).
To guarantee that the construction in Fig. 5 is non-conflicting, a sufficient condition on the number of delay lines in each bundle was provided in [1]. Let $E_{ij}$ be the set of the delay lines in the $j$th bundle of stage $i$ and $|E_{ij}|$ be the number of FDLs in $E_{ij}$. The sufficient condition can be expressed as follows:
Based on the analysis in Sect. 2, the following improved upper bound can be deduced directly.
Theorem 3:
Suppose the feedforward network in Fig. 5 is started from an empty system. The number of bursts that are routed to $E_{ij}$ in a time slot is bounded by
The proof is omitted since it is similar to the proof of Theorem 2.
A Lower Bound for the Construction of N-to-N Output Buffer Switches
Here we adopt the idea in Sect. 3 to construct a burst arrival case which can be used to obtain a lower bound on $|E_{ij}|$. This reduces to the problem of determining at least how many bursts may arrive at the $j$th bundle of FDLs in stage $i$ at the same time. Since the number of inputs is equal to the number of outputs, we cannot simply assume that all the output queues have the same length initially. Here we initialize the output queues as follows. First, the lengths of $N-1$ output queues are set to $r^M - 1$ and the remaining one is set to 0. This is possible because the arrival rate of $N$ inputs is larger than the service rate of $N-1$ output queues. Then we require that all the input bursts are destined for the empty output queue. By so doing, $N$ packets are inserted into the empty output queue and 1 packet departs from every one of the $N$ output queues in each time slot. Once the length of a queue is equal to $r^{i-1} - 1$, we can initialize the output queues for the calculation of $|E_{ij}|$. If the lengths of all the queues are equal to or larger than $r^{i-1} - 1$, we set the initial lengths of all the queues to $r^{i-1} - 1$; otherwise we take the current length of the formerly empty queue as its initial length and set the other $N-1$ output queues to $r^{i-1} - 1$. After that, the bursts arriving in future time slots fill the output queues one by one until the arrival time is outside the time interval [$t - r^{i-1} + \min(r^i, l_{max}) + 1$], which is similar to the multiplexer case. Formally, the construction procedure is given as follows:
Set length of output queue
Set the number of conflicting bursts for each output 
Performance Evaluation
To illustrate the new bounds, we compare the numbers of required FDLs under some typical settings of both N-to-1 multiplexers and N-to-N output buffer switches. First, Table 3 considers the multiplexer scenario with $r = 2$, $M = 9$ and $N = 5$. As $l_{max}$ is larger than $2r^i$, $i \in [1, 4]$, the upper bounds of $|D_{ij}|$ given by Theorem 2 are slightly decreased compared with those deduced from Theorem 1, while the lower bounds of $|D_{ij}|$ given by our method lead to a distinct reduction of FDLs, where the sum of FDLs length ( 16% of the FDLs can be reduced by adopting the lower bound. It is notable that the improvement can be more pronounced with the increase of maximum burst length. In Table 4, when we set $r = 3$, $M = 9$, $N = 32$ and $l_{max} = 1024$, the results show that the reduction of FDLs by Theorem 2 and our lower bound can be 10% and 20%, respectively. For the construction of N-to-N output buffer switches, we take the feedforward network with $r = 3$, $M = 9$, $N = 32$ and $l_{max} = 1024$ as an example. Under this setting, it is interesting to see that both the upper bounds and the lower bounds of $|E_{ij}|$ in each stage $i$ are the same for different bundles $j$. As shown in Table 5, by constraining the maximum burst length, Theorem 3 reduces the number of FDLs by 50% compared with the original upper bound in (8). Furthermore, almost 75% of the FDLs can be saved after adopting the new lower bound in Sect. 4.2.

Table 3 Comparison of $|D_{ij}|$ in N-to-1 multiplexer where $r = 2$, $M = 9$, $N = 5$ and $l_{max} = 38$.
Theorem 1      5   10   20   21
Theorem 2      5   10   17   18
Lower Bound    5    9    9    9

Table 4 Comparison of $|D_{ij}|$ in N-to-1 multiplexer where $r = 3$, $M = 9$, $N = 32$ and $l_{max} = 1024$.

Table 5 Comparison of $|E_{ij}|$ in N-to-N output buffer switch where $r = 3$, $M = 9$, $N = 32$ and $l_{max} = 1024$.
The above discussion shows that our new bounds can indeed reduce the hardware cost. Here we study the packet loss rate through simulations under different traffic models and loads. In our simulation, we adopt the uniform Bernoulli traffic model and the uniform burst traffic model, where the average burst length is 32 time slots as in [1]. In practice, the multiplexers are applied as buffers at the outputs of an N-to-N switch, as illustrated in Fig. 4. We list in Table 6 the packet loss rates of the multiplexer with $r = 3$, $M = 9$, $N = 32$ and $l_{max} = 1024$. Under the traffic loads $\rho = 0.5$, 0.9 and 0.99, our new bounds always achieve the same performance as the old bound in [1]. Here the packet loss mainly occurs because the queue length exceeds the finite buffer size, i.e., the buffer overflows.
For the more efficient construction of the N-to-N output buffer switch in Fig. 5, we study the packet loss rate under the heavy traffic load $\rho = 0.99$ as an example (under light traffic, we obtain the same results as in Table 6). In addition, we adopt the unbalanced traffic in [8] with unbalance factor 0.5 to evaluate the performance. The simulation results are summarized in Table 7, which indicates clearly that both the improved upper bound and the lower bound achieve the same packet loss rate as the old bound. Further, we argue that the lower bound, which is derived using a greedy delay line assignment method in this paper, can be decreased further while still achieving a good performance approximation. Here we introduce the reduction ratio $\delta$ and reduce the number of FDLs in each bundle to $|E_{ij}| \cdot \delta$, where $|E_{ij}|$ is the lower bound proposed in this paper. As shown in Table 7, when $\delta = 0.9$ the packet loss is close to the results of the upper bound. The packet loss happens not only due to buffer overflow but also due to insufficient FDLs in each bundle, i.e., the optical buffer cannot completely emulate the FIFO behavior. We can also see from Table 7 that the packet loss rate decreased obviously until $\delta = 0.5$, where half of the hardware in our lower bound is further removed.
Conclusions
In this paper, we studied the non-conflicting condition for the SDL-based multistage feedforward network in Fig. 1 to exactly emulate N-to-1 multiplexers. First, we provided an improved upper bound on the required FDLs in this network. Second, we gave a method to construct a practical arrival case which can be used to derive a lower bound on the required FDLs. These ideas were then extended to the construction of N-to-N optical output buffer switches. Through the comparison of the numbers of FDLs under different conditions, we found that the proposed bounds can significantly decrease the hardware cost. By simulation, we further verified that the new bounds, although requiring less hardware, can still achieve packet loss rates similar to those of the old bounds in both the finite-sized N-to-1 multiplexer and the N-to-N output buffered switch.
