†a) , Nonmember, Hiroaki HARAI † †b) , Naoya WADA † †c) , Fumito KUBOTA † †d) , Members, and Yoichi SHINODA † † †e) , Nonmember
Introduction
Photonic technology will be required for packet switching because packet switching offers fine granularity, whereas the granularity of a WDM (Wavelength Division Multiplexing) light-path network is coarse [4]. Although the IP/GMPLS (Generalized Multi-Protocol Label Switching) over WDM approach provides fine granularity, its slower electronic processing, such as memory access for header analysis at IP routers, will become a network bottleneck. Photonic processing is thus required for high-speed operation. Systems in which the label of a packet is analyzed optically have been demonstrated experimentally for a photonic packet switch [9], [12], [13].
While optical label analysis and 1 × N photonic packet switching have been demonstrated, the experimental systems did not treat N × 1 contention resolution, that is, avoiding collisions between packets arriving from multiple ports. Contention resolution is the main topic of this paper. Since optical RAM (random access memory) is still immature, many researchers have investigated contention resolution, which is required for an N × N packet switch [3], [5]-[7], [15]. For example, wavelength conversion has been introduced to avoid the collision of multiple packets [3]. Although wavelength conversion with opto-electronic and electro-optical conversion is available, wavelength conversion at more than 40 Gbps is not available yet. High-speed optical signals on a single channel would therefore have to be demultiplexed into electronic signals on multiple channels before wavelength conversion, so a single optical packet would need multiple wavelength converters, which is not cost-effective. Deflection routing is also used to avoid packet collisions [15]. Even though deflection routing decreases the packet loss probability, it may degrade the performance of the original application (e.g., TCP), because packets sent from a source node may arrive out of order.
Another method for contention resolution is the use of an optical fiber delay line (FDL) buffer, which consists of optical switches and FDLs that delay packets [5], [7], [14], [16]. Since packets arriving at the same time are routed to different FDLs, collisions can be avoided internally or at the output. Unlike wavelength conversion, FDLs are useful for high-speed data. This method also maintains packet order between source and destination nodes, whereas deflection routing does not. Because of these advantages, the FDL buffer is a key technology for contention resolution in photonic packet switching.
An FDL buffer requires a large-scale optical switch, which increases the physical size of a packet switch. It is difficult to integrate high-speed LiNbO3 switches. However, this problem is solved by using highly integrated semiconductor optical amplifier (SOA) switches. The bulk of the FDLs may also increase the physical size of a packet switch. However, this problem is alleviated by the following three methods. The first is to reduce the number of FDLs by introducing multi-stage buffering; specifically, connecting multiple blocks of FDLs in tandem [5], [8]. To avoid collisions internally or at the output, a scheduler controls all the FDLs. The second is to shorten the unit length of the FDLs by increasing the data rate. This is possible, as we have experimentally demonstrated 80 Gbps 1 × N photonic packet switching [12]. The third is to use sheet fiber for downsizing.
For further development of the FDL buffer, we have to solve the following problems: keeping a high signal-to-noise ratio and compensating for the power penalty of optical signals that results from differences in internal paths in the FDL buffer. On the other hand, an optical FDL buffer has crucial advantages. Since an optical FDL buffer does not have optical or semiconductor memory, it does not have a bottleneck like that of the current electrical packet switch. It does not need the opto-electronic conversion and serial-parallel signal conversion that are required when packets are stored in a memory buffer. These merits become more pronounced as the packet transmission speed increases.
Copyright © 2005 The Institute of Electronics, Information and Communication Engineers
In this paper, we propose a multi-stage buffer architecture, that is, a buffer consisting of multiple blocks of FDLs for contention resolution. Scheduling, which determines the FDL assigned to each packet for contention resolution, is performed in the electrical domain even though the data path of the buffer is optical. We do not expect an optical solution for scheduling in the near future because the queue length in the buffer changes dynamically and an arithmetic system is required to track it.
The point of this paper is to take the electronic processing speed for contention resolution into consideration. We think that the application area of photonic packet switching is the switching of higher-rate packets (e.g., 160 Gbps or more), since electrical packet switching at up to 40 Gbps will appear. At 160 Gbps, a 500-byte packet lasts 25 nsec. The output buffering method considered in this paper requires each scheduling decision to complete in less than the time equivalent to the packet length. The allowable processing time is inversely proportional to the number of input ports N (see "one-stage buffer" in Table 1). Slow electronic processing thus limits the number of ports. We therefore propose a new structure, a multi-stage optical buffer, to compensate for the slow processing speed of the processor. Our multi-stage buffer is based on a tree structure in which each node has a block of FDLs and a scheduler. Since the number of ports of a block is smaller than N, the same processing speed as for a single-stage buffer is not required (see "two- and three-stage buffer" in Table 1). The allowable processing time is $N^{1/2}$ times as long as that for a single-stage buffer. Each processor works independently, which also differs from other multi-stage buffering methods [5], [8].
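As a quick check of the figures in this paragraph, the following sketch computes how long a packet occupies the line and the resulting time budget per scheduling decision. The function names are illustrative only, and the two-stage case assumes that the N inputs are split over $N^{1/2}$ first-stage blocks so that every scheduler serves $N^{1/2}$ inputs; Table 1 may state the budgets in a different form.

```python
import math

def packet_duration_ns(length_bytes: int, rate_gbps: float) -> float:
    """Time a packet occupies the line: bits / (Gbit/s) gives nanoseconds."""
    return length_bytes * 8 / rate_gbps

def single_stage_budget_ns(duration_ns: float, n_ports: int) -> float:
    """One scheduler serves all N inputs, so each decision gets duration / N."""
    return duration_ns / n_ports

def two_stage_budget_ns(duration_ns: float, n_ports: int) -> float:
    """Assumption: sqrt(N) blocks, so each scheduler serves sqrt(N) inputs."""
    return duration_ns / math.sqrt(n_ports)

duration = packet_duration_ns(500, 160)        # 500-byte packet at 160 Gbps
single = single_stage_budget_ns(duration, 16)  # N = 16, single stage
double = two_stage_budget_ns(duration, 16)     # N = 16, two stages
print(duration, single, double)                # 25.0 1.5625 6.25
```

For N = 16 the per-decision budget grows from about 1.6 nsec to 6.25 nsec, a factor of $N^{1/2} = 4$.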
Another contribution of this paper is that the scheduler in the multi-stage FDL buffer handles asynchronously arriving variable-length packets [2], [10], [11]. This is a desirable feature for using photonic packet switches as an infrastructure of the Internet. It is known that, for variable-length packets and a single-stage FDL buffer, the packet loss probability depends on the unit length of the fiber delay lines. In [2], the optimum value is found to be approximately 0.3 times the mean packet length when the load is 0.8. In this paper, we find the optimum values of the unit length of the fiber delay lines in the multi-stage buffer. Accordingly, we develop a high-performance multi-stage buffer for photonic packet switches. While the processing speed requirement for scheduling is relaxed in the multi-stage buffer, the number of FDLs is likely to increase. We therefore find the number of FDLs required to avoid degrading performance compared to a single-stage buffer. To carry out this investigation efficiently, we establish an analytical approximation method.
This paper is organized as follows. We describe a packet switch architecture and a simple scheduling method in Section 2. We propose the multi-stage optical buffer structure in Section 3. In Section 4, we develop an analytical approximation method to calculate the performance of the buffer, and assess the accuracy of the analytical method. In Section 5, we show the performance of the multi-stage buffer and find the optimum unit length of the FDLs in the multi-stage buffer. We finally present our conclusions and mention future work in Section 6.
Photonic Packet Switch Architecture

Overview
We show a photonic packet switch architecture in Fig. 1 . We apply our buffer management to the optical buffer in this architecture. The output-buffered N × N packet switch consists of N '1 × N' buffer-less packet switches followed by N 'N × 1' buffers. Every 1 × N switch is connected to all N × 1 buffers. 1 × N buffer-less packet switches make the address lookup function faster by providing photonic address lookup functions [12] to the packet switch. They can handle packet switching of asynchronously arriving variable-length packets with precedent activity [13] . As a result, the architecture provides ultra-high node throughput to the packet switch. N × 1 buffers are used to avoid packet collision and to reduce the packet loss probability.
Optical Fiber-Delay-Line Buffer
We use optical fiber delay lines (FDLs) to compose an N × 1 optical buffer. A large optical space switch whose switching time is on the order of nanoseconds is required to realize photonic packet switching. Since the switching time of a thermo-optical switch or an optical MEMS (Micro-Electro-Mechanical System) switch ranges from 1 microsecond to tens of milliseconds, these do not fulfill the requirement. State-of-the-art photonic packet switching instead uses gate switches such as LiNbO3 intensity modulators [12] or semiconductor optical amplifier gates (SOAGs) [1], [14], with switching times of less than 1 nsec. Such fast optical switches are needed for an optical FDL buffer as in [14], in which an FDL buffer with N = 2 and B = 3 (B being the number of FDLs), built from a sequence of LiNbO3-based 1 × 2 optical switches, has been demonstrated. To solve the physical size problem for larger B, we will require SOA switches.
Scheduler
Unlike the RAM buffer of an electronic node system such as an IP router, the feed-forward nature of light propagation in the FDL buffer makes the traditional store-and-forward approach difficult. An appropriate FDL must be selected for each arriving packet before it arrives at the FDL buffer. The queuing function is implemented by switching each arriving packet to one of the FDLs. An arriving packet is sent into the shortest FDL for which the packet does not collide with another packet when it departs from the FDL. If no FDL satisfies this condition, the packet is discarded.
In an N × N output-buffered photonic packet switch, up to N packets may arrive simultaneously at the N × 1 optical buffer. When we employ fiber delay lines, the scheduling system must calculate the delays for all N packets within time $l_{min}$, which corresponds to the minimum packet length. The allowable processing time for one packet is thus $l_{min}/N$.
We describe the behavior of the scheduler that implements the queuing function. The scheduler maintains an internal variable f, which represents the time at which all packets stored in the buffer will have departed and the buffer becomes idle. It receives a signal indicating a packet arrival before the packet actually arrives at the optical switch in the buffer, and calculates an appropriate delay for the packet. For example, assume that a packet of length l arrives at time t. In an ideal case, the delay needed to avoid collision is f − t. However, due to the discrete-time nature of the fiber delay line buffer, the delay given to the packet is ∆D, where
$$\Delta = \left\lceil \frac{\max(f - t,\ 0)}{D} \right\rceil .$$
When ∆ < B, the packet is allowed to enter the delay line whose delay is ∆D. When ∆ ≥ B, the packet is discarded. When the packet enters a delay line, f is updated to f = t + ∆D + l so that future arriving packets are scheduled appropriately.
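The scheduling rule described here can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the experiments: it assumes that the delay index ∆ is the smallest non-negative integer with ∆D ≥ f − t, that the block offers delays 0, D, ..., (B − 1)D, and the class and method names are hypothetical.

```python
import math
from typing import Optional

class FdlScheduler:
    """Scheduler for one FDL block with delays 0, D, 2D, ..., (B-1)D."""

    def __init__(self, num_fdls: int, unit_delay: float) -> None:
        self.B = num_fdls
        self.D = unit_delay
        self.f = 0.0  # time at which the buffer output becomes idle

    def schedule(self, t: float, length: float) -> Optional[int]:
        """Return the index of the chosen delay line, or None if discarded."""
        wait = max(self.f - t, 0.0)        # ideal delay f - t (0 when idle)
        delta = math.ceil(wait / self.D)   # round up to a multiple of D
        if delta >= self.B:
            return None                    # no delay line is long enough
        self.f = t + delta * self.D + length
        return delta

sched = FdlScheduler(num_fdls=3, unit_delay=1.0)
print(sched.schedule(0.0, 1.5))  # 0: buffer idle, no delay needed
print(sched.schedule(0.2, 1.0))  # 2: ideal delay 1.3 rounded up to 2D
print(sched.schedule(0.4, 1.0))  # None: would need 3D, but only 0..2D exist
```

In the second call the packet is delayed by 2.0 although 1.3 would have sufficed; the difference of 0.7 is the void left at the output.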
Because of the discrete-time nature of the FDL buffer, a void may appear between a newly entered packet and the packet immediately preceding it. The size of these voids depends on the unit length D of the delay lines, so the packet loss probability depends on D. If D is small, the delay lines are used efficiently, but the buffer capacity B × D is small. Packet loss is then likely, because the buffer is easily filled by a small number of packets. On the other hand, if D is large, the voids are also large, so the buffer is again easily filled. D therefore has an optimum value, which is found to be 0.3 times the average packet length when the offered load is 0.8 [2].
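The dependence of loss on D can be explored with a small Monte Carlo sketch. It assumes Poisson arrivals, exponentially distributed packet lengths with mean 1, and the rounding-up delay rule of the scheduler; the function name, the small buffer size B = 4 and the run length are illustrative choices and are not taken from the paper.

```python
import math
import random

def simulate_loss(num_fdls, unit_delay, load, num_packets=50_000, seed=1):
    """Estimate the packet loss probability of one FDL block by simulation.

    Assumptions: Poisson arrivals at rate `load`, exponential packet
    lengths with mean 1, delays available in steps of `unit_delay`.
    """
    rng = random.Random(seed)
    t = 0.0   # arrival time of the current packet
    f = 0.0   # time at which the buffer output becomes idle
    lost = 0
    for _ in range(num_packets):
        t += rng.expovariate(load)
        length = rng.expovariate(1.0)
        delta = math.ceil(max(f - t, 0.0) / unit_delay)
        if delta >= num_fdls:
            lost += 1                      # no suitable delay line
        else:
            f = t + delta * unit_delay + length
    return lost / num_packets

# Sweep the unit length for a deliberately small buffer (B = 4).
for d in (0.01, 0.5, 5.0):
    print(d, simulate_loss(num_fdls=4, unit_delay=d, load=0.8))
```

With a very small D the buffer capacity B × D is negligible, and with a very large D the voids dominate, so both ends of the sweep lose more packets than an intermediate value.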
To implement scheduling of variable-length packets easily, we can also use a round-robin scheduling method described in [6] . The time constraint of this method is the same as the above method.
Multi-Stage Buffer and Its Scheduling
At an N × 1 buffer, a maximum of N packets may arrive at the same time. Consequently, a scheduler must decide the direction of a packet within $l_{min}/N$, since every decision requires access to the same variable f. Electronic technology may not be able to satisfy this time constraint. Thus, in this section, we propose a buffer structure in which the maximum number of input packets at a buffer is decreased, which gives the scheduler a time longer than $l_{min}/N$ for the decision.
Figure 3 shows the proposed buffer structure. The buffer forms a tree structure. It consists of multiple blocks, each of which is composed of a scheduler and FDLs. Each block is allocated as a node of the tree. We define the number of stages as the number of blocks through which a packet passes. In the first stage of an N × 1 buffer, we allocate M '
The buffer block is designed such that a signal indicating a packet arrival enters the scheduler before the packet arrives at the buffer, as described in the previous section. The scheduling scheme is the same as the scheme for the single-stage buffer.
Although the speed constraint is alleviated by the multi-stage buffer, a larger number of FDLs is needed to obtain performance similar to that of a one-stage buffer. If each block does not have a sufficient number of FDLs, performance degradation is expected. In fact, we could not find a combination of FDLs in each stage such that performance is improved compared to a single-stage buffer having the same number of FDLs. Thus, the total number of FDLs must be increased to obtain performance similar to that of a single-stage buffer. Since the multi-stage buffer structure requires $M B_1 + B_2$ FDLs, the number is likely to be larger than that of a one-stage buffer. This is because the FDLs in the first stage are shared by N input ports (ratio of sharing being 1/M) and the utilization of the FDLs is smaller than that in one-stage buffers. Therefore, we find the number of FDLs in the first-stage block that is sufficient to avoid degrading performance compared to a single-stage buffer.
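The cost side of this trade-off is simple to tabulate. The sketch below counts the FDLs of a two-stage buffer as $M B_1 + B_2$ and reports how many inputs each scheduler has to serve, assuming that each of the M first-stage blocks receives N/M inputs and the second-stage block receives M inputs. The helper names are hypothetical.

```python
def two_stage_fdl_count(m_blocks: int, b1: int, b2: int) -> int:
    """Total FDLs: M first-stage blocks of B1 lines plus one block of B2."""
    return m_blocks * b1 + b2

def per_block_inputs(n_ports: int, m_blocks: int) -> tuple:
    """Inputs per first-stage block (N/M) and for the second-stage block (M)."""
    if n_ports % m_blocks != 0:
        raise ValueError("N must be divisible by M in this sketch")
    return n_ports // m_blocks, m_blocks

def budget_relaxation(n_ports: int, m_blocks: int) -> float:
    """Factor by which the per-decision time grows over l_min / N."""
    return n_ports / max(per_block_inputs(n_ports, m_blocks))

print(two_stage_fdl_count(4, 20, 200))  # 280 lines for M=4, B1=20, B2=200
print(per_block_inputs(16, 4))          # (4, 4)
print(budget_relaxation(16, 4))         # 4.0
```

For N = 16 and M = 4, the two-stage buffer with $B_1 = 20$ and $B_2 = 200$ uses 280 FDLs instead of 200, in exchange for a fourfold longer decision time per scheduler.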
Moreover, when packets are of variable length and arrive asynchronously, the performance of the buffer depends on the unit lengths of the fiber delay lines, $D_1$ and $D_2$. The optimum values would differ from those for the single-buffer structure. Therefore, we find the optimum values of $D_1$ and $D_2$. We discuss the performance in Section 5 by using an approximation analysis method to calculate the packet loss probability in a two-stage buffer, which is described in the next section.
Analysis
In this section, we establish an approximation analysis method to calculate the packet loss probability in a multi-stage FDL buffer with asynchronous packet arrival. We assume that packets arrive at each first-stage FDL block according to a Poisson process with rate λ/M, distributed identically among its N/M input ports. The total arrival rate to the multi-stage FDL buffer is λ since we have M first-stage blocks. The packet length is exponentially distributed with average 1/µ. We establish this analysis by extending the existing analytical method for a single FDL buffer [2]. The key point of our analysis is the approximation of the arrival process at the second-stage block. We assume that packets also arrive at the second-stage FDL block according to a Poisson process, since the packets come from M first-stage blocks. In this section, we first develop the approximation analysis and then show its accuracy by comparing it with simulated results.
Approximation Analysis
Let $\pi$, $\pi_1$, and $\pi_2$ denote the packet loss probabilities in the two-stage buffer, in the first stage, and in the second stage, respectively. The packet loss probability in the two-stage FDL buffer is calculated by the following equation.
$$\pi = \pi_1 + (1 - \pi_1)\,\pi_2 \qquad (1)$$
Hereafter, we describe a method for determining $\pi_1$ and $\pi_2$. Packet loss probability $\pi_1$ is calculated as the sum of the products of $q_k$, the probability that k packets exist in a first-stage FDL block, and the corresponding conditional loss probability $P_k$. It follows that
$$\pi_1 = \sum_{k=0}^{\infty} q_k P_k \qquad (2)$$
Next, we calculate conditional packet loss probability $P_k$. A state in which k packets exist in the first-stage buffer block is equivalent to a state in which k − 1 packets are completely in the buffer and the last packet is partially or completely in the buffer. We calculate $P_k$ from the condition that an arriving packet is allowed to enter the buffer just when the sum of the lengths of the k − 1 packets is less than $(B_1 - 1)D_1$. Conditional packet loss probability $P_k$ (k > 0) is calculated as follows.
where $1/\mu_1$ is the average excess packet length in the buffer [2]. It follows that
Clearly, $P_0 = 0$. Since we can represent the arrival and departure process of the buffer as a traditional birth and death model, we can derive steady-state probability $q_k$ (k ≥ 1) from the following equation.
State probability $q_0$ can be calculated by substituting the relations of Eq. (5) into the following normalization condition.
$$\sum_{k=0}^{\infty} q_k = 1 \qquad (6)$$
In the above equations, we need the average excess packet length $1/\mu_1$ to calculate steady-state probability $q_k$, and vice versa (see Eqs. (5) and (4)). We therefore perform the corresponding calculations iteratively until the values converge, and thus reach the final values.
We outline the approximation analysis for calculating packet loss probability $\pi_1$ as follows.
1. Initialization. Assume that no packet is in the system (i.e., $q_0 = 1$).
2. Calculate the average excess packet length $1/\mu_1$ (Eq. (4)).
3. Calculate the conditional packet loss probability $P_k$ when k packets are in the buffer (Eq. (3)).
4. Calculate the steady-state probability $q_k$ that k packets are in the buffer (Eq. (5)).
5. If the new value of $\mu_1$ obtained in Step 2 has converged, terminate the iteration and go to the next step. Otherwise, go to Step 2 for the next iteration.
6. Calculate packet loss probability $\pi_1$ (Eq. (2)).
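Steps 2 to 5 amount to a fixed-point iteration on $\mu_1$. The skeleton below shows only that control flow. The `update` callable stands for one pass through Eqs. (4), (3) and (5), which are not reproduced here, so the toy update in the example is a stand-in chosen purely to exercise the loop; the function name, tolerance and iteration cap are assumptions.

```python
from typing import Callable

def iterate_until_converged(update: Callable[[float], float],
                            initial: float,
                            tol: float = 1e-9,
                            max_iter: int = 10_000) -> float:
    """Repeat x <- update(x) until successive values differ by less than tol."""
    x = initial
    for _ in range(max_iter):
        new_x = update(x)
        if abs(new_x - x) < tol:
            return new_x          # Step 5: converged, leave the loop
        x = new_x                 # otherwise repeat from Step 2
    raise RuntimeError("iteration did not converge")

def toy_update(x: float) -> float:
    """Stand-in map with fixed point x = 2; NOT the paper's Eqs. (3)-(5)."""
    return 0.5 * x + 1.0

print(iterate_until_converged(toy_update, initial=0.0))
```

In an actual implementation, `update` would take the current $\mu_1$, recompute $P_k$ and $q_k$, and return the new $\mu_1$; Step 6 is then evaluated once with the converged values.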
In a similar way, we can calculate the packet loss probability in the second-stage FDL block, $\pi_2$. Here, the packet arrival rate is $\lambda(1 - \pi_1)$, identically set among M input ports.
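The way the two stages combine can be illustrated as follows. The sketch assumes that $\pi_2$ is the loss probability seen by packets that survive the first stage, so that a packet is lost either in the first stage or, having passed it, in the second. The numerical values are hypothetical and serve only to show the arithmetic.

```python
def second_stage_arrival_rate(lam: float, pi1: float) -> float:
    """Only packets that survive the first stage reach the second stage."""
    return lam * (1.0 - pi1)

def two_stage_loss(pi1: float, pi2: float) -> float:
    """Lost in stage 1, or passed stage 1 and lost in stage 2."""
    return pi1 + (1.0 - pi1) * pi2

# Hypothetical loss values for illustration only.
pi1, pi2 = 0.01, 0.02
print(second_stage_arrival_rate(0.8, pi1))
print(two_stage_loss(pi1, pi2))
```

When $\pi_1$ is much smaller than $\pi_2$, the total loss is close to $\pi_2$ and the first stage can be neglected.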
Assessment of Accuracy
We assess the accuracy of the approximation analysis method by comparing its results with simulated results. We generate at least $10^9$ packets in the simulation. We use two multi-stage buffers: N = 16, M = 4 and N = 64, M = 8. Figures 4 and 5 compare the approximation analysis with simulation for the case of N = 16, M = 4. The average packet length is set to 1 (1/µ = 1). The numbers of FDLs in the first-stage block and the second-stage block are set to $B_1 = 20$ and $B_2 = 200$, respectively. The vertical axis is the packet loss probability of the two-stage buffer. The horizontal axis of Fig. 4 is $D_2$, the unit length of the optical fiber delay lines in the second-stage block, normalized by the average packet length. In Fig. 5, the horizontal axis is $D_1$, the unit length in the first-stage block. In these figures, lines show analytical results while dots show simulation results.
From these two figures, we observe that the analytical results are in good agreement with the simulation results. The packet loss probabilities obtained from the approximation analysis are slightly higher than those obtained by simulation. We also find that the smaller the arrival rate, the more accurate the approximation (see Fig. 4). This is because more packets are neither discarded nor delayed at the first stage. Accordingly, the actual packet arrival process is close to a Poisson process and the actual packet length distribution is close to an exponential one, which are the assumptions of our approximation.
Our approximation results are in good agreement with the simulation results even for larger M. We can observe this in Fig. 6, which compares approximation and simulation for a two-stage buffer with N = 64 and M = 8.
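A simulation of this kind can be sketched compactly because the tree is feed-forward: each first-stage block is processed on its own, and the packets it passes on are merged by time and fed to the second-stage block. The sketch below is not the simulator used for the figures. It assumes Poisson arrivals at rate λ/M per first-stage block, exponential packet lengths with mean 1, zero propagation time between stages, and far fewer packets than the $10^9$ used in the paper; all names are hypothetical.

```python
import math
import random

def run_block(arrivals, num_fdls, unit_delay):
    """Pass time-sorted (time, length) packets through one FDL block.

    Returns the accepted packets as (departure_start_time, length).
    """
    f = 0.0   # time at which this block's output becomes idle
    out = []
    for t, length in arrivals:
        delta = math.ceil(max(f - t, 0.0) / unit_delay)
        if delta < num_fdls:
            f = t + delta * unit_delay + length
            out.append((t + delta * unit_delay, length))
    return out

def simulate_two_stage(m_blocks, b1, d1, b2, d2, load,
                       packets_per_block, seed=1):
    """Estimate the overall loss probability of a two-stage FDL buffer."""
    rng = random.Random(seed)
    offered = 0
    merged = []
    for _ in range(m_blocks):
        t = 0.0
        arrivals = []
        for _ in range(packets_per_block):
            t += rng.expovariate(load / m_blocks)   # rate lambda / M
            arrivals.append((t, rng.expovariate(1.0)))
        offered += len(arrivals)
        merged.extend(run_block(arrivals, b1, d1))  # first stage
    merged.sort()                                   # order by arrival time
    delivered = len(run_block(merged, b2, d2))      # second stage
    return 1.0 - delivered / offered

print(simulate_two_stage(4, 20, 0.5, 200, 0.3, 0.8, packets_per_block=10_000))
```

Because loss probabilities around $10^{-3}$ correspond to only a few dozen lost packets in a run of this size, the estimate is noisy; many more packets are needed to resolve the curves shown in the figures.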
Performance of Multi-Stage Buffer
We investigate the performance of a multi-stage buffer with respect to packet loss probability. In doing so, we show the performance of a two-stage buffer by focusing on an N × 1 optical FDL buffer, that is, an output port of an N × N packet switch. Although we refer again to the figures in the previous section, we focus only on the analytical results.
Optimum Unit Length of Fiber Delay Lines
In this subsection, we first show the performance of the multi-stage FDL buffer. We then derive the optimum unit length of the fiber delay lines in the buffer and find a sufficient number of FDLs in a first-stage block.
Figure 4 shows the packet loss probability in a two-stage buffer for the case of N = 16 and M = 4. We set the numbers of fiber delay lines in the first and second stages to $B_1 = 20$ and $B_2 = 200$, respectively. The unit length in the first stage is $D_1 = 0.5$. At λ = 0.8, the value of $D_2$ that minimizes the packet loss probability is about $D_2 = 0.3$. At λ = 0.4, the packet loss probability decreases exponentially as $D_2$ increases, and then stops decreasing for $D_2 > 0.1$. Accordingly, we obtain the minimum packet loss probability by setting the unit length of delay line $D_2$ to 0.3.
Figure 6 shows the packet loss probability for the case of N = 64 and M = 8. The optimum value of $D_2$ is almost the same as that for the case of N = 16 and M = 4. On the other hand, the optimum value of $D_1$ is slightly smaller than in that case. Figure 7 shows the result for the case of λ = 0.8. We find that the optimum value is about $D_1 = 0.4$.
We find that the packet loss probability in the second stage with a smaller number of FDLs in the first stage, $B_1$, is smaller than that with a larger $B_1$. This results from packet losses at the first stage. For larger $B_1$, since packet loss decreases at the first stage, the packet loss probability of the second stage converges.
For $D_1 = 0.5$, the packet loss probability in the first stage is larger than that in the second stage when $B_1 < 15$. Namely, for $B_1 < 15$, the performance of the first-stage buffer mainly determines the performance of the entire buffer. When there are $B_1 = 20$ or more FDLs at the first stage, the performance of the entire buffer converges. To obtain the optimum performance for $B_2 = 200$, $B_1 = 20$ is sufficient; in that case we can ignore the effect of the first-stage performance because the packet loss probability at the first stage is much smaller than that at the second stage.
For $D_1 = 0.3$, the packet loss probability in the first stage is larger than that in the second stage when $B_1 < 24$. A larger number of FDLs is required than in the case of $D_1 = 0.5$, although the performance is almost the same as in that case. We can therefore also conclude that $D_1 = 0.5$ is the optimum value for the first stage from the viewpoint of the number of FDLs.
We focus on the packet loss probability for the optimum set of parameters $B_1 = 20$, $B_2 = 200$, $D_1 = 0.5$, and $D_2 = 0.3$ in Fig. 8. The probability is almost the same as $1.22 \times 10^{-3}$, which is the packet loss probability of a single-stage buffer with N = 16, B = 200, and D = 0.3 at λ = 0.8. As we mentioned, the number of FDLs in the two-stage buffer is larger than that of the single-stage buffer. In this case, the two-stage buffer has 280 FDLs and the one-stage buffer has 200. However, we can confirm that, by using a larger number of FDLs, a two-stage buffer that compensates for slow electronic processing can provide performance similar to that of a single-stage buffer requiring high-speed processing.
We next derive the combination of $B_1$ and $B_2$ that optimizes packet loss performance for a given number of FDLs. Table 2 shows the packet loss probabilities in a two-stage buffer as a function of the number of FDLs in each stage. We select three combinations. The total number of FDLs is set to 280. From the table, we find a trade-off between $B_1$ and $B_2$. Namely, increasing the number of FDLs in one stage for performance improvement results in performance degradation of the buffer in the other stage. We conclude that the combination of $B_1 = 20$ and $B_2 = 200$, which has been used in our performance evaluation, is the best.
Conclusion
We focused on contention resolution using an optical FDL buffer in a photonic packet switch. A scheduler for the contention resolution is operated in the electrical domain even though the data path of the buffer is optical. The scheduler may not be fast enough. To bridge the gap between high-speed optical transmission and slow electronic processing, we have proposed a multi-stage buffer that forms a tree structure in which each node has a block of FDLs and a scheduler. Since the number of ports of a block is smaller than the number of ports of the packet switch, the same processing speed as that of a single-stage buffer is not required. Our multi-stage buffer supports asynchronously arriving variable-length packets, which is a desirable feature for an infrastructure of the Internet. We established an approximation analysis method to evaluate the packet loss probability in the two-stage buffer structure. We found the optimum unit lengths of the FDLs in the first and second stages that minimize the packet loss probability for a fixed number of FDLs.
In this paper, we have investigated a two-stage buffer. As the number of ports increases, a multi-stage buffer of more than two stages may be needed. The performance of a large packet switch using a multi-stage buffer will be evaluated in a future study.
