Abstract Shared-Memory Optical Packet (SMOP) 
Introduction
The explosive growth of Internet traffic requires fast, high-capacity switching networks. All-optical switching, in which data transmission stays in the optical domain while switching control is performed in the electrical domain, can meet this demand because it eliminates the expensive optical-to-electronic-to-optical conversion and makes good use of the enormous bandwidth of optical networks [1, 2, 8, 11]. Time-sliced (synchronous) optical switching without wavelength conversion is a simple and cost-effective technology for implementing all-optical packet switching [3, 4, 5, 6, 9]: contending packets are temporarily buffered and forwarded in a later time slot. In optical networks, fiber delay lines (FDLs) are usually used to delay packets, since optical RAM is not yet available. Unlike traditional electronic memories, in which packets can be read and written at random, a packet entering a fiber delay line must emerge a fixed time later and cannot be removed before then. Implementing large buffers therefore requires a large number of fiber delay lines and thus a high hardware cost. To reduce the amount of memory required in a packet switch while guaranteeing a desired level of throughput or packet loss rate, numerous shared-memory optical packet switches have been proposed in the literature; see, for example, [5, 6, 10, 11] and the references therein.
A typical Shared-Memory Optical Packet (SMOP) switch architecture is illustrated in Fig. 1. It was first proposed by Karol in [5] to support fixed-length optical packet switching (ATM switching). It consists of N input ports, N output ports and Z feedback FDLs (of appropriately selected lengths) that are shared among all the input ports. The outputs (inputs) of the FDLs and the inputs (outputs) of the switch are collectively called the inlets (outlets) of the switch fabric. The (N + Z) × (N + Z) space switch can switch a packet either directly to an output port or to one of the delay lines, according to how much delay the packet needs.
For contention resolution, a scheduling algorithm is usually required to direct packets through the switch. In [5], two scheduling algorithms (Non-FIFO and FIFO) for the SMOP switch were proposed. In the Non-FIFO algorithm, a packet that loses the contention is sent to the shortest delay line that has the fewest packets destined for the same output, and is buffered for another round of contention. If no free FDL is available, newly arrived packets are dropped. In the FIFO algorithm, a FIFO list of all packets in each flow is maintained to indicate the position of buffered packets in the FDLs. In addition, a priority is assigned to each arriving packet based on its "anticipated transmit time". Based on the FIFO list, this algorithm ensures that a packet can be scheduled only when all other packets with higher priority have left the switch. Since neither algorithm can guarantee the departure time of packets that lose the contention and have to be buffered, the number of recirculations a packet undergoes cannot be predicted in advance. The simulation results in [5] indicate that the maximum number of recirculations in Karol's algorithm can be as high as 10, which is undesirable since the optical signal is significantly attenuated by that many recirculations. To alleviate this recirculation problem, S. Y. Liew et al. proposed three reservation-based scheduling algorithms for the single-stage shared-FDL switch in [6]. With the reservation scheme, the control algorithm performs not only the output port matching for the current time slot but also the FDL assignment for the entire journey of a delayed packet, so that the packet can be matched with the desired output port in a future time slot. The number of packet recirculations is bounded by the maximum number of FDL delay operations. If no delay path to the right destination is available, the packet is dropped without occupying any resources.
A significant problem with the reservation-based algorithms, however, is that packets may be delivered out of sequence (explained in Sect. 2). In the electronic domain, the mis-sequence problem can be easily solved by adding a resequencing buffer at the output ports. Unfortunately, designing a resequencing buffer from switches and FDLs is difficult and costly [7]. In this paper, we focus on developing a framework that prevents packet mis-sequence while maintaining a packet loss rate and delay performance similar to those of the original reservation-based SMOP switches.
The rest of the paper is organized as follows. Sect. 2 reviews the reservation scheduling algorithms and identifies the sources of the packet mis-sequence problem. Sect. 3 first defines the "last-timestamp" variables that prevent packet mis-sequence and then introduces our framework, which preserves the packet loss rate and delay performance by modifying the arrangement of FDL lengths as well as the scheduling process of the current reservation-based algorithms. Sect. 4 provides the simulation results. Finally, Sect. 5 concludes the paper.
The Reservation Scheme and Packet Mis-sequence Problem
In this section, we first take the multi-packet FDL assignment (MUFA) algorithm in [6] as an example to introduce the reservation-based scheme, because this algorithm is simple and can process multiple packets simultaneously. We then show that the packet mis-sequence problem has two causes, namely the "restriction of FDLs" and the "restriction of algorithm". The notations employed in this paper are listed in Table 1.
Reservation Scheme
In the MUFA algorithm, FDLs are allowed to have the same delay values where the delay values are distributed Fig. 3, if FDL a is idle at time slot t, then there is an arc from T(t) to T(t + D_a) in the transition diagram. At each time slot, the MUFA algorithm uses a breadth-first-search-based algorithm to find routes in G for the received requests. The rule is that a parent node makes decisions for its accessible child nodes, i.e., the node T_{k-1}(τ) makes decisions for nodes T_k(t) when the route from
If there is a request whose destination port is idle at time slot t, the route from T_0(0) to T_k(t) will be marked in T for this request. For the example considered in Fig. 2 (where the symbol X means that the channel is not idle), we can see the scheduling result for packet (i, j).m, which arrived at time slot 0. The delay path of this packet is as follows. First, the packet is buffered in FDL b, which has length 2 T_cell. Then, at time slot 2, the input port of FDL a is scheduled to connect to the output port of FDL b. Finally, at time slot 3, output port j can "read" the packet from FDL a. This process can also be logically represented by the slot transition diagram G in Fig. 3 as 
Packet Mis-sequence Problem
The above reservation algorithm does not take the packet mis-sequence problem into account: the departure order of packets may be reversed because of contention for the finite FDLs. Take the configuration table in Fig. 4 as an example, where N = Z = 2 and the lengths of the FDLs are 1 and 2 T_cell, respectively. Assume that a packet (1, 1).m reaches input port 1 at time slot t, but both output port 1 and the FDL with length 1 T_cell are unavailable (because they were reserved by packets that arrived in previous or current time slots). In this case, the only choice is the idle FDL with the larger delay value 2 T_cell. Hence, after the arbitration, a gap [output 1, time slot t + 1] appears in the configuration table. If a packet (1, 1).m + 1 happens to arrive at time slot t + 1 and fills the gap, packet (1, 1).m + 1 leaves before packet (1, 1).m and flow (1, 1) is mis-sequenced. Because this kind of mis-sequence is due to the limitation of the FDLs, we use the term "restriction of FDLs" to refer to this cause.
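The scenario above can be reproduced with a small sketch. It is not the MUFA algorithm itself: the hypothetical `schedule` helper below handles one packet at a time and allows at most one delay operation, and the two reservation sets are an assumed, simplified stand-in for the configuration table. It only shows how the gap at [output 1, time slot t + 1] lets the later packet overtake the earlier one.

```python
def schedule(arrival, output, fdl_delays, busy_outputs, busy_fdls):
    """Reserve the earliest departure using at most one delay operation.

    busy_outputs holds (output, slot) pairs that are already reserved.
    busy_fdls holds (fdl_index, entry_slot) pairs that are already reserved.
    Returns the departure slot, or None if the packet is dropped.
    (Hypothetical helper for illustration only.)
    """
    # Try to leave directly in the arrival slot.
    if (output, arrival) not in busy_outputs:
        busy_outputs.add((output, arrival))
        return arrival
    # Otherwise try the FDLs from the shortest delay to the longest.
    for fdl, delay in sorted(enumerate(fdl_delays), key=lambda p: p[1]):
        depart = arrival + delay
        if (fdl, arrival) not in busy_fdls and (output, depart) not in busy_outputs:
            busy_fdls.add((fdl, arrival))
            busy_outputs.add((output, depart))
            return depart
    return None


def mis_sequence_demo():
    fdl_delays = [1, 2]          # FDL lengths in units of T_cell (N = Z = 2)
    t = 0
    busy_outputs = {(1, t)}      # output 1 is already reserved at slot t
    busy_fdls = {(0, t)}         # the length-1 FDL is already reserved at slot t
    first = schedule(t, 1, fdl_delays, busy_outputs, busy_fdls)       # packet (1, 1).m
    second = schedule(t + 1, 1, fdl_delays, busy_outputs, busy_fdls)  # packet (1, 1).m + 1
    return first, second
```

With t = 0, packet (1, 1).m is pushed to the length-2 FDL and departs at slot 2, while packet (1, 1).m + 1 fills the gap and departs at slot 1.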
Another cause of the packet mis-sequence problem is the "restriction of algorithm". In the MUFA algorithm, the parent node makes decisions for its child nodes. For two routes with the same number of delay operations, it is possible that the route with the larger delay is selected rather than the one with the smaller delay, because the parent node of the former route has a smaller index [6]. As shown in Fig. 3 , the delay route (2) . In this case, a gap may also appear in the configuration table T.
Maintaining Packet Order in SMOP Switch
The analysis in Sect. 2 indicates that the mis-sequence problem may occur in an SMOP switch because of gaps in the configuration table T. It is easy to see that a large gap may increase the probability of packet mis-sequence and degrade the packet loss performance. To avoid packet mis-sequence in the reservation-based scheme, we define a "last-timestamp" variable T_last(i, j) that records the furthest output time slot reserved by the most recently arrived packet of flow (i, j). When making decisions for unfulfilled requests at node T_k(t) of G, we only need to consider those requests that satisfy T_last(i, j) < t and whose output j is idle at time slot t. We call these requests "entitled requests" in the rest of this paper. The departure time slots assigned to the entitled requests of each flow therefore never violate the FIFO constraint. However, this method may worsen the packet loss rate and delay, because the gap in T may be considerably extended. In the following, we describe our approaches to reducing the time slot gap in the configuration table.
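As a minimal sketch of the entitlement test, the following code keeps T_last(i, j) in a dictionary keyed by flow. The function names, the set-based record of reserved output slots, and the default value of -1 for a flow that has not yet reserved anything are all assumptions made for illustration.

```python
def is_entitled(t_last, flow, slot, busy_outputs):
    """Check the condition T_last(i, j) < t with output j idle at slot t.

    t_last maps a flow (i, j) to its last reserved departure slot; a flow
    with no reservation yet is treated as -1 (an assumption of this sketch).
    busy_outputs holds (output, slot) pairs that are already reserved.
    """
    _, j = flow
    return t_last.get(flow, -1) < slot and (j, slot) not in busy_outputs


def reserve(t_last, flow, slot, busy_outputs):
    """Reserve the output slot for an entitled request and update T_last."""
    if not is_entitled(t_last, flow, slot, busy_outputs):
        return False
    busy_outputs.add((flow[1], slot))
    t_last[flow] = slot
    return True
```

For example, once flow (1, 1) has reserved slot 2, a later packet of the same flow is not entitled to the idle slot 1 but is entitled to slot 3, whereas a packet of a different flow destined for the same output may still use slot 1.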
Alleviating the "Restriction of FDLs"
Notice that under the exponential distribution of FDL lengths, the MUFA algorithm introduces a large gap in T and adds delay operations in the switch. Consider the case given in [6] , where the delay values are distributed among
.m arrived at time slot t and T_last(i, j) = t + 8, but the fiber delay lines with length 8 T_cell are not idle, then the next choice is the node T_1(16) in level 1, which means the packet (i, j).m has to be delayed by 16 time slots via the delay line with length 16 T_cell. This increases the probability of packet mis-sequence. In addition, the only way to obtain a path with an odd delay is to combine the delay line of length 1 T_cell with other FDLs, so the number of delay operations may increase. Therefore, we return to the linear distribution of FDL lengths, D, 2D, 3D, 4D, .... With the linear distribution, a large gap in T is less likely to emerge. In Sect. 4, we compare the packet loss rate for different values of D under the MUFA algorithm with packet order maintained. The results show that the packet loss rate is lowest when D = 1, so we select the linear distribution of FDL lengths 1, 2, 3, 4, ... for the SMOP switches. On the other hand, one may expect the complexity of the algorithm to increase, because the slot transition diagram becomes larger than under the exponential distribution of FDL lengths. To overcome this problem, and based on the observation from our simulation (Fig. 6 in Sect. 4) that almost all packets need at most 2 delay operations, we limit the maximum number of packet recirculations to 2 (i.e., K = 2) in our algorithm.
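The effect of the length distribution on reachable delays can be checked with a short enumeration. This is only a sketch: the exponential set {1, 2, 4, 8, 16} and the linear set 1..16 are illustrative assumptions (the text above mentions lengths of 1, 8 and 16 T_cell), the helper name `reachable_delays` is hypothetical, and the sketch ignores how many FDLs share each length and whether they are idle.

```python
from itertools import product


def reachable_delays(fdl_delays, max_ops):
    """Map each reachable total delay to the minimum number of delay operations.

    Only routes with at most max_ops delay operations are considered.
    """
    lengths = sorted(set(fdl_delays))
    best = {0: 0}  # zero delay needs no delay operation
    for ops in range(1, max_ops + 1):
        for combo in product(lengths, repeat=ops):
            # ops increases, so the first entry stored is the minimum.
            best.setdefault(sum(combo), ops)
    return best


exponential = reachable_delays([1, 2, 4, 8, 16], max_ops=2)
linear = reachable_delays(range(1, 17), max_ops=2)

# Delays in 1..16 that cannot be reached at all within two operations.
missing_exponential = [d for d in range(1, 17) if d not in exponential]
```

Under these assumptions, every odd delay greater than 1 that is reachable with the exponential set needs two operations (one of them through the length-1 line), and some delays such as 7 are not reachable within K = 2 at all, whereas the linear set reaches every delay from 1 to 16 with a single operation.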
Eliminate the "Restriction of Scheduling Algorithm"
First of all, we give the principles used in packet scheduling: (1) use the minimum number of delay operations; (2) select the delay paths in ascending order in each level of G, so as to decrease the packet delay and the gap in T. The first reason for minimizing delay operations is that optical signals are attenuated each time they are switched [11]. The second reason is that multiple delay operations reserve FDL resources in future time slots, which increases the probability of contention in those slots and raises the packet loss rate. For example, suppose two packets arrive at time slot t but all delay lines with the required large delay have been reserved by packets that arrived in earlier time slots; the scheduling algorithm then tries to find a delay route of the same delay value through multiple recirculations. If only one such delay route is available, the second packet has to be dropped.
Based on the principles above and the linear distribution of FDL lengths, we propose the Sequence MUFA (SMUFA) algorithm. In SMUFA, each node in G is required to keep a list of all the effective routes from T_0(0), and the nodes in each level are read twice. When accessing a node T_k(t), the algorithm first checks whether the routes from T_0(0) to T_k(t) are available, then matches the unfulfilled requests according to T_last(i, j) and the output port information at time slot t. After the matching process has been completed for all nodes in level k, the unmatched nodes copy their effective routes to their child nodes if the FDLs are available. The search may terminate early if all requests have found their routing paths.
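The two scheduling principles and the last-timestamp check can be combined in a simplified, single-request sketch. It is not the SMUFA algorithm as specified above: it serves one request at a time rather than matching all requests per node, it does not maintain the sets of Table 2, and the names `find_route` and `commit` as well as the set-based reservation records are assumptions introduced for illustration. It does follow the level-by-level order (fewest delay operations first) and visits the nodes of each level in ascending delay.

```python
def find_route(arrival, flow, fdl_delays, busy_fdls, busy_outputs, t_last, max_ops=2):
    """Search level by level for an order-preserving route.

    Returns (departure_slot, route) or None if the packet must be dropped.
    A route is a tuple of (fdl_index, entry_slot) hops.
    """
    out = flow[1]

    def entitled(slot):
        # T_last(i, j) < t and output j idle at slot t (default -1 is assumed).
        return t_last.get(flow, -1) < slot and (out, slot) not in busy_outputs

    level = [(arrival, ())]  # level 0: no delay operation yet
    for ops in range(max_ops + 1):
        # Principle 2: visit the nodes of this level in ascending delay.
        for slot, route in sorted(level):
            if entitled(slot):
                return slot, route
        if ops == max_ops:
            break
        # Principle 1: add one more delay operation only if this level failed.
        level = [
            (slot + delay, route + ((fdl, slot),))
            for slot, route in level
            for fdl, delay in enumerate(fdl_delays)
            if (fdl, slot) not in busy_fdls
        ]
    return None


def commit(flow, slot, route, busy_fdls, busy_outputs, t_last):
    """Record the reservations of a chosen route and update T_last."""
    busy_fdls.update(route)
    busy_outputs.add((flow[1], slot))
    t_last[flow] = slot
```

Applied to the N = Z = 2 example of Sect. 2 with t = 0, the first packet of flow (1, 1) still departs at slot 2, but the second packet is no longer entitled to the gap at slot 1 and is assigned slot 3, so the flow stays in order.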
The notations employed for explaining the procedure of SMUFA algorithm are shown in Table 2 .
Table 2. Notations used for the SMUFA algorithm

Variables | Comments
SET_d | "The set of destination nodes" contains those nodes T_k(t) from which a packet can be sent to the output port at time slot t. Initially, all the nodes belong to "the set of destination nodes".
SET_t | "The set of transfer nodes" contains those nodes that failed in the output port matching and can only be used as "parent nodes" to help find longer delays, such as nodes T_k(t) that have the same delay value but were unmatched at higher levels.
Performance Evaluation
We assume a uniform Bernoulli traffic arrival process and N = Z = 32 in our simulation model, the same as in [6]. Fig. 5 compares the packet loss rate for linear distributions of FDL lengths that are consecutive multiples of different values of D, with the first-come first-served scheduling policy adopted in the MUFA algorithm. The plots show that the packet loss rate is lowest when D = 1 and increases as D increases. The curve of the linear FDL distribution with D = 2 is similar to that of the exponential FDL distribution. Thus, to achieve the best packet loss rate, we adopt the linear distribution of FDL lengths with D = 1 in SMOP switches.
Fig. 6 compares the number of delay operations under our SMUFA algorithm with the linear and the exponential distributions of FDL lengths. We can see that the number of delay operations almost always stays within 2 for different F values under both distributions. Furthermore, the delay operations under the linear distribution are more concentrated on a single circulation than under the exponential distribution. It is worth noting that, when F > 128, the distributions of the assigned number of delay operations are almost the same and no packet is assigned a long delay route with more delay operations. The reason is that the breadth-first-search-based algorithm searches the nodes from left to right in each level of the slot transition diagram G. Thus, to achieve a larger delay in order to avoid contention or to guarantee the FIFO order of packets, the FDL routes always combine short delay lines with long delay lines, which lowers the probability of obtaining a large delay from a combination of long delay lines in future time slots.
Fig. 7 shows how the packet loss rate changes with each step of our solution. If we only introduce the "last-timestamp" variable to avoid packet mis-sequence, but keep the exponential assignment of FDL lengths and use the MUFA algorithm to resolve contention, the performance deteriorates heavily. When the exponential distribution of FDL lengths is then replaced with the linear distribution, the packet loss rate improves by almost a factor of 10. Further, the SMUFA algorithm, which guarantees that the nodes in each level are read in order, achieves a packet loss rate of nearly 10^-7 at a load of 0.88.
Fig. 8 compares the average packet delay. Both the MUFA and the SMUFA algorithms achieve a low packet delay, but because of the FIFO property and the different distribution of FDL lengths, the SMUFA algorithm has a slightly higher delay than the MUFA algorithm under light traffic. Overall, the SMUFA algorithm achieves performance comparable to that of the MUFA algorithm while keeping the packets of a flow in sequence.
In Fig. 9, the packet loss rate of the SMUFA algorithm is presented for various switch sizes N. We notice that the probability of discarding a packet drops quickly as the switch size increases. With 64 input/output ports, the packet loss rate reaches 10^-7 at a traffic load of 93%.
Conclusions
In this paper, we identified the two causes that may create and aggravate the packet mis-sequence problem in the reservation-based algorithms of the SMOP switch: the restriction of FDLs and the restriction of algorithm. Based on this analysis, we first defined the "last-timestamp" variable to keep packets in order, then modified the distribution of FDL lengths and proposed an improved algorithm, SMUFA, to preserve the packet loss and delay performance. Simulation shows that our approach achieves a packet loss rate and delay performance comparable to those of the original reservation-based algorithm while keeping the packets of a flow in sequence.
