Abstract
I. INTRODUCTION
Optical packet switching (OPS) has been widely recognized as one of the most promising technologies for the future optical Internet. A central scheduler has to synchronize all the ports as well as the crossbar, so that they can switch at the same time. Due to the latent variations in signal arrival times, the clock and its phase have to be aligned, and extra clock margins have to be considered in order to avoid data loss. All the above operations need time to be completed before switching, collectively resulting in a large switch reconfiguration overhead.
Consequently, four penalties appear in OPS: 1) incoming packets have to be buffered in either the optical or the electrical domain while waiting for the switch reconfiguration; 2) traffic is inevitably delayed; 3) a scheduling algorithm is necessary; and 4) an internal speedup is required in order to achieve performance guaranteed switching (i.e., 100% throughput with bounded packet delay). Thus, how to reduce the internal speedup and the packet delay is the main concern.
The remaining part of this paper is organized as follows. In Section II, we review the existing OPS switch architecture [4-7]. In Section III, we briefly summarize the two existing scheduling algorithms, DOUBLE and ADAPTIVE. Then Sections IV and V optimize these two algorithms and their underlying OPS switch architecture, respectively. Finally, the paper is concluded in Section VI.
II. EXISTING OPS SWITCH ARCHITECTURE
The existing OPS switch architecture consists of the OPS switch fabric (Fig. 1) and the corresponding scheduling procedure/pipeline (Fig. 2). [...] accumulated and a TSA (time slot assignment) method is applied to determine a set of $N_S$ configurations to deliver the collected packets. The scheduling procedure/pipeline in Fig. 2 [...]
These $N_S-N$ colors can be mapped back to form $N_S-N$ corresponding switch configurations $P_n$, $n \in \{1, \ldots, N_S-N\}$, to cover $A$. The fine matrix $R$ does not need to be explicitly computed because all its elements are guaranteed to be less than $T/(N_S-N)$. Therefore, any $N$ configurations (from $P_{N_S-N+1}$ to $P_{N_S}$) that collectively represent every entry of $C(T)$, each weighted by $T/(N_S-N)$, can be used to cover the fine matrix $R$.
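The coarse/fine split described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the decomposition takes the form $C(T) = w \cdot A + R$ with $w = T/(N_S-N)$ and $A = \lfloor C(T)/w \rfloor$, that $T$ is a multiple of $N_S-N$, and it uses cyclic-shift permutations as one possible choice of the $N$ predetermined configurations. The function names and the 4x4 traffic matrix are hypothetical.

```python
def decompose(C, T, N_S):
    """Split C into a coarse matrix A and a fine (residue) matrix R.

    Assumption: C = w * A + R with w = T / (N_S - N) and A = floor(C / w).
    """
    N = len(C)
    assert T % (N_S - N) == 0, "sketch assumes T is a multiple of N_S - N"
    w = T // (N_S - N)
    A = [[c // w for c in row] for row in C]
    R = [[c % w for c in row] for row in C]
    return w, A, R


def cyclic_matchings(N):
    """N perfect matchings; matching k connects input i to output (i + k) % N."""
    return [[(i + k) % N for i in range(N)] for k in range(N)]


def line_sums(M):
    """Row sums followed by column sums of a square matrix."""
    return [sum(r) for r in M] + [sum(c) for c in zip(*M)]


# Hypothetical 4x4 traffic matrix whose maximum line sum is T = 16.
N, T, N_S = 4, 16, 8
C = [
    [4, 4, 4, 4],
    [8, 3, 2, 1],
    [0, 5, 6, 3],
    [4, 4, 3, 5],
]

w, A, R = decompose(C, T, N_S)

# Give every predetermined matching the weight w and accumulate the coverage.
cover = [[0] * N for _ in range(N)]
for perm in cyclic_matchings(N):
    for i, j in enumerate(perm):
        cover[i][j] += w

# Every residue is below w, so the N weighted matchings cover R entry by entry.
fine_covered = all(cover[i][j] >= R[i][j] for i in range(N) for j in range(N))
```

Because every entry of $R$ is strictly less than $w$, the check never needs the actual values of $R$, which is why the fine matrix does not have to be computed explicitly.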
Consequently, $C(T)$ can be covered by $(N_S-N)+N=N_S$ switch configurations, each equally weighted by $\phi_n=T/(N_S-N)$. It means that the algorithm uses $N_S \times \phi_n = N_S T/(N_S-N)$ compressed slots to transmit (at most) $T$ packets for each input port. As a result, $S_{schedule}$ in (1) is given by
$$S_{schedule} = \frac{N_S T/(N_S-N)}{T} = \frac{N_S}{N_S-N}. \qquad (3)$$
DOUBLE [4] is a special case of the above discussion. It uses $N_S=2N$ configurations to cover $C(T)$. Two cases arise for the coarse matrix $A$: 1) all of its line sums are at most $N-1$; 2) some of its lines sum to $N$.
In case 1), all of the line sums of $A$ are at most $N-1$. Thus, $A$ can be covered by $N-1$ configurations according to Lemma 1. In case 2), some of the line sums of $A$ equal $N$. As a result, the corresponding lines of $C(T)$ do not have residues in the fine matrix $R$, because the maximum line sum of $C(T)$ is $T$ (refer to formula (2) and note that $N_S=2N$ for DOUBLE). So, in the fine matrix $R$, the corresponding rows or columns should be all zeros.
Fig. 3 gives an example, in which the first row and the first and second columns of $R$ are all zeros. This is because the corresponding lines in $A$ sum to $N=4$. Since the coarse matrix $A$ is multiplied by the integer $T/N=4$ in the decomposition of $C(T)$ and the maximum line sum of $C(T)$ is $T=16$, there are no residues in these corresponding lines of $R$. However, in the fine part schedule of DOUBLE, $N$ configurations (that collectively represent every entry of an $N \times N$ matrix) are used to cover the fine matrix $R$. For the above example, an all-1 matrix (equal to the sum of the $N$ perfect matchings $P_5$ to $P_8$) weighted by $T/N=4$ is used to cover $R$. Obviously, for those lines of $A$ whose line sums equal $N$, slots are wasted by DOUBLE's fine part schedule. In fact, we can make use of these wasted slots to reduce $N_S$. [...]
Consequently, $S_{schedule}=(2N-1) \times (T/N)/T=2-1/N$. According to (1), the overall speedup $S$ is reduced accordingly.
Lemma 1: The computational complexity of the algorithm is reduced from $O(N^2 \log N)$ to $O(N(N-1) \log N)$.
Proof: Generally, the edge-coloring algorithm has a time complexity of $O(E \log V)$ [8], where $E$ is the number of edges and $V$ is the number of vertices in the bipartite multigraph. For DOUBLE, $E=O(N^2)$ and $V=O(N)$. For the new method, $E=O(N(N-1))$ and $V=O(N)$. Consequently, the computational complexity of the algorithm is reduced from $O(N^2 \log N)$ to $O(N(N-1) \log N)$.
Theorem 1: The optimized DOUBLE algorithm can cover $C(T)$ using $N_S=2N-1$ configurations, each weighted by $T/N$, with $S_{schedule}=2-1/N$. The computational complexity of the optimized algorithm is also reduced to $O(N(N-1) \log N)$.
Usually, reducing $N_S$ means that the packet delay $2T+H$ can also be reduced. At the same time, making the algorithm execution simpler is helpful for achieving a smaller $H$.
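The figures quoted for DOUBLE and its optimized variant can be checked with a short sketch. It assumes that the schedule speedup is the number of compressed slots (configurations times weight) divided by $T$, and that DOUBLE's decomposition is $C(T) = (T/N) \cdot A + R$ with $A = \lfloor C(T)/(T/N) \rfloor$. The helper names and the traffic matrix are hypothetical and are not taken from Fig. 3.

```python
from fractions import Fraction


def schedule_speedup(num_configs, weight, T):
    """Compressed slots (num_configs * weight) divided by the T packets per port."""
    return Fraction(num_configs * weight, T)


def full_lines_have_no_residue(C, T):
    """Check that each row/column of A summing to N is an all-zero line of R.

    Assumption: DOUBLE's split C = (T/N) * A + R with A = floor(C / (T/N)).
    """
    N = len(C)
    w = T // N
    A = [[c // w for c in row] for row in C]
    R = [[c % w for c in row] for row in C]
    A_cols, R_cols = list(zip(*A)), list(zip(*R))
    rows_ok = all(not any(R[i]) for i in range(N) if sum(A[i]) == N)
    cols_ok = all(not any(R_cols[j]) for j in range(N) if sum(A_cols[j]) == N)
    return rows_ok and cols_ok


N, T = 4, 16

# DOUBLE: 2N configurations, each weighted by T/N.
double_speedup = schedule_speedup(2 * N, T // N, T)

# Optimized DOUBLE: 2N - 1 configurations, each weighted by T/N.
optimized_speedup = schedule_speedup(2 * N - 1, T // N, T)

# Hypothetical traffic matrix with maximum line sum T; its first row and
# first column produce lines of A that sum to N.
C = [
    [4, 4, 4, 4],
    [8, 3, 2, 1],
    [0, 5, 6, 3],
    [4, 4, 3, 5],
]
no_residue = full_lines_have_no_residue(C, T)
```

The residue check reflects the argument in the text: a line of $C(T)$ sums to at most $T$, so once the matching line of $A$ already accounts for $N \times T/N = T$, nothing is left for $R$ on that line.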
B. Optimization on ADAPTIVE
Following the same argument as above, ADAPTIVE [5] can also be optimized. The detailed derivations are omitted and the result is given below. [...]
From our previous discussion in Section III, we know that the $N$ configurations (perfect matchings) used to cover the fine matrix $R$ can be chosen freely as long as they cover every entry of an $N \times N$ matrix. This is true for both DOUBLE and ADAPTIVE. In fact, these $N$ perfect matchings do not need to be explicitly computed; that is, they can be predetermined offline. This further implies that, in Fig. 2, the OPS switch fabric does not need to wait the whole $H$ time slots for algorithm execution before it enters Stage 3. If we arrange these $N$ predetermined perfect matchings to be configured at the beginning of Stage 3, then switching (Stage 3) and algorithm execution (Stage 2) can work in parallel, and thus time can be saved. (Note that Fig. 2 only shows Stages 2 and 3 working in series.) Consequently, for DOUBLE [4] and ADAPTIVE [5], these two stages can partially overlap to reduce the total packet delay. The key point is that $N$ out of the [...] Fig. 2 [...] From (7) and (8), we get
$$D = \left(1 + \frac{1}{S_{schedule}}\right) T + H. \qquad (9)$$
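As a rough numerical check of the delay saving, the sketch below compares the two architectures. It assumes the overlapped delay takes the form $D = (1 + 1/S_{schedule})T + H$, which agrees with the two values stated in the text ($2T+H$ for the architecture of Figs. 1 and 2, and $1.5T+H$ for DOUBLE with $S_{schedule}=2$). The function names and the sample values of $T$ and $H$ are hypothetical.

```python
from fractions import Fraction


def delay_serial(T, H):
    """Packet delay when Stages 2 and 3 run in series: 2T + H."""
    return 2 * T + H


def delay_overlapped(T, H, s_schedule):
    """Packet delay with overlapped stages, assuming D = (1 + 1/S_schedule) * T + H."""
    return (1 + Fraction(1) / s_schedule) * T + H


# DOUBLE with S_schedule = 2, under the illustrative assumption H = T.
T = 16
H = T
old_delay = delay_serial(T, H)            # 3T
new_delay = delay_overlapped(T, H, 2)     # 2.5T
relative_saving = (old_delay - new_delay) / old_delay
```

With $S_{schedule}=1$ the two functions return the same delay, which matches the remark that only schedules with $S_{schedule}>1$ benefit from the overlap.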
The above equation (9) shows the relationship between $S_{schedule}$ and $D$ under the optimized OPS switch architecture. It is important to note that (7) and (9) are still subject to the constraint $T \ge N_S$, as discussed in Section II. Our optimization of the OPS switch architecture makes use of the time overlap between Stages 2 and 3 (in Fig. 2), but this does not change the constraint, which indicates that $T$ must be large enough to accommodate all the $N_S$ configurations. If we take DOUBLE [4] as an example, from (9) we can see that the packet delay is reduced to $D=1.5T+H$ because $S_{schedule}=2$. If we assume $H=T$ for this case, the new OPS switch architecture cuts down the packet delay from $3T$ to $2.5T$, i.e., by about 16.7%.
Generally, because $S_{schedule}>1$ (except for the EXACT algorithm [4, 10], which uses $N_S=N^2-2N+2$ configurations to achieve $S_{schedule}=1$), the packet delay formulated in (9) is always less than that of the architecture shown in Figs. 1 and 2 (which is $2T+H$). This is because our proposed architecture makes use of the time overlap between algorithm execution and traffic sending. For general cases, the amount of time saved depends on the particular situation. To make this point clearer, $S_{schedule}$ in (3) is plotted in Fig. 5. From the figure, we can see that a larger $S_{schedule}$ corresponds to a smaller $N_S$. According to our previous discussion, a smaller $N_S$ means that a greater portion of the $T$ (regular) slots in Stage 3 of Fig. 2 (the old architecture) is occupied by the $N$ predetermined perfect matchings. (Formula (8) also reflects the ratio.) This further indicates that the overlapped time period in Fig. 4 becomes longer.
VI. CONCLUSION
Optical packet switches (OPS) bring scalability, high line rate, huge capacity and low power consumption to communication networks on an economical basis. They are very attractive for carrying IP traffic over WDM optical networks.
Due to the inherent reconfiguration overhead in OPS switches, speedup and packet delay are two key issues in OPS implementation. Existing scheduling algorithms (DOUBLE [4] and ADAPTIVE [5]) make an effective tradeoff between these two factors. In this paper, we optimized these two algorithms to use fewer switch configurations and a lower speedup for performance guaranteed switching. The computational complexity of the resulting algorithms and the packet delay are also reduced. Based on the characteristics of these two algorithms, we also modified the existing OPS switch architecture. The new switch architecture was shown to reduce the packet delay significantly. In addition, all the above performance gains are achieved without incurring any extra cost.
