Title Can we schedule traffic more efficiently in optical packet switches?
Author(s) Wu, B; Wang, X; Yeung, KL
Citation 2006 Workshop on High Performance Switching and Routing, HPSR 2006, 2006, p. 181-186
Issued Date 2006
URL http://hdl.handle.net/10722/45951
Rights Creative Commons: Attribution 3.0 Hong Kong License
 Abstract—We consider traffic scheduling in non-blocking 
electronic-buffered optical packet switches (OPS) with bounded 
packet delay. Due to the reconfiguration overhead of the switch 
fabric, the two commonly used optimization objectives, 
minimizing packet delay and minimizing switch speedup, conflict 
with each other. Intelligent scheduling algorithms have been 
designed to provide tradeoff between these two objectives. In this 
paper, we propose a more efficient approach to schedule OPS 
traffic, resulting in significantly reduced speedup and/or packet 
delay. However, our approach is based on an interesting 
conjecture that has not yet been rigorously proved. We put 
forward this conjecture as an open question and call 
for a proof or disproof. 
Index Terms—Conjecture, optical packet switch (OPS), 
performance guaranteed switching, scheduling.  
I. INTRODUCTION
Recent progress in optical switching technologies [1-4] has enabled the implementation of electronic-buffered optical packet switches (OPS) as shown in Fig. 1. The core of this architecture is the optical switch fabric, which can efficiently provide the huge switching capacity demanded by backbone routers in the Internet. Since optical connections (i.e. optical fibers) are used to interconnect the input/output line-cards with the central switch fabric, the line-cards can be distributed over several racks, which may be located hundreds of meters away from each other. As a result, the power consumption in each rack can be reduced, and the switch becomes more scalable. 
On the other hand, the optical switch fabric usually needs some guard time to change its interconnection pattern and to synchronize the signals arriving at the input ports [5]. This guard time is called the reconfiguration overhead. During this period, no packet can be transmitted across the switch fabric. Accordingly, the packet transmission rate in the switch fabric must be faster than the external line rate (i.e. a speedup is required in the switch fabric) in order to achieve performance guaranteed switching (i.e. 100% throughput with bounded packet delay) [6-12]. It is shown in [6, 8] that minimizing speedup and minimizing delay are two conflicting goals: a higher speedup gives a shorter packet delay, and vice versa. 

This work is supported by Hong Kong Research Grant Council Earmarked Grant HKU 7032/01E. 
Bin Wu, Xin Wang and Kwan L. Yeung are with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong (Tel: 852-2857-8493; Fax: 852-2559-8738; e-mail: {binwu, xinwang, kyeung}@eee.hku.hk). 
Based on switch architectures similar to Fig. 1, several algorithms have recently been proposed [6-12] to schedule OPS traffic with guaranteed switching performance. Among them, MIN [6], α_i-SCALE [9] and QLEF [10] aim primarily at minimizing the packet delay, with reducing speedup as a secondary objective. DOUBLE [6] is the first algorithm that allows a tradeoff between speedup and delay. Let N denote the switch size and $N_S$ the (maximum) number of switch configurations required for scheduling. DOUBLE needs no more than $N_S=2N$ configurations to schedule any legitimate traffic matrix, with a speedup of $S_{\text{schedule}}=2$ (detailed in Section II). However, this algorithm does not consider the amount of reconfiguration overhead $\delta$ in its scheduling decision, and thus it is not optimized for switches with different $\delta$. Besides, $N_S=2N$ represents only a single point in the solution space [6, 8], and the characteristics for other $N_S$ values are not studied in [6]. To address these issues, ADAPTIVE [8] was proposed. It is shown in [8] that DOUBLE can be regarded as a special case of ADAPTIVE at $N_S=2N$.
In this paper, we explore the possibility of beating the 
performance of DOUBLE and ADAPTIVE. We show that this 
can be achieved based on a very interesting conjecture. We put 
forward this conjecture as an open question, and hope that a 
proof or disproof can be found soon. 
The rest of the paper is organized as follows. In Section II, we review the generic scheduling procedure [6-12] and the existing scheduling algorithms DOUBLE and ADAPTIVE. In Section III, we propose an approach to improve the scheduling efficiency. Section IV gives some further discussion, and we conclude the paper in Section V. 

Can We Schedule Traffic More Efficiently in Optical Packet Switches? 
Bin Wu, Student Member, IEEE, Xin Wang, and Kwan L. Yeung, Senior Member, IEEE

[Fig. 1. A scalable high speed optical packet switch: N input line-cards with N×N unicast VOQs are connected by optical connections to a central optical switch with internal speedup, controlled by a scheduler; the switch feeds N output line-cards with output queues OQ1 to OQN.]

0-7803-9569-7/06/$20.00 ©2006 IEEE
II. TRAFFIC SCHEDULING AND SPEEDUP-DELAY TRADEOFF
A. Scheduling Procedure 
The generic four-stage scheduling procedure shown in Fig. 2 is followed. In Stage 1, incoming packets are periodically accumulated in the input buffers over T time slots to construct an N×N traffic matrix $C(T)=\{c_{ij}\}$. Each entry $c_{ij}$ denotes the number of packets received at input i and destined to output j. C(T) is legitimate if each of its line sums (i.e. row sum or column sum) is no larger than T. Throughout the paper, we only consider legitimate C(T). In Stage 2, the scheduling algorithm takes H time slots to generate $N_S$ configurations $P_n=\{p^{(n)}_{ij}\}$, $n\in\{1,\dots,N_S\}$, each weighted by $\phi_n$, to cover C(T). "Cover" means that $\sum_{n=1}^{N_S}\phi_n\,p^{(n)}_{ij}\ge c_{ij}$ for all $i,j\in\{1,\dots,N\}$. Each $P_n$ is an N×N permutation matrix with at most a single "1" in each line (row or column): $p^{(n)}_{ij}=1$ indicates that a packet can be sent from input i to output j in one slot, and $p^{(n)}_{ij}=0$ otherwise. In Stage 3, the switch fabric is reconfigured according to the $N_S$ configurations obtained in Stage 2. An internal switch fabric speedup S is applied, resulting in compressed (shortened) time slots, so that this stage occupies only T regular slots. The fabric holds each configuration $P_n$ for $\phi_n$ compressed slots for packet transmission. Finally, in Stage 4, packets are sent from the output buffers onto the output lines (in T slots). 
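The legitimacy test and the covering condition above translate directly into code. The Python sketch below is only an illustration under our own conventions: matrices are nested lists indexed from 0 (not 1), and the helper names `is_legitimate`, `is_configuration` and `covers` are hypothetical, not taken from [6-12].

```python
from typing import List, Sequence

Matrix = List[List[int]]


def is_legitimate(C: Matrix, T: int) -> bool:
    """C(T) is legitimate if every row sum and every column sum is at most T."""
    n = len(C)
    rows_ok = all(sum(row) <= T for row in C)
    cols_ok = all(sum(C[i][j] for i in range(n)) <= T for j in range(n))
    return rows_ok and cols_ok


def is_configuration(P: Matrix) -> bool:
    """A configuration has at most a single 1 in each row and each column."""
    n = len(P)
    return (all(sum(row) <= 1 for row in P)
            and all(sum(P[i][j] for i in range(n)) <= 1 for j in range(n)))


def covers(configs: Sequence[Matrix], weights: Sequence[float], C: Matrix) -> bool:
    """Check sum_n phi_n * p(n)_ij >= c_ij for every entry (i, j)."""
    n = len(C)
    return all(
        sum(w * P[i][j] for P, w in zip(configs, weights)) >= C[i][j]
        for i in range(n) for j in range(n)
    )
```

For instance, with C = [[2, 1], [1, 2]] and T = 3, the identity configuration weighted by 2 plus the swap configuration weighted by 1 covers C, while weights (1, 1) do not.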
From the tagged packet in Fig. 2, we can see that the delay of any packet is bounded by 2T+H slots. Because $\delta N_S$ slots are used to reconfigure the switch $N_S$ times in Stage 3, only $T-\delta N_S$ slots are left for transmitting C(T). Since at most T packets are waiting at each input port for transmission, a speedup factor $S_{\text{reconfigure}}=T/(T-\delta N_S)$ is necessary to compensate for the idle time caused by reconfigurations. At the same time, the scheduling algorithm may produce some empty slots (i.e. underutilize the bandwidth provided by the configurations [6-12]). As a result, more than T compressed slots are usually needed in Stage 3 to transmit C(T). Therefore another speedup factor

$$S_{\text{schedule}}=\frac{1}{T}\sum_{n=1}^{N_S}\phi_n \qquad (1)$$

is required to compensate for the inefficient scheduling. In fact, $S_{\text{schedule}}$ measures the efficiency of the adopted scheduling algorithm: a smaller $S_{\text{schedule}}$ indicates a more efficient scheduler (with fewer empty slots in the schedule). The overall internal speedup S is then given by

$$S=S_{\text{reconfigure}}\times S_{\text{schedule}}=\frac{T}{T-\delta N_S}\times S_{\text{schedule}}=\frac{1}{T-\delta N_S}\sum_{n=1}^{N_S}\phi_n. \qquad (2)$$
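Equations (1) and (2) can be evaluated directly. The sketch below is a hypothetical helper (`speedups` is our own name) that assumes $\delta$ is measured in regular time slots and that the weights $\phi_n$ are given in compressed slots; it simply restates (1) and (2).

```python
from typing import Sequence, Tuple


def speedups(phi: Sequence[float], T: float, delta: float) -> Tuple[float, float, float]:
    """Return (S_reconfigure, S_schedule, S) for one frame.

    phi   -- weights phi_1..phi_{N_S} (compressed slots per configuration)
    T     -- frame length in regular time slots
    delta -- reconfiguration overhead per configuration, assumed in regular slots
    """
    n_s = len(phi)
    idle = delta * n_s                       # slots lost to N_S reconfigurations
    if idle >= T:
        raise ValueError("need T > delta * N_S")
    s_reconfigure = T / (T - idle)
    s_schedule = sum(phi) / T                # equation (1)
    s_total = sum(phi) / (T - idle)          # equation (2) = S_reconfigure * S_schedule
    return s_reconfigure, s_schedule, s_total
```

For example, eight configurations of weight 4 with T=16 and $\delta=1$ give $S_{\text{reconfigure}}=2$, $S_{\text{schedule}}=2$ and S=4.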
B. Speedup-Delay Tradeoff and Scheduling Algorithms 
For a given C(T), we divide it by $T/(N_S-N)$¹ to get a quotient matrix $Q=\{q_{ij}\}$ and a residue matrix $R=\{r_{ij}\}$:

$$C(T)=\frac{T}{N_S-N}\times Q+R. \qquad (3)$$
Since each line of C(T) sums to at most T, the maximum line sum of Q is at most $N_S-N$. So we can apply edge-coloring [13] to the bipartite multigraph of Q and obtain $N_S-N$ configurations to cover Q [6-8]. On the other hand, no entry of R is larger than $T/(N_S-N)$. So R can be covered by any N non-overlapping configurations (i.e. no two of them cover the same entry), each weighted by $T/(N_S-N)$. All in all, C(T) can be covered by $(N_S-N)+N=N_S$ configurations, each weighted by $\phi_n=T/(N_S-N)$. From (1), $S_{\text{schedule}}$ is then

$$S_{\text{schedule}}=\frac{1}{T}\sum_{n=1}^{N_S}\phi_n=\frac{1}{T}\times N_S\times\frac{T}{N_S-N}=1+\frac{N}{N_S-N}. \qquad (4)$$
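The decomposition (3) and the speedup function (4) can be sketched as follows. This is our own illustration, not the code of [6] or [8]: it splits each entry by integer division, applies the ceiling of footnote 1 to the weight, and the names `decompose`, `max_line_sum` and `speedup_function` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def decompose(C: Matrix, T: int, n_s: int) -> Tuple[int, Matrix, Matrix]:
    """Split C(T) = w * Q + R with w = ceil(T / (N_S - N)), as in (3).

    Rounding the weight up (footnote 1) keeps every line sum of Q at most
    N_S - N even when T / (N_S - N) is not an integer.
    """
    n = len(C)
    if n_s <= n:
        raise ValueError("need N_S > N")
    w = -(-T // (n_s - n))                     # integer ceiling of T / (N_S - N)
    Q = [[c // w for c in row] for row in C]   # quotient matrix
    R = [[c % w for c in row] for row in C]    # residue matrix, entries < w
    return w, Q, R


def max_line_sum(M: Matrix) -> int:
    """Largest row or column sum of a square matrix."""
    n = len(M)
    rows = [sum(r) for r in M]
    cols = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return max(rows + cols)


def speedup_function(n: int, n_s: int) -> float:
    """Equation (4): S_schedule = 1 + N / (N_S - N)."""
    return 1 + n / (n_s - n)
```

With T=16, N=4 and $N_S=8$, the weight is w=4, every entry of R is below 4, and Q has maximum line sum at most $N_S-N=4$, as (3) requires; (4) then gives $S_{\text{schedule}}=2$.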
The above formula (4) is referred to as the speedup function in [8]. In essence, it depicts the tradeoff between the speedup ($S_{\text{schedule}}$, due to inefficient scheduling) and the delay (in terms of $N_S$). Recall that the delay of any packet is bounded by 2T+H slots and that $T>\delta N_S$. Therefore the packet delay is always larger than $2\delta N_S+H$, i.e. a smaller $N_S$ permits a shorter delay.
DOUBLE [6] requires $N_S=2N$ configurations to cover C(T). This is obtained by replacing $N_S$ in (3) by 2N to get $C(T)=(T/N)\times Q+R$. N configurations are required to cover Q and R respectively, and each configuration is equally weighted by $\phi_n=T/N$. From (4), DOUBLE achieves $S_{\text{schedule}}=2$. Fig. 3 gives an example of DOUBLE execution. 
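Covering Q with N configurations requires a bipartite edge coloring. The paper cites the algorithm of Cole and Hopcroft [13]; the sketch below instead uses the simpler classical alternating-path (König) method, which also colors any bipartite multigraph with as many colors as its maximum degree, but more slowly. `edge_color` is a hypothetical helper; each entry $q_{ij}$ is treated as $q_{ij}$ parallel edges between input i and output j.

```python
from typing import List, Optional

Matrix = List[List[int]]


def edge_color(Q: Matrix, colors: int) -> List[Matrix]:
    """Split Q into `colors` configurations whose sum is exactly Q.

    Requires every line sum of Q to be at most `colors`. Classical
    alternating-path bipartite edge colouring (not Cole-Hopcroft).
    """
    n = len(Q)
    # row_to[i][c]: column joined to row i by colour c (None if free); col_to likewise.
    row_to: List[List[Optional[int]]] = [[None] * colors for _ in range(n)]
    col_to: List[List[Optional[int]]] = [[None] * colors for _ in range(n)]

    def flip(j: int, a: int, b: int) -> None:
        # Follow the a/b-alternating path that leaves column j on colour a,
        # then swap colours a and b on every edge of that path.
        path, at_col, node, c = [], True, j, a
        while True:
            nxt = col_to[node][c] if at_col else row_to[node][c]
            if nxt is None:
                break
            path.append((nxt, node, c) if at_col else (node, nxt, c))
            at_col, node, c = not at_col, nxt, (b if c == a else a)
        for r, k, c in path:
            row_to[r][c] = col_to[k][c] = None
        for r, k, c in path:
            d = b if c == a else a
            row_to[r][d], col_to[k][d] = k, r

    for i in range(n):
        for j in range(n):
            for _ in range(Q[i][j]):
                a = row_to[i].index(None)          # a colour free at row i
                b = col_to[j].index(None)          # a colour free at column j
                if a != b and col_to[j][a] is not None:
                    flip(j, a, b)                  # afterwards a is free at column j
                row_to[i][a], col_to[j][a] = j, i

    configs = []
    for c in range(colors):
        P = [[0] * n for _ in range(n)]
        for i in range(n):
            if row_to[i][c] is not None:
                P[i][row_to[i][c]] = 1
        configs.append(P)
    return configs
```

In DOUBLE, calling `edge_color(Q, N)` would yield the N configurations for the quotient matrix, each then held for $T/N$ compressed slots.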
Unlike DOUBLE, ADAPTIVE [8] substitutes (4) into (2) and minimizes the overall speedup S by solving

$$\frac{\partial S}{\partial N_S}=0. \qquad (5)$$

Therefore the schedule generated by ADAPTIVE is optimized with respect to the value of $\delta$.
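Substituting (4) into (2) gives S as a function of $N_S$ alone, $S(N_S)=TN_S/\big((N_S-N)(T-\delta N_S)\big)$, if the rounding of footnote 1 is ignored. The sketch below (hypothetical helper names) minimizes it by scanning the feasible integers. It also evaluates the continuous stationary point $N_S=\sqrt{NT/\delta}$, which follows from setting the derivative of $\ln S$ to zero; this closed form is our own derivation for illustration, not a result quoted from [8].

```python
import math


def overall_speedup(n_s: int, n: int, T: float, delta: float) -> float:
    """S from (2) with S_schedule from (4); the rounding of footnote 1 is ignored."""
    if not n < n_s < T / delta:
        raise ValueError("need N < N_S < T / delta")
    return (T / (T - delta * n_s)) * (1 + n / (n_s - n))


def best_n_s(n: int, T: float, delta: float) -> int:
    """Integer N_S minimizing S, found by scanning N < N_S < T / delta."""
    candidates = [k for k in range(n + 1, math.ceil(T / delta)) if delta * k < T]
    return min(candidates, key=lambda k: overall_speedup(k, n, T, delta))


def continuous_optimum(n: int, T: float, delta: float) -> float:
    """Stationary point over real N_S: d(ln S)/dN_S = 0 gives N_S = sqrt(N T / delta)."""
    return math.sqrt(n * T / delta)
```

For N=16, T=1000 and $\delta=10$, both the scan and the closed form give $N_S=40$, with $S=25/9\approx 2.78$.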
III. IMPROVING SCHEDULING EFFICIENCY
In this section, we aim to achieve better scheduling efficiency than DOUBLE and ADAPTIVE. Since DOUBLE is a special case of ADAPTIVE at $N_S=2N$, for simplicity we focus only on DOUBLE below. We further assume that the switch size N is an even number. 

¹ If $T/(N_S-N)$ is not an integer, use $\lceil T/(N_S-N)\rceil$ as the substitute [8]. 

[Fig. 2. Optical packet switch scheduling stages: Stages 1 to 4 along the time axis, with boundaries at T, T+H, 2T+H and 3T+H; Stage 3 alternates switch reconfiguration ($\delta$) and transmission phases; the tagged packet experiences a delay of 2T+H.]
A. Observation and Motivation 
In DOUBLE, the traffic matrix C(T) is decomposed as $C(T)=(T/N)\times Q+R$. For any $r_{ij}\in R$, if $r_{ij}>T/(2N)$, we call it an LER (large entry in R); otherwise it is an SER (small entry in R). We have the following Lemma 1 (proved in Appendix A). 
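Lemma 1: In DOUBLE, if a particular line (row i or column j) of R contains k LERs, then in Q we have

$$\sum_{j=1}^{N}q_{ij}\le\left\lfloor N-\frac{k}{2}\right\rfloor\ \text{for row } i,\qquad\text{or}\qquad\sum_{i=1}^{N}q_{ij}\le\left\lfloor N-\frac{k}{2}\right\rfloor\ \text{for column } j.$$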
For example, the second row of R in Fig. 3 contains k=3 LERs (each larger than T/(2N)=2). Then the entries in the second row of Q sum to at most $\lfloor N-k/2\rfloor=\lfloor 4-1.5\rfloor=2$. 
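Lemma 1 is easy to spot-check numerically. The sketch below uses hypothetical helpers and assumes that N divides T, so that the DOUBLE weight T/N is an integer. It builds random legitimate matrices as sums of T random permutation matrices and checks the bound for every row and column; such a test can only fail to find a violation, while the actual proof is in Appendix A.

```python
import random
from typing import List

Matrix = List[List[int]]


def random_legitimate(n: int, T: int, rng: random.Random) -> Matrix:
    """Sum of T random permutation matrices: every line sum equals T."""
    C = [[0] * n for _ in range(n)]
    for _ in range(T):
        perm = rng.sample(range(n), n)
        for i in range(n):
            C[i][perm[i]] += 1
    return C


def lemma1_holds(C: Matrix, T: int) -> bool:
    """A line of R with k LERs must have a q-sum of at most floor(N - k/2)."""
    n = len(C)
    w = T // n                                   # assumes N divides T, so w = T/N
    Q = [[c // w for c in row] for row in C]
    R = [[c % w for c in row] for row in C]
    lines = [[(i, j) for j in range(n)] for i in range(n)]     # rows
    lines += [[(i, j) for i in range(n)] for j in range(n)]    # columns
    for line in lines:
        k = sum(1 for i, j in line if 2 * n * R[i][j] > T)     # r_ij > T/(2N)
        if sum(Q[i][j] for i, j in line) > (2 * n - k) // 2:   # floor(N - k/2)
            return False
    return True
```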
Based on Lemma 1, we can move some packets from R to Q while keeping the maximum line sum of Q no more than N. Note that each configuration in DOUBLE is equally weighted by $\phi_n=T/N$. Without loss of generality, if row i of R contains k LERs, we can move at most $\lceil k/2\rceil$ of these LERs to Q by setting them to 0 in R and, at the same time, increasing the corresponding entries in Q by one. (By Lemma 1, the row of Q then sums to at most $\lfloor N-k/2\rfloor+\lceil k/2\rceil=N$, and since every $r_{ij}<T/N$, one extra unit in $q_{ij}$ covers the whole residue.) Fig. 4 shows an example based on the Q and R in Fig. 3. We use Q′ and R′ to denote the modified Q and R. We can see that Q′ can still be covered by N configurations, each with weight $\phi_n=T/N$. So we can pack more packets into the N configurations used to cover the quotient matrix than DOUBLE does. On the other hand, since some LERs are moved to Q, it may not be necessary to use another N equally weighted configurations (with $\phi_n=T/N$) to cover R′.
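The packet-moving step can be illustrated with a simple greedy rule. The sketch below is our own heuristic, not the paper's method, and `move_lers_greedily` is a hypothetical name: it scans R and moves an LER whenever the corresponding row and column of Q still sum to less than N. This keeps Q′ coverable by N configurations of weight T/N, but, as discussed under "Issues" below, a greedy choice does not in general guarantee at most N/2 LERs per line of R′.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def move_lers_greedily(Q: Matrix, R: Matrix, T: int) -> Tuple[Matrix, Matrix]:
    """Return (Q', R') after moving LERs from R to Q while all line sums of Q stay <= N.

    Assumes C(T) = (T/N) Q + R with every r_ij < T/N (the DOUBLE decomposition).
    """
    n = len(Q)
    Qp = [row[:] for row in Q]                   # work on copies, inputs untouched
    Rp = [row[:] for row in R]
    row_sum = [sum(r) for r in Qp]
    col_sum = [sum(Qp[i][j] for i in range(n)) for j in range(n)]
    for i in range(n):
        for j in range(n):
            is_ler = 2 * n * Rp[i][j] > T        # r_ij > T/(2N)
            if is_ler and row_sum[i] < n and col_sum[j] < n:
                Rp[i][j] = 0                     # the residue is now carried by Q ...
                Qp[i][j] += 1                    # ... in one extra slot of weight T/N > r_ij
                row_sum[i] += 1
                col_sum[j] += 1
    return Qp, Rp
```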
The above observation motivates us to explore a more efficient scheduling algorithm than DOUBLE. In fact, each line of the original R may contain up to N LERs. From Lemma 1, "half" of the LERs in each line of R can be moved to Q while keeping the maximum line sum of Q′ no more than N. Therefore, it is reasonable to expect that each line of R′ contains at most N/2 LERs after packet moving. As a result, we may be able to find N/2 non-overlapping configurations, each weighted by $\phi_n=T/N$, to cover all the remaining LERs in R′. At the same time, we may find another set of N/2 non-overlapping configurations, each with a reduced weight of $\phi_n=T/(2N)$, to cover the remaining SERs. If this can be done, then $S_{\text{schedule}}$ of DOUBLE can be reduced to

$$S_{\text{schedule}}=\frac{1}{T}\sum_{n=1}^{2N}\phi_n=\frac{1}{T}\left(N\times\frac{T}{N}+\frac{N}{2}\times\frac{T}{N}+\frac{N}{2}\times\frac{T}{2N}\right)=1.75. \qquad (6)$$
B. Issues 
To achieve the goal mentioned above, we need to solve the 
following issues. 
• Determine the set of LERs to be moved to Q, such that each line sum of Q′ does not exceed N, and R′ contains at most N/2 LERs in each line. 
• Among the N non-overlapping configurations used to cover R′, N/2 of them should cover all the remaining LERs in R′, and the other N/2 configurations should cover all the SERs not yet covered by the first N/2 configurations. 
Generally, it is not easy to determine the set of LERs that should be moved to Q, as the example in Fig. 5 shows. Assume all the non-zero entries (3s) in R are LERs. The number next to each line is the number of LERs that can be moved from this line to Q, obtained from $\lceil k/2\rceil$ based on Lemma 1. If the four circled entries are moved, then we cannot move any other LER without violating the quota of the corresponding line. At this point, the last row still contains more than N/2 LERs. For a larger switch size N, it is even more difficult to find a proper set of LERs to move. 
C. Methodology 
[Fig. 6. A possible set of predefined configurations for an 8×8 matrix. The number at each entry is the index of the PC that covers it.]

$$\begin{pmatrix}5&8&7&6&2&3&4&1\\6&5&8&7&3&4&1&2\\7&6&5&8&4&1&2&3\\8&7&6&5&1&2&3&4\\4&3&2&1&5&6&7&8\\3&2&1&4&8&5&6&7\\2&1&4&3&7&8&5&6\\1&4&3&2&6&7&8&5\end{pmatrix}$$

[Fig. 5. A carefully designed mechanism is necessary in order to move LERs properly. All non-zero entries of R are LERs; the number next to each line is its quota $\lceil k/2\rceil$, and four moved entries are circled.]

$$R=\begin{pmatrix}3&0&0&0\\0&3&3&3\\0&0&3&0\\3&3&3&0\end{pmatrix}$$

[Fig. 3. An example of DOUBLE execution (T=16, N=4, T/N=4, $N_S$=8). Step 1: calculate Q; Step 2: color Q; Step 3: schedule Q with P1–P4; Step 4: schedule R with P5–P8; all weights are $\phi_1=\dots=\phi_8=4$. The all-1 matrix used to cover R equals the sum of the N non-overlapping configurations (P5–P8).]

$$C(16)=\begin{pmatrix}16&0&0&0\\0&3&3&7\\0&5&7&4\\0&7&3&4\end{pmatrix}=4\times\underbrace{\begin{pmatrix}4&0&0&0\\0&0&0&1\\0&1&1&1\\0&1&0&1\end{pmatrix}}_{Q}+\underbrace{\begin{pmatrix}0&0&0&0\\0&3&3&3\\0&1&3&0\\0&3&3&0\end{pmatrix}}_{R}\le 4\times Q+4\times\begin{pmatrix}1&1&1&1\\1&1&1&1\\1&1&1&1\\1&1&1&1\end{pmatrix}$$

[Fig. 4. Move the circled LERs from R to Q (here the entries at positions (2,2), (2,4), (3,3) and (4,3)).]

$$C(16)\le 4\times\underbrace{\begin{pmatrix}4&0&0&0\\0&1&0&2\\0&1&2&1\\0&1&1&1\end{pmatrix}}_{Q'}+\underbrace{\begin{pmatrix}0&0&0&0\\0&0&3&0\\0&1&0&0\\0&3&0&0\end{pmatrix}}_{R'}$$

We first define two important notions: PCs (predefined configurations) and the DHH matrix. For an N×N matrix, we can use N predefined non-overlapping configurations (PCs) to cover all of its entries. As an example, the eight non-overlapping PCs PC1–PC8 defined in Fig. 6 can be used to cover all the entries of an 8×8 matrix. The number at each entry of this matrix denotes the particular PC that covers this entry. For easy reading, the entries covered by PC3 are circled in Fig. 6. 
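PCs can be generated programmatically for any even N. The labeling below is one possible construction that we chose for illustration, not necessarily the labeling of Fig. 6: PC_{N/2+1} to PC_N fill the diagonal zones A and C, PC_1 to PC_{N/2} fill the off-diagonal zones B and D, and inside each zone the index follows (i + j) mod N/2. We assume A is the top-left and C the bottom-right zone; the function names are hypothetical.

```python
from typing import List


def predefined_configurations(n: int) -> List[List[int]]:
    """Label each entry (i, j) of an N x N matrix with the index (1..N) of its PC."""
    if n % 2:
        raise ValueError("N must be even")
    h = n // 2
    label = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            diagonal_zone = (i < h) == (j < h)          # zone A or C
            # (i + j) mod h takes each value once per row/column inside a zone,
            # so every PC has exactly one entry in each row and each column.
            label[i][j] = (h if diagonal_zone else 0) + (i + j) % h + 1
    return label


def pc_matrix(label: List[List[int]], p: int) -> List[List[int]]:
    """0/1 permutation matrix of PC_p extracted from the label matrix."""
    return [[1 if x == p else 0 for x in row] for row in label]
```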
Given an arbitrary 0/1 matrix, we can use two lines to partition it into four (N/2)×(N/2) zones (sub-matrices) A, B, C and D, as shown in Fig. 9 (and Fig. 10) in Appendix B. If, for each row and each column, the number of 1s in zone A or C is no less than that in zone B or D, or is less by at most one, then the 0/1 matrix is called a DHH matrix. In other words, the two diagonal zones (A and C) of a DHH matrix contain at least "half" of the 1s in each row and column. (Please refer to Appendix B for a more rigorous definition.) 
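The DHH test above can be written as a short predicate. The sketch assumes zone A is the top-left and zone C the bottom-right (N/2)×(N/2) block, so a row or column in the upper (lower) half meets its diagonal zone in the left (right) half; `is_dhh` is a hypothetical name. The small test matrices are chosen to match the description of Fig. 10 in Appendix B.

```python
from typing import List

Matrix = List[List[int]]


def is_dhh(M: Matrix) -> bool:
    """True if, in every row and column, the 1s in the diagonal zone (A or C)
    are no fewer than the 1s in the off-diagonal zone (B or D), minus one."""
    n = len(M)
    h = n // 2
    for i in range(n):
        diag = sum(M[i][j] for j in range(n) if (i < h) == (j < h))
        if diag < sum(M[i]) - diag - 1:
            return False
    for j in range(n):
        col = [M[i][j] for i in range(n)]
        diag = sum(col[i] for i in range(n) if (i < h) == (j < h))
        if diag < sum(col) - diag - 1:
            return False
    return True
```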
Our approach to improving the scheduling efficiency (i.e. minimizing $S_{\text{schedule}}$) is based on the concepts of PC and DHH matrix. We first convert the residue matrix $R=\{r_{ij}\}$ into a 0/1 indicator matrix $\Lambda=\{\lambda_{ij}\}$ such that $\lambda_{ij}=1$ if $r_{ij}$ is an LER and $\lambda_{ij}=0$ otherwise. The DHH conjecture given in Appendix B says that we can always find two permutation matrices U and V such that $\Lambda'=U\Lambda V$ is a DHH matrix. For the 8×8 case, assume that a DHH matrix Λ′ has been obtained by $\Lambda'=U\Lambda V$. Then PC5–PC8 defined in Fig. 6 (which span the sub-matrices A and C illustrated in Fig. 9) cover at least "half" of the 1s in each row and column of Λ′. The remaining 1s, covered by PC1–PC4, "correspond" to LERs that should be moved to Q. Since Λ′ is obtained from Λ by row/column permutations ($\Lambda'=U\Lambda V$), the 1s in Λ′ do not directly match the original LER entries. Therefore, we need to invoke an inverse transform to obtain the desired configurations. 
[Fig. 7. An illustrative example of our proposed approach (T=32, N=8, T/N=4, $N_S$=16). Step 1: decompose $C(32)=4\times Q+R$ and build the LER indicator matrix Λ from R. Step 2: find U and V such that $\Lambda'=U\Lambda V$ is a DHH matrix. Step 3: compute $\Gamma_n=U^{-1}(PC_n)V^{-1}$ for n=1,…,8. Step 4: move the LERs covered by Γ1–Γ4 (circled) from R to Q, giving Q′ and R′. Step 5: cover Q′ with P1–P8. Step 6: the resulting schedule satisfies the relations below.]

$$C(32)\le 4\times Q'+R'\le 4\sum_{n=1}^{8}P_n+2\sum_{n=1}^{4}\Gamma_n+4\sum_{n=5}^{8}\Gamma_n,\qquad S_{\text{schedule}}=\frac{1}{32}\left(4\times 8+2\times 4+4\times 4\right)=1.75.$$

Fig. 7 gives an example, where the execution steps are indexed by the numbers in the dashed circles. In Step 1, we construct the indicator matrix Λ from R. In Step 2, we find two permutations U and V to permute Λ into a DHH matrix Λ′. Then, Step 3 imposes U and V on the predefined configurations PC1–PC8 and uses $\Gamma_n=U^{-1}(PC_n)V^{-1}$ (note that $U^{-1}=U$ and $V^{-1}=V$ in this particular example) to generate eight configurations Γ1–Γ8. If Γ1–Γ4 cover some LERs in the original R, then these LERs (circled entries in Step 4) are to be moved to Q. Based on Lemma 1, we can set these LERs to 0 in R and increase the corresponding entries in Q by one; Q′ and R′ are thus obtained in Step 4. In Step 5, to cover Q′ we simply use the same edge-coloring algorithm [13] as in DOUBLE to determine configurations P1–P8, each weighted by T/N=4. On the other hand, R′ can be covered by Γ1–Γ8, with a weight of 4 for Γ5–Γ8 and a reduced weight of T/(2N)=2 for Γ1–Γ4. As a result, $S_{\text{schedule}}$ is reduced to 1.75, as shown in Step 6. In fact, this 12.5% reduction in $S_{\text{schedule}}$ (from 2 to 1.75) is independent of the switch size N, as stipulated by (6). Note that the transform $\Lambda'=U\Lambda V$ defines a one-to-one, entry-to-entry mapping between Λ and Λ′. Therefore, the resulting Γ1–Γ8 are non-overlapping, and together they cover every entry of R′.
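The whole procedure can be prototyped for a small switch. The sketch below is our own illustration under explicit assumptions: N is even, 2N divides T, and U and V are found by exhaustive search over row and column orders (feasible only for tiny N, and relying on the DHH conjecture for existence). Instead of forming U and V as matrices, it stores them as index permutations, so reading PC positions through the permutations plays the role of $\Gamma_n=U^{-1}(PC_n)V^{-1}$. All function names are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def is_dhh(M: Matrix) -> bool:
    """Every line: 1s in its diagonal zone (A or C) >= 1s in B or D, minus one."""
    n, h = len(M), len(M) // 2
    for k in range(n):
        for line in ([(k, j) for j in range(n)], [(i, k) for i in range(n)]):
            ones = sum(M[i][j] for i, j in line)
            diag = sum(M[i][j] for i, j in line if (i < h) == (j < h))
            if diag < ones - diag - 1:
                return False
    return True


def find_dhh_permutation(L: Matrix) -> Optional[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Exhaustive search (tiny N only) for row order rp and column order cp
    such that L'[a][b] = L[rp[a]][cp[b]] is DHH, i.e. L' = U L V."""
    n = len(L)
    for rp in permutations(range(n)):
        for cp in permutations(range(n)):
            if is_dhh([[L[r][c] for c in cp] for r in rp]):
                return rp, cp
    return None


def improved_double(C: Matrix, T: int):
    """Return Q', R', the weight of the Gamma_n covering each entry, and S_schedule."""
    n = len(C)
    h, w = n // 2, T // n                      # assumes N even and 2N divides T
    Q = [[c // w for c in row] for row in C]   # C(T) = (T/N) Q + R
    R = [[c % w for c in row] for row in C]
    L = [[int(2 * n * r > T) for r in row] for row in R]   # LER: r > T/(2N)
    found = find_dhh_permutation(L)
    if found is None:
        raise RuntimeError("no DHH permutation found: a DHH counterexample")
    rp, cp = found
    weight = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            i, j = rp[a], cp[b]                # entry (a, b) of L' is entry (i, j) of L
            if (a < h) == (b < h):             # PC in zone A or C: weight T/N
                weight[i][j] = w
            else:                              # PC in zone B or D: weight T/(2N)
                weight[i][j] = w // 2
                if L[i][j]:                    # LER covered by Gamma_1..Gamma_{N/2}
                    Q[i][j] += 1               # move it to Q ...
                    R[i][j] = 0                # ... and clear it in R
    # N configs of weight T/N cover Q'; N/2 of weight T/N and N/2 of weight T/(2N) cover R'.
    s_schedule = (n * w + h * w + h * (w // 2)) / T
    return Q, R, weight, s_schedule
```

On a 4×4 example with T=16, the returned Q′ keeps every line sum at most N, every entry of R′ fits within the weight of the configuration covering it, and the reported $S_{\text{schedule}}$ is 1.75, in line with (6).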
IV. DISCUSSION
If we define LERs as $r_{ij}>T/[2\times(N_S-N)]$ (instead of $r_{ij}>T/(2N)$ for DOUBLE) and decompose C(T) as discussed in Part B of Section II, then the speedup function (4) used in ADAPTIVE [8] can also be reduced to

$$S_{\text{schedule}}=\frac{1}{T}\sum_{n=1}^{N_S}\phi_n=1+\frac{3}{4}\times\frac{N}{N_S-N}. \qquad (7)$$

The relative reduction with respect to (4) is $N/(4N_S)$, which equals 12.5% at $N_S=2N$; hence for $N_S<2N$ we can achieve a greater gain (than 12.5%) over the original ADAPTIVE algorithm. Since speedup and packet delay can be traded for each other, this also means that the packet delay can be smaller than that given in [8] if the same speedup is applied to the OPS switch. 
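The comparison between (4) and (7) reduces to simple arithmetic. The sketch below (hypothetical helper names) evaluates both speedup functions and the relative reduction, which algebraically equals $N/(4N_S)$.

```python
def s_schedule_adaptive(n: int, n_s: int) -> float:
    """Speedup function (4): S_schedule = 1 + N / (N_S - N)."""
    return 1 + n / (n_s - n)


def s_schedule_improved(n: int, n_s: int) -> float:
    """Speedup function (7): S_schedule = 1 + (3/4) * N / (N_S - N)."""
    return 1 + 0.75 * n / (n_s - n)


def relative_gain(n: int, n_s: int) -> float:
    """Fractional reduction of (7) relative to (4); algebraically N / (4 N_S)."""
    old = s_schedule_adaptive(n, n_s)
    return (old - s_schedule_improved(n, n_s)) / old
```

For N=16, the gain is exactly 12.5% at $N_S=32$ and about 16.7% at $N_S=24$.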
From Fig. 11 in Appendix B, we know that U and V are used to record the row/column permutations involved, and they can be constructed from a square unit matrix E (i.e. an N×N matrix with 1s on its diagonal and 0s elsewhere). Therefore, it is not necessary to obtain $U^{-1}$ and $V^{-1}$ by algebraic calculation. Instead, we can also start from a square unit matrix E and apply the recorded line permutations in reverse order to generate $U^{-1}$ and $V^{-1}$.
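The construction of U, V and their inverses from E can be sketched as follows. We assume the permutations are recorded as a list of row swaps and a list of column swaps (our own representation; the helper names are hypothetical). Replaying the swaps on E in order gives U and V, and replaying them in reverse order gives $U^{-1}$ and $V^{-1}$, because each swap is its own inverse.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]
Swaps = Sequence[Tuple[int, int]]


def identity(n: int) -> Matrix:
    """The square unit matrix E."""
    return [[int(i == j) for j in range(n)] for i in range(n)]


def swap_rows(M: Matrix, a: int, b: int) -> None:
    M[a], M[b] = M[b], M[a]


def swap_cols(M: Matrix, a: int, b: int) -> None:
    for row in M:
        row[a], row[b] = row[b], row[a]


def permute(M: Matrix, row_swaps: Swaps, col_swaps: Swaps) -> Matrix:
    """Apply the recorded swaps, in order, to a copy of M (this yields U M V)."""
    out = [row[:] for row in M]
    for a, b in row_swaps:
        swap_rows(out, a, b)
    for a, b in col_swaps:
        swap_cols(out, a, b)
    return out


def record_transforms(n: int, row_swaps: Swaps, col_swaps: Swaps):
    """Return U, V, U^-1, V^-1 built from E without any matrix inversion."""
    E = identity(n)
    U = permute(E, row_swaps, [])
    V = permute(E, [], col_swaps)
    U_inv = permute(E, list(reversed(row_swaps)), [])   # replay swaps backwards
    V_inv = permute(E, [], list(reversed(col_swaps)))
    return U, V, U_inv, V_inv


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]
```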
Finally, it is important to note that our proposed approach is based on the DHH conjecture. We have tried to prove it for a long time, without success. Nor have we found a single counterexample by checking extensive samples with computer programs. Many mathematicians, including the authors of [14], have reviewed our conjecture. As of now, the problem is still open. 
V. CONCLUSION
Due to the reconfiguration overhead, speedup and packet delay are two main issues for traffic scheduling in optical packet switches (OPS). In this paper, we proposed a new approach to improve the scheduling efficiency in OPS with guaranteed switching performance. Compared with DOUBLE and ADAPTIVE, our approach reduces $S_{\text{schedule}}$ by 12.5% at $N_S=2N$ (and by more when $N_S<2N$), which can be used to lower the speedup and/or the packet delay. However, the proposed approach is based on the DHH conjecture given in this paper. We call for a proof or disproof of this conjecture. 
APPENDIX A    CORRECTNESS PROOF OF LEMMA 1
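Lemma 1: In DOUBLE, if a particular line (row i or column j) of R contains k LERs, then in Q we have

$$\sum_{j=1}^{N}q_{ij}\le\left\lfloor N-\frac{k}{2}\right\rfloor\ \text{for row } i,\qquad\text{or}\qquad\sum_{i=1}^{N}q_{ij}\le\left\lfloor N-\frac{k}{2}\right\rfloor\ \text{for column } j.$$

Proof: After C(T) is divided by T/N, we have

$$C(T)=\frac{T}{N}\,Q+R,\qquad\text{or}\qquad c_{ij}=\frac{T}{N}\,q_{ij}+r_{ij}. \qquad (8)$$

Without loss of generality, we assume that row i of R contains k LERs. Because

$$\sum_{j=1}^{N}c_{ij}=\frac{T}{N}\sum_{j=1}^{N}q_{ij}+\sum_{j=1}^{N}r_{ij}\le T, \qquad (9)$$

and each of the k LERs is larger than T/(2N), we have, for $k\ge 1$,

$$\sum_{j=1}^{N}q_{ij}\le\frac{T-\sum_{j=1}^{N}r_{ij}}{T/N}<\frac{T-k\times\frac{T}{2N}}{T/N}=N-\frac{k}{2}. \qquad (10)$$

(For k=0, the first inequality in (10) already gives $\sum_{j=1}^{N}q_{ij}\le N$.) Since $\sum_{j=1}^{N}q_{ij}$ is an integer, we then have

$$\sum_{j=1}^{N}q_{ij}\le\left\lfloor N-\frac{k}{2}\right\rfloor. \qquad (11)$$

The same argument applies to column j.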
APPENDIX B    DHH CONJECTURE
In this appendix, we assume that the size N of the matrices/ 
vectors is an even number. 
Definition 1 (halve): Given an arbitrary 0/1 vector (a row or column whose entries are either 0 or 1), use a line to separate it into two equal parts as shown in Fig. 8. Let x and y denote the number of 1s in each part. If $|x-y|\le 1$, we say that the 1s in the vector are halved by the line. 
Fig. 8 gives some examples, where the 1s are halved in (a) 
and (c), but not in (b) and (d). 
Definition 2 (DHH matrix): Given an arbitrary N×N 0/1 
matrix, use two lines to partition it into four (N/2)×(N/2) zones/ 
sub-matrices A, B, C and D as in Fig. 9. For each row/column 
of the matrix, if the number of 1s in zone A or C is more than 
that in zone B or D, or the 1s in this row/column are halved by 
one of the two lines, then this matrix is called a DHH matrix (it 
means that the diagonal half-size sub-matrices A and C contain 
at least “half” number of 1s for each row and column). 
The matrix in Fig. 10a is a DHH matrix, whereas the matrix 
in Fig. 10b is not, because its second column has two more 1s in 
zone D than that in zone A.
DHH conjecture: Given an arbitrary N×N 0/1 matrix Λ, we can permute its rows and columns² a finite number of times such that Λ is turned into a DHH matrix. In other words, there exist two permutation matrices U and V such that UΛV is a DHH matrix. 
For example, if we swap the first row and the last row in Fig. 10b, the resulting matrix is a DHH matrix. Fig. 11 gives a more complex example for Λ in Fig. 7. 
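The example of Fig. 11 can be replayed mechanically. The sketch below takes Λ as we read it from Fig. 7 (rows and columns numbered 0 to 7), applies the four swaps listed in Fig. 11, and checks Definition 2 with zone A assumed top-left and zone C bottom-right. `search_dhh` is our own exhaustive search over all (N!)² row and column orders, usable only for very small N, of the kind one might use to look for U and V or for a counterexample; all names are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[int]]

# Indicator matrix of Fig. 7 as we read it (assumption: 0-indexed, top row first).
LAMBDA = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]


def is_dhh(M: Matrix) -> bool:
    """Definition 2: in every line, diagonal-zone 1s >= off-diagonal 1s minus one."""
    n, h = len(M), len(M) // 2
    for k in range(n):
        for line in ([(k, j) for j in range(n)], [(i, k) for i in range(n)]):
            ones = sum(M[i][j] for i, j in line)
            diag = sum(M[i][j] for i, j in line if (i < h) == (j < h))
            if diag < ones - diag - 1:
                return False
    return True


def apply_swaps(M: Matrix, row_swaps: Sequence[Tuple[int, int]],
                col_swaps: Sequence[Tuple[int, int]]) -> Matrix:
    """Return a copy of M with the given row swaps and column swaps applied."""
    out = [row[:] for row in M]
    for a, b in row_swaps:
        out[a], out[b] = out[b], out[a]
    for a, b in col_swaps:
        for row in out:
            row[a], row[b] = row[b], row[a]
    return out


def search_dhh(M: Matrix) -> Optional[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Exhaustive search over row and column orders; feasible only for tiny N."""
    n = len(M)
    for rp in permutations(range(n)):
        rows = [M[r] for r in rp]
        for cp in permutations(range(n)):
            if is_dhh([[row[c] for c in cp] for row in rows]):
                return rp, cp
    return None
```

Under this reading, Λ itself is not DHH (row 3 has both of its 1s off the diagonal zone), while the matrix obtained after swapping rows 0 & 7 and 1 & 4 and columns 3 & 7 and 0 & 4 is.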
² i.e., swap its rows or columns. Note that a row can only be swapped with another row, and a column can only be swapped with another column. 

REFERENCES
[1] J. E. Fouquet et al., "A compact, scalable cross-connect switch using total internal reflection due to thermally-generated bubbles," IEEE LEOS Annual Meeting, pp. 169-170, Dec. 1998. 
[2] L. Y. Lin, "Micromachined free-space matrix switches with submillisecond switching time for large-scale optical crossconnect," OFC'98 Tech. Digest, pp. 147-148, Feb. 1998. 
[3] O. B. Spahn, C. Sullivan, J. Burkhart, C. Tigges, and E. Garcia, "GaAs-based microelectromechanical waveguide switch," Proc. 2000 IEEE/LEOS Intl. Conf. on Optical MEMS, pp. 41-42, Aug. 2000. 
[4] A. J. Agranat, "Electroholographic wavelength selective crossconnect," 1999 Digest of the LEOS Summer Topical Meetings, pp. 61-62, Jul. 1999. 
[5] K. Kar, D. Stiliadis, T. V. Lakshman, and L. Tassiulas, "Scheduling algorithms for optical packet fabrics," IEEE Journal on Selected Areas in Communications, vol. 21, no. 7, pp. 1143-1155, Sept. 2003. 
[6] B. Towles and W. J. Dally, "Guaranteed scheduling for switches with configuration overhead," IEEE/ACM Trans. Networking, vol. 11, no. 5, pp. 835-847, Oct. 2003. 
[7] X. Li and M. Hamdi, "On scheduling optical packet switches with reconfiguration delay," IEEE Journal on Selected Areas in Communications, vol. 21, no. 7, pp. 1156-1164, Sept. 2003. 
[8] B. Wu and K. L. Yeung, "Minimizing internal speedup for performance guaranteed optical packet switches," IEEE GLOBECOM '04, vol. 3, pp. 1742-1746, Dec. 2004. 
[9] B. Wu and K. L. Yeung, "Scheduling optical packet switches with minimum number of configurations," IEEE ICC '05, vol. 3, pp. 1830-1835, May 2005. 
[10] B. Wu and K. L. Yeung, "Traffic scheduling in non-blocking optical packet switches with minimum delay," IEEE GLOBECOM '05, vol. 4, pp. 2041-2045, Dec. 2005. 
[11] B. Wu, K. L. Yeung, and V. O. K. Li, "Two-layer parallel switching: A practical and survivable design for performance guaranteed optical packet switches," IEEE GLOBECOM '05, vol. 4, pp. 1905-1909, Dec. 2005. 
[12] B. Wu and K. L. Yeung, "On optimization of optical packet switches with reconfiguration overhead," IEEE HPSR '05, pp. 217-221, May 2005. 
[13] R. Cole and J. Hopcroft, "On edge coloring bipartite graphs," SIAM Journal on Computing, vol. 11, pp. 540-546, Aug. 1982. 
[14] R. A. Brualdi and H. J. Ryser, Combinatorial Matrix Theory, Cambridge University Press, 1991. 
[Fig. 11. Permute a matrix Λ into a DHH matrix (the rows/columns are numbered from 0 to 7). Starting from Λ in Fig. 7, the figure applies the swaps "swap columns 3 & 7", "swap rows 0 & 7", "swap columns 0 & 4" and "swap rows 1 & 4". A "×" indicates a row or a column that violates the requirement defined in the conjecture; the final result $\Lambda'=U\Lambda V$ is a DHH matrix.]

U and V are used to record the row/column permutations. They can be constructed as follows. Initialize U and V as square unit matrices E. If we permute two rows in Λ, then we also permute the corresponding rows in U; if we permute two columns in Λ, then we permute the corresponding columns in V too. After Λ is turned into a DHH matrix, the corresponding U and V are also obtained. 

[Fig. 10. (a) is a DHH matrix, and (b) is not.]

$$\text{(a)}\ \begin{pmatrix}1&0&0&0\\1&1&1&1\\0&1&0&1\\0&1&0&0\end{pmatrix}\qquad\text{(b)}\ \begin{pmatrix}1&0&0&0\\1&0&1&1\\0&1&0&1\\0&1&0&0\end{pmatrix}$$

[Fig. 8. Illustration of definition "halve": row vectors (a) and (b) and column vectors (c) and (d), each separated into two equal parts by a line.]

[Fig. 9. Sub-matrices: the matrix is partitioned into four (N/2)×(N/2) zones arranged as shown below, so that A and C are the diagonal zones.]

$$\begin{pmatrix}A&B\\D&C\end{pmatrix}$$
