Performance Analysis of a Dual Round Robin Matching Switch with Exhaustive Service by Yihan Li et al.
Performance Analysis of a Dual Round Robin 
Matching Switch with Exhaustive Service 
Yihan Li, Shivendra Panwar, H. Jonathan Chao
Abstract - Virtual Output Queuing is widely used by fixed-length high-speed switches to overcome head-of-line blocking. This is done by means of matching algorithms. Maximum matching algorithms have good performance, but their implementation complexity is quite high. Maximal matching algorithms need speedup to guarantee good performance. Iterative algorithms (such as PIM and iSLIP) use multiple iterations to converge on a maximal match. The Dual Round-Robin Matching (DRRM) scheme has performance similar to iSLIP and lower implementation complexity. The objective of matching algorithms is to reduce the matching overhead for each time slot. The Exhaustive Service Dual Round-Robin Matching (EDRRM) algorithm amortizes the cost of a match over multiple time slots. While EDRRM suffers from a throughput below 100% for small switch sizes, it is conjectured to achieve an asymptotic 100% throughput under uniform traffic. Simulations show that it achieves high throughput under nonuniform traffic. Its delay performance is not sensitive to traffic burstiness, switch size and packet length. In an EDRRM switch cells belonging to the same packet are transferred to the output continuously, which leads to good packet delay performance and simplifies the implementation of packet reassembly. In this paper we analyze the performance of an EDRRM switch by using an exhaustive service random polling system model. This was used to predict the performance of switches too large to be simulated within a reasonable run time.
Index Terms - Switching, scheduling, Virtual Output Queueing, Dual Round Robin, polling, exhaustive service.
I. INTRODUCTION 
Fixed-length switching technology is widely accepted as an approach to achieve high switching efficiency for high speed packet switches. Variable-length IP packets are segmented into fixed-length "cells" at inputs and are reassembled at the outputs.
Packet switches based on Input Queuing (IQ) are desirable for high speed switching, since the internal operation speed is only slightly higher than the input line rate. However, an Input Queuing switch has a critical drawback [1], [2]: the throughput is limited to 58.6% due to the head-of-line (HOL) blocking phenomenon. Output Queuing (OQ) switches have optimal delay-throughput performance for all traffic distributions, but the N-times speed-up in the fabric limits the scalability of this architecture.
Virtual Output Queuing (VOQ) is used to overcome the drawbacks and combine the advantages of an Input Queuing switch and an Output Queuing switch. In a VOQ switch, each input maintains N queues, one for each output. By using VOQ, no additional speedup is required and HOL blocking can be eliminated.
Considerable work has been done on scheduling algorithms for VOQ switches. It has been proved that by using a maximum weight matching algorithm 100% throughput can be reached for i.i.d. arrivals (uniform or nonuniform) [3], [4], [5], [6]. But maximum weight matching is not practical to implement in hardware due to its complexity, and may not guarantee fairness and quality of service. A number of practical maximal matching algorithms have been proposed [7], [8], [9], but maximal matching algorithms cannot achieve as high a throughput as maximum matching algorithms. Iterative algorithms such as PIM [10] and iSLIP [11], [6] use multiple iterations to converge on a maximal matching.
The Dual Round-Robin Matching (DRRM) switch [12], [13] builds and improves on the ideas incorporated in iSLIP. It has been proven that DRRM can achieve 100% throughput under i.i.d. and uniform traffic [13]. Furthermore, the DRRM scheme provides fairness and prevents starvation. It has lower implementation complexity compared to algorithms with similar performance and is scalable. According to simulation results [13], under uniform bursty traffic, the average delay of a DRRM switch varies approximately linearly with burst length, but under nonuniform traffic the throughput drops below 100%.
Exhaustive service DRRM (EDRRM) [14], a variation of DRRM, improves switching performance under bursty and nonuniform traffic. The implementation complexities of EDRRM and DRRM are comparable, with both having lower complexity than iSLIP. According to simulation results, it is conjectured that for an EDRRM switch of large size, throughputs approaching 100% are achievable under uniform traffic. Analysis results in this paper support, though not rigorously prove, this conjecture. Compared to DRRM and iSLIP, EDRRM has higher throughput under nonuniform traffic. The delay of EDRRM is less sensitive to traffic burstiness, and increases much more slowly with switch size. EDRRM is neither a maximum matching nor a maximal matching algorithm. Unlike any maximum or maximal matching algorithm, which tries to find as many matches as possible in each time slot, EDRRM achieves efficiency by looking at the matching overhead over time. In EDRRM the cost in wasted slot times to get a match may be large, but the cost is amortized over a VOQ busy period. We believe that this is a new approach with both theoretical and practical implications.
Usually in a packet switch multiple queues are needed at each Output Reassembly Module (ORM) if cells belonging to different packets are interleaved at the same output [15]. When a cell is transferred through the switch fabric to the output, it is delivered to one of the queues of the ORM. The cells belonging to the same packet will be delivered to the same queue and can only leave the queue once the whole packet is reassembled. The total delay a packet suffers includes the cell delay and the time needed for reassembly. In order to evaluate the variable component of the delay incurred in a packet switch, the packet delay as well as the cell delay of EDRRM is compared to those of DRRM and iSLIP. The comparison shows that under uniform i.i.d. traffic, the packet delay of EDRRM is lower, and not sensitive to switch size and packet length. At the same time, since all the cells belonging to the same packet are transferred to the output continuously, only one queue is needed in each ORM, which further simplifies the switch implementation.

Yihan Li is a Ph.D. candidate in the Electrical and Computer Engineering Department, Polytechnic University, Brooklyn, NY 11201. Email: yli@photon.poly.edu.
Shivendra Panwar and H. Jonathan Chao are on the faculty of the Electrical and Computer Engineering Department, Polytechnic University, Brooklyn, NY 11201. Email: panwar@catt.poly.edu, chao@antioch.poly.edu.
This work is supported in part by the New York State Center for Advanced Technology in Telecommunications (CATT), and also in part by the National Science Foundation under grants ANIW81521 and ANlW81351.
0-7803-7632-3/02/$17.00 ©2002 IEEE
In this paper we analyze the performance of the EDRRM switch under uniform traffic by using an exhaustive service random polling system model. The analytical result is used to predict the performance of switches too large to be simulated within a reasonable run time.
In Section II we briefly review the EDRRM algorithm and its performance. In Section III, the EDRRM algorithm is analyzed by modeling it as a polling system.
II. THE EXHAUSTIVE SERVICE DRRM SCHEME AND ITS PERFORMANCE
A. The Exhaustive Service DRRM Scheme: Motivation, Description, and an Example
In the DRRM scheme [13], each input selects one nonempty VOQ by round robin, and each output accepts one of the multiple requests it receives, also in round robin order. When an input and an output are matched, only one cell is transferred from the input to the matched output. After that both the input and the output will increment their pointers by one and in the next time slot this input-output pair will have the lowest matching priority. This behavior is similar to the limited service policy [16] in a polling system. In order to improve on DRRM's performance under nonuniform traffic, we modified the DRRM scheme so that whenever an input is matched to an output, all the cells in the corresponding VOQ will be transferred in the following time slots before any other VOQ of the same input can be served. This is called the exhaustive service policy [16] in polling systems. We therefore call this the Exhaustive service DRRM (EDRRM) scheme.
In EDRRM, the pointers of inputs and outputs are updated in a different way from DRRM. In a time slot if an input is matched to an output, one cell in the corresponding VOQ will be transferred. After that, if the VOQ becomes empty, the input will update its arbiter pointer to the next location in a fixed order; otherwise, the pointer will remain at the current VOQ so that a request will be sent to the same output in the next time slot. If an input sends a request to an output but gets no grant, the input will update its arbiter pointer to the next location in a fixed order, which is different from DRRM where the input pointer will remain where it is until it gets a grant. The reason for this modification is as follows. In EDRRM if an input cannot get a grant from an output, it means that the output is most likely in a "stable marriage" with another input for all the cells waiting in the VOQ, and the unsuccessful input is likely to wait for a long time to get a grant from this output. It is better for the input to search for another free output than to wait for this busy one. Since an output has no idea if the currently served VOQ will become empty after this service, outputs will not update their arbiter pointers after cell transfer.
A detailed description of the two step EDRRM algorithm follows:
Step 1 : Request. Each input moves its pointer to the first nonempty VOQ in a fixed round-robin order, starting from the current position of the pointer, and sends a request to the output corresponding to the VOQ. The pointer of the input arbiter is incremented by one location beyond the selected output if the request is not granted in Step 2, or if the request is granted and after one cell is served this VOQ becomes empty. Otherwise, the pointer remains at that (nonempty) VOQ.
Step 2 : Grant. If an output receives one or more requests, it chooses the one that appears next in a fixed round-robin schedule starting from the current position of the pointer. The pointer is moved to this position. The output notifies each requesting input whether or not its request was granted. The pointer of the output arbiter remains at the granted input. If there are no requests, the pointer remains where it is.
Fig. 1. An example of the EDRRM algorithm
Figure 1 shows an example of the EDRRM arbitration algorithm. r1, r2, r3 and r4 are arbiter pointers for inputs 1, 2, 3 and 4, and g1, g2, g3 and g4 are arbiter pointers for outputs 1, 2, 3 and 4. At the beginning of the time slot r1 points to output 1 while g1 does not point to input 1, which means that in the last time slot input 1 was not matched to output 1, and now input 1 requests output 1 for a new service. Similarly, r2 requests output 3 for a new service. Since r3 points to output 3 and g3 points to input 3, it is possible that in the last time slot input 3 was matched to output 3 and in this time slot output 3 will transfer the next cell from input 3 because the VOQ is not empty. Input 4 and output 2 have a similar situation as input 3 and output 3. In the grant phase, output 1 grants the only request it receives from input 1 and updates g1 to 1, output 2 grants the request from input 4 and output 3 grants the request from input 3. The request from input 2 to output 3 is not granted, so r2 moves to 4. By the end of this time slot, the 1st VOQ of input 1 and the 3rd VOQ of input 3 are still nonempty so that r1 and r3 are not updated; r4 is updated to 3 because the 2nd VOQ of input 4 becomes empty.
The implementation complexity of EDRRM's switching fabric is identical to that of DRRM. Since the operational steps and data exchange are limited, the DRRM arbitration mechanism can be implemented in a distributed manner to make the switch simpler and more scalable. The length of each control message in DRRM is only 1/N of that in iSLIP. In [12] it is shown that by using a token-tunneling technique a switch capacity of more than one terabit/sec is achievable with existing electronic technology. The ORM of EDRRM is simpler than that of DRRM. Only one queue, with a buffer size equal to the maximum packet size, is maintained in the ORM of an EDRRM switch since cells belonging to the same packet are served sequentially from a VOQ. Usually, as in DRRM and iSLIP, since cells of different packets are interleaved, N queues are needed in each ORM, one for each input. In the next section, we will show that EDRRM has performance comparable with DRRM and iSLIP under uniform independent traffic, and has better performance under bursty traffic and nonuniform traffic.
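The request and grant steps of EDRRM, together with the input pointer update rules, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: ports are numbered from 0 rather than 1, the names `edrrm_slot` and `figure1_example` are hypothetical, and the VOQ occupancies and the initial values of g1 and g4 in the replay of the Figure 1 example are assumed, since the text does not give them.

```python
from collections import deque


def edrrm_slot(voqs, r, g):
    """Run one EDRRM time slot on an N x N switch (0-indexed ports).

    voqs[i][j] is a deque of cells queued at input i for output j.
    r[i] is the arbiter pointer of input i, g[j] that of output j.
    Both pointer lists are updated in place.
    Returns the sorted list of (input, output) pairs matched in this slot.
    """
    n = len(voqs)

    # Step 1 (Request): each input scans its VOQs round-robin from r[i]
    # and requests the output of the first nonempty VOQ it finds.
    requests = {}  # output -> list of requesting inputs
    chosen = {}    # input -> requested output
    for i in range(n):
        for step in range(n):
            j = (r[i] + step) % n
            if voqs[i][j]:
                chosen[i] = j
                requests.setdefault(j, []).append(i)
                break

    # Step 2 (Grant): each output grants the requester that comes first in
    # round-robin order from g[j]; g[j] then stays at the granted input.
    granted = {}  # input -> output
    for j, inputs in requests.items():
        winner = min(inputs, key=lambda i: (i - g[j]) % n)
        g[j] = winner
        granted[winner] = j

    # Cell transfer and input pointer update.
    for i, j in chosen.items():
        if i in granted:
            voqs[i][j].popleft()      # transfer one cell
            if voqs[i][j]:
                r[i] = j              # VOQ still nonempty: stay on it
            else:
                r[i] = (j + 1) % n    # VOQ exhausted: move on
        else:
            r[i] = (j + 1) % n        # no grant: try the next VOQ
    return sorted(granted.items())


def figure1_example():
    """Replay the Figure 1 time slot; g[0] and g[3] are assumed values."""
    n = 4
    voqs = [[deque() for _ in range(n)] for _ in range(n)]
    voqs[0][0].extend(["cell", "cell"])  # input 1, VOQ 1: stays nonempty
    voqs[1][2].append("cell")            # input 2 requests output 3
    voqs[2][2].extend(["cell", "cell"])  # input 3, VOQ 3: stays nonempty
    voqs[3][1].append("cell")            # input 4, VOQ 2: empties this slot
    r = [0, 2, 2, 1]                     # r1..r4 = 1, 3, 3, 2 (1-indexed)
    g = [1, 3, 2, 0]                     # g2 = 4 and g3 = 3 as in the text
    matches = edrrm_slot(voqs, r, g)
    return matches, r, g
```

With these assumed occupancies, `figure1_example()` matches input 1 to output 1, input 3 to output 3 and input 4 to output 2, and leaves the input pointers at 1, 4, 3, 3 in the 1-indexed numbering of the text, which agrees with the walk-through of Figure 1.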
B. Review of the simulated performance of EDRRM
In this subsection we will briefly review the simulation results available in [14] for the EDRRM switch.
According to simulation results the throughput of EDRRM under uniform and i.i.d. traffic is close to, but not quite, 100%. For switches with size not larger than 32, the throughput first decreases and then increases with switch size. It is conjectured that for larger N the throughput will approach 100% asymptotically. This conjecture is further supported by the analysis in this paper. While this is certainly a weakness of EDRRM as compared to DRRM and iSLIP, we believe that this is an acceptable tradeoff given its performance advantages under more typical traffic loadings.
The throughput of EDRRM has been simulated under four nonuniform traffic patterns and compared to those of DRRM and iSLIP in [14]. While the throughputs of DRRM and iSLIP drop, EDRRM leads to high throughput, which is always higher than or close to 90%. In some extreme traffic patterns, unfairness may occur for an EDRRM switch when one input occupies an output for a long period and cells from other inputs destined to the same output cannot get through. To avoid unfairness, a limit on the maximum number of cells or packets that can be served continuously in a VOQ can be enforced by means of a counter. With a VOQ cell service limit the unfairness can be efficiently avoided and for other traffic patterns the performance of an EDRRM switch does not differ much from the performance of EDRRM with no VOQ cell service limit.
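The counter-based service limit can be illustrated with a small input-side sketch. It is not the mechanism of [14], whose details are not given here: the class name `ServiceLimiter`, the parameter `limit` and the method `on_cell_sent` are hypothetical, the limit is counted in cells rather than packets, and how the output arbiter learns that the service period has ended is left out.

```python
class ServiceLimiter:
    """Caps how many cells one VOQ may send in a row (hypothetical helper)."""

    def __init__(self, limit):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.served = 0  # cells sent in the current service period

    def on_cell_sent(self, voq_empty_after):
        """Return True if the input pointer should stay at the current VOQ."""
        self.served += 1
        if voq_empty_after or self.served >= self.limit:
            self.served = 0   # service period ends, counter restarts
            return False      # pointer moves on to the next VOQ
        return True
```

With `limit` set very large, the helper reduces to plain exhaustive service: the pointer leaves a VOQ only when that VOQ becomes empty.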
The performance of iSLIP and DRRM are roughly comparable [13]. Under uniform and i.i.d. traffic, the average cell delay of an EDRRM switch under a heavy load is larger than that of a DRRM switch. This is due to the more efficient DRRM scheduling mechanism under uniform, heavy traffic.
Since all the cells arriving within the same burst will be served continuously, EDRRM is not sensitive to bursty traffic. Under uniform and geometrically distributed bursty traffic, with the same average burst length, the average delay of DRRM is much larger than that of EDRRM under heavy load. The average delay of a DRRM switch increases approximately linearly with burst length, which is similar to the behavior of an EDRRM switch under light load. Significantly, under heavy load the average delay of an EDRRM switch does not change much with different average burst lengths and is much smaller than that of a DRRM switch for long burst lengths. Furthermore, the cell delay for EDRRM is less sensitive to switch size than DRRM for bursty traffic. As the switch size increases the average cell delay of a DRRM switch grows rapidly, while the average cell delay of an EDRRM switch grows more slowly.
DRRM and EDRRM are fixed-length switching algorithms. In a fixed-length packet switch, variable-length IP packets are segmented into fixed-length cells at the inputs. The delay a cell suffers before it is reassembled into a packet and delivered to its destination includes the cell delay discussed in the last subsection and the waiting time at the output reassembly buffer. The average packet delay performance under uniform i.i.d. Bernoulli arrivals for the DRRM, iSLIP and EDRRM switches is simulated in [15]. Simulation results show that the average packet delay of EDRRM is always comparable with or smaller than that of DRRM when the switch size is larger than 4, and the average packet delays of DRRM and iSLIP are similar to each other. For an EDRRM switch, packet delay increases with packet length under light load, while under heavy load the average packet delays for packets with different sizes are similar. On the other hand, in a DRRM switch the average packet delay increases linearly with the packet size.
III. PERFORMANCE ANALYSIS
In this section we will analyze the delay performance of an EDRRM switch under uniform traffic by using an exhaustive random polling system model.
Since an EDRRM switch is symmetric under uniform arrivals and all the inputs will have the same performance, we will consider one input, say input 0, without loss of generality. After one VOQ is served and becomes empty, the input pointer will keep moving in a fixed order until a free output grants the request from this input followed by the transfer of all the cells in the corresponding VOQ. This is similar to an exhaustive service polling system with N stations. After all the cells in one station (VOQ) are served, the server switches to another station and starts a new service. Since the pointer will not stay at a VOQ if the request is not granted, the service order of the VOQs is not fixed, which we will approximate by a random polling system [18], where the next station polled is determined according to some random criterion.
We say that an input (or output) is busy at the beginning of a time slot if in the last time slot this input (or output) was matched with an output (or input) and the corresponding VOQ was not empty by the end of the last time slot. Otherwise we say that an input (or output) is free at the beginning of a time slot.
To simplify the model, we consider the system as a fully symmetric random polling system in which the arrival process to each VOQ is independent and identically distributed. We assume that all station VOQs have the same probability of selection for service after a VOQ is served. This is not in general true because the criterion to determine the next VOQ polled is not memoryless. The input arbiter will check the VOQs in a fixed order to send out a request beginning from the last served VOQ, and the requested output will check inputs for a grant in a fixed order beginning from the last served input. However, an examination of simulation runs indicated that this is a reasonable assumption.
The time for the server to transfer all the cells in a VOQ is the service period. After a VOQ is served, the server will switch over to another VOQ and start service. The time taken for the server to switch from one VOQ after service completion to another VOQ for a new service period is the switch over time. Specifically, suppose one VOQ of input 0 is served and becomes empty by the end of time slot t - 1, then in time slot t input 0 begins to search for a new input output matching. In time slot t + n an output gives a grant and the new service starts. Then the switch over period is from time slot t to t + n, and the switch over time is n.
A. Average switch over time 
During a switch over period, the input arbiter pointer moves to a nonempty VOQ and sends a request to the corresponding output. If the output is free, and the input is the first one in a fixed order among all the inputs sending requests to this output, the request will be granted and the switch over period ends. Otherwise, the pointer will move to the next VOQ and repeat this process.
We make the following assumptions:
1) Pointer Randomization Assumption: each input has an equal chance of being pointed to by an output pointer, and each VOQ has an equal chance of being pointed to by an input pointer; and
2) Memoryless Assumption: each output has the same probability of being free (with one exception).
The exception to the second assumption is as follows.
Suppose VOQ k of input 0 was just served and became empty by the end of time slot t - 1. In time slot t, suppose only one output is free, then this output must be output k that has just been released by input 0. Since, under the heavy load traffic assumption, we assume at least one VOQ of input 0 is nonempty, input 0 sends a request for the next busy VOQ to the corresponding output. Since this output is busy and cannot grant the request, input 0 will send a request for the next busy VOQ in time slot t + 1. If no other output is released and output k is always the only free one, the same thing will happen in each time slot until the input pointer returns to VOQ k. In this time slot two alternatives can happen. If VOQ k has new arrivals after its last service, the input pointer stays at VOQ k and VOQ k gets service again. Or if VOQ k is still empty, the pointer skips it and directly moves to the next busy VOQ. Then a new cycle begins. We call the period during which the output that has just been released is the only free output an inefficient period. It will terminate when at least one other output becomes free so that other VOQs of input 0 have a chance to get service.
In contrast to the inefficient period, we name the period from the first time slot in which more than one output is free to the time slot just before input 0 gets a new service as an efficient period. During this period there is a higher probability of forming a stable, longer lasting matching.
During an inefficient period, the same VOQ can get service several times in succession if it has new arrivals. But these services are typically very short compared to both the service period after an efficient period and the time without service during an inefficient period. The average number of cells available for one service in an inefficient period is at most the product of the arrival rate of one VOQ and N - 1. For uniform traffic, this value is always less than one. To simplify the analysis, we can omit the services during inefficient periods and consider them part of a switch over time. A switch over period can be an efficient period (if more than one output is free at time slot t) or an inefficient period followed by an efficient period (if only one output is free at time slot t).
We define X and Y as the length of an efficient period and the length of an inefficient period, respectively, and m as the number of free outputs in a time slot. Then the switch over time is

$$ S = \begin{cases} X, & \text{if } m > 1 \text{ in time slot } t \\ X + Y, & \text{if } m = 1 \text{ in time slot } t, \end{cases} \tag{1} $$

and

$$ E(S) = E(X) + E(Y)\,P(m = 1 \text{ in time slot } t). \tag{2} $$

We define p as the probability that an input slot has an arrival. For symmetric stable traffic, p is also the probability that one output or input is busy in a time slot. In time slot t, m = 1 means that all other inputs are busy, so that $P(m = 1 \text{ in time slot } t) = p^{N-1}$.
We first consider X. Suppose X begins at t and the next new service starts at t + n, then

$$ P(X = n) = \begin{cases} q, & n = 0 \\ (1-q)(1-Q)^{n-1} Q, & n > 0, \end{cases} \tag{3} $$

where q is the probability that input 0 gets a grant in time slot t, and Q is the probability that input 0 gets a grant in time slot t + j, j > 0. Note that m > 1 in time slot t and m > 0 in time slot t + j. Q can be expressed as

$$ Q = \sum_{m=1}^{N} P(\text{input 0 gets a grant and } m \text{ inputs are free in this time slot}) = \sum_{m=1}^{N} P(\text{input 0 gets a grant} \mid m \text{ inputs are free})\, P(m \text{ inputs are free}). \tag{4} $$

We already know that input 0 is free, so that

$$ P(m \text{ inputs are free}) = \binom{N-1}{m-1} p^{N-m} (1-p)^{m-1}. \tag{5} $$

The fact that m inputs are free in a time slot means that m - 1 other inputs along with input 0 are sending requests, while m outputs are free. The probability that the output requested by input 0 is free is $\frac{m}{N}$. If there are i other inputs also sending requests to the same output as input 0, the chance that input 0 wins is $\frac{1}{i+1}$. The probability that i other inputs request the same output as input 0 is $\binom{m-1}{i} (1-w)^{m-1-i} w^{i}$, where w is the probability that an input requests the same output as input 0. Therefore,

$$ P(\text{input 0 gets a grant} \mid m \text{ inputs are free}) = \frac{m}{N} \sum_{i=0}^{m-1} \frac{1}{i+1} \binom{m-1}{i} (1-w)^{m-1-i} w^{i}. $$

If an input requests the same output as input 0 does, the corresponding VOQ must be nonempty. A lower bound on the probability that a VOQ is nonempty is $\frac{p}{N}$. If there are k other nonempty VOQs, then the probability that this VOQ is selected by the input arbiter is $\frac{1}{k+1}$. $\binom{N-1}{k} (1 - \frac{p}{N})^{N-k-1} (\frac{p}{N})^{k}$ is the probability that k of the other N - 1 VOQs are nonempty. Therefore,

$$ w = \frac{p}{N} \sum_{k=0}^{N-1} \frac{1}{k+1} \binom{N-1}{k} \left(1 - \frac{p}{N}\right)^{N-k-1} \left(\frac{p}{N}\right)^{k} = \frac{1}{N} \left[ 1 - \left(1 - \frac{p}{N}\right)^{N} \right]. $$

Then

$$ Q = \sum_{m=1}^{N} \frac{m}{N} \binom{N-1}{m-1} p^{N-m} (1-p)^{m-1} \sum_{i=0}^{m-1} \binom{m-1}{i} (1-w)^{m-i-1} w^{i} \frac{1}{i+1} = \frac{1}{Nw} \left[ 1 - (1-w)\bigl(1 - w(1-p)\bigr)^{N-1} \right]. \tag{7} $$

Similarly, in time slot t input 0 and m - 1 other inputs request new matches. The output just released by input 0 is free and input 0 requests another output in time slot t. Therefore,

$$ q = \sum_{m=2}^{N} P(\text{input 0 gets a grant and } m-1 \text{ other inputs are free in time slot } t \mid \text{at least one other input is free}) $$
$$ = \sum_{m=2}^{N} \frac{P(\text{input 0 gets a grant} \mid m \text{ inputs are free})\, P(m \text{ inputs are free})}{P(\text{at least one other input is free})} $$
$$ = \frac{1}{w (1 - p^{N-1})(N-1)} \left[ 1 - (1-w)\bigl(1 - w(1-p)\bigr)^{N-1} + \frac{\bigl(1 - w(1-p)\bigr)^{N} - 1}{N(1-p)} \right]. \tag{8} $$

From (3), we get

$$ E(X) = \sum_{n=1}^{\infty} n (1-q)(1-Q)^{n-1} Q = \frac{1-q}{Q}. \tag{9} $$

We next consider Y. Suppose Y begins at t, and during time period [t, t + n - 1] only output k is free and at time t + n more than one output is free. Then

$$ P(Y = n) = p^{n(N-1)} (1 - p^{N-1}), \tag{10} $$

and

$$ E(Y) = \sum_{n=1}^{\infty} n\, p^{n(N-1)} (1 - p^{N-1}) = \frac{p^{N-1}}{1 - p^{N-1}}. \tag{11} $$

From (2), (9) and (11), we get

$$ E(S) = \frac{1-q}{Q} + \frac{p^{2(N-1)}}{1 - p^{N-1}}. \tag{12} $$

For $E(S^2)$, we will not show the details of derivation, but only the final expression,

$$ E(X^2) = \frac{1-q}{Q} \left[ \frac{2(1-Q)}{Q} + 1 \right], \tag{13} $$

$$ E(Y^2) = \frac{p^{N-1}}{1 - p^{N-1}} \left( \frac{2 p^{N-1}}{1 - p^{N-1}} + 1 \right), \tag{14} $$

$$ E(S^2) = E(X^2) + p^{N-1} \bigl( E(Y^2) + 2 E(X) E(Y) \bigr). \tag{15} $$

B. Average delay
In [18] the delay of a random polling system is analyzed. For a fully symmetric system, using the notation in [18], the average delay for a cell is described as

$$ E(T) = \frac{\delta^2}{2r} + \frac{\sigma^2}{2(1 - N\mu)\mu} + \frac{N r (1-\mu)}{2(1 - N\mu)} + \frac{(N-1) r}{2(1 - N\mu)}, \tag{16} $$

where $\mu$ is the arrival rate for one VOQ, $\sigma^2$ is the variance of the arrival process for one VOQ, and $r = E(S)$, $\delta^2 = Var(S) = E(S^2) - E^2(S)$. For each VOQ, under i.i.d. Bernoulli traffic, $\mu = \frac{p}{N}$ and $\sigma^2 = \frac{p}{N}(1 - \frac{p}{N})$.
Our system is not exactly the same as a random polling system in [18]. In a random polling system, after one station is served, the server may switch to an empty station which leads to a service period of zero length, following which a new switch over period begins. However, in our system only nonempty VOQs are considered. An input only requests service for a nonempty VOQ, so that the length of a service period is always larger than zero. Therefore, when using the less efficient random polling system we expect the delay to be overestimated for light traffic load. The analysis is more accurate in predicting the performance under heavy traffic load when VOQs are less likely to be empty. Also, as the switch size increases, the analysis will approximate the system better since the pointer randomization assumption and the memoryless assumption are closer to reality.
Figure 2 is the comparison of the analysis result of the average delay E(T) and simulation result. The analysis result is quite close to the simulation result when the load is heavy, and is larger than the simulation result when the load is light. The reason for the difference under light load is described above.
Fig. 2. The average delay of EDRRM with different switch sizes
C. The Performance When N Is Large
When N goes to infinity, we will show that both the average switch over time and its second moment converge to a limit. We will also show that for large N the average delay is a function of N and p which always has a finite value for all p < 1.
Since w goes to $\frac{1}{N}(1 - e^{-p})$ and $(1 - w(1-p))^{N}$ goes to $e^{-(1-p)(1-e^{-p})}$ when N is large, from (7) and (8) we get
$$ \lim_{N \to \infty} Q = \lim_{N \to \infty} q = \frac{1 - e^{-(1-p)(1-e^{-p})}}{1 - e^{-p}}. \tag{17} $$

Therefore,

$$ \lim_{N \to \infty} E(S) = \frac{1 - e^{-p}}{1 - e^{-(1-p)(1-e^{-p})}} - 1. \tag{18} $$

Also,

$$ \lim_{N \to \infty} E(S^2) = \lim_{N \to \infty} \bigl( 2E^2(S) + E(S) \bigr). \tag{19} $$

Similarly, for large N, it can be shown that

$$ E(T) \to E(S)\frac{N-p}{1-p} + \frac{2-p}{2(1-p)} \approx E(S)\frac{N}{1-p}. \tag{20} $$

Since E(S) has a finite limit, $E(S^2)$ also has a finite limit for p < 1. Similarly, E(T) is linear in N when N is large, and finite for p < 1. These results support our conjecture that the switch throughput approaches 100% for large N under uniform traffic.
Fig. 3. The average switch over time of EDRRM with different switch sizes
Figure 3 shows the average switch over time for 4 different switch sizes and its limit, compared to simulation results. It can be seen that when the switch size is 1024, the average switch over time is almost identical to the limit.
Figure 4 shows the calculated average delay E(T) of 4 switches of large size.
Fig. 4. The average delay for large switch sizes
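The closed-form expressions of this section are easy to evaluate numerically, which is how the analysis can stand in for simulations of very large switches. The sketch below is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, and the formulas used are written out in the docstrings as they are read here from the analysis (w, Q, q, the moments of S, the delay E(T) with Bernoulli arrivals at rate p/N per VOQ, and the large-N limit of E(S)). `grant_prob_Q_sum` re-evaluates Q term by term from the sum over m so that the closed form can be cross-checked.

```python
import math
from math import comb


def w_prob(n, p):
    """w = (1/N) * (1 - (1 - p/N)**N)."""
    return (1.0 - (1.0 - p / n) ** n) / n


def grant_prob_Q(n, p):
    """Q = (1/(N w)) * (1 - (1 - w) * (1 - w(1 - p))**(N - 1))."""
    w = w_prob(n, p)
    return (1.0 - (1.0 - w) * (1.0 - w * (1.0 - p)) ** (n - 1)) / (n * w)


def grant_prob_Q_sum(n, p):
    """Q evaluated term by term from the sum over m, for cross-checking."""
    w = w_prob(n, p)
    total = 0.0
    for m in range(1, n + 1):
        p_m = comb(n - 1, m - 1) * p ** (n - m) * (1.0 - p) ** (m - 1)
        win = sum(comb(m - 1, i) * (1.0 - w) ** (m - 1 - i) * w ** i / (i + 1)
                  for i in range(m))
        total += (m / n) * win * p_m
    return total


def grant_prob_q(n, p):
    """q: grant probability in the first slot of a switch over period."""
    w = w_prob(n, p)
    a = 1.0 - w * (1.0 - p)
    bracket = 1.0 - (1.0 - w) * a ** (n - 1) + (a ** n - 1.0) / (n * (1.0 - p))
    return bracket / (w * (1.0 - p ** (n - 1)) * (n - 1))


def switch_over_moments(n, p):
    """Return (E(S), E(S^2)) built from the moments of X and Y."""
    Q, q = grant_prob_Q(n, p), grant_prob_q(n, p)
    rho = p ** (n - 1)                      # P(m = 1 in time slot t)
    ex = (1.0 - q) / Q                      # E(X)
    ex2 = ex * (2.0 * (1.0 - Q) / Q + 1.0)  # E(X^2)
    ey = rho / (1.0 - rho)                  # E(Y)
    ey2 = ey * (2.0 * rho / (1.0 - rho) + 1.0)  # E(Y^2)
    return ex + rho * ey, ex2 + rho * (ey2 + 2.0 * ex * ey)


def average_delay(n, p):
    """E(T) with mu = p/N and sigma^2 = mu * (1 - mu) (Bernoulli arrivals)."""
    r, s2 = switch_over_moments(n, p)
    delta2 = s2 - r * r                     # Var(S)
    mu = p / n
    sigma2 = mu * (1.0 - mu)
    den = 2.0 * (1.0 - n * mu)
    return (delta2 / (2.0 * r) + sigma2 / (den * mu)
            + n * r * (1.0 - mu) / den + (n - 1) * r / den)


def limit_switch_over_time(p):
    """Large-N limit of E(S): (1 - e^-p) / (1 - e^-((1-p)(1-e^-p))) - 1."""
    a = 1.0 - math.exp(-p)
    return a / (1.0 - math.exp(-(1.0 - p) * a)) - 1.0
```

For example, `switch_over_moments(1024, 0.9)[0]` can be compared with `limit_switch_over_time(0.9)` to see how close a 1024-port switch is to the limit, and `average_delay(n, p)` can be swept over p for several values of n to produce curves of the kind described for Figures 2 and 4.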
IV.  CONCLUSIONS 
The EDRRM algorithm is a variation of the DRRM scheduling algorithm. The implementation complexity of EDRRM's switching fabric is the same as that of DRRM, while packet reassembly is simpler than most other popular matching schemes. In an EDRRM switch when an input is matched with an output all the cells in the corresponding VOQ are served continuously before any other VOQ of the same input can be served. The average cell delay of an EDRRM switch is analyzed by using an exhaustive random polling system model in this paper. The performance of an EDRRM switch is comparable to or better than a DRRM switch or an iSLIP switch for most traffic scenarios. Under uniform i.i.d. traffic, an EDRRM switch has a larger average cell delay than a DRRM switch, but its average packet delay is lower and not sensitive to either switch size or packet size [14]. Furthermore, in [14] we showed that EDRRM is not sensitive to traffic burstiness. Under nonuniform traffic the throughputs of a DRRM switch and an iSLIP switch drop well below 100%, while the throughput of an EDRRM switch is closer to 100%.
REFERENCES 
[1] M. J. Karol, M. Hluchyj, and S. Morgan, "Input vs. output queuing on a space-division packet switch," Proc. GLOBECOM 1986, pp. 659-665.
[2] M. J. Karol, M. Hluchyj, and S. Morgan, "Input versus output queuing on a space-division packet switch," IEEE Trans. on Communications, vol. 35, pp. 1347-1356, 1987.
[3] L. Tassiulas, A. Ephremides, "Stability properties of constrained queueing systems and scheduling for maximum throughput in multihop radio networks," IEEE Trans. Automatic Control, Vol. 37, No. 2, pp. 1936-1949.
[4] N. McKeown, V. Anantharam, and J. Walrand, "Achieving 100% throughput in an input-queued switch," IEEE INFOCOM '96, pp. 296-302.
[5] N. McKeown, A. Mekkittikul, V. Anantharam and J. Walrand, "Achieving 100% throughput in an Input-Queued switch," IEEE Trans. Communications, vol. 47, No. 8, pp. 1260-1267, Aug. 1999.
[6] N. McKeown, "Scheduling algorithms for input-queued cell switches", Ph.D. Thesis, UC Berkeley, May 1995.
[7] A. Charny, P. Krishna, N. Patel and R. Simcoe, "Algorithms for providing bandwidth and delay guarantees in Input-Buffered crossbars with speedup", IWQOS '98, May 1998.
[8] P. Krishna, N. S. Patel, A. Charny and R. Simcoe, "On the speedup required for work-conserving crossbar switches", IWQOS '98, May 1998.
[9] A. Mekkittikul and N. McKeown, "A practical scheduling algorithm to achieve 100% throughput in input-queued switches", IEEE INFOCOM 98, Vol. 2, pp. 792-799, April 1998.
[10] T. E. Anderson, S. S. Owicki, J. B. Saxe and C. P. Thacker, "High speed switch scheduling for local area networks," ACM Trans. on Computer Systems, vol. 11, No. 4, pp. 319-352, Nov. 1993.
[11] N. McKeown, "The iSLIP scheduling algorithm for Input-Queued switches", IEEE/ACM Trans. Networking, vol. 7, pp. 188-201, April 1999.
[12] H. J. Chao, "Saturn: a terabit packet switch using Dual Round-Robin", IEEE Communication Magazine, vol. 38, 12, pp. 78-84, Dec. 2000.
[13] Y. Li, S. Panwar, H. J. Chao, "On the performance of a Dual Round-Robin switch," IEEE INFOCOM 2001, vol. 3, pp. 1688-1697, April 2001.
[14] Y. Li, S. Panwar, H. J. Chao, "The Dual Round-Robin Matching switch with exhaustive service."
[15] M. A. Marsan, A. Bianco, P. Giaccone, E. Leonardi, F. Neri, "Packet Scheduling in Input-Queued Cell-Based Switches," IEEE INFOCOM 2001, vol. 2, pp. 1085-1094, April 2001.
[16] H. Takagi, "Queueing analysis of polling models: an update," Stochastic Analysis of Computer and Communication Systems, New York: Elsevier Science and B. V. North Holland, pp. 267-318, 1990.
[17] C-S. Chang, D. Lee and Y. Jou, "Load balanced Birkhoff-von Neumann switches, part I: one-stage buffering," Special issue of Computer Communications on "Current Issues in Terabit Switching," 2001.
[18] L. Kleinrock, H. Levy, "The analysis of random polling systems," Operations Research, Vol. 36, No. 5 (September-October), pp. 716-732, 1988.