Birkhoff-von-Neumann Switches with Deflection-Compensated Mechanism by Zhang, Jinghui et al.
arXiv:1308.4280v1 [cs.NI] 20 Aug 2013
Birkhoff-von-Neumann Switches with
Deflection-Compensated Mechanism
Jinghui Zhang, Tong Ye, Member, IEEE, Tony T. Lee, Fellow, IEEE, Fangfang Yan, and Weisheng
Hu, Member, IEEE
Abstract—Despite the high throughput and low complexity
achieved by input scheduling based on Birkhoff-von-Neumann
(BvN) decomposition, the performance of the BvN switch becomes
less predictable when the input traffic is bursty. In this
paper, we propose a deflection-compensated BvN (D-BvN) switch
architecture to enhance the quasi-static scheduling based on BvN
decomposition. The D-BvN switches provide capacity guarantee
for virtual circuits (VCs) and deflect bursty traffic when overflow
occurs. The deflection scheme is devised to offset the excessive
buffer requirement of each VC when input traffic is bursty.
The design of our conditional deflection mechanism is based on
the fact that it is unlikely that the traffic input to VCs is all
bursty at the same time; most likely some starving VCs have
spare capacities when some other VCs are in the overflow state.
The proposed algorithm makes full use of the spare capacities
of those starving VCs to deflect the overflow traffic to other
inputs and provide bandwidth for the deflected traffic to re-access
the desired VC. Our analysis and simulation show that this
deflection-compensated mechanism enables BvN switches to
achieve close to 100% throughput of offered load even with bursty
input traffic, and reduces the average end-to-end delay and delay
jitter. Also, our result indicates that the packet out-of-sequence
probability due to deflection of overflow traffic is negligible, thus
only a small re-sequencing buffer is needed at each output port.
Index Terms—input-queued switch, Birkhoff von-Neumann
switch, scheduling, deflection, burst traffic.
I. INTRODUCTION
It is well known that the throughput of input-queued switches is limited by head-of-line (HOL) blocking. At
each time-slot, a traffic scheduler is necessary to configure
a connection pattern for the switch fabric to avoid output
contentions and maximize the throughput. A good scheduler
can improve the system throughput and reduce the end-to-
end delay, and should have a low computation complexity.
Recently, a number of scheduling algorithms [1]–[24] have
been proposed for this purpose.
A class of on-line algorithms [4], [8]–[11], such as iSLIP
and dual round-robin matching (DRRM), were devised to
compute the input/output paths and rearrange the connection
pattern of the switching fabric on the fly according to the
requests of input packets. Although these on-line algorithms
are highly efficient and can achieve 100% throughput for uniform
input traffic, they are difficult to scale when the number
of ports N and the line rate are both very large, because these
real-time contention resolution algorithms require computation
on a slot-by-slot basis, and typically have a complexity on the
order of O(N log N) [6].
This work was supported by the National Science Foundation of China
(61001074, 61172065, and 60825103), Qualcomm corporation foundation,
973 program (2010CB328205, 2010CB328204), and Shanghai 09XD1402200.
The authors are with the State Key Laboratory of Advanced Optical Communication
Systems and Networks, Shanghai Jiao Tong University, Shanghai
200030, China. (e-mail: {zjhrzbb, yetong, ttlee, yff, wshu}@sjtu.edu.cn)
To avoid real-time computation while providing the capacity
guarantee for each input/output (I/O) pair, or virtual circuit
(VC), a quasi-static scheduling algorithm, called path switch-
ing, was proposed in [3], which was later called Birkhoff-von
Neumann (BvN) switching in [25]. This scheduling algorithm
guarantees the capacity assigned for each VC by the repeated
executions of a set of predetermined connection patterns,
which are calculated from the average loading of all VCs
subject to the fixed total switching capacity. The scalability of
input-queued switches can be significantly enhanced because
this scheduling algorithm does not compute the connection
patterns on the fly. Furthermore, results in [7] show that the
operation overheads, including the computation complexity
and memory requirement, can be substantially reduced by the
careful design of the switching structure. It was shown in [26]
that if the input traffic of each VC is deterministically bounded
by a so-called arrival curve [27], this quasi-static algorithm
can also achieve 100% throughput and bounded end-to-end
delay. However, the performance of the system becomes less
predictable when the input traffic is bursty, mainly due to the
explosion of multimedia applications. It is reported in [28] that
the input traffic at a core router can fluctuate drastically around
its mean within a relatively short period of time, though the
aggregated traffic is statistically stable in a longer term. In
this paper, we propose a compensation mechanism for BvN
switches based on the deflection of overflow traffic to cope
with the burstiness of arrivals.
A. Previous work on load-balancing and packet out-of-sequence
problems in BvN switches
A two-stage load-balanced BvN (LB-BvN) crossbar switch
was proposed in [1], [2] to deal with such unpredictable
traffic fluctuations. In the first stage, the traffic is evenly
distributed over all input ports of the second stage, while a
set of N circular-shift permutations is periodically running in
the second stage to switch the packets to their destinations.
Thus, it is essentially a round-robin algorithm that can cope
with any traffic conditions [6], [29], and only has a complexity
of order O(1). However, immoderate traffic distribution in the
first stage introduces a severe packet out-of-sequence problem
at the output ports in the LB-BvN switch, even when the input
traffic is quite smooth.
One way to fix the out-of-sequence problem is to reorder
packets in a so-called re-sequencing buffer at each output port.
A Byte-Focal switch is proposed in [19] to solve the out-of-
sequence problem by using the path of each packet along the
switch. Though the computation complexity is low, this switch
design requires a set of N^2 resequencing queues, called virtual
input queues in [19], at each output port, and the end-to-end
delay is on the order of O(N^2) [20]. The scheme described in
[15] tried to cope with the out-of-sequence problem by using
a three-stage load-balancing switch, which consists of an LB-
BvN switch followed by a set of N memory banks as well as
an additional switch. This design requires additional hardware
and global information exchange for memory reservation in
practical implementation [16].
Yet another approach to tackle this problem is to prevent
packet mis-sequencing in advance. The first effort was made
in [2], where two policies, first-come-first-served (FCFS) and
earliest deadline first (EDF), were introduced to manage the
central buffers to bound packet out-of-sequence at output
buffers. However, the FCFS policy requires the central buffers
to speed up N times, while the EDF policy needs to check
the timestamps of all packets at each time slot, which makes
the system unscalable [12], [14]. To reduce the complexity,
a class of frame-based algorithms is proposed in [5], [6],
[14], [21], which collect a set of packets of the same flow
in either the input buffer or the central buffer so that these
packets can be scheduled consecutively as a frame. Obviously,
the switches that adopted these frame-based algorithms are
not work-conserving [6]. Furthermore, many existing frame-
based algorithms may suffer from various kinds of weak-
nesses. The Full-Frame First algorithm described in [6] has
poor scalability because it requires N^3 central buffers and
global information exchange. The Uniform Frame Spreading
algorithm proposed in [21] has a poor delay performance even
with light traffic as it takes time to accumulate the packets to
form a full-frame. The Padded Frame algorithm introduced in
[14] underutilizes the bandwidth since it pads dummy packets
into the frames to speed up their forwarding. Similarly,
due to the delivery of fake packets from input buffers to central
buffers, the Conservation-and-Reservation algorithm described
in [5] also cannot fully utilize the bandwidth.
Another kind of scheme is the mailbox switch proposed in
[12], which guarantees in-order packet delivery by tracking
the time of each packet departure from the central buffer.
However, potential packet contention at the central buffers
limits throughput. As mentioned in [5], the throughput of the
mailbox switch is low. An improved version of the mailbox
switch was the feedback-based LB-BvN switch reported in
[13], [16], [18]. Similar to mailbox switches, feedback-based
switches require timely feedback of the occupancy status of
the central buffers, which imposes a stringent requirement on
system implementation. To achieve 100% throughput in the
multi-cabinet case, it is shown in [16] that such a switch should
execute an on-line batch scheduling algorithm with speedup
capability at each input port.
In sum, the complexity or cost of reordering out-of-sequence
packets in the LB-BvN switch is quite significant. Actually,
packet re-sequencing is equivalent to the function performed
by a time-slot interchanger, which possesses the same com-
plexity as a space-division packet switch [30].
B. Our approach in this paper
In this paper, as mentioned above, we introduce a traffic
deflection mechanism to enhance quasi-static scheduling based
on BvN decomposition. Deflection can be considered as an
alternative buffering scheme, which has been widely used in
packet switching networks. Many packet switch architectures
effectively employ deflection routing for resolving contentions
among packets in a distributed manner. Examples include
the tandem-banyan networks [31], the dual-shuffle exchange
networks with error-correcting routing [32], and optical burst
switches [33]. Unlike these deflection routing algorithms, the
aim of our algorithm is to smooth the fluctuations of input
traffic, and to balance the loadings of VCs.
The deflection-compensated BvN (D-BvN) switches provide
capacity guarantee for VCs according to the average traffic
loading in the same manner as that of BvN switches. However,
in contrast to the immoderate traffic distribution in LB-BvN
switches, only bursty traffic that overflows from the VCs
will be deflected in D-BvN switches. The design of this
conditional deflection mechanism is based on the following
intrinsic properties of bursty input traffic:
• For each individual VC, bursty arrivals may sometimes
lead to buffer overflow, which implies that the same buffer
may be empty at some other time, leaving spare
capacity that is not fully utilized.
• It is unlikely that the traffic input to all VCs bursts at
the same time especially when the number of ports N is
large.
That is, during a short period of time, some starving VCs
have spare capacities while some other VCs are in the overflow
state. Our deflection algorithm makes full use of the spare
capacities of those starving VCs to cope with the overflow
traffic in the following manners:
• Some spare capacities act as dynamic buffers, which can
be used to deflect the overflow traffic to other inputs
such that the overflow traffic can re-access the switching
network.
• Remaining spare capacities provide bandwidth for the
deflected traffic to re-access the desired VC.
Our analysis and simulation show that this deflection
scheme can support BvN switches to achieve close to 100%
throughput of offered load, and reduce the average end-to-end
delay and the delay jitter. Although the deflection also leads to
packet out-of-sequence at the outputs, our result indicates that
the packet out-of-sequence probability caused by conditional
deflections is very small, thus only a small re-sequencing
buffer is needed at each output. This simple
deflection-compensated mechanism is therefore easy to implement
in practice, yet highly tolerant of bursty input traffic.
The rest of the paper is organized as follows. In Section
II, we briefly introduce the basic concept of BvN switches
and then discuss the drawbacks of the quasi-static scheduling.
Motivated by the discussion, we propose the D-BvN switch
architecture with the deflection-compensated scheduling algorithm.
[Fig. 1. Principle of BvN Switches: (a) a 4 × 4 BvN switch; (b) permutation matrices; (c) switch states scheduling over a frame of F slots.]
In Section III, the performance of D-BvN switches is
analyzed by using the Markov modulated fluid-flow model
of input traffic. We devised an ideal deflection approximation
technique to estimate the minimum Virtual Output Queue
(VOQ) size required to achieve close to 100% throughput
of offered load. In addition, a detailed comparison of the
analytical results with simulations is given in this section.
To facilitate the presentation, we put the fluid-flow analysis
of each VC of D-BvN switches in Appendix A and that of
BvN switches in Appendix B, and the simulation model in
Appendix C. Section IV is devoted to the analysis of average
end-to-end delay and delay jitter. Section V provides the
conclusion of this paper.
II. PRELIMINARIES AND OVERVIEW
In this section, we briefly introduce the basic concept of
BvN switches, and discuss the relevant limitation of this quasi-
static scheduling algorithm, which motivates the deflection-
compensated mechanism proposed in this paper.
An N × N BvN switch is plotted in Fig. 1. There are
a set of N virtual output queues (VOQs) in each input, in
which VOQij denotes the VOQ at input i to buffer packets
destined for output j. We assume that the switch fabric under
consideration is a crossbar switch to simplify our discus-
sion; the same principle can be extended to a three-stage
Clos network in a straightforward manner [3]. The quasi-
static scheduling algorithm of the BvN switch is based on
the Birkhoff-von Neumann decomposition theorem of doubly
stochastic matrices [34].
Given an input traffic matrix $[\lambda_{ij}]_{N \times N}$, where $\lambda_{ij}$ is the
average traffic rate from input i to output j, with $\sum_i \lambda_{ij} \le 1$ and
$\sum_j \lambda_{ij} \le 1$, let $[c_{ij}]_{N \times N}$ be a capacity matrix that satisfies the
set of constraints $\lambda_{ij} \le c_{ij}$ and $\sum_i c_{ij} = \sum_j c_{ij} = 1$, where
$c_{ij}$ is the capacity assigned to the I/O pair (i, j). According to
the Birkhoff-von Neumann theorem [34], the capacity matrix
can be decomposed into a sum of permutation matrices and
expressed as follows:
$$[c_{ij}]_{N \times N} = \sum_{i=1}^{M} \phi_i S_i, \tag{1}$$
where M is an integer no greater than $N^2 - 2N + 2$, $S_i$ is a permutation
matrix, and the weight $\phi_i > 0$ satisfies $\sum_{i=1}^{M} \phi_i = 1$.
Each permutation matrix Si represents a connection pattern
of the switch, as illustrated in Fig. 1(b). Within a frame
of F consecutive time slots, these predetermined connection
patterns are scheduled according to their weights in the BvN
switch. As illustrated by the example in Fig. 1(c), the frame size F
is selected to make every $F\phi_i$ an integer. In each time slot within a
frame, a scheduled permutation establishes N I/O connections
for sending packets. For instance, if the permutation S1 shown
in Fig. 1(b) is scheduled in the kth time slot in the frame, then
input 2 receives a token for sending a packet to output 1 in
every kth time slot of a frame. According to (1), the BvN
switch guarantees that the assigned capacity cij for each I/O
pair (i, j), or virtual circuit (VC), can be statistically satisfied
by this periodic token assignment scheme.
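The decomposition in (1) and the frame schedule can be sketched in a few lines of Python. This is only an illustrative sketch, not the construction used in [3] or [25]: it assumes a greedy decomposition that repeatedly extracts a perfect matching over the positive entries, and the function names (`find_perfect_matching`, `bvn_decompose`, `frame_schedule`) are hypothetical. Exact rational arithmetic is used so that every $F\phi_i$ can be checked to be an integer.

```python
from fractions import Fraction


def find_perfect_matching(matrix):
    """Return match[i] = j over positive entries, or None if none exists."""
    n = len(matrix)
    col_owner = [-1] * n  # col_owner[j] = row currently matched to column j

    def try_assign(i, seen):
        # Augmenting-path search (Kuhn's algorithm) for row i.
        for j in range(n):
            if matrix[i][j] > 0 and not seen[j]:
                seen[j] = True
                if col_owner[j] == -1 or try_assign(col_owner[j], seen):
                    col_owner[j] = i
                    return True
        return False

    for i in range(n):
        if not try_assign(i, [False] * n):
            return None
    match = [0] * n
    for j, i in enumerate(col_owner):
        match[i] = j
    return match


def bvn_decompose(capacity):
    """Greedily split a doubly stochastic matrix into (phi, permutation) terms."""
    n = len(capacity)
    residual = [[Fraction(x) for x in row] for row in capacity]
    terms = []
    while any(x > 0 for row in residual for x in row):
        perm = find_perfect_matching(residual)
        if perm is None:
            raise ValueError("matrix is not doubly stochastic")
        # The smallest matched entry is the weight; it zeroes at least one entry.
        weight = min(residual[i][perm[i]] for i in range(n))
        for i in range(n):
            residual[i][perm[i]] -= weight
        terms.append((weight, perm))
    return terms


def frame_schedule(terms, frame_size):
    """Repeat each permutation F * phi_i times within one frame of F slots."""
    schedule = []
    for weight, perm in terms:
        slots = weight * frame_size
        if slots.denominator != 1:
            raise ValueError("F * phi_i must be an integer")
        schedule.extend([perm] * int(slots))
    return schedule
```

For a capacity matrix whose entries are multiples of 1/10, calling `frame_schedule(bvn_decompose(capacity), 10)` yields a list of 10 permutations in which input i is connected to output j in exactly $10\,c_{ij}$ slots. The order of slots inside the frame is a free design choice that this sketch does not optimize.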
The Birkhoff-von Neumann decomposition only satisfies the
average rate of traffic, and the capacity assignments based on
(1) are not adaptive to the input traffic fluctuation of each
individual VC. Sometimes, new arrivals will overflow
from the VOQ if the buffer is full due to bursty input traffic.
However, the same VOQ could be empty at some other time,
and the assigned tokens for the corresponding VC will be
discarded. Thus, the main drawback of BvN switches is the
throughput degradation in the face of bursty traffic, which not
only increases the overflow probability but also underutilizes
the assigned capacity.
Since the VOQs of these VCs are independent of each other,
some VOQs may be full when some other VOQs are empty at
the same time, especially if the port number N of the switching
fabric is large. This suggests that some spare capacities, called
free tokens, of those idle VCs can be utilized to carry the traffic
overflow from other VCs to reduce the system loss rate and
improve the throughput of the switch. Based on this self-tuning
property, we propose a deflection-compensated BvN (D-BvN)
scheduling algorithm to relax the limitation of BvN switches.
[Fig. 2. Deflection process of an overflow packet.]
In a D-BvN switch, the traffic spilled from the saturated
VOQs is deflected to the VOQs at other inputs, such that the
overflow traffic can re-access the desired VCs. The free tokens
play two essential roles in a D-BvN switch: (i) to deflect
overflow traffic, and (ii) to provide re-access bandwidth for
overflow traffic. At each input port in the D-BvN switch, as
plotted in Fig. 2, a throttle buffer is installed to temporarily
store the traffic overflow from saturated VOQs before it is
deflected to other inputs. Furthermore, a feedback link is
provided between an output and its corresponding input to
re-access VCs. For instance, the traffic deflected to output j
has another chance to re-access the desired VCs at input j
through the feedback channel between these two ports. This
deflection process is illustrated by the example shown in Fig. 2,
in which we adopt the following notations:
$A_{ik}$: a newly arrived packet at input i that is destined for output k
$\mathrm{VC}_{ik}$: the VC between input i and output k
$\mathrm{VOQ}_{ik}$: the VOQ that buffers the traffic of $\mathrm{VC}_{ik}$
$\mathrm{TB}_i$: the throttle buffer at input i
The procedure of deflection is described as follows:
Step 1: If $\mathrm{VOQ}_{ik}$ is not full, $A_{ik}$ enters $\mathrm{VOQ}_{ik}$ and waits
for service by $\mathrm{VC}_{ik}$; otherwise, $A_{ik}$ joins $\mathrm{TB}_i$.
Step 2: When $A_{ik}$ becomes the HOL packet in $\mathrm{TB}_i$ and
$\mathrm{VC}_{ij}$ currently has a free token (i.e., $\mathrm{VOQ}_{ij}$ is
empty), $A_{ik}$ is deflected to output j via $\mathrm{VC}_{ij}$.
Step 3: If j = k, then $A_{ik}$ reaches its desired output;
otherwise, $A_{ik}$ is forwarded back to input j and Step 1
is repeated there.
It can be readily seen from the above deflection procedure
that the implementation of the D-BvN switch requires only
a marginal increase in hardware cost and computation
complexity.
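The three steps of the deflection procedure can be mirrored by a small toy model. This is a sketch under simplifying assumptions that are not taken from the paper: the throttle buffer is unbounded, feedback from output j to input j is instantaneous, and the class and method names (`DBvNSwitchSketch`, `arrive`, `token`) are hypothetical.

```python
from collections import deque


class DBvNSwitchSketch:
    """Toy model of the three-step deflection procedure (illustrative only)."""

    def __init__(self, n_ports, voq_size):
        self.n = n_ports
        self.k = voq_size
        # voq[i][k] buffers packets at input i destined for output k.
        self.voq = [[deque() for _ in range(n_ports)] for _ in range(n_ports)]
        # tb[i] is the throttle buffer at input i (assumed unbounded here).
        self.tb = [deque() for _ in range(n_ports)]
        self.delivered = []  # (packet_id, output) in departure order

    def arrive(self, i, packet):
        """Step 1: packet = (packet_id, destination) arrives at input i."""
        _, dest = packet
        if len(self.voq[i][dest]) < self.k:
            self.voq[i][dest].append(packet)
        else:
            self.tb[i].append(packet)

    def token(self, i, j):
        """Handle one slot in which the schedule connects input i to output j."""
        if self.voq[i][j]:
            # Normal service: the token is used by the VC's own traffic.
            packet = self.voq[i][j].popleft()
            self.delivered.append((packet[0], j))
            return
        if not self.tb[i]:
            return  # free token, but nothing is waiting to be deflected
        # Step 2: VOQ_ij is empty, so the free token deflects the HOL packet of TB_i.
        packet = self.tb[i].popleft()
        if packet[1] == j:
            # Step 3, case j = k: the packet has reached its desired output.
            self.delivered.append((packet[0], j))
        else:
            # Step 3, otherwise: feed the packet back to input j and repeat Step 1.
            self.arrive(j, packet)
```

With a VOQ size of 1, two packets "a" and "b" arriving back to back at input 0 for output 2 place "a" in the VOQ and "b" in the throttle buffer. A free token of VC(0, 1) then moves "b" to input 1, from which it may depart before "a". This is the kind of out-of-sequence event that the re-sequencing buffer at each output has to absorb.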
III. FLUID-FLOW MODEL OF D-BVN SWITCHES
In this section, the performance of the D-BvN switch is
analyzed using a fluid-flow model of input traffic. Our primary
goal is to investigate the effectiveness of the deflection-
compensated mechanism. Since the deflection by using spare
capacity is a scheme designed to offset the VOQ buffer
requirements, our analysis initially focuses on the trade-off
between the VOQ size and the spare capacity, and aims to esti-
mate the minimum VOQ requirement to achieve the maximum
throughput. A widely used approach in the throughput analysis
of packet switching systems is to decompose the multi-queue
system into independent FIFO queues, and treat each VOQ
buffer separately [35]–[37]. We adopt a similar approach and
make the following assumptions in our analysis:
• Port number N is large enough such that the deflection
scheme can effectively compensate for traffic fluctuations.
• To characterize the burstiness of input traffic, we assume
that the arrival process of fresh traffic at each VC is a
Markov modulated on-off fluid flow [30].
• All VCs in the D-BvN switch are statistically identical.
That is, all VCs have independent and identical arrival
processes, and they possess the same buffer size and equal
service capacity.
• From the homogeneity assumption, the flow conservation
law should hold for each VC in a D-BvN switch
with large port number N. That is, the overflow traffic
generated by a VC is equal to that deflected to this VC,
and the deflection traffic arriving at each VC is a constant
fluid flow.
Suppose the service rate for each input is normalized to
1, and the average arrival rate of fresh traffic is $\bar{\lambda}_p < 1$.
The parameters and notations used in this paper are listed as
follows for easy reference:
$C$: service capacity of each VC, $C = 1/N$
$\bar{\lambda}$: average input traffic rate for each VC, $\bar{\lambda} = \bar{\lambda}_p/N < C$
$\hat{\lambda}$: peak rate of fresh traffic in the on state, $\hat{\lambda} > C$
$\alpha$: transition rate of the on state
$\beta$: transition rate of the off state
$b$: burstiness of fresh input traffic, $b = 1/(\alpha + \beta)$
$\rho$: offered load of each VC, $\rho = \bar{\lambda}/C$
$K$: buffer size of VOQ for each VC
$\Delta$: average rate of the traffic overflow from each VC
$C_2$: average spare capacity of each VC
[Fig. 3. The burstiness of on-off fluid flow: on-off transitions over time t for a smaller and a larger burstiness b.]
Since the BvN decomposition is based on the mean traffic
loading, the capacity C assigned to each VC may not be fully
utilized because of the burstiness of the input traffic. When
the VOQ buffer is empty, the tokens generated according to
the assigned capacity C could be wasted because there is no
bucket to store them. The spare capacity C2 here refers to
these free tokens, which are available to carry packets deflected
from other VCs.
In switches based on the BvN scheduling, the maximum
throughput cannot exceed the offered load. The throughput
of offered load in a D-BvN switch is defined as the ratio of
the average rate of the traffic departing from the switch to the
average rate of the fresh traffic arriving at the switch. For
example, in the fluid-flow model of the D-BvN switch under the
above assumptions, the maximum throughput of offered load
is given by $(C - C_2)/\bar{\lambda}$.
The details of the analysis are described in the rest of
this section. In subsection III-A, the behavior of each VC is
analyzed by using the fluid-flow model of input traffic [30].
We first derive the average rate of overflow traffic spilled from
each VC, and then compute the spare capacity that cannot be
utilized due to the bursty arrivals. Based on these results, in
subsection III-B, an ideal deflection approximation is devised
to estimate the minimum VOQ requirement to achieve close
to 100% throughput of offered load. In subsection III-C,
we compare the approximate results of loss probability and
deflection probability with those obtained by simulations.
A. Analysis of VOQ behavior
We assume that the fresh traffic input to a virtual circuit
is a Markov modulated on-off fluid flow, as shown in Fig. 3,
with an arrival rate $\hat{\lambda}$ in the on state, and 0 in the off state.
The on and off periods are both exponentially distributed with
respective mean durations $1/\alpha$ and $1/\beta$. Hence, the peak-to-average
ratio of the fresh arrival rate is given by:
$$\frac{\hat{\lambda}}{\bar{\lambda}} = \frac{\alpha + \beta}{\beta}.$$
The burstiness b of input traffic is defined as follows:
$$b = \frac{1}{\alpha + \beta},$$
which is similar to the definition of burst length given in [4],
[27]. If the peak-to-average ratio is fixed, then the probability
that the input traffic is in an on or off state, denoted as
$\pi_{\mathrm{on}} = \beta/(\alpha + \beta)$ and $\pi_{\mathrm{off}} = 1 - \pi_{\mathrm{on}}$, respectively, is also fixed.
It follows that the burstiness b is proportional to the average
duration of on state $1/\alpha$, and also the average duration of off
state $1/\beta$:
$$b = \frac{1}{\alpha}\,\pi_{\mathrm{off}} = \frac{1}{\beta}\,\pi_{\mathrm{on}}.$$
[Fig. 4. The model of a VC with a finite VOQ buffer: the fresh arrival traffic and the traffic deflected to the VC form the total arrival to a VOQ of size K, which is served with capacity C at output rate Rout, leaving spare capacity C2.]
As illustrated in Fig. 3, a longer on period in a cycle implies
a larger burstiness b in the time domain for the same peak-to-
average ratio.
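The two relations above, $\hat{\lambda}/\bar{\lambda} = (\alpha+\beta)/\beta$ and $b = 1/(\alpha+\beta)$, determine the transition rates once the mean rate, peak rate and burstiness are chosen. The following minimal sketch makes that mapping explicit; the function name `on_off_rates` is hypothetical.

```python
def on_off_rates(mean_rate, peak_rate, burstiness):
    """Return (alpha, beta) of an on-off fluid source.

    Uses peak/mean = (alpha + beta) / beta and b = 1 / (alpha + beta),
    so that pi_on = beta / (alpha + beta) = mean/peak.
    """
    if not 0 < mean_rate < peak_rate:
        raise ValueError("need 0 < mean_rate < peak_rate")
    if burstiness <= 0:
        raise ValueError("burstiness must be positive")
    pi_on = mean_rate / peak_rate
    beta = pi_on / burstiness           # mean off period is 1/beta = b / pi_on
    alpha = (1.0 - pi_on) / burstiness  # mean on period is 1/alpha = b / pi_off
    return alpha, beta
```

Doubling b while keeping the peak-to-average ratio fixed halves both rates, so both the mean on period and the mean off period double, which matches the proportionality stated above.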
In a D-BvN switch, the superposition of fresh input traffic
and overflow traffic deflected from other VCs is fed into each
VC. From the assumptions that all VCs are homogeneous and
the number of ports N is large, the aggregate traffic deflected
to the VC can be considered as a constant fluid flow with
intensity $\lambda_d$ in steady state. Therefore, as illustrated in Fig. 4,
the superimposed traffic input to each VC is still a Markov
modulated on-off fluid flow with a peak rate $\hat{\lambda} + \lambda_d$ in the on
state, a deflection rate $\lambda_d$ in the off state, and an average rate
$\bar{\lambda} + \lambda_d$.
The overflow packets will be temporarily stored in the
throttle buffer, and deflected by the spare capacity, denoted
as C2 in Fig. 4. The spare capacity provides the free tokens
that are not utilized by idle VCs with empty VOQs. Since the
free tokens for traffic deflection are only offered by idle VCs,
the capacity C assigned to each VC to serve the input traffic
will not be affected by the deflection scheme. As mentioned
in Section II, highly bursty arrivals not only increase the buffer
overflow probability, but also cause poor utilization of assigned
capacity C. This point is elaborated in the following lemma.
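Lemma 1: For each VC, the average rate of the overflow
traffic is given by
$$\Delta = \Pr\{x = K\} \times (\hat{\lambda} + \lambda_d - C), \tag{2}$$
and the average spare capacity is
$$C_2 = C - (\bar{\lambda} + \lambda_d) + \Delta, \tag{3}$$
where
$$\Pr\{x = K\} = b \times \frac{(\hat{\lambda} + \lambda_d - C)\beta - (C - \lambda_d)\alpha}{(\hat{\lambda} + \lambda_d - C)e^{-\varepsilon K} - (C - \lambda_d)\alpha/\beta}\, e^{-\varepsilon K} \tag{4}$$
is the probability that the VOQ is full and $\varepsilon$ is a constant
defined as follows:
$$\varepsilon = \frac{\alpha}{\hat{\lambda} + \lambda_d - C} - \frac{\beta}{C - \lambda_d}. \tag{5}$$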
[Fig. 5. Overflow traffic rate and spare capacity (analysis and simulation) versus burstiness b, where $C = 1/64$, $\hat{\lambda} = 0.8$, $\bar{\lambda} = 0.98/64$, $\lambda_d = 0.01/64$ and $K = 20$.]
Proof: If the input traffic is in the on state for a long time
such that the finite VOQ is saturated, then the new arrivals
overflow from the VOQ, as illustrated in Fig. 4. The average
rate of the overflow traffic can be calculated by
$$\Delta = \frac{\text{overflow traffic in period } T}{\text{time period } T} = \frac{T \times \Pr\{x = K\} \times (\hat{\lambda} + \lambda_d - C)}{T} = \Pr\{x = K\} \times (\hat{\lambda} + \lambda_d - C).$$
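The following probability that the VOQ is full is derived in
Appendix A based on the standard fluid-flow model described
in [30]:
$$\Pr\{x = K\} = b \times \frac{(\hat{\lambda} + \lambda_d - C)\beta - (C - \lambda_d)\alpha}{(\hat{\lambda} + \lambda_d - C)e^{-\varepsilon K} - (C - \lambda_d)\alpha/\beta}\, e^{-\varepsilon K},$$
where
$$\varepsilon = \frac{\alpha}{\hat{\lambda} + \lambda_d - C} - \frac{\beta}{C - \lambda_d}.$$
Due to traffic overflow, the average output traffic rate $R_{out}$ is
less than the average input traffic rate $\bar{\lambda} + \lambda_d$, and can be
calculated by
$$R_{out} = \frac{\text{input traffic in period } T - \text{overflow in period } T}{\text{time period } T} = \frac{(\bar{\lambda} + \lambda_d)T - \Delta T}{T} = \bar{\lambda} + \lambda_d - \Delta.$$
Thus, the average spare capacity that cannot be utilized by the
aggregate input traffic is given by
$$C_2 = C - R_{out} = C - (\bar{\lambda} + \lambda_d) + \Delta.$$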
Lemma 1 clearly demonstrates the self-tuning property of
the deflection scheme described in Section II. From (2) and
(3), it is easy to see that both the overflow traffic rate ∆ and
the spare capacity C2 increase with the burstiness b if the
average input traffic rate $\bar{\lambda} + \lambda_d$ and the buffer size K are fixed.
This result is consistent with the simulation results shown in
Fig. 5, where the simulations follow the discrete-time model
described in Appendix C.
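Equations (2) to (5) can be evaluated directly to reproduce the trend described above. The sketch below is an illustration only: the function name `lemma1_quantities` is hypothetical, and $\alpha$ and $\beta$ are derived from b and the peak-to-average ratio as in Section III-A. It assumes $\hat{\lambda} + \lambda_d > C > \lambda_d$ and $\varepsilon \neq 0$, since (4) becomes a 0/0 form at the equilibrium point $\bar{\lambda} + \lambda_d = C$.

```python
import math


def lemma1_quantities(C, mean_rate, peak_rate, deflect_rate, b, K):
    """Return (Pr{x=K}, Delta, C2) from equations (2)-(5) of Lemma 1."""
    pi_on = mean_rate / peak_rate
    beta = pi_on / b
    alpha = (1.0 - pi_on) / b
    fill = peak_rate + deflect_rate - C   # net fill rate of the VOQ in the on state
    drain = C - deflect_rate              # net drain rate of the VOQ in the off state
    eps = alpha / fill - beta / drain     # equation (5)
    decay = math.exp(-eps * K)
    # Equation (4): probability that the VOQ is full.
    p_full = b * (fill * beta - drain * alpha) \
        / (fill * decay - drain * alpha / beta) * decay
    overflow = p_full * fill                            # equation (2)
    spare = C - (mean_rate + deflect_rate) + overflow   # equation (3)
    return p_full, overflow, spare
```

As a usage example, evaluating the function with the parameters quoted in the caption of Fig. 5 ($C = 1/64$, $\hat{\lambda} = 0.8$, $\bar{\lambda} = 0.98/64$, $\lambda_d = 0.01/64$, $K = 20$) for b = 1 and b = 2 gives a larger overflow rate and a larger spare capacity for the larger b. Setting K = 0 reduces (4) to $\pi_{\mathrm{on}}$, as expected for a source with no buffer.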
On the other hand, if we fix the burstiness b and decrease
the VOQ size K, (2) and (3) of Lemma 1 indicate that both
the average overflow traffic rate ∆ and the average
spare capacity C2 will increase simultaneously. In the D-BvN
switch, the spare capacity C2 is utilized for the deflection
of overflow packets, and serves as a dynamic buffer for the
overflow traffic ∆. Thus, there is a trade-off between the real
VOQ buffer size K , and the dynamic buffer size C2. In the
next subsection, we estimate the appropriate VOQ size by
striking a balance between these two buffering strategies.
B. Ideal Deflection Approximation
In this subsection, we focus on the formulation of the trade-
off between the VOQ size K and the spare capacity C2. The
exact analysis of a feedback stochastic system, such as the
D-BvN switch, is mathematically intractable even with the
simplified fluid-flow model. Thus, we propose an approxima-
tion technique to solve the problem by assuming that the D-
BvN switch under consideration is furnished with an ideal
deflection mechanism, which guarantees that the spare capac-
ity NC2 is always available to serve overflow packets in the
throttle buffer. Clearly, this ideal deflection approximation
can only provide an optimistic estimate. Despite some
numerical discrepancies, the approximate solutions, in general,
agree with simulations, and they characterize
the deflection-compensated scheduling of D-BvN switches well.
The following measurements are of interest to the throughput
analysis of D-BvN switches:
$P_d$: probability that an input packet is deflected.
$P_l$: packet loss probability.
$\dot{K}$: minimum VOQ size required to achieve a loss rate of $P_l = 10^{-5}$.
In the approximate analysis of the D-BvN switch with an
ideal deflection scheme, we use the bold roman type notations
Pd and Pl to denote the approximations of these probabilities,
and K˙ to denote the required minimum VOQ size that can
achieve 100% throughput of offered load.
There are N VOQs at each input, as illustrated in Fig. 6.
Some of these finite VOQs are full while some others are
empty at any instant of time due to the burstiness of input
traffic. The traffic spilled from these overflow VOQs is fed into
the throttle buffer, and then deflected via the spare capacity of
idle VCs. For large port number N, we treat the overflow
traffic input to the throttle buffer as a constant flow N∆, and
the superposition of the spare capacities of all the VCs at an
input as a constant NC2 over time. In this subsection, we
show that close to 100% throughput of offered load can be
achieved by a D-BvN switch with a finite VOQ size K˙.
We first consider an extreme scenario, where the VOQ size
K of each VC is sufficiently small such that the total input
rate of a VC is larger than the assigned capacity C. From (3),
we know
$$N(\bar{\lambda} + \lambda_d) > NC \iff N\Delta > NC_2,$$
in which case the system is unstable, and the throttle buffer
of any finite size will be fully occupied all the time. Under
the ideal deflection assumption that all spare capacities $NC_2$
are utilized for deflection, the amount of traffic dropped by
the throttle buffer in a period of time T is $(N\Delta - NC_2)T$ at
each input. Thus, the traffic loss rate is given by
$$P_l = \frac{(N\Delta - NC_2)T}{N\bar{\lambda}T} = \frac{\Delta - C_2}{\bar{\lambda}}. \tag{6}$$
[Fig. 6. The model of an input port with a throttle buffer: the aggregate overflow traffic N∆ from VOQi1, ..., VOQiN (each of size K with capacity C) enters the throttle buffer and is deflected to other input ports by the aggregate spare capacity NC2.]
Since the total amount of deflected traffic at each input is
$NC_2T$, each input packet, either a deflected packet or a
fresh packet, will be deflected with a probability $P_d$ given as
follows:
$$P_d = \frac{\text{traffic deflected at an input in period } T}{\text{input traffic in period } T} = \frac{NC_2 \times T}{N(\bar{\lambda} + \lambda_d) \times T} = \frac{C_2}{\bar{\lambda} + \lambda_d}. \tag{7}$$
Since $(\bar{\lambda} + \lambda_d)P_d$ is the average rate of deflection traffic
generated by a VC, it follows from the law of flow conservation that
$$P_d = \frac{\lambda_d}{\bar{\lambda} + \lambda_d}. \tag{8}$$
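From (7) and (8), we immediately obtain $\lambda_d = C_2$, which
is consistent with the assumption that the spare capacity of
each VC is fully utilized by the deflection traffic. Furthermore,
combining (2) and (3) under this assumption, we can calculate
the average deflection rate $\lambda_d$ from the following equation:
$$\lambda_d = C - (\bar{\lambda} + \lambda_d) + (\hat{\lambda} + \lambda_d - C) \times \frac{\beta}{\alpha + \beta} \cdot \frac{(\hat{\lambda} + \lambda_d - C)\beta - (C - \lambda_d)\alpha}{(\hat{\lambda} + \lambda_d - C)\beta e^{-\varepsilon K} - (C - \lambda_d)\alpha}\, e^{-\varepsilon K}. \tag{9}$$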
If N∆ > NC2, then the packet loss probability Pl and the
deflection probability Pd can be obtained from (6) and (8),
respectively.
In the next scenario, by contrast, we consider a system with a
sufficiently large VOQ size K, such that the stability condition
$N(\bar{\lambda} + \lambda_d) < NC$ always holds, or equivalently, $N\Delta < NC_2$.
Again, under the ideal deflection assumption, all overflow
traffic can be deflected to other inputs by the spare capacity
$NC_2$, and there is no packet loss, i.e., $P_l = 0$. Therefore, the
deflection probability is given by
$$P_d = \frac{N\Delta \times T}{N(\bar{\lambda} + \lambda_d) \times T} = \frac{\Delta}{\bar{\lambda} + \lambda_d}. \tag{10}$$
From the law of flow conservation (8) and (10), we know
∆ = λd. Combining (2) and (3) again, we can calculate λd
from the following equation
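\[
\lambda_d = (\hat{\lambda} + \lambda_d - C) \times \frac{\beta}{\alpha + \beta}\,
    \frac{(\hat{\lambda} + \lambda_d - C)\beta - (C - \lambda_d)\alpha}
         {(\hat{\lambda} + \lambda_d - C)\beta e^{-\varepsilon K} - (C - \lambda_d)\alpha}\,
    e^{-\varepsilon K}. \tag{11}
\]
The deflection probability Pd for N∆ < NC2 is then determined
from (10).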
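As a rough numerical illustration of this stable scenario, the sketch below solves (11) for λd by bisection and then evaluates (10) with ∆ = λd. The exponent ε is taken from Appendix A. The parameter values (K = 100, and β chosen so that ρ = 0.98 holds exactly) and all function names are illustrative assumptions, not values or code from the paper.

```python
import math

# Illustrative parameters (assumptions): N = 64 ports, peak rate 0.8, load 0.98.
N = 64
C = 1.0 / N                 # capacity assigned to each VC
lam_hat = 0.8               # peak rate during on periods
rho = 0.98                  # offered load of a VC (assumed rho = lam_bar / C)
lam_bar = rho * C           # mean rate of fresh traffic
alpha = 0.49                # on -> off transition rate
# beta is chosen so that lam_bar = lam_hat * beta / (alpha + beta) holds exactly.
beta = alpha * lam_bar / (lam_hat - lam_bar)
K = 100.0                   # VOQ size, assumed large enough for stability


def overflow_rate(lam_d: float) -> float:
    """Right-hand side of (11): overflow rate Delta of one VC for a given lam_d."""
    u = lam_hat + lam_d - C          # net fill rate of the VOQ in the on state
    v = C - lam_d                    # net drain rate of the VOQ in the off state
    eps = alpha / u - beta / v       # exponent epsilon from Appendix A
    e = math.exp(-eps * K)
    ratio = (u * beta - v * alpha) / (u * beta * e - v * alpha)
    return u * beta / (alpha + beta) * ratio * e


def solve_lam_d(iterations: int = 200) -> float:
    """Bisection on f(lam_d) = Delta(lam_d) - lam_d over (0, C - lam_bar)."""
    lo, hi = 0.0, (C - lam_bar) * (1.0 - 1e-6)   # stay just below equilibrium
    f_lo = overflow_rate(lo) - lo
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = overflow_rate(mid) - mid
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam_d = solve_lam_d()
P_d = lam_d / (lam_bar + lam_d)      # (10) with Delta = lam_d
print(f"lam_d = {lam_d:.3e}, P_d = {P_d:.4f}")
```

Bisection is used here only because it is simple and needs no derivative; any scalar root finder would serve the same purpose.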
The previous analysis demonstrates that loss of packets
occurs in D-BvN switches if the VOQ size K is too small.
On the other hand, the spare capacity NC2 will not be fully
utilized if the VOQ size K is too large. To determine the
minimum VOQ size K without causing packet losses under
the ideal deflection assumption, we consider that the system
is in the following equilibrium state:
\[
N(\bar{\lambda} + \lambda_d) = NC, \tag{12}
\]
which is equivalent to N∆ = NC2. Since the spare capacity
NC2 is fully utilized to deflect the overflow traffic N∆, there
is no packet loss. In the following theorem, we show that
100% throughput of offered load can be achieved by the D-
BvN switch with a minimum VOQ size K˙, and the packet
deflection probability P˙d is independent of the burstiness b.
Theorem 1: Under the ideal deflection assumption, the D-BvN switch
with the following VOQ size for each VC can achieve 100% throughput
of offered load:
\[
\dot{K} = b\bar{\lambda} \times \left[ \frac{\alpha}{\beta}
  \left( \frac{\rho}{1 - \rho} - 1 \right) - 1 \right], \tag{13}
\]
and the deflection probability is given by
\[
\dot{P}_d = 1 - \rho. \tag{14}
\]
Proof: Under the ideal deflection assumption, if the D-
BvN switch is in the equilibrium state N∆ = NC2, i.e., λ¯+
λd = C, both (9) and (11) become the following equation:
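\[
\frac{\beta(\hat{\lambda} - \bar{\lambda})}
     {\beta \left\{ 1 + \dot{K}(\hat{\lambda} - \bar{\lambda}) \times
       \left[ \frac{\alpha}{(\hat{\lambda} - \bar{\lambda})^2}
            + \frac{\beta}{\bar{\lambda}^2} \right] \right\} + \alpha}
  = C - \bar{\lambda},
\]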
from which we immediately obtain the expression (13) of
K˙. Similarly, the deflection probability P˙d is obtained by
substituting (12) into (8).
In the equilibrium state where N∆ = NC2, or N(λ¯+λd) =
NC, and under the ideal deflection assumption, the spare
capacity NC2 is fully employed as the dynamic buffer to
accommodate the overflow traffic N∆. The optimality of
VOQ size K˙ given in (13) can be interpreted by (14), which
indicates that the VC is busy all the time: either busy in
transmitting a packet with probability ρ, or deflecting a packet
with probability P˙d = 1− ρ.
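To make Theorem 1 concrete, the following sketch evaluates (13) and (14) for the parameter values quoted in the caption of Fig. 7. The function names are hypothetical, and ρ = λ¯/C with C = 1/N is assumed as the definition of the offered load.

```python
def min_voq_size(b: float, lam_bar: float, alpha: float, beta: float, rho: float) -> float:
    """Minimum VOQ size from (13)."""
    return b * lam_bar * ((alpha / beta) * (rho / (1.0 - rho) - 1.0) - 1.0)


def deflection_probability(rho: float) -> float:
    """Equilibrium deflection probability from (14)."""
    return 1.0 - rho


# Parameter values quoted in the caption of Fig. 7.
N = 64
lam_bar = 0.98 / N
alpha, beta, b = 0.49, 0.0096, 2.0
rho = lam_bar * N            # assumption: rho = lam_bar / C with C = 1/N

K_dot = min_voq_size(b, lam_bar, alpha, beta, rho)
P_dot = deflection_probability(rho)
print(f"K_dot = {K_dot:.2f}, P_dot_d = {P_dot:.2f}")
```

For these values the sketch gives a minimum VOQ size of about 75 and a deflection probability of 0.02.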
Theorem 1 also implies that the D-BvN switch with finite VOQ size K˙
can achieve 100% throughput of offered load with packet loss rate
Pl = 0. However, this perfect performance of the D-BvN switch with
the ideal deflection mechanism can never be realized in practice.
Instead, we define the practical critical VOQ size to be the minimum
VOQ size that a D-BvN switch needs to achieve a loss rate of
Pl = 10^{-5} or, equivalently, a throughput very close to 100%. In
general, this practical critical VOQ size is larger than the ideal
value K˙, but the gap can be reduced by increasing the throttle
buffer size. A detailed comparison of the ideal deflection
approximation and the simulation results is provided in the next
subsection.
C. Comparison with Simulation Results
In the simulation, the time is slotted and the input traffic
of each VC is a discrete on-off Markov modulated process
with parameters described in Appendix C. In addition, the
deflection mechanism is modeled by discrete stochastic events
in our simulation. The assumption of an ideal deflection com-
pensation mechanism is only adopted in the approximation,
which provides an optimistic lower bound as we mentioned
before. For the purpose of performance comparisons, we also
provide the analytical and simulation results of the BvN
switch. The analytical results are given in Appendix B.
Fig. 7. Traffic loss rate versus VOQ size K, where N = 64, λˆ = 0.8,
λ¯ = 0.98/64, α = 0.49, β = 0.0096 and b = 2. Curves: no deflection
(analysis and simulation); deflection with BT = 1%, 5%, and 10% of NK
(simulation); ideal deflection (analysis).

1) Packet loss probability and optimal VOQ size: Both analytical and
simulation results of the packet loss probability Pl versus VOQ size
K are plotted in Fig. 7, where all simulated loss probabilities are
lower-bounded by the approximate loss probability derived under the
ideal deflection assumption, and upper-bounded by that of the BvN
switch without deflection. Herein, the analytical loss rate of the
BvN switch is plotted according to (29) given in Appendix B. The
throttle buffer size of each simulation curve displayed in Fig. 7 is
specified by BT = x%NK, which means that the throttle buffer size is
equal to x% of the total size of all VOQs at each input.
In Fig. 7, the loss probability of D-BvN under the ideal deflection
assumption quickly approaches 0 when the VOQ size is larger than the
critical value K˙, which suggests that 100% throughput of offered
load can be achieved by D-BvN switches with a finite VOQ size K˙.
However, all simulation results presented in Fig. 7 also show that
the loss probability never drops to 0 if the VOQ size K is finite.
Nevertheless, the system can still achieve close to 100% throughput
of offered load with a finite VOQ size. For example, the practical
critical VOQ size obtained by simulation in Fig. 7 lies between
1.5K˙ and 2K˙ when Pl = 10^{-5}. The discrepancy between the
simulation and analytical results is due to the following reasons:
• In the fluid-flow analysis, when the port number N is
large, we assume that the input to the throttle buffer is
a constant flow, and the available spare capacity is also
constant over time. However, both of them are discrete
probabilistic events in the simulation model.
• The approximation is obtained under the ideal deflection
assumption, such that the utilization of spare capacity for
deflection is maximized. But the arrivals of overflow packets and
the availability of free tokens are both probabilistic, and the size
of the throttle buffer is finite in the discrete stochastic model of
simulation. Thus, the service provided by free tokens to deflect
packets may not always be available. However, the gap between the
simulated and the approximate loss probability can be reduced by
increasing the throttle buffer size, as shown in Fig. 7.
In spite of these numerical discrepancies, our analytical re-
sults clearly demonstrate the characteristic of D-BvN switches.
In comparison with the BvN switch without deflection, it can be seen
from Fig. 7 that the packet loss probability of the BvN switch is
significantly higher than that of the D-BvN switch, and that it
decreases only slowly as the VOQ size K increases. In other words,
without the deflection compensation mechanism, the BvN switch
requires a much larger VOQ buffer to achieve the same throughput of
offered load.

Fig. 8. Minimum VOQ size (VOQ buffer requirement) versus burstiness
b, where N = 64, λˆ = 0.8, and λ¯ = 0.98/64. Curves: no deflection
(analysis and simulation); deflection with BT = 1%, 5%, and 10% of NK
(simulation); ideal deflection (analysis).
Presumably, according to the BvN scheduling, the VOQ buffer is
almost unnecessary if the traffic input to each VC is smooth enough
and satisfies the stability condition λ¯ < C; the buffer is
indispensable only when the input traffic is bursty. This intuitive
interpretation of the VOQ function is consistent with (13) of the
above theorem, in which the required VOQ size K˙ is linearly
proportional to the burstiness b. As shown in Fig. 8, the same
property holds for the practical critical VOQ size obtained by
simulation. Fig. 8 also reveals that a much larger VOQ size is
required by the BvN switch without deflection to achieve the same
throughput of offered load; the analytical curve of the BvN switch
is given by (29) in Appendix B.
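The linear dependence of K˙ on b can be checked directly from (13). In the sketch below the ratio α/β is replaced by λˆ/λ¯ − 1, which follows from the relation λ¯ = λˆβ/(α + β) stated in Appendix C, so that only b varies while λˆ and λ¯ stay fixed as in Fig. 8. The function name and the list of b values are illustrative assumptions.

```python
def min_voq_size(b: float, lam_hat: float, lam_bar: float, rho: float) -> float:
    """Minimum VOQ size (13) with alpha/beta expressed through the two rates."""
    ratio = lam_hat / lam_bar - 1.0      # alpha / beta, fixed by lam_hat and lam_bar
    return b * lam_bar * (ratio * (rho / (1.0 - rho) - 1.0) - 1.0)


N = 64
lam_hat, lam_bar = 0.8, 0.98 / N
rho = lam_bar * N                        # assumption: rho = lam_bar / C, C = 1/N

for b in (1.25, 1.5, 1.75, 2.0, 2.25, 2.5):
    print(f"b = {b:4.2f}  K_dot = {min_voq_size(b, lam_hat, lam_bar, rho):6.1f}")
```

Doubling b doubles the printed K˙, since every other factor in (13) is held constant.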
2) Packet deflection probability and out-of-sequence problem: The
deflection probability Pd obtained by simulations and its
approximation given by (8) for K ≤ K˙, and by (10) for K > K˙, are
all displayed in Fig. 9. We note that, in general, the approximate
deflection probability is larger than the simulated one for every
VOQ size K and any throttle buffer size BT. The difference is caused
by the ideal deflection assumption in the analysis, because
additional packet losses may occur in the simulation.
The packet out-of-sequence problem is unavoidable in the D-BvN
switch due to packet deflections. Since the capacity of each VC is
guaranteed by BvN scheduling, input packets should normally be
routed by the VC within the scheduled time. Thus, fresh input
packets that do not experience any deflection should reach the
desired output in the same sequential order as their input. That is,
out-of-sequence delivery only occurs to deflected packets. Since a
fresh packet could be deflected in the D-BvN switch with probability
Pd, the packet out-of-sequence probability should be upper-bounded
by this deflection probability. As shown in Fig. 9, the discrepancy
between the simulated and the approximate Pd is almost negligible;
thus the packet out-of-order probability should be of the same order
as the approximate Pd. Furthermore, in the equilibrium state,
Theorem 1 indicates that the deflection probability P˙d is
completely determined by the offered load ρ and independent of the
burstiness b. This point is reinforced by the simulations shown in
Fig. 10, where, again, the difference between the simulated and the
approximate deflection probability is insignificant.

Fig. 9. Packet deflection probability versus VOQ size K, where
N = 64, λˆ = 0.8, λ¯ = 0.98/64, α = 0.49, β = 0.0096 and b = 2.
Curves: ideal deflection (analysis); deflection with BT = 1%, 5%,
and 10% of NK (simulation).

Fig. 10. Deflection probability versus burstiness b, where N = 64,
λˆ = 0.8, λ¯ = 0.98/64, and K is the critical VOQ size (K˙ for the
analysis, the practical critical VOQ size for the simulations).
Curves: ideal deflection (analysis); deflection with BT = 1%, 5%,
and 10% of NK (simulation).
IV. END-TO-END DELAY AND DELAY JITTER
In this section, we first derive the average end-to-end delay
and delay jitter of D-BvN switches under the ideal deflection
assumption, and then compare these analytical results with
simulations. Again, we show that the delay performances of
D-BvN switches are upper-bounded by that of BvN switches
without deflections, and lower-bounded by that with ideal
deflections. The following parameters and notations are used in
the delay analysis of D-BvN switches with the ideal deflection
mechanism:
D end-to-end packet delay
Dq queuing delay in the VOQ buffer
a a constant cross-switch delay
A. Analysis of Delay and Jitter
In the delay analysis, we consider that the D-BvN switch
under study is operated in the stable region, i.e., N(λ¯+λd) ≤
NC, in which no packet losses, i.e., Pl = 0, will occur under
the ideal deflection assumption. Each packet input to a VC
will either be deflected to another input with probability Pd
if the VOQ is full, or join the VOQ with probability 1−Pd.
Thus, the average end-to-end delay E[D] can be expressed as
follows:
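\[
\begin{aligned}
E[D] &= E[\text{delay if it is deflected}] \times P_d
      + E[\text{delay if no deflection}] \times (1 - P_d) \\
     &= \{\text{cross-switch delay} + E[D]\} \times P_d
      + E[\text{queuing delay in VOQ}] \times (1 - P_d) \\
     &= E[a + D]\,P_d + E[D_q](1 - P_d).
\end{aligned} \tag{15}
\]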
It follows that
\[
E[D] = \frac{aP_d}{1 - P_d} + E[D_q], \tag{16}
\]
where the first term is the average delay incurred by deflections
in the D-BvN switch, and the second term, mean queuing
delay E[Dq], is given by (24) in Appendix A. The mean
delay E[D] can be determined by simultaneously solving (16)
together with (10) and (11). Since the deflection probability
Pd is typically very small in the stable region, the mean delay
E[D] in practice is predominated by the queuing delay spent
by the packet waiting in the VOQ buffer.
The delay jitter of a packet is defined by the variance var[D]
of the end-to-end delay D. As with the recursive relation
(15) for the mean delay, the second moment E[D2] can be
evaluated in a similar manner as follows:
\[
E[D^2] = E[D_q^2](1 - P_d) + E[(a + D)^2]\,P_d, \tag{17}
\]
or equivalently, we have
\[
E[D^2] = E[D_q^2] + \frac{\{a^2 + 2aE[D]\} \times P_d}{1 - P_d}.
\]
Thus, the delay jitter is given by
\[
\begin{aligned}
\mathrm{var}[D] &= E[D^2] - E^2[D] \\
  &= E[D_q^2] - E^2[D_q] + a^2 \frac{P_d}{(1 - P_d)^2} \\
  &= \mathrm{var}[D_q] + a^2 \frac{P_d}{(1 - P_d)^2},
\end{aligned} \tag{18}
\]
where the second moment of queuing delay E[D2q ] is given
by (25) in Appendix A. Again, the delay jitter var[D] can
be simultaneously solved by (10), (11), and (18), and it
is typically predominated by the variance of queuing delay
var[Dq ] in the VOQ buffer.
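The closed forms (16) and (18) can be cross-checked by summing over the number of deflections a packet experiences. The sketch below assumes, as the recursion (15) implicitly does, that each deflection succeeds independently with probability Pd and that the final queuing delay Dq is independent of the number of deflections; the numerical inputs are hypothetical and chosen only to exercise the formulas.

```python
def delay_moments_by_enumeration(a, p_d, mean_q, second_q, max_deflections=2000):
    """Mean and variance of D = a*G + Dq, summing over the deflection count G.

    G = k with probability p_d**k * (1 - p_d); Dq is assumed independent of G.
    """
    mean = 0.0
    second = 0.0
    for k in range(max_deflections + 1):
        w = (p_d ** k) * (1.0 - p_d)
        shift = a * k                     # delay added by k deflections
        mean += w * (shift + mean_q)
        second += w * (shift * shift + 2.0 * shift * mean_q + second_q)
    return mean, second - mean * mean


def delay_moments_closed_form(a, p_d, mean_q, second_q):
    """Mean and variance from the closed forms (16) and (18)."""
    mean = a * p_d / (1.0 - p_d) + mean_q
    var = (second_q - mean_q ** 2) + a * a * p_d / (1.0 - p_d) ** 2
    return mean, var


# Hypothetical inputs: a = 1 slot, P_d = 0.05, E[Dq] = 40, E[Dq^2] = 2500.
enumerated = delay_moments_by_enumeration(1.0, 0.05, 40.0, 2500.0)
closed = delay_moments_closed_form(1.0, 0.05, 40.0, 2500.0)
print(enumerated)
print(closed)
```

The two results agree up to the truncation of the geometric series, which is negligible for the chosen cutoff.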
When the D-BvN switch is in the equilibrium state, N(λ¯+
λd) = NC, the end-to-end delay and the queuing delay in the
VOQ buffer will be denoted by D˙ and D˙q, respectively. In this
particular case, we obtain the following explicit expressions of
delay and delay jitter.
Theorem 2: In the equilibrium state, N(λ¯ + λd) = NC,
the average end-to-end delay and the delay jitter in a D-BvN
switch with ideal deflection mechanism are given by
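\[
E[\dot{D}] = a \times \frac{1 - \rho}{\rho} + E[\dot{D}_q], \tag{19}
\]
and
\[
\mathrm{var}[\dot{D}] = a^2 \times \frac{1 - \rho}{\rho^2} + \mathrm{var}[\dot{D}_q], \tag{20}
\]
respectively, where
\[
E[\dot{D}_q] = b \times
  \frac{\left[ \left( \frac{2\alpha}{\beta} + 1 \right)\rho
        - \left( 1 + \frac{\alpha}{\beta} \right) \right]
        \left[ \left( 2 - \frac{\beta}{\alpha} \right)\rho
        + \left( \frac{\beta}{\alpha} - 1 \right) \right]}
       {2\rho(1 - \rho)}, \tag{21}
\]
and
\[
E[\dot{D}_q^2] = b^2 \times
  \frac{\left[ (2\rho - 1)\frac{\alpha}{\beta} - (1 - \rho) \right]^2
        \left[ (1 - \rho)\frac{2\beta}{\alpha} + 2\rho - 1 \right]}
       {3(1 - \rho)^2}. \tag{22}
\]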
Proof: When the D-BvN switch is in the equilibrium
state, N(λ¯ + λd) = NC, we know from Theorem 1 that the
deflection probability is given by P˙d = 1 − ρ. Thus, from
(16), (18), (24), and (25), we immediately obtain the results
(21) and (22).
Theorem 2 demonstrates that a packet input to a D-BvN
switch in the equilibrium state could be deflected (1 − ρ)/ρ
times on the average before it is switched by the VC to its
desired output. Thus, both the mean delay and delay jitter
incurred by deflections are independent of the burstiness b.
This characteristic of D-BvN switches in the equilibrium state
can be attributed to the fact that the deflection probability 1−ρ
is independent of b, which was elaborated in Theorem 1 in
Section III-B.
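As a worked instance of Theorem 2, the sketch below evaluates (19) to (22) for the parameter values quoted in the caption of Fig. 7, with the cross-switch delay a set to one slot as in Appendix C. The function name is hypothetical and ρ = λ¯/C is assumed.

```python
def equilibrium_delay(a, b, alpha, beta, rho):
    """Return (E[Dq], var[Dq], E[D], var[D]) in the equilibrium state, per (19)-(22)."""
    r = alpha / beta
    mean_q = (b * ((2.0 * r + 1.0) * rho - (1.0 + r))
              * ((2.0 - 1.0 / r) * rho + (1.0 / r - 1.0))
              / (2.0 * rho * (1.0 - rho)))                     # (21)
    second_q = ((b ** 2) * ((2.0 * rho - 1.0) * r - (1.0 - rho)) ** 2
                * ((1.0 - rho) * 2.0 / r + 2.0 * rho - 1.0)
                / (3.0 * (1.0 - rho) ** 2))                    # (22)
    var_q = second_q - mean_q ** 2
    mean_d = a * (1.0 - rho) / rho + mean_q                    # (19)
    var_d = a * a * (1.0 - rho) / rho ** 2 + var_q             # (20)
    return mean_q, var_q, mean_d, var_d


# Values from the caption of Fig. 7; a = 1 slot as in Appendix C.
mean_q, var_q, mean_d, var_d = equilibrium_delay(
    a=1.0, b=2.0, alpha=0.49, beta=0.0096, rho=0.98)
print(f"E[Dq] = {mean_q:.1f} slots, deflection delay = {mean_d - mean_q:.4f} slots")
print(f"var[Dq] = {var_q:.3e}, var[D] = {var_d:.3e}")
```

For these values the mean queuing delay is roughly 2.4 × 10^3 slots, while the deflection term a(1 − ρ)/ρ contributes only about 0.02 slots, which illustrates why the end-to-end delay is dominated by queuing in the VOQ.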
B. Comparison with Simulation Results
In this subsection, we provide a comparison of the analytic results
of delay and delay jitter under the ideal deflection assumption with
simulation results. Recall that the practical critical VOQ size is
the VOQ size required for the D-BvN switch to achieve a loss rate of
Pl = 10^{-5} in simulation. The simulated end-to-end delay reported
below is that of a packet in the D-BvN switch with this practical
critical VOQ size, whereas the analytical delay D˙ corresponds to
the VOQ size K˙.
The average end-to-end delay versus burstiness is plotted in
Fig. 11, where the average delay E[D˙] given by (16) is compared
with its simulated counterpart. Recall that the capacity of each
input port is equally divided and assigned to N VCs in our model,
which implies that each VC only receives one token in a frame of N
time slots. Thus, the average packet queuing delay shown in Fig. 11
is on the order of N times the mean queue length of the VOQ, which
is consistent with the results of BvN switches reported in [1]. All
curves in Fig. 11 demonstrate that the average end-to-end delay
increases linearly with the burstiness b. Furthermore, we also
provide the delay performance of the BvN switch when the packet
loss rate is 10^{-5} in Fig. 11, where the analytical curve is
plotted according to (30) in Appendix B. As we mentioned before,
Fig. 11 shows that all simulated average delays of the D-BvN switch
are lower-bounded by E[D˙], which is derived in the equilibrium
state under the ideal deflection assumption, and are smaller than
that of the BvN switch without deflections.
Fig. 11. Mean delay (×10^3 slots) versus burstiness b, where N = 64,
λˆ = 0.8, λ¯ = 0.98/64, and K is the critical VOQ size (K˙ for the
analysis, the practical critical VOQ size for the simulations).
Curves: no deflection (analysis and simulation); deflection with
BT = 1%, 5%, and 10% of NK (simulation); ideal deflection (analysis).

Fig. 12. Mean deflection delay (slots) versus burstiness b, where
N = 64, λˆ = 0.8, λ¯ = 0.98/64, and K is the critical VOQ size (K˙
for the analysis, the practical critical VOQ size for the
simulations). Curves: ideal deflection (analysis); deflection with
BT = 1%, 5%, and 10% of NK (simulation).
As shown in Fig. 7 and 8, a very large VOQ buffer is
required to achieve a traffic loss rate of 10−5 in the BvN
switch. Thus, we should expect that backlogged packets will
form a long queue in the VOQ buffer. However, a much
smaller VOQ buffer size K˙ is enough for each VC in the D-
BvN switch to achieve the same loss rate. Therefore, according
to Little’s law in queuing theory, the queuing delay in a D-
BvN switch should be much less than that in a BvN switch
for achieving the same throughput of offered load.
In Fig. 12, the average deflection delay in expression (19)
is compared with the counterpart obtained by the simulation
model. All curves confirm that the average deflection delay is
independent of the burstiness b. In sum, the deflection mech-
anism of a D-BvN switch allows sharing of spare capacities
among different VCs at the expense of negligible deflection
delay. This simply designed scheme not only can reduce the
VOQ buffer requirement, but also significantly improves the
delay performances of the BvN switches, especially in the face
of highly bursty input traffic.
The delay jitter versus burstiness is plotted in Fig. 13, in which
the analytical delay variance var[D˙], derived from the second
moment (17), provides a lower bound on the delay variance obtained
from the discrete simulation model. Again, these curves clearly
demonstrate that the delay jitter of the BvN switch without
deflection is unsatisfactory when the burstiness b of input traffic
is high.

Fig. 13. Delay jitter (×10^7 slot^2) versus burstiness b, where
N = 64, λˆ = 0.8, λ¯ = 0.98/64, and K is the critical VOQ size (K˙
for the analysis, the practical critical VOQ size for the
simulations). Curves: no deflection (analysis and simulation);
deflection with BT = 1%, 5%, and 10% of NK (simulation); ideal
deflection (analysis).
V. CONCLUSION
The quasi-static scheduling based on BvN decomposition
is a virtual circuit (VC) switching technique for input-queued
packet switches. Since the guaranteed capacity for each VC is
estimated from the mean loading, as expected, the performance
of BvN switches becomes unpredictable when the input traffic
is bursty. On the other hand, deflection routing is a fundamen-
tal strategy for packet switched networks to eliminate heavy
traffic and reduce the need for buffers. Combining these two
techniques, the D-BvN switch architecture proposed in this
paper provides a compromise scheduling solution for input-queued
switches with VOQs. Our algorithm is specifically
designed to adapt to the fluctuations of input traffic, and is very
easy to implement in practice. This deflection-compensated
scheme is devised to help offset the excessive buffer require-
ments caused by bursty traffic, and it can be easily extended to
any network environment with capacity and QoS guarantees.
APPENDIX A
FLUID FLOW ANALYSIS OF VC IN D-BVN SWITCHES
The queuing behavior of a virtual circuit (VC) is analyzed in
this appendix. As described in Section III, the arrival process
is a Markov modulated on-off fluid flow, where the traffic rate
is λˆ+λd during on periods and λd during off periods, and the
average traffic rate is λ¯+λd. The input traffic state is denoted
by S, the service rate of the VC is C, and the VOQ size
is K . In a VOQ, according to the standard fluid flow model
illustrated in [30], the cumulative distribution function (CDF)
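of the queue length x in on period P1(X) and in off period
P0(X) satisfies the following equation:
\[
\begin{cases}
(\lambda_d - C) \times \dfrac{dP_0(X)}{dX} = -\beta P_0(X) + \alpha P_1(X) \\[1ex]
(\hat{\lambda} + \lambda_d - C) \times \dfrac{dP_1(X)}{dX} = \beta P_0(X) - \alpha P_1(X) \\[1ex]
P_1(0) = 0 \\[1ex]
P_0(K) = \dfrac{\alpha}{\alpha + \beta}.
\end{cases}
\]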
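It follows that the CDF of the queue length x can be expressed by
\[
\begin{cases}
P_0(X) = A_0\alpha + A_1(\hat{\lambda} + \lambda_d - C)e^{-\varepsilon X} \\
P_1(X) = A_0\beta + A_1(C - \lambda_d)e^{-\varepsilon X}
\end{cases}
\quad (0 \le X < K), \tag{23}
\]
where
\[
A_0 = \frac{-(C - \lambda_d)\alpha}
           {(\alpha + \beta)\left[ -(C - \lambda_d)\alpha
             + (\hat{\lambda} + \lambda_d - C)\beta e^{-\varepsilon K} \right]},
\]
\[
A_1 = \frac{\alpha\beta}
           {(\alpha + \beta)\left[ -(C - \lambda_d)\alpha
             + (\hat{\lambda} + \lambda_d - C)\beta e^{-\varepsilon K} \right]},
\]
and
\[
\varepsilon = \frac{\alpha}{\hat{\lambda} + \lambda_d - C} - \frac{\beta}{C - \lambda_d}.
\]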
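Differentiating (23), we obtain the following relevant probability
density functions (PDFs):
\[
\begin{cases}
p_0(x = X) = -\varepsilon A_1(\hat{\lambda} + \lambda_d - C)e^{-\varepsilon X} \\
p_1(x = X) = -\varepsilon A_1(C - \lambda_d)e^{-\varepsilon X}
\end{cases}
\quad (0 < X < K),
\]
\[
\Pr\{x = 0\} = A_0(\alpha + \beta) + A_1\hat{\lambda},
\]
and
\[
\Pr\{x = K\} = 1 - \Pr\{x \le K^-\}
             = 1 - \left[ A_0(\alpha + \beta) + A_1\hat{\lambda}e^{-\varepsilon K} \right].
\]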
Let Po be the overflow probability defined as follows:
\[
P_o = \frac{\text{overflow traffic during period } T}{\text{input traffic during period } T}
    = \frac{\Pr\{x = K\} \times (\hat{\lambda} + \lambda_d - C) \times T}
           {(\bar{\lambda} + \lambda_d) \times T}.
\]
For a sufficiently long period of time T , the total amount of
traffic entering the VOQ is (λ¯+ λd)(1− Po)T , which can be
classified into four classes as follows:
1) Traffic arriving at the VC when the VOQ is empty (i.e.,
x = 0)
The VOQ is empty only when the traffic source is in the off
state and the input traffic rate is λd. Therefore, the input traffic
arriving at the VC when the VOQ is empty is λd × Pr{x =
0} × T . Consequently, the probability that the waiting time
Dq = 0 is
\[
\Pr\{D_q = 0\}
  = \frac{\text{traffic entering VC during } T \text{ when } x = 0}
         {\text{traffic entering VC over } T}
  = \frac{\lambda_d \times \Pr\{x = 0\} \times T}
         {(\bar{\lambda} + \lambda_d)(1 - P_o)T}.
\]
2) Traffic arriving at the VC when the queue length of the
VOQ is x = X(0 < X < K) and the input traffic state is on
In this case, the input traffic has to wait for X/C before it
can be served by the VC. Also, the input traffic rate is λˆ+λd.
Thus, the probability that the waiting time Dq = X/C and
input traffic state is on is given by
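\[
\begin{aligned}
\Pr\left\{ \frac{X}{C} < D_q < \frac{X + dX}{C},\ S = \text{on} \right\}
  &= \frac{\text{traffic entering VC during } T \text{ when } 0 < X < K \text{ and } S = \text{on}}
          {\text{traffic entering VC over } T} \\
  &= \frac{(\hat{\lambda} + \lambda_d)\,p_1(x = X)\,T}
          {(\bar{\lambda} + \lambda_d)(1 - P_o)T}\,dX \\
  &= \frac{(\hat{\lambda} + \lambda_d)\,p_1(x = D_qC)\,T}
          {(\bar{\lambda} + \lambda_d)(1 - P_o)T}\,C\,dD_q.
\end{aligned}
\]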
3) Traffic arriving at the VC when the queue length of the
VOQ is x = X(0 < X < K) and the input traffic state is off
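Similar to the previous case, the probability that the waiting time
Dq = X/C and the input traffic state is off is given by
\[
\begin{aligned}
\Pr\left\{ \frac{X}{C} < D_q < \frac{X + dX}{C},\ S = \text{off} \right\}
  &= \frac{\text{traffic entering VC during } T \text{ when } 0 < X < K \text{ and } S = \text{off}}
          {\text{traffic entering VC over } T} \\
  &= \frac{\lambda_d\,p_0(x = X)\,T}
          {(\bar{\lambda} + \lambda_d)(1 - P_o)T}\,dX \\
  &= \frac{\lambda_d\,p_0(x = D_qC)\,T}
          {(\bar{\lambda} + \lambda_d)(1 - P_o)T}\,C\,dD_q.
\end{aligned}
\]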
4) Traffic arriving at the VC when the queue length of the
VOQ is x = K
The VOQ is full only when the traffic source is in the on
state and the input traffic rate is λˆ+ λd. In this case, the rate
of the traffic that actually enters the VOQ is C. Therefore, the
probability that the waiting time of the input traffic is K/C
should be
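\[
\Pr\left\{ D_q = \frac{K}{C} \right\}
  = \frac{\text{traffic entering VC during } T \text{ when } x = K}
         {\text{traffic entering VC over } T}
  = \frac{C \times \Pr\{x = K\}\,T}
         {(\bar{\lambda} + \lambda_d)(1 - P_o)T}.
\]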
From the probability distributions defined above, we can
determine the first and second moments of the queuing delay
that the traffic waits in the VOQ as follows:
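\[
\begin{aligned}
E[D_q] ={}& 0 \times \frac{\lambda_d \times \Pr\{x = 0\}\,T}
                          {(\bar{\lambda} + \lambda_d)(1 - P_o)T} \\
 &+ \int_0^{(K/C)^-} D_q \times
    \frac{T\left[ (\hat{\lambda} + \lambda_d)\,p_1(x = D_qC) + \lambda_d\,p_0(x = D_qC) \right]}
         {(\bar{\lambda} + \lambda_d)(1 - P_o)T}\,C\,dD_q \\
 &+ \frac{K}{C} \times \frac{C \times \Pr\{x = K\}\,T}
                            {(\bar{\lambda} + \lambda_d)(1 - P_o)T},
\end{aligned} \tag{24}
\]
and
\[
\begin{aligned}
E[D_q^2] ={}& 0 \times \frac{\lambda_d \times \Pr\{x = 0\}\,T}
                            {(\bar{\lambda} + \lambda_d)(1 - P_o)T} \\
 &+ \int_0^{(K/C)^-} D_q^2 \times
    \frac{T\left[ (\hat{\lambda} + \lambda_d)\,p_1(x = D_qC) + \lambda_d\,p_0(x = D_qC) \right]}
         {(\bar{\lambda} + \lambda_d)(1 - P_o)T}\,C\,dD_q \\
 &+ \left( \frac{K}{C} \right)^2 \times \frac{C \times \Pr\{x = K\}\,T}
                                             {(\bar{\lambda} + \lambda_d)(1 - P_o)T}.
\end{aligned} \tag{25}
\]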
In the equilibrium state, when N(λ¯+ λd) = NC and K =
K˙, we obtain the following results:
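\[
E[\dot{D}_q] = b \times
  \frac{\left[ \left( \frac{2\alpha}{\beta} + 1 \right)\rho
        - \left( 1 + \frac{\alpha}{\beta} \right) \right]
        \left[ \left( 2 - \frac{\beta}{\alpha} \right)\rho
        + \left( \frac{\beta}{\alpha} - 1 \right) \right]}
       {2\rho(1 - \rho)}, \tag{26}
\]
and
\[
E[\dot{D}_q^2] = b^2 \times
  \frac{\left[ (2\rho - 1)\frac{\alpha}{\beta} - (1 - \rho) \right]^2
        \left[ (1 - \rho)\frac{2\beta}{\alpha} + 2\rho - 1 \right]}
       {3(1 - \rho)^2}. \tag{27}
\]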
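The equilibrium result (26) can be checked numerically against the general expression (24). The sketch below evaluates (24) with a midpoint rule at a deflection rate λd slightly below its equilibrium value, where A0, A1 and ε remain well defined, and compares the result with (26). It assumes b = 1/(α + β), which is consistent with the values α = 0.49, β = 0.0096 and b = 2 quoted for Fig. 7 but is not stated in this appendix; β is chosen so that ρ = 0.98 holds exactly, and all variable names are illustrative.

```python
import math

# Illustrative, self-consistent parameters (assumptions, not the paper's exact setup).
N = 64
C = 1.0 / N
lam_hat, rho, alpha = 0.8, 0.98, 0.49
lam_bar = rho * C
beta = alpha * lam_bar / (lam_hat - lam_bar)  # so that lam_bar = lam_hat*beta/(alpha+beta)
b = 1.0 / (alpha + beta)                      # assumed definition of the burstiness b
r = alpha / beta

# Minimum VOQ size (13) and closed-form mean queuing delay (26).
K = b * lam_bar * (r * (rho / (1.0 - rho) - 1.0) - 1.0)
closed_form = (b * ((2 * r + 1) * rho - (1 + r)) * ((2 - 1 / r) * rho + (1 / r - 1))
               / (2 * rho * (1 - rho)))

# Fluid-flow solution (23) slightly below equilibrium, where eps != 0.
lam_d = (C - lam_bar) * (1.0 - 1e-4)
u = lam_hat + lam_d - C
v = C - lam_d
eps = alpha / u - beta / v
den = (alpha + beta) * (-v * alpha + u * beta * math.exp(-eps * K))
A0 = -v * alpha / den
A1 = alpha * beta / den
pr_empty = A0 * (alpha + beta) + A1 * lam_hat
pr_full = 1.0 - (A0 * (alpha + beta) + A1 * lam_hat * math.exp(-eps * K))
P_o = pr_full * u / (lam_bar + lam_d)
entering = (lam_bar + lam_d) * (1.0 - P_o)    # rate of traffic entering the VOQ


def entering_density(x: float) -> float:
    """Rate of traffic entering the VOQ per unit queue length at 0 < x < K."""
    p0 = -eps * A1 * u * math.exp(-eps * x)
    p1 = -eps * A1 * v * math.exp(-eps * x)
    return (lam_hat + lam_d) * p1 + lam_d * p0


# Midpoint rule for the integral in (24), using Dq = X / C.
steps = 20000
h = K / steps
mass = lam_d * pr_empty + C * pr_full      # atoms at Dq = 0 and Dq = K/C
mean = (K / C) * C * pr_full               # contribution of the atom at Dq = K/C
for i in range(steps):
    x = (i + 0.5) * h
    w = entering_density(x) * h
    mass += w
    mean += (x / C) * w
mass /= entering
mean /= entering
print(f"total probability = {mass:.6f}")
print(f"E[Dq] from (24) = {mean:.1f}, from (26) = {closed_form:.1f}")
```

The four traffic classes should account for all traffic entering the VOQ, so the total probability serves as a sanity check on the sketch before the two delay values are compared.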
APPENDIX B
FLUID FLOW ANALYSIS OF VC IN BVN SWITCHES
The analytical results of the BvN switch without deflections
are given in this appendix. We consider that the BvN switch
under study is homogenous, such that all VCs are identical
and independent to each other. The traffic input to each VC is
a Markov modulated on-off fluid flow, where the traffic rate
is λˆ in on periods and 0 in off periods, and the average rate
is λ¯.
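Similarly to Appendix A, we first obtain the probability density
functions (PDFs) of the queue length x of each VOQ as follows:
\[
\begin{cases}
p_0(x = X) = -\varepsilon A_1(\hat{\lambda} - C)e^{-\varepsilon X} \\
p_1(x = X) = -\varepsilon A_1Ce^{-\varepsilon X}
\end{cases}
\quad (0 < X < K),
\]
\[
\Pr\{x = 0\} = A_0(\alpha + \beta) + A_1\hat{\lambda},
\]
and
\[
\Pr\{x = K\} = 1 - \left[ A_0(\alpha + \beta) + A_1\hat{\lambda}e^{-\varepsilon K} \right], \tag{28}
\]
where
\[
A_0 = \frac{-C\alpha}
           {(\alpha + \beta)\left[ -C\alpha + (\hat{\lambda} - C)\beta e^{-\varepsilon K} \right]},
\qquad
A_1 = \frac{\alpha\beta}
           {(\alpha + \beta)\left[ -C\alpha + (\hat{\lambda} - C)\beta e^{-\varepsilon K} \right]},
\]
and
\[
\varepsilon = \frac{\alpha}{\hat{\lambda} - C} - \frac{\beta}{C}.
\]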
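In BvN switches, overflow packets will be immediately dropped by
VOQs. Following the same derivation procedure of Po and using (28),
we can obtain the loss probability Pl as follows:
\[
P_l = \frac{1}{\alpha + \beta}\,
      \frac{(\hat{\lambda} - C)\beta\left[ C\alpha - \beta(\hat{\lambda} - C) \right]}
           {\bar{\lambda}\left[ C\alpha e^{\varepsilon K} - \beta(\hat{\lambda} - C) \right]},
\]
where
\[
\varepsilon = \frac{\alpha}{\hat{\lambda} - C} - \frac{\beta}{C}.
\]
On the other hand, if Pl is given, the required VOQ buffer size is
given by
\[
K = \frac{1}{\varepsilon} \ln
    \frac{\beta(\hat{\lambda} - C)
          + \dfrac{(\hat{\lambda} - C)\beta\left[ C\alpha - \beta(\hat{\lambda} - C) \right]}
                  {(\alpha + \beta)\bar{\lambda}P_l}}
         {C\alpha}. \tag{29}
\]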
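The loss probability and its inverse (29) are easy to evaluate numerically. The sketch below computes the VOQ size a BvN switch without deflection would need for a target loss rate of 10^{-5}, and then substitutes it back into the loss expression as a consistency check. The function names are hypothetical; the parameters follow the Fig. 7 caption, except that λ¯ is computed from λˆβ/(α + β) so that the model is self-consistent.

```python
import math


def bvn_loss(K, lam_hat, lam_bar, C, alpha, beta):
    """Loss probability of a VC without deflection (expression preceding (29))."""
    u = lam_hat - C
    eps = alpha / u - beta / C
    num = u * beta * (C * alpha - beta * u)
    den = (alpha + beta) * lam_bar * (C * alpha * math.exp(eps * K) - beta * u)
    return num / den


def bvn_required_voq(P_l, lam_hat, lam_bar, C, alpha, beta):
    """VOQ size needed for a target loss probability, from (29)."""
    u = lam_hat - C
    eps = alpha / u - beta / C
    extra = u * beta * (C * alpha - beta * u) / ((alpha + beta) * lam_bar * P_l)
    return math.log((beta * u + extra) / (C * alpha)) / eps


N = 64
C = 1.0 / N
lam_hat, alpha, beta = 0.8, 0.49, 0.0096
lam_bar = lam_hat * beta / (alpha + beta)   # mean rate implied by the on-off model

K_req = bvn_required_voq(1e-5, lam_hat, lam_bar, C, alpha, beta)
print(f"required VOQ size without deflection: {K_req:.1f}")
print(f"loss at that size: {bvn_loss(K_req, lam_hat, lam_bar, C, alpha, beta):.3e}")
```

With these inputs the required size comes out at several hundred, far above the minimum size obtained from (13) for the D-BvN switch under ideal deflection.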
Using the same technique described in Appendix A, we can
obtain the average delay E[D] and delay jitter var[D] of BvN
switches from (29) as follows:
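\[
E[D] = \int_0^{(K/C)^-} D \times
       \frac{\hat{\lambda}\,p_1(x = DC)}{\bar{\lambda}(1 - P_l)}\,C\,dD
     + \frac{K}{C} \times \frac{C \times \Pr\{x = K\}}{\bar{\lambda}(1 - P_l)}, \tag{30}
\]
and
\[
E[D^2] = \int_0^{(K/C)^-} D^2 \times
         \frac{\hat{\lambda}\,p_1(x = DC)}{\bar{\lambda}(1 - P_l)}\,C\,dD
       + \left( \frac{K}{C} \right)^2 \times
         \frac{C \times \Pr\{x = K\}}{\bar{\lambda}(1 - P_l)}. \tag{31}
\]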
APPENDIX C
DISCRETE MODEL OF SIMULATION
The time is slotted in the simulation model of D-BvN
switches. We assume that the input stream to each VC is a
discrete-time on-off process with geometrically distributed on
and off states with transition probabilities α and β, respec-
tively, and all VCs are statistically identical. Furthermore, we
assume that an input packet is generated in each time slot with
probability λˆ during the on state, 0 < λˆ ≤ 1, and no packets
arrive during the off state. The probability λˆ is actually the
peak arrival rate, such that the following average input rate
\[
\bar{\lambda} = \frac{\hat{\lambda}\beta}{\alpha + \beta} < C = \frac{1}{N},
\]
is consistent with the same parameter of the fluid-flow model
employed in our analysis. As for BvN decomposition, we
use a set of randomly generated N circular-shift permutation
matrices to guarantee that the average capacity assigned to
each VC is C = 1/N . In addition, we set the cross-switch
constant delay a equal to one time slot in the simulation.
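The two ingredients of this simulation model can be sketched in a few lines. The code below is a minimal illustration, not the authors' simulator: it assumes that the "randomly generated" schedule is the set of N circular shifts applied in random order within a frame, and it estimates the mean arrival rate of a single on-off source for comparison with λˆβ/(α + β). Function names and the seed are hypothetical.

```python
import random


def circular_shift_schedule(n, rng):
    """Return n circular-shift permutations in random order (one frame)."""
    shifts = list(range(n))
    rng.shuffle(shifts)
    # In slot t of a frame, input i is connected to output (i + shifts[t]) % n.
    return [[(i + s) % n for i in range(n)] for s in shifts]


def on_off_arrivals(slots, lam_hat, alpha, beta, rng):
    """Count packets generated by one discrete-time on-off source."""
    on = False
    packets = 0
    for _ in range(slots):
        if on and rng.random() < lam_hat:
            packets += 1
        # Leave the on state with probability alpha, the off state with probability beta.
        if on:
            on = rng.random() >= alpha
        else:
            on = rng.random() < beta
    return packets


rng = random.Random(2013)
N = 64

# Each VC (input i, output j) should receive exactly one token per frame: C = 1/N.
tokens = [[0] * N for _ in range(N)]
for perm in circular_shift_schedule(N, rng):
    for i, j in enumerate(perm):
        tokens[i][j] += 1

slots = 400_000
rate = on_off_arrivals(slots, 0.8, 0.49, 0.0096, rng) / slots
expected = 0.8 * 0.0096 / (0.49 + 0.0096)
print(f"measured rate = {rate:.5f}, lam_hat*beta/(alpha+beta) = {expected:.5f}")
```

Because every circular shift appears once per frame, the capacity guarantee C = 1/N holds for any ordering of the shifts; only the measured arrival rate is subject to sampling noise.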
REFERENCES
[1] C. S. Chang, D. S. Lee, and Y. S. Jou, “Load balanced Birkhoff-von
Neumann switches, part I: one-stage buffering,” Computer
Communications, vol. 25, no. 6, pp. 611–622, Apr. 2002.
[2] ——, “Load balanced Birkhoff-von Neumann switches, part II:
multi-stage buffering,” Computer Communications, vol. 25, no. 6,
pp. 623–634, Apr. 2002.
[3] T. T. Lee and C. H. Lam, “Path switching-a quasi-static routing scheme
for large-scale ATM packet switches,” IEEE J. Sel. Areas Commun.,
vol. 15, no. 5, pp. 914–924, Jun. 1997.
[4] N. McKeown, “The iSLIP scheduling algorithm for input-queued
switches,” IEEE/ACM Trans. Netw., vol. 7, no. 2, pp. 188–201, Apr.
1999.
[5] C. L. Yu, C. S. Chang, and D. S. Lee, “CR switch: A load-balanced
switch with contention and reservation,” IEEE/ACM Trans. Netw.,
vol. 17, no. 5, pp. 1659–1671, Oct. 2009.
[6] I. Keslassy and N. McKeown, “Maintaining packet order in two-stage
switches,” in Proc. IEEE INFOCOM, vol. 2, 2002, pp. 1032– 1041.
[7] B. Lin and I. Keslassy, “The interleaved matching switch architecture,”
IEEE Trans. Commun., vol. 57, no. 12, pp. 3732–3742, Dec. 2009.
[8] J. R. Liao and P. H. Wu, “Longest queue first in round-robin matching
for input-queued switches,” in Information Theory and its Applications
(ISITA), 2010 International Symposium on, Oct. 2010, pp. 332–336.
[9] D. Lin, Y. Jiang, and M. Hamdi, “Selective-request round-robin schedul-
ing for VOQ packet switch architecture,” in Proc. IEEE ICC, Jun. 2011,
pp. 1–5.
[10] C. He and K. L. Yeung, “D-LQF: An efficient distributed scheduling
algorithm for input-queued switches,” in Proc. IEEE ICC, Jun. 2011,
pp. 1–5.
[11] Y. Li, S. Panwar, and H. J. Chao, “On the performance of a dual round-
robin switch,” in Proc. IEEE INFOCOM, vol. 3, 2001, pp. 1688–1697.
[12] C. S. Chang, D. S. Lee, Y. J. Shih, and C.-L. Yu, “Mailbox switch: a
scalable two-stage switch architecture for conflict resolution of ordered
packets,” IEEE Trans. Commun., vol. 56, no. 1, pp. 136–149, Jan. 2008.
[13] H. I. Lee, “A two-stage switch with load balancing scheme maintaining
packet sequence,” IEEE Commun. Lett., vol. 10, no. 4, pp. 290–292,
Apr. 2006.
[14] J. J. Jaramillo, F. Milan, and R. Srikant, “Padded frames: A novel algo-
rithm for stable scheduling in load-balanced switches,” in Information
Sciences and Systems, 2006 40th Annual Conference on, Mar. 2006, pp.
1732–1737.
[15] X. Wang, Y. Cai, S. Xiao, and W. Gong, “A three-stage load-balancing
switch,” in Proc. IEEE INFOCOM, Apr. 2008, pp. 1993–2001.
[16] B. Hu and K. L. Yeung, “Feedback-based scheduling for load-balanced
two-stage switches,” IEEE/ACM Trans. Netw., vol. 18, no. 4, pp. 1077–
1090, Aug. 2010.
[17] H. I. Lee and S. W. Seo, “A load balancing scheme for Birkhoff-von
Neumann input-queued switches,” in Proc. IEEE ICC, May 2008, pp.
5664–5668.
[18] B. Hu, K. L. Yeung, and Z. Zhang, “Load-balanced three-stage switch,”
Journal of Network and Computer Applications, vol. 35, no. 1, pp. 502–
509, Jan. 2012.
[19] Y. Shen, S. Jiang, S. S. Panwar, and H. J. Chao, “Byte-focal: a practical
load balanced switch,” in Proc. IEEE HPSR, May 2005, pp. 6–12.
[20] Y. Shen, S. S. Panwar, and H. J. Chao, “Performance analysis of a
practical load balanced switch,” in Proc. IEEE HPSR, 2006, p. 6.
[21] I. Keslassy, “The load-balanced router,” Ph.D. dissertation, Stanford
Univ., 2004.
[22] B. Lin and I. Keslassy, “The concurrent matching switch architecture,”
IEEE/ACM Trans. Netw., vol. 18, no. 4, pp. 1330–1343, Aug. 2010.
[23] C. S. Chang, D. S. Lee, and C. Y. Yue, “Providing guaranteed rate ser-
vices in the load balanced Birkhoff-von Neumann switches,” IEEE/ACM
Trans. Netw., vol. 14, no. 3, pp. 644–656, Jun. 2006.
[24] B. Lin and I. Keslassy, “A scalable switch for service guarantees,” in
High Performance Interconnects, 2005. Proceedings. 13th Symposium
on, Aug. 2005, pp. 93–99.
[25] C. S. Chang, W. J. Chen, and H. Y. Huang, “On service guarantees for
input-buffered crossbar switches: a capacity decomposition approach by
Birkhoff and von Neumann,” in IEEE IWQoS’99, 1999, pp. 79–86.
[26] M. C. Chan, T. T. Lee, and S. Y. Liew, “Statistical performance
guarantees in large-scale cross-path packet switch,” in Proc. IEEE ICC,
vol. 3, 2000, pp. 1748–1752.
[27] R. L. Cruz, “A calculus for network delay. I. network elements in
isolation,” IEEE Trans. Inf. Theory, vol. 37, no. 1, pp. 114–131, Jan.
1991.
[28] V. W. S. Chan, G. Weichenberg, and M. Medard, “Optical flow switch-
ing,” in Broadband Communications, Networks and Systems, 2006, 3rd
International Conference on, Oct. 2006, pp. 1–8.
[29] Z. Zhang, “Distribute and match - the DM switch for high speed packet
switching networks,” in Proc. IEEE GLOBECOM, Dec. 2011, pp. 1–6.
[30] T. T. Lee and S. C. Liew, Principles of Broadband Switching and
Networking. Wiley-Interscience, 2010.
[31] S. Bassi, M. Decina, P. Giacomazzi, and A. Pattavina, “Multistage
shuffle networks with shortest path and deflection routing for high
performance ATM switching: the open-loop shuffleout,” IEEE Trans.
Commun., vol. 42, no. 10, pp. 2881–2889, Oct. 1994.
[32] S. C. Liew and T. T. Lee, “N logN dual shuffle-exchange network with
error-correcting routing,” in Proc. IEEE ICC, vol. 1, 1992, pp. 262–268.
[33] C. F. Hsu, T. L. Liu, and N. F. Huang, “Performance analysis of
deflection routing in optical burst-switched networks,” in Proc. IEEE
INFOCOM, vol. 1, 2002, pp. 66–73.
[34] C. S. Chang, W. J. Chen, and H. Y. Huang, “Birkhoff-von Neumann
input buffered crossbar switches,” in Proc. IEEE INFOCOM, vol. 3,
Mar. 2000, pp. 1614–1623.
[35] J. Y. Hui and E. Arthurs, “A broadband packet switch for integrated
transport,” IEEE J. Sel. Areas Commun., vol. 5, no. 8, pp. 1264–1273,
Oct. 1987.
[36] M. J. Karol, M. G. Hluchyj, and S. P. Morgan, “Input versus output
queueing on a space-division packet switch,” IEEE Trans. Commun.,
vol. 35, no. 12, pp. 1347–1356, Dec. 1987.
[37] J. Y. Hui, Switching and Traffic Theory for Integrated Broadband.
Massachusetts: Kluwer Academic Publisher, 1990.
