Online packet scheduling for CIOQ and buffered crossbar switches by Al-Bawani, Kamal et al.
Original citation: 
Al-Bawani, Kamal, Englert, Matthias and Westermann, Matthias (2016) Online packet scheduling for CIOQ and buffered crossbar switches. In: 28th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), California, USA, 11-13 Jul 2016. Published in: SPAA '16 Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures pp. 241-250.
Permanent WRAP URL: 
http://wrap.warwick.ac.uk/79789                
 
Copyright and reuse: 
The Warwick Research Archive Portal (WRAP) makes this work by researchers of the 
University of Warwick available open access under the following conditions.  Copyright © 
and all moral rights to the version of the paper presented here belong to the individual 
author(s) and/or other copyright owners.  To the extent reasonable and practicable the 
material made available in WRAP has been checked for eligibility before being made 
available. 
 
Copies of full items can be used for personal research or study, educational, or not-for-profit purposes without prior permission or charge, provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page, and the content is not changed in any way.
 
Publisher’s statement: 
"© ACM, 2016. This is the author's version of the work. It is posted here by permission of 
ACM for your personal use. Not for redistribution. The definitive version was published in 
SPAA '16 Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and 
Architectures pp. 241-250. http://doi.acm.org/10.1145/2935764.2935792  
 
A note on versions: 
The version presented here may differ from the published version, or version of record. If you wish to cite this item, you are advised to consult the publisher’s version. Please see the ‘permanent WRAP URL’ above for details on accessing the published version, and note that access may require a subscription.
 
For more information, please contact the WRAP Team at: wrap@warwick.ac.uk 
 
Online Packet Scheduling for CIOQ and Buffered Crossbar Switches∗

Kamal Al-Bawani
Department of Computer Science, RWTH Aachen University, Germany
kbawani@cs.rwth-aachen.de

Matthias Englert
DIMAP and Department of Computer Science, University of Warwick, UK
englert@dcs.warwick.ac.uk

Matthias Westermann
Department of Computer Science, TU Dortmund, Germany
matthias.westermann@cs.tu-dortmund.de
ABSTRACT
We consider the problem of online packet scheduling in Combined Input and Output Queued (CIOQ) and buffered crossbar switches. In the widely used CIOQ switches, packet buffers (queues) are placed at both input and output ports. An N × N CIOQ switch has N input ports and N output ports, where each input port is equipped with N queues, each of which corresponds to an output port, and each output port is equipped with only one queue. In each time step, arbitrarily many packets may arrive at each input port, and only one packet can be transmitted from each output port. Packets are transferred from the queues of input ports to the queues of output ports through the internal fabric. Buffered crossbar switches follow a similar design, but are equipped with additional buffers in their internal fabric. In either model, our goal is to maximize the number or, in case the packets have weights, the total weight of transmitted packets. Our main objective is to devise online algorithms that are both competitive and efficient. We improve the previously known results for both switch models, both for unweighted and weighted packets.
For unweighted packets, Kesselman and Rosén (J. Algorithms ’06) give an online algorithm that is 3-competitive for CIOQ switches. We give a faster, more practical algorithm achieving the same competitive ratio. In the buffered crossbar model we also show 3-competitiveness, improving the previously known ratio of 4.
For weighted packets, we give 5.83- and 14.83-competitive algorithms with an elegant analysis for CIOQ and buffered crossbar switches, respectively. This improves upon the previously known ratios of 6 and 16.24.
∗The second and third authors’ work was supported by ERC Grant Agreement No. 307696.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
SPAA ’16, July 11 - 13, 2016, Pacific Grove, CA, USA
© 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-4210-0/16/07. . . $15.00
DOI: http://dx.doi.org/10.1145/2935764.2935792
Keywords
Online algorithms; competitive analysis; scheduling; buffer
management; CIOQ switches; buffered crossbar switches
1. INTRODUCTION
In the widely used Combined Input and Output Queued (CIOQ) switches, packet buffers (queues) are placed at both input and output ports. An N × N CIOQ switch has N input ports and N output ports. Each input port is equipped with N queues, each of which corresponds to an output port, and each output port is equipped with only one queue. The switching fabric of the switch connects the input ports with the output ports and is used to transfer packets from the queues of input ports to the queues of output ports. Figure 1 depicts an example of a CIOQ switch.
When a packet arrives at a CIOQ switch, it is first tagged with the following information: its value, which represents its class of service (i.e., its priority), the input port through which it enters the switch, and the output port through which it has to leave the switch. Packets proceed inside the switch as follows. They are first stored in the queues of the input ports, such that each packet is stored in the queue that corresponds to its output port. They are then transferred from input to output ports through the switching fabric, and reside in the queues of the output ports until they are eventually sent out of the switch. However, queues inside the switch have limited capacities, and bursts of arriving packets may exceed these capacities. Thus, queues may overflow. Typically, packets are transferred through the switching fabric at S times the rate of transmission, i.e., the switching fabric performs S transfer cycles in each time step. We call S the speedup of the switch. Note that we consider non-FIFO queues, i.e., packets can be stored in and released from queues in any arbitrary order.
Closely related to CIOQ switches, another type of switch architecture, the so-called buffered crossbar switch, is obtained by adding further queues at the crosspoints of the switching fabric. More specifically, for every queue at the input ports, an additional queue is placed in the switching fabric and dedicated to accommodating packets that are transferred from the input queue before they are later transferred further to the corresponding output port. The number of those crossbar queues is proportional to the number of crosspoints, i.e., $N^2$, but it has been shown that the adoption of crossbar queues significantly decreases the scheduling overhead of CIOQ switches. Figure 2 depicts an example of a buffered crossbar switch.

[Figure 1: CIOQ switch, an example with N = 3 (input ports, switching fabric, output ports).]
Packet scheduling in both CIOQ and buffered crossbar switches has been extensively studied in the networking literature (see, e.g., [9, 10]). The design and analysis of scheduling algorithms in that line of research is mostly based on prior assumptions about the traffic distribution, e.g., Poisson-like distributions. However, it has been shown that Internet traffic does not necessarily adhere to such distributions (see, e.g., [26, 28]). We do not make any prior assumptions about the arrival behavior of packets, and instead resort to the framework of competitive analysis [27], which is the typical worst-case analysis used to assess the performance of online algorithms, i.e., algorithms whose input is revealed piece by piece over time and whose decisions in each time step are irrevocable.
In competitive analysis, the benefit, in our case the switch’s
throughput, of an online algorithm is compared to the benefit
of an optimal algorithm opt which is assumed to know the
entire input sequence in advance. An online algorithm onl is
called c-competitive if, for each input sequence σ, the benefit
of opt over σ is at most c times the benefit of onl over σ.
The value c is also called the competitive ratio of onl.
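To make the definition concrete, the following Python sketch computes the largest ratio opt(σ)/onl(σ) over a finite set of tested input sequences. The function name `ratio_lower_bound` and its input format are illustrative assumptions, not part of the paper. Since a c-competitive algorithm must satisfy opt(σ) ≤ c · onl(σ) for every σ, the returned value is a lower bound on any c for which onl can be c-competitive; it cannot certify an upper bound.

```python
import math


def ratio_lower_bound(benefits):
    """Largest observed opt/onl ratio over tested input sequences.

    benefits: iterable of (opt_benefit, onl_benefit) pairs, one per sequence.
    If onl is c-competitive, then opt <= c * onl on every pair, so c must be
    at least the returned value (a lower bound, not a proof of competitiveness).
    """
    worst = 1.0  # opt is optimal, so opt >= onl and every ratio is at least 1
    for opt_benefit, onl_benefit in benefits:
        if onl_benefit == 0:
            if opt_benefit > 0:
                return math.inf  # no finite c satisfies opt <= c * 0
            continue  # 0 <= c * 0 holds for any c
        worst = max(worst, opt_benefit / onl_benefit)
    return worst


print(ratio_lower_bound([(3, 1), (5, 5), (4, 2)]))  # 3.0
```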
1.1 Our Contribution
Our objective in the CIOQ model is twofold: to devise online algorithms that are both competitive and efficient. All online algorithms known for this problem are based on computing a maximum matching in each scheduling cycle, and thus are far from being efficient for real-world switches. We present new algorithms that are significantly more efficient and yet achieve the best competitive ratios known for this problem.
In each scheduling cycle, a bipartite graph is induced from
the current configuration of the input and output queues,
where the vertices of the left-hand side correspond to the
input ports, and the vertices of the right-hand side correspond
to the output ports. An edge (i, j) indicates that a packet
can be transferred from the i-th input port to the j-th output
port. Clearly, a matching in this graph corresponds to an
admissible schedule for the current scheduling cycle.
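As an illustration, the sketch below builds the edge set of the induced bipartite graph from queue lengths and checks that a set of edges is a matching, i.e., an admissible schedule. It assumes the unit-value edge rule used by gm in Section 2.1 (Qij non-empty and Qj not full), represents queues by their lengths only, and uses 0-based port indices instead of the paper's 1-based ones; the function names are hypothetical.

```python
def induced_edges(in_len, out_len, out_cap):
    """Edges (i, j) of the bipartite graph induced by the queue configuration.

    in_len[i][j]: number of packets in input queue Q_ij
    out_len[j]:   number of packets in output queue Q_j
    out_cap[j]:   capacity B(Q_j)
    An edge exists iff Q_ij is non-empty and Q_j is not full.
    """
    n = len(out_len)
    return [(i, j) for i in range(n) for j in range(n)
            if in_len[i][j] > 0 and out_len[j] < out_cap[j]]


def is_matching(schedule):
    """True iff no input port and no output port appears in two edges."""
    inputs = [i for i, _ in schedule]
    outputs = [j for _, j in schedule]
    return len(set(inputs)) == len(inputs) and len(set(outputs)) == len(outputs)


edges = induced_edges([[1, 0], [2, 1]], [0, 3], [3, 3])
print(edges)                   # [(0, 0), (1, 0)]
print(is_matching(edges))      # False: both edges use output port 0
print(is_matching([(0, 0)]))   # True
```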
We present two online algorithms in this model: Greedy Matching (gm) for the unit-value case, i.e., where all packets have the same value, and Preemptive Greedy (pg) for the general-value case. Both algorithms are based on greedy maximal matching computations, i.e., we construct a matching incrementally by adding edges, one by one, until no more edges can be added. This is much more efficient than computing the maximum matchings that have been used in previous works. Moreover, computing maximal matchings is more in line with current practice in distributed systems, where packet scheduling has to be performed in real time.

[Figure 2: Buffered crossbar switch, an example with N = 3 (input ports, switching fabric, output ports).]
With respect to competitiveness, we show in Section 2.1 that gm is 3-competitive for any speedup, and thus it achieves the best competitive ratio known for this problem [22]. In Section 2.2, we show that pg has a competitive ratio of $3 + 2\sqrt{2} \approx 5.828$ for any speedup, which improves upon the previously known competitive ratio of 6 [23].
To obtain these results in an elegant way, we manipulate the queues of an optimal offline algorithm such that certain invariants in relation to our online algorithms are maintained.
The techniques we use in the analysis of gm and pg also allow us to achieve improved upper bounds in the related model of buffered crossbar switches. For the unit-value case of this model, Kesselman et al. [20] present a greedy algorithm, which we call Crossbar Greedy Unit (cgu), with a competitive ratio of 4 for any speedup. We improve on this result and show that cgu is indeed 3-competitive. For the general-value case, they give an algorithm that is 16.24-competitive for any speedup. We present a slightly different algorithm, Crossbar Preemptive Greedy (cpg), and show that it achieves a competitive ratio of $12 + 2\sqrt{2} \approx 14.828$ for any speedup.
A similar analysis technique has been successfully used by Jeż et al. [18] for other packet scheduling related problems. However, in that work the buffer is manipulated in such a way that the optimal algorithm and the online algorithm always have identical buffer content. In our proofs, we maintain different invariants.
1.2 Related Work
For the general-value case of CIOQ switches with FIFO queues, i.e., where packets are stored and released in order of their arrival, Kesselman and Rosén [22] give two algorithms with competitive ratios of $4 \cdot S$ and $8 \cdot \min\{k, 2 \log \alpha\}$, where k is the number of distinct packet values and α is the ratio between the largest and the smallest packet value. The latter result was improved by Azar and Richter [7], who give an algorithm with a competitive ratio of 8 for any speedup. Kesselman et al. [21] show that this algorithm is 7.47-competitive. For the buffered crossbar model with FIFO queues, Kesselman et al. [19] give a 19.95-competitive algorithm for any speedup.
The model of input queued switches (IQ) consists of m
queues of the same capacity B and one output port. It is
worth noticing that CIOQ switches generalize this model
since if the speedup is 1 and only one input port is in use,
CIOQ and IQ switches become equivalent. In the unit-value
case of the IQ model, Azar and Richter [6] provide a lower
bound of 2−1/m on the competitive ratio of any deterministic
algorithm. Clearly, this lower bound carries over to the CIOQ
model. Further results on the IQ model can be found in [3,
4, 5, 8, 17].
The problem of packet scheduling (also known as buffer management) has also been studied under several other models, for example, the multi-queue model with shared memory [1, 14, 15], the multi-queue model with class segregation [2], the single-queue (FIFO) model [12], and the bounded delay model, where packets have deadlines besides their values [11, 24]. Comprehensive and up-to-date surveys on this problem and its variants can be found in [13, 16, 25].
1.3 Models and Notations
We consider a CIOQ switch with N input ports and N
output ports. Each input port has N queues and each output
port has one queue. We call the queues at the input ports the
input queues and those at the output ports the output queues.
An input queue that is placed at input port i (i = 1, . . . , N)
and corresponds to output port j (j = 1, . . . , N) is denoted
by Qij . An output queue that is placed at output port j
(j = 1, . . . , N) is denoted by Qj . For any input or output
queue Q, the capacity of Q is denoted by B(Q), and Q(t)
denotes the set of packets that reside in Q at time t. All
queues in the switch are non-FIFO, i.e., packets may be
stored in and released from queues in any arbitrary order.
An input instance of this problem is a sequence of packets
arriving at the switch in an online manner, i.e., packets that
arrive at time t are not known before t. All packets have
the same size. For each packet p in the input sequence, v(p),
arr(p), in(p), and out(p) denote p’s value, arrival time, input
port, and output port, respectively, where in(p) and out(p)
take on values between 1 and N .
Time is discretized into steps of unit length, and each
of these time steps is divided into three phases; namely,
arrival, scheduling and transmission phases. In the arrival
phase, arbitrarily many packets may arrive at the switch.
An arriving packet p is either accepted and thus inserted in
queue Qij , where i = in(p) and j = out(p), or it is rejected,
i.e., discarded.
In the scheduling phase, a set of packets that are stored in
input queues are transferred to their corresponding output
queues through the switching fabric of the switch. These
transfers take place in internal time cycles which we call the
scheduling cycles. We say that a switch has a speedup S
when it is capable of performing S scheduling cycles within
a single time step. We denote the s-th cycle of time step t
by t[s], for s = 1, . . . , S. In any scheduling cycle, a matching
between input and output ports is computed, such that at
most one packet is released from each input port and at most
one packet is admitted to each output port. More specifically,
when a packet p is transferred from queue Qij in scheduling
cycle t[s], it is forwarded through the switching fabric to
queue Qj , and no packet except p is released from input port
i or forwarded to output port j in t[s].
Finally, in the transmission phase, at most one packet is
sent out from each output queue, i.e., transmitted to its next
destination on the network.
Preemption is allowed, i.e., a packet that was previously inserted into a queue can be discarded before it is sent. Therefore, a packet may be lost on one of two occasions: rejection upon its arrival, or preemption after being stored in a queue.
The benefit made by an online algorithm onl on an input
sequence σ is denoted by onl(σ), and is defined as the total
value of packets that onl sends from the output queues. We
aim at maximizing this benefit. An algorithm that knows
the entire input beforehand and makes the maximum benefit
on any sequence is denoted as opt. An online algorithm onl
is c-competitive if opt(σ) ≤ c ·onl(σ) for any input sequence
σ.
Finally, the time that precedes the first arrival of the
sequence is denoted as time 0. We assume that the queues
of any algorithm are all empty at time 0.
Buffered crossbar switches are obtained by adding further
queues at the crosspoints of the switching fabric. A crossbar
queue that is placed at the crosspoint of input port i (i =
1, . . . , N) and output port j (j = 1, . . . , N) is denoted by Cij .
Again, all queues in the switch are non-FIFO, i.e., packets
may be stored in and released from queues in any arbitrary
order.
All other notations and conventions of the CIOQ model
hold also for the buffered crossbar model. However, each
cycle of the scheduling phase in the buffered crossbar model
is divided into two subphases: the input subphase and the
output subphase. In the input subphase, packets can be transferred
from any input queue Qij to its corresponding crossbar
queue Cij , such that at most one packet is transferred from
each input port i. In the output subphase, packets can be
transferred from any crossbar queue Cij to its corresponding
output queue Qj , such that at most one packet is transferred
to each output port j.
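The two subphases can be sketched as follows. This only illustrates the constraints (at most one packet leaves each input port in the input subphase, at most one packet enters each output port in the output subphase); it is not the cgu or cpg algorithm. The lowest-index-first choice of queues, the single capacity `cross_cap` shared by all Cij, the representation of queues by packet counts, and 0-based ports are simplifying assumptions. The sketch also assumes that a packet moved into Cij in the input subphase may leave it in the output subphase of the same cycle.

```python
def crossbar_cycle(in_q, cross_q, out_q, cross_cap, out_cap):
    """One scheduling cycle of a buffered crossbar switch (unit-value counts).

    in_q[i][j], cross_q[i][j]: packet counts of Q_ij and C_ij; out_q[j]: count of Q_j.
    The lowest-index-first choice below is an illustrative assumption, not cgu/cpg.
    """
    n = len(out_q)
    input_moves, output_moves = [], []
    # Input subphase: at most one packet leaves each input port i (Q_ij -> C_ij).
    for i in range(n):
        for j in range(n):
            if in_q[i][j] > 0 and cross_q[i][j] < cross_cap:
                in_q[i][j] -= 1
                cross_q[i][j] += 1
                input_moves.append((i, j))
                break
    # Output subphase: at most one packet enters each output port j (C_ij -> Q_j).
    for j in range(n):
        for i in range(n):
            if cross_q[i][j] > 0 and out_q[j] < out_cap:
                cross_q[i][j] -= 1
                out_q[j] += 1
                output_moves.append((i, j))
                break
    return input_moves, output_moves


in_q = [[1, 1], [0, 2]]
cross_q = [[0, 0], [0, 0]]
out_q = [0, 0]
print(crossbar_cycle(in_q, cross_q, out_q, cross_cap=1, out_cap=2))
# ([(0, 0), (1, 1)], [(0, 0), (1, 1)])
print(in_q, out_q)  # [[0, 1], [0, 1]] [1, 1]
```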
2. CIOQ SWITCHES
2.1 Unit-Value Case
In this case, all packets have the same value 1. Thus, our
goal is to maximize the number of transmitted packets. In
the following, we present the Greedy Matching algorithm
(gm).
Arrival phase: For every arriving packet p with in(p) = i and out(p) = j, accept p if $Q_{ij}$ is not full; otherwise, reject p.
Scheduling phase: In every scheduling cycle t[s], a bipartite graph $G_{t[s]} = (U, V, E)$ is induced from the current configuration of the switch, where $U = \{u_1, \dots, u_N\}$, $V = \{v_1, \dots, v_N\}$, and an edge $(u_i, v_j) \in E$ if and only if the input queue $Q_{ij}$ is not empty and the output queue $Q_j$ is not full at t[s].
A greedy matching $M_{t[s]}$ is then computed on $G_{t[s]}$ in the following way: Start with an empty matching and iterate over all edges of E. Add an edge e to the current matching if e does not violate the matching property.
After $M_{t[s]}$ is computed, for each edge $(u_i, v_j) \in M_{t[s]}$, the head packet of $Q_{ij}$ is transferred to $Q_j$.
Transmission phase: For every non-empty output queue $Q_j$, send the packet at the head of $Q_j$.
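A minimal Python sketch of one gm time step follows. Since all packets have value 1, each queue is represented by its packet count, so "the head packet" is simply one unit; ports are 0-indexed, and the edge iteration order (row-major over (i, j)) is our assumption, since gm permits any order. The function names are hypothetical.

```python
def gm_cycle(in_q, out_q, out_cap):
    """One gm scheduling cycle t[s]; mutates the queue counts, returns M_{t[s]}."""
    n = len(out_q)
    # Induce G_{t[s]}: edge (i, j) iff Q_ij is non-empty and Q_j is not full.
    edges = [(i, j) for i in range(n) for j in range(n)
             if in_q[i][j] > 0 and out_q[j] < out_cap[j]]
    used_in, used_out, matching = set(), set(), []
    for i, j in edges:  # greedy: keep an edge if both endpoints are still free
        if i not in used_in and j not in used_out:
            matching.append((i, j))
            used_in.add(i)
            used_out.add(j)
    for i, j in matching:  # move one packet from Q_ij to Q_j per matched edge
        in_q[i][j] -= 1
        out_q[j] += 1
    return matching


def gm_time_step(in_q, in_cap, out_q, out_cap, arrivals, speedup):
    """Arrival, S scheduling cycles, transmission; returns #packets sent."""
    for i, j in arrivals:  # arrival phase: accept iff Q_ij is not full
        if in_q[i][j] < in_cap[i][j]:
            in_q[i][j] += 1
    for _ in range(speedup):  # scheduling phase
        gm_cycle(in_q, out_q, out_cap)
    sent = 0
    for j in range(len(out_q)):  # transmission phase: one packet per Q_j
        if out_q[j] > 0:
            out_q[j] -= 1
            sent += 1
    return sent


in_q, out_q = [[0, 0], [0, 0]], [0, 0]
arrivals = [(0, 0), (0, 1), (1, 0), (0, 0), (0, 0)]  # last one is rejected
print(gm_time_step(in_q, [[2, 2], [2, 2]], out_q, [1, 1], arrivals, speedup=1))  # 1
print(in_q)  # [[1, 1], [1, 0]]
```

In this run the greedy order picks (0, 0) first, which blocks both (0, 1) and (1, 0), so only one packet is transferred although the maximum matching {(0, 1), (1, 0)} would transfer two. This is the price of using a maximal instead of a maximum matching; Theorem 1 bounds its overall effect.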
The next theorem shows that gm is 3-competitive for any
speedup.
Theorem 1. The competitive ratio of gm is at most 3 for
any speedup.
From now on, we fix an input sequence σ, and, for any
input or output queue Q, we reserve the notation Q for the
online algorithm and use Q∗ to denote the corresponding
queue of the offline algorithm opt.
First, without loss of generality, we assume that opt is greedy in transmission events, i.e., it sends a packet from an output queue whenever that queue is not empty. Obviously, as opt knows in advance which packets it is going to send, holding packets back in output queues, rather than sending them as early as possible, cannot improve its benefit.
Now, we modify opt in a way that does not decrease its
benefit of σ. Specifically, at the end of each scheduling cycle
t[s], i.e., immediately after opt performs its scheduling policy,
we apply the following two modifications on the configuration
of opt in the given order:
Modification 2.1.1. Suppose that gm transfers a packet from $Q_{ij}$ and opt does not transfer any packet from $Q^*_{ij}$ in t[s]. If $Q^*_{ij}$ is not empty in t[s], we release a packet p from $Q^*_{ij}$ and send it directly out of the switch, i.e., through an imaginary channel. In this case, p is called a privileged packet of Type 1 and it contributes to the benefit of the optimal algorithm.
Modification 2.1.2. Suppose that opt transfers a packet p to $Q^*_j$ and gm does not transfer any packet to $Q_j$ in t[s]. If $Q_j$ is not full in t[s], we send p directly out of the switch. In this case, we call p a privileged packet of Type 2 and it contributes to the benefit of the optimal algorithm.
Clearly, these modifications do not decrease the benefit of the optimal algorithm. They can only make it stronger by allowing it to send packets directly from input ports out of the switch without enqueuing them in output ports. In this case, the input queue (Modification 2.1.1) or the output queue (Modification 2.1.2) of opt becomes shorter, and thus the optimal algorithm may accept more new packets.
Before we continue, we introduce further notation. We call packets that opt schedules through the normal channels, i.e., packets that are not privileged, normal packets. We use $S^*$ and $P^*$ to denote the sets of opt’s normal and privileged packets, respectively. Clearly, the benefit of opt is given by $|P^*| + |S^*|$. We also use S to denote the set of packets sent by gm. Thus, we want to show that $|P^*| + |S^*| \le 3|S|$.
We now show how to derive the competitive ratio of 3.
First, we show in Lemma 1 how Modifications 2.1.1 and 2.1.2
are used to preserve the following invariant: At any time,
each queue in gm is no shorter than its counterpart in opt.
Therefore, for any time step t and output port j, if opt sends
a packet from j in t, gm must also send a packet from j in
t. Hence, |S∗| ≤ |S|. After that, we show by Lemma 3 that
|P ∗| ≤ 2 |S|. Thus, the proof of Theorem 1 follows directly
from these two lemmas.
Lemma 1. For any i, j ∈ {1, . . . , N} and any time t, the following inequalities hold:
I1. $|Q^*_{ij}(t)| \le |Q_{ij}(t)|$
I2. $|Q^*_j(t)| \le |Q_j(t)|$
Proof. Inequalities I1 and I2 can be shown by a simple
induction over time. Let the induction base be at time 0,
i.e., before the sequence starts. All queues are empty at this
time and thus I1 and I2 hold. Assume now that they hold
for any time up to time t− 1. We next show that they hold
for t as well.
Clearly, queues change in arrival, scheduling and transmission events only. So, we assume that t is the time immediately after an event τ that is either an arrival, scheduling or transmission event.
Assume τ is an arrival event. Clearly, output queues do
not change in arrival events and thus I2 holds for this case.
For I1, the only critical case is when the arriving packet is
rejected by gm and accepted by opt. However, the input
queue of gm must be full in this case and thus I1 still holds.
Now, let τ be a scheduling event. Here, the only critical case for I1 is when gm transfers a packet from $Q_{ij}$ while opt does not transfer anything from $Q^*_{ij}$. However, either $Q^*_{ij}$ is empty in this case or this cannot happen due to Modification 2.1.1. For I2, the only critical case is when opt inserts a packet into $Q^*_j$ while gm does not insert anything into $Q_j$. However, either $Q_j$ is full in this case or this cannot happen due to Modification 2.1.2.
Finally, assume τ is a transmission event. Clearly, the
input queues do not change in transmission events and thus
I1 holds for this case. For I2, the only critical case is when
gm sends a packet from Qj while opt does not send anything
from Q∗j . However, since we assume that opt is greedy at
sending, its output queue must be empty in this case and
thus I2 still holds.
The following lemma shows that if Modification 2.1.2 takes place, gm must transfer a packet from the same input port.
Lemma 2. Suppose that, in t[s], opt transfers a packet p from $Q^*_{ij}$ to $Q^*_j$ and gm does not transfer any packet to $Q_j$. If $Q_j$ is not full in t[s], then gm transfers a packet p′ from $Q_{ij'}$ in t[s], where $j' \ne j$.
Proof. Recall the bipartite graph $G_{t[s]}$ and the corresponding matching $M_{t[s]}$, which are induced from the configuration of gm right before performing the scheduling cycle t[s].
Assume that $Q_j$ is not full in t[s]. By Inequality I1 of Lemma 1, since opt transfers p from $Q^*_{ij}$ in t[s], gm must have at least one packet in $Q_{ij}$. Therefore, the edge $(u_i, v_j)$ must be in E. Nevertheless, since gm does not transfer any packet to $Q_j$, the vertex $v_j$ is unmatched, and in particular $(u_i, v_j) \notin M_{t[s]}$. Since $M_{t[s]}$ is a maximal matching, $(u_i, v_j)$ can only have been left out because one of its endpoints is already matched; as $v_j$ is unmatched, $u_i$ must be matched, i.e., there exists an edge $(u_i, v_{j'}) \in M_{t[s]}$ with $j' \ne j$. Hence, a packet p′ is transferred from $Q_{ij'}$ in t[s].
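The only property of $M_{t[s]}$ used in this proof is maximality: every edge of E has at least one matched endpoint. The hypothetical helper below checks exactly this (plus that the candidate is a matching contained in E), using 0-based (input, output) pairs.

```python
def is_maximal_matching(edges, matching):
    """True iff `matching` is a matching inside `edges` that no edge can extend."""
    matched_in = {i for i, _ in matching}
    matched_out = {j for _, j in matching}
    if len(matched_in) != len(matching) or len(matched_out) != len(matching):
        return False  # some port is used twice, so this is not a matching
    if not set(matching) <= set(edges):
        return False  # uses an edge that is not in E
    # Maximality: each edge (u_i, v_j) has u_i or v_j already matched.
    return all(i in matched_in or j in matched_out for i, j in edges)


E = [(0, 0), (0, 1), (1, 0)]
print(is_maximal_matching(E, [(0, 0)]))          # True
print(is_maximal_matching(E, [(0, 1), (1, 0)]))  # True
print(is_maximal_matching(E, [(0, 1)]))          # False: (1, 0) could be added
```

In the first matching, output vertex 1 is unmatched although (0, 1) ∈ E, so input vertex 0 must be matched, and it is, via (0, 0). This mirrors the argument of Lemma 2.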
Lemma 3. The following inequality holds:
$|P^*| \le 2|S|$.
Proof. We carry out the following mapping scheme from $P^*$ to S in each scheduling cycle t[s].
1. Let p be a privileged packet of Type 1 that is sent by opt from $Q^*_{ij}$ in t[s]. By Modification 2.1.1, gm transfers a packet p′ from $Q_{ij}$ in t[s]. Map p to p′.
2. Let p be a privileged packet of Type 2 that is sent by opt from $Q^*_{ij}$ in t[s]. By Lemma 2, gm transfers a packet p′ from $Q_{ij'}$ in t[s], where $j' \ne j$. Map p to p′.
Clearly, this mapping scheme is feasible, i.e., each packet $p \in P^*$ is mapped to a packet $q \in S$; note that gm never preempts packets, so every packet it transfers to an output queue is eventually sent. Furthermore, at most two privileged packets can be mapped to each packet $q \in S$. To see this, let q be a packet transferred by gm from $Q_{ij}$ in a scheduling cycle t[s]. Clearly, q can get mapped only in t[s], provided that opt sends privileged packets in this cycle. By Modifications 2.1.1 and 2.1.2, opt can send at most 2 privileged packets from input port i in t[s]: one of Type 1 if $Q^*_{ij}$ is not empty, and one of Type 2 if it transfers a packet from another queue $Q^*_{ij'}$. Thus, at most these two privileged packets are mapped to q.
2.2 General-Value Case
For the case of arbitrary packet values, we present the Preemptive Greedy algorithm (pg), which is a variant of a 6-competitive algorithm given by Kesselman and Rosén [23]. We show next that pg has a competitive ratio of $3 + 2\sqrt{2} \approx 5.828$ for any speedup.
Before we describe pg formally, we introduce further notation. Let $g_{ij}(t)$ denote the packet with the greatest value in $Q_{ij}$ at time t, and let $l_{ij}(t)$ (resp. $l_j(t)$) denote the packet with the least value in $Q_{ij}$ (resp. $Q_j$) at time t. Additionally, let $\beta \ge 1$ be a parameter of the algorithm that will be determined later.
Arrival phase: If a packet p arrives at time t with in(p) = i and out(p) = j, accept p if
$|Q_{ij}(t)| < B(Q_{ij}) \;\vee\; v(l_{ij}(t)) < v(p)$;
otherwise, reject p. If p is accepted while $|Q_{ij}(t)| = B(Q_{ij})$, then $l_{ij}(t)$ is preempted.
Scheduling phase: In every scheduling cycle t[s], a weighted bipartite graph $G_{t[s]} = (U, V, E, w)$ is induced from the current configuration of the switch, where $U = \{u_1, \dots, u_N\}$, $V = \{v_1, \dots, v_N\}$, an edge $(u_i, v_j) \in E$ if and only if
$|Q_{ij}(t[s])| > 0 \;\wedge\; \big(|Q_j(t[s])| < B(Q_j) \;\vee\; v(g_{ij}(t[s])) > \beta\, v(l_j(t[s]))\big)$,
and the weight of $(u_i, v_j)$ is given by $w(u_i, v_j) = v(g_{ij}(t[s]))$.
A greedy matching $M_{t[s]}$ is then computed on $G_{t[s]}$ in the following way: Start with an empty matching and iterate over all edges of E in descending order of their weights. Add an edge e to the current matching if e does not violate the matching property.
After $M_{t[s]}$ is computed, for each edge $(u_i, v_j) \in M_{t[s]}$, the packet $g_{ij}(t[s])$ is transferred to $Q_j$. If $g_{ij}(t[s])$ is transferred while $|Q_j(t[s])| = B(Q_j)$, then $l_j(t[s])$ is preempted.
Transmission phase: For every non-empty output queue $Q_j$, send the packet with the greatest value in $Q_j$.
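The three phases of pg can be sketched in Python as follows, under assumptions that are ours: each queue is a list of packet values kept in descending order (so the head holds $g_{ij}$ and the tail holds $l_{ij}$ or $l_j$, in line with Assumption A3 below), all capacities are at least 1, ports are 0-indexed, and ties between equal edge weights are broken by the stable sort. The helper names are hypothetical.

```python
import math


def insert_desc(queue, value):
    """Insert `value` into a list kept in descending order."""
    k = next((idx for idx, v in enumerate(queue) if v < value), len(queue))
    queue.insert(k, value)


def pg_accept(queue, capacity, value):
    """Arrival phase for one packet p; returns True iff p is accepted."""
    if len(queue) >= capacity:
        if queue[-1] >= value:  # v(l_ij(t)) >= v(p): reject p
            return False
        queue.pop()  # preempt the least-valued packet l_ij(t)
    insert_desc(queue, value)
    return True


def pg_cycle(in_q, out_q, out_cap, beta):
    """One scheduling cycle; mutates the queues and returns the matching M_{t[s]}."""
    n = len(out_q)
    edges = []
    for i in range(n):
        for j in range(n):
            if not in_q[i][j]:
                continue
            g = in_q[i][j][0]  # v(g_ij(t[s]))
            if len(out_q[j]) < out_cap[j] or g > beta * out_q[j][-1]:
                edges.append((g, i, j))  # weight w(u_i, v_j) = v(g_ij)
    edges.sort(key=lambda e: e[0], reverse=True)  # descending weight
    used_in, used_out, matching = set(), set(), []
    for _, i, j in edges:
        if i not in used_in and j not in used_out:
            matching.append((i, j))
            used_in.add(i)
            used_out.add(j)
    for i, j in matching:
        g = in_q[i][j].pop(0)  # transfer g_ij(t[s])
        if len(out_q[j]) >= out_cap[j]:
            out_q[j].pop()  # preempt l_j(t[s])
        insert_desc(out_q[j], g)
    return matching


def pg_transmit(out_q):
    """Transmission phase: send the head of every non-empty Q_j; return total value."""
    return sum(q.pop(0) for q in out_q if q)


q = [5, 3]
print(pg_accept(q, 2, 4), q)  # True [5, 4]

beta = 1 + math.sqrt(2)
in_q = [[[10], [3]], [[4], []]]
out_q = [[2], [1]]  # both output queues are full (capacity 1)
print(pg_cycle(in_q, out_q, [1, 1], beta))  # [(0, 0)]
print(out_q)  # [[10], [1]]
print(pg_transmit(out_q))  # 11
```

Here (0, 1) is also an edge (3 > β · 1), but it is discarded because the heavier edge (0, 0) is considered first; (1, 0) is not an edge, since 4 ≤ β · 2.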
As described above, unlike the algorithm given in [23], pg computes, in each scheduling cycle, a greedy maximal matching (with edges considered in descending order of weight) rather than a maximum weighted matching.
Theorem 2. For $\beta = \sqrt{2} + 1$, the competitive ratio of pg is at most $3 + 2\sqrt{2} \approx 5.828$ for any speedup.
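A quick numerical check of the constants (plain arithmetic, not part of the proof): with $\beta = \sqrt{2} + 1$, the bound $3 + 2\sqrt{2}$ evaluates to about 5.828, and it happens to equal $\beta^2$, since $(\sqrt{2} + 1)^2 = 2 + 2\sqrt{2} + 1$. The cpg bound $12 + 2\sqrt{2}$ stated in Section 1.1 evaluates to about 14.828.

```python
import math

beta = math.sqrt(2) + 1
pg_bound = 3 + 2 * math.sqrt(2)
cpg_bound = 12 + 2 * math.sqrt(2)

print(round(pg_bound, 3), round(cpg_bound, 3))  # 5.828 14.828
print(math.isclose(beta ** 2, pg_bound))  # True: (sqrt(2) + 1)^2 = 3 + 2 sqrt(2)
```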
First, we fix an input sequence σ. Without loss of generality, we make the following assumptions about opt:
A1. opt is greedy in scheduling and transmission events, i.e.,
when it transfers or sends a packet p from an input or
output queue, it chooses p as the one with the greatest
value in the queue.
A2. opt is work-conserving at output ports, i.e., it sends
a packet from every non-empty output queue in each
transmission event.
Obviously, as opt knows in advance which packets it is
going to send, it does not matter for opt in which order these
packets are released from queues or when they are transmitted
from output queues. Now, based on the greediness of
both pg and opt, we make another harmless assumption:
A3. In all input and output queues, pg and opt store
packets in the order of their values, where the packet
with the greatest value is at the queue’s head and the
one with the least value is at the queue’s tail.
Similarly to the unit-value case, we modify opt without
decreasing its benefit. Specifically, at the end of each
scheduling cycle t[s], i.e., immediately after opt performs its
scheduling policy, we apply the following modifications on
the configurations of opt:
Modification 2.2.1. Suppose that pg transfers a packet from $Q_{ij}$ and opt does not transfer any packet from $Q^*_{ij}$ in t[s]. If $Q^*_{ij}$ is not empty in t[s], we release the head packet p of $Q^*_{ij}$, i.e., the packet with the greatest value in $Q^*_{ij}$, and send it directly out of the switch. In this case, we call p a privileged packet of Type 1 and it contributes to the benefit of the optimal algorithm.
Modification 2.2.2. If opt transfers a packet
p to Q∗j and pg transfers a packet q to Qj in t[s]
with v(q) < v(p), we send p directly out of the
switch. In this case, we call p a privileged packet
of Type 2 and it contributes to the benefit of the
optimal algorithm.
Modification 2.2.3. Suppose that opt transfers
a packet p to Q∗j and pg does not transfer
any packet to Qj in t[s]. If Qj is not full in t[s]
or v(p) > β v(lj(t[s])), we send p directly out of
the switch. In this case, we call p a privileged
packet of Type 3 and it contributes to the benefit
of the optimal algorithm.
Note that Modifications 2.2.2 and 2.2.3 are closely related and are treated separately only for ease of exposition.
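The case distinction in Modifications 2.2.2 and 2.2.3 can be written as a small decision function. This is a hypothetical helper for illustration only, with our own encoding of the inputs: `pg_value` is the value of the packet q that pg transfers to Qj in t[s] (or None if pg transfers nothing), and `least_out` is v(lj(t[s])), which is consulted only when Qj is full.

```python
def classify_opt_transfer(v_p, pg_value, out_len, out_cap, least_out, beta):
    """Classify opt's transfer of a packet p to Q*_j in cycle t[s].

    Returns "type2" (Modification 2.2.2), "type3" (Modification 2.2.3),
    or "normal" if p is enqueued in Q*_j as usual.
    """
    if pg_value is not None:  # pg also transfers a packet q to Q_j
        return "type2" if pg_value < v_p else "normal"
    # pg transfers nothing to Q_j in t[s]
    if out_len < out_cap or v_p > beta * least_out:
        return "type3"
    return "normal"


beta = 1 + 2 ** 0.5
print(classify_opt_transfer(5, 3, 1, 1, 1, beta))        # type2
print(classify_opt_transfer(5, None, 0, 1, None, beta))  # type3: Q_j not full
print(classify_opt_transfer(2, None, 1, 1, 1, beta))     # normal: 2 <= beta * 1
```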
Let $\delta_{ij}(k, t)$ (resp. $\delta_j(k, t)$) denote the packet at position k in $Q_{ij}$ (resp. $Q_j$) at time t, where position 1 corresponds to the head of the queue. Let $\delta^*_{ij}(k, t)$ and $\delta^*_j(k, t)$ be the corresponding notations for opt. The following lemma shows that each packet in an input queue of opt is aligned to a packet of the same or greater value in the corresponding input queue of pg, and each packet p in an output queue of opt is aligned to a packet q in the corresponding output queue of pg, where $v(p) \le \beta\, v(q)$.
Lemma 4. For any i, j ∈ {1, . . . , N} and any time t, the following inequalities hold:
I1. $v(\delta^*_{ij}(k, t)) \le v(\delta_{ij}(k, t))$, for $k = 1, \dots, |Q^*_{ij}(t)|$
I2. $v(\delta^*_j(k, t)) \le \beta\, v(\delta_j(k, t))$, for $k = 1, \dots, |Q^*_j(t)|$
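Since both algorithms keep their queues sorted by value (Assumption A3), the inequalities of Lemma 4 compare the two queues position by position. The hypothetical helper below checks them for one pair of queues given as descending lists of values: factor 1 corresponds to I1 and factor β to I2. It also requires that opt's queue is not longer than pg's, which the lemma's statement presupposes, since otherwise $\delta(k, t)$ would be undefined for some k.

```python
import math

BETA = 1 + math.sqrt(2)


def aligned(opt_queue, onl_queue, factor=1.0):
    """Check v(opt[k]) <= factor * v(onl[k]) for every position k of opt's queue.

    Both arguments are lists of packet values in descending order (head first).
    """
    if len(opt_queue) > len(onl_queue):
        return False  # some opt packet would have no partner in pg's queue
    return all(p <= factor * q for p, q in zip(opt_queue, onl_queue))


print(aligned([5, 3], [6, 3, 1]))               # True  (I1 holds)
print(aligned([5, 3], [4, 3, 1]))               # False (5 > 4)
print(aligned([5, 3], [4, 3, 1], factor=BETA))  # True  (I2 holds: 5 <= BETA * 4)
```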
Proof. Inequalities I1 and I2 can be shown by a simple induction over time. Let the induction base be at time 0, i.e., before the sequence starts. All queues are empty at this time and thus I1 and I2 hold. Assume now that they hold for any time up to time t − 1. We next show that they hold for time t as well.
Clearly, queues change only in arrival, scheduling or transmission events. So, we assume that t is the time immediately after an event τ which is either an arrival, scheduling or transmission event. In the following, we argue only for I2. The argument for I1 is analogous, and we put the main differences in square brackets [ ] at the respective positions.
Before we start, we say that a packet p ∈ Q∗j(t) is in a legal alignment if p is aligned at time t to a packet q ∈ Qj(t) with v(p) ≤ βv(q). Clearly, it suffices to show that every packet p ∈ Q∗j(t) is in a legal alignment. We distinguish between two cases:
Case I2.1 p ∈ Q∗j(t − 1). Thus, by induction, p is aligned at t − 1 to a packet q ∈ Qj(t − 1) with v(p) ≤ βv(q) [resp. v(p) ≤ v(q)]. We need to show in this case that p either keeps the same alignment at t or changes to another legal alignment. Assumption A3 implies that any packet p from t − 1 either remains at its position at time t, moves one step ahead (if a packet in front of p is sent from the queue), or moves one step back (if a new packet is inserted in front of p).
Assume now that p remains at its position at t but q moves. Note that neither q nor any packet in front of it can be released from the queue at time t; otherwise, by Assumption A2 [resp. Modification 2.2.1], some packet would also be released from Q∗j, which would make p move one step ahead. Thus, q can only move back at t. In this case, however, the packet q′ that is directly in front of q is aligned with p. Since v(q) ≤ v(q′), p is again in a legal alignment.
Next, assume that p moves one step ahead in t. In this
case, p either remains in a legal alignment with q (in case
q moves ahead as well) or it aligns with a packet that is in
front of q in t− 1 and thus makes again a legal alignment.
Finally, assume that p moves one step back at t. Thus, a packet p′ must be inserted in front of p, implying that v(p) ≤ v(p′). Note that the insertion of p′ happens only in one of two cases: (i) if a packet r with v(r) ≥ v(p′) is inserted into Qj (by Modification 2.2.2), or (ii) if Qj is full at t and v(p′) ≤ βv(lj(t)) (by Modification 2.2.3). Let k denote the position of the alignment (p, q) at time t − 1. In case (i), either (1) r is inserted at a position k′ ≤ k, and thus p will be aligned again with q at t, or (2) r is inserted at a position k′ > k, and thus p will be aligned with some packet q′ at t. Clearly, the second case implies that v(r) ≤ v(q′). Since v(p) ≤ v(p′) ≤ v(r), it follows that v(p) ≤ v(q′). Hence, p is in a legal alignment in either case.
In case (ii), since Qj is full at t, p must be aligned with some packet q′ at t. Clearly, v(lj(t)) ≤ v(q′). Moreover, since v(p′) ≤ βv(lj(t)), we have v(p) ≤ v(p′) ≤ βv(q′). Thus, p makes a legal alignment with q′. [The respective cases for I1 are: case (i) p′ is also inserted into Qij, thus r = p′ in the above argument, and case (ii) Qij is full at t and v(lij(t)) ≥ v(p′).]
Case I2.2 p ∉ Q∗j(t − 1). Thus, p is a new packet that is inserted into the queue at time t. Again, note that the insertion of p into Q∗j happens only in one of two cases: (i) if a packet r with v(r) ≥ v(p) is inserted into Qj (by Modification 2.2.2), or (ii) if Qj is full at t and v(p) ≤ βv(lj(t)) (by Modification 2.2.3). In case (ii), since Qj is full at t, p must be aligned with a packet q at t. Since v(lj(t)) ≤ v(q) and v(p) ≤ βv(lj(t)), we have v(p) ≤ βv(q). Thus, p makes a legal alignment with q.
Now, consider case (i). Let k denote the position at which
p is inserted. If k = 1, p is aligned with the most valuable
packet in Qj in t. Since r is in Qj in time t, p must be aligned
with a packet of value at least v(r) ≥ v(p). Now suppose
k > 1. Let p′ be the packet that is directly in front of p in
t. Clearly, p′ ∈ Q∗j (t − 1) and v(p) ≤ v(p′). Furthermore,
let q′ be the packet aligned with p′ in time t − 1. Thus,
v(p) ≤ v(p′) ≤ βv(q′). Additionally, let q be the packet at
position k in Qj in time t − 1 (assume q = ∅, if this is an
empty position in Qj).
Note that either (1) r is inserted at position k, and thus p will be aligned with r at t, (2) r is inserted at a position k′ < k, and thus p will be aligned with q′ at t, or (3) r is inserted at a position k′ > k, and thus p will be aligned with q at t. Clearly, the last case implies that q ≠ ∅ and that v(q) ≥ v(r) ≥ v(p). Therefore, we have v(p) ≤ v(r) in the first case, v(p) ≤ βv(q′) in the second, and v(p) ≤ v(q) in the third. Hence, p is in a legal alignment in every case.
[The respective cases for I1 are: case (i) p is also inserted into Qij, thus r = p in the above argument, and case (ii) Qij is full at t and v(lij(t)) ≥ v(p).]
Similarly to the analysis of the unit-value case, granting privileged packets to opt must be done carefully, so that the total value of privileged packets remains within a certain factor of the total value of packets that pg sends. Obviously, each privileged packet of Type 1 can be paired with a packet that pg transfers from the same input queue. In the following two lemmas, we show that such a pairing is feasible for privileged packets of Types 2 and 3 as well. Of course, as packets of pg may be preempted after being transferred to output queues, some pairs may be broken. However, we show in Lemma 7 how to fix this problem.
Lemma 5. If opt transfers a packet p from Q∗ij to Q∗j and pg transfers a packet q to Qj in t[s] with v(q) < v(p), then pg transfers a packet p′ from Qij′ in t[s] with j′ ≠ j and v(p′) ≥ v(p).
Proof. Recall the bipartite graph Gt[s] and the corresponding matching Mt[s], which are induced by the configuration of pg right before performing the scheduling cycle t[s].
By Inequality I1 of Lemma 4, since opt transfers p from Q∗ij in t[s], pg must have at the head of Qij a packet r with v(r) ≥ v(p). Hence, v(r) ≥ v(p) > v(q), and thus q ≠ r. As a result, q must be transferred from an input queue Qi′j with i′ ≠ i. Moreover, since q is inserted into Qj, the edge (ui′, vj) ∈ E, and either |Qj(t[s])| < B(Qj) or v(q) > βv(lj(t[s])). Since v(r) > v(q), it also holds for r that either |Qj(t[s])| < B(Qj) or v(r) > βv(lj(t[s])). Hence, the edge (ui, vj) ∈ E as well, and clearly w(ui, vj) ≥ w(ui′, vj). This implies that (ui, vj) is considered before (ui′, vj) during the computation of Mt[s]. However, since (ui, vj) is not in the matching (vj is matched to ui′), the node ui must have been matched before (ui, vj) was considered, and thus there exists an edge (ui, vj′), for j′ ≠ j, that is inserted into the matching before (ui, vj) is considered. As a result, a packet p′ is transferred from Qij′, and it must hold that w(ui, vj′) ≥ w(ui, vj). Hence, v(p′) ≥ v(r) ≥ v(p).
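The proof above uses one property of how Mt[s] is built: edges are considered in non-increasing order of weight, and an edge enters the matching only if both of its endpoints are still free. Assuming that is how the matching is computed (its full definition lies outside this section), a minimal Python sketch follows; the node names, weights, and the function name `greedy_matching` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (input node u_i, output node v_j, weight w(u_i, v_j)) -- illustrative encoding
Edge = Tuple[str, str, float]


def greedy_matching(edges: List[Edge]) -> Dict[str, str]:
    """Scan edges by non-increasing weight; keep an edge if neither endpoint
    is matched yet. Python's sort is stable (also with reverse=True), so
    equal weights keep their input order, standing in for arbitrary ties."""
    match: Dict[str, str] = {}
    used_outputs: Set[str] = set()
    for u, v, _w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u not in match and v not in used_outputs:
            match[u] = v
            used_outputs.add(v)
    return match


# Situation of Lemma 5: edge u1 -> v1 (weight 5) is heavier than u2 -> v1
# (weight 3), yet u1 is already matched via the heavier edge u1 -> v2 (7).
edges = [("u1", "v1", 5.0), ("u1", "v2", 7.0), ("u2", "v1", 3.0)]
print(greedy_matching(edges))  # {'u1': 'v2', 'u2': 'v1'}
```

In the usage example, u1 is not matched to v1 only because it was already matched through an edge of weight at least 5, which mirrors the conclusion w(ui, vj′) ≥ w(ui, vj).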
The proof of the following lemma is analogous to that of
Lemma 5.
Lemma 6. Suppose that, in t[s], opt transfers a packet p from Q∗ij to Q∗j and pg does not transfer any packet to Qj. If Qj is not full in t[s] or v(p) > βv(lj(t[s])), then pg transfers a packet p′ from Qij′ in t[s] with j′ ≠ j and v(p′) ≥ v(p).
Now, recall Inequality I2 of Lemma 4. It implies that if opt sends a packet of value v from some output queue at some time, pg must send a packet of value at least v/β from the same output queue at the same time. Let S (resp. S∗) denote the set of all packets that pg (resp. opt) sends from output queues. Thus,
$$\sum_{p \in S^*} v(p) \le \beta \sum_{p \in S} v(p).$$
Moreover, let P∗ denote the set of all privileged packets, of all types, that opt sends directly out of the switch. The next lemma shows that
$$\sum_{p \in P^*} v(p) \le \frac{2\beta}{\beta - 1} \sum_{p \in S} v(p).$$
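Thus, we can bound the competitive ratio of pg as follows:
$$\mathrm{opt}(\sigma) = \sum_{p \in S^*} v(p) + \sum_{p \in P^*} v(p) \le \beta \sum_{p \in S} v(p) + \frac{2\beta}{\beta - 1} \sum_{p \in S} v(p) = \left(\beta + \frac{2\beta}{\beta - 1}\right) \mathrm{pg}(\sigma).$$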
Finally, it is easy to verify that the optimal value for β is √2 + 1, resulting in a competitive ratio of 3 + 2√2 ≈ 5.828.
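The last step can be checked numerically. The short Python sketch below evaluates the bound β + 2β/(β − 1) from the derivation above (the name `pg_ratio` is ours), compares the closed-form minimizer 1 + √2, obtained by setting the derivative 1 − 2/(β − 1)² to zero, with a brute-force grid search, and recovers 3 + 2√2.

```python
import math


def pg_ratio(beta: float) -> float:
    """The bound beta + 2*beta/(beta - 1) on pg's competitive ratio (beta > 1)."""
    return beta + 2 * beta / (beta - 1)


# Closed form: d/dbeta = 1 - 2/(beta - 1)**2 = 0  =>  beta = 1 + sqrt(2).
beta_star = 1 + math.sqrt(2)

# Numerical sanity check over a grid of beta values in (1, 11].
grid = [1 + k / 10_000 for k in range(1, 100_001)]
beta_grid = min(grid, key=pg_ratio)

print(round(beta_star, 4), round(pg_ratio(beta_star), 4))  # 2.4142 5.8284
print(round(beta_grid, 4))                                  # 2.4142
```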
Lemma 7. The following inequality holds:
$$\sum_{p \in P^*} v(p) \le \frac{2\beta}{\beta - 1} \sum_{p \in S} v(p).$$
Proof. We consider the following mapping scheme:
1. Let p be a privileged packet of Type 1 that is sent by opt from Q∗ij in scheduling cycle t[s]. By Modification 2.2.1, pg transfers a packet p′ from Qij in t[s], and by Inequality I1 of Lemma 4, v(p) ≤ v(p′). Map p to p′.
2. Let p be a privileged packet of Type 2 that is sent by opt from Q∗ij in scheduling cycle t[s]. By Lemma 5, pg transfers a packet p′ from Qij′ in t[s] with j′ ≠ j and v(p) ≤ v(p′). Map p to p′.
3. Let p be a privileged packet of Type 3 that is sent by opt from Q∗ij in scheduling cycle t[s]. By Lemma 6, pg transfers a packet p′ from Qij′ in t[s] with j′ ≠ j and v(p) ≤ v(p′). Map p to p′.
4. Let q be a packet that is preempted from an output queue Qj by another packet p′. For each privileged packet p that is mapped to q, re-map p to p′.
As shown above, this mapping scheme is feasible, i.e., each packet p ∈ P∗ is mapped to a packet p′ ∈ S. Now, it remains to show that the total value of privileged packets that are mapped to each packet p′ ∈ S is at most 2β/(β − 1) · v(p′).
For any packet p′ ∈ S, privileged packets can be mapped to p′ in two events: when p′ is scheduled and when it preempts a packet from an output queue.
Assume that p′ is scheduled from Qij′ to Qj′ during scheduling cycle t[s], and that opt transfers a packet from Q∗ij to Q∗j during t[s]. Clearly, we can only send one privileged packet p1 of Type 1 from Q∗ij′ in t[s] (in case j ≠ j′). Furthermore, we can only send from Q∗ij either a privileged packet p2 of Type 2 (in case pg transfers a packet q to Qj with v(q) < v(p2)), or a privileged packet p3 of Type 3 (in case pg does not transfer any packet to Qj). Hence, at most two privileged packets may be sent during t[s] from each input port i. Since privileged packets are mapped only to packets that pg transfers from the same input port during the same scheduling cycle, at most two packets from {p1, p2, p3} can be mapped to p′. Furthermore, as shown in the mapping scheme above, the value of each of these privileged packets is at most the value of p′. Thus, the total value of privileged packets that are mapped to p′ when it is scheduled is at most 2 v(p′).
Assume now that p′ = qm is the last packet in a chain of packets q0, . . . , qm in which packet qn preempts packet qn−1, for 1 ≤ n ≤ m. Let x(qn) denote the total value of privileged packets that are mapped to a packet qn after it preempts qn−1. Thus, the total value of privileged packets that are mapped to p′ is given by x(qm). Note that q0 does not preempt any packet, and thus the total value of privileged packets that are mapped to q0 is at most 2 v(q0). Thus, x(qm) can be bounded by the following recursion:
$$x(q_0) \le 2\,v(q_0) \quad \text{and} \quad x(q_n) \le 2\,v(q_n) + x(q_{n-1}), \quad \text{for } 0 < n \le m.$$
Solving this recursion, we obtain
$$x(q_m) \le 2 \sum_{n=0}^{m} v(q_n).$$
Note also that v(qn−1) ≤ v(qn)/β for 1 ≤ n ≤ m, so that v(qn) ≤ v(qm)/β^(m−n). Consequently, we can bound x(qm) as follows:
$$x(q_m) \le 2\,v(q_m) \sum_{n=0}^{m} (1/\beta)^n \le \frac{2\beta}{\beta - 1}\, v(q_m).$$
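To see the recursion and the geometric-series step concretely, the following Python sketch evaluates both bounds on a chain in which each preempting packet is exactly β times as valuable as the packet it preempts, the extreme case allowed by v(qn−1) ≤ v(qn)/β. The function names are illustrative.

```python
from typing import Sequence


def mapped_value_bound(chain: Sequence[float]) -> float:
    """Unroll x(q_0) <= 2 v(q_0) and x(q_n) <= 2 v(q_n) + x(q_{n-1}).

    chain[n] is v(q_n) for the preemption chain q_0, ..., q_m; the result is
    the bound 2 * (v(q_0) + ... + v(q_m)) on x(q_m).
    """
    x = 0.0
    for v in chain:
        x = 2 * v + x
    return x


def geometric_bound(chain: Sequence[float], beta: float) -> float:
    """Closed-form bound 2*beta/(beta - 1) * v(q_m), valid for beta > 1."""
    return 2 * beta / (beta - 1) * chain[-1]


beta = 2.0
# Extreme chain: every packet is exactly beta times its predecessor.
chain = [beta ** n for n in range(4)]          # [1.0, 2.0, 4.0, 8.0]
print(mapped_value_bound(chain))               # 30.0
print(geometric_bound(chain, beta))            # 32.0
```

As the chain grows, the unrolled sum approaches but never exceeds the closed-form bound, which is why the factor 2β/(β − 1) suffices.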
3. BUFFERED CROSSBAR SWITCHES
3.1 Unit-Value Case
For the case where all packets have value 1, Kesselman
et al. [20] considered the following algorithm, which we call
Crossbar Greedy Unit (cgu). The arrival and transmission
phases of cgu are the same as those of gm (Section 2.1). In
the scheduling phase, cgu works as follows.
Scheduling phase: We divide every scheduling cycle t[s] into two subphases:
- Input subphase: For each input port i, choose an arbitrary input queue Qij which satisfies |Qij(t[s])| > 0 ∧ |Cij(t[s])| < B(Cij), and transfer its head packet.
- Output subphase: For each output queue Qj, choose an arbitrary crossbar queue Cij which satisfies |Qj(t[s])| < B(Qj) ∧ |Cij(t[s])| > 0, and transfer its head packet.
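A minimal Python sketch of one cgu scheduling cycle, under stated assumptions: since all packets have value 1, each queue is represented only by its occupancy; capacities are uniform; the "arbitrary" choices are resolved by lowest index; and the output subphase sees the crossbar queues as left by the input subphase. The function name `cgu_cycle` and the argument layout are illustrative, not from the paper.

```python
from typing import List


def cgu_cycle(Q: List[List[int]], C: List[List[int]], O: List[int],
              B_C: int, B_O: int) -> int:
    """Run one scheduling cycle of cgu on queue occupancies (unit values).

    Q[i][j]: packets in input queue Q_ij; C[i][j]: packets in crossbar queue
    C_ij; O[j]: packets in output queue Q_j. Uniform capacities B_C and B_O
    are an assumption. Returns the number of packets moved in the input
    subphase, i.e. |S_t[s]|.
    """
    n = len(Q)
    moved = 0
    # Input subphase: each input port i moves at most one head packet.
    for i in range(n):
        for j in range(n):
            if Q[i][j] > 0 and C[i][j] < B_C:
                Q[i][j] -= 1
                C[i][j] += 1
                moved += 1
                break
    # Output subphase (assumed to see the state left by the input
    # subphase): each output queue Q_j receives at most one packet.
    for j in range(n):
        if O[j] >= B_O:
            continue
        for i in range(n):
            if C[i][j] > 0:
                C[i][j] -= 1
                O[j] += 1
                break
    return moved


Q = [[1, 0], [0, 2]]
C = [[0, 0], [0, 0]]
O = [0, 0]
print(cgu_cycle(Q, C, O, B_C=1, B_O=1), Q, C, O)
# 2 [[0, 0], [0, 1]] [[0, 0], [0, 0]] [1, 1]
```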
The next theorem shows that cgu is 3-competitive for any
speedup.
Theorem 3. The competitive ratio of cgu is at most 3
for any speedup.
First, we fix an input sequence σ. Again, we modify opt in a way that does not decrease its benefit over σ. Specifically, at the end of each scheduling cycle t[s], i.e., immediately after opt performs its scheduling policy, we apply the following modifications to the configuration of opt in the given order:
Modification 3.1.1. Suppose that cgu transfers a packet from Qij and opt does not transfer any packet from Q∗ij ≠ ∅ in t[s]. We transfer a packet p from Q∗ij in t[s]. If C∗ij is not full in t[s], p is transferred to C∗ij. Otherwise, p is sent directly out of the switch. In either case, p is called a privileged packet and it contributes to the benefit of the optimal algorithm.
Modification 3.1.2. Suppose that cgu transfers a packet to Cij and opt does not transfer any packet to C∗ij in t[s]. If C∗ij is not full in t[s] and no privileged packet is transferred to C∗ij by Modification 3.1.1 in t[s] (possibly because cgu transfers from Qij while Q∗ij is empty), we generate a new packet and insert it into C∗ij. Such a new packet is called an extra packet of Type 1 and it contributes to the benefit of the optimal algorithm.
Modification 3.1.3. Suppose that opt transfers a packet from C∗ij and cgu does not transfer any packet from Cij in t[s]. If Cij is not empty in t[s], we generate a new packet and insert it into C∗ij. Such a new packet is called an extra packet of Type 2 and it contributes to the benefit of the optimal algorithm.
Note that extra packets are not used in the analysis of the algorithms presented in Section 2 for the CIOQ model. Next, we use the above modifications to establish a set of invariants that differs from the invariants shown in Section 2.1.
Lemma 8. For any time t and any i, j ∈ {1, . . . , N}, the
following inequalities hold:
I1. |Qij(t)| ≥ |Q∗ij(t)|
I2. |C∗ij(t)| ≥ |Cij(t)|
Proof. We show Inequalities I1 and I2 by a simple induction over time. Let the induction base be at time 0, i.e., before the sequence starts. All queues are empty at this time and all inequalities hold. Assume now that they hold for any time up to time t − 1. We next show that they hold for t as well.
Clearly, input and crossbar queues change only in arrival
and scheduling events. So, we assume that t is the time
immediately after an event τ which is either an arrival or a
scheduling event.
Assume τ is an arrival event. Clearly, crossbar queues do
not change in arrival events and thus I2 holds for this case.
For I1, the only critical case is when the arriving packet is
rejected by cgu and accepted by opt. However, the input
queue of cgu must be full in this case and thus I1 still holds.
Now, let τ be a scheduling event. Here, the only critical case for I1 is when cgu transfers a packet from Qij while opt does not transfer any packet from Q∗ij. However, either Q∗ij is empty in this case or it cannot happen due to Modification 3.1.1. For I2, the first critical case is when cgu inserts a packet into Cij while opt does not insert any packet into C∗ij. However, either C∗ij is full in this case or it cannot happen due to Modification 3.1.2. The second critical case for I2 is when opt transfers a packet from C∗ij while cgu does not transfer any packet from Cij. However, either Cij is empty in this case or the size of C∗ij does not decrease due to Modification 3.1.3.
In the following, we use S∗t[s] to denote the set of opt’s
normal packets in the input subphase of t[s]. These are
packets that opt schedules through the normal channels, i.e.,
they are not privileged, and are part of the original input
sequence, i.e., they are not extra. On the other hand, we
use St[s] to denote the set of packets that cgu schedules in
the input subphase of cycle t[s], i.e., from input queues to
crossbar queues.
Lemma 9. For any scheduling cycle t[s], |S∗t[s]| ≤ |St[s]|.
Proof. We want to show that in the input subphase of
any scheduling cycle t[s], if opt transfers a normal packet
from an input port i, cgu also transfers a packet from i.
Assume that opt transfers a normal packet p from Q∗ij (to
C∗ij) in t[s]. Thus, by I1 and I2 of Lemma 8, Qij is not empty
and Cij is not full in t[s] (Note that opt would not schedule
a packet to a full crossbar queue, as all packets are of the
same value). Hence, cgu transfers a packet from either Qij
or another Qij′ in t[s].
Let P∗t[s] denote the set of opt’s privileged and extra packets (of either type) that occur in scheduling cycle t[s].
We consider the following mapping scheme from P∗t[s] to St[s]. For packets that are inserted into cgu’s output queues, we use the notion of a marked packet. Initially, all packets are unmarked.
1. Let p be a privileged packet that is transferred by opt
from Q∗ij in scheduling cycle t[s]. By Modification 3.1.1,
cgu transfers a packet q from Qij in t[s]. Map p to q.
2. Let p be an extra packet of Type 1 that is inserted into
C∗ij in the input subphase of scheduling cycle t[s]. By
Modification 3.1.2, cgu transfers a packet q into Cij
in t[s]. Map p to q.
3. Let p be an extra packet of Type 2 that is inserted into C∗ij in the output subphase of scheduling cycle t[s]. By Modification 3.1.3, Cij is not empty in t[s], and opt transfers a packet p′ to Q∗j. Thus, Q∗j is not full right before t[s]. Now, let q be the first unmarked packet in Qj, i.e., the one nearest to the queue’s head. Map p to q and then mark q. Note that q may be the packet that cgu inserts into Qj in t[s].
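Step 3 can be pictured with the following hypothetical Python sketch (the class `OutPacket` and the function `map_extra_type2` are illustrative names). Qj is a head-first list, and each extra packet of Type 2 marks the unmarked packet nearest the head. As long as packets leave Qj from the head and join at the tail, marking from the head keeps the marked packets at the front of the queue, which is the property the proof of Lemma 10 relies on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OutPacket:
    """A packet in cgu's output queue Q_j (hypothetical representation)."""
    name: str
    marked: bool = False
    mapped_extras: List[str] = field(default_factory=list)


def map_extra_type2(queue: List[OutPacket], extra: str) -> OutPacket:
    """Map an extra packet of Type 2 to the first unmarked packet of Q_j
    (nearest to the head) and mark it. `queue` is ordered head first."""
    for q in queue:
        if not q.marked:
            q.mapped_extras.append(extra)
            q.marked = True
            return q
    # The feasibility argument (via Lemma 10) rules this case out.
    raise RuntimeError("no unmarked packet in Q_j")


queue = [OutPacket("a"), OutPacket("b"), OutPacket("c")]
map_extra_type2(queue, "e1")
map_extra_type2(queue, "e2")
print([(q.name, q.marked) for q in queue])
# [('a', True), ('b', True), ('c', False)]
```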
Next, we show that the mapping scheme is feasible, i.e.,
each packet p ∈ P ∗t[s] is mapped to a packet q ∈ St[s]. Clearly,
Steps 1 and 2 are feasible. We show now that Step 3 is
feasible as well. Let Mj(t) denote the set of marked packets
in Qj at time t, for any 1 ≤ j ≤ N . We first show the
following lemma.
Lemma 10. At any time t, |Mj(t)| ≤ |Q∗j (t)|.
Proof. We show the claim by induction over the scheduling and transmission events. Clearly, Mj(t) and Q∗j(t) change only in these events.
Assume first that t is a transmission event. The only
critical scenario in this event is that opt sends a packet
from Q∗j while cgu does not send a marked packet. If that
happens, then either cgu sends an unmarked packet or it
does not send any packet at all. The first case cannot happen
while |Mj(t)| > 0 since marked packets are always at the
front of the queue. The second case is safe because it implies
that Qj is empty and thus |Mj(t)| = 0.
Now, assume that t is a scheduling event. The only critical scenario in this event is that a packet q is marked in Qj while opt does not insert any packet into Q∗j. However, according to Step 3 of the mapping scheme, marking q implies that opt transfers a packet p′ from C∗ij to Q∗j. Thus, this scenario cannot happen in scheduling events.
Now, to show that Step 3 of the mapping scheme is feasible, we need to show that at least one packet is unmarked in Qj in the scheduling cycle t[s]. For the sake of contradiction, assume that all packets in Qj are marked or Qj is empty in t[s]. The first thing that follows from this assumption is that cgu does not insert any packet into Qj in t[s] (because otherwise the inserted packet would initially be unmarked). Recall from Step 3 that Cij is not empty in t[s]. Thus, since no packet is inserted into Qj, Qj must be full in t[s]. Hence, since all packets are marked by assumption, |Mj(t)| = B(Qj), where t is the time right before t[s]. Thus, by Lemma 10, |Q∗j(t)| = B(Q∗j) as well. However, this contradicts the fact that opt inserts p′ into Q∗j in t[s]. Hence, at least one packet is unmarked in Qj in the scheduling cycle t[s], and thus Step 3 is feasible.
Lemma 11. For any scheduling cycle t[s], |P ∗t[s]| ≤ 2 |St[s]|.
Proof. As shown above, the mapping scheme is feasible for each scheduling cycle t[s]. So, it remains to show that at most two packets from P∗t[s] are mapped to any packet q ∈ St[s].
Consider a packet q ∈ St[s]. Let Qij be the input queue from which q is transferred in the scheduling cycle t[s]. Obviously, q may get mapped by: (i) a privileged packet p that is transferred from Q∗ij in t[s], (ii) an extra packet p′ of Type 1 that is inserted into C∗ij in the input subphase of t[s], and (iii) an extra packet p′′ of Type 2 that is inserted into C∗i′j in the output subphase of t[s], with i ≠ i′. Therefore, at most three packets can be mapped to q during its entire lifespan.
Assume now, for the sake of contradiction, that both p and p′ are mapped to q. According to Modification 3.1.2, since an extra packet p′ is inserted into C∗ij, C∗ij cannot be full in t[s]. However, by Modification 3.1.1, p must then be transferred to C∗ij rather than sent directly out of the switch. Hence, by Modification 3.1.2, no extra packet is generated in t[s] in this case, and thus p′ does not exist, a contradiction. Consequently, at most two packets from {p, p′, p′′} are mapped to q.
Now, as cgu does not preempt packets, each packet that cgu schedules in an input subphase must eventually be sent, and thus it contributes to the benefit of cgu. Hence,
$$\mathrm{cgu}(\sigma) = \sum_{t[s]} |S_{t[s]}|.$$
Furthermore, note that
$$\mathrm{opt}(\sigma) = \sum_{t[s]} \left( |S^*_{t[s]}| + |P^*_{t[s]}| \right).$$
Therefore, the proof of Theorem 3 follows immediately from Lemmas 9 and 11.
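Spelled out, Lemma 9 bounds |S∗t[s]| and Lemma 11 bounds |P∗t[s]| in every scheduling cycle, so summing over all cycles yields the ratio of Theorem 3:
$$\mathrm{opt}(\sigma) = \sum_{t[s]} \left(|S^*_{t[s]}| + |P^*_{t[s]}|\right) \le \sum_{t[s]} \left(|S_{t[s]}| + 2\,|S_{t[s]}|\right) = 3 \sum_{t[s]} |S_{t[s]}| = 3\,\mathrm{cgu}(\sigma).$$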
3.2 General-Value Case
For the case of arbitrary packet values, we present the Crossbar Preemptive Greedy algorithm (cpg), which is a variant of a 16.24-competitive algorithm given by Kesselman et al. [20].
Recall the notations gij(t), lij(t), and lj(t) that we used
with algorithm pg (Section 2.2). Let gcij(t) and lcij(t) be
the corresponding notations for crossbar queue Cij , i.e., the
packet with the greatest value and the packet with the least
value, respectively, in Cij at time t. Additionally, let β ≥ 1
and α ≥ 1 be two parameters of the algorithm that will be
determined later. If β = α, our algorithm will be the same
as the algorithm given in [20]. However, we show that to
minimize the competitive ratio for this algorithm, these two
parameters must take on different values.
The arrival and transmission phases of cpg are the same as
those of pg. In the scheduling phase, cpg works as follows.
Scheduling phase: We divide every scheduling cycle t[s] into two subphases:
- Input subphase: For each input port i, let J = {j : |Qij(t[s])| > 0 ∧ (|Cij(t[s])| < B(Cij) ∨ v(gij(t[s])) > β v(lcij(t[s])))}. If J ≠ ∅, choose Qij such that j ∈ J ∧ v(gij(t[s])) ≥ v(gij′(t[s])) for all j′ ∈ J. Transfer gij(t[s]) to Cij. If |Cij(t[s])| = B(Cij), preempt lcij(t[s]) first.
- Output subphase: For each output queue Qj, choose a crossbar queue Cij such that |Cij(t[s])| > 0 ∧ v(gcij(t[s])) ≥ v(gci′j(t[s])) for all i′ ≠ i. If |Qj(t[s])| < B(Qj) ∨ v(gcij(t[s])) > α v(lj(t[s])), transfer gcij(t[s]) to Qj. If |Qj(t[s])| = B(Qj), preempt lj(t[s]) first.
Note that all ties in cpg are broken arbitrarily.
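A minimal Python sketch of one cpg scheduling cycle as described above. It assumes uniform capacities B_C, B_O ≥ 1, resolves ties by lowest index, and runs the output subphase on the state left by the input subphase; each queue is a plain list of packet values, since cpg only ever selects the greatest or least value. The function name `cpg_cycle` and this representation are illustrative, not the paper's or Kesselman et al.'s implementation.

```python
from typing import List

Queue = List[float]  # packet values; only the greatest/least are selected


def cpg_cycle(Q: List[List[Queue]], C: List[List[Queue]], O: List[Queue],
              B_C: int, B_O: int, beta: float, alpha: float) -> None:
    """One scheduling cycle of cpg, modifying the queues in place.

    Q[i][j], C[i][j], O[j] hold the packet values of Q_ij, C_ij, Q_j.
    """
    n = len(Q)
    # Input subphase: build J, then move g_ij of the best queue in J.
    for i in range(n):
        J = [j for j in range(n)
             if Q[i][j] and (len(C[i][j]) < B_C
                             or max(Q[i][j]) > beta * min(C[i][j]))]
        if not J:
            continue
        j = max(J, key=lambda jj: max(Q[i][jj]))    # largest g_ij
        g = max(Q[i][j])
        Q[i][j].remove(g)
        if len(C[i][j]) == B_C:
            C[i][j].remove(min(C[i][j]))            # preempt lc_ij first
        C[i][j].append(g)
    # Output subphase: pick the crossbar queue with the largest gc_ij.
    for j in range(n):
        nonempty = [i for i in range(n) if C[i][j]]
        if not nonempty:
            continue
        i = max(nonempty, key=lambda ii: max(C[ii][j]))
        g = max(C[i][j])
        if len(O[j]) < B_O or g > alpha * min(O[j]):
            C[i][j].remove(g)
            if len(O[j]) == B_O:
                O[j].remove(min(O[j]))              # preempt l_j first
            O[j].append(g)


Q = [[[5.0, 1.0]]]
C = [[[2.0]]]
O = [[3.0]]
cpg_cycle(Q, C, O, B_C=1, B_O=1, beta=2.0, alpha=1.5)
print(Q, C, O)  # [[[1.0]]] [[[]]] [[5.0]]
```

With α = 2 instead of 1.5 in the same example, the packet of value 5 would stay in C11, since 5 ≤ 2 · 3; the two thresholds β and α act independently, which is what allows them to take different values.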
Theorem 4. For β = 2√2 − 1 and α = 2√2, the competitive ratio of cpg is at most 12 + 2√2 ≈ 14.828 for any speedup.
The proof of Theorem 4 is omitted due to space limitations.
4. REFERENCES
[1] W. Aiello, A. Kesselman, and Y. Mansour. Competitive
buffer management for shared-memory switches. ACM
Transactions on Algorithms, 5(1):Article 3, 2008.
[2] K. Al-Bawani and A. Souza. Buffer overflow
management with class segregation. Information
Processing Letters, 113(4):145–150, 2013.
[3] S. Albers and M. Schmidt. On the performance of
greedy algorithms in packet buffering. SIAM Journal
on Computing, 35(2):278–304, 2006.
[4] Y. Azar and A. Litichevskey. Maximizing throughput in
multi-queue switches. Algorithmica, 45(1):69–90, 2006.
[5] Y. Azar and Y. Richter. The zero-one principle for
switching networks. In Proc. of the 36th ACM Symp. on
Theory of Computing (STOC), pages 64–71, 2004.
[6] Y. Azar and Y. Richter. Management of multi-queue
switches in QoS networks. Algorithmica, 43:81–96, 2005.
[7] Y. Azar and Y. Richter. An improved algorithm for
CIOQ switches. ACM Transactions on Algorithms,
2(2):282–295, 2006.
[8] M. Bienkowski and A. Madry. Geometric aspects of
online packet buffering: An optimal randomized
algorithm for two buffers. In Proc. of the 8th Latin
American Symp. on Theoretical Informatics (LATIN),
pages 252–263, 2008.
[9] S.-T. Chuang, A. Goel, N. McKeown, and
B. Prabhakar. Matching output queueing with a
combined input output queued switch. IEEE Journal
on Selected Areas in Communications, 17:1030–1039,
1999.
[10] S.-T. Chuang, S. Iyer, and N. McKeown. Practical
algorithms for performance guarantees in buffered
crossbars. In Proc. of the 24th IEEE Conf. on
Computer Communications (INFOCOM), pages
981–991, 2005.
[11] M. Englert and M. Westermann. Considering
suppressed packets improves buffer management in QoS
switches. In Proc. of the 18th Annual ACM-SIAM
Symp. on Discrete Algorithms (SODA), pages 209–218,
2007.
[12] M. Englert and M. Westermann. Lower and upper
bounds on FIFO buffer management in QoS switches.
Algorithmica, 53(4):523–548, 2009.
[13] L. Epstein and R. van Stee. Buffer management
problems. SIGACT News, 35(3):58–66, 2004.
[14] P. T. Eugster, A. Kesselman, K. Kogan, S. I. Nikolenko,
and A. Sirotkin. Essential traffic parameters for shared
memory switch performance. In Proc. of the 22nd
International Colloquium on Structural Information
and Communication Complexity (SIROCCO), pages
61–75, 2015.
[15] P. T. Eugster, K. Kogan, S. I. Nikolenko, and
A. Sirotkin. Shared memory buffer management for
heterogeneous packet processing. In Proc. of the 34th
IEEE International Conference on Distributed
Computing Systems (ICDCS), pages 471–480, 2014.
[16] M. H. Goldwasser. A survey of buffer management
policies for packet switches. SIGACT News, 41:100–128,
2010.
[17] T. Itoh and N. Takahashi. Competitive analysis of
multi-queue preemptive QoS algorithms for general
priorities. IEICE Transactions on Fundamentals of
Electronics, Communications and Computer Sciences,
E89-A(5):1186–1197, 2006.
[18] L. Jeż, F. Li, J. Sethuraman, and C. Stein. Online
scheduling of packets with agreeable deadlines. ACM
Transactions on Algorithms, 9(1):Article 5, 2012.
[19] A. Kesselman, K. Kogan, and M. Segal. Packet mode
and QoS algorithms for buffered crossbar switches with
FIFO queuing. Distributed Computing, 23(3):163–175,
2010.
[20] A. Kesselman, K. Kogan, and M. Segal. Best effort and
priority queuing policies for buffered crossbar switches.
Chicago Journal of Theoretical Computer Science,
2012(5):1–14, 2012.
[21] A. Kesselman, K. Kogan, and M. Segal. Improved
competitive performance bounds for CIOQ switches.
Algorithmica, 63(1–2):411–424, 2012.
[22] A. Kesselman and A. Rosén. Scheduling policies for
CIOQ switches. Journal of Algorithms, 60(1):60–83,
2006.
[23] A. Kesselman and A. Rosén. Controlling CIOQ
switches with priority queuing and in multistage
interconnection networks. Journal of Interconnection
Networks, 9(1–2):53–72, 2008.
[24] F. Li, J. Sethuraman, and C. Stein. Better online buffer
management. In Proc. of the 18th Annual ACM-SIAM
Symp. on Discrete Algorithms (SODA), pages 199–208,
2007.
[25] S. I. Nikolenko and K. Kogan. Single and multiple
buffer processing. Encyclopedia of Algorithms, pages
1–9, 2014.
[26] V. Paxson and S. Floyd. Wide-area traffic: the failure
of Poisson modeling. IEEE/ACM Transactions on
Networking, 3(3):226–244, 1995.
[27] D. Sleator and R. Tarjan. Amortized efficiency of list
update and paging rules. Communications of the ACM,
28(2):202–208, 1985.
[28] A. Veres and M. Boda. The chaotic nature of TCP
congestion control. In Proc. of the 19th IEEE Conf. on
Computer Communications (INFOCOM), pages
1715–1723, 2000.
