In this paper we analyze the average queue lengths in a combined input-output queued switch using a maximal size matching scheduling algorithm. We compare these average queue lengths to the average queue lengths achieved by an optimal switch. We model the cell arrival process as independent and identically distributed between time slots and uniformly distributed among input and output ports. For switches with many input and output ports, the backlog associated with maximal size matching with speedup 3 is no more than 3 1/3 times the backlog associated with an optimal switch. Moreover, this performance ratio rapidly approaches 2 as speedup increases.
Introduction
Although packet switches vary in their internal construction, the most common architecture for high performance switches is the crossbar switch. A crossbar switch contains N input lines and N output lines, where each input line meets each output line at a crosspoint. This is depicted in Figure 1. When a crosspoint connecting an input line and an output line is closed, cells may be transferred between this input and output. Crossbar switches operate with the constraint that, when routing cells from inputs to outputs, each input may only be connected to a single output, and each output may only be connected to a single input.
Switches are generally analyzed under a model where time is slotted, and only one cell may arrive at each input and depart from each output per time slot. Each arriving cell has a destination output port to which it must eventually be sent. Since multiple cells with the same output destination may arrive simultaneously at the input ports, switches require some form of buffering to store the cells which cannot be immediately output. Buffered crossbar switches vary in their architecture, with the simplest being the output queued switch. In an output queued switch, buffers are placed at each output. All arriving cells are placed in their respective output queues in each time slot, and the output queues are served on a first-come, first-served (FCFS) basis. The average queue backlogs achieved by output queueing are minimum among all buffered crossbar switches. However, for a switch with N inputs and N outputs, output queueing requires that as many as N rounds of scheduling be performed by the switch in each time slot (consider the case when N cells destined for a single output arrive simultaneously). It is this requirement that makes output queueing infeasible for switches with many input and output ports.
Another alternative is the input queued switch, where cells are placed in buffers at the input ports at which they arrive. Delay performance of an input queued switch is heavily dependent on the service discipline used to serve the queues. It was shown in [1] that if input queues are served FCFS, the switch can only achieve 58% throughput. That is, suppose cells destined for output j arrive at input i at an average rate of λ/N per time slot for all i, j. Then average backlogs are bounded for all λ < 1 under output queueing, but average backlogs are only bounded for all λ < 0.58 when input queues are served FCFS. However, if we use a service discipline which schedules cells in each input queue based on their destinations, it is possible to achieve 100% throughput with input queueing. In particular, it was shown in [7] that 100% throughput is achieved by using a service discipline based on constructing maximum weight matchings (MWM) between inputs and outputs in each round of scheduling. This has the advantage over output queueing that only one round of scheduling is required per time slot. However, the algorithms required for computing maximum weight matchings are computationally expensive to implement. Also, the only known bounds on the average backlog under MWM are O(N^2) [6], as opposed to output queueing, whose average backlog increases as O(N).
Combined input-output queued (CIOQ) switches are an alternative to purely input queued or purely output queued switches. Combined input-output queued switches place buffers at both the input ports and the output ports, and perform some moderate number s ≪ N rounds of scheduling per time slot. The number of rounds of scheduling s is commonly referred to as the speedup of the switch. It was shown in [4] that 100% throughput can be achieved by using speedup s = 2 and a simple service discipline based on greedily constructing maximal size matchings between input ports and output ports in each round of scheduling. Unlike pure input queueing with MWM scheduling, maximal size matching schedules can be computed with low computational cost. Also, unlike pure output queueing, the speedup requirements do not increase with the size of the switch.
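To make the notion of speedup concrete, the following is a minimal Python sketch of one time slot of a CIOQ switch: s rounds of greedy maximal matching move cells from input queues to output queues, after which each nonempty output queue serves one cell. The function names `schedule_round` and `run_time_slot`, the list-of-lists queue representation, and the fixed port scan order are illustrative assumptions rather than details taken from the text; arrivals are omitted.

```python
def schedule_round(X):
    """Greedily build a maximal matching over nonempty input queues.

    X[i][j] is the number of cells at input i destined for output j.
    Inputs are scanned in index order (an arbitrary, assumed tie-break).
    """
    n = len(X)
    used_outputs = set()
    matching = []
    for i in range(n):
        for j in range(n):
            if X[i][j] > 0 and j not in used_outputs:
                matching.append((i, j))
                used_outputs.add(j)
                break  # input i is matched; move on to the next input
    return matching


def run_time_slot(X, Y, s):
    """Run s scheduling rounds, then serve each output queue once.

    Mutates X and Y in place and returns (D, E), where D[i][j] counts
    cells moved from input i to output j and E[j] is 1 if output j
    served a cell in this slot.
    """
    n = len(X)
    D = [[0] * n for _ in range(n)]
    for _ in range(s):
        for i, j in schedule_round(X):
            X[i][j] -= 1
            Y[j] += 1
            D[i][j] += 1
    E = [0] * n
    for j in range(n):
        if Y[j] > 0:
            Y[j] -= 1
            E[j] = 1
    return D, E


X = [[2, 0], [1, 1]]
Y = [0, 0]
D, E = run_time_slot(X, Y, s=2)
print(D, E, X, Y)
```

In this example both cells queued at input 0 for output 0 are transferred within the slot, while the cell at input 1 destined for output 0 stays blocked in both rounds because output 0 is already matched.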
The purpose of this paper is to show that the average backlog performance of a CIOQ switch using maximal matching scheduling with low speedup is comparable to that of an output queued switch. Several previous papers have addressed the problem of analyzing backlogs in combined input-output queued switches with speedup. In [2], it was shown that under any traffic, an output queued switch can be exactly emulated by a CIOQ switch operating with speedup 2. However, the queueing discipline used in each round of scheduling has quite high computational cost. In [6], an upper bound on average backlog was proven for maximal matching scheduling with speedup 2 assuming IID Bernoulli traffic with uniform loading on input and output ports. Unlike the best known bound for MWM scheduling, the ratio between this upper bound and a lower bound on the backlog for an output queued switch is constant as N increases. However, this ratio becomes arbitrarily large as the arrival rate λ approaches 1. The same problem was considered and another upper bound on backlog was computed in [9]. There it was shown that the average backlog associated with maximal matching with speedup 2 is no more than 5 times the backlog associated with an output queued switch. In this paper we also consider switches under uniformly loaded IID traffic. We show that the average backlog associated with maximal matching with speedup s gets arbitrarily close to 2 times the backlog associated with an output queued switch as s increases. Specifically, for a switch with many input and output ports, we show that for speedup s = 3, the backlog associated with maximal matching is no more than 3 1/3 times the backlog associated with an output queued switch. This performance ratio rapidly approaches 2 as s increases.
Preliminaries

Maximal Size Matchings
Performing a round of scheduling in a crossbar switch can be thought of as constructing a matching in a bipartite graph G. This is shown in Figure 2. The vertices in G represent input and output ports, and there is an edge between vertices i and j if the queue at input port i contains a cell to be sent to output port j. Scheduling corresponds to choosing a collection of edges in G. That is, edge (i, j) is chosen if a cell is to be sent from input port i to output port j. The connectivity constraint imposed by the crossbar requires that the scheduled transfers correspond to a matching in the graph. A matching is a subgraph of G with the defining property that no two edges are incident on the same vertex. Scheduling algorithms for input and combined input-output queued switches essentially amount to various criteria for selecting matchings. In this paper we consider maximal size matchings. The main advantage of scheduling using maximal size matchings is that these matchings can be computed very efficiently using a simple greedy algorithm. A maximal size matching is a matching H ⊂ G with the property that adding any edge in G − H to H yields a subgraph that is no longer a matching. The key property of maximal size matchings which is used in our later proofs is that if edge (i, j) is in G, then there is an edge in H incident to either vertex i or vertex j.
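The greedy construction mentioned above can be sketched in a few lines of Python. This is an illustrative sketch, not the specific algorithm of any cited work: the names `greedy_maximal_matching` and `is_maximal` and the edge-list representation are assumptions, and edges are simply processed in the order given.

```python
def greedy_maximal_matching(edges):
    """Return a maximal matching of a bipartite graph.

    edges is an iterable of (input_port, output_port) pairs. An edge is
    accepted whenever both of its endpoints are still unmatched.
    """
    matched_inputs, matched_outputs = set(), set()
    matching = []
    for i, j in edges:
        if i not in matched_inputs and j not in matched_outputs:
            matching.append((i, j))
            matched_inputs.add(i)
            matched_outputs.add(j)
    return matching


def is_maximal(edges, matching):
    """Check the key property: every edge (i, j) of the graph has a
    matching edge incident to input i or to output j."""
    matched_inputs = {i for i, _ in matching}
    matched_outputs = {j for _, j in matching}
    return all(i in matched_inputs or j in matched_outputs for i, j in edges)


edges = [(0, 0), (0, 1), (1, 0), (2, 2)]
H = greedy_maximal_matching(edges)
print(H)                     # [(0, 0), (2, 2)]
print(is_maximal(edges, H))  # True
```

The example also shows that a maximal matching need not be a maximum one: the matching {(0, 1), (1, 0), (2, 2)} has three edges, while the greedy result has two and still cannot be extended.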
The Markov Chain Switch Model
Here we assume a traffic model in which at most one cell may arrive at each input in a single time slot, and that cell arrivals at all time slots are independent and identically distributed. We let A_ij(t) ∈ {0, 1} be the random variable giving the number of cells arriving at input i destined for output j in time slot t. For simplicity, here we consider the case where arrivals are uniformly and independently distributed across inputs and outputs. This implies that the first and second moments of A_ij(t) are
$$E[A_{ij}(t)] = \frac{\lambda}{N}, \qquad E[A_{ij}(t)^2] = \frac{\lambda}{N}, \qquad E[A_{ik}(t)\,A_{il}(t)] = 0$$
for all i, j and all k ≠ l, where 0 ≤ λ < 1 is a parameter describing the traffic intensity. Let D_ij(t) ∈ {0, ..., s} denote the number of cells sent from input queue i to output queue j in time slot t, and let E_j(t) ∈ {0, 1} be the number of cells served from output queue j in time slot t. Also, we let X_ij(t) denote the number of cells in input queue i destined for output j in time slot t, and let Y_j(t) denote the number of cells in output queue j in time slot t. These random variables satisfy
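A sketch of the recursions these definitions suggest, assuming that cells arriving in slot t join the input queues at the end of the slot and that output service takes place after the transfers (timing conventions assumed here, not stated explicitly in the text):

```latex
\begin{align*}
X_{ij}(t+1) &= X_{ij}(t) + A_{ij}(t) - D_{ij}(t), \\
Y_j(t+1)    &= Y_j(t) + \sum_{i=1}^{N} D_{ij}(t) - E_j(t).
\end{align*}
```

Under this reading, feasibility requires D_ij(t) ≤ X_ij(t) and E_j(t) ≤ Y_j(t) + Σ_i D_ij(t), so both queue lengths remain nonnegative.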
Throughout this paper, we will occasionally write these quantities in lowercase when simply referring to feasible values that they may take. We will consider the problem of controlling the system to regulate the steady-state average per-period backlog in the input and output queues,
Under a maximal matching scheduling policy, D(t) and E(t) depend only on X(t) and Y(t). When this is the case, this system evolves as a Markov chain and we can use the following lemma to bound the average per-period backlog. This lemma is a special case of a more general result shown in [3]. Results similar to the lemma below also appear, for example, in [8].
Lemma 1. For any h_U : X → R such that inf_{x∈X} h_U(x) > −∞,
for all z ∈ X .
Proof. Let y = inf x∈X {h U (x)},
and
For all t ≥ 0,
Main Result
Our overall goal is to show that maximal matching scheduling with a speedup of s keeps the average backlog relatively close to the backlog achieved by an output queued switch. Specifically, we will: (i) compute an upper bound on the backlog associated with maximal matching with speedup s, (ii) compute the backlog associated with an output queued switch, and (iii) compute a bound on the ratio of these quantities.
Our first step will be to compute the upper bound on the average backlog. The following lemma will be used in the proof of the bound. Here we will define the quantities
d il which will be used throughout the rest of this paper.
Lemma 2. For a CIOQ switch operating at speedup s ≥ 3 using maximal matching scheduling,
for all λ ≤ 1 and all feasible values of x ij , d, and e.
Proof. When operating at speedup s, s rounds of scheduling occur in each time slot. When using maximal matching scheduling, if there is a cell in input queue i destined for output j at the start of a round of scheduling, then either a cell is removed from input i or a cell is sent to output j in that round. Also, if a cell is sent to output queue j, then output queue j is served at the end of the time slot. It is clear that the lemma holds if x ij = 0. To prove the lemma for x ij > 0, we will consider three cases: 
When
Note that if x ij > 0 and
then the total number of cells either in input queue i or destined for output queue j is less than s. However, in this case at least one cell must be sent from input queue i to output queue j, implying that
Now we are ready to prove the upper bound on the backlog associated with maximal matching scheduling. We will let J MMs denote the average per-period backlog associated with maximal matching scheduling with speedup s.
Theorem 3. A CIOQ switch operating with speedup s using a maximal matching scheduling policy has average per-period backlog satisfying
where
Proof. We prove this bound using Lemma 1 with
Since h U is quadratic with positive second order coefficients, it is clear that inf
satisfying the required condition of Lemma 1. Let
denote the expected drift in h i when in state (x, y) and action d is taken.
where we used the fact that
Similarly,
Therefore,
From Lemma 2, we have
for all values of x. Also, since e j = 1 if y j > 0,
for all values of y. Therefore,
The previous theorem established an upper bound on J MMs , the average per-period backlog under maximal matching with some fixed speedup s. We would now like to determine the expected per-period backlog associated with an output queued switch, which we will denote by J OQ . This is a standard result, but is presented here to keep our treatment self-contained.
Lemma 4. An output queued switch has
Proof. Output queue j is a discrete-time queue with arrival process A_1j + · · · + A_Nj. By the Pollaczek-Khintchine formula (see, for example, [5]), the average steady-state per-period backlog of output queue j is
Using the fact that
we sum over all output queues to obtain
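As a numerical sanity check on the single-output-queue model used in this proof, the following Python sketch simulates one output queue fed by N independent Bernoulli(λ/N) arrival streams with one departure per slot. The function names, the seed, and the convention that backlog is counted after service are assumptions made for illustration; the closed form in `closed_form_backlog` is the standard mean for that particular convention and may differ by an additive term from the expression used in the lemma.

```python
import random


def simulate_output_queue(n_inputs, lam, n_slots, seed=0):
    """Estimate the time-average backlog of one output queue.

    Each slot, every input independently sends a cell to this output
    with probability lam / n_inputs; the queue then serves one cell.
    Backlog is recorded after service (an assumed convention).
    """
    rng = random.Random(seed)
    p = lam / n_inputs
    backlog = 0
    total = 0
    for _ in range(n_slots):
        arrivals = sum(1 for _ in range(n_inputs) if rng.random() < p)
        backlog = max(backlog + arrivals - 1, 0)
        total += backlog
    return total / n_slots


def closed_form_backlog(n_inputs, lam):
    """Mean after-service backlog for Binomial(N, lam/N) arrivals and
    unit service: ((N - 1) / N) * lam^2 / (2 * (1 - lam))."""
    return (n_inputs - 1) / n_inputs * lam ** 2 / (2 * (1 - lam))


estimate = simulate_output_queue(n_inputs=4, lam=0.5, n_slots=200_000)
print(estimate, closed_form_backlog(4, 0.5))
```

Multiplying the per-queue value by N gives the corresponding total over all output queues under the same convention, which grows linearly in N for fixed λ.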
The upper bound and the result of the previous lemma are now used to determine a bound on the performance ratio between maximal matching and output queueing. 
Proof. From Theorem 3 and Lemma 4 we have
By differentiating, it is straightforward to show that for s ≥ 3 and N ≥ 2, the previous expression is increasing in λ for 0 ≤ λ ≤ 1. Therefore,
For large N, the performance ratio approaches 2 as s increases.
Conclusions
In this paper we have analyzed the average backlogs in network switches using a maximal size matching scheduling policy with speedup. It is shown that switches using maximal matching with speedup achieve backlogs comparable to an optimal switch. For the sake of simplicity, we have focused on the case of IID arrivals with uniform loading on input and output ports. We believe that the performance bounds proven in this paper can be tightened when arrivals are time correlated, and this is a subject of future research.
