We propose a systematic approach to quantify the impact of nonuniform traffic on the performance of non-blocking switches with output queueing. We do so in the context of a simple queueing model where cells arrive at input ports according to independent Bernoulli processes, and are switched to an output port under a random routing mechanism. We give conditions on pairs of input rate vectors and switching matrices which ensure various stochastic comparisons for performance measures of interest. These conditions are formulated in terms of the majorization ordering, while the comparison results are expressed in the strong and convex increasing orderings.
Introduction
Space-division packet switching has been recognized as a key component in the ongoing evolution towards future high-performance communication networks [1, 5]. This is due to the high capacity, in the range of 10 to 100 Gbps, that space-division packet switching can achieve through the use of a highly parallel switching fabric with simple per-packet processing distributed among many high-speed VLSI circuits. In non-blocking space-division packet switches, it is always possible to establish a connection between any pair of idle input and output ports. However, output contention arises when more than one cell at different input ports must be routed to the same output. As the contending cells cannot be placed on the output port at the same time, buffering has to be provided in order to store the cells which cannot be served. Several buffering strategies have been reported in the literature [4, 20], with proposed solutions depending on a variety of factors such as the speed of the input and output lines relative to the cell transfer time across the switching fabric, and implementation complexity.
Noteworthy among proposed buffering strategies is output queueing, which we adopt in this paper. Consider a non-blocking crossbar switch with $K$ input and $L$ output ports. The switch operates in a synchronous mode with time divided into consecutive slots of equal duration. At the beginning of a time slot, new cells arrive into the system; the destination of a cell is declared immediately upon arrival. The switching fabric operates at $K$ times the speed of the input and output lines, and each output port is equipped with an infinite-capacity buffer, hereafter referred to as its output buffer. Under the output queueing strategy, all cells which arrive during a time slot and which are destined for a given output port are transported across the switch during that single time slot and put into the output buffer. This is indeed possible under the assumption made on the speed of the switching fabric. However, during any time slot at most one cell in each output buffer can be transmitted on the corresponding output line.

The simplest model of this synchronous crossbar switch with output queueing is that of a collection of $L$ discrete-time queues, one for each output port, operating in parallel and fed by $K$ independent Bernoulli processes under a random routing assignment. The arrival process at the $k$th input port, $k = 1, \ldots, K$, is a Bernoulli process with parameter $\lambda_k$, $0 \le \lambda_k \le 1$. The output addressing scheme is described by a stochastic matrix $R = (r_{k\ell})$, called the switching or routing matrix, with the following implementation in each time slot: a cell that arrives at the $k$th input port at the beginning of a time slot is destined for the $\ell$th output port with probability $r_{k\ell}$, $k = 1, \ldots, K$; $\ell = 1, \ldots, L$. This assignment is carried out independently over time and across input ports, and independently of the arrival streams, which are assumed mutually independent.
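To make the per-slot mechanics concrete, here is a minimal Python sketch of one time slot of this model: input $k$ receives a cell with probability $\lambda_k$, and that cell is sent to output $\ell$ with probability $r_{k\ell}$, independently of everything else. The function name `arrivals_per_output` and the list-of-lists representation of $R$ are illustrative choices, not part of the original model description.

```python
import random

def arrivals_per_output(lam, R, rng):
    """Simulate one time slot and return the number of cells destined
    for each output port.

    lam : list of K Bernoulli arrival probabilities, one per input port.
    R   : K x L routing matrix; row k is the destination distribution of
          a cell arriving at input k.
    rng : a random.Random instance (kept explicit for reproducibility).
    """
    K, L = len(lam), len(R[0])
    counts = [0] * L
    for k in range(K):
        if rng.random() < lam[k]:  # a cell arrives at input k
            # pick the destination j with probability R[k][j],
            # independently of the other inputs and of past slots
            j = rng.choices(range(L), weights=R[k])[0]
            counts[j] += 1
    return counts
```

Averaging the returned counts over many slots should approach $\sum_k \lambda_k r_{k\ell}$ for each output $\ell$.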
The performance analysis for this model is typically carried out under the uniform traffic and routing assumptions, which are specified by

$$\lambda_k = \lambda, \qquad k = 1, \ldots, K, \tag{1.1}$$

$$r_{k\ell} = \frac{1}{L}, \qquad k = 1, \ldots, K; \; \ell = 1, \ldots, L. \tag{1.2}$$

A distinct advantage of assuming (1.1)-(1.2) is that, the input rate vector and switching matrix being symmetric, it suffices to analyze a single queue in order to obtain information concerning most performance measures of interest.
In reality, however, traffic offered to the switch is most likely to be nonuniform, and it is not clear how this will affect its performance. As a case in point, with $K = L$, if cells arriving at the $k$th input port are always routed to the $k$th output port, $k = 1, \ldots, K$, there is no output contention and the best possible performance is achieved. This is in sharp contrast with the worst-case scenario where all incoming cells are destined to the same output port, thereby creating severe congestion at the corresponding output buffer. Various attempts have been made to understand the range of possibilities that result from nonuniform traffic patterns. These efforts have been reported recently in the numerical studies [9, 10, 11, 18, 22], and have focused on packet switches with output queueing as well as with input queueing and combinations thereof. As nonuniform traffic refers to any traffic pattern different from (1.1)-(1.2), the number of possible nonuniform traffic patterns is huge due to the large number of parameters involved, and this precludes a systematic exploration of all cases. In fact, most analyses under nonuniform traffic have considered only very specific traffic patterns, e.g., bi-group traffic [9, 11, 18, 22], hot-spot traffic [14, 22] and point-to-point traffic [21, 22]. Given this state of affairs, in the context of the simple queueing model introduced earlier, we seek to understand in a more systematic manner the behavior of the output queueing switch as a function of the input rate vector $\lambda = (\lambda_1, \ldots, \lambda_K)$ and of the switching matrix $R$. Specifically, we focus on finding conditions on pairs $(\lambda, R)$ and $(\lambda', R')$ of input rate vectors and switching matrices which ensure various stochastic comparisons for the corresponding performance measures.
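The contrast between these scenarios already shows in the mean number of cells offered per slot to each output, $\sum_k \lambda_k r_{k\ell}$. The sketch below uses a hypothetical helper `offered_load` and illustrative values $K = L = 4$, $\lambda_k = 0.6$, and compares uniform routing (1.2), the identity routing just described, and a hot-spot pattern sending every cell to the first output (index 0 in the code).

```python
def offered_load(lam, R):
    """Mean number of cells per slot destined for each output:
    entry j is the sum over k of lam[k] * R[k][j]."""
    K, L = len(lam), len(R[0])
    return [sum(lam[k] * R[k][j] for k in range(K)) for j in range(L)]

K = L = 4
lam = [0.6] * K
uniform = [[1 / L] * L for _ in range(K)]                              # as in (1.2)
identity = [[1.0 if k == j else 0.0 for j in range(L)] for k in range(K)]
hot_spot = [[1.0] + [0.0] * (L - 1) for _ in range(K)]                 # all cells to index 0

for name, R in [("uniform", uniform), ("identity", identity), ("hot spot", hot_spot)]:
    print(name, offered_load(lam, R))
```

Uniform and identity routing give the same mean load (about 0.6) at every output, while the hot-spot pattern offers about 2.4 cells per slot to a single output line that can transmit only one. Since uniform and identity routing agree in the mean but identity routing never creates output contention, first moments alone cannot rank such configurations.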
Switch performance is quantified by output queueing delays and buffer sizes, and we distinguish performance measures associated with output ports, e.g., the queue size at the $\ell$th output buffer and the delay incurred by a cell leaving through the $\ell$th output port, from measures associated with input ports, e.g., the delay incurred by a cell that enters the switch by the $k$th input port.
We formulate the conditions on the pairs $(\lambda, R)$ and $(\lambda', R')$ in terms of the weak majorization ordering. The comparison results are expressed in the strong and convex increasing orderings for distributions, and not merely in terms of the first moments of the performance measures. The results are summarized in Section 5, where only the steady-state versions are presented; however, it should be clear from the discussion given in Sections 7-9 that transient versions hold as well. The results are derived through the combined use of recent ideas from the theory of stochastic convexity and of techniques from the theory of stochastic orderings. In the process we establish several majorization properties for sums of independent Bernoulli rvs; some of these results, given in Section 6, appear to be new.
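Majorization conditions of this kind can be tested mechanically. The sketch below assumes the standard definitions: $x \prec_w y$ (weak submajorization) when, for each $m$, the sum of the $m$ largest entries of $x$ is at most the sum of the $m$ largest entries of $y$, and $x \prec y$ when in addition the totals agree. Which variant of weak majorization the paper uses is an assumption here, and the function names are hypothetical.

```python
from itertools import accumulate

def _partial_sums_desc(x):
    """Partial sums of the entries of x sorted in decreasing order."""
    return list(accumulate(sorted(x, reverse=True)))

def weakly_majorized(x, y, tol=1e-12):
    """x <_w y: each partial sum of the largest entries of x is at most
    the corresponding partial sum for y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return all(a <= b + tol
               for a, b in zip(_partial_sums_desc(x), _partial_sums_desc(y)))

def majorized(x, y, tol=1e-12):
    """x < y: weak submajorization together with equal totals."""
    return weakly_majorized(x, y, tol) and abs(sum(x) - sum(y)) <= tol
```

For instance, the uniform vector $(1/4, 1/4, 1/4, 1/4)$ is majorized by $(1, 0, 0, 0)$, which has the same total, whereas $(0.2, 0.2)$ is only weakly majorized by $(0.5, 0)$.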
In this paper we establish only one-dimensional results, i.e., results pertaining to a particular queue or port. However, these results can already be used to obtain bounds on system performance. In particular, as we show in [7], under certain load constraints we can identify the best and worst scenarios. We refer the reader to the companion papers [6, 8] for a collection of multi-dimensional comparison results which yield traffic and switch configurations for optimal load balancing.
The paper is organized as follows: the model of interest is described in detail in Section 2. Delay measures are introduced in Section 3, and the statistical equilibrium of the system is discussed in Section 4. The main results are presented in Section 5, and their proofs can be found in Sections 7-9. In Section 6 we have isolated intermediate results on sums of Bernoulli random variables which are of independent interest. Several proofs have been relegated to two technical appendices. The notation $\le_{st}$ (resp. $\le_{icx}$) stands for the strong stochastic (resp. convex increasing) ordering on the collection of distributions [15, 19]. Finally, two $\mathbb{R}$-valued rvs $X$ and $Y$ are said to be equal in law if they have the same distribution, a situation we denote by $X =_{st} Y$.
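For distributions on $\{0, 1, 2, \ldots\}$ both orderings reduce to finitely many numerical checks, using standard characterizations: $X \le_{st} Y$ iff $P(X > n) \le P(Y > n)$ for all $n$, and $X \le_{icx} Y$ iff $E[(X - c)^+] \le E[(Y - c)^+]$ for all real $c$. For nonnegative integer-valued rvs the map $c \mapsto E[(X - c)^+]$ is linear between consecutive integers and reduces to the mean for $c \le 0$, so integers $c = 0, 1, \ldots$ suffice. A Python sketch with pmfs stored as lists indexed by value; the names are illustrative.

```python
def tail(pmf, n):
    """P(X > n) for a pmf stored as [P(X=0), P(X=1), ...]."""
    return sum(pmf[n + 1:])

def stop_loss(pmf, n):
    """E[(X - n)^+] for an integer n >= 0."""
    return sum((j - n) * p for j, p in enumerate(pmf) if j > n)

def leq_st(px, py, tol=1e-12):
    """X <=_st Y: P(X > n) <= P(Y > n) for every n."""
    n_max = max(len(px), len(py))
    return all(tail(px, n) <= tail(py, n) + tol for n in range(n_max))

def leq_icx(px, py, tol=1e-12):
    """X <=_icx Y: E[(X - n)^+] <= E[(Y - n)^+] for every integer n >= 0.
    Enough for nonnegative integer rvs: between consecutive integers both
    sides are linear in the threshold, and below 0 only the means matter."""
    n_max = max(len(px), len(py))
    return all(stop_loss(px, n) <= stop_loss(py, n) + tol for n in range(n_max))

X = [0.0, 1.0]          # X = 1 with probability 1
Y = [0.25, 0.5, 0.25]   # Y in {0, 1, 2}, same mean as X
print(leq_icx(X, Y), leq_st(X, Y), leq_st(Y, X))   # True False False
```

The example shows that two rvs can be comparable in $\le_{icx}$ while incomparable in $\le_{st}$.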
The Model
All rvs are defined on some probability triple $(\Omega, \mathcal{F}, P)$, and $E$ denotes the corresponding expectation operator.

In deriving (2.1) we made the following operational assumption: if the $\ell$th output queue is empty at the beginning of a time slot, no cell arriving at that output queue during that time slot is eligible for transmission during the time slot. Instead of this “gated” transmission strategy, we could also consider a “cut-through” strategy according to which, if the $\ell$th output queue is empty at the beginning of a time slot, cells arriving at that output queue during that time slot are eligible for transmission during the time slot. In that case, the dynamics (2.1) have to be replaced by

$$Q_\ell(t+1; \lambda, R) = \left[ Q_\ell(t; \lambda, R) - 1 + A_\ell(t+1; \lambda, R) \right]^+, \qquad \ell = 1, \ldots, L; \; t = 0, 1, \ldots,$$

where $A_\ell(t+1; \lambda, R)$ denotes the number of cells routed to the $\ell$th output port in time slot $t+1$.
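A sketch contrasting the two service disciplines along a fixed arrival sequence. Here $A_\ell(t+1)$ is the number of cells routed to output $\ell$ in slot $t+1$ (notation assumed for this sketch). The cut-through recursion is the one displayed above; the gated recursion $Q_\ell(t+1) = [Q_\ell(t) - 1]^+ + A_\ell(t+1)$ is our reading of the verbal description of the gated strategy and should be taken as an assumption, as should the arbitrary arrival counts.

```python
def gated_step(q, a):
    """Assumed gated dynamics: Q(t+1) = [Q(t) - 1]^+ + A(t+1)."""
    return max(q - 1, 0) + a

def cut_through_step(q, a):
    """Cut-through dynamics: Q(t+1) = [Q(t) - 1 + A(t+1)]^+."""
    return max(q - 1 + a, 0)

def trajectory(step, arrivals, q0=0):
    """Queue sizes Q(0), Q(1), ... driven by the per-slot arrival counts."""
    path = [q0]
    for a in arrivals:
        path.append(step(path[-1], a))
    return path

arrivals = [2, 0, 1, 3, 0, 0]                   # arbitrary example
print(trajectory(gated_step, arrivals))         # [0, 2, 1, 1, 3, 2, 1]
print(trajectory(cut_through_step, arrivals))   # [0, 1, 0, 0, 2, 1, 0]
```

Along any common arrival sequence started from the same initial queue, the gated queue is never shorter than the cut-through one, since cells arriving at an empty queue must wait an extra slot.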
The results derived here hold under either strategy, but for the sake of definiteness, we carry out the discussion only in the context of the gated strategy with queue dynamics (2.1).

Delay Measures

... batch, $m = 1, 2, \ldots$, but the order of service within a given batch is random. As a result, the delay of the $n$th cell can be decomposed into two successive stages: first, all the cells which have arrived in earlier time slots, and which must belong to different batches, are served. Then, the cells belonging to the same batch as the $n$th cell are processed in random order. We can thus write

$$D_\ell(n; \lambda, R) = W_\ell(n; \lambda, R) + B_\ell(n; \lambda, R), \qquad n = 1, 2, \ldots \tag{3.1}$$

where the rv $W_\ell(n; \lambda, R)$ counts the number of slots required for transmitting all the cells in the batches which have arrived before the one containing the $n$th cell, and the rv $B_\ell(n; \lambda, R)$ denotes the number of slots that the $n$th cell needs to wait before it is served, once the batch to which it belongs starts being served. We can also interpret $B_\ell(n; \lambda, R)$ as the position of the $n$th cell in its batch. We also consider performance measures which are associated with the input ports ...
The Steady State Regime
As some of the results below are concerned with performance measures for the system in statistical equilibrium, we now discuss the existence of such a steady-state regime in some detail. To set the notation, for any sequence of $\mathbb{R}^d$-valued rvs $\{X_t, t = 0, 1, \ldots\}$, we denote its weak limit as $t$ goes to $\infty$ by $X$ whenever it exists, and write $X_t \Rightarrow_t X$ to denote this weak convergence [2]. We call $X$ the stationary version of the sequence $\{X_t, t = 0, 1, \ldots\}$. ... $Q(\lambda, R)$ such that $Q(t; \lambda, R) \Rightarrow_t Q(\lambda, R)$. In such circumstances, the system is termed stable and $Q(\lambda, R)$ is called the steady-state queue size vector, or the queue size in statistical equilibrium.
If for some $\ell = 1, \ldots, L$ we only have $\rho_\ell(\lambda, R) < 1$, then the one-dimensional convergence $Q_\ell(t; \lambda, R) \Rightarrow_t Q_\ell(\lambda, R)$ still takes place, in which case the $\ell$th output queue is said to be stable.
We now turn to delay measures. Fix $\ell = 1, \ldots, L$, and assume the stability condition $\rho_\ell(\lambda, R) < 1$. For each $n = 1, 2, \ldots$, with $t_n$ denoting the arrival epoch of the batch containing the $n$th cell, we have the relation $W_\ell(n; \lambda, R) = Q_\ell(t_n; \lambda, R)$.
Because the arrival of batches to the $\ell$th ... In other words, the rv $B_\ell(\lambda, R)$ is the forward recurrence time associated with $A_\ell(1; \lambda, R)$.
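The forward recurrence time can be computed directly from the batch-size distribution. This sketch assumes that positions within a batch of size $j$ are numbered $1, \ldots, j$ and that the tagged cell is equally likely to hold any of them; its batch is then size-biased, which gives $P(B = i) = P(A \ge i)/E[A]$ for $i \ge 1$, where $A$ is a generic batch size. Starting the numbering at 1 rather than 0 is an assumption (shifting it changes $B$ by one), and the helper name is hypothetical.

```python
from fractions import Fraction

def forward_recurrence(a):
    """pmf of the position B of a tagged cell in its batch.

    a[j] = P(A = j) is the batch-size pmf. Positions are assumed to be
    numbered 1, ..., j, each equally likely for the tagged cell, so
    P(B = i) = P(A >= i) / E[A] for i >= 1, and P(B = 0) = 0.
    """
    mean = sum(j * p for j, p in enumerate(a))
    if mean == 0:
        raise ValueError("batch size must be positive with positive probability")
    b = [0] * len(a)
    for i in range(1, len(a)):
        b[i] = sum(a[i:]) / mean
    return b

# Hypothetical batch-size pmf: Binomial(2, 1/2).
a = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(forward_recurrence(a))   # [0, Fraction(3, 4), Fraction(1, 4)]
```

Exact rationals keep the check that the result sums to one free of rounding error.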
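Because for each $n = 1, 2, \ldots$ the rvs $W_\ell(n; \lambda, R)$ and $B_\ell(n; \lambda, R)$ are independent, we obtain from (3.1) that $D_\ell(n; \lambda, R) \Rightarrow_n D_\ell(\lambda, R)$ for some rv $D_\ell(\lambda, R)$ given by $D_\ell(\lambda, R) =_{st} Q_\ell(\lambda, R) + B_\ell(\lambda, R)$, with $Q_\ell(\lambda, R)$ and $B_\ell(\lambda, R)$ independent rvs.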
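Since the two terms in this representation of $D_\ell(\lambda, R)$ are independent, the pmf of the steady-state delay is the convolution of their pmfs. A minimal sketch for nonnegative integer-valued rvs; the example pmfs are made up for illustration and do not come from the model.

```python
def convolve(p, q):
    """pmf of X + Y for independent X, Y on {0, 1, ...} with pmfs p and q."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def mean(pmf):
    """Expectation of a pmf stored as [P(X=0), P(X=1), ...]."""
    return sum(k * p for k, p in enumerate(pmf))

q_pmf = [0.5, 0.3, 0.2]    # hypothetical steady-state queue size pmf
b_pmf = [0.0, 0.75, 0.25]  # hypothetical in-batch position pmf
d_pmf = convolve(q_pmf, b_pmf)
print(d_pmf, mean(d_pmf), mean(q_pmf) + mean(b_pmf))
```

The last two printed numbers agree up to rounding, reflecting the additivity of means under the decomposition.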
In view of the independence mentioned at the end of Section 3, whenever $Q(t; \lambda, R) \Rightarrow_t Q(\lambda, R)$, we conclude from 3. ...

... The conclusion (5.6) will hold simultaneously for all $k = 1, \ldots, K$ provided the conditions $R = R'$ and $\rho_\ell(\lambda, R)$ ...

To formulate the second set of results, concerning delay measures associated with input ports, we need to place restrictions on the switching matrices: the addressing scheme is said to be input-independent if its switching matrix $R$ has all its rows identical, say $r_k = r$, $k = 1, \ldots, K$, for some vector $r$ in $\mathcal{S}_L$. Bi-group and hot-spot traffic patterns are instances of input-independent addressing schemes. Under this constraint, we explore how the routing vector $r$ affects the delay performance measure (3.2) while the input rate vector remains fixed. The dependency on the pair $(\lambda, R)$ will be abbreviated to $(\lambda, r)$, where $r$ is the common row of $R$. The main result along these lines is contained in Theorem 5.3 below, and its proof is discussed in Section 9.

... for any increasing mapping $\varphi$, because the mapping $\hat{\varphi} : \mathbb{N} \to \mathbb{R}$ is then integer convex. We obtain (6.5) via (6.2) upon combining (6.6) with the equality $E[S_K(p)] = E[S_K(q)]$ derived from the condition $p \prec q$.
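Here $S_K(p)$ is read as the sum of $K$ independent Bernoulli rvs with parameters $p_1, \ldots, p_K$ (our reading of the notation). Its pmf follows by adding one Bernoulli term at a time. The small example below shows the equality of means that majorization forces, together with the stop-loss values $E[(S_K - n)^+]$ on which $\le_{icx}$ comparisons rest. Names are illustrative, and exact rationals avoid rounding.

```python
from fractions import Fraction

def bernoulli_sum_pmf(p):
    """pmf of X_1 + ... + X_K with independent X_k ~ Bernoulli(p[k]),
    built by adding one Bernoulli term at a time."""
    pmf = [Fraction(1)]                       # empty sum is 0
    for pk in p:
        new = [Fraction(0)] * (len(pmf) + 1)
        for s, prob in enumerate(pmf):
            new[s] += prob * (1 - pk)         # X_k = 0
            new[s + 1] += prob * pk           # X_k = 1
        pmf = new
    return pmf

def stop_loss(pmf, n):
    """E[(S - n)^+] for an integer n >= 0."""
    return sum((s - n) * prob for s, prob in enumerate(pmf) if s > n)

half = Fraction(1, 2)
p = [half, half]                   # balanced vector
q = [Fraction(1), Fraction(0)]     # same total, and p is majorized by q
for name, v in [("p", p), ("q", q)]:
    pmf = bernoulli_sum_pmf(v)
    print(name, [str(x) for x in pmf], [str(stop_loss(pmf, n)) for n in range(3)])
```

For $p = (1/2, 1/2) \prec q = (1, 0)$ both sums have mean 1, while $E[(S_2(p) - 1)^+] = 1/4 > 0 = E[(S_2(q) - 1)^+]$; in this instance $S_2(q) \le_{icx} S_2(p)$.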
Under the condition $p \prec q$, the validity of (6.6) is an immediate consequence of Lemma 6.2 once we note the equality of the means. It is then natural to wonder whether the conclusion (6.4) still holds under the weaker condition $p \prec_w q$. In order to answer this question in the affirmative, we need the following result.

... and $Q_\ell(t; \lambda', R')$ (resp. $A_\ell(t+1; \lambda, R)$ and $Q_\ell(t; \lambda, R)$) being independent, we conclude from (7.1) and (7.2) that the comparison $Q_\ell(t+1; \lambda', R') \le_{icx} Q_\ell(t+1; \lambda, R)$ holds because $\le_{icx}$ is preserved under convolution. This completes the induction step.
Under (5.1), the stability condition $\rho_\ell(\lambda, R) < 1$ implies $\rho_\ell(\lambda', R') < 1$, so that the $\ell$th output queue is stable in both cases. It is a simple matter to show (say, by transform techniques) that the steady-state queue size rvs $Q_\ell(\lambda, R)$ and $Q_\ell(\lambda', R')$ both have all their moments finite. On the other hand, we also note that $Q_\ell(t; \lambda, R) \le_{st} Q_\ell(t+1; \lambda, R) \le_{st} Q_\ell(\lambda, R)$ for all $t = 0, 1, \ldots$; this monotonicity result follows by an easy induction argument [19, Theorem 2.2.8, p. 48], which is omitted for the sake of brevity. Combining these remarks, we readily conclude that the rvs $\{Q_\ell(t; \lambda, R), t = 0, 1, \ldots\}$ are uniformly integrable, whence ...

In Appendix B, we show that the mapping $\varphi_{av} : \mathbb{N} \to \mathbb{R}$ is integer increasing convex whenever the mapping $\varphi : \mathbb{R} \to \mathbb{R}$ is increasing convex. Therefore, for every increasing convex mapping $\varphi : \mathbb{R} \to \mathbb{R}$, the inequality 8. ...

The reader is referred to [16, 17] for proofs and additional details concerning these notions of stochastic convexity.
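The monotonicity $Q_\ell(t) \le_{st} Q_\ell(t+1)$ from an empty queue can be observed numerically by propagating the pmf of the queue size. This sketch assumes gated dynamics $Q_\ell(t+1) = [Q_\ell(t) - 1]^+ + A_\ell(t+1)$, with i.i.d. arrival counts independent of the current queue, and a hypothetical arrival pmf: two inputs, each sending a cell to this output with probability $3/10$, so the load is $3/5 < 1$.

```python
from fractions import Fraction

def next_pmf(q, a):
    """pmf of Q(t+1) = [Q(t) - 1]^+ + A(t+1) (assumed gated dynamics),
    with A(t+1) independent of Q(t) and distributed according to a."""
    # pmf of [Q(t) - 1]^+: values 0 and 1 both map to 0, value i maps to i - 1
    served = [q[0] + (q[1] if len(q) > 1 else 0)] + q[2:]
    out = [Fraction(0)] * (len(served) + len(a) - 1)
    for i, x in enumerate(served):
        for j, y in enumerate(a):
            out[i + j] += x * y
    return out

def leq_st(p, r):
    """p <=_st r: every tail probability of p is at most that of r."""
    n_max = max(len(p), len(r))
    return all(sum(p[n + 1:]) <= sum(r[n + 1:]) for n in range(n_max))

a = [Fraction(49, 100), Fraction(42, 100), Fraction(9, 100)]   # Binomial(2, 3/10)
pmfs = [[Fraction(1)]]                                          # Q(0) = 0
for t in range(20):
    pmfs.append(next_pmf(pmfs[-1], a))

print(all(leq_st(pmfs[t], pmfs[t + 1]) for t in range(20)))    # True
print(float(sum(k * p for k, p in enumerate(pmfs[-1]))))       # mean queue size after 20 slots
```

Because the recursion is monotone in its first argument and $Q(0) = 0$ is the smallest possible state, each step can only push the distribution upward, which is what the check confirms.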
To show that the mapping ... We shall prove this claim by induction on $n$.
The basis step: for $n = 0$, we have $\tilde{\varphi}_{av}(0) = \varphi(3) - 2\varphi(2) + \varphi(1) \ge 0$ because $\varphi$ is integer convex.
The induction step: suppose that (B.3) holds for some $n = 0, 1, \ldots$. Because $\varphi(n+4) \ge 2\varphi(n+3) - \varphi(n+2)$ by the integer convexity of $\varphi$, we observe that
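Integer convexity, used in both steps of this argument, means $\varphi(n+2) - 2\varphi(n+1) + \varphi(n) \ge 0$ for all $n$; the basis-step quantity $\varphi(3) - 2\varphi(2) + \varphi(1)$ is this second difference at $n = 1$, and the inequality in the induction step is the same condition at $n + 2$. A small checker over a finite range follows; it is a numerical sanity check, not a proof, and the names are hypothetical.

```python
def second_difference(phi, n):
    """phi(n + 2) - 2 phi(n + 1) + phi(n)."""
    return phi(n + 2) - 2 * phi(n + 1) + phi(n)

def is_integer_convex(phi, n_max):
    """Check nonnegative second differences for n = 0, ..., n_max."""
    return all(second_difference(phi, n) >= 0 for n in range(n_max + 1))

def is_integer_increasing_convex(phi, n_max):
    """Integer convex and nondecreasing on 0, ..., n_max + 2."""
    return is_integer_convex(phi, n_max) and all(
        phi(n + 1) >= phi(n) for n in range(n_max + 2))

def phi(x):
    """An increasing convex function on R, restricted here to the integers."""
    return max(x - 2.5, 0.0)

print(is_integer_increasing_convex(phi, 20))   # True
print(second_difference(phi, 1))               # phi(3) - 2 phi(2) + phi(1): 0.5
```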
