I. INTRODUCTION
We analyse input-queued, space-division, variable-length packet switches. The packet lengths and interarrival times are assumed to be random and drawn from continuous distributions. With the emergence of IP switching technologies ... when there are $l$ packets in the system and then using these rates in an M/M/1 queue model for the switch. The first part of our paper can be considered a generalization of the results of [7]; we present a throughput-delay analysis for an $M \times N$ input-queued switch with arbitrary Poisson arrivals at each input, exponential packet lengths, arbitrary output line rates and arbitrary routing probabilities.
The little literature that exists on the analysis of variable-length packet switches assumes Poisson packet arrival processes. Recent measurement studies over a wide range of packet networks have established the self-similar nature of packet traffic and the failure of the traditional Poisson models to capture the long range dependence (LRD) and the burstiness of such packet arrival processes. The long range dependence in the arrival process is marked by the presence of correlations and burstiness over many time scales, which are known to have a considerable impact on the queuing performance. We now know that queuing behavior with LRD arrival processes differs markedly from that with Poisson arrivals. Extreme burstiness of packet traffic spanning a number of time scales gives rise to extended periods of large queue build-ups and also to sustained periods of low activity. Thus if the arrival process feeding each port of an input-queued switch is an LRD process, its interaction with the HOL blocking in an input-queued switch can lead to very bad queuing behavior. In view of the extreme queuing behavior expected, a deeper understanding of the switch behavior becomes necessary because the switch is the critical component in providing various quality of service guarantees in the multiservice Internet of the future. In this paper we extend the delay models for Poisson traffic arrivals to LRD input processes and present some results from our investigations into the queuing behavior of input-queued, variable-length packet switches under such input.
The rest of the paper is organised as follows. Section II introduces the delay-throughput analysis for an $M \times N$ switch with Poisson arrivals and exponentially distributed packet lengths. In Section III we present the analysis technique for an $M \times N$ switch with the arrival stream at each input port characterised by a self-similar process. We also analyse the switch under link multiplicities and asymmetries in the traffic conditions. Finally, Section IV presents a discussion on the results and concluding remarks.

0-7803-5880-5/00/$10.00 (c) 2000 IEEE
II. POISSON ARRIVALS AND EXPONENTIAL PACKET LENGTHS
We first consider a single-stage, unslotted, internally nonblocking $M \times N$ input-queued packet switch. Packet arrivals to input port $i$ form a Poisson process of rate $\lambda_i$ and choose a destination $j$ with probability $p_{ij}$. The line rate on output port $j$ is $\mu_j$ and there are no buffers at the output. Input packets are served according to FIFO. When a packet moves to the head of its queue, if its destination is busy, the packet will wait at the head of the input queue till the destination output port is free and chooses to evacuate the packet. When an output port finishes service, of the packets that are waiting at the head of the queues of the inputs, the packet that was blocked first is served first. Service in random order, round robin or processor sharing disciplines can also be analyzed using the method developed here but we do not investigate them. From above, the arrival rate to output port $j$, $\Lambda_j$, and its utilization, $\rho_j$, are

$$\Lambda_j = \sum_{i=1}^{M} \lambda_i \, p_{ij}, \qquad \rho_j = \frac{\Lambda_j}{\mu_j}. \tag{1}$$
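As a minimal sketch, the output arrival rates and utilizations can be computed directly from the input rates and the routing probabilities. The function name and the numbers of the 2 x 2 example are illustrative assumptions only.

```python
def output_loads(arrival_rates, routing, line_rates):
    """Return a list of (Lambda_j, rho_j) pairs, one per output port j.

    arrival_rates[i] : Poisson rate lambda_i at input i
    routing[i][j]    : probability p_ij that a packet at input i goes to output j
    line_rates[j]    : line rate mu_j of output j
    """
    loads = []
    for j, mu_j in enumerate(line_rates):
        # Aggregate rate offered to output j by all inputs.
        lam_j = sum(lam_i * routing[i][j] for i, lam_i in enumerate(arrival_rates))
        loads.append((lam_j, lam_j / mu_j))
    return loads


# Hypothetical 2 x 2 switch in which input 0 favours output 0.
loads = output_loads([0.3, 0.2], [[0.75, 0.25], [0.5, 0.5]], [1.0, 1.0])
```

With unit line rates the utilization of each output equals its aggregate arrival rate.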
The sojourn time of an input packet has two components: the waiting time in the input queue till it moves to the head of the line (HOL), and the time spent at the HOL of the input queue till the HOL packets from other input queues that were blocked earlier finish their service and the packet is evacuated. The time spent at the HOL of the input queue corresponds to the "service time" in the input queue. This service time, once again, has two components: a blocking delay, the time until the output starts evacuating the packet, and the actual service time, the time taken by the destination port to evacuate the packet. Figure 1 shows these times in detail. Since the arrivals to the input queue are Poisson, each input queue can be seen to be an M/G/1 queue with service time distribution given by the time spent by a packet at its HOL. To analyse the queuing behavior, the distribution of the time spent at the HOL of the queue needs to be obtained, and this is derived below. In this derivation, we use techniques similar to the analysis of queueing networks with blocking [20].
Consider output port $j$. It has room for only the packet that is being evacuated (served). However, the HOL positions at the $M$ input queues can contain packets meant for output $j$ that are waiting for the port to become free. These packets form a virtual queue for output $j$ and are served FCFS. Thus the virtual queue of any output has at most $M$ buffers. The time taken by the output port to evacuate a packet from the HOL of the inputs is exponentially distributed with mean $1/\mu_j$. If we approximate the arrival process to the virtual queue by a Poisson process of throughput $\Lambda_j$, then output queue $j$ can be modeled as an M/M/1/M queue. We can easily show that as $M \to \infty$ the arrival process to the output queue is indeed Poisson under certain conditions. Since the queue has finite buffers, the throughput is not equal to the arrival rate. The throughput of output port $j$ should be $\Lambda_j$. Therefore the "arrival rate" corresponding to this throughput, which we call the effective arrival rate $\Lambda'_j$, is obtained by solving for $\Lambda'_j$ in the equation

$$\Lambda_j = \Lambda'_j \left[ 1 - \frac{(1-\rho'_j)\,(\rho'_j)^{M}}{1-(\rho'_j)^{M+1}} \right] = \Lambda'_j \, \frac{1-(\rho'_j)^{M}}{1-(\rho'_j)^{M+1}} \tag{2}$$

where $\rho'_j = \Lambda'_j/\mu_j$. The term in the square brackets in the first equality corresponds to the probability that a packet arriving into an M/M/1/M queue is not blocked. The probability that there are $k$ packets in the virtual queue of output port $j$, $q_j(k)$, is given by

$$q_j(k) = \frac{(1-\rho'_j)\,(\rho'_j)^{k}}{1-(\rho'_j)^{M+1}}, \qquad k = 0, 1, \ldots, M. \tag{3}$$

Packet arrivals to the head of an input queue are approximated to form a Poisson process. Thus the probability that an arriving packet will see $k$ packets ahead of it in the virtual queue of the output will be $q_j(k)$. However, a packet moving to the head of an input queue can see only $0, 1, \ldots, M-1$ packets and will never see $M$ packets ahead of it. Therefore the probability that a packet arriving to the head of an input queue and wanting to go to output $j$ sees $k$ packets ahead of it, $\pi_j(k)$, will be

$$\pi_j(k) = \frac{q_j(k)}{1-q_j(M)} \qquad \text{for } k = 0, 1, \ldots, M-1. \tag{4}$$

In the virtual queue of output port $j$, if there are $k$ packets ahead of it, the packet has to wait for the evacuation of these packets before it can begin its service, and its waiting time has a $k$-stage Erlangian distribution (the sum of $k$ independent, exponentially distributed evacuation times). In addition to the blocking delay there is the evacuation time, which has an exponential distribution of mean $1/\mu_j$. Thus the conditional (conditioned on the packet wanting to go to output port $j$) sojourn time of a packet at the HOL of the input queue has a phase type distribution like that shown in Figure 2, with Laplace-Stieltjes transform

$$X_j^*(s) = \left[ \sum_{k=0}^{M-1} \pi_j(k) \left( \frac{\mu_j}{s+\mu_j} \right)^{k} \right] \left[ \frac{\mu_j}{s+\mu_j} \right]. \tag{5}$$

Here the term in the first square brackets corresponds to the blocking delay and that in the second corresponds to the evacuation time, given that the packet wants to go to output $j$. The first three moments of the blocking delay at input queue $i$, $B_i$, ...
[Figure 2: phase-type representation of the HOL sojourn time, with blocking stages followed by the evacuation time.]
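The steps above (solve for the effective load, form the virtual queue probabilities, condition on the packet being admitted, then add the evacuation stage) can be sketched numerically as follows. This is an illustrative sketch under the M/M/1/M approximation; the function names and the bisection solver are assumptions of this example, and it relies on the standard identity that the carried rate of an M/M/1/M queue equals $\mu(1 - q(0))$.

```python
def state_probs(rho_eff, size):
    """Stationary probabilities q(0..size) of an M/M/1/size queue."""
    weights = [rho_eff ** k for k in range(size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def effective_load(throughput, mu, size, tol=1e-12):
    """Bisect for rho' = Lambda'/mu whose carried rate equals `throughput`."""
    if not 0.0 < throughput < mu:
        raise ValueError("throughput must lie strictly between 0 and mu")

    def carried(rho):
        return mu * (1.0 - state_probs(rho, size)[0])

    lo, hi = 0.0, 1.0
    while carried(hi) < throughput:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if carried(mid) < throughput:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hol_view(throughput, mu, size):
    """pi(k), k = 0..size-1: packets seen ahead by a packet reaching the HOL."""
    q = state_probs(effective_load(throughput, mu, size), size)
    return [q[k] / (1.0 - q[size]) for k in range(size)]


def mean_hol_time(throughput, mu, size):
    """Mean HOL sojourn: k exponential blocking stages plus one evacuation."""
    pi = hol_view(throughput, mu, size)
    return (sum(k * p for k, p in enumerate(pi)) + 1.0) / mu
```

For a very large switch and a throughput of 0.5 with unit line rate, the mean HOL time approaches 2, the mean sojourn time of an M/M/1 queue at that load.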
Note that the agreement between the analytical and simulation models improves for both the total and the blocking delay as the switch size increases. It is easily seen that our delay model is exact for $N \to \infty$. As $N \to \infty$, the virtual M/M/1/N queue of the outputs becomes an M/M/1 queue with arrival rate $\lambda$ and service rate 1.0. As $N \to \infty$, the arrival process to the input queue is Poisson with rate $\lambda$, and the input queue in turn is an M/G/1 queue with service time equal to the sojourn time in an M/M/1 queue with arrival rate $\lambda$ and service rate 1.0. Thus for the input queue to be stable, $\lambda$ should be less than the reciprocal of the mean sojourn time of an M/M/1 queue with arrival rate $\lambda$ and service rate 1.0.
This yields the condition $\lambda < 1 - \lambda$, or $\lambda < 0.5$, for stable queues at the input.
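The condition can be written out explicitly. Here $\bar{X}$ is used, as an assumption of this note, for the mean service time of the input queue, which in the limit equals the mean M/M/1 sojourn time with unit service rate:

```latex
\lambda \,\bar{X} < 1, \qquad \bar{X} = \frac{1}{1-\lambda}
\quad\Longrightarrow\quad \frac{\lambda}{1-\lambda} < 1
\quad\Longrightarrow\quad \lambda < \tfrac{1}{2}.
```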
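The maximum arrival rate that input port $i$ can support is obtained by solving for $\lambda_i$ in $\lambda_i \bar{X}_i = 1.0$, where $\bar{X}_i$ is the mean service time (time spent at the HOL) at input $i$.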
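For a symmetric $N \times N$ switch with uniform routing and unit line rates, this saturation point can be located numerically. The sketch below is illustrative only: it assumes the M/M/1/N virtual queue approximation, and the function names and the nested bisection are choices made for this example.

```python
def mean_hol_time(lam, size, mu=1.0):
    """Mean HOL time when each M/M/1/size virtual queue carries rate `lam`."""
    def probs(rho):
        weights = [rho ** k for k in range(size + 1)]
        total = sum(weights)
        return [w / total for w in weights]

    # Inner bisection: effective load whose carried rate mu*(1 - q(0)) is lam.
    lo, hi = 0.0, 1.0
    while mu * (1.0 - probs(hi)[0]) < lam:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mu * (1.0 - probs(mid)[0]) < lam:
            lo = mid
        else:
            hi = mid
    q = probs(0.5 * (lo + hi))
    pi = [q[k] / (1.0 - q[size]) for k in range(size)]
    return (sum(k * p for k, p in enumerate(pi)) + 1.0) / mu


def max_arrival_rate(size, mu=1.0, tol=1e-10):
    """Outer bisection for lambda satisfying lambda * mean_hol_time = 1."""
    if size < 2:
        raise ValueError("the condition has no root below mu for size < 2")
    lo, hi = 0.0, 0.99 * mu
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * mean_hol_time(mid, size, mu) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions a 2 x 2 switch saturates at a per-port load of 2/3, and the value falls towards 0.5 as the switch grows.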
Consider the special case of an $N \times N$ switch with $p_{ij} = 1/N$ for all $i, j$; $\lambda_i = \lambda$ and $\mu_j = 1.0$ for all $j$. Figure 3 shows the total delay and the blocking delay for various values for all ...

Having modeled switch behavior under the somewhat idealized model of Poisson inputs, we will now examine the behavior under a more realistic model of self similar inputs. Before presenting the delay analyses for self similar arrival processes, we give a brief overview of the various equivalent definitions of self similarity and the packet arrival models that can be used with each of these. Finally, we will select the self similar packet arrival model that has a well developed queueing theory.
Packet arrival instants are modeled as point processes. Divide the time axis into nonoverlapping intervals of unit length and let $X = \{X_t : t = 0, 1, 2, \ldots\}$ be the number of points (packet arrivals) in the $t$th interval. Measurements and analysis of such packet arrival processes in real networks have indicated that $X$ is a self similar process. This means that although analysis of packet switches for the Poisson packet arrival model gives us a "first-order feel" for their performance, to understand their performance in real networks it is necessary to study their performance for self similar packet arrivals.
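As a small illustration of this construction, the counting process can be obtained from a list of arrival instants by binning them into unit intervals. The function name and the sample instants are assumptions made for this example.

```python
import math


def interval_counts(arrival_times, num_intervals):
    """Return X_t, the number of arrivals in [t, t + 1), for t = 0..num_intervals-1."""
    counts = [0] * num_intervals
    for instant in arrival_times:
        t = math.floor(instant)
        if 0 <= t < num_intervals:  # ignore instants outside the observed window
            counts[t] += 1
    return counts


# Hypothetical arrival instants (arbitrary time units).
counts = interval_counts([0.2, 0.9, 1.5, 3.1, 3.2, 3.8], 5)
```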
Mathematically, self similarity in the process $X$ can be expressed in many ways. Let $X$ be covariance stationary with mean $\lambda$, variance $\sigma^2$ and autocorrelation function $r(k)$, $k \geq 0$. ...

Each of the above descriptions of a self similar process can lead to a class of models for the packet arrival process. From the point of view of understanding the queuing behavior of systems, we consider those that are derived to match the LRD statistics of the packet arrival process. In [10], Leland et al. show that Gaussian noise or nonlinear transformations on Gaussian noise, such as fractional ARIMA, can be used to characterise an LRD $X$. In [19], Paxson and Floyd show that a superposition of on/off sources that have a fixed rate in the on period and a heavy-tailed distribution for the on and off period lengths can be used to model an LRD $X$. Erramilli ...

As in the previous section, we assume that each packet at input $i$ chooses output $j$ independently of other packets with probability $p_{ij}$, and that the rate at which a packet is evacuated from an input queue by output port $j$ is $\mu_j$, the line rate at output port $j$. Packet lengths are exponentially distributed with unit mean. There are infinite buffers at the input and none at the output. The output ports evacuate packets from the HOL of the input queues according to the "first blocked first served" discipline. The "service time" of the input queue, the time spent at the HOL by a packet, is obtained exactly as before by making the approximation that the virtual queue to each output is an M/M/1/M queue. The sojourn time in this M/M/1/M queue is thus the service time for the input queue, which we can now model as an MMPP/G/1 queue. Since the service time for the input queue is as before, the maximum throughput per port will be 0.5 and is derived exactly as before. Thus the moments of the service times are obtained exactly as in the previous section using Eqns 1-7.
The first and second moments of the packet delays in the input queue can now be obtained using well known techniques for MMPP/G/1 queues [6]. The procedure is summarised in the appendix.
Numerical results are obtained as follows. We use the Bellcore traces [10] and derive their statistical properties in terms of the Hurst parameter, the correlation at lag 1 and the time scales over which the burstiness occurs. These parameters and the arrival rate $\lambda$ are used to fit the parameters $c_{1j}$, $c_{2j}$ and $r_j$, for $j = 1, \ldots, 4$, of the MMPP model described in [1]. The analytical results are obtained for the MMPP/G/1 queue as described earlier. To validate the analytical results we also develop a simulation model in which the arrivals are MMPP with the parameters derived above. The arrival process generator is validated by simulating a single server queue and comparing with the results given in [4]. The magnitudes of our delays and the knee region of the delay-throughput graph match those given in Figure 2 of [4]. In the simulation model a separate and independent MMPP arrival process generator is used for each of the input ports, with the traces generated by each of the sources having identical statistical properties. Thus, statistically identical self-similar traces but with different sample paths are used as the input processes to the simulation model. In this paper we primarily use the Bellcore traces pAug.TL ($H = 0.82$ and $\rho = 0.582$) and pOct.TL ($H = 0.92$ and $\rho = 0.356$). We model burstiness over 4 time scales.
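A per-port arrival generator of this kind can be sketched as a superposition of independent two-state MMPP sources, one per time scale. The sketch is illustrative only: the rates, the factor of 10 between time scales and all function names are hypothetical, and the parameters are not the fitted values of the model in [1].

```python
import random


def two_state_mmpp(rate_on, rate_off, to_off, to_on, horizon, rng):
    """Arrival instants of a two-state MMPP on [0, horizon).

    rate_on, rate_off : Poisson arrival rate while in the 'on' / 'off' state
    to_off, to_on     : transition rates out of the 'on' / 'off' state (> 0)
    """
    t, on, arrivals = 0.0, True, []
    while t < horizon:
        leave = to_off if on else to_on
        rate = rate_on if on else rate_off
        end = min(t + rng.expovariate(leave), horizon)  # end of this sojourn
        if rate > 0.0:
            instant = t + rng.expovariate(rate)
            while instant < end:  # Poisson arrivals within the sojourn
                arrivals.append(instant)
                instant += rng.expovariate(rate)
        t, on = end, not on
    return arrivals


def port_trace(seed, horizon=1000.0):
    """Superpose four on/off sources whose time scales differ by factors of 10."""
    rng = random.Random(seed)  # an independent generator for each input port
    sources = [
        two_state_mmpp(2.0, 0.0, 10.0 ** -s, 10.0 ** -s, horizon, rng)
        for s in range(4)
    ]
    return sorted(instant for src in sources for instant in src)


# Different seeds give different sample paths with the same statistics.
trace_port0 = port_trace(0)
trace_port1 = port_trace(1)
```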
We mention here that we considered feeding the traces to obtain the simulation results. Since the number of inputs was large, the size of the traces was insufficient. The same trace cannot be fed to all the inputs because in that case the arrivals at each input will have a correlation of one, an obviously wrong choice for an arrival process. Also, we did not use shuffled versions of a single trace because shuffling of the time series of the traces would lead to a loss of the correlation structure and consequently the long range dependence.
In Figures 4-7 we show the first and second moments of the total and blocking delays in the switch. It can be seen that the simulation and analytical results are in extremely good agreement except at loads close to the capacity of the switch. We see a marked difference in the shape of the delay characteristics for the pOct.TL trace at low loads, which can be attributed to its comparatively low correlation value at lag one. At low loads, the low correlation suggests a lower probability of successive intervals having packet arrivals, which in turn leads to low delays.
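The correlation at lag one referred to here can be estimated from a series of per-interval counts with the usual sample autocorrelation. The following is a minimal sketch; the function name and the two toy series are assumptions of this example, and a constant series (zero variance) is not handled.

```python
def lag_autocorrelation(counts, lag=1):
    """Sample autocorrelation of a count series at the given lag."""
    n = len(counts)
    mean = sum(counts) / n
    variance_sum = sum((x - mean) ** 2 for x in counts)
    covariance_sum = sum(
        (counts[t] - mean) * (counts[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


# Arrivals clustered into a burst versus arrivals that alternate.
bursty = lag_autocorrelation([1, 1, 1, 1, 0, 0, 0, 0])
alternating = lag_autocorrelation([0, 1, 0, 1, 0, 1, 0, 1])
```

The clustered series has a positive lag-one correlation and the alternating series a negative one, which matches the intuition that low correlation makes arrivals in successive intervals less likely.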
Further investigation of the effect of the correlation structure is done in Section III-B. As discussed earlier, the throughput-delay curves in Figure 4 show that the switch saturates at a load of 0.5. Also, note that the first and second moments of the blocking delay shown in Figures 6 and 7 are identical for both the traces for a given switch size. This is because the virtual queue at each output port is modeled as an M/M/1/M queue whose delay characteristics depend only on the average arrival rate of the input processes and not on any of their other statistical properties.
From Figure 4 we see that the mean delay increases exponentially with the arrival rate. The delay performance can be divided into three regions: low (0.0 - 0.10), medium (0.10 - 0.40) and high (0.40 - 0.50) loads. Note that in the medium load region the mean delay is of the order of $10^3$. In all these regions the mean delay increases exponentially with increasing arrival rate. For comparison, we have shown the delays that would have been experienced in a single server queue without HOL blocking. This would be the delay experienced in an output-queued switch in which the arrival process to an output port would be described by the corresponding MMPP process. This shows that for a given arrival rate the mean delay in the input-queued switch could be at least double and nearly 10 times higher even at medium load.
The moments of the blocking delay for the case of Poisson arrivals and that of MMPP arrivals are identical in the analytical models. Comparisons with the simulation model suggest that the analytical models are a good approximation. Hence we note that the effect of the increase in the second moment in the case of self similar arrivals is significantly larger.
We have performed extensive analysis and simulations to understand the switch behavior under self similar arrivals and we have observed that when the burstiness extends over 3 time scales, the delays are of the order of $10^2$.
From the above results we note that the analytical results match the simulations reasonably well. Therefore, in the following we do not present any simulation results.
A. Evacuating Multiple Packets in Parallel to an Output
To increase the throughput and reduce the delay through the switch, we could introduce parallelism by increasing the link multiplicity to an output port, similar to the discrete time switch described by Oie et al. in [17]. Note that this will require queuing at the output too.
It is easy to see that in this case if there are more than $m$ HOL packets at the inputs destined for a particular output port, $m$ of them are served simultaneously while the others are blocked. Here too we assume the input process to the queue to be Poisson, which is an approximation when $M$ is finite. Thus the virtual queue of each output port will be modeled as an M/M/m/M queue and the effective arrival rate to output port $j$ corresponding ... Only when there are $m$ or more packets waiting in the virtual queue will the packet at the HOL of an input queue have to wait. We can now use the expressions for the average delay and its second moment as given in Eqn 16 to obtain the latency for the arriving packets. Figure 8 shows the analytical results for the delay-throughput characteristics for $N \times N$ switches with $N = 8, 16, 32$ and $64$ for speedup factors of 2 and 4. We assume identical loads on all the inputs and uniform routing probabilities $p_{ij}$. We see that the effect of the switch size on the delay characteristics becomes negligible as the switch size increases. Also, the medium load region can be extended to an arrival rate of 0.75 for a speedup factor of 2 and up to 0.85 for a speedup factor of 4. Further, the mean delay is considerably lower with speedup than without.
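The M/M/m/M virtual queue can be sketched with the standard birth-death solution. In the illustrative code below the effective arrival rate is taken as a given input (named `offered`, an assumption of this example), and the function names are hypothetical; the second function evaluates the probability that a packet reaching the HOL finds $m$ or more packets ahead of it and therefore has to wait.

```python
def mmm_finite_probs(offered, mu, servers, size):
    """Stationary probabilities q(0..size) of an M/M/servers/size queue."""
    load = offered / mu
    weights = [1.0]
    for k in range(1, size + 1):
        # Birth rate `offered`, death rate min(k, servers) * mu.
        weights.append(weights[-1] * load / min(k, servers))
    total = sum(weights)
    return [w / total for w in weights]


def prob_hol_wait(offered, mu, servers, size):
    """Probability that an admitted packet sees `servers` or more packets ahead."""
    q = mmm_finite_probs(offered, mu, servers, size)
    seen = [q[k] / (1.0 - q[size]) for k in range(size)]  # condition on admission
    return sum(seen[servers:])
```

With `servers = 1` this reduces to the M/M/1/M case; at a fixed offered rate, raising `servers` lowers the probability of waiting at the HOL.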
Also, the steep rise in the mean delay in the low load region does not manifest in the speeded up switch.
The maximum throughput for a given speedup factor is obtained by solving for $\lambda$ in $\lambda \bar{X} = 1.0$, where $\bar{X}$ is obtained from Eqn 14. Table I shows the maximum achievable throughputs for switches of various sizes and for speedup factors of 2, 3 and 4.
Note that a switch with a speedup factor of 4 can support loads in excess of 99%.
B. Effect of Asymmetries in Traffic
Recall that the parameters characterizing the input process are $H$, the Hurst parameter; $\rho$, the correlation at lag 1; and $n$, the number of time scales over which burstiness occurs. In addition there are the routing probabilities $p_{ij}$ that can generate hotspots on some outputs. In this section we examine the effect ...
As $\gamma$ increases, the contention for the hotspot output port $h$ increases and hence the blocking delay for these packets at the head of their input queues increases. The increased blocking delay increases the "input service time" and hence the total delay of all the packets. In Figure 9 we show the effect of this hotspot for $\gamma = 2$. As is evident from the figures, there is a marked rise in the average delays in the presence of hotspots and a considerable reduction in the maximum achievable throughput.

In our analytical model, asymmetry in the correlation or the Hurst parameter of the traffic at the input ports does not affect the delay performance of the other ports as long as the arrival rate remains constant. This is because the "service time" for a port depends on the blocking delay, and the only factors affecting the blocking delay at the ports are the arrival rates into the virtual queues of the outputs. Thus the "service times" at all the ports in the presence of parameter asymmetries are the same. Hence, if the arrival rates are the same, differences in $H$, $\rho$ and $n$ do not have any effect on the "service times" of the other ports. However, the total delay at a port will depend on the traffic characteristics at that input port.
C. Effect of $\rho$ and $H$ on Total Delay
Now let us consider the effect of the correlation structure of the arrival process at each input on the delay-throughput characteristics. Figure 10 shows the effect of variation of the correlation on the delay characteristics. The three curves correspond to the case when the input processes have the same Hurst parameter ($H = 0.82$) and arrival rate but correlations at lag one of 0.532, 0.582 and 0.632. Each input port of the switch is fed with traces having the same parameters. Observe that the delay decreases substantially with lower correlations. This is due to the reduced probability of successive time units having packet arrivals, which reduces the queuing at the inputs. Finally we study the effect of variation in the Hurst parameter. As in the previous case, we vary the Hurst parameter of the input streams keeping all other parameters constant. The delay-throughput characteristics for the cases when the input streams at each port have Hurst parameters of 0.77, 0.82 and 0.87, for a correlation at lag one of 0.582, are shown in Figure 11. As before, each input port is fed with traces having the same statistical properties. Note that the delays decrease significantly with even a slight reduction in the Hurst parameter. This can be explained by the fact that a lower $H$ reduces the long range dependence and the burstiness, thereby reducing the queue buildups at the inputs.
IV. CONCLUSION
In this paper, we have presented a generalized analytical model for an input queued, variable length packet switch. Although we have presented the analysis for switches with infinite input buffers our model can easily be extended to analyse finite buffer switches.
In [7] it was conjectured that FCFS service in the virtual output queue gives the least average delay. Our analysis easily confirms this because FCFS service has the least variance, and this is the variance of the "service time" of the input queue, which is an M/G/1 queue. It is well known that for an M/G/1 queue the variance of the service time, in addition to the mean, contributes to the average delay. Also, from our models it is clear that the conjecture in [7] that the performance of an $M \times N$ switch is symmetric in $M$ and $N$ is not true.
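The dependence of the M/G/1 delay on the variance of the service time is captured by the Pollaczek-Khinchine formula for the mean waiting time, $W = \lambda E[S^2] / (2(1 - \lambda E[S]))$. The short sketch below evaluates it for two service time distributions with the same mean; the function name and the numerical values are illustrative assumptions.

```python
def mg1_mean_wait(lam, mean_service, second_moment_service):
    """Pollaczek-Khinchine mean waiting time in queue for an M/G/1 queue."""
    utilization = lam * mean_service
    if utilization >= 1.0:
        raise ValueError("queue is unstable: lam * mean_service must be < 1")
    return lam * second_moment_service / (2.0 * (1.0 - utilization))


# Same mean service time (1.0), different variance, at arrival rate 0.5.
wait_deterministic = mg1_mean_wait(0.5, 1.0, 1.0)  # variance 0, E[S^2] = 1
wait_exponential = mg1_mean_wait(0.5, 1.0, 2.0)    # variance 1, E[S^2] = 2
```

At the same load the exponential service time doubles the mean wait relative to the deterministic one, so a discipline that lowers the service time variance lowers the average delay.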
From the throughput-delay characteristics of Figures 4, 10 and 11 we see that capturing all the statistical properties of the arrival processes is essential to characterizing the switch performance. Another important result to note is that operation in continuous time limits the maximum achievable throughput to 0.5, though, with a speedup factor of 4, the achievable throughput can be increased to more than 99%. Severe performance degradation takes place in the presence of hotspots, which can reduce the maximum throughput by 15% in a 16 x 16 switch. Also, Figures 10 and 11 highlight the large variations in the delay characteristics with changes in the correlation structure and the Hurst parameter. Lower Hurst parameters and correlation values reduce the burstiness of the arrival streams and the queuing effects at the inputs, and can give significantly lower delays at low loads. Thus the correlation structure and the Hurst parameter of the arrival processes are of extreme importance in determining the overall switch performance.
The model for variable length packet queues that we have developed here can easily be extended to consider priorities in the input queue. Extending it to analyse finite input buffer queues is also rather straightforward, and we do not present it due to lack of space. This can be done by considering the input queue as an M/G/1/K queue with the service time described by Eqn 5 for the case of Poisson arrivals, using results from [3], [9], [16], and as an MMPP/G/1/K queue for the case of self similar arrivals modeled as an MMPP process, using results from [2].
Finally, we add that our model does not address many architectures for variable length packet switches that are being considered today, specifically the virtual output queued (VOQ) switches. The buffer complexity of a VOQ switch is the same as ...

The $\gamma_n$ for the service time distribution, which is the summation of the phase-type distribution with Erlang-$k$ service times and an exponential evacuation time, is given by the weighted sum of the individual $\gamma_n$ values. The weights are the probabilities of encountering each of the individual distributions, the $\pi_j(k)$s.
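The weighted sum can be sketched as follows, under an assumption that is not stated in the text above: that $\gamma_n$ denotes the probability of $n$ events of a Poisson process of rate $\theta$ occurring during one service time. With that definition, an Erlang service time with stage rate $\mu$ gives a negative binomial $\gamma_n$, and $k$ blocking stages plus one evacuation stage at the same rate form an Erlang-$(k+1)$ time. The function names and the numerical values are illustrative.

```python
from math import comb


def gamma_erlang(n, stages, mu, theta):
    """P(n Poisson(theta) events during an Erlang time with `stages` stages of rate mu)."""
    p = mu / (mu + theta)  # probability that a stage ends before the next event
    return comb(n + stages - 1, n) * p ** stages * (1.0 - p) ** n


def gamma_hol(n, pi, mu, theta):
    """Weight the Erlang-(k + 1) values by pi[k], the probability of k packets ahead."""
    return sum(w * gamma_erlang(n, k + 1, mu, theta) for k, w in enumerate(pi))


# Hypothetical weights for 0, 1 or 2 packets ahead in the virtual queue.
example_pi = [0.5, 0.3, 0.2]
gammas = [gamma_hol(n, example_pi, 1.0, 1.0) for n in range(200)]
```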
C. The MMPP/G/1 algorithm
Step 1. Compute the matrix G for the given input port.
Step 2 ... where $n^*$ is chosen such that $\sum_{k=1}^{n^*} \gamma_k > 1 - \epsilon_1$, $\epsilon_1 \ll 1$. Set Gi = Gf".
Recursion
B. Computation of $\gamma_n$
The $\gamma_n$ for Erlang-$k$ and exponential service times are given ...

Step 3. Compute the moments of the waiting time using Eqn 16.
