Abstract
Introduction
Switch fabrics that support the ATM standard have traditionally been analyzed and simulated using Poisson or short-range dependent bursty traffic sources. In [8] it was shown that traffic from a digital source (in this case, Ethernet) is statistically self-similar. In other words, bursts that occur over short time periods are likely to be accompanied by swells of heavy traffic over larger periods of time. Furthermore, if the traffic is very bursty during a given time period, it is likely that the traffic will be bursty in the future. This is in contrast to Poisson-based traffic, which evens out over large periods of time. As a result, a physical switch may experience much more cell loss than a Poisson-based simulation predicts.
In addition to Ethernet traffic, self-similarity has been found in TCP arrival processes [10] and variable bit rate video streams [1], to name a few. One explanation given for this stems from the observation that the degree of self-similarity increases with an increasing aggregation of traffic sources. This implies that when the load on the network is high, the degree of self-similarity is likely to be large as well.
Several analytical studies of an ATM output queue with self-similar input can be found in the literature. Diamond and Alfa [2] have found a matrix geometric solution for the queue length distribution in a single ATM output queue. Georganas and Fan [3] obtained the upper and lower bounds of the buffer overflow probability in an ATM switch with output queueing.
Fong and Singh [4] used a shared queue simulation to compare the effects of self-similar versus bursty traffic, and found that load balancing and output address correlation issues had a more significant impact on cell loss than did the self-similarity of the process. In this case, the self-similar process was modeled by a fast Fourier transform approximation proposed by Paxson in [11]. However, the question remains as to the relative effects of different models of self-similarity on cell loss.
This study further investigates the effect of self-similar processes on cell loss by simulating a shared/output queueing system with Pareto distributed interarrival times and with a Poisson-Zeta ON-OFF model, the processes used in [2] and [3], respectively. Additionally, the results from these self-similar processes are compared to the cell loss statistics from a Poisson interarrival process and from geometrically distributed ON-OFF input.
We simulate a queueing model in which a set of input ports address logical queues in a shared buffer stage, which in turn feeds into an output buffer stage.
This model is different from a single output stage switch fabric in that a given logical buffer in the shared queue can grow to accommodate a heavy load at the expense of the buffer space in the remaining logical buffers. Furthermore, the model may feature internal speedup between the shared and output queue stages, which allows more than one cell in a slot time to leave a logical buffer in the shared queue stage.
The simulation results were verified using analytical results from [2] and [3] in a single buffer configuration. Then, the simulation was configured as a 4-input, 4-output system and studied for its cell loss characteristics for varying buffer sizes and degrees of speedup.
The main body of this paper begins in Section 2 with formal definitions and statistical characteristics of self-similar processes, followed by a description of the data traffic generation methods used by the simulation in Section 3. Section 4 gives a description of the simulator itself, and Section 5 gives the results obtained from the simulator.
Self-similar Stochastic Processes
This section reviews the definition of a self-similar process in terms of its variance and autocorrelation, and how it is characterized. Section 2.3 discusses self-similarity as described by a heavy-tailed distribution.
Discrete-time Definition
To look at the behavior of a stationary time series $X$ over different time scales, the m-aggregated time series $X^{(m)} = \{X_k^{(m)},\ k = 0, 1, 2, \ldots\}$ is defined by averaging $X$ over non-overlapping blocks of size $m$. This can be expressed as
$$X_k^{(m)} = \frac{1}{m} \sum_{i=km-(m-1)}^{km} X_i$$
If the process has the same statistical properties at all values of $m$ (all aggregations), then that process is self-similar.
Self-similarity for a process is defined in terms of its variance $\mathrm{Var}[X(t)]$ and autocorrelation $R(t_1, t_2)$ [8]:
A process $X$ is exactly self-similar with parameter $\beta$ ($0 < \beta < 1$) if for all $m = 1, 2, \ldots$,
$$\mathrm{Var}[X^{(m)}] = \frac{\mathrm{Var}[X]}{m^{\beta}} \qquad (1)$$
$$R_{X^{(m)}}(k) = R_X(k) \qquad (2)$$
In many cases, a weaker definition is needed: A process $X$ is asymptotically self-similar with parameter $\beta$ ($0 < \beta < 1$) if for all $k$ large enough,
$$\mathrm{Var}[X^{(m)}] = \frac{\mathrm{Var}[X]}{m^{\beta}}, \qquad R_{X^{(m)}}(k) \to R_X(k) \quad \text{as } m \to \infty$$
The variance of a self-similar process decreases proportional to $m^{-\beta}$ as $m$ approaches infinity. Equation 2 shows that the autocorrelation of the aggregated process has the same form as the original one, which suggests that the degree of variability is the same at all time resolutions.
The variable $H = 1 - \beta/2$, $0 < \beta < 1$, is known as the Hurst parameter, and gives the degree of self-similarity of a process. When $H = 0.5$, self-similarity does not exist. The degree of self-similarity increases as $H$ approaches one.
Long-range Dependence
An important aspect of a self-similar process is that it is long-range dependent [8], that is,
$$R(k) \sim k^{-\beta} L(k) \quad \text{as } k \to \infty$$
where $L(k)$ is a slowly varying function. Such a property shows that the autocorrelation function decays hyperbolically as the time distance increases and implies $\sum_k R(k) = \infty$.
The non-summability means that small, high-lag correlations have a significant effect on the behavior of the process and is in contrast to a short-range dependent process, whose autocorrelation function decays exponentially.
An equivalent way of describing long-range dependence is through the frequency domain. This approach is useful for establishing self-similarity in empirical data.
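The aggregation and variance scaling used in this section can be illustrated with a minimal Python sketch. It assumes that the variance of the m-aggregated series decays as $m^{-\beta}$ and that $H = 1 - \beta/2$; the function names and the least-squares fit on a log-log scale are illustrative choices, not part of the original study. Blocks are indexed from zero here for convenience.

```python
import math
import random

def aggregate(series, m):
    """Average the series over non-overlapping blocks of size m."""
    n_blocks = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def estimate_beta(series, block_sizes):
    """Negated least-squares slope of log Var[X^(m)] against log m."""
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(variance(aggregate(series, m))) for m in block_sizes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return -num / den

# Independent samples are not self-similar, so beta should be near 1
# and the implied Hurst parameter near 0.5.
rng = random.Random(1)
iid_series = [rng.random() for _ in range(100_000)]
beta = estimate_beta(iid_series, [1, 2, 4, 8, 16, 32, 64])
hurst = 1 - beta / 2
```

For a long-range dependent trace the fitted slope would be shallower, giving a value of beta below 1 and a Hurst parameter above 0.5.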
Heavy-tailed Distributions
Equations 1 and 2 describe self-similarity in terms of aggregated time series and long-range dependence. To develop queueing models with self-similar input, it is useful to have an interarrival time probability distribution that is self-similar. We use a heavy-tailed distribution to characterize probability densities relating to interarrival times and burst lengths.
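The distribution of a random variable $X$ is said to be heavy-tailed if its tail decays as a power law, that is, $P[X > x] \sim c\,x^{-\theta}$ as $x \to \infty$, with $c > 0$ and $\theta > 0$.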
The Pareto distribution is the simplest heavy-tailed distribution [2]. Its density and distribution functions, $f(x)$ and $F(x)$, respectively, have parameters $\epsilon$ and $\theta$ ($\epsilon, \theta > 0$), and are given by:
$$f(x) = \frac{\theta\,\epsilon^{\theta}}{x^{\theta+1}}, \qquad F(x) = 1 - \left(\frac{\epsilon}{x}\right)^{\theta}, \qquad x \ge \epsilon$$
The parameter $\epsilon$ is the smallest time value that can be assigned to $X$. The parameter $\theta$ determines the mean and variance of $X$. For $1 < \theta \le 2$, the distribution has an infinite variance and a finite mean.
Diamond and Alfa show [2] that since the autocorrelation and variance-time curve for a heavy-tailed process with finite mean interarrival time are asymptotically hyperbolic for large times, this distribution is asymptotically self-similar with $\beta = \theta - 1$ according to Equations 1 and 2.
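As a short worked example of how the parameters relate, assume the Hurst parameter satisfies $H = 1 - \beta/2$ and the Pareto shape parameter satisfies $\beta = \theta - 1$. The value $\theta = 1.4$ below is chosen purely for illustration.

```latex
\beta = \theta - 1, \qquad
H = 1 - \frac{\beta}{2} = \frac{3 - \theta}{2};
\qquad
\theta = 1.4 \;\Rightarrow\; \beta = 0.4,\quad H = 0.8 .
```

Under these assumptions, shape parameters closer to 1 correspond to Hurst parameters closer to 1, that is, to a higher degree of self-similarity.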
Methods of Generating Data Traffic
One of the significant tasks of simulating the ATM switch queueing model is the generation of interarrival times from a self-similar random process. The challenge is two-fold: One is to use an appropriate interarrival distribution that accurately captures the self-similar behavior of the input and the second is to have a distribution from which it is relatively easy to generate random variates. In this section we describe some of the processes used in the simulation.
For those processes where the cumulative distribution function (c.d.f.) is invertible, we use the inverse transform method [7] to generate the corresponding random variate. If $X$ is a random variable with c.d.f. $F(x) = P(X \le x)$, the method requires the following two steps:
1. Generate $U$ from a uniform probability distribution on (0, 1).
2. Return $X = F^{-1}(U)$.
Self-similar Traffic
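The Pareto distribution has a c.d.f. given by
$$F(x) = 1 - \left(\frac{\epsilon}{x}\right)^{\theta}, \qquad x \ge \epsilon$$
The inverse transform method gives a variate of this distribution as:
$$X = \frac{\epsilon}{(1 - U)^{1/\theta}} \qquad (6)$$
where $\epsilon = \frac{\theta - 1}{\theta \lambda}$ and $\lambda$ is the arrival rate. Using Equation 6, interarrival times based on the Pareto function may be generated using the input parameters $\lambda$ and $\theta$ (the degree of self-similarity).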
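The inverse transform step for Pareto interarrival times can be sketched in Python as follows. The sketch assumes the c.d.f. $F(x) = 1 - (\epsilon/x)^{\theta}$ and assumes that $\epsilon$ is chosen so that the mean interarrival time equals $1/\lambda$; the function name and the sample parameters are hypothetical.

```python
import random

def pareto_interarrival(rate, theta, rng):
    """Draw one Pareto interarrival time by inverting F(x) = 1 - (eps/x)**theta."""
    # Assumption: eps is set so that the mean theta*eps/(theta-1) equals 1/rate.
    eps = (theta - 1.0) / (theta * rate)
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return eps / (1.0 - u) ** (1.0 / theta)

rng = random.Random(42)
samples = [pareto_interarrival(0.5, 1.5, rng) for _ in range(10_000)]
smallest_possible = (1.5 - 1.0) / (1.5 * 0.5)
# Median of this Pareto distribution: eps * 2**(1/theta).
median = smallest_possible * 2.0 ** (1.0 / 1.5)
fraction_below_median = sum(s <= median for s in samples) / len(samples)
```

Because the variance is infinite for shape parameters of 2 or less, the sample mean converges slowly; the median is a more stable check on the generator than the mean.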
Some traffic models are based on a so-called ON-OFF source, which generates one cell per time unit during the ON, or active period, and no cells during the OFF, or idle period. Bursty traffic can be modeled in this way, where the length of a burst is the length of the ON period. Heavy-tailed traffic may be generated by making the distribution of the burst length self-similar [3].
Since the length of a burst is discrete, the zeta distribution is employed to give the heavy-tailed characteristic to the output. The zeta distribution is the discrete counterpart to the Pareto distribution:
$$P(L = l) = K\,l^{-(p+1)} \qquad (7)$$
where $l$ ($l = 1, 2, \ldots$) is the length of the burst, the parameter $p$ ($1 < p < 2$) is related to the Hurst parameter ($p = 3 - 2H$), and $K$ is the normalizing constant.
Since the c.d.f. of this random variable is not available in closed form, an iterative method is used to generate variates of this type. If $G(N) = P(X \le N)$, then we generate the variate $N$ satisfying $G(N-1) < U \le G(N)$, which results in a table look-up scheme. To generate a discrete burst length, the random value $U \sim U(0,1)$ is used to look up a value for $l$.
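The table look-up scheme can be sketched in Python as below. This is an illustration under two assumptions that are not taken from the original study: the burst-length pmf is proportional to $l^{-(p+1)}$, and the table is truncated at a maximum length `max_len` so that it is finite. The function names are hypothetical.

```python
import bisect
import random

def build_zeta_table(p, max_len):
    """Cumulative table G(N) = P(L <= N) for a pmf proportional to l**-(p + 1).

    Truncating at max_len is an assumption made to keep the table finite.
    """
    weights = [l ** -(p + 1.0) for l in range(1, max_len + 1)]
    total = sum(weights)  # 1/K for the truncated distribution
    table, acc = [], 0.0
    for w in weights:
        acc += w / total
        table.append(acc)
    table[-1] = 1.0  # guard against floating-point round-off
    return table

def zeta_burst_length(table, rng):
    """Return the N with G(N-1) < U <= G(N) for a uniform U."""
    u = rng.random()
    return bisect.bisect_left(table, u) + 1

rng = random.Random(3)
table = build_zeta_table(1.4, 1000)
bursts = [zeta_burst_length(table, rng) for _ in range(20_000)]
fraction_of_ones = bursts.count(1) / len(bursts)
```

A binary search over the cumulative table keeps each draw at logarithmic cost in the table size, which matters when millions of bursts are generated.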
Poisson-Based Traffic
Traditionally, source traffic has been modeled using various Poisson-based functions. Two implementations are used in this study for comparison.
Poisson interarrival times lack both the long-range dependence of self-similar traffic and the short-range dependence of correlated bursty traffic, but because this distribution can greatly simplify many queueing problems, it is the function of choice for many analytical studies [9], [12].
Poisson trace generation is done using the process described above. If the arrival process is Poisson with rate $\lambda$, then the interarrival times are exponentially distributed with the c.d.f. $F(x) = 1 - e^{-\lambda x}$ [7]. Using the inverse transform method, we generate an interarrival time as:
$$X = -\frac{1}{\lambda}\ln(1 - U)$$
A simple way to model bursty, short-range dependent traffic is to use an ON-OFF process similar to the Poisson-Zeta process described above [5]. This process can be viewed as a state machine with an idle state and an active state. The lengths of the active and idle periods are independent geometrically distributed random variables, with parameters $t_{10}$ and $t_{01}$, respectively, where the mean active period $1/t_{10}$ equals $L$, the average burst length. The parameter $t_{10}$ is the probability of changing from an active period to an idle period, and $t_{01}$ is the probability of a transition from idle to active. Hence, the pmf functions are, for $k \ge 1$:
$$P(L_{active} = k) = t_{10}(1 - t_{10})^{k-1}, \qquad P(L_{idle} = k) = t_{01}(1 - t_{01})^{k-1}$$
where $L_{active}$ and $L_{idle}$ are the random active and idle periods, respectively. Using the inverse transform method, the active and idle periods are generated as:
$$L_{active} = \left\lceil \frac{\ln U}{\ln(1 - t_{10})} \right\rceil, \qquad L_{idle} = \left\lceil \frac{\ln U}{\ln(1 - t_{01})} \right\rceil$$
Cell arrivals can be correlated by making each arrival in a burst address the same output.
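A geometric ON-OFF source with correlated addressing can be sketched in Python as follows. The sketch assumes geometric period lengths on {1, 2, ...} generated by the inverse transform method, and assumes that the destination of each burst is drawn uniformly at random; `t10`, `t01`, and the function names are illustrative.

```python
import math
import random

def geometric(prob, rng):
    """Geometric variate on {1, 2, ...} with P(k) = prob * (1 - prob)**(k - 1)."""
    if prob >= 1.0:
        return 1
    u = 1.0 - rng.random()  # uniform on (0, 1], which avoids log(0)
    return max(1, math.ceil(math.log(u) / math.log(1.0 - prob)))

def on_off_slots(t10, t01, n_outputs, n_slots, rng):
    """Return one entry per slot: the addressed output, or None when idle."""
    slots = []
    while len(slots) < n_slots:
        slots.extend([None] * geometric(t01, rng))  # idle period
        dest = rng.randrange(n_outputs)             # one output for the whole burst
        slots.extend([dest] * geometric(t10, rng))  # active period
    return slots[:n_slots]

rng = random.Random(7)
trace = on_off_slots(0.1, 0.1, 4, 50_000, rng)
utilization = sum(s is not None for s in trace) / len(trace)
```

With equal transition probabilities the source is active about half of the time; drawing a fresh destination per cell instead of per burst would give the uncorrelated variant.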
Description of the Simulation
The primary goal of this study is to examine the behavior of a multi-stage queueing ATM switch fabric under several types of self-similar input traffic. The two queueing stages used are a shared queue and a set of output queues.
We consider a two-stage, multiple queue/multiple server queueing system (see Figure 1), for which the interarrival times are independent, identically distributed random variables. Time is partitioned into slot times, which in an ATM switch is the time in between cell departures. If a cell arriving from one of the N input ports finds its destination server idle (shared queue), it must wait in that stage until the next slot boundary, and then wait for a full slot time to pass.
Figure 1: A two-stage, N-queue, N-server system (servers are part of the queue, for simplicity).
A cell arriving at a busy queue must wait until the other cells in the queue have exited, before departing. Depending on the parameters given to the system, 1 or more cells may depart from one first-stage queue at the end of each slot time (this is internal speedup, and is described in the next section).
Upon departing the first queueing stage, a cell must wait in the second stage, with the same definition of service time. The second stage releases 0 or 1 cells at the end of each slot time.
The system begins with no arrivals and no cells in the system. The simulation ends when a given number of traces have arrived at the system, and no cells are found in either queueing stage. The interconnects linking the inputs and the two queueing stages are assumed to be fully nonblocking and to exhibit zero delay.
The first stage is a single buffer which is "shared" by all of the inlets. The buffer is uniformly partitioned into logical output queues, one for each output in the second stage, as shown by the bold lines in Figure 1. The size of these logical buffers can change, so that if one buffer is experiencing a higher load than the others, it can expand in order to prevent cell loss while one or more of the other logical buffers temporarily shrinks. Thus, the individual logical buffer sizes are dynamic, but the sum of all logical buffer sizes remains constant.
This stage may have an output speedup K, that is, at each clock time it can eject up to K cells. In an ATM switch a speedup of K implies that the shared queue can process cells K times faster than the switch's clock speed.
The second stage of the system is a set of output queues, each of which may accept up to K cells during a slot time and emits 0 or 1 cells at the end of each slot time.
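The two-stage behavior described above can be illustrated with a simplified slot-by-slot Python sketch. It is not the event-driven simulator of this study: it assumes that all arrivals of a slot are admitted first, that output queues then emit at most one cell, and that each logical queue finally passes up to `speedup` cells to its output queue. All names are hypothetical.

```python
def simulate_switch(arrivals, n_outputs, shared_total, output_size, speedup):
    """arrivals holds, for each slot, a list of destination indices."""
    shared = [0] * n_outputs  # logical queue lengths within one shared buffer
    output = [0] * n_outputs  # second-stage queue lengths
    offered = lost_shared = lost_output = delivered = 0
    for slot_arrivals in arrivals:
        # Stage 1 admission: only the total shared occupancy is limited,
        # so one logical queue may grow at the expense of the others.
        for dest in slot_arrivals:
            offered += 1
            if sum(shared) < shared_total:
                shared[dest] += 1
            else:
                lost_shared += 1
        # Stage 2 departure: each output queue emits 0 or 1 cells.
        for dest in range(n_outputs):
            if output[dest] > 0:
                output[dest] -= 1
                delivered += 1
        # Transfer: each logical queue passes up to `speedup` cells on.
        for dest in range(n_outputs):
            moving = min(speedup, shared[dest])
            shared[dest] -= moving
            accepted = min(moving, output_size - output[dest])
            output[dest] += accepted
            lost_output += moving - accepted
    return {
        "offered": offered,
        "lost_shared": lost_shared,
        "lost_output": lost_output,
        "delivered": delivered,
        "in_system": sum(shared) + sum(output),
    }

result = simulate_switch([[0, 0, 0]], n_outputs=2, shared_total=2,
                         output_size=1, speedup=1)
```

In this small case three cells address output 0 in one slot: two fit in the shared buffer, one is lost there, and one of the admitted cells moves on to the output queue at the end of the slot.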
General Structure
The queueing system is implemented as an event simulation, where the state changes at arrivals to and departures from each queueing stage, and at the end of every slot time. The simulation begins with initialization, where all variables are set to appropriate values and buffers are allocated. The program then goes into a loop which terminates when the requisite number of traces have traveled through the system.
The loop consists of a timing routine and one of several event routines. The timing routine determines which event occurs next, based on the system state, and advances the system clock to the time of that event. The events modify the state of the system according to the type of event, and schedule the occurrence of future events (such as a cell arrival or departure).
The simulation operation is based on a set of counters which represent the size of the various queues in the system, as well as other aspects of the state of these queues (number entered, number lost, number of cells ready to depart, etc.). Each event function updates a subset of these counters, depending on which part of the system it is controlling.
The following statistics are kept for each queue in both of the two stages:
• The arrival rate, $\lambda$, is an input parameter to the simulation, and is compared with the measured arrival rate for each queue in the two stages, as determined by dividing the number of cells offered to the queue by the simulation run time, in number of slot times. This is used for maintaining internal accuracy.
• If an arriving cell (in either stage) is addressed to a full buffer, that cell is lost, and a counter for this occurrence is incremented. The probability of cell loss for a queue is calculated as the number of cells lost at that queue divided by the total number offered to that queue.
• The average queue length $\bar{Q}$ can be computed by accumulating the number of cells in the queue at time $t$, $Q(t)$, over the total run time $T$ of the simulation, then dividing the result by $T$ [7]:
$$\bar{Q} = \frac{1}{T}\int_0^T Q(t)\,dt$$
The integral is computed for a given queue by accumulating the number of cells in that queue multiplied by the time since the last event for that queue. Since the number in the queue is constant over this time, the result is an area under the graph of $Q(t)$ and is an accurate value for the integral.
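The accumulation of the area under $Q(t)$ between events can be sketched in Python as follows. The class and method names are hypothetical; the sketch only assumes that the queue length is constant between consecutive events of a queue.

```python
class QueueLengthStat:
    """Accumulates the integral of Q(t), which is piecewise constant between events."""

    def __init__(self):
        self.area = 0.0        # running value of the integral of Q(t)
        self.last_time = 0.0   # time of the most recent event for this queue
        self.length = 0        # queue length since the most recent event

    def update(self, now, new_length):
        """Record an event at time `now` that changes the length to `new_length`."""
        self.area += self.length * (now - self.last_time)
        self.last_time = now
        self.length = new_length

    def average(self, end_time):
        """Time-averaged queue length over [0, end_time]."""
        area = self.area + self.length * (end_time - self.last_time)
        return area / end_time

stat = QueueLengthStat()
stat.update(1.0, 2)   # empty for 1 time unit, then 2 cells
stat.update(3.0, 1)   # 2 cells for 2 time units, then 1 cell
average_length = stat.average(4.0)  # (0*1 + 2*2 + 1*1) / 4 = 1.25
```

The cell loss probability of a queue follows the same counter-based pattern: the number of cells lost at the queue divided by the number offered to it.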
Input Sources and Queueing Models Used
For the purpose of comparison, two self-similar sources, namely, Pareto interarrival times and Poisson-Zeta ON-OFF source, were supported.
A matrix geometric solution for the queue length distribution in a single ATM output queue was found by Diamond and Alfa [2]. This study considers a single buffer of an ATM switch, which is fed by a finite number of s input ports. Slot time T is partitioned uniformly into s units, and one input port may inject a cell during a given unit of time. The overall effect is that during a given slot time, the output buffer may receive up to s cells and release up to one cell. Interarrival times in this model have independent identical Pareto distributions.
The major difference between the simulation and the Pareto arrival process described here is that the simulation involves both a shared queue and multiple output queues, whereas the latter specifies a single output queue. Both account for multiple input ports.
To verify the simulation with the results of this analytical model, the simulation is given parameters to set it up for a single output queue and hence a single logical buffer in the shared buffer. Internal speedup is set to one, so that the single logical buffer in the shared queue emits one cell per slot time. Furthermore, the buffer sizes are set to a sufficiently high number that cell loss does not occur.
In another analytical study, Georganas and Fan [3] obtained the upper and lower bounds of the buffer overflow probability in an ATM switch with output queueing. Here, the number of bursts beginning at a discrete point in time is an independent random variable with a Poisson distribution, and the lengths of the bursts have independent identical Zeta distributions. The general form of the Zeta distribution is given in Equation 7.
The simulation in this study uses a single "virtual" input port, which can take on new bursts according to a Poisson distribution. Each new burst is assigned to a unique input port in the simulation, from a large pool of inputs initialized at the outset. The parameters to the Poisson and Zeta functions can then be scaled to be consistent with the analytical model.
Results
Data was gathered from the simulator to satisfy two inquiries: A comparison of the Pareto and the Poisson-Zeta self-similar traffic models, and an investigation of the internal characteristics of the 2-stage buffering ATM switch. The traffic model comparison was achieved by configuring the simulator as a single output buffer with four input sources. The simulation of the entire switch was done by configuring the system as a 4 input, 4 output fabric. This small number kept the simulation runtimes within practical limits.
The four different input source traffic models were fit to the simulation using the methods discussed in Section 4.2.
Each data point in this study is the average of four simulation results with different random number seeds, and each simulation was run for 10 million cells (40 million cells total for each data point). This means that results in the range of or below may contain non-negligible error.
Single Buffer
The two analytical models described in this study, the single ATM queue with Pareto-distributed interarrivals and the ATM output queues with Poisson-Zeta ON-OFF sources, can be compared by looking at the switch in a single queue configuration. Since the analytical models can be related to a single queue case, the results in this section can be used to verify the simulation by comparison with the analytical results. 
4x4 Switch
An important use of a simulation such as this is to aid in the process of choosing appropriate buffer sizes. In this queueing model, it was found that for even small buffer sizes (> 8) in the shared and output queues, a speedup greater than 1 caused the cell loss probabilities to drop below the measurable range. Furthermore, just one of the two queueing stages exhibited loss for a given simulation run in all of the practical cases (buffer sizes above 10).
Figure 6: Cell loss probability in a 4x4 switch configuration
In certain cases, however, the results are interesting and unique to the type of shared buffer in this model, and these are discussed below.
It is seen in Figure 6 that when the complete switch is simulated, the cell loss probability characteristics associated with the four input processes are similar to those found in the single buffer case shown in Figure 5. With the exception of the Poisson-Zeta process, the flexibility of the shared buffer results in smaller cell loss probabilities over the range of buffer sizes. The Poisson-Zeta process, however, performs worse for the 4x4 case than in the single queue case. This may be due to the variable number of bursty sources increasing above 4.
Figure 7 shows a comparison between the Pareto and Poisson interarrival processes with an uncorrelated addressing bursty process. Here, correlating the addressing for each ON period in the bursty processes has a detrimental effect on performance. Clearly, introducing output addressing correlation for each burst causes the bursty process to perform worse than the highly self-similar Pareto process. Since the Poisson-Zeta process is an ON-OFF model, correlation could likewise affect the results from this model.
However, comparing the correlated processes (Poisson-Zeta and bursty) and the uncorrelated processes (Pareto and Poisson) separately in Figure 6 reveals a more important result, which is that the self-similar processes have a more detrimental effect on the cell loss probability characteristic than their Poisson-based counterparts.
In choosing a buffer size for the shared queue in an actual switch, the designer must first determine the nature of the expected input traffic. If the traffic is most accurately modeled as one of the self-similar processes used in this study, then based on these results, the size of the logical buffer can be as low as 50 (Pareto), or it should be well over 400 (Poisson-Zeta).
Figure 7: Cell loss probability in a 4x4 switch configuration, with uncorrelated bursty process
It should be noted that modern ATM switch fabrics do not necessarily need to include this much buffer space. This is partly because fabrics may approach the problem of buffer overflow in different ways. The model used in this study used queue loss to deal with a saturated buffer, but other systems can use more sophisticated schemes, possibly involving some form of backpressure or priority scheme. One example of this is the Atlas switch [6], which uses flow control and a credit-based scheme in coordination with a shared buffer to deal with buffer overflow. The total shared buffer size in this case is 256 cells.
The plots in Figure 8 show the effect on cell loss of changing speedup for small buffer sizes and fixed utilization, $\rho$ (recall that a logical shared buffer size of 2 means that the total shared buffer size is 4 x 2 = 8). For each traffic type, the cell loss probability predictably declines when the speedup increases from 1 to 2. The loss probability levels off after that, because the reduced loss in the shared buffer results in an increased loss for the output buffers. The probabilities are quite high in this case, because of the very small buffer sizes.
Conclusion
In this study we have examined the effect of self-similar traffic on an ATM switch fabric. It was found that for processes with either correlated or uncorrelated addressing schemes, the combination of a high degree of self-similarity in the input process and high utilization degrades the cell loss performance of the system significantly more than a comparable Poisson-based traffic source does.
It was also found that the two self-similar processes used in this study did not have the same effect on the queueing model. While both the Pareto and the Poisson-Zeta processes resulted in similar queue length behavior, the Poisson-Zeta source produced values nearly twice the magnitude of the corresponding Pareto results, for high utilization and Hurst parameter values below 0.9. The cell loss probabilities resulting from the Poisson-Zeta process were likewise higher and more heavy tailed than for the Pareto interarrival process.
Clearly, these results show that if the actual traffic in a physical system is self-similar, then simulations of this system should not ignore the effects of this long-range dependence. Furthermore, the process used to model the self-similar traffic should be carefully chosen, as it may have an impact on the accuracy of the simulation results.
