The main concern in designing the switching fabrics used in Asynchronous Transfer Mode ( 
Introduction
The Broad-band Integrated Services Digital Network (B-ISDN) is a computer communication network that provides multimedia services, including voice, video and data, using high bandwidth and statistical multiplexing technology [3]. Jenq [2] analyzed the performance of the single buffered Banyan network under uniform traffic. In Jenq's research, a simple analytical model in recursive form is presented to analyze the throughput and delay of a single buffered Banyan network that consists of 2x2 crossbar switching elements (SEs). Yoon et al. [4] extended Jenq's model by analyzing the multibuffered Banyan network consisting of SEs of arbitrary size. Wu [5] and Kim [7] studied the performance of the buffered Banyan network under non-uniform traffic conditions. Hui and Edwards [6] studied the performance of a switch with a Batcher sorting network followed by a Banyan network. The Batcher sorting network sorts packets so that internal blocking does not occur in the Banyan network.
Ta and Meditch [SI improved the performance of the Banyan nehvork by employing 4x4 switching blocks over 2x2 switching elements. Tridandapani and Meditch [9] analyzed the single buffer Banyan network with dilations and replications to enhance throughput. They also analyzed the delay of various types of traffic passing through the switch by assigning priorities to packets.
Jenq [2] obtained the performance of the single buffer Banyan switch analytically, assuming that the load on all switching elements (SEs) in the switching fabric is uniformly distributed. He used a two-state Markov chain model to obtain the throughput and delay of the switching network, and assumed that a blocked packet generates a new random destination output port in the next clock cycle. In a more recent analytical study, Yan and Jenq [10] proposed a three-state model and a priority scheme. Their model recognizes that a blocked packet does not generate a new random destination in the next clock cycle; instead, the designated output port of a blocked packet remains the same. The results of the three-state model are closer to simulation results. In the priority scheme, a packet blocked in an SE in the previous clock cycle has priority to advance in the next clock cycle. The priority scheme can be used to significantly reduce the variance of delay.
In Section II we describe the functioning of the single buffer Banyan switch. In Section III we discuss the simulation method. In Section IV we compare simulation results with the analytical results obtained by Yan and Jenq [10]. In Section V we increase throughput and decrease delay and variance of delay by implementing a non-blocking first stage and an enhanced priority scheme. In Section VI we present the performance of switches based on single buffer and double buffer SEs with mixed voice and data traffic, where priority is given to voice traffic. Conclusions are given in Section VII.
Description of the single buffer Banyan fabric
The Banyan network was designed by Goke and Lipovski [1]. The 3-stage single buffer Banyan network is shown in Fig. 2.1. Each switching element (box) is a 2x2 crossbar switch with one buffer on each input link. The buffer stores packets during routing. A switch with n input links has $b = \log_2 n$ stages. A header is attached to each ATM cell to route it through the switching network to its designated output link. The header contains the number of the output link of the switching fabric to which the cell is destined, and the header together with the ATM cell constitutes a packet in the switching fabric. A '0' or '1' bit in the header routes the packet to the upper or the lower output link of the SE, respectively. Each succeeding switching element routes the packet to the next stage in the same way until the packet reaches its destined output link of the switching fabric.
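As an illustration of the routing rule, the following C sketch (C being the language of the simulation program) extracts the bit an SE uses at each stage. It is not the authors' code: the helper names num_stages and route_bit are hypothetical, stages are numbered 1 to b as in the text, and n is assumed to be a power of two.

```c
#include <assert.h>

/* Number of stages b = log2(n) for an n-port fabric (n a power of two). */
static int num_stages(unsigned n)
{
    int b = 0;
    assert(n >= 2 && (n & (n - 1)) == 0);
    while (n > 1) {
        n >>= 1;
        b++;
    }
    return b;
}

/* Output link taken at stage `stage` (1..b): 0 = upper, 1 = lower.
 * Stage 1 reads the most significant bit of the destination header,
 * stage b reads the least significant bit. */
static int route_bit(unsigned dest, int stage, int b)
{
    assert(stage >= 1 && stage <= b);
    return (int)((dest >> (b - stage)) & 1u);
}
```

For n = 16 (b = 4) and destination 10 (binary 1010), route_bit returns 1, 0, 1, 0 for stages 1 to 4, so the packet leaves on the lower, upper, lower and upper output links in turn.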
For a packet to move forward, either the buffer at the next stage must be empty, or it must hold a packet that is itself able to move forward. If the buffer at the next stage is occupied by a packet that cannot move, the packet in the preceding stage is blocked; this is known as external blocking. Internal blocking occurs when the packets in both input buffers of an SE are destined for the same output link of the SE. In this case one packet moves to the next stage and the other is blocked. The switching fabric operates synchronously. In the first part of the clock cycle, control signals are passed backward from each stage to the preceding stage, so that every port of the preceding stage can determine whether to send or hold its packet. In the next part of the clock cycle, packets advance to the following stage.
The parameters studied in the performance analysis of the single buffer Banyan switching fabric are normalized throughput, normalized delay, normalized standard deviation of delay and probability of packet loss. Normalized throughput is defined as the number of packets arriving at an output link per clock cycle. Normalized delay is defined as the average delay experienced by a packet at a switching element of the switching network. The variance of delay is the average of the squared deviations from the mean delay. The normalized standard deviation of delay is the square root of the variance of delay experienced by a packet at a switching element of the switching fabric. When the buffer at the input stage is occupied by a packet, a new incoming packet is lost. The probability of packet loss for the switching fabric is defined as the ratio of the total number of packets lost to the total number of packets arriving at the switching fabric.
Simulation method
In this section we describe the simulation method used to study the parameters defined in the previous section. The simulation program is written in ANSI C. The user defines the number of input ports n and the number of stages b at the beginning of the program.
To realize the switching fabric in the simulation program, a packet is represented as a structure named atm_packet, which is defined globally. Each stage of the switching fabric is represented as an array of pointers to structure atm_packet. There are as many such arrays as there are stages in the switching fabric, and each buffer of a stage corresponds to an element of the array. A buffer either holds a packet or is empty. If there is no packet in the buffer, the corresponding array element has the value pxx; if the buffer is occupied by a packet, the element holds a pointer to the atm_packet structure. When a packet advances to the next stage, the value of its element is changed to pxx, and it remains pxx until a pointer to a new atm_packet moves into the element in a later clock cycle. The array of pointers to structure atm_packet called out[n] is used to save packets that arrive at the output ports. In each clock cycle, the desired data is collected from the atm_packet structures before they are discarded from this array.
We declare an array of structures called pac of size [n×(b+1)]. Each element of the array is an atm_packet structure. The contents of an atm_packet reside in the array as long as the pointer to it is in the switching fabric. When the pointer to an atm_packet arrives at array out, the contents of the structure are nullified. The array of pointers to structure atm_packet named pac_add, also of size [n×(b+1)], holds the pointers that are not present in the switching fabric; any element that does not hold a pointer to a packet has the value pxx. When a pointer to an atm_packet arrives at array out, it is recorded in this array. When a new packet is generated at an input port, the top-most pointer in this array becomes the address of the new packet.
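The packet storage described above behaves like a fixed pool with a stack of free addresses. The sketch below assumes a 16-port, 4-stage fabric and uses NULL where the text uses pxx; apart from atm_packet, pac, pac_add and dest_add, the names (stage, n_free, pool_get, pool_put and the fields voice and born) are hypothetical. A pool of n×(b+1) entries is presumably chosen because at most n packets can occupy each of the b stages, plus up to n packets in the output array.

```c
#include <assert.h>

#define N_PORTS 16                       /* n, assumed for the sketch */
#define N_STAGES 4                       /* b = log2(n) */
#define POOL (N_PORTS * (N_STAGES + 1))  /* size n x (b+1) */

struct atm_packet {
    unsigned dest_add;  /* destination output link (routing header) */
    int voice;          /* 1 = voice packet, 0 = data packet */
    long born;          /* clock cycle in which the packet was generated */
};

/* stage[k][i] points to the packet in buffer i of stage k, or is NULL
 * when the buffer is empty (NULL is used here in place of pxx). */
struct atm_packet *stage[N_STAGES][N_PORTS];

static struct atm_packet pac[POOL];       /* storage for packet contents */
static struct atm_packet *pac_add[POOL];  /* stack of free packet addresses */
static int n_free;                        /* number of entries on the stack */

static void pool_init(void)
{
    for (int i = 0; i < POOL; i++)
        pac_add[i] = &pac[i];
    n_free = POOL;
}

/* A newly generated packet takes the top-most free address. */
static struct atm_packet *pool_get(void)
{
    assert(n_free > 0);
    return pac_add[--n_free];
}

/* A packet that reached array out is nullified and its address recycled. */
static void pool_put(struct atm_packet *p)
{
    assert(n_free < POOL);
    *p = (struct atm_packet){0};
    pac_add[n_free++] = p;
}
```

Using a stack keeps both operations O(1): generating a packet pops an address and delivering one pushes it back, so no search for a free slot is needed.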
The array c_inp[n] records the total number of packets destined to each output link of the switching fabric from the input side: the value in its i-th element is the total number of packets destined to the i-th output link. The random numbers used by the program are held in an array of size [3×n]. The rand() function in C is used to generate random numbers; it is seeded with the system time so that a different sequence of random numbers is produced each time the program is run. The random numbers are scaled to lie between zero and one. They are used to generate packets, to assign the destination output port of each packet, and to determine the type of traffic in each packet. Every time a random number is accessed, a new random number is obtained from the next element of the array. A new packet is generated at an input link if the accessed random number is less than or equal to the input load, where the input load is the probability that a packet arrives at an input port in each clock cycle. The generated packet is always accessed through its pointer. The address of the new packet is taken from array pac_add; it is the address of an element of array pac, which contains the contents of the atm_packet structure. To choose the destination, a random number is accessed and multiplied by n, and the integer part of the product is the output link of the switching fabric to which the generated packet is destined. The type of traffic in the packet is determined by comparing a random number with the percentage of voice traffic in the input load: a voice packet is generated when the accessed random number is less than or equal to the percentage of voice traffic; otherwise a data packet is generated.
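A minimal sketch of the per-input generation step follows, assuming rand() is normalized by RAND_MAX + 1.0 so that values lie in [0, 1). With that range, the integer part of u × n is always a valid port index, and a strict comparison u < load gives an arrival probability equal to the input load (the text uses "less than or equal to", which differs only when u equals the threshold exactly). The struct new_packet and the functions uniform01 and generate are hypothetical; the program would call srand((unsigned)time(NULL)) once at start-up to seed rand() with the system time.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform random number in [0, 1): dividing by RAND_MAX + 1.0 keeps the
 * value strictly below 1, so (int)(u * n) is always in 0..n-1. */
static double uniform01(void)
{
    return rand() / (RAND_MAX + 1.0);
}

struct new_packet {
    int arrived;  /* 1 if a packet arrived at this input in this cycle */
    int dest;     /* destination output link */
    int voice;    /* 1 = voice packet, 0 = data packet */
};

/* One input port in one clock cycle: one random number decides the
 * arrival, one the destination and one the traffic type. */
static struct new_packet generate(double input_load, double voice_fraction, int n)
{
    struct new_packet p = {0, 0, 0};
    if (uniform01() < input_load) {
        p.arrived = 1;
        p.dest = (int)(uniform01() * n);
        p.voice = uniform01() < voice_fraction;
    }
    assert(p.dest >= 0 && p.dest < n);
    return p;
}
```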
Packets move to the array of the following stage at the end of each clock cycle. Before switching packets, the program checks whether the corresponding elements of the following stage's array have the value pxx. If a packet is present in an element of the following stage's array, the packet at the preceding stage is blocked; because stages are processed from last to first, a packet ahead that could advance has already left its buffer, so an occupied element means that packet is blocked. When internal blocking occurs, one of three schemes is applied: non-priority, priority or enhanced priority. Whenever a packet is blocked at a stage, the value of the respective element of array bfu is incremented by one. In the non-priority scheme, both packets at the input links of the SE have the same probability of moving forward. This is implemented by accessing a random number: if it is less than or equal to 0.5, the packet at the lower input link of the SE is blocked; otherwise the packet at the upper input link is blocked. The elements of array bfo indicate whether a packet was blocked in the previous clock cycle. The packet is routed according to the bits in the variable dest_add: '0' or '1' routes the packet to the upper or lower output link of the SE, respectively. The most significant bit of dest_add is used to switch the packet at the first stage, and the least significant bit is used at the last stage. Fig. 2.1 shows the interconnections between the stages of the switching fabric, and the flow chart of the simulation method is shown in Fig. 3.2. In each clock cycle, the simulation program first routes pointers from the elements of the array for the last stage (stage b), then from the array for stage (b-1), and so on, ending with the array for the first stage. After packets from the first stage are routed, new packets are generated and pointers to them are copied into the elements of the array for the first stage. The program counts the total number of packets generated. If the element of the array at stage 1 is occupied by a packet, the newly generated packet is lost; the program also counts the total number of packets lost.
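The clock-cycle procedure can be sketched compactly in C. This is a simplified model under several assumptions, not the authors' program: the fabric is wired as an Omega (perfect-shuffle) network, one member of the Banyan family, because the exact wiring of Fig. 2.1 is not reproduced here; each packet is represented only by its destination rather than by a pointer to atm_packet, with -1 standing in for pxx; the hypothetical array cnt plays the role of the blocking counters (bfu/bfo); stages are indexed from 0; delay bookkeeping and the enhanced priority rule are omitted. All function and variable names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

#define B 3                 /* number of stages b (assumed for the sketch) */
#define N (1 << B)          /* number of ports n = 2^b */

/* buf[k][i]: destination of the packet in input buffer i of stage k,
 * or -1 when the buffer is empty (the role of pxx in the text).
 * cnt[k][i]: number of times that packet has been blocked at stage k. */
static int buf[B][N], cnt[B][N];
static long generated, delivered, lost;

/* Uniform random number in [0, 1). */
static double u01(void) { return rand() / (RAND_MAX + 1.0); }

/* Perfect shuffle between stages (rotate the b-bit line number left). */
static int shuffle(int x) { return ((x << 1) | (x >> (B - 1))) & (N - 1); }

static void fabric_init(void)
{
    for (int k = 0; k < B; k++)
        for (int i = 0; i < N; i++) {
            buf[k][i] = -1;
            cnt[k][i] = 0;
        }
    generated = delivered = lost = 0;
}

/* One clock cycle; priority = 0 (non-priority) or 1 (priority scheme). */
static void clock_cycle(double load, int priority)
{
    for (int k = B - 1; k >= 0; k--) {            /* last stage first */
        for (int up = 0; up < N; up += 2) {       /* SE with inputs up, up+1 */
            int want[2], loser = -1;
            for (int j = 0; j < 2; j++)           /* requested output, MSB first */
                want[j] = buf[k][up + j] < 0 ? -1
                        : (buf[k][up + j] >> (B - 1 - k)) & 1;
            if (want[0] >= 0 && want[0] == want[1]) {  /* internal blocking */
                int b0 = cnt[k][up] > 0, b1 = cnt[k][up + 1] > 0;
                if (priority && b0 != b1)
                    loser = b0 ? 1 : 0;           /* previously blocked packet wins */
                else
                    loser = u01() <= 0.5 ? 1 : 0; /* lower blocked if u <= 0.5 */
            }
            for (int j = 0; j < 2; j++) {
                int i = up + j;
                if (want[j] < 0)
                    continue;                     /* empty buffer */
                int o = up + want[j];             /* output line of this SE */
                if (j == loser || (k < B - 1 && buf[k + 1][shuffle(o)] >= 0)) {
                    cnt[k][i]++;                  /* internal or external blocking */
                    continue;
                }
                if (k == B - 1) {
                    assert(o == buf[k][i]);       /* reached its destination */
                    delivered++;
                } else {
                    buf[k + 1][shuffle(o)] = buf[k][i];
                    cnt[k + 1][shuffle(o)] = 0;
                }
                buf[k][i] = -1;
                cnt[k][i] = 0;
            }
        }
    }
    for (int i = 0; i < N; i++)                   /* new arrivals at stage 1 */
        if (u01() < load) {
            int dest = (int)(u01() * N);
            generated++;
            if (buf[0][i] >= 0)
                lost++;                           /* first-stage buffer occupied */
            else
                buf[0][i] = dest;
        }
}
```

Because a packet's counter is reset whenever it advances, cnt[k][i] > 0 for a buffered packet means it was blocked in the previous clock cycle, which is exactly the condition the priority scheme tests. The assertion at the last stage checks that the bit-by-bit routing delivers every packet to its destination under the assumed wiring.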
$$P_{\text{loss}} = \frac{\text{total number of packets lost}}{\text{total number of packets generated}}$$
In the throughput and delay statistics, N denotes the total number of packets that arrived at the output links.
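The statistics defined earlier can be computed from a few running sums. The sketch below is illustrative: the struct stats, its field names and the functions are hypothetical, the delay of a packet is assumed to be measured in clock cycles from entering the first stage until reaching an output link, and it is divided by b to obtain the delay per switching element. To avoid depending on the math library, the sketch returns the variance; the normalized standard deviation of delay is its square root.

```c
#include <assert.h>

/* Hypothetical accumulators for the statistics defined in the text. */
struct stats {
    long cycles;         /* simulated clock cycles */
    int n, b;            /* number of ports and number of stages */
    long generated;      /* packets generated at the input ports */
    long lost;           /* packets lost at an occupied first-stage buffer */
    long delivered;      /* N: packets that arrived at the output links */
    double sum, sum_sq;  /* sums of per-SE delay and of its square */
};

/* Record a packet that spent `delay` clock cycles in the fabric. */
static void record_delivery(struct stats *s, long delay)
{
    double d = (double)delay / s->b;   /* delay per switching element */
    s->delivered++;
    s->sum += d;
    s->sum_sq += d * d;
}

/* Packets arriving at an output link per clock cycle. */
static double norm_throughput(const struct stats *s)
{
    return (double)s->delivered / ((double)s->n * s->cycles);
}

static double norm_delay(const struct stats *s)
{
    assert(s->delivered > 0);
    return s->sum / s->delivered;
}

/* Variance of the per-SE delay (mean of squares minus squared mean). */
static double norm_delay_variance(const struct stats *s)
{
    double m = norm_delay(s);
    double v = s->sum_sq / s->delivered - m * m;
    return v > 0.0 ? v : 0.0;          /* clamp tiny negative rounding error */
}

static double loss_probability(const struct stats *s)
{
    assert(s->generated > 0);
    return (double)s->lost / s->generated;
}
```

For example, with n = 2, b = 2, 10 clock cycles, 20 generated and 5 lost packets, and two delivered packets with delays of 2 and 6 cycles, the functions give a throughput of 0.1, a normalized delay of 2.0, a variance of 1.0 and a loss probability of 0.25.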
dation and analytical
All results are for simulations run for 10' clock cycles.
We simulated switching fabrics consisting of 4, 6 and 10 stages. For the four-stage switching fabric, the results of simulation runs of 10^6 clock cycles indicate that the change in the value of normalized delay is less than 0.1%, and the changes in normalized throughput and normalized standard deviation of delay are less than 0.2% and 0.6%, respectively. For the 6- and 10-stage switches simulated for 10^6 clock cycles, the differences in the parameter values are smaller than those obtained for the 4-stage switch.
4.1 Comparison of simulation and three-state model results with non-priority scheme
The comparison between the three-state model and the simulation results with the non-priority scheme is shown in figures 3, 4, and 5. The input load is shown on the x-axis in all figures. Fig. 4.1 shows that the normalized throughput of the fully loaded switching fabrics consisting of 4, 6 and 10 stages is 10% lower than predicted by the three-state model. The term fully loaded indicates an input load of 1.0 on the switch. The normalized delay of the fully loaded 4-, 6- and 10-stage switching fabrics is 2% higher than predicted by the three-state model; because it is almost the same as the analytical value, the delay comparison is not plotted. Fig. 4.2 shows that when the switch is fully loaded, the normalized standard deviation of delay differs by 30%, 38% and 45% for switching fabrics with 4, 6 and 10 stages, respectively, with the three-state model giving the higher values. The comparison therefore shows that the results of the three-state model are optimistic except for variance of delay.
Comparison of simulation results with non-priority versus priority scheme
After the non-priority scheme, we simulated the switching fabric with the priority scheme. The improvement in normalized standard deviation of delay obtained with the priority scheme over the non-priority scheme is shown in figure 4.3. When the input load is 1.0, the difference for the 4-, 6- and 10-stage switches is 25%, 21% and 19%, respectively. The three-state model [10] predicted a 10% increase in throughput for a 6-stage switch at an input load of 1.0 with the priority scheme. The simulation results confirmed a significant reduction of variance of delay with the priority scheme; however, they did not show the major change in throughput predicted by the three-state model. When the input load is 1.0, the results obtained from the two schemes differ by 1% in throughput, delay and probability of packet loss, with delay decreasing and throughput increasing under the priority scheme. Delay and variance of delay are important for delay-sensitive traffic such as voice and video. We extend the priority scheme proposed by Yan and Jenq [10] further with an enhanced priority (EP) scheme: when internal blocking occurs in an SE, priority is given to the packet that has been blocked a greater number of times. The simulation results show that, compared with the priority scheme, the normalized standard deviation of delay of the EP scheme is lower by 20.7%, 25.7% and 33.7% for the 4-, 6- and 10-stage switches, respectively, as shown in figure 4.3. The reduction in variance of delay is therefore larger for larger switches. The EP scheme also increases throughput by 2% for the 4-stage switch and by 4.5% for the 10-stage switch. The EP scheme can be useful in minimizing the variance of delay under non-uniform traffic conditions.
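The three schemes differ only in how an internal-blocking conflict is resolved. The hypothetical function below sketches this decision; it assumes each packet carries a count of how many times it has been blocked, reset when the packet advances to the next stage, so that a nonzero count also means "blocked in the previous clock cycle" for the priority scheme. Whether the EP count should instead accumulate over all stages is an open assumption in this sketch.

```c
#include <assert.h>

enum scheme { NON_PRIORITY, PRIORITY, ENHANCED_PRIORITY };

/* Decide which input of a 2x2 SE is blocked when both packets request
 * the same output. cnt_up and cnt_lo are the block counts of the packets
 * in the upper and lower buffers; u is a uniform random number in [0, 1).
 * Returns 0 if the upper packet is blocked, 1 if the lower one is. */
static int loser(enum scheme s, int cnt_up, int cnt_lo, double u)
{
    assert(cnt_up >= 0 && cnt_lo >= 0);
    switch (s) {
    case PRIORITY:
        /* Only "blocked in the previous cycle or not" matters. */
        if ((cnt_up > 0) != (cnt_lo > 0))
            return cnt_up > 0 ? 1 : 0;
        break;
    case ENHANCED_PRIORITY:
        /* The packet blocked a greater number of times advances. */
        if (cnt_up != cnt_lo)
            return cnt_up > cnt_lo ? 1 : 0;
        break;
    case NON_PRIORITY:
        break;
    }
    return u <= 0.5 ? 1 : 0;   /* random tie-break: lower blocked if u <= 0.5 */
}
```

Under EP, a packet blocked three times beats one blocked once, whereas the priority scheme treats both as "previously blocked" and falls back to the random tie-break; this finer ordering is what limits the tail of the delay distribution.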
Non-blocking first stage (NBFS) switch
In the first stage, internal blocking can easily be avoided by placing packets on the two different input links of the SE according to the most significant bit of the packet header. We call a switching fabric with this arrangement a non-blocking first stage (NBFS) switching fabric. The NBFS scheme significantly reduces blocking in the first stage, so fewer packets are dropped and the throughput of the switch increases. In the simulation, the scheme is implemented by routing a packet with '0' in the most significant bit of its header to the upper input link of the SE, and a packet with '1' in the most significant bit to the lower input link.
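A minimal sketch of the NBFS placement for one first-stage SE is shown below. It assumes that the two input ports feeding an SE share its two input buffers, and that a packet whose buffer (selected by its most significant bit) is already occupied is counted as lost, in line with the loss rule for the input stage; the function nbfs_enqueue and the slot array are hypothetical.

```c
#include <assert.h>

/* slot[0] and slot[1] are the upper and lower input buffers of a
 * first-stage SE (-1 = empty); b is the number of stages.
 * A packet whose destination has MSB 0 goes to the upper buffer and one
 * with MSB 1 to the lower buffer, so the two buffered packets always
 * request different outputs of the SE.
 * Returns 1 if the packet is accepted, 0 if it is lost. */
static int nbfs_enqueue(int slot[2], int dest, int b)
{
    int msb = (dest >> (b - 1)) & 1;
    assert(dest >= 0 && dest < (1 << b));
    if (slot[msb] >= 0)
        return 0;              /* buffer for this MSB already occupied */
    slot[msb] = dest;
    return 1;
}
```

Since the upper buffer only ever holds packets bound for the upper output and the lower buffer only packets bound for the lower output, internal blocking cannot occur in the first stage; only external blocking remains there.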
Switch with NBFS and EP scheme
We simulated a switching fabric combining the NBFS and EP schemes. Figs. 5.1, 5.2, 5.3 and 5.4 show the differences in throughput, delay, variance of delay and probability of packet loss between this switching fabric and a normal switching fabric with the priority scheme. Fig. 5.1 shows that the normalized throughput of the fully loaded switch with 4, 6 and 10 stages increases by 11.5%, 8.5% and 6.9%, respectively. Fig. 5.2 shows that the normalized delay of the fully loaded switch with 4, 6 and 10 stages increases by 3.08%, 6.3% and 8.2%, respectively. Fig. 5.3 shows that the normalized standard deviation of delay of the fully loaded 4-, 6- and 10-stage switch decreases by 23.8%, 25% and 29.4%, respectively. Fig. 5.4 shows that the probability of packet loss of the fully loaded 4-, 6- and 10-stage switching network decreases by 11.6%, 6% and 3.5%, respectively. Because the purpose of an ATM network is to support multimedia traffic on a single network, different types of traffic will pass through the switch. We therefore simulated the switch with the NBFS and EP schemes under mixed traffic, in which the packets passing through the switch are voice and data packets. When internal blocking occurs in an SE between a voice packet and a data packet, the voice packet has priority to advance to the next stage because it is more sensitive to delay. When internal blocking occurs between packets of the same type, the packet blocked a greater number of times has priority to advance to the next stage. The results show that the throughput of the switch remains almost the same, while delay and variance of delay depend on the type of traffic. When simulating the switch with mixed traffic, the variables are the input load and the percentage of voice traffic. We simulated the 4, … This is shown in Fig. 6.3. The reason for this behavior is as follows: when the input load is high and priority is given to voice packets, data packets are more likely to be blocked.
The blocking of each data packet causes external blocking of packets in the preceding stages, increasing the delay variance of both voice and data traffic. However, when the voice traffic is increased above 80% and the data traffic falls below 20%, there is less external blocking of voice packets by blocked data packets, and hence the variance of delay of voice traffic decreases.
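The mixed-traffic arbitration rule (voice before data, then the enhanced priority rule within a traffic class) can be sketched as a small comparison function. The struct contender and the function mixed_loser are hypothetical names, and the random tie-break for packets of the same class with equal block counts is an assumption carried over from the non-priority scheme.

```c
#include <assert.h>

struct contender {
    int voice;    /* 1 = voice packet, 0 = data packet */
    int blocked;  /* number of times this packet has been blocked */
};

/* Returns 0 if the upper packet is blocked, 1 if the lower one is.
 * u is a uniform random number in [0, 1) used only for ties. */
static int mixed_loser(struct contender up, struct contender lo, double u)
{
    assert(u >= 0.0 && u < 1.0);
    if (up.voice != lo.voice)
        return up.voice ? 1 : 0;                 /* voice beats data */
    if (up.blocked != lo.blocked)
        return up.blocked > lo.blocked ? 1 : 0;  /* EP rule within a class */
    return u <= 0.5 ? 1 : 0;                     /* random tie-break */
}
```

Note that a data packet loses to a voice packet no matter how often it has been blocked, which is consistent with the growth of data-packet blocking, and the resulting external blocking, described above.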
Double buffer Banyan network
Multibuffer SEs [4] were proposed to enhance throughput. A multibuffer SE has an arbitrary number of serial buffers at each input link. The problem with multibuffer SEs is the increase in delay, which is undesirable for delay-sensitive traffic. The proposed double buffer SE is intended to counter this problem. The double buffer switching element is an extension of the single buffer switching element. The structure of a 2x2 double buffer switching element is shown in figure 6.1. Each input link of the SE is connected to two buffers through a demultiplexer, which also recognizes an empty buffer and routes the incoming packet to it. A multiplexer is placed between the buffers and the crossbar switch. This "intelligent" multiplexer also gives priority to a type of traffic in the enhanced priority scheme, and when a packet is blocked, it has the circuitry to increment the count of the number of times the packet has been blocked.
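The roles of the demultiplexer and the "intelligent" multiplexer on one input link of a double buffer SE can be sketched as follows. This is an assumed software model of the hardware described above, with hypothetical names; in particular, the selection order in mux_select (voice first, then the packet blocked more often) is an interpretation of the enhanced priority scheme applied to the two parallel buffers.

```c
#include <assert.h>

struct slot {
    int dest;     /* destination, or -1 when the buffer is empty */
    int voice;    /* 1 = voice packet, 0 = data packet */
    int blocked;  /* number of times the packet has been blocked */
};

struct input_link {
    struct slot buf[2];  /* two buffers in parallel */
};

/* Demultiplexer: place an incoming packet in an empty buffer.
 * Returns 1 on success, 0 if both buffers are occupied. */
static int demux_accept(struct input_link *l, int dest, int voice)
{
    for (int j = 0; j < 2; j++)
        if (l->buf[j].dest < 0) {
            l->buf[j] = (struct slot){dest, voice, 0};
            return 1;
        }
    return 0;
}

/* Multiplexer: choose which buffered packet is presented to the crossbar,
 * voice before data, then the packet blocked more often.
 * Returns the buffer index, or -1 if both buffers are empty. */
static int mux_select(const struct input_link *l)
{
    const struct slot *s0 = &l->buf[0], *s1 = &l->buf[1];
    if (s0->dest < 0)
        return s1->dest < 0 ? -1 : 1;
    if (s1->dest < 0)
        return 0;
    if (s0->voice != s1->voice)
        return s0->voice ? 0 : 1;
    return s1->blocked > s0->blocked ? 1 : 0;
}

/* The presented packet was blocked: increment its counter. */
static void mux_blocked(struct input_link *l, int j)
{
    assert((j == 0 || j == 1) && l->buf[j].dest >= 0);
    l->buf[j].blocked++;
}

/* The presented packet advanced: its buffer becomes empty. */
static void mux_forwarded(struct input_link *l, int j)
{
    assert((j == 0 || j == 1) && l->buf[j].dest >= 0);
    l->buf[j] = (struct slot){-1, 0, 0};
}
```

Because the two buffers are in parallel rather than in series, a packet does not wait behind another one in a queue; when one buffered packet is blocked, the input link can still accept a new packet into the other buffer, which is how the arrangement aims to lessen external blocking.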
The structure and operation of the double buffer Banyan switch are similar to those of the single buffer Banyan switch, except that the single buffer SEs are replaced with double buffer SEs. We simulated the 4-, 6- and 10-stage switches with the NBFS and EP schemes for input loads of 0.1, 0.2, 0.3, 0.4, 0.5, 0.7 and 1.0. The simulation results show a significant increase in throughput, and a decrease in delay and variance of delay of voice traffic at the expense of data traffic. Figure 6.5 shows that the throughput of the fully loaded 4-, 6- and 10-stage switches increases by 12%, 23% and 39%, respectively. With a single type of traffic, the delay and variance of delay of the fully loaded 6-stage double buffer switch increase by 45.5% and 9.5%, respectively. The delay and variance of delay of voice and data packets under mixed traffic for the fully loaded 6-stage double buffer switch are shown in figures 6.2 and 6.4, respectively.
Under mixed traffic with 30% voice traffic, the delay and variance of delay of voice traffic decrease by 12% and 15%, respectively, while the delay and variance of delay of data traffic increase by 30% and 70%, respectively. The comparison of the probability of packet loss between the single and double buffer switches is shown in figure 6.6.
Fig. 6.1. Structure of the 2x2 double buffer switching element (labels: input link, output link, 2x1 multiplexer).
Conclusion
In this paper, we studied by simulation the performance of a Banyan-based ATM packet switch built from single and double buffer SEs. The comparison of the three-state model [10] with the simulation results shows that the results of the three-state model are optimistic except for variance of delay. The simulation results showed that although throughput does not increase with the priority scheme, there is a significant reduction in variance of delay.
The EP scheme can be useful in minimizing variance of delay under non-uniform traffic patterns. For the switch with combined NBFS and EP schemes, throughput increases and variance of delay decreases significantly. We obtained the performance of the switch with mixed voice and data traffic, where priority is given to voice packets. For input loads greater than 0.4 (the supportable throughput of the switch), the normalized standard deviation of delay of voice traffic reaches its maximum when the percentage of voice traffic is about 80%.
We proposed the double buffer switching fabric to lessen the effects of external blocking. The throughput of the fully loaded double buffer 4-, 6- and 10-stage switches increases by 12%, 23% and 39%, respectively. With a single type of traffic, the delay and variance of delay of the fully loaded double buffer switch are higher than those of the single buffer switch. However, under mixed traffic with 30% voice traffic, the delay and variance of delay of voice traffic in the fully loaded 6-stage switch decrease by 12% and 15%, respectively.
In pursuit of the goal of increasing throughput and decreasing delay and variance of delay, switching fabrics with more than two buffers in parallel at each input link of the SE may be studied. A double buffer Banyan switch based on 4x4 SEs operating in the EP scheme may also be studied.
