We present an analytical model for evaluating the performance of finite-buffered packet switching multistage interconnection networks using blocking switches under any general traffic pattern.
There is a need for an analytical model to evaluate the performance under more general conditions.
We first present a description of a decomposition & iteration model which we propose for a specific hot spot pattern. This model is then extended to handle more general traffic patterns using a transformation method.
For an even more general traffic condition where each processing element can have its own traffic pattern, we propose a superposition method to be used with the iteration model and the transformation method. We can extend the model to account for processing elements having different input rates by adding weighting factors in the analytical model.
An approximation method is also proposed to refine the analytical model to account for the memory characteristic of a blocking switch, which causes persistent blocking of packets contending for the same output ports. The analytical model is used to evaluate the uniform traffic pattern and a very general traffic pattern, "EFOS".
In this paper, the interconnection network we consider is a clocked, packet-switched, finite-buffered Banyan network made up of 2x2 switches, each of which has buffers of finite size $K$ at its output ports (see Figure 1). There are $N$ processing elements and $N$ memory modules [...] identical in topology. It is sufficient to discuss the delay and throughput performance of the forward network only.
Each packet generated at the processing elements carries an address tag with a number of bits equal to the number of stages of the interconnection network.
The address tag is a binary representation of the destination address. The packet is then fed into the first stage of the network. The first stage switch examines the first (i.e. most significant) bit of the address tag; if it is a 0, the packet is routed to the queue at the upper output port. If the first bit is a 1, the packet is routed to the queue at the lower output port (see Figure 1). The packet then waits in the queue until its turn to be served. Packets are assumed to be of the same length (i.e. fixed size packets). A packet is generated by each processing element independently with probability $q$ in each cycle. All processing elements are assumed to have this identical Bernoulli input process. This assumption is later relaxed by using weighting factors to allow each processing element to have its own input rate $q_j$, $1 \le j \le N$. We assume that there is no buffer space at the processing elements. After being generated, a packet is discarded if it cannot be delivered to the first stage of the interconnection network, due to either a full buffer or a contention failure. Discarded packets are not re-submitted.
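As an illustration of the tag-based routing just described, the following minimal sketch lists the output port chosen at each stage for a given destination. It assumes that stage i examines the ith most significant bit of the tag (the text states this explicitly only for the first stage); the function name `route_by_tag` is hypothetical.

```python
def route_by_tag(dest, n_stages):
    """Return the output port chosen at each stage (0 = upper, 1 = lower).

    Assumption: stage i (counting from 0) examines bit i of the address
    tag, most significant bit first.
    """
    ports = []
    for stage in range(n_stages):
        shift = n_stages - 1 - stage      # position of the bit for this stage
        ports.append((dest >> shift) & 1)
    return ports


# Destination 5 in a 3-stage network has tag 101: lower, upper, lower.
print(route_by_tag(5, 3))
```

The list has one entry per stage because the tag carries exactly as many bits as the network has stages.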
A packet, once accepted by the network, is never discarded inside the network. The input process is independent of the discarding process. (An extension of the current model to allow blocked packets to be stored in a finite-sized queue or an infinite queue is underway.) An important performance measure is the total time a packet spends in the network. Time delay is meaningful only for those packets accepted into the network. The probability of acceptance, another performance measure, is the probability that a packet is accepted into the network after it is generated. The normalized throughput is simply the probability of acceptance multiplied by the input rate. Current work also includes an extension to the case of multiple packet generation.
Each processing element has a memory module ref- [...] Figure 1). A slower memory module (e.g. 2 cycles to accept a packet) will have a severe effect on the performance of the network. Extension to slower memory models is underway.
Routing Model
In the real world, packets are routed according to their destination address. However, to analyze the network analytically, we must establish an abstract flow model that at least faithfully reflects the steady state flow in the network. We propose a routing matrix $r_{i,j}$, $1 \le i \le n$, $1 \le j \le N$, where $r_{i,j}$ is the routing probability of the $j$th input port in stage $i$. A packet entering a switch will be routed either to the upper output queue with probability $r_{i,j}$ or to the lower output queue with probability $1 - r_{i,j}$. To simulate a uniform traffic pattern, we simply let all $r_{i,j}$ be 0.5.
With equal probability of choosing output queues, no memory module is preferred. A special hot spot pattern can be created by letting all $r_{i,j}$ be an identical value greater than 0.5. For instance, by letting all $r_{i,j}$ be 0.8 in a 10 stage network, 10.7% ($= 0.8^{10}$) of the total traffic will go to memory module 0 in a 1024-node network, with 2.7% of the traffic going to each of the second highest referenced memory modules (all memory modules with a single 1-digit in their address tag) and other fractions of traffic to the other memory modules. The advantage of this routing model is that by changing the values of $r_{i,j}$ with proper mappings from real traffic patterns, we can evaluate any general traffic pattern. We leave the general $r_{i,j}$ to be discussed in Section 3.
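The hot spot figures quoted above can be checked with a short sketch. It assumes every switch routes to the upper output with the same probability r, so a destination whose tag contains m one-bits is reached with probability r^(n-m) (1-r)^m; the function name is hypothetical.

```python
def destination_probability(dest, n_stages, r):
    """Probability that a packet is addressed to `dest` when every stage
    picks the upper output (tag bit 0) with probability r."""
    ones = bin(dest).count("1")           # stages where the lower port is taken
    return r ** (n_stages - ones) * (1 - r) ** ones


n, r = 10, 0.8
hot = destination_probability(0, n, r)            # all-zero tag
second = destination_probability(1, n, r)         # a tag with a single 1-bit
total = sum(destination_probability(d, n, r) for d in range(2 ** n))

print(round(hot, 3))     # 0.107
print(round(second, 3))  # 0.027
```

Summing over all 1024 destinations gives 1, which confirms that a single value of r defines a complete destination distribution.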
Throughout this section, all $r_{i,j}$ are assumed to have the same value, $r_{i,j} = r$. [...] a decomposed queue, we consider the combined input from the 2 input sources and the combined probability of blocking from the 2 output queues. The approach is as follows:
Let $Q_{i,j}$ represent the $j$th queue in stage $i$ and $P_{i,j}(k)$ be the steady state probability that there are $k$ packets in the queue $Q_{i,j}$. Let $Q_{i-1,j_1}$ and $Q_{i-1,j_2}$ be the two input sources from stage $i-1$ that feed $Q_{i,j}$.
[...] be the probability that there are $i$ packets destined to $Q_{i,j}$ from its two input sources. In the following, we solve for the equivalent input rates for a queue $Q_{i,j}$ which is located at output port 0:
Figure 2: Markov chain of a queue Qi,j extracted from the network where the state variable represents the number of packets in that queue
The first term in the $X_0$ [...] with probability $r$. Regarding the equivalent blocking condition, let $B_{i,j}$ be the probability that a packet in the $j$th queue in stage $i$ is blocked at the end of the cycle. Let $C_{i,j}$ be the probability that the $j$th queue in stage $i$ is blocking a packet in stage $i-1$. Let $Q_{i+1,j_1}$ and $Q_{i+1,j_2}$ be the two output queues of $Q_{i,j}$ and let $Q_{i,l}$ be the queue that feeds both $Q_{i+1,j_1}$ and $Q_{i+1,j_2}$. Then the equivalent blocking condition for queue $Q_{i,j}$ is as follows:
The first term in the $B_{i,j}$ equation represents the case when the packet at the head of queue $Q_{i,j}$ chooses $Q_{i+1,j_1}$ with probability $r$ and is blocked by $Q_{i+1,j_1}$.
The second term represents the other case, when the packet chooses $Q_{i+1,j_2}$ and is blocked. There are two situations in which a queue blocks a packet in the preceding stage: firstly, when the queue is full, and secondly, when the queue has only one more space and a contention from $Q_{i,l}$ wins the arbitration. [...] Figure 2, where $B$ represents the blocking probability $B_{i,j}$. We repeat this process for other queues in the first stage, in the order $Q_{1,2}, Q_{1,3}, \ldots, Q_{1,N}$. Using these new state probabilities as the new input rates, we repeat the same process for all queues in the second stage in the or- [...]
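To make the per-queue step of the iteration concrete, the following is a minimal sketch of solving one decomposed finite-buffer queue for its steady state occupancy. It is not the transition structure of Figure 2, which is not reproduced in the text. The assumptions are: 0, 1 or 2 packets arrive per cycle with given probabilities `x`, the head packet leaves with probability 1 - B when the queue is non-empty, departures are applied before arrivals within a cycle, and arrivals that do not fit are simply held upstream. All names are hypothetical.

```python
def steady_state(x, B, K, iters=2000):
    """Occupancy distribution of one isolated queue with K buffers.

    x = (x0, x1, x2): probability of 0, 1, 2 arrivals in a cycle (assumed).
    B: probability that the head packet is blocked in a cycle.
    Returns p with p[k] = probability of k packets in the queue.
    """
    p = [1.0] + [0.0] * K                  # start from an empty queue
    for _ in range(iters):                 # power iteration on the chain
        nxt = [0.0] * (K + 1)
        for k, pk in enumerate(p):
            # Departure step: nothing leaves an empty queue.
            departures = [(0, 1.0)] if k == 0 else [(0, B), (1, 1.0 - B)]
            for d, pd in departures:
                for a, pa in enumerate(x):
                    # Arrival step: packets beyond the buffer are not admitted.
                    nxt[min(K, k - d + a)] += pk * pd * pa
        p = nxt
    return p


p = steady_state((0.25, 0.5, 0.25), B=0.2, K=4)
print([round(v, 3) for v in p])
```

In the full model this solve would be repeated queue by queue and stage by stage, with the resulting state probabilities feeding the input rates and blocking probabilities of the neighbouring stages until the values stop changing.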
The total output rate over the input rate is the probability of acceptance at the output port. From the input port, we solve for the probability that a packet generated at the PE's is discarded due to a full buffer or a contention failure at the first stage. This discarding probability is $B_{0,j}$, which can be solved for using equation (2). Hence,
Both values, although solved in different ways, should be equal when the MIN reaches steady state. (This can be used to test for the correctness of the model.)
The normalized throughput is found by multiplying the probability of acceptance by the input rate. We apply Little's result to calculate the average time delay of a packet. When the network reaches steady state, we take the sum of the mean queue size for the whole network using the steady state probabilities of queue size of each queue. Given the throughput and the average number of customers in the system, the average time delay can be solved for by applying Little's result.
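The delay calculation described above can be sketched as follows. The sketch assumes that each queue's steady state distribution is available as a list `p` with `p[k]` the probability of k packets, and that the total arrival rate into the network is the normalized throughput multiplied by the number of input ports; the function and parameter names are hypothetical.

```python
def mean_queue_size(p):
    """Mean number of packets in a queue with distribution p[k]."""
    return sum(k * pk for k, pk in enumerate(p))


def average_delay(queue_distributions, normalized_throughput, n_ports):
    """Little's result: mean delay = mean packets in network / arrival rate.

    Assumption: the network-wide accepted rate is the per-port normalized
    throughput times the number of input ports.
    """
    packets_in_network = sum(mean_queue_size(p) for p in queue_distributions)
    accepted_rate = normalized_throughput * n_ports
    return packets_in_network / accepted_rate


# Two queues, each holding 0 or 1 packet with equal probability.
print(average_delay([[0.5, 0.5], [0.5, 0.5]], 0.25, 2))  # 2.0
```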
Results
[...] Each processing element can have its own input rate.
These three different assumptions represent different levels of general traffic patterns. We shall discuss the modeling approaches for these three different assumptions in the next subsections.
Identical General Traffic Patterns for the Processing Elements
A traffic pattern that can be in any form implies that the $r_{i,j}$'s in the routing matrix no longer have the same value $r$ as discussed in the previous section. The approach to model this general traffic pattern is to find a mapping scheme that transforms the given referencing pattern into a set of $r_{i,j}$'s which reflects the steady state traffic flow in the network.
Let us take a 3 stage Banyan network as an example, as shown in Figure 3. Since we assume that all processing elements have the identical general traffic pattern, we only discuss the transformation method for one processing element. If there exists a steady state referencing pattern, we can represent it in terms of destination accessing probabilities $A_j$, the probability that a new packet generated by a processing element chooses memory module $j$ as its destination. Consider a packet generated by processing element 0 and observe the path it takes as it travels through the network to access the memory modules. A packet chooses memory module 0 with probability $A_0$, which equals $r_{1,1} \cdot r_{2,1} \cdot r_{3,1}$. Similarly, a packet chooses memory mod- [...] Since there is only one traffic pattern for all processing elements, this routing probability set is valid for all other processing elements.
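One way to realize such a transformation is to treat each routing probability as a conditional probability: given the tag bits already consumed, the chance that the next bit is 0. The sketch below does this for a single processing element. It is an illustration under that assumption, not necessarily the exact mapping used in the paper, and it indexes each probability by (stage, tag prefix) rather than by the port index $j$, since the prefix-to-port mapping depends on the network topology. All names are hypothetical.

```python
def routing_probabilities(A, n_stages):
    """Map destination probabilities A[j] (len 2**n_stages) to per-stage
    probabilities of taking the upper output, keyed by (stage, prefix).

    Stages are numbered from 1; `prefix` is the integer value of the tag
    bits examined before reaching that stage.
    """
    r = {}
    for stage in range(n_stages):
        block = len(A) >> stage                  # destinations sharing a prefix
        for prefix in range(1 << stage):
            lo = prefix * block
            total = sum(A[lo:lo + block])
            upper = sum(A[lo:lo + block // 2])   # next tag bit is 0
            # A prefix that is never used gets an arbitrary value of 0.5.
            r[(stage + 1, prefix)] = upper / total if total > 0 else 0.5
    return r


A = [0.3, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
r = routing_probabilities(A, 3)
# Multiplying along the all-upper path recovers the probability of module 0.
print(r[(1, 0)] * r[(2, 0)] * r[(3, 0)])
```

The product along any path telescopes back to the corresponding $A_j$, which is the property the example with memory module 0 relies on.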
Results
Since the proposed analytical model employs several approximate methods, it is important to study how these approximations affect the model accuracy. There are two approximations in the modelling approach:
(i) decomposing a queue from a network of queues with blocking into an independent queue;
(ii) allowing blocked packets to be routed around a congested queue.
In the real world, blocked packets repeatedly access the same destination, and most likely, these blocked packets will be blocked again (especially when the traffic is not uniform).
The EFOS (Even-First-Odd-Second) pattern was proposed in [12] using an Omega network, where even addressed processing elements send all their traffic to the first half of memory modules uniformly while the odd addressed ones send their traffic to the second half of memory modules uniformly.
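The EFOS pattern can be written down as a per-processing-element destination distribution. The sketch below assumes that the "first half" means the lower-numbered memory modules and the "second half" the higher-numbered ones; the function name is hypothetical.

```python
def efos_destination_probabilities(pe, n_modules):
    """Destination probabilities for processing element `pe` under EFOS.

    Assumption: the first half is modules 0 .. n_modules/2 - 1 and the
    second half is the remaining modules.
    """
    half = n_modules // 2
    if pe % 2 == 0:                         # even PE: first half, uniformly
        return [1.0 / half] * half + [0.0] * half
    return [0.0] * half + [1.0 / half] * half   # odd PE: second half


print(efos_destination_probabilities(0, 8))
print(efos_destination_probabilities(3, 8))
```

Under this assumption the two halves differ only in the most significant tag bit, so an even element always takes the upper output at the first stage and an odd element always takes the lower one.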
The destination traffic [...] The analytical results we obtained are plotted against simulation results in Figure 4. As predicted, the analytical model is very optimistic due to the independent routing choices it allows. When severe blocking is present due to contention, blocked packets in the real world choose the same output queues repeatedly, while the renewal choice in the analytical model allows blocked packets to choose other queues. This inherent "memory" structure in blocking switches severely degrades performance, since contention for a queue is likely to persist once it occurs.
The discrepancy between analytical and simulation data is caused mainly by this memory characteristic of the blocking switch. We propose an improvement in the next section to model this "memory" behavior of a blocking switch. 
Model Approach
Since the basic model is a renewal process, we continue to model the memory behavior as a renewal process. However, the behavior of a blocked packet, after its first blocking, is such that the routing choice no longer uses the renewal probability $r_{i,j}$. Biasing the routing probabilities to account for this does not help since it changes the memory referencing pattern. The routing probabilities were created to reflect the steady state memory referencing pattern; therefore, it is necessary to keep the values unchanged.
Figure 5: The states of a server during its busy period.
Although an exact model of this persistent blocking behavior would require that we keep track of how many times a packet has been blocked at a given node, we choose an approximation which captures the "first order" effect of this persistence using the following two state model. When the queue is not empty, we model the server as being either in the "new" state or the "blocked" state. When a packet first comes into the server, the server is in the new state. The server enters the blocked state when the packet is blocked, and it remains in the blocked state until the blocked packet finally goes through to the next stage. This cycle repeats until the server empties the queue and becomes idle. Observe that the server is inactive when it is in the blocked state. While in the new state, the server obeys the renewal behavior, choosing an output port according to the routing probability $r_{i,j}$. Hence, we can approximate a blocking switch with "memory" characteristics by a finite buffer queue with a reduced service rate. The reduced portion is the probability that the server is in the blocked state.
The diagram in Figure 5 shows how the server alternates between the new state and the blocked state during its busy period. Let $b$ be the probability that a new packet is blocked when it tries to go to the next stage. Let $c$ be the probability that a blocked packet is blocked again when it tries to go to the same destination. Then for our approximation, the steady state probability that the server is in the blocked state, $P_{blocked}$, can be solved in terms of $b$ and $c$:
where $b$ is the blocking probability (for which we used the notation $B_{i,j}$ in section 2.3). Once blocked, it is more likely that a blocked packet gets blocked again; therefore, the value of $c$ is selected to be larger than the value of $b$. In fact, when a packet is in the blocked state, the length of the destination queue in the next cycle will be either $K$ (full) or $K-1$ (only one space available). If we disregard how many times it has been blocked previously, there will be only two cases: either the blocked packet faces a full queue or a queue with one space left. In the first case, with probability [...], the packet will be blocked again. In the second case, with probability [...], the packet will face possible contention from the other queue in the same stage which feeds this destination queue. Incorporating these two probabilities in equation (2), $c$ can be found in a similar way: $c = r \cdot C_{i+1,j_1} + (1 - r) \cdot C_{i+1,j_2}$
The probability that the $j_1$-th queue in stage $i+1$ is blocking a packet in stage $i$, $C_{i+1,j_1}$, is:
$$C_{i+1,j_1} = [\ldots] \cdot (1 - P_{i,j}(0)) \cdot \frac{P_{i+1,j_1}(K-1)}{P_{i+1,j_1}(K) + P_{i+1,j_1}(K-1)}$$
$P_{blocked}$ is the probability that the server is in the blocked state. During this period, the server is inactive. Therefore, we may use this probability to approximate the blocking switch with "memory" characteristic. At the beginning of each cycle, the server tosses a coin which comes up heads with probability $P_{blocked}$, in which case the server will be blocked (inactive). If there is a packet at the server, it stays idle until the next cycle when the coin will be tossed again. With probability $1 - P_{blocked}$, the server will be active. The queue length then determines whether the server will send a packet or not. If there are packets in the queue, the server takes the first packet and routes it according to the routing probability.
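The expression for $P_{blocked}$ is not reproduced in the text. As a sketch of how such a probability can be obtained from the two state description, suppose the busy server is a two state Markov chain in which a cycle spent in the new state is followed by the blocked state with probability $b$, and a cycle spent in the blocked state is followed by another blocked cycle with probability $c$ (with $c < 1$). The stationary probability of the blocked state is then $b / (1 + b - c)$. This chain and both function names are assumptions made for illustration, not necessarily the authors' formula.

```python
def p_blocked(b, c):
    """Stationary probability of the blocked state for the assumed chain:
    new -> blocked with probability b, blocked -> blocked with probability c.
    Requires c < 1 so that the blocked state is eventually left.
    """
    return b / (1.0 + b - c)


def p_blocked_iterative(b, c, iters=10000):
    """Same quantity, found by stepping the two state chain cycle by cycle."""
    new, blocked = 1.0, 0.0
    for _ in range(iters):
        new, blocked = (new * (1 - b) + blocked * (1 - c),
                        new * b + blocked * c)
    return blocked


print(round(p_blocked(0.2, 0.5), 4))            # 0.2857
print(round(p_blocked_iterative(0.2, 0.5), 4))  # 0.2857
```

The closed form shows the intended effect of persistence: for a fixed $b$, raising $c$ towards 1 increases the share of cycles in which the server is inactive.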
Incorporating the probability $P_{blocked}$ into our previous model, the approach is then similar except that the equivalent input rates and blocking probabilities are different. In the original model, when a queue is not empty (with probability $1 - P_{i,j}(0)$), it tries to transmit a packet to the destination in stage $i+1$. However, for the persistent blocking model, a queue tries to transmit a packet to the next stage with probability $(1 - P_{i,j}(0)) \cdot (1 - P_{blocked})$, where the former is the probability that the server is not empty and the latter is the probability that the server is in the "active" state. When the server is not empty and it is active, it transmits a packet to the next stage.
Let us define $P^{eff}_{i,j}(0)$ to be the effective probability that $Q_{i,j}$ will not send a packet (either the server is empty or the server is not empty and is blocked). Let $P_{i,j,blocked}$ be the probability that $Q_{i,j}$ is not empty and is in the blocked state. Then the effective input rates of a queue $Q_{i,j}$ in this persistent blocking model are similar to the ones in section 2.3:
[...] $\left[ r \left( 1 - P^{eff}_{[\ldots]}(0) \right) \right]$ [...]
The equivalent blocking probability $B_{i,j}$ can be found as follows:
Results
We ran our model incorporating this new technique to handle the memory behavior for the same 6-stage Omega network (as in Section 3.5) with buffer size 4 under both the uniform traffic and the EFOS traffic 
