An analytical model for evaluating the performance of a packet scheduling algorithm, called lookahead scheduling, is proposed in this paper. With lookahead scheduling, each input port of a switch has B packet buffers. A packet arriving at an input port is scheduled for conflict-free transmission up to B time slots in advance. If it cannot be scheduled for transmission within the next B slots, the packet is immediately discarded to leave room for packets arriving later. Based on a set of recursive equations for the buffer occupancy and the probability that a packet cannot be placed into a buffer, analytical expressions for switch throughput, packet loss probability and mean packet delay are derived. The analytical results are then compared with simulation results, and good agreement is found.
I. Introduction
Asynchronous Transfer Mode (ATM) is an international networking standard designed for cost-effective transfer of multimedia traffic, such as video-on-demand and video conferencing. Various ATM switch architectures have been proposed and studied extensively in order to provide high-performance packet switching for integrated ATM transport. In this paper we focus on input-buffered nonblocking switches with N input ports and N output ports.
It has been found [1] that the maximum throughput of an input-buffered packet switch is limited to 58.6% under uniformly distributed traffic. This is due to the Head-of-Line (HOL) blocking phenomenon. To solve this problem, many queueing and scheduling techniques have been proposed, aiming at maximizing switch throughput and minimizing mean packet delay [2]-[6]. In this paper, we focus on a scheduling algorithm [7] for switches with a single buffer queue per input port, which we call the lookahead scheduling algorithm. Simulations in [7] show that the throughput of lookahead scheduling is comparable to that of the SDR algorithm [8]. SDR is a maximum cardinality algorithm with an extremely high time complexity of $O(N^4)$. To our knowledge, SDR offers the best throughput and delay performance among various scheduling algorithms [6], [8], [9].
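The 58.6% figure is the large-N saturation throughput of a FIFO input-queued switch, $2 - \sqrt{2} \approx 0.586$. The minimal Monte Carlo sketch below illustrates how HOL blocking produces this limit. It is not part of the lookahead scheme, and it rests on our own assumptions: every input is always backlogged, destinations are independent and uniform, and each requested output serves one contending HOL packet chosen at random. The function name `hol_saturation_throughput` is hypothetical.

```python
import random


def hol_saturation_throughput(n_ports, n_slots, seed=0):
    """Estimate the saturation throughput of an N x N FIFO input-queued switch.

    Assumptions (illustrative only): all inputs are always backlogged,
    destinations are i.i.d. uniform, and every requested output serves
    exactly one randomly chosen head-of-line (HOL) packet per slot.
    """
    rng = random.Random(seed)
    # Destination of the HOL packet currently waiting at each input.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group inputs by the output their HOL packet requests.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output picks one winner; the losers stay blocked,
        # even if the packets behind them want idle outputs (HOL blocking).
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            delivered += 1
            # Under saturation the winner always has a next packet,
            # which gets a fresh uniform destination.
            hol[winner] = rng.randrange(n_ports)
    return delivered / (n_ports * n_slots)
```

For small N the estimate is noticeably higher (0.75 for N = 2), and it decreases toward $2 - \sqrt{2}$ as N grows.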
In the lookahead scheduling algorithm, each input port has a single queue with B buffers, and each buffer can accommodate one packet (as shown in Fig. 1). When a packet arrives, it is immediately scheduled for conflict-free transmission up to B time slots in advance. If the scheduling attempt fails, the packet is immediately discarded (instead of being stored in the buffer) to leave room for packets arriving later. It is shown in [7] that the computational complexity of the lookahead scheduling algorithm is lower than that of the SDR [8], MRS [6] and RS [9] algorithms. However, the packet loss performance of lookahead scheduling was not considered in [7]. Unlike conventional scheduling algorithms [6], [8], [9], a switch using lookahead scheduling has a fixed buffer size at each input port. Therefore, packets can be lost due to buffer overflow even under light input load.
In this paper, an analytical model for the performance evaluation of the lookahead scheduling algorithm is constructed, and analytical expressions for the switch throughput, packet loss probability and mean packet delay are derived. In the next section, the lookahead algorithm is summarized. In Section III, a pipeline implementation of lookahead scheduling is described. In Section IV, the analytical model for evaluating the switch performance is constructed. The analytical results are then compared with simulation results in Section V. Finally, the paper is concluded in Section VI.
II. Lookahead Scheduling Algorithm
A. Data Structures
B. Packet Transmission and Packet Scheduling
In each time slot, the lookahead scheduling algorithm consists of two steps: packet transmission and packet scheduling. These two steps are summarized by the following pseudo-code. Packet transmission takes place column-wise in a cyclic fashion: if column j is the transmitting column in the current time slot, then column (j mod B) + 1 becomes the transmitting column in the next time slot. For packet scheduling, the key is to ensure that packets stored in the same buffer column are destined for different output ports. This task is carried out by each input port processor. Let the current transmitting column be j.
To schedule a packet arriving at input i with destination n, the input port processor searches for a buffer in which to place the packet, following the column sequence {j + 1, j + 2, ..., B, 1, 2, ..., j}. In the worst case, the last column to be searched is the transmitting column of the current slot. (We assume that all buffers in the transmitting column are available for storing packets by the end of a searching pass.)
Packets arriving at the same input port may not be served on a FCFS (first-come-first-served) basis. However, this does not cause the well-known packet out-of-sequence problem for any end-to-end connection, because packets at the same input that are destined for the same output port (which include all packets of a given end-to-end connection) are served on a FCFS basis. Therefore, packet sequence integrity is maintained. To maintain input port access fairness, the first port to be served in each time slot is cyclically rotated: if in the current slot we start serving from input port i, in the next slot we start from input port (i mod N) + 1.
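The transmission and scheduling steps described above can be sketched as a small slot-level simulator. This is an illustrative sketch under stated assumptions, not the implementation from [7]: arrivals are Bernoulli with probability `load`, destinations are uniform, columns are 0-indexed, and the names `LookaheadSwitch` and `step` are hypothetical. A packet placed d columns ahead of the transmitting column leaves d slots later, so every accepted packet waits between 1 and B slots.

```python
import random


class LookaheadSwitch:
    """Hypothetical sketch of lookahead scheduling for an N x N switch.

    buf[i][c] holds the destination of the packet in buffer (i, c), or None.
    Columns are 0-indexed here, whereas the text numbers them 1..B.
    """

    def __init__(self, n_ports, depth, seed=0):
        self.n = n_ports
        self.b = depth
        self.buf = [[None] * depth for _ in range(n_ports)]
        self.tx_col = 0   # current transmitting column j
        self.first = 0    # first input served in this slot (rotates for fairness)
        self.rng = random.Random(seed)
        self.arrived = self.lost = self.sent = self.delay_sum = 0

    def _column_has_output(self, col, out):
        # True if some packet already stored in column col goes to output out.
        return any(self.buf[i][col] == out for i in range(self.n))

    def step(self, load):
        j = self.tx_col
        # Step 1, packet transmission: the whole column j leaves the switch.
        for i in range(self.n):
            if self.buf[i][j] is not None:
                self.sent += 1
                self.buf[i][j] = None
        # Step 2, packet scheduling: inputs are served in rotating order.
        for k in range(self.n):
            i = (self.first + k) % self.n
            if self.rng.random() >= load:
                continue                        # no arrival at input i
            self.arrived += 1
            out = self.rng.randrange(self.n)    # uniform destination
            # Search columns j+1, j+2, ..., ending with column j itself.
            for d in range(1, self.b + 1):
                col = (j + d) % self.b
                if self.buf[i][col] is None and not self._column_has_output(col, out):
                    self.buf[i][col] = out
                    self.delay_sum += d         # transmitted d slots from now
                    break
            else:
                self.lost += 1                  # no room in the next B slots
        self.tx_col = (j + 1) % self.b
        self.first = (self.first + 1) % self.n
```

After many slots, `lost / arrived` estimates the packet loss probability, `sent / (n * slots)` approximates the per-port throughput, and `delay_sum / (arrived - lost)` gives the mean delay of accepted packets. Checking every stored packet in a column is O(N) per buffer check; a per-column set of occupied outputs would make it O(1), at the cost of extra bookkeeping.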
III. Pipeline Implementation
Each input port has an input port processor to carry out the buffer checking function. Each input port processor must be able to perform at least N + 1 buffer checks per time slot. This ensures that the buffers checked in the same minislot lie in different buffer columns; otherwise, packet output conflicts may occur. Without loss of generality, let each time slot be divided into N + 1 minislots, with one buffer check completed within a minislot. An example of a time slot consisting of 4 minislots is shown in Figure 2(a). At the beginning of a minislot, each input port processor checks a particular buffer for possible packet placement if there is a waiting packet.
Global Telecommunications Conference -Globecom' 99
An example is shown in Figures 2(b)-(k) for a 3 × 3 switch. The current transmitting buffer column, marked by an arrow, is column 1 in time slot k. At the beginning of time slot k, all packets in the transmitting column are switched to their respective outputs. Let the order for packet scheduling be (input 1, input 2, input 3). In the first minislot, the packet at input 1 is checked for placement at buffer (1,2), as indicated by the slanted line in Figure 2(c). Since this buffer is occupied, the packet is passed on to buffer (1,3) at the end of minislot 1. In minislot 2 (Figure 2(d)), the packet at input 1 is checked for placement at buffer (1,3) and the packet at input 2 is checked for placement at buffer (2,2) simultaneously (i.e., pipeline operation). The packet at input 2, destined for output 3, is successfully placed (as indicated by the italic number). In minislot 3 (Figure 2(e)), the packet at input 1 is checked for placement at buffer (1,4) and the packet at input 3 is checked for placement at buffer (3,2). Both requests are rejected because the two associated buffers are occupied. Then in minislot 4 (Figure 2(f)), the packet at input 1 is successfully placed into buffer (1,5).
Let the same order (input 1, input 2, input 3) be followed for packet scheduling in time slot k + 1. At the beginning of time slot k + 1 (Figure 2 For input port access fairness, the input port serving order should be rotated from one time slot to the next. It can be shown that the pipeline operation remains valid in this case. From the above example, we can see that with proper synchronization among input port processors, packet placement can be carried out as a pipeline operation. The scheduling speed of this pipeline implementation is independent of the buffer size B at each input port. That is, the scheduling complexity does not grow with the buffer size as long as each input port processor can perform at least N + 1 buffer checks in each time slot.
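The pipeline described in this section can be viewed as a wavefront schedule, which makes it easy to see why buffers checked in the same minislot never share a column. In the hedged sketch below, the p-th input in the serving order (0-indexed) starts in minislot p + 1 and, in minislot m, checks the column m − p positions ahead of the transmitting column. Distinct inputs therefore have distinct offsets in every minislot. The sketch covers only one time slot (the continuation of a search into slot k + 1 is not modeled), lists potential checks even for inputs that have already placed their packet, and uses 0-indexed columns; the function name `pipeline_schedule` is hypothetical.

```python
def pipeline_schedule(n_ports, depth, tx_col):
    """List the (input position, column) checks made in each minislot.

    Hypothetical simplification of the pipeline: N + 1 minislots per slot,
    and the p-th input in the serving order checks the column that is
    (m - p) positions ahead of tx_col in minislot m (columns 0-indexed).
    """
    schedule = []
    for m in range(1, n_ports + 2):            # minislots 1 .. N + 1
        checks = []
        for p in range(n_ports):               # position in the serving order
            offset = m - p                     # columns ahead of tx_col
            if 1 <= offset <= depth:
                checks.append((p, (tx_col + offset) % depth))
        schedule.append(checks)
    return schedule
```

With `pipeline_schedule(3, 5, 0)`, column 0 plays the role of the transmitting column 1 in the 3 × 3 example, so input 1 (position 0) checks columns 1, 2, 3 and 4 (the text's columns 2 to 5) in minislots 1 to 4, while inputs 2 and 3 trail one and two minislots behind, as in Figures 2(c)-(f). Each processor makes at most N + 1 checks per slot whatever the value of `depth`, which matches the claim that scheduling speed does not depend on B.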
IV. Analytical Model
Assume that the packets arriving at each input in each time slot follow an independent Bernoulli process with probability λ that a packet arrives, and let the destination of a packet be uniformly distributed over all outputs. An exact analysis of the lookahead scheduling algorithm is intractable because of the large number of state random variables involved. An approximate model that assumes a fixed serving priority is therefore adopted in this paper. Under the fixed serving priority assumption, input i is always served before input j if i < j. For a homogeneous traffic system such as the one considered here, the packet scheduling priority has no bearing on the switch throughput. Without loss of generality, let input 1 always have the highest scheduling priority. For simplicity, let the transmitting column always be represented by column 1¹. Let $f_{ij}$ be the occupancy of buffer $(i, j)$ and $F_j$ be the average buffer occupancy of all column-$j$ buffers. We have … for $i = 1, 2, \ldots, N$ and $j = 2, 3, \ldots, B$. Note that we have assumed that column 1 is always the transmitting column. In other words, $f_{ij}$ in the next time slot is given by $f_{ij} = f_{i,j+1}$.

¹One can think that at the beginning of each time slot, we relabel column j + 1 as column j for j < B and column 1 as column B.

A packet cannot be placed into a buffer if (i) that buffer is occupied, or (ii) the buffer is not occupied but another packet in the same buffer column is destined for the same output. When the second situation occurs, the destination of the new packet depends on the outputs of the packets in the current buffer column. It is important to model this dependency of packet output addresses among different buffer columns. Two quantities are defined for this purpose:
$F_j^{(k)}$: the average buffer occupancy in column j for storing packets with the same output addresses as those in column k, where k < j. Similarly, we define $F_j^{(k)}(i)$ as the average buffer occupancy in column j for storing packets with the same output addresses as those in column k before buffer (i, j) is considered for packet placement.
$b_j^{(i)}$: the probability that a packet being considered for placement at buffer (i, j) has an output conflict with a stored packet in column j (regardless of whether buffer (i, j) is empty).
Therefore, we have
where $F_4^{(3)(2)}(i)$ is the average buffer occupancy in column 4 for storing packets with the same output addresses as those in columns 3 and 2, before the packet (if any) arriving at input i is considered for placement.
Let us focus on several typical cases to explain the above set of equations. Consider $b_{i2}$, the probability that a packet cannot enter buffer (i, 2). The first term on the right-hand side of the equation is the probability that buffer (i, 2) is occupied. The second term is the probability that the packet causes an output contention with one of the existing packets in column 2, given that buffer (i, 2) is empty.
Consider $b_{i3}$. A packet is considered for buffer (i, 3) only if it could not be placed into buffer (i, 2), which happens in two situations. First, buffer (i, 2) is occupied. Second, buffer (i, 2) is empty but the packet has an output contention with another packet in column 2. In case 1 (with probability $f_{i2}/b_{i2}$), the packet's output address is uniformly distributed over all N outputs, so the probability that the packet has an output contention with a packet in column 3 is $F_3$. In case 2 (with probability $(b_{i2} - f_{i2})/b_{i2}$), the packet's output address is uniformly distributed over the set of output addresses occupied by packets in column 2. Thus the probability that the packet has an output contention with some packet in column 3 is $F_3^{(2)}/F_2$.
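The two-case argument for $b_{i3}$ can be evaluated numerically as below. This is a hedged sketch, not the paper's equation set: the function names are hypothetical, the column-2 conflict probability is passed in as a parameter rather than derived, and including the term $f_{i3}$ (buffer (i, 3) itself occupied) in $b_{i3}$ is our assumption by analogy with $b_{i2}$. The case weights $f_{i2}/b_{i2}$ and $1 - f_{i2}/b_{i2}$ are conditional on the packet having failed at buffer (i, 2).

```python
def blocking_col2(f_i2, conflict2):
    """b_i2: buffer (i,2) is occupied, or it is empty but a column-2
    packet already goes to the same output (prob. conflict2)."""
    return f_i2 + (1.0 - f_i2) * conflict2


def conflict_col3(f_i2, b_i2, F2, F3, F3_given2):
    """Conflict probability in column 3 for a packet that failed at (i,2).

    Case 1 (prob. f_i2 / b_i2): buffer (i,2) was occupied, so the
    destination is still uniform and the conflict probability is F3.
    Case 2 (prob. 1 - f_i2 / b_i2): the destination is one of the
    column-2 outputs, so the conflict probability is F3_given2 / F2.
    """
    p_case1 = f_i2 / b_i2
    return p_case1 * F3 + (1.0 - p_case1) * (F3_given2 / F2)


def blocking_col3(f_i3, conflict3):
    """Assumed form of b_i3, by analogy with b_i2."""
    return f_i3 + (1.0 - f_i3) * conflict3
```

For example, with $f_{i2} = 0.5$ and a column-2 conflict probability of 0.4, `blocking_col2` gives 0.7. With $F_2 = 0.5$, $F_3 = 0.3$ and $F_3^{(2)} = 0.1$, the column-3 conflict probability is $0.19/0.7$, and with $f_{i3} = 0.3$ the blocking probability at column 3 is 0.49. These values are made-up illustrative inputs, not results from this paper.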
The expressions for $b_{ij}$ with $j \ge 4$ can be obtained similarly, but the complexity involved in solving them is very high. To simplify the analysis, we substitute
²From our assumption that input 1 is always served with the highest priority, buffer (1,1) will always be empty and $b_{11} = 0$. In other words, packets arriving at input 1 will never be discarded. This, however, does not affect our subsequent derivations, as we are only interested in the average switch performance, not that of a particular port.
The mean packet delay is then $\sum_{j} j\,(F_j - F_{j+1})/F_1$ time slots, where $(F_j - F_{j+1})/F_1$ is the probability that a packet needs to wait for j time slots. With lookahead scheduling, a packet can be delayed for at most B slots.
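Since the text gives $(F_j - F_{j+1})/F_1$ as the probability that a packet waits j time slots, the mean delay is the expectation of j under that distribution. The sketch below computes it from a list of column occupancies $[F_1, \ldots, F_B]$. It assumes $F_{B+1} = 0$ (no packet waits longer than B slots), which is exactly what makes the probabilities telescope to 1. The function name `mean_delay` and the input values are hypothetical.

```python
def mean_delay(F):
    """Mean delay sum_j j * (F_j - F_{j+1}) / F_1 for F = [F_1, ..., F_B].

    Assumes F_{B+1} = 0, i.e., no packet waits more than B slots.
    Returns the mean delay and the list of wait-time probabilities.
    """
    padded = list(F) + [0.0]                   # append F_{B+1} = 0
    probs = [(padded[j] - padded[j + 1]) / padded[0] for j in range(len(F))]
    mean = sum((j + 1) * p for j, p in enumerate(probs))
    return mean, probs
```

Under the same assumption, the sum telescopes to $(F_1 + F_2 + \cdots + F_B)/F_1$. For the illustrative occupancies [0.8, 0.6, 0.3], both forms give 1.7 / 0.8 = 2.125 slots.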
The packet loss probability of the switch is given by 
V. Performance Evaluations
The packet loss probability, switch throughput and mean packet delay are studied by both simulation and analysis in this section. We focus on a 10 × 10 switch with buffer sizes B = 3 and B = 10. Fig. 3 shows the packet loss probability $P_{\mathrm{loss}}$ against the input load λ. When the input load is light, the analytical model only slightly underestimates the packet loss probability of the switch. At λ = 0.6 and B = 10, $P_{\mathrm{loss}} = 5.22 \times 10^{-\ldots}$ from simulation and $3.41 \times 10^{-\ldots}$ from analysis. At the same load with B = 3, $P_{\mathrm{loss}} = 6.4 \times 10^{-\ldots}$ from simulation and $4.07 \times 10^{-2}$ from analysis. Fig. 4 shows the mean packet delay versus input load. The simulation and analytical results agree very well. At λ = 0.8, the mean packet delay is 4.55 time slots from simulation and 4.91 slots from analysis.
VI. Conclusions
In this paper, an analytical model was proposed for the performance evaluation of an efficient packet scheduling algorithm, called the lookahead scheduling algorithm. Analytical expressions for packet loss probability, throughput and mean packet delay were derived. The analytical results were compared with simulation results, and we found that the proposed model is very accurate in predicting the performance of the lookahead scheduling algorithm.
