Abstract-This paper proposes two cell dispatching algorithms for the input-queuing space-memory-memory (IQ-SMM) Clos-network to reduce out-of-sequence (OOS) cells in multicast traffic. The frequent connection pattern changes of DSRR result in a severe OOS problem. Based on the principle of DSRR, MF-DSRR reduces OOS but still suffers from it under high traffic load. MFRR maintains the connection pattern separately for each input and eliminates in-packet OOS, thereby significantly reducing the reassembly buffer size and delay.
II. MODEL OF THE SPACE-MEMORY-MEMORY CLOS-NETWORK
The Clos-network switch consists of three stages of switching elements (SE) and is denoted as C(n, m, k). The switch model has k input/output modules (IM/OM) of size n × m, and m central modules (CM) of size k × k, as shown in Fig. 1. Each IM/OM has n connections to line cards and m interstage connections to CMs. There exists only one interstage connection between an IM/OM and a CM. The number of input/output ports of the switch is N = nk. We assume that a first-in-first-out (FIFO) queue, where multicast packets are temporarily stored, is installed before each input of the IMs. Variable-length packets are assumed to be segmented into fixed-size cells before entering the IMs, and to be reassembled after traversing the three stages. We also assume that each packet carries a fan-out vector b = ⟨b_i⟩, b_i ∈ {0, 1}, 1 ≤ i ≤ N, where b_i = 1 indicates that the packet is bound for the i-th output, and b_i = 0 otherwise. Cells generated from the same packet have the same fan-out vector. The FIFO queue is assumed to be able to examine the fan-out vector of each packet and inform the switch fabric of any fan-out change. IMs forward incoming cells to CMs according to cell dispatching algorithms. Since the IQ-SMM architecture is used, CMs 
III. CELL DISPATCHING ALGORITHM
As discussed in [9], DSRR runs independently in each IM and connects each input to all outputs in a round-robin fashion. The connection pattern changes after each cell time, as shown in Fig. 2, resulting in a well-balanced distribution of cells to the CMs. However, with the IQ-SMM architecture shown in Fig. 1, this causes a serious OOS problem, since consecutive cells of a packet are sent through different CMs. Treating incoming cells independently is not optimal, because same-packet cells should be kept in sequential order.
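The per-cell rotation described above can be illustrated with a minimal sketch. It assumes zero-based indices and a rotation of the form (input + cell time) mod (number of IM outputs); this formula and the names `dsrr_output` and `dispatch_packet` are hypothetical and chosen for illustration only, and the exact DSRR definition is the one given in [9].

```python
from typing import List


def dsrr_output(input_idx: int, cell_time: int, num_outputs: int) -> int:
    """Output (i.e., CM link) used by an input at a given cell time.

    Assumed rotation: every input advances by one output per cell time,
    and distinct inputs start at distinct offsets (desynchronized).
    """
    return (input_idx + cell_time) % num_outputs


def dispatch_packet(input_idx: int, start_time: int,
                    num_cells: int, num_outputs: int) -> List[int]:
    """Outputs visited by the consecutive cells of one packet."""
    return [dsrr_output(input_idx, start_time + c, num_outputs)
            for c in range(num_cells)]


if __name__ == "__main__":
    # A 5-cell packet on input 0 of a 4 x 6 IM is spread over 5 different CMs.
    print(dispatch_packet(input_idx=0, start_time=0, num_cells=5, num_outputs=6))
```

Under this assumed rotation, every cell of a packet takes a different CM, which balances the load but exposes the cells to different queuing delays in the buffered CMs.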
A. Multicast flow-based DSRR (MF-DSRR)
Instead of changing the connection pattern after each cell time as DSRR does, MF-DSRR modifies the IM connection pattern only when a change in the received fan-out vector is detected. Take a 4 × 6 IM for example, the connection pattern of can initially be ) and so on.
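A minimal sketch of this rule follows. Several details are assumptions made for illustration and are not taken from the paper: indices are zero-based, the whole IM pattern is a common shift applied to all inputs, the pattern advances at most once per cell time, and the first fan-out vector seen on an input does not count as a change. The class name `MFDSRRModule` is hypothetical.

```python
from typing import List, Optional, Tuple

FanOut = Tuple[int, ...]  # fan-out vector of the cell's packet


class MFDSRRModule:
    """One IM whose shared connection pattern shifts on a fan-out change."""

    def __init__(self, num_inputs: int, num_outputs: int) -> None:
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        self.shift = 0  # common rotation applied to every input
        self.last_fanout: List[Optional[FanOut]] = [None] * num_inputs

    def dispatch(self, cells: List[Optional[FanOut]]) -> List[Optional[int]]:
        """Take one cell (or None if idle) per input; return the output used."""
        changed = False
        for i, fanout in enumerate(cells):
            if fanout is None:
                continue
            previous = self.last_fanout[i]
            if previous is not None and fanout != previous:
                changed = True  # some input started a new multicast flow
            self.last_fanout[i] = fanout
        if changed:
            # Assumption: one shift per cell time, however many inputs changed.
            self.shift = (self.shift + 1) % self.num_outputs
        return [
            None if fanout is None else (i + self.shift) % self.num_outputs
            for i, fanout in enumerate(cells)
        ]


if __name__ == "__main__":
    im = MFDSRRModule(num_inputs=2, num_outputs=3)
    a, b = (1, 0), (0, 1)
    for cells in ([a, a], [a, b], [a, b]):
        print(im.dispatch(cells))
```

In the usage example, input 0 keeps sending cells of the same packet, yet its output moves when input 1 detects a fan-out change. Because the pattern is shared by the whole IM, a change on one input can still split another input's packet across CMs.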
B. Multicast flow-based round robin (MFRR)
MFRR runs independently in each IM. Each input monitors changes in the received fan-out vectors. The AvailableList records the idle outputs as its elements, so the number of elements equals the number of IM outputs minus the number of IM inputs. Elements can only be popped from the top and inserted at the bottom. When an input detects a change of fan-out vector, one output is popped from the AvailableList and the connection of that input is changed to that output. Meanwhile, the output released by that input is inserted at the bottom of the list. If several inputs detect changes of fan-out vectors at the same time, ties are broken randomly. Considering the 4 × 6 IM, the four inputs can initially be connected to outputs 1 to 4, with outputs 5 and 6 in the AvailableList, as shown in Fig. 4.
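The AvailableList mechanism can be sketched with a double-ended queue. The sketch assumes zero-based indices (so outputs 5 and 6 of the example become indices 4 and 5), an initial connection of input i to output i, and that the first fan-out vector seen on an input does not count as a change. The class name `MFRRModule` and its methods are hypothetical names used for illustration.

```python
import random
from collections import deque
from typing import Deque, List, Optional, Tuple

FanOut = Tuple[int, ...]  # fan-out vector of the cell's packet


class MFRRModule:
    """One IM in which every input keeps its own connection."""

    def __init__(self, num_inputs: int, num_outputs: int,
                 rng: Optional[random.Random] = None) -> None:
        # Assumed initial pattern: input i is connected to output i.
        self.connection: List[int] = list(range(num_inputs))
        # Idle outputs; popped from the top (left), inserted at the bottom (right).
        self.available: Deque[int] = deque(range(num_inputs, num_outputs))
        self.last_fanout: List[Optional[FanOut]] = [None] * num_inputs
        self.rng = rng or random.Random()

    def dispatch(self, cells: List[Optional[FanOut]]) -> List[Optional[int]]:
        """Take one cell (or None if idle) per input; return the output used."""
        changed = [
            i for i, fanout in enumerate(cells)
            if fanout is not None
            and self.last_fanout[i] is not None
            and fanout != self.last_fanout[i]
        ]
        self.rng.shuffle(changed)  # simultaneous changes: ties broken randomly
        for i in changed:
            if not self.available:
                continue  # no idle output exists, so the connection is kept
            new_output = self.available.popleft()
            self.available.append(self.connection[i])  # release the old output
            self.connection[i] = new_output
        for i, fanout in enumerate(cells):
            if fanout is not None:
                self.last_fanout[i] = fanout
        return [
            None if fanout is None else self.connection[i]
            for i, fanout in enumerate(cells)
        ]


if __name__ == "__main__":
    im = MFRRModule(num_inputs=4, num_outputs=6)
    a, b = (1, 0), (0, 1)
    for cells in ([a, a, a, a], [a, b, a, a], [a, b, a, a]):
        print(im.dispatch(cells), list(im.available))
```

Only the input that detects the fan-out change moves to a new output. The other inputs keep their connections, so the cells of their ongoing packets continue to use the same CM.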
IV. ANALYSIS AND SIMULATION RESULTS

A. Out-of-Sequence Probability Analysis
Assume the traffic to each , is an i.i.d. Poisson arrival process with arrival rate of . Variable-length packets are segmented into fixed-size cells before entering IMs, where is a random variable uniformly distributed with ( ) =¯. Each packet is bound for a destination with a probability of , i.e. ( = 1) = . Since a packet is bound for at least one destination, the fan-out ≜ ∑ , ∀ has ( = ) = ( ) ( and ( ) =
, where ≜ ∑ |c | , ∀ . All traffic is admissible, which means no input or output port is oversubscribed. The total traffic load of all outputs is ( ), and the offered load seen on each output is ( ). For , under the MF-DSRR scheme, the probability of a connection pattern change is:
where˙= ( − 1) , = 0, 1, 2, . . . is the number of connection pattern changes, and¯is the mean cell time for , to complete the transmission of a packet. Since the connection pattern resumes after m changes, we thus have the probability that same-packet cells are distributed to different CMs: 
B. Simulation Results
Comparisons among Static, DSRR, MF-DSRR, and MFRR in a (4, 7, 4) IQ-SMM switch are carried out in OPNET Modeler [10]. The scheduling algorithm proposed in [2] is used in the CMs and OMs. The Static scheme does not change the connection patterns in IMs and is thus used as a reference. Admissible traffic with ( ) = 4 and¯= 12 is provided to each input. Fig. 5 compares the percentage of total OOS cells, including both inter-packet and in-packet OOS, among all the cells received. As for the in-packet OOS cells, DSRR performs the worst, causing many in-packet OOS cells due to its frequent connection pattern changes. MFRR has no in-packet OOS cells and outperforms the other two. MF-DSRR can reduce the in-packet OOS but cannot completely eliminate it, due to the non-zero probability in Eq. 2 that cells of the same packet are distributed to different CMs. The three schemes have similar performance in inter-packet OOS, but MFRR still outperforms the others. Fig. 6 compares the reassembly buffer size. MFRR significantly reduces the buffer size and performs close to Static. Fig. 7 shows the cell delays. DSRR outperforms all the others because of its load-balancing feature, which evenly distributes cells to the CMs. However, DSRR can cause a serious reassembly delay, which is about 75% of the mean packet transmission time under high load, as illustrated in Fig. 8. MFRR and MF-DSRR both reduce the reassembly delay, and MFRR has the best performance.
V. CONCLUSION
This paper proposes an input-queuing space-memory-memory (IQ-SMM) Clos-network architecture for multicast with two cell dispatching schemes, MF-DSRR and MFRR. MF-DSRR has a low implementation complexity. MFRR requires independent controllers on each input and achieves a low complexity by proper design. Simulation results show that MF-DSRR is able to reduce OOS, which is a serious problem with DSRR, but still suffers from it under high traffic load. MFRR eliminates the in-packet OOS and thus significantly reduces the reassembly buffer size and delay.
