Multicast cross-path ATM switches: principles, designs and performance evaluations. by Lin, Hon Man. & Chinese University of Hong Kong Graduate School. Division of Information Engineering.
MULTICAST CROSS-PATH ATM SWITCHES: 
PRINCIPLES, DESIGNS AND PERFORMANCE 
EVALUATIONS 
BY 
LIN HON MAN
A THESIS 
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS 
FOR THE DEGREE OF MASTER OF PHILOSOPHY
DIVISION OF INFORMATION ENGINEERING
THE CHINESE UNIVERSITY OF HONG KONG
MAY 1998
Acknowledgement 
I would like to express my gratitude to my supervisor Professor Tony T. Lee for 
his continuous support, guidance and invaluable suggestions during my studies 
at CUHK. 
Especially I would like to thank Dr. Cheuk-hung Lam for his insightful 
suggestions on my work and thesis. 
I also wish to thank members of the Broadband Communications Laboratory, 
Mr. Patrick C.K Wu and Mr. Walter C.W Fung for their help on computer 
systems, Mr. Ngai Li, Mr. Soung-yue Liew, Mr. Oo Tang, Mr. Hanford Chan, 
Dr. Philip To, Dr. Cathy Chan, Mr. Thomas Kwok, Mr. Alan Yeung, Mr. 
Vincent Tse and Ms. Sze-wan Cheng for their various kinds of help. 
Abstract 
The cross-path switch, a newly proposed large-scale ATM switch, is based on
the three-stage Clos network and has been shown to be capable of handling
unicast and multirate traffic efficiently. In this thesis, we adopt two replica-
tion approaches to enhance the switch to support multicast switching. The first 
scheme replicates multicast cells at both input and output stages, while the sec-
ond one replicates cells at the input stage only. A feasible configuration for each 
scheme is considered and the effect of multicast traffic on switch performance in 
terms of throughput, delay and loss probability is studied. We observed that, 




1 Introduction 1 
1.1 Organization of Thesis 3 
2 Principles of Multicast Cross-Path Switches 4 
2.1 Introduction 4 
2.2 Unicast Cross-Path switch 5 
2.2.1 Routing properties in Clos networks 5 
2.2.2 Quasi-static routing procedures 5 
2.2.3 Capacity and Route Assignment 7 
2.3 Multicast Cross-Path Switch 8 
2.3.1 Scheme 1 - Cell replication performed at both input and 
output stages 10 
2.3.2 Scheme 2 - Cell replication performed only at the input 
stage 10 
3 Architectures 14 
3.1 Introduction 14 
3.2 Input Module Design (Scheme 1) 16 
3.2.1 Input Header Translator 16 
3.2.2 Input Module Controller 17 
3.2.3 Input Replication Network (Scheme 1) 19 
3.2.4 Routing Network 23 
3.3 Central Modules 24 
3.4 Output Module Design (Scheme 1) 24 
3.5 Input Module Design (Scheme 2) 25 
3.5.1 Input Header Translator (Scheme 2) 26 
3.5.2 Input Module Controller (Scheme 2) 27 
3.5.3 Input Replication Network (Scheme 2) 28 
3.6 Output Module Design (Scheme 2) 29 
4 Performance Evaluations 31 
4.1 Introduction 31 
4.2 Traffic characteristics 31 
4.2.1 Fanout distribution 31 
4.2.2 Middle stage traffic load and its calculation 32 
4.3 Throughput Performance 34 
4.4 Delay Performance 37 
4.4.1 Input Stage Delay 38 
4.4.2 Output Stage Delay 39 
4.5 Cell Loss Performance 43 
4.5.1 Cell Loss due to Buffer Overflow 44 
4.5.2 Cell Loss Due to Output Contention . . . 45 
4.6 Complexities 50 
5 Conclusions 57 
Bibliography 59 
List of Tables 
2.1 Latin Square Assignment 8 
3.1 Local Routing Table for Input Module 0 18 
3.2 Trunk Number Translator (TNT) Table (Scheme 1) 19 
3.3 Trunk Number Translator (TNT) Table (Scheme 2) 28 
4.1 The Middle-Stage Traffic Load in Multicast Cross-path Switches 
for different Fanout Distributions 34 
4.2 Scheme 1: Loss Probabilities Due to Output Buffer Overflow,
ρ=0.9, E[Y]=8 45
4.3 Scheme 2: Loss Probabilities Due to Output Buffer Overflow,
ρ=0.9, E[Y]=8 45
4.4 The Statistics of the Number of Packets Leaving from an Input 
Module for an Output Module 47 
List of Figures 
2.1 A three-stage Clos network 6 
2.2 Correspondence between middle-stage connection pattern in a 
Clos network and edge-coloring of regular bipartite multigraph . 7 
2.3 A counter example shows that replication is not appropriate in 
central modules of a cross-path switch 9 
2.4 Two multicast schemes in cross-path switch 11 
3.1 The Operations of Multicast Cross-Path Switches 15 
3.2 Header Translator 1 and the Lookup Table 17 
3.3 Routing Example (Scheme 1) 18 
3.4 Look-ahead Selection of IMC (Scheme 1) 19 
3.5 A Copy Network 20 
3.6 Operations of RAN, Dummy Address Encoders and RBN . . . . 21 
3.7 Broadcast Banyan Network and Trunk Number Translator . . . 23 
3.8 Batcher and Banyan Network 23 
3.9 Central Module: Benes Network 24 
3.10 Output Module Design (Scheme 1) 25 
3.11 Knockout Concentrator 26 
3.12 Look-ahead Selection of IMC (Scheme 2) 27 
3.13 Routing Example (Scheme 2) 28 
3.14 A Copy Network 29 
3.15 Output Module Design (Scheme 2) 30 
4.1 Throughput increase with expansion factor m/n 36 
4.2 Throughput increase with mean fanout number E[Y] 37 
4.3 Throughput varying with fanout distribution in scheme 1 . . . . 38 
4.4 Throughput varying with fanout distribution in scheme 2 . . . . 39 
4.5 Input Stage Delay vs. Expansion Factor, E[Y]=8, w=8 40 
4.6 Logical framework of an output-buffered Multicast Switch . . . 41 
4.7 Discrete time Markov chain state transition diagram for Q^ . . 42 
4.8 Output Stage Delay vs. Virtual Load, E[Y]=8 44 
4.9 Buffer Overflow Loss Probability vs. Virtual Load, m=36 (Scheme 1) 53
4.10 Buffer Overflow Loss Probability vs. Virtual Load, m=48 (Scheme 2) 53
4.11 Loss probability decrease with group size G in scheme 1 54 
4.12 Loss probability decrease with group size G in scheme 2 54 
4.13 Loss probability increase with expansion factor m/n 55 
4.14 The little change of loss probability with mean fanout E[Y] . . . 55 
4.15 The Complexity Comparison between two Multicast Schemes for 
Different throughput requirement 56 
4.16 The Complexity Comparison between two Multicast Schemes for 




Over the past decades, advances in computer and data communication
technologies for information delivery have created an exponential increase in
bandwidth demand across services such as WebTV, Video-On-Demand and video
conferencing. Existing data networks cannot deliver data fast enough at low
cost, and they offer no quality of service (QoS) guarantee. With limited
bandwidth resources, one possible way to tackle the problem is to make the
systems more efficient in handling various kinds of services. In 1988, the
Asynchronous Transfer Mode (ATM) technology was selected by ITU-TSS as the
transfer medium of B-ISDN. ATM uses a new cell switching technology and
consolidates various information service types into streams of fixed-length
53-byte cells routed across different networks. It is expected that this cell
switching technology will combine the advantages of circuit switching in
providing constant bit rate service for voice and still images with the
advantages of packet switching for variable rate services such as data and
full-motion video, in an inexpensive, efficient and unified transfer mechanism.
Chapter 1 Introduction 
ATM switch design plays an important role in the success of ATM networks.
For many years, the Clos network [4] has been used as the framework of many
ATM switches. However, one major problem that needs to be solved is cell route
assignment. Existing serial routing algorithms require a central controller to
assign the path for each connection. They cannot be applied directly in a
dynamic cell switching environment, where all the routes have to be assigned
on-the-fly by the central controller, because the high computation power
required renders such an approach impractical for large-scale switches. To
address this, a new quasi-static routing scheme named Path Switching and the
corresponding network named Cross-Path Switch [36] have been proposed. In this
switch, the central controller is used only to determine a set of periodic
connection patterns at the central stage, and routing duties at the other
stages are distributed.
In handling point-to-multipoint traffic, a switch with multicast capability
achieves a higher efficiency by minimizing the number of separate copies sent
from the source to the receivers. Multicast switch design and multicasting of
real-time multimedia data have recently been receiving a great deal of attention
[38, 17, 10, 18].
It has been shown that the cross-path switch can efficiently handle unicast
and multirate traffic. Our motivation is to enhance this switch with multicast
capabilities [30]. In this thesis, we discuss the procedures necessary to
achieve this goal and analyze the performance of the resulting switch.
1.1 Organization of Thesis
In Chapter 2, we start with the principles of unicast cross-path switches,
then discuss two multicast schemes that allow the switch to support multicast
connections, and their effect on the traffic load offered to the central
modules. Chapter 3 discusses the architectures and operation details of the
space-division multicast cross-path switches. Performance evaluations of the
multicast switches are discussed in Chapter 4. Our performance measures include
maximum throughput, cell delay at input and output stages, concentration load
and loss due to buffer overflow. Finally, we present our conclusions in
Chapter 5.
Chapter 2 
Principles of Multicast 
Cross-Path Switches 
2.1 Introduction 
A cross-path switch is a three-stage Clos network [4] with a new quasi-static
routing algorithm called path switching. Its procedures provide flexibility
for regulating the capacities assigned to different virtual paths so as to
satisfy various QoS constraints. A complete discussion on the regulation of
virtual paths can be found in [5]. In the following sections, we review some of
the essential procedures of path switching which are also important for the
multicast extensions.
For every three-stage Clos network [4], there are altogether eight possible
multicast schemes. We find that only two of them can be applied successfully
to perform multicasting without violating the principles of path switching. The
two approaches are similar to those originally proposed in [10] for the growable
packet (ATM) switch [22]. Furthermore, the modifications affect the switching
fabrics only; the other routing procedures remain the same.
2.2 Unicast Cross-Path switch 
2.2.1 Routing properties in Clos networks 
A three-stage Clos network is shown in Fig. 2.1. At the first stage, there are k
input modules of dimension n × m, i.e. each of them has n input links and m
output links. At the central and output stages, there are m central modules and
k output modules of dimensions k × k and m × n respectively. We consider
only symmetric networks, in which the number of input modules equals the number
of output modules. Note that there is a unique link interconnecting every pair
of modules in adjacent stages.
2.2.2 Quasi-static routing procedures 
The Clos network was originally proposed for circuit switching. Existing serial
algorithms are not directly applicable to dynamic cell switching because in that
case the connection pattern changes in every time slot, and all the routes
have to be assigned on-the-fly. This requires a very high-speed central
controller for the switch, especially when its size is large. To solve this
problem, cross-path switches use predetermined, periodic connection patterns
at the central stage and delegate other routing duties to the input and output
Figure 2.1: A three-stage Clos network
modules. By considering each input and output module as a node, the connection
patterns in the middle stage represent the coloring of edges connecting the
corresponding nodes in a bipartite multigraph, as illustrated in Fig. 2.2 [36].
A virtual path between an input module and an output module comprises all
virtual circuits connecting the two modules. The number of edges between node
$I_i$ and node $O_j$ indicates the number of packets that can be routed from
input module i to output module j. This quantity is termed the capacity of the
virtual path connecting this pair of modules. The scheduling of path switching
consists of two steps: capacity assignment and route assignment. Capacity
assignment is the process of finding the capacity of each virtual path that
satisfies its bandwidth requirement, and route assignment converts the
capacities into a desirable periodic connection pattern of the central modules,
by edge-coloring the corresponding regular bipartite multigraph based on the
time-space interleaving principle [36]. The connection pattern will be renewed
if the traffic statistics
Figure 2.2: Correspondence between middle-stage connection pattern in a Clos
network and edge-coloring of regular bipartite multigraph
change significantly. Hence the slot-by-slot route computations for all simulta-
neously incoming packets required by the dynamic cell switched route scheme 
can be avoided. 
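The correspondence between middle-stage connection patterns and virtual-path
capacities can be illustrated with a minimal Python sketch. It assumes, purely
for illustration, that each connection pattern is given as a permutation list
`perm` with `perm[i] == j` meaning that input module i is connected to output
module j; the function name `path_capacities` and this data layout are
hypothetical and are not taken from [36].

```python
from collections import Counter


def path_capacities(patterns):
    """Count how many connections join each (input module, output module) pair.

    `patterns` is a list of permutations, one per central module (or one per
    central module and time slot of a frame). patterns[c][i] == j means that
    pattern c connects input module i to output module j. The count for
    (i, j) is the number of edges between I_i and O_j in the multigraph.
    """
    capacity = Counter()
    for perm in patterns:
        # A k x k central module realizes a one-to-one connection pattern.
        if sorted(perm) != list(range(len(perm))):
            raise ValueError("each pattern must be a permutation")
        for i, j in enumerate(perm):
            capacity[(i, j)] += 1
    return capacity


# Hypothetical example: k = 3 modules per stage, m = 3 central modules.
example_patterns = [[0, 1, 2], [1, 2, 0], [0, 2, 1]]
example_capacity = path_capacities(example_patterns)
print(example_capacity[(0, 0)], example_capacity[(0, 1)], example_capacity[(1, 2)])
```

Because every pattern is a permutation, the capacities leaving any input module
(and entering any output module) always sum to the number of patterns, which is
the regularity property of the bipartite multigraph.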
2.2.3 Capacity and Route Assignment 
The following are the procedures for homogeneous traffic. Suppose the average
amount of traffic from input module $I_i$ to output module $O_j$ is
$\lambda_{ij}$ per time slot. In order to assign a desirable middle-stage
connection pattern, where the number of connections assigned to a virtual path
is always an integer, a frame size f is introduced to extend the route
scheduling of the m central modules in one time slot to the scheduling of a
total of mf "central modules" in f time slots. The frame size f should be
chosen to make g = mf/k an integer, where g denotes the number of central
modules assigned to each virtual path during every f time slots. Then the
desired connection pattern in the central modules is given by the Latin Square
assignment shown in Table 2.1.
Table 2.1: Latin Square Assignment
          O_0      O_1      O_2      ...  O_{k-1}
I_0       A_0      A_1      A_2      ...  A_{k-1}
I_1       A_{k-1}  A_0      A_1      ...  A_{k-2}
...       ...      ...      ...      ...  ...
I_{k-1}   A_1      A_2      A_3      ...  A_0
In the table, each $A_i$, $0 \le i \le k-1$, represents a set of distinct
central modules ranging from 0 to fm − 1, i.e.
\[
\begin{aligned}
A_0 &= \{0, 1, \ldots, g-1\}, \\
A_1 &= \{g, g+1, \ldots, 2g-1\}, \\
&\;\;\vdots \\
A_{k-1} &= \{(k-1)g, (k-1)g+1, \ldots, kg-1\}.
\end{aligned}
\]
The assignment a can be translated to a pair (t, r) by
\[ a = r \times f + t, \]
where r and t are the quotient and the remainder when a is divided by f,
indicating that central module r connects input module i to output
module j in the t-th slot of every frame.
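The Latin Square assignment and the translation of a into (t, r) can be
sketched in a few lines of Python. This is an illustrative sketch only: the
function name `latin_square_assignment` is hypothetical, and the rule that the
entry in row i and column j of Table 2.1 is the set with index (j − i) mod k is
read off the table layout rather than stated explicitly in the text.

```python
def latin_square_assignment(k, m, f):
    """Return {(i, j): [(t, r), ...]} for a k-module, m-central-module switch.

    Each (t, r) pair means: central module r connects input module i to
    output module j in slot t of every frame of f slots.
    """
    if (m * f) % k != 0:
        raise ValueError("f must be chosen so that g = m*f/k is an integer")
    g = (m * f) // k
    schedule = {}
    for i in range(k):
        for j in range(k):
            s = (j - i) % k  # index of the set A_s in row i, column j
            pairs = []
            for a in range(s * g, (s + 1) * g):  # elements of A_s
                r, t = divmod(a, f)  # a = r * f + t
                pairs.append((t, r))
            schedule[(i, j)] = pairs
    return schedule


# Hypothetical parameters: k = 4 modules, m = 2 central modules, frame size f = 2.
for path, pairs in sorted(latin_square_assignment(4, 2, 2).items()):
    print(path, pairs)
```

With this layout, every central module in every slot connects each input module
to a different output module, so each (t, r) pair corresponds to a valid
one-to-one connection pattern.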
2.3 Multicast Cross-Path Switch 
We do not consider replicating cells at the source as true multicasting because
under most situations, it cannot transmit a cell to more than one output port
at the same time. Furthermore, replicating cells at the second stage is not
appropriate for cross-path switches, for the following reasons. Suppose we use
the same procedures as in unicast switches. Since the central connection
patterns are predetermined, a cell cannot replicate itself freely on demand
within the central modules.
Alternatively, we may consider amending the route assignment to include the
replication patterns. In this case, we cannot simply adopt the unicast
procedures, and the matter is further complicated by the exponentially growing
number of possible central connection patterns. In addition, the cell
replication requests may not be in line with the connection patterns. Consider
the scenario depicted in Fig. 2.3. Packet A is destined for output ports 0 and
2. Two alternative paths will take packet A to output port 2: one through port
(a) and central module 0, and the other through port (b) and central module 1.
Obviously, it saves resources to route packet A through port (a) rather than
port (b). This example shows that, if multicasting is allowed at the second
stage, the input module controller needs to take extra considerations into
account when routing the cells.
Figure 2.3: A counter example shows that replication is not appropriate in
central modules of a cross-path switch
Taking the above into consideration, we propose two multicasting schemes
for the cross-path switches. According to the cell-replication capability at the
output stage, the multicast schemes can be classified into the following
categories:
2.3.1 Scheme 1 - Cell replication performed at both input 
and output stages 
As shown in Fig. 2.4a, scheme 1 replicates cells in both input and output 
modules. A cell waiting at the input stage is first replicated and the number 
of copies is equal to the number of destined output modules of the cell. These 
copies will then be routed by the central modules to their corresponding output 
module. Finally, each copy arriving at the last stage will be further replicated 
if it is destined for more than one output link within that output module. 
2.3.2 Scheme 2 - Cell replication performed only at the 
input stage 
In this scheme, cell replication is not allowed at the output stage. Therefore, all 
the replications must be finished within the input modules, as shown in Fig. 2.4b.
In contrast to the first multicast scheme, a packet that has the same destined 
output module but different destined output ports needs to be replicated in 
the first stage and these copies will be routed independently through different 
central modules. 
The major difference between the two multicast schemes lies in the cell
replication capability at the output stage, which results in different traffic
loads in the middle stage. To achieve a given throughput requirement, the
second scheme requires more central modules, while the first scheme suffers
from the large implementation complexity of multicast output modules.
In the original unicast cross-path switch, the capacity allocation is based on 
(a) replicating cells in both input and output stages
(b) replicating cells only in the input stage
Figure 2.4: Two multicast schemes in cross-path switch
the traffic load and the effective bandwidth of each virtual path within the mid-
dle stage of Clos network. The virtual-path traffic load of unicast traffic not 
requiring replication in middle stage is the same as in input stage. However, for 
multicast traffic, the cell fanout distribution and the replication schemes will af-
fect the traffic characteristics in middle stage, resulting in different middle-stage 
traffic load. The capacity allocation procedures for unicast cross-path switch 
also work here. 
Let $i_u$, $o_v$, $I_i$ and $O_j$ be input link u, output link v, input module
i and output module j, respectively. We assume that the traffic load (before
replication) $\rho_r$ in units of cells/slot, the effective bandwidth (before
replication) $\alpha_r$ and the cell fanout number $y_r$ of each call request r
are given at call setup, and that they satisfy the following constraints for
fulfilling the cell-level QoS requirement.
1. Input link load constraint: $\sum_{r \in i_u} \rho_r \le 1$, $\forall u$.
2. Output link capacity constraint: $\sum_{r \in o_v} \alpha_r \le 1$, $\forall v$.
We say that $r \in i_u$ when the call request comes from input link u, and
$r \in o_v$ when one of the destinations of r is output link v.
Define $\lambda_{ij}$ to be the total traffic load of the virtual path
connecting input module i and output module j. For scheme 1, in which a call
request has only one copy routed in the middle stage for each destined output
module,
\[ \lambda_{ij}^{(1)} = \sum_{r \in I_i,\; r \in O_j} \rho_r . \]
Similarly, we say that $r \in I_i$ when r comes from input module i, and
$r \in O_j$ when one of the destinations of r is in output module j. In
contrast, for scheme 2 the traffic load of call request r is expanded $y_r$
times in the middle stage, and we assume that $y_r(i,j)$ of these copies are on
the virtual path connecting $I_i$ and $O_j$. Then the total traffic load of
this virtual path is given by
\[ \lambda_{ij}^{(2)} = \sum_{r \in I_i,\; r \in O_j} \rho_r \cdot y_r(i,j) . \]
Similarly, the aggregate effective bandwidth $\beta_{ij}$ of a virtual path
connecting $I_i$ and $O_j$ in the two multicast schemes is given respectively by
\[ \beta_{ij}^{(1)} = \sum_{r \in I_i,\; r \in O_j} \alpha_r , \qquad
   \beta_{ij}^{(2)} = \sum_{r \in I_i,\; r \in O_j} \alpha_r \cdot y_r(i,j) . \]
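The two load definitions can be compared with a small Python sketch. It is only
an illustration under stated assumptions: links are numbered globally, link x
belongs to module x // n (n links per module), and each call request is a
hypothetical tuple (input link, load per slot, list of destination output
links). The function name `virtual_path_loads` does not come from the thesis.

```python
from collections import defaultdict


def virtual_path_loads(calls, n):
    """Return (lam1, lam2): middle-stage loads per virtual path (i, j).

    lam1 follows scheme 1 (one copy per destined output module);
    lam2 follows scheme 2 (one copy per destined output link).
    """
    lam1 = defaultdict(float)
    lam2 = defaultdict(float)
    for in_link, rho, out_links in calls:
        i = in_link // n  # input module of the call (assumed numbering)
        copies_per_module = defaultdict(int)  # y_r(i, j) for each j
        for v in set(out_links):
            copies_per_module[v // n] += 1
        for j, y_ij in copies_per_module.items():
            lam1[(i, j)] += rho
            lam2[(i, j)] += rho * y_ij
    return dict(lam1), dict(lam2)


# Hypothetical traffic: n = 4 links per module.
calls = [(0, 0.5, [2, 4, 5, 15]), (1, 0.25, [4])]
lam1, lam2 = virtual_path_loads(calls, n=4)
print(lam1)
print(lam2)
```

The same routine gives the aggregate effective bandwidths if the effective
bandwidth of each call is passed in place of its load.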
The capacity assignment is to find the capacity $C_{ij}$ for the virtual path
connecting $I_i$ and $O_j$ such that the following constraints are satisfied:
\[
\begin{cases}
C_{ij} > \lambda_{ij} & \forall i, j; \\
\sum_i C_{ij} = m & \forall j; \\
\sum_j C_{ij} = m & \forall i,
\end{cases}
\]
where m is the number of central modules.
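A candidate capacity matrix can be checked against these three constraints with
a short Python sketch. The function name `is_feasible_capacity` and the
list-of-lists matrix layout are assumptions made for illustration; the sketch
only verifies feasibility and does not perform the optimization discussed next.

```python
import math


def is_feasible_capacity(C, lam, m):
    """Check C[i][j] > lam[i][j] and that every row and column of C sums to m."""
    k = len(C)
    exceeds_load = all(C[i][j] > lam[i][j] for i in range(k) for j in range(k))
    rows_ok = all(math.isclose(sum(C[i]), m) for i in range(k))
    cols_ok = all(
        math.isclose(sum(C[i][j] for i in range(k)), m) for j in range(k)
    )
    return exceeds_load and rows_ok and cols_ok


# Hypothetical 2 x 2 example with m = 3 central modules.
loads = [[1.5, 0.5], [0.6, 1.2]]
print(is_feasible_capacity([[2, 1], [1, 2]], loads, m=3))
print(is_feasible_capacity([[2, 1], [2, 1]], loads, m=3))
```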
An optimal assignment requires establishing an objective function such as 
minimizing total weighted delay [7, 3, 36] after each virtual path is modeled as 
an M/M/1 queue. 
\ . . 
_ Y^ A” 






In Chapter 2, we discussed the background theories of path switching, capacity
assignment and route assignment. These procedures determine the connection
patterns of the central modules and allow the input and output modules to
route their cells with local information derived from the connection patterns.
In this chapter, we focus on the operational procedures and implementation
details of the space-division multicast cross-path switches.
Figure 3.1 shows a schematic diagram of the cell routing process in the 
multicast switches. In brief, the multicast cross-path switches have the following 
operation requirements: 
• Input Modules 
1. Rearrangeably Nonblocking 
2. Cell Replication Capability 
Chapter 3 Architectures 
Figure 3.1: The Operations of Multicast Cross-Path Switches
• Central Modules 
1. Rearrangeably Nonblocking 
• Output Modules 
1. Rearrangeably Nonblocking 
2. Bandwidth Expansion 
3. For Scheme 1: Cell Replication Capability 
In addition to performing the switching function, the cross-path switches also
have traffic control schemes to ensure that the Quality of Service (QoS)
requirements of the subscribers are met. Traffic control has been discussed in
[5].
We will review existing switch architectures and the corresponding
modifications employed in constructing the multicast switches.
Throughout this chapter, we make use of two routing examples to illustrate
the overall operations of the switches.
3.2 Input Module Design (Scheme 1) 
Referring to Fig. 3.1, the input modules consist of five components, namely the 
input header translator, input buffer, input module controller, input replication 
network and a point-to-point routing network. The functions of these blocks 
will be discussed in the following subsections. 
3.2.1 Input Header Translator 
Each cell entering the switch carries a (VCI, VPI) pair in its header.
The output port addresses of the cells of each virtual connection are stored in
memory during call set-up. The function of the input header translator (IHT) is
to use this information to perform a table lookup and find the list of the
cell's destined output ports in the memory. Moreover, the switch also knows the
cell's destined output modules. We term these lists the Destination Port List
(DPL) and the Destination Module List (DML) respectively. A '1' in bit position
i of the DPL indicates a cell copy destined for the i-th output port of the
switch, while a '1' in bit position j of the DML means a copy will be sent to
the j-th module. As discussed in Chapter 2, a cell with more than one fanout
destined for the same module will have only one copy sent to that output
module. Hence, cell A with DPL=[0010 1100 0000 0001] will have DML=[1 1 0 1].
In this scheme, the DML, consisting of k bits in the format shown in Fig. 3.2,
is appended to the cell. After the header translation is complete, the cell is
placed in the input buffer to wait for the look-ahead selection performed by
the input module controller (IMC).
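The relation between the DPL and the DML can be sketched in Python as follows.
The function name `dml_from_dpl` and the representation of the lists as Python
lists of 0/1 values are illustrative assumptions; the example values are those
of cell A above, with n = 4 output ports per output module.

```python
def dml_from_dpl(dpl, n):
    """Derive the Destination Module List from the Destination Port List.

    `dpl` holds one bit per output port; `n` is the number of output ports
    per output module. Bit j of the result is 1 when at least one port of
    output module j is a destination.
    """
    if len(dpl) % n != 0:
        raise ValueError("DPL length must be a multiple of n")
    return [1 if any(dpl[s:s + n]) else 0 for s in range(0, len(dpl), n)]


dpl_a = [0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
dml_a = dml_from_dpl(dpl_a, 4)
print(dml_a)       # [1, 1, 0, 1]
print(sum(dml_a))  # copies sent through the middle stage in scheme 1
print(sum(dpl_a))  # copies sent through the middle stage in scheme 2
```

For cell A, scheme 1 therefore routes three copies through the central modules,
whereas scheme 2 routes four.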
VCI  VPI  Output links          Output modules (OM0 ... OM3)
2    3    0010 1100 0000 0001   1 1 0 1
3    5    0000 0000 0001 0000   0 0 1 0
...
Figure 3.2: Header Translator 1 and the Lookup Table
3.2.2 Input Module Controller 
At the start of each time slot, the input module controller (IMC) examines the 
headers of the cells waiting at the input buffers to determine which cells can 
be routed to the central modules, according to a local routing table derived 
from the central modules' global routing information. An example of the local 
routing table is shown in Table 3.1. The entries are the output port addresses (in 
binary format) of the input module. A cell will be selected if there is a free port 
associated with its destined output module. For example, a packet destined for 
output module 0 (OM0) can be selected in time slot 0 only if port 10 (binary)
is free. During the first round of selection, IMC examines the first cell of each 
input queue; if the cell cannot be selected, it examines the next cell in queue. 
This process continues until a cell is successfully selected or w cells have already 
been checked. 
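The look-ahead selection can be sketched in Python as below. This is a
simplified sketch under several assumptions that the text does not spell out:
each input queue is granted at most one cell per time slot, a multicast cell
may be served partially (only the destination modules with a free port are
granted, the rest stay in its header), and queues are scanned in index order.
The names `look_ahead_select`, `free_port` and the cell dictionaries are
hypothetical.

```python
def look_ahead_select(queues, free_port, w):
    """Select cells for one time slot with a look-ahead window of w cells.

    queues:    list of input queues (head of queue first); each cell is a
               dict {"name": str, "dml": set of destined output modules}.
    free_port: dict mapping an output module to the port that reaches it in
               this time slot; modules with no path are simply absent.
    Returns a list of grants (cell name, output module, port).
    """
    available = dict(free_port)
    served_queues = set()
    grants = []
    for depth in range(w):  # round 0 looks at queue heads, round 1 at the next cell, ...
        for q, queue in enumerate(queues):
            if q in served_queues or depth >= len(queue):
                continue
            cell = queue[depth]
            reachable = sorted(mod for mod in cell["dml"] if mod in available)
            if not reachable:
                continue  # cell cannot be selected; try deeper in a later round
            for mod in reachable:
                grants.append((cell["name"], mod, available.pop(mod)))
                cell["dml"].discard(mod)
            served_queues.add(q)
    for queue in queues:
        # A cell leaves the buffer once no destination module remains.
        queue[:] = [cell for cell in queue if cell["dml"]]
    return grants


# Hypothetical example with four input queues and a window of two cells.
queues = [
    [{"name": "Y", "dml": {0}}],
    [{"name": "A", "dml": {0, 1}}],
    [{"name": "B", "dml": {2}}],
    [{"name": "X", "dml": {0}}, {"name": "Z", "dml": {3}}],
]
ports = {0: "10", 1: "00", 2: "01", 3: "11"}
print(look_ahead_select(queues, ports, w=2))
```

In this example the head-of-line cell X is blocked because the port to module 0
has already been granted, and the window lets cell Z behind it be selected
instead.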
Figure 3.3: Routing Example (Scheme 1)
Table 3.1: Local Routing Table for Input Module 0
Time slot   OM0   OM1   OM2   OM3
0           10    00    01    11
Consider a cross-path switch as shown in Fig. 3.3, with the internal structure
of central module 0 as shown. Its local routing information is listed in Table
3.1. For simplicity, only the information for time slot 0 is shown. Fig. 3.4
shows this network in action, assuming that the selection process starts
from cell Y. As shown in Fig. 3.4, cells A and B are selected, while X and Y
are not since there is no path to output module 0 during that time slot. After
the selection, the IMC modifies the headers to reflect the changes. It also
maintains a temporary table containing the VCI, VPI, copy index (CI) and
destination of each cell copy, as shown in Table 3.2. This table will be used
by the trunk
Figure 3.4: Look-ahead Selection of IMC (Scheme 1)
number translator of the replication network. The functions of the replication
network and the CI will be discussed shortly. The IMC also calculates the
corresponding copy number and appends it to each cell. The winning cells are
then fed into the replication network. At the end of the time slot, a cell is
removed from the input buffer if its header is a list of all zeros.
Table 3.2: Trunk Number Translator (TNT) Table (Scheme 1)
VCI  VPI  CI  Port Address  Symbol
2    3    0   00            A
2    3    1   10            A
3    1    0   11            B
3.2.3 Input Replication Network (Scheme 1) 
Cell replication is performed by a nonblocking Copy Network [38]. A
replication network is called nonblocking if it can produce the requested packet
copies whenever the total number of copies required by all input packets is no
more than the number of output links of the network.
The fundamental building block of the Copy Network is the broadcast-banyan
network. In [38], it has been shown that the broadcast-banyan network
is nonblocking if the active inputs $x_1, \ldots, x_m$ ($x_j > x_i$ if $j > i$)
are concentrated and their corresponding sets of outputs $Y_1, \ldots, Y_m$ are
distinct and monotonic. Hence, packets need to go through additional fabrics
before entering the broadcast-banyan network. An example of a 4 x 4 copy
network is shown in Fig. 3.5, which is also part of the routing example in
Fig. 3.3. It consists of five
(Stages, in order: running-adder network, dummy address encoder, reverse banyan
network, broadcast banyan network, trunk number translator. Header fields:
CN = copy number; ADDR1 = output address of reverse banyan network;
MIN,MAX = address interval of broadcast banyan network; IR = index reference;
CI = copy index; ADDR2 = output address of input module.)
Figure 3.5: A Copy Network
stages: a running-adder network (RAN), a set of dummy-address encoders, a
reverse-banyan network (RBN), a broadcast-banyan network (BBN), and a set
of m trunk number translators (TNT). As shown in Fig. 3.6, the running-adder
network and dummy address encoders assign to the input packets a set of
concentrated output addresses for the reverse-banyan network. Based on the
adjacent running
(Example in the figure: cell A with CN = 2 receives address interval (0,1);
cell B with CN = 1 receives address interval (2,2).)
Figure 3.6: Operations of RAN, Dummy Address Encoders and RBN
sums, an output address interval $(RS_{j-1}, RS_j - 1)$, $0 \le j \le N-1$, is
assigned to output j of the running-adder network, where
\[ RS_j = \sum_{i=0}^{j} CN_i \]
and $RS_{-1} = 0$. The self-routing reverse banyan network acts as a
concentrator. The output address is expressed in binary format as
$b_n \ldots b_1$, $n = \log_2 N$. Starting from the least significant bit, at
the i-th stage, if $b_i$ is 0, the cell is forwarded to the upper outgoing
link; otherwise it is forwarded to the lower link.
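The running sums and the resulting headers can be sketched in Python. The
function name `running_adder` and the header dictionary are hypothetical. The
concentrated address (ADDR1) is computed here as the running sum of activity
bits minus one, which is how the labels of Fig. 3.6 are read; treat that detail
as an assumption.

```python
from itertools import accumulate


def running_adder(copy_numbers):
    """Assign address intervals and concentrated addresses to active inputs.

    copy_numbers[j] is CN_j, the number of copies requested at input j
    (0 for an idle input). Returns one header per input, or None if idle.
    """
    running_sum = list(accumulate(copy_numbers))  # RS_j
    activity_sum = list(accumulate(1 if cn > 0 else 0 for cn in copy_numbers))
    headers = []
    for j, cn in enumerate(copy_numbers):
        if cn == 0:
            headers.append(None)
            continue
        previous = running_sum[j - 1] if j > 0 else 0  # RS_{j-1}, with RS_{-1} = 0
        headers.append({
            "min": previous,               # lower end of the address interval
            "max": running_sum[j] - 1,     # upper end of the address interval
            "addr1": activity_sum[j] - 1,  # output of the reverse banyan network
        })
    return headers


# Cells A (CN = 2) and B (CN = 1) as in the routing example, at inputs 0 and 2.
print(running_adder([2, 0, 1, 0]))
```

The intervals produced for the two cells are (0,1) and (2,2), matching the
values shown for cells A and B in the figures of this section.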
The broadcast banyan network produces packet copies according to the Boolean
Interval Splitting Algorithm. The packet header specifies a contiguous output
interval (MIN, MAX), the smallest and largest output addresses respectively. A
packet arriving at stage k carries a header containing an address interval
specified by two binary numbers: $MIN(k-1) = m_1 \ldots m_n$ and
$MAX(k-1) = M_1 \ldots M_n$. The routing decision is made according to the
following rules:
1. If $m_k = M_k = 0$ or $m_k = M_k = 1$, send the packet to the upper link (0)
or the lower link (1) respectively.
2. If $m_k = 0$ and $M_k = 1$, replicate the packet and modify the headers
according to the following scheme:
• For the packet sent out on link 0
$MIN(k) = MIN(k-1) = m_1 \ldots m_n$
$MAX(k) = M_1 \ldots M_{k-1} 0 1 \ldots 1$
• For the packet sent out on link 1
$MIN(k) = M_1 \ldots M_{k-1} 1 0 \ldots 0$
$MAX(k) = MAX(k-1) = M_1 \ldots M_n$
The routing operations of the broadcast-banyan network and the TNT are
shown in Fig. 3.7. Two fields, the index reference (IR) and the copy index
(CI), are used to index different copies from the same source. Initially, IR is
set to the minimum of the address interval, and it is not modified within the
broadcast-banyan network. After the replication process, each copy computes
CI = output address − IR. The copies then have distinct CIs ranging from 0 to
CN−1. The TNT uses the CI field to look up the output port addresses in the
table created by the IMC, as in Table 3.2.
Figure 3.7: Broadcast Banyan Network and Trunk Number Translator (in the example,
cell A with interval (0,1) yields copies with CI 0 and 1, which the TNT translates
to addresses 00 and 10; cell B with interval (2,2) yields one copy with CI 0,
translated to address 11)
3.2.4 Routing Network 
The routing network used is a Batcher-banyan network. A banyan network with
concentrated inputs and distinct, monotonic outputs is internally nonblocking.
The Batcher network [20], which consists of stages of comparators, sorts the
packets before they are injected into the banyan network. At each comparator,
the cell with the smaller output address of the two incoming cells is forwarded
to the link to which the arrow points. Routing in the banyan network is
distributed. The output address is expressed in binary format as $b_1 \ldots b_n$,
where $n = \log_2 N$. Starting from the most significant bit, at the $i$th stage, if
$b_i$ is 0, the cell is forwarded to the upper outgoing link; otherwise it is
forwarded to the lower link. The cells are then routed to the central modules.
Fig. 3.8 shows the operation of the routing network.
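As a small illustration of the bit-by-bit self-routing rule, the sketch below lists the outgoing link (0 for upper, 1 for lower) chosen at each stage. The function names are hypothetical. The first function follows the most-significant-bit-first order of the banyan network described here; the second follows the least-significant-bit-first order used in the reverse banyan network.

```python
def banyan_links(address, n):
    """Links taken at stages 1..n of the banyan network (MSB first)."""
    return [(address >> (n - i)) & 1 for i in range(1, n + 1)]


def reverse_banyan_links(address, n):
    """Links taken at stages 1..n of the reverse banyan network (LSB first)."""
    return [(address >> (i - 1)) & 1 for i in range(1, n + 1)]
```

For the 3-bit address 110, the banyan network takes the lower, lower and upper links in turn, while the reverse banyan network takes the upper, lower and lower links.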
Figure 3.8: Batcher and Banyan Network
3.3 Central Modules 
The central modules are Benes networks [39]. An example of a Benes network is
shown in Fig. 3.9. A Benes network is rearrangeably nonblocking in the sense that,
if the existing connections can be rearranged, a connection can always be set up
between any pair of idle input and output. Following the capacity assignment,
the connection patterns of these Benes networks are determined by the routing
assignment.
Figure 3.9: Central Module: Benes Network 
3.4 Output Module Design (Scheme 1) 
As we have described, the output modules of scheme 1 also support packet
replication and bandwidth expansion. A straightforward choice of switching
fabric is the Knockout switch [42] with broadcast buses. Referring to Fig. 3.10,
the switch has m input links connecting to the central modules and n output
links. The packets are transferred over the broadcast buses, so every output
link has access to the packets from each input. Output contention is resolved
at each output by a concentrator, shown in Fig. 3.11, which selects up to L of
the destined packets to enter the output buffers.
Figure 3.10: Output Module Design (Scheme 1) (for each of the output ports 0 to
n-1, a knockout concentrator with m inputs and L outputs, followed by a shifter)
3.5 Input Module Design (Scheme 2) 
In this scheme, all cell replication must be performed by the input modules, and
cells are replicated for each output port rather than for each output module
as in scheme 1. Referring to Fig. 3.1, the input modules again consist of five
components, namely the Input Header Translator (IHT), the input buffer, the Input
Module Controller (IMC), the input replication network and a point-to-point
routing network. The functions of the IHT and IMC that differ from scheme 1 are
discussed in the following sections.
Figure 3.11: Knockout Concentrator (m input links and L output links; packets
that lose the contention are dropped)
3.5.1 Input Header Translator (Scheme 2) 
The Input Header Translator obtains a cell's output destination information and
appends it to the cell as a header. Since replication is performed for each
output port, the IMC searches the whole DPL during the selection process. Instead
of the DML as in scheme 1, the whole DPL is therefore appended; please refer to
Fig. 3.12. After the header translation, the cells are placed in the input
buffers, where they wait to be routed by the Input Module Controller.
Figure 3.12: Look-ahead Selection of IMC (Scheme 2) (panels: before selection;
after selection)
3.5.2 Input Module Controller (Scheme 2) 
As in scheme 1, the IMC examines the cell headers to determine which cells can
be routed to the central modules. The local routing table is the same as in
scheme 1 and is shown in Table 3.1. A cell fanout is routed if there is
a free port for the output module to which its header destines it. We assume
random selection among a group of cell fanouts for the same output module.
Consider the cross-path switch shown in Fig. 3.13, which has the same network
configuration as in Fig. 3.3 but operates under scheme 2. Again, as shown in
Fig. 3.12, only cells A and B can be routed. After the selection, the IMC alters
the headers of the cells in the buffers to reflect the changes. The TNT table is
shown in Table 3.3. The winning cells are fed into the replication network.
Table 3.3: Trunk Number Translator (TNT) Table (Scheme 2)
VCI   VPI   CI   Port Address   Remark
2     3     0    00             A
2     3     1    01             A
2     3     2    10             A
3     5     0    11             B
Figure 3.13: Routing Example (Scheme 2)
3.5.3 Input Replication Network (Scheme 2) 
The replication network employed in this scheme is also a copy network, with one
new feature: in scheme 2, the header includes the cell's output port address
within its destined output module, because the output module has to distinguish
different cell fanouts. The operations of the replication network are shown in
Fig. 3.14. Other procedures are the same as in scheme 1.
Figure 3.14: A Copy Network (stages: running-adder network, dummy address encoder,
reverse banyan network, broadcast banyan network, trunk number translator)
CN: copy number
IR: index reference
CI: copy index
ADDR1: output address of reverse banyan network
MIN,MAX: address interval of broadcast banyan network
ADDR2: output address of input module
ADDR3: address of output port of destined module
3.6 Output Module Design (Scheme 2) 
The self-routing switching fabric for the output modules of scheme 2 is a Batcher
network with L parallel banyan networks. The routing procedures of the Batcher
network and the reverse banyan network have already been discussed and remain
the same. Contention resolution is performed before the cells enter the reverse
banyan network, so that no more than G packets are allowed to the same output
port. As described in section 3.5.2, the destination port address is stored in the
cell header. The output address of the cell at link i is compared with that of
the cell at link i + L. Since the cells are sorted when they come out of the
Batcher network, if the two addresses are equal then at least L + 1 packets have
the same destination address. The cell at link i + L is then dropped before
entering the reverse banyan network, since no buffers are available at the input
of an output module.
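The comparison between link i and link i + L can be sketched as below. This is an illustrative model only: the sorted Batcher output is represented as a Python list of output addresses, and the function name `drop_excess_cells` is an assumption of this example.

```python
def drop_excess_cells(sorted_addresses, L):
    """Keep at most L cells per output address from a sorted address list.

    The cell at link i is compared with the cell L links earlier. Because the
    list is sorted, equal addresses mean that at least L + 1 cells share the
    destination, so the later cell is dropped.
    """
    kept = []
    for i, address in enumerate(sorted_addresses):
        if i >= L and sorted_addresses[i - L] == address:
            continue  # dropped: no buffer at the input of the output module
        kept.append(address)
    return kept
```

With L = 2 and sorted addresses 0, 0, 0, 1, 1, 2, the third cell for address 0 is the only one dropped.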
Figure: Output module of scheme 2 (Batcher network, reverse-banyan concentrator
and n x n banyan networks). In the address comparison, packet 2 is let through if
and only if its output address b is not equal to the output address a of packet 1.





In this chapter, we study the effect of multicast traffic on the throughput,
delay and packet loss probability of a 1024 x 1024 cross-path switch. The
performance analysis is based on two assumptions. Firstly, the traffic
distribution among input and output ports is homogeneous: a packet at
an input port is equally likely to be destined for any output port. Secondly,
the arrival process at each input link is Bernoulli: packets arrive in each
time slot independently and with identical probability.
4.2 Traffic characteristics 
4.2.1 Fanout distribution 
In order to characterize the effect of the multicast traffic on the cross-path 
switch, we consider three fanout distributions to make comparisons. Let Y be 
Chapter 4 Performance Evaluations 
the random variable of the fanout number, and M be the maximum fanout
number.
1. Constant distribution
$$\Pr\{Y = M\} = 1.$$
The mean is $E[Y] = M$ and the variance is $\mathrm{Var}[Y] = 0$.
2. Uniform distribution
Suppose the requested fanout is uniformly distributed from 1 to the maximum M.
In other words,
$$\Pr\{Y = y\} = 1/M, \quad 1 \le y \le M.$$
Thus $E[Y] = \frac{M+1}{2}$ and $\mathrm{Var}[Y] = \frac{M^2-1}{12}$.
3. Truncated geometric distribution
$$\Pr\{Y = y\} = \frac{(1-q)\,q^{y-1}}{1-q^M}, \quad 1 \le y \le M,$$
where q is used to control the shape of the distribution. The mean is given
by
$$E[Y] = \frac{1}{1-q} - \frac{M q^M}{1-q^M}.$$
This distribution is often used in the literature for modeling the fanout
distribution [38, 16]. By fixing M equal to the switch size N = 1024, the
parameter q can be calculated for a given mean fanout E[Y].
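One way to calculate q for a given mean fanout is a simple bisection on the mean, which increases with q. The sketch below is an assumption of this edition rather than the procedure used in the thesis; the function names are hypothetical. The mean is computed by direct summation over the normalised weights $q^{y-1}$, and a second function evaluates the closed-form mean for comparison.

```python
def truncated_geometric_pmf(q, M):
    """Pr{Y = y} for y = 1..M, proportional to q**(y - 1)."""
    weights = [q ** (y - 1) for y in range(1, M + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_fanout(q, M):
    """E[Y] by direct summation over the truncated geometric distribution."""
    pmf = truncated_geometric_pmf(q, M)
    return sum(y * p for y, p in enumerate(pmf, start=1))


def closed_form_mean(q, M):
    """E[Y] = 1/(1 - q) - M q^M / (1 - q^M), valid for 0 < q < 1."""
    return 1.0 / (1.0 - q) - M * q ** M / (1.0 - q ** M)


def solve_q(target_mean, M, iterations=60):
    """Find q in (0, 1) whose mean fanout equals target_mean, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mean_fanout(mid, M) < target_mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With M = 1024 the truncation term is negligible for small mean fanouts, so the solution is close to $q = 1 - 1/E[Y]$.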
4.2.2 Middle stage traffic load and its calculation 
The multicast traffic load in the input stage differs from that in the output
stage because cells are replicated inside the switch. Let $\rho_{in}$ and $\rho_{out}$ denote
the traffic load of each input link at the input stage and the traffic load of
each output link at the output stage, respectively. The traffic in the output
stage is heavier than in the input stage, and the ratio is given by
$$\frac{\rho_{out}}{\rho_{in}} = E[Y],$$
which is the mean cell fanout number.
We also define the middle-stage traffic load $\rho_{mid}$ such that $\rho_{mid}/\rho_{in}$ is equal
to the ratio of the amount of traffic in the middle stage to that in the input
stage. It can be viewed as a reference load of the switch and reflects some of
the differences between the two multicast schemes.
Let X be the number of copies that are replicated from a packet at the input
stage, i.e. the number of distinct output modules that the packet is destined
for; then $\rho_{mid} = \rho_{in} \cdot E[X]$. For scheme 1, in which there is cell replication at
both the input and the output stages, E[X] varies for different fanout
distributions. In order to evaluate E[X], we first notice that the probability
that a packet has no copy destined for a specific output module is given by
$$p_0 = \sum_{y=1}^{M} \frac{\binom{N-n}{y}}{\binom{N}{y}} \Pr\{Y = y\}.$$
Then the average total number of copies replicated from a packet at the input
stage is $E[X] = k\,(1 - p_0)$. For scheme 2, where no replication takes place in the
output modules, $E[X] = E[Y]$ and thus $\rho_{mid} = \rho_{out}$. In Table 4.1, we provide
some numerical values of $\rho_{mid}$ for $\rho_{out} = 1$.
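The calculation of the middle-stage load for scheme 1 can be sketched as follows. The function name and the dictionary representation of the fanout distribution are assumptions of this example, as is taking the number of output modules to be k = N/n.

```python
from math import comb


def middle_stage_load(pmf, N, n, rho_out=1.0):
    """Middle-stage load of scheme 1 for a fanout distribution.

    pmf maps each fanout value y to Pr{Y = y}. The number of output modules
    is taken as k = N // n (an assumption of this sketch).
    """
    k = N // n
    mean_fanout = sum(y * p for y, p in pmf.items())        # E[Y]
    # Probability that a packet has no copy for a specific output module.
    p0 = sum(comb(N - n, y) / comb(N, y) * p for y, p in pmf.items())
    mean_copies = k * (1.0 - p0)                             # E[X]
    rho_in = rho_out / mean_fanout
    return rho_in * mean_copies
```

For N = 1024 and n = 32, a constant fanout of 2 gives about 0.9848 and a uniform fanout on {1, 2, 3} gives about 0.9799, in line with the n=32, scheme 1 entries of Table 4.1 for E[Y] = 2.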
Table 4.1: The Middle-Stage Traffic Load in Multicast Cross-path Switches for
different Fanout Distributions
Module   Multicast   Fanout          Mean Fanout
Size     Scheme      Distribution    E[Y] = 2   E[Y] = 4   E[Y] = 8
n=16     S1          Geometric       0.9855     0.9578     0.9064
n=16     S1          Uniform         0.9902     0.9712     0.9345
n=16     S1          Constant        0.9927     0.9782     0.9501
n=16     S2          G / U / C       1          1          1
n=32     S1          Geometric       0.9706     0.9165     0.8242
n=32     S1          Uniform         0.9800     0.9416     0.8712
n=32     S1          Constant        0.9849     0.9554     0.9000
n=32     S2          G / U / C       1          1          1
n=64     S1          Geometric       0.9419     0.8438     0.6978
n=64     S1          Uniform         0.9596     0.8857     0.7620
n=64     S1          Constant        0.9692     0.9113     0.8087
n=64     S2          G / U / C       1          1          1
S1 - replicating cells in both input and output stages
S2 - replicating cells only in input stage
4.3 Throughput Performance 
Since output queueing implemented in the output modules ensures 100% through-
put at the output stage, the throughput of the whole switch is limited only by 
the head-of-line blocking in input modules with look-ahead selection mechanism. 
In this section, we need only focus on the throughput at the input stage. 
In a cross-path switch, the throughput of an input module is mainly char-
acterized by the window size w of input queues and the group size m/k. The 
window size in look-ahead selection is defined to be the number of packets wait-
ing at each input queue that will be checked in one time slot, and thus is limited 
by the module size n and the processing speed as well. For multicast cells, the 
window size is defined to be the number of copies instead of cells. On the other 
hand, the group size represents the maximum number of packets that can be
delivered at an output port. For an n x m input module in a cross-path switch,
the real destination of a packet is one of the k output modules, thus the average
group size is m/k. Look-ahead selection is employed to relieve the head-of-line
blocking, while the channel group size can be regarded as the output capacity
provided by each destination. Increasing either of them will certainly result in
better throughput, but there is a trade-off. The processing speed limits the
product of the window size w and the module size n, thus a smaller n is more
desirable. On the other hand, when n is large, the statistical multiplexing gain
is high. We choose the median size n = 32 in this thesis.
We itemize the factors affecting the throughput as follows. 
1. The expansion factor m/n
In a cross-path switch, the ratio m/n is called the expansion factor [36].
It can be regarded as the average capacity per input link. As shown in
Fig. 4.1, the throughput increases with the expansion factor. Nearly
100% throughput can be achieved with a sufficiently large expansion factor.
The throughput saturates faster in scheme 1 or with a larger mean fanout.
2. The mean fanout E[Y]
To see the multicast effect, we first focus on the first moment of the
multicast traffic, i.e. the mean fanout. The throughput versus E[Y] is plotted
in Fig. 4.2. It can be observed that the throughput is higher when the mean
fanout is larger. The reason is that the copies from the same packet have
distinct destinations, so the output contention at the head of line is reduced
compared with the case of having the same total number of unicast cells
waiting. In conclusion, look-ahead selection achieves better throughput with
multicast traffic.
Figure 4.1: Throughput increase with expansion factor m/n (throughput versus
expansion factor m/n for schemes 1 and 2, geometric fanout distribution with
E[Y] = 2, 4 and 8)
3. The fanout distribution 
Even with the same mean fanout number, the throughput still varies for
different fanout distributions, as shown in Fig. 4.3 and 4.4. The throughput
discrepancy is larger in scheme 1 because the middle-stage traffic load $\rho_{mid}$
differs between fanout distributions, as shown in Table 4.1. The larger
$\rho_{mid}$ is, the more congested the switch and the smaller the throughput.
This helps to explain why the highest throughput is attained when the cell
fanout is geometrically distributed.
Conversely, the throughput discrepancy in scheme 2 is smaller because the
middle-stage traffic load $\rho_{mid}$ is the same as $\rho_{out}$ for all fanout
distributions. However, since the geometric fanout distribution has the largest
variance, its performance is the worst. As in other switching systems,
performance tends to be better when the traffic fluctuates less.
Figure 4.2: Throughput increase with mean fanout number E[Y] (throughput versus
E[Y] for schemes 1 and 2, geometric fanout distribution with m/n = 1 and
m/n = 1.5)
4.4 Delay Performance 
There are two places in a cross-path switch that incur delay to a packet. Firstly, a 
packet will have to wait at the input buffer of an input module before it is selected 
and routed to its destined output module. Secondly, it has to wait at the output 
queue of an output module before it is routed out of the switch. Unless otherwise 
specified, we will use the mean delay as a performance measure. Moreover, the
mean of the total delay is the sum of the means of these two delay quantities.
Figure 4.3: Throughput varying with fanout distribution in scheme 1 (throughput
versus expansion factor m/n for geometric, constant and uniform fanout
distributions with E[Y] = 2 and E[Y] = 8)
4.4.1 Input Stage Delay 
It is difficult to obtain an analytical model for the delay distribution at the
input stage, because the look-ahead selection scheme is implemented and the
capacity assigned to each virtual path is periodic. We therefore resort to
simulation to obtain the delay statistics.
As shown in Fig. 4.5, the input stage delay in scheme 2 decreases rapidly as
the expansion factor increases. Asymptotically, when the expansion factor
goes to infinity, both switches become output-buffered switches, so that
their input delays are the same and equal to zero. Empirically, we observe
that scheme 2 needs approximately 1.35 times as many central modules as scheme 1
to achieve comparable delay performance.
Figure 4.4: Throughput varying with fanout distribution in scheme 2 (throughput
versus expansion factor m/n for geometric, constant and uniform fanout
distributions with E[Y] = 2 and E[Y] = 8)
4.4.2 Output Stage Delay 
As we have described, output queueing is implemented at each output module
of a cross-path switch. While it is difficult to obtain analytical results for
the input stage delay distribution, it is possible to develop an approximate
model for the output delay distribution that takes into account the effect of
cell replication. Most of the analysis presented here follows the procedures
described in [41], where a unicast switch is discussed. Consider a generic
multicast switch of size N x N, as shown in Fig. 4.6. Each output queue has its
own buffer of size b and is served in FIFO order. The traffic is homogeneous,
with link load equal to $\rho$. We fix our attention on one particular output queue
only, because its characteristics represent all the others.
Figure 4.5: Input Stage Delay vs. Expansion Factor, E[Y]=8, w=8 (mean delay versus
m/n for S1 and S2 at $\rho$ = 0.8 and $\rho$ = 0.9)
Let the random variable A denote the number of packet arrivals destined for
the particular output, and let Y be the random variable representing the fanout
of a packet. In addition, define





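$$A_i = \begin{cases} 1 & \text{if the packet at link } i \text{ has a copy to queue } j, \\ 0 & \text{otherwise.} \end{cases}$$
Let
$$p_i = \Pr[A_i = 1] = \Pr[A_i = 1 \mid D_i = 1] \times \Pr[D_i = 1]
      = \sum_{y=1}^{M} \frac{\binom{N-1}{y-1}}{\binom{N}{y}} \Pr\{Y = y\} \cdot \rho
      = \frac{\rho E[Y]}{N},$$
with $D_i = 1$ when a packet is present at link $i$. Writing $p$ for this common
value, the number of arrivals A has the distribution
$$a_k = \Pr[A = k] = \binom{N}{k} p^k (1-p)^{N-k}
      = \binom{N}{k} \left(\frac{\rho E[Y]}{N}\right)^{k} \left(1 - \frac{\rho E[Y]}{N}\right)^{N-k}, \quad k = 0, 1, \ldots, N. \qquad (4.1)$$
Figure 4.6: Logical framework of an output-buffered Multicast Switch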
By the same token, we can consider an output module used in scheme 1 as
an m x n multicast switch. Substituting the corresponding values into Eqn. 4.1,
with $\rho$ and $E[Y]$ taken at the input links of the output module,
$$a_k = \binom{m}{k} \left(\frac{\rho E[Y]}{n}\right)^{k} \left(1 - \frac{\rho E[Y]}{n}\right)^{m-k}, \quad k = 0, 1, \ldots, m. \qquad (4.2)$$
For $n = \infty$ and $m = \infty$, the binomial probabilities $a_k$ tend to Poisson
probabilities with the same mean, $k = 0, 1, \ldots$
When $m \neq n$, A will not be a Bernoulli random variable. Therefore, we will
instead use Eqn. 4.2; however, the discrepancy will be rather small. To simplify
the analysis, we assume that the group size G is large enough (say G = 12)
that the loss due to output contention is negligibly small compared with
the loss due to buffer overflow.
Figure 4.7: Discrete time Markov chain state transition diagram for $Q_m$
Let $Q_m$ denote the number of packets in the tagged output queue at the start
of the mth time slot, and $A_m$ denote the number of packet arrivals at the mth
time slot. The system changes according to the equation:
$$Q_m = \min\{\max(0,\ Q_{m-1} - 1) + A_m,\ b\}$$
For $N = \infty$ and $b = \infty$, the distribution of $Q_m$ can be modeled by an M/D/1
queue. For finite N and b, $Q_m$ can be modeled by a finite-state, discrete-time
Markov chain, with state transition probabilities $P_{ij} = \Pr[Q_m = j \mid Q_{m-1} = i]$,
given by
$$P_{ij} = \begin{cases}
a_0 + a_1, & i = 0,\ j = 0, \\
a_0, & 1 \le i \le b,\ j = i-1, \\
a_{j-i+1}, & 1 \le j \le b-1,\ 0 \le i \le j, \\
\sum_{n=j-i+1}^{N} a_n, & j = b,\ 0 \le i \le j.
\end{cases}$$
Fig. 4.7 shows the state transition diagram for $Q_m$. The subscript m will
be dropped for the steady-state quantities. From the balance equations, the
steady-state queue size distribution can be obtained directly, such that
$$\pi_1 = \Pr[Q = 1] = \frac{1 - a_0 - a_1}{a_0}\, \pi_0,$$
$$\pi_n = \Pr[Q = n] = \frac{1 - a_1}{a_0}\, \pi_{n-1} - \sum_{k=2}^{n} \frac{a_k}{a_0}\, \pi_{n-k}, \quad 2 \le n \le b,$$
and $\sum_{i=0}^{b} \pi_i = 1$.
From Little's result, the mean waiting time can also be obtained as
$$W = \frac{\bar{Q}}{\rho_{out}} = \frac{\sum_{n=1}^{b} n\, \pi_n}{1 - \pi_0 a_0}.$$
Figure 4.8 shows the simulation and the analytical results for the output delay
versus output load when E[Y]=8. It can be observed that the mean waiting time
is smaller than 5 cell times when the output load is no greater than 0.9.
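The steady-state recursion and the mean waiting time can be evaluated numerically along the following lines. This is a minimal sketch, not the program used for the results in this chapter: the function names are hypothetical, the arrival distribution is passed as a list of probabilities $a_k$, and b is assumed to be at least 1. The transition matrix is built from the transition probabilities exactly as listed in the text, so it can be used to check the recursion.

```python
from math import comb


def binomial_arrivals(links, p):
    """a_k = Pr[A = k] when each of `links` links is active with probability p."""
    return [comb(links, k) * p**k * (1 - p)**(links - k) for k in range(links + 1)]


def coeff(a, k):
    """a_k, taken as zero beyond the end of the list."""
    return a[k] if k < len(a) else 0.0


def transition_matrix(a, b):
    """P[i][j] = Pr[Q_m = j | Q_{m-1} = i], following the cases in the text."""
    P = [[0.0] * (b + 1) for _ in range(b + 1)]
    for i in range(b + 1):
        for j in range(b + 1):
            if i == 0 and j == 0:
                P[i][j] = coeff(a, 0) + coeff(a, 1)
            elif i >= 1 and j == i - 1:
                P[i][j] = coeff(a, 0)
            elif 1 <= j <= b - 1 and i <= j:
                P[i][j] = coeff(a, j - i + 1)
            elif j == b and i <= j:
                P[i][j] = sum(a[j - i + 1:])  # all larger arrival counts
    return P


def steady_state(a, b):
    """pi_0..pi_b from the recursion, normalised so that they sum to 1."""
    a0 = coeff(a, 0)
    pi = [1.0, (1 - a0 - coeff(a, 1)) / a0]
    for n in range(2, b + 1):
        tail = sum(coeff(a, k) / a0 * pi[n - k] for k in range(2, n + 1))
        pi.append((1 - coeff(a, 1)) / a0 * pi[n - 1] - tail)
    total = sum(pi)
    return [x / total for x in pi]


def mean_waiting_time(a, b):
    """W = (sum of n * pi_n) / (1 - pi_0 * a_0), from Little's result."""
    pi = steady_state(a, b)
    rho_out = 1 - pi[0] * a[0]
    return sum(n * pi[n] for n in range(1, b + 1)) / rho_out
```

A useful check is that the vector returned by `steady_state` is left unchanged when multiplied by the matrix from `transition_matrix`.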
4.5 Cell Loss Performance 
The output stage of the cross-path switch is a loss system, where a cell may
be lost for two reasons. Firstly, a cell will be lost if the buffer cannot
accommodate all the arriving cells. Secondly, it may be lost because too many
cells arrive at the same output queue during a time slot. In the following
sections, we study the parameters required to achieve a cell loss probability no
larger than $10^{-6}$.
Figure 4.8: Output Stage Delay vs. Virtual Load, E[Y]=8 (analysis compared with
simulation for S1 with m = 36 and S2 with m = 48; horizontal axis: output load)
4.5.1 Cell Loss due to Buffer Overflow 
The model used in section 4.4.2 to analyze the output delay can also be used
in the analysis of the cell loss probability. A cell will be lost if there are
already b packets in the queue upon its arrival. In this loss system,
$$\rho_{in} E[Y] \cdot (1 - \Pr[\text{loss}]) = \rho_{out},$$
$$\Pr[\text{loss}] = 1 - \frac{\rho_{out}}{\rho_{in} E[Y]},$$
where $\rho_{out} = 1 - \pi_0 a_0$, since the output will be idle if and only if $Q_{m-1} = 0$
and $A_m = 0$. The exact values have been obtained by simulation, and the results
are shown in Tables 4.2 and 4.3.
Table 4.2: Scheme 1: Loss Probabilities Due to Output Buffer Overflow, $\rho$ = 0.9,
E[Y]=8
Buffer Size   m=36       m=40       m=44
8             2.32e-02   2.33e-02   2.36e-02
16            3.72e-03   3.70e-03   ?.74e-03
24            6.90e-04   6.67e-04   ?.?1e-04
32            1.34e-04   1.34e-04   ?.25e-04
48            4.05e-06   ?.06e-06   ?.27e-06
64            1.01e-07   8.33e-08   1.02e-07
Table 4.3: Scheme 2: Loss Probabilities Due to Output Buffer Overflow, $\rho$ = 0.9,
E[Y]=8
Buffer Size   m=40       m=44       m=48
8             2.28e-02   2.29e-02   2.33e-02
16            3.52e-03   3.62e-03   3.73e-03
24            6.41e-04   6.50e-04   ?.?7e-04
32            1.19e-04   1.22e-04   1.20e-04
48            3.87e-06   6.11e-06   5.18e-06
64            1.03e-07   1.04e-07   1.02e-07
Figures 4.9 and 4.10 show the analytical results along with the simulation
results. The cell loss rate is not sensitive to the number of central modules
when the switch's throughput is greater than the output load. When $\rho$ = 0.9, a
buffer size of 64 is enough to keep the cell loss rate below $10^{-6}$.
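The analytical loss probability due to buffer overflow follows directly from the steady-state recursion of section 4.4.2. The sketch below is illustrative only: the function name is hypothetical, the arrival distribution is supplied as a list of probabilities $a_k$, and the offered load $\rho_{in} E[Y]$ is taken to be the mean of that distribution, which is an assumption of this example.

```python
from math import comb


def overflow_loss(a, b):
    """Pr[loss] = 1 - rho_out / (offered load) for buffer size b >= 1.

    a[k] is the probability of k arrivals to the tagged output in a slot.
    """
    def coeff(k):
        return a[k] if k < len(a) else 0.0

    a0 = coeff(0)
    # Unnormalised steady-state probabilities from the recursion in 4.4.2.
    pi = [1.0, (1 - a0 - coeff(1)) / a0]
    for n in range(2, b + 1):
        tail = sum(coeff(k) / a0 * pi[n - k] for k in range(2, n + 1))
        pi.append((1 - coeff(1)) / a0 * pi[n - 1] - tail)
    pi0 = pi[0] / sum(pi)
    rho_out = 1 - pi0 * a0                           # carried load
    offered = sum(k * ak for k, ak in enumerate(a))  # mean arrivals per slot
    return 1 - rho_out / offered


def binomial_arrivals(links, p):
    """Binomial arrival distribution, as in Eqn. 4.1."""
    return [comb(links, k) * p**k * (1 - p)**(links - k) for k in range(links + 1)]
```

Evaluating `overflow_loss` for increasing b shows the loss probability falling as the buffer grows.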
4.5.2 Cell Loss Due to Output Contention 
Consider an output module of size m x n. Since the cells on the m input links
may come from different input modules and the pattern varies from time slot to
time slot, we would in principle need to simulate all k input modules, with
N = 1024 links in total, in order to obtain the loss probability. However, we
observe that there is no buffering at the input links of each output module, and
thus the time correlation within the arrival process at each input link need not
be taken into consideration in computing the loss probability. Although we still
need simulation data to capture the "space" correlation among the active
neighbouring input links connected from the same input module, it is sufficient
to simulate only one input module and collect the statistics on how many packets
are delivered to the specific output module.
1. Consider all m input links of an output module, and let A be the random
variable of the number of active links. In order to calculate the loss
probability, we first need to obtain the probability distribution of A.
Define $G_A(z) = \sum_{a=0}^{m} \Pr\{A = a\}\, z^a$ to be the generating function. Suppose
that, among the m links, the number of links connected from input module i,
$0 \le i \le k-1$, is $m_i$, with $\sum_{i=0}^{k-1} m_i = m$. Similarly, define $A_i$ to be the
number of active links among the $m_i$ links from input module i.
Since the traffic from different modules is independent, it follows that
$$G_A(z) = \prod_{i=0}^{k-1} G_{A_i}(z).$$
Also, since the traffic statistics of all input modules are independent
and identically distributed, we can obtain the probabilities $\Pr\{A_i = a\}$,
$0 \le a \le m_i$, $\forall i$, by simulating only one input module and collecting the
departure statistics for each time slot. For example, we demonstrate the
calculation of the generating function of A for the case m = 36 and
E[Y] = 2. The frame size is f = 8 and $g = mf/k = 9$, indicating
that there are in total 9 links connected from an input module to an output
module during 8 time slots. The statistics of the packets delivered from any
input module to the specific output module are summarized in Table 4.4.
The m = 36 input links of an output module can be partitioned into four
groups, each having 9 links with the statistics given in Table 4.4. By
convolution, the generating function of the probability distribution of the
number of active packets arriving at an output module is given by
$$[G_0(z)\,G_1(z)\,G_2(z)\,G_3(z)\,G_4(z)\,G_5(z)\,G_6(z)\,G_7(z)]^4.$$
Table 4.4: The Statistics of the Number of Packets Leaving from an Input
Module for an Output Module
Time   N(*)   Probability that x links are active   The corresponding
slot          x = 0     x = 1     x = 2             generating function
0      2      0.1589    0.2474    0.5937            G0(z) = 0.1589 + 0.2474z + 0.5937z^2
1      1      0.2286    0.7714    -                 G1(z) = 0.2286 + 0.7714z
2      1      0.2109    0.7891    -                 G2(z) = 0.2109 + 0.7891z
3      1      0.1973    0.8027    -                 G3(z) = 0.1973 + 0.8027z
4      1      0.1850    0.8150    -                 G4(z) = 0.1850 + 0.8150z
5      1      0.1795    0.8205    -                 G5(z) = 0.1795 + 0.8205z
6      1      0.1710    0.8290    -                 G6(z) = 0.1710 + 0.8290z
7      1      0.1682    0.8318    -                 G7(z) = 0.1682 + 0.8318z
N(*) - Number of links connected to the specific output module
2. Let $Y_o$ be the random variable of the fanout number of a packet arriving
at the output module. For a packet at the input stage, the average
number of copies destined for the specific output module is
$E[Y]/k$, including the case that no copy of the packet is destined for the
output module. Then
$$\frac{E[Y]}{k} = p_0 \cdot 0 + (1 - p_0) \cdot E[Y_o],$$
where $p_0$ is the probability that no copy of the packet is destined for the
output module. Hence
$$E[Y_o] = \frac{E[Y]}{k\,(1 - p_0)} = \frac{E[Y]}{E[X]}.$$
3. Consider the probability that one copy of a packet arriving at the output
module is destined for a particular output port, which is simply
$p = E[Y_o]/n$. In scheme 1, this probability is independent from link to
link, because no two packets arriving at the output module come from the
same original cell, which generates at most one copy for any one
output module. In scheme 2, more than one of the packets arriving
simultaneously at the output module may have been generated from the same
original cell, so this probability is not independent across the links
from the same input module. The dependence is beneficial, because copies
generated from the same original cell cannot be destined for the same output
port, which results in less output contention at the port. Making the
independence assumption in the following computation therefore produces an
upper bound on the loss probability of scheme 2. When k is sufficiently
large or m/n is small, the bound is expected to be tight.
4. Consider a particular output port. Let L be the number of packet-copies 
destined for the port. Its generating function is given by 
$$G_L(z) = \sum_{a=0}^{m} (1 - p + pz)^a \Pr\{A = a\},$$
where $p = E[Y_0]/n$ and $\Pr\{A = a\}$ is the probability that a of the m links 
are active. Thus 
$$\Pr\{L = l\} = \sum_{a=l}^{m} \Pr\{A = a\} \binom{a}{l} p^l (1 - p)^{a-l}.$$
And the loss probability is 
$$\frac{1}{\rho_{out}} \sum_{l=G+1}^{m} (l - G) \Pr\{L = l\},$$
where G is the group size of an output module. 
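The computation in steps 1 to 4 can be sketched numerically as follows. This is only a minimal sketch under stated assumptions, not the program used for the results in this chapter: each link is assumed to be active independently with a per-group probability, the four group probabilities are hypothetical placeholders (they are not the Table 4.4 statistics), p = 1/32 assumes E[Y_0] = 1 and n = 32, and the normalising load is taken to be the mean of L. The function names are illustrative.

```python
from math import comb


def convolve(f, g):
    """PMF of the sum of two independent non-negative integer variables."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def active_links_pmf(group_probs, links_per_group):
    """Pr{A = a}, assuming each link is active independently with its group's probability."""
    pmf = [1.0]
    for q in group_probs:
        for _ in range(links_per_group):
            pmf = convolve(pmf, [1.0 - q, q])
    return pmf


def copies_pmf(pr_active, p):
    """Pr{L = l}: each active link carries a copy for the tagged port with probability p."""
    pmf = [0.0] * len(pr_active)
    for a, pa in enumerate(pr_active):
        for l in range(a + 1):
            pmf[l] += pa * comb(a, l) * p**l * (1 - p) ** (a - l)
    return pmf


def contention_loss(pr_active, p, G, rho_out):
    """(1 / rho_out) * sum over l > G of (l - G) * Pr{L = l}."""
    pmf = copies_pmf(pr_active, p)
    return sum((l - G) * pl for l, pl in enumerate(pmf) if l > G) / rho_out


# Hypothetical inputs (NOT the Table 4.4 statistics): four groups of 9 links.
pr_active = active_links_pmf([0.9, 0.8, 0.7, 0.6], 9)
p = 1.0 / 32  # assumes E[Y_0] = 1 and n = 32
rho_out = p * sum(a * pa for a, pa in enumerate(pr_active))  # assumed: mean of L
for G in (1, 2, 4, 8):
    print(G, contention_loss(pr_active, p, G, rho_out))
```

With the load taken as the mean of L, a group size of G = 0 gives a loss of 1 and the loss falls monotonically as G grows, which is a quick sanity check on the formula.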
The results are presented below. 
1. The group size G 
First, we demonstrate the effect of the group size G in output modules in 
Fig. 4.11 and Fig. 4.12. The group size G is defined as the number of 
packet-copies that can be delivered to each output port during one time 
slot. The larger the group size, the smaller the loss probability but the 
higher the complexity. An arbitrarily small loss probability can be achieved 
with a sufficiently large G. In addition, we observe that the loss probability 
varies little with other parameters such as the number of central modules m 
and the mean fanout number E[Y]. In other words, it is mainly characterized 
by the group size G. 
2. The expansion factor 
As shown in Fig. 4.13, the loss probability also varies with the expansion 
factor m/n. When the number of central modules is large, the 
number of active packets allowed to be simultaneously delivered to an output 
module tends to be more uniformly distributed, and thus a higher loss 
probability is expected. 
3. The mean fanout number 
The effect of multicast traffic on the loss probability is not significant. 
As shown in Fig. 4.14 with fixed group size G = 8, the expansion factor 
m/n dominates the trend of the loss probability. The loss probability changes 
only slightly when the mean fanout number E[Y] increases, except when the 
expansion factor is small (m/n = 1) in scheme 1, because of the dramatic 
throughput increase; see Fig. 4.2 for this effect. 
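The dependence on G and on the number of central modules can be illustrated with a deliberately simplified model. The sketch below is not the model behind Figs. 4.11 to 4.14: it assumes that each of the m links independently delivers a copy to the tagged output port with probability rho_out/m, so that the number of copies is binomial, and the load value 0.9 and the function name are illustrative choices.

```python
from math import comb


def loss_by_group_size(m, rho_out, max_group):
    """Contention loss for G = 0..max_group under the binomial assumption."""
    r = rho_out / m  # assumed per-link probability of a copy for the tagged port
    pmf = [comb(m, l) * r**l * (1 - r) ** (m - l) for l in range(m + 1)]
    return [
        sum((l - G) * pmf[l] for l in range(G + 1, m + 1)) / rho_out
        for G in range(max_group + 1)
    ]


for m in (32, 48):
    losses = loss_by_group_size(m, rho_out=0.9, max_group=12)
    # Print the loss at G = 0, 4, 8, 12.
    print(m, ["%.2e" % x for x in losses[::4]])
```

Under this assumption the loss drops quickly as G grows, and for the same load it is somewhat higher for m = 48 than for m = 32, in line with the trends described in items 1 and 2.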
4.6 Complexities 
In this section, we estimate the complexity of a multicast cross-path switch 
under both cell-replication schemes and compare the two for a given 
performance requirement. The unit of complexity is a 2 x 2 crossbar 
switch. 
The input modules and the central modules are the same in both multicast 
schemes. Since central modules only need to be rearrangeably nonblocking 
to ensure that every connection pattern of the route assignment 
can be realized, a Benes network is suitable and sufficient. Its complexity is 
given by 
$$C_{cm} = k\left(\log_2 k - \frac{1}{2}\right),$$
where k is the size of a central module, i.e. the number of input modules. Then 
the complexity of the middle stage consisting of m central modules is obtained as 
$$C_{mid} = mk\left(\log_2 k - \frac{1}{2}\right).$$
Since cell replication is necessary at the input stage, a copy network [38] 
cascaded with a point-to-point switch is sufficient. The copy network is 
estimated as two banyan interconnection fabrics, each with complexity 
$C_{cn} = \frac{m}{2}\log_2 m$, while the point-to-point switch consists of a 
Batcher-banyan network whose complexity is given by 
$$C_{pp} = \frac{m}{4}\log_2 m(\log_2 m + 1) + \frac{m}{2}\log_2 m.$$
Then the total input-stage complexity is $C_{in} = k(C_{cn} + C_{pp})$. 
At the last stage, the two schemes require different switches. For scheme 1, where 
cell replication is still needed in output modules, a knockout switch [42, 21] 
with broadcast buses can be used. Since each concentrator with group size G 
has the complexity 
$$(m - 1) + (m - 2) + \cdots + (m - G) = mG - \frac{G(G+1)}{2},$$
we need $C_{knock} = n[mG - G(G+1)/2]$ switching nodes to build one knockout 
switch and $C_{out1} = kC_{knock}$ in total for the whole output stage [42]. For scheme 2, 
a Batcher-banyan [2] switch with channel grouping can be used in the output stage 
without multicasting. This switch consists of an m x m Batcher sorting network, 
an m x m concentrator and G parallel banyan networks. The total complexity 
of the output stage for scheme 2 is given by 
$$C_{out2} = k\left[\frac{m}{4}\log_2 m(\log_2 m + 1) + \frac{m}{2}\log_2 m + G \cdot \frac{n}{2}\log_2 n\right].$$
The numerical comparisons of the complexity of the two schemes, in 
terms of the number of 2 x 2 switch nodes under the same throughput 
requirement and under the same loss probability, are shown in Fig. 4.15 and 
Fig. 4.16, respectively. Scheme 2 offers a better trade-off between performance 
and complexity than scheme 1 because of the huge complexity of building 
broadcast knockout switches. 
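The complexity expressions of this section can be evaluated directly. The following is a minimal sketch, not the program used for Fig. 4.15 and Fig. 4.16: the parameter values (n = k = 32 for a 1024 x 1024 switch, m = 32, G = 8) are illustrative assumptions, the function names are hypothetical, and adding the three stage complexities to obtain a total is our own assumption.

```python
from math import log2


def banyan(size):
    """Number of 2 x 2 nodes in a size x size banyan network."""
    return size / 2 * log2(size)


def batcher(size):
    """Number of 2 x 2 nodes in a size x size Batcher sorting network."""
    return size / 4 * log2(size) * (log2(size) + 1)


def stage_complexities(n, m, k, G):
    """Per-stage complexity in 2 x 2 switch nodes, following the formulas above."""
    c_cm = k * (log2(k) - 0.5)                # one Benes central module
    c_mid = m * c_cm                          # m central modules
    c_cn = banyan(m)                          # copy network term, as given in the text
    c_pp = batcher(m) + banyan(m)             # Batcher-banyan point-to-point switch
    c_in = k * (c_cn + c_pp)                  # k input modules
    c_knock = n * (m * G - G * (G + 1) / 2)   # one knockout switch (scheme 1)
    c_out1 = k * c_knock
    c_out2 = k * (batcher(m) + banyan(m) + G * banyan(n))  # scheme 2
    return {"in": c_in, "mid": c_mid, "out1": c_out1, "out2": c_out2}


c = stage_complexities(n=32, m=32, k=32, G=8)  # illustrative parameters
total_scheme1 = c["in"] + c["mid"] + c["out1"]  # assumed: total = sum of stages
total_scheme2 = c["in"] + c["mid"] + c["out2"]
print(c, total_scheme1, total_scheme2)
```

For these illustrative parameters the output stage dominates the total in scheme 1, which matches the qualitative conclusion that the broadcast knockout switches account for most of the complexity.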
[Plot: loss probability (log scale) versus buffer size (0 to 70), with curves for Pout = 0.6, 0.7 and 0.8.] 
Figure 4.9: Buffer Overflow Loss Probability vs. Virtual Load, m=36 (Scheme 1) 
[Plot: loss probability (log scale) versus buffer size (0 to 70), with curves for Pout = 0.6, 0.7 and 0.8.] 
Figure 4.10: Buffer Overflow Loss Probability vs. Virtual Load, m=48 (Scheme 2) 
[Plot: loss probability (log scale) versus group size at output stage, G (0 to 12), for m = 32 and m = 48 with E[Y] = 2, 4 and 8.] 
Figure 4.11: Loss probability decrease with group size G in scheme 1 
[Plot: loss probability (log scale) versus group size at output stage, G (0 to 12), for m = 32 and m = 48 with E[Y] = 2, 4 and 8.] 
Figure 4.12: Loss probability decrease with group size G in scheme 2 
[Plot: loss probability (log scale) versus expansion factor m/n (1 to 2), for scheme 1 and scheme 2 with E[Y] = 2, 4 and 8 at G = 8.] 
Figure 4.13: Loss probability increase with expansion factor m/n 
[Plot: loss probability (scale x 10^-7) versus E[Y] (2 to 8), for scheme 1 and scheme 2 with m = 32 and m = 48 at G = 8.] 
Figure 4.14: The little change of loss probability with mean fanout E[Y] 
[Plot: complexity versus throughput (0.8 to 1), for scheme 1 and scheme 2 with E[Y] = 2, 4 and 8 at G = 8.] 
Figure 4.15: The Complexity Comparison between two Multicast Schemes for 
Different throughput requirement 
[Plot: complexity versus group size at output stage, G (0 to 12), for scheme 1 and scheme 2 with E[Y] = 8 at throughput 0.9 and 0.99.] 
Figure 4.16: The Complexity Comparison between two Multicast Schemes for 




Chapter 5 Conclusions 

In this thesis, we studied the theory of path switching and the architecture of 
the cross-path switch. Because this switch uses distributed routing with a limited 
amount of global information, it can be used to build large-scale ATM switches. 
We discussed two possible schemes to support multicast connections. Two 
architectures have been designed to demonstrate the realization of these 
enhancements. Furthermore, we investigated the throughput, cell loss rate, 
delay distribution and complexity of a 1024 x 1024 switch under a 
homogeneous traffic environment. We summarize our results below: 
1. The maximum throughput attained is greater than 0.95 when the expansion 
factor (m/n) is greater than 1.5. 
2. The mean input-stage delay is smaller than 2 cells/slot when the expansion 
factor is greater than 1.5 and the switch load is smaller than 0.9 
cell/slot. 
3. To achieve comparable delay performance at the first stage, approximately 
1.35 times more central modules are needed for scheme 2. 
4. Output-stage delay is not sensitive to multicast traffic. Its distribution 
can be modeled by an M/D/1/N system. The mean delay is bounded by 5 
time slots with a switch load smaller than 0.9 cell/slot. 
5. The output buffer size of 64 is enough to keep the cell loss rate due to buffer 
overflow smaller than 10~® with a switch load smaller than 0.9 cell/slot. 
6. Cell loss due to output contention is mainly determined by the group size 
of the output module and is not sensitive to multicast traffic. A 
group size of 10 is enough to keep the cell loss rate smaller than 
$10^{-8}$. 
7. When the mean fanout of the system is not large compared with the switch 
size, scheme 2 has much lower complexity with comparable performance. 
Bibliography 
[1] Acampora, A. C. An Introduction to Broadband Networks, Plenum Press, 
1994, pp. 216-217. 
[2] A. Huang and S. Knauer "Starlite: A Wideband Digital Switch," Proceedings 
of Globecom '84, pp. 121-125. 
[3] A. Kershenbaum "Telecommunications Network Design Algorithm," 
McGraw-Hill 1993. 
[4] C. Clos "A Study of Non-blocking Switching Networks," Bell Syst. Tech. 
J., Vol. 32, Mar. 1953, pp. 406-424. 
[5] Cheuk H. Lam, "Virtual Path Traffic Management of Cross-Path Switch," 
Ph.D. dissertation, The Chinese University of Hong Kong, 1997. 
[6] D. Anick, D. Mitra and M. M. Sondhi "Stochastic Theory of a 
Data-handling System with Multiple Sources," Bell System Tech. J., 61, pp. 
1871-1894, 1982. 
[7] D. Bertsekas and R. Gallager "Data Networks" 2nd ed., Prentice-Hall 
International, 1992. 
[8] D. G. Cantor "On Non-Blocking Switching Networks," Networks, Vol. 1, 
No. 4, Winter 1971, pp. 367-377. 
[9] D. Mitra, J. A. Morrison and K. G. Ramakrishnan, "ATM Network Design 
and Optimization: A Multirate Loss Network Framework," IEEE JSAC, 
vol. 4, no. 4, August 1996. 
[10] Daniel J. Marchok and Charles E. Rohrs, "First Stage Multicasting in a 
Growable Packet (ATM) Switch", Proc. ICC '91, pp. 1007-1013. 
[11] F. S. Hillier and G. J. Lieberman "Introduction to Operations Research" 
5th ed., McGraw-Hill, 1990. 
[12] F. T. Leighton, "Introduction to Parallel Algorithms and Architectures: 
Arrays, Trees, Hypercubes", Morgan Kaufmann 1992. 
[13] G. de Veciana, G. Kesidis and J. Walrand, "Resource Management in 
Wide-Area ATM Networks Using Effective Bandwidths", IEEE JSAC, Vol. 13, 
No. 6, Aug. 1995, pp. 1081-1090. 
[14] H. Takagi, "Queueing Analysis, A Foundation of Performance Evaluation, 
Volume 1: Vacation and Priority Systems, Part 1", Elsevier Science 
Publishers B. V., 1991. 
[15] J. N. Daigle, "Queueing Theory for Telecommunications", Addison-Wesley 
Publishing Company, 1992. 
[16] Joseph Y. Hui and Thomas Renner, "Queueing Analysis for Multicast 
Packet Switching", IEEE Trans. Commun., vol. 42, no. 2/3/4, 
Feb./Mar./Apr. 1994, pp. 723-731. 
[17] J. S. Turner, "Design of a Broadcast Packet Switching Network", IEEE 
Trans. Commun., vol. 36, no. 6, June 1988, pp. 734-743. 
[18] John C. Lin and Sanjoy Paul, "RMTP: A Reliable Multicast Transport 
Protocol", Proc. INFOCOM '96. 
[19] J. Y. Hui "Resource Allocation for Broadband Networks," IEEE JSAC, 
Vol. 6, No. 9, Dec. 1988, pp. 358-368. 
[20] K. E. Batcher "Sorting networks and their applications," Proc. 1968 Spring 
Joint Comput. Conf. 
[21] K. Eng, M. Hluchyj, and Y. Yeh, "A Knockout Switch for Variable-Length 
Packets", IEEE JSAC, vol. 5, no. 9, Dec. 1987. 
[22] K. Y. Eng, M. J. Karol and Y. S. Yeh "A Growable Packet (ATM) Switch 
Architecture: Design Principles and Applications," in Proc. Globecom '89, 
Nov. 1989. 
[23] K. J. Schrodi, B. Pfeiffer, J. M. Delmas and M. De Somer, "Multicast 
Handling in a Self-Routing Switch Architecture", Proc. of ISS '92, Yokohama 
1992, pp. 156-160. 
[24] L. Kleinrock, "Queueing Systems Volume 1: Theory", Wiley-Interscience, 
1975. 
[25] L. S. Lasdon and A. D. Warren "Generalized Reduced Gradient Software 
for Linearly and Nonlinearly Constrained Problems," Design and 
Implementation of Optimization Software, Sijthoff and Noordhoff, Alphen aan 
den Rijn, The Netherlands, 1978. 
[26] M. Frank and P. Wolfe "An Algorithm for Quadratic Programming," Naval 
Research Logistics Quarterly, Vol. 3, 1956, pp. 95-110. 
[27] M. G. Hluchyj and M. J. Karol, "Queueing in High-Performance Packet 
Switching", IEEE JSAC, pp. 1587-1597. 
[28] R. Guerin, H. Ahmadi and M. Naghshineh "Equivalent Capacity and 
Its Application to Bandwidth Allocation in High-Speed Networks," IEEE 
JSAC, Vol. 9, No. 7, Sep. 1991, pp. 968-981. 
[29] R. Melen and J. S. Turner, "Nonblocking Multirate Distribution Networks," 
IEEE Trans. Comm., Vol. 41, No. 2, Feb. 1993, pp. 362-369. 
[30] Raymond H. Lin, Cheuk H. Lam and Tony T. Lee, "Performance and 
Complexity of Multicast Cross-Path ATM Switches," Proc. INFOCOM '97. 
[31] S. Karlin and H. M. Taylor "A First Course in Stochastic Processes," 2nd 
Edition, Academic Press, New York, 1975, pp. 221-226. 
[32] S. C. Liew "Performance of Various Input-buffered and Output-buffered 
ATM Switch Design Principles under Bursty Traffic: Simulation Study", 
IEEE Trans. Comm., Vol. 42, No. 2/3/4, February/March/April 1994, pp. 
1371-1379. 
[33] S. M. Ross, "Introduction to Probability Models", 5th Edition, Academic 
Press, 1993. 
[34] T. T. Lee "A Modular Architecture for Very Large Packet Switch," IEEE 
Trans. on Commun., Vol. 6, pp. 1455-1467, July 1990. 
[35] T. T. Lee and S. Y. Liew "Parallel Algorithm for Benes Networks," in Proc. 
INFOCOM '96. 
[36] T. T. Lee and C. H. Lam "Path Switching - A Quasi-Static Routing Scheme 
for Large-Scale ATM Packet Switching", IEEE JSAC, 5 1997, pp. 914-924. 
[37] The ATM Forum "Traffic Management Specification", Ver. 4.0, 1995. 
[38] Tony T. Lee, "Nonblocking Copy Networks for Multicast Packet Switching", 
IEEE JSAC, vol. 6, no. 9, pp. 1455-1467, Dec. 1988. 
[39] V. E. Benes "Mathematical Theory of Connecting Networks and Telephone 
Traffic," Academic Press, New York, 1965. 
[40] Y. N. J. Hui and E. Arthurs "A Broadband Packet Switch for Integrated 
Transport," IEEE Journal on Selected Areas in Communications, Vol. 5, 
No. 8, October 1987, pp. 1264-1273. 
[41] Y. S. Yeh, M. G. Hluchyj and A. S. Acampora, "The Knockout Switch: A 
Simple Architecture for High-performance Packet Switching," IEEE JSAC, 
Vol. SAC-5, No. 8, Oct. 1987, pp. 1274-1283. 
[42] Y. S. Yeh, M. G. Hluchyj, and A. S. Acampora, "The Knockout switch: A 
simple, modular architecture for high-performance packet switching", IEEE 
JSAC, vol. 5, pp. 1274-1283, Oct. 1987. 
