Designing a large scale switch interconnection architecture and a study of ATM scheduling algorithms. by Yee, Ka Chi. & Chinese University of Hong Kong Graduate School. Division of Information Engineering.
DESIGNING A LARGE SCALE SWITCH 
INTERCONNECTION ARCHITECTURE AND A 
STUDY OF ATM SCHEDULING ALGORITHMS 
BY 
YEE KA CHI
A THESIS 
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS 
FOR THE DEGREE OF MASTER OF PHILOSOPHY
DIVISION OF INFORMATION ENGINEERING
I would like to express my deepest gratitude to my advisor, Professor P.C. Wong, for training me, giving me insights and pointing out the direction to me, as well as supervising my research work with care. Professor Wong is a respectable advisor, who always shares his experience and constructive ideas with me. I have learnt much from Professor Wong throughout our regular meetings. It is my pleasure to have spent my two years of study under his attentive guidance.

Special thanks are due to my colleagues. I would like to thank Dr. Jack Lee for his advice and support along my road of research. Thanks also to Clement Cheung for discussions and encouragement over the years. I would like to thank the technical team of our department, particularly Alan Lam, who gave me much help in managing my simulations.

I would like to thank all my friends, particularly Hanford Chan, Thomas Kwok, and Vincent Tse, for being my companions and making my student life more joyful. Special thanks go to everyone who has prayed for my study in these two years.

Although my parents have no idea about my research work, their unconditional love, care, and support are always the source of encouragement to me. Every night, the delicious dinner and nutritious soup prepared by my parents give me more than enough energy to complete my thesis.

Last but not least, I would like to thank my wife, Jenny, for her companionship through all the happiness, sadness, worries, and busy days in our lives.
Abstract 
In this thesis, we study two important problems related to Asynchronous Transfer Mode (ATM).

One is to design a large scale ATM switch that scales to support a large number of I/O ports. We propose a novel switch interconnection architecture called Hierarchical Banyan Switch Interconnection (HBSI), which connects small shared memory switch modules into a large scale switch. The proposed architecture is incrementally scalable, stackable and fault tolerant. We describe the cell routing algorithm inside the switch, and we study the cell loss and call blocking performance of HBSI. The architecture employs the channel grouping technique to achieve high throughput and good cell loss performance.

The use of shared memory switches in HBSI is simple and gives good cell loss performance. However, to make the HBSI architecture generally applicable to other kinds of switching modules, it is desirable to relax the assumption on the type of modules used. We study the use of generic point-to-point modules in HBSI, and show how channel grouping can be facilitated while still maintaining the cell sequence.

The other problem is how to guarantee the quality of service for each ATM connection. We study a number of rate-based ATM output port scheduling algorithms and propose a new scheduling scheme called Gated Virtual Clock Scheduling (GVCS), which can guarantee the throughput of individual streams and give a small delay to all streams. We propose an implementation of an output port scheduler which can be programmed to implement many different rate-based output port scheduling schemes, including GVCS.
Contents
1 Introduction
1.1 Background
1.1.1 Large Scale Switch Interconnections
1.1.2 Multichannel Switching and Resequencing
1.1.3 Scheduling
2 Hierarchical Banyan Switch Interconnection
2.1 Introduction
2.2 Switch Architecture
2.3 Switch Operation
2.3.1 Call Setup
2.3.2 Cell Routing
2.3.3 Fault Tolerance
2.4 Call Blocking Analysis
2.4.1 Dilated Banyan
2.4.2 Dilated Benes Network
2.4.3 HBSI
2.5 Results and Discussions
2.6 Summary
3 Multichannel Switching and Resequencing
3.1 Introduction
3.2 Channel Assignment
3.2.1 VC-Based Channel Allocation Mechanism
3.2.2 Port-Based Channel Allocation Mechanism
3.2.3 Trunk-Based Channel Allocation Mechanism
3.3 Resequencer
3.3.1 Resequencing Algorithm
3.4 Results and Discussion
3.5 Summary
4 Scheduling
4.1 Introduction
4.2 Virtual Clock Scheduling (VCS)
4.3 Gated Virtual Clock Scheduling (GVCS)
4.4 Time-Priority Model
4.5 Programmable Rate-based Scheduler (PRS)
4.6 Integration with Resequencer
4.7 Results and Discussions
4.8 Summary

Chapter 1
Introduction

In this chapter, we give an introduction to the background of this thesis. The objective of this chapter is threefold: (1) to state the problems we tackle in this thesis, (2) to describe the relevant fields of research, and (3) to introduce the contributions of this thesis.
1.1 Background 
Communication technologies have been advancing rapidly, providing us with faster channels and more advanced applications. Generally, every step in the improvement of technologies gives us more bandwidth, more reliable communication sessions, and a reduction in cost [1]. As a result, many applications that could not be realized in the past for either technical or economic reasons are now becoming feasible.
Among the technologies developed in recent years, Asynchronous Transfer Mode (ATM) is recognized as the key technology for supporting all kinds of services in the forthcoming Broadband Integrated Services Digital Network (BISDN). ATM specifies that all information services are transported in the form of short, 53-byte packets called cells, and it can provide Quality of Service (QoS) guarantees to different applications. ATM switches are now commercially available and deployed in actual applications. Still, there are emerging standards to further enhance the capabilities of ATM, and plenty of research is in progress to tackle various ATM-related problems.
In this thesis, we tackle two important problems in ATM. The first problem is how to make an ATM switch scalable. In this respect, we consider various switch interconnection architectures and propose a novel switch interconnection architecture called Hierarchical Banyan Switch Interconnection (HBSI). The second problem is how to guarantee QoS for each ATM stream through scheduling. We consider several rate-based scheduling algorithms and propose a novel scheme named Gated Virtual Clock Scheduling (GVCS). Finally, we combine the two studies into the implementation of a scalable ATM switch architecture which can guarantee the QoS of each ATM stream.
In the following, we describe the background of the work in detail.
1.1.1 Large Scale Switch Interconnections 
There have been many studies on building large scale fast packet switches by interconnecting small switching modules [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. The main motivation is that future broadband central offices might need to support many thousands of lines, yet it is difficult to put too many ports in one switching module. The limitation is primarily due to I/O circuitry and connectors, which take up significant space and add complexity [16].
Another motivation, which perhaps has more near-term value, is that ATM switches are now deployed for LAN and enterprise network applications. It is desirable for a company to start with a small switch and expand it later by adding more switch modules. The concern then is how a switch can grow by wiring new modules to the existing modules. In this thesis, we use the term module to stand for the small switching modules, and switch for the interconnected switch fabric.
Two-Stage Architectures 
In the literature, most designs are based on two-stage or three-stage architectures. Figure 1.1 shows a two-stage architecture proposed by Lee [3]. As suggested in [3, 13], a two-stage architecture switches the cells in two steps. The first step is the distribution step, where the routing and multicasting functions are performed. The second step is the multiplexing step, which multiplexes the cells to the destination output ports. As there is a unique path between a source and a destination, cell sequence is not a problem in two-stage architectures.
Three-Stage Architectures 
Figure 1.2 shows a generic three-stage architecture [5, 8], which is the well-known Clos network [17, 18]. By using more modules at the second stage, the Clos network can provide more internal paths, resulting in lower cell loss and call blocking probabilities. The major problem of three-stage architectures is growability: whenever more modules are added to increase the switch size, many internal lines have to be rewired.
Figure 1.1: A modular two-stage architecture (distributing stage: Module 1 to Module K; multiplexing stage: Mux units; N = MK)
Figure 1.2: A generic three-stage architecture
Multi-Stage Architectures 
Multi-stage architectures have three or more stages [6, 11, 19], and the switch size is increased by adding more stages or modules. By having more stages, multi-stage architectures can use smaller, fixed-size switching modules. Figure 1.3 shows an example of a multi-stage architecture: the banyan switch with channel grouping, commonly called the dilated banyan. Each of the modules is a generic 16 x 16 switching fabric.
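Cells in a banyan are normally self-routed by a destination tag, and with channel grouping each module output is a group of links rather than a single link. The sketch below assumes, purely for illustration, that each module's outputs form `radix` link groups (for example, a 16 x 16 module arranged as 4 groups of 4 links), so that stage s routes on one base-`radix` digit of the destination. The function names are hypothetical, and the actual grouping in Figure 1.3 may differ.

```python
def routing_digits(dest, radix, stages):
    """Output-group index chosen at each stage, most significant digit first.

    Hypothetical destination-tag scheme: stage s uses base-`radix` digit s of `dest`.
    """
    digits = []
    for s in range(stages):
        weight = radix ** (stages - 1 - s)
        digits.append((dest // weight) % radix)
    return digits


def pick_link(group_busy):
    """With channel grouping, any idle link of the chosen group will do.

    Returns the first idle link index, or None if every link in the group is busy.
    """
    for link, busy in enumerate(group_busy):
        if not busy:
            return link
    return None


# A 3-stage network of radix-4 modules reaches 4**3 = 64 output groups.
print(routing_digits(27, radix=4, stages=3))  # [1, 2, 3], since 27 = 1*16 + 2*4 + 3
print(pick_link([True, False, True, True]))   # 1
```

The group index alone fixes the path; which link inside the group a cell uses is free, which is what lets channel grouping raise throughput.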
One-Sided Architectures 
All of the above designs are two-sided architectures, in which the input ports are placed at one side and the output ports at the other [14, 20]. They require the rewiring of I/O or internal links whenever new modules are added, and the growth is limited by the stages of expansion, which is in general nonlinear. On the other hand, one-sided (or folded) architectures have been proposed to allow growth in an incremental manner [14, 20]. These architectures can be two-stage, three-stage, or multi-stage. In one-sided architectures, all the input and output ports are located at one side. The cells are first switched towards the later stages and then back to the first stage.
Hierarchical Banyan Switch Interconnection 
In this thesis, we propose a one-sided switch interconnection architecture, Hierarchical Banyan Switch Interconnection (HBSI). HBSI allows a small number of modules to be added at a time, and new modules can be added without modifying the existing connections. We refer to the first feature as incremental scalability and the second feature as stackability. We will also show how HBSI achieves the following features:
• Hierarchical Switching: In two-sided architectures, all traffic goes from one side to the other, so all stages carry the same load. In HBSI, all the I/O ports are located at the first stage. Traffic is switched towards the later stages and back to the first stage. This reduces the load at later stages and gives good performance if the traffic is localized in nature. Furthermore, local traffic can still be switched even if some modules of later stages fail.
• Fault Tolerance: In a switch interconnection network, a link or module may fail. HBSI allows affected calls to be rerouted to alternate paths and thus maintains their connectivity.
• Loose Synchronization: At each stage of switching, buffers are employed for storing cells. This relaxes the inter-stage synchronization constraint, which could be a significant problem if the switch is large.
We will discuss the HBSI architecture in detail in Chapter 2.
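The load-reduction argument for hierarchical switching can be made concrete with a small model. Assume, hypothetically, that IMs are numbered so that the EMs of stage s join all IMs whose indices differ only in their s lowest bits; then a cell turns back at the stage given by the highest bit in which the source and destination IM numbers differ, and local traffic never leaves its IM. The numbering and function names are illustrative assumptions, not the routing rule of Chapter 2.

```python
def turnaround_stage(src_im, dst_im):
    """EM stage at which a cell turns back towards the IMs (0 = stays inside its IM).

    Assumes a hypothetical numbering in which stage-s EMs join all IMs whose
    indices differ only in their s lowest bits.
    """
    return (src_im ^ dst_im).bit_length()


def stage_load(k):
    """Fraction of uniformly addressed traffic from one IM that climbs to each stage 1..k.

    Source IM 0 is representative: XOR makes the picture the same for every source.
    """
    n = 2 ** k
    return [sum(1 for d in range(n) if turnaround_stage(0, d) >= s) / n
            for s in range(1, k + 1)]


print([turnaround_stage(0, d) for d in range(4)])  # [0, 1, 2, 2] with four IMs (64 x 64)
print(stage_load(2))                                # [0.75, 0.5]
```

Under these assumptions, only half of uniformly addressed traffic reaches the last stage, and traffic localized among nearby IMs turns back even earlier.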
1.1.2 Multichannel Switching and Resequencing 
In HBSI, we use the channel grouping technique to enhance the throughput [5, 21, 22, 23, 24]. One of the requirements in ATM is that cells must be delivered in sequence [25]. In the discussion of HBSI in Chapter 2, we assume the use of shared memory switches to ensure the cell sequence.
The use of shared memory switches is simple and gives good cell loss performance. However, to make the HBSI architecture generally applicable to other kinds of switching modules, it is desirable to relax the assumption on the type of modules used. We study the use of generic point-to-point modules in HBSI, and show how channel grouping can be facilitated while still maintaining the cell sequence.
In the literature, there are mainly two ways to ensure the cell sequence in a switch interconnection architecture. The first approach is preventive in nature. Kim uses sorters, i.e., Batcher networks [26], to ensure that the system appears like a series of virtual FIFO queues [27]. At each stage, packets are sorted according to their input sequence. In [21], Cheng uses an implicit spatial order of links within a link group to represent the time order of packets. To prevent later packets from overtaking earlier packets that arrive on different links, dummy packets are used to retain the sequence integrity.
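As a generic illustration of sorting cells back into sequence with a Batcher network, the sketch below uses Batcher's bitonic sorter, one of his two classic constructions; which construction [27] actually uses is not stated here. Tagging cells as `(sequence number, payload)` tuples is an assumption for the example.

```python
def bitonic_sort(cells):
    """Sort a power-of-two list with Batcher's bitonic compare-exchange network.

    Every comparator visited in one pass of the inner `for` loop belongs to the
    same column of the network, which hardware would evaluate in parallel.
    """
    a = list(cells)
    n = len(a)
    assert n and (n & (n - 1)) == 0, "network size must be a power of two"
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:       # comparator distance within this merge step
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (ascending and a[i] > a[partner]) or \
                       (not ascending and a[i] < a[partner]):
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


# Cells tagged (input sequence number, payload) leave the sorter in sequence order.
cells = [(5, "f"), (1, "b"), (7, "h"), (3, "d"), (2, "c"), (8, "i"), (6, "g"), (4, "e")]
print([seq for seq, _ in bitonic_sort(cells)])  # [1, 2, 3, 4, 5, 6, 7, 8]
```

An n-input bitonic network has log2(n)(log2(n) + 1)/2 comparator columns, which is the hardware cost this preventive approach pays at every stage.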
The second approach to ensure the cell sequence is to use a resequencer to reconstruct the cell sequence at the output port, so that the cell order can be ignored at each intermediate stage [22, 28]. This second approach simplifies the internal modules, but needs a resequencer at the output port. In the following, we focus only on the second approach and study algorithms for channel allocation and for guaranteeing the cell sequence.
We will discuss the details in Chapter 3.
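A minimal sketch of the resequencing idea, assuming each cell carries a per-VC sequence number starting at 0 (a hypothetical tag, not necessarily the mechanism of Chapter 3): the output port holds out-of-order cells of each virtual circuit until the missing ones arrive, then releases the longest in-order run.

```python
class Resequencer:
    """Per-VC resequencing buffer at an output port (hypothetical sketch).

    Assumes per-VC sequence numbers starting at 0. Sequence number wrap-around
    and lost cells (which would stall a VC forever here) are not handled.
    """

    def __init__(self):
        self.expected = {}  # VC -> next sequence number to release
        self.held = {}      # VC -> {sequence number: cell} waiting for a gap to fill

    def receive(self, vc, seq, cell):
        """Accept one cell; return the cells of this VC that can now leave in order."""
        nxt = self.expected.setdefault(vc, 0)
        held = self.held.setdefault(vc, {})
        held[seq] = cell
        released = []
        while nxt in held:
            released.append(held.pop(nxt))
            nxt += 1
        self.expected[vc] = nxt
        return released


r = Resequencer()
print(r.receive(7, 1, "b"))  # [] because cell 0 of VC 7 has not arrived yet
print(r.receive(7, 0, "a"))  # ['a', 'b']
print(r.receive(7, 2, "c"))  # ['c']
```

A practical design must also bound how long a gap may be waited for, otherwise one lost cell blocks the whole VC.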
1.1.3 Scheduling 
ATM is a connection-oriented service. Nevertheless, there is no pre-assigned time slot for a particular connection as in Time Division Multiplexing (TDM). Different ATM streams submit cells into the network only if they have traffic, and ATM cells are multiplexed at each stage of switching and transmission until they are delivered to their destinations. This statistical multiplexing property of ATM makes it attractive for supporting multi-rate connections, variable bit-rate connections, and bursty traffic. The downside of statistical multiplexing is that congestion may occur if streams submit long bursts of cells simultaneously. If congestion persists, cells may be discarded or excessively delayed, causing significant degradation in the quality of service to ATM streams.
The problem of congestion in ATM networks is challenging because of one fact: the characteristics of the ATM streams supported can differ greatly from one another. ATM streams can select different traffic parameters, such as peak cell rate, sustainable cell rate, cell delay variation tolerance, and maximum burst size, and different QoS parameters, such as peak-to-peak cell delay variation, maximum cell transfer delay, and cell loss ratio [25]. An ATM network therefore has to support high-rate continuous streams (e.g., HDTV) of up to tens of Mbps, as well as bursty, intermittent data transmissions (e.g., terminal queries) of only a few cells per second. When such diverse streams are statistically multiplexed, it is difficult to guarantee the QoS for each stream.
Consider a generic ATM multiplexer, shown in Figure 1.4. An ATM switch is like a multiplexer with multiple output ports, each with its own output scheduler. Cells arriving at the input ports are placed into the output queues for transmission. As each output port can deliver only one cell at a time to the output link, scheduling is critical in controlling the delay experienced by each cell.
In traditional packet switching networks, packets are mostly served with the First-Come-First-Served (FCFS) or Round-Robin (RR) discipline. These schemes are simple, but the QoS perceived by one stream can be affected by the behavior of the other streams. Consider two bursty streams generating cells into an FCFS ATM multiplexer. As the multiplexer delivers only one cell at a time, cells will accumulate inside the multiplexer. If a third stream now generates cells, its newly arrived cells will be delayed by the stored cells. On the other hand, the RR scheme divides the bandwidth equally among all streams. This favours low-rate streams, whereas high-rate streams may receive less bandwidth than they need. It is therefore difficult for a switch that employs either FCFS or RR to offer QoS guarantees such as throughput, end-to-end delay, and cell delay variation [29].
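The FCFS effect described above, and the isolation RR provides, can be reproduced with a toy slotted multiplexer that sends one cell per slot. The burst sizes, slot numbers, and function names are made up for illustration.

```python
from collections import deque


def fcfs(arrivals):
    """Serve one cell per slot in arrival order. Returns (stream, arrival, departure)."""
    out, t = [], 0
    for arr, stream in sorted(arrivals, key=lambda c: c[0]):  # stable: ties keep input order
        t = max(t, arr)
        out.append((stream, arr, t))
        t += 1
    return out


def round_robin(arrivals, streams):
    """Serve one cell per slot, visiting the per-stream queues cyclically."""
    pending = deque(sorted(arrivals, key=lambda c: c[0]))
    queues = {s: deque() for s in streams}
    out, t, ptr = [], 0, 0
    while pending or any(queues.values()):
        while pending and pending[0][0] <= t:
            arr, stream = pending.popleft()
            queues[stream].append(arr)
        for step in range(len(streams)):
            s = streams[(ptr + step) % len(streams)]
            if queues[s]:
                out.append((s, queues[s].popleft(), t))
                ptr = (ptr + step + 1) % len(streams)
                break
        t += 1
    return out


def delays(departures, stream):
    return [dep - arr for s, arr, dep in departures if s == stream]


# Bursty streams A and B each dump 10 cells in slot 0; stream C sends one cell in slot 1.
arrivals = [(0, "A")] * 10 + [(0, "B")] * 10 + [(1, "C")]
print(delays(fcfs(arrivals), "C"))                          # [19]
print(delays(round_robin(arrivals, ["A", "B", "C"]), "C"))  # [1]
```

RR protects C here, but by the same rule it would cap a high-rate stream at one third of the link whenever all three queues are backlogged.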
To achieve some or all of the QoS guarantees without giving up the benefits of statistical multiplexing, rate-based disciplines have been proposed [30, 31]. In particular, Virtual Clock Scheduling (VCS) [32] provides firewalls among individual streams: as long as a traffic flow has reserved a bandwidth, its throughput is guaranteed independently of the other streams. Another scheme, Packetized Generalized Processor Sharing (PGPS), better known as Weighted Fair Queueing (WFQ) [33], provides a proportional share of the remaining bandwidth among the streams while achieving the same throughput guarantee. It has been shown that both WFQ and VCS can also guarantee delay if the source has been shaped by a leaky bucket traffic shaper [29, 34].
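The firewall property of VCS comes from its time-stamping rule. The sketch below shows only the core stamping step of Virtual Clock (time measured in cell slots, rates in cells per slot, ties served in arrival order); the class name, rates, and traffic are assumptions for illustration, and the monitoring functions of the full scheme are omitted.

```python
import heapq


class VirtualClock:
    """Core Virtual Clock stamping (sketch; time in cell slots).

    Each stream reserves `rate` cells per slot, so its virtual tick is 1/rate.
    On every arrival: aux_vc = max(now, aux_vc) + vtick, and the cell carries
    that stamp. The output port always sends the smallest stamp first.
    """

    def __init__(self, rates):
        self.vtick = {s: 1.0 / r for s, r in rates.items()}
        self.aux_vc = {s: 0.0 for s in rates}
        self.heap = []
        self.count = 0  # tie-breaker: equal stamps are served in arrival order

    def arrive(self, stream, now):
        stamp = max(now, self.aux_vc[stream]) + self.vtick[stream]
        self.aux_vc[stream] = stamp
        heapq.heappush(self.heap, (stamp, self.count, stream))
        self.count += 1

    def send(self):
        return heapq.heappop(self.heap)[2] if self.heap else None


vc = VirtualClock({"A": 0.5, "B": 0.25, "C": 0.25})
for _ in range(8):       # A bursts far beyond its reservation
    vc.arrive("A", now=0)
vc.arrive("C", now=0)    # C sends a single cell at the same time
order = [vc.send() for _ in range(9)]
print(order.index("C"))  # 2: C's cell goes third, not ninth as under FCFS
```

A's burst pushes its own stamps ahead (2, 4, ..., 16) but cannot push back C's stamp of 4, which is how one stream is isolated from another's misbehaviour.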
Figure 1.3: Banyan architecture with channel grouping
Figure 1.4: An ATM multiplexer (input streams, output buffers, output)

Both VCS and WFQ are attractive for guaranteeing the throughput and due time of individual streams. Still, these schemes have limitations. First, they require sorting and insertion of time labels, which are complicated tasks to realize at high switching speeds. Second, their priority assignment is coupled with the rate of the reserved bandwidth. As a result, low-rate streams experience a larger delay than high-rate streams. While it is true that most low-rate streams can tolerate longer delay, some low-rate streams, such as alarm messaging, may need a small delay. In [29], it is suggested that low-rate streams may reserve a higher bandwidth to achieve a smaller delay. This will reduce the network efficiency, but the improvement in delay for low-rate streams could be insignificant. Consider a source which generates a 1 KByte message per second. Even if it requests 10 times the bandwidth, its reserved rate is still far below that of a source transmitting at 1 MByte per second, so the delay of the 1 KByte/s stream would still be much larger than that of the 1 MByte/s stream.
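To put numbers on the 1 KByte/s example: under a rate-based scheduler such as VCS, a stream's cells are spaced, and roughly delayed, in proportion to the inverse of its reserved rate. The arithmetic below assumes 1 KByte = 1000 bytes and counts only the 48-byte cell payload; it is a back-of-the-envelope comparison, not a delay bound.

```python
CELL_PAYLOAD_BYTES = 48  # 53-byte ATM cell minus its 5-byte header


def cell_spacing_ms(bytes_per_second):
    """Interval between cells of a stream sent at exactly its reserved rate."""
    return CELL_PAYLOAD_BYTES * 1000 / bytes_per_second


low = cell_spacing_ms(1_000)       # 1 KByte/s            -> 48 ms
boosted = cell_spacing_ms(10_000)  # 10x over-reservation -> 4.8 ms
high = cell_spacing_ms(1_000_000)  # 1 MByte/s            -> 0.048 ms
print(low, boosted, high, round(boosted / high))
```

Even after a tenfold over-reservation, the low-rate stream's cell spacing is about 100 times that of the 1 MByte/s stream.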
In view of these limitations, we pursue three directions of research. First, we investigate how to implement VCS or WFQ efficiently, resulting in a design which is more implementable. Second, we investigate schemes which give smaller delay to low-rate streams while achieving the same throughput guarantee. Finally, we observe that different scheduling disciplines have different characteristics, each of which may suit a specific network and traffic condition. It would be interesting if a scheduler could adapt to different network conditions by changing its parameters (e.g., from three to four priority queues) or by switching to a different scheme altogether (e.g., from VCS to RR). We could then install the scheduler inside a switch first, and program it on the fly to adapt to a specific network condition.
In Chapter 4, we present our efforts in these three directions of research.
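The idea of reprogramming a scheduler on the fly can be illustrated in software by making the service policy a replaceable function. The names below are hypothetical and the sketch shows only the concept; the Programmable Rate-based Scheduler of Chapter 4 is a hardware design and need not work this way.

```python
from collections import deque


def fcfs_policy(sched):
    """Pick the stream whose head cell arrived earliest."""
    heads = [(q[0][0], s) for s, q in sched.queues.items() if q]
    return min(heads)[1] if heads else None


def rr_policy(sched):
    """Pick the next non-empty queue after the last one served."""
    n = len(sched.streams)
    for step in range(n):
        s = sched.streams[(sched.rr_ptr + step) % n]
        if sched.queues[s]:
            sched.rr_ptr = (sched.rr_ptr + step + 1) % n
            return s
    return None


class ProgrammableScheduler:
    """Output-port scheduler whose service policy can be swapped while it runs."""

    def __init__(self, streams, policy):
        self.streams = list(streams)
        self.queues = {s: deque() for s in self.streams}
        self.policy = policy
        self.rr_ptr = 0
        self.arrivals = 0  # global arrival counter, used by FCFS

    def enqueue(self, stream, cell):
        self.queues[stream].append((self.arrivals, cell))
        self.arrivals += 1

    def dequeue(self):
        s = self.policy(self)
        return self.queues[s].popleft()[1] if s is not None else None


sched = ProgrammableScheduler(["A", "B"], fcfs_policy)
for stream, cell in [("A", "a1"), ("A", "a2"), ("A", "a3"), ("B", "b1")]:
    sched.enqueue(stream, cell)
out = [sched.dequeue()]   # FCFS serves a1
sched.policy = rr_policy  # reprogrammed on the fly
out += [sched.dequeue() for _ in range(3)]
print(out)                # ['a1', 'a2', 'b1', 'a3']
```

Queued cells survive the switch of policy; only the order in which the remaining cells are served changes.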
Chapter 2 
Hierarchical Banyan Switch 
Interconnection 
2.1 Introduction 
In Section 1.1.1, we reviewed several categories of switch interconnection architectures proposed in the literature. Some of the concerns in building a large scale switch interconnection architecture are (1) incremental scalability, (2) the effort required when the switch is scaled up, i.e., stackability, and (3) fault tolerance. In this chapter, we propose a new switch interconnection architecture based on the one-sided approach. In our proposal, the channel grouping technique is employed to enhance the switch throughput and simplify the internal routing operation.
2.2 Switch Architecture 
We use an example to illustrate the construction of HBSI. Figure 2.1 shows one kind of module, which consists of M local input ports, M local output ports and 2G expansion ports (arranged as 2 link groups), i.e., an (M + 2G) x (M + 2G) switching fabric. It is referred to as an Interface Module (IM), and it contains the physical I/O and call processing circuitry. In our example, we use an IM with M = 16 and G = 8. It has 16 full-duplex I/O ports, 16 bi-directional expansion ports, and an internal 32 x 32 switch fabric.
Figure 2.2 shows another kind of module, the Expansion Module (EM), which is a 4G x 4G switch module with no I/O ports or call processing circuitry. The ports are arranged into 4 bi-directional link groups. With G = 8, we have a 32 x 32 EM. As I/O and call processing components are known to dominate the cost [16], EMs should be much simpler and cheaper than IMs.
To construct a 32 x 32 switch from the above components, we wire up two IMs and two EMs as shown in Figure 2.3. Each link group between the first stage and the second stage consists of 8 bi-directional links. Similarly, we can construct a 64 x 64 switch by adding two IMs and six EMs to the system (Figure 2.4), connected in a banyan manner. In general, with M = 2G, we can construct an $M \cdot 2^k \times M \cdot 2^k$ switch with $2^k$ IMs and $k \cdot 2^k$ EMs, and extend it into an $M \cdot 2^{k+1} \times M \cdot 2^{k+1}$ switch by adding $2^k$ IMs and $k \cdot 2^k + 2^{k+1}$ EMs. We see that all input and output ports are available at one side, and new modules are added by wiring them to the back of existing modules without modifying the existing connections.
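The module counts above can be checked with a few lines of arithmetic. The sketch simply evaluates the stated formulas, with M = 16 and G = 8 as defaults, and confirms that the difference between consecutive sizes is $2^k$ IMs and $k \cdot 2^k + 2^{k+1}$ EMs. Reading $k \cdot 2^k$ as k stages of $2^k$ EMs each is our interpretation of Figures 2.3 and 2.4; the function names are hypothetical.

```python
def module_ports(M=16, G=8):
    """Port counts of the two module types: IM is (M + 2G) x (M + 2G), EM is 4G x 4G."""
    return M + 2 * G, 4 * G


def hbsi_modules(k, M=16, G=8):
    """(switch size, #IMs, #EMs) of an HBSI with 2**k IMs, for M = 2G."""
    assert M == 2 * G, "formula stated for M = 2G"
    return M * 2 ** k, 2 ** k, k * 2 ** k


def growth_step(k, M=16, G=8):
    """Modules added when doubling from 2**k to 2**(k + 1) IMs."""
    _, ims0, ems0 = hbsi_modules(k, M, G)
    _, ims1, ems1 = hbsi_modules(k + 1, M, G)
    return ims1 - ims0, ems1 - ems0


print(module_ports())   # (32, 32)
print(hbsi_modules(1))  # (32, 2, 2), as in Figure 2.3
print(hbsi_modules(2))  # (64, 4, 8), as in Figure 2.4
print(growth_step(1))   # (2, 6): two IMs and six EMs
```

Algebraically, $(k+1) \cdot 2^{k+1} - k \cdot 2^k = k \cdot 2^k + 2^{k+1}$, which is the EM increment the check confirms for every k.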
It appears that HBSI has to double its size at each step of expansion, resulting in nonlinear growth. Note, however, that the EMs serve only to connect the IMs, so if less traffic is routed through the EMs, especially at later stages, we can use fewer EMs. Note also that IMs can be added in an incremental fashion as well. Figure 2.5 shows how a 32 x 32 switch can grow to 48 x 48 by adding one IM and three EMs. It may later grow to 64 x 64 by adding one more IM and three more EMs.

Figure 2.1: An Interface Module (IM) (shared memory fabric; local I/O ports 1 to M with input port controllers (IPC); link groups X and Y of G links each)
Figure 2.2: An Expansion Module (EM) (shared memory fabric; expansion link groups A, B, X and Y of G links each)
The above discussion assumes that there are two link groups in each IM, and M = 2G. In general, we can have L link groups in each IM, so that each (M + LG) x (M + LG) IM connects to L EMs in the next stage. This gives more alternate paths at the cost of more interconnections. On the other hand, instead of having M = LG, we can reduce the switch internal loading by setting M < LG (input expansion), or conversely, concentrate the input traffic by setting M > LG. This allows more modules to be connected at each stage, and so can reduce the number of switching stages for a given switch size. This, however, requires either larger switching modules or a smaller group size G, which will affect the switching performance. Hereafter, we focus on the simplest form with M = 2G.
In HBSI, we group the internal links into link groups. Cells can be routed to any link of the link group. This is known as channel grouping [5, 21, 22, 23, 24], which can give higher throughput and better cell loss performance. The Input Port Controllers (IPCs) located at the IMs are responsible for processing the incoming cells and generating routing tags for them according to their VCI. The routing tags direct the path of the cells inside the switch.
Figure 2.3: A 32x32 HBSI switch
Figure 2.4: A 64x64 HBSI switch

According to the ATM Forum [25], switches must guarantee the cell sequence of a virtual circuit. To ensure the cell sequence in the switch, there are at least two approaches:
1. using a shared memory switching fabric in both IMs and EMs [9, 19]. By maintaining the time order among the links within a link group, we can ensure the cell sequence implicitly; or
2. using a generic switching fabric [5, 21, 22, 23, 24] together with some method to keep the cells in sequence explicitly.
We look into the first approach here, and leave the discussion of additional methods to Chapter 3. In other words, here we assume that both the IM and the EM are based on a shared memory switching fabric, so the incoming cells are stored and then read out by the output links. In IMs and EMs, the memory is partitioned into FIFO queues, one for each local output port and each expansion link group. At each time slot, the arriving cells are stored one by one into their output queues in the shared memory, according to the input link order (Figure 2.6). Note that the link order is mandatory for links within the same group, but is not necessary for links belonging to different groups, so different link groups may store their cells in parallel. If the output queue is full, a cell inside the queue is dropped as a lost cell.¹ On the other hand, cells in an output queue are transmitted one by one to the output port or link group. In the case of a link group, the first G cells (or all cells, if there are fewer than G cells in the queue) are transmitted on the G links of that group.
I n fact , there is an imp l i c i t t ime order for two cells w i t h i n a l i nk group. 
Assuming a, b are two l inks in a l ink group and a < b, the cell on l ink a is of a 
iThe choice of the cell to drop imposes difference in system behaviour. We will not discuss 
the different effects here. 
17 
Chapter 2 Hierarchical Banyan Switch Interconnection 
16Local- .1 ~ L • [1^力 ^ ^ cu ^ ~ ~ • 16 Expansion 
I/O L i n k s ^ ^ IM ; ^ ^ EM ^ EM ^ ^ ,/o Links 
16 Loca l -<^~~ |M~~L J ~ ~ E M " ^ V ^ 
I/O Links^ ». ^ - ^ J ^ 
^ i ^ ^ 7 V ^ . 
i/OMnkl-^*"^ IM ^ ~ ~ • EM t i EM " ^ ^ 16Expansion I/。Links^ 一 ^ _ _ • ^ _ _ ^ - ^ - ^ I/O Links 
Figure 2.5: A 48x48 H B S I switch 
FIFO Queue 
1 Rx for Link Group Y 丁父 ^ 
~~2~H k^  r z ^ MZZZ>"2"""^ 
Link - ^ r S \ ^ = Z r T ^ ^ Link 
G r o u p ~ ~ ^ k A ^ ^ Z Z I 2 Group 
^ ^ Y 
To another \ 
Link Group Cfells from 
G r o u p A t o Group Y 
Figure 2.6: Ma in ta in ing l ink order when cells are stored and t ransmi t ted 
18 
• 
Chapter 2 Hierarchical Banyan Switch Interconnection 
higher time order. If the cells belong to the same connection, the cell on link a will be stored at an earlier position in the shared queue, and will be delivered to an output port at an earlier time slot or to an output link with a higher time order. With this arrangement, the time order of cells is maintained when cells are switched by each module. Thus, the cell sequence can be guaranteed end to end if cells of the same connection are routed through the same path.
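The storing and transmitting rules above can be illustrated with a small model of one shared-memory module. This is a minimal sketch, not the thesis's hardware design: the class and method names are hypothetical, a cell is modelled as a (connection, sequence number) tuple, and when a queue is full we simply drop the arriving cell (the text notes that the choice of which cell to drop matters but leaves it open). The usage part passes one connection through two modules in tandem and checks that the cell sequence survives.

```python
from collections import deque


class SharedMemoryModule:
    """Hypothetical model of an IM/EM shared memory: one FIFO queue per output."""

    def __init__(self, outputs, capacity):
        # One FIFO queue per local output port or expansion link group.
        self.queues = {out: deque() for out in outputs}
        self.capacity = capacity
        self.lost = 0

    def store(self, arrivals):
        """Write one time slot of arrivals into the output queues.

        arrivals is ordered by input link number; each entry is None (idle link)
        or an (output, cell) pair. Writing in input-link order keeps the time
        order of cells that arrive on the same link group."""
        for entry in arrivals:
            if entry is None:
                continue
            out, cell = entry
            queue = self.queues[out]
            if len(queue) >= self.capacity:
                self.lost += 1  # assumption: the arriving cell is the one dropped
            else:
                queue.append(cell)

    def transmit_group(self, out, g):
        """Send the first G cells (or all, if fewer) on links 0..G-1 of a group.

        Link 0 receives the oldest cell, so a lower link number carries a
        higher time order, as in the text."""
        queue = self.queues[out]
        return [queue.popleft() for _ in range(min(g, len(queue)))]


# Cells 0..5 of one connection cross two modules through link groups of size G = 2.
G = 2
first = SharedMemoryModule(outputs=["Y"], capacity=8)
second = SharedMemoryModule(outputs=["Y"], capacity=8)
for t in range(3):
    # Cells 2t and 2t+1 arrive on links 0 and 1 of the input group.
    first.store([("Y", ("vc1", 2 * t)), ("Y", ("vc1", 2 * t + 1))])
    sent = first.transmit_group("Y", G)
    # Link i of group Y feeds input link i of the next module.
    second.store([("Y", cell) for cell in sent])
delivered = []
while second.queues["Y"]:
    delivered += second.transmit_group("Y", G)
print([seq for _, seq in delivered])  # [0, 1, 2, 3, 4, 5]
```

Because both modules write in link order and read the oldest cells onto the lowest-numbered links, no per-cell sequence numbers are needed inside the switch.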
In essence, HBSI uses a 32 x 32 switch as a 16 x 16 module and doubles the interconnections. The extra links are used to route the intermodule traffic. We argue that this is necessary in order to leave room for incremental growth and to give good call blocking performance, and the increase in cost can be minimal since I/O and call processing circuitries dominate the cost of a switching system.
2.3 Switch Operation 
2.3.1 Call Setup 
In ATM, a connection has to be established before data are sent. In HBSI, the routing path of a call is the set of link groups used to deliver the cells from the source module to the destination module. The path is defined in the connection setup phase, and all cells from the same call are switched on the same path throughout the connection holding time. In HBSI, there are multiple paths between a source and destination pair. We therefore need a Call Admission Control (CAC) algorithm to determine whether the requesting call can be accepted, and to allocate the best path to the call [35, 36, 37, 38].
The call setup process can be handled by a centralized or a distributed approach. In a centralized approach, a call processor is connected to the switch. The source IM sends a request cell containing the desired traffic contract to the call processor. The call processor checks all the possible paths between the source and destination IMs to see if the call can be accommodated. If so, the call processor sends back an accept cell to the source IM with the routing tag for the new call. To reduce complexity, the call processor may check a smaller subset of paths instead of all possible paths.
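The centralized admission check can be sketched as follows. This is a minimal sketch under our own simplifying assumptions: each link group is reduced to a single residual-bandwidth number, a path is a list of hypothetical link-group identifiers, and the "best" path is taken to be the one whose tightest link group has the most residual bandwidth. The thesis does not prescribe this selection rule, and the CAC algorithms cited above may use different criteria.

```python
def admit_call(paths, residual, demand):
    """Return the chosen path (reserving bandwidth on it), or None to reject.

    paths    -- candidate paths between the source and destination IMs;
                each path is a list of link-group ids (hypothetical naming)
    residual -- dict: link-group id -> unreserved bandwidth
    demand   -- bandwidth requested in the traffic contract
    """
    feasible = [p for p in paths if all(residual[g] >= demand for g in p)]
    if not feasible:
        return None  # reject: no checked path can accommodate the call
    # Assumed policy: pick the path with the largest bottleneck residual bandwidth.
    best = max(feasible, key=lambda p: min(residual[g] for g in p))
    for g in best:
        residual[g] -= demand  # reserve resources before sending the accept cell
    return best


paths = [["a", "b", "c"], ["d", "e", "f"]]
residual = {"a": 3.0, "b": 1.0, "c": 4.0, "d": 2.0, "e": 2.5, "f": 3.0}
print(admit_call(paths, residual, 1.5))  # ['d', 'e', 'f']  (first path blocked at 'b')
print(admit_call(paths, residual, 1.5))  # None  (only 0.5 left on 'd')
```

Checking only a subset of paths, as suggested above, amounts to passing a shorter `paths` list.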
In a distributed approach, there is no centralized call processor; instead, the EMs have to maintain link information and participate in the call setup process. When a call request arrives, the source IM sends a request cell on each of the outgoing paths. When the request cell passes through an EM, the EM checks whether the call can be accepted. If so, it reserves the resources and forwards the cell to the next stage. Otherwise the cell is dropped and a reject cell is sent back. When the request cell reaches the destination IM, an accept cell is sent back. Some information concerning the traffic condition of the path can also be embedded into the accept cell. Upon receiving one or more accept cells, the source IM sends a confirm cell through the selected path. The EMs on the other paths, which do not receive the confirm cell, release the reserved resources after some time.
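The distributed procedure can be sketched with the message exchange replaced by direct function calls. The sketch is simplified and hypothetical: each EM is a dict of residual bandwidth per outgoing link group, reservations made before a reject are released immediately (in the text they simply time out), unconfirmed reservations on accepted paths are also released immediately rather than after a timeout, and the source confirms the first accepted path instead of using the traffic information carried in the accept cells.

```python
def distributed_setup(paths, ems, demand):
    """Simulate request/accept/confirm over every outgoing path.

    paths -- list of paths; each path is a list of (em_id, link_group) hops
    ems   -- dict: em_id -> dict: link_group -> residual bandwidth
    Returns the confirmed path, or None if every path rejected the call."""
    accepted = []
    for path in paths:                        # one request cell per outgoing path
        reserved = []
        for em_id, group in path:
            if ems[em_id][group] >= demand:   # EM accepts: reserve and forward
                ems[em_id][group] -= demand
                reserved.append((em_id, group))
            else:                             # EM drops the request, reject cell sent back
                break
        else:
            accepted.append(path)             # destination IM returns an accept cell
            continue
        for em_id, group in reserved:         # simplification: release at once on reject
            ems[em_id][group] += demand
    if not accepted:
        return None
    chosen = accepted[0]                      # source IM sends the confirm cell here
    for path in accepted[1:]:                 # unconfirmed reservations are released
        for em_id, group in path:
            ems[em_id][group] += demand
    return chosen


ems = {"E1": {"up": 2.0}, "E2": {"down": 0.5}, "E3": {"up": 2.0}, "E4": {"down": 2.0}}
p1 = [("E1", "up"), ("E2", "down")]
p2 = [("E3", "up"), ("E4", "down")]
print(distributed_setup([p1, p2], ems, 1.0))  # [('E3', 'up'), ('E4', 'down')]
print(ems["E1"]["up"], ems["E3"]["up"])       # 2.0 1.0
```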
Comparing the two approaches, we conclude that the centralized approach is simpler in operation, requires less intelligence in the EMs, and requires less resource management and control traffic inside the switch.
2.3.2 Cell Routing 
Figure 2.7 shows three paths that an incoming cell may take to an output port. We observe that for the same pair of source and destination IMs, there can be several possible paths, and some are longer than others. To maintain the cell sequence, the routing path of a virtual circuit is fixed at call setup, and the source IMs generate routing tags for each cell.
Figure 2.7 shows an example of defining the routing tags. For illustration, each module is labeled by (column, row), indicating its position. We define the switching direction to the right as upstream, and the direction of switching back as downstream. There are four link groups in the EMs, so we need two bits for each routing unit used at each routing step. We assign the routing unit values for routing cells upstream (X, Y in Figures 2.1 and 2.2) to be 01 and 10 for upward and downward respectively, and downstream (A, B in Figure 2.2) to be 00 and 11 for upward and downward respectively. For the IMs, we also need two bits to indicate whether the cell is for upstream (01 or 10) or for delivery to the local output ports (00). Let a1a2a3a4 be used to address the 16 output ports in an IM. We can therefore assign the routing tags for the three paths S-A, S-B and S-C to be "00,a1a2a3a4", which is a locally switched cell, "10,11,00,a1a2a3a4" from IM(0,00) to IM(0,01), and "10,01,11,00,00,a1a2a3a4" from IM(0,00) to IM(0,10), respectively.
Routing units for upstream (01 or 10) are identified as $u_j$'s, routing units for downstream (00 or 11) are identified as $d_j$'s, and the routing unit detected by an IM for local delivery (00) is identified as $p$. Let $s$ be the number of stages a cell is switched upstream before being switched back. For an $M(2^k) \times M(2^k)$ switch, we have $0 \le s \le k$. Let $n = \log_2 M$ be the number of address bits for identifying a port in a module of size $M$. We also define control bits $c_1 c_2 \cdots c_m$ for control and priority functions. See Tables 2.1 to 2.3 for a possible assignment of the control bits $c_1 c_2 c_3$ and the routing unit values. A routing tag $T$ will in general consist of five parts

$$T = C\,U\,D\,P\,A \tag{2.1}$$

where
$C = c_1 c_2 \cdots c_m$ are control and priority bits
$U = u_1 u_2 \cdots u_s$ are upstream routing units
$D = d_s d_{s-1} \cdots d_1$ are downstream routing units
$P = p$ is the routing unit for local delivery
$A = a_1 a_2 \cdots a_n$ are port address bits
At each stage of routing, an IM or EM checks the control bits, identifies the routing unit and switches the cell accordingly. We examine the longest routing path S-C in detail. On receiving an incoming cell for IM(0,10), the IPC of IM(0,00) generates the routing tag "c1c2c3,10,01,11,00,00,a1a2a3a4". As the first routing unit is 10, the cell is routed to a cross link to EM(1,01). Similarly, EM(1,01) routes the cell upstream to EM(2,01) based on the next unit 01, EM(2,01) switches the cell back to EM(1,11) based on the next unit 11, and EM(1,11) delivers the cell to IM(0,10) based on the unit 00. On detecting a subsequent unit 00, IM(0,10) knows that the cell is to be delivered to the local port with address a1a2a3a4. One may be concerned that the routing tags are of variable length. This problem can be easily solved by padding dummy bits at the end of the routing tags.
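The tag layout $T = CUDPA$ and the hop-by-hop reading of routing units can be sketched in Python. The tag is represented as a bit string; the function names, the 3-bit control field, the 4-bit address and the fixed padded length are illustrative assumptions. Units are interpreted with the values of Tables 2.2 and 2.3 (unicast only; module broadcast is not modelled), and the current column decides whether a unit is read by an IM (column 0) or an EM. Because the local-delivery unit $p$ tells the IM that the next $n$ bits are the port address, dummy bits padded after the address are simply ignored.

```python
IM_UP = {"10": "X", "11": "Y"}    # Table 2.2: IM unit -> upstream link group
EM_UP = {"01": "X", "10": "Y"}    # Table 2.3: EM unit -> upstream link group
EM_DOWN = {"00": "A", "11": "B"}  # Table 2.3: EM unit -> downstream link group


def build_tag(control, up, down, addr, total_bits=None):
    """Assemble T = C U D P A as a bit string, with P = '00' (local delivery)."""
    tag = control + "".join(up) + "".join(down) + "00" + addr
    if total_bits is not None:
        tag = tag.ljust(total_bits, "0")  # dummy padding bits at the end
    return tag


def trace(tag, n_ctrl=3, n_addr=4):
    """Follow the tag as successive modules would.

    Returns (hops, port address), where each hop is (column, link group taken)."""
    pos, column, hops = n_ctrl, 0, []
    while True:
        unit = tag[pos:pos + 2]
        pos += 2
        if column == 0:                      # the unit is read by an IM
            if unit == "00":                 # local delivery: the address follows
                return hops, tag[pos:pos + n_addr]
            hops.append((column, IM_UP[unit]))
            column += 1
        elif unit in EM_UP:                  # an EM keeps switching upstream
            hops.append((column, EM_UP[unit]))
            column += 1
        else:                                # an EM switches the cell back downstream
            hops.append((column, EM_DOWN[unit]))
            column -= 1


# Path S-C: IM(0,00) -> EM(1,01) -> EM(2,01) -> EM(1,11) -> IM(0,10), port 0110.
tag_sc = build_tag("001", ["10", "01"], ["11", "00"], "0110", total_bits=20)
print(trace(tag_sc))  # ([(0, 'X'), (1, 'X'), (2, 'B'), (1, 'A')], '0110')
```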
Figure 2.7: Cell routing in HBSI (modules in columns 0 to 2 and rows 00 to 11; upstream and downstream directions)
c1c2c3   Control packet type
000      Null packet
001      Active data packet
010      Priority data packet
011      Broadcast data packet
100      Control packet #1 (e.g. link level)
101      Control packet #2 (e.g. path level)
110      Control packet #3 (e.g. call level)
111      Control packet #4 (e.g. fabric level)
Table 2.1: Values and functions of control bits
Routing Tag 
For a switch with $2^k$ IMs, the (column, row) labels of a source and a destination IM can be represented by $(0, \alpha_k \alpha_{k-1} \cdots \alpha_1)$ and $(0, \beta_k \beta_{k-1} \cdots \beta_1)$ respectively, where $\alpha_k$ and $\beta_k$ are the most significant bits. Let $R = r_k r_{k-1} \cdots r_1$, where

$$r_j = \alpha_j \oplus \beta_j \tag{2.2}$$

The vector $R$ indicates the bit changes necessary to convert the source label to the destination label. Let $r_s$ be the most significant nonzero bit in $R$. This means that the source and destination IMs lie within the same subgroup of $2^s$ modules. The number of steps required for switching upstream will be $s$, so the total number of paths for routing upstream is $2^s$. One can easily verify that the downstream path is unique given the outermost EM, so the total number of alternate paths from the source IM to the destination IM is also $2^s$.
We show that the upstream and downstream paths are related by $R$. First, we introduce a transform of the routing unit representation. An upstream path can be represented by $u^*_1 u^*_2 \cdots u^*_s$, where

$$u^*_j = \begin{cases} 0 & \text{if } u_j = 01 \\ 1 & \text{if } u_j = 10 \end{cases} \tag{2.3}$$

and, similarly, a downstream path by $d^*_s d^*_{s-1} \cdots d^*_1$, where

$$d^*_j = \begin{cases} 0 & \text{if } d_j = 00 \\ 1 & \text{if } d_j = 11 \end{cases} \tag{2.4}$$

After $s$ stages of upstream routing, the cell will reach an EM of label $(s, \zeta_k \zeta_{k-1} \cdots \zeta_1)$, where $\zeta_j = \alpha_j \oplus u^*_j$. Similarly, after $s$ steps of downstream routing, the cell will return to the destination IM of label $(0, \beta_k \beta_{k-1} \cdots \beta_1)$, where $\beta_j = \zeta_j \oplus d^*_j$. Substituting $\zeta_j$ gives $\beta_j = \alpha_j \oplus u^*_j \oplus d^*_j$, and after grouping we get

$$\alpha_j \oplus \beta_j = u^*_j \oplus d^*_j \tag{2.5}$$

which indicates that the upstream and downstream paths are related by $R$. Given an upstream path $u^*_1 u^*_2 \cdots u^*_s$ chosen by the source IM, XORing both sides of (2.5) with $u^*_j$ and using (2.2) shows that the downstream path $d^*_s d^*_{s-1} \cdots d^*_1$ is uniquely determined by

$$d^*_j = u^*_j \oplus r_j \tag{2.6}$$
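Equations (2.2) to (2.6) can be checked numerically. In this sketch, IM and EM rows are encoded as integers whose bit $j-1$ holds label bit $j$; the function names are hypothetical, and the label arithmetic follows the XOR model used in the derivation ($\zeta_j = \alpha_j \oplus u^*_j$ upstream, $\beta_j = \zeta_j \oplus d^*_j$ downstream). For the source IM(0,00) and destination IM(0,10) of the example, it enumerates all $2^s$ upstream choices and confirms that each one, combined with the downstream path from (2.6), lands on the destination.

```python
from itertools import product


def routing_vector(src_row, dst_row):
    """R of eq. (2.2): bit j-1 of the result holds r_j = alpha_j XOR beta_j."""
    return src_row ^ dst_row


def upstream_stages(src_row, dst_row):
    """s = position of the most significant nonzero bit r_s of R (0 for the same IM)."""
    return routing_vector(src_row, dst_row).bit_length()


def downstream_path(u_star, r):
    """Eq. (2.6): d*_j = u*_j XOR r_j; u_star[j-1] holds u*_j, result is indexed the same way
    (the tag lists these units in the reverse order d*_s ... d*_1)."""
    return [u ^ ((r >> (j - 1)) & 1) for j, u in enumerate(u_star, start=1)]


def land(src_row, u_star, d_star):
    """Row reached after s stages up (zeta_j = alpha_j XOR u*_j) and s stages down."""
    row = src_row
    for j, u in enumerate(u_star, start=1):
        row ^= u << (j - 1)
    for j, d in enumerate(d_star, start=1):
        row ^= d << (j - 1)
    return row


src, dst = 0b00, 0b10                         # IM(0,00) -> IM(0,10)
r, s = routing_vector(src, dst), upstream_stages(src, dst)
paths = []
for u_star in product((0, 1), repeat=s):      # all 2**s upstream choices
    d_star = downstream_path(list(u_star), r)
    assert land(src, u_star, d_star) == dst   # every choice reaches the destination
    paths.append((u_star, d_star))
print(s, len(paths))  # 2 4
```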
Switch Broadcast 
Figure 2.8 shows that a cell can be broadcast to all modules by setting the control bits c1c2c3 to 011 for data packets (111 for control packets in Table 2.1). The source copies the broadcast cell to all local ports and sends a cell to one upstream EM. The EM receiving the broadcast cell forwards the cell upstream and sends a copy back to the alternate downstream link group. The process continues until the last-stage EM receives the cell and sends it back to the alternate downstream link group. In the downstream direction, each EM copies the cell to the two downstream modules until all IMs receive a copy of the broadcast cell. Finally, all IMs copy the cell to every local output port.
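The broadcast procedure can be sketched on an idealised model of the interconnection, to count how many copies each IM receives. The model is our assumption, consistent with the label arithmetic of (2.3) to (2.5): the broadcast cell climbs to EM(c, source row) in every column c without changing its row, the alternate downstream link group of an EM in column c leads to the module in column c-1 whose row differs in bit c, and every module reached downstream copies the cell to both of its lower neighbours. The function name is hypothetical.

```python
def broadcast_copies(src, k):
    """Number of copies of a switch-broadcast cell received by each of the 2**k IMs."""
    copies = [0] * (2 ** k)
    copies[src] += 1                  # the source IM copies the cell to its local ports

    def send_down(column, row):
        # Downstream direction: each EM copies the cell to both lower modules.
        if column == 0:
            copies[row] += 1          # an IM receives a copy
            return
        bit = 1 << (column - 1)
        send_down(column - 1, row)
        send_down(column - 1, row ^ bit)

    for c in range(1, k + 1):         # the cell is forwarded upstream to EM(c, src)
        bit = 1 << (c - 1)
        send_down(c - 1, src ^ bit)   # copy back on the alternate downstream group
    return copies


print(broadcast_copies(0b001, 3))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

In this model the copies reaching the IMs number $1 + 1 + 2 + \cdots + 2^{k-1} = 2^k$, so every IM receives exactly one copy.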
Module Broadcast and Multicast 
An IM may choose to transmit a cell to all (or a group of) ports at a destination IM. The source IM establishes a path to the destination IM, and transmits a cell with p = 11 indicating that this cell is a module broadcast/multicast cell (Table 2.2). The destination IM interprets the address a1a2a3a4 as a group address. If a1a2a3a4 = 1111, the cell is copied to all local output ports. Otherwise, a1a2a3a4 is a multicast address which indicates a group of ports. The destination IM looks up the list of ports defined with the source IM during call setup and copies the cell to those ports accordingly.

Value   Routing function identified by IMs
00      Deliver to local port by a1a2a3a4
01      Deliver to a group of output ports
          a1a2a3a4 = 1111 (broadcast address)
          a1a2a3a4 = 0000 to 1110 (group address)
10      Deliver to upstream link group X
11      Deliver to upstream link group Y
Table 2.2: Functions of routing units detected by IMs

Value   Routing function identified by EMs
00      Deliver to downstream link group A
11      Deliver to downstream link group B
01      Deliver to upstream link group X
10      Deliver to upstream link group Y
Table 2.3: Functions of routing units detected by EMs

Figure 2.8: Cell broadcasting in HBSI
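The destination IM's handling of a module broadcast/multicast cell can be sketched as a small lookup. The group table keyed by (source IM, group address), the function name and the 16-port default are hypothetical; the text only states that the port list is defined with the source IM during call setup.

```python
BROADCAST = "1111"  # broadcast address of Table 2.2


def output_ports(addr, source_im, group_table, n_ports=16):
    """Local ports to which a module broadcast/multicast cell is copied.

    addr        -- the 4-bit field a1a2a3a4 of the cell
    source_im   -- label of the source IM
    group_table -- hypothetical table filled at call setup:
                   (source_im, group address) -> list of local ports
    """
    if addr == BROADCAST:
        return list(range(n_ports))                # copy to every local output port
    return sorted(group_table[(source_im, addr)])  # ports defined with the source IM


table = {("00", "0011"): [9, 2, 5]}
print(output_ports("0011", "00", table))     # [2, 5, 9]
print(len(output_ports("1111", "00", table)))  # 16
```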
We do not consider an explicit multicast mechanism for a source IM to send to a group of ports on different destination IMs. Such a mechanism requires extra hardware for the EMs to handle group addresses, perform cell copying, and map cells to different link groups. Instead, the source IM can either broadcast the cell to all IMs and let the destination IMs do the filtering, or establish an independent path to each destination IM.
2.3.3 Fault Tolerance 
We consider the issue of fault tolerance by noting two kinds of failure: module failure and link (transmission line) failure.
Module Failure 
Under normal operation, neighbouring modules may send status cells to one another. When a link or module fails, the neighbouring modules detect the failure immediately and send error cells to the call processor. In HBSI, there are multiple paths connecting the source and destination IMs. So when a module failure affects current connections, the centralized call processor can reroute the connections onto alternate paths, or terminate some low-priority connections if necessary.
Note that in HBSI, the failure of one module affects only some of the connections. When the affected connections switch to alternate paths, some other normal modules may become overloaded. However, HBSI at least provides some availability in case of component failure. Indeed, all the locally switched traffic in normal IMs is unaffected.
In case of a call processor failure, the whole switch operation will be affected. So a standby call processor and mirroring of all the call information should be employed to enhance reliability.
Link Failure 
In case of link failure in HBSI, the harmful effect can be minimal. This is due to the statistical multiplexing among all the links within a link group. When a link fails, the module transmitting cells over that link must detect the failure. The output link controller of the module, which is responsible for reading the cells out of the shared memory and forwarding them over the output links, then marks the failed link unavailable. As a result, the aggregate bandwidth of the affected link group is reduced. However, no connection has to be rerouted, provided that the resulting link group can still accommodate the connections passing through it.
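The behaviour of the output link controller after a link failure can be sketched as follows. The class, its methods and the use of a 155 Mbps link rate are illustrative assumptions: the controller skips links marked unavailable when reading cells out of the shared-memory queue, so the oldest cell still goes to the lowest-numbered working link, and the reduced aggregate bandwidth is compared with the bandwidth already reserved on the group to decide whether any rerouting is needed.

```python
class OutputLinkController:
    """Hypothetical controller for one link group of G links."""

    def __init__(self, g, link_rate):
        self.available = [True] * g
        self.link_rate = link_rate

    def mark_failed(self, link):
        self.available[link] = False  # the failed link is taken out of service

    def transmit(self, queue):
        """Move the oldest cells from the queue onto the working links,
        lowest-numbered link first; returns {link: cell}."""
        out = {}
        for link, ok in enumerate(self.available):
            if ok and queue:
                out[link] = queue.pop(0)
        return out

    def capacity(self):
        return sum(self.available) * self.link_rate

    def can_keep(self, reserved_bandwidth):
        """No rerouting is needed while the reduced group still fits its connections."""
        return reserved_bandwidth <= self.capacity()


ctrl = OutputLinkController(g=8, link_rate=155)
ctrl.mark_failed(1)
sent = ctrl.transmit([f"c{i}" for i in range(10)])
print(sorted(sent))                 # [0, 2, 3, 4, 5, 6, 7]
ctrl.mark_failed(2)
print(ctrl.can_keep(900))           # True  (6 x 155 = 930)
ctrl.mark_failed(3)
print(ctrl.can_keep(900))           # False (5 x 155 = 775)
```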
2.4 Call Blocking Analysis 
In a switch interconnection architecture, it is crucial to control the loading on each interconnection link. Otherwise some links may become the bottleneck and cause significant cell loss. If an arriving call requests an internal link which does not have sufficient bandwidth, the call should be rejected so as not to affect the QoS of existing connections.
In this section, we derive the call blocking performance of HBSI, the dilated banyan, and the dilated Benes network, based on the loading of link groups [19]. Considering an $M(2^k) \times M(2^k)$ switch with $M \times M$ modules, we build the three interconnection networks, all with a dilation factor of $G$. For simplicity, we assume that each call requests one link from each link group along its routing path. In practice, calls may request only a portion of the link bandwidth instead of the entire link, and the blocking probability in that case would be lower. Let $G = M/2$ be the size of a link group and let $p$ be the link utilization at the input links. We assume that the traffic is uniformly and randomly distributed over the switch, so the probability that a link is occupied is independent of the other links [18, ch. 9].
2.4.1 Dilated Banyan 
A dilated banyan has $k$ stages and is based on unique-path routing. The utilization at each stage is the same as at the input links, so the probability that a link group is fully occupied (i.e., not available) is $p^G$. As each path consists of $k$ link stages, the probability that a call is blocked by any one of the link stages along the requested path is

$$P_B = 1 - \left(1 - p^G\right)^k \tag{2.7}$$
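Equation (2.7) is straightforward to evaluate. In this sketch the function name is hypothetical; for the 1024 x 1024 switch built from 16 x 16 modules we take $k = 6$ (since $1024 = 16 \times 2^6$) and $G = 8$, and we do not claim the printed values reproduce the thesis's plotted curves.

```python
def banyan_blocking(p, G, k):
    """Eq. (2.7): call blocking probability of a k-stage dilated banyan."""
    busy = p ** G               # all G links of a link group are occupied
    return 1 - (1 - busy) ** k  # blocked if any of the k link stages is full


for p in (0.6, 0.7, 0.8, 0.9):
    print(p, banyan_blocking(p, G=8, k=6))
```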
2.4.2 Dilated Benes Network 
An $M(2^k) \times M(2^k)$ dilated Benes network has $2k+1$ stages. Again, the utilization at each stage is the same as at the input links, so the probability that a link group is fully occupied is also $p^G$. Using the Lee assumption of independence, the blocking probability of a Benes network can be derived using a recurrence relation [18, ch. 9]:

$$P_{B,j-1} = \left[1 - \left(1 - p^G\right)^2 \left(1 - P_{B,j}\right)\right]^2 \tag{2.8}$$

where $P_{B,j}$ is the blocking probability of the $j$-th stage subnetwork (Figure 2.9). Starting with $P_{B,k} = 0$ (i.e., the middle switching module), the relationship allows us to compute blocking probabilities for dilated Benes networks with 3, 5, 7, ... stages as

$$P_B = P_{B,0} \tag{2.9}$$
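The recurrence (2.8) can be unfolded from the middle module outwards. In each step, one of the two alternative subnetworks is unusable if its entry link group, its exit link group, or the subnetwork itself is blocked, and the call is blocked only if both alternatives are unusable. The function name is hypothetical, and the parameter choices in the loop are only an example.

```python
def benes_blocking(p, G, k):
    """Eqs. (2.8)-(2.9): start from P_{B,k} = 0 and unfold to P_{B,0}."""
    busy = p ** G
    pb = 0.0                                     # P_{B,k}: the middle module never blocks
    for _ in range(k):                           # j = k, k-1, ..., 1
        branch = 1 - (1 - busy) ** 2 * (1 - pb)  # one alternative is unusable
        pb = branch ** 2                         # both alternatives are unusable
    return pb


for p in (0.6, 0.7, 0.8, 0.9):
    print(p, benes_blocking(p, G=8, k=6))
```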
2.4.3 HBSI 
HBSI is similar to a dilated Benes network in that it has two alternate links at each stage of routing. Yet there are three differences. First, some paths have fewer hops than others, so they should experience less blocking. Second, some traffic is locally switched, so the loading at later stages is less than that at the earlier stages. Third, the final stage has no external connections, so the upstream and the downstream link groups must be available. This implies $P_{B,k-1} = 0$.
We obtain the blocking probability for the longest path only, which is the worst-case blocking performance experienced by all connections. Instead of a constant $p$, let $p_j$ be the loading at the $j$-th stage link ($p_0 = p$). If we assume a uniform distribution, we have

$$p_j = \frac{2^k - 2^{j-1}}{2^k}\, p_0 \tag{2.10}$$

Again, we have a recurrence relationship governing the blocking performance

$$P_{B,j-1} = \left[1 - \left(1 - p_j^G\right)^2 \left(1 - P_{B,j}\right)\right]^2 \tag{2.11}$$

In this case the starting point is $P_{B,k-1} = 0$. The blocking probability of HBSI is given by

$$P_B = P_{B,0} \tag{2.12}$$

after $(k-1)$ iterations. The blocking probabilities for other, shorter paths may be obtained similarly by setting a closer starting point $P_{B,s-1} = 0$ and performing $(s-1)$ iterations.
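The HBSI recurrence can be evaluated in the same way as the Benes one, but with a stage-dependent loading. This sketch assumes the stage loading $p_j = (2^k - 2^{j-1})\,p_0 / 2^k$ of (2.10) and that the bracket in (2.11) uses the stage-$j$ loading $p_j$; the function names are hypothetical. The optional argument `s` gives the shorter paths mentioned above, which start from $P_{B,s-1} = 0$.

```python
def stage_load(p0, j, k):
    """Eq. (2.10): loading of the j-th stage link under uniform traffic."""
    return (2 ** k - 2 ** (j - 1)) / 2 ** k * p0


def hbsi_blocking(p, G, k, s=None):
    """Eqs. (2.11)-(2.12) for a path that climbs s stages (default: longest, s = k).

    Starts from P_{B,s-1} = 0 and performs s - 1 iterations down to P_{B,0}."""
    s = k if s is None else s
    pb = 0.0
    for j in range(s - 1, 0, -1):          # compute P_{B,j-1} from P_{B,j}
        busy = stage_load(p, j, k) ** G
        pb = (1 - (1 - busy) ** 2 * (1 - pb)) ** 2
    return pb


for s in range(1, 7):                      # shorter paths block less
    print(s, hbsi_blocking(0.8, G=8, k=6, s=s))
```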
2.5 Results and Discussions 
To obtain the throughput and cell loss performance, we developed simulation models [39, 40] for HBSI, the dilated banyan and the dilated Benes network. Unless specified otherwise, we assume the use of 16 x 16 switching modules for building a 1024 x 1024 switch. The link group size is 8, and each output queue has B cell buffers. An IM has 16 queues for local output ports and 2 queues for the link groups, so it has (16 + 2)B buffers. An EM has four link group queues, so it has 4B buffers. With B = 50, the capacity of the shared memory fabric will be 900 cells for an IM and 200 cells for an EM, which are reasonably small memory fabrics.
It is easy to see that HBSI and the dilated Benes network can support more calls than the dilated banyan. To give a fair comparison, we neglect call assignment for the time being. In the case of HBSI and the dilated Benes network, we assume that the upstream links are permanently assigned as straight connections, so that there is no blocking among cells when they are switched upstream. We assume that all three networks have the same offered load and that the traffic is randomly and uniformly distributed over all modules.
Figure 2.10 compares the cell loss probabilities of the three networks. All networks give low cell loss even at high loading (p = 0.9) and without speedup or expansion of the internal links. This indicates the importance of using link grouping and individual module buffering. As expected, the cell loss probabilities decrease as the buffer size increases. Figure 2.10 shows that the dilated banyan and the dilated Benes network give almost identical performance. This is because in the Benes network there is no blocking in the first k stages, and the remaining k stages form exactly a k-stage banyan. HBSI gives slightly better cell loss performance because some of the traffic is locally switched, so the loading, and hence the cell loss, is reduced at later stages.
Figure 2.11 shows the three networks under a nonuniform traffic distribution. Each source module has a significant percentage (20% or 40%) of local traffic, i.e., cells destined only for the ports of the local module. We see that the dilated banyan gives very poor performance, as some links are congested due to its internal blocking nature. The dilated Benes network is better than the dilated banyan, as the source modules split their traffic onto alternate paths in the first few stages and hence reduce the internal contention in the later stages. HBSI gives the best performance when local traffic exists, as it reduces the internal loading significantly.

Figure 2.9: Recurrence relationship for call blocking

Figure 2.10: Cell loss probabilities under uniform traffic (switch size 1024x1024, module size 16x16; banyan, Benes and HBSI with buffer sizes 20, 30 and 50; loading 0.8 to 0.95)
Figure 2.12 shows that the cell loss of all three networks increases slowly with the switch size, so we can expect HBSI to give good performance even at a large switch size. As we have assumed uniform traffic, more of the IM traffic in HBSI becomes nonlocal as the switch size increases, so the performance of HBSI approaches that of the dilated Benes network.
Figure 2.13 compares the delay of the three networks. It is interesting to note that although there are six stages of switching, with a 50-cell buffer for each output queue at each stage, the end-to-end average delay is only 10 to 20 cell times, and the empirical maximum cell delay recorded is less than 80 cell times. This indicates that individual stage buffering does not increase the end-to-end delay substantially, provided that the links are not overloaded. Having fewer stages, the dilated banyan gives the smallest delay, whereas the delay of HBSI and the dilated Benes network is only slightly larger. In the case of HBSI, both average and maximum delays are for the longest paths inside the switch. Figure 2.14 shows the delay profile for different path lengths in HBSI. We see that cells on shorter paths do experience smaller delays.
Based on the analysis in Section 2.4, we obtain the call blocking performance of the three networks shown in Figure 2.15. Under uniform traffic with group size 8, HBSI is slightly better than the dilated Benes network, and both are significantly better than the dilated banyan. We also see that while the dilated banyan might be good at switching uniform packet traffic, it gives a high blocking probability ($P_B > 0.1$ for $p > 0.6$) when we assign calls inside the switch. In the case of HBSI, we also show the result for each call requesting the bandwidth of half a link (e.g., 77.5 Mbps out of a 155 Mbps link) instead of a full link. This implies that the effective group size is doubled. We see that the call blocking performance is significantly improved, which is true for the other networks as well. We also show that if there is a significant portion of local traffic (20% local, 80% uniformly distributed to other modules), the call blocking performance of HBSI can be greatly improved, whereas the other two networks have the same or worse performance. Figure 2.16 compares the call blocking probabilities of the three networks under various switch sizes, assuming uniform traffic. We see that HBSI gives the best performance and approaches the dilated Benes network as the switch size increases, for the same reason stated earlier.

Figure 2.11: Cell loss probabilities under nonuniform traffic (switch size 1024x1024, module size 16x16, buffer size 30; banyan, Benes and HBSI under uniform traffic and with 20% and 40% localized traffic; loading 0.8 to 0.95)

Figure 2.12: Cell loss probabilities under various switch sizes (module size 16x16, buffer size 20, loading 0.95, uniform traffic; switch sizes 128 to 1024)

Figure 2.13: Cell delay comparison (empirical maximum and average delay of banyan, Benes and HBSI; switch size 1024x1024, module size 16x16, buffer size 50, uniform traffic; loading 0.8 to 0.95)

Figure 2.14: Delay profile of HBSI for different path lengths (empirical maximum and average delay for path lengths 0 to 12; switch size 1024x1024, module size 16x16, buffer size 50)
2.6 Summary 
We have considered a new way of switching, hierarchical switching, in which traffic is switched from layer to layer and is switched back whenever possible. This mode of switching allows switch interconnection architectures to be scalable and stackable. We considered HBSI as the simplest form of hierarchical switching, and found that it gives excellent call blocking, throughput, and cell loss performance. Other forms of hierarchical switching are also possible. We showed how HBSI can provide alternate routing paths when a link or module fails. The drawback of HBSI is that it uses 32 x 32 memory fabrics as 16 x 16 modules and doubles the interconnections. This is necessary in order to give good call blocking performance, and the increase in cost can be minimal since I/O and call processing circuitries dominate the cost of a switching system. Finally, we note that HBSI relies on shared memory switching modules for channel grouping and in-sequence cell delivery; this assumption is relaxed in Chapter 3.

Figure 2.15: Call blocking probabilities under different link utilization (switch size 1024x1024, uniform traffic, group size 8 for banyan, Benes and HBSI; also HBSI with group size 16, and HBSI with 20% local traffic and group size 8; link utilization 0.5 to 0.9)

Figure 2.16: Call blocking probabilities under various switch sizes (banyan, Benes and HBSI; group size 8, link utilization 0.8, uniform traffic; switch sizes 64 to 2048)
Chapter 3 
Multichannel Switching and Resequencing
3.1 Introduction 
This chapter is an extension of Chapter 2. In Chapter 2, the building blocks of HBSI are based on shared memory switching modules. Here, we introduce a multichannel switching algorithm and a resequencing algorithm. By using these algorithms, HBSI can be based on any generic point-to-point switch modules.
The idea of multichannel switching is described in Figure 3.1. Several output links are grouped into a link group, or channel group. The link group is viewed as a trunk, and cells are delivered to any one link of the group instead of to a specific link. However, in a point-to-point switch, cells have to be routed to a specific output link. We therefore need a channel allocation algorithm to assign a specific link for routing a cell to the output link group.
The use of channel grouping can effectively increase the switching throughput by reducing the internal blocking, and reduce the cell loss probability by sharing buffers among the output links. On the other hand, since cells from the same VC may be transported through different links, it is possible for cells to get out of sequence. To solve this problem, we propose to add some ordering information in the cell headers, and to use a resequencing algorithm to reorder the cells when they arrive at their destination.
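The ordering information is not specified at this point. Purely as a generic illustration of resequencing, and not the algorithm proposed later in this chapter, the sketch below assumes that each cell carries a per-VC sequence number starting at 0 and holds early arrivals in a min-heap until the gap before them is filled. The class name is hypothetical, and lost cells (which would stall this buffer forever) are not handled.

```python
import heapq


class Resequencer:
    """Generic per-VC reorder buffer keyed by an assumed sequence number."""

    def __init__(self):
        self.expected = {}  # vc -> next sequence number to release
        self.pending = {}   # vc -> min-heap of (seq, cell) waiting for a gap to fill

    def receive(self, vc, seq, cell):
        """Accept one cell; return the cells that can now leave in order."""
        heap = self.pending.setdefault(vc, [])
        heapq.heappush(heap, (seq, cell))
        nxt = self.expected.get(vc, 0)
        released = []
        while heap and heap[0][0] == nxt:
            released.append(heapq.heappop(heap)[1])
            nxt += 1
        self.expected[vc] = nxt
        return released


r = Resequencer()
print(r.receive("vc1", 1, "c1"))  # []
print(r.receive("vc1", 0, "c0"))  # ['c0', 'c1']
print(r.receive("vc1", 3, "c3"))  # []
print(r.receive("vc1", 2, "c2"))  # ['c2', 'c3']
```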
3.2 Channel Assignment 
In the following, we discuss three different channel allocation algorithms.
3.2.1 VC-Based Channel Allocation Mechanism 
In [22], it was suggested to split the stream of cells into smaller sub-bursts called strings. These strings are transmitted through a set of ATM virtual paths in parallel. As a result, constant stress on a particular switch inside the network can be avoided.
In Figure 3.2, we chop the cells of each VC into groups of $L_s$ cells. We call these groups of cells strings, and $L_s$ the string length. The distribution algorithm is described in Algorithm 3.1. Note that the chopping of cells is done at the VC level: the cells are grouped according to their arrival order at the input ports of the switch. Each string is associated with a string identity number, $S_{id}$, which is included in the cell headers. We can then distribute the cells with different $S_{id}$ to different output links. We expect the load to be balanced among the links within a link group, and the string length $L_s$ directly affects the efficiency of the load balancing.
Algorithm 3.1 Cell Chopping (for each VC) 
Definitions: 
• n: number of cells forwarded in current string (set to zero initially) 
• s: current string id 
• Ls: string length
Procedure (to be executed at every cell arrival)
1. Set the string id of the arrived cell to be s.
2. Increase n by one. If n = Ls, update s to the next string id and set n = 0.
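Algorithm 3.1 maps directly onto a small per-VC state machine. The Python sketch below is illustrative only: the class name is hypothetical, string ids start at zero, "update s" is read as advancing s to the next string id, and the finite range of Sid is ignored here.

```python
from dataclasses import dataclass


@dataclass
class CellChopper:
    """Per-VC string chopper following Algorithm 3.1 (a sketch).

    Assumption: "update s" means advancing s to the next string id;
    wraparound of the finite Sid range is not modelled.
    """
    string_length: int  # Ls
    s: int = 0          # current string id
    n: int = 0          # cells forwarded in the current string

    def on_cell_arrival(self) -> int:
        """Return the string id (Sid) to stamp on the arriving cell."""
        sid = self.s              # step 1: tag the cell with s
        self.n += 1               # step 2: count the cell
        if self.n == self.string_length:
            self.s += 1           # string complete: move to the next id
            self.n = 0
        return sid


chopper = CellChopper(string_length=3)
print([chopper.on_cell_arrival() for _ in range(8)])  # [0, 0, 0, 1, 1, 1, 2, 2]
```

Each VC would own one such object, which is the "virtual distributor" per VC described next.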
Cells from different VCs are treated independently. In other words, each VC has its own virtual distributor. Figure 3.2 shows how the cells from the same VC can be distributed to the output buffers. To facilitate the link assignment, we present a simple formula that governs the output link assignment in the channel group. Let the string id of a cell be Sid, and let the number of links in the target output link group be No. Using a round-robin assignment of strings to the links in the link group, we can assign the output link id lo of the cell to be
$$ l_o = S_{id} \bmod N_o \tag{3.1} $$
where lo = 0, 1, ..., No − 1. Note that cells with the same Sid are assigned the same output link id. Thus the cells within the same string take the same channel and are guaranteed to stay in sequence. The following theorem shows the load sharing of the cells within a VC.
Theorem 3.1 In any period of time T (in cell times), consider a particular VC. With the use of (3.1), the maximum difference of the fraction of cells distributed to the links within the link group is Ls/T.
[Diagram: ATM switching module; key: link group, cells to link group i, cells to link group j]
Figure 3.1: Multichannel switching
[Diagram: strings Si, Si+1, Si+2 (each of length Ls, labelled by string ID) entering the ATM switching module and leaving on the links of a link group; key: link group, cells to link group i]
Figure 3.2: VC-based output link assignment
Proof By (3.1), at any time, the numbers of strings distributed to the links within the link group differ by at most one. Thus the numbers of cells distributed to the links within the link group differ by at most Ls. Since the fractions are taken over a period of T cell times, the fractions of cells distributed to the links within the link group differ by at most Ls/T. □
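Theorem 3.1 can be spot-checked numerically. The Python sketch below uses hypothetical function names and assumes a single VC whose cells are numbered in arrival order; a period of T cell times then covers a run of consecutive cells of that VC (at most T of them). It reports the largest difference in per-link cell counts over all such runs, which the proof bounds by Ls.

```python
def assign_link(sid: int, num_links: int) -> int:
    """Output link id lo = Sid mod No, as in (3.1); links are numbered 0..No-1."""
    return sid % num_links


def cells_per_link(first: int, count: int, string_length: int, num_links: int) -> list[int]:
    """Per-link counts for cells first .. first+count-1 of one VC.

    Cell j (0-based arrival index) carries Sid = j // Ls, i.e. the chopping
    of Algorithm 3.1 with an unbounded string id.
    """
    counts = [0] * num_links
    for j in range(first, first + count):
        counts[assign_link(j // string_length, num_links)] += 1
    return counts


def max_imbalance(string_length: int, num_links: int, horizon: int = 60) -> int:
    """Largest per-link count difference over every run of cells within the horizon."""
    worst = 0
    for first in range(horizon):
        for count in range(1, horizon):
            c = cells_per_link(first, count, string_length, num_links)
            worst = max(worst, max(c) - min(c))
    return worst


print(max_imbalance(string_length=5, num_links=8))  # 5, i.e. Ls
```

The bound is attained by a run that covers exactly one string: that string's Ls cells all go to one link.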
With Theorem 3.1, we can see that as T → ∞, the difference in the fractions of cells distributed to the links within the link group vanishes. However, Theorem 3.1 concerns one VC only. If more than one VC tries to route its strings to the same output link at the same time, the load sharing efficiency can be poor and there may be short-term buffer overflow at an output link.
In the above, we assumed that Sid can be arbitrarily large. In practice, the range of Sid is limited by the number of bits used to represent it. So we have to use a finite range of Sid and let the value wrap around when the maximum value of Sid has been used. It can be verified that the same load sharing can be achieved for a large enough range of Sid. Also, a large enough range of Sid ensures that wraparound will not cause cells from the same VC that have the same Sid, but belong to different wraparound periods of Sid, to coexist in the switch. A detailed discussion on the range of Sid will be given in Section 3.3.
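As a small illustration of the load-sharing remark above, the Python sketch below checks one sufficient condition that we assume for illustration (it is not stated in the source): if the number of distinct Sid values, here called sid_range, is a multiple of No, then reducing Sid modulo sid_range does not change the link chosen by (3.1), so the round-robin order continues seamlessly across the wraparound. The function names and parameters are hypothetical.

```python
def link_for_cell(cell_index: int, string_length: int, num_links: int, sid_range: int) -> int:
    """Link chosen by (3.1) for the cell_index-th cell of a VC with a finite Sid range."""
    sid = (cell_index // string_length) % sid_range   # finite Sid wraps around
    return sid % num_links                            # lo = Sid mod No


def wrap_is_transparent(string_length: int, num_links: int, sid_range: int, cells: int = 1000) -> bool:
    """True if the wrapped Sid picks the same links as an unbounded Sid would."""
    return all(
        link_for_cell(j, string_length, num_links, sid_range) == (j // string_length) % num_links
        for j in range(cells)
    )


print(wrap_is_transparent(string_length=2, num_links=8, sid_range=32))  # True: 32 is a multiple of 8
print(wrap_is_transparent(string_length=2, num_links=8, sid_range=30))  # False
```

When sid_range is not a multiple of No, each wraparound period gives the first (sid_range mod No) links one string more than the others, so their long-run shares differ by 1/sid_range of the VC's strings; this difference shrinks as the range grows, in line with the "large enough range" remark.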
From the above discussion, we see that for a large value of Ls, a large number of cells could be assigned to one output link in a short period. In buffer dimensioning, we should therefore reserve more buffers in this case. In other words, the buffer usage will be more bursty when a larger Ls is used. If we have a finite buffer size, the use of a larger Ls will lead to a higher chance of cell loss.
To conclude, string chopping can balance the load within the link group. However, the choice of the string size Ls affects the cell loss probability. In particular, the larger Ls is, the higher the chance that the buffer overflows, which in turn leads to discarded cells.
3.2.2 Port-Based Channel Allocation Mechanism 
Figure 3.3 shows the operation of port-based assignment. At the input port, the port controller first identifies the destined output link group. Then an output link within the target link group is assigned in round-robin fashion, based only on the assignments made by the local input port controller. The channel allocation mechanism does not consider which VC a cell belongs to, nor the sequence information in the cell headers. Algorithm 3.2 describes the allocation mechanism.
Algorithm 3.2 Port-Based Channel Allocation Mechanism (for each input port) 
Definitions: 
• N: number of link groups in the module
• li: next allocating output link for link group i
• ni: number of output links in link group i
Procedure (to be executed at every cell arrival to the input port)
1. Identify the destination link group i of the cell.
2. Allocate the cell to output link li of link group i.
3. Increase li by 1. If li exceeds ni, set li to one.
Following an argument similar to that of Theorem 3.1, it can be shown that the load is shared if we consider one input port only. However, when we consider the whole module, it is possible for more than one input port to distribute cells to the same output link at the same time, so there may be short-term buffer overflow at that link.
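To make the contrast concrete, here is a minimal Python sketch of Algorithm 3.2. It assumes one round-robin pointer per (input port, link group) pair, links numbered 1 to ni as in the algorithm, and a hypothetical class name PortBasedAllocator. Because the pointers are local, several input ports that each send their first cell to the same link group in the same time slot all pick link 1, which is the short-term collision described above.

```python
class PortBasedAllocator:
    """Sketch of Algorithm 3.2: each input port keeps its own pointer li per link group."""

    def __init__(self, links_per_group: list[int]) -> None:
        self.links_per_group = links_per_group           # ni for each link group i
        self.next_link: dict[tuple[int, int], int] = {}  # (input port, group) -> li

    def allocate(self, input_port: int, group: int) -> int:
        """Return the output link (1..ni) of `group` for a cell arriving at `input_port`."""
        key = (input_port, group)
        link = self.next_link.get(key, 1)                # li starts at link 1
        nxt = link + 1                                   # step 3: increase li by 1
        self.next_link[key] = 1 if nxt > self.links_per_group[group] else nxt  # wrap to 1
        return link


alloc = PortBasedAllocator(links_per_group=[4])
print([alloc.allocate(port, group=0) for port in (0, 1, 2)])  # [1, 1, 1]: ports collide on link 1
print([alloc.allocate(0, 0) for _ in range(4)])               # [2, 3, 4, 1]: port 0 alone cycles
```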
3.2.3 Trunk-Based Channel Allocation Mechanism 
Figure 3.4 shows the operation of trunk-based assignment. For each cell arriving at any input port, we first identify its destined output link group. Then the output link is assigned by a counter that manages the link assignment of that link group. As illustrated in Algorithm 3.3, we also use round-robin assignment of output links to the arrived cells according to their arrival order. The algorithm is similar to the port-based channel allocation mechanism. The only difference is that the counters in Algorithm 3.2 are local to each input port, whereas the counters here are global to all input ports.
Algorithm 3.3 Trunk-Based Channel Allocation Mechanism 
Definitions: 
• N: number of link groups in the module
• li: next allocating output link for link group i
• ni: number of output links in link group i
Procedure (to be executed at every cell arrival to the module)
1. Identify the destination link group i of the cell.
2. Allocate the cell to output link li of link group i.
3. Increase li by 1. If li exceeds ni, set li to one.
[Diagram: ATM switching module and output link group; key: cells to input port 1, cells to input port 2, cells to input port 3; cells from different input ports are treated independently]
Figure 3.3: Port-based output link assignment
[Diagram: ATM switching module and output link group; key: cells to link group i, cells to link group j, cells to link group k; cells from different VCs are treated together]
Figure 3.4: Trunk-based output link assignment
Below, we show that with trunk-based assignment, the buffer sharing among the links is optimal.
Theorem 3.2 At any time, the buffer occupancies of the links within an output link group differ by at most one.
Proof We show that if the buffer occupancies of the links differ by at most one after the previous time slot, they still differ by at most one after the current time slot.
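Let Bi be the number of cells in the buffer of link i, where the links of the group are numbered i = 0, 1, ..., No − 1 in the order used by the global counter. Since the counter assigns cells cyclically, the links holding one extra cell form a cyclic run that ends just before the next link to be assigned; relabeling the links cyclically if necessary, we can write the occupancies after the previous time slot as
$$ B_i = \begin{cases} B^* + 1 & \text{for } i < k \\ B^* & \text{for } i \ge k \end{cases} \tag{3.2} $$
where k is an integer with 0 ≤ k ≤ No − 1 and B* is a non-negative integer.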
In the current time slot, the next link to be assigned is link k, as indicated by the global counter. Before the new cells join, each link clears one cell (if it has any). After clearing, we get
$$ B_i = \begin{cases} B^* & \text{for } i < k \\ \max\{0, B^* - 1\} & \text{for } i \ge k \end{cases} \tag{3.3} $$
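Now, assume that A new cells destined for the current link group arrive. Let Ai = ⌊A/No⌋ and Af = A mod No. Since the cells are assigned round-robin starting from link k, every link receives Ai new cells, and the Af links k, k + 1, ... (taken cyclically) receive one more. We consider two cases:
1. Af ≤ No − k
$$ B_i = \begin{cases} B^* + A_i & \text{for } i < k \\ \max\{0, B^* - 1\} + A_i + 1 & \text{for } k \le i < k + A_f \\ \max\{0, B^* - 1\} + A_i & \text{for } i \ge k + A_f \end{cases} \tag{3.4} $$
2. Af > No − k
$$ B_i = \begin{cases} B^* + A_i + 1 & \text{for } i < A_f - N_o + k \\ B^* + A_i & \text{for } A_f - N_o + k \le i < k \\ \max\{0, B^* - 1\} + A_i + 1 & \text{for } i \ge k \end{cases} \tag{3.5} $$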
In all cases, the buffer occupancies do not differ by more than one, and the new occupancies again take the form (3.2), up to a cyclic relabeling of the links, with the counter advanced to (k + Af) mod No. To complete the proof, we note that the initial state of the system is the case k = 0 and B* = 0, which represents all buffers being empty. Therefore, by induction, the property holds in every subsequent time slot. □
By Theorem 3.2, we conclude that at any time, the buffer occupancies of the links within a link group differ by at most one. This implies an optimal usage of the buffer.
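The invariant in the proof of Theorem 3.2 can also be checked numerically. The Python sketch below is a minimal simulation under our own assumptions (unbounded buffers, a uniformly random number of arrivals per slot, and a hypothetical function name simulate_trunk_based). As in the proof, every link first clears one cell in each slot, the new cells are then assigned round-robin by the single global counter, and the largest spread max(Bi) − min(Bi) is recorded.

```python
import random


def simulate_trunk_based(num_links: int, slots: int, max_arrivals: int, seed: int = 1) -> int:
    """Return the largest spread max(B) - min(B) observed over the run.

    Each slot, every link first clears one cell (if any); then A new cells,
    A drawn uniformly from 0..max_arrivals, are assigned round-robin by one
    global counter k, as in Algorithm 3.3 with links numbered 0..No-1.
    Buffers are unbounded, so no cell is lost.
    """
    rng = random.Random(seed)
    buffers = [0] * num_links          # B_i for i = 0..No-1
    k = 0                              # global counter: next link to assign
    worst = 0
    for _ in range(slots):
        buffers = [max(0, b - 1) for b in buffers]     # clear one cell per link
        for _ in range(rng.randint(0, max_arrivals)):  # A arrivals this slot
            buffers[k] += 1
            k = (k + 1) % num_links
        worst = max(worst, max(buffers) - min(buffers))
    return worst


print(simulate_trunk_based(num_links=8, slots=5000, max_arrivals=16))  # at most 1
```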
To aid the distribution of cells to different physical output buffers, a channel allocator should be used to assign the physical output address to the cells for switching across the module. Figure 3.5 shows the block diagram of the port controller, which is responsible for this operation.
The implementation of the port controller is simple. The port controller keeps a counter for each link group. The counter counts up from zero and resets to zero when it reaches the number of links within the link group. The counter value represents the next link id to use when a cell destined for that link group arrives. When a cell arrives, we only extract the destination link group of the cell. The corresponding counter is consulted to get the output link id, and the counter is then incremented by one. The link id is then used to route the cell to the specific output link.
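The counter-per-link-group logic just described fits in a few lines. This Python sketch is illustrative only. The class name is hypothetical, and link ids are numbered from zero as in the description above. It also assumes that the cells arriving in one slot are presented to the allocator one at a time, because how a hardware port controller serializes simultaneous arrivals is not specified here.

```python
class TrunkBasedChannelAllocator:
    """Sketch of the channel allocator of Figure 3.5: one counter per link group,
    shared by all input ports of the module."""

    def __init__(self, links_per_group: list[int]) -> None:
        self.links_per_group = list(links_per_group)      # number of links in each group
        self.counters = [0] * len(self.links_per_group)   # next link id per group

    def route(self, dest_group: int) -> tuple[int, int]:
        """Return (link group, link id) for a cell destined for dest_group."""
        link_id = self.counters[dest_group]               # consult the group's counter
        self.counters[dest_group] += 1                    # count up by one
        if self.counters[dest_group] == self.links_per_group[dest_group]:
            self.counters[dest_group] = 0                 # reset at the group size
        return dest_group, link_id


alloc = TrunkBasedChannelAllocator(links_per_group=[3, 2])
# Cells for group 0 arriving at different input ports share one counter.
print([alloc.route(0)[1] for _ in range(4)])  # [0, 1, 2, 0]
print([alloc.route(1)[1] for _ in range(3)])  # [0, 1, 0]
```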
3.3 Resequencer 
In the previous discussion, we have not considered the sequence of the cells. Since the cells may take different paths to the destination, there may be delay variation among the cells. Figure 3.6 illustrates the possible sequence patterns. When the string size is small, a delay of several cell times may cause a whole string of cells to overtake the string with the previous Sid. However, for a longer string size, the delay variation among the links will only cause the arrivals of cells with different Sid to overlap [22].
In order to recover the sequence of the cells when they arrive at their destination, we use a resequencer for each VC at the output port of the switch.
3.3.1 Resequencing Algorithm 
In Section 3.2.1, we introduced a string id Sid in each cell. Here, we further define several terms to explain the resequencing algorithm.
• s*: the target Sid. The resequencer is currently forwarding the cells with Sid = s*.
• EOS: a cell with a single-bit field set in the header, indicating that the cell is the last cell of its string.
• Early Comer: a cell with Sid ≠ s* that is supposed to be treated later than the cells with Sid = s*.
• Late Comer: a cell with Sid ≠ s* that is supposed to have been treated earlier than the cells with Sid = s*.
[Block diagram: trunk-based channel allocator with a destination identifier and one counter per link group (link groups 1 to C)]
Figure 3.5: Trunk-based channel allocator for ATM switching module
[Diagram: input pattern of strings Si to Si+3; at the resequencer, when Ls is small a whole string may overtake the previous string, and when Ls is large the arrivals of adjacent strings overlap]
Figure 3.6: Possible patterns of sequence
• Valid Range: the range of Sid within which we accept a cell. If a cell comes with Sid falling into the valid range, we accept it; if its Sid is not s*, it is regarded as an early-comer. Otherwise, we regard the cell as a late-comer and reject it. Basically, the valid range is a sliding window that moves synchronously with the change of s*.
• D: a timer value such that if the resequencer is idle for this time and one or more early-comers have arrived, the resequencer regards the EOS of string s* as lost; consequently, the resequencer updates s* and the valid range.
Figure 3.7 illustrates the resequencing algorithm. When a cell arrives, it is first checked against the valid range to decide whether it is accepted. If it is a late-comer, the resequencer simply discards the cell and a resequencing loss is counted. For a cell within the valid range, if its Sid is the same as the target s*, we forward the cell immediately. Moreover, if it is an EOS cell, s* and the valid range are updated.
If the Sid of the cell is not equal to s*, it is an early-comer. In this case, we buffer the cell in FIFO queues and wait until s* counts up to its Sid.
Since s* advances only when the EOS cell of the current string has been forwarded, a cell loss causes trouble for the resequencer only if the lost cell is the last cell of a string. To cope with that case, we employ a timer to avoid waiting forever. When a valid cell arrives whose Sid is not equal to s*, we activate the timer. D time slots later, s* changes to the next value and the loss of the last cell of the previous string is assumed. On the other hand, if any cell with Sid = s* arrives before the timer signals, the timer is reset. We can see that the probability of the timer taking action gets smaller as the string size Ls increases.
After s* is changed, cells belonging to the previous Sid may still arrive. As late-comers, they are dropped and counted as resequencing loss.
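The behaviour described above can be summarized in a per-VC sketch. The following Python code is a minimal illustration, not the thesis's implementation. It rests on several assumptions of ours:

- The valid range is the window {s*, s* + 1, ..., s* + W − 1} taken modulo the Sid range N, which is consistent with the wraparound example of Figure 3.8.
- An early-comer starts the timer only if the timer is not already running.
- "Reset" is read as restarting the timer when a cell with Sid = s* arrives while early-comers are still waiting.
- When s* advances, stored cells of the new target string are released at once.

The class and method names are hypothetical.

```python
from collections import defaultdict


class Resequencer:
    """Per-VC resequencer sketch (valid range W, timer D, Sid range N)."""

    def __init__(self, sid_range: int, valid_range: int, timer_d: int) -> None:
        self.n, self.w, self.d = sid_range, valid_range, timer_d
        self.target = 0                       # s*, the Sid being forwarded
        self.stored = defaultdict(list)       # Sid -> early-comers in FIFO order
        self.timer = None                     # slots left, or None when inactive
        self.output = []                      # cells forwarded, in order
        self.loss = 0                         # resequencing losses

    def _advance(self) -> None:
        """Update s* and the valid range; release stored cells of the new s*."""
        while True:
            self.target = (self.target + 1) % self.n
            queue = self.stored.pop(self.target, [])
            self.output.extend(cell for cell, _ in queue)
            if not any(eos for _, eos in queue):
                break                         # new target string not complete yet
        self.timer = self.d if self.stored else None

    def on_arrival(self, sid: int, is_eos: bool, cell: str) -> None:
        offset = (sid - self.target) % self.n
        if offset >= self.w:                  # outside the valid range: late-comer
            self.loss += 1
        elif offset == 0:                     # cell of the target string
            self.output.append(cell)
            if is_eos:
                self._advance()
            else:                             # reset the timer while others wait
                self.timer = self.d if self.stored else None
        else:                                 # early-comer: store and start timer
            self.stored[sid].append((cell, is_eos))
            if self.timer is None:
                self.timer = self.d

    def tick(self) -> None:
        """One time slot passes; on expiry, assume the EOS of string s* was lost."""
        if self.timer is not None:
            self.timer -= 1
            if self.timer <= 0:
                self._advance()


r = Resequencer(sid_range=8, valid_range=4, timer_d=3)
# String 1 (cells b0, b1) overtakes string 0 (cells a0, a1).
for sid, eos, cell in [(1, False, "b0"), (1, True, "b1"), (0, False, "a0"), (0, True, "a1")]:
    r.on_arrival(sid, eos, cell)
print(r.output)  # ['a0', 'a1', 'b0', 'b1']
```

A cell dropped as a late-comer only increments `loss`. The trade-off on D discussed next shows up here directly: a larger `timer_d` keeps early-comers stored longer before s* is forced forward.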
One may expect that D is an important parameter here. If D is large, the resequencing loss will be small, but the resequencing delay will be large. Conversely, if D is small, the resequencing delay will be small, but the resequencing loss will be large.
In Section 3.2, we mentioned that the range of Sid is important in ensuring the load sharing of links. Here, we further identify that the range of Sid should be large enough to guarantee that cells with the same Sid, arising from the wraparound of the string id, will not appear in the resequencer at the same time. In the following, we consider Ls = 1, which makes Sid wrap around most quickly.
Figure 3.8 shows the transmission pattern and the arrival pattern at the resequencer. The situation in Figure 3.8 is the most likely way for Sid to wrap around and cause ambiguity in the cell sequence. In this case, assuming the valid range is W, the W − 1 cells preceding the cell with Sid = s have been lost. Upon the arrival of the cell with Sid = s, i.e., cell A, we activate the timer and wait for a duration of D. Throughout this period D, the VC continues to deliver cells, and Sid wraps around. We must ensure that D is small enough that the timer signals before the cell with Sid = s − (W − 1) in the next round, i.e., cell B, arrives. Otherwise, when cell B arrives, it lies within the valid range and may be wrongly inserted, causing a wrong sequence of cells.
[Flowchart: on arrival, test whether Sid is within the valid range (no: late-comer, drop it) and whether Sid = s* (yes: deactivate timer D, forward the cell, and update s* and the valid range if the cell is EOS; no: early-comer, store the cell and activate timer D); when the timer signals, update s* and the valid range and forward any stored cells with Sid = s*]
Figure 3.7: The resequencing algorithm
[Timing diagram: cell generation pattern transmitted at PCR (..., s − 2, s − 1, s) and arrival pattern at the resequencer, with transfer delays CTDmax and CTDmin; a cell falls in the valid range (W − 1) but belongs to the next round]
Figure 3.8: An example of wrapping around of Sid
In the worst case, the cell transfer time of cell A is the maximum cell transfer delay of the switch, CTDmax. Between cell A and cell B, the VC sends cells at its peak cell rate, PCR, and the cell transfer time of cell B is the minimum cell transfer delay of the switch, CTDmin. Since cell B is N − (W − 1) cells after cell A in the transmission order, it is sent (N − (W − 1))/PCR after cell A. To avoid ambiguity, cell B must arrive after the timer started by cell A has signalled, so we need
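$$ \frac{N - (W - 1)}{\mathrm{PCR}} + \mathrm{CTD}_{\min} > \mathrm{CTD}_{\max} + D \tag{3.6} $$
where N is the number of distinct Sid values needed. Note that CTDmax − CTDmin is the peak-to-peak cell delay variation of the switch, CDVp-p [25]. We can rewrite (3.6) as
$$ \frac{N - (W - 1)}{\mathrm{PCR}} > D + \mathrm{CDV}_{p\text{-}p} \tag{3.7} $$
$$ N > (D + \mathrm{CDV}_{p\text{-}p}) \times \mathrm{PCR} + (W - 1) \tag{3.8} $$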
(3.8) suggests that, given CDVp-p (a parameter of the switch), PCR (a parameter of the VC) and D, we can determine the minimum range of Sid that avoids ambiguity.
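As a worked illustration of (3.8), the small Python helper below returns the smallest integer N satisfying the strict inequality. It assumes that D and CDVp-p are measured in cell times and PCR in cells per cell time. The function name and the example numbers are hypothetical, not values from the thesis.

```python
import math


def min_sid_range(d: float, cdv_pp: float, pcr: float, valid_range: int) -> int:
    """Smallest integer N with N > (D + CDVp-p) * PCR + (W - 1), i.e. (3.8).

    d, cdv_pp: timer value and peak-to-peak CDV, in cell times.
    pcr: peak cell rate of the VC, in cells per cell time.
    valid_range: W, the size of the valid range.
    """
    bound = (d + cdv_pp) * pcr + (valid_range - 1)
    return math.floor(bound) + 1   # strictly greater than the bound


# Hypothetical VC: D = 20, CDVp-p = 12, PCR = 0.25 cell per slot, W = 8
print(min_sid_range(20, 12, 0.25, 8))  # 16, since the bound is 32 * 0.25 + 7 = 15
```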
3.4 Results and Discussion 
We have developed simulation programs [39, 40] to evaluate the performance of the various schemes of multichannel switching. In all the following simulations, a 1024 × 1024 switch built from 32 × 32 modules is assumed. We assume speedup in the modules so that there is no internal blocking. At each input port, a random number of connections are first established to some randomly selected destination ports. The average loading of each connection is also random. However, we impose an upper bound on the loading at each input port, output port and internal link, so persistent overloading at any link is avoided.
We use FCFS as the scheduling algorithm at the output port. From the discussion in Chapter 4, we know that FCFS gives the best delay performance. The use of a rate-based scheduling algorithm such as VCS is unnecessary here because we are not concerned with QoS guarantees. Instead, we set up resequencers to investigate the resequencing performance.
Figure 3.9 shows the effect of the string size Ls in the case of VC-based assignment. As we expect intuitively, the smaller the string size, the better the load sharing among the links, which leads to a smaller switch cell loss probability.
Figure 3.10 shows the resequencing loss probability when different string sizes Ls are used. When Ls is small, we can see from Figure 3.9 that the switch cell loss probability is very small. This reduces the probability of losing the last cell of a string, and thus reduces the chance of a timeout at the resequencer. As a result, the resequencing loss owing to early-comers and late-comers also decreases. When the switch cell loss probability in the switch fabric is larger, the resequencer is more likely to time out and claim resequencing loss, so we see a rise in resequencing loss. As Ls becomes larger, cells are less likely to be out of sequence, and this leads to a slight drop in the resequencing loss probability.
The total cell loss probability, obtained by summing the switch cell loss probability and the resequencing loss probability, is shown in Figure 3.11. We can see that the shape of the total cell loss probability curve is similar to that of the switch cell loss probability, since the resequencing loss probability is negligible compared with the switch cell loss probability. This implies that controlling the switch cell loss is much more important than controlling the resequencing cell loss.
Figure 3.12 shows the resequencing loss probability for different values of D (the timer value) and of the valid range of Sid. From the figure, we see that D has an optimal value for each valid range. In the simulation, the parameters are set so that N satisfies (3.8), so no ambiguity of sequence identifiers occurs. For a very small D, we have little tolerance of cell delay variation, and many late-comers are dropped. For a very large D, the timer waits too long without advancing the valid range window, so the early-comers of subsequent Sid values are dropped. Therefore, provided that (3.8) is satisfied, Figure 3.12 gives an optimal D that minimizes the resequencing loss. From Figure 3.12, we also see that for a fixed value of D, a larger valid range leads to a lower probability of resequencing loss. This is because, when there is no ambiguity of sequence, a larger valid range provides better tolerance of cell delay.
Figure 3.13 shows the average total delay for the same combinations of D and N. Generally, the larger D is, the larger the resequencing delay, and thus the larger the total delay. We see that as long as N is sufficiently large, N does not affect the delay: N only contributes to resequencing loss.
Figure 3.14 shows the load-sharing performance of the VC-based, trunk-based and port-based assignment schemes and of shared memory modules. In all the schemes, we use Ls = 1 to give the smallest overall cell loss probability. A buffer of 5 cells is used for each output link, which makes up an aggregate buffer of 40 cells for each link group of 8 links. The trunk-based assignment scheme gives the best cell loss performance, and its performance is essentially the same as that of shared memory modules. This justifies our argument on the optimality of the trunk-based assignment scheme. On the other hand, the VC-based and port-based assignment schemes give close performance; the port-based scheme performs only slightly better than the VC-based scheme in terms of load sharing when the string size is small.
[Plot: switch cell loss probability versus string size (0 to 100) for loading = 0.5, 0.7 and 0.9; Switch = 1024x1024, Module = 32x32, Group Size = 8, Buffer Size = 5 per link, N = 32, D = 20]
Figure 3.9: Switch cell loss probability under different string sizes (VC-based scheme)
[Plot: resequencing loss probability versus string size (0 to 100) for loading = 0.9; Switch = 1024x1024, Module = 32x32, Group Size = 8, Buffer Size = 5 per link, N = 32]
Figure 3.10: Resequencing loss probability under different string sizes (VC-based scheme)
[Plot: total cell loss probability versus string size (0 to 100) for loading = 0.5, 0.7 and 0.9; Switch = 1024x1024, Module = 32x32, Group Size = 8, Buffer Size = 5 per link, N = 32, D = 20]
Figure 3.11: Total cell loss probability under different string sizes (VC-based scheme)
[Plot: resequencing cell loss probability versus D (10 to 60) for valid range = 32 and 64; Switch = 1024x1024, Module = 32x32, Group Size = 8, Buffer Size = 5 per link, Loading = 0.9, String Size = 1]
Figure 3.12: Resequencing cell loss probability under different values of D (VC-based scheme)
3.5 Summary 
In this chapter, we studied the channel allocation algorithms and the corresponding resequencing algorithm. Using these algorithms, we are able to use a generic point-to-point switch to emulate a shared memory switch in building HBSI. Simulations show that with trunk-based output link assignment, the performance is essentially the same as that of shared memory modules.
[Plot: end-to-end delay versus D (10 to 60) for valid range = 32, 64 and 128; Switch = 1024x1024, Module = 32x32, Group Size = 8, Buffer Size = 5 per link, Loading = 0.9, String Size = 1]
Figure 3.13: Effect of D on the end-to-end delay (VC-based scheme)
[Plot for Figure 3.14: cell loss probability versus loading (0.3 to 0.9) for the VC-based (Ls = 1), port-based and trunk-based schemes and shared memory modules; Switch = 1024x1024, Module = 32x32, Group Size = 8, Buffer Size = 5 per link, N = 32, D = 30]
In Chapter 1, we described the role of scheduling in guaranteeing the QoS of ATM connections. In particular, rate-based scheduling schemes provide guarantees on throughput and expected delivery time to ATM streams. In this chapter, we propose a new rate-based scheduling scheme that gives a smaller delay to low-rate streams. Implementation issues are also discussed.
4.2 Virtual Clock Scheduling (VCS) 
VCS uses the virtual clock value as a priority for scheduling. In the original proposal and proof of delay guarantees, VCS applies to variable-size packets [29, 34]. Here, we confine our discussion to the ATM context and consider fixed-size ATM cells only. We assume that time is divided into time slots according to cell boundaries: time t means the t-th time slot. Our discussion can easily be
Chapter 4 Scheduling 
extended to variable size packets. 
We assume that there is a scheduler at each output port to control the transmission of cells to the output link. As ATM is connection-oriented, each stream first establishes a connection through the ATM switch and specifies its traffic and QoS parameters. Based on these parameters, the switch reserves bandwidth for each stream. Let R be the output rate of the multiplexer; we have $R \ge \sum_i R_i$, where $R_i$ is the requested bandwidth of stream i.
The virtual clock value is generated by considering a reference server for each stream. For a stream i with rate $R_i = 1/T_i$, we consider a reference server of rate $R_i$, shown in Figure 4.1. Note that $R_i$ and $T_i$ are not restricted to integers. Assume that the k-th cell of stream i arrives at time $t_k$. The expected delivery time $d_k$ of the k-th cell at the reference server is given by
$$ d_1 = t_1 + T_i, \qquad d_k = \max\{t_k, d_{k-1}\} + T_i, \quad k = 2, 3, \ldots \tag{4.1} $$
In VCS, $d_k$ is assigned as the virtual clock priority of each cell stored in the multiplexer. At each time slot, the cell with the smallest $d_k$ is selected for transmission. The works in [29, 34] show that if the switch is work-conserving and non-preemptive, each cell will be delivered before its expected delivery time at the reference server. In this way, each stream is guaranteed a bandwidth with regard to its reference server, and this guarantee is independent of the other streams.
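The recurrence (4.1) is easy to evaluate for one stream. This Python sketch uses a hypothetical function name and measures arrival times in slots; the period T_i may be non-integer. It returns the reference-server delivery times d_k, which VCS then uses as priorities.

```python
def expected_delivery_times(arrivals: list[float], period: float) -> list[float]:
    """Reference-server times from (4.1) for one stream with period T_i.

    arrivals: non-decreasing arrival times t_1, t_2, ... of the stream's cells.
    """
    deadlines: list[float] = []
    for t in arrivals:
        start = t if not deadlines else max(t, deadlines[-1])   # max{t_k, d_(k-1)}
        deadlines.append(start + period)                         # + T_i
    return deadlines


print(expected_delivery_times([0, 0, 0, 10], period=4))  # [4, 8, 12, 16]
print(expected_delivery_times([0, 20], period=2.5))      # [2.5, 22.5]
```

In the first example, the burst of three cells at time 0 is spread out at the reserved rate, and the cell arriving at time 10 is pushed back to 16 by the stream's own backlog. The values depend only on the stream's own arrivals, which reflects the independence from other streams noted above.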
In the following, we show that instead of comparing the priorities of all cells inside the multiplexer, we can focus on the head-of-line (HOL) cells only. This is because VCS assigns priorities on a per-stream basis.
Theorem 4.1 In VCS, cells within one stream are FCFS. 
Proof Consider the priority equations in (4.1). Since $T_i > 0$, we have $d_k \ge d_{k-1} + T_i > d_{k-1}$, so $d_k > d_{k-1} > \cdots > d_1$ at any time t. So the cells within one stream must be served FCFS. □
This result allows us to compare the HOL cells of individual streams. At any time t, we select the highest-priority HOL cell for transmission. Consequently, if there are N streams, we only need to compare N values, one for each stream, instead of considering all stored cells. If N is small compared to the number of stored cells, the complexity is reduced. Whenever the HOL cell of a stream is delivered, the priority value of that stream is updated for the next HOL cell. In the following, we define a new set of equations for assigning the VCS priorities to HOL cells. Let $\hat{t}_k$ be the time at which the k-th cell of a stream becomes HOL, and let $t'_k$ be the departure time of this k-th cell. In VCS, we have
$$ t_k \le \hat{t}_k \le t'_k \le d_k \tag{4.2} $$
as cells are guaranteed to be delivered before their expected delivery times.
Theorem 4.2 The expected delivery time $d_k$ in VCS can be expressed in terms of $\hat{t}_k$:
$$ d_1 = \hat{t}_1 + T_i, \qquad d_k = \max\{\hat{t}_k, d_{k-1}\} + T_i, \quad k = 2, 3, \ldots \tag{4.3} $$
Proof We define a new set of equations based on $\hat{t}_k$:
$$ e_1 = \hat{t}_1 + T_i, \qquad e_k = \max\{\hat{t}_k, e_{k-1}\} + T_i, \quad k = 2, 3, \ldots \tag{4.4} $$
To prove the theorem, we use mathematical induction to show that $e_k = d_k$ for k = 1, 2, 3, ....
Basis step (k = 1): The first cell becomes HOL immediately when it arrives, i.e., $\hat{t}_1 = t_1$. We have $e_1 = \hat{t}_1 + T_i = t_1 + T_i = d_1$.
Inductive step (k > 1): Assume that $e_{k-1} = d_{k-1}$; we shall prove that $e_k = d_k$. Consider
$$ e_k = \max\{\hat{t}_k, e_{k-1}\} + T_i, \qquad d_k = \max\{t_k, d_{k-1}\} + T_i \tag{4.5} $$
Case 1: $t_k \ge d_{k-1}$
In this case, the k-th cell arrives after the expected delivery time of the previous cell. By (4.2), VCS guarantees that the (k − 1)-th cell is delivered before $d_{k-1}$, i.e., $t'_{k-1} \le d_{k-1}$. As $t_k \ge d_{k-1}$, the k-th cell becomes HOL immediately as it arrives. We therefore have $\hat{t}_k = t_k$. As $e_{k-1} = d_{k-1}$, we have $e_k = d_k$.
Case 2: $t_k < d_{k-1}$
In this case, the k-th cell arrives before the expected delivery time of the previous cell. This k-th cell may arrive before or after the departure of the previous cell. If the k-th cell arrives before the departure of the (k − 1)-th cell (i.e., $t_k < t'_{k-1}$), the k-th cell becomes HOL at $t'_{k-1}$ (i.e., $\hat{t}_k = t'_{k-1}$). By (4.2), this implies that $\hat{t}_k \le d_{k-1}$. Since $e_{k-1} = d_{k-1}$, $\hat{t}_k \le d_{k-1}$ and $t_k < d_{k-1}$, we have $\max\{\hat{t}_k, e_{k-1}\} = d_{k-1} = \max\{t_k, d_{k-1}\}$. Consequently, we have $e_k = d_k$.
If the k-th cell arrives after the departure of the (k − 1)-th cell (i.e., $t_k \ge t'_{k-1}$), the k-th cell becomes HOL immediately. So $\hat{t}_k = t_k$, which is less than $d_{k-1}$. Since $e_{k-1} = d_{k-1}$, we have $\max\{\hat{t}_k, e_{k-1}\} = \max\{t_k, d_{k-1}\}$. Again, we have $e_k = d_k$.
By mathematical induction, $e_k = d_k$ for all $k \ge 1$. So the expected delivery time $d_k$ in VCS can be expressed as (4.3). □
This theorem allows the use of $\hat{t}_k$ instead of $t_k$ for calculating the expected delivery time. When the HOL cell of a stream is served, the expected delivery time of the next HOL cell can be updated without recalling the arrival time of that cell.
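The identity in Theorem 4.2 can be checked numerically. The Python sketch below is illustrative only (the function names are hypothetical): it takes the arrival times $t_k$ of one stream and a set of departure times $t'_k$ assumed to satisfy (4.2), derives the HOL instants $\hat{t}_k$ from FIFO order, and evaluates the recursion (4.3) once with $t_k$ and once with $\hat{t}_k$.

```python
def hol_times(arrivals, departures):
    """HOL instant of each cell of one FIFO stream: the cell becomes HOL when it
    arrives or when its predecessor departs, whichever is later."""
    return [t if k == 0 else max(t, departures[k - 1]) for k, t in enumerate(arrivals)]


def expected_delivery(times, period):
    """Recursion (4.3) over a list x: d_1 = x_1 + T_i, d_k = max{x_k, d_{k-1}} + T_i."""
    d, out = None, []
    for x in times:
        d = x + period if d is None else max(x, d) + period
        out.append(d)
    return out


arrivals = [0, 1, 2, 15]
departures = [1, 5, 9, 16]   # hypothetical departure slots, chosen to satisfy (4.2)
T = 4
hol = hol_times(arrivals, departures)        # [0, 1, 5, 15]
d = expected_delivery(arrivals, T)           # [4, 8, 12, 19]
# (4.2): t_k <= hat t_k <= t'_k <= d_k
assert all(a <= h <= dep <= dk for a, h, dep, dk in zip(arrivals, hol, departures, d))
# Theorem 4.2: HOL instants give the same expected delivery times
assert expected_delivery(hol, T) == d
```

The third cell is the interesting one: it arrives at slot 2 but only becomes HOL at slot 5, yet both recursions give $d_3 = 12$ because $d_2 = 8$ dominates the maximum either way.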
Note that the value of $d_k$ grows with the current time, which keeps increasing. To decouple the priority value from the current time, we subtract the current time $t$ from $d_k$ and obtain another variable $g_k(t)$ for assigning priorities to the HOL cells:

$$ g_1(\hat{t}_1) = \hat{t}_1 + T_i - \hat{t}_1 = T_i $$

$$ \begin{aligned} g_k(\hat{t}_k) &= \max\{\hat{t}_k,\ d_{k-1}\} - \hat{t}_k + T_i \\ &= \max\{\hat{t}_k - \hat{t}_k,\ d_{k-1} - \hat{t}_k\} + T_i \\ &= \max\{0,\ g_{k-1}(\hat{t}_k)\} + T_i \end{aligned} \qquad (4.6) $$

where $g_{k-1}(\hat{t}_k) = d_{k-1} - \hat{t}_k$.
Note that (4.6) defines the priority value of a cell at the instant when the cell becomes HOL. To ensure that $g_k(t)$ equals $d_k - t$, $g_k$ is decreased by one at each time slot as time progresses, so that $g_k(t+1) = g_k(t) - 1$. At any time $t$, $g_k(t) = d_k - t$, so we can use $g_k(t)$ instead of $d_k$ as the priority value for comparison. The following algorithm illustrates the operation of VCS:
Algorithm 4.1 Virtual Clock Scheduling Algorithm based on HOL cells
Definitions:
• N: number of streams in the scheduler
• $G_i$: the value of the priority variable for the cells of stream $i$
• $T_i$: the period of stream $i$ in slots
Procedure (to be executed at every cell time)
1. If the first cell of a new stream $i$ arrives, set $G_i = T_i$
2. If any other cell of stream $i$ becomes HOL because it is a new arrival or its previous cell is transmitted, set $G_i = \max\{0, G_i\} + T_i$
3. Select the cell among all streams that has the smallest $G_i$ value for transmission
4. Decrement $G_i$ by 1 for every stream $i$
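A slot-by-slot Python sketch of Algorithm 4.1 follows. It is a minimal model under stated assumptions, not the thesis simulator: one cell is transmitted per slot (output rate R = 1), a cell transmitted in slot t departs at t + 1, several cells may arrive in the same slot, and ties in step 3 go to the lowest stream index. The function name and the log format are hypothetical.

```python
from collections import deque


def vcs(periods, arrivals, horizon):
    """Sketch of Algorithm 4.1 (VCS on HOL cells).

    periods:  T_i in slots per stream (assumed sum of 1/T_i <= 1)
    arrivals: sorted arrival slots per stream
    Returns (stream, arrival_slot, transmit_slot, expected_delivery) per cell.
    """
    n = len(periods)
    queues = [deque() for _ in range(n)]
    pending = [deque(a) for a in arrivals]
    G = [0.0] * n          # priority variable G_i (equals d - t while assigned)
    seen = [False] * n     # has stream i already had its first cell?
    hol = [False] * n      # does stream i have a HOL cell with an assigned G_i?
    log = []
    for t in range(horizon):
        for i in range(n):
            while pending[i] and pending[i][0] <= t:
                queues[i].append(pending[i].popleft())
            if queues[i] and not hol[i]:
                # steps 1 and 2: priority assigned at the instant the cell becomes HOL
                G[i] = periods[i] if not seen[i] else max(0.0, G[i]) + periods[i]
                seen[i] = hol[i] = True
        ready = [i for i in range(n) if hol[i]]
        if ready:
            i = min(ready, key=lambda s: G[s])                   # step 3
            log.append((i, queues[i].popleft(), t, t + G[i]))   # d_k = t + G_i
            hol[i] = False
        for i in range(n):                                       # step 4
            G[i] -= 1
    return log


# Bursty 50% stream plus two 25% streams: total load equals the link rate.
log = vcs([2, 4, 4], [[0, 0, 0, 0, 1, 1], [0, 1, 2], [3, 3, 3]], 40)
assert len(log) == 12
assert all(slot + 1 <= d for _, _, slot, d in log)   # no cell departs after d_k
```

Because $G_i$ keeps ageing even while stream $i$ has no HOL cell, the value found at the next HOL instant is exactly $g_{k-1}(\hat{t}_k) = d_{k-1} - \hat{t}_k$, which is what step 2 needs.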
With this algorithm, we get a better view of priority changes in VCS, as illustrated in Figure 4.2. The first cell of a high-rate stream 1 arrives at $t_1$ and is assigned a priority $G_1 = T_1$. (Note that the original VCS priority $d_1$ is $t_1 + T_1$.) This priority increases at every time slot until the cell is transmitted at a certain time $t'_1$. Assuming that another cell of this stream has arrived before $t'_1$ and been backlogged, the priority of this subsequent HOL cell will be assigned as $2T_1 - (t'_1 - t_1)$. The initial priority of this second cell is lower than that of the first cell, because its expected delivery time is now $2T_1 - (t'_1 - t_1)$ later, instead of $T_1$ later for the first HOL cell. We see how the priority is controlled by the expected delivery time. When a stream transmits several cells in a burst, its expected delivery time accumulates.
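As a small illustrative case (the numbers are hypothetical and not taken from the simulations), let $T_1 = 4$, $t_1 = 0$, and suppose the first cell leaves at $t'_1 = 1$ while a second cell of stream 1 is backlogged. Applying step 2 of Algorithm 4.1 at $t'_1$:

$$ G_1(t'_1) = T_1 - (t'_1 - t_1) = 3, \qquad G_1 \leftarrow \max\{0, 3\} + T_1 = 7 = 2T_1 - (t'_1 - t_1) $$

$$ d_2 = \max\{t'_1, d_1\} + T_1 = \max\{1, 4\} + 4 = 8 = t'_1 + 7 $$

so the new priority value is exactly the distance from $t'_1$ to the second cell's expected delivery time.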
Consider a cell of a low-rate stream 2 that arrives at time $s_1$ ($s_1 < t_1$), is assigned a priority $G_2 = T_2$, and has not been transmitted (because of other higher-priority cells) before time $t_1$. At $t_1$, since this cell is from a low-rate stream ($T_2 > T_1$), its priority is lower than that of the first cell of the high-rate stream. In fact, its priority is lower than that of the second cell of the high-rate stream. The cell from the low-rate stream is transmitted later even though it arrives earlier than the cells of the high-rate stream. The graphical presentation in Figure 4.2 illustrates why VCS gives a larger delay to low-rate streams, especially when the high-rate stream is bursty. We refer to this graphical representation as the Time-Priority Model (TPM), which is used to model other priority schemes in Section 4.4. Below, we derive a number of interesting results for VCS.

Figure 4.1: Reference Server and Expected Delivery Time
Figure 4.2: Time-Priority Model for VCS
Theorem 4.3 Given any busy period of length $T$, the number of cells from a stream of rate $R_i = 1/T_i$ having an expected delivery time falling within the period $T$ is at most $\lfloor T/T_i \rfloor$.
Proof Consider a stream and its reference server in Figure 4.1. Before a busy period, no cell is awaiting transmission. In the worst case, the first cell of this stream arrives at the beginning of the busy period, and the expected delivery times of the subsequent cells occur at intervals of $T_i$. The number of cells of this stream having an expected delivery time before the end of the busy period $T$ is then $\lfloor T/T_i \rfloor$. □
This result shows that the number of cells from a stream having an expected delivery time before the end of a busy period $T$ is bounded by the length of the period. Given a finite number of streams, the total number of cells having an expected delivery time within $T$ will also be bounded. Based on this result, we prove the following important result from [29, 34] in a simpler way.
Theorem 4.4 The VCS discipline guarantees that cells of multiplexed streams can be delivered no later than their expected delivery times with respect to their reference servers.
Proof Consider a busy period starting at some time $t_1$ and lasting up to time $t_1 + T$. Before the busy period, no cell is awaiting transmission. Assume that we have $N$ streams to be multiplexed. By Theorem 4.3, the maximum number of cells having an expected delivery time before time $t_1 + T$ is obtained by summing up the cells from all streams, or

$$ \sum_{i=1}^{N} \left\lfloor \frac{T}{T_i} \right\rfloor \le \sum_{i=1}^{N} \frac{T}{T_i} \le T \cdot R \qquad (4.7) $$

where we use the fact that $R \ge \sum_i R_i$, i.e., the sum of the rates of all streams does not exceed the multiplexer output rate $R$. Since the multiplexer is work-conserving, the number of cells served during a time period $T$ is $T \cdot R$. So for any $T$, all cells having an expected delivery time before time $t_1 + T$ will be served before $t_1 + T$. In other words, no cell will be served later than its expected delivery time with respect to its reference server. □
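Inequality (4.7) can be evaluated directly. The sketch below assumes an output rate of R = 1 cell per slot and uses the stream mix of the first simulation in Section 4.7 (ten 1% streams, two each of 5% and 10%, one each of 20% and 40%, so the periods are 100, 20, 10, 5 and 2.5 slots). Exact rational arithmetic is used because periods such as 2.5 slots are not integers; the function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def total_rate(periods):
    """Sum of R_i = 1 / T_i in cells per slot, computed exactly."""
    return sum(1 / Fraction(p) for p in periods)


def deadline_count_bound(periods, T):
    """Left-hand side of (4.7): sum over streams of floor(T / T_i)."""
    return sum(floor(Fraction(T) / Fraction(p)) for p in periods)


periods = [100] * 10 + [20, 20, 10, 10, 5, 2.5]
R = total_rate(periods)                              # exactly 1: full load
# (4.7) holds for every busy-period length tried
assert all(deadline_count_bound(periods, T) <= T * R for T in range(1, 1001))
```

At $T = 100$ the bound is tight: $10 + 2 \cdot 5 + 2 \cdot 10 + 20 + 40 = 100$ cells must be served in 100 slots.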
4.3 Gated Virtual Clock Scheduling (GVCS) 
In VCS, any new HOL cell is immediately assigned the virtual clock priority. Nevertheless, Figure 4.2 suggests that we can delay the assignment of the priority until one period $T_i$ before the expected delivery time. We will show that the cell can still be transmitted before its expected delivery time.
Consider a stream $i$ with rate $R_i = 1/T_i$. Let a cell arrive at time $t$, and let its expected delivery time be $d_k$, where $d_k$ is defined by (4.3). We have the following theorem.
Theorem 4.5 In VCS, as long as the $k$-th cell of each stream is assigned its priority, based on the expected delivery time $d_k$, at the instant $d_k - T_i$, the cell will be delivered before $d_k$.
Proof In Figure 4.3a, we show a stream of cell arrivals at the reference server. In Figure 4.3b, we delay each cell arrival to the expected departure time of the previous cell. We see that if this stream of cells is served by the reference server, the same expected delivery times will be guaranteed. Now consider that this modified stream enters a VCS multiplexer. As VCS guarantees the same expected delivery times as the reference server, it guarantees that this stream of cells will meet its delivery times. This implies that we can delay the original stream to form the new stream and feed it into the VCS multiplexer. The VCS multiplexer will still guarantee the expected delivery times of this stream. □
This theorem implies that we do not need to assign a priority to any cell until one period before its expected delivery time, which, for a backlogged cell, is the expected departure time of the previous cell of the same stream. Based on this result, we propose an enhanced virtual clock scheme, Gated Virtual Clock Scheduling (GVCS). We introduce two priority states for each cell. The $k$-th cell of a stream is said to be in the inactive state before time $d_k - T_i$. After $d_k - T_i$, the cell is said to be in the active state. Inactive cells have a lower priority than active cells. Below, we give the GVCS algorithm:
Algorithm 4.2 Gated Virtual Clock Scheduling
Definitions:
• N: number of streams in the scheduler
• $G_i$: the value of the priority variable for the cells of stream $i$
• $T_i$: the period of stream $i$ in slots
• $G'_i$: the previous value of the priority variable for cells of stream $i$
• $G_{max}$: the largest constant value, indicating the lowest priority
Procedure (to be executed at every cell time)
1. If the first cell of a new stream $i$ arrives, set $G_i = T_i$, and define the cell to be active
2. If any other cell of stream $i$ becomes HOL, set $G'_i = G_i$ and $G_i = G_{max}$, and define the HOL cell to be inactive
3. For those inactive HOL cells, check if the current time has exceeded $d_k - T_i$. If so, set $G_i = \max\{0, G'_i\} + T_i$ and define the cell to be active
4. Select the cell that has the smallest $G_i$ value for transmission
5. Decrement $G_i$ by 1 for every stream $i$
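The gating can be sketched in Python under the same slot model as before (one cell per slot, ties to the lowest stream index). This is a sketch with hypothetical names, and one bookkeeping choice differs from the listing: instead of carrying $G'_i$, it stores $d_k$ per stream (computed with Theorem 4.2) and sets the active priority to $d_k - t$, which equals $T_i$ at the activation instant $d_k - T_i$, as in the Figure 4.4 discussion below. $G_{max}$ is an assumed large constant.

```python
from collections import deque

G_MAX = 10**9   # assumed "largest constant": inactive cells rank below all active ones


def gvcs(periods, arrivals, horizon):
    """Sketch of Algorithm 4.2; returns (stream, arrival, transmit_slot, d_k) per cell."""
    n = len(periods)
    queues = [deque() for _ in range(n)]
    pending = [deque(a) for a in arrivals]
    G = [G_MAX] * n
    d = [None] * n          # expected delivery time of stream i's current HOL cell
    hol = [False] * n
    active = [False] * n
    log = []
    for t in range(horizon):
        for i in range(n):
            while pending[i] and pending[i][0] <= t:
                queues[i].append(pending[i].popleft())
            if queues[i] and not hol[i]:
                # new HOL cell: d_k = max{t, d_{k-1}} + T_i, or t + T_i for the first cell
                d[i] = (t if d[i] is None else max(t, d[i])) + periods[i]
                hol[i], active[i], G[i] = True, False, G_MAX      # starts inactive
            if hol[i] and not active[i] and t >= d[i] - periods[i]:
                active[i], G[i] = True, d[i] - t                  # gate opens
        ready = [i for i in range(n) if hol[i]]
        if ready:
            i = min(ready, key=lambda s: G[s])
            log.append((i, queues[i].popleft(), t, d[i]))
            hol[i] = False
        for i in range(n):
            G[i] -= 1
    return log


# A burst of six cells from a 50% stream and one cell from a 10% stream, all at slot 0.
log = gvcs([2, 10], [[0] * 6, [0]], 40)
assert [slot for s, _, slot, _ in log if s == 1] == [1]
assert all(slot + 1 <= dk for _, _, slot, dk in log)
```

In this run the low-rate cell leaves in slot 1, right after the first high-rate cell, because the second high-rate cell stays inactive until slot 2. Since every inactive counter starts at $G_{max}$ and ages at the same rate, inactive cells are served in the order in which they became HOL.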
Figure 4.4 illustrates the algorithm. When the first HOL cell of stream 1 is served at $t'_1$, the next HOL cell becomes inactive. Its priority is set to the minimum ($G_1 = G_{max}$). We see that a cell of stream 2 can now be served. The inactive cell of stream 1 will increase its priority at each time slot, but will always have a lower priority than active cells if $G_{max}$ is sufficiently large. At time $t_1 + T_1$, which is $T_1$ before its expected delivery time, this cell is assigned $T_1$ as its priority; that is, the priority is now activated. The scheme is named Gated Virtual Clock Scheduling because the priority is gated to activate only within the interval $[d_k - T_i, d_k]$.
GVCS gives a better delay performance to low-rate streams. Consider a cell of a low-rate stream and another cell of a high-rate stream. Once a cell of the high-rate stream is transmitted, the next HOL cell from the high-rate stream becomes inactive for some time, and therefore the cell of the low-rate stream gets a chance of transmission. Bursts of high-rate cells can no longer jeopardize the transmission opportunities of low-rate streams.
Figure 4.3: Cell arrivals deliberately delayed ((a) original cell arrivals; (b) cell arrivals aligned with departure times)
Figure 4.4: Gated Virtual Clock Scheduling
It might be interesting to compare GVCS, VCS and WFQ intuitively on how they share the spare bandwidth. In VCS, the spare bandwidth is used by high-rate streams whenever possible. In WFQ, it is shared by streams in proportion to their individual rates. In GVCS, it is shared roughly in an RR manner (i.e., on an equal basis). Consider a stream that has just transmitted one cell. It has already used its transmission opportunity until $T_i$ later. Before this time, any further cell transmission is exploiting the spare bandwidth. As each stream can have only one inactive HOL cell, streams will transmit their inactive cells (i.e., use the spare bandwidth) in a round-robin manner.
Note that the use of the spare bandwidth in VCS creates a penalty problem. Whenever a cell is transmitted, the expected delivery time accumulates. So if a stream is using the spare bandwidth, its expected delivery time accumulates even if no other streams have data for transmission. Consequently, when other streams resume transmission, the stream that has used the spare bandwidth will have accumulated a large expected delivery time value, and thus will have a very low priority of transmission. This poses a penalty on streams that have used the spare bandwidth.
In [29], a modified VCS scheme is proposed to reduce the effect of this penalty. Whenever the system is empty, i.e., no cell is waiting for transmission, the values $d_k$ of all streams are reset to zero. Consequently, the accumulated expected delivery times in the reference servers are reset to zero. In doing so, we reset all connections and treat the subsequent cell arrivals as new streams. In the proof of Theorem 4.4 for the bandwidth guarantee, the only assumption we made was that the sum of the individual stream rates does not exceed the system bandwidth. As long as this assumption holds for each busy period, we can have a new set of streams for each new busy period. Consequently, the modified VCS scheme will still guarantee the throughput and expected delivery times.
As GVCS also relies on the accumulation of expected delivery times, we can use a similar approach to avoid the penalty problem. Whenever the queue is empty, we can reset the variables $G'_i = 0$ for all streams.
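The size of the penalty, and the effect of the idle reset, can be sketched for a single stream. The scenario is hypothetical: one stream with $T_i = 4$ sends a burst of cells back-to-back on an otherwise idle link (so its cells use spare bandwidth), the link then idles, and one more cell arrives. Resetting on idle is modelled as treating that cell as the first cell of a new stream, which is what $G'_i = 0$ produces in step 3 of Algorithm 4.2 ($G_i = \max\{0, 0\} + T_i = T_i$). The function names are hypothetical.

```python
def next_deadline(d_prev, t, period, reset_on_idle, system_idle):
    """Expected delivery time for a cell becoming HOL at slot t (Theorem 4.2 form)."""
    if d_prev is None or (reset_on_idle and system_idle):
        return t + period              # treated as the first cell of a new stream
    return max(t, d_prev) + period    # otherwise the accumulated d_{k-1} carries over


def burst_then_return(period, burst, gap, reset_on_idle):
    """One stream alone on the link: `burst` cells sent one per slot, then `gap`
    idle slots, then one more cell. Returns d_k assigned to that late cell."""
    d = None
    for t in range(burst):            # each burst cell is HOL and sent in slot t
        d = next_deadline(d, t, period, reset_on_idle, system_idle=False)
    t_return = burst + gap            # the system was empty during the gap
    return next_deadline(d, t_return, period, reset_on_idle, system_idle=gap > 0)


assert burst_then_return(4, 20, 5, reset_on_idle=False) == 84   # penalised
assert burst_then_return(4, 20, 5, reset_on_idle=True) == 29    # reset
```

Without the reset, the cell arriving at slot 25 inherits $d_{20} = 80$ from a burst that used only spare bandwidth and is scheduled as if due at slot 84; with the reset it is due at slot 29, like any newly arriving stream.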
4.4 Time-Priority Model 
In this section, we show how TPM can model other scheduling schemes. In Figure 4.2 and Figure 4.4, we see that a priority is assigned to $G_i$ at a certain time, after some time another priority is assigned, and so on. We can therefore have a generalized time-priority algorithm.
Algorithm 4.3 Generalized Time-Priority Algorithm
Definitions:
• N: number of streams in the scheduler
• $G_i$: the priority variable for cells of stream $i$
• $D_i$: the timer variable for updating the HOL cell priority
• $T_i$: the period of stream $i$ in slots
Procedure (to be executed at every cell time)
1. If any cell of stream $i$ becomes HOL, set $G_i$ = Function_A and $D_i$ = Function_B
2. If $D_i$ counts down to zero, update $G_i$ = Function_A and $D_i$ = Function_B
3. Select the cell among all streams that has the smallest $G_i$ value for transmission
4. Decrement both $G_i$ and $D_i$ by 1 for every stream $i$
Note that Function_A and Function_B can be constant values or functions based on arrival times, the current time, delay deadlines, etc. The timer variable $D_i$ tracks the time until the next update of the priority.
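Algorithm 4.3 maps naturally onto a scheduler loop that takes Function_A and Function_B as callables. The Python sketch below assumes each callable receives the stream index, the current slot and the HOL cell (here, its arrival slot), and that Function_B may return None to mean "no further timed update". These conventions, like the function name, are hypothetical; one cell is sent per slot and ties go to the lowest stream index.

```python
from collections import deque


def tpm(n, arrivals, horizon, function_a, function_b):
    """Sketch of Algorithm 4.3. Returns (stream, arrival_slot, transmit_slot) per cell."""
    queues = [deque() for _ in range(n)]
    pending = [deque(a) for a in arrivals]
    G = [0] * n          # priority variables G_i
    D = [None] * n       # timer variables D_i (None: no pending update)
    hol = [False] * n
    log = []
    for t in range(horizon):
        for i in range(n):
            while pending[i] and pending[i][0] <= t:
                queues[i].append(pending[i].popleft())
            if queues[i] and not hol[i]:                 # step 1: new HOL cell
                hol[i] = True
                G[i], D[i] = function_a(i, t, queues[i][0]), function_b(i, t, queues[i][0])
            elif hol[i] and D[i] == 0:                   # step 2: timer expired
                G[i], D[i] = function_a(i, t, queues[i][0]), function_b(i, t, queues[i][0])
        ready = [i for i in range(n) if hol[i]]
        if ready:
            i = min(ready, key=lambda s: G[s])           # step 3
            log.append((i, queues[i].popleft(), t))
            hol[i] = False
        for i in range(n):                               # step 4
            G[i] -= 1
            if D[i] is not None and D[i] > 0:
                D[i] -= 1
    return log


# FCFS as in the text: G_i = G_max - (waiting time), no timed updates (G_max = 1000 assumed).
fcfs = tpm(2, [[0, 0, 0], [1]], 10,
           function_a=lambda i, t, arrival: 1000 - (t - arrival),
           function_b=lambda i, t, arrival: None)
assert fcfs == [(0, 0, 0), (0, 0, 1), (0, 0, 2), (1, 1, 3)]
```

A priority jump only needs Function_B: for example, a cell that starts at $G_i = 50$ and is moved to $G_i = 0$ after waiting 3 slots uses Function_B to return the remaining wait, so step 2 fires at the jump instant.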
Figures 4.5 to 4.10 show the TPM models for First-Come-First-Served (FCFS), Last-Come-First-Served (LCFS), Round Robin (RR), Earliest Deadline First (EDF), Static Multiple Priority Queues (SMPQ), and Multiple Priority Queues with Priority Jumps (MPQ/PJ). Weighted Fair Queueing (WFQ) can also be modeled if we allow the rate of increase of priority (the decrement of the priority variable) to change with respect to the aggregate stream rate. That is, instead of decrementing $G_i$ by 1 at every time slot, we decrement $G_i$ at a rate inversely proportional to the aggregate rate. In this way, the existing streams may share the spare bandwidth in proportion to their stream rates [33].
In FCFS, we store the arrival time of every cell. When the $k$-th cell of stream $i$ becomes HOL, we set $G_i = G_{max} - w_k$, where $w_k$ is the difference between the current time and the arrival time of the $k$-th cell (i.e., how long the $k$-th cell has waited inside the queue). A cell that arrives and immediately becomes HOL will have $G_i = G_{max}$, which is the lowest priority in the system. The priority of each HOL cell increases at every time slot, i.e., $G_i = G_i - 1$. Given a sufficiently large $G_{max}$, each cell will be serviced before $G_i$ reaches its minimum value of 0. A smaller value of $G_i$ indicates a higher priority. To avoid confusion, below we refer to the actual priority of a cell instead of the value of its priority variable. At any time $t$, the cell that arrived first will have the highest priority and will be served first. LCFS is similar to FCFS, except that a cell is given the highest priority when it arrives, and the priority decreases at every time slot.
Figure 4.5: First-Come-First-Served (FCFS)
Figure 4.6: Last-Come-First-Served (LCFS)
Figure 4.8: Earliest Deadline First (EDF)
Figure 4.9: Static Multiple Priority Queues (SMPQ)
Figure 4.10: Multiple Priority Queues with Priority Jumps (MPQ/PJ)
In RR, scheduling is done on a per-stream basis. In Figure 4.7, we show four streams having cells for transmission. A cell of stream a is transmitted first, stream b second, and so on. Whenever any cell of a stream is served, the next HOL cell is given the lowest priority by assigning $G_i = G_{max}$. The priority increases at every time slot. If a stream has no more cells for transmission, the priority of that stream stays at the lowest priority.
EDF assigns an initial priority value to each cell based on the deadline associated with the stream. The priority increases at every time slot, so a cell closer to its deadline has a higher priority. Note that EDF cannot give a delay guarantee unless the characteristics of the other streams are known. In SMPQ, each stream has a fixed priority. When a cell becomes HOL, it is assigned that priority. In MPQ/PJ, we have multiple priority queues, as shown in Figure 4.10. An arriving cell may join any one of the queues. After some specified time, the cell may jump to a higher-priority queue, and so on.
TPM can model hybrids of schemes. For example, we can have an MPQ/PJ discipline in which each queue uses EDF. We can also change the algorithm from one discipline to another, and modify the scheduling parameters at any time. For example, we may start with three queues and change to four queues later. TPM is therefore flexible and versatile in modeling various scheduling algorithms.
4.5 Programmable Rate-based Scheduler (PRS) 
The two major tasks in the TPM algorithm are (i) to assign the HOL priorities at different time instants, and (ii) to compare the priorities of HOL cells to find the highest-priority cell for transmission. Figure 4.11 gives a possible implementation of TPM. It has a cell table of $N$ entries for storing the scheduling information of $N$ streams. The cell table may be an extension of the VPI/VCI table that provides VP/VC mapping for each stream. Cells are stored in the cell memory, and the relevant scheduling parameters (e.g., $T_i$) are stored in the cell table.
There are $N$ priority counters for storing the priorities of the HOL cells. At every time slot, the counters are decremented by one, as needed by most disciplines. Assuming that the counters are $M$ bits long, the $M$-bit minimum circuit determines the smallest value of all counters and outputs a stream number to the cell memory. The HOL cell corresponding to that stream is read from the memory and delivered to the output. At the same time, both the activation counters and the priority counters are updated for the next HOL cell. Assuming that the $M$-bit minimum circuit compares the counter values on a bit-by-bit basis, we need at most an $M$-bit delay to determine the minimum counter value among the $N$ streams.
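One way a bit-by-bit minimum search can work is sketched below in Python: starting from the most significant bit, if any still-eligible counter has a 0 in the current position, every eligible counter with a 1 there is eliminated, so exactly $M$ steps are needed. The non-negative (unsigned) counter values and the lowest-index tie-break are assumptions of this sketch, not details given for the circuit in Figure 4.11.

```python
def bit_serial_min(counters, m):
    """Index of the smallest M-bit unsigned counter, found MSB-first in M steps.

    At each bit position, if some eligible counter has a 0 there, every
    eligible counter with a 1 is dropped. Survivors all hold the minimum;
    the lowest stream index among them is returned.
    """
    eligible = list(range(len(counters)))
    for bit in reversed(range(m)):                 # one step per bit
        zeros = [i for i in eligible if not (counters[i] >> bit) & 1]
        if zeros:                                  # someone has a 0: drop the 1s
            eligible = zeros
    return eligible[0]


assert bit_serial_min([13, 5, 9, 5, 200], 8) == 1   # value 5, lowest index wins
```

In hardware each step is a wired-OR across the $N$ eligible counters, which is why the delay grows with $M$ rather than with $N$.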
Note that the priority values can be non-integers, as the periods $T_i$ might not be integer multiples of the cell slot. For simplicity, we use only the integer part of the priority values for comparison in the $M$-bit minimum circuit, because cells with the same integer part can be roughly considered as having the same priority. While the integer part is decremented at each time slot for comparison, the fractional part is used for accurately accumulating the priority values.
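A fixed-point counter makes the integer/fraction split concrete. In this Python sketch the 8-bit fractional field, the class name and the method names are assumptions for illustration; the counter adds $T_i$ in full precision (using the VCS update $G_i = \max\{0, G_i\} + T_i$), subtracts one slot per tick, and exposes only the integer part for comparison.

```python
FRAC_BITS = 8              # assumed width of the fractional field
ONE = 1 << FRAC_BITS       # one cell slot in counter units


class PriorityCounter:
    """Hypothetical fixed-point priority counter: integer part compared, fraction kept."""

    def __init__(self, period_slots):
        self.step = round(period_slots * ONE)   # T_i in counter units
        self.value = 0

    def on_hol(self):
        # VCS rule G_i = max{0, G_i} + T_i, accumulated with the fraction
        self.value = max(0, self.value) + self.step

    def tick(self):
        # decrement by one slot at every time slot
        self.value -= ONE

    def compare_key(self):
        # integer part only, as fed to the M-bit minimum circuit
        return self.value >> FRAC_BITS


c = PriorityCounter(2.5)   # a 40% stream: T_i = 2.5 slots
keys = []
for _ in range(4):         # HOL cell of this stream served in every slot
    c.on_hol()
    keys.append(c.compare_key())
    c.tick()
assert keys == [2, 4, 5, 7]   # floors of 2.5, 4.0, 5.5, 7.0: no drift
```

Rounding $T_i$ to an integer of 2 slots instead would give keys 2, 3, 4, 5 and let the stream drift ahead of its reserved rate; keeping the fraction avoids this while the comparator still sees only integers.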
There are $N$ activation counters to govern the next update of priorities. These counters are preloaded with a value and decremented by one at every time slot. When these counters reach zero, some pre-calculated values stored in the cell table are loaded into both the activation counters and the priority counters. How the activation counters are updated depends on the scheduling discipline being implemented. Below, we show how the GVCS and FCFS disciplines can be implemented. Other schemes can be implemented in a similar fashion.
In GVCS, the cell table stores the expected activation time (i.e., $d_k - T_i$) of each subsequent cell. Whenever a cell becomes HOL, the expected activation time minus the current time is stored in the activation counter, whereas $G_{max}$ is assigned to the priority counter. When the activation counter decrements to zero and the cell has still not been serviced, the value $T_i$ is assigned to the priority counter, indicating that the cell now has an active priority. Once a cell is in the active state, the activation counter stays at zero until the cell is delivered. In addition, the expected delivery time of every cell is pre-calculated and stored in the cell table when the previous cell becomes HOL. Take the example shown in Figure 4.11. Streams 1 and 2 are both active streams, whereas the others are inactive streams. Stream 3 is close to active, as its activation counter is now 1. We see that stream 2 has the smallest priority counter value among all streams, so its HOL cell will be delivered in this time slot.
In FCFS, the arrival time of each cell has to be stored in the cell table. Whenever a cell becomes HOL, the difference between the current time and the arrival time (i.e., how long the cell has waited inside the stream queue) is subtracted from $G_{max}$, and the result is loaded into the priority counter. So if the cell has waited longer inside the queue, it will have a higher starting HOL priority. The priority increases as the priority counter decrements at every time slot. The activation counters are not used at all.
From these two examples, we see the trade-off needed for a circuit to implement two different schemes. To limit our scope, we focus on the implementation of rate-based schemes only, so that we do not need to store the arrival time of each cell. We call the design in Figure 4.11 a Programmable Rate-based Scheduler (PRS) because (1) it can implement many rate-based schemes, and (2) it can be configured to implement different schemes by using different algorithms for updating the activation counters and priority counters.
The complexity of PRS increases with the number of streams, so PRS is more feasible for a small number of streams. We suggest that PRS may assign priority counters at the virtual path level instead of the virtual channel level. It can be proved that if each stream within a group is leaky-bucket controlled, the rate of each stream will be guaranteed at the scheduler. Alternatively, we can use the scheduler to protect groups of well-behaved streams (e.g., VBR) from groups of streams (e.g., ABR or UBR [25]) which have less predictable behavior. Finally, we may keep the priorities for active streams only. Additional hardware will be required to keep information on active and inactive streams.
4.6 Integration with Resequencer 
Up to now, we have introduced a large scale switch interconnection architecture, a multichannel switching algorithm with the corresponding resequencing algorithm, and an output port scheduling algorithm. In this section, we integrate them and look at the global view.
Figure 4.12 is a block diagram showing the functional positions of the above ideas. Our design is a one-sided large scale switch fabric which is scalable and stackable. The internal blocks can be shared memory switches or generic point-to-point switches. At every output port, we have an output port controller containing a programmable rate-based scheduler, which is responsible for scheduling the virtual circuits destined for that output port. If shared memory switches are used, no other devices are needed to reorder the cell sequence. However, if generic switches are used, a resequencer should be used at each output port, as described in Chapter 3.
We observe that both the programmable rate-based scheduler and the resequencer are located at the output port. In fact, the operations of the two devices are inter-related. Figure 4.13 illustrates the idea. We have logical queues for each VC. The cells arriving at the output port are first queued in the buffer if they are within the valid range of the corresponding VCs. We also have a PRS responsible for scheduling the cells. In each time slot, a stream is selected using the preset rate-based scheduling algorithm. Note that the cells stored in the cell queues may not be ready for transmission because they may be out of sequence. So the priority counter of stream $i$ in PRS counts only if cells from stream $i$ with the target string id have arrived. Otherwise, the counter of stream $i$ is temporarily disabled.
We have another counter, the $S_{id}$ changing counter, which is activated to change $S^*$ if cells are out of sequence. For details of the resequencing mechanism, $S^*$ and $S_{id}$, please refer back to Section 3.3.
Figure 4.11: Programmable Rate-based Scheduler (cell table, activation counters, priority counters, M-bit minimum circuit, cell memory)
Figure 4.12: A functional block diagram showing a large scale switch (HBSI, a one-sided interconnection architecture, with a PRS and resequencer at each output port)
4.7 Results and Discussions 
We developed a simulation program [39, 40] based on TPM for evaluating the performance of VCS, GVCS and FCFS. The program is written in C++. The first version of the program, modeling VCS, took one month to finish, whereas the modifications needed for GVCS and FCFS took only one day each. A unifying model reduces our effort in developing simulation models for various disciplines. We anticipate that the same would apply when we use PRS to implement various rate-based schemes.
The simulation model is shown in Figure 4.14. ATM streams from 1% to 60% of the output link bandwidth are generated for multiplexing. Note that the periods of some streams, e.g. 40% and 60%, are not integer multiples of a cell time slot. Each stream is modeled as a constant bit-rate source which generates cells at regular time intervals. The cells are first passed into an artificial delayer, which introduces a random delay according to some distribution. The cells are then scheduled to arrive at the multiplexer. The delayer simulates a bounded random delay experienced by each cell in the network, and hence introduces some jitter and burstiness in the cell arrivals. We associate a random but monotonically increasing time with each cell in the delayer as its scheduled arrival time. At every time slot, the scheduled time of the HOL cell inside the delayer queue is compared with the current time. If the scheduled time meets the current time, which implies that the cell has arrived, the cell is dequeued and stored in the multiplexer. Cells from multiple streams are multiplexed according to the scheduling algorithm. Note that we could have used other sources (e.g., variable bit-rate on-off sources) instead of constant bit-rate sources. Nevertheless, we find that the multiplexing delay depends heavily on the burstiness of each stream. Using constant bit-rate sources, we can control the stream burstiness with the artificial network delayer.
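The source-plus-delayer front end can be sketched in a few lines of Python. The text does not say how the scheduled times are kept monotonic, so this sketch assumes each cell's time is the maximum of its own delayed time and the previous cell's time (which keeps the delay bounded and the stream in order); slot rounding with `ceil` is also an assumption. The uniform 1 to 400 slot delay mirrors the setting quoted for Figures 4.16 to 4.18; the function names are hypothetical.

```python
import math
import random
from collections import deque


def cbr_through_delayer(period, n_cells, max_delay, rng):
    """Scheduled arrival slots for one CBR stream after a bounded random delay.

    Cell k is generated at ceil(k * period); the delayer adds a delay drawn
    uniformly from 1..max_delay slots. Taking the max with the previous cell's
    time keeps the schedule monotonic, so cells are never reordered.
    """
    times, last = [], 0
    for k in range(n_cells):
        t = math.ceil(k * period) + rng.randint(1, max_delay)
        last = max(last, t)
        times.append(last)
    return times


def release(delayer, now):
    """Dequeue every HOL cell whose scheduled time has been reached at slot `now`."""
    out = []
    while delayer and delayer[0] <= now:
        out.append(delayer.popleft())
    return out


sched = cbr_through_delayer(2.5, 100, 400, random.Random(1))   # a 40% stream
delayer = deque(sched)
arrived = release(delayer, sched[10])   # every cell due by that slot, possibly a burst
```

Because a late cell holds back the cells behind it, several cells can become due in the same slot, which is how a constant bit-rate source acquires the jitter and burstiness described above.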
The first result is based on 10 streams of 1%, two streams each of 5% and 10%, and one stream each of 20% and 40% of the system bandwidth. The total loading is therefore 100% of the system bandwidth. We define Transmission Relative to Expected Delivery (TRED) as the time difference between the actual transmission time and the expected delivery time. Figure 4.15 shows the distribution of TRED for FCFS, VCS, and GVCS. For clarity, only the highest- and lowest-rate streams are shown. We see that both VCS and GVCS guarantee that all cells are delivered before their expected delivery times. In FCFS, some cells from the high-rate stream are delivered later than their expected delivery times. Even so, the time lag from the expected delivery time is not significant. Note that the distributions of FCFS and GVCS are very similar, except that the portion of cells exceeding their due times in FCFS is squeezed before their expected delivery times in GVCS.
[Diagram: cell arrivals and cell output; per-VC queues (VC0, VC1, ..., VCk); cell table, activation counters, priority counters, M-bits, and minimum circuit of the Programmable Rate-based Scheduler]
Figure 4.13: Integration of scheduler and resequencer
[Diagram: CBR sources, artificial delayer, ATM multiplexer]
Figure 4.14: Simulation Model
Figures 4.16 to 4.18 compare the average delay, the delay variance, and the maximum delay recorded for the three disciplines. The artificial network delay is uniformly distributed from 1 to 400 cell times. FCFS gives a fair delay to all streams. VCS gives a larger delay to low-rate streams: the smaller the rate, the larger the delay. Thus, even though VCS guarantees the bandwidth of low-rate streams, the average delay and the delay variance it introduces can be unacceptable for some practical applications. GVCS increases the delay of high-rate streams very slightly, but significantly reduces the delay of low-rate streams, even compared with FCFS. This is because in FCFS, cells of high-rate streams may accumulate in the cell buffer and thus introduce a larger delay to cells of low-rate streams. In GVCS, cells of low-rate streams have an active priority as soon as they arrive, so they are transmitted earlier than the inactive cells of high-rate streams. Overall, GVCS gives the best delay performance to low-rate streams while guaranteeing the throughput and delivery time of high-rate streams.
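The delay argument above can be illustrated with a toy slotted simulation. The Python sketch below is a hypothetical reading of GVCS for illustration only, not the definition given earlier in this chapter: it assumes that a cell becomes active at max(arrival time, previous stamp of its stream), that is, once its stream is no longer ahead of its reserved rate; that active cells are served in Virtual Clock stamp order; and that inactive cells are served round-robin only when no active cell is waiting. The scenario, a clump of six high-rate cells followed by one low-rate cell, is also invented.

```python
from collections import deque


def simulate(arrivals, rates, policy):
    """Slotted single-link simulation; one cell is transmitted per slot.

    arrivals: (slot, stream) pairs sorted by slot.
    rates: reserved rate of each stream as a fraction of the link rate.
    policy: "fcfs", "vcs", or "gated" (the hypothetical active/inactive rule).
    Returns (stream, arrival_slot, tx_slot) tuples in transmission order.
    """
    vclock = {s: 0.0 for s in rates}
    queues = {s: deque() for s in rates}  # per-stream FIFO of waiting cells
    fifo = deque()                         # all waiting cells in arrival order
    rr = deque(rates)                      # round-robin pointer over streams
    out, i, t = [], 0, 0
    while i < len(arrivals) or fifo:
        while i < len(arrivals) and arrivals[i][0] <= t:
            arrival, s = arrivals[i]
            start = max(arrival, vclock[s])     # assumed activation time
            vclock[s] = start + 1.0 / rates[s]  # Virtual Clock stamp
            cell = (s, arrival, start, vclock[s])
            queues[s].append(cell)
            fifo.append(cell)
            i += 1
        heads = [q[0] for q in queues.values() if q]
        if heads:
            if policy == "fcfs":
                cell = fifo[0]
            elif policy == "vcs":
                cell = min(heads, key=lambda c: c[3])
            else:
                active = [c for c in heads if c[2] <= t]
                if active:
                    cell = min(active, key=lambda c: c[3])
                else:
                    # No active cell: serve inactive head cells round-robin.
                    while not queues[rr[0]]:
                        rr.rotate(-1)
                    cell = queues[rr[0]][0]
                    rr.rotate(-1)
            queues[cell[0]].popleft()
            fifo.remove(cell)
            out.append((cell[0], cell[1], t))
        t += 1
    return out


def delays(out, stream):
    """Queueing delay (tx slot minus arrival slot) of each cell of a stream."""
    return [tx - arrival for s, arrival, tx in out if s == stream]


arrivals = [(0, "high")] * 6 + [(1, "low")]
rates = {"high": 0.5, "low": 0.1}
for policy in ("fcfs", "vcs", "gated"):
    print(policy, delays(simulate(arrivals, rates, policy), "low"))
```

In this run the low-rate cell waits 5 slots under FCFS, 4 under VCS, and none under the gated rule, while each of the last five high-rate cells waits one slot longer than under FCFS, which is qualitatively the trade-off reported in Figures 4.16 to 4.18.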
Figures 4.19 to 4.21 illustrate the fire-wall effect of VCS and GVCS. We consider four streams with input loadings of 5%, 10%, 20%, and 60%, respectively. At a certain time, the 20% stream misbehaves and starts pumping cells at 60% of the link bandwidth. Cells accumulate inside the multiplexer, and the total multiplexing throughput reaches 100%. Even so, both VCS and GVCS maintain the throughput of the other streams; the misbehaving stream uses up only the remaining bandwidth, which is available due to the randomness of cell arrivals. In FCFS, all streams except the misbehaving one suffer a throughput drop, and the high-rate stream suffers most. The amount of the drop depends on how many cells have already been stored. Clearly, FCFS fails to guarantee the throughput of individual streams.
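The steady-state throughputs in this experiment can be estimated with a simple fluid calculation. The Python sketch below assumes that each stream's reservation equals its nominal input loading, that an isolating discipline (the idealized behaviour described above for VCS and GVCS) keeps every other stream at its load and gives the misbehaving stream only the unused capacity, and that FCFS under overload slows every stream by the same factor. It ignores the transient caused by cells already stored in the buffer. The function names are hypothetical.

```python
def fcfs_fluid(offered, capacity=1.0):
    """Fluid FCFS under overload: every stream is scaled by the same factor."""
    scale = min(1.0, capacity / sum(offered))
    return [load * scale for load in offered]


def isolated_fluid(offered, reserved, misbehaving, capacity=1.0):
    """Idealized rate isolation: each stream keeps min(offered, reserved), and
    only the leftover capacity goes to the misbehaving stream."""
    alloc = [min(o, r) for o, r in zip(offered, reserved)]
    leftover = capacity - sum(alloc)
    alloc[misbehaving] += min(offered[misbehaving] - alloc[misbehaving], leftover)
    return alloc


reserved = [0.05, 0.10, 0.20, 0.60]
offered = [0.05, 0.10, 0.60, 0.60]   # the 20% stream now sends at 60%
print([round(x, 3) for x in isolated_fluid(offered, reserved, misbehaving=2)])
print([round(x, 3) for x in fcfs_fluid(offered)])
```

Under these assumptions the misbehaving stream is held to about 25% of the link with isolation, whereas FCFS cuts the 60% stream to about 44%, the largest absolute drop among the well-behaved streams.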
Figures 4.22 to 4.24 show how the spare bandwidth is shared under the different schemes. Consider four streams with reserved rates of 1%, 10%, 20%, and 50% and actual loading rates of 1%, 50%, 50%, and 50%, respectively. The total reserved rate is 81%, so the remaining 19% of the bandwidth is shared by the extra cells generated by the 10% and 20% streams. Since VCS favours high-rate streams, the 20% stream gets a larger share than the 10% stream. In GVCS, the spare bandwidth is shared equally by the 10% and 20% streams, because extra cells are always inactive and inactive cells are serviced in a round-robin (RR) manner.
[Plot: distribution of transmission time relative to expected delivery time; curves labelled VCS 40% and GVCS 40%]
Figure 4.15: Delay Guarantee of VCS and GVCS
[Bar chart: mean delay of the 1%, 5%, 10%, 20%, and 40% streams under VCS, GVCS, and FCFS]
Figure 4.16: Mean Delay of VCS, GVCS and FCFS
[Bar chart: delay variance of the 1%, 5%, 10%, 20%, and 40% streams under VCS, GVCS, and FCFS]
Figure 4.17: Delay Variance of VCS, GVCS and FCFS
[Bar chart: maximum delay of the 1%, 5%, 10%, 20%, and 40% streams under VCS, GVCS, and FCFS]
Figure 4.18: Max Delay of VCS, GVCS and FCFS
[Plot: throughput vs. time (in 1000 slots)]
Figure 4.19: Throughput Profile of VCS under a Misbehaving Stream
[Plot: throughput vs. time (in 1000 slots)]
Figure 4.20: Throughput Profile of GVCS under a Misbehaving Stream
Note that in VCS and GVCS, the throughputs of the 1% and 50% streams are protected. In FCFS, the throughput of the 50% stream drops to the same level as that of the two misbehaving streams.
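The steady-state split of the spare bandwidth in this example can be estimated with a fluid water-filling calculation. In the Python sketch below, every stream first receives min(offered, reserved), and the spare capacity is then divided among streams with unmet demand in proportion to a weight. Equal weights model the round-robin service of inactive cells described for GVCS; weights equal to the reserved rates are only an assumed idealization of how VCS favours high-rate streams, not a result derived in this thesis. The function name share_spare is hypothetical.

```python
def share_spare(reserved, offered, weights, capacity=1.0, eps=1e-12):
    """Fluid model of spare-bandwidth sharing.

    Each stream first gets min(offered, reserved). The spare capacity is then
    water-filled among streams whose demand is still unmet, in proportion to
    their (positive) weights, until the spare runs out or all demand is met.
    """
    alloc = [min(o, r) for o, r in zip(offered, reserved)]
    spare = capacity - sum(alloc)
    hungry = {i for i, o in enumerate(offered) if o > alloc[i] + eps}
    while spare > eps and hungry:
        wsum = sum(weights[i] for i in hungry)
        # Grow all hungry streams together until one is satisfied or spare is gone.
        step = min(min((offered[i] - alloc[i]) / weights[i] for i in hungry),
                   spare / wsum)
        for i in hungry:
            alloc[i] += step * weights[i]
        spare -= step * wsum
        hungry = {i for i in hungry if offered[i] > alloc[i] + eps}
    return alloc


reserved = [0.01, 0.10, 0.20, 0.50]
offered = [0.01, 0.50, 0.50, 0.50]
print([round(x, 4) for x in share_spare(reserved, offered, [1, 1, 1, 1])])
print([round(x, 4) for x in share_spare(reserved, offered, reserved)])
```

With equal weights the 10% and 20% streams end up with about 19.5% and 29.5% of the link; with rate-proportional weights they get about 16.3% and 32.7%. In both cases the 1% and 50% streams keep their reservations.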
[Figure 4.21: throughput vs. time (in 1000 slots)]
[Plot: throughput vs. time (in 1000 slots); 1% stream labelled]
Figure 4.22: Sharing of Remaining Bandwidth under VCS
[Figure 4.23: throughput vs. time (in 1000 slots) for the 1%, 10%, and 20% streams]
[Plot: throughput vs. time (in 1000 slots) for the 1%, 10%, and 20% streams]
Figure 4.24: Sharing of Remaining Bandwidth under FCFS
Figure 4.25 shows the penalty problem of VCS in using the spare bandwidth. Consider four streams with reserved rates of 10%, 20%, 20%, and 50%, which initially share the full bandwidth. At time A, the two 20% streams have nothing to send, whereas the 10% and 50% streams increase their traffic to 20% and 75%, respectively. As the total loading is only 95%, these two streams can use the spare bandwidth of the 20% streams, but their expected delivery times accumulate as well. At time B1, the 20% streams resume transmission at a higher loading of 40% each. The throughputs of the 10% and 50% streams immediately drop to 0% and 20%, respectively, whereas the 20% streams can use 40% each of the system bandwidth. At time B2, the 20% streams have accumulated enough expected delivery time, so their throughput drops back to near 20%. At time C, all streams return to the original proportion of bandwidth sharing, even though the total load still exceeds the available bandwidth. Figure 4.26 shows the penalty problem of GVCS, which is slightly different. At time B, both the 50% and 10% streams have accumulated long expected delivery times, so their cells are essentially inactive. On the other hand, the active cells of the 20% streams make up only 40% of the system utilization. As a result, the spare 60% of the bandwidth is shared on a round-robin basis by all inactive cells in the system: the 50% stream and the 20% streams get equal shares of bandwidth, whereas the 10% stream gets its offered load of 20%. Finally, Figure 4.27 shows that by resetting the priority counters whenever the queue size is zero, we can eliminate the penalty problem in both VCS and GVCS: at time B, all streams immediately get their reserved bandwidth.
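The effect of the reset can already be seen in the stamping rule. The Python sketch below assumes that a stream's priority counter behaves like its Virtual Clock (stamp = max(now, counter) + 1/rate) and that resetting means moving the counter back to the current time when a cell arrives to an empty queue; the class and function names are hypothetical. A 50% stream is first served at full speed for ten slots using spare bandwidth and then sends one more cell at slot 10.

```python
class StreamClock:
    """Per-stream Virtual Clock counter with an optional reset on empty queue."""

    def __init__(self, rate, reset_when_idle):
        self.tick = 1.0 / rate          # stamp increment per cell
        self.counter = 0.0              # current virtual clock value
        self.queue_len = 0
        self.reset_when_idle = reset_when_idle

    def on_arrival(self, now):
        if self.reset_when_idle and self.queue_len == 0:
            self.counter = now          # forget the lead built up earlier
        self.counter = max(now, self.counter) + self.tick
        self.queue_len += 1
        return self.counter             # stamp of the arriving cell

    def on_departure(self):
        self.queue_len -= 1


def stamp_after_burst(reset_when_idle):
    """Stamp of a cell sent at slot 10 after ten slots of full-speed service."""
    clock = StreamClock(rate=0.5, reset_when_idle=reset_when_idle)
    for now in range(10):               # one cell per slot, served at once
        clock.on_arrival(now)
        clock.on_departure()
    return clock.on_arrival(10)


print(stamp_after_burst(False), stamp_after_burst(True))  # prints 22.0 12.0
```

Without the reset the new cell is stamped 22, so it must wait behind a competing 50% stream that starts at slot 10 and stamps its cells 12, 14, ..., 20; with the reset it is stamped 12 and competes on equal terms, which is the kind of penalty removal shown in Figure 4.27.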
4.8 Summary 
We reviewed the VCS discipline by using a Time-Priority Model (TPM). We found that the model is powerful in its own right: it can model many time-dependent priority schemes. Based on the insights from the model, we derived a Gated Virtual Clock Scheduling (GVCS) scheme, which gives the same throughput guarantee as VCS but a smaller delay to low-rate streams. In the next chapter, we consider the possible integration of the resequencing algorithms and the output port scheduling algorithms when resequencing must be done at the output port of an ATM switch.
[Plot, VCS without Reset: throughput vs. time (in 1000 slots) for the 10% stream and the two 20% streams; event times A, B1, B2, and C marked]
Figure 4.25: Penalty effect of VCS without reset
[Plot, GVCS without Reset: throughput vs. time (in 1000 slots) for the 10% stream and the two 20% streams; event times A, B, and C marked]
Figure 4.26: Penalty effect of GVCS without reset
[Figure 4.27: throughput vs. time (in 1000 slots) for the 10% stream and the two 20% streams with priority-counter reset; event times A and B marked]




Chapter 5 Conclusion
In this thesis, we have discussed the construction of a large scale switch fabric by interconnecting smaller modules. We have considered a way of switching, hierarchical switching, in which traffic is switched from stage to stage and is switched back whenever possible. The resulting switch interconnection architecture is incrementally scalable and stackable. We considered HBSI as the simplest form of hierarchical switching, and found that it gives good call-blocking, throughput, and cell-loss performance. We showed how HBSI can provide alternate routing paths when a link or module fails. The drawback of HBSI is that it uses 32 x 32 memory fabrics as 16 x 16 modules and doubles the number of interconnections. We argue that this is necessary to give good call-blocking performance, and that the increase in cost can be minimal, as I/O and call processing circuitry dominate the cost of a switching system.
Next, we studied a multichannel switching algorithm and the corresponding resequencing algorithm. Using these algorithms, we are able to use point-to-point switching modules to emulate a shared memory switch in building HBSI. Simulations show that with trunk-based output port assignment, the performance is almost the same as when shared memory switching modules are used.
In the final part of this thesis, we discussed ATM scheduling for QoS guarantees of ATM connections. We reviewed the VCS discipline by using a Time-Priority Model (TPM), and found that the model is powerful in its own right: it can model many time-dependent priority schemes. Based on the insights from the model, we derived a Gated Virtual Clock Scheduling (GVCS) scheme, which gives the same throughput guarantee as VCS but a smaller delay to low-rate streams. We discussed the implementation issues of TPM and also studied the possibility of integrating the resequencer and the ATM scheduler.
Finally, we would like to suggest the following directions for future research:
• While HBSI employs a regular banyan structure and arranges the links of internal modules into 4 link groups, the feasibility and properties of other interconnection patterns can be investigated further.
• GVCS provides guarantees to individual streams, and the remaining bandwidth is shared by all streams on a round-robin basis. It might be interesting to consider other disciplines for sharing the remaining bandwidth and to study their effect on the delay of the streams.
• To consider the complexity and propose detailed designs for the implementation of the PRS, the resequencer, and the channel allocation device into




[1] Andrew S. Tanenbaum. "Computer Networks". Prentice-Hall International, Inc., 1996.
[2] Y.S. Yeh, K.Y. Eng, and M.J. Karol. "A growable packet (ATM) switch architecture: design principles and applications". In Proceedings of IEEE GLOBECOM '89, Dallas, Texas, pages 1159-1165, Nov 1989.
[3] T.T. Lee. "A modular architecture for very large packet switches". IEEE J. Select. Areas on Commun., 38:1097-1106, July 1990.
[4] B.S. Choe and H.J. Chao. "Fault tolerance of a large-scale multicast output buffered ATM switch". In Proceedings of IEEE INFOCOM '94, San Francisco, CA, pages 1456-1464, Jun 1994.
[5] S.C. Liew and K. Lu. "A 3-stage interconnection structure for very large packet switches". In Proceedings of IEEE ICC '90, pages 316.7.1-316.7.7, 1990.
[6] P.C. Wong and E.H. Timg. "A Large Scale Packet Switch Interconnection Architecture using Overflow Switches". In Proceedings of IEEE ICC '93, Geneva, Switzerland, pages 708-714, May 1993.
[7] N.D. Georganas, A.K. Gupta, and L.O. Barbosa. "Limited intermediate buffer switch modules and their interconnection networks for BISDN". In Proceedings of IEEE GLOBECOM, 1994.
[8] K. Sezaki and Y. Yasuda. "A general architecture of ATM switching networks which are non-blocking at call level". In Proceedings of IEEE TENCON '92, Melbourne, Australia, pages 603-607, Nov 1992.
[9] A. Choudhury and E.L. Hahne. "Buffer Management in a Hierarchical Shared Memory Switch". In Proceedings of IEEE INFOCOM '94, San Francisco, CA, pages 1410-1419, June 1994.
[10] T.H. Lee and S.J. Liu. "Performance Analysis of a Large Scale ATM Switch with Input and Output Buffers". In Proceedings of IEEE INFOCOM '94, San Francisco, CA, pages 1465-1471, Jun 1994.
[11] K.Y. Cheung and P.C. Wong. "BMSN and SpiderNet as Large Scale ATM Switch Interconnection Architectures". In Proceedings of IEEE ICC '97, 1997.
[12] Krishnan Padmanabhan. "An Efficient Architecture for Fault-Tolerant ATM Switches". IEEE/ACM Trans. on Networking, 3(5):527-537, October 1995.
[13] K.L. Eddie Law and Alberto Leon-Garcia. "A Growable Large Scale ATM Multicast Switch". In Proceedings of IEEE ICC '96, 1996.
[14] Andrzej Jajszczyk and Wojciech Kabaciński. "A Growable ATM Switching Fabric Architecture". IEEE J. Select. Areas on Commun., 43(2/3/4):1155-1162, February/March/April 1995.
[15] P.S. Min, H. Saidi, and M.V. Hegde. "A Nonblocking Architecture for Broadband Multichannel Switching". IEEE/ACM Trans. on Networking, 3(2):181-198, April 1995.
[16] R.A. Spanke, K.Y. Eng, M.A. Pashan, M.J. Karol, and G.D. Martin. "A High Performance Prototype 2.5 Gb/s ATM Switch for Broadband Applications". In Proceedings of IEEE GLOBECOM '92, Orlando, Florida, pages 111-117, Dec 1992.
[17] C. Clos. "A Study of Non-Blocking Switching Networks". Bell Syst. Tech. J., 32:406-424, March 1953.
[18] Joseph Y. Hui. "Switching and Traffic Theory for Integrated Broadband Networks". Kluwer Academic Publishers, 1990.
[19] Youn Chan Jung and Chong-Kwan Un. "Banyan Multipath Self-Routing ATM Switches with Shared Buffer Type Switch Elements". IEEE J. Select. Areas on Commun., 43(11):2847-2857, November 1995.
[20] Joan Garcia-Haro and Andrzej Jajszczyk. "ATM Shared-Memory Switching Architectures". IEEE Network Magazine, pages 18-26, July/August 1994.
[21] Cheng Tee Hiang. "A Multichannel ATM Switch with Output Buffering". In Proceedings of IEEE Singapore International Conference on Networks/International Conference on Information Engineering '93, pages 364-368, 1993.
[22] Jacques H. Dejean, Lars Dittmann, and Claus N. Lorenzen. "String Mode - A New Concept for Performance Improvement of ATM Networks". IEEE J. Select. Areas on Commun., 9(9):1452-1460, December 1991.
[23] Arthur Y-M. Lin and John A. Silvester. "Priority Queueing Strategies for Traffic Control at a Multichannel ATM Switching System". In Proceedings of IEEE GLOBECOM '91, pages 234-238, 1991.
[24] T.H. Cheng and D.G. Smith. "Queueing Analysis of a Multichannel ATM Switch with Input Buffering". In Proceedings of IEEE ICC '91, pages 1028-1032, 1991.
[25] ATM Forum. Traffic Management Specification version 4.0, April 1996.
[26] K.E. Batcher. "Sorting Networks and Their Applications". In Proceedings of 1968 Spring Joint Computer Conf., 1968.
[27] Hyong S. Kim. "Multichannel ATM Switch with Preserved Packet Sequence". In Proceedings of IEEE ICC '92, pages 1634-1638, 1992.
[28] Toshiya ARAMAKI, Hiroshi SUZUKI, Shinichiro HAYANO, and Takao TAKEUCHI. "Parallel 'ATOM' Switch Architecture for High-Speed ATM Networks". In Proceedings of IEEE ICC '92, pages 250-254, 1992.
[29] G.G. Xie and S.S. Lam. "Delay Guarantee of Virtual Clock Server". IEEE/ACM Trans. on Networking, 3(6):671-682, Dec 1995.
[30] H. Zhang and S. Keshav. "Comparison of rate-based service disciplines". In Proceedings of ACM SIGCOMM '91, pages 113-121, 1991.
[31] Pawan Goyal and Harrick M. Vin. "Generalized Guaranteed Rate Scheduling Algorithms: A Framework". Technical Report TR-95-30, Department of Computer Sciences, University of Texas at Austin, 1995.
[32] L. Zhang. "Virtual Clock: a new traffic control algorithm for packet switching networks". In Proceedings of ACM SIGCOMM '90, pages 19-29, August 1990.
[33] A.K. Parekh and R.G. Gallager. "A generalized processor sharing approach to flow control in integrated services networks: the single node case". IEEE/ACM Trans. on Networking, 1(3):344-357, Jun 1993.
[34] N.R. Figueira and J. Pasquale. "An Upper Bound on Delay for the Virtual Clock Service Discipline". IEEE/ACM Trans. on Networking, 3(4):399-408, August 1995.
[35] Sanjay Gupta, Keith W. Ross, and Magda El Zarki. "Routing in Virtual Path Based ATM Networks". In Proceedings of IEEE GLOBECOM '92, pages 571-575, 1992.
[36] Ren-Hung Hwang, James F. Kurose, and Don Towsley. "MDP Routing in ATM Networks Using Virtual Path Concept". In Proceedings of IEEE INFOCOM '94, pages 1509-1517, 1994.
[37] Martino De Marco and Achille Pattavina. "Distributed routing protocols for ATM extended banyan networks". In Proceedings of IEEE INFOCOM '94, pages 1428-1437, 1994.
[38] Ren-Hung Hwang. "LLR Routing in Homogeneous VP-based ATM Networks". In Proceedings of IEEE INFOCOM '95, pages 587-593, 1995.
[39] Marcia Lynn Whicker and Lee Sigelman. "Computer Simulation Applications: An Introduction". SAGE Publications, Inc., 1991.
[40] Sara Baase. "Computer Algorithms". Addison-Wesley, 1988.