Performance analysis, design and reliability of the Balanced Gamma network by El Sayed, Yaser
YASER EL SAYED 



Performance Analysis, Design and
Reliability of the Balanced Gamma
Network
A thesis submitted to the School of Graduate
Studies in conformity with the requirements for the
Degree of Doctor of Philosophy
Faculty of Engineering and Applied Science
Memorial University of Newfoundland
December 1999
St. John's Newfoundland Canada
Abstract 
Switching is one of the bottlenecks restraining the efforts of researchers toward
implementing broadband communication systems. In this dissertation, we provide
a comprehensive study of a promising switching architecture called the Balanced
Gamma (BG) network. The BG network has shown good performance in terms
of throughput, average cell delay, and reliability, and has displayed potential for
application in broadband communications switch fabrics.
Designing highly reliable systems is a crucial requirement in the industry of
broadband communications, where the consequences of system failures are very
expensive. Accordingly, we provide an exact model for network reliability of the
BG network. The model demonstrates that the network is highly reliable and can
be confidently deployed in communication systems.
The performance of the network is further investigated under different payloads
containing uniform and non-uniform traffic. Uniform random and bursty are the
traffic types used. Several simulation experiments are carried out to measure the cell
loss, cell average delay, and buffering requirements of the BG network. In addition,
we pursue an analytical model under uniform random traffic to verify our simulation
results. The performance of the network is compared with both an ideal nonblocking
network and the crossbar network. It is determined that the network has much better
behavior than the crossbar switch and operates very closely to the ideal architecture
under most types of offered traffic loads.
Finally, we introduce a VLSI design for the BG network using 0.35 µm CMOS
technology supported by the Canadian Microelectronics Corporation. The design has
mainly three components: the switching element, the output port, and the network
main controller. The design features built-in self-test (BIST), which has become an
essential part of any digital system. We also parametrize the design such that
the amount of effort needed to generate a fabric with arbitrary size is minimal. We
describe the design in the Very High Speed Integrated Circuit Hardware Description
Language (VHDL).
Acknowledgments 
First of all, I am thankful to The Almighty God, Who in His infinite mercy has
helped me to bring this work to light.
I also owe a lot to my family, especially my mother and father, for their continuous
support and great sacrifices that were the major factors in making this work a reality.
I would like to express my sincere thanks to my supervisor Dr. Venkatesan for his
supervision, support, and cooperation in the course of this work. I also do not forget
the support and help of my supervisory committee, Dr. Howard Heys and Dr. Paul
Gillard.
I would like to thank the School of Graduate Studies at the Memorial University of
Newfoundland for the financial support it provided during my Ph.D. program.
I will always remember the important discussions I had with Dr. Rod Byrne in
the Department of Computer Science. These discussions have greatly enabled me in
the design phase of this work. Also, Mr. Michael Rendell in the Department
of Computer Science deserves special thanks for his true cooperation in solving the
problems related to the VLSI CAD tools.
Finally, I would like to thank all my friends and colleagues that sincerely helped during
my Ph.D. program. I specially thank B. B h b r a m a n i m ,  A. Khan, P. Mchmrra, 
K. Momin, F. Paver, R Thuppal. V. Vu-, and A. M. Zeiner. 
Contents 
Abstract
Table of Contents
List of Tables
Notation and List of Abbreviations
1 Introduction
1.1 Background
1.2 ATM Switching and IP Switching
1.3 Motivation
1.4 Thesis Organization
2 Fast Packet Switching Networks
2.1 Introduction
2.2 Classifications of Packet Switching Networks
2.3 Time Division Switch Fabrics
2.3.1 Shared Medium Architectures
2.3.2 Shared Memory Architectures
2.4 Space Division Switch Fabrics
2.4.1 Single Stage Architectures
2.4.2 Multistage Interconnection Networks
2.4.2.1 Single Path MINs
2.4.2.2 Multipath MINs
2.5 Summary
3 Balanced Gamma Network
3.1 Introduction
3.2 Historical Background
3.2.1 Topology
3.2.2 Routing Algorithm
3.3 Balanced Gamma New Structure
3.4 Summary
4 Performance Under Uniform and Non-Uniform Traffic
4.1 Introduction
4.2 Buffering Strategies
4.2.1 Input/Output Buffering
4.2.2 Internal Buffering
4.3 Uniform Random Traffic
4.3.1 Analytical Modelling
4.3.2 Finite Output Buffers
4.4 Bursty Traffic
4.5 Non-Uniform Traffic
4.5.1 L = 1
4.5.2 L > 1
4.6 Summary
5 Design of the Balanced Gamma Network
5.1 Introduction
5.2 Design Flow, Functional Test and Verification
5.3 Design for Testability
5.3.1 BIST Methods
5.3.2 Structural Off-Line Architectures and Stimulus Structures
5.4 Chip Architecture
5.5 System Components
5.5.1 Switching Element
5.5.2 Output Port Controller
5.5.3 Network Main Controller
5.6 Simulation and Test Results
5.7 Summary
6 Fault Tolerance and Reliability Properties
6.1 Introduction
6.2 Background
6.3 Fault Tolerance Properties of the BG Network
6.4 Reliability Analysis
6.4.1 Terminal Reliability
6.4.2 Broadcast Reliability
6.4.3 Network Reliability
6.4.4 Mean Time to Failure
6.5 Summary
7 Conclusion and Future Work
7.1 Future Work
References
A Balanced Gamma Network Topology
B Balanced Gamma Network Routing Algorithm
C Throughput Under Uniform Random Traffic
D Calculation of I N 8 and LNF,
List of Figures 
1.1 A block diagram of a switch fabric
2.1 Classification of interconnection networks [1]
2.2 A basic bus-based switch fabric
2.3 A basic ring-based switch fabric
2.4 Memory capacity of AHQs versus shared buffer [2]
2.5 COM16M architecture [3]
2.6 A crossbar switch fabric
2.7 Crossbar SE states
2.8 Knockout switch architecture
2.9 Different banyan network topologies (a) banyan (b) omega (c) baseline
2.10 Architecture of the Clos network
2.11 Architecture of an 8 x 8 Kappa network [4]
2.12 Architecture of an 8 x 8 BB network [5]
2.13 The architecture of an 8 x 8 Benes network
2.14 A switching element in a 2-dilated banyan network
2.15 The tandem architecture
2.16 Timing sequence of the different pipeline phases
2.17 The pipeline architecture
3.1 Initial BG structure
3.2 New BG structure
3.3 A SE routing decision example
4.1 Input-output buffer strategies (a) pure-input (b) pure-output (c) input-output buffering
4.2 Behavior of D,. and Dm,
4.3 Different internal buffering styles (a) input (b) output (c) crosspoint (d) shared buffering
4.4 Performance of 64 x 64 Banyan with internal shared buffering strategy [6]
4.5 Discrete Markov chain of the input buffer status (a) at the beginning of every T, and (b) after every f
4.6 Average cell delay under URT for different network types of N = 256
4.7 Maximum cell delay under URT for different network types of N = 256
4.8 Probability density for number of input requests for a single output port
4.9 Average cell delay for different average burst length (a) L = 5, (b) L = 10, (c) L = 15, (d) L = 20
4.10 Maximum cell delay for different average burst length (a) L = 5, (b) L = 10, (c) L = 15, (d) L = 20
4.11 Input buffer requirements for different average burst length (a) L = 5, (b) L = 10, (c) L = 15, (d) L = 20
4.12 Output buffer requirements for different average burst length (a) L = 5, (b) L = 10, (c) L = 15, (d) L = 20
4.13 Required number of planes to attain a 10-' cell loss ratio with different average burst lengths
4.14 Average cell delay for different average burst length (a) L = 5, (b) L = 10, (c) L = 15, (d) L = 20
4.15 Maximum cell delay for different average burst length (a) L = 5, (b) L = 10, (c) L = 15, (d) L = 20
4.16 Input buffer requirements for different average burst length (a) L = 5, (b) L = 10, (c) L = 15, (d) L = 20
4.17 Output buffer requirements for different average burst length (a) L = 5, (b) L = 10, (c) L = 15, (d) L = 20
4.18 OPCs selection probability for different IPCs according to the model in [7]
4.19 Performance parameters of different configurations under different loads of non-uniform traffic
5.1 Design flow recommended by CMC
5.2 BIST methods [8]
5.3 Different structural BIST architectures [9]
5.4 Top level description of the BG network
5.5 Structure of the internal cell header
5.6 Broadcast requests at inputs 0, 1, 3, and 6 are fulfilled
5.7 Architecture of a 4 x 4 SE
5.8 Block diagram of the input buffer bank of an SE at Stage,
5.9 ASM chart of the SE sequencer
5.10 Block diagram of the testing unit during test period
5.11 Architecture of the OPC
5.12 ASM chart of the OPC sequencer
5.13 Block diagram of the buffer controller
5.14 The ASM chart of the network main controller
5.15 Simulation results for a 4 x 4 SE
5.16 Simulation results for an OPC
6.1 All possible SEs that can be visited by a cell arriving at input port i
6.2 TR best case R-graph for 16 x 16 BG network
6.3 TR worst case R-graph for 16 x 16 BG network
6.4 TR worst case R-graph for N x N BG network
6.5 BR R-graph for an 8 x 8 BG network
6.6 (NL,, t) performance of the BG network
D.1 The model of the critical pairs for an intermediate stage
D.2 Expansion of the problem into two queues
D.3 Dividing SEs in the last stage into two groups
List of Tables 
2.1 Control plane speed reduction ratio [10]
4.1 Analytical and simulation results for the BG network with zero input buffering
4.2 Effect of input buffer size on the TP,, of a single data plane BG network
4.3 Cell loss under URT for different network types of N = 256
4.4 Input/output buffering requirements under URT for different network types of N = 256
4.5 Cell loss ratio under bursty loads of various burst lengths
4.6 Number of planes needed to compare the multiple plane configurations of the architectures under test
4.7 Cell loss ratio for single plane architectures under non-uniform traffic load
4.8 Input buffer requirements for BG-1 configuration under uniform and non-uniform bursty loads
4.9 Output buffer requirements for BG-1 configuration under uniform and non-uniform bursty loads
5.1 Stimulus design approaches [9]
5.2 Defined cell types
5.3 Distribution of the hardware complexity of the SEs in a 16 x 16 (in gates)
5.4 Delays of the SEs critical paths
5.5 Distribution of hardware complexity for OPC (in gates)
5.6 Illustration of cell arrival in Figure 5.15
5.7 Illustration of cell arrival in Figure 5.16
6.1 SEs' complexities (in gates) for different sizes of the BG network
6.2 Failures/10^6 hours (Ar) for the components of 128 x 128 BG network due to the first part of the model
6.3 Estimated failures/10^6 hours (Arr) due to the second part of the model
6.4 Behavior of the TR for various sizes of the BG network
6.5 Behavior of the BR for various sizes of the BG network
6.6 Behavior of the BR for various sizes of the BG network by excluding the &K term
6.7 Behavior of the NR for various sizes of the BG network by excluding both the FSR and OSR terms
6.8 Estimated BG network failures/10^6 hours
Notation and List of Abbreviations 
: number of stages in a MIN
: output buffer size
BR : broadcast reliability
: delay in pure-input buffering strategy
: delay in pure-output buffering strategy
: input link i of an SE
: number of data planes in a switching network
L : average burst length
N : number of inputs/outputs in a rectangular switching network
NR : network reliability
OLi : output link i of an SE
: number of service priorities
TP : throughput
TR : terminal reliability
: switching parameter for the old routing decision of the BG network
λ : failure rate
AHQ : Address Holding Queues
ATE : Automatic Test Equipment
ATM : Asynchronous Transfer Mode
BB : Batcher Banyan
BG : Balanced Gamma
B-ISDN : Broadband Integrated Services Digital Network
BIST : Built-In Self-Test
BP : Backpressure
CAC : Connection Admission Control
CMC : Canadian Microelectronics Corporation
CMOS : Complementary Metal Oxide Semiconductor
CP : Control Part
CUT : Circuit Under Test
DPO : Delayed Pushout
DSM :
ECL :
EFCI : Explicit Forward Congestion Indication
EXCON : Expand-Concentrate
FIFO : First-In First-Out
GFC : Generic Flow Control
GN : Gamma Network
HOL : Head Of Line
IETF : Internet Engineering Task Force
IP : Internet Protocol
IPng : Internet Protocol next generation
IPv4 : Internet Protocol version 4
IPv6 : Internet Protocol version 6
ITU-T : International Telecommunication Union - Telecommunication
Standardization Sector
KN : Kappa Network
LAN : Local Area Network
LANE : LAN Emulation
LFSR : Linear Feedback Shift Register
MIN : Multistage Interconnection Network
MISR : Multiple Input Shift Register
MPOA : MultiProtocol Over ATM
MTTF : Mean Time To Failure
NC : No Control
NMC : Network Main Controller
ORA : Output Response Analyzer
PO : Pushout
QoS : Quality of Service
RBP : Restricted Backpressure
SE : Switching Element
SIN : Single-stage Interconnection Network
SRN : Semi-Rearrangeably Nonblocking
STD : Synchronous Time Division
RI : Ring Interface
RTL : Register Transfer Level
UPC : Usage Parameter Control
URT : Uniform Random Traffic
VCI : Virtual Channel Identifier
VHDL : Very high speed integrated circuit Hardware Description
Language
VPI : Virtual Path Identifier
Chapter 1 
Introduction 
1.1 Background 
Currently, the telecommunications industry is split between two protocols, namely
the asynchronous transfer mode (ATM) and the internet protocol (IP). ATM has
been identified by the International Telecommunication Union - Telecommunication
Standardization Sector (ITU-T) as a comprehensive switching technique that is
capable of meeting the requirements of BISDN, such as high throughput, low switching
delay, low packet loss probability, expandability, testability, fault tolerance, low cost,
and ability to achieve broadcasting as well as multicasting. BISDN (broadband
integrated services digital network) is the extension of the ISDN (previously defined for
telephone and data transmission services). BISDN will be capable of providing all
services (audio, video, and data transfer applications) in a unified fashion to various
places at varying speeds. The advantages are ease of installation and maintenance,
better user access and economical service, and flexibility in the introduction and
evolution of services. ATM is also expected to provide communication services with
negotiable quality of service (QoS) levels. IP was invented to be the network layer
(layer 3 in the OSI model) protocol running on the top of Ethernet and Token Ring
local area networks (LANs). The original versions of IP, such as IP version 4 (IPv4),
are unreliable. Messages are dropped without [...] and without [...] the
[...] applications, i.e., it does not support QoS. A new IP standard has been
proposed to support QoS and to solve the [...] problem of address exhaustion.
The new standard is commonly known as IP next generation (IPng) or IPv6 [11].
Both the ATM and IP are [...] protocols; however, there are main
differences between them:
• ATM is a connection-oriented protocol, i.e., a path should be established for
each connection; IP is a connectionless protocol.
• IP was designed for broadcast media and many of the features of the IP rely on
this broadcast nature. ATM is fundamentally a point-to-point protocol, and has
to simulate a broadcast by using point-to-multipoint or multiple point-to-point
connections.
as end-systems, network controllers, and switch fabrics. With the diverse types of
services expected in the ATM networks, the QoS requirements differ from one service
to another. Some services are sensitive to delays, such as real time applications, and
some other services are sensitive to cell loss, such as data transfer. Also, the sensitivity
level differs from one end-user to another. The maximum cell rate, maximum cell
delay, average cell delay, and cell loss probability are parameters used to characterize
the QoS. If an end-system is requesting a certain level of QoS that the network
resources cannot furnish, then the network blocks this request. In some cases the
network rejects a request of an end-system even if there are sufficient resources for
that request [16]. This takes place when the request violates the initial QoS contract
between the network and the end-system.
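The admission decision described above can be illustrated with a minimal sketch.
It assumes, purely for illustration, that a request is characterized by a peak cell
rate and a maximum tolerable delay, and that the network tracks a single figure of
reserved capacity. The names QosRequest, Link, and admit are hypothetical; they do
not come from the thesis or from any ATM standard.

```python
from dataclasses import dataclass


@dataclass
class QosRequest:
    peak_cell_rate: float   # cells per second requested (assumed descriptor)
    max_cell_delay: float   # seconds the end-system can tolerate


@dataclass
class Link:
    capacity: float         # cells per second the link can carry
    reserved: float         # cells per second already promised to others
    min_delay: float        # best delay the link can guarantee, in seconds


def admit(link: Link, req: QosRequest) -> bool:
    """Return True and reserve capacity if the link can honour the request."""
    spare = link.capacity - link.reserved
    if req.peak_cell_rate > spare:
        return False        # not enough bandwidth left: block the request
    if req.max_cell_delay < link.min_delay:
        return False        # delay bound is tighter than the link can offer
    link.reserved += req.peak_cell_rate
    return True
```

A blocked request leaves the reserved capacity unchanged, so later requests with
smaller demands can still be admitted.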
Congestion and flow control are among the most complex issues with the ATM
technology. The reasons for this complexity are very evident [17]. Firstly, the diverse
service classes that ATM promises to handle with acceptable QoS levels. Secondly,
the ATM technology will appear in different network domains - LAN, MAN, and
WAN - that have large differences in the installed equipment such as speed and buffering.
Thirdly, although the ATM switches can protect themselves from congestion by
discarding cells, this may result in poorer overall throughput of the network, especially
for applications which employ encryption/compression, where losing one cell can
lead to losing the whole message. There are nine mechanisms used by current
ATM switches to handle congestion and flow control. These mechanisms are [17]:
1. Connection Admission Control (CAC);
2. Usage Parameter Control (UPC);
3. Selective Cell Discarding;
4. Traffic Shaping;
5. Explicit Forward Congestion Indication (EFCI);
6. Resource Management Using Virtual Paths;
7. Frame Discard;
8. Generic Flow Control (GFC);
9. ABR Flow Control.
The above mechanisms reduce congestion in the network by either impeding the traffic
that enters the network until the resources become available, or dropping low priority
traffic and allowing the higher priority traffic instead. The performance of the above
mechanisms is greatly affected by the availability of the network resources.
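The idea of dropping low priority traffic in favour of higher priority traffic can be
sketched as follows. This is a minimal illustration under stated assumptions, not
the mechanism of any particular ATM switch: it assumes two priority levels (1 for
high, 0 for low) and a single finite queue, and the class name PriorityDropBuffer
is hypothetical.

```python
from collections import deque


class PriorityDropBuffer:
    """Finite FIFO that sacrifices low-priority cells when it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cells = deque()          # entries are (priority, payload)
        self.dropped = 0              # cells lost so far

    def enqueue(self, priority, payload):
        """Queue a cell; priority 1 is high, 0 is low. Return True if kept."""
        if len(self.cells) < self.capacity:
            self.cells.append((priority, payload))
            return True
        if priority == 1:
            # Buffer full: look for the most recently queued low-priority cell.
            for i in range(len(self.cells) - 1, -1, -1):
                if self.cells[i][0] == 0:
                    del self.cells[i]             # discard the low-priority cell
                    self.cells.append((priority, payload))
                    self.dropped += 1
                    return True
        self.dropped += 1             # nothing to push out: drop the arrival
        return False

    def dequeue(self):
        """Serve the cell at the head of the queue, or None if empty."""
        return self.cells.popleft() if self.cells else None
```

With a small buffer the loss falls on low-priority cells first; a high-priority
arrival is lost only when the buffer holds nothing but high-priority cells.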
The IP protocol is a routed protocol which can easily scale up. However, routing
is not as efficient as the switching process. Accordingly, the current trend is towards
integrating the IP routing protocol in the ATM switching networks. There are
classical methods which are proposed by the Internet Engineering Task Force (IETF)
and label-based IP switching methods proposed by the industry [11]. The classical
methods were criticized by the industry because they operated IP on
the top of ATM without taking any advantage of the ATM features. This leads to
replication of functions and complication of the network management. Examples
of the classical methods are the LAN emulation (LANE) and the multiprotocol over
ATM (MPOA). In the label-based IP switching methods, companies such as Cisco
and Ipsilon are integrating the IP layer with the ATM switching layer. Examples of
the label-based IP switching methods are Tag switching and IP switching.
Recently, the new Gigabit switched Ethernet networks are offering a good
alternative by aggressively increasing the bandwidth to the gigabit range. Such a
development might lead to the old protocol stack using IPv6, which features QoS, running
on the top of gigabit switched Ethernet network at the LAN level. Such high speeds
with the next generation IP appear to be an easy migration from the legacy LAN
systems.
From the above discussion, it appears that the industry is trending towards switching
rather than routing. Switch fabrics are the core of the switching process. Switch
fabrics are placed in switching nodes scattered all over the network. The operation
of any switching mechanism is directly (or indirectly) influenced by the performance
of the switch fabrics. Less efficient switch fabrics result in poor performance of the
whole network due to delays and retransmission of the lost traffic. Efficient switch
fabrics are characterized by their high throughput and low delay.
Figure 1.1 depicts a block diagram of a switch fabric. A switch fabric should
feature some or all of the following functions:
• Cell buffering;
• Traffic concentration and multiplexing;
• Fault tolerance;
• Multicasting and broadcasting;
• Cell scheduling based on delay priorities;
• Selective cell discarding based on loss priorities;
• Congestion monitoring.
Indeed, building up a switch fabric which features all of the above functions is a
challenging objective. Cell buffering is an essential function of the switch fabric,
especially in case of congestion. Traffic concentration and multiplexing is a natural
role of a switch fabric. A switch fabric concentrates and multiplexes the arriving
traffic onto the output port controllers (OPCs). Fault tolerance has become a very
important feature in today's communications equipment, where the down time of the
network is very expensive. A fault tolerance model is usually associated with the
testability level of a system. Because the BISDN network will provide services such
as video on-demand and video conferencing, multicasting and broadcasting are needed
to support these services. Scheduling is necessary for the network to efficiently
manipulate the different traffic types that arise from the various service classes. Cell
discarding is essential in case of congestion. Switch fabrics drop cells that have lower
priorities. For the success of the congestion and flow control mechanisms we discussed
earlier, a switch fabric is required to monitor the amount of congestion it sees and
pass this to the network control plane. Hence, the congestion and flow control
mechanisms take the proper decisions.
Figure 1.1: A block diagram of a switch fabric.
1.3 Motivation 
In this dissertation, we provide a comprehensive study of a promising switching
architecture called the Balanced Gamma (BG) network. With infinite buffering resources,
the BG network has shown good performance in terms of throughput, average cell
delay, and reliability. To continue studying the BG network, we further investigate
the performance of the network with realistic buffering resources. Realistic buffer
resources are comprised of finite queues. We investigate the performance of the
network under different payloads containing uniform and non-uniform traffic. Uniform
random and bursty are the traffic types used. Several simulation experiments are
carried out to measure the cell loss, cell average delay, and buffering requirements of
the BG network. In addition, we pursued an analytical model under uniform random
traffic to verify our simulation results. The performance of the network is compared
with both an ideal network and the crossbar network. The performance results of
the ideal network are used as an upper bound and it is shown how the BG network
relates to that bound. The selection of the crossbar network was essential to prove
the efficient performance of the BG network. The crossbar network is recognized
by many researchers as a suitable candidate for ATM switching because of its good
internal blocking characteristics.
Reliability is a measure of the system's ability to operate without failures
during a specified period of time. It also measures the system's ability to tolerate
faults. Previously, the reliability models of the BG network were established and
showed that the reliability of the BG network outperforms other competitive networks.
However, the network reliability model for the BG network was incomplete due to its
complicated nature. In this dissertation, we introduce a complete network reliability
model of the BG network.
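As a hedged illustration of reliability over a period of time and of the mean time
to failure (MTTF), the following uses the textbook constant-failure-rate model. This
is an assumption made here for illustration only and is not the network reliability
model developed in this thesis. The symbol R(t) and the numeric value of λ are
hypothetical; λ follows the failure-rate notation of the abbreviation list.

```latex
% Assumption: a single component with constant failure rate \lambda.
R(t) = e^{-\lambda t}, \qquad
\mathrm{MTTF} = \int_{0}^{\infty} R(t)\,dt = \frac{1}{\lambda}.
% Hypothetical example: \lambda = 2 failures per 10^{6} hours gives
% \mathrm{MTTF} = 10^{6}/2 = 5 \times 10^{5} hours.
```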
To emphasize the modularity and scalability of the BG network we decided to
carry out a VLSI design for the network. We also wanted to prove that the BG
network could be efficiently realized using the currently available VLSI technologies.
0.35 µm CMOS was the technology available to us during the design phase.
1.4 Thesis Organization 
The thesis is divided into four parts. In the first part, which can be found in Chapter
2, we provide a survey of the switching architectures meant for broadband
communications. The survey sheds light on the different classifications proposed by the
researchers to classify the broadband architectures, emphasizing the differences and
the similarities amongst them. In the survey, through examples we describe the
advantages and the disadvantages of each class. We also discuss the recent techniques
used to improve the performance of the existing architectures.
In the second part, which can be found in Chapter 3, we briefly introduce the BG
network and the previous efforts made to improve and simplify the network routing
algorithm. We also emphasize our contribution in simplifying the routing algorithm
and the necessary changes that had to be made to the network topology.
In the third part, which can be found in Chapter 4, we investigate the performance
of the network with finite buffering resources in both the IPCs and the OPCs. We
use both uniform and non-uniform traffic loads. The load types we use are the
uniform random traffic (URT) and bursty. The parameters used to measure the
performance are the cell loss probability, maximum cell delay, average cell delay,
input buffer requirements, and output buffer requirements. As we mentioned earlier,
our investigation of the BG network will be compared with the performance of both
the crossbar and ideal nonblocking networks.
In the fourth part, which is composed of both Chapters 5 and 6, we introduce
the VLSI design and the exact network reliability modelling for the BG network.
The VLSI design is carried out using CAD tools supported by the Canadian
Microelectronics Corporation (CMC). We also use the 0.35 µm CMOS technology,
also supported by CMC, to achieve a high speed design. Very high speed integrated
circuit Hardware Description Language (VHDL) is used to describe the design at
the Register Transfer Level (RTL). We finally conclude our work in Chapter 7 and
give some directions for future open problems in the area.
Chapter 2 
Fast Packet Switching Networks 
2.1 Introduction 
Previously, the interest in fast packet switching networks was due to their use in fast
parallel computing machines. Recently, more attention has been focused on these
networks because of the evolving demands of broadband communications. Many
classifications of these networks have been reported in the literature. In this chapter
we shed some light on these classifications, commenting on the similarities and
differences amongst them. Our coverage will include the state of the art of the
architectures proposed for different subclasses, showing the advantages and limitations of
each. Since there is a great number of studies available in the literature that cannot
be covered in this chapter, we try only to focus on those architectures related to the
scope of this dissertation.
2.2 Classifications of Packet Switching Networks 
In the literature, many classifications of fast packet switching networks, also called
switch fabrics, have been reported [1, 18, 19, 20, 21, 22]. Perhaps the classification
model introduced in [1] is the simplest and the most comprehensive, because it has
covered most of the well-known architectures. Figure 2.1 depicts the hierarchy of this
classification. Similar, but less detailed, classifications are reported in [18] and [19].
Kyas [1] divides switching networks into two main classes: networks that use the time
domain for switching, and networks that use the space domain for switching. In time
division architectures, the physical resource is multiplexed among the input-output
connections, based on discrete time slots. This physical resource can be a shared
memory or a shared medium. Ring and bus topologies are typical examples for the
shared medium case. In the case of space division architectures, the connections are
based on the availability of nonconflicting physical paths within the fabric. Figure
2.1 shows that the family of space division architectures has more subclasses than
the time division architectures. We will discuss each one of these subclasses in more
detail, stressing the space division class, because the proposed switch fabric in this
dissertation belongs to that subclass.
Turner and Yamanaka [20] grouped the architectures, in general, into three major
categories: 1) single stage systems, 2) buffered multistage systems, and 3) unbuffered
multistage systems. This classification is more directed to multistage interconnection
networks (MINs), which is a subclass of the space division class in Kyas's classification
model. The reason is conceivably the growing interest in these networks.
MINs have many advantageous features over other interconnection networks, as we
will explain later. Turner and Yamanaka inherently assumed that all switch fabrics
are space division architectures, whether single stage or multistage, by including all
the time division architectures within the single stage category. This assumption is
acceptable because the traffic is relayed through one single shared medium in time
division architectures. The classification also emphasizes the role of buffering in MINs.
Before we discuss the details of these architectures, we give a notation hint. Unless
otherwise stated, all the architectures we discuss here are symmetric, i.e. the number
of input ports equals the number of output ports and is represented as N.
Figure 2.1: Classification of interconnection networks [1].
Figure 2.2: A basic bus-based switch fabric.
2.3 Time Division Switch Fabrics 
2.3.1 Shared Medium Architectures 
In fact, the first commercial ATM switch that hit the market was a bus-based
architecture (ASX-100) provided by Fore Systems [23]. A typical simple bus-based
architecture is shown in Figure 2.2. The IPCs do the ATM layer functions associated
with checking for ATM header errors, VCI and VPI translation, and synchronization
of the arriving data stream to the internal switch timing. To relay the data to the
OPCs, IPCs contend for access to the bus using one of a variety of bus-contention
techniques. The control port (CP) plays the role of controlling the bus. Each OPC
checks the destination address in each cell to decide whether to buffer this cell or
not. Each OPC contains an internal buffer queue to hold the arriving cells.
Obviously, the bus rate should be at least N times as fast as the rate of any individual
IPC to alleviate the internal blocking problem, where N is the number of IPCs. The
way to achieve such a rate is by increasing the width of the bus. For example, a
system supporting 16 OC-3 links with an internal bus rate of 40 MHz requires a 64-bit
bus width. Notice that as the number of ports in the system increases, both the
number of ports connected to the bus and the width of the bus must increase.
This yields a quadratic growth characteristic, making it uneconomical to implement
large switches. Another crucial problem with bus systems is that as the number
of I/O ports increases, the capacitive loading on the lines increases, reducing the
internal bus rate.
Figure 2.3: A basic ring-based switch fabric.
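The 64-bit figure in the bus example can be reproduced with a short calculation. The sketch below assumes the standard OC-3 line rate of 155.52 Mb/s and rounds the raw width up to the next power of two; the function names are illustrative and are not taken from the thesis.

```python
import math

OC3_RATE_MBPS = 155.52  # standard OC-3 line rate (assumption; not stated in the text)

def min_bus_width_bits(num_ports, port_rate_mbps, bus_clock_mhz):
    """Smallest bus width (bits) that lets the bus carry num_ports full-rate links."""
    aggregate_mbps = num_ports * port_rate_mbps  # bus must run N times a single port
    return math.ceil(aggregate_mbps / bus_clock_mhz)

def round_up_to_power_of_two(bits):
    """Round a width up to the next power of two (hypothetical practical choice)."""
    return 1 << (bits - 1).bit_length()

raw = min_bus_width_bits(16, OC3_RATE_MBPS, 40.0)
print(raw, round_up_to_power_of_two(raw))  # prints: 63 64
```

Doubling the number of ports doubles the required width as well as the number of bus taps, which is the quadratic growth noted in the text.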
A simplified ring-based system is depicted in Figure 2.3. The components of the
system are similar to those in a bus-based system except that each pair of IPC and
OPC is integrated in one unit and served by a ring interface (RI). The token-ring
protocol is the most common protocol running on ring-based networks. Ring systems
have the same quadratic growth problem as bus systems. However, they do not suffer
from the capacitive loading problem. That is, a ring system can run at faster rates
than a bus system if both are implemented using the same technology. This leads to
a smaller ring width than the wide bus width needed in bus systems.
Based on the above discussion, we conclude that both ring and bus systems have
pros and cons. Both have the quadratic growth characteristic as the number of ports
increases. Bus systems suffer from capacitance loading but they enjoy the simplicity
of implementing multicasting (we will discuss the multicasting property later in more
detail). On the contrary, ring systems do not suffer from capacitance loading because
they are direct point-to-point systems, but implementation of multicasting is more
complicated than in bus systems. Additionally, in ring systems extra latency is added
to pass the cells amongst the RIs. In general, we can state that shared medium
systems are not the promising solution for future BISDN due to their uneconomical
implementation.
2.3.2 Shared Memory Architectures 
A multi I/O port memory can represent a shared memory, also called shared buffer,
switch. The IPCs write the arriving cells at the beginning of the switching cycle to the
shared memory space and the OPCs read the cells that request them from the shared
memory space. The OPCs identify the cells that request them with the aid of address
holding queues (AHQs). The number of these queues is equal to the number of OPCs
and each queue, which is associated with an OPC, holds the starting addresses of the
cells in the shared memory that request this output. One bottleneck we can easily
recognize in shared memory switches is that they should be centrally controlled to
organize the process of reading/writing amongst I/O ports. Central control is one
major obstacle towards building a large switch fabric. The main advantage of pure
shared memory systems is that they have the lowest buffering requirements compared
to any other switching architecture that has the same number of I/O ports. This
is due to the complete sharing of the storage medium of the switch fabric, whereas in
other architectures the buffering medium is partially or completely divided amongst
system ports in the form of separate queues. Turner et al. [20] have discovered that
shared memory systems reduce the memory buffering requirements by a factor of 5
to 8 compared with shared medium systems. This is a crucial fact when the traffic load
has a bursty nature. Another bottleneck is the memory access speed, which also declines
as the memory size increases. Obviously, 2N read/write operations have to
be carried out in each switching cycle. That is, the memory access speed should be
at least 2N times as fast as any I/O port. This limit is twice the minimum limit of
the bus rate in shared medium systems. That means, for the previous example of the
16 OC-3 IPCs system with an internal bus rate of 40 MHz, we need a bus width of 128
for a pure shared memory system.
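The write/read cycle with address holding queues described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the organization of any particular switch: the class name, the free-address list, and the drop-on-full policy are all hypothetical.

```python
from collections import deque

class SharedMemorySwitch:
    """Toy shared-buffer switch: one cell memory, one AHQ per output port."""

    def __init__(self, num_ports, buffer_cells):
        self.memory = [None] * buffer_cells             # shared cell storage
        self.free = deque(range(buffer_cells))          # free cell addresses (assumed)
        self.ahq = [deque() for _ in range(num_ports)]  # address holding queues

    def write(self, cell, output_port):
        """IPC side: store a cell and queue its address for the requested output."""
        if not self.free:
            return False                                # shared buffer full: cell is lost
        addr = self.free.popleft()
        self.memory[addr] = cell
        self.ahq[output_port].append(addr)
        return True

    def read(self, output_port):
        """OPC side: fetch the oldest cell destined to this output, if any."""
        if not self.ahq[output_port]:
            return None
        addr = self.ahq[output_port].popleft()
        cell, self.memory[addr] = self.memory[addr], None
        self.free.append(addr)                          # address becomes reusable
        return cell
```

Because every output draws on the same pool of addresses, a burst towards one output can use storage that idle outputs do not need, which is the sharing advantage the paragraph describes.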
Shared memory systems suffer from similar problems to those we discussed above for shared
medium systems, such as wide bus width and capacitance loading. Also, Yamanaka
et al. [2] provided a quantitative discussion about the uneconomical implementation
of the shared memory architectures. For a shared memory system of N input and
output ports and total shared buffer size BN cells, the total memory space required
for the AHQs in bits is given by [2]:

$$M_{AHQ} = N \cdot P \cdot BN \cdot \log_2(BN)$$

where P is the number of service priorities. Figure 2.4 depicts the memory capacity
requirements of the AHQs and the shared buffer. For a 32 × 32 switch with 1K
cell buffers (when B = 32), no fewer than 320K bits are necessary in total for
AHQs. Even with today's technology, this is too large to integrate into the same die
with the control parts of the system. Therefore, external RAMs or FIFOs have to
be introduced, which often results in a pin number bottleneck. The other interesting
fact is that for only one service priority (P = 1), the memory capacity needed for
the AHQs is greater than the shared buffer needed to store the cells themselves for
network sizes > 32. One can visualize the added complexity if more than one service
priority (P > 1) is provided. Several methods have been reported in the literature
to manage the operation of the AHQs. A summary of the methods can be found in
[24].
Figure 2.4: Memory capacity of AHQs versus shared buffer [2] (horizontal axis: number of output ports, 2 to 256).
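The figures quoted above can be checked numerically. The sketch below assumes that the AHQ capacity takes the form N · P · BN · log2(BN) bits, i.e. each of the N · P queues is sized to hold the address of every one of the BN cells; this form is an assumption chosen because it reproduces the 320K-bit figure in the text. It further assumes that the shared buffer stores whole 53-byte ATM cells.

```python
ATM_CELL_BITS = 53 * 8  # 53-byte ATM cell (assumption about what the buffer stores)

def ahq_bits(n_ports, b, priorities=1):
    """AHQ capacity in bits under the assumed form N * P * BN * log2(BN)."""
    total_cells = b * n_ports                     # shared buffer size BN, in cells
    addr_bits = (total_cells - 1).bit_length()    # ceil(log2(BN)) bits per address
    return n_ports * priorities * total_cells * addr_bits

def shared_buffer_bits(n_ports, b):
    """Capacity in bits of the shared cell buffer itself."""
    return b * n_ports * ATM_CELL_BITS

for n in (32, 64):
    print(n, ahq_bits(n, 32), shared_buffer_bits(n, 32))
```

With B = 32 and P = 1, the 32-port case gives 327,680 bits (320K) for the AHQs, still below the cell buffer, while the 64-port case already needs more bits for addresses than for cells, in line with the observation about network sizes above 32.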
Perhaps the PRELUDE architecture is the first trial to build a pure shared memory
packet switch fabric [25]. The architecture of the 16 × 16 PRELUDE switch
is an asynchronous version of the classical synchronous time division (STD) switch
used in telephone networks [3]. The switch has gone through many phases of development.
As early as 1986, a throughput of 16 × 280 Mb/s was achieved using emitter
coupled logic (ECL) technology at the expense of a consumption of 400 W at 5 V.
Progress in CMOS technology has led to the implementation of a second version of
24 CMOS chips in 1993 named COM16 [26]. The achieved throughput was 16 ×
155.52 Mb/s while consumption was limited to 40 W at 5 V. The third version is
a monochip COM16M that encompasses all the switch functions. Figure 2.5 depicts
that architecture.
Figure 2.5: COM16M architecture [3].
In [27], the authors reviewed the efforts by a research team at Bell Labs towards
building an ATM switch. The group concluded that a pure shared memory switch has
become a reality due to the advances in VLSI technology. The first prototype was an 8
× 8 switch fabric that had a 2K cells storage static RAM of 10-ns access time using
a 0.8 µm BiCMOS technology. The port speed for each I/O line of the switch fabric is 2.5
Gb/s, making the total switch capacity up to 20 Gb/s. Further improvements were
made to reduce the area of the switch and to increase the capacity up to 160 Gb/s for a
switch fabric. However, the new modified switch fabric is not a pure shared memory
architecture due to the above mentioned limitations, but it is a mixture of shared
memory and space division architectures. Every eight consecutive output ports are
grouped in one set and served by the initially developed pure 8 × 8 shared memory
switch fabric. That is, there exist eight shared memory switch fabrics preceded by an
expand-concentrate (EXCON) stage used to distribute the arriving traffic amongst the
eight switch fabrics based on the requests made by the arriving cells. Generally, the
switch performance is outstanding but the complexity of the switch is very obvious.
Another challenging shared memory switching module is proposed in [28] as well
as a switch fabric based on that switching module. It is a clear example of how much
complexity one contends with in designing centrally controlled architectures. The
switch performance, although not provided, could be outstanding but a closer look at
the switch architecture reveals that the cost/performance ratio is very high for this
switch fabric. Additionally, the connections between the fabric central controller and
different modules suggest that the implementation of the system has to be split over
many boards, making it a very difficult task for the system to attain high speeds.
It appears, due to the limitations we discussed above, researchers are tending
towards hybrid systems. That is, networks are implemented as a mix of both shared
memory and space division. Some newly reported examples that describe this trend
can be found in [24] and [2]. In the subsequent sections we will discuss different buffering
schemes adopted in space division architectures, showing that the shared buffering
scheme outperforms other buffering schemes.
2.4 Space Division Switch Fabrics 
As mentioned earlier, the class of space division architectures has a larger collection
when compared to the other class of time division architectures. As depicted in
Figure 2.1, space division architectures can be divided into two main subclasses,
namely single-stage interconnection networks (SINs) and multistage interconnection
networks (MINs). Obviously, as both names of the subclasses indicate, architectures
that belong to SINs are composed of only one stage, whereas architectures belonging
to MINs are composed of multiple stages. In the next sections we discuss each subclass
with the provision of some illustrative examples for each.
2.4.1 Single Stage Architectures 
There are two well-known examples for SINs, the crossbar (sometimes called Dot
Matrix) switch and the Knockout switch. The crossbar switch is depicted in Figure
2.6. An SE is located at each crosspoint and takes one of two states, either cross state
or bar state as illustrated in Figure 2.7. The crossbar switch outperforms several
other switching networks. The main reason is that the switch does not suffer from
any internal blocking. However, it suffers from output blocking because each main
output cannot accept more than one cell in any switching cycle. Furthermore, the
switch has two main disadvantages, which are intensified as the size of the network (N)
increases. Firstly, the hardware complexity of the switch is O(N²), which is highly
complex when compared to other space division architectures. However, recently Woo
[29] has proposed a new design that reduced the switch complexity to O(N√N) by
adapting the theory of finite projective planes. Although the new idea has reduced
the complexity by a factor of √N, the hardware complexity of the newly designed
SEs to implement the new proposed switching protocol has significantly increased.
Secondly, there is a lack of fairness amongst the input links of the network. The problem
takes place because the internal SEs may give priority to the input ports of the
network in a descending or ascending order, depending on the routing decisions made
within the SEs. A randomizing mechanism may be used to alleviate the problem
but at the expense of extra hardware complexity of the SEs. In [30], under uniform
random traffic, a theoretical upper limit for the maximum throughput of 0.5858 has been
reached for the crossbar switch of infinite size (N = ∞) and infinite input buffering.
Further investigations of the crossbar performance under different buffer strategies
can be found in [31, 32]. To conclude, the crossbar switch could be considered the
proper choice for the circuit switching technique, but it has lost attraction with the
new demands of broadband packet switching where scenarios of output traffic
concentration exist. In [20], a survey of different methods used to improve the crossbar
network performance (such as introducing buffering within the crosspoints, having
multiple buffer queues at each input port, and speeding up the fabric switching) is
conducted. Despite the complexity of the crossbar switch, recent implementations
that are based on the crossbar architecture can be found in [33] and [34].
Figure 2.6: A crossbar switch fabric.
Figure 2.7: Crossbar SE states.
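The input-buffered throughput limit quoted above, 0.5858, equals 2 − √2, and the way finite switches approach it can be illustrated with a small saturation experiment. The sketch below is an illustration under stated assumptions and is not the simulator used in this thesis: every input queue is assumed to be always backlogged, destinations are uniform random, and each output picks one contending head-of-line cell at random per slot. All names are hypothetical.

```python
import math
import random

def saturation_throughput(n_ports, slots, seed=1):
    """Estimate per-port throughput of a saturated input-queued crossbar."""
    rng = random.Random(seed)
    # Destination requested by the head-of-line (HOL) cell of each input queue.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)           # an output accepts one cell per slot
            hol[winner] = rng.randrange(n_ports)  # next queued cell reaches the HOL
            delivered += 1
    return delivered / (slots * n_ports)

print(round(2 - math.sqrt(2), 4))  # prints: 0.5858
```

Calling `saturation_throughput(2, 20000)` gives a value near 0.75, and larger switches such as `saturation_throughput(32, 2000)` fall towards the asymptotic limit; the loss relative to 1.0 comes entirely from head-of-line blocking, not from internal blocking in the fabric.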
The architecture of the knockout switch is depicted in Figure 2.8. Each output
link O_i is preceded by an N:L concentrator. The concentrator has three main parts:
1) N address filters, 2) an N:L multiplexer, and 3) a buffer space. The address filters
pass only the proper cells and reject all other cells that request other output ports.
The concentrators confine the arriving cells in a parallel stream of L cells. That is, if
more than L cells request the same output port, the multiplexer has to reject some of
the excess cells, hence the name knockout. Finally, the buffer stores the arriving cells.
If L = N, then the switch is called a perfect switch. The reason is that no sort
of blocking is experienced in the network. This does not mean that the network has
a perfect performance, because the output queuing problem still exists. When the
incoming traffic is more bursty, the output buffer needs to be larger to prevent buffer
overflow. This problem, which we refer to as the output queuing problem, is a classical
queuing theory issue. The network is sometimes called a "Disjoint-path" network,
because there is no interaction amongst input/output connections. Obviously, the
knockout switch suffers from the capacitance loading problem discussed earlier for bus
and shared memory architectures. In fact, the performance of the knockout switch is
very similar to the performance of shared memory architectures because the knockout
switch does not suffer from any internal blocking. Note that the complexity of the
concentrator grows drastically as N or L increases. In [5], a survey of some tradeoffs
to implement the knockout switch is conducted. Throughout this dissertation, we
selected both the crossbar and knockout switches to compare their performance
with the architecture we propose. We call the version of the knockout switch we use
a perfect switch because we set L = N.
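The effect of the concentrator size L can be made concrete with a short calculation. The sketch below assumes uniform random traffic in which each of the N inputs independently sends a cell to a given output with probability load/N, so the number of cells contending for one output in a slot is binomial; cells beyond the first L are knocked out. This is the commonly used analytical model for the knockout concentrator, offered here as an illustration and not as a result from this thesis. The function name is hypothetical.

```python
from math import comb

def knockout_loss(n, l, load):
    """Fraction of cells dropped by an N:L concentrator under uniform random traffic."""
    p = load / n  # probability that a given input targets this output in a slot
    expected_lost = sum(
        (k - l) * comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(l + 1, n + 1)
    )
    return expected_lost / load  # normalize by the mean number of arrivals

for l in (1, 4, 8, 32):
    print(l, knockout_loss(32, l, 0.9))
```

With L = N the sum is empty and the loss is zero, which corresponds to the perfect switch used for comparison in this dissertation; the remaining values show how quickly the loss falls as L grows.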
At this stage, the different blocking definitions for interconnection networks are
worth mentioning. These definitions describe the ability of any switching network to
make different permutation connections. Clearly, a nonblocking N × N network realizes
all N! permutations. The definitions are as follows [5]:
• A network is said to be nonblocking in the strict sense or strictly nonblocking if
it is capable of immediately establishing a connection between any input-output
pair without interference from any arbitrary existing connections.
• A network is said to be nonblocking in the wide sense or wide-sense nonblocking
if any desired connection between an input-output pair can be established
immediately, provided that the existing connections have been inserted using some
routing algorithm. If the algorithm is not followed, some attempted connections
may get blocked.
• A network is said to be rearrangeable or rearrangeably nonblocking if a desired
connection between any input-output pair can be established if one or more of
the existing connections are rerouted or rearranged.
• A network is said to be blocking if some connection sets can prevent some desired
connections from being established.
Although these definitions are standard and used by many researchers to classify
switching architectures, they may provide a misleading picture about the performance
of a switching architecture. The fact is, permutation traffic is not the traffic
type anticipated in broadband communication systems. In the broadband traffic
scenarios, multicast and broadcast connections, as well as output concentrations, are
expected. These traffic loads degrade the performance of many existing switching
architectures significantly. A typical example, as we will see later, is the Batcher
banyan switch.
2.4.2 Multistage Interconnection Networks 
Recently, the interest in MINs has grown very fast. This is because of two main
reasons. Firstly, in MINs no central control is needed to supervise the switching
process, as is the case in shared memory and shared medium systems. That is, the
control is distributed amongst the SEs. This property is called self-routing. Secondly,
the architectures of these networks are modular, making them attractive for VLSI
implementation. We first discuss the single path and then the multipath MINs in the
next two sections.
2.4.2.1 Single Path MINs
Goke and Lipovski [35] defined a network with a unique path from each input
to each output as a banyan network. Another definition, by Patel [36], is:
A delta network is an a^n × b^n switching network with n stages,
consisting of a × b crossbar modules. The link pattern between stages is such
that there exists a unique path of constant length from any source to any
destination.
Obviously, the two definitions are close and they are used interchangeably in the
literature. In general, we will limit our discussion to rectangular MINs. A
rectangular MIN is of size N × N. Several well-known MINs belong to the delta class,
such as banyan [35], baseline [37], reverse baseline [37], omega [38], modified data
manipulator [37], and indirect binary n-cube [39]. Some of these networks are shown
in Figure 2.9. A rectangular banyan network based on 2 × 2 SEs has log2 N stages.
Each stage has N/2 SEs. In addition to the above mentioned advantages of MINs,
banyan networks based on 2 × 2 SEs enjoy the privilege of the simple structure of
their SEs. A 2 × 2 SE is very simple to implement because the number of decisions
that need to be taken is smaller compared to higher order SEs.
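Self-routing and the unique-path property can be illustrated on the omega network, one member of the delta class. The sketch below assumes the usual destination-tag rule: each of the log2 N stages applies a perfect shuffle and then the 2 × 2 SE sends the cell to its upper or lower output according to one bit of the destination address, most significant bit first. Function names are hypothetical, and N is assumed to be a power of two.

```python
def omega_route(n_ports, src, dst):
    """Return the link occupied after each stage when routing src -> dst."""
    stages = n_ports.bit_length() - 1            # log2(N) stages of N/2 SEs each
    pos, path = src, []
    for s in range(stages):
        bit = (dst >> (stages - 1 - s)) & 1      # destination bit examined at stage s
        # Perfect shuffle (left rotation); the SE then overwrites the low bit:
        # 0 selects the upper output, 1 the lower output.
        pos = ((pos << 1) & (n_ports - 1)) | bit
        path.append(pos)
    return path

def has_internal_blocking(n_ports, dests):
    """True if two cells need the same link at some stage; dests[i] is input i's output."""
    stages = n_ports.bit_length() - 1
    paths = [omega_route(n_ports, s, d) for s, d in enumerate(dests)]
    return any(len({p[s] for p in paths}) < len(paths) for s in range(stages))

print(omega_route(8, 5, 3))                                  # prints: [2, 5, 3]
print(has_internal_blocking(8, [0, 1, 2, 3, 4, 5, 6, 7]))    # prints: False
print(has_internal_blocking(8, [0, 4, 2, 6, 1, 5, 3, 7]))    # prints: True
```

The identity permutation passes without conflict, whereas the bit-reversal permutation makes several cells request the same internal link, even though no two cells share an output port. This internal blocking is a direct consequence of having only one path per input-output pair.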
In [40], it has been shown that the performance under uniform random traffic is
the same for all banyan configurations. Despite all the advantages, the performance of
a banyan network is extremely poor even under simple traffic loads, such as uniform
random traffic. Several methods were proposed to improve the performance but this
is achieved at the expense of the network complexity. The most well-known method
is using buffers. We will provide a detailed discussion of the buffering techniques in
Chapter 4.
Figure 2.9: Different banyan network topologies: (a) banyan, (b) omega, (c) baseline.
2.4.2.2 Multipath MINs 
The subclass of multipath MINs has a huge collection, because all single path MINs
are inherently included. Later we will provide discussions about some methods used
to build multipath MINs from single path ones. Prior to that, we first discuss the
structures of standalone multipath MINs.
Clos Network
The Clos network [41] has been in the research focus for decades. The reason is
that it is a nonblocking network with fewer crosspoints than the crossbar switch.
This fact made the Clos network a strong candidate for circuit switching networks.
However, the Clos switch was also considered for packet switched networks, as in the
Memphis switch built for the IBM GF-11 parallel computer [42]. The Clos network is
a three-stage network that can be built in asymmetric or symmetric fashion. The
first stage of the network has r_1 (n_1 × m) SEs, the middle stage has m (r_1 × r_2)
SEs, and the third stage has r_2 (m × n_2) SEs. In a symmetric network r_1 = r_2 = r
and n_1 = n_2 = n. Figure 2.10 depicts the architecture of a symmetric Clos network.
Clos networks can be designed to operate in either rearrangeable nonblocking mode
or nonblocking mode. It has been shown for a symmetric Clos network [43]:
if m ≥ n then the network can operate in rearrangeable nonblocking mode,
if m ≥ ⌊3n/2⌋ then the network can operate in the wide-sense nonblocking mode,
and if m ≥ 2n − 1 then the network can operate in the strictly nonblocking
mode.
Obviously, the hardware requirement for a Clos network to operate in a strictly
nonblocking mode or wide-sense nonblocking mode is higher than that to operate in a
rearrangeable nonblocking mode. Accordingly, a symmetric semi-rearrangeably
nonblocking (SRN) Clos network has been provided [44]. Similar to a nonblocking
network, an SRN network does not allow any rearrangement of the existing connections in
order to establish a new connection; however, unlike a nonblocking switch, an SRN
switch allows at most one rearrangement to take place when an existing connection
is disconnected. Thus the overhead of establishing a new connection under the SRN mode
of operation is the same as that under the nonblocking operation. However, the
overhead of disconnection in the SRN mode is slightly higher than that in the
nonblocking mode of operation. Nevertheless, this increase in the overhead is justified
because the hardware resources required for SRN are much lower than those required
for the nonblocking mode of operation [44]. Recently, a similar approach was proposed
for asymmetric Clos networks [45].
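The hardware trade-off between the operating modes can be quantified by counting crosspoints. The sketch below assumes a symmetric three-stage Clos network with N = n · r ports and counts one crosspoint per input-output pair of every SE; it classifies only the rearrangeable (m ≥ n) and strictly nonblocking (m ≥ 2n − 1) cases, leaving out the wide-sense condition. Function names and the example dimensions are hypothetical.

```python
def clos_crosspoints(n, r, m):
    """Crosspoints of a symmetric three-stage Clos network with N = n * r ports."""
    outer = 2 * r * (n * m)  # r first-stage (n x m) SEs plus r third-stage (m x n) SEs
    middle = m * (r * r)     # m middle-stage (r x r) SEs
    return outer + middle

def clos_mode(n, m):
    """Operating mode permitted by the number of middle-stage SEs."""
    if m >= 2 * n - 1:
        return "strictly nonblocking"
    if m >= n:
        return "rearrangeable nonblocking"
    return "blocking"

n, r = 6, 6  # example: N = 36 ports
for m in (6, 11):
    print(m, clos_mode(n, m), clos_crosspoints(n, r, m), (n * r) ** 2)
```

For N = 36 the strictly nonblocking design (m = 11) needs 1188 crosspoints against 1296 for the crossbar, while the rearrangeable design (m = 6) needs only 648, which is the saving that motivates the rearrangeable and semi-rearrangeable modes.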
In [46], an optical implementation and a performance analysis of an asymmetric Clos
network were proposed. The performance was studied under uniform random traffic.
It was discovered that the performance is acceptable under low traffic loads, but as the
load increases the amount of internal buffering increases. One drawback of internal
buffering within MINs is the emergence of the so-called out-of-sequence problem,
where cells reach the destination not in the order they arrived at the input ports.
In the literature there are two schemes used to control switching in Clos networks
to overcome the out-of-sequence problem but at a high cost of complexity. The first
scheme, which is used in [45], is called multirate circuit switching [47]. The second
is a dynamic routing scheme called cell switching [48]. In the former scheme, a path
is established for each connection such that the peak capacity is reserved along the
path and it is sufficient to carry the connection at any time. Clearly, this scheme will
result in low utilization of the network capacity because it does not take advantage of
the statistical multiplexing offered by the ATM protocol. But it provides a guarantee
of QoS for each connection. The latter scheme is more dynamic, thus it has better
utilization, but extra overhead is needed to overcome the out-of-sequence problem.
A recent scheme, which lies in the region between both previous schemes, has been
proposed in [49]. Obviously, there are too many concerns that have to be taken into
account when designing a switch based on the Clos network. These concerns result
in complicated implementation of the switch and high overhead processing time, which
makes it difficult to implement switches of large size based on the Clos network.
Another recent optical implementation of an architecture based on the Clos network can
be found in [50].
Figure 2.10: Architecture of the Clos network.
Kappa Network 
Kappa network (KN) [4] is a variation of the Gamma network (GN) [51]. Both GN
and KN were proposed as fault-tolerant MINs. The GN is derived from the modified
baseline network. Both KN and GN have log2 N + 1 stages but with different SE
sizes. The first stage of the GN has N 1 × 3 SEs, whereas the first stage of the KN
is composed of N 1 × 4 SEs. Each middle stage in the GN is composed of N 3 × 3
SEs, whereas it is composed of N 4 × 4 SEs in the KN. The last stage of the GN has
N 3 × 1 SEs, whereas the last stage of the KN has N 4 × 1 SEs. Figure 2.11 depicts
the architecture of an 8 × 8 Kappa network. If we remove the link denoted by 'A' in
Figure 2.11, we obtain the GN. Each link in the Kappa network has an alternate one.
In case of a fault, the traffic is directed to the alternate link. Obviously, the family
of banyan networks does not have any fault tolerance feature. However, there are
several methods to improve their fault tolerance properties by replicating the network
or dilation of the links. We will explain these concepts in the following sections.
Figure 2.12: Architecture of an 8 × 8 BB network [5].
Batcher banyan Network
Banyan networks become nonblocking under permutation traffic if the arriving cells
are sorted in ascending or descending order, provided that no gaps exist between
active inputs [52]. A Batcher banyan (BB) network is composed of a sorting (Batcher)
network followed by a banyan network. Figure 2.12 depicts the architecture of an 8 × 8
BB network. The BB network is a multipath network because the path from any
source to any destination depends on the sorting order in the sorting network. The
sorting network [53] has (log2 N)(log2 N + 1)/2 stages, with N/2 sorting elements in each stage.
In total, (N/4)(log2 N)(log2 N + 1) sorting elements and (N/2) log2 N SEs exist in the BB network.
Obviously, the BB network has fewer crosspoints for large N if compared with the
crossbar switch; however, it is a complex alternative to improve the performance
of the banyan network. Despite that complexity, several switches that have been
reported in the literature are based on the BB network [5]: Starlite switch, Sunshine
switch . tphare BB switch, and >phase BB switch. .Although the performance of 
the BB network is outstanding (100% throughput) under permutation traffic, it has
poor performance under other traffic loads which are closer to real ATM traffic.
This can be attributed to the limited performance of the banyan network, which
constitutes the output part of the BB network. It is worth mentioning at this stage
that permutation traffic, which is used as the basis for defining the blocking properties
of the different network architectures, does not really represent real ATM traffic,
which is expected to be bursty and have multicast properties [54]. That is why it was
reported in [55] that the BB network is more popular in the research communities
than for commercial applications.
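The claim that the BB network needs fewer crosspoints than a crossbar only for large N can be checked by counting elements. The sketch below assumes a Batcher bitonic sorter with (log2 N)(log2 N + 1)/2 stages of N/2 sorting elements, a banyan with log2 N stages of N/2 SEs, and a cost of four crosspoints for every 2 × 2 element; the crosspoint cost and the function names are assumptions made for illustration.

```python
def bb_element_counts(n_ports):
    """Return (sorter stages, sorting elements, banyan SEs) for an N x N BB network."""
    k = n_ports.bit_length() - 1             # log2(N), with N a power of two
    sorter_stages = k * (k + 1) // 2
    sorting_elements = (n_ports // 2) * sorter_stages
    banyan_ses = (n_ports // 2) * k
    return sorter_stages, sorting_elements, banyan_ses

def bb_crosspoints(n_ports):
    """Total crosspoints, assuming every 2 x 2 element costs 4 crosspoints."""
    _, sorters, ses = bb_element_counts(n_ports)
    return 4 * (sorters + ses)

for n in (8, 32, 64, 1024):
    print(n, bb_element_counts(n), bb_crosspoints(n), n * n)
```

Under these assumptions an 8 × 8 BB network has 6 sorter stages, 24 sorting elements and 12 SEs, and the BB network becomes cheaper than the crossbar from N = 64 onwards.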
Extended Banyan Networks
Beneš [43] first introduced the concept of extended networks, hence these networks
are called Beneš networks. The Beneš network is another example of a multipath
network obtained from the banyan network. The network has 2 log2 N − 1 stages, and is
composed of two halves. Each half is a mirror image of the other, as shown in Figure
2.13. The first half, which is composed of the first log2 N − 1 stages, is called the
distribution network and the second half, which is composed of the last log2 N stages,
is called the routing network. Notice that we can still obtain a multipath network even
with a distribution network that has fewer than log2 N − 1 stages. However,
such networks have poorer nonblocking properties when compared to the generic
Beneš network. Beneš has proved that such networks are rearrangeably nonblocking
[43].
Beneš networks suffer from two drawbacks. Firstly, the control is centralized in
the distribution network, which leads to losing the distributed control advantage that
MINs enjoy. Secondly, the architecture does not overcome the out-of-sequence problem
encountered if the SEs contain buffers because it is a multipath architecture.
Recently, De Marco and Pattavina [56] have defined a new distributed control algorithm,
which is suitable for ATM cell switching, to control the switching operation in
the extended banyan network. Despite this improvement, the complexity of the SEs
in the distribution network increases drastically. This is caused by the complicated
algorithm implemented in each SE. The functions of that algorithm include gathering
information about the buffers in the same pool (see [56]), manipulating this information,
and then taking the routing decision. Additionally, the output port controllers
have to guarantee the sequence of the arriving cells because this algorithm does not
obviate the out-of-sequence problem if the network is internally buffered. This scenario
has to be repeated in every switching cycle. Indeed, the extended banyan network
is attractive for circuit switching systems, as is the case for the Clos network, but is less
efficient when used in packet switching systems.
Figure 2.13: The architecture of an 8 × 8 Beneš network.
Dilated Networks 
A dilated banyan network has the same topology and the same number of SEs as the
banyan network except that each link in the network is replicated many times. For
example, the building SE for a 2-dilated banyan is shown in Figure 2.14. The
upper/lower inputs (outputs) are connected to the same SE in the previous (next) stage.
Importantly, we should not confuse network dilation with increasing the internal
bandwidth of the network. The latter is similar to using a wide bus for speeding up,
discussed earlier in Section 2.3, whereas in the former the number of ports per SE is
increased. In fact, both concepts could be adopted in the same architecture. Note
that the concept of dilation is not restricted to banyan networks; indeed,
it can be adopted with any other MIN architecture.
Figure 2.14: A switching element in a 2-dilated banyan network.
In the literature, the idea of dilated networks has been studied [57, 58]. As would
be expected, the performance is shown to be better than that of the regular banyan network.
That is because of the increased number of paths that a cell can go through. In [59],
Alimuddin et al. have discovered that using fixed dilation, i.e., using the same dilation
degree for all SEs, results in underutilization in the initial stages and overutilization
in the last stages. They have proposed a structure where the degree of dilation is not
the same in all stages. It increases toward the last stages of the network. That way
scenarios of output port contention can be relaxed with the availability of more paths.
Finally, we should be aware of the increased hardware complexity when incorporating
dilation.
Figure 2.15: The tandem architecture.
Tandem Networks
The first tandem architecture was introduced by Tobagi et al. [60], and it
was based on the banyan network. A tandem structure is composed of K fabrics
connected in cascade as shown in Figure 2.15. Each fabric can belong to any of the switching
architectures we discussed above. The arriving cells are switched through fabric 1.
If an internal conflict exists, some cells win contention and the others are routed
to the other unused output links. After the first fabric, cells are checked, and those that
have not reached their destinations are sent to fabric 2, while the other successful cells
are received by the output ports. The process is repeated until fabric K is reached, and
then unsuccessful cells are dropped or, in some other architectures, they could be
recirculated back. Note that switching is taking place in all fabrics simultaneously,
which may lead to the out-of-sequence problem. For example, if a cell loses contention
in consecutive stages and the cells arriving later in the same logical connection win
contention, then cells will arrive out-of-sequence at the destination.
The performance of the tandem banyan is definitely better than that of an ordinary
banyan network as K increases. However, a tandem architecture is a very expensive
solution, especially if a low cell loss ratio and a minimum number of out-of-sequence
cells are two objectives, because these two objectives translate to a higher K and a
complicated resequencing mechanism at the main output port controllers of the network.
The authors in [60] reported that, on average, the tandem banyan network of
size N = 32 with recirculation needs at least 4 fabrics under uniform random traffic.
Parallel and Parallel-Like Networks
In parallel banyan networks, also called replicated banyan networks, there are K networks,
each called a plane, connected in parallel. Each plane can be based on any MIN
or switching architecture as discussed above. The traffic is divided amongst these K
planes and then merged at the output ports. Many approaches can be followed to divide
the arriving traffic. In the first approach, the arriving traffic is divided randomly
with equal probabilities and each plane independently routes its share of the traffic to
the output ports [61]. The second approach is to operate in a pipelined fashion [62],
where each switching cycle is divided into K overlapped phases. In each phase the
headers of the cells at the head of line (HOL) are first routed, and successful headers
that reach their destinations are positively acknowledged by the output ports. These
positive acknowledgments are sent back to the input ports. This part of any phase
is called the reservation cycle. Now the input ports react by relaying the body of
these positively acknowledged cells through the plane which corresponds to the
current phase. Once the cells start to get relayed they are no longer at the HOL in
the input ports. That is, the input ports immediately start the reservation cycle of
the next phase. The process is shown in Figure 2.16.
Figure 2.16: Timing sequence of the different pipeline phases.
In the first approach, the reservation in all planes is carried out at the same
time. This is because each plane operates independently of all others. In
the second approach, by contrast, the reservation cycles of all planes can be carried out in one
separate unit (called the control plane in [62]), and the routing decision can be passed to
each plane. Notice that the end of the reservation cycle of phase K is synchronized with
the start of the reservation cycle of phase 1 to guarantee efficient utilization of the
control plane. A structure based on the second approach is shown in Figure 2.17.
There are three advantages of the second approach over the first one. Firstly, the
structure of any of the data planes is simple compared with that of
the parallel planes in the first approach because the data planes are not required to
take any routing decisions. Secondly, the control plane can run at lower speeds than
the data planes, depending on the number of data planes used. Table 2.1 illustrates
the theoretical maximum speed reduction ratio for the control plane of a pipeline
structure. Obviously, the speed reduction ratio decreases as the network size N
or the number of data planes K increases. Thirdly, the second approach has better
utilization of the data planes because no cells are lost due to internal blocking. Despite
these advantages, the pipeline structure suffers from two main problems, which can
be considered as advantages for the first approach. Firstly, the network has poor fault
tolerance properties, because if an SE becomes faulty in the control plane, all
other corresponding SEs in the data planes will be out of operation. To improve the
fault tolerance properties, Cheng et al. [63] proposed that each parallel plane takes its
own routing decision instead of depending on one plane, but at the expense of extra
hardware added to each plane. Secondly, the relaying cycles in the parallel planes are
not synchronized, which demands a more complicated algorithm in the output ports
to write to the buffers.
Figure 2.17: The pipeline architecture.
N K Speed reduction ratio
128 4 4.8
256 4 4.24
512 4 3.78
1024 5
Table 2.1: Control plane speed reduction ratio [10]
2.5 Summary 
In this chapter, we introduced an overview of the different switching architectures
proposed in the literature. We discussed the differences and similarities, as well as
the appropriateness, of the most popular classifications used to classify these switching
architectures. We also provided an example for each subclass in these classification
hierarchies. Each example has demonstrated the advantages and disadvantages of
each subclass. Additionally, we showed that the subclass of multipath MINs has the
biggest collection when compared to other subclasses due to their modular, scalable
and efficient performance. In the next chapter we will discuss the architecture we are
proposing in this dissertation, which is a multipath MIN.
Chapter 3 
Balanced Gamma Network
3.1 Introduction 
The BG network is a promising MIN for broadband fast packet switching. The BG
network offers an outstanding performance when compared with other well-known
networks, such as the crossbar and the Batcher banyan networks. In this chapter,
we introduce a historical background as well as some new proposed modifications to
the BG network. These new modifications concern both the routing algorithm and
the network topology. We also provide some examples to describe the routing algorithm
currently employed in the BG network. In addition, we discuss some preferred
strategies adopted in the new routing algorithm.
3.2 Historical Background 
3.2.1 Topology 
The BG network was first reported in [a]. This network has a similar structure to
the Kappa network [4], except in the last stage, where the 4 to 1 concentrator at each output
port of the Kappa network is replaced by a buffer which is capable of receiving up to
4 cells in one switching cycle. The BG network has $n + 1$ stages where $n = \log_2 N$.
The first stage has 1 x 4 SEs. Each of the following $n - 1$ stages has 4 x 4 crossbar
SEs. The last stage is a buffer stage as mentioned earlier. Each stage has N SEs
numbered from 0 to N - 1. Figure 3.1 depicts the initial structure of an 8 x 8 BG
network.
Each SE in the BG network is addressed as $SE_{i,j}$ ($0 \le i < N$, $0 \le j < n$), where
$i$ is the number of the SE in stage $j$. Each SE has four output links numbered from
0 to 3 and indicated by $OL_0, \ldots, OL_3$. The output links of $SE_{i,j}$ are connected as
follows: $OL_0$ is connected to one of the inputs of the $(i - 2^j)$th SE in the next stage
($SE_{i-2^j,\,j+1}$), $OL_1$ is connected to one of the inputs of the $i$th SE in the next stage
($SE_{i,\,j+1}$), $OL_2$ is connected to one of the inputs of the $(i + 2^j)$th SE in the next stage
($SE_{i+2^j,\,j+1}$), and $OL_3$ is connected to one of the inputs of the $(i + 2^{j+1})$th SE in
the next stage ($SE_{i+2^{j+1},\,j+1}$). In the next section, we discuss the routing algorithm
proposed for the BG network.
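The link pattern of the initial topology can be sketched in a few lines of Python. This is only an illustration: the function name `next_stage_targets` is hypothetical, and the sketch assumes that SE indices wrap around modulo N, which the text above does not state explicitly.

```python
def next_stage_targets(i: int, j: int, n_ports: int) -> list[int]:
    """Return the next-stage SE index reached by OL_0..OL_3 of SE_{i,j}.

    Offsets follow the initial BG topology: -2^j, 0, +2^j, +2^(j+1).
    Assumption (not stated in the text): indices wrap around modulo N.
    """
    offsets = (-(2 ** j), 0, 2 ** j, 2 ** (j + 1))
    return [(i + off) % n_ports for off in offsets]


if __name__ == "__main__":
    # Output links of SE_{0,0} and SE_{5,1} in an 8 x 8 network.
    print(next_stage_targets(0, 0, 8))
    print(next_stage_targets(5, 1, 8))
```

Under the wrap-around assumption, the four links of an SE in stage 0 of an 8 x 8 network reach four distinct SEs, whereas in the last routing stage the $\pm 2^j$ links coincide.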
3.2.2 Routing Algorithm 
The BG network originally used a distance tag algorithm [64]. In that algorithm, a
routing tag is defined by the input port controllers for each arriving cell. This tag is
given by $(D - S) \bmod N$, where S denotes the input port the cell originates from and
D is the cell destination. The SEs interpret the bits of the routing tag in reverse
order; that is, the SEs of Stage 0 use the least significant bit for routing. If the routing
bit is '0', then the arriving cell is routed through $OL_1$. If the routing bit is '1', then
the arriving cell is routed to $OL_2$. As there are four input links coming into each SE,
it is quite likely that, in a given cycle, more than one packet may arrive at the same
SE with the same routing bit (either 0 or 1). In case no priority is assigned to any
cells, if three or more cells have the same routing bit value, two are randomly
selected and the rest are dropped. If the two selected cells have the routing bit value
of '0', then one of them is routed to $OL_1$ and the other to $OL_3$. If the two selected cells
have the routing bit value of '1', then one of them is routed to $OL_2$ and the other is
routed to $OL_0$. However, the routing tag has to be modified for those cells directed to
$OL_3$ and $OL_0$.
Figure 3.1: Initial BG structure.
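A minimal sketch of the distance tag computation is given below. It follows a single cell through the network in the absence of contention, so only the first-choice links ($OL_1$ for a '0' bit, $OL_2$ for a '1' bit) are used and no tag modification is needed. The function names are hypothetical, N is assumed to be a power of two, and SE indices are assumed to wrap modulo N.

```python
def distance_tag(source: int, dest: int, n_ports: int) -> int:
    """Distance tag (D - S) mod N computed at the input port controller."""
    return (dest - source) % n_ports


def route_without_contention(source: int, dest: int, n_ports: int) -> list[tuple[int, int, int]]:
    """Follow one cell through the n routing stages, no contention assumed.

    Returns (stage, output link used, SE index reached) for each stage.
    """
    n_stages = n_ports.bit_length() - 1   # n = log2(N), N a power of two
    tag = distance_tag(source, dest, n_ports)
    position = source
    path = []
    for stage in range(n_stages):
        bit = (tag >> stage) & 1          # least significant bit first
        if bit:
            link = 2                      # OL_2 leads to SE (i + 2^j)
            position = (position + 2 ** stage) % n_ports
        else:
            link = 1                      # OL_1 leads straight to SE i
        path.append((stage, link, position))
    return path


if __name__ == "__main__":
    print(route_without_contention(3, 6, 8))
```

For input port 3 and destination 6 in an 8 x 8 network the tag is 3, so the cell takes $OL_2$ in the first two stages and $OL_1$ in the last one.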
To reduce the overhead needed to generate the routing tags and modify them
in the above algorithm, another routing algorithm was proposed in [65]. The new
algorithm is called reverse destination tag. The tag is only the destination address D
in binary, viz. $d_{n-1} d_{n-2} \ldots d_0$. That way, the process needed to generate the tag at
the input ports is not as complex as it is in the above algorithm. In this algorithm, the
SEs also interpret the tag in reverse order. Nonetheless, another parameter $\sigma$ has
to be defined within each SE and used to take the routing decision. This parameter
$\sigma$ is given by:
$\sigma = \lfloor i / 2^j \rfloor \bmod 2$. (3.1)
If $d_j \oplus \sigma = 0$ for a cell arriving at $SE_{i,j}$, then $SE_{i,j}$ routes this cell through $OL_1$
or $OL_3$, else $SE_{i,j}$ routes this cell through $OL_2$ or $OL_0$. If more than two cells have
the same routing tag bit, then two are selected randomly for routing and the rest
are dropped. Notice that in this algorithm the SEs do not need to modify the routing
tags for the cells going through $OL_0$ and $OL_3$.
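The decision rule of the reverse destination tag algorithm can be illustrated as follows. The sketch reads the parameter in (3.1) as $\sigma = \lfloor i/2^j \rfloor \bmod 2$, i.e. bit $j$ of the SE index, and the link pairs as ($OL_1$, $OL_3$) and ($OL_2$, $OL_0$). Which link of a pair is tried first is an assumption made here for illustration, as are the function names and the modulo-N wrap-around.

```python
def sigma(i: int, j: int) -> int:
    """Parameter of Eq. (3.1): bit j of the SE index i."""
    return (i >> j) & 1


def candidate_links(i: int, j: int, dest: int) -> tuple[int, int]:
    """Output links SE_{i,j} may use for a cell addressed to dest."""
    d_j = (dest >> j) & 1
    # d_j XOR sigma == 0: bit j already matches, keep it (OL_1 or OL_3).
    # Otherwise bit j must be flipped by a +/- 2^j move (OL_2 or OL_0).
    return (1, 3) if (d_j ^ sigma(i, j)) == 0 else (2, 0)


def final_position(source: int, dest: int, n_ports: int) -> int:
    """Route one cell with no contention, always taking the first-listed link."""
    n_stages = n_ports.bit_length() - 1   # N assumed to be a power of two
    position = source
    for stage in range(n_stages):
        if candidate_links(position, stage, dest)[0] == 2:
            position = (position + 2 ** stage) % n_ports
    return position


if __name__ == "__main__":
    print(candidate_links(5, 0, 6), final_position(3, 6, 8))
```

Because the decision depends only on the destination bit and the position of the SE, the tag itself never has to be rewritten along the path.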
3.3 Balanced Gamma New Structure 
In this dissertation, we introduce a new routing algorithm that reduces the routing
decision within the SEs to only checking the value of the routing tag bit. In the new
algorithm, which uses the reverse destination routing tag, if the value of the routing
tag bit is 0, then the arriving cell is routed through $OL_0$ or $OL_1$, else the arriving cell
is routed through $OL_2$ or $OL_3$. We do not need to check for a parameter like $\sigma$ or
modify the routing tag as is the case in the above two routing algorithms. However, we
ought to change the network topology to implement this new algorithm, but without
affecting the network modularity enjoyed by the previous topology described above.
The new topology is described in Appendix A and depicted in Figure 3.2. A typical
connection, shown as a thick line, between input port 3 and output port 6 in Figure 3.2
illustrates the new routing algorithm. In this connection, the routing tag is $6 = (110)_2$;
the SE in Stage 0 routes the cell to $OL_0$ because the routing bit is 0, whereas
the SEs in the subsequent stages route the cell through $OL_2$ because the routing bit
is 1.
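Since the new algorithm only inspects one tag bit per stage, the admissible output links along a path follow directly from the destination address. The sketch below lists, for each stage, the pair of links a cell may take; the function name is hypothetical and the link pairs ($OL_0$, $OL_1$ for a '0' bit, $OL_2$, $OL_3$ for a '1' bit) are taken as read from the description above.

```python
def link_pairs(dest: int, n_stages: int) -> list[tuple[int, int]]:
    """Admissible output links per stage for the new routing algorithm.

    The destination address is used as the tag and read least significant
    bit first: bit 0 -> (OL_0, OL_1), bit 1 -> (OL_2, OL_3).
    """
    return [(0, 1) if ((dest >> stage) & 1) == 0 else (2, 3)
            for stage in range(n_stages)]


if __name__ == "__main__":
    # Destination 6 = (110)_2 in an 8 x 8 network (three routing stages).
    print(link_pairs(6, 3))
```

For destination 6 the first stage uses the lower pair and the remaining two stages use the upper pair, in line with the thick-line connection described for Figure 3.2.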
Contrary to the topology used with the routing algorithms introduced in Section
3.2.2, every output link in each SE is connected to a particular input link of the SE in
the next stage. This allows us to treat cells of different service priority by appropriately
confining all the high priority traffic to specific output links. In fact, we decided to
route high priority cells through $OL_0$ or $OL_2$ if the corresponding routing tag bit is a
0 or a 1, respectively. Consequently, the high priority cells will be confined to $IL_0$ and
$IL_2$. However, the other links in the network can also carry high priority cells. In case
more than one cell arrives at the same SE with the same routing bit
value, the top two high priority cells will win contention and the others will be dropped.
In the example of Figure 3.3, three arriving cells ('b', 'c', 'd') have a routing bit =
'0', with cells 'b' and 'd' having high priority and cell 'c' having low priority. Also
there is one arriving cell ('a') having routing bit = '1' and high priority. Cells 'b' and
'd' are routed through $OL_0$ and $OL_1$, respectively, and cell 'c' is dropped. Cell 'a' is
routed through $OL_2$ and an IDLE cell is routed through $OL_3$.
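The contention rule in this example can be sketched as a small deterministic function. This is not the algorithm of Appendix B, which is not reproduced here: the `Cell` type, the function name, and the tie-breaking rule (cells of equal priority keep their arrival order) are assumptions made for illustration only.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    name: str
    bit: int              # routing tag bit examined by this SE
    high_priority: bool


def route_in_se(cells: list[Cell]) -> tuple[list[Optional[str]], list[str]]:
    """Assign up to four arriving cells to OL_0..OL_3 of one SE.

    Bit 0 competes for (OL_0, OL_1), bit 1 for (OL_2, OL_3). Within a group,
    high priority cells go first; ties keep arrival order (an assumption).
    Returns the cell name on each output link (None = IDLE) and the dropped cells.
    """
    outputs: list[Optional[str]] = [None] * 4
    dropped: list[str] = []
    for bit, base in ((0, 0), (1, 2)):
        group = [c for c in cells if c.bit == bit]
        group.sort(key=lambda c: not c.high_priority)   # stable sort
        for slot, cell in enumerate(group):
            if slot < 2:
                outputs[base + slot] = cell.name
            else:
                dropped.append(cell.name)
    return outputs, dropped


if __name__ == "__main__":
    arrivals = [Cell("a", 1, True), Cell("b", 0, True),
                Cell("c", 0, False), Cell("d", 0, True)]
    print(route_in_se(arrivals))
```

With the four cells of the Figure 3.3 example, 'b' and 'd' take $OL_0$ and $OL_1$, 'a' takes $OL_2$, $OL_3$ stays idle, and 'c' is dropped.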
A h ,  we decided to adopt a determinktic muting algorithm rather than a random 
one. We found out this is better for two masons. Firstly. the arehimure of the SE is 
l e s  complicated became no random number generator unit is d i n g .  Secondly, the 
routing algorithm is simpler k u s e  no random number is checked. All the design 
~ m e s  will be d i d  later in Chapter 5. Appendix B pmvide the detaik d tbe 
Stage0 Stage 1 Stage2 2: 
Figure 3.2: New BG structure. 
Figure 3.3: A SE muting decision example. 
new detennhktic routing algorithm 
3.4 Summary 
In this chapter we have discussed the old and new architectures of the BG network.
We have provided a historical background of the previous architectures of the BG network.
Previously, two routing algorithms were proposed for the BG network. However, in
this dissertation we proposed a new routing algorithm which is simpler than the previous
ones. The new routing algorithm necessitated some modifications to the topology
of the network. These new modifications did not affect the network complexity or
modularity. Also, the proposed routing algorithm can handle two levels of priority
cells. It tries to confine the high priority cells to specific internal links. In Chapter 6
we discuss the fault tolerance properties of the BG network in more detail.
Chapter 4 
Performance Under Uniform and 
Non Uniform Traffic
4.1 Introduction 
In this chapter, we investigate the performance of the BG network with finite buffering
resources in both the IPCs and the OPCs. We use both uniform and non-uniform
traffic loads. The load types we use are URT and bursty. The parameters used to
measure the performance are the cell loss ratio, maximum cell delay, average cell
delay, input buffer requirements, and output buffer requirements. The reader should
be aware of the fact that the throughput (TP) and the cell loss ratio are related by:
TP = 1 - cell loss ratio (4.1)
As mentioned earlier, the results of the BG network will be compared with the
performance of both the crossbar and ideal nonblocking networks. We first provide
an overview of the buffering techniques that are used to enhance the performance of
switch fabrics. Next, we discuss the performance under URT with the introduction of
an analytical model for the pipelined BG network. The next section discusses the
performance of the three architectures under different bursty traffic loads. We conclude
this chapter by investigating the performance under various loads of non-uniform
traffic.
4.2 Buffering Strategies 
Due to the bursty nature of the traffic loads in broadband packet switching systems,
buffering cannot be avoided even when using an ideal nonblocking switch fabric. An
example of this ideal nonblocking switch fabric is the knockout switch with N = L.
This ideal fabric, in any switching cycle, is capable of fulfilling all the requests issued
by the arriving cells at the input ports of the fabric. However, only one cell can depart
from any of the output ports in a switching cycle. Accordingly, buffering should be
adopted, otherwise the switch will suffer from cell loss. Notice that buffering cannot
totally eliminate cell loss, but it can reduce it to acceptable levels. Adding buffers is
not a penalty-free solution towards minimizing cell loss, for two reasons.
Firstly, buffering requires adding extra hardware to the system in the form of memory
elements and control circuitry. We know that the speed of memory degenerates as
the size increases, placing a limit on the maximum memory size that could be used.
However, the speed of the storage system can be increased by bit slicing [27], which
is an expensive solution. Secondly, cells which are stored within the buffers suffer
from delay. In fact, the larger the buffer size, the bigger the delay, which imposes
another limit on the buffer size used. In general, some applications are very sensitive
to delay, such as real-time multimedia applications. Both peak and average cell delay
values are two parameters used to negotiate the QoS during connection establishment
in broadband communication systems. Notice that cell loss and cell delay
work against each other. For example, if we try to improve the level of cell
loss, which is normally achieved by increasing buffer sizes, we end up experiencing
poor levels of cell delay, and vice versa.
Buffering strategies take different configurations depending on the switch
architecture. We study the buffering strategies used in space division architectures in the
next section.
4.2.1 Input/Output Buffering 
A switch fabric can be pure-input buffered, pure-output buffered, or input-output
buffered as depicted in Figure 4.1. In the pure-input buffering strategy, a buffering
mechanism is located at each input line of the fabric. In each switching cycle, the
fabric routes the cells at the HOL in the input buffers to the output ports. The input
buffers can then operate in either a lossy mode or a backpressure mode. In the lossy
mode, the input ports remove the unsuccessfully routed cells from the input buffers and
bring the cells at the next locations to the HOL to participate in the next switching
cycle. In the backpressure mode, by contrast, the input ports are notified of the successfully
routed cells through an acknowledging mechanism that is distributed all over the fabric.
Based on the received acknowledgments, the input ports remove only
the successfully routed cells from the input buffers and keep the unsuccessful ones. It
is obvious that a fabric operating in the lossy mode has a simpler hardware complexity
compared to a fabric operating in the backpressure mode. This is due to the existence
of the backpressure mechanism distributed all over the switch fabric adopting the
backpressure mode. However, the performance in the lossy mode is not acceptable for
broadband communication systems where low levels of cell loss have to be guaranteed.
Several selection techniques, proposed in the literature and listed in [5], are used
to resolve the routing decision in ideal nonblocking networks, such as the crossbar
network. Recall that only one cell can be served by each output port in any switching
cycle. Thus, these selection techniques are used to resolve the situation when more
than one cell requests the same output destination. All these selection techniques
agree that under uniform random traffic a pure-input buffered ideal nonblocking
network has a maximum throughput ($TP_{max}$) of $2 - \sqrt{2} = 0.586$ as $N \to \infty$ and a
FIFO input buffer size $B_i \to \infty$. This is due to the HOL blocking problem encountered
at the input buffers as a result of the backpressure mode. Clearly, using the pure-input
buffering strategy is not an optimal solution to improve the performance of switch
fabrics.
Figure 4.1: Input-output buffer strategies: (a) pure-input, (b) pure-output, (c) input-output buffering.
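The HOL blocking limit quoted above can be checked with a short Monte Carlo experiment. The sketch below is not the simulator used in this work; it models a saturated, FIFO, pure-input buffered ideal nonblocking switch in which every input always has a cell at the HOL, and contention at an output is resolved by random selection. The function name, the random selection policy, and the parameter values are assumptions.

```python
import math
import random


def saturated_hol_throughput(n_ports: int, cycles: int, seed: int = 0) -> float:
    """Estimate the saturation throughput of a FIFO input-queued nonblocking switch."""
    rng = random.Random(seed)
    # Every input always holds a HOL cell; only its destination is stored.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(cycles):
        contenders: dict[int, list[int]] = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)            # one cell served per output
            hol[winner] = rng.randrange(n_ports)   # next queued cell reaches the HOL
            delivered += 1
        # Losing cells stay at the HOL and block the cells queued behind them.
    return delivered / (n_ports * cycles)


if __name__ == "__main__":
    print(saturated_hol_throughput(32, 5000), 2 - math.sqrt(2))
```

For a finite N the estimate sits slightly above $2 - \sqrt{2}$ and approaches it as N grows.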
In the pure-output buffering strategy, buffers only exist at the output ports. In an
ideal nonblocking fabric, the pure-output buffering strategy outperforms the pure-
input buffering strategy provided that both buffers are of infinite size and $N \to \infty$.
$TP_{max}$ is 100% for pure-output buffering compared to 58.6% for pure-input buffering.
The average cell delays in both the pure-input ($D_{in}$) and pure-output ($D_{out}$) buffering
strategies are given by:
$D_{out} = \frac{p}{2(1-p)}, \quad 0 \le p < 1,$ (4.3)
where p represents the applied load. Figure 4.2 depicts the delay performance of both
strategies.
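As a quick numerical illustration, the output-buffered delay expression (4.3), read here as $D_{out} = p / (2(1-p))$ for $0 \le p < 1$, can be evaluated at a few loads. The function name is hypothetical, and the delay is in units of switching cycles under the infinite-buffer, $N \to \infty$ assumptions stated above.

```python
def d_out(load: float) -> float:
    """Average cell delay of pure-output buffering, Eq. (4.3) as read here."""
    if not 0.0 <= load < 1.0:
        raise ValueError("load must satisfy 0 <= p < 1")
    return load / (2.0 * (1.0 - load))


if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8, 0.9):
        print(p, d_out(p))
```

The delay stays below one cycle up to a load of about 0.67 and grows without bound as the load approaches 1.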
The superiority of the pure-output buffering strategy described above is not
enjoyed under all conditions, especially when the buffer sizes are finite. Pure-output
buffering does not improve the performance of some switching architectures at all.
For example, those architectures where only one cell can be accepted by each output
port in any switching cycle will not benefit from output buffering if the departure
rate (sometimes called consumption rate) is 1 from the OPCs. The reason is that
after each switching cycle no cell will be left to be buffered in the OPCs.
Figure 4.2: Behavior of $D_{in}$ and $D_{out}$.
We conclude from the above discussion that pure-input and pure-output buffering
strategies are not the best solutions. Instead, a hybrid of both strategies could lead
to a better compromise. Several studies reported in [5] have supported this fact. With
the input-output buffering strategy we gain the advantage of both strategies. We gain
the capability of the input buffers to combat internal and output blocking. We also
gain the reduction of the HOL blocking enjoyed by the output buffering strategy. An
ideal network does not need input buffering because the network does not suffer from
any internal or output blocking.
4.2.2 Internal Buffering 
In the literature [66], four internal buffering strategies have been proposed. These
strategies are depicted in Figure 4.3 for a 2 x 2 SE. Internal buffering is mainly used
in single path MINs, because using internal buffering in multipath MINs results in
an out-of-sequence problem. Several efforts have been reported [67] to study the
performance of banyan networks with internal buffering and under different loads
of uniform and non-uniform traffic. If we exclude the input buffering strategy, it is
found that under uniform random traffic the banyan network with internal buffering
can attain a $TP_{max}$ of 100% when the buffering budget $\to \infty$. In case of pure-input
buffering, a $TP_{max}$ of 75% is achieved under the same assumption of infinite buffering
budget. Also, all the studies have agreed that the shared buffering strategy has the
best performance amongst all others if the buffer sizes are finite. However, shared
buffering has the highest hardware complexity. Thus, it has been argued in [68] that
the crosspoint buffering strategy is the best choice if both complexity and performance
are considered.
Figure 4.3: Different internal buffering styles: (a) input, (b) output, (c) crosspoint, (d) shared buffering.
We mentioned in the previous section that real fabrics have limited buffer
resources. As we approach these real limits we discover that internal buffering is not as
attractive an alternative as it appears with the assumption of infinite buffer resources.
For example, in [6], the performance of a 64 x 64 banyan network with internal buffering
was studied. The network was built from 4 x 4 SEs, each having a shared
buffer of size 64 cells. The performance was studied using five different buffer managing
schemes under uniform and non-uniform traffic loads. The schemes are no controls
(NC), pushout (PO), backpressure (BP), restricted backpressure (RBP), and delayed
pushout (DPO). The simulation results showed that the best and the worst cell loss
ratio levels are obtained when using the DPO and NC schemes, respectively. Even with
this considerable amount of internal buffering, which is 2880 cells in the whole fabric,
the obtained levels of cell loss ratio are very high, as depicted in Figure 4.4 under URT.
For example, when adopting the best internal buffer managing scheme, which is
the DPO, the best cell loss ratio level of lo-' can only be achieved under a load of
25%. Obviously, this performance is not acceptable for broadband applications. In
general, internal buffering is not desirable for the following reasons:
Figure 4.4: Performance of a 64 x 64 banyan with internal shared buffering [6].
• it results in the out-of-sequence problem in multipath networks, as we discussed
earlier.
• it complicates the internal structure of the SEs. This is reflected in the design
aspects, fault diagnosis, testing, area, speed, power planning, etc.
• the complexity/performance ratio is high compared to other solutions used
to improve the performance of MINs, such as pipelining, replication, and dilation.
In our analysis, we compare the performance of the BG network with both the crossbar
and the ideal nonblocking networks. We have chosen the ideal nonblocking network
because it has the best performance that can ever be achieved. We have also chosen
the crossbar network because it has good internal nonblocking characteristics. The
reference model used for the three fabrics is the same as the model shown in Figure
4.1.c. We decided to use only the input-output buffering strategy. That is, no internal
buffering is adopted in the three fabrics due to the reasons discussed above.
We briefly provide an overview of the accuracy of the simulation results obtained
in this dissertation. Let H be a random variable representing the outcome of one
of our simulation experiments. We assume that H follows a normal distribution with
both unknown mean $\mu$ and variance $\sigma^2$. A $100(1 - \alpha)$ percent two-sided confidence
interval on $\mu$ is given by [69]:
$\bar{H} - t_{\alpha/2,\,m-1} \frac{S}{\sqrt{m}} \le \mu \le \bar{H} + t_{\alpha/2,\,m-1} \frac{S}{\sqrt{m}}$ (4.4)
where m is the number of simulation trials at each point, $\bar{H}$ is the average of the m
simulation trials, S is the standard deviation of the m simulation trials, and $t_{\alpha/2,\,m-1}$
is the value of the t-distribution with $m - 1$ degrees of freedom. In all our simulation
trials we use $\alpha = 0.05$ and $m = 20$.
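The confidence interval described above can be computed as in the following sketch. It uses the standard two-sided t-interval; the function name and the sample values are hypothetical, and the critical value 2.093 is the tabulated $t_{0.025,19}$, which applies only to $\alpha = 0.05$ and $m = 20$.

```python
import math
import statistics

# Tabulated upper 2.5% point of the t-distribution with 19 degrees of freedom.
T_CRIT_ALPHA_005_M_20 = 2.093


def confidence_interval(samples: list[float],
                        t_crit: float = T_CRIT_ALPHA_005_M_20) -> tuple[float, float]:
    """Two-sided confidence interval on the mean of the simulation outcomes."""
    m = len(samples)
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)            # sample standard deviation (m - 1 divisor)
    half_width = t_crit * s / math.sqrt(m)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical throughput values from 20 independent simulation trials.
    trials = [0.90 + 0.01 * (k % 2) for k in range(20)]
    print(confidence_interval(trials))
```

For a different number of trials or confidence level, the critical value has to be replaced by the corresponding tabulated value.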
4.3 Uniform Random Traffic
URT is perhaps the most common traffic used to study the performance of switch
fabrics, for the following reasons. Firstly, the overhead needed to generate the URT is
small compared to other traffic models. Secondly, analytical modelling is feasible for
almost every switching architecture. Thirdly, URT provides more realistic loads than
the previously famous permutation traffic. In the literature, almost all the proposed
switching architectures are mainly studied under URT loads. The URT assumes the
following: 1) the traffic arrives at the input ports with the same probability, which
is equal to the average applied load to the network p, and 2) the arriving cells at each
of the input ports select their destinations randomly with equal probability.
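The two URT assumptions translate directly into a traffic generator. The sketch below produces the arrivals for one switching cycle; the function name and the use of `None` for an idle input are illustrative choices, not part of the simulator used in this work.

```python
import random
from typing import Optional


def urt_arrivals(n_ports: int, load: float, rng: random.Random) -> list[Optional[int]]:
    """Generate one switching cycle of uniform random traffic.

    Each input port receives a cell with probability `load`; the destination
    of an arriving cell is uniform over the N output ports. None = no arrival.
    """
    return [rng.randrange(n_ports) if rng.random() < load else None
            for _ in range(n_ports)]


if __name__ == "__main__":
    rng = random.Random(1)
    cycle = urt_arrivals(256, 0.5, rng)
    print(sum(cell is not None for cell in cycle), "cells arrived out of 256 inputs")
```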
4.3.1 Analytical Modelling 
Analytical modelling is the vehicle used to validate simulation results. In the
literature, many efforts have been exerted to model the switching architectures under
uniform and non-uniform traffic loads. The analytical model we introduce [70] here
can be used to describe the performance of the BG network when implemented in
either the pipelined fashion or the parallel backpressure fashion discussed at the end of
Section 2.4.2.2. Before we discuss our model in detail, we introduce the following
notation:
K = Number of parallel planes.
$t_r$ = Reservation cycle.
T = Switching cycle.
$P_i(t)$ = Probability that a buffer has i cells at switching cycle t.
d(t) = Throughput after every $t_r$ (t is in steps of $t_r$).
TP($\tau$) = Throughput after every T ($\tau$ is in steps of T).
Notice that,
$P_0(t) = 1 - \sum_{k=1}^{B_i} P_k(t)$
In our model we assume the following:
• The probability of cell arrival is the same for all input links.
• Arriving cells will request outputs with equal probability.
• Cells arrive only at the beginning of each switching cycle; that is, no fresh cells
arrive in each reservation cycle.
• The cells depart from the output buffer at a rate of 4 x K cells every switching
cycle.
Figure 4.5: Discrete Markov chain of the input buffer status (a) at the beginning of
every T, and (b) after every $t_r$.
The first two assumptions represent URT and imply that any SE located in any
stage is indistinguishable from the other SEs belonging to the same stage. As a
result, each stage can be characterized by any of its SEs. The third assumption
enables splitting the discrete Markov chain, which is used to describe the input buffer
status, into two as shown in Figure 4.5. The fourth assumption is basically equivalent
to assuming an infinite output buffer. Figure 4.5.a shows the Markov chain for the
input buffer status at the beginning of every switching cycle, whereas Figure 4.5.b
shows the Markov chain for the input buffer status after every reservation cycle. From
Figure 4.5.a we can write:
And from Figure 4.5.b we can write:
where 
d(t) = KO 
1 -POW (4.12) 
and S(t) is the probability of propagating a cell from an input port of the BG network
to the required output under URT. This is explained in Appendix C. The resulting
throughput of the network is given by:
The buffer status is initialized to represent an empty buffer as follows:
Table 4.1 introduces both simulation and analytical results of the $TP_{max}$ with $B_i = 0$.
Notice that a unity $TP_{max}$ for the simulation results does not necessarily mean zero
cell loss; it means no cell was lost during our simulation trials. The number of cells
which were applied in any of our simulation experiments was at least $10^7$. That is,
a cell loss level of $10^{-7}$ is assured in case of unity $TP_{max}$. We observed that as we
increase the capacity of the input buffer to two cells for a network of two data planes,
we obtain a unity $TP_{max}$ for both analytical and simulation results. This describes
the efficiency of the BG network. We also found out that the size of the input buffer
marginally affects the performance of the network composed of only one data plane
as shown in Table 4.2. This agrees with the fact, which we will discuss later, that input
queuing has less impact on the performance of the network than output queuing.
Table 4.1: Analytical and simulation results for the BG network with zero input
buffering
4.3.2 Finite Output Buffer 
In this aeetion, we ind iga te  the paformane of the BG network compared with 
the ermrbar and the ideal nanhloddng nenork. B G K  is a pipelimed BG network 
compmed of K data plane.. Similarly, C-barK is a piplioed C-bar network 
composed of K data plane.. Thee -Its are obtained for a buffer budget of SWO 
cells per input and output queue. E m p t  for the crossbar network, ewh input queue 
length is set to 1WO eeUs and output queue b set to 4WO cells. Sinee Che eonsumption 
rate is a m e d  to he 1 in dl our airnulation experiments, oo output buffering is 
| Load | Ideal | BG-1 | C-bar-1 | Crossbar-2 |
| 30 | < $10^{-7}$ | < $10^{-7}$ | < $10^{-7}$ | < $10^{-7}$ |
Table 4.3: Cell loss under URT for different network types of N = 256
required for the crossbar-1 network. Accordingly, all the buffer budget is applied only
to the input queues of the crossbar-1 network. For the crossbar-2 network each input
queue is set to 2500 cells and each output queue is set to 2500 cells. For the other
crossbar-K networks, where K ≥ 3, the input queues are set at 1000 cells and the
output queues are set at 4000 cells. Also, the network size N is set to 256.
Table 4.3 provides the cell loss performance for the networks under consideration.
Figures 4.6 and 4.7 show the performance of the cell delay. The
buffering requirements are also depicted in Table 4.4. All the above results suggest
that a single data plane BG network and the crossbar-2 network are sufficient to
satisfy the requirements of the URT. It is also noticeable that the performance of the
BG-1 network is the closest to the ideal network. HOL blocking at the input buffers is
the main reason that the input buffering requirements of the crossbar configurations
are larger. HOL blocking is a consequence of the inefficient cell relaying behavior of
the crossbar network. This inefficient behavior makes the crossbar architecture rely
more on the input buffers than on the output buffers. As discussed in Section
4.2.1, HOL blocking is more profound in input buffered networks. Channel grouping,
virtual output queues, and windowing are well-known mechanisms used to ease HOL
blocking in input buffered architectures as discussed in [71]. However, these mechanisms
lead to the out-of-sequence problem. Besides, their hardware implementation
is expensive.
We conjecture that the sufficiency of both the BG and the C-bar-2 networks for the
requirements of the URT is due to the nature of the URT itself. Because the output
ports are selected randomly with equal probabilities, the binomial distribution can be
used to represent the probability ($P_r$) that an output port is requested by $r$ input
ports in any switching cycle as:
$$P_r = \binom{N}{r} \left(\frac{1}{N}\right)^{r} \left(1 - \frac{1}{N}\right)^{N-r}, \qquad 0 \le r \le N \qquad (4.13)$$
Figure 4.8 is based on Equation 4.13. Figure 4.8 tells that almost 99.9% of the input
port requests to a single output port are within the range from 0 up to 5. Since,
in one switching cycle, the crossbar-2 can only deliver 2 cells at most to any single
output port, its input buffering requirements are larger than those of the BG-1, which can deliver
up to 4 cells to a single output port. This also explains the better delay performance
of the BG-1 network over the C-bar-2 network.
Bursty traffic is a traffic type in which the inputs of the switch fabric receive successive
bursts of cells [a]. The on-off model is the least complex and the most widely
used to model bursty sources in simulation. The model can describe each of
the arriving streams to the switch network [73]. The model assumes that the
source generates cells in a bursty manner: one period (on period) followed
by an idle one (off period). The lengths of both periods are independently
Figure 4.6: Average cell delay under URT for different network types.
Table 4.4: Input/output buffering requirements under URT for different network types of N = 256.
Figure 4.8: Probability density for the number of input requests for a single output port (horizontal axis: number of input requests, 0 to 5).
evaluated from two geometric distributions given by [74]:
where L is the period length in cells, 0 ≤ R < 1 is the random number generated, and
0 < p ≤ 1 is the inverse of the average period length in cells. The cells arriving at each
input line in a burst are destined to the same output line. This output is selected
randomly. URT can be considered as a special case of the bursty traffic with L = 1.
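The on-off source described above can be sketched as follows. This is a minimal illustration, not the generator used in the thesis: it assumes the common inverse-transform form L = 1 + ⌊ln(1 − R)/ln(1 − p)⌋ for a geometric period, which may differ in detail from the expression in [74]. The function names, the choice of off-period mean (picked so that the long-run fraction of busy slots equals the offered load), and the use of `None` for an idle slot are all assumptions made for the example.

```python
import math
import random

def geometric(p, rng, start):
    """Geometric length on {start, start+1, ...} with success probability p."""
    if p >= 1.0:
        return start                      # degenerate case: shortest period
    r = rng.random()                      # uniform random number, 0 <= R < 1
    return start + int(math.log(1.0 - r) / math.log(1.0 - p))

def on_off_slots(load, mean_burst, n_outputs, n_slots, seed=0):
    """Return one entry per cell slot: a destination index, or None if idle."""
    rng = random.Random(seed)
    p_on = 1.0 / mean_burst               # inverse of the average on period
    # Assumption: off periods may be empty and have mean
    # mean_burst * (1 - load) / load, so busy/(busy + idle) equals the load.
    p_off = load / (load + mean_burst * (1.0 - load))
    slots = []
    while len(slots) < n_slots:
        dest = rng.randrange(n_outputs)   # one random output for the whole burst
        slots.extend([dest] * geometric(p_on, rng, 1))
        slots.extend([None] * geometric(p_off, rng, 0))
    return slots[:n_slots]
```

With `mean_burst = 1` every burst is a single cell with an independently chosen destination, which matches the remark that URT is the special case L = 1.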
Table 4.5 depicts the cell loss ratio for single plane configurations under loads of
bursty traffic of various burst lengths. The results in Table 4.5 imply that a single
data plane BG network can attain very low levels of cell loss under heavy bursty
traffic loads of average burst lengths L ≤ 10. Moreover, the BG-1 network has good
cell loss levels with L > 10.
Figures 4.9, 4.10, 4.11, and 4.12 represent the average cell delay,
maximum cell delay, input buffer requirements, and output buffer requirements,
respectively, for single data plane configurations of the three networks under test. The
results suggest that the BG-1 configuration outperforms the
crossbar-1 configuration. For example, the average cell delay of the
BG-1 configuration almost coincides with the average cell delay of the ideal network
as depicted in Figure 4.9.a. However, for L > 15 and load greater than 80% the
average cell delay of the BG-1 network grows faster than the average cell delay of
the ideal network. The reason is that as the burst length increases, longer bursts are
created at the IPCs. These bursts concentrate for longer periods on their destined
OPCs, raising the probability of output blocking. A similar behavior can be observed
for the maximum cell delay in Figure 4.10.
As shown in Figure 4.11, the input buffer of the crossbar-1 overflows at a load less
than 40%, which is the load after which the input buffer overflows for the crossbar-1
under URT. We can also deduce that the input buffer of the crossbar-1 overflows
at lower loads as L increases. Similar to the results we obtained under URT loads,
the ideal network has no input buffer requirements for the burst lengths we use. As
expected, the buffering requirements of the BG-1 configuration diverge from the
ideal network for L ≥ 15 and load greater than 80%. As depicted in Figures 4.11.c and
4.11.d, the input buffer overflows for the BG-1 under the aforementioned loads. This
explains the higher cell loss ratio (> $10^{-7}$) for the BG-1 configuration in Table 4.5.
Figure 4.12 implies that, similar to our observation for the performance under
URT, the crossbar-1 configuration does not need any output buffering.
Table 4.5: Cell loss ratio under bursty loads of various burst lengths.
To gain a deeper understanding of the crossbar and the BG networks we compare
the performance of their configurations that have more planes (K > 1) with the
ideal network performance. Two immediate advantages are obtained by using
configurations of multiple planes. Firstly, the performance is greatly enhanced for low
performance architectures, such as the crossbar-1 configuration. Secondly, the system
availability is improved because failure in any of the planes is compensated by the
other functional ones, although the performance degrades as one or more planes are
out of service. Figure 4.13 depicts the number of data planes needed to reach a $10^{-7}$
cell loss ratio, for each of the three networks, under bursty traffic load up to 90% of
different average burst lengths. Our criterion for comparing the performance of the three
architectures under test will be based on the number of planes shown in Figure 4.13
and listed in Table 4.6.
Figures 4.14, 4.15, 4.16, and 4.17 depict the simulation results of the average cell
delay, maximum cell delay, input buffer requirements, and output buffer requirements,
respectively, for the configurations listed in Table 4.6. The results suggest that the
delay performance of the proposed configurations is very similar as depicted in
Figures 4.14 and 4.15. However, the results also imply that operating the switching
Figure 4.9: Average cell delay for different average burst lengths: (a) L = 5, (b) L = 10,
(c) L = 15, (d) L = 20.
Figure 4.10: Maximum cell delay for different average burst lengths: (a) L = 5, (b)
L = 10, (c) L = 15, (d) L = 20.
Figure 4.11: Input buffer requirements for different average burst lengths: (a) L = 5,
(b) L = 10, (c) L = 15, (d) L = 20.
Table 4.6: Number of planes followed to compare the multiple plane configurations
of the architectures under test.
Figure 4.12: Output buffer requirements for different average burst lengths: (a) L = 5,
(b) L = 10, (c) L = 15, (d) L = 20.
Figure 4.13: Required number of planes to attain a $10^{-7}$ cell loss ratio with different
average burst lengths (horizontal axis: average burst length L = 1, 5, 10, 15, 20).
networks under heavy loads is not recommended due to the high delays experienced
by the traffic waiting in the input and output queues of the network. This delay
can exceed 3600 switching cycles under 90% bursty load with L = 20 as depicted in
Figure 4.15.d. This also applies to the ideal network, where a maximum cell delay of
almost 3000 switching cycles can be experienced as shown in Figure 4.15.d.
The output buffering requirements are also very similar for the three networks as
depicted in Figure 4.17, except under loads greater than 80% and L ≥ 5. However, the
input buffering requirements differ as depicted in Figure 4.16. As expected, the ideal
network has the lowest input buffering requirements, followed by the BG network. The
crossbar-3 configuration has the highest input buffering requirements
because it is less efficient than the ideal and the BG configurations in relaying
the traffic from the IPCs to the OPCs. The drop in the input buffer requirements
for the BG configurations between Figures 4.16.b and 4.16.c occurs because the BG
configuration in Figure 4.16.b consists of one plane, whereas it consists of two planes in
Figure 4.16.c. Since the input buffering requirements of the crossbar-3 configuration
are the highest, one might think that the output buffering requirements of the
crossbar configuration should be lower. In fact, the routing algorithm of the
crossbar architecture does not treat all the OPCs with equal priority. Hence, those
OPCs that are favored need more buffering capacity, giving rise to the output buffering
requirements of the crossbar configurations. The impact of that effect lessens as the
number of planes increases. A fair routing algorithm for the crossbar network is
expensive in terms of hardware complexity [19]. Obviously, this is an advantage of
the BG architecture over the crossbar architecture.
Figure 4.14: Average cell delay for different average burst lengths: (a) L = 5, (b)
L = 10, (c) L = 15, (d) L = 20.
Figure 4.15: Maximum cell delay for different average burst lengths: (a) L = 5, (b)
L = 10, (c) L = 15, (d) L = 20.
Figure 4.16: Input buffer requirements for different average burst lengths: (a) L = 5,
(b) L = 10, (c) L = 15, (d) L = 20.
Figure 4.17: Output buffer requirements for different average burst lengths: (a) L = 5,
(b) L = 10, (c) L = 15, (d) L = 20.
4.5 Non-Uniform Traffic 
In all the traffic loads discussed previously, we assumed that the OPCs are equally
(randomly) selected by the arriving traffic at the IPCs. However, in real broadband
communication networks it is expected that the arriving traffic will select the OPCs
in a non-uniform way. In this section, we use a modified form of the model presented
in [7] for non-uniform traffic generation. The model in [7] assumes that the OPC
mapping for each incoming cell is determined by the binomial distribution, where for
$1 \le i \le N-2$:
$$o_{i,j} = \Pr[\text{a cell arriving at the } i\text{-th input is destined to the } j\text{-th output}], \qquad (4.17)$$
and $r_i$ is the probability associated with input $i$. For the binomial distribution, the
maximum probability occurs for $j = \lfloor N r_i \rfloor$. If $\mathrm{OPC}_{N-i-1}$ is chosen to receive the
highest percentage of the traffic arriving at input $i$, then we get:
$$r_i = 1 - \frac{i}{N-1} \qquad (4.18)$$
For $\mathrm{IPC}_0$ and $\mathrm{IPC}_{N-1}$, the address of $\mathrm{OPC}_j$, where $0 \le j \le N-1$, is given by the
normalized Poisson-like distribution with rate $r$ as follows:
The advantage of the above model is that it gives a substantial number of hot
spots (instead of only one or two), which is more realistic as the number of the switch
ports is relatively high. However, the above model deals differently with the IPCs, as
$\mathrm{IPC}_0$ and $\mathrm{IPC}_{N-1}$ follow a different distribution. Also, the value of $r_i$ for the other
IPCs is different, which results in a different distribution for each IPC as depicted in
Figure 4.18.
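The relation between $r_i$ and the most heavily loaded output can be checked numerically. The sketch below is an illustration under two assumptions that are not spelled out in the text above: that the binomial in Equation 4.17 is taken over $N-1$ trials, so that $j$ ranges over $0, \dots, N-1$, and that $r_i = 1 - i/(N-1)$. The function names are hypothetical.

```python
from math import comb

def opc_distribution(i, n_ports):
    """Assumed form of o_{i,j}: Binomial(N - 1, r_i) over output index j."""
    r = 1.0 - i / (n_ports - 1)           # assumed form of r_i
    n = n_ports - 1
    return [comb(n, j) * r**j * (1.0 - r)**(n - j) for j in range(n_ports)]

def hottest_output(i, n_ports):
    """Index j of the output that receives the largest share from input i."""
    pmf = opc_distribution(i, n_ports)
    return max(range(n_ports), key=pmf.__getitem__)
```

Under these assumptions, `hottest_output(i, N)` returns `N - i - 1` for every `i` between 1 and `N - 2`, so each of those inputs has its own hot spot, which is the behavior the paragraph above describes.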
Figure 4.18: OPCs selection probability for different IPCs according to the model in
[7].
Table 4.7: Cell loss ratio for single plane architectures under non-uniform traffic load.
In our modified model, we assume that all the IPCs will select the OPCs with the
same binomial distribution with a fixed value of $r_i$. In the beginning of each simulation
experiment, the value of $r_i$ is determined and each IPC is randomly associated with
an OPC that is not associated with any other IPC. For each IPC, the associated OPC
is located at the center of the binomial distribution. That way we solve the problems
we discussed above and add another advantage of randomizing the OPC selection
in the beginning of the simulation experiment. We first study the performance if
the arriving traffic has an average burst length L = 1 and then later we study the
performance under different burst lengths (L > 1).
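One way to realize the modified model in a simulator is sketched below. It is an illustration only: the text does not say how the binomial is aligned with the associated OPC or what happens at the edges of the port range, so the sketch assumes that a Binomial(N − 1, r) draw is shifted to put its peak on the associated OPC and that indices wrap modulo N. All names are hypothetical.

```python
import random

def make_destination_sampler(n_ports, r, seed=0):
    """Return (centers, sample), where centers[ipc] is the OPC tied to each IPC."""
    rng = random.Random(seed)
    centers = list(range(n_ports))
    rng.shuffle(centers)                  # one distinct OPC per IPC
    mode = int(n_ports * r)               # floor(N r): peak of Binomial(N - 1, r)

    def sample(ipc):
        # Binomial(N - 1, r) draw as a sum of Bernoulli trials
        k = sum(rng.random() < r for _ in range(n_ports - 1))
        # Assumption: shift the peak onto the associated OPC, wrapping modulo N
        return (centers[ipc] + k - mode) % n_ports

    return centers, sample
```

Because the permutation is drawn once per experiment, every IPC sees the same shape of distribution while the location of its hot spot changes from run to run.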
Table 4.7 lists the cell loss ratio for single plane configurations of the architectures
under test. As shown, the crossbar-1 configuration cannot satisfy the $10^{-7}$ cell loss ratio
at loads above 40%, while the BG-1 configuration can satisfy the $10^{-7}$ cell loss ratio
for loads up to 90%. Figure 4.19 shows the behavior of the performance parameters
of the various single plane configurations and the crossbar-2 configuration under different
loads of non-uniform traffic with L = 1. Up to 90% load, the performance of the BG-1
configuration almost coincides with the performance of the ideal architecture.
Again, the performance of the crossbar-1 configuration lags far behind both the BG-1 and
the ideal configurations. Although the crossbar-2 could fulfill the $10^{-7}$ cell loss ratio,
its performance could not match the performance of the BG-1 configuration as
shown in Figure 4.19. This result is similar to what we found under URT loads.
One interesting observation can be made by comparing the performance of similar
configurations when operated under the URT and the non-uniform load. In general,
we can observe an enhancement in the performance under non-uniform loads
for all configurations. However, this enhancement is more pronounced in the crossbar
configurations. For example, in Figure 4.19.c the input buffer requirement for the
crossbar-2 configuration under 90% non-uniform load is almost 37, while in Table
4.4 under 90% URT load the crossbar-2 configuration has an input buffer requirement
of 112. Under non-uniform loads, the chances of multiple IPCs destining to one
single OPC are lessened because each source is now targeting a certain band of the
OPCs. Accordingly, the non-uniform traffic of L = 1, to some extent, resembles the
permutation traffic. We know from the discussions in Section 2.4.1 that the crossbar
architecture has a perfect performance under permutation traffic.
The results we obtained under non-uniform bursty loads for single plane configurations
support the same argument we stated under non-uniform loads with L = 1, that there
is a slight enhancement in the performance under non-uniform traffic loads. This
could be explained by the same reason we have given above. However, we observed
an interesting phenomenon for the BG-1 configuration. Although the delay (average
and maximum) performance is slightly better under non-uniform bursty loads than
under uniform bursty loads, the input buffer requirements are found to be slightly
higher under non-uniform bursty loads than under uniform bursty loads. On the
contrary, the output buffer requirements under non-uniform bursty loads are lower
than under uniform bursty loads. Tables 4.8 and 4.9 depict that phenomenon. We
attribute this phenomenon to the internal blocking characteristics of the BG
architecture. As we mentioned above, non-uniform traffic resembles permutation traffic.
That is, output blocking is less pronounced under non-uniform traffic. The lower
output buffer requirements indicate that the longer input buffer queues are not due
to overloaded output queues. The only other factor which can lead to the increased
input buffer requirements is internal blocking. However, the overall performance of
the BG-1 configuration is better under non-uniform bursty loads. This stems from the
fact that the effect of internal blocking in the BG architecture is minimal. Previously
it has been shown [65] that the BG architecture has very low internal blocking
characteristics.
Figure 4.19: Performance parameters of different configurations under different loads
of non-uniform traffic.
Table 4.8: Input buffer requirements for BG-1 configuration under uniform and
non-uniform bursty loads.
| Load | L = 5, Uni | L = 5, Nonuni | L = 10, Uni | L = 10, Nonuni |
| 10 | 8.1 ± 0.308 | 9 ± 0.316 | 15.7 ± 0.644 | 15 ± 0.590 |
| 20 | 10.8 ± 0.313 | 10.8 ± 0.254 | 20.3 ± 0.670 | 21.2 ± 0.384 |
| 30 | 11.7 ± 0.130 | 11.3 ± 0.159 | 26.6 ± 0.??? | 23.3 ± 0.268 |
| 40 | 13.5 ± 0.245 | 13.2 ± 0.238 | 26.9 ± 0.600 | 26.3 ± 0.508 |
| 50 | 15.8 ± 0.313 | 15.7 ± 0.376 | 32.7 ± 0.651 | 30.7 ± 0.676 |
| 60 | 17.7 ± 0.316 | 19.3 ± 0.408 | 37.6 ± 0.7?? | 37.8 ± 0.798 |
| 70 | 21 ± 0.446 | 23.4 ± 0.563 | 42.9 ± 0.621 | 46.7 ± 1.249 |
| 80 | 28.8 ± 0.849 | 40 ± 2.203 | 58.5 ± 0.724 | 84.7 ± 4.363 |
| 90 | 64.06 ± 1.610 | 81.3 ± 2.051 | 109 ± 1.617 | 142.1 ± 3.493 |

| Load | L = 15, Uni | L = 15, Nonuni | L = 20, Uni | L = 20, Nonuni |
| 10 | 23.8 ± 0.750 | 25.6 ± 1.??2 | 32.2 ± 1.927 | 28.6 ± 0.934 |
| 20 | 30.9 ± 1.200 | 31.8 ± 0.761 | 38.2 ± 0.722 | 39.5 ± 0.990 |
| 30 | ?1.9 ± 1.008 | 36.9 ± 1.052 | 47.2 ± 0.997 | 45.4 ± 1.423 |
| 40 | 40.9 ± 1.168 | 37.9 ± 1.028 | 53.8 ± 1.723 | 145.9 ± 57.989 |
| 50 | 43.1 ± 0.634 | 44.2 ± 1.005 | 55.9 ± 1.079 | 59.9 ± 1.377 |
| 60 | 53.1 ± 1.079 | 56.5 ± 1.112 | 67 ± 1.384 | 69.2 ± 1.307 |
| 70 | ??.6 ± 1.011 | 62.8 ± 1.353 | 82.8 ± 1.326 | 87.5 ± 2.988 |
| 80 | 81.1 ± 1.786 | 120.2 ± 6.955 | 108.7 ± 2.714 | 151.4 ± 4.483 |
| 90 | 988.6 ± 4.895 | 1000 ± 0.000 | 1000 ± 0.000 | 1000 ± 0.000 |
Table 4.9: Output buffer requirements for BG-1 configuration under uniform and
non-uniform bursty loads.
4.6 Summary 
The numerical results in this chapter have demonstrated the efficient performance of
the BG network under various traffic loads. With finite buffering, a single data plane
256 x 256 BG network has been shown to be sufficient to meet the requirements
of the URT loads up to 90%, while a two plane crossbar has been found to
perform less efficiently, with higher buffer requirements, under the same loading
conditions. This is an indication that the routing capability of the BG network is more
efficient than that of the crossbar configurations. Additionally, a two plane 256 x 256
BG network has been shown to be sufficient to meet the requirements of bursty
traffic of different average burst lengths. A three plane crossbar configuration is found
to be less efficient, with higher buffering requirements, under uniform bursty traffic
loads. Furthermore, a two plane BG configuration had lower buffering requirements
than a two data plane crossbar configuration under non-uniform traffic loads. This
is an interesting result because the non-uniform traffic model we used resembles the
permutation traffic, which is the traffic type under which a single data plane crossbar
network has an exact zero cell loss ratio. In general, we noticed that the performance
of the BG network configurations is close to that of the ideal nonblocking network. It is
obvious from the above discussions that the BG architecture is a very strong candidate
for broadband communication networks.
Chapter 5 
Design of the Balanced Gamma 
Network 
5.1 Introduction 
In this chapter, we introduce the design of the BG network. We first describe the
design methodology recommended by CMC for the design flow using deep submicron
(DSM) technology. Because our design features built-in self-test (BIST), we review
the BIST methods used in the design of VLSI systems, shedding light on the BIST
mechanism we adopt in the BG network. Then, we illustrate the architectural design
of the BG network in detail. The structure of each module in the design hierarchy
is discussed, emphasizing the complexity and timing requirements. We also describe
our approach for design verification of the whole system. Finally, we present the
simulation results for each module in the system.
5.2 Design Flow, Functional Test and Verification
The design flow followed in this dissertation is based on the design flow recommended
by CMC for DSM technologies [76]. As depicted in Figure 5.1, the first four steps
of the design flow are carried out using Synopsys CAD tools and the rest are carried
out using Cadence CAD tools. In the beginning, the architecture of the network is
described in VHDL at the RTL level, followed by simulation to verify the functionality.
The next step is to synthesize the RTL description. It is recommended in the
Synopsys documentation that large designs, such as the BG network, not be imported
directly to the synthesis tool. Instead, a hierarchical bottom-up approach should be
followed. This is because importing large designs can crash the synthesis tool
and in some cases may result in an unoptimized design. Accordingly, we divided the
architecture of the BG network into modules. Each module is designed, synthesized,
and tested separately and used as a building block in forming the final network design.
In general, dividing the design of any system has several merits. Firstly, the
debugging process is simpler when building high level modules from low level ones.
For example, sometimes the Synopsys synthesis CAD tool produces logical errors in
the resulting gate level description of the design. It becomes intractable to detect
and correct these logical errors in large designs. With a design composed of smaller
submodules, one can more easily point at the submodules that are malfunctioning.
Secondly, we can comfortably introduce testing features to the system under
consideration. In fact, the process of building a testing mechanism may fail for large systems,
because the larger the system, the more inaccessible its internal parts become. The
third reason is the reusability of the designed modules in future systems.
Finally, the design process was carried out to the fourth step, gate-level simulation,
shown in Figure 5.1.
5.3 Design for Testability 
To meet the quality and reliability requirements of today's complex communication
networks, efficient testing methodologies are necessary at all levels of system design
(system, board, IC, etc.). In practice, the quality of such systems is measured by
[Figure 5.1 flow: RTL Simulation, Synthesis, Verilog Simulation (Synopsys); Place & Route, Design Verification (Cadence)]
Figure 5.1: Design flow recommended by CMC.
the average time during which it functions correctly without any fault, and is termed
mean time to failure (MTTF), as we will see in Chapter 6. High MTTF guarantees
uninterrupted reliable network services to customers. This can be achieved by
incorporating built-in test capabilities that monitor the system functionality periodically
in the field against any failures, as well as reliable hardware components that are
thoroughly tested for structural defects. In [8], a comprehensive and detailed treatment
of digital systems testing can be found.
The conventional test methods in digital circuits are constantly challenged by
increasing speed and circuit size, which demand sophisticated automatic test equipment
(ATE). This results in very high costs associated with test hardware, test
generation, and test application time. Furthermore, at-speed testing (verification of system
functionality at the rated speed) requires high-performance ATE, which results in a
manifold increase in their price. BIST offers a test methodology where the test
functions are embedded into the circuit itself. The test functions in BIST are localized to
the circuit, thereby facilitating at-speed test and a substantial reduction in test
application time. Furthermore, BIST provides easy access to the embedded components
and interconnections of the system without any special test requirements. For the
above reasons we decided to adopt a BIST strategy in the design of the BG network for
testing.
In [76], an overview of the digital BIST techniques that are particularly
applicable to telecommunication systems is presented. This article discussed the BIST
techniques used at different system levels. In the next section we discuss BIST
methods in more detail.
5.3.1 BIST Methods 
BIST methods can be divided into two groups as depicted in Figure 5.2.
BIST
  Off-line: Functional, Structural
  On-line: Concurrent, Nonconcurrent
Figure 5.2: BIST methods [9]
In on-line BIST, testing occurs during normal functional operating conditions; i.e., the circuit
under test (CUT) is not placed in a test mode where normal functional operation is
locked out. Concurrent on-line BIST is a form of testing that occurs simultaneously
with normal functional operation. Concurrent on-line BIST uses redundancy for
testing. Redundancy can be achieved by using either coding techniques or duplication
and comparison. An example of a coding technique is the parity bit added to check
the correctness of the data transmitted over a system bus. In the duplication and
comparison approach, the CUT is replicated and a checker circuit is used to take
the final decision, which is based on the majority decision given by the replicas. In
nonconcurrent on-line BIST, testing is carried out while a system is in an idle state.
The test process can be interrupted at any time so that normal operation can resume.
Testing in nonconcurrent on-line BIST is often accomplished by executing diagnostic
software routines or diagnostic firmware routines.
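The parity-bit example of a coding technique can be made concrete with a short software model. This is a generic illustration of even parity, not the checker used in the BG network; the function names and the 8-bit default word width are assumptions.

```python
def add_even_parity(word, width=8):
    """Append one parity bit so the coded word holds an even number of 1s."""
    word &= (1 << width) - 1              # keep only the data bits
    parity = bin(word).count("1") & 1     # 1 when the data has odd weight
    return (word << 1) | parity

def parity_ok(coded):
    """Checker: True when the received word still has even parity."""
    return bin(coded).count("1") % 2 == 0
```

A single flipped bit anywhere in the coded word changes its weight from even to odd, so the checker flags it while the bus keeps carrying normal traffic. An even number of flipped bits goes undetected, which is the usual limitation of a single parity bit.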
Off-line BIST deals with testing a system when it is not carrying out its normal
functions. Functional off-line BIST deals with the execution of a test based on a
functional description of the CUT and often employs a functional or high-level fault
model. Such a test is implemented as diagnostic software or firmware. Structural
off-line BIST deals with the execution of a test based on the structure of the CUT.
Fault coverage is based on detecting structural faults.
On-line BIST methods offer two advantages over off-line BIST methods. In on-line
testing, the main function of the CUT is not interrupted, as is the case in off-line BIST
during the test application. Also, the diagnostic information becomes continuously
available. However, on-line BIST methods suffer from two main drawbacks [8].
Firstly, the amount of hardware complexity added is, on average, higher than for
off-line BIST methods. In some cases, the added complexity may reach up to 50% of
the total area of the IC. Secondly, the design effort is larger than for off-line BIST methods
because caution has to be taken not to alter the state or function of the CUT. Indeed,
a large design effort translates to a longer design phase, which is not preferred as the
time-to-market is a crucial factor in the VLSI design process. One advantage of using
a structural off-line BIST method is the flexibility of the design process. The designer's
effort is reduced to selecting the approach and the stimulation strategy that fit the
CUT and applying them automatically. In other words, the design process is more
standardized than for on-line BIST methods, where the design process is significantly
influenced by the architecture of the CUT.
To summarize, both on-line and off-line BIST methods have pros and cons, and
choosing the proper method is always a tradeoff between extra hardware and test
application time. However, to achieve a good level of testability a hybrid of both
techniques can be used. Accordingly, in this dissertation a testing mechanism which
incorporates both methods is proposed. We use a combination of a structural
off-line BIST method and a nonconcurrent on-line BIST method. Before we present that
testing mechanism, we first discuss some issues regarding the structural off-line BIST
method.
5.3.2 Structural Off-Line Architectures and Stimulus Structures
The fault model which is assumed by structural testing methods [8] is the well-known
single permanent stuck-at fault model. The model assumes that all the components
of the CUT are fault free and faults occur only in the connecting nets. If a fault
occurs in one of these nets, then the net has a fixed logic value v (v ∈ {0, 1}) and the fault is denoted
as s-a-v. The testing mechanism generates the stimuli to expose these expected faults
by assuming that only one fault exists in the CUT. In other words, each stimulus is
selected to expose only a single stuck-at fault in the CUT. The single stuck-at fault
model ignores other types of faults such as: 1) multiple stuck-at faults, 2) stuck-at
open faults, 3) bridging faults where two or more nets are shorted, and 4) delay
faults, especially if the test is carried out at lower speed than the normal operating
speed [76]. That means there is a possibility that a testing mechanism that employs
the single stuck-at fault model may fail to detect a fault. However, experience has
demonstrated the validity and the efficiency of the single stuck-at fault model due to
the following reasons:
• some faults which are not covered in the fault model may be covered with the
used stimuli; and
• detecting delay faults can be achieved to a large extent if a BIST mechanism is
used, because the testing mechanism can run at the normal operating speed of
the CUT.
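The single stuck-at model lends itself to a small fault-simulation sketch. The circuit below, y = (a AND b) OR c with one internal net n1, is a hypothetical example chosen for brevity and is not part of the BG design; the function names are likewise assumptions. Each fault forces one net to a fixed value, and a stimulus detects the fault when the faulty output differs from the fault-free one.

```python
from itertools import product

NETS = ("a", "b", "c", "n1", "y")

def simulate(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c; fault = (net, v) forces that net to v."""
    def net(name, value):
        return fault[1] if fault and fault[0] == name else value
    a, b, c = net("a", a), net("b", b), net("c", c)
    n1 = net("n1", a & b)
    return net("y", n1 | c)

def detected_faults(vectors):
    """Set of single stuck-at faults exposed by at least one input vector."""
    vectors = list(vectors)               # allow generators to be reused
    faults = [(n, v) for n in NETS for v in (0, 1)]
    return {f for f in faults
            if any(simulate(*x) != simulate(*x, fault=f) for x in vectors)}

all_vectors = list(product((0, 1), repeat=3))
compact_set = [(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
```

For this circuit the four vectors in `compact_set` expose the same ten faults as the eight exhaustive vectors, which illustrates why fault simulation is used to judge a stimulus set rather than its length alone.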
The architecture of a structural BIST system should contain three main
components: 1) a stimulus structure which generates the test stimuli, 2) the non-BIST
mode circuitry or CUT, and 3) an output response analyzer (ORA) which produces
the final decision as to whether the CUT is fault-free or not. These three components
can be totally separate or merged. Figure 5.3 depicts the different architectures of a
structural BIST system. In the merged architectures (parts b, c, and d of Figure 5.3),
savings in the hardware complexity of the whole system are realized due to sharing of
the memory elements located in the CUT with the stimulus and the ORA circuits.
However, in some cases this is not feasible due to some optimization considerations
that are imposed by the system requirements.
Figure 5.3: Different structural BIST architectures [9]
There are four approaches for designing the stimulus structures [9]:

exhaustive: all possible input stimuli are applied to the CUT. Since the amount of time spent on testing is limited, the number of circuit inputs that can be exhaustively stimulated is usually limited to $\lfloor \log_2(\text{maximum testing cycles}) \rfloor$.

pseudoexhaustive: the circuit can be partitioned into subcircuits, each dependent on a subset of the inputs, and thus can be stimulated exhaustively. The circuit is tested by testing the partitions.

random: a stimulus source with a fixed sequence is used to excite the CUT. The major advantage of this approach is the low cost of the stimulus structures. Its major disadvantage is the difficulty of designing a stimulus structure that produces all the stimuli required to expose the faults. The stimulus structure can be modeled by a finite state machine. Thus the designer is usually constrained to choosing a starting state and the number of test cycles. Fault simulation is required to determine which faults are exposed.

deterministic: the stimulus source is required to generate a specific set of stimuli. Analysis of the CUT determines the stimulus set. The structures used for a deterministic approach are usually based on finite state machines.

Table 5.1 depicts the design considerations for the approaches discussed above, where CL stands for combinational logic, SL stands for sequential logic, and ATPG stands for automatic test pattern generation. In the next section we introduce the chip architecture and a description of the testing mechanism we are proposing in this dissertation.

Table 5.1: Stimulus design approaches [9]
| Approach | Exhaustive | Pseudoexhaustive | Random | Deterministic |
| CUT Type | n-input CL | CL where the inputs can be grouped | CL, SL | CL, SL |
| CUT Analysis | none | structural analysis to assign groupings | structural analysis for some methods | structural and ATPG |
| Stimulus Structure | separate | separate | separate, merged | separate, merged |
| Test Effectiveness | 100% for CL faults | 100% for CL faults | requires fault simulation | may require fault simulation |
| Test Length | 2^n | depends on the grouping method | depends on fault simulation results | depends on generation method |
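The limit on exhaustive testing stated above can be illustrated with a short sketch. The function names and the cycle budget used in the usage note are illustrative assumptions and do not come from the design itself.

```python
def max_exhaustive_inputs(max_test_cycles: int) -> int:
    """Largest n with 2**n <= max_test_cycles, i.e. floor(log2(max_test_cycles))."""
    if max_test_cycles < 1:
        raise ValueError("at least one test cycle is required")
    # For positive integers, bit_length() - 1 equals floor(log2(x)) exactly,
    # which avoids floating-point rounding near powers of two.
    return max_test_cycles.bit_length() - 1


def exhaustive_test_length(n_inputs: int) -> int:
    """An exhaustive test of an n-input combinational circuit applies 2**n patterns."""
    if n_inputs < 0:
        raise ValueError("number of inputs cannot be negative")
    return 2 ** n_inputs
```

For instance, with a hypothetical budget of one million test cycles, `max_exhaustive_inputs(1_000_000)` gives 19 inputs, while a 12-input circuit such as the SE decision maker discussed later would need `exhaustive_test_length(12)` = 4096 patterns.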
Figure 5.4: Top level description of the BG network 
5.4 Chip Architecture
The BG network is a MIN that enjoys a modular architecture. The modular architecture makes the BG network attractive for VLSI implementation. A top level description of the network architecture is depicted in Figure 5.4. There are four main components in the design of the BG network. These components are 1) a 1 x 4 demultiplexer used as the building block for the first stage ($\text{Stage}_0$), 2) a 4 x 4 crossbar used as the building block for the next n - 1 stages, 3) a 4 x 1 multiplexer used as the building block for the output stage, and 4) the network main controller (NMC). We denote the 1 x 4 demultiplexer as 1 x 4 SE, the 4 x 4 crossbar as 4 x 4 SE, and the 4 x 1 multiplexer as OPC to follow the notation we use in this dissertation. We did not carry out the design of the IPC due to time constraints.
We tried to automate the design process by reducing the amount of effort needed when changing the features of the network. This is achieved by only changing a few parameters in a header file of the VHDL description. These parameters are the network size N and the output buffer size $B_{op}$. However, each component in the system still has to be tested and verified separately.
The design of the BG network is restricted to circuits based on synchronous logic rather than circuits based on asynchronous logic. This is to eliminate the problem of logic hazards, which can prevent asynchronous circuits from functioning correctly. Logic hazards can be eliminated by using redundant logic, but then the circuit becomes untestable. Several other reasons that favor synchronous circuits over asynchronous ones can be found in [8, 77, 78].
The testing mechanism we are proposing, as we mentioned in Section 5.3.2, incorporates both a structural off-line BIST method and a nonconcurrent on-line BIST method. In fact, using only one of these methods to reach a good testability level is either expensive or time consuming. If we use only a structural off-line BIST method, the network operation has to be stalled during the test procedure, causing disruption to the whole system. Also, if we only use the nonconcurrent on-line BIST method, the process of diagnosing and correcting the fault is very expensive in terms of the design effort. Accordingly, we use the nonconcurrent on-line BIST method to regularly check the system for any functional errors. If an error is detected, the structural off-line BIST method is activated to check the network for faults. If a fault is detected, the structural off-line BIST method diagnoses it and reconfigures the network to tolerate that fault. The nonconcurrent on-line BIST method is not covered in this dissertation, but we will discuss the structural off-line BIST method in more detail.
Before continuing our discussion, we review some issues that are related to the architecture of the network. The switching mechanism of the architecture we are proposing adopts a backpressure strategy. A switching cycle is composed of two periods: a reservation period (sometimes we call it the routing period) and a relaying period. In the reservation period, each IPC sends an internal header representing the cell at the HOL of the buffer that belongs to that IPC. The IPC creates this internal header during the translation of the VPI and VCI in the 5-byte header of the cell. This internal header contains the destination address and priority of the cell to be routed. The structure of this internal header will be explained later. The routing unit in each SE uses the arriving headers to set up a routing pattern for the main cells to be relayed. This process continues until these internal headers reach the OPCs. The output ports acknowledge the arriving headers. A header is positively acknowledged if 1) it successfully reaches its output destination and 2) there is room for the cell represented by that header in the buffer of its destination. The acknowledgments are returned to the IPCs by the switching mechanism. In the relaying period, the IPCs pass the bodies of the cells that were positively acknowledged during the reservation period.
We defined three types of cells, as shown in Table 5.2. Accordingly, the size of the internal header is n + 2 bits. Recall that n represents the number of stages in the BG network. Figure 5.5 depicts the structure of the internal header, where D is the address of the destination.

Table 5.2: Defined cell types.
Figure 5.5: Structure of the internal cell header.

Although a BG network of a single data plane can fulfill up to four broadcast requests as depicted in Figure 5.6, we decided not to include broadcast features in our design because of the added complexity to both the SEs and IPCs. Firstly, the complexity of the SEs increases because the routing unit in each SE has to account for the newly added priority. Secondly, the complexity of the IPCs increases because they have to coordinate between each other. This translates to extra communication channels and control units added to the circuitry of each IPC. This coordination is essential when more than four broadcast requests are issued at the main inputs of the network. For example, assume the case when one extra broadcast request is issued at the main input '5' of the network in addition to the four requests shown in Figure 5.6. This request will reach four different SEs of the same stage, arriving on a different input link at each of them. Each SE of that group receives 3 requests. Since the routing decision is the same in all SEs that belong to the same stage, and the request at the main input '5' is arriving at different input links of these SEs, the request arriving at the main input '5' will be treated differently in these SEs. This translates to the fact that some of these SEs may route and the others may not route the request coming from the main input '5'. In that case, the switching mechanism becomes very complicated in order to account for these diverse actions and to acknowledge back to the IPCs. Additionally, the IPCs have to keep track of the outputs that the broadcasted cells fail to reach and try to route these cells again to these outputs in subsequent switching cycles. Indeed, this process demands very complex algorithms which add more hardware to the system and may cause it to run at lower speeds. Accordingly, coordination is needed amongst the IPCs to minimize the complexity.
We believe a design which features broadcasting at an acceptable cost for the BG network is achieved by employing one of the following solutions. The first solution [79] is to precede the BG network with a copy network which generates N copies of each broadcasted cell, with each copy destined to one output port. We prefer this solution because it is suitable for the case of multicasting, where a cell is only destined to a subset of the output ports. The second solution is to coordinate amongst the IPCs as we mentioned above.
5.5 System Components 
5.5.1 Switching Element 
Figure 5.7 depicts the architecture of a 4 x 4 SE. The architecture of the 1 x 4 SE is a smaller version of the 4 x 4 SE. The differences between the two architectures will be highlighted as we proceed in our discussion. The architecture of the SE is divided into the following:
Input Buffer
The input buffer primarily holds the few header bits needed to take the routing decision in the SE. The buffer is composed of one queue in the case of 1 x 4 SEs and four queues in the case of 4 x 4 SEs. Each queue is a shift register of size n + 2 - i bits, where i is the stage order of the current SE. Notice that we do not need to save the whole address of the destination D in the internal header as we advance downstream. That is, each SE can skip the bit just after the priority bits ($P_1P_0$) when passing the headers to the SE of the next stage. If we use the full size of the internal header (n + 2 bits), we would require a number of cycles ($\text{cycles}_{old}$) given by:

$$\text{cycles}_{old} = (n+1)(n+2) \qquad (5.1)$$

to shift the headers through the network. In our approach, by contrast, we only need a number of cycles ($\text{cycles}_{new}$) given by:

$$\begin{aligned}
\text{cycles}_{new} &= \sum_{i=0}^{n} (n+2-i) \\
&= (n+1)(n+2) - \sum_{i=0}^{n} i \\
&= (n+1)(n+2) - \frac{n(n+1)}{2}. \qquad (5.2)
\end{aligned}$$

This means we have saved a total of $n(n+1)/2$ cycles in the reservation period. A block diagram of the input buffer of an SE at $\text{Stage}_i$ is shown in Figure 5.8.
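The cycle counts above are easy to check numerically. The following is a minimal sketch; the function names are our own and the value n = 4 in the usage note is only an example.

```python
def header_bits_at_stage(n: int, i: int) -> int:
    """Header bits held per queue at stage i (0 <= i <= n): n + 2 - i."""
    if not 0 <= i <= n:
        raise ValueError("stage index must satisfy 0 <= i <= n")
    return n + 2 - i


def cycles_full_header(n: int) -> int:
    """Shift cycles if every stage handled the full (n + 2)-bit header."""
    return (n + 1) * (n + 2)


def cycles_trimmed_header(n: int) -> int:
    """Shift cycles when each stage drops one destination bit (Equation 5.2)."""
    return sum(header_bits_at_stage(n, i) for i in range(n + 1))


def cycles_saved(n: int) -> int:
    """Difference between the two schemes, which equals n(n + 1)/2."""
    return cycles_full_header(n) - cycles_trimmed_header(n)
```

For n = 4 the full-header scheme needs 30 cycles, the trimmed scheme needs 6 + 5 + 4 + 3 + 2 = 20 cycles, and the saving is 10 cycles, in agreement with n(n + 1)/2.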
Pushout Multiplexers 
After the headers are received from the previous stage and the routing pattern is established within each SE, the pushout multiplexers are used to push the header bits to the SEs of the next stage. Each queue in the input buffer is served by an (n - i + 1) → 1 pushout multiplexer. The select lines for these multiplexers are fed from the sequencer, which we will describe later.
Decision Maker 
The decision maker is entirely a combinational logic circuit. It uses the first three bits (0-2) from each queue in the input buffer to take the routing decision. That is, the decision maker has 12 inputs in a 4 x 4 SE. The number of outputs in a 4 x 4 SE is also 12 bits, and they are divided as follows:

• Four pairs ($S_0, \ldots, S_3$) are used as selectors to decide which input link will be connected to which output link. The $S_i$ pair is used to decide which input link should be connected to $OL_i$, for $0 \le i \le 3$.

• The other four outputs are used to identify which inputs have received active cells and which ones have received idle cells. This is needed when shifting the headers to the next stage.

Figure 5.8: Block diagram of the input buffer bank of an SE at $\text{Stage}_i$.
We have discovered that the Synopsys synthesis tool does not generate good minimization levels for large combinational circuits such as the one we are dealing with here. Accordingly, we decided to use the well-known combinational logic minimization tool (Espresso) developed at the University of California, Berkeley [80, 81]. The following algorithm was followed because we also discovered that better minimization results are obtained if the problem is divided into smaller ones:

1. Find the Espresso minimization for the truth table of the 12 inputs and a subset of i outputs from the total 12 outputs.
2. Repeat the above procedure until all $\binom{12}{i}$ combinations are exhausted.
3. Repeat both steps 1 and 2 for all values of i, where $1 \le i \le 12$.

We then arrange the results obtained in ascending order for each value of i. Afterwards, we identify the best set of combinations that covers all outputs and results in the minimum hardware complexity. By trial and error, the set is found to contain two groups: (11, 10, 9, 8) and (7, 6, 5, 4, 3, 2, 1, 0). The latter group represents the $S_0, \ldots, S_3$ pairs and the former represents the outputs used to identify the active and idle arriving cells. That implies that there is a certain amount of correlation among the selector bits (7, 6, 5, 4, 3, 2, 1, 0). Similarly, the acknowledgment bits (11, 10, 9, 8) have a certain amount of correlation.
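The size of the search described by the three steps above can be sketched as follows. This is only an enumeration of the output subsets; the Espresso runs themselves are not modeled, and all names are illustrative assumptions.

```python
from itertools import combinations
from math import comb

NUM_OUTPUTS = 12  # outputs of the 4 x 4 SE decision maker


def output_subsets(i: int):
    """All subsets of i outputs (step 1 is run once per subset)."""
    return combinations(range(NUM_OUTPUTS), i)


def total_minimization_runs() -> int:
    """Number of minimizations over all subset sizes 1..12 (steps 2 and 3)."""
    return sum(comb(NUM_OUTPUTS, i) for i in range(1, NUM_OUTPUTS + 1))


def covers_all_outputs(groups) -> bool:
    """True if the groups use every output exactly once."""
    used = [bit for group in groups for bit in group]
    return sorted(used) == list(range(NUM_OUTPUTS))


# The grouping reported in the text.
chosen_groups = [(11, 10, 9, 8), (7, 6, 5, 4, 3, 2, 1, 0)]
```

Under this reading, the procedure amounts to 2^12 - 1 = 4095 minimization runs, and `covers_all_outputs(chosen_groups)` confirms that the two reported groups cover the 12 outputs without overlap.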
Arbiter 
The arbiter selects between the main inputs of the SE and the outputs of the buffer bank. In the reservation period, the arbiter selects the outputs of the pushout multiplexers to route the headers to the next stage. In the relaying period, the arbiter selects the main inputs of the SE.
Fault Tolerating Unit 
Notice that in Figure 5.7 the selection pairs $S_0$ to $S_3$ from the decision maker are feeding the fault tolerating unit. In case of faults in the SEs of the next stage, the fault tolerating unit modifies the select lines by giving priority to the cells routed to $OL_0$ and $OL_2$, because high priority cells are usually routed through these links. For example, assume the SE in the next stage that is connected to $OL_0$ is faulty. The fault tolerating unit will always direct the cells targeting $OL_0$ to $OL_1$. The cells that are destined to $OL_1$ will be negatively acknowledged, by the acknowledgment unit we will describe later, and will be tried in the next switching cycle. In case of no faults in the next stage SEs, the fault tolerating unit acts transparently so that the input select lines are transmitted to the outputs of the fault tolerating unit without any modification. The faulty SEs of the next stage are reported to the current SE through the "next stage failure status" word fed from the BIST circuitry of the SE.
Selection Unit 
The selection unit is the true crossbar part in the SE. Its function is to switch the arriving cells to their requested output links. The switching decision is passed to the selection unit through the modified select lines originating from the fault tolerating unit.
Acknowledgment Unit
The acknowledgment unit performs the reverse function of the selection unit. It receives the acknowledgments from the next stage and routes them back to the previous stage. Notice that the select lines that are used to feed the selection unit are the same ones that are feeding the acknowledgment unit. The reason is that the acknowledgment unit has to be offered the final routing decision to give the proper acknowledgment. Notice also that the select lines are ANDed with the next stage failure status word to block a positive acknowledgment that might arrive from the faulty SEs in the next stage.
Sequencer 
The sequencer plays a very important role in the operation of the SE, as well as in the operation of the whole network. The sequencer synchronizes the process of moving the header from one SE to another. Once it detects a negative edge on the triggering signal (START_IN) from an SE in the previous stage, denoting the start of sending the header bits, the sequencer activates the clock of the input buffer to receive the header bits from the previous stage. It also inserts one wait clock cycle until the routing decision settles within the decision maker. Then, it alerts the SEs of the next stage by a negative edge on a triggering signal (START_OUT) similar to the one it previously received from the previous stage. Then it iterates n - i + 1 cycles through the select lines fed to the pushout multiplexers discussed previously. Figure 5.9 depicts the ASM chart describing the sequencer function.
Figure 5.9: ASM chart of the SE sequencer.

Testing Unit
The testing unit is not depicted in Figure 5.7 for simplicity. The testing unit represents the off-line structural BIST part of the testing mechanism we are proposing. It is obvious that the off-line structural BIST method is distributed throughout the network, since every SE has its own testing unit. We decided to follow the strategy depicted in Figure 5.3 so that we do not complicate the structure of the internal core of the SE. A complex core translates to a slower system speed, and speed is critical since we are targeting a system speed of 155.52 Mbps (OC-3c). We also followed the random approach for test stimuli design discussed in Section 5.3.2 because of its simplicity and cost effectiveness in the case of our design. The fault coverage ranged
f rom8j%forr lx4SEa92%brlx4SEs .  
During the normal mode of operation, the testing unit acts transparently such that the core of the SE is connected to all other SEs in the network. During the testing mode, the testing unit isolates the core from the other parts of the network. Figure 5.10 depicts the block diagram of the testing unit during the test period. The linear feedback shift register (LFSR) represents the stimulus circuit, and both the multiple input shift register (MISR) and the signature analyzer represent the ORA. The other blocks are used to test the links between the current SE and the other SEs in both the previous and the next stages. The downstream failure status buffer holds the status of the SEs in the next stage. The status word out of this buffer is used to feed the acknowledgment and fault tolerance units of the current SE. Also, this buffer is serially connected to the SEs that surround the current SE and exist in the same stage. This serial connection is used during the test period when checking the testing mechanism itself. The testing unit steps through 10 states. The first state is when the network is functioning in its normal mode of operation. Stepping through these states is achieved by the network main controller (NMC). We will list these states in more detail when we discuss the NMC.
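To make the roles of the LFSR (stimulus) and the MISR (response compaction) concrete, the following is a minimal software sketch. The register width, the feedback taps, the seed, and the stand-in circuit `toy_cut` are all assumptions chosen for illustration; they are not the values used in the actual testing unit.

```python
def lfsr_patterns(seed, taps, width, count):
    """Yield `count` successive states of a left-shifting Fibonacci LFSR."""
    mask = (1 << width) - 1
    state = seed & mask
    for _ in range(count):
        yield state
        feedback = 0
        for tap in taps:
            feedback ^= (state >> tap) & 1  # XOR of the tapped bits
        state = ((state << 1) | feedback) & mask


def misr_signature(responses, taps, width, seed=0):
    """Compact a sequence of `width`-bit responses into one signature word."""
    mask = (1 << width) - 1
    state = seed & mask
    for response in responses:
        feedback = 0
        for tap in taps:
            feedback ^= (state >> tap) & 1
        # Shift with feedback, then fold the parallel response into the state.
        state = (((state << 1) | feedback) ^ response) & mask
    return state


def toy_cut(x):
    """Stand-in 4-bit combinational circuit (purely illustrative)."""
    return (x ^ (x >> 1)) & 0xF


TAPS, WIDTH = (3, 2), 4  # assumed 4-bit register with feedback from bits 3 and 2
patterns = list(lfsr_patterns(seed=1, taps=TAPS, width=WIDTH, count=15))
golden = misr_signature([toy_cut(p) for p in patterns], TAPS, WIDTH)
```

In this sketch the signature analyzer's job reduces to comparing the final MISR word against `golden`; a response that differs in a single cycle always changes the signature here, because the MISR state update is an invertible linear map.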
Table 5.3 depicts the estimated distribution of the hardware complexity in gates for the different SEs in a 16 x 16 network, as estimated by the Synopsys 1998.02-2 design_analyzer tool. A gate is a 2-input NAND gate. The results are based on 0.35 µm CMOS technology. The last row in the table represents the hardware complexity of an SE belonging to the first stage. The results in Table 5.3 suggest that the decision maker constitutes a substantial share of the total complexity of a 4 x 4 SE in the second stage and above. In fact, the maximum speed of the SE is determined by the speed of the decision maker, through which the critical path of the SE passes. Table 5.4 shows the delays for the critical paths of the SEs that belong to the 16 x 16 network. The estimated delay through the decision maker is 2.53 ns, which roughly represents 44.5% of the total delay of the critical paths in 4 x 4 SEs. Clearly, this is a very high percentage. Luckily, the effect of the critical path of the decision maker is felt only during the routing period of the network. To reduce the effect of the critical path in a 4 x 4 SE, the sequencer adds an extra wait clock cycle before routing the bits of the internal header to the next stage. That way we add extra latency to the system. However, the ratio of the number of the added clock cycles (n) to the total number of clock cycles in one complete switching cycle is insignificant. We will present the total number of clock cycles in one switching cycle after discussing the other components.

Figure 5.10: Block diagram of the testing unit during test period.
5.5.2 Output Port Controller 
The design of the OPC we propose here is composed of one single buffer queue. That is, we do not assume a scheduling mechanism of multiple service queues. As depicted in Figure 5.11, the OPC has the following blocks:
Input Buffer
The input buffer has the same function as the input buffer bank of the SEs. However, the input buffer in the OPC holds only the priority bits ($P_1P_0$), which are two bits/input. The input buffer has the same architecture as the input buffers used in the SEs, except it does not contain any pushout multiplexers. This is because we do not need to push the header bits to any subsequent stage.
Sequencer 
The sequencer of the OPC has nearly the same architecture as the sequencer used in the SEs. However, it adds extra delay cycles between the routing period and the relaying period. These delay cycles are needed until the acknowledgment decision is transferred to the IPCs and the first bit of the cell is relayed across the network until it reaches the OPC. The number of these delay cycles ($\text{cycles}_{delay}$) is given by:

$$\text{cycles}_{delay} = 2 + n + n = 2n + 2. \qquad (5.3)$$

The first two cycles are required until the logic settles in the OPC, the next n cycles are required until the acknowledgment reaches the IPCs, and the last n cycles are required until the first bit of the relayed cells reaches the OPCs. Figure 5.12 shows the ASM chart of the sequencer.

Figure 5.11: Architecture of the OPC.
Table 5.4: Delays of the SEs critical paths.
Sorter 
During the routing period, the sorter is responsible for sorting the arriving cells in a descending order based on their priorities. This is essential for two reasons: 1) to ease the process of placing the cells into the buffer, and 2) to ease the process of identifying idle cells from active ones. The sorter also provides the pointer controller with the number of active arriving cells. The sorter is composed of two combinational circuits. The first circuit is an 8-input 11-output circuit called the decision maker. Eight of the 11 outputs are used to feed a 4 x 4 crossbar network, which represents the second circuit. The other three output bits are used to hold the count of the arriving cells (0 → 4). These three bits are feeding the pointer controller, which will be described in the next section. The crossbar passes the arriving cells in a descending order to the buffer controllers.
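A behavioral sketch of the sorter is given below. The priority encoding is an assumption made for illustration (0 stands for an idle cell and larger values for higher priority); the actual codes are those defined in Table 5.2. Breaking ties by the lower input link is likewise an assumption.

```python
IDLE = 0  # assumed code for an idle cell; larger values mean higher priority


def sort_arrivals(priorities):
    """Return (order, count) for the priority codes seen on the input links.

    order[k] is the input link whose cell takes sorted position k (descending
    priority, ties resolved in favor of the lower link number), and count is
    the number of active (non-idle) arriving cells.
    """
    # sorted() is stable, so equal priorities keep their link order.
    order = sorted(range(len(priorities)), key=lambda link: -priorities[link])
    count = sum(1 for p in priorities if p != IDLE)
    return order, count
```

For example, with a high priority cell on link 2, low priority cells on links 0 and 1, and an idle cell on link 3, `sort_arrivals([1, 1, 2, 0])` yields the order `[2, 0, 1, 3]` and a count of 3; the count is what the pointer controller would receive.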
Pointer Controller 
The pointer controller provides a pointer that points at the next empty location in the output buffer. This value is used by the buffer controllers and the acknowledgment decision unit. The value of the pointer for the next switching cycle (pointer(t + 1)) is given by:

$$pointer(t+1) = pointer(t) + arrival(t) - departure(t+1); \quad 0 \le pointer(t) \le B_{op} + 1, \qquad (5.4)$$

where pointer(t) and arrival(t) are the value of the pointer and the number of arriving cells at the end of the previous cycle, respectively, departure(t + 1) is the indication of whether a cell will depart from the buffer or not in the current switching cycle, and $B_{op}$ is the size of the buffer in the OPC. Notice that at most one cell is allowed to depart from the OPC in any switching cycle, i.e. $departure(t) \in \{0, 1\}$. When the value of pointer(t) is 0 it means the buffer is totally empty, and when it is $B_{op} + 1$ it means the buffer is totally full and no cells can be placed in the buffer.
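Equation 5.4 can be exercised with a few lines of code. The sketch below is a behavioral model only; the function name and the choice to raise an error when the result leaves the range 0 to B_op + 1 are our own assumptions (in the hardware, the acknowledgment mechanism is what keeps the pointer in range).

```python
def next_pointer(pointer: int, arrivals: int, departure: int, buffer_size: int) -> int:
    """Pointer update of Equation 5.4 for one switching cycle."""
    if departure not in (0, 1):
        raise ValueError("at most one cell may depart per switching cycle")
    if not 0 <= arrivals <= 4:
        raise ValueError("an OPC receives between 0 and 4 cells per cycle")
    new_pointer = pointer + arrivals - departure
    if not 0 <= new_pointer <= buffer_size + 1:
        raise ValueError("pointer left the range 0 .. buffer_size + 1")
    return new_pointer
```

With a buffer of 6 cells, a previous pointer of 0, two arrivals, and one departure, `next_pointer(0, 2, 1, 6)` returns 1.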
Acknowledgment
The acknowledgment process goes through three steps in the OPC. The first step is carried out by the block "Acknowledge Decision" in Figure 5.11. In that step an initial estimate of the acknowledgment word is formed based on the buffer occupancy and the pointer value. This initial acknowledgment is meant for the sorted cells. Accordingly, in the second step of the acknowledgment process, the "Acknowledge" block shown in Figure 5.11 redirects this initial acknowledgment to its proper order of the arriving cells. The Acknowledge block has the same structure as the acknowledgment unit of the SE. In the third step, a correction has to be made to the output from the Acknowledge block because the initial acknowledgment is not based on the status of the cells, whether idle or active. This correction is made by ANDing the status of each cell with its corresponding output from the Acknowledge block.
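The three acknowledgment steps can be sketched behaviorally as follows. How the number of free buffer locations is derived from the pointer is not modeled; it is passed in as the hypothetical parameter `free_slots`, and the rule "sorted position k is accepted if k < free_slots" is an assumption made for illustration.

```python
def acknowledge(order, active, free_slots):
    """Return the per-input-link acknowledgment bits for one switching cycle.

    order[k]   : input link of the cell in sorted position k (from the sorter)
    active[j]  : True if input link j carries an active (non-idle) cell
    free_slots : assumed number of cells the buffer can still accept
    """
    # Step 1: initial estimate, expressed in sorted order.
    initial = [k < free_slots for k in range(len(order))]

    # Step 2: redirect the estimate back to the arrival (input link) order.
    redirected = [False] * len(order)
    for k, link in enumerate(order):
        redirected[link] = initial[k]

    # Step 3: AND each bit with the cell status so idle cells are never acknowledged.
    return [ack and is_active for ack, is_active in zip(redirected, active)]
```

For example, with sorted order `[2, 0, 1, 3]`, an idle cell on link 3, and room for two cells, the result is `[True, False, True, False]`: the cells from links 2 and 0 are accepted, the cell from link 1 is refused for lack of room, and the idle link is masked by step 3.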
Figure 5.12: ASM chart of the OPC sequencer.
Figure 5.13: Block diagram of the buffer controllers.
Buffer Controllers
The number of the buffer controllers is $B_{op} + 1$. Each buffer controller connects two successive locations in the output buffer. Figure 5.13 depicts a block diagram of the buffer controllers. All controllers are fed from the pointer controller and the four sorted outputs from the 4 x 4 crossbar. If the value (i - pointer) is negative, where i represents the location of controller $C_i$, then location i in the buffer is occupied. If that value is positive and less than 4, then the controller places cell (i - pointer) from the crossbar at location i of the buffer. That way we can simultaneously write to 4 sequential locations of the buffer in the same switching cycle. If the output buffer is empty and departure = 1, the current design allows a fresh arriving cell to depart from the OPC without placing it in the first location of the buffer. This reduces the latency in the buffering mechanism of the OPC and thus the whole network.

Notice that all the buffer controllers are identical, except controllers $C_0$ and $C_{B_{op}}$. $C_0$ connects location 1 of the buffer to the outside world. $C_{B_{op}}$ is not fed from any buffers because it is located at the end of the queue. All other controllers are fed from one location and feed to the next location in the buffer queue. That is, controller $C_i$ is fed from the output of location i + 1 and feeds to the input of location i.
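The placement rule based on (i - pointer) can be sketched as below. Two points are assumptions on our part: an offset of 0 is treated as selecting the first sorted cell, and a location is only written if a sorted active cell actually exists for that offset.

```python
from typing import Optional


def cell_for_location(i: int, pointer: int, arrivals: int) -> Optional[int]:
    """Index of the sorted cell that controller C_i writes at location i.

    Returns None when location i is already occupied (negative offset) or
    when no arriving cell maps to it in this switching cycle.
    """
    offset = i - pointer
    if 0 <= offset < min(arrivals, 4):
        return offset
    return None
```

With the pointer at 2 and three active arrivals, locations 2, 3, and 4 receive sorted cells 0, 1, and 2 in the same switching cycle, while every other location is left unchanged.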
Buffer
The buffer is composed of $B_{op}$ banks. Each bank is a basic shift register equal to the size of the cell. It is connected to two controllers as described above in Figure 5.13. Both the size of the cell and the size of the output buffer $B_{op}$ are tuned in the parameter file we mentioned earlier in Section 5.4.
Testing Unit 
The testing unit of the OPC is similar to the testing unit of the SE. Accordingly, here too the earlier discussion for the testing unit of the SE is applicable. However, the testing unit of the OPC has the extra function of testing the buffer. Testing the buffer is achieved by pushing a sequence of 0s followed by a sequence of 1s and checking whether the output of the buffer is following these sequences. Notice that in Figure 5.11, the buffer is shown separated from the buffer controllers as the testing unit is inserted between both. The testing unit of the OPC also acts transparently in the normal mode of operation of the network.

Table 5.5 depicts the distribution of the hardware complexity for the OPC. The complexity of the OPC is marginally dependent on the network size N, where N only affects the number of delay cycles needed between the arrival of the header and the arrival of the first bit of the cell. The results in Table 5.5 are based on a buffer size of 6 cells and a cell size of 2 bytes.
Table 5.5: Distribution of hardware complexity for OPC (in gates).
| Block | Seq. | Comb. |
| Decision Maker | - | 231.41 |
| Sequencer | 40.98 | 38.59 |
| Pointer Controller | 58.7 | 77.71 |
| Input Buffer | 44.07 | - |
| Acknowledge | - | 42.95 |
| Crossbar | - | 23.63 |
| Buffer Controllers | - | 79.48 |
| Buffer | 392.14 | - |
| Testing Unit | 373.39 | 228.76 |
| Others | - | 90.52 |
| Total | 909.27 | 813.05 |

5.5.3 Network Main Controller
The NMC plays a vital role in the operation of the BG network. The ASM chart describing the functioning of the NMC is depicted in Figure 5.14.

Figure 5.14: The ASM chart of the network main controller.

The network can operate either in a normal mode or a testing mode. In the normal mode, the NMC initiates the routing period of the SEs that belong to $\text{Stage}_0$ by bringing all the START_IN signals of these SEs to logic 0. The NMC keeps these signals low for n + 2 cycles, which is the number of cycles needed to push the internal headers to the SEs of $\text{Stage}_0$. Then, the NMC brings all the START_IN signals again to logic high and waits a number of cycles that are needed to continue the routing period. The total number of clock cycles in each switching cycle is given by:
where cellsize is in bytes. The network operates in the normal mode indefinitely as long as the input signal test_start to the NMC is not activated. When it is activated the NMC changes to the testing mode. In the testing mode, the NMC activates the testing units in the SEs and OPCs. The NMC also enables these testing units to step through 10 states. During each state part of the testing protocol is executed. The states of the testing unit in the SE are:

Isolate: the testing units isolate the cores of the SEs and the OPCs from all other components.

Check_mechanism/Check_buffer: for the SEs, this state is called check_mechanism and for the OPCs it is called check_buffer. During this period a sequence of 1s, followed by a sequence of 0s, is shifted through the testing mechanism to check the serial links between the testing units for any faults. The output sequence is collected by the NMC and checked for faults. For an OPC, the testing unit shifts sequences of 1s and 0s through the memory elements of the buffer and checks the output sequence for faults.

Check_links_1: the first state to check the links for s-a-1 faults. Each testing unit sends 0s across all output links and receives 0s over all input links.

Check_links_2: used to latch the results of the previous test state.

Check_links_3: similar to state check_links_1 but for s-a-0 faults.

Check_links_4: similar to check_links_2. By the end of this test state, each SE holds the status of the links to the next stage.

Check_structure_1: the testing units prepare for structural testing of the SEs and the OPCs by hooking up the LFSRs and the MISRs to the internal cores.

Check_structure_2: carrying out the structural testing.

Check_structure_3: latching the results of the previous test state in the next stage failure status word of each SE.

Shift_test_results: a copy of the test results is serially shifted to the outside world through the NMC. After this test state, the NMC (sets) resets an output signal "test_result" if (no) faults exist.
The total complexity of the NMC is 531.81 gates: 391.44 combinational and 140.37 sequential. The critical path is found to be 4.35 ns. Notice that:

• the network can tolerate a fault in an SE by using the fault tolerance units of the SEs located in the previous stage and connected to the faulty SE. Recall that the fault tolerance unit of an SE redirects the traffic through the output links depending on the next stage fault status word set up in the check_structure_3 test state above.

• the network cannot tolerate faults in the OPCs and the 1 x 4 SEs, but it can detect and locate them.

In the next chapter we will discuss the fault tolerance properties of the BG network.
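The ten test states that the NMC steps the testing units through in this section can be summarized as a simple state sequence. The sketch below only captures the ordering; the identifier spellings and the `next_state` helper are our own illustrative choices, and the return to normal operation after the last state is modeled as `None`.

```python
from enum import Enum


class BistState(Enum):
    ISOLATE = 1
    CHECK_MECHANISM = 2      # called check_buffer in the OPCs
    CHECK_LINKS_1 = 3        # drive 0s to expose stuck-at-1 link faults
    CHECK_LINKS_2 = 4        # latch the results of the previous state
    CHECK_LINKS_3 = 5        # as CHECK_LINKS_1, but for stuck-at-0 faults
    CHECK_LINKS_4 = 6        # latch; each SE now holds next-stage link status
    CHECK_STRUCTURE_1 = 7    # hook the LFSRs and MISRs to the internal cores
    CHECK_STRUCTURE_2 = 8    # run the structural test
    CHECK_STRUCTURE_3 = 9    # latch results into the failure status word
    SHIFT_TEST_RESULTS = 10  # shift results out serially through the NMC


def next_state(state: BistState):
    """Return the state that follows `state`, or None after the last one."""
    if state is BistState.SHIFT_TEST_RESULTS:
        return None
    return BistState(state.value + 1)
```

Walking the sequence from `BistState.ISOLATE` with `next_state` visits all ten states once and then stops, which mirrors one complete pass of the testing mode.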
5.6 Simulation and Test Results
Verification of the network has been carried out on different levels of the design hierarchy. Testbenches were used to check the functionality of the 1 x 4 SEs, 4 x 4 SEs, OPCs, NMC, and the whole design. We wrote C++ routines to generate the test vectors needed to test each component. Figure 5.15 shows the simulation results for a synthesized 4 x 4 SE in $\text{Stage}_2$ using 0.35 µm CMOS technology. The shown snapshot represents the routing period of the SE, which is the critical period due to the involvement of the decision maker. The negative edge on the START_IN signal from the previous stage at time stamp 44935 ns denotes the start of the header arrival from the previous stage. There are three cells arriving in that routing period at $IL_0$, $IL_1$, and $IL_2$. As depicted in Table 5.6, the headers arriving at $IL_0$ and $IL_1$ have low priority and the cell arriving at $IL_2$ has high priority. All the arriving headers have a routing bit equal to 1. Accordingly, the cell arriving at $IL_2$ will be directed to $OL_2$ and one of the other headers will be routed to $OL_3$. Also, idle cells are pushed through $OL_0$ and $OL_1$. The resulting sequence on the output links MAIN_OUT of the SE, as shown in Figure 5.15, is 2, 1, and 1. The results of the timing diagram suggest that the SE can run comfortably at a speed of 200 MHz, which is sufficient for our target speed of 155.52 MHz.

Table 5.6: Illustration of cell arrival in Figure 5.15.
Figure 5.16 describes the operation of the OPC. As shown in Figure 5.16 and illustrated in Table 5.7, there are two low priority cells arriving at two of the input links in the current switching cycle. The pointer value is equal to 1 in the current switching cycle because the previous value of the pointer is 0, two cells were arriving in the previous switching cycle (not shown in Figure 5.16), and the current departure is 1 (refer to Equation 5.4). The clock speed in the timing diagram is 200 MHz.

Figure 5.16: Simulation results for an OPC.
Table 5.7: Illustration of cell arrival in Figure 5.16.
The timing diagrams describing the operation of the NMC, as well as that for the whole network, are not provided because they require huge space for presentation, especially during the testing mode of operation. However, the obtained simulation results have demonstrated the correct functionality of the network.
5.7 Summary 
We discussed the design of the BG network. The design is mainly built from three main modules: the switching element (SE), the output port controller (OPC), and the network main controller (NMC). The architecture and the function of each module were described in detail. Complexity and timing requirements for each module were provided. The design of the BG network features a BIST mechanism which consists of two methods. The first is an off-line structural method which is distributed all over the network because each module, except the NMC, has its own internal structural testing unit. The second is an on-line nonconcurrent method, which is used at the system level. We discussed the test protocol used for the proposed testing mechanism. We also provided the simulation results for the SE and the OPC. The results suggested that the network can operate comfortably at a speed of 200 MHz. This means that the network can comfortably support the OC-3 bit rate, which is 155.52 Mbps. In the next chapter, we discuss the fault tolerance properties of the BG network.
Chapter 6 
Fault Tolerance and Reliability 
Properties 
6.1 Introduction 
In this chapter, we discuss the fault tolerance and the reliability performance of the
BG network. The fault tolerance properties of the BG network are shown to be
exceptionally better compared to other MINs. The network is single fault-tolerant
and robust in the presence of multiple faults of the switching elements. We use three
models to evaluate the terminal reliability, the broadcast reliability and the network
reliability of the BG network. We also use realistic failure rates for the network
building blocks. These failure rates are based on a 0.8 µm BiCMOS implementation.
Throughout this chapter we assume that the interstage links connecting the SEs are
highly reliable.
6.2 Background 
In [82], a discussion of eleven fault-tolerant MINs has been presented. A general
criterion to compare the fault-tolerant characteristics of different MINs was described.
In addition, a concluding discussion on the different techniques used to obtain a fault-
tolerant structure was presented. These techniques are divided into two groups:
techniques that alter the topology and techniques that do not alter the topology.
The former group of techniques is used in multipath MINs, where other paths are
used if a fault is detected in one of the paths. Replication and adding extra stages
are examples of the latter group, i.e., trying to increase the number of paths between
input-output pairs.
Reliability is another important aspect used to differentiate MINs. Several
studies on the subject of MINs' reliabilities have been reported [83]-[88], [16]. The
metrics used to measure the reliability of MINs are terminal reliability (TR), broad-
cast reliability (BR), and network reliability (NR). The BG network exhibits superior
reliability performance over other MINs that have a similar hardware complexity. For
example, a new MIN that has a comparable hardware complexity has been recently
reported [89, 90]. However, when compared with the BG network, the performance
of that network is poorer because it employs almost the same routing technique
used in a parallel banyan network. It has been shown that the performance of the
BG network is superior to that of a parallel banyan network [65].
6.3 Fault Tolerance Properties of the BG Network 
A fault-tolerant MIN is one that is able to route cells from input ports to the requested
output ports, even when some of its network components are faulty. A fault can be
transient or permanent. Here we assume all faults are permanent. To study the
fault tolerance properties of a MIN, we have to define a fault tolerance model, a fault
tolerance criterion and a fault tolerance method. The fault tolerance model character-
izes all faults assumed to occur, stating the failure mode (if any) for each network
component. The fault tolerance criterion is the condition that must be met for the
network to be said to have tolerated a given fault or faults. Most studies assume a
fault tolerance criterion that satisfies the full access property in MINs. A network is
said to have the full access property if a cell at any of the input ports can be routed to
any of the output ports. The fault tolerance method is the way to overcome the faults
described in the fault tolerance model in order to fulfill the fault tolerance criterion.
The fault tolerance model we use is as follows:
1. SE faults are random and independent.
2. SE faults are permanent.
3. Each faulty SE is totally unusable.
4. All faults are detected and located with full success.
5. Any link failure can be subsumed by the failure of the SE that immediately
precedes the link.
The final fault tolerance model of the BG network is achieved by adding the following
modifications:
1. Faults can occur in any SE except in the first stage.
2. The output buffers are highly reliable.
Figure 6.1 depicts all the SEs in the first few stages of the BG network that can
be visited by a cell originating at input port i. If the cell is routed upwards, i.e. the
routing bit is 0, then it may visit either $SE_{i,1}$ or $SE_{i+2,1}$ in $Stage_1$. These two SEs
are encapsulated by an ellipse and marked as i and i + 2, respectively. However, if
the cell is routed down, then it may visit either $SE_{i-1,1}$ or $SE_{i+1,1}$ in $Stage_1$. These
two SEs are encapsulated by an ellipse and marked as i - 1 and i + 1, respectively.
Here is an example. $SE_{i+6,2}$ may direct the arriving cell to $SE_{i+6,3}$ or $SE_{i+14,3}$ if the
routing bit is 0 and may direct the arriving cell to $SE_{i+2,3}$ or $SE_{i+10,3}$ if the routing
bit is 1. To summarize, each SE can send the arriving cell at its input to one of 4
SEs in the next stage. Additionally, each encapsulated SE pair in Figure 6.1 implies
that these two SEs are connected to the same SE in the previous stage and a
cell existing in that SE may reach one of them. These SEs are called critical pairs. In
[91], we have proven that the BG network will lose the full access property if any of the
critical pairs fails. By inspecting Figure 6.1 we notice that $SE_{i,j}$ forms two critical
pairs, with $SE_{i+2^j,j}$ and with $SE_{i-2^j,j}$. We denote the critical pair $SE_{i,j}$ and $SE_{i+2^j,j}$
together as $(i, i + 2^j, j)$.
Figure 6.1: All possible SEs that can be visited by a cell arriving at input port i.
More output ports become unreachable as the failure of critical pairs takes place in
the stages closer to the input ports. For example, in a 16 x 16 BG network if the pair
$(i, i + 2, 1)$ fails then all output ports i - 6, i - 4, i - 2, i, i + 2, i + 4, i + 6, i + 8,
i + 10, i + 12 and i + 14 are completely unreachable for a cell originating at input
port i. However, if the critical pair $(i, i + 4, 2)$ fails then only output ports i - 4, i,
i + 4, i + 8 and i + 12 are unreachable by a cell originating at input port i.
To summarize, we can conclude that the BG network has N critical pairs in each
$Stage_m$, where 0 < m < n - 1, and only N/2 critical pairs in $Stage_{n-1}$ because in
this stage the critical pairs $(i, i - 2^{n-1}, n - 1)$ and $(i, i + 2^{n-1}, n - 1)$ are basically
the same (we know that $2^{n-1} = N/2$). Additionally, it is clear that the BG network
is a single fault-tolerant network because any single fault in the network can be
tolerated. However, not all double or multiple SE faults can be tolerated because
certain combinations of SEs form critical pairs. Consequently, the BG network can be
considered as a robust network in the case of multiple faults. We also notice that the
closer the faulty critical pairs are to the input ports, the greater the number of output
ports that are likely to be unreachable.
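The critical-pair counts above can be checked with a short enumeration. This is a minimal sketch and not part of the original design: it assumes only the pair definition $(i, i + 2^j, j)$ given in this section, with SE indices taken modulo N, and the function name `critical_pairs` is introduced here for illustration.

```python
def critical_pairs(n_ports, stage):
    """Return the unordered critical pairs {i, i + 2**stage mod N} of one stage.

    Assumption: SE indices wrap modulo the network size N, as implied by the
    pair notation (i, i + 2^j, j) used in the text.
    """
    step = 2 ** stage
    return {frozenset({i, (i + step) % n_ports}) for i in range(n_ports)}


if __name__ == "__main__":
    N = 16
    n = N.bit_length() - 1  # number of stages, n = log2(N)
    for j in range(1, n):
        print(f"Stage {j}: {len(critical_pairs(N, j))} critical pairs")
```

In the last stage the step equals N/2, so the pairs starting at i and at i + N/2 coincide and only N/2 distinct pairs remain, which matches the count stated above.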
r o P a /  - r . I . I - 1 j i l 1 1 9 - 6 1  
I'C 5839633 1 j58031 68 ' 570245 21 1 581lOl 70 1 :,lhlIlj 13 ' 
Y\!C 151927 71 1 148280 19 1559591% 1 l5311GM 1 159918 -2 
Table 6.1: SEs' complexities (in µm²) for different sizes of the BG network.
6.4 Reliability Analysis 
In [92], a two-fold reliability model for 0.8 µm BiCMOS technology is presented to
estimate the reliability of VHSIC/VHSIC-like and VLSI integrated circuits. We use this
model to estimate the different reliability measures for the BG network. Firstly, the
failure rates are evaluated based on the area of each element in the system. These
failure rates, represented by the failure rate $\lambda_I$, are used to evaluate the reliabilities
of the individual SEs. Secondly, contributions to the system reliability from other
factors, such as environmental operating conditions, expected number of pins, fabri-
cation quality and packaging, are taken into account. These factors are represented by
another failure rate $\lambda_{II}$. In our analysis we assume that the interconnections between
stages are highly reliable. Table 6.1 emphasizes the design complexity for different
network sizes ranging from 8 x 8 up to 128 x 128. The reason we use 0.8 µm BiCMOS
technology is that the above mentioned model is only applicable to that particular
technology. Table 6.2 provides the failure rates of the SEs using the estimated areas
in Table 6.1 according to the first part of the model. We notice that the ratio between
the area of an SE in the first stage and the area of another SE in the other stages is not
the same as the ratio between their failure rates. This is because, in the model, the
significance of the term that contains the area is minor when compared to the other
factors that are technology dependent. Table 6.3 provides the failure rates due to the
second part for different network sizes of the BG network.
Table 6.2: Failures/$10^6$ hours ($\lambda_I$) for the components of 128 x 128 BG network due
to the first part of the model.
Table 6.3: Estimated failures/$10^6$ hours ($\lambda_{II}$) due to the second part of the model.
Three measures are normally used to assess the reliability performance of MINs
[87]. These are terminal reliability, broadcast reliability, and network reliability.
6.4.1 Terminal Reliability 
TR is the probability that there exists at least one fault-free path from a particular
input port to a particular output port. TR is always associated with a terminal path
(TP), which is a one-to-one connection between an input port (the source) and an output
port (the destination). A network is considered failed if it is not able to establish a
connection from a given source to a given destination. The set of paths in a network
between a given input-output pair is represented as a directed graph, sometimes
referred to as the redundancy graph (R-graph) [=I, with its vertices representing
the SEs and the edges representing the connecting links. This R-graph is used to
determine the TR of the network. In our analyses, we assume a constant failure rate $\lambda$
for all the components, hence the reliability of the SE is given by:
$$ p = e^{-\lambda t}. \tag{6.1} $$
We know that a cell has the option of being routed through either one of two links
in each stage. The R-graph of the BG network is not the same for each input-output
pair and is dependent on the destination. Certain input-output pairs use only a pair
of SEs in each stage of their TPs and hence have a lower TR than the other pairs,
which use more than two SEs at certain stages. The TR of an input-output pair is
the lowest when the least significant I;] tag bits are identical. Figure 6.2 shows
an example of a best case R-graph for a 16 x 16 BG network and Figure 6.3 shows
another example for a worst case R-graph. Figure 6.4 depicts the worst case R-graph
for a general N x N BG network.
We take the worst case to evaluate the TR of the BG network. At each stage one
of the critical pair SEs has to be fault free. Therefore, the worst case TR of the BG
network is given by:
where $p_k$ is the reliability of $SE_{i,k}$ ($0 \le k \le n - 1$), $p_{OPC}$ is the reliability of the OPC,
and $p_{NMC}$ is the reliability of the NMC. All the $p_k$, $p_{OPC}$, and $p_{NMC}$ are evaluated
using the values previously presented in Table 6.2. $p_{NMC}$ is included in Equation 6.2
because if the NMC fails the whole system fails. Table 6.4 depicts the behavior of
the TR for various sizes of the BG network. The figures obtained in Table 6.4 signify
the fact that the TR is not affected by the network size for a mission period of 11
years. This is because the SEs' failure rates are small and close to each other. In fact,
the estimates in Table 6.4 do not take into account the other factors
mentioned above, which are packaging, number of pins, etc., and are represented by the
failure rates provided in Table 6.3. We could not incorporate these factors when
evaluating the TR because they describe the whole system, not a specific input port
or a specific output port. The results in Table 6.4 demonstrate that the
TR of the BG network is very high even for large networks. That is, any particular
input-output connection can be established with a high reliability.
Figure 6.2: TR best case R-graph for 16 x 16 BG network.
Figure 6.3: TR worst case R-graph for 16 x 16 BG network.
Figure 6.4: TR worst case R-graph for N x N BG network.
Table 6.4: Behavior of the TR for various sizes of the BG network.
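The worst-case reasoning above (one SE of each critical pair must be fault free at every stage, in series with the OPC and the NMC) can be illustrated numerically. The sketch below is an illustration under stated assumptions and is not a reproduction of Equation 6.2: it assumes a single first-stage SE in series with n - 1 identical critical pairs, exponential reliabilities for a constant failure rate, and failure rates that are purely hypothetical placeholders rather than the values of Table 6.2.

```python
import math

HOURS_PER_YEAR = 8760.0


def reliability(failures_per_million_hours, hours):
    """Exponential reliability exp(-lambda * t) for a constant failure rate."""
    return math.exp(-failures_per_million_hours * 1e-6 * hours)


def worst_case_tr(p_first, p_pair_member, n_stages, p_opc, p_nmc):
    """Series-parallel estimate of the worst-case terminal reliability.

    Assumed structure (illustrative only): one first-stage SE, then
    (n_stages - 1) critical pairs of which at least one member must work,
    then the OPC and the NMC, all in series.
    """
    pair_survives = 1.0 - (1.0 - p_pair_member) ** 2
    return p_first * pair_survives ** (n_stages - 1) * p_opc * p_nmc


if __name__ == "__main__":
    hours = 3 * HOURS_PER_YEAR
    # Hypothetical failure rates in failures per 10^6 hours (not from the thesis).
    p_se = reliability(0.05, hours)
    p_opc = reliability(0.04, hours)
    p_nmc = reliability(0.02, hours)
    for n_ports in (8, 16, 32, 64, 128):
        n_stages = n_ports.bit_length() - 1
        tr = worst_case_tr(p_se, p_se, n_stages, p_opc, p_nmc)
        print(f"{n_ports:>3} x {n_ports:<3} TR ~ {tr:.6f}")
```

Because each extra stage contributes a factor 1 - (1 - p)^2, which is very close to 1 when p is close to 1, the estimate changes little with the network size; this is in line with the observation made for Table 6.4.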
6.4.2 Broadcast Reliability 
Although we did not incorporate the broadcast feature in the design of the BG net-
work due to the reasons mentioned in Section 5.4, we briefly discuss the BR of the
BG network in this section. BR is the probability that there exists at least one fault
free path from a particular input port to all output ports. BR is always associated
with a broadcast path (BP), which is a connection from one source to all destina-
tions in the network. Under this criterion, the network is considered failed when the
connection cannot be made from a given input port to at least one of the output
ports. Broadcasting is achieved by routing the broadcast cell to two output links,
one up and the other down, as it reaches any SE. Another method to implement
broadcasting is to generate N copies of the broadcast cell at the input port, each one
of these copies destined to a different destination.
Contrary to the TR, the BR is identical for all input ports and therefore the BG
network has a uniform BR. The broadcast R-graph for an 8 x 8 network is shown in
Figure 6.5.
Figure 6.5: BR R-graph for an 8 x 8 BG network.
From Figure 6.5, the BR of an N x N BG network is given by
BR = p. x 
- II -a)2)'~] . pgpc * p r w .  /=, 16.3) 
Table 6.5 provides the numerical values for the BR of the BG network using the
failure rates provided in Table 6.2. Again, the factors that contribute to the second
part of the reliability are not used, for the same reason mentioned when we presented
the TR calculations for the BG network. The effect of the network size is very obvious
on the performance of the BR of the BG network. For a mission period close to three
years, the BR deteriorates to 87% for a network of size 128. By inspecting Equation
6.3 we find that the term $p_{OPC}^{N}$ is responsible for the deterioration of the BR.
Table 6.6 depicts the BR levels for the same period as in Table 6.5 if we exclude that term
(assuming that all the OPCs are highly reliable). The results in Table 6.6 suggest
that the network architecture is not the bottleneck. Instead, it is the reliability of the
output stage, represented by the term $p_{OPC}^{N}$, that is significantly affecting the BR
of the BG network. Moreover, this output stage reliability will always limit the BR
levels of any other architecture because if one OPC fails the architecture will lose its
broadcast property.
Table 6.5: Behavior of the BR for various sizes of the BG network (columns: mission
time, 8 x 8, 16 x 16, 32 x 32, 64 x 64, 128 x 128).
Table 6.6: Behavior of the BR for various sizes of the BG network by excluding the
$p_{OPC}^{N}$ term (columns: mission time, 8 x 8, 16 x 16, 32 x 32, 64 x 64, 128 x 128).
One way to improve the BR performance is by increasing the redundancy of the
output stage. This can be achieved by using standby units of the OPCs in conjunction
with the failure detecting mechanism described in the previous chapter. This might
lead to separating the output stage onto another chip.
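The weight of the output-stage term can be made concrete with a small calculation. The sketch below is only illustrative: it assumes, for the sake of argument, that the whole drop of the BR to 87% in a 128 x 128 network is attributed to the $p_{OPC}^{N}$ term, and then backs out the per-OPC reliability that this would imply. The function names are introduced here and are not from the thesis.

```python
def output_stage_reliability(p_opc, n_ports):
    """Reliability of N OPCs in series: every OPC must work."""
    return p_opc ** n_ports


def implied_opc_reliability(stage_reliability, n_ports):
    """Per-OPC reliability that yields the given series reliability."""
    return stage_reliability ** (1.0 / n_ports)


if __name__ == "__main__":
    # Assumption: the 87% figure is treated as if it came from the OPC term alone.
    p_opc = implied_opc_reliability(0.87, 128)
    print(f"implied per-OPC reliability: {p_opc:.5f}")
    for n_ports in (8, 16, 32, 64, 128):
        osr = output_stage_reliability(p_opc, n_ports)
        print(f"{n_ports:>3} OPCs in series: {osr:.4f}")
```

Even a per-OPC reliability very close to 1 is eroded noticeably once 128 such units must all work, which is why redundancy in the output stage is the natural remedy.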
6.4.3 Network Reliability 
NR is the probability of maintaining the full access property throughout the network.
Recall that the BG network will lose the full access property if a critical pair fails in the
network. The NR of the BG network can be given by:
where 
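FSR = reliability of the first stage.
$SR_j$ = reliability of $Stage_j$.
LSR = reliability of $Stage_{n-1}$.
OSR = reliability of the output stage.
and they are given by:
$$ FSR = p_0^{N}, \tag{6.5} $$
$$ SR_j = \sum_{i=0}^{N/2} INF_i \, p_j^{N-i} (1 - p_j)^{i}, \tag{6.6} $$
$$ LSR = \sum_{i=0}^{N/2} LNF_i \, p_{n-1}^{N-i} (1 - p_{n-1})^{i}, \tag{6.7} $$
$$ OSR = p_{OPC}^{N}. \tag{6.8} $$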
$INF_i$ and $LNF_i$ indicate all possible combinations of i SE failures in an interme-
diate stage and the last stage, respectively, which do not make the BG network lose
the full access property. Both $INF_i$ and $LNF_i$ are given by:
LNF, = ( ) x ( y!:;' ) ; 0 f i 5 N/2. (6.10) 
,=o 
The proofs of Equations 6.9 and 6.10 are provided in Appendix D. 
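The quantities counted by $INF_i$ and $LNF_i$ can be cross-checked by brute force for small networks. The sketch below is not the closed form of Equations 6.9 and 6.10; it assumes only the critical-pair definition $(x, x + 2^j, j)$ from Section 6.3 (indices modulo N) and counts the sets of i faulty SEs in one stage that contain no complete critical pair. The helper names are hypothetical.

```python
from itertools import combinations


def tolerable_patterns(n_ports, stage, n_faults):
    """Count sets of n_faults faulty SEs in a stage with no full critical pair.

    Assumption: a stage loses full access exactly when both SEs of some pair
    {x, (x + 2**stage) mod N} are faulty.
    """
    step = 2 ** stage
    count = 0
    for faulty in combinations(range(n_ports), n_faults):
        failed = set(faulty)
        if not any((x + step) % n_ports in failed for x in failed):
            count += 1
    return count


def stage_reliability(n_ports, stage, p):
    """Probability that the faulty SEs of a stage contain no critical pair."""
    return sum(
        tolerable_patterns(n_ports, stage, i) * p ** (n_ports - i) * (1 - p) ** i
        for i in range(n_ports // 2 + 1)
    )


if __name__ == "__main__":
    N = 8
    for stage in (1, 2):
        counts = [tolerable_patterns(N, stage, i) for i in range(N // 2 + 1)]
        print(f"stage {stage}: tolerable patterns for i = 0..{N // 2}: {counts}")
```

In the last stage the N/2 critical pairs are disjoint, so the stage reliability must equal $(1 - (1 - p)^2)^{N/2}$; the enumeration reproduces this value, which gives a simple sanity check on the counting.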
The overall system reliability ($NR_{overall}(t)$) can be given by:
where $NR(t)$ is evaluated using Equations 6.4 up to 6.10 and the failure rates
provided in Table 6.2 to estimate the reliabilities of the SEs, and $\lambda_{II}$ is obtained
from Table 6.3. Figure 6.6 depicts the overall reliability performance for different
network sizes for mission periods of up to three years. As expected, smaller networks
feature better reliability figures. However, if we exclude both FSR and OSR by
assuming that the components of both the first stage and the output stage are highly
reliable, we obtain very high reliability figures, as depicted in Table 6.7. As we
discussed previously, the structure of the BG network is not the bottleneck in deciding
the overall NR. It is both the FSR and the OSR that decide the performance of
the NR. Again, the fact we have just discussed is the same for any other switching
architecture. This is because a failure in any of the 1 x 4 SEs or the OPCs will result
in losing the full access property of the network.
6.4.4 Mean Time to Failure 
The system mean time to failure (MTTF) is given by:
$$ MTTF = \int_0^{\infty} NR_{overall}(t)\,dt. \tag{6.12} $$
Figure 6.6: $NR_{overall}$ performance of the BG network (horizontal axis: mission time in hours).
Table 6.7: Behavior of the NR for various sizes of the BG network by excluding both
the FSR and OSR terms.
Table 6.8: Estimated BG network failures/$10^6$ hours.
In general, a system failure rate is given by:
$$ \lambda = \frac{1}{MTTF}. \tag{6.13} $$
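Table 6.8 provides the failure rate ($\lambda_{\text{BG-I}}$) for the BG network due to the first part
of the model as well as the overall cumulative failure rate ($\lambda_{\text{BG-overall}}$). $\lambda_{\text{BG-I}}$ and
$\lambda_{\text{BG-overall}}$ are defined as:
$$ \lambda_{\text{BG-I}} = \frac{1}{\int_0^{\infty} NR(t)\,dt}, \tag{6.14} $$
$$ \lambda_{\text{BG-overall}} = \frac{1}{\int_0^{\infty} NR_{overall}(t)\,dt}. \tag{6.15} $$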
The results in Table 6.8 emphasize the role of the network size in degrading the
reliability performance of the network. It is obvious that the first part factors of the model
contribute more to the overall system failure rate than the second part factors
of the model. However, the contribution of the second part is more significant for
smaller networks. For example, the failure rate due to the second part in an 8 x 8
network represents approximately 25% of the overall system failure rate, while it only
represents 4.5% in a 128 x 128 network. Therefore, we can conclude that failure in small
networks is partially due to environmental operating conditions, expected number of
pins, fabrication quality and packaging. However, failure in large networks is mainly
due to density and the technology used.
It is interesting to note that the reliability of a 128 x 128 BG network is greater
than 0.77 for a three-year mission period. This corresponds to an MTTF of over ten
years for the device. Such high reliability figures suggest that these networks are fit
for practical implementation. Indeed, when repair is taken into account, very high
availability figures can be obtained. As discussed earlier, the current analysis does
not include interstage links. However, even when they are included in the reliability
model, we expect the overall dependability of the device(s) implementing BG networks
to be quite high.
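The link between a reliability of 0.77 at three years and an MTTF of over ten years can be checked with a first-order calculation. The sketch below assumes a single constant failure rate, i.e. $R(t) = e^{-\lambda t}$, so that $MTTF = 1/\lambda$ as in Equation 6.13; the thesis's NR(t) is not a pure exponential, so this is only a rough consistency check, and the function name is introduced here for illustration.

```python
import math

HOURS_PER_YEAR = 8760.0


def mttf_years_from_reliability(reliability, mission_years):
    """MTTF implied by R(mission) under a constant-failure-rate assumption."""
    failure_rate_per_year = -math.log(reliability) / mission_years
    return 1.0 / failure_rate_per_year


if __name__ == "__main__":
    mttf_years = mttf_years_from_reliability(0.77, 3.0)
    failures_per_million_hours = 1e6 / (mttf_years * HOURS_PER_YEAR)
    print(f"implied MTTF: {mttf_years:.1f} years")
    print(f"equivalent rate: {failures_per_million_hours:.2f} failures per 10^6 hours")
```

Under this simplifying assumption the implied MTTF comes out above ten years, in agreement with the statement in the text.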
6.5 Summary 
The BG network is a single fault-tolerant network and is robust in the case of multiple
faults. In this chapter, we introduced an exact model for the network reliability of
the BG network to obtain accurate results. We also adopted a model for failure
rate prediction of very high speed integrated circuits to obtain realistic figures
for the BG network reliability metrics. The estimated failure rates are based on 0.8
µm BiCMOS technology, and not the 0.35 µm technology, due to the limitations
imposed on the aforementioned model. The model is composed of two parts. The
first part represents the technology used and the area of the design. The second
part represents the environmental operating conditions, expected number of pins,
fabrication quality and packaging. The resulting failure rates are used to evaluate
the three reliability metrics: terminal reliability, broadcast reliability, and network
reliability. The obtained results for these three metrics prove that the BG network is
a robust and reliable switch fabric. For example, a 128 x 128 BG network has a mean
time to failure of above 10 years. The results also showed that the deterioration in
the BR and NR is mainly due to the reliability of both the first stage and the output
stage. In any other switching architecture, the reliabilities of the first and the
output stages will have a similar effect.
Chapter 7 
Conclusion and Future Work 
The main contributions of this work can be summarized as follows:
• Demonstrating the outstanding performance of the BG network. The
performance of the BG network has proven its competitiveness with other
switching architectures. A single plane BG (BG-1) configuration has been shown to
be an outstanding switching architecture under various types of traffic loads.
Under a wide range of uniform, bursty, and nonuniform traffic loads, we discovered
two important facts. Firstly, the performance of the BG-1 configuration is almost
the same as the performance of the hypothetical ideal network under the above
mentioned loads. Secondly, we discovered that the performance of a single
plane crossbar (crossbar-1) configuration is far behind the ideal and the BG-
1 configurations. The results provided in Chapter 4 demonstrate these two facts.
These results are obtained under realistic scenarios of finite buffering budgets
rather than infinite ones to capture the effect of the queuing problems in these
architectures. By assessing the results of Chapter 4 we can conclude that it is
not recommended that switching architectures be operated under heavy loads.
This is because the ideal network, which provides the best upper bound for any
performance parameter, has shown very high levels of average and maximum
cell delay under heavy traffic. As we know, BISDN traffic is going to be
bursty in nature. As we discussed in Chapter 4, the efficient performance of
the BG network is due to its architectural feature that enables receiving up to
4 cells by the OPCs in any switching cycle.
• New routing algorithm. In this dissertation, we introduced a new routing
algorithm for the BG network which is as simple as the routing algorithm used in
the banyan network. As we showed in Chapter 3, the new algorithm necessitated
modifications to the earlier proposed topologies of the BG network. These new
modifications did not subdue the low complexity or the high modularity enjoyed
by the BG network. The new algorithm has reduced the routing decision in any
SE to only one bit for each arriving cell. This has enabled us to comfortably
introduce two levels of priority for the cells handled by the new topology.
• Realizable architecture. One of the major tasks in this work is to show that
the BG network is not only an attractive network, as far as the performance
analysis is concerned, but also to prove that it is realizable as far as the hard-
ware complexity is concerned. In Chapter 5, we introduced a comprehensive
front-end design of the BG network. The modular and scalable architecture
of the BG network has facilitated a hierarchical, bottom-up and smooth design
process. The modularity has allowed us to firstly design the building blocks -
the switching element (SE), the output port controller (OPC), and the network
main controller (NMC) - and then use them to build the higher level modules
in the system hierarchy. The scalability of the BG network has enabled us to
parameterize the RTL description of the design. By changing a few parameters
in one of the header files we can obtain a new and different configuration of the BG
network. These parameters are the network size N, the buffer size in the OPCs,
and the cell size. Due to the strong requirement for testability in VLSI systems,
we included a BIST feature in the design. We also made use of the BIST feature
to build a self-diagnostic and repair mechanism which takes advantage of the
fault tolerance properties of the BG network. The self-diagnostic and repair
mechanism can localize faults and take the proper decisions to tolerate these
faults. The results we obtained for our design suggested that the network can
operate comfortably at a speed of 200 MHz. These results are based on a 0.35
µm CMOS technology.
• Accurate reliability modelling. Designing highly reliable systems is a crucial
requirement in the industry of broadband communications, where consequences
of system failures are very expensive. Accordingly, we introduced an exact
modelling of the BG network reliability. Moreover, to assess the network reli-
ability performance we adopted a model for failure rate prediction of very
high speed integrated circuits to obtain realistic figures for the BG network
reliability metrics. This adopted model is based on a 0.8 µm BiCMOS technol-
ogy. The adopted model is composed of two parts. The first part represents
the technology used and the area of the design. The second part represents
the environmental operating conditions, expected number of pins, fabrication
quality and packaging. The resulting failure rates are used to evaluate the three
reliability metrics: terminal reliability, broadcast reliability, and network reli-
ability. The results obtained in Chapter 6 for these three metrics proved that
the BG network is a robust and reliable switch fabric. For example, a 128 x 128
BG network has a mean time to failure of above 10 years. The results also
showed that the deterioration in the reliability metrics is mainly due to
the reliability of both the first stage and the output stage. In any other switching
architecture, the reliabilities of the first and the output stages will have a
similar effect.
Accordingly, we can conclude the following:
• The BG architecture is a very strong candidate to be used in broadband commu-
nication switch fabrics. The performance under a wide range of loads of various
traffic models and the realistic reliability analysis of the BG architecture have
proven this strong candidacy. This candidacy is also confirmed by the easy and
uniform VLSI design of the architecture.
• This study challenges the strong belief in the broadband communication in-
dustry that the "nonblocking" crossbar architecture is the best candidate for
broadband switch fabrics. The crossbar configurations have shown less
efficient performance under the various traffic models when compared to the
BG configurations. In addition, the crossbar has a more complicated VLSI de-
sign process due to its nonuniform architecture.
• It is not recommended that switch fabrics run under heavy loads all the time be-
cause this leads to undesirably high levels of delay and cell loss. This conclusion
is based on the performance results obtained for the ideal network
under various traffic loads. This is true for all existing switch fabrics and is also the
current practice.
7.1 Future Work 
Although the BG network has shown outstanding performance under different types
of traffic loads, we still need to investigate the performance under other scenarios.
Scheduling is a very important issue as far as the quality of service is concerned in
broadband communication networks. The term scheduling refers to the mechanism
that determines what queue is given an opportunity to transmit. A queuing structure
and a scheduling mechanism attempt to achieve [94] flexibility, scalability, efficiency,
guaranteed QoS, isolation, and fairness. IPCs and OPCs are the locations where
scheduling mechanisms need to be investigated in the BG network. The current
structure of the first stage can only support one single queue at the first stage. One
modification that facilitates supporting two queues can be achieved by replacing the
1 x 4 SEs in the first stage by 2 x 4 SEs. Another modification, which supports four queues
in each IPC, can also be achieved by building the first stage out of 4 x 4 SEs. That
is, the whole BG network will be based on 4 x 4 SEs. The hardware complexity and
the realizability of the BG network will need to be investigated to know the impact
of adding new service priorities.
Buffer sharing is a well-known method, where the buffering resources of adjacent
IPCs or OPCs are shared. As we discussed in Chapter 4, buffer sharing results in
the lowest buffering requirements. However, this is achieved at the expense of more
complex buffer control mechanisms, especially when the level of sharing increases.
Accordingly, the performance and the implementation issues should be studied to
investigate the cost-effectiveness of buffer sharing.
Another issue that needs to be addressed is the performance analysis under multi-
cast traffic loads. Multicasting has always attracted the attention of researchers and
designers. In [95], an extensive survey of the multicast switches for the period between
1984 and 1997 is presented with a timetable showing the history of multicast switches.
The authors have reached the conclusion that there are three families of multicast
switches: 1) multicast switches without copy networks, 2) multicast switches with
copy networks without prescheduling, 3) prescheduling before copying. The survey
conducted was only for space division switches and it was discovered that multicast
switches were based on one of the following architectures: 1) knockout switch, 2)
banyan network, 3) Clos network. In fact, multicasting is a complicated problem and
none of the proposed architectures can guarantee a complete solution for it. Both the
design and performance issues should be investigated for any multicasting method
proposed for the BG network.
Finally, it would be optimal to test the performance of the BG network under real
B-ISDN traffic loads. However, it seems this is not currently feasible because real
B-ISDN traffic is not established yet. Perhaps testing the performance under mixes
of different bursty sources would be a more realistic situation. However, this will
lead to longer simulation periods as the amount of computation needed for traffic
generation will escalate.
References 
[1] O. Kyas, ATM Networks. International Thomson Computer Press, 19%
[2] H. Yamanaka et al., "Scalable Shared-Buffering ATM Switch with a Versatile
Searchable Queue," IEEE Journal on Selected Areas in Communications, vol. 15,
pp. 773-784, June 1997.
[3] D. Weil et al., "A 16 x 622 Mb/s ATM Switch: PRELUDE Switch Architecture
Integrated into a 4-Million Transistor Monochip," IEEE Journal of Solid-State
Circuits, vol. 32, pp. 1108-1114, July 1997.
[4] S. Kothari, G. Prabhu, and R. Roberts, "The Kappa Network with Fault-
Tolerant Destination Tag Algorithm," IEEE Transactions on Computers, vol. 37,
pp. 612-617, May 1988.
[5] R. Y. Awdeh and H. Mouftah, "Survey of ATM Switch Architectures," Computer
Networks and ISDN Systems, vol. 27, pp. 1567-1613, November 1995.
[6] D. Basak, A. K. Choudhury, and E. L. Hahne, "Sharing Memory in Banyan-
Based ATM Switches," IEEE Journal on Selected Areas in Communications,
vol. 15, pp. 881-891, June 1997.
[7] S. Q. Li, "Nonuniform Traffic Analysis on a Nonblocking Space-Division Packet
Switch," IEEE Transactions on Communications, vol. 38, pp. 1085-1096, July
1990.
[8] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and
Testable Design. Computer Science Press, 1990.
[9] Rod Byrne, A High-Level Language and CAD Environment for BIST Embedding.
PhD thesis, University of Victoria, 1994.
[IO] P. Woog and M. Yeung. "Design and Analvrir of a Navel Faa P&t Snitch- 
Pipeline Banyan." IEEE 'hnsoetimu on Nehuorhg. vol. 3. pp. 63-69. February 
1995. 
[Ill bI. Bentall, C. Hohba, and B. Turton. ATM and In lmrr t  Pmtocol: A Canuer- 
gencr of Tcchnologiw Arnold Ioe.. 1998. 
[I21 W. Stallinp, ISDN and B&nd ISDN with Rane Relay and ATM. Plentiee 
Hall, Fourth Edition. 1999. 
1131 S. Kahav. An Engncrring Appmach to Cornpuler Networdr: ATM Netwrb.  
the Internet, and Me Telephone Netuwrb. Addison Wsly, 1997. 
[IS] W. R. Stevens, TCP/IP ~ltvrtroted Volume 1. Addison-Wesley. 1994 
1161 M. Guizaoi and A. Rayes, D~s*ing ATM Switching Nctworb. Mffirsw Hill. 
1999. 
1171 E. R. Coover, ATM Sdchcs. Art& H a w  Inc.. 1997. 
[18] P. Newman, "ATM Technology for Corporate Networks," IEEE Communications
Magazine, vol. 30, pp. 90-101, April 1992.
[19] F. A. Tobagi, "Fast Packet Switch Architectures for Broadband Integrated
Services Digital Networks," Proceedings of the IEEE, vol. 78, pp. 133-167, January
1990.
[20] J. Turner and N. Yamanaka, "Architectural Choices in Large Scale ATM
Switches," IEICE Transactions on Communications, vol. E81-B, pp. 120-137,
February 1998.
[21] H. Ahmadi and W. E. Denzel, "A Survey of Modern High-Performance Switching
Techniques," IEEE Journal on Selected Areas in Communications, vol. 7,
pp. 1091-1103, September 1989.
[22] E. W. Zegura, "Architectures for ATM Switching Systems," IEEE Communications
Magazine, vol. 31, pp. 28-37, February 1993.
[23] R. Rooholamini, V. Cherkassky, and M. Garver, "Finding the Right ATM Switch
for the Market," IEEE Computer, vol. 27, pp. 16-28, April 1994.
[24] L. T. Lee and P. H. H m g, "A High Speed Famodal Style Memory Switch
Architecture," IEICE Transactions on Communications, vol. E81-B, pp. 164-174,
February 1998.
[25] A. Thomas, J. Coudreuse, and M. Servel, "Asynchronous Time-division Switching:
An Experimental Packet Network Integrating Videocommunication," in
Proceedings of ISS'84, paper 32C2, May 1984.
[26] A. Chemarin et al., "A High-speed CMOS Circuit for 1.2-Gbits/s 16 x 16 ATM
Switching," IEEE Journal of Solid-State Circuits, vol. 27, pp. 1116-1120, July
1992.
[27] K. Eng and M. Pashan, "Advances in Shared-Memory Designs for Gigabit ATM
Switching," Bell Labs Technical Journal, pp. 175-187, Spring 1997.
[28] M. Praki, "A Modular Element for Shared Buffer ATM Switch Fabric," in
Proceedings of the 1997 IEEE International Conference on Application Specific
Systems, Architectures, and Processors, pp. 432-436, 1997.
[29] T. K. Woo, "Design and Performance Analysis of Crossbar ATM Switching
Architecture," Computer Communications, vol. 21, pp. 88-94, 1998.
[30] M. Hluchyj and M. Karol, "Queuing in High-performance Packet Switching,"
IEEE Journal on Selected Areas in Communications, vol. 6, pp. 1587-1597,
December 1988.
[31] M. Karol et al., "Input Versus Output Queuing in a Space Division Packet
Switch," IEEE Transactions on Communications, vol. COM-35, pp. 1347-1356,
December 1987.
[32] J. Hui and E. Arthurs, "A Broadband Packet Switch for Integrated Transport,"
IEEE Journal on Selected Areas in Communications, vol. SAC-5, pp. 1264-1273,
October 1987.
[33] N. McKeown et al., "Tiny Tera: A Packet Switch Core," IEEE Micro, pp. 26-33,
January/February 1997.
[34] K. Genda, N. Yamanaka, and Y. Doi, "A 160 Gb/s ATM Switch Using Internal
Speed-up Crossbar Switch Architecture," Electronics and Communications in
Japan, vol. 80, pp. 68-78, September 1997.
[35] C. Goke and G. Lipovski, "Banyan Networks for Partitioning Multiprocessor
Systems," in First Annual Symposium on Computer Architecture, pp. 21-28,
1973.
[36] J. H. Patel, "Performance of Processor-Memory Interconnections for
Multiprocessors," IEEE Transactions on Computers, vol. C-30, pp. 771-780, October 1981.
[37] C. Wu and T. Feng, "On a Class of Multistage Interconnection Networks," IEEE
Transactions on Computers, vol. 29, pp. 694-702, August 1980.
[38] D. Lawrie, "Access and Alignment of Data in an Array Processor," IEEE
Transactions on Computers, vol. 24, pp. 1145-1155, December 1975.
[39] M. Pease, "The Indirect Binary n-Cube Multiprocessor Array," IEEE
Transactions on Computers, vol. 26, pp. 458-473, May 1977.
[40] D. Dias and M. Kumar, "Packet Switching in N log N Multistage Networks," in
Proceedings of IEEE GLOBECOM'84, pp. 114-120, 1984.
[41] C. Clos, "A Study of Nonblocking Switching Networks," IEEE Transactions on
Computers, pp. 406-424, March 1953.
[42] J. Beetem, M. Denneau, and D. Weingarten, "The GF-11 Supercomputer," in
Proceedings of the 12th Annual Symposium on Computer Architecture, pp. 108-115,
1985.
[43] V. Benes, Mathematical Theory of Connecting Networks and Telephone Traffic.
Academic Press, 1965.
[44] S. Ohta, "A Simple Control Algorithm for Rearrangeable Switching Networks
with Time Division Multiplexed Links," IEEE Journal on Selected Areas in
Communications, vol. 5, pp. 1302-1308, October 1987.
[45] F. K. Liotopoulos and S. Chalasani, "Semi-Rearrangeably Nonblocking Operation
of Clos Networks in the Multirate Environment," IEEE Transactions on
Networking, vol. 4, pp. 281-291, April 1996.
[46] S. Saberan, W. A. Crossland, and R. W. Scarr, "Nonblocking ATM Switching
Networks Composed of ATM Switching Modules," in Proceedings of IEEE
GLOBECOM'97, pp. 232-236, 1997.
[47] R. Melen and J. Turner, "Nonblocking Multirate Distribution Networks," IEEE
Transactions on Communications, vol. 41, pp. 362-369, February 1993.
[48] K. Eng, M. Karol, and Y. Yeh, "A Growable Packet (ATM) Switch Architecture:
Design Principles and Applications," in Proceedings of IEEE GLOBECOM'89,
pp. 1159-1165, 1989.
[49] T. T. Lee and C. H. Lam, "Path Switching - A Quasi-Static Routing Scheme for
Large-Scale ATM Packet Switches," IEEE Journal on Selected Areas in
Communications, vol. 15, pp. 914-924, June 1997.
[50] Naoaki et al., "OPTIMA: Tb/s ATM Switching System Architecture," in
Proceedings of the IEEE ATM'97 Workshop, pp. 691W3!%, 1997.
[51] D. Parker and C. Raghavendra, "The Gamma Network," IEEE Transactions on
Computers, vol. 33, pp. 367-373, April 1984.
[52] J. Hui, Switching and Traffic Theory for Integrated Broadband Networks. Kluwer
Academic Publishers, Boston, 1990.
[53] K. Batcher, "Sorting Networks and Their Applications," in Proceedings of AFIPS
Spring Joint Computer Conference, pp. 307-314, 1968.
[54] H. Sivakumar and R. Venkatesan, "Blocking in Multistage Interconnection
Networks for Broadband Packet Switch Architectures," in Proceedings of the Sixth
Annual Newfoundland Electrical and Computer Engineering Conference, 1995.
[55] B. C. Lindberg, Digital Broadband Networks & Services. McGraw-Hill Series on
Computer Communications, 1994.
[56] M. D. Marco and A. Pattavina, "Distributed Routing Protocols for ATM Extended
Banyan Networks," IEEE Journal on Selected Areas in Communications,
vol. 15, pp. 925-937, June 1997.
[57] J. Turner, "Design of a Broadcast Packet Switching Network," IEEE
Transactions on Communications, vol. 36, pp. 734-743, June 1988.
[58] C. Kruskal and M. Snir, "The Performance of Multistage Interconnection
Networks for Multiprocessors," IEEE Transactions on Computers, vol. C-32,
pp. 1091-1098, December 1983.
[59] M. Alimuddin, H. Alnuweiri, and R. Donaldson, "The Fat Banyan ATM Switch,"
in Proceedings of the IEEE INFOCOM'95, vol. 2, pp. 654-666, 1995.
[60] F. A. Tobagi, T. Kwok, and F. M. Chiussi, "Architecture, Performance, and
Implementation of the Tandem Banyan Fast Packet Switch," IEEE Journal on
Selected Areas in Communications, vol. 9, pp. 1173-1193, October 1991.
[61] J. Turner, "New Directions in Communications (or Which Way to the Information
Age?)," IEEE Communications Magazine, pp. 8-15, October 1986.
[62] P. Wong and M. Yeung, "Pipeline Banyan - A Parallel Fast Packet Switch
Architecture," in Proceedings of the ICC'92, pp. 882-887, 1992.
[63] T. Cheng and Y. Shen, "Performance Analysis of Parallel Banyan ATM Switch,"
International Journal of Communications, vol. 10, pp. 43-62, 1997.
[64] R. Venkatesan and H. Mouftah, "Balanced Gamma Network - A New Candidate
for Broadband Packet Switching Architectures," in Proceedings of the IEEE
INFOCOM'92, vol. 3, pp. 2482-2488, 1992.
[65] Harinath Sivakumar, "Performance, Fault Tolerance and Reliability of
Multistage Interconnection Networks for Broadband Packet Switch Architectures,"
Master's thesis, Memorial University of Newfoundland, 1995.
[66] P. Goli and V. Kumar, "Performance of a Crosspoint Buffered ATM Switch
Fabric," in Proceedings of the IEEE INFOCOM'92, vol. 1, pp. 426-435, 1992.
[67] B. Zhou and M. Atiquzzaman, "Efficient Analysis of Multistage Interconnection
Networks Using Finite Output-Buffered Switching Elements," Computer
Networks and ISDN Systems, vol. 28, pp. 1809-1829, 1996.
[68] V. P. Kumar et al., "PHOENIX: A Building Block for Fault Tolerant Broadband
Packet Switches," in Proceedings of the IEEE GLOBECOM'91, pp. 228-233,
1991.
[69] W. W. Hines and D. C. Montgomery, Probability and Statistics in Engineering
and Management Science. John Wiley & Sons, 1980.
[70] Y. E. Sayed and R. Venkatesan, "Modeling and Simulation of the Pipelined
Balanced Gamma Network," in Proceedings of the Fourth IEEE International
Conference on Electronics, Circuits, & Systems (ICECS'97), vol. 1, pp. 97-101,
1997.
[71] A. Pattavina, Switching Theory: Architecture and Performance in Broadband
ATM Networks. Wiley, 1998.
[72] H. F. Badran and H. T. Mouftah, "Performance of Broadband Integrated Switch
Architectures with Input-Output-Buffering under Backpressure Mechanisms,"
Tech. Rep. 90-7, Queen's University, Kingston, Ontario, Canada, 1990.
[73] G. D. Stamoulis, M. Anagnostou, and A. Georgantas, "Traffic Source Models for
ATM Networks: A Survey," Computer Communications, vol. 17, no. 6, pp. 428-438,
1994.
[74] J. Banks, J. S. Carson, and B. L. Nelson, Discrete-Event System Simulation,
Second Edition. Prentice Hall, 1996.
[75] Royal Military College of Canada and Canadian Microelectronics Corporation,
Instruction on Basic Digital IC Design Flow From RTL Description to Completed
CMOS Design Using Cadence (97A) and Synopsys, November 1998. Document
ICI-089.
[76] N. Mukherjee, T. J. Chakraborty, and R. Karri, "Built-in Self-Test: A Complete
Test Solution for Telecommunication Systems," IEEE Communications Magazine,
vol. 37, pp. 72-78, June 1999.
[77] J. Turino, Design to Test. Van Nostrand Reinhold, 1990.
[78] G. Russell and I. L. Sayers, Advanced Simulation and Test Methodologies for
VLSI Design. Van Nostrand Reinhold, 1989.
[79] M. H. Guo and R. S. Chang, "Multicast ATM Switches: Survey and Performance
Evaluation," ACM SIGCOMM Computer Communication Review,
vol. 28, pp. 98-131, April 1998.
[80] F. J. Hill and G. R. Peterson, Computer Aided Logical Design with Emphasis on
VLSI. John Wiley & Sons, 1993.
[82] G. Adams, D. Agrawal, and H. Siegel, "A Survey and Comparison of
Fault-Tolerant Multistage Interconnection Networks," IEEE Computer, vol. 20,
pp. 14-27, June 1987.
[83] V. Cherkassky and M. Malek, "Reliability and Fail-Softness Analysis of
Multistage Interconnection Networks," IEEE Transactions on Reliability, vol. 34,
pp. 524-527, December 1985.
[84] J. S. Provan, "Bounds on the Reliability of Networks," IEEE Transactions on
Reliability, vol. 35, pp. 260-268, August 1986.
[85] J. Blake and K. Trivedi, "Reliability Analysis of Interconnection Networks Using
Hierarchical Composition," IEEE Transactions on Reliability, vol. 38, pp. 111-119,
April 1989.
[86] A. Varma and C. Raghavendra, "Reliability Analysis of Redundant-Path
Interconnection Networks," IEEE Transactions on Reliability, vol. 38, pp. 130-137,
April 1989.
[87] C. Botting, S. Rai, and D. Agrawal, "Reliability Computation of Multistage
Interconnection Networks," IEEE Transactions on Reliability, vol. 38, pp. 138-143,
April 1989.
[88] J. Blake and K. Trivedi, "Multistage Interconnection Network Reliability," IEEE
Transactions on Reliability, vol. 38, pp. 1600-1604, November 1989.
[89] P. Tagle and N. Sharma, "A High Performance Fault-Tolerant Switching Network
for B-ISDN," in Proceedings of the IEEE 14th Annual Intl. Phoenix Conf. on
Computers and Communications, pp. 5-, 1995.
[90] P. Tagle and N. Sharma, "Performance of Fault-Tolerant ATM Switches," IEE
Proceedings on Communications, vol. 143, no. 5, pp. 317-324, 1996.
[91] Y. E. Sayed, R. Venkatesan, and H. Sivakumar, "Fault Tolerance and Reliability
Analysis of the Balanced Gamma Network," International Journal of Parallel
and Distributed Systems and Networks, vol. 2, no. 4, pp. 244-254, 1999.
[92] Military Handbook, Reliability Prediction of Electronic Equipment, MIL-HDBK-217F,
1991. (Updated in Feb. 1995).
[93] D. Agrawal and J. Leu, "Dynamic Accessibility Testing and Path Length
Optimization of Multistage Interconnection Networks," IEEE Transactions on
Computers, vol. C-34, pp. 263-266, March 1986.
[94] N. Giroux and S. Ganti, Quality of Service in ATM Networks: State-of-the-Art
Traffic Management. Prentice Hall PTR, 1999.
[95] W. Guo and R. Chang, "Multicast ATM Switches: Survey and Performance
Evaluation," SIGCOMM Computer Communication Review, vol. 28, pp. 98-131,
April 1998.
Appendix A 
Balanced Gamma Network 
Topology 
IL_r = input link r.
OL_i = output link i.
; ++) // j is the stage index
for (i = 0; i < N; i++) // i is the SE index
a = mod 2.
if (a = 0)
connect 04 to 1 4 of SE,+>.
connect OL, to ILt of SE,+zrtj+~.
connect OL2 to 1 4 of SE,-Y~+I.
connect O h to JLS of SE+Y.)+L.
else
connect OL. to IL2 of SE,-#,+I.
connect OL, to ILs of SEICmj+t.
connect OL2 to I 4 of SEW+,.
connect OL1 to ILI of SEit~tuJ+~.
Appendix B 
Balanced Gamma Network 
Routing Algorithm 
The following notation is important to understand the following pseudo
code:
output[i] - holds which input is connected to output 'i'.
priority[i] - holds the priority of cell at input 'i'.
= -1 no cell exists.
= 0 low priority cell.
= 1 high priority cell.
state_up = number of cells destined up.
state_down = number of cells destined down.
cell_status - status of cell, whether going up or down.
ack[i] - holds the acknowledgment for cell arriving at input 'i'.
= 0 cell is dropped.
= 1 cell is accepted.
output_decision
output[0]=output[1]=output[2]=output[3]=4; // just a value out of
// range to start with
if( (state_up<=2) && (state_down<=2) ) // In that case no conflict
// will take place.
if(priority[1]>-1) // active cell at input '1'
if (cell_status[1]==up)
output[0]=1;
else
output[2]=1;
if (priority[3]>-1) // active cell at input '3'.
if (cell_status[3]==up) // cell destined up
if (priority[output[0]]<=priority[3])
output[1]=output[0];
output[0]=3;
else
output[1]=3;
if (priority[output[2]]<=priority[3])
output[3]=output[2];
output[2]=3;
else
output[3]=3;
if(priority[0]>-1) // active cell at input '0'.
if (cell_status[0]==up)
output[0]=0;
else
output[1]=0;
else // cell is destined down
if (priority[output[2]]<=priority[0])
output[3]=output[2];
output[2]=0;
else
output[3]=0;
if(state_up>2) // more than two cells destined up
int count_up_now; // an internal counter of how many cells
// have been granted their request to be routed up.
count_up_now=0;
if(priority[3]>-1) // a cell is arriving at input '3'.
if (cell_status[3]==up)
output[0]=3;
count_up_now++;
else // cell is destined down.
output[2]=3;
if(priority[1]>-1) // a cell is arriving at input '1'
if (priority[output[0]]<=priority[1])
output[1]=output[0];
output[0]=1;
else
output[1]=1;
else // a cell is destined down.
output[2]=1;
if(priority[2]>-1) // a cell is arriving at input '2'
case 2: // two cells have already been
// granted their request to be routed up
if (priority[2]>=priority[output[0]])
if (priority[2]>priority[output[1]])
output[1]=2;
break;
case 1:
if (priority[output[0]]<=priority[2])
output[1]=output[0];
output[0]=2;
else
output[1]=2;
output[0]=2;
break;
else // cell is destined down
output[2]=2;
if(priority[0]>-1) // a cell is arriving at input '0'
if (cell_status[0]==up)
switch (count_up_now)
case 2:
if (priority[0]>=priority[output[0]])
if (priority[output[0]]>priority[output[1]])
output[1]=output[0];
output[0]=0;
else
if (priority[0]>priority[output[1]])
output[1]=0;
break;
case 1:
if (priority[output[0]]<=priority[0])
output[1]=output[0];
output[0]=0;
else
output[1]=0;
case 0:
output[0]=0;
break;
else // a cell is destined down
output[2]=0;
if (state_down>2) // It is the same thing we did above but
// different in that there are more than two
// cells destined down.
int count_down_now; // same as count_up_now discussed above.
count_down_now=0;
if(priority[1]>-1) // cell is arriving at input '1'.
else
output[0]=1; // cell destined up
if(priority[3]>-1) // cell is arriving at input '3'
if (cell_status[3]==down)
count_down_now++;
if (priority[output[2]]<=priority[3])
output[3]=output[2];
output[2]=3;
else // cell destined up
output[0]=3;
if(priority[0]>-1) // a cell is arriving at input '0'
if (cell_status[0]==down)
switch (count_down_now)
case 2:
if (priority[0]>=priority[output[2]])
if(priority[output[2]]>priority[output[3]])
output[3]=output[2];
output[2]=0;
else
if (priority[0]>priority[output[3]])
output[3]=0;
break;
case 0:
output[2]=0;
break;
else // cell destined up.
output[0]=0;
switch (count_down_now)
case 2:
if (priority[2]>=priority[output[2]])
if (priority[output[2]]>priority[output[3]])
output[3]=output[2];
output[2]=2;
else
if (priority[2]>priority[output[3]])
output[3]=2;
break;
case 1:
if (priority[output[2]]<=priority[2])
output[3]=output[2];
output[2]=2;
else
output[3]=2;
break;
// end of routine output_decision
ack[0]=ack[1]=ack[2]=ack[3]=-1; // just an initialization.
for(int i=0; i<4; i++) // acknowledging accepted cells by '1'
if(priority[output[i]]>-1) // acknowledge received cells
ack[output[i]]=1;
for(int j=0; j<4; j++) // connecting idle outputs to idle or
// dropped inputs
if(priority[output[j]]<0) // if output 'j' is not
// used at all, scan one of
// the idle inputs and connect
// this output to the first
// idle input found.
else
if(ack[(j+2) % 4] < 0)
output[j] = (j+2) % 4;
else
if(ack[(j+1) % 4] < 0)
output[j] = (j+1) % 4;
else
if(ack[(j+3) % 4] < 0)
output[j] = (j+3) % 4;
for(int k=0; k<4; k++) // acknowledging dropped cells by '0'
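The acknowledgment step at the end of the pseudo code can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `acknowledge` and the constant `NO_INPUT` are hypothetical, the reconnection of idle outputs is left out, and the rule used for the final loop (an active cell that no output accepted is acknowledged with 0) is an assumption, since the body of that loop is not shown above.

```python
NO_INPUT = 4  # out-of-range value the pseudo code uses for "no input connected"


def acknowledge(output, priority):
    """Return ack[i] for the four inputs: 1 accepted, 0 dropped, -1 no cell.

    output[i]   : input connected to output i, or NO_INPUT
    priority[i] : -1 no cell, 0 low priority, 1 high priority
    """
    ack = [-1] * 4  # initialization, as in the pseudo code
    for i in range(4):
        src = output[i]
        # Acknowledge with 1 every active cell that owns an output.
        if src != NO_INPUT and priority[src] > -1:
            ack[src] = 1
    for k in range(4):
        # Assumed rule: an active cell that no output accepted is dropped.
        if priority[k] > -1 and ack[k] != 1:
            ack[k] = 0
    return ack
```

For example, with `output = [3, 1, 4, 4]` and `priority = [1, 0, -1, 1]`, inputs 3 and 1 hold outputs and are accepted, input 0 carries a cell that was not routed and is dropped, and input 2 carries no cell.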
Appendix C 
Throughput Under Uniform
Random Traffic
In [&I], a recursive system of equations for the TP of the BG network was derived.
The analysis assumed only one type of cell priority. Both OL0 and OL2 are called
the output regular links, because they are the favored output links in the case of
only one cell having a routing bit value of 0 or 1, respectively. Both OL1 and OL3 are
called output alternate links because they are used if more than one cell has the same
routing bit. The probability that a cell will get routed through an output regular link
in an SE located at stage i is denoted by $x_{r,i}(1)$. The probability that no cell will
get routed through an output regular link in an SE located at stage i is denoted by
$x_{r,i}(0)$. Similarly, the probability that a cell will (will not) get routed through an output
alternate link of an SE located at stage i is denoted by $x_{a,i}(1)$ ($x_{a,i}(0)$).
In stage 0, for any regular link we can write $x_{r,0}(1) = p/2$, and for any
alternate link we can write $x_{a,0}(1) = 0$. We can also write the following
recursive equations:
.%s(Q) = < ,,-, (0)s: (0) +~~,,-,~o~~~,.-~~o~z~,~-,~l~ 
+ =,,<-,(O)= .,,- d l 1 2  ,,-, (0) + ;2 ,,-, (l)*:,-,(o) 
+ ;< ,,-, (0 )2  ,,-, (1) +=.,-~~o~~~,~-,~l~=~-t~o~=~,'-!~l~ 
z*J(O) = 1 - .%,,(l). (C.4) 
With four cells accepted by each output port, the TP of the BG network is given by:
TP = 22,,'(1) + 2r.,,(l). (c.5) 
Appendix D 
Calculation of INFi and LNFi 
Calculation of INFi
Without loss of generality, we assume that the N critical pairs in any of the
intermediate stages are overlapped as shown in Figure D.1.a. The modulo N relationship
can be depicted as a circle, as shown in Figure D.1.b. Now we want to evaluate the
total number of combinations in the intermediate stage such that the network does
not lose the full access property. Clearly, this is the total number of combinations such
that the faulty SEs do not constitute critical pairs. Moreover, the problem can be
formulated in the following combinatorial form:
If we have N distinctive sites located consecutively to each other on a
circle and each site is capable of holding at most one binary number, 0
or 1, what is the total number of combinations if we place i (i <= N/2)
identical 1s in these sites under the condition that no two 1s can reside in
two consecutive sites? That is, there should be a separation by at least one 0
between any two 1s.
This problem can be divided into two other problems, as shown in Figure D.2, by
unfolding the circle problem into two queue problems. The first problem, illustrated in
Figure D.2.a, is to find the total number of combinations of i 1s in a queue of
N sites such that the first site is 1 and the two adjacent sites cannot contain
1s. The second, illustrated in Figure D.2.b, is to find the total number of
combinations of i 1s in a queue of N sites such that the first site is 0
and no two adjacent sites contain 1s.
The first problem is equivalent to placing the remaining i - 1
objects in (N - 3) - (i - 2) (= N - i - 1) sites.
This is given by $\binom{N-i-1}{i-1}$. The same reasoning applies to
the second problem. The total number of combinations in this case is $\binom{N-i}{i}$.
Finally, INFi is the sum of the above two values.
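The counting argument above can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names `inf_closed_form` and `inf_brute_force` are hypothetical, and the closed form is taken to be the sum of the two binomial terms described above, C(N-i-1, i-1) for the case where the first site holds a 1 and C(N-i, i) for the case where it holds a 0.

```python
from itertools import combinations
from math import comb


def inf_closed_form(n_sites: int, i: int) -> int:
    """Sum of the two queue problems: first site is 1, plus first site is 0."""
    if i == 0:
        return 1  # only the all-zero arrangement
    first_is_one = comb(n_sites - i - 1, i - 1)
    first_is_zero = comb(n_sites - i, i)
    return first_is_one + first_is_zero


def inf_brute_force(n_sites: int, i: int) -> int:
    """Enumerate every placement of i 1s on a circle of n_sites sites."""
    count = 0
    for chosen in combinations(range(n_sites), i):
        occupied = set(chosen)
        # Reject the placement if any 1 has a 1 in the next site (modulo N).
        if all((s + 1) % n_sites not in occupied for s in occupied):
            count += 1
    return count


if __name__ == "__main__":
    for n in (4, 8, 16):
        for i in range(n // 2 + 1):
            print(n, i, inf_closed_form(n, i), inf_brute_force(n, i))
```

The brute-force enumeration is exponential in N and is meant only as a sanity check for small networks.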
Calculation of LNFi
There are N/2 critical pairs in the last stage, as mentioned in Section 6.3. We can deal
with the problem easily by dividing the SEs into two groups such that any group does
not contain any critical pairs, as shown in Figure D.3. LNFi can be given by:
$$\mathrm{LNF}_i = \sum_{j=0}^{i} \binom{N/2}{j} \times \binom{N/2-j}{i-j}; \quad 0 \le i \le N/2, \qquad (D.1)$$
where $\binom{N/2}{j}$ represents the total number of combinations of j errors in the first
group and $\binom{N/2-j}{i-j}$ represents the total number of combinations of i - j errors in
the second group.
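Equation (D.1) can be evaluated and cross-checked with a short Python sketch. The names `lnf_sum` and `lnf_brute_force` are hypothetical, and the brute-force version assumes that SE k of group one forms a critical pair with SE k + N/2 of group two. This pairing is chosen only for illustration; the count does not depend on which SE is matched with which, as long as each SE belongs to exactly one pair.

```python
from itertools import combinations
from math import comb


def lnf_sum(n: int, i: int) -> int:
    """Evaluate (D.1): j faults in group one, i - j in group two."""
    half = n // 2
    return sum(comb(half, j) * comb(half - j, i - j) for j in range(i + 1))


def lnf_brute_force(n: int, i: int) -> int:
    """Count fault patterns of size i in which no critical pair is fully faulty."""
    half = n // 2
    count = 0
    for faulty in combinations(range(n), i):
        faulty_set = set(faulty)
        # Assumed pairing: SE k (group one) with SE k + half (group two).
        if not any(k in faulty_set and k + half in faulty_set for k in range(half)):
            count += 1
    return count


if __name__ == "__main__":
    for n in (4, 8, 16):
        for i in range(n // 2 + 1):
            print(n, i, lnf_sum(n, i), lnf_brute_force(n, i))
```

The sum also collapses to C(N/2, i) * 2^i, because choosing i fault-free-compatible SEs amounts to choosing i of the N/2 pairs and then one SE from each chosen pair.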
Figure D.3: Dividing SEs in the last stage into two groups (Group one, Group two).