Algorithms in fault-tolerant CLOS networks by Lee, Hyunyeop
New Jersey Institute of Technology
Fall 1994
Recommended Citation
Lee, Hyunyeop, "Algorithms in fault-tolerant CLOS networks" (1994). Dissertations. 1090.
https://digitalcommons.njit.edu/dissertations/1090
 
ALGORITHMS IN
FAULT-TOLERANT CLOS NETWORKS
by
Hyunyeop Lee
A Dissertation
Submitted to the Faculty of
New Jersey Institute of Technology
Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy
Department of Electrical and Computer Engineering
October 1994
Copyright © 1994 by Hyunyeop Lee
ALL RIGHTS RESERVED
APPROVAL PAGE 
ALGORITHMS IN 
FAULT-TOLERANT CLOS NETWORKS 
Hyunyeop Lee 
Dr. John D. Carpinelli, Dissertation Advisor 	Date 
Director of Computer Engineering 
Associate Professor of Electrical and Computer Engineering, NJIT 
Dr. Sotirios Ziavras, Committee Member 	 	Date 
Assistant Professor of Electrical and Computer Engineering, NJIT 
Dr. Michael Palis, Committee Member 	 	Date 
Associate Professor of Electrical and Computer Engineering, NJIT 
Dr. Frank Hwang, Committee Member 	 Date 
Member of Technical Staff, AT&T Bell Lab., Murray Hill, NJ 
Dr. Vaclav Benes, Committee Member 	 Date 
Research Associate 
BIOGRAPHICAL SKETCH 
Author: 	Hyunyeop Lee 
Degree: 	Doctor of Philosophy 
Date: 	October 1994 
Undergraduate and Graduate Education: 
o Ph. D. in Electrical Engineering, 
New Jersey Institute of Technology, Newark, NJ, 1994 
o Master of Science in Electrical Engineering, 
New Jersey Institute of Technology, Newark, NJ, 1990 
o Bachelor of Science in Electrical Engineering, 
Yonsei University, Korea, 1980 
Major: 	Electrical Engineering 
Presentations and Publications: 
Hyunyeop Lee and John D. Carpinelli "Algorithms in Fault-tolerant Clos 
Interconnection Networks," 1994 Conference on Information Sciences and 
Systems, Princeton University, NJ 
This work is dedicated to 
my family
ACKNOWLEDGMENT
The author wishes to express his sincere appreciation to his advisor, Dr. John
D. Carpinelli, for his guidance and assistance throughout his research. His continued
friendship and support throughout my career as a graduate student at the New
Jersey Institute of Technology are greatly appreciated. Also, I would like to thank all
the committee members, Dr. Ziavras, Dr. Palis, and Dr. Benes of NJIT, and Dr.
Hwang of AT&T Bell Lab., for sparing their precious time for this defense and for
their valuable suggestions. Finally, the author wishes to thank his family for their
support, help, and encouragement throughout his doctoral studies.
TABLE OF CONTENTS
Chapter
1 INTRODUCTION
1.1 Motivation
1.1.1 Parallel Processing
1.1.2 Interconnection Networks in Multiprocessor Systems
1.2 Background
1.3 Outline
2 MODELLING OF INTERCONNECTION NETWORK
2.1 Introduction
2.2 Interconnection Networks
2.2.1 Representation
2.3 Bipartite Multigraphs
2.4 Fault Tolerance
2.5 Reliability
2.5.1 Fundamentals
2.5.2 System Reliability
3 IMPLEMENTATIONS OF MINS
3.1 Introduction
3.2 Design Factors of Interconnection Networks
3.3 Completely Connected Network
3.4 Crossbar Network
3.5 Clos Interconnection Networks
3.5.1 Network Structures
3.5.2 Properties of the Clos Networks
3.6 Benes Networks
3.7 Discussions
4 DECOMPOSITION OF CLOS MINS
4.1 Introduction
4.2 Matrix Decomposition
4.2.1 Neiman's Algorithm
4.2.2 Ramanujam's Algorithm
4.2.3 Kubale's Counterexample
4.2.4 Jajszczyk's Algorithm
4.2.5 Cardot's Counterexample
4.3 Parallel Decomposition
4.3.1 Carpinelli's Algorithm
4.4 Edge Coloring and Matching
4.4.1 Introduction
4.4.2 Vizing's Method
4.4.3 Euler Partitions
4.4.4 Gabow's Modified Algorithm
4.5 Gordon's Algorithm
4.6 Discussion
5 FAULT TOLERANT MINS
5.1 Introduction
5.2 Extra Stage Cube (ESC) Network
5.3 Fault Tolerant Clos Networks (FTC)
5.3.1 Reconfiguration of the FTC
5.3.2 Examples
6 NOVEL ALGORITHM FOR CLOS MINS
6.1 Introduction
6.2 Failure of Gordon's Algorithm
6.3 New Algorithm for Clos Networks
6.4 Example
6.5 Worst-case Behavior
6.6 Discussion
7 ROUTING FAULT TOLERANT CLOS NETWORKS
7.1 Introduction
7.2 Routing the FTC
7.2.1 Routing FTC with Spare Switches in Outer Stages (Type I)
7.2.2 Routing FTC with Spare Switches in the 2nd Stage (Type II)
7.2.3 Routing FTC with Spare Switches in All Stages (Type III)
7.3 Worst-case Behavior of the Algorithm
7.4 Discussion
8 RELIABILITY OF FAULT TOLERANT CLOS NETWORKS
8.1 Introduction
8.2 Fault Detection and Location of the FTC
8.3 Reliability of the FTC Network
8.4 Effect of Spare Numbers on Reliability
8.5 Space Complexity of the FTC Network
8.6 Optimum Number of Spare Switches in the FTC Network
8.7 Discussion
9 CONCLUSION
9.1 Summary
9.1.1 Routing for Clos Networks
9.1.2 Routing for FTC Networks
9.1.3 Optimum Number of Spare Switches in FTC
9.2 Open Problems
REFERENCES
LIST OF FIGURES
Figure
1.1 Flynn's classification of multicomputers
1.2 Multiprocessor system
1.3 A shared bus system
2.1 Switch settings of two 2 x 2 switches
2.2 Switch settings of three stage network
2.3 A bipartite multigraph
2.4 Series, parallel, and series-parallel systems
3.1 The completely connected network
3.2 The N x N crossbar interconnection network
3.3 The three-stage Clos network
3.4 The three dimensional Clos interconnection network
3.5 The 8 x 8 Benes network
3.6 An example of Looping Algorithm
3.7 The 8 x 8 Waksman network
4.1 Augmenting bipartite multigraphs: (a) before, (b) after
4.2 Euler partitioning
4.3 Relations between the H, S, and C matrices
5.1 The Extra Stage Cube (ESC) network
5.2 The Extra Stage Cube Network: (a) Stage 0 interchange switch (b) Stage
3 interchange switch (c) Stage 0 enabled (d) Stage 0 disabled (e) Stage
3 enabled (f) Stage 3 disabled
5.3 The FTC with n = k = 3, and one extra switch in each stage
5.4 The faulty FTC with X(1,0), X(2,1), and X(2,2) faulty switches
6.1 Worst case runtime vs. k
7.1 The FTC network with extra switches in the outer stages (Type I)
7.2 The FTC network with extra switches in the second stage (Type II)
7.3 The FTC network with extra switches in all stages (Type III)
7.4 Worst case runtime vs. number of y.spares for various k
7.5 Worst case runtime vs. number of x.spares for various n
7.6 Average case runtime vs. number of y.spares for various k
7.7 Average case runtime vs. number of x.spares for various n
7.8 Number of simple, next simple, and successive swaps vs. x.spares
8.1 Reliability vs. number of y.spare switches in Type I networks when k = n = 20
8.2 Reliability vs. number of x.spare switches in Type II networks when k = n = 20
8.3 Cost vs. number of spare switches in Type I networks when n = k = 20 and x = 0
8.4 Cost vs. number of spare switches in Type II networks when n = k = 20 and y = 0
CHAPTER 1
INTRODUCTION
1.1 Motivation
1.1.1 Parallel Processing
Many of today's scientific and industrial problems require enormous processing
power, and the demand for faster computers appears boundless as complicated
applications that process enormous amounts of data emerge. Multi-microprocessor
systems are used in areas requiring one or more of the following:
• Very high computational bandwidths and/or short response times
• High system resilience and fault-tolerance capabilities
• Ability to operate under adverse environmental conditions
• Geographically distributed computing with an associated need for effective
communication between centers
• Storage and retrieval of large volumes of data within a relatively short period
of time
• Very close interactions between equipment and human beings
Advances in technology have achieved some increase in computing power.
Integrated circuit (IC) technology replaced conventional vacuum tubes and transistors
and improved performance in speed, size, and density. Improvements in
device technology, versatile instruction sets, large addressing ranges, and operating
systems also contributed to the increase in processing power. The development of
microprocessor architectures, accompanied by bigger and more powerful instruction
sets, has enabled the overall throughput provided by a single microprocessor to
increase by more than three orders of magnitude during the past few decades.
However, this development is approaching the limit where these technologies can
no longer keep up with the need for more speed. Meeting this need requires
departing from the restriction of the von Neumann architecture, which uses a single
processor to fetch instructions from memory and execute them one at a time.
Long before the advent of microprocessor technology, designers had proposed
the concept of parallel systems as a mechanism to go beyond the upper bound
on performance attainable with a single processor. A single processor can fetch
instructions from memory and execute them only one at a time. Parallel systems,
however, are based on the principle that more than one task can be performed
simultaneously. Evolutionary changes such as parallel computer architectures and
super-fast microprocessors make parallel processing feasible. Parallel processing can
be realized at the software level, at the hardware level, or at both. At the software
level, parallelism is obtained by time-sharing the computer resources among different
programs. Here, the operating system divides the CPU time among the different
programs so that no one program monopolizes the CPU for a long time while others
are waiting. This technique has been used on computers with a single processor to
achieve parallelism in the form of multiprogramming, multitasking, multiuser, and
time-sharing capabilities.
When parallelism is implemented at the hardware level, it can take place
at the computer level, at the processor level, or at the subprocessor level. One
hardware strategy is the use of pipelining [12]. The concept of pipeline processing in
a computer is similar to assembly lines in an industrial plant. To achieve pipelining,
one must subdivide the input task (process) into a sequence of subtasks, each of
which can be executed by a specialized hardware stage that operates concurrently
with other stages in the pipeline. Successive tasks are streamed into the pipe and
are executed in an overlapped fashion at the subtask level. The pipeline consists of a
cascade of processing stages. The stages are pure combinational circuits performing
arithmetic or logic operations over the data stream flowing through the pipe. The
stages are separated by high-speed interface latches. According to the levels of
processing, Handler has proposed a classification scheme of pipeline processors
as arithmetic pipelining, instruction pipelining, and processor pipelining [6]. Vector
pipelines are a special form of pipeline specifically designed to handle vector
instructions over vector operands. Computers with vector instructions are called
vector processors.
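The benefit of overlapped execution can be illustrated with a standard idealized timing model. The following is a minimal sketch, assuming every stage takes exactly one clock cycle and the pipeline never stalls; the function names are illustrative and do not come from the dissertation.

```python
def sequential_cycles(num_tasks: int, num_stages: int) -> int:
    """Cycles needed when each task finishes all stages before the next one starts."""
    return num_tasks * num_stages


def pipelined_cycles(num_tasks: int, num_stages: int) -> int:
    """Cycles needed with overlap: fill the pipe once, then complete one task per cycle."""
    if num_tasks == 0:
        return 0
    return num_stages + (num_tasks - 1)


def pipeline_speedup(num_tasks: int, num_stages: int) -> float:
    """Ratio of sequential to pipelined time (assumes at least one task)."""
    if num_tasks < 1:
        raise ValueError("speedup is only defined for at least one task")
    return sequential_cycles(num_tasks, num_stages) / pipelined_cycles(num_tasks, num_stages)


if __name__ == "__main__":
    # 100 tasks through a 4-stage pipe: 400 cycles sequentially, 103 when overlapped.
    print(sequential_cycles(100, 4), pipelined_cycles(100, 4))
    print(round(pipeline_speedup(100, 4), 2))
```

Under these assumptions the speedup approaches the number of stages as the task stream grows, which is why pipelining pays off mainly for long streams of similar tasks.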
Multiprocessor computers include all systems that use more than one processor
to perform a desired application. The spectrum of such systems ranges from low-cost
personal computers, which frequently utilize a second microprocessor for decoding
the key depressed on the keyboard, to powerful supercomputers and array processors,
which contain hundreds of processors working in parallel. These processors cooperate
to execute the instructions of a program. In the ideal case, a system with n identical
processors could offer n times the throughput available with a single processor.
Alternatively, the additional processors can be used as backups, on an automatic basis,
in case the primary processor malfunctions.
Parallel computer systems can be grouped according to Flynn's classification
[5], which is based on the number of concurrent instruction and data streams in
a computer. An instruction stream is the sequence of instructions executed by a
computer. The data stream is the sequence of data accessed and processed by the
instructions. Flynn defines the four classes as
• SISD (single instruction single data stream)
• SIMD (single instruction multiple data stream)
• MISD (multiple instruction single data stream)
• MIMD (multiple instruction multiple data stream)
Figure 1.1 shows Flynn's classification of parallel computers.
Figure 1.1 Flynn's classification of multicomputers (tree diagram; labels include
MISD: pipeline processors; SIMD: array processors and associative processors;
MIMD: tightly coupled multiple processor systems, moderately coupled distributed
systems, and loosely coupled computer networks)
An array computer [14] is a synchronous array of parallel processors which
consists of many processing elements under the supervision of one control unit. An
array processor can handle a single instruction stream and multiple data streams (SIMD).
Each processing element (PE) consists of a processor with a local memory. Because
of its large number of PEs, the array computer is suitable for applications in image
processing, matrix manipulation, parallel sorting, and fast Fourier transforms.
Another form of parallel processing is distributed processing, which is also
called "computer networking". A computer network is a multicomputer arrangement
where the computers communicate via special processor-to-processor data links. This
is a looser coupling than the shared memory communication of multiprocessing
systems. A network can link computers hundreds of miles or just a few feet apart.
Short-distance networks, perhaps contained in one building, are referred to as "local"
networks. Here the computation load is distributed among more than one computer.
Communications between the different computers take place in the form of passing
messages to obtain data or exchange results. The advantages of a distributed
computing system include fast response, high availability, fault tolerance, resource
sharing, high adaptability to changes in the work load, and better expandability.
These advantages have been enhanced by the availability of low-cost microprocessors
and data link interfaces produced by LSI circuit techniques.
Figure 1.2 Multiprocessor system (memory modules M1 to Mm and processors P1 to Pm
joined by an interconnection network)
1.1.2 Interconnection Networks in Multiprocessor Systems
Clearly, using many processors in the same system can yield more speed than using
one processor. Recent advances in VLSI technologies, coupled with the need for
fast computers, have made large-scale multiprocessor systems economically feasible.
In such systems, hundreds or even thousands of processors are used to carry out
the computations of a program concurrently, thereby speeding up the execution of
the program. Many applications can benefit from this enormous computing power.
The basic architecture of a multiprocessor system is shown in Figure 1.2. In this
configuration, the N processors carry out computations on data stored in the M
memory modules. For the interaction between the processors and memory, there
must be a communication mechanism that enables any processor to access any memory
module in the shortest possible time. This communication channel is the
interconnection network, which plays an important role in multiprocessor systems.
Figure 1.3 A shared bus system (processors 1 through N attached to a common bus)
Interconnection networks were first proposed for use in telephone exchanges to
allow subscribers to talk with one another. Some decades later, researchers began
to consider how networks could be incorporated into computers. Many different
approaches have been considered and some implemented. These include the use of
buses, hierarchies of buses, direct links, single stage networks, multistage networks,
and crossbars. The shared bus is shown in Figure 1.3. When several processors are
connected together via a bus, these processors should be capable of communicating
with each other. It is obvious that, as the number of processors increases, the load on
the interface increases sharply. If one provides a different bus for each path, the cost
of such multiple-bus connections increases as the square of the number of processors.
On the other hand, if only one bus is used, the contention problem between different
messages may become critical. With more processors and memories, the bus becomes a
performance bottleneck. Most designers opt for multiple-bus solutions. The resulting
network is named on the basis of its geometry as a star, a cube, a hypercube, a
hypertree, a cluster, and by other similar self-explanatory names. In all of these
cases, a few pairs of resources have direct links with each other, but other pairs must
communicate via one or more intermediate nodes, thus introducing time delays and
performance degradation. In order to reduce the load on the bus, it is now becoming
common for individual processors to have cache memories.
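The trade-off between dedicated paths and intermediate hops can be made concrete by counting links. The following is a minimal sketch, assuming one dedicated bidirectional link per processor pair in the fully connected case and a binary hypercube of 2^d nodes as a representative sparser geometry; the function names are illustrative and are not taken from the dissertation.

```python
def fully_connected_links(num_nodes: int) -> int:
    """One dedicated link per unordered pair: N(N-1)/2, which grows as N squared."""
    return num_nodes * (num_nodes - 1) // 2


def hypercube_links(dimension: int) -> int:
    """A d-dimensional binary hypercube has 2**d nodes, each with d neighbours."""
    return dimension * 2 ** dimension // 2


def hypercube_hops(src: int, dst: int) -> int:
    """Links traversed between two nodes: the Hamming distance of their binary labels."""
    return bin(src ^ dst).count("1")


if __name__ == "__main__":
    # 64 nodes: 2016 dedicated links versus 192 hypercube links,
    # at the price of up to 6 hops between the most distant pair.
    print(fully_connected_links(64), hypercube_links(6))
    print(hypercube_hops(0b000000, 0b111111))
```

The sparser geometry saves an order of magnitude in links for 64 nodes, while the hop count reflects the delay through intermediate nodes mentioned above.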
The next simplest form of interconnection mechanism is the crossbar [17]. In
a crossbar switch, every input port can be connected to a free output port without
blocking. This is simple, but impractical as the number of processors increases.
A more practical method is the use of multistage interconnection networks (MINs),
which consist of small crossbars and links between them arranged in a way unique to
each MIN. Usually, a multistage network consists of more than one stage of switching
elements and is capable of connecting an arbitrary input terminal to an arbitrary
output terminal. These networks can be divided into three classes: blocking,
rearrangeable, and nonblocking. In blocking networks, simultaneous connections of
more than one terminal pair may result in conflicts in the use of network communication
links. Examples of this type of network include the data manipulator [24], baseline [17],
SW banyan [23], omega [17], flip [25], and delta [28] networks. A network is called
a rearrangeable network if it can realize all possible connections between its inputs
and outputs by rearranging its existing connections so that a connection path for a
new input-output pair can always be established. A well-defined network, the Benes
network, belongs to this class. A network which can handle all possible connections
without blocking is called a nonblocking network; some varieties of the Clos network
are in this class.
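To see why a multistage network can be cheaper than a single crossbar, one can count crosspoints. The sketch below assumes a symmetric three-stage Clos network with `outer` switches in each outer stage, `ports` terminals per outer switch, and `middle` second-stage switches; these parameter names are illustrative and are not the dissertation's notation. The two thresholds used are the classical results for this network: Clos's condition middle >= 2 * ports - 1 for nonblocking operation, and the Slepian-Duguid condition middle >= ports for rearrangeability.

```python
def crossbar_crosspoints(terminals: int) -> int:
    """A single N x N crossbar needs one crosspoint per input-output pair."""
    return terminals * terminals


def clos_crosspoints(ports: int, middle: int, outer: int) -> int:
    """Crosspoints in a symmetric three-stage Clos network.

    Outer stages: `outer` switches of size ports x middle (and middle x ports).
    Second stage: `middle` switches of size outer x outer.
    """
    first_and_third = 2 * outer * ports * middle
    second = middle * outer * outer
    return first_and_third + second


def clos_class(ports: int, middle: int) -> str:
    """Classify the network by its number of second-stage switches."""
    if middle >= 2 * ports - 1:
        return "nonblocking"
    if middle >= ports:
        return "rearrangeable"
    return "blocking"


if __name__ == "__main__":
    # 36 terminals arranged as 6 outer switches with 6 ports each.
    print(crossbar_crosspoints(36))                       # single crossbar
    print(clos_crosspoints(6, 11, 6), clos_class(6, 11))  # nonblocking Clos
    print(clos_crosspoints(6, 6, 6), clos_class(6, 6))    # rearrangeable Clos
```

For small networks the nonblocking Clos design can cost more than the crossbar (for example, 16 terminals with 4 ports per outer switch), so the savings appear only as the number of terminals grows.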
As systems become more complex, the reliability of the system has become a
major concern because just one fault in the system can degrade system performance
or cause the system to fail completely. The function of fault tolerance is to preserve
the delivery of expected system services in the presence of errors. There are two
major aspects to fault tolerance: (1) detecting and diagnosing faults; and (2) avoiding
known faults if such a capability exists. Techniques such as test patterns, dynamic
parity checking, and write/read-back/verify can be used in various interconnection
networks for detecting and diagnosing faults. In order to achieve fault
tolerance, the topology of the network can be modified, usually by adding spare links
and switches. Other methods involve error-correcting codes, bit-slice implementation
with spare bit slices, and duplicating the entire network [57]. Many of the known
interconnection networks can be made fault tolerant. Some examples are the
Extra Stage Cube (ESC) [56], the multipath omega network [59], the F-network
[63], the enhanced IADM network [30], the merged delta [28], the extra stage gamma
network [58], the β-network [66], and the INDRA network [61]. The fault tolerant
Clos network (FTC) has been proposed by Nassar [60]. Little is available in the
literature about the properties and routing of fault tolerant Clos networks.
1.2 Background
Interconnection networks have been widely studied since they play important roles
in telephone switching networks and other communication, data networks and
computing systems. In multiprocessor systems, they are needed as a means of
interprocessor communications. The three-stage Clos network [31] served as a basis
for the Benes network [32] and the Waksman network [35]. Later, other networks
such as the omega network [29], the baseline network [17] and the cube-connected
network [26], were proposed in order to simplify the Benes network.
Several control algorithms for Clos networks have been proposed. The earlier
algorithms were based mostly on matrix decomposition methods. Neiman [33] has
proposed an $O(n^2 k^2)$ algorithm which consists of two phases: a relatively simple
preparatory phase, followed by a complex iteration phase. Here, $n$ represents the
number of switches in the second stage, and $k$ the number of switches in the outer
stage of the Clos network. Tsao-Wu [34] has presented two modifications to the
preparatory phase, which result in lowering the probability that the second phase
will be needed. However, this algorithm does not lower the worst case complexity of
Neiman's algorithm. Waksman introduced another new algorithm [35], and Opferman
and Tsao-Wu suggested the Looping algorithm for the Benes network [32]. A different
algorithm has been proposed by Ramanujam [36]. However, Kubale [37] showed that
Ramanujam's algorithm fails for some permutations. Also, the matrix decomposition
algorithm suggested by Jajszczyk [38] has been proved to fail by Cardot [39]. These
algorithms select elements from the matrix according to certain rules, and backtrack
when they are unable to obtain a permutation matrix. These rules are rather intuitive
and do not work in some cases.
Many algorithms are based on the minimum edge coloring on a bipartite
multigraph. Hwang [40] suggested that edge coloring algorithms for bipartite graphs
can be adapted to decompose Clos networks. Vizing's method uses $O(n^2 k)$ time
to perform a complete coloring since it needs $O(k)$ time to find the alternate
path to color an edge. The Euler partitioning approach to edge coloring uses a
divide-and-conquer technique and was formalized by Gabow and Kariv [46], whose
algorithm runs in time O(nk * lg k). A modified version of the previous algorithm
was presented by Gabow [45], and it runs in time $O(nk \lg k)$. Cole and Hopcroft
[47] also proposed an algorithm by preprocessing the edges while keeping the degree
of a vertex invariant. Lev, Pippenger and Valiant [52] developed parallel edge
coloring algorithms for routing on Clos networks. Recently, Gordon [43] introduced
an algorithm which runs in time $O(nk^{3/2})$ with the aid of specification and count
matrices. Chiu [44] demonstrated that Gordon's algorithm displays errors for some
permutations.
Parallel algorithms were proposed by Nassimi and Sahni [49]. The time bounds
of these algorithms may be reduced if all of the switch sizes are integral powers of
two. Another parallel algorithm was proposed by Carpinelli [50] which eliminates
backtracking by introducing the concept of partitioning. The Benes network control
algorithm for frequently used permutations was reported by Lenfant [53].
The self-routability of Clos networks has been studied by Douglass and Oruc
[54]. This study shows that the Clos network is self-routing if and only if A’/??? < 2
or m = 1. Raghavendra [55] also reported self-routing algorithms in Benes and
shuffle-exchange networks.
Meanwhile, a great deal of effort has been directed to the fault tolerant
multistage interconnection network in order to make the network more reliable and
fault tolerant. A single fault in the interconnection network can cause a severe
degradation in performance unless measures are provided to make the network tolerant
to such faults. With developments in VLSI technology, large scale multiprocessor
systems with fault-tolerant interconnection networks have become feasible. Many
fault tolerant interconnection networks have been proposed. However, few fault
tolerant Clos networks have been studied until Nassar [60], who provided alternate
paths by adding multiplexers and switches to the network.
Although several control algorithms have been proposed in order to reduce the 
run time in the Clos interconnection network, little effort has been made in improving 
the performance by extending the algorithm to the fault-tolerant cases. Nassar's 
control algorithm [60] for the fault tolerant Clos network is based on Neiman's
algorithm, and has the same time complexity as Neiman's algorithm. Considering
that the spare switches can provide alternative paths in the system, his algorithm
could have been faster if he could utilize these paths during the routing process.
The extra switches in the fault-tolerant Clos (FTC) network have been found to
give great flexibility to the routing algorithms by providing alternative paths to the
system, and thus can be used to improve the run time significantly when the system
displays few or no faults. No studies have been made so far about the utilization of
the extra spare switches for the improvement of routing speeds in the fault tolerant
Clos network.
In this thesis, Gordon's algorithm is shown to display errors in some permutations.
Then, a new simple algorithm, based on his algorithm, is proposed for the
control of rearrangeable Clos networks; it works for all permutations.
The new algorithm is extended to the fault tolerant Clos (FTC) network. The extra
switches in the fault-tolerant Clos (FTC) network are used to improve the run time
significantly since they provide alternative paths to the system when the system
displays few or no faults. The effect of increasing the number of extra switches on
system routing time, reliability and cost in fault tolerant Clos networks is analyzed.
Finally, the optimum number of extra switches on the fault tolerant Clos network is
determined which will best satisfy the run time, reliability and cost constraints.
1.3 Outline
This research has demonstrated that Gordon's algorithm displays errors for some
permutations. Next, a new algorithm is proposed for the Clos networks which is
based on Gordon's algorithm. This algorithm is extended to the FTC networks, and
resulting run times are compared with the ordinary networks. The FTC network has
been classified into three types for the purpose of developing algorithms systematically.
The reliabilities for these networks are examined, and the optimum number
of extra switches which satisfies the reliability, run time and cost constraints is
considered.
The rest of the thesis is organized as follows. In chapter 2, basic concepts and
relevant notation which will be used in the thesis are introduced. These include
the representation of interconnection networks, fault tolerance, and reliability of the
system. In chapter 3, the implementation of important MINs such as the crossbar
network, Clos network and Benes network are examined. Routing algorithms based
on the matrix decomposition, edge coloring and matching, and parallel decomposition
are discussed in Chapter 4. Next, Gordon's algorithm is examined and then a counter
example is given which demonstrates that his algorithm has a flaw. A new algorithm
for routing on ordinary Clos networks and three kinds of swaps used in the algorithm
are introduced in chapter 5. In chapter 6, some of the fault tolerant multistage
interconnection networks, such as Extra Stage Cube (ESC) and Fault Tolerant Clos (FTC)
networks, are addressed. In chapter 7, three types of FTC network are discussed,
and swapping rules and conditions in each case are considered. A new algorithm for
the FTC network is proposed, which is extended from the algorithm illustrated in
chapter 5. Reconfiguration of the FTC network is considered next. In chapter 8,
reliabilities of the fault tolerant Clos network are considered and corresponding space
complexities are examined. Also, the fault detection and location of the FTC network
is considered. Finally, conclusions and open problems are presented in Chapter 9.
CHAPTER 2
MODELLING OF INTERCONNECTION NETWORK
2.1 Introduction
The modelling of interconnection networks is important in order to analyze them.
In this chapter, the concept of permutations as well as basic definitions and notations
that are used in interconnection networks are introduced in section 2.2. These provide
a basis for representing interconnection networks in the various matrix forms by
setting each of the stages of the network. Section 2.3 introduces bipartite
multigraphs, which are another method of representing interconnection networks. These
can be used to route a permutation for Clos network in edge coloring algorithms, as
will be shown in chapter 4. The concept of fault tolerance is described in section 2.4,
followed by the concept of reliability in section 2.5. These will be used to describe
the fault tolerant Clos network, which will be discussed in chapter 5.
2.2 Interconnection Networks
2.2.1 Representation
A set is a collection of distinct elements. A mapping or a function from a set $A$ into
a set $B$ is a rule which assigns to each element $a$ of $A$ exactly one element $b$ of $B$.
It is written as $b = (a)f$ to imply that $a$ is mapped to $b$ by $f$. Let $f$ be a mapping of
$A$ into $B$. It is said to be one-to-one if, whenever $a_1 \neq a_2$, $(a_1)f \neq (a_2)f$, and it is
said to be onto if for each $b \in B$ there exists $a \in A$ such that $(a)f = b$. Let $f$ be a
mapping of set $A$ into set $B$ and let $g$ be a mapping of set $B$ into set $C$. The mapping
$f \cdot g$, defined by $(a) f \cdot g = ((a)f)g = (b)g = c$, $a \in A$, $b \in B$, $c \in C$, is called the
composition of $f$ and $g$. A permutation of a set $S$ is a one-to-one mapping of $S$ onto
itself. It is written as $(x)P = y$ to imply that $x$ is mapped onto $y$ by permutation
$P$. Both $x$ and $y$ belong to $S$.
A group is a set $G$ with a binary operation dot ($\cdot$) on $G$, where the binary
operation is associative, there is an identity element $e$ in $G$ such that $e \cdot x = x \cdot e = x$
for all $x$ in $G$, and for each $x$ in $G$, there is an inverse element $x'$ in $G$ with the
property that $x' \cdot x = x \cdot x' = e$. A subgroup of a group $G$ is a subset of $G$ which also
forms a group with respect to the group operation of $G$. The set of all permutations
of $N$ elements on $S$ forms the symmetric group, denoted as $\Sigma_N$. The cardinality of
$\Sigma_N$ is $N!$.
Two notations are used for representing permutation $P$. In standard notation,
also called two-row matrix form, there are two rows of elements; the first row contains
the source elements to be permuted and the second row contains the destination
elements that they are mapped onto. It is written as
\[ P = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{pmatrix} \]
to imply that $(x_i)P = y_i$, $1 \le i \le n$, where $S = \{x_1, x_2, \ldots, x_n\} = \{y_1, y_2, \ldots, y_n\}$.
For example, $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 3 & 1 \end{pmatrix}$ is a permutation which maps 1 to 2, 2 to 4, 3 to itself
and 4 to 1, where $S = \{1, 2, 3, 4\}$. In cyclic notation, the permutation is of the
form $(x_1, x_2, \ldots, x_n)$ where $x_1$ is mapped onto $x_2$, $x_2$ is mapped onto $x_3$, and so on.
The final element $x_n$ is mapped onto the first element $x_1$. It is written as
$P = (x_1, x_2, \ldots, x_n)$ to imply that $(x_1)P = x_2$, $(x_2)P = x_3$, $\ldots$, $(x_n)P = x_1$, where $x_i \in S$,
$S = \{x_1, x_2, \ldots, x_n\}$. The previous example $P = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 3 & 1 \end{pmatrix}$ will be represented
by the cycle (1 2 4). Any element which is mapped onto itself is not written explicitly,
so 3 is not included in the cycle (1 2 4). In particular, the permutation $e$ is called
the identity permutation, and $(x)e = x$ for all $x \in S$. For example, the permutation
$e = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \end{pmatrix}$ maps every element onto itself.
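The conversion from two-row form to cyclic notation described above can be sketched in a few lines of Python. This is only an illustration: the function name `cycles` and the dictionary representation of $P$ are assumptions made here, not notation from the thesis.

```python
def cycles(perm):
    """Return the non-trivial cycles of a permutation.

    `perm` is a dict mapping each source element x to (x)P
    (a hypothetical encoding of the two-row form).
    """
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]          # follow x -> (x)P
        if len(cycle) > 1:       # fixed points are not written explicitly
            result.append(tuple(cycle))
    return result


# The example permutation from the text: 1->2, 2->4, 3->3, 4->1.
P = {1: 2, 2: 4, 3: 3, 4: 1}
print(cycles(P))                 # [(1, 2, 4)]

# The identity permutation has no non-trivial cycles.
identity = {x: x for x in (1, 2, 3, 4)}
print(cycles(identity))          # []
```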
Given a switch with $N$ inputs and $N$ outputs, the setting can be expressed as
an $N \times N$ matrix, $K$. The rows of the matrix represent the inputs of the switch and
the columns represent its outputs. $K[i,j] = 1$ if the switch is set so that input $i$ is
connected to output $j$; otherwise, $K[i,j] = 0$.
The $K$ matrix can be used to represent a stage. A stage is a set of switches
which are disjoint, that is, there is no possible connection from the output of one
switch to the input of another in the set. Notice that the permutations for the two
2 x 2 switches were combined to form one permutation encompassing four elements.
The matrix approach is similar; the 2 x 2 matrices corresponding to the switch
settings are embedded in a 4 x 4 matrix which defines the setting of the entire stage.
Given the settings of two switches
\[ K_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad K_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \]
the 4 x 4 matrix which results from their embedding is
\[ \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]
In the permutation notation, 1 and 2 had to be mapped amongst themselves, as
did 3 and 4. This is because the switches are disjoint, and the input of one switch
cannot be mapped onto the output of the other. In the matrix this is accomplished
by setting the elements of the quadrants not on the main diagonal to zero. In this
example, rows 1 and 2 cannot have non-zero entries in columns 3 and 4. Figure 2.1
shows the resultant switch settings.
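A minimal sketch of the embedding step follows. It assumes the two switch settings are a crossed switch and a straight switch, which is what the 4 x 4 stage matrix in the text implies; the function name `embed` is hypothetical.

```python
def embed(switch_matrices):
    """Place square switch matrices along the main diagonal of a stage matrix.

    Off-diagonal quadrants stay zero because the switches of a stage are disjoint.
    """
    size = sum(len(m) for m in switch_matrices)
    stage = [[0] * size for _ in range(size)]
    offset = 0
    for m in switch_matrices:
        for i, row in enumerate(m):
            for j, value in enumerate(row):
                stage[offset + i][offset + j] = value
        offset += len(m)
    return stage


# Assumed switch settings: first switch crossed, second switch straight.
K1 = [[0, 1], [1, 0]]
K2 = [[1, 0], [0, 1]]

for row in embed([K1, K2]):
    print(row)
# [0, 1, 0, 0]
# [1, 0, 0, 0]
# [0, 0, 1, 0]
# [0, 0, 0, 1]
```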
Figure 2.1 Switch settings of two 2 x 2 switches
A multistage permutation network consists of several stages of switches. The
outputs of one stage serve as the inputs to the next stage. The mapping realized
by the network is derived from the mappings of the individual stages. If the maps
of each stage are represented as matrices, the matrix representing the map of the
entire network is the ordered product of the stage matrices. As an example, consider
a three-stage network as shown in Figure 2.2. The matrices corresponding to the
stage settings are, respectively,
\[
\begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \quad \text{and} \quad
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}.
\]
Their ordered product is
\[ \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \]
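The ordered product can be checked numerically. The sketch below uses the three stage matrices as read from the example above (rows are inputs, columns are outputs); the helper name `matmul` is an assumption of this illustration.

```python
def matmul(a, b):
    """Product of two square 0/1 matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Stage matrices as read from the three-stage example in the text.
M1 = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
M2 = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
M3 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# The stages are applied left to right, so the order of the product matters.
network = matmul(matmul(M1, M2), M3)
for row in network:
    print(row)
# [0, 0, 1, 0]
# [0, 0, 0, 1]
# [1, 0, 0, 0]
# [0, 1, 0, 0]

# Output terminal (1-based) reached by each input terminal.
print([row.index(1) + 1 for row in network])  # [3, 4, 1, 2]
```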
A matrix of this kind becomes very sparse as the number of inputs increases.
For $N$ inputs, a matrix of this form has $N^2$ entries. In order to reduce this size,
a compacted matrix is often used. The $N \times N$ matrix is consolidated into a $k \times k$
matrix, $H_m$, where $m = N/k$. The first row of $H_m$ is the sum of the first $m$ rows
of the original matrix; the second row of $H_m$ is the sum of the second $m$ rows, and
so on. The columns are compacted in a similar manner. The 2 x 2 matrix, $H_2$,
obtained from the ordered product above is
\[ H_2 = \begin{bmatrix} 0 & 2 \\ 2 & 0 \end{bmatrix} \]
Note that the sum of the elements in each row and in each column is exactly $m$.
There is a trade-off that results from the savings in matrix size; one compacted
matrix may represent more than one mapping. The compacted matrix
$\begin{bmatrix} 0 & 2 \\ 2 & 0 \end{bmatrix}$ may represent any of the following matrices.
\[
\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \quad
\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \quad
\begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \quad
\begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}
\]
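The loss of information in compaction is easy to demonstrate. The following sketch sums $m \times m$ blocks of a permutation matrix and shows that the four matrices listed above all collapse to the same compacted matrix; the function name `compact` is an assumption of this illustration.

```python
def compact(matrix, m):
    """Sum m x m blocks of an N x N matrix into a k x k matrix, k = N / m."""
    k = len(matrix) // m
    h = [[0] * k for _ in range(k)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            h[i // m][j // m] += value
    return h


# The four 4 x 4 permutation matrices listed in the text.
candidates = [
    [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 1, 0], [0, 0, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0]],
    [[0, 0, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]],
]

for matrix in candidates:
    print(compact(matrix, 2))    # [[0, 2], [2, 0]] each time
```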
When dealing with compacted matrices, additional information is required to
distinguish between mappings. The matrix $H_m$ is the representation used by the
matrix decomposition class of algorithms for routing on Clos networks. The $H_m$
matrix is a typical way of representing the permutation. However, as the matrix
becomes sparse, further reduction in size is possible with the use of the $S$ and $C$
matrices, which are called the specification matrix and the count matrix. In the $S$
matrix, the $k \times k$ matrix $H_m$ is consolidated into a $k \times n$ matrix where $n$ is the sum
of the elements in any row or column of $H_m$. This is especially useful in representing
the Clos network as is explained in chapter 5. In order to obtain the $S$ matrix, let
\[ P = \begin{pmatrix} 0 & 1 & \cdots & i & \cdots & N-1 \\ y_0 & y_1 & \cdots & y_i & \cdots & y_{N-1} \end{pmatrix}, \]
where $0 \le i \le N - 1$ and $N = nk$. For each
signal $i$, calculate $x$ and $t$, where $x = \lfloor i/n \rfloor$ is the first-stage input switch at which
the signal arrives, and $t = \lfloor y_i/n \rfloor$ is the last-stage output switch to which it should
be routed, and set the next unassigned element in
row $x$ of $S$ to $t$. On the other hand, each element of $C$, $c[x,y]$, $0 \le x \le k - 1$,
$0 \le y \le n - 1$, is the number of occurrences of the integer $x$ in column $y$ of $S$. As
an example, the permutation is given as
\[ P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ 2 & 10 & 3 & 5 & 6 & 11 & 7 & 1 & 9 & 4 & 0 & 8 \end{pmatrix} \]
With rows indexed by input switch and columns by output switch, the $H_3$ matrix is
\[ H = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \]
The $S$ and $C$ matrices are
\[ S = \begin{bmatrix} 0 & 3 & 1 \\ 1 & 2 & 3 \\ 2 & 0 & 3 \\ 1 & 0 & 2 \end{bmatrix} \quad \text{and} \quad
C = \begin{bmatrix} 1 & 2 & 0 \\ 2 & 0 & 1 \\ 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \]
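The construction of the specification and count matrices can be sketched directly from the definitions above. The function name `spec_and_count` and the list encoding of the bottom row of $P$ are assumptions of this illustration; the example values are the 12-signal permutation from the text with $n = 3$ and $k = 4$.

```python
def spec_and_count(y, n):
    """Build the specification matrix S and count matrix C.

    y[i] is the destination of signal i, and n is the number of
    signals per outer-stage switch, so k = len(y) / n switches.
    """
    k = len(y) // n
    S = [[] for _ in range(k)]
    for i, dest in enumerate(y):
        x = i // n               # first-stage input switch of signal i
        t = dest // n            # last-stage output switch of signal i
        S[x].append(t)           # next unassigned element of row x
    # C[x][col] counts the occurrences of integer x in column col of S.
    C = [[sum(1 for row in S if row[col] == x) for col in range(n)]
         for x in range(k)]
    return S, C


y = [2, 10, 3, 5, 6, 11, 7, 1, 9, 4, 0, 8]
S, C = spec_and_count(y, 3)
print(S)  # [[0, 3, 1], [1, 2, 3], [2, 0, 3], [1, 0, 2]]
print(C)  # [[1, 2, 0], [2, 0, 1], [1, 1, 1], [0, 1, 2]]
```

A column of $S$ in which every output switch appears exactly once corresponds to a column of $C$ consisting entirely of ones; in this example no column of $C$ has that form yet.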
A multistage permutation network consists of several stages of switches. The
output of one stage must be connected to the inputs of the next stage. The permutation
realized by a multistage network is the ordered composition of the permutations
realized by its stages. Consider a 4-input, 3-stage network. The first stage
realizes the permutation $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \end{pmatrix}$, and the second and third stages each
realize a further permutation of the same four elements. Composing the
stage permutations in order, the first permutation maps input 1 onto output 2. This
output 2 of the first stage is assigned to input 2 of the second stage; the second
stage permutation routes this to output 4 of the second stage. Finally, input
4 of the third stage is routed to output 4, so the network routes input 1 to output
4. Repeating this for the other inputs, the permutation realized by the entire network is
$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 3 & 1 \end{pmatrix}$, as shown in Figure 2.2.
Figure 2.2 Switch settings of three stage network
2.3 Bipartite Multigraphs
The bipartite multigraph also can be used to represent a permutation for Clos
networks, which will be introduced in chapter 3. A graph $G = (V, E)$ is an ordered
pair of finite sets $V$ and $E$. The elements of $V$ are called vertices, and the elements
of $E$ are called edges. An edge $(v, w)$ is an unordered set of two distinct vertices. If
an edge $(v, w)$ can occur more than once, $G$ is a multigraph. Edge $(v, w)$ is incident
to $v$ and to $w$, and vertices $v$ and $w$ are adjacent. A subgraph of $G$ is a graph whose
vertices and edges are in $G$. To delete edge $e$ from $G$ means to form the subgraph
$G - e$, consisting of all vertices of $G$ and all edges of $G$ except $e$. To delete vertex $v$
from $G$ means to form the subgraph $G - v$, consisting of all vertices of $G$ except $v$
and all edges of $G$ except those incident to $v$. A graph corresponding to a function
has the property that the vertex set can be partitioned into two disjoint subsets
$R$ and $D$ ($R$ corresponds to the set of range vertices and $D$ to the set of domain
vertices) such that all edges in the graph join a vertex in $D$ to one in $R$. There are
no edges that join two vertices in $R$ or two vertices in $D$.
A graph whose vertex set can be partitioned in this way is called a bipartite
graph. All graphs that correspond to functions are bipartite. The degree of a vertex
$v$ is the number of edges incident to $v$. An example of a bipartite multigraph is
shown in Figure 2.3. A graph is regular if all vertices have the same degree. A path
$P$ is a sequence of edges $(v_1, v_2), (v_2, v_3), \ldots, (v_{n-1}, v_n)$. The ends of $P$ are vertices
$v_1$ and $v_n$. If $v_1 \neq v_n$, $P$ is open, otherwise $P$ is closed. A graph is connected if
there is a path between any two distinct vertices. A connected component of a graph
is a maximal connected subgraph. A matching $M$ of $G$ is a set of edges, no two of
which are incident to the same vertex; $M$ covers any vertex incident to an edge in
$M$. An edge coloring of $G$ is an assignment of a color to each edge in $G$ such that
no edges incident to a vertex have the same color. Thus all edges of a given color
form a matching. A minimal edge coloring uses the fewest number of colors possible.
The application of coloring and matching to routing Clos network will be discussed
in Chapter 4.
Figure 2.3 A bipartite multigraph
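The statement that each color class of an edge coloring forms a matching can be checked mechanically. The sketch below is only an illustration on a small hypothetical bipartite multigraph (it is not the graph of Figure 2.3), with domain and range vertices numbered separately; the function name is an assumption.

```python
from collections import defaultdict


def is_proper_edge_coloring(edges, colors):
    """Check that no two edges sharing a vertex receive the same color.

    edges[i] = (d, r) joins domain vertex d to range vertex r; repeated
    pairs are allowed (multigraph). colors[i] is the color of edges[i].
    """
    used = defaultdict(set)
    for (d, r), c in zip(edges, colors):
        for vertex in (("D", d), ("R", r)):
            if c in used[vertex]:
                return False     # this color class is not a matching
            used[vertex].add(c)
    return True


# Hypothetical regular bipartite multigraph of degree 2 with one parallel edge.
edges = [(0, 0), (0, 0), (1, 1), (1, 2), (2, 1), (2, 2)]

print(is_proper_edge_coloring(edges, [0, 1, 0, 1, 1, 0]))  # True
print(is_proper_edge_coloring(edges, [0, 0, 1, 1, 0, 0]))  # False
```

In the first coloring two colors suffice, which equals the degree of the graph.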
2.4 Fault Tolerance
A fault tolerant MIN is one that provides service even when it contains a faulty
component or components. A fault can be either permanent or transient. Fault
tolerance is defined only with respect to a chosen fault tolerance model, which has two
parts. The fault model characterizes all faults assumed to occur, stating the failure
modes for each network component. The fault tolerance criterion is the condition
that must be met for the network to be said to have tolerated a given fault or faults.
The fault model is the type of faults that can occur in the network. Implicitly,
the fault model specifies the type of faults that can be recovered from using the
proposed fault tolerance design. Different designs specify different fault models. A
good design, however, is one whose fault model includes as many fault types as
possible. To illustrate, a typical fault model is as follows.
•  Any network component can fail: MINs are made up of two types of 
components which are switches and links.
•  Switches and links are likely to fail.
• The network is capable of recovering from any such fault.
•  A link fails if it is open or short circuited. A switch fails due to some internal 
malfunction.
The extra hardware added to provide fault tolerance to the network fails at
a lower rate than the network hardware. This assumption is usually made for two
reasons. First, if the extra hardware added to the network to make it fault tolerant
could be assumed to fail at any significant rate, then it would not be possible to
propose any fault tolerance design. In addition, this assumption can be justified for
MINs because these components usually remain idle under normal conditions. Thus
they can be expected to have a longer lifetime than the actively working components
of the network.
The fault tolerance criterion is the condition that must be met in order for the
system to be called fault tolerant. The fault tolerance criterion for the networks is
mainly full-access retention. That is, after a fault occurs, each processor must still
be able to communicate with any memory module. However, the two fault tolerant
designs can offer a higher criterion, i.e., full recovery. Full recovery is the ability of the
network to regain its pre-fault connectivity after a fault occurs. A network is
single-fault tolerant if it can function as specified by its fault tolerance criterion despite
any single fault conforming to its fault model. Generally, if any set of $i$ faults can
be tolerated, then a network is $i$-fault tolerant. A network that can tolerate some
instances of $i$ faults is $i$-robust although not $i$-fault tolerant. Many fault tolerant
systems require fault diagnosis such as fault detection and location to achieve their
fault tolerance. Techniques such as test patterns, dynamic parity checking, and
write/read-back/verify can be used in various interconnection networks.
Fault tolerance can be achieved at various levels in a system. Techniques for fault
tolerant design can be categorized by whether they involve modifying the topology
of the system. Three well known methods that do not modify topology are
error-correcting codes [64], bit-slice implementation with spare bit slices [63], and
duplicating an entire network [65].
2.5 Reliability
2.5.1 Fundamentals
The reliability of a system is defined as the probability that the system will perform
a required function under stated conditions for a stated period of time $t$.
Mathematically, the reliability, $R$, of a system is a function of $\lambda$ and $t$, where $\lambda$ is a constant
representing the failure rate (per unit time). To simplify the analysis in this thesis,
the time factor will be only implicit. In other words, when the reliability
of a switch is quoted, it will mean the reliability of the switch over a given period of
time $t$. This is done because the focus will be on comparing reliabilities, rather than
obtaining the absolute reliability value. In comparing two networks, for instance,
the two networks should be under the same circumstances, including the period of
time, $t$, hence the omission of the time factor. Predicting reliabilities usually involves
dealing with probabilities. It stands to reason then that an overview of probability





Figure 2.4 Series, parallel, and series-parallel systems
2.5.2 System Reliability
Simple systems can generally be classified into three categories as shown in Figure
2.4: series, parallel, and series-parallel. A system can be broken down into isolated
components. First, a series system is defined as a complex system of independent
units connected together, or interrelated, in such a way that the entire system will
fail if any one of the units fails. It is assumed that the failure of one component has
no effect on the probability of any other component failing. Thus, the system can
be no better than its weakest component. Series reliability is calculated using the
product rule as
\[ R_s = \prod_{i=1}^{n} P_i \]
where $P_i$ is the probability that a component $i$ of the system will function properly.
When component reliabilities are equal, the reliability of the system is
\[ R_s = (P_i)^n \]
The unreliability of the system is defined as 1 - reliability. Thus, the unreliability of
a system is
\[ U_s = 1 - R_s = 1 - \prod_{i=1}^{n} P_i \]
On the other hand, a parallel system is defined as a set of interrelated
components connected in such a way that a redundant, or standby, part can take
over the function of a failed part to save the system. Redundancy refers to the use of
more than one part for the same function. The calculations for parallel reliability are
more complex than those for series reliability and include the concept of unreliability.
The parallel reliability of a system is
\[ R_p = 1 - \prod_{i=1}^{n} U_i \]
For equivalent component unreliability,
\[ R_p = 1 - (U_i)^n = 1 - (1 - P_i)^n \]
Parallel reliability increases as the number of components increases, which is the
opposite of the series systems. Parallel systems also display marginal probability,
which refers to the increase in reliability as components are added. As the
redundancy is increased in parallel systems, it is important to balance the costs
involved.
Mixing the two kinds of systems, there can be a series-parallel system which
includes both series and parallel components. Reliability for these systems can be
determined by computing the reliabilities separately, using the rules that apply to
either series or parallel systems, until the entire system is completed.
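The series and parallel rules, and their combination for a series-parallel system, can be illustrated with a short sketch. The component reliabilities used here (0.9 and 0.95) are illustrative assumptions, not values from the thesis, and the function names are hypothetical; independent component failures are assumed, as in the text.

```python
from math import prod


def series(reliabilities):
    """Series rule: the system works only if every component works."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Parallel rule: the system fails only if every component fails."""
    return 1 - prod(1 - p for p in reliabilities)


# Two components of assumed reliability 0.9 each.
print(round(series([0.9, 0.9]), 4))      # 0.81
print(round(parallel([0.9, 0.9]), 4))    # 0.99

# Series-parallel: a redundant pair in series with one further component.
# Evaluate the parallel block first, then apply the series rule.
block = parallel([0.9, 0.9])
print(round(series([block, 0.95]), 4))   # 0.9405
```

Adding a third parallel component of reliability 0.9 raises the parallel figure from 0.99 to 0.999, so each added component contributes a smaller gain than the one before.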
Sometimes a system has $n$ parallel components but needs at least $m$ of them
to remain operational. This problem is a binomial distribution. The reliability of
the system in this case can better be expressed as unity minus the probability of the
complementary event (that is, failure occurring from having between 0 and $m - 1$
operational components). The operational components are indistinguishable from
each other, and so are the non-operational components. Recall that the way to count
the number of ways these components can be arranged together is a combination
problem. Thus the reliability of the system is
\[ R = 1 - \sum_{i=0}^{m-1} \binom{n}{i} P^i (1 - P)^{n-i} \]
This equation is used in chapter 8 to obtain the reliability of the fault-tolerant Clos
network.
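A minimal sketch of the $m$-out-of-$n$ expression follows, assuming identical and independent components of reliability $P$. The function name `m_out_of_n` and the numerical values are illustrative assumptions only.

```python
from math import comb


def m_out_of_n(n, m, p):
    """Reliability of a system of n components that needs at least m working.

    Computed as 1 minus the probability of having between 0 and m - 1
    operational components.
    """
    return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m))


p = 0.9
# m = 1 reduces to the parallel rule, m = n to the series rule.
print(round(m_out_of_n(3, 1, p), 6))  # 0.999
print(round(m_out_of_n(3, 3, p), 6))  # 0.729
# An intermediate case: at least 2 of 3 components must work.
print(round(m_out_of_n(3, 2, p), 6))  # 0.972
```

The two limiting cases give a quick consistency check against the series and parallel formulas of this section.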
CHAPTER 3
IMPLEMENTATIONS OF MINS
3.1 Introduction
The overall performance of multiprocessor configurations is affected by the number
and the type of processors, the communication mechanism between the computing
sources, the characteristics of the computational workload, and the control program.
Whereas the major constraint in uniprocessor systems is the speed of the processor,
the critical factor in multiprocessor systems is the speed of the interconnection
mechanism. The performance of the interconnection mechanism, on the other hand,
is determined by network structures and routing algorithms. A broad spectrum
of networks has been studied ranging from simple linear arrays to the completely
connected situation, with all other configurations falling in between. In many
applications, the choice of an appropriate interconnection network is a key issue in the
design of any system with multiple processing resources. Nonblocking networks
which work for all permutations are particularly well suited for these purposes.
Rearrangeable nonblocking networks and their routing methods are also studied for
their potential uses.
In this chapter, design factors for interconnection networks are discussed in
section 3.2. In the next two sections, the fully-connected and crossbar networks, which
are the most straightforward in design, are examined. In section 3.5, the construction
of the Clos network capable of mapping its $N$ input terminals to its $N$ output terminals
is described. Finally, the Benes network is discussed in section 3.6, followed by
discussion in section 3.7.
3.2 Design Factors of Interconnection Networks
There are fundamental decisions in determining the appropriate architecture of an
interconnection network. The decisions are the operation mode, control strategy,
switching method, and network topology [17]. Among the four decisions, network
topology is a key factor in determining a suitable architectural structure. A network
can be depicted by a graph in which nodes represent switching points and edges
represent communication links. The topologies of interconnection networks tend to
be regular and can be classified into the following two categories: static networks
and dynamic networks. In a static network, links between two processors are
passive and dedicated buses that cannot be reconfigured for direct connections to
other processors. Topologies in the static category can be classified according to
the dimensions required for layout, for example, one-dimensional, two-dimensional,
three-dimensional and hypercube. In a dynamic network, links can be reconfigured
by setting the network's active switching elements.
There are three topological classes in the dynamic network: single-stage,
multistage, and crossbar. A single-stage network is composed of a stage of switching
elements cascaded to a link connection pattern. The shuffle-exchange network is
a single-stage network based on a perfect-shuffle connection cascaded to a stage
of switching elements. A multistage network consists of more than one stage of
switching elements and is usually capable of connecting an arbitrary input terminal
to an arbitrary output terminal. Multistage networks can be one-sided or two-sided.
The one-sided networks have input-output ports on the same side. The two-sided
networks have separate input and output sides.
The control-setting function can be managed by a centralized controller or by
the individual switching elements. The former strategy is called centralized control
and the latter is called distributed control. Generally, centralized control
is simple but takes a longer time. In contrast, distributed control is fast but
requires additional computing resources in each switch. The typical operation modes
of interconnection networks can be classified into three categories: synchronous,
asynchronous, and combined. Also, three switching methodologies can be identified
as circuit switching, packet switching, and integrated switching, which are not
covered in this thesis.
Figure 3.1 The completely connected network
3.3 Completely Connected Network
The ideal situation would be to link each processor directly to every other processor
so that the system is completely connected, as shown in Figure 3.1. Unfortunately,
this is highly impractical for large $N$ because it requires $N - 1$ connections for each
processor, and the total number of connections needed in the network would reach
$N(N - 1)$. For example, if $N = 2^9$, then $2^9(2^9 - 1) = 261{,}632$ links would be needed.
3.4 Crossbar Network
The simplest connection network is the Crossbar network, which has one switch for










Figure 3.2 The $N \times N$ crossbar interconnection network
network would have $N^2$ switches and $O(N^2)$ area. The routing algorithm to set the
switches is trivial. The $N \times N$ Crossbar network is shown in Figure 3.2. All Crossbar
networks are strictly nonblocking. The difficulty with crossbar networks is that the
cost of the network, measured by the number of crosspoint switches, grows with $N^2$. This
makes the crossbar network infeasible for large systems.
3.5 Clos Interconnection Networks
The interconnection networks shown above become impractical as the number of inputs
increases. Many other networks are reported in the literature. Most of them are
blocking networks, which cannot implement all permutations. Rearrangeable
nonblocking networks, such as the Clos network and cellular networks, are networks
without blocking properties. The three-stage Clos interconnection network, which is

























The three-stage Clos network [31] consists of two symmetrical outer stages of
rectangular switches, with an inner stage of square switches. It is completely
determined by the integer parameters $n$, $m$, and $k$ that give the switch dimensions.
The first stage contains $k$ switches, each of which has $m$ inputs and $n$ outputs. Each
switch is actually a simple crossbar switch which can realize any mapping of its
inputs onto its outputs on a one-to-one basis. The second stage consists of $n$ switches
of size $k \times k$, each of which receives exactly one input from each first-stage switch.
The output stage has $k$ switches of size $n \times m$, each of which receives exactly one input
from each second-stage switch. The number of inputs to the network is $N = mk$. Inputs
and outputs of first-stage or third-stage switch $i$ are numbered from
$(i - 1)m + 1$ to $im$, $1 \le i \le k$. The Clos network can reduce the number of
crosspoints, and hence the area, relative to a crossbar switch with the same number of
inputs. For example, when $N = 12$ with $n = m = 3$ and $k = 4$, the number of crosspoints
in the crossbar is $12^2 = 144$, while in the Clos network the total number of crosspoints
is $2 \times 4 \times 3^2 + 3 \times 4^2 = 120$. The Clos network is much easier to visualize
when it is illustrated in three dimensions, as shown in Figure 3.4.
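The crosspoint comparison above can be checked with a short Python sketch. The function names are hypothetical and introduced only for illustration; the counts assume the switch dimensions stated in this section ($k$ outer switches of size $m \times n$ on each side and $n$ middle switches of size $k \times k$).

```python
def crossbar_crosspoints(N):
    """Crosspoints in a single N x N crossbar."""
    return N * N


def clos_crosspoints(m, n, k):
    """Crosspoints in a three-stage Clos network.

    Assumes k first-stage switches (m x n), n middle switches (k x k),
    and k third-stage switches (n x m), as described in the text.
    """
    outer = 2 * k * m * n   # first and third stages together
    middle = n * k * k      # second stage
    return outer + middle


crossbar = crossbar_crosspoints(12)
clos = clos_crosspoints(m=3, n=3, k=4)
```

With $m = n = 3$ and $k = 4$ the two functions reproduce the figures quoted in the text, 144 and 120.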
3.5.2 Properties of the Clos Networks
In contrast to most other interconnection networks, the Clos network satisfies some
important properties. One of them is rearrangeability, which holds if the network
satisfies the condition $n \ge m$. An interconnection network is rearrangeable if it
can connect any idle input to any idle output, possibly by rearranging its existing
paths. If the network satisfies $m = n = k$, then at most $n - 1$ existing calls need to
be moved in the Clos network in order to connect an idle input-output pair. Also, Clos
showed that for $m \ge 2n - 1$, the network is nonblocking in the strict sense [31]. A
network is strictly nonblocking if it is always possible to connect an idle
input-output pair without disturbing the routing already established, no matter what
state the network may be in. Note that a network is nonblocking in the wide sense when
new calls can be set up in a way that avoids all blocking states, so that the system
is effectively nonblocking.
Figure 3.4 The three-dimensional Clos interconnection network.
3.6 Benes Networks
Benes considered the class of rearrangeable 3-stage Clos networks with $n = m = 2$
and $k = 2^i$ for some positive integer $i$. He showed that any such network can be
recursively decomposed into $2i + 1$ stages, each consisting of $N/2$ $2 \times 2$ cells. Benes
networks have $2 \lg N - 1$ stages and $O(N \lg N)$ crossbars, where $N = mk = 2^{i+1}$.
To illustrate Benes's decomposition, consider the 3-stage Clos network with
$n = m = 2$ and $k = 4$, which is depicted in Figure 3.5. The first and last stages
consist of four $2 \times 2$ crossbars, and the center stage consists of two $4 \times 4$ cells. These
cells are decomposed into $2 \times 2$ crossbars and the total number of stages is five, each

























Figure 3.6 An example of the Looping Algorithm
Sequential routing algorithms [42] need $O(N \log N)$ steps, where $N$ is the
network size. Other methods, such as parallel processing, heuristic methods,
or recursive approaches, are used to improve this time complexity. One of
the basic algorithms is the looping algorithm. In order to illustrate the looping
algorithm, consider the permutation
$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 3 & 7 & 4 & 0 & 2 & 6 & 1 & 5 \end{pmatrix}$$
The looping algorithm starts by recording the permutation $P$ in a chart, as shown in
Figure 3.6. The two output numbers of a switching element in the output stage are shown
in the same column, and the two input numbers of a switching element in the input
stage are shown in the same row. Then an arbitrary entry in the chart is chosen as
a starting point. For example, starting at row 23 and column 01, look for an entry in
the same row or column to continue the loop, and choose row 23 and column 45. The
process continues until a loop is closed by re-entering row 23 and column 01. The
loop's member entries are then assigned "a" and "b" alternately. The second loop
can be formed in the same way. Then, assign the input and output lines named "a" to
subnetwork "a" and those named "b" to subnetwork "b". The looping algorithm can
be applied recursively to the two subnetworks. Figure 3.6 shows an 8 x 8 Waksman
network [35].
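The first level of the looping procedure can be sketched in Python as follows. This is a minimal sketch, not the thesis's implementation: it assumes that lines $2j$ and $2j+1$ share a $2 \times 2$ switch (consistent with the row and column labels such as 23 and 01 above), and the name `looping_assign` is hypothetical. Each connection is identified by its input number and is assigned to subnetwork "a" or "b" so that the two connections on any input switch, and on any output switch, go to different subnetworks.

```python
def looping_assign(perm):
    """Assign each connection i -> perm[i] to subnetwork 'a' or 'b'.

    Assumes len(perm) is even and that lines 2j and 2j+1 share a switch.
    """
    n = len(perm)
    inverse = [0] * n
    for src, dst in enumerate(perm):
        inverse[dst] = src

    sub = [None] * n
    for start in range(n):
        i = start
        while sub[i] is None:
            sub[i] = "a"
            # The other output of the same output switch must use 'b'.
            j = inverse[perm[i] ^ 1]
            sub[j] = "b"
            # The other input of j's input switch must use 'a'; keep looping.
            i = j ^ 1
    return sub


P = [3, 7, 4, 0, 2, 6, 1, 5]
assignment = looping_assign(P)
```

Each pass of the inner loop follows one closed loop of the chart; applying the same function to the two induced sub-permutations would give the recursive step.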
3.7 Discussion
The Clos network is nonblocking and rearrangeable. Any idle input terminal of
the network can always be connected to any idle output terminal by rerouting the
existing connections if necessary. Also, the Clos network has an area complexity
less than $O(N^2)$. For systems with a large number of processors, the Clos network
therefore has an area advantage over crossbar switches. The
propagation delay is also an important consideration in permutation network design.
Clos networks have propagation delays ranging from $O(\lg N)$ to $O(N)$, depending
on the values of the design parameters. Other networks, such as Benes networks,
are also of importance. Various methods of implementing control algorithms have been
developed, and will be discussed in chapter 4.





















CHAPTER 4
DECOMPOSITION OF CLOS MINS
There are many algorithms reported in the literature for routing the Clos network.
These algorithms can be classified into three basic categories: matrix decomposition,
edge coloring and matching, and parallel decomposition. These algorithms
determine the setting of the switches of a permutation network to realize a given
permutation, or a connection pattern of every stage from the inputs to the outputs.
First, in this chapter, the matrix decomposition algorithms of Neiman, Ramanujam,
and Jajszczyk are studied, together with the counterexamples of Kubale and Cardot.
The class of routing algorithms for Clos networks which make use of
edge coloring on bipartite graphs is also presented. These two decomposition methods
are reported to be basically the same [50]. The parallel algorithm of Carpinelli
is examined as well. Finally, Gordon's algorithm is discussed, which is the basis of
the new algorithm that will be introduced in chapter 6.
4.1 Introduction
In the Clos network, central routing units are required whose function is to receive
a permutation and to find the corresponding settings for each individual switch
to realize that permutation. Many routing algorithms have been developed for
Clos networks. However, the routing process of the Clos network is extremely serial in
nature, and routing conflicts often occur, which result in backtracking.
Backtracking means going back to previous steps when a conflict occurs in order
to keep decomposing the matrix. The basic approach in routing the Clos network is
to find the switch settings of the second-stage switches and, from there, to set the
first- and third-stage switches accordingly. Once we know the second-stage settings, we
can set the rest of the switches very easily, without any calculations. However, this
is not true if we try to decide the switch settings from outside to inside. Also, setting
the second-stage switches involves conflicts, and the algorithm must backtrack to find
the right switch settings. This keeps the algorithm relatively slow and makes it
highly sequential, which is one of the reasons why few parallel algorithms
have been developed so far. For this reason, Carpinelli's [50] parallel algorithm checks
the possibility of backtracking using a partitioning technique before decomposing the
matrix. One way to improve the speed of the algorithm is preprocessing, which
arranges the switch settings to be closer to the final settings before the algorithm
starts, so that the total workload can be reduced. Three approaches have been explored
in the literature for decomposing the matrix: the matrix decomposition approach,
the coloring and matching approach, and the parallel approach, all of which are covered
in this chapter.
4.2 Matrix Decomposition
4.2.1 Neiman's Algorithm
Neiman's algorithm consists of two steps. The first step tries to mark $k$ elements
of the matrix, no two of which share a row or column. If the first step cannot mark
all $k$ elements, then the second step takes over and finishes marking the elements.
The algorithm is as follows.
Step 1: Starting with the left-most column, mark a non-zero element which has no
marked elements in its row. Repeat this process on the next column, continuing
until all columns are processed. If, during this marking process, a column is found
to have no non-zero entry in a row without marked elements, then
the algorithm proceeds to the next column without marking any element in that
column. If $k$ elements are marked, then the algorithm is done; otherwise, Step 2
must be performed once for each column with no marked elements.
Step 2: If the number of marked elements in Step 1 is $x$, then the number of elements
still to be marked is $k - x$. Mark a non-zero entry in a column with no marked elements,
say $H_m[i,j]$. Unmark the other marked element in this row, $H_m[i,j']$. Mark another
non-zero element in that element's column, $H_m[i',j']$, following the rule that once this
step marks an element in a row or column, no other element may be marked in that row
or column until this iteration of Step 2 is completed. Continue to unmark and mark
elements until no row or column has more than one marked element. This will result
in a matrix with exactly one more marked element than before executing Step 2.
The $k$ marked elements represent the setting for one of the $m$ switches of stage
2. A marked element in row $i$ and column $j$ means that input $i$ of the
switch is to be connected to output $j$ of the same switch. Each marked element in
$H_m$ is then decremented by one to obtain $H_{m-1}$. Next, the algorithm is applied to
$H_{m-1}$ to obtain the setting for another switch in stage 2, and this process is repeated
until $H_1$ is obtained.
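As an example, consider a Clos network with $m = n = 3$ and $k = 4$ with the
permutation
$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ 2 & 7 & 8 & 1 & 5 & 11 & 6 & 3 & 9 & 12 & 4 & 10 \end{pmatrix}$$
The corresponding $H$ matrix is
$$H_3 = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 2 \end{bmatrix}$$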
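The way the $H$ matrix follows from the permutation can be sketched in Python. The sketch assumes that entry $H[i][j]$ counts the inputs of first-stage switch $i$ that are destined for third-stage switch $j$, which is consistent with the example above; the function name `build_h` is hypothetical.

```python
def build_h(perm, n, k):
    """Build the k x k count matrix H from a 1-based permutation.

    perm[s - 1] is the destination of input s; each switch has n lines.
    """
    H = [[0] * k for _ in range(k)]
    for src, dst in enumerate(perm, start=1):
        H[(src - 1) // n][(dst - 1) // n] += 1
    return H


H3 = build_h([2, 7, 8, 1, 5, 11, 6, 3, 9, 12, 4, 10], n=3, k=4)
```

Every row and column of the result sums to $n$, which is what allows it to be split into $n$ permutation matrices.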
Step 1 arbitrarily marks a non-zero element in the first column, $H_3[2,1]$. The
next columns are then marked without reusing any row or column which
has already been marked. Here, $H_3[4,2]$ and $H_3[1,3]$ are marked arbitrarily. Next,
we would need to mark $H_3[3,4]$, but column 4 has no non-zero entry in a row with no
marked elements, so no element is marked in this column. Since there is no marked
element in the fourth column, the second step must be executed. The matrix, with
asterisks representing marked elements, is
$$H_3 = \begin{bmatrix} 1 & 0 & 2^* & 0 \\ 1^* & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 1^* & 0 & 2 \end{bmatrix}$$
Step 2 successively marks and unmarks elements of $H_3$ until it has four marked
elements, no two of which reside in the same row or column. Arbitrarily mark a
non-zero element in column 4, which has no marked elements, say $H_3[2,4]$. Unmark the
other marked element in this row, $H_3[2,1]$, and mark a non-zero entry in the column of
the unmarked element, column 1. One choice is $H_3[1,1]$. This process is repeated
until one more element is marked than was the case after Step 1. Continuing the process
of unmarking and marking elements, $H_3[1,3]$ is unmarked and $H_3[3,3]$ is
marked. Since four elements are now marked, and no two reside in the same
row or column, the algorithm terminates. The matrix $E_3$ can be extracted from the
marked elements of $H_3$ as shown below.
$$H_3 = \begin{bmatrix} 1^* & 0 & 2 & 0 \\ 1 & 1 & 0 & 1^* \\ 1 & 1 & 1^* & 0 \\ 0 & 1^* & 0 & 2 \end{bmatrix} \quad\text{and}\quad E_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$
The $H_2$ matrix is obtained by subtracting $E_3$ from the $H_3$ matrix, and $H_2$ can also
be decomposed using the same method, which yields the other
two solutions, $E_2$ and $E_1$. The time complexity of Neiman's algorithm is known to
be $O(nk^2)$ for pass 1 and $O(n^2k^2)$ overall. For large networks, Neiman's algorithm
displays high time complexity, although his method holds for every possible permutation.
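The repeated extraction of $E$ matrices can be sketched in Python. This is not Neiman's marking procedure verbatim: the sketch swaps in the standard augmenting-path bipartite matching, whose re-routing of earlier choices plays the same role as the mark/unmark chain of Step 2. The function names are hypothetical.

```python
def extract_permutation(H):
    """Return a 0/1 permutation matrix E supported on the nonzeros of H."""
    k = len(H)
    row_of_col = [-1] * k   # row_of_col[j] = row currently marked in column j

    def try_row(i, seen):
        for j in range(k):
            if H[i][j] > 0 and not seen[j]:
                seen[j] = True
                # Column j is free, or its current row can be moved elsewhere.
                if row_of_col[j] == -1 or try_row(row_of_col[j], seen):
                    row_of_col[j] = i
                    return True
        return False

    for i in range(k):
        if not try_row(i, [False] * k):
            raise ValueError("row and column sums of H are not all equal")
    E = [[0] * k for _ in range(k)]
    for j, i in enumerate(row_of_col):
        E[i][j] = 1
    return E


def decompose(H):
    """Split H (all row/column sums equal) into permutation matrices."""
    H = [row[:] for row in H]
    parts = []
    for _ in range(sum(H[0])):
        E = extract_permutation(H)
        for i in range(len(H)):
            for j in range(len(H)):
                H[i][j] -= E[i][j]
        parts.append(E)
    return parts


H3 = [[1, 0, 2, 0], [1, 1, 0, 1], [1, 1, 1, 0], [0, 1, 0, 2]]
parts = decompose(H3)
```

The particular matrices returned may differ from $E_3$, $E_2$, $E_1$ in the text, since the marked elements are chosen arbitrarily; only their sum is fixed.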
4.2.2 Ramanujam's Algorithm
Ramanujam [36] uses a different matrix than the other algorithms in this class, but
it is related to the $H$ matrix. He uses the allocator matrix $M$, which has dimension
$k \times k$, where $M[i,j]$ is the set of all destinations of inputs to first-stage switch $j$
which are output at third-stage switch $i$. It is actually the transpose of $H_m$, with the
entries listed rather than counted. The phase of the algorithm which extracts the desired
matrix operates as follows. Set up a $k \times k$ matrix $T$, where $T[i,j]$ is the maximum
element of $M[i,j]$, or 0 if $M[i,j]$ is empty. The largest element of $T$ is marked, and
its row and column are crossed off. This is repeated on the submatrix left in $T$ until
$T$ is null or contains all zeros. If $T$ is null, the marked elements define a matrix for
extraction. These elements are deleted from $M$, and the process is repeated until
$M$ is null. If $T$ is not null, re-form $T$, replacing the largest value with a zero, and
repeat this stage, choosing the largest element of $T$. The marked elements form the
$E$ matrix. As an example, consider the Clos network with $m = n = 3$ and $k = 4$.
The permutation to be realized is given as
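$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ 2 & 3 & 8 & 7 & 9 & 5 & 11 & 6 & 1 & 10 & 0 & 4 \end{pmatrix}$$
The allocator matrix $M$ and the $T$ matrix are
$$M = \begin{bmatrix} \{2\} & \Phi & \{1\} & \{0\} \\ \{3\} & \{5\} & \Phi & \{4\} \\ \{8\} & \{7\} & \{6\} & \Phi \\ \Phi & \{9\} & \{11\} & \{10\} \end{bmatrix} \quad\text{and}\quad T = \begin{bmatrix} 2 & \Phi & 1 & 0 \\ 3 & 5 & \Phi & 4 \\ 8 & 7 & 6 & \Phi \\ \Phi & 9 & 11 & 10 \end{bmatrix}$$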
From the $T$ matrix, it can be seen that the largest element is $T[3,2]$. Mark this
element, and then delete row 3 and column 2. Since the largest remaining
element is 8, mark $T[2,0]$ and then delete row 2 and column 0. Continuing the same
procedure, $T[1,1]$ and $T[0,3]$ are chosen. Marking each of the chosen elements
with an asterisk, the resulting $M$ is
$$M = \begin{bmatrix} \{2\} & \Phi & \{1\} & \{0\}^* \\ \{3\} & \{5\}^* & \Phi & \{4\} \\ \{8\}^* & \{7\} & \{6\} & \Phi \\ \Phi & \{9\} & \{11\}^* & \{10\} \end{bmatrix}$$
From the marked $M$ matrix, one of the solution matrices, $E_3$, is extracted:
$$E_3 = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$
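The other two solution matrices $E_1$ and $E_2$ can be obtained in the same manner.
$$E_1 = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad E_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$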
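The largest-element-first marking used in this example can be sketched in Python. This is a minimal sketch of a single extraction pass only, without the re-forming of $T$; the function names are hypothetical, and the allocator matrix is built under the assumption that $M[i][j]$ holds the destinations of first-stage switch $j$ that leave through third-stage switch $i$, as defined above.

```python
def allocator_matrix(perm, n, k):
    """M[i][j] = set of destinations from input switch j to output switch i."""
    M = [[set() for _ in range(k)] for _ in range(k)]
    for src, dst in enumerate(perm):
        M[dst // n][src // n].add(dst)
    return M


def mark_largest_first(M):
    """One pass: repeatedly mark the largest remaining entry of T.

    Returns the marked (row, column) pairs, or None if the pass gets stuck.
    """
    k = len(M)
    rows, cols = set(range(k)), set(range(k))
    marked = []
    while rows:
        candidates = [(max(M[i][j]), i, j)
                      for i in rows for j in cols if M[i][j]]
        if not candidates:
            return None
        _, i, j = max(candidates)
        marked.append((i, j))
        rows.remove(i)
        cols.remove(j)
    return marked


P = [2, 3, 8, 7, 9, 5, 11, 6, 1, 10, 0, 4]
M = allocator_matrix(P, n=3, k=4)
marked = mark_largest_first(M)
```

For this permutation the pass marks the same four positions as the worked example, in the same order.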
4.2.3 Kubale’s Counterexample
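Ramanujam's algorithm decomposes a $k \times k$ matrix $M$ of sets of integers, called
the allocator matrix, into $n$ matrices having exactly one nonzero integer in each row
and column. Kubale [37], however, noticed that the algorithm is incorrect for $k \ge 4$
because it may run into an endless loop in Step 3, although it works well for $k < 4$.
For example, consider
$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 2 & 4 & 0 & 3 & 1 & 5 & 6 & 7 \end{pmatrix}$$
with $n = 2$ and $k = 4$. Following the steps described above, the allocator matrix $M$
becomes
$$M = \begin{bmatrix} \Phi & \{0\} & \{1\} & \Phi \\ \{2\} & \{3\} & \Phi & \Phi \\ \{4\} & \Phi & \{5\} & \Phi \\ \Phi & \Phi & \Phi & \{6,7\} \end{bmatrix}$$
By choosing the maximum integer in each of the sets $m_{ij}$ we obtain the integer matrix
$$T = \begin{bmatrix} \Phi & \{0\} & \{1\} & \Phi \\ \{2\} & \{3\} & \Phi & \Phi \\ \{4\} & \Phi & \{5\} & \Phi \\ \Phi & \Phi & \Phi & \{7\} \end{bmatrix}$$
Since 7 is the largest element, $T[3,3]$ is marked, and row 3 and column 3 are removed,
leaving
$$T = \begin{bmatrix} \Phi & \{0\} & \{1\} & \cdot \\ \{2\} & \{3\} & \Phi & \cdot \\ \{4\} & \Phi & \{5\} & \cdot \end{bmatrix}$$
The next largest element is 5, and $T[2,2]$ is marked. $T$ then becomes
$$T = \begin{bmatrix} \Phi & \{0\} & \cdot & \cdot \\ \{2\} & \{3\} & \cdot & \cdot \end{bmatrix}$$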
From the above matrix, it is obvious that an invalid choice is made by selecting
$T[1,1]$ and then $T[0,0]$, which has no elements. Since we are unsuccessful in selecting
four nonzero integers, we must set the largest element of the original $T$ matrix to
zero and go back to the previous steps. Then the $T$ matrix becomes
$$T = \begin{bmatrix} \Phi & \{0\} & \{1\} & \Phi \\ \{2\} & \{3\} & \Phi & \Phi \\ \{4\} & \Phi & \{5\} & \Phi \\ \Phi & \Phi & \Phi & \Phi \end{bmatrix}$$
However, going back to the previous steps has no effect here because the
new $T$ is constructed from the same matrix $M$, and the algorithm loops indefinitely,
thus showing that Ramanujam's algorithm does not work in all cases.
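Kubale's counterexample can be reproduced with a short Python sketch. It assumes the allocator matrix $M$ given above and models only the largest-element-first marking pass, not the full algorithm; the helper names are hypothetical. A brute-force search is included to show that valid selections do exist even though the greedy pass misses them.

```python
from itertools import permutations

EMPTY = set()
M = [
    [EMPTY, {0}, {1}, EMPTY],
    [{2}, {3}, EMPTY, EMPTY],
    [{4}, EMPTY, {5}, EMPTY],
    [EMPTY, EMPTY, EMPTY, {6, 7}],
]


def greedy_largest_first(M):
    """Mark the largest remaining entry until done or stuck."""
    k = len(M)
    rows, cols = set(range(k)), set(range(k))
    marked = []
    while rows:
        candidates = [(max(M[i][j]), i, j)
                      for i in rows for j in cols if M[i][j]]
        if not candidates:
            return marked, False      # stuck: remaining submatrix is empty
        _, i, j = max(candidates)
        marked.append((i, j))
        rows.remove(i)
        cols.remove(j)
    return marked, True


def valid_selections(M):
    """All column choices p with a nonempty entry M[i][p[i]] in every row."""
    k = len(M)
    return [p for p in permutations(range(k))
            if all(M[i][p[i]] for i in range(k))]


marked, complete = greedy_largest_first(M)
alternatives = valid_selections(M)
```

The greedy pass stops after marking $T[3,3]$, $T[2,2]$ and $T[1,1]$, while the exhaustive search finds selections such as rows 0 to 3 taking columns 1, 0, 2, 3.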
4.2.4 Jajszczyk’s Algorithm
Neiman has shown that the control of the rearrangeable switching network can be
interpreted as a procedure for finding a set of $E$ matrices which can be subtracted, one
at a time, from some given $H_m$. The $E$ matrices are the permutations to be realized
by the middle-stage switches, with a one denoting a crosspoint to be closed and a
zero a crosspoint to be left open. Jajszczyk [38] used another approach to find a set
of $E$ matrices, which is as follows.
Step 1: For each row and column of the matrix $H_m$, find the number of zeros.
Step 2: Find the row or column with the maximum number of zeros and mark an
arbitrarily chosen nonzero element in this row or column.
Step 3: Cross out the row and the column containing the marked element. The
size of the matrix is effectively reduced by one, although the indices of the elements
remain unchanged.
Step 4: Repeat the procedure $m - 1$ times, starting from Step 1, for the reduced
matrix. The last element is always a nonzero element and is marked after $m - 1$
repetitions of the procedure.
Step 5: Form an elementary permutation matrix $E$ with the elements $E[i,j]$ given
by
$$E[i,j] = \begin{cases} 0, & \text{if } h_{i,j} \text{ is not marked} \\ 1, & \text{if } h_{i,j} \text{ is marked} \end{cases}$$
The obtained $E$ matrix is then subtracted from the $H_m$ matrix, and the procedure
is repeated for the resultant matrix $H_{m-i}$ ($0 < i \le m - 1$), until the matrix $H_1$ is
obtained. Notice that the matrix $H_1$ is equal to the matrix $E_1$. Jajszczyk's algorithm
is simple and its time complexity is $O(nk^2)$, which is fast among
the matrix decomposition algorithms.
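Steps 1 to 4 can be sketched in Python as below. This is a minimal sketch under stated assumptions: ties between rows and columns with the same number of zeros are broken by taking the first one found, the "arbitrarily chosen" nonzero element is simply the first one, and the function name is hypothetical. The function returns `None` when the chosen row or column has no nonzero element left.

```python
def jajszczyk_mark(H):
    """Mark one element per row and column using the max-zeros rule."""
    k = len(H)
    rows, cols = list(range(k)), list(range(k))
    marked = []
    while rows:
        # Step 1: count zeros in every remaining row and column.
        best = None   # (number of zeros, "row" or "col", index)
        for i in rows:
            zeros = sum(1 for j in cols if H[i][j] == 0)
            if best is None or zeros > best[0]:
                best = (zeros, "row", i)
        for j in cols:
            zeros = sum(1 for i in rows if H[i][j] == 0)
            if zeros > best[0]:
                best = (zeros, "col", j)
        # Step 2: mark a nonzero element in that row or column.
        _, kind, idx = best
        if kind == "row":
            choices = [(idx, j) for j in cols if H[idx][j] > 0]
        else:
            choices = [(i, idx) for i in rows if H[i][idx] > 0]
        if not choices:
            return None   # blocked: the line holds only zeros
        i, j = choices[0]
        marked.append((i, j))
        # Step 3: cross out the row and column of the marked element.
        rows.remove(i)
        cols.remove(j)
    return marked


H3 = [[1, 0, 2, 0], [1, 1, 0, 1], [1, 1, 1, 0], [0, 1, 0, 2]]
marked = jajszczyk_mark(H3)
```

Setting $E[i][j] = 1$ at the returned positions gives the elementary permutation matrix of Step 5.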
4.2.5 Cardot’s Counterexample
Jajszczyk's algorithm is very efficient and works well in most cases. However,
Cardot [39] has found that it can fail. For example, the $H_4$ matrix with $k = 10$ is
given below.
$$H_4 = \begin{bmatrix}
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 2 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 2 & 0 & 0 \\
2 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 2 & 0 \\
0 & 2 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 1 \\
0 & 2 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0
\end{bmatrix}$$
According to Steps 1 to 3 of Jajszczyk's algorithm, the elements $H[3,6]$, $H[5,8]$,
$H[6,1]$ and $H[8,2]$ can be marked, and all the rows and columns containing the marked
elements are crossed out, leaving rows 1, 2, 4, 7, 9, 10 and columns 3, 4, 5, 7, 9, 10:
$$\begin{bmatrix}
1 & 0 & 1 & 0 & 0 & 2 \\
0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 2 & 0 \\
0 & 0 & 0 & 2 & 0 & 1 \\
0 & 0 & 1 & 1 & 0 & 0
\end{bmatrix}$$
Suppose we next choose the element $H[7,4]$, whose column has the maximum of five zeros.
At the next step, column 9 will be empty, so the algorithm is blocked, which means
there is a flaw in Jajszczyk's algorithm.
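The blocking situation can be checked numerically with the Python sketch below. It uses the $H_4$ matrix above with 1-based indices and assumes that the marked elements are $H[3,6]$, $H[5,8]$, $H[6,1]$, $H[8,2]$ and then $H[7,4]$; this reading of the indices is an assumption inferred from the reduced matrix, and the helper name `remaining` is hypothetical.

```python
H4 = [
    [0, 0, 1, 0, 1, 0, 0, 0, 0, 2],
    [0, 0, 0, 0, 1, 0, 1, 2, 0, 0],
    [0, 0, 0, 0, 0, 3, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 2, 0, 0, 0, 2, 0, 0],
    [2, 0, 2, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 2, 0],
    [0, 2, 0, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 2, 0, 0, 1],
    [0, 2, 0, 0, 1, 0, 1, 0, 0, 0],
]


def remaining(H, marked):
    """1-based rows and columns not yet crossed out."""
    used_rows = {r for r, _ in marked}
    used_cols = {c for _, c in marked}
    rows = [i for i in range(1, len(H) + 1) if i not in used_rows]
    cols = [j for j in range(1, len(H) + 1) if j not in used_cols]
    return rows, cols


marked = [(3, 6), (5, 8), (6, 1), (8, 2)]
rows, cols = remaining(H4, marked)
reduced = [[H4[i - 1][j - 1] for j in cols] for i in rows]
zeros_per_col = {j: sum(H4[i - 1][j - 1] == 0 for i in rows) for j in cols}

# Mark the only nonzero element of column 4 and look at column 9 afterwards.
rows_after, _ = remaining(H4, marked + [(7, 4)])
column_9_after = [H4[i - 1][9 - 1] for i in rows_after]
```

Under these assumptions `reduced` equals the 6 x 6 matrix shown above, columns 4 and 9 each contain five zeros, and `column_9_after` contains only zeros.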
4.3 Parallel Decomposition
4.3.1 Carpinelli's Algorithm
Ramanujam's algorithm and Jajszczyk's algorithm fail because they cannot predict
the partitionability of the given permutation matrix in advance. Any algorithm that
performs a matrix extraction must be able to determine whether a partitioning
exists. Neiman's algorithm handles partitioning implicitly, by repeatedly re-marking
elements until the partitions are accounted for, although he never explicitly checked
for them. Carpinelli's algorithm [50] introduces a concept of partitioning which
accounts for the failure of these matrix decomposition algorithms. An algorithm to
recognize this partitioning is given below.
partition(H_m, E_m)
{
    int H'_m, partition_exists, M_1, M_2;
    H'_m = H_m; E_m = 0;
    while (H'_m != 0) {
        partition_exists = NO;
        generate_partition(H'_m, partition_exists, M_1, M_2);
        if (partition_exists == NO) {
            pick i, j such that H'_m[i,j] != 0;
            E_m[i,j] = 1;
        }
        else {
            partition(M_1, E_1);
            partition(M_2, E_2);
        }
    }
}
First, the algorithm initializes the variables $H'_m$ and $E_m$. The while loop
adds elements to $E_m$ until it becomes a permutation matrix. The subroutine
generate_partition() checks whether a partition exists. If a partition exists, the
subroutine forms the submatrices and returns them in $M_1$ and $M_2$, and the two
partition submatrices are processed recursively. If a partition
does not exist, $i$ and $j$ are chosen to mark an arbitrary non-zero element. The
subroutine generate_partition(), which is the heart of the algorithm, is shown below.
generate_partition(H'_m, partition_exists, M_1, M_2)
{
    int R, C;
    M_1 = 0; M_2 = 0;
    parfor (each possible set of rows of H'_m) {
        R = set of rows of H'_m;
        C = set of all columns of H'_m that have at least one non-zero in a row of R;
        if (|R| == |C|) {
            M_1 = rows and columns of H'_m in R and C;
            M_2 = rows and columns of H'_m not in R and C;
            partition_exists = YES;
        }
    }
}
This subroutine generates all the possible sets $R$ in parallel and checks all
possible partitions. First, the subroutine initializes the variables; it then checks
the partitionability in parallel. The condition for partitionability is checked by
extracting the sets $R$ and $C$ and comparing the number of elements in the two sets.
If the number of elements in the two sets is the same, a partition
exists, so the partition submatrices $M_1$ and $M_2$ are formed and the flag is set.
Once a partition is found, the parallel executions are terminated, and the subroutine
exits, returning the values obtained. Returning to the subroutine partition(), the two
partition submatrices are recursively processed and partial $E$ matrices are created.
A partial matrix is a matrix with one or more rows of all zero elements. Then, these
partial $E$ matrices are combined to form $E_m$.
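For example, consider the matrix
$$H_m = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & 2 & 0 & 1 \\ 2 & 0 & 1 & 0 \\ 0 & 1 & 0 & 2 \end{bmatrix}$$
First, the variables are initialized by setting $H'_m = H_m$ and $E_m = 0$, and
partitioning starts. Then partition_exists is set to NO and the subroutine
generate_partition() is called. One of the parallel executions has $R = \{1,3\}$ and
$C = \{1,3\}$. Since $|R| = |C| = 2$, partition_exists is set to YES, and $M_1$ and $M_2$
become
$$M_1 = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} \quad\text{and}\quad M_2 = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$
Since a partition does exist, the algorithm recursively processes $M_1$ and $M_2$,
resulting in the two matrices $E_1$ and $E_2$
$$E_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad E_2 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$
These two matrices are added, resulting in the final matrix $E_m$
$$E_m = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$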
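The partition test of generate_partition() can be sketched serially in Python. The sketch enumerates the row sets one after another instead of using a parallel loop, and it restricts itself to proper, non-empty row sets; both choices are assumptions made for illustration, and the function names are hypothetical. Indices are 0-based here, so the set $\{1,3\}$ of the example appears as `{0, 2}`.

```python
from itertools import combinations


def find_partition(H):
    """Return (R, C) with |R| == |C| for some proper row set R, else None."""
    k = len(H)
    for size in range(1, k):
        for R in combinations(range(k), size):
            # Columns touched by at least one non-zero in a row of R.
            C = {j for i in R for j in range(k) if H[i][j] != 0}
            if len(C) == len(R):
                return set(R), C
    return None


def submatrix(H, rows, cols):
    """Rows and columns of H restricted to the given index sets."""
    return [[H[i][j] for j in sorted(cols)] for i in sorted(rows)]


Hm = [[1, 0, 2, 0], [0, 2, 0, 1], [2, 0, 1, 0], [0, 1, 0, 2]]
R, C = find_partition(Hm)
M1 = submatrix(Hm, R, C)
M2 = submatrix(Hm, set(range(4)) - R, set(range(4)) - C)

# The H matrix from the Neiman example has no such partition.
H3 = [[1, 0, 2, 0], [1, 1, 0, 1], [1, 1, 1, 0], [0, 1, 0, 2]]
no_partition = find_partition(H3)
```

The exhaustive enumeration is exponential in $k$, which is why the original formulation evaluates the candidate sets in parallel.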
4.4 Edge Coloring and Matching
So far, matrices have been used to represent the Clos network, and the decomposition
has taken place on that basis. Another approach is to represent a permutation network
by a bipartite multigraph. The bipartite multigraph $G$ can be expressed as
a triplet $\{V_1, V_2, E\}$, where $V_1$ and $V_2$ are sets of vertices and $E$ is the multiset
of all edges of the multigraph. Coloring is the process of assigning tags, or colors, to
each edge such that no vertex has more than one edge of a given color incident to
it, with the aim of minimizing the number of colors used. Matching
is the process of creating a set of edges such that no two are incident to a common
vertex. The following algorithms make use of coloring and matching to decompose
the permutation effectively.
4.4.1 Introduction
The graph theoretic approach to finding the setting of the switches of stage 1 starts
by treating each switch in stages 0 and 2 as a vertex in a multigraph $G$. Let the set
of switches of stage 0 be denoted as $V_0$ and the set of switches of stage 2 be denoted
as $V_2$. Then, given a permutation $P$, an edge is added between vertex $i$ and vertex
$j$ if an inlet attached to switch $i$ of stage 0 is to be routed to an outlet attached to
switch $j$ of stage 2. The result is the bipartite multigraph $G = (V_0, V_2, E)$,
where $E$ is the set of edges between $V_0$ and $V_2$. $G$ is a multigraph since multiple
edges between vertices are allowed, and it is bipartite since each edge in $G$ is incident
to two vertices, one in $V_0$ and the other in $V_2$. The degree of $G$, which is the
number of edges incident on any vertex, is clearly $m$. The graph theoretic approach
then decomposes $G$ into $m$ subgraphs, each of degree 1. Each such subgraph
represents the setting of one of the $m$ switches of stage 1. An edge in a subgraph
between vertex $i$, $i \in V_0$, and vertex $j$, $j \in V_2$, indicates that an input to switch
$i$ is to be connected to an output of switch $j$. These settings ensure that no conflict
will occur in stage 1 and that all required paths specified by the permutation will be
accommodated.
Many algorithms have been proposed to decompose $G$ in the general case
[40]. Hwang's algorithm runs in $O(k^{5/2})$ time. Other algorithms also exist in which
techniques such as edge coloring and Euler partitioning are used. The graph-based
algorithms are outside the scope of this thesis. The two routing approaches
mentioned above have been discussed extensively in the literature, and the graph
theoretic techniques have always been described as more efficient. However, it has
been found that the edge coloring and direct matrix decomposition approaches are
equivalent [51]. This finding may well lead to a new, unified routing algorithm that
makes the Clos network particularly suitable for processor interconnection in large-scale
multiprocessor systems.
4.4.2 Vizing's Method
Vizing's method [48] of coloring a bipartite multigraph uses alternating paths.
The multigraph is initially uncolored, and each iteration adds one more colored
edge to the multigraph. Assume that edge $(i,j)$, which is incident to vertices $i$ and $j$,
is uncolored. In the multigraph for Clos networks, each vertex has degree $m$. Since
this edge is uncolored, vertices $i$ and $j$ are each missing at least one color. Assume
that vertex $i$ is missing color $a$ and vertex $j$ is missing color $b$. If they both miss
the same color, the edge can be colored with that color. Otherwise, color edge $(i,j)$ with
$a$. This now leaves two edges incident to vertex $j$ with color $a$ and none with
color $b$, so change the color of the other edge from $a$ to $b$. If this causes another
vertex to have two edges colored $b$, change the color of the other edge from $b$ to $a$,
and continue until the coloring is valid. Since the multigraph is bipartite, and both
vertex sets have the same cardinality, there must be at least one other vertex which
needs color $a$. The alternating path, the path of edge color changes, will eventually
find this vertex. An algorithm based on Vizing's method, which was formalized by
Gabow and Kariv [46], is shown below.
augment()
{
    let vertex i miss color a and vertex j miss color b;
    let S be the subgraph of edges colored a or b;
    let P be a connected component of S incident to i or j;
    interchange colors a and b on the edges of P;
    color edge (i, j);
}
As an example, consider Figure 4.1. First, edge $(x_1, y_1)$ is selected. Since
vertex $x_1$ does not have color $a$ and $y_1$ misses color $b$, an alternating path of colors
$a$ and $b$ will be formed. Then edge $(x_1, y_1)$ is colored with $a$.
Figure 4.1 Augmenting bipartite multigraphs: (a) before, (b) after
The time complexity of this algorithm for the complete coloring is $O(|V| \cdot |E|)$,
where $|V|$ is the number of vertices in the multigraph and $|E|$ is the number of edges.
Since $|V| = 2k$ and $|E| = mk$ for the Clos network, the time complexity is again
$O(nk^2)$. Likewise, the space complexity is $O(|V| + |E|)$, which reduces to $O(nk)$.
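The alternating-path coloring can be sketched in Python as below. This is the textbook alternating-path procedure for bipartite multigraphs, offered as an illustration of augment() rather than as Gabow and Kariv's exact formulation. It assumes vertices on each side are numbered from 0 to $k-1$ and that no vertex has degree above $m$; all names are hypothetical.

```python
def color_bipartite_multigraph(edges, k, m):
    """Color edges (u, v) with colors 0..m-1 so no vertex repeats a color."""
    left = [[None] * m for _ in range(k)]    # left[u][c]: edge colored c at u
    right = [[None] * m for _ in range(k)]   # right[v][c]: edge colored c at v
    color = [None] * len(edges)

    for e, (u, v) in enumerate(edges):
        a = left[u].index(None)     # a color missing at u
        b = right[v].index(None)    # a color missing at v
        if right[v][a] is not None:
            # Collect the a/b alternating path that starts at v.
            path, on_right, x, c = [], True, v, a
            while True:
                f = (right if on_right else left)[x][c]
                if f is None:
                    break
                path.append(f)
                x = edges[f][0] if on_right else edges[f][1]
                on_right = not on_right
                c = b if c == a else a
            # Interchange colors a and b along the path.
            for f in path:
                fu, fv = edges[f]
                left[fu][color[f]] = None
                right[fv][color[f]] = None
            for f in path:
                fu, fv = edges[f]
                color[f] = b if color[f] == a else a
                left[fu][color[f]] = f
                right[fv][color[f]] = f
        # Color a is now free at both endpoints.
        color[e] = a
        left[u][a] = e
        right[v][a] = e
    return color


perm = [2, 7, 8, 1, 5, 11, 6, 3, 9, 12, 4, 10]
edges = [((s - 1) // 3, (d - 1) // 3) for s, d in enumerate(perm, start=1)]
colors = color_bipartite_multigraph(edges, k=4, m=3)
```

Edges that receive the same color form one degree-1 subgraph, that is, the setting of one middle-stage switch.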
4.4.3 Euler Partitions
The Euler partition uses a divide-and-conquer technique. It partitions the edges
of $G$ into open and closed paths, so that each vertex of odd (even) degree is the end
of exactly one (zero) open paths. Figure 4.2 shows the Euler partitioning of a graph.
The partition enables the division of $G$ into two edge-disjoint subgraphs $G_1$ and $G_2$.
A path can be found by starting at a vertex of odd or even degree and selecting an
edge. Add the edge to the path, traverse it from the original vertex to the other
vertex it is incident to, and remove it from $G$. Repeat the process until a vertex of
zero degree is reached. If $E \ne \Phi$, then repeat the entire process. Once the multigraph
is reduced to a set of paths, the subgraphs can be determined. This procedure can
be formalized as the following recursive algorithm.
Figure 4.2 Euler partitioning
BEGIN
1. Let δ be the maximum degree in G;
2. IF δ = 1 THEN color all edges in G using a new color
   ELSE
   BEGIN
3.    Form G_1 and G_2 using an Euler partition such that neither





4.4.4 Gabow's Modified Algorithm
Gabow [45] presents a modified version of the previous algorithm which always
determines a minimal edge coloring. If the degree of the multigraph is odd, the algorithm
finds a matching of all vertices having maximum degree. The edges in this matching
are colored and removed from the multigraph. This reduces the degree of the
multigraph by one, so that the degree becomes even. The rest of the algorithm
follows the same procedure as the previous one, as illustrated below.
P R O C E D U R E  EC ( G J ) -  
BEGIN
PR O C E D U R E  REC(G,^);
BEGIN
1. IF 8 is odd THEN 
BEGIN
2. IF 6 = 1 TH EN  M  :=  G  ELSE M D (6’.M);
3. Let c be a new color;
4. FOR each edge e E M  DO
BEGIN
5. color(e) :=  c;




8. IF P is not empty THEN
9. Make L1 and L2 empty lists;
10. FOR each path p in P DO
BEGIN
11. Let p be the sequence of edges e1, ..., er;
12. FOR i := 1 TO r DO
13. IF i is odd THEN put ei in L1 ELSE put ei in L2;
END;
14. FOR i := 1, 2 DO
BEGIN
15. Let Gi be the multigraph consisting of the edges in Li and the





17. Delete all vertices of degree 0 from G;
18. Let δ be the maximum degree of a vertex;
19. REC(G, δ);
END(EC);
MD is a procedure which finds M, a matching of all vertices of maximum
degree. EP forms P, the set of paths needed to derive the Euler partition. Gabow's
algorithm runs in time $O(nk^{3/2} \lg k)$ for the Clos network where m = n.
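The path construction behind the Euler partition, together with the alternating split of steps 10 to 13, can be sketched in Python as follows. This is only an illustrative sketch, not the EP procedure of [45]: the function names `euler_partition`, `split_alternately` and `degrees` are hypothetical, and the sketch assumes that open paths are started from the vertices whose remaining degree is odd before any closed path is traced.

```python
from collections import defaultdict

def euler_partition(edges):
    """Partition the edges of a multigraph into open and closed paths.

    edges: list of (u, v) pairs; parallel edges are allowed.
    Returns a list of paths, each a list of indices into `edges`.
    """
    adj = defaultdict(list)   # vertex -> ids of incident edges not yet inspected
    deg = defaultdict(int)    # vertex -> number of incident edges not yet used
    for eid, (u, v) in enumerate(edges):
        adj[u].append(eid)
        adj[v].append(eid)
        deg[u] += 1
        deg[v] += 1
    used = [False] * len(edges)

    def walk(v):
        # Follow unused edges from v until a vertex of zero remaining degree is reached.
        path = []
        while deg[v] > 0:
            eid = adj[v].pop()
            if used[eid]:
                continue          # edge already consumed from its other endpoint
            used[eid] = True
            a, b = edges[eid]
            deg[a] -= 1
            deg[b] -= 1
            path.append(eid)
            v = b if v == a else a
        return path

    vertices = list(adj)
    paths = []
    for v in vertices:            # open paths: start only where the remaining degree is odd
        if deg[v] % 2 == 1:
            paths.append(walk(v))
    for v in vertices:            # closed paths: every remaining degree is now even
        while deg[v] > 0:
            paths.append(walk(v))
    return paths

def split_alternately(edges, paths):
    """Steps 10-13: odd-numbered edges of each path go to L1, even-numbered to L2."""
    l1, l2 = [], []
    for path in paths:
        for i, eid in enumerate(path, start=1):
            (l1 if i % 2 == 1 else l2).append(edges[eid])
    return l1, l2

def degrees(edge_list):
    """Degree of every vertex in an edge list."""
    d = defaultdict(int)
    for u, v in edge_list:
        d[u] += 1
        d[v] += 1
    return d
```

For a bipartite multigraph every closed path has even length, so the two resulting subgraphs differ in degree by at most one at every vertex, which is what the recursive coloring relies on.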
4.5 Gordon’s Algorithm
Unlike the above algorithm, Gordon [43] uses a unique method to decompose the
matrix, although the nature of his algorithm is the same as the coloring decomposition.
He defined two k × n matrices S and C, called the specification and count
matrices, respectively. The relations between the H, S, and C matrices can be seen
in Figure 4.3. If we use the notation proposed by Neiman in reference to the Clos
network, then the necessary connections are assumed and expressed as a permutation
S matrix:
0 1 2 2
1 3 2 0
0 4 4 3
3 3 0 4
2 4 1 1
H matrix:
1 1 2 0 0
1 1 1 1 0
1 0 0 1 2
1 0 0 2 1
0 2 1 0 1
C matrix:
2 0 1 1
1 1 1 1
1 0 2 1
1 2 0 1
0 2 1 1
Figure 4.3 Relations between the H, S, and C matrices
$$P = \begin{pmatrix} 0 & 1 & \cdots & i & \cdots & N-1 \\ \pi(0) & \pi(1) & \cdots & \pi(i) & \cdots & \pi(N-1) \end{pmatrix}$$
where inlet i is to be connected to outlet π(i), 0 ≤ i ≤ N − 1, and N = mk. Initially,
S is set to represent the specification in the following way. All elements of S are
unassigned. Then for each signal i, 0 ≤ i ≤ N − 1, calculate x and t, where x = ⌊i/n⌋ is
the first-stage input switch at which the signal arrives, and t = ⌊π(i)/n⌋ is the last-stage
output switch to which it should be routed, and set the next unassigned element in
the xth row of S to t. On the other hand, each element of C, c[x, y], 0 ≤ x ≤ k − 1,
0 ≤ y ≤ n − 1, is initialized to the number of occurrences of the integer x in column
y of S. The pointers cx and sx represent rows of the C and S matrices respectively, and
y and z represent columns of the S or C matrix. As an example, a sample P matrix
and the resulting S and C matrices when k = 4 and n = 3 are
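$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ 2 & 10 & 3 & 5 & 6 & 11 & 7 & 1 & 9 & 4 & 0 & 8 \end{pmatrix}$$
$$S = \begin{bmatrix} 0 & 3 & 1 \\ 1 & 2 & 3 \\ 2 & 0 & 3 \\ 1 & 0 & 2 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & 2 & 0 \\ 2 & 0 & 1 \\ 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix}$$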
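The construction of S and C from a permutation, and the relation to the H matrix, can be sketched in Python as follows. The function name `build_matrices` is hypothetical, the sketch assumes m = n so that each row of S receives exactly n entries, and H is taken to count the signals from each input switch to each output switch.

```python
def build_matrices(perm, k, n):
    """Return (S, C, H) for a permutation on N = n * k terminals (assuming m = n)."""
    S = [[] for _ in range(k)]
    for i, outlet in enumerate(perm):
        x = i // n           # first-stage switch at which signal i arrives
        t = outlet // n      # last-stage switch to which it must be routed
        S[x].append(t)       # fill the next unassigned element of row x
    # c[x][y]: number of occurrences of the integer x in column y of S
    C = [[sum(1 for row in S if row[y] == x) for y in range(n)] for x in range(k)]
    # h[x][t]: number of signals going from input switch x to output switch t
    H = [[S[x].count(t) for t in range(k)] for x in range(k)]
    return S, C, H

perm = [2, 10, 3, 5, 6, 11, 7, 1, 9, 4, 0, 8]   # the sample P matrix, k = 4, n = 3
S, C, H = build_matrices(perm, k=4, n=3)
```

Applied to the sample P matrix, this reproduces the S and C matrices shown in the example.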
Algorithm: Initially, sx is set to zero.
Step 1: Find a row cx in column y of C such that c[cx, y] = 0. If no such element
can be found then increment y until either such an element is found or all columns
are satisfied, in which case the algorithm halts with a solution.
Step 2: If we have not halted we must have found c[cx, y] = 0. There must therefore
be another column z (greater than y since we are leaving only satisfied columns to
the left), such that c[cx, z] > 1. This follows since there are exactly n copies of each
element (0 to n − 1) in each row, so a missing element in one column implies a repeated
element in another. We increment z, from the initial value y, until c[cx, z] > 1.
Step 3: We now have a column z of S that contains more than one copy of the missing
element cx. Repeatedly increment sx (mod k) until s[sx, z] = cx. As explained later,
this way of setting sx prevents the algorithm from entering a loop in which the same
elements are swapped repeatedly on successive passes.
Step 4: Swap the elements s[sx, y] and s[sx, z], thus inserting the missing element
s[sx, z] into column y of S. This will as a side effect reduce the number of copies of
the element s[sx, z] in column z.
Step 5: Increment c[cx, y] and c[s[sx, y], z] and decrement c[cx, z] and c[s[sx, y], y],
where s[sx, y] denotes the element that occupied column y before the swap.
Step 6: Increment cx (mod k) and go to Step 1.
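Example: The application of Gordon's algorithm is illustrated by the following
sequence of matrices. The two elements of the S matrix
that have been swapped are marked by *; the incremented and decremented elements
in C are marked by + and −. The P matrix is given as
$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ 4 & 1 & 2 & 11 & 6 & 8 & 9 & 7 & 10 & 0 & 3 & 5 \end{pmatrix}$$
The S and C matrices are
$$S = \begin{bmatrix} 1 & 0 & 0 & 2 \\ 1 & 2 & 2 & 1 \\ 2 & 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 0 & 2 & 2 & 0 \\ 2 & 0 & 0 & 2 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$
In the first iteration, cx = 0, y = 0, z = 1, and sx = 2. The resulting matrices are
as follows with the swapped elements marked with asterisks.
$$S = \begin{bmatrix} 1 & 0 & 0 & 2 \\ 1 & 2 & 2 & 1 \\ 0^{*} & 2^{*} & 0 & 1 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1^{+} & 1^{-} & 2 & 0 \\ 2 & 0 & 0 & 2 \\ 0^{-} & 2^{+} & 1 & 1 \end{bmatrix}$$
In the second iteration, cx = 2, y = 0, z = 1, and sx = 1.
$$S = \begin{bmatrix} 1 & 0 & 0 & 2 \\ 2^{*} & 1^{*} & 2 & 1 \\ 0 & 2 & 0 & 1 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & 1 & 2 & 0 \\ 1^{-} & 1^{+} & 0 & 2 \\ 1^{+} & 1^{-} & 1 & 1 \end{bmatrix}$$
In the third iteration, cx = 1, y = 2, z = 3, and sx = 2.
$$S = \begin{bmatrix} 1 & 0 & 0 & 2 \\ 2 & 1 & 2 & 1 \\ 0 & 2 & 1^{*} & 0^{*} \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & 1 & 1^{-} & 1^{+} \\ 1 & 1 & 1^{+} & 1^{-} \\ 1 & 1 & 1 & 1 \end{bmatrix}$$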
In this example, S becomes the solution matrix after the third step of the algorithm.
The run time is dominated by the number of swaps, which has time complexity
$O(nk^{3/2})$. Gordon's algorithm is basically a special kind of edge coloring algorithm.
Each column of the decomposed S matrix determines the switch setting of a second
stage switch whose destinations are given by the elements in that column. Since the
Clos network has connections from each center-stage switch to each of the last-stage
switches, no two elements in a column of the S matrix are identical. Gordon's
algorithm, however, has been found to display errors, which will be discussed in
chapter 6.
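Steps 1 to 6 of Gordon's algorithm can be sketched in Python as follows. This is a sketch under stated assumptions rather than Gordon's own formulation: the function name `gordon` and the `max_swaps` guard are hypothetical, Step 1 is assumed to scan the rows cyclically starting from the current cx, and Step 3 is assumed to increment sx at least once before testing, which is the reading that reproduces the values of sx in the example above.

```python
def gordon(S, max_swaps=10_000):
    """Rearrange the rows of S in place until every column holds each of 0..k-1 once.

    Returns True on success and False if max_swaps swaps were not enough
    (the guard is an addition of this sketch, not part of the algorithm).
    """
    k, n = len(S), len(S[0])
    C = [[sum(1 for r in range(k) if S[r][y] == x) for y in range(n)] for x in range(k)]
    cx = sx = y = 0
    for _ in range(max_swaps):
        # Step 1: find a missing element cx in column y, moving right past satisfied columns.
        while y < n:
            hit = next((r for r in ((cx + d) % k for d in range(k)) if C[r][y] == 0), None)
            if hit is not None:
                cx = hit
                break
            y += 1
        if y == n:
            return True                      # all columns are satisfied
        # Step 2: find a column z to the right that holds cx more than once.
        z = y
        while C[cx][z] <= 1:
            z += 1
        # Step 3: advance sx (mod k) until row sx holds cx in column z.
        sx = (sx + 1) % k
        while S[sx][z] != cx:
            sx = (sx + 1) % k
        # Steps 4 and 5: swap within row sx and restore the counts.
        moved = S[sx][y]                     # element leaving column y
        S[sx][y], S[sx][z] = S[sx][z], S[sx][y]
        C[cx][y] += 1
        C[moved][z] += 1
        C[cx][z] -= 1
        C[moved][y] -= 1
        # Step 6
        cx = (cx + 1) % k
    return False

S = [[1, 0, 0, 2], [1, 2, 2, 1], [2, 0, 0, 1]]   # S matrix of the example (k = 3, n = 4)
ok = gordon(S)
```

On the example above the sketch performs the same three swaps and halts with the solution matrix; the swap limit matters only for inputs on which the procedure does not terminate.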
4.6 Discussion
Neiman's algorithm, which consists of two stages, works for all permutations.
The matrix algorithms of Jajszczyk and Ramanujam are faster, but do
not work for all permutations. This is due to the improper choice of elements in
the H matrix, which leads to errors in the algorithms. This can be prevented using
the partitioning, which works for all permutations and does not require backtracking.
Gordon's algorithm uses two matrices for the decomposition. However, his algorithm
is closer to the coloring algorithms in nature because elements in each column of
the decomposed S matrix can be considered as edges colored with one of n different
colors. The H, S, and C matrices are closely related, and each matrix has its own
characteristics. Although the H matrix and bipartite multigraphs are basically
the same, edge coloring algorithms usually work faster than matrix decomposition
algorithms, and without errors. Gordon's algorithm does not work for all cases; this
will be discussed in chapter 6. Also, a new routing algorithm is introduced based
on Gordon's algorithm. The routing algorithms for fault tolerant Clos networks are
discussed in chapter 7.
CHAPTER 5
FAULT TOLERANT MINS
5.1 Introduction
In chapter 3, we reviewed the interconnection networks that can be applied to
parallel/distributed computer systems and switching networks. However, these
interconnection networks provide only one path from a given network input to a given
output. Hence, if there is a single hardware fault, fault-free communication will not
be possible between some network input/output pairs. Different approaches to fault
tolerant multistage interconnection networks have been studied. In general, MINs
can be made fault tolerant by adding extra hardware such as switches, interstage
links and multiplexers/demultiplexers. Adding extensive hardware usually decreases
performance degradation under faulty conditions, but increases the cost and size.
Adding little hardware, on the other hand, increases performance degradation under
faulty conditions but keeps the cost and size down. As a consequence, a compromise
must be made where the trade-offs are weighed carefully and the best design is
reached. A good fault tolerance technique is one that needs minimal hardware and
causes minimal performance degradation under faulty conditions. Any fault tolerance
technique should cause no performance degradation under normal conditions. As
the extreme case, duplication provides two networks in parallel, with one being
active and the other being on standby. If a fault occurs, the standby network is switched
in and the faulty network is switched out, and normal operation resumes. This
approach provides the same performance in faulty conditions as in normal conditions,
but increases the cost and size of the system.
A number of fault tolerant MINs have recently been reported for multiprocessor
systems. The details of these techniques depend mainly on the type of network
and the fault tolerance model used. Fault tolerance has also been provided for some
other network architectures through various approaches. In this chapter, some of the
fault tolerant MINs are discussed, including the Extra-Stage Cube (ESC) and the
fault-tolerant Clos network (FTC). The advantages and disadvantages of each network
will be discussed. This will help explain the problem of fault tolerance, and thus will
facilitate its solution. The reconfiguration of the fault tolerant networks when faults
occur is also considered.
5.2 Extra Stage Cube (ESC) Network
The ESC network is formed from the generalized cube (GC) network by adding one
extra stage and multiplexers/demultiplexers to enable or bypass the extra stage (stage
3) and the output stage (stage 0) [56]. An ESC network for N = 8 inputs is shown in
Figure 5.1. The stages are numbered in decreasing order from 3 to 0 starting from
the extra stage. Stage 3 offers two types of paths depending on the states of the
multiplexers. This results in an additional path being available from each source to
each destination. A stage is enabled when its interchange switches provide paths to
the next stage. It is disabled when its interchange switches are bypassed. Enabling
and disabling of stages 3 and 0 is accomplished by having dual input/output ports,
and multiplexers and demultiplexers to select between the input/output lines. Figure
5.2 details interchange switches for stages 3 and 0. At stage 3, a multiplexer selects
between two sets of identical input lines, one of which bypasses the stage 3 switch and
the other of which routes through the switch. At stage 0, a demultiplexer provides
the option of bypassing the switch or routing data through it. Failures may occur
in network interchange switches, links between interchange switches, and network






































Figure 5.2 The Extra Stage Cube Network: (a) Stage 0 interchange switch (b) Stage
3 interchange switch (c) Stage 0 enabled (d) Stage 0 disabled (e) Stage 3 enabled (f)
Stage 3 disabled
Once a fault occurs in the network, the network is recovered in the following
ways. It is assumed that the ESC network can be tested to determine the existence
and location of faults. If an input line connected to a stage 3 multiplexer fails, stage
3 is enabled and the nonfaulty input line is used instead. If the fault is on an input
line to a stage 3 interchange switch, that line is unused and the system continues to
ignore the faulty line. If an output line from a stage 0 switch to a PE is faulty, the
network is reconfigured as if stage 0 is faulty. If the fault is on an output line from
a demultiplexer, that line is unused and the system continues to ignore the faulty
line. Stage 3 and 0 enabling and disabling may be performed by a system control
unit. In normal operation, stage 3 is disabled and stage 0 is enabled. This fault-free
ESC is topologically identical to a GC. If, after running fault detection and location
tests, a fault is found, the network is reconfigured. If the fault is in stage 0, stage 3 is
enabled and stage 0 is disabled. For faults in a link or switch in stages 2 or 1, both
stages 3 and 0 will be enabled. Stage 3 of the ESC network allows access to two
distinct stage 2 inputs. Stages 2 to 0 of the ESC network form a GC topology, so
each of the two stage 2 inputs has a single path to the destination, and these paths
are distinct except for the stage 3 and 0 switches, which are fault-free in this case.
Thus, at least one fault-free path must exist.
The ESC uses a routing tag scheme for the control of the network, which is
similar to the exclusive-or tag scheme for the GC network. The ESC network uses a
4-bit routing tag $T = t_3 t_2 t_1 t_0$ for a one-to-one source to destination connection.
The tag values, which need to be computed, depend on whether the ESC network has
a fault as well as on the source and destination addresses. If the network is fault free,
stage 3 is disabled and the routing tag is $T = t_3 t_2 t_1 t_0$, where $t_3$ is ignored and can
take any value. If there is a fault in a network link or switch in stages 2 to 1, stage 3
is enabled, and bit 3 of the tag is used to control stage 3 and select between
the two paths. The primary path is used if it is not faulty; otherwise,
the secondary path is used. The routing tag $T = 0\,t_2 t_1 t_0$ yields the primary path
and $T = 1\,t_2 t_1 \bar{t}_0$ the secondary path. Stage 0 uses $\bar{t}_0$ instead of $t_0$ to compensate for
the exchange already performed by stage 3. If the fault is in stage 0, stage 3 is enabled
and stage 0 disabled. Routing can be accomplished by substituting stage 3 for
stage 0, because stage 0 and stage 3 perform the same function. In this case, the
tag is $T = t_0 t_2 t_1 t_0$, where the low-order $t_0$ is ignored because stage 0 is disabled. Bit $t_3$ is now set
to $t_0$, and stage 3 performs the function of stage 0.
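The tag rules above can be checked with a small Python sketch for N = 8. The sketch rests on assumptions that are not spelled out in this section: the exclusive-or tag is taken to be the bitwise XOR of the source and destination addresses, a tag bit of 1 is taken to mean "exchange" at that stage, stage i is taken to complement address bit i, and stage 3 is taken to act on bit 0 like stage 0. The names `esc_tag` and `route` and the mode strings are hypothetical.

```python
def esc_tag(src, dst, mode):
    """Return the routing tag (t3, t2, t1, t0) for an ESC with N = 8."""
    x = src ^ dst                              # exclusive-or tag of the GC network
    t2, t1, t0 = (x >> 2) & 1, (x >> 1) & 1, x & 1
    if mode in ("fault_free", "primary"):
        return (0, t2, t1, t0)                 # t3 is ignored when stage 3 is disabled
    if mode == "secondary":
        return (1, t2, t1, 1 - t0)             # stage 0 uses the complement of t0
    if mode == "stage0_fault":
        return (t0, t2, t1, t0)                # stage 3 takes over the role of stage 0
    raise ValueError(mode)

def route(src, tag, stage3_enabled, stage0_enabled):
    """Follow a tag through stages 3, 2, 1, 0 and return the address reached."""
    t3, t2, t1, t0 = tag
    addr = src
    if stage3_enabled and t3:
        addr ^= 1                              # extra stage exchanges on bit 0
    if t2:
        addr ^= 4
    if t1:
        addr ^= 2
    if stage0_enabled and t0:
        addr ^= 1
    return addr

# Secondary path from source 2 to destination 5 with both end stages enabled.
reached = route(2, esc_tag(2, 5, "secondary"), True, True)
```

Under these assumptions every source reaches every destination in all four configurations, and the primary and secondary paths differ in the stage 2 input they use.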
The fault size of the ESC is 1, and any input must remain capable of accessing
any output after the ESC recovers from a fault. The ESC is robust in the presence
of multiple faults. The ESC offers a straightforward routing method. In addition,
the multiplexers and demultiplexers need to be set only after a fault occurs. Also,
the ESC does not need specially designed switches. Simple binary switches and
1 × 2 multiplexers/demultiplexers are used in order to form the ESC along with
interstage links. On the other hand, the ESC requires N/2 extra switches in addition
to N multiplexers and N demultiplexers to achieve fault tolerance for a MIN of size
N. Also, there must be an external hardware unit to set all the multiplexers and
demultiplexers so that data is routed through stage 3 rather than being bypassed
when a fault occurs. Furthermore, after recovering from a fault, additional time is
needed to find if the fault lies on the primary path or on the secondary path before
generating a new routing tag. This time constitutes performance degradation, as it
slows down the system. Although the ESC has drawbacks as well as advantages,
this network is considered one of the best fault tolerant MINs reported.
5.3 Fault tolerant Clos Networks (FTC)
The fault tolerant Clos network adds fault tolerance to the ordinary Clos network
by using extra switches and multiplexers/demultiplexers [60]. Recall that the Clos
network of size N must have k = N/m switches of size m × n in stage 0, and k
switches of size n × m in stage 2. The n switches of stage 1 must be of size k × k.
An ordinary Clos network has n = m. However, when n > m, some degree of fault
tolerance is obtained since extra paths exist in the network.
The FTC achieves fault tolerance in the following ways. To make the outer
stages fault tolerant, $k_{sp}$ extra switches are added to each of these two stages. Also,
$n_{sp}$ extra switches are added to the middle stage in order to make it fault tolerant.
In the FTC, each inlet is connected by a demultiplexer to $1 + n_{sp}$ distinct switches in
stage 0. Also, each outlet is connected by a multiplexer to $1 + n_{sp}$ distinct switches in
stage 2. These multiplexers and demultiplexers serve as a fault recovery mechanism
in case of a fault in either of the two outer stages. Figure 5.3 shows the FTC with
n = k = m = 3 and $k_{sp} = n_{sp} = 1$.
An FTC with N = mk is formed from an ordinary Clos network of size N as follows.
First, use $k + k_{sp}$ switches of size $m \times (n + n_{sp})$ in each of the outer stages. Then
the original center stage switches must be enlarged from $k \times k$ to $(k + k_{sp}) \times (k + k_{sp})$.









































Connect the network inlets to the inputs of the first stage switches via $1 \times (n_{sp} + 1)$
demultiplexers, and the network outlets to the outputs of the third stage switches
via $1 \times (n_{sp} + 1)$ multiplexers.
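The construction steps above can be summarised as a small Python sketch that lists the components of an FTC. The function name `ftc_components` and the dictionary keys are hypothetical, and the third-stage switch size is written as the mirror image of the first-stage size, which is an assumption of this sketch.

```python
def ftc_components(m, n, k, k_sp, n_sp):
    """List the switches and (de)multiplexers of an FTC with N = m * k terminals."""
    N = m * k
    return {
        "terminals": N,
        "outer_switches_per_stage": k + k_sp,
        "first_stage_switch_size": (m, n + n_sp),
        "third_stage_switch_size": (n + n_sp, m),     # assumed mirror of the first stage
        "middle_switches": n + n_sp,
        "middle_switch_size": (k + k_sp, k + k_sp),
        "demultiplexers": N,                          # one per network inlet
        "multiplexers": N,                            # one per network outlet
        "mux_fan": n_sp + 1,                          # 1 x (n_sp + 1) devices
    }

# The FTC of Figure 5.3: n = k = m = 3 with one spare switch per stage.
parts = ftc_components(m=3, n=3, k=3, k_sp=1, n_sp=1)
```

For the network of Figure 5.3 this gives four switches in every stage, 4 × 4 middle-stage switches, and nine 1 × 2 demultiplexers and multiplexers.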
For the FTC, the fault model is defined as follows.
1. Any switch can fail.
2. Any interstage link can fail.
3. External links and multiplexers/demultiplexers cannot fail.
It should be mentioned that faults are assumed to occur independently, and that
faulty components are unusable. The fault tolerance criterion of the FTC is complete
recovery, that is, regaining pre-fault connectivity after a fault occurs.
5.3.1 Reconfiguration of the FTC
It is important for the FTC to be reconfigured in case of faults in order to regain
its pre-fault connectivity. Consider an FTC network with $n_{sp} = k_{sp} = 1$. Let three
switches be $X(f_i, i)$, $0 \le i \le 2$, where $f_0$, $f_1$, and $f_2$ are the unused switches of the first,
second, and third stage, respectively. The configuration of the FTC at any time is
a function of the present values of $f_0$, $f_1$, and $f_2$. In general, the reconfiguration of
the FTC can be performed through one or more of the following operations:
• Setting the multiplexers and demultiplexers
• Terminal relabelling
• Permutation translation
As will be seen below, the value of $f_1$ affects the terminal relabelling, while the
values of $f_0$ and $f_2$ affect the settings of multiplexers/demultiplexers and permutation
translation. The multiplexer/demultiplexer setting is performed if an outer stage
switch fails.
When the FTC is not faulty, one switch in each stage will be unused. This
unused switch can be any switch, but for convenience it will be assumed to be the last
switch in each stage, i.e., X(k, 0), X(n − 1, 1), and X(k, 2). This choice is convenient
because it makes the multiplexers and demultiplexers remain in state 0 under normal
conditions. When a fault occurs, they can switch to state 1, thereby avoiding the
defective switch. Permutation translation is also performed if an outer stage switch
fails. Let $P = \{P_0, P_1, \ldots, P_{N-1}\}$ be an arbitrary permutation of $\{0, 1, \ldots, N-1\}$. In
the actual network, $P_i$ is the outlet to which inlet i is to be connected. In an ordinary
Clos network, P goes directly to the central routing unit, where the settings of the
individual switches are extracted and delivered to the switches for implementation.
In the FTC, the same steps are taken, except that permutation
P is translated before it goes to the central routing unit. Terminal relabelling is
performed if a middle-stage switch fails.
As mentioned above, $f_1$ affects the labelling of the outputs of switches X(c, 0) and
the inputs of switches X(c, 2), $0 \le c < k + 1$. Let these outputs and inputs be referred
to as the inward terminals of the outer stages, or just the inward terminals. In each of
these switches, only m out of the n inward terminals will be used, and will be referred
to as the active terminals. Each active terminal will have two labels: a local one, to be
used by the switch's control unit, and a global one, to be used by the central routing
unit. The local label is an integer z, $0 \le z < m$, and the global label is also an integer
Z, $0 \le Z < m(k + 1)$. The active terminals will be labelled from top to bottom locally,
with respect to the switch, as the sequence 0, 1, ..., m − 1. Globally, the active terminals
that were labelled from top to bottom locally will be labelled from top to bottom, with
respect to the stage, as 0, 1, ..., m(k + 1) − 1. The labels are always updated after a
fault occurs, and the current labels are used to implement the routing information
received from the control unit. More details about the terminal relabelling can be
found in [60]. The reconfiguration of the FTC network can be illustrated more
straightforwardly using the S and C matrices, as can be seen in the following examples.
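The labelling rule can be sketched in Python as follows. The function name `relabel` is hypothetical, and the sketch assumes one spare middle-stage switch, so that every outer-stage switch has m + 1 inward terminals and inward terminal j leads to middle-stage switch j.

```python
def relabel(m, k, f1):
    """Map (outer switch c, inward terminal j) to its (local, global) labels.

    The terminal facing the unused middle-stage switch f1 is skipped, so each
    of the k + 1 outer-stage switches keeps exactly m active terminals.
    """
    labels = {}
    for c in range(k + 1):
        local = 0
        for j in range(m + 1):        # inward terminals, top to bottom
            if j == f1:
                continue              # inactive terminal: receives no label
            labels[(c, j)] = (local, c * m + local)
            local += 1
    return labels

# m = k = 3 with middle-stage switch 2 unused: terminal 2 of every switch is left out.
labels = relabel(m=3, k=3, f1=2)
```

The local labels run from 0 to m − 1 inside each switch and the global labels from 0 to m(k + 1) − 1 over the whole stage, as described above.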
5.3.2 Examples
To illustrate this, consider a permutation P of the FTC with n = k = 3 and one
spare switch in each stage.
$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 4 & 5 & 0 & 8 & 7 & 3 & 2 & 6 \end{pmatrix}$$
Initially, let the unused switches be X(3, 0), X(3, 1), and X(3, 2) in each of the three
stages of the FTC. Recall that this is the configuration suggested to be used under
normal conditions. Then, permutation Q, according to the rules set forth above, will
be
$$Q = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ 3 & 4 & 8 & 7 & 6 & 1 & 2 & 5 & 0 & x & x & x \end{pmatrix}$$
The H and S matrix representations of P are
$$H = \begin{bmatrix} 0 & 2 & 1 \\ 1 & 0 & 2 \\ 2 & 1 & 0 \end{bmatrix} \qquad S = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 2 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$
On the other hand, the H and S matrix representations of Q are
$$H = \begin{bmatrix} 0 & 2 & 1 & 0 \\ 1 & 0 & 2 & 0 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \qquad S = \begin{bmatrix} 1 & 1 & 2 & * \\ 2 & 2 & 0 & * \\ 0 & 1 & 0 & * \\ \# & \# & \# & \end{bmatrix}$$
The size of the matrix H increases by exactly one row and one column, and the S
matrix also has an additional row and column. The additional paths due to extra
switches in the outer stages are represented as pound characters, and by asterisks for the
extra switches in the middle stage, which are explained in greater detail in chapter
As another example, again consider the permutation P of the FTC with n =
k = 3 and one spare switch in each stage. Assume that switches X(1, 0), X(2, 1) and













































X(2, 2) suddenly failed, as shown in Figure 5.4. Due to the failure of X(2, 1), the
inward terminals of stages 0 and 2 should be relabelled. Specifically, inward terminal
number 2 of each switch should be left out in assigning the numbers. The failure
of X(1, 0) and X(2, 2) affects the permutation translation. Permutation P, given
before, is translated according to the rules laid down above to
$$Q = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ 3 & 4 & 11 & x & x & x & 2 & 5 & 0 & 10 & 9 & 1 \end{pmatrix}$$
The routing result will be implemented by all the switches except those that are
defective, namely, X(1, 0), X(2, 1) and X(2, 2). The matrix representation of
permutation Q above is
$$H = \begin{bmatrix} 0 & 2 & 0 & 1 \\ 0 & 0 & 3 & 0 \\ 2 & 1 & 0 & 0 \\ 1 & 0 & 0 & 2 \end{bmatrix}$$
In the previous example, the S matrix was given by
$$S = \begin{bmatrix} 1 & 1 & 2 & * \\ 2 & 2 & 0 & * \\ 0 & 1 & 0 & * \\ \# & \# & \# & \end{bmatrix}$$
Since X(1, 0) is defective and X(3, 0) is a spare switch, all input signals are moved
to the extra switch, and X(1, 0) becomes unusable, which is denoted by dots.
$$S = \begin{bmatrix} 1 & 1 & 2 & * \\ \cdot & \cdot & \cdot & \cdot \\ 0 & 1 & 0 & * \\ 2 & 2 & 0 & * \end{bmatrix}$$
Also, the faulty condition of X(2, 1) forces the elements in column 2 to be bypassed
to the spare switch X(3, 1), which is represented as column 3, resulting in




Finally, the faulty condition of X(2, 2) prevents the use of switch 2 in
the third stage. Instead, the signals assigned to this switch must now use the spare
switch, which will be denoted as 3. Thus the resulting matrix is






Representing the reconfiguration of the network using the S matrix shown above
presents complications because of the introduction of dots in the rows and columns
of the matrix. In chapter 7, the reconfiguration matrix is introduced, which retains
all the information of each switch's use without swapping the rows or columns.
CHAPTER 6
NOVEL ALGORITHM FOR CLOS MINS
6.1 Introduction
Although Gordon's algorithm is simple and fast, as discussed in chapter 4, it
does not work for all permutations. His algorithm has two special features.
First, the use of two matrices in the algorithm improves
the time complexity, since it helps to find the number of occurrences of each element
directly. The second is the use of sx (mod k), which is the heart of the algorithm
and makes the algorithm very effective. In this chapter, it will be shown that
Gordon's algorithm does not work for all cases, and a counterexample will be given
in section 6.2. Next, a new simple algorithm will be introduced. This algorithm
is based on Gordon's algorithm. Three kinds of swaps by which this algorithm
realizes the desired mapping are discussed: 1) simple swap, 2) next simple swap,
and 3) successive swap. Also, we are going to prove that the new algorithm works
for all permutations. In section 6.4, the worst case and the average behavior of the
algorithm are discussed in detail.
6.2 Failure of Gordon's Algorithm
The algorithm given by Gordon is very simple, fast, and works well when the matrix
size is moderate. Although Chiu and Siu [44] claimed that the algorithm was
incorrect, the claim stemmed mainly from a typographical subscript reversal, which led
to a misunderstanding about the algorithm. Gordon reaffirmed in his reply that the
algorithm is still valid. However, our research found that his algorithm may run into
an infinite loop for k ≥ 5. The heart of his algorithm lies in the repeated increment
of sx (mod k) until s[sx, z] = cx, as shown in Step 3 of his algorithm. Recall that
cx represents a row of C which satisfies c[cx, y] = 0. This way of setting sx is
intended to prevent the algorithm from entering a loop in which the same elements
are swapped repeatedly on successive passes. The setting of sx, on the other hand,
is influenced by the choice of cx. However, Gordon did not mention anything specific
about the way of setting the next cx after two elements in row sx of S are swapped. This
is especially true if row cx of C reaches k − 1 while column y of C still contains zeros.
It is quite possible that the increment of cx (mod k) until c[cx, y] = 0 was
used in the algorithm, because this is the easiest and most efficient way to choose
the next value of cx. It is not likely that Gordon chose cx after some calculations
because, if he had done that, he certainly would have made it clear in the paper. We
have tested this algorithm on several possible cases. These include 1) increment of
cx (mod k) after the swap, 2) decrement of cx, with a reset to k − 1 when cx < 0, and
3) random choice of cx, in each case until c[cx, y] = 0. An example for the first
case is given below. The two elements of the S matrix that have been swapped are
marked by *; the incremented and decremented elements in C are marked by + and
−. Suppose that currently cx = 0, y = 1, sx = 4, and
$$S = \begin{bmatrix} 0 & 0 & 2 & 4 & 1 \\ 3 & 1 & 3 & 2 & 0 \\ 4 & 0 & 1 & 3 & 2 \\ 1 & 0 & 4 & 1 & 4 \\ 2 & 2 & 3 & 4 & 3 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & 3 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & 2 & 1 & 1 \\ 1 & 0 & 1 & 2 & 1 \end{bmatrix}$$
After the first repetition, cx = 3, y = 1, sx = 1, z = 2, and
$$S = \begin{bmatrix} 0 & 0 & 2 & 4 & 1 \\ 3 & 3^{*} & 1^{*} & 2 & 0 \\ 4 & 0 & 1 & 3 & 2 \\ 1 & 0 & 4 & 1 & 4 \\ 2 & 2 & 3 & 4 & 3 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & 3 & 0 & 0 & 1 \\ 1 & 0^{-} & 2^{+} & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1^{+} & 1^{-} & 1 & 1 \\ 1 & 0 & 1 & 2 & 1 \end{bmatrix}$$
The second repetition yields cx = 4, y = 1, sx = 4, z = 3, and
$$S = \begin{bmatrix} 0 & 0 & 2 & 4 & 1 \\ 3 & 3 & 1 & 2 & 0 \\ 4 & 0 & 1 & 3 & 2 \\ 1 & 0 & 4 & 1 & 4 \\ 2 & 4^{*} & 3 & 2^{*} & 3 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & 3 & 0 & 0 & 1 \\ 1 & 0 & 2 & 1 & 1 \\ 1 & 0^{-} & 1 & 2^{+} & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1^{+} & 1 & 1^{-} & 1 \end{bmatrix}$$
After the third repetition, cx = 1, y = 1, sx = 1, z = 2, and
$$S = \begin{bmatrix} 0 & 0 & 2 & 4 & 1 \\ 3 & 1^{*} & 3^{*} & 2 & 0 \\ 4 & 0 & 1 & 3 & 2 \\ 1 & 0 & 4 & 1 & 4 \\ 2 & 4 & 3 & 2 & 3 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & 3 & 0 & 0 & 1 \\ 1 & 1^{+} & 1^{-} & 1 & 1 \\ 1 & 0 & 1 & 2 & 1 \\ 1 & 0^{-} & 2^{+} & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}$$
The fourth repetition yields cx = 2, y = 1, sx = 4, z = 3, which restores the
first matrix, so the algorithm enters an infinite loop. When examining the above example,
it can be clearly seen that the (mod k) incrementing of sx does not always
effectively prevent the process from repeatedly finding the same element in following
passes. In most cases, this does not happen and the algorithm behaves well, especially
when k < 5. However, as k increases, the algorithm has more chances to enter a
loop, independent of the way of setting cx described above. Chiu and Siu [44]
reported a new algorithm obtained by modifying Gordon's algorithm, without giving the
time complexity or a proof that it works for all permutations. Also, their algorithm is
trivial, so it will not be covered in this thesis. In the next section, a new algorithm
for decomposing Clos networks is introduced which is based on Gordon's algorithm.
This can be done by scanning the C matrix row-by-row, and by a class of swaps,
which will be explained later.
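The loop in the example above can be reproduced with a short Python sketch that runs Gordon's steps from the given state and reports when a state recurs. The function name `gordon_cycle` and the `limit` parameter are hypothetical. The sketch assumes the first variant tested above (cx is incremented mod k after each swap and then advanced until c[cx, y] = 0) and assumes that sx is incremented at least once in Step 3.

```python
def gordon_cycle(S, cx, y, sx, limit=1000):
    """Run Gordon's steps from a given state and look for a repeated state.

    Returns (first_seen, seen_again) as swap counts if the state (S, cx, y, sx)
    reached after Step 1 recurs, or None if the run halts or `limit` is hit.
    """
    S = [row[:] for row in S]
    k, n = len(S), len(S[0])
    C = [[sum(1 for r in range(k) if S[r][c] == x) for c in range(n)] for x in range(k)]
    seen = {}
    for swaps in range(limit):
        # Step 1: next missing element in column y, scanning rows cyclically from cx.
        while y < n:
            hit = next((r for r in ((cx + d) % k for d in range(k)) if C[r][y] == 0), None)
            if hit is not None:
                cx = hit
                break
            y += 1
        if y == n:
            return None                      # halted with a solution
        state = (tuple(map(tuple, S)), cx, y, sx)
        if state in seen:
            return seen[state], swaps        # same state again: infinite loop
        seen[state] = swaps
        z = y                                # Step 2
        while C[cx][z] <= 1:
            z += 1
        sx = (sx + 1) % k                    # Step 3
        while S[sx][z] != cx:
            sx = (sx + 1) % k
        moved = S[sx][y]                     # Steps 4 and 5
        S[sx][y], S[sx][z] = S[sx][z], S[sx][y]
        C[cx][y] += 1
        C[moved][z] += 1
        C[cx][z] -= 1
        C[moved][y] -= 1
        cx = (cx + 1) % k                    # Step 6
    return None

S0 = [[0, 0, 2, 4, 1],
      [3, 1, 3, 2, 0],
      [4, 0, 1, 3, 2],
      [1, 0, 4, 1, 4],
      [2, 2, 3, 4, 3]]
cycle = gordon_cycle(S0, cx=0, y=1, sx=4)
```

Starting from the state of the example, the sketch returns to the same S, cx, y and sx after four swaps, which matches the four repetitions traced above.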
6.3 New Algorithm for Clos Networks
Although Gordon's algorithm is simple and fast, it has been demonstrated
to fail for some permutations, as shown in the previous section. In this section,
a new algorithm will be discussed which is based on Gordon's algorithm, but uses
a different approach. In order to describe the algorithm, we shall use the notation
proposed by Neiman in reference to the Clos network. The necessary connections
are assumed and expressed as a permutation:
$$P = \begin{pmatrix} 0 & 1 & \cdots & i & \cdots & N-1 \\ \pi(0) & \pi(1) & \cdots & \pi(i) & \cdots & \pi(N-1) \end{pmatrix}$$
where inlet i is to be connected to outlet π(i), 0 ≤ i ≤ N − 1, and N = mk.
This algorithm uses two k × n matrices S and C, called the specification and count
matrices, which were described in chapter 2. In order to obtain the S matrix from
the permutation matrix, the following step must be taken. Initially, all elements of
S are unassigned. Then for each signal i, 0 ≤ i ≤ N − 1, calculate x and t, where
x = ⌊i/n⌋ is the first-stage input switch at which the signal arrives, and t = ⌊π(i)/n⌋ is
the last-stage output switch to which it should be routed, and set the next unassigned
element in the xth row of S to t. The first stage switches are denoted by x, and the
second stage switches are represented by y. Each element s[x, y] is the destination switch
in the third stage. Each element of C, c[x, y], 0 ≤ x ≤ k − 1, 0 ≤ y ≤ n − 1, is
initialized to the number of occurrences of the integer x in column y of S.
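As an example, a permutation P and the S and C matrices when k = 4 and
n = 3 are as follows.
$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ 2 & 10 & 3 & 5 & 6 & 11 & 7 & 1 & 9 & 4 & 0 & 8 \end{pmatrix}$$
$$S = \begin{bmatrix} 0 & 3 & 1 \\ 1 & 2 & 3 \\ 2 & 0 & 3 \\ 1 & 0 & 2 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & 2 & 0 \\ 2 & 0 & 1 \\ 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix}$$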
In order to explain the algorithm, it is necessary to define some of the terms that
are going to be used.
Definition 1: A column of C is d-missing if that column does not contain any d.
On the other hand, a column of C is d-excessive if there is more than one d in that
column.
Definition 2: When a column y in the C matrix is d-excessive and a column z is
d-missing, an element which satisfies s[sx, y] = d in the S matrix for 0 ≤ sx ≤ k − 1
is called a swapping element and s[sx, z] is called the swapped element.
Definition 3: When s[sx, y] is a swapping element and s[sx, z] is a swapped element,
then the two elements s[sx, y] and s[sx, z] are simply swappable if s[sx, y] < s[sx, z] and
c[cx, y] = c[cx, z] = 1 for 0 ≤ cx < s[sx, y].
Definition 4: When s[sx, z] is a swapped element and s[sx, y] is a swapping element,
then the two elements s[sx, y] and s[sx, z] are successively swappable if s[sx, y] > s[sx, z]
and c[cx, y] = c[cx, z] = 1 for 0 ≤ cx < s[sx, y].
Definition 5: When two elements s[sx, y] and s[sx, z] are swapped because of
being successively swappable, an element s[sx1, y] which satisfies s[sx1, y] = s[sx, y]
is called an s[sx, y]-alternative.
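Definitions 1 to 4 can be read as simple predicates on the S and C matrices. The Python sketch below is a literal reading of the definitions as stated, not part of the algorithm itself; the function names are hypothetical, and a column is treated as d-missing (d-excessive) when the count c[d, y] is zero (greater than one).

```python
def is_missing(C, d, col):
    """Definition 1: column col contains no element d."""
    return C[d][col] == 0

def is_excessive(C, d, col):
    """Definition 1: column col contains more than one element d."""
    return C[d][col] > 1

def _settled_below(C, d, y, z):
    # c[cx, y] = c[cx, z] = 1 for every cx with 0 <= cx < d
    return all(C[cx][y] == 1 and C[cx][z] == 1 for cx in range(d))

def simply_swappable(S, C, sx, y, z):
    """Definition 3 for the swapping element s[sx, y] and swapped element s[sx, z]."""
    d = S[sx][y]
    if not (is_excessive(C, d, y) and is_missing(C, d, z)):
        return False                  # s[sx, y] is not a swapping element (Definition 2)
    return d < S[sx][z] and _settled_below(C, d, y, z)

def successively_swappable(S, C, sx, y, z):
    """Definition 4: as Definition 3, but with s[sx, y] > s[sx, z]."""
    d = S[sx][y]
    if not (is_excessive(C, d, y) and is_missing(C, d, z)):
        return False
    return d > S[sx][z] and _settled_below(C, d, y, z)

# The k = 4, n = 3 example: column 1 is 0-excessive and column 2 is 0-missing.
S = [[0, 3, 1], [1, 2, 3], [2, 0, 3], [1, 0, 2]]
C = [[1, 2, 0], [2, 0, 1], [1, 1, 1], [0, 1, 2]]
print(simply_swappable(S, C, sx=2, y=1, z=2))
```

In the example, s[2, 1] = 0 is a swapping element for columns 1 and 2, and since s[2, 2] = 3 is larger the pair is simply swappable.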
The new algorithm is illustrated as follows.
Algorithm: Initially sx is set to zero.
Step 1: Find a column y in a row cx of C such that c[cx, y] > 1. If no such element
can be found then increment cx until either such an element can be found or all rows
are satisfied, in which case the algorithm stops with a solution. If the algorithm has
not stopped, it must have found c[cx, y] > 1. Set z = 0.
Step 2: Increment z until c[cx, z] = 0. Such a column exists since there are exactly n
copies of each element (0 to k - 1) in S, so each row of C sums to n and a repeated
element in one column implies a missing element in another. We now have a column z
of S that contains no element cx.
Step 3: (Simple Swap) Repeatedly increment sx (mod k) until s[sx, y] = cx. If
s[sx, z] < cx, go to Step 2. Otherwise, swap the elements s[sx, y] and s[sx, z], thus
removing the repeated element cx = s[sx, y] in column y of S. This will, as a side
effect, increase the number of occurrences of element cx in column z of S. Increment
c[cx, z] and c[s[sx, z], y] and decrement c[cx, y] and c[s[sx, z], z]. It is easily seen
that these four simple changes restore the count property. If swapped, go to Step 1.
Step 4: (Next simple swap) Repeat Step 3, thus providing one more chance to simply
swap two elements in another row. If swapped, go to Step 1. This step is done only
once before c[cx, y] becomes 1.
Step 5: (Successive Swap) Swap s[sx, y] and s[sx, z], and update C as in Step 3. If
s[sx, y] > cx, go to Step 1. Otherwise, increase sx (mod k) for another s[sx, y] and
repeat Step 5.
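The four count changes of Step 3 can be checked mechanically. The sketch below is illustrative only: the names count_matrix and swap_and_update are hypothetical, and the code covers just the swap and the C update, not the control flow of Steps 1 to 5. It applies one swap to the S matrix of the example in Section 6.4 and compares the incrementally updated C with a full recount.

```python
def count_matrix(S, k):
    """C[v][y] = number of occurrences of value v in column y of S."""
    n = len(S[0])
    C = [[0] * n for _ in range(k)]
    for row in S:
        for y, value in enumerate(row):
            C[value][y] += 1
    return C


def swap_and_update(S, C, sx, y, z):
    """Swap s[sx][y] with s[sx][z] and apply the four count changes of Step 3."""
    moved_out = S[sx][y]   # leaves column y and enters column z
    moved_in = S[sx][z]    # leaves column z and enters column y
    S[sx][y], S[sx][z] = moved_in, moved_out
    C[moved_out][z] += 1
    C[moved_in][y] += 1
    C[moved_out][y] -= 1
    C[moved_in][z] -= 1


S = [[0, 0, 2], [1, 4, 4], [3, 1, 2], [2, 3, 1], [3, 4, 0]]
C = count_matrix(S, k=5)
swap_and_update(S, C, sx=0, y=2, z=1)
print(C == count_matrix(S, k=5))  # the incremental update agrees with a recount
```

Updating four entries instead of recounting keeps the cost of one swap constant, which is the reason the C matrix is maintained alongside S.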
This algorithm works for all permutations, which can be proved using the following
three theorems.
Theorem 1: Given two sets Se and Sm which are Y-excessive and Y-missing,
respectively, let Xe(i) and Xm(i) be numbers with the value i in the sets Se and
Sm, where 0 ≤ i < Y. If the number of Y's in Se is two, it is always possible to
reduce the number of Y's in Se to one without any change in the occurrences of Xe
and Xm.
Proof: Arrange the elements of the sets Se and Sm as follows.
[Diagram: the elements of Se and Sm arranged in two columns, Se holding Xe(0), ..., Ye and Ze entries and Sm holding Xm(0), ... and Zm entries.]
There are two possible cases for Ye to be swapped with an element in the set Sm.
First, if Ye and Zm are in the same row, then the two elements can be swapped, reducing
the number of Ye in the set Se to one without any change in the
number of occurrences in Xe or Xm. However, if Ye and Xm are on the same row,
Ye and any one of Xm(i), 0 ≤ i < Y, should be swapped. The index i is used in
order to distinguish the elements of Xe and Xm which have the same value i. As a
result, two identical numbers Xe(Y - 1) and Xm(Y - 1) are on the same Ye-excessive
column. Now take Xe(Y - 1), which is an Xm(Y - 1)-alternative. Again, there are
two possibilities. If Xe(Y - 1) is in the same row with Zm, the number of Ye in
the Ye-excessive column can be reduced to one without any change in the number of
occurrences in X. However, if Xe(Y - 1) is in the same row with Xm(Y - 2), we need
to swap Xe(Y - 1) and Xm(Y - 2), and then find the Xm(Y - 2)-alternative, which
is Xe(Y - 2). In the worst case, this process continues until Xm(1) finds its alternative
Xe(0). Since the other Xm's are not in the same row with Xe(0), Xe(0) must select
Zm, which leads to the proof of the theorem. □
Theorem 2: Given two sets Se and Sm which are Y-excessive and Y-missing,
respectively, let Xe(i) and Xm(i) be numbers with the value i in the sets Se and Sm,
where 0 ≤ i < Y. If the number of Y's in Se is three, it is always possible to reduce
the number of Y's in Se to one by applying simple and successive swaps.
Proof: Any Ye in the Ye-excessive column can be swapped into the Ye-missing
column without any change of occurrences of X's, which can be proved using the
same procedure as in Theorem 1. Once the number of Ye's is reduced to two,
Theorem 1 can be applied, so the number of Y's can be reduced to one. □
Theorem 2 can be generalized to the case when the number of Ye's is arbitrary.
Theorem 3: Given an arbitrary permutation of the S matrix, it is always possible
to decompose the matrix if the C matrix is scanned row-by-row from top to bottom.
Proof: For an arbitrary c[cx, y] in a row cx being scanned which satisfies c[cx, y] > 1,
it is always possible to make c[cx, y] = 1 by applying Theorems 1 and 2. Thus, all
elements in the C matrix which are greater than one can be reduced to one. □
Lemma 1: The maximum number of swaps in the successive swap is k - 1.
Proof: All k elements in the Ye-excessive and Ye-missing columns can be swapped
except the remaining Ye, which is at least one. □
6.4 Example
To illustrate the algorithm clearly, consider a three-stage Clos network having n = 3
and k = 5 with an H matrix as shown below.

H =
2 0 1 0 0
0 1 0 0 2
0 1 1 1 0
0 1 1 1 0
1 0 0 1 1

The S and C matrices derived from the H matrix are shown below.

S =          C =
0 0 2        1 1 1
1 4 4        1 1 1
3 1 2        1 0 2
2 3 1        2 1 0
3 4 0        0 2 1
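The derivation of S and C from H can be sketched in a few lines. This is an illustration under an assumption inferred from the example rather than stated in this section: H[i][j] is taken to be the number of signals from first-stage switch i to third-stage switch j. The function names are hypothetical. Because the order of the elements within a row of S is arbitrary, the sketch lists them in ascending order and compares rows as sorted lists.

```python
def s_from_h(H):
    """One row of destinations per first-stage switch, in ascending order."""
    return [[dest for dest, count in enumerate(row) for _ in range(count)]
            for row in H]


def c_from_s(S, k):
    """C[v][y] = number of occurrences of destination v in column y of S."""
    n = len(S[0])
    C = [[0] * n for _ in range(k)]
    for row in S:
        for y, value in enumerate(row):
            C[value][y] += 1
    return C


H = [[2, 0, 1, 0, 0],
     [0, 1, 0, 0, 2],
     [0, 1, 1, 1, 0],
     [0, 1, 1, 1, 0],
     [1, 0, 0, 1, 1]]

# S as printed in the example (row order within a row is arbitrary).
S_example = [[0, 0, 2], [1, 4, 4], [3, 1, 2], [2, 3, 1], [3, 4, 0]]

S_sorted = s_from_h(H)
C = c_from_s(S_example, k=5)
for row in C:
    print(row)
```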
Now check the C matrix for an element that is greater than 1, which implies that two
or more edges incident to the corresponding output node are colored identically.
Since C[2, 2] ≥ 2 and C[2, 1] = 0, the C matrix is 2-excessive in column 2 and
2-missing in column 1. Since cx = 2, we find cx in the S matrix at sx = 0 because sx
was first set to zero. Thus, we find that S[sx, y] = 2 and S[sx, z] = 0. These two
elements are not simply swappable because S[sx, y] > S[sx, z], so we move to the
next 2 in column 2 of the S matrix, which is in row 2. Since S[2, 1] < S[2, 2] in this
case, they too are not simply swappable, thus a forced swap must be applied. This is
done by swapping the first two elements S[0, 1] and S[0, 2], and then updating the C
matrix by incrementing C[0, 2] and C[2, 1] and decrementing C[0, 1] and C[2, 2].

S =          C =
0 2 0        1 0 2
1 4 4        1 1 1
3 1 2        1 1 1
2 3 1        2 1 0
3 4 0        0 2 1

Since the swapped element S[0, 2] in column y is 0, which is less than cx, next find
the 0-alternative in column 2, which is S[4, 2]. Now, S[sx, y] = S[4, 2] and S[sx, z] =
S[4, 1]. These two elements are simply swappable since S[4, 2] < S[4, 1] and thus can
be swapped. This finishes the successive swap for C[2, 2] and results in

S =          C =
0 2 0        1 1 1
1 4 4        1 1 1
3 1 2        1 1 1
2 3 1        2 1 0
3 0 4        0 1 2

Next, we proceed to C[3, 0], which is greater than 1. From the S matrix, we
find that S[2, 0] is not simply swappable with S[2, 2], so we move to the next 3 in the
4th row. Since S[4, 0] < S[4, 2], we can now swap the two elements, and the two matrices
are shown below.

S =          C =
0 2 0        1 1 1
1 4 4        1 1 1
3 1 2        1 1 1
2 3 1        1 1 1
4 0 3        1 1 1

Finally, the program terminates since all the elements in the C matrix are
1. The resulting three columns of the S matrix denote the completely decomposed
switch settings of the second-stage switches, and first and third stage switch settings
can be derived from this. The basic idea of the algorithm is to make the C matrix all
1's by using three kinds of swaps. This means that there are no identical elements in
each column of S when completely decomposed. Steps 1 and 2 find the two columns
z and y which are cx-missing and cx-excessive from the C matrix. The cx is, on the
other hand, the element in S which is missing or excessive in the same two columns
of S. Then, swaps are performed from Steps 3 to 5 until all c[cx, y] become 1.
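The three swaps of this example can be replayed to confirm the final matrices. The sketch below does not implement the search of Steps 1 to 5; the swap positions (sx, y, z) are copied by hand from the walkthrough, and swap_and_update is a hypothetical helper name.

```python
def swap_and_update(S, C, sx, y, z):
    """Swap s[sx][y] with s[sx][z] and update the four affected counts in C."""
    moved_out, moved_in = S[sx][y], S[sx][z]
    S[sx][y], S[sx][z] = moved_in, moved_out
    C[moved_out][z] += 1
    C[moved_in][y] += 1
    C[moved_out][y] -= 1
    C[moved_in][z] -= 1


S = [[0, 0, 2], [1, 4, 4], [3, 1, 2], [2, 3, 1], [3, 4, 0]]
C = [[1, 1, 1], [1, 1, 1], [1, 0, 2], [2, 1, 0], [0, 2, 1]]

# (sx, y, z) for the forced swap, the swap of the 0-alternative,
# and the simple swap that settles C[3, 0].
for sx, y, z in [(0, 2, 1), (4, 2, 1), (4, 0, 2)]:
    swap_and_update(S, C, sx, y, z)

for row in S:
    print(row)
```

After the third swap every entry of C equals 1, so each column of S holds the destinations 0 to 4 exactly once.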
6.5 Worst-case Behavior
This algorithm is simple, but deriving the exact time complexity of the algorithm
is very complicated. Gordon reported the time complexity of his algorithm in his
paper without giving any proof. He just mentioned that the time complexity is
roughly proportional to the number of swaps. The basic difficulty in deriving the time
complexity of the algorithm is as follows. First, the runtime is proportional to the
number of swaps. However, it is difficult to calculate the number of swaps for a given
permutation. For a given c[cx, y] > 1, the number of swaps to be performed must
be c[cx, y] - 1. But Σ(c[cx, y] - 1) does not necessarily represent the total number
of swaps, because one swap results in the change of four elements of c[cx, y]: two of
them increase, and the other two decrease. Secondly, for an element c[cx, y] > 1, it
is difficult to know analytically what kind of swaps must be performed in the worst
case for a given permutation.
Considering the difficulty of analytic approaches, the next possible method
is simulation. Computer simulation usually cannot prove all the possible
cases as the problem becomes complex. However, it helps to narrow the bound of
time complexities. For that reason, the new algorithm has been programmed and
simulated for various values of n and k. Figure 6.1 shows the worst case runtime
vs. k with respect to various values of n. The graph shows that the runtime of
setting the Clos network increases as k increases. For a fixed k, the runtime also
increases as n increases. A closer look at the graph shows that the runtime is roughly
proportional to n, but in case of k, the runtime is proportional to k^x for some value
of x.

Figure 6.1 Worst case runtime vs. k

In order to exactly obtain the time complexity of the algorithm, these curves
were fitted to an arbitrary non-polynomial function. The result of the curve fitting
shows that the time complexity of the algorithm is proportional to n k :̂ 2. that is
The simple swap dominates the other two kinds of swaps, and the next simple
swap also dominates successive swaps. Simple swaps do not require much time to
swap two elements. The successive swaps, on the other hand, are not frequent, but
take relatively long since up to k - 1 swaps must be made in order to reduce c[cx, y]
by one. As a result, successive swaps still have considerable effects on the overall
runtime although they are less frequent. Another thing to mention here is that the
runtime is linearly proportional to just the number of columns n, but not to the
number of rows k. This is mainly due to the use of sx (mod k) and the effect of
successive swaps.
6.6 Discussion
In this chapter, Gordon's algorithm has been demonstrated to display some errors
in some of the permutations. A new algorithm for decomposing the Clos network
which is based on Gordon's algorithm has been introduced. In this algorithm, the
same S and C matrices are used to represent the Clos network, which help to speed
up routings by checking C in order to calculate the number of occurrences of each
element in S in each column. The basic difference between Gordon's algorithm
and the new algorithm lies in the scanning direction in the C matrix. In Gordon's
algorithm, it is scanned column by column, removing columns once all elements
are nonidentical in each column. Swapping elements can take place between a
not-yet-decomposed leftmost column and the rest of the columns. However, the new
algorithm scans the C matrix row-by-row, and swapping elements are restricted to
two columns for the successive swap. This gives an obvious advantage in proving
that it works for all permutations, but in Gordon's algorithm, it is difficult to prove.
Another advantage to the new algorithm is that it has the potential to be run in
parallel since only two columns are involved in the successive swap and other pairs
of two columns can be swapped at the same time.
CHAPTER 7
ROUTING FAULT TOLERANT CLOS NETWORKS
7.1 Introduction
The Clos network cannot realize all possible permutations when a fault occurs in
the system. Thus, extra switches are added to the ordinary Clos network in order
to achieve fault tolerance. The algorithm for the ordinary Clos network needs to
be extended to the fault-tolerant cases for the following reasons. First, the structure of
the FTC is basically the same as the ordinary Clos network except for added switches.
Because of this, the representation of the network does not become complicated.
Second, the spare switches can greatly simplify the routing process, which is an obvious
advantage when there are few or no faults in the system. Third, the same routing
algorithm for the FTC can be used for the ordinary Clos network, which is a special
case of the fault tolerant network. A new routing algorithm for the FTC will be
introduced in section 7.2, which utilizes extra switches in all stages. For clarity, the
FTC is classified into three types of networks, and in each case, the representation
of the network and routing rules are considered. In the last section, the simulation
of the runtime for the FTC is discussed.
7.2 Routing the FTC
In chapter 6, we introduced a new routing algorithm for the Clos network. The
algorithm for the ordinary Clos network can be extended to the fault-tolerant Clos
network discussed in chapter 5. Recall that the FTC has extra switches in all stages
and they provide alternative paths when faults occur in the network. However, when
there are no faults in the system, these extra switches can be utilized as additional
routing paths, which simplify the requirements of the routing process and reduce the
runtime. The outer stage spare switches generate additional rows in the S matrix
and the second stage spare switches create additional columns. Additional paths
introduced by the two types of spare switches are very flexible during the routing
process, but they have different characteristics. In order to develop the new routing
algorithm for the FTC, it is required to know the properties of these two types of
spares and, for that reason, the fault-tolerant Clos network will be classified into
three possible configurations:
I. networks with spare switches in each of the outer stages only.
II. networks with spare switches in the second stage only.
III. networks with spare switches in all stages.
In the following subsections, the networks and representation of the extra switches
are discussed for all three possible cases. The rules and conditions for swapping the
elements are considered, which will be the basis of the new algorithm for the FTC.
7.2.1 Routing FTC with Spare Switches in Outer Stages (Type I)
The first type of FTC has extra spare switches in outer stages only along with
multiplexers and demultiplexers. In this configuration, signals are bypassed to the
spare switches through the multiplexers/demultiplexers in case of faults which occur
in the outer stages. Figure 7.1 shows the Type I FTC network which has one extra
spare switch in each outer stage.
The S and C matrices are the same as those of the ordinary Clos network except
that extra rows are added which account for the extra outer stage switches and
multiplexers/demultiplexers. The elements in the xth row of S represent the signals
passing through the xth switch of the first stage whose destination switches are
s[x, y], where 0 ≤ y ≤ n - 1. The elements in the yth column of S are the signals
passing through the yth second stage switch whose destinations are s[x, y], where
0 ≤ x ≤ k - 1. Each element of the S matrix represents the signal directed to the last
stage switch s[x, y] through the yth second stage switch. Let k_sp be the number of
spare switches in the first or third stage. If the numbers of spare switches are not equal
in those stages, then the smaller number will be taken as k_sp. The k_sp spare switches
at the outer stages create k_sp additional rows in the S matrix, and each element can
serve as an alternative path during the routing process. The total number of extra
paths is n k_sp. All the redundant paths due to the spare switches are denoted as #
for convenience. Initially, the elements of S, s[x, y], where k ≤ x ≤ k + k_sp - 1,
0 ≤ y ≤ n - 1, are initialized to spares #. Also, each c[x, y] of the C matrix, where
0 ≤ x ≤ k - 1, 0 ≤ y ≤ n - 1, is initialized to the number of occurrences of the
integer x in column y of S. The number of spares in the S matrix is not considered
in the C matrix. For example, the S and C matrices for a Type I Clos network with
n = k = 3 and 2 spare switches in each of the outer stages can be given as

S =          C =
1 0 1        1 1 1
1 2 0        2 0 1
0 2 2        0 2 1
# # #
# # #
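The Type I initialization can be sketched as follows. This is only an illustration of the description above: the function name type1_matrices and the use of the string "#" for a spare are assumptions made for the sketch. It appends k_sp rows of # spares to S and counts only integer entries when building C.

```python
SPARE = "#"


def type1_matrices(S_regular, k, k_sp):
    """Append k_sp rows of # spares to S; build C from the integer entries only."""
    n = len(S_regular[0])
    S = [row[:] for row in S_regular] + [[SPARE] * n for _ in range(k_sp)]
    C = [[0] * n for _ in range(k)]
    for row in S:
        for y, value in enumerate(row):
            if value != SPARE:      # spares are not counted in C
                C[value][y] += 1
    return S, C


S, C = type1_matrices([[1, 0, 1], [1, 2, 0], [0, 2, 2]], k=3, k_sp=2)
for row in S:
    print(row)
for row in C:
    print(row)
```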
To consider the reconfiguration of faulty switches in the outer stages, faulty
switches and interstage connections must be taken into account. Recall that we have
assumed no multiplexers/demultiplexers are defective. If the xth switch at the first
stage is faulty, the xth multiplexer is set so that each signal in the xth switch is
bypassed to the available spare switches. One of the spare switches, the zth spare
switch, is assigned to these signals in order to provide alternative paths. The zth
row of the S matrix is simply cleared, which is denoted as dots. Now define a new
matrix, the reconfiguration matrix, R. The R matrix is a k x 3 matrix, where each
row y represents the yth switch in one stage, and each column x denotes the xth
stage. The element r[y, x] shows that the r[y, x]th spare switch in the xth stage is
assigned instead of the yth switch in the xth stage. For example, if the 0th switch in
the first stage is defective in the above example, the S, C, and R matrices would be

S =          C =          R =
1 0 1        1 1 1        3 0 0
1 2 0        2 0 1        1 1 1
0 2 2        0 2 1        2 2 2
# # #
. . .

Notice that the elements in the 0th row of S remain the same, but the # spares in the
last row are no longer available. These spares are now assigned to the signals in
the 0th row, which can be seen in the R matrix where r[0, 0] = 3. Note that the other
elements in R show that the other switches are not reconfigured, and remain the same.
Dots in the S matrix mean that there are no paths available in the 0th input switch,
and they are simply ignored during the decomposition process. On the other hand,
if the xth switch at the third stage is faulty, the xth demultiplexer is set so that
rerouted signals from the third-stage spare switches can be bypassed to reach the outlets
of the xth switch. For example, if the 0th switch in the third stage is also defective,
and the 4th spare switch is used instead for the above matrices, the resulting S, C,
and R matrices would be

S =          C =          R =
1 0 1        1 1 1        3 0 4
1 2 0        2 0 1        1 1 1
0 2 2        0 2 1        2 2 2
# # #
. . .
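The bookkeeping of the reconfiguration matrix can be sketched as follows. The function names are hypothetical, and the sketch assumes, as the example suggests, that an entry r[y, x] equal to y means that switch y of stage x has not been reconfigured.

```python
def identity_r(k):
    """k x 3 matrix with r[y][x] = y: every switch is served by itself."""
    return [[y] * 3 for y in range(k)]


def assign_spare(R, switch, stage, spare):
    """Record that `spare` replaces `switch` in `stage` (0 = first, 2 = third)."""
    R[switch][stage] = spare


R = identity_r(3)
assign_spare(R, switch=0, stage=0, spare=3)   # first-stage switch 0 is faulty
first_fault = [row[:] for row in R]
assign_spare(R, switch=0, stage=2, spare=4)   # third-stage switch 0 is faulty
for row in R:
    print(row)
```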
Permutation translation can also be used as was shown in chapter 5 for the
reconfiguration due to the failure of outer stage switches. Faults due to the interstage
links can be modeled as a switch failure and the network can be reconfigured in the
same way described above. The rules and conditions for swapping elements in the
ordinary Clos networks can be applied in the FTC, since the basic structures remain
the same. Recall that, in the Clos network, any two elements except spares in a
row x of S can be swapped. This is due to the fact that inlets input to each of the
first-stage switches can be fully connected within the switch. Each first-stage switch
is represented by a row of S, and each element in a row of S corresponds to the inlets
to a switch which flows to the third stage switch s[x, y]. Each column must have no
identical elements except spares when completely decomposed. This is because each
second-stage switch has only one connection to each third-stage switch.
The introduction of # spares has the following features. First, # spares in a
column y of S can be swapped with any elements in that column. This is due to
the multiplexers and demultiplexers along with spare switches in outer stages which
can bypass input signals to the spare switches. Second, spares in a column y can be
swapped with any element in another column z as long as both columns maintain
the same number of # spares since the number of outer spare switches is fixed in the
network. When the matrix is fully decomposed, then all the elements in the C matrix
must be one. The zeros in the C matrix indicate that these elements are swapped
with the # spares in the same column. The total number of spares in each column
is restricted to k_sp, which does not change during the routing process.
7.2.2 Routing FTC with Spare Switches in the 2nd Stage (Type II)
In contrast to the first type, the second type of FTC has extra spare switches in
the second stage. In this configuration, signals are bypassed to the extra switches in
case of faults in the second stage switch y, where 0 ≤ y ≤ n - 1. Figure 7.2 shows
the Type II FTC network with one extra spare switch in the second stage. In
the Type II FTC, the S matrix is represented in a different way from the Type I
FTC. Let n_sp be the number of spare switches in the second stage. The n_sp spare
switches at the middle stage create n_sp additional columns in the S matrix, and
each element in the additional columns can serve as an alternative path during the
routing process. The total number of extra paths is k n_sp. All the initial elements in
the spare columns of S are denoted as asterisks (*) for convenience. These spares
are wild cards, like the # spares, but different in characteristics. Also, the C matrix
is defined as in the ordinary Clos network except that extra columns are added. The
c[x, y] of the C matrix, where 0 ≤ x ≤ k - 1, 0 ≤ y ≤ n + n_sp - 1, are initialized to the
number of occurrences of the integer x in column y of S. Each extra spare switch in
the second stage generates one extra column in the S and C matrices. The elements
in the yth column of S represent the signals moving to destination switches s[x, y],
0 ≤ x ≤ k - 1, through the yth second stage switch. For example,

S =              C =
1 0 1 * *        1 1 1 0 0
1 2 0 * *        2 0 1 0 0
0 2 2 * *        0 2 1 0 0

The right two columns of C are all zero because there are no elements between 0 and 2
in those columns; only * spares are in these columns. If the xth switch of the middle stage
is faulty, the terminal relabelling described in chapter 5 must be performed. In the
S matrix, the terminal relabelling can be achieved by clearing the zth column of S,
and assigning these spares for the faulty xth second stage switch. The cleared zth
column of S will be denoted as dots. The relationship between the faulty switch and
spare switches will be noted in the reconfiguration matrix R as in the Type I FTC.
The R matrix is used to perform the terminal relabelling of the inward terminals of
the outer stages. For example, if the first switch in the middle stage in the above
example is defective, and the 4th spare switch is used instead, the resulting S, C,
and R matrices would be

S =              C =              R =
1 0 1 * .        1 1 1 0 0        0 0 0
1 2 0 * .        2 0 1 0 0        1 4 1
0 2 2 * .        0 2 1 0 0        2 2 2

Notice that the elements in the first column of S remain the same, but the * spares in
the last column are no longer available. These spares are assigned now to the signals
in the first column, which can be seen in the R matrix where r[1, 1] = 4. Dots in the
S matrix mean that there are no paths available in the first second-stage switch, and
they are simply ignored during the decomposition process. The rules and conditions
for swapping elements and * spares are as follows. First, any two elements, including
* spares, in a row x of S can be swapped. After the swap, the
swapped elements are again free for any other swaps. This flexibility of * spares
makes the routing process very simple. However, * spares in a column y cannot be
swapped with any elements in that column. Secondly, each column of S must have
no identical elements except * spares when completely decomposed. This is because
the second-stage switch has only one connection to each of the third-stage switches.
Finally, the number of * spares in a column can take any value i, where 0 ≤ i ≤ k - 1.
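The initialization of the Type II matrices described in this subsection can be sketched in the same way. The function name type2_matrices and the string "*" for a spare are assumptions of the sketch. Each row of S gains n_sp wild cards, and C gains n_sp columns that start at zero because wild cards are not counted.

```python
STAR = "*"


def type2_matrices(S_regular, k, n_sp):
    """Append n_sp columns of * spares to S; C gets n_sp extra all-zero columns."""
    S = [row + [STAR] * n_sp for row in S_regular]
    width = len(S[0])
    C = [[0] * width for _ in range(k)]
    for row in S:
        for y, value in enumerate(row):
            if value != STAR:       # wild cards are not counted in C
                C[value][y] += 1
    return S, C


S, C = type2_matrices([[1, 0, 1], [1, 2, 0], [0, 2, 2]], k=3, n_sp=2)
for row in S:
    print(row)
for row in C:
    print(row)
```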
7.2.3 Routing FTC with Spare Switches in All Stages (Type III)
The last type of FTC is the one with extra spare switches in outer stages, along
with multiplexers and demultiplexers, as well as in the second stage. In this FTC,
alternative paths are provided regardless of faults in any of the three stages. Figure
7.3 shows the Type III FTC network with one extra spare switch in each stage.
In this type of network, there are k_sp spare switches in each outer stage and
n_sp spare switches in the middle stage. The k_sp spare switches in the outer stages
create k_sp additional rows in the S matrix, which has a total of n k_sp extra paths.
Also, the n_sp spare switches in the middle stage create n_sp additional columns in the
S matrix, and this can generate a total of k n_sp extra paths. Initially, the elements
of S, s[x, y], where k ≤ x ≤ k + k_sp - 1, 0 ≤ y ≤ n - 1, are initialized to # spares
and s[x, y], 0 ≤ x ≤ k - 1, n ≤ y ≤ n + n_sp - 1, are initialized to * spares. The
elements of S, s[x, y], where k ≤ x ≤ k + k_sp - 1, n ≤ y ≤ n + n_sp - 1, are denoted
as blanks because spares in this area are not used, as will be illustrated later in the
new algorithm. Note that this area could have been initialized to * spares. Also,
the C matrix has n_sp additional columns due to the second stage spare switches,
but there are no additional rows in the matrix. The c[x, y] of the C matrix, where
0 ≤ x ≤ k - 1, 0 ≤ y ≤ n + n_sp - 1, are initialized to the number of occurrences of the
integer x in column y of S. Each extra spare switch in the second stage generates
one extra column in the S and C matrices. For example, when n_sp = k_sp =
S  =
1 0 1 * *
1 2  0 *  *  




1 1 1 0  0 
2 0 1 0  0 
0 2 1 0  0
Since this type of FTC has extra switches in all stages, all the defective switches
need to be considered for the reconfiguration. If the xth switch in the first stage is
faulty, the xth multiplexer is set so that each signal in the xth switch is bypassed
to the available spare switches. One of the spare switches, the zth spare switch, is
assigned to these signals in order to provide alternative paths. The zth row of the
S matrix is simply cleared, which is denoted as dots. Set the R matrix with x = 0
as in the Type I FTC, where the element r[y, x] represents that the r[y, x]th spare
switch in the xth stage is assigned instead of the yth switch in the xth stage. If
the xth switch in the middle stage is faulty, clear the zth column of S, and assign
these spares for the faulty xth second stage switch. The cleared zth column of S
will be denoted as dots. The R matrix is used to perform the terminal relabelling of
the inward terminals of the outer stages. For example, if the 0th switch in the first
stage and the 1st switch in the middle stage in the above example are defective, the
resulting S, C, and R matrices would be

S =              C =              R =
1 0 1 * .        1 1 1 0 0        4 0 0
1 2 0 * .        2 0 1 0 0        1 4 1
0 2 2 * .        0 2 1 0 0        2 2 2
# # #
The rules and conditions for swapping elements and * or # spares are as follows.
First, # spares in a column y of S can be swapped with any elements in that column
except * spares because the multiplexers and demultiplexers along with spare switches
in outer stages can bypass signals. Also, spares in a column y can be swapped with
any elements in another column z as long as both columns maintain the same number
of # spares. Secondly, any two elements, including * spares but excluding # spares,
in a row x of S can be swapped. After the swap, the swapped elements
are again free to perform any other swaps. This flexibility of * spares makes the
routing process very simple. However, * spares in a column y cannot be swapped
with any elements in that column. Third, each column of S must have no identical
elements except # or * spares when completely decomposed. This is because the
second-stage switch has only one connection to each third-stage switch. Finally, the
number of * spares in a column is not restricted to n_sp, but can take any value i,
0 ≤ i ≤ k + n_sp - 1. However, the number of # spares in a column must remain k_sp.
Based on the above rules and conditions, the algorithm for the FTC network is
introduced as follows. The S and C matrices in the FTC are the same as in the
ordinary Clos network except that two kinds of spares are considered. First, the
elements of S, s[x, y], where k ≤ x ≤ k + k_sp - 1, 0 ≤ y ≤ n - 1, are initialized to #
spares. Also, s[x, y] is initialized to * spares where 0 ≤ x ≤ k - 1, n ≤ y ≤ n + n_sp - 1.
The c[x, y] of the C matrix, where 0 ≤ x ≤ k - 1, 0 ≤ y ≤ n + n_sp - 1, are initialized
to the number of occurrences of the integer x in column y of S.
Algorithm: Initially sx is set to zero.
Step 1: Find a column y in a row cx of C such that c[cx, y] > 1. If no such element
can be found then increment cx until either such an element can be found or all rows
are satisfied, in which case the algorithm terminates with a solution. If the algorithm
has found c[cx, y] > 1, set z = 0.
Step 2: (Wild Swap) Check whether # spares are available in column y. If available,
increment sx (mod k) until s[sx, y] = cx, then swap s[sx, y] with a # spare in the
column y, and go to Step 1. If not available, then check the * spare in the row sx. If
the * spare is available, increment sx (mod k) until s[sx, y] = cx, then swap s[sx, y]
with a * spare in the row sx and go to Step 1.
Step 3: Increment z (mod k) until c[cx, z] = 0.
Step 4: (Simple Swap) Repeatedly increment sx (mod k) until s[sx, y] = cx. If
s[sx, z] < s[sx, y], go to Step 3. If s[sx, y] < s[sx, z] or s[sx, z] is *, swap the
elements s[sx, y] and s[sx, z] and update the C matrix. If swapped, go to Step 1.
Step 5: (Next simple swap) Repeat Step 4, thus providing one more chance to simply
swap two elements in another row. If swapped, go to Step 1. This step is done only
once before c[cx, y] becomes 1.
Step 6: (Successive Swap) Swap s[sx, y] and s[sx, z], and update C as in Step 4. If
s[sx, y] > cx or s[sx, y] is *, go to Step 1. Otherwise, increase sx (mod k) for another
s[sx, y] and repeat Step 6.
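The wild swap of Step 2 can be sketched as two small helpers. This is an interpretation, not the author's program: the function names are hypothetical, a # spare is assumed to be available only while it still sits in one of the spare rows (rows k and above), and C is assumed to count only the first k rows of S. The search for sx is left out; the row indices are passed in by hand. The data are the initial matrices of the n = k = 4, k_sp = n_sp = 1 example.

```python
HASH, STAR = "#", "*"


def wild_swap_hash(S, C, cx, sx, y, k):
    """Swap s[sx][y] == cx with a # spare still sitting in a spare row of column y."""
    assert S[sx][y] == cx
    for spare_row in range(k, len(S)):
        if y < len(S[spare_row]) and S[spare_row][y] == HASH:
            S[sx][y], S[spare_row][y] = HASH, cx
            C[cx][y] -= 1           # cx left the counted part of column y
            return True
    return False


def wild_swap_star(S, C, cx, sx, y):
    """Swap s[sx][y] == cx with a * spare found in the same row sx."""
    assert S[sx][y] == cx
    for w, value in enumerate(S[sx]):
        if value == STAR:
            S[sx][y], S[sx][w] = STAR, cx
            C[cx][y] -= 1           # cx moved from column y ...
            C[cx][w] += 1           # ... to column w
            return True
    return False


S = [[1, 1, 2, 3, STAR],
     [1, 3, 2, 0, STAR],
     [2, 0, 2, 3, STAR],
     [1, 0, 0, 3, STAR],
     [HASH, HASH, HASH, HASH]]
C = [[0, 2, 1, 1, 0], [3, 1, 0, 0, 0], [1, 0, 3, 0, 0], [0, 1, 0, 3, 0]]

print(wild_swap_hash(S, C, cx=0, sx=2, y=1, k=4))   # c[0][1] is 2: use the # of column 1
print(wild_swap_hash(S, C, cx=1, sx=3, y=0, k=4))   # c[1][0] is 3: use the # of column 0
print(wild_swap_hash(S, C, cx=1, sx=0, y=0, k=4))   # no # left in column 0
print(wild_swap_star(S, C, cx=1, sx=0, y=0))        # fall back to the * of row 0
```

Each wild swap lowers an excessive count without the search for a missing column, which is why the example that follows needs no successive swap.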
Example: For the given S and C matrices below, n = k = 4 and k_sp = n_sp = 1.

S =              C =
1 1 2 3 *        0 2 1 1 0
1 3 2 0 *        3 1 0 0 0
2 0 2 3 *        1 0 3 0 0
1 0 0 3 *        0 1 0 3 0
# # # #

First repetition: Wild swap continuously while scanning the C matrix row by row.

S =              C =
* 1 2 3 1        0 1 1 1 0
1 3 # 0 *        1 1 0 0 1
2 # * 3 2        1 0 1 0 1
# 0 0 3 *        0 1 0 3 0
1 0 2 #

Second repetition: cx = 3, sx = 3, and y = 3

S =              C =
* 1 2 3 1        0 1 1 1 0
1 3 # 0 *        1 1 0 0 1
2 # * 3 2        1 0 1 0 1
# 0 0 # *        0 1 0 2 0
1 0 2 3

Third repetition: cx = 3, y = 3, sx = 0, and z = 0

S =              C =
3 1 2 * 1        0 1 1 1 0
1 3 # 0 *        1 1 0 0 1
2 # * 3 2        1 0 1 0 1
# 0 0 # *        1 1 0 1 0
1 0 2 3

The wild swap greatly reduces the number of next simple swaps or successive swaps,
which take much time, since at most k - 1 swaps are needed in order to find the
alternative paths. The number of simple swaps is also reduced. As the number of
extra rows and columns increases, the algorithm has more chances to suppress the
simple and successive swaps and thus improve the run time. The new algorithm
works for all permutations, which can be proved using the following three theorems.
T h e o r e m  4: Given two sets of S t  and S m  which are Y’-excessive and Y'-missing. 
respectively, let A‘e(?), and X m ( i )  be numbers with the value i in the set S t  and 
S m .  where 0 <  i < Y .  Each set contains the same num ber of #  wild cards, bu t the 
number of * wild cards may be different. If the number of Y 's  in Se  is two. it is 
always possible to reduce the number of Y  in Se  to one without any change in the 
occurrence of AT and Abu.






Z t  Z  m.




The proof is basically the same as that of Theorem 1. There are two possible cases for
$Y_e$ to be swapped with an element in the set $S_m$. First, if $Y_e$ and $Z_m$ (or *) are in
the same row, then the two elements can be swapped, resulting in the reduction of the
number of $Y_e$ in the set $S_e$ to one without any change in the number of occurrences
in $X_e$ or $X_m$. However, if $Y_e$ and $X_m$ are in the same row, $Y_e$ and any one of
$X_m(i)$, $0 \le i < Y$, should be swapped. The index $i$ is used in order to distinguish
the elements of $X_e$ and $X_m$ which have the same value $i$. As a result, two identical
numbers $X_e(Y - 1)$ and $X_m(Y - 1)$ are in the same $Y_e$-excessive column. Now take
$X_e(Y - 1)$, which is an $X_m(Y - 1)$-alternative. Again, there are two possibilities. If
$X_e(Y - 1)$ is in the same row with $Z_m$ (or *), the number of $Y_e$ in the $Y_e$-excessive
column can be reduced to one without any change in the number of occurrences in $X$.
However, if $X_e(Y - 1)$ is in the same row with $X_m(Y - 2)$, we need to swap $X_e(Y - 1)$
and $X_m(Y - 2)$, and then find the $X_m(Y - 2)$-alternative, which is $X_e(Y - 2)$. In
the worst case, this process continues until $X_m(1)$ finds its alternative $X_e(0)$. Since
the other $X_m$'s are not in the same row with $X_e(0)$, $X_e(0)$ must select $Z_m$ (or *), which
leads to the proof of the theorem. □
Theorems 2 and 3 in chapter 6 can be used similarly to prove that the algorithm for
the FTC holds for all permutations.
7.3 Worst-case Behavior of the Algorithm
The new algorithm for the FTC network is similar to that of the ordinary Clos
network, so deriving the exact time complexity of the algorithm with respect to the
number of extra switches is a very complicated matter. In this case, the run time is
dominated by the number of swaps, which consist of simple swaps, next simple swaps,
successive swaps and wild swaps. Wild and simple swaps do not require much time
to swap two elements. The successive swaps, on the other hand, are not frequent, but
take a relatively long time since they are continued until the alternative paths are
found. For the FTC algorithm, the basic difficulties of deriving the time complexity
of the algorithm remain the same as explained in chapter 6. These are: 1)
$\sum (c[cx, y] - 1)$ does not necessarily represent the total number of swaps, because one
swap results in the change of four elements of c[cx, y], two of which increase and two
of which decrease. 2) For an element c[cx, y] > 1, it is difficult to predict analytically
what kind of swaps must be performed in the worst case for a given permutation. For
that reason, the new algorithm for the FTC network has been simulated to obtain the
runtime of the algorithm with respect to various numbers of extra switches. Figure
7.4 shows the worst case runtime vs. y_spare for various values of k. The graph
shows that the runtime of the algorithm for the FTC network decreases as y_spare
increases. This continues until y_spare reaches about k/2, where the runtime
saturates at a certain value. The runtime can be reduced to far less than half of that
when there are no extra switches in the network.

Figure 7.4 Worst case runtime vs. number of y_spares for various k (runtime plotted against y_spare, n = 20)
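
The first difficulty can be made concrete with a toy example. The sketch below is
not the routing algorithm itself; it only illustrates the bookkeeping of a single
swap under the assumption that wild cards are not counted in C. The names
`apply_swap` and `excess` are hypothetical, and the 2 x 2 matrix is chosen only for
illustration.

```python
import copy


def apply_swap(S, C, row, col_a, col_b):
    """Swap S[row][col_a] and S[row][col_b], updating the count matrix C in place."""
    va, vb = S[row][col_a], S[row][col_b]
    S[row][col_a], S[row][col_b] = vb, va
    if va == vb:
        return
    for value, src, dst in ((va, col_a, col_b), (vb, col_b, col_a)):
        if value not in ("#", "*"):  # wild cards are assumed not to be counted
            C[value][src] -= 1
            C[value][dst] += 1


def excess(C):
    """Sum of (c - 1) over all entries of C with c > 1."""
    return sum(c - 1 for row in C for c in row if c > 1)


S = [[0, 1],
     [0, 1]]
C = [[2, 0],   # value 0 occurs twice in column 0
     [0, 2]]   # value 1 occurs twice in column 1

before = copy.deepcopy(C)
excess_before = excess(C)
apply_swap(S, C, row=0, col_a=0, col_b=1)
changed = sum(
    1
    for v in range(len(C))
    for col in range(len(C[0]))
    if C[v][col] != before[v][col]
)
print(changed, excess_before, excess(C))  # prints: 4 2 0
```

Here a single swap changes four entries of C and lowers the sum of (c - 1) from 2
to 0, so that sum is not a count of the swaps performed.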
Figure 7.5 shows the worst case runtime vs. x_spare for various values of n. As in
the previous figure, the graph shows that the runtime of the algorithm for the FTC
network decreases as x_spare increases. This continues until x_spare reaches about
k/2. However, the runtime decreases more slowly in this case, and it is reduced to slightly
more than half of that when there are no extra switches in the network. Figures
7.6 and 7.7 show the average runtime versus the number of extra switches y_spare
(x_spare) for the various k (n). Figure 7.8 shows the number of each kind of swap with
respect to x_spare. As can be seen from the figure, the number of wild swaps increases
steadily with increases in x_spare, but the other simple swaps and successive swaps
decrease, which explains the improvement in runtime.

Figure 7.5 Worst case runtime vs. number of x_spares for various n (runtime plotted against x_spare, k = 20)
7.4 Discussion
In this chapter, a novel algorithm for routing in the fault tolerant Clos network
has been introduced. Clos networks are used mainly to realize permutations.
Without any fault tolerance, if a switch in the network fails, the network is rendered
inoperative and the system has to be interrupted to put the network back to work.
The FTC network can continue its work uninterrupted in the presence of a
fault because the FTC network can reconfigure itself dynamically, by changing the
settings of the multiplexers and demultiplexers and using the adaptive permutation
translation scheme, which is facilitated by the use of the reconfiguration matrix
R. The defective item can then be repaired during the time at which the system
is unused. The spare switches introduce two types of wild cards depending on the
location of spare switches in particular stages. In the Type I FTC network, two extra
spares along with multiplexers/demultiplexers are required in order to create one
additional row in the specification matrix. The Type II FTC network requires less
hardware to create one additional column in the matrix, and the wild cards are much
more flexible than in the Type I FTC network. In designing the routing algorithm,
any wild cards can be used at any time during the decomposition. However, it is
preferable to use the Type I extra spares first and then the Type II spares next, since
Type II spares are more flexible. As in the previous algorithm, the new algorithm
scans the C matrix row by row, and swapping elements is restricted to two columns
for the successive swap, which gives an obvious advantage in proving that it works
for all permutations. Another advantage of the new algorithm is that it has the
potential to be run in parallel, since only two columns are involved in the successive
swap and other pairs of columns can be swapped at the same time.

Figure 7.6 Average case runtime vs. number of y_spares for various k (runtime plotted against y_spare, x_spare = 0, n = 20)

Figure 7.7 Average case runtime vs. number of x_spares for various n (runtime plotted against x_spare, y_spare = 0, k = 20)

Figure 7.8 Number of simple, next simple, and successive swaps vs. x_spare (curves: wild, simple, next, total)
CHAPTER 8
RELIABILITY OF FAULT TOLERANT CLOS NETWORKS
8.1 Introduction
So far we have discussed the new routing algorithms in ordinary and fault tolerant
Clos networks. We also considered the runtime with respect to the number of extra
switches in the outer and middle stages. Other important factors in the FTC
network are the reliability and space complexity with respect to the number of extra
switches. The reliability and space complexity are dependent on the number of
spare switches in the outer and middle stages, and these switches generate
additional extra rows and columns in the specification matrix which contribute to
the improvement in runtime. Thus, it is important to understand exactly how these
factors are related, and to design the FTC network accordingly. In section 8.2, fault
detection and location for the FTC network is discussed briefly. Next, the reliability
and space complexity of the FTC network, which are important factors in designing
fault tolerant Clos networks, are considered. Finally, in the last section, the
optimum number of extra switches for the fault tolerant Clos network, which
will best balance the runtime, reliability and cost, is considered.
8.2 Fault Detection and Location of the FTC
The work of any fault tolerant MIN generally depends on two things: fault detection
and fault location. Two techniques have been proposed in the literature for fault
detection and location. First, fault detection and location can be performed
off-line by applying prescribed test patterns to the inlets and comparing the output at
the outlets with the expected values. Second, faults can be detected and located
dynamically online through either parity checking or data bit checking. As good as
the online techniques may sound, they require a special switch design with built-in
hardware to carry out the dynamic checking. This online fault detection and location
technique is the mechanism that can be applied to many MINs. However, the FTC
network does not require any particular mechanism; rather, it requires only that the
processors be notified of the location of the fault, if any. For the work done in this
thesis, it is assumed that there is some mechanism to detect and locate faults and
notify the processors of the location of the fault.
8.3 Reliability of the FTC Network
The reliability of both the ordinary Clos network and the FTC network is dependent
on the reliability of each switch and link of the networks. In chapter 5, multiplexers
and demultiplexers are assumed to have high reliability when compared with switches
and links in the FTC network. Rigorous reliability analysis is possible which considers
the reliabilities of both multiplexers and demultiplexers. However, they are not
considered in this thesis for analytical simplicity. First, define the reliability, r, of a
single switch as the probability that the switch does not fail over a period of time
$\tau$. Then f = 1 - r is the probability that the switch fails in the same period $\tau$.
Similarly, define the reliability R of the network, ordinary or FTC, as the probability
that the network does not fail over a period of time $\tau$. Then F = 1 - R is the
probability that the network fails in the same period $\tau$. A switch fails if it cannot
realize, partially or completely, a mapping of its inputs onto its outputs. Similarly,
a network fails if it cannot realize, partially or completely, a mapping of its inlets onto
its outlets. For the ordinary Clos network to be fully operational over the period of
time $\tau$, all of its switches must be operational over the same period of time $\tau$. For
simplicity, assume that all the switches have the same reliability r. Therefore, the
reliability of the ordinary network, assuming statistical independence (independent
failure events), is
$$R_{ord} = r^{2k+m}$$
where 2k + m is the number of switches in the ordinary Clos network.
For the FTC with one extra switch in each stage, the network will remain
fully operational if up to one switch in every stage fails. Let $R_0$, $R_1$ and $R_2$ be
the reliabilities of stages 0, 1, and 2, respectively. Clearly, the three stages are
statistically independent. Thus, the reliability of the network is
$$R_{FTC} = R_0 R_1 R_2$$
The reliability of the first stage, $R_0$, is the probability that at least k out of the k + 1
first stage switches will be operational. Alternatively, if $F_0$ is the probability that
the first stage fails, then
$$R_0 = 1 - F_0$$
For stage 0 to fail, given that there is one extra switch, at least two switches will
have to fail, or fewer than k switches will have to function properly. This is a case of
the binomial distribution or Bernoulli trials, for which $F_0$ can be written as
$$F_0 = \sum_{i=0}^{k-1} \binom{k+1}{i} r^i (1-r)^{k+1-i}$$
where $\binom{k+1}{i}$ is the number of combinations of k + 1 taken i at a time. Substituting $F_0$
into $R_0 = 1 - F_0$ and realizing that $R_0 = R_2$ since the outer stages are the same, it
follows that
$$R_0 = R_2 = 1 - \sum_{i=0}^{k-1} \binom{k+1}{i} r^i (1-r)^{k+1-i}$$
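
Because the sum for the outer stages leaves out only the terms with i = k and
i = k + 1, the single-spare stage reliability also has a short closed form. The
following is a worked restatement of that expression, offered as a reading aid
rather than as an additional result of the thesis:

```latex
R_0 = R_2
    = \sum_{i=k}^{k+1} \binom{k+1}{i} r^i (1-r)^{k+1-i}
    = (k+1)\, r^{k} (1-r) + r^{k+1}
```

The first term is the probability that exactly one of the k + 1 switches fails, and
the second is the probability that none fails.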
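A similar analysis shows that the reliability of the middle stage is
$$R_1 = 1 - \sum_{i=0}^{m-1} \binom{m+1}{i} r^i (1-r)^{m+1-i}$$
Substituting these two equations yields
$$R_{FTC} = \left(1 - \sum_{i=0}^{k-1} \binom{k+1}{i} r^i (1-r)^{k+1-i}\right)^2 \left(1 - \sum_{i=0}^{m-1} \binom{m+1}{i} r^i (1-r)^{m+1-i}\right)$$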
When more than one switch is added to every stage, additional alternative
paths are created and thus greater reliability is expected. To verify that, the previous
equation will be generalized to the case where x switches are added to each of stages
0 and 2, and y switches are added to stage 1. Using the same procedure as above,
it can be shown that the reliability of the new network, $R_{FTC}$, is
$$R_{FTC} = \left(1 - \sum_{i=0}^{k-1} \binom{k+x}{i} r^i (1-r)^{k+x-i}\right)^2 \left(1 - \sum_{i=0}^{m-1} \binom{m+y}{i} r^i (1-r)^{m+y-i}\right)$$
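
The general expression can be evaluated numerically. The sketch below is a minimal
illustration, assuming independent switch failures as in the text; the function
names `stage_reliability` and `ftc_reliability` are hypothetical. It sums the
surviving terms (at least the required number of working switches), which equals
one minus the failure sum written above.

```python
from math import comb


def stage_reliability(required, spares, r):
    """Probability that at least `required` of `required + spares` switches work."""
    total = required + spares
    return sum(
        comb(total, i) * r**i * (1 - r) ** (total - i)
        for i in range(required, total + 1)
    )


def ftc_reliability(k, m, x, y, r):
    """Network reliability for k outer-stage and m middle-stage switches,
    with x spares in each outer stage and y spares in the middle stage."""
    outer = stage_reliability(k, x, r)
    middle = stage_reliability(m, y, r)
    return outer**2 * middle


# Reliability for 0 to 3 spares per stage at several switch reliabilities.
for r in (0.90, 0.96, 0.98, 0.99):
    values = [ftc_reliability(20, 20, s, s, r) for s in range(4)]
    print(r, [round(v, 4) for v in values])
```

With x = y = 0 the function reduces to the ordinary network value r^(2k+m), which
serves as a sanity check on the implementation.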
8.4 Effect of Spare Numbers on Reliability
The above equation can be used to show the reliability of a fault tolerant Clos network
with respect to the number of spare switches x or y. Figure 8.1 shows the reliability
of a fault tolerant Clos network with respect to the number of extra switches in the
first or third stage in the Type I FTC network, when the reliability of the switch r
is 0.9, 0.96, 0.98, and 0.99, respectively, and n = k = 20.
As shown in the figure, the reliability of the system depends on the number of
extra switches, and just one or two extra switches are needed in each stage in order
to improve the reliability of the system considerably, especially when r is high.
Higher system reliability is obtained as the reliability of the switch r increases.
It can be seen that if r is large, the addition of more than one switch per stage is
not needed and the reliability approaches 1. However, when the reliability of the
switch r is low, or when switches with high reliability are used for a long time, the
system reliability increases slowly with respect to the number of spare switches. In
this case, more switches are needed in order to obtain better reliability of the
system. Also, relatively low system reliability is obtained when r is low.

Figure 8.1 Reliability vs. number of y_spare switches in Type I networks when k = n = 20 (x_spare = 0; curves for r = 0.90, 0.96, 0.98, 0.99)
Figure 8.2 shows the reliability of a fault tolerant Clos network with respect
to the number of extra switches in the second stage in the Type II FTC network
when the reliability of the switch r is 0.9, 0.96, 0.98, and 0.99, respectively, and
n = k = 20. As in the Type I network, the reliability of the system depends on the
number of extra switches, and just one or two extra switches are needed to improve
the reliability of the system considerably when r is high. Higher system reliability
is achieved as the reliability of the switch r increases, but it takes on a lower
value than in Type I networks for the same reliability of the switch r. This is true
when the reliability of the switch is low, where the system reliability increases slowly
with respect to the number of spare switches, but with a much lower value. The
main reason for this is that the Type I network has extra switches in both the first
and third stages, while in Type II networks the extra switches are available only
in the second stage, so the total number of extra switches is about half that
of the Type I network. Note, in the above two figures, that the network reduces
to an ordinary Clos network and the reliability is the same in both types of network
when x_spare = y_spare = 0. Generally, the addition of extra switches increases the
overall reliability of the network by orders of magnitude when the reliability r is low,
while the addition of the same number of switches increases the overall reliability of the
network only slightly when r is high.

Figure 8.2 Reliability vs. number of x_spare switches in Type II networks when k = n = 20
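
The comparison between the two network types can be reproduced with a few lines of
code. This is a sketch under the independence assumption of section 8.3, with
hypothetical function names; Type I is taken to place the spares in each outer
stage only, and Type II in the middle stage only, as described above.

```python
from math import comb


def at_least(required, spares, r):
    """Probability that at least `required` of `required + spares` switches work."""
    total = required + spares
    return sum(
        comb(total, i) * r**i * (1 - r) ** (total - i)
        for i in range(required, total + 1)
    )


def type1_reliability(k, n, spares, r):
    """Spares in each outer stage; the n middle-stage switches have no spare."""
    return at_least(k, spares, r) ** 2 * r**n


def type2_reliability(k, n, spares, r):
    """Spares in the middle stage; the 2k outer-stage switches have no spare."""
    return r ** (2 * k) * at_least(n, spares, r)


for r in (0.90, 0.96, 0.98, 0.99):
    for s in (0, 1, 2, 5):
        print(
            f"r={r:.2f} spares={s} "
            f"TypeI={type1_reliability(20, 20, s, r):.4f} "
            f"TypeII={type2_reliability(20, 20, s, r):.4f}"
        )
```

With no spares both functions give the ordinary network value, and for any positive
number of spares the Type I value is the larger of the two when n = k, in line with
the observation in the text.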
Therefore, it can be concluded that when the reliability r of the individual
switches is high, there is no need for adding excessive hardware, especially when
the total number of switches is small. That is because the higher the number of
switches in the network, the higher its vulnerability to failure. The existence of
small numbers of switches with a few extra switches in the FTC makes a failure in
the network insignificant. Adding more switches per stage can be seen to increase
the overall reliability of the network. However, reconfiguration of the network would
be more difficult and time consuming. Moreover, the extra switches would increase
the hardware of the network and complicate its design. The reliability of the FTC
is generally much higher than that of the ordinary network, and the FTC is more
beneficial for networks with poor switch reliabilities. When r = 1, there is clearly no
need for any fault tolerance.
8.5 Space Complexity of the FTC Network
We will now consider the space complexity of the FTC network. The addition of
extra switches in the first and third stages causes an increase in the number of inputs
and outputs in each of the second stage switches. The same holds when extra
switches are added in the second stage, which results in an increase in the number
of inputs and outputs in the first and third stage switches. Note that the addition of
spare switches in the second stage results in an increase of switch areas in both the
first and third stages, while changes in the outer stages result in an increase only
in the second stage. Since the switches are actually crossbar switches, the area of
the switches, or the number of cross points, is generally proportional to the product
of the number of inputs and outputs of the switch. Here, we assume for simplicity of
analysis that the area of the multiplexers/demultiplexers is not significant. Also,
it is assumed here that the costs for the FTC networks are proportional to the total
area of the switches.
Let x and y again be the number of extra switches in the second stage and
first (or third) stage, respectively. Then the total number of switches in the second
stage is n + x, while it is k + y in the first (or third) stage. The number of inlets
in the first stage switch is n, and the number of outlets is n + x. In the second
stage switch, the number of inlets or outlets is k + y. Since the first and third
stages are identical, the total area of the outer stages is twice the area of either
outer stage. The space complexity or the cost of the FTC network is proportional to
$2(k + y)n(n + x) + (n + x)(k + y)^2$. Figure 8.3 shows the cost vs. the number of extra
y_spares in Type I networks when n = k = 20 and x_spare = 0. As can be seen in
the figure, the cost increases monotonically as the number of y_spares increases.

Figure 8.3 Cost vs. number of spare switches in Type I networks when n = k = 20 and x = 0 (cost plotted against y_spare)
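
The cost expression is simple enough to tabulate directly. The sketch below assumes,
as the text does, that cost is proportional to the number of crosspoints and that
multiplexers and demultiplexers are ignored; the function name `ftc_cost` is
hypothetical.

```python
def ftc_cost(n, k, x, y):
    """Crosspoint count with x spare middle-stage switches and y spares per outer stage."""
    outer = 2 * (k + y) * n * (n + x)   # two outer stages of n-by-(n + x) switches
    middle = (n + x) * (k + y) ** 2     # (n + x) middle switches, each (k + y)-by-(k + y)
    return outer + middle


# Type I sweep: spares in the outer stages only (x = 0), n = k = 20.
for y in range(0, 11):
    print(y, ftc_cost(20, 20, 0, y))
```

For n = k = 20 the ordinary network (x = y = 0) has 24000 crosspoints under this
model, and the count grows with every added spare.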
Figure 8.4 shows the cost vs. number of x_spares in Type II networks when
n = k = 20 and y_spare = 0. As in Type I networks, the cost also increases steadily
with the increase in x_spare. However, in this case, the cost is less than in Type I
networks. Note that the increase in y_spare in Type I networks actually adds twice
the number of extra switches to the network, although the number of extra switches
is the number of x_spares in Type II networks. It can be seen from the figures that
the Type I network is in general more expensive than Type II networks, but does
not double the cost of the Type II networks for the same number of x_spares and
y_spares. However, it can achieve better reliability than the Type II network since
there are more extra switches.

Figure 8.4 Cost vs. number of spare switches in Type II networks when n = k = 20 and y = 0 (cost plotted against x_spare)
8.6 Optimum Number of Spare Switches in the FTC Network
So far we have examined the runtime, reliability, and cost with respect to the number
of spare switches x and y. As was seen in chapter 7, the runtime is roughly the same
in both the Type I and Type II FTC networks. More specifically, the Type II network
is faster when the number of spare switches is small. However, as the number of spares
increases, its runtime is slower than that of the Type I network since it needs extra time
to find the location of spares and to make sure that there are no identical elements
in the specification matrix S. The number of spares needed in Type I networks for
generating additional rows in the S matrix is twice the number of spares in the Type
II network for creating the same number of additional columns. On the other hand,
the Type I network can achieve better reliability than the Type II network, but it
requires twice the number of extra switches. Because of this, the Type I network
is more expensive than the Type II network, but it does not double the cost of the
Type II network when x_spare and y_spare are the same. The optimum number
of spare switches in each stage of the FTC network cannot be determined exactly;
rather, it depends on the availability of the resources and requirements of the system.
The general approach would be to decide the above factors first and then adjust the
number of spare switches in the outer stages and in the middle stage.
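
One way to make this general approach concrete is an exhaustive search over small
spare counts. The thesis does not prescribe such a procedure; the sketch below is
only an illustration that combines the reliability and cost expressions of sections
8.3 and 8.5 under their stated assumptions. All function names, the reliability
target, and the search bound `max_spares` are hypothetical.

```python
from math import comb


def at_least(required, spares, r):
    """Probability that at least `required` of `required + spares` switches work."""
    total = required + spares
    return sum(
        comb(total, i) * r**i * (1 - r) ** (total - i)
        for i in range(required, total + 1)
    )


def reliability(k, n, outer_spares, middle_spares, r):
    """Network reliability with spares in each outer stage and in the middle stage."""
    return at_least(k, outer_spares, r) ** 2 * at_least(n, middle_spares, r)


def cost(k, n, outer_spares, middle_spares):
    """Crosspoint count for the same configuration."""
    outer, middle = k + outer_spares, n + middle_spares
    return 2 * outer * n * middle + middle * outer**2


def cheapest_design(k, n, r, target, max_spares=10):
    """Return (cost, outer_spares, middle_spares) for the cheapest configuration
    meeting the reliability target, or None if none exists within max_spares."""
    feasible = [
        (cost(k, n, o, m), o, m)
        for o in range(max_spares + 1)
        for m in range(max_spares + 1)
        if reliability(k, n, o, m, r) >= target
    ]
    return min(feasible) if feasible else None


print(cheapest_design(k=20, n=20, r=0.99, target=0.99))
print(cheapest_design(k=20, n=20, r=0.90, target=0.99))
```

The search ignores runtime, which would have to be added as a further criterion from
simulation results such as those in chapter 7.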
The research so far has shown the following results for determining the number
of spare switches in each stage. When the reliability of the switches is high, just one
or two extra switches are needed in each stage in order to improve the reliability of
the system. In this case, the fault tolerant routing algorithm is not efficient, and the
runtime approaches the speed of an algorithm for the ordinary Clos network. No
additional costs are required. However, when the reliability of the switches is not
high, more than two extra switches are required in order to improve the reliability.
High reliability can be achieved by adding more switches in the outer stages, but with
an increase in cost. Adding more extra switches in the middle stage is less costly
in improving the runtime than adding spares in the outer stages. However, better
reliabilities are possible in the latter case. In both cases, the introduced fault tolerant
routing algorithm utilizes extra switches to improve the runtime, which is roughly
the same in both types of FTC.
8.7 Discussion
Besides the fault tolerance the FTC provides, the reliability of the network is greatly
enhanced. High reliability means more system availability with uninterrupted
operations. It is seen from the analysis that using this fault tolerance approach is
most beneficial when the reliability of the individual switches is poor, or the number
of switches in the network is large. As far as reliability is concerned, larger numbers
of extra switches are needed in order to increase the reliability. This number depends
on the number of switches in the network and the reliability of the individual switch,
and can be determined for an optimum value. However, putting a large number
of extra switches per stage adds significantly to the network hardware and routing
complexity. High reliability can be achieved by adding more switches in any of
the stages. But adding switches in the outer stages increases the cost and system
hardware more rapidly. The same improvements in runtime can be obtained by
adding more extra switches in the middle stage, which is less costly in improving
the runtime, but only relatively small improvements can be achieved in reliability.
CHAPTER 9
CONCLUSION
9.1 Summary
This thesis has demonstrated the failure of Gordon's algorithm, which uses two
matrices for improving the runtime. A new simple algorithm for the control of
rearrangeable Clos networks, which runs in time $O(nk^{3/2})$, is proposed based on his
algorithm. The new algorithm is extended to the fault tolerant Clos (FTC) network,
which can further improve the runtime when there are relatively few or no faults in
the system. In order to achieve this, the FTC network has been classified into three
types to find the swapping rules and conditions of extra elements. The optimum
number of extra switches in the fault tolerant Clos network, which
will best satisfy the runtime, reliability and cost constraints, is considered. The results
from each perspective are summarized below.
9.1.1 Routing for Clos Networks
Although Gordon's algorithm is simple and fast, this research has shown that his
algorithm displays errors in some of the permutations, especially when k > 5. The
new algorithm is based on Gordon's algorithm, where the Clos network is represented
by the specification matrix and count matrix. As in Gordon's algorithm, the new
algorithm has the advantage of speeding up routing by just checking the C matrix
in order to obtain the number of occurrences of each element in each column of
the S matrix. The new algorithm consists of three kinds of swaps: simple swap,
next simple swap, and successive swap. The successive swap can be compared with
the iteration phase of Neiman's algorithm, where the algorithm backtracks in order
to select all elements which are not in the same rows and same columns. The time
complexity of the new algorithm for the ordinary Clos network has been found to be
$O(nk^{3/2})$.
The basic difference between Gordon's algorithm and the new algorithm lies in
the scanning direction in the C matrix. In Gordon's algorithm, it is scanned column
by column, removing columns once all elements are nonidentical in each column.
Swapping elements can take place between a not-yet-decomposed leftmost column
and the rest of the columns. However, the new algorithm scans the C matrix row by
row, and swapping elements is restricted to two columns for the successive swap.
This gives an obvious advantage in proving that it works for all permutations, but in
Gordon's algorithm, this is hard to prove. Another advantage of the new algorithm
is that it has the potential to be run in parallel since only two columns are involved
in the successive swap, and other pairs of columns can be swapped at the same time.
Also, the simple, next simple, and successive swaps can easily be extended to the
fault tolerant Clos network, which is yet another advantage.
9.1.2 Routing for FTC Networks
The new algorithm for FTC networks shows that the previous algorithm for the
ordinary Clos network can be easily extended to the fault-tolerant cases. It has been
shown that the original matrices can be modified using extra rows and columns in the
specification matrix so that they can represent the extra spare switches in the FTC
network. Extra switches generate wild cards in the matrix, which provide flexibility
during the decomposition process. The wild swaps employed in the algorithm, in
addition to the three kinds of swaps in the ordinary case, were found to be important
since they can reduce the chances of entering into the time-consuming next simple
swaps or successive swaps. The spare switches introduce two types of wild cards
depending on the location of spare switches in the network. In Type I networks, two
extra spares along with multiplexers/demultiplexers are required in order to create
one additional row in the specification matrix. The Type II network requires less
hardware to create one additional column in the matrix, and the wild cards are
much more flexible than in the Type I network. It was shown that the addition of
extra switches to the network considerably decreases the runtime of the algorithm.
The failure of a switch is reflected in the reconfiguration matrix, which helps to
reconfigure the network dynamically by changing the settings of the multiplexers
and demultiplexers and using extra switches. As in the ordinary Clos network,
the new algorithm realizes every permutation because it scans the C matrix
row by row and restricts swapping elements to two columns in the successive swap.
9.1.3 Optimum Numbers of Spare Switches in FTC
Optimum numbers of extra switches in FTC networks can be determined with respect
to the reliability, runtime and cost. The research so far has shown the following
results for determining the number of spare switches in each stage. When the
reliability of the switches is high, just one or two extra switches are needed in each stage in
order to improve the reliability of the system. In this case, the fault tolerant routing
algorithm is not efficient, and the runtime approaches the speed of the algorithm for
the ordinary Clos network. No additional costs are required. However, when the
reliability of the switches is not high, more than two extra switches are required in
order to improve the reliability. High reliability can be achieved by adding more
switches in the outer stages, but with an increase in cost. Adding more extra switches
in the middle stage is less costly in improving the runtime than adding spares in
the outer stages. However, better reliabilities are possible in the latter case. The
runtime is improved by roughly the same amount in both types of FTC networks.
The optimum number of spare switches in each stage of the FTC network cannot
be determined exactly; rather, it depends on the availability of the resources and
requirements of the system. The general approach would be to decide the above
factors first and then adjust the number of spare switches in the outer stages and in
the middle stage.
9.2 Open Problems
This research has covered the routing issues in the ordinary as well as fault tolerant
Clos networks in depth. In spite of the progress made in some areas, some problems
have been observed and some encouraging ideas that need further research have
been discovered. Studying these more deeply will help to establish a sound basis for
the research. First, the current algorithm
for decomposing the Clos network requires that no identical elements be present
in a column of S, except spares, in order to completely decompose the matrix. This
condition is due to the structure of the Clos network, in which each of the second-stage
switches is connected to every third-stage switch. Also, swaps are allowed only for
the elements in the same row. This is also due to the first-stage switches' connection
to each of the second-stage switches. These conditions look straightforward, but
in fact they require extremely serial decomposition and frequent backtracking.
However, by modifying the Clos network somehow, the current conditions might
be alleviated in a way that could lead to a much faster, straightforward routing
strategy. The question here is: Is there any modified structure of the Clos network
which could lead to much faster routing that can be performed in a serial as well
as in a parallel method? And if so, how can we find that structure, and how much
difference can we expect?
Meanwhile, this research has developed a new algorithm for decomposing the
Clos interconnection network. This algorithm can be applied to Benes and other
similar interconnection networks which are derived from the Clos network. Then,
can we apply this algorithm to other multistage interconnection networks such as
the shuffle-exchange, banyan, and omega networks? Also, can the algorithm for the
FTC network be applied to other fault tolerant interconnection networks such as the
ESC?
Also, the newly introduced algorithm decomposes the specification matrix
row by row, while Gordon's algorithm decomposes it column by column. The
potential advantage of decomposing the matrix column by column is the reduction
of the dimension of the specification matrix as the routing progresses, since each
decomposed column can be removed from the matrix. Gordon's algorithm has been
demonstrated to display errors for some permutations, but can we explain why his
algorithm fails? Also, can we really find an algorithm which decomposes the matrix
on a column-by-column basis?
This thesis assumed that the FTC is able to detect and locate
faults. Further research is required in this area. In addition, further study is needed
on the reconfiguration problems caused by the failure of interstage
connections, and on the analysis of the time complexity of the new algorithms.
REFERENCES
1. J. Baer, “Multiprocessing Systems,” IEEE Transactions on Computers,
vol. C-25, no. 12, pp. 613-641, December 1976.
2. L. Bhuyan, “A Combinatorial Analysis of Multibus Multiprocessors,”
Proceedings of 1984 International Conference on Parallel Processing, pp.
225-227, August 1984.
3. H. Lorin, Parallelism in Hardware and Software, Prentice-Hall, Englewood
Cliffs, NJ, 1972.
4. M. Flynn, “Very High-Speed Computing Systems,” Proceedings of the IEEE,
vol. 54, pp. 1901-1909, December 1966.
5. M. Flynn, “Some Computer Organizations and Their Effectiveness,” IEEE
Transactions on Computers, vol. C-21, no. 9, pp. 948-960, September 1972.
6. W. Handler, “The Impact of Classification Schemes on Computer Architecture,”
Proceedings of 1977 International Conference on Parallel Processing, pp.
7-15, 1977.
7. W. Davis, Operating Systems: A Systematic View, 2nd edition, Addison Wesley,
Reading, MA, 1983.
8. M. Mano, Computer System Architecture, 2nd edition, Prentice-Hall, Englewood
Cliffs, NJ, 1982.
9. T. Hallin and M. Flynn, “Pipelining of Arithmetic Functions,” IEEE
Transactions on Computers, vol. C-21, no. 8, pp. 880-886, August 1972.
10. T. Mudge et al., “Analysis of Multiple Bus Interconnection Networks,”
Proceedings of the 1984 International Conference on Parallel Processing,
pp. 228-232, August 1984.
11. T. Mudge, J. Hayes and D. Winsor, “Multiple Bus Architectures,” Computer,
pp. 42-48, June 1987.
12. T. Chen, “Parallelism, Pipelining, and Computer Efficiency,” Computer Design,
pp. 69-74, vol. 10, no. 1, January 1971.
13. P. Wayner, “Processor Pipelines,” Byte, vol. 17, pp. 305-306, January 1992.
14. D. Lawrie, “Access and Alignment of Data in an Array Processor,” IEEE
Transactions on Computers, vol. 24, no. 12, pp. 1145-1155, December 1975.
15. W. Chu, “Advances in Computer Communications and Networking,” Artech
House, Dedham, MA, 1979.
16. Chuan-lin Wu and Tse-Yun Feng, “On a Class of Multistage Interconnection
Networks,” IEEE Transactions on Computers, vol. C-29, no. 8, pp. 694-702,
August 1980.
17. T. Feng, “A Survey of Interconnection Networks,” Computer, vol. 14, no. 12,
pp. 12-27, December 1981.
18. H. Siegel, “Interconnection Networks for SIMD Machines,” Computer, vol. 12,
pp. 57-65, June 1979.
19. Yao-Ming Yeh and Tse-Yun Feng, “On a Class of Rearrangeable Networks,”
IEEE Transactions on Computers, vol. 41, no. 11, pp. 1361-1379,
November 1992.
20. F. K. Hwang, “On the Rearrangeability of Some Multistage Connecting
Networks,” Bell Systems Technical Journal, vol. 55, no. 9, pp. 1411-1422,
November 1976.
21. F. K. Hwang and A. Jajszczyk, “On Nonblocking Multiconnection Network,”
IEEE Transactions on Communications, vol. COM-34, no. 10, pp. 1038-1041,
October 1986.
22. B. Douglass, “Rearrangeable Three-Stage Interconnection Networks and Their
Routing Properties,” IEEE Transactions on Computers, vol. 42, no. 5,
pp. 559-567, May 1993.
23. G. Goke and G. Lipovski, “Banyan Networks for Partitioning Multiprocessor
Systems,” First Annual Symposium on Computer Architecture, pp. 21-28,
December 1973.
24. M. Leland, “On the Power of the Augmented Data Manipulator Network,” 1985
International Conference on Parallel Processing, pp. 74-78, August 1985.
25. K. Batcher, “The Flip Network in STARAN,” Proceedings of the 1976
International Conference on Parallel Processing, pp. 65-71, 1976.
26. H. Siegel and R. McMillen, “The Multistage Cube: A Versatile Interconnection
Network,” Computer, pp. 65-76, December 1981.
27. F. Lombardi and C. Feng, “Detection and Location of Multiple Faults in
Baseline Interconnection Networks,” IEEE Transactions on Computers,
vol. 41, pp. 1340-1344, October 1992.
28. M. Kumar and J. R. Jump, “Generalized Delta Networks,” Proceedings of the
1983 International Conference on Parallel Processing, pp. 10-18, August
1983.
29. Z. Cvetanovic, “Best and Worst Mapping for the Omega Network,” IBM Journal
of Research and Development, vol. 31, pp. 452-463, July 1987.
30. D. Rau, J. Fortes and H. Siegel, “Destination Tag Routing Techniques Based
on a State Model for the IADM Network,” IEEE Transactions on
Computers, vol. 41, no. 3, pp. 274-285, March 1992.
31. Charles Clos, “Study of Non-blocking Switching Networks,” Bell Systems
Technical Journal, vol. 32, no. 2, pp. 406-424, March 1953.
32. V. E. Benes, “On Rearrangeable Three-Stage Connecting Network,” Bell
Systems Technical Journal, vol. XLI, no. 5, pp. 117-125, September 1962.
33. V. I. Neiman, “Structure et commande optimales des reseaux de connexion sans
blocage,” Annales des Telecommun., pp. 232-238, July/August 1969.
34. Nelson T. Tsao-Wu, “On Neiman's Algorithm for the Control of Rearrangeable
Switching Networks,” IEEE Transactions on Communications, vol.
COM-22, no. 6, pp. 737-742, June 1974.
35. Abraham Waksman, “A Permutation Network,” Journal of the ACM, vol. 15,
no. 1, pp. 159-163, January 1968.
36. H. R. Ramanujam, “Decomposition of Permutation Networks,” IEEE
Transactions on Computers, vol. C-22, no. 7, pp. 639-643, July 1973.
37. Marek Kubale, “Comments on Decomposition of Permutation Networks,” vol.
C-31, no. 3, p. 265, March 1982.
38. Andrzej Jajszczyk, “A Simple Algorithm for the Control of Rearrangeable
Switching Networks,” IEEE Transactions on Communications, vol. COM-33,
no. 2, pp. 169-171, February 1985.
39. Claude Cardot, “Comments on a Simple Algorithm for the Control of
Rearrangeable Switching Networks,” IEEE Transactions on
Communications, vol. COM-34, no. 4, p. 395, April 1986.
40. Frank K. Hwang, “Control Algorithms for Rearrangeable Clos Networks,” IEEE
Transactions on Communications, vol. COM-31, no. 8, pp. 952-954,
August 1983.
41. D. G. Opferman and N. T. Tsao-Wu, “On a Class of Rearrangeable Switching
Networks, Part I: Control Algorithm,” Bell Systems Technical Journal,
vol. 50, no. 5, pp. 1579-1600, May-June 1971.
42. Steinar Andresen, “The Looping Algorithm Extended to Base 2^t,” IEEE
Transactions on Communications, vol. COM-25, no. 10, pp. 197-203, October
1977.
43. J. Gordon and S. Srikanthan, “Novel Algorithm for Clos-Type Networks,”
Electronic Letters, vol. 26, no. 21, pp. 1772-1774, October 1990.
44. Y. K. Chiu and W. C. Siu, “Comment: Novel Algorithm for Clos-Type
Networks,” Electronic Letters, vol. 27, no. 6, pp. 524-526, March 1991.
45. Harold Gabow, “Using Euler Partitions to Edge Color Bipartite Graphs
and Multigraphs,” International Journal of Computer and Information
Sciences, vol. 5, no. 4, pp. 345-355, 1976.
46. H. Gabow and Oded Kariv, “Algorithm for Edge Coloring Bipartite Graphs and
Multigraphs,” SIAM Journal on Computing, vol. 11, no. 1, pp. 117-129,
February 1982.
47. Richard Cole and John Hopcroft, “On Edge Coloring Bipartite Graphs,” SIAM
Journal on Computing, vol. 11, no. 3, pp. 540-546, August 1982.
48. V. Vizing, “On an Estimate of the Chromatic Class of a p-graph,” Diskret.
Analiz., no. 3, pp. 25-30, 1964.
49. D. Nassimi and S. Sahni, “A Self-routing Benes Network and Parallel
Permutation Algorithms,” IEEE Transactions on Computers, vol. C-30, no. 5,
pp. 332-340, May 1981.
50. John D. Carpinelli and A. Yavuz Oruc, “A Non-backtracking Decomposition
Algorithm for Routing on Clos Networks,” IEEE Transactions on
Communications, vol. 41, no. 8, pp. 1245-1251, August 1993.
51. J. Carpinelli, Interconnection Networks: Improved Routing Methods for Clos and
Benes Networks, Ph.D. Thesis, Rensselaer Polytechnic Institute, Troy,
NY, August 1987.
52. G. Lev, N. Pippenger and L. Valiant, “A Fast Parallel Algorithm for Routing
in Permutation Networks,” IEEE Transactions on Computers, vol. C-30,
no. 2, pp. 93-100, February 1981.
53. J. Lenfant, “Parallel Permutations of Data: A Benes Network Control Algorithm
for Frequently Used Permutations,” IEEE Transactions on Computers,
vol. 27, no. 7, pp. 637-647, July 1978.
54. B. G. Douglass and A. Y. Oruc, “On Self-Routing in Clos Connection Networks,”
IEEE Transactions on Communications, vol. 41, no. 1, pp. 121-124,
January 1993.
55. C. Raghavendra and R. Boppana, “On Self-Routing in Benes and
Shuffle-Exchange Networks,” IEEE Transactions on Computers, vol. 40, no. 9,
pp. 1057-1064, September 1991.
56. G. Adams and H. Siegel, “The Extra Stage Cube: A Fault-Tolerant
Interconnection Network for Supersystems,” IEEE Transactions on Computers,
vol. C-31, no. 5, pp. 443-454, May 1982.
57. G. Adams, D. Agrawal and H. Siegel, “A Survey and Comparison of
Fault-tolerant Multistage Interconnection Networks,” Computer, pp. 14-27,
June 1987.
58. K. Yoon and W. Hegazy, “The Extra Stage Gamma Network,” Proceedings of
the 13th Annual Symposium on Computer Architecture, pp. 175-182, 1986.
59. K. Padmanabhan and D. Lawrie, “A Class of Redundant Path Multistage
Interconnection Networks,” IEEE Transactions on Computers, pp. 1099-1108,
December 1983.
60. H. Nassar, Fault-Tolerant Interconnection Networks for Multiprocessor Systems,
Ph.D. Thesis, New Jersey Institute of Technology, Newark, NJ, 1989.
61. C. Raghavendra and A. Varma, “INDRA: A Class of Interconnection Networks
with Redundant Paths,” Proceedings of the 1984 Real Time Systems
Symposium, pp. 153-164, 1984.
62. T. Feng and C. Wu, “Fault-Diagnosis for a Class of Multistage Interconnection
Networks,” IEEE Transactions on Computers, vol. C-30, no. 10, pp.
743-758, October 1981.
63. D. Agrawal, “Testing and Fault Tolerance of Multistage Interconnection
Networks,” Computer, pp. 41-53, April 1982.
64. J. Lilienkamp, D. Lawrie and P. Yew, “A Fault Tolerant Interconnection
Network Using Error Correcting Codes,” The Proceedings of the 1982
International Conference on Parallel Processing, pp. 123-125, 1982.
65. D. Agrawal and D. Kaur, “Fault Tolerant Capabilities of Redundant
Multistage Interconnection Networks,” The Proceedings of Real-time
Systems Symposium, pp. 119-127, December 1983.
66. J. P. Shen, Fault Tolerance of β-networks in Interconnected Multicomputer
System, Ph.D. Dissertation, Department of Electrical Engineering,
University of Southern California, CA, August 1981.
67. W. Fuchs, J. Abraham and K. Huang, “Concurrent Error Detection in VLSI
Interconnection Networks,” The Proceedings of the 10th Annual International
Symposium on Computer Architecture, pp. 309-315, 1983.
