New Jersey Institute of Technology

Digital Commons @ NJIT
Dissertations

Electronic Theses and Dissertations

Spring 5-31-1989

Fault-tolerant interconnection networks for multiprocessor
systems
Hamed Mohamed Nassar
New Jersey Institute of Technology

Follow this and additional works at: https://digitalcommons.njit.edu/dissertations
Part of the Electrical and Electronics Commons

Recommended Citation
Nassar, Hamed Mohamed, "Fault-tolerant interconnection networks for multiprocessor systems" (1989).
Dissertations. 1232.
https://digitalcommons.njit.edu/dissertations/1232


Please Note: The author retains the copyright while the
New Jersey Institute of Technology reserves the right to
distribute this thesis or dissertation

The Van Houten library has removed some of the
personal information and all signatures from the
approval page and biographical sketches of theses
and dissertations in order to protect the identity of
NJIT graduates and faculty.


Order Number 9003133

Fault-tolerant interconnection networks for multiprocessor systems
Nassar, Hamed Mohamed, D.Eng.Sc.
New Jersey Institute of Technology, 1989


Fault-Tolerant Interconnection
Networks for Multiprocessor Systems

by
Hamed Nassar

Dissertation submitted to the Faculty of the Graduate School
of the New Jersey Institute of Technology in partial fulfillment
of the requirements for the degree of
Doctor of Engineering Science
1989

Approval Sheet

Title of Thesis: Fault-Tolerant Interconnection Networks for Multiprocessor
Systems
Name of Candidate: Hamed Nassar
Doctor of Engineering Science, 1989

Thesis and Abstract Approved:
Dr. John Carpinelli
Date
Assistant Professor
Department of Electrical Engineering

Dr. Raj Misra

Date

Dr. Peter Ng

Date

Dr. Anthony Robbi

Date

Abstract

Title of Thesis: Fault-Tolerant Interconnection Networks for Multiprocessor
Systems
Hamed Nassar, Doctor of Engineering Science, 1989

Thesis directed by: Dr. John Carpinelli

Interconnection networks are the backbone of multiprocessor systems. A failure in the network, therefore, could seriously degrade system performance. For this reason, fault tolerance has been regarded as a major consideration in interconnection network design. This thesis presents two novel techniques that provide fault tolerance to three major networks: the Baseline network, the Benes network and the Clos network.
First, the Simple Fault Tolerance (SFT) technique is presented. The SFT technique is in fact the result of merging two widely known interconnection mechanisms: a normal interconnection network and a shared bus. This technique is most suitable for networks with small switches, such as the Baseline network and the Benes network. For the Clos network, whose switches may be too large for the SFT technique, another technique is developed to produce the Fault-Tolerant Clos (FTC) network. In the FTC, one switch is added to each stage. The two techniques are described and thoroughly analyzed.

VITA
Name: Hamed Mohamed Nassar.

Degree and date to be conferred: D.Eng.Sc., 1989.

Secondary education: Shebin El-Kanater Secondary School.

Collegiate institutions attended         Dates     Degree      Date of Degree
Ain Shams University, Egypt              1974-79   B.S.E.E.    May, 1979
New Jersey Institute of Technology       1983-85   M.S.E.E.    May, 1985
New Jersey Institute of Technology       1985-89   D.Eng.Sc.   May, 1989

Major: Electrical Engineering
Minor: Computer and

To my parents, and to Manal and little Nancy.

Acknowledgements
Mere words are not enough to express my gratitude to my advisor, Dr. John Carpinelli. His mastery of the subject matter, combined with a great deal of sincerity, enthusiasm and dynamism, has made working on this dissertation the most wonderful experience. Indeed, these qualities make him the kind of advisor every student wishes for, but seldom finds.
Many thanks are due to the distinguished members of my committee: Dr. Raj Misra, Dr. Peter Ng and Dr. Anthony Robbi. Their invaluable suggestions and stimulating discussions have considerably improved the quality of the dissertation.
I am really thankful to all the professors and staff of the Department of Electrical Engineering who have been helpful and supportive throughout the years I have spent at NJIT. Special thanks are due to Dr. Warren Ball, Dr. Joseph Strano, Dr. Edwin Cohen, Dr. Khalil Denno, and Ms. Brenda Walker.
I would also like to thank Dr. Roman Voronka, of the Math Department, who made me love mathematics, and Mr. Steve Keeton, of the Computer Services Department, whose role was crucial in completing the computer work of the dissertation.
Last but not least, I would like to thank my immediate family. Many thanks go to my parents, for their continuous prayers and encouragement, and to my wife, Manal, and daughter, Nancy, for putting up with my study habits.

Contents
Dedication
Acknowledgements
1 Introduction
  1.1 Parallel Processing
  1.2 Multiprocessors
  1.3 Interconnection Networks and the Need for Fault Tolerance
  1.4 Outline
2 Basic Concepts and Notation
  2.1 Interconnection Networks
  2.2 Fault Tolerance for MINs
  2.3 Combinatorics
    2.3.1 Permutations (arrangements)
    2.3.2 Combinations (selections)
  2.4 Fundamentals of Reliability
    2.4.1 Probability of a simple event
    2.4.2 Probability of a compound event
    2.4.3 Reliability models
  2.5 Notation
3 MIN Implementations
  3.1 The Baseline Network
    3.1.1 Routing the Baseline network
  3.2 The Clos Network
    3.2.1 Routing the Clos network
  3.3 The Benes Network
    3.3.1 Routing the Benes network
  3.4 Other MIN Implementations
  3.5 The Crossbar Switch
4 Fault Tolerant MINs
  4.1 The Extra Stage Cube
    4.1.1 Operation and fault tolerance model
  4.2 Augmented Shuffle-Exchange MIN
    4.2.1 Operation and fault tolerance model
  4.3 Fault Detection and Location
5 The Simple Fault Tolerant Baseline network
  5.1 Design of the SFTB
  5.2 Routing the SFTB Under Faulty Conditions
    5.2.1 Performance degradation under faulty conditions
    5.2.2 Accessing the bus
  5.3 Design of the Enhanced SFTB
    5.3.1 Permutation realization capabilities of the enhanced SFTB
  5.4 Use of the Standby Bus Under Normal Conditions
  5.5 Using the SFT technique in MINs with large switches
  5.6 Discussion
6 The Fault-Tolerant Clos Network
  6.1 Design of the FTC
  6.2 Reconfiguration of the FTC
  6.3 Routing the FTC
  6.4 Reliability Analysis
  6.5 Generalization to More Than One Extra Switch per Stage
  6.6 Discussion
7 Conclusions
  7.1 Summary
  7.2 The SFT Technique
  7.3 The Fault-Tolerant Clos (FTC) network
  7.4 Open Problems
Bibliography

List of Tables
4.1 Routing Tags for the ESC
5.1 Multiplexer and demultiplexer operation modes
5.2 Values of T>(2) and -np v( ° ) and their ratio for some values of u

List of Figures
1.1 Basic multiprocessor architecture
1.2 Multiprocessor system with a shared bus
1.3 N x M crossbar switch
1.4 Basic configuration of interconnection networks
1.5 4 x 4 switch set to realize an arbitrary mapping
2.1 Generalized MIN
2.2 Basic reliability models
3.1 8 x 8 Baseline network with routing example
3.2 Shuffling 8 objects
3.3 8 x 8 Omega network
3.4 Legal states of the binary switch
3.5 8 x 8 ordinary Clos network
3.6 Graph representation of permutation P
3.7 8 x 8 Benes network
3.8 6 x 6 BBC network
3.9 Implementation of a binary switch
4.1 The Extra Stage Cube (ESC)
4.2 8 x 8 Baseline network
4.3 The Augmented Shuffle-Exchange Network (ASEN)
5.1 Baseline network of size 8
5.2 The SFT equivalent of the Baseline network of Figure 5.1
5.3 The enhanced SFT equivalent of the Baseline network of Figure 5.1
5.4 Permutation P realized with a PTU policy at all switches
5.5 Same permutation of Figure 5.4, realized with a PTU policy at all switches except X(1,0)
5.6 Subset structure of the Baseline network of Figure 5.1
5.7 The two types of blocking, assuming higher priority for the upper input
6.1 9 x 9 ordinary Clos network
6.2 The equivalent FTC of the network of Figure 6.1
6.3 FTC reconfigured to accommodate faults in X(1,0), X(2,1) and X(2,2)
6.4 The graph representation of both the ordinary network of Figure 6.1 and the FTC of Figure 6.3 as they realize permutation P
6.5 Reliability vs. N for both the ordinary network and the FTC, for r = 0.98
6.6 Reliability vs. N for both the ordinary network and the FTC, for r = 0.8
6.7 Gain in reliability vs. N for various fault tolerant networks with r = 0.8
6.8 Gain in reliability vs. N for various fault tolerant networks with r = 0.99

Chapter 1
Introduction

1.1 Parallel Processing

Fast computers are increasingly required as sophisticated, computation-intensive applications continue to evolve. The search for a fast computer seems endless, as more and more speed is always demanded. In the early days of computers, this need for speed was fulfilled by advances in technology. As tubes were replaced by transistors and other discrete solid-state components, computers became faster. Integrated circuit (IC) technology then made computers faster still. Eventually, however, component technology could no longer keep up with the demand for speed, and the answer had to be found in a drastic change in the architecture of the computer: parallel processing.
Early computers, the so-called von Neumann machines, used a single processor to fetch instructions from memory and execute them one at a time [57]. Parallel systems, however, are based on the principle that more than one task can be performed simultaneously. This concurrency can be realized either at the software level or at the hardware level [56].
At the software level, parallelism is obtained by time-sharing the computer resources among different programs. Here, the operating system divides [30] the CPU time among the different programs so that no one program monopolizes the CPU for a long time while others are waiting. This technique has been used on computers with a single processor to achieve parallelism in the form of multiprogramming, multitasking, multiuser and time-sharing capabilities.
When parallelism is implemented at the hardware level, it can take place at the computer level, at the sub-processor level, or at the processor level. Hardware parallelism can be achieved both by computers having a single processor (uniprocessors) and by computers having more than one processor (multiprocessors).
Distributed computing [26] is the name used when parallelism takes place at the computer level. Here the computation load is distributed among more than one computer. These computers, which are connected by a communications network, work totally independently and asynchronously. Communication between the different computers takes place by passing messages to obtain data or exchange results. The computers may be in close proximity to each other, in which case they are connected by a Local Area Network (LAN), or they may be scattered over a wide geographic area, in which case they are connected by a Wide Area Network (WAN). Computers in a distributed computing system are said to be loosely coupled.
Computers can achieve parallelism at the sub-processor level in several ways. One way is to fetch an instruction while another is being executed. Another is to overlap Central Processing Unit (CPU) and Input/Output (I/O) operations. Yet another is pipelining [25]. Pipelined architectures borrow the idea of the assembly line, in which a job is divided into many steps and each step is assigned to a specific worker along the line. This manufacturing technique has proven efficient because all the elements of the line are continuously busy. In a pipelined computer, a control unit divides the instruction [42] into a number of phases and assigns each phase to a subunit in the main processor. Each subunit performs its part and then sends the result to the next subunit along the way. This keeps every subunit continuously busy, thereby increasing the throughput of the system [23,24]. At steady state, the flow of instructions into the pipeline equals the flow of instructions out of the pipeline.
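The throughput gain of pipelining can be estimated with a small calculation. The sketch below assumes an ideal pipeline in which every phase takes exactly one clock cycle and there are no stalls, branches or memory delays; the function names are hypothetical. Under these assumptions, k phases and n instructions take n·k cycles without pipelining but only k + n - 1 cycles with it, since one instruction completes per cycle once the pipeline is full.

```python
def sequential_cycles(n_instructions: int, n_phases: int) -> int:
    """Cycles when each instruction runs all of its phases before the next one starts."""
    return n_instructions * n_phases


def pipelined_cycles(n_instructions: int, n_phases: int) -> int:
    """Cycles in an ideal pipeline: n_phases cycles to fill it, then one instruction
    completes every cycle (no stalls, branches or memory delays are modeled)."""
    if n_instructions == 0:
        return 0
    return n_phases + n_instructions - 1


def speedup(n_instructions: int, n_phases: int) -> float:
    """Ratio of non-pipelined to pipelined execution time."""
    return sequential_cycles(n_instructions, n_phases) / pipelined_cycles(n_instructions, n_phases)


if __name__ == "__main__":
    print(sequential_cycles(100, 5), pipelined_cycles(100, 5))  # 500 104
    print(round(speedup(100, 5), 2))                             # 4.81
    print(round(speedup(1_000_000, 5), 4))                       # approaches 5
```

As the number of instructions grows, the speedup approaches the number of phases, which is the steady-state behavior described above.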
Undoubtedly, more parallelism can be obtained by having more than one processor in the computer. In a multiprocessor computer [8], the different processors cooperate to execute the instructions of a program. The program is divided into different parts, each of which can be executed independently. The partial results from the different processors are exchanged, and the overall result of the program is obtained from them by a master processor or by a control unit. In a multiprocessor, all the processors access the same memory, which is usually divided into interleaved modules for greater efficiency [74]. An array computer [51] is similar to a multiprocessor computer except that the processors are replaced by Arithmetic and Logic Units (ALUs), which work synchronously under the supervision of a common control unit. Moreover, in the array computer, each ALU is provided with a local memory to make up a Processing Element (PE).
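The text does not specify how memory is interleaved; one common scheme is low-order interleaving, sketched below as an assumption. Consecutive addresses are placed in consecutive modules, so a processor reading a block of consecutive words touches every module in turn, and several modules can be busy at once. The function `interleave` is hypothetical.

```python
from typing import Tuple


def interleave(address: int, n_modules: int) -> Tuple[int, int]:
    """Low-order interleaving: return (module, word offset inside that module)."""
    return address % n_modules, address // n_modules


if __name__ == "__main__":
    # Eight consecutive words spread over four modules.
    print([interleave(a, 4) for a in range(8)])
    # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```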
It was once possible to classify a parallel computer as using one parallelism technique or another. Now, however, more than one technique may be used in the same computer, making the categorization of a particular computer difficult. For example, some of the techniques used for parallelism in uniprocessors, such as pipelining, can be used for the individual processors in a multiprocessor computer, giving rise to greater execution speeds. Moreover, some software parallelism techniques can also be used with that computer, giving rise to even greater speeds.

1.2 Multiprocessors

Using many processors in the same system can yield far more speed than using one processor [36]. Recent advances in VLSI technology, coupled with the need for fast computers, have made large-scale multiprocessor systems economically feasible. In such systems, hundreds or even thousands of processors carry out the computations of a program concurrently, thereby speeding up its execution. Many applications can benefit from this enormous computing power. Typical applications include simulation programs, such as weather forecasting, and real-time programs, such as radar tracking.
The basic architecture of a multiprocessor system is shown in Figure 1.1. In this configuration, the N processors carry out computations on data stored in the M memory modules. For the processors to interact with the memory, there must be a communications mechanism that enables any processor to access any memory module in the shortest possible time. This mechanism is of extreme importance, as the efficiency of the system depends mainly on its ability to establish the required paths between its two sides. Many such mechanisms have been proposed and explored in the literature.
At one extreme is the shared bus [46]. This is similar to the bus of a uniprocessor computer, with a control unit that limits access to the bus to one processor at a time. A multiprocessor system using a shared bus as its communications mechanism is shown in Figure 1.2. A processor requiring access to memory puts the address of the memory location it wants to access on the bus. The address is decoded and used to enable the memory module that holds the target location. As simple and inexpensive as this mechanism is, it results in extremely poor performance when N is large [58], because only one processor can use the bus at a time. As a consequence, this mechanism has been ruled out for large-scale multiprocessor systems, those with possibly thousands of processors. However, if the number of processors is small enough, e.g. N = 2, the bus can be used as a simple, inexpensive communications mechanism. (This is, in fact, the principle behind the Simple Fault Tolerance (SFT) technique presented in this thesis.) The possibility of having more than one bus in the system has been explored [14,60] for relatively large values of N. However, the work done in this direction indicates [29,50,83] that performance still degrades as the ratio N/B increases, where B is the number of buses.

Figure 1.1: Basic multiprocessor architecture (N processors, numbered 0 to N - 1, connected through a communications mechanism to M memory modules, numbered 0 to M - 1)

Figure 1.2: Multiprocessor system with a shared bus (N processors, 0 to N - 1, and M memory modules, 0 to M - 1, attached to a single shared bus)
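Why performance depends on N/B can be seen from a deliberately simple model, given here as an illustration only. It assumes that every processor requests a bus in every cycle, that each bus carries one transfer per cycle, and that buses are granted in round-robin order; memory-module conflicts and arbitration overhead are ignored. The function names are hypothetical, and the model reproduces only the N/B trend, not the figures of the cited studies.

```python
from typing import List


def bus_share_bound(n_processors: int, n_buses: int) -> float:
    """Largest fraction of cycles in which one processor can obtain a bus when all
    processors request every cycle and each bus carries one transfer per cycle."""
    return min(1.0, n_buses / n_processors)


def round_robin_shares(n_processors: int, n_buses: int, cycles: int) -> List[float]:
    """Grant the buses in round-robin order and report each processor's share of cycles."""
    grants = [0] * n_processors
    next_proc = 0
    for _ in range(cycles):
        for _ in range(min(n_buses, n_processors)):  # at most one grant per bus per cycle
            grants[next_proc] += 1
            next_proc = (next_proc + 1) % n_processors
    return [g / cycles for g in grants]


if __name__ == "__main__":
    print(bus_share_bound(16, 4), round_robin_shares(16, 4, 16)[0])  # 0.25 0.25
    print(bus_share_bound(32, 4), round_robin_shares(32, 4, 32)[0])  # 0.125 0.125
```

With four buses, doubling the number of processors from 16 to 32 halves the share of cycles each processor gets, which is the degradation with growing N/B noted above.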
At the other extreme is the crossbar switch, such as the one shown in Figure 1.3. This is called an N x M switch because it has N inputs and M outputs. The crossbar switch can be thought of as two rows of conducting bars placed on top of each other without direct contact. Figure 1.3a shows this conceptual construction. There are N horizontal bars and M vertical bars. To establish a connection between, say, horizontal bar 0 and vertical bar 1, one only has to connect the two bars at the point where they intersect (symbolized by the small circle in the figure). Now if a signal is put at input 0, the same signal will appear at output 1. That is, a path has been created between input 0 and output 1. To remove this path, one only has to disconnect the two bars again at the point where they intersect. This connection and disconnection is implemented by a control unit attached to the switch. Moreover, the actual switch is an electronic circuit, usually an IC, in which there are no physical bars. The crossbar switch is ideal in that it can be set to connect any x inputs, x ≤ N, to any x outputs, x ≤ M, simultaneously and in a one-to-one fashion. Its drawback, however, is that it becomes prohibitively expensive [15] for large values of N or M. Figure 1.3b shows the symbolic representation of the crossbar switch, which is used throughout the thesis to indicate crossbar switches.

Figure 1.3: N x M crossbar switch: a) internal structure (inputs 0 to N - 1, outputs 0 to M - 1); b) representation
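The bar-and-crosspoint description of the crossbar maps naturally onto a boolean matrix. The sketch below is an illustrative model, not a description of any particular circuit: the class `Crossbar` and its methods are hypothetical. Closing crosspoint (i, j) corresponds to connecting horizontal bar i to vertical bar j, and a connection is refused when it would break the one-to-one property. The crosspoint count N·M is what makes large crossbars so expensive.

```python
from typing import List, Optional


class Crossbar:
    """N x M crossbar modeled as a grid of crosspoints; True means the horizontal
    bar of an input is connected to the vertical bar of an output."""

    def __init__(self, n_inputs: int, m_outputs: int) -> None:
        self.n, self.m = n_inputs, m_outputs
        self.closed: List[List[bool]] = [[False] * m_outputs for _ in range(n_inputs)]

    def connect(self, i: int, j: int) -> None:
        # Keep connections one-to-one: input i and output j must both be idle.
        if any(self.closed[i]) or any(row[j] for row in self.closed):
            raise ValueError(f"input {i} or output {j} is already in use")
        self.closed[i][j] = True

    def disconnect(self, i: int, j: int) -> None:
        self.closed[i][j] = False

    def output_of(self, i: int) -> Optional[int]:
        """Output at which a signal applied to input i appears, or None."""
        for j, is_closed in enumerate(self.closed[i]):
            if is_closed:
                return j
        return None

    def crosspoints(self) -> int:
        """Number of crosspoints, which grows as N * M."""
        return self.n * self.m


if __name__ == "__main__":
    xb = Crossbar(4, 4)
    xb.connect(0, 1)                            # path from input 0 to output 1
    print(xb.output_of(0))                      # 1
    print(Crossbar(1024, 1024).crosspoints())   # 1048576
```

A 1024 x 1024 crossbar already needs over a million crosspoints, which illustrates why the crossbar is impractical for large N and M.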
Between the two extremes are multistage interconnection networks (MINs). These networks are built from small crossbar switches arranged in stages, with each stage connected to the next through a set of links. The inputs to the network are called inlets and the outputs of the network are called outlets; the words input and output will be used only for the individual switches making up the network. A path can be established between an idle inlet and an idle outlet by setting the individual switches. A 4 x 4 MIN is shown in Figure 1.4. There are three stages, each containing one switch, and the stages are connected by interstage links. The network has 4 inlets and 4 outlets, and it can be seen how paths can be established between inlets and outlets just by making appropriate connections of the inputs and outputs of the individual switches. It should be noted that the figure does not represent a practical MIN: each of its 4 x 4 switches can by itself realize every permutation of the inlets onto the outlets, so the additional stages are unnecessary. The figure is intended only to show how stages of switches may be linked to form a MIN. Figure 1.5 shows how a switch can be set to realize a given mapping; the switch shown is the one in the first stage of the network of Figure 1.4.

Figure 1.4: Basic configuration of interconnection networks

1.3 Interconnection Networks and the Need for Fault Tolerance

MINs were originally developed for use in telephone systems, long before the idea of multiprocessor computer systems started to materialize. In telephony, MINs are used in switching offices to link different callers by connecting their respective lines together. In multiprocessors, they are used for communications between processors and memory modules.

Figure 1.5: 4 x 4 switch set to realize an arbitrary mapping: a) internal structure (inputs 0 to 3, outputs 0 to 3; crosspoints marked as contact or no contact); b) representation

Setting (or routing) a MIN is the process of setting the individual switches so that the network realizes a mapping from the inlets into the outlets. The network shown in Figure 1.4 is set to realize the mapping:
0 → 2
1 → 3
2 → 0
3 → 1
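The overall mapping of a MIN is the composition of what each switch and each set of interstage links does to the lines. The sketch below represents every switch setting and every link pattern as a list, where entry x gives the line to which a signal arriving on line x is sent, and composes them in order. Figure 1.4 is not reproduced here, so the settings below are hypothetical, chosen only so that the result is the mapping 0 → 2, 1 → 3, 2 → 0, 3 → 1 given for that figure; the function `realize` is likewise hypothetical.

```python
from typing import List, Sequence


def realize(stages: Sequence[Sequence[int]]) -> List[int]:
    """Overall inlet -> outlet mapping of a MIN whose stages are applied in order.

    stages[k][x] is the line to which stage k sends a signal arriving on line x;
    switch settings and interstage link patterns are both written this way.
    """
    n = len(stages[0])
    mapping = list(range(n))          # before any stage, inlet x is on line x
    for stage in stages:
        mapping = [stage[line] for line in mapping]
    return mapping


# Hypothetical settings for a three-switch, 4 x 4 network like that of Figure 1.4.
switch_1 = [3, 2, 1, 0]
links_1_2 = [0, 2, 1, 3]
switch_2 = [1, 3, 0, 2]
links_2_3 = [0, 1, 2, 3]
switch_3 = [0, 1, 2, 3]

print(realize([switch_1, links_1_2, switch_2, links_2_3, switch_3]))  # [2, 3, 0, 1]
```

Routing, in this view, is the inverse problem: given the desired composite mapping and the fixed link patterns, choose the switch settings.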
Throughout this thesis, MINs will be discussed in the context of their use in multiprocessors as defined in Section 1.2; no reference will be made to other computer architectures that use MINs. This is because the focus of the thesis is on MINs themselves, regardless of the operating environment or the architectural organization of the computer in which they are used. It should be noted, however, that MINs are also used in other computer architectures, such as array computers, data flow computers and vector computers. In all these architectures, more than one processing unit is connected by a MIN.
Also in this thesis, the words multiprocessor, multiprocessor computer, multiprocessor system and multiprocessor computer system will be used interchangeably. It is worth noting that the processors in a multiprocessor system are said to be tightly coupled.

MINs may be classified according to the way they receive the mapping. Typically, the processors connected to the inlets request access to the memory modules connected to the outlets by generating memory access requests. Each request must contain the number of the outlet to which the target memory module is connected. If all processors send their requests to the network at the same time, the network is said to work synchronously. A memory access cycle [69] is the time needed to establish the path plus the time it takes to read or write a memory word. Thus in a synchronous environment, processors may send requests only at fixed intervals equal to the memory access cycle. If, on the other hand, the network can handle requests submitted individually at any time by the processors, the network is said to work asynchronously. In either case, since the processors work independently, it is likely that more than one processor will seek access to the same memory module, causing a memory conflict. This problem is unrelated to the MIN operation, and it can be solved in the design of the operating system and memory management schemes of the multiprocessor system. As this thesis is concerned with MINs, only network conflicts will be considered.
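The distinction between memory conflicts and network conflicts can be made concrete. The sketch below takes one synchronous cycle of requests, written as a hypothetical dictionary {processor: memory module}, and reports the modules requested by more than one processor. These are memory conflicts, which the operating system or memory manager must resolve outside the MIN; once a single requester per module remains, the surviving requests form a one-to-one mapping, and that is all the network itself has to realize. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def memory_conflicts(requests: Dict[int, int]) -> Dict[int, List[int]]:
    """Given one synchronous cycle of requests {processor: memory module},
    return each module requested by two or more processors, with those processors."""
    by_module: Dict[int, List[int]] = defaultdict(list)
    for proc, module in sorted(requests.items()):
        by_module[module].append(proc)
    return {module: procs for module, procs in by_module.items() if len(procs) > 1}


if __name__ == "__main__":
    cycle = {0: 3, 1: 5, 2: 3, 3: 0}
    print(memory_conflicts(cycle))  # {3: [0, 2]}
```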
It is worth mentioning that the ability to realize arbitrary permutations is the ultimate figure of merit for the connectivity of any MIN. That is because if a MIN can realize every permutation, it can realize any one-to-one mapping: a mapping that involves only some of the inlets can always be completed to a full permutation, and the extra paths are simply not used. For this reason, MINs are normally designed and studied as permutation networks. All MINs mentioned in this thesis are permutation networks.
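The completion step behind this argument is easy to carry out. The sketch below pairs the unused inlets with the unused outlets in increasing order; any pairing would do, and this particular choice, like the function name, is an illustrative assumption. A network that can realize the resulting permutation also realizes the original partial mapping.

```python
from typing import Dict, List


def extend_to_permutation(partial: Dict[int, int], n: int) -> List[int]:
    """Complete a one-to-one mapping of some inlets onto outlets into a full
    permutation of 0..n-1; unused inlets are paired with the unused outlets."""
    outlets = list(partial.values())
    if len(set(outlets)) != len(outlets):
        raise ValueError("mapping is not one-to-one")
    if any(not 0 <= x < n for x in list(partial) + outlets):
        raise ValueError("inlet or outlet out of range")
    free = iter(sorted(set(range(n)) - set(outlets)))
    return [partial[i] if i in partial else next(free) for i in range(n)]


if __name__ == "__main__":
    # Only inlets 0 and 3 are active: 0 -> 2 and 3 -> 0.
    print(extend_to_permutation({0: 2, 3: 0}, 4))  # [2, 1, 3, 0]
```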
MINs may be classified according to the way they are routed [34]. Some MINs have a central routing unit. This unit receives the mapping and runs an algorithm to find how the individual switches should be set so that the mapping can be realized. The settings are then sent to the individual switches for implementation by local control units. This type of MIN is called centrally routed. By contrast, there are self-routed, or distributed-routed, MINs. In a self-routed MIN, a routing tag is placed on the inlet to establish the required path to the outlet. One or more bits in the routing tag are used to set the switch in the first stage, allowing the remaining bits to advance one stage towards the destination. Again, one or more bits are used to set the switch in the second stage, allowing the remaining bits to travel one more stage towards the destination. This process continues until a path is created between the inlet and the target outlet. This thesis covers networks of both routing types, central and distributed.
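The self-routing procedure described above can be illustrated with a minimal sketch. It assumes an Omega-style network of size N = 2^n built from 2 × 2 switches, with a perfect shuffle of the lines in front of every stage, and the common destination-tag convention in which stage j consumes bit n − 1 − j of the destination (most significant bit first): a 0 selects the upper switch output and a 1 the lower. These conventions and the function names `shuffle` and `route` are assumptions made for illustration, not the routing rule of any particular network in this thesis.

```python
def shuffle(pos: int, n_bits: int) -> int:
    """Perfect shuffle: rotate an n_bits-bit line address left by one bit."""
    mask = (1 << n_bits) - 1
    return ((pos << 1) | (pos >> (n_bits - 1))) & mask


def route(source: int, dest: int, n_bits: int) -> list[int]:
    """Follow a destination-tag path; return the switch output used at each stage."""
    pos = source
    path = []
    for j in range(n_bits):
        pos = shuffle(pos, n_bits)                # fixed links in front of stage j
        bit = (dest >> (n_bits - 1 - j)) & 1      # tag bit consumed by stage j (MSB first)
        pos = (pos & ~1) | bit                    # 0 -> upper output, 1 -> lower output
        path.append(pos)
    return path


# Hypothetical 8 x 8 example (n_bits = 3): inlet 5 routed to outlet 2.
print(route(5, 2, 3))   # [2, 5, 2]; the last entry is always the destination
```

Each stage shifts the previously consumed tag bits one position up and writes the new bit into the least significant position, so after n stages the line address equals the destination, whatever the source.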
MINs may be classified according to whether every possible mapping of the inlets into the outlets can be realized. If this is the case, the MIN is called non-blocking. An example of a non-blocking MIN is the Clos network [28]. If, on the other hand, one or more paths in some mapping cannot be realized, the network is called blocking. An example of a blocking MIN is the Omega network [51]. Typically, self-routed MINs are blocking, and centrally routed MINs are not. This thesis covers MINs of both types.
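Blocking can be made concrete with a small sketch. Assuming, for illustration only, an 8 × 8 Omega-style network with perfect-shuffle links and destination-tag routing (a unique-path network, so a mapping is realizable exactly when its paths never compete for the same switch output), the hypothetical helper `conflict_free` traces all paths of a permutation stage by stage and reports whether any two of them collide. Exhaustively checking all 8! permutations then shows how small the realizable fraction is.

```python
from itertools import permutations
from math import factorial


def shuffle(pos, n_bits):
    """Perfect shuffle: rotate an n_bits-bit line address left by one bit."""
    mask = (1 << n_bits) - 1
    return ((pos << 1) | (pos >> (n_bits - 1))) & mask


def conflict_free(perm, n_bits):
    """True if every path source -> perm[source] can be set up at the same time."""
    lines = list(range(len(perm)))                # current line of each path
    for j in range(n_bits):
        for source, dest in enumerate(perm):
            bit = (dest >> (n_bits - 1 - j)) & 1  # destination-tag bit for stage j
            lines[source] = (shuffle(lines[source], n_bits) & ~1) | bit
        if len(set(lines)) != len(lines):         # two paths want one switch output
            return False
    return True


N_BITS, SIZE = 3, 8                               # hypothetical 8 x 8 network
blocked_example = (0, 4, 2, 3, 1, 5, 6, 7)        # 0 -> 0 and 4 -> 1 collide in stage 0
print(conflict_free(tuple(range(SIZE)), N_BITS))  # True (identity mapping)
print(conflict_free(blocked_example, N_BITS))     # False
realizable = sum(conflict_free(p, N_BITS) for p in permutations(range(SIZE)))
print(realizable, "of", factorial(SIZE))          # 4096 of 40320
```

The count 4096 = 2^12 equals the number of settings of the twelve 2 × 2 switches: in a unique-path network every switch setting yields a different permutation, so most of the 40320 permutations are blocked.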
Blocking MINs may be classified according to the way partially completed paths are treated [47]. Suppose a path gets blocked in a switch at a given stage between the inlet and the outlet. The blocking switch, that is, the switch where the path was blocked, may send a signal to the preceding switches on the partially completed path to dismantle the path. The MIN in this case is called unbuffered. If, on the other hand, the partially completed path is kept, and the blocking switch stores the remaining routing bits in a queue for possible completion in the next memory cycle, then the MIN is called buffered [31]. This thesis deals only with unbuffered MINs.
Self-routed MINs may be classified as either circuit-switched or packet-switched. In circuit-switched MINs, the path from the processor to the memory module is established before any data is sent by the processor. Moreover, the path is kept for the entire duration of the memory cycle. Any “bare bones” MIN is circuit-switched; making it packet-switched requires additional hardware. In packet-switched MINs [32], the processor sends a self-contained message (a packet), carrying its own address and the address of the destination, to the switch connected to it in the first stage. The switch examines the destination address of the packet and forwards it to the next switch along the way. Once a switch has forwarded a packet to the next switch, it can receive another packet and forward it in turn. This operation mode allows greater throughput, but at the expense of a more complicated switch design. Packet-switched MINs are usually buffered for greater performance. The MINs studied in this thesis are all of the circuit-switched type.
Finally, MINs may be classified according to their fault tolerance capabilities, which is the topic of this thesis. Fault tolerance has become an important issue in designing MINs for multiprocessor systems, because these systems are likely to run important tasks where interruption could have damaging effects. Since the network is at the center of such systems, a failure in the network can seriously degrade the performance of the system. This is particularly true in networks where the path between each inlet/outlet pair is unique, such as the shuffle networks [87]. In such networks, if a switch needed to establish a path is faulty, that path cannot be established until the fault is physically removed. Given the sequential and dependent nature of computer programs, a single memory word can be vital to the completion of a program: if that word cannot be accessed, the program cannot be completed. In a multiprocessor system, therefore, it is necessary to maintain the ability to establish communication paths between processors and memory modules at all times. This requires that the interconnection network be fault-tolerant.
One cannot determine whether a network is fault-tolerant unless a fault tolerance criterion is defined; a MIN can be fault-tolerant according to one criterion, but not according to another. Furthermore, a network may tolerate a given number of faults of a specific type, but not a different number of faults, or even the same number of faults when they are of a different type. Therefore, a fault-tolerant MIN design should specify a fault tolerance criterion, and the number and type of faults to be tolerated. These three items make up the fault tolerance model.
In this thesis, a novel technique is described to provide fault tolerance to virtually any MIN. The technique is demonstrated on one type of MIN, the Baseline network, where it can be used most efficiently. For the Clos network, where the technique is less successful, another technique is suggested. Together, the two techniques should offer a comprehensive solution to the fault tolerance problem of an important class of MINs.

1.4 Outline

The rest of this thesis is organized as follows. Chapter 2 presents some mathematical background needed to understand the work that is either mentioned or developed in this thesis. In particular, a generalized model for multistage interconnection networks is developed for later use in the thesis. Furthermore, the basics of fault tolerance, combinatorics and reliability are presented; these basics are used quite extensively in this thesis. Chapter 3 shows some popular implementations of MINs. The construction, operation and routing of each MIN are discussed. Chapter 4 is a survey of some of the work that has already been done on fault tolerance for the class of MINs considered in this thesis. The advantages and shortcomings of these techniques are highlighted. In Chapter 5, the Simple Fault Tolerance (SFT) technique is introduced through implementation on a Baseline network. The construction, operation, and routing of the Simple Fault-Tolerant Baseline (SFTB) network under both normal and faulty conditions are described in detail. The advantages and disadvantages of the SFT technique are discussed, and it is shown why this technique is not suitable for networks with large switches such as Clos networks. Chapter 6 gives an alternative technique that is suitable for Clos networks. The last chapter contains some concluding remarks on the work done in the thesis.

Chapter 2
Basic Concepts and Notation
In this chapter, the basic concepts needed to understand the work in this thesis
are introduced. These concepts include interconnection network modelling, fault
tolerance principles, combinatorics, and reliability theory. They all are used either
to explain work already done by others or to derive new results.

2.1 Interconnection Networks

Before discussing the characteristics of MINs, some definitions will be introduced. The definitions are adapted mainly from [37].
Definition 2.1 A function F from a set A into a set B is one-to-one if each element in B has at most one element in A mapped into it. In other words, F is one-to-one if $a_1F = a_2F$ implies $a_1 = a_2$. Further, a function F is onto B if every element in B has at least one element in A mapped into it. In other words, F is onto B if for each $b \in B$ there exists $a \in A$ such that $aF = b$. It should be noted that a function on A is, by definition, defined for each $a \in A$.
Definition 2.2 A permutation of a set A is a one-to-one function F mapping A onto itself. One may write this as

If $aF = a$ for all $a \in A$, F is called the identity permutation.
To illustrate, consider the set A = {0, 1, 2, 3}. Suppose that a function F is given by the mapping:
0 → 2
1 → 3
2 → 1
3 → 0
It is obvious that F is one-to-one and onto and, thus, defines a permutation. The permutation F can be written in a more standard notation as
$$F = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 2 & 3 & 1 & 0 \end{pmatrix}$$
Definition 2.3 Let $\varphi$ and $\psi$ be two functions. If $\varphi$ maps element a into b, denoted as $a\varphi = b$, and $\psi$ maps element b into c, denoted as $b\psi = c$, then $a(\varphi\psi) = (a\varphi)\psi = b\psi = c$. $\varphi\psi$ is called the composition of $\varphi$ and $\psi$.
Theorem 2.1 If $F_1$ and $F_2$ are both permutations of the set A, then the composite function $F_1F_2$ is also a permutation of A.
To illustrate Theorem 2.1, whose proof can be found in [37], consider the above set A and permutation F. Also consider the permutation G, given by
$$G = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 1 & 3 & 2 & 0 \end{pmatrix}$$
Then the composite function FG is given by
$$FG = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 2 & 0 & 3 & 1 \end{pmatrix}$$
which is a permutation according to Definition 2.2. It should be noted that $FG \neq GF$.
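The composition above follows the convention of Definition 2.3, in which a function acts on the right, so $a(FG) = (aF)G$ and F is applied first. A minimal sketch of that convention, representing a permutation of {0, 1, 2, 3} as a tuple whose entry at position a is the image aF (the bottom row of the two-row notation); the tuple encoding and the name `compose` are illustrative choices, not notation from the text.

```python
F = (2, 3, 1, 0)   # bottom row of F: F[a] is the image aF
G = (1, 3, 2, 0)   # bottom row of G


def compose(p, q):
    """Composite pq under the right-action convention a(pq) = (ap)q."""
    return tuple(q[p[a]] for a in range(len(p)))


def is_permutation(p):
    """One-to-one and onto {0, ..., len(p) - 1}."""
    return sorted(p) == list(range(len(p)))


FG = compose(F, G)
GF = compose(G, F)
print(FG)                             # (2, 0, 3, 1)
print(GF)                             # (3, 0, 1, 2)
print(is_permutation(FG), FG != GF)   # True True
```

For example, 0 is sent by F to 2 and then by G to 2, which is the first entry of FG.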
Definition 2.4 Let N be the cardinality (number of elements) of the set A. Then the set of all possible permutations of A is called the symmetric group, denoted by $\Sigma_N$. The cardinality of $\Sigma_N$ is N!.
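For a small set, both the cardinality of the symmetric group of Definition 2.4 and the closure property of Theorem 2.1 can be checked exhaustively. A minimal sketch for N = 4, using the same tuple encoding as before (entry a holds the image of a) and a hypothetical helper `compose`; an exhaustive check illustrates the theorem for one N, it does not prove it.

```python
from itertools import permutations
from math import factorial


def compose(p, q):
    """Composite pq with a(pq) = (ap)q, i.e. p is applied first."""
    return tuple(q[p[a]] for a in range(len(p)))


N = 4
sym_group = set(permutations(range(N)))     # all permutations of {0, 1, 2, 3}
print(len(sym_group), factorial(N))         # 24 24

# Theorem 2.1 for this N: every composite of two permutations is again one.
closed = all(compose(p, q) in sym_group for p in sym_group for q in sym_group)
print(closed)                               # True
```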

Figure 2.1 shows a generalized MIN. In this MIN there are N inlets, M outlets, ν stages, and $s_j$ switches in each stage j, $0 \le j \le \nu-1$. In addition, stage j, $1 \le j \le \nu-1$, derives all of its inputs from stage $j-1$; stage 0 derives all of its inputs from the inlets of the MIN; and the outlets are all derived from stage $\nu-1$. All the MINs discussed in the thesis fit in this generalized model unless otherwise stated. The MINs that do not fit in the model will appear only at the end of Chapter 3, for the sake of completeness.
Let $F_{l_j}$, $1 \le j \le \nu-1$, be the function realized by the set of links between stages $j-1$ and j, mapping the outputs of stage $j-1$ onto the inputs of stage j. Further, let $F_{s_j}$, $0 \le j \le \nu-1$, be the function realized by the set of switches in stage j, mapping the inputs of stage j onto its outputs. Finally, let $F_I$ be the function realized by the set of links connecting the inlets to the inputs of stage 0, mapping the inlets onto the inputs of stage 0, and let $F_O$ be the function realized by the set of links connecting the outputs of stage $\nu-1$ to the outlets, mapping the outputs of stage $\nu-1$ onto the outlets. Then a MIN is completely defined by the $(2\nu+3)$-tuple
$$(I, O, F_I, F_O, F_{l_1}, F_{l_2}, \ldots, F_{l_{\nu-1}}, \mathcal{F}_{s_0}, \mathcal{F}_{s_1}, \ldots, \mathcal{F}_{s_{\nu-1}})$$
where $\mathcal{F}_{s_j}$ is the set of all mappings of the inputs of stage j into its outputs that are realizable by that stage, and I and O are the sets of inlets and outlets, respectively.
In any MIN, for all j, $1 \le j \le \nu-1$, $F_{l_j}$ is a permanent permutation of the outputs of stage $j-1$ into the inputs of stage j. By contrast, for all j, $0 \le j \le \nu-1$, $F_{s_j}$ is a variable permutation that can be changed by setting the individual switches of stage j. Throughout this thesis, the word mapping will indicate a one-to-one mapping, unless otherwise specified.

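Figure 2.1: Generalized MIN (inlets 0, ..., N−1 of the set I pass through $F_I$, $\mathcal{F}_{s_0}$, $F_{l_1}$, $\mathcal{F}_{s_1}$, ..., $F_{l_{\nu-1}}$, $\mathcal{F}_{s_{\nu-1}}$ and $F_O$ to outlets 0, ..., M−1 of the set O)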

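Let $\mathcal{S}_{NM}$ be the set of mappings of the N inlets into the M outlets realizable by the MIN defined above. Then it can be seen that
$$\mathcal{S}_{NM} = F_I\,\mathcal{F}_{s_0}\,F_{l_1}\,\mathcal{F}_{s_1} \cdots F_{l_{\nu-1}}\,\mathcal{F}_{s_{\nu-1}}\,F_O, \tag{2.2}$$
where a product involving a set of mappings denotes all the composites obtained by choosing one mapping from each set.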

A MIN may or may not have N = M. All MINs in this thesis have M = N; such MINs are usually called permutation networks. An ideal permutation network is one where $\mathcal{S}_{NN} = \Sigma_N$. This is the case for non-blocking networks. Blocking networks have $\mathcal{S}_{NN} \subset \Sigma_N$.
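Equation 2.2 can be evaluated directly for a small network by composing the fixed link permutations with every choice of stage mapping. The sketch below assumes, purely for illustration, an 8 × 8 Omega-style instance of the generalized model: ν = 3 stages of four 2 × 2 switches, perfect-shuffle links for $F_I$ and every $F_{l_j}$, and the identity for $F_O$. The names `compose`, `shuffle_links` and `stage_mappings` are hypothetical.

```python
from itertools import product
from math import factorial

N_BITS = 3
N = 1 << N_BITS            # network size N = 8 (assumed)
NU = N_BITS                # nu = 3 stages of 2 x 2 switches (assumed)


def compose(p, q):
    """Right-action composite pq: a(pq) = (ap)q."""
    return tuple(q[p[a]] for a in range(len(p)))


def shuffle_links(n_bits):
    """Perfect-shuffle link pattern: rotate each address left by one bit."""
    size = 1 << n_bits
    return tuple(((a << 1) | (a >> (n_bits - 1))) & (size - 1) for a in range(size))


def stage_mappings(size):
    """All mappings realizable by one stage of size // 2 switches of size 2 x 2."""
    mappings = []
    for settings in product((False, True), repeat=size // 2):
        p = list(range(size))
        for i, exchange in enumerate(settings):
            if exchange:                      # a crossed switch swaps its two lines
                p[2 * i], p[2 * i + 1] = 2 * i + 1, 2 * i
        mappings.append(tuple(p))
    return mappings


identity = tuple(range(N))
F_I = shuffle_links(N_BITS)                   # inlets -> inputs of stage 0
F_l = [shuffle_links(N_BITS)] * (NU - 1)      # F_{l_1}, ..., F_{l_{nu-1}}
F_O = identity                                # outputs of last stage -> outlets
F_s = stage_mappings(N)                       # same mapping set for every stage

realizable = set()
for settings in product(F_s, repeat=NU):      # one mapping from each stage set
    m = F_I
    for j, s in enumerate(settings):
        m = compose(m, s)
        if j < NU - 1:
            m = compose(m, F_l[j])
    realizable.add(compose(m, F_O))

print(len(realizable), "of", factorial(N))    # 4096 of 40320
```

Only 4096 of the 40320 permutations are realizable, so this assumed network is blocking: its realizable set is a proper subset of the symmetric group.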

2.2 Fault Tolerance for MINs

Before designing a fault-tolerant MIN, a fault tolerance model [4,49] must be defined. The fault tolerance model contains three elements: the fault model, the fault tolerance criterion, and the fault tolerance size.
The fault model is the set of fault types that can occur in the network. Implicitly, the fault model specifies the types of faults that can be recovered from using the proposed fault tolerance design. Different designs specify different fault models. A good design, however, is one whose fault model includes as many fault types as possible. To illustrate, a typical fault model is as follows.
1. Any network component can fail: MINs are made up of two types of components, switches and links. This assumption states that both switches and links may fail, and that the proposed design is capable of recovering from any such fault. A link fails if it is open or short-circuited. A switch fails due to some internal malfunction. More details on switch failures can be found in Chapter 5.
2. The extra hardware added to provide fault tolerance to the network cannot fail: This assumption is usually made for two reasons. First, if the extra hardware added to make the network fault-tolerant could itself be assumed to fail, then it would not be possible to propose any fault tolerance design. Second, this assumption can be justified for MINs because these extra components usually remain idle under normal conditions; thus they can be expected to have a longer lifetime than the actively working components of the network.
The fault tolerance criterion is the condition that must be met in order for the system to be called fault-tolerant. The fault tolerance criterion for the networks surveyed in this thesis is mainly full-access retention. That is, after a fault occurs, each processor must still be able to communicate with any memory module. However, the two fault-tolerant designs introduced in this thesis can offer a higher criterion: full recovery. Full recovery is the ability of the network to regain its pre-fault connectivity after a fault occurs.
The fault tolerance size is the number of faults that the system can recover from, that is, the number of faults that the system can have and still meet the imposed fault tolerance criterion. All the fault-tolerant MINs in this thesis, whether surveyed or developed, are single-fault-tolerant. This means that the fault tolerance criterion is guaranteed to be met only when there is no more than one fault anywhere in the network. If a fault-tolerant network can tolerate i specific faults, i > 1, but not any arbitrary i faults, it is called i-robust.

2.3 Combinatorics

This section discusses the counting techniques necessary to evaluate the probability
of a given event. One is often faced with the question “In how many ways can this
event occur?” The answer starts with the Law of Composition which states:

If event $A_1$ can happen in $n_1$ ways, event $A_2$ can happen in $n_2$ ways, ..., and event $A_m$ can happen in $n_m$ ways, then the compound event ($A_1$ AND $A_2$ AND ... AND $A_m$) can happen in $\prod_{i=1}^{m} n_i$ ways.
Example 2.1 How many 3-digit numbers can be written in the three slots below, given that each slot can be filled with a decimal digit $i \in \{0, 1, \ldots, 9\}$?

The solution is obtained by applying the Law of Composition. Label the slots from left to right as 1, 2, and 3. Slot 1 can be filled with any one of the ten digits, and so can slots 2 and 3. That is, there are ten ways to fill in each slot, for a total of 10 × 10 × 10 = 1000 ways. This answer is correct because in three positions one can write any number from 000 to 999, which are 1000 numbers in total.
Now there are two additional counting problems th a t are often encountered:
perm utations and combinations.

2.3.1 Permutations (arrangements)

Here the number of ways k elements can be arranged is required. It should be noted that order matters in permutations, i.e., AB ≠ BA.
Example 2.2 How many 3-letter words can be formed from the 5 letters A, B, C, D, and E, without repeating any letters?
This example is again solved by drawing three slots, as shown below.

Label the slots as in Example 2.1. In slot 1, one can put any one of the five letters, leaving four letters for the remaining two slots. For each choice made for slot 1, slot 2 can be filled with one of four letters, leaving three letters to be used with slot 3. Thus, the number of ways one can fill in the three slots, thereby creating 3-letter words, is 5 × 4 × 3 = 60. The number 60 is the number of permutations of 5 things taken 3 at a time. In general, the number of permutations of n things taken k at a time, denoted as P(n, k), is
$$P(n,k) = n \cdot (n-1) \cdots (n-k+1) = \frac{n!}{(n-k)!} \tag{2.3}$$
In fact, this is the product of the first k factors of n!.
In the above example, repetition was not allowed. (Implicitly, it was assumed that one copy of each letter was available.) In other words, words like AAA, AAB, etc. were not included in the number given. However, if repetition is allowed, the equation for P(n, k) becomes
$$P(n,k) = n^k. \tag{2.4}$$
Thus, in the example above, if repetition were allowed, the total number of 3-letter words would be $5^3 = 125$, which can be verified by the slots method. It should be noted that in Example 2.1 repetition was allowed. In general, one can easily tell from the context of the problem whether repetition is allowed.
Of interest is the number of permutations of n things taken n at a time, where the n things are not all distinct. Suppose that of the n things, $n_1$ are identical to one another, $n_2$ are identical to one another, and so on, such that $n_1 + n_2 + \cdots + n_u = n$. Then the number of distinguishable permutations is
$$P(n,n) = \frac{n!}{n_1!\,n_2!\cdots n_u!} \tag{2.5}$$
This formula is very useful in dealing with binary numbers, as illustrated by Example 2.3.
Example 2.3 How many 8-bit binary numbers can be formed from three 0’s and five 1’s?
Applying Equation 2.5, one finds that the number of such 8-bit numbers is
$$P(8,8) = \frac{8!}{3!\,5!} = 56.$$
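Examples 2.1 to 2.3 can be cross-checked by brute-force enumeration with Python's standard `itertools` module, comparing each count with the closed-form expression (Equations 2.3 to 2.5). This is a sanity check for small cases only; enumeration grows factorially and is no substitute for the formulas.

```python
from itertools import permutations, product
from math import factorial, perm

# Example 2.1: three slots, each filled with any decimal digit (repetition allowed).
numbers = sum(1 for _ in product(range(10), repeat=3))
print(numbers)                                   # 1000

# Example 2.2: 3-letter words from A..E without repetition, Equation 2.3.
words = set(permutations("ABCDE", 3))
print(len(words), perm(5, 3))                    # 60 60

# Equation 2.4: with repetition allowed.
words_rep = sum(1 for _ in product("ABCDE", repeat=3))
print(words_rep)                                 # 125

# Example 2.3: distinguishable arrangements of three 0's and five 1's, Equation 2.5.
arrangements = set(permutations("00011111"))
print(len(arrangements), factorial(8) // (factorial(3) * factorial(5)))   # 56 56
```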

2.3.2 Combinations (selections)

Here the number of ways k elements can be selected is required. It should be noted that order does not matter in combinations, i.e., AB = BA. This would be the situation, for example, in forming a team from a pool of players. Suppose that it is required to form a team of three players from a pool of 10 players, and that a selection was made in which John was chosen first, Jack second, and Bill last. One cannot say that another team can be formed by choosing Bill first, John second and Jack last; it is the same team. This analogy should serve as a reminder that the approach needed to tackle a combinatorial problem depends on the problem itself.
Since order does not matter in combinations, the formula for the combinations of n things taken k at a time, C(n, k), is expected to be the same as that of the permutations of n things taken k at a time, P(n, k), after eliminating the “permutations” within each selection. Since each k-item selection can be arranged in k! ways, the formula for the combinations is obtained by dividing the permutations formula by k! to get
$$C(n,k) = \frac{n!}{k!\,(n-k)!} \tag{2.6}$$
In fact, this is the product of the first k factors of n! divided by k!. Sometimes C(n, k) is written as $\binom{n}{k}$; the latter notation is the one adopted in the thesis for indicating combinations.
Example 2.4 How many 11-player teams can be formed out of 20 players?
The number of 11-player teams is
$$\binom{20}{11} = 167960.$$
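The relation between Equations 2.3 and 2.6, namely that each unordered selection of k items corresponds to k! ordered ones, can be checked by enumeration on the team example. In the sketch below the seven additional player names are hypothetical placeholders; `math.comb` and `math.perm` are the standard-library counting functions (Python 3.8 or later).

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

# Pool of 10 players; the last seven names are hypothetical placeholders.
pool = ["John", "Jack", "Bill"] + [f"player{i}" for i in range(7)]

teams = list(combinations(pool, 3))          # unordered selections
ordered = list(permutations(pool, 3))        # ordered selections
print(len(teams), len(ordered) // factorial(3), comb(10, 3))   # 120 120 120

# Example 2.4: C(20, 11) = P(20, 11) / 11!
print(comb(20, 11), perm(20, 11) // factorial(11))             # 167960 167960
```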

2.4 Fundamentals of Reliability

The reliability of a system is defined [68,82] as the probability that the system will perform a required function under stated conditions for a stated period of time t. For a system with a constant failure rate, the reliability, R, is
$$R = e^{-\lambda t}, \tag{2.7}$$
where λ is the constant failure rate (failures per unit time). To simplify the analysis in this thesis, the time factor will be only implicit. In other words, when it is said that the reliability of a switch is r, it will mean the reliability of the switch over a given period of time t. This is done because the focus will be on comparing reliabilities, rather than obtaining absolute reliability values. In comparing two networks, for instance, the two networks should be under the same circumstances, including the period of time t; hence the omission of the time factor.
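Equation 2.7 can be evaluated in one line. The failure rate and mission time below are hypothetical numbers chosen only to show the calculation; as noted above, the thesis compares networks over the same period t rather than computing absolute values.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """R = exp(-lambda * t) for a component with a constant failure rate."""
    return math.exp(-failure_rate * t)


# Hypothetical values: lambda = 1e-4 failures per hour, t = 1000 hours.
print(round(reliability(1e-4, 1000), 4))   # 0.9048
```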
Predicting reliabilities usually involves dealing with probabilities. It is therefore appropriate to give an overview of probability theory before discussing the fundamentals of reliability.

2.4.1 Probability of a simple event

In an experiment with n equally likely outcomes, the probability that any particular outcome will occur is 1/n.
Example 2.5 What is the probability of obtaining 6 on a fair die?
Since the die is fair, any one of its 6 sides is equally likely to appear when the die is thrown. The six outcomes are {1, 2, 3, 4, 5, 6}. Thus the probability of a 6 appearing, denoted as Pr(6), is
Pr(6) = 1/6.

2.4.2 Probability of a compound event

A compound event is a composition of simple events using two rules: AND and OR. For example, the event that one gets either a 6 OR a 4 on a single throw of a die is a compound event made up of the simple events 6 and 4 and the rule “OR”. Similarly, the event of getting 4 AND 6 on two different dice is a compound event made up of the simple events 4 and 6 and the rule “AND”.
Let $A_1$ and $A_2$ be two simple events. Then the two rules are defined as follows.
1. The probability that $A_1$ OR $A_2$ will occur, denoted as $\Pr(A_1 \cup A_2)$, is
$$\Pr(A_1 \cup A_2) = \Pr(A_1) + \Pr(A_2) - \Pr(A_1 \cap A_2) \tag{2.8}$$
If the two events are mutually exclusive, the last term vanishes and the probability of the compound event becomes
$$\Pr(A_1 \cup A_2) = \Pr(A_1) + \Pr(A_2) \tag{2.9}$$
In general, for any k mutually exclusive events, $A_1, A_2, \ldots, A_k$, the probability of the compound event $A_1$ OR $A_2$ OR ... OR $A_k$ is
$$\Pr(A_1 \cup A_2 \cup \cdots \cup A_k) = \sum_{i=1}^{k} \Pr(A_i). \tag{2.10}$$
The OR rule is usually used when words like “at least” or “either” are mentioned.
2. The probability that $A_1$ AND $A_2$ will occur, denoted as $\Pr(A_1 \cap A_2)$, is
$$\Pr(A_1 \cap A_2) = \Pr(A_1) \cdot \Pr(A_2 \mid A_1), \tag{2.11}$$
where $\Pr(A_2 \mid A_1)$ indicates the probability that $A_2$ occurs given that $A_1$ has occurred. If the two events are independent, the last factor becomes $\Pr(A_2)$ and the probability of the compound event becomes
$$\Pr(A_1 \cap A_2) = \Pr(A_1) \cdot \Pr(A_2) \tag{2.12}$$
In general, for any k independent events, $A_1, A_2, \ldots, A_k$, the probability of the compound event $A_1$ AND $A_2$ AND ... AND $A_k$ is
$$\Pr(A_1 \cap A_2 \cap \cdots \cap A_k) = \prod_{i=1}^{k} \Pr(A_i) \tag{2.13}$$
The AND rule is usually used when words like “all” or “both” are mentioned.
Simple as they are, these two rules, OR and AND, can be used to solve complex
probability problems by using them systematically.
Example 2.6 What is the probability of getting 2, 4, or 6 when throwing a die?
This compound event can be expressed as a composition of the simple events it contains. Let getting a 2 on the die be denoted as $A_1$, and similarly let the two events of getting 4 and 6 be denoted as $A_2$ and $A_3$, respectively. Then what is required is the probability of $A_1$ OR $A_2$ OR $A_3$. Since Pr(2) = Pr(4) = Pr(6) = 1/6, and since $A_1$, $A_2$ and $A_3$ are mutually exclusive, then, by Equation 2.10, the probability of getting 2, 4 or 6 is
$$\Pr(A_1 \cup A_2 \cup A_3) = 1/6 + 1/6 + 1/6 = 1/2.$$
Example 2.7 What is the probability of getting a 4 or an even number when throwing a die?
This is again a compound event involving the OR rule. Let the event of getting a 4 be denoted as $A_1$ and the event of getting an even number be denoted as $A_2$. Clearly, the two events are not mutually exclusive, and therefore Equation 2.8 must be applied.
From Examples 2.5 and 2.6, $\Pr(A_1) = 1/6$ and $\Pr(A_2) = 1/2$. The last term in the equation denotes the intersection of the two events. Clearly, the two events intersect in (have in common) the number 4 (which is again $A_1$), whose probability is 1/6. Substituting in Equation 2.8, the probability of getting a 4 or an even number is
$$\Pr(A_1 \cup A_2) = 1/6 + 1/2 - 1/6 = 1/2.$$
Example 2.8 What is the probability of getting at least 2 on a die?
Here one needs to evaluate Pr(2 OR 3 OR 4 OR 5 OR 6). These events are all mutually exclusive, and therefore the solution is
$$\Pr(2 \cup 3 \cup 4 \cup 5 \cup 6) = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 5/6.$$
The same result could have been obtained by evaluating 1 − Pr(1), which is obviously equal to 5/6. In general, problems involving “at least” are often best solved by subtracting the probability of the complementary event (getting a 1, in Example 2.8) from unity.
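Examples 2.5 to 2.8 can be reproduced exactly by counting equally likely outcomes of one fair die, using Python's standard `fractions.Fraction` to avoid rounding. The helper `pr` is a hypothetical name; it takes an event as a predicate on outcomes. The sketch also checks the OR rule with overlap (Equation 2.8) and the complement shortcut of Example 2.8.

```python
from fractions import Fraction

outcomes = range(1, 7)                   # sample space of one fair die


def pr(event):
    """Probability of an event (a predicate) by counting equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


print(pr(lambda o: o == 6))                              # Example 2.5: 1/6
print(pr(lambda o: o in (2, 4, 6)))                      # Example 2.6: 1/2

four = lambda o: o == 4
even = lambda o: o % 2 == 0
lhs = pr(lambda o: four(o) or even(o))
rhs = pr(four) + pr(even) - pr(lambda o: four(o) and even(o))   # Equation 2.8
print(lhs, lhs == rhs)                                   # Example 2.7: 1/2 True

print(pr(lambda o: o >= 2), 1 - pr(lambda o: o == 1))    # Example 2.8: 5/6 5/6
```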
Example 2.9 Assuming that the probability of a child in a particular family being male is 0.53, find the probability that in a family of 5 children,
1. the 3 oldest will be boys and the 2 youngest will be girls, and
2. there are exactly three boys and two girls in the family.
Let the event that a child will be a boy be denoted by b and the event that a child will be a girl be denoted by g. Then,
Pr(g) = 1 − Pr(b) = 1 − 0.53 = 0.47.

1. Let the probability that the three oldest will be boys and the two youngest will be girls be denoted as Pr(1). Then Pr(1) can be written as (note the order of the events)
$$\Pr(1) = \Pr(b \cap b \cap b \cap g \cap g).$$
Clearly, the sex of a newborn child is independent of the sex of the previous child. Thus one can write
$$\Pr(1) = \Pr(b)\cdot\Pr(b)\cdot\Pr(b)\cdot\Pr(g)\cdot\Pr(g) = (0.53)^3(0.47)^2 = 0.033.$$
2. Let the probability that there are 3 boys and 2 girls in the family be denoted as Pr(2). Then,
$$\Pr(2) = \Pr([b \cap b \cap b \cap g \cap g] \cup [b \cap b \cap g \cap g \cap b] \cup \cdots)$$
The probabilities of the compound events inside the brackets are all equal; namely, they are equal to Pr(1) obtained in Part 1 of this example. Thus the question is: in how many ways can one arrange 5 items (children), three of which are similar (boys) and the remaining two of which are also similar (girls)? This is the permutation problem discussed in Section 2.3.1, and the answer to the question is $\frac{5!}{3!\,2!} = 10$. Although this answer is obtained from using permutations, it is the same as the number of combinations of 5 items taken 3 (or 2) at a time. Since the bracketed events are mutually exclusive,
$$\Pr(2) = \binom{5}{3}\Pr(1) = \frac{5!}{3!\,2!} \times 0.033 = 0.33.$$
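Both parts of Example 2.9 can be checked numerically. The sketch computes Part 1 with the AND rule for independent births and obtains Part 2 by enumerating all 2^5 ordered families and summing the probabilities of those with exactly three boys, which should agree with the binomial shortcut $\binom{5}{3}\Pr(1)$.

```python
from itertools import product
from math import comb

p_boy = 0.53
p_girl = 1 - p_boy

# Part 1: the specific ordered outcome b, b, b, g, g (independent births).
pr1 = p_boy**3 * p_girl**2

# Part 2: enumerate every ordered family of five and keep those with three boys.
pr2 = sum(
    p_boy**family.count("b") * p_girl**family.count("g")
    for family in product("bg", repeat=5)
    if family.count("b") == 3
)
print(round(pr1, 3), round(pr2, 3))          # 0.033 0.329
print(abs(pr2 - comb(5, 3) * pr1) < 1e-12)   # True: ten equally likely orderings
```

Without intermediate rounding, Pr(2) is about 0.329, which rounds to the 0.33 obtained above.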
This example illustrates the binomial distribution, which arises from Bernoulli trials. A Bernoulli trial is any experiment whose outcome must be one of two things (e.g., success or failure), and whose next outcome is independent of the present one. Tossing a coin, for example, is a Bernoulli trial, because the outcome is either heads or tails, and the outcome this time does not affect the outcome next time. The binomial distribution gives the probability of obtaining a given number of successes in a fixed number of such trials; it is used in this thesis to obtain the reliability of the fault-tolerant Clos network.

2.4.3 Reliability models

Once again, it will be shown that the AND/OR rules are useful, this time in evaluating the reliability of a system. From the reliability standpoint, a system can be broken down into isolated blocks. For simplicity, suppose that a system can be broken down into two blocks, A and B. Then there are two possible situations.
If the system fails when either block fails, then the blocks are said to be in series, and the combined system reliability is the probability that both A AND B are operational. This situation is depicted in Figure 2.2a. If the two blocks are statistically independent, i.e., the failure of one is independent of the failure of the other, then
$$R = R_1R_2, \tag{2.14}$$
where R is the reliability of the system, $R_1$ is the reliability of block A and $R_2$ is the reliability of block B. In general, for a series system of n statistically independent blocks, the combined reliability of the system is
$$R = \prod_{i=1}^{n} R_i \tag{2.15}$$
Figure 2.2: Basic reliability models: (a) series system; (b) parallel system

On the other hand, if the system fails only when both blocks fail, then the blocks are said to be in parallel, and the combined system reliability is the probability that A OR B is operational. This situation is depicted in Figure 2.2b. If the two blocks are statistically independent, i.e., the failure of one is independent of the failure of the other, then
$$R = R_1 + R_2 - R_1R_2. \tag{2.16}$$
This expression can also be written as
$$R = 1 - (1 - R_1)(1 - R_2). \tag{2.17}$$
This form could have been obtained directly by noting that, in a parallel system, the reliability of the system is the probability that at least one block is operational, which is obtained by subtracting from unity the probability that both A AND B have failed. In general, for a parallel system of n statistically independent blocks, the combined reliability of the system is
$$R = 1 - \prod_{i=1}^{n} (1 - R_i) \tag{2.18}$$
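Equations 2.15 and 2.18 translate directly into two small functions, and because each returns a reliability, they can be nested to evaluate systems built from series and parallel sub-blocks. The block reliabilities below are hypothetical values, and statistical independence of the blocks is assumed throughout, as in the text.

```python
from math import prod


def series(reliabilities):
    """Equation 2.15: the system works only if every block works."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Equation 2.18: the system fails only if every block fails."""
    return 1 - prod(1 - r for r in reliabilities)


r1, r2 = 0.9, 0.8                              # hypothetical block reliabilities
print(round(series([r1, r2]), 4))              # 0.72
print(round(parallel([r1, r2]), 4))            # 0.98, equal to r1 + r2 - r1*r2
# Decompositions nest: two parallel pairs connected in series.
print(round(series([parallel([r1, r2]), parallel([r1, r2])]), 4))   # 0.9604
```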

Ju st as the AND and OR rules can be used to solve complex probability problems,
the series and parallel reliability decompositions can be used systematically to find
the reliability of complex systems.
Sometimes, a system has n parallel blocks, but needs at least m of them to remain operational. This is a binomial distribution problem. The reliability of the system in this case is better expressed as unity minus the probability of the complementary event, that is, failure occurring from having between 0 and m − 1 operational blocks. The operational blocks are indistinguishable from each other, and so are the non-operational blocks. Recall that counting the ways these blocks can be arranged together reduces to a combination problem (although it is originally a permutation problem). Thus, if each block has reliability $R_b$ and the blocks are statistically independent, the reliability of the system is
$$R = 1 - \sum_{i=0}^{m-1} \binom{n}{i} R_b^{\,i} (1 - R_b)^{n-i} \tag{2.19}$$
This formula is used in the thesis to obtain the reliability of the fault-tolerant Clos network in Chapter 6.
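Equation 2.19 can be sketched as follows, assuming n identical, statistically independent blocks with a hypothetical common reliability; the function name `at_least_m_of_n` is illustrative. Setting m = 1 recovers the pure parallel system of Equation 2.18, and m = n recovers the pure series system of Equation 2.15, which serves as a quick consistency check.

```python
from math import comb


def at_least_m_of_n(n, m, r):
    """Reliability when at least m of n independent blocks (reliability r each)
    must work: 1 minus the probability that only 0, ..., m - 1 blocks work."""
    return 1 - sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(m))


r = 0.9                                       # hypothetical block reliability
print(round(at_least_m_of_n(3, 2, r), 4))     # 0.972 (2-out-of-3)
print(round(at_least_m_of_n(3, 1, r), 4))     # 0.999, same as a parallel system
print(round(at_least_m_of_n(3, 3, r), 4))     # 0.729, same as a series system
```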

2.5 Notation

The following notation is used throughout the thesis.
i: general index, switch number in a network stage
j: general index, stage number in a network
X(i, j): (crossbar) switch number i in stage number j
N: network size, number of inlets or outlets of a network
I: set of all inlets of a network
O: set of all outlets of a network
m: in a Clos network, number of inputs to a first-stage switch, or number of outputs of a third-stage switch
n: in a Clos network, number of outputs of a first-stage switch, or number of inputs to a third-stage switch
k: in a Clos network, the number of inputs or outputs of a middle-stage switch
lg: $\log_2$, logarithm to the base 2
$\lfloor x \rfloor$: largest integer less than or equal to the real number x, also called the floor of x
$\lceil x \rceil$: smallest integer greater than or equal to the real number x, also called the ceiling of x
S: source inlet (integer value)
D: destination outlet (integer value)
$(D)_\ell$: ℓ-bit binary representation of the integer D
a mod b: remainder after dividing the integer a by the integer b
ν: number of stages in a MIN
r: reliability of a switch over a given period of time
R: reliability of a network over a given period of time
P: permutation
$\mathcal{P}$: set or group of permutations
e: identity permutation
$\mathcal{P}^N_\ell$: set of all permutations blocked in exactly ℓ paths when realized on a network of size N
$\Sigma_N$: symmetric group, the set of all permutations of size N
|A|: cardinality of A, the number of elements in the set A
∅: empty set

Chapter 3
MIN Implementations

Over the past three decades, a large number of MINs have been proposed. Most of them are implementations of the generalized MIN model defined in Chapter 2. Since this thesis centers on MINs, a thorough understanding of their construction, operation and routing is necessary before attempting to enhance them with fault tolerance capabilities. However, because MIN implementations are so numerous, it is not possible to cover all of them in this thesis; only the most popular implementations will be discussed.
In this chapter, the set of MINs discussed in the thesis will be presented. The generalized MIN model given in Chapter 2 will be used systematically to introduce and rigorously define these MINs. Moreover, the model will serve as a tool to highlight the differences among the MIN implementations. The model is repeated here, as Equation 3.1, for convenience.
$$\mathrm{MIN} = (I, O, F_I, F_O, F_{I_1}, F_{I_2}, \ldots, F_{I_{v-1}}, \mathcal{F}_{S_0}, \mathcal{F}_{S_1}, \ldots, \mathcal{F}_{S_{v-1}}) \qquad (3.1)$$

where
• $I$ and $O$ are the sets of inlets and outlets, respectively,
• $F_I$ is the permutation realized by the set of links between the inlets and the inputs of the first stage,
• $F_O$ is the permutation realized by the set of links between the outputs of the last stage and the outlets,
• $F_{I_j}$ is the permutation realized by the set of links between stage $j-1$ and stage $j$, $1 \le j \le v-1$, and
• $\mathcal{F}_{S_j}$ is the set of all mappings realizable by stage $j$, $0 \le j \le v-1$.

Recall that the model applies only to MINs where all the inputs of stage $j$, $1 \le j \le v-1$, are derived from the outputs of stage $j-1$, all the inputs of stage 0 are derived from the inlets, and all the outlets are derived from the outputs of stage $v-1$. Not all MINs have this characteristic; an example of one that does not is given at the end of this chapter.
According to the generalized model, all MINs mentioned in this thesis, unless otherwise specified, have
1. $I = O$ and $|I| = |O| = N$,
2. $v \ge 2$,
3. switches of the same size throughout each stage,
4. for all $j$, $0 \le j \le v-1$, every mapping $F_{S_j} \in \mathcal{F}_{S_j}$ is a permutation (the implication here is that the number of inputs of any stage is equal to the number of its outputs), and
5. for all $j$, $0 \le j \le v-1$, $\mathcal{F}_{S_j} \subset \Sigma_N$, where $N$ is the MIN size (the implication here is that there is more than one switch in any stage of the MIN).

From the first point above, it is no longer necessary to refer to the size of a network as $N \times M$, where $N$ is the number of inlets and $M$ is the number of outlets. Since both numbers are now equal, a network of size $N \times N$ will be referred to simply as a network of size $N$.

3.1 The Baseline Network

The Baseline network shown in Figure 3.1 is chosen in this thesis as representative of a family of networks called the shuffle family [80,87]. This family is characterized by a common switch structure and layout. More specifically, a network of size $N$ in this family must have $v = \lg N$ stages, each having $N/2$ switches of size $2 \times 2$. Moreover, the switches of any stage $j+1$ can be interchanged so that the links between stages $j$ and $j+1$, $0 \le j < v-1$, form a 2-shuffle of the terminals of one stage into the terminals of the other.
The c-shuffle of $cq$ objects, where $c$ and $q$ are two positive integers, is formed as follows [41]. Think of the $cq$ objects as cards in a deck. Divide the cards into $c$ piles of $q$ cards each, and put the piles in a row in the order in which they were taken from the deck. Pick up the top card of the first pile and put it as the first card of a new pile. Pick up the top card of the second pile and put it on top of the first card of the new pile. Repeat this process until the top card of each of the $c$ piles has been picked up. Now visit the $c$ piles again in the same order, picking up the top card of each pile and putting it, in the order it was picked up, on top of the new pile. Every time the $c$ piles are all visited, $c$ cards are added to the new pile. Repeat this process in a circular fashion until the cards in the $c$ piles have all been picked up. The new pile now holds all $cq$ cards, and their order is the c-shuffle of the original deck. Figure 3.2a shows the 4-shuffle of 8 objects. The column on the left represents the original 8 objects before shuffling. These objects are divided from top to bottom into 4 sections. The arrows in the figure show how the objects of each section are interleaved in the manner described above to form the 4-shuffle. The column on the right is then the 4-shuffle of the column on the left.

Figure 3.1: 8 x 8 Baseline network with routing example (stages 0, 1 and 2 from left to right)

Figure 3.2: Shuffling 8 objects: a) 4-shuffle of 8 elements, b) 2-shuffle of 8 elements
If $c = 2$, the shuffle is called perfect. A perfect shuffle of 8 objects is shown in Figure 3.2b. It should be noted that a c-shuffle is the inverse of a q-shuffle, where $cq = N$. That is, if a c-shuffle is performed on $N$ objects, the original order of the $N$ objects can be restored by performing a q-shuffle on them. In other words,
$$(c\text{-shuffle}) \times (q\text{-shuffle}) = e.$$
This identity is demonstrated in Figure 3.2, where $c = 4$, $q = 2$ and $N = 8$.
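The card-dealing description of the c-shuffle can be checked mechanically. The following is a minimal Python sketch, assuming that the deck is listed from top to bottom and that the result lists the cards in the order in which they are picked up; the function name `c_shuffle` is hypothetical. Under these assumptions it produces the 4-shuffle and the 2-shuffle of 8 objects and confirms that a 4-shuffle followed by a 2-shuffle restores the original order.

```python
def c_shuffle(objects, c):
    """Return the c-shuffle of `objects`; len(objects) must be c*q for some q."""
    n = len(objects)
    if c <= 0 or n % c != 0:
        raise ValueError("the number of objects must be a positive multiple of c")
    q = n // c
    # Divide the deck, top to bottom, into c piles of q cards each.
    piles = [objects[p * q:(p + 1) * q] for p in range(c)]
    # Visit the piles circularly, taking the next card from each pile in turn.
    return [piles[p][r] for r in range(q) for p in range(c)]


deck = list(range(8))
print(c_shuffle(deck, 4))                # [0, 2, 4, 6, 1, 3, 5, 7]
print(c_shuffle(deck, 2))                # [0, 4, 1, 5, 2, 6, 3, 7]
print(c_shuffle(c_shuffle(deck, 4), 2))  # [0, 1, 2, 3, 4, 5, 6, 7]  (c = 4, q = 2)
```

In this sketch the object in pile p at depth r lands in position rc + p; for c = 2 and N = 2^v this amounts to rotating the v-bit index of each object one place to the left.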
The $8 \times 8$ Baseline network shown in Figure 3.1 consists of three stages, each having four $2 \times 2$ switches. For the purpose of this study, the stages will be labelled from left to right as 0, 1, 2, and the switches in each stage will be labelled from top to bottom as 0, 1, 2, 3. A similar numbering scheme will be assumed for all MINs in this thesis unless otherwise specified. Note also that for all MINs in this thesis, the inlets will be on the left of the MIN, whereas the outlets will be on the right. In general, a Baseline network of size $N$ must have $N/2$ switches, each $2 \times 2$, in each stage, and the number of stages must be $v = \lg N$.
In reference to Equation 3.1, the Baseline network fits in the model as follows.
1. $v = \lg N$, where $N$ is the network size,
2. $F_I = F_O = e$, where $e$ is the identity permutation,
3. within any stage, the switches can be rearranged so that the permutations $F_{I_1}, \ldots, F_{I_{v-1}}$ each represent either a 2-shuffle or an $N/2$-shuffle, and
4. $\mathcal{F}_{S_0} = \mathcal{F}_{S_1} = \cdots = \mathcal{F}_{S_{v-1}}$.
All networks in the shuffle family can be constructed from one another [86].

For example, consider the Baseline network of Figure 3.1. It can be seen that the links between stages 0 and 1 form a perfect shuffle of the inputs of stage 1 into the outputs of stage 0. Now, interchange switches 1 and 2 of stage 2. The result will be a perfect shuffle from the inputs of stage 2 into the outputs of stage 1. If the links between the outlets and the outputs of stage 2 can now be rearranged to form a perfect shuffle from the outlets into the outputs of stage 2, and if the inlets are used as outlets and the outlets are used as inlets, then the resulting network, shown in Figure 3.3, is called the Omega network [51].

Figure 3.3: 8 x 8 Omega network
In reference to Equation 3.1, the Omega network fits in the model as follows.
1. $v = \lg N$,
2. $F_I$ = 2-shuffle and $F_O = e$,
3. $F_{I_1} = F_{I_2} = \cdots = F_{I_{v-1}}$ = 2-shuffle, and
4. $\mathcal{F}_{S_0} = \mathcal{F}_{S_1} = \cdots = \mathcal{F}_{S_{v-1}}$.

The fact that many networks can be developed from one shuffle network by changing the link maps between the stages was recognized only after many networks in the shuffle family had been proposed. Examples of such networks are the Baseline [85], Generalized Cube [77], Indirect Binary n-cube [71], Omega [51], Shuffle-exchange [81], STARAN™ flip [11] and SW-banyan (S = F = 2) [40] networks. These networks seemed different at first, but they were later proven to be topologically equivalent [77,78,86]. Therefore, when one needs to speak about the shuffle family, it is sufficient to speak about only one member of the family. As mentioned earlier, the member selected in this thesis is the Baseline network.
It should be noted that the Baseline network is a special case of the delta network [69], with $a = b = 2$ and 2-shuffles for the link maps between the stages. The delta network is a generalization of the shuffle family. Its basic components are delta elements, which are $a \times b$ crossbar switches, with c-shuffles between the stages.

3.1.1 Routing the Baseline network

The Baseline network is built from $2 \times 2$ crossbar switches. Other names for $2 \times 2$ switches are beta elements, binary cells, binary switches, binary modules, interchange boxes, and exchange boxes. The two names used in this thesis are $2 \times 2$ switch and binary switch. A binary switch can assume one of the two legal states shown in Figure 3.4. In the straight state, the upper input is connected to the upper output and the lower input is connected to the lower output. In the cross state, the upper input is connected to the lower output and the lower input is connected to the upper output. A switch assumes one of its legal states based on the routing bits on its inputs.

Figure 3.4: Legal states of the binary switch: a) straight state, b) cross state
An input is connected to one of the two outputs depending upon whether its routing bit is 0 or 1. Normally, if the bit is 0, the input is connected to the upper output, and if the bit is 1, the input is connected to the lower output. Therefore, the upper output of a binary switch is sometimes referred to as the 0 output and the lower output as the 1 output. A problem arises if the two inputs of a switch have identical routing bits. In such a case the two inputs would in effect be asking for the same output, giving rise to a conflict. Put differently, a conflict occurs when the switch would have to be in both of its legal states simultaneously. Since only one input can be connected to a given output, one of the two inputs will not be connected to any output; it will be blocked. It follows that the Baseline network is a blocking network: not all sets of paths can be established between the inlets and outlets simultaneously. One can also easily verify from Figure 3.1 that the path between any given inlet and any given outlet is unique.
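The switch-setting rule just described can be written down as a small decision function. This is a minimal Python sketch under stated assumptions: the name `switch_state` is hypothetical, `None` is used here to mark an idle input, and setting a completely idle switch to the straight state is an arbitrary choice (the text does not say how idle switches are set).

```python
def switch_state(upper_bit, lower_bit):
    """Pick the state of a 2 x 2 switch from the routing bits on its two inputs.

    Bit 0 asks for the upper (0) output, bit 1 for the lower (1) output.
    None marks an idle input (an assumption of this sketch).
    Returns "straight", "cross" or "conflict".
    """
    if upper_bit is not None and lower_bit is not None and upper_bit == lower_bit:
        return "conflict"          # both inputs ask for the same output
    if upper_bit is not None:
        # upper input -> upper output is straight; upper -> lower output is cross
        return "straight" if upper_bit == 0 else "cross"
    if lower_bit is not None:
        # lower input -> lower output is straight; lower -> upper output is cross
        return "straight" if lower_bit == 1 else "cross"
    return "straight"              # idle switch: arbitrary choice


print(switch_state(0, 1))  # straight
print(switch_state(1, 0))  # cross
print(switch_state(1, 1))  # conflict
```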
Routing the Baseline network is carried out in a distributed fashion. To establish a path between inlet $S$ and outlet $D$, one only has to send the $v$-bit binary representation of $D$ on inlet $S$ and let the individual bits control the switches they traverse as they pass from one stage to another. Let $(D)_v$ be the $v$-bit binary representation of the integer $D$. That is,
$$(D)_v = d_{v-1} d_{v-2} \cdots d_0.$$
Then bit $d_i$ will be used to control a switch in stage $v-1-i$, $0 \le i \le v-1$. This is called distributed routing, and it is the main advantage of the Baseline network. A demonstration of this routing technique is shown in Figure 3.1. The thick line represents a path established between inlet 0 and outlet 6 by putting the 3-bit binary representation of 6, 110, on inlet 0. Each bit in this routing tag is shown next to the switch it controls. The time complexity of the distributed routing scheme of the Baseline network is $O(\lg N)$, since each of the $v = \lg N$ stages examines exactly one bit of the tag.
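Distributed routing can be simulated in a few lines. The sketch below is a minimal Python illustration, not the thesis' own procedure: it assumes the recursive Baseline construction in which, inside each sub-network of the current size, upper switch outputs feed the upper half of the next stage and lower outputs feed the lower half, with $F_I = F_O = e$; this wiring is assumed to match Figure 3.1, and it does reproduce the routing example above (tag 110 from inlet 0 reaches outlet 6). The names `baseline_path` and `realizable` are hypothetical. Since the path between an inlet and an outlet is unique, a permutation is realizable exactly when no two paths need the same switch output.

```python
def baseline_path(v, source, dest):
    """Trace the unique path from inlet `source` to outlet `dest` in a Baseline
    network of size N = 2**v; return the (stage, output line) pairs it occupies."""
    n = 1 << v
    x = source                                   # line entering the current stage
    path = []
    for stage in range(v):
        bit = (dest >> (v - 1 - stage)) & 1      # d_{v-1-stage} controls this stage
        x = (x & ~1) | bit                       # leave the switch on output `bit`
        path.append((stage, x))
        if stage < v - 1:
            block = n >> stage                   # size of the current sub-network
            base, y = x - x % block, x % block
            # upper outputs feed the upper half of the block, lower outputs the lower half
            x = base + (y >> 1) + (y & 1) * (block // 2)
    return path


def realizable(v, perm):
    """True if every inlet i can reach outlet perm[i] without two paths sharing a line."""
    used = set()
    for source, dest in enumerate(perm):
        for hop in baseline_path(v, source, dest):
            if hop in used:
                return False                     # conflict: the permutation is blocked
            used.add(hop)
    return True


print(baseline_path(3, 0, 6))                    # [(0, 1), (1, 5), (2, 6)]
print(realizable(3, [0, 4, 2, 6, 1, 5, 3, 7]))   # True  (bit reversal)
print(realizable(3, list(range(8))))             # False (identity)
```

Under this wiring the identity permutation is blocked: inlets 0 and 1 share switch 0 of stage 0 and both need its upper output to reach outlets 0 and 1.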
The disadvantage of distributed routing, and hence of the Baseline network, is that it cannot realize every permutation in the symmetric group $\Sigma_N$; only a family of permutations can be realized. This family has already been identified and found [1] to include many of the permutations often needed in parallel processing [53].

3.2 The Clos Network

A Clos network of size 8 is shown in Figure 3.5. It has three stages, which are numbered 0, 1 and 2 from the input side to the output side. Stage 0 has four switches, each $2 \times 2$, stage 1 has two switches, each $4 \times 4$, and stage 2 has four switches, each $2 \times 2$.
Clos networks in general have three stages. A Clos network of size $N$ must have $k = N/m$ switches, each $m \times n$, in stage 0, and $k = N/m$ switches, each $n \times m$, in stage 2. The three stages are connected by inter-stage links in such a way that a switch in a given stage has access to all the switches in the next stage. Since there is a link from each switch in stage 0 or stage 2 to every switch in stage 1, there are exactly $n$ switches, each $k \times k$, in stage 1. It should be noted that in Clos networks, $n \ge m$. In this thesis, the term ordinary Clos network, or just the Clos network, will refer to the case where $n = m$. When $n > m$, some degree of fault tolerance is obtained, a fact utilized by the work of this thesis.

Figure 3.5: 8 x 8 ordinary Clos network
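In reference to Equation 3.1, the Clos network fits in the model as follows.
1. $v = 3$,
2. $F_I = F_O = e$,
3. $F_{I_1} F_{I_2} = e$,
4. $\mathcal{F}_{S_0} = \mathcal{F}_{S_2}$, and
5. $\mathcal{F}_{S_0} F_{I_1} \mathcal{F}_{S_1} F_{I_2} \mathcal{F}_{S_2} = \Sigma_N$.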

3.2.1 Routing the Clos network

In the Clos network, there is a central routing unit whose function is to receive a mapping, usually a permutation, and to find the proper setting for each individual switch to realize that permutation. This routing task turns out to be the main drawback of Clos networks. Setting the switches of stages 0 and 2 first and then trying to set the switches of stage 1 is not the right procedure, as conflicts will arise in stage 1. The proper procedure is to start by setting the switches of stage 1; the switches of stages 0 and 2 can then be set accordingly. However, finding settings for the switches of stage 1 such that no conflict occurs is not a trivial task. Three approaches have been explored in the literature for solving this problem: the group theoretic approach [66], the direct matrix decomposition approach [22], and the graph theoretic approach [54]. The group theoretic approach has been found [17] to be infeasible and therefore will not be discussed here. The other two approaches will be illustrated by way of an example.
Suppose it is required to realize on the Clos network of Figure 3.5 the following permutation:
$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 3 & 2 & 1 & 5 & 0 & 7 & 6 \end{pmatrix}$$

The direct matrix decomposition approach starts by constructing an $N \times N$ matrix, $I$, from the permutation to be realized, as follows:
$$I[i,j] = \begin{cases} 1 & \text{if inlet } i \text{ is to be routed to outlet } j \\ 0 & \text{otherwise} \end{cases}$$
where $I[i,j]$ is the element of matrix $I$ in row $i$ and column $j$, $0 \le i, j \le N-1$. Thus for permutation $P$ above,
$$I = \begin{bmatrix}
0&0&0&0&1&0&0&0\\
0&0&0&1&0&0&0&0\\
0&0&1&0&0&0&0&0\\
0&1&0&0&0&0&0&0\\
0&0&0&0&0&1&0&0\\
1&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&1\\
0&0&0&0&0&0&1&0
\end{bmatrix}$$

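The next step is to partition $I$ into $k \times k$ quadrants, $m \times m$ each, and then construct a $k \times k$ matrix, $H_m$, from $I$ as follows:
$$H_m[i,j] = \text{sum of the } m \times m \text{ entries in quadrant } (i,j) \text{ of } I.$$
Every row and every column of $H_m$ sums to $m$, since each first-stage switch has $m$ inlets and each third-stage switch has $m$ outlets. Thus, for the example under consideration ($m = 2$, $k = 4$),
$$H_2 = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}$$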
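Because only the quadrant sums of $I$ matter, $H_m$ can be accumulated directly from the permutation without forming $I$. The following is a minimal Python sketch, assuming an ordinary Clos network ($n = m$) in which inlet $i$ sits on first-stage switch $\lfloor i/m \rfloor$ and outlet $j$ on third-stage switch $\lfloor j/m \rfloor$, consistent with the top-to-bottom numbering used in this thesis; `clos_h_matrix` is a hypothetical name.

```python
def clos_h_matrix(perm, m):
    """Build H_m for an ordinary Clos network with m inlets per first-stage switch.

    H[i][j] counts the inlets on first-stage switch i that must reach outlets on
    third-stage switch j, i.e. the sum of the m x m quadrant (i, j) of matrix I.
    """
    n = len(perm)
    if n % m != 0:
        raise ValueError("network size must be a multiple of m")
    k = n // m
    h = [[0] * k for _ in range(k)]
    for inlet, outlet in enumerate(perm):
        # I[inlet][outlet] = 1 lies in quadrant (inlet // m, outlet // m)
        h[inlet // m][outlet // m] += 1
    return h


for row in clos_h_matrix([4, 3, 2, 1, 5, 0, 7, 6], 2):
    print(row)
# [0, 1, 1, 0]
# [1, 1, 0, 0]
# [1, 0, 1, 0]
# [0, 0, 0, 2]
```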

Decomposing the $H_m$ matrix into $m$ matrices, $k \times k$ each, such that each row and each column of any of the $m$ matrices contains exactly one 1, all other entries being 0, gives the proper settings for the switches of stage 1. For the example above, since the matrix is so small, this decomposition can be performed by inspection to yield
$$\begin{bmatrix} 0&1&1&0\\ 1&1&0&0\\ 1&0&1&0\\ 0&0&0&2 \end{bmatrix} = \begin{bmatrix} 0&0&1&0\\ 0&1&0&0\\ 1&0&0&0\\ 0&0&0&1 \end{bmatrix} + \begin{bmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix}$$

The two matrices on the right represent the settings of the two switches in stage 1 of the Clos network of Figure 3.5. A 1 in row $i$ and column $j$ indicates that input $i$ of the switch is to be connected to output $j$ of the same switch. The settings resulting from decomposing $H_m$ ensure that no conflict will occur in stage 1 and that all required paths specified by the permutation will be accommodated. Once the switches of stage 1 have been set, the switches of stages 0 and 2 can easily be set to complete the realization of the permutation. Many algorithms have been proposed to decompose $H_m$ in the general case.
One such algorithm is Neiman's algorithm [64]. Neiman's algorithm consists of the following two steps.
1. Starting with the leftmost column, mark in each column of $H_m$ a non-zero element, such that no two marked elements are in the same row. If a column is encountered where no such element can be marked, proceed to the next column. If $k$ elements are marked this way, the algorithm is done; otherwise the second step below must be performed.
2. Assume there are $x$ marked elements, $x < k$, from previous marking operations. Mark a non-zero element in a column where no element could be marked before, and unmark the previously marked element in that row. Visit the column where an element has just been unmarked and mark an element in that column, in a row where no element is marked. Keep up this process of marking and unmarking, following the rule that there cannot be two marked elements in the same row or the same column and that rows and columns cannot be revisited, until $x + 1$ elements are successfully marked.
It should be clear then that every time Step 2 is performed, one more element is marked. Hence, Step 2 must be repeated until $k$ elements are successfully marked.
The $k$ marked elements now represent the setting for one of the $m$ switches of stage 1. A marked element in row $i$ and column $j$ means that input $i$ of the switch is to be connected to output $j$ of the same switch. Each marked element in $H_m$ is then decremented by one to obtain $H_{m-1}$, whose rows and columns each sum to $m-1$. The algorithm is then applied to $H_{m-1}$ to obtain the setting for another switch, together with $H_{m-2}$, as before. This process is repeated until $H_1$ is obtained. $H_1$ itself represents the setting for a switch in stage 1. Clearly, to obtain the settings for all $m$ switches of stage 1, the above algorithm is performed $m - 1$ times (the $m$th time is not needed). An analysis of Neiman's algorithm shows that it runs in $O(mk^4)$ time [44]. This time complexity is rather large, especially for large $k$.
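The decomposition of $H_m$ can be sketched with a standard augmenting-path bipartite matching. In the minimal Python sketch below, the recursive step that frees an already-used column by re-matching its row elsewhere plays the role of the marking and unmarking of Step 2, but it is a textbook (Kuhn-style) matching routine, not a transcription of Neiman's algorithm, and no claim is made about matching the bound of [44]. The names `perfect_matching` and `decompose` are hypothetical. For simplicity the loop runs $m$ times instead of $m - 1$; the last matching simply picks up the remaining $H_1$.

```python
def perfect_matching(h):
    """Choose one non-zero entry in every row of h, no two in the same column.

    Augmenting paths: if row i finds no free column, a column already taken is
    freed by re-matching its current row elsewhere.  Returns match[row] = column.
    """
    k = len(h)
    owner = [None] * k                           # owner[col] = row matched to col

    def try_row(i, seen):
        for j in range(k):
            if h[i][j] > 0 and j not in seen:
                seen.add(j)
                if owner[j] is None or try_row(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(k):
        if not try_row(i, set()):
            raise ValueError("no perfect matching: row and column sums differ")
    match = [None] * k
    for j, i in enumerate(owner):
        match[i] = j
    return match


def decompose(h):
    """Split H_m into m permutation matrices, one per stage-1 switch."""
    h = [row[:] for row in h]                    # work on a copy of H_m
    k, m = len(h), sum(h[0])
    settings = []
    for _ in range(m):
        match = perfect_matching(h)
        settings.append([[1 if match[i] == j else 0 for j in range(k)] for i in range(k)])
        for i, j in enumerate(match):
            h[i][j] -= 1                         # H_r -> H_{r-1}
    return settings


H2 = [[0, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 2]]
for setting in decompose(H2):
    print(setting)
```

Applied to the $H_2$ of the example, this sketch returns the two matrices found above by inspection, in the same order. In general the decomposition is not unique, and a different search order may return a different but equally valid set of settings.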
Two other algorithms have been proposed to decompose $H_m$: Ramanujam's algorithm [73] and Jajszczyk's algorithm [44]. However, both were later proven wrong [16,48]. The reason why the two algorithms fail has been identified, and a solution to make them succeed has been proposed [17].
On the other hand, the graph theoretic approach to finding the settings of the switches of stage 1 starts by treating each switch in stages 0 and 2 as a vertex in a multigraph $G$. Let the set of switches of stage 0 be denoted $V_0$ and the set of switches of stage 2 be denoted $V_2$. Then, given a permutation $P$, an edge is drawn between vertex $i$ and vertex $j$ whenever an inlet attached to switch $i$ of stage 0 is to be routed to an outlet attached to switch $j$ of stage 2. The result is the bipartite multigraph $G = (V_0, V_2, E)$, where $E$ is the set of edges between $V_0$ and $V_2$. $G$ is a multigraph since multiple edges between the same pair of vertices are allowed, and it is bipartite since each edge in $G$ is incident on two vertices, one in $V_0$ and the other in $V_2$. The degree of $G$ (the number of edges incident on any vertex) is clearly $m$, since each switch of stage 0 has $m$ inlets and each switch of stage 2 has $m$ outlets. The graph theoretic approach then decomposes $G$ into $m$ subgraphs of degree 1 each. Each such subgraph will represent the setting of one of the $m$ switches of stage 1.
For the example above, the graph representing permutation $P$ is shown in Figure 3.6 together with its decomposition into two subgraphs. The two subgraphs in the figure represent the settings for the two switches of stage 1 of the Clos network of Figure 3.5. An edge in a subgraph between vertex $i$, $i \in V_0$, and vertex $j$, $j \in V_2$, indicates that input $i$ of the switch is to be connected to output $j$ of the same switch. These settings ensure that no conflict will occur in stage 1 and that all required paths specified by the permutation will be accommodated. Many algorithms have been proposed to decompose $G$ in the general case.
One such algorithm uses matching [43] to extract the individual subgraphs. A matching is a set of edges in $G$ such that no two are incident on the same vertex. This algorithm runs in $O(k^{2.5})$ time. Other algorithms also exist in which techniques such as edge coloring [18,84] and Euler partitioning [38] are used. The graph-based algorithms are outside the scope of this thesis.
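Although the graph-based algorithms are not pursued in this thesis, the idea of extracting $m$ matchings from $G$ can be sketched briefly. In this minimal Python sketch each inlet is treated as one edge of $G$, from first-stage switch $\lfloor \text{inlet}/m \rfloor$ to third-stage switch $\lfloor P(\text{inlet})/m \rfloor$, and $m$ perfect matchings are peeled off with simple augmenting paths; it is not the $O(k^{2.5})$ algorithm of [43], and `route_clos` is a hypothetical name. Its output tells, for every inlet, which stage-1 switch carries the path, which is what is needed to set stages 0 and 2.

```python
def route_clos(perm, m):
    """Assign every inlet of an ordinary Clos network to a stage-1 switch.

    Inlet e is an edge of G from first-stage switch e // m to third-stage switch
    perm[e] // m.  G is m-regular, so m perfect matchings can be peeled off one
    after another; the edges of matching t are carried by stage-1 switch t.
    """
    n = len(perm)
    k = n // m
    middle = [None] * n                          # middle[inlet] = stage-1 switch used

    def augment(left, seen, edge_at):
        # Try to give vertex `left` of V_0 an unassigned edge to a free vertex of V_2.
        for e in range(n):
            right = perm[e] // m
            if middle[e] is None and e // m == left and right not in seen:
                seen.add(right)
                if edge_at[right] is None or augment(edge_at[right] // m, seen, edge_at):
                    edge_at[right] = e
                    return True
        return False

    for t in range(m):
        edge_at = [None] * k                     # edge_at[vertex of V_2] = edge in matching t
        for left in range(k):
            if not augment(left, set(), edge_at):
                raise ValueError("perm is not a permutation of this network's size")
        for e in edge_at:
            middle[e] = t                        # commit matching t
    return middle


print(route_clos([4, 3, 2, 1, 5, 0, 7, 6], 2))  # [1, 0, 1, 0, 0, 1, 0, 1]
```

For permutation $P$, inlets 1, 3, 4 and 6 use stage-1 switch 0 and inlets 0, 2, 5 and 7 use switch 1, which is the matrix decomposition found above with the roles of the two stage-1 switches exchanged.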
The two routing approaches mentioned above have been discussed extensively in the literature, and the graph theoretic techniques have always been described as more efficient. Recently, however, it has been found that the edge coloring and direct matrix decomposition approaches are equivalent [19]. This finding may well lead to a new, unified routing algorithm that makes the Clos network particularly suitable for processor interconnection in large-scale multiprocessor systems.

Figure 3.6: Graph representation of permutation P

3.3 The Benes Network

The Benes network can be developed from a Clos network with $n = m = 2$ and $k = 2^t$, for some positive integer $t$, by recursively decomposing each middle-stage switch into a 3-stage Clos subnetwork whose outer stages contain only $2 \times 2$ switches. If this decomposition is continued until every switch in the network is of size $2 \times 2$, the resulting network is called the Benes network [12,13].
Consider for example the Clos network shown in Figure 3.5. Replace each of the two switches in stage 1 by a 3-stage Clos subnetwork. The resulting network, shown in Figure 3.7, is the Benes network of size 8. There are 5 stages in the network, each containing 4 switches. In general, a Benes network of size $N$ must have $v = 2(\lg N) - 1$ stages, with each stage having $N/2$ switches, each $2 \times 2$.

Figure 3.7: 8 x 8 Benes network
In reference to Equation 3.1, the Benes network fits in the model as follows.
1. $v = 2(\lg N) - 1$,
2. $F_I = F_O = e$,
3. $F_{I_j} F_{I_{v-j}} = e$, for all $j$, $1 \le j \le (v-1)/2$, and
4. $\mathcal{F}_{S_0} = \mathcal{F}_{S_1} = \cdots = \mathcal{F}_{S_{v-1}}$.

It is interesting to note that the Benes network can be viewed as two back-to-back Baseline networks, with one of the two middle stages eliminated. At the beginning of this section, the relationship between the Clos network and the Benes network was described. It should therefore be clear that the three networks used in this thesis, the Baseline network, the Clos network and the Benes network, are closely related.

3.3.1 Routing the Benes network

Many algorithms have been proposed for routing the Benes network. In general, two methods can be applied: central routing or distributed routing.
The best known central routing algorithm is the looping algorithm [7,65]. It uses the fact that, at any stage $j$, $0 \le j < (v-1)/2$, the stages strictly between stage $j$ and stage $2(\lg N) - j - 2$ form, within each Benes subnetwork of size $N/2^j$, two Benes subnetworks of size $N/2^{j+1}$ each. The two subnetworks between stages 0 and 4 of the Benes network of Figure 3.7 are enclosed in a dashed box and denoted $C_0$ and $C_1$ for easy reference. If every path is routed through the proper subnetwork, no blocking will occur and any permutation can be realized conflict-free. To understand how the looping algorithm works, the dual of a number is first defined. Given two integers $a \ne b$, $a$ is said to be the dual of $b$, denoted $a = \bar{b}$, if $\lfloor a/2 \rfloor = \lfloor b/2 \rfloor$. For instance, 0 is the dual of 1, 7 is the dual of 6, and so on. The looping algorithm rests on the observation that if no two dual inlets and no two dual outlets are routed through the same subnetwork, then any permutation can be realized on the network conflict-free. The algorithm starts by assigning an element of the permutation arbitrarily to either subnetwork, and then divides all other elements between the two subnetworks on the condition that no dual inlets or dual outlets go through the same subnetwork. This will be illustrated by an example.
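Suppose it is required to realize the same permutation $P$ as before. $P$ is repeated here for convenience.
$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 3 & 2 & 1 & 5 & 0 & 7 & 6 \end{pmatrix}$$
First, arbitrarily assign $\begin{pmatrix} 0 \\ 4 \end{pmatrix}$ to $C_0$. Since outlet 4 now goes through $C_0$, its dual, outlet 5, must go through $C_1$. Thus $\begin{pmatrix} 4 \\ 5 \end{pmatrix}$ goes to $C_1$. Now, since inlet 4 goes through $C_1$, its dual, inlet 5, must go through $C_0$. As a result, $\begin{pmatrix} 5 \\ 0 \end{pmatrix}$ is assigned to $C_0$. This procedure is repeated until all elements of the permutation are processed; whenever the chain of implications returns to an element that has already been assigned, the loop is closed and a new loop is started from any unassigned element. For the example under consideration, the elements of $P$ will be divided between the two subnetworks as follows:
$$C_0: \begin{pmatrix} 0 & 2 & 5 & 6 \\ 4 & 2 & 0 & 7 \end{pmatrix} \qquad C_1: \begin{pmatrix} 4 & 3 & 1 & 7 \\ 5 & 1 & 3 & 6 \end{pmatrix}$$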

The two outer stages, stages 0 and 4, are set according to the assignments made to $C_0$ and $C_1$. The procedure is then applied recursively inside $C_0$ and $C_1$ until all the switches are set. The time complexity of the looping algorithm is $O(N \lg N)$.
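One level of the looping algorithm can be sketched as follows. This minimal Python version, with the hypothetical name `looping_split`, assumes that each new loop is started by putting its lowest-numbered unassigned inlet in $C_0$ (one way of making the arbitrary choice in the text), and uses the fact that the dual of $a$ is `a ^ 1`, the other integer with the same $\lfloor a/2 \rfloor$. It returns, for each inlet, 0 for $C_0$ or 1 for $C_1$; a complete router would apply it recursively to the sub-permutations handed to $C_0$ and $C_1$.

```python
def looping_split(perm):
    """One level of the looping algorithm for a Benes network.

    Returns side[inlet] = 0 (subnetwork C_0) or 1 (subnetwork C_1) such that no
    two dual inlets and no two dual outlets use the same subnetwork.
    The dual of a is a ^ 1: the other integer with the same floor(a / 2).
    """
    n = len(perm)
    inv = [0] * n                                # inv[outlet] = inlet routed to it
    for s, d in enumerate(perm):
        inv[d] = s
    side = [None] * n
    for start in range(n):
        s = start                                # open a new loop at an unassigned inlet
        while side[s] is None:
            side[s] = 0                          # s goes through C_0 ...
            partner = inv[perm[s] ^ 1]           # ... so the inlet using the dual outlet
            side[partner] = 1                    # must go through C_1 ...
            s = partner ^ 1                      # ... and the dual of that inlet through C_0
    return side


print(looping_split([4, 3, 2, 1, 5, 0, 7, 6]))  # [0, 1, 0, 1, 1, 0, 0, 1]
```

For $P$ the sketch puts inlets 0, 2, 5 and 6 in $C_0$ and inlets 1, 3, 4 and 7 in $C_1$, the same division obtained by hand above. Each level touches every element a constant number of times, and there are about $\lg N$ levels, which is consistent with the $O(N \lg N)$ bound.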
The Benes network can also be routed in a distributed fashion [63], in much the same way as the Baseline network is routed. The time complexity of the distributed routing algorithm is $O(\lg N)$. The tradeoff, however, is that not every permutation in $\Sigma_N$ can be realized on a network of size $N$. That is, the Benes network becomes blocking if routed in a distributed fashion. Recall that the same network is rearrangeably nonblocking (it can realize every permutation in $\Sigma_N$) if routed centrally. It should also be noted that a group theoretic approach has been suggested [20] to route the Benes network. The results are promising and could eventually lead to a linear-time setup algorithm. If routing is performed on a multiprocessor, as opposed to a uniprocessor as has been the case in the above algorithms, a great deal of routing time can be saved [21].

3.4 Other MIN Implementations

Thus far three MIN implementations have been discussed in some detail: the Baseline network, the Clos network and the Benes network. These three MINs will be the focus of the thesis because of their popularity. A great deal of research has been done to investigate their properties. Furthermore, these three MINs have well-established routing algorithms and an acceptable area complexity. There are, however, many other types of MINs that can be found in the literature. One such MIN is the BBC network [9].
A BBC network of size 6 is shown in Figure 3.8. In general, a BBC network of size $N$ must have $N - 1$ stages of one switch each. Each stage produces exactly one outlet, except the last stage, which produces two outlets. For this reason, the BBC does not fit in the generalized MIN model, which requires that all outlets be derived from the last stage. The BBC realizes a permutation $P$ of size $N$ one element at a time throughout the network, except at the last stage, where two elements are realized simultaneously. Consider for example the BBC of Figure 3.8 and a permutation of size 6. The first stage connects only one of the 6 inlets to outlet 0 and passes the rest of the permutation to the second stage. The second stage connects one of its inputs, according to the permutation, to outlet 1. This process is continued until the permutation is fully realized.
More details on the BBC and other MINs with regular geometries can be found in [17].

Figure 3.8: 6 x 6 BBC network

3.5 The Crossbar Switch

Evidently, the building block of any MIN is the crossbar switch. This switch has been shown so far as a simple box. However, the switch is not as simple as it looks, even for the smallest size, $2 \times 2$. The complexity of the switch is due primarily to the need for a control unit attached to the switch which can “talk” either to a central routing unit, in the case of a centrally-routed MIN, or to switches in other stages, in the case of a self-routed MIN. For this reason, most of the switch implementations proposed in the literature have been for binary switches.
One such implementation [69] is shown in Figure 3.9. It consists of two blocks, the DATA block and the CONTROL block. The bidirectional arrows connected to the DATA block represent the data bus and Read/Write control lines. It should be noted that in distributed routing the data bus is used at the beginning of each memory cycle to carry the routing tag of the destination. The DATA block is the main element of the switch; this is the place where the data pass from the inputs of the switch to its outputs. The inputs of the switch are connected to the outputs, either in the straight or the cross state as explained earlier, based on information received from the CONTROL block.
Connected to the CONTROL block are two sets of 1-bit control lines corresponding to the two sets of data lines of the DATA block. These lines act as signaling lines between neighboring stages to help set up a new path. There are three such signaling lines: request (r), busy (b) and destination (d). Lines of the same type on each switch are connected to their counterparts in the previous and next stages. The signaling lines on the input side of the switches of the input stage are connected to the processors, and the signaling lines on the output side of the switches of the last stage are connected to the memory modules. The operation of these signaling lines is described below.

Figure 3.9: Implementation of a binary switch (r: request, b: busy, d: destination)
The binary switch described above operates in a network of $v$ stages as follows. All processors requiring memory access place a logic 1 on the request lines of stage 0 and place the address of the destination on the data lines. The request propagates from one stage to the next, from the input side of the network to the output side. Once the request signal reaches a switch, the switch examines its destination lines. The destination line of a switch in stage $j$ is connected to data line $v - j - 1$ of the data bus input to the switch. The switch is put in one of its legal states, straight or cross, based on one of the two destination lines $d_0$ and $d_1$ (arbitration is used if there is a conflict). Lines $x$ and $\bar{x}$ carry the connection decision from the CONTROL block to the DATA block; two lines, rather than one, are used for hardware considerations.
The busy line goes high if the connection requested for a data bus is denied by the switch due to a conflict. This busy signal propagates backwards towards the input of the network and finally to the processor requesting the connection. After $8v$ gate delays, the busy line attached to the processor is valid and the processor can examine it. If the busy line is 0, the route has been successfully set up. If the busy line is 1, the processor must try again later, as a conflict has occurred somewhere on the way to the destination.
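The request/busy cycle can be mimicked in software to see which processors receive a busy signal. The following Python sketch is illustrative only and rests on stated assumptions: it assumes the Baseline wiring (inside each sub-network, upper switch outputs feed the upper half of the next stage and lower outputs the lower half), it assumes that the upper input of a switch wins arbitration (the text says only that arbitration is used), and it does not model gate delays or the $8v$ timing. `set_up_paths` is a hypothetical name.

```python
def set_up_paths(v, requests):
    """One request/busy cycle on a Baseline-wired network with v stages.

    `requests` maps a requesting processor (inlet) to its memory module (outlet).
    Returns {processor: busy bit}: 0 if the path was set up, 1 if it was blocked.
    """
    n = 1 << v
    line = {p: p for p in requests}              # line each request enters the stage on
    busy = {p: 0 for p in requests}
    active = sorted(requests)
    for stage in range(v):
        taken = set()                            # switch outputs already granted
        # visiting requests by line number lets the upper input of a switch win
        for p in sorted(active, key=lambda q: line[q]):
            bit = (requests[p] >> (v - 1 - stage)) & 1   # data line v - 1 - stage
            out = (line[p] & ~1) | bit
            if out in taken:
                busy[p] = 1                      # denied: busy is sent back to p
            else:
                taken.add(out)
                line[p] = out
        active = [p for p in active if busy[p] == 0]
        if stage < v - 1:
            block = n >> stage
            for p in active:                     # follow the inter-stage links
                base, y = line[p] - line[p] % block, line[p] % block
                line[p] = base + (y >> 1) + (y & 1) * (block // 2)
    return busy


print(set_up_paths(3, {0: 6, 1: 7, 2: 2, 3: 1}))  # {0: 0, 1: 1, 2: 0, 3: 1}
```

In this example processors 1 and 3 lose arbitration in stage 0 and see busy = 1, while processors 0 and 2 get their paths; the losers would retry in a later cycle.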
Another implementation of a binary switch is reported in [10], while a fault tolerant implementation is given in [55]. In the latter implementation, a built-in fault detection capability based on checking the data bits is embedded in each switch. In this implementation the switch ends up having 88 pins for a 16-bit data bus, because it uses more control lines than the switch shown in Figure 3.9.

Chapter 4
Fault Tolerant MINs
In this chapter, an overview of some fault tolerance techniques that have appeared in the literature for MINs will be presented. The advantages and shortcomings of each design will be highlighted. This will help explain the problem of fault tolerance, and thus will facilitate its solution.
In general, MINs can be made fault tolerant by adding extra hardware. An obvious approach is to fully duplicate the MIN (100% redundancy). Here two MINs are put in parallel, with one active and the other on standby. If a fault occurs, the standby MIN is switched in, the faulty MIN is switched out, and normal operation resumes. The advantage of this approach is that performance remains the same under faulty conditions as under normal (fault-free) conditions. The disadvantage of duplication is the increased cost and size of the system. To keep the cost and size of the system at a minimum, one must search for a solution other than duplication. Adding extensive hardware usually reduces performance degradation under faulty conditions, but increases cost and size. Adding little hardware, on the other hand, keeps cost and size down but increases performance degradation under faulty conditions. As a consequence, a compromise must be made in which the tradeoffs are weighed carefully and the best design is reached. A good fault tolerance technique is one that needs minimal hardware and causes minimal performance degradation under faulty conditions. Needless to say, any fault tolerance technique should cause no performance degradation under normal conditions.
Recognizing the need for fault tolerance in multiprocessor systems, researchers have recently suggested a number of fault tolerant MINs. The details of these techniques depend mainly on the type of network and the fault tolerance model used. For example, fault tolerance has been provided for the shuffle family in three ways:
- by adding an extra stage [2,3,33,88];
- by adding extra interstage links, which in turn requires non-binary switches [27,45,67];
- by adding intrastage links and modifying the switch design [49].
Fault tolerance has also been provided for some other network architectures [4,52,39,59,72,75,76] through various approaches. All these techniques offer some level of fault tolerance to the MIN while avoiding the costly approach of duplication.
Unfortunately, most of the fault tolerant techniques cited above are MIN-specific. Moreover, most of them are suggested only for the shuffle family. Despite an extensive search of the literature, it has not been possible to find any fault tolerance work for either the Clos or the Benes networks. If there is indeed no such work, this thesis may well be the first attempt in that direction.
In this thesis, a fault tolerance technique that is not MIN-specific is developed. As such, it can be used with Benes networks, with Clos networks, or with any network that fits the generalized MIN model. The generalized technique, however, is most efficient for MINs with small switch sizes, e.g. binary switches. Since the Clos network is characterized by large switches, the generalized technique becomes less efficient for it. For this reason, another technique will be presented in Chapter 6 to provide fault tolerance for the Clos network. Together, the two techniques should offer a reasonably comprehensive solution to the fault tolerance problem in a large number of MINs.

With this in mind, two recently suggested fault tolerant MINs will now be presented. For each MIN, the following will be described:
- its construction;
- the fault tolerance model used in its design;
- its recovery method.
The advantages and disadvantages of each MIN will also be examined.

4.1 The Extra Stage Cube

The Extra Stage Cube (ESC) has been suggested [2] for the shuffle family. It is demonstrated in Figure 4.1 on the Generalized Cube network, a variation of that family. Stage 3 in the figure is the extra stage.
It should be noted that the stages in this figure are numbered from right to left, the opposite of the scheme adopted everywhere else in this thesis. The positions of the inlets and outlets, however, follow the convention of this thesis: the inlets are to the left of the MIN and the outlets to the right.
The inlets are connected to 1 x 2 demultiplexers (shown as small boxes). One of the two outputs of each demultiplexer is connected to a switch in stage 3. The other output is connected to a multiplexer on the other side of the switch.
The use of multiplexers and demultiplexers in fault tolerant MINs seems inevitable. They can be viewed as switches that connect one of many target lines on one side, one at a time, to a single line on the other side. They are usually provided with selection (or control) lines to select a specific target line. Thus in the ESC, stage 3 as a whole can be bypassed if the demultiplexers and multiplexers select the target lines that do not lead to the switches. This is the idea behind the ESC: a stage can be switched in or out at will.

4.1.1 Operation and fault tolerance model

The generalized cube generates its routing tag, T, as the bit-wise exclusive-or of the two integers representing the source and the destination.
Figure 4.1: The Extra Stage Cube (ESC)

Suppose it is required to establish a path between inlet S and outlet D. Then the routing tag will be
$$T = S \oplus D = t_{v-1} \dots t_1 t_0,$$
where $v = \lg N$ is the number of stages in the generalized cube. A switch in stage i need only examine $t_i$. If $t_i = 0$, the switch takes the straight state; if $t_i = 1$, it takes the cross state. As in any network of the shuffle family, the two inputs of a switch may carry routing bits that try to put the switch in both states simultaneously. In that case only one input is given priority and the other is blocked.
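The tag computation and the per-stage switch settings described above can be illustrated with a short Python sketch. It is a minimal model and relies on one assumption: stage i of the generalized cube pairs lines whose labels differ only in bit i, so a cross state complements bit i of the line label. The function names are illustrative and do not come from the thesis.

```python
def cube_routing_tag(s: int, d: int) -> int:
    """Routing tag T = S XOR D; bit t_i controls the switch used in stage i."""
    return s ^ d


def cube_switch_states(s: int, d: int, v: int) -> list:
    """Switch state used in each stage, in the order the data meets them
    (stage v-1 first, stage 0 last, as in Figure 4.1)."""
    t = cube_routing_tag(s, d)
    return ["cross" if (t >> i) & 1 else "straight" for i in range(v - 1, -1, -1)]


def cube_trace(s: int, d: int, v: int) -> int:
    """Follow the path from inlet s and return the outlet it reaches.

    Assumption: a cross state in stage i complements bit i of the line label."""
    t = cube_routing_tag(s, d)
    line = s
    for i in range(v - 1, -1, -1):
        if (t >> i) & 1:
            line ^= 1 << i
    return line


# Example for N = 8 (v = 3): S = 5 (101), D = 3 (011) gives T = 110.
print(cube_routing_tag(5, 3))        # 6
print(cube_switch_states(5, 3, 3))   # ['cross', 'cross', 'straight']
print(cube_trace(5, 3, 3))           # 3
```

Because each set bit of T flips exactly the matching bit of the line label, every inlet reaches its destination when no switch is blocked.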
The ESC, on the other hand, is normally set with stage v disabled (bypassed) and stage 0 enabled. Its response to a fault depends on where the fault is:
- If a fault occurs in stage 0, stage 0 is disabled and stage v is enabled.
- If a fault occurs in stage x, 0 < x < v, both stages 0 and v are enabled.
It should be noted that stage 0 can also be switched in or out of the MIN by adjusting the multiplexers and demultiplexers around that stage.
Having an extra stage in the network offers two paths between any inlet/outlet pair. The primary path is the path that would otherwise be established on the normal cube. The secondary path is the one used when stage v is enabled (in case a fault occurs in the network).
Each bit in the routing tag controls a switch in the network, and the switches to be controlled change according to the location of the fault. The processors must therefore generate a dynamic (v + 1)-bit routing tag for each desired path, different from the normal v-bit tag T. Table 4.1 shows these routing tags for all possible fault locations; $t_v$ is a dummy bit that can be assigned any arbitrary value.

Table 4.1: Routing Tags for the ESC
Fault location        | Routing tag
No fault              | $t_v t_{v-1} \dots t_1 t_0$
Stage 0               | $t_0 t_{v-1} \dots t_1 t_0$
Stage x, 0 < x < v    | $0\, t_{v-1} \dots t_1 t_0$ if the primary path is fault free
                      | $1\, t_{v-1} \dots t_1 \bar{t}_0$ if the primary path is faulty

It is evident that a processor must know the exact location (stage) of the fault so that it can generate the proper routing tag. Thus, once a fault is detected and located by running some test, all processors must be notified of its location. It is assumed that an external hardware unit will enable stage v if there is a fault in the network.
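Table 4.1 can be checked by simulation. The Python sketch below builds the (v + 1)-bit tag for each fault case and routes it through a bit-level model of the ESC. It rests on two assumptions:
- the extra stage v, like stage 0, exchanges on bit 0 of the line label (the arrangement implied by the stage-0 row of Table 4.1);
- the dummy bit t_v is set to 0 in the fault-free case.
The function names are hypothetical.

```python
def esc_tag(s, d, v, fault_stage=None, primary_faulty=False):
    """(v+1)-bit ESC routing tag following Table 4.1; bit v drives the extra stage.

    fault_stage: None (no fault), 0, or x with 0 < x < v.
    primary_faulty: only meaningful when 0 < x < v.
    """
    t = s ^ d                      # normal v-bit tag t_{v-1} ... t_0
    t0 = t & 1
    if fault_stage is None:
        return t                   # dummy bit t_v chosen as 0
    if fault_stage == 0:
        return (t0 << v) | t       # extra stage takes over bit 0
    if not primary_faulty:
        return t                   # 0 t_{v-1} ... t_1 t_0
    return (1 << v) | (t ^ 1)      # 1 t_{v-1} ... t_1 (complement of t_0)


def stages_enabled(fault_stage):
    """(extra stage enabled, stage 0 enabled) for a given fault location."""
    if fault_stage is None:
        return False, True
    if fault_stage == 0:
        return True, False
    return True, True


def esc_trace(s, tag, v, extra_enabled, stage0_enabled):
    """Route inlet s through the ESC model and return the outlet reached."""
    line = s
    if extra_enabled and (tag >> v) & 1:
        line ^= 1                  # stage v: exchange on bit 0
    for i in range(v - 1, 0, -1):  # stages v-1 ... 1
        if (tag >> i) & 1:
            line ^= 1 << i
    if stage0_enabled and tag & 1:
        line ^= 1                  # stage 0: exchange on bit 0
    return line


# Fault in stage 1 of an 8 x 8 ESC, primary path faulty: S = 5, D = 3.
tag = esc_tag(5, 3, 3, fault_stage=1, primary_faulty=True)
print(bin(tag), esc_trace(5, tag, 3, *stages_enabled(1)))   # 0b1111 3
```

On the secondary path the line label differs from the primary one in bit 0 throughout stages v - 1 to 1. Since a switch in stage i ≥ 1 only joins labels that agree in bit 0, the secondary path uses different switches in those stages.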
The fault model in the ESC is as follows.
1. Any network component can fail
2. Faulty components are unusable
3. Faults occur independently
4. Multiplexers and demultiplexers as well as links attached to them cannot fail
The fault size of the ESC is 1. Its fault tolerance criterion is full access retention: any inlet must remain capable of accessing any outlet after the ESC recovers from a fault. The ESC is robust in the presence of multiple faults.
The main advantage of the ESC is ease of operation: the multiplexers and demultiplexers have to be adjusted only once after a fault occurs. Another advantage is that the ESC does not need specially designed switches; the normal binary switches are still used. Apart from the links associated with the multiplexers and demultiplexers, no additional links, interstage or intrastage, are needed.
The shortcomings of the ESC, however, can be summarized as follows.
1. For a MIN of size N, N/2 extra switches are needed, in addition to 2N multiplexers and 2N demultiplexers. It should be mentioned that the ESC was later modified [3]: the demultiplexers at the input of the network were replaced by dual input ports, and the multiplexers at the output were replaced by dual output ports.
2. To enable stage v after a fault occurs, an external hardware unit must adjust all the multiplexers and demultiplexers around the stage so that data is routed through stage v rather than around it.
3. The ESC cannot realize a permutation after it recovers from a fault in any stage except stage 0. For example, if a switch in stage x fails, 0 < x < v - 1, at most N - 2 paths can be realized simultaneously, where N is the size of the network.
4. The Extra Stage technique is relatively MIN-specific. It works only with networks of the shuffle family, or with networks whose interstage link maps are identical or can be made identical by rearranging the switches within each stage.
5. After recovery from a fault, time is needed to determine whether the fault lies on the primary path before a new routing tag can be generated. This time constitutes performance degradation, as it slows down the system.
As a whole, the Extra Stage technique is one of the best fault tolerant MIN designs published. It has been adapted for two other networks: the delta network (introduced in Chapter 3) [33] and the gamma network [88], which is also a shuffle network.

4.2 Augmented Shuffle-Exchange MIN

Before discussing this design, a word on the structural layout of shuffle networks is in order. In shuffle networks, the switches in each stage form groups called conjugate subsets. Each conjugate subset of a stage leads to a unique subset of outlets, and the outlet subsets reachable from two different conjugate subsets of a given stage are disjoint.
To illustrate this, consider the Baseline network of Figure 4.2:
- Outlets 0, 1, 2 and 3 can be reached from either switch X(0,1) or switch X(1,1) in stage 1. Therefore, switches X(0,1) and X(1,1) belong to the same conjugate subset.
- Outlets 4, 5, 6 and 7 can be reached from either switch X(2,1) or switch X(3,1) in stage 1. Therefore, switches X(2,1) and X(3,1) belong to the same conjugate subset.
It is clear, then, that stage 1 is made up of two conjugate subsets, and the two subsets of outlets reachable from them are disjoint.
Consider now stages 0 and 2. Since all the outlets are reachable from any switch in stage 0, all switches of stage 0 belong to the same conjugate subset. Stage 2, on the other hand, can easily be seen to have 4 conjugate subsets of one switch each.
It should be noted that a conjugate subset in stage j, 0 ≤ j ≤ v - 2, has access to exactly two conjugate subsets in stage j + 1. More about conjugate subsets can be found in Chapter 5.
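Conjugate subsets can be found mechanically: compute the set of outlets each switch can reach, then group the switches that have equal sets. The Python sketch below does this for a Baseline network of any size that is a power of two. It makes two assumptions:
- the Baseline link pattern is the one captured by the next-switch formula of Section 5.2.1, taking α = N/2^{j+1};
- last-stage switch i serves outlets 2i and 2i + 1.
The function names are illustrative.

```python
def baseline_next(i, j, bit, n):
    """Switch reached in stage j+1 from switch i of stage j (bit 0 = upper output)."""
    alpha = n >> (j + 1)                       # switches per conjugate subset of stage j
    return i // 2 + (alpha // 2) * (i // alpha) + bit * (alpha // 2)


def conjugate_subsets(n):
    """Group the switches of each stage of an n x n Baseline by reachable outlets."""
    v = n.bit_length() - 1                     # number of stages, lg n
    half = n // 2
    reach = [None] * v
    reach[v - 1] = [frozenset({2 * i, 2 * i + 1}) for i in range(half)]
    for j in range(v - 2, -1, -1):             # work backwards from the outlets
        nxt = reach[j + 1]
        reach[j] = [nxt[baseline_next(i, j, 0, n)] | nxt[baseline_next(i, j, 1, n)]
                    for i in range(half)]
    subsets = []
    for j in range(v):
        groups = {}
        for i, outlets in enumerate(reach[j]):
            groups.setdefault(outlets, []).append(i)
        subsets.append(sorted(groups.values()))
    return subsets


print(conjugate_subsets(8))
# [[[0, 1, 2, 3]], [[0, 1], [2, 3]], [[0], [1], [2], [3]]]
```

For N = 8 the result reproduces the three stages discussed above: one subset in stage 0, two in stage 1, and four single-switch subsets in stage 2.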
The Augmented Shuffle Exchange Network (ASEN) is another MIN-specific fault tolerance scheme, proposed [49] for the shuffle family of networks. Figure 4.3 demonstrates it on a network close to the generalized cube discussed in Section 4.1. The published ASEN does not explicitly show the demultiplexers connected to the inlets or the multiplexers connected to the outlets, although they are implicitly present. They are therefore shown explicitly in Figure 4.3, which makes it easier to compare the different designs shown in this thesis.
The ASEN replaces the binary switches of a shuffle network with 3 x 3 switches. These operate like the binary switch but have an auxiliary input and an auxiliary output. The switches of any conjugate subset can always be divided into two groups, and each group has access to all the switches in the two conjugate subsets of the next stage. The ASEN links the switches of each such group through their auxiliary terminals to form a loop, as shown in Figure 4.3.
For instance, switches X(0,0) and X(1,0) of the only conjugate subset in stage 0 have access to the two conjugate subsets of stage 1, so these two switches are looped together as shown. In stage 1, each subset contains only two switches, so each group has only one switch. There is therefore no point in forming a loop around a single switch.
Figure 4.2: 8 x 8 Baseline network

The idea behind these loops is as follows. Suppose a switch in stage 1 fails. A route headed for it from stage 0 can then be sent around the loop to another switch in the group, from which the original outlet can still be reached.
In the ASEN of Figure 4.3, for example, suppose that switch X(1,0) wants to establish a route to outlet 3 but finds that switch X(1,1) is defective. Switch X(1,0) can then send the route through its auxiliary output to switch X(0,0). As can be seen, switch X(0,0) has access to outlet 3, so the route can still be established despite the fault along the original path. Routing in the ASEN is described in more detail below.
The input and output stages, stages 0 and v - 1 respectively, are made fault tolerant in the usual way, with multiplexers and demultiplexers:
- Each inlet has access to two loops, so that if one loop is defective, the inlet can route its connections through the other.
- Each outlet is reachable from two distinct switches, so that if one switch fails, the outlet can still be reached from the other.
It should be noted that this arrangement at the output side of the network eliminates the last stage of switches, stage v - 1.
It is also interesting to note that the number of loops in stage j is $2^{j+1}$. Hence the number of loops in stage v - 2 equals the number of switches in that stage ($2^{v-1}$). Loops are therefore unnecessary in stage v - 2, as it would be meaningless to form a loop around a single switch.
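The loop count can be tabulated to show why stage v - 2 needs no loops. The Python sketch below takes the stated count of 2^{j+1} loops in stage j and N/2 switches per stage, and assumes that the switch stages kept in the ASEN are 0 to v - 2. The function name is hypothetical.

```python
def asen_loop_table(n):
    """Per stage j of an n x n ASEN: (j, loops, switches, switches per loop).

    Uses the text's count of 2**(j+1) loops in stage j; the switch stages
    kept in the ASEN are taken to be 0 .. v-2.
    """
    v = n.bit_length() - 1                 # v = lg N
    switches = n // 2
    rows = []
    for j in range(v - 1):
        loops = 2 ** (j + 1)
        rows.append((j, loops, switches, switches // loops))
    return rows


for row in asen_loop_table(16):
    print(row)
# (0, 2, 8, 4)
# (1, 4, 8, 2)
# (2, 8, 8, 1)   <- stage v-2: one switch per "loop", so no loop is built
```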

4.2.1 Operation and fault tolerance model

Given its construction, the ASEN works as follows. A processor requests a path by putting the routing tag for the destination on the inlet. At each switch in stage j, 0 ≤ j < v - 2, the request may arrive on any of the three inputs. The switch must use the proper bit of the routing tag to extend the path to the next stage on one of its two normal outputs. The next switch may signal that it is busy or faulty, so that the required normal output cannot be used. In that case the switch routes the path on its auxiliary output. (The operation of the ASEN depends on switches that can detect faults in the switches one stage ahead.)
This process continues until stage v - 3 is reached. As mentioned earlier, the switches of stage v - 2 are normal binary switches and behave as such. However, if a switch in that stage finds that the demultiplexer it wants to use is busy, the route cannot continue. Clearly, the demultiplexers should be able to send a busy signal back to stage v - 2. The signal can then be relayed back to the processor requesting the path.

Figure 4.3: The Augmented Shuffle-Exchange Network (ASEN)
To illustrate with an example, consider the ASEN of Figure 4.3. Suppose that switch X(1,1) is faulty and a route must be established from inlet 2 to outlet 0. Under normal circumstances, this route would traverse switches X(1,0) and X(1,1). It would then pass to the destination through the demultiplexer connected to the upper output of switch X(1,1). Since X(1,1) is faulty, switch X(1,0) instead uses its auxiliary output to route the request to switch X(0,0). If switch X(0,0) has its upper output vacant, it can route the request to switch X(0,1). From there, the request can go through the upper output to the destination.
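The per-switch decision described above can be summarized as a small Python function. This is a hypothetical sketch of the rule as stated in the text, not the published switch logic. It assumes that routing bit 0 selects the upper output and that the busy/faulty status of the components one stage ahead is available to the switch as booleans.

```python
from enum import Enum


class Output(Enum):
    UPPER = 0      # normal output selected by routing bit 0
    LOWER = 1      # normal output selected by routing bit 1
    AUX = 2        # auxiliary output into the loop
    BLOCKED = 3    # request cannot continue; a busy signal goes back


def asen_choose_output(bit, ahead_ok, has_loop, aux_ok=True):
    """Choose the output of one ASEN switch for one request.

    bit      -- routing bit for this stage
    ahead_ok -- (upper_ok, lower_ok); False means the switch or demultiplexer
                ahead signalled busy or faulty
    has_loop -- False in stage v-2, whose switches are plain binary switches
    aux_ok   -- False if the next switch around the loop cannot accept the request
    """
    if ahead_ok[bit]:
        return Output(bit)
    if has_loop and aux_ok:
        return Output.AUX
    return Output.BLOCKED


# The example above: X(1,0) needs the output leading to the faulty X(1,1)
# (taken here, for illustration, to be its upper output).
print(asen_choose_output(0, (False, True), has_loop=True))   # Output.AUX
```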
The fault model in the ASEN is essentially the same as that of the ESC. That is,
1. Any network component can fail
2. Faulty components are unusable
3. Faults occur independently
4. Demultiplexers cannot fail
The fault size of the ASEN is 1. Its fault tolerance criterion is full access retention: any inlet must remain capable of accessing any outlet after the ASEN recovers from a fault. The ASEN is robust in the presence of multiple faults.

The advantages of the ASEN are as follows.
1. No external hardware unit is needed to control the network; every routing step remains distributed, as in an ordinary shuffle network.
2. It uses half as many multiplexers and demultiplexers as the ESC.
3. One stage of switches, stage v - 1, is eliminated.

The shortcomings of the ASEN can be summarized as follows.
1. Specially designed switches are needed to construct the ASEN. Unlike the ESC, the ASEN requires 3 x 3 switches with built-in intelligence, so that each switch can decide to which output it will route a request. Adding just one input and one output to a binary switch adds at least 50% to its hardware complexity.
2. The ASEN needs intrastage links to form the loops, which adds to the complexity of the network. Recall that these links are not single lines; they are complete buses carrying data and signal lines, as mentioned at the end of Chapter 3.
3. The number of links connecting the network to the inlets and outlets is double that of a normal shuffle network.
It is not clear in the ASEN what an outlet would do if it received two requests. This is particularly perplexing in a multiprocessor environment, where the outlets are connected to memory modules, which normally have no intelligence. It is also an open question how the multiplexers at the input side would resolve contention if two requests were received simultaneously. Clearly, these multiplexers are not the same as the ones mentioned in the ESC; more likely, they are intelligent components that can make autonomous decisions. Intelligent multiplexers will also be needed in the work developed in Chapter 5.
The above overview of two fault tolerance techniques for MINs shows that the problem is not simple. It is difficult to present a design that has no drawbacks. A good technique is one that tries to minimize those drawbacks rather than eliminate them. These guidelines will be used to develop two fault tolerance techniques in the next two chapters.

4.3 Fault Detection and Location

The operation of any fault tolerant MIN depends on two things: fault detection and fault location. Two techniques have been proposed in the literature for this purpose:
- Off-line: prescribed test patterns are applied to the inlets, and the output at the outlets is compared with the expected values [5,35].
- Online: faults are detected and located dynamically through either parity checking [79] or data bits checking [55].
Attractive as the online techniques may sound, they require a special switch design with built-in hardware to carry out the dynamic checking. This online fault detection and location technique is the mechanism assumed by the ASEN. The ESC, however, does not require any particular mechanism; it requires only that the processors be notified of the location of the fault, if any. For the work done in this thesis, it is assumed that some mechanism detects and locates faults and notifies the processors of the fault's location.


Chapter 5
The Simple Fault Tolerant Baseline Network
As has been mentioned, fault tolerance has been provided for MINs by adding extra hardware: extra switches, extra interstage links, extra intrastage links, or a combination of these. This chapter introduces a fundamentally different approach to fault tolerance in MINs: the Simple Fault Tolerance (SFT) technique. The primary advantage of this technique is that it is not MIN-specific. In fact, it can be used with any MIN that fits the MIN model of Chapter 2. The SFT will be demonstrated below on a Baseline network to construct the Simple Fault Tolerant Baseline (SFTB) network. The 8 x 8 Baseline network of Figure 3.1 is chosen for this demonstration and is repeated here as Figure 5.1 for convenience.

Figure 5.1: Baseline network of size 8

As its name implies, the idea behind the SFT technique is simple. Chapter 1 mentioned that the interconnection mechanism in a multiprocessor system can be either a single bus or a MIN. The SFT technique combines these two mechanisms in one. The MIN is the primary mechanism, and the bus is used only after a fault occurs, and only by the processors affected by the fault. The resulting network thus retains all the characteristics of the original MIN and, in addition, gains fault tolerance at a low cost.
The philosophy behind the SFT technique is that faults exist only temporarily, so drastic changes to a MIN's design to make it fault tolerant are not warranted. On the one hand, such changes normally increase the cost of the MIN. On the other, in some cases the changes tend to harm MIN operation under normal conditions. The latter point may not be apparent to the designer, but it can be made clear as follows. If the fault tolerance capability is embedded in the switches, the switches become more complex, and their propagation delay increases as a direct consequence. This increase is of course unwanted, as it decreases the throughput of the system. The SFT technique, by using an external bus in parallel with the MIN, changes nothing in the original MIN. It therefore cannot harm the operation of the MIN under normal conditions, since the bus is totally invisible under those conditions.

5.1 Design of the SFTB

For the SFTB, the fault model is defined as follows.
1. Any switch can fail: A switch can fail in several ways.
- It can be stuck in one of its legal states, giving a proper connection only if that happens to be the desired state.
- It can be stuck in a partially legal state, such as connecting only one input to one output. Again, this permanent connection may happen to be the one needed to establish a path.
- It can be stuck in an illegal state, such as connecting its two inputs to each other and its two outputs to each other, which makes it totally useless.
- It may respond to its control unit but sometimes or always assume the wrong state.
All these cases will be lumped together as a switch failure, in which the switch is totally useless and must be avoided.
2. Any link can fail: A link fails if it is disconnected from a switch to which it should be connected; an open circuit in a link cuts off communication between two switches. As will be shown, the SFTB can handle link failures, but the discussion will focus mainly on switch failures for two reasons. The first is brevity and clarity: an analysis of link faults would only clutter the work without adding any new substance. The second is that switch failures are harder to recover from. Once a network can recover from switch failures, adapting the results to link faults is trivial.
3. The standby bus, demultiplexers, multiplexers, and external links cannot fail. These items are the hardware that provides fault tolerance to the system. If they were assumed to fail, no fault tolerance design could be proposed. The assumption is also justified here because these components remain idle under normal conditions, so they can be expected to be more reliable than the actively working switches.
It should be mentioned that faults are assumed to occur independently and that faulty components are unusable.
The fault tolerance criterion for the SFTB is full-access retention: after a fault occurs, each processor must still be able to communicate with any memory module. It is worth noting that the enhanced SFTB, to be presented later, achieves a higher fault tolerance criterion, full recovery. Full recovery is the ability of the network to regain its pre-fault connectivity after a fault occurs.

The fault tolerance size is the number of faults from which the system can recover. The SFTB can tolerate any number of faults, up to and including the failure of every switch and every link. In other words, even if every component of the Baseline network fails, the processors can still communicate with memory. One caveat, however, is that the more faults exist, the worse the SFTB performs. For this reason, and to keep performance close to the pre-fault level, the SFTB will be analyzed as a single fault tolerant network.
With that in mind, the SFTB can now be described. Starting with a Baseline network, an external bus is added that bypasses the network and connects the processors to memory. Each processor is connected to both the network and the bus through a demultiplexer. Similarly, each memory module is connected to both the network and the bus through a multiplexer. Under normal conditions, the processors and memory are connected only to the network. Since the bus does not become active until a fault occurs, it is called a standby bus.
This technique is applied to the Baseline network of Figure 5.1 to create the SFTB, shown in Figure 5.2. As can be seen, each inlet is connected both to the ordinary network and to the standby bus through a 1 x 2 demultiplexer. The demultiplexer can be viewed as a switch that connects its input to only one of its two outputs at a time, based on a control (selection) bit. When the demultiplexer connects the inlet to the network, it is said to be in the 0 position; in the 1 position, it connects the inlet to the bus. Similarly, each outlet is connected to both the network and the standby bus through a 2 x 1 multiplexer. Here again, the multiplexer can be in the 0 position, connecting the network to the outlet, or in the 1 position, connecting the bus to the outlet.

The SFTB operates as follows. Under normal conditions, the multiplexers and demultiplexers should be in the 0 position. This makes the SFTB functionally identical to the ordinary Baseline. It follows that under normal conditions the SFTB uses the routing algorithm of the ordinary network, so the added standby bus has no negative impact on the network's operation.

Figure 5.2: The SFT equivalent of the Baseline network of Figure 5.1

5.2 Routing the SFTB Under Faulty Conditions

When a fault occurs, the SFTB must be reconfigured to cope with it. A mechanism for detecting and locating faults must therefore be used to invoke this reconfiguration process.
As indicated earlier, the discussion will deal only with switch faults, to avoid cluttering it unnecessarily. Assuming now that a switch has been identified as faulty, the normal sequence of establishing paths is modified as follows.
1. At the beginning of each memory cycle, each processor requiring memory access must first determine whether the defective switch is along the path it wants to establish.
2. If the defective switch is along the path, the processor must access memory using the standby bus instead of the network.
3. If the defective switch is not along the path, the processor starts the memory cycle as it would under normal conditions.
Step 1 above causes some performance degradation, because the processor must take time to determine whether the defective switch lies along the path; the details will be discussed later. Step 2 requires accessing the bus. More than one processor (at most two) may try to access the bus simultaneously, so a contention problem may arise. This is especially likely in synchronous operation, where all processors start their memory cycles at the same time. Some solutions to this problem are presented in Section 5.2.2. Step 3, establishing the path over the network, is the ordinary Baseline procedure used under normal conditions and needs no further explanation. After a fault occurs, all processors except at most two can still use the same routing scheme they use under normal conditions. This is a principal advantage of the SFT technique.

5.2.1 Performance degradation under faulty conditions

As has been shown, the SFT technique is completely transparent under normal conditions: the SFTB performs identically to the ordinary Baseline network. When the network is faulty, however, some performance degradation occurs. At the beginning of each memory cycle, a processor requiring access to memory must find out whether the faulty switch lies on its path to the destination.
Suppose that processor S, 0 ≤ S ≤ N - 1, needs a path to destination D, 0 ≤ D ≤ N - 1, whose binary representation is $d_{v-1} d_{v-2} \dots d_0$. Recall that the binary representation of D is the routing tag needed to establish the path from S to D. On a Baseline network, the path starts at switch X(⌊S/2⌋, 0) and ends at switch X(⌊D/2⌋, v - 1). Given the current switch X(i, j) and the routing bit $d_{v-1-j}$, the next switch along the path, X(i', j + 1), 0 ≤ i' ≤ (N/2) - 1, is found as follows:
$$
i' = \begin{cases}
\lfloor i/2 \rfloor + (\alpha/2)\lfloor i/\alpha \rfloor & \text{if } d_{v-1-j} = 0,\\
\lfloor i/2 \rfloor + (\alpha/2)\lfloor i/\alpha \rfloor + \alpha/2 & \text{if } d_{v-1-j} = 1,
\end{cases}
\qquad \alpha = N/2^{j+1}.
$$
Here $\lfloor x \rfloor$ denotes the integer part of the real number x, and α is the number of switches in a conjugate subset of stage j.
Assume now that switch X(a, b) is defective. Processor S can apply the above formula recursively to determine whether X(a, b) is along the path to memory module D. Procedure AVOID_NETWORK below uses this formula, and each processor requiring memory access must run it at the beginning of the memory cycle. The procedure sets a binary flag, avoid:
- If avoid is set, the defective switch is along the path, and the processor must use the bus to establish the required connection.
- If avoid is reset, the processor can use the network as if there were no fault.
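The next-switch formula can be exercised by tracing complete paths. The Python sketch below makes two assumptions:
- α = N/2^{j+1};
- the path starts at switch X(⌊S/2⌋, 0).
It applies the formula stage by stage. Every path should end at switch X(⌊D/2⌋, v - 1), which gives a quick consistency check. The function names are illustrative.

```python
def baseline_next(i, j, bit, n):
    """Next switch X(i', j+1) on an n x n Baseline, from X(i, j) and routing bit d_{v-1-j}."""
    alpha = n >> (j + 1)                      # alpha = N / 2^(j+1)
    return i // 2 + (alpha // 2) * (i // alpha) + bit * (alpha // 2)


def baseline_path(v, s, d):
    """Row index of the switch used in each stage 0 .. v-1 on the path S -> D."""
    n = 1 << v
    i = s // 2                                # first-stage switch X(floor(S/2), 0)
    path = [i]
    for j in range(v - 1):
        bit = (d >> (v - 1 - j)) & 1          # routing bit d_{v-1-j}
        i = baseline_next(i, j, bit, n)
        path.append(i)
    return path


print(baseline_path(3, 5, 3))   # [2, 1, 1]
```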
PROCEDURE AVOID_NETWORK (v, a, b, S, D)
BEGIN
    IF b = 0 THEN
        IF ⌊S/2⌋ = a THEN avoid ← 1 ELSE avoid ← 0
    ELSE IF b = v - 1 THEN
        IF ⌊D/2⌋ = a THEN avoid ← 1 ELSE avoid ← 0
    ELSE
    BEGIN
        avoid ← 0, j ← 0, i ← ⌊S/2⌋
        WHILE j < b DO
        BEGIN
            α ← N/2^(j+1)
            IF d_{v-1-j} = 0 THEN i' ← ⌊i/2⌋ + (α/2)⌊i/α⌋
            ELSE i' ← ⌊i/2⌋ + (α/2)⌊i/α⌋ + α/2
            j ← j + 1, i ← i'
        END {while}
        IF i = a THEN avoid ← 1
    END {else}
    RETURN (avoid)
END {AVOID_NETWORK}
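For readers who want to experiment, AVOID_NETWORK translates directly into Python. This sketch assumes N = 2^v and α = N/2^{j+1}, as in the formula above. The function names are illustrative, and the return value follows the procedure's avoid flag.

```python
def baseline_next(i, j, bit, n):
    alpha = n >> (j + 1)                      # switches per conjugate subset in stage j
    return i // 2 + (alpha // 2) * (i // alpha) + bit * (alpha // 2)


def avoid_network(v, a, b, s, d):
    """Return 1 if the faulty switch X(a, b) lies on the Baseline path S -> D, else 0."""
    n = 1 << v
    if b == 0:                                # first stage: switch floor(S/2)
        return int(s // 2 == a)
    if b == v - 1:                            # last stage: switch floor(D/2)
        return int(d // 2 == a)
    i = s // 2
    for j in range(b):                        # b <= v-2 iterations
        i = baseline_next(i, j, (d >> (v - 1 - j)) & 1, n)
    return int(i == a)


# N = 8: the path 5 -> 3 uses X(2,0), X(1,1), X(1,2).
print(avoid_network(3, 1, 1, 5, 3))   # 1: use the standby bus
print(avoid_network(3, 0, 1, 5, 3))   # 0: use the network
```

Faults in the first or last stage are decided without entering the loop. Otherwise the loop runs b times, at most v - 2, which is the bound used in the timing estimate.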

This procedure is the only difference between the operation of the SFTB under normal and faulty conditions. Its execution time is therefore the measure of performance degradation under faulty conditions.
To estimate this time, notice first that if the network has two stages, v = 2, the procedure never enters the WHILE loop. For v > 2, the statements outside the loop are executed only once; assume they take time $T_0$, which is O(1). In the worst case the loop is executed v - 2 times, which happens when the faulty switch is in stage v - 2. If one loop iteration takes time $T_l$, the worst-case run time of the procedure, $T_p$, is
$$T_p = T_0 + (v - 2)\,T_l.$$
Since $T_l = O(1)$, $T_p = O(v)$. This is acceptable in view of the alternative, which is to notify the processor by hardware means that the path cannot be established because of a faulty switch. That notification would be a signal sent from the faulty switch back to the processor. In the worst case, a fault at stage v - 2, the request must traverse the v - 1 switches of stages 0 through v - 2 and the signal must return through them. The signal thus takes twice the propagation delay from the processor to the faulty switch, that is, $2(v-1)T_s$. Here $T_s$ is the switch propagation delay, estimated [69] at 8 gate delays.
In either case, software or hardware, the delay under faulty conditions is O(v). The software procedure, however, has the advantage of requiring neither specially designed switches nor extra interstage signaling lines.
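The two worst-case costs can be tabulated to show how each grows with network size. The Python sketch below evaluates the v - 2 loop iterations of AVOID_NETWORK and the 2(v - 1)T_s signal round trip, with T_s = 8 gate delays as quoted above. The units differ (loop iterations versus gate delays), so a direct comparison would also need a value for $T_l$, which the text does not give. The function names are illustrative.

```python
def software_loop_iterations(v):
    """Worst-case WHILE-loop iterations of AVOID_NETWORK (fault in stage v-2)."""
    return max(v - 2, 0)


def hardware_signal_delay(v, gates_per_switch=8):
    """Worst-case round trip 2(v-1)T_s, in gate delays, with T_s = 8 gate delays."""
    return 2 * (v - 1) * gates_per_switch


for n in (8, 64, 1024):
    v = n.bit_length() - 1                    # v = lg N
    print(n, v, software_loop_iterations(v), hardware_signal_delay(v))
# 8 3 1 32
# 64 6 4 80
# 1024 10 8 144
```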

5.2.2 Accessing the bus

After finding that a path cannot be established on the network because of a fault, the processor must use the bus to establish that path. If the fault is in a link, the single affected processor can start using the bus without problems. If the fault is in a switch, however, up to two processors will need the bus, and a conflict occurs if both start using it simultaneously.
To avoid this contention, the bus is given an extra line, the bus-busy line, indicating whether the bus is currently available. A processor wanting to use the bus must first sense the bus-busy line. If the line is low, the processor asserts it and starts using the bus. If the line is high, the processor waits in a loop until it goes low and then seizes it. A problem arises, however, if the two processors test the line simultaneously, both find it low, and both try to seize it at the same time. This is most likely in a synchronous environment, where all processors start accessing memory at the same time.
This situation calls for a more elaborate mechanism. Two such mechanisms are suggested and discussed below: one depends on a Central Control Unit (CCU), and the other is dynamic.
The CCU is a hardware unit that can be accessed by all processors. Its function is to receive requests for the bus and grant access to one processor at a time. A processor wanting the bus sends a request to the CCU giving its number and the memory module number. The CCU identifies the multiplexer in question, puts it in the 1 position, and sends an acknowledgement to the processor, upon which the processor can safely start using the bus. Notice that the demultiplexers can always be set by the processors without the help of any hardware unit. The CCU grants the bus immediately to the processor that asks for it. But if the two processors submit their requests at exactly the same time, a built-in arbiter [70] must decide to which processor the bus is granted. The arbitration can be either random or prioritized. At the end of the memory cycle, the CCU must put the multiplexer back in the 0 position, preparing for the next memory cycle.
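The CCU's behaviour can be sketched as a small state machine. This is an illustration under stated assumptions, not the hardware design: the class and method names are hypothetical, the "prioritized" arbiter is assumed to favour the lower processor number, and each memory module is assumed to have one multiplexer whose position 1 selects the standby bus.

```python
import random
from typing import List, Optional, Tuple


class CentralControlUnit:
    """Toy CCU model: one standby bus, one 2x1 multiplexer per memory module."""

    def __init__(self, num_modules: int, policy: str = "priority", seed: int = 0):
        self.mux = [0] * num_modules          # 0 = network, 1 = standby bus
        self.owner: Optional[int] = None      # processor currently holding the bus
        self.module: Optional[int] = None     # module whose multiplexer is in position 1
        self.policy = policy                  # "priority" or "random" (assumed names)
        self.rng = random.Random(seed)

    def request(self, requests: List[Tuple[int, int]]) -> Optional[int]:
        """Handle (processor, module) requests arriving in the same cycle.

        Returns the processor that receives the acknowledgement, or None if the
        bus is already held or there are no requests."""
        if self.owner is not None or not requests:
            return None
        if self.policy == "priority":
            proc, mod = min(requests)         # assumed: lower processor number wins
        else:
            proc, mod = self.rng.choice(requests)
        self.mux[mod] = 1                     # connect the module to the bus
        self.owner, self.module = proc, mod
        return proc

    def end_of_cycle(self) -> None:
        """Return the multiplexer to position 0 and release the bus."""
        if self.module is not None:
            self.mux[self.module] = 0
        self.owner = self.module = None
```

For example, if processors 4 and 5 lose their paths through the same faulty switch and request in the same cycle, `request([(5, 6), (4, 1)])` grants the bus to processor 4; processor 5 obtains it by requesting again after `end_of_cycle()`.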
Instead of relying on the services of a central control unit, the processors can perform these services themselves dynamically. A processor wanting to communicate over the bus can put its own demultiplexer in the 1 position. However, two problems can arise: competing for the bus, and setting the multiplexers on the other side of the network. The bus contention problem can be solved using the bus-busy line mentioned above, together with a round-robin scheme for the processors to test that line. In the round-robin scheme, a processor wanting to use the bus counts a number of clock cycles equal to its number in the system before it tests the bus-busy line. If the line is low, the processor asserts it and starts using the bus on the next clock pulse. If the line is high, the processor keeps testing it continuously until it goes low, then asserts it and starts using the bus on the next clock pulse.

If the clock period is $T$, then the minimum wait time before seizing the bus is $T$ for processor 0 and $NT$ for processor $N-1$, with the average waiting time being $(N+1)T/2$, or approximately $NT/2$ for large $N$. It should be noted here that $T$ must be greater than $\tau_{\max}$, the propagation delay between processors 0 and $N-1$. Otherwise, processor $i$, $0 \le i \le N-1$, will have to wait $(i+1)\tau_{\max}$ instead of $(i+1)T$ before seizing the bus.
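The timing rule of the round-robin scheme can be checked with a few lines of Python. This is a sketch under the assumptions stated in the text (processor $i$ waits $i$ periods, tests, and seizes on the next pulse; each step is paced by $\tau_{\max}$ if $T \le \tau_{\max}$); the function names and the numbers in the example are hypothetical.

```python
from typing import List


def min_wait(i: int, clock_period: float, tau_max: float) -> float:
    """Minimum time before processor i can seize an idle bus.

    Processor i counts i periods, tests the bus-busy line, and uses the bus
    from the next pulse, i.e. after (i + 1) steps. A step lasts T if
    T > tau_max, and tau_max otherwise, as noted in the text."""
    step = clock_period if clock_period > tau_max else tau_max
    return (i + 1) * step


def first_to_seize(contenders: List[int], clock_period: float, tau_max: float) -> int:
    """Staggered testing breaks the tie between simultaneous requests:
    the processor with the shortest count tests first and seizes the bus."""
    return min(contenders, key=lambda i: min_wait(i, clock_period, tau_max))


N, T, TAU = 8, 1.0, 0.5                 # hypothetical values with T > tau_max
waits = [min_wait(i, T, TAU) for i in range(N)]
print(waits[0], waits[-1], sum(waits) / N)   # 1.0 8.0 4.5, i.e. T, NT, (N+1)T/2
print(first_to_seize([5, 4], T, TAU))        # 4
```

The design choice being illustrated is that the fixed, distinct counts turn a simultaneous test of the bus-busy line into a strictly ordered one, at the cost of a wait that grows linearly with the processor number.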
Assuming now that the processor has seized the bus dynamically, it puts on the bus the number of the memory module it wants to communicate with. This number is decoded and used on the other side of the network both to put the multiplexer in the 1 position and to enable the memory module. This is the mechanism adopted in uniprocessor systems with one bus. The processor can then proceed to talk to the memory module as in a uniprocessor. Once it finishes communicating with the memory module, the processor should put the multiplexer and demultiplexer back in the default, 0, position.

5.3 Design of the Enhanced SFTB

In some applications, full-access retention alone may not be enough as a fault-tolerance criterion. Note that with only a full-access retention capability, the faulty network cannot necessarily realize every permutation it could realize before the fault. If the network is used mainly to realize permutations, then a higher fault-tolerance criterion must be imposed: full recovery. This allows the network, after recovery, to realize any connection pattern it was capable of before the fault. Such a criterion can easily be met for any network using binary switches with the help of an enhanced version of the SFT technique. The idea is to add two standby buses to the network instead of one. Recall that the worst single fault in a network with only binary switches, such as the Baseline network or the Benes network, prevents the realization of exactly two paths. With two standby buses, these two paths can be implemented, resulting in full recovery of the system. The enhanced technique will be demonstrated below on a Baseline network, where the advantages of this enhancement under both normal and faulty conditions will be illustrated.

The enhanced SFT equivalent of the Baseline network of Figure 5.1 is shown in Figure 5.3. Using two buses entails the use of demultiplexers and multiplexers of size 1 × 4 and 4 × 1, respectively. Each of the multiplexers or demultiplexers has two selection lines controlling its operation mode, as shown in Table 5.1.

Table 5.1: Multiplexer and demultiplexer operation modes

Selection word    Selection
00                inputs of network
01                standby bus No. 1
10                standby bus No. 2
11                unused
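Table 5.1 amounts to a two-bit select decoder. The sketch below models a 4 × 1 multiplexer directly from the table; the names are hypothetical, and treating selection word 11 as "no connection" is an assumption, since the table only marks it as unused.

```python
from enum import Enum


class Mode(Enum):
    NETWORK = 0b00   # selection word 00: inputs of network
    BUS_1 = 0b01     # 01: standby bus No. 1
    BUS_2 = 0b10     # 10: standby bus No. 2
    UNUSED = 0b11    # 11: unused


def mux4(select: int, network, bus1, bus2):
    """4x1 multiplexer driven by the two selection lines of Table 5.1."""
    mode = Mode(select & 0b11)
    if mode is Mode.NETWORK:
        return network
    if mode is Mode.BUS_1:
        return bus1
    if mode is Mode.BUS_2:
        return bus2
    return None      # assumed: the unused word connects nothing


print(mux4(0b10, "net", "bus1", "bus2"))   # bus2
```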
Some work on multiple-bus systems has already been performed [61,62,83], where solutions to the bus access problem are suggested. Basically, the techniques used for accessing a bus in a multibus system are similar to those discussed for the SFTB, with CCUs equipped with arbiters being the most recommended.

5.3.1 Permutation realization capabilities of the enhanced SFTB

As indicated earlier, an $N \times N$ Baseline network cannot realize every permutation in the symmetric group $\Sigma_N$ [37]. Only a class of permutations can be realized, and this class has already been identified [1]. As will be proven later, if there is blocking while a permutation is being realized on a Baseline network, the minimum number of blocked paths is 2. If a permutation is blocked in exactly 2 paths, it can be realized blocking-free on the enhanced SFTB by realizing the two blocked paths on the two standby buses.

Figure 5.3: The enhanced SFT equivalent of the Baseline network of Figure 5.1

A permutation, $P$, of a set of elements $A = \{0, 1, \ldots, N-1\}$ is a one-to-one function mapping $A$ onto itself, such that

$$P = \{P_i : i, P_i \in A\},$$

where $P_i = P_j$ if and only if $i = j$, for all $i, j \in A$. Realizing permutation $P$ on a MIN means connecting inlet $i$ to outlet $P_i$ for all $i, P_i \in A$, where $A$ in this case is the set of all inlets (or outlets). A permutation is also sometimes called a bijection [37]. The size of a permutation is the cardinality of the set it acts on. A size-$N$ permutation is one that maps a set of cardinality $N$. In this section, a permutation with $\ell$, $0 \le \ell \le N$, unrealizable elements will be called a permutation with $\ell$ blocked paths. The switch where a conflict occurs is called a blocking switch.
The enumeration of permutations with different numbers of blocked paths on the Baseline network will be performed by counting the connection patterns on the network, which should be unique for distinct permutations. A connection pattern can be looked upon as a "snapshot" of the settings of the switches and the location and type of the blocking. It turns out, however, that with the same permutation, one can see more than one connection pattern unless some consistency is adopted. Specifically, a fixed arbitration policy must be maintained for all switches in case a conflict occurs. A conflict occurs if the two inputs of a switch ask for the same output. The switch may give priority to the upper input, in which case it is said to adopt a Priority To Upper (PTU) policy, or it may give priority to the lower input, in which case it is said to adopt a Priority To Lower (PTL) policy. Assigning a fixed priority policy to each switch throughout the counting process is necessary to achieve a one-to-one relationship between each permutation and the connection pattern resulting from its realization on the network. It should be noted, however, that in practice a switch may change its priority policy adaptively.

Once this consistency is achieved, one can easily count the number of permutations of every blocking class, taking one pattern to represent one unique permutation. To show that changing the arbitration policy of a switch can create more than one connection pattern for the same permutation, consider the permutation

$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 2 & 1 & 4 & 6 & 7 & 3 & 0 & 5 \end{pmatrix}.$$

Realizing this permutation on the $8 \times 8$ network of Figure 5.1, with all switches having a PTU policy, results in two blocked paths and 6 conflict-free paths, as shown in Figure 5.4.
The two blocked paths are those originating at inlets 1 and 3; these two inlets are marked in the figure. Also shown are the two blocking switches, with arrows inside. Each arrow shows the original direction of the blocked path and how the path was blocked. The goal now is to take a "snapshot" of the pattern in Figure 5.4, including the blocking switches and the arrows inside, and consider it to represent one permutation, namely $P$ above.

Now consider permutation $P$ again, but this time assign switch $X(1,0)$ a PTL policy. The resulting connection pattern is shown in Figure 5.5. It can be seen that the pattern is different from that of Figure 5.4. It can also be seen that although this is still a permutation with two blocked paths, the two blocked paths differ from those of the previous realization; in Figure 5.5, the two blocked paths are those originating at inlets 1 and 2.

Besides having only one pattern for each permutation, to be able to count the permutations with $\ell$ blocked paths, the converse should also be true. That is, there should exist only one permutation for each connection pattern. This last requirement is evidently true.
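The realization process and the PTU/PTL conventions can be simulated directly. The sketch below is a hypothetical model, not taken from Figure 5.1 itself: it assumes the standard recursive Baseline wiring (inlets $2i$ and $2i+1$ enter $X(i,0)$; inside each block of $b$ lines, the upper output of the $k$-th switch feeds line $k$ of the next stage and its lower output feeds line $b/2+k$), routing on destination bits from the most significant down, and the rule stated in the proof of Theorem 5.1 that a blocked path stops requesting and simply takes the leftover output. Under these assumptions, $P$ yields two blocked paths under PTU and a different pair when $X(1,0)$ uses PTL, consistent with the description above.

```python
from collections import Counter
from itertools import permutations


def realize(perm, ptl_switches=frozenset()):
    """Realize inlet i -> outlet perm[i] on an N x N Baseline network (hypothetical model).

    Switches are PTU unless (i, j) is in ptl_switches. Returns (reached, blocked):
    reached[i] is the outlet inlet i actually reaches, blocked the blocked inlets."""
    n = len(perm)
    v = n.bit_length() - 1
    if n < 2 or 1 << v != n or sorted(perm) != list(range(n)):
        raise ValueError("perm must permute 0..N-1 with N a power of two")
    lines = list(range(n))                  # lines[k]: inlet carried on line k
    blocked = set()
    for j in range(v):
        bit = v - 1 - j                     # stage j examines destination bit v-1-j
        out = [0] * n
        for i in range(n // 2):
            up, lo = lines[2 * i], lines[2 * i + 1]
            # requested output: 0 = upper, 1 = lower, None = already blocked
            r_up = None if up in blocked else (perm[up] >> bit) & 1
            r_lo = None if lo in blocked else (perm[lo] >> bit) & 1
            if r_up is not None and r_up == r_lo:       # conflict at X(i, j)
                winner, loser = (lo, up) if (i, j) in ptl_switches else (up, lo)
                blocked.add(loser)
                top = winner if r_up == 0 else loser
            elif r_up is not None:
                top = up if r_up == 0 else lo
            elif r_lo is not None:
                top = lo if r_lo == 0 else up
            else:
                top = up                                 # no request: straight
            out[2 * i], out[2 * i + 1] = top, (lo if top == up else up)
        b = n >> j                                       # block size at stage j
        lines = [0] * n
        for k, inlet in enumerate(out):                  # inverse shuffle per block
            off = k % b
            lines[k - off + (off % 2) * (b // 2) + off // 2] = inlet
    reached = {inlet: outlet for outlet, inlet in enumerate(lines)}
    return reached, blocked


def blocking_profile(n):
    """Count size-n permutations by their number of blocked paths (all switches PTU)."""
    return Counter(len(realize(p)[1]) for p in permutations(range(n)))


P = [2, 1, 4, 6, 7, 3, 0, 5]
print(sorted(realize(P)[1]), sorted(realize(P, {(1, 0)})[1]))   # [1, 3] [1, 2]
print(sorted(blocking_profile(4).items()))                       # [(0, 16), (2, 8)]
```

In this model the blocked inlet of each pair lands on the outlet wanted by the other, which is the situation used in the proof of Theorem 5.3 below, and a brute-force `blocking_profile(8)` can be compared with the counts derived later in this section.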
Although the focus here is on permutations with exactly two blocked paths, some interesting results on the capability of the Baseline network to realize random permutations will be presented.

Figure 5.4: Permutation P realized with a PTU policy at all switches

Figure 5.5: The same permutation as in Figure 5.4, realized with a PTU policy at all switches except X(1,0)
Let $\mathcal{P}_N^{(\ell)}$ be the set of all permutations of size $N$ that, when realized on an $N \times N$ Baseline network, have exactly $\ell$ blocked paths, $0 \le \ell \le N$, and $N - \ell$ conflict-free paths. Then, clearly,

$$\sum_{\ell=0}^{N} \left|\mathcal{P}_N^{(\ell)}\right| = N!,$$

where $|\mathcal{P}_N^{(\ell)}|$ denotes the cardinality of the set $\mathcal{P}_N^{(\ell)}$.

Theorem 5.1 In a permutation $P \in \mathcal{P}_N^{(\ell)}$, let the number of blocking switches be denoted as $\eta$. Then $\eta = \ell$.
Proof. The theorem will be proven in two steps. First, $\eta \ge \ell$ is proven by contradiction. Assume that $\eta < \ell$. This means that there are $k$ switches, $1 \le k \le \ell/2$, each blocking two paths. This implies that a switch can be in a state where neither of its two inputs can get through. Obviously, this is not true because, from the way the switches operate, at least one input must get through. The second part is to prove that $\eta \le \ell$. This again will be proven by contradiction. Assume that $\eta > \ell$. This entails that a path can be blocked in more than one switch. But by definition, if a path is blocked at a switch, it stops there; it does not go to another switch where it may be blocked again. Thus the second assumption is also wrong and the theorem is proven. □
Theorem 5.2 For any $N \times N$ Baseline network,

$$\left|\mathcal{P}_N^{(0)}\right| = 2^{v 2^{v-1}}.$$

Proof. From the unique-path property of the Baseline network, realizing a permutation should result in a unique connection pattern. Arbitrarily put each switch in the Baseline network in one of its two legal states, shown in Figure 3.4. The resulting pattern will represent one size-$N$ permutation. By changing the switch settings, different patterns will appear, each representing a conflict-free permutation. In any Baseline network the number of switches is $vN/2 = v2^{v-1}$. Since each switch can be in one of two states, the total number of patterns that can be created by changing the states of the switches is $|\mathcal{P}_N^{(0)}| = 2^{v2^{v-1}}$. □
Theorem 5.3 For any $N \times N$ Baseline network, $|\mathcal{P}_N^{(1)}| = 0$.

Proof. To prove this theorem, it is required to prove that once there is a blocked path, there is at least one other blocked path. Consider that the path between inlet $i$ and outlet $o$ is blocked. Then there is a path between $i$ and a wrong outlet $o'$. This means that $i$ is missing its right outlet, i.e. a missing path. But a permutation, by definition, is a one-to-one and onto mapping of $N$ elements onto themselves. Thus, outlet $o'$ must have an inlet in the permutation to which it should be connected. This means that $o'$ is missing its right inlet, i.e. another missing path. That is, one cannot have a permutation blocked in exactly one path, or $|\mathcal{P}_N^{(1)}| = 0$. □
It should not be inferred, however, that blocked paths occur in pairs, because overlapping is possible for more than two blocked paths. In other words, it is possible to see a pair of blocked paths satisfying the above theorem, and another pair also satisfying the theorem, with one path common to the two pairs, resulting in a total of three blocked paths. The theorem is demonstrated in Figures 5.4 and 5.5, where the two blocked paths are shown. It can be seen that when a path is blocked, the inlet can still reach an outlet, but the wrong one.
It has been proven [6] that the minimum number of conflict-free paths is $2^{\lceil v/2 \rceil}$. In the notation developed here, this translates to

$$\left|\mathcal{P}_N^{(\ell)}\right| = 0 \quad \text{for all } \ell,\; N - 2^{\lceil v/2 \rceil} < \ell \le N.$$
From the Baseline network of Figure 5.1, it can be seen that the set of outlets accessible to switch $X(i,j)$ depends on both $i$ and $j$. For example, any switch in stage 0 can access any outlet. A switch in stage 1 has access to only half the outlets. For instance, switch $X(2,1)$ has access only to outlets 4 through 7. Let $\Omega$ be the set of all outlets, and let $S_j$ be the set of all switches of stage $j$. Also, let $S_{u,j} \subseteq S_j$ be the subset of all switches $X(i,j)$ that have access to the same subset of outlets $\Omega_{u,j} \subseteq \Omega$. Then

$$\Omega_{u,j} = \Omega_{u',j} \text{ if and only if } u = u'.$$

$S_{u,j}$ is called a conjugate subset of switches [49]. It can be seen that each $S_j$ is divided into $\gamma_j$ disjoint subsets $S_{u,j}$, $0 \le u \le \gamma_j - 1$, such that

$$\bigcup_{u=0}^{\gamma_j - 1} S_{u,j} = S_j$$

and

$$S_{u,j} \cap S_{u',j} = \begin{cases} S_{u,j} & \text{if } u = u', \\ \emptyset & \text{otherwise,} \end{cases}$$

where $\emptyset$ denotes the empty set. It can easily be verified that for all $j$, $0 \le j \le v-1$, $\gamma_j = 2^j$ and $|S_{u,j}| = 2^{v-j-1}$, where $0 \le u \le \gamma_j - 1$, and

$$\sum_{u=0}^{\gamma_j - 1} |S_{u,j}| = |S_j| = N/2.$$
Considering the subset structure of the Baseline network, a new representation for the network can be arrived at: the subset representation. For the Baseline network of Figure 5.1, the subset representation is shown in Figure 5.6. Subset $S_{u,j}$ at stage $j$, $0 \le j \le v-2$, has access to exactly 2 subsets in stage $j+1$, namely $S_{2u,j+1}$ and $S_{2u+1,j+1}$. These two subsets are complementary in the sense that if an input of a switch in subset $S_{u,j}$ cannot reach one of them because of a conflict, it will by default reach the other. This observation will be utilized in the next theorem.

Figure 5.6: Subset structure of the Baseline network of Figure 5.1
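The subset bookkeeping can be written as three small functions. The sketch assumes the switch numbering used in this chapter, in which stage $j$ holds $2^j$ subsets of $2^{v-j-1}$ consecutively numbered switches and upper outputs feed $S_{2u,j+1}$; the function names are hypothetical.

```python
from typing import Tuple


def subset_index(i: int, j: int, v: int) -> int:
    """Index u of the conjugate subset S_{u,j} that contains switch X(i, j)."""
    return i // 2 ** (v - j - 1)


def reachable_outlets(i: int, j: int, v: int) -> range:
    """Outlets Omega_{u,j} reachable from X(i, j): 2**(v - j) consecutive outlets."""
    u = subset_index(i, j, v)
    size = 2 ** (v - j)
    return range(u * size, (u + 1) * size)


def successor_subsets(u: int) -> Tuple[int, int]:
    """S_{u,j} feeds exactly S_{2u,j+1} (upper outputs) and S_{2u+1,j+1} (lower outputs)."""
    return 2 * u, 2 * u + 1


print(list(reachable_outlets(2, 1, 3)))   # [4, 5, 6, 7], as stated for X(2,1)
```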
Theorem 5.4 In a permutation $P \in \mathcal{P}_N^{(2)}$, let the two blocking switches be $X(i,j)$ and $X(i',j')$. Then $j = j'$, and $X(i,j), X(i',j') \in S_{u,j}$, where $u = \lfloor i/2^{v-j-1} \rfloor = \lfloor i'/2^{v-j-1} \rfloor$.
Proof. From the subset structure of the Baseline network, a switch in subset $S_{u,j}$, $0 \le j \le v-2$, has access only to the two subsets $S_{2u,j+1}$ and $S_{2u+1,j+1}$. Suppose one of the two inputs of switch $X(i,j) \in S_{u,j}$ wants to go to subset $S_{2u,j+1}$ but cannot enter because of a conflict. Then that input will be connected incorrectly to $S_{2u+1,j+1}$. This wrong connection will, of course, be at the expense of a right connection, which will incorrectly go to $S_{2u,j+1}$ from another switch. That is, the two switches from which wrong connections emanate both have access to the two subsets $S_{2u,j+1}$ and $S_{2u+1,j+1}$, and both are one stage before $j+1$. These two switches can only exist in one unique subset in the network, $S_{u,j}$. □
By looking at Figure 5.4 or 5.5, one can recognize two types of blocking: the first is when the Two inputs Request the Upper (TRU) output, and the second is when the Two inputs Request the Lower (TRL) output. These two types of blocking are shown in Figure 5.7. The arrow indicates the direction in which the lower input wanted to go but could not, because of a conflict with the upper input, which has higher priority.

Theorem 5.5 In a permutation $P \in \mathcal{P}_N^{(2)}$, if one of the two blocking switches has a TRL-type blocking, the other must have a TRU-type blocking, and vice versa.
Proof. The proof follows again from the subset structure of the network and from the definition of a permutation. The requests of a given subset must be divided exactly in half between the next two subsets. If the two inputs of one switch ask for the lower subset (TRL), then the two inputs of the other switch must be asking for the upper subset (TRU). By the same token, the converse is also true. □

Figure 5.7: The two types of blocking, assuming higher priority for the upper input: (a) Two Request Upper (TRU); (b) Two Request Lower (TRL)
The above theorems and discussion will be used to find the number of permutations with exactly two blocked paths, $|\mathcal{P}_N^{(2)}|$. First, from Theorem 5.1, there are exactly two blocking switches. For each position assumed by the two blocking switches, one can arbitrarily set the remaining switches, with each setting representing one permutation with exactly two blocked paths. Let $B$ be the number of permutations $P \in \mathcal{P}_N^{(2)}$ caused by blocking in two specific switches having two specific blocking types. Since the states of the two blocking switches are fixed by their blocking types, there is one such permutation for each combination of states of the remaining $v2^{v-1} - 2$ switches. Thus,

$$B = 2^{v2^{v-1} - 2}. \tag{5.1}$$

Now, let $C$ be the number of ways in which the two blocking switches can appear in the network. Then the total number of permutations with exactly two blocked paths can be found as

$$\left|\mathcal{P}_N^{(2)}\right| = BC. \tag{5.2}$$

Recall from Theorem 5.5 that the two blocking switches always have opposite (or complementary) types of blocking. Therefore the two switches are distinguishable, and in finding $C$, permutations rather than combinations must be used. Recall further from Theorem 5.4 that the two switches cannot assume arbitrary locations in the network; they must be in the same subset. Let $C_u$ and $C_j$ be the number of ways the two switches can appear in subset $S_{u,j}$ and in stage $j$, respectively. Then

$$C_u = p\left(|S_{u,j}|, 2\right) = p\left(2^{v-j-1}, 2\right), \tag{5.3}$$

where $p(2^{v-j-1}, 2) = \dfrac{(2^{v-j-1})!}{(2^{v-j-1}-2)!}$ is the number of permutations of $2^{v-j-1}$ things taken 2 at a time. Since there are $\gamma_j$ subsets in stage $j$,

$$C_j = \gamma_j C_u = \gamma_j\, p\left(2^{v-j-1}, 2\right). \tag{5.4}$$

Substituting $2^j$ for $\gamma_j$ and writing $p(2^{v-j-1}, 2)$ explicitly in the above equation gives

$$C_j = 2^j\, 2^{v-j-1}\left(2^{v-j-1} - 1\right) = 2^{v-1}\left(2^{v-j-1} - 1\right). \tag{5.5}$$
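Equation 5.5 can be cross-checked by brute force: enumerate all ordered pairs of distinct stage-$j$ switches and keep those that fall in the same conjugate subset. The sketch assumes the same consecutive numbering of subsets as above; the function names are hypothetical.

```python
from itertools import permutations


def count_pairs_by_enumeration(v: int, j: int) -> int:
    """Ordered pairs of distinct stage-j switches lying in the same subset S_{u,j}."""
    n_switches = 2 ** (v - 1)          # N/2 switches per stage
    size = 2 ** (v - j - 1)            # |S_{u,j}|
    return sum(1 for a, b in permutations(range(n_switches), 2)
               if a // size == b // size)


def c_j_closed_form(v: int, j: int) -> int:
    """Equation (5.5): C_j = 2^(v-1) * (2^(v-j-1) - 1)."""
    return 2 ** (v - 1) * (2 ** (v - j - 1) - 1)


print([c_j_closed_form(3, j) for j in range(3)])   # [12, 4, 0]
```

The last entry, $C_{v-1} = 0$, reflects that each subset in the final stage contains a single switch, so no pair of blocking switches can appear there.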

The number of permutations with exactly two blocked paths associated with blocking in stage $j$ is obviously $BC_j$. Since permutation blocking can occur in any stage but stage $v-1$ (whose subsets contain a single switch, so that $C_{v-1} = 0$), the total number of permutations with exactly two blocked paths in any Baseline network is

$$\begin{aligned}
\left|\mathcal{P}_N^{(2)}\right| &= \sum_{j=0}^{v-2} BC_j = \sum_{j=0}^{v-2} 2^{v2^{v-1}-2}\, 2^{v-1}\left(2^{v-j-1} - 1\right) \\
&= 2^{v2^{v-1}+v-3} \sum_{j=0}^{v-2}\left(2^{v-j-1} - 1\right) \\
&= 2^{v2^{v-1}+v-3}\left[2^{v-1}\sum_{j=0}^{v-2} 2^{-j} - (v-1)\right].
\end{aligned} \tag{5.6}$$

The factor $\sum_{j=0}^{v-2} 2^{-j}$ is a finite geometric series whose sum is $2 - 2^{-(v-2)}$. Substituting in Equation 5.6, and with some algebraic manipulation, one finds that

$$\left|\mathcal{P}_N^{(2)}\right| = 2^{v2^{v-1}+v-3}\left(2^v - v - 1\right) = \frac{N}{8}\, N^{N/2}\left(N - \log_2 N - 1\right). \tag{5.7}$$

Equation 5.7 represents the number of all size-$N$ permutations which, when realized on a Baseline network, result in exactly two blocked paths. With an enhanced SFTB, if these two paths are realized on the two standby buses under normal conditions, then the equation gives the number of extra permutations that the enhanced SFTB can realize. Table 5.2 summarizes these results for some values of $N$.

Table 5.2: Values of $|\mathcal{P}_N^{(2)}|$ and $|\mathcal{P}_N^{(0)}|$ and their ratio for some values of $v$

v    N     |P_N^(0)|        |P_N^(2)|        |P_N^(2)| / |P_N^(0)|
2    4     16               8                0.5
3    8     4096             16384            4
4    16    4.29 × 10^9      9.45 × 10^10     22
5    32    1.2 × 10^24      1.26 × 10^26     104
6    64    6.28 × 10^57     2.86 × 10^60     456
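The entries of Table 5.2 can be regenerated with exact integer arithmetic. The sketch below evaluates Theorem 5.2, the stage-by-stage sum of Equation 5.6 and the closed form of Equation 5.7, and checks that the last two agree; the function names are hypothetical, and the closed form is used only for $v \ge 2$, where its exponent is non-negative.

```python
def conflict_free_count(v: int) -> int:
    """Theorem 5.2: |P_N^(0)| = 2^(v * 2^(v-1))."""
    return 2 ** (v * 2 ** (v - 1))


def two_blocked_by_sum(v: int) -> int:
    """Equation (5.6): sum over stages j = 0..v-2 of B * C_j."""
    b = 2 ** (v * 2 ** (v - 1) - 2)                         # Eq. (5.1)
    return sum(b * 2 ** (v - 1) * (2 ** (v - j - 1) - 1)    # Eq. (5.5)
               for j in range(v - 1))


def two_blocked_closed_form(v: int) -> int:
    """Equation (5.7): 2^(v 2^(v-1) + v - 3) * (2^v - v - 1), for v >= 2."""
    if v < 2:
        raise ValueError("v must be at least 2")
    return 2 ** (v * 2 ** (v - 1) + v - 3) * (2 ** v - v - 1)


for v in range(2, 7):
    p0, p2 = conflict_free_count(v), two_blocked_closed_form(v)
    assert p2 == two_blocked_by_sum(v)
    print(f"{v}  {2 ** v:3d}  {p0:.3g}  {p2:.3g}  {p2 / p0:g}")
```

The ratio column equals $2^{v-3}(2^v - v - 1)$, which follows directly from dividing Equation 5.7 by Theorem 5.2.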

5.4 Use of the Standby Bus Under Normal Conditions

It has been shown that the two buses of the enhanced SFTB can be used under normal conditions to enhance the network's capability of realizing permutations. In addition, there are two more functions that both the SFTB and its enhanced version can perform under normal conditions. These two functions are broadcasting and establishing critical connections that cannot otherwise be established over the Baseline network.
Broadcasting has been suggested [79] as a useful operation in multiprocessor systems. A processor may want to access all the memory modules at once and write to them simultaneously. Providing this capability on the ordinary Baseline network requires specially designed switches that can connect one of their inputs to both outputs simultaneously. It turns out that providing this capability adds considerably to the complexity of the switch. However, with a standby bus having access to all memory modules, the broadcast function is readily available. With a CCU, a processor can ask the CCU to put all the multiplexers in the 1 position. With dynamic access, the multiplexers can be made to recognize a broadcast "bit" which immediately switches them to the 1 position. One more advantage of implementing broadcasting on the bus, rather than on the network, is speed. The time needed to establish a broadcast connection on the bus, $T_B$, is the time needed to access the bus plus the time needed to put the multiplexers in the 1 position (assuming that the propagation delay on the bus is negligible). Clearly, $T_B$ is at most as large as the propagation delay of one switch, $T_s$, which is estimated [69] to be 8 gate delays. Since a broadcast on the network must traverse all $v$ stages, taking at least $vT_s$, broadcasting on the bus is at least $v$ times as fast as it would be on the network.
The other function that the bus can perform under normal conditions is establishing connections that are needed urgently but cannot be established over the Baseline network. Suppose that a processor critically needs to establish a path to a given memory module, but the path is blocked. With the SFTB, one such connection can be established at a time, while with the enhanced version two can be established at the same time.

5.5 Using the SFT technique in MINs with large switches

In this section, the performance of an SFT network under faulty conditions will be examined. In the analysis below, only the ordinary SFT technique (that is, only one standby bus) will be considered; the results can easily be adapted to other variations of the technique.
It should be evident that the main problem in the SFT technique is that only one processor can access the bus at a time. Two approaches were specified to resolve contention for the bus: the CCU approach and the dynamic approach. In both approaches, a processor waits at most a period of time, $T_{\text{maximum wait}}$, before it takes control of the bus. Clearly,

$$T_{\text{maximum wait}} = (\mu - 1)T_c + \tau_{\text{misc}}, \tag{5.8}$$

where
1. $\mu$ is the number of processors competing for the bus,
2. $T_c$ is the memory cycle duration, and
3. $\tau_{\text{misc}}$ is the time waited by the processor due mainly to propagation delays. In the CCU scheme, for example, $\tau_{\text{misc}}$ is the time taken in communicating with the CCU plus the arbitration time. In the dynamic access scheme, $\tau_{\text{misc}}$ is the time taken in counting before testing the bus-busy line. In both schemes, $\tau_{\text{misc}}$ includes the time needed to set one multiplexer and one demultiplexer.

The second term of $T_{\text{maximum wait}}$, $\tau_{\text{misc}}$, may seem independent of $\mu$, but careful examination reveals that it does depend on $\mu$, regardless of which access scheme is used. Moreover, the first term of $T_{\text{maximum wait}}$ depends linearly on $\mu$. It is therefore the value of $\mu$ that determines the efficiency of the SFT scheme for any network. The larger the value of $\mu$, the less efficient the network will be.
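Equation 5.8 is easy to evaluate for the two cases discussed next, a binary network ($\mu = 2$) and a network containing an $8 \times 8$ switch ($\mu = 8$). The numbers below are hypothetical (a 100 ns memory cycle and 20 ns of miscellaneous delay), and $\tau_{\text{misc}}$ is held fixed for simplicity even though the text notes that it also grows with $\mu$, so the sketch understates the penalty for large $\mu$.

```python
def max_wait(mu: int, t_cycle: float, tau_misc: float) -> float:
    """Equation (5.8): T_maximum_wait = (mu - 1) * T_c + tau_misc."""
    if mu < 1:
        raise ValueError("at least one processor must compete for the bus")
    return (mu - 1) * t_cycle + tau_misc


# Hypothetical timing values in nanoseconds.
for mu in (2, 4, 8):
    print(mu, max_wait(mu, t_cycle=100.0, tau_misc=20.0))   # 120.0, 320.0, 720.0
```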

The value of $\mu$ is determined by the number of inputs or outputs of the largest switch used in the MIN. For instance, if a network has $x$ switches, one of which is of size $8 \times 8$ and the remaining $x-1$ of size $2 \times 2$, that network will have $\mu = 8$. In such a network, the efficiency of the SFT technique will be much lower than with a binary network, such as the Baseline network, where $\mu = 2$. Therefore, binary networks are the best candidates for the SFT technique. Other techniques must be devised for networks with large switch sizes. Such a technique is introduced in Chapter 6 for the Clos network.

5.6 Discussion

In this chapter a novel technique to add fault-tolerance capabilities to MINs has been introduced: the Simple Fault Tolerance (SFT) technique. In this technique, an external bus is used to offer an "emergency link" between the inlets and outlets of the network in case a fault occurs. Under normal conditions, the processors use the network as usual, with the bus totally invisible. Under faulty conditions, the processors affected by the fault use the bus, while the unaffected ones continue using the network. The SFT network thus incorporates two interconnection mechanisms, the original network and the bus, both of which have been thoroughly analyzed in the literature for use in multiprocessor systems. The advantages and shortcomings of each have been pointed out: the bus is simple but cannot support a large number of processors, while the network can support a large number of processors but its hardware is complex. Thus if the bus can be guaranteed to serve a small number of processors, it can give both simplicity and good performance. This is the principle behind the SFT technique, as the number of processors affected by a fault in a MIN is much less than the total number of system processors.
Although this technique can be applied to virtually any MIN, it works best with binary networks, those with $2 \times 2$ switches such as the Benes and Baseline networks. In these networks, a single fault affects at most two processors. In such a case the performance of the bus will be nearly as high as that of a bus in a uniprocessor system. In this chapter, the technique is applied to design the Simple Fault Tolerant Baseline (SFTB) network. The design is described and its performance is examined. The SFTB is shown to have five advantages. First, there is no performance degradation under normal conditions. Second, it allows immediate full-access retention for affected processors under faulty conditions. Third, it uses the same binary switches as the ordinary Baseline; no specially designed switches are required. Fourth, it uses the same number of switches and stages as the ordinary Baseline. Finally, the SFTB uses the same distributed routing algorithm as the ordinary Baseline under both normal and faulty conditions. Only those processors affected by the fault (at most 2) use a different algorithm. A by-product feature of the SFTB is that it can quickly and easily implement broadcast connections on the bus under normal conditions, thus increasing the system efficiency. The control of the SFTB is extremely easy. Several control schemes are suggested and examined.
An enhanced SFTB can be developed by adding another bus. With this addition, an ultimate criterion of fault tolerance is achieved: complete recovery. Moreover, the two standby buses can be used to relieve blockage under normal conditions. Here, if all the switches are operational but a path cannot be realized due to a conflict, that path can be established on a bus, thereby improving the throughput of the system. The number of extra permutations realizable in this manner is calculated. In the context of this calculation, some new and interesting results on blocking in the Baseline network are presented.

It has been seen that the number of processors affected by the worst-case failure event determines the amount of performance degradation under faulty conditions of networks using the SFT technique. In any MIN, the worst-case failure event is switch failure. It is clear, then, that as the size of the network switches increases, the performance of the SFT technique will decrease.
Clos networks are characterized by the use of non-binary switches. In fact, there is no limit on the size of a switch in the Clos network. Thus, a fault-tolerance technique suitable for the Clos network is worth developing. This point, coupled with the fact that the literature does not seem to have any fault-tolerant design for Clos networks, has been the motivation behind the fault-tolerant Clos network presented in the next chapter.

Chapter 6
The Fault-Tolerant Clos Network
Clos networks inherently have the full-access retention property if the fault is in a middle-stage switch. This is due to the fact that each outer-stage switch is connected to all middle-stage switches. But with this inherent fault tolerance alone, one cannot necessarily realize every permutation that was realizable before the failure of the middle-stage switch. Moreover, a fault in one of the two outer stages cannot be tolerated by the network. In either case, the ability of the network to realize permutations will be impaired until the fault is physically removed. The design of a Fault-Tolerant Clos (FTC) network presented in this chapter offers the complete recovery capability, which of course includes full-access retention.

6.1 Design of the FTC

For the FTC, the fault model is defined as follows.
1. Any switch can fail.
2. Any interstage link can fail.
3. External links and multiplexers/demultiplexers cannot fail.

Figure 6.1: 9 × 9 ordinary Clos network

It should be mentioned that faults are assumed to occur independently, and that faulty components are unusable.
The fault-tolerance criterion of the FTC is complete recovery, that is, regaining pre-fault connectivity after a fault occurs. The fault-tolerance size of the FTC is 1. Since the FTC is single-fault tolerant, complete recovery is possible if only one fault occurs. In the FTC, one switch in each stage can fail with the network remaining fully functional; therefore it can be called 3-fault robust.
A Clos network of size 9 is shown in Figure 6.1. In stage 0, each crossbar switch has 3 inputs and 3 outputs, hence its size is $3 \times 3$.
Recall from Chapter 3 that a Clos network of size $N$ must have $k = N/m$ switches of size $m \times n$ in stage 0, and $k = N/m$ switches of size $n \times m$ in stage 2. The switches of stage 1 must be of size $k \times k$. Furthermore, there are exactly $n$ switches in stage 1. It should be noted that in Clos networks, $n \ge m$. An ordinary Clos network has $n = m$. When $n > m$, some degree of fault tolerance is obtained, a fact utilized in the design of the FTC.
An FTC of size $N$ is formed from an ordinary Clos network of size $N$ as follows. First, use switches with $n = m + 1$ in the outer stages. Second, add one extra switch to each of the three network stages. Each switch must be of the same size as the switches of the stage to which it is added. Third, connect the network inlets to the inputs of the first-stage switches via $1 \times 2$ demultiplexers, and the network outlets to the outputs of the third-stage switches via $2 \times 1$ multiplexers. As an example, the FTC equivalent of the network of Figure 6.1 is shown in Figure 6.2. It should be noted that using switches with $n = m + 1$ in the outer stages automatically adds an extra switch to the middle stage. As will be seen later, this provides fault tolerance to the middle stage. What remains, then, is to make the outer stages also fault-tolerant; that is why one extra switch is added to each of these two stages, as shown. In the FTC, each inlet is connected by a demultiplexer to two distinct switches in stage 0. Also, each outlet is connected by a multiplexer to two distinct switches in stage 2. These multiplexers and demultiplexers serve as a fault recovery system in the case of a fault in either of the two outer stages. This type of fault, as well as faults in the middle stage, stage 1, will be described later.

6.2 Reconfiguration of the FTC

The major feature of the FTC is its ability to be reconfigured such that pre-fault connectivity is totally regained. At any given time, there are three unused switches in the FTC, one per stage. Let these three switches be X(f_0, 0), X(f_1, 1) and X(f_2, 2), where f_0, f_1 and f_2 are the unused switch numbers for the first, second and third stage, respectively. The configuration of the FTC at any time is a function of the present values of f_0, f_1 and f_2.
Figure 6.2: The equivalent FTC of the network of Figure 6.1 (the marked switches are unused)

In general, the reconfiguration of the FTC can be performed through one or more of the following operations:
1. Changing the state of the multiplexers and demultiplexers
2. Terminal relabelling
3. Permutation translation
As will be seen below, the value of f_1 affects operation 2, while the values of f_0 and f_2 affect operations 1 and 3.

The multiplexer/demultiplexer state change operation is performed if an outer-stage switch fails. When the FTC is not faulty, one switch in each stage is unused. This unused switch can theoretically be any switch, but for convenience it is assumed to be the last switch in each stage, i.e., X(k, 0), X(n - 1, 1) and X(k, 2). This choice is convenient because the multiplexers and demultiplexers then remain in state 0 under normal conditions until a fault occurs; they then switch to state 1, thereby avoiding the defective switch. Suppose, for example, that X(1, 0) in Figure 6.2 fails during normal operation. Then the demultiplexers attached to that switch change to state 1. This gives the resources attached to X(1, 0) access to the network through X(3, 0) instead. Note that X(1, 0) is now the unused switch in stage 0, which confirms that at all times there is one unused switch in each stage.
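The demultiplexer behaviour in this example can be sketched in Python. The wiring is an assumption, not stated in the text: state 0 is taken to lead to the inlet's home switch X(inlet div m, 0), and state 1 to the spare X(k, 0), entering on input (inlet mod m). This is consistent with the X(1, 0) example and with the spare labels N + (i mod m) used in permutation translation. The function name `stage0_attachment` is hypothetical; the stage-2 multiplexers would mirror it.

```python
def stage0_attachment(inlet: int, m: int, k: int, f0: int) -> tuple:
    """Return (demux_state, stage-0 switch, switch input) for a network inlet.

    Assumed (hypothetical) wiring: state 0 leads to the home switch
    X(inlet // m, 0); state 1 leads to the spare X(k, 0) on input inlet % m.
    f0 is the currently unused stage-0 switch (k when there is no fault).
    """
    home, port = divmod(inlet, m)
    if home == f0:  # the home switch is the faulty one: divert to the spare
        return 1, k, port
    return 0, home, port


m, k = 3, 3
print([stage0_attachment(i, m, k, f0=3) for i in range(3, 6)])  # fault-free
# [(0, 1, 0), (0, 1, 1), (0, 1, 2)]
print([stage0_attachment(i, m, k, f0=1) for i in range(3, 6)])  # X(1, 0) failed
# [(1, 3, 0), (1, 3, 1), (1, 3, 2)]
```

Because only one stage-0 switch is ever unused, no two active inlets compete for the same spare input.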
Permutation translation is also performed if an outer-stage switch fails. Let P = {P_0, P_1, ..., P_{N-1}} be an arbitrary permutation of {0, 1, ..., N - 1}. In the actual network, P_i is the outlet to which inlet i is to be connected. In an ordinary Clos network, P goes directly to the central routing unit, where the settings of the individual switches are extracted and delivered to the switches for implementation. In the FTC, the same steps are taken, except that permutation P is translated before it goes to the central routing unit.

Terminal relabelling is performed if a middle-stage switch fails. As mentioned above, f_1 affects the labelling of the outputs of switches X(i, 0) and the inputs of switches X(i, 2), 0 ≤ i ≤ k. Let these outputs and inputs be referred to as the inward terminals of the outer stages, or just the inward terminals. In each of these switches, only m out of the n inward terminals are used; these are referred to as the active terminals. Each active terminal has two labels: a local one, used by the switch's control unit, and a global one, used by the central routing unit. The local label is an integer z, 0 ≤ z < m, and the global label is an integer Z, 0 ≤ Z < m(k + 1). The active terminals are labelled locally from top to bottom, with respect to the switch, as 0, 1, ..., m - 1. Globally, the same active terminals are labelled from top to bottom, with respect to the stage, as 0, 1, ..., m(k + 1) - 1. This labelling is shown in Figure 6.2 for f_1 = 3. In Figure 6.3, X(2, 1) is faulty, i.e., f_1 = 2; therefore, the active terminals have been changed to exclude the third inward terminal in each switch. The new labels are shown in the figure.
The labels are always updated after a fault occurs, and the current labels are the ones used to implement the routing information received from the control unit. Note that one inward terminal is left out in every switch of stages 0 and 2. If these left-out terminals are chosen to have the same position within their switches, they are exactly the terminals wired to one middle-stage switch, so that switch is left out. This switch is the unused switch under normal conditions and the defective switch under fault conditions.
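The labelling rule can be sketched in Python under one assumption not spelled out in the text: inward terminal t of an outer-stage switch is wired to middle-stage switch X(t, 1), as in a standard Clos layout. The function name `inward_labels` is hypothetical.

```python
def inward_labels(switch: int, m: int, f1: int) -> dict:
    """Map the active inward terminals of outer-stage switch `switch` to (local, global).

    The switch has n = m + 1 inward terminals. Terminal t is assumed to be wired
    to middle-stage switch X(t, 1), so the terminal facing X(f1, 1) is left out.
    """
    n = m + 1
    if not 0 <= f1 < n:
        raise ValueError("f1 must index a middle-stage switch")
    active = [t for t in range(n) if t != f1]
    # Local labels run 0..m-1 top to bottom; global labels continue across the stage.
    return {t: (z, switch * m + z) for z, t in enumerate(active)}


print(inward_labels(1, m=3, f1=3))  # normal conditions
# {0: (0, 3), 1: (1, 4), 2: (2, 5)}
print(inward_labels(1, m=3, f1=2))  # X(2, 1) failed: terminal 2 skipped
# {0: (0, 3), 1: (1, 4), 3: (2, 5)}
```

Applying the same f1 to every outer-stage switch leaves out the same terminal position everywhere, which is what isolates a single middle-stage switch.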

6.3 Routing the FTC

The FTC is still required to perform the same function as the ordinary network: realization of permutations. For an ordinary Clos network, a permutation is sent directly to the routing algorithm, where the proper switch settings to realize the permutation are found. For the FTC, however, the permutation cannot be submitted to the routing algorithm as is, because n ≠ m. Instead, a new permutation Q is generated from P as described below. This permutation translation is a flexible routine that can easily accommodate faults in the switches of the input or output stages. Together with adjusting the appropriate demultiplexers and/or multiplexers, this routine keeps the network running after a fault occurs in either or both of the outer stages.

Figure 6.3: FTC reconfigured to accommodate faults in X(1, 0), X(2, 1) and X(2, 2) (the marked switches are unused)
Given permutation P = {P_0, P_1, ..., P_{N-1}}, permutation Q = {Q_0, Q_1, ..., Q_{N+m-1}} can be created as follows.
If ⌊i/m⌋ ≠ f_0 and ⌊P_i/m⌋ ≠ f_2, then Q_i = P_i.
If ⌊i/m⌋ ≠ f_0 and ⌊P_i/m⌋ = f_2, then Q_i = N + (P_i mod m).
If ⌊i/m⌋ = f_0 and ⌊P_i/m⌋ ≠ f_2, then Q_{N+(i mod m)} = P_i.
If ⌊i/m⌋ = f_0 and ⌊P_i/m⌋ = f_2, then Q_{N+(i mod m)} = N + (P_i mod m).
The m remaining elements in Q are formed by arbitrarily mapping the m labeled outputs of X(f_0, 0) onto the m labeled inputs of X(f_2, 2).
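The four rules translate directly into a short Python sketch. The function name `translate` is illustrative, and N is assumed to be a multiple of m. For the m leftover entries the rules permit any bijection; the sketch simply pairs the j-th terminal of X(f_0, 0) with the j-th terminal of X(f_2, 2).

```python
def translate(P: list, m: int, f0: int, f2: int) -> list:
    """Translate permutation P of {0..N-1} into Q of {0..N+m-1} using the FTC rules.

    Labels N..N+m-1 belong to the spare stage-0 inputs and stage-2 outputs that
    stand in for X(f0, 0) and X(f2, 2).
    """
    N = len(P)
    Q = [-1] * (N + m)
    for i, p in enumerate(P):
        src = N + i % m if i // m == f0 else i  # inlet diverted to the spare?
        dst = N + p % m if p // m == f2 else p  # outlet diverted to the spare?
        Q[src] = dst
    # The m entries still unset (terminals of the unused X(f0, 0)) are mapped onto
    # the terminals of the unused X(f2, 2); this is one admissible bijection.
    for j in range(m):
        Q[f0 * m + j] = f2 * m + j
    return Q


P = [3, 4, 8, 7, 6, 1, 2, 5, 0]
print(translate(P, m=3, f0=3, f2=3))  # [3, 4, 8, 7, 6, 1, 2, 5, 0, 9, 10, 11]
print(translate(P, m=3, f0=1, f2=2))  # [3, 4, 11, 6, 7, 8, 2, 5, 0, 10, 9, 1]
```

The two calls reproduce the two translation examples worked out in this section, with the arbitrary entries fixed to one particular choice.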
To illustrate with an example on the FTC shown in Figure 6.2, consider the permutation
$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 3 & 4 & 8 & 7 & 6 & 1 & 2 & 5 & 0 \end{pmatrix}.$$
Realizing the pair (i, P_i) means that inlet i is to be connected to outlet P_i on the actual network.

Initially, let the unused switches be X(3, 0), X(3, 1) and X(3, 2) in the three network stages, as shown in Figure 6.2. Recall that this is the configuration suggested for normal conditions. Then, according to the rules set forth above, permutation Q will be
$$Q = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ 3 & 4 & 8 & 7 & 6 & 1 & 2 & 5 & 0 & x & x & x \end{pmatrix},$$
where x ∈ {9, 10, 11} and the mapping is one-to-one and onto. The condensed matrix representation of P is
$$H_3 = \begin{pmatrix} 0 & 2 & 1 \\ 1 & 0 & 2 \\ 2 & 1 & 0 \end{pmatrix}.$$
On the other hand, the condensed matrix representation of Q is
$$H_3 = \begin{pmatrix} 0 & 2 & 1 & 0 \\ 1 & 0 & 2 & 0 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.$$
It is obvious that the size of the matrix increases by exactly one row and one column. This is because of the two unused switches X(3, 0) and X(3, 2).
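The condensed matrices above can be checked mechanically. In the sketch below (the name `condensed_matrix` is illustrative), entry H[a][b] counts the connections from stage-0 switch a to stage-2 switch b. Any choice of the x entries gives the same matrix, because all of 9, 10 and 11 lie in the last stage-2 switch.

```python
def condensed_matrix(Q: list, m: int) -> list:
    """H[a][b] = number of connections from stage-0 switch a to stage-2 switch b."""
    size = len(Q) // m
    H = [[0] * size for _ in range(size)]
    for i, q in enumerate(Q):
        H[i // m][q // m] += 1
    return H


P = [3, 4, 8, 7, 6, 1, 2, 5, 0]
Q = P + [9, 10, 11]            # one admissible choice of the x entries
print(condensed_matrix(P, 3))  # [[0, 2, 1], [1, 0, 2], [2, 1, 0]]
print(condensed_matrix(Q, 3))  # [[0, 2, 1, 0], [1, 0, 2, 0], [2, 1, 0, 0], [0, 0, 0, 3]]
```

Every row and column of either matrix sums to m = 3, since each switch carries exactly m connections.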
Using Neiman's algorithm for routing, it can be seen that the new permutation does not complicate decomposing the matrix. That is because the algorithm uses the condensed matrix representation of the network and proceeds by selecting a nonzero element from a row (or a column) at a time. Problems arise in this algorithm if there are rows or columns with more than one nonzero element. But since the condensed matrix of the FTC introduces a row and a column each having only one nonzero element, the algorithm will be forced to choose this element on every pass of the decomposition process. Nor, on the other hand, is the new condensed matrix any easier to decompose than the old one.
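Neiman's algorithm itself is not reproduced here. As a stand-in, the Python sketch below uses a different, standard technique: repeated augmenting-path (Kuhn) bipartite matching. It splits a condensed matrix whose row and column sums all equal m into m permutations. Each permutation is one middle-stage switch setting, connecting stage-0 switch r to stage-2 switch perm[r]. In the FTC these settings would go to the m active middle switches, i.e. all except X(f_1, 1). The names `perfect_matching` and `decompose` are hypothetical.

```python
def perfect_matching(H: list) -> list:
    """Kuhn's augmenting-path matching on the nonzero entries of square matrix H.

    Returns perm such that H[r][perm[r]] > 0 for every row r.
    """
    size = len(H)
    row_of = [-1] * size  # row_of[c] = row currently matched to column c

    def augment(r: int, seen: list) -> bool:
        for c in range(size):
            if H[r][c] > 0 and not seen[c]:
                seen[c] = True
                if row_of[c] == -1 or augment(row_of[c], seen):
                    row_of[c] = r
                    return True
        return False

    for r in range(size):
        if not augment(r, [False] * size):
            raise ValueError("no perfect matching on the nonzero entries")
    perm = [0] * size
    for c, r in enumerate(row_of):
        perm[r] = c
    return perm


def decompose(H: list) -> list:
    """Split H (all row and column sums equal to m) into m permutations."""
    H = [row[:] for row in H]  # work on a copy
    m = sum(H[0])
    settings = []
    for _ in range(m):
        perm = perfect_matching(H)  # exists because all line sums are equal
        for r, c in enumerate(perm):
            H[r][c] -= 1
        settings.append(perm)
    return settings


H_Q = [[0, 2, 1, 0], [1, 0, 2, 0], [2, 1, 0, 0], [0, 0, 0, 3]]  # first example
for t, perm in enumerate(decompose(H_Q)):
    print("active middle switch", t, "->", perm)
```

The last row and column of H_Q force every permutation to map row 3 to column 3, mirroring the observation above that the added element is chosen on every pass.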
To give another translation example, assume that switches X(1, 0), X(2, 1) and X(2, 2) of Figure 6.2 suddenly fail. This situation is depicted in Figure 6.3. Then the new values for the unused switches will be f_0 = 1, f_1 = 2 and f_2 = 2. Due to the failure of X(2, 1), the inward terminals of stages 0 and 2 must be relabelled. Specifically, inward terminal number 2 of each switch is left out in assigning the numbers. The failures of X(1, 0) and X(2, 2) affect the permutation translation. Permutation P given before is translated, according to the rules laid down above, to
$$Q = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ 3 & 4 & 11 & x & x & x & 2 & 5 & 0 & 10 & 9 & 1 \end{pmatrix},$$
where x ∈ {6, 7, 8} and the mapping is one-to-one and onto. The routing result will be implemented by all the switches except the defective ones, namely X(1, 0), X(2, 1) and X(2, 2). The matrix representation of permutation Q above is
$$H_3 = \begin{pmatrix} 0 & 2 & 0 & 1 \\ 0 & 0 & 3 & 0 \\ 2 & 1 & 0 & 0 \\ 1 & 0 & 0 & 2 \end{pmatrix}.$$
Again, if Neiman's algorithm is used, the new element in H_3, namely H_3(2, 3), will neither complicate nor facilitate the algorithm. Notice that the element added to H_m is always H_m(f_0 + 1, f_2 + 1) = m (rows and columns counted from 1). The time complexity of routing an interconnection network is an important measure of the efficiency of the network. It is shown below that if Neiman's algorithm is used, the time complexity of routing an FTC is equal to that of routing the ordinary Clos network.
The time complexity of Neiman's algorithm is
$$T = O(mk^4).$$
The FTC has one extra switch in each outer stage, i.e., k + 1 switches per outer stage. So if k + 1 is substituted for k in the above expression, then
$$T_{\mathrm{FTC}} = O\big(m(k+1)^4\big) = O\big(m(k^4 + 4k^3 + 6k^2 + 4k + 1)\big) = O(mk^4).$$

Likewise, if a graph-based algorithm is used, it can be shown that the time complexity of routing remains the same.
Using a graph-based algorithm, the new row and column added to the condensed matrix represent two vertices in the bipartite graph. These two vertices are connected by m edges, where m is as defined above. Figure 6.4 shows the graph representation of both the ordinary Clos network of Figure 6.1 and the FTC of Figure 6.3 as they realize permutation P. Since m = 3, three edges, shown as dark lines in Figure 6.4b, are stretched between the two extra vertices. It can be seen that edge-coloring the new graph is neither easier nor more difficult than edge-coloring the original graph. That is because three colors will be chosen and assigned to the edges such that no two edges incident on the same vertex have the same color, and in the new graph each of the three added edges is assigned a color in a straightforward manner. In fact, any algorithm with polynomial time complexity in k will have the same time complexity on the FTC as on the ordinary Clos network, since replacing k by k + 1 does not change the order of a polynomial in k.

Figure 6.4: The graph representation of both (a) the ordinary network of Figure 6.1 and (b) the FTC of Figure 6.3 as they realize permutation P
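The graph view can be sketched in Python as follows. Each connection (i, Q_i) becomes an edge between stage-0 switch ⌊i/m⌋ and stage-2 switch ⌊Q_i/m⌋. The edges are then colored with m colors using the classical alternating-path (Kempe chain) step behind König's edge-coloring theorem. This generic technique is chosen for illustration only; it is not necessarily the graph-based algorithm the text has in mind. The names `switch_edges` and `edge_color` are hypothetical. Each color class is the setting of one active middle-stage switch.

```python
def switch_edges(Q: list, m: int) -> list:
    """One edge (a, b) per connection: stage-0 switch a to stage-2 switch b."""
    return [(i // m, q // m) for i, q in enumerate(Q)]


def edge_color(edges: list, delta: int) -> list:
    """Properly color a bipartite multigraph of maximum degree `delta` with `delta` colors."""
    at = {}                    # vertex -> {color: index of the edge with that color}
    color = [-1] * len(edges)

    def ends(e: int) -> tuple:
        a, b = edges[e]
        return ("L", a), ("R", b)

    for e in range(len(edges)):
        u, v = ends(e)
        at_u = at.setdefault(u, {})
        at_v = at.setdefault(v, {})
        ca = next(c for c in range(delta) if c not in at_u)  # a color free at u
        cb = next(c for c in range(delta) if c not in at_v)  # a color free at v
        if ca in at_v:
            # Collect the path from v whose edge colors alternate ca, cb, ca, ...
            path, x, c = [], v, ca
            while c in at[x]:
                f = at[x][c]
                path.append(f)
                p, q = ends(f)
                x = q if p == x else p
                c = cb if c == ca else ca
            # Swap ca and cb along the path; in a bipartite graph it cannot reach u.
            for f in path:
                for w in ends(f):
                    del at[w][color[f]]
            for f in path:
                color[f] = cb if color[f] == ca else ca
                for w in ends(f):
                    at[w][color[f]] = f
        color[e] = ca
        at_u[ca] = e
        at_v[ca] = e
    return color


Q = [3, 4, 11, 6, 7, 8, 2, 5, 0, 10, 9, 1]  # second translation example, x = 6, 7, 8
edges = switch_edges(Q, 3)
colors = edge_color(edges, 3)
print(edges.count((1, 2)))  # 3 parallel edges between X(f0, 0) and X(f2, 2)
```

The three parallel edges between the vertices of X(1, 0) and X(2, 2) necessarily receive three distinct colors, so they are handled without any search.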
The discussion so far has dealt only with switch faults. Link faults can be handled easily as follows. If a link between switches X(i, j) and X(a, j + 1), 0 ≤ j ≤ 1, fails, the case is treated as if both switches had failed, and the procedures discussed above are applied. Recall that the FTC can tolerate more than one simultaneously faulty switch, provided that there is only one such switch per stage. This solution has the advantage of keeping the reconfiguration process as simple as possible. More elaborate solutions could be designed, but they would hinder the network's ability to reconfigure itself easily.
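The bookkeeping implied by this paragraph can be sketched as a small Python class. It tracks (f_0, f_1, f_2), treats a failed link as a failure of both end switches, and rejects a second faulty switch in a stage. The class `FTCFaultState` and its methods are hypothetical helpers, not part of the thesis.

```python
from typing import List, Optional


class FTCFaultState:
    """Tracks the unused switch of each stage of an FTC (hypothetical helper)."""

    def __init__(self, k: int, m: int):
        # Fault-free unused switches: X(k, 0), X(n - 1, 1) = X(m, 1), X(k, 2).
        self._spare = (k, m, k)
        self._faulty: List[Optional[int]] = [None, None, None]

    @property
    def f(self) -> tuple:
        """Current (f0, f1, f2): a stage's faulty switch if any, else its spare."""
        return tuple(s if bad is None else bad for s, bad in zip(self._spare, self._faulty))

    def _check(self, stage: int, index: int) -> None:
        bad = self._faulty[stage]
        if bad is not None and bad != index:
            raise RuntimeError(f"stage {stage} already has a faulty switch X({bad}, {stage})")

    def switch_fault(self, stage: int, index: int) -> None:
        self._check(stage, index)
        self._faulty[stage] = index

    def link_fault(self, j: int, a: int, b: int) -> None:
        """Link X(a, j) -- X(b, j + 1) failed: treat both end switches as faulty."""
        if j not in (0, 1):
            raise ValueError("interstage links join stages 0-1 or 1-2")
        self._check(j, a)
        self._check(j + 1, b)  # validate both stages before changing anything
        self._faulty[j] = a
        self._faulty[j + 1] = b


state = FTCFaultState(k=3, m=3)
state.switch_fault(0, 1)
state.switch_fault(1, 2)
state.switch_fault(2, 2)
print(state.f)  # (1, 2, 2), the situation of Figure 6.3

link_case = FTCFaultState(k=3, m=3)
link_case.link_fault(0, 1, 2)  # link X(1, 0) -- X(2, 1) failed
print(link_case.f)  # (1, 2, 3)
```

The resulting (f_0, f_1, f_2) is all that the multiplexer settings, terminal relabelling and permutation translation need.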

6.4 Reliability Analysis

Here the reliability [82] of both the ordinary Clos network and the FTC is examined. First, define the reliability r of a single switch as the probability that the switch does not fail over a period of time τ. Then f = 1 - r is the probability that the switch fails in the same period τ. Similarly, define the reliability R of the network, ordinary or FTC, as the probability that the network does not fail over a period of time τ. Then F = 1 - R is the probability that the network fails in the same period τ. A switch fails if it cannot realize, partially or completely, a mapping of its inputs onto its outputs. Similarly, a network fails if it cannot realize, partially or completely, a mapping of its inlets onto its outlets.
Evidently, for the ordinary Clos network to be fully operational over the period τ, all of its switches must be operational over the same period. For simplicity, assume that all the switches have the same reliability r. Therefore, the reliability of the ordinary network, assuming statistical independence (independent failure events), is
$$R_{\mathrm{ordinary}} = r^{2k+m} \tag{6.1}$$
where 2k + m is the number of switches in the ordinary Clos network.
For the FTC, the network remains fully operational if up to one switch in every stage fails. Let R_0, R_1 and R_2 be the reliabilities of stages 0, 1 and 2, respectively. Since switch failures are independent, the three stages are statistically independent. Thus the reliability of the network is
$$R_{\mathrm{FTC}} = R_0 R_1 R_2 \tag{6.2}$$
The reliability of the first stage, R_0, is the probability that at least k of the k + 1 first-stage switches are operational. Alternatively, if F_0 is the probability that the first stage fails, then
$$R_0 = 1 - F_0 \tag{6.3}$$
For stage 0 to fail, given that there is one extra switch, at least two switches have to fail, i.e., at most k - 1 switches survive. This is a case of the binomial distribution, or Bernoulli trials [68], for which F_0 can be written as
$$F_0 = \sum_{i=0}^{k-1} \binom{k+1}{i} r^i (1-r)^{k+1-i} \tag{6.4}$$
where $\binom{k+1}{i}$ is the number of combinations of k + 1 items taken i at a time.
Substituting in Equation 6.3, and noting that R_0 = R_2 by symmetry, it follows that
$$R_0 = R_2 = 1 - \sum_{i=0}^{k-1} \binom{k+1}{i} r^i (1-r)^{k+1-i} \tag{6.5}$$

A similar analysis shows that the reliability of the middle stage is
$$R_1 = 1 - \sum_{i=0}^{m-1} \binom{m+1}{i} r^i (1-r)^{m+1-i} \tag{6.6}$$

Substituting Equations 6.5 and 6.6 in Equation 6.2 yields
$$R_{\mathrm{FTC}} = \left\{1 - \sum_{i=0}^{k-1} \binom{k+1}{i} r^i (1-r)^{k+1-i}\right\}^2 \left\{1 - \sum_{i=0}^{m-1} \binom{m+1}{i} r^i (1-r)^{m+1-i}\right\} \tag{6.7}$$
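Equations 6.1 and 6.5 to 6.7 can be evaluated with a short Python sketch; the function names are illustrative, and N is assumed to be a multiple of m. One numerical choice differs from the printed form. `stage_reliability` sums the survivor side, Σ_{i=active}^{total} C(total, i) r^i (1 - r)^{total-i}, which is algebraically equal to the 1 - F form of Equation 6.5. It is used because 1 - F loses all precision once a stage reliability falls near machine epsilon, as it does for r = 0.8 and large N.

```python
from math import comb


def stage_reliability(active: int, spares: int, r: float) -> float:
    """P(at least `active` of `active + spares` i.i.d. switches survive); equals 1 - F."""
    total = active + spares
    return sum(comb(total, i) * r**i * (1 - r) ** (total - i) for i in range(active, total + 1))


def r_ordinary(N: int, m: int, r: float) -> float:
    k = N // m
    return r ** (2 * k + m)                # Equation 6.1


def r_ftc(N: int, m: int, r: float) -> float:
    k = N // m
    outer = stage_reliability(k, 1, r)     # Equation 6.5 (R0 = R2)
    middle = stage_reliability(m, 1, r)    # Equation 6.6
    return outer**2 * middle               # Equation 6.7


for N in (60, 300, 600, 900):  # parameters of Figure 6.5: m = 6, r = 0.98
    print(N, round(r_ordinary(N, 6, 0.98), 3), round(r_ftc(N, 6, 0.98), 3))
```

Per stage, the FTC reliability works out to r^k (1 + k(1 - r)), which is never below the ordinary stage's r^k. This is why the FTC curve lies above the ordinary one for every N > 0.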

Equations 6.1 and 6.7 thus represent the reliabilities of the two networks, the ordinary and the fault-tolerant. They are plotted in Figure 6.5 for m = 6 and r = 0.98. The reliability of both networks drops as N increases. This is because m is constant, and therefore a larger N implies a larger number of switches in the network (at least in the two outer stages); intuitively, the more components a network has, the less reliable it is. However, it can be seen that for the same N, N > 0, the reliability of the FTC is higher than that of the ordinary Clos network.
It can also be seen from Figure 6.5 that as N increases, the reliability of the FTC becomes considerably higher than that of the ordinary network. That is because the higher the number of switches in a network, the higher its vulnerability to failure, and the extra switch in each stage of the FTC makes a single failure per stage insignificant. Therefore, the FTC is recommended for networks with a large number of switches.
To see the effect of the reliability of the individual switch on the reliability of the ordinary network, and on the need for an FTC, Equations 6.1 and 6.7 are replotted in Figure 6.6 for r = 0.8. Notice that the horizontal axis differs from that of Figure 6.5. It can be seen, first, that the reliabilities of both networks are much lower than those of Figure 6.5. That is understandable because the switches are the building blocks of the network, and the reliability of the network is determined mainly by the reliability of its switches. Second, the reliability of the FTC is much higher than that of the ordinary network over a wider range of N than in Figure 6.5. Indeed, the FTC is more beneficial for networks with poor switch reliabilities. In the limiting case, r = 1, there is clearly no need for any fault tolerance (recall that what is said here about switches in fact covers both switches and links).

Figure 6.5: Reliability R vs. number of inlets or outlets N for both the ordinary network and the FTC, for r = 0.98 and m = 6 (N from 0 to 1000)
Figure 6.6: Reliability R vs. number of inlets or outlets N for both the ordinary network and the FTC, for r = 0.8 and m = 6 (N from 0 to 250)

6.5 Generalization to More Than One Extra Switch per Stage

When more than one switch is added to every stage, in the same manner described for the FTC, greater reliability is expected. To verify this, Equation 6.7 is generalized to the case where x switches are added to each of stages 0 and 2, and y switches are added to stage 1. Using the same procedure used to derive Equation 6.7, it can be shown that the reliability of the new network, R_more, is
$$R_{\mathrm{more}} = \left\{1 - \sum_{i=0}^{k-1} \binom{k+x}{i} r^i (1-r)^{k+x-i}\right\}^2 \left\{1 - \sum_{i=0}^{m-1} \binom{m+y}{i} r^i (1-r)^{m+y-i}\right\} \tag{6.8}$$
which reduces to Equation 6.7 when x = y = 1.
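A Python sketch of Equation 6.8 and of the gain R_more/R_ordinary plotted in Figures 6.7 and 6.8 follows. The function names are illustrative, and N is assumed to be a multiple of m. As in the sketch for Equation 6.7, each stage term is computed as the survivor-side sum, which equals the 1 - F form but stays accurate when the result is tiny.

```python
from math import comb


def stage_reliability(active: int, spares: int, r: float) -> float:
    """P(at least `active` of `active + spares` i.i.d. switches survive)."""
    total = active + spares
    return sum(comb(total, i) * r**i * (1 - r) ** (total - i) for i in range(active, total + 1))


def r_more(N: int, m: int, r: float, x: int, y: int) -> float:
    """Equation 6.8: x spares in each outer stage, y spares in the middle stage."""
    k = N // m
    return stage_reliability(k, x, r) ** 2 * stage_reliability(m, y, r)


def gain(N: int, m: int, r: float, x: int, y: int) -> float:
    """R_more / R_ordinary, the quantity plotted in Figures 6.7 and 6.8."""
    k = N // m
    return r_more(N, m, r, x, y) / r ** (2 * k + m)


for extra in (1, 2, 3, 4):  # parameters of Figure 6.7: m = 12, r = 0.8
    print(extra, f"{gain(600, 12, 0.8, extra, extra):.2e}")
```

With x = y = 0 the gain is exactly 1, and with x = y = 1 the expression coincides with Equation 6.7.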

This equation is used in Figures 6.7 and 6.8 to show the ratio of the reliability of a fault-tolerant Clos network to the reliability of its ordinary version. First, Figure 6.7 shows the reliability ratio for four cases, namely, when 1, 2, 3 and 4 extra switches per stage are used (that is, x = y = 1, 2, 3 and 4). In all cases the reliability of the individual switch is r = 0.8. Since m is fixed, the horizontal axis really represents the number of switches in the network. It can be seen how the reliability of Clos networks can be increased by many orders of magnitude just by adding a few switches. The gain in reliability increases monotonically as N increases, as concluded before. However, this increase tends to saturate as N becomes larger. It can also be seen that as the number of extra switches per stage increases, the gain in reliability always increases. At N = 0 there is no gain in reliability regardless of the number of extra switches, because N = 0 means there is no network.

Figure 6.7: Gain in reliability, R_F.T./R_Ordinary, vs. N for various fault-tolerant networks with r = 0.8 and m = 12 (logarithmic vertical axis; curves A, B, C and D: 4, 3, 2 and 1 extra switches per stage)
Figure 6.8: Gain in reliability, R_F.T./R_Ordinary, vs. N for various fault-tolerant networks with r = 0.99 and m = 12 (curves A, B and C: 3, 2 and 1 extra switches per stage)
It was mentioned earlier that if the reliability of the individual switch increases, the reliability of the network, ordinary or fault-tolerant, increases. This fact is demonstrated in Figure 6.8, which is similar to Figure 6.7 except that r = 0.99. It can be seen that when r is this large, adding more than one switch per stage is unwarranted. Unlike the case in Figure 6.7, where the addition of one more switch increased the overall reliability of the network by orders of magnitude, the addition of one more switch in Figure 6.8 increases the gain only slightly. In fact, the curve for the network with x = y = 4 could not be drawn here because it coincided with the curve for the network with x = y = 3 throughout the range of N in the figure. The figure also shows that for small N, adding any number of extra switches per stage yields the same gain in reliability. Therefore, it can be concluded that when the reliability of the individual switches is high, there is no need for excessive additional hardware, especially when N is small.
It is clear from Figures 6.5 through 6.8 that adding more switches per stage is more advantageous when the number of switches in the network is large. For Clos networks with a small number of switches (implied by small N), the addition of one switch per stage would be sufficient. Adding more switches per stage increases the overall reliability of the network; however, reconfiguration of the network would be more difficult and time consuming. Moreover, the extra switches would increase the hardware of the network and complicate its design.

6.6 Discussion

This chapter presents the design and performance of a fault-tolerant Clos network, the FTC. Clos networks are used mainly to realize permutations. Without any fault tolerance, if a switch in the network fails, the network is rendered inoperative and the system has to be interrupted to put the network back to work. With the fault tolerance introduced here, the network can continue its work uninterrupted in the presence of a fault. That is possible because the FTC can reconfigure itself dynamically, by changing the settings of the multiplexers and demultiplexers and by using the adaptive permutation translation scheme presented. The defective item can then be repaired while the system is idle. Besides providing fault tolerance, the FTC greatly enhances the reliability of the network. High reliability means more system availability (the time of uninterrupted operation). The analysis shows that this fault tolerance approach is most beneficial when
1. the reliability of the individual switches is poor, and
2. the number of switches in the network is large.
As far as reliability is concerned, adding more than one switch per stage is recommended, up to a certain number of extra switches. This number depends on the number of switches in the network and the reliability of the individual switch, and an optimum value can be determined. However, putting a large number of extra switches per stage adds significantly to the network hardware and routing complexity.


Chapter 7
Conclusions

7.1 Summary

This thesis has focused on fault tolerance for interconnection networks in general, and for three networks in particular: the Baseline network, the Benes network and the Clos network. These three networks have attracted wide interest over the past three decades. Fault tolerance has become a consideration only recently, after large-scale multiprocessors started to become a reality.
The thesis started by defining a generalized MIN model, which was later used systematically to put in perspective the MINs considered in the thesis. This rigorous foundation was a key step toward understanding how a given MIN can be made fault-tolerant. In devising fault tolerance techniques for MINs, one should meet two common criteria. First, the fault tolerance mechanism should not add significantly to the hardware complexity of the system. Second, the mechanism should not significantly degrade performance under either normal or faulty conditions.
The two fault tolerance techniques presented in this thesis meet the above criteria. Both add to the wealth of techniques already in the literature, yet each has features unique to it. Taken together, they offer a reasonable solution to the fault tolerance problem in a large number of MINs.


Fault-tolerant MINs developed according to either of the two suggested techniques possess these important features:
• Using the same switches: The fault-tolerant networks are constructed from the same basic switches the ordinary networks use.
• Using the same routing algorithms: The fault-tolerant networks use the same routing algorithms as the ordinary networks.
• Having the same hardware and routing time complexities: The hardware and routing complexities of the fault-tolerant networks are the same as those of the ordinary ones.

7.2 The SFT Technique

The primary advantage of the SFT technique is that it is not MIN-specific. This means that it can be applied to any MIN with characteristics similar to those of the generalized MIN model. As has been shown, the SFT technique is useful not only under faulty conditions, but also under normal conditions. Among the functions that a bus in an SFT network can perform are broadcasting and blockage relief, two functions that are important in multiprocessor operation. It was shown that if two buses are added to the system in an SFT network, and if the two buses are used under normal conditions to relieve blockage, more permutations can be realized with the help of the two buses than on the original network. Also, in this enhanced SFT network, the full recovery property is possible on networks with binary switches.
The cost of the SFT technique is minimal, as it does not require any new switch design. Moreover, the bus is totally invisible under normal conditions, so it has no negative impact on routing while there are no faults.

7.3 The Fault-Tolerant Clos (FTC) Network

The FTC is suggested as an alternative to using the SFT technique on Clos networks. The main reason is that switch sizes in Clos networks can be so large that using the SFT technique would result in severely degraded performance under faulty conditions. The FTC is characterized by ease of operation and by the use of little additional hardware. It is shown how the addition of only three switches to the network considerably increases the reliability of the network. Another advantage of the FTC is full recovery. This is particularly important in Clos networks, as the Clos network is primarily a permutation network; having only the full access capability as a fault tolerance criterion would not be acceptable for a Clos network.

7.4 Open Problems

On the way to solving any problem, one often sees problems that were not noticeable before. In the case of the work done in this thesis, some such problems have been observed, and they can make good research areas. First, the SFT technique was extended only to two standby buses. A possible SFT approach for networks that use large switches would be to use more than two standby buses. The optimum number of buses for a given network can be found. Controlling access to such a large number of buses, as well as studying the performance of the system as a whole, would be of interest. Developing such a scheme for the Clos network and comparing it with an FTC of the same size would clarify which approach is more appropriate.
Another extension that can be made to the SFT technique is to make it tolerate more than one fault. This again can be done by increasing the number of buses to be larger than the number of processors affected by at least two worst-case failures.

Aside from the fault tolerance problems, some other problems have been observed. In Chapter 5, for example, the number of permutations blocked in a Baseline network in exactly ξ paths was calculated for ξ = 0, 1, 2. It would be interesting to calculate this number for all other admissible values of ξ (ξ ≥ 3).

One last problem, concerning the FTC presented in Chapter 6, is to develop a new routing algorithm that takes advantage of the extra paths available under normal conditions. Such an algorithm could run in less time than those mentioned in Chapter 6, because of the flexibility resulting from the extra paths.


Bibliography
[1] M. Abidi and D. Agrawal. “On Conflict-free Permutations in Multi-stage Interconnection Networks”, Journal of Digital Systems, vol. V, Summer 1980, pp. 115-134.
[2] G. Adams and H. Siegel. “The Extra Stage Cube: A Fault-Tolerant Interconnection Network for Supersystems”, IEEE Transactions on Computers, vol. C-31, no. 5, May 1982, pp. 443-454.
[3] ------ “Modifications to Improve the Fault Tolerance of the Extra Stage Cube Interconnection Network”, Proceedings of the 1984 International Conference on Parallel Processing, 1984, pp. 169-173.
[4] G. Adams, D. Agrawal and H. Siegel. “A Survey and Comparison of Fault-tolerant Multistage Interconnection Networks”, Computer, June 1987, pp. 14-27.
[5] D. Agrawal. “Testing and Fault Tolerance of Multistage Interconnection Networks”, Computer, April 1982, pp. 41-53.
[6] ------ “Graph Theoretical Analysis and Design of Multistage Interconnection Networks”, IEEE Transactions on Computers, vol. C-32, no. 7, July 1983, pp. 637-648.
[7] S. Andresen. “The Looping Algorithm Extended to Base 2^t Rearrangeable Switching Networks”, IEEE Transactions on Communications, vol. COM-25, no. 10, October 1977, pp. 1057-1063.
[8] J. Baer. “Multiprocessing Systems”, IEEE Transactions on Computers, vol. C-25, no. 12, December 1976, pp. 613-641.
[9] S. Bandyopadhyay, et al. “A Cellular Permuter Array”, IEEE Transactions on Computers, vol. C-21, no. 10, October 1972, pp. 1116-1119.
[10] G. Barnes and S. Lundstrom. “Design and Validation of a Connection Network for Many-Processor Multiprocessor Systems”, Computer, December 1981, pp. 31-41.
[11] K. Batcher. “The Flip Network in STARAN”, Proceedings of the 1976 International Conference on Parallel Processing, 1976, pp. 65-71.
[12] V. Benes. “On Rearrangeable Three-Stage Connecting Networks”, The Bell System Technical Journal, vol. XLI, no. 5, September 1962, pp. 1481-1492.

[13] ----------------- - Mathematical Theory of Connecting Networks and Telephone
Traffic, New York, Academic Press, 1965.
[14] L. Bhuyan. “A Com binatorial Analysis of M ultibus M ultiprocessors” , Proceed
ings of 1984 International Conference on Parallel Processing, August 1984, pp.
225-227.
[15] ------------------“Interconnection Networks for Parallel and D istributed Process
ing” , Computer, June 1987, pp. 9-12.
[16] C. C ardot. “Comments on A Simple Algorithm for the Control of Rearrangeable
Switching Networks” , IE E E Transactions on Communications, vol. COM-34,
no. 4, April 1986, p. 395.
[17] J. Carpinelli. Interconnection Networks: Improved Routing Methods for Clos and Benes Networks, Ph.D. Thesis, Rensselaer Polytechnic Institute, Troy, NY, August 1987.
[18] ------. “Applications of Edge-Coloring Algorithms to Routing on Parallel Computers”, Proceedings of the 3rd International Conference on Supercomputing, May 1988.
[19] J. Carpinelli and Y. Oruç. “On the Equivalence of Edge Coloring and Matrix Decomposition Algorithms for Routing in Clos Networks”, submitted for publication.
[20] ------. “Some Group Theoretic Results Towards a Linear-Time Set Up Algorithm for Benes Networks”, Proceedings of the 20th Annual Conference on Information Sciences and Systems, March 1986.
[21] ------. “Parallel Set-Up Algorithms for Clos Networks Using a Tree-Connected Computer”, Proceedings of the 2nd International Conference on Supercomputing, May 1987.
[22] ------. “Matrix Decomposition Algorithms for Dynamic Topology Reconfiguration in Parallel Computers”, Proceedings of the 4th International Conference on Supercomputing, April 1989.
[23] J. Carpinelli, C. Lin and M. Singh. “APAP: The Arithmetic Pipeline Analysis Package”, Proceedings of the 19th Annual Pittsburgh Conference on Modeling and Simulation, May 1988.
[24] ------. “APAP: A Computer-based Tool for Analyzing Data Flow in Arithmetic Pipelines”, Proceedings of the 1988 Frontiers in Education Conference, October 1988.
[25] T. Chen. “Parallelism, Pipelining, and Computer Efficiency”, Computer Design, vol. 10, no. 1, January 1971, pp. 69-74.
[26] W. Chu. Advances in Computer Communications and Networking, Artech House, Dedham, MA, 1979.
[27] L. Ciminiera and A. Serra. “A Connecting Network with Fault Tolerance Capabilities”, IEEE Transactions on Computers, vol. C-35, no. 6, June 1986, pp. 578-580.
[28] C. Clos. “A Study of Non-blocking Switching Networks”, Bell System Technical Journal, vol. 32, no. 2, March 1953, pp. 406-424.
[29] C. Das and L. Bhuyan. “Bandwidth Availability of Multiple-Bus Multiprocessors”, IEEE Transactions on Computers, vol. C-34, no. 10, October 1985, pp. 918-926.
[30] W. Davis. Operating Systems: A Systematic View, 2nd Edition, Addison-Wesley, Reading, MA, 1983.
[31] D. Dias and J. Jump. “Analysis and Simulation of Buffered Delta Networks”, IEEE Transactions on Computers, vol. C-30, no. 4, April 1981, pp. 273-282.
[32] ------. “Packet Switching Interconnection Networks for Modular Systems”, Computer, December 1981, pp. 43-53.
[33] ------. “Augmented and Pruned N log N Multistage Networks: Topology and Performance”, Proceedings of the 1982 International Conference on Parallel Processing, 1982, pp. 10-11.
[34] T. Feng. “A Survey of Interconnection Networks”, Computer, December 1981, pp. 12-27.
[35] T. Feng and C. Wu. “Fault-Diagnosis for a Class of Multistage Interconnection Networks”, IEEE Transactions on Computers, vol. C-30, no. 10, October 1981, pp. 743-758.
[36] M. Flynn. “Very High-Speed Computing Systems”, Proceedings of the IEEE, vol. 54, December 1966, pp. 1901-1909.
[37] J. Fraleigh. A First Course in Abstract Algebra, Addison-Wesley, Reading, MA, 1967.
[38] H. Gabow. “Using Euler Partitions to Edge Color Bipartite Multigraphs”, International Journal of Computer and Information Sciences, vol. 5, 1976, pp. 345-355.
[39] I. Gazit and M. Malek. “Fault Tolerance Capabilities in Multistage Network-Based Multicomputer Systems”, IEEE Transactions on Computers, vol. 37, no. 7, July 1988, pp. 788-798.
[40] G. Goke and G. Lipovski. “Banyan Networks for Partitioning Multimicroprocessor Systems”, Proceedings of the 1st Annual Symposium on Computer Architecture, December 1973, pp. 21-28.
[41] S. Golomb. “Permutation by Cutting and Shuffling”, SIAM Review, vol. 3, October 1961, pp. 293-297.
[42] T. Hallin and M. Flynn. “Pipelining of Arithmetic Functions”, IEEE Transactions on Computers, vol. C-21, no. 8, August 1972, pp. 880-886.
[43] J. Hopcroft and R. Karp. “An n^(5/2) Algorithm for Maximum Matchings in Bipartite Graphs”, SIAM Journal on Computing, vol. 2, no. 4, December 1973, pp. 225-231.
[44] A. Jajszczyk. “A Simple Algorithm for the Control of Rearrangeable Switching Networks”, IEEE Transactions on Communications, vol. COM-33, no. 2, February 1985, pp. 169-171.
[45] M. Jeng and H. Siegel. “A Fault-Tolerant Multistage Interconnection Network for the Multiprocessor Systems Using Dynamic Redundancy”, Proceedings of the 6th International Conference on Distributed Computing Systems, 1986, pp. 70-77.
[46] L. Kinny and R. Arnold. “Analysis of a Multiprocessor System with a Shared Bus”, Proceedings of the 5th Annual Symposium on Computer Architecture, April 1978, pp. 89-95.
[47] C. Kruskal and M. Snir. “The Performance of Multistage Interconnection Networks for Multiprocessors”, IEEE Transactions on Computers, vol. C-32, no. 12, December 1983, pp. 1091-1098.
[48] M. Kubale. “Comments on ‘Decomposition of Permutation Networks’”, IEEE Transactions on Computers, vol. C-31, no. 3, March 1982, p. 265.
[49] V. Kumar and S. Reddy. “Augmented Shuffle-Exchange Multistage Interconnection Network”, Computer, June 1987, pp. 30-40.
[50] T. Lang, M. Valero and I. Alegre. “Bandwidth of Crossbar and Multiple-Bus Connections for Multiprocessors”, IEEE Transactions on Computers, vol. C-31, no. 12, December 1982, pp. 1227-1233.
[51] D. Lawrie. “Access and Alignment of Data in an Array Processor”, IEEE Transactions on Computers, vol. C-24, no. 12, December 1975, pp. 1145-1155.
[52] J. Lilienkamp, D. Lawrie and P. Yew. “A Fault Tolerant Interconnection Network Using Error Correcting Codes”, Proceedings of the 1982 International Conference on Parallel Processing, 1982, pp. 123-125.
[53] J. Lenfant. “Parallel Permutations of Data: A Benes Network Control Algorithm for Frequently Used Permutations”, IEEE Transactions on Computers, vol. C-27, no. 7, July 1978, pp. 637-647.
[54] G. Lev, N. Pippenger and L. Valiant. “A Fast Parallel Algorithm for Routing in Permutation Networks”, IEEE Transactions on Computers, vol. C-30, no. 2, February 1981, pp. 93-100.
[55] W. Lin and C. Wu. “Design of a 2 x 2 Fault-Tolerant Switching Element”, Proceedings of the 9th Annual Symposium on Computer Architecture, 1982, pp. 181-189.
[56] H. Lorin. Parallelism in Hardware and Software, Prentice-Hall, Englewood Cliffs, NJ, 1972.
[57] M. Mano. Computer System Architecture, 2nd edition, Prentice-Hall, New Jersey, 1982.
[58] M. Marsan, G. Balbo, G. Conte and F. Gregoretti. “Modelling Bus Contention and Memory Interference in a Multiprocessor System”, IEEE Transactions on Computers, vol. C-32, no. 1, January 1983, pp. 60-72.
[59] R. McMillen and H. Siegel. “Performance and Fault Tolerance Improvements in the Inverse Augmented Data Manipulator Network”, Proceedings of the 9th Annual Symposium on Computer Architecture, April 1982, pp. 63-72.
[60] T. Mudge, J. Hayes and D. Winsor. “Multiple Bus Architectures”, Computer, June 1987, pp. 42-48.
[61] T. Mudge et al. “Analysis of Multiple Bus Interconnection Networks”, Proceedings of the 1984 International Conference on Parallel Processing, August 1984, pp. 228-232.
[62] T. Mudge and H. Al-Sadoun. “A Semi-Markov Model for the Performance of Multiple-Bus Systems”, IEEE Transactions on Computers, vol. C-34, no. 10, October 1985, pp. 934-942.
[63] D. Nassimi and S. Sahni. “A Self-Routing Benes Network and Parallel Permutation Algorithms”, IEEE Transactions on Computers, vol. C-30, no. 5, May 1981, pp. 332-340.
[64] V. Neiman. “Structure et Commande Optimales de Réseaux de Connexion sans Blocage”, Annales des Télécommunications, vol. 24, July-August 1969, pp. 232-238.
[65] D. Opferman and N. Tsao-Wu. “On a Class of Rearrangeable Switching Networks, Part I: Control Algorithm”, Bell System Technical Journal, vol. 50, no. 5, May-June 1971, pp. 1579-1600.
[66] Y. Oruç. Interconnection Networks: Group Theoretic Modeling, Ph.D. Thesis, Syracuse University, Syracuse, NY, 1983.
[67] K. Padmanabhan and D. Lawrie. “A Class of Redundant Path Multistage Interconnection Networks”, IEEE Transactions on Computers, vol. C-32, no. 12, December 1983, pp. 1099-1108.
[68] A. Pages and M. Gondran. System Reliability: Evaluation and Prediction in Engineering, Springer-Verlag, New York, 1986.
[69] J. Patel. “Performance of Processor-Memory Interconnections for Multiprocessors”, IEEE Transactions on Computers, vol. C-30, no. 10, October 1981, pp. 771-780.
[70] R. Pearce, J. Field and W. Little. “Asynchronous Arbiter Module”, IEEE Transactions on Computers, vol. C-24, no. 9, September 1975, pp. 931-932.
[71] M. Pease. “The Indirect Binary n-Cube Microprocessor Array”, IEEE Transactions on Computers, vol. C-26, no. 5, May 1977, pp. 458-473.
[72] C. Raghavendra and A. Varma. “INDRA: A Class of Interconnection Networks with Redundant Paths”, Proceedings of the 1984 Real-Time Systems Symposium, 1984, pp. 153-164.
[73] H. Ramanujam. “Decomposition of Permutation Networks”, IEEE Transactions on Computers, vol. C-22, no. 7, July 1973, pp. 639-643.
[74] B. Rau. “Program Behavior and the Performance of Interleaved Memories”, IEEE Transactions on Computers, vol. C-28, no. 3, March 1979, pp. 191-199.
[75] S. Reddy and V. Kumar. “On Fault Tolerant Multistage Interconnection Networks”, Proceedings of the 1984 International Conference on Parallel Processing, 1984, pp. 155-164.
[76] J. Shen and J. Hayes. “Fault-Tolerance of Dynamic-Full-Access Interconnection Networks”, IEEE Transactions on Computers, vol. C-33, no. 3, March 1984, pp. 241-248.
[77] H. Siegel and S. Smith. “Study of Multistage SIMD Interconnection Networks”, Proceedings of the 5th Annual Symposium on Computer Architecture, April 1978, pp. 223-229.
[78] H. Siegel. “Interconnection Networks for SIMD Machines”, Computer, vol. 12, June 1979, pp. 57-65.
[79] H. Siegel and R. McMillen. “The Multistage Cube: A Versatile Interconnection Network”, Computer, December 1981, pp. 65-76.
[80] H. Stone. “Parallel Processing with the Perfect Shuffle”, IEEE Transactions on Computers, vol. C-20, no. 2, February 1971, pp. 153-161.
[81] S. Thanawastien and V. Nelson. “Interference Analysis of Shuffle/Exchange Networks”, IEEE Transactions on Computers, vol. C-30, no. 8, August 1981, pp. 545-556.
[82] P. Tobias and D. Trindade. Applied Reliability, Van Nostrand Reinhold, New York, 1986.
[83] M. Valero et al. “A Performance Evaluation of the Multiple-Bus Network for Multiprocessor Systems”, Proceedings of the ACM Conference on Performance Evaluation, August 1984, pp. 200-206.
[84] V. Vizing. “On an Estimate of the Chromatic Class of a p-Graph”, Diskret. Analiz., no. 3, 1964, pp. 25-30.
[85] C. Wu and T. Feng. “On a Class of Multistage Interconnection Networks”, IEEE Transactions on Computers, vol. C-29, no. 8, August 1980, pp. 694-702.
[86] ------. “The Reverse-Exchange Interconnection Network”, IEEE Transactions on Computers, vol. C-29, no. 9, September 1980, pp. 694-702.
[87] ------. “The Universality of the Shuffle-Exchange Network”, IEEE Transactions on Computers, vol. C-30, no. 5, May 1981, pp. 324-332.
[88] K. Yoon and W. Hegazy. “The Extra Stage Gamma Network”, Proceedings of the 13th Annual Symposium on Computer Architecture, 1986, pp. 175-182.