A number of hierarchical interconnection networks (HINs) have been proposed in the literature for building large cluster-based multiprocessors. It is highly desirable for a HIN to be fault tolerant, because even a single fault in the network can completely disconnect a large number of processors and memory modules from the rest of the system. As a result, the performance of the system decreases significantly. In this paper, we propose a HIN which can work under faulty conditions with only a slight degradation in performance. We have also developed analytical models to determine the performance of the proposed fault tolerant HIN.
INTRODUCTION
Recently a great deal of attention has been paid to the design of cluster-based multiprocessor systems [1]-[16]. Cluster-based design is very appealing when a system is to be built with a very large number of processors and memory modules, because such a system needs a less expensive interconnection network than a noncluster-based system. A cluster structure using shared buses as the basic interconnection media has been proposed by Wu and Liu [2]. Multiple levels of clustering may be present in their organization. Shared buses are used to interconnect the units within a cluster, and the entire system is built using a hierarchy of buses. Agrawal and Mahgoub [5]-[6] proposed a cluster-based multiprocessor system where a hierarchical interconnection network (HIN) is used for communication.
Conflict-free access within each cluster is provided by relatively small crossbar switches. They showed that the cluster-based scheme provides results close to those of a fully connected crossbar system if every processor accesses the memory modules within its own cluster more frequently than other memory modules. Mahgoub and Elmagarmid [7] proposed a generalized class of cluster-based multiprocessor systems. They proposed a multilevel hierarchical network for their systems, which consists of a large number of smaller crossbar switches. The performance of their network is very close to that of a full crossbar connection if a processor accesses its nearer memory modules more frequently than remote memory modules. Potlapalli and Agrawal [9] proposed a HIN called the Hierarchical Multistage Interconnection Network. This network consists of many levels, and the network at each level is built using multistage interconnection networks. A number of other hierarchical interconnection networks have been proposed in the literature, which can be used for multiprocessor and multicomputer systems [3]-[4], [8].

The performance of a HIN is very sensitive to network faults. Sometimes a single fault in the network can degrade the performance of the system very significantly, depending upon the location of the fault. For example, if any one of the HINs presented in [7] and [9] has a faulty link, then that faulty link will isolate a number of devices (processors and/or memory modules) from the rest of the system. The number of devices which will be isolated depends on the location of the fault: a fault at a higher level isolates more devices than a fault at a lower level. Since all the devices of a hierarchical system cannot be used together in the presence of a fault in the HIN, the performance of the system will degrade, and the amount of degradation will depend on the location of the fault.
In this paper we present a fault tolerant HIN. The HIN is designed using many crossbar switches. Multiple links are used at every input and output port of the crossbar switches. Thus, when a link becomes faulty, the bandwidth of the corresponding port decreases, rather than a number of devices being disconnected from the rest of the system. Hence, all the devices of the entire system can still be used together, but with a slight degradation in performance.
We have developed analytical models to determine the memory bandwidth of the HIN under fault-free and faulty conditions. We have verified our analytical models using extensive simulations. Most of the results from the analytical models agree closely (within 5%) with those from the simulation models.
Section II describes our fault tolerant HIN. The analytical models are presented in Section III. Results from the analytical and simulation models are presented and discussed in Section IV, and the conclusions are presented in Section V.
A FAULT TOLERANT HIERARCHICAL NETWORK
Figure 1 shows a 3-level instance of the proposed fault tolerant HIN. The HIN is built from a number of crossbars connected in a hierarchical manner. All the processors and memory modules of the system are grouped into a number of clusters, called the 0th level clusters or the local clusters. Each 0th level cluster contains $n_0$ processors and $m_0$ memory modules. A 0th level cluster is connected to its 1st level parent interconnection network (IN) through an outlet and an inlet containing $a_1$ and $b_1$ links, respectively. The interconnection network inside a 0th level cluster, called the 0th level IN, is built using an $(n_0+b_1) \times (m_0+a_1)$ crossbar switch.
For a hierarchical system with $L$ levels (including the 0th level), an $i$th ($1 \le i \le L-2$) level ...

If a processor generates a memory reference for one of its local (0th level) memory modules, then that reference goes to the memory module through the local interconnection network. But if a processor generates a reference for one of its $i$th ($i > 0$) level memory modules, then that reference first keeps moving up through the parent outlets of different INs until it reaches an $i$th level IN. The reference then starts moving down through the child outlets of different INs until it reaches the referenced memory module. The above description leads to the following definitions.
Definition
The values of $N_i$, $M_i$, $n_i$, and $m_i$ can be expressed as
$N_i = k_i N_{i-1}$, $n_i = N_i - N_{i-1}$ (1a)
$M_i = k_i M_{i-1}$, $m_i = M_i - M_{i-1}$
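The recurrences above can be evaluated level by level. The following Python sketch is illustrative only: it reads $N_i$ and $M_i$ as the numbers of processors and memory modules in an $i$th level cluster and $k_i$ as the number of $(i-1)$th level clusters grouped into an $i$th level cluster, which is an interpretation of the recurrences rather than a stated definition. The function name and the example parameters are hypothetical.

```python
def cluster_sizes(n0, m0, k):
    """Return (N, M, n, m) as lists indexed by level 0..L-1.

    n0, m0: processors and memory modules in a 0th level cluster.
    k:      k[i] is the assumed branching factor at level i (i >= 1);
            k[0] is unused and may be None.
    """
    N, M = [n0], [m0]
    for i in range(1, len(k)):
        N.append(k[i] * N[i - 1])  # N_i = k_i * N_{i-1}
        M.append(k[i] * M[i - 1])  # M_i = k_i * M_{i-1}
    # n_i = N_i - N_{i-1} and m_i = M_i - M_{i-1}; level 0 keeps n_0 and m_0.
    n = [n0] + [N[i] - N[i - 1] for i in range(1, len(N))]
    m = [m0] + [M[i] - M[i - 1] for i in range(1, len(M))]
    return N, M, n, m


# Hypothetical 3-level system with 4 processors and 4 modules per local cluster.
N, M, n, m = cluster_sizes(n0=4, m0=4, k=[None, 2, 3])
print(N, n)
```

Under this reading, $n_i$ counts the processors that share an $i$th level cluster with a given memory module but lie outside its $(i-1)$th level cluster.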
B. Analytical Model for the Fault Free HIN
The models presented in this paper are developed based on the following assumptions:
1) The multiprocessor system is synchronous with N processors and M memory modules.
2) The new references generated in a cycle are random and independent of each other.
3) The references which are not accepted during a memory cycle are resubmitted for the same memory modules in the next cycle.
4) $\psi$ is the probability with which an active (unblocked) processor generates a new reference in a memory cycle.
Assumptions 1 and 2 are used by almost all the bandwidth analysis models available in the literature. Assumption 3 is used to make our model more realistic.
Let $f$ be the fraction of the processors which are active at steady state; the other processors are blocked because their references were not accepted during the previous cycle(s). It can be shown [1] that for a system with $N$ processors and $M$ memory modules and for the uniform reference model, the probability that a particular memory module is requested by at least one active or blocked processor is M
Let $f_i$ be the fraction of the processors which are blocked for their $i$th level memory modules. Then we can say that 
The bandwidth of the hierarchical system can then be determined as BW = M *
Computation of Bandwidth Contribution from the ith ($0 \le i \le L-1$) Level References
An $i$th level IN receives $(i+1)$th and higher level references through its parent inlet. Let $v_{i+1,j}$ be the portion of these references which is directed to $j$th ($i+1 \le j \le L-1$) level memory modules. The bandwidth contributions due to different types of references will be proportional to the number of corresponding references which arrive at a local cluster. Let $d_0$ be the average number of distinct 0th level references generated by the active and blocked processors of a local cluster. The value of $d_0$ can be expressed as $d_0 = m_0 *$
The average number of $i$th level references which arrive at a local cluster from the 1st level parent IN is $p_{d_1} b_1 v_{1,i}$. Hence,
The fraction of processors which attempt to access their $i$th level memory modules during a memory cycle is $f \psi s_i + f_i$ ($0 \le i \le L-1$). But the fraction of the processors which get service from their $i$th level memory modules during a memory cycle is $BW_i/N$ ($0 \le i \le L-1$). Thus, the value of $f_i$ for the next iteration of bandwidth computation can be determined as in (12), where $n$ is the iteration number. The following iterative algorithm can be used to determine the bandwidth of the hierarchical system. Accept BW as the bandwidth of the system.
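The flavor of such an iteration can be illustrated on a single-level system. The sketch below is not the hierarchical algorithm of this section. It assumes a flat system with $N$ processors and $M$ modules, assumes that every pending request (new or resubmitted) is spread uniformly over the modules, and uses the textbook expression $M(1-(1-r/M)^N)$ for the bandwidth at per-processor request rate $r$. It keeps only the update rule described above: processors that attempt an access minus processors that are served remain blocked in the next iteration. The function name, the tolerance, and the symbol `psi` for the request probability are assumptions of this sketch.

```python
def bandwidth_single_level(N, M, psi, tol=1e-9, max_iter=10_000):
    """Fixed-point estimate of (bandwidth, active fraction f) for a flat system."""
    f = 1.0  # start with every processor active
    for _ in range(max_iter):
        # A processor has a pending request if it is blocked (1 - f)
        # or if it is active and issues a new reference (f * psi).
        r = (1.0 - f) + f * psi
        # Assumed uniform-reference bandwidth: expected number of busy modules.
        bw = M * (1.0 - (1.0 - r / M) ** N)
        # Attempting fraction minus served fraction stays blocked.
        blocked = r - bw / N
        f_next = 1.0 - blocked
        if abs(f_next - f) < tol:
            return bw, f_next
        f = f_next
    raise RuntimeError("iteration did not converge")


bw, f = bandwidth_single_level(N=8, M=8, psi=0.5)
print(round(bw, 3), round(f, 3))
```

At the fixed point the rate of newly generated references, $f\psi$, equals the rate of served references, $BW/N$, which is the steady-state condition the iteration searches for.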
C. Analytical Model for the Faulty HIN
Analytical models for single and multiple faulty outlets are developed in this subsection. An outlet is considered faulty if it has one or more faulty links.
We say that a link is faulty if it cannot be used, either because there is a fault on the link itself or because there is a fault in an IN which makes the link unusable. We first present a few lemmas and then develop the model.
where $d_{u_i}$ is given by (4). In our approximate model we assume that the bandwidth available from the $i$th level parent outlet of a good $i$th level cluster is proportional to $p_{u_{i+1}} a_{i+1}$, and that available from the faulty $i$th level parent outlet is proportional to $p_{u_{i+1}}(a_{i+1} - x)$. Now using Lemma 2 we can say that the bandwidth contribution from the processors of a good $i$th level cluster is proportional to $p_{u_{i+1}} a_{i+1}$, and that from the processors of the faulty $i$th level cluster is proportional to $p_{u_{i+1}}(a_{i+1} - x)$. Thus, the total bandwidth of the HIN in the presence of the faulty parent outlet $u_i$ is given by
where $C_i$ is the total number of $i$th level clusters in the entire system and $p_{u_{i+1}}(a_{i+1} - x)$
$\dots \cup U_t = U$
From the above algorithm it is clear that $U_i \cap U_j = \emptyset$ for $0 \le i, j \le t$ and $i \ne j$. Since the outlet $u_i \in U_0$ is neither an ancestor nor a descendant of the outlet $u_j \in U$ for all $i \ne j$, in our approximate model we assume that the outlet $u_i \in U_0$ does not have any significant effect on any other outlet of the set $U$, and vice versa. Hence, the total loss in bandwidth due to the presence of all the faulty outlets of the set $U_0$ is
$\Delta BW(U_0) = \sum_{u_i \in U_0} bw_{h_i}\,\Delta_u(h_i, x_i)$
Since $U_i \cap U_j = \emptyset$ for $i \ne j$, it is clear that the references which move through one outlet of a set cannot move through an outlet of another set. Thus, in our model we assume that the outlets of one set do not have any significant effect on the outlets of another set.

Without loss of generality, we can assume that $U_1 = \{u_1, u_2, u_3, \dots, u_v\}$, where $v$ is the number of faulty outlets in $U_1$. It is clear that, for every pair of faulty outlets $u_i, u_j \in U_1$, the outlet $u_i$ is either an ancestor or a descendant of the outlet $u_j$. Without loss of generality, we can also assume that $h_i < h_j$ for all $i < j$. Thus, $u_j$ is an ancestor of $u_i$ for all $u_i, u_j \in U_1$ and $j > i$. Since $u_j$ is an ancestor of $u_i$, the references which move through $u_i$ also move through $u_j$. As a result, there will be a direct effect of one faulty outlet on another.

The loss in bandwidth due to the presence of the faulty outlet $u_i$ is $bw_{h_i}\,\Delta_u(h_i, x_i)$. Hence, the bandwidth contribution from the faulty $h_j$th ($h_j > h_i$) level cluster, due to the presence of the faulty outlet $u_i$, is given by $(bw_{h_j} - bw_{h_i}\,\Delta_u(h_i, x_i))$. The bandwidth contribution from the faulty $h_j$th level cluster, due to the presence of the faulty outlets $u_i$ and $u_j$, is $(bw_{h_j} - bw_{h_i}\,\Delta_u(h_i, x_i))(1 - \Delta_u(h_j, x_j))$. Hence, the total loss in bandwidth due to the presence of the faulty outlets $u_i$ and $u_j$ is
Similarly, the total loss in bandwidth due to the presence of all the faulty parent outlets of the set $U_1$ can be expressed as $\Delta BW(U_1) = BW *$
The bandwidth of the HIN in the presence of all the faulty parent outlets of the set U is given by
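The two cases treated above, unrelated faulty parent outlets whose losses add and nested outlets whose losses compound, can be contrasted numerically. The Python sketch below takes the per-outlet loss fractions $\Delta_u(h, x)$ and the cluster bandwidth contributions $bw_h$ as given inputs. The extension of the two-outlet expression to a longer chain is an assumption of this sketch, and the function names and numbers are hypothetical.

```python
def independent_loss(outlets):
    """Loss for faulty outlets with no ancestor/descendant relation.

    outlets: iterable of (bw_h, delta) pairs, where bw_h is the bandwidth
    contribution of the cluster below the outlet and delta is Delta_u(h, x).
    """
    return sum(bw_h * delta for bw_h, delta in outlets)


def nested_loss(chain):
    """Loss for faulty outlets on one ancestor chain, lowest level first.

    For two outlets this is bw_hj - (bw_hj - bw_hi * d_i) * (1 - d_j),
    the pairwise expression from the text; longer chains repeat the same
    step, which is an assumed generalization.
    """
    loss = 0.0
    for bw_h, delta in chain:
        remaining = (bw_h - loss) * (1.0 - delta)
        loss = bw_h - remaining
    return loss


faults = [(4.0, 0.25), (8.0, 0.5)]  # made-up (bw_h, delta) values
print(independent_loss(faults), nested_loss(faults))
```

With the same inputs the nested case loses less than the independent case, because references already removed by the lower outlet cannot be lost again at its ancestor.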
We now develop analytical models for faulty child outlets. We first present three more lemmas and one definition, and then the analytical model.
Lemma 3: An $i$th level child outlet carries the $j$th ($i \le j \le L-1$) level references of $(k_j - 1)$ $(j-1)$th level clusters.
Lemma 4: The $i$th level references of a processor move through $N_{d_j}$ $j$th ($j \le i$) level child outlets, where
Definition 9: A cluster is called an affected cluster if the bandwidth contribution from every processor of that cluster is affected by the presence of fault(s) in the system.
Analytical Model for One Faulty ith Level Child Outlet
Let the faulty child outlet be $d_i$. Let $p_{dr}$ be the rate of reference at a link of the faulty child outlet. The value of $p_{dr}$ can be expressed as
where $d_{d_i}$ is given by (6). In our approximate model we assume that the bandwidth available from the faulty $i$th level child outlet is proportional to $p_{dr}(b_i - x)$ and that available from a good $i$th level child outlet is proportional to $p_{d_i} b_i$.
Using Lemmas 1, 2, 3, and 4 we can show that the fraction of the bandwidth contribution which will be lost from an $(i-1)$th level affected cluster is $\Delta_d(i, x)/(k_i - 1)$, where
Let us use the term $\delta(j, i, x)$ to indicate the fraction of the bandwidth contribution which will be lost from a $j$th level affected cluster due to the presence of $x$ faulty links in an $i$th level child outlet. Using Lemma 4, we can show that in general, the value of $\delta(j, i, x)$ can be expressed as $\delta(j, i, x) =$
Since there are $(k_{j+1} - 1)$ $j$th level affected clusters, the total loss in bandwidth from all the affected clusters in the system can be expressed as
$\Delta BW(\{d_i\}) = \sum_{j=i-1}^{L-2} bw_j\,(k_{j+1} - 1)\,\delta(j, i, x)$ (22)
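The summation in (22) is straightforward to evaluate once the per-cluster loss fraction is available. In the Python sketch below, the loss fraction is passed in as a callable `delta(j, i, x)` because its closed form is not reproduced here; the constant used in the example is a made-up placeholder, and the function name and parameter values are hypothetical. The list `bw` is taken to hold the bandwidth contribution of one cluster at each level, and `k[j]` is the number of $(j-1)$th level clusters in a $j$th level cluster.

```python
def child_outlet_loss(i, x, bw, k, delta):
    """Evaluate (22): total loss caused by x faulty links in one
    ith level child outlet.

    bw:    bw[j] is the bandwidth contribution of a jth level cluster,
           for j = 0..L-1.
    k:     k[j] for j = 1..L-1 (k[0] unused).
    delta: callable delta(j, i, x) returning the fraction lost from a
           jth level affected cluster (supplied by the caller).
    """
    L = len(bw)
    # j runs from i-1 to L-2 inclusive; there are (k_{j+1} - 1) affected
    # clusters at level j.
    return sum(
        bw[j] * (k[j + 1] - 1) * delta(j, i, x)
        for j in range(i - 1, L - 1)
    )


# Made-up 4-level example with a constant placeholder loss fraction.
loss = child_outlet_loss(
    i=2, x=1,
    bw=[1.0, 2.0, 4.0, 8.0],
    k=[None, 2, 2, 2],
    delta=lambda j, i, x: 0.5,
)
print(loss)
```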
The above term can be reduced to the following closed form. Hence, the bandwidth of the HIN with $x$ faulty links in an $i$th level child outlet is
Model for Multiple Faulty Child Outlets
Let the set of faulty child outlets be $D = \{d_1, d_2, d_3, \dots\}$.
Since the outlet $d_j \in D_{i,0}$ is neither an ancestor nor a descendant of the outlet $d_k \in D_i$ for all $j \ne k$, we assume that $d_j \in D_{i,0}$ does not have any significant effect on $d_k \in D_i$, and vice versa, for all $j \ne k$. Assume that the outlet $d_j \in D_{i,0}$ carries some $g_j$th level references of 0th level cluster #i. The loss in bandwidth from 0th level cluster #i due to the presence of the faulty outlet $d_j \in D_{i,0}$ is given by $bw_0\,\delta(g_j - 1, h_j, x_j)$.
For every pair of outlets $d_j, d_k \in D_{i,n}$ ($1 \le n < m$), $d_j$ is either an ancestor or a descendant of $d_k$. Hence, any two outlets of $D_{i,n}$ will have a direct effect on each other. Lemma 5 shows that the outlets of $D_{i,n}$ carry only one type of references (say, $g_n$th level references) generated by the processors of 0th level cluster #i. If the faulty outlets of the set $D_{i,n}$ are the only faulty outlets in the system, then the loss in bandwidth from 0th level cluster #i can be expressed as
Since, at steady state, the total bandwidth available from a cluster is proportional to the bandwidth available from the $i$th ($0 \le i \le L-1$) level references of the cluster, the maximum bandwidth which can be obtained from 0th level cluster #i is limited by the references which cause the maximum loss in bandwidth. Let $\Delta bw_i(j)$ be the loss in bandwidth from 0th level cluster #i caused by the $j$th level references, and $bw_0(i)$ be the total bandwidth contribution from 0th level cluster #i. Then we can write
The values of $\Delta bw_i(j)$ ($1 \le j \le L-1$) can be determined using the following algorithm. 
Model for Multiple Faulty Parent and Child Outlets
Let the set of faulty parent and child outlets be U = ( 
Assume that the child outlet $d_j \in D_{i,0}$ carries $g_j$th level references of 0th level cluster #i. Then we assume that the loss in bandwidth from 0th level cluster #i, due to the presence of all the faulty parent outlets $u_k \in U_i$ at levels 0 through $g_j - 1$ and the faulty child outlet $d_j \in D_{i,0}$, is
Assume that the outlets of $D_{i,n}$ ($n \ge 1$) carry $g_n$th level references of 0th level cluster #i. Now we can say that the loss in bandwidth from 0th level cluster #i, due to the presence of all the faulty child outlets of the set $D_{i,n}$ ($n \ge 1$) and all the faulty parent outlets $u_k \in U_i$ at levels 0 through $g_n - 1$, is
The total loss in bandwidth from 0th level cluster #i depends on those references which cause the maximum loss. Hence, after getting $\Delta bw_i(j)$ ($1 \le j \le L-1$) from Algorithm 6, we can use (25) to determine the bandwidth available from 0th level cluster #i, and then we can use (26) to determine the bandwidth of the HIN with multiple faulty parent and child outlets.
NUMERICAL RESULTS AND DISCUSSIONS
We have analyzed a number of 4-level hierarchical systems. In our analysis, the memory references of a processor were distributed among different memory modules in such a way that all the INs have the same utilization. Note that when the INs at a particular level become the bottleneck, the performance of a hierarchical system degrades severely, because most of the processors will be blocked for the memory modules at the corresponding level. A simulation program was written to simulate the synchronous behavior of the hierarchical multiprocessors. In order to determine the bandwidth (with 95% confidence interval) of a particular system for a given set of parameters, the program was run ten times with different seeds, and each run was for 400,000 memory cycles.
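A 95% confidence interval from ten independent runs can be obtained with the usual Student-t construction. The Python sketch below shows one common way to do this; whether the simulation program used exactly this estimator is not stated, so it should be read as an assumption. The function name is hypothetical and the sample values in the usage lines are made up.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom.
T_975_DF9 = 2.262


def confidence_interval_95(bandwidths):
    """Return (mean, half_width) for exactly ten per-run bandwidth values."""
    if len(bandwidths) != 10:
        raise ValueError("the critical value above is only valid for ten runs")
    mean = statistics.mean(bandwidths)
    # Sample standard deviation (n - 1 in the denominator).
    std = statistics.stdev(bandwidths)
    half_width = T_975_DF9 * std / math.sqrt(len(bandwidths))
    return mean, half_width


# Made-up per-seed bandwidth values, one per run.
runs = [41.8, 42.1, 41.9, 42.3, 42.0, 41.7, 42.2, 42.0, 41.9, 42.1]
mean, half = confidence_interval_95(runs)
print(f"{mean:.2f} +/- {half:.2f}")
```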
In order to have the same utilization for all the INs, we determined the values of $s_i$ ($0 \le i \le L-1$) as follows:
1. $s_0 = m_0/(m_0+a_1)$ and $w_1 = a_1/(m_0+a_1)$
2. $s_i = w_i k_i b_i/(k_i b_i + a_{i+1})$ and $w_{i+1} = w_i a_{i+1}/(k_i b_i + a_{i+1})$ for $1 \le i \le L-2$
3. $s_{L-1} = w_{L-1}$

Table I shows the parameters of three different 4-level HINs and their bandwidth under fault-free conditions. For all these systems $a_i = b_i = 2$ ($1 \le i \le 3$). Both the analytical and simulation results were obtained for $\psi = 1$. The simulation results are shown with 95% confidence intervals. From Table I it is seen that, for fault-free conditions, the results from the analytical model are very close to those from the simulation model. The error column of Table I shows the error in the results from the analytical model, which is under 5%.

Each one of the above three systems was analyzed under ten different types of faulty conditions. Table II shows the types of faults which were investigated for the above mentioned HINs. When we say that an outlet is faulty, this means that one out of the two links of that outlet is faulty. Table III shows the performance of the systems HIN1 through HIN3 in the presence of different types of faulty outlets. This table also shows that in most cases the results from the analytical models are within 5% of those from the simulation models. Comparing the results of Table I with those of Table III, we see that the performance of a system degrades in the presence of faults. The degradation depends on the position of the faults as well as on the number of faults. The performance degradation of different hierarchical systems in the presence of different types of faulty parent outlets is summarized in Figure 2. This figure shows that the performance of a system is more sensitive to the position of a fault than to the number of faults, especially when the INs are not underutilized.
For example, the degradation due to one 3rd level faulty parent outlet (see fault G1) is more than that due to two faulty parent outlets at the 1st and 2nd levels (see fault G2). Comparing the performance of G3 with G4, it is seen that for multiple faulty parent outlets at certain levels, the performance degradation is slightly more when no faulty parent outlet is the ancestor/descendant of the other faulty parent outlets.

Table II. Types of faults investigated.
One 3rd level parent outlet is faulty.
Two parent outlets (a 1st level and a 2nd level) are faulty. Neither faulty outlet is the ancestor/descendant of the other one.
Three parent outlets (a 1st level, a 2nd level, and a 3rd level) are faulty. No faulty outlet is the ancestor/descendant of the other faulty outlets.
Three parent outlets (a 1st level, a 2nd level, and a 3rd level) are faulty. The 2nd level faulty outlet is the ancestor of the 1st level faulty ...
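The equal-utilization rule for the reference fractions $s_i$ used in this section can be evaluated with a short loop. The Python sketch below follows the three steps of that rule directly. The function name is hypothetical, the reading of $w_i$ as the fraction of references that travel up to level $i$ or beyond is an interpretation, and the values of $k_i$ and $m_0$ in the usage lines are made up (only $a_i = b_i = 2$ is taken from the text).

```python
def reference_fractions(m0, a, b, k):
    """Return [s_0, ..., s_{L-1}] for an L-level system.

    a, b, k are lists indexed 1..L-1 (index 0 is unused), holding the
    parent-outlet links a_i, child-outlet links b_i, and branching
    factors k_i.
    """
    L = len(a)
    s = [m0 / (m0 + a[1])]       # step 1: s_0
    w = a[1] / (m0 + a[1])       # step 1: w_1
    for i in range(1, L - 1):    # step 2: 1 <= i <= L-2
        denom = k[i] * b[i] + a[i + 1]
        s.append(w * k[i] * b[i] / denom)
        w = w * a[i + 1] / denom  # w_{i+1}
    s.append(w)                  # step 3: s_{L-1} = w_{L-1}
    return s


# 4-level example with a_i = b_i = 2; m0 and k are hypothetical.
s = reference_fractions(m0=4, a=[None, 2, 2, 2], b=[None, 2, 2, 2], k=[None, 4, 4, 4])
print([round(v, 4) for v in s], round(sum(s), 6))
```

Because $s_0 + w_1 = 1$ and $s_i + w_{i+1} = w_i$ at every step, the fractions always sum to one, which is a convenient sanity check on any parameter set.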
CONCLUSION
It is highly desirable for a hierarchical interconnection network (HIN) to be fault tolerant, because a single faulty link can disconnect a set of processors and memory modules from the rest of the system. As a result, all the processors and memory modules of the system cannot be used together 
