Cluster-based systems have attracted significant attention because they need much less expensive interconnection networks than non-cluster-based systems. A number of hierarchical interconnection networks (HINs) have been proposed in the literature for building large cluster-based systems, but most of them are not fault tolerant. Fault tolerance is very desirable in a HIN, because even a single fault in the network can completely disconnect a large number of processors and/or memory modules from the rest of the system and thereby degrade performance significantly. In this paper, we propose two types of fault-tolerant HINs which can be used to build large cluster-based multiprocessor systems. We also develop analytical models to determine the performance of the proposed HINs under fault-free and faulty conditions. Simulation models were developed to verify the accuracy of the analytical models, and the results obtained from the analytical models were found to be very close to those obtained from the simulation models. The modeling technique used in this paper can also be applied to other hierarchical systems.
INTRODUCTION
Recently a great deal of attention has been paid to the design of cluster-based multiprocessor systems. Cluster-based design is very appealing when a system is to be built with a very large number of processors and memory modules. A cluster-based multiprocessor system needs a less expensive interconnection network than a non-cluster-based system. A number of cluster-based designs are available in the literature. The Cm* [1] is made up of 50 processor-memory pairs called compute modules, grouped into clusters. Communication within a cluster is via a parallel bus controlled by an address mapping processor termed a Kmap. There are five clusters and these communicate via an intercluster bus. The CEDAR system [2, 3] uses a bus interconnection between the processors within a cluster and the cluster memory they share, and a multistage interconnection network between all processors and the global memory shared among all clusters. The DASH multiprocessor [4] is also a cluster-based system. The processors and memory modules of a cluster are connected by a bus. This multiprocessor system can have as many clusters as needed. All the clusters can be connected by a
148 S. M. MAHMUD, L. T. SAMARATUNGA AND S. KOMMIDI
a multilevel hierarchical network for their systems, which consists of a large number of smaller crossbar switches. The performance of their network is very close to that of a full crossbar connection if a processor accesses its nearer memory modules more frequently than remote memory modules. Potlapalli and Agrawal [9] proposed a HIN called the Hierarchical Multistage Interconnection Network. This network consists of many levels, and the network at each level is built using multistage interconnection networks.
A number of other hierarchical interconnection networks which can be used for multiprocessor and multicomputer systems have been proposed in the literature [10-29].
The motivation behind designing HINs is to exploit the inherent locality that exists in many general and parallel computations. The success of all cluster-based systems with reduced or limited interconnection depends on the locality of computations. This means that a processor must access the memory modules within its own cluster more frequently than those in other clusters. In fact, the analysis of all hierarchical networks assumes that the probability that a processor generates a reference for one of its ith-level memory modules is $p_i$, where $p_i > p_j$ for all $i < j$.
The performance of a HIN is very sensitive to network faults. Depending upon its location, a single fault in the network can degrade the performance of the system very significantly. For example, if any one of the HINs presented in [5, 6, 8, 9] has a faulty link, then that faulty link will isolate a number of devices (processors and/or memory modules) from the rest of the system. The number of devices that will be isolated depends on the location of the fault: a fault at a higher level isolates more devices than a fault at a lower level. Since all the devices of a hierarchical system can not be used together in the presence of a fault in the HIN, the performance of the system will degrade, and the degradation will be most significant if the fault occurs at or near the highest level of the system. Moreover, if multiple faults exist in a HIN, then these faults may divide the entire system into many small isolated subsystems. Thus, in the presence of multiple faults in the HIN, the system may not be usable at all.
In this paper we present two fault-tolerant HINs. Both HINs are designed using many small crossbar switches. In the first type of HIN, multiple links are used at every input and output port of the crossbar switches. The bandwidth available from a port of a crossbar depends on the number of links present in that port. Thus, when a link becomes faulty, the bandwidth of the corresponding port decreases, but no devices become disconnected from the rest of the system. Hence, all the devices of the entire system can still be used together, with a slight degradation in performance. In the second type of HIN we use only one link in every input and output port of a crossbar, but we use a small backup circuit with every crossbar in order to tolerate one or more faults within the crossbar. We have developed analytical models to determine the memory bandwidth of both types of HINs under fault-free and faulty conditions, and we have verified these models using extensive simulations. Most of the results from the analytical models agree closely (within 5%) with those from the simulation models.
Section 2 describes our fault-tolerant HINs. The analytical models are presented in Section 3. Results from the analytical and simulation models are presented and discussed in Section 4, and the conclusions are presented in Section 5.
DESCRIPTION OF FAULT-TOLERANT HIERARCHICAL INTERCONNECTION NETWORKS
The performance of a hierarchical system is very sensitive to network faults. If a link can not be used, either because the link itself is faulty or because a fault elsewhere in the network makes the link unusable, then a set of processors and memory modules becomes disconnected from the rest of the processors and memory modules of the system. As a result, the performance of the system may degrade significantly, depending upon what fraction of the processors and memory modules lies within the disconnected set. Thus, it is very desirable that a hierarchical interconnection network be fault tolerant.
A multiple-link-based HIN
The multiple-link-based HIN, presented in this paper, has many levels of hierarchy. 
In general, we can say that if a hierarchical system has L levels, then an ith 
From the above description it is clear that $k_1$ zeroth-level clusters are connected to a first-level IN to form a first-level cluster, $k_2$ first-level clusters are connected to a second-level IN to form a second-level cluster, $k_3$ second-level clusters are connected to a third-level IN to form a third-level cluster, and so on. Thus, the total number of zeroth-level clusters in an L-level hierarchical system is $k_1 k_2 k_3 \cdots k_{L-1}$. Since there are $n_0$ processors and $m_0$ memory modules in each zeroth-level cluster, the total numbers of processors and memory modules in the system are

$$N = n_0 \prod_{i=1}^{L-1} k_i, \qquad M = m_0 \prod_{i=1}^{L-1} k_i.$$

Figure 1 shows a four-level multiple-link-based hierarchical system. If a processor generates a memory reference for one of its local (zeroth-level) memory modules, then that reference goes to the memory module through the local interconnection network. However, if a processor generates a reference for one of its ith-level ($i > 0$) memory modules, then that reference first keeps moving up through the parent outlets of different INs until it reaches the ith-level IN of the ith-level cluster in which the processor is located. The reference then starts moving down through the child outlets of different INs until it reaches the referenced memory module.
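The cluster counts described above translate directly into a size calculation. The following is a minimal sketch, assuming the branching factors are supplied as a list `k = [k_1, ..., k_{L-1}]`; the function name `system_size` and the example values are hypothetical and are not taken from the paper.

```python
from math import prod

def system_size(n0, m0, k):
    """Return (N, M), the total processors and memory modules.

    n0, m0: processors and memory modules in each zeroth-level cluster.
    k:      branching factors [k_1, ..., k_{L-1}] of an L-level system.
    """
    zeroth_level_clusters = prod(k)  # k_1 * k_2 * ... * k_{L-1}
    return n0 * zeroth_level_clusters, m0 * zeroth_level_clusters

# Hypothetical four-level example: k_1 = k_2 = k_3 = 8,
# with 4 processors and 4 memory modules per local cluster.
print(system_size(4, 4, [8, 8, 8]))
```

An empty list `k` corresponds to a single zeroth-level cluster, for which the function simply returns `(n0, m0)`.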
A HIN with fault-tolerant INs
Here we propose another type of fault-tolerant HIN. This type of HIN is designed using only one link at every inlet and outlet port. However, every IN has a backup circuit, as shown in Figure 2, to tolerate faults within the crossbar (main crossbar). If any reference can not move through the main crossbar due to the presence of fault(s) in the main crossbar, then that reference tries to move through the backup circuit. For an ith-level fault-tolerant IN, the backup circuit is composed of two small crossbars: one
It is assumed that the main crossbar has a built-in fault detection circuit, which generates control signals to route memory references through the backup circuit when necessary. The backup circuit allows a maximum of $z_i$ references to move through it, if these references can not move through the main crossbar due to the presence of fault(s) in the main crossbar. Thus, if $z_i$ or fewer references can not move through the main crossbar due to the presence of fault(s), the performance of the system will not degrade, because all these references can move through the backup circuit. However, the performance of the system will degrade if more than $z_i$ references need to be moved through the backup circuit.
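The capacity rule of the backup circuit can be illustrated with a very small sketch. It only captures the counting argument (at most $z_i$ displaced references are carried per cycle) and does not model the crossbars themselves; the function name and arguments are hypothetical.

```python
def split_displaced_references(displaced, z_i):
    """Split references that can not use the main crossbar.

    displaced: number of references blocked by fault(s) in the main crossbar.
    z_i:       maximum number of references the backup circuit can carry.
    Returns (carried_by_backup, still_blocked).
    """
    if displaced < 0 or z_i < 0:
        raise ValueError("counts must be non-negative")
    carried = min(displaced, z_i)
    return carried, displaced - carried

# No degradation while displaced <= z_i; degradation starts beyond that.
print(split_displaced_references(1, 1))  # all displaced references carried
print(split_displaced_references(4, 1))  # three references remain blocked
```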
PERFORMANCE ANALYSIS
In this section we present analytical models to determine the memory bandwidths of the HINs presented in the previous section. Different analytical models are developed to determine the bandwidths of the HINs in the presence of different types of faults. The following main notation is used to describe different parameters of the system. In addition to this notation, some other parameters are also used in this paper, and they are defined just before they are used in the text.
Notation
The number of processors in an ith-level cluster is
The number of memory modules in an ith-level cluster is
HIERARCHICAL NETWORKS FOR SHARED MEMORY MULTIPROCESSORS 151
The number of ith-level processors of a memory module is
The number of ith-level memory modules of a processor is
The total number of ith-level clusters in the entire system is
The models presented in this paper are developed based on the following assumptions:
(1) the multiprocessor system is synchronous with N processors and M memory modules; (2) the new references generated in a cycle are random and independent of each other; (3) the references which are not accepted during a memory cycle are resubmitted for the same memory modules in the next cycle; (4) ψ is the probability with which an active (unblocked) processor generates a new reference in a memory cycle.
Assumptions (1) and (2) are used by almost all the bandwidth analysis models available in the literature. Assumption (3) is used to make our model more realistic.
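Assumptions (1) to (4) can be made concrete with a small cycle-by-cycle simulation. The sketch below applies them to a single N × M crossbar rather than to a HIN, so it is only an illustration of the assumptions and not the simulator used in this paper; the arbitration rule (one randomly chosen winner per memory module per cycle) and all names are assumptions.

```python
import random

def simulate_crossbar(N, M, psi, cycles, warmup, seed=0):
    """Estimate bandwidth (accepted references per cycle) of an N x M crossbar."""
    rng = random.Random(seed)
    pending = [None] * N  # module a processor is waiting for, or None if active
    accepted = 0
    for cycle in range(cycles):
        requests = {}  # memory module -> processors requesting it this cycle
        for p in range(N):
            # Assumption (4): an active processor issues a new reference with
            # probability psi; assumption (2): the target is random.
            if pending[p] is None and rng.random() < psi:
                pending[p] = rng.randrange(M)
            # Assumption (3): a rejected reference is resubmitted to the same module.
            if pending[p] is not None:
                requests.setdefault(pending[p], []).append(p)
        for procs in requests.values():
            winner = rng.choice(procs)  # each module accepts one reference per cycle
            pending[winner] = None
            if cycle >= warmup:         # ignore the initial transient
                accepted += 1
    return accepted / (cycles - warmup)

print(simulate_crossbar(N=8, M=8, psi=1.0, cycles=2000, warmup=500))
```

The estimate can never exceed min(N, M), since a module accepts at most one reference per cycle and a processor has at most one outstanding reference.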
Bandwidth analysis of a multiple-link-based HIN

Model for a fault-free multiple-link-based system
Let f be the fraction of the processors which are active at steady state. Note that normally f is less than 1, because some of the processors may be blocked because their references were not accepted during the previous cycle(s). The blocked processors will remain in the inactive state until their references are accepted by the memory modules, and then they will go to the active state. Since $s_i$ is the fraction of a processor's references which are directed to its ith-level memory modules and there are $m_i$ ith-level memory modules of a processor, using the empirical expression developed by Yen et al. [30], one can show that the average number of distinct references competing for the parent outlet of a local cluster is
Computation of the rate of reference at a link of the parent outlet of an IN. Since the parent outlet of a local cluster has $a_1$ links, during a memory cycle the average number of distinct references arriving at a link of the parent outlet of a local cluster is $du_0/a_1$. Since $pu_1$ is the rate of reference at a link of the parent outlet of a zeroth-level IN and the rate of reference at a link can not be greater than unity, the value of $pu_1$ can be determined as
During a memory cycle, the probability that a processor either generates a new reference for an ith-level memory module or is already blocked for an ith-level memory module is $\psi f s_i + f_i$. An ith-level IN can receive only ith- and higher-level references from its child INs. Let $u_{i,j}$ be the portion of these references which is directed to the jth-level memory modules. The value of $u_{i,j}$ can be expressed as
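The two conditions stated for $pu_1$ (an average of $du_0/a_1$ distinct references per link, and a rate that can not exceed unity) suggest a capped ratio. The sketch below implements that reading; it is an assumption consistent with the text rather than a transcription of the paper's equation, and the function name is hypothetical.

```python
def link_reference_rate(distinct_refs, links):
    """Rate of reference at one link of an outlet, capped at unity.

    distinct_refs: average number of distinct references competing for the
                   outlet in a memory cycle (du_0 for a local cluster).
    links:         number of links in the outlet (a_1 for a local cluster).
    """
    if links <= 0:
        raise ValueError("an outlet needs at least one link")
    return min(1.0, distinct_refs / links)

print(link_reference_rate(1.5, 2))  # lightly loaded outlet
print(link_reference_rate(3.0, 2))  # saturated outlet, rate capped at 1.0
```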
The average number of distinct references competing for the parent outlet of an ith-level IN is
The rate of reference at a link of the parent outlet of an ith-level IN can be expressed as follows: 
where $pd_L = 0$ and $u_{L-1,L-1} = 1$. The average number of distinct references competing for a link of a child outlet of an ith-level IN can be expressed as
Do the analysis shown by (6) through (13), and get the value of BW using (13).
Total memory bandwidth of the hierarchical system is
Computation of bandwidth contribution from the ith
An ith-level IN receives (i + 1)th- and higher-level references through its parent inlet. Let $v_{i+1,j}$ be the portion of these references which is directed to the jth-level memory modules. The value of $v_{i+1,j}$ can be determined as follows:
and
The bandwidth contributions from different types of references will be proportional to the number of corresponding references which arrive at a local cluster. Let $d_z$ be the average number of distinct zeroth-level references generated by the active and blocked processors of a local cluster. The value of $d_z$ can be expressed as
The average number of ith-level references which arrive at a local cluster from the first-level parent IN is $pd_1 b_1 v_{1,i}$. Hence,
The fraction of processors which attempt to access their ith-level memory modules during a memory cycle is
The value of $f_i$ for the next iteration of the bandwidth computation can be determined as
where n is the iteration number. The iterative algorithm shown in Algorithm 1 can be used to determine the bandwidth of the hierarchical system. Note that, at steady state, the bandwidth of the system must be equal to $f N \psi$.
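The steady-state condition BW = fNψ lends itself to a fixed-point iteration on f. The sketch below shows only that outer loop: the bandwidth model is passed in as a function, and the update rule f ← BW(f)/(Nψ) is an assumption derived from the steady-state condition, not a reproduction of Algorithm 1 or of the per-level update of $f_i$. The toy model in the usage example is purely illustrative.

```python
def solve_active_fraction(bandwidth_model, N, psi, f0=1.0, tol=1e-9, max_iter=1000):
    """Iterate f until the modelled bandwidth equals f * N * psi.

    bandwidth_model: function f -> BW (stands in for equations (6)-(13)).
    Returns (f, BW) at the fixed point.
    """
    f = f0
    for _ in range(max_iter):
        f_next = min(1.0, bandwidth_model(f) / (N * psi))
        if abs(f_next - f) < tol:
            return f_next, bandwidth_model(f_next)
        f = f_next
    raise RuntimeError("fixed-point iteration did not converge")

# Toy stand-in model with a known fixed point at f = 0.6 (hypothetical).
N, psi = 64, 1.0
f, bw = solve_active_fraction(lambda f: 0.5 * N * psi * (f + 0.6), N, psi)
print(round(f, 6), round(bw, 6))
```

Convergence of this simple scheme depends on the bandwidth model; a damped update may be needed for models that are not contractions.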
Models for multiple-link-based HINs with faulty links
In this subsection we are going to show the analytical models for different types of link faults. By link faults we mean that some links can not be used due to the presence of faults in the network. A link may not be usable either because there is a fault on the link itself, or there is a fault in an IN which makes the link unusable.
First we present a few lemmas and then we show the analytical models for different types of faulty HINs. 
($x < a_{i+1}$). Due to the presence of the faulty links in an ith-level parent outlet, the bandwidth contribution from the corresponding faulty ith-level cluster will be less than that from the good ith-level clusters. In this subsection, we present an approximate analytical model for bandwidth analysis of a faulty HIN. Since the results obtained from this approximate model were found to be very close to those obtained from the simulation model, we did not try to develop the exact probabilistic model for the faulty HIN, which is very complex.
Let $pu^*_{i+1}$ be the rate of reference at a link of the faulty parent outlet of an ith-level IN. Note that the superscript '*' is used to indicate that the corresponding value is for a faulty cluster. The value of $pu^*_{i+1}$ can be expressed as
where $du_i$ is given by (9). However, the rate of reference at a link of the parent outlet of an ith-level IN of a good ith-level cluster is $pu_{i+1}$, as given by (10). In our approximate model we assume that the bandwidth available from the ith-level parent outlet of a good ith-level cluster is proportional to $pu_{i+1} a_{i+1}$ and that available from the faulty ith-level parent outlet is proportional to $pu^*_{i+1}(a_{i+1} - x)$. Now using Lemma 2 we can say that the bandwidth contribution from the processors of a good ith-level cluster is proportional to $pu_{i+1} a_{i+1}$ and that from the processors of the faulty ith-level cluster is proportional to $pu^*_{i+1}(a_{i+1} - x)$. Let us use the term $u(i, x)$, as shown below, to indicate the degradation due to the presence of x faulty links in an ith-level parent outlet:
Hence, the bandwidth loss due to the presence of x faulty links in an ith-level parent outlet is given by $bw_i \, u(i, x)$.
Note that $bw_i = BW/C_i$, where BW is the total bandwidth of a good HIN and $C_i$ is the total number of ith-level clusters in the entire system. Hence, the total bandwidth of the HIN in the presence of the faulty parent outlet $u_i$ is given by
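The proportionality argument above can be turned into a short calculation. In the sketch below the degradation is taken as the fraction of bandwidth lost by the faulty cluster relative to a good one, i.e. one minus the ratio of the two proportionality terms; this is one natural reading of the text and is an assumption, since the paper's expression (20) is not reproduced here. The link rates are passed in as inputs, and all function names and example values are hypothetical.

```python
def parent_outlet_degradation(pu_good, pu_faulty, a_next, x):
    """Assumed form of u(i, x): fractional loss caused by x faulty links.

    pu_good:   rate of reference per link in a good parent outlet (pu_{i+1}).
    pu_faulty: rate of reference per link in the faulty outlet (pu*_{i+1}).
    a_next:    number of links in the parent outlet (a_{i+1}).
    """
    if not 0 <= x < a_next:
        raise ValueError("x must satisfy 0 <= x < a_{i+1}")
    return 1.0 - (pu_faulty * (a_next - x)) / (pu_good * a_next)

def bandwidth_with_faulty_parent_outlet(BW, C_i, degradation):
    """Total bandwidth after subtracting the loss bw_i * u(i, x), with bw_i = BW / C_i."""
    return BW - (BW / C_i) * degradation

# Hypothetical saturated outlet: both rates are 1.0, one of four links is faulty.
u = parent_outlet_degradation(1.0, 1.0, a_next=4, x=1)
print(u, bandwidth_with_faulty_parent_outlet(1000.0, 8, u))
```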
Models for multiple faulty parent outlets. Let the set of faulty parent outlets be $U = \{u_1, u_2, u_3, \ldots, u_r\}$, where r is the number of faulty outlets. Let the faulty parent outlet $u_i \in U$ be at level $h_i$ and the number of faulty links in the outlet be $x_i$. First we use Algorithm 2 to generate a number of disjoint sets from U. From Algorithm 2 it is clear that $U_i \cap U_j = \emptyset$ for $0 \le i, j \le t$ and $i \ne j$. Now we are going to determine the loss in bandwidth due to the presence of the faulty outlets of the set $U_0$. Since the outlet $u_i \in U_0$ is neither an ancestor nor a descendant of the outlet $u_j \in U$ for all $i \ne j$, the references which move through $u_i \in U_0$ can not move through any other outlets of the set U and vice versa. Thus, in our approximate model we assume that the outlet $u_i \in U_0$ does not have any significant effect on any other outlets of the set U and vice versa. Hence, we can still assume that the bandwidth contribution from the $h_i$th-level faulty cluster, which became faulty due to the presence of the faulty outlet $u_i \in U_0$, is proportional to the bandwidth available from $u_i$. Therefore, the loss in bandwidth due to the presence of the faulty outlet $u_i \in U_0$ is $bw_{h_i} u(h_i, x_i)$, where $bw_{h_i}$ is the bandwidth contribution from a good $h_i$th-level cluster and $u(h_i, x_i)$ is given by (20). Hence, the total loss in bandwidth due to the presence of all the faulty outlets of the set $U_0$ is
If $U_0 \ne U$, then there are more sets of faulty outlets. Since $U_i \cap U_j = \emptyset$ for $i \ne j$, it is clear that the references which move through one outlet of a set can not move through an outlet of another set. Thus, in our model we assume that the outlets of one set are not going to have any significant effect on the outlets of another set. Let us try to determine the loss in bandwidth due to the presence of the faulty outlets of the set $U_1$. Without loss of generality, we can assume that $U_1 = \{u_1, u_2, u_3, \ldots, u_v\}$, where v is the number of faulty outlets in $U_1$. It is clear that, for every pair of faulty outlets $(u_i, u_j) \in U_1$, the outlet $u_i$ is either an ancestor or a descendant of the outlet $u_j$. Recall that the faulty parent outlet $u_i \in U$ is at level $h_i$. Without loss of generality, we can assume that $h_i < h_j$ for all $i < j$. Thus, $u_j$ is an ancestor of $u_i$ for all $u_i, u_j \in U_1$ and $j > i$. Since $u_j$ is an ancestor of $u_i$, the references which move through $u_i$ also move through $u_j$. As a result, there will be a direct effect of one faulty outlet on another. The loss in bandwidth due to the presence of the faulty outlet $u_i$ is $bw_{h_i} u(h_i, x_i)$. Hence, the bandwidth contribution from the faulty $h_j$th-level ($h_j > h_i$) cluster, due to the presence of the faulty outlet $u_i$, is given by $(bw_{h_j} - bw_{h_i} u(h_i, x_i))$. Since the references which move through $u_i$ also move through $u_j$, in our approximate model we assume that the bandwidth contribution from the faulty $h_j$th-level cluster, due to the presence of the faulty outlets $u_i$ and $u_j$, is $(bw_{h_j} - bw_{h_i} u(h_i, x_i))(1 - u(h_j, x_j))$. Hence, the total loss in bandwidth due to the presence of all the faulty parent outlets of the set $U_1$ can be expressed as
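The two-outlet expression above can be applied step by step along a chain of nested faulty outlets. The sketch below extends it recursively from the lowest to the highest level; this recursive extension is an assumption made for illustration, and the paper's closed-form expression for the set $U_1$ is not reproduced here. The function name and the example values are hypothetical.

```python
def chain_bandwidth_loss(chain):
    """Loss in bandwidth for faulty parent outlets lying on one ancestor chain.

    chain: list of (bw_h, u) pairs ordered from the lowest level to the highest,
           where bw_h is the contribution of a good cluster at that level and
           u is the degradation u(h, x) of the faulty outlet at that level.
    """
    loss = 0.0
    for bw_h, u in chain:
        # Contribution of this cluster: what is left after lower-level losses,
        # reduced again by the degradation of its own faulty outlet.
        contribution = (bw_h - loss) * (1.0 - u)
        loss = bw_h - contribution
    return loss

# One faulty outlet: loss is bw_h * u. Two nested outlets: losses compound.
print(chain_bandwidth_loss([(10.0, 0.5)]))
print(chain_bandwidth_loss([(10.0, 0.5), (40.0, 0.25)]))
```

For a single outlet this reduces to $bw_{h_i} u(h_i, x_i)$, and for two outlets it reproduces the contribution $(bw_{h_j} - bw_{h_i} u(h_i, x_i))(1 - u(h_j, x_j))$ stated in the text.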
The loss in bandwidth due to the presence of the faulty parent outlets of the set $U_i$ ($2 \le i \le t$) can be determined in the same way as for the faulty parent outlets of the set $U_1$. Thus, the total loss in bandwidth due to the presence of all the faulty parent outlets of the set U is
Hence, the bandwidth of the HIN in the presence of all the faulty parent outlets of the set U is
Now we develop analytical models for the multiple-linkbased HIN with faulty child outlets. First we present three more lemmas and then we show the analytical model for a system with faulty child outlets.
LEMMA 3. An ith-level child outlet carries the jth-level ($i \le j \le L - 1$) references of $k_j - 1$ (j − 1)th-level clusters.
LEMMA 4. The ith-level references of a processor move through $Nd_j$ jth-level ($j \le i$) child outlets, where

$$Nd_j = \begin{cases} k_i - 1, & \text{for } j = i, \\ (k_i - 1) \prod_{y=j}^{i-1} k_y, & \text{for } j < i. \end{cases} \qquad (26)$$
LEMMA 5. Assume that D is a set of child outlets such that, for every pair of outlets $(d_m, d_n) \in D$, $d_m$ is either an ancestor or a descendant of $d_n$, and all the outlets of D carry some references of a given processor. If $d_m \in D$ carries the ith-level references of the given processor, then $d_n \in D$ (for all m and n) also carries the ith-level references of the processor, and no outlet of D can carry any other type of references, say jth-level ($j \ne i$) references, from that given processor.
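Equation (26) of Lemma 4 is straightforward to evaluate numerically. The sketch below is a direct transcription, assuming the branching factors are stored in a Python list with `k[y - 1]` holding $k_y$ and that $1 \le j \le i$; the function name and the example values are hypothetical.

```python
from math import prod

def child_outlets_traversed(k, i, j):
    """Nd_j of equation (26): number of jth-level child outlets through which
    the ith-level references of a processor move (1 <= j <= i).

    k: branching factors, with k[y - 1] holding k_y.
    """
    if not 1 <= j <= i <= len(k):
        raise ValueError("require 1 <= j <= i <= len(k)")
    # For j == i the product is empty (equal to 1), giving k_i - 1.
    return (k[i - 1] - 1) * prod(k[y - 1] for y in range(j, i))

k = [8, 4, 2]  # hypothetical k_1, k_2, k_3
print(child_outlets_traversed(k, 3, 3))  # k_3 - 1
print(child_outlets_traversed(k, 3, 2))  # (k_3 - 1) * k_2
print(child_outlets_traversed(k, 3, 1))  # (k_3 - 1) * k_1 * k_2
```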
A cluster is called an affected cluster if the bandwidth contribution from every processor of that cluster is going to be affected due to the presence of fault(s) in the system.
Analytical model for the HIN with one faulty child outlet. Let the faulty child outlet be $d_i$. Let $d_i$ be a child outlet of an ith-level IN and let the number of faulty links in $d_i$ be x ($x < b_i$). Let $pd^*_i$ be the rate of reference at a link of the faulty child outlet. The value of $pd^*_i$ can be expressed as
where $dd_i$ is given by (11). However, the rate of reference at a link of a good ith-level child outlet is $pd_i$, as given by (12). In our approximate model we assume that the bandwidth available from the faulty ith-level child outlet is proportional to $pd^*_i (b_i - x)$ and that available from a good ith-level child outlet is proportional to $pd_i b_i$. Lemma 3 shows that an ith-level child outlet carries the jth-level ($i \le j \le L - 1$) references of $k_j - 1$ (j − 1)th-level clusters. Thus, a faulty child outlet will affect the bandwidth contribution of those clusters whose references move through the faulty outlet. If an ith-level child outlet is faulty, then this faulty outlet will affect the bandwidth contribution of many (i − 1)th- and higher-level clusters. Now let us try to determine the loss in bandwidth contribution from different types of affected clusters.
From Lemma 3 it is clear that the bandwidth contribution from $k_i - 1$ (i − 1)th-level clusters will be affected by a faulty ith-level child outlet, because the ith-level references from these (i − 1)th-level clusters move through the faulty ith-level child outlet. From Lemma 4 we see that the ith-level references of a processor move through $k_i - 1$ ith-level child outlets. Thus, the ith-level references of an (i − 1)th-level affected cluster move through $(k_i - 2)$ good ith-level child outlets and the faulty child outlet. Hence, in our approximate model we assume that the bandwidth contribution from the ith-level references of an (i − 1)th-level affected cluster is proportional to $(k_i - 2) pd_i b_i + pd^*_i (b_i - x)$ and that from a non-affected (i − 1)th-level cluster is proportional to $(k_i - 1) pd_i b_i$. Let us use the term $d(i, x)$, as shown below, to indicate the degradation due to the presence of x faulty links in an ith-level child outlet:
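The two proportionality terms above give a simple way to compute a child-outlet degradation. As with the parent-outlet case, the sketch takes the degradation to be one minus the ratio of the affected to the non-affected term; this reading is an assumption, since the paper's expression for $d(i, x)$ is not reproduced here, and the names and example values are hypothetical.

```python
def child_outlet_degradation(pd_good, pd_faulty, b_i, k_i, x):
    """Assumed form of d(i, x) for x faulty links in an ith-level child outlet.

    pd_good:   rate of reference per link of a good child outlet (pd_i).
    pd_faulty: rate of reference per link of the faulty child outlet (pd*_i).
    b_i:       number of links in an ith-level child outlet.
    k_i:       number of (i-1)th-level clusters per ith-level cluster.
    """
    if k_i < 2 or not 0 <= x < b_i:
        raise ValueError("require k_i >= 2 and 0 <= x < b_i")
    affected = (k_i - 2) * pd_good * b_i + pd_faulty * (b_i - x)
    unaffected = (k_i - 1) * pd_good * b_i
    return 1.0 - affected / unaffected

# Hypothetical saturated outlets: k_i = 5, two links per outlet, one faulty.
print(child_outlet_degradation(1.0, 1.0, b_i=2, k_i=5, x=1))
```

Because only one of the $k_i - 1$ child outlets used by the affected cluster is faulty, the resulting degradation is diluted by the remaining good outlets.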
Let us use the term δ(j, i, x) to indicate the fraction of the bandwidth contribution which will be lost from a j th-level affected cluster due to the presence of x faulty links in an ith-level child outlet. In general, the value of δ(j, i, x) can be expressed as 
The total loss in bandwidth from all the affected clusters in the system can be expressed as
Since $bw_j = k_i k_{i+1} \cdots k_j \, bw_{i-1}$ and $bw_{i-1} = BW/C_{i-1}$, (30) can be reduced to the following closed-form expression:
Hence, the bandwidth of the HIN in the presence of the faulty child outlet is 
Since, at steady state, the total bandwidth available from a cluster is proportional to the bandwidth available from the ith-(0 ≤ i ≤ L − 1) level references of the cluster, the maximum bandwidth which can be obtained from the zeroth-level cluster #i is limited by the references which cause the maximum loss in bandwidth. Let bw i (j ) be the loss in bandwidth from the zeroth-level cluster #i caused by the j th-level references and let bw 0 (i) be the total bandwidth contribution from the zeroth-level cluster #i. Then we can write 
where bw 0 (i) is given by (33).
Bandwidth analysis of a HIN with fault-tolerant INs
When there is no fault in this type of HIN, the analytical model is the same as that of the multiple-link-based HIN. The only difference is that for the HIN with fault-tolerant
Since an ith-level backup circuit can move $z_i$ references through it, the performance of the HIN will not degrade if $z_i$ or fewer references can not move through the main crossbar. Thus, the analytical models are developed for the case when more than $z_i$ references can not move through the main crossbar.
Here we present an analytical model for only one faulty IN in the system. The model can easily be extended for multiple faulty INs in a similar way as was done for multiple faults of the other type of HIN, as described in the previous subsection. Let the faulty IN be an ith-level IN. Some of the inlets of the fault-tolerant IN will not be able to send their references through the main crossbar when there are some faults in the main crossbar. Let us call these inlets faulty inlets. The model will depend on whether or not the parent inlet is one of the faulty inlets.
The parent inlet is not one of the faulty inlets
Let the number of faulty inlets be g ($g > z_i$). Thus, the references from these g faulty inlets will move to the backup circuit. The average number of distinct references which will try to move through the backup circuit is given by
Hence, the probability that there is a reference on any output line of the $(k_i + 1) \times z_i$ backup crossbar is given by
The probability that the parent outlet of the faulty IN is going to be accessed by at least one reference is
In our approximate model we assume that the bandwidth contribution from the (i + 1)th- and higher-level references of the faulty ith-level cluster is proportional to $pu^*_{i+1}$ and that of a good ith-level cluster is proportional to $pu_{i+1}$.
Let $dd^*_{i,i}$ be the average number of distinct ith-level references competing for all the child outlets of the faulty ith-level IN. The value of $dd^*_{i,i}$ can be expressed as
Let $dd_{i,i}$ be the average number of distinct ith-level references competing for all the child outlets of a good ith-level IN. The value of $dd_{i,i}$ can be expressed as
Total bandwidth of the HIN with one ith-level faulty IN can be expressed as
The parent inlet is one of the faulty inlets
The average number of distinct references which will try to move through the backup circuit is given by
Now the value of $qu_i$ can be determined using (36). Let x be the fraction of $db_i$ which are in-cluster references (the references which came through $g - 1$ child inlets). The value of x can be expressed as
The total number of distinct ith-level references competing for all the child outlets of the faulty ith-level IN can be expressed as
The probability that an out-of-cluster reference (the reference which comes from the parent IN) will pass through the $(k_i + 1) \times z_i$ crossbar of the backup circuit is
Let us use the term c(i + 1, g), as shown below, to indicate the degradation due to the fact that references from g inlets of an ith-level IN can not move through the main crossbar and one of these g inlets is the parent inlet:
The effect of this degradation is similar to that of a faulty (i + 1)th-level child outlet of the multiple-link-based HIN. Thus, the total loss in bandwidth can be expressed as
Hence, the total bandwidth of the HIN with one ith-level faulty IN is
NUMERICAL RESULTS AND DISCUSSIONS
We have analyzed a number of four-level hierarchical systems. In our analysis, the memory references of a processor were distributed among different memory modules in such a way that the INs at no particular level became the bottleneck. This means that we tried to have the same utilization for all the INs. Note that when the INs at a particular level become the bottleneck, the performance of a hierarchical system degrades severely, because most of the processors will be blocked for the memory modules at the corresponding level.
We have developed simulation models to verify the accuracy of the analytical models which are presented in this paper. A simulation program was written to simulate the synchronous behavior of the hierarchical multiprocessors. Queues were maintained in the simulation model in order to keep track of the blocked processors. The simulation program was driven by a linear congruential random number generator. In order to determine the bandwidth (with 95% confidence interval) of a particular system for a given set of parameters, the program was run ten times with different seeds and each run was for 400,000 memory cycles. However, the first 50,000 memory cycles were ignored in order to avoid the initial transients.
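The confidence-interval step described above can be sketched as follows. The paper does not state how the interval was computed; the sketch assumes the usual Student-t interval over the ten per-run bandwidth estimates, with 2.262 as the two-sided 95% critical value for 9 degrees of freedom. The function name and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

T_95_DOF_9 = 2.262  # two-sided 95% Student-t critical value, 9 degrees of freedom

def bandwidth_confidence_interval(run_bandwidths):
    """Return (mean, half_width) of a 95% interval from ten independent runs.

    run_bandwidths: one bandwidth estimate per run, each already averaged
                    over the cycles that follow the discarded warm-up period.
    """
    if len(run_bandwidths) != 10:
        raise ValueError("the critical value used here assumes exactly ten runs")
    centre = mean(run_bandwidths)
    half_width = T_95_DOF_9 * stdev(run_bandwidths) / sqrt(len(run_bandwidths))
    return centre, half_width

# Hypothetical per-run estimates (not results from the paper).
runs = [101.2, 100.8, 100.9, 101.5, 100.7, 101.1, 101.0, 100.6, 101.3, 100.9]
centre, half = bandwidth_confidence_interval(runs)
print(f"{centre:.2f} +/- {half:.2f}")
```

Using a different seed for each run, as described in the text, is what justifies treating the ten estimates as independent samples.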
Results for the multiple-link-based system
In order to have the same utilization for all the INs of a multiple-link-based system, we determined the values of s i (0 ≤ i ≤ L − 1) as follows:
Since the bandwidth available from the parent outlet of an ith-level IN is the same as that from the parent inlet of the ith-level IN, for a real system we can assume that 
Both the analytical and simulation results, shown in Table 1, were obtained for ψ = 1. The simulation results are shown with a 95% confidence interval. From Table 1 it is seen that, for fault-free conditions, the results from the analytical model are very close to those from the simulation model. The error column of Table 1 shows the error in the analytical results, which is under 5%. Each one of the above five systems was analyzed under six different types of faulty conditions, and we assumed that a faulty outlet has only one faulty link. Table 2 shows the types of faults that were investigated for the above-mentioned five HINs. Tables 3 and 4 show the performance of the systems HIN1-HIN5 in the presence of different types of faulty parent outlets. These tables also show that the results from the analytical models are very close to those from the simulation models. For most of the cases, the results from the analytical models are within 5% of those from the simulation models. Comparing the results of Table 1 with those of Tables 3 and 4, we see that the performance of a system degrades in the presence of faults. The degradation depends on the position of the faults as well as on the number of faults. The performance degradation of different hierarchical systems in the presence of different types of faulty parent outlets is summarized in Table 5. This table shows that the performance of a system is more sensitive to the position of a fault than to the number of faults. For example, the degradation due to one third-level faulty parent outlet (see fault F1) is more than that due to two faulty parent outlets at the first and second levels (see fault F2).
Since the performance degradation of a HIN with a large number of processors is very significant when the highest-level outlets become faulty, we further analyzed such a HIN with faults at the highest level, in order to determine the accuracy of our analytical model at this high degradation. The parameters of the HIN which we investigated are $k_1 = k_2 = k_3 = 8$. The HIN was analyzed under four different conditions, as shown in Table 6. The bandwidth of the HIN was determined for $\psi = 1$, and the number of processors in the system was varied from 1024 to 8192. The results from the simulation model are shown in Figure 3. This figure shows that the degradation is almost the same for all three types of faults (S2, S3 and S4). The system saturates after 3072 processors. At the saturation point, the degradation due to each of the three types of faults is about 20%, which is significant. Since the degradation due to all three types of faults is the same, we can conclude that the degradation due to a faulty parent outlet or a faulty child outlet is the same, as long as the fault occurs at the same level and $a_i = b_i$ for all $i$. Since in practice it is unlikely that a given system will have many faults at the same time, and since the simulation is very time consuming, we did not try to determine how accurate our analytical model is for a system with a large number of faults.
[Table: Parameters of the HINs]
[Table: Bandwidth of fault-free HINs]
X2  A second-level crossbar has four faulty inlets
X3  A third-level crossbar has four faulty inlets
X4  A second-level crossbar has four faulty inlets and one of the faulty inlets is the parent inlet
Results for the HINs with fault-tolerant INs
For the HINs with fault-tolerant INs, we again determined the values of $s_i$ using (48)-(52). The parameters of the HINs which we investigated are the same as those shown in Table 1, except for the values of $a_i$ and $b_i$: for the HINs with fault-tolerant INs, $a_i = b_i = 1$ for $1 \le i \le 3$. We assumed that $z_i = 1$ for $1 \le i \le 3$ for all the fault-tolerant INs, which means that at most one reference can move through the backup circuit of any IN. Since the parameters $k_1$, $k_2$ and $k_3$ of the HINs with fault-tolerant INs are the same as those of the HINs shown in Table 1, we continue to use the notation HIN1, HIN2, ..., HIN5 for these HINs. Each of these five HINs was analyzed under four different conditions, as shown in Table 7. Figure 4 shows the bandwidth of the HINs under these conditions. This figure also shows that the degradation in performance is greater when the faults occur at a higher level than at a lower level. Among the faulty conditions X2, X3 and X4, the degradation due to X3 is the largest, because all the faults are at the third level. For both X2 and X4 the faulty IN is at the second level, but the degradation due to X4 is still greater than that due to X2. The reason is that X4 includes one faulty parent inlet, which carries higher-level traffic. Thus, the bandwidth contributions from more processors are affected by X4 than by X2. The analytical results were also verified by simulation; the error in the analytical results was under 5%.
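Since the simulation results are reported with a 95% confidence interval, it may help to see how such an interval can be obtained from independent simulation runs. The sketch below is only an illustration. The sample bandwidths and the analytical value are made-up numbers, not results from this paper, and the normal approximation is an assumption (with only a few runs, a Student-t critical value would be more appropriate).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, level=0.95):
    """Two-sided confidence interval for the mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    centre = mean(samples)
    half_width = z * stdev(samples) / sqrt(len(samples))
    return centre - half_width, centre + half_width


# Hypothetical bandwidths from five independent simulation runs.
runs = [10.0, 12.0, 11.0, 9.0, 13.0]
low, high = confidence_interval(runs)

# Hypothetical analytical prediction to compare against the interval.
analytical = 11.3
print(f"95% CI = [{low:.3f}, {high:.3f}], "
      f"analytical inside: {low <= analytical <= high}")
```

An analytical value that falls inside the interval cannot be distinguished from the simulation at that confidence level. A value outside it may still be acceptable if the relative error is small.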
Multiple-link-based HINs versus HINs with fault-tolerant INs
The advantage of a multiple-link-based HIN is that if a link of a port is unusable, either because the link itself is faulty or because a fault in an IN makes the link unusable, then all the devices in the system can still be used together (with a slight degradation), thanks to the other good links in the corresponding port. In contrast, if a link in a HIN with fault-tolerant INs becomes unusable because of a fault on the link itself, the fault cannot be tolerated: a HIN with fault-tolerant INs can tolerate only those faults which are present inside the INs. Another advantage of a multiple-link-based HIN is that the values of $a_i = b_i$, especially at the higher levels, can be selected in such a way that a certain minimum traffic is maintained at the highest level, which may be necessary for operating system services. Similarly, if a given system is to be expanded by adding one more level, the value of $a_i = b_i$ for the highest-level IN can be selected to maintain a certain minimum traffic. Such flexibility is not available in the other types of HIN because, even if $k_i = 2$ for $1 \le i \le L - 1$, the traffic at the highest level will decrease by at least 50% when the system is expanded by adding another level.
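The 50% figure can be motivated by a simple link-counting argument. The sketch below is a capacity bound under stated assumptions, not the bandwidth model of this paper. It assumes that each IN at a given level merges $k$ child subtrees onto $a$ parent links, that each link and each processor carries at most one reference per cycle, and that the share of upward capacity per processor is a fair proxy for the traffic reaching the highest level. The function name and the example parameters are hypothetical.

```python
from fractions import Fraction


def per_processor_upward_share(levels):
    """Upper bound on upward capacity per processor.

    `levels` lists (k, a) pairs, lowest level first: each IN at that
    level joins k child subtrees and has a parent links.
    """
    capacity = Fraction(1)    # a single processor: one reference per cycle
    processors = 1
    for k, a in levels:
        # The parent port cannot carry more than its a links allow,
        # nor more than the k children can supply.
        capacity = min(Fraction(a), k * capacity)
        processors *= k
    return capacity / processors


base = [(2, 1), (2, 1)]            # existing system, single parent links
single_link = base + [(2, 1)]      # expanded by one level, one parent link
double_link = base + [(2, 2)]      # expanded by one level, two parent links

print(per_processor_upward_share(base))         # 1/4
print(per_processor_upward_share(single_link))  # 1/8
print(per_processor_upward_share(double_link))  # 1/4
```

With a single parent link, the added level halves the share per processor even for $k = 2$, and larger values of $k$ reduce it further. Choosing two links for the new parent port keeps the share unchanged, which is the flexibility the multiple-link-based design provides.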
The disadvantage of a multiple-link-based HIN is that this type of HIN needs a more complicated arbitration circuit for each port of an IN than that needed for the other types of HIN.
CONCLUSION
It is very desirable for a HIN to be fault tolerant, because for certain types of HINs a single faulty link can disconnect a set of processors and/or memory modules from the rest of the system, so that the processors and memory modules of the system cannot all be used together. Some of the existing HINs [5, 6, 8, 9] are not fault tolerant. In this paper we have presented two types of fault-tolerant HINs. We have also presented a number of approximate analytical models to determine the bandwidth of the HINs under fault-free and faulty conditions. The results obtained from the approximate analytical models were found to be very close to those obtained from the simulation models. We have presented results for systems with up to four faults. In most cases the error in our analytical results is under 5%; in the few remaining cases it is greater than 5% but less than 7%. Since there is a very good match between the approximate analytical model and the simulation model, we felt that it was not worthwhile to develop the exact probabilistic model, which is lengthy and very complex. The approach that we used to develop our approximate analytical models can also be used to develop models for other HINs available in the literature (such as [5, 6, 8, 9]), provided that those HINs are made fault tolerant.
