System decomposition is a novel technique for modeling the dependability of complex systems without constructing a single-level Markov chain (MC). This paper demonstrates the technique by computing the availability of a class of multiprocessors that use $4 \times 4$ switching elements in the multistage interconnection network (MIN). The availability model is known as task-based availability, where a system is considered operational as long as the task requirements are satisfied. We develop two simple MCs for the processors and memories and solve them using a software package called HARP. The probabilities of $i$ processing elements (PEs) and $j$ memory modules (MMs) working at any time $t$, denoted as $P_i(t)$ and $P_j(t)$, are obtained from their corresponding MCs. The effect of the MIN is captured in the model by finding the number of switches required for the connection of $i$ PEs and $j$ MMs. A third MC is then developed for the switches to find the probability that the MIN provides the required $(i \times j)$ connection. Multiplying this term by $P_i(t)$ and $P_j(t)$ gives the probability of an $(i \times j)$ working group. The methodology is generalized to model arbitrary as well as larger size systems. Using this approach, the transient and steady-state availability are computed for a variety of MIN configurations, and the results are validated through simulation.
I. INTRODUCTION
There has been considerable interest in shared memory multiprocessors that use a multistage interconnection network (MIN) as the communication medium. This is mainly because MIN-based designs are more cost-effective than alternatives such as crossbar or bus-based multiprocessors for building large systems. Several research efforts have been directed towards the design and performance evaluation of various MINs and their fault-tolerant features [1] [2] [3]. Examples of commercial and experimental multiprocessors that have used a MIN topology include the BBN Butterfly [4], NYU Ultracomputer [5], IBM RP3 [6], PASM [7], and CEDAR [8].
In many applications of multiprocessors, dependability is a major concern in addition to high performance. Dependability prediction of multiprocessors is essential so that they can be used in various critical applications. Recently, the reliability evaluation of MINs has been addressed by several researchers [10] [11] [12] [13]. In this paper, we present an availability model for MIN-based multiprocessors.
The class of MINs considered here uses $4 \times 4$ crossbar switches as the building blocks of the network. This choice stems from the fact that the first commercially available MIN-based multiprocessor, the Butterfly, uses $4 \times 4$ switches. The name "Butterfly network" is used because of its topological similarity to a Fast Fourier Transform butterfly. We consider the processing elements (PEs), memory modules (MMs), and switching elements (SEs) as the basic components of a system. The system behavior is analyzed under the assumption that each type of component has a separate repair facility.
The availability model studied here is known as task-based availability, where a system is considered operational if at least a connected group of I PEs and J MMs is available for the execution of a task. Task-based analysis allows the incorporation of graceful degradation which is essential for improving the fault-tolerance of multiprocessors.
The modeling paradigm is based on a system decomposition approach. The PEs, MMs, and SEs are assumed to have independent repair facilities. Each subsystem is analyzed separately for satisfying the task requirements. Complete system availability is obtained by combining the results of these subsystems. Two simple MCs are developed for the processors and the memory modules. These chains can be solved using software packages such as HARP [16], SHARPE [17], or SAVE [18]. The probabilities of $i$ PEs and $j$ MMs working are obtained from these two models. The effect of the MIN is captured by finding the number of SEs required for the connection of the computing resources. This number is obtained by analyzing a second level of decomposition, where a system is divided into four groups. Based on the distribution of the $i$ PEs and $j$ MMs in these four groups, the required number of SEs is calculated. A third MC is then constructed for the switches to find the probability that the MIN provides the $(i \times j)$ connection. The product of these three terms gives the probability that the system has an $(i \times j)$ working group. Summing all these working group probabilities gives the system availability. The model is validated by comparing the analytical results with those obtained from simulation.
The rest of the paper is organized as follows. We briefly describe the system under investigation and the underlying assumptions in Section II. Details of the modeling technique are discussed in Section III for $(16 \times 16)$ and $(64 \times 64)$ systems. In Section IV, a generalized technique for availability evaluation is presented, where a broad range of configurations is investigated. These configurations include large systems, arbitrary size systems (not $4^i$), and systems that use $a \times a$ SEs. Section V contains numerical results obtained from solving the model. The research contributions are summarized in Section VI.
II. SYSTEM DESCRIPTION
A message-passing protocol for information transmission is considered, as used in the Butterfly system. A processing element, $PE_i$, and its corresponding memory module, $MM_i$, are assumed to be located on a single board, called node $i$. We therefore refer to an $(N \times N)$ system interchangeably as an $N$-node configuration. A local memory reference does not need to go through the switching network. A reference to a remote memory requires two passes through the MIN. In the first pass, the requesting processor sends the request to the remote memory through the network. The reply is transmitted from the memory to the processor in a second pass. For example, a round trip path from PE2 to MM8 is shown by the heavy lines in Figure 1. This type of communication scheme needs 4 SEs in a $(16 \times 16)$ system, as against only 2 SEs required for a circuit switching one-pass protocol. We use the former approach to make our model more representative of the actual Butterfly system. However, the model can also be applied to circuit switching networks, as indicated in Section IV.
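The SE counts quoted for the two-pass and one-pass protocols follow from the number of stages of the network. The sketch below is only an illustration: the function names are hypothetical, and it assumes that one pass traverses exactly one SE per stage of a MIN built from $4 \times 4$ SEs.

```python
def stages(n_nodes: int, switch_size: int = 4) -> int:
    """Number of MIN stages, i.e. log base switch_size of n_nodes.

    Assumption: n_nodes is an exact power of switch_size.
    """
    count, size = 0, 1
    while size < n_nodes:
        size *= switch_size
        count += 1
    if size != n_nodes:
        raise ValueError("n_nodes must be an exact power of switch_size")
    return count


def ses_per_remote_reference(n_nodes: int, passes: int = 2) -> int:
    """SEs traversed by one remote reference (one SE per stage per pass)."""
    return passes * stages(n_nodes)


print(ses_per_remote_reference(16, passes=2))  # 4 (message passing, round trip)
print(ses_per_remote_reference(16, passes=1))  # 2 (circuit switching, one pass)
```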
The system behavior is analyzed under the following assumptions about the failure and repair process of the PEs, MMs, and SEs.
(1) The components of a subsystem are all homogeneous and have identical exponential failure distributions. Let $\lambda_p$, $\lambda_m$, and $\lambda_{se}$ represent the failure rates of a PE, MM, and SE, respectively.
(2) There is a single repair facility for each type of component, with an exponential distribution of repair time. The corresponding repair rates are given as $\mu_p$, $\mu_m$, and $\mu_{se}$.
These assumptions are normally used in dependability studies. One difference between the assumptions in [14] and ours is that the former model assumed on-line repair only for processors and memory modules. The authors in [14] assumed that the SEs could not be repaired on-line and that a safe-down of the system was necessary for system-wide maintenance. We have assumed that the SEs can also be repaired on-line. The validity of this assumption lies in the fact that one can use a fault detection algorithm for MINs that can locate the failure of SEs. Isolation and repair of an SE is therefore possible without stopping the system activity. A second motivation for allowing repair of SEs is that multiprocessors are expected to provide fault tolerance in the form of graceful degradation in addition to providing high computing power. Since the MIN is substantially complex in nature, failure of the SEs cannot be ignored and should be handled without bringing down the system. To justify our claim further, recent research on hierarchical cache design for multiprocessors is geared towards including cache in the SEs of the MIN. In view of this, fault diagnosis of SEs is essential from both the dependability and the performance standpoint.
III. MODELING TECHNIQUE
The analysis is based on a decomposition approach where the complete multiprocessor is divided into three subsystems, as shown in Figure 2. Each subsystem is analyzed independently. There should be at least one connected group of $I$ PEs and $J$ MMs in order to satisfy the system-level requirements.
A. Processor/Memory Subsystem
The initial configuration of the multiprocessor has $N$ processors, and at least $I$ of those must work for the system to be operational. The MC for the processor subsystem is shown in Figure 3. The model includes imperfect coverage and imprecise repair in the analysis. The coverage of a processor is denoted as $C_p$ and the successful repair factor as $r_p$. The processor subsystem moves from a working state $i$, for $I \le i \le N$, to a failed state $F_i$ either due to imperfect coverage $(1 - C_p)$ or because of imprecise repair $(1 - r_p)$. If the fault coverage and repair process are perfect, the above chain reduces to a simple one-dimensional machine repairman model.
The solution of the MC using HARP gives the probability that $i$ PEs are working in the system at time $t$, denoted as $P_i(t)$, for $I \le i \le N$.
The MC for the memory subsystem is almost the same as that for the processors, except that the system tolerates up to $N - J$ failures. The failure rate, repair rate, coverage factor, and successful repair factor are given as $\lambda_m$, $\mu_m$, $C_m$, and $r_m$, respectively. Solving the memory MC, we get $P_j(t)$, the probability that there are $j$ working MMs at time $t$, for $J \le j \le N$.
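To make the perfect-coverage special case concrete, the following sketch computes the steady-state distribution of a one-dimensional machine repairman chain with a single repair facility. It is not the HARP model of Figure 3: it assumes perfect coverage and repair, keeps every state from 0 to $N$ working components (no truncation at $I$, no failed states $F_i$), and yields only limiting probabilities rather than $P_i(t)$. The function names are hypothetical.

```python
def repairman_steady_state(n: int, lam: float, mu: float) -> list:
    """Steady-state probabilities pi[k] of k working components, k = 0..n.

    Assumed birth-death chain: state k fails at rate k*lam (to k-1) and is
    repaired at rate mu (to k+1) by one repair facility.
    Balance across the cut between k-1 and k: mu * pi[k-1] = k * lam * pi[k].
    """
    weights = [0.0] * (n + 1)
    weights[n] = 1.0  # unnormalized weight of the all-working state
    for k in range(n, 0, -1):
        weights[k - 1] = weights[k] * k * lam / mu
    total = sum(weights)
    return [w / total for w in weights]


def prob_at_least(pi: list, threshold: int) -> float:
    """Probability that at least `threshold` components are working."""
    return sum(pi[threshold:])


pi = repairman_steady_state(16, lam=0.001, mu=0.1)
print(prob_at_least(pi, 12))  # limiting probability of at least 12 of 16 working
```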
B. MIN Subsystem
A degradable MIN should provide connection between some $i$ PEs and $j$ MMs. Hence, the model of the MIN should guarantee that the required number of SEs work in order to satisfy the connection. Since the working PEs and MMs could be distributed in an arbitrary fashion, finding the proper SEs for the connection becomes extremely difficult.
There are two different ways of having a connected group of $i$ PEs and $j$ MMs $(i \times j)$ in the system. The first case is when exactly $i$ PEs and $j$ MMs are working and at least the required number of SEs are available for providing the connection. In the second situation, more than the required number of processors and/or memory modules may be working, but the connectivity is still $(i \times j)$. This case arises when the working SEs are sufficient to connect only $i$ PEs and $j$ MMs. An $(N \times N)$ MIN with $4 \times 4$ SEs has a total of $\frac{N}{4} \log_4 N$ switches. For a given $(i \times j)$ distribution, let $x$ represent the number of SEs required for the connection and $y = \frac{N}{4} \log_4 N - x$ be the additional SEs. The status of these additional SEs does not affect the connectivity. An MC of the MIN with up to $y$ failures can still provide the required connection for the $(i \times j)$ system. In the second case, when more than $(i \times j)$ elements are working, if we use the probability that only the required $x$ SEs are perfect, the system size would still be $(i \times j)$.
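The switch count and the split into required and additional SEs can be checked with a few lines of code. This is a minimal sketch with hypothetical function names; it assumes $N$ is an exact power of 4 and takes $x$ as an input, since computing $x$ is the subject of the sections that follow.

```python
def total_switches(n_nodes: int) -> int:
    """Total SEs in an (N x N) MIN of 4x4 SEs: (N / 4) * log4(N)."""
    stages, size = 0, 1
    while size < n_nodes:
        size *= 4
        stages += 1
    if size != n_nodes:
        raise ValueError("n_nodes must be an exact power of 4")
    return (n_nodes // 4) * stages


def extra_switches(n_nodes: int, x: int) -> int:
    """Additional SEs y = (N / 4) * log4(N) - x for a given required count x."""
    s_t = total_switches(n_nodes)
    if not 0 <= x <= s_t:
        raise ValueError("x must lie between 0 and the total number of SEs")
    return s_t - x


print(total_switches(16))      # 8
print(total_switches(64))      # 48
print(extra_switches(16, 6))   # 2
```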
A second-level decomposition (partitioning) of the multiprocessor is used to find $x$, the required number of SEs. The system size is usually given by $(4^i \times 4^i)$ since the MIN consists of $4 \times 4$ SEs. A $(4^i \times 4^i)$ configuration can be partitioned into four $(4^{i-1} \times 4^{i-1})$ subsystems without disturbing the connection. For example, a $(16 \times 16)$ multiprocessor is divided into four $(4 \times 4)$ groups, or a $(64 \times 64)$ system is divided into four $(16 \times 16)$ groups. Computation of the required number of SEs is simplified using the four-group partitioning scheme.
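The four-group partitioning can be illustrated as a simple index computation. The sketch below assumes that nodes are numbered 0 to $N-1$ and that each group holds a block of $N/4$ consecutive node numbers; this numbering convention and the function names are assumptions made for illustration, not taken from the figures.

```python
def group_of(node: int, n_nodes: int) -> int:
    """Index (0..3) of the group containing `node` in an N-node system.

    Assumption: group g holds nodes g*(N/4) .. (g+1)*(N/4) - 1.
    """
    if not 0 <= node < n_nodes:
        raise ValueError("node index out of range")
    group_size = n_nodes // 4
    return node // group_size


def groups_occupied(working_nodes, n_nodes: int) -> list:
    """Sorted list of groups that contain at least one working node."""
    return sorted({group_of(node, n_nodes) for node in working_nodes})


# Working PEs 0, 1, 5 and 13 of a 16-node system occupy groups 0, 1 and 3.
print(groups_occupied([0, 1, 5, 13], 16))  # [0, 1, 3]
```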
The modeling methodology discussed here is not limited to MINs with $4 \times 4$ SEs. The decomposition technique can be applied to any $\log_a N$ stage MIN designed with $a \times a$ SEs. The network in that case would be logically divided into $a$ groups. However, the precision of the results would depend on the $x$ and $y$ values. It is extremely difficult to come up with a generic expression to compute these values for $a \times a$ SEs without sacrificing accuracy or simplicity. Instead of dwelling on a general MIN of $a \times a$ SEs, we consider $4 \times 4$ SEs since they have been used in commercial systems. Extension to $a \times a$ SEs is discussed in Section IV.
The modeling of the MIN using four groups, which follows next, is presented inductively. Instead of starting directly with a general methodology for a $\log_4 N$ stage MIN, we systematically analyze $(16 \times 16)$ and $(64 \times 64)$ configurations before presenting a general model. This is primarily for clarity of explanation. The concept of four groups and its exact analysis is introduced with the $(16 \times 16)$ MIN, which consists of two stages. The technique of partitioning a $(64 \times 64)$ network with one more stage of SEs is discussed next. This is followed by a general approach to model $n$ stages of SEs.
C. 16-Node Configuration
The 16 PEs and the corresponding 4 input SEs can be divided into four groups, where each group has 4 PEs and the corresponding input switch. Let us call these groups processor groups (PG 0 ; PG 1
Let us define the meaning of corresponding groups explicitly. Let $p = 3$ and the corresponding processor groups be 0, 1, and 2. If we choose all the working memory groups from 0, 1, and 2, then it is called a corresponding selection, and $m \le p$. In this example, selection of memory group 4 is not a corresponding selection and gives = 1.
The first term in equation (2) says how many ways one can choose from four groups.
The second term represents the number of possible ways to select ( ? ) from for a given value of (number of groups selected outside the original groups). The third term denotes the selection of from (4 ? ) groups. Equation (2) 
Equation ( For $p = m = 2$, the total number of boxes to be included in the calculation is shown with heavy lines in Figure 4. It gives $\mathrm{sum}(i, j) = 121$, $g(2, 2) = 6$, $g(3, 3) = 52$, and $g(4, 4) = 63$. Then $g(k, k)/\mathrm{sum}(i, j)$ gives the probability that a $(k \leftrightarrow k)$ connection is required. It was mentioned that a $(k \leftrightarrow k)$ connection determines the required number of switches $x$ that must work for the $p$ processor groups to be connected to $m$ memory groups. The MC for the MIN is then drawn as depicted in Figure 5. The two-tuple notation $(x, y)$ represents the number of required SEs ($x$) and the number of extra SEs ($y$) in the MIN such that $x + y = \frac{N}{4} \log_4 N = S_t$, the total number of switches in the system. $\lambda_{se}$, $\mu_{se}$, $C_{se}$, and $r_{se}$ denote the failure rate, repair rate, coverage factor, and successful repair factor for the SEs, respectively. The horizontal transition with a rate of $x \lambda_{se} C_{se}$ shows that the system fails when any one of the required switches fails. The second part of the horizontal transition rate, with the $(1 - C_{se})$ term, represents system failure due to imperfect coverage of a faulty SE from the extra group. Although an extra SE does not contribute to any useful connection, its failure should be detected properly so that system reconfiguration is possible. The third part represents imprecise repair of the SEs.
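The quoted lookup values can be turned into connection probabilities directly. The sketch below uses only the numbers given for $p = m = 2$ and assumes that $\mathrm{sum}(i, j)$ is the total of the $g(k, k)$ values, which agrees with $6 + 52 + 63 = 121$; the function name is hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# g(k, k) values quoted in the text for p = m = 2 (read from Figure 4).
G_VALUES = {2: 6, 3: 52, 4: 63}


def connection_probabilities(g_values: dict) -> dict:
    """Probability g(k, k) / sum(i, j) that a (k <-> k) connection is required.

    Assumption: sum(i, j) equals the sum of the supplied g(k, k) values.
    """
    total = sum(g_values.values())
    return {k: Fraction(g, total) for k, g in g_values.items()}


probs = connection_probabilities(G_VALUES)
for k, prob in sorted(probs.items()):
    print(k, prob, float(prob))
```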
The vertical transitions reflect the failure and repair of the extra $y$ SEs. It should be observed that from a state $(x, y)$, the MIN subsystem can either go to a failed mode (horizontal transition) or to state $(x, y - 1)$ (vertical transition), since only one SE can fail at a time. Hence, the repair rate is always $\mu_{se}$ in Figure 5. The repair of a required SE is assumed to have a higher priority than that of an extra SE. This gives no transition between the failed states and improves the availability of the MIN.
The MC of Figure 5 can model any degradation with the proper values of $x$ and $y$. For example, if the working processors and memories need at least two processor groups and two memory groups, the network of Figure 1 may have to provide a $(2 \leftrightarrow 2)$, a $(3 \leftrightarrow 3)$, or a $(4 \leftrightarrow 4)$ connection depending on the distribution of the PEs and MMs. A $(2 \leftrightarrow 2)$ connection gives $x = 4$, $y = 4$; a $(3 \leftrightarrow 3)$ connection gives $x = 6$, $y = 2$; and a $(4 \leftrightarrow 4)$ connection means $x = 8$, $y = 0$ (no degradation of the MIN). Thus, for evaluating a system with 50% degradation, the MC of Figure 5 should be solved three times with three different sets of $(x, y)$ values. Let $P_{(x,l)}(t)$ denote the probability that the system is in state $(x, l)$. Solution of the MC of Figure 5 using HARP gives $P_{(x,l)}(t)$, for $0 \le l \le y$. The probability that the MIN provides a $(k \leftrightarrow k)$ connection with $x$ working SEs is denoted as $P_{(k \leftrightarrow k)}(t)$ and is given as
$$P_{(k \leftrightarrow k)}(t) = \sum_{l=0}^{y} P_{(x,l)}(t). \qquad (5)$$
The probability of an $(i \times j)$ working group in the first case is then
$$P^{(1)}_{(i \times j)}(t) = \sum_{p} \sum_{m} \sum_{k}^{4} \frac{g(k, k)}{\mathrm{sum}(i, j)} \, P_{(k \leftrightarrow k)}(t) \, P_i(t) \, P_j(t). \qquad (6)$$
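The $(x, y)$ pairs quoted for the 16-node MIN and the accumulation of state probabilities over the extra SEs can be sketched as follows. The state probabilities in the example are made-up placeholders standing in for a HARP solution, the function names are hypothetical, and summing $P_{(x,l)}(t)$ over $0 \le l \le y$ is an assumed reading of how the $(k \leftrightarrow k)$ connection probability is formed.

```python
# (x, y) pairs stated in the text for a (k <-> k) connection, 16-node MIN.
XY_16_NODE = {2: (4, 4), 3: (6, 2), 4: (8, 0)}


def connection_probability(state_probs: dict, k: int) -> float:
    """Sum P_(x,l)(t) over l = 0..y for the (x, y) pair of a (k <-> k) connection.

    state_probs maps (x, l) to a probability at one fixed time t.
    """
    x, y = XY_16_NODE[k]
    return sum(state_probs.get((x, l), 0.0) for l in range(y + 1))


# Placeholder probabilities for x = 6 with l = 0, 1, 2 extra SEs working.
example = {(6, 0): 0.01, (6, 1): 0.09, (6, 2): 0.85}
print(connection_probability(example, 3))
```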
The superscript 1 represents the first case, when only $i$ PEs and $j$ MMs are working. The first two summation terms denote various combinations of processor and memory groups containing $i$ PEs and $j$ MMs. The third summation term shows that $k$ can vary up to size 4. For example, if $p = 2$, $m = 3$, then $k = 3$. This means that if $i$ PEs and $j$ MMs are distributed in 2 processor groups and 3 memory groups, then depending on the distribution we may need 6 SEs ($P_{(3 \leftrightarrow 3)}(t)$) or all 8 SEs ($P_{(4 \leftrightarrow 4)}(t)$). $g(k, k)/\mathrm{sum}(i, j)$ gives the probability of requiring a $(k \leftrightarrow k)$ MIN connection. For $p = m = 3$, $g(3, 3) = 16$, $g(4, 4) = 39$, and $\mathrm{sum}(i, j) = 55$ from Figure 4. Multiplying the switching subsystem probability by $P_i(t)$ and $P_j(t)$, obtained from the processor and memory subsystems, we find the first case probability $P^{(1)}_{(i \times j)}(t)$.
In the second case, the connection is restricted to $(i \times j)$ even though more than $i$ PEs and/or more than $j$ MMs are working in the system. Note that this is possible only if the required $i$ and $j$ components and the additional working PEs and MMs are in separate groups, and there is no connection between the working $(i \times j)$ subsystem and the rest of the working components. Let us consider an example in which $i$ PEs and $j$ MMs are working using switches 1, 2, 3, 5, 6, and 7 of Figure 1. If any other component works in these three processor groups and three memory groups, then it increases the connectivity. The extra processors and/or memories must work in processor group 0 and/or memory group 0, and SEs 0 or 1, or both must fail to disrupt the communication.
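For a single distribution of the working elements, the first case contribution is a weighted sum over the possible $(k \leftrightarrow k)$ connections, multiplied by the processor and memory probabilities. The sketch below uses the $g$ values quoted for $p = m = 3$; the connection, PE, and MM probabilities are placeholder inputs (in the paper they come from the three MCs), and the function name is hypothetical.

```python
def first_case_term(g_values: dict, p_conn: dict, p_i: float, p_j: float) -> float:
    """Weighted first case contribution for one (p, m) distribution.

    g_values: k -> g(k, k); their total is taken as sum(i, j) (assumption).
    p_conn:   k -> probability that the MIN provides a (k <-> k) connection.
    p_i, p_j: probabilities of exactly i working PEs and j working MMs.
    """
    total = sum(g_values.values())
    weighted = sum(g / total * p_conn[k] for k, g in g_values.items())
    return weighted * p_i * p_j


# g(3, 3) = 16 and g(4, 4) = 39 are quoted in the text; the rest is made up.
g_33 = {3: 16, 4: 39}
print(first_case_term(g_33, {3: 0.99, 4: 0.98}, p_i=0.9, p_j=0.8))
```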
The numerator in equation (7) gives the number of ways to choose $i$ PEs from $k \times 4$ processors of $k$ groups (each group in a ( 16 16 ? 4 k represents the number of ways we can select a $(k \leftrightarrow k)$ processor-memory group that contains the $(i \times j)$ subsystem. The denominator gives the total number of ways $i$ PEs and $j$ MMs can be selected. The upper limit of $k$ is equal to 3, since distribution of $i$ PEs or $j$ MMs over all four groups makes the second case probability zero.
The state $(x, 0)$ in the MC of Figure 5 represents the situation in which only the required SEs are working and all other additional SEs have failed. Inclusion of this state probability guarantees that the additional working processors and/or memories cannot increase the working system size $(i \times j)$. Note that $P_{(x,0)}(t)$ can be obtained for a given $(k \leftrightarrow k)$ group.
The second case probability where we can get an (i j) connection from working PEs and working MMs is denoted as P ( ; ) (t). P ( ; ) (t) becomes P ( ; ) (t) = P ex (t)P (x;0) (t)P (t)P (t) (8) where P (t) and P (t) give the probability that PEs and MMs are working at time t. Since and can be in the range i N and j N, respectively, the second case probability is expressed as P (2) 
The second and third terms in equation (9) have negligible contributions and can be dropped to make the computation faster. Finally, the availability of the system becomes
$$A(t) = \sum_{i=I}^{N} \sum_{j=J}^{N} \left[ P^{(1)}_{(i \times j)}(t) + P^{(2)}_{(i \times j)}(t) \right]. \qquad (10)$$
Computation of the number of SEs required for the input and output groups can be explained by analyzing an example. Let us assume that $i = j = 32$. These 32 processors and 32 memories give $N_p = 2$ and $N_m = 2$ from equation (1). These elements could be distributed in 2, 3, or 4 processor/memory groups. When the processors are distributed in all four groups, we may need between 1 and 4 switches of a group, depending on the location of the processors in that group. The same is also true for the memory group. But if we analyze the distributions carefully, it turns out that all 4 SEs of a group are required in most cases. This is because the working components are distributed uniformly with high probability. Even if they are not uniformly distributed, we may need all four SEs of a group. This can be illustrated using Figure 1 as one group of 16 PEs and 16 MMs in the $(64 \times 64)$ system. Let us consider a situation where the PEs use the top 2 SEs of the processor group and the MMs use the bottom 2 SEs of the memory group. Now, in order for the PEs and MMs to communicate, all 4 SEs of the processor and memory groups are needed. There are very few situations in which all 4 SEs of a group are not required. We neglect these cases and assume that all 4 SEs of a required PG or MG are used in the connection. Hence, if we know the $(k \leftrightarrow k)$ connection for any $i$ PEs and $j$ MMs, the number of input and output switches required is $2k \times 4$. The required number of SEs $x$ is obtained by adding the middle stage SEs to $2k \times 4$. The MC for the MIN given in Figure 5 is then solved.
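The counts in this example can be reproduced with a small sketch. Two points are assumptions: the minimum number of groups is taken as $\lceil i / (N/4) \rceil$, which matches $N_p = 2$ for $i = 32$ in a 64-node system but is not a quotation of equation (1), and the number of middle stage SEs is passed in as a parameter because its value is not given in this passage. The function names are hypothetical.

```python
def min_groups(count: int, n_nodes: int) -> int:
    """Smallest number of groups that can hold `count` working elements.

    Assumption: each of the four groups holds N/4 elements.
    """
    group_size = n_nodes // 4
    return -(-count // group_size)  # ceiling division


def io_switches(k: int, ses_per_group: int = 4) -> int:
    """Input plus output SEs for a (k <-> k) connection: 2k x 4."""
    return 2 * k * ses_per_group


def required_switches(k: int, middle_stage_ses: int) -> int:
    """Required SEs x: input/output SEs plus the supplied middle stage count."""
    return io_switches(k) + middle_stage_ses


print(min_groups(32, 64))  # 2
print(io_switches(2))      # 16
```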
Equations (6), (9), and (10) are used to find system availability for any degradation. The number of $(k \leftrightarrow k)$ connections, $g(k, k)$ and $\mathrm{sum}(i, j)$, are obtained from the lookup table given in Figure 4. The $k \times 4$ term should be replaced by $k \times 16$ in equation (7), since each group has 16 elements.
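As stated in the introduction, summing the probabilities of all working groups gives the system availability. The sketch below assumes that this sum runs over every $(i \times j)$ group with $I \le i \le N$ and $J \le j \le N$ and adds the first case and second case probabilities of each group; the dictionaries of probabilities are placeholder inputs and the function name is hypothetical.

```python
def availability(first_case: dict, second_case: dict,
                 i_min: int, j_min: int, n: int) -> float:
    """Availability at one time t as a sum over all admissible (i x j) groups.

    first_case / second_case map (i, j) to the case 1 / case 2 probability
    of an (i x j) connected group; missing entries are treated as zero.
    """
    total = 0.0
    for i in range(i_min, n + 1):
        for j in range(j_min, n + 1):
            total += first_case.get((i, j), 0.0) + second_case.get((i, j), 0.0)
    return total


# Toy 2-node example with made-up probabilities, requiring I = J = 1.
case1 = {(1, 1): 0.02, (1, 2): 0.05, (2, 1): 0.05, (2, 2): 0.85}
case2 = {(1, 1): 0.01}
print(availability(case1, case2, i_min=1, j_min=1, n=2))
```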
IV. A GENERALIZED MODEL
The preceding section leads us to formalize a general technique for availability computation of MIN-based multiprocessors. This technique is aimed at encompassing a wide range of system configurations. First, the modeling methodology needs to be extended to the analysis of larger systems with reasonable accuracy. The second objective is to include the analysis of systems where the system size is not a power of 4. This need arises because commercial multiprocessors can be configured in various sizes to match the need. For example, the Butterfly can be configured for any size up to 256 nodes. Finally, we also discuss the extension of the model to other classes of MINs and tightly coupled multiprocessors. By including the analysis techniques of such systems, the model is made general as well as complete.
The main idea is to decompose the system into various groups. Distribution of the $i$ PEs and $j$ MMs among the groups can be done in the same way as discussed for the $(16 \times 16)$ system. Depending upon the size of the system, and with the aid of the MC of Figure 3, we can compute $P_i(t)$ and $P_j(t)$. The only remaining aspect is the determination of the number of required SEs, $x$. The number of extra SEs $y$ can be obtained using the relationship $y = S_t - x$, where $S_t$ is the total number of SEs in the MIN. Knowledge of $x$ and $y$ is essential for the solution of the MC of Figure 5, which gives the contribution of the MIN to the system availability.
The MC of Figure 5 is solved using this value of $x$ and the corresponding value of $y$. Equations (5) and (6) are then used to find the probability that there is an $(i \times j)$ connected group when exactly $i$ PEs and $j$ MMs are working. $P_{ex}(t)$ is obtained from equation (7) by replacing $(k \times 4)$ by $(k \times \frac{N}{4})$. One can obtain the probability of an $(i \times j)$ connectivity when more than $i$ PEs and $j$ MMs are working by solving equations (8) and (9). Finally, equation (10) gives the availability of the system at time $t$.
An Example: 256-node Configuration Table I. With the $x$ values known, the MC for the SEs can be evaluated to find $P_{(k \leftrightarrow k)}(t)$ and $P_{(x,0)}(t)$, which are used in the computation of $P^{(1)}_{(i \times j)}(t)$ and $P^{(2)}_{(i \times j)}(t)$ in equations (6) and (9), respectively. The $g(k, k)$ and $\mathrm{sum}(i, j)$ terms are obtained from the lookup table shown in Figure 4. The $k \times 4$ term in equation (7) should be changed to $k \times 64$ since each group has 64 elements.
B. Modeling $N \neq 4^i$-Node Configuration
Consider an $N \times N$ system where $4^{i-1} < N < 4^i$ for some integer $i$. The number of stages, $n$, required to form the MIN is $i$. The next modeling parameter that needs to be known is the number of processor and memory groups. Division of the whole system into four groups might give rise to an inefficient and expensive design. For example, let us consider a 100-node configuration. It would not be a cost-effective design to spread these 100 nodes over all four groups. Many of the $(4 \times 4)$ SEs would be underutilized (not using all their input/output links) or might not be used at all. It is therefore essential to pack the nodes densely into as few groups as possible. Dense packing means selecting nodes so as to maximize the switch utilization. This also results in the formation of groups of $4^{n-1}$ nodes. So the number of groups depends on the system size $N$. Let the number of groups be denoted by $G$, where $1 \le G \le 4$. $G$ is thus determined by
$$G = \left\lceil \frac{N}{4^{n-1}} \right\rceil. \qquad (12)$$
The 100-node configuration example using equation (12)
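The dense packing rule can be tried out numerically. The sketch below assumes that the number of stages is the smallest $n$ with $4^n \ge N$ and that the number of groups is the ceiling of $N / 4^{n-1}$, which follows from packing the nodes into groups of $4^{n-1}$ and agrees with the $G = 3$ quoted for the 48-node example; the function names are hypothetical.

```python
def stages_needed(n_nodes: int) -> int:
    """Smallest n such that 4**n >= n_nodes."""
    if n_nodes < 2:
        raise ValueError("need at least 2 nodes")
    n, size = 0, 1
    while size < n_nodes:
        size *= 4
        n += 1
    return n


def num_groups(n_nodes: int) -> int:
    """Number of densely packed groups of 4**(n-1) nodes (assumed ceiling rule)."""
    group_size = 4 ** (stages_needed(n_nodes) - 1)
    return -(-n_nodes // group_size)  # ceiling division


for size in (48, 100, 256):
    print(size, stages_needed(size), num_groups(size))
```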
The $i$ PEs could be distributed over $p$ PGs, for $N_p \le p \le G$, and the $j$ MMs could be distributed over $m$ MGs, for $N_m \le m \le G$. Each of these distributions would need a specific set of SEs for interconnection. By replacing 4 by $G$ in equations (2), (3), and (4), we can compute the values for $g(k, k)$ and $\mathrm{sum}(i, j)$, where $k \le G$.
These values of $x$ and $y$ are used in the MC shown in Figure 5. The rest of the technique is the same, except that the term $(k \times 4)$ should also be replaced by $(k \times \frac{N}{G})$ in equation (7).
An Example: 48-Node Configuration
A $(48 \times 48)$ configuration needs $n = 3$ stages of SEs. The system is decomposed into $G = 3$ subgroups, each of which contains 16 PEs and 16 MMs. Replacing 4 by 3 in equations (2)-(4), the values for $g(k, k)$ and $\mathrm{sum}(i, j)$, for $k \le 3$, can be obtained. (14). These computations are tabulated in Table II. Note that not all stage 1 SEs are used for connection, even if we need all three groups for some values of $i$ and $j$. This is because three of the stage 1 SEs are idle in a $(48 \times 48)$ connection. Failure of these SEs should not affect an $(i \times j)$ connection. Equations (5)-(10) are used to evaluate the system availability after making the necessary changes as described in Section IV.B.
C. Extension of the Model
The proposed technique can be used to evaluate MINs that use SEs of size other than $4 \times 4$. The analysis of the PEs and the MMs remains the same. Computation of various parameters using $p$ groups of working PEs and $m$ groups of working MMs can be done using the expressions derived in Section III. The only change needed is that the number of groups should be $a$ instead of 4 if the switch size is $a \times a$. The second consideration is the computation of $x$ to be used in the MC of Figure 5. Different approximate techniques can be used to determine the value of $x$. One approach is to use all the SEs of a group, as has been discussed in this paper. This method is reasonable for $a \ge 4$ because with larger SEs the number of groups increases while the number of stages decreases, so the approximation is better. On the other hand, for $a = 2$ or 3, the smaller number of groups and the larger number of stages of SEs in larger systems require a more accurate estimation of $x$. One such technique is discussed in [20].
The proposed availability model is aimed basically at unique-path MINs. The analysis, within a limited scope, can be used for fault-tolerant multipath MINs. A survey of such MINs is reported in [2]. Fault tolerance can be improved by using various techniques. These techniques include addition of extra stages, increasing the number of ports, replicating the MIN, crosslinking, bypassing, etc. [2]. As the redundancy and topology of the fault-tolerant MINs vary, it is not possible to formalize a general technique for computing availability. The modeling technique for the PE subsystem and the MM subsystem, however, remains the same in all cases. Modeling of the MIN subsystem varies with the topology and can be quite involved. The concept presented in this paper can be applied to a fault-tolerant MIN if the SEs can be divided, with reasonable accuracy, into two groups: the required SEs, $x$, and the extra SEs, $y$.
It may not be possible to distinctly identify the $x$ and $y$ SEs in a multipath MIN. For example, addition of an extra stage of SEs provides alternate paths between any input-output pair. This can be modeled as parallel paths with the same $x$ and $y$ values for each path. But the SEs in a path would not be disjoint, as some of them would be shared with other paths. Exact solution of such a model would be extremely difficult. One can, however, find an upper bound by neglecting the sharing of SEs. Similar approximations can be made for other topologies. Validity of the model for fault-tolerant MINs thus depends upon the feasibility of finding appropriate values for $x$ and $y$. We refrain from elaborating further on this since the computation technique for $x$ and $y$ is topology dependent.
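One way to read the upper bound obtained by neglecting the sharing of SEs is to treat the alternate paths as independent, so that the MIN connection fails only if every path fails. The sketch below follows that interpretation, which is an assumption and not a formula from the paper; the per-path probabilities are placeholder inputs and the function name is hypothetical.

```python
def multipath_upper_bound(path_probs) -> float:
    """Probability that at least one path works, treating paths as independent.

    path_probs: per-path probabilities that the path provides the connection.
    Shared SEs make real paths dependent; ignoring that is the approximation.
    """
    all_fail = 1.0
    for prob in path_probs:
        all_fail *= 1.0 - prob
    return 1.0 - all_fail


# Two alternate paths, each providing the connection with probability 0.9.
print(multipath_upper_bound([0.9, 0.9]))
```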
V. RESULTS AND DISCUSSION
Equation (10) is used to compute the transient as well as steady-state availabilities for different configurations and for different amounts of degradation. We have done extensive simulation using the Monte Carlo method to validate the analytical results over a wide range of parameters. The simulation results are also given in this section. Figure 8 depicts the variation of system availability for a $(16 \times 16)$ multiprocessor with 25% degradation and for two different sets of input parameters. The solid curves show the analytical results, whereas the dotted curves show the simulation results. The simulation results exhibit slight fluctuations. This is mainly because of the difficulty in fine tuning the simulator. In spite of this, the simulation results match the analytical results closely.
Figures 9 and 10 show the variation of availability for $(64 \times 64)$ and $(256 \times 256)$ multiprocessors. We have intentionally kept $C$ and $r$ low to verify that the analysis is valid over wide ranges. In Figure 11, the availability curve for a $(48 \times 48)$ system is plotted along with the simulation results. The closeness of the two curves supports the generality of our modeling technique. The results from analysis and simulation agree to within 7%. We have also checked the validity of our model over a wide range of failure and repair rates.
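The idea of checking an analytical availability against Monte Carlo simulation can be shown on the smallest possible case. The sketch below simulates a single repairable component with exponential failure and repair times and compares the time-averaged availability with the known steady-state value $\mu / (\lambda + \mu)$. It is not the simulator used for the figures, which models the complete multiprocessor; the function name, seed, and parameter values are illustrative assumptions.

```python
import random


def simulate_component_availability(lam: float, mu: float,
                                    horizon: float, seed: int = 0) -> float:
    """Fraction of [0, horizon] that one component spends in the up state.

    Up periods are exponential with rate lam, repair periods with rate mu.
    """
    rng = random.Random(seed)
    clock, up, up_time = 0.0, True, 0.0
    while True:
        dwell = rng.expovariate(lam if up else mu)
        end = min(clock + dwell, horizon)
        if up:
            up_time += end - clock
        if end >= horizon:
            break
        clock = end
        up = not up
    return up_time / horizon


estimate = simulate_component_availability(lam=1.0, mu=9.0, horizon=10000.0)
exact = 9.0 / (1.0 + 9.0)
print(estimate, exact)
```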
VI. CONCLUSIONS
An analytical technique for the availability evaluation of multiprocessors using a multistage interconnection network is presented in this paper. The novelty of this approach is that the complexity of constructing the single-level exact MC is avoided. Using a structural decomposition, the system is divided into three subsystems, namely, processors, memories, and MIN. Two simple MCs are solved to find the probability of $i$ working PEs, $P_i(t)$, and $j$ working MMs, $P_j(t)$, at time $t$, where $i$ and $j$ satisfy the system task requirement. A second-level decomposition is then used to find an approximate number of SEs ($x$) required to connect the $i$ PEs and $j$ MMs. This decomposition divides the whole system into subgroups. Considering various possible distributions of the working elements in these subgroups, the required number of SEs $x$ and the additional SEs $y$ are obtained. A third MC is constructed and solved in order to find the probability that the MIN provides the necessary connectivity. Multiplying the third term by $P_i(t)$ and $P_j(t)$, the probability of an $(i \times j)$ connected group at time $t$ is obtained.
The model has been validated through simulation for up to a 256-node configuration. Since the model is very general, it can be extended beyond $(256 \times 256)$ systems and to systems that use $a \times a$ SEs.
The following are the contributions of this paper. It provides an analytical technique for the dependability evaluation of MIN-based multiprocessors, which is known to be a complex problem. We do not know of any work that has addressed the modeling of such a wide range of large size repairable MIN-based multiprocessors. The model is simple since it avoids the generation of a detailed MC, thereby reducing the size of the state space drastically. Finally, research on performability has been confined to generic multiprocessors where the network degradation has been neglected (except for the $16 \times 16$ system). Our model provides the basic framework for analyzing the performability behavior of large MIN-based systems, including the effect of the network.
