Most previous work on fault-tolerant (FT) multiprocessor design has concentrated on deterministic k-fault-tolerant (k-FT) designs in which exactly k spare processors and some spare switches and links are added to construct multiprocessors that can tolerate any k processor faults. However, after k faults are reconfigured around, much of the extra links and switches can remain unutilized. We show here how to use the node-covering principle of Dutt and Hayes and error correcting codes in order to construct probabilistic designs with very high average fault tolerance but low wiring and switch overhead. This design methodology is applicable to any multiprocessor interconnection topology. We also obtain the deterministic fault tolerance for these designs and develop efficient layout strategies for them.
Introduction and Motivation
It is important to design multiprocessors with fault tolerance and reconfiguration capabilities so that they continue to function correctly and with very little degradation in spite of a few processor/link failures, and without requiring the applications to be coded for fault tolerance. To achieve this, multiprocessors need to be designed with structural fault tolerance (SFT), which is the ability to reconfigure around faulty components (processors or links) in order to preserve the original processor interconnection structure.
A number of researchers have proposed various design methods for structural fault tolerance in multiprocessors, for example, [2, 4, 5, 6, 7, 9, 10, 15]. Most of this work has concentrated on the design of deterministic k-fault-tolerant (k-FT) multiprocessors in which there are exactly k spare processors, and the link and switch complexity is high enough to tolerate any k processor failures. Unfortunately, in such designs 50% or more of the redundant link/switch overhead remains unutilized on the average after k faulty processors have been reconfigured around. It intuitively seems possible to utilize this remaining spare link and switch capacity of the system by putting in s > k spare processors so that, on the average, more than k faults can be reconfigured around. Looking at this from a different angle, we would like to design FT multiprocessors in which the link/switch complexity is kept quite low (for example, corresponding to that of a 2-FT deterministic design), but many more (s >> 2) spare processors are added so that k processor faults can be reconfigured around on the average (or with very high probability), where k < s and k >> 2. This will also lead to very high utilization of the spare link/switch capacity. Since the hardware complexities in almost all multiprocessor systems are wire dominated, it is important to keep the link redundancy as low as possible in a FT design; the processor complexity is not that critical, and thus adding some excess processor redundancy, say 5-10%, does not significantly affect the hardware complexity of the system. Since such designs are geared towards having a large fault tolerance on the average (over all fault patterns), while the worst-case fault tolerance may be lower, we call them probabilistic FT designs.
This research was supported by NSF grant MIP-9210049.
In earlier work, Dutt and Hayes developed a very efficient methodology called node covering for designing deterministic k-FT multiprocessors [10]. The basic node-covering technique will be the underlying design process for probabilistic FT, with linear codes being used to determine the interconnection between the external switches used for reconfiguration in order to obtain very high average FT.
Before proceeding further, we first define average fault tolerance as $k_{avg} = \sum_{i=1}^{s} i \cdot \Pr(\text{tolerating a size-}i\text{ fault pattern that is not contained in a tolerable fault pattern of a larger size})$, where s is the number of spare processors. A tolerable j-fault pattern will contain i-fault patterns, where j > i, that should not be counted separately as tolerable i-fault patterns. Assuming that tolerable i-fault patterns are uniformly distributed over the space of tolerable and intolerable fault patterns of larger sizes and that $p_i$ is the probability of tolerating any i-fault pattern (this we might determine by, say, simulation), we have [...]
[...] In Sec. 4 we give the rationale for using error-correcting codes (ECC's) as the basis for determining such processor groupings. Section 5 describes efficient layouts of some ECC-based FT designs and obtains the area overheads in each case. Next, Sec. 6 describes how we model and solve the reconfiguration process in our FT designs as a network flow problem. In Sec. 7, we determine the deterministic fault tolerance (DFT), which is the maximum number of faults that are guaranteed to be tolerated, of each ECC-based design. Section 8 discusses the average-case fault tolerance of these designs obtained by Monte-Carlo simulations, and we finally conclude in Sec. 9. Due to space limitations, many of our results are stated without proofs, which can be found in [11].
The Deterministic Node-Covering Method
In this technique [10], we designate for each node v a small subset cov(v) of nodes (i.e., processors) of the FT multiprocessor G' to which v can be mapped (i.e., by which v can be replaced) for reconfiguring around faults. The set cov(v) is called the cover set of v, while each node u in cov(v) is called a cover of v.
The node v is said to be a dependent of all its covers, and the set of dependents of a node u is denoted by dep(u). Note that for SFT, we need some mechanism for each node u to be connected to the neighbors of a dependent node v in the event it replaces v; we will shortly describe this mechanism.
A covering graph (CG) of a FT multiprocessor G' is a directed graph whose node set consists of all the nodes of G' (including the spare nodes), and whose arc set is [...] is the node set of the non-FT multiprocessor G; see Fig. 1(b). A covering sequence for a node v in the CG is an ordered subset $(v, v_1, v_2, \ldots, v_h)$ of the node set of the CG such that $v_1 \to v$, $v_j \to v_{j-1}$ for $1 < j \le h$, and $v_h$ is the only spare node in this ordered set. Basically, a covering sequence for v specifies a mapping which can be performed when v is faulty, so that v is replaced by $v_1$, which in turn is replaced by $v_2$, and so on, until $v_{h-1}$ is replaced by the spare node $v_h$, thus configuring it into the system. In order to be able to reconfigure around any k faults, we need a set of k node-disjoint covering sequences, one for each fault, in the CG.
In order to provide each node with the necessary switching capability so that it can connect to the neighbors of any dependent node that it replaces, we use a 2 x 2 switch called a recon-switch in our designs whose possible states are specified in Fig. 1(a). This switch has 4 bidirectional terminals (in spite of their labels like "v-input" and "h-output", these are all bidirectional) and can connect any pair of them together as shown. These terminals can have any width (number of wires), and a switch with d wires at each terminal is termed a d-link switch. In a k-FT system, each processor is connected to its d neighbors via the v-inputs and v-outputs of k d-link recon-switches, as shown in Fig. 2(a) for k = 2.
For an efficient k-FT design, we need a streamlined way of allowing a node to replace any of its k dependent nodes. This is done by determining a linear ordering L of the nodes and by linking the h-outputs of each of the k recon-switches of node u to the h-inputs of the corresponding recon-switches of the node v adjacent to it in L; this corresponds to the standard CG in which each node is covered by the k nodes immediately preceding it in L; see Fig. 1(b). Now, for node u to replace a dependent, say, v (either when v becomes faulty or when v replaces one of its dependents in the covering sequence of a faulty node), the d links from u are reconfigured to connect to the h-inputs of v's recon-switch, after passing through the recon-switch of any intermediate processor w, and thereby to v's neighbors; the links from v will be blocked by its first recon-switch if it is faulty, or a similar connection will be established from v to the neighbors of the node it replaces (Fig. 2(b)). In order to minimize the link overhead, L should be determined from an optimal 2-D or 3-D layout of the multiprocessor so that adjacent nodes in L are also adjacent in the layout. Then, since recon-links are only required between spatially adjacent processors, the wiring overhead of the k-FT design is minimized.

Intersecting FT Processor Groups
In order to obtain high fault tolerance at reduced recon-switch and recon-link overheads, we need to partition the processors into smaller groups that are individually made, say, $k_1$-FT, using the technique described in the previous section. The rationale for partitioning into smaller FT processor groups is as follows. Assuming the model of uniformly and independently distributed processor faults, the reliability of a k-FT system with N primary processors is $R_k(N) = \sum_{i=0}^{k} \binom{N+k}{i} p^i (1-p)^{N+k-i}$, where p is the failure probability of a single processor. If we partition the system into t disjoint groups, and make each group $k_1$-FT, then the system's reliability is $(R_{k_1}(N/t))^t$. If $k_1 = k$, then it is easy to see that the partitioned system has higher reliability at the same recon-link and switch cost (but more spare processor cost, which is not that critical). Even for $k_1 < k$, we can get the same or higher reliability in the partitioned system at reduced link and switch overheads. For example, when p = 0.02, the reliability $R_2(16)$ of a 2-FT 16-processor system is 0.994, whereas a comparable reliability of 0.984 is obtained when the system is partitioned into 4 disjoint groups, each of which is made only 1-FT.
Figure 3: (a) A disjoint processor grouping into two 2-FT groups; it cannot tolerate the 3-fault pattern shown. (b) An intersecting processor grouping into four 1-FT groups that has a similar hardware overhead to that of the disjoint grouping; it can tolerate the 3-fault pattern.
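The two reliability figures quoted above can be checked numerically. The sketch below assumes that a k-FT group of N primaries and k spares survives exactly when at most k of its N + k processors fail, with independent failures of probability p (spares failing like primaries); the function names are illustrative only.

```python
from math import comb

def group_reliability(n_primary, k, p):
    """P(at most k of the n_primary + k processors fail), independent failures."""
    total = n_primary + k
    return sum(comb(total, i) * p**i * (1 - p)**(total - i) for i in range(k + 1))

def partitioned_reliability(n_primary, groups, k1, p):
    """Reliability of `groups` disjoint, equally sized k1-FT groups."""
    return group_reliability(n_primary // groups, k1, p) ** groups

r_single = group_reliability(16, 2, 0.02)           # one 2-FT group of 16 primaries
r_split = partitioned_reliability(16, 4, 1, 0.02)   # four 1-FT groups of 4 primaries
print(f"{r_single:.4f} {r_split:.4f}")
```

Under these assumptions the two values come out close to 0.995 and 0.985, in line with the 0.994 and 0.984 quoted in the text, while the partitioned design uses one recon-switch per processor instead of two.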
A crucial question is whether these processor groups should be disjoint or intersecting. Figure 3(a) shows a disjoint grouping of processors into two groups, each of which is made 2-FT, while Fig. 3(b) shows a non-disjoint grouping of processors into four groups, each of which is made 1-FT. Each group is shown as a thick line spanning a linear processor space. The hardware overheads of the two processor grouping schemes are almost the same: each requires four spare processors, and two recon-switches and recon d-links per processor (in Fig. 3(b), each processor belongs to two groups, and one recon-switch is used to connect it to one of its groups and the second one to the other group). It is easy to see that the intersecting-groups scheme can tolerate all fault patterns that the disjoint-groups scheme can (assume for the moment that each group can reconfigure around faults in it without any conflict with reconfigurations in intersecting groups; we will shortly show that this is true for our designs). Furthermore, the former scheme can tolerate many more fault patterns than the latter. For example, the 3-fault pattern shown in Fig. 3 cannot be tolerated in the disjoint-grouping case, but it can be tolerated in the intersecting-groups scheme as follows. The leftmost fault can be reconfigured in Group 4, the middle one in Group 1 and the third one in Group 2. This simple example clearly shows that, everything else being equal, an intersecting processor grouping yields more fault tolerance and reliability than a disjoint processor grouping. This is essentially because intersecting processor groups have many more paths between processors and spares than disjoint groups. The next crucial question is whether there is a systematic method to form different types of intersecting processor groups that have different fault tolerance and hardware cost properties. The answer to this is in the affirmative.
In particular, we can draw upon linear error-correcting codes (ECC's) in order to form processor groupings; different ECC's will yield FT designs with different properties with respect to the above parameters. In Sec. 4, we show how ECC's can be used to form (intersecting) processor groups, and why they are a good choice to determine these groupings.
Switching Structure
The switch implementation of a FT multiprocessor with intersecting processor groups is an extension of the deterministic (k-FT) node-covering design described in Sec. 2. We assume for simplicity that each processor group is made 1-FT, and thus has a single spare processor. Suppose each processor belongs to $d_g$ groups. Then we connect $d_g$ recon-switches to each processor in series (via v-input and v-output terminals as in k-FT designs). We label the recon-switches from top (to which the processor is directly connected) to bottom as $d_g - 1, \ldots, 0$. For each processor group, we order the processors in it in some manner (essentially by physical proximity as described in Sec. 5), and connect, say, the ith recon-switches of adjacent processors in this ordering via their h-inputs and h-outputs to form a chain;¹ the spare processor is at the end of the chain. Figure 4(a) shows such a recon-switch interconnection for a 4 x 4 array of processors (this array can correspond to a multiprocessor of any topology; most homogeneous or almost-homogeneous multiprocessors like k-ary n-cubes can be optimally laid out so that processors appear in regularly spaced rows or columns) in which each row and column corresponds to a distinct processor group. Thus each processor belongs to exactly two processor groups, and this grouping is called a degree-2 grouping.
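[...] (1) The covering relation within each processor group is symmetric, i.e., if u → v, then v → u. This is because the recon-switches are interconnected using bidirectional recon-links so that in Fig. 4(a) processor (2,1) can replace processor (1,1) or vice versa.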
(2) The covering relation within each processor group is also transitive, i.e., if u → v and v → w, where u, v and w belong to the same processor group, then u → w. Again, this is facilitated by the switching [...] first spare processor is considered to be at [...] of the chain), without requiring any extra [...] Note that if (1,1) was non-faulty, then bypassing it in this manner does not hinder its replacing capabilities in its column group, i.e., it can replace any processor in column # 1 or can be replaced by any processor in it. This is because (1,1) is bypassed by using only the h-input and h-output links of its recon-switch corresponding to its row group, and these links are not useful for (1,1) to replace or be replaced by processors in its column group; see Fig. 4(c). It is in this sense that the covering relation within a group is transitive. We thus have the following result.
Theorem 1 The covering relation in a CG [...]
The next theorem establishes that a matching between the set of faults and the set of spares in the above sense is sufficient for reconfiguration. [...]
[...] Fig. 4(c), there is no matching for the set of primary faults {(1,0), (1,1), (2,1), (2,2), (3,3)}, since both spares in (3,3)'s two groups are faulty. However, reconfiguration is still possible as shown in the figure. In this case, (3,3) is replaced by (3,2), which in turn is replaced by a spare, resulting in a "bend" in the reconfiguration path. This bend occurs because (3,2)'s replacement of (3,3) is along its column group and then the spare's replacement of (3,2) is along its row group. We are thus bending from a group connected by a higher numbered recon-switch (# 1) to one connected via a lower numbered recon-switch (# 0). The reverse type of bend, i.e., replacement along a row followed by replacement along a column, is not possible because of the contention for the v-link connecting the 1st and 0th recon-switch of the processor at which the bend occurs. Finally, note in the figure that (1,0), (1,1) and (2,1) are all replaced by "matched" spares in their own groups, and their reconfiguration paths are thus straight.
¹ It may not always be possible to connect all processors in a group by their ith recon-switches throughout, in which case after connecting the first few processors by their ith recon-switches we shift to connecting the rest via, say, their jth recon-switches; we also connect the last ith recon-switch of the first subset of processors to the first jth recon-switch of the second subset of processors.
It is possible to exactly determine a valid set of reconfiguration paths for a set of faults in the above-described switching structure using a network maxflow algorithm as described later in Sec. 6. However, the matching criterion is still important because: (1) in most cases we can determine the reconfiguration paths by using a simple matching algorithm, (2) it points us to the use of ECC's as a systematic method for determining FT processor groups, and (3) it is also useful in obtaining the deterministic fault tolerance of our designs.
[...] operation. In the rest of this section, all arithmetic will be modulo 2; note that modulo 2 multiplication is the same as the AND operation. A single error in any bit in the code word will result in a violation of the above equation and thus lead to a detection of the error. Instead of talking about the parity of all the information bits, we can talk about parities of various groups of information bits. An example is the 2D-parity code, in which the N information bits are arranged in two dimensions as an n x m array, where N = n · m, and each row and column of this matrix is a group with which a parity check bit is associated; see Fig. 5(a) in which two example parity equations (for c1 and cg) are also given. The 2D-parity code can detect any 2 errors (its deterministic error detectability) and can correct a single error. Any (linear) ECC is essentially a collection of parity equations, one for each group of information bits. We can construct processor groups from a given ECC in the following manner. Each primary processor is uniquely associated with an information bit and each spare processor with a check bit. Each group and its check bit in the ECC then gives rise to a processor group and its associated spare.
There are five metrics associated with ECC's that are of interest to FT multiprocessor design: (1) Check-bit overhead, which is the ratio of check to information bits. This translates to spare-processor overhead in a FT multiprocessor. If the 2-D arrangement is an n x n square, i.e., $N = n^2$, then this overhead is $g/N = 2n/N = 2/n$. (2) Group size, which is the size of the largest group. This is important, since when a single fault occurs that can be reconfigured using matching (to a spare in one of its groups), then only the processors in that group need be involved in the replacement process, and will not be available for the period it takes to reconfigure. Thus with a large group size, the availability of the multiprocessor decreases. Group size of the square 2D-code is $\sqrt{N}$. (3) Group degree, which is the maximum number of groups an information bit lies in. In a FT multiprocessor, this translates to the maximum number of recon-switches and recon-links needed per processor. The 2D-parity code has degree 2. (4) Deterministic error detectability (DED), which is the maximum number of errors that the ECC will always detect. This provides a lower bound on the DFT (maximum number of processor faults that can always be reconfigured around) of the FT multiprocessor, as we will shortly see. (5) Average erasure correctability, which is defined similarly to average fault tolerance $k_{avg}$ (see Sec. 1). The concept of erasure correctability, which was introduced in [14], will be explained shortly. This metric provides a lower bound on the average fault tolerance of the multiprocessor, as shown later.
We will describe four useful ECC's that provide different tradeoffs of these metrics. Before describing these codes we briefly revisit the theory of error-correcting codes [1] that will be useful in formulating why ECC's are a good basis for determining FT processor groupings.
Relating Error Detection, Erasure Correction and Fault Tolerance
A linear ECC can also be described in terms of a g x (N + g) parity check matrix, H = [P | I], shown in Fig. 5(b). The N columns of P represent information bits, and the g columns of I the check bits. Each row corresponds to a group; the columns that have 1's in a row correspond to information bits in that group. If vector X is a valid codeword, i.e., its check bits are derived from the information bits using the parity equations, then H · X = 0 (recall that we are using modulo 2 arithmetic). We next define the concept of linear independence and then state a basic theorem in error-correction coding theory. A set S of binary vectors is said to be linearly dependent if the modulo 2 sum of some subset of its vectors yields the 0 vector. If S is not linearly dependent, then it is said to be linearly independent.
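To make the parity-matrix view concrete, the following sketch builds H = [P | I] for an n x m 2D-parity code, derives the check bits of a codeword, and verifies H · X = 0 modulo 2. The row-major bit numbering and the ordering of groups (row groups first, then column groups) are assumptions made for this illustration, as are the function names.

```python
def parity_matrix_2d(n, m):
    """H = [P | I]: rows are groups (n row groups, then m column groups)."""
    N, g = n * m, n + m
    H = [[0] * (N + g) for _ in range(g)]
    for r in range(n):
        for c in range(m):
            bit = r * m + c          # row-major index of information bit (r, c)
            H[r][bit] = 1            # bit lies in row group r
            H[n + c][bit] = 1        # bit lies in column group c
    for j in range(g):
        H[j][N + j] = 1              # identity part: check bit of group j
    return H

def encode(H, info):
    """Append check bits so that every parity equation holds (mod 2)."""
    N = len(info)
    checks = [sum(row[i] & info[i] for i in range(N)) % 2 for row in H]
    return info + checks

def syndrome(H, X):
    """H . X over GF(2); all zeros for a valid codeword."""
    return [sum(h & x for h, x in zip(row, X)) % 2 for row in H]

H = parity_matrix_2d(2, 3)
X = encode(H, [1, 0, 1, 1, 1, 0])
clean = syndrome(H, X)
X[4] ^= 1                            # single error in information bit (1, 1)
dirty = syndrome(H, X)
```

A single error in information bit (1, 1) violates exactly the two parity equations of its groups (row group 1 and column group 1), which is the pair of groups whose spares could replace the corresponding processor.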
Theorem 4 [1] The DED of an ECC is t if and only if any t columns of its parity matrix are linearly independent. □
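For small codes, the condition of Theorem 4 can be checked by brute force. The sketch below represents each column of H as an integer bitmask over the groups and looks for the smallest set of columns whose modulo 2 sum is zero; the DED is then one less than the size of that set. The helper names and the bitmask encoding are assumptions of this illustration, and the exhaustive search is only practical for small examples.

```python
from itertools import combinations

def columns_2d_parity(n):
    """Columns of H = [P | I] for the n x n 2D-parity code, as bitmasks.

    Bits 0..n-1 of a mask stand for row groups, bits n..2n-1 for column groups.
    """
    info = [(1 << r) | (1 << (n + c)) for r in range(n) for c in range(n)]
    checks = [1 << j for j in range(2 * n)]
    return info + checks

def ded(columns):
    """Largest t such that any t columns are linearly independent over GF(2)."""
    for size in range(1, len(columns) + 1):
        for subset in combinations(columns, size):
            acc = 0
            for col in subset:
                acc ^= col           # modulo 2 sum of the chosen columns
            if acc == 0:             # smallest dependent set has `size` columns
                return size - 1
    return len(columns)

ded_2d = ded(columns_2d_parity(3))
```

For the 3 x 3 2D-parity code the smallest dependent set has three columns (an information bit together with the check bits of its row and column groups), so the computed DED is 2, in agreement with the text.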
Similar to the matching system between faulty processors and spares described in Sec. 3, we can set up a matching system between bits and parity groups as follows. Let A be the set of information and check bits and P the set of parity groups in a code. With each bit $i_j$ in A we associate a subset $P_{i_j}$ of P of those parity groups that contain $i_j$. Then, given a subset E of erroneous bits of A, there is a matching between E and P with respect to $\{P_{i_j}\}$ if there exists a one-to-one function $\phi : E \to P$ such that $\phi(i_j) \in P_{i_j}$ for all $i_j \in E$. We next relate matching and error detectability of a code; we first state a useful result from linear algebra [12] given here in a more relevant form.
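Whether such a matching exists can be decided with any bipartite matching routine. The sketch below uses the standard augmenting-path method (often called Kuhn's algorithm), which is a choice made for this illustration and is not prescribed by the text; the bit and group names are hypothetical labels for a 2D-parity code.

```python
def max_matching(groups_of):
    """Size of a maximum matching between bits and the parity groups containing them.

    `groups_of` maps each bit to the list of groups it belongs to.
    """
    owner = {}                       # group -> bit currently matched to it

    def try_assign(bit, seen):
        for grp in groups_of[bit]:
            if grp in seen:
                continue
            seen.add(grp)
            # Take a free group, or re-route the bit that currently holds it.
            if grp not in owner or try_assign(owner[grp], seen):
                owner[grp] = bit
                return True
        return False

    return sum(try_assign(bit, set()) for bit in groups_of)

def has_matching(groups_of):
    return max_matching(groups_of) == len(groups_of)

# Four information bits forming a 2 x 2 sub-square of a 2D-parity code.
square = {
    ("info", 0, 0): ["r0", "c0"], ("info", 0, 1): ["r0", "c1"],
    ("info", 1, 0): ["r1", "c0"], ("info", 1, 1): ["r1", "c1"],
}
# The same bits plus the check bit of row group r0, which lies only in r0.
square_plus_check = dict(square)
square_plus_check[("check", "r0")] = ["r0"]
```

The four bits of the sub-square can be matched to their four groups, whereas adding the check bit of r0 leaves five bits competing for four groups, so no matching exists.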
Theorem 5 [12] Any set of m vectors each of length n is linearly dependent if m > n. □
Theorem 6 If t is the DED of a code C, then there will always be a matching between any set E of at most t erroneous bits and the set P of parity groups of C.
Proof: Suppose there is a subset E of t bits for which there is no matching. Since C can detect any t errors, it means that the columns of the parity matrix H corresponding to the bits in E are linearly independent (Theorem 4). For convenience, let these columns be $H_0, \ldots, H_{t-1}$. Note that the set $P_{i_j}$ is the set of rows (i.e., parity groups) in which the $i_j$th column of H has 1's. Since there is no matching for bits $i_0, \ldots$ [...]
In general, if r > t errors occur, then they are detectable as long as the sum of the r columns corresponding to the r error bits is non-zero. Linear independence of the r columns is sufficient to yield a non-zero sum; however, it is not necessary (for example, a subset of the r columns can be summed to yield a 0 vector, but the sum of all r columns may not be zero). A more relevant concept for our purpose is erasure correctability of a code introduced in [14], where ECC's were applied to design redundant disk arrays (RAID's) in which disk erasures (disk failures in which the data on the disk is lost) can be tolerated by reproducing the data of the failed disks using parity equations. In such RAID's, each primary (spare) disk corresponds to an information (check) bit of the code. The deterministic erasure correctability of a code is the same as its DED. However, its average erasure correctability is upper-bounded by its average error detectability, since r > t disk erasures can be corrected if and only if the r columns corresponding to the failed disks are linearly independent [14].
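The linear-independence criterion for erasure correction can be tested by computing a rank over GF(2). The sketch below does this by Gaussian elimination on bitmask columns; the column encoding for the 3 x 3 2D-parity code and all function names are assumptions of this illustration.

```python
def gf2_rank(columns):
    """Rank over GF(2) of vectors given as integer bitmasks."""
    basis = {}                       # leading-bit position -> reduced vector
    for vec in columns:
        while vec:
            lead = vec.bit_length() - 1
            if lead not in basis:
                basis[lead] = vec
                break
            vec ^= basis[lead]       # eliminate the leading bit
    return len(basis)

def erasures_correctable(columns):
    """True if the erased bits' columns are linearly independent."""
    return gf2_rank(columns) == len(columns)

N_SIDE = 3                           # 3 x 3 2D-parity code, groups r0..r2, c0..c2

def info_col(r, c):
    return (1 << r) | (1 << (N_SIDE + c))

def check_col(j):
    return 1 << j

diagonal = [info_col(0, 0), info_col(1, 1), info_col(2, 2)]
bit_with_checks = [info_col(0, 0), check_col(0), check_col(N_SIDE + 0)]
```

Both erasure patterns have three bits, one more than the DED of 2. The diagonal pattern is still correctable because its columns are independent, while an information bit erased together with both of its check bits is not, which is the kind of pattern that separates the average from the deterministic capability.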
The mean-time-to-failure (MTTF) of a RAID derived from an ECC was determined in [14]. [...] In [14], a number of useful ECC's like the 2D-parity, 3D-parity, full-2 and full-3 codes were identified that yield very high MTTF for RAID designs based on these codes, and which have attractive values of metrics like check-bit overhead, group degree and group size. These metrics are also relevant to FT multiprocessor design as discussed earlier. In view of this, and Theorems 7 and 8 and Corollary 1, we choose to use these codes to design FT multiprocessors. Later we show that the DFT, layout area overhead and reliability of these FT multiprocessors are extremely favorable. Next, we briefly describe the above ECC's.
Some Useful ECC's
The 2D-parity code was described earlier; its parity matrix is shown in Fig. 5(c). The 3D-parity code can be similarly formed by arranging the information bits in a 3-dimensional array and associating a parity equation with each dimensional group. Its check-bit overhead is $3/N^{1/3}$, its group size is $N^{1/3}$, it is a degree-3 code and its DED is 3. The full-2 code is a degree-2 2-DED code. It can be defined in terms of its parity matrix $H_{full2} = [P_{full2} | I]$. $P_{full2}$ consists of all possible distinct columns with exactly two 1's. The number N of information bits in terms of g is given by $N = g(g-1)/2$, i.e., its check-bit overhead is approximately $\sqrt{2/N}$, and its group size is $g - 1 \approx \sqrt{2N} - 1$. The full-3 code is a degree-3 3-DED code, and $P_{full3}$ consists of all possible distinct columns with exactly three 1's. For this code, $N = g(g-1)(g-2)/6$, thus its check-bit overhead is approximately $(6N)^{1/3}/N = 1.8/N^{2/3}$, and its group size is $3N/g = (g-1)(g-2)/2 \approx (6N)^{2/3}/2$. The full-2 and full-3 codes have the least possible check-bit overheads of any degree-2 and degree-3 codes, respectively. The 2-D and 3-D parity codes have better reliabilities than the full-2 and full-3 codes, respectively, though obviously higher check-bit overheads. However, as we will see next, wiring overheads for FT multiprocessors based on full-2 and full-3 codes are a little higher than those based on the 2D- and 3D-parity codes, respectively.
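The counts for the full-2 and full-3 codes follow directly from enumerating the columns of P. The sketch below identifies each information bit with the subset of groups it belongs to and reports N, the check-bit overhead g/N and the group size (counting information bits only); the function names and the small value g = 6 are illustrative assumptions.

```python
from itertools import combinations
from math import comb

def full_code_groups(g, degree):
    """Information bits of the full-`degree` code, one per `degree`-subset of g groups."""
    info_bits = list(combinations(range(g), degree))
    groups = {j: [b for b in info_bits if j in b] for j in range(g)}
    return info_bits, groups

def metrics(g, degree):
    info_bits, groups = full_code_groups(g, degree)
    n_info = len(info_bits)
    return {
        "N": n_info,
        "overhead": g / n_info,                          # check bits per information bit
        "group_size": max(len(members) for members in groups.values()),
        "degree": degree,                                # groups per information bit
    }

full2 = metrics(6, 2)
full3 = metrics(6, 3)
```

With g = 6 the full-2 code has N = 15 and group size g - 1 = 5, and the full-3 code has N = 20 and group size (g - 1)(g - 2)/2 = 10 = 3N/g, matching the closed-form expressions.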
Wire-Efficient Layouts
Layout Scheme
In this section, given a regular 2-D square or 3-D cubic layout² of a non-FT multiprocessor of an arbitrary topology, we will obtain a similar layout of an ECC-based FT multiprocessor by embedding processor groupings defined by the ECC in the given non-FT processor grid. Embeddings with minimum edge congestion (the number of recon d-links that get mapped onto a single grid line, i.e., the space between adjacent processor rows/columns/heights) will be used to minimize the redundant wiring overhead, and thus obtain efficient FT layouts in a topology-independent manner. Let $0, 1, \ldots, g-1$ be the g processor groups in the FT multiprocessor and let $d_g$ denote its group degree (the number of groups a processor belongs to). Any processor u is associated with a $d_g$-digit label $(b_0, b_1, \ldots, b_{d_g-2}, b_{d_g-1})$ which specifies its location in a $d_g$-dimensional grid; for different codes these labels will be determined differently. Recall from Sec. 3 that every processor uses $d_g$ recon-switches connected in series and labeled $0, 1, \ldots, d_g - 1$ (see, e.g., Fig. 4(a)).
For the 2D- and 3D-parity codes, the processor labels are the subsets of grid points given by the square [...] These processors are linked from the smallest-labeled processor (the head processor) to the largest-labeled processor (the tail processor) via their 0th recon-switch. In a single spare per group design, we connect a spare to the tail processor in the group, and in the double-spare case, we connect in addition another spare to the head processor via their 0th recon-switches. Similarly, column and height groups are formed, and have spares connected to them. Note that this layout has an edge congestion of one (see Fig. 6).
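Next, for the full-2 and full-3 codes, the label of a processor lists the groups it belongs to in decreasing order, i.e., $(b_0, b_1)$ with $g > b_0 > b_1 \ge 0$ for the full-2 code (and $(b_0, b_1, b_2)$ with $g > b_0 > b_1 > b_2 \ge 0$ for the full-3 code). For the full-2 code, a square layout can be obtained by modifying the triangular layout of Fig. 7(a) as diagrammatically illustrated in Fig. 7(b). Similarly, the triangular pyramidal layout for the full-3 code can be modified to a cubic layout.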
For any group j, where $0 \le j < g$, let $j_{max}$ and $j_{min}$ be the largest and smallest digit positions, respectively, over all processors in the group, at which the group number j appears in a processor's label (see Fig. 7(a)). The jth processor group is formed [...] Such an ordering is useful in minimizing reconfiguration link length between adjacent processors in a group; recall that a processor's label gives its location in a $d_g$-dimensional grid, and thus adjacent processors in this ordering will be spatially adjacent. Next, the last processor (in the above ordering) with $b_k = j$ is linked to the first processor with $b_{k+1} = j$, for $j_{min} \le k < j_{max}$, via the kth recon-switch of the former and the (k + 1)th recon-switch of the latter.
² Actually, as will become clear later, our layout schemes are equally effective when the given non-FT layout is rectangular or cuboidal.
This gives us a total ordering of all the processors in group j with the first processor with $b_{j_{min}} = j$ at the head and the last processor with $b_{j_{max}} = j$ at the tail of the group. We connect two spares to the tail and head processors, as in the 2D- and 3D-parity cases, via the $j_{max}$th and $j_{min}$th recon-switches, respectively. Note again that the layouts obtained have a congestion of one.
Area/Volume Overhead
We now give the fractional area (volume) overhead [...] units. The total area or volume A of a layout for a given network topology is proportional to the square or 3/2 power, respectively, of its bisection width, which is defined as the minimum number of wires that must be cut to separate the network into two equal halves. The bisection widths for mesh and hypercube architectures [...]
Theorem 9 The fractional area overhead of a FT design based on the 2D-parity code compared to a non-FT design [...]
From the above theorems, we obtain the following results for a mesh architecture (of any size): $f_o$(square 2D-parity or triangular full-2 code layout) = 44% and $f_o$(square full-2 code layout) = 65%; for a 1024-processor hypercube the corresponding overheads are 6% and 9%. The overheads for the 3-D layouts corresponding to the 3D-parity and full-3 codes are somewhat higher. For the mesh architecture, the overheads corresponding to the 2D-parity and full-2 codes are reasonable considering the improvement in reliability of the system (as will be seen in Sec. 8), while for the hypercube architecture the corresponding overheads are quite low.
Reconfiguration Algorithm
The reconfiguration problem in the node-covering method is equivalent to finding the maximum flow in a flow graph G. The flow graph corresponding to the 4 x 4 2D-parity code based FT array of Fig. 8(a) is shown in Fig. 8(b) and in the general case is described as follows. For each recon-switch, there is a vertex in G, and between recon-switches in the same or different processors that are linked, there are two oppositely directed edges to model reconfiguration in either direction between the corresponding switches. Furthermore, there is an edge from the topmost (($d_g - 1$)th) to the bottommost (0th) recon-switch vertex of a processor. The 0th recon-switch vertex of a faulty processor is a unit-surplus source denoted s. Finally, there is an edge from each vertex corresponding to recon-switches linked to spares, to a sink vertex t. All edges have unit capacity. Note that since the edges between recon-switch vertices of a processor have unit capacity, the subgraph corresponding to a processor models the contention for v-links when reconfiguration paths bend as pointed out in Sec. 3.
Any flow path from a source to sink in G obtained by a maxflow algorithm (e.g., in Fig. 8(b)) has a direct correspondence with a reconfiguration path from a fault to a spare in the actual hardware (e.g., in Fig. 8(a)) [...]
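A minimal sketch of the flow formulation for an n x n 2D-parity array is given below. It rests on several assumptions that are not confirmed by the text: processors are labeled (x, y), column group x is chained through switch 1 and row group y through switch 0, each group has one spare at the tail of its chain, and a super-source "S" feeds the topmost switch vertex of every faulty processor (the text states the source at the 0th switch vertex; attaching it at the top is a simplification so that, with a single top-to-bottom internal edge, a fault can enter either of its groups). The breadth-first augmenting-path search is a generic maxflow method chosen for brevity.

```python
from collections import defaultdict, deque

def build_flow_graph(n, faults, dead_col_spares=(), dead_row_spares=()):
    """Unit-capacity flow graph; vertices are (x, y, switch) plus 'S' and 'T'."""
    cap = defaultdict(lambda: defaultdict(int))

    def edge(u, v):
        cap[u][v] += 1
        cap[v][u] += 0                      # make sure the residual arc exists

    for x in range(n):
        for y in range(n):
            edge((x, y, 1), (x, y, 0))      # v-link: bends only from switch 1 to 0
            if y + 1 < n:                   # column group x, chained via switch 1
                edge((x, y, 1), (x, y + 1, 1))
                edge((x, y + 1, 1), (x, y, 1))
            if x + 1 < n:                   # row group y, chained via switch 0
                edge((x, y, 0), (x + 1, y, 0))
                edge((x + 1, y, 0), (x, y, 0))
    for x, y in faults:
        edge("S", (x, y, 1))                # assumed source placement (see lead-in)
    for j in range(n):
        if j not in dead_col_spares:
            edge((j, n - 1, 1), "T")        # working spare at the tail of column j
        if j not in dead_row_spares:
            edge((n - 1, j, 0), "T")        # working spare at the tail of row j
    return cap

def max_flow(cap, s, t):
    """Number of edge-disjoint s-t paths, found by repeated BFS augmentation."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:        # push one unit along the path found
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1

faults = [(1, 0), (1, 1), (2, 1), (2, 2), (3, 3)]
graph = build_flow_graph(4, faults, dead_col_spares={3}, dead_row_spares={3})
reconfigured = max_flow(graph, "S", "T")
```

Under these assumptions all five faults of the Sec. 3 example receive a flow path even though (3,3) has no matched spare: its path runs along column 3 to (3,2) and then bends onto row 2, which mirrors the bend described there.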
Deterministic Fault Tolerance (DFT)
In this section, we give the corner and interior DFT capabilities for FT multiprocessors based on the various codes; these are the maximum number of faults guaranteed reconfigurable at corner and interior locations in the layout, respectively.
Theorem 13 The corner DFT capability [...]
[...] or smaller ($k_{avg}$ = 92). A FT system derived from the 3D-parity code had 100% reconfigurability for all fault sizes 363 (the total number of groups) and less ($k_{avg}$ = 363); using two spares per group in this system is not practical (even one spare per group is quite expensive), hence that case was not simulated. A full-3 code based FT system had 100% reconfigurability in both the single and double spare cases (using 20 and 40 spares, respectively) for all fault sizes less than or equal to the number of spares ($k_{avg}$ = 20 and 40, respectively). Note that the average fault tolerance $k_{avg}$ for the different ECC-based FT multiprocessors is approximately equal to the number of spares s, as desired.
As we saw in Sec. 5.2, FT systems derived from both the 2D-parity and full-2 codes have reasonably low area overheads and are therefore cost-effective to build. Moreover, their DFT capabilities as derived in Sec. 7 and average-case fault tolerance as noted above are quite high. The full-2 code based system has the advantage that it has lower spare processor overhead and also, as seen above, is able to more effectively utilize its spares. The 2D-parity code based system, on the other hand, has smaller area overhead (see Sec. 5.2). Therefore the various codes considered here offer different tradeoffs between wiring area/volume overheads, reconfiguration capability (or reliability/MTTF) and spare processor overhead.
Conclusions
Our aim was to design FT multiprocessors of any topology with low spare-link and switch overheads, but with very high average fault tolerance. We developed a methodology to effectively use linear ECC's to design such FT multiprocessors by partitioning them into intersecting FT processor groups. We showed that the deterministic error detectability and the average erasure correctability of linear codes are lower bounds for the deterministic and average fault tolerance, respectively, of the FT multiprocessor derived from them in this manner. These lower-bound results provide us with the rationale for using ECC's with high average erasure correctability, but low group degree, group size and check-bit overhead to design FT multiprocessors; the latter metrics affect its hardware overhead and availability. Such ECC's were identified in [14] to be the 2D-parity, 3D-parity, full-2 and full-3 codes, and were used to design redundant disk arrays or RAID's. We used these ECC's to design FT multiprocessors and developed efficient layout strategies for them. Our results show that indeed the resulting designs have very high average fault tolerance ($k_{avg} \approx s$) or reliability with very small (6-9%) to reasonable (44-65%) area/volume overheads. Another attractive feature of our designs is that the processor degree d in the FT system is the same as that in the non-FT system. We also showed that the deterministic fault tolerances of these designs are quite high. We believe that the methodology presented here offers a very cost-effective technique to build highly-available and/or highly-reliable multiprocessor systems of any topology that can be used in environments ranging from mission- or life-critical systems to business transaction processing.
