Abstract: An approach to the design of reconfigurable tree architectures is presented in which spare processors are allocated at the leaves. The approach is unique in that spares are associated with subtrees and sharing of spares between these subtrees can occur. The Subtree Oriented Fault Tolerance (SOFT) approach is more reliable than previous approaches capable of tolerating link and switch failures, for both single-chip and multichip tree implementations, while reducing redundancy in terms of both spare processors and links. VLSI layout is O(n) for binary trees, and the approach is directly extensible to N-ary trees and to fault tolerance through performance degradation.
require O(n log n) area [7]. Hassan and Agarwal [7] recently presented a modular technique which allocates one spare to multilevel groups of processors. This scheme is conceptually similar to the RAE approach in that it dedicates every spare to one specific group of processors, but it has the advantage of O(n) layout and modularity for multichip architectures.
Another reconfiguration scheme applicable to tree architectures has been proposed by Rosenberg [8]. The approach requires a collinear layout in which each node must have access to a log n bus. Redundancy in terms of switching transistors is O(log n) for each node. The switching structure provides for efficient utilization of spare processors. However, fault tolerance for the communication lines and the switching transistors is not considered.
One of the important objectives in designing for reconfiguration is efficient utilization of spares. If the architecture has k spare processors, then the objective is to be able to tolerate any combination of k processor failures through reconfiguration. This should be accomplished with a reasonable increase in interconnect, manageable layout complexity for large numbers of processors, and a bounded number of pins per chip. In addition, failures in interconnect and switching structures should also be tolerable through reconfiguration.
A strategy for satisfying these objectives for binary tree architectures is presented in this paper. The approach places spare processors at the leaves of the tree and provides considerable flexibility in reconfiguration through sharing of spares between adjacent subtrees. This strategy, referred to as Subtree Oriented Fault Tolerance (SOFT), uses a virtual displacement technique to reconfigure a spare processor into the tree. The capability of sharing spare processors between subtrees gives the SOFT approach significantly higher reliability than previous techniques that allow for switch and link failures, where reliability is the probability that the tree is functional at time t, given that it was fault free at time 0. In contrast to other proposals, SOFT is able to tolerate link and switch failures while reducing the number of redundant links between processors. For binary trees, the approach is shown to yield O(n) layout. The architecture can be partitioned onto separate chips for arbitrarily large trees, while providing fault tolerance for both on-chip and off-chip connections. Fault tolerance through performance degradation is also possible with a SOFT design, as is application to N-ary trees.
In Section II of this paper, the SOFT architecture is presented for binary trees. Considerations for implementing a SOFT binary tree are discussed, including the placement of spare processors and communication links. Section III provides a formal analysis of reconfigurability in SOFT binary trees in the presence of processor, switch, and link failures. In Section IV, comparisons between the reliability of SOFT and past reconfigurable designs are presented. Finally, Section V extends the SOFT concept to N-ary trees.
II. THE SOFT DESIGN FOR BINARY TREES

The SOFT approach to reconfigurable tree architectures employs spare processors at the leaves of the tree and additional links between processors to maintain a complete tree topology in the presence of multiple faulty processors and links. Faulty processors and links are bypassed by utilizing a simple switching structure, which allows a logical displacement of processors and links to occur from the failure to a spare at a leaf. At levels high in the tree, failure of a node results in bypassing the node, thereby allowing communication directly between the faulty node's father and one of its sons. Thus, the faulty node's son assumes the tasks allocated to its faulty father. Since the son is performing the tasks of its father, another processor must be found to assume the son's tasks. In a similar fashion, one of the son's sons assumes its responsibilities. This "logical displacement" continues until a spare is configured in at the leaves. A detailed discussion of SOFT reconfiguration is presented in Section III.
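The logical displacement chain described above can be illustrated with a minimal sketch. It assumes the heap labeling introduced in Section II-A (root 1, sons 2n and 2n + 1) and assumes the target leaf has already been chosen by the reconfiguration rules of Section III; the helper names `displacement_path` and `displace` are hypothetical and model only how tasks shift along the path, not the switch settings.

```python
# Hypothetical sketch of SOFT "logical displacement" along one path.
# Assumes heap labels (root 1, sons 2n and 2n + 1) and a target leaf that
# the reconfiguration rules have already selected; only the task shift
# is modeled here, not the switches.


def level(n: int) -> int:
    """Level of node n (the root, labeled 1, is on level 0)."""
    return n.bit_length() - 1


def displacement_path(faulty: int, target_leaf: int) -> list[int]:
    """Nodes from the faulty node down to the target leaf, inclusive."""
    depth = level(target_leaf) - level(faulty)
    if depth < 0 or target_leaf >> depth != faulty:
        raise ValueError("target leaf is not in the faulty node's subtree")
    # Shifting a label right by d bits gives its ancestor d levels up.
    return [target_leaf >> d for d in range(depth, -1, -1)]


def displace(faulty: int, target_leaf: int) -> dict[int, object]:
    """Map each logical task on the path to the physical host that runs it.

    Each son on the path takes over its father's task; the leaf's own
    task is finally assumed by the spare adjacent to that leaf.
    """
    path = displacement_path(faulty, target_leaf)
    hosts: list[object] = path[1:] + ["spare"]
    return dict(zip(path, hosts))


# Four-level tree (leaves 8..15): node 2 fails, displacement routed to leaf 11.
print(displace(2, 11))   # {2: 5, 5: 11, 11: 'spare'}
```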
A. Terminology for Binary Trees
All trees are said to have i + 1 levels, with the root at level 0 and the leaves on level i. The term "upper levels" refers to levels 0 through i - 1. The root is labeled 1, the left son of any node (processor) n is labeled 2n, and the right son is labeled 2n + 1. Two nodes are adjacent if they are connected by a nonredundant or redundant communication link. The father of a node n on level k, $f_n$, is the adjacent node on level k - 1.
The grandfather of n, $g_n$, is the father of the father of n.
Similarly, the son of a node n is $son_n$. $f_l$ and $son_l$ represent the father and son nodes of link l, respectively. The brother of a node n, $b_n$, is the single other node having the same $f_n$. $b_l$ refers to either node connected to a redundant link l. A subtree is defined as a tree with < i levels which is contained in the tree architecture such that the leaves of the subtree correspond to leaves in the complete tree. The leftmost (rightmost) descendent of a subtree is the node found by following only left (right) descendents from the root of the subtree. The cousin of a node n, $cous_n$, is the leftmost (rightmost) descendent of the root on the same level if n is the rightmost (leftmost) descendent of the root. For all other n, $cous_n$ is n - 1 (n + 1) if n is a left (right) son of its father. The ancestor of a node n on a level q, $A_n^q$, is the single node on level q which contains n in its subtree.
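Because every relation above is defined on the heap labels, each one reduces to simple integer arithmetic. The following sketch is illustrative only; the function names are hypothetical and are not taken from the paper.

```python
# Hypothetical helpers for the heap labeling of Section II-A
# (root 1, left son 2n, right son 2n + 1). Names are illustrative only.


def level(n: int) -> int:
    return n.bit_length() - 1          # root on level 0


def father(n: int) -> int:
    return n // 2                      # f_n


def grandfather(n: int) -> int:
    return n // 4                      # g_n, the father of f_n


def sons(n: int) -> tuple[int, int]:
    return 2 * n, 2 * n + 1


def brother(n: int) -> int:
    return n ^ 1                       # b_n: flip the last bit (n >= 2)


def ancestor(n: int, q: int) -> int:
    """A_n on level q: the unique node on level q whose subtree contains n."""
    if not 0 <= q <= level(n):
        raise ValueError("q must lie between 0 and level(n)")
    return n >> (level(n) - q)


def cousin(n: int) -> int:
    """cous_n as defined in Section II-A, wrapping around at level ends."""
    lvl = level(n)
    if lvl == 0:
        raise ValueError("the root has no cousin")
    leftmost, rightmost = 1 << lvl, (1 << (lvl + 1)) - 1
    if n == rightmost:
        return leftmost
    if n == leftmost:
        return rightmost
    return n - 1 if n % 2 == 0 else n + 1   # left son: n - 1, right son: n + 1


print([cousin(n) for n in range(4, 8)])   # [7, 6, 5, 4]
```

The wrap-around cases are checked first because, for the leftmost node on a level, n - 1 would otherwise fall onto the level above.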
B. Allocation of Redundant Nodes and Links
The number of spare processors supported by the SOFT architecture is $2^c$, where c is an integer, $0 \le c \le i - 1$. 1, where x is the leftmost leaf of the root and $0 \le k \le 2^c - 1$, is referred to as a Spare Subtree, or SST. Each SST has an associated spare, which is adjacent to its rightmost leaf. The spare adjacent to an SST's leftmost leaf is referred to as its nonassociated spare. In contrast to X-tree or Hypertree structures [9]-[11], the SOFT topology is not a half-ring structure in which each level contains cousin connections instead of the $n$-to-$b_n$ connections utilized by SOFT. [1]-[3]. It is also assumed that I/O through the leaves is not required. This is not a necessary restriction, but it simplifies presentation of the architecture. Most tree architectures also do not use I/O through the leaves [1]-[3]. In fact, the classical H-tree layout cannot accommodate I/O through the leaves for large trees.
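The SST definition above is partly lost in this copy. As a minimal sketch, assuming the $2^c$ SSTs are the subtrees rooted on level c, so that SST k owns $2^{i-c}$ consecutive leaves starting at $x + k \cdot 2^{i-c}$, the partition can be computed as follows. The function name and this reading of the definition are assumptions.

```python
# Hypothetical sketch, assuming the 2**c SSTs are the subtrees rooted on
# level c, so that each SST owns 2**(i - c) consecutive leaves.


def sst_partition(i: int, c: int) -> list[dict[str, object]]:
    if not 0 <= c <= i - 1:
        raise ValueError("c must satisfy 0 <= c <= i - 1")
    x = 1 << i                      # leftmost leaf of the whole tree
    width = 1 << (i - c)            # leaves per SST
    partition = []
    for k in range(1 << c):
        first = x + k * width
        last = first + width - 1    # the associated spare sits next to this leaf
        partition.append({"k": k, "root": (1 << c) + k, "leaves": (first, last)})
    return partition


for sst in sst_partition(3, 1):
    print(sst)
# {'k': 0, 'root': 2, 'leaves': (8, 11)}
# {'k': 1, 'root': 3, 'leaves': (12, 15)}
```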
1) Switching Scheme: The virtual processor displacement concept of SOFT reconfiguration is implemented with the switch structures of Fig. 2. Because the switch structures handle reconfiguration, the design of the processing elements is independent of the reconfiguration scheme, and each processor can keep a standard three-port input/output design.
2) Multichip Trees: If an entire tree cannot fit onto a single chip or wafer, then the tree must be partitioned for chip allocation. The major partitioning consideration is imposed by pin limitations. From Fig. 1, it can be seen that at most one additional link per chip is required for chips containing no leaf processors, and that chips containing leaf processors require fewer pins per chip.
3) VLSI Layout of SOFT Trees: To lay out a SOFT binary tree efficiently in VLSI, the general architecture is adjusted so that a variation of the classical O(n) H-tree layout can be employed. The VLSI layout for a tree with i ≤ 4 follows the layout of Fig. 3(a). The location of spares depends upon the percentage of spares allocated to the tree. As in Algorithm 1, the spares are located on the nonbrother redundant links, between the leaves. For trees with i > 4, the layout algorithm (Algorithm 3) results in O(n) area. The result of Algorithm 3 with i = 7 is depicted in Fig. 3(b). The ellipses represent the five-level subtrees constructed in the first "for loop" of the algorithm.

Theorem 1: The SOFT layout, as described in Algorithm 3, for a binary tree of n leaves, has O(n) area.
Proof: It is well known that the area of an H-tree is O(n) [12]. The layout can be thought of as having O(√n) rows and columns. The layout produced by Algorithm 3 (as in Fig. 3) has at most one redundant link (additional row or column) parallel to each row or column of the O(n) H-tree layout. Since each spare can be thought of as lying in a redundant link, the edge of the square corresponding to the layout of a SOFT tree is at most O(k√n), where k is a constant which, in the worst case, is between 2 and 3. Since this is still O(√n), the area of the SOFT layout is O(n). □

The modification to the general SOFT approach necessary for VLSI layout has the following implications.
1) While the maximum number of spares is ≈25 percent, the minimum is increased from an arbitrarily small percentage (1 per tree) to ≈3 percent (1 per five-level subtree).
2) Since the five-level subtrees do not share any connections among spares, SST's cannot "borrow" spares from neighbors not within that subtree. This does not, however, affect the reliability analysis presented in Section IV, which shows significant reliability enhancement is gained with the SOFT strategy. 
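The two percentages in item 1) follow from counting $2^c$ spares against the $2^{i+1} - 1$ nonredundant nodes. A small sketch of that arithmetic, with a hypothetical function name:

```python
# Sketch: spares as a fraction of the 2**(i+1) - 1 nonredundant nodes,
# using the 2**c spares of Section II-B. The function name is hypothetical.


def spare_fraction(i: int, c: int) -> float:
    nonredundant = (1 << (i + 1)) - 1
    return (1 << c) / nonredundant


print(f"{spare_fraction(4, 0):.3f}")     # 0.032: one spare per five-level subtree
print(f"{spare_fraction(12, 11):.3f}")   # 0.250: maximum, c = i - 1
```

For large trees the minimum tends to 1/32 and the maximum to 1/4, which matches the ≈3 percent and ≈25 percent figures quoted above.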
A. Tolerance of Processor Failures
Section III-A-1 presents a concise description of SOFT reconfiguration. Section III-A-2 presents some basic properties of SOFT reconfiguration. Based on these properties, necessary and sufficient conditions for reconfigurability are derived in Section III-A-3. [13]. A concise summary of those algorithms is presented below.
Failure of spares: Bypass the spare. The result is equivalent to taking the two SST's adjacent to the faulty spare and combining them into one larger SST.

Failure of SST nodes: If there are no faults within the SST and the associated spare is not configured in as the son of its nonassociated SST, then reconfiguration proceeds toward the associated spare; otherwise, reconfiguration proceeds toward the nonassociated spare.
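The two local rules above can be sketched as follows, under stated assumptions: SSTs are kept left to right as (first_leaf, last_leaf) ranges, spare k lies to the right of SST k's last leaf, and "no faults within the SST" is read as no previously reconfigured faults. All names (`bypass_faulty_spare`, `SSTState`, `direction_for_new_fault`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical model of the two local rules. SSTs are kept left to right as
# (first_leaf, last_leaf) ranges; spare k lies right of SST k's last leaf.


def bypass_faulty_spare(ssts: list[tuple[int, int]], k: int) -> list[tuple[int, int]]:
    """Failure of spare k: bypass it, merging SST k and SST k + 1."""
    if not 0 <= k < len(ssts) - 1:
        raise ValueError("spare k must lie between two SSTs")
    merged = (ssts[k][0], ssts[k + 1][1])
    return ssts[:k] + [merged] + ssts[k + 2:]


@dataclass
class SSTState:
    has_faults: bool = False        # faults already reconfigured within this SST
    assoc_spare_lent: bool = False  # associated spare configured in as a son of
                                    # its nonassociated SST (the right neighbor)


def direction_for_new_fault(sst: SSTState) -> str:
    """Failure of an SST node: choose the spare toward which displacement goes."""
    if not sst.has_faults and not sst.assoc_spare_lent:
        return "associated"     # toward the spare right of the rightmost leaf
    return "nonassociated"      # toward the spare left of the leftmost leaf


print(bypass_faulty_spare([(8, 9), (10, 11), (12, 13), (14, 15)], 1))
# [(8, 9), (10, 13), (14, 15)]
```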
Failure of a node above the SST level: Reconfiguration proceeds from an upper level to an SST. Once the displacement has reached the SST level, it continues as previously described. Three conditions determine the SST toward which reconfiguration proceeds: a) the root of the SST is a descendant of the faulty node; b) all ancestors of the root of the SST on levels between and including the root of the SST and the level below the faulty node are fault free; c) the associated spare of the SST is not being used. If there exists an SST satisfying a), b), and c), then reconfiguration proceeds toward the leftmost such SST; otherwise, reconfiguration proceeds toward the leftmost SST satisfying a) and b). If there is no SST for which a) and b) are true, then the tree is not reconfigurable.

3) Analysis of Reconfigurability: In previous approaches, unreconfigurable multiple failures correspond to more than one fault in a group of nodes which have been allocated a single spare [6], [7]. Unreconfigurable multiple failures in the SOFT approach, regardless of the number and location of spares, correspond to faults which force double displacement. For example, consider the failure of a node n. Displacement must occur through a son of n, $son_n$. Consequently, the presence of a fault subset such as the one depicted in Fig. 5(a) is not reconfigurable. If
where good(n) is 1 only if node n is fault free. Theorem 2 describes reconfigurability at the SST level. Corollaries 1-3 provide necessary and sufficient conditions for reconfiguration of a failure anywhere in a SOFT tree. Theorem 3 describes the class of fault subsets which are reconfigurable in SOFT architectures.
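The target-selection rule for a failure above the SST level (conditions a), b), and c) in Section III-A-1) can be sketched as follows. This assumes the SST roots are the nodes on level c (labels $2^c$ through $2^{c+1} - 1$) and that `spare_in_use[k]` records whether SST k's associated spare is in use; both are modeling assumptions, and the function name is hypothetical.

```python
from typing import Optional

# Hypothetical sketch of the rule for a failure above the SST level.
# Assumes SST roots are the nodes on level c (heap labels 2**c .. 2**(c+1) - 1)
# and spare_in_use[k] tells whether SST k's associated spare is in use.


def level(n: int) -> int:
    return n.bit_length() - 1


def choose_target_sst(faulty: int, c: int, faulty_nodes: set[int],
                      spare_in_use: list[bool]) -> Optional[int]:
    lf = level(faulty)
    if lf >= c:
        raise ValueError("faulty node must lie above the SST level")
    meets_ab = []
    for root in range(1 << c, 1 << (c + 1)):          # leftmost SST first
        if root >> (c - lf) != faulty:                 # a) descendant of faulty node
            continue
        # b) ancestors of the SST root from level lf + 1 down to level c
        #    (the SST root itself included) are all fault free
        path = [root >> (c - q) for q in range(lf + 1, c + 1)]
        if any(node in faulty_nodes for node in path):
            continue
        meets_ab.append(root)
    for root in meets_ab:
        if not spare_in_use[root - (1 << c)]:          # c) associated spare unused
            return root
    return meets_ab[0] if meets_ab else None           # None: not reconfigurable


print(choose_target_sst(1, 2, {1, 2}, [False] * 4))    # 6
```

In the usage line, the root fails while node 2 is already faulty, so only the SSTs rooted at 6 and 7 satisfy b), and the leftmost one is chosen.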
Theorem 2: The failure of any node n within an SST whose root is not displaced is reconfigurable if and only if SS(n) is 1.

Proof: A detailed proof can be found in [13]. An outline of the proof follows. From Definition 3, a node in SST S has SS of 1 if one of three conditions holds: a) its associated spare is available; b) there exists an SST whose spare is available such that all SST's left of S and right of the available spare have either faulty spares or spares configured in to their associated SST's; or c) the associated spare of S has failed and its nonassociated SST has SS of 1. Any one of these three conditions is sufficient for reconfiguration. If none of them holds, reconfiguration is not possible. □

Corollary 1: Failure of node n, above the SST level, is reconfigurable if and only if the node is not displaced and SS(n) is 1, or the node is displaced and s(n) is 1.

Proof: A detailed proof is contained in [13]. The proof distinguishes the cases of n not displaced and n displaced. If

Fig. 2(a) only prevents displacement to the left if the PE has failed (displacement to the right is possible), or displacement to the right if the PE has not failed (displacement to the left is still possible). In Fig. 2(b)

Proof: By symmetry, the only switches which need to be considered are switches 1-5 in Fig. 2(a) and (b) and the switches of Fig. 2(c). Since the node is PFF, SFF, and LFF prior to the switch failure, the node can support any reconfiguration for which either the faulty switch is not necessary (stuck-open fault), or the switch should be closed, or the closing of the switch has no effect. The fact that the node is replaceable indicates that the failure of this node is tolerable. As a result, if the node is displaced, it can be returned to the undisplaced state. This can be done at upper levels by shifting the displacement into another subtree. At the leaves, the reconfiguration is shiftable since the failure of this leaf is reconfigurable.
Failure of the switches of Fig. 2(a) and (b) In spare switching [Fig. 2(c) [7].

This is demonstrated by first establishing an upper bound on the reliability of the previous approaches, independent of the actual implementation (i.e., their optimal reliability), and comparing it to a lower bound for SOFT trees. Exact reliability calculations for some specific SOFT implementations are also derived and compared.
A. Reliability of Other Approaches
It is assumed that the i + 1 level tree is allocated k spares. A Modular Sparing Approach (MSA) to fault tolerance in binary trees is any approach to reconfigurable design which partitions a tree into k groups of processors and allocates each group one spare to be used exclusively by that group. The work of Raghavendra et al. (RAE) and of Hassan and Agarwal (M-trees) can both be classified as MSA. Significantly, SOFT trees are not included in this category. It should be noted that the strategy of Rosenberg [8] is not MSA; however, Rosenberg's strategy does not allow for interconnect or switch failure. With an MSA, each module must be functioning in order for the tree to be functioning. Thus, the reliability can be expressed as the product of the reliabilities of all of the modules. In general this is

$$R_{SYS} = \prod_{m=1}^{k} R_m \qquad (2)$$

where $R_m$ is the reliability of the mth module. Although some MSA schemes may tolerate interconnect failure, the following reliability analysis considers only processor failures, for the sake of simplicity. Since the spare can be configured into the module in case of any single failure, the reliability of each module can be expressed as

$$R_{module} = R^{q} + qR^{q-1}(1-R)$$
where R is the reliability of each individual processor and q is the number of processors in the module, including the spare. The reliability of all processors is assumed to be equal and exponentially distributed (i.e., $R = e^{-\lambda t}$). Although this assumption is not accurate for many environments, it does provide an initial point of comparison and is a common assumption in reliability analysis [6], [7], [14]. For simplicity, the failure rate of spares, μ, is assumed to be equal to the failure rate of nonredundant processors, λ. In the following discussion, it is assumed that the trees can be divided evenly into modules of size q, although the theorem does not rely on this assumption.

Theorem 6: Optimal reliability in an MSA tree corresponds to when the tree is divided into modules of equal size: $q = (2^{i+1} - 1 + k)/k$.
Proof: Consider a tree with equal module size. The following inequality indicates that every time a single node is moved from a module of size q to another module of size q, thereby creating modules of size q + 1 and q -1, RSYS decreases.
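$$\left[R^{q} + qR^{q-1}(1-R)\right]^{2} > \left[R^{q+1} + (q+1)R^{q}(1-R)\right]\left[R^{q-1} + (q-1)R^{q-2}(1-R)\right]$$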
Additionally, the following inequality indicates that moving nodes from modules of smaller size into modules of larger size will decrease reliability.
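$$\left[R^{q+c} + (q+c)R^{q+c-1}(1-R)\right]\left[R^{q} + qR^{q-1}(1-R)\right] > \left[R^{q+c+k} + (q+c+k)R^{q+c+k-1}(1-R)\right]\left[R^{q-k} + (q-k)R^{q-k-1}(1-R)\right]$$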
for any $0 < c$ and $0 < k < q$. □

It has been shown that the reliability versus spares curve of M-trees is greater than that of RAE [7]. The reason for this is that the RAE scheme allocates an entire spare to the root, whereas M-trees can "spread" the spare out into level 1 (or more) nodes. At lower levels, however, the number of nonredundant nodes per spare is the same. Consequently, as i and/or k increases, the reliabilities of both schemes converge. In contrast, the next section demonstrates that the SOFT approach always results in superior reliability over MSA designs.
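Eq. (2), the module reliability formula, and Theorem 6 can be checked numerically on a small case. The sketch below enumerates every way of splitting the 20 nodes of a four-level tree with five spares into five modules and confirms that the equal split of size $q = (2^{i+1} - 1 + k)/k = 4$ maximizes $R_{SYS}$. The minimum module size of 2 (one spare plus at least one processor) and the helper names are assumptions.

```python
import itertools
import math

# Sketch of Eq. (2) and the MSA module formula, plus a brute-force check of
# Theorem 6 on one small case. The minimum module size of 2 (one spare plus
# at least one processor) is an assumption of this sketch.


def module_reliability(R: float, q: int) -> float:
    """q nodes including one spare: the module survives 0 or 1 failures."""
    return R**q + q * R**(q - 1) * (1 - R)


def msa_reliability(R: float, sizes: tuple[int, ...]) -> float:
    """Eq. (2): every module must survive, so reliabilities multiply."""
    return math.prod(module_reliability(R, q) for q in sizes)


def compositions(total: int, parts: int, min_size: int = 2):
    """All ordered ways to split `total` nodes into `parts` modules."""
    slack = total - parts * (min_size - 1)
    for cuts in itertools.combinations(range(1, slack), parts - 1):
        bounds = (0, *cuts, slack)
        yield tuple(b - a + min_size - 1 for a, b in zip(bounds, bounds[1:]))


i, k = 3, 5                          # four-level tree, five spares
N = 2 ** (i + 1) - 1 + k             # 20 nodes including spares
R = math.exp(-0.1 * 1.0)             # R = e^(-lambda t), lambda = 0.1, t = 1
best = max(compositions(N, k), key=lambda s: msa_reliability(R, s))
print(best, N / k)                   # (4, 4, 4, 4, 4) 4.0
```

The check agrees with the two inequalities in the proof: the module reliability is log-concave in q, so any uneven split lowers the product.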
B. Reliability of SOFT Trees
The reliability of a redundant system composed of nodes with equal reliability R, which is not subject to degraded performance, can be thought of as a polynomial of degree

Derivation of the number of "choices" for the jth fault to occur for each $a_{j-1}$ in optimal MSA trees is straightforward and is less than the lower bound of Theorem 7:

$$\text{places for } j \text{ per } a_{j-1} = (k-j+1)\,\frac{2^{i+1}-1+k}{k}$$

Due to the analytical complexity of a global reconfiguration strategy and the variety of possible SOFT implementations, a closed-form expression for reliability has not been found for trees of arbitrary size with arbitrary numbers of spares. However, the analysis presented in Section III is sufficient to determine the reliability of any specific SOFT tree. The following section presents some exact reliability calculations for example SOFT implementations.
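The polynomial view can be made concrete under a labeled assumption: with N equally reliable nodes, $R_{SYS} = \sum_j a_j R^{N-j}(1-R)^j$, where $a_j$ is taken here to count the tolerable j-fault sets. For an optimal MSA tree, multiplying the per-fault "choices" $(k-m+1)q$ for m = 1..j and dividing by j! gives $a_j = \binom{k}{j} q^j$, and the polynomial then collapses to the product form of Eq. (2). The helper names below are hypothetical.

```python
import math
from math import comb

# Sketch of the "reliability polynomial" view: with N equally reliable nodes,
# R_sys = sum_j a_j R^(N-j) (1-R)^j, where a_j counts tolerable j-fault sets.
# The counting helpers are illustrative, not the paper's notation.


def reliability_from_counts(R: float, n_nodes: int, counts: list[int]) -> float:
    return sum(a * R**(n_nodes - j) * (1 - R)**j for j, a in enumerate(counts))


def msa_counts(k: int, q: int) -> list[int]:
    """Optimal MSA: j faults are tolerable iff they hit j distinct modules."""
    counts = []
    for j in range(k + 1):
        # ordered choices: (k - m + 1) * q places for the m-th fault, m = 1..j
        ordered = math.prod((k - m + 1) * q for m in range(1, j + 1))
        counts.append(ordered // math.factorial(j))   # equals C(k, j) * q**j
    return counts


def optimal_counts(n_nodes: int, spares: int) -> list[int]:
    """Optimal scheme: any set of at most `spares` faults is tolerable."""
    return [comb(n_nodes, j) for j in range(spares + 1)]


R, k, q = math.exp(-0.1), 5, 4
poly = reliability_from_counts(R, k * q, msa_counts(k, q))
product = (R**q + q * R**(q - 1) * (1 - R)) ** k   # Eq. (2) with equal modules
print(math.isclose(poly, product))                 # True
```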
C. Reliability Examples
The reliabilities of four-level trees implemented as M-trees, RAE trees, and SOFT trees are shown in Table II as a function of the number of spares, for $R = e^{-0.1t}$ at t = 0.5 and t = 1.0. It should be noted that the SOFT reliability is superior even when the modular schemes are allocated more spares. The data for t = 1.0 are displayed graphically in Fig. 6. Table III and Fig. 7 present the reliability-versus-time curves of a tree with no redundancy, duplicated four-level trees, and four-level trees with four spares employing an optimal MSA, a SOFT approach, and a scheme with optimal reliability. A scheme with optimal reliability guarantees reconfiguration for any number of faults less than or equal to the number of spares. Table III includes calculations for an RAE tree with four spares and an M-tree with five spares. The M-tree approach was allocated five spares due to its inability to support four.
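The reference curves in this comparison can be sketched with simplified models. This does not reproduce Table III: it assumes "duplicated" means two independent trees in parallel, models the MSA curve as five equal modules of four nodes (five spares), and treats the optimal scheme as tolerating any four or fewer faults among 15 + 4 nodes. The SOFT curve itself requires the analysis of Section III and is not modeled here.

```python
import math
from math import comb

# Illustrative comparison for a four-level tree (15 nonredundant nodes) with
# R = e^(-0.1 t). Simplified models, not the paper's Table III:
#  - "duplicated" assumes two independent trees in parallel;
#  - the MSA curve assumes five equal modules of four nodes (five spares);
#  - the optimal scheme tolerates any <= 4 faults among 15 + 4 nodes.
N_TREE = 15


def no_redundancy(R: float) -> float:
    return R**N_TREE


def duplicated(R: float) -> float:
    return 1 - (1 - R**N_TREE) ** 2


def msa_five_modules(R: float) -> float:
    return (R**4 + 4 * R**3 * (1 - R)) ** 5


def optimal(R: float, spares: int = 4) -> float:
    n = N_TREE + spares
    return sum(comb(n, j) * R**(n - j) * (1 - R)**j for j in range(spares + 1))


for t in (0.5, 1.0):
    R = math.exp(-0.1 * t)
    models = (no_redundancy, duplicated, msa_five_modules, optimal)
    print(t, [round(f(R), 3) for f in models])
```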
D. Increasing SOFT Reliability
There are several possibilities for enhancing the reliability of a SOFT architecture. If a designer is not concerned with VLSI layout issues or is willing to pay O(n log n) area, the SOFT tree can be implemented such that the leaves are fully connected and full sharing of spares between SST's is practical. As an alternative, sharing of spares between the five-level (i = 4) subtrees is possible using the procedure of Horowitz and Zorat [9]. A second option is the addition of redundant lines between leaves which are adjacent to the same spare. This allows reconfiguration of SST's with numerous faults, assuming that spares are available in neighboring SST's [6]. Since failures are passed to the leaves, the only nodes which must assume the functions of their brothers are the leaves. Also, as noted before, the redundancy in terms of links is reduced and the reliability is enhanced. The only failures which disable the tree are long runs of failures along the leaves and a failure of a node for which no path of good nodes into the leaves exists, i.e., the class of failures depicted in Fig. 5.
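The second disabling class named at the end of the paragraph above, a failed node with no path of good nodes into the leaves, can be tested with a short depth-first search. This minimal sketch assumes heap labels and ignores spare availability, so it checks only the path condition; the function name is hypothetical.

```python
# Hypothetical check for one disabling class: a faulty node cannot be
# displaced if no chain of fault-free nodes runs from one of its sons down
# to the leaf level. Heap labels (sons 2n, 2n + 1) are assumed, and spare
# availability is ignored.


def has_good_path_to_leaves(n: int, leaf_level: int, faulty: set[int]) -> bool:
    def good_down(m: int) -> bool:
        if m in faulty:
            return False
        if m.bit_length() - 1 == leaf_level:   # reached a leaf
            return True
        return good_down(2 * m) or good_down(2 * m + 1)

    return good_down(2 * n) or good_down(2 * n + 1)


# Four-level tree: node 2 and both of its sons have failed (double displacement).
print(has_good_path_to_leaves(2, 3, {2, 4, 5}))   # False
```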
V. SOFT N-ARY TREES

N-ary trees, in which each nonleaf node has N sons, are more suitable for certain tasks than classical binary trees. For example, 4-ary (quad) tree architectures have been proposed for implementing several classes of artificial intelligence related algorithms [15]. The following brief discussion summarizes how the SOFT approach is applicable to N-ary trees. A more complete description and analysis can be found in [13].
A. Construction of Reconfigurable N-ary Trees

1) Location of Redundant Lines: A link allocation approach which is applicable to arbitrarily large N is to restrict reconfiguration to displacement of the two "outside" children of each nonleaf node. Redundant links to each of a node's brothers are added only to the two outside brothers. The link redundancy at upper levels is approximately (2N - 3), where $N^c$ is the number of spares allocated to the tree, with $1 \le c \le i - 1$.
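The (2N - 3) figure is consistent with a per-family count: if only the two outside children of a family of N brothers receive links to each of their brothers, each contributes N - 1 links, and the link between the two outside children is counted twice. The sketch below counts these links. Reading (2N - 3) as links per nonleaf node is our interpretation, and the N-ary labeling (sons of n are N(n - 1) + 2 through Nn + 1, generalizing 2n and 2n + 1) is an assumption.

```python
# Illustrative count of redundant brother links in one N-ary family when only
# the two outside children get links to their brothers. The labeling (sons of
# n are N(n - 1) + 2 .. Nn + 1, generalizing 2n and 2n + 1) is an assumption.


def children(n: int, N: int) -> list[int]:
    return list(range(N * (n - 1) + 2, N * n + 2))


def redundant_brother_links(family: list[int]) -> set[frozenset[int]]:
    links = set()
    for outside in (family[0], family[-1]):
        for brother in family:
            if brother != outside:
                links.add(frozenset((outside, brother)))   # undirected link
    return links


for N in (2, 3, 5):
    print(N, len(redundant_brother_links(children(1, N))), 2 * N - 3)
# 2 1 1
# 3 3 3
# 5 7 7
```

For N = 2 the count is one link per family, which matches the single $n$-to-$b_n$ redundant link of the binary SOFT tree.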
B. Reconfiguration
Reconfiguration is fundamentally the same as in binary trees. A sample reconfiguration for four failures in a 5-ary tree is illustrated in Fig. 8 .
VI. CONCLUSIONS

A unique approach to the design of reconfigurable tree architectures has been presented. The design allocates spares at the leaves of trees and allows sharing of spares between subtrees. The architecture has O(n) VLSI layout for binary trees and is directly extensible to N-ary trees. A lower bound on the reliability of a SOFT tree was shown to exceed the reliability of all modular sparing approaches, with significantly less redundancy. The
