Abstract-Fault-tolerant approaches have been widely employed to improve the yield of ULSI and WSI processor arrays.
I. INTRODUCTION
For some digital signal processing applications, such as matrix multiplication and some classes of filtering operations, the hexagonal interconnection network has been proposed in the literature [2], [3] as the most efficient. When implemented in wafer-scale integration (WSI) or ultra-large-scale integration (ULSI), a processor array with thousands of cells can be squeezed onto a single wafer or chip. One of the main drawbacks of such large integrations is that these arrays are quite prone to defects because of imperfections in the manufacturing process. A fault not only destroys the regularity of the array, but may also make it useless for the algorithms using the array. Thus fault tolerance is the only solution capable of giving acceptable production yield, as it permits initial testing and subsequent array reconfiguration using spare cells and extra switching hardware. The locality of interconnections and the regularity and simplicity of the switching devices are important considerations. Much of the previous work in array reconfiguration [4]-[7] has primarily dealt with rectangular or square arrays. These approaches can be broadly divided into multiplexer-based and switched-bus-based models.
The index-mapping algorithms of [5], [6] work by mapping a set of logical indexes onto a set of physical indexes denoting working cells. Depending on the complexity of the actual algorithm, such schemes tend to have a fair survival rate. Each PE is usually directly connected to many other neighboring PE's through large switches/multiplexers capable of connecting four to eight separate parallel buses. A number of extra communication links are needed to attain proper reconfiguration. Clearly such a scheme is justifiable only when the individual PE's are relatively complex, comprising several thousand transistors, thereby making it highly desirable that all the nonfaulty processors be incorporated into the working logical array. On the other hand, the switched-bus architecture, as employed in [7] and others, is quite attractive when the area of an individual cell is small. The reconfiguration is based on a fixed number of horizontal and vertical buses connected using simple 2 × 2 switches. A reconfiguration technique called modified indexed mapping is used to overcome the deterministic process of index modification. Reconfiguration is analyzed with respect to the switching and routing capabilities of the interconnection network.
However, the generalization of these schemes to hexagonal arrays has not been studied much. The task is certainly nontrivial, owing to the asymmetric nature of the hexagonal interconnection and the need to maintain transparent connections to all six logical neighbors of each PE. Gordon et al. [11] proposed the first reconfiguration scheme intended primarily for hexagonal arrays wherein the individual processors occupy a relatively small silicon area. It worked by bypassing faulty as well as some fault-free PE's, using a bare minimum of extra switch hardware or links. However, the approach suffers from very poor fault coverage and processor utilization in the event of multiple faults.
The authors justify it by claiming that the probability of getting a chip with only one or two faults is quite high. The reconfiguration algorithm HEX-REPAIR, presented in this paper, is based on assumptions similar to those of [11]. However, HEX-REPAIR is much more robust and has fault coverage rates comparable to the index-mapped schemes. The rest of the paper is organized as follows. In Section II, we present the terminology and the fault model used. Sections III and IV explain in detail the inner workings of the algorithm. The proof of correctness is derived in Section V, while the fault-coverage characteristics are studied in Section VI. Section VII pertains to implementation-related aspects such as the area and delay penalty of reconfiguration.
II. PRELIMINARIES
Before embarking on the mechanics of HEX-REPAIR, it will be illuminating to understand the assumptions made.
a) PE's, both faulty and nonfaulty, can be bypassed during reconfiguration using simple switches. The bypassed cells will be referred to as switching elements (SE) to indicate the fact that they cease to perform processing, but serve to maintain the logical interconnections between logically adjacent cells. The switching elements that are actually working processors are sometimes called pseudo faults in the literature.
b) A conventional assumption that many approaches [16], [9], [5], [15] make is that faults affect only the PE's. The switches and interconnects are considered to be fault free. We also make this assumption. The rationale here is that the PE's have to be designed with leading-edge rules so as to maximize speed and density, while the switches and interconnects can be designed with more conservative design rules. Also, interconnects are often formed on a single layer, whereas the PE's use multiple layers. This leads to additional failure modes for the processors, such as mask misalignment and interlayer shorts, thereby reducing the chip yield as a polynomial function of the number of masking levels used.
c) Each PE is associated with a six-port switch that can provide six different switching functions. Such a switch can be designed with as few as 15 ON/OFF devices. This is the only overhead in terms of extra hardware that has to be incorporated into the array. Such low overhead is critical, especially for digital signal processing applications where the processors may comprise only a few hundred transistors. Using multiplexers, the relative switching hardware overhead per processor increases to an extent that the validity of the fault-free switch fault model becomes questionable.
d) Our approach is intended for off-line reconfiguration, mainly to handle production failures. Testing information regarding the location of faults is assumed to be available using schemes such as the one suggested in [17].
The goal of reconfiguration is to recover a logical hexagonally connected array of working cells from the original physical array. Each cell of the array is represented by a pair of physical indexes $(p_x, p_y)$ and a pair of logical indexes $(l_x, l_y)$ that indicate the indexes of the function of each cell at runtime. The two are the same in the absence of faults, but can differ when faults occur. Logical indexes of faulty cells and the SE's are set to zero. We let $\mathcal{F}$ denote the fault set, i.e., the collection of physical indexes of the faulty elements in the array, and let $|\mathcal{F}|$ denote the size of $\mathcal{F}$. Further, we let $X_i$ and $Y_i$ denote the $p_x$ and $p_y$ values of the ith fault, respectively.
Definition 1: Two cells are said to be horizontally connected if they lie on the same physical row. Vertically and diagonally connected cells are defined similarly. (By diagonal, we refer to the one going from the top left of the array and proceeding to the bottom right.)
2) Graph Construction: Construct the HCG and the VCG for the given fault pattern.
3) Graph Traversal: Determine the set $S_h$ of all distinct paths in the HCG and the set $S_v$ of all distinct paths in the VCG that start at a root node and end at the sink node.
4) Integer Programming: Determine the solution set $S$ consisting of $n_h$ paths from $S_h$ and $n_v$ paths from $S_v$, which together cover all faults in $\mathcal{F}$ and which satisfy $n_h \le R$ (the number of spare rows) and $n_v \le C$ (the number of spare columns).
Configuration Phase: Here we configure each SE so as to ensure proper interconnections in the reconfigured array. A simple scheme is presented to determine the particular configuration of an SE, based on the nature of the interaction between the H lines and V lines at that array location. The final step is the assignment of the new logical indices. Fig. 1 shows an 8 × 8 physical grid. The indexes of the 13 shaded processing elements represent the fault set $\mathcal{F}$. This example array can be reconfigured using just one spare row and one spare column of cells to yield a target logical array of dimension 7 × 7. We shall use this problem as our running example throughout the rest of this paper for illustrative purposes.
III. THE COVERING PHASE
In addition to finding a suitable set of H and V lines, this phase also determines the maximum-size working logical array that can be recovered from the given (faulty) physical array. If this maximum is deemed to be insufficient, the reconfiguration process can be either terminated or more spares can be added and the algorithm repeated. Procedure 1 outlines the main steps involved in the covering phase. These are explained in greater detail in the following paragraphs.
In Fig. 2, the processing elements are visited in the order: (1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3), (3, 2), (2, 3), (1, 4), (3, 3), (2, 4), (3, 4). As each fault is encountered, it is mapped to the current fault-index value and the fault-index value is then decremented by one. The fault indexes for the example problem are indicated beside the faulty cells in Fig. 1.
This step thus assigns a number between 1 and $|\mathcal{F}|$ to each fault in the array. This mapping is used in the construction of the cover graphs and H/V lines.
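The visiting order listed for Fig. 2 follows the anti-diagonals of the grid, which can be sketched in a few lines of Python. The function names are hypothetical, and starting the counter at $|\mathcal{F}|$ (so that decrementing produces indexes from $|\mathcal{F}|$ down to 1) is an assumption that is merely consistent with the description above.

```python
def diagonal_scan(n_first, n_second):
    """Return grid cells (x, y), 1-based, in anti-diagonal order.

    Cells with the same x + y lie on one anti-diagonal; within a
    diagonal the first coordinate decreases.
    """
    order = []
    for s in range(2, n_first + n_second + 1):       # s = x + y
        for x in range(min(s - 1, n_first), 0, -1):
            y = s - x
            if 1 <= y <= n_second:
                order.append((x, y))
    return order


def assign_fault_indexes(n_first, n_second, faults):
    """Map each faulty cell to a fault index between 1 and len(faults).

    Assumption: the counter starts at len(faults) and is decremented
    after each fault is met during the scan.
    """
    index = len(faults)
    mapping = {}
    for cell in diagonal_scan(n_first, n_second):
        if cell in faults:
            mapping[cell] = index
            index -= 1
    return mapping


if __name__ == "__main__":
    print(diagonal_scan(3, 4))
    print(assign_fault_indexes(3, 4, {(2, 1), (1, 3)}))
```

For a grid whose first coordinate runs over 1..3 and second over 1..4, `diagonal_scan(3, 4)` reproduces the twelve-cell order quoted from Fig. 2.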
B. Horizontal and Vertical Cover Graphs
The horizontal and vertical cover graphs (HCG and VCG) aid in identifying the maximal sets of faults such that all the faults in one set can be simultaneously covered with a single spare row (HCG) or column (VCG). The idea is to impose a partial order on the set $\mathcal{F}$ so that the search for suitable lines can be restricted to just the fault set, rather than the whole physical array. This is especially important for large arrays with only a few faults.
Formally, these graphs can be defined as follows:
The HCG for a given $\mathcal{F}$ consists of $|\mathcal{F}| + 1$ nodes, one for each fault and a special sink node.
There exists a directed edge from node i to node j, iff r; > Y; and Xi 2: Y0 and Y; -Y; 2: X; -Xi:
In other words, i should not be directly connected to a node that is an ancestor of j.
The VCG for a given $\mathcal{F}$ consists of $|\mathcal{F}| + 1$ nodes, one for each fault and a special sink node. There There do not exist nodes
The HCG and VCG, as defined above, can be constructed in $O(|\mathcal{F}|^2)$ time. Fig. 3 shows the two 13-node HCG and VCG for our running example. The circled nodes are those with no ancestors in the graph. These will be referred to as root nodes.
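Once a cover graph is available, finding its root nodes and listing root-to-sink paths is mechanical. The sketch below assumes the graph is already given as a dictionary from each node to its list of successors (the edge rule itself is not implemented here); `root_nodes`, `enumerate_paths`, and the `max_path_limit` argument are hypothetical names, the last mirroring the cap on generated paths used in the traversal step.

```python
def root_nodes(graph):
    """Nodes with no incoming edge, in dictionary order."""
    targets = {v for succs in graph.values() for v in succs}
    return [n for n in graph if n not in targets]


def enumerate_paths(graph, sink, max_path_limit=1000):
    """Depth-first listing of root-to-sink paths, capped at max_path_limit."""
    paths = []

    def dfs(node, path):
        if len(paths) >= max_path_limit:
            return                      # cap reached: stop generating
        if node == sink:
            paths.append(list(path))    # store a copy of the current path
            return
        for nxt in graph.get(node, ()):
            path.append(nxt)
            dfs(nxt, path)
            path.pop()

    for root in root_nodes(graph):
        dfs(root, [root])
    return paths


if __name__ == "__main__":
    # Toy graph, not the one of Fig. 3: faults 1..4 plus a sink node.
    g = {1: [2, 3], 2: ["sink"], 3: ["sink"], 4: ["sink"], "sink": []}
    print(root_nodes(g))
    print(enumerate_paths(g, "sink"))
```

A small cap trades completeness for time: with `max_path_limit=2` the toy graph yields only the first two paths found.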
C. Graph Traversal
The set $S_h$ ($S_v$) is a collection of paths in the HCG (VCG, respectively), starting from a root node and terminating at the sink node. New paths are generated using a depth-first search strategy. Since, theoretically, the number of possible paths in each graph could be as high as $2^{|\mathcal{F}|/2}$, we use a parameter, called MaxPathLimit, to restrict the maximum number of paths generated for each graph. Procedure 2 outlines the path traversal method. The value set for MaxPathLimit represents a tradeoff between computation time and a 100% guarantee of finding a feasible solution if one exists (MaxPathLimit = infinity). In our simulations, we have set MaxPathLimit to 1000 because we assume that the faults occur randomly and the number of possible paths in each graph is relatively small. Thus, the probability that a fault (node in the graph) will occur in one of the 1000 chosen paths is quite high.
The solution set $S$ comprises $n_h$ paths from $S_h$ and $n_v$ paths from $S_v$, such that together they cover all the faults in $\mathcal{F}$, where $n_h$ ($n_v$) is less than or equal to the number of spare rows (spare columns) available. This problem is similar to the prime implicant covering problem commonly encountered in gate minimization. A natural representation is the cover table, which contains a row for each horizontal path (member of $S_h$) and for each vertical path (member of $S_v$), and a column for each fault in $\mathcal{F}$. A mark is placed in the ith row and jth column if fault j is covered by the ith (horizontal or vertical) path. Certain rows and columns can be deleted from the cover table to simplify determining $S$. The rules for simplification follow.
1) Suppose, for a given column j, a mark occurs only once, in row i. Then this row can be considered an essential path and a part of every solution. This is because fault j can only be covered and replaced by this path. All the other columns k for which there is a mark in row i can also be removed.
2) If two or more columns are identical, all but one can be deleted.
3) Row i is said to dominate row j if it has a mark in each column where j has one, and also in a few additional columns. Furthermore, i and j should either both be horizontal paths or both be vertical paths. In this case, the dominated row j can be deleted.
4) Column i is said to dominate column j if i contains a mark in every row where j has one and, in addition, has marks in other rows too. In such a case, we can remove the dominating column.
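Rules 1 and 3 above can be illustrated with a small Python sketch. The cover table is represented, as an assumption made only for this example, by a dictionary from a path name to a pair (kind, set of covered faults), where kind is "H" or "V"; the function names are hypothetical.

```python
def essential_rows(table):
    """Rule 1: rows that are the only cover of at least one fault."""
    faults = set().union(*(cov for _, cov in table.values()))
    essential = set()
    for f in faults:
        covering = [row for row, (_, cov) in table.items() if f in cov]
        if len(covering) == 1:
            essential.add(covering[0])
    return essential


def remove_dominated_rows(table):
    """Rule 3: drop a row if a row of the same kind covers a strict superset."""
    kept = {}
    for j, (kind_j, cov_j) in table.items():
        dominated = any(
            kind_i == kind_j and cov_j < cov_i      # '<' is proper subset
            for i, (kind_i, cov_i) in table.items()
            if i != j
        )
        if not dominated:
            kept[j] = (kind_j, cov_j)
    return kept


if __name__ == "__main__":
    table = {
        "h1": ("H", {1, 2, 3}),
        "h2": ("H", {2, 3}),
        "v1": ("V", {3, 4}),
        "v2": ("V", {2, 3}),
    }
    print(essential_rows(table))
    print(sorted(remove_dominated_rows(table)))
```

In the toy table, h2 is deleted because h1 is also horizontal and covers strictly more faults, whereas v2 survives: the only row covering a superset of its faults is horizontal.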
Hence, if we let $B_i$ be a Boolean variable that is 1 if row i of the cover table is selected for inclusion in $S$, and let $(B_{j_1}, B_{j_2}, \ldots, B_{j_n})$ be the rows that cover column (fault) j, then the solution for covering all faults is given by the product-of-sums Boolean equation
$$\prod_{j=1}^{|\mathcal{F}|} \left( B_{j_1} + B_{j_2} + \cdots + B_{j_n} \right) = 1.$$
Using the distributive laws of Boolean algebra, this can be equivalently represented by a sum-of-products expression. A product term with the fewest literals constitutes the minimum number of spares needed to obtain a reconfiguration solution. However, any product term that consists of fewer than $n_h$ horizontal paths and $n_v$ vertical paths is an acceptable candidate for reconfiguration.
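For small tables, the selection of an acceptable product term can be mimicked by exhaustive search. The sketch below is not the Boolean expansion itself: it simply tries subsets of paths in order of increasing size and returns the first one that covers every fault without exceeding the spare budgets. The function name and the dictionary-based inputs are assumptions made for illustration.

```python
from itertools import combinations


def select_cover(h_paths, v_paths, faults, spare_rows, spare_cols):
    """Smallest set of paths covering all faults within the spare budgets.

    h_paths / v_paths map a path name to the set of faults it covers.
    Returns a list of path names, or None if no feasible cover exists.
    """
    rows = [("H", name, cov) for name, cov in h_paths.items()]
    rows += [("V", name, cov) for name, cov in v_paths.items()]
    for size in range(spare_rows + spare_cols + 1):
        for combo in combinations(rows, size):
            n_h = sum(1 for kind, _, _ in combo if kind == "H")
            n_v = size - n_h
            if n_h > spare_rows or n_v > spare_cols:
                continue                    # violates a spare budget
            covered = set().union(*(cov for _, _, cov in combo))
            if covered >= faults:           # every fault is covered
                return [name for _, name, _ in combo]
    return None


if __name__ == "__main__":
    h = {"h1": {1, 2, 3}, "h2": {4}}
    v = {"v1": {3, 4}, "v2": {1}}
    print(select_cover(h, v, {1, 2, 3, 4}, spare_rows=1, spare_cols=1))
```

The search is exponential in the number of paths, so it serves only to make the covering condition concrete on toy inputs.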
example: Table I 
Here the quantities $a_{i,j}$ are either 0 or 1 and are obtained from the cover table. The solution set $S$ includes all paths that correspond to a nonzero $x_j$ or $y_j$ value at the end of the optimization. The constraints ensure that the number of horizontal paths chosen does not exceed the number of spare rows and the number of vertical paths does not exceed the number of spare columns. The first set of $|\mathcal{F}|$ constraints ensures that each fault is accounted for in at least one member of $S$. For m rows and n columns, the algorithm typically reaches the optimal vector after a number of iterations that is no bigger than the order of m or n, whichever is larger. In our case, m is determined by the value of MaxPathLimit and is independent of the array or fault-set size, and $n = |\mathcal{F}|$. Thus, the number of iterations required is solely determined by the number of faults. For most arrays with up to 25-30 faults that we simulated, we obtained a solution in less than a second working on a Sun Sparcstation. It is interesting to note that network flow techniques can also be used to determine $S$.
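One 0-1 program consistent with the constraints described in this section is sketched below. The objective (minimizing the number of spare lines used) and the indexing convention, in which row i of the cover table carries the variable $x_i$ if it is a horizontal path and $y_i$ if it is a vertical path, are assumptions made for illustration.

```latex
\begin{aligned}
\text{minimize}\quad & \sum_{i \in S_h} x_i + \sum_{i \in S_v} y_i \\
\text{subject to}\quad
  & \sum_{i \in S_h} a_{i,j}\, x_i + \sum_{i \in S_v} a_{i,j}\, y_i \;\ge\; 1,
    \qquad j = 1, \ldots, |\mathcal{F}|, \\
  & \sum_{i \in S_h} x_i \;\le\; R, \qquad \sum_{i \in S_v} y_i \;\le\; C, \\
  & x_i,\; y_i \in \{0, 1\}.
\end{aligned}
```

The first family of constraints contains one inequality per fault, and the last two bound the number of selected horizontal and vertical paths by the available spare rows and columns.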
IV. THE CONFIGURATION PHASE
Once the horizontal and vertical covering paths have been successfully determined, the configuration phase is invoked next. The three main steps constituting this phase are stated in Procedure 3.
2) Reconfigure the array by assigning logical indices to the remaining good processing elements.
3) Configure all the SE's appropriately so as to ensure that the restructured array is transparent to the various algorithms using the hexagonal array. Thus a PE can continue to communicate with its logical neighbors on the same links as before, unaware that it now reaches them through some SE's.
In the presence of faults occurring in more than one line, this procedure can lead to some overlap between H/V lines. We remove such overlap by removing the common faults from all but one line and reconstructing the other lines as before.
B. Configuring the Switching Elements
A byproduct of the covering strategy used is that the configuration of the SE's to maintain logical transparency is reduced to a simple table-lookup operation. Note that when we refer to an SE configuration, we actually mean the configuration of the switch associated with that PE. Fig. 5 shows the six different switching functions, of which the first corresponds to a good PE and the latter five correspond to an SE of type "a," "b," "c," "d," and "e," respectively. Table II 
C. Assignment of Logical Indexes
Procedure 4 describes the scheme for assigning the logical indices to the working cells of the reconfigured array. First we scan, column by column, assigning a logical row index to each PE in that column. This value is determined by the number of H lines that traverse the column above the row under consideration. An SE gets a logical row index of 0.
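The column scan just described can be sketched as follows. The inputs are hypothetical Boolean grids indexed from the top row: `h_line[r][c]` says whether an H line passes through physical cell (r, c), and `is_se[r][c]` says whether that cell is a switching element. Taking the logical row index to be the 1-based physical row minus the number of H lines met above the cell is an assumption, chosen to agree with the index arithmetic used in the correctness argument.

```python
def logical_row_indexes(h_line, is_se):
    """Assign a logical row index to every cell; SE's get 0."""
    n_rows, n_cols = len(h_line), len(h_line[0])
    logical = [[0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):                 # scan column by column
        lines_above = 0
        for r in range(n_rows):             # top to bottom
            if h_line[r][c]:
                lines_above += 1
            if not is_se[r][c]:
                logical[r][c] = (r + 1) - lines_above
    return logical


if __name__ == "__main__":
    # 3 x 2 toy array with one H line that changes rows between columns.
    h = [[False, True],
         [True, False],
         [False, False]]
    print(logical_row_indexes(h, is_se=h))
```

In the toy array every cell on the H line is treated as an SE, so the two remaining cells of each column receive logical row indexes 1 and 2.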
We then perform a similar operation on each row, assigning a logical column index to each PE, based on the number of V lines met until that point. It is possible for two cells, $C_1$ and $C_2$, to get assigned the same logical indices when either two H lines or two V lines intersect at the same SE. However, we note that for an n × n physical array, the total number of SE's that result from two such intersecting H or V lines is equal to 2n - 1. On the other hand, every spare row or column consists of n cells. Thus, we end up with an extra spare cell. We can therefore resolve the conflict by converting one of $C_1$ or $C_2$ to an SE.
Corollary 1: Each switching element in the reconfigured array is intersected by at most one H line and at most one V line. The proof follows from the H-V line construction and the conflict resolution strategy employed in Procedure 4.
Table II, $C_2$ can therefore be of type "a" or "e"; and C; can be of type "a" or "d." In all cases, the signal path arriving at the left port of $C_2$ goes to the bottom diagonal port of C { and then comes out of its right port. Since at most one H line can pass through any cell, C; cannot be simultaneously horizontally connected to cell (i - 1, j + k) by an H line. Thus, we can extend the same arguments we made before, by replacing occurrences of C [ and C;. We observe that with each H line, we move up one physical row; and with each V line we proceed one column to the right. Let this process finally terminate at a good cell $C_g$ which lies in a subarea of type $T(r - r', c + c')$. This means that the signal path starting at the right port of $C_1$ meets $r'$ H lines and $c'$ V lines. Therefore, the physical indexes of $C_g$ will be $(i - r', j + c' + 1)$. Thus, the logical indices of $C_g$ are $((i - r') - (r - r'), (j + c' + 1) - (c + c')) = (i - r, j - c + 1)$, which is as required.
The proof of correctness for up/down neighbor connections can be similarly derived. The case of diagonal connections is slightly more involved. If the cell (i + 1, j + 1) has a Straight V line, then the path traced will be (i, Q.E.D.
VI. EVALUATION OF THE ALGORITHM
Reconfiguration algorithms, such as HEX-REPAIR, can be compared by their probability of getting stuck in a fatal situation, namely, one for which the algorithm fails to find a suitable reconfiguration. Even for optimal algorithms based on local connections only, there are small clusters of faults that cannot be overcome. For a fixed probability of cell failure, the probability that such clusters occur therefore approaches one as the array becomes infinitely large. Thus, there is a critical size of the array that cannot be overcome unless the probability of single cell failure decreases. In such cases, a degradation in array size would result unless new spares are added.
One pertinent measure is the critical constant $\alpha_c$, which is defined [19] to be the largest number such that if $\alpha < \alpha_c$, then for an array with N processors and individual processor failure probability of $1/N^{\alpha}$, the reconfiguration will almost certainly get stuck in a fatal situation. Thus, it is desirable to have as small a value for $\alpha_c$ as possible. The rest of this section pertains to deriving the critical constant for HEX-REPAIR.
Definition: An atomic fail pattern is defined as a fail pattern that cannot be solved by the algorithm, while removing any one fault from the pattern leads to a reconfigurable array.
Proof: An atomic fail pattern for HEX-REPAIR is as follows: Pick any cell other than one lying along the top two rows or the two rightmost columns. Place the first fault there. Now place the second fault to the right and above the first fault. Likewise, place the third fault to the right and above the second fault. Fig. 7 shows a typical member of the fail-pattern family. It consists of patterns of the form:
where the top left cell is labeled (1, 1).
[Fig. 7: Typical member of an atomic fail-pattern family.]
Now, consider any three columns. It is clear that the number of patterns T satisfying the constraints mentioned above is given by
$$T = \sum_{i=1}^{n-2} \sum_{j=i+1}^{n-1} \sum_{k=j+1}^{n} 1 = \binom{n}{3}.$$
We can also choose the columns in $\binom{n}{3}$ ways. Hence, the total number of patterns for such an array is $\binom{n}{3}^2$.
Proof: We consider only the case when all fail patterns A are of size k. This is true for HEX-REPAIR. Let $F_A$ be a random variable equal to the number of fail patterns in the processor array G. Let $E(F_A)$ and $E(F_A^2)$ be its first and second moments, respectively. Then it was shown in [19] that for positive constants $\alpha$ and $\beta$
where $T_{A,N}$ is the number of fail patterns in G; $S(i)$ denotes the number of fail patterns B that have exactly i common faults with any given fail pattern A; and $N^{\beta} \propto T_{A,N}$.
For one spare row and one spare column, by Theorem 2, each fault pattern is of size k = 3. Hence, $\beta$ is also 3.
Therefore,
$$E(F_A) = \theta \cdot N^{3(1-\alpha)}.$$
It can be further shown that for each fail pattern A of HEX-REPAIR,
Hence, for our fail pattern family, with the individual processor failure probability of $p = N^{-\alpha}$, it follows that
Hence, dividing by $E(F_A)^2 = \theta^2 \cdot N^{6(1-\alpha)}$, we get $E(F_A^2)$
As N 
U j
When $N \to \infty$, i.e., when $N \gg k$, the number of patterns is $O(N^k)$. Since the number of faults in each pattern is k, we get the critical constant $\alpha < 1$.
Thus, to summarize, our algorithm guarantees a reconfigurable solution either when 1) the number of faults < the number of spare rows and columns, or 2) the probability of failure of an individual processing element $\le 1/N$, where N is the number of processors in the array.
The analysis of Theorem 3 has been verified by extensive simulations. Fig. 8 shows the successful reconfiguration rate (100% fault coverage), and independent runs using one spare row and one spare column of cells. The dimension of the arrays used is indicated in the legend. Especially for the larger arrays, we achieve 100% fault coverage when $\alpha > 1$, as expected.
VII. IMPLEMENTATION ISSUES
A switch which realizes the six different switching functions needed in HEX-REPAIR can be built using only 15 basic ON/OFF devices per cell. This is shown in Fig. 10. Note that no additional datapaths, multiplexers, or long switched buses are required. Table IV shows how the different transistors need to be set ON and OFF in the six cases. Thus, for the hexagonal array, this scheme entails only 2.5 switching devices per port of a cell.
On the other hand, the switching overheads for direct reconfiguration and other types of fault-stealing approaches [5], [6] are quite high. Based on the reconfiguration rules for the direct scheme, for any cell (i, j), possible logical neighbors are:
• along the horizontal axis: cells (i - 1, j - 1), (i, j - 1), and (i + 1, j + 1).
Thus, each cell needs a 7 × 1, a 3 × 1, and a 12 × 1 mux for selecting inputs and a 2 × 1 mux for selecting
For production-time reconfiguration, these devices are ideally e-beam/UV programmed floating-gate transistors. By e-beam addition or UV deletion of the gate charge, the transistor can be set to either the ON or OFF state. Alternatively, electronically programmable ON/OFF devices such as pass transistors can be used. Besides offering an easy mechanism for correcting in-service faults, they provide for yield enhancement reconfiguration with interconnection adaptation to a particular problem. However, they need more area since the state also has to be stored. The other advantage of using physically restructurable switches is that they typically introduce a smaller on-state resistance than electronic switches. Thus signal rise time degradation and propagation delay per switched link are considerably reduced. The switch programming is nonvolatile and need not be run each time on power-on. This is especially important for a large array with many PE's. The chief disadvantage is that they offer only a permanent solution and the connections cannot be reprogrammed at runtime to account for additional faults.
In systolic arrays, timing is important. It is desirable to minimize the additional delay caused by communicating over a logical link which could be made up of a series of physical links connected by switches. Based on an incremental distributed-RC model of electrical interconnection, the delay $T_{RC}$ introduced is roughly $N^2 R_{sw} C_{l}$, where $R_{sw}$ is the series resistance introduced by each switch, $C_{l}$ is the capacitance imposed by each link, and N is the number of links in the path. For a typical minimum-geometry pass transistor, this can be about 10 ns delay per cm of line length and per square of gate area. In our running example where one H line and one V line were used, it can be shown from Theorem 4 that there are at most three links per logical path. Assuming a link is 0.1 cm, this places an upper bound of 3 ns on the delay. Clearly it is preferable to keep the number of additional links per reconfigured path as small as possible. The following theorem establishes a bound on the maximum wire length and, therefore, bounds the maximum delay in signals. In practice, however, R and C need to be replaced by the actual number of H lines and V lines that were used in arriving at the solution.
Theorem 4: For a hexagonal array with R spare rows and C spare columns, the length $l$ of any path p between two logically adjacent cells satisfies the relation:
$$l \le \max(2R + C + 1,\; 2C + R + 1).$$
Proof: a) Horizontal direction: From Theorem 1, it follows that the path p has two SE's on every H line between the cells, and one SE on every V line between. Since there can be at most R H lines and C V lines between the two, path p can comprise at most 2R + C SE's. Hence, the maximum length in the horizontal direction, $l_h$, is less than or equal to 2R + C + 1.
b) Vertical direction: By similar arguments, it can be shown that the maximum length in the vertical direction, $l_v$, is less than or equal to 2C + R + 1.
c) Diagonal direction: From Theorem 1, we can deduce that this case is a combination of the above two. Every cell on path p which contains a Straight H line resembles a), and those that contain a Straight V line resemble b). Thus, the maximum length in the diagonal direction, $l_d$, subject to postprocessing, is less than or equal to max(2R + C + 1, 2C + R + 1). However, for type "c" cells, the path length increases by just 1 for each pair of an H line and a V line. This has been found to be the more typical case. Hence, $l_d$ is more often close to max(R + 1, C + 1).
The theorem follows from the above three cases. Q.E.D.
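The bound of Theorem 4 is easy to evaluate for a given spare budget. The helper below is a minimal sketch with a hypothetical name; it returns the worst-case path length in links between two logically adjacent cells, taking the larger of the horizontal and vertical bounds.

```python
def max_path_length(spare_rows, spare_cols):
    """Upper bound of Theorem 4 on the length of a reconfigured path."""
    horizontal = 2 * spare_rows + spare_cols + 1   # case a)
    vertical = 2 * spare_cols + spare_rows + 1     # case b)
    return max(horizontal, vertical)               # case c) is bounded by both


if __name__ == "__main__":
    for r, c in [(1, 1), (2, 1), (1, 3)]:
        print(r, c, max_path_length(r, c))
```

As noted before the theorem, the arguments can be replaced by the number of H lines and V lines actually used in a solution to obtain a tighter figure for that particular reconfiguration.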
Postprocessing: A consequence of the automatic switch settings done by table lookup is that a reconfigured path may traverse an SE of type "c" twice. The presence of such a loop in the path is unnecessary and is eliminated in a postprocessing step, wherein the abovementioned SE's of type "c" are reconfigured appropriately. Note that these loops do not affect the correctness of the solution, but their removal improves performance by reducing the overall length. Fig. 11(a) shows how such a postprocessing procedure can detect double traversals of the reconfigured path between logical cells (1, 2) and (2, 3) at the cell marked with a * and reconfigure it as shown in Fig. 11(b). This step thus eliminates three additional links in the reconfigured path.
VIII. CONCLUSION
In this paper we have presented and analyzed a reconfiguration algorithm, HEX-REPAIR, intended for wafer-scale hexagonal processor arrays. Reconfiguration schemes can be evaluated by their fault-coverage characteristics and the accompanying switching overhead needed. The two are usually directly proportional to one another. For example, fault-stealing approaches have good fault coverage but need very large multiplexers and a large number of extra data links between processors. Consequently, they are not suitable for many hexagonal arrays used in digital signal and image processing applications, which often have relatively simple processors, each consisting of only a few hundred transistors.
HEX-REPAIR has been shown to be fairly robust even in the presence of multiple faults. Computationally efficient techniques such as fault compaction and suitable heuristics, such as SE configuration by table lookup, have been employed to get a solution whenever possible, in time which is polynomial in the number of faults. This is despite the fact that the original problem is NP-complete. The only extra hardware needed to implement this algorithm is a switch made up of 15 ON/OFF devices per processor. No extra data paths are introduced between processing elements. Also, the switch complexity is independent of the number of rows and columns of spare cells used. The correctness of the reconfigured solution and bounds on path length increase have been derived.
