



Theoretical Computer Science 197 (1998) 171-188
Testing and reconfiguration of VLSI linear arrays¹
Roberto De Prisco a,*, Angelo Monti b, Linda Pagli c
a Laboratory for Computer Science, Massachusetts Institute of Technology, 545 Technology Square,
43-368, Cambridge, MA 02139, USA
b Dipartimento di Scienze dell'Informazione, Università degli Studi di Roma "La Sapienza",
via Salaria 113, 00198 Roma, Italy
c Dipartimento di Informatica, Università di Pisa, Corso Italia 40, 56125 Pisa, Italy
Received March 1995; revised April 1996
Communicated by G. Ausiello
Abstract 
Achieving fault tolerance through incorporation of redundancy and reconfiguration is quite 
common. In this paper we study the fault tolerance of linear arrays of N processors with k 
bypass links whose maximum length is g. We consider both arrays with bidirectional links and 
unidirectional links. 
We first consider the problem of testing whether a set of n faulty processors is catastrophic, 
i.e., precludes reconfiguration. We provide new testing algorithms which improve and generalize 
known testing algorithms. For bidirectional arrays we provide an O(kn) time testing algorithm 
and for unidirectional arrays we provide an O(n) time algorithm for the case k = 1, and an
O(kn log k) time algorithm, for the case k > 1. 
When the fault pattern is not catastrophic we study the problem of finding an optimal recon- 
figuration of the array. We consider optimality with respect to two parameters: the size of the 
reconfigured array and the number of redundant links to activate. Considering optimality with 
respect to the size of the reconfigured array, we prove that the problem is NP-hard in the strong 
sense if the bypass links are bidirectional, while it can be solved in O(kng) time if the bypass 
links are unidirectional. Considering optimality with respect to the number of bypass links to 
activate, we prove that the problem can be solved in O(kn) time if the bypass links are bidirectional,
and in O(kng) time if the bypass links are unidirectional. © 1998 Elsevier Science
B.V. All rights reserved.
Keywords: Array processors; Catastrophic fault patterns; Fault tolerance; Fault detection;
Reconfiguration algorithms
* Corresponding author. E-mail: robdep@theory.lcs.mit.edu. 
¹ Part of this work appeared in Proceedings of the 3rd Workshop on Algorithms and Data Structures,
Lecture Notes in Computer Science, vol. 709 (1993) 553-564. 
0304-3975/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved
PII S0304-3975(97)00238-7
1. Introduction 
In a linear array of N processing elements, one faulty element is sufficient to stop the 
flow of information from one side to the other. Without the provision of fault-tolerance 
capabilities, the yield of VLSI chips for such an architecture would be so poor that its 
production would be unacceptable. A lot of research has been devoted to the design of 
fault-tolerant parallel architectures. The most important techniques for this purpose can 
be divided into two main groups. The first one does not make use of redundancy on 
the given architecture but tries to simulate the global functioning using the healthy part 
of the machine (e.g., [10, 13]). This approach uses simulation algorithms which should
guarantee the same functionality with a reasonable slowdown in time. The second 
group includes the techniques that do add redundancy to the given architecture. This 
approach maintains the desired structure by isolating faults and activating certain spare 
links or processors (e.g., [3, 7, 9, 19, 21-23]).
Our approach belongs to the second group. We consider linear arrays with both 
spare processors and links. Besides the regular links connecting neighboring processing
elements, extra links, called bypass links, are included in a regular fashion. These 
redundant links can be activated in a reconfiguration phase to bypass faulty processors. 
In this work we make the following assumptions: 
- Only processors can fail. 
- Faults are total, that is, faulty processors cannot route or compare.
- Faults are static, that is, faulty processors cannot be repaired.
Redundant processing elements are used to replace any faulty processor. Redundant
links are used to bypass the faulty processors and, possibly, to reach redundant processors
used as replacement.
There are essentially two different ways of allocating extra links to the given archi- 
tecture, namely: 
(i) There are spare communication lines that any working processor can use to bypass 
faulty processors. Communication is realized through switches located before and 
after each processor. These switches can be activated to reconfigure the array. This 
approach has been extensively studied [2, 3, 9, 17, 19, 22].
(ii) A fixed set of spare links is dedicated to each processor. In this case a multiplexer 
located inside the processor element can route messages onto one of its private, 
spare links in case of faults. This approach was first introduced for tree and ring 
architectures [12, 20], and later extended to linear arrays [4, 5, 13-16, 18].
In this work we follow the second strategy, which is more suitable for the important
case of production-time reconfiguration of faulty devices [16]. Indeed, the first
strategy allows a general on-line reconfiguration at the expense of a larger propagation 
time along the spare lines. This propagation time may become intolerably large in a 
fixed communication pattern, in particular, when the message must traverse a chain of 
switches to bypass a sequence of consecutive faulty processors. 
Both strategies require that a switch (or multiplexer) be traversed at the input and 
output of each processing element. We assume that both the number of spare links, and 
the length of the longest link, are reasonably small, so that the circuitry added to 
each processor is simple, and the communication delays along the links are negligible. 
Therefore, the total propagation time depends only on the number of processors, which 
is fixed. 
Since any processing element in the array may be faulty, each one of them has to 
be provided with the bypass links. The connections to these bypass links must occupy 
different tracks in the chip. Hence, the total area required by the interconnection network 
is proportional to the length of the array, the number of extra links per processor, and 
the length of the longest link. Finally, note that in each processor, a modest amount 
of circuitry (multiplexer and self-repairing control unit) must be devoted to implement 
the proposed routing discipline. Although, in principle, this discipline could be applied 
to any chip, it is clearly advisable when the functionality of a processing element is 
not too elementary. 
This approach has some inherent limits. Under a realistic assumption that the length 
of the longest link is small with respect to the number of processing elements, re- 
gardless of any amount of redundancy, there are sets of faults occurring at strategic 
positions which affect the chip in a non-repairable way (see [13]). Such sets of faults,
called catastrophic fault patterns (CFP for short), have been extensively studied in
[4, 16, 18], but only for the specific case in which the number of faults in the pattern is
exactly the length g of the longest bypass link. Having at least g faults is a necessary
condition for a fault pattern to be catastrophic [13].
Nayak et al. [16] devised an $O(g^2)$ time algorithm for testing fault patterns consisting
of exactly g faults, for redundant arrays with a single bypass link of length g. This
algorithm has been improved in [14] with an O(kg) time algorithm that solves the
problem in the more general case of $k \ge 1$ bypass links. The problem representation
and the solution techniques presented in previous works are not easily extensible to
the general case of any number m, $g \le m < N$, of faults.
In this paper, following a completely different approach, we consider the more general
problem of testing whether a fault pattern consisting of n faults² is catastrophic.
In addition, when a fault pattern is not catastrophic, we consider the problem of finding 
optimal reconfiguration strategies, where optimality is with respect to either the number 
of processors in the reconfigured array (the reconfiguration is optimal if such a number 
is maximized) or the number of redundant links to activate in order to reconfigure the 
array, i.e., the amount of work needed to reconfigure the array (the reconfiguration is 
optimal if such a number is minimized). 
Our results are the following: 
The problem of testing whether a set of n faulty processing elements is catastrophic 
for a redundant array with k bypass links can be solved in time O(kn) when the links 
in the array are bidirectional, and in time O(nk log k) if k > 1, or in time O(n) if k = 1, 
when the links in the array are unidirectional. 
² We remark that we will actually consider, without loss of generality, fault patterns of m faults, $m \ge n$,
subdivided into n blocks of consecutive faulty processors.
The problem of finding a reconfiguration strategy that is optimal with respect to 
the size of the reconfigured array is NP-hard in the strong sense, when the links are 
bidirectional, while it can be solved in time O(kng), where g is the length of the 
longest bypass link, when the links are unidirectional. 
The problem of finding a reconfiguration strategy that is optimal with respect to the 
number of redundant links used in the reconfiguration, is solvable in O(kn) 
time when the links are bidirectional, and in O(kng) time when the links are uni- 
directional. 
We provide algorithms for all the cases in which the problem can be solved, i.e., 
all but the problem of finding an optimal reconfiguration strategy in arrays with bidi- 
rectional links, where optimality is with respect to the size of the reconfigured array. 
This paper is organized as follows: Basic concepts and a formal definition of the 
problem are introduced in Section 2. A testing algorithm for arrays with bidirectional 
links is given in Section 3. In Section 4, a testing algorithm for arrays with unidirec- 
tional links is provided. Section 5 contains results on reconfiguration strategies that are 
optimal with respect to the size of the reconfigured array. Finally, Section 6 contains 
results on reconfiguration strategies that are optimal with respect to the number of 
redundant links used. Section 7 contains concluding comments and open questions. 
2. Preliminaries 
The basic components of a redundant linear array are the processing elements, or 
simply processors, and the links. There are two kinds of links: regular or bypass. 
Regular links exist between neighboring processors, while the bypass links connect 
non-neighboring processors. The bypass links are used only for reconfiguration purposes
when faulty processors are detected. 
More precisely, let $A = \{p_1, \ldots, p_N\}$ denote a linear array of identical processing
elements connected by regular links $(p_i, p_{i+1})$, $1 \le i < N$. Let $G = \{g_1, \ldots, g_k\}$ be an
ordered set of integers such that $g_1 < g_2 < \cdots < g_k$. We say that A has redundancy G
if, for each $g_t$, $1 \le t \le k$, there is a bypass link $(p_i, p_{i+g_t})$, $1 \le i \le N - g_t$. Notice that
the set G does not contain the regular link. We denote by g the length of the longest
bypass link, i.e., $g = g_k$.
At the extremities of the array two special processors, called I (for Input) and O
(for Output), are responsible for the I/O functions of the system. We assume that I is
connected to $p_1, \ldots, p_g$ while O is connected to $p_{N-g+1}, \ldots, p_N$ so that bottlenecks at
the borders of the array are avoided.
Example 1. Fig. 1 shows a linear array of 20 processing elements with redundancy
$G = \{4\}$.
We refer to this structure as a redundant linear array or as a redundant array or 
simply as an array. The array is called bidirectional or unidirectional according to the 
Fig. 1. A linear array of processors.
nature of its links. We admit faults occurring in the processors only (i.e., both I and
O and the links always operate correctly). We refer to a processor $p_i$ as processor i
or simply as $p_i$.
Definition 1. For a redundant linear array A, a fault pattern F is an ordered set of pairs
of positive integers $F = \{(f_1, l_1), (f_2, l_2), \ldots, (f_n, l_n)\}$, where $f_i + l_i < f_{i+1} \le N - l_{i+1} + 1$,
$1 \le i < n$.
Each pair $(f_i, l_i)$ identifies the block of faulty processors $p_{f_i}, p_{f_i+1}, \ldots, p_{f_i+l_i-1}$.
Hence, a faulty processor $p_z$ is such that $f_i \le z < f_i + l_i$ for some i, $1 \le i \le n$. Non-faulty
processors are working processors. A path from a working processor $i_0$ to a
possibly faulty processor $i_{s+1}$ is a sequence of processors $i_0, i_1, \ldots, i_s, i_{s+1}$ such that, for
each $j = 0, 1, \ldots, s$, processor $i_j$ is a working processor connected by a link to processor
$i_{j+1}$, and $i_j = i_z$ if and only if $j = z$, $0 \le j, z \le s + 1$ (i.e., a processor is used only once).
The length of the path is $s + 1$. An escape path is a path from I to O. We represent
paths in the following way: since the flow of computation usually goes from processor 
i to processor i + 1, it is enough to indicate those processors for which the computa- 
tion does not continue on the consecutive processor. Formally we give the following 
definition of a path. 
Definition 2. A path P is represented as a triple consisting of a starting processor $p_u$,
an ending processor $p_v$ and a set of pairs of integers $\{(e_1, a_1), (e_2, a_2), \ldots, (e_q, a_q)\}$,
where $1 \le e_i \le N$, $e_i \ne e_j$ if $i \ne j$, and $-k \le a_i \le k$, for each $i = 1, 2, \ldots, q$.
Processor $e_i$, $i = 1, 2, \ldots, q$, has an active link that is not the regular one. The active
link of processor $e_i$ is defined according to $a_i$, namely:
- if $a_i = 0$ the active link is from $p_{e_i}$ to $p_{e_i - 1}$,
- if $a_i < 0$ the active link is from $p_{e_i}$ to $p_{e_i - g_{-a_i}}$,
- if $a_i > 0$ the active link is from $p_{e_i}$ to $p_{e_i + g_{a_i}}$.
All other processors have their regular link active. The path represented by P is
the sequence of processors obtained starting from $p_u$ and following the active links.
It is obvious that this sequence must not contain a faulty processor, except, possibly,
the last processor $p_v$. An escape path is a path P for which $p_u = I$ and $p_v = O$.
In representing escape paths we will omit processors I and O. Since by activating
the link $a_i$ of the processor $e_i$, for $i = 1, 2, \ldots, q$, we reconfigure the system (or
achieve a path from $p_u$ to $p_v$), we call the set $\{(e_1, a_1), (e_2, a_2), \ldots, (e_q, a_q)\}$ a
reconfiguration set.
Fig. 2. A fault pattern and the chunks.
Definition 3. Given a redundant array A, a fault pattern is catastrophic for A if and 
only if no escape path exists. 
Given a fault pattern for a redundant array A, we focus our attention on that part
of A beginning at processor $p_{f_1-g+1}$ and ending at processor $p_{f_n+l_n+g-2}$. We call this
part of the array the fault zone. Moreover, since all the processors are indistinguishable,
without loss of generality, we will assume that the fault zone begins at processor $p_1$,
i.e., $f_1 = g$.
A block of maximum length of working processors in the fault zone will be called 
chunk. More formally, we give the following definition: 
Definition 4. Given an array A with redundancy G and a fault pattern F, chunk$_i$,
$1 \le i \le n - 1$, is the block of processors between $f_i + l_i - 1$ and $f_{i+1}$, i.e., the block
$p_{f_i+l_i}, \ldots, p_{f_{i+1}-1}$. Moreover, chunk$_0$ is the block of processors $p_{f_1-g+1}, \ldots, p_{f_1-1}$, i.e.,
the first $g - 1$ processors of the fault zone, and chunk$_n$ is the block of processors
$p_{f_n+l_n}, \ldots, p_{f_n+l_n+g-2}$, i.e., the last $g - 1$ processors of the fault zone.
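Definition 4 can be turned into a few lines of code. The following is a minimal sketch, not part of the original paper: the function names `fault_zone` and `chunks` and the list-of-pairs encoding of F are assumptions made for illustration, and the pattern is assumed to satisfy $f_1 \ge g$ so that the fault zone starts inside the array.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (f_i, l_i): first position and length of a faulty block


def fault_zone(faults: List[Block], g: int) -> Tuple[int, int]:
    """First and last processor index of the fault zone."""
    f1, _ = faults[0]
    fn, ln = faults[-1]
    return f1 - g + 1, fn + ln + g - 2


def chunks(faults: List[Block], g: int) -> List[Tuple[int, int]]:
    """Return [(x_0, y_0), ..., (x_n, y_n)]: first and last processor of each chunk."""
    start, end = fault_zone(faults, g)
    result = [(start, faults[0][0] - 1)]               # chunk_0
    for (f, l), (f_next, _) in zip(faults, faults[1:]):
        result.append((f + l, f_next - 1))             # chunk_i, 1 <= i <= n - 1
    fn, ln = faults[-1]
    result.append((fn + ln, end))                      # chunk_n
    return result
```

For a pattern with two faulty blocks, (7, 1) and (14, 2), and g = 4, `chunks` returns three pairs, one per chunk, each giving the first and last working processor of that chunk.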
Example 2. Consider the fault pattern $F = \{(7, 1), (14, 2)\}$ for a bidirectional linear
array of 20 processors with link redundancy $G = \{4\}$. Then, the fault zone begins at
processor $p_4$ and ends at processor $p_{18}$. There are three chunks: chunk$_0$ begins at processor
$p_4$ and ends at processor $p_6$; chunk$_1$ begins at processor $p_8$ and ends at processor
$p_{13}$; chunk$_2$ begins at processor $p_{16}$ and ends at processor $p_{18}$. An escape path P
is the sequence of processors $I, p_1, p_2, p_3, p_4, p_5, p_6, p_{10}, p_9, p_{13}, p_{17}, p_{18}, p_{19}, p_{20}, O$.
The reconfiguration set that achieves this escape path is $\{(6, 1), (10, 0), (9, 1), (13, 1)\}$.
Fig. 2 shows the fault pattern F, the fault zone, the chunks and the escape path P.
Notice that only the active links of the processors in the escape path are drawn and
that $f_1 \ne g$.
When a fault pattern is not catastrophic, we are interested in finding escape paths. 
Depending on the fault pattern there can exist several escape paths. We are interested in 
finding those escape paths that are optimal with respect to either the size of the recon- 
figured array or the number of redundant links to be activated to reconfigure the array. 
In the former case, optimality is achieved when the size of the reconfigured array 
is maximized, i.e., when the number of processors in the escape path that reconfigures 
the array is maximized. In this case, an optimal escape path is called a maximum 
escape path, and a reconfiguration set that achieves a maximum escape path is called 
a maximum reconfiguration set.
In the latter case, optimality is achieved when the number of redundant links that we 
have to activate in order to reconfigure the array is minimized. In this case, an optimal 
escape path is called minimum escape path, and a reconfiguration set that achieves a 
minimum escape path is called a minimum reconfiguration set. 
Example 3. Consider the fault pattern F given in Example 2. The escape path P of
that example has length 13. It is not a maximum escape path. Indeed, it is easy to
see that the escape path $P_M = \{(6, 1), (10, 0), (9, 0), (8, 1), (13, 1)\}$ of 15 processors is a
maximum escape path. Neither P nor $P_M$ is a minimum escape path. The escape path
$P_m = \{(6, 1), (13, 1)\}$ is a minimum escape path since it uses only two redundant links
and there are no escape paths that use only one link.
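The reconfiguration sets of Examples 2 and 3 can be replayed mechanically by following the active links of Definition 2. The sketch below is illustrative only: the function name `trace_path`, its parameters, and the convention that the path leaves the array through the regular link of $p_N$ are assumptions, not taken from the paper.

```python
from typing import Dict, List, Sequence, Set, Tuple


def trace_path(n_procs: int, G: Sequence[int], faulty: Set[int],
               reconf: Sequence[Tuple[int, int]], start: int = 1) -> List[int]:
    """Follow active links from p_start until the regular link of p_N leads to O.

    reconf holds the pairs (e_i, a_i) of Definition 2. Raises ValueError if the
    walk hits a faulty processor, repeats a processor, or leaves the array.
    """
    active: Dict[int, int] = dict(reconf)
    path: List[int] = []
    visited: Set[int] = set()
    pos = start
    while True:
        if pos in faulty or pos in visited:
            raise ValueError(f"invalid path at processor {pos}")
        path.append(pos)
        visited.add(pos)
        a = active.get(pos)
        if a is None:
            step = 1                 # regular link, forward
        elif a == 0:
            step = -1                # regular link, used backwards
        elif a > 0:
            step = G[a - 1]          # bypass link g_a, forward
        else:
            step = -G[-a - 1]        # bypass link g_{-a}, backwards
        pos += step
        if pos == n_procs + 1 and a is None:
            return path              # assumed exit: p_N hands over to O
        if not 1 <= pos <= n_procs:
            raise ValueError("active link leaves the array")
```

With 20 processors, G = [4] and faulty set {7, 14, 15}, the sets P, $P_M$ and $P_m$ yield valid walks whose lengths can be compared directly.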
3. A testing algorithm for bidirectional arrays 
Let A be a bidirectional array of N processors with redundancy $G = \{g_1, \ldots, g_k\}$, and
let F be a set of m faults grouped into $n \le m$ blocks of consecutive faulty processors.
A straightforward way to test if F is catastrophic for A is to consider the graph whose 
vertices are the working processors and whose set of edges is given by the links 
between working processors, and to test if vertices I and 0 are connected in such a 
graph. By applying a standard algorithm for the test of connectivity we would obtain 
an algorithm with O((N - m)k) time complexity. However, in the case of redundant 
arrays, the usual assumption is that N is much greater than m. We propose an algorithm 
running in O(nk) time. 
The idea is to represent the problem by a graph whose set of vertices is given by the 
chunks of working processors and whose size is linear in the number of faults, and to 
test the connectivity of I and 0 in such a graph. More formally, we construct a graph 
$H = (V, E)$ as follows: The set V of vertices is $\{C_0, C_1, \ldots, C_n\}$, where the $C_i$'s represent
the chunks of F. We will write $p_x \in C_i$ to indicate that the processor $p_x$ belongs to
chunk$_i$. For each i and j, $0 \le i, j \le n$, $i \ne j$, the edge $(C_i, C_j)$ belongs to E if and only if
there are two processors, $p_x \in C_i$ and $p_y \in C_j$, and an integer t, $1 \le t \le k$, such that
$|y - x| = g_t$, that is, if and only if the two processors are connected in A by a bypass
link.
We call the graph H the derived graph of the fault pattern F. By definition of 
derived graph it follows that 
Fact 5. A fault pattern F is not catastrophic for an array A, if and only if $C_0$ is
connected with $C_n$ in the derived graph.
Fig. 3 shows an algorithm, called GRAPH, which constructs the derived graph. Inputs 
to GRAPH are the fault pattern F and the redundancy G. The output is the derived 
graph represented by its adjacency lists. In addition to the code shown, the following 
should be noted: the adjacency list of node $C_i$ is $L(C_i)$, for $i = 0, 1, \ldots, n$, and initially
GRAPH(F, G, H)
  for t = 1 to k do
    j = 1
    for i = 0 to n - 1 do
      while x_i + g_t > y_j do
        j = j + 1
      endwhile
      while j ≤ n and x_j ≤ y_i + g_t do
        L(C_i) = L(C_i) ∪ C_j
        L(C_j) = L(C_j) ∪ C_i
        j = j + 1
      endwhile
      if y_{j-1} ≥ x_{i+1} + g_t then j = j - 1
    endfor
  endfor
  H is given by L(C_i) for i = 0, ..., n
  return(H)
Fig. 3. Algorithm GRAPH 
$L(C_i)$ is empty; the first and the last processor of chunk $C_i$ are denoted by $x_i$ and $y_i$,
respectively.
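For readers who prefer executable code, the construction of the derived graph followed by a connectivity test (Fact 5) might look as follows. This is a sketch under stated assumptions rather than the authors' implementation: the names `chunks_of`, `derived_graph` and `is_catastrophic_bidirectional` are hypothetical, adjacency lists are stored as sets (so duplicate edges vanish), an extra bound check `j <= n` guards the first inner loop, and self-edges are skipped.

```python
from collections import deque
from typing import List, Sequence, Set, Tuple

Chunk = Tuple[int, int]  # (x_i, y_i): first and last processor of a chunk


def chunks_of(faults: Sequence[Tuple[int, int]], g: int) -> List[Chunk]:
    """Chunks C_0..C_n of Definition 4 for fault blocks (f_i, l_i)."""
    f1, (fn, ln) = faults[0][0], faults[-1]
    inner = [(f + l, nxt - 1) for (f, l), (nxt, _) in zip(faults, faults[1:])]
    return [(f1 - g + 1, f1 - 1)] + inner + [(fn + ln, fn + ln + g - 2)]


def derived_graph(chunks: Sequence[Chunk], G: Sequence[int]) -> List[Set[int]]:
    """Two-pointer sweep per bypass length, in the spirit of algorithm GRAPH."""
    n = len(chunks) - 1
    adj: List[Set[int]] = [set() for _ in range(n + 1)]
    for g_t in G:
        j = 1
        for i in range(n):
            x_i, y_i = chunks[i]
            # skip chunks that end before the shifted interval [x_i + g_t, y_i + g_t]
            while j <= n and chunks[j][1] < x_i + g_t:
                j += 1
            # every chunk that starts inside the shifted interval is a neighbour
            while j <= n and chunks[j][0] <= y_i + g_t:
                if j != i:
                    adj[i].add(j)
                    adj[j].add(i)
                j += 1
            # the last chunk examined may also overlap the next shifted interval
            if j > 1 and chunks[j - 1][1] >= chunks[i + 1][0] + g_t:
                j -= 1
    return adj


def is_catastrophic_bidirectional(faults: Sequence[Tuple[int, int]],
                                  G: Sequence[int]) -> bool:
    """Fact 5: catastrophic iff C_0 and C_n are disconnected in the derived graph."""
    adj = derived_graph(chunks_of(faults, max(G)), G)
    seen, queue = {0}, deque([0])
    while queue:
        c = queue.popleft()
        for d in adj[c]:
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return len(adj) - 1 not in seen
```

The breadth-first search visits each chunk and each stored edge once, so the whole test stays within the O(kn) bound as long as the sweep does.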
Lemma 6. Algorithm GRAPH constructs the derived graph. 
Proof. To prove the lemma we have to show that $C_s$ is inserted into $L(C_i)$ and $C_i$ is
inserted into $L(C_s)$ if and only if $(C_i, C_s)$ is an edge of the derived graph.
Assume that $(C_i, C_s)$ is an edge of the derived graph. Without loss of generality,
let $i < s$. By definition of derived graph, there exist two integers, z and t, such that
$1 \le t \le k$, $x_i \le z - g_t \le y_i$, and $x_s \le z \le y_s$. Hence, we have $x_i + g_t \le y_s$ and
$x_s \le y_i + g_t$.
Consider the ith iteration of the internal for. At the beginning of this iteration at least
one among $x_i + g_t \le y_{j-1}$ and $j = i + 1$ holds. Hence, j is incremented until $x_i + g_t \le y_j$,
and for all $j'$ such that $x_i + g_t \le y_{j'}$ and $x_{j'} \le y_i + g_t$, chunk $C_{j'}$ is inserted into $L(C_i)$
and chunk $C_i$ is inserted into $L(C_{j'})$. Hence, also $C_s$ is inserted into $L(C_i)$ and $C_i$ is
inserted into $L(C_s)$.
Assume that $C_s$ is inserted into $L(C_i)$ and $C_i$ is inserted into $L(C_s)$. Without loss of
generality, let $i < s$. There must exist a $g_t$ such that $x_i + g_t \le y_s$ and $x_s \le y_i + g_t$. This
implies that there exists an integer z such that $x_i + g_t \le z \le y_i + g_t$ and $x_s \le z \le y_s$. Hence,
$p_{z-g_t} \in C_i$ and $p_z \in C_s$, and thus, by definition of derived graph, $(C_i, C_s) \in E$. □
We remark that some redundant information may be present in the adjacency lists 
(i.e., an edge may appear more than once in the same list), however, this does not 
affect the order of magnitude of the size of the lists and thus the time complexity of 
the testing algorithm. 
Theorem 7. The problem of testing whether a fault pattern of n blocks is catastrophic
for a bidirectional redundant array with k bypass links is solvable in time O(kn).
Proof. By Fact 5 and Lemma 6, it is sufficient to show that the algorithm GRAPH
requires O(kn) time. Indeed, there are well-known algorithms that test connectivity in
a graph $(V, E)$ in $O(|V| + |E|)$ time.
Since the outer for loop performs k iterations, we simply have to prove that the inner
for loop requires O(n) time. Let $z_i$ be the increment of variable j at the ith iteration
of the internal for. Clearly, the ith iteration of the for costs $O(z_i)$ and $\sum_{i=0}^{n-1} z_i = n$.
Hence, the internal for requires O(n) time. □
4. A testing algorithm for unidirectional arrays 
In this section we study the problem of testing whether a fault pattern is catastrophic 
for a redundant array with unidirectional links. In this case the information (useful for 
the bidirectional case) about the chunks and their relations captured by the derived 
graph, is not sufficient because the starting and the ending points of the links connecting 
two chunks must be taken into account. 
Example 4. Consider the fault pattern $F = \{(5, 1), (7, 3), (12, 2), (16, 1)\}$ for a unidirectional
array with redundancy $G = \{5\}$. The derived graph erroneously suggests that $C_3$
can be reached from $C_1$. Indeed, F is catastrophic whereas in the derived graph there
is a path from $C_0$ to $C_4$.
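The difference between the two link models in Example 4 can be checked with a direct search over processors. The sketch below is a brute-force check costing O(Nk), not the chunk-based algorithm of the paper; the function name `has_escape_path` is hypothetical, and the array size N = 20 is an assumption, since the example does not state it.

```python
from collections import deque
from typing import Sequence, Set


def has_escape_path(n_procs: int, G: Sequence[int], faulty: Set[int],
                    bidirectional: bool) -> bool:
    """Search from I (linked to p_1..p_g) towards O (linked to p_{N-g+1}..p_N)."""
    g = max(G)
    steps = [1] + list(G)                    # regular link plus bypass links
    if bidirectional:
        steps += [-s for s in steps]         # the same links, traversed backwards
    frontier = deque(p for p in range(1, g + 1) if p not in faulty)
    seen = set(frontier)
    while frontier:
        p = frontier.popleft()
        if p > n_procs - g:                  # p is directly connected to O
            return True
        for s in steps:
            q = p + s
            if 1 <= q <= n_procs and q not in faulty and q not in seen:
                seen.add(q)
                frontier.append(q)
    return False
```

Calling it with `faulty = {5, 7, 8, 9, 12, 13, 16}` and `G = [5]` once per link model shows the two verdicts differing, which is exactly what the derived graph fails to capture.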
To cope with this problem we use a different approach and we present a solution
requiring O(n) time when k = 1, and O(nk log k) time when k > 1. Informally,
the algorithm looks for all the reachable parts of the array starting from chunk$_0$.
By a reachable part of the array we mean a set, called block, of consecutive (faulty or
working) processors, such that there is a path from I to each processor of the block.
A block will generate new blocks if it contains working processors, i.e., if the block 
overlaps one or more chunks. The algorithm considers blocks and chunks in increasing 
order and discards them when they have been exploited to produce new blocks. The 
order of chunks and blocks is given by the starting position. The algorithm ends when 
it cannot create new blocks (because all chunks or blocks have been discarded). A 
fault pattern is not catastrophic if and only if there is a block that lies after the last 
fault. 
In the following we will denote a chunk (or block) by the pair $(x, y)$ where $p_x$ and
$p_y$ are the first and the last processor in the chunk (or block), respectively. We say
that a pair $(x, y)$ is minimum in a set X of pairs if, for each $(u, v)$ in X, $x \le u$ holds,
and we say that the pair is maximum if, for each $(u, v)$ in X, $u \le x$ holds. Fig. 4 shows
algorithm TEST, which, given a fault pattern F and the link redundancy G, tests if F 
is catastrophic or not. 
Lemma 8. Given a fault pattern F for a linear array with link redundancy G, algorithm
TEST returns true if and only if F is catastrophic.
TEST(F, G)
begin
  S = {(f_1 - g + 1, f_1 - 1)}; B = {(f_1 - g + 1, f_1 - 1)}
  for i = 1, ..., n - 1 do insert (f_i + l_i, f_{i+1} - 1) into S endfor
  while S ≠ ∅ and B ≠ ∅ do
    let (x', y') be a minimal element of B
    let (x, y) be a minimal element of S
    case
      y' < x : delete (x', y') from B
      y < x' : delete (x, y) from S
      otherwise :
        x̄ = max(x, x')
        for i = 1, ..., k do insert (x̄ + g_i, y + g_i) into B
        delete (x, y) from S
    endcase
  endwhile
  if in B there is a pair (x', y') with y' ≥ f_n + l_n return false
  else return true
end
Fig. 4. The algorithm TEST. 
Proof. Let $(x_i, y_i)$ denote the ith chunk inserted into S. It is easy to show, by induction
on i, that all the blocks $(x'_j, y'_j)$ such that $x'_j \le z + g_j \le y'_j$, $1 \le j \le k$, are inserted into B,
if and only if there is a path from $p_{f_1-g+1}$ to $p_z$, $x_i \le z \le y_i$.
Assume that F is not catastrophic. Let P be an escape path. We assume, without
loss of generality, that P passes through processor $f_1 - g + 1$. Since P is an escape
path, it must bypass the fault zone. Let $p_u$ and $p_v$ be two consecutive processors in the
path P (i.e., there is a link $g_t$ between them), such that $u < f_n$ and $v \ge f_n + l_n$. Clearly,
there is a path from processor $p_{f_1-g+1}$ to processor $p_u$. Hence, a block $(x', y')$ that
contains v, i.e., $x' \le v \le y'$, is inserted into B. Such a block cannot be deleted anymore.
Therefore, the if statement returns false.
Assume that TEST returns false. Then a block $(x', y')$ with $y' \ge f_n + l_n$ has been
inserted into B. This implies the existence of a path from $p_{f_1-g+1}$ to a processor $p_z$,
with $f_n + l_n \le z \le y'$. Such a path can be easily extended to an escape path.
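A compact executable reading of algorithm TEST is sketched below. It is an illustration under assumptions, not the authors' code: the function name `is_catastrophic_unidirectional` is hypothetical, S is kept in a double-ended queue, and B is kept in a single binary heap ordered by block start, which is simpler than the structure used in the complexity analysis and costs O(log(nk)) per operation instead of O(log k).

```python
import heapq
from collections import deque
from typing import Sequence, Tuple


def is_catastrophic_unidirectional(faults: Sequence[Tuple[int, int]],
                                   G: Sequence[int]) -> bool:
    """Return True iff no block of reachable processors lies after the last fault."""
    g = max(G)
    f1 = faults[0][0]
    fn, ln = faults[-1]
    chunk0 = (f1 - g + 1, f1 - 1)
    # S: chunks 0..n-1 in increasing order of starting position
    S = deque([chunk0])
    for (f, l), (f_next, _) in zip(faults, faults[1:]):
        S.append((f + l, f_next - 1))
    # B: reachable blocks, as a min-heap on the starting position
    B = [chunk0]
    while S and B:
        xb, yb = B[0]
        x, y = S[0]
        if yb < x:                 # block lies entirely before the chunk
            heapq.heappop(B)
        elif y < xb:               # chunk lies entirely before the block
            S.popleft()
        else:                      # overlap: processors max(x, xb)..y are reachable
            lo = max(x, xb)
            for g_t in G:
                heapq.heappush(B, (lo + g_t, y + g_t))
            S.popleft()
    return not any(yb >= fn + ln for _, yb in B)
```

Blocks removed during the loop always end before some chunk that precedes the last fault, so only the blocks still in B at the end need to be inspected.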
Theorem 9. The problem of testing if a fault pattern of n blocks is catastrophic for
a unidirectional array is solvable in time O(n) when the array has only one bypass 
link and in time O(nk log k) when there are k > 1 bypass links. 
Proof. By Lemma 8, it is sufficient to show that the algorithm requires time O(n) 
when k = 1 and O(nk log k) otherwise. 
Since chunks in S are inserted in increasing order, we can organize S as a queue, so 
that insertions and deletions in S and finding the minimal element of S take constant 
time. Hence, the first for requires O(n) time. 
The number of iterations performed in the while is O(nk). Indeed, during each iteration
there must be a deletion from S or B, and k insertions into B occur always with
a deletion from S. Noting that S initially contains n elements and B is a singleton, we 
have that after at most O(nk) iterations one among S and B is empty. 
To complete the time analysis of TEST we need to analyze the time complexity of 
the body of the while loop and the final if. 
The new blocks to be inserted into B are not produced in increasing order, hence a 
standard queue is not sufficient to efficiently handle set B (unless k = 1). We organize 
B in k subsets $B_i$, $1 \le i \le k$, each containing the blocks produced using the link $g_i$.
When a new block is generated using the link $g_i$, we insert it in the corresponding $B_i$.
The inserted block is maximal in $B_i$, hence each $B_i$ can be organized as a standard
queue. Moreover, we organize the k "heads" of the queues $B_i$, which contain the
minimal elements of each $B_i$, as a heap providing the minimal element of B. With
this data structure, we can insert and delete in B or find the minimal element of B in
O(log k) time (or constant time if k = 1).
Each iteration of the while loop may require time O(log k), O(1) or O(k log k)
depending on the case inside the while (if k = 1 then it requires constant time). Since
the number of while iterations is O(kn), we only have to show that the number of while
iterations that require time O(k log k) is O(n). This follows easily noting that each of
these iterations requires a deletion from S, whose cardinality is initially n. Finally the
if test may be done in O(k) time by finding the maximal elements of each $B_i$ (constant
time if k = 1). □
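The data structure used in the proof (k queues whose heads are kept in a heap) can be sketched as follows. The class name `BlockSet` and its method names are hypothetical; the sketch assumes, as in the proof, that blocks produced by the same link arrive in non-decreasing order of starting position.

```python
import heapq
from collections import deque
from typing import Deque, List, Tuple

Block = Tuple[int, int]  # (x', y'): first and last processor of a block


class BlockSet:
    """k FIFO queues (one per bypass link) plus a heap over the queue heads."""

    def __init__(self, k: int) -> None:
        self.queues: List[Deque[Block]] = [deque() for _ in range(k)]
        self.heads: List[Tuple[Block, int]] = []   # one entry per non-empty queue

    def insert(self, i: int, block: Block) -> None:
        """Append a block produced with link index i; it must be maximal in B_i."""
        q = self.queues[i]
        if q and q[-1][0] > block[0]:
            raise ValueError("blocks of one link must arrive in increasing order")
        q.append(block)
        if len(q) == 1:                            # the queue gained a new head
            heapq.heappush(self.heads, (block, i))

    def minimum(self) -> Block:
        return self.heads[0][0]

    def delete_min(self) -> Block:
        block, i = heapq.heappop(self.heads)
        q = self.queues[i]
        q.popleft()
        if q:                                      # promote the next block of B_i
            heapq.heappush(self.heads, (q[0], i))
        return block

    def __bool__(self) -> bool:
        return bool(self.heads)
```

The heap never holds more than k entries, so each insertion or deletion costs O(log k), and appending to a non-empty queue costs O(1).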
5. Maximum escape paths 
In this section we consider the problem of finding maximum escape paths. We 
prove that the problem is NP-hard for a bidirectional redundant array, while for a 
unidirectional array we provide an algorithm that finds a maximum escape path in 
O(kng) time.
First, we take into account the case of bidirectional links. Consider the following 
Maximum Reconfiguration Length (MRL for short) problem.
Definition 10 (MRL problem). Given a bidirectional redundant array A consisting of 
N processors, with link redundancy G, a fault pattern F and a positive integer K, is 
there an escape path of length at least K? 
The following lemma holds. 
Lemma 11. The MRL problem is NP-complete in the strong sense. 
Proof. We reduce the problem of testing whether there exists a Hamiltonian path 
between two given vertices of a graph (HP for short), known to be NP-complete (see 
[6]), to the MRL problem. Since it is easy to give a non-deterministic polynomial-time 
algorithm that solves the MRL problem we conclude that MRL is NP-complete. 
Let H = (V,E) be the input graph for the HP problem. Without loss of generality, 
assume that V={1,2,..., n} and that 1 and n are the vertices to be tested. 
Consider the following instance of the MRL problem. The array A consists of
$N = (6n^3 - 3n^2 - 9n + 8)/2$ processing elements. For $i = 1, 2, \ldots, n$ define
$$a_i = (n + i - 2)n^2 + \frac{(n - 1)(n - 2) + (i - 1)(i - 2)}{2} + 1.$$
The link redundancy G has, for each edge $(i, j) \in E$, a bypass link of length $|a_i - a_j|$.
Moreover, there is an additional bypass link of length $g = a_1$ (it is easy to see that this
is the longest link). The fault pattern consists of all the processing elements $p_k$ such
that $k \ne a_i$, for $i = 1, 2, \ldots, n$, i.e., the only non-faulty elements are $p_{a_1}, p_{a_2}, \ldots, p_{a_n}$.
Formally, we have that $F = \{(1, g - 1), (a_1 + 1, a_2 - a_1 - 1), \ldots, (a_{n-1} + 1, a_n - a_{n-1} - 1), (a_n + 1, g - 1)\}$.
Finally, let $K = n$.
Notice that the above MRL instance can be constructed in time polynomial in the 
size n of the graph and all the integers occurring in the description of the instance are 
polynomially related to n. 
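The construction is easy to reproduce for small graphs. The following sketch builds the instance from a vertex count and an edge list; the function name `mrl_instance` and the returned tuple layout are assumptions made for illustration, and the closed form for $a_i$ is the one that makes $N = a_n + g - 1$ agree with the stated value of N.

```python
from typing import List, Sequence, Tuple


def mrl_instance(n: int, edges: Sequence[Tuple[int, int]]):
    """Build (N, G, F, positions) for a graph on vertices 1..n."""
    def a(i: int) -> int:
        # both products are even, so the integer division is exact
        return (n + i - 2) * n * n + ((n - 1) * (n - 2) + (i - 1) * (i - 2)) // 2 + 1

    pos: List[int] = [a(i) for i in range(1, n + 1)]   # the only working processors
    g = pos[0]                                         # longest bypass link, g = a_1
    G = sorted({abs(a(i) - a(j)) for i, j in edges} | {g})
    F = [(1, g - 1)]                                   # g - 1 faults before p_{a_1}
    F += [(p + 1, q - p - 1) for p, q in zip(pos, pos[1:])]
    F.append((pos[-1] + 1, g - 1))                     # g - 1 faults after p_{a_n}
    N = pos[-1] + g - 1
    return N, G, F, pos
```

For a path graph on three vertices the instance has 58 processors, of which only three are working, and the link set contains one length per edge plus the long link g.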
We will prove that H has a Hamiltonian path between vertices 1 and n if and only if the
above instance of the MRL problem admits a solution, i.e., if there is an escape path of
size n. In order to do so, we first establish the following three facts.
(i) Any escape path must traverse $p_{a_1}$ and $p_{a_n}$. Indeed, the first and the last block
of faults consist of g - 1 faulty elements and thus any escape path must traverse
$p_{a_1}$ and $p_{a_n}$ because the longest link has length g.
(ii) If $p_{a_i}$, with $1 < i < n$, is traversed by an escape path, then it must be traversed
after $p_{a_1}$ and before $p_{a_n}$. Indeed, let $d_{i,j}$, $1 \le i \ne j \le n$, be the distance between $p_{a_i}$
and $p_{a_j}$, i.e., $d_{i,j} = |a_j - a_i| = n^2|i - j| + |(j - 1)(j - 2)/2 - (i - 1)(i - 2)/2|$. Since
(for $i \ne j$) it holds that $n^2|i - j| \le d_{i,j} < n^2|i - j| + n^2$, then (for $1 \le i \ne j,\ u \ne v \le n$),
we have $d_{i,j} = d_{u,v}$ if and only if $\{i, j\} = \{u, v\}$.
(iii) Graph H is isomorphic to the graph consisting of the non-faulty elements $p_{a_i}$,
$i = 1, 2, \ldots, n$ and their incident links. Indeed, since $d_{i,j} \le d_{1,n} < g$, $1 \le i \ne j \le n$,
processors $p_{a_i}$ and $p_{a_j}$ are connected by a bypass link if and only if vertices
i and j are connected by an edge in graph H. Moreover, since no other two
working processors are at a distance $d_{i,j}$, this bypass link connects only $p_{a_i}$
and $p_{a_j}$.
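The distance properties used above can be verified by brute force for small n. The sketch below is only an illustration: it assumes $a_i = (n + i - 2)n^2 + ((n - 1)(n - 2) + (i - 1)(i - 2))/2 + 1$ and $g = a_1$, and the helper names `distances_are_distinct` and `induced_edges` are hypothetical.

```python
from itertools import combinations


def a(i, n):
    """Position of the i-th working processor (assumed formula for a_i)."""
    return (n + i - 2) * n * n + ((n - 1) * (n - 2) + (i - 1) * (i - 2)) // 2 + 1


def distances_are_distinct(n):
    """Check that all d_{i,j} are pairwise distinct and smaller than g = a_1."""
    d = [a(j, n) - a(i, n) for i, j in combinations(range(1, n + 1), 2)]
    return len(set(d)) == len(d) and max(d) < a(1, n)


def induced_edges(n, edges):
    """Pairs (i, j), i < j, whose working processors are joined by a bypass link."""
    g = a(1, n)
    lengths = {abs(a(i, n) - a(j, n)) for i, j in edges} | {g}
    return {(i, j) for i, j in combinations(range(1, n + 1), 2)
            if a(j, n) - a(i, n) in lengths}


# The graph induced on the working processors coincides with H.
H = {(1, 2), (2, 4), (4, 5), (1, 5)}
assert induced_edges(5, H) == H
```

Because every pair of working processors is at a different distance, a bypass link length introduced for one edge of H cannot accidentally connect another pair, which is what the isomorphism argument relies on.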
Now, we can prove that there is an escape path of length at least K = n if and only if 
there is a Hamiltonian path between vertices 1 and n in the graph H. 
Assume that there is an escape path of size K = n. Since in A there are exactly K 
working processors, each processor is involved in the escape path. Since all the working 
processors are traversed, by (i)-(iii) we conclude that there exists a Hamiltonian path 
between vertices 1 and n in H (recall that by the definition of path each processor can 
be traversed at most once). 
Conversely, given a Hamiltonian path between vertices 1 and n in H, by (iii) it
corresponds to a path from $p_{a_1}$ to $p_{a_n}$, which traverses once all the non-faulty
processing elements of A. This path can be easily extended to an escape path of size K = n
connecting I to $p_{a_1}$ and $p_{a_n}$ to O by means of the longest bypass link.
Therefore, we can test if there exists a Hamiltonian path between vertices 1 and n
in H by testing if there exists an escape path of size at least K for the array A. □

    ...
    LENGTH[f_1 - g + 1] = 0; i = f_1 - g + 1
    while i < f_n + l_n + g - 2 do
        if p_i is a working processor and LENGTH[i] is defined then
            for h ∈ {1} ∪ G do
                if LENGTH[i] + 1 > LENGTH[i + h] or LENGTH[i + h] is undefined then
                    LENGTH[i + h] = LENGTH[i] + 1
                    if h > 1 then
                        Let t be the integer s.t. h = g_t
    ...
    P = REC_SET[f_n + l_n + g - 2]
    return(P)
Fig. 5. The algorithm MAXIMUM-SET.
The strong NP-completeness of MRL clearly implies the strong NP-hardness of the 
problem of finding a maximum escape path for a bidirectional array. 
When the array is unidirectional, the problem of finding a maximum escape path is 
“easy” and can be solved in O(kng) time. 
Fig. 5 shows an algorithm, called MAXIMUM-SET, which, given the redundancy of 
a unidirectional array A and a non-catastrophic fault pattern, constructs a maximum 
reconfiguration set for A. Before analyzing algorithm MAXIMUM-SET we remark that in 
the code of MAXIMUM-SET we use, for the sake of simplicity, assignments of sets to the 
REC_SETs, whose cost is proportional to the cardinality of the sets. However, we can
construct the REC_SETs by means of pointers, so that each assignment takes constant
time. Hence, though we use assignments of sets, we consider that each assignment 
takes constant time. 
Lemma 12. MAXIMUM-SET is correct and constructs a maximum reconfiguration set in
$O(\ell k)$ time, where $\ell$ is the number of working processors in the fault zone.
Proof. Let us define the set $B[s] = \{\, i \mid (i = s - g_t,\ t = 1, 2, \ldots, k$ or $i = s - 1)$ and $p_i$ is not
faulty$\,\}$. Observe that, since the array is unidirectional, we can reach processor $p_s$ only
from one of $\{p_i \mid i \in B[s]\}$.
Let z be an integer such that $f_1 - g + 1 \le z \le f_n + l_n + g - 2$. We want to prove
the following invariant: at the iteration of the while for which i = z, LENGTH[z] is the
length of a longest path from processor $f_1 - g + 1$ to processor z, and REC_SET[z] is
a reconfiguration set that achieves a path (from $p_{f_1 - g + 1}$ to $p_z$) of such a length. We
prove the invariant by induction. It is easy to see that the invariant holds for $z < f_1$,
for which LENGTH[z] = $z - (f_1 - g + 1)$.
Suppose that the invariant holds for any j < z. Consider the iterations of the while
in which i was equal to j with $j \in B[z]$ (notice that if this set is empty, no path to
$p_z$ exists). The algorithm has already considered the path to $p_z$ passing through $p_j$:
this path has been discarded if it was shorter than an already considered path, while it
has been stored in REC_SET[z], and its length in LENGTH[z], if it was the longest among
all the already considered paths. Hence, once i = z, all the possible paths have been
considered and the longest one has been recorded.
Thus, LENGTH[f_n + l_n + g - 2] is the length of a longest path from processor $f_1 - g + 1$
to processor $f_n + l_n + g - 2$, and REC_SET[f_n + l_n + g - 2] is a reconfiguration set that
achieves a path of such a length. Moreover, since the array is unidirectional, any
maximum escape path must pass through $f_1 - g + 1$ and $f_n + l_n + g - 2$. This means
that REC_SET[f_n + l_n + g - 2] is a maximum reconfiguration set.
The complexity of MAXIMUM-SET is easily computed: the first for takes $O(\ell)$ time
and the while with the nested for takes $O(\ell k)$ time. Hence, the algorithm runs in time
$O(\ell k)$. □
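The dynamic programming idea behind MAXIMUM-SET can be sketched in Python as follows. This is not a line-by-line transcription of Fig. 5: the function name `maximum_set`, its parameters, and the use of one back-pointer per processor in place of explicit REC_SET assignments are assumptions made for illustration (the back-pointers play the role of the pointer-based construction mentioned before Lemma 12).

```python
def maximum_set(start, end, faulty, G):
    """Longest path from p_start to p_end over working processors.

    Links go forward only: the regular link of length 1 and the bypass
    links whose lengths are listed in G. Returns (length, links), where
    links lists the bypass links used as (source index, link length),
    or None if p_end cannot be reached.
    """
    length = {start: 0}   # LENGTH[i]; a missing key means "undefined"
    pred = {start: None}  # back-pointer standing in for REC_SET[i]
    for i in range(start, end):
        if i in faulty or i not in length:
            continue
        for h in [1] + sorted(G):
            j = i + h
            if j > end or j in faulty:
                continue
            # Keep the longest path found so far to p_j.
            if j not in length or length[i] + 1 > length[j]:
                length[j] = length[i] + 1
                pred[j] = (i, h)
    if end not in length:
        return None
    # Walk the back-pointers to collect the bypass links that were used.
    links, j = [], end
    while pred[j] is not None:
        i, h = pred[j]
        if h > 1:
            links.append((i, h))
        j = i
    return length[end], links[::-1]


# Faults at positions 3 and 4, a single bypass link of length 3.
result = maximum_set(0, 10, {3, 4}, [3])
```

Processors are visited in increasing index order, so when $p_i$ is processed every processor that can reach it has already been handled; this is the invariant proved above. The two nested loops perform O(k) work per working processor.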
The next lemma states that if the chunks of a fault pattern are large enough, then
the fault pattern can be “split” into several fault patterns which can be considered
separately.
Lemma 13. Let $F = \{(f_1, l_1), \ldots, (f_n, l_n)\}$ be a fault pattern for a unidirectional
array with redundancy G. If there exist integers $j_1, j_2, \ldots, j_{s-1}$, with $j_1 < j_2 < \cdots < j_{s-1} < n$,
such that chunk$_{j_i}$, $1 \le i < s$, has more than $2g - 4$ processors then F is a CFP if and
only if at least one among the fault patterns $F_1 = \{(f_1, l_1), \ldots, (f_{j_1}, l_{j_1})\}$,
$F_2 = \{(f_{j_1 + 1}, l_{j_1 + 1}), \ldots, (f_{j_2}, l_{j_2})\}$, $\ldots$, $F_s = \{(f_{j_{s-1} + 1}, l_{j_{s-1} + 1}), \ldots, (f_n, l_n)\}$ is a CFP.
Proof. For the sake of contradiction, assume that F is catastrophic and none among
$F_1, F_2, \ldots, F_s$ is catastrophic. Since chunk$_{j_i}$ has more than $2g - 4$ processors, any escape
path for $F_i$ ends before the beginning of any escape path for $F_{i+1}$. Then, concatenating
the escape paths of $F_1, F_2, \ldots, F_s$, we can construct an escape path for F, contradicting
the hypothesis that F is catastrophic. □
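The splitting used in the lemma can be illustrated with a short Python sketch. It assumes that a block (f, l) denotes l consecutive faulty processors starting at index f, so that the chunk between two consecutive blocks has f_next - (f + l) working processors; the function name `split_fault_pattern` is hypothetical.

```python
def split_fault_pattern(F, g):
    """Split the blocks F = [(f_1, l_1), ...] wherever the chunk between two
    consecutive blocks has more than 2g - 4 working processors."""
    parts, current = [], [F[0]]
    for (f_prev, l_prev), (f_next, l_next) in zip(F, F[1:]):
        chunk = f_next - (f_prev + l_prev)  # working processors in between
        if chunk > 2 * g - 4:
            # The chunk is large enough: start a new sub-pattern.
            parts.append(current)
            current = []
        current.append((f_next, l_next))
    parts.append(current)
    return parts


# With g = 4 the threshold is 2g - 4 = 4; only the middle chunk (16) exceeds it.
parts = split_fault_pattern([(10, 2), (13, 1), (30, 3), (34, 2)], 4)
```

Each resulting sub-pattern can then be tested or reconfigured on its own, as the lemma allows.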
Theorem 14. Given a unidirectional array with k redundant links and whose longest
link has length g, the problem of finding a maximum escape path for a non-catastrophic
fault pattern of n blocks of faults is solvable in time O(kng).
Proof. Let F be the fault pattern. Split F into s fault patterns, $F_1, \ldots, F_s$, such that
between $F_i$ and $F_{i+1}$, for $i = 1, 2, \ldots, s - 1$, there is a chunk of more than $2g - 4$
processors. By Lemma 13, a maximum reconfiguration set for F is given by the union
of the s reconfiguration sets obtained by applying the algorithm MAXIMUM-SET to $F_1, \ldots, F_s$.
Let $n_i$ be the number of blocks in $F_i$, $1 \le i \le s$. Clearly, $\sum_{i=1}^{s} n_i = n$. Since the number
of elements in the fault zone for $F_i$ is less than $n_i g + (n_i - 1)(2g - 3) + 2g - 2$
(remember that each chunk inside $F_i$ has at most $2g - 4$ processors and that $F_i$ is not
catastrophic for A, hence each block has fewer than g elements), by Lemma 12, constructing a
maximum reconfiguration set for $F_i$ takes $O(n_i g k)$ time. Hence, constructing a maximum
reconfiguration set for F takes $O(\sum_{i=1}^{s} n_i g k)$ time, that is, O(kng) time. □
6. Minimum escape paths 
In this section we consider the problem of finding minimum escape paths. When 
the array is bidirectional we provide an algorithm solving the problem in O(kn) time, 
and for the unidirectional case we propose an algorithm solving the problem in O(kng)
time. First, we consider the case of bidirectional links.
Theorem 15. Given a bidirectional array with k redundant links whose longest link
has length g, the problem of finding a minimum escape path for a non-catastrophic
fault pattern of n blocks of faults is solvable in O(kn) time.
Proof. By Lemma 5, an escape path exists if and only if chunks $C_0$ and $C_n$ in the
derived graph are connected. Since each edge in the derived graph corresponds to
the use of a redundant link, the number of redundant links that we have to use in
any reconfiguration set is at least equal to the length of the shortest path from $C_0$
to $C_n$ in the derived graph. On the other hand, given a path from $C_0$ to $C_n$ it is
easy to obtain a reconfiguration set whose cardinality is exactly the length of the path.
Hence, we can find a minimum escape path by finding a shortest path from $C_0$ to
$C_n$ in the derived graph. It is well known that this problem is solvable in time linear
in the number of edges. The derived graph has at most O(kn) edges, because it is
constructed in O(kn) time. □
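The shortest-path step of the proof is ordinary breadth-first search on an unweighted graph. The sketch below assumes that the derived graph has already been built and is given as an adjacency dictionary whose vertices are the chunk indices 0, ..., n; this representation and the name `shortest_path` are assumptions made for illustration.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Breadth-first search; returns the vertices of a shortest path or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            # Rebuild the path by following parent pointers back to the source.
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


# A toy derived graph on chunks C_0, ..., C_3.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
path = shortest_path(adj, 0, 3)
links_needed = len(path) - 1  # one redundant link per edge of the path
```

Each vertex and edge is examined at most once, so the search is linear in the size of the derived graph, which gives the O(kn) bound of the theorem.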
Now we consider the case of unidirectional links. In this case we can use the same 
technique used to find a maximum escape path. Fig. 6 shows an algorithm, called 
MINIMUM-SET, that, given the redundancy of a redundant array and a non-catastrophic 
fault pattern, constructs a minimum reconfiguration set. 
Lemma 16. Algorithm MINIMUM-SET is correct and constructs a minimum reconfiguration
set in time $O(k\ell)$, where $\ell$ is the number of working processors in the fault
zone.
Proof. Let us define the set $B[s] = \{\, i \mid (i = s - g_t,\ t = 1, 2, \ldots, k$ or $i = s - 1)$ and $p_i$ is not
faulty$\,\}$. Observe that since the array is unidirectional we can reach processor $p_s$ only
from one of $\{p_i \mid i \in B[s]\}$.
    MINIMUM-SET(F, G, P)
    for i = f_1 - g + 1 to f_n + l_n + g - 2 do
        LINKS[i] = undefined; REC_SET[i] = ∅
    endfor
    LINKS[f_1 - g + 1] = 0; i = f_1 - g + 1
    while i < f_n + l_n + g - 2 do
        if p_i is a working processor and LINKS[i] is defined then
            if LINKS[i] < LINKS[i + 1] or LINKS[i + 1] is undefined then
                LINKS[i + 1] = LINKS[i]; REC_SET[i + 1] = REC_SET[i]
            endif
            for h ∈ G do
                if LINKS[i] + 1 < LINKS[i + h] or LINKS[i + h] is undefined then
                    LINKS[i + h] = LINKS[i] + 1
    ...
    P = REC_SET[f_n + l_n + g - 2]
    return(P)
Fig. 6. The algorithm MINIMUM-SET.
Fix an integer z, $f_1 - g + 1 \le z \le f_n + l_n + g - 2$. As in Lemma 12, we can prove by
induction the following invariant: at the iteration of the while for which i = z, LINKS[z]
is the minimum number of redundant links that any path from processor $f_1 - g + 1$ to
processor z must use, and REC_SET[z] is a reconfiguration set that achieves a path using
such a minimum number of redundant links.
Thus, LINKS[f_n + l_n + g - 2] is the minimum number of redundant links that any path
from processor $f_1 - g + 1$ to processor $f_n + l_n + g - 2$ must use, and
REC_SET[f_n + l_n + g - 2] is a reconfiguration set that achieves an escape path that uses such a number of
redundant links. Hence, REC_SET[f_n + l_n + g - 2] is a minimum reconfiguration set.
The complexity of MINIMUM-SET is easily computed: the first for takes $O(\ell)$ time.
The while with the nested for takes $O(\ell k)$ time. Hence, the algorithm runs in
$O(\ell k)$ time. □
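A Python sketch of the same dynamic programming scheme, now minimizing the number of bypass links, is given below. As with the maximum case, it is an illustration rather than a transcription of Fig. 6: the name `minimum_set`, its parameters, and the back-pointers used in place of REC_SET assignments are assumptions.

```python
def minimum_set(start, end, faulty, G):
    """Path from p_start to p_end using as few bypass links as possible.

    Regular links (length 1) cost 0; every bypass link in G costs 1.
    Returns (number of bypass links, links used as (source, length)),
    or None if p_end cannot be reached.
    """
    links = {start: 0}    # LINKS[i]; a missing key means "undefined"
    pred = {start: None}  # back-pointer standing in for REC_SET[i]
    for i in range(start, end):
        if i in faulty or i not in links:
            continue
        for h, cost in [(1, 0)] + [(h, 1) for h in sorted(G)]:
            j = i + h
            if j > end or j in faulty:
                continue
            # Keep the cheapest way found so far to reach p_j.
            if j not in links or links[i] + cost < links[j]:
                links[j] = links[i] + cost
                pred[j] = (i, h)
    if end not in links:
        return None
    # Walk the back-pointers to collect the bypass links that were used.
    used, j = [], end
    while pred[j] is not None:
        i, h = pred[j]
        if h > 1:
            used.append((i, h))
        j = i
    return links[end], used[::-1]


# Faults at positions 3 and 4, a single bypass link of length 3.
result = minimum_set(0, 10, {3, 4}, [3])
```

The only differences from the maximum case are the cost assigned to each link and the direction of the comparison, which mirrors the relation between Fig. 5 and Fig. 6.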
Theorem 17. Given a unidirectional array with k redundant links and whose longest
link has length g, the problem of finding a minimum escape path for a non-catastrophic
fault pattern of n blocks of faults is solvable in time O(kng).
Proof. Let F be the fault pattern. Split F into s fault patterns, $F_1, \ldots, F_s$, such that
between $F_i$ and $F_{i+1}$, for $i = 1, 2, \ldots, s - 1$, there is a chunk of more than $2g - 4$
processors. By Lemma 13, a minimum reconfiguration set for F is given by the union
of the s reconfiguration sets obtained by applying the algorithm MINIMUM-SET to $F_1, \ldots, F_s$.
The rest of the proof is as in Theorem 14. □
7. Summary and open questions 
In this paper we studied the problem of providing fault-tolerant capabilities to par- 
allel architectures by means of redundancy. This approach consists of adding spare 
processors and extra links that can be used to bypass faulty processing elements and 
reconfigure the architecture with no slow down in the performance. 
No matter how much redundancy is provided, it is always possible to have a set of 
faulty elements for which no reconfiguration is possible. Such sets of faults are called 
catastrophic. 
Before attempting any reconfiguration it is important to test whether the set of faults
is catastrophic. When a set of faults is not catastrophic it is important to provide
efficient reconfiguration algorithms that compute optimal reconfigurations.
In this paper we have considered linear arrays of processing elements. We have 
considered both the case when the array has bidirectional links and the case when the 
array has unidirectional links. We have provided new testing algorithms which improve
and generalize previously known algorithms.
We have also considered the problem of finding an optimal reconfiguration when the
set of faults is not catastrophic. Optimality is considered either with respect to the size
of the reconfigured array or with respect to the amount of changes needed to reconfigure the
array. We proved that when the links are bidirectional, the problem of finding an optimal
reconfiguration with respect to the size of the reconfigured array is NP-hard in the
strong sense. In all the other three cases we provided algorithms which efficiently find
an optimal reconfiguration.
In this paper the case of linear arrays has been extensively studied; however, some
questions still remain open. For instance, the given O(kng) algorithm to find an optimal
reconfiguration of a unidirectional array maximizing the size of the reconfigured array
is pseudo-polynomial in g. Better algorithms for this problem might exist.
Also, the problem of failures of the links has not been considered yet. Hence, another
direction for research is to consider fault patterns consisting of both processing
elements and links.
Other parallel architectures are used in practice, in particular bidimensional arrays:
memory chips are organized in this form and many existing parallel machines have
a mesh architecture (see, e.g., [11]), and their importance is still increasing.
The approach adopted here might be useful also to study bidimensional arrays of 
processors. 
References 
[1] K.P. Belkhale, P. Banerjee, Reconfiguration strategies in VLSI processor arrays, in: Proc. Internat. Conf.
on Computer Design, 1988, pp. 418-421.
[2] J. Bruck, R. Cypher, C. Ho, Tolerating faults in a mesh with a row of spare nodes, in: 4th IEEE Symp.
on Parallel and Distributed Processing, Arlington, 1992, pp. 12-19.
[3] M. Chean, J.A.B. Fortes, A taxonomy of reconfiguration techniques for fault-tolerant processor arrays,
IEEE Comput. 23 (1990) 55-69.
[4] R. De Prisco, A. De Santis, Catastrophic faults in reconfigurable VLSI linear arrays, Discrete Appl.
Math. 75 (1997) 105-123.
[5] R. De Prisco, A. Monti, On reconfiguration of VLSI linear arrays, in: 3rd Workshop on Algorithms
and Data Structures, Lecture Notes in Computer Science, vol. 709, 1993, pp. 553-564.
[6] M. Garey, D. Johnson, Computers and Intractability, Freeman, New York, 1979.
[7] J.W. Greene, A. El Gamal, Configuration of VLSI arrays in the presence of defects, J. ACM 31 (1984)
694-717.
[8] E. Horowitz, S. Sahni, Fundamentals of Computer Algorithms, Computer Science Press, Rockville, MD,
1978.
[9] S.H. Hosseini, On fault-tolerant structure, distributed fault-diagnosis, reconfiguration, and recovery of
the array processors, IEEE Trans. Comput. 38 (1989) 932-942.
[10] C. Kaklamanis, A.R. Karlin, F.T. Leighton, V. Milenkovic, P. Raghavan, S. Rao, C. Thomborson,
A. Tsantilas, Asymptotically tight bounds for computing with faulty arrays of processors, in: Proc. of
31st Annual Symp. on Foundation of Computer Science, 1990, pp. 285-296.
[11] H.T. Kung, Why systolic architecture?, IEEE Comput. 15 (1982) 37-46.
[12] M.T. Liu, Distributed Loop Computer Network, Adv. Comput. 17 (1978) 163-221.
[13] A. Nayak, On reconfigurability of some regular architectures, Ph.D. Thesis, Dept. System and Computer
Engineering, Carleton University, Ottawa, Canada, 1991.
[14] A. Nayak, L. Pagli, N. Santoro, Efficient construction for VLSI reconfigurable arrays, Integration VLSI
J. 15 (1993) 133-150.
[15] A. Nayak, N. Santoro, Bounds on performance of VLSI processor arrays, in: 5th Internat. Parallel
Processing Symp., Anaheim, CA, May 1991.
[16] A. Nayak, N. Santoro, R. Tan, Fault-intolerance of reconfigurable systolic arrays, in: Proc. 20th Internat.
Symp. on Fault Tolerant Computing, FTCS'20, 1990, pp. 202-209.
[17] R. Negrini, M.G. Sami, R. Stefanelli, Fault-tolerance techniques for array structures used in
supercomputing, IEEE Comput. 19 (1986) 78-87.
[18] L. Pagli, G. Pucci, Counting the number of fault pattern in redundant VLSI arrays, Inform. Process.
Lett. 50 (1994) 337-342.
[19] D.K. Pradhan (Ed.), Fault-Tolerant Computing: Theory and Techniques, vols. 1 and 2, Prentice-Hall,
Englewood Cliffs, NJ, 1986.
[20] C.S. Raghavendra, A. Avizienis, M.D. Ercegovac, Fault tolerance in binary tree architectures, IEEE
Trans. Comput. C-33 (1984) 568-572.
[21] A.L. Rosenberg, The Diogenes approach to testable fault-tolerant arrays of processors, IEEE Trans.
Comput. 32 (1983) 902-910.
[22] V.P. Roychowdhury, J. Bruck, T. Kailath, Efficient algorithms for reconfiguration in VLSI/WSI arrays,
IEEE Trans. Comput. 39 (1990) 480-489.
[23] M. Sami, R. Stefanelli, Reconfigurable architectures for VLSI processing arrays, Proc. IEEE 74 (1986)
712-722.
