Achieving fault-tolerance through incorporation of redundancy and reconfiguration is quite common. In this paper we study the fault-tolerance of linear arrays of N processors with k bypass links whose maximum length is g. We consider both arrays with bidirectional links and unidirectional links.
We first consider the problem of testing whether a set of n faulty processors is catastrophic, i.e., precludes reconfiguration. We provide new testing algorithms which improve and generalize known testing algorithms.
For bidirectional arrays we provide an O(kn) time testing algorithm and for unidirectional arrays we provide an O(n) time algorithm for the case k = 1, and an O(kn log k) time algorithm, for the case k > 1.
When the fault pattern is not catastrophic we study the problem of finding an optimal reconfiguration of the array. We consider optimality with respect to two parameters: the size of the reconfigured array and the number of redundant links to activate. Considering optimality with respect to the size of the reconfigured array, we prove that the problem is NP-hard in the strong sense if the bypass links are bidirectional, while it can be
Introduction
In a linear array of N processing elements, one faulty element is sufficient to stop the flow of information from one side to the other. Without the provision of fault-tolerance capabilities, the yield of VLSI chips for such an architecture would be so poor that its production would be unacceptable. A lot of research has been devoted to the design of fault-tolerant parallel architectures.
The most important techniques for this purpose can be divided into two main groups. The first one does not make use of redundancy on the given architecture but tries to simulate the global functioning using the healthy part of the machine (e.g., [10, 13]). This approach uses simulation algorithms which should guarantee the same functionality with a reasonable slowdown in time.
The second group includes the techniques that do add redundancy to the given architecture. This approach maintains the desired structure by isolating faults and activating certain spare links or processors (e.g., [3, 6, 9, 19, 21-23]).
Our approach belongs to the second group. We consider linear arrays with both spare processors and links. Besides the regular links connecting neighboring processing elements, extra links, called bypass links, are included in a regular fashion. These redundant links can be activated in a reconfiguration phase to bypass faulty processors. In this work we make the following assumptions:
- Only processors can fail.
- Faults are total, that is, faulty processors cannot route or compare.
- Faults are static, that is, faulty processors cannot be repaired.
Redundant processing elements are used to replace any faulty processor. Redundant links are used to bypass the faulty processors and, possibly, to reach redundant processors used as replacements.
There are essentially two different ways of allocating extra links to the given architecture, namely:
(i) There are spare communication lines that any working processor can use to bypass faulty processors. Communication is realized through switches located before and after each processor. These switches can be activated to reconfigure the array. This approach has been extensively studied [2, 3, 9, 17, 19, 22].
(ii) A fixed set of spare links is dedicated to each processor. In this case a multiplexer located inside the processing element can route messages onto one of its private, spare links in case of faults. This approach was first introduced for tree and ring architectures [12, 20], and later extended to linear arrays [4, 5, 13-16, 18].
In this work we follow the second strategy, which is more suitable for the important case of production-time reconfiguration of faulty devices [16]. Indeed, the first strategy allows a general on-line reconfiguration at the expense of a larger propagation time along the spare lines. This propagation time may become intolerably large in a fixed communication pattern, in particular when the message must traverse a chain of switches to bypass a sequence of consecutive faulty processors.
Both strategies require that a switch (or multiplexer) be traversed at the input and output of each processing element. We assume that both the number of spare links and the length of the longest link are reasonably small, so that the circuitry added to each processor is simple, and the communication delays along the links are negligible. Therefore, the total propagation time depends only on the number of processors, which is fixed.
Since any processing element in the array may be faulty, each one of them has to be provided with the bypass links. The connections to these bypass links must occupy different tracks in the chip. Hence the total area required by the interconnection network is proportional to the length of the array, the number of extra links per processor, and the length of the longest link. Finally, note that in each processor, a modest amount of circuitry (multiplexer and self-repairing control unit) must be devoted to implementing the proposed routing discipline. Although, in principle, this discipline could be applied to any chip, it is clearly advisable when the functionality of a processing element is not too elementary.
This approach has some inherent limits. Under the realistic assumption that the length of the longest link is small with respect to the number of processing elements, regardless of the amount of redundancy, there are sets of faults occurring at strategic positions which affect the chip in a non-repairable way (see [13]).
In this paper, following a completely different approach, we consider the more general problem of testing whether a fault pattern consisting of n faults (see footnote 1) is catastrophic. In addition, when a fault pattern is not catastrophic, we consider the problem of finding optimal reconfiguration strategies, where optimality is with respect to either the number of processors in the reconfigured array (the reconfiguration is optimal if this number is maximized) or the number of redundant links to activate in order to reconfigure the array, that is, the amount of work needed to reconfigure the array (the reconfiguration is optimal if this number is minimized).
Our results are the following.
The problem of testing whether a set of n faulty processing elements is catastrophic for a redundant array with k bypass links can be solved in time O(kn) when the links in the array are bidirectional, and in time O(nk log k) if k > 1, or in time O(n) if k = 1, when the links in the array are unidirectional.
The problem of finding a reconfiguration strategy that is optimal with respect to the size of the reconfigured array is NP-hard in the strong sense when the links are bidirectional, while it can be solved in time O(kng), where g is the length of the longest bypass link, when the links are unidirectional.
The problem of finding a reconfiguration strategy that is optimal with respect to the number of redundant links used in the reconfiguration is solvable in O(kn) time when the links are bidirectional, and in O(kng) time when the links are unidirectional.
We provide algorithms for all the cases in which the problem can be solved, that is, for all but the problem of finding an optimal reconfiguration strategy in arrays with bidirectional links, where optimality is with respect to the size of the reconfigured array.
This paper is organized as follows. Basic concepts and a formal definition of the problem are introduced in Section 2. A testing algorithm for arrays with bidirectional links is given in Section 3. In Section 4, a testing algorithm for arrays with unidirectional links is provided. Section 5 contains results on reconfiguration strategies that are optimal with respect to the size of the reconfigured array. Section 6 contains results on reconfiguration strategies that are optimal with respect to the number of redundant links used. Finally, Section 7 contains concluding comments and open questions.
Footnote 1: We remark that we will actually consider, without loss of generality, fault patterns of $m$ faults, $m \ge n$, subdivided into $n$ blocks of consecutive faulty processors.
Preliminaries
The basic components of a redundant linear array are the processing elements, or simply processors, and the links. There are two kinds of links: regular and bypass. Regular links exist between neighboring processors, while the bypass links connect non-neighboring processors. The bypass links are used only for reconfiguration purposes when faulty processors are detected.
More precisely, let $A = \{p_1, \ldots, p_N\}$ denote a linear array of identical processing elements connected by regular links $(p_i, p_{i+1})$, $1 \le i < N$. Let $G = \{g_1, \ldots, g_k\}$ be an ordered set of integers such that $g_1 < g_2 < \cdots < g_k$. We say that $A$ has redundancy $G$ if, for each $g_t$, $1 \le t \le k$, there is a bypass link $(p_i, p_{i+g_t})$, $1 \le i \le N - g_t$. Notice that the set $G$ does not contain the regular link. We denote by $g$ the length of the longest bypass link, i.e., $g = g_k$.
At the extremities of the array two special processors, called I (for Input) and O (for Output), are responsible for the I/O functions of the system. We assume that I is connected to $p_1, \ldots, p_g$ while O is connected to $p_{N-g+1}, \ldots, p_N$, so that bottlenecks at the borders of the array are avoided.
Example 1. Figure 1 shows a linear array of 20 processing elements with redundancy $G = \{4\}$.
We refer to this structure as a redundant linear array or as a redundant array or simply as an array. The array is called bidirectional or unidirectional according to the nature of its links. We admit faults occurring in the processors only (i.e., both I and O and the links always operate correctly). We refer to a processor $p_i$ as processor $i$ or simply as $p_i$.
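The link structure just described can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: processors are numbered from 1, faults are ignored, and the function names `neighbors` and `io_connections` are hypothetical, not taken from the paper.

```python
def neighbors(i, N, G, bidirectional=True):
    """Processors directly linked to p_i in an array of N processors
    with redundancy G (fault-free view, 1-based indices)."""
    linked = []
    for step in [1] + sorted(G):          # regular link, then bypass links
        if i + step <= N:
            linked.append(i + step)       # forward link (p_i, p_{i+step})
        if bidirectional and i - step >= 1:
            linked.append(i - step)       # same link used backwards
    return sorted(linked)


def io_connections(N, G):
    """Processors attached to I and to O: p_1..p_g and p_{N-g+1}..p_N."""
    g = max(G)
    return list(range(1, g + 1)), list(range(N - g + 1, N + 1))


# The array of Example 1: 20 processors, redundancy G = {4}.
print(neighbors(5, 20, [4]))                        # links of p_5
print(neighbors(5, 20, [4], bidirectional=False))   # forward links only
print(io_connections(20, [4]))
```

In the unidirectional reading used here, a link can only be traversed from the lower-numbered to the higher-numbered processor.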
Definition 1 For a redundant linear array $A$, a fault pattern $F$ is an ordered set of pairs of positive integers $F = \{(f_1, l_1), (f_2, l_2), \ldots, (f_n, l_n)\}$, where $f_i + l_i < f_{i+1} \le N - l_n + 1$, $1 \le i < n$.
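A minimal Python sketch of Definition 1 may help fix the notation. It reads each pair $(f_i, l_i)$ as a block of $l_i$ consecutive faulty processors starting at $p_{f_i}$, and it interprets the bound on the last block as "the block must fit inside the array"; the function names are hypothetical.

```python
def is_valid_fault_pattern(F, N):
    """Check the ordering constraints of Definition 1 for a list F of
    (f_i, l_i) pairs on an array of N processors."""
    if not F:
        return False
    if any(f < 1 or l < 1 for f, l in F):
        return False                      # pairs of positive integers
    for (f, l), (f_next, _) in zip(F, F[1:]):
        if not f + l < f_next:            # blocks are separated by a working processor
            return False
    f_n, l_n = F[-1]
    return f_n <= N - l_n + 1             # last block ends at or before p_N


def faulty_processors(F):
    """Indices z with f_i <= z < f_i + l_i for some block i."""
    return {z for f, l in F for z in range(f, f + l)}


F = [(5, 1), (7, 3), (12, 2), (16, 1)]
print(is_valid_fault_pattern(F, 20))
print(sorted(faulty_processors(F)))
```

Two adjacent blocks such as (5, 1) and (6, 2) are rejected, since they would have to be merged into the single block (5, 3).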
Each pair $(f_i, l_i)$ identifies the block of faulty processors $p_{f_i}, p_{f_i+1}, \ldots, p_{f_i+l_i-1}$. Hence a faulty processor $p_z$ is such that $f_i \le z < f_i + l_i$ for some $i$, $1 \le i \le n$. Non-faulty processors are working processors. A path from a working processor $i_0$ to a possibly faulty processor $i_{s+1}$ is a sequence of processors $i_0, i_1, \ldots, i_s, i_{s+1}$ such that, for each $j = 0, 1, \ldots, s$, processor $i_j$ is a working processor connected by a link to processor $i_{j+1}$, and $i_j = i_z$ if and only if $j = z$, $0 \le j, z \le s+1$ (that is, a processor is used only once). The length of the path is $s+1$. An escape path is a path from I to O. We represent paths in the following way: since the flow of computation usually goes from processor $i$ to processor $i+1$, it is enough to indicate those processors for which the computation does not continue on the consecutive processor. Formally we give the following definition of a path.
Definition 2 A path $P$ is represented as a triple consisting of a starting processor $p_u$, an ending processor $p_v$ and a set of pairs of integers $\{(e_1, a_1), (e_2, a_2), \ldots, (e_q, a_q)\}$, where $1 \le e_i \le N$, $e_i \ne e_j$ if $i \ne j$, and $-k \le a_i \le k$, for each $i = 1, 2, \ldots, q$.
Processor $e_i$, $i = 1, 2, \ldots, q$, has an active link that is not the regular one. The active link of processor $e_i$ is defined according to $a_i$, namely:
- if $a_i = 0$ the active link is from $p_{e_i}$ to $p_{e_i - 1}$,
- if $a_i < 0$ the active link is from $p_{e_i}$ to $p_{e_i - g_{(-a_i)}}$,
- if $a_i > 0$ the active link is from $p_{e_i}$ to $p_{e_i + g_{a_i}}$.
All other processors have their regular link active. The path represented by $P$ is the sequence of processors obtained starting from $p_u$ and following the active links. It is obvious that this sequence must not contain a faulty processor, except, possibly, the last processor $p_v$. An escape path is a path $P$ for which $p_u$ = I and $p_v$ = O. In representing escape paths we will omit processors I and O. Since by activating the link $a_i$ of the processor $e_i$, for $i = 1, 2, \ldots, q$, we reconfigure the system (or achieve a path from $p_u$ to $p_v$), we call the set $\{(e_1, a_1), (e_2, a_2), \ldots, (e_q, a_q)\}$ a reconfiguration set.
Definition 3 Given a redundant array $A$, a fault pattern is catastrophic for $A$ if and only if no escape path exists.
Given a fault pattern for a redundant array $A$, we focus our attention on that part of $A$ beginning at processor $p_{f_1-g+1}$ and ending at processor $p_{f_n+l_n+g-2}$. We call this part of the array the fault zone. Moreover, since all the processors are indistinguishable, without loss of generality, we will assume that the fault zone begins at processor $p_1$, i.e., $f_1 = g$.
A block of maximum length of working processors in the fault zone will be called a chunk. More formally we give the following definition:
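The path representation of Definition 2 can be decoded mechanically. The following Python sketch is one possible reading, not the paper's code: it starts at $p_u$, follows the active link of each processor (the regular link to the next processor unless the reconfiguration set says otherwise) and stops at $p_v$. The function name `follow_path` and the sample reconfiguration set are illustrative assumptions.

```python
def follow_path(u, v, rec_set, G, faulty, N):
    """Return the sequence of processors from p_u to p_v obtained by
    following active links; rec_set is a list of (e_i, a_i) pairs."""
    G = sorted(G)
    active = dict(rec_set)
    path, seen, cur = [u], {u}, u
    while cur != v:
        if cur in faulty:
            raise ValueError(f"path passes through faulty processor {cur}")
        a = active.get(cur)
        if a is None:
            nxt = cur + 1                 # regular link stays active
        elif a == 0:
            nxt = cur - 1                 # regular link, backwards
        elif a > 0:
            nxt = cur + G[a - 1]          # bypass link g_a, forwards
        else:
            nxt = cur - G[-a - 1]         # bypass link g_(-a), backwards
        if not 1 <= nxt <= N:
            raise ValueError("path leaves the array")
        if nxt in seen:
            raise ValueError("a processor is used more than once")
        path.append(nxt)
        seen.add(nxt)
        cur = nxt
    return path


# Bidirectional array with G = {5}; the faulty processors are an assumed example.
faulty = {5, 7, 8, 9, 12, 13, 16}
rec_set = [(1, 1), (6, 1), (11, 0), (10, 1), (15, 1)]
print(follow_path(1, 20, rec_set, [5], faulty, 20))
```

Only the last processor $p_v$ is allowed to be faulty, which is why the fault check is applied to the processor being left rather than the one being entered.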
Definition 4 Given an array $A$ with redundancy $G$ and a fault pattern $F$, chunk$_i$, $1 \le i \le n-1$, is the block of processors between $f_i + l_i - 1$
The reconfiguration set that achieves this escape path is $\{(6, 1), (10, 0), (9, 1), (13, 1)\}$. Figure 2 shows the fault pattern $F$, the fault zone, the chunks and the escape path $P$.
Notice that only the active links of the processors in the escape path are drawn and that $f_1 \ne g$.
When a fault pattern is not catastrophic, we are interested in finding escape paths. Depending on the fault pattern there can exist several escape paths. We are interested in finding those escape paths that are optimal with respect to either the size of the reconfigured array or the number of redundant links to be activated to reconfigure the array.
In the former case, optimality is achieved when the size of the reconfigured array is maximized, that is, when the number of processors in the escape path that reconfigures the array is maximized. In this case, an optimal escape path is called a maximum escape path, and a reconfiguration set that achieves a maximum escape path is called a maximum reconfiguration set.
In the latter case, optimality is achieved when the number of redundant links that we have to activate in order to reconfigure the array is minimized. In this case, an optimal escape path is called a minimum escape path, and a reconfiguration set that achieves a minimum escape path is called a minimum reconfiguration set.
is connected with $C_n$ in the derived graph. Consider the $i$-th iteration of the internal for. At the beginning of this iteration at least one among $x_i + g_t > y_{j-1}$ and $j = i + 1$ holds. Hence, $j$ is incremented until $x_i + g_t \le y_j$, and for all $j'$ such that $x_i + g_t \le y_{j'}$ and $x_{j'} \le y_i + g_t$, chunk $C_{j'}$ is inserted into $L(C_i)$ and chunk $C_i$ is inserted into $L(C_{j'})$. Hence also $C_s$ is inserted into $L(C_i)$ and $C_i$ is inserted into $L(C_s)$.
Assume that $C_s$ is inserted into $L(C_i)$ and $C_i$ is inserted into $L(C_s)$. Without loss of generality, let $i < s$. There must exist a $g_t$ such that $x_i + g_t \le y_s$ and $x_s \le y_i + g_t$. This implies that there exists an integer $z$ such that $x_i + g_t \le z \le y_i + g_t$ and $x_s \le z \le y_s$. Hence $p_{z - g_t} \in C_i$ and $p_z \in C_s$, and thus, by definition of derived graph, $(C_i, C_s) \in E$. □
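The overlap condition used in the argument above (two chunks $(x_i, y_i)$ and $(x_s, y_s)$, $i < s$, are adjacent when some $g_t$ satisfies $x_i + g_t \le y_s$ and $x_s \le y_i + g_t$) can be sketched in Python as follows. This is an illustration only: it scans all chunk pairs directly instead of using the incremental index $j$ of the analysed procedure, so it does not achieve the O(kn) bound, and the names `chunks_of`, `derived_graph` and `connected` are hypothetical.

```python
from collections import deque


def chunks_of(F, g):
    """Chunks C_0..C_n of the fault zone as (first, last) pairs."""
    firsts = [F[0][0] - g + 1] + [f + l for f, l in F]
    lasts = [f - 1 for f, _ in F] + [F[-1][0] + F[-1][1] + g - 2]
    return list(zip(firsts, lasts))


def derived_graph(chunks, G):
    """Adjacency sets: C_i ~ C_s when some bypass link joins them."""
    adj = {i: set() for i in range(len(chunks))}
    for i, (xi, yi) in enumerate(chunks):
        for s in range(i + 1, len(chunks)):
            xs, ys = chunks[s]
            if any(xi + gt <= ys and xs <= yi + gt for gt in G):
                adj[i].add(s)
                adj[s].add(i)
    return adj


def connected(adj, source, target):
    """Breadth-first search from chunk `source` to chunk `target`."""
    seen, queue = {source}, deque([source])
    while queue:
        c = queue.popleft()
        if c == target:
            return True
        for d in adj[c] - seen:
            seen.add(d)
            queue.append(d)
    return False


F = [(5, 1), (7, 3), (12, 2), (16, 1)]
chunks = chunks_of(F, 5)
adj = derived_graph(chunks, [5])
print(chunks)
print(connected(adj, 0, len(chunks) - 1))  # bidirectional test: not catastrophic
```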
We remark that some redundant information may be present in the adjacency lists (i.e., an edge may appear more than once in the same list); however, this does not affect the order of magnitude of the size of the lists and thus the time complexity of the testing algorithm.
In this section we study the problem of testing whether a fault pattern is catastrophic for a redundant array with unidirectional links. In this case the information (useful for the bidirectional case) about the chunks and their relations captured by the derived graph is not sufficient, because the starting and the ending points of the links connecting two chunks must be taken into account.
Example 4. Consider the fault pattern $F = \{(5, 1), (7, 3), (12, 2), (16, 1)\}$ for a unidirectional array with redundancy $G = \{5\}$. The derived graph erroneously suggests that $C_3$ can be reached from $C_1$. Indeed $F$ is catastrophic, whereas in the derived graph there is a path from $C_0$ to $C_4$.
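Example 4 can be checked by brute force. The Python sketch below searches over individual processors rather than chunks, so it is not the algorithm developed in this section and takes time proportional to the array length; it assumes, for illustration, that the array has exactly N = 20 processors (the fault zone of the example). The function name `is_catastrophic` is hypothetical.

```python
from collections import deque


def is_catastrophic(F, G, N, bidirectional):
    """True if no escape path from I to O exists (exhaustive search)."""
    g = max(G)
    faulty = {z for f, l in F for z in range(f, f + l)}
    steps = [1] + sorted(G)
    if bidirectional:
        steps = steps + [-s for s in steps]
    # I is linked to p_1..p_g and O to p_{N-g+1}..p_N.
    starts = [p for p in range(1, g + 1) if p not in faulty]
    targets = set(range(N - g + 1, N + 1))
    seen, queue = set(starts), deque(starts)
    while queue:
        p = queue.popleft()
        if p in targets:
            return False                  # a working processor linked to O is reachable
        for s in steps:
            q = p + s
            if 1 <= q <= N and q not in faulty and q not in seen:
                seen.add(q)
                queue.append(q)
    return True


F = [(5, 1), (7, 3), (12, 2), (16, 1)]
print(is_catastrophic(F, [5], 20, bidirectional=False))  # unidirectional links
print(is_catastrophic(F, [5], 20, bidirectional=True))   # bidirectional links
```

With unidirectional links the search stops at $p_{11}$, whose two outgoing links lead to the faulty processors $p_{12}$ and $p_{16}$; with bidirectional links the step back from $p_{11}$ to $p_{10}$ opens the route $p_{10}, p_{15}, p_{20}$.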
To cope with this problem we use a different approach and we present a solution requiring O(n) time when k = 1, and O(nk log k) time when k > 1. Informally, the algorithm looks for all the reachable parts of the array starting from chunk$_0$. By a reachable part of the array we mean a set, called a block, of consecutive (faulty or working) processors, such that there is a path from I to each processor of the block. A block will generate new blocks if it contains working processors, i.e., if the block overlaps one or more chunks. The algorithm considers blocks and chunks in increasing order and discards them when they have been exploited to produce new blocks. The order of chunks and blocks is given by the starting position. The algorithm ends when it cannot create new blocks (because all chunks or blocks have been discarded). A fault pattern is not catastrophic if and only if there is a block that lies after the last fault.
In the following we will denote a chunk (or block) by the pair $(x, y)$ where $p_x$ and $p_y$ are the first and the last processor in the chunk (or block), respectively. We say that a pair $(x, y)$ is minimum in a set $X$ of pairs if, for each $(u, v)$ in $X$, $x \le u$ holds, and we say that the pair is maximum if, for each $(u, v)$ in $X$, $u \le x$ holds. Figure 4 shows
Assume that $F$ is not catastrophic. Let $P$ be an escape path. We assume, without loss of generality, that $P$ passes through processor $f_1 - g + 1$. Since $P$ is an escape path, it must bypass the fault zone. Let $p_u$ and $p_v$ be two consecutive processors in the path $P$ (i.e., there is a link $g_t$ between them), such that $u < f_n$ and $v > f_n + l_n$. Clearly there is a path from processor $p_{f_1-g+1}$ to processor $p_u$. Hence a block $(x', y')$ that contains $v$, i.e., $x' \le v \le y'$, is inserted into $B$. Such a block cannot be deleted anymore. Therefore the if statement returns false. Assume that test returns false. Then a block $(x', y')$ with $y' > f_n + l_n$ has been inserted into $B$. This implies the existence of a path from $p_{f_1-g+1}$ to a processor $p_z$, with $f_n + l_n < z \le y'$. Such a path can be easily extended to an escape path. □
Theorem 9 The problem of testing if a fault pattern of n blocks is catastrophic for a unidirectional array is solvable in time O(n) when the array has only one bypass link and in time O(nk log k) when there are k > 1 bypass links.
The new blocks to be inserted into $B$ are not produced in increasing order, hence a standard queue is not sufficient to efficiently handle set $B$ (unless k = 1). We organize $B$ in $k$ subsets $B_i$, $1 \le i \le k$, each containing the blocks produced using the link $g_i$. When a new block is generated using the link $g_i$, we insert it in the corresponding $B_i$. The inserted block is maximal in $B_i$, hence each $B_i$ can be organized as a standard queue. Moreover, we organize the $k$ "heads" of the queues $B_i$, which contain the minimal elements of each $B_i$, as a heap providing the minimal element of $B$.
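The organization of $B$ as $k$ queues whose heads sit in a heap can be sketched in Python as follows. This is a minimal illustration of the data structure only, not of the full testing algorithm; the class name `BlockSet`, the 0-based queue index and the sample blocks are assumptions made for the example.

```python
import heapq
from collections import deque


class BlockSet:
    """Set B of blocks (x, y), split into k queues B_i (one per bypass
    link), with the queue heads kept in a heap of size at most k."""

    def __init__(self, k):
        self.queues = [deque() for _ in range(k)]
        self.heads = []                   # heap of (head block, queue index)

    def insert(self, i, block):
        """Insert a block produced with link g_{i+1}; it must not precede
        the blocks already stored in the same queue."""
        q = self.queues[i]
        if q and block < q[-1]:
            raise ValueError("blocks of one link must arrive in increasing order")
        if not q:
            heapq.heappush(self.heads, (block, i))   # block becomes the head of B_i
        q.append(block)

    def pop_min(self):
        """Remove and return the minimum block of B in O(log k) time."""
        block, i = heapq.heappop(self.heads)
        q = self.queues[i]
        q.popleft()
        if q:
            heapq.heappush(self.heads, (q[0], i))    # expose the next head of B_i
        return block

    def __len__(self):
        return sum(len(q) for q in self.queues)


B = BlockSet(3)
B.insert(0, (6, 9))
B.insert(2, (4, 5))
B.insert(0, (11, 12))
B.insert(1, (8, 8))
print([B.pop_min() for _ in range(len(B))])
```

Since each heap operation involves at most $k$ heads, every insertion or extraction costs O(log k), which is where the log k factor of the bound for k > 1 comes from.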
PROOF. By Lemma

Maximum escape paths
In this section we consider the problem of finding maximum escape paths. We prove that the problem is NP-hard for a bidirectional redundant array, while for a unidirectional array we provide an algorithm that finds a maximum escape path in O(kng) time.
First we take into account the case of bidirectional links. Consider the following Maximum Reconfiguration Length (MRL for short) problem.
Definition 10 (MRL problem) Given a bidirectional redundant array A consisting of N processors, with link redundancy G, a fault pattern F and a positive integer K, is there an escape path of length at least K?
The following lemma holds.
Lemma 11
The MRL problem is NP-complete in the strong sense.
PROOF. We reduce the problem of testing whether there exists a Hamiltonian path between two given vertices of a graph (HP for short), known to be NP-complete (see [7]), to the MRL problem. Since it is easy to give a nondeterministic polynomial time algorithm that solves the MRL problem, we conclude that MRL is NP-complete. Let
The fault pattern consists of all the processing elements $p_k$ such that $k \ne a_i$, for $i = 1, 2, \ldots, n$, that is, the only non-faulty elements are $p_{a_1}, p_{a_2}, \ldots, p_{a_n}$.
Formally we have that $F = \{(1, g-1), (a_1 + 1, a_2 - a_1 - 1), \ldots, (a_{n-1} + 1, a_n - a_{n-1} - 1), (a_n + 1, g - 1)\}$. Finally let $K = n$.
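The shape of this fault pattern can be illustrated with a short Python sketch. It only evaluates the formula for $F$ given above: the positions $a_1 < \cdots < a_n$ and the value of $g$ are taken as inputs, because how the reduction chooses them is not reproduced here. The function name `mrl_fault_pattern`, the sample values and the requirement that consecutive positions differ by at least 2 (so that no block is empty) are assumptions of the sketch.

```python
def mrl_fault_pattern(a, g):
    """Fault pattern F in which the only working processors are
    p_{a_1}, ..., p_{a_n}, following the formula for F."""
    if a[0] != g:
        raise ValueError("the first block (1, g-1) must end just before p_{a_1}")
    if any(cur - prev < 2 for prev, cur in zip(a, a[1:])):
        raise ValueError("consecutive working processors would yield an empty block")
    F = [(1, g - 1)]                               # g-1 faults before p_{a_1}
    for prev, cur in zip(a, a[1:]):
        F.append((prev + 1, cur - prev - 1))       # faults strictly between them
    F.append((a[-1] + 1, g - 1))                   # g-1 faults after p_{a_n}
    return F


a, g = [4, 7, 9], 4
F = mrl_fault_pattern(a, g)
N = a[-1] + g - 1
faulty = {z for f, l in F for z in range(f, f + l)}
print(F)
print([p for p in range(1, N + 1) if p not in faulty])   # working processors
```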
Notice that the above MRL instance can be constructed in time polynomial in the size $n$ of the graph and all the integers occurring in the description of the instance are polynomially related to $n$. We will prove that $H$ has a Hamiltonian path if and only if the above instance of the MRL problem admits a solution, that is, if there is an escape path of size $n$. In order to prove this, we first need the following three facts.
(i) Any escape path must traverse $p_{a_1}$ and $p_{a_n}$. Indeed, the first and the last block of faults consist of $g - 1$ faulty elements and thus any escape path must traverse $p_{a_1}$ and $p_{a_n}$ because the longest link has length $g$.
(ii) If $p_{a_i}$, with $1 < i < n$, is traversed by an escape path, then it must be traversed after $p_{a_1}$ and before $p_{a_n}$. Indeed, let $d_{i,j}$, $1 \le i \ne j \le n$, be the distance between $p_{a_i}$ and $p_{a_j}$, that is, $d_{i,j} = |a_j - a_i| = n$
Now we can prove that there is an escape path of length at least $K = n$ if and only if there is a Hamiltonian path between vertices 1 and $n$ in the graph $H$. Assume that there is an escape path of size $K = n$. Since in $A$ there are exactly $K$ working processors, each processor is involved in the escape path. Since all the working processors are traversed, by (i), (ii) and (iii) we conclude that there exists a Hamiltonian path between vertices 1 and $n$ in $H$ (recall that by the definition of path each processor can be traversed at most once). Conversely, given a Hamiltonian path between vertices 1 and $n$ in $H$, by (iii) it corresponds to a path from $p_{a_1}$ to $p_{a_n}$, which traverses once all the non-faulty processing elements of $A$. This path can be easily extended to an escape path of size $K = n$ connecting I to $p_{a_1}$ and $p_{a_n}$ to O by means of the longest bypass link. Therefore we can test if there exists a Hamiltonian path between vertices 1 and $n$ in $H$ by testing if there exists an escape path of size at least $K$ for the array $A$. □
The strong NP-completeness of MRL clearly implies the strong NP-hardness of the problem of finding a maximum escape path for a bidirectional array.
When the array is unidirectional, the problem of finding a maximum escape path is "easy" and can be solved in O(kng) time. Figure 5 shows an algorithm, called maximum_set, which, given the redundancy of a unidirectional array $A$ and a non-catastrophic fault pattern, constructs a maximum reconfiguration set for $A$. Before analyzing algorithm maximum_set we remark that in the code of maximum_set we use, for the sake of simplicity, assignments of sets to the rec_set's, whose cost is proportional to the cardinality of the sets. However, we can construct the rec_set's by means of pointers, so that each assignment takes constant time. Hence, though we use assignments of sets, we consider that each assignment takes constant time.
Lemma 12 maximum_set is correct and constructs a maximum reconfiguration set in $O(\ell k)$ time, where $\ell$ is the number of working processors in the fault zone.
PROOF. Let us define the set $B[s] = \{ i \mid (i = s - g_t,\ t = 1, 2, \ldots, k, \text{ or } i = s - 1) \text{ and } p_i \text{ is not faulty} \}$. Observe that, since the array is unidirectional, we can reach processor $p_s$ only from one of $\{p_i \mid i \in B[s]\}$. Let $z$ be an integer such that $f_1 - g + 1 < z \le f_n + l_n + g - 2$. We want to prove the following invariant: at the iteration of the while for which $i = z$, length[$z$] is the length of a longest path from processor $f_1 - g + 1$ to processor $z$, and rec_set[$z$] is a reconfiguration set that achieves a path (from $p_{f_1-g+1}$ to $p_z$) of such a length. We prove the invariant by induction. It is easy to see
[Figure 5: algorithm maximum_set(F, G, P).]
Suppose that the invariant holds for any $j < z$. Consider the iterations of the while in which $i$ was equal to $j$ with $j \in B[z]$ (notice that if such a set is empty, no path to $p_z$ exists). The algorithm has already considered the path to $p_z$ passing through $p_j$: this path has been discarded if it was shorter than an already considered path, while it has been stored in rec_set[$z$], and its length in length[$z$], if it was the longest among all the already considered paths. Hence once $i = z$, all the possible paths have been considered and the longest one has been recorded.
Thus length[$f_n + l_n + g - 2$] is the length of a longest path from processor $f_1 - g + 1$ to processor $f_n + l_n + g - 2$, and rec_set[$f_n + l_n + g - 2$] is a reconfiguration set that achieves a path of such a length. Moreover, since the array is unidirectional, any maximum escape path must pass through $f_1 - g + 1$ and $f_n + l_n + g - 2$. This means that rec_set[$f_n + l_n + g - 2$] is a maximum reconfiguration set.
The complexity of maximum_set is easily computed: the first for takes $O(\ell)$ time and the while with the nested for takes $O(\ell k)$ time. Hence the algorithm runs in time $O(\ell k)$. □
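The invariant of the proof suggests a simple dynamic program. The Python sketch below is a reconstruction under stated assumptions, not the code of Figure 5: it computes, for every working processor $s$ of the fault zone, the length of a longest path from $p_{f_1-g+1}$ to $p_s$ by looking at the predecessors in $B[s]$, and it recovers the reconfiguration set through predecessor pointers instead of set assignments. It loops over every processor of the fault zone, so its running time is proportional to $k$ times the zone length rather than $O(\ell k)$. The return convention (a pair, or None if the end of the zone is unreachable) is an assumption.

```python
def maximum_set(F, G):
    """Longest path across the fault zone of a unidirectional array.
    Returns (number of links on the path, reconfiguration set)."""
    G = sorted(G)
    g = G[-1]
    start = F[0][0] - g + 1
    end = F[-1][0] + F[-1][1] + g - 2
    faulty = {z for f, l in F for z in range(f, f + l)}
    length = {start: 0}                   # only reachable working processors
    pred = {}                             # s -> (predecessor, link index or None)
    for s in range(start + 1, end + 1):
        if s in faulty:
            continue
        # B[s]: regular predecessor s-1, then s-g_t for each bypass link.
        options = [(s - 1, None)] + [(s - gt, t) for t, gt in enumerate(G, start=1)]
        for i, a in options:
            if i in length and length[i] + 1 > length.get(s, -1):
                length[s] = length[i] + 1
                pred[s] = (i, a)
    if end not in length:
        return None
    rec_set, s = [], end
    while s != start:                     # walk the pointers back to the start
        i, a = pred[s]
        if a is not None:
            rec_set.append((i, a))        # p_i activates bypass link g_a
        s = i
    rec_set.reverse()
    return length[end], rec_set


# One faulty processor p_4, redundancy G = {2, 3}: the fault zone is p_2..p_6.
print(maximum_set([(4, 1)], [2, 3]))
```

In this example the longest path is $p_2, p_3, p_5, p_6$, which uses the shorter bypass link at $p_3$ instead of jumping directly from $p_2$ to $p_5$.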
The next lemma states that if the chunks of a fault pattern are big enough, then the fault pattern can be split into several fault patterns which can be considered separately.
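The splitting step can be sketched in a few lines of Python. The sketch assumes the threshold stated in the lemma that follows (a separating chunk must contain more than $2g - 4$ working processors) and measures the chunk between two consecutive blocks as $f_{i+1} - (f_i + l_i)$; the function name `split_fault_pattern` and the sample pattern are hypothetical.

```python
def split_fault_pattern(F, g):
    """Split F at every chunk with more than 2g-4 working processors."""
    parts = [[F[0]]]
    for (f_prev, l_prev), (f, l) in zip(F, F[1:]):
        chunk_size = f - (f_prev + l_prev)        # working processors in between
        if chunk_size > 2 * g - 4:
            parts.append([])                      # start a new sub-pattern
        parts[-1].append((f, l))
    return parts


# With g = 5 the threshold is 6: only the 20-processor chunk separates.
print(split_fault_pattern([(5, 1), (7, 3), (30, 2), (33, 1)], 5))
```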
Lemma 13 Let $F = \{(f_1, l_1), \ldots, (f_n, l_n)\}$ be a fault pattern for a unidirectional array with redundancy $G$. If there exist integers $j_1, j_2, \ldots, j_s$, with $j_1 < j_2 < \cdots < j_s < n$, such that chunk $j_i$, $1 \le i \le s$, has more than $2g - 4$ processors, then $F$ is a catastrophic fault pattern (CFP) if and only if at least one among the fault patterns $F_1 = \{(f_1, l_1), \ldots, (f_{j_1}, l_{j_1})\}$, $F_2 = \{(f_{j_1+1}, l_{j_1+1}), \ldots, (f_{j_2}, l_{j_2})\}$, $\ldots$, $F_s = \{(f_{j_{s-1}+1}, l_{j_{s-1}+1}), \ldots, (f_n, l_n)\}$ is a CFP.
PROOF. For the sake of contradiction, assume that $F$ is catastrophic and none among $F_1, F_2, \ldots, F_s$ is catastrophic. Since chunk $j_i$ has more than $2g - 4$ processors, any escape path for $F_i$ ends before the beginning of any escape path for $F_{i+1}$. Then, concatenating the escape paths of $F_1, F_2, \ldots, F_s$, we can construct an escape path for $F$, contradicting the hypothesis that $F$ is catastrophic. □
Theorem 14 Given a unidirectional array with k redundant links and whose longest link has length g, the problem of finding a maximum escape path for a non-catastrophic fault pattern of n blocks of faults is solvable in time O(kng).
PROOF. Let $F$ be the fault pattern. Split $F$ into $s$ fault patterns, $F_1, \ldots, F_s$, such that between $F_i$ and $F_{i+1}$, for $i = 1, 2, \ldots, s - 2$, there is a chunk of more than $2g - 4$ processors. By Lemma 13, a maximum reconfiguration set for $F$ is given by the union of the $s$ reconfiguration sets obtained by applying the algorithm maximum_set to $F_1, \ldots, F_s$. Let $n_i$ be the number of blocks in $F_i$, $1 \le i \le s$. Clearly $\sum_{i=1}^{s} n_i = n$. Since the number of elements in the fault zone for $F_i$ is less than $n_i g + (n_i - 1)(2g - 3) + 2g - 2$ (remember that each chunk inside $F_i$ has at most $2g - 4$ processors and that $F_i$ is not catastrophic for $A$, hence each block has fewer than $g$ elements), by Lemma 12, constructing a maximum reconfiguration set for $F_i$ takes $O(n_i g k)$ time. Hence, constructing a maximum reconfiguration set for $F$ takes $O(\sum_{i=1}^{s} n_i g k)$ time, that is, O(kng) time. □
Minimum escape paths
In this section we consider the problem of finding minimum escape paths. When the array is bidirectional we provide an algorithm solving the problem in O(kn) time, and for the unidirectional case we propose an algorithm solving the problem in O(kng) time. First we consider the case of bidirectional links.
Theorem 15 Given a bidirectional array with k redundant links whose longest link has length g, the problem of finding a minimum escape path for a non
[Figure 6: algorithm minimum_set(F, G, P).]
$C_n$ in the derived graph are connected. Since each edge in the derived graph corresponds to the use of a redundant link, the number of redundant links that we have to use in any reconfiguration set is at least equal to the length of the shortest path from $C_0$ to $C_n$ in the derived graph. On the other hand, given a path from $C_0$ to $C_n$ it is easy to obtain a reconfiguration set whose cardinality is exactly the length of the path. Hence we can find a minimum escape path by finding a shortest path from $C_0$ to $C_n$ in the derived graph. It is well known that this problem is solvable in time linear in the number of edges. The derived graph has at most O(kn) edges, because it is constructed in O(kn) time. □
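The shortest-path step of this argument is a plain breadth-first search. The Python sketch below assumes that the derived graph is already available as an adjacency mapping between chunk indices; the function name `shortest_chunk_path` and the sample graph are hypothetical and do not come from a fault pattern in the paper.

```python
from collections import deque


def shortest_chunk_path(adj, source, target):
    """Breadth-first search in the derived graph; returns the list of
    chunk indices on a shortest path, or None if none exists."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        c = queue.popleft()
        if c == target:
            path = []
            while c is not None:          # follow parents back to the source
                path.append(c)
                c = parent[c]
            return path[::-1]
        for d in adj.get(c, ()):
            if d not in parent:
                parent[d] = c
                queue.append(d)
    return None


# Assumed derived graph on chunks C_0..C_4.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: [2, 3]}
path = shortest_chunk_path(adj, 0, 4)
print(path, "redundant links:", len(path) - 1)
```

Each edge of the returned path stands for one bypass link to activate, so the number of redundant links is the number of chunks on the path minus one.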
Now we consider the case of unidirectional links. In this case we can use the same technique used to find a maximum escape path. Figure 6 shows an algorithm, called minimum_set, that, given the redundancy of a redundant array and a non-catastrophic fault pattern, constructs a minimum reconfiguration set.
Lemma 16 Algorithm minimum_set is correct and constructs a minimum reconfiguration set in time $O(k\ell)$, where $\ell$ is the number of working processors in the fault zone.
PROOF. Let us define the set $B[s] = \{ i \mid (i = s - g_t,\ t = 1, 2, \ldots, k, \text{ or } i = s - 1) \text{ and } p_i \text{ is not faulty} \}$. Observe that since the array is unidirectional we can reach processor $p_s$ only from one of $\{p_i \mid i \in B[s]\}$. Fix an integer $z$, $f_1 - g + 1 < z \le f_n + l_n + g - 2$. As in Lemma 12, we can prove by induction the following invariant: at the iteration of the while for which $i = z$, links[$z$] is the minimum number of redundant links that any path from processor $f_1 - g + 1$ to processor $z$ must use, and rec_set[$z$] is a reconfiguration set that achieves a path using such a minimum number of redundant links.
Thus links[$f_n + l_n + g - 2$] is the minimum number of redundant links that any path from processor $f_1 - g + 1$ to processor $f_n + l_n + g - 2$ must use, and rec_set[$f_n + l_n + g - 2$] is a reconfiguration set that achieves an escape path that uses such a number of redundant links. Hence rec_set[$f_n + l_n + g - 2$] is a minimum reconfiguration set.
The complexity of minimum_set is easily computed: the first for takes $O(\ell)$ time and the while with the nested for takes $O(\ell k)$ time. Hence the algorithm runs in $O(\ell k)$ time. □
Theorem 17 Given a unidirectional array with k redundant links and whose longest link has length g, the problem of finding a minimum escape path for a non-catastrophic fault pattern of n blocks of faults is solvable in time O(kng).
PROOF. Let $F$ be the fault pattern. Split $F$ into $s$ fault patterns, $F_1, \ldots, F_s$, such that between $F_i$ and $F_{i+1}$, for $i = 1, 2, \ldots, s - 2$, there is a chunk of more than $2g - 4$ processors. By Lemma 13, a minimum reconfiguration set for $F$ is given by the union of the $s$ reconfiguration sets obtained by applying the algorithm minimum_set to $F_1, \ldots, F_s$. The rest of the proof is as in Theorem 14. □
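The invariant of Lemma 16 translates into a dynamic program in which a regular link costs nothing and a bypass link costs one. The Python sketch below is a reconstruction under stated assumptions, not the code of Figure 6: it works on a single fault pattern (without the splitting step), recovers the reconfiguration set through predecessor pointers, and returns None when the end of the fault zone cannot be reached. Ties between equally cheap paths are broken in favour of the first predecessor examined.

```python
def minimum_set(F, G):
    """Path across the fault zone of a unidirectional array that uses
    the fewest bypass links. Returns (number of links, reconfiguration set)."""
    G = sorted(G)
    g = G[-1]
    start = F[0][0] - g + 1
    end = F[-1][0] + F[-1][1] + g - 2
    faulty = {z for f, l in F for z in range(f, f + l)}
    links = {start: 0}                    # only reachable working processors
    pred = {}
    for s in range(start + 1, end + 1):
        if s in faulty:
            continue
        # B[s]: regular predecessor s-1, then s-g_t for each bypass link.
        options = [(s - 1, None)] + [(s - gt, t) for t, gt in enumerate(G, start=1)]
        for i, a in options:
            if i not in links:
                continue
            cost = links[i] + (0 if a is None else 1)
            if s not in links or cost < links[s]:
                links[s] = cost
                pred[s] = (i, a)
    if end not in links:
        return None
    rec_set, s = [], end
    while s != start:
        i, a = pred[s]
        if a is not None:
            rec_set.append((i, a))        # p_i activates bypass link g_a
        s = i
    rec_set.reverse()
    return links[end], rec_set


# Faulty processors p_3 and p_4, redundancy G = {3}: the fault zone is p_1..p_6.
print(minimum_set([(3, 2)], [3]))
```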
Summary and open questions
In this paper we studied the problem of providing fault-tolerant capabilities to parallel architectures by means of redundancy. This approach consists of adding spare processors and extra links that can be used to bypass faulty processing elements and reconfigure the architecture with no slowdown in performance.
No matter how much redundancy is provided, it is always possible to have a set of faulty elements for which no recon guration is possible. Such sets of faults are called catastrophic.
Before attempting any reconfiguration it is important to test whether the set of faults is catastrophic. When a set of faults is not catastrophic it is important to have efficient algorithms that find optimal reconfigurations.
In this paper we have considered linear arrays of processing elements. We have considered both the case when the array has bidirectional links and the case when the array has unidirectional links. We have provided new testing algorithms which improve and generalize previously known algorithms.
We have also considered the problem of finding an optimal reconfiguration when the set of faults is not catastrophic. Optimality is considered either with respect to the size of the reconfigured array or with respect to the amount of changes needed to reconfigure the array. We proved that when the links are bidirectional, the problem of finding an optimal reconfiguration with respect to the size of the reconfigured array is NP-hard in the strong sense. In all the other three cases we provided algorithms which efficiently find an optimal reconfiguration.
In this paper the case of linear arrays has been extensively studied; however, some questions still remain open. For instance, the given O(kng) algorithm to find an optimal reconfiguration of a unidirectional array maximizing the size of the reconfigured array is indeed pseudo-polynomial in g. Better algorithms for this problem might exist. Also, the problem of failures of the links has not been considered yet. Hence another direction for research is to consider fault patterns consisting of both processing elements and links.
Other parallel architectures are used in practice, in particular bidimensional arrays: memory chips are organized in this form and many existing parallel machines have a mesh architecture (see for example [11]), and their importance is still increasing. The approach adopted here might also be useful to study bidimensional arrays of processors.
