ABSTRACT Increasing the size of target arrays makes it possible to reuse more fault-free processing elements (PEs) when reconfiguring 2-D mesh-connected processor arrays with faults. In this paper, we discuss the reconfiguration problem under the row and column rerouting constraint. We present a novel approach, based on integer programming, for constructing larger target arrays. We also propose a new method for handling the fault-free PEs in the physical row that is selected for exclusion. Compared with state-of-the-art algorithms, our method uses as many fault-free PEs as possible, which significantly increases the size of the target array. Experimental results show that the proposed algorithm achieves a higher usage rate of the fault-free PEs in the host array than previous studies.
I. INTRODUCTION
Recently, very large scale integration (VLSI) technology has advanced rapidly, and many processing elements (PEs) can now be integrated on a single chip. However, as the integration density increases dramatically, some PEs inevitably become faulty during fabrication. Hence, efficient fault-tolerant strategies must be employed to enhance system yield and dependability. One typical fault-tolerance method is the degradation approach, which uses as many fault-free PEs as possible to construct a fault-free subarray [1]. Many approaches for reconfiguring two-dimensional (2D) processor subarrays have been proposed under three different rerouting constraints [1], namely 1) row and column bypass, 2) row bypass and column rerouting, and 3) row and column rerouting. Most problems that arise under these constraints are NP-complete.
Under the second constraint (i.e., row bypass and column rerouting), many algorithms have been proposed for reconfiguring VLSI arrays. An optimal algorithm, called greedy column rerouting (GCR), was proposed in [2] to construct, in linear time, a maximum target array (MTA) that contains a given set of selected rows. Jiang [3] presented an efficient algorithm based on a flexible rerouting constraint that considerably improves the usage of fault-free PEs. Zhu et al. [4] presented a heuristic approach that reduces the reconfiguration time by identifying fault-free PEs that cannot be part of the target array. For parallel reconfiguration, Shen et al. [5] proposed a parallel greedy column rerouting algorithm to accelerate GCR, and Wu et al. [6] employed multithreading and divide-and-conquer approaches. In addition, minimizing the total interconnection length (inter-length for short) of the target array reduces routing cost, capacitance and dynamic power dissipation. Accordingly, a dynamic programming approach was introduced in [7] to reduce the power dissipation of a logical array by reducing the number of long interconnects. Wu et al. [8] proposed a fast algorithm based on the strategy of shortest partial path first extension to reduce the runtime of the algorithm in [7]. A divide-and-conquer algorithm (denoted as ALG14 in this paper) was proposed in [9] to minimize the total inter-length. In our previous work [10], we presented a method based on integer programming for constructing tightly coupled target arrays. Furthermore, in [11] (denoted as ALG16 in this paper), we used network flow to construct an optimal subarray with the minimum total inter-length.
Reconfiguring an array under the third constraint (i.e., row and column rerouting) is more flexible but also more difficult, because rerouting in the row and column directions must be considered at the same time [12]. To enlarge the target array under the third constraint, Low [12] proposed an efficient heuristic algorithm called RCRoute, which combines GCR with row exclusion to obtain a larger target array. In [13] and [14], Fukushi et al. used a genetic approach to construct an MTA on small host arrays. A more flexible algorithm, based on an integrated row and column rerouting constraint, was subsequently presented in [15] to increase the size of target arrays. Wu et al. [16] removed the complicated row exclusion and presented a simple algorithm, called RCRT10 in this paper, that further increases the size. For the reconfiguration of three-dimensional (3D) mesh-connected processor arrays, see [17], [18], and [19]. In this paper, we aim to construct maximum target arrays of 2D processor arrays under the row and column rerouting constraint. We propose a mathematical model based on integer programming to obtain logical rows (or columns), so that the reconfiguration of a 2D processor array can be solved by an efficient integer programming solver. Meanwhile, to use as many fault-free PEs of the host array as possible, we reuse the fault-free PEs in bottleneck rows [12] when building the integer programming model, whereas RCRoute bypasses these PEs directly. Compared with state-of-the-art algorithms, the proposed algorithm significantly increases the size of target arrays.
II. PRELIMINARY
A physical (host) array, denoted as H, is an array that may contain some faulty PEs after fabrication. A subarray that contains no faulty PEs after reconfiguration is called a target array (or logical array), denoted as T. The rows (columns) in H are called physical rows (columns), and the rows (columns) in T are called logical rows (columns). Fig. 1 shows the architecture of the host array, whose PEs are linked by four-port switches, together with the switch states and the rerouting and compensation schemes. As in [12], only PEs are assumed to be faulty; the other components, such as links, tracks and switches, are assumed to be fault-free.
In this paper, the PE located at position (i, j) of the host array is denoted as $e_{i,j}$. The row bypass, column bypass, row rerouting and column rerouting schemes are four typical rerouting schemes. In the row bypass scheme, if $e_{i,j}$ is faulty, then $e_{i,j-1}$ can communicate directly with $e_{i,j+1}$, and $e_{i,j}$ is bypassed. The column bypass scheme is defined similarly. In the column rerouting scheme, $e_{i,j}$ can connect directly to $e_{i+1,j'}$ through external switches, where $|j - j'| \le d_{col}$, and $d_{col}$ is called the column compensation distance [12]. In practice, $d_{col}$ is kept small to reduce the overhead of the switching mechanisms. As in [12], $d_{col}$ is limited to 1 in this paper. The row rerouting scheme is defined similarly.
Given an m × n host array, let $R_1, R_2, \cdots, R_m$ be its m physical rows, and assume that the fault-free PE u is located in physical row $R_i$. Because of the limited column compensation distance, u can only be linked to PEs of the adjacent rows within distance $d_{col}$: its lower adjacent set $Adj^{+}(u)$ consists of the fault-free PEs in $R_{i+1}$, and its upper adjacent set $Adj^{-}(u)$ consists of the fault-free PEs in $R_{i-1}$, whose column indices differ from that of u by at most $d_{col}$. Each PE in Fig. 1 has four internal switches [13]. Using these switches, a PE has three possible states: Use, Pass Vertically (PassV) and Pass Horizontally (PassH), as shown in Fig. 2. The PassV state corresponds to bypassing the PE vertically, and the PassH state corresponds to bypassing it horizontally. The reconfiguration problem discussed in this paper can be formalized as follows:
Problem P: Given an m × n host array H and integers r and c, find an m' × n' fault-free logical array T under the row and column rerouting constraint such that m' ≥ r and n' ≥ c.
III. INTEGER PROGRAMMING MODEL OF P
In this section, we build an integer programming model for the array considered at the current reconfiguration iteration. Given an m × n host array, without loss of generality, we assume that $R_1, R_2, \cdots, R_m$ are the physical rows of the host array, which contains some faulty PEs. Fig. 3(a) shows such a host array. When a target array is reconfigured at the first iteration, all four physical rows are selected as logical rows, denoted as $R_1$, $R_2$, $R_3$ and $R_4$, as shown in Fig. 3(b). Because of the limited column compensation distance, each PE u in physical row $R_i$ can be connected to at most six PEs, three in row $R_{i-1}$ and three in row $R_{i+1}$. In other words, at most three links enter u (hereafter referred to as in-links) and at most three links leave it (hereafter referred to as out-links).
Zhu et al. [4] pointed out that if PE u is fault-free and its set $Adj^{+}(u)$ (or $Adj^{-}(u)$) is empty, then u cannot be used to form a target array and becomes unusable; that is, u has no in-link (or no out-link). In this paper, we call these unusable fault-free PEs bad-fault-free PEs. In contrast, if u is fault-free and both $Adj^{+}(u)$ and $Adj^{-}(u)$ are nonempty (u has both in-links and out-links), we call u a good-fault-free PE. For example, in Fig. 3(a), the lower adjacent set $Adj^{+}(u_1)$ of the fault-free PE $u_1$ is empty, so $u_1$ is a bad-fault-free PE. Similarly, $u_2$, $u_6$ and $u_9$ are bad-fault-free PEs. If a PE v in the host array is faulty but can be bypassed vertically during reconfiguration, it has associated in-links or out-links, and we call v a faulty-PassV PE. For example, the faulty PE $f_5$ in Fig. 3(a) is located in the last physical row $R_4$ and can be bypassed vertically when constructing target arrays, so it is a faulty-PassV PE.
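The good/bad classification can be sketched in C, the language used for the experiments. This is a minimal illustration rather than the authors' code: it assumes that $Adj^{+}(u)$ and $Adj^{-}(u)$ contain only fault-free PEs of the neighboring rows within distance $d_{col}$, and, as a hypothetical boundary convention, it treats the array edge as a valid neighbor for PEs in the first and last rows. The names `classify_pe` and `count_adjacent` are illustrative, and the check is one level deep (it does not propagate badness to PEs whose only neighbors are themselves bad).

```c
/* PE classes used in Section III (names are illustrative). */
typedef enum { PE_FAULTY, PE_BAD, PE_GOOD } pe_class;

/* Count fault-free PEs in row r within column distance d_col of column j.
   faulty is a row-major m x n array (nonzero = faulty).
   Returns -1 if row r does not exist (array boundary). */
static int count_adjacent(const int *faulty, int m, int n, int r, int j, int d_col) {
    if (r < 0 || r >= m) return -1;
    int cnt = 0;
    for (int k = j - d_col; k <= j + d_col; k++)
        if (k >= 0 && k < n && !faulty[r * n + k]) cnt++;
    return cnt;
}

/* Classify the PE at (i, j). Assumption: a boundary row (count -1) is treated
   as a non-empty neighbor set, so first/last-row PEs are judged on one side only. */
pe_class classify_pe(const int *faulty, int m, int n, int i, int j, int d_col) {
    if (faulty[i * n + j]) return PE_FAULTY;
    int up   = count_adjacent(faulty, m, n, i - 1, j, d_col);  /* |Adj-(u)| */
    int down = count_adjacent(faulty, m, n, i + 1, j, d_col);  /* |Adj+(u)| */
    return (up == 0 || down == 0) ? PE_BAD : PE_GOOD;
}
```

For instance, with faults at (1,0) and (1,1) of a 3 × 3 array and $d_{col} = 1$, the PE at (0,0) has no fault-free PE below it within distance 1 and is classified as bad, while the PE at (0,1) can still reach (1,2) and is good.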
In this paper, our goal is to construct a maximum target array, which means using as many fault-free PEs of the physical array as possible. Thus, we translate the good-fault-free PEs of the host array into variables of the integer programming model. For example, there are 7 good-fault-free PEs in Fig. 3(b), denoted by the Boolean variables $v_1, v_2, \cdots, v_7$. For $1 \le i \le 7$, $v_i = 1$ means that PE $v_i$ is used in the target array; otherwise, it is bypassed vertically or horizontally.
In addition, as explained in [11], no two logical columns share a PE or a link, and all logical columns keep a left-to-right partial order on the host array. Therefore, to build a valid integer programming model, the good-fault-free PEs and their in-links (out-links) must satisfy constraints that capture these requirements. To this end, we add variables for the in-links and out-links of the good-fault-free PEs and faulty-PassV PEs in the host array. For example, there are 9 links in Fig. 3(b), denoted by the Boolean variables $c_1, c_2, \cdots, c_9$. For $1 \le j \le 9$, $c_j = 1$ means that link $c_j$ is used in the target array; otherwise, it is unused.
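Link variables can be generated mechanically. The sketch below assumes that the positions holding good-fault-free or faulty-PassV PEs are given as a mask, enumerates every candidate link between vertically consecutive rows within distance $d_{col}$, and numbers the links in row-major order so that `out[t]` plays the role of $c_{t+1}$. Orienting links from row i to row i+1 is only a convention of this sketch; `link_t` and `enumerate_links` are hypothetical names.

```c
#include <stddef.h>

/* A link from (row, from_col) to (row + 1, to_col). */
typedef struct { int row, from_col, to_col; } link_t;

/* Enumerate candidate links between vertically consecutive rows.
   usable[i*n+j] != 0 if (i, j) holds a good-fault-free or faulty-PassV PE.
   Stores at most cap links in out (if out is non-NULL); out[t] corresponds to
   variable c_{t+1}. Returns the total number of links found. */
size_t enumerate_links(const int *usable, int m, int n, int d_col,
                       link_t *out, size_t cap) {
    size_t cnt = 0;
    for (int i = 0; i + 1 < m; i++)
        for (int j = 0; j < n; j++) {
            if (!usable[i * n + j]) continue;
            for (int k = j - d_col; k <= j + d_col; k++) {
                if (k < 0 || k >= n || !usable[(i + 1) * n + k]) continue;
                if (out && cnt < cap) out[cnt] = (link_t){ i, j, k };
                cnt++;
            }
        }
    return cnt;
}
```

On a fully usable 2 × 3 array with $d_{col} = 1$, the two edge PEs of the first row have two links each and the middle PE has three, giving 7 link variables.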
After setting the variables of the integer programming model, we define its objective function and constraints. Assume that there are q good-fault-free PEs in the host array, denoted by the Boolean variables $v_1, v_2, \cdots, v_q$. As mentioned above, our goal is to construct a maximum target array, that is, to use as many fault-free PEs of the physical array as possible. Therefore, the objective function of the integer programming model is

$$\max \sum_{i=1}^{q} v_i.$$

For example, since there are 7 good-fault-free PEs in the host array of Fig. 3(b), the objective function is $\max\{v_1 + v_2 + \cdots + v_7\}$.
As explained above, each good-fault-free PE and its links are subject to constraints. Assume that the good-fault-free PE $v_i$ has in-links $c_e$, $c_f$, $c_g$ and out-links $c_x$, $c_y$, $c_z$. The PE and its links must satisfy the following three constraints.
Clearly, if $v_i$ is used or bypassed in the target array and one of its in-links is true, then one of its out-links must also be true; that is, the numbers of true in-links and true out-links of $v_i$ must be equal. Moreover, since $v_i$ cannot be shared by two logical columns, at most one in-link and at most one out-link can be used in the target array. Thus, the first constraint is

$$c_e + c_f + c_g = c_x + c_y + c_z \le 1.$$

For example, the PE $v_3$ in Fig. 3(b) has in-links $c_1$, $c_2$ and out-links $c_3$, $c_4$, so $c_1 + c_2 = c_3 + c_4 \le 1$.

Second, a PE can only be used if a logical column is routed through it. For instance, in Fig. 3(b), if $v_3$ is used, then one of its in-links $c_1$ and $c_2$ (equivalently, one of its out-links $c_3$ and $c_4$) must be true. Hence, the second constraint is $v_i \le c_e + c_f + c_g$, which for $v_3$ gives $v_3 \le c_1 + c_2$.
As noted above, all logical columns keep a partial order in the target array, so no two links used in the target array may cross. That is, if two links $c_i$ and $c_j$ cross each other in the array, at most one of them can be true. Hence, the third constraint is

$$c_i + c_j \le 1.$$
For example, links $c_7$ and $c_8$ in Fig. 3(b) cross each other, so $c_7 + c_8 \le 1$. This completes the integer programming model, which consists of the objective function and the three constraints. In practice, the constraints are instantiated for all PEs and links in the array, producing the corresponding formulas. Solving the model with an integer programming solver yields an optimal solution, which corresponds to the maximum number of fault-free PEs that can be used in the host array. Furthermore, since the constraints preserve the partial order among the logical columns, the solution also gives a maximal number of logical columns.
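To make the three constraints concrete, the following hypothetical checker evaluates a candidate 0/1 assignment and returns its objective value, or -1 if a constraint is violated. It is a feasibility check for illustration, not a solver, and it rests on assumptions of our own: links are oriented from row i to row i+1, so first-row PEs have no in-links and last-row PEs have no out-links; flow balance is therefore enforced only on interior rows; and for first-row PEs the second constraint is checked against out-links instead of in-links.

```c
#include <stddef.h>

/* A link from (row, from_col) to (row + 1, to_col). */
typedef struct { int row, from_col, to_col; } link_t;

/* Two links in the same row band cross if their endpoints swap left-to-right order. */
static int links_cross(link_t a, link_t b) {
    return a.row == b.row && (a.from_col - b.from_col) * (a.to_col - b.to_col) < 0;
}

/* v[i*n+j]: 1 if the good-fault-free PE at (i, j) is used (0 for all other positions).
   x[t]:     1 if link L[t] is used.
   Returns sum(v) if all three constraints hold, -1 otherwise. */
int check_assignment(int m, int n, const int *v, const link_t *L, const int *x, size_t nl) {
    int objective = 0;
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            int in = 0, out = 0;
            for (size_t t = 0; t < nl; t++) {
                if (!x[t]) continue;
                if (L[t].row == i - 1 && L[t].to_col == j) in++;
                if (L[t].row == i && L[t].from_col == j) out++;
            }
            if (in > 1 || out > 1) return -1;               /* constraint 1: at most one each */
            if (i > 0 && i < m - 1 && in != out) return -1; /* constraint 1: in = out */
            int through = (i == 0) ? out : in;
            if (v[i * n + j] > through) return -1;          /* constraint 2: v <= links */
            objective += v[i * n + j];
        }
    for (size_t s = 0; s < nl; s++)                          /* constraint 3: no crossings */
        for (size_t t = s + 1; t < nl; t++)
            if (x[s] && x[t] && links_cross(L[s], L[t])) return -1;
    return objective;
}
```

On a 2 × 2 array, the two straight links give a feasible assignment that uses all four PEs, whereas the two diagonal links cross and are rejected by the third constraint.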
If the number of logical columns obtained in the most recent reconfiguration iteration does not meet the minimum dimension requirement of the target array, one logical row is selected as a bottleneck row $R_\gamma$, that is, a row that is a bottleneck for constructing the target array and whose fault-free PEs can be used to compensate for faults in the neighboring rows [12]. For example, in Fig. 3(b), the logical array obtained at the first iteration is a 4 × 1 array, which does not meet the minimum dimension requirement of 3 × 2, so a logical row must be selected as a bottleneck row. The bottleneck row $R_\gamma$ is selected in the same way as in logical row exclusion (LRE) [12]. However, our method differs from LRE in how the fault-free PEs in $R_\gamma$ are handled. For example, in Fig. 3(b), no faulty PE lies directly above or below $v_3$, so in LRE $v_3$ can only be bypassed and is not used in the next reconfiguration iteration. In our method, by contrast, all fault-free PEs in $R_\gamma$ are reused to build the integer programming model at the next iteration rather than being bypassed directly, so that as many fault-free PEs as possible are used. In addition, the faulty PEs in $R_\gamma$ can be bypassed vertically at the next iteration, so they become faulty-PassV PEs and are also included in the integer programming model. For example, since the logical columns obtained at the first iteration in Fig. 3(b) do not meet the minimum dimension requirement, $R_2$ is selected as the bottleneck row, as shown in Fig. 4. In the next iteration, the faulty PEs $f_1$, $f_2$ and $f_3$ in $R_2$ become faulty-PassV PEs. We then build an integer programming model for the resulting array. As shown in Fig. 4, there are 11 good-fault-free PEs, denoted by the Boolean variables $v_1, v_2, \cdots, v_{11}$.
The faulty PEs $f_1$, $f_2$, $f_3$ and $f_5$ are faulty-PassV PEs. We add the corresponding links for these good-fault-free PEs and faulty-PassV PEs, giving 26 links denoted by the Boolean variables $c_1, c_2, \cdots, c_{26}$. We then construct the model in the same way as described above and solve it with an integer programming solver.
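Section V mentions a program that automatically generates the model's formulas for Gurobi. One plausible way to do this, sketched here under our own assumptions and not taken from the authors' generator, is to write the model in the LP file format, which Gurobi can read. The function `write_lp`, the variable names `v<i>_<j>` and `c<t>`, the constraint names, and the boundary handling (flow balance only on interior rows, first-row PEs tied to their out-links) are all hypothetical choices; the sketch also assumes at least one good-fault-free PE so that the objective is non-empty.

```c
#include <stdio.h>
#include <stddef.h>

/* A link from (row, from_col) to (row + 1, to_col); link t is variable c<t+1>. */
typedef struct { int row, from_col, to_col; } link_t;

/* Append one term; the first term of an expression gets no leading " + ". */
static void put_term(FILE *f, int *first, int negative, const char *name) {
    fputs(*first ? (negative ? "-" : "") : (negative ? " - " : " + "), f);
    fputs(name, f);
    *first = 0;
}

/* Does link l enter (incoming = 1) or leave (incoming = 0) position (i, j)? */
static int touches(const link_t *l, int i, int j, int incoming) {
    return incoming ? (l->row == i - 1 && l->to_col == j) : (l->row == i && l->from_col == j);
}

/* Write the links touching (i, j) as terms; with f == NULL, only count them. */
static int put_links(FILE *f, int *first, int negative, const link_t *L, size_t nl,
                     int i, int j, int incoming) {
    int cnt = 0;
    char nm[32];
    for (size_t t = 0; t < nl; t++) {
        if (!touches(&L[t], i, j, incoming)) continue;
        if (f) { snprintf(nm, sizeof nm, "c%zu", t + 1); put_term(f, first, negative, nm); }
        cnt++;
    }
    return cnt;
}

/* kind[i*n+j]: 1 = good-fault-free PE, 2 = faulty-PassV PE, 0 = not in the model.
   Writes the model in LP format and returns the number of constraints written. */
int write_lp(FILE *f, int m, int n, const int *kind, const link_t *L, size_t nl) {
    char nm[32];
    int first = 1, rows = 0;
    fputs("Maximize\n obj: ", f);
    for (int p = 0; p < m * n; p++)
        if (kind[p] == 1) { snprintf(nm, sizeof nm, "v%d_%d", p / n, p % n); put_term(f, &first, 0, nm); }
    fputs("\nSubject To\n", f);
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            if (!kind[i * n + j]) continue;
            int in = put_links(NULL, NULL, 0, L, nl, i, j, 1);
            int out = put_links(NULL, NULL, 0, L, nl, i, j, 0);
            for (int dir = 1; dir >= 0; dir--)            /* constraint 1: at most one link */
                if (dir ? in : out) {
                    fprintf(f, " %s%d_%d: ", dir ? "in" : "out", i, j);
                    first = 1; put_links(f, &first, 0, L, nl, i, j, dir);
                    fputs(" <= 1\n", f); rows++;
                }
            if (i > 0 && i < m - 1 && (in || out)) {      /* constraint 1: in = out */
                fprintf(f, " bal%d_%d: ", i, j);
                first = 1;
                put_links(f, &first, 0, L, nl, i, j, 1);
                put_links(f, &first, 1, L, nl, i, j, 0);
                fputs(" = 0\n", f); rows++;
            }
            if (kind[i * n + j] == 1) {                   /* constraint 2: v <= links */
                fprintf(f, " use%d_%d: v%d_%d", i, j, i, j);
                first = 0;
                put_links(f, &first, 1, L, nl, i, j, i > 0);
                fputs(" <= 0\n", f); rows++;
            }
        }
    for (size_t s = 0; s < nl; s++)                       /* constraint 3: no crossings */
        for (size_t t = s + 1; t < nl; t++)
            if (L[s].row == L[t].row &&
                (L[s].from_col - L[t].from_col) * (L[s].to_col - L[t].to_col) < 0) {
                fprintf(f, " x%zu_%zu: c%zu + c%zu <= 1\n", s + 1, t + 1, s + 1, t + 1);
                rows++;
            }
    fputs("Binary\n", f);
    for (int p = 0; p < m * n; p++)
        if (kind[p] == 1) fprintf(f, " v%d_%d\n", p / n, p % n);
    for (size_t t = 0; t < nl; t++) fprintf(f, " c%zu\n", t + 1);
    fputs("End\n", f);
    return rows;
}
```

The written file could then be handed to Gurobi, for example with its command-line tool (`gurobi_cl model.lp`). Emitting a plain-text model keeps the generator independent of any solver API.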
IV. RCRT_IP ALGORITHM
We now present the proposed algorithm, denoted as RCRT_IP in this paper, for constructing a maximal target array. It consists of two procedures, Row_First and Column_First. The two procedures are identical except that the roles of rows and columns are interchanged, so we only describe how procedure Row_First works.
Given an m × n host array H and constants r and c, procedure Row_First aims to find a row-based maximum target array. The constants r and c represent the minimum dimensions of the target array required in a practical application. In our simulation experiments, however, we do not set these two constants; instead, we output the largest logical array among all the logical arrays obtained in the m reconfiguration iterations. Initially, let S denote the set of the m physical rows.
Procedure Row_First then builds an integer programming model for the host array H and solves it with the Gurobi Optimizer [20], which yields the maximum number of logical columns. Because the model is built for the host array itself, each logical row of the target array corresponds to a physical row of H, so the target array has m logical rows.
If the logical rows and columns meet the minimum dimension requirement of the target array, Row_First terminates and returns the target array as a possible solution. Otherwise, it selects a logical row from S as the bottleneck row $R_\gamma$. The fault-free PEs and faulty PEs in $R_\gamma$ become good-fault-free PEs and faulty-PassV PEs, respectively. The integer programming model is then built for the resulting array and solved to obtain the corresponding logical columns. Because the bottleneck row $R_\gamma$ takes part in the model, the number of PEs in logical column $C_i$ may differ from that in logical column $C_j$. Therefore, to construct a correct target array that satisfies the mesh structure of the processors and the row compensation distance, the GCR algorithm [2] is applied to these logical columns to obtain the corresponding logical rows. This process is repeated until a maximal target array is found or the number of logical rows (or columns) falls below the required lower bound. In the latter case, procedure Row_First fails to obtain the desired target array.
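The greedy step that turns logical columns into logical rows can be illustrated with a simplified leftmost-first router. The sketch below is written in the usual column orientation (building logical columns across a set of rows; Row_First applies the same idea with rows and columns interchanged), and it is a simplified heuristic in the spirit of GCR [2], not its exact procedure. At each row it picks the leftmost admissible PE within distance $d_{col}$ that lies strictly to the right of the previous logical column, and a column that hits a dead end is discarded without committing any PE. The name `greedy_columns` and its parameters are hypothetical.

```c
#include <stdlib.h>

/* faulty[i*n+j] != 0 marks a PE that cannot be used; d_col is the compensation distance.
   col_of (optional, capacity n*m) receives col_of[k*m+i] = physical column used by
   logical column k in row i. Returns the number of complete logical columns, or -1
   if memory allocation fails. */
int greedy_columns(const int *faulty, int m, int n, int d_col, int *col_of) {
    int *last = malloc(sizeof *last * m);  /* column of the previous logical column per row */
    int *path = malloc(sizeof *path * m);  /* tentative column being built */
    if (!last || !path) { free(last); free(path); return -1; }
    for (int i = 0; i < m; i++) last[i] = -1;
    int built = 0;
    for (int s = 0; s < n; s++) {
        if (faulty[s] || s <= last[0]) continue;
        path[0] = s;
        int ok = 1;
        for (int i = 1; i < m && ok; i++) {
            ok = 0;
            /* leftmost admissible PE to the right of the previous logical column */
            for (int k = path[i - 1] - d_col; k <= path[i - 1] + d_col; k++)
                if (k >= 0 && k < n && k > last[i] && !faulty[i * n + k]) {
                    path[i] = k;
                    ok = 1;
                    break;
                }
        }
        if (!ok) continue;                 /* dead end: nothing is committed */
        for (int i = 0; i < m; i++) {
            last[i] = path[i];
            if (col_of) col_of[built * m + i] = path[i];
        }
        built++;
    }
    free(last);
    free(path);
    return built;
}
```

With a single fault at (1,0) in a 3 × 4 array, the first logical column detours through (1,1), and the array yields 3 logical columns, matching the 3 fault-free PEs of the faulty row.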
The pseudo-code of Row_First is given as Algorithm 1. The larger of the arrays obtained by procedures Row_First and Column_First is taken as the final target array. Fig. 5 illustrates the reconfiguration results for the example in Fig. 3 by algorithm RCRT10 [16] and by the proposed algorithm RCRT_IP. As shown in Fig. 5(a), the maximum target array obtained by RCRT10 is a 3 × 2 subarray, whereas RCRT_IP obtains a 3 × 3 subarray, as shown in Fig. 5(b). This is because the logical columns constructed by the proposed algorithm are globally optimal, whereas those constructed by the previous algorithm are only locally optimal. Hence, the proposed algorithm uses fault-free PEs more effectively.

Algorithm 1: Row_First
    S := {R_1, R_2, ..., R_m};
    Build an integer programming model for the host array H and call the Gurobi Optimizer to solve it, resulting in c' logical columns;
    r' := m;  (logical rows)
    m' := r'; n' := c';
    while (m' ≥ r) and (n' < c) do
        Select a bottleneck row R_γ from S;
        Fault-free PEs in R_γ become good-fault-free PEs, and faulty PEs in R_γ become faulty-PassV PEs;
        Build an integer programming model for the resulting array and call the Gurobi Optimizer to solve it, resulting in c' logical columns;
        Apply the GCR algorithm to the c' logical columns, resulting in r' logical rows;
        m' := r'; n' := c';
    end while
    if (m' ≥ r) and (n' ≥ c) then
        the target array is obtained;
    else
        the algorithm fails;
    end if
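The control flow of Algorithm 1 can be sketched in C with the two heavy steps, building and solving the integer program and running GCR, abstracted as callbacks. The callback types, `row_first`, and the scripted test context are hypothetical illustrations, not the authors' implementation; the cap of m iterations is an added safeguard against exhausting S, suggested by the m reconfiguration iterations mentioned above.

```c
/* Hypothetical callbacks; ctx is expected to remember earlier bottleneck rows. */
typedef int (*solve_columns_fn)(void *ctx, int bottleneck); /* build + solve the IP, return c' */
typedef int (*route_rows_fn)(void *ctx);                    /* GCR on the columns, return r' */
typedef int (*pick_bottleneck_fn)(void *ctx);               /* next row from S, or -1 */

/* Returns 1 if an m' x n' target array with m' >= r and n' >= c is obtained, else 0. */
int row_first(int m, int r, int c, void *ctx, solve_columns_fn solve, route_rows_fn route,
              pick_bottleneck_fn pick, int *out_m, int *out_n) {
    int mm = m;               /* m' := r' := m: every physical row is a logical row */
    int nn = solve(ctx, -1);  /* n' := c' for the host array (-1: no bottleneck yet) */
    for (int iter = 0; iter < m && mm >= r && nn < c; iter++) {
        int g = pick(ctx);
        if (g < 0) break;     /* S exhausted */
        nn = solve(ctx, g);   /* PEs of R_g become good-fault-free / faulty-PassV */
        mm = route(ctx);      /* GCR on the c' logical columns yields r' rows */
    }
    *out_m = mm;
    *out_n = nn;
    return mm >= r && nn >= c;
}

/* Scripted stand-ins for exercising the control flow only (canned values, no solver). */
typedef struct { const int *cols, *rows; int k, next_row; } script_ctx;
static int script_solve(void *p, int g) { script_ctx *s = p; (void)g; return s->cols[s->k++]; }
static int script_route(void *p) { script_ctx *s = p; return s->rows[s->k - 1]; }
static int script_pick(void *p) { script_ctx *s = p; return s->next_row++; }
```

Feeding the script a 4 × 1 first result followed by a 3 × 3 result, as in the Fig. 3 to Fig. 5 example with a 3 × 2 requirement, makes the loop stop after one bottleneck row and report success.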
V. EXPERIMENTAL RESULTS
We implemented the algorithms RCRT_IP, RCRT10, ALG14 and ALG16 in C. For RCRT_IP, we developed a program that automatically generates the formulas of the model and calls the Gurobi Optimizer [20] to solve it, with all Gurobi options left at their defaults. For a fair comparison, the same random generator is used to produce the faulty PEs in the host arrays. The experiments were run on a Windows PC with an Intel(R) Xeon(R) E5607 2.27 GHz CPU and 4.0 GB of memory. To evaluate the performance of the algorithms, the harvest (har) and the improvement in harvest are calculated for each logical array. Since ALG14 and ALG16 produce target arrays of the same size, we denote them together as ALG14/16. The har measures how effectively the fault-free PEs of a host array are used in constructing a target array [12]:

$$\mathrm{har} = \frac{m' \times n'}{\text{number of fault-free PEs in } H} \times 100\%,$$

and the improvements of RCRT_IP over ALG14/16 and RCRT10 are

$$\mathrm{imp14/16} = \frac{\mathrm{har}_{\mathrm{RCRT\_IP}} - \mathrm{har}_{\mathrm{ALG14/16}}}{\mathrm{har}_{\mathrm{ALG14/16}}} \times 100\%, \qquad \mathrm{imp10} = \frac{\mathrm{har}_{\mathrm{RCRT\_IP}} - \mathrm{har}_{\mathrm{RCRT10}}}{\mathrm{har}_{\mathrm{RCRT10}}} \times 100\%.$$

Table 1 shows the size of the target array, har, runtime, imp14/16 and imp10 for physical arrays of size 24 × 24, 32 × 32 and 48 × 48. The fault density is varied from 5% to 15% for a comprehensive comparison. The theoretical maximum target array, that is, the upper bound on the size of the maximal target array, is also calculated using the approach in [12]. Table 1 shows that the proposed algorithm significantly increases the size of target arrays. For example, for a 32 × 32 host array with 15% faults, the target arrays derived from ALG14/16 and RCRT10 are 32 × 20 (of size 640) and 31 × 22 (of size 682), respectively, whereas RCRT_IP obtains a 28 × 27 target array (of size 756). This is closer to the theoretical maximum size (30 × 29, of size 870) than the arrays derived from ALG14/16 and RCRT10. In this case, imp14/16 and imp10 reach 15.46% and 10.86%, respectively.
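The two metrics can be computed as below. The harvest follows the definition attributed to [12]; the relative-improvement formula is our reading of the reported numbers, and it does reproduce them: with har values of 75.18%, 78.30% and 86.80% for ALG14/16, RCRT10 and RCRT_IP, it gives 15.46% and 10.86%. The function names are illustrative.

```c
/* har: percentage of the host array's fault-free PEs used by an m' x n' target array. */
double harvest(int rows, int cols, int fault_free_pes) {
    return 100.0 * rows * cols / fault_free_pes;
}

/* Relative improvement (in percent) of har_new over har_old. */
double improvement(double har_new, double har_old) {
    return 100.0 * (har_new - har_old) / har_old;
}
```

For example, `improvement(86.80, 75.18)` evaluates to about 15.46, and a 30 × 29 target array on a host with 870 fault-free PEs has a harvest of 100%.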
Because the constructed model is solved by an integer programming solver, the reconfiguration time of the proposed algorithm is longer than that of ALG14/16 and RCRT10, and Table 1 shows that the runtime grows as the host array size and the fault density increase. Even so, the proposed algorithm outperforms ALG14/16 and RCRT10 in terms of harvest. For a fault density of 15% on 32 × 32 host arrays, the har of ALG14/16 and RCRT10 is 75.18% and 78.30%, respectively, while RCRT_IP increases it to 86.80%. Thus, the proposed algorithm is well suited to reconfiguration aimed at enhancing system yield and reliability. Fig. 6 shows the improvements in harvest over ALG14/16 and RCRT10 for host arrays of size 24 × 24, 32 × 32 and 48 × 48, with fault densities from 5% to 15%.
The improvements in Fig. 6(a) and Fig. 6(b) increase gradually with the fault density on the same host array. For instance, in Fig. 6, on the 48 × 48 host array, imp10 is 2.48% at a fault density of 5%, and 4.99%, 5.49% and 5.95% at fault densities of 8%, 10% and 15%, respectively. This suggests that all these algorithms are nearly optimal at small fault densities, whereas at larger fault densities RCRT_IP obtains a higher harvest than the earlier algorithms. This also indicates that RCRT_IP is more flexible in constructing target arrays.
VI. CONCLUSIONS
In this paper, we have proposed a novel algorithm based on integer programming for reconfiguring VLSI subarrays under the row and column rerouting constraint. The major advantage of the algorithm is that it constructs a larger target array that uses as many fault-free PEs of the host array as possible. Experimental results show that the proposed algorithm further increases the size of the target array and improves system reliability. For the target array derived from a 32 × 32 host array with 15% faults, the improvements in harvest reach 15.46% and 10.86% over the latest algorithms under the row bypass and column rerouting constraint and the row and column rerouting constraint, respectively. Moreover, at larger fault densities, the proposed algorithm obtains a higher harvest than the earlier algorithms.
