Field programmable gate arrays (FPGAs) are widely used in reliability-critical systems due to their reconfiguration ability. However, with shrinking device feature sizes and increasing die area, modern FPGAs are increasingly affected by errors induced by electromigration and radiation. To improve the reliability of FPGA-based reconfigurable systems, a permanent fault recovery approach using a domain partition model is proposed in this paper. In the proposed approach, the fault-tolerant FPGA recovers from faults by reloading a proper configuration from a pool of multiple alternative configurations with overlaps. The overlaps are represented as a set of vectors in the domain partition model. To enhance the reliability, a procedure is also presented in which the set of vectors is heuristically filtered so that the corresponding small overlaps can be merged into larger ones. Experimental results on several benchmark circuits demonstrate the effectiveness of the proposed approach. Compared with previous approaches, the proposed approach increases the MTTF by up to 18.87%.
Introduction
Field programmable gate arrays (FPGAs) have inherent redundancy and reconfiguration capability, thus are appealing solutions for reliability-critical systems, such as remote, long life, or nonstop applications. However, due to the fast shrinking feature size, the FPGA devices are more vulnerable to electromigration [1] which will induce permanent faults.
To deal with permanent faults, a popular method is to protect the FPGA with multiple partially overlapped configurations. Each alternative configuration uses a different set of reprogrammable resources to implement the target circuit. When an error occurs, the current configuration is swapped out, and one of the alternative configurations that uses no faulty resource is swapped in.
Since any defect on the overlaps of multiple configurations may make all the involved configurations fail, the problem of planning the overlaps of alternative configurations is critical to maximize the reliability of a fault-tolerant (FT) FPGA. A domain partition (DP) model was proposed in [2] to solve this problem. Using the DP model, the reliability of the FT system can be easily formulated into a function of the overlapping areas. In this way the above problem can be modeled and solved as an optimization problem.
In this article, a DP model-based approach is proposed for recovery from permanent faults in FPGA-based reconfigurable systems. The remainder of this paper is organized as follows. Section 2 reviews previous research on fault recovery techniques developed for FPGA-based reconfigurable systems. The DP model is introduced in Sect. 3. Section 4 presents our heuristic procedure based on the DP model. Section 5 details the process of implementing the proposed approach to design an FPGA-based FT system. Experiments on the ITC'99 [3], [4] benchmark circuits are reported in Sect. 6. Section 7 summarizes the key features and the advantages of the proposed approach.
Related Work
Previous research on the fault recovery in reconfigurable hardware can be roughly classified into the following two categories [5] : (1) run-time replacing and rerouting techniques for dynamic generation of alternative configurations after error detection and defect location; and (2) approaches based on precompiled configurations created during the design phase.
In [6] , the recovery with configurations generated by run-time replacing and rerouting techniques was studied. A roving self-testing area-based technique was proposed for fault detection and location. The FPGA was filled with a pattern that every used logic cell was adjacent to at least one spare logic cell. Thus, any defective cell could be bypassed through the nearest spare cell during replacing and rerouting on error. Another example for the use of run-time replacing and rerouting approach can be found in [7] .
The major shortcoming of recovery techniques based on run-time replacing and rerouting is the long recovery latency. It usually takes minutes to complete the replacing and rerouting. Moreover, precise fault location is required to identify the defective logic cells or interconnection resources before generating the alternative configurations. This time-consuming fault location also increases the downtime and decreases the system availability.
Copyright © 2011 The Institute of Electronics, Information and Communication Engineers
To reduce the latency for recovery, a precompiled configuration-based approach was proposed in [8] . The entire design was partitioned into a set of tiles. The alternative configurations for each tile were generated in the design phase and stored in a nonvolatile memory. When an error occurs, one of the alternative configurations containing no faulty resource would be loaded from the memory to recover the system.
The proposed approach in this article is also based on precompiled configuration. Other examples for fault recovery using precompiled configurations can be seen in [5] , [9] - [14] .
A drawback of the precompiled configuration approach is the extra storage space needed for the precompiled configurations. Data compression techniques may help to save storage space. Further discussion can be found in [5].
The precompiled configuration-based approaches can be classified into two types: the non-overlapping scheme and the overlapping scheme. The non-overlapping schemes were implemented in [5] , [13] - [15] . With the assumption of single defect, a single recovery attempt can bring the system back to its normal operation since each alternative configuration has been mapped into distinct programmable resources from the basis configuration.
The overlapping scheme usually provides more alternative configurations than the non-overlapping schemes and leads to higher reliability under the same redundancy. However, any defect on the overlaps of multiple configurations may make all the involved configurations fail. Thus, the problem of planning the overlaps of alternative configurations is critical to maximize the reliability of a reconfigurable FT system using the overlapping scheme.
An empirical solution to plan the overlaps is the "k+m" solution which was used in [5] , [10] , [11] , [14] . In these works, each FPGA was divided into tiles. In general, each tile contained k + m columns of configurable logic blocks (CLBs). In each configuration of a certain tile, k columns were used to implement the original circuit and m columns were left as spare. In [15] , one spare column was preserved for each occupied column. Such a solution can be considered as a special case of the "k+m" solution where k = 1 and m = 1.
There are two major shortcomings of the column-based "k+m" solution. The first is a lack of flexibility. When the total number of available columns in the device is not divisible by k + m, the design cannot be symmetrically divided into tiles and thus the reliability degrades. The other is inefficiency in tolerating interconnection defects. In each alternative configuration of a certain tile, placement is prohibited in the m spare columns. However, in practice some interconnection resources in the prohibited areas may be used to route the design. This results in a much larger overlapping area between the alternative configurations than the designers expected and degrades the reliability.
The stand-by-redundant system introduced in [16] is similar to the "k+m" solution. The available set of CLBs is partitioned into p blocks, of which m blocks are spares and the rest are used for the application. This model is more flexible than the "k+m" solution since a block is not limited to an entire CLB column.
A dynamic reconfiguration design flow named early access partial reconfiguration (EAPR) was proposed by Xilinx Inc. in [17]. In the EAPR design flow, partial reconfiguration can take place in regions of any rectangular size instead of the whole-column regions required by the earlier modular-design-based flow. Recently, a partition-based design flow using the ISE 12.1 design suite was proposed by Xilinx Inc. [18], in which engineers can create tiles easily and the interfaces between the tiles are generated automatically. With these techniques, engineers can develop more flexible and reliable solutions, such as DP model-based solutions, to implement reconfigurable FT systems.
The DP model was originally proposed in [2] for modeling the FT systems with multiple overlapping configurations. The problem of planning the overlaps and maximizing the reliability was converted into a pattern optimization under given area (POGA) problem. A first-order optimal solution (denoted as the POGA solution) was conjectured to plan the overlaps of configurations. As mentioned in [2] , the POGA solution is not globally optimal. In contrast with the "k+m" solution, the POGA solution is more flexible but less reliable under the same redundancy.
In our previous work [21], [22], a second-order approximation domain partition (SOADP) method was developed. It can find more reliable solutions than the POGA solution by formulating the reliability maximization problem as a second-order planning problem. However, the SOADP method only works well when the redundancy is less than 2. Experiments show that the second-order approximate model is not accurate enough when the FPGA provides more than twice the resources required by the target circuit. Sometimes the objective function of the second-order planning process is even a constant, so that the SOADP method fails to find the locally optimal solution. In addition, solving the second-order planning problem takes an extremely long time before the optimal assignment of the alternative configurations can be found.
In this paper, we propose a DP model-based approach for the fault recovery of FPGA-based reconfigurable systems. The overlapping scheme based on precompiled configuration is exploited. The overlap of the alternative configurations of each tile is assigned by a heuristic procedure with the DP model. Thus, the proposed approach is expected to be not only reliable but also less time-consuming than existing solutions.
Domain Partition Model
The proposed heuristic procedure is based on the DP model, thus we need to briefly introduce the DP model before the description of the proposed approach is detailed.
The DP model given in [2] is an abstraction of a reconfigurable device with several versions of circuit configurations, as follows. Let S be a fault-tolerant design implemented on a reconfigurable FPGA. The domain partition of S is defined as a 2^p-dimensional vector N, whose components are N(v), v ∈ {0, 1}^p, where p is the number of alternative configurations, v is a p-dimensional vector whose elements are in {0, 1}, and N(v) is the relative area of the overlap of the configurations corresponding to the 1s in vector v. Each overlapping area resulting from a non-zero N(v) is called a subarea.
An example of using the DP model in a duple modular redundancy (DMR) system with three alternative configurations is shown in Fig. 1 , where the shaded blocks denote the occupied resources (for application) and the blank blocks represent the unoccupied resources (for spare).
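Partition N must satisfy the following relations:

$$N(v) \ge 0, \quad \forall v \in V \tag{1}$$

$$\sum_{v \in V} N(v) = 1 \tag{2}$$

$$\sum_{v \in V} v_i \, N(v) = \frac{1}{r}, \quad i = 1, 2, \ldots, p \tag{3}$$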
where V is the set of all p-dimensional vectors whose elements are in {0, 1}, i.e. V = {0, 1}^p, v_i represents the i-th component of v, and r represents the redundancy, i.e. the ratio of the area of all available resources in S to the area occupied by a single configuration.
Equation (1) means that the area of each overlap cannot be negative. Equation (2) means that the area of all available resources is obtained by summing up all the overlapping areas. Equation (3) states that the relative area of each configuration is 1/r, which follows from the definition of r. For example, in the FT system shown in Fig. 1, r = 2 since there are 6 blocks in total and half of them are occupied in each configuration. We have V = {000, 001, 010, 011, 100, 101, 110, 111}. Obviously, Eqs. (1) and (2) are satisfied. For Cnf_1, Eq. (3) requires N(100) + N(101) + N(110) + N(111) = 1/2.
Calculations about Cnf 2 and Cnf 3 are similar.
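The three constraints can be checked mechanically. The sketch below is only an illustration: the function name `check_partition` and the example partition are assumptions introduced here (the partition is a hypothetical one with p = 3 and r = 2, not the one drawn in Fig. 1). A partition is stored as a dictionary from bit tuples v to relative areas N(v).

```python
from fractions import Fraction
from itertools import product


def check_partition(partition, p, r):
    """Return True if `partition` satisfies Eqs. (1)-(3) of the DP model.

    `partition` maps p-bit tuples v to relative areas N(v); vectors that
    are absent from the dictionary are treated as N(v) = 0.
    """
    vectors = list(product((0, 1), repeat=p))            # V = {0, 1}^p

    def area(v):
        return partition.get(v, Fraction(0))

    non_negative = all(area(v) >= 0 for v in vectors)     # Eq. (1)
    covers_all = sum(area(v) for v in vectors) == 1       # Eq. (2)
    per_config = all(                                     # Eq. (3)
        sum(v[i] * area(v) for v in vectors) == 1 / Fraction(r)
        for i in range(p)
    )
    return non_negative and covers_all and per_config


# Hypothetical partition (an assumption, not Fig. 1): every vector with one
# or two 1s gets relative area 1/6, so each configuration covers 1/2.
example = {v: Fraction(1, 6)
           for v in product((0, 1), repeat=3) if sum(v) in (1, 2)}
print(check_partition(example, p=3, r=2))
```

Exact fractions are used so that the equality tests in Eqs. (2) and (3) are not disturbed by floating-point rounding.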
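The reliability of S at time t under partition N can be formulated as:

$$R(N) = \sum_{v \in V,\ v \neq \mathbf{0}} (-1)^{|v|+1} \, \frac{P_A}{\prod_{u \in V:\ u_i v_i = 0\ \forall i} P(u)} \tag{4}$$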
where P_A denotes the probability that the entire FPGA does not contain any faults, P(v) denotes the probability that the subarea corresponding to the vector v does not contain any faults, |v| denotes the number of 1s in vector v, and λ is the failure rate of all available resources in S. The derivation of Eq. (4) is based on the inclusion-exclusion principle. The details can be found in [2].
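The inclusion-exclusion argument can be illustrated in a few lines. The sketch below is not the formulation from [2]; it assumes that the system survives as long as at least one configuration is fault-free and that faults in different subareas are independent. The function name `reliability`, the example partition and the exponential fault model used for P(v) are assumptions chosen for illustration.

```python
import math
from itertools import product


def reliability(prob_ok, p):
    """Evaluate R(N) by inclusion-exclusion over groups of configurations.

    `prob_ok` maps each p-bit tuple v to P(v), the probability that the
    subarea of v is fault-free; absent vectors have no area.
    """
    total = 0.0
    for u in product((0, 1), repeat=p):       # u selects a group of configurations
        k = sum(u)
        if k == 0:
            continue
        # Every configuration in the group is fault-free exactly when each
        # subarea used by at least one of them is fault-free.
        joint = math.prod(pv for v, pv in prob_ok.items()
                          if any(vi and ui for vi, ui in zip(v, u)))
        total += (-1) ** (k + 1) * joint
    return total


# Hypothetical two-configuration partition (p = 2, r = 1.25).
areas = {(1, 1): 0.6, (1, 0): 0.2, (0, 1): 0.2}
lam_t = 1.0                                    # normalised time, lambda * t
prob_ok = {v: math.exp(-a * lam_t) for v, a in areas.items()}
print(reliability(prob_ok, p=2))
```

The loop visits all 2^p − 1 non-empty groups of configurations, so this direct evaluation is only practical for small p.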
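In a homogeneous FT system S, P_A can be formulated as P_A = e^{−λt}, and P(v) can be formulated as P(v) = e^{−N(v)λt}. Thus, we have:

$$R(N) = \sum_{v \in V,\ v \neq \mathbf{0}} (-1)^{|v|+1} \exp\!\Big( -\lambda t \Big( 1 - \sum_{u \in V:\ u_i v_i = 0\ \forall i} N(u) \Big) \Big) \tag{5}$$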
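It is proposed in [2] that, to maximize the reliability R(N*) of a homogeneous FT system S, the partition N* must satisfy:

$$N^*(v) = \begin{cases} \dfrac{p/r - q + 1}{\binom{p}{q}}, & |v| = q \\[2mm] \dfrac{q - p/r}{\binom{p}{q-1}}, & |v| = q - 1 \\[2mm] 0, & \text{otherwise} \end{cases}$$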
where q is the smallest integer not less than p/r. The above solution is denoted as the POGA solution in this paper; it was proved to satisfy the Karush-Kuhn-Tucker conditions. As mentioned in [2], the POGA solution is not always globally optimal.
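For experimentation, a POGA-style partition can be built programmatically. The sketch below rests on an assumption that should be checked against [2]: the area is spread uniformly over the vectors with q ones and with q − 1 ones, and the two per-subarea areas are the unique values that satisfy Eqs. (2) and (3). The function name `poga_partition` is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import ceil, comb


def poga_partition(p, r):
    """Uniform partition on the vectors of weight q and q - 1 (assumed POGA form).

    Pass r as an int, a Fraction or a decimal string such as "1.25" so that
    the arithmetic stays exact.
    """
    ratio = Fraction(p) / Fraction(r)              # p / r
    q = ceil(ratio)                                # smallest integer >= p / r
    area_q = (ratio - q + 1) / comb(p, q)          # each subarea with |v| = q
    area_q1 = (q - ratio) / comb(p, q - 1)         # each subarea with |v| = q - 1
    partition = {}
    for v in product((0, 1), repeat=p):
        weight = sum(v)
        if weight == q and area_q > 0:
            partition[v] = area_q
        elif weight == q - 1 and area_q1 > 0:
            partition[v] = area_q1
    return partition


part = poga_partition(6, 2)
print(len(part), set(part.values()))
```

For p = 6 and r = 2 this gives twenty subareas of equal area, one for every vector with three 1s.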
Heuristic Procedure

Lessons from Previous Solutions
Theoretical analysis shows that the "k+m" solution can be more reliable than the POGA solution. For example, we use the DP model to formulate a "k+m" FT system, as shown in Fig. 2. There are six alternative configurations, thus p = 6. There are four available columns in total, of which two are used for the application in each configuration, thus r = 2. This solution can be formulated using the DP model as:

$$N_{2+2}(v) = \begin{cases} 1/4, & v \in V_{2+2} \\ 0, & \text{otherwise} \end{cases}$$

where V_{2+2} = {111000, 100110, 010101, 001011}.
In the "k+m" design, each column can be considered as a subarea in the DP model. For example, the leftmost column can be considered as the subarea resulting from N_{2+2}(111000), which represents the overlapping area occupied by Cnf_1, Cnf_2 and Cnf_3 but not by Cnf_4, Cnf_5 and Cnf_6.
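The mapping from columns to DP vectors can be reproduced with a short script. In the sketch below the function name `k_plus_m_vectors` is hypothetical, and the numbering of the configurations (lexicographic order of the occupied column pairs) is an assumption; with that numbering the four columns yield the vectors of V_{2+2} listed above.

```python
from itertools import combinations


def k_plus_m_vectors(k, m):
    """Return one DP vector per column of a "k+m" tile.

    Each configuration occupies k of the k + m columns. Bit i of a column's
    vector is 1 when configuration i occupies that column.
    """
    columns = range(k + m)
    configurations = list(combinations(columns, k))   # assumed numbering
    return ["".join("1" if column in cfg else "0" for cfg in configurations)
            for column in columns]


print(k_plus_m_vectors(2, 2))   # ['111000', '100110', '010101', '001011']
```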
The POGA solution for the above "2+2" system (p = 6 and r = 2) is shown in Fig. 3 .
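As shown in Fig. 3, the domain partition N*_{2+2} satisfies:

$$N^*_{2+2}(v) = \begin{cases} 1/20, & v \in V^*_{2+2} \\ 0, & \text{otherwise} \end{cases}$$

where V*_{2+2} = {v : |v| = 3}. According to Eq. (5), we have:

$$R(N_{2+2}) = 6e^{-\lambda t/2} - 8e^{-3\lambda t/4} + 3e^{-\lambda t}$$

$$R(N^*_{2+2}) = 6e^{-\lambda t/2} - 15e^{-4\lambda t/5} + 20e^{-19\lambda t/20} - 10e^{-\lambda t}$$

When t < 6/λ (the mean time at which 6 faults may have occurred in the FPGA), we have R(N_{2+2}) > R(N*_{2+2}), which means that the "k+m" solution is more reliable than the POGA solution before 6 faults occur in the "2+2" system.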
The above calculations show that the reliability of the "k+m" solution is higher than that of the POGA solution under the same redundancy and the same number of configurations. One possible reason is that there are far fewer subareas in the N_{2+2} solution than in the N*_{2+2} solution. As shown in Fig. 2 and Fig. 3, there are only 4 subareas in the N_{2+2} solution, whereas there are 20 subareas in the N*_{2+2} solution. When the first fault occurs, it disables 3 configurations, since each subarea is shared by 3 configurations in both the "k+m" solution and the POGA solution. However, when the second fault occurs, it is more likely to fall in the already faulty subarea in the "k+m" solution than in the POGA solution, since each subarea in the "k+m" solution is larger. Thus the second fault has less chance of disabling the remaining 3 fault-free configurations in the "k+m" solution, which results in higher reliability.
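The comparison can be checked numerically. The sketch below assumes the homogeneous fault model, four columns of relative area 1/4 for the "2+2" tile, and twenty subareas of relative area 1/20 for the POGA partition with p = 6 and r = 2. The function names are hypothetical, and s stands for the normalised time λt.

```python
import math
from math import comb


def r_k_plus_m(s):
    """'2+2' tile: it works while at least 2 of its 4 columns are fault-free."""
    x = math.exp(-s / 4)                        # one column is fault-free
    return sum(comb(4, j) * x**j * (1 - x) ** (4 - j) for j in range(2, 5))


def r_poga(s):
    """POGA partition for p = 6, r = 2, by inclusion-exclusion on group size k."""
    total = 0.0
    for k in range(1, 7):
        # Relative area of the subareas used by none of the k configurations.
        untouched = comb(6 - k, 3) / 20
        total += (-1) ** (k + 1) * comb(6, k) * math.exp(-s * (1 - untouched))
    return total


for s in (0.5, 1.0, 2.0, 4.0):
    print(s, r_k_plus_m(s), r_poga(s))
```

Under these assumptions the "2+2" reliability stays slightly above the POGA reliability at each of the sampled times, in line with the argument above.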
Furthermore, we observed that the set V_{2+2} is a subset of V*_{2+2} and that the Hamming distance between every two vectors in V_{2+2} is equal. Thus, we propose to improve the POGA solution by merging several small subareas into larger ones: removing some vectors from the sets {v : |v| = q} and {v : |v| = q − 1}, ensuring that the remaining vectors have equal Hamming distance from each other, and enlarging the subareas corresponding to the remaining vectors.
Improving the POGA Solution
Inspired by the "k+m" solution, we found that the POGA solution can be improved by merging several small subareas into larger ones. The proposed domain partition is as follows: The sets V̂_a and V̂_b can be generated by calling the heuristic procedure Hamming Filter (as shown in Fig. 4), as Hamming Filter(p, a) and Hamming Filter(p, b).
The procedure Hamming Filter is used to find all possible subsets V̂ of the set {v | v ∈ {0, 1}^p, |v| = c} such that all the vectors in V̂ have equal Hamming distance from each other. Note that the Hamming distance between any two vectors in V̂ is even, thus ham dist ≥ 2. When c < p/2, the sets of the indexes corresponding with 1s respectively in the …

There may be many equivalent solutions for each ham dist under given p and c, thus we can break the current iteration cycle after a solution has been found (indicated by flag = TRUE) under a certain ham dist and continue the procedure after increasing ham dist by 2.
A sub-procedure Ham Check is shown in Fig. 5. It is used to check whether a satisfying subset V̂ that contains the vector v can be found. A global variable sln set is defined as a partial solution set, each component of which is a set of vectors that have equal Hamming distance from each other.
In the sub-procedure Sln Check shown in Fig. 6, v_x denotes the x-th vector of V̂, thus v …

After we call the procedure Hamming Filter under given p and c, a partial solution set sln set is returned as the result. If sln set is empty, i.e. no satisfying V̂ can be found under the given p and c, we set V̂ to its initial value, i.e. V̂ = {v | v ∈ {0, 1}^p, |v| = c}. Thus we can construct N̂ from the two sets V̂_a and V̂_b, which are generated by calling the procedure Hamming Filter with c = a and c = b, respectively. If we cannot find satisfying V̂_a and V̂_b where a = q and b = q − 1, the proposed partition N̂ is equivalent to the POGA solution.
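The procedures in Figs. 4 to 6 are not reproduced in the text, so the sketch below is not the authors' Hamming Filter. It is a plain backtracking search, introduced here only to illustrate the underlying idea: among the vectors of length p with c ones, find a largest subset whose pairwise Hamming distance equals a given value. The function names `hamming` and `equidistant_subset` are hypothetical.

```python
from itertools import combinations


def hamming(u, v):
    """Number of positions in which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def equidistant_subset(p, c, dist):
    """Largest set of weight-c vectors of length p with pairwise distance `dist`.

    Exhaustive backtracking: only practical for small p.
    """
    candidates = [tuple(1 if i in ones else 0 for i in range(p))
                  for ones in combinations(range(p), c)]
    best = []

    def extend(chosen, start):
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
        for idx in range(start, len(candidates)):
            v = candidates[idx]
            if all(hamming(v, w) == dist for w in chosen):
                chosen.append(v)
                extend(chosen, idx + 1)
                chosen.pop()

    extend([], 0)
    return best


found = equidistant_subset(p=6, c=3, dist=4)
print(["".join(map(str, v)) for v in found])
```

For p = 6, c = 3 and a distance of 4, the search returns four vectors, which coincide with the set V_{2+2} of the "2+2" example.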
We can exhaustively try each possible combination of a and b to find the most reliable one. An intuitive explanation of factors 2 and 3 above is as follows: the subareas overlapped by a alternative configurations are the bottleneck of the reliability. Decreasing a reduces the number of overlapped configurations in the bottleneck subareas, and increasing b reduces the area of the bottleneck subareas, which may lead to higher reliability. (Note that the heuristics in factors 2 and 3 do not always hold.)
Exhaustively trying all possible combinations is recommended, since the time cost is acceptable (see Table 2).
Implementing the Fault Recovery Approach
There are two major steps to implement the proposed fault recovery approach, i.e. generating the alternative configurations and loading a proper configuration on error.
Generating the Alternative Configurations
To generate the alternative configurations for an FT FPGA, the following steps need to be carried out.
• Step 1: Deciding the parameters. Three parameters must be determined before applying the proposed DP model-based approach: the number of tiles (n), the number of configurations for each tile (p) and the redundancy (r). To avoid the bucket effect, in which the weakest tile limits the reliability of the whole design, we divide the FPGA evenly so that all tiles have identical p and r.
The value of r is the ratio of the size of the FPGA to the size of the original circuit. The values of n and p can be determined according to the reliability requirement of the application. An FT design with larger n and p has a better chance of tolerating defects.
• Step 2: Assigning the overlaps. The overlaps of the alternative configurations are assigned following the solution generated by the heuristic procedure Hamming Filter. Since all tiles have equal p and r, this process needs to be carried out only once to determine the alternative configurations for all tiles.
The partition N̂ will be used to create area constraints in Step 4.
• Step 3: Tiling the design. To implement the proposed approach, the target design should be divided into tiles. Each tile contains a sub-circuit of the target functional circuit, which fulfills a subset of the original functionality. The tiles are described as black-box instances and are connected together to fulfill the original functionality in the top-level HDL file of the tiled design. Each tile should be protected by concurrent error detection (CED) techniques so that errors can be reported and the faulty tile can be located.
In a tiled design, the top-level HDL file contains black-box instances of all the partially reconfigurable (PR) tiles, the interconnections between them and the static logic circuits.
The static logic circuits are assigned to the base region, which is not partially reconfigurable. The static logic circuits include global clocks, bus-macro (BM) enable controls, the CED information collector, the state machine for PR management, I/O interfaces, etc. The CED information collector monitors the error signals reported by the CED checkers in the PR tiles and drives the state machine for PR management. The state machine handles the BM enable controls and communicates with an external or internal processor which controls the process of partial reconfiguration.
As in the EAPR flow [17] , each tile will be assigned to a partially reconfigurable region (PRR) which is defined in the top-level area group range constraints. Each alternative configuration of a tile is a partially reconfigurable module (PRM) that can be loaded into the corresponding PRR. Each PRM of a certain PRR is generated from a unique constraint file and a common HDL source file describing the PR tile (module) which is also used to generate other PRMs of this PRR.
As in the new partition-based PR flow [18] , each tile is a reconfigurable partition (RP), which is a logical section of the design. Each alternative configuration of a tile is a reconfigurable module (RM) for the corresponding RP. The static logic will lie in static partitions. Bus-macros are no longer required since the logic in the partitions will be connected through partition pins which will be automatically created for all reconfigurable partition ports.
• Step 4: Creating area constraints. As shown in the domain partition generated by the procedure Hamming Filter, each configuration is composed of several subareas. For each tile, we place the subareas in the PRR. Area constraints for each alternative configuration (PRM) can then be created according to the partition solved in Step 2. As mentioned in Step 3, each PRM has a unique constraint file that prevents the PRM from occupying the spare resources. An example can be found in Fig. 7, where p = 2, r = 1.25, n = 2, and the proposed partition is: N̂(11) = 0.6, N̂(10) = N̂(01) = 0.2. In Fig. 7, the shaded areas are prohibited. The subarea placement and the constraints for the two PRMs of PRR_b are omitted, since they are similar to those for PRR_a.
• Step 5: Generating the configurations. EDA tools can be used to generate the configurations from the source files describing the application circuit and the area constraint files. The detailed flow can be found in [17], [18].
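The constraint creation of Step 4 can be pictured with a small helper. The sketch below is illustrative only: the function name `prohibited_subareas`, the region names and the placement are assumptions, and the output is a plain listing rather than a constraint file in any vendor format. Each placed subarea carries its DP vector, and configuration i must stay out of every subarea whose vector has a 0 in position i.

```python
def prohibited_subareas(subareas, p):
    """For each configuration i (0-based), list the subareas it must not occupy.

    `subareas` maps a region name to its DP vector, written as a string of
    p bits. A 0 in position i means configuration i leaves that region spare.
    """
    return {i: sorted(name for name, vector in subareas.items()
                      if vector[i] == "0")
            for i in range(p)}


# Hypothetical placement for one tile with p = 2; the names are illustrative.
tile_a = {"left": "10", "middle": "11", "right": "01"}
print(prohibited_subareas(tile_a, p=2))   # {0: ['right'], 1: ['left']}
```

In an actual flow, each listed region would be turned into a prohibit constraint in the constraint file of the corresponding PRM.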
Concurrent Error Detection
Since the target design is tiled in the proposed approach, CED techniques are required to locate the failed tile so that partial reconfiguration can be initiated on error. CED techniques can be classified into four categories: hardware redundancy CED, time redundancy CED, information redundancy CED, and mixed redundancy CED.
Hardware redundancy CED techniques provide quicker response than other CED techniques but require more hardware resources. An example is duplicating the original functional circuit and connecting both outputs to a comparator for the final output. Such a DMR CED technique is application independent. An example of application-specific techniques can be found in [19], where an additional independent unit predicts some special characteristic of the output (such as parity or the count of 0s or 1s) according to the input sequence, and a checker compares the predicted characteristic with the output to indicate an error on mismatch.
Time redundancy CED techniques, such as recomputing with shifted operands and inverse comparison [5], require fewer resources but may cause performance degradation. Information redundancy CED techniques are usually application-specific. For example, the inputs are encoded so as to keep some arithmetic characteristic that is preserved after passing through the functional circuit. A checker before the final outputs can then indicate errors by checking that characteristic of the outputs. Time-shared TMR introduced in [20] is an example of mixed redundancy CED techniques.
Hardware redundancy CED techniques are recommended when implementing the proposed approach, in order to shorten the latency of error detection.
Recovery Approach
After the alternative configurations have been generated and stored, the reconfigurable FPGA can recover from errors by reloading one of the configurations from memory. One simple way to recover from an error is to try the possible configurations one after another until a proper configuration, which does not involve the faulty resources, is loaded. Such a technique is called blind reconfiguration and was introduced in [5].
Note, SRAM-type FPGAs are sensitive to single event upsets [23] . From the user perspective, pseudo-permanent faults will arise when SEUs change the values of the truth table in the configuration SRAM. In practice, it is not always necessary to distinguish between pseudo-permanent faults and permanent faults if we exhaustively retry the alternative configurations on those faults. In other words, the proposed recovery approach can cover the pseudo-permanent faults induced by the SEUs in an identical process with the permanent faults.
Several testing methods can be used to determine whether an attempt is successful, such as the built-in self-testing methods introduced in [6] and [7], and the application-dependent testing method introduced in [24].
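The retry loop of blind reconfiguration can be sketched in software. Everything below is a simulation under stated assumptions: `load` and `passes_test` are hypothetical callbacks that stand in for the partial reconfiguration controller and for whichever testing method is chosen, and the stored configurations and the faulty subarea are invented for the example.

```python
def blind_reconfigure(configurations, load, passes_test):
    """Load the stored configurations in turn until one passes the test.

    Returns the name of the first working configuration, or None when
    every alternative involves a faulty resource.
    """
    for name in configurations:
        load(name)
        if passes_test():
            return name
    return None


# Simulated tile: configuration name -> subareas it occupies (hypothetical).
stored = {"cfg1": {"A", "B"}, "cfg2": {"A", "C"}, "cfg3": {"B", "C"}}
faulty = {"A"}                      # subarea hit by a permanent fault
state = {"active": None}


def load(name):
    state["active"] = name          # stands in for a partial reconfiguration


def passes_test():
    return not (stored[state["active"]] & faulty)


print(blind_reconfigure(stored, load, passes_test))   # cfg3
```

No fault location is needed in this loop, which is why the same procedure also handles pseudo-permanent faults caused by SEUs.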
Experiments
Comparison with DP Based Approaches
To evaluate the effectiveness of the proposed domain partition model approach with the heuristic procedure (HPDP for short in the following), several circuits from the ITC'99 benchmark circuits are chosen to be implemented on Xilinx's FPGAs. For the purpose of comparisons, the chosen circuits were also implemented under the POGA approach and the SOADP approach. Feature parameters from implementation tools were imported into a Monte Carlo simulation script running in the MATLAB toolbox to evaluate reliability.
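A Monte Carlo reliability estimate of this kind can be sketched as follows. This is not the authors' MATLAB script: it is a Python illustration under the assumptions that faults arrive as a Poisson process with rate λ over the whole tile, that each fault lands in a subarea with probability equal to its relative area, and that the tile fails once every configuration contains a faulty subarea. The function name `simulate_failure` and the example partition are hypothetical.

```python
import math
import random


def simulate_failure(partition, p, lam_t, trials, seed=0):
    """Estimate the failure probability at normalised time lam_t = lambda * t.

    `partition` maps p-bit tuples v to relative areas N(v).
    """
    rng = random.Random(seed)
    vectors = list(partition)
    weights = [partition[v] for v in vectors]
    failures = 0
    for _ in range(trials):
        # Poisson number of faults: count unit-rate arrivals before lam_t.
        n_faults, clock = 0, rng.expovariate(1.0)
        while clock < lam_t:
            n_faults += 1
            clock += rng.expovariate(1.0)
        alive = [True] * p
        for v in rng.choices(vectors, weights=weights, k=n_faults):
            for i in range(p):
                if v[i]:
                    alive[i] = False      # configuration i uses a faulty subarea
        failures += not any(alive)
    return failures / trials


# Hypothetical two-configuration partition (p = 2, r = 1.25).
areas = {(1, 1): 0.6, (1, 0): 0.2, (0, 1): 0.2}
estimate = simulate_failure(areas, p=2, lam_t=1.0, trials=20000)
analytic = 1 - (2 * math.exp(-0.8) - math.exp(-1.0))
print(estimate, analytic)
```

For this small case the estimate can be compared with the closed form obtained by inclusion-exclusion, which gives a quick sanity check of the simulator before it is applied to larger partitions.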
Failure rate of S under partition N is used as a scale to measure the ability of fault tolerance of each implementation, which is defined as:
In each experimental implementation, we symmetrically divided the baseline circuit into 8 tiles, each of which was protected with p alternative configurations. The parameter r is evaluated as the ratio of the total number of available slices in the device to the number of slices needed to implement the baseline circuit. In practice, we usually have up to 3 times redundant resources in the FT design, thus we chose the device for each benchmark circuit so that 1 < r < 3 held in each implementation. The parameter assignment is shown in Table 1. The mean time to failure (MTTF) of the FPGA is 1/λ, where λ is the failure rate of the FPGA device. In the experiments, we took into account the data presented in [25], and set the defect rate of the FPGA device as λ = 10
. The failure rate of the implementations at a certain time point t and the time costs of the HPDP and SOADP approaches are listed in Table 2. In the HPDP approach, the time cost is the total time for exhaustively trying all possible combinations of a and b by calling the procedures Hamming Filter(p, a) and Hamming Filter(p, b) and comparing the reliability of each attempt until the most reliable solution N̂ among the possible combinations is found. In the SOADP approach, the time cost is the time for running the second-order planning process until a locally optimal solution is found.
As shown in Table 2 , the HPDP solutions outperformed the POGA solutions in all the implementations. Compared with the POGA implementations, the proposed HPDP implementations reduced the failure rate by 11.85% to 78.81% when t = 1/λ (mean time when the first fault may occur in FPGA), and reduced the failure rate by 4.56% to 63.10% when t = 2/λ (mean time when 2 faults may occur in FPGA).
Only in the case of "b17" on 4vlx15 did the SOADP method provide a more reliable solution than the HPDP. In the other cases, the HPDP method provided more reliable solutions. In particular, in the cases where r > 2, the SOADP method failed to provide locally optimal solutions. In some cases, the SOADP processes even exited immediately after starting, because the Hessian matrices of the second-order optimization problem were all zero, as marked with short dashes ("−") in the time costs in Table 2. In these cases, the first-order optimal solutions were returned by the SOADP processes, which were equivalent to the corresponding POGA solutions.
As shown in Table 2 , the time cost for calling the proposed procedure and finding the reliable solutionN is applicable.
Comparison with Stand-by-Redundant
For the purpose of comparison, several circuits from the ITC'99 benchmark were implemented under the HPDP approach and the stand-by-redundant (SBR for short) approach. The SBR system introduced in [16] is an abstraction for FPGA-based FT systems. The "k+m" implementations introduced in [5], [10], [11], [14], [15] can also be formulated with the SBR model. The failure rate of S under partition N and the MTTF are used as metrics for the fault tolerance of each implementation. MTTF is calculated as:

$$\mathrm{MTTF} = \int_0^{\infty} R(t)\, dt$$
where R(t) is defined in Eq. (5) .
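Because the reliability of a homogeneous system is a signed sum of exponentials in λt, the MTTF integral can be evaluated term by term: each term of the form e^{−cλt} contributes 1/(cλ). The sketch below assumes that form of R(t) and the standard definition of MTTF as the integral of R(t) from 0 to infinity; the function name `mttf_times_lambda` is hypothetical, and the result is reported in units of 1/λ.

```python
from fractions import Fraction
from itertools import product


def mttf_times_lambda(partition, p):
    """Return lambda * MTTF for a partition of relative areas N(v).

    Integrates the inclusion-exclusion form of R(t) term by term.
    """
    total = Fraction(0)
    for u in product((0, 1), repeat=p):        # u selects a group of configurations
        k = sum(u)
        if k == 0:
            continue
        # Relative area used by none of the configurations in the group.
        untouched = sum((a for v, a in partition.items()
                         if not any(vi and ui for vi, ui in zip(v, u))),
                        Fraction(0))
        total += Fraction((-1) ** (k + 1)) / (1 - untouched)
    return total


# "2+2" tile: four columns of relative area 1/4 (vectors as bit tuples).
columns = ["111000", "100110", "010101", "001011"]
two_plus_two = {tuple(int(b) for b in v): Fraction(1, 4) for v in columns}
print(mttf_times_lambda(two_plus_two, p=6))    # 13/3
```

The value 13/3 agrees with a direct argument: the tile fails at the third column failure, and the expected waiting times for the first three failures are 1/λ, 4/(3λ) and 2/λ.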
As in the SBR model, p̃ denotes the number of available blocks in each tile (p was used in [16]; to avoid confusion we change it to p̃) and m̃ denotes the number of spare blocks in each tile.
The parameter assignment is shown in Table 3. We symmetrically divided the baseline circuit into 8 tiles. p̃ and m̃ were assigned according to the total number of available slices in the device and the number of slices that can be used as spares. The parameter r was evaluated as the ratio of the total number of available slices to the number of slices needed to implement the baseline circuit. The parameter p was set to p = C_{p̃}^{m̃}, the number of ways of choosing the m̃ spare blocks among the p̃ blocks, so that the HPDP implementation contained the same number of alternative configurations as the SBR implementation.
The failure rate of the implementations at a certain time point t and the MTTF of the implementations are shown in Table 4 .
As shown in Table 4 , compared with the SBR implementations, the proposed HPDP implementations reduced the failure rate by up to 43.15% when t = 1/λ (mean time when the first fault may occur in FPGA), and reduced the failure rate by up to 42.34% when t = 2/λ (mean time when 2 faults may occur in FPGA). In other words, the HPDP may help to decrease the number of failed chips in a large system such as [26] by up to 40% at the mean time when the first two faults may occur.
Compared with the SBR implementations, the proposed HPDP implementations increased the MTTF by up to 18.87%, which means that the lifetime of the FPGA can be noticeably extended by the proposed approach.
Conclusions
A permanent fault recovery approach for FPGA-based reconfigurable systems has been presented in this paper. In the proposed approach, the entire FPGA is divided into several tiles, each of which is protected by multiple partially overlapping configurations. A heuristic procedure based on the DP model has also been proposed to plan the alternative configurations and improve the reliability. Compared with existing work, the proposed approach offers higher reliability with an acceptable time cost for finding the reliable solutions. The experiments have shown that when p is a combinatorial number (i.e. p can be written as p = C_x^y), calling the proposed procedure Hamming Filter(p, yp/x) or Hamming Filter(p, (x − y)p/x) always returns a non-empty partial solution set. In other words, the set of "k+m" solutions is a subset of the proposed solutions. Thus, the proposed implementations will never be less reliable than the "k+m" implementations.
Due to the page limit, we ignored the costs caused by the static-region resources (such as the IO pins, the bus macros and the interface ports between the tiles), which are overlapped by all the alternative configurations and reduce the reliability. However, they also help to reduce the interference during partial reconfigurations. We will address the issue on the tradeoff between the reliability and the performance in our future work.
