An evolvable hardware paradigm for autonomic regeneration called Competitive Runtime Reconfiguration (CRR) is developed whereby an individual's performance is assessed using the dynamic properties of the population rather than a static fitness function. CRR employs a Sliding Evaluation Window of recent throughput data and a periodically updated Outlier Threshold, which avoids the extensive downtime associated with exhaustive Genetic Algorithm (GA) based evaluation. The relative fitness measure favors graceful degradation by leveraging the behavioral diversity among the individuals in the population. Throughput-driven assessment identifies configurations whose discrepancy values violate the Outlier Threshold; these configurations are then selected for modification using Genetic Operators. Application of CRR to FPGA-based logic circuits demonstrates the identification of configurations impacted by a set of randomly injected stuck-at faults. Furthermore, regeneration of functionality can be observed within a few hundred repair iterations. The viable throughput of the CRR system during the repair process was maintained at greater than 91.7% of the fault-free throughput rate under a number of circuit scenarios. CRR results are also compared with alternative soft computing approaches for autonomous refurbishment using the MCNC-91 benchmarks.
Introduction
Evolvable hardware (EH) mechanisms for self-repair seek to actively restore lost functionality of digital logic circuits realized in reprogrammable logic devices such as Field Programmable Gate Arrays (FPGAs). EH techniques aim to provide fault-recovery capability without the weight, size, and cost penalties incurred with redundant spares. Hence, recent research has explored the feasibility of using GA techniques to increase FPGA reliability and autonomy [6, 14, 2, 5, 19, 24, 32]. In particular, Ref. [21] presents a detailed overview of several dozen works which define the field of autonomous regeneration, and categorizes the possible approaches. A key feature of the existing evolutionary techniques for this problem is their reliance upon either exhaustive functional fitness evaluation or exhaustive resource testing during regeneration.
Alternatively, using the Competitive Runtime Reconfiguration (CRR) approach developed herein, an initial population of functionally identical (same input-output behavior), yet physically distinct (alternative design or place-and-route realization) FPGA configurations is produced at design-time. At runtime, these individuals compete for selection, which favors fault-free behavior. Discrepant behavior, whereby the outputs of two competing individuals do not agree on a bit-by-bit basis, is used as a metric during the fitness evaluation process. Over time, successive pair-wise comparisons establish the fitness of every individual in the population relative to the others. This fitness information can then be leveraged to select underperforming configurations for genetic modification such as crossover or mutation.
Conventional GA techniques employ a population-based optimization approach to the regeneration problem with the objective of producing a single best-fit configuration as the final outcome. They utilize a fitness function for each regenerated circuit which is evaluated exhaustively over all possible input values. However, given that partially complete repairs are often the best attainable with a tractably sized search [14, 27], there is no guarantee that the individual with the best absolute fitness measure over an exhaustive set of test inputs will also be the individual with superior performance under the subset of inputs actually applied. Thus, exhaustive evaluation of regenerated alternatives is computationally expensive, yet not necessarily indicative of the best-performing individual within a population of partially correct repairs. In this paper, we instead base fitness on the actual throughput data of the alternatives when they are placed into service. Hence, two innovations are developed in the CRR approach to facilitate self-adaptive EH regeneration:
(1) relaxation of the requirement for exhaustive assessment vectors; and (2) fitness evaluation based on outlier identification over time.
These characteristics are used to assess the evaluation overhead and the repair capabilities, respectively, of the CRR approach defined herein.
Related work
The existing schemes for EH repair can be broadly classified in terms of their refurbishment strategy, resource coverage, and the granularity at which faults are isolated.
Offline refurbishment with exhaustive functional testing
Several approaches to GA-based fault-handling in FPGAs utilize exhaustive testing for fault isolation and offline regeneration mechanisms. As listed in Table 1, conventional Triple Modular Redundancy (TMR) [15], Vigander's approach [27] which combines TMR with GAs, and other n-plex spatial voting techniques [17] deliver real-time fault-handling, but increase power consumption n-fold during fault-free operation. These techniques also require exhaustive evaluation of the function being refurbished. On the other hand, STARS [1] is an example of an exhaustive resource-oriented diagnostic that performs Built-in Self-Tests (BISTs) on sub-sections of the FPGA. Under this paradigm, the test area roves across all FPGA resources. Portions of the FPGA are continually taken offline in succession for testing while the functionality is moved to a new location within the reprogrammable fabric. One limitation is that detection latency can be large, since tests must sweep through all intervening resources before a fault is detected. Potential throughput unavailability due to diagnostic reconfigurations, even when no faults have yet occurred, is also a consideration.
As listed in Table 1, methods proposed by Lohn [14], Larchev [11], and Lach [10] rely either on offline regeneration supported by exhaustive functional testing, or on pre-determined spares defined at design-time. Other soft computing-based approaches to fault tolerance include Jiggling [4, 5], which combines TMR with a (1 + 1) GA, genetic-learning based models [24], an Asexual GA for FPGA self-repair [22], and bio-inspired systems for evolving dependability [26], which apply immune-based approaches on circuit case studies similar to those herein. In addition, Liu et al. [13] present another biologically inspired model for transient faults.
Of the methods in Table 1, only Keymeulen et al. [6] investigate the possibility of using a population-based approach to desensitize circuits to faults. They exploit evolutionary techniques at design-time so that a circuit is more likely to remain functional even in the presence of various a-priori envisioned faults. Their population-based fault tolerant design method evolves diverse circuits and then selects the single most fault-insensitive individual. While their approach provides passive runtime fault tolerance, the approach developed herein adapts dynamically to environmental demands through intrinsic evaluation of a fitness consensus from the entire population at runtime.
Forming a robust consensus using diversity
To form a robust consensus, Layzell and Thompson [12] dealt with fitness evaluation by exploiting populational fault tolerance (PFT). Under a PFT strategy, the creation of the best-fit individual proceeds incrementally by incorporating additional elements into partially correct prototypes as they adapt to faults. They speculate that evaluation becomes focused on the precise regions of relevance within the search space during the execution of online processes. This provides motivation to explore CRR's first goal mentioned in Section 1 of relaxing exhaustive input evaluation, especially before returning a partially repaired configuration back to service.
Yao et al. [30] further emphasize that in evolutionary systems the population as a whole contains more robust information than any one individual alone. They demonstrate the utility of information contained within the population using case studies from the domains of artificial neural networks and rule-based systems. In both cases, the final collection of individuals outperforms any single individual. Yao and Liu [31] further extend this concept by presenting four methods for combining the different individuals in the final population to generate system outputs. While the authors devise a method to utilize the information contained in the population to improve the final solution, they do not attempt to use that information to improve the optimization process itself. More recently, in Ref. [29], the authors describe using fitness sharing and negative correlation to create a diverse population of solutions. A combined solution is then obtained using a gating algorithm that ensures the best response to the observed stimuli. The authors claim that applying these techniques to evolvable hardware applications should be straightforward, but do not provide examples. They cite the absence of an optimal way of predicting the future performance of evolved circuits in unforeseen environments as an impediment. Problems related to fault tolerance in online evolution identified by the existing approaches can be addressed by a new Consensus-based Evaluation scheme [32]. With relative fitness measures based on competition, a consensus is produced regarding the fitness of individuals in response to the actual environmental stimuli with hardware in-the-loop. This intrinsic evolutionary process can adapt to runtime requirements and improve fault-handling capability. The approach utilizes a temporal, rather than spatial, voting scheme whereby the outputs of two competing instances are compared at any instant and alternative pairings are considered over time.
The presence or absence of a discrepancy is used to adjust the discrepancy values (DVs) of both individuals without rendering any judgment at that instant on which individual of the pair is actually faulty. The faulty, or later exonerated, configuration is determined over time through other pairings of competing configurations.
These concepts are expanded here by evaluating the real-time performance of individuals in comparison to others in the population. Instead of using an absolute fitness function with concomitant exhaustive testing, relative discrepancy values are used as the threshold to identify faulty individuals. The system favors selection of individuals which intrinsically perform well in the current environment, as dictated by the resource failure status and recent throughput data for the circuit itself. Each individual competes for selection based on its performance over an observation window, as described in the next section.
Competitive Runtime Reconfiguration (CRR) paradigm

Detecting faults using a population of alternatives
In the CRR paradigm, each individual in the population has an instance of one of the two complementary logic circuits which form a discrepancy detection arrangement [3]. For purposes of illustration in Fig. 1, assume two competing half-configurations labeled Functional Logic Left ("L") and Functional Logic Right ("R") are loaded in tandem on the physical FPGA platform. The half-configurations occupy mutually exclusive physical resources to implement identical functionality. Like a duplex version of TMR, the L and R functional logic elements are functionally identical, but their outputs are enabled if and only if the discrepancy detector in the complementary half finds the outputs to be fault-free. This arrangement accommodates a single fault in either half of the functional logic, or in the discrepancy detector logic of either half-configuration.
This arrangement realizes a conventional Concurrent Error Detection (CED) capability that detects, at minimum, any single resource fault with certainty [18]. As in traditional CED approaches, comparing the outputs of the two resident half-configurations produces either discrepant or matching outputs, which indicate the presence or absence, respectively, of faulty resources in the FPGA hardware platform.
Whenever two loaded half-configurations disagree on the computed value of throughput data, the discrepancy value (DV) of each half-configuration is incremented. Through repeated pairing over time, only those half-configurations which do not use faulty resources will eventually become preferred, as evidenced by their lower DVs. This is because the DV of a faulty half-configuration is increased whenever its fault is articulated, regardless of its pairing, whereas the DV is never increased when two fault-free half-configurations are paired together. This pairing process occurs as part of the normal throughput processing of the FPGA, without additional test vectors or other resource diagnostics, by tracking the Health State of each half-configuration.
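The pairing argument above can be checked with a small simulation. The sketch below is illustrative only and is not the authors' C++ simulator: it assumes 10 L and 10 R half-configurations, a single faulty L individual (the hypothetical `("L", 0)`) whose fault is articulated by one of 64 equiprobable inputs, uniformly random pairings, and a bit-exact output comparison. The function name `simulate_dvs` and its parameters are hypothetical.

```python
import random

def simulate_dvs(n_left=10, n_right=10, faulty=("L", 0), fault_inputs=frozenset({7}),
                 n_inputs=64, iterations=20000, seed=1):
    """Accumulate discrepancy values (DVs) under random L/R pairings.

    Illustrative assumptions: exactly one faulty half-configuration, whose
    output is wrong only for inputs in `fault_inputs`; every fault-free
    individual computes the correct output.
    """
    rng = random.Random(seed)
    dv = {("L", i): 0 for i in range(n_left)}
    dv.update({("R", i): 0 for i in range(n_right)})
    for _ in range(iterations):
        left = ("L", rng.randrange(n_left))
        right = ("R", rng.randrange(n_right))
        x = rng.randrange(n_inputs)  # throughput input doubles as a test input
        # A discrepancy occurs only if the faulty individual is loaded and
        # the current input articulates its fault.
        if faulty in (left, right) and x in fault_inputs:
            # Both DVs are charged; no judgment is made on which one is faulty.
            dv[left] += 1
            dv[right] += 1
    return dv
```

Under these assumptions the faulty individual's DV equals the total charged to its R partners, each R partner receives only a share of that total, and the fault-free L individuals, never paired with the faulty L, stay at zero.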
Tracking competence using health states
The Health State of the configurations that comprise the L and R functional configuration populations is managed by the procedural flow of the CRR algorithm depicted in Fig. 2. After Initialization, the L and R half-configurations are selected and loaded onto the FPGA. Next, the Detection process is conducted when the normal data processing inputs are applied to the FPGA. The DVs of the competing half-configurations are updated based on whether or not their outputs are discrepant. The central Primary Loop, representing discrepancy-free behavior, can repeat without reselection as long as no discrepancy occurs. However, even in the absence of any observed discrepancies, one or more of the competing individuals may be replaced to hasten regeneration when Under Repair individuals are present. As described later, the Replacement Rate, R_X, determines the frequency with which such discrepancy-free individuals are replaced to allow rotation of other individuals from the Dormant pool. For instance, system availability can be increased by using a relatively small value for R_X rather than a relatively high one.
As described in detail in Section 4, the Fitness State Adjustment process validates and updates the state of an individual after an Evaluation Window has elapsed. Otherwise, reselection occurs without updating the fitness state of the individual being replaced. For Under Repair individuals, if the corresponding hat matrix element H_ii, described in Section 4.3, exceeds the threshold value, then Genetic Operators are invoked only once, without attempting to achieve complete refurbishment. The modified configuration is then immediately returned to the pool of competing configurations and the process resumes, starting with the Selection phase.
Selecting candidates for competition
The population of configurations to be evolved by the GA consists of L and R individuals chosen by the Selection and Detection processes shown in Fig. 3. During selection, Pristine, Suspect, and Refurbished individuals [2] are preferred, in that order, for one half-configuration. The other half-configuration is selected by a stochastic process governed by the Reintroduction Rate (R). In particular, Under Repair individuals are selected as one of the competing half-configurations, on average, at a rate equal to R. Thus, a genetically modified Under Repair configuration is reintroduced at a controlled rate into the operational throughput flow. The reintroduced configurations act as new competitors that may exhibit fault-free behavior against the larger pool of configurations. Every individual in the population has a finite probability of being selected as an Active individual, with a suitable interval between successive selections.
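The selection policy can be sketched as follows. This is a minimal sketch under stated assumptions: the `Health` enum, `select_pair`, and the choice to draw the preferred half from L and the stochastic half from R are hypothetical, since the text does not fix which side plays which role; `reintro_rate` stands for the Reintroduction Rate R.

```python
import random
from enum import Enum

class Health(Enum):
    PRISTINE = 1
    SUSPECT = 2
    REFURBISHED = 3
    UNDER_REPAIR = 4

# Preference order for the "preferred" half-configuration.
PREFERENCE = (Health.PRISTINE, Health.SUSPECT, Health.REFURBISHED)

def select_pair(left_pool, right_pool, reintro_rate, rng):
    """Pick one L and one R half-configuration.

    left_pool / right_pool map individual id -> Health.
    Assumption: L supplies the preferred half, R the stochastic half.
    """
    def preferred(pool):
        for state in PREFERENCE:
            candidates = [i for i, h in pool.items() if h is state]
            if candidates:
                return rng.choice(candidates)
        return rng.choice(list(pool))  # fallback: only Under Repair remain

    first = preferred(left_pool)
    under_repair = [i for i, h in right_pool.items() if h is Health.UNDER_REPAIR]
    if under_repair and rng.random() < reintro_rate:
        second = rng.choice(under_repair)  # reintroduce a modified candidate
    else:
        second = preferred(right_pool)
    return first, second
```

On average, an Under Repair individual fills the stochastic slot in a fraction `reintro_rate` of selections, which is the controlled reintroduction described above.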
The Detection process is presented in the lower right corner of Fig. 3 . If a discrepancy is observed as a result of output comparison, the FPGA is reconfigured with a different pair of competing configurations and the output of the FPGA is not propagated externally. Thus, CRR employs the runtime inputs simultaneously as evaluation inputs, with subsequent isolation of outliers as described in Section 4.2. Also, the partially correct outputs generated by competing fault-affected individuals can improve availability as opposed to keeping a device offline while a perfect solution is evolved.
Identifying outliers by evaluating performance
Determination of the Evaluation Window
CRR uses runtime inputs for individual performance evaluation rather than exhaustive testing with a predefined set of test vectors. Nonetheless, sufficient testing of individuals with throughput data can provide adequate test coverage, as shown in Section 5. While the range and sequence of online inputs may not be known at design-time, a probabilistic model is analyzed here to estimate the number of evaluations needed for every possible input to appear at least once with high probability. An initial default width for the Evaluation Window, E, can then be selected based on this analysis.
The characteristics of the circuit Under Repair influence the choice of E, as illustrated here for an unsigned integer multiplier. Let the circuit input width, W, denote the total number of operand bits to the multiplier. For a 3-bit × 3-bit multiplier, W = 6, so the number of distinct input combinations is 2^W = 64, and an exhaustive set of inputs consists of all 64 combinations. Determining the number of random inputs needed for every possible input to appear at least once is similar to the coupon collector problem [7]. In the coupon collector problem, the expected number of coupons that must be collected to obtain at least one of each of K distinct coupons is K × H_K, where H_K is the K-th harmonic number [7]. However, for the exhaustive test modeling problem at hand, we need the number of random inputs required for all possible inputs to appear with a given confidence, not merely its expectation. This problem can be modeled as a game involving selection from a set of 64 differently colored balls. A single ball is selected in each drawing, with replacement. What is the probability that, after D drawings, each of the 64 colors has appeared at least once? Clearly, for D < 64 the probability is zero, and for D = 64 it is 64!/64^64 ≈ 3.2 × 10^−27, which is highly improbable.
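For reference, the coupon-collector expectation K × H_K can be evaluated exactly with rational arithmetic. This short sketch (the function name `expected_draws` is hypothetical) gives about 303.6 draws for K = 64. That figure is only a mean; it says nothing about the confidence of full coverage, which is why a probability model is needed.

```python
from fractions import Fraction

def expected_draws(k):
    """Coupon-collector expectation k * H_k: mean number of uniform draws
    (with replacement) until all k distinct values have appeared."""
    harmonic = sum(Fraction(1, i) for i in range(1, k + 1))  # exact H_k
    return k * harmonic
```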
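To solve this problem, first consider the case where all balls are of one color. After D drawings there is exactly 1^D = 1 feasible sample event, so
$$x_1 = 1^D = 1. \tag{1}$$
Now consider the case D ≥ 64. In general, for any constant value of D, a K-color experiment can be described as a sum of experiments involving smaller numbers of colors: each of the K^D possible draw sequences uses some nonempty subset of j colors, every one of which appears at least once, so that
$$K^D = \sum_{j=1}^{K} \binom{K}{j} x_j, \tag{2}$$
where x_j denotes the number of sequences of D drawings over j given colors in which every one of those colors appears at least once.
Since the numerical value of K^D in Eq. (2) can be excessively large, it may not be representable in a 32-bit unsigned long variable; for example, 64^64 = 2^384, which far exceeds 2^32 − 1. Therefore, an alternate representation expresses x_K in terms of the probability P_K that all K colors appear at least once. Since D is the number of drawings and K^D is the total number of equally likely draw sequences,
$$x_K = P_K \, K^D. \tag{3}$$
Dividing Eq. (2) by K^D and substituting x_j = P_j j^D from Eq. (3), we obtain
$$1 = \sum_{j=1}^{K} \binom{K}{j} \left(\frac{j}{K}\right)^{D} P_j, \tag{4}$$
and, from Eqs. (1) and (3), when K = 1,
$$P_1 = 1. \tag{5}$$
Therefore, in general:
$$P_K = 1 - \sum_{j=1}^{K-1} \binom{K}{j} \left(\frac{j}{K}\right)^{D} P_j. \tag{6}$$
Eq. (6) yields P_K recursively without the computational burden of calculating K^D, since every factor satisfies (j/K)^D ≤ ((K − 1)/K)^D < 1 for j < K.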
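The recursion in Eq. (6) translates directly into code. The sketch below (hypothetical name `coverage_probability`) uses floating-point ratios (j/K)^D so that K^D is never formed. One caveat of this floating-point sketch: because P_K is obtained as 1 minus a sum, the result loses relative precision when P_K is tiny (D close to K); in the high-coverage regime used to choose E it is well conditioned.

```python
from math import comb

def coverage_probability(k, d):
    """P_k from Eq. (6): probability that all k equiprobable inputs appear
    at least once in d uniform draws with replacement.

    Only ratios (j/k)**d <= 1 are computed, so k**d is never formed.
    """
    p = [0.0, 1.0]  # p[1] = P_1 = 1, Eq. (5); p[0] is unused padding
    for kk in range(2, k + 1):
        s = sum(comb(kk, j) * (j / kk) ** d * p[j] for j in range(1, kk))
        p.append(1.0 - s)
    return p[k]
```

Under these assumptions, `coverage_probability(64, 500)` and `coverage_probability(64, 600)` come out at roughly 0.976 and 0.995, in line with the certainties reported for K = 64.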
As shown in Fig. 4, when K = 16 colors and D = 100 drawings, the probability P_16 that all 16 colors appear approaches 100% (about 97.5%). Similarly, 250 trials for 32 colors are sufficient given equiprobable inputs. Table 2 shows the results for K = 64, which applies to the 3-bit × 3-bit multiplier. To achieve comprehensive coverage with a certainty of 97.59%, approximately 500 evaluations are sufficient. A certainty of 99.50% requires an Evaluation Window of width E = 600, which was adopted for the fault isolation experiments in Section 4.3. Thus, for a 3-bit × 3-bit multiplier design, if 1-out-of-64 inputs articulate a fault in a single individual C_i and all input combinations are equally likely to appear, then the expected discrepancy value after E = 600 evaluations is:
$$\mathbb{E}[\mathrm{DV}_i] = \frac{E}{64} = \frac{600}{64} = 9.375 \approx 9.4. \tag{7}$$
While this analysis does not guarantee any particular amount of test-vector coverage in practice, the experiments in Section 5 include numerous case studies where 99% or greater equiprobable coverage provides stable adaptive behavior during fitness evaluation.
Table 2. Probability of all 64 inputs appearing at least once given D evaluations.
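As an independent cross-check of the Table 2 coverage figures and of the expected DV, a Monte Carlo simulation can draw many Evaluation Windows of uniform inputs. This sketch assumes equiprobable inputs, as the analysis does; the names `empirical_coverage` and `expected_dv` are hypothetical, and the seed and trial count are arbitrary.

```python
import random

def empirical_coverage(n_inputs=64, window=600, trials=5000, seed=2):
    """Fraction of simulated Evaluation Windows of `window` uniform inputs
    in which every one of the `n_inputs` input combinations appears."""
    rng = random.Random(seed)
    population = range(n_inputs)
    hits = sum(len(set(rng.choices(population, k=window))) == n_inputs
               for _ in range(trials))
    return hits / trials

def expected_dv(window, fault_inputs=1, n_inputs=64):
    """Expected DV after `window` evaluations when `fault_inputs` of the
    `n_inputs` equiprobable inputs articulate the fault."""
    return window * fault_inputs / n_inputs
```

With 5000 trials the standard error of the estimate near 0.995 is about 0.001, so the simulated coverage for a 600-input window should land close to the analytical value.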
Determination of the Sliding Window width
To ascertain whether the DV anticipated by Eq. (7) indicates a change in fitness, it is compared to the consensus value of the discrepancies recently observed throughout the population. The Sliding Window, S, is used to update the global discrepancy consensus to which all individual values are compared. Typically, S is selected to be an integer multiple of E such that S = q × E, where 1 < q < |C| and |C| is the population size. In the experiments described, the Sliding Window width is selected such that q = 5 for |C| = 20, as these values are shown experimentally below to build a sufficient consensus of recent observations across 5/20 = 25% of the population. In particular, the size of E is inversely proportional to the update frequency of each individual DV_i for individual i.
For instance, as shown in Fig. 5, with a Sliding Window width of only 15 and a cut-off value of 0.5, 100% outlier identification can be achieved. To minimize fault location time, the lowest value of S, S_min, is sought which repeatedly identifies the faulty individuals. As S is reduced from 20 to 5, the curves indicate that a higher cut-off value is required to identify outliers and that 0.4 is sufficient for S ≥ 10. However, as shown in Fig. 5, with S = 5 and a cut-off value of 0.9, outliers can be consistently identified, yielding S_min = 5. A fault-impact value of 32-out-of-64 was selected to reflect a middle-range damage scenario, where the physical resource fault manifests a discrepancy for 50% of the applied inputs. For less catastrophic faults, consistent fault isolation can be relied upon with the parameters identified above.
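One way to maintain the Sliding Window consensus is a bounded FIFO of recent DV observations. The sketch below is an assumption about the bookkeeping, not the authors' implementation: each entry is the DV one individual reports when its Evaluation Window ends, so a width of S entries spans roughly S × E throughput iterations. The class name `SlidingConsensus` is hypothetical.

```python
from collections import deque

class SlidingConsensus:
    """Hypothetical helper holding the S most recent DV observations."""

    def __init__(self, width):
        # deque with maxlen discards the oldest entry automatically.
        self.window = deque(maxlen=width)

    def observe(self, dv):
        """Record the DV of an individual whose Evaluation Window just ended."""
        self.window.append(dv)

    def consensus(self):
        """Mean DV across the window (0.0 before any observation)."""
        return sum(self.window) / len(self.window) if self.window else 0.0
```

The window contents can also serve directly as the observation vector for the outlier test that follows.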
Identification of outliers
Outlier diagnostics are based on the Least Squares (LS) projection matrix H [23]. This matrix is well known as the hat matrix, because it puts a hat on the column vector y = (y_1, ..., y_n)^t such that ŷ = H × y, where ŷ is the LS prediction for y. The hat matrix H is defined as follows. Consider p explanatory variables and one response variable with n observations. The n-by-1 vector of responses is denoted by y = (y_1, ..., y_n)^t. The linear model states that y = X × θ + e, where θ is the vector of unknown parameters, e is the error vector, and X is the n-by-p matrix:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$
Then, the H matrix is composed from X as follows:
$$H = X \left(X^{t} X\right)^{-1} X^{t}$$
The diagonal elements of H have a direct interpretation as the effect exerted by the i-th observation on its own fitted value, because H_ii = ∂ŷ_i/∂y_i. Since H is symmetric and idempotent, its trace equals p, so the average value of the diagonal elements H_ii is p/n, and 0 ≤ H_ii ≤ 1 for all i.
In the CRR approach, the DV of each individual can be viewed as one observation of a single explanatory variable, and the Observation Interval can be set to the size of the entire population. Since the X matrix consists of only one column in our application, the product X^t X is a 1-by-1 matrix (a scalar), so its inverse is simply its reciprocal and H_ii = x_i^2 / Σ_j x_j^2. In general, the computational complexity of the H matrix approach is 2n^2 + 1.
The recommended threshold for the identification of outliers is H_ii > 2p/n, and a stricter cut-off value of 3p/n has been used in previous works [8, 9]. For the CRR fault isolation problem, setting p = 1 and n = 20 corresponds to one faulty individual among a population of 20. For example, a cut-off value of 10 × p/n = 10/20 = 0.5 can be used in conjunction with a larger Sliding Window width of 15 to favor fairly consistent outlier identification, as demonstrated in the case studies. Also, to increase the confidence with which outliers are isolated, we raise the threshold from one standard deviation above the mean to 2.5 standard deviations.
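With a single explanatory column, the hat-matrix diagonal reduces to H_ii = x_i^2 / Σ_j x_j^2, so outlier flagging needs no matrix inversion. The sketch below applies it to a list of DVs (for example, the contents of the Sliding Window). The function names and the all-zero guard are assumptions; `cutoff` can be 2p/n, 3p/n, or the 0.5 value used in the case studies.

```python
def hat_diagonal(dvs):
    """Diagonal of H = X (X^t X)^-1 X^t for a single-column X of DVs.

    With p = 1, X^t X is the scalar sum of squares, so
    H_ii = x_i**2 / sum_j x_j**2, and the diagonal sums to p = 1.
    """
    total = sum(x * x for x in dvs)
    if total == 0:
        return [0.0] * len(dvs)  # assumption: no discrepancies, no outliers
    return [x * x / total for x in dvs]

def outliers(dvs, cutoff):
    """Indices whose leverage H_ii exceeds the cut-off value."""
    return [i for i, h in enumerate(hat_diagonal(dvs)) if h > cutoff]
```

For 19 individuals with DV 2 and one with DV 60, the outlier's leverage is 3600/3676 ≈ 0.98 while the others sit near 0.001, so a cut-off of 0.5 isolates it without ambiguity.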
CRR performance evaluation
Outlier detection and fault isolation performance
Experimental results regarding the effect of the outlier detection parameters are illustrated in Figs. 6-13. Each has been generated using a simulator written in the C++ programming language which utilizes an equiprobable selection of individuals. In the data reported for experiments, the inputs causing the first discrepancy are applied once after each pair of faulty configurations is replaced to assess the damage definitively under a single-fault model.
To further illustrate how the DVs are mapped to the H_ii values, Figs. 6-13 are presented in pairs that show results from the same experiment. The first figure in each pair shows the observed DVs and the second shows the H_ii values calculated from these data. For example, Fig. 6 shows the DVs observed over 50 individual evaluations, where each evaluation occurs after the particular individual has completed E = 600 computations as an Active configuration on the simulated FPGA. Fig. 7 shows the plot of H_ii values for a subset of evaluations corresponding to the identification of the first outlier. Figs. 6 and 7 depict the identification of outlying individuals in a population with a 10-out-of-64 fault impact caused by a single fault. A Sliding Window width of 15 was used in this experiment. Fig. 6 shows identification of an outlier (a high y-axis value) at evaluation numbers 11, 31, and 48 (along the x-axis). In general, outliers can be identified with a periodicity of approximately 20 × E for a population size of 20 individuals. Choosing a smaller Sliding Window increases the frequency of outlier identification, as shown in subsequent experiments. Fig. 7 shows that the outlier at evaluation 11 exhibits H_ii ≈ 0.94, which is well over an order of magnitude larger than the H_ii ≈ 0.02 of the other competitors. Based on analysis of the H_ii values and an outlier cut-off value of 0.5, the outlying individual is identified without statistically significant ambiguity.
In Figs. 6 and 7, individual performance was measured using a Winner-Takes-All scheme, where the only information available from discrepancy detection is bit-wise output equality. An alternate discrepancy detection circuit could provide additional information, such as the Hamming distance between the outputs of the individuals. Using Hamming distance leads to outliers having a higher discrepancy value, as shown in Fig. 8 when compared to Fig. 6. As in the previous experiment, a 10-out-of-64 fault impact is considered, with a Sliding Window width of 15. The higher DV of approximately 140 arises because the Hamming distance between the observed discrepant output and the desired ideal output can be greater than 1. By contrast, in the previous case each discrepancy increments the DV of the corresponding individual by one, yielding DV ≈ 70. The Outlier Threshold nonetheless remains the same, since the hat matrix operates on normalized information. Figs. 8 and 9 show plots of the discrepancy value and the H_ii values when Hamming distance is used to quantify divergence; outliers are still clearly delineated as high values on the y-axis.
Although the Hamming distance of the outputs serves as a useful metric to quantify divergence, other performance evaluation schemes can be used to compare and evaluate the fitness of individuals. For example, in a bit-weight performance evaluation scheme, the output of a configuration is converted to a numerical value by assigning weights to its output bits. In this scheme, the binary output is evaluated as a binary number, and the numerical difference between the outputs serves as the discrepancy metric, so that errors in more significant bits correspond to more drastic errors in output values.
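The three discrepancy metrics discussed above can be written as functions of two output words. The sketch assumes outputs are unsigned integers (for the 3-bit × 3-bit multiplier, 6-bit products); the function names are hypothetical.

```python
def wta_metric(a, b):
    """Winner-Takes-All: 1 if the two outputs differ in any bit, else 0."""
    return int(a != b)

def hamming_metric(a, b):
    """Number of output bits in which the two words differ."""
    return bin(a ^ b).count("1")

def bit_weight_metric(a, b):
    """Numerical difference of the outputs read as unsigned binary numbers,
    so an error in a more significant bit costs more."""
    return abs(a - b)
```

For example, outputs 36 (100100) and 4 (000100) differ in one bit: the Winner-Takes-All and Hamming metrics both report 1, while the bit-weight metric reports 32 because the erroneous bit is the most significant one.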
When a single faulty L individual with a less catastrophic 1-out-of-64 fault impact is analyzed, two outlier points are successfully isolated, as shown in Fig. 10. Fig. 11 shows the corresponding plot of H_ii for the same experiment. The detection rate was observed to be 100% for this fault injection scenario. Compared with the results in Figs. 6 and 7, identification takes place more frequently, with a periodicity of approximately 5 × E. This corresponds to the use of a narrower, yet equally effective, Sliding Window width, as opposed to the 15 × E used in the earlier experiment. In Fig. 11, the outlier cut-off value is 0.3, as compared to 0.5 in Fig. 7. Also, the first outlier in Fig. 11 is closer to the cut-off value, which can be expected with a narrow Sliding Window. A wider Sliding Window width helps reinforce identification, yet too large a value can delay identification without improving discrimination between faulty and viable competitors.
Fig. 9. Plot of H_ii showing outlier identification when Hamming distance is used.
For scenarios with greater fault impact, where one or more faults affect one or more configurations, isolation is more challenging and time-consuming, as shown in Figs. 12 and 13. Both figures depict the isolation characteristics for a single faulty L individual with a 32-out-of-64 (50% correctness) fault impact. More observations are required than in the 1-out-of-64 scenario, and the divergence of the outlier is also greater. Individuals that are eventually identified as outliers are replaced by the CRR algorithm more often, since computations involving these individuals produce discrepant outputs. Under the default replacement strategy for discrepancy-free behavior depicted in Fig. 3, fault-free individuals reside on the FPGA indefinitely. In this experiment, however, they are replaced in accordance with the Replacement Rate R_X = 0.16, which corresponds to a guaranteed evaluation period of 100 contiguous iterations out of the E = 600 window. Individuals that do not produce discrepant outputs are replaced less frequently than those that do. Thus, individuals that are not fault-affected complete the required E iterations and finish evaluation much sooner than fault-affected individuals. This is because discrepancies trigger immediate reconfiguration as a means of maintaining throughput and improving system availability within limits dictated by R.
FPGA circuit representation and characteristics
The FPGA structure used in the following experiments is similar to that used by Miller and Thompson for GA-based arithmetic circuit design [16]. The feed-forward combinational logic circuit uses a rectangular array of nodes, each with two inputs and one output. Each node represents a Look-up Table (LUT) in the FPGA device, and a Configurable Logic Block (CLB) is composed of four LUTs. In the array, each CLB forms a row and its four LUTs are represented as the four columns of that row. There are five dyadic functions (OR, AND, XOR, NOR, NAND) and one unary function (NOT), each of which can be assigned to an LUT. The LUTs in the CLB array are indexed linearly from 1 to n. Array routing is defined by the internal connectivity and the inputs/outputs of the array. Internal connectivity is specified by the connections between the array cells. The inputs of a cell can only be the outputs of cells with lower row numbers. Thus, the linear labeling and connection restrictions impose a feed-forward structure on the combinational circuit.
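The feed-forward array can be modeled as a list of two-input nodes evaluated in linear order. In this sketch, a simplification assumed for brevity, signal indices run over the primary inputs followed by node outputs, so the feed-forward rule becomes "a node may only read a smaller index"; the grouping of nodes into four-LUT CLBs is omitted. `FUNCS`, `evaluate`, and the `HALF_ADDER` example are hypothetical. The example builds a half-adder without XOR, in the spirit of the XOR-free initial designs.

```python
# Two-input LUT functions on 0/1 signals; NOT ignores its second input.
FUNCS = {
    "OR":   lambda a, b: a | b,
    "AND":  lambda a, b: a & b,
    "XOR":  lambda a, b: a ^ b,
    "NOR":  lambda a, b: 1 - (a | b),
    "NAND": lambda a, b: 1 - (a & b),
    "NOT":  lambda a, b: 1 - a,
}

def evaluate(nodes, outputs, inputs):
    """Evaluate a feed-forward LUT netlist.

    nodes:   list of (func_name, src_a, src_b); sources index a signal list
             holding the primary inputs followed by node outputs in order.
    outputs: signal indices that drive the circuit outputs.
    """
    signals = list(inputs)
    for func, a, b in nodes:
        # Feed-forward rule: only already-computed signals may be read.
        if a >= len(signals) or b >= len(signals):
            raise ValueError("connection violates feed-forward order")
        signals.append(FUNCS[func](signals[a], signals[b]))
    return [signals[i] for i in outputs]

# Half-adder without XOR: signals 0=a, 1=b, 2=carry, 3=OR, 4=NAND, 5=sum.
HALF_ADDER = [("AND", 0, 1), ("OR", 0, 1), ("NAND", 0, 1), ("AND", 3, 4)]
```

`evaluate(HALF_ADDER, [5, 2], [a, b])` returns `[sum, carry]`; the sum is formed as (a OR b) AND (a NAND b), which costs two extra gates compared with a single XOR, illustrating why excluding XOR enlarges the designs.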
As an example of the circuit representation, the 3-bit × 3-bit multiplier can be implemented using the above FPGA structure, as shown in Fig. 14. The entire configuration utilizes 21 CLBs. XOR gates simplify the calculation of partial binary sums and thus reduce the number of gates required to build half-adders and full-adders. They are therefore excluded from the initial designs, which forces the use of more gates than in conventional multiplier designs and enlarges the design space.
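The gate-count penalty of excluding XOR can be checked exhaustively on a half-adder. The XOR-free form below, sum = (a OR b) AND (a NAND b), is one standard realization chosen for illustration; it is not taken from the paper's module library.

```python
from itertools import product


def half_adder_with_xor(a: int, b: int) -> tuple:
    """Two gates: XOR for the sum bit, AND for the carry bit."""
    return a ^ b, a & b


def half_adder_without_xor(a: int, b: int) -> tuple:
    """Four gates: sum = (a OR b) AND (a NAND b), carry = a AND b."""
    g_or = a | b
    g_nand = 1 - (a & b)
    return g_or & g_nand, a & b


# Exhaustive check over all four input combinations against integer addition.
for a, b in product((0, 1), repeat=2):
    expected = ((a + b) % 2, (a + b) // 2)
    assert half_adder_with_xor(a, b) == expected
    assert half_adder_without_xor(a, b) == expected
print("both half-adders agree on all inputs")
```

Doubling the gate count per half-adder in this way is what enlarges the space the GA can explore.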
A library of user-defined modules can be defined to instantiate a population of diverse yet functionally equivalent circuits. In this case study, 20 distinct individuals are created at design-time using a set of 10 or more variations of three fundamental subcircuits. These consist of parallel-AND, half-adder, and full-adder primitives. For example, 24 different full-adder designs and 18 different half-adder designs were created for use in building the individual 3-bit × 3-bit multiplier designs. Thus, each multiplier is a distinct combination of building blocks, where each building block itself is chosen from among alternate designs in the library. Fig. 14 illustrates an individual with three parallel-AND, three full-adder, and six half-adder modules.
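A minimal sketch of drawing distinct individuals from such a library follows. The variant names, the slot order, and the count of ten parallel-AND variants are assumptions for illustration (the text states only "10 or more variations"), and the wiring between modules is taken to be fixed by slot position.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical library: module type -> alternative, functionally equivalent designs.
LIBRARY: Dict[str, List[str]] = {
    "parallel_and": [f"PAND_{i}" for i in range(10)],  # assumed count
    "full_adder": [f"FA_{i}" for i in range(24)],
    "half_adder": [f"HA_{i}" for i in range(18)],
}

# Module slots of one multiplier, matching the module mix of Fig. 14.
SLOTS = ["parallel_and"] * 3 + ["full_adder"] * 3 + ["half_adder"] * 6


def instantiate_population(
    library: Dict[str, List[str]],
    slots: Sequence[str],
    size: int,
    seed: Optional[int] = None,
) -> List[Tuple[str, ...]]:
    """Draw `size` distinct individuals, each picking one library variant per slot."""
    capacity = math.prod(len(library[s]) for s in slots)
    if size > capacity:
        raise ValueError("library cannot supply that many distinct individuals")
    rng = random.Random(seed)
    seen = set()
    population = []
    while len(population) < size:
        individual = tuple(rng.choice(library[s]) for s in slots)
        if individual not in seen:  # reject duplicates so every individual is distinct
            seen.add(individual)
            population.append(individual)
    return population


population = instantiate_population(LIBRARY, SLOTS, size=20, seed=1)
print(len(population))  # 20
```

Rejection sampling is adequate here because the combination space is vastly larger than the 20 individuals requested.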
The population of competing alternatives is then divided into two groups, L and R, each of which uses an exclusive set of physical resources. To guarantee that offspring use only physical resources that are mutually exclusive with those of the other resident half-configuration, two-point crossover is carried out with another randomly selected Pristine, Suspect, or Refurbished individual belonging to the same group. By enforcing this speciation, breeding occurs exclusively within L or within R, and non-interfering resource use is maintained. The crossover points are chosen along CLB boundaries so that intra-CLB crossover is precluded. The mutation operator randomly changes a LUT's functionality or reconnects one input of the LUT to a new, randomly selected output inside the CLB.
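The two group-restricted operators can be sketched as follows. This is a hypothetical illustration: the `Individual` type, the function names, the 50/50 split between changing a function and reconnecting an input, and the set of legal sources (primary inputs or LUTs in lower rows, which preserves the feed-forward constraint) are assumptions rather than details given in the text.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

LUTS_PER_CLB = 4
FUNCTIONS = ("AND", "OR", "XOR", "NOR", "NAND", "NOT")
Node = Tuple[str, int, int]  # (function, source A, source B); inputs first, then LUTs


@dataclass
class Individual:
    group: str  # "L" or "R": the half of the physical resources it occupies
    nodes: List[Node]  # LUTS_PER_CLB consecutive genes per CLB


def two_point_crossover(p1: Individual, p2: Individual, rng: random.Random) -> Individual:
    """Child takes p1's genes with a middle run of whole CLBs copied from p2."""
    if p1.group != p2.group:
        raise ValueError("crossover is restricted to a single group (L or R)")
    if len(p1.nodes) != len(p2.nodes) or len(p1.nodes) % LUTS_PER_CLB:
        raise ValueError("parents must have the same whole number of CLBs")
    n_clbs = len(p1.nodes) // LUTS_PER_CLB
    if n_clbs < 3:
        raise ValueError("two interior cut points need at least three CLBs")
    # Cut points lie on interior CLB boundaries, so no CLB is ever split.
    i, j = sorted(rng.sample(range(1, n_clbs), 2))
    lo, hi = i * LUTS_PER_CLB, j * LUTS_PER_CLB
    return Individual(p1.group, p1.nodes[:lo] + p2.nodes[lo:hi] + p1.nodes[hi:])


def mutate(ind: Individual, n_inputs: int, rng: random.Random) -> Individual:
    """Change one LUT's function, or reconnect one of its inputs to a legal source."""
    nodes = list(ind.nodes)
    k = rng.randrange(len(nodes))
    func, a, b = nodes[k]
    if rng.random() < 0.5:
        func = rng.choice([f for f in FUNCTIONS if f != func])
    else:
        # Legal sources: all primary inputs plus every LUT in a row lower than LUT k's row.
        new_src = rng.randrange(n_inputs + (k // LUTS_PER_CLB) * LUTS_PER_CLB)
        if rng.random() < 0.5:
            a = new_src
        else:
            b = new_src
    nodes[k] = (func, a, b)
    return Individual(ind.group, nodes)
```

Raising an error on cross-group pairs, rather than silently skipping them, makes any violation of the L/R speciation rule visible immediately.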
Refurbishment of a unique failed configuration: multiplier case study
In this experiment, GA-based recovery operators are applied to regenerate the functionality of the affected individuals. To simulate a hardware fault in the FPGA, a single stuck-at fault is inserted at a randomly chosen LUT input pin. This fault happens to affect the L individuals in the population. Later, a different single fault that affects the R individuals is introduced, and the experiment is repeated. Upon observing the first discrepancy, the same inputs are applied once to the reloaded configurations as a definitive means of damage assessment under a single-fault model. Over 25 experimental runs, an average of 2,171 iterations was required to dependably demote the fitness state of the affected individual from Pristine to Under Repair. During regeneration, the Genetic Algorithm applies inter-module crossover and an intra-module mutation operator called the input permutation operator. Unlike traditional mutation, the input permutation operator alters a specific LUT's functionality, choosing from among AND, OR, XOR, NOR, and NAND gates, and also changes the connections to its input pins. This mutation, in conjunction with the crossover operations, enables exploration of a wide range of designs. Table 3 lists the evolutionary regeneration characteristics of CRR for stuck-at-0 and stuck-at-1 faults injected at randomly chosen locations in the designs. For this experiment, the repair and operational thresholds, DV_R and DV_O, were 2.5 and 1, respectively. Expressing the thresholds as multiples of the standard deviation ensures that the system adapts both to catastrophic fault conditions and to conditions where very few discrepancies are observed.
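A population-relative outlier test of the kind described here can be sketched with the mean and standard deviation of the discrepancy values. This is a hedged illustration: the rule "discrepancy value > mean + k standard deviations" and the helper names are assumptions, and how CRR assigns DV_R = 2.5 and DV_O = 1 to particular fitness states is not reproduced.

```python
from statistics import mean, pstdev
from typing import Dict, List

DV_R = 2.5  # repair threshold multiplier used in this experiment
DV_O = 1.0  # operational threshold multiplier used in this experiment


def outlier_threshold(discrepancy_values: List[float], k: float) -> float:
    """Population mean plus k population standard deviations (assumed rule)."""
    return mean(discrepancy_values) + k * pstdev(discrepancy_values)


def find_outliers(dv_by_individual: Dict[str, float], k: float) -> List[str]:
    """Names of individuals whose discrepancy value strictly exceeds the threshold."""
    threshold = outlier_threshold(list(dv_by_individual.values()), k)
    return [name for name, dv in dv_by_individual.items() if dv > threshold]


# Nine individuals with few discrepancies and one fault-affected individual.
dv = {f"L{i}": 2 for i in range(9)}
dv["L9"] = 20
print(find_outliers(dv, DV_R))  # ['L9']: threshold is 3.8 + 2.5 * 5.4 = 17.3
```

Because the threshold scales with the population's own spread, multiplying every discrepancy value by the same factor leaves the set of outliers unchanged, which is one way to read the adaptivity claim above.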
The parameters that control the rate at which individuals are rotated on the FPGA, R and R_X, were set to 0.2 and 0.16, respectively. The Reintroduction Rate of 0.2 implies that 20% of the computations were carried out using a pair of individuals, one of which was Under Repair. Despite this, the effective throughput remains high, averaging above 97.5%. This shows that even the individuals undergoing repair produce useful output approximately [0.975 − (1 − R)]/R × 100% = 87.5% of the time.
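The 87.5% figure follows from splitting the effective throughput T between the two kinds of computation. A short derivation, assuming (as the calculation above implicitly does) that every computation not involving an Under Repair individual yields useful output, and writing p for the useful fraction of computations whose pair includes an Under Repair individual:

```latex
\begin{aligned}
T &= (1 - R)\cdot 1 + R\,p \\
\Rightarrow\; p &= \frac{T - (1 - R)}{R}
  = \frac{0.975 - (1 - 0.2)}{0.2}
  = \frac{0.175}{0.2} = 0.875 = 87.5\%
\end{aligned}
```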
Using a higher value for R leads to faster regeneration at an incremental cost to throughput during repair. This provides fine-grained control over system performance, measured in terms of availability and regeneration latency. Unlike other evolvable hardware approaches, CRR can be tuned either to reduce downtime and increase availability or to speed up the fault identification and regeneration process. The results listed in Table 3 indicate that the evolutionary algorithm is capable of regeneration for the tested fault locations. The correctness of the affected configurations is raised from as low as 22-out-of-64 to complete operational suitability, while the effective throughput is maintained above 97.6% throughout. The Repair Iterations column also shows that, without exhaustive evaluation, CRR-based regeneration can be more computationally tractable.

To further evaluate the performance of CRR-based regeneration on a wider variety of circuits, refurbishment experiments were conducted on circuits from the MCNC-91 benchmark suite [28]. Four circuits from the suite were used, namely z4ml, cm85a, cm138a, and 2x-decod. These circuits were chosen to represent two different kinds of circuits: z4ml and cm85a have a fan-in greater than their fan-out, while cm138a and 2x-decod have a fan-out greater than their fan-in. The z4ml circuit is a binary adder, cm138a and cm85a are combinational logic circuits, and the decod circuit is a decoder. In these refurbishment experiments, a single stuck-at fault is introduced at a random LUT input, affecting 18 of the 20 configurations. In contrast to the experiment described in Section 5.3, all but one configuration in each of the L and R groups are affected by the fault, such that a subset of their outputs is incorrect.
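The k-out-of-N correctness figures quoted above (for example, 22-out-of-64 for a six-input multiplier) can be illustrated by injecting a stuck-at fault on one LUT input pin and comparing the result against the fault-free configuration over every input combination. The gene format and the `(LUT index, pin, stuck value)` fault tuple are assumptions for this sketch, which uses a two-input half-adder rather than a multiplier.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

FUNCS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
    "NOT": lambda a, b: 1 - a,
}
Node = Tuple[str, int, int]  # (function, source A, source B); inputs first, then LUTs
Fault = Tuple[int, int, int]  # (LUT index, pin 0 or 1, stuck value 0 or 1)


def evaluate(nodes: Sequence[Node], inputs: Sequence[int], outputs: Sequence[int],
             fault: Optional[Fault] = None) -> List[int]:
    """Evaluate the feed-forward LUT array, optionally with one stuck-at input pin."""
    values = list(inputs)
    for i, (func, a, b) in enumerate(nodes):
        pins = [values[a], values[b]]
        if fault is not None and fault[0] == i:
            pins[fault[1]] = fault[2]  # the stuck pin ignores whatever drives it
        values.append(FUNCS[func](pins[0], pins[1]))
    return [values[o] for o in outputs]


def correctness(nodes: Sequence[Node], n_inputs: int, outputs: Sequence[int],
                fault: Optional[Fault]) -> Tuple[int, int]:
    """(correct, total): input combinations on which the faulty circuit still
    matches the fault-free one."""
    total = correct = 0
    for inputs in product((0, 1), repeat=n_inputs):
        total += 1
        if evaluate(nodes, inputs, outputs, fault) == evaluate(nodes, inputs, outputs):
            correct += 1
    return correct, total


half_adder = [("XOR", 0, 1), ("AND", 0, 1)]
print(correctness(half_adder, 2, [2, 3], fault=(0, 0, 0)))  # (2, 4)
```

Here the first input of the XOR stuck at 0 makes the sum equal b, which is wrong whenever a = 1, giving 2-out-of-4 correctness.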
All other parameters for these experiments are identical to those for the experiment described in Section 5.3. Results of the experiments with multiple fault-affected configurations are shown in Table 4 .
As listed in Table 4, CRR is able to refurbish a minimum of 14 configurations for every circuit, and the throughput is maintained above 91.7% in all experiments. Using a lower value of R would further improve the throughput. These experiments indicate that CRR is able to use information obtained from pairwise competition to direct the global search for refurbished configurations. Furthermore, no correlation is observed between the circuits' fan-in to fan-out ratio and the relative difficulty of refurbishing them.
Comparison with alternative soft computing fault tolerance schemes
In Vigander's experiment using a voting system in conjunction with TMR [27], the target circuit is a 4-bit × 4-bit multiplier. With a population size of 50 and a crossover rate of 70%, most of the 44 runs Vigander performed developed a set of three modules that vote to provide fully fit output for the exhaustive set of 256 unique input combinations. However, it is not always possible to identify a single fully repaired individual among those three alternatives. Vigander's population size of 50 is 500% greater than the population used in the repair experiments attempted herein. Most significantly, that approach relies on exhaustive serial testing against the set of all possible inputs. CRR, in contrast, achieves refurbishment with runtime inputs, continually providing validated outputs that maintain useful throughput above 91.7% across all the benchmarks mentioned above. Furthermore, in Ref. [11], 3-bit × 3-bit multipliers are evolved in the presence of faults using a Genetic Algorithm. Over 17 experimental runs, the fitness of the multiplier is improved by 12.6% on average. However, none of the runs succeeds in realizing a fully fit multiplier, even after as many as 222,056 generations. This highlights the asymptotic performance of GA optimization in this domain: improvement to the fitness of a failed configuration is rapid initially, but continued improvement to a fully fit solution is difficult. CRR is able to leverage the diverse genetic information contained in the population of designs to realize fully functional refurbished configurations within a few thousand iterations, as shown in Section 5.3. More importantly, even in the presence of fault-affected configurations, the population of solutions maintains some useful throughput.
Compared to Jiggling [4], which is also an evolutionary-algorithm-based approach to repairing permanent faults, CRR can exhibit lower latency because it does not rely on exhaustive tracking of the repair candidates. Additionally, the (1 + 1) Evolutionary System described therein relies on rollbacks to preserve best-fit mutants. Because CRR depends on a population of higher-fit alternatives that are evaluated temporally over many iterations, it precludes the need to roll back configurations and provides greater populational fault tolerance. In Ref. [5], results are presented for the performance of the Jiggling scheme applied to some of the MCNC-91 benchmark circuits presented in Section 5.4. For the cm138a circuit when configurations are available, the system is able to maintain reliability above 99% even when the mean time between faults is 52.8 s. The results also show that repair is achieved for some circuits, such as cm138a, while refurbishment remains elusive for others, such as the decod circuit. Building on the results of Keymeulen et al. on populational fault tolerance [6], CRR achieves device refurbishment at runtime while sustaining throughput with graceful degradation. Compared to the Roving STARs approach [1], CRR minimizes detection latency, since faults become evident immediately upon a discrepancy at the outputs. Also, unlike STARs, CRR's runtime-input-based performance evaluation allows it to leverage partially fit configurations to provide some functional throughput. This effectively refines the granularity of spare usage to include spares affected by stuck-at faults, as the GA may evolve solutions that use fault-affected resources in generating repair configurations. Nonetheless, despite its exhaustive testing of FPGA resources, STARs does avoid the need for convergence inherent in GA-based approaches, including CRR.
Conclusion
Evolutionary regeneration can benefit from a population of partially working designs that provide diverse, relevant alternatives. This also allows a departure from conventional fitness evaluation, which uses a rigid, individual-centric fitness measure specialized at design time for each functional behavior needing repair. Instead, CRR uses a self-adapting, population-centric assessment method at runtime based on actual throughput data rather than test vectors. CRR adapts its fitness criteria based on the performance of the population, thus providing graceful degradation. By using outlier detection techniques that work temporally without the need for exhaustive testing, CRR provides a fault tolerance technique that achieves usable device throughput during the fault detection process.
While the pre-existing methods focus on creating a single fully fit configuration, CRR extends this to maintain a population of solutions that have preferred fitness. This enhances the adaptability of viable alternatives to a variety of unanticipated faults. An additional benefit of maintaining a population of diverse partially fit individuals is that when the inputs to the system are localized to a subset of the set of all possible inputs, even partially fit individuals can assist in generating many of the required outputs, thereby improving the rate of throughput during recovery.
A concern with self-adapting systems such as CRR is the avoidance of undesirable emergent behaviors. In the case of CRR, one potential risk is functional drift over time. In other words, fitness evaluation without exhaustive testing creates a risk that the population's behavior will begin to deviate from the originally intended function as repeated repairs accumulate on long missions encountering multiple failures. One way CRR mitigates this risk is through its designation and tracking of Pristine configurations, which are normally preferred for selection. As long as at least one known Pristine configuration exists, there is an avenue for evolution to be guided back to the desired behavior under a number of fault scenarios.
Future work involves scaling to larger circuit sizes, which is currently being investigated in pipelined combinational logic designs as follows. After discrepancy detection, a Combinatorial Group Testing (CGT) method [25] tracks the utilization of resource sets among individuals in the population to quickly identify the stage containing the faulty resource. This can be incorporated within the configuration selection step of CRR. The Genetic Operators can then be applied only to that isolated stage to attempt recovery [20], thus providing an approach to extend the CRR method to larger circuits while remaining computationally tractable. Finally, hardware-in-the-loop experiments with Xilinx Virtex-II Pro [33] and Virtex-5 FPGAs indicate feasibility on various hardware platforms.
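The idea of tracking resource-set utilization to localize a faulty stage can be illustrated with simple counters. This is a simplified stand-in, not the CGT method of [25]: it scores each (stage, resource set) pair by the fraction of its uses that coincided with a discrepancy, and the class name and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

Resource = Tuple[int, Hashable]  # (pipeline stage index, resource-set identifier)


class StageSuspicionTracker:
    """Counts how often each stage resource set is used, and how often with a discrepancy."""

    def __init__(self) -> None:
        self.uses: Dict[Resource, int] = defaultdict(int)
        self.discrepant_uses: Dict[Resource, int] = defaultdict(int)

    def record(self, resources: Sequence[Resource], discrepant: bool) -> None:
        """Log one evaluation of an individual that used `resources`."""
        for r in resources:
            self.uses[r] += 1
            if discrepant:
                self.discrepant_uses[r] += 1

    def suspicion(self, r: Resource) -> float:
        """Fraction of this resource set's uses that produced a discrepancy."""
        return self.discrepant_uses[r] / self.uses[r] if self.uses[r] else 0.0

    def most_suspect(self) -> Resource:
        """Resource set with the highest discrepant-use ratio (first one wins ties)."""
        return max(self.uses, key=self.suspicion)


# Usage: four individuals share stage-0 and stage-1 resource sets; stage-1 set "x" is faulty.
tracker = StageSuspicionTracker()
tracker.record([(0, "a"), (1, "x")], discrepant=True)
tracker.record([(0, "b"), (1, "x")], discrepant=True)
tracker.record([(0, "a"), (1, "y")], discrepant=False)
tracker.record([(0, "b"), (1, "y")], discrepant=False)
print(tracker.most_suspect())  # (1, 'x')
```

A ratio is used instead of a strict elimination rule because a faulty resource does not produce a discrepancy on every input, so a single discrepancy-free use should not fully exonerate it.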
