Abstract-Triple Modular Redundancy (TMR) is a very effective technique to mitigate Single-Event Upset (SEU) effects in FPGAs, but it is often expensive in terms of FPGA resource utilization and power consumption. For certain applications, Partial TMR can be used to trade off reliability against the cost of mitigation. In this work we propose a new approach to build Partial TMR circuits for FPGAs using approximate logic circuits. This approach is scalable, works at a fine granularity, and can provide an optimal balance between reliability and overheads. The proposed approach has been validated using fault injection.
power consumption [4]. For applications that can tolerate some temporary misbehaviour, Partial TMR can be used to trade off reliability against the cost of mitigation. In [3], [4], an automatic solution for Partial TMR is proposed that is based on the concept of persistence. A persistent configuration bit is a sensitive configuration bit whose upset causes an error that cannot be recovered by scrubbing: even after the bit is repaired through configuration scrubbing, the FPGA circuit does not return to normal operation. By contrast, an upset in a non-persistent bit implies some data loss, but the design returns to normal operation once the error is repaired through configuration scrubbing. Persistent bits can be found by topological analysis, looking for feedback structures. These feedback structures are associated with persistent bits and thus must be triplicated first. If resources allow, mitigation is then applied to the non-persistent circuit structures to reduce the remaining design sensitivity.
In this work we propose the use of approximate logic circuits [8] to implement Partial TMR in FPGAs. An approximate logic circuit is a circuit that performs a possibly different but closely related logic function, so that it can be used for error detection or error masking where it overlaps with the original circuit. Partial TMR can then be implemented by voting among approximate logic circuits instead of exact copies of the original circuit. The goal is to find approximate logic circuits that cover the persistent or most critical bits and to reduce mitigation on less critical bits in order to save resources. This approach is scalable, works at a fine granularity, and can provide an optimal balance between reliability and overheads. In particular, an advantage of this approach is that it can selectively provide protection against unidirectional errors, i.e., errors that change a logic value in one direction only, either from 0 to 1 or from 1 to 0.
Approaches to build approximate logic circuits for partial mitigation of Single-Event Transients (SETs) have been proposed in [8]-[13]. However, to the best of our knowledge this is the first time approximate logic circuits have been used for partial error mitigation in FPGAs. The proposed approach has been preliminarily validated with a fault injection campaign on an Artix-7 FPGA, showing that error mitigation for critical errors is not degraded with respect to conventional TMR techniques, while the overhead can be significantly reduced.
II. APPROXIMATE LOGIC CIRCUITS
Given a logic function G, a logic function G' which correctly predicts the result of G for a fraction of its input space is called an approximate logic function with respect to G. Therefore, G' can be used for partial error mitigation of G in those cases where both functions overlap. The interest of this idea lies in finding approximate functions with a good balance between overheads and protection against faults. When approximate logic circuits are used for fault tolerance, it is necessary to identify the overlapping cases for comparison. In general, an additional logic function can be used to explicitly mark such cases, which is referred to as an indicator function [11]. However, this approach requires the indicator function itself to be robust. A more convenient approach is to implicitly identify the overlapping areas by means of implication relationships between logic functions.
Partial TMR in FPGAs using Approximate Logic Circuits
A. Sánchez-Clemente, L. Entrena, M. García-Valderas
A logic function F which satisfies the implication F ⇒ G is an under-approximation of G, and a logic function H which satisfies G ⇒ H is an over-approximation of G. The resulting relationship F ⊆ G ⊆ H [8] can be exploited to protect against faults in several ways, such as error detection by checking the implication relationship or partial error masking with a TMR-like scheme [10]. Consider the scheme shown in Fig. 2(a). This is similar to TMR but uses an over-approximation H and an under-approximation F instead of exact replicas of the target circuit G. With this scheme, the circuit is protected against single faults for those input vectors on which the three circuits give the same result, which is represented in Fig. 1 as the grey area. In comparison with conventional TMR, this scheme does not provide full coverage against single faults. However, if the approximations are properly chosen, relevant resource savings can be obtained with a low impact on error masking capabilities. In addition, the implication relationship F ⊆ G ⊆ H guarantees that the correct result is given in the absence of faults, because at least one of the two approximations agrees with the original circuit for every input vector. This scheme can be extended to sequential circuits as shown in Fig. 2(b). Approximations of the combinational part of the circuit are generated, and then both outputs and flip-flops are voted. It must be noted that voters can be affected by faults and therefore they should be hardened either by design or by using triple voters.
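The voting scheme above can be checked exhaustively on a small example. The following is a minimal sketch, assuming three hypothetical 3-input functions G, F and H that are not taken from the paper; it verifies that F ⊆ G ⊆ H, that the fault-free vote always equals G, and counts the input vectors for which a single faulty block output is still masked.

```python
from itertools import product

def majority(a, b, c):
    """Majority vote of three 0/1 values."""
    return (a & b) | (a & c) | (b & c)

# Hypothetical 3-input functions chosen for illustration (not from the paper).
def G(a, b, c):  # target function
    return (a & b) | c

def F(a, b, c):  # under-approximation: F = 1 implies G = 1
    return c

def H(a, b, c):  # over-approximation: G = 1 implies H = 1
    return a | c

def analyse():
    """Return (vectors protected against any single output fault, total vectors)."""
    vectors = list(product((0, 1), repeat=3))
    protected = 0
    for v in vectors:
        f, g, h = F(*v), G(*v), H(*v)
        assert f <= g <= h             # F ⊆ G ⊆ H holds on every vector
        assert majority(f, g, h) == g  # fault-free vote equals G
        # Flip each block output in turn; the vector is protected only if
        # the vote still equals G in all three cases.
        masked = all(
            majority(*(bit ^ (i == k) for i, bit in enumerate((f, g, h)))) == g
            for k in range(3)
        )
        protected += masked
    return protected, len(vectors)
```

For these toy functions, `analyse()` reports that 6 of the 8 input vectors are protected, namely those on which F, G and H agree; the two remaining vectors correspond to the region outside the overlap.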
Several methods have been proposed to generate approximations for a given logic circuit. These methods try to simplify the implementation of logic functions using cube elimination or addition [9], [11], [13], or line substitution [10], [12]. In this work, we have adapted the line substitution technique [10] to FPGA circuits. This technique is briefly summarized here. Given a logic circuit, an approximation can be obtained by replacing some of the lines of the circuit with logic constants. Then, the logic originally used to implement the replaced lines is eliminated. A major advantage of the line substitution technique is that, under certain conditions that can be automatically checked [10], each line substitution is guaranteed to produce either an under-approximation or an over-approximation, so that the implication relationships are preserved by construction. In FPGAs, circuits are typically built from look-up tables (LUTs), so the technique requires special attention to which logic transformations are performed to generate approximations. In particular, only line substitutions that effectively reduce the number of LUTs in the circuit are of interest, either by eliminating LUTs or by merging contiguous LUTs. Any other substitution degrades the logic function of the circuit without any benefit in terms of resource usage.
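As an illustration of line substitution, the sketch below replaces one line of a small gate-level netlist with a constant and classifies the result. It is only a sketch under stated assumptions: the netlist, the line names and the `classify` helper are hypothetical, and the classification is done by exhaustive comparison over all input vectors rather than by the automatic conditions of [10], which is only feasible for very small circuits.

```python
from itertools import product

# Hypothetical netlist: line name -> (operator, input line names).
NETLIST = {
    "n1": ("and", ("a", "b")),
    "n2": ("or", ("c", "d")),
    "out": ("or", ("n1", "n2")),
}
INPUTS = ("a", "b", "c", "d")
OPS = {"and": lambda x, y: x & y, "or": lambda x, y: x | y}

def evaluate(values, line, forced):
    """Evaluate `line`; lines listed in `forced` are replaced by constants."""
    if line in forced:
        return forced[line]
    if line in values:          # primary input
        return values[line]
    op, args = NETLIST[line]
    return OPS[op](*(evaluate(values, arg, forced) for arg in args))

def classify(line, constant):
    """Compare the substituted circuit with the original on every input vector."""
    under = over = True
    for bits in product((0, 1), repeat=len(INPUTS)):
        values = dict(zip(INPUTS, bits))
        exact = evaluate(values, "out", {})
        approx = evaluate(values, "out", {line: constant})
        under = under and approx <= exact   # approx = 1 implies exact = 1
        over = over and approx >= exact     # exact = 1 implies approx = 1
    if under and over:
        return "equivalent"
    if under:
        return "under"
    return "over" if over else "neither"
```

In this toy netlist, `classify("n1", 0)` removes the AND gate and yields an under-approximation, while `classify("n2", 1)` forces the output to 1 and yields an over-approximation. The sketch does not model LUT mapping, so it says nothing about whether a substitution actually saves LUTs.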
Identifying the most critical or persistent bits and the circuit primitives affected by them is a difficult task. Fault injection can be used for this purpose, as proposed in [3]. Although there are some known differences between the fault-injection method and a radiation testing environment, it is shown there that a good estimate of the dynamic cross section can be obtained by fault injection with a careful test design. However, this approach is very time consuming. In practice, the authors reason that circuit primitives that are part of feedback structures within the design contribute to the persistent error behaviour, and use a tool to identify the feedback structures [4]. In our case, we use a probabilistic approach to identify the most critical circuit components. Our approach is based on analyzing the number of clock cycles for which the faulty circuit response differs from the correct one. We consider a component critical if an error in the component produces a large number of differences. The rationale of this criticality metric is that a fault that produces a very different output response implies a large data loss and most likely means the circuit functionality cannot be recovered. Conversely, an error that produces an almost correct response involves some data loss, but the circuit can still be considered operational. Fig. 3 shows the results of this analysis for the circuit that has been used as a test case in the experiments. For each line in the circuit, we measured the effect of setting it to a constant value, 0 or 1. Note that this is equivalent to a permanent stuck-at fault. We used the stuck-at fault simulator HOPE [14] to count the number of clock cycles for which the circuit response is wrong. The results are shown in Fig. 3 in increasing order of the proposed criticality metric. It can be seen that there is an abrupt change around 25% of the faults. The faults on the left can be considered non-critical and the faults on the right critical.
We have checked that the non-critical faults actually belong to feed-forward logic. Therefore, this criterion coincides with the feedback criterion proposed in [4] for the studied circuit. In any case, the approximation method can use any other criticality criterion. After a classification of criticality is obtained, partial mitigation can be achieved by approximating the less critical components first. When TMR is used for mitigation, the only choices for each component are to include it in the mitigated section or not. However, approximate logic circuits allow an intermediate solution for unidirectional errors. For instance, an error that changes the logic value of a line from 0 to 1 may be critical while the opposite may not be. In such a case, we can replace the line by a constant 0 in just one of the approximate circuits. The overall effect is equivalent to having a non-critical unidirectional hardwired error in one of the three subcircuits that are voted.
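The criticality metric described above (number of clock cycles with a wrong response when a line is held at a constant) can be sketched in a few lines. This is a toy stand-in, not the HOPE-based flow used in this work: the circuit, the line names `next` and `ffwd`, and the stimuli are all hypothetical and were chosen only to make the computation easy to follow.

```python
# Hypothetical stimuli: one (x, m) input pair per clock cycle.
STIMULI = [(1, 0), (0, 0), (1, 1), (0, 1), (1, 0), (1, 1), (0, 0), (1, 0)]

def drive(name, value, stuck):
    """Return the line value, overridden if `stuck` = (line, constant) names it."""
    if stuck is not None and stuck[0] == name:
        return stuck[1]
    return value

def simulate(stimuli, stuck=None):
    """Toy sequential circuit: one flip-flop, a feedback line, a feed-forward line."""
    state, outputs = 0, []
    for x, m in stimuli:
        nxt = drive("next", state ^ x, stuck)  # feedback: drives the flip-flop
        ffwd = drive("ffwd", x & m, stuck)     # feed-forward: only reaches the output
        outputs.append(state | ffwd)
        state = nxt
    return outputs

def error_cycles(line, constant, stimuli=STIMULI):
    """Number of clock cycles whose output differs from the fault-free run."""
    golden = simulate(stimuli)
    faulty = simulate(stimuli, (line, constant))
    return sum(g != f for g, f in zip(golden, faulty))

def rank(lines=("next", "ffwd")):
    """Sort all (line, constant) substitutions by increasing error count."""
    faults = [(line, c) for line in lines for c in (0, 1)]
    return sorted(faults, key=lambda fault: error_cycles(*fault))
```

With these stimuli, holding `ffwd` at 0 changes no output cycle while holding it at 1 changes 5 of the 8 cycles, which illustrates how the two directions of an error on the same line can differ in criticality. The toy circuit is too small to reproduce the feedback versus feed-forward separation observed for the studied benchmark.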
A circuit that contains a unidirectional approximation still works correctly in the absence of SEUs because there are always two correct copies for voting. The circuit is not protected against non-critical errors in the same direction as the approximation, because such an error in either of the other two copies is enough for the circuit to fail. However, the circuit is still protected against critical errors in the opposite direction: for such an error to go unmasked, both correct copies must have errors in the same critical direction. Critical errors are therefore less likely than in the TMR circuit, where an error is observed when any two of the three copies fail. Thus, the overall effect is a shift of the critical cross-section to the non-critical cross-section with respect to the TMR circuit.
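The argument above can be made concrete by enumerating upset patterns on a single voted line. The sketch below is a simplified model under stated assumptions: each copy of the line is either correct or flipped, the third copy is replaced by a constant 0 in the approximated circuit and cannot be upset, and the critical direction is taken to be 0 to 1. The function name `failing_patterns` is hypothetical.

```python
from itertools import product

def majority(a, b, c):
    """Majority vote of three 0/1 values."""
    return (a & b) | (a & c) | (b & c)

def failing_patterns(true_value, approximated):
    """Map number of upset copies -> number of upset patterns giving a wrong vote."""
    upsettable = 2 if approximated else 3
    result = {}
    for mask in product((0, 1), repeat=upsettable):
        copies = [true_value ^ flip for flip in mask]
        if approximated:
            copies.append(0)  # third copy: line replaced by a constant 0
        if majority(*copies) != true_value:
            upsets = sum(mask)
            result[upsets] = result.get(upsets, 0) + 1
    return result
```

When the correct value is 0 (critical direction 0 to 1), full TMR fails for any of the three pairs of upset copies, whereas the approximated circuit fails only when both remaining copies are upset. When the correct value is 1, the approximated circuit already fails with a single upset in either correct copy, which is the non-critical direction given up by the approximation.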
III. EXPERIMENTAL RESULTS
The proposed approach has been tested with the B13 benchmark from the ITC'99 set. This benchmark was selected for compliance with current efforts towards a common set of benchmarks that can be used for comparison among different experiments [15]. It also includes a set of pre-generated input vectors that were designed by the ATPG community to fully cover the functionality of the circuit.
Firstly, we implemented a TMR version of B13 by triplicating the logic and using triple voters at the output of each flip-flop. The outputs of the circuit were also voted. From this TMR design, we implemented four different partial TMR designs (A1 to A4) using approximate circuits. Approximations were based on the results of the criticality analysis described in the previous section. As a general target, we considered only approximations that produce less than 15% erroneous responses in the execution of the full set of input stimuli. Then, for each design, lines with a percentage of errors below a selected threshold were replaced by constants. Namely, we used thresholds of 0.05%, 0.2%, 2% and 15% for A1 to A4, respectively. For the sake of comparison, we also considered a TMR design using single voters (SV) and the original unmitigated design (ORIG) of B13. The synthesis results for all considered designs, as reported by the Xilinx Vivado tool, are shown in Table I.

TABLE I. SYNTHESIS RESULTS

Design  #FFs  #LUTs
TMR     135   298
A1      127   287
A2      119   276
A3      117   265
A4      115   261
SV      135   213
ORIG     45    53

The experiments were run on an Artix-7 XC7A100T FPGA from Xilinx. As the B13 benchmark is rather small, we included 24 copies of each design in the same FPGA. All designs ran concurrently using the same input stimuli, which are the ITC'99 proposed input stimuli. We also included a small checker circuit to detect if any of the design copies produces an error or if the percentage of errors in a single execution of the full set of input stimuli is greater than the selected target of 15%. The checkers and the interface to the host are triplicated to reduce the impact of errors in these modules on the measurements. The complete circuit, including all copies of all designs, used 66% of the LUTs and 15% of the flip-flops of the FPGA device. Fault injection experiments were performed using the Soft Error Mitigation (SEM) Core from Xilinx [6]. The device was allowed to run without scrubbing or reconfiguration until we observed that most of the design versions had a critical error. Then it was fully reconfigured and checked again.
Preliminary fault injection results are summarized in Fig. 4, which shows a comparison of the Mean Time To Failure (MTTF) obtained in the experiments. For each version, the non-critical MTTF was measured as the mean time until the first error is detected in any of the 24 design copies. The critical MTTF was measured as the mean time until the first critical error is detected, i.e., until the percentage of erroneous responses in a single execution of the full set of input stimuli is greater than the selected target of 15%. As expected, the original unmitigated design (ORIG) shows a low MTTF in comparison with the mitigated versions. Its critical and non-critical MTTF are very close, because most errors produce a critical effect. The version using single voters (SV) improves MTTF with respect to the original version, but its critical MTTF is still close to its non-critical MTTF. By contrast, the full TMR version and all the partial TMR versions show a significant improvement of the critical MTTF. The results clearly show that the MTTF for critical errors is not degraded in the Partial TMR versions. With respect to the non-critical MTTF, the differences among the results are small, and the partial TMR results can even be better than TMR, although some degradation of the non-critical MTTF can be expected for the approximate logic circuits.
For a more thorough validation, radiation testing must be used. Radiation test results, along with additional fault injection results and discussion, will be reported in a final paper to be submitted to Transactions on Nuclear Science.
[Fig. 4: MTTF comparison by circuit version; series: Critical, Non-Critical]
