Abstract-In the presence of increasing soft error rates due to shrinking feature sizes, design tools are required to analyze the fault tolerance and robustness of circuits.
I. INTRODUCTION
Continuously shrinking feature sizes of digital circuits allow for the integration of more and more components in a single chip. Shrinking feature sizes have several positive side effects, e.g., low-power circuitry operating at high frequency. But there are substantial drawbacks as well. Manufacturing failures and transient faults may increasingly tamper with the functionality; consequently, the Soft Error Rate (SER) increases. Precautions against soft errors are taken at different levels, e.g., architectural level, algorithmic level, or layout level [1]. But the implementation of these techniques has to be verified, because bugs may be introduced during implementation. Thus, checking the fault tolerance of a given implementation early in the design flow becomes an important step in the design process. Several formal and non-formal verification methods have been applied for this purpose.
Non-formal methods, like simulation or emulation [2], perform fault injection to check for a limited number of simulation traces whether those lead to faulty behavior. Due to its simulation-based nature, this is a fast but incomplete method with respect to all potential scenarios or all faults.
Formal methods can cover all inputs and any fault covered by the given fault model. Various methods have been proposed based on symbolic methods using Binary Decision Diagrams (BDDs) [3]-[6] or Boolean Satisfiability (SAT) [7]-[9]. The approaches of [6], [8], [9] are most closely related to our approach.
Based on the Stuck-At Fault Model (SAFM) the method in [6] computes the probability for transient faults to be propagated to primary outputs of combinational circuits. The given theoretical extension to sequential circuits is too complex to be applied in practice as the analysis relies on BDDs. Internally, all testpatterns are calculated for each fault to be considered. By this, the probability of applying a testpattern for a certain fault can be calculated. In the following we refer to [6] as a probabilistic approach. In [10] the authors proposed an alternative probabilistic approach to approximate the signal probabilities for combinational circuits.
This work has been funded in part by DFG grants DR 287/19-1 and FE 797/5-1.
The work in [9] analyzes the self-checking fault secureness [11] for combinational circuits and provides a robustness measure with a lower and an upper bound for the SAFM. This method analyzes a circuit model by using ATPG algorithms. For the analysis assumptions on the environment and the checker functionality of the circuit are required.
In [8] a model similar to Bounded Model Checking (BMC) has been proposed for the analysis of soft errors. Like [9], this method provides a lower and an upper bound for the robustness. Both methods can be seen as a worst-case analysis: a component is classified non-robust if there exists at least a single testpattern such that a fault of this component may change the output behavior. We call this classification worst-case analysis, i.e., the probability for excitation and propagation is ignored by the robustness measure. In contrast, the probabilistic approach of [6] considers all testpatterns. But this approach relies on BDDs for the analysis, restricting the application to very small circuits.
In this work we propose a robustness measure that constitutes a trade-off between the worst-case analysis and the consideration of all testpatterns. Thus, a limited number of testpatterns that show the faulty computation is considered. Therewith a more detailed view of the robustness is given, which can be computed in feasible run time. We use the model of [8] which considers soft-errors in sequential circuits.
Technically, a sequential ATPG engine is used to consider a bounded time window for the calculation. When calculating only a single testpattern for each soft error, worst-case analysis is performed. By calculating all testpatterns the probabilistic analysis is performed at high computational costs. Our approach finds up to a predefined number of testpatterns. As a result hot-spots in the circuit are identified, that are easily sensitized to propagate soft errors. This knowledge guides the designer to increase the robustness of the circuit. The previous measures, worst-case analysis and usage of all input stimuli, can be embedded into the new measure.
The remainder of this paper is structured as follows: Section II reviews the preliminaries. The underlying circuit model as well as the approach of robustness computation based on [8] are described. Section III introduces the new measure in detail. An algorithm is presented in Section IV that computes the testpatterns. Experimental results are reported in Section V. Finally, conclusions are stated in Section VI.
II. PRELIMINARIES
A. Boolean Satisfiability
The Boolean Satisfiability (SAT) problem is a well-known NP-complete decision problem [12] that asks whether a Boolean formula $f : \mathbb{B}^n \to \mathbb{B}$ is satisfiable or not (unsatisfiable). If a formula $f$ is satisfiable, a satisfying value assignment of the variables can be found. Typically, the problem is given in Conjunctive Normal Form (CNF); every Boolean formula can be converted into a CNF. Nowadays, large formulas related to practical problems with millions of clauses and variables can usually be solved in feasible time by state-of-the-art SAT solvers [13], [14].
In this work multiple satisfying assignments are of interest. These assignments are computed by iteratively calling the SAT solver and adding blocking clauses. A blocking clause excludes a (previously computed) solution.
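The iterative enumeration described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the brute-force `solve` function is a hypothetical stand-in for a real SAT solver, and the names `solve` and `enumerate_solutions` are assumptions made for the example.

```python
from itertools import product

def solve(clauses, num_vars):
    """Brute-force stand-in for a SAT solver (illustration only).

    Clauses use DIMACS-style literals: k means variable k is true,
    -k means variable k is false. Returns one model or None.
    """
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            return bits
    return None

def enumerate_solutions(clauses, num_vars, limit):
    """Collect up to `limit` satisfying assignments using blocking clauses."""
    clauses = list(clauses)  # work on a copy, keep the caller's CNF intact
    found = []
    while len(found) < limit:
        model = solve(clauses, num_vars)
        if model is None:
            break  # unsatisfiable: no further solutions exist
        found.append(model)
        # Blocking clause: at least one variable must differ from this model.
        clauses.append([-(i + 1) if value else (i + 1)
                        for i, value in enumerate(model)])
    return found

# (x1 OR x2) AND (NOT x1 OR x3)
cnf = [[1, 2], [-1, 3]]
solutions = enumerate_solutions(cnf, num_vars=3, limit=10)
```

Each blocking clause is the negation of one complete model, so every later call must return a different assignment.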
B. Circuit Model
We consider sequential and combinational circuits C with Primary Inputs PI(C), Primary Outputs PO(C) and State Elements S(C). For combinational circuits S(C) = ∅ holds. Furthermore, a circuit consists of various components. For example, such components can be primitive gates (AND, OR, etc.), modules (Adder, Multiplier, etc.) or statements of a hardware description language (if (...) then ...endif;, etc.). The number of components of a circuit C is denoted by |C|. A circuit C can be converted into CNF in time and space linear in |C| [15].
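To illustrate why the CNF size grows linearly with the number of components, the following sketch encodes each two-input gate with a constant number of clauses (the usual Tseitin-style encoding). The helper names and the tiny example circuit are assumptions made for illustration.

```python
from itertools import product

def and_gate_cnf(a, b, z):
    """Clauses encoding z <-> (a AND b); three clauses per gate."""
    return [[-z, a], [-z, b], [z, -a, -b]]

def or_gate_cnf(a, b, z):
    """Clauses encoding z <-> (a OR b); three clauses per gate."""
    return [[z, -a], [z, -b], [-z, a, b]]

def satisfies(clauses, assignment):
    """Check a full assignment (dict: variable -> bool) against the CNF."""
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in clauses)

# Hypothetical circuit: n = a AND b, out = n OR c.
# Variables: a=1, b=2, c=3, n=4, out=5.
cnf = and_gate_cnf(1, 2, 4) + or_gate_cnf(4, 3, 5)
```

The CNF is satisfied exactly by those assignments in which every gate output is consistent with its inputs.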
C. Classification of Components
As a basis for the robustness computation we use the following classification of each component g ∈ C as presented in [8] . A soft error in component g is either a bit-flip from 0 to 1 or from 1 to 0 on one or on multiple output signals of g 1 . A component g ∈ C belongs to one of three disjoint classes [8] :
1) non-robust: A soft error of component g leads to an abnormal output behavior under at least one input trace $\tau_t$ for PI(C) at a certain time frame t, i.e., the primary outputs differ from the fault-free computation. Additionally, if the circuit is equipped with a fault detection signal flt, this signal does not report a fault, i.e., flt = 0. The input trace $\tau_t$ is also called a testpattern for the soft error.
2) non-classified: The classification is similar to the non-robust classification, with the exception that the states differ from the fault-free states but the primary outputs are equal. This case shows Silent Data Corruption (SDC).
3) robust: A soft error on component g is reported by the fault detection signal (flt = 1) or is corrected by the internal logic. Consequently, the primary outputs are correct with respect to the fault-free computation under every possible input.
All components of the circuit are partitioned into the set T of robust components, the set S of non-robust components and the set U of non-classified components, i.e., C = S ∪ T ∪ U. For combinational circuits the set U is empty, because without state elements the second case above is not applicable.
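The three classes can be summarized in a short sketch. It assumes that the effect of a soft error has already been observed for a set of input traces; the `Observation` record and the precedence of non-robust over non-classified are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    """Effect of one soft error under one input trace (hypothetical record)."""
    outputs_differ: bool  # primary outputs differ from the fault-free run
    states_differ: bool   # state elements differ from the fault-free run
    flt: bool             # fault detection signal raised

def classify(observations):
    """Assign a component to one of the three disjoint classes."""
    undetected = [o for o in observations if not o.flt]
    if any(o.outputs_differ for o in undetected):
        return "non-robust"      # at least one testpattern exists
    if any(o.states_differ for o in undetected):
        return "non-classified"  # silent data corruption
    return "robust"              # detected or corrected under every trace
```

For example, `classify([Observation(False, True, False)])` returns `"non-classified"`, because the state is corrupted without any visible output difference.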
D. Sequential Modeling
In [8] the sets T, S and U were computed by using a SAT solver. Basically, a Sequential Equivalence Check (SEC) of a circuit C and a circuit C′ with additional logic to model soft errors is performed. The circuit is unrolled up to a given time limit $t_d$. The schematic view of the model is shown in Figure 1. A soft error is assumed to be corrected or detected (i.e., signaled by a fault signal flt) within a short period of time, or the fault remains undetected. Consequently, the analysis can be bounded by a certain time limit $t_d$. Combinational circuits can be easily embedded in the sequential model.
For the approach in [8] it is sufficient to compute at least one testpattern as mentioned in Section II-C for a component g to classify g as non-robust. Given the sets T, S and U at a certain time frame t, the robustness is defined by a lower and an upper bound as follows:
$$R^t_{lb} = \frac{|T|}{|C|}, \qquad R^t_{ub} = \frac{|T| + |U|}{|C|}$$
For combinational circuits the measure yields $R^t_{lb} = R^t_{ub}$, since |U| = 0 and t = 0. For sequential circuits the lower and upper bound may differ even if an unlimited number of time frames were considered. In particular, if SDC occurs and the faulty system state is not corrected, the divergence between the faulty system and the fault-free system may persist.
As mentioned before, for this robustness measure it is sufficient to find one testpattern to classify g as non-robust, or to prove that no testpattern exists, in which case the component is robust. This robustness measure considers a Single Testpattern (ST). In the following we denote this by $R^t_{ST}$ as well as $R^t_{ST,lb}$ and $R^t_{ST,ub}$ for lower bound and upper bound at time frame t, respectively.
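A minimal sketch of the single-testpattern bounds follows. It assumes that the lower bound counts only the robust components and that the upper bound additionally counts the non-classified ones, which matches the statement that both bounds coincide when |U| = 0; the function name is hypothetical.

```python
from fractions import Fraction

def robustness_bounds(num_robust, num_non_robust, num_non_classified):
    """Return (lower, upper) bound of the worst-case robustness.

    Assumption: lower = |T| / |C| and upper = (|T| + |U|) / |C|.
    """
    total = num_robust + num_non_robust + num_non_classified
    lower = Fraction(num_robust, total)
    upper = Fraction(num_robust + num_non_classified, total)
    return lower, upper

# Three robust and two non-robust components, no SDC.
lower, upper = robustness_bounds(3, 2, 0)
```

With three robust and two non-robust components both bounds evaluate to 3/5.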
III. ANALYSIS USING MULTIPLE TESTPATTERNS
In the following we illustrate drawbacks of the measure introduced in Section II using an example. Next, the new measure is introduced that overcomes those limitations using an analysis based on Multiple Testpatterns (MT). Then, the relation to previously defined measures is analyzed.
A. Motivating Example
The robustness measure discussed in Section II can be considered as a "worst-case analysis": a component is considered non-robust as soon as there exists a single testpattern that shows faulty behavior of this component at least at one primary output. The probability to apply such a pattern, i.e. the excitation probability for the fault, is ignored. Consider the following example.
Example 1: Consider a combinational circuit C with four primary inputs, i.e., |PI(C)| = 4. Furthermore, let a, b ∈ C be two non-robust and c, d, e ∈ C be robust components. The worst-case analysis yields $R_{ST} = 3/5 = 60\%$.
Further assume that there are only two testpatterns that excite a fault in a, denoted by ψ(a) = 2. Given the total number of $2^{|PI(C)|} = 2^4 = 16$ input traces, the probability to excite the fault in a is only 2/16 = 12.5%.
Moreover, let any input trace be a testpattern for a fault at b, i.e., ψ(b) = 16 and the excitation probability at b is 100%.
The worst-case analysis does not differentiate the two components. Both are simply classified as non-robust, even though b can be considered a hot-spot while a is relatively safe.
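The numbers of Example 1 can be reproduced by exhaustive fault injection on a small circuit. The circuit below is purely hypothetical and was chosen only so that ψ(a) = 2 and ψ(b) = 16; the example itself does not specify a netlist.

```python
from itertools import product

def circuit(x, flip=None):
    """Hypothetical 4-input circuit with two outputs.

    `flip` names the component whose output suffers a bit-flip.
    """
    x0, x1, x2, x3 = x
    a = not x0
    if flip == "a":
        a = not a
    b = x0 != x1  # XOR
    if flip == "b":
        b = not b
    return (a and x1 and x2 and x3, b)

def count_testpatterns(component):
    """Count the input traces for which the fault changes some output."""
    return sum(circuit(x) != circuit(x, flip=component)
               for x in product([False, True], repeat=4))

psi_a = count_testpatterns("a")  # fault visible only if x1 = x2 = x3 = 1
psi_b = count_testpatterns("b")  # b drives an output directly
excitation_a = psi_a / 16
excitation_b = psi_b / 16
```

Exhaustive enumeration is feasible here only because the circuit has four inputs; in general the number of input traces is exponential in the number of primary inputs.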
The exact computation of excitation probabilities along the lines of [6] would overcome this limitation. But in that case all testpatterns -potentially a number exponential in the number of primary inputs -have to be found. In [6] a BDD-based symbolic analysis was used for this purpose. But a BDD-based analysis is typically limited to small circuits.
B. New Robustness Measure
Instead of considering only a single testpattern per component, the new robustness measure takes Multiple Testpatterns (MT) into account. By this, components that only have a few testpatterns can be differentiated from those components having many testpatterns. As a result a grading of the non-robust components is achieved which can be utilized to identify hotspots in the circuit.
In the following the combinational case is considered first, i.e., lower bound and upper bound for the robustness are identical. Let ψ(g) denote the number of testpatterns at component g. Then, the quotient between the number of testpatterns ψ(g) and the number of all input traces $\Psi = 2^{|PI(C)|}$ yields the excitation probability for a soft error on g:
$$e(g) = \frac{\psi(g)}{\Psi}$$
Alternatively, as a measure for robustness, the probability that a soft error is not observable is given by:
$$1 - e(g) = 1 - \frac{\psi(g)}{\Psi}$$
Consequently, the robustness of the circuit is measured by the average of this value over all components:
$$R_{MT} = \frac{1}{|C|} \sum_{g \in C} \left(1 - \frac{\psi(g)}{\Psi}\right)$$
For the circuit of Example 1, the overall value of the new measure increases compared to the worst-case analysis. More important is the observation that components a and b can be differentiated.
The presented measure can be extended to the sequential case. In this case testpatterns span multiple time frames and up to t time frames are considered in the analysis. The number of all input traces is given by $\Psi = 2^{|PI(C)| \cdot t}$. Also, a component may cause SDC and is considered non-classified in this case. Such components are collected in a set U. Then a lower bound and an upper bound for the robustness are determined by assuming that all non-classified components may turn out to be non-robust or robust, respectively. For non-classified components no testpattern can be found that shows a soft error at one of the primary outputs, i.e., ψ(g) = 0 holds for g ∈ U.
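As a worked instance for the combinational case, the values of Example 1 can be inserted into the measure. The calculation assumes that the circuit robustness is the average over all five components of the probability that a soft error is not observable, with ψ(c) = ψ(d) = ψ(e) = 0 for the robust components.

```latex
\begin{align*}
1 - e(a) &= 1 - \tfrac{2}{16} = 0.875, \qquad
1 - e(b) = 1 - \tfrac{16}{16} = 0, \qquad
1 - e(c) = 1 - e(d) = 1 - e(e) = 1, \\
R_{MT} &= \frac{0.875 + 0 + 1 + 1 + 1}{5} = \frac{3.875}{5} = 77.5\%.
\end{align*}
```

Under this assumption the value lies above the 60% of the worst-case analysis, and the contributions of a (0.875) and b (0) are clearly separated.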
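We retrieve the following bounds:
$$R^t_{MT,lb} = \frac{1}{|C|} \sum_{g \in S \cup T} \left(1 - \frac{\psi(g)}{\Psi}\right), \qquad R^t_{MT,ub} = R^t_{MT,lb} + \frac{|U|}{|C|}$$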
C. Embedding Previous Measures
First, the new measure is compared to the worst-case analysis previously proposed in [8] and described in Section II. Assume that λ′ is close to zero. Then, ⌈λ′Ψ⌉ as used in Equation (1) becomes 1. In this case the robustness of a component g as defined in Equation (1) becomes:
$$\begin{cases} 0 & \text{if } \psi(g) \geq 1, \\ 1 & \text{if } \psi(g) = 0. \end{cases}$$
If there exists at least one testpattern, the robustness of a component becomes 0. If no testpattern exists, the component is considered robust. Consequently, for λ close to zero the new measure converges to the one of [8] .
The probabilistic approach of [6] considers exact excitation probabilities to determine the robustness of components (footnote 2). This is achieved using our measure by setting λ to 1, i.e., Equation (1) becomes
$$1 - \frac{\psi(g)}{\Psi}$$
This is the probability of a soft error to remain undetected.
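The two limiting cases can be checked with a small sketch. The capped form used below is a hypothetical formula chosen only because it reproduces both limits described above; it is an assumption and should not be read as the exact definition of Equation (1).

```python
from fractions import Fraction
from math import ceil

def component_robustness(num_testpatterns, num_traces, lam):
    """Hypothetical capped measure: 1 - min(psi, cap) / cap.

    cap = ceil(lam * Psi) is the largest number of testpatterns that is
    taken into account; this form is an assumption for illustration.
    """
    cap = max(1, ceil(lam * num_traces))
    return 1 - Fraction(min(num_testpatterns, cap), cap)

# lam close to zero: worst-case analysis (one testpattern suffices).
worst_case = component_robustness(2, 16, 1e-9)
# lam = 1: exact probability that the soft error is not observable.
probabilistic = component_robustness(2, 16, 1)
```

For a component with two testpatterns out of 16 input traces, the first call yields 0 and the second yields 7/8.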
IV. COMPUTATION
In this section we present an algorithm using a SAT-based sequential ATPG engine to calculate the new robustness measure and discuss potential extensions.
A. Algorithm
The model for sequential SAT-based ATPG is shown in Figure 2 . Given are a component g and a limit t for the number of time frames to be considered. For time frame 0 the fan-out cone of g is modeled. For all outputs in this fan-out cone, the transitive fan-in of these outputs is also modeled until the primary inputs or a state element are reached. In time frame 0 the state elements remain unconstrained during the analysis or they are constrained as proposed in [8] . For time frame 1, the same copy is created. But the traversal does not stop when reaching state elements in time frame 1. Instead the model also includes the driving circuitry of time frame 0 to adequately model the sequential behavior. This process is repeated until time frame t is reached.
In principle this is a standard procedure for sequential ATPG. In our case faults are only injected in time frame 0, because soft errors are modeled and all states are covered by leaving the initial state unconstrained or allowing all reachable states, respectively. Moreover, the problem instance for analyzing t time frames is reused and extended to analyze t + 1 time frames. This improves the efficiency as learned information is reused by the SAT solver [16] .
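The effect of the bounded time window can be illustrated by plain simulation instead of a SAT model. In the sketch below a fault is injected only in time frame 0 and the faulty run is compared with the fault-free run; the one-bit toggle register and all names are assumptions made for the example.

```python
def toggle_step(state, inp, fault):
    """Hypothetical 1-bit register: next state = state XOR input.

    The primary output shows the current state; `fault` flips the
    value written to the register in this time frame.
    """
    nxt = state ^ inp
    if fault:
        nxt ^= 1
    return nxt, state

def simulate(step, state, trace, inject=False):
    """Run `step` over all time frames; a fault is injected in frame 0 only."""
    outputs = []
    for frame, inp in enumerate(trace):
        state, out = step(state, inp, fault=(inject and frame == 0))
        outputs.append(out)
    return outputs, state

# One time frame: outputs agree, but the stored state is corrupted.
good_1 = simulate(toggle_step, 0, [1])
bad_1 = simulate(toggle_step, 0, [1], inject=True)

# Two time frames: the corrupted state reaches the primary output.
good_2 = simulate(toggle_step, 0, [1, 0])
bad_2 = simulate(toggle_step, 0, [1, 0], inject=True)
```

With a single time frame the fault appears only as a differing state, whereas extending the window by one frame turns the same input prefix into a testpattern.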
The pseudocode to determine the testpatterns for a soft error in a component g is given in Algorithm 1. A circuit C, a considered component g, the current time t, the maximum considered time frame t, the SAT solver object s as well as the parameter λ are given as input to the algorithm. The algorithm creates a SAT instance for the component g at time frame t. If t > 0, the existing problem instance is extended as explained above and as shown in Figure 2. Only the primary inputs that are contained in the cone of g are considered. The maximum number of testpatterns to be considered is configured by λ and then stored in Ψ′ in line 5. From line 6 to line 9, the testpatterns are computed. While a satisfying assignment exists and the maximum number of testpatterns Ψ′ is not exceeded, a new testpattern is extracted from the SAT model and the number of testpatterns is incremented. To compute another testpattern, the currently computed satisfying assignment is excluded by adding a blocking clause that contains values of primary inputs and state elements in time frame t = 0. The loop continues until all testpatterns are found or the limit has been reached.
Footnote 2: Indeed a different fault model has been used in [6], but the fault models can be aligned.
The computation is finished if 1) the number of considered time frames is exceeded, 2) the SAT instance becomes unsatisfiable, i.e., there are no more testpatterns, or 3) the number of testpatterns exceeds Ψ′. Finally, the number of computed testpatterns is returned.
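The bounded enumeration loop can be sketched for the combinational case, i.e., a single time frame. This is a simplified illustration and not Algorithm 1 itself: `find_testpattern` is a brute-force stand-in for one SAT call, the set `blocked` plays the role of the blocking clauses, and all names are hypothetical.

```python
from itertools import product
from math import ceil

def find_testpattern(good, faulty, num_inputs, blocked):
    """Stand-in for one SAT call: return an unblocked input on which the
    fault-free and the faulty circuit differ, or None if none exists."""
    for x in product([0, 1], repeat=num_inputs):
        if x not in blocked and good(x) != faulty(x):
            return x
    return None

def count_testpatterns(good, faulty, num_inputs, lam):
    """Count testpatterns, stopping at the limit ceil(lam * 2^n)."""
    limit = ceil(lam * 2 ** num_inputs)  # maximum number of testpatterns
    blocked = set()
    count = 0
    while count < limit:
        pattern = find_testpattern(good, faulty, num_inputs, blocked)
        if pattern is None:
            break  # no more testpatterns exist
        count += 1
        blocked.add(pattern)  # exclude this solution from later calls
    return count

# Hypothetical circuit out = (x0 AND x1) AND x2 with a bit-flip on x0 AND x1.
good = lambda x: (x[0] & x[1]) & x[2]
faulty = lambda x: (1 - (x[0] & x[1])) & x[2]
all_patterns = count_testpatterns(good, faulty, 3, lam=1)
capped = count_testpatterns(good, faulty, 3, lam=0.25)
```

The fault is visible whenever x2 = 1, so four testpatterns exist; with λ = 0.25 the loop stops after ⌈0.25 · 8⌉ = 2 of them.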
Further performance improvements can be achieved using a Minimal Assignment Analysis (MAA) similar to [17] , [18] for each extracted testpattern. Given a testpattern this analysis computes which of the primary inputs justify at least one differing output. Starting from a faulty primary output, one controlling input or all non-controlling inputs are traced at each gate until reaching the primary inputs. This analysis is repeated for the fault signal, if the circuit is equipped with such a signal. The remaining primary inputs not contained in these traces do not influence the output value. These primary inputs can be considered as don't care. Using this analysis the number of SAT calls can be decreased significantly. The number of testpatterns can be calculated by simple arithmetic operations.
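The counting step after the minimal assignment analysis can be sketched as follows. A generalized testpattern is written as a cube in which `None` marks a don't-care input; the representation and the helper names are assumptions made for illustration. Note that simply adding the sizes of several cubes is exact only if the cubes do not overlap, so the sketch also shows an explicit union count that is practical for small input counts only.

```python
from itertools import product

def patterns_in_cube(cube):
    """A cube with k don't-care inputs covers 2^k testpatterns."""
    return 2 ** cube.count(None)

def blocking_clause(cube):
    """Blocking clause over the assigned inputs only (DIMACS-style literals)."""
    return [-(i + 1) if v == 1 else (i + 1)
            for i, v in enumerate(cube) if v is not None]

def expand(cube):
    """All fully specified input assignments covered by the cube."""
    choices = [(0, 1) if v is None else (v,) for v in cube]
    return set(product(*choices))

def count_union(cubes):
    """Exact number of distinct testpatterns covered by several cubes."""
    covered = set()
    for cube in cubes:
        covered |= expand(cube)
    return len(covered)

cube_1 = (1, None, None, 0)  # x0 = 1, x3 = 0, x1 and x2 are don't cares
cube_2 = (1, 1, None, None)  # overlaps cube_1 where x1 = 1 and x3 = 0
total = count_union([cube_1, cube_2])
```

Blocking a whole cube with one short clause removes all of its testpatterns from the search space at once, which is why fewer SAT calls are needed.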
B. Discussion
In [8] an algorithm to perform the worst-case analysis was proposed. That algorithm was based on SEC. All potential faults are modeled in a single problem instance and learned information can potentially be reused for all other faults. Preliminary experiments have shown that this is more efficient in some cases than using a large number of ATPG calls. But, here we also calculate multiple testpatterns for each fault. Therefore we chose an ATPG engine to keep the size of the problem instances smaller.
V. EXPERIMENTAL RESULTS
This section presents the evaluation of the new robustness measure. Combinational circuits were taken from the LGsynth93 benchmark suite and sequential circuits from the ITC'99 benchmark suite, respectively. For every circuit a parity checker was implemented and optimized using SIS [19] . The parity is checked at the primary outputs and on the state elements. A wrong parity is reported by setting the fault signal flt to 1. The robustness of the parity circuits is less than 100%: fault masking may occur, i.e. a single soft error flips an even number of outputs and state elements.
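The fault masking effect of a single parity bit can be seen in a few lines. The sketch assumes that the checker compares the parity of the observed values with the parity of the fault-free values; the function names are hypothetical.

```python
def parity(bits):
    """XOR of all bits."""
    p = 0
    for b in bits:
        p ^= b
    return p

def flt(observed, expected_parity):
    """Fault signal: 1 if the observed parity deviates from the expected one."""
    return parity(observed) ^ expected_parity

fault_free = [1, 0, 1, 1]
expected = parity(fault_free)

one_flip = [0, 0, 1, 1]   # odd number of flipped bits: detected
two_flips = [0, 1, 1, 1]  # even number of flipped bits: masked

detected = flt(one_flip, expected)
masked = flt(two_flips, expected)
```

A soft error that flips two observed bits leaves the parity unchanged, so flt stays 0 although the values are wrong.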
All experiments were carried out on an AMD Dual-Core Opteron Processor with 32GB main memory under Linux. The algorithm is implemented in C++. As the underlying SAT solver, MiniSat v2.0 [13] with a feature to allow for incremental satisfiability is used. A time out was set to 5000 CPU seconds. Exceeding this limit is denoted by Ab.
A. Combinational Circuits
Table I shows the results for the combinational circuits. The first three columns describe properties of the circuit: the name, the number of primary inputs and the number of gates in the circuit. Note that for combinational circuits SDC cannot occur, i.e., all components are classified as robust or as non-robust. Consequently, there is only a single value for the robustness of such circuits (see Section II-D and Section III-B).
The results of the worst-case analysis, the new measure using 500 testpatterns and the new measure using 10,000 testpatterns are given in the following columns. The parameter λ has been adjusted accordingly. For the worst-case analysis, the robustness value and the number of non-robust components are shown in columns R ST and |S|, respectively. For the new measure, the robustness value R MT , the parameter λ, the overall run time t in CPU seconds without MAA, and the run time t MAA with MAA are shown in the respective columns. Additionally, column > λΨ gives the number of components having more than 10,000 testpatterns. Blank cells denote that no computation with 10,000 testpatterns was required as all components had less than 500 testpatterns.
The run times are longer for the new measure as usually more than a single testpattern has to be considered before the classification of a non-robust component is completed. For small circuits the use of MAA increases the run time (e.g., for par_5xp1). In these cases the SAT solver efficiently enumerates multiple solutions. In contrast, when the number of inputs increases, MAA often yields shorter run times as a single solution of the SAT solver is generalized to many testpatterns (e.g., for par_apex9 and for par_rot).
The robustness value for the new measure is typically larger than the one for the worst-case analysis. As soon as a single testpattern exists, a component is classified as "completely non-robust" in the worst-case analysis. For the new measure non-robust components are graded by the number of testpatterns. As long as the number of testpatterns is below the predefined limit, a component contributes to the circuit's robustness. For example, this case occurs for the circuits par_cmb and par_cu. In such cases a fine-grained differentiation between non-robust components is available. The designer decides whether further protection is required for some of these components.
If the number of testpatterns always exceeds the predefined limit, the robustness values are identical for the worst-case analysis and the new measure. For example, this occurs in case of par_rot and par_t481. All non-robust components must be considered as hot-spots and further precautions have to be taken to handle transient faults.
B. Detailed Example
A more detailed evaluation for the combinational circuit par_cu is shown by the histogram in Figure 3. The x-axis depicts the number of testpatterns. The y-axis gives the number of components that had a certain number of testpatterns. In total there are $2^{14} = 16{,}384$ input traces. The number of testpatterns considered was limited to 10,000 as in the previous experiment. The worst-case analysis yields 51 non-robust components. For most of these components there exist fewer than 10,000 testpatterns. A very fine-grained differentiation between non-robust components is determined.
C. Sequential Circuits
Table II shows results for sequential circuits. Up to 10 time frames are analyzed and up to 100 testpatterns are taken into account for the new measure. All components of all circuits are classified, i.e., upper and lower bound are identical for both measures. In the sequential case the number of testpatterns is typically more than 100 such that the predefined limit is exceeded for the new measure. For many of the smaller circuits, a finer differentiation between non-robust components becomes available when using the new measure, e.g., for par_b06 and par_b07.

VI. CONCLUSION
In this paper we proposed a new robustness measure. This measure constitutes a trade-off between worst-case analysis and a probabilistic approach. The worst-case analysis as well as the probabilistic approach can be embedded in the proposed measure. With the new measure a more detailed view on the robustness can be achieved in feasible run time. Furthermore, the measure identifies hot-spots in the design, i.e. easily sensitizable components. A sequential ATPG engine considering a bounded time window was used to compute the testpatterns efficiently.
VII. ACKNOWLEDGMENT
We would like to thank Andre Sülflow for helpful discussion. 
