Abstract-Due to the shrinking of feature size and the significant reduction in noise margins, nanoscale circuits have become more susceptible to manufacturing defects, noise-related transient faults, and interference from radiation. Traditionally, soft errors have been a much greater concern in memories than in logic circuits. However, as technology continues to scale, logic circuits are becoming more susceptible to soft errors than memories. To estimate the susceptibility to errors in combinational logic, the use of binary decision diagrams (BDDs) and algebraic decision diagrams (ADDs) for the unified symbolic analysis of circuit reliability is proposed. A framework that uses BDDs and ADDs and enables the analysis of combinational circuit reliability from different aspects, e.g., output susceptibility to error, influence of individual gates on individual outputs and overall circuit reliability, and the dependence of circuit reliability on glitch duration, amplitude, and input patterns, is presented. This is demonstrated by a set of experimental results, which show that the mean output error susceptibility can vary from less than 0.1% for large circuits and short glitches (20% cycle time) to about 30% for very small circuits and long enough glitches (50% cycle time).
I. INTRODUCTION

FOR THE LAST few decades, the main factors driving the design of digital systems have been cost, performance, and, more recently, power consumption. However, with technology scaling, reliable operation of digital systems is being severely challenged, thus requiring the use of fault-tolerance-driven design methodologies not only for mission critical applications (medical, banking, traffic control, etc.) but also for regular mass-market applications [1].
To allow for the efficient design of a system that can tolerate faults, a first natural step includes understanding the source of induced errors, but most importantly, their modeling and analysis for the purpose of guiding the overall design process.
A "fault" is an incorrect state in the hardware or software that is part of the system. Such faults can result from physical defects, design flaws, or operator errors. According to their source or duration, faults can be divided into permanent, transient, and intermittent faults.
• Permanent faults occur and remain stable until a repair is undertaken (e.g., stuck-at-zero, stuck-at-one).
• Transient (external, soft, or single-event upset) faults occur for a short period of time and then disappear (a bit flip due to a transient physical phenomenon, e.g., a cosmic ray or an alpha particle). These faults can cause an error in the system by changing the internal state, although they last only for a short time.
• Intermittent faults, after they first occur, usually exhibit a relatively high occurrence rate and, eventually, tend to become permanent [2] .
Manifestation of a fault is called an "error," and the system-level effect of an error is known as a "failure." The principle of fault tolerance is to automatically surmount the effects of faults by use of redundant components. Consequently, a fault-tolerant system is one that is capable of continued operation with little or no performance degradation and without corruption of data in the presence of failure due to either internal or external causes. However, not all faults lead to errors, and not all errors lead to failures.
In this paper, we address the first issue mentioned above, soft error susceptibility, that is, the likelihood that a transient (physical) fault will lead to an error. Our main goal is to allow for "symbolic modeling and efficient estimation" of the soft error susceptibility of a combinational logic circuit. This can be further used to reduce the cost of applying various techniques for error detection and correction.
A. Transient Faults in Current Semiconductor Technology
Shrinking of feature size leads to the decrease of the amount of charge usually stored in circuit nodes. This decrease, together with the significant reduction in noise margins, makes circuits more susceptible to manufacturing defects, noise-related transient faults, and interference from radiation. When high-energy neutrons or alpha particles hit the silicon bulk, they create minority carriers, which, if collected by a p-n junction, result in a current pulse of very short duration. A current pulse that occurs as the result of the strike is often called a single-event transient (SET). These events may cause a bit flip in some latch or memory element. Additionally, a SET may occur in an internal node of combinational logic and propagate to the latch. If latched, it results in a "soft error."
Traditionally, soft errors have been of greater concern in memories than in logic circuits because of the small cell size of memories and the nature of memory: a SET can immediately result in a soft error if it exceeds the critical charge stored in the cell. In contrast to this, three factors have prevented logic from becoming more susceptible to soft errors. 1) Logical masking: To be latched, a SET needs to be on the sensitized path from the location where it originates to the latch. 2) Electrical masking: A SET needs to create a pulse that has a duration and amplitude large enough to reach the latches. Due to the electrical properties of the gates the pulse (glitch) is passing through, it can be attenuated and even completely masked before it reaches the latch. 3) Latching-window masking: If the pulse reaches the latch and appears at its input "on time" (during the latching window), depending on its amplitude and duration, it has a high probability of being latched. However, as technology continues to scale, logic circuits are becoming much more susceptible to soft errors. The trend toward reduced logic depth reduces the attenuation a SET experiences while propagating through the circuit. Smaller feature sizes and lower voltage levels allow lower energy particles to cause SETs. Therefore, soft error failure rates in combinational logic are expected to become very important in the future [3] and even exceed soft error rates (SERs) in memories.
0278-0070/$20.00 © 2006 IEEE
B. Paper Organization
The rest of this paper is organized as follows: In Section II, we give an overview of related work. Section III describes our assumptions and the notations we use in the rest of this paper. Section IV presents in more detail the mathematical model that lies behind our framework. In Section V, we describe our symbolic modeling methodology, while in Section VI, we describe a practical method for determining circuit susceptibility to soft errors. In Section VII, we report experimental results for a set of common benchmarks. Finally, in Section VIII, we conclude our paper and provide some directions for future work.
II. RELATED WORK
A. Transient Fault Analysis and Modeling
Intensive research has been done so far in the area of analysis and modeling of transient faults [3]-[5], [7]-[9]. However, for estimating the likelihood of soft errors as the result of a SET, most of the previous work has relied on fault injection [1], [6], [7] and simulation instead of the symbolic modeling of the probability of soft errors. The results presented in [1] show that the soft error susceptibility of internal nodes in a logic circuit can vary by at least one order of magnitude. Based on this fact, the authors have applied concurrent error detection techniques asymmetrically (targeting mostly nodes with high soft error susceptibility), which led to reduced cost.
In [7] , Omana et al. give a mathematical model for analyzing the propagation of a transient fault through a chain of combinational gates. They verified that their model has 90% average accuracy with respect to HSPICE simulation. However, their work was focused on estimating electrical masking on the sensitized path in the circuit, while logical and latching-window masking were not included.
Zhao et al. [8] also stressed the importance of analyzing the effect of internal glitches on the latched outputs of the circuit. For electrical masking, the authors use noise rejection curves and find the probability that noise will propagate through the given node without being completely attenuated. Each node is analyzed separately, so their analysis does not reflect the influence of the location of the node inside the circuit on the observability of the noise at the latched output. Moreover, for logical masking, the authors use path tracing, which can become very inefficient for larger circuits.
Zhang et al. [9] present a methodology for SER analysis. This paper focuses mostly on modeling the probability that a SET is generated by a particle hit. Electrical masking for each path is obtained from HSPICE simulation, and logical masking is computed for each input vector and each path separately by flipping the logic value of each node.
Two more recent works on reliability evaluation have been presented [10] , [11] . Dhillon et al. [10] present an "independent" computation of the three factors, logical, electrical, and latching-window masking, to find the soft error tolerance of the circuit. Krishnaswamy et al. [11] use probabilistic transfer matrices and their representation via algebraic decision diagrams (ADDs). Each gate can be represented as a matrix where the probability of each output value is explicit for each input combination. Parallel compositions of gates are represented with tensor products. However, the work presented in [11] focuses only on the logical masking effect of the circuit for given gate output probabilities without considering electrical and latching-window masking.
B. Analysis of Combinational Circuits Using Binary Decision Diagrams (BDDs) and ADDs
To estimate the probability of errors in combinational logic, our symbolic tool uses BDDs and ADDs as part of the CUDD package [12]. BDDs [13], [14] provide an efficient and canonical representation for Boolean functions. In [15], a new type of BDD, called a multiterminal BDD (MTBDD), was introduced. An MTBDD allows for multiple terminal nodes in the canonical representation. Similar to MTBDDs, ADDs [16] were introduced as a class of symbolic models and associated algorithms applicable not only to arithmetic but also to many algebraic structures. For example, these decision diagrams were applied to symbolic timing analysis in [17]. In that work, the authors present RESTA, a robust and extendable timing analysis tool that addresses three main goals, namely 1) to consider both internally and externally specified input constraints, 2) to handle a wide range of circuit structures, and 3) to have a robust underlying framework. This application has shown ADDs to be practical and efficient while providing quite accurate results.
C. Paper Contribution
There are some important differences between our model and those in [8]-[11]. In comparison to [8]-[10], where latching-window, electrical, and logical masking are analyzed separately and assumed independent, our approach provides a unified treatment of these three factors while including their "joint" dependency on input patterns and circuit topology. In most of the previous work, information about electrical masking is obtained by simulation [9], while information about logical masking is obtained by path tracing [8]-[10]. In this paper, by using BDDs and ADDs, this information is instead implicitly stored inside the decision diagrams and, therefore, allows for efficient concurrent computation of output error susceptibility due to hits on various internal nodes. In the case of reconvergent glitches (that is, glitches arriving at the same gate or latched output from the same source on two or more different sensitized logical paths), the problem of merging the glitches needs to be addressed. In [8], a similar problem for several different noise sources is solved by shifting the noise rejection curve. The authors in [9] approximate the case of reconvergent glitches with the worst case and claim that in most cases this does not affect the accuracy significantly. Our approach to this problem is different from these two and is explained in more detail in Section V. Finally, while Krishnaswamy et al. [11] provide a symbolic method for circuit reliability, their method covers logical masking only and does not include the additional joint impact of electrical and latching-window masking.
III. ASSUMPTIONS AND NOTATION
We show in Fig. 1 an example of a target circuit we are analyzing, including the combinational logic, as well as its input and output latches. We estimate the probability that a pulse or glitch, occurring due to some transient physical phenomenon at an internal gate G of the circuit, will result in an error at output F . In our framework, we capture all gate output combinations, i.e., we determine the probability of a soft error at any output due to a fault originating at any internal gate. At the output of gate G, the glitch has an initial duration d init and an initial amplitude a init . The duration at the output of the gate is always measured at switching threshold voltage (V S ) [18] of the downstream gate; therefore, according to 
The propagation of a glitch through an internal gate G′ (V S ) is shown in Fig. 2(b). At the input of gate G′, the glitch has amplitude a in and duration d in, and at the output it has amplitude a out and duration d out. Durations d in and d out are in this case measured at the switching threshold voltage of gate G′ [18]. However, for all output neighbors of gate G′, d out will be recomputed according to their switching thresholds. The propagation delay of gate G′ is t prop. To find out if the glitch propagates through gate G′, and to compute the new amplitude and duration, we use the methodology from [7], as explained in Section IV. Finally, at the latched output F, the glitch has amplitude A and duration D. The switching threshold voltage of the latch, at which D is measured, is V S,latch. Since there is a delay from gate G to output F (T 2 ), the time when the glitch becomes larger than V S,latch is t 1, and the time when it becomes lower than V S,latch is t 2, i.e.,
Duration D and amplitude A can have different values at the output F, depending on the various sensitized paths from G to F. The set of different values of duration D for various sensitized paths is denoted by {D k }. The delay T 2 depends on the sensitized path (i.e., on the gate delays on that path) from gate G to output F, while the delay from input latches to gate G (T 1 ) depends on the path from inputs to gate G. However, in our model, when computing latching-window masking, we assume the worst case, in which the latching-window probability is maximized, as will be seen next.
Since we are interested in the propagation of a glitch in the time interval between two rising edges of the clock signal, we can take [0, T clk ] as the interval of observation. For a signal to be latched, it needs to be stable during the setup time t setup before the rising edge of the clock and during the hold time t hold after the rising edge of the clock. In other words, it needs to be stable inside the interval [T clk − t setup , T clk + t hold ].
IV. MATHEMATICAL DESCRIPTION OF THE MODEL
This section describes the conditions that are needed for a transient glitch at the output of an internal gate to be propagated to the output and latched such that a soft error is registered. We detail the interdependency between conditions for logical, electrical, and latching-window masking and describe their joint model.
A. Necessary Conditions
To this end, we define the following events:
E: a glitch originating at gate G is latched at output F;
A: the amplitude of a glitch at the output is larger than the switching threshold of the latch (if the correct output value is "0") or smaller than the switching threshold (if the correct output value is "1");
D: the duration of a glitch at the output is larger than the sum of setup and hold time of the latch;
T: the glitch appears at the output on time to be latched (i.e., it satisfies the setup time and hold time conditions when the rising edge of the clock occurs).
It is clear that for event E to happen, all three of the other events need to occur, i.e., E = A ∩ D ∩ T.
In this model, logical and electrical masking are implicitly included in A and D, while latching-window masking is included in T . As mentioned in Section III, the switching threshold of the latch at output F is V S,latch . To satisfy the latching condition, the time at which the glitch reaches V S,latch (t 1 ) must satisfy
In addition, the time when the glitch becomes less than V S,latch (t 2 ) must satisfy
with duration D of the glitch at output F given by (3). Thus, the condition that allows a glitch occurring at gate G to be latched can be written as
It is important to note here that, even if t 1 does not satisfy this condition, there is a nonzero probability of a metastable state, thus latching the wrong value. However, since this probability is of the order of 10 −7 for the current technology [18] , its contribution is negligible for all practical purposes.
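The latching condition discussed above can be illustrated with a short sketch. It assumes that the latching window is [T clk − t setup , T clk + t hold ] and that t 1 and t 2 are the threshold-crossing times measured at the latch input; the function and parameter names are hypothetical, and the exact inequalities of the model are not reproduced here.

```python
def covers_latching_window(t1, t2, t_clk, t_setup, t_hold):
    """Return True if a glitch that is past the latch threshold on [t1, t2]
    spans the whole setup/hold window around the clock edge at t_clk.

    Assumption: the window is [t_clk - t_setup, t_clk + t_hold].
    """
    window_start = t_clk - t_setup
    window_end = t_clk + t_hold
    return t1 <= window_start and t2 >= window_end


def glitch_duration(t1, t2):
    """Duration of the glitch at the latch, measured between the two crossings."""
    return t2 - t1
```

Under these assumptions, a glitch can cover the window only if its duration is at least t setup + t hold, which matches the definition of event D.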
More formally, one can express the three events as follows: A: A > V S,latch (if the correct output value is "0") or A < V S,latch (if the correct output value is "1");
Therefore, the probability of event E can be written as
where by P (E 1 | E 2 ) we denote the conditional probability of event E 1 given event E 2. As seen in Fig. 2(c), D is satisfied only if A is satisfied; that is, only if the amplitude of the glitch crosses the switching threshold can the duration be different from zero. In other words,
and, thus
which implies
where {D k } is the set of possible output glitch durations along various sensitized paths. We assume that it is equally likely for a gate G to be hit during a cycle period, that is, t 1 is uniformly distributed in the interval (T 1 , T 1 + T clk − d init ). At the same time, the minimum time interval required for latching a propagated glitch on a given path is (T clk + t hold − T 2 − D, T clk − t setup − T 2 ) for a given output glitch duration D. Thus, the worst case corresponds to having the glitch completely cover the interval (
In other words, the maximum probability of an output glitch of duration D k to be latched is
and, thus, the probability of event E becomes
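As a reading aid, the chain of steps in this subsection can be summarized as follows. This is a sketch reconstructed from the surrounding definitions (the events A, D, and T, the uniform distribution of t 1, and the latching interval for a given duration), not a verbatim copy of the numbered equations; it assumes D k > t setup + t hold for the durations that contribute.

```latex
\begin{align*}
E &= A \cap D \cap T, \\
D \subseteq A \;&\Rightarrow\; P(A \cap D) = P(D), \\
P(E) &= P(T \mid D)\,P(D) = \sum_{k} P(T \mid D = D_k)\,P(D = D_k), \\
P_{\max}(T \mid D = D_k) &= \frac{D_k - (t_{\mathrm{setup}} + t_{\mathrm{hold}})}{T_{\mathrm{clk}} - d_{\mathrm{init}}}.
\end{align*}
```

The last line follows from dividing the length of the latching interval for t 1, which is D k − (t setup + t hold ), by the length of the interval over which t 1 is uniformly distributed, which is T clk − d init.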
B. Attenuation Model
From the previous equations, we can see that, to determine the probability of event E, it is necessary to find the possible values for duration D k and determine the probabilities associated with those values. Another issue is finding the correct values for amplitude at the output. To find these values, we use the method proposed in [7]. Fig. 2 shows how the glitch propagates from the output of gate G to the output of gate G′, which is assumed to be on the sensitized path from gate G to a generic output F.
As claimed in [7], when the glitch propagates to the input of gate G′, depending on the relation between the duration d in of the glitch and the propagation time t prop of gate G′, there are three possible options.
1) If d in ≤ t prop, then the glitch will not propagate through the gate (it is "masked").
2) If t prop < d in ≤ 2t prop, then the glitch will propagate, but the amplitude and the duration will be smaller at the output of the gate (it is "attenuated").
3) If 2t prop < d in, then the glitch will not be attenuated and will be propagated "as is."
As can be seen, the amplitude and the duration of the glitch at the output of the gate through which the glitch propagates depend on the input glitch duration, amplitude, and propagation delay of gate G′. However, if the output glitch amplitude a out is "not" larger than the switching threshold for the downstream gate, then it can be assumed that the glitch does not propagate at all. As in [7], we assume the following: when the output voltage has a "1" logic value (V dd ), and a glitch affects the input, the output minimum amplitude is
Similarly, when the output voltage has a "0" logic value, and a glitch affects the input, the output maximum amplitude is
(15) where V T 1 and V T 2 are the thresholds that divide the interval in which a in /V S can take values into three parts. These thresholds are functions of the glitch duration normalized with respect to the gate propagation delay t prop . We obtain the curves that represent V T 1 and V T 2 from simulating a chain of gates and for each gate type we find the two specific curves. We approximate these curves with a third-order polynomial [7] and find the coefficients to be used in the model. This attenuation model has been shown to have an average accuracy of 90% when compared to HSPICE.
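The three propagation regimes listed in this subsection can be sketched as a simple classifier. Only the duration test is shown; the amplitude expressions and the fitted threshold curves V T 1 and V T 2 are not reproduced, and the function name is hypothetical.

```python
def classify_glitch(d_in, t_prop):
    """Classify a glitch of input duration d_in at a gate with delay t_prop.

    Returns "masked", "attenuated", or "propagated", following the three
    duration regimes of the attenuation model.
    """
    if d_in <= t_prop:
        return "masked"        # too short to pass through the gate
    if d_in <= 2 * t_prop:
        return "attenuated"    # passes with reduced amplitude and duration
    return "propagated"        # passes unchanged
```

For example, with a gate delay of 20 time units, a 30-unit glitch falls in the attenuated regime, while a 41-unit glitch passes unchanged.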
V. SYMBOLIC MODELING FRAMEWORK
To find the probability of event E (as described in Section IV-A), we need to find the possible values for the duration and amplitude of a glitch at the generic output F. To determine the probability of having a glitch of duration D k at that output, we use BDDs and ADDs. All relevant BDD/ADD functions and algorithms used in this paper are included in the CUDD package [12]. The algorithm is described in the following.
A. ADD Creation
ADDs are created starting with the first node in topological order. The duration and amplitude ADDs are the same, except for the values stored in terminal nodes. Terminal node "0" represents combinations of inputs that logically mask the glitch and all cases when the glitch becomes too short or too attenuated to be propagated, i.e., all cases when the glitch is electrically masked. The values on the other terminal nodes will depend on the paths through which the glitch propagates.
The initial ADD for each gate is built for the glitch originating at that gate. It consists of only one terminal node for all possible input patterns-initial duration or amplitude value. Those ADDs are passed to all fanout gates, which use them for creating new ADDs based on their own attenuation model.
Let us now assume that gates G and G′ are internal gates on the sensitized path through which the glitch propagates to the output F. To create new ADDs for gate G′, we use propagated ADDs from gate G (which will propagate the initial glitch amplitude and duration ADDs but also ADDs that it has built with respect to ADDs passed from its fanin gates) and sensitization BDDs. Since the glitch propagates only if it is on a sensitized path, we need to create sensitization BDDs to find out for which input patterns the path between gates G and G′ is sensitized. Thus, to build new ADDs for gate G′, we use an ADD received from its input neighbor G and a sensitization BDD that represents the function f = ∂G′/∂G. Only for the cases that end up in the terminal node "1" in the sensitization BDD and a node different than "0" in the ADDs do we calculate new values for duration and amplitude. All other cases represent either logically or electrically masked values. Starting with the first node in the topologically sorted list, we create ADDs and BDDs at each node, but they are destroyed as soon as they are not needed. Moreover, some of the current ADDs become "0" due to masking effects, so those ADDs are also removed. When the final node in the circuit is reached, only the ADDs for output F are needed.
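The role of the sensitization function ∂G′/∂G can be illustrated with a small truth-table sketch. It uses explicit enumeration instead of BDDs and the CUDD package, so it only shows the logic, not the symbolic data structure. All names are hypothetical, and the attenuated case is left as a placeholder because the amplitude and duration formulas of [7] are not reproduced here.

```python
from itertools import product


def boolean_difference(gate_fn, n_inputs, idx):
    """Map each assignment of the other inputs to 1 if flipping input idx
    flips the gate output (the path through that input is sensitized)."""
    result = {}
    for others in product((0, 1), repeat=n_inputs - 1):
        lo = others[:idx] + (0,) + others[idx:]
        hi = others[:idx] + (1,) + others[idx:]
        result[others] = gate_fn(*lo) ^ gate_fn(*hi)
    return result


def nand2(a, b):
    return 1 - (a & b)


def propagate_duration(d_in, sensitized, t_prop):
    """Terminal value for one input pattern; 0.0 means the glitch is masked."""
    if not sensitized or d_in <= t_prop:
        return 0.0  # logically or electrically masked
    # Placeholder: for t_prop < d_in <= 2*t_prop the attenuation model would
    # shrink the duration; that formula is not modeled in this sketch.
    return d_in


# For G' = NAND(G, b), the glitch on G passes only when the side input b is 1.
sens = boolean_difference(nand2, 2, 0)
```

Here `sens` maps the side-input value to the sensitization bit, so a NAND gate blocks the glitch whenever its other input holds the controlling value 0.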
Each of these final ADDs represents a pair "gate-output," where gate is the one where the glitch appears, and output is the one for which we determine the probability of error susceptibility. The terminal nodes for these ADDs represent the final duration or amplitude of a glitch at the "output" given that a glitch with duration d init and amplitude a init occurred at the output of the "gate". In addition, we also keep track of a list of delays that are computed in parallel with creating ADDs. Since reconvergent glitches propagate from the output of one gate (e.g., gate G) to inputs of another gate (e.g., gate G′) on different sensitized paths, they can arrive at inputs of gate G′ at different time instances. Therefore, we need to keep track of the delay of each reconvergent glitch from the output of gate G to the input of gate G′. We denote this delay as T 2, as described in Section III, and associate a delay value to each pair of duration and amplitude ADDs. As already described, the duration and amplitude ADDs for a given "gate-output" pair are initialized at the output of "gate" and updated at each of the fanout gates of the "gate", and so on, up to the "output". The delay T 2 is increased by the delay of each gate at which we update these ADDs. To show how our method works, Fig. 3 presents ADDs that are built on paths G 1 → G 5 and G 2 → G 3 → G 5 of the ISCAS'85 benchmark C17 (Fig. 4). Fig. 3 shows the sensitization BDDs for paths G 1 → G 5 and G 2 → G 3 → G 5, together with the initial and propagated duration ADDs for glitches originating at gate G 2 (two steps) and gates G 1 and G 3 (one step for each). As can be seen from Fig. 5, the algorithm for creating ADDs is linear in the number of gates and number of inputs, while the algorithm for computing probabilities is linear in number of gates and number of outputs. Several functions that are used in our main algorithm are described in detail in [19].
In Section V-B, we explain how glitches arriving on different reconvergent paths are merged.
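The logical-masking part of the C17 example can be checked by brute force. The sketch below assumes the standard C17 netlist of six two-input NAND gates and a gate numbering (G 1 to G 6 for nets 10, 11, 16, 19, 22, 23) chosen to match the paths named in the text; that numbering is an assumption. It enumerates input patterns instead of building BDDs and ignores electrical and latching-window masking.

```python
from itertools import product


def nand(a, b):
    return 1 - (a & b)


def c17_output_g5(i1, i2, i3, i6, i7, flip=None):
    """Evaluate output G5 of C17, optionally inverting one gate output.

    Assumed numbering: G1=NAND(i1,i3), G2=NAND(i3,i6), G3=NAND(i2,G2),
    G5=NAND(G1,G3). Input i7 does not reach G5 in this netlist.
    """
    def out(name, value):
        return value ^ 1 if name == flip else value

    g1 = out("G1", nand(i1, i3))
    g2 = out("G2", nand(i3, i6))
    g3 = out("G3", nand(i2, g2))
    return nand(g1, g3)


def sensitized_patterns(gate):
    """Input patterns for which a flip at `gate` is visible at output G5."""
    return [p for p in product((0, 1), repeat=5)
            if c17_output_g5(*p) != c17_output_g5(*p, flip=gate)]
```

Under this numbering, a flip at G 1 reaches G 5 for 20 of the 32 input patterns, a flip at G 3 for 24, and a flip at G 2 (through G 3) for 12; these sets correspond to the "1" terminals of the sensitization functions for the respective paths.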
B. Reconvergent Glitches
To find and merge all ADDs that represent reconvergent glitches, we define a procedure "mergeADDs." For example, in the case of benchmark C17, we can see that the output of gate G 2 goes to gates G 3 and G 4, and that the outputs of both of these gates (G 3 and G 4 ) are inputs to gate G 6. Thus, a glitch occurring at the output of the gate G 2 can propagate through two paths (through gates G 3 and G 4 ) to gate G 6. In this case, depending on the values on the circuit inputs, different superpositions of the two glitches arriving at the inputs of the gate G 6 can occur. Therefore, when building ADDs for duration and amplitude, we need to know whether such situations occur in order to compute the correct values. The pseudocode for the function that merges reconvergent glitches is given in Fig. 6.
In more detail, from the list of all reconvergent paths arriving to a given gate, we separately analyze groups of paths that originate at the same gate. For paths with the same start gate, and their corresponding ADDs (i.e., glitches), we build a quasi sensitization BDD, that is, a BDD where the zero node represents all the cases where at least one of the inputs not carrying a glitch is controlling and the one node represents the cases where none of them is controlling. This BDD can reduce ADD size; therefore, we only analyze cases where glitches affect the output of the gate. Next, we find cases where inputs that carry glitches mask each other. Not all reconvergent glitches occur for all input patterns, and there are situations where one of the glitches appears at the gate input, but at least one of the others is logically masked. Moreover, if this input is set to a controlling value, then the existing glitch will also be masked. Thus, we need to mask all these cases in the ADDs for the reconvergent paths. When masking is completed, the only input combinations that lead to nonzero terminal nodes in ADDs are those that allow glitches to affect the output of a gate. Each ADD has a delay associated with it, so the list of reconvergent paths is sorted according to their delays. From the sorted list, we take pairs of ADDs and merge them, as shown in Fig. 6. When merging ADDs, four possible situations can occur, as shown in Fig. 7.
Two inputs (carrying reconvergent glitches) that are to be merged can be both controlling, both noncontrolling, or the first controlling and the second noncontrolling, and vice versa. It is easy to conclude from Fig. 7 that, in some cases, the resulting glitch is the same as one of the original two; we just need to keep that one and mask those cases in the ADDs of the other one. The same holds when a new glitch starts at the same time as one of the original two, except that in this case the corresponding value in the duration ADD is changed. There are also situations when glitches mask each other, or when one of the glitches is attenuated, the other one is removed, and a new one appears. Since the new glitch has a different delay, we cannot merge it into either one of the ADDs; instead, a new ADD with the new corresponding delay needs to be created. If the two resulting glitches are close enough to each other, we can assume that they are merged into a single long glitch (worst-case approximation). Hence, the function "merge" is nothing more than a case statement that includes three possible subcases and in each of them assigns a new terminal value to the resulting ADD, representing the duration of the resulting glitch. As can be seen by direct inspection, the algorithm for merging ADDs on reconvergent paths is linear in the number of reconvergent paths arriving at the gate. Subfunctions of function "mergeADDs" (Fig. 6) are described in detail in [19].
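The worst-case approximation for two glitches that end up close to each other can be sketched as an interval merge. This is a deliberately simplified illustration: it covers only the case in which both glitches reach the gate output, does not model the controlling/noncontrolling cases of Fig. 7, and uses a hypothetical `min_gap` parameter for "close enough."

```python
def merge_glitches(g1, g2, min_gap):
    """Merge two glitches given as (start_time, duration) pairs.

    If the gap between the end of the earlier glitch and the start of the
    later one is at most min_gap (or they overlap), return one long glitch
    spanning both; otherwise return both glitches, sorted by start time.
    """
    (s1, d1), (s2, d2) = sorted([g1, g2])
    e1, e2 = s1 + d1, s2 + d2
    if s2 - e1 <= min_gap:
        return [(s1, max(e1, e2) - s1)]
    return [(s1, d1), (s2, d2)]
```

When the glitches are not merged, each one keeps its own start time, which corresponds to keeping separate ADDs with separate delays.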
VI. PROBABILITY COMPUTATION
Since different combinations of "0"s and "1"s can occur at the inputs of a given combinational circuit, we set various values for the probability of each input being "1." We use these probabilities to find the error susceptibility for each output of the combinational logic.
When all ADDs for a given circuit are built, the error susceptibility for each output due to an error at the output of any gate in the circuit can be computed. We use (11) to compute these probabilities. For a generic output F j and a gate G i , we build all ADDs representing the duration and amplitude of a glitch originating at the output of gate G i and propagating to output F j . Given the probability of "1" for each input, we compute the probability that the glitch duration D at the output is D k , and the corresponding latching probability for this specific duration value as in (13) . To analyze error susceptibility of a given combinational logic circuit, we assume a discrete set of test glitches of different initial duration d init and use randomly generated input probability distributions. The main part of our framework is building ADDs. In cases when input correlations exist, they can be represented by input ADDs. Forbidden input combinations end up in terminal node 0, while other combinations end up in terminal nodes representing the probability of that combination. This way of representing input vector distributions has been described and used before [20] , is compatible with our framework, and can be used in conjunction with it. These input ADDs can be "merged" (ANDed) with final gate-output ADDs to get correct probability values. Therefore, we did not make an assumption on whether inputs are dependent or independent since both cases can be covered within our framework. In order to simplify the analysis, we assume that input probabilities are given as a set of values and not as input ADDs.
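The probability computation described above can be sketched as follows. The sketch replaces the ADD with a plain function from input patterns to output glitch durations, assumes independent inputs with given probabilities of "1," and uses the worst-case latching probability (D k − (t setup + t hold )) / (T clk − d init ) derived from the latching interval in Section IV. All names are hypothetical.

```python
from itertools import product


def duration_distribution(duration_of, p_one):
    """Return {D_k: P(D = D_k)} for independent inputs with P(x_i = 1) = p_one[i].

    duration_of maps an input pattern (tuple of 0/1) to the glitch duration
    at the output, with 0.0 meaning the glitch is masked.
    """
    dist = {}
    for pattern in product((0, 1), repeat=len(p_one)):
        prob = 1.0
        for bit, p in zip(pattern, p_one):
            prob *= p if bit else 1.0 - p
        d = duration_of(pattern)
        dist[d] = dist.get(d, 0.0) + prob
    return dist


def error_probability(dist, t_clk, d_init, t_setup, t_hold):
    """Sum over durations of (worst-case latching probability) * P(D = D_k)."""
    total = 0.0
    for d_k, p in dist.items():
        window = d_k - (t_setup + t_hold)
        if window > 0:  # shorter glitches cannot be latched
            total += min(1.0, window / (t_clk - d_init)) * p
    return total
```

For instance, if a 100-unit glitch reaches the output for half of the input patterns, with T clk = 1000, d init = 200, and t setup = t hold = 25, the sketch gives (50 / 800) × 0.5 = 0.03125.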
We analyze each circuit from two aspects, namely 1) reliability of its outputs when faults occur inside the circuit, and 2) influence of individual gate's error on outputs.
A. Mean Error Susceptibility (MES) and Mean Error Impact (MEI)
Assuming that an effective particle hit occurred, we can define the following events:
F j : output F j fails;
G i : the glitch occurs at the output of a given gate G i ;
P k : the probability distribution for the primary input vector stream is f k.
We further define two additional events:
1) d init = d;
2) a init = a;
which mean that the particle hit resulted in a glitch with given initial duration d and given initial amplitude a. Events "d init = d" and "a init = a" can be represented in terms of random variables X and Y as
with a joint probability density function f (x, y). Since X and Y are continuous variables, then P (X = x, Y = y) = 0, so we should write instead
For simplifying the notation, and according to the nature of random variables X and Y , we represent events "d init = d" and "a init = a" by two events X d and Y a , respectively, as
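We now define event $E^{d,a}_{j,i,k}$ as the event of output $F_j$ failing given that the glitch with duration $d$ and amplitude $a$ occurs at the output of the gate $G_i$ and given that the input vector probability distribution is $f_k$, i.e.,
$$E^{d,a}_{j,i,k} = \left(F_j \mid G_i \cap P_k \cap X_d \cap Y_a\right).$$
The probability of event $E^{d,a}_{j,i,k}$ is then
$$P\left(E^{d,a}_{j,i,k}\right) = P\left(F_j \mid G_i \cap P_k \cap X_d \cap Y_a\right).$$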
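The probability of event $F_j$, that is, the probability that output $F_j$ fails, can be expressed as the total probability
$$P(F_j) = \iint P\left(F_j \mid X_x \cap Y_y\right) f(x, y)\,dx\,dy. \tag{19}$$
We can express the conditional probability of event $F_j$, given that events $X_x$ and $Y_y$ occur, as
$$P\left(F_j \mid X_x \cap Y_y\right) = \frac{P\left(F_j \cap X_x \cap Y_y\right)}{P\left(X_x \cap Y_y\right)}. \tag{20}$$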
We can write the numerator of (20) in the form
$$P\left(F_j \cap X_x \cap Y_y\right) = \sum_{i=1}^{n_G} \sum_{k=1}^{n_f} P\left(F_j \cap G_i \cap P_k \cap X_x \cap Y_y\right) \tag{21}$$
where we assume that $G_i$, $i = 1, \ldots, n_G$ are independent events (i.e., internal gate hits are considered independent), and $P_k$, $k = 1, \ldots, n_f$ are independent events. $n_G$ is the cardinality of the set of internal gates of the circuit $\{G_i\}$, and $n_f$ is the cardinality of the set of probability distributions $\{f_k\}$ associated with the input vector stream. We can now express the numerator from (20) as
$$P\left(F_j \cap X_x \cap Y_y\right) = \sum_{i=1}^{n_G} \sum_{k=1}^{n_f} P\left(E^{x,y}_{j,i,k}\right) P(G_i)\, P(P_k)\, P\left(X_x \cap Y_y\right) \tag{22}$$
where we assume that event $G_i$ is independent of events $P_k$, $X_x$, and $Y_y$ (i.e., the probability of an internal gate being hit is independent of the produced glitch duration and amplitude, and of the input probability distribution), and that event $P_k$ is independent of events $X_x$ and $Y_y$ (i.e., the produced glitch duration/amplitude is independent of the input probability distribution). We note that the probability of event $E^{x,y}_{j,i,k}$ was determined in Section IV for
For each output $F_j$, initial duration $d$, and initial amplitude $a$, we find MES as the probability of output $F_j$ failing due to errors in the internal gates, i.e.,
$$\mathrm{MES}\left(F_j^{d,a}\right) = \frac{1}{n_G\, n_f} \sum_{k=1}^{n_f} \sum_{i=1}^{n_G} P\left(E^{d,a}_{j,i,k}\right). \tag{23}$$
Note that the computation of $P\left(E^{d,a}_{j,i,k}\right)$ was described in Section IV. The MES expression was derived based on two assumptions.
1) The probability that a gate outputs a glitch, due to an effective particle hit, is the same for all gates $G_i$ and equal to $1/n_G$.
2) Each input probability distribution $f_k$ is equally likely to occur, with probability $1/n_f$.
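Under these two assumptions, MES reduces to a plain average of the per-gate, per-distribution failure probabilities. The following is a minimal sketch of that averaging step only; the function name, the table layout `p_fail[i][k]`, and the numeric values are hypothetical, and the failure probabilities themselves are taken as given rather than computed from ADDs.

```python
def mean_error_susceptibility(p_fail):
    """MES of one output for one (d, a) pair.

    p_fail[i][k] is the probability that the output fails given a glitch at
    gate i and input distribution k. Averaging over i and k encodes
    P(G_i) = 1/n_G and P(P_k) = 1/n_f.
    """
    n_gates = len(p_fail)
    n_dists = len(p_fail[0])
    total = sum(sum(row) for row in p_fail)
    return total / (n_gates * n_dists)


# Hypothetical failure probabilities for 4 gates and 2 input distributions.
p_fail = [
    [0.50, 0.25],
    [0.25, 0.25],
    [0.00, 0.00],  # a gate whose glitches are always masked
    [0.50, 0.25],
]
mes = mean_error_susceptibility(p_fail)
print(mes)  # 0.25
```

A gate that is always masked still counts in the denominator, which is why larger circuits with many masked gates tend toward smaller MES values.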
Let us now define an event $F$, which represents the event of any output failing. Event $F$ is, therefore,
$$F = \bigcup_{j} F_j. \tag{24}$$
For each gate $G_i$, initial duration $d$, and initial amplitude $a$, we define the MEI over all outputs $F_j$ that are affected by a glitch occurring at the output of gate $G_i$ as
where $n_F$ is the number of outputs of the circuit. We derived this expression under the following assumptions:
1) Individual output failures due to the glitch originating at the output of gate $G_i$ are disjoint for a given internal gate hit, initial produced glitch duration/amplitude, and input probability distribution. This assumption is the same as the assumption made in [9], where SER is found for each output separately. In our case, final ADDs can actually be used to find joint probabilities of outputs too.
2) The same as assumption 2 above.
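The way these two assumptions enter the MEI computation can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact expression: it assumes that the disjointness assumption is applied as a plain sum over the $n_F$ outputs and that equally likely input distributions are applied as an average over $n_f$. A variant that also divides by $n_F$ would give a per-output mean instead. The function name, table layout, and values are hypothetical.

```python
def mean_error_impact(p_fail):
    """MEI of one gate for one (d, a) pair.

    p_fail[j][k] is the probability that output j fails given a glitch at
    this gate and input distribution k. Output failures are summed
    (disjointness assumption), and the result is averaged over the n_f
    equally likely input distributions.
    """
    n_dists = len(p_fail[0])
    return sum(sum(row) for row in p_fail) / n_dists


# Hypothetical failure probabilities for 3 outputs and 2 input distributions.
p_fail = [
    [0.250, 0.125],
    [0.125, 0.000],
    [0.000, 0.000],  # an output this gate cannot reach
]
mei = mean_error_impact(p_fail)
print(mei)  # 0.25
```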
B. Relationship With SER
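According to (23), the numerator in (20) and (22) now becomes
$$P\left(F_j \cap X_x \cap Y_y\right) = \mathrm{MES}\left(F_j^{x,y}\right) P\left(X_x \cap Y_y\right). \tag{26}$$
When using (26) in (20) and (19), we obtain the probability of output $F_j$ failing as
$$P(F_j) = \iint \mathrm{MES}\left(F_j^{x,y}\right) f(x, y)\,dx\,dy. \tag{27}$$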
Our framework computes MES for all outputs of the circuit and for a discrete set of pairs $(x, y)$ of initial glitch durations and amplitudes, while the surface defined by all allowed pairs $(x, y)$ is continuous. Thus, we partition this surface into a grid with increments of $\Delta x$ and $\Delta y$ for $x$ and $y$, respectively. We assume that MES is constant within each subsurface. Further, the choice of function $f(x, y)$ depends on the particles' energy and the critical charge for circuit transistors. This function describes the most probable, least probable, or average initial glitch duration and amplitude. Without loss of generality, we assume a uniform distribution along the surface of allowed pairs $(x, y)$, with $x \in (d_{min}, d_{max})$ and $y \in (a_{min}, a_{max})$, as
$$f(x, y) = \frac{1}{(d_{max} - d_{min})(a_{max} - a_{min})}. \tag{28}$$
Therefore, we can rewrite (27) as a double sum over the grid points $(x_m, y_n)$
$$P(F_j) = \sum_{m} \sum_{n} \frac{\mathrm{MES}\left(F_j^{x_m, y_n}\right) \Delta x\, \Delta y}{(d_{max} - d_{min})(a_{max} - a_{min})}. \tag{29}$$
We can now derive an expression for SER as
$$\mathrm{SER}(F_j) = P(F_j) \cdot R_{PH} \cdot R_{eff} \cdot A_{circuit} \tag{30}$$
where $R_{PH}$ is the particle hit rate, $R_{eff}$ is the fraction of particle hits that result in charge generation, and $A_{circuit}$ is the total silicon area of the circuit.
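The discretized failure probability and the resulting SER can be sketched numerically. This is a minimal illustration, assuming a uniform density over the allowed region and an MES value that is constant within each grid cell. The grid limits, step sizes, $R_{PH}$, and $R_{eff}$ follow the values quoted in the experimental section, while the constant MES grid, the silicon area, and all function and variable names are hypothetical.

```python
def failure_probability(mes_grid, dx, dy, d_range, a_range):
    """Approximate P(F_j) by a double sum over grid cells.

    mes_grid[m][n] is the MES assumed constant within the cell that starts
    at (d_min + m*dx, a_min + n*dy); the density f(x, y) is uniform over
    the allowed region.
    """
    d_min, d_max = d_range
    a_min, a_max = a_range
    density = 1.0 / ((d_max - d_min) * (a_max - a_min))
    return sum(sum(row) for row in mes_grid) * dx * dy * density


def soft_error_rate(p_fail, r_ph, r_eff, area_m2):
    """SER as failure probability times effective hit rate times area."""
    return p_fail * r_ph * r_eff * area_m2


# 4 duration cells (45..125 ps, step 20 ps) by 2 amplitude cells (0.8..1.0 V,
# step 0.1 V); the MES value 0.1 in every cell is hypothetical.
mes_grid = [[0.1, 0.1] for _ in range(4)]
p_fj = failure_probability(
    mes_grid, dx=20.0, dy=0.1, d_range=(45.0, 125.0), a_range=(0.8, 1.0)
)

# r_ph in m^-2 s^-1; the area of 1e-9 m^2 is a hypothetical placeholder.
ser = soft_error_rate(p_fj, r_ph=56.5, r_eff=2.2e-5, area_m2=1e-9)
print(p_fj, ser)
```

With a constant MES over the whole grid, the cell areas sum to the area of the allowed region, so the failure probability equals that constant MES up to rounding. This makes a convenient sanity check for the discretization.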
VII. EXPERIMENTAL RESULTS
In this section, we compare the results of our symbolic framework for eight combinational circuits given different glitch durations and different sets of input probabilities. The technology used is the 70-nm Berkeley Predictive Technology Model [21], [22]. The clock cycle period ($T_{clk}$) used is 250 ps, and setup ($t_{setup}$) and hold ($t_{hold}$) times for the latches are assumed to be 15 ps each. $V_{dd}$ is assumed to be 1 V, and for simplicity, all switching threshold voltages, gate threshold ($V_S$ and $V_S$), and latch threshold $V_{S,latch}$ are assumed to be $V_{dd}/2$. The delay of an inverter in the given technology is determined by simulating a ring oscillator in HSPICE and found to be 10.2 ps. The delays for other gates are found by using the logical and electrical effort methodology [23]. The benchmark circuits are chosen from the ISCAS'85 and MCNC'91 suites. Our symbolic modeling framework is implemented in C++ and run on a 3-GHz Pentium-4 workstation running Linux. In Table I, we show the experimental results for several benchmarks of varying complexity. We present minimum, maximum, average, and median output MES for all benchmarks as well as the associated run time and memory usage. As can be seen from the results, the MES decreases with circuit complexity due to more significant electrical and logical masking. The results also show that the median value is usually closer to the minimum. Therefore, we can conclude that most of the outputs have small MES, but as will be seen next, in the case of large glitches, almost all gates have an impact on output failure.
The results for one small benchmark 5xp1 (116 gates) and one larger benchmark C1908 (384 gates) are presented in Fig. 8. We divide the interval [0, 1] of possible error impact into ten subintervals. For each benchmark, each error impact interval, and various input probability distributions, we show the number of gates that have minimum, maximum, mean, or median error impact in those intervals. We present this dependence assuming three different initial glitch durations. For the small glitch that has a duration of 50 ps, all error impact values are in the range from 0 to 0.4. The gates that influence outputs are just the output gates and their fanin gates. In the case of larger circuits, there is a significant number of gates that do not have any impact on output error. However, in the case of a 125-ps-long glitch, the glitch might not propagate to the output due to logical masking, or it might not be latched due to latching-window masking. Since the glitch is still very long when it reaches the output, there is a considerable number of gates that will almost certainly have an impact on output error.
Fig. 9. MEI for a small benchmark (5xp1) computed as in (17) and using exact propagation delays from primary inputs to originating glitch gate for three glitch durations (small: 50 ps; medium: 80 ps; large: 125 ps).
We also experimented with an improved delay model that uses delay ADDs (which store in the terminal nodes all possible values of delay from primary inputs to a given gate) [17], [24]. As already explained, the model used for the results presented so far takes into account only the worst-case delay among these, i.e., the one that maximizes latching-window probability. To show the difference, we have created "delay" ADDs for several small benchmark circuits and compared them with the results of our main framework. Fig. 9 presents the minimum, maximum, average, and median error impact of gates in benchmark circuit 5xp1, for the same glitch durations as in Fig. 8, in the case of using the exact delay model from the primary inputs to the gate under attack. As can be seen from Figs. 8 and 9, the original case that does not include delay ADDs has a larger spread toward larger error impact values since it is based on a worst case for determining latching-window probability.
In Fig. 10, we present the average bit SERs for the same set of benchmark circuits as in Table I. $P(F_j)$ and SER for each output are found using (29) and (30), respectively. The allowed interval for the initial duration of the glitch is assumed to be $(d_{min}, d_{max}) = (45, 125)$ ps, while the initial amplitude is in the range $(a_{min}, a_{max}) = (0.8, 1)$ V. The MES for each output is found within these allowed intervals at incremental steps $\Delta x = 20$ ps and $\Delta y = 0.1$ V. The $R_{PH}$ used is $56.5\ \mathrm{m^{-2}\,s^{-1}}$, $R_{eff}$ is $2.2 \cdot 10^{-5}$, and the total silicon area found for each benchmark circuit is proportional to the number of gates. We compare against the results in [9] because they are reported as SER values, while other related work [8], [10], [11] reported results using different measures: "softness," "unreliability," and "reliability." The SERs for benchmarks in Table I, found using our framework, are similar to or slightly larger than the SERs computed for two smaller benchmarks (4 × 4 and 8 × 8 multipliers) in [9], while they are one to two orders of magnitude smaller than the SERs for two larger benchmarks (16 × 16 and 32 × 32 multipliers) in [9]. Due to the approximate treatment of reconvergent glitches, there is an overestimation of SER in [9]. This especially affects center bits, which have the largest SER since a large number of paths lead to those bits.
We compared glitch durations and delays (obtained using our symbolic framework) at the outputs of circuit C17 with the results from HSPICE simulations for several initial glitch durations ranging from 30 to 100 ps. The relative error of our model is presented in Fig. 11. As can be seen, the error stemming from the approximate gate delay model and the attenuation model we are using ranges from less than 5% to about 20% in one instance (40-ps glitch duration), while averaging 9% overall, for an effective 3900× average speedup (up to 5000× in some cases). Moreover, simulations for long glitches (which are not attenuated or latching-window masked) show that our model captures logical masking with 100% accuracy.
VIII. CONCLUSION AND FUTURE WORK
In this paper, we have presented a symbolic modeling methodology and associated framework for the efficient estimation of soft error susceptibility of a combinational logic circuit. We have demonstrated the efficiency of our framework by applying it to a subset of ISCAS'85 and MCNC'91 benchmarks of various complexities. The framework allows for the analysis of reliability of combinational circuits from various aspects: output susceptibility to error, influence of individual gates on individual outputs and overall circuit reliability, and the dependence of circuit reliability on glitch duration, amplitude, and input patterns.
An area of future research is the use of the proposed symbolic framework for the resynthesis of logic circuits for minimizing soft error susceptibility and error impact.
