An efficient methodology for soft error analysis of sequential circuits based on Monte Carlo sampling is proposed. It uses nested sampling for faster statistical convergence: it samples only from the workload space and statically evaluates the conditional probability over the subspace of particle strike and circuit parameters. A novel check on the stationarity of machine state sequence to reduce the number of samples to convergence is introduced. The flow combines logic simulation for latch-level error propagation and stationarity diagnostic and an improved combinational error simulator with a new masking model based on signal controllability.
INTRODUCTION
Vulnerability of integrated circuits to single event upsets (SEU) caused by extrinsic radiation has become a significant and growing reliability concern. SEUs arise when a highenergy particle, typically, an alpha particle or a by-product of a neutron decay, hits the depletion region of a pn-junction between the drain (source) and the bulk terminals of a MOS transistor. The resulting voltage pulse, known as a single event transient, can produce a bit-flip at circuit registers as it propagates through combinational logic and gets latched into memory elements. It often persists over many cycles and may never be fully masked.
Accurate yet computationally scalable estimation of systemlevel error rate due to SEUs is challenging as it requires analysis at multiple levels of abstraction. Accurate sequential SEU estimation requires capturing the mechanisms of error propagation and masking at both combinational and sequential levels. At combinational level, much research has identified the importance of several masking mechanisms that modulate the probability of a SET propagating to a memory element [1, 2, 3] . Sequential circuit analysis adds the challenge Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. of estimating fault propagation through multiple cycles efficiently. Several methods have been proposed to model, analyze, and estimate SER in sequential circuits [4, 5, 1, 6] . Signal-probability based methods [4, 5] are based on analytical computations of error probability which allows achieving high computational efficiency. However, these methods are generally not able to capture the complex temporal and spatial dependencies between internal machine states defined by a state transition graph [7, 8] . As a result, the level of accuracy allowed by the purely analytical methods is not generally sufficient. Symbolic methods [1, 6] eliminate explicit enumeration of input vectors and, thus, have high coverage but also cannot capture temporal correlations of internal states and typically have very high memory requirements. Samplingbased methods have also been proposed but without a focus on statistical convergence and ways to improve the rate of convergence [9] .
In this paper we describe a soft error prediction flow for sequential circuits that is based on Monte Carlo sampling. Monte Carlo sampling is attractive in several aspects. It provides a feasible approach to dealing with an enormous space of possible workloads via sampling. Perhaps, its main attraction is that it is a flexible framework that is compatible with many requirements needed for accurate simulation. First, it allows ensuring that the faults are injected into legitimate states of a finite state machine (FSM). Second, it allows a static input-specific combinational circuit evaluation of key masking mechanisms.
The basic problem of any procedure based on Monte Carlo sampling is the question of convergence of the estimate. The main contribution of this paper is the flow based on the rigorous analysis of the following questions: (1) how to analyze the variance of the estimate, (2) how to know when the simulation can be terminated, i.e., what are the convergence criteria, and (3) what are the sampling and simulation strategies that are efficient specifically in the sense of error variance reduction.
A sequential circuit Monte Carlo faces the problem of a large sampling space: a state of the circuit and the values of the primary inputs when the strike occurs, the location of a particle strike and its timing within a given strike-cycle, the magnitude of charge deposited by the particle at a circuit node. Sampling the state space directly is difficult. Because in a sequential circuit the state is defined by a history of primary outputs, a random assignment of state bits leads to a highly inaccurate estimate of the error rate. We propose a historydependent strategy: starting from a random internal state, we sample a sequence of primary inputs, until the internal state transitions approach a stationary state. In particular, we advocate using a check on the stationarity of the machine state sequence aimed at reducing the number of samples to convergence. The adopted heuristic check monitors the potential scale reduction factor to infer stationarity: its convergence to unity indicates high degree of stationarity [10] . As results indicate, injecting errors into non-stationary states leads to wasted simulation e↵ort.
With an FSM state correctly sampled in stationary phase, the seemingly most natural sampling strategy, that we refer to as a direct approach, is to sample from the strike parameter space (within-cycle time, location, and charge magnitude) directly to compute the latch-level error probability. We show that the direct sampling strategy is sub-optimal from the convergence point of view. Instead, we advocate a hybrid static-sampling approach that exploits a conditional probability estimation property. In that approach, we sample the workload input space but statically evaluate the conditional expectation of error probability for the given input over the remaining sampling sub-space in its entirety. In e↵ect, a much larger sample space is covered by each sample and the number of workload samples required for convergence is reduced. However, the specific computation occurring after each sampling is now di↵erent since the circuit-level simulator needs to compute the conditional probability rather than simply propagate a binary value through a circuit. Thus, the efficiency of the overall procedure is based on our ability to efficiently compute the conditional expectation of bit errors.
At the combinational level, we use a static algorithm that combines electrical cell pre-characterization in the style of static timing analysis and an efficient timing-window analysis to enable an accurate prediction of the probabilities of single event transients being latched: this captures the electrical and timing-window masking mechanisms. Because the input to a combinational circuit is fixed at the time of analysis, we are able to improve the accuracy of latching window (LW) modeling by incorporating signal controllability. This translates into the overall accuracy improvement. A single strike can impact multiple flip-flops and our algorithm captures such an event via a novel strategy of representing impacted flip-flops as sets along with the associated set error probability. This captures the impact of correlated multi-bit errors.
If the error probability for a flip-flop (FF) set is above a threshold, a set of bit-errors is injected and the simulation moves to the purely logic domain. The circuit outputs are monitored to determine if the error persists or is masked. A single simulation run is terminated if no internal latch shows an error (indicating the fault is masked) or after a prespecified number of cycles. The result of each such run is a binary outcome of error injection. The product of combinational error probability and the binary outcome contributes a data point to the Monte Carlo series. The individual estimates are not independent. For correlated samples, the convergence of this series is assessed continuously using the nonoverlapping batch method [10] .
MONTE CARLO FLOW AND ITS CON-VERGENCE
In this section we present the proposed Monte Carlo flow and analyze its convergence properties. In particular, we demonstrate that the nested sampling strategies based on the conditional expectation approach achieves faster convergence than a direct method.
A sequential circuit can be formally abstracted as an FSM F = h⌦I , ⌦O, ⌦S, δ, ⇢, s0i, where ⌦I , ⌦O, ⌦S denote the set of possible primary inputs (PI), outputs (PO), and internal states respectively. s0 is the initial state in ⌦S. δ is the state-transition function:
and ⇢ is the output function:
A strike at cycle n can be characterized by a set of parameters pn that include an indicator variable zn capturing strike occurrence in cycle n, strike time tn, location ln and charge qn: pn = (zn, tn, ln, qn). Possible particle strikes extend the error-free FSM F to a faulty FSM F 0 with the state-transition function δ 0 and the output function ⇢ 0 , which have additional dependence on strike parameters pn:
Here ⌦P represents the set of possible particle strikes. A particle strike generated at cycle n may a↵ect the output ports of interest immediately at cycle n or be latched in FFs and a↵ect the outputs later. We use an indicator variable et n to denote whether a strike at cycle tn ever causes an output error. We observe that et n is a function of particle strike information pt n , current state st n , and subsequent inputs im (m ≥ tn):
In the following discussion, we use capital symbols to denote random variables in order to distinguish them from the associated realizations: e.g., ET n denotes the random variable for the indicator variable et n . The Monte Carlo simulation is outlined in Algorithm 1. The proposed flow is based on nested sampling using conditional expectation but we include direct sampling to highlight the di↵erences. [11] . It can be shown that for these systemsr ⇤ indeed exists, i.e., is unique, and is equal to the expectation E⇡[ET n ] with respect to its stationary distribution [6] :r
For sequential circuits whose Markov chains lack the above properties the error estimates produced by simulations based on di↵erent initial conditions may not converge. Efficient checking of whether a given circuit satisfies these criteria is not dealt with in this paper but is an important challenge for future work.
The well-behaved convergence ofr ⇤ to E⇡[ET n ] enables the nested sampling strategy. According to the law of total expectation:
where EP [ET n |I, S] is the conditional expectation of error rate evaluated over all possible strike injection sites for a given input and state. The nested sampling strategy uses this conditional probabilty in place of ET i defined for a single injection site. Now, the empirical error rate under the nested sampling strategyr 0 is:r
where M is the number of cycles for which injections are evaluated. Critically, in terms of empirical convergence, the above analysis establishes thatr ⇤ = limN!1r = limM!1r 0 . The evaluation of EP [ET n |I, S] needs to further comprehend two cases. A strike can a↵ect primary outputs immediately at the injection cycle: we represent this possibility via an indicator variable F ; . Alternatively, a strike can first cause errors on one or more internal flip-flops. We capture this by identifying for each injection cycle the set of a↵ected flip-flops (FF sets) and the associated error probability. We use the indicator variable Fi to represent the errors latched on the i-th set of FFs. Because F ; and Fi's are exclusive, E[ET n |I, S] can be represented as:
This decomposition is useful since EP [ET n |I, S, Fi = 1] can be evaluated efficiently through fast logic-level simulation. The computation of P r(F ; = 1|I, S) and P r(Fi = 1|I, S) is the focus of Section 3. We now analyze the convergence behavior and prove that the procedure that relies on nested sampling and the use of conditional expectation has faster convergence than the direct sampling method. Since Et n and EP [ET n |I, S] are functions of the Markov chain of the state sequence S, we rely on the weak Law of Large Numbers (LLN) for stationary Markov chains [11] to analyze the convergence rate of direct and nested sampling:
where σ 2 denotes the steady-state variance constant (SSVC), given by
Intuitively, SSVC acts as the measure of total covariance for a Markov chain. Because an empirical average of conditional expectation EP [ET n |I, S] is used in the nested sampling strategy, the corresponding SSVC becomes
According to LLN, smaller SSVC implies faster convergence (i.e., smaller N for same ✏). Now we show that nested sampling reduces SSVC enabling faster convergence. In order to demonstrate that conditioning improves convergence rate, we need the following Theorem regarding SSVC of conditional expectations [12] :
Theorem. For a random process {Xn} with a finite steadystate variance constant σ
Thus, compared to direct sampling, sample SSVC is reduced in nested sampling, leading to faster convergence. Section 4 confirms this significant improvement in rate of convergence.
The above convergence analysis is based on the important assumption that we sample from the stationary distribution of FSM states. If we sample FSM states that do not represent a stationary distribution, the convergence is slower. To overcome this, we introduce a diagnostic that monitors a sequence of FSM states to determine the on-set of stationarity. The monitoring involves only fast logic-level simulation. We use a monitoring method based on the potential scale reduction factor (PSRF) to infer stationarity [10] . The method generates m independent sequences of length n that originate from di↵erent random seeds and convergence to stationarity is established when the parallel trajectories begin to have similar distributional properties as measured by the PSRF, defined as:
in which W is the average covariance matrix of each state trajectory and B is the covariance matrix between trajectories. λmax(·) denotes the largest eigenvalue of a matrix. In the experiments, the matrices B and W are observed to be sparse. Therefore, the efficient power method [13] can be employed to find the largest eigenvalue, as we show in Section 4.
COMBINATIONAL ERROR ANALYSIS
This section presents an efficient algorithm to evaluate the conditional error probabilities defined in Equation 1 at the combinational circuit level. For compactness, we refer to both P r(F ; = 1|I, S) and P r(Fi = 1|I, S) as Pe. We use a static algorithm that combines electrical cell pre-characterization and a new timing-window analysis to capture electrical and timing-window masking. Consider a combinational circuit with Ng gates each of area Ai such that the the total circuit area is At. Then, the error probability Pe at the output can be calculated as [1] :
AiRi(qm)∆q∆tPi(qm, tn)
where R(q) is the e↵ective charge-dependent strike rate, ∆q and ∆t are the discretization steps for charge distribution and time, qm = m∆q, and tn = n∆t. Pi(q, t) is the probability for a strike at gate i with charge q and at a time t to be latched. The challenge of combinational analysis is an efficient computation of Pi(q, t) . We use the cell-based technique of [1] to capture the evolution of a pulse, i.e., its electrical masking. The novel contribution is in customizing the analysis to the Monte Carlo setting in which the combinational input is fixed at the time of analysis: this allows us to improve the accuracy of latching window modeling by incorporating signal controllability. A soft error glitch is only captured at a latch if it is present at the D-input of a flip-flop (for DFF) in the interval [−Ts, T h ]. (This is because it needs to reach D-input Ts before the clock edge t clk and also to remain steady for T h following the clock edge.) Note that an error pulse is latched only if covers the entire interval. A timing window is the time such that a glitch that covers it will propagate to a latch, and a glitch outside it will not be latched. A latching window for a gate g is the set of time intervals LW (g) in which a pulse generated at g can propagate to some set of latches if the pulse contains at least one of intervals. Each interval is described by start and end times:
, such that the overall latching window LW (g) for a gate g is given by
Since the pulse needs to cover at least one interval to be latched, the latching condition for a pulse of width P W generated at time t can be formally stated as below, where we designate the pulse start and end times as S p i = t and E p i = t+P W :
Starting with the latching window at the D-input to a latch, the LWs of gates are derived by a reverse-topological order traversal of the circuit graph. We distinguish two scenarios in the backtracking procedure: non-reconvergent paths and reconvergent paths. For the scenario of a non-reconvergent path from gate g to gate h, the error pulse generated at (the output of) gate g at time t with pulse width P W is delayed and shifted in time by d(h) once it propagates to gate h. In order to be latched, the pulse at the output of gate h needs to cover at least one latch window LWi(h):
Solving the inequality above, we get 
Thus, LW (g) is just a translation of LW (h) by the propa-
We now consider the case in which reconvergent paths are present. Figure 1 
illustrates the case of two reconvergent paths g-B-h and g-C-h. We assume that d(B)  d(C).
It turns out that the resulting LW depends on the gate type at the reconvergent node, instantaneous signal values, the propagation delays and the error pulse width.
An error pulse propagating along two paths creates two pulses at the inputs of a reconvergent gate: Figure 2 .
In contrast to the conventional method taking union of all LWs [14] , the main intuition for our model is that a gate will not produce an error when there is a correct controlling signal at one of its inputs. We capture this via the notion of the error inhibition window as the set of time intervals over which at least one controlling input is correct: tinh = {t : 9j, in h,j (t) is controlling}, where in h,j (t) refers to the signal values of the j-th input of gate h at time t. Next, we introduce the notion of an e↵ective pulse EP (g, h) as the union of input pulses minus the error inhibition window:
Thus, an e↵ective pulse represents all time intervals of faulty pulses at the inputs of gates that produce error at the outputs. Excluding the inhibition time is critical for the overall accuracy, as the results from experiments show that t inh and PWs are of the same order of magnitude (the typical value of t inh /P W is about 40%). Generally, the e↵ective pulses EP (g, h) can be represented as a set of intervals: Figure 2 , at all times there is at least one correct input eliminating the error pulse, i.e. EP (g, h) = ;.
In deriving the corresponding latching windows, we repeat the earlier procedure but replace the raw (physical) pulse start/end times t, t + P W with the e↵ective pulse start/end times at the input to h:
The overall LW at gate g is the union of the sub-intervals LWij(g) found above:
We continue using Figure 1 
which can be re-written as
Thus, the latch window of g becomes [
Reconvergent paths can lead to an exponential number of intervals in the representations of LWs for some gates. This is because a raw SET pulse will generate two e↵ective pulses under the scenario that both B,C are non-controlling and Figure 2 . In order to achieve scalability, we introduce the approximation of interval filling to make EP (g, h) contain only one interval, i.e.
Technology scaling reduces gate delay while P W of SETs increases. For the 16nm technology node used in the experiments, the gate delay is about 10ps while P W is about 1ns. Thus, the fraction of reconvergent paths that require interval filling is relatively small: less than 0.002% in the experiments. Among the cases that required interval filling, the interval gap ✏ is negligible compared to the total P W = P W 1 + P W 2. We also find that the average relative gap ✏/(P W 1 + P W 2) is less than 3%, indicating that the inaccuracy introduced by this approximation is small. Table 1 : Runtime and accuracy analysis for circuit-level simulators.
EXPERIMENTAL RESULTS
In this section, we report on the extensive experiments to demonstrate the e↵ectiveness of the proposed Monte Carlo framework together with the circuit-level simulation algorithm.
The Monte Carlo framework and the circuit-level static SEU modeling algorithm were implemented in C++ and run on an Intel Xeon 2.93GHz workstation with 74G bytes of memory. The circuits are chosen from the ISCAS and ITC benchmarks [15, 16] as well as the functional units from ARM Amber 25 and synthesized based on the 16nm Predictive Technology Model [17] . For each testbench, the technology-mapped netlist is taken as an input. The output of our tool is a list of sets of impacted primary outputs and their error probabilities.
We first demonstrate the accuracy and efficiency of the proposed circuit-level simulator as a crucial building block of the framework. Table 1 shows the accuracy and runtime comparison of the proposed simulator considering signal controllability in LW modeling against the strategy ignoring signal controllability and HSPICE. The unit for the SER in the table is 10 −10 . Both circuit-level simulators, with and without considering controllability, achieve a 10 6 X speed-up compared to the HSPICE simulation. HSPICE cannot handle the larger circuits (Wishbone and b17), which we designate by n/a. The e↵ective pulse and inhibition time are not negligible compared to the pulse width: the average t inh /P W is 40%. The ability to accurately model LW by considering signal controllability translates into higher accuracy. Considering controllability allows matching HSPICE to within 0.6%, on average, while the error is 12.6% when signal controllability is ignored.
Next we investigate the e↵ectiveness of monitoring stationarity through the PSRF method. We set m = 5 and choose 1.1 as the threshold for diagnostics. Figure 4 plots PSRF for several benchmark circuits. Each state trajectory approaches the stationary distribution with the increase of the number of cycles. The ability to sample from a stationary distribution results in a large reduction of the number of samples needed, as shown in Table 2 . The runtime needed for the diagnostics, including running multiple parallel logic simulations and evaluating the PSRF metric, is relatively small compared to the runtime savings in needed circuit-level simulation. During the experiments, the empirical convergence analysis on the results of the Monte Carlo runs is done using the nonoverlapping batch means method [18] . The number of samples and total runtimes are listed in Table 3 . As expected, compared to direct sampling, nested sampling reduces the number of samples by up to 1500X and runtime by up to 25X.
After injecting faults into the FF sets, we observed the evolution of status of di↵erent errors (propagated to primary outputs, masked, and still residing on FFs) in Figure 5 . Most errors are either masked or reach the primary outputs in a relatively small number of cycles. There is little change in the distribution of errors after about 20 cycles (for benchmark b17). This means that a small number of cycles may be sufficient to accurately compute the overall error.
To understand the e↵ect of the error injection threshold, we run experiments on one of the benchmarks (b17) and plot the runtime and the relative error (with respect to an evaluation with zero threshold) in Figure 6 . Increasing the error injection threshold increases the inaccuracy but reduces runtime as fewer errors are captured. This can be used to find the optimum setting for the threshold.
CONCLUSIONS
In this paper we describe a Monte Carlo simulation flow for SEU estimation in sequential circuits. We propose a nested sampling strategy that allows faster statistical convergence and introduce a check on the stationarity of machine states for further convergence efficiency. Experiments show that nested sampling reduces the number of samples by up to 1500X and runtime by up to 25X compared to direct sampling. Stationarity checking allows reducing sampling number by 25%, on average.
