Efficient Aging-aware Failure Probability Estimation Using Augmented Reliability and Subset Simulation by AWANO, Hiromitsu & SATO, Takashi
Title Efficient Aging-aware Failure Probability Estimation UsingAugmented Reliability and Subset Simulation
Author(s)AWANO, Hiromitsu; SATO, Takashi
Citation
IEICE Transactions on Fundamentals of Electronics,








IEICE TRANS. FUNDAMENTALS, VOL.E100–A, NO.12 DECEMBER 2017
2807
PAPER Special Section on VLSI Design and CAD Algorithms
Efficient Aging-Aware Failure Probability Estimation Using
Augmented Reliability and Subset Simulation
Hiromitsu AWANO†a) and Takashi SATO††,Members
SUMMARY A circuit-aging simulation that efficiently calculates tem-
poral change of rare circuit-failure probability is proposed. While conven-
tional methods required a long computational time due to the necessity of
conducting separate calculations of failure probability at each device age,
the proposed Monte Carlo based method requires to run only a single set
of simulation. By applying the augmented reliability and subset simulation
framework, the change of failure probability along the lifetime of the device
can be evaluated through the analysis of the Monte Carlo samples. Com-
bined with the two-step sample generation technique, the proposed method
reduces the computational time to about 1/6 of that of the conventional
method while maintaining a sufficient estimation accuracy.
key words: failure probability calculation, NBTI, Monte Carlo, subset
simulation, augmented reliability
1. Introduction
Aggressive scaling of a semiconductor manufacturing tech-
nology has enabled us to integrate billions of transistors into
a single silicon chip. The increasing manufacturing vari-
ability, on the other hand, requires us to consider various
uncertainty sources making the design of reliable circuits a
significantly challenging task. Representative examples can
be found in designs of an SRAM bit-cell or a D Flip-Flop
(DFF). Considering that a modern large-scale integration
(LSI) circuit embeds millions of such components, each cir-
cuit component needs to be highly resilient to the manufac-
turing variability to achieve satisfiable yield. For example,
the failure probability of an SRAM cell must be extremely
low—as low as 10−8 to 10−6, or below [1]. Simulation
methods that can efficiently estimate the rare circuit failure
probability are, therefore, becoming increasingly important
in circuit design.
The rare event simulation is known to be a difficult
task. A straightforward approach for realizing such failure
probability estimation is to rely on the naive Monte Carlo
(MC) method, in which random samples are drawn from
the statistical distribution corresponding to a process vari-
ability. Circuit simulations are then repeatedly conducted
with those samples. However, since the target failure event
is extremely rare, one may obtain very few failure samples
even after utilizing millions of samples. The small number
Manuscript received March 9, 2017.
Manuscript revised July 10, 2017.
†The author is with the VLSI Design and Education Center,
The University of Tokyo, Tokyo, 113-8656 Japan.
††The author is with Graduate School of Informatics, Kyoto
University, Kyoto-shi, 606-8501 Japan.
a) E-mail: awano@vdec.u-tokyo.ac.jp
DOI: 10.1587/transfun.E100.A.2807
of failure samples makes the failure probability estimation
inaccurate, or often makes it impossible when no failure
sample is available. To resolve this issue, various meth-
ods have been proposed. Representative examples include
well-known importance-sampling techniques [2], [3]. In
the importance sampling, the original distribution is “dis-
torted” so that more failure samples can be obtained, which
drastically increases the estimation efficiency. However, the
importance-sampling based methods still required extensive
computational efforts to identify the suitable shape of the
distorted distribution. The computational cost of the impor-
tance sampling tends to grow exponentially as the number
of transistors, i.e., the number of uncertain variables in the
circuit, increases. With the invention of subset simulation
(SubSim) [4], failure probability estimations of larger cir-
cuits, such as DFFs or an SRAM array, became possible
with a practical time limit [5].
Recently, failure probability estimation has become
even more complicated than before—device aging has to
be simultaneously dealt with. Transistors of extremely small
channel lengths and increasing operating frequencies have
accelerated device aging. As a result, device aging has
emerged as not only an academic research topic but also an
industrially important concern. The circuit failure probabil-
ity provides vital information for designers to determine the
critical parameters of the circuit, such as operational voltages
or clock periods, and hence the estimation of the temporal
change of the failure probability over the circuit-lifetime is
becoming particularly important.
In [6] and [7], authors proposed a failure probability
calculation that can take both initial process variability and
device degradation into account. The impact of NBTI on the
stability of an SRAM cell is analyzed in [8]. Those methods,
however, rely on simplified models to make the calculations
tractable. In the cases of [6] and [7], authors employed a
response surface model to approximate inherently nonlin-
ear relationships between the variability and corresponding
circuit metrics, such as noise margin. In [8], noise margin
of an SRAM cell is assumed to follow a normal distribu-
tion. However, approximation using the normal distribution
is accurate only about the mean of the target distribution.
The approximation accuracy is severely deteriorated at the
rightmost tail of the distribution, which is the most impor-
tant region in failure probability estimation. In order to
avoid such approximations and to obtain an accurate failure
probability estimation, an MC-based method is definitely re-
quired. An extension of the MC-based method that utilizes a
Copyright © 2017 The Institute of Electronics, Information and Communication Engineers
2808
IEICE TRANS. FUNDAMENTALS, VOL.E100–A, NO.12 DECEMBER 2017
Latin hyper-cube sampling is proposed in [9]. However, this
method only aims to efficiently capture the temporal change
of the outline of a performance distribution, so it is not suit-
able to apply for calculating rare circuit failure probability.
To the best of our knowledge, there is no efficient method
that gives the temporal change of circuit failure probability.
In this paper, we propose a method that simultaneously
but efficiently handles device manufacturing variation and
device aging to accurately calculate the temporal change of
the rare circuit failure probability. For this purpose, a novel
method that combines an augmented reliability approach and
SubSim was developed. Hereafter, we call the augmented
reliability extension of SubSim as SubSim-AR. SubSim-
AR is originally proposed in [10] in order to assess the
impact of structural design on its reliability. With SubSim-
AR, the temporal change in rare failure probability can be
estimated with only a single SubSim run. As compared to
the conventional assessments through multiple simulation
runs, SubSim-AR can reduce simulation time.
The contribution of our work is to apply and extend
SubSim-AR for realizing the aging-aware failure probability
calculation of the circuits. The key idea of our proposal
is to treat the operation time (circuit age) of an LSI as a
design parameter, enabling the analysis of failure probabili-
ties over the course of the operation time with only a single
SubSim run. However, since the device aging drastically
changes the circuit failure probability, sufficient accuracy
cannot be obtained by simply applying SubSim-AR for the
circuit analysis. With this reason, a two-stage algorithm has
been developed; in the first stage, the temporal change of the
failure probability is coarsely estimated to calibrate param-
eters for SubSim-AR, and in the second stage, accuracy is
improved with the calibrated parameters. The application of
this method is not limited to the aging-aware circuit analy-
sis. Wherever the multiple failure probability calculations
are conducted to evaluate circuit reliability, the proposed
method can be suitably applied for accelerating the simula-
tions.
The rest of this paper is constructed as follows. In
Sect. 2, background knowledge that forms the basis of our
method is reviewed. The proposedmethod and its implemen-
tation will be described in Sect. 3. Then, in Sect. 4, results
of numerical experiments are presented. Finally, conclusion
is drawn in Sect. 5.
2. Preliminaries
2.1 Device Aging
In this section, amodel for negative bias temperature instabil-
ity (NBTI) is briefly reviewed as an example aging mode of
a transistor. However, our approach is independent of aging
modes; other aging modes, such as positive bias tempera-
ture instability (PBTI), hot carrier injection (HCI), and time
dependent dielectric breakdown (TDDB), can be included
similarly to or simultaneously with the case of NBTI.
NBTI is a phenomenon that the threshold voltage (VTH)
of a pMOS transistor increases gradually when negative bias
is applied to its gate terminal. The degraded VTH recovers,
but often not perfectly, as soon as the negative bias is re-
moved. This makes the modeling and the prediction of the
impact of NBTI a very challenging task.
There are many models that are intended to predict the
VTH change due to NBTI [11]–[13]. Considering that our
final goal is to calculate how the failure probability changes
over a long period of operation time, we employ a long term
prediction model [13], in which the magnitude of VTH shift
(∆VTH) is expressed as
∆VTH = k · tn. (1)
Here, k is amodel parameter that reflects the stress condition,
operational temperature, and the chip fabrication process. t
is an age of the transistor. n is also a model parameter
which corresponds to the degradation speed. For scaled
transistors, variability in device degradation is frequently
observed [14]. In order to take this effect into consideration,
we treat n as a random variable [13], i.e., the value of n varies
from transistor to transistor. It is experimentally found that
the statistical distribution of n can be well approximated by
a log-normal distribution [15]. Hence, in this paper, we
employ a log-normal distribution to replicate the variability
in device degradation. The logarithm of n (nlog) is assumed
to follow a normal distribution:
nlog ∼ N (nlog |µn, σn). (2)
Here, N (x |µ, σ) is the probability density function (PDF)
of a normal distribution,









2.2 Failure Probability Calculation
In general, the calculation of a failure probability of a circuit
is to evaluate the following integral:
PF |φ = P(F |φ) =
∫
I (x;φ)P(x |φ)dx. (4)
Here, φ is an L-dimensional vector whose components cor-
respond to design parameters of the circuit, such as channel
length or width. x is the D-dimensional random variable
that represents process variability. The joint PDF of x is
given by px (x). Without loss of generality, we assume that
the random variables xd , which are the elements of x, are
mutually independent because any sets of random variables
can be converted into uncorrelated random variables using a
transformation called “whitening.” Therefore, px (x) can be





where pdx (·) is a one-dimensional PDF for xd . I (x;φ) is
AWANO and SATO: EFFICIENT AGING-AWARE FAILURE PROBABILITY ESTIMATION
2809
an indicator function that returns “1” when a realization of
process variability x causes the malfunction of the circuit
whose design parameters are given by φ, and returns “0”
when the circuit functions correctly.
The failure event can be defined for the circuit to exceed
its performance specification, such as a noise margin or a
signal propagation delay, over the pre-defined threshold θ.
Hence, it is natural to define the indicator function as follows:
I (x;φ) =
1 (y (x;φ) ≥ θ)0 (otherwise) . (6)
Here, y (x;φ) is a function which returns the performance
value of the circuit. In most cases, simulators like SPICE
serve as the function y .
2.3 Subset Simulation
Subset simulation (SubSim) is first proposed to efficiently
calculate extremely small failure probability [4]. Note again
that the naive MC method is unsuitable for estimating a
small failure probability. Hence, SubSim decomposes the
extremely small failure probability into a product of inter-
mediate failure probabilities, each of which is large enough
to be estimated using the naive MC method. In other words,
SubSim converts a rare event simulation into a series of
non-rare event simulations.
Let us consider a sequence of nested failure events {F=
FK ⊂ FK−1 ⊂ · · · ⊂ F1}. F is the target failure event and
Fi (i=K−1,· · ·, 2) is the intermediate failure event. Let Ωk
be the failure region which corresponds to the failure event
Fk :
Ωk = {x; y (x) > θk } (k = 1, 2, · · · , K ), (7)
where θk is an element of an increasing sequence θ =
{θ1, θ2, · · · , θK }, which satisfies θ1 < θ2 < · · · < θK = θ.
Then, the failure probability P(F) can be rewritten as
PF = P(F) = P(F1)
K−1∏
k=1




where P1 = P(F1) and Pk = P(Fk |Fk−1) for k = 2, 3,· · ·, K .
By appropriately selecting the intermediate failure events Fk ,
the conditional failure probability P(Fk |Fk−1) can be made
sufficiently large, so that it can be calculated using a naive
MC method.
Through an example, we here describe how SubSim
works. For the sake of simplicity, variable x is assumed to
be two-dimensional, x= (x1, x2).
We first draw N random samples from px (·) to
obtain {x (1,l); l = 1, 2,· · ·, N }. The performance values
{y (x (1,l)); l = 1, 2,· · ·, N } are calculated using a circuit
simulator such as SPICE. Then, the T-th largest value in
{y (x (1,l)); l = 1, 2,· · ·, N } is selected and set to be θ1. We
then identify a subset of samples that satisfies y (x (1,l))>θ1
and relabel them as {x (1,t)F ; t=1, 2,· · ·,T }. This is an example
of how the intermediate failure event F1 is selected. Obvi-
ously, P1 can be calculated as P1=T/N , since θ1 is set to the
T-th largest value and the size of the subset is T .
Our next objective is to calculate the conditional fail-
ure probability P(F2 |F1). To generate random samples
from the conditional distribution, Markov chainMonte Carlo
(MCMC) is used. A typical MCMC is based on a random
walk, i.e., it forms a chain of random samples, which starts
from a given seed sample. A proposal distribution and the
rejection method are used to construct the chain of random
samples. Samples drawn from the proposal distribution de-
cide the move of the candidate. The rejection method then
decides whether to accept the move or not, using another
random sample. For the simulation of the conditional dis-
tribution, T independent chains, each of which starts from
{x (1,t)F ; t=1, 2,· · ·,T } are formed.
In the traditional MCMCs, selection of a proposal dis-
tribution is another challenging task particularly in high-
dimensional cases. To reduce the correlation among random
samples, the variance of the proposal distribution should be
large, which in turn leads to a low acceptance rate of the can-
didate walk. To solve this problem, the modified Metropolis
(MM) [16] is applied.
Let us describe how MM algorithm generates the con-
ditional samples. As mentioned above, the Markov chains
start from x (1,t)F . Therefore, the heads of chains, x
(2,t,1) , are
initialized as x (1,t)F for t = 1, 2,· · ·,T . To obtain the second
sample x (2,1,2) = [x (2,1,2)1 , x
(2,1,2)
2 ], we first draw a random
sample xnew1 as
xnew1 ∼ qx (xnew1 |x (2,1,1)1 ). (9)
qx (·|·) is a one-dimensional proposal distribution that
satisfies qx (xnew |xold) = qx (xold |xnew). Normal distri-
bution is usually selected as the proposal distribution:
qx (xnew1 |x (2,1,1)1 ) = N (xnew1 |x (2,1,1)1 , σ2x ). The choice of σ
affects the performance of MM algorithm. Practically, σ is
set to the same value as the standard deviation of the manu-
facturing variability. We then calculate







where p1x (·) denotes the original PDF of x1 for the man-
ufacturing variability. A random sample u (0 ≤ u ≤ 1) is
drawn from the uniform distribution and the candidate sam-
ple x̂ (2,1,2)1 is replaced with the new sample or with the same
sample as the previous one:
x̂ (2,1,2)1 =

xnew1 (u ≤ min(1, r))
x (2,1,1)1 (u > min(1, r))
. (11)
A two-dimensional candidate sample x̂ (2,1,2) is obtained by
calculating the other component, x̂ (2,1,2)2 in a similar way.
Then, a circuit simulation for the candidate sample is per-
formed to calculate the corresponding performance value
ŷ (2,1,2)= y ( x̂ (2,1,2)). Finally, x (2,1,2) is set to
2810
IEICE TRANS. FUNDAMENTALS, VOL.E100–A, NO.12 DECEMBER 2017
Fig. 1 A two-dimensional example to illustrate SubSim operation. (a)
Naive Monte Carlo simulation to calculate P1 in the first step of SubSim.
The intermediate failure event Ω1 is adaptively chosen. (b) Starting from
the samples in Ω1, chains of random samples are generated in the second
step, and next intermediate failure event Ω2 is again selected.
x (2,1,2) =
 x̂
(2,1,2) ( ŷ (2,1,2) ≥ θ1)
x (2,1,1) (otherwise)
. (12)
By repeating the above procedure, a set of conditional ran-
dom samples {x (2,1,l); l = 1, 2,· · ·, L} are generated. Here, L
denotes the length of the Markov chain and L = N/T . The
other Markov chains starting from {x (2,t,1); t=2, 3,· · ·,T } are
generated and simulated in the same way. Finally, we obtain
T ·L random samples because there are T chains of length L.
These procedures are illustrated in Fig. 1.
According to [4], if the Markov chain starts from an
initial sample which follows the conditional distribution
P(x |x ∈ Ω1), then the generated samples also follow the
same distribution. In the above procedure, the initial sam-
ples can be considered to follow the conditional distribution
P(x |x ∈Ω1) since the initial samples are selected from ran-
dom samples with a distribution px (·) and that are in the first
level failure regionΩ1. Therefore, the newly generated sam-
ples also follow P(x |x ∈Ω1). We again identify T-th largest
performance value in {y (x (2,t,l)); t=1, 2,· · ·,T ; l=1, 2,· · ·, L}
to set θ2. The conditional failure probability is again calcu-
lated as P2 = P(F2 |F1) = T/N . The samples that belong
to the next failure region {Ω2 | y (x) ≥ θ2} are identified and
MCMC is again conducted to fill the rest of N−T samples
distributed in Ω2.
Repeating the above procedure until the threshold θk














Here, K is the number of iterations at which θK exceeds θ and
T ′ is the number of random samples that satisfy y (x (K,t,l))>
θ in the last SubSim iteration.
The selection of T is an integral part of SubSim to
achieve a good sampling efficiency. According to [4], it is
suggested to use 0.1, so in this paper, N and T are chosen
so that the probabilities of the intermediate failure events are
maintained about 0.1.
2.4 Augmented Reliability Problem
In order to calculate the failure probability given the de-
sign parameters, we first introduce an augmented reliability
problemwhere the design parameters φ are deliberately con-
sidered as stochastic variables whose PDF is given by pφ (·).
In the rest of this manuscript, we call pφ (·) as “prior distribu-
tion.” The role of pφ (·) here is not to introduce uncertainty
but to facilitate stochastic handling of φ in the failure prob-
ability calculation framework.
In the augmented reliability problem, the failure proba-
bility P(F |φ) is considered as a conditional probability dis-
tribution given by a specific design variable φ. By applying
Bayes’ theorem, P(F |φ) can be rewritten as follows:
P(F |φ) = P(φ |F)
P(φ)
P(F). (14)
Studying (14), we notice that P(F |φ) can be calculated when
we know P(F) and conditional probability P(φ |F).
The calculation of P(F) is the same as that described
in Section 2.3 except that the pair of the process variability
and the design variables [x,φ] are now considered to be the
uncertain variables. The remaining problem is the estimation
of the conditional probability P(φ |F). Let us take a look at
the failure samples which are obtained during the SubSim
run. Then, we notice the failure samples distribute according
to P(x,φ |F) and their φ components distribute according to
P(φ |F), whichmeans the samples that are marginalized over





According to this observation, P(φ |F) can be calculated
by non-parametric density estimators, such as the kernel
density estimator [17]. Hence, both P(F) and P(φ |F) can
be obtained with a single SubSim run and using Eq. (14)
yields P(F |φ).
As noted earlier, the calculation of P(F |φ) involves
the estimation of the PDF of P(φ |F) from the conditional
samples, which is known to be non-parametric density esti-
mation. In this paper, the PDF is estimated using a histogram
so that Eq. (14) simply becomes:
P(F |φ ∈ Im) = P(φ ∈ Im |F)P(φ ∈ Im) P(F). (16)
Here, P(φ ∈ Im) indicates the interval probability on a
hyper-cube Im (m=1, 2,· · ·,M) andM represents the number
of bins in the histogram.
3. Proposed Method
In this section, we apply SubSim-AR to estimate the tem-
poral change of the circuit failure probability due to device
AWANO and SATO: EFFICIENT AGING-AWARE FAILURE PROBABILITY ESTIMATION
2811
Fig. 2 Failure probability of a 1-D example calculated with SubSim-AR (markers) and the analytical
method (line). The bar graph shows the histogram of φex component of the failed samples.
aging. The key idea of the proposed method is to treat the
device age as the design parameter φ. With the proposed ex-
tension, failure probability as a function of device age can be
estimated through a single SubSim run. This improves com-
putational efficiency against the existing method, in which
SubSim has to be repeated many times while gradually al-
tering the device age.
3.1 Failure Probability Calculation by Example
In this subsection, the proposed failure probability calcula-
tion is explained using a simple example.
We consider randomvariable r which follows a standard
normal distribution whose PDF is given by (3), where µ=0
and σ = 1, and let us calculate the probability that φex · r
exceeds a pre-defined threshold levelCex. Here, φex is a one-
dimensional design parameter. In other words, the failure
event is defined as Fex = {(r, φex); φex ·r >Cex}. Because the
random variable r is a scalar, we can analytically calculate
the conditional failure probability:
P (Fex |φex) =
∫ ∞
Cex/φex






where Φex(x) is the cumulative distribution function (CDF)






















where !! denotes the double factorial.
Now, let us calculate P(Fex |φex) using SubSim-AR to
comparewith that of the analyticalmethod based on Eq. (17).
Figure 2(a) shows the comparison of the two methods. Here,
the target thresholdCex is set to 5. Note again that we have to
choose the PDF of the design parameter φex before applying
SubSim-AR. In this particular example, the uniform distri-
bution on the interval [1.5, 3.0] is used. The bar graph shows
the histogram of φex component of the failed samples, which
corresponds to P(φ ∈ Im |F) in Eq. (16). Multiplying the
augmented failure probability P(F) by the conditional failure
probability P(φ ∈ Im |F), we obtain the failure probability
as a function of the design parameter, which is shown as the
marker with error bar indicating 95% confidence interval.
The red line shows the failure probability that is analytically
calculated using Eq. (17). Since the confidence interval only
measures an absolute error, it is not suitable for measuring
an estimation accuracy of a rare failure probability. Hence,
according to [5], we utilize a normalized confidence interval
as a quantitative measure of the yield estimation. Here, the
normalized confidence interval is defined as the confidence
interval divided by the estimated probability. Fig. 2(c) shows
the estimation error using a normalized confidence interval.
We notice that the histogram well approximates the change
of the failure probability as a function of the design param-
eter and that SubSim-AR gives very good estimation of the
failure probability.
From this example, we recognize the advantage of
SubSim-AR over the conventional method that separately
handles manufacturing variability and the device aging. In
the proposed analysis, the augmented failure probability
P(F) is a common factor, i.e., it remains constant for differ-
ent sets of design variables φ. In SubSim-AR, the random
samples are allowed to explore the “augmented” uncertain
space, consisting of the pair of the process variability and the
design variables. This scheme enables all random samples
contribute to the estimation of P(F), increasing the calcula-
tion efficiency.
3.2 Adjustment of Prior Distribution for Design Parameter
The selection of statistical distribution for design parameter
P(φ), is also an integral part to achieve higher estimation ac-
curacy. Studying Fig. 2(a), the confidence intervals become
wider as the design parameter φex becomes small, which is
attributed to the uniform prior given to the design parameter
φex. Again, SubSim-AR estimates P(F |φ ∈ Im) by making
a histogram of the φ component of the failed samples. If
the number of failure samples in an interval Im is small,
the estimation accuracy will be drastically deteriorated. In
order to overcome this issue, we propose to adjust the prior
distribution so that the failure samples distribute evenly in
the augmented variability space.
2812
IEICE TRANS. FUNDAMENTALS, VOL.E100–A, NO.12 DECEMBER 2017
Before describing the detailed procedure of prior dis-
tribution adjustment, let us first demonstrate the impact of
prior distribution on the estimation accuracy using the above
example. Figure 2(b) shows the estimations using the cal-
ibrated prior distribution. In this example, only the prior
distribution for φ is adjusted and the other parameters, such
as the number of random samples, remain unchanged. The
estimation accuracy in Fig. 2(c) shows drastic improvement
by the prior distribution adjustment.
Obviously, the optimal choice for the prior distribution
is Popt(φ) ∝ 1/P(F |φ). However, sampling from Popt(φ) is
infeasible because it requires calculation of the conditional
failure probability, P(F |φ). To resolve this chicken-and-
egg problem, we propose a two-stage algorithm: in the first
stage, the temporal change of failure probability is estimated
coarsely to calibrate the prior distribution, and in the second
stage, failure probability is accurately estimated using the
calibrated distribution. The proposed two stage algorithm is
summarized as follows.
(1) First stage: In the first stage, a rough sketch of the
failure probability as a function of the device age is estimated
by using SubSim-AR algorithm. Note again that the first
stage is required only to calibrate the prior distribution and
hence only the small number of circuit simulations is con-
ducted to avoid performance overhead. SubSim-AR starts
by using a uniform prior distribution. Let P1st(F |φ ∈ Im) be
the estimated failure probability as a function of the chip age
in the first stage. Then, the cumulative distribution function
(CDF) of Popt(φ) is approximated as:




P1st(F |φ ∈ Ii) , (20)
whereC is a normalizing constant so thatΦM = 1 is satisfied.
Since a continuous rather than discrete CDF is preferable for




ep − 1 (φmax − φmin) + φmin, (21)
where p is a fitting coefficient, which is calibrated so that
Eq. (21) best fits the pointwise estimation of Eq. (20). φmin
and φmax are the lower and the upper limit of the design
parameter, i.e., φ ∈ [φmin, φmax].
Second stage: In the second stage, SubSim-AR is con-
ducted to accurately estimate the failure probability. Here,
φ is sampled according to the CDF of Eq. (21) obtained
in the first step. In this sample generation, any methods
of random number generation can be used, such as inverse
transform sampling [18], etc. Finally, the failure probability
as a function of device age is estimated using Eq. (16).
3.3 Proposed Algorithm
Detailed implementation of our proposed SubSim-AR is
given in Algorithms 1 and 2. In the following, the pro-
cess variability x and the design variable (circuit age) φ are
Algorithm 1 Proposed two-stage algorithm.
1: Obtain failure samples using SubSim-AR. Here, the uniform prior dis-
tribution is given to the design variable (φ), i.e. pφ ( ·) is assumed to
be uniform.
2: Construct CDF for the prior distribution by using Eq. (20) and Eq. (21).
3: Conduct SubSim-AR again to estimate failure probability as a function
of device age. In this step, the φ is sampled according to CDF of
Eq. (21) with model parameters calibrated.
assumed to follow px (·) and pφ (·), respectively. The failure
occurs when the performance value exceeds the threshold, θ.
Note that, in the proposed algorithm, φ is directly sampled
from the prior distribution to reduce the correlation among
samples.
3.4 Calculation of Confidence Interval
The confidence interval of the estimated failure probability
can be derived as below [5]. According to the central limit
theorem, P1 approximately follows the following normal dis-
tribution:
P1 ∼ N (P1 |µ1, ν1). (22)
Here, µ1 =T/L and ν1 is the sample variance which can be
approximated as
ν1 ≈ 1/L · µ1(1 − µ1). (23)
Deriving the estimator of Pk for k ≥ 2 is a non-trivial
task because MCMC generates correlated samples. We first









Here, IFk (x, φ) is an indicator function that returns “1” when
y (x, φ) ≥ θk . Otherwise, it returns “0.” It is shown that Pk
approximately follows the following normal distribution [5]:
Pk ∼ N (Pk |µk, νk ). (25)







s(k,t), νk ≈ 1T (T − 1) ·
T∑
t=1
(s(k,t) − µk ).
(26)
The target failure probability PF can be obtained as the
product of Pk . To convert the product into the summation,
we take a logarithm of Pk . It can be shown that log(Pk )
approximately follows the following normal distribution [5]:
log(Pk ) ∼ N [log(Pk ) | log(µk ), νlog,k], (27)
where νlog,k is given by
νlog,k = νk/(µk )2. (28)
AWANO and SATO: EFFICIENT AGING-AWARE FAILURE PROBABILITY ESTIMATION
2813
Algorithm 2 SubSim-AR algorithm.
1: Generate N random samples from px ( ·) and pφ ( ·) to obtain
{(x (1, l), φ(1, l) ); l=1, 2, · · ·, N }.
2: Calculate corresponding performance value {y(x (1, l), φ(1, l) ); l =
1, 2, · · ·, N }.
3: Set θ1 to T -th largest performance value and identify the random
samples which satisfy y(x (1, l), φ(1, l) ) ≥ θ1. Label these samples
as {(x (1, t )F , φ(1, t )F ); t=1, 2, · · ·, T }.
4: Calculate P1 as T/N .
5: Set k=2 and L=N/T .
6: Set (x (k, t,1), φ(k, t,1) )= (x (k−1, t )F , φ
(k−1, t )
F ); t=1, 2, · · ·, T .
7: while θk−1<θ do
8: for t=1, 2, · · ·, T do
9: for l=2, 3, · · ·, L do
10: for d=1, 2, · · ·, D do
11: Generate random variable xnew
d
from the proposal distri-
bution qx (xnewd |x (k, t, l−1)d ).






13: Generate a random value u which uniformly distribute on
[0, 1] and set x̂ (k, t, l)
d
as






u ≤ min(1, r )




15: Generate φ̂(k, t, l) from pφ ( ·).
16: Calculate ŷ(k, t, l) = y(x̂ (k, t, l), φ̂(k, t, l) ) using a circuit sim-
ulator.
17: Set (x (k, t, l), φ(k, t, l) ) as
(x (k, t, l), φ(k, t, l) )
=
{
(x̂ (k, t, l), φ̂(k, t, l) ) ŷ(k, t, l) ≥ θk−1
(x (k, t, l−1), φ(k, t, l−1) ) otherwise .
18: end for
19: end for
20: Set θ̂k as the T -th largest value among {y(k, t, l) ; t =
1, 2, · · ·, T and l=1, 2, · · ·, L }.
21: if θ̂k ≥ θ then
22: Set K =k and θk =θ.
23: Identify samples that satisfy y(x (K, t, l), φ(K, t, l) ) ≥ θ and label
them as {(x (K, t )F , φ(K, t )F ); t=1, 2, · · ·, T ′ }.
24: Using the same algorithm provided above, generate another N ′=
N/T ′ to populate conditional samples for accurate estimation of
the conditional probability P(φ ∈ Im |F ).
25: Calculate PK by PK =T ′/N .
26: Break the while loop.
27: else
28: Set θk as the T -th largest performance value.
29: Identify samples that satisfy y(x (k, t, l), φ(k, t, l) ) ≥ θk and label
them as {(x (k, t )F , φ(k, t )F ); t = 1, 2, · · · , T }.
30: Calculate Pk by Pk =T/N .
31: Set k=k + 1.
32: end if
33: end while
34: Calculate PF =
∏K
k=1 Pk and the histogram of φ component of the
failure samples (x (K, t )F , φ
(K, t )
F ) to obtain P(φ ∈ Im |F ).
35: P(F |φ ∈ Im ) is calculated using (16).
Then, it is shown that log(PF ) approximately follows the
following normal distribution:
log(PF ) ∼ N [log(PF ) | log(µF ), νlog,F ], (29)





and νlog,F is the variance of log(PF ) whose upper bound can










νlog,k · νlog,k+1. (31)
We then derive the confidence interval of P(φ∈ Im |F)=
Pm








where Im(φ) returns “1” when φ ∈ Im and “0” when φ <
Im. Note that the φ component of the failed samples is
used here. Just like the estimation of Pk , it can be shown
that the logarithm of the conditional probability log(Pm
φ |F )
approximately follows normal distribution:
log(Pmφ |F ) ∼ N [log(Pmφ |F ) | log(µmφ |F ), νmlog,φ |F ]. (33)
Here, µm
φ |F and ν
m














(s(t,m)cond − µmφ |F ). (35)
By combining the logarithm of Eq. (16), and Eqs. (29)
and (33), we finally obtain the approximation of the PDF of
log P(F |φ∈ Im)= log PmF |φ as follows:
log(PmF |φ) ∼ N [log(PmF |φ) | log(µmF |φ), νmlog,F |φ], (36)
where log(µm
F |φ) and ν
m
log,F |φ are given by
log(µmF |φ) = log(µF ) + log(µ
m
φ |F ) − log Pmφ , (37)
and
νmlog,F |φ ≈ νlog,F + νmlog,φ |F + 2
√
νlog,F · νmlog,φ |F . (38)
Here, log Pmφ is the logarithm of the probability that the de-
sign variable falls in the region Im, which can be analytically
calculated because φ is an artificial random variable having
a known (given) distribution. In this implementation, we se-
lected a uniform distribution for pφ (·). Therefore, log(Pmφ )
can be obtained as log(1/M) = − log(M), where M is the
number of bins for the histogram. Finally, we can derive the
confidence interval using the above equations. For example,
the 95% confidence interval can be represented as{
exp
[













IEICE TRANS. FUNDAMENTALS, VOL.E100–A, NO.12 DECEMBER 2017
4. Numerical Experiment
4.1 Experimental Setup
In this section, the proposed method is validated through the
numerical experiments of calculating the clock-to-q delay
degradation of a DFF. The schematic of the DFF is shown
in Fig. 3. The failure is defined as an event that clock-to-
q delay to exceed a given threshold. In order to simulate
the process variation, we assume variabilities in the initial
VTH, gate length, and gate width. For a pMOS transistor,
we additionally consider the variability in NBTI-induced
degradation. In total, 84 independent random variables are
used in this experiment.
The initial VTH variation of each transistor is assumed
to follow a normal distribution:
∆V INITTH ∼ N (0, AVT /
√
L ·W ). (40)
Here, AVT is the Pelgrom coefficient with an assumed value
of 4 × 10−9 V/m2. L and W are the channel length and
width, respectively. The variations of gate length and width
are assumed to follow normal distributions whose standard
deviations are 1 nm and 5 nm, respectively.
As the aging mode, we consider the NBTI-induced VTH
shift (∆VNBTITH ) according to (1). Model parameters, such
as k, and the mean and standard deviation of the log-normal
distribution in (2), are adjusted so that the model well reflects
degradations and the variability observed on a commercial
65 nm process. Note again that in the first SubSim run, a
Fig. 3 Circuit schematic of a DFF.
Fig. 4 Calculated failure probability (a) and its accuracy (b).
uniform distribution is given to φ. Then, in the second
run, the distribution is modified so that the failure samples
uniformly distribute over the design variable space.
4.2 Result
We calculated the change of the failure probability as a func-
tion of device age. The red markers in Fig. 4(a) show the
results of the proposed method based on the augmented reli-
ability approach, SubSim-AR. The x- and y-axis correspond
to the device age and failure probability, respectively. The
error bar shows the 95% confidence interval. The required
calculation time is also shown inside the bracket of labels.
In our experiment, all numerical calculations are conducted
on a Linux PC of Intel Xeon E7-4870. As a comparison, the
failure probability shift of the same circuit is estimated by
repeating SubSim at each aging time step from 1.0 to 10.0
years at the interval of 1 year, whose result is shown using the
black marker. Hereafter, we call this conventional approach
which requires multiple SubSim runs as SubSim-Multi.
The proposed method calculates the failure probability
in the intervals. For example, the leftmost red marker shows
the averaged failure probability over the first 1 year. On the
other hand, SubSim-Multi calculates the failure probability
at the exact time point, i.e., the leftmost black marker shows
the failure probability of the 1-year-old circuit. The selec-
tion of the time interval width is practically important since
it affects both on the time resolution of the estimated yield
and on the estimation accuracy: reducing the width of the
time interval improves the time resolution at the cost of the
reduced estimation accuracy, when the total number of the
samples is unchanged. The interval width is adjusted so as to
satisfy a required estimation accuracy while avoiding a large
change in the estimated yield among different time steps. In
order to predict whether the selected interval width satisfies
the above requirements, the outcome of the first stage is again
utilized. In our particular experiment, an interval of one year
should be sufficient to see the degradation of the failure prob-
ability. Studying Fig. 4(a), we notice that the probabilities
obtained by SubSim-Multi do not just suffer from large vari-
ance, but exhibit non-monotonic changes along the device
age, which also demonstrates the advantage of SubSim-AR
over SubSim-Multi.
Figure 4(b) shows the relative error defined as the ratio
AWANO and SATO: EFFICIENT AGING-AWARE FAILURE PROBABILITY ESTIMATION
2815
of the 95% confidence interval to the estimated failure prob-
ability. The red and the black markers again show the results
of SubSim-AR and SubSim-Multi, respectively. Studying
Fig. 4(b), the estimation accuracy of SubSim-AR is almost
constant, owing to the proposed two-stage algorithm. Judg-
ing from error bars, the estimation accuracy of the proposed
method is almost equal to or slightly higher than that of
SubSim-Multi. Considering the computation time differ-
ence, the proposed method achieved 6.38× speed up com-
pared to the conventional method.
5. Conclusion
In this paper, an efficient method to evaluate the impact of
device aging on the failure probability is proposed. The aug-
mented reliability approach, which was originally proposed
to understand the impact of design parameters in structural
design area, was extended to analyze failure probability of
the circuits due to device aging. By considering the age of
the circuit as a design parameter, an efficient calculation of
the degradation of failure probability has become possible.
In order to enhance accuracy for the ages of analysis, a two-
stage algorithm was proposed to adjust the parameters for
prior distribution. The simulation experiment showed that
the proposedmethod (SubSim-AR) achieved 6.38× speed up
over the conventional method with almost equal accuracy.
Acknowledgments
Thisworkwas partially supported by aGrant-in-Aid for JSPS
Fellows and MEXT/JSPS KAKENHI Grant No. 26280014
and 17H01713. The authors also acknowledge support from
VDEC with the collaboration with Synopsys Corporation.
References
[1] A. Bhavnagarwala, X. Tang, and J. Meindl, “The impact of intrinsic
device fluctuations on CMOS SRAM cell stability,” IEEE J. Technol.
Computer Aided Des., vol.36, no.4, pp.658–665, 2001.
[2] R. Kanj, R. Joshi, and S. Nassif, “Mixture importance sampling and
its application to the analysis of SRAM designs in the presence of
rare failure events,” Design Autom. Conf., pp.69–72, 2006.
[3] K.Katayama, S.Hagiwara, H. Tsutsui, H.Ochi, andT. Sato, “Sequen-
tial importance sampling for low-probability and high-dimensional
SRAM yield analysis,” Int. Conf. Comput. Aided Design, pp.703–
708, 2010.
[4] S.K. Au and J.L. Beck, “Estimation of small failure probabilities in
high dimensions by subset simulation,” Probabilistic Eng. Mechan-
ics, vol.16, no.4, pp.263–277, 2001.
[5] S. Sun and X. Li, “Fast statistical analysis of rare circuit failure events
via subset simulation in high-dimensional variation space,” Int. Conf.
Comput. Aided Design, pp.324–331, 2014.
[6] E. Maricau and G. Gielen, “Efficient variability-aware NBTI and hot
carrier circuit reliability analysis,” IEEE J. Technol. Computer Aided
Des., vol.29, no.12, pp.1884–1893, 2010.
[7] E. Maricau and G. Gielen, “Stochastic circuit reliability analysis,”
Design Automation and Test in Europe, pp.1–6, 2011.
[8] K. Kang, H. Kufluoglu, K. Roy, and M. Alam, “Impact of negative-
bias temperature instability in nanoscale SRAM array: Modeling
and analysis,” IEEE J. Technol. Computer Aided Des., vol.26, no.10,
pp.1770–1781, 2007.
[9] Y.L. Chen, W. Wu, C.N. Liu, and L. He, “Incremental Latin hyper-
cube sampling for lifetime stochastic behavioral modeling of ana-
log circuits,” Asia South Pasific Design Autom. Conf., pp.556–561,
2015.
[10] S.K. Au, “Reliability-based design sensitivity by efficient simula-
tion,” Computers Structures, vol.83, no.14, pp.1048–1061, 2005.
[11] T. Grasser, W. Gos, V. Sverdlov, and B. Kaczer, “The universality of
NBTI relaxation and its implications for modeling and characteriza-
tion,” Int. Rel. Phys. Symp., pp.268–280, April 2007.
[12] B. Kaczer, V. Arkhipov, R. Degraeve, N. Collaert, G. Groeseneken,
and M. Goodwin, “Disorder-controlled-kinetics model for negative
bias temperature instability and its experimental verification,” Int.
Rel. Phys. Symp., pp.381–387, April 2005.
[13] K. Sutaria, J. Velamala, C. Kim, T. Sato, and Y. Cao, “Aging statistics
based on trapping/detrapping: Compact modeling and silicon vali-
dation,” IEEE Trans. Device Mater. Rel., vol.14, no.2, pp.607–615,
2014.
[14] H.Awano, S.Morita, andT. Sato, “Scalable device array for statistical
characterization of BTI-related parameters,” IEEE Trans. Very Lagre
Scale Integr. (VLSI) Syst., vol.25, no.4, pp.1455–1466, April 2017.
[15] H. Awano, M. Hiromoto, and T. Sato, “Variability in device degrada-
tions: Statistical observation ofNBTI for 3996 transistors,” European
Solid State Device Research Conf., pp.218–221, 2014.
[16] S.K. Au and J.L. Beck, “Subset simulation and its application to
seismic risk based on dynamic analysis,” J. Eng. Mechanics, vol.129,
no.8, pp.901–917, 2003.
[17] B.W. Silverman, Density Estimation for Statistics and Data Analysis,
CRC press.
[18] C.M. Bishop, Pattern Recognition and Machine Learning, Springer,
2006.
Hiromitsu Awano received his B.E. de-
gree in Informatics and M.Sc. and Ph.D. degrees
in Communications and Computer Engineering
from Kyoto University in 2010, 2012, and 2016,
respectively. He was with Hitachi, Ltd., Tokyo,
Japan in 2016. In 2017, he joined the VLSI
Design and Education Center, The University of
Tokyo, Japan, where he is an assistant professor.
His research interests include CAD for VLSI de-
sign and hardware accelerator for machine learn-
ing. He was a research fellow of japan society
for the promotion of science and a member of IPSJ.
Takashi Sato received B.E. and M.E. de-
grees fromWasedaUniversity, Tokyo, Japan, and
a Ph.D. degree from Kyoto University, Kyoto,
Japan. He was with Hitachi, Ltd., Tokyo, Japan,
from 1991 to 2003, with Renesas Technology
Corp., Tokyo, Japan, from 2003 to 2006, and
with the Tokyo Institute of Technology, Yoko-
hama, Japan. In 2009, he joined the Graduate
School of Informatics, Kyoto University, Kyoto,
Japan, where he is currently a professor. He was
a visiting industrial fellow at the University of
California, Berkeley, from 1998 to 1999. His research interests include
CAD for nanometer-scale LSI design, fabrication-aware design methodol-
ogy, and performance optimization for variation tolerance. Dr. Sato is a
member of IEICE. He received the Beatrice Winner Award at ISSCC 2000.
