Abstract-Circuit reliability is affected by various fabrication-time and run-time effects. Fabrication-induced process variation has significant impact on circuit performance and reliability. Various aging effects, such as negative bias temperature instability, cause continuous performance and reliability degradation during circuit run-time usage. In this work, we present a statistical analysis framework that characterizes the lifetime reliability of nanometer-scale integrated circuits by jointly considering the impact of fabrication-induced process variation and run-time aging effects. More specifically, our work focuses on characterizing circuit threshold voltage lifetime variation and its impact on circuit timing due to process variation and the negative bias temperature instability effect, a primary aging effect in nanometer-scale integrated circuits. The proposed work is capable of characterizing the overall circuit lifetime reliability, as well as efficiently quantifying the vulnerabilities of individual circuit elements. This analysis framework has been carefully validated and integrated into an iterative design flow for circuit lifetime reliability analysis and optimization.
I. INTRODUCTION
Aggressive scaling of CMOS process technology poses serious challenges to the lifetime reliability of integrated circuits (ICs). IC lifetime reliability is affected by fabrication-induced process variation and run-time aging effects [3]. Feature size reduction increases the difficulty of precise fabrication process control. Fabrication-induced geometric and electrical parameter variations, e.g., changes in device effective channel length and threshold voltage, have a significant impact on IC performance and reliability. Meanwhile, run-time aging effects, such as electromigration, thermal cycling, and negative bias temperature instability (NBTI), have become another fast-growing concern for IC lifetime reliability. NBTI is known to be the dominant circuit lifetime aging effect [12], [7]. NBTI arises from the generation of traps at the Si-SiO2 interface when PMOS devices are negatively stressed, e.g., Vgs = −Vdd. This effect causes the PMOS threshold voltage (Vth) to increase over time, leading to long-term performance degradation.
Process variation [16], [15] and the NBTI effect [13], [6], [14], [21], [26], [20] have both drawn significant attention in the recent past. Most of the past work treats them as two independent issues, and addresses the impact of each effect on IC reliability and performance separately. However, IC reliability is jointly affected by both effects. In addition, process variation and the NBTI effect strongly influence each other. As reported by Bhardwaj et al., the NBTI-induced threshold voltage shift of a PMOS transistor depends not only on its working condition (such as input duty cycle and temperature), but also on the underlying process parameters, such as the original threshold voltage and oxide thickness [6]. Due to fabrication-induced process variations, the NBTI effect must therefore be modeled as a random process. Conversely, the circuit timing statistics are affected by both the NBTI effect and process variation as time evolves.
* Corresponding author. E-mail: xzeng@fudan.edu.cn
Recently, work that considers both effects has begun to appear. In [11], the NBTI effect (ΔVth) is modeled as a random process. This work, however, ignores the variation of other process parameters. In [4], the authors consider both process variation and the NBTI effect for standard cell modeling and optimization. Process parameters are treated as random variables and modeled with the response surface method. In that work, however, the NBTI effect is incorporated through a worst-case delay model. Since the NBTI effect is a strong function of the circuit run-time condition, the worst-case approximation is pessimistic (up to a 30× increase of ΔVth versus the nominal has been reported in [4]). In [22], the authors present a thorough analysis of circuit aging under the variation of threshold voltage. The main focus of that work is single-path circuit modeling. A comprehensive analysis of IC performance and reliability still requires statistical techniques that address correlated paths and other process parameter variations. We believe that work opens an important direction, and our study expands and builds upon this fundamental work.
In this article, we present an analysis framework to evaluate the IC lifetime reliability by jointly considering process variation and NBTI effect. This work makes the following contributions:
• We present a nonlinear scalable statistical gate delay aging model, which considers both run-time gate working condition and fabrication-induced process variation.
• We present a statistical timing analysis framework using the proposed gate delay model, which is capable of characterizing the performance and reliability degradation under process variation and run-time aging. A fast pruning algorithm is proposed to improve the analysis efficiency.
• We present a criticality and sensitivity analysis method to quantify the reliability impact of each individual circuit element. Such quantification enables an efficient iterative IC reliability optimization flow.
The rest of the article is organized as follows. Section II introduces the proposed analysis framework. Sections III, IV, and V describe in detail the proposed modeling, analysis, and optimization methods. Section VI reports the experimental results. The paper is concluded in Section VII.
II. OVERVIEW OF AGING-AWARE STATISTICAL FRAMEWORK
The proposed aging-aware statistical timing analysis framework is shown in Figure 1. The analysis framework consists of three key components: Gate-Level Aging-Aware Statistical Timing Model: Given a technology library in the SPICE netlist form, it characterizes the process
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC'09, July 26-31, 2009
III. STATISTICAL GATE AGING MODELING

A. Parametric NBTI Aging Modeling
This section describes a parametric method to model PMOS NBTI effect, which extracts and formulates NBTI run-time dependencies, e.g., temperature and signal probability, in a compact form that allows rapid estimation of NBTI-induced time degradation under arbitrary run-time conditions. This proposed method can also be extended to model other lifetime aging effects. The NBTI effect manifests itself as increase of PMOS threshold voltage and degradation of circuit timing. NBTI physical mechanism has been studied in [7] , [19] . A NBTI model under arbitrary dynamic temperature variation is proposed in [26] . In [6] , the authors propose a long-term NBTI model, which provides an analytical upper bound estimation of the NBTI impact over time, described as follows.
where α is the average signal duty cycle, T is the average temperature, Vth is the initial threshold voltage, and tox is the thickness of the gate dielectric. For brevity, we refer the reader to [6] for a detailed explanation of the other parameters. This model shows that the NBTI effect is a strong function of the run-time temperature T and the signal probability (duty cycle) α of the logic gate. In this work, we extract the dependency on T and α from the long-term model described in Equation (1). We further assume that Tclk is small, since modern high-speed IC designs are typically clocked in the multi-gigahertz range. Using a reduction technique similar to that in [20] and considering the exponential temperature dependency shown in Equation (3), the NBTI-induced threshold shift model can be approximately reduced to
where k is the Boltzmann constant, Ea = 0.49 eV, and b is a fitting constant. Figure 2 shows the relative error of our simplified model in Equation (5) against the original long-term model in Equation (1) under varying working conditions, using a 65nm CMOS technology. The temperature ranges from 320K to 380K and the average duty cycle ranges from 0.1 to 0.95. As the figure shows, the simplified model achieves very good accuracy under normal working conditions. Following the alpha-power law [18], the first-order gate delay can be approximated as a linear function of the threshold voltage. The gate delay can then be expressed as follows [20], [23]:
where the constant b can be fitted from SPICE characterization. A primary advantage of using Equation (6) to characterize the gate-level NBTI effect is that, given a reference model pre-characterized at Tref and αref, the aging effect of a logic gate under arbitrary T and α can be efficiently calculated using parameter scaling, which is discussed in detail in Section IV-A.
B. Considering Process Variations in NBTI Aging Modeling
The NBTI-induced gate delay degradation depends not only on circuit run-time conditions but also on fabrication-determined process parameters, such as the initial threshold voltage Vth and oxide thickness tox. Due to fabrication-induced process variations, the NBTI aging process and its impact on circuit timing become a random process. To model NBTI-induced aging under process variation, we apply the stochastic collocation method, which was originally proposed to model the gate delay under uncertainty [15] . Considering a set of process parameters as a random variable vector with normal distribution
the variation of gate delay degradation can be modeled using the polynomial chaos expansion on the set of random variables.
where $\{\Phi_j\}_{j=1}^{P}$ is the complete set of d-dimensional Hermite polynomials up to the l-th order, and cj is the unknown coefficient. Hermite polynomials form an orthogonal basis of the Hilbert space under the Gaussian measure, and are therefore the natural choice when ξ is Gaussian. Second-order Hermite polynomials are sufficient in practice. The coefficients cj can be determined using the stochastic collocation method [5], [15]. Using Equation (8), the delay of a logic gate at time t can then be expressed as follows.
where d0(ξ) is the initial gate delay after chip fabrication and Δd(ξ, T, α, t) represents the time-dependent gate aging effect. The initial delay $d_0(\xi) = \sum_{j=1}^{P} d_j \Phi_j(\xi)$ is expressed in polynomial chaos form with respect to the same set of random variables as Δd, and its coefficients are computed using the same stochastic collocation method. For gates with multiple inputs, this model is applied to each input of the gate.
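As an illustration of the polynomial chaos form above, the following minimal sketch fits a second-order Hermite expansion in a single standard-normal variable. It assumes a quadrature-based projection at three Gauss-Hermite collocation points, which may differ from the exact procedure of [5], [15]; the function `toy_gate_delay` and all numeric values are hypothetical stand-ins for a SPICE-characterized gate.

```python
import math

# Probabilists' Hermite polynomials He_0, He_1, He_2 in one standard-normal variable.
def hermite_basis(xi):
    return [1.0, xi, xi * xi - 1.0]

# 3-point Gauss-Hermite rule for the standard normal density
# (exact for polynomial integrands up to degree 5).
NODES = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
WEIGHTS = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]
NORMS = [1.0, 1.0, 2.0]  # E[He_j^2] = j!

def fit_pce(delay_fn):
    """Project delay_fn(xi) onto He_0..He_2 using samples at the collocation nodes."""
    samples = [delay_fn(x) for x in NODES]
    coeffs = []
    for j in range(3):
        proj = sum(w * s * hermite_basis(x)[j]
                   for x, w, s in zip(NODES, WEIGHTS, samples))
        coeffs.append(proj / NORMS[j])
    return coeffs

def pce_mean_std(coeffs):
    """Mean and standard deviation follow directly from the orthogonal expansion."""
    mean = coeffs[0]
    var = sum(c * c * n for c, n in zip(coeffs[1:], NORMS[1:]))
    return mean, math.sqrt(var)

def toy_gate_delay(xi):
    # Hypothetical gate delay (ps) as a function of one normalized process parameter.
    return 20.0 + 1.5 * xi + 0.2 * xi * xi

coeffs = fit_pce(toy_gate_delay)
mean, std = pce_mean_std(coeffs)
```

Because the basis is orthogonal under the Gaussian measure, the mean is the zeroth coefficient and the variance is a weighted sum of squares of the remaining coefficients, so no sampling is needed once the coefficients are known.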
IV. AGING-AWARE STATISTICAL TIMING ANALYSIS
Given a circuit netlist and the aging-aware variational gate delay model (Equation 9) as inputs, the proposed aging-aware statistical timing analysis method computes the aging effect of each logic gate based on its run-time condition, and carries out circuit-level statistical timing analysis.
A. Computation of Gate NBTI Aging Effect
For each type of logic gate provided by the technology library, the NBTI aging effect is characterized once under a reference working condition Tref and αref, and is expressed as Δd(ξ, Tref, αref, t). During circuit-level logic analysis, for each logic gate i, the duty cycle αi of its input signal is estimated by circuit simulation using user-provided input signal vectors, and the gate temperature Ti is taken from the chip thermal profile. The aging effect of gate i can then be calculated by scaling the reference model as follows.
where the scaling factors for temperature Ti and signal duty cycle αi are
and
Due to the exponential dependence on temperature and the error introduced during model simplification, a maximum temperature difference of ±25K is allowed in order to achieve good accuracy of the scaled aging effect for a gate (this setting is used in the experimental results section). In order to cover the complete circuit operating range, we develop a piecewise gate aging model. In addition, the above scaling model is not applicable when α = 1, which indicates that the gate is under static stress. Using the static NBTI model, a scaled aging model can also be developed for α = 1 using the same method described above.
B. Equivalent Aging Time Analysis
Circuit workload may vary over time, and so does the aging process. For example, as shown in Figure 3, a gate experiences three different working conditions, (T1, α1), (T2, α2), and (T3, α3), with durations of t1, t2, and t3, respectively. In this work, we introduce the equivalent aging time to facilitate characterization of the aging effect under such varying conditions. For the sake of clarity, the dependency of the aging effect on ξ is omitted. Given the aging effect of a logic gate under condition (T2, α2), the equivalent aging effect under condition (T1, α1) is described as follows.
Using Equation 6, the equivalent aging time teqv1 can be computed as
where RT and Rα are defined in Equations (11) and (12). Then, the aging effect of the gate at t2 equals that of the gate working under (T2, α2) during time (0, teqv1 + t2), and can be computed as Δd(T2, α2, teqv1 + t2). This procedure can be carried out inductively whenever the working condition changes. Figure 3 demonstrates the use of equivalent aging time to estimate the overall aging process under three different working conditions. The dotted lines are the equivalent aging times computed at each transition using Equation (14). The solid lines are the actual aging durations.
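The inductive use of equivalent aging time can be sketched as follows. This is a simplified illustration, not the paper's Equations (13) and (14): it assumes a power-law aging form Δd = A · t^n, where the condition-dependent amplitude A stands in for the combined effect of the scaling factors, and the exponent n = 1/6 is an assumed value. The names `aging`, `equivalent_time`, and `accumulate` are hypothetical.

```python
N_EXP = 1.0 / 6.0  # assumed time exponent of the power-law aging model

def aging(amplitude, t):
    """Aging effect after time t under a fixed condition (assumed form A * t**n)."""
    return amplitude * t ** N_EXP

def equivalent_time(current_aging, amplitude):
    """Time the gate would need under this condition to reach current_aging."""
    return (current_aging / amplitude) ** (1.0 / N_EXP)

def accumulate(phases):
    """phases: list of (amplitude, duration) pairs, one per working condition.

    At each transition, the aging accumulated so far is converted into an
    equivalent time under the new condition, and aging then continues from there.
    """
    total = 0.0
    for amplitude, duration in phases:
        t_eqv = equivalent_time(total, amplitude)
        total = aging(amplitude, t_eqv + duration)
    return total

# Two phases under the same condition must match a single uninterrupted phase.
same_condition = accumulate([(2.0, 1.0), (2.0, 3.0)])
# A mild phase followed by a harsher one.
mixed = accumulate([(1.0, 1.0), (2.0, 1.0)])
```

Under this assumed form, switching to a harsher condition maps a long history under mild stress to a much shorter equivalent time, which is why the dotted segments in a plot such as Figure 3 can be far shorter than the actual elapsed durations.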
C. Aging-Aware SSTA and Pruning Algorithm for Repeated Analysis
Once the aging effect of a gate i is computed, the delay of each gate is expressed in PCE form (Equation 9) in terms of its underlying process parameters ξi. To handle the correlation between the process parameters $\{\xi_i\}_{i=1}^{q}$ from q different gates, principal component analysis [2] is carried out to extract a set of independent variables z according to the correlation matrix of $\{\xi_i\}_{i=1}^{q}$. Each set of process parameters ξi is then represented as a linear combination of z. After this transformation, a PCE delay model based timing analysis method [5] is adopted to compute the arrival time (AT) and required time (RT) of each node on the graph. The arrival time and required time are also represented in PCE form.
In practice, the circuit needs to be tested under different combinations of working conditions, and it is time-consuming and thus undesirable to re-run aging-aware SSTA each time the working condition of the circuit changes. Inspired by the work of [24], we propose a fast pruning algorithm that removes redundant timing nodes and edges and thus speeds up the subsequent repeated analysis.
It is observed in [24] that the distributions of the arrival time at the different inputs of a node may be distantly separated. The statistical maximum of them, which is the distribution of arrival time at the output of the node, will also be separated from some of the inputs. Such input can be identified using the following equation:
where μ and σ are the means and standard deviations of the input and output arrival times. Inputs of this type do not contribute to the timing statistics of the later stages, and therefore the incident edges corresponding to these inputs can be removed from the timing graph of the circuit. Note that as the circuit ages, inputs that do not contribute to circuit performance at time zero may become relevant due to the NBTI aging effect. However, as studied in [21] and [20], the maximum delay degradation along a timing path is bounded by 20%. Considering this factor, we add a safety margin to the edge pruning condition to prevent the removal of any potentially important edge due to the aging effect:
where the safety margin is set to 20%. Furthermore, a node whose fan-out edges have all been pruned can also be removed from the timing graph, along with its fan-in edges. In summary, the pruning algorithm consists of two phases. First, it performs a forward topological traversal to prune unimportant timing edges using Equation (16). Next, it performs a backward topological traversal to prune nodes with no remaining fan-out edges, together with their fan-in edges. The running time of the pruning algorithm is dominated by the SSTA in the first phase. The payoff is that aging-aware SSTA under different working conditions runs faster on the pruned graph. In addition, optimization only needs to consider the pruned graph.
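The two-phase pruning can be sketched as below. The pruning test used here is an assumption, since it stands in for Equations (15) and (16): an edge is treated as dominated when (1 + margin) · (μ_in + kσ_in) < μ_out − kσ_out with k = 3. The arrival-time statistics are taken as precomputed inputs (the SSTA itself is out of scope), and the function name, argument names, and toy values are hypothetical. The backward phase is written as a repeated sweep rather than an explicit reverse topological traversal.

```python
def prune_timing_graph(edge_at, node_at, sinks, margin=0.20, k=3.0):
    """Return the set of edges kept after aging-aware pruning.

    edge_at: {(u, v): (mu, sigma)} arrival time arriving at v through edge u->v
    node_at: {v: (mu, sigma)} arrival time at the output of node v
    sinks:   set of primary-output nodes, which are never removed
    """
    # Phase 1: drop edges whose arrival time stays well below the node output,
    # even after inflating it by the assumed aging margin.
    kept = set()
    for (u, v), (mu_in, sig_in) in edge_at.items():
        mu_out, sig_out = node_at[v]
        dominated = (1.0 + margin) * (mu_in + k * sig_in) < mu_out - k * sig_out
        if not dominated:
            kept.add((u, v))

    # Phase 2: repeatedly remove non-sink nodes that have no fan-out left,
    # together with their fan-in edges.
    while True:
        has_fanout = {u for (u, _) in kept}
        dead = {v for (_, v) in kept if v not in has_fanout and v not in sinks}
        if not dead:
            return kept
        kept = {(u, v) for (u, v) in kept if v not in dead}

# Toy graph: a and b drive c; b also drives d; c and d drive the output.
edge_at = {
    ("a", "c"): (10.0, 1.0), ("b", "c"): (50.0, 1.0),
    ("b", "d"): (15.0, 1.0),
    ("c", "out"): (60.0, 1.0), ("d", "out"): (20.0, 1.0),
}
node_at = {"c": (50.0, 1.0), "d": (15.0, 1.0), "out": (60.0, 1.0)}
kept = prune_timing_graph(edge_at, node_at, sinks={"out"})
```

In this toy case the early edges a->c and d->out are pruned in the first phase, which leaves node d without fan-out, so its fan-in edge b->d is removed in the second phase.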
V. CRITICALITY-SENSITIVITY BASED ELEMENT IMPACT ANALYSIS
Statistical analysis fulfills two purposes: first, it calculates the reliability after several years of aging; second, when the target reliability is not achieved, it provides guidance for circuit optimization. In this section, we propose a criticality-sensitivity based element impact analysis, which can be embedded in an iterative optimization framework to improve the reliability of the circuit after years of aging. The whole flow is shown in the lower right part of Figure 1. Our aging optimization procedure is similar to TILOS [9] and the work in [10] in the sense that it chooses a group of the most effective gates for sizing at each iteration. However, we propose a criticality-sensitivity based analysis to measure, in the probability space, the impact of sizing a gate on the reduction of the aging effect of the whole circuit. At each iteration of the optimization flow, the aging-aware statistical timing analysis proposed in Section IV is used to update the aging effect and timing information of the circuit. Then, the criticality-sensitivity based element impact analysis discussed below is carried out to select a group of gates Φ whose change affects the timing of the circuit most effectively. At the end of the iteration, the gates in Φ are optimized to improve the circuit yield under process variation and aging. The optimization repeats until the expected reliability is met or the area or power constraints are violated.
Because of circuit structure, the effects of updating different elements are different and interdependent. Updating one element in each iteration solves the interdependence problem but is expensive. It is also unnecessary to update an element on a non-critical path. Therefore, it is important to select a small group of most effective elements to optimize in each iteration. As a study, we focus on gate sizing where the designed width μ1 of each gate will be selected to optimize the circuit reliability at the end of a given aging period. Notice that the fabricated actual width ξ1 of a gate is still a random variable, but its mean value and variance are decided by μ1.
The effect of sizing gate i on the reliability y of the whole circuit, which is defined as the circuit yield after a given period of aging, is just the reliability gradient over μ1, which can be expressed as
where di is the gate delay, and Ai is the slack given by
where ATi,in and RTi,out are the arrival time at the input and the required time at the output of gate i. From the above relationship, we know ∂Ai/∂di = 1. The product of the last two terms in Equation (17) gives the sensitivity ∂di/∂μ1, which is the derivative of the gate delay with respect to the designed width. Since gate delay is a distribution, the sensitivity, as its derivative over a constant variable, is also a distribution. We will use its mean value to guide the optimization. For this, we have
where the first term is a constant and can be obtained from the coefficient of the ξ1 term in Equation (9), and the second term is 1 if μ1 gives the mean of ξ1. The term ∂y/∂Ai captures the complex relationship between gate slack and circuit reliability. When the slack is positive, it will be zero. Even when the slack is negative, if the gate is not on all critical paths, the term will still be zero. Furthermore, with process variation, Ai becomes a distribution instead of a deterministic figure. Therefore, we use criticality [25] to evaluate how important a gate is to improving the reliability. Considering the aging effect, the criticality ci is defined as the probability that gate i lies on the critical path of the circuit after a period of aging time, due to the process variation. To compute the criticality of each gate in the circuit, we adopt the cutset-based method proposed in [25] , [17] . Since our timing distribution is expressed in PCE form instead of first-order canonical form, the tightness probability, a basic building block of the criticality computation algorithm, is computed using either numerical integration [8] or the APEX method [16] .
After the criticality ci and sensitivity si of every gate in the circuit are computed, we rank the gates according to the product ci × si. Although ci × si is not the same as ∂y/∂μ1, it provides an approximate ranking of the gates by their impact on the improvement of the circuit reliability after aging. Using this criterion, a group of n highest-ranking gates are chosen for optimization. After optimization, the aging-aware statistical timing analysis is carried out to update the aging effect, timing, and reliability of the circuit. Notice that in the analysis, not all the gates in the circuit need to be updated. Only the fan-in gates of the n modified gates and those on their fan-out cones need to be reexamined.
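The ranking step can be illustrated with a short sketch. It assumes that criticality and mean sensitivity have already been computed per gate, and it ranks by the magnitude of the sensitivity, which is an assumption made here because upsizing a gate typically reduces its delay and so yields a negative derivative. The function name and the toy values are hypothetical.

```python
def select_gates(criticality, sensitivity, n):
    """Return the n gates with the largest criticality * |sensitivity| product."""
    scores = {g: criticality[g] * abs(sensitivity[g]) for g in criticality}
    ranked = sorted(scores, key=lambda g: scores[g], reverse=True)
    return ranked[:n]

# Toy data: g1 is almost always critical but barely responds to sizing,
# g2 responds strongly but is rarely critical, g3 is moderate in both.
criticality = {"g1": 0.9, "g2": 0.1, "g3": 0.5}
sensitivity = {"g1": -0.2, "g2": -2.0, "g3": -1.0}
chosen = select_gates(criticality, sensitivity, 2)
```

With these toy values, neither the most critical gate nor the most sensitive gate ranks first; the product favors the gate that is both likely to be critical and responsive to sizing.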
VI. EXPERIMENTAL RESULTS
The proposed statistical reliability analysis framework is implemented in C++, including a modeling engine, a timing analysis engine, and a criticality-sensitivity based element impact analysis engine. The effectiveness of the impact analysis is tested in an iterative optimization approach. Circuits from ISCAS85 are used as benchmarks. The experiments are run on a 64-bit Linux server with a 3.0GHz Xeon CPU and 2 GB of memory.
A. Verification of the Gate Aging Model
This section evaluates the proposed statistical gate aging model shown in Equation 8. Various types of logic gates, such as BUF, NAND, and NOR, are considered with the 65nm PTM [1] technology. For each type of gate, the channel width, gate length, and threshold voltage of PMOS and NMOS are modeled using Gaussian random variables. The variance of each random parameter is set as 10% of its mean. The reference aging model is characterized once using the proposed modeling method under a normal working condition with T = 325K and α = 0.5. To test the accuracy of our aging model in different working conditions, the aging effect of the gates under a different working condition is computed by scaling from the reference model using Equation 10. We select T = 350K and α = 0.75 as the testing condition, which reaches the maximum temperature difference against the reference condition allowed in our model, as discussed in Subsection IV-A. The aging effects of each gate are compared against the results of a corresponding 5000-point Monte-Carlo simulation. In each of the Monte-Carlo instances, the random ΔVth is computed using the long-term model in Equation 1 according to the sampled process parameters.
The relative error results against Monte-Carlo are given in Table I. The aging effect is modeled accurately at the reference working condition. As the working condition deviates from the reference one, the modeling error begins to grow. This is expected, since the modeling error stems not only from the coefficient regression but also from the fact that the scaling relationship in α and T does not exactly satisfy the long-term model. Still, the aging model has sufficient accuracy to be used in the aging-aware statistical analysis framework under the allowed fluctuation of working conditions.
B. Aging Effects of Circuit Delay Distribution
Next, using the proposed aging-aware statistical timing analysis method, we characterize circuit delay distributions under aging effects. The proposed analysis method uses the SSTA method that has been validated against Monte-Carlo simulations in past work [5]. To evaluate the temperature-dependent aging effect, we constructed a run-time temperature profile gathered from a computer server. The chip temperature varies from 36 °C (low workload) to 75 °C (high workload). The temperature profile is then repeated over a 5-year time span. In addition, during the first two years, the ratio between high workload and low workload is set to 0.5, and for the next three years, the ratio is increased to 2. The aging history for the ISCAS85 circuits under the changing temperature profile is computed using the proposed equivalent aging time technique. The delay distributions of each circuit are computed at years 0, 2, and 5. The evolution of the means and the variances at years 2 and 5 is shown with respect to year zero in Figure 4. These results demonstrate that, due to process variation and the NBTI aging effect, mean delay increases while delay variance decreases in general, as was first discovered by Wang et al. [22]. However, in some circuits, such as c17, c449 and c7552, delay variance increases over time. This is because, unlike the single-path case [22], the distributions of the paths converging at a gate's inputs may approach or depart from each other, resulting in a possible increase of circuit delay variance.
C. Pruning Effects
We next evaluate the effectiveness of the proposed aging-aware pruning algorithm (Section IV-C). We choose a safety margin of 20%, since the delay degradation of the paths in the circuit will not exceed 20% under normal operation, as is also suggested by Figure 4. Table II lists the pruning statistics, accuracy, and speedup against the results of using SSTA without pruning. With maximum differences in mean and variance under 1% and 4%, respectively, the pruning algorithm introduces little error into the subsequent timing analysis. From Table II, it is observed that the analysis speedup is proportional to the number of pruned edges and nodes, which depends on the structure of the circuit and the different aging effect of each gate in the circuit. For the ISCAS85 benchmarks, speedups from 10% to 70% are achieved. However, if a circuit is well balanced, that is, the arrival time distributions at the inputs of every node are close to each other, the pruning algorithm is not effective, as shown in the result for c7552.
D. Effectiveness of Element Impact Analysis
Finally, we designed and implemented an iterative gate sizing optimization framework to test the effectiveness of the proposed criticality-sensitivity based element impact analysis (Section V). The four largest benchmarks shown in Table II are used in this experiment. For each benchmark, we consider the high workload condition with a temperature of 75 °C. The optimization flow is set to target a 5-year time span. During each optimization iteration, the proposed aging-aware timing analysis is used to estimate the circuit timing information. Gates are then ranked by the product of criticality and sensitivity in non-increasing order. The top-ranked |Φ| gates are then chosen, and the size of each chosen gate is increased. |Φ| = 50 provides the best performance-time trade-off in our experiments. This procedure is repeated until the μ + σ of the circuit delay is improved by 25%.
For comparison purposes, we also consider another two gate impact analysis methods which rank the gates using either sensitivity or criticality alone. The traces of the mean and variance of the delays of these circuits after each iteration are plotted in Figure 5 . It can be observed that the progress of the sensitivity-guided method is unstable while that of the criticality-guided method is not fast enough.
On the other hand, the progress of the proposed method using the product of sensitivity and criticality is both stable and fast, showing the effectiveness of criticality-sensitivity analysis.
VII. CONCLUSION
In this work, we have studied the impact of process variation and run-time aging effects on IC lifetime reliability. This work yields a comprehensive IC reliability analysis framework, which considers the joint impact of the NBTI aging effect and parameter variations. Techniques are proposed to improve the accuracy and efficiency of IC reliability analysis, including a scalable NBTI aging model, equivalent aging time, and aging-aware timing graph pruning. A novel criticality-sensitivity based analysis method is proposed to allow rapid estimation of the impact of individual circuit elements on the overall circuit lifetime reliability. Leveraging the proposed analysis framework, we have designed and implemented a statistical reliability optimization flow for nanometer-scale IC design.
