Abstract-Recent work has shown large variations due to biastemperature instability (BTI) at the device level, and we study its impact on the behavior of larger circuits. We propose an analytical method that is over 600x faster than Monte Carlo simulation and accurate for technologies down to 16nm, and demonstrate it on circuits with up to 68,000 transistors. Results show that the impact of BTI variability at the circuit level is significantly smaller than at the device level, but increases with device downscaling.
INTRODUCTION
Bias temperature instability (BTI) is a major reliability concern in deeply-scaled very large scale integrated (VLSI) circuits. The BTI effect causes the threshold voltage, , of CMOS transistors to increase under voltage tress, resulting in a temporally-dependent degradation in the signal propagation delay in digital circuits.
The reaction-diffusion (R-D) model [1] for BTI, based on the dissociation of Si-H bonds at the Si/SiO 2 interface, has been a prevailing theory of the mechanism of BTI, and has been widely employed in research on circuit reliability and optimization. However, over the years, several limitations in the R-D theory have been exposed. For example, the predictions from R-D theory are unable to adequately explain the experimentally observed sensitivity of the degradation to the applied gate bias. Moreover, the observed recovery begins faster and lasts longer than that predicted by R-D models. These issues are addressed by recent advances in BTI modeling [2] , based on the charge trapping/detrapping theory, in which the temporal variations are caused by defects in gate dielectrics that can capture charged carriers.
Moreover, the charge trapping model explains the observation that for small-area devices the degradation and recovery of Δ proceed in discrete steps [2] , which is similar to the case of random telegraph noise (RTN) and 1/f 2 noise. Based on the charge trapping model, the intrinsic variability of BTI effect was explored analytically in [3] , which indicates the possibility of very large device-level variations for smallgeometry FETs, leading to orders-of-magnitude lifetime variations. These models were incorporated in a circuit simulator [4] and demonstrated on an inverter circuit, predicting large delay variations (with / of over 20%). However the proposed approaches in [4] were based on SPICElike circuit simulation that feeds a Monte Carlo method, and this approach is not scalable to large-scale digital circuits. A significant unanswered question, which this paper attempts to address, is: how does this large device-level variation translate to variations at the circuit level? We answer this question by:
(a) taking large circuits in 32nm, 22nm, and 16nm technologies, (b) mapping them to a standard-cell library to meet realistic timing specifications, and (c) examining and analyzing the results.
SPICE-based approaches are inefficient in finding the critical paths in large circuits, and hence a cell-based timing model is used in this work, consistent with the approaches used for timing analysis in realistic circuits. An analytical method is proposed for finding the variation in delay on the critical path, and verified using Monte Carlo simulations.
We find that BTI variability increases with downscaling, but that circuit-level variations are significantly lower than device-level variations. After we present our experimental results, we will analyze the reason for this reduction in the magnitude of variations. We also show that this circuit delay variation is Gaussian in nature, and the degradation is proportional to the logarithm of time.
II. MODELING

A. BTI Variability under Charge Trapping Model
As discussed in [4] [5], the threshold voltage degradation of small-geometry devices shows significant levels of uncertainty due to the random nature of the spatial distribution of defects in the dielectric, as well as the impact on Δ of each defect. Under the charge trapping model for BTI, each MOS device is characterized by the number of defects, the capture and emission time of each defect, and the impact on device when each defect is charged [4] . These defect parameters are characterized in device experiments for given semiconductor technology using the recently-proposed timedependent defect spectroscopy (TDDS) method [6] .
For a MOS device of length and width , the total number of defects is modeled as a Poisson distribution, ~Poisson , with mean value , where stands for the dielectric areal defect density . The BTI-induced threshold degradation due to each single defect when charged (occupied) is modeled as an exponential distribution, Δ ~Exp , with mean value / [3] . Both and are technology-specific parameters.
At any given time, each of the defects in the MOS device has two states: charged (occupied) or uncharged. Only occupied defects make contributions to the device threshold voltage. Following the models in [7] , the timing characteristics of each defect corresponding to the charge trapping and detrapping events, namely the capture and emission time and , are strongly dependent on voltage and temperature. In digital circuits, the voltage dependence effect is simplified by the fact that there are only two nontransient voltage stageslogic 1 and logic 0 -corresponding to two static modes of stress and relaxation. We capture the temperature dependence effect by the use of a standard corner-based approach where the worst-case temperature corner is assumed. Thus each defect can be described by four time constants, denoted as vector , , ,
The trapping and detrapping of electric charges in a defect is a stochastic process with time constants characterized by the vector . The occupancy probability (i.e., the probability of charge trapping) of a single defect under AC stress is derived in [7] to be a function of total time t and duty factor DF as follows, , , * * * 1 exp 1
Here, the duty factor of a device under AC stress is defined as the probability of the transistor in accumulation mode that is effective for BTI stressing (in some papers, is also referred to as the signal probability, ). The parameters * and * are defined as the effective capture and emission time constants under AC stress, which account for the duty factor effect are functions of :
Fig . 1 shows an example plot of the occupancy probability function , , of a single defect as defined in (2), with the values of the time constants shown in the figure. The plot indicates that the occupancy probability increases from zero to an asymptotic value of one. It can be seen that as a function of DF, P c increases gradually. The time over which this change occurs is relatively in a short time compared to the circuit lifetime, which is in the range of 10 to 10 a.u.
B. The Mean Defect Occupancy Probability
Since the defects are created in the fabrication process and uniformly distributed in the dielectric layer, for each individual component of , the statistical distribution associated with any defect in a wafer is independent and identically distributed. The statistical distributions of different time constants in the vector may be different, and for each single defect the four components of are correlated according to [6] , and their joint distribution can be characterized for a specific technology. In this paper, we follow the assumptions in [4] and use the following distributions to generate the time constants:
is uniformly distributed on the log scale between 10 and 10 a.u.
 , and
, are also taken to be uniformly distributed on the log scale and weakly correlated with , with mean values given by , 0.1 , and , 100 , .
 , is assumed to be much larger than the other time constants, and uniformly distributed on the log scale between 10 and 10 a.u. With these assumptions, the mean occupancy probability function, , , is introduced as the expected value of the occupancy probability of a single defect defined in (2):
Here is the joint probability distribution function (PDF) of . The mean occupancy probability function , can be calculated by evaluating (5) numerically. Fig. 3 shows the , function corresponding to the assumed that is plotted in Fig. 2 . This plot indicates that the mean occupancy probability is a monotonically increasing function of both and time. Even though , , increases rapidly with t for a single defect with a specific , as shown in Fig. 1 , due to averaging effects of large number of defects with different time constants,
, changes more gradually with time. Since , is only determined by the distribution which is technology-related, and independent of the circuit structure, it can be pre-characterized and stored in a look-up table (LUT) for use in circuit analysis.
C. Cell Delay Model of Digital Circuits
Due to BTI effects, the threshold voltages of stressed increase over time. The propatation delay of a logic cell (e.g., NAND, NOR) in a digital circuit is dependent on the of all MOS transistors in the cell, and therefore digital circuits slow down as they age.
Based on a commonly-used model [8] , we use a cell delay degradation model based on a first-order Taylor approximation. The delay of cell is modeled as a linear function of Δ of each transistor in cell :
Here is the nominal value of delay, / are the sensitivities of delay to the threshold of device at the nominal threshold voltage. These parameters are calculated using standard design automation techniques, built on top of SPICE simulations.
The characterization of cell delay and sensitivities is also circuit-independent and depends purely on the cell library: hence as part of library characterization, these values are computed for each cell in the library and stored in LUTs.
III. ANALYSIS OF CIRCUIT DELAY DEGRADATION
The delay of a digital logic circuit is calculated by static timing analysis (STA). The digital circuit is modeled as a directed acyclic graph (DAG). The circuit delay is the sum of the delay of logic cells on the longest signal path, or critical path, as ∑ ∈ , which limits the maximum operation frequency of the digital circuit:
1/ . This paper calculates the circuit delay degradation due to BTI based on the critical path. The duty factor of each transistor in the circuit is calculated using standard techniques [9] .
A. Analytical Method
Based on the mean defect occupancy probability function (5) introduced in Section II.B, the number of occupied defects, denoted as , of a transistor has a Poisson distribution 1 
~Poisson
, with its mean value calculated using , .
• , ⋅ , 7
The total threshold voltage degradation, Δ , of a transistor is the sum of contributions to the degradation from all occupied defects in the transistor, i.e.,
Δ Δ 8
Since the Δ values of all defects are independent and identically distributed (i.i.d.) as discussed in Section II.A, the total Δ is a sum of exponential random variables, while has a Poisson distribution. A closed form for this sum is found in [3] and both the PDF and CDF of Δ .
The distribution functions have complex form, but the probit plot of the CDF function in [3] indicates that for an adequate number of defects (e.g., 10), the BTI-drive Δ can be approximated as following Gaussian distribution:
where and 2
This Gaussian approximation can be justified by the central limit theorem (CLT) for a large number of occupied defects. As shown in Section IV, this approximation does not induce significant errors to the circuit level results.
If process-induced variation of device is also considered, which is modeled as an independent Gaussian distribution, ~ , . The combined degradation of MOS transistor can be expressed as 1 The number of occupied defects in a device follows a Poisson distribution by definition because (a) each occupied defect has the same occurrence rate / within the device area of by , and (b) the occurrence of all occupied defects are independent with each other. This is similar to the number of all defects which follows ~Poisson , and is verified by experiments in Sec IV.
In a digital circuit, the critical path is obtained from STA. Based on the cell delay model in (6), the delay degradation of the critical path is derived as
Since the Δ of each transistor are independent Gaussian random variables, the degraded delay also has a Gaussian distribution with mean and variance as follows 
B. Monte Carlo Method for Verification
The Monte Carlo flow for calculating the delay degradation is shown in Fig. 4 . This Monte Carlo simulation yields the distribution of delay degradation, and it is employed in this work to verify the proposed analysis method. 
C. Overall Analysis Flow
As shown in Fig. 5 , the flow of the BTI variability analysis procedure is as follows:
 The input is a digital circuit, the corresponding cell library, and the technology parameters.
 The mean defect occupancy probability , and cell delay sensitivities are characterized offline (Section II.B, II.C). For each circuit, the nominal critical path delay and are computed using EDA techniques.  The analytical (Section III.A) method is used to determine circuit delay degradation and its variation spreads under BTI variability.
When we evaluate the correctness and accuracy of our method, the third step is replaced by the Monte Carlo method (Section III.B). As we will show in the results, the analytical method provides accurate results, and is over 600 times faster than Monte Carlo.
IV. EXPERIMENTAL RESULTS
The proposed approaches for circuit delay degradation analysis under BTI variability are applied to ISCAS85 and ITC99 benchmarks, which are mapped to a subset of the Nangate cell library [10], and scaled down to 32nm, 22nm and 16nm for comparison. The characterization of cell delay and sensitivities is performed using HSPICE with PTM models [11] . Both the analytical and Monte Carlo methods are implemented in C++ and executed on a Linux PC with 3GHz CPU and 2GB RAM. Table I shows the circuit delay degradation at 10 a.u. at different technology nodes. The mean delay degradation is listed in columns named Δ % , while the columns named / % show the normalized standard deviation of circuit delay as a percentage of the mean, for only BTI-induced variation (without including process-induced variation). Next, process-induced variation is added in, setting / 5%, and we show the results of the corresponding normalized standard deviation of circuit delay, as a percentage of the mean, in the columns labelled / %. The errors of the proposed method as compared to Monte Carlo (MC) simulation and the runtime comparisons are also shown in the table. On average, the proposed method has 1.31% error in Δ and 0.76% error in , and is over 600 faster compared with MC.
Simulation results in Table I indicate that even with significant device-level variations (the combined / of is up to 4.5% due to both BTI variability and process variation for these experiments). Therefore, the circuit level impact of BTI variability is much less than device level; however it increases with device downscaling, and is more noticeable at 16nm.
The results also verify the proposed analytical method is accurate as compared to Monte Carlo. The probit plot of the delay distributions (analytical and Monte Carlo) of benchmark
circuit c432 under 32nm, 22nm and 16nm technologies are shown in Fig. 6 . The Monte Carlo and the analytical results show near-perfect overlap in each case, indicating the proposed analysis method gives accurate results, and that the circuit delay has a Gaussian distribution.
Probit(Delay) percentile The reduced circuit delay variations can be explained by the following factors.
 First, random variations cancel out over multiple stages on a critical path, as shown in Fig. 8 ; for a path consisting of cells with independent variations, the overall variance goes down by 1/√ , according to central limit theorem (CLT) 2 . Fig. 8 shows the Monte Carlo results of the normalized delay vs. time for inverter chains of (a) one stage and (b) ten stages. The narrower spread of curves in (b) in both x and y direction indicates smaller variations of both lifetime and delay on longer critical path.  Second, device-level variations are largest for smallgeometry FETs. However, devices on critical path in real circuits are sized larger to meet the timing specifications. Smaller FETs are used in off-critical paths, and large delay variations there do not matter since even the altered delays do not exceed the clock period constraint. 
V. CONCLUSION
This paper studies the impact of BTI variability on the behavior of larger circuits under charge trapping model. We propose an analytical method for technologies down to 16nm that is fast and confirm its accuracy with Monte Carlo simulations. Our results have shown that although device-level variations can be extremely large in scaled technologies, they result in moderate circuit variations, primarily due to the averaging effect over multiple stages, and the fact that minimum-sized devices are usually not used on critical paths.
