PMOS stress (ON) probability has a strong impact on circuit timing degradation due to the NBTI effect. This paper evaluates how the granularity of stress probability calculation affects NBTI prediction using a state-of-the-art long term prediction model. Experimental evaluations show that the stress probability should be estimated at the transistor level to accurately predict the increase in delay, especially when the circuit operation and/or inputs are highly biased. We then devise and evaluate two methods of annotating stress probability to gate-level timing analysis; one guarantees the pessimism desirable for timing analysis, and the other aims to obtain a result close to that of transistor-level timing analysis. Experimental results show that gate-level timing analysis with transistor-level stress probability calculation estimates the increase in delay with 12.6% error.
Introduction
In nanoscale integrated circuit design, negative bias temperature instability (NBTI) is one of the serious concerns for device reliability. NBTI is a degradation effect that causes gradual V th degradation. When ΔV th is defined as ΔV th = V th aged − V th fresh , where V th aged and V th fresh are the threshold voltages of aged and fresh MOS transistors, ΔV th of PMOS is a negative value and |ΔV th | gradually increases while a negative bias is applied to PMOS, i.e. V gs = −V dd . This condition is defined as the stress phase of NBTI. When NBTI stress continues for a long time, path delay increases, which may lead to a timing error. On the other hand, while PMOS is OFF, i.e. V gs = 0, |ΔV th | gradually decreases toward its value before the stress was applied, and PMOS degradation is relaxed. This condition is defined as the recovery phase of NBTI. As stress and recovery cycles repeat, the degradation caused by NBTI accumulates and could finally result in a timing error [1].
In order to predict NBTI effect, a long term prediction model has been proposed [1] , [2] . The degradation by NBTI depends on operational parameters such as temperature, supply voltage, and stress probability [3] , [4] . Stress probability is defined as time ratio of stress phase, that is (time of stress)/(time of stress + time of recovery). When the stress probability is almost 100% and PMOS is under stress for a long time, an increase in path delay becomes extremely large due to large V th shift.
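As a minimal illustration of this definition, the sketch below computes the stress probability of one PMOS from a sampled gate waveform. The function name `stress_probability` and the sample trace are illustrative assumptions and not part of the model in [1], [2]; the PMOS is treated as under stress whenever its gate is low (V gs = −V dd ).

```python
def stress_probability(gate_levels):
    """Fraction of equally spaced samples in which the PMOS gate is low.

    gate_levels: sequence of 0/1 logic values observed at the PMOS gate
    (uniform sampling is assumed). A low gate (0) corresponds to
    V gs = -V dd, i.e. the stress phase; a high gate (1) corresponds to
    V gs = 0, i.e. the recovery phase.
    """
    if not gate_levels:
        raise ValueError("empty waveform")
    stress_samples = sum(1 for level in gate_levels if level == 0)
    return stress_samples / len(gate_levels)


# Hypothetical trace: the gate is low in 3 of 4 samples.
trace = [0, 0, 1, 0]
print(stress_probability(trace))  # 0.75
```

A gate that is always low gives a stress probability of 1.0 (DC stress), and a gate that is always high gives 0.0.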
For accurate prediction of circuit delay degradation, appropriate stress probability estimation and consideration in timing analysis are crucially important. Reference [5] proposed a transistor-level estimation method of stress probability. Also, gate-level estimation of path delay using NBTI aware static timing analysis is proposed in [6] . Thus, there are several proposals with different granularities. However, the importance of the granularity (circuit/instance/transistor) in probability estimation and ΔV th annotation for timing analysis has not been sufficiently discussed. Stress probabilities can be highly biased depending on circuit operation, and in such cases, stress probability computation with finer granularity might be necessary.
In this paper, we experimentally evaluate how the granularity (circuit/instance/transistor) of stress probability computation affects the estimation accuracy of NBTI-induced timing degradation through case studies. Our evaluation reveals that transistor-level probability computation is indispensable in cases of highly biased circuit operations. On the other hand, transistor-level timing analysis with transistor-by-transistor ΔV th variation is less compatible with the gate-level timing analysis of industrial practice, and it is computationally expensive. We therefore propose gate-level timing analysis in which instance-by-instance ΔV th is annotated according to transistor-level stress probability estimation. Experimental results show that the proposed method significantly improves the estimation accuracy of timing degradation compared with timing analysis based on gate-level probability computation, even though the timing analysis is carried out at gate level.
The rest of this paper is organized as follows. The prediction model of NBTI used in this work is described in Sect. 2. Section 3 presents three methods of stress probability calculation, and clarifies the importance of the granularity in stress probability computation. After that, Sect. 4 proposes instance-by-instance ΔV th annotation for gate-level timing analysis, and shows its effectiveness for estimating NBTI-induced timing degradation. Finally, the discussion is concluded in Sect. 5.
Copyright © 2011 The Institute of Electronics, Information and Communication Engineers

Prediction Model of NBTI Effect
Various prediction models of NBTI have been proposed in [1], [7]-[10]. In this paper, we use a long term prediction model, which is useful for estimating gradual V th degradation over years on the basis of stress and recovery cycles [1], [2]. With this model, path delays monotonically increase depending on V th degradation, since the model gives estimates of monotonic V th degradation from a long-term point of view. V th degradation after time t has passed (|ΔV th |) is expressed as Eq. (1).
In Eq. (1), K v is a parameter depending on supply voltage and temperature. T clk is clock period, and α is stress probability of PMOS. β t is a parameter that has a dependence on temperature, T clk , α, and t. Moreover, n is equal to 1/6 in a hydrogen molecule diffusion based model [11] . All of these parameters are important for the progress of NBTI degradation [12] .
References [1], [2] reported that there is little relation between |ΔV th | and T clk when the operating frequency is higher than 100 Hz. In this case, Eq. (2) is used for |ΔV th | estimation instead of Eq. (1).
In Eq. (2), parameter C is dependent on temperature, and t ox denotes gate oxide thickness. In Eq. (2), as α approaches 1, |ΔV th | diverges to infinity, which is not appropriate. In such a situation, we use Eq. (3), which models only the stress phase of NBTI [13], as the upper limit.
Let us show an example of NBTI degradation. Figure 1 shows V th degradation calculated with Eq. (2), referring to a parameter set of a 65 nm process in [12]. A significant V th degradation can be found in the first year. In addition, we can see the dependency of V th degradation on α. We also applied this V th degradation to all PMOSs in a small combinational circuit shown in Fig. 2. After ten years, the critical path delay increases by 4.3% (α = 0.1), 6.7% (α = 0.5), and 11.0% (α = 0.9), respectively. Thus, the increase in path delay depends on α.
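The upper-limit handling for α close to 1 can be sketched as follows. This is only one reading of the text, under the assumption that the estimate from Eq. (2) is clipped at the stress-only value from Eq. (3). The two models are passed in as callables; `toy_ac` and `toy_dc` are placeholder functions with made-up constants that merely mimic the qualitative behavior (divergence as α approaches 1, a finite value under DC stress) and are not the equations of the paper.

```python
def capped_dvth(alpha, t, ac_model, dc_model):
    """Return |dVth| from ac_model, limited by the stress-only dc_model.

    ac_model(alpha, t): stand-in for Eq. (2); may diverge as alpha -> 1.
    dc_model(t):        stand-in for Eq. (3), used as the upper limit.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be within [0, 1]")
    upper = dc_model(t)
    if alpha >= 1.0:
        return upper  # pure DC stress: the AC model is undefined here
    return min(ac_model(alpha, t), upper)


# Placeholder models (NOT the paper's equations, constants are invented).
def toy_ac(alpha, t):
    return 0.01 * (alpha / (1.0 - alpha) * t) ** (1.0 / 6.0)


def toy_dc(t):
    return 0.02 * t ** (1.0 / 6.0)


for a in (0.1, 0.5, 0.9, 0.999999, 1.0):
    print(a, capped_dvth(a, 1.0, toy_ac, toy_dc))
```

With these placeholders, the value grows with α and saturates at the DC-stress value once the AC estimate would exceed it.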
Granularity of Stress Probability Computation
This section introduces three methods of stress probability computation with different granularity levels. Then, the timing analysis results based on these three methods are compared.
Stress Probability Computation
We use three stress probability calculation methods (SPCM) with different granularities as follows.
SPCM-A (Circuit level): Set stress probability of all PMOSs to 50% uniformly.
SPCM-B (Instance level): Set stress probability of all PMOSs in an instance to the state probability of the instance output. State probability means the probability of a node being high.
SPCM-C (Transistor level): Calculate stress probability for each PMOS.
SPCM-A is the simplest method, which is based on an assumption that each part of a circuit works uniformly. Reference [14] reported that 50% is a reasonable value to instantly conjecture the maximum timing degradation. In SPCM-B, the stress probability of PMOSs in an instance of standard cell is assumed to be identical to the state probability of the instance output, which can be obtained by logic simulation.
Note that methods of state probability computation other than logic simulation, such as probability propagation [15], could be used for stress probability estimation, though in this paper logic simulation is adopted as a representative method. In SPCM-C, the stress probability of each PMOS is calculated individually. Table 1 lists features of the SPCMs. With SPCM-A, logic simulation is not required. In SPCM-B, logic simulation is required, while the connection of transistors is not taken into consideration. In SPCM-C, not only logic simulation but also consideration of the connection of all transistors is required. Among these methods, SPCM-C is the finest-grained calculation method and is expected to obtain the most accurate stress probability. In this work, with SPCM-C, stress probability computation is executed on the basis of logic simulation results, which means the state probabilities of all the nets are given. The computation procedure is:
1. For each cell instance, examine whether each PMOS is under stress or not for every combination of input states.
2. Calculate the probabilities of all the combinations of input states at each instance by using the state probabilities of the nets. We call this probability the combinatorial probability.
3. Obtain the stress probability of each PMOS using the results of the two steps above.
As an example, we apply SPCM-C to the 2-input AND gate shown in Fig. 3. Supposing the state probabilities of inputs A and B are 0.7 and 0.1 respectively, the combinatorial probabilities of {0, 0}, {0, 1}, {1, 0}, and {1, 1} are listed in Table 2. Here, the correlation between inputs is not considered in the combinatorial probability computation within an instance, though the state probabilities of inputs A and B are derived taking into account the logical correlation through logic simulation. Then, we examine whether P1, P2, and P3 are under stress for each input combination. Finally, from Table 2, the stress probability of P1 is expressed as the sum of the combinatorial probabilities of the input combinations for which P1 is under stress: α = 0.30 × 0.90 + 0.30 × 0.10 = 0.30.
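The three-step procedure can be sketched for this AND example as follows. Since Fig. 3 is not reproduced here, the topology is an assumption: a NAND stage in which P1 is gated by A and P2 by B, followed by an inverter whose PMOS P3 is gated by the NAND output. The function names are hypothetical, and inputs within the instance are treated as independent, as stated in the text.

```python
from itertools import product


def combinatorial_probabilities(state_prob):
    """Probability of every input combination, assuming independent inputs.

    state_prob: dict mapping input name -> probability of the input being 1.
    Returns a dict mapping a tuple of 0/1 values (in key order) to its
    combinatorial probability.
    """
    names = list(state_prob)
    result = {}
    for values in product((0, 1), repeat=len(names)):
        p = 1.0
        for name, v in zip(names, values):
            p *= state_prob[name] if v == 1 else 1.0 - state_prob[name]
        result[values] = p
    return result


def spcm_c_and2(p_a, p_b):
    """Transistor-level stress probabilities of a NAND + inverter AND gate.

    Assumed topology: P1 gated by A, P2 gated by B, P3 gated by the NAND
    output. A PMOS is under stress when its gate is low.
    """
    alpha = {"P1": 0.0, "P2": 0.0, "P3": 0.0}
    combos = combinatorial_probabilities({"A": p_a, "B": p_b})
    for (a, b), p in combos.items():
        nand_out = 0 if (a and b) else 1
        if a == 0:
            alpha["P1"] += p
        if b == 0:
            alpha["P2"] += p
        if nand_out == 0:
            alpha["P3"] += p
    return alpha


print(spcm_c_and2(0.7, 0.1))
```

For input state probabilities 0.7 and 0.1, this yields approximately 0.30 for P1, 0.90 for P2, and 0.07 for P3, up to floating-point rounding.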
The stress probabilities of 2-input NAND and AND gates calculated by each SPCM for this example are listed in Tables 3 and 4. We first investigate the NAND gate, which has PMOSs in parallel. In Table 3, comparing SPCM-B and -C, the stress probabilities of P1 are significantly different, though those of P2 are similar. SPCM-B cannot cope well with a large difference in input state probability. Moreover, another problem arises with SPCM-B when analyzing the AND gate. In Table 4, using the state probability of the instance output, the stress probability of P3 in the second stage can be well estimated. However, the stress probabilities of P1 and P2 in the first stage cannot be well estimated, because the circuit structure is not considered in SPCM-B. Thus, each method gives different stress probabilities. The impact of this probability difference will be investigated in the next section from a timing degradation point of view.
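To make the contrast concrete, the sketch below evaluates SPCM-A and SPCM-B for the same two gates. It assumes independent inputs with state probabilities 0.7 and 0.1, so the output state probabilities are derived here rather than taken from logic simulation; the helper names are hypothetical.

```python
def spcm_a(pmos_names):
    """Circuit level: every PMOS is assigned 50%."""
    return {name: 0.5 for name in pmos_names}


def spcm_b(pmos_names, output_state_prob):
    """Instance level: every PMOS in the instance is assigned the state
    probability (probability of being high) of the instance output."""
    return {name: output_state_prob for name in pmos_names}


p_a, p_b = 0.7, 0.1       # input state probabilities from the example
p_and = p_a * p_b         # P(AND output = 1), inputs assumed independent
p_nand = 1.0 - p_and      # P(NAND output = 1)

nand = ("P1", "P2")
and2 = ("P1", "P2", "P3")
print("NAND SPCM-A:", spcm_a(nand))
print("NAND SPCM-B:", spcm_b(nand, p_nand))
print("AND  SPCM-A:", spcm_a(and2))
print("AND  SPCM-B:", spcm_b(and2, p_and))
```

Under these assumptions SPCM-B assigns about 93% to both NAND PMOSs, which is close to the transistor-level value for P2 but far from the 30% obtained for P1. For the AND gate it assigns about 7% to all three PMOSs, which fits only the second-stage transistor P3.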
Experimental Setup
Circuits for Evaluation
In this paper, a double floating point unit (D-FPU) and the symmetric-key cipher algorithm AES [16] are adopted as target circuits for evaluation. The specifications and implementation results of D-FPU and AES using an industrial 65 nm standard cell library are summarized in Tables 5 and 6, respectively. Input and output operands of D-FPU are 64-bit, each consisting of a 1-bit sign, an 11-bit exponent, and a 52-bit mantissa. On the other hand, AES performs encryption of a 128-bit plaintext with a 128-bit key, and outputs a 128-bit ciphertext.
In this work, we assume three operating situations.
Table 3 Result of stress probability estimation (2-input NAND gate). Stress probability [%] with SPCM-C: P1 30.0, P2 90.0.
Table 4 Result of stress probability estimation (2-input AND gate). Stress probability [%] with SPCM-C: P1 30.0, P2 90.0, P3 7.0.
Test vectors are generated so that 8 bits of the 11-bit exponent and 40 bits of the 52-bit mantissa of the operands are fixed to 0.
As for AES, the difference originating from operating conditions was limited and hence only general operation with random data will be shown in this paper. For each situation, 100,000 test vectors were prepared for D-FPU. On the other hand, for AES, 100 keys and 1,000 plain texts were prepared.
Procedure of Timing Degradation Analysis
In order to evaluate timing degradation due to NBTI, it is necessary to annotate ΔV th to timing analysis. To annotate ΔV th transistor by transistor, we adopted a commercial transistor-level static timing analyzer (Synopsys NanoTime [17]) in this work. To eliminate estimation differences originating from tool and library differences, the same timing analyzer was used for all SPCMs. Figure 4 illustrates the overall procedure adopted for timing degradation analysis. Firstly, the state probabilities at input/output terminals of all instances are calculated from logic simulation results using given test vectors. Secondly, the stress probabilities of all PMOSs are calculated by SPCM-A through -C. After that, the stress probability for each PMOS is converted to ΔV th by referring to the long term model (Eq. (2)), and it is applied to the transistor-level netlist. Finally, using the netlist including the ΔV th information, we perform static timing analysis. In addition to these three methods, we define DC-stress, in which the stress probability of all PMOSs is 100%, and evaluate the timing degradation for it. In this situation, Eq. (3) is used for ΔV th prediction.
Stress Probability Distribution for All PMOSs
We first examine the difference in stress probability distributions for three operating situations. Stress probability distributions calculated by SPCM-B and -C for all PMOSs of D-FPU and AES are illustrated in Fig. 5 and Fig. 6, respectively. Note that PMOSs which compose feedback not related to timing in DFFs are excluded. In Fig. 5 and Fig. 6, samples of 0%/100% stress probability are included in the range of below 10%/over 90%. In Fig. 5(a), with both SPCM-B and -C, the stress probabilities of most PMOSs are estimated as around 50%. There is a small number of PMOS transistors whose stress probability is around 100%. In contrast, as shown in Figs. 5(b), (c), there are many PMOSs whose stress probability is nearly either 0% or 100%. Moreover, many PMOSs whose stress probabilities are estimated as over 90% in SPCM-C are misestimated as below 10% in SPCM-B. When operation or data is biased, the active circuit portion is limited, and a number of nets are fixed to 0 or 1. In those cases, there is a large difference in stress probability distributions between the SPCMs.
As for AES, in Fig. 6 , although plain texts and keys are randomly given, there are a lot of PMOSs whose stress probabilities are under 10% or over 90%. AES has twenty substitution-boxes whose outputs are fixed to 0 or 1. Therefore, in AES, stress probability distributions are inherently biased toward both 0% and 100%.
Degradation of Critical Path Delay
Following the procedure in Fig. 4 , delay increase of critical path is here estimated for each operating situation.
(1) General Operation with Random Data
Table 7 (a) and Table 8 show the evaluation results of NBTI degradation after 3 years in D-FPU and AES, respectively. In D-FPU, we can see only small differences in delay increase among SPCM-A through -C. On the other hand, in AES, between SPCM-C and a group of SPCM-A and -B, the error of delay increase at 125 °C is 26.9% (= (0.26 − 0.19)/0.26). It is supposed that this difference originates from the difference in the proportion of PMOSs whose stress probability is estimated at either 0% or 100%, shown in Fig. 5(a) and Fig. 6.
In Table 7 (b), compared with Table 7 (a), the amount of delay increase at 125 °C in SPCM-C becomes 1.5 (= 0.87/0.57) times larger. Between SPCM-C and a group of SPCM-A and -B, the error of delay increase at 125 °C reaches 31.0% (= (0.87 − 0.60)/0.87). This error comes from the difference in Fig. 5(b).
In Table 7 (c), compared with Table 7 (a), the amount of delay increase at 125 °C in SPCM-C is 2.7 (= 1.54/0.57) times larger. Between SPCM-C and a group of SPCM-A and -B, the error of delay increase at 125 °C reaches 61.0% (= (1.54 − 0.60)/1.54) due to the mismatch in Fig. 5(c).
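The error and ratio figures quoted above follow from simple arithmetic on the delay increases, which can be checked with a short script. The helper name `relative_error` and the dictionary labels are illustrative; the numeric values are those quoted in the text.

```python
def relative_error(reference, estimate):
    """Relative underestimation of `estimate` with respect to `reference`."""
    return (reference - estimate) / reference


# (delay increase with SPCM-C, delay increase with SPCM-A/-B), as quoted
cases = {
    "AES, Table 8": (0.26, 0.19),
    "D-FPU, Table 7 (b)": (0.87, 0.60),
    "D-FPU, Table 7 (c)": (1.54, 0.60),
}
for label, (ref, est) in cases.items():
    print(f"{label}: {100 * relative_error(ref, est):.1f}%")
# prints 26.9%, 31.0% and 61.0%

# Growth of the SPCM-C estimate relative to Table 7 (a)
print(f"{0.87 / 0.57:.1f}x, {1.54 / 0.57:.1f}x")  # 1.5x, 2.7x
```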
Meanwhile, Fig. 7 shows the V th degradation predicted with Eqs. (2) and (3) as a function of stress probability assuming 125 °C and 3-year operation. The stress probability difference between 0% and 50% causes a 37 mV V th difference. On the other hand, the difference between 50% and 100% corresponds to a shift of as much as 124 mV. As the stress probability approaches 100%, V th degradation drastically increases. In SPCM-A, the stress probability is never estimated as 100%, which can result in an optimistic estimation. Furthermore, with SPCM-B, since the number of PMOSs whose stress probabilities are close to 100% is smaller than with SPCM-C, as described with Fig. 5 and Fig. 6, the estimate of path delay degradation tends to be smaller. In this way, timing degradation in an inactive circuit is often underestimated when the stress probability is estimated with circuit-level or instance-level granularity. From another point of view, in an inactive circuit, the stress probabilities are biased toward not only 100% but also 0%, because a CMOS digital circuit basically consists of inverting logic gates, and therefore a timing analysis assuming DC-stress for all PMOSs provides extremely pessimistic results. For these reasons, the stress probability should be estimated for each PMOS.
Instance-by-Instance ΔV th Annotation for Gate-Level Timing Analysis
Section 3 clarified that transistor-level stress probability computation is necessary for accurate timing degradation analysis. However, transistor-level timing analysis, which is necessary for transistor-by-transistor ΔV th consideration, is less compatible with the gate-level timing verification used in industrial practice for large SoC designs. This is because ordinary gate delay models cannot give a gate delay that reflects transistor-by-transistor ΔV th variation, though ΔV th variation uniformly applied to all PMOSs is often considered in cell library characterization. This section discusses how to exploit the transistor-level stress probability information in gate-level timing analysis, aiming to obtain estimation results close to those by transistor-level timing analysis.
ΔV th Annotation to Each Instance
For the purpose above, we introduce two instance-by-instance annotation methods (IAMs) of |ΔV th |.
IAM-I: Choose and annotate the largest PMOS |ΔV th | within the instance of interest. This annotation always provides pessimistic results, which is a desirable property for static timing analysis.
IAM-II: Choose and annotate the largest |ΔV th | of PMOS on the most timing-critical path within the instance of interest. It is expected to estimate the timing degradation close to that by transistor-by-transistor annotation, because values of |ΔV th | that are large yet less related to the circuit critical path are not annotated. Therefore, this annotation cannot guarantee the pessimism, but helps eliminate the excessive pessimism of IAM-I.
Let us apply IAM-I and -II to 2-input AND shown in Fig. 8 as an example. With IAM-I, 42.6 mV is the largest |ΔV th | of PMOSs in this instance and we annotate it as the |ΔV th | of this instance. On the other hand, with IAM-II, |ΔV th | of P1 (32.2 mV) on the most timing-critical path is selected. This difference between 42.6 mV and 32.2 mV corresponds to reduction in the pessimism of estimated timing degradation.
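The two selection rules can be sketched as below. The values 32.2 mV (P1) and 42.6 mV are taken from the example; which transistor carries 42.6 mV, the value assigned to P3, and the membership of the critical path are assumptions made only for illustration, since Fig. 8 is not reproduced here. The function names are hypothetical.

```python
def iam_i(dvth_mv):
    """IAM-I: annotate the largest PMOS |dVth| within the instance."""
    return max(dvth_mv.values())


def iam_ii(dvth_mv, critical_pmos):
    """IAM-II: annotate the largest |dVth| among the PMOSs on the most
    timing-critical path through the instance."""
    return max(dvth_mv[name] for name in critical_pmos)


# |dVth| in mV. Assigning 42.6 to P2 and 20.0 to P3 is an assumption.
and2 = {"P1": 32.2, "P2": 42.6, "P3": 20.0}
critical = ["P1", "P3"]  # assumed critical path through the instance

print(iam_i(and2))             # 42.6
print(iam_ii(and2, critical))  # 32.2
```

Because IAM-II takes the maximum over a subset of the transistors considered by IAM-I, its annotated value can never exceed that of IAM-I.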
Experimental Results of Timing Degradation
We evaluated the timing degradation of D-FPU with IAM-I and -II using transistor-level stress probabilities obtained by SPCM-C. The evaluation setup is the same as in Sect. 3.3. Table 9 shows the evaluation results of NBTI-induced timing degradation after 3 years in D-FPU. In this table, the timing degradation estimated by the transistor-by-transistor annotation method (TAM), which is equivalent to SPCM-C in Table 7, and that under DC-stress are listed as well.
With IAM-I, the amount of delay increase is larger than that with TAM in Table 9 (a) through Table 9 (c), which means IAM-I preserved the pessimism in estimation as expected. The introduced pessimism is the largest in biased operation with random data (Table 9 (b)), where the amount of delay increase at 125 °C in IAM-I is 2.2 (= 1.93/0.87) times larger than that with TAM.
On the other hand, with IAM-II, the amount of delay increase is close to that with TAM in Table 9 (a) through Table 9 (c), though a small optimism is introduced. In Table 9 (b), the error of delay increase at 125 °C estimated with IAM-II is 12.6% (= (0.87 − 0.76)/0.87), which is smaller than those of SPCM-A and SPCM-B (31.0% and 29.9%).
Discussion
We have discussed the granularity of stress probability computation and ΔV th annotation, and now have five estimation methods.
These results demonstrate that, when all functions are randomly performed, SPCM-A and -B are effective for delay prediction as shown in Table 7. Meanwhile, when the operation or operand is biased, the accuracies of SPCM-A and -B significantly degrade and the error reaches 2.0 ns. We find that instance-level stress probability computation might induce unacceptable error in the timing estimate, even though actual input patterns are used for evaluation.
On the other hand, as shown in Fig. 9 (c) and
NBTI V th degradation models have a certain amount of error, which causes delay estimation uncertainty. If the uncertainty is much larger than the difference of delay degradation discussed above, accurate stress probability computation and annotation become less important. However, the models are still actively studied, and hence a typical value of their accuracy is not available. We therefore show, as reference data, V th degradation values that correspond to the difference of estimated delay degradation in Table 10. These degradation values were obtained by changing V th of all PMOSs in the circuit uniformly so that the critical path delay matched one of the estimated delay degradations. In Table 10, the difference of V th degradation between SPCM-C & TAM and a group of SPCM-A and -B is significantly large (= 65 mV). Meanwhile, comparing SPCM-C & TAM with SPCM-C & IAMs, the differences of V th degradation are 9 mV at most. Taking into account the accuracy of the NBTI V th degradation model, we will need to select an appropriate estimation method of timing degradation. These results indicate that the combination of SPCM-C and IAMs gives reasonably accurate estimation of NBTI-induced timing degradation using an ordinary framework of gate-level timing analysis.
Finally, Table 11 shows the CPU times for the operations accompanying each SPCM and each annotation method. In this work, stress probability calculation and |ΔV th | annotation were executed by Ruby scripts, and hence CPU time could be reduced by implementing them in other languages, such as C and C++. In Table 11, logic simulation consumes notably more CPU time than the other operations. Because logic simulation is not performed in SPCM-A, there is a large difference in total execution time between SPCM-A and a group of SPCM-B and -C. Furthermore, with SPCM-C & IAM-II, preliminary transistor-level timing analysis is required to obtain the order of paths in delay, which is referred to in the annotation process. Owing to this extra timing analysis, CPU time with SPCM-C & IAM-II is larger than those with SPCM-C & TAM/IAM-I, while this CPU time increase is still much smaller than that of logic simulation.
Conclusion
This paper evaluated how much the consideration of stress probability in estimating NBTI-induced delay degradation impacts the accuracy, focusing on analysis granularity. Stress probability calculation was performed for two circuits at three granularity levels: circuit-level, instance-level, and transistor-level. Evaluation results showed that a considerable path delay difference may arise even when instance-level stress probability calculation is performed. We demonstrated that the distribution of stress probabilities heavily depends on the operation of circuits and affects the prediction of path delay. Moreover, in order to enable gate-level timing analysis even with transistor-level stress probability calculation, instance-by-instance |ΔV th | annotation was considered. Using the proposed |ΔV th | annotation, accurate prediction of delay degradation due to NBTI can be performed with ordinary gate-level timing analysis.
