As technology scales, Negative Bias Temperature Instability (NBTI) has become a major reliability concern for circuit designers. Meanwhile, growing process variations can no longer be ignored, and reducing leakage power remains one of the key design goals. In this paper, we first present a platform for NBTI-aware statistical timing and leakage power analysis. We then propose a variation-aware supply voltage assignment (SVA) technique that combines dual V_dd assignment and dynamic V_dd scaling to minimize NBTI degradation and leakage. Based on the statistical platform, we analyze the impact of V_th variations on NBTI degradation and leakage. The experimental results show that our SVA technique mitigates on average 52.98% of NBTI degradation with little or no increase in leakage power; furthermore, it reduces on average 32.46% more leakage power than the pure single V_dd scaling technique. Compared with the scheduled voltage scaling technique [9], our dynamic scaling technique is more effective because the circuit delay exactly meets the specification at each dynamically determined time node during circuit operation.
INTRODUCTION
With the continuous scaling of CMOS technology, Negative Bias Temperature Instability (NBTI) is emerging as one of the major reliability degradation mechanisms [1]. NBTI is an aging effect that gradually increases the threshold voltage of PMOS transistors when they are negatively biased, thus increasing gate delay. Over a long period, this effect degrades circuit speed. Moreover, rapidly growing process and device variations are emerging as key factors influencing circuit performance. Traditional Static Timing Analysis (STA) is becoming insufficient to accurately evaluate the impact of these variations on circuit performance. Instead, Statistical Static Timing Analysis (SSTA) is an effective technique for evaluating the growing variations [2].
Meanwhile, leakage power has become a large portion of total power consumption. During circuit operation, NBTI-induced V_th degradation and process variations may severely affect leakage power: variations can increase the leakage magnitude and widen its distribution [3, 4]. Accurately analyzing and reducing leakage power under the combined impact of NBTI and variations is therefore an important open problem.
Many researchers have explored how to mitigate NBTI degradation and have proposed techniques such as synthesis [5], sizing [6], Input Vector Control (IVC) [7], and Internal Node Control (INC) [8]. These techniques are applied at design time, and the circuit remains fixed during operation. However, because of variations, circuit performance may change during operation and differ from the predicted results.
Recently, a few adaptive techniques have been proposed to mitigate the NBTI effect during circuit operation. Zhang et al. [9] proposed scheduled voltage scaling to increase lifetime: V_dd is gradually increased to compensate for NBTI degradation. Their technique can potentially increase IC lifetime by 46% compared with guard-banding, while using 10 discrete voltage levels increases IC lifetime by 32.5%. However, their results were based on theoretical analysis, and the same predetermined schedule is applied to every circuit. Furthermore, although they achieved lower leakage power than guard-banding, the leakage power is in fact significantly higher than the original value obtained without any NBTI protection. Kumar et al. [10] proposed an Adaptive Body Biasing (ABB) and Adaptive Supply Voltage (ASV) technique that adjusts the supply and body bias voltages during circuit operation to recover circuit performance. Their technique has minimal area overhead and a small power increase compared with guard-banding. However, it still increases leakage by 27% on average compared with the original value. Both Zhang's and Kumar's results were based on deterministic performance estimation without considering variations.
In this paper, we propose a variation-aware supply voltage assignment (SVA) technique that combines dual V_dd assignment and dynamic V_dd scaling, built on a statistical platform, to minimize NBTI degradation and leakage. Our contributions are: 1) we build a statistical platform for timing, NBTI, and leakage analysis that considers V_th variations; 2) we propose a variation-aware SVA technique.
MODEL REVIEW
2.1 Gate delay and NBTI model
According to the alpha-power law, the load-dependent delay of gate v is given by [11]:
$$d_v = \frac{K \, C_L \, V_{dd}}{(V_{dd} - V_{th})^{\alpha}} \tag{1}$$
where C_L is the load capacitance, V_dd and V_th are the supply and threshold voltages, and K and α are constants. NBTI can be described by the Reaction-Diffusion mechanism [12, 13, 14]. When a PMOS transistor is negatively biased, holes in the channel weaken the Si-H bonds, resulting in the generation of positive interface charges and hydrogen species, which increase the PMOS threshold voltage. The threshold voltage shift ∆V_th under static NBTI can be calculated as [13, 14]:
where q is the electron charge, k is the Boltzmann constant, C_ox is the oxide capacitance per unit area, and E_ox = (V_gs - V_th)/T_ox.
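To make the interplay between the two models concrete, the sketch below evaluates the alpha-power-law delay of a gate before and after an NBTI-induced V_th shift. It is a minimal illustration under stated assumptions: instead of the full static NBTI model of [13, 14], it uses a simplified power law ∆V_th(t) = A·t^n with n = 1/6, and the constants K·C_L, α, and A are hypothetical placeholders, not calibrated values.

```python
# Hypothetical, uncalibrated parameters chosen only for illustration.
K_CL = 1.0       # K * C_L lumped into one delay constant (arbitrary units)
ALPHA = 1.3      # alpha-power-law index (assumed)
VTH0 = 0.20      # initial PMOS |Vth| in volts (the 0.20 V used in this paper's setup)
A_NBTI = 1.5e-3  # assumed NBTI prefactor (V / s^n)
N_EXP = 1.0 / 6  # assumed reaction-diffusion time exponent


def gate_delay(vdd: float, vth: float) -> float:
    """Alpha-power-law delay d = K * C_L * Vdd / (Vdd - Vth)^alpha."""
    if vdd <= vth:
        raise ValueError("Vdd must exceed Vth")
    return K_CL * vdd / (vdd - vth) ** ALPHA


def nbti_delta_vth(t_seconds: float) -> float:
    """Simplified static NBTI shift dVth(t) = A * t^n (not the full model of [13, 14])."""
    return A_NBTI * t_seconds ** N_EXP


TEN_YEARS = 10 * 365 * 24 * 3600.0
dvth = nbti_delta_vth(TEN_YEARS)
d_fresh = gate_delay(1.0, VTH0)
d_aged = gate_delay(1.0, VTH0 + dvth)
print(f"dVth after 10 years: {1000 * dvth:.1f} mV")
print(f"gate delay increase: {100 * (d_aged / d_fresh - 1):.1f}%")
```

The same functions show the basic lever of the SVA technique: for a fixed V_th, raising V_dd lowers the delay, so a supply increase can offset the NBTI-induced slowdown.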
Leakage power model
A leakage lookup table is created by simulating every standard cell type under all possible input patterns for varying V_dd and V_th. The leakage power of gate v can thus be expressed as:
$$P_{leak}(v) = V_{dd} \sum_{input} I_{leak}(v, input, V_{dd}, V_{th}) \cdot Prob(v, input)$$
where I_leak(v, input, V_dd, V_th) and Prob(v, input) are the leakage current and the probability of gate v under input pattern input. According to the models in [3, 15], leakage power decreases as NBTI degradation raises V_th and increases under high V_dd stress. Figure 1 shows our statistical NBTI and leakage analysis flow. For a given circuit, a logic simulator generates the voltage level of each internal node. The NBTI-induced delay degradation of each gate at a given time node is calculated with the transistor-level NBTI model, from which the statistical timing of the circuit is computed. The leakage power is estimated with the input-vector-aware leakage lookup tables.
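A minimal sketch of the input-vector-aware leakage estimate for a single gate: each table current is weighted by the probability of its input pattern and the sum is multiplied by V_dd. The table below is a hypothetical two-input NAND entry with made-up currents at one (V_dd, V_th) corner; a real flow would also index or interpolate the table over V_dd and V_th, which is omitted here.

```python
from typing import Dict, Tuple

# Hypothetical lookup table: (cell type, input pattern) -> leakage current in nA
# at a single (Vdd, Vth) corner. A real table would also be indexed by Vdd and Vth.
LEAK_TABLE: Dict[Tuple[str, str], float] = {
    ("NAND2", "00"): 5.0,
    ("NAND2", "01"): 12.0,
    ("NAND2", "10"): 10.0,
    ("NAND2", "11"): 30.0,
}


def gate_leakage_power(cell: str, input_probs: Dict[str, float], vdd: float) -> float:
    """P_leak(v) = Vdd * sum_input I_leak(v, input) * Prob(v, input); nA * V = nW."""
    if abs(sum(input_probs.values()) - 1.0) > 1e-9:
        raise ValueError("input pattern probabilities must sum to 1")
    current = sum(LEAK_TABLE[(cell, pattern)] * p for pattern, p in input_probs.items())
    return vdd * current


# Uniformly random inputs: each pattern occurs with probability 0.25.
uniform = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
print(f"{gate_leakage_power('NAND2', uniform, vdd=1.0):.2f} nW")
```

In practice the input-pattern probabilities would come from the logic simulation step, and the circuit leakage is the sum of this quantity over all gates.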
STATISTICAL ANALYSIS PLATFORM
else v_i is an internal gate then
7. Calculate μ_ATi(t) and σ_ATi(t) using the MAX operation;
8. Calculate μ_LTi(t) and σ_LTi(t) using the ADD operation;
9. end for
10. Calculate μ_LT(t) and σ_LT(t) of the output;
Figure 2. The statistical timing analysis algorithm.
Table 1 shows the variables used in the statistical timing analysis algorithm, and Figure 2 shows the algorithm. We first calculate the correlation coefficients of all the gates. The mean value and standard deviation of V_th at time t are calculated using the NBTI model. We then evaluate all the gates in topological order. If a gate is a primary input, the mean value and standard deviation of its arrival time are both set to 0. Otherwise, it is an internal gate: its arrival time is calculated with the MAX operation over all its inputs, and the leave time at its output is then calculated with the ADD operation. Finally, we calculate the delay of the whole circuit. The total leakage power can be calculated as follows:
where μ_Li(t) and σ_Li(t) are the mean value and standard deviation of the leakage power of gate i at time t.
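The traversal in Figure 2 can be sketched as follows. This simplified version assumes that gate delays are independent Gaussians (the correlation coefficients are ignored, so ADD simply sums means and variances and MAX uses Clark's formula with ρ = 0). It visits gates in topological order and gives primary inputs a zero arrival time. The netlist and delay values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Gaussian = Tuple[float, float]  # (mean, standard deviation)


def _phi(x: float) -> float:
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _Phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def stat_max(a: Gaussian, b: Gaussian) -> Gaussian:
    """Clark's moment-matched MAX of two independent Gaussians."""
    (m1, s1), (m2, s2) = a, b
    theta = math.sqrt(s1 * s1 + s2 * s2)
    if theta == 0.0:
        return (max(m1, m2), 0.0)
    alpha = (m1 - m2) / theta
    mean = m1 * _Phi(alpha) + m2 * _Phi(-alpha) + theta * _phi(alpha)
    second = ((m1 * m1 + s1 * s1) * _Phi(alpha) + (m2 * m2 + s2 * s2) * _Phi(-alpha)
              + (m1 + m2) * theta * _phi(alpha))
    return (mean, math.sqrt(max(second - mean * mean, 0.0)))


def stat_add(a: Gaussian, b: Gaussian) -> Gaussian:
    """ADD of two independent Gaussians: means add, variances add."""
    return (a[0] + b[0], math.sqrt(a[1] ** 2 + b[1] ** 2))


def ssta(order: List[str], fanin: Dict[str, List[str]],
         delay: Dict[str, Gaussian]) -> Dict[str, Gaussian]:
    """Leave time (arrival time + gate delay) of every node; `order` must be topological."""
    leave: Dict[str, Gaussian] = {}
    for g in order:
        if not fanin.get(g):
            arrival: Gaussian = (0.0, 0.0)  # primary input: mean and sigma set to 0
        else:
            ins = [leave[p] for p in fanin[g]]
            arrival = ins[0]
            for other in ins[1:]:
                arrival = stat_max(arrival, other)  # MAX over all fanins
        leave[g] = stat_add(arrival, delay.get(g, (0.0, 0.0)))  # ADD gate delay
    return leave


# Hypothetical netlist: inputs a, b feed g1 and g2, which both feed output g3.
order = ["a", "b", "g1", "g2", "g3"]
fanin = {"g1": ["a", "b"], "g2": ["b"], "g3": ["g1", "g2"]}
delay = {"g1": (10.0, 1.0), "g2": (9.0, 1.5), "g3": (5.0, 0.5)}
print(ssta(order, fanin, delay)["g3"])
```

Handling correlated delays would require the covariance terms in ADD and the correlated form of Clark's formula in MAX.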
Modeling of variation
Many sources of variation strongly affect gate delay. Since gate delay and leakage both depend strongly on V_th, we consider only V_th variations and assume that they follow a Gaussian distribution:
Operations for timing analysis
1) ADD operation: given the mean values and variances of the delays of the n gates on a path, the mean value and variance of the path delay are calculated as:
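Because gate delays inherit the correlation of the gate threshold voltages, a correlated ADD must include covariance terms: μ_path = Σ μ_i and σ²_path = Σ σ_i² + 2 Σ_{i<j} ρ_ij σ_i σ_j. The sketch below implements this standard identity, assuming the pairwise delay correlations ρ_ij are already available as a matrix; the paper's own equation is not reproduced here, and the numbers are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def path_add(means: Sequence[float], sigmas: Sequence[float],
             rho: List[List[float]]) -> Tuple[float, float]:
    """Mean and standard deviation of a sum of correlated Gaussian gate delays.

    rho[i][j] is the (assumed known) correlation between gate delays i and j.
    """
    n = len(means)
    if len(sigmas) != n or len(rho) != n:
        raise ValueError("dimension mismatch")
    mean = sum(means)
    var = sum(s * s for s in sigmas)
    for i in range(n):
        for j in range(i + 1, n):
            var += 2.0 * rho[i][j] * sigmas[i] * sigmas[j]  # covariance terms
    return mean, math.sqrt(var)


# Three gates on a path; adjacent gates correlated at 0.5 (illustrative values).
rho = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.5],
       [0.0, 0.5, 1.0]]
print(path_add([10.0, 12.0, 8.0], [1.0, 1.0, 1.0], rho))
```

With an identity correlation matrix this reduces to the independent case, in which variances simply add.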
2) MAX operation: in many cases, the MAX of two or more Gaussian random variables can be approximated by a Gaussian distribution [16]; its mean value and standard deviation are calculated with Clark's formulas [17]:
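For reference, Clark's moment-matching formulas [17] are given below in their standard form (the notation is ours and may differ from the original). For X_1 ~ N(μ_1, σ_1²) and X_2 ~ N(μ_2, σ_2²) with correlation coefficient ρ, and with φ and Φ denoting the standard normal PDF and CDF:

$$\theta = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}, \qquad \alpha = \frac{\mu_1 - \mu_2}{\theta}$$
$$E[\max(X_1, X_2)] = \mu_1 \Phi(\alpha) + \mu_2 \Phi(-\alpha) + \theta\,\varphi(\alpha)$$
$$E[\max(X_1, X_2)^2] = (\mu_1^2 + \sigma_1^2)\,\Phi(\alpha) + (\mu_2^2 + \sigma_2^2)\,\Phi(-\alpha) + (\mu_1 + \mu_2)\,\theta\,\varphi(\alpha)$$
$$\sigma_{\max}^2 = E[\max(X_1, X_2)^2] - E[\max(X_1, X_2)]^2$$

The MAX over more than two inputs is typically computed by applying these formulas pairwise, approximating each intermediate result as Gaussian.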
Correlation coefficient of all the gates
We assume that logically close gates are also spatially close, so that the threshold voltage of gate v_i is correlated with those of its nearest gates, with each correlation coefficient given by:
The threshold voltage of each gate v_i is expressed as a linear combination of its mean value and random variables:
where the ∆V_i are standard Gaussian random variables and V_i is the mean value of V_i,th. The ρ_i can be obtained from lab measurements, and the a_i can then be calculated as:
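The paper derives the coefficients a_i from the measured ρ_i with its own formula, which is not reproduced here. As a generic alternative (a swapped-in technique, not the paper's formula), the sketch below factors an assumed correlation matrix R by Cholesky decomposition, R = L·Lᵀ, and generates V_i,th = V_i + σ·Σ_j L_ij·∆V_j from independent standard Gaussians ∆V_j, which reproduces the target correlations. The 3-gate matrix, the 0.20 V mean, and the 10 mV σ are illustrative assumptions.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def cholesky(r: Matrix) -> Matrix:
    """Lower-triangular L with r = L * L^T (r must be symmetric positive definite)."""
    n = len(r)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = r[i][i] - s
                if d <= 0.0:
                    raise ValueError("correlation matrix is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (r[i][j] - s) / low[j][j]
    return low


def sample_vth(mean: List[float], sigma: float, low: Matrix,
               rng: random.Random) -> List[float]:
    """One correlated draw: V_i,th = mean_i + sigma * sum_j L_ij * dV_j, dV_j ~ N(0, 1)."""
    dv = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sigma * sum(low[i][j] * dv[j] for j in range(i + 1))
            for i, m in enumerate(mean)]


def corr(xs: List[float], ys: List[float]) -> float:
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Illustrative correlation among three neighbouring gates (assumed values).
R = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.6],
     [0.3, 0.6, 1.0]]
L = cholesky(R)
rng = random.Random(0)
samples = [sample_vth([0.20, 0.20, 0.20], 0.01, L, rng) for _ in range(20000)]
g0 = [s[0] for s in samples]
g1 = [s[1] for s in samples]
print(f"empirical rho(g0, g1) = {corr(g0, g1):.3f} (target 0.6)")
```

Such correlated samples are also what a Monte Carlo reference, like the one used later to validate the statistical platform, would draw from.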
SUPPLY VOLTAGE ASSIGNMENT (SVA)
4.1 Overview of the SVA technique
According to the NBTI model, gate V_th increases over time, which degrades circuit speed. To counteract the degradation, V_dd must be gradually increased. However, increasing V_dd directly increases leakage. It is not necessary to increase the V_dd of all gates, because non-critical gates have delay slack. So instead of using one scaled supply voltage, we propose to use dual V_dd: the high V_dd compensates for NBTI degradation on critical gates, while the low V_dd reduces leakage power on the other gates, which do not affect the circuit delay. Furthermore, in our technique, all protection parameters are calculated dynamically according to the circuit performance constraints. Our technique has two steps (Fig. 3): 1) dual V_dd assignment: divide all gates into two sets, HVS (high V_dd set) and LVS (low V_dd set); 2) dynamic V_dd scaling: dynamically determine the optimal time nodes and voltage values.
Dual V_dd assignment
First, the nominal delay and leakage power at time 0 are calculated. Then we determine the two gate sets, HVS (high V_dd set) and LVS (low V_dd set). A low-V_dd gate cannot directly drive a high-V_dd gate without a level converter. To avoid level converters, HVS includes all the critical gates and all predecessors of critical gates; LVS is composed of all remaining gates, which do not directly affect the circuit delay. Generally, HVS is larger than LVS for most circuits. Based on the ratio between HVS and LVS for each circuit, we can approximately estimate the potential of our technique for NBTI degradation mitigation and leakage reduction.
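The HVS/LVS split can be sketched as a backward breadth-first search from the critical gates. This minimal sketch assumes that critical gates are those with (near-)zero slack and that the netlist is given as fanin lists; the netlist and slack values are hypothetical. Because every gate that drives an HVS gate is itself a predecessor of a critical gate, no LVS gate ever drives an HVS gate, so no level converter is needed.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def partition_vdd(fanin: Dict[str, List[str]], slack: Dict[str, float],
                  eps: float = 1e-9) -> Tuple[Set[str], Set[str]]:
    """Split gates into HVS (critical gates plus all their predecessors) and LVS."""
    critical = {g for g, s in slack.items() if s <= eps}
    hvs: Set[str] = set(critical)
    queue = deque(critical)
    while queue:  # walk backwards through the fanin cone of the critical gates
        g = queue.popleft()
        for p in fanin.get(g, []):
            if p in slack and p not in hvs:  # only gates; primary inputs are skipped
                hvs.add(p)
                queue.append(p)
    lvs = set(slack) - hvs
    return hvs, lvs


# Hypothetical netlist: g2 and g4 lie on the critical path (zero slack).
fanin = {"g1": ["a"], "g2": ["g1", "b"], "g3": ["b"], "g4": ["g2", "g3"], "g5": ["g3"]}
slack = {"g1": 3.0, "g2": 0.0, "g3": 2.0, "g4": 0.0, "g5": 4.0}
hvs, lvs = partition_vdd(fanin, slack)
print(sorted(hvs), sorted(lvs))
```

Note that g1 and g3 land in HVS despite having slack, which illustrates the leakage cost of avoiding level converters.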
Dynamic V_dd scaling
In our technique, we set a delay specification for each circuit, chosen as the circuit's delay at t = 10 days.
At time 0, we first calculate the mean value and standard deviation of delay and leakage, and then determine the optimal V_ddhigh(t_1) and V_ddlow(t_1), which are applied during the following time interval [t_0, t_1]. The HVS gates are assigned V_ddhigh, while the LVS gates are assigned V_ddlow. The determination of V_ddhigh and V_ddlow is described in detail below. With the new V_ddhigh(t_1) and V_ddlow(t_1), we estimate the V_th degradation of each gate at later times using the same method as in [9], and then predict the next time node t_1 at which the circuit delay will exceed the specification and the supply voltages need to be scaled again.
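A toy version of the time-node prediction loop, under loudly simplified assumptions: the critical-path delay follows the alpha-power law with a power-law NBTI shift that ignores its voltage dependence, V_dd is raised by a fixed step at each node rather than optimized, and all constants are hypothetical. Under a fixed V_dd the delay grows monotonically with time, so the next time node can be found by bisecting for the instant at which the delay reaches the specification (here, the delay at t = 10 days).

```python
from typing import Callable, List, Tuple

ALPHA = 1.3       # hypothetical alpha-power index
VTH0 = 0.20       # initial |Vth| (V)
A_NBTI = 1.5e-3   # hypothetical NBTI prefactor
N_EXP = 1.0 / 6   # hypothetical NBTI time exponent


def circuit_delay(vdd: float, t: float) -> float:
    """Toy critical-path delay: alpha-power law with an NBTI-shifted Vth (arbitrary units)."""
    vth = VTH0 + A_NBTI * t ** N_EXP
    return vdd / (vdd - vth) ** ALPHA


def crossing_time(delay_at: Callable[[float], float], t_lo: float, t_hi: float,
                  spec: float, tol: float = 1.0) -> float:
    """Bisection for the first time in (t_lo, t_hi] at which a monotone delay reaches spec."""
    while t_hi - t_lo > tol:
        mid = 0.5 * (t_lo + t_hi)
        if delay_at(mid) >= spec:
            t_hi = mid
        else:
            t_lo = mid
    return t_hi


def schedule(vdd0: float, spec: float, lifetime: float, step: float) -> List[Tuple[float, float]]:
    """Raise Vdd by `step` at each predicted time node; return (time node, new Vdd) pairs."""
    t, vdd, nodes = 0.0, vdd0, []
    while circuit_delay(vdd, lifetime) >= spec:  # spec would be violated before end of life
        t = crossing_time(lambda x: circuit_delay(vdd, x), t, lifetime, spec)
        vdd += step
        nodes.append((t, vdd))
    return nodes


LIFETIME = 10 * 365 * 24 * 3600.0           # 10-year lifetime (s)
SPEC = circuit_delay(1.0, 10 * 24 * 3600.0)  # spec = delay at t = 10 days
nodes = schedule(vdd0=1.0, spec=SPEC, lifetime=LIFETIME, step=0.01)
for t, v in nodes:
    print(f"t = {t / 86400:8.1f} days -> Vdd = {v:.2f} V")
```

In the actual technique, the fixed step would be replaced by the optimized V_ddhigh and V_ddlow at each node.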
The same three-step procedure (determine the optimal V_ddhigh(t_{i+1}), determine the optimal V_ddlow(t_{i+1}), and predict the next time node t_{i+1}) is repeated at each time node t_i until the end of the circuit lifetime. Figure 4 shows how V_dd scaling compensates for the delay degradation: the delay drops suddenly at time t_i, immediately after the optimal V_ddhigh(t_{i+1}) is assigned.
Determine the optimal V_ddhigh
Notice that the higher V_ddhigh(t_{i+1}) is, the smaller the circuit delay at time t_i will be. However, a higher V_dd leads to larger NBTI-induced V_th degradation and higher leakage power. Our objective is to make the circuit delay during the interval [t_i, t_{i+1}] and the leakage power at time t_i (the leakage at t_i is the maximum value during [t_i, t_{i+1}]) reach optimal values simultaneously. Under the statistical model, the objective is to minimize the following function:
where A and B are two weight constants that balance leakage and circuit delay, and D(t_{i+1}) and L(t_i) are the circuit delay at t_{i+1} and the circuit leakage power at t_i, respectively. However, t_{i+1} cannot be predicted before V_ddhigh(t_{i+1}) is calculated, because t_{i+1} depends on V_ddhigh(t_{i+1}), which in turn depends on t_{i+1}. We therefore simply assume that t_{i+1} = t_i + (t_i - t_{i-1}). This assumption is used only in equation (12) to find the optimal V_ddhigh; once V_ddhigh is calculated, the actual t_{i+1} is predicted using the NBTI degradation model. An approximate numerical algorithm is implemented to find the minimum of equation (12).
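Because equation (12) is not reproduced here, the following sketch assumes a weighted-sum objective A·D(t_{i+1})/D_spec + B·L(t_i)/L_ref with both terms normalized. It uses the provisional t_{i+1} = t_i + (t_i - t_{i-1}) described above and a plain grid search as a stand-in for the paper's approximate numerical algorithm. The delay and leakage models (alpha-power law with a power-law NBTI shift, leakage growing exponentially with V_dd) and all constants, including the weights, are hypothetical.

```python
import math
from typing import List, Tuple

ALPHA, VTH0 = 1.3, 0.20          # hypothetical alpha-power index and initial |Vth| (V)
A_NBTI, N_EXP = 1.5e-3, 1.0 / 6  # hypothetical NBTI prefactor and time exponent
KAPPA, S_VTH = 4.0, 0.1          # hypothetical leakage sensitivity to Vdd (1/V) and Vth (V)
YEAR = 365 * 24 * 3600.0


def vth(t: float) -> float:
    """Toy NBTI-shifted |Vth|; the voltage dependence of NBTI is ignored here."""
    return VTH0 + A_NBTI * t ** N_EXP


def delay(v: float, t: float) -> float:
    """Toy alpha-power-law critical-path delay."""
    return v / (v - vth(t)) ** ALPHA


def leakage(v: float, t: float) -> float:
    """Toy leakage: grows with Vdd, shrinks as NBTI raises Vth."""
    return v * math.exp(KAPPA * (v - 1.0) - (vth(t) - VTH0) / S_VTH)


D_SPEC = delay(1.0, 10 * 24 * 3600.0)  # spec = delay at t = 10 days
L_REF = leakage(1.0, 0.0)              # nominal leakage at time 0


def optimal_vddhigh(t_prev: float, t_i: float, a_w: float, b_w: float,
                    grid: List[float]) -> Tuple[float, float]:
    """Grid search minimizing A*D(t_next)/D_SPEC + B*L(t_i)/L_REF (an assumed objective)."""
    t_next = t_i + (t_i - t_prev)  # provisional t_{i+1}, used only inside the objective

    def objective(v: float) -> float:
        return a_w * delay(v, t_next) / D_SPEC + b_w * leakage(v, t_i) / L_REF

    return min(grid, key=objective), t_next


grid = [0.90 + 0.005 * k for k in range(81)]  # 0.90 V .. 1.30 V
v_best, t_next = optimal_vddhigh(0.25 * YEAR, 0.5 * YEAR, a_w=10.0, b_w=1.0, grid=grid)
print(f"Vddhigh = {v_best:.3f} V, provisional t_next = {t_next / YEAR:.2f} years")
```

With these toy numbers the delay term dominates near the bottom of the grid and the leakage term near the top, so the minimum falls strictly inside the grid; with A = 0 the search simply returns the lowest voltage.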
Determine the optimal V_ddlow
The gates in LVS have delay slack, so their delay can be relaxed and their V_dd can be lowered:
$$D_{relax}(v) = D_{current}(v) + C \cdot D_{slack}(v)$$
where D_relax, D_current, and D_slack are the relaxed delay, the current delay, and the delay slack of gate v, respectively. C is a constant between 0 and 1 that ensures a gate in LVS does not become critical. Given the relaxed gate delay, the low V_dd of gate v, denoted V_ddlow(v), can be calculated from equation (1). The optimal V_ddlow of the whole circuit is the maximum of all the V_ddlow(v) values, since a single low supply must satisfy the relaxed delay of every LVS gate.
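A sketch of the V_ddlow step, assuming D_relax = D_current + C·D_slack and the alpha-power delay form d(V) = K·C_L·V/(V - V_th)^α. For α ≥ 1 and V > V_th this delay decreases monotonically in V, so each gate's lowest admissible supply can be found by bisection, and the circuit-level V_ddlow is the maximum over the LVS gates. The gate data, C = 0.5, and α = 1.3 are hypothetical.

```python
from typing import Dict, Tuple

ALPHA = 1.3  # hypothetical alpha-power index


def alpha_delay(v: float, vth: float, k_cl: float) -> float:
    """d = K*C_L * V / (V - Vth)^alpha, with k_cl lumping K*C_L."""
    return k_cl * v / (v - vth) ** ALPHA


def gate_vddlow(vdd_now: float, vth: float, k_cl: float, d_slack: float,
                c: float = 0.5, tol: float = 1e-6) -> float:
    """Lowest Vdd whose delay does not exceed D_relax = D_current + c * D_slack."""
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    d_relax = alpha_delay(vdd_now, vth, k_cl) + c * d_slack
    lo, hi = vth + 1e-3, vdd_now  # delay(lo) is huge; delay(hi) <= d_relax
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if alpha_delay(mid, vth, k_cl) <= d_relax:
            hi = mid  # still meets the relaxed delay: try a lower supply
        else:
            lo = mid
    return hi


def circuit_vddlow(gates: Dict[str, Tuple[float, float, float]], vdd_now: float,
                   c: float = 0.5) -> float:
    """Single low rail for LVS: the maximum of the per-gate minimum voltages."""
    return max(gate_vddlow(vdd_now, vth, k_cl, slack, c) for vth, k_cl, slack in gates.values())


# Hypothetical LVS gates: name -> (|Vth| in V, K*C_L, slack in delay units)
lvs = {"g5": (0.21, 1.0, 0.40), "g6": (0.22, 1.2, 0.10), "g7": (0.21, 0.8, 0.60)}
print(f"Vddlow = {circuit_vddlow(lvs, vdd_now=1.0):.3f} V")
```

The gate with the least relative slack (g6 here) sets the shared low rail, which is why taking the maximum is necessary.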
Overhead of our technique
The dynamic V_dd scaling technique requires a voltage regulator and a DAC [9]; a detailed analysis of this overhead is beyond the scope of this paper. On the other hand, we do not use level converters. The price is that, besides the critical gates, HVS also includes all predecessors of critical gates, which may not be critical at all and may still have delay slack. If level converters were used to move such gates into LVS, the trade-off between the leakage saving and the penalty induced by the level converters would have to be carefully balanced.
SIMULATION RESULTS

Implementation and Experiment Setup
Our platform and algorithms are implemented in C++. The benchmark circuits are synthesized using an industrial 65nm library. Key technology parameters are: nominal |V_dd| = 1.0 V; |V_th| = 0.20 V for both NMOS and PMOS transistors; T_ox = 1.2 nm. The ISCAS85 benchmarks and several ALU circuits are used to evaluate our platform and algorithms. The temperature is set to 378 K, corresponding to worst-case NBTI degradation and leakage. The circuit lifetime is set to 10 years.
Impact of variations on NBTI and leakage
Figure 5 shows the impact of V_th variations on NBTI for c880. The mean circuit delay considering variations is larger than the nominal value without variations. The delay distribution is shown by its lower and upper bounds (μ ± 3σ). As time passes, the mean value increases due to the NBTI effect, while the distribution becomes narrower, which matches the silicon results in [12]. Figure 6 shows the impact of V_th variations on leakage. The leakage decreases due to the V_th shifts, which is opposite to the trend in circuit delay. As with delay, the mean leakage considering variations is larger than the nominal value, and the leakage distribution becomes narrower over time. Figure 7 compares our statistical timing results with Monte Carlo simulation for c880 at three time nodes: 0, 3 years, and 10 years. It also shows that the mean delay increases and its standard deviation decreases, consistent with Figure 5. Because of NBTI degradation, the lower bound (μ - 3σ) of the delay after 3 years is larger than the upper bound (μ + 3σ) of the delay at time 0. Therefore, NBTI is a major reliability degradation mechanism and cannot be ignored in circuit estimation and optimization. Compared with Monte Carlo simulation over all the circuits, the average error of our statistical timing results is 0.98% for the mean circuit delay and 7.48% for its standard deviation.
Runtime: our statistical platform is at least 5X faster (from 0.016 s to 307 s for benchmark circuits from array4x4 to C7552) than the Monte Carlo method (from 16 s to 1542 s for the same circuits).
Figure 6. The impact of variations on leakage power for c880.
Results of variation-aware SVA technique
The effect of our technique
Figures 8 and 9 show how our technique performs on c1908. At the beginning, our technique yields the same circuit delay as the nominal delay, because the delay has not yet reached the specification. Once the delay exceeds the specification, the dual V_dd voltages are scaled, so the circuit delay drops suddenly and the leakage also changes abruptly. During circuit operation, the same procedure is repeated at each time node to guarantee that the circuit delay does not exceed the specification while the leakage power is reduced as much as possible. Because the high V_dd is increased, the delay under our technique always stays below the nominal degraded delay. For c1908, we obtain a 72.99% NBTI degradation saving compared with the nominal design. Figure 9 shows the leakage power reduction. The leakage power increases gradually because we increase the high V_dd of the HVS gates, which outnumber the LVS gates in c1908. During early operation, our technique achieves lower leakage power than the nominal design; later, the leakage power increases and exceeds the nominal value. However, our dual V_dd technique keeps the leakage power as low as possible and achieves a 3.29% reduction in maximum leakage power for c1908. Figure 10 shows the V_dd scaling for c1908: the high V_dd increases to counteract the NBTI degradation, while the low V_dd decreases to reduce leakage. This is not true for all circuits, however, because the NBTI effect changes the circuit timing during operation, and the slack of each gate may change significantly. Figure 11 shows the V_dd scaling for c7552: the high V_dd increases, but the low V_dd alternately increases and decreases.
Figure 12 compares the single V_dd scaling technique with our dual V_dd scaling technique. In single V_dd scaling, a single high V_dd is used for all gates and scaled to mitigate the NBTI degradation (similar, but not identical, to [9] or [10]). Over the whole circuit lifetime, the delay of our approach is very close to that of single V_dd scaling. However, our approach saves 47% more leakage power than pure single V_dd scaling for c1908. The results show that the SVA technique mitigates on average 52.98% of NBTI degradation and simultaneously saves 2.5% of maximum leakage. In contrast, single V_dd scaling mitigates only 6% more delay degradation but increases leakage by 29.96%. We also compare the results for large circuits with the average results. Large circuits have more LVS gates than average, so our approach achieves more leakage saving: compared with single V_dd scaling, our technique reduces 56.5% more leakage for large circuits. Furthermore, the number of voltage levels, the time nodes, and the optimal voltage values differ across circuits. The average number of voltage levels is 9.6, which is close to the optimal number of levels analyzed in [9]. However, scheduled voltage values and time nodes are not optimal for every circuit. Figure 13 shows the results for c499 using 10 voltage levels and 10 time nodes chosen as in [9]. The comparison shows that the scheduled technique adjusts the voltage too frequently at the beginning but less frequently later on, so it cannot guarantee that the delay stays within the specification during circuit operation.
Runtime: the runtime of our SVA ranges from 0.078s to 621s for the benchmark circuits array4x4 to C7552.
The impact of variations on results
We also implement our technique on an STA platform that does not consider V_th variations. Table 3 shows the results obtained on the STA platform. Compared with the SSTA results, the STA results appear better: they save 2.8% more NBTI degradation and 8.24% more leakage power. However, this comparison shows that the STA model, which ignores variations, underestimates NBTI degradation and leakage, both of which may be severely affected by variations (Figures 5 and 6). Therefore, SSTA is recommended for more accurate optimization.
CONCLUSION
Power and reliability have become two key design goals as technology scales. Meanwhile, growing variations can severely affect circuit delay and leakage. In this paper, a supply voltage assignment technique combining dual V_dd assignment and dynamic V_dd scaling is proposed on a statistical platform to minimize circuit degradation and leakage. Our technique saves on average 52.98% of NBTI degradation and reduces leakage by 2.5%, with larger leakage savings for large circuits. Compared with single V_dd scaling, our technique saves on average 32.46% more leakage. Furthermore, our technique is more flexible, since the optimal parameters are determined dynamically for each circuit so that the circuit delay exactly meets the specification at each time node during circuit operation.
