Abstract
Introduction
As CMOS technology continues to scale, leakage power is becoming a dominant contributor to total power consumption. Many techniques have been proposed to reduce leakage power, including transistor sizing, multi-V_th, dual-V_th, optimal standby input vector selection, dual-V_DD, transistor stacking, and body bias.
Dual-V_th assignment [1-9, 11, 12] is an efficient technique for leakage reduction. Traditional deterministic approaches to dual-threshold assignment [1-6, 11, 12] use the timing slack of noncritical paths to assign high V_th to some or all gates on those paths and thereby decrease leakage. These approaches fall into two groups: heuristic algorithms [1-4] and linear programming [5, 6, 11, 12]. A heuristic algorithm can only guarantee a locally optimal solution, whereas a linear programming formulation ensures a globally optimal one.
However, the increased variation of process parameters in nanoscale devices can cause a significant increase in leakage current, because leakage depends exponentially on some key process parameters. Gate leakage is most sensitive to variation in oxide thickness (T_ox). Subthreshold current is extremely sensitive to variation in oxide thickness (T_ox), effective gate length (L_eff), and doping concentration (N_dop). Compared with gate leakage, subthreshold leakage is more sensitive to parameter variations [10]. Variations of 20% in effective channel length and in oxide thickness can change the subthreshold leakage current by up to 13 times and 15 times, respectively. A 20% variation in oxide thickness can change the gate leakage by 8 times.
Variation of process parameters not only affects the leakage current but also changes the gate delay, degrading the power yield, the timing yield, or both, of an optimized design. To minimize the effect of process variation, some techniques [7-9] statistically optimize leakage power and circuit performance by dual-V_th assignment, treating leakage current and delay as random variables. A dynamic programming approach for leakage optimization by dual-V_th assignment [7] uses two pruning criteria that stochastically identify Pareto-optimal solutions and prune the sub-optimal ones. Another approach [9] solves the statistical leakage minimization problem using a theoretically rigorous formulation for dual-V_th assignment and gate sizing.
Our work is motivated by the above research. We propose a mixed integer linear programming (MILP) model that minimizes leakage power for a specified timing yield under process variation. The number of constraints in this MILP is linear in the number of gates. Although the theoretical worst-case complexity of MILP is exponential, our experimental results show that the actual complexity depends on the nature of the problem. To deal with the complexity of the delay models and leakage calculation, two look-up tables, one for nominal delay and one for nominal leakage current, are pre-constructed for each cell, which greatly simplifies the optimization procedure. Experimental results show that the statistical approach achieves about 30% more leakage power reduction than a deterministic approach.

This paper is organized as follows. Section 2 presents a deterministic linear programming formulation. Section 3 discusses statistical leakage and gate delay modeling and proposes a statistical linear programming method for leakage minimization under process variation. Section 4 presents and discusses experimental results. Section 5 concludes the paper.
Deterministic Dual-V_th ILP
In the deterministic approach, the delay and subthreshold current of every gate are assumed to be fixed, unaffected by process parameter variation. As noted in the introduction, such methods are either heuristic algorithms [1-4], which give a locally optimal solution, or linear programming formulations [5, 6, 11, 12], which ensure a globally optimal solution.
An ILP that optimizes the leakage power and assigns dual-V_th to gates in one step [11, 12] has an advantage over an iterative procedure [5], which must assume that the power-delay sensitivities are constant over a small range. Figure 1 gives the basic idea of the ILP method [11, 12], which minimizes total subthreshold leakage by dual-V_th assignment while maintaining circuit performance. A detailed version of the ILP formulation is presented in Figure 2. X_i is an integer variable that can only be 0 or 1: a value of 1 means that gate i is assigned low V_th, and 0 means that gate i is assigned high V_th. T_i is the latest arrival time at the output of gate i. Each gate in the design library, in both its low-threshold and high-threshold versions, is characterized using SPICE simulation for its leakage in various input states and for its gate delay, which also depends on the fanout number.
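Figures 1 and 2 are not reproduced in this text, so the following is only a minimal sketch of the optimization they describe. It assumes a hypothetical four-gate netlist with made-up delay and leakage values, and it enumerates every X_i assignment instead of calling an ILP solver. Exhaustive search is practical only for toy circuits, but for this objective and these arrival-time constraints it finds the same optimum an ILP solver would.

```python
from itertools import product

# Hypothetical netlist in topological order: gate -> list of fanin gates.
FANIN = {"g1": [], "g2": [], "g3": ["g1", "g2"], "g4": ["g2"]}
PRIMARY_OUTPUTS = ["g3", "g4"]

# Illustrative (delay, leakage) pairs per gate; not characterized data.
LOW_VTH = {"g1": (1.0, 10.0), "g2": (1.0, 10.0), "g3": (2.0, 20.0), "g4": (1.0, 10.0)}
HIGH_VTH = {"g1": (1.4, 1.0), "g2": (1.4, 1.0), "g3": (2.8, 2.0), "g4": (1.4, 1.0)}


def evaluate(x):
    """Return (critical delay, total leakage) for an assignment x.

    x[g] == 1 selects low V_th for gate g, x[g] == 0 selects high V_th.
    """
    arrival = {}
    leakage = 0.0
    for gate, fanins in FANIN.items():
        delay, leak = LOW_VTH[gate] if x[gate] else HIGH_VTH[gate]
        # T_i = latest fanin arrival time plus the delay of gate i.
        arrival[gate] = max((arrival[f] for f in fanins), default=0.0) + delay
        leakage += leak
    return max(arrival[po] for po in PRIMARY_OUTPUTS), leakage


def optimize(t_max):
    """Minimum-leakage assignment meeting t_max, or None if infeasible."""
    best = None
    gates = list(FANIN)
    for bits in product((0, 1), repeat=len(gates)):
        x = dict(zip(gates, bits))
        delay, leak = evaluate(x)
        if delay <= t_max + 1e-9 and (best is None or leak < best[0]):
            best = (leak, x)
    return best


print(optimize(3.4))
```

In this toy case the gate on the critical path (g3) keeps low V_th while the gates with slack move to high V_th, which is the behavior the deterministic formulation is meant to produce.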
Statistical Dual-V_th Assignment
Process variations include inter-die and intra-die variations, or global and local variations. For inter-die variations, the deterministic and statistical approaches are exactly the same. Since our objective is to have a statistical ILP formulation that enhances the deterministic approach to leakage optimization under process variations, we ignore the inter-die variation. In the remainder of this paper, process variation will only mean intra-die variation.
Leakage current is composed of reverse-biased PN junction leakage, gate leakage, and subthreshold leakage. In a sub-micron process, PN junction leakage is much smaller than the other two components. Gate leakage is most sensitive to variation in T_ox, and changes in gate leakage due to other process parameter variations can be ignored [10]. Further, assuming T_ox to be a well-controlled process parameter [14, 15], we ignore gate leakage variation in our design and focus only on changes in subthreshold leakage due to process variation.
Because subthreshold current depends exponentially on process parameters such as effective gate length, oxide thickness, and doping concentration, process variation can severely affect both the power and timing yields of a design obtained by a deterministic method. Fixed values of subthreshold leakage and gate delay do not represent the real circuit condition, so statistical modeling should be used. This is discussed next.
Statistical Subthreshold Leakage Modeling
Subthreshold current has an exponential relation with the threshold voltage, which in turn is a function of oxide thickness, effective channel length, doping concentration, etc. T_ox is a fairly well-controlled process parameter and does not significantly influence subthreshold leakage variation [14, 15]. Therefore, we only consider variations in L_eff and N_dop. The statistical subthreshold model can be written as [15]:
where ΔL_eff is the change in the effective channel length due to process variation and ΔV_th,Ndop is the change in the threshold voltage due to the random distribution of the doping concentration, N_dop. Both are random variables with a normal distribution, N(0,1). The fitting parameters c_1, c_2, and c_3 are determined from SPICE simulation.
From equation (1), it follows that I_sub has a lognormal distribution. The total leakage current in a circuit, which is the sum of the subthreshold currents of the individual gates, has an approximately lognormal distribution. Rao et al. [15] use the central limit theorem to estimate this distribution by its mean value, assuming that the circuit contains a large number of gates, which is indeed the case for most present-day chips. Hence, the total leakage can be expressed as:
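To illustrate why the mean of a lognormal leakage distribution, rather than the nominal value, is the relevant estimate for a large circuit, here is a small sketch. It assumes a simplified single-parameter model I = I_nom · exp(s · Z) with Z ~ N(0,1) and independent gates, which is not the fitted model of equation (1). The spread s, the gate count, and the function names are illustrative assumptions.

```python
import math
import random


def lognormal_mean(i_nom, s):
    """Mean of I = i_nom * exp(s * Z) with Z ~ N(0, 1): i_nom * exp(s^2 / 2)."""
    return i_nom * math.exp(0.5 * s * s)


def simulate_total_leakage(n_gates, i_nom, s, rng):
    """Draw one chip: the sum of independent lognormal gate leakages."""
    return sum(i_nom * math.exp(s * rng.gauss(0.0, 1.0)) for _ in range(n_gates))


rng = random.Random(0)           # fixed seed so the run is repeatable
n_gates, i_nom, s = 2000, 1.0, 0.3

total = simulate_total_leakage(n_gates, i_nom, s, rng)
predicted = n_gates * lognormal_mean(i_nom, s)
nominal = n_gates * i_nom

print(f"nominal sum         = {nominal:.1f}")
print(f"mean-based estimate = {predicted:.1f}")
print(f"one simulated chip  = {total:.1f}")
```

With many gates, a single simulated chip lands close to the mean-based estimate, and both sit above the nominal sum because the exponential skews the distribution upward.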
Statistical Delay Modeling
The deterministic gate delay D is given by [13] :
where α equals 1.3 for the short channel model. As in the subthreshold current model, our statistical delay model accounts for the deviation of V_th caused by process parameter variation. The change in V_th due to the variation of process parameters can be expressed as [7]:
where X_i is a process parameter, X_i0 is the nominal value of X_i, and β_Xi is a constant for the specific technology.
To get an approximately linear relation between D and the variations of the process parameters, equation (5) is expanded into a Taylor series (7) in which only the first-order term is retained, because the higher-order terms are relatively small and can be ignored.
Equation (8) 
Because r_i is a normal N(0,1) random variable, μ_Di, the mean value of D_i, is equal to D_nom,i, the nominal delay of gate i.
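Equation (5) is not reproduced in this text. As a hedged illustration of the first-order expansion, assume the gate delay follows the alpha-power-law form $D = K C_L V_{DD} / (V_{DD} - V_{th})^{\alpha}$, where $K$ and $C_L$ are assumed symbols for a technology constant and the load capacitance, and $V_{th0}$ denotes the nominal threshold voltage. Under that assumption the linearization reads:

```latex
\begin{align*}
\frac{\partial D}{\partial V_{th}}
  &= \frac{\alpha\, D}{V_{DD} - V_{th}}, \\
D(V_{th0} + \Delta V_{th})
  &\approx D_{nom}\left(1 + \frac{\alpha\, \Delta V_{th}}{V_{DD} - V_{th0}}\right), \\
\mu_D &\approx D_{nom}, \qquad
\sigma_D \approx D_{nom}\,\frac{\alpha\, \sigma_{\Delta V_{th}}}{V_{DD} - V_{th0}} .
\end{align*}
```

Because the expansion is linear in $\Delta V_{th}$, a zero-mean normal $\Delta V_{th}$ gives a normally distributed delay whose mean equals the nominal delay, which agrees with the statement about the mean of D_i.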
Statistical Dual-V_th Assignment ILP
In the statistical approach to minimizing leakage power by dual-V_th assignment (Figure 3), the delay and subthreshold current are both random variables, and η is the expected timing yield. The power yield is not considered because, as Section 4 (Results) shows, the statistical approach achieves about 30% additional leakage power reduction for most circuits compared to the deterministic approach. In Figure 3, T_POi is the path delay from the primary inputs to the i-th primary output and is assumed to have a normal (Gaussian) distribution N(μ_TPOi, σ²_TPOi). Inequality (12) allows leakage to be optimized with timing yield η, and it can be expressed in a linear form using the percent point function Φ^-1 [16]:

μ_TPOi + Φ^-1(η) · σ_TPOi ≤ T_max
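As a sketch of how the yield constraint (12) becomes linear, assume T_POi is normal with mean mu and standard deviation sigma. Then P(T_POi ≤ T_max) ≥ η holds exactly when mu + Φ^-1(η) · sigma ≤ T_max. The snippet checks this equivalence numerically with Python's standard `statistics.NormalDist`; the function names and numeric values are illustrative assumptions.

```python
from statistics import NormalDist


def yield_factor(eta):
    """Percent point function (inverse CDF) of the standard normal at eta."""
    return NormalDist().inv_cdf(eta)


def meets_timing_yield(mu, sigma, t_max, eta):
    """Linear form of P(T <= t_max) >= eta for T ~ N(mu, sigma^2)."""
    return mu + yield_factor(eta) * sigma <= t_max


def timing_yield(mu, sigma, t_max):
    """Direct evaluation of P(T <= t_max), used here as a cross-check."""
    return NormalDist(mu, sigma).cdf(t_max)


for eta in (0.99, 0.95):
    print(f"eta = {eta}: yield factor = {yield_factor(eta):.3f}")

# Hypothetical path: mean delay 1.0, standard deviation 0.1, T_max = 1.2.
print(meets_timing_yield(1.0, 0.1, 1.2, 0.99), timing_yield(1.0, 0.1, 1.2))
```

The yield factor is about 2.326 for 99% and 1.645 for 95%, so a lower yield target reserves less timing margin per unit of standard deviation and leaves more slack for high-V_th gates.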
In the statistical linear program (Figure 4), all variables except X_i are random variables with normal distributions. Comparing the deterministic ILP (Figure 2) and the statistical ILP (Figure 4), we observe the following differences:
• The deterministic gate delay in (D-C1) is extended to (S-C1) and (S-C2) to get the mean and standard deviation of the statistical delay.
• (D-C2) is extended to (S-C3) through (S-C6) to get the mean and standard deviation of the statistical arrival time T_i at the output of gate i.
• (D-C4) is updated to (S-C8) to ensure a certain timing yield under process variation.

Figure 3: Minimize the total subthreshold leakage I_sub,total (11), subject to P(T_POi ≤ T_max) ≥ η (12).
Linear Approximations
In linear programming, all the expressions and constraints should be linear functions. However, in statistical analysis, some nonlinear operations are present. We, therefore, use linear approximations.
• ADD, A = B + C: If B and C are N(μ, σ²) random variables, then their sum A also has a normal distribution. 'Add' is a linear function, but in statistical analysis, to obtain the standard deviation σ_A, we must deal with σ_A² = σ_B² + σ_C², which is a nonlinear operation.
• MAX, A = MAX(B, C): If B and C are N(μ, σ²) random variables, A does not necessarily have a normal distribution [17, 18]. However, a normal approximation to A, defined by a mean and a standard deviation, has been used [19, 20], and the error in this approximation has been shown to be small [19, 20].
• MAX, A = MAX(B+D, C+D): Similarly, for the function A = MAX(B+D, C+D), we use (16) and (17).

The above linear approximations are used in our statistical analysis to model the leakage optimization problem under process variation by a linear programming formulation.
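The approximation formulas referenced in this section, including (16) and (17), are not reproduced in this text. As a stand-in, the sketch below uses the exact variance sum for ADD and Clark's classical moment-matching formulas for MAX, assuming B, C, and D are independent normal variables. This is a swapped-in technique for illustration and is not necessarily the linear approximation of [19, 20]; all function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for its cdf and pdf


def add_normals(mu_b, sigma_b, mu_c, sigma_c):
    """Mean and standard deviation of A = B + C for independent normals."""
    return mu_b + mu_c, math.hypot(sigma_b, sigma_c)


def max_normals(mu_b, sigma_b, mu_c, sigma_c):
    """Clark's moment matching for A = max(B, C), independent normals."""
    theta = math.hypot(sigma_b, sigma_c)
    if theta == 0.0:
        return max(mu_b, mu_c), 0.0
    alpha = (mu_b - mu_c) / theta
    p, q, phi = _STD.cdf(alpha), _STD.cdf(-alpha), _STD.pdf(alpha)
    # First and second moments of the maximum.
    mean = mu_b * p + mu_c * q + theta * phi
    second = ((mu_b ** 2 + sigma_b ** 2) * p
              + (mu_c ** 2 + sigma_c ** 2) * q
              + (mu_b + mu_c) * theta * phi)
    return mean, math.sqrt(max(second - mean * mean, 0.0))


def max_with_common_term(mu_b, sigma_b, mu_c, sigma_c, mu_d, sigma_d):
    """A = max(B + D, C + D) equals max(B, C) + D, so D is added after the max."""
    mu_m, sigma_m = max_normals(mu_b, sigma_b, mu_c, sigma_c)
    return add_normals(mu_m, sigma_m, mu_d, sigma_d)


print(add_normals(1.0, 3.0, 2.0, 4.0))
print(max_normals(0.0, 1.0, 0.0, 1.0))
```

Treating max(B+D, C+D) as max(B, C) + D keeps the shared term D out of the nonlinear step, which is why the common-term case can be handled separately from the plain MAX.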
Results
We use the BPTM 70 nm CMOS technology [22]. The low V_th values for NMOS and PMOS are 0.20 V and -0.22 V, respectively, and the high V_th values are 0.32 V and -0.34 V, respectively. We regenerated the netlists of the ISCAS'85 benchmark circuits using a cell library in which the maximum gate fanin is five. Two look-up tables, one for nominal gate delays and one for nominal leakage currents, were constructed for each type of cell using SPICE simulation. A C program parsed the netlist and generated the constraint set for the CPLEX ILP solver in the AMPL software package [21]. CPLEX then gave the optimal V_th assignment as well as the minimized leakage current for the circuit.
Comparison of Leakage Power Reduction by Deterministic and Statistical Methods
To compare the power optimization results of the statistical ILP with those of the deterministic approach, we assume that all gates have the same c_i1, c_i2, and c_i3 (the sensitivities of gate delay to the variation of different process parameters) in equation (8). Therefore, each gate has the same r_i (= σ_Di/μ_Di), which we assume to be 10%. This assumption is made only for simplicity and does not change the efficacy of the statistical approach.
In Table 1, columns 4, 6, and 9 give the optimized leakage power obtained by the deterministic ILP, by the statistical ILP with 99% timing yield, and by the statistical ILP with 95% timing yield, respectively. Compared to the deterministic method, which uses fixed values, the statistical models for gate delay and subthreshold leakage current give the ISCAS'85 benchmarks on average 29% greater leakage power saving with 99% timing yield and 41% greater saving with 95% timing yield. The reason is that the statistical model has a more flexible optimization space, while the deterministic approach assumes the worst case. For c499 and c1355, which have many critical paths due to their extremely symmetrical circuit structures, the optimization space is limited, and therefore the additional power saving is much smaller, especially at the higher timing yield (99%). A lower timing yield relaxes the timing constraints and enlarges the optimization space, so a higher power saving can be achieved.

Figure 5 shows the power-delay curves for c432 for the deterministic and statistical approaches. The starting points of the three curves, (1, 1), (1, 0.65), and (1, 0.59), indicate that if the deterministic approach reduces the leakage power to 1 unit, the statistical approach achieves 0.65 unit and 0.59 unit with 99% and 95% timing yields, respectively. The lower the timing yield, the higher the power saving. With a further relaxed T_max, all three curves give more reduction in leakage power because more gates are assigned high V_th.
Run Time of MILP Algorithm
The run time of an ILP is always a concern because, in the worst case, its complexity is exponential in the number of variables and constraints of the problem. However, our experimental results show that the actual computing time may depend on the circuit structure, logic depth, etc., and may not be exponential. Running on a 2.4 GHz AMD Opteron 150 processor with 3 GB of memory, many CPU run times for solving the ILP problem were less than one second (columns 5, 8, and 11 in Table 1). This is an advantage over other techniques [9], because we achieve 30% more leakage reduction with 99% timing yield in much less CPU time.
Besides the ISCAS'85 benchmark circuits, we also optimized the leakage of an ARM7 IP core, which has 15.5k combinational cells and 2.4k sequential cells implemented in a TSMC 90 nm CMOS process. The experimental results in the last row of Table 1 show 14% more leakage reduction, achieved with a 37-second run time, and partly demonstrate the feasibility of applying our MILP approach to real circuits.
Although today's SoCs may have over one million gates, they always have a hierarchical structure. ILP constraints can be generated for submodules at a lower level, and the run times will then be determined by the number of gates in the individual submodules. Such a technique may not guarantee a global optimum, but it should still give a reasonable result within an acceptable run time.
Conclusion
This paper proposes a mixed integer linear programming formulation that statistically minimizes the leakage power in a dual-V_th process under process variations. The experimental results show that this statistical approach achieves 30% more leakage power reduction than the deterministic approach. In the statistical approach, the impact of process variation on leakage power and circuit performance is simultaneously minimized when a small yield loss is permitted.
