Abstract-In response to the increasing variations in integrated-circuit manufacturing, the current trend is to create designs that take these variations into account statistically. In this paper, we quantify the difference between the statistical and deterministic optima of leakage power while making no assumptions about the delay model. We develop a framework for deriving a theoretical upper bound on the suboptimality that is incurred by using the deterministic optimum as an approximation for the statistical optimum. We show that for the mean power measure, the deterministic optimum is an excellent approximation, and for the mean plus standard deviation measures, the optimality gap increases as the amount of inter-die variation grows, for a suite of benchmark circuits in a 45 nm technology. For large variations, we show that there are excellent linear approximations that can be used to approximate the effects of variation. Therefore, the need to develop special statistical power optimization algorithms is questionable.
Evaluating Statistical Power Optimization
Jason Cong, Fellow, IEEE, Puneet Gupta, Member, IEEE, and John Lee, Student Member, IEEE
Index Terms-Algorithms, gate sizing, optimization, physical design, statistical power.
I. Introduction

STATISTICAL optimization via circuit sizing has been an active research topic over the last decade. The realization was that traditional corner-based optimization [2] may be too pessimistic [3], and the trend was to incorporate more and more statistical data into the optimization process.
There are many papers that explore the benefits of adding statistical delay data into the optimization process [4] , [5] - [10] , and there are also a number of papers that use a statistical power measure [7] , [11] - [15] . However, to the best of our knowledge, there is no publication that shows the benefits of using the statistical power measure alone.
This paper was recommended by Associate Editor M. Orshansky.
J. Cong is with the Department of Computer Science, University of California, Los Angeles, CA 90095 USA (e-mail: cong@cs.ucla.edu).
P. Gupta and J. Lee are with the Department of Electrical Engineering, University of California, Los Angeles, CA 90095 USA (e-mail: puneet@ee.ucla.edu; lee@ee.ucla.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2010.2061390

This brings up an interesting question: how much of the improvement should be attributed to the use of a statistical delay model and how much should be attributed to the use of a statistical power model? This question is part of a growing skepticism over the benefits of statistical optimization, and whether they outweigh its costs. Adopting statistical analysis and optimization involves considerable overhead in terms of the engineering effort involved as well as the turn-around time. It requires an almost complete overhaul of process modeling and circuit simulation, and also a modification to the algorithms for statistical optimization. It is, therefore, important to do a thorough cost-benefit analysis of statistical optimization compared to conventional deterministic optimization methods. The related question of statistical delay optimization vs. deterministic delay optimization has already been studied [16]-[18]. In [16], the claim is made that corner- or scenario-based optimization is still the most practical because of the following:
1) intra-die effects are still small;
2) there is usually not enough information to do a full-blown statistical analysis;
3) the gains of using a full-blown statistical analysis are small.
In [18], the authors quantify the difference between corner-based methodologies and full statistical optimization methods. They find that with a 5% variation in stage delay, the full-blown statistical analysis and optimization gives a mere 2% improvement, and a 12% variation gives a 6% improvement over a statistical worst-case corner that employs a guardband. In [17], the tradeoff between yield and circuit delay, and the improvements in slack are examined. Significant improvements are shown for a set of benchmark circuits.
In this paper, we focus on the amount of improvement that can be made by using a statistical power measure as an objective for gate sizing, l eff and V th assignment, when compared to the deterministic power measure. The key contributions of the paper are as follows.
1) We develop a mathematical programming-based framework to estimate the suboptimality gap between different power measures.
2) For the common case of discrete gate sizing, we give an intuitive explanation of the suboptimality using solution rank orders.
3) We show that for certain sizes of variations, the deterministic power measure is a provably good approximation for the statistical power measures, which means that the deterministic power measure can be used in place of the statistical power measures with very similar optimization results.
4) We present a simplified measure of statistical power that can be used as a proxy for full statistical power optimization.
It is important to mention that the analysis in this paper is independent of the model for the delay. This paper does not give judgments on the difference between statistical and deterministic delay optimization, or the difference between static and statistical static timing analysis. The delay is only used to generate examples for the suboptimality bounds.
The approach in this paper intends to remove the dependence on timing feasibility, which is a consideration that makes the problem of finding bounds difficult. To remove this dependence, we rely on the structure of the power objective to create a relaxation of the timing feasible region. This can be done because a power optimum inherently contains information about the timing feasible region, as this optimum is the minimum power point in the region. Once this is done, the relaxation can be used to find bounds without any further dependence on the timing feasible region.
The rest of the paper is organized as follows. The following section outlines the leakage power measures and models used in this paper. Section III describes the test circuits that are used in this paper. Section IV gives an intuitive explanation of the suboptimality gap using rank orders. Section V develops the mathematical framework to estimate the suboptimality gap incurred from using the deterministic power measure in place of the statistical power measures. Section VI introduces a simplified measure of statistical power that can be used as a proxy for full statistical power optimization. The paper is summarized in Section VII.
II. Statistical Power
In this paper, uppercase bold symbols represent random variables (e.g., $\mathbf{X}$) and uppercase non-bold symbols represent commonly used constants (e.g., $V_{dd}$). Vector quantities will have arrows above them (e.g., $\vec{x}$) and scalar quantities will be lowercase non-bold (e.g., $p$). The principal symbols are summarized in Table I. This paper examines the benefits of using the statistical power as an objective in the problem

$$\begin{aligned} \text{minimize} \quad & \text{Statistical Power}(\vec{w}, \vec{l}, \vec{v}_{th}) \\ \text{subject to} \quad & \text{Delay}(\vec{w}, \vec{l}, \vec{v}_{th}) \le T_{\max} \end{aligned} \tag{1}$$

compared to using the deterministic power as the objective. In this section, we derive expressions for statistical leakage power, review its mathematical properties, and discuss how the different properties of the statistical power would affect the resulting optima.
A. Variations
In this paper, the gate length and the threshold voltage are assumed to be the sources of power variations. These variations are assumed to be Gaussian in both the length and the threshold voltage. The standard deviations in v th are 4.714% in both die-to-die (dtd) and within-die cases (wid); the within-die variations are uncorrelated and they affect each gate independently. Three different standard deviations for the length are used in this paper:
1) 1 nm (dtd) and 0.5 nm (wid);
2) 1.6 nm (dtd) and 1.15 nm (wid);
3) 2 nm (dtd) and 0 nm (wid).

These are representative of the variations for 45 nm given in the International Technology Roadmap for Semiconductors 2007.
The effects of the variation are simulated for the gates in the Nangate Open Cell Library v1.2 [19] using predictive technology model 45 nm and HSPICE 2007. 1 We assumed that the input combinations to each gate are equi-probable and the measured leakage power is the average power over all the input combinations.
B. Models
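The leakage power is modeled as a log-normal random variable as follows [20]:

$$\mathbf{P}_i = \kappa_i \exp\left(\lambda_{d(i)} \mathbf{L}_d + \lambda_{w(i)} \mathbf{L}_{w(i)} + \tau_{d(i)} \mathbf{V}_{thd} + \tau_{w(i)} \mathbf{V}_{thw(i)}\right) \tag{2}$$

where $\kappa_i$ is the nominal power for gate $i$; $\mathbf{L}_d$, $\mathbf{L}_{w(i)}$, $\mathbf{V}_{thw(i)}$, and $\mathbf{V}_{thd}$ are independent Gaussian random variables representing the die-to-die and within-die components of the gate length and the threshold voltage; and $\lambda_{d(i)}$, $\lambda_{w(i)}$, $\tau_{d(i)}$, and $\tau_{w(i)}$ are coefficients that are used to fit the mean and the standard deviation of the SPICE-simulated data. In (2), the variations in the threshold voltage and the gate length are assumed to be Gaussian, and the relationship between these parameters and the leakage power is exponential.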
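As a rough numerical illustration of the log-normal model, the following Python sketch considers a single gate with one Gaussian source of variation, $P = \kappa e^{\lambda X}$ with $X \sim N(0, \sigma^2)$, and compares the closed-form log-normal mean with a Monte Carlo estimate. The values of `KAPPA`, `LAM`, and `SIGMA` are hypothetical and are not the coefficients fitted in this paper.

```python
import math
import random

def lognormal_mean(kappa, lam, sigma):
    """Mean of kappa * exp(lam * X) for X ~ N(0, sigma^2)."""
    return kappa * math.exp(0.5 * (lam * sigma) ** 2)

def lognormal_std(kappa, lam, sigma):
    """Standard deviation of kappa * exp(lam * X) for X ~ N(0, sigma^2)."""
    s2 = (lam * sigma) ** 2
    return kappa * math.sqrt((math.exp(s2) - 1.0) * math.exp(s2))

def monte_carlo_mean(kappa, lam, sigma, n=100_000, seed=0):
    """Sample-average estimate of the same mean, with a fixed seed."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += kappa * math.exp(lam * rng.gauss(0.0, sigma))
    return total / n

# Hypothetical values: nominal leakage (nW), length sensitivity (1/nm),
# and die-to-die length sigma (nm). Leakage falls as the length grows,
# hence the negative sensitivity.
KAPPA, LAM, SIGMA = 10.0, -0.25, 2.0

mean_closed = lognormal_mean(KAPPA, LAM, SIGMA)
mean_mc = monte_carlo_mean(KAPPA, LAM, SIGMA)
print(f"nominal {KAPPA:.2f}, closed-form mean {mean_closed:.3f}, "
      f"Monte Carlo mean {mean_mc:.3f}, std {lognormal_std(KAPPA, LAM, SIGMA):.3f}")
```

The mean exceeds the nominal value for any nonzero sensitivity, which is the reason the statistical and deterministic power values differ even when their trends agree.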
To model the change in the leakage as a function of the size, threshold voltage and the length (see Section V), we use the approximation as follows:

$$p_{d,i} \approx \kappa_i z_i, \qquad z_i = w_i \, e^{\alpha_i l_i^2 + \beta_i l_i} \, e^{-\gamma_i v_{th,i}} \tag{3}$$

where $\kappa_i$, $\alpha_i$, $\beta_i$, and $\gamma_i$ are parameters that are fitted to the data. The variable $z_i$ denotes the "adjusted gate width," which incorporates the effect of $w$, $l$, and $v_{th}$ on the power into an equivalent gate width. The $z$ variables cannot be re-mapped to a unique $w$, $l$, and $v_{th}$, but this is not a problem, as $z$ is used only to compute lower bounds and is not used for design purposes. The nonlinear mapping from $w$, $l$, and $v_{th}$ to $z$ has the effect of shifting the nonlinear relation of $p_d$ on $v_{th}$ to a linear one in $z$ (which is, in turn, nonlinear in $v_{th}$), creating a useful abstraction for computing bounds.
C. Measures of Statistical Leakage Power
In this paper, we will cover the statistical measures summarized in Table II : the deterministic, mean, mean + 3σ, and the quantile power measures. The mean + 3σ measure refers to the mean + 3σ of the total leakage power of a design. When there is no within-die variation, the 99.87% quantile measure, which corresponds to the three sigma quantile of a Gaussian distribution, will be used as well.
In the table, p σ (·) is the measure of the standard deviation of the power, which can be expressed as follows:
The quantile measure $p_q$ is used only when the variation is purely inter-die. When intra-die, or within-die, variation is present, there is no closed-form expression for $p_q$.
The power measures above have useful mathematical properties. $p_d$, $p_m$, and $p_q$ are all linear in $\kappa_i$ and in $z_i$, and are thus both concave and convex in $z_i$. $p_{m3\sigma}$ is convex in $z$.
D. Why Do We Expect the Optimizations to be Similar?
The statistical power and deterministic power are not similar. For example, the statistical leakage power can be larger than the deterministic leakage by 10-500%. It is natural to expect that the influence of these measures will also be different, and that optimizing statistical power will yield different results compared to deterministic power.
In optimization, however, it is not the absolute magnitude of the power but the relative magnitude that matters. If the statistical power is a scaled version of the deterministic power, then the optima will be the same. To see this mathematically, we examine the optimality condition for an optimum $x^*$: $x^*$ is optimal if for any feasible $y$

$$f(x^*) \le f(y).$$

This condition will also hold for any positive scaling of $f(x)$.
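The scaling argument can be checked on a toy example. The Python sketch below uses hypothetical power values for four candidate configurations: a uniform positive scaling leaves the minimizer unchanged, while a configuration-dependent scaling can move it.

```python
def argmin(values):
    """Index of the smallest entry."""
    return min(range(len(values)), key=values.__getitem__)

# Hypothetical deterministic powers of four feasible configurations.
det_power = [4.0, 2.5, 3.1, 2.9]

# Uniform scaling: every configuration is multiplied by the same factor.
uniform = [1.8 * p for p in det_power]

# Non-uniform scaling: the factor depends on the configuration.
factors = [1.1, 2.0, 1.2, 1.3]
non_uniform = [f * p for f, p in zip(factors, det_power)]

print(argmin(det_power), argmin(uniform), argmin(non_uniform))
```

The question studied in the rest of the paper is how far the statistical measures depart from the uniform case.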
Although the values of the statistical power and the deterministic powers may be quite different, the trends are similar; the mean power is larger than the deterministic power for all of the gates, as is the quantile power, and so on. For example, consider Fig. 1 , which shows the power vs. size sensitivities. The sensitivities for the different gates follow the same trend. This suggests that the optimizations will be similar as well.
To see why this happens for statistical power, we take the quantile power expression with inter-die length variation as an example. If the $\lambda_{d(i)}$ are all equal, then the effect of variation will be seen equally for each gate and the objective will be a scaled version of the deterministic power.

In actuality, the values of $\lambda_{d(i)}$ differ from gate to gate. Surprisingly, the effect of the scaling leaves the orientation of the power sensitivities mostly intact. The magnitude changes significantly, but the direction of the vector remains similar. We can plot a power vector where each gate is a separate dimension, and the magnitude along the dimension is the leakage power for that gate. We can then compute the angle $\theta$ between the statistical power vector and the corresponding deterministic power vector, as shown in Fig. 3. For $\sigma_L$ = 2 nm (dtd)/0 nm (wid) and $\sigma_{V_{th}}$ = 4.7% (dtd and wid), the difference between the vectors is very small, at approximately 1°. In Section V, we will see that the effects may not be large enough to make a significant difference in the optimized powers.
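The angle comparison can be sketched in a few lines of Python. The example assumes that, with inter-die length variation only, the three-sigma quantile power of gate $i$ takes the form $\kappa_i e^{3 \lambda_{d(i)} \sigma_L}$; this form, the nominal powers in `kappa`, and the sensitivities in `lam` are assumptions made for illustration, not values from the paper.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))

# Hypothetical nominal gate leakages and per-gate length sensitivities.
kappa = [5.0, 12.0, 8.0, 20.0]
lam = [0.30, 0.32, 0.29, 0.31]   # magnitude of sensitivity, 1/nm
sigma_l = 2.0                    # die-to-die length sigma, nm

# Assumed quantile power per gate: nominal power times exp(3 * lam * sigma).
stat = [k * math.exp(3.0 * l * sigma_l) for k, l in zip(kappa, lam)]
theta = angle_deg(kappa, stat)

# With equal sensitivities the two vectors are parallel.
stat_equal = [k * math.exp(3.0 * 0.30 * sigma_l) for k in kappa]
theta_equal = angle_deg(kappa, stat_equal)

print(f"angle with unequal sensitivities: {theta:.2f} deg")
print(f"angle with equal sensitivities:   {theta_equal:.6f} deg")
```

Even though each component grows by a factor of roughly six in this example, the direction of the vector changes by only a few degrees.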
E. Increasing Variations
It is also interesting to examine how the sensitivities change as the variations increase. We can use (2) for a first-order analysis of how the sensitivities will change as the variations change.
We reconsider the case with inter-die length variation only. Doubling the variations changes the scaling factor of each gate by a different amount, which increases the difference in the scaling factors between gates. This is seen in Fig. 2, where the increase in the variations changes the sensitivities for each gate. Notice that the sensitivities do not increase uniformly for each gate. The growth of the sensitivities depends on the topology of the design, and how much effect the variations will have on the gate.
III. Circuit Examples
In this paper, we use the International Symposium on Circuits and Systems (ISCAS) '85 benchmarks and a 128-bit arithmetic logic unit (ALU) [21] as examples. The Verilog register-transfer level (RTL) descriptions of these benchmarks are synthesized to four different target speeds using the Encounter RTL compiler [22] and mapped to the Nangate Open Cell Library v1.2 [19], which uses the 45 nm technology node and has the sizes given in Table IV. The synthesized speeds are the maximum speed, the minimum speed, and two speeds in between. The fastest speed is labeled v1 and the slower synthesized speeds have higher version numbers (e.g., the slowest speed is v4). The number of gates in each design and the target delays are listed in Table III. The tighter delay constraints result in larger designs, as these designs utilize more buffering to meet the delay constraint. These larger synthesized designs are also more interesting from a sizing point of view, as the space of possible solutions is larger.
All the optimization routines in this paper are solved using the MATLAB Optimization Toolbox [23] .
IV. Comparing Randomly Generated Configurations
The major difficulty in estimating the difference between statistical and deterministic optimization is due to the difficulty of the optimization problems themselves. When the variables are discrete (e.g., $l \in \{1, 2, 3\}^n$), the problem is NP-complete [24] and it is tremendously difficult to find an optimal solution. However, there is some intuition that can be gained by examining the statistical and deterministic powers of points even if timing feasibility is ignored. Namely, we can consider:
1) how does a sorting of configurations ($w$, $l$, and $v_{th}$ assignments) change when ordered statistically, instead of deterministically, and
2) how does the statistical power vary for configurations with the same deterministic power?
As an experiment, the deterministic and statistical powers were computed for 10 000 randomly generated widths, lengths, and v th , for each design. Three different v th values were used (low, high, and normal), and four different gate lengths were used ({+0 nm, +1 nm, +2 nm, +3 nm}). In these examples, we assumed that σ L = 2 nm(dtd)/0 nm(wid), and σ V th = 4.7% (dtd & wid). Fig. 4 plots the deterministic power vs. the statistical power for the c1355 v2 circuit. From the plot, it is clear that the p d and the p m measures correlate more than the p d and the p m3σ . We can measure the correlation between the ranks using Kendall's τ [25] . Sequences that are perfectly correlated (e.g., have the same ranking) have a τ = 1 and sequences that are perfectly anti-correlated (e.g., are reverses of each other) have a τ = −1. The values for the different designs are given in Table V .
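For reference, Kendall's τ can be computed directly from its definition by counting concordant and discordant pairs. The following Python sketch implements the tie-free variant (tau-a); the function name and the sample rankings are hypothetical, and the quadratic pair loop is chosen for clarity rather than speed.

```python
def kendall_tau(a, b):
    """Kendall's tau-a for two equal-length sequences without ties.

    A pair (i, j) is concordant when a and b order it the same way,
    and discordant when they order it in opposite ways.
    """
    n = len(a)
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (a[i] - a[j]) * (b[i] - b[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical deterministic and statistical powers of four configurations.
p_det = [1.0, 2.0, 3.0, 4.0]
p_stat_same_order = [10.0, 20.0, 30.0, 40.0]
p_stat_one_swap = [10.0, 30.0, 20.0, 40.0]

print(kendall_tau(p_det, p_stat_same_order))
print(kendall_tau(p_det, p_stat_one_swap))
```

A single swapped pair among four configurations already lowers τ noticeably, which gives a feel for how much reordering a value in the 0.5 to 0.9 range represents.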
From this table, we can see that the $p_m$ rank correlations are near perfect ($\tau \approx 1$). This indicates that the optimizations will be very similar. However, the $p_{m3\sigma}$ rank correlations range from 0.5 to 0.9. In this case, it is not clear from the rankings whether the deterministic power is a good surrogate for the statistical power.
Why the p m rankings correlate better than the p m3σ measures can be seen in Fig. 1 , which plots the power vs. size sensitivities for the p d , p m , and p q measures (the p m3σ can be thought of as a rough approximation of the p q measure). Notice that a sorting of the gates by sensitivity would be very similar for the p d and the p m measures. However, the sorting by p q , and, hence, the relations of the sensitivities of different gates, is very different.
In the analysis above, it is difficult to tell exactly how large the gap will be between the deterministic and statistical sizing solutions. To get a direct idea of the difference between the deterministic power and the statistical power, we can make a useful assumption. Suppose we know the value of the minimum deterministic power ($p_d$) that is timing feasible. Then, we can estimate the difference between the best statistical sizing and the worst sizing for the given deterministic power as in Fig. 4(c).
There is inherent error in this process because it depends on the number of samples. As the number of samples grows, the gap will increase because more of the extreme points are sampled. In the following section, we will compute these bounds without needing to sample the distribution.
V. Suboptimality Bounds
The central question in this paper is whether the deterministic power solution is a good approximation for the statistical power optimum. This generally requires information about the space of timing-feasible solutions (T ), which is difficult to describe, and is highly problem dependent, making it hard to give an exact answer. However, there is a way to solve a simpler problem with no assumption on the structure of T , instead, relying on the structure of the deterministic power objective.
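In this section, we consider the following question. Suppose we approximate the solution to the statistical power optimization problem

$$\begin{aligned} \text{minimize} \quad & p_s(\vec{w}) && \text{(statistical power)} \\ \text{subject to} \quad & \vec{w} \in T && \text{(timing constraint and discreteness)} \\ & \vec{w} \in B && \text{(upper and lower size bounds on } \vec{w}\text{)} \end{aligned} \tag{S}$$

using the deterministic power optimum, $\vec{w}_d$, which is the solution to the problem

$$\begin{aligned} \text{minimize} \quad & p_d(\vec{w}) && \text{(deterministic power)} \\ \text{subject to} \quad & \vec{w} \in T && \text{(timing constraint and discreteness)} \\ & \vec{w} \in B && \text{(upper and lower size bounds on } \vec{w}\text{).} \end{aligned} \tag{D}$$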
How good of an approximation will this be, and what is a bound for the suboptimality of this solution?
In the following, we will describe a method for creating suboptimality bounds. First, a simple set $\tilde{T}$ is constructed that contains $T$. Optimizing the statistical power over this simpler set will return a lower bound on the statistical power optimum. This lower bound is then compared with the statistical power of the approximate solution, $p_s(\vec{w}_d)$, to bound the accuracy of the approximation. This is described in detail below.
A. Relaxed Constraints, Enclosing Sets, and Lower Bounds
The difficult part of $w$, $l$, and $v_{th}$ optimization is the timing constraint and the discreteness constraint. Thus, to find a quick lower bound, we must first replace the timing constraint with a looser constraint. In other words, we would like to relax the constraints by enclosing the timing feasibility and discreteness conditions in a simple, convex set.
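Relaxing the constraints of a problem turns the resulting solution into a lower bound for the true solution. For example, consider the sets $T_0 \subseteq T_1 \subseteq \dots \subseteq T_k$ and the following sequence of problems:

$$\text{minimize } f(\vec{w}) \quad \text{subject to } \vec{w} \in T_i. \tag{$P_i$}$$

If the optimal solution of problem ($P_i$) is $\vec{w}_i$, then we have the property as follows:

$$f(\vec{w}_k) \le f(\vec{w}_{k-1}) \le \dots \le f(\vec{w}_0).$$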
In other words, the optimal value for the relaxed problem is a lower bound for the original problem. The intuition for this is the fact that the constraints in the relaxed problem enclose the constraints on the original problem. Thus, the optimal solution in the original problem is also feasible for the relaxed problem. In the process of solving the relaxed problem, the solver is free to choose a better point in the larger space, making the resulting optimum a lower bound for the original problem.
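The lower-bound property of relaxations can be verified by brute force on a small discrete example. In the Python sketch below, the nested sets `T0`, `T1`, `T2` and the objective `power` are hypothetical; the only point is that minimizing over a larger set can never give a larger optimum.

```python
def minimize_over(f, candidates):
    """Return the candidate with the smallest objective value."""
    return min(candidates, key=f)

def power(w):
    """Hypothetical objective: total gate width of a two-gate circuit."""
    return sum(w)

# T0 plays the role of the true feasible set; T1 and T2 are relaxations
# that enclose it (T0 is a subset of T1, which is a subset of T2).
T0 = [(2, 3), (3, 2), (3, 3)]
T1 = T0 + [(1, 3), (2, 2)]
T2 = T1 + [(1, 1), (1, 2)]

optima = [power(minimize_over(power, T)) for T in (T0, T1, T2)]
print(optima)
```

Each successive relaxation admits points that were infeasible before, so the sequence of optimal values is non-increasing.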
B. Linear Functions, Optimum Solutions, and Enclosing Sets
For certain classes of functions, it is easy to find a simple set that encloses the optimum. The following analysis will derive a set using the properties of linear functions, but the results also hold for more general functions.² The key to finding an enclosing set for the constraints is to start with an optimal solution and leverage the fact that any other feasible point cannot be better. For example, if $\vec{w}_d$ is optimal for (D), then

$$\vec{w} \in T \;\Rightarrow\; p_d(\vec{w}) \ge p_d(\vec{w}_d). \tag{12}$$

For linear functions, the inequality on the right side can be rewritten in a simple form. This is because any linear function $f(x)$ can be expressed in the form $f(x) = f(x_0) + s^T (x - x_0)$, where $s = \nabla f(x)$.³ Thus, expanding about the minimum $x^*$ of $f$ gives the following:

$$f(x) - f(x^*) = s^T (x - x^*) \ge 0.$$

Applying this to (12) with $s = \nabla p_d(\vec{w}_d)$ shows the following:

$$T \subseteq \tilde{T} = \left\{ \vec{w} \;\middle|\; \nabla p_d(\vec{w}_d)^T (\vec{w} - \vec{w}_d) \ge 0 \right\}. \tag{15}$$

Note that $\tilde{T}$ is a continuous, connected set. This gives the following relaxed problem:

$$\begin{aligned} \text{minimize} \quad & p_d(\vec{w}) \\ \text{subject to} \quad & \vec{w} \in \tilde{T}, \; \vec{w} \in B. \end{aligned}$$
C. Creating Lower Bounds for Related Problems
The above analysis seems a little circular; the optimum is required to create a lower bound for the optimum. However, the utility emerges when we use the same enclosing sets to find lower bounds for related problems.
² Specifically, (15) holds whenever $p_d(\vec{w})$ satisfies the following:

$$p_d(\vec{w}) \ge p_d(\vec{w}^*) \;\Rightarrow\; \nabla p_d(\vec{w}^*)^T (\vec{w} - \vec{w}^*) \ge 0.$$

Mathematically, this is equivalent to saying that $0 \le \nabla p_d(\vec{w}^*)^T (\vec{w} - \vec{w}^*)$ defines a supporting hyperplane for the super-level sets

$$\left\{ \vec{w} \;\middle|\; p_d(\vec{w}) \ge p_d(\vec{w}^*) \right\}.$$

This includes functions that are linear, concave, and quasi-concave. This does not necessarily hold for convex functions or posynomial functions, and in these cases, other properties of the timing-feasible region must be assumed for the results in this section to hold.

³ Note that $s$ is constant over $x$ for linear functions.

Suppose the solution for (D) is known, and we would now like to find the lower bound for (S), which has a different objective function, but identical constraints. This can be done by leveraging the solution $\vec{w}_d$ for (D) to compute a simple, enclosing set for the constraints, as in the section above. The relaxed problem is then solved as follows:

$$\begin{aligned} \text{minimize} \quad & p_s(\vec{w}) \\ \text{subject to} \quad & \nabla p_d(\vec{w}_d)^T (\vec{w} - \vec{w}_d) \ge 0 \\ & \vec{w} \in B. \end{aligned} \tag{17}$$
The problem is solved over the continuous set $\tilde{T}$ and the continuous upper and lower bound constraints in $B$. The corresponding solution $\tilde{w}$ can be used as a lower bound on the true optimum $\vec{w}_s$ ($p_s(\tilde{w}) \le p_s(\vec{w}_s)$). In the case of $p_s = p_m$ and $p_s = p_q$, this is a linear programming problem, and for $p_s = p_{m3\sigma}$, this is a nonlinear convex optimization problem. Using this lower bound, we can now find a bound for how well the deterministic solution approximates the solution for the statistical problem. The suboptimality gap between this approximation $\vec{w}_d$ and the true optimum $\vec{w}_s$ is bounded by the following:

$$\frac{p_s(\vec{w}_d) - p_s(\vec{w}_s)}{p_s(\vec{w}_s)} \le \frac{p_s(\vec{w}_d) - p_s(\tilde{w})}{p_s(\tilde{w})} = \delta_{so}.$$

Smaller values indicate that $\vec{w}_d$ is a good approximate solution for $\vec{w}_s$, and larger values indicate that it is a bad approximation. For example, if

$$\delta_{so} = 0.05$$

then $\vec{w}_d$ is a 5% approximate solution for $\vec{w}_s$. In other words, using $\vec{w}_d$ in place of the real optimum $\vec{w}_s$ would cost at most 5% (it is suboptimal by at most 5%). This process also works with optimization over $w$, $l$, and $v_{th}$. This is exactly analogous to the above example, with $w$ replaced by $z$, the adjusted gate widths.⁴ A surprising fact is that no properties of $p_s(\vec{w})$ are assumed. It may be non-convex or it may be nonlinear, as in the case of the $p_{m3\sigma}$ measure. The only assumption is that (17) is solvable.
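The bounding procedure can be sketched for the linear case in Python. The sketch assumes both measures are linear in the widths with positive coefficients, $p_d = \vec{c}_d^T \vec{w}$ and $p_s = \vec{c}_s^T \vec{w}$, so the relaxed problem (17) becomes a continuous knapsack problem that a greedy fill solves exactly. This greedy routine is a stand-in for a general LP solver, not the solver used in this paper, and all coefficients, bounds, and widths below are hypothetical.

```python
def dot(c, w):
    """Linear power measure: sum of coefficient * width over all gates."""
    return sum(ci * wi for ci, wi in zip(c, w))

def relaxed_lower_bound(c_s, c_d, w_d, lb, ub):
    """Minimize c_s.w subject to c_d.w >= c_d.w_d and lb <= w <= ub.

    With positive coefficients this is a continuous knapsack problem, so
    filling gates in order of increasing c_s/c_d is optimal.
    """
    w = list(lb)
    remaining = dot(c_d, w_d) - dot(c_d, lb)  # deterministic power to place
    order = sorted(range(len(w)), key=lambda i: c_s[i] / c_d[i])
    for i in order:
        if remaining <= 0.0:
            break
        step = min(ub[i] - lb[i], remaining / c_d[i])
        w[i] += step
        remaining -= c_d[i] * step
    return w

def suboptimality_bound(c_s, c_d, w_d, lb, ub):
    """delta_so = (p_s(w_d) - p_s(w_relaxed)) / p_s(w_relaxed)."""
    p_lb = dot(c_s, relaxed_lower_bound(c_s, c_d, w_d, lb, ub))
    return (dot(c_s, w_d) - p_lb) / p_lb

# Hypothetical three-gate example (all numbers are illustrative only).
C_D = [1.0, 2.0, 1.5]                  # deterministic power per unit width
C_S = [1.2, 2.8, 1.7]                  # statistical power per unit width
LB, UB = [1.0, 1.0, 1.0], [4.0, 4.0, 4.0]
W_D = [2.0, 1.0, 3.0]                  # assumed deterministic optimum

w_relaxed = relaxed_lower_bound(C_S, C_D, W_D, LB, UB)
delta_so = suboptimality_bound(C_S, C_D, W_D, LB, UB)
print(f"suboptimality bound: {100 * delta_so:.2f}%")
```

Note that the routine never inspects timing: the deterministic optimum enters only through its deterministic power and its statistical power.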
Another interesting fact is that the deterministic optimum is only used to determine: 1) the minimum deterministic power that is timing feasible, and 2) its corresponding statistical power. The delay of the design is thus important in determining the deterministic optimum, but does not play any other role in the bounding process. This highlights the independence of this method from the timing feasible region.

⁴ This gives the following problem:

$$\begin{aligned} \text{minimize} \quad & p_s(\vec{z}) \\ \text{subject to} \quad & \nabla p_d(\vec{z}_d)^T (\vec{z} - \vec{z}_d) \ge 0 \\ & \vec{z}_{\min} \le \vec{z} \le \vec{z}_{\max}. \end{aligned} \tag{20}$$

Here, $z$ is a continuous variable, which represents the effect of the widths, lengths, and $v_{th}$ on the power. Interestingly, the actual values of $l$, $w$, and $v_{th}$ do not play a direct role in the optimization above. They affect the optimization by determining a range for the values of $z_i$. In fact, the corresponding values of $w_i$, $l_i$, and $v_{th,i}$ may not be unique; it is only important that there is at least one combination of $w_i$, $l_i$, and $v_{th,i}$ that satisfies $z_i = w_i e^{\alpha l_i^2 + \beta l_i} e^{-\gamma v_{th,i}}$. Thus, $z$ acts as a proxy for $w$, $l$, and $v_{th}$ and a corresponding value is not important, as $z$ is used solely to find a lower bound on the power, and not a minimum power configuration.

A visual example of the lower bounding process is shown in Fig. 5. The figure shows that the suboptimality bound is more related to the geometry of the problem than the actual difference between the statistical and deterministic optima.
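The non-uniqueness of the mapping from ($w$, $l$, $v_{th}$) to the adjusted gate width $z$ is easy to see numerically. The Python sketch below evaluates $z = w e^{\alpha l^2 + \beta l} e^{-\gamma v_{th}}$ for two different assignments that give the same $z$; the coefficient values `ALPHA`, `BETA`, and `GAMMA` are hypothetical placeholders for the fitted parameters.

```python
import math

def adjusted_width(w, l, v_th, alpha, beta, gamma):
    """Adjusted gate width z = w * exp(alpha*l^2 + beta*l) * exp(-gamma*v_th)."""
    return w * math.exp(alpha * l * l + beta * l) * math.exp(-gamma * v_th)

# Hypothetical fitted coefficients for one gate type.
ALPHA, BETA, GAMMA = 0.01, -0.2, 10.0

# Assignment 1: width 2 at the lower threshold voltage.
z1 = adjusted_width(2.0, 0.0, 0.30, ALPHA, BETA, GAMMA)

# Assignment 2: a higher threshold voltage compensated by a wider gate.
w2 = 2.0 * math.exp(GAMMA * 0.10)
z2 = adjusted_width(w2, 0.0, 0.40, ALPHA, BETA, GAMMA)

print(f"z1 = {z1:.6f}, z2 = {z2:.6f}")
```

Because only the value of $z$ enters the bound, either assignment leads to the same lower bound on the power.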
D. Width-Sizing Experiment
In this section, we compute suboptimality bounds for width sizing. The bounds (δ so ) are computed for each circuit in Section III and the results are presented in Table VI .
The Nangate Library is modeled using (3). The synthesized widths from the Cadence RTL compiler are assumed to be the optimal deterministic widths, $\vec{w}_d$. These tables show that the bounds grow larger as the proportion of die-to-die variation increases. Note that this is in spite of the fact that the total $\sigma_L$ per gate is roughly the same, for example

$$\sqrt{1.6^2 + 1.15^2} \approx 1.97 \text{ nm} \approx \sqrt{2^2 + 0^2} = 2 \text{ nm}.$$

This is because die-to-die variations affect the power more strongly than within-die variations, which may cancel each other out [see (4)]. The case $\sigma_L$ = 1 nm (dtd)/0.5 nm (wid) is given for comparison with [1]. Another interesting observation is that adding $v_{th}$ variations does not necessarily increase the suboptimality bound. When the length variations are small, the bounds do tend to increase. However, for large $\sigma_L$, the effect of $v_{th}$ variations is unpredictable and small.
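The cancellation of within-die variations can be illustrated with a first-order calculation. The Python sketch below assumes, purely for illustration, that each gate's deviation is the sum of a shared die-to-die term and an independent within-die term, and that the deviations add linearly across gates; the function name and the gate count are hypothetical.

```python
import math

def per_gate_sigma(sigma_dtd, sigma_wid):
    """Total standard deviation seen by a single gate."""
    return math.sqrt(sigma_dtd ** 2 + sigma_wid ** 2)

def total_sigma(n_gates, sigma_dtd, sigma_wid):
    """Standard deviation of the sum of n per-gate deviations.

    The die-to-die term is shared by all gates and adds linearly; the
    within-die terms are independent and add in quadrature.
    """
    return math.sqrt((n_gates * sigma_dtd) ** 2 + n_gates * sigma_wid ** 2)

N_GATES = 1000  # hypothetical design size

for dtd, wid in [(1.6, 1.15), (2.0, 0.0)]:
    print(f"dtd={dtd}, wid={wid}: per-gate sigma {per_gate_sigma(dtd, wid):.3f}, "
          f"sigma of sum {total_sigma(N_GATES, dtd, wid):.1f}")
```

Although the two cases have nearly the same per-gate sigma, the sum over many gates is dominated by the die-to-die component, because the independent within-die terms largely average out.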
E. Assumption Free Bounds
The biggest weakness of the analyses above is that the deterministic optimum is not available, as the problem is too difficult to solve exactly for real circuits. In this part, we derive bounds that do not rely on an initial deterministic solution. This works by exploiting the geometry of the deterministic power measure.
We begin by assuming that only the value of the deterministic power measure $p_d$ is known. Thus, the actual deterministic optimum is one of the configurations that has a corresponding power $p_d$. Out of these possible configurations, the worst-case statistical power can be found by solving the following problem:

$$\begin{aligned} \text{maximize} \quad & p_s(\vec{w}) \\ \text{subject to} \quad & p_d(\vec{w}) = p_d \\ & \vec{w} \in B. \end{aligned} \tag{22}$$

Denoting the optimal solution of the above as $\vec{w}_{ub}$, the worst-case suboptimality bounds can be found as follows:

$$\delta_{wc} = \frac{p_s(\vec{w}_{ub}) - p_s(\vec{w}_{lb})}{p_s(\vec{w}_{lb})} \tag{23}$$

where $\vec{w}_{lb}$ is found using (17). Note that in (17), only the value $p_d$ and not the actual sizes of the deterministic optimum are used. As an experiment, we chose 20 equally spaced values for $p_d$ (between the minimum power and the maximum power), and computed the worst-case suboptimalities using (23). Running these examples for the mean + 3σ measure gives the values in Table VII, which are approximately 3x-4x the values in Table VI. Note that these numbers are only for width sizing.
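The sweep over deterministic power values can be sketched in Python for a linear statistical measure. This is a simplification: the experiment above uses the mean + 3σ measure, which is nonlinear, whereas the sketch assumes $p_s = \vec{c}_s^T \vec{w}$ and $p_d = \vec{c}_d^T \vec{w}$ with positive coefficients, so that both the lower and the upper bound reduce to greedy fills. All coefficients and bounds are hypothetical.

```python
def dot(c, w):
    """Linear power measure: sum of coefficient * width over all gates."""
    return sum(ci * wi for ci, wi in zip(c, w))

def fill(c_s, c_d, p_d, lb, ub, worst):
    """Spend deterministic power p_d on the gates, starting from lb.

    worst=False fills gates with the lowest c_s/c_d first (lower bound);
    worst=True fills gates with the highest c_s/c_d first (upper bound).
    """
    w = list(lb)
    remaining = p_d - dot(c_d, lb)
    order = sorted(range(len(w)), key=lambda i: c_s[i] / c_d[i], reverse=worst)
    for i in order:
        step = min(ub[i] - lb[i], max(remaining, 0.0) / c_d[i])
        w[i] += step
        remaining -= c_d[i] * step
    return w

def worst_case_gap(c_s, c_d, p_d, lb, ub):
    """delta_wc = (p_s(w_ub) - p_s(w_lb)) / p_s(w_lb) for a given p_d."""
    p_lb = dot(c_s, fill(c_s, c_d, p_d, lb, ub, worst=False))
    p_ub = dot(c_s, fill(c_s, c_d, p_d, lb, ub, worst=True))
    return (p_ub - p_lb) / p_lb

# Hypothetical three-gate example (all numbers are illustrative only).
C_D = [1.0, 2.0, 1.5]
C_S = [1.2, 2.8, 1.7]
LB, UB = [1.0, 1.0, 1.0], [4.0, 4.0, 4.0]

# 20 equally spaced deterministic power values between the extremes.
p_min, p_max = dot(C_D, LB), dot(C_D, UB)
p_values = [p_min + k * (p_max - p_min) / 19 for k in range(20)]
gaps = [worst_case_gap(C_S, C_D, p, LB, UB) for p in p_values]

for p, g in zip(p_values, gaps):
    print(f"p_d = {p:6.2f}   delta_wc = {100 * g:5.2f}%")
```

At the two extremes there is only one configuration with the given deterministic power, so the gap vanishes there and peaks somewhere in between.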
It is interesting to note how the suboptimality changes as a function of p d . This is shown for the c432 v4 circuit in Fig. 6 . At the minimum value of p d , there is only one possible sizing (e.g., all gates at minimum size), so the suboptimality is zero. Similarly, for the maximum value of p d , there is also only one size, so the suboptimality is also zero. For the remainder of the values, the suboptimality grows to a peak at a third of the way, and decreases for the remainder of the values. This implies that the suboptimality is larger for more aggressive designs, and smaller for low power designs.
F. Relating Maximum Gate Size to Suboptimality Bounds
In this section, we use (2) and (3), which model the power as a function of the gate size, to see how the bounds would change if the maximum gate sizes are increased for all gates.
Table VIII shows how the upper bounds in Table VI (Section V-D) increase when the gate size upper bounds are removed. In this case, the optimal deterministic power value is fixed. The values are given for the case σ L = 1.6 nm (dtd)/1.15 nm (wid) and σ V th = 4.7% (dtd and wid). When compared to Table VI, the suboptimalities for p m increase by 0.05-0.75 percentage points, and the suboptimalities for p m3σ increase by 0.5-8.6 percentage points.
The suboptimalities increase because a better lower bound is found when larger gates are available in (17) . This is because the maximum size bounds limit the sizes of the gates with the best deterministic power vs. statistical power tradeoff. As the maximum sizes increase, more of the power is used on the gates with the best tradeoffs. However, once the maximum gate size is large enough to use all of the power budget on the gates with the best tradeoff, the suboptimality will not increase any further.
The relation between suboptimality and the maximum size can be seen in Fig. 7. The increase in suboptimality from a maximum size of 4 to 8, and from 8 to 16, is approximately 3.5 percentage points. However, the increase tapers off; the increase from 16 to 32 is 1.48 percentage points and the increase from 32 to 64 is 0.01 percentage points. The suboptimalities for 64 and 128 are identical. This happens because the maximum size bound no longer plays a role in determining the lower bound.

Results for the "assumption free bounds" in Section V-E can also be analyzed to see how the maximum gate size affects the worst-case suboptimality $\delta_{wc}$. Recall that in this case, the bounds are computed for a range of values for $p_d$ and not just one value. Using the same range of values as used in Table VII, the suboptimality also reaches a limit as the maximum gate size increases, as in Fig. 8. The worst-case suboptimality $\delta_{wc}$ reaches its limit of 20.58% for large maximum gate sizes. The values for a maximum gate size of 64 and 128 are identical.
The explanation for this trend is similar to the case in Fig. 7 . However, in this case, an upper bound is also computed in (22) . This upper bound can also take advantage of the increasing maximum gate size by enabling it to use more of the power on the gates with the worst deterministic power vs. statistical power tradeoff. It is important to note that in realistic libraries, gate sizes will not go beyond 64x, with most cells being limited to 16x. As a result, numbers presented in this section are somewhat exaggerated.
G. Suboptimalities for w, v th , and l Sizing
In this section, we extend these results to w, v th , and l assignment. Because we do not have designs with optimal w, v th , and l assignment and, hence, an optimal z, we follow a methodology that is similar to Section V-E. Ten equally spaced values of the deterministic power are chosen ({p d 1 , ..., p d 10 }), where p d 1 is the power with all gates at the minimum power cell, and p d 10 is the power with all gates at the maximum power cell. For each deterministic power, a corresponding lower bound is found (z lb , as in the prior sections) and a corresponding upper bound is found (z ub ). The lower bound is computed by using a variation of (20) that uses a given deterministic power value p d i as follows:
The upper bound is computed by using a variation of (24) as a maximization problem as follows:
The latter problem finds the maximum statistical power configuration that has a deterministic power equal to p d i . Thus, we find the maximum statistical power that can also be an optimal deterministic power solution. Combining the two results, the maximum suboptimality is computed as follows:
This results in a more conservative bound than in Section V-D, because a worst-case lower bound and a worst-case upper bound are computed as follows:
These values are computed for width sizing (δ w,max ) and for full w, v th , and l assignment (δ full,max ). The ratios δ full,max /δ w,max are shown in Table IX. The ratios in Table IX can be used to relate the suboptimality bounds from Section V-D to the case of full w, v th , and l assignment. This table indicates that the values will roughly double in most of the cases, but may increase by up to 6.6x.
Note that in the process above, w, l, and v th are not used. This is because the effect on the power is central to the bounding process, and the effect of w, l, and v th on the power can be summarized by the variable z. Using the actual w, l, and v th cannot be done as p d is not quasi-concave in w, l, and v th .
H. How Conservative are These Bounds?
A natural question to ask about the bounds above is, "How conservative are they?" This is because the bounds for the larger length variations are significant, and it may be useful to perform statistical optimization to see what the actual suboptimality is for those cases.
One indication of how conservative these bounds are comes from looking at the sizes that are used to compute the lower bound. In Fig. 9, the sizes of the deterministic optimum are plotted against the sizes that are used to compute the lower bound. The difference in the sizings is large; many of the minimum-sized gates become large gates, and vice-versa.
Fig. 9. Difference between the sizes that are synthesized for deterministic power (w d ), and the sizes that are used to compute the lower bound (w lb ). The c1355 v2 circuit is shown.
Flipping the gate sizes in this way will generally violate timing. Gates are usually sized larger because they need to drive larger fanouts, and gates are sized smaller because they have smaller fanouts. Thus, by flipping the sizes of these gates, small gates will need to drive larger fanouts, and there will be large gates with small fanout loads.
Another intuition can be gained by re-examining Fig. 5. In both cases (b) and (c), the actual suboptimality is 0%. However, due to the geometry of the sets and the sensitivities, the suboptimality bounds are very different. This indicates that the bounds may be very loose in some cases.
A final comment is related to the mathematical procedure used to create the bounds. The design space was relaxed to be continuous, and the lower bound is obtained from a continuous sizing. Adding in the discreteness constraints can only make the suboptimalities smaller.
To test the bounds, we made several small circuit examples as follows:
1) 7 gate fanout tree: 1 primary input gate, 4 primary output gates;
2) 7 gate fanin tree: 4 primary input gates, 1 primary output gate;
3) 9 gate diamond: 1 primary input gate, 1 primary output gate, with 2, 3, and 2 gates at logic depth 2, 3, 4, respectively;
4) 13 gate star: there is 1 gate in the middle at depth 3, and the fanout and fanin cones are both trees with root at this node;
5) 14 gate, 2-bit adder.
We randomly assigned the gates to these examples, and used enumeration to compute the suboptimality. A variation of σ L = 1.6 nm (dtd)/1.15 nm (wid) and σ V th = 4.7% (dtd & wid) is used. After running the optimization, however, we found that all of these designs have a suboptimality of 0%.
To find examples with nonzero suboptimalities, we randomly generated circuits according to the following rule: 1) the circuit has N gates; 2) gate j is a fanout of gate i with probability p if j > i, and with probability 0 otherwise. We used N = 10 and p = 0.5. All the possible width combinations were enumerated, and the best designs for a series of delay targets were recorded. Fig. 10 shows one example case. The statistical and deterministic optima are the same for the majority of the cases, but they diverge for a few delay targets. In this case, the maximum suboptimality is 10.1%. This example circuit has logic depth 5, and the non-primary output gates have fan-outs of {6, 5, 5, 4, 3, 2, 1}, respectively.
Fig. 10. Suboptimality plot for a ten gate circuit (computed by enumeration). The x-axis is the delay constraint, while the y-axis is the statistical power. The "x" denote statistical minima and the "o" denote deterministic optima. When the black and gray lines diverge, there is a suboptimality gap, which is the difference between the two lines. The maximum gap in this plot is 10.1%. Note that for the majority of the delay constraints, the deterministic and statistical optima are the same.
Fig. 11. Histogram of the suboptimalities for 500 randomly generated circuits. The majority of the examples have suboptimality of 0%.
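The random circuit rule described above (N gates, with gate j a fanout of gate i with probability p only when j > i) can be sketched as follows. The function names, the adjacency-list representation, and the way logic depth is counted (number of gates on the longest path) are assumptions made for illustration and are not taken from the original experiments.

```python
import random


def random_circuit(n_gates=10, p=0.5, seed=None):
    """Return a fanout list per gate; edges only go from i to j with j > i,
    so the result is always acyclic."""
    rng = random.Random(seed)
    fanouts = {i: [] for i in range(n_gates)}
    for i in range(n_gates):
        for j in range(i + 1, n_gates):
            if rng.random() < p:
                fanouts[i].append(j)
    return fanouts


def logic_depth(fanouts):
    """Longest path length, counted in gates. Index order is a valid
    topological order because every edge points to a higher index."""
    depth = {i: 1 for i in fanouts}
    for i in sorted(fanouts):
        for j in fanouts[i]:
            depth[j] = max(depth[j], depth[i] + 1)
    return max(depth.values())


circuit = random_circuit(n_gates=10, p=0.5, seed=0)
print(circuit)
print("logic depth:", logic_depth(circuit))
```

Because edges are only drawn toward higher-numbered gates, no separate cycle check is needed, which keeps the generator simple enough to call hundreds of times.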
We generated 500 of these random circuits, and computed the worst-case suboptimality for each circuit across all delay targets. Most of the circuits that were generated have 0% suboptimality (see the histogram in Fig. 11); however, the worst-case suboptimality was 16.6%, which is in line with the numbers in Table VII. This shows that the suboptimality is likely to be 0%, although the worst-case numbers may be significant.
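The enumeration procedure can be sketched as below. This is a minimal illustration under stated assumptions: each enumerated design is summarized by a hypothetical (delay, deterministic power, statistical power) tuple, and the suboptimality at a delay target is taken to be the relative excess statistical power of the deterministic optimum over the statistical optimum. The delay and power models that would produce these tuples are not shown, and the numbers are made up.

```python
def suboptimality(designs, delay_target):
    """designs: iterable of (delay, p_det, p_stat) tuples.
    Returns the relative gap at this delay target, or None if no design
    meets the target."""
    feasible = [d for d in designs if d[0] <= delay_target]
    if not feasible:
        return None
    # Ties are broken by enumeration order (min keeps the first minimum).
    det_opt = min(feasible, key=lambda d: d[1])
    stat_opt = min(feasible, key=lambda d: d[2])
    return (det_opt[2] - stat_opt[2]) / stat_opt[2]


def worst_case_suboptimality(designs, delay_targets):
    """Largest gap over all delay targets that have a feasible design."""
    gaps = [suboptimality(designs, t) for t in delay_targets]
    return max((g for g in gaps if g is not None), default=None)


# Hypothetical enumerated designs: (delay, deterministic power, statistical power).
designs = [
    (10.0, 5.0, 8.0),
    (8.0, 6.0, 8.8),
    (8.0, 6.5, 8.0),
    (6.0, 9.0, 12.0),
]
targets = [5.0, 6.0, 8.0, 10.0]
for t in targets:
    print(t, suboptimality(designs, t))
print("worst case:", worst_case_suboptimality(designs, targets))
```

In this made-up data the two optima coincide at most targets and differ at only one, which is the same qualitative pattern reported for the enumerated circuits.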
VI. Bridging the Suboptimality Gap
The suboptimality bounds for large l variations give some reason for concern. Some of the bounds are over 10%, which is too large to ignore. In this section, we present ways to bridge the suboptimality gap using simpler measures.
There is a significant cost to using the p m3σ (·) measure. It is significantly more complicated than the other power measures. While the deterministic (p d (·)) or the mean (p m (·)) power measures are linear in z, the p m3σ (·) measure is nonlinear in z (although it is convex in z). Linear measures have the advantage that the total power is the sum of the individual powers. Thus, these measures can be used in existing optimization methods by replacing the power values in the library files with the statistical power values. Fortunately, there is a simple linear approximation that we can use to bridge the suboptimality gap. The idea is to use the variation information to construct a linear measure that has a low suboptimality gap. This would result in a measure that is a provably good approximation to the full statistical optimization, and can be used by replacing the deterministic power values by these approximate values. We define this approximation to the p m3σ measure as follows:
The derivation of this approximation is purely empirical; this measure was found to give very small suboptimality bounds. It can be interpreted as using the 1-σ value of the random variables v th w(i) , v th d , L w(i) , and L d . The √2/2 factor is used to account for intra-die cancellation. This measure performs better than the p m measure and it is also better than the 3-σ value of the random variables.
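To illustrate why linearity matters, the sketch below contrasts an additive measure with a mean plus 3σ measure. It is a toy Gaussian model, not the leakage model or the approximation used in this paper: each gate is assumed to have a mean power, a die-to-die standard deviation that is perfectly correlated across gates, and a within-die standard deviation that is independent across gates. All names and numbers are hypothetical.

```python
import math


def linear_power(per_gate_values):
    """A linear measure: total power is the sum of per-gate table values."""
    return sum(per_gate_values)


def mean_plus_3sigma(means, sigma_dtd, sigma_wid):
    """Toy mean + 3 sigma of the total power. Die-to-die sigmas add linearly
    (perfect correlation); within-die sigmas add in quadrature (independence)."""
    mean = sum(means)
    variance = sum(sigma_dtd) ** 2 + sum(s * s for s in sigma_wid)
    return mean + 3.0 * math.sqrt(variance)


means = [1.0, 2.0]
sigma_dtd = [0.3, 0.4]
sigma_wid = [0.4, 0.3]

# Evaluating mean + 3 sigma gate by gate and summing overestimates the total,
# so the measure cannot be stored as one number per library cell.
per_gate = [
    mean_plus_3sigma([m], [a], [b])
    for m, a, b in zip(means, sigma_dtd, sigma_wid)
]
total = mean_plus_3sigma(means, sigma_dtd, sigma_wid)
print("sum of per-gate values:", linear_power(per_gate))
print("mean + 3 sigma of total:", total)
```

The mismatch between the two printed values is the reason a linear proxy is attractive: a linear measure can be written into the library as one value per cell, while the mean plus 3σ of the total depends on the whole design.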
Worst-case suboptimality bounds for this approximation are given in Table X for full w, v th , and l assignment (as in Section V-G), 1-32x sized gates (as in Section V-F), and the same range for p d as used in Table VII. This measure is very effective, as it reduces the suboptimality gap to 3.5% or less. This is around an 80% reduction in the suboptimalities. This indicates that although the p m3σ measure may be different from the deterministic measure, the sensitivities can be well approximated by a linear measure.
In contrast, p m is not as good a linear approximation to p m3σ . The resulting suboptimality bounds give only about a 10% reduction in the suboptimalities.
VII. Conclusion
In this paper, we compared deterministic solutions of sizing problems with statistical solutions of sizing problems. The rankings of the solutions coincide very well for the mean power measure, indicating that good deterministic solutions will be equally good mean power solutions. However, the other measures show significant discrepancies in their rankings. These findings were substantiated by computing the worst-case bounds on the suboptimality gap. The bounds are always insignificant for the mean-power measure p m , but they may become significant for the p m3σ measure when the inter-die component of the length variations is large. As a way to bridge the suboptimality gap, we presented a proxy measure for the p m3σ measure, which is an excellent approximation, and has an insignificant suboptimality.
