In this paper, we present a statistical power estimation method in which estimation time and accuracy are balanced by assigning smaller errors to the nodes with higher power dissipation and larger errors to the nodes with lower power dissipation. To calculate the error rates for individual nodes, a quadratic programming based problem is formulated which incorporates the distribution data of all individual node switching activities. An iterative statistical power estimation system is also presented. Finally, we present experimental results which show a drastic reduction in the number of simulation patterns compared to previous methods.
Introduction
With the increasing complexity and operating frequency of VLSI circuits, power reduction has become one of the most important design goals. Especially in the design of portable and mobile electronic equipment, failure to meet the power consumption specification may result in repeated redesign of the system, because high power consumption shortens the battery life and, even worse, may lower the reliability of the system.
Gate-level power estimation techniques are divided into two categories [1]: probabilistic and statistical techniques. In probabilistic techniques [2, 3], the average switching rates (called transition densities) of the primary inputs are propagated through the circuit to the primary outputs using gate-level transfer functions. Probabilistic techniques are usually fast in computing the power dissipation, but their accuracy may not be reliable enough for use in the design process. This is mainly because the topological correlation caused by signal reconvergence can hardly be modeled using simple signal probabilities and probabilistic transfer functions of logic elements.
On the other hand, in sampling-based simulation methods, either the average switching activities of individual circuit nodes [4] or the total switching activity of a circuit [5] [6] [7] [8] is estimated. The advantages of the simulation-based techniques are their accuracy and the simplicity of implementation using an existing logic simulator with realistic delay models, where glitches and signal correlations are taken care of automatically. To determine the number of simulation patterns (sample size), a stopping criterion is derived under some statistical assumptions. For example, normality of the distribution of the average transition densities of individual nodes [4] or normality of the distribution of the total transition density [5] is often assumed. The average transition density is then estimated with a user-specified error and confidence level. Extensions of the approaches in [4] and [5] have been proposed subsequently [6] [7] [8].
It has been shown experimentally that the nodes with higher transition densities tend to converge more rapidly to a given error bound than those with lower transition densities [4]. Thus, if a constant estimation error is specified for every individual node as in [4], the actual errors for the nodes with high switching activities will be much lower than the given target errors. Moreover, because of the slow convergence of the nodes with low transition densities, the number of simulation patterns can be unacceptably large. Fig. 1 illustrates this behavior. For the c6288 and s38417 benchmark circuits, the statistical technique of [4] has been used with 5% error and 99% confidence. In Fig. 1, all nodes were partitioned into 20 groups according to their transition densities, and the resulting error in each group has been averaged. The results in Fig. 1 indicate that a fixed estimation error for all nodes may not be adequate to meet a reasonable simulation time budget. In [4], to alleviate the slow convergence of the nodes with low transition densities, the circuit nodes are classified into regular- and low-density nodes. For the low-density nodes, whose average switching densities are less than some value η_min, an absolute error bound η_min ε is applied with the same confidence level, where ε is the user-specified percentage error for regular-density nodes. Although this approach can speed up the convergence of statistical power estimation, there is no clear way to determine the value of η_min that separates low-density and regular-density nodes. For practical applications such as electromigration analysis and power optimization, it is evident that the nodes with higher power dissipation should be identified with smaller errors, even at the sacrifice of the estimation errors for low power nodes.
Therefore, for fast sampling-based statistical power estimation, the errors for individual nodes may need to be graded according to their relative contributions to the total power dissipation. That is, individual nodes are assigned gradually decreasing errors as their contribution to the total power consumption increases. In fact, this type of error assignment naturally matches the characteristics of simulation-based power estimation shown in Fig. 1.
In this paper, we present a method to determine errors that depend on each individual node's relative contribution to the total power dissipation. Since it can be impractical to assign a different error to every individual node in a circuit, we partition all nodes in a circuit into several groups according to their contribution to the total power dissipation. Gradually decreasing errors are then assigned to the groups in such a way that the nodes with high power dissipation have smaller errors. For this, a quadratic programming based error calculation problem is formulated. An iterative statistical power estimation method is also proposed.
A Statistical Power Estimation Problem

General Ideas
The number of simulation patterns in a sampling-based statistical power estimation depends entirely on the stopping criterion. The stopping criterion of the MED approach [4] is
$$ N \ge \left( \frac{Z_{\alpha/2}\, s}{\varepsilon\, \bar{n}} \right)^{2} \qquad (1) $$
where N, ε, n̄, and s are the number of simulation patterns, the user-specified error, the sample mean of the number of transitions at a node, and the sample standard deviation, respectively. Z_{α/2} is obtained from the normal distribution for the confidence level (1 − α).
From Eq. (1), with constant values of ε and α, we can observe that the node with the largest value of s/n̄ determines the largest N, which is the total number of simulation patterns. Unlike the MED approach with a fixed error for all nodes, we use variable errors for individual nodes to balance the simulation time and the estimation accuracy. The variable errors are determined such that the nodes with higher transition densities are estimated more accurately by tightening their estimation error bound. Although the error rates are relaxed for low power nodes, we put a constraint on the relaxed errors so that the sum of the estimated power dissipation of all nodes is within a certain error ε_T of the total power dissipation. This additional constraint bounds the excessive errors for low-density nodes.
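As a minimal sketch of how the stopping criterion drives the sample size, the following Python fragment assumes the standard normal-approximation form N ≥ (Z_{α/2} s / (ε n̄))² for a single node and takes the circuit-level sample size as the maximum over all nodes. The function names `required_patterns` and `patterns_for_circuit` are illustrative and not part of the original implementation.

```python
import math
from statistics import NormalDist, mean, stdev


def required_patterns(samples, eps, confidence):
    """Patterns needed for one node, assuming N >= (z * s / (eps * n_bar))**2.

    samples    : transition counts observed at the node, one per pattern
    eps        : relative error bound (e.g. 0.05 for 5%)
    confidence : two-sided confidence level (e.g. 0.99)
    """
    n_bar = mean(samples)
    if n_bar == 0:
        # No transition observed yet: a relative bound cannot be evaluated.
        return math.inf
    s = stdev(samples)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * s / (eps * n_bar)) ** 2)


def patterns_for_circuit(per_node_samples, eps, confidence):
    """With one fixed error for all nodes, the slowest node sets the sample size."""
    return max(required_patterns(s, eps, confidence) for s in per_node_samples)
```

Under these assumptions, a node that rarely switches has a large s/n̄ and dominates the result, while relaxing ε for that node alone reduces its requirement quadratically.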
An Optimization Problem Formulation
To reduce the complexity of the analysis, all nodes are partitioned into M groups according to their contributions to the total power dissipation. Let G_i and G_j be the i-th and j-th partitions, and let ε_i and ε_j be the errors assigned to the nodes in G_i and G_j, respectively. It is also assumed that, if i > j, the nodes in G_i consume less power than the nodes in G_j. Thus, G_1 is the group of nodes with the highest power consumption. Let the estimated power of the nodes in G_i be P_i and the sum of the estimated power dissipation of all nodes be P_T. Finally, ε_T is the user-specified estimation error of the total power P_T. Then, the following equations hold. Obviously, if ε_i = ε_T, Eq. (4) holds and the problem reduces to the MED case. Since our goal in this work is to determine gradually decreasing error rates so that the nodes with a higher contribution to the total power dissipation are estimated more accurately, we need a constraint as in Eq. (5).
where β is some arbitrarily small positive number. Notice that the sample size of this problem is the maximum of the individual sample sizes of the groups. Therefore, to achieve the maximum speedup in convergence, we need an equal number of simulation patterns for the nodes in different partitions. In Eq. (7), the index 1 represents the group of nodes with the highest power consumption, and ε_1 can be specified by the user.
The purpose of Eq. (7) is to equalize the sample sizes by preventing a certain term in the quadratic cost function from being excessively large, which determines the total sample size in our problem. Now, we can formulate an optimization problem as below.
<An Optimization-Based Error Calculation Problem> Find the error rates ε_1, ε_2, ..., ε_M for the partitions G_1, G_2, ..., G_M which minimize the quadratic objective function of Eq. (7) subject to the linear constraints in Eqs. (4) and (5).
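As an illustration only, suppose the total-error constraint is read in its worst-case form, Σ ε_i P_i ≤ ε_T P_T, and the grading constraint as ε_{i+1} − ε_i ≥ β. Both forms are assumptions of this sketch rather than the exact Eqs. (4) and (5), and the function name `check_error_assignment` is hypothetical. The fragment only checks whether a candidate error assignment is feasible under these assumed constraints; it does not solve the quadratic program.

```python
def check_error_assignment(eps, power, eps_total, beta=1e-6):
    """Check a candidate error assignment against two ASSUMED linear constraints.

    eps[i], power[i] : error and estimated power of group G_{i+1},
                       ordered from the highest-power group to the lowest
    eps_total        : user-specified error of the total power
    beta             : small positive gap between consecutive group errors
    Returns (total_ok, graded_ok).
    """
    total_power = sum(power)
    # Assumed worst case: every group is off by its full error in the same direction.
    worst_case = sum(e * p for e, p in zip(eps, power))
    total_ok = worst_case <= eps_total * total_power
    # Assumed grading: errors grow by at least beta toward lower-power groups.
    graded_ok = all(eps[i + 1] - eps[i] >= beta for i in range(len(eps) - 1))
    return total_ok, graded_ok
```

For example, with group powers 50, 30, and 20 and ε_T = 5%, the assignment (2%, 4%, 8%) passes both checks under these assumptions, whereas a uniform 5% assignment is not graded.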
Circuit Node Partitioning
The efficiency of our method depends on the way in which the circuit nodes are partitioned. In this work, we try three different ways of partitioning after an initial simulation, typically with more than 30 simulation samples so that the normality assumption based on the Central Limit Theorem is reasonable.
(1) Type 1 : Interval based partitioning
From the simulation data of individual power consumption, the minimum and the maximum value of the power dissipation of all nodes can be determined. With these data, we divide the interval between the minimum value (V_min) and the maximum value (V_max) into M fixed groups. Let ∆ be (V_max − V_min)/M. Then, the nodes in G_i have power dissipation between V_min + ∆*(i−1) and V_min + ∆*i. This partitioning method is simple, but it does not reflect the distribution of power dissipation.
(2) Type 2 : Distribution based partitioning with equal weights
First, we sort the individual node power dissipation and construct its cumulative distribution. If the individual nodes are to be partitioned into M groups, the sum of the power dissipation in each partition contributes 100/M percent of the total power. With this approach, the size of each partition depends on the shape of the cumulative distribution.
(3) Type 3 : Distribution based partitioning with increasing weights
This partitioning approach is a modified version of Type 2. Unlike the equal weights of Type 2, each partition in this approach has a different contribution to the total power dissipation, such that the sum of the power dissipation in G_1 is larger than that of G_2, and so on. We give the weight (M+1−i) to G_i, so the overall weight becomes M(M+1)/2. For example, if M = 10, the aggregate power contribution of the nodes in G_1 to the total power is about 18.2% and the aggregate power contribution of the nodes in G_10 is about 1.8%. This partitioning can lead the quadratic programming tool to obtain higher errors on the nodes with lower power dissipation and lower errors on the nodes with higher power dissipation.
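The three partitioning schemes can be sketched as follows. This is a minimal illustration, not the original implementation: the function names are hypothetical, groups are returned highest-power first to follow the G_1 convention of the formulation, and a node that straddles a cumulative boundary is placed in the group that was still open when it was reached, which is one of several reasonable choices.

```python
def partition_by_interval(power, m):
    """Type 1: m equal-width intervals between the minimum and maximum node power."""
    v_min, v_max = min(power), max(power)
    delta = (v_max - v_min) / m
    groups = [[] for _ in range(m)]
    for node, p in enumerate(power):
        # Interval index counted from V_min; the maximum falls into the last interval.
        k = 0 if delta == 0 else min(int((p - v_min) / delta), m - 1)
        groups[k].append(node)
    return groups[::-1]  # reorder so that groups[0] holds the highest-power nodes


def increasing_weights(m):
    """Type 3 weights: group G_i gets weight m + 1 - i, i.e. m, m - 1, ..., 1."""
    return list(range(m, 0, -1))


def partition_by_weights(power, weights):
    """Types 2 and 3: split the cumulative power distribution by group weights."""
    order = sorted(range(len(power)), key=lambda n: power[n], reverse=True)
    total, total_w = sum(power), sum(weights)
    bounds, acc = [], 0
    for w in weights:
        acc += w
        bounds.append(total * acc / total_w)  # cumulative power where a group closes
    groups = [[] for _ in weights]
    k, cum = 0, 0.0
    for node in order:
        # Advance to the next group once the current share has been filled.
        while k < len(weights) - 1 and cum >= bounds[k]:
            k += 1
        groups[k].append(node)
        cum += power[node]
    return groups
```

Type 2 corresponds to `partition_by_weights(power, [1] * m)` and Type 3 to `partition_by_weights(power, increasing_weights(m))`. For M = 10 the weights sum to 55, so the first and last groups target 10/55 and 1/55 of the total power.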
We have performed experiments with these three partitioning methods for c6288 and s38417 ISCAS benchmark circuits [9, 10] , where 30 random patterns are simulated and M=10. The percentages of the nodes in each partition are shown in Fig. 2 . 
Iterative Statistical Power Estimation
Since the initial partitioning is based upon a small number of simulation results, some nodes may be placed in the wrong partition and may therefore be assigned incorrect error rates. Therefore, we need to iterate the statistical power estimation until all nodes are verified to be correctly partitioned. The overall scheme of our statistical power estimation is shown in Fig. 3.
First, we simulate the circuit with 30 or more simulation patterns. Then, we partition the nodes in the circuit and determine their error rates by solving the quadratic programming based problem. Next, we simulate the circuit until the power of all nodes is estimated within their calculated error rates and the confidence level. To make sure that all individual nodes are estimated with correct error rates, we repeat the circuit partitioning and calculate the error rates again. If the power dissipation of every node is already estimated within the newly calculated error, the procedure stops; otherwise, simulation continues with the new error rates.
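The iteration described above can be sketched as a small driver loop. Everything here is an illustrative assumption: `assign_errors` is a hypothetical callback standing in for the partitioning and quadratic programming steps, `rank_based_errors` is a simple placeholder for it that grades errors linearly by power rank, and `make_toy_simulator` replaces the logic simulator with independent random toggles.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def node_converged(samples, eps, z):
    """Per-node stopping test: z * s / sqrt(N) <= eps * n_bar."""
    n_bar = mean(samples)
    if n_bar == 0:
        return False  # no transitions observed yet, keep simulating
    return z * stdev(samples) / math.sqrt(len(samples)) <= eps * n_bar


def rank_based_errors(means, eps_high=0.05, eps_low=0.20):
    """Placeholder for partitioning + QP: smallest error for the most active node."""
    order = sorted(range(len(means)), key=lambda n: means[n], reverse=True)
    step = (eps_low - eps_high) / max(len(means) - 1, 1)
    eps = [0.0] * len(means)
    for rank, node in enumerate(order):
        eps[node] = eps_high + rank * step
    return eps


def make_toy_simulator(probs, seed=0):
    """Toy stand-in for a logic simulator: node k toggles with probability probs[k]."""
    rng = random.Random(seed)
    return lambda: [1 if rng.random() < p else 0 for p in probs]


def estimate(simulate, num_nodes, assign_errors, confidence=0.999,
             initial=30, batch=30, max_patterns=100_000):
    """Iterate: assign errors, simulate to convergence, reassign, and re-check."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    samples = [[] for _ in range(num_nodes)]

    def run(count):
        for _ in range(count):
            for node, transitions in enumerate(simulate()):
                samples[node].append(transitions)

    def all_converged(eps):
        return all(node_converged(s, e, z) for s, e in zip(samples, eps))

    run(initial)
    iterations = 0
    while True:
        iterations += 1
        eps = assign_errors([mean(s) for s in samples])
        while not all_converged(eps) and len(samples[0]) < max_patterns:
            run(batch)
        # Re-derive the errors from the refined estimates and verify them.
        new_eps = assign_errors([mean(s) for s in samples])
        if all_converged(new_eps) or len(samples[0]) >= max_patterns:
            return [mean(s) for s in samples], len(samples[0]), iterations
```

A call such as `estimate(make_toy_simulator([0.8, 0.5, 0.2]), 3, rank_based_errors, confidence=0.95)` returns the per-node estimates, the number of patterns used, and the number of outer iterations. The `max_patterns` cap is a safeguard added in this sketch so that the loop always terminates.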
Experimental Results
We have implemented the proposed statistical power estimation method in C and have performed several experiments on ISCAS benchmark circuits on a SUN Sparc-20 workstation. We use a gate-level logic simulator with a unit-delay model. Although the unit-delay model may not fully represent the switching activity of a circuit, our method can be used in conjunction with any logic simulator. Our main objective in the following experiments is to compare the relative efficiency of our method and the conventional statistical estimation methods, which use a constant error for the power estimation of all individual nodes. Table 1 shows the comparison between the MED approach and ours for the ISCAS85 benchmark circuits. In the experiment with the MED approach, 99.9% confidence and a 5% user-specified error have been specified for all nodes, and no η_min (predefined lower-bound absolute error) has been assumed. In our proposed method, on the other hand, the overall confidence, the total power estimation error, and the error for the partition with the highest power dissipation were set to 99.9%, 5%, and 2%, respectively. All nodes in each circuit have been partitioned into 25 groups.
As the results in Table 1 show, our method requires far fewer simulation patterns with all three partitioning methods. The results obtained with the type 2 and type 3 partitioning methods outperform the type 1 result, which implies that partitioning methods based upon the distribution data of individual node power are well suited for statistical power estimation. In particular, for the circuit c2670 with type 2 partitioning, the number of samples required is as low as 1.1% of the samples of the MED approach. On average, the type 2 approach requires only 3.2% of the sample size of MED. The CPU time spent on solving the quadratic programming problem is negligible compared to the overall simulation time, and the number of iterations is less than two in all experiments. Consequently, the speedup ratio achieved by our method over the MED approach is close to the sample size reduction ratio. While the results in Table 1 may seem quite obvious, they show that efficient control of error rates can lead to an accurate estimation of high power dissipation nodes and of the total power consumption in significantly shorter CPU time than conventional statistical power estimation methods.
The number of partitions can affect the overall power estimation efficiency. In general, when solving the quadratic programming problem, more partitions produce higher errors for the groups of low power nodes and lower errors for the groups of high power nodes. However, if the errors for the high power nodes are too low, satisfying the errors for the high power nodes can become the bottleneck instead of satisfying the errors for the low power nodes. In our experience, 20 to 30 partitions yield reasonable results, and the time spent on solving the quadratic programming problem is less than 30 seconds for up to 50 partitions. In Fig. 4, the target error rates and the real error rates in the c6288 and c7552 circuits are compared against the error rates of the MED approach. In our method, the real error rates roughly approach the target error rates. This phenomenon indicates that our method can avoid excessive simulation effort and can thus accelerate the power estimation process.
Fig. 4. Variable error rate when using our approach
In Table 2, some benchmark results for the ISCAS89 circuits are shown for the MED approach and ours. For larger circuits, MED sets a threshold value to reduce the execution time to a practical level. In the MED approach, the 20% of the circuit nodes with the lowest transition densities have been treated as low-density nodes, and η_min was chosen to include these nodes. In this experiment, we have followed the same procedure as in MED and have treated the same 20% of the circuit nodes as low-density nodes. The stopping conditions are 99.9% confidence, a 5% user-specified error for the total power, and a 2% error for the group of nodes with the highest power dissipation. As shown in Table 2, our method needs about 30-70% of the simulation patterns of the MED approach. Because an absolute error has been applied to the 20% of low-density nodes, the saving in sample size achieved by our method is reduced. However, it is still possible to apply our method in any operating environment and to speed up the statistical power estimation process.
Conclusion
In this paper, we have proposed a new statistical estimation method for the individual node power as well as the total power in CMOS logic circuits. Unlike the conventional statistical power estimation methods, the error rate for each node is determined according to the relative contribution of that node to the total power dissipation. Hence, the nodes with higher power dissipation can have smaller error rates at the sacrifice of higher estimation errors for the nodes with lower power dissipation. For this, a quadratic programming based problem was formulated and an iterative statistical power estimation system has been developed. Compared to the previous methods, which employ a constant error for all nodes, our proposed method balances time and estimation accuracy in simulation-based statistical power estimation. In our experiments on the ISCAS85 circuits, the power dissipation of all individual nodes was estimated with, on average, 3.2% of the sample size of the MED technique when type 2 partitioning was used.
