Abstract-In this paper, we develop a novel technique based on Markov chains to accurately estimate power sensitivities to primary inputs in CMOS sequential circuits. A key application of power sensitivities is to construct a complicated power surface in the specification-space so as to easily obtain the power dissipation under any distribution of primary inputs, thereby offering an effective power macromodel for high-level power estimation. We demonstrate that such a power surface can be approximated by only a limited number of representative points. This benefit dramatically reduces the CPU and memory requirements. We have verified the feasibility and accuracy of the new technique to estimate power sensitivities on a large number of sequential benchmark circuits. Results on the power dissipation under different distributions of primary inputs demonstrate the efficiency and effectiveness of our power macromodeling technique.
I. INTRODUCTION
The increasing use of portable computing and communication systems makes power dissipation a critical parameter to be minimized during circuit and system design [5], [17]. Hence, there is a great need for tools to accurately estimate power dissipation at various levels of design abstraction.
Research on power estimation has started in earnest [14]; however, most of the research concentrates on the logic level. In order to shorten design time and reduce design iterations, we have to estimate power dissipation at a high level of abstraction. One of the main objectives of high-level power estimation is to develop a power macromodel for a module so that power dissipation can be easily obtained under different distributions of primary inputs [12], [13], [15], [16], [18]. When the same module is reused, we can obtain its power by simply using a look-up table.
A good macromodel must be able to determine the power under different primary input distributions. Since power dissipation of a circuit is strongly dependent on the statistics of primary inputs, the relationship of power versus primary input probabilities (probability of a signal being logic ONE) and activities (probability of a signal switching) is a complicated surface. Once such a surface is set up, power dissipation under different distributions of primary inputs can be easily obtained. However, to construct such a power surface, a large number of discrete points are required. If one chooses $d$ representative values for the probability and activity of each primary input, the number of representative points in the specification-space can be $d^{2m}$ ($m$ is the number of primary inputs). Hence, to generate the power surface, a symbolic or statistical power estimation process has to be repeated $d^{2m}$ times. For large circuits with a large number of inputs, such a process is obviously impractical due to the exponential growth of complexity. Moreover, the memory or storage complexity is $O((2m + 1)d^{2m}) = O(md^{2m})$.
Publisher Item Identifier S 0278-0070(00)10294-5.
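To give a feel for this growth, the following minimal Python sketch evaluates the number of grid points $d^{2m}$ and the number of stored values $(2m + 1)d^{2m}$ for a full grid. The function name `surface_cost` and the sample values of $d$ and $m$ are illustrative assumptions, not taken from the experiments.

```python
def surface_cost(d, m):
    """Cost of a full-grid power surface (illustrative helper, not from the paper).

    d: number of representative values per probability/activity axis
    m: number of primary inputs
    Returns (grid points, stored values), i.e. (d^(2m), (2m + 1) * d^(2m)).
    """
    points = d ** (2 * m)            # one power estimation run per grid point
    stored = (2 * m + 1) * points    # 2m coordinates plus one power value per point
    return points, stored


# Hypothetical example: 5 values per axis, 10 primary inputs.
points, stored = surface_cost(d=5, m=10)
print(points, stored)
```

Even for this modest hypothetical module the grid requires $5^{20}$ estimation runs, which is the motivation for using only a few representative points.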
In this paper, we present a novel macromodeling technique based on power sensitivity. The basic idea of our technique is to use a limited number of representative points in the specification-space to approximate a complicated power surface. Power dissipation under different distributions of primary inputs is calculated by considering the representative points and power sensitivities. The power macromodeling technique can be applied to both combinational and sequential circuits as long as efficient techniques to estimate power sensitivities are available. In [7] and [9], symbolic and statistical techniques have been proposed to estimate power sensitivities in combinational circuits. In this paper we present a novel approach based on Markov chains to accurately estimate power sensitivities in sequential circuits. The power sensitivities are then used to develop the power macromodel to estimate power under different distributions of primary inputs.
II. PRELIMINARIES

A. Power Dissipation in CMOS Logic Circuits
Among the three sources of power dissipation (switching current, short-circuit current, and leakage current), switching power is by far the most dominant in current technology. Thus the average power for a CMOS circuit can be approximated by $\mathrm{Power}_{avg} = \frac{1}{2} V_{dd}^2 \sum_{j \in \text{all nodes}} C(j) A(j)$, where $V_{dd}$ is the supply voltage, $C(j)$ is the node capacitance, and $A(j)$ is the activity at node $j$. Since $A(j)$ is proportional to the normalized activity $a(j)$ [$a(j) = A(j)/f$, where $f$ is the clock frequency] and $C(j)$ is approximately proportional to the fanout at node $j$, we can define the normalized power dissipation
$$\Phi = \sum_{j \in \text{all nodes}} \mathrm{fanout}(j)\, a(j)$$
where $\mathrm{fanout}(j)$ is the fanout number at node $j$.
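As a small worked example, the sketch below evaluates the normalized power dissipation as a fanout-weighted sum of normalized node activities, which is the form implied by the proportionality argument above. The node names, fanout counts, and activity values are hypothetical.

```python
def normalized_power(fanout, activity):
    """Fanout-weighted sum of normalized activities over all nodes.

    fanout:   dict mapping node name -> fanout count
    activity: dict mapping node name -> normalized activity a(j)
    """
    return sum(fanout[node] * activity[node] for node in fanout)


# Hypothetical three-node circuit.
fanout = {"n1": 2, "n2": 1, "n3": 3}
activity = {"n1": 0.25, "n2": 0.5, "n3": 0.125}
phi = normalized_power(fanout, activity)
print(phi)
```

Because the supply voltage, clock frequency, and per-fanout capacitance are constants for a given design, ranking or comparing input distributions by this quantity is equivalent to comparing their average switching power under the stated approximation.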
B. Power Sensitivity
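To measure the effect of variations in primary input specifications on power dissipation, we define power sensitivity to primary input activity $S_{a(x_i)}$ and power sensitivity to primary input probability $S_{P(x_i)}$ as
$$S_{a(x_i)} = \frac{\partial \mathrm{Power}_{avg}}{\partial a(x_i)}, \qquad S_{P(x_i)} = \frac{\partial \mathrm{Power}_{avg}}{\partial P(x_i)}.$$
$\mathrm{Power}_{avg}$ is proportional to $\Phi$. Therefore, we can define normalized power sensitivity to primary input activity $\zeta_{a(x_i)}$ and normalized power sensitivity to primary input probability $\zeta_{P(x_i)}$ in terms of $\Phi$ as follows:
$$\zeta_{a(x_i)} = \frac{\partial \Phi}{\partial a(x_i)} = \sum_{j \in \text{all nodes}} \mathrm{fanout}(j) \frac{\partial a(j)}{\partial a(x_i)}, \qquad \zeta_{P(x_i)} = \frac{\partial \Phi}{\partial P(x_i)} = \sum_{j \in \text{all nodes}} \mathrm{fanout}(j) \frac{\partial a(j)}{\partial P(x_i)}$$
where $a(j)$ is the activity of node $j$. For simplicity, we refer to $\partial a(j)/\partial a(x_i)$ and $\partial a(j)/\partial P(x_i)$ as activity sensitivities.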
0278-0070/00$10.00 © 2000 IEEE
Let $\xi(x_i)$ be $P(x_i)$ or $a(x_i)$. The $m$th-order ($m \ge 2$) power sensitivities to primary input $x_i$ in terms of $\Phi$ can be defined as
where $x_k$ varies over all primary inputs.
III. A NOVEL POWER MACROMODELING TECHNIQUE
In this section, we first present our power macromodel based on power sensitivities. Then, we show how to use a limited number of points to approximately construct a complicated power surface in the specification-space so as to obtain power under different distributions of primary inputs.
A. Power Macromodel
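If we can determine the power sensitivities to each primary input, we can compute power under different distributions of primary inputs using the following equation:
$$\mathrm{Power} = \mathrm{Power}_{nom} + \sum_i \left[ S_{P(x_i)} \Delta P(x_i) + S_{a(x_i)} \Delta a(x_i) \right] + \frac{1}{2!} \sum_i \sum_k \left[ \frac{\partial^2 \mathrm{Power}}{\partial P(x_i)\, \partial P(x_k)} \Delta P(x_i) \Delta P(x_k) + 2 \frac{\partial^2 \mathrm{Power}}{\partial P(x_i)\, \partial a(x_k)} \Delta P(x_i) \Delta a(x_k) + \frac{\partial^2 \mathrm{Power}}{\partial a(x_i)\, \partial a(x_k)} \Delta a(x_i) \Delta a(x_k) \right] + \cdots + R_n \qquad (6)$$
where $\Delta P(x_i) = P(x_i) - P(x_i)_{nom}$, $\Delta a(x_i) = a(x_i) - a(x_i)_{nom}$, all derivatives are evaluated at the nominal point, and $i$ and $k$ vary over all primary inputs (PI's). Equation (6) is Taylor's formula with Lagrange remainder $R_n$, the initial point being $(P(x_1)_{nom}, a(x_1)_{nom}, \ldots, P(x_m)_{nom}, a(x_m)_{nom})$ ($m$ is the number of primary inputs) [11]. $\mathrm{Power}_{nom}$ is the average power dissipation based on nominal probabilities ($P(x_i)_{nom}$) and activities ($a(x_i)_{nom}$).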
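For the first-order approximation, we can ignore the second- and higher-order power sensitivities to obtain power by
$$\mathrm{Power} \approx \mathrm{Power}_{nom} + \sum_i \left[ S_{P(x_i)} \left( P(x_i) - P(x_i)_{nom} \right) + S_{a(x_i)} \left( a(x_i) - a(x_i)_{nom} \right) \right] \qquad (7)$$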
B. Construction of Power Surface
Power dissipation of a CMOS circuit is heavily dependent on the primary input probabilities and activities. If we plot the relationship between average power and input specifications in terms of signal probability and activity (in this paper we call such a plot a power-specification plot), we will get a complicated surface. Since $2m$ ($m$ is the number of primary inputs) dimensions are required to represent a certain specification of primary inputs, the power-specification plots are intrinsically
Obviously, for a CMOS circuit, once a power surface is available, power dissipation under different distributions of primary inputs can be obtained. But the process of obtaining such a power-specification plot is nontrivial. A possible method is to use random number generators to generate a large number of distributions of primary inputs and then use a symbolic or statistical method to obtain the power dissipation corresponding to each such distribution, which is a representative point for the power surface in the specification-space. However, the effectiveness of this method strongly depends on the density of the chosen points. The more points one chooses, the more accurate the result, but more points directly translate to longer CPU time. In practice, one must use a finite number of points to approximate a complex power surface. In order to obtain the power dissipation under a given specification of primary inputs, one has to approximate its power by the value of a simulated point. This means that for some inputs there exist differences between the actual values of the probabilities and activities and those values corresponding to the representative points in the specification-space. In [7], it has been shown that for some circuits a small deviation in the statistics of some primary inputs may have a major effect on power dissipation. For those sensitive primary inputs, such approximations may put the average power severely off the actual value. These errors can be effectively reduced by taking power sensitivities at different nominal values.
The basic idea of our method is to approximate a complex power surface with a number of small planes. First, randomly choose $d$ different statistics of primary inputs. Then, determine average power and power sensitivities for the $d$ different statistics of primary inputs. Given a certain set of primary input statistics, we use the nearest point (see Section V-B) in the specification-space as the nominal value for the required power [$\mathrm{Power}_{nom}$ in (7)]. Finally, the required power dissipation can be obtained by (7). Consequently, the vicinity of a representative point is approximately characterized by a plane.
To illustrate how our technique works, let us consider a two-input AND gate $y = x_1 \cdot x_2$, where $x_1$ and $x_2$ are independent primary inputs. We have activity $a(y) = P(x_1)a(x_2) + P(x_2)a(x_1) - \frac{1}{2}a(x_1)a(x_2)$ [10]. If we plot activity $a(y)$ versus primary input specifications ($P(x_1)$, $a(x_1)$, $P(x_2)$, $a(x_2)$), we get a five-dimensional surface which cannot be visualized. Fig. 1 gives a simplified activity surface for the two-input AND gate based on the above expression with $P(x_1) = 0.3$ and $P(x_2) = 0.3$. Approximating the activity surface, by (7), we have
$$a(y) \approx a(y)_{nom} + \left( 0.3 - 0.5\, a(x_2)_{nom} \right) \left( a(x_1) - a(x_1)_{nom} \right) + \left( 0.3 - 0.5\, a(x_1)_{nom} \right) \left( a(x_2) - a(x_2)_{nom} \right)$$
where $(a(x_1)_{nom}, a(x_2)_{nom}, a(y)_{nom})$ is a representative point for the activity surface obtained when $P(x_1) = 0.3$ and $P(x_2) = 0.3$, and $a(y)_{nom} = 0.3\,a(x_2)_{nom} + 0.3\,a(x_1)_{nom} - 0.5\,a(x_1)_{nom}\,a(x_2)_{nom}$.
Having proposed the macromodel and shown how to construct the power surface under different distributions of primary inputs, we point out that the number of representative points $d$ used to construct such a power surface need not be large. In some special cases, $d$ can be as small as one (see Section V-B). It should also be emphasized that the macromodeling technique can be applied not only to combinational circuits but also to sequential circuits. The efficiency of such a technique is determined by whether efficient techniques to estimate power sensitivities can be developed. In the following section we will present an efficient technique to obtain power sensitivities as a by-product of power estimation processes.
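The AND-gate example can be checked numerically. The following minimal Python sketch evaluates the exact activity expression quoted above and its first-order (plane) approximation around a nominal point; the function names and the chosen nominal and query activities are illustrative assumptions. Since the exact expression is bilinear in $a(x_1)$ and $a(x_2)$, the approximation error equals $-\frac{1}{2}\Delta a(x_1)\Delta a(x_2)$.

```python
def and_activity(p1, a1, p2, a2):
    """Exact output activity of a two-input AND gate with independent inputs."""
    return p1 * a2 + p2 * a1 - 0.5 * a1 * a2


def and_activity_first_order(a1, a2, a1_nom, a2_nom, p1=0.3, p2=0.3):
    """First-order approximation of the AND-gate activity around a nominal point."""
    a_nom = and_activity(p1, a1_nom, p2, a2_nom)
    d_a1 = p2 - 0.5 * a2_nom   # partial derivative of a(y) w.r.t. a(x1) at the nominal point
    d_a2 = p1 - 0.5 * a1_nom   # partial derivative of a(y) w.r.t. a(x2) at the nominal point
    return a_nom + d_a1 * (a1 - a1_nom) + d_a2 * (a2 - a2_nom)


# Hypothetical nominal point and query point.
a1_nom, a2_nom = 0.2, 0.2
a1, a2 = 0.25, 0.3
exact = and_activity(0.3, a1, 0.3, a2)
approx = and_activity_first_order(a1, a2, a1_nom, a2_nom)
print(exact, approx, exact - approx)
```

The residual shrinks with the product of the two deviations, which is why a plane is a good local description of the surface near each representative point.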
IV. AN EFFICIENT TECHNIQUE TO ESTIMATE POWER SENSITIVITIES
As outlined previously, the task at hand is to efficiently obtain power sensitivities. A naive approach to estimating power sensitivity would be to simulate a circuit to obtain the average power dissipation based on nominal values of primary input signal probabilities and activities. Then, assign a small variation to only one primary input and re-simulate the circuit. After all the primary inputs have been exhausted, power sensitivity can be obtained using $\Delta \mathrm{Power}_i / \Delta \xi(x_i)$, where $\xi(x_i)$ denotes $P(x_i)$ or $a(x_i)$. This naive method can be easily implemented. However, it involves $m + 1$ runs of power estimation.
If the number of primary inputs ($m$) is large, this method can be computationally expensive. Therefore, the naive simulation method is impractical for large circuits with a large number of primary inputs. In this section, we propose a novel technique based on Markov chains to accurately estimate power sensitivities in CMOS sequential circuits.
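The naive procedure can be sketched as a one-sided finite difference. In the Python sketch below, `estimate_power` stands in for a full power estimation run; here it is a hypothetical closed-form toy (the activity of a two-input AND gate with both input probabilities equal to 0.5), not a circuit simulator. The run counter makes the $m + 1$ cost explicit.

```python
def naive_sensitivities(estimate_power, nominal, delta=0.05):
    """Finite-difference sensitivities to each input activity.

    estimate_power: callable taking a list of input activities (one full run)
    nominal:        list of nominal activities, one per primary input
    Returns (list of sensitivities, number of estimation runs used).
    """
    base = estimate_power(nominal)   # one run at the nominal point
    runs = 1
    sens = []
    for i in range(len(nominal)):
        perturbed = list(nominal)
        perturbed[i] += delta        # perturb only input i
        sens.append((estimate_power(perturbed) - base) / delta)
        runs += 1                    # one extra run per primary input
    return sens, runs


# Hypothetical stand-in for a power estimator (AND-gate activity with P = 0.5).
def toy_power(a):
    return 0.5 * a[1] + 0.5 * a[0] - 0.5 * a[0] * a[1]


sens, runs = naive_sensitivities(toy_power, [0.26, 0.26], delta=0.05)
print(sens, runs)
```

With a real estimator each call is a long simulation, so the loop dominates the cost as the number of primary inputs grows.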
For a sequential circuit, the primary inputs can be temporally correlated. It has been shown in [6] that such temporal correlations lead to spatial correlations between primary inputs and state bits. Therefore, a sequential circuit cannot be modeled as a Markovian process. A state bit expansion technique has been proposed in [6] to transform the non-Markovian process into a Markovian one. In this technique, primary input $x_i$ expands to two new temporally and spatially independent variables $up_x$ and $down_x$, which have probabilities as follows:
Such a state expansion technique is adopted in this paper so that a sequential circuit can be modeled as a Markov chain fX Tr is a column vector (the superscript T r represents the transpose).
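Each element in the transition matrix $P$ is a function of the activities of primary inputs, as shown in (14). Suppose the activity $a(x_i)$ is changed by $\delta$, where $\delta$ is a small perturbation. The transition matrix then changes from $P$ to $P_\delta = P + \delta\, \partial P/\partial a(x_i)$, where $Q = \partial P/\partial a(x_i)$ is a matrix with entries $\partial p_{ij}/\partial a(x_i)$.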
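We apply the technique for sensitivity analysis of Markov systems presented in [2]-[4]. Power sensitivity to the activity of primary input $x_i$ can be obtained as follows:
$$\frac{\partial \mathrm{Power}_{avg}}{\partial a(x_i)} = \pi Q g$$
where $\pi$ is the steady-state probability vector of the Markov chain, $g$ is called the potential vector, and $g_i(n)$ can be estimated as
$$g_i(n) \approx E\left[ \sum_{k} \mathrm{Power}\left( X_k^{\{i\}} \right) \right].$$
Each term in $\pi Q g$ takes the form $\pi_i q_{ij} g_j$, and $\pi_i p_{ij} g_j$ can be estimated as follows [4]: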
Multiplying each side of (17) 
where $n$ is the length of the period over which we use the expectation of the sum of the performance function (average power) to approximate $g_i$. In [4], it has been shown that the results are not sensitive to the value of $n$. Substituting (21) into (16) yields (22). We now derive expressions for $q_{X_k, X_{k+1}}/p_{X_k, X_{k+1}}$. As given in (14), if $p_{X_k, X_{k+1}} = 0$, then $q_{X_k, X_{k+1}}/p_{X_k, X_{k+1}} = 0$. If
Therefore, based on (22), power sensitivities can be obtained as a by-product of average power estimation. It should be noted that $\zeta_{P(x_i)}$ can be estimated simply by replacing $a(x_i)$ in (24) with $P(x_i)$. The (9)-(12).
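The potential-based idea can be illustrated on a tiny chain. The Python sketch below is not the sample-path estimator of this section: it assumes the standard perturbation-analysis identity from the Markov sensitivity literature, namely that the derivative of the steady-state average of a performance function $f$ equals $\pi Q g$ with potentials $g = \sum_k P^k (f - \eta e)$, and it computes $\pi$ and $g$ directly rather than from a simulation run. The two-state chain, its parameters, and the per-state "power" values are hypothetical.

```python
def stationary(P, iters=500):
    """Steady-state distribution by power iteration (assumes an ergodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def potentials(P, f, pi, terms=500):
    """Truncated series g = sum_k P^k (f - eta * e), with eta = pi . f."""
    n = len(P)
    eta = sum(pi[i] * f[i] for i in range(n))
    v = [f[i] - eta for i in range(n)]
    g = [0.0] * n
    for _ in range(terms):
        g = [g[i] + v[i] for i in range(n)]
        v = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return g


def sensitivity(P, Q, f):
    """Derivative of the steady-state average of f along direction Q: pi Q g."""
    pi = stationary(P)
    g = potentials(P, f, pi)
    n = len(P)
    return sum(pi[i] * Q[i][j] * g[j] for i in range(n) for j in range(n))


# Hypothetical chain: P(a) = [[1 - a, a], [b, 1 - b]] with a = 0.3, b = 0.5.
a, b = 0.3, 0.5
P = [[1 - a, a], [b, 1 - b]]
Q = [[-1.0, 1.0], [0.0, 0.0]]     # dP/da
f = [1.0, 3.0]                    # per-state "power" values
sens = sensitivity(P, Q, f)
analytic = b * (f[1] - f[0]) / (a + b) ** 2   # closed form for this two-state chain
print(sens, analytic)
```

For this chain the steady-state average is $(b f_0 + a f_1)/(a + b)$, so the closed-form derivative with respect to $a$ provides an independent check of $\pi Q g$. In the statistical setting, $\pi$ and $g$ are not computed explicitly but estimated from the same simulated sequence used for average power.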
V. EXPERIMENTAL RESULTS
We have implemented the techniques to estimate average power and power sensitivities in C under the Berkeley SIS environment. Based on the power sensitivities, the power macromodel has been implemented to estimate power dissipation under different signal statistics.
A. Comparison of Statistical Method and Naive Method
In this section, we verify the techniques to estimate power sensitivities in sequential circuits. All primary inputs are assumed to be spatially independent and have nominal probability and activity values of 0.5 and 0.26, respectively. Corresponding to (22), $n$ is chosen to be 7 while $N$ is 100 000. In order to show the accuracy and efficiency of this technique, a long-run simulation method (the naive technique described in Section IV) is performed as a reference for the statistical method. The activity variation used in this experiment is 0.05. Table I shows the comparison of the results for a number of benchmark circuits. The last column "Diff %" represents the percentage difference between the sensitivities obtained by the statistical method (STAT) and by simulation (SIM), computed from $\sum_i |\zeta_{a(x_i)}(\mathrm{STAT}) - \zeta_{a(x_i)}(\mathrm{SIM})|$. The CPU time in Table I is for a SUN ULTRASPARC 1 workstation. Results indicate that the average difference between STAT and SIM for the simulated circuits is less than 5%.
The simulation method repeats the estimation procedure $m + 1$ times ($m$ is the number of primary inputs) and, hence, execution time may be unacceptably long for large $m$. For the potential-based technique, power sensitivity is obtained as a by-product of average power estimation. If the sample number is set to be the same in the two methods, STAT can be $m + 1$ times faster than the naive method.
B. Average Power Under Different Distribution of PI's
After obtaining power sensitivities, we use (7) to compute the average power for each simulated circuit under a given specification of primary inputs. The values of power sensitivities used in this section are obtained by using the statistical techniques proposed in [9] and in this paper for combinational circuits and sequential circuits, respectively. For simplicity, we assume that each primary input has a fixed probability value of 0.5. Table II shows the percentage differences between the results obtained by our technique and by a Monte Carlo based simulation method with the activity of each primary input equal to $a = a_{nominal} + \Delta a$.
For columns 2 and 3, each primary input of a tested circuit has a randomly generated activity between 0.05 and 0.95 while a fixed value of 0.05 is set for $\Delta a$. Results indicate that with both the zero-delay model and the unit-delay model, the average percentage difference is only 0.3% with the maximum percentage difference no greater than 0.6%.
The results reported in columns 4 and 5 are for a relatively large $\Delta a$ of 0.25 for each primary input. Compared with logic simulation, the average percentage difference is about 4%. This is encouraging. It demonstrates that even a relatively large activity change only modestly degrades the accuracy of our technique. Hence, the number of nominal activity values used to construct a power surface can be quite small. The fewer the simulated points, the lower the CPU time and memory requirements. The remaining columns of Table II give the results when both $a$ and $\Delta a$ are randomly generated with values between zero and one. This means that each primary input may have a different activity and a different $\Delta a$. However, for each primary input, there exists a constraint $a + \Delta a < 1.0$. For the zero delay case, results obtained by our method match extremely well with those obtained by logic simulation. The maximum percentage difference is only 0.2%. For the unit delay case, the average percentage difference is still under 6% while the maximum difference is less than 10%.
To show the efficiency and accuracy of our technique, we report in Table III the results for combinational circuits based on the power surface approximated by ten randomly chosen points for each circuit. For each representative point, we randomly generate an activity value for each primary input, and then estimate the corresponding power and power sensitivities. Therefore, the power surface of a circuit is approximated by ten randomly generated points. In order to compare our technique with logic simulation, for each circuit, ten different distributions of primary inputs are randomly generated.
For distribution $\Upsilon_i$, we first find the nearest point, i.e., the one that minimizes $\sum_j (a(j) - a(j)_{nom(i)})^2$, where $j$ varies over all primary inputs and $i$ varies over the ten representative points in the specification space. Then, by (7) we use the corresponding power as the nominal value to compute the power corresponding to distribution $\Upsilon_i$, denoted by $\mathrm{Power}(\Upsilon_i(\mathrm{macro}))$. We use $\mathrm{Power}(\Upsilon_i(\mathrm{sim}))$ to represent the power obtained by logic simulation. Using the expression $(\mathrm{Power}(\Upsilon_i(\mathrm{macro})) - \mathrm{Power}(\Upsilon_i(\mathrm{sim})))/\mathrm{Power}(\Upsilon_i(\mathrm{sim}))$, we obtain the percentage differences between the results of the two methods, which are shown in columns 3 through 12 in Table III. Columns "Avg" represent the average percentage difference over the ten distributions. The results are promising. For each simulated circuit, our technique can yield 95% accuracy on average with only ten randomly chosen representative points. Table IV reports the results for sequential circuits based on the zero delay model. Results indicate that on average our power macromodel based on only ten representative points can give the power dissipation under different input distributions with inaccuracy less than 10%.
TABLE V. COMPARISON OF OUR TECHNIQUE AND LOOK-UP TABLE TECHNIQUE
We also compared our technique and logic simulation for strongly correlated streams. The inputs are generated by a linear feedback shift register (LFSR) [1]. For each simulated circuit, the initial state is randomly generated, with the restriction that the bits may not all be logic 0 or all be logic 1. The percentage differences between the results obtained by our power macromodeling technique and the results obtained by logic simulation are -5.75%, 1.29%, -3.75%, 2.05%, -2.02%, -0.82%, 2.4%, 1.99%, 2.9%, and 1.26% for circuits C432, C499, C880, C1355, C1908, C2670, C3540, C5315, C6288, and C7552, respectively. Results indicate that our technique can give very accurate results even for strongly correlated streams. On average, the magnitude of the percentage difference is only 2.4% for the ten ISCAS benchmark circuits.
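The evaluation procedure (nearest representative point, first-order correction, percentage difference) can be sketched as follows. This is a minimal Python illustration under stated assumptions: each representative point is stored as a hypothetical dictionary with keys `a_nom`, `power_nom`, and `sens_a`; only activity sensitivities are used because the probabilities are held fixed; and the "circuit" is the closed-form AND-gate activity with both input probabilities equal to 0.5, standing in for logic simulation.

```python
def nearest_point(points, activities):
    """Representative point minimizing the squared activity distance."""
    return min(
        points,
        key=lambda pt: sum((a - an) ** 2 for a, an in zip(activities, pt["a_nom"])),
    )


def macromodel_power(points, activities):
    """First-order estimate around the nearest representative point."""
    pt = nearest_point(points, activities)
    return pt["power_nom"] + sum(
        s * (a - an) for s, a, an in zip(pt["sens_a"], activities, pt["a_nom"])
    )


def percent_difference(macro, sim):
    """Signed percentage difference of the macromodel relative to simulation."""
    return 100.0 * (macro - sim) / sim


# Hypothetical stand-in for simulation: AND-gate activity with P(x1) = P(x2) = 0.5.
def toy_power(a):
    return 0.5 * a[1] + 0.5 * a[0] - 0.5 * a[0] * a[1]


def make_point(a_nom):
    """Build a representative point with exact sensitivities of the toy function."""
    return {
        "a_nom": a_nom,
        "power_nom": toy_power(a_nom),
        "sens_a": (0.5 - 0.5 * a_nom[1], 0.5 - 0.5 * a_nom[0]),
    }


points = [make_point((0.2, 0.2)), make_point((0.8, 0.8))]
query = (0.3, 0.25)
macro = macromodel_power(points, query)
sim = toy_power(query)
print(macro, sim, percent_difference(macro, sim))
```

Storing one power value and $2m$ sensitivities per representative point keeps the table small, and a query costs only a nearest-point search plus one dot product.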
So far, we have shown the results for temporally correlated input sequences. In order to show the effectiveness and robustness of our technique, we apply the power macromodel based on ten representative points to both temporally and spatially correlated primary input sequences. For each simulated circuit, we randomly generated ten different primary inputs with pairwise spatial correlation. The spatial correlations are randomly represented by two-input NAND or two-input NOR gates as illustrated in Fig. 3. The dashed box is used to conceptually generate spatially correlated inputs. The inputs to the sequential circuit are the outputs of the dashed box, which are temporally and spatially correlated.
We conclude this section with a few comments. First, the power dissipation values shown in the "SEN" columns (and rows in Table V) in the above tables are obtained by considering only (first-order) power sensitivities. If accuracy is the prime concern, higher order power sensitivities can be included without much overhead. Second, in Tables III and IV we have shown that our technique can yield 95% accuracy on average with only ten randomly chosen representative points. In Table VII, we report the results for ten randomly generated points based on a power surface obtained by twenty points instead of ten as shown in Tables III and IV. Results for the delay model considered indicate that the accuracy is increased consistently for all the ten ISCAS circuits. Similar results can be obtained for other delay models. Because our power macromodeling technique is based on a Taylor expansion, a ten- or twenty-point power surface should give sufficiently accurate results according to our experiments. Third, we can further reduce memory requirements by storing only those parameters associated with very sensitive primary inputs.
Finally, in some special cases, the number of representative points used to construct the power surface can be reduced to one, which means that only one symbolic run or one statistical power estimation run is needed. For example, if the specification of each primary input falls into a relatively small range, we can use the midpoint of that range as the nominal value for each primary input to obtain all the necessary parameters, namely: nominal average power, power sensitivities to activities, and power sensitivities to probabilities. Based on these parameters, we can easily compute the possible bounds on average power [9].
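One simple way to turn a single nominal point into bounds is sketched below in Python. It assumes that the first-order model (7) holds over the whole range and bounds the power by moving every input to the end of its range that increases (or decreases) the power; this is an illustrative assumption and not necessarily the procedure of [9]. The function name and the numerical values are hypothetical.

```python
def first_order_power_bounds(power_nom, sensitivities, half_ranges):
    """Lower and upper power bounds under a first-order model.

    power_nom:     power at the midpoint (nominal) specification
    sensitivities: first-order sensitivities, one per specification variable
    half_ranges:   half-width of the allowed range of each variable
    """
    # Worst case: each variable sits at the range end that moves power furthest.
    spread = sum(abs(s) * h for s, h in zip(sensitivities, half_ranges))
    return power_nom - spread, power_nom + spread


# Hypothetical values: two specification variables around one nominal point.
low, high = first_order_power_bounds(1.0, [0.5, -0.25], [0.1, 0.2])
print(low, high)
```

The width of the interval is dominated by the most sensitive inputs, which is consistent with storing only the parameters of those inputs when memory is limited.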
VI. CONCLUSION
In this paper, we present a novel approach to accurately estimate the sensitivities of power to primary inputs in sequential circuits. The sensitivities are estimated as a by-product of estimating the average power, leading to efficient implementation. Power sensitivities can be used to construct power surfaces to determine the average power under different statistics of primary inputs. A key advantage of the power macromodeling technique is that a power surface can be approximated by only a limited number of representative points. This not only makes the implementation of our technique very efficient but also reduces the memory requirements. The feasibility and effectiveness of our technique have been verified on a large number of benchmark examples.
