Excessive power dissipation in integrated circuits causes overheating and can lead to soft errors and/or permanent damage. The severity of the problem increases in proportion to the level of integration, so that power estimation tools are badly needed for present-day technology. Traditional simulation-based approaches simulate the circuit using test/functional input pattern sets. This is expensive and does not guarantee a meaningful power value. Other recent approaches have used probabilistic techniques in order to cover a large set of input patterns. However, they trade off accuracy for speed in ways that are not always acceptable. In this paper, we investigate an alternative technique that combines the accuracy of simulation-based techniques with the speed of the probabilistic techniques. The resulting method is statistical in nature; it consists of applying randomly generated input patterns to the circuit and monitoring, with a simulator, the resulting power value. This is continued until a value of power is obtained with a desired accuracy, at a specified confidence level. We present the algorithm and experimental results, and discuss the superiority of this new approach.
Introduction
Excessive power dissipation in integrated circuits causes overheating and can lead to soft errors and permanent damage. The severity of the problem increases in proportion to the level of integration. The advent of VLSI has led to much recent work on the estimation of power dissipation during the design phase, so that designs can be modified before manufacturing.
Perhaps the most significant obstacle in trying to estimate power dissipation is that the power is pattern dependent. In other words, it strongly depends on the input patterns being applied to the circuit. Thus the question "what is the power dissipation of this circuit?" is only meaningful when accompanied with some information on the circuit inputs.
A direct and simple approach to estimating power is to simulate the circuit. Indeed, several circuit simulation based techniques have appeared in the literature [1-2]. Given the speed of circuit simulation, these techniques cannot afford to simulate large circuits for long enough input vector sequences to get meaningful power estimates. In order to simplify the problem and improve the speed, the power supply voltage is often assumed to be the same throughout the chip. Thus the power estimation problem is reduced to that of estimating the power supply currents that are drawn by the different circuit components. Fast timing or logic simulation can then be used to estimate these currents [3].
We call these approaches strongly pattern dependent because they require the user to specify complete information about the input patterns. Recently, other approaches have been proposed [4, 5] that only require the user to specify typical behavior at the circuit inputs using probabilities. These may be called weakly pattern dependent. With little computational effort, these techniques allow the user to cover a huge set of possible input patterns. However, in order to achieve good accuracy, one must model the correlations between internal node values, which can be very expensive. As a result, these techniques usually trade off accuracy for speed. The resulting loss of accuracy is a significant issue that may not always be acceptable to the user.
In this paper, we investigate an alternative approach that combines the accuracy of simulation-based approaches with the weak pattern dependence of probabilistic approaches. The resulting approach is statistical in nature; it consists of applying randomly generated input patterns to the circuit and monitoring, with a simulator, the resulting power value. This is continued until a value of power is obtained with a desired accuracy, at a specified confidence level. Since it uses a finite number of patterns to estimate the power, which really depends on the infinite set of possible input patterns, this method belongs to the general class of so-called Monte Carlo methods. A most attractive property of Monte Carlo techniques is that they are dimension independent, meaning that the number of samples required to make a good estimate is independent of the problem size. We will show that this property indeed holds for our approach (see Table 4 in section 5).
Both [4] and [5] use probabilities to compute the power consumed by individual gates, which are then summed up to give the total power. In this context, it was observed in [5] that it would be too expensive to estimate the individual gate powers using a simulation with randomly generated inputs. The key to the efficiency of our new approach is that, if one monitors the total power directly during the random simulation, sufficient accuracy is obtained in much less time than is required to compute the individual gate powers. The excellent speed performance and the simplicity of the implementation make this a very attractive approach for power estimation.
An approach similar to this was independently proposed in [6], but the treatment there is not very rigorous and overlooks some important issues. Furthermore, no comparisons were performed with other approaches to show the superiority of the approach. In this paper, we present a rigorous treatment that provides the theoretical justification of this method. We also present experimental results of our implementation and compare it to probabilistic approaches.
Overview
In this section, we provide an overall view of our technique, and discuss its superiority to the probabilistic approaches previously proposed [4, 5].
Overview of Monte Carlo power estimation
The block diagram in Fig. 1 gives an overall view of the technique. The setup and sample blocks are parts of the same logic simulation run, in which the input patterns are randomly generated. The power value at the end of a sampling phase is noted and used to decide whether to stop the process or to do another setup-sample run. The decision is made based on the mean and standard deviation of the power values observed at the end of a number of successive iterations. The power is found as the average value of the instantaneous power drawn throughout the sample phase, and not during the setup phase. However, the setup phase is a critical component of our approach, and serves two purposes:
(1) In the beginning of the simulation run, the circuit does not switch as often as it typically would at a later time, when switching activity has had time to spread to all the gates. Thus, the circuit is allowed to get up to speed during setup. This argument will be made more precise in section 3, where we also derive an exact value for the setup time.
(2) The values of power observed at the end of successive sample intervals should be samples of independent random variables. This is required in order for the stopping criterion to be correct, and is guaranteed by restarting the random input waveforms at the beginning of the setup phase. The details are given in section 3.
Thus the setup phase guarantees that we are indeed measuring typical power, and ensures the correctness of the statistical stopping criterion.
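The setup-sample loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the actual implementation: `sample_power` stands in for one complete setup-plus-sample simulation run, `fake_sample_power` is a made-up substitute for a real logic simulator, and `simple_stop` is a placeholder decision rule based on the mean and standard deviation of the observed values. All of these names and the numeric constants are assumptions introduced for the example.

```python
import math
import random
import statistics
from typing import Callable, List, Tuple


def monte_carlo_power(
    sample_power: Callable[[random.Random], float],
    should_stop: Callable[[List[float]], bool],
    seed: int = 0,
    min_iterations: int = 3,
    max_iterations: int = 100_000,
) -> Tuple[float, int]:
    """Repeat setup+sample runs until the stopping rule is satisfied.

    sample_power: hypothetical callable that performs one setup phase and
        one sampling phase and returns the average power of the sample phase.
    should_stop: hypothetical callable that inspects all power values seen
        so far and decides whether the estimate is good enough.
    Returns (power estimate, number of iterations used).
    """
    rng = random.Random(seed)
    samples: List[float] = []
    while len(samples) < max_iterations:
        samples.append(sample_power(rng))
        if len(samples) >= min_iterations and should_stop(samples):
            break
    return statistics.fmean(samples), len(samples)


def fake_sample_power(rng: random.Random) -> float:
    """Stand-in for a simulator: 1 mW average power with 10% spread (assumed)."""
    return rng.gauss(1.0e-3, 1.0e-4)


def simple_stop(samples: List[float]) -> bool:
    """Placeholder rule: stop when the standard error is below 1% of the mean."""
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return std_err < 0.01 * mean


if __name__ == "__main__":
    estimate, iterations = monte_carlo_power(fake_sample_power, simple_stop, seed=1)
    print(f"estimate = {estimate:.3e} W after {iterations} iterations")
```

Passing the simulator and the stopping rule as callables keeps the loop independent of any particular simulation environment, which mirrors the claim that the method is easy to add to existing tools.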
Comparison with probabilistic techniques
There are two distinct advantages of the Monte Carlo approach that make it an excellent choice for power estimation over probabilistic techniques. These are: (1) it achieves desired accuracy in reasonable time, avoiding the speed/accuracy trade-off of probabilistic techniques, and (2) the simplicity of the algorithm makes it very easy to implement in existing logic or timing simulation environments.
Probabilistic methods [4, 5] suffer from a speed/accuracy trade-off because they must resolve the correlations between internal circuit nodes. If these correlations are taken into account, these methods can be very accurate. This, however, is computationally very expensive and impractical. As a result, fast implementations of these techniques are necessarily inaccurate. It is the aim of this paper to show that the proposed Monte Carlo method is very fast (solving circuits with thousands of gates in a matter of seconds) and also highly accurate (easily within 5% of the total power). Tables 3 and 5 compare the accuracy of Monte Carlo and probabilistic methods for power estimation.
We should also make the point that the accuracy level in our approach is predictable up-front: the program will work to achieve any level of accuracy desired by the user. Naturally, as higher accuracy is desired, the computational cost starts to increase. However, we will show in section 5 that accuracy levels of 5% are easily and efficiently attainable.
Detailed Approach
This section describes the details of the approach. We start out with a rigorous formulation of the problem and show how it reduces to the well-known problem of mean estimation in statistics. We then discuss the stopping criterion, and the normality assumption required for it to work. We conclude with a discussion of the setup and sample phases and the applicability to sequential circuits.
Problem formulation
Consider a digital circuit with $m$ internal nodes (gate outputs). Let $x_i(t)$, $t \in (-\infty, +\infty)$, be the logic signal at node $i$, and let $n_{x_i}(T)$ be the number of transitions of $x_i$ in the time interval $(-T/2, +T/2]$. If, in accordance with [7], we consider only the contribution of the charging/discharging current components, the average power dissipated at node $i$ during that interval is $\frac{1}{2} V_{dd}^2 C_i \, n_{x_i}(T)/T$, where $V_{dd}$ is the supply voltage and $C_i$ is the total capacitance at node $i$. The total average power dissipated in the circuit during that interval is:

$$P_T = \frac{V_{dd}^2}{2} \sum_{i=1}^{m} C_i \frac{n_{x_i}(T)}{T} \qquad (1)$$

The power rating of a circuit usually refers to its average power dissipation over extended periods of time. We therefore define the average power dissipation $P$ of the circuit as:

$$P = \lim_{T \to \infty} P_T \qquad (2)$$

The essence of our approach is to estimate $P$, corresponding to infinite $T$, as the mean of several $P_T$ values, each measured over a finite time interval of length $T$. In order to see how this mean estimation problem comes about, we must consider a random representation of logic signals as follows.

Corresponding to every logic signal $x_i(t)$, $t \in (-\infty, +\infty)$, we construct a stochastic process $\mathbf{x}_i(t)$ as a family of the logic signals $x_i(t + \tau)$, where $\tau$ is a random variable. This process has been called the companion process of $x_i(t)$ in [8], where the reader may find more details on its construction. For each $\tau$, $x_i(t + \tau)$ is a shifted copy of $x_i(t)$. Therefore, observing $P_T$ for $x_i(t + \tau)$ corresponds to measuring the power of $x_i(t)$ over an interval of length $T$ centered at $\tau$, rather than at 0. We can then talk of the random power of $\mathbf{x}_i(t)$ over the interval $(-T/2, +T/2]$, to be denoted by:

$$\mathbf{P}_T = \frac{V_{dd}^2}{2} \sum_{i=1}^{m} C_i \frac{\mathbf{n}_{x_i}(T)}{T} \qquad (3)$$

where $\mathbf{n}_{x_i}(T)$ is now a random variable. It was shown in [8] that $\mathbf{x}_i(t)$ is stationary [12] so that, for any $T$, the expected average number of transitions per second is a constant:

$$E\left[\frac{\mathbf{n}_{x_i}(T)}{T}\right] = D(x_i) \qquad (4)$$

where $E[\cdot]$ denotes the expected value (mean) operator. In [5] and [8], $D(x_i)$ was called the transition density of $x_i(t)$; it is the average number of transitions per second, equal to twice the average frequency. As a result of (4), $E[\mathbf{P}_T]$ is the same for any $T$, and the average power can be expressed as a mean:

$$P = E[\mathbf{P}_T] \qquad (5)$$

Thus the power estimation problem has been reduced to that of mean estimation, which is a frequently encountered problem in statistics. In order to apply the above theory, we must ensure that the signals $x_i(t)$ observed throughout the $(-T/2, +T/2]$ interval are samples of the stationary processes $\mathbf{x}_i(t)$. This requirement will be addressed in section 3.4.
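As a small illustration of how one power value is obtained from a sampling interval, the sketch below computes the average power from per-node transition counts. It assumes the usual charging/discharging model in which each transition at node i dissipates (1/2) C_i V_dd^2; the function name, argument names, and the numbers in the example are assumptions made for illustration.

```python
from typing import Sequence


def average_power(
    transitions: Sequence[int],
    capacitances: Sequence[float],
    vdd: float,
    interval: float,
) -> float:
    """Average power (W) over one interval of length `interval` seconds.

    transitions[i]  : number of logic transitions counted at node i
    capacitances[i] : total capacitance at node i, in farads (assumed known)
    vdd             : supply voltage in volts
    """
    if len(transitions) != len(capacitances):
        raise ValueError("one capacitance is required per node")
    if interval <= 0.0:
        raise ValueError("interval must be positive")
    # Each transition at node i is assumed to dissipate 0.5 * C_i * Vdd^2.
    switched = sum(c * n for c, n in zip(capacitances, transitions))
    return 0.5 * vdd ** 2 * switched / interval


if __name__ == "__main__":
    # Two nodes observed for 2.5 microseconds at a 5 V supply (made-up values).
    print(average_power([50, 100], [1e-13, 2e-13], vdd=5.0, interval=2.5e-6))
```

Only the total is needed for the estimate, which is why the per-node counts are summed immediately rather than tracked individually.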
Stopping criterion
Let us assume that $P_T$ is normally distributed for any $T$. The theoretical justification and experimental evidence for this assumption will be discussed in the next section. Suppose also that we perform $N$ different simulations of the circuit, each of length $T$, and form the sample average $\eta_T$ and sample standard deviation $s_T$ of the $N$ different $P_T$ values found.

Therefore, we have $(1 - \alpha) \times 100\%$ confidence that $|\eta_T - E[P_T]| < t_{\alpha/2} \, s_T / \sqrt{N}$, where $t_{\alpha/2}$ is obtained from the $t$ distribution [9] with $(N - 1)$ degrees of freedom. This result can be rewritten as:

$$\frac{|P - \eta_T|}{\eta_T} < \frac{t_{\alpha/2} \, s_T}{\eta_T \sqrt{N}} \qquad (6)$$

Therefore, for a desired percentage error $\epsilon$ in the power estimate, and for a given confidence level $(1 - \alpha)$, we must simulate the circuit until:

$$\frac{t_{\alpha/2} \, s_T}{\eta_T \sqrt{N}} < \epsilon \qquad (7)$$

We can use this relation to illustrate the important dimension independence property of this approach, common to most Monte Carlo methods, as follows. If $N$ is the smallest number of iterations that satisfies (7), then:

$$N \approx \left( \frac{t_{\alpha/2} \, s_T}{\epsilon \, \eta_T} \right)^2 \qquad (8)$$
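The stopping test can be sketched as follows. The criterion itself is the standard t-based confidence interval for a mean: stop once t * s / (mean * sqrt(N)) falls below the desired relative error. To keep the example free of third-party dependencies, the t quantile is approximated with a Cornish-Fisher series around the normal quantile, which is an assumption of this sketch and is only reasonably accurate for about five or more samples; a statistics library would normally be used instead. All function names and defaults are illustrative.

```python
import math
import statistics
from typing import Sequence


def t_quantile(p: float, dof: int) -> float:
    """Approximate Student-t quantile (series expansion around the normal quantile)."""
    z = statistics.NormalDist().inv_cdf(p)
    g1 = (z ** 3 + z) / 4.0
    g2 = (5 * z ** 5 + 16 * z ** 3 + 3 * z) / 96.0
    g3 = (3 * z ** 7 + 19 * z ** 5 + 17 * z ** 3 - 15 * z) / 384.0
    return z + g1 / dof + g2 / dof ** 2 + g3 / dof ** 3


def relative_half_width(samples: Sequence[float], alpha: float) -> float:
    """t * s / (mean * sqrt(N)): the relative error bound at confidence 1 - alpha."""
    n = len(samples)
    mean = statistics.fmean(samples)
    s = statistics.stdev(samples)
    t = t_quantile(1.0 - alpha / 2.0, n - 1)
    return t * s / (mean * math.sqrt(n))


def should_stop(samples: Sequence[float], eps: float = 0.05, alpha: float = 0.01) -> bool:
    """True once the relative error bound drops below eps (defaults: 5%, 99%)."""
    if len(samples) < 5:  # assumed minimum; the t approximation is poor below this
        return False
    return relative_half_width(samples, alpha) < eps


def iterations_needed(s: float, mean: float, eps: float, t: float) -> int:
    """Rough iteration count (t * s / (eps * mean))^2 for a given spread."""
    return math.ceil((t * s / (eps * mean)) ** 2)


if __name__ == "__main__":
    powers = [1.00, 1.01, 0.99, 1.00, 1.02, 0.98, 1.00, 1.01]  # made-up values, mW
    print(relative_half_width(powers, alpha=0.01), should_stop(powers))
```

For instance, with a spread of 10% of the mean and t near 2.6, the rough count is about (2.6 * 0.1 / 0.05)^2, i.e. a few tens of iterations, regardless of how many nodes the circuit has.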
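By dimension independence, we mean that $N$ should be roughly independent of the circuit size (number of nodes). In equation (7), $t_{\alpha/2}$ is a small number, typically between 2.0 and 5.0, and $\epsilon$ is a constant. We therefore look to the ratio $s_T/\eta_T$, which estimates the ratio of the standard deviation of $P_T$ to its mean. To see how this ratio behaves, consider a random variable $y = x_1 + \cdots + x_n$ that is the sum of $n$ identically distributed random variables $x_i$ (here generic random variables standing for the contributions of individual nodes to the total power). If the $x_i$ are independent, then $\sigma_y^2/\mu_y^2 = \sigma_{x_i}^2/(n \, \mu_{x_i}^2)$, and $N$ should decrease with circuit size. Even when the $x_i$'s are not independent, we have $\sigma_y^2/\mu_y^2 \le \sigma_{x_i}^2/\mu_{x_i}^2$, a constant, which suggests that $N$ should typically decrease or remain constant with increasing circuit size. This is indeed the observed behavior in Table 4.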
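The bound on the variance-to-mean-squared ratio of a sum can be checked in a few lines. The derivation below is a sketch under the assumption that $y = x_1 + \cdots + x_n$, where the $x_i$ are identically distributed with mean $\mu$ and variance $\sigma^2$ (the symbols $n$, $\mu$, and $\sigma$ are introduced here for illustration).

```latex
\begin{align*}
\mu_y &= \sum_{i=1}^{n} E[x_i] = n\mu, \\
\sigma_y^2 &= \sum_{i=1}^{n}\sum_{j=1}^{n} \operatorname{Cov}(x_i, x_j)
  \;\le\; \sum_{i=1}^{n}\sum_{j=1}^{n} \sigma^2 = n^2\sigma^2
  && \text{since } |\operatorname{Cov}(x_i, x_j)| \le \sigma^2 \text{ (Cauchy-Schwarz)}, \\
\frac{\sigma_y^2}{\mu_y^2} &\le \frac{n^2\sigma^2}{n^2\mu^2} = \frac{\sigma^2}{\mu^2}. \\
\intertext{If the $x_i$ are also independent, the cross terms vanish and}
\sigma_y^2 &= n\sigma^2
  \quad\Longrightarrow\quad
  \frac{\sigma_y^2}{\mu_y^2} = \frac{1}{n}\,\frac{\sigma^2}{\mu^2}.
\end{align*}
```

In both cases the ratio does not grow with $n$, so the number of iterations implied by the stopping criterion does not grow with the number of terms in the sum.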
An important consequence of this result is that, since each iteration of the Monte Carlo approach takes roughly linear time in the size of the circuit, then the overall process should also take linear time. Probabilistic methods that do not take correlation into account also depend linearly on circuit size. However, if correlation is taken into account in order to improve the accuracy, their dependence is frequently super-linear.
In order to use the stopping criterion in practice, we must ensure that the observed $P_T$ values are samples from independent $P_T$ random variables. This requirement will be addressed in section 3.4.
Normality
A sufficient condition for the normality of $P_T$ is that (i) $m$ is large and (ii) the $n_{x_i}(T)/T$ are independent. This is true under fairly general conditions, irrespective of the individual $n_{x_i}(T)/T$ distributions (see [10], pp. 188-189), and for any value of $T$.
Another sufficient condition that holds even for small $m$, but for large $T$, is as follows. If (i) the consecutive times between identical transitions of $x_i(t)$ are independent (which, using renewal theory [11, pp. 62-63], means that $n_{x_i}(T)/T$ is normally distributed for large $T$) and (ii) the $n_{x_i}(T)/T$ are independent (so that they are also jointly normal [12, p. 126] for large $T$), then $P_T$ is normal for large $T$ (see [12], p. 144).
To the extent that these conditions are approximately met in practice, the power should be approximately normal. We have found that for a number of benchmark digital circuits [13], the normality assumption is very good, as shown in the normal scores plots [9] in Fig. 2. The plot for each circuit corresponds to 1000 evaluations of the average power over a 2.5 μsec interval. Each evaluation covered an average of 50 transitions per primary input. The consequences of deviations from normality are discussed in section 4.
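A normal scores plot pairs the sorted observations with the corresponding standard normal quantiles; points falling on a straight line indicate normality. A numeric summary of the same idea is the correlation between the two sequences, sketched below. The plotting positions (k - 0.375) / (n + 0.25) are one common convention and are an assumption of this example, as are the function name and the test data.

```python
import math
import statistics
from typing import Sequence


def normal_scores_correlation(samples: Sequence[float]) -> float:
    """Correlation between sorted samples and their normal scores.

    Values close to 1 suggest the samples are approximately normal;
    clearly lower values suggest a distorted (e.g. bimodal) distribution.
    """
    n = len(samples)
    if n < 3:
        raise ValueError("need at least three samples")
    ordered = sorted(samples)
    std_normal = statistics.NormalDist()
    # Expected standard-normal value for the k-th smallest of n observations.
    scores = [std_normal.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]
    mean_s = statistics.fmean(scores)
    mean_p = statistics.fmean(ordered)
    cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(scores, ordered))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_p = sum((p - mean_p) ** 2 for p in ordered)
    if var_p == 0.0:
        raise ValueError("samples are all identical")
    return cov / math.sqrt(var_s * var_p)


if __name__ == "__main__":
    # Evenly spaced normal quantiles behave like an ideal normal sample.
    ideal = [statistics.NormalDist(1.0, 0.1).inv_cdf((k - 0.5) / 200) for k in range(1, 201)]
    print(normal_scores_correlation(ideal))
```

The correlation is a convenience for automated checks; the plot itself remains more informative about where a distribution departs from normality.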
Setup and sample
This section deals with the mechanics of how the input patterns are to be generated, when to start and stop measuring a $P_T$ value, and how different $P_T$ values should be obtained. We start by observing that, by stationarity of $\mathbf{x}_i(t)$, the finite intervals of width $T$, over which the $P_T$ values will be measured, need not be centered at the origin. A $P_T$ value may be obtained from any interval of width $T$, henceforth called a sampling interval. However, the following two requirements must be met:
(i) Throughout a sampling interval, the signals $x_i(t)$ must be samples of the stationary processes $\mathbf{x}_i(t)$.
(ii) The different $P_T$ samples must be samples from independent $P_T$ random variables.

Figure 2. Normal scores plot for the ISCAS-85 circuits.
We will now describe a simulation process that guarantees both of these requirements.
Suppose that the circuit primary inputs are at 0 from $-\infty$ to time 0, and then become samples of the stationary processes $\mathbf{x}_i(t)$ in positive time. Consider a primary input driving an inverter with delay $t_d$. Since its input is a stationary process for $t \ge 0$, its output must be stationary for $t \ge t_d$. By using a simplified timing model for every gate as in [5], we can repeat this argument enough times to obtain the following conclusion: If the maximum delay along any path from the primary inputs to node $i$ is $T_{max,i}$, then the process $\mathbf{x}_i(t)$ becomes stationary for $t \ge T_{max,i}$.
If the maximum delay along any path in the circuit is $T_{max} = \max_i (T_{max,i})$, then the sampling interval may start only after $t \ge T_{max}$. This guarantees that requirement (i) is met. From that time onwards, all internal processes are stationary, and the circuit is in probabilistic steady state. We will call the time interval from 0 to $T_{max}$ the setup phase.
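For a combinational circuit, the setup time is a longest-path computation over the gate network. The sketch below processes nodes in topological order and is only an illustration: the netlist representation (a dictionary from each gate output to its delay and fan-in list), the use of a single fixed delay per gate, and all names are assumptions of this example.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

# Hypothetical netlist format: output node -> (gate delay, fan-in node names).
Netlist = Dict[str, Tuple[float, List[str]]]


def max_path_delays(gates: Netlist, primary_inputs: Sequence[str]) -> Dict[str, float]:
    """Longest delay from any primary input to every node.

    Raises ValueError if some node can never be resolved, which happens
    when the circuit has a feedback loop or an undriven fan-in.
    """
    arrival = {name: 0.0 for name in primary_inputs}
    pending = {node: len(fanin) for node, (_, fanin) in gates.items()}
    fanout: Dict[str, List[str]] = {}
    for node, (_, fanin) in gates.items():
        for src in fanin:
            fanout.setdefault(src, []).append(node)

    ready = deque(primary_inputs)
    while ready:
        src = ready.popleft()
        for node in fanout.get(src, []):
            pending[node] -= 1
            if pending[node] == 0:  # all fan-ins of this gate are resolved
                delay, fanin = gates[node]
                arrival[node] = delay + max(arrival[s] for s in fanin)
                ready.append(node)

    if len(arrival) != len(set(primary_inputs)) + len(gates):
        raise ValueError("feedback loop or undriven node: setup time is not finite")
    return arrival


def setup_time(gates: Netlist, primary_inputs: Sequence[str]) -> float:
    """Maximum delay along any path in the circuit."""
    arrival = max_path_delays(gates, primary_inputs)
    return max((arrival[node] for node in gates), default=0.0)


if __name__ == "__main__":
    circuit = {"n1": (1.0, ["a", "b"]), "n2": (2.0, ["n1", "c"]), "n3": (0.5, ["a"])}
    print(setup_time(circuit, ["a", "b", "c"]))
```

The explicit error for unresolved nodes reflects the fact that a circuit with feedback has no finite longest path, so a different way of choosing the setup time is needed there.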
Intuitively, the circuit needs to get up to speed before a reliable sample of power may be taken and, as we have shown, the minimum time required to achieve that is $T_{max}$. Finding $T_{max}$ in combinational circuits is straightforward; the case of sequential (feedback) circuits is discussed in section 3.5. In order to guarantee requirement (ii), we simply restart the simulation with an empty event queue at the beginning of every setup phase. As a result, the time axis is divided into successive regions of setup and sampling, as shown in Fig. 3. The only remaining task is to describe how the inputs are to be generated. This has to be done in such a way that the input processes, after the start of every setup phase, are independent of the past. This can be done as follows, for every input signal $x_i$. At the beginning of a setup phase, we use a random number generator to select a logic value for $x_i$, with appropriate probability $P(x_i)$. We then use another random number generator to decide how long $x_i$ stays in that state before switching. This must assume some distribution for the duration of stay in that state. Once $x_i$ has switched, we use another random number generator to decide how long it will stay in the other state, again using some distribution. Let $F_1^{x_i}(t)$ be the distribution of times spent in the 1 state, and $F_0^{x_i}(t)$ be that of the 0 state. Since computer implementations of random number generators produce sequences of independent random variables, independence between the successive sampling phases is guaranteed.
The probability $P(x_i)$ and distributions $F_1^{x_i}(t)$ and $F_0^{x_i}(t)$ should be supplied by the user. In fact, these parameters represent the way in which the approach is weakly pattern dependent. They also provide the mechanism by which the user can specify any information about typical behavior at the circuit inputs. In order to simplify the user interface, our current implementation does not require the user to actually specify distributions. Rather, we require only two parameters: the average time that an input is high, denoted by $\mu_1(x_i)$, and the average time that it is low, denoted by $\mu_0(x_i)$. Based on this, it can be shown [8] that $P(x_i) = \mu_1(x_i) / (\mu_1(x_i) + \mu_0(x_i))$ and $D(x_i) = 2 / (\mu_1(x_i) + \mu_0(x_i))$. As for the distributions, our implementation uses exponential distributions [12] so as to allow the comparisons with probabilistic methods to be given in section 5. We emphasize, however, that the stopping criterion and the overall Monte Carlo algorithm are valid for any distribution. In fact, in our implementation, the choice of distribution can be easily modified by the user.
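The input generation just described can be sketched as follows, assuming exponentially distributed high and low durations. The waveform representation (an initial value plus a list of transition times) and all function and parameter names are assumptions made for this illustration.

```python
import random
from typing import List, Tuple


def signal_statistics(mean_high: float, mean_low: float) -> Tuple[float, float]:
    """Signal probability and transition density from the mean high/low times."""
    period = mean_high + mean_low
    return mean_high / period, 2.0 / period


def random_waveform(
    mean_high: float, mean_low: float, duration: float, rng: random.Random
) -> Tuple[int, List[float]]:
    """Generate one input waveform on [0, duration).

    Returns the initial logic value and the sorted list of transition times.
    Dwell times are exponential; because that distribution is memoryless,
    drawing a fresh dwell time at t = 0 is consistent with starting the
    process in steady state.
    """
    probability, _ = signal_statistics(mean_high, mean_low)
    value = 1 if rng.random() < probability else 0
    initial = value
    t = 0.0
    times: List[float] = []
    while True:
        mean = mean_high if value == 1 else mean_low
        t += rng.expovariate(1.0 / mean)  # expovariate takes the rate 1/mean
        if t >= duration:
            break
        times.append(t)
        value = 1 - value
    return initial, times


if __name__ == "__main__":
    # 50 ns high and 50 ns low on average: probability 0.5, density 2e7 per second.
    print(signal_statistics(50e-9, 50e-9))
    start, edges = random_waveform(50e-9, 50e-9, 2.5e-6, random.Random(0))
    print(start, len(edges))
```

Creating a new `random.Random` stream (or continuing the same one) at the start of each setup phase yields waveforms that do not depend on earlier iterations, which is the independence the stopping criterion relies on.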
Sequential circuits
The Monte Carlo method presented in this paper is valid for both combinational and sequential circuits. The only aspect of the problem that is specific to sequential circuits is the computation of the setup time $T_{max}$. Strictly speaking, since $T_{max}$ is the longest delay along any path, then $T_{max} = \infty$ for sequential circuits. Recall that it is sufficient to wait for $T_{max}$ before starting a sampling interval in order to guarantee stationarity. It is not clear, however, whether that condition is also necessary. In practice, it seems that we should be able to compute an approximate $T_{max}$ for these circuits by, for example, opening feedback connections. The quality of such a heuristic could be tested by examining the expected power for different sampling regions. If the expected power is constant, then the heuristic does a good job of predicting the length of the setup region. Table 1 shows that this is true for combinational circuits, which agrees with our assertion in section 3 that the power would be stationary after $T_{max}$. Future implementations of this approach will include such heuristics to allow it to handle sequential circuits.
Deviations From Normality
We have found that the Monte Carlo method can be applied to circuits that have non-normal power distributions without adversely affecting the accuracy of the results. In cases of severe deviations from normality, some modifications of the basic approach may be required.
It is important to note at the outset that the normality assumption was required only to formulate the stopping criterion. Given enough samples, one ultimately converges to the desired power value, whatever the power distribution. This is true because of equation (5) and the strong law of large numbers (see [11], page 26). Furthermore, we are only concerned with deviations from normality for small values of $T$. For large $T$, and since $P_T$ tends to a constant as $T \to \infty$, the variance of $P_T$ goes to 0, and its distribution must become bell-shaped, approaching a normal distribution.

Table 1. If the setup region is chosen correctly, then the process is stationary and the expected power is the same for any sampling region. This is illustrated for combinational circuits with sampling regions of 625 ns, 1.25 μs, and 2.5 μs.

Circuit name   Power (T = 625 ns)   Power (T = 1.25 μs)   Power (T = 2.5 μs)
c432           1.13 mW              1.13
Circuits do exist that have non-normal power for small $T$. An example would be a circuit with an enable signal whose value strongly affects the power drawn by the whole circuit. When the enable signal is low, the circuit would have one power distribution, and when it is high it would have a different distribution. If these two distributions had different means, then over a small $T$ interval the overall distribution would not be normal, but would be a so-called bimodal distribution, as shown in Fig. 4. When each of the two distributions is normal, we will refer to the overall distribution as a double normal.
As a concrete example, consider the simple XOR circuit in Fig. 5. The enable signal allows the output stage to switch, drawing much more power than otherwise. If the transition density (see equation (4)) at the enable line is low compared to the other two inputs, then, over a short time interval, only one of the two modes of operation would be observed. Using a density of 2e7 at the inputs and 2e5 at the enable line, a sampling interval of $T = 2.5$ μsec, and with 11000 samples, we get the histogram shown in Fig. 6.

Figure 5. A circuit with enable.

This is one of many ways in which the distribution can deviate from normality. We have also considered two other ways in which the distribution can be distorted. We will refer to distributions with elongated tails, as shown in Fig. 7a, as tailed normal distributions. Those with chopped tops, as in Fig. 7b, will be called chopped normal distributions.
We have examined the performance of our stopping criteria for each of the above varieties of distorted normals. The non-normal power values were artificially obtained from customized random number generators. The parameters for the stopping criterion were set for 5% accuracy with 99% confidence. Since the distributions were not normal, one would expect the resulting accuracy to be somewhat worse than 5%. In all but a few cases, better than 10% accuracy was achieved with 99% confidence. The only examples that showed worse than 10% accuracy were distributions with very long tails, and double normal distributions with widely separated means. Even the distributions with very long tails had better than 15% accuracy with 99% confidence. The double normal distributions, however, had very large errors if the first few samples were all centered around one hump of the distribution. When this occurred, the stopping criteria erroneously terminated the simulation.
In the cases of the drastic double normal distributions, we feel that they can be treated as follows. When a node has a high fanout, making it a potential cause of the double normal, then the length of each sampling region should be changed so that that node transitions several times in a sampling region. This will prevent the problems associated with an enable signal and should prevent problems with any double normal distributions.
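A quick calculation shows why the enable example is troublesome and how long the sampling region would need to be. With a transition density of 2e5 per second, an enable line makes on average only half a transition in a 2.5 μs interval, so most intervals see a single mode. The sketch below computes the expected transition count and the interval needed for a chosen number of transitions; the target of five transitions is an arbitrary assumption standing in for "several", and the function names are illustrative.

```python
def expected_transitions(density: float, interval: float) -> float:
    """Average number of transitions of a node in one sampling interval."""
    return density * interval


def min_sampling_interval(density: float, transitions_wanted: float) -> float:
    """Interval length for which the node averages the requested transition count."""
    if density <= 0.0:
        raise ValueError("density must be positive")
    return transitions_wanted / density


if __name__ == "__main__":
    print(expected_transitions(2e7, 2.5e-6))   # ordinary inputs
    print(expected_transitions(2e5, 2.5e-6))   # slow enable line
    print(min_sampling_interval(2e5, 5))       # interval for ~5 enable transitions
```

Lengthening the sampling region this way makes each power value an average over both modes of operation, at the cost of proportionally longer simulation per iteration.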
Having said all this, and before leaving this section, it is important to reiterate that the normality assumption holds very well for all the benchmark circuits that we have considered, as discussed in section 3 and shown in Fig. 2.
Experimental Results
The Monte Carlo methods presented in this paper were implemented based on a simple variable delay logic simulator. This program will be referred to as McPOWER. The test circuits to be used in this section are the benchmarks presented at ISCAS in 1985 [13]. These circuits are combinational logic circuits and Table 2 presents the number of inputs, outputs, and gates in each.

Table 2. Inputs, outputs, and gates of the ISCAS-85 benchmark circuits.

Circuit   Inputs   Outputs   Gates
c432        36        7       160
c499        41       32       202
c880        60       26       383
c1355       41       32       546
c1908       33       25       880
c2670      233      140      1193
c3540       50       22      1669
c5315      178      123      2307
c6288       32       32      2406
c7552      207      108      3512

We will compare the performance of McPOWER to that of probabilistic methods and substantiate the claims of section 2.2 that McPOWER has better accuracy and competitive simulation times. DENSIM [5, 8] is an efficient probabilistic simulation program that gives the average switching frequency (called transition density in [5, 8]) at every circuit node. These density values can be used to give an estimate of the total power dissipation. DENSIM does not take into account the correlation between internal circuit nodes. While it is known that this causes inaccuracy in the density values, it is prohibitively expensive to take all correlation into account for large circuits.

Table 3 compares the performance of DENSIM, when used to estimate total power, to that of McPOWER. In both programs, every primary input had a signal probability of 0.5 and a transition density of 2e7 transitions per second (corresponding to an average frequency of 10 MHz). For McPOWER, a maximum error of 5% with 99% confidence was specified. As mentioned in section 3, McPOWER performs one long simulation that is broken into setup and sampling regions. The delays of the circuit determine the length of each setup region; however, the length of a sampling region is specified by the user. For Table 3, the sampling region was set to 2.5 micro-seconds (abbreviated μs), which allows an average of 50 transitions per sampling interval on each input.
The column labeled LOGSIM gives our best estimates of the power dissipation of these circuits, obtained from very long logic simulation runs. As seen from the table, McPOWER is consistently and highly accurate, while DENSIM has significant errors for some circuits. Although DENSIM is frequently faster, McPOWER's reliable accuracy makes it a more attractive approach for power estimation.
Typical convergent behavior of McPOWER is shown in Fig. 8. The figure shows the power from three different iterations converging to the average power for c6288, one of the most complex ISCAS circuits. A similar plot is shown for c5315 in Fig. 9.
Care must be taken in drawing conclusions from a single run of McPOWER. Since it uses random input vectors, the speed of convergence and the error in the power estimate depend on the initialization of the random number generator. This is illustrated in Table 4, which shows the statistics obtained from one thousand McPOWER runs. The minimum, maximum, and average number of iterations required per run for 5% accuracy with 99% confidence are given. Notice that the average number of iterations required to converge does not increase with the circuit size. This confirms the dimension independence property of this approach which, as pointed out in the introduction, is a common feature of Monte Carlo methods. Also shown in the table is the percentage of runs for which the error was greater than 5%, which, as expected, is less than 1% in all cases. Table 5 compares DENSIM to a single run of McPOWER with 95% confidence, 20% accuracy, and a sampling region of 625 nano-seconds. As in Table 3, each input has a signal probability of 0.5 and a transition density of 2e7 transitions per second. With these
Conclusions
We have presented a Monte Carlo based power estimation method. Randomly generated input waveforms are applied to the circuit using a logic/timing simulator and the cumulative value of total power is monitored. The simulation is stopped when sufficient accuracy is obtained with specified confidence. The statistical stopping criterion was discussed, along with experimental results from our prototype implementation McPOWER.
We have shown that Monte Carlo methods are, in general, better than probabilistic methods for the estimation of power since they achieve superior accuracy with comparable speeds. They are also easier to implement and can be added to existing timing or logic simulation tools. Furthermore, the accuracy can be specified up-front with any desired confidence.
Feedback circuits present a severe problem for probabilistic methods. Monte Carlo methods are based on simple timing or logic simulation techniques and, therefore, experience very few difficulties with feedback circuits. The only unresolved problem is to determine the length of the setup region, but we feel that good heuristics can be developed for this. Future research will focus on developing such heuristics, thus generalizing the Monte Carlo technique to handle any logic circuit.
Although we have clearly demonstrated the superiority of Monte Carlo methods for power estimation, it is not clear that they will be better than probabilistic methods for other applications, such as estimating the power supply current waveforms. Future research will be aimed at exploring this and other applications of the Monte Carlo approach.
