Abstract -- The design of low-power digital circuits has become a topic of great relevance in recent years, with increased consumer interest in portable, battery-operated communications and computing. Aiming for low-power performance requires a robust methodology for estimating the likely power consumption of circuits and systems in advance of their fabrication. The Monte Carlo parameter estimation technique is a general method that has found application in recent years as a useful approach to CMOS power estimation, particularly when used in conjunction with a simulation engine such as SPICE. Transistor-level circuits are particularly amenable to this technique. This paper describes an implementation of a first-order MC method, which is used to estimate the average power dissipation of CMOS digital circuits, based on netlists extracted from the circuit layout. Power estimates for a variety of circuits are presented.
I INTRODUCTION
Power estimation for CMOS circuits largely relies on the application to the circuit of uniform white noise (UWN) input data and the subsequent monitoring of circuit activity [1], [2]. These UWN-based power estimates allow comparative power measurements when evaluating architectures, and they set an upper bound on the likely power consumption when the circuit is driven by real data [3]. In this context, the term power rating is taken to represent the typical power consumption of the system or module, given typical input data. The Monte Carlo estimation technique, as applied to the estimation of average power dissipation in CMOS circuits, consists of the continuous application of an input stimulus while simultaneously monitoring some measure of circuit activity such as node transitions or supply current [4], [5], [6]. The convergence of the circuit activity is calculated and, when the power estimate derived from the circuit activity measure has converged with sufficiently small error, the simulation is terminated and the result produced. For many circuits this process, based on the underlying Gaussian distribution of circuit activity values, provides a useful method for estimating the average circuit power consumption [7]. This paper explores an implementation of the MC methodology for CMOS digital circuits. The layout of the remainder of the paper is as follows. Section II describes the standard MC technique as implemented in this paper, while Section III presents the power estimation tool, LLAMA. Typical results are given in Section IV and the conclusions are presented in Section V.
II THE MONTE CARLO TECHNIQUE
The Monte Carlo (MC) method is based on the application of random input data to a circuit, to estimate the average power. Such tools are useful to the designer in that they automate and consequently speed up the circuit analysis and design development cycles. Because of its ability to dynamically determine when to end a simulation, the MC technique has been successfully applied in various forms to many problems, including that of power estimation [8] , [9] . It is a statistical technique, which is dependent on the input data applied to the circuit. The input patterns are applied and the resulting power is monitored with a simulator. This is continued until the result is obtained with the required accuracy, at a specified confidence level.
One major feature of the Monte Carlo method is that it is dimension independent, which means that the number of samples required to achieve an adequate estimate is independent of the circuit size [10] . The block diagram in Figure 1 gives an overview of the technique as implemented in this work. 
Figure 1: Monte Carlo Power Estimation
At the beginning of the simulation, two parameters, the desired error and the confidence level, are specified. A simulation run, usually referred to as an iteration, is then executed with a fixed number of input vectors. After a small number of iterations, typically three to five, the error is calculated. If the estimated error is small enough, the simulation terminates with the average power estimate. If the error is still larger than the value specified by the user at the start of the simulation, the iterations continue until the error reduces to the required value. This iterative approach gives the MC method two distinct advantages over other techniques. First, it avoids the speed/accuracy trade-off of those techniques, in that the desired accuracy may be achieved in realistic time even for large circuits [2]. Second, the simplicity of the basic algorithm makes it easy to implement in conjunction with existing logic or circuit simulation environments. The operating principle behind the MC method is now briefly described.
Typically, a circuit description consists of a number of active elements, linked by capacitive nodes. Regardless of how the power estimate is obtained, the total power will be determined by the summed contributions from each node, many of which are independent of each other. The central limit theorem [11] indicates that in such circumstances the total power will be approximately normally distributed. This is demonstrated in Figure 2, which reports the activity statistics for a single internal node of a 1-bit full adder. The number of logic transitions at the node was recorded for each of 1000 simulations. Each simulation consisted of a single iteration with 1000 UWN input vectors. When the input data is random, the average of the power estimates converges to the true power dissipation. The more power estimates obtained, the closer the average value converges to the true value [12]. In the context of this work, the true power is taken to be the power dissipation that is approached asymptotically by running an arbitrarily long simulation. In practice, estimates of the true value of the power dissipation when UWN data is the stimulus are determined by running very long simulations. This may involve over 100,000 input vectors with long simulation times.
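The experiment described above can be imitated with a small logic-level sketch. This is not the SPICE-based measurement reported in this paper: it assumes a gate-level view of the 1-bit full adder, takes the first XOR stage (a XOR b) as the monitored internal node, and uses Python's pseudo-random generator as a stand-in UWN source. The function names, the choice of node and the seed are illustrative assumptions.

```python
import random
import statistics


def count_transitions(num_vectors, rng):
    """Count logic transitions at the assumed internal node (a XOR b) of a
    gate-level 1-bit full adder driven by uniform random input vectors."""
    transitions = 0
    previous = None
    for _ in range(num_vectors):
        a = rng.getrandbits(1)
        b = rng.getrandbits(1)
        node = a ^ b  # first XOR stage; the carry input does not affect it
        if previous is not None and node != previous:
            transitions += 1
        previous = node
    return transitions


def activity_statistics(num_runs=1000, num_vectors=1000, seed=1):
    """Repeat the single-iteration experiment and summarise the counts."""
    rng = random.Random(seed)
    counts = [count_transitions(num_vectors, rng) for _ in range(num_runs)]
    return statistics.mean(counts), statistics.stdev(counts)


if __name__ == "__main__":
    mean, stdev = activity_statistics()
    print(f"mean transitions per run: {mean:.1f}, standard deviation: {stdev:.1f}")
```

Under these assumptions the node toggles with probability one half on each new vector, so the counts should cluster near half the number of applied vectors, and a histogram of the counts should be approximately bell-shaped.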
The user may decide how closely the estimate need approach the true value by specifying the required accuracy. The estimate is finally produced at a given confidence level, typically 95% to 99%. The normally distributed power estimates are used to tell the simulator when to stop by establishing a stopping criterion, which is evaluated at the end of each iteration. The stopping criterion is obtained as follows. Given a normally distributed total power $P_T$, running N iterations of the simulator produces N different estimates of the average power dissipation. A sample average $\eta_T$ and a sample standard deviation $s_T$ are then formed. There is therefore (1-α)100% confidence that (1) holds,
where $z_{\alpha/2}$ is obtained from the normal distribution. This may be rewritten as in (2), where the left-hand quantity represents the fractional error in the estimated power.
Therefore, for a specified percentage error ε in the power estimate and for a given confidence level (1-α), the circuit is repeatedly simulated until (3) becomes true. This expression is referred to as the stopping criterion.
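Expressions (1) to (3) are not displayed in this text. Assuming the standard normal-theory confidence interval for a sample mean, and writing $P_{\mathrm{avg}}$ for the true average power (a symbol introduced here purely for illustration), they would take the following form:

```latex
\begin{align}
\eta_T - z_{\alpha/2}\,\frac{s_T}{\sqrt{N}} \;\le\; P_{\mathrm{avg}} \;\le\; \eta_T + z_{\alpha/2}\,\frac{s_T}{\sqrt{N}} \tag{1}\\[4pt]
\frac{\lvert P_{\mathrm{avg}} - \eta_T \rvert}{\eta_T} \;\le\; \frac{z_{\alpha/2}\, s_T}{\eta_T \sqrt{N}} \tag{2}\\[4pt]
\frac{z_{\alpha/2}\, s_T}{\eta_T \sqrt{N}} \;<\; \varepsilon \tag{3}
\end{align}
```

Dividing (1) through by $\eta_T$ gives (2), and requiring the right-hand side of (2) to fall below ε gives (3).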
At the end of iteration N, both the standard deviation $s_T$ and the mean $\eta_T$ are estimated. The test statistic above is also calculated. If the value obtained is larger than the specified error, the result is not accurate enough and a further iteration is performed. This cycle is repeated until the specified accuracy is obtained.
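The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LLAMA implementation: `simulate_iteration` is a hypothetical callable standing in for one simulator run, the criterion is assumed to be $z_{\alpha/2}\, s_T / (\eta_T \sqrt{N}) < \varepsilon$, and the stand-in simulator in the usage section returns made-up values.

```python
import math
import random
import statistics


def monte_carlo_power(simulate_iteration, max_error, confidence,
                      min_iterations=3, max_iterations=1000):
    """Repeat iterations until z * s / (mean * sqrt(N)) < max_error.

    simulate_iteration is a hypothetical callable that returns one average
    power estimate (in watts) for a fresh block of random input vectors.
    Returns (mean power, fractional error, number of iterations).
    """
    alpha = 1.0 - confidence
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    estimates = []
    for n in range(1, max_iterations + 1):
        estimates.append(simulate_iteration())
        if n < min_iterations:
            continue  # too few estimates for a meaningful deviation
        mean = statistics.mean(estimates)
        stdev = statistics.stdev(estimates)
        error = z * stdev / (mean * math.sqrt(n))
        if error < max_error:
            return mean, error, n
    raise RuntimeError("stopping criterion not met within max_iterations")


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for a circuit simulation: made-up estimates around 1 mW.
    mean, error, iterations = monte_carlo_power(
        lambda: rng.gauss(1.0e-3, 0.2e-3), max_error=0.05, confidence=0.95)
    print(f"power = {mean:.3e} W, error = {error:.3f}, iterations = {iterations}")
```

The `min_iterations` guard reflects the practice of deferring the first error calculation until a few estimates are available; its default of three is an assumption taken from the lower end of the range quoted earlier.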
Given the accuracy and the confidence level, the remaining parameter to be selected is the sample size, i.e. the number of input vectors per iteration. The optimum sample size is not readily apparent. The smaller the sample size, the less time is taken by each simulation cycle. However, it is easily observed that using a larger number of input vectors results in quicker convergence, that is, fewer iterations are required. The overall simulation time is also highly dependent on the size of the circuit.
III OPERATION OF LLAMA
As part of this work, an MC tool for power estimation of CMOS circuits was developed. LLAMA (Low Level Analysis of Microelectronic Architectures) operates at the circuit, i.e. layout, level. It consists of a Monte Carlo 'wrap-around' program, which uses a SPICE engine for simulation. The stopping criterion for the simulator is outlined and simulation results for various circuits are reported.
Some typical data are presented for the operation of LLAMA in Figure 4, where the total simulation time and the average time per iteration are given as a function of the sample size for a 12-bit adder.
Figure 4: Simulation Time as a Function of Input Sample Size
In general, the number of iterations required for convergence declines as the sample size increases. However, the simulation time for each iteration increases, as there are more vectors per sample. This offsets any benefit in reducing the input sample size unduly. Figure 5 gives the number of iterations required for 50 simulations of the 12-bit adder, with 50 input vectors per iteration. There is a spread of values, from 2 to 12, reflecting the uniform distribution from which the data is drawn.
Figure 5: Number of Iterations of MC Loop for Convergence
Based on a review of practice as reported in the literature [12] , [2] and the experience of running many such simulations as reported in this section, a figure of 100 inputs per sample appears to be an acceptable choice of sample size. It provides an adequate balance between the three time measures of iteration time, number of iterations and total simulation time. The use of other sample sizes is not greatly significant, affecting one or other of the three measures, but not altering the accuracy or effectiveness of the MC technique in any way.
Details are now provided concerning the internal organisation and operation of the power estimation tool LLAMA. Figure 6 shows an outline of the implementation. The tool operates in conjunction with a SPICE simulation engine. The input to the estimator consists of a circuit-level netlist and some start-up parameters. Based on the output from the data generator, a stimulus file containing the input vectors is generated for the current iteration. These vectors are applied to the simulator at a rate determined by the user-specified sampling frequency, using a file-based input mode within the simulator. The resulting output is a matrix of time-current pairs, which represents the total supply current variation over the simulation period. The time resolution of this matrix is dynamically controlled by the simulator and is typically of the order of picoseconds, depending on the circuit activity and the simulator parameter RELTOL [Micr96], which sets the lower limit of time increment. This limit exists to prevent instability in the simulator, which can lead to convergence problems if the increment is too small. The average value of the current waveform is then obtained using trapezoidal integration, which, when multiplied by the supply voltage, yields the average power dissipation. The stopping criterion is then evaluated and a decision is made on whether to proceed with the next simulation cycle. If the criterion has been met with an error less than the specified value, the program terminates and the error, the number of iterations and the estimated mean power dissipation are reported.
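The averaging step can be sketched as follows. This is an illustrative sketch rather than the LLAMA code: it assumes the simulator output has already been parsed into two lists, `times` (seconds) and `currents` (amperes), on a non-uniform time grid, and the function name is hypothetical.

```python
def average_power(times, currents, supply_voltage):
    """Average power from (time, supply current) samples on a possibly
    non-uniform time grid, using trapezoidal integration."""
    if len(times) != len(currents) or len(times) < 2:
        raise ValueError("need at least two matching time/current samples")
    charge = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        # Area of one trapezoid: mean of the two currents times the step.
        charge += 0.5 * (currents[i] + currents[i - 1]) * dt
    average_current = charge / (times[-1] - times[0])
    return supply_voltage * average_current


if __name__ == "__main__":
    # Made-up triangular current pulse sampled at uneven intervals.
    t = [0.0, 1.0e-9, 1.5e-9, 4.0e-9]
    i = [0.0, 2.0e-3, 1.0e-3, 0.0]
    print(average_power(t, i, supply_voltage=5.0))
```

Dividing the integrated charge by the total simulated time, rather than averaging the samples directly, matters here because the simulator chooses its own time steps, so the samples are not equally spaced.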
IV RESULTS
The following figures illustrate the operation of LLAMA. Firstly, a typical simulation run is analysed in Figure 7, which illustrates the distribution of power estimates over a simulation run of 30 iterations. The normally distributed values are clustered around a mean of approximately 0.15 mW, which is the final power estimate from the simulation. Two outlying values of 0.26 mW occurred during the simulation.
Figure 7: Typical Monte Carlo Simulation Run
The effect of these large values on the convergence is seen in Figure 8, which plots the error as a function of the iteration number. As a result of the excessive current caused by the particular set of input bit transitions during iteration two, the power dissipation recorded is well in excess of the average power. As the Monte Carlo method estimates the average power only, such extremes of power dissipation are to be occasionally expected. The effect of such extrema is to prolong the convergence of the error and to extend the simulation time. In this simulation, large power dissipation was recorded during iterations two and 11, in each case causing the error to increase. As the effect is cumulative, the impact of the second value is much less. It does mean, however, that the error function is non-monotonic and is subject to statistical variation. The figure also shows a running average for the power. The average approaches the final value of 0.15 mW as the error decreases to the 10% specified by the user. Although the power calculated during iteration two was 0.26 mW, the previous value was much lower, thus giving an average value of approximately 0.165 mW.
A number of circuits of increasing size were tested to give an indication of how Monte Carlo power estimation behaves with increasing circuit complexity. The details of each circuit are summarised in Table 1. The circuits, n-bit adders for n = 4, 8, 12, 16 and 20, were automatically created using a synthesis tool which was developed during this work. This allowed the verification and exploration of the power estimation and data generation techniques developed. It also provides a convenient method of producing consistent, repeatable netlists for verification of the power estimation software. The circuits were simulated with 100 vectors per iteration, at a 99% level of confidence. The maximum allowable error was 20%. As expected, the simulation time for a circuit is a function of the circuit size.
This is most clearly indicated by the relationship between average iteration time and the circuit size. Due to the input pattern dependence of the MC process, there is no deterministic relationship between circuit size and the number of iterations. Large samples of 1000 input vectors and greater do occasionally result in a smaller number of iterations per simulation run, as the variation between iterations is reduced by the larger number of vectors per iteration, but this is not guaranteed. One consequence of this practice is that there are too few power estimates to justify the use of a normal distribution. Student's t-distribution is used in place of the normal distribution when the number of estimates is small. It is specified for ν degrees of freedom, where, for N power estimates, ν is equal to N-1. The stopping criterion is modified by replacing the normal variate with an appropriate t-statistic. Thus after N iterations, with a (1-α)100% confidence level, the stopping criterion takes the same form as (3), with $z_{\alpha/2}$ replaced by the t-statistic.
In this expression $s_T$ is the estimated standard deviation, $\eta_T$ is the mean and t is the t-statistic with N-1 degrees of freedom. In practice, the majority of simulations require fewer than thirty iterations and so this stopping criterion is routinely incorporated into Monte Carlo applications, as is the case in this implementation.
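The t-based criterion is not displayed in this text. Assuming it mirrors the normal-theory criterion with the normal variate replaced by the t-statistic, as the description indicates, it would read as follows, where ε is the specified error:

```latex
\[
\frac{t_{\alpha/2,\,N-1}\; s_T}{\eta_T \sqrt{N}} \;<\; \varepsilon
\]
```

As an indication of the difference, for a 95% confidence level and N = 5 the tabulated value $t_{0.025,\,4} \approx 2.776$ compares with $z_{0.025} \approx 1.960$, so for a small number of iterations the t-based criterion is noticeably stricter.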
V CONCLUSIONS
A Monte Carlo power estimation tool for CMOS circuits was developed. It uses the SPICE simulation engine and, by applying UWN input vectors to the circuit netlist, obtains a power estimate that is accurate when compared to a single long SPICE simulation. Circuits of varying degrees of complexity were tested and the program performs well, independent of circuit size. The tool, while tested for this paper on transistor netlist level circuit descriptions, is capable of estimating power for a variety of circuit description types. This is because it operates as a wrap-around program in conjunction with a standard circuit or VHDL simulator.
