Abstract: Higher levels of integration have led to a generation of integrated circuits for which power dissipation and reliability are major design concerns. In CMOS circuits, both of these problems are directly related to the extent of circuit switching activity. The average number of transitions per second at a circuit node is a measure of switching activity that has been called the transition density. This paper presents a statistical simulation technique to estimate individual node transition densities. The strength of this approach is that the desired accuracy and confidence can be specified up front by the user. Another key feature is the classification of nodes into two categories: regular-density and low-density nodes. Regular-density nodes are certified with user-specified percentage error and confidence levels. Low-density nodes are certified with an absolute error, with the same confidence. This speeds convergence while sacrificing percentage accuracy only on nodes which contribute little to power dissipation and have few reliability problems.
I. INTRODUCTION
The advent of VLSI technology has brought new challenges to the manufacture of integrated circuits. Higher levels of integration and shrinking line widths have led to a generation of devices which are more sensitive to power dissipation and reliability problems than typical devices of a few years ago. In these circuits excessive power dissipation may cause run-time errors and device destruction due to overheating, while reliability issues may shorten device lifespan. It is especially useful to diagnose and correct these problems before circuits are fabricated.
In CMOS circuits, gates draw current and consume power only when making logical transitions. As a result, power dissipation and reliability strongly depend on the extent of circuit switching activity. Hence, there is a need for CAD tools that can estimate circuit switching activity during the design phase.
Circuit activity is strongly dependent on the inputs being applied to the circuit. For one input set the circuit may experience no transitions, while for another it may switch frequently. During the first input set the circuit dissipates little power and experiences little wear, but for the second its activity could cause device failure. However, the specific input pattern sets cannot be predicted up front. Furthermore, it is impractical to simulate a circuit for all possible inputs. Thus, this input pattern dependence severely complicates the estimation of circuit activity.
Recently, some approaches have been proposed to get around this problem by using probabilities to represent typical behavior at the circuit inputs. In [1], the average number of transitions per second at a circuit node is proposed as a measure of switching activity, called the transition density. An algorithm was also proposed to propagate specified input transition densities into the circuit to compute the densities at all the nodes. The algorithm is very efficient, but it neglects the correlation between signals at internal nodes. This leads to errors in the individual node densities that may not always be acceptable, especially since the desired accuracy cannot be specified up front.
This correlation problem was avoided in [2], where the total average power of the circuit (a weighted sum of the node transition densities) was statistically estimated by simulating the circuit for randomly generated input patterns. The power value is updated iteratively until it converges to the true power with a user-specified accuracy (percentage error tolerance) and a user-specified confidence level. It was found that convergence is very fast because the distribution of the overall circuit power was very nearly Gaussian and very narrow about its mean.
While power estimation is one important reason to find the transition densities in a circuit, it is not the only one. The densities can be used to estimate average current in the power and ground busses, to be used for electromigration analysis. For this application, it is not enough that the overall estimated power be accurate; the individual node density values must be accurate as well. However, it becomes extremely inefficient to apply the statistical sampling technique in [2] to single gates (so as to estimate the transition density at every gate output). This is because a large number of input patterns is required to converge for nodes that switch very infrequently, as we will demonstrate later on.
In this paper, we will present an extension of the approach in [2] whereby we remove the above limitation and efficiently estimate the transition density at all circuit nodes. To overcome the slow convergence problem, we apply absolute error bounds to nodes with low transition density values, instead of percentage error bounds. This is done by establishing a threshold, $\eta_{\min}$, to classify node transition density values. Any node with a transition density value less than the threshold is classified as a low-density node and is certified with absolute error. Nodes with transition density values equal to or above the threshold are classified as regular-density nodes and are certified with a percentage error. A major advantage of this approach is that the desired accuracy can be specified up front by the user. Furthermore, the percentage error bound is relaxed (i.e., replaced by an absolute error bound) only on low-density nodes. These nodes dissipate little power and have few reliability problems.
The statistical simulation techniques to be presented are implemented in a prototype called "Mean Estimator of Density" (MED). MED's performance is evaluated by looking at the accuracy of its results, its convergence rate, and its execution time.
The paper is organized as follows. In the next section, the statistical estimation technique is described. Section III presents experimental data and evaluates MED's performance, while section IV presents a summary.
II. PROPOSED SOLUTION
This section presents our statistical estimation technique for computing the transition densities at all circuit nodes. It is expected that the user will supply the transition density, denoted D(x), for every circuit input node. The user should also specify the fraction of time that a circuit input signal is high, called the probability at that node and denoted by P(x). If unspecified, these probabilities can be assigned default values of 1/2. This technique, like the other techniques reviewed in the introduction, applies only to combinational circuits. It can be applied to sequential circuits provided that the transition densities at the latch outputs are specified.
Given the input transition densities and probabilities, we can use a random number generator to generate corresponding logic input waveforms with which to drive a simulator. Based on such a simulation of the circuit for a given time period T, we can count the number of transitions at every node, a number which will be called a sample taken at that node. If we repeat this process N times and form the average $\bar{n}$ of the number of transitions at a node, called the sample mean, then $\bar{n}/T$ is an estimate of the transition density at that node.
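As a minimal sketch of this sampling procedure, the following Python fragment counts transitions in sampled logic waveforms and forms the density estimate from the sample mean. The function names and the hard-coded waveforms are illustrative assumptions standing in for the output of an actual simulator; they are not taken from MED.

```python
from statistics import mean

def count_transitions(waveform):
    """Count logic-value changes in a sequence of sampled logic levels."""
    return sum(1 for prev, cur in zip(waveform, waveform[1:]) if prev != cur)

def estimate_density(samples, period):
    """Sample mean of the per-run transition counts, divided by the period T."""
    return mean(samples) / period

# Hypothetical node waveforms from N = 3 simulation runs of length T = 4.0.
runs = [
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1],
    [1, 0, 1, 0, 1],
]
samples = [count_transitions(w) for w in runs]   # one sample per run
density = estimate_density(samples, period=4.0)  # transitions per unit time
```

In practice N would be far larger than three; the tiny data set only keeps the arithmetic easy to follow.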
It is well known from statistical mean estimation [3] that for large values of N, the sample mean $\bar{n}$ will approach the true average number of transitions in T, to be represented by $\eta$. Likewise, the sample standard deviation s will approach the true standard deviation $\sigma$ for large N. One continues to take samples (make simulation runs) until $\bar{n}$ is close enough to $\eta$. The method by which one tests for this is called the stopping criterion, to be discussed next. The following sub-section details the mechanism of input waveform generation.
A. Stopping Criterion
According to the Central Limit Theorem [3], $\bar{n}$ is a value of a random variable with mean $\eta$ whose distribution approaches the normal distribution for large N. The minimum number of samples, N, to satisfy near-normality is typically 30. It is also known that for such values of N one may use s as an estimate of $\sigma$.
Since the distribution of sample means is near-normal, we can make inferences about the quality of an individual sample. With $(1-\alpha)$ confidence it then follows that [3]:
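$$-z_{\alpha/2} \le \frac{\bar{n} - \eta}{\sigma/\sqrt{N}} \le z_{\alpha/2} \qquad (1)$$
where $z_{\alpha/2}$ is defined so that the area to its right under the standard normal distribution curve is equal to $\alpha/2$. Equation (1) may be rearranged to better accommodate mean estimation, by using:
$$\sigma \approx s \qquad (2)$$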
which is justified for values of N which normalize the sample mean distribution, typically for $N \ge 30$. This is not restrictive; typical simulations take many more samples. The transformed equation is more applicable to our problem, so that with confidence $(1-\alpha)$
where $\epsilon$ is defined to be a user-specified error tolerance. Thus (4) provides a stopping criterion to yield the accuracy specified in (6) with confidence $(1-\alpha)$.
Thus $\eta_{\min}\epsilon_1$ becomes an absolute error bound that characterizes the accuracy for low-density nodes. We therefore classify the circuit nodes into regular-density nodes and low-density nodes. During the algorithm (after N exceeds 30), (4) is used as a stopping criterion as long as $\bar{n} \ge \eta_{\min}$; otherwise (7) is used instead. The value of $\eta_{\min}$ can be specified by the user and strongly affects the speed of the algorithm, as will be shown in Section III.
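The per-node convergence test can be sketched in Python as follows. This is only an illustration under stated assumptions: it takes the percentage-error criterion to be the usual confidence-interval half-width test, $z_{\alpha/2}\, s / \sqrt{N} \le \epsilon_1 \bar{n}$, and the low-density criterion to be the same test with $\bar{n}$ replaced by $\eta_{\min}$ on the right-hand side, which yields the absolute bound $\eta_{\min}\epsilon_1$. The function names and the parameter `eps1` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

MIN_SAMPLES = 30  # near-normality threshold quoted in the text

def z_value(confidence):
    """Point with area alpha/2 to its right under the standard normal curve."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def node_converged(samples, confidence, eps1, eta_min):
    """Assumed stopping test for one node (see the assumptions stated above)."""
    n = len(samples)
    if n <= MIN_SAMPLES:
        return False  # criterion is applied only after N exceeds 30
    n_bar = mean(samples)
    s = stdev(samples)  # sample standard deviation, used in place of sigma
    half_width = z_value(confidence) * s / sqrt(n)
    # Regular-density node: bound relative to the sample mean.
    # Low-density node: absolute bound eta_min * eps1.
    reference = n_bar if n_bar >= eta_min else eta_min
    return half_width <= eps1 * reference

# Hypothetical per-run transition counts for a rarely switching node.
rare_node = [0] * 39 + [1]
as_regular = node_converged(rare_node, 0.95, 0.05, eta_min=0.0)
as_low_density = node_converged(rare_node, 0.95, 0.05, eta_min=1.0)
```

With the threshold at zero the rarely switching node is treated as regular-density and has not converged after 40 samples; raising the threshold above its sample mean lets the absolute bound apply, and the same 40 samples suffice.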
Although the percentage error for low-density nodes sharply increases as $\bar{n} \to 0$, the absolute error remains relatively fixed. In fact, it can be shown that the absolute error bounds for low-density nodes are always less than the absolute error bounds for regular-density nodes. Although these nodes require the longest time to converge, they have the least effect on circuit power and reliability. Therefore the above strategy reduces the execution time, with little or no penalty.
B. Input Generation
Fig. 1 illustrates the simulator block diagram, and shows that it can run in one of two modes, synchronous and asynchronous. In the synchronous mode, we assume that the (combinational) circuit is part of a larger synchronous sequential circuit design, so that its input events should be generated in synchrony. Otherwise, asynchronous operation is assumed and events do not have to be synchronized. Thus the only difference between synchronous and asynchronous operation is the generation of input transitions driving the circuit.
In the synchronous mode, an input node may transition only at the beginning of a clock cycle, so that the input pulse widths are discrete multiples of the clock period, $T_c$. The distribution of the high (and low) pulses at the inputs is arbitrary, and can be user-specified. Our implementation assumes that an input signal is Markov, so that its value after a clock edge depends only on its value before the clock edge, once that value is specified, and not on its values during earlier clock cycles. Under this assumption, it can be shown that the pulse widths have a geometric distribution. If $\mu_0$ and $\mu_1$ are the mean low and high pulse widths, computed from [1] as:
$$\mu_0 = \frac{2\,(1 - P(x))}{D(x)}, \qquad \mu_1 = \frac{2\,P(x)}{D(x)}$$
then it can also be shown that the probability that a low signal will transition high on the clock is:
$$P\{0 \to 1\} = \frac{T_c}{\mu_0} \qquad (11)$$
and the probability of a high signal transitioning low on the clock is:
$$P\{1 \to 0\} = \frac{T_c}{\mu_1} \qquad (12)$$
A random number generator uses (11) and (12) to generate input transitions for every clock cycle.
For circuits running asynchronously, input transition generation proceeds differently. Since input transitions may occur at any time, the input generation routine determines the length of time between transitions instead of the probability of transition at the clock edge. Again, the distribution of the pulse widths is arbitrary, and can be specified by the user. Our implementation was based on a Markov assumption, so that the length of time between successive transitions is a random variable with an exponential distribution [3]. The length of time a signal stays in the low (high) state has mean $\mu_0$ ($\mu_1$). From this information, the waveform is easily generated.
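Both input generators can be sketched in a few lines of Python. The sketch assumes that the mean pulse widths follow from the definitions of probability and density as 2(1 - P)/D for the low state and 2P/D for the high state, and that the per-clock transition probability is the clock period divided by the mean width of the current state. All function and variable names are hypothetical and the code is not taken from MED.

```python
import random

def pulse_means(prob_high, density):
    """Mean low and high pulse widths implied by P(x) and D(x), for 0 < P < 1."""
    mu0 = 2.0 * (1.0 - prob_high) / density
    mu1 = 2.0 * prob_high / density
    return mu0, mu1

def synchronous_levels(prob_high, density, t_clock, cycles, rng):
    """Logic level after each clock edge for a two-state Markov input."""
    mu0, mu1 = pulse_means(prob_high, density)
    p_rise, p_fall = t_clock / mu0, t_clock / mu1
    if max(p_rise, p_fall) > 1.0:
        raise ValueError("mean pulse width is shorter than the clock period")
    level = 1 if rng.random() < prob_high else 0
    levels = []
    for _ in range(cycles):
        # Flip with the probability that belongs to the current state.
        if rng.random() < (p_fall if level else p_rise):
            level = 1 - level
        levels.append(level)
    return levels

def asynchronous_events(prob_high, density, duration, rng):
    """(time, new_level) transition events with exponential dwell times."""
    mu0, mu1 = pulse_means(prob_high, density)
    level = 1 if rng.random() < prob_high else 0
    t, events = 0.0, []
    while True:
        # expovariate takes the rate, i.e. the reciprocal of the mean dwell time.
        t += rng.expovariate(1.0 / (mu1 if level else mu0))
        if t >= duration:
            return events
        level = 1 - level
        events.append((t, level))

rng = random.Random(0)  # fixed seed so that the run is repeatable
levels = synchronous_levels(0.5, 0.5, 1.0, 20000, rng)
events = asynchronous_events(0.5, 0.5, 20000.0, rng)
sync_rate = sum(a != b for a, b in zip(levels, levels[1:])) / (len(levels) - 1)
async_rate = len(events) / 20000.0
```

With a probability of 1/2 and a density of one transition every two time units, both generators should produce close to 0.5 transitions per unit time over a long run.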
Additionally, when running asynchronously the simulator requires a setup period. This is a waiting period during which no samples are collected. It is needed for the same reasons that a setup period was required in [2]. Briefly, it allows the circuit to "get up to speed." Before sampling begins, transitions at the inputs must be allowed to propagate into the internal nodes of the circuit. Until all levels of the circuit are involved, switching activity is artificially low and any power or reliability estimates will be skewed. The length of the setup period should be, as was also shown in [2], no less than the maximum delay of the circuit.
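The effect of the setup period on sampling can be illustrated with a short Python sketch. The function name, its arguments, and the event times are hypothetical; the only facts taken from the text are that transitions during the setup period are not counted and that the setup period should be no shorter than the maximum circuit delay.

```python
def sample_after_setup(event_times, setup, period, max_delay):
    """Count node transitions in [setup, setup + period), discarding earlier ones."""
    if setup < max_delay:
        raise ValueError("setup period must be at least the maximum circuit delay")
    return sum(1 for t in event_times if setup <= t < setup + period)

# Hypothetical transition times observed at one internal node.
node_events = [0.5, 1.2, 3.0, 4.9, 7.5]
sample = sample_after_setup(node_events, setup=2.0, period=5.0, max_delay=1.5)
```

Here the two early transitions fall inside the setup period and the last one falls after the sampling window, so only the transitions at 3.0 and 4.9 contribute to the sample.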
III. EXPERIMENTAL RESULTS
This technique has been implemented in the program MED (Mean Estimator of Density), in which the basic simulation capability is event-driven, gate level, with a scalable delay timing model (based on output capacitance and fanout). In general, any simulation strategy can be used, so that the technique presented can be wrapped around any existing simulator and simulation library. In this section we present data collected with MED, and show that it is both accurate and practical on a number of large benchmark circuits.
A. Input Specification
The experimental results to be presented are based on a specification of the typical circuit inputs as follows.
In the synchronous mode, we assumed that the circuit would be operated near its maximum operating frequency, so that the clock cycle time, $T_c$, is close to the maximum circuit delay, $T_{\max}$. Unless otherwise specified, the results presented were based on a value of $T_c$ that is 1 nsec longer than $T_{\max}$.
The second assumption concerns the input probability and transition density values. It was specified that every input node has a probability of 1/2 and a transition density of $1/(2T_c)$. Thus, on average, each input node was assumed to spend an equal time high and low, and to have one transition every other clock cycle.
Finally, the transition density values were normalized to the clock period, i.e., the transition densities output by the program are expressed in terms of transitions per clock cycle. The output densities are then invariant to clock cycle time, and the user has a more intuitive view of circuit activity: 0.5 transitions per clock cycle is much more informative than 5e7 transitions per second. This is especially useful in light of the fact that the absolute transition density varies linearly with clock frequency.
Asynchronous input probability and density assumptions are similar to the synchronous assumptions. Inputs are assumed to have probabilities of 1/2 and transition densities of $1/(2T_{\max})$. Transition densities for asynchronous circuits are normalized by $T_{\max}$.
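The normalization is a simple rescaling, shown here as a small Python sketch. The 10 ns clock period is an assumption, chosen only because it makes the two figures quoted above (0.5 transitions per cycle and 5e7 transitions per second) describe the same node; the function names are hypothetical.

```python
def per_cycle(density_per_second, t_clock):
    """Normalize an absolute density to transitions per clock cycle."""
    return density_per_second * t_clock

def per_second(density_per_cycle, t_clock):
    """Recover the absolute density from a normalized one."""
    return density_per_cycle / t_clock

T_C = 10e-9  # assumed clock period of 10 ns (100 MHz), for illustration only

normalized = per_cycle(5e7, T_C)            # about 0.5 transitions per cycle
input_density = 1.0 / (2.0 * T_C)           # assumed input density of 1/(2*T_c)
input_normalized = per_cycle(input_density, T_C)
halved_clock = per_second(0.5, 2.0 * T_C)   # same normalized value, slower clock
```

Doubling the clock period halves the absolute density while the normalized value stays at 0.5, which is the invariance the normalization is meant to provide.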
B. Data Collection
The issues to be investigated are (1) the error of the technique, (2) the handling of low-density nodes, and (3) the practicality of the technique for large circuits. The data collected should allow MED's performance to be evaluated in the above three categories.
B.1. Establishing accurate transition density values
The first step in evaluating MED's performance is to establish a set of accurate node transition densities. This baseline would then be used to calculate the actual error of the estimated transition density values. This was done by running MED for a long time on the benchmark circuits presented at ISCAS in 1985 [4]. Typically, in order to achieve 99.99% confidence and 1% error tolerance for all the nodes, this required millions of input vectors and hours or days of SUN Sparc-10 CPU time. Table I lists the circuits, number of gates, number of samples, and execution times for each circuit and mode of operation.
To verify that MED produces results within the specified error tolerances, 10 runs with $\eta_{\min}$ varying linearly from 0.05 to 0.50 were executed with 95% confidence ($1-\alpha = 0.95$) and 5% error tolerance ($\epsilon = 0.05$) on the ISCAS 1985 set. Node transition density values from the runs were compared with the standard values computed above. Regular transition density values, $\bar{n} > \eta_{\min}$, are valid if 95% of the values have less than 5% error. Low-density values, $\bar{n} < \eta_{\min}$, are valid if 95% of the values satisfy $|\eta - \bar{n}| \le \eta_{\min}\epsilon_1$. Tables II and III give the percentage of transition density values out-of-bounds for all the circuits under investigation. From the tables it can be seen that this percentage is very low, well below the specified 5%. This happens because many of the nodes are oversampled, since the simulator will run until the last node converges. This yields more accuracy than what is actually specified by the user.
It is expected that, since the simulator runs until its last node converges and low-density nodes require the longest time to converge, adjusting $\eta_{\min}$ would significantly affect overall simulation time while sacrificing percentage accuracy on only a small number of nodes.
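The validity check described above, which compares estimated densities against the long-run baseline, can be sketched in Python as follows. The function name, the separate tolerance arguments `eps` and `eps1`, and the sample data are all hypothetical; in particular the numbers are made up for illustration and are not results from the paper.

```python
def out_of_bounds_percent(estimates, baseline, eps, eps1, eta_min):
    """Percentage of nodes whose estimate violates its applicable error bound."""
    violations = 0
    for est, ref in zip(estimates, baseline):
        if est >= eta_min:
            # Regular-density node: percentage error relative to the baseline.
            bad = abs(est - ref) > eps * ref
        else:
            # Low-density node: absolute error bound eta_min * eps1.
            bad = abs(est - ref) > eta_min * eps1
        violations += bad
    return 100.0 * violations / len(estimates)

# Made-up normalized densities for four nodes (not data from the paper).
estimates = [0.52, 0.30, 0.010, 0.04]
baseline = [0.50, 0.33, 0.012, 0.02]
percent = out_of_bounds_percent(estimates, baseline, eps=0.05, eps1=0.05, eta_min=0.05)
```

In this toy data set the second node misses its percentage bound and the fourth misses its absolute bound, so half of the nodes are reported out of bounds.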
Ten simulations are run with $\eta_{\min}$ varying linearly from 0.05 to 0.50. SUN Sparc-ELC execution times in CPU seconds are tabulated and reported in Table IV. Low-density nodes typically require the largest number of samples to converge, and as a result execution time drops dramatically as $\eta_{\min}$ rises. In some cases, however, the lowest-density nodes are not the last to converge, and the adjustment of $\eta_{\min}$ has no effect on execution time. The simulation times for all circuits except for c6288 follow a general downward trend, as shown in Fig. 2. The curves result from averaging circuit execution times (excluding c6288) normalized by the time required for the circuit to simulate with $\eta_{\min} = 0.05$.
The behavior of circuit c6288 is an exception to this trend. The execution times for c6288 are essentially invariant to $\eta_{\min}$ for $0 < \eta_{\min} < 0.5$. This occurs because c6288 has regular-density nodes with considerable variation, and at least one of the regular-density nodes with $\bar{n} > 0.5$ converges after all low-density nodes. Because of this, the last nodes to converge are not affected by $\eta_{\min}$.
The final issue investigated is the simulator's execution time when processing larger circuits. For the technique to gain wide acceptability, it must have reasonable execution times on larger circuits. The circuits used in this section are the largest ones presented at ISCAS in 1989 [5].
Circuits were first simulated with high $\eta_{\min}$. This provided a rough estimate of each circuit's transition density distribution. The simulation was then rerun with $\eta_{\min}$ chosen to classify under 20% of the nodes as low-density nodes while providing reasonable execution times. The number of gates, execution times, and percentage of low-density nodes are shown for each circuit in Table V. Considering the high accuracy level (5% error at 95% confidence), the execution times are reasonable, especially for the more common class of synchronous circuits, and indicate that this approach is applicable to large circuits.
IV. CONCLUSIONS
This paper describes a statistical estimation technique, implemented in the program MED, which estimates individual node transition densities with user-specified accuracy and confidence. It uses a threshold $\eta_{\min}$ to classify nodes as either regular-density or low-density nodes. Regular-density nodes, $\bar{n} \ge \eta_{\min}$, have transition density values certified to be within a user-specified percentage error. Low-density nodes, $\bar{n} < \eta_{\min}$, have transition density values with absolute error bounds.
Data were gathered to verify that both regular-density and low-density node transition density values are within the stated error bounds. Trials were run with 95% confidence and 5% error tolerance. It was found that well over 95% of regular node transition density values have less than 5% error. This occurs because many of the nodes converge quickly and are subsequently oversampled. Low-density nodes also performed well. Well over 95% of low-density node transition density values have less than the specified absolute error.
Data were also gathered to investigate the variation of execution time with $\eta_{\min}$. In most cases, it was found that the execution time for circuits falls dramatically as $\eta_{\min}$ rises. This occurs because the lowest-density nodes typically converge last.
Finally, data were taken for execution times on large circuits. MED required reasonable execution times for large circuits when under 20% of nodes are classified as low-density.
