Higher levels of integration have led to a generation of integrated circuits for which power dissipation and reliability are major design concerns. In CMOS circuits, both of these problems are directly related to the extent of circuit switching activity. The average number of transitions per second at a circuit node is a measure of switching activity that has been called the transition density. This paper presents a statistical simulation technique to estimate individual node transition densities in combinational logic circuits. The strength of this approach is that the desired accuracy and confidence can be specified up-front by the user. Another key feature is the classification of nodes into two categories: regular- and low-density nodes. Regular-density nodes are certified with user-specified percentage error and confidence levels. Low-density nodes are certified with an absolute error, with the same confidence. This speeds convergence while sacrificing percentage accuracy only on nodes which contribute little to power dissipation and have few reliability problems.
I. INTRODUCTION
The advent of VLSI technology has brought new challenges to the manufacture of integrated circuits. Higher levels of integration and shrinking line widths have led to a generation of devices which are more sensitive to power dissipation and reliability problems than typical devices of a few years ago. In these circuits excessive power dissipation may cause run-time errors and device destruction due to overheating, while reliability issues may shorten device lifetime. It is especially useful to diagnose and correct these problems before circuits are fabricated. In CMOS circuits, gates draw current and consume power only when making logical transitions. As a result, power dissipation and reliability strongly depend on the extent of circuit switching activity. Hence, there is a need for CAD tools that can estimate circuit switching activity during the design phase.
Circuit activity is strongly dependent on the inputs being applied to the circuit. For one input set the circuit may experience no transitions, while for another it may switch frequently.
† This work was supported in part by the National Science Foundation (NSF), under grant MIP-9308426.
Submitted to VLSI Design, 1995.
During the first input set the circuit dissipates little power and experiences little wear, but for the second its activity could cause device failure. However, the specific input pattern sets cannot be predicted up-front. Furthermore, it is impractical to simulate a circuit for all possible inputs. Thus, this input pattern dependence severely complicates the estimation of circuit activity.
Recently, some approaches have been proposed to overcome this problem by using probabilities to represent typical behavior at the circuit inputs. In [1], the average number of transitions per second at a circuit node is proposed as a measure of switching activity, called the transition density. An algorithm was also proposed to propagate specified input transition densities into the circuit to compute the densities at all the nodes. The algorithm is very efficient, but it neglects the correlation between signals at internal nodes. This leads to errors in the individual node densities that may not always be acceptable, especially since the desired accuracy cannot be specified up-front.
This correlation problem was avoided in [2], where the total average power of the circuit (a weighted sum of the node transition densities) was statistically estimated by simulating the circuit for randomly generated input patterns. The power value is updated iteratively until it converges to the true power with a user-specified accuracy (percentage error tolerance), and a user-specified confidence level. It was found that convergence is very fast because the distribution of the overall circuit power was very nearly Gaussian and very narrow about its mean.
While power estimation is one important reason to find the transition densities in a circuit, it is not the only one. If we assume that the power bus carries a constant voltage $V_{dd}$, then a single logic gate draws an average current [1] of:

$$I_{avg} = \frac{1}{2}\, C_x V_{dd}\, D(x) \qquad (1)$$

and dissipates an average power of:

$$P_{avg} = \frac{1}{2}\, C_x V_{dd}^2\, D(x) \qquad (2)$$

where $C_x$ is the total capacitance, and $D(x)$ the transition density, at the gate output node $x$.
Thus the individual node transition densities can be used to find the individual gate power values using (2), which are helpful in order to avoid hot spots and to ensure that the power dissipation is relatively uniform across the chip. Furthermore, the individual node densities can be used to estimate average current in the power and ground busses using (1), to be used for electromigration analysis. However, it becomes extremely inefficient to use the statistical sampling technique in [2] to estimate the transition densities at every gate output. This is because a large number of input patterns would be required to converge for nodes that switch very infrequently, as we will demonstrate in section II.
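As an illustration of how a node density translates into per-gate figures, the following sketch evaluates the average current and power of a single gate. It assumes that (1) and (2) take the usual CMOS forms $\frac{1}{2} C_x V_{dd} D(x)$ and $\frac{1}{2} C_x V_{dd}^2 D(x)$; the function names and the numeric values are hypothetical and are not taken from the paper.

```python
def gate_average_current(c_x, v_dd, density):
    """Average supply current (A), assuming the form 0.5 * C_x * V_dd * D(x)."""
    return 0.5 * c_x * v_dd * density


def gate_average_power(c_x, v_dd, density):
    """Average power (W), assuming the form 0.5 * C_x * V_dd^2 * D(x)."""
    return 0.5 * c_x * v_dd ** 2 * density


if __name__ == "__main__":
    # Hypothetical gate: 50 fF load, 5 V supply, 5e7 transitions per second.
    c_x, v_dd, density = 50e-15, 5.0, 5e7
    print("current (A):", gate_average_current(c_x, v_dd, density))
    print("power   (W):", gate_average_power(c_x, v_dd, density))
```

Summing such per-gate values over the gates attached to one supply line would give the kind of bus-current figure mentioned above for electromigration analysis.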
In this paper, we will present an extension of the approach in [2] whereby we remove the above limitation and efficiently estimate the transition density at all circuit nodes. To overcome the slow convergence problem, we apply absolute error bounds to nodes with low transition density values, instead of percentage error bounds. This is done by establishing a threshold, $\eta_{min}$, to classify node transition density values. Any node with a transition density value less than the threshold is classified as a low-density node and is certified with absolute error. Nodes with transition density values equal to or above the threshold are classified as regular-density nodes and are certified with a percentage error. A major advantage of this approach is that the desired accuracy can be specified up-front by the user. Furthermore, the percentage error bound is relaxed (i.e., replaced by an absolute error bound) only on low-density nodes. These nodes dissipate little power and have few reliability problems. As with other previous work in this area, our technique is presently restricted to combinational circuits (we are in the process of extending it to include sequential circuits).
The statistical simulation techniques to be presented are implemented in a prototype called "Mean Estimator of Density" (MED). MED's performance is evaluated by looking at the accuracy of its results, its convergence rate, and its execution time. Preliminary results of this work have appeared in [3].
The paper is organized as follows. In the next section, the statistical estimation technique is described. Section III presents experimental data and evaluates MED's performance, while section IV presents a summary. Finally, two appendices are presented that contain some required theoretical results.
II. PROPOSED SOLUTION
This section presents our statistical estimation technique for computing the transition densities at all circuit nodes. It is expected that the user will supply the transition density, denoted $D(x)$, for every circuit input node. Actually, the user should also specify the fraction of time that a circuit input signal is high, called the probability at that node, and denoted by $P(x)$. If unspecified, these probabilities are assigned default values of 1/2. This technique, like the other techniques reviewed in the introduction, applies only to combinational circuits. It can be applied to the combinational part of a sequential circuit provided that the transition densities at the flip-flop outputs (which are inputs to the combinational part) are specified.
Given the input transition densities and probabilities, we can use a random number generator to generate corresponding logic input waveforms with which to drive a simulator.
Based on such a simulation of the circuit for a given time period $T$, we can count the number of transitions at every node, a number which will be called a sample taken at that node. If we repeat this process $N$ times, and form the average $\bar{n}$ of the number of transitions at a node, called the sample mean, then $\bar{n}/T$ is an estimate of the transition density at that node.
It is well known from statistics [4] that for large values of $N$, the sample mean $\bar{n}$ will approach the true average number of transitions in $T$, to be represented by $\eta$. Likewise, the sample standard deviation $s$ will approach $\sigma$, where $\sigma^2$ is the variance of the number of transitions in $T$. One continues to take samples (make simulation runs) until $\bar{n}$ is close enough to $\eta$. The method by which one tests for this is called the stopping criterion, to be discussed next. The following sub-section details the mechanism of input waveform generation.
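The sampling procedure just described can be sketched as follows. This is only an illustration: `toy_run` is a hypothetical stand-in for one simulation run of length `period` (it is not MED's simulator), and the names `estimate_density`, `count_transitions` and `num_samples` are assumptions of the sketch.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def toy_run(cycles=20, p_toggle=0.25):
    """Hypothetical stand-in for one simulation run: the node toggles
    independently with probability p_toggle in each of `cycles` cycles.
    Returns the number of transitions observed (one sample)."""
    return sum(rng.random() < p_toggle for _ in range(cycles))


def estimate_density(count_transitions, period, num_samples):
    """Take num_samples samples and return (sample mean, sample standard
    deviation, density estimate = sample mean / period)."""
    samples = [count_transitions() for _ in range(num_samples)]
    n_bar = statistics.mean(samples)
    s = statistics.stdev(samples)
    return n_bar, s, n_bar / period


if __name__ == "__main__":
    n_bar, s, density = estimate_density(toy_run, period=20, num_samples=2000)
    print("sample mean:", n_bar, "std dev:", s, "density per cycle:", density)
```

In a real flow the callable would wrap an event-driven simulation of the circuit, and one such sample stream would be kept per node.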
A. Stopping Criterion
According to the Central Limit Theorem [4], $\bar{n}$ is a value of a random variable with mean $\eta$ whose distribution approaches the normal distribution for large $N$. The minimum number of samples, $N$, to satisfy near-normality is typically 30. It is also known that for such values of $N$ one may use $s$ as an estimate of $\sigma$.
Since the distribution of sample means is near-normal, we can make inferences about the quality of an individual sample. With $(1-\alpha) \times 100\%$ confidence it then follows that [4]:

$$-z_{\alpha/2}\,\sigma_{\bar{n}} \le \eta - \bar{n} \le z_{\alpha/2}\,\sigma_{\bar{n}} \qquad (3)$$

where $\sigma_{\bar{n}}^2 = \sigma^2/N$ is the variance of $\bar{n}$ and where $z_{\alpha/2}$ is defined so that the area to its right under the standard normal distribution curve is equal to $\alpha/2$. Equation (3) may be rearranged to better accommodate mean estimation, by using:

$$\sigma \approx s \qquad (4)$$
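which is justified for values of $N$ which normalize the sample mean distribution, typically for $N \ge 30$. This is not restrictive; typical simulations take many more samples. The transformed equation is more applicable to our problem, so that with confidence $(1-\alpha) \times 100\%$, we have:

$$\frac{|\eta - \bar{n}|}{\bar{n}} \le \frac{z_{\alpha/2}\, s}{\bar{n}\sqrt{N}} \qquad (5)$$

If $\epsilon_1$ is a small positive number and the number of samples satisfies:

$$N \ge \left(\frac{z_{\alpha/2}\, s}{\epsilon_1\, \bar{n}}\right)^2 \qquad (6)$$

then $\epsilon_1$ bounds the deviation of the sample mean from the true mean, relative to the sample mean:

$$\frac{|\eta - \bar{n}|}{\bar{n}} \le \epsilon_1 \qquad (7)$$

Relative to the true mean, this becomes:

$$\frac{|\eta - \bar{n}|}{\eta} \le \frac{\epsilon_1}{1-\epsilon_1} = \epsilon \qquad (8)$$

where $\epsilon$ is defined to be a user-specified error tolerance. Thus (6) provides a stopping criterion to yield the accuracy specified in (8) with confidence $(1-\alpha) \times 100\%$. It should be clear from (6) that for small values of $\bar{n}$, say $\bar{n} < \eta_{min}$, the number of samples required can become too large. It thus becomes too expensive to guarantee a percentage accuracy for low-density nodes. Instead, we can certify these nodes with an absolute error bound, as follows. Suppose we use the modified stopping criterion:

$$N \ge \left(\frac{z_{\alpha/2}\, s}{\eta_{min}\,\epsilon}\right)^2 \qquad (9)$$

for low-density nodes (with $\bar{n} < \eta_{min}$). Then with $(1-\alpha) \times 100\%$ confidence:

$$|\eta - \bar{n}| \le \frac{z_{\alpha/2}\, s}{\sqrt{N}} \le \eta_{min}\,\epsilon \qquad (10)$$

Thus $\eta_{min}\,\epsilon$ becomes an absolute error bound that characterizes the accuracy for low-density nodes.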
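To make the cost difference concrete, consider a worked example with hypothetical numbers that do not come from the experiments: $\alpha = 0.05$ (so that $z_{\alpha/2} \approx 1.96$), $s = 0.1$, $\bar{n} = 0.01$, $\eta_{min} = 0.1$, and a tolerance of 0.05 used for both $\epsilon_1$ and $\epsilon$. The calculation assumes that (6) and (9) have the forms $N \ge (z_{\alpha/2}\, s / (\epsilon_1 \bar{n}))^2$ and $N \ge (z_{\alpha/2}\, s / (\eta_{min}\, \epsilon))^2$.

```latex
\begin{align*}
\text{percentage bound (6):}\quad
  N &\ge \left(\frac{1.96 \times 0.1}{0.05 \times 0.01}\right)^2 = 392^2 = 153{,}664 \\
\text{absolute bound (9):}\quad
  N &\ge \left(\frac{1.96 \times 0.1}{0.1 \times 0.05}\right)^2 = 39.2^2 \approx 1{,}537
\end{align*}
```

Under these assumptions the absolute criterion needs about one hundredth of the samples, which is the factor $(\bar{n}/\eta_{min})^2$.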
We therefore classify the circuit nodes into regular-density nodes and low-density nodes.
During the algorithm (after $N$ exceeds 30), (6) is used as a stopping criterion as long as $\bar{n} \ge \eta_{min}$; otherwise (9) is used instead. The value of $\eta_{min}$ can be specified by the user and strongly affects the speed of the algorithm, as will be shown in section III.
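A per-node convergence test along these lines might look as follows. This is a sketch under stated assumptions, not MED's implementation: it takes the percentage criterion to be $z_{\alpha/2}\, s/\sqrt{N} \le \epsilon_1 \bar{n}$ and the absolute criterion to be $z_{\alpha/2}\, s/\sqrt{N} \le \eta_{min}\, \epsilon$, with $\epsilon_1 = \epsilon/(1+\epsilon)$; the function and parameter names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

MIN_SAMPLES = 30  # near-normality threshold quoted in the text


def node_converged(samples, eps, alpha, eta_min):
    """Return (converged, node_class) for one node's transition-count samples.

    node_class is "regular", "low", or None while N has not yet exceeded 30.
    """
    n = len(samples)
    if n <= MIN_SAMPLES:
        return False, None
    n_bar = statistics.mean(samples)
    s = statistics.stdev(samples)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}
    half_width = z * s / math.sqrt(n)             # confidence half-width
    eps1 = eps / (1.0 + eps)                      # assumed relation to eps
    if n_bar >= eta_min:
        # Regular-density node: percentage error criterion.
        return half_width <= eps1 * n_bar, "regular"
    # Low-density node: absolute error criterion.
    return half_width <= eta_min * eps, "low"
```

A simulation driver would call this for every node after each new sample and stop once all nodes report convergence, which is why the slowest node sets the run time.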
Let $\bar{n}_r$ and $\eta_r$ be the measured and true density values for a regular-density node, and let $\bar{n}_l$ and $\eta_l$ be the corresponding values for a low-density node. Since $\bar{n}_r > \eta_{min}$, then at convergence and for small $\epsilon$, $\epsilon_1 \approx \epsilon$, we have:

$$|\bar{n}_r - \eta_r| \approx \bar{n}_r\,\epsilon_1 > \eta_{min}\,\epsilon_1 = \frac{\eta_{min}\,\epsilon}{1+\epsilon} \approx \eta_{min}\,\epsilon \ge |\bar{n}_l - \eta_l| \qquad (11)$$

so that the absolute error values for low-density nodes should be less than the absolute errors for regular-density nodes. Although low-density nodes require the longest time to converge, they have the least effect on circuit power and reliability. Therefore the above strategy reduces the execution time, with little or no penalty.

B. Input Generation

Our implementation of this technique has two modes, synchronous and asynchronous, as shown in the block diagram in Fig. 1. In the synchronous mode, we assume that the (combinational) circuit is part of a larger synchronous sequential circuit design, so that its input events should be generated in synchrony. Otherwise, asynchronous operation is assumed and events do not have to be synchronized. Thus the only difference between synchronous and asynchronous operation is in the generation of the input transitions.
B.1. Synchronous mode
In the synchronous mode, an input node may transition only at the beginning of a clock cycle, so that the input pulse widths are discrete multiples of the clock period, $T_c$. The distribution of the high (and low) pulses at the inputs is arbitrary, and can be user-specified. The choice of distribution is not very important because, as observed in [2], the power is relatively insensitive to the particular distribution; rather, it depends mainly on the input transition densities. Our implementation assumes that the distribution is geometric [4]. This arises from a simple sufficient condition that an input signal be Markov [5], i.e., that its value after a clock edge depends only on its value before the clock edge, once that value is specified, and not on its values during earlier clock cycles. Under this assumption, we show in appendix B that the pulse widths have a geometric distribution. If $\mu_0$ and $\mu_1$ are the mean low and high pulse widths, computed as shown in appendix A from:

$$\mu_0 = \frac{2\,(1 - P(x))}{D(x)} \qquad (12)$$

$$\mu_1 = \frac{2\,P(x)}{D(x)} \qquad (13)$$
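then it is also shown in appendix B that the probability that a low signal will transition high on the clock is:

$$P_{01} = \frac{T_c}{\mu_0} \qquad (14)$$

and the probability of a high signal transitioning low on the clock is:

$$P_{10} = \frac{T_c}{\mu_1} \qquad (15)$$

with $\mu_0$ and $\mu_1$ expressed in the same time units as $T_c$.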
A random number generator uses (14) and (15) to generate input transitions for every clock cycle.

B.2. Asynchronous mode

For circuits running asynchronously, input transition generation proceeds differently. Since input transitions may occur at any time, the input generation routine determines the length of time between transitions instead of the probability of transitioning at the clock edge. Again, the distribution of the pulse widths is arbitrary, and can be specified by the user. Our implementation is based on a Markov assumption, so that the length of time between successive transitions is a random variable with an exponential [5] distribution. The length of time a signal stays in the low (high) state has mean $\mu_0$ ($\mu_1$). From this information, the waveform is easily generated using an exponential random number generator.
Additionally, when running asynchronously the simulator requires a setup period. This is a waiting period during which no samples are collected. It is needed for the same reasons that a setup period was required in [2]. Briefly, it allows the circuit to "get up to speed." Before sampling begins, transitions at the inputs must be allowed to propagate into the internal nodes of the circuit. Until all levels of the circuit are involved, switching activity is artificially low and any power or reliability estimates will be skewed. The length of the setup period should be, as was also shown in [2], no less than the maximum delay of the circuit.
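A minimal sketch of both generation modes follows, under the Markov assumption described above. It takes the mean pulse widths to be $\mu_0 = 2(1-P(x))/D(x)$ and $\mu_1 = 2P(x)/D(x)$ and, in the synchronous case, measures them in clock cycles so that the per-cycle transition probabilities are $1/\mu_0$ and $1/\mu_1$. These forms, and all function names, are assumptions of the sketch rather than MED's actual code.

```python
import random


def mean_pulse_widths(prob, density):
    """Return (mu0, mu1): assumed mean low and high pulse widths."""
    if not 0.0 < prob < 1.0 or density <= 0.0:
        raise ValueError("need 0 < prob < 1 and density > 0")
    return 2.0 * (1.0 - prob) / density, 2.0 * prob / density


def synchronous_waveform(prob, density, cycles, rng):
    """One logic value per clock cycle; density is in transitions per cycle."""
    mu0, mu1 = mean_pulse_widths(prob, density)
    if mu0 < 1.0 or mu1 < 1.0:
        raise ValueError("mean pulse width is below one clock cycle")
    p_rise, p_fall = 1.0 / mu0, 1.0 / mu1
    value = 1 if rng.random() < prob else 0
    wave = []
    for _ in range(cycles):
        # Decide at each clock edge whether the signal toggles.
        if rng.random() < (p_fall if value else p_rise):
            value ^= 1
        wave.append(value)
    return wave


def asynchronous_events(prob, density, duration, rng):
    """List of (time, value) events with exponentially distributed
    pulse widths of mean mu0 (low) and mu1 (high)."""
    mu0, mu1 = mean_pulse_widths(prob, density)
    value = 1 if rng.random() < prob else 0
    t, events = 0.0, [(0.0, value)]
    while True:
        # expovariate takes the rate, i.e. 1 / mean pulse width.
        t += rng.expovariate(1.0 / (mu1 if value else mu0))
        if t >= duration:
            return events
        value ^= 1
        events.append((t, value))


def count_transitions(wave):
    """Number of value changes between consecutive cycles."""
    return sum(a != b for a, b in zip(wave, wave[1:]))
```

For a two-state Markov chain with these toggle probabilities, the stationary probability of being high is $p_{rise}/(p_{rise}+p_{fall}) = P(x)$ and the expected number of transitions per cycle is $2\,p_{rise}\,p_{fall}/(p_{rise}+p_{fall}) = D(x)$, so the generated waveforms reproduce the requested statistics on average.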
III. EXPERIMENTAL RESULTS
This technique has been implemented in the program MED (Mean Estimator of Density), in which the basic simulation capability is event-driven, gate level, with a scalable delay timing model (based on output capacitance and fanout). In general, any simulation strategy can be used, so that the technique presented can be wrapped around any existing simulator and simulation library. In this section we present data collected with MED, and show that it is both accurate and practical on a number of large benchmark circuits.
A. Input Specification
The experimental results to be presented are based on a specification of the typical circuit inputs as follows.
In the synchronous mode, we assumed that the circuit would be operated near its maximum operating frequency, so that the clock cycle time, $T_c$, is close to the maximum circuit delay, $T_{max}$. Unless otherwise specified, the results presented were based on a value of $T_c$ that is 1 nsec longer than $T_{max}$.
Secondly, the transition density values were normalized to the clock period, i.e., the transition densities used by the program are expressed in terms of transitions per clock cycle, rather than transitions per second. The output densities are then invariant to clock cycle time, and the user has a more intuitive view of circuit activity: 0.5 transitions per clock cycle is more informative than 5e7 transitions per second.
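The normalization is a simple change of units, which can be sketched as below. The 10 ns clock period is a hypothetical value, chosen only because it makes 0.5 transitions per cycle correspond to the 5e7 transitions per second quoted above; the function names are likewise assumptions.

```python
def density_per_second(density_per_cycle, clock_period_s):
    """Convert transitions per clock cycle to transitions per second."""
    return density_per_cycle / clock_period_s


def density_per_cycle(density_per_s, clock_period_s):
    """Convert transitions per second to transitions per clock cycle."""
    return density_per_s * clock_period_s


if __name__ == "__main__":
    # Hypothetical 10 ns clock period.
    print(density_per_second(0.5, 10e-9))
```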
Finally, it was specified that every input node has a probability of 1/2 and a transition density of 1/2. Thus, on average, each input node was assumed to spend an equal time high and low, and to have one transition every other clock cycle.
Asynchronous input probability and density assumptions are similar to the synchronous assumptions. In this case, the transition densities are normalized by $T_{max}$ and inputs are assumed to have probabilities of 1/2 and transition densities of 1/2.
B. Data Collection
The issues to be investigated are (1) the error of the technique, (2) the handling of low-density nodes, and (3) the practicality of the technique for large circuits. The data collected should allow MED's performance to be evaluated in the above three categories.
B.1. Establishing accurate transition density values
The first step in evaluating MED's performance is to establish a set of accurate node transition densities. This baseline would then be used to calculate the actual error of the estimated transition density values. This was done by running MED for a long time on the benchmark circuits presented at ISCAS in 1985 [6]. Typically, in order to achieve 99.99% confidence and 1% error tolerance for all the nodes, this required millions of input vectors and hours or days of CPU time. Table I lists the circuits, number of gates, and number of samples required for each circuit and mode of operation.
B.2. Calculating error distributions
To verify that MED produces results within the specified error tolerances, 10 runs with $\eta_{min}$ varying linearly from 0.05 to 0.50 were executed with 95% confidence ($\alpha = 0.05$) and 5% error tolerance ($\epsilon = 0.05$) on the ISCAS 1985 set. Node transition density values from the runs were compared with the standard values computed above. Regular transition density values, $\bar{n} \ge \eta_{min}$, are valid if 95% of the values have less than 5% error. Low-density values, $\bar{n} < \eta_{min}$, are valid if 95% of the values satisfy $|\eta - \bar{n}| \le \eta_{min}\,\epsilon$. Tables II and III give the percentage of transition density values out-of-bounds for all the circuits under investigation. From the tables it can be seen that this percentage is very low, well below the specified 5%. This happens because many of the nodes are oversampled, since the simulator will run until the last node converges. This yields more accuracy on some nodes than what is actually specified by the user.
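The validity check described above can be sketched as a small function that reports the percentage of out-of-bounds values in each node class. It assumes the percentage error is measured relative to the reference value and that the low-density bound is $\eta_{min}\,\epsilon$; the function name and the example data are hypothetical.

```python
def out_of_bounds_percentages(estimates, references, eps, eta_min):
    """Return (regular_pct, low_pct): percentage of regular-density and
    low-density nodes whose estimate violates its error bound."""
    bad = {"regular": 0, "low": 0}
    total = {"regular": 0, "low": 0}
    for n_bar, eta in zip(estimates, references):
        if n_bar >= eta_min:
            kind = "regular"
            # Percentage error relative to the reference (assumed form);
            # a zero reference cannot satisfy a percentage bound.
            violated = eta == 0 or abs(eta - n_bar) / eta >= eps
        else:
            kind = "low"
            violated = abs(eta - n_bar) > eta_min * eps
        total[kind] += 1
        bad[kind] += violated

    def pct(kind):
        return 100.0 * bad[kind] / total[kind] if total[kind] else 0.0

    return pct("regular"), pct("low")


if __name__ == "__main__":
    # Hypothetical estimates and reference densities for four nodes.
    est = [1.0, 0.52, 0.02, 0.04]
    ref = [1.02, 0.50, 0.021, 0.05]
    print(out_of_bounds_percentages(est, ref, eps=0.05, eta_min=0.1))
```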
Since the simulator runs until its last node converges, and since low-density nodes require the longest time to converge, adjusting $\eta_{min}$ is expected to significantly affect overall simulation time while sacrificing percentage accuracy on a small number of nodes.
Ten simulations were run with $\eta_{min}$ varying linearly from 0.05 to 0.50. Execution times on a SUN Sparc-ELC, in CPU seconds, are reported in Table IV. Low-density nodes typically require the largest number of samples to converge, and as a result execution time drops dramatically as $\eta_{min}$ rises. In some cases, however, the lowest-density nodes are not the last to converge, and the adjustment of $\eta_{min}$ has no effect on execution time.
The simulation times for all circuits except for c6288 follow a general downward trend, as shown in Fig. 2. The curves result from averaging circuit execution times (excluding c6288) normalized by the time required for the circuit to simulate with $\eta_{min} = 0.05$.
The behavior of circuit c6288 is an exception to this trend. The execution times for c6288 are essentially invariant to $\eta_{min}$ for $0 < \eta_{min} < 0.5$. This occurs because c6288 has regular-density nodes with considerable variation, and at least one of the regular-density nodes with $\bar{n} > 0.5$ converges after all low-density nodes. Because of this, the last nodes to converge are not affected by $\eta_{min}$.
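The normalization and averaging used for such curves can be sketched as follows. The dictionary layout, the function name, and the timing numbers are hypothetical; only the procedure (divide each circuit's times by its time at the first threshold value, then average across circuits, leaving out excluded ones) follows the description above.

```python
def normalized_average_times(times_by_circuit, exclude=()):
    """times_by_circuit maps a circuit name to its list of execution times,
    one per threshold value, with the baseline threshold first.
    Returns the per-threshold average of the normalized times."""
    kept = [times for name, times in times_by_circuit.items()
            if name not in exclude]
    if not kept:
        raise ValueError("no circuits left after exclusion")
    # Normalize each circuit by its own baseline time.
    normalized = [[t / times[0] for t in times] for times in kept]
    # Average column-wise, i.e. per threshold value.
    return [sum(column) / len(column) for column in zip(*normalized)]


if __name__ == "__main__":
    # Hypothetical CPU times (seconds) at two threshold values.
    times = {"a": [10.0, 5.0], "b": [20.0, 5.0], "c6288": [7.0, 7.0]}
    print(normalized_average_times(times, exclude=("c6288",)))
```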
B.4. Execution times on larger circuits
The final issue investigated is the simulator's execution time when processing larger circuits. For the technique to gain wide acceptability, it must have reasonable execution times on larger circuits. The circuits used in this section are the largest ones presented at ISCAS in 1989 [7].
Circuits were first simulated with high $\eta_{min}$. This provided a rough estimate of each circuit's transition density distribution. The simulation was then rerun with $\eta_{min}$ chosen to classify under 20% of the nodes as low-density nodes while providing reasonable execution times. The number of gates, execution times, and percentage of low-density nodes are shown for each circuit in Table V. Considering the high accuracy level (5% error at 95% confidence), the execution times are reasonable, especially for the more common class of synchronous circuits, and indicate that this approach is applicable to large circuits.
IV. SUMMARY
We have presented a statistical estimation technique, implemented in the program MED, which estimates individual node transition densities with user-specified accuracy and confidence. It uses a threshold $\eta_{min}$ to classify nodes as either regular- or low-density nodes.
Data were gathered to verify that both regular- and low-density node transition density values are within the stated error bounds. Trials were run with 95% confidence and 5% error tolerance. It was found that well over 95% of regular node transition density values have less than 5% error. This occurs because many of the nodes converge quickly and are subsequently oversampled. Low-density nodes also performed well. Well over 95% of low-density node transition density values have less than the specified absolute error.
Data were also gathered to investigate the variation of execution time with $\eta_{min}$. In most cases, it was found that the execution time for circuits falls dramatically as $\eta_{min}$ rises. This occurs because the lowest-density nodes typically converge last. Finally, data were taken for execution times on large circuits. MED required reasonable execution times for large circuits when under 20% of nodes are classified as low-density.

Definition 1. The signal probability of $x(k)$, to be denoted $P(x)$, is defined as:

$$P(x) \triangleq \lim_{K \to \infty} \frac{1}{K} \sum_{k=\lfloor -K/2 \rfloor + 1}^{\lfloor +K/2 \rfloor} x(k) \qquad (A.1)$$
It can be shown that the limit in (A.1) always exists. If $x(k) \ne x(k-1)$, we say that the signal undergoes a transition at time $k$. Corresponding to every logic signal $x(k)$, one can construct another logic signal $t_x(k)$ so that $t_x(k) = 1$ if $x(k)$ undergoes a transition at $k$, otherwise $t_x(k) = 0$. Let $n_x(K)$ be the number of transitions of $x(k)$ over $\{\lfloor -K/2 \rfloor + 1, \ldots, \lfloor +K/2 \rfloor\}$. Therefore, $n_x(K) \le K$.
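Definition 2. The transition density of a logic signal $x(k)$, denoted by $D(x)$, is defined as:

$$D(x) \triangleq \lim_{K \to \infty} \frac{n_x(K)}{K} \qquad (A.2)$$

Notice that $n_x(K) = \sum_{k=\lfloor -K/2 \rfloor + 1}^{\lfloor +K/2 \rfloor} t_x(k)$, so that $D(x) = P(t_x)$, and the limit in (A.2) exists.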
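These definitions can be checked numerically on a finite window of a logic signal. The sketch below uses finite-window averages in place of the limits, so the results only approximate $P(x)$ and $D(x)$; the function names and the example signal are hypothetical.

```python
def transition_indicator(x):
    """t_x(k) = 1 where x(k) differs from x(k-1), for k = 1 .. len(x) - 1."""
    return [int(cur != prev) for prev, cur in zip(x, x[1:])]


def window_probability(x):
    """Finite-window estimate of P(x): fraction of time steps at 1."""
    return sum(x) / len(x)


def window_density(x):
    """Finite-window estimate of D(x), computed as P(t_x)."""
    return window_probability(transition_indicator(x))


if __name__ == "__main__":
    # Periodic signal 0,0,1,1,...: high half the time, with mean low and
    # high pulse widths of 2 time steps each.
    x = [0, 0, 1, 1] * 250
    print(window_probability(x), window_density(x))
```

For this signal the estimates are close to $P(x) = 1/2$ and $D(x) = 1/2$, which agrees with the standard relations $P(x) = \mu_1/(\mu_0+\mu_1)$ and $D(x) = 2/(\mu_0+\mu_1)$ for $\mu_0 = \mu_1 = 2$.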
The time between two consecutive transitions of $x(k)$ will be referred to as an inter-transition time: if $x(k)$ has a transition at $i$ and the next transition is at $i + n$, then there is an inter-transition time of length $n$ between the two transitions. Let $\mu_1$ ($\mu_0$) be the average of the high (low), i.e., corresponding to $x(k) = 1$ (0), inter-transition times of $x(k)$. In general, there is no guarantee of the existence of $\mu_0$ and $\mu_1$. If the number of transitions in positive time is finite, then we say that there is an infinite inter-transition time following the last transition, and $\mu_0$ or $\mu_1$ will not exist. A similar convention is made for negative time.

A stochastic process is said to be stationary if its statistical properties are invariant to a shift of the time origin [5]. Among other things, the mean $E[\mathbf{x}(k)]$ of such a process is a constant, independent of time, and will be denoted by $E[\mathbf{x}]$. Let $n_{\mathbf{x}}(K)$ denote the number of transitions of $\mathbf{x}(k)$ over $\{\lfloor -K/2 \rfloor + 1, \ldots, \lfloor +K/2 \rfloor\}$. For a given $K$, $n_{\mathbf{x}}(K)$ is a random variable. If $\mathbf{x}(k)$ is stationary, then $E[n_{\mathbf{x}}(K)]$ depends only on $K$, and is independent of the location of the time origin. Furthermore, one can show that if $\mathbf{x}(k)$ is stationary, then the mean $E[n_{\mathbf{x}}(K)/K]$ is constant, irrespective of $K$.
Let $z \in \mathbf{Z}$ be a random variable with the cumulative distribution function $F_z(k) = 1/2$ for any finite $k$, and with $F_z(-\infty) = 0$ and $F_z(+\infty) = 1$. One might say that $z$ is uniformly distributed over the whole integer set $\mathbf{Z}$. We use $z$ to construct from $x(k)$ a stochastic 0-1 process $\mathbf{x}(k)$, called its companion process, defined as follows.

Definition 3. Given a logic signal $x(k)$ and a random variable $z$, uniformly distributed over $\mathbf{Z}$, define a 0-1 stochastic process $\mathbf{x}(k)$, called the companion process of $x(k)$, given by:

$$\mathbf{x}(k) \triangleq x(k + z) \qquad (A.4)$$

For any given $k = k_1$, $\mathbf{x}(k_1)$ is the random variable $x(k_1 + z)$, a function of the random variable $z$. Intuitively, $\mathbf{x}(k)$ is a family of shifted copies of $x(k)$, each shifted by a value of the random variable $z$. Thus, not only is $x(k)$ a sample of $\mathbf{x}(k)$, but one can also relate statistics of the process $\mathbf{x}(k)$ to properties of the logic signal $x(k)$, as follows.
