Statistical Estimation of Combinational and Sequential CMOS Digital Circuit Activity Considering Uncertainty of Gate Delay Models by Chou, Tan Li & Roy, Kaushik
Purdue University
Purdue e-Pubs
ECE Technical Reports Electrical and Computer Engineering
10-1-1995
Tan Li Chou
Purdue University School of Electrical and Computer Engineering
Kaushik Roy
Purdue University School of Electrical and Computer Engineering
Chou, Tan Li and Roy, Kaushik, "Statistical Estimation of Combinational and Sequential CMOS Digital Circuit Activity Considering
Uncertainty of Gate Delay Models" (1995). ECE Technical Reports. Paper 146.
http://docs.lib.purdue.edu/ecetr/146
TR-ECE 95-25 
OCTOBER 1995 
Statistical Estimation of Combinational and Sequential CMOS 
Digital Circuit Activity Considering Uncertainty of Gate Delay 
Models 
Tan-Li Chou and Kaushik Roy 
Electrical and Computer Engineering 
Purdue University 
West Lafayette, IN 47907-1285 




Abstract 
While estimating glitches or spurious transitions is a challenge due to signal correlations, the
random behavior of logic gate delays makes the estimation problem even more difficult. In
this paper, we present statistical estimation of signal activity at the internal and output
nodes of combinational and sequential CMOS logic circuits considering uncertainty of gate
delay models. The methodology is based on the stochastic models of logic signals and the
probabilistic behavior of gate delays due to process variations, interconnect parasitics, etc.
We propose a statistical technique for estimating average-case activity, which is flexible in
adopting different delay models and variations and can be integrated with worst-case analysis
into a statistical logic design process. Experimental results show that the uncertainty of gate
delay has a great impact on activity at individual nodes (more than 100%) and on total
power dissipation as well.
This research was supported in part by ARPA (under contract F33615-95-C-1625), by an NSF CAREER
award, and by IBM Corporation.
1 Introduction 
As mobile and portable information systems are becoming more popular, battery and packaging
technologies do not seem to be keeping up with the same pace. On the other hand,
deep submicron processes are pushing higher levels of integration, which increases the number
of transistors in VLSI circuits, and hence, the power density. Reliability problems such
as run time errors due to overheating have become more and more serious. As a result, it
has become important to consider power dissipation and reliability issues during the design
phase. In order to design circuits for low power and high reliability, accurate estimation of
power dissipation is required. This paper considers accurate estimation of dynamic power
dissipation for CMOS digital circuits considering uncertainty in gate delay models. In CMOS
digital circuits the majority of the power dissipation is due to charging and discharging of load
capacitances of logic gates. Such charging and discharging occur due to signal transitions,
which depend on input signal patterns. Therefore, accurate estimation of power dissipation
involves accurate estimation of signal switching activity at the internal nodes of a circuit.
However, it is computationally too expensive to try all possible combinations of inputs in
order to estimate power dissipation. Therefore, techniques are being developed to accurately
estimate power dissipation using probabilistic and statistical approaches. These
techniques are referred to as weakly input-pattern dependent [7].
In the probabilistic approaches, the signals are usually modeled as stochastic processes
having signal probability (probability of having a logic ONE) and activity (average number
of signal switchings per unit time). A single symbolic simulation run determines the internal
node signal probability and activity by propagating the input probability information
through the circuit [1, 2, 6]. For accuracy, the probabilistic approaches need to consider both
spatial and temporal correlation of signals [2, 14, 15]. However, when it comes to sequential
circuits, spatial and temporal correlations among state bits are more complicated, and were
addressed in [8, 9, 4]. On the other hand, the statistical approaches simulate the circuits with
a limited number of input vectors. The input vectors are randomly generated conforming
to the given signal probability and activity of the input signals. The number of simulations
for combinational circuits is determined by user-specified parameters, such as confidence
levels and errors that can be tolerated [10, 11]. When sequential circuits are considered,
the feedback effect of state bits and the near-close set problems must be dealt with [16, 3].
Readers are referred to a survey paper [7] for more details on some of the above techniques.
Some of the above techniques considered spurious transitions (glitches or hazards) while
others ignored them. Spurious transitions appear in circuits not necessarily due to design errors.
They also depend on the timing relationship between logic signals of the circuits. A node
with inputs having different path delays in a synchronous circuit may have several transitions
before it reaches its steady state logic value. Though glitches can be reduced by balancing paths
and by reducing the depth of circuits, the power dissipation caused by glitches can be as
high as 70% of the total power in some circuits such as combinational adders [7, 5]. The
probabilistic techniques proposed to consider glitches while handling spatial and temporal
correlations require building BDDs (Binary Decision Diagrams) of various forms [13, 6, 7],
which can be computationally expensive. One of the problems that these techniques have
is that inertial delay is not considered. As will be shown later, they usually overestimate
the activity. Moreover, the activity at a node can be very sensitive to the path delays of its
inputs as pointed out in [17]. Therefore, slight variations in delays can result in great changes in
activity. As a result, minor delay model inaccuracies in the statistical and probabilistic
approaches [6, 10, 11, 16, 3] can lead to large errors in estimated activity. Unfortunately, these
minor inaccuracies are common due to sources of uncertainty such as process variations (die
to die, wafer to wafer, and lot to lot), interconnect parasitics, temperature, approximations
made in modeling gate delays, and other effects.
A solution has been proposed to estimate an upper bound on activity of individual nodes
in larger circuits [17]. This technique uses signal uncertainty to capture the worst cases
to estimate the maximum activity. The technique provides information that can be used
in worst case design techniques and is indeed very fast. However, worst case modeling
can be extremely conservative and therefore can over-estimate the impact of inaccuracies
in delay models on estimating glitches. In statistical designs of complex logic circuits for
performance, a methodology proposed in [18] employs two stages of worst case analysis,
calibrated with statistical simulation. The two stages of worst case analysis serve as filters
to screen out circuits that easily meet their performance requirements. Similarly, statistical
designs for low-power (low-activity) can take advantage of improved accuracy resulting from
statistical simulation. In this paper, we propose a statistical estimation of CMOS circuit
activity considering uncertainty of gate delay models. The average-case estimation together
with the worst-case analysis can be used in statistical designs for low-power. For example, a
two-stage design process is shown in Figure 1. The sources of uncertainty are represented by
probability distributions in the statistical estimation. The first stage uses worst-case analysis
to identify nodes that can have very high activity, which may result in reliability problems [1].
[Figure 1 flow chart; recoverable box labels: RTL description; Worst-case analysis (optimize
worst-case activities to specification); Average-case statistical estimation (verify and
optimize activities to specification). Each analysis stage is annotated with "timings".]
Figure 1: Design flow for statistical low-power design of logic circuits
Then one can reduce the activity (especially glitches) of these nodes by balancing paths
(through transistor resizing, re-synthesizing the circuits, inserting buffers and latches, etc.).
Circuit nodes that could not easily meet the power dissipation goal at the first stage will
be passed down to the average-case statistical estimation stage for further analysis and
optimization. In the second stage, average-case statistical estimation is performed, providing
more information than worst-case analysis. Therefore, one can utilize the more accurate
information to optimize the nodes that failed the specification in the first stage. In addition,
since in the second stage the average-case activity is known, one can further optimize the
circuits for lower overall power dissipation.
The paper is organized as follows. Section 2 reviews signal probability and activity definitions
and the relationship between activity and power dissipation. Monte Carlo methods,
which are the basis of the proposed statistical estimation, are introduced in Section 3.
Section 3 also analyzes the errors in sequential circuit activity estimation when "near closed
sets" are present in sequential circuits. Section 4 introduces the probabilistic model of gate
delays and develops a statistical estimation method that takes the random behavior of
gate delay into consideration. Experimental results are presented in Section 5. Conclusions
are given in Section 6.
2 Preliminaries and Definitions 
This section formally defines signal probability and activity. A brief discussion on power
dissipation in CMOS circuits is also presented.
2.1 Signal Probability and Activity 
Given a logic signal $x(t)$ and a random variable $\tau$, the companion process of $x(t)$ is defined
as $\mathbf{x}(t) = x(t + \tau)$, where $\tau$ is uniformly distributed over $\mathbb{R}$ (the real numbers). The bold font
is used to represent a stochastic process. The primary inputs to a circuit are modeled as
mutually independent companion processes of logic signals. It can be proved [1] that the
probability of a companion process of a logic signal $x(t)$ assuming the logic value ONE at any
given time $t$, $\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x(t)\,dt$, is constant in time and is called the equilibrium
probability. This is denoted by $P(x)$. In contrast, the signal probability is defined as (clock
cycles in which the signal is steady state ONE)/(total clock cycles). Note that only steady state
signals are considered in signal probability estimation and any spurious transitions are
ignored. Najm [1] has also shown that the activity $A(x)$, defined as $\lim_{T \to \infty} n_x(T)/T$, is equal
to the expected value of $n_x(T)/T$ (mean-ergodic). The variable $n_x(T)$ is the number of switchings
of $x(t)$ in the time interval $(-T/2, T/2]$.
If we assume all primary inputs to the circuits under consideration switch only at the
leading edge of the clock and the circuits are delay-free, we can define normalized activity.
Normalized activity, denoted by $a(x)$, is defined as $A(x)/f$, where $A(x)$ and $f$ are the activity
at node $x$ and the clock frequency, respectively. Normalized activity has an intuitive meaning:
it is the probability of node $x$ switching within a clock cycle. In circuits with arbitrary
gate delays where glitches (or hazards) exist, we still define normalized activity $a(x)$ as
$A(x)/f$. However, note that $a(x)$ can then be greater than one. Hence, $a(x)$ represents the
average number of transitions in a clock cycle.
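As a small illustration of this definition, the following Python sketch counts the transitions in a recorded waveform and divides by the number of clock cycles. The waveform encoding (one list of logic values per clock cycle, in time order) and the function name are assumptions made for this example only; they do not come from the report.

```python
def normalized_activity(cycles):
    """Average number of transitions per clock cycle.

    `cycles` holds one entry per clock cycle; each entry is the sequence of
    logic values the node takes during that cycle, in time order
    (hypothetical encoding, chosen for illustration).
    """
    transitions = 0
    previous = None
    for values in cycles:
        for v in values:
            if previous is not None and v != previous:
                transitions += 1
            previous = v
    return transitions / len(cycles)


# Delay-free node: at most one transition per cycle.
steady = [[0], [1], [1], [0]]
# Node with glitches: several transitions inside a single cycle.
glitchy = [[0], [1, 0, 1], [1], [0, 1, 0]]

print(normalized_activity(steady))   # 0.5
print(normalized_activity(glitchy))  # 1.5
```

The second waveform shows why the normalized activity can exceed one once glitches are counted.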
2.2 Power Dissipation in CMOS Logic Circuits 
Of the three sources of power dissipation in digital CMOS circuits (switching, direct-path
short circuit current, and leakage current), the first one is by far the dominant. Ignoring
power dissipation due to direct-path short circuit current and leakage current, the average
power dissipation in a CMOS logic circuit is given by $POWER_{avg} = \frac{1}{2} V_{dd}^2 \sum_i C_i A(i)$, where $V_{dd}$ is
the supply voltage, $A(i)$ is the activity at node $i$, and $C_i$ is the capacitive load at that node.
The summation is taken over all nodes of the logic circuit. It should be observed that $A(i)$
is proportional to $a(i)$. $C_i$ is approximately proportional to the fanout at that node. As
a result, the normalized power dissipation measure, defined as $\sum_i fanout_i \times a(i)$, is
proportional to the average power dissipation in CMOS circuits. The parameter $fanout_i$ is
the number of fanouts at node $i$.
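The two quantities above can be evaluated directly once per-node data are available. The sketch below is illustrative only: the node names, fanout counts, activities, and function names are hypothetical values chosen for the example.

```python
def normalized_power_measure(fanout, activity):
    """Sum over nodes of fanout_i * a(i), the normalized power dissipation measure."""
    return sum(fanout[node] * activity[node] for node in fanout)


def average_power(vdd, cap, activity_per_second):
    """0.5 * Vdd^2 * sum_i C_i * A(i), switching power only."""
    return 0.5 * vdd ** 2 * sum(cap[n] * activity_per_second[n] for n in cap)


# Hypothetical three-node circuit.
fanout = {"n1": 2, "n2": 1, "n3": 3}
activity = {"n1": 0.5, "n2": 1.25, "n3": 0.25}  # normalized activities a(i)

print(normalized_power_measure(fanout, activity))  # 3.0
```

Because the measure only needs fanout counts and normalized activities, it can be compared across delay models without knowing the actual load capacitances.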
3 Monte Carlo Based Power Estimation for Sequential Circuits
In this section we first give a short overview of the Monte Carlo techniques for estimation
of signal activity, followed by a detailed analysis of the errors introduced in the estimation of
circuit activity for sequential circuits when "near closed sets of states" are present. In the
presence of such states, we also derive a technique to estimate power dissipation in sequential
circuits.
The basic idea of Monte Carlo methods for estimating activity of individual nodes is
to simulate a circuit by applying random pattern inputs. The simulation is considered
converged when the activities of individual nodes satisfy some stopping criteria. The
procedure is outlined in Figure 2.
[Figure 2 flow chart; recoverable box labels: Generate a random circuit state; Generate inputs
(a, P) and sample; Yes; End.]
Figure 2: Monte Carlo based technique flow chart
We can use random number generators to generate input patterns conforming to the
given probabilities and activities. During a given period, say $T$ ($T$ clock cycles), we count
the number of transitions at each node, $n_1$, and call the value $n_1/T$ a random sample. $T$
is called the sample length in this paper. The process is repeated $K$ times to obtain $K$
independent samples, $a_j = n_j/T$, $j = 1, \ldots, K$, by using different seeds for the random
number generators. The sample mean is defined as $\bar{a} = (\sum_{j=1}^{K} a_j)/K$. For large $K$, $\bar{a}$ will
approach the expected value of $a_j$, which is $\lim_{T \to \infty} n_T/T$, and is denoted as $a$ since the
signal at each node is mean-ergodic (Section 2). $n_T$ is the number of transitions in the time
interval $(-T/2, T/2]$. Similarly, for large $K$ the sample standard deviation $s$ will approach the
true standard deviation $\sigma$. Furthermore, according to the Central Limit Theorem [19], $\bar{a}$ is
a random variable with mean $a$ and has a distribution approaching the normal distribution
if $K$ is large (typically > 30). Likewise, $\sigma \approx s$ for large $K$. It has been shown in [10, 11] that for
$(1 - \alpha) \times 100\%$ confidence the following inequality holds:
$\frac{|a - \bar{a}|}{\bar{a}} \le \frac{z_{\alpha/2}\, s}{\bar{a}\sqrt{K}}$,   (1)
where $z_{\alpha/2}$ is a specific value such that the area under the standard normal distribution from
$z_{\alpha/2}$ to $\infty$ is $\alpha/2$. Therefore, if
$\frac{z_{\alpha/2}\, s}{\bar{a}\sqrt{K}} \le \epsilon'$,   (2)
we have $\frac{|a - \bar{a}|}{\bar{a}} \le \epsilon'$, and hence $\frac{|a - \bar{a}|}{a} \le \frac{\epsilon'}{1 - \epsilon'} = \epsilon$.
Equation 2 is the stopping criterion for $(1 - \alpha) \times 100\%$ confidence and $\epsilon$ is an upper bound
on the relative error.
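The sampling loop and the stopping test of Inequality 2 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the report: the function names are hypothetical, and the circuit simulator is replaced by a toy node that toggles with probability 0.3 in each clock cycle.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def criterion_met(samples, eps_prime, alpha):
    """Check Inequality 2: z_{alpha/2} * s / (a_bar * sqrt(K)) <= eps'."""
    k = len(samples)
    a_bar = mean(samples)
    if k < 2 or a_bar == 0:
        return False  # not enough data, or relative error undefined
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return z * stdev(samples) / (a_bar * math.sqrt(k)) <= eps_prime


def estimate_activity(draw_sample, eps_prime=0.05, alpha=0.05, min_k=30):
    """Draw samples a_j = n_j / T until Inequality 2 holds."""
    samples = [draw_sample() for _ in range(min_k)]
    while not criterion_met(samples, eps_prime, alpha):
        samples.append(draw_sample())
    return mean(samples), len(samples)


# Hypothetical stand-in for circuit simulation: a node that toggles with
# probability 0.3 in each of T = 50 clock cycles.
rng = random.Random(1)


def toy_sample(T=50):
    return sum(rng.random() < 0.3 for _ in range(T)) / T


a_bar, k = estimate_activity(toy_sample)
print(f"estimated activity {a_bar:.3f} after {k} samples")
```

Starting from at least 30 samples keeps the normal approximation of the sample mean reasonable before the test is first applied.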
If any node in the circuit has a very low activity, that is, $a \ll 1$, then by equation 2 the
number of samples required can be very large. This results in slow convergence. However,
since these low-activity nodes contribute little to power dissipation, a modified stopping
criterion is proposed in [11]. One can specify a particular threshold value $a_{min}$, below which
the activities of nodes are less important. Hence one need not wait for those nodes to converge
to a value within a certain percentage of error. Furthermore, if
$\frac{z_{\alpha/2}\, s}{\bar{a}\sqrt{K}} \le \frac{a_{min}\,\epsilon'}{\bar{a}}$,   (3)
we have $\frac{|a - \bar{a}|}{\bar{a}} \le \frac{z_{\alpha/2}\, s}{\bar{a}\sqrt{K}} \le \frac{a_{min}\,\epsilon'}{\bar{a}}$, and hence $|a - \bar{a}| \le a_{min}\,\epsilon'$.
Therefore, equation 3 becomes the stopping criterion (with $\bar{a} < a_{min}$) for $(1 - \alpha) \times 100\%$
confidence and $a_{min}\epsilon'$ is an absolute error bound (not a percentage error bound).
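A per-node convergence test that switches between the relative criterion (Inequality 2) and the absolute criterion (Inequality 3) might look as follows. This is a sketch under the assumption that the switch is made on the current sample mean; the function name and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def node_converged(samples, eps_prime, alpha, a_min):
    """Relative test above a_min (Inequality 2), absolute test below (Inequality 3)."""
    k = len(samples)
    a_bar = mean(samples)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * stdev(samples) / math.sqrt(k)  # z_{alpha/2} * s / sqrt(K)
    if a_bar >= a_min:
        return half_width <= eps_prime * a_bar      # Inequality 2
    return half_width <= eps_prime * a_min          # Inequality 3


# Hypothetical low-activity node: 40 samples alternating between 0.02 and 0.04.
low_activity_samples = [0.02, 0.04] * 20

print(node_converged(low_activity_samples, 0.05, 0.05, a_min=0.1))  # True
print(node_converged(low_activity_samples, 0.05, 0.05, a_min=0.0))  # False
```

With the threshold in place the low-activity node is accepted after 40 samples, whereas the purely relative test would keep sampling.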
In sequential circuits, things are different due to the state-bit feedback. One of the
approaches is to monitor the state-bit probabilities to determine the convergence [16]. In
order to have $(1 - \alpha) \times 100\%$ confidence and some error $\epsilon$ (upper bound on the absolute error of
the state-bit probabilities), one must perform at least $K \ge \max(N_1, N_2, N_3)$ runs, where
$N_1$, $N_2$, and $N_3$ are the sample-size bounds given in [16].
However, deriving each sample probability of a state bit is a problem since the probability
depends on the state the sequential circuit is in. This can be resolved as follows. Assume
that the state of the machine at time $k$ ($k$th clock cycle) becomes independent of its initial
state at time 0 as $k \to \infty$. As a result, the probability that the state bit signal $s_i(k)$ is
logic ONE at time $k$ with initial state $S(0)$ being $S_0$, denoted as $P(s_i(k) = 1 \mid S(0) = S_0)$
(abbreviated as $P_k(s_i \mid S_0)$), has the following property:
$\lim_{k \to \infty} P_k(s_i \mid S_0) = \lim_{k \to \infty} P_k(s_i = 1) = P(s_i)$. Similarly,
$\lim_{k \to \infty} P_k(s_i \mid S_1) = P(s_i)$.
That is, the probability will be independent of the initial state as $k \to \infty$. Based on this
property, two runs starting from two different initial states $S_0$ and $S_1$ are performed at the
same time. During the simulation, the difference and average of $P_k(s_i \mid S_0)$ and $P_k(s_i \mid S_1)$
are monitored. If both the difference and average remain within a window of width $\epsilon$
for a certain user-specified number of consecutive clock cycles, it is declared that $P_k(s_i \mid S_0)$
and $P_k(s_i \mid S_1)$ have converged. This convergent value of probability is a sample value. The
sampling procedure is repeated $K$ times to meet the user-specified error and confidence level,
as mentioned earlier in this paragraph. However, there are some circuits that cannot be
directly estimated by this approach.
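The two-run convergence check can be illustrated on a toy one-bit machine. The sketch below makes several assumptions that are not in the report: the state-bit probability is propagated exactly instead of being estimated from simulation, the transition probabilities `p01` and `p11` are hypothetical, and "within a window of width $\epsilon$" is interpreted as the range (max minus min) over the last `hold` cycles.

```python
from collections import deque


def step(p, p01, p11):
    """One clock cycle of a 1-bit toy machine: returns P_{k+1}(s = 1)."""
    return p * p11 + (1.0 - p) * p01


def converged_probability(p01, p11, eps=0.01, hold=5, max_cycles=10_000):
    """Run from S0 (s = 0) and S1 (s = 1) in parallel until both the difference
    and the average of the two probabilities stay within a window of width eps."""
    p_s0, p_s1 = 0.0, 1.0
    diffs, avgs = deque(maxlen=hold), deque(maxlen=hold)
    for k in range(1, max_cycles + 1):
        p_s0, p_s1 = step(p_s0, p01, p11), step(p_s1, p01, p11)
        diffs.append(abs(p_s0 - p_s1))
        avgs.append(0.5 * (p_s0 + p_s1))
        if (len(diffs) == hold
                and max(diffs) - min(diffs) <= eps
                and max(avgs) - min(avgs) <= eps):
            return avgs[-1], k
    raise RuntimeError("no convergence within max_cycles")


# Hypothetical machine: P(s'=1 | s=0) = 0.2, P(s'=1 | s=1) = 0.7.
p, cycles = converged_probability(0.2, 0.7)
print(f"P(s) is about {p:.4f} after {cycles} cycles")
```

For this toy machine the steady-state probability is 0.2 / (1 - 0.7 + 0.2) = 0.4, and the two runs approach it from opposite sides.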
Let us consider the State Transition Graph (STG) of Figure 3. Let $G_1$ and $G_2$ denote
the sets of states given by $\{s_1 s_0, \bar{s}_1 \bar{s}_0\}$ and $\{\bar{s}_1 s_0, s_1 \bar{s}_0\}$, respectively. Assume that the
probability of making a transition between the set of states in $G_1$ and the set of states
in $G_2$ is very low. As a result, most of the samples are collected from $G_1$ if the initial
states are $s_1 s_0$ ($S_0$) or $\bar{s}_1 \bar{s}_0$ ($S_1$). These sets $G_1$ and $G_2$ are called sets of near-close states.
Figure 3: Example 2: Another STG of a sequential logic circuit.
Let $y$ be the output node given by $y = (s_1 s_0 + \bar{s}_1 \bar{s}_0) z_1$. Considering only the set of
states in $G_1$, $y = (s_1 s_0 + \bar{s}_1 \bar{s}_0) z_1 = z_1$, while considering only the set of states in $G_2$,
$y = (s_1 s_0 + \bar{s}_1 \bar{s}_0) z_1 = 0$. Therefore, $P(y) = P(z_1)$ in $G_1$ and $P(y) = 0$ in $G_2$. That is, the
probability behavior is very different in the two groups of states. Data sampled from a
particular group is biased, giving errors. As a matter of fact, if we assume all the primary
inputs have the same probability and normalized activity of 0.2 and 0.3 respectively, the
normalized activity of $y$ sampled from $G_1$ is 0.3 ($a(y \mid G_1)$) and 0.02 ($a(y \mid G_2)$) from $G_2$.
A solution to this problem is as follows. Let us assume that we know the values of $P(G_1)$
and $P(G_2)$ of the previous example. The normalized activity $a(y)$ can be computed as
follows,
$a(y) = P(G_1)\, a(y \mid G_1) + P(G_2)\, a(y \mid G_2)$.
If the STG is given, $P(G_1)$ and $P(G_2)$ may be computed by assuming that the primary inputs are
either temporally uncorrelated [8, 9] (which may give errors when they are not) or Markov [4].
A primary input is Markov if its future value depends on the present value and does not
depend on its past. However, if no STG knowledge is assumed, can we find out $P(G_1)$
and $P(G_2)$? Under the assumption that the primary inputs are Markov, it turns out that
we can implicitly compute $P(G_1)$ and $P(G_2)$. Based on the assumption that the primary
inputs are Markov, a new state transition graph called Extended STG (ESTG) is built by
transforming the original STG according to some rules. The resultant ESTG (rather than
the STG) is Markov [4]. Therefore, there is a transition matrix that corresponds to the ESTG. It
is found that $P_k(G_i)$, the probability of reaching one of the states of $G_i$ at the end
of $k$ clock cycles ($k \ge$ a certain period of time called the warmup period) with any initial
state, is very close to $P(G_i)$. The error specified by the users determines the warmup period
according to the following empirical inequality that we derived in [3],
$|P_k(G_i) - P(G_i)| < N_s |\lambda_2|^k$,
where $\lambda_2$ is the second largest eigenvalue (in absolute value) of the transition matrix and $N_s$ is
the number of ESTG states. The upper bound on the number of states of the ESTG is $2^{i+j}$,
where $i$ and $j$ are the number of state bits and the number of inputs, respectively. Therefore,
if we specify the upper bound on the relative error $\epsilon_G$ that $P(G_i)$ can have, we have
Equation 8 implies that if we start with a randomly generated initial state, simulate the
circuit for a warmup period of clock cycles, and then sample data, the probability of sampling
data from among the states of $G_i$ is $P(G_i)$. If we repeat the same procedure $N$ times, we will
have $N \cdot P(G_1)$ samples from $G_1$ and $N \cdot P(G_2)$ samples from $G_2$. Let $a_j(y \mid G_i)$ represent the
$j$th sample taken from $G_i$. Hence the mean of the samples taken from $G_i$ is
$\sum_{j=1}^{N \cdot P(G_i)} a_j(y \mid G_i) / (N \cdot P(G_i))$, denoted as $\bar{a}(y \mid G_i)$. As a result, the mean of these samples is
$\frac{1}{N} \sum_{i=1}^{2} \sum_{j=1}^{N \cdot P(G_i)} a_j(y \mid G_i) = P(G_1)\, \bar{a}(y \mid G_1) + P(G_2)\, \bar{a}(y \mid G_2)$,
which is not biased. However, since the STG is not given, the ESTG and its corresponding transition
matrix, and hence $\lambda_2$ and $N_s$, cannot be derived. To be conservative, we may choose $\lambda_2$ to
be 0.9 and $N_s$ to be $2^r$ ($r$ is the number of primary inputs and state bits), which is an upper
bound on the number of ESTG states. The modified version of the Monte Carlo based technique
for sequential circuits is outlined in Figure 4. It is worth mentioning that the warmup
period simulation does not need any delay model. What matters is the steady state logic
value of the state bits (the signal may have some spurious transitions before settling) rather
than the transient behavior.
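To give a feel for the warmup length implied by the empirical bound $N_s |\lambda_2|^k$ with the conservative choices $\lambda_2 = 0.9$ and $N_s = 2^r$, the sketch below finds the smallest $k$ that pushes the bound under a target. It is an illustration only: it uses an absolute error target (an assumption, since the report states its requirement as a relative error on $P(G_i)$), and the function name is hypothetical.

```python
import math


def warmup_cycles(r, abs_error, lambda2=0.9):
    """Smallest k with 2**r * lambda2**k <= abs_error.

    r is the number of primary inputs plus state bits, so 2**r is the
    conservative bound on the number of ESTG states.
    """
    n_s = 2 ** r
    k = math.log(abs_error / n_s) / math.log(lambda2)
    return max(0, math.ceil(k))


# Hypothetical circuit with 10 primary inputs and state bits in total.
k = warmup_cycles(r=10, abs_error=0.01)
print(k)  # 110
```

Because the bound decays geometrically, each additional input or state bit adds only about seven warmup cycles for $\lambda_2 = 0.9$.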
[Figure 4 flow chart; recoverable box labels: Generate a random circuit state; What kind of
circuit? (Combinational / Sequential); Run simulation for a warmup period; Generate inputs
(a, P) and sample. Annotation: at the end of warmup, a new initial state is generated in the
circuit.]
Figure 4: Monte Carlo based technique flow chart for both sequential and combinational
circuits
Lastly, there is only one problem left. If we examine the assumption of the stopping
criterion (Inequality 2), it is assumed that the sample mean $\bar{a}$ has a normal distribution. But
in the previous example, $a(y)$ apparently has a bimodal distribution. Can we apply the
same stopping criterion to the bimodal distribution in this case? In order to answer this
question, we will compare the user-specified relative error ($\epsilon'$ in Inequality 2) with the actual
relative error ($\epsilon^*$) resulting from applying the stopping criterion (Inequality 2) to a bimodal
distribution. Let us assume that the bimodal distribution function is a linear combination
of two normal distribution functions $f_1(x)$ and $f_2(x)$. That is,
$f(x) = P(G_1) \cdot f_1(x) + P(G_2) \cdot f_2(x)$,   (10)
where $P(G_1)$ and $P(G_2)$ are the steady-state probabilities of the two groups of states mentioned
earlier (two near-close sets) and $P(G_1) + P(G_2) = 1$. $f_i(\cdot)$ represents the population of $G_i$.
Assuming $T \ge 30$ justifies treating $f_i$ as normal, where $T$ is the number of clock cycles
in each sample. This is illustrated pictorially in Figure 5.
Figure 5: Bimodal distribution as a linear combination of two normal distributions
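The mixture model can be explored numerically. The sketch below draws samples from a two-component normal mixture and compares the pooled sample mean with $P(G_1) m_1 + P(G_2) m_2$. The group means 0.30 and 0.02 echo the example activities quoted earlier; $P(G_1) = 0.2$, the standard deviations, and the function name are hypothetical choices for illustration.

```python
import random
from statistics import mean


def draw_mixture_samples(n, p_g1, m1, s1, m2, s2, rng):
    """Each sample comes from G1 with probability p_g1, otherwise from G2."""
    samples = []
    for _ in range(n):
        if rng.random() < p_g1:
            samples.append(("G1", rng.gauss(m1, s1)))
        else:
            samples.append(("G2", rng.gauss(m2, s2)))
    return samples


rng = random.Random(0)
p_g1, m1, s1, m2, s2 = 0.2, 0.30, 0.03, 0.02, 0.002
samples = draw_mixture_samples(20000, p_g1, m1, s1, m2, s2, rng)

pooled = mean(value for _, value in samples)
expected = p_g1 * m1 + (1 - p_g1) * m2  # weighted combination of group means
print(f"pooled mean {pooled:.4f}, weighted group means {expected:.4f}")
```

Sampling each group in proportion to its steady-state probability is what keeps the pooled mean unbiased, even though the individual samples are far from normally distributed.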
Assume that in each sample we start to take data after simulating the sequential circuit
for a warmup period. If a total of $N$ samples are taken when the stopping criterion is
met, we have $N_i$ ($\approx N \cdot P(G_i)$) samples collected from $G_i$. $N$ is determined by the stopping
criterion (Inequality 2) to meet the user-specified relative error $\epsilon'$ and $(1 - \alpha) \times 100\%$ confidence
level. However, it is based on the assumption that the distribution is normal rather than
bimodal. Without loss of generality, let us assume $P(G_1) \le P(G_2)$. If we use $N_i$ samples to
estimate $\mu_i$, we have $s_i$ and $m_i$ as sample standard deviation and sample mean. It can be
shown (see Appendix A) that $\epsilon^*/\epsilon'$, the ratio of the resultant relative error to the user-specified
relative error, with the same confidence level is,
where $r_m = m_1/m_2$, $r_{rsm} = \frac{s_1/m_1}{s_2/m_2}$, and $t_{\alpha/2}$ and $z_{\alpha/2}$ are obtained from the t-distribution and
normal distribution [20]. A few plots are shown in Figures 6 through 9. Surprisingly, the
ratios are less than 1.5 for most of the values of $P(G_1)$ with different parameters. This
implies that in order to ensure that the actual relative error is less than $\epsilon_b$, $\epsilon'$ in the stopping
criterion (Inequality 2) has to be less than $\epsilon_b/1.5$. For example, if we assume 7.5% error to be
tolerable ($\epsilon_b = 0.075$), then $\epsilon'$ must be less than $\epsilon_b/1.5 = 5\%$ ($\epsilon' = 0.05$). The worst case only
occurs when $P(G_1)$ is very small (that is, when only a couple of samples are collected from
$G_1$). For example, $\epsilon^*/\epsilon'$ is equal to 1.52 when $P(G_1) = 0.08$, $r_m = 10$, and $r_{rsm} = 0.5$. Several
other observations can also be made. The larger $r_m$ is ($m_1$ is greater than $m_2$), the
higher the ratio of relative error is when $P(G_1)$ is very small (less than 0.05). This is shown in
Figure 6 and can be explained as follows. Since only a few samples are collected from $G_1$
while $m_1$ is much larger than $m_2$, we may expect a higher error ratio when the ratio $r_m$ ($m_1/m_2$)
is larger. On the other hand, when $m_1$ is smaller than $m_2$ (Figure 7), even when only a
couple of samples are from $G_1$, it does not really affect the error since $P(G_1)$ is very small.
It is also observed (Figure 8) that with more samples (greater $N$) the error ratio is smaller
when $P(G_1)$ is small. Another factor that affects the error ratio is $r_{rsm}$, which is the ratio
of the relative sample standard deviation $s_1/m_1$ to $s_2/m_2$. When only a couple of samples are taken
from $G_1$ and they have greater relative sample standard deviation, it is also expected that
the ratio will be higher, which is shown in Figure 9.
In the above analysis, we have assumed that there are only two near-close sets. But
there can be more than two near-close sets in a sequential circuit. Our technique can be
extended to cases having multiple near-close sets. In addition, the experimental results
(Section 5) show that accurate results can be obtained for the sequential ISCAS
benchmark circuits under this assumption.
Figure 6: Relative error ratio with $r_{rsm} = 1$ and $N = 120$.
Figure 7: Relative error ratio with $r_{rsm} = 1$ and $N = 120$.
Figure 8: Relative error ratio with $r_{rsm} = 1$ and $r_m = 5$.
Figure 9: Relative error ratio with $r_m = 5$ and $N = 120$.
4 Delay Models and Statistical Estimation 
As mentioned earlier, minor delay model inaccuracies may lead to large errors in estimated
activity. Therefore, delay models are crucial to the statistical estimation of activity.
Probabilistic delay models are introduced in this section to capture the uncertainty
of gate delays. Based on the probabilistic delay models, we will generalize the Monte Carlo
approach.
4.1 Delay models 
In the design phase, a designer is faced with different sources of uncertainty that affect
the delays of the circuit. These sources can be grouped into two classes: systematic and
random [18]. The systematic class includes approximations made to simplify the model
for improving simulation time, approximations made to estimate device and interconnect
parasitics prior to layout, and uncertainty in the final process center and distribution when
design proceeds in parallel with process development. On the other hand, the random class
includes uncontrolled variations in photolithography, die to die variations, wafer to wafer
variations, lot to lot variations, operating temperature, power supply voltage, etc. In [17], it
has been shown that a circuit node where two reconvergent paths with different delays meet
may have a large number of spurious transitions. However, even in a tree-structured circuit
with balanced paths (without reconvergent fanout) there can be a large number of spurious
transitions due to slight variations in delays. These variations can be caused by any of the
above sources of uncertainty.
Figure 10: A circuit with nominal delays
Figure 11: A circuit with random delays
Let us consider the circuit of Figure 10. All gates are assumed to have the same delay.
Because the tree has perfectly balanced paths, there are no glitches at all. The final output
has normalized activity 0.5 when all the primary inputs are assumed to be synchronous
and have activity of 0.5. However, due to sources of uncertainty, the gate delays may vary,
as shown in Figure 11. As a result, glitches do occur and the values of
activities at individual nodes change, as also shown in Figure 11. The inertial delays are
assumed to be half of the values of the transport delays for the simulation. Notice that the final
output normalized activity becomes 1.30 rather than 0.5. In order to capture this random
behavior in statistical design, these sources of uncertainty are represented by probability
distributions, while in worst-case design the extreme cases are taken into account.
In this paper, we choose the transport delay ($d$) model with inertial delay ($d_I$). However, it
should be noted that the technique is not restricted to such a delay model. The point is to
model the parameters of the chosen delay models as random variables in order to capture the
probabilistic behavior of gate delays. The transport delay is modeled as a random variable with a
truncated normal distribution with mean $\mu_d$ and standard deviation $\sigma_d$, as shown in Figure 12.
The mean is the nominal value of the transport delay $d$ and the deviation is either assigned by
users or determined by feedback from the fabricated chips. Moreover, if a random delay is
less than a minimum value Min, it is discarded since in real circuits it must be larger than
some positive value. Similarly, if a random delay is greater than a maximum value Max, it
is truncated since it can be considered as a delay fault.
[Figure 12 plot; recoverable axis marks: Min, Max.]
Figure 12: Random delay with truncated normal distribution
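A delay generator along these lines can be sketched with simple rejection sampling. The code assumes that out-of-range draws on both sides are discarded and redrawn (one reading of "discarded" and "truncated"); the function name, the numeric Min and Max values, and the nominal delay are hypothetical. The inertial delay is taken as half the transport delay, as stated for the simulation of Figure 11.

```python
import random


def sample_delay(nominal, sigma, d_min, d_max, rng):
    """Draw a transport delay from a normal distribution truncated to [d_min, d_max].

    Out-of-range draws are rejected and redrawn (an assumption of this sketch).
    """
    while True:
        d = rng.gauss(nominal, sigma)
        if d_min <= d <= d_max:
            return d


rng = random.Random(42)
nominal = 1.0  # hypothetical nominal gate delay, arbitrary time units
delays = [sample_delay(nominal, 0.3 * nominal, 0.1, 1.9, rng) for _ in range(1000)]
inertial = [0.5 * d for d in delays]  # inertial delay = half the transport delay

print(min(delays) >= 0.1, max(delays) <= 1.9)  # True True
```

Rejection sampling is adequate here because the truncation points lie three standard deviations from the mean, so very few draws are discarded.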
4.2 Statistical Estimation 
Recall that in Monte Carlo based technique the primary input patterns are generated con- 
forming to a given activity and probability of the input signals. In a more abstract view 
point, we can think of activity (a) at a node as a function of primary input vectors PI. Each 
component of PI is a stochastic process (see Section 2). Therefore, a is also a stochastic 
process and can be expressed as follows, 
a = F (PI). (12) 
In Section 3, we applied Monte Carlo based techniques to estimate the expected value of a, 
E(a) .  However, what is missing in this approach is the information about the delqy. In other 
words, the delays of the gates of the circuit are assumed to be some constants (dete:rministic). 
Now assume that gate delays are not deterministic and each gate delay can be represented 
by a random variable d;. If D is a random vector consisting of all the random vosriables of 
gate delays, a car1 be represented as follows, 
a = F(P1,  D). 
Therefore, when applying Monte Carlo based techniques to estimate $F(PI, D)$, delays
are modeled as random variables and should be regenerated from time to time during the
simulation. The rationale behind this is that whenever we generate a new set of delays, they
correspond to another die, or even to the same die under different operating conditions such
as temperature and power supply voltage. In contrast to Figure 2, Figure 13 outlines
how the modified Monte Carlo based technique works.
Figure 13: Modified Monte Carlo Based Technique Flow Chart
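The idea of regenerating delays during the simulation can be sketched on a toy example. The Python code below is not the MCPD implementation; it assumes a hypothetical two-path circuit y = XOR(buf(a), buf(b)), redraws both path delays for every sample, and uses a fixed sample count instead of the stopping criterion. All function names and numeric settings are assumptions for illustration.

```python
import random

def sample_delay(rng, mean, sigma, d_min, d_max):
    """Truncated-normal delay by rejection; sigma == 0 gives the nominal delay."""
    if sigma == 0:
        return mean
    while True:
        d = rng.gauss(mean, sigma)
        if d_min <= d <= d_max:
            return d

def xor_transitions(a_switches, b_switches, d_a, d_b, inertial):
    """Transitions at y = XOR(buf(a), buf(b)) within one clock cycle."""
    if a_switches and b_switches:
        # The two edges arrive |d_a - d_b| apart and produce a pulse (two
        # transitions) unless the pulse is no wider than the inertial delay.
        return 2 if abs(d_a - d_b) > inertial else 0
    return 1 if (a_switches or b_switches) else 0

def estimate_activity(n_samples, sigma, inertial=0.0, seed=0):
    """Average transitions per cycle, redrawing both path delays every sample."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        d_a = sample_delay(rng, 1.0, sigma, 0.1, 1.9)
        d_b = sample_delay(rng, 1.0, sigma, 0.1, 1.9)
        a_sw = rng.random() < 0.5  # each input switches with probability 0.5
        b_sw = rng.random() < 0.5
        total += xor_transitions(a_sw, b_sw, d_a, d_b, inertial)
    return total / n_samples

fixed = estimate_activity(20000, sigma=0.0)     # equal nominal delays
random_d = estimate_activity(20000, sigma=0.3)  # probabilistic delays
print(fixed, random_d)
```

With equal nominal delays the two simultaneous edges cancel at the output, so the estimate stays near 0.5 transitions per cycle. With random delays the edges almost never coincide, the glitch appears, and the estimate moves toward 1.0, which is the kind of sensitivity to delay uncertainty discussed in this section.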
5 Experimental Results 
The Monte Carlo based approach with Probabilistic Delay models (MCPD) to estimate
activities at the internal and output nodes of both combinational and sequential circuits has
been implemented in C under the Berkeley SIS environment. In this section, in order to
compare the results with gate delays to those assuming no delays, all the primary inputs
are assumed to be synchronous. That is, if they switch, they switch at the same time (say,
at the leading edge of a clock cycle). Primary inputs are randomly generated conforming
to the given probability and activity of the input signals. In our analysis all the primary
inputs are assumed to have a signal probability of 0.5 and a normalized activity of 0.5. MCPD
uses the probabilistic delay model (Section 4). The transport delay d, which is a random
variable, has mean $\mu_d$ (equal to the nominal value) and standard deviation $\sigma_d$. Unless mentioned
otherwise, we assume that the standard deviation $\sigma_d$ is equal to $0.3\mu_d$ and that the inertial delay
is also a random variable, equal to 0.5d. In order to assess the accuracy of the results, we
run MCPD for a long time with 99% confidence and 1% error. For combinational circuits,
since the activity is higher than that with zero delay models due to the presence of spurious
transitions, we choose a higher threshold ($a_{min}$ = 0.5). In Table 1 the number of gates
(#gates), maximum levels (#levels), and CPU time for long run MCPD (in hours on a
SPARC 5 workstation) are provided. The same table also presents several power dissipation
values (normalized power dissipation measure) for different delay models. Two of them
use the same probabilistic delay model, one representing the long run results and
the other the normal run results. The normal run result is obtained by assuming 95%
confidence, 5% error, and a threshold of $a_{min}$ = 0.5. The remaining three values represent the results of no
delays, transport delay (not random but equal to the nominal value) with inertial delay being
half of the transport delay, and unit delay, respectively.
Several interesting observations can be made by examining Table 1. Though the unit delay
model can yield more spurious transitions than the other models (6 out of 10 circuits of Table 1), its
result is not always the highest. As far as the probabilistic delay result and the nominal delay
result are concerned, the former is not necessarily greater than the latter; it depends on the type
of circuit. For example, for the first four circuits the probabilistic delay result is greater,
while for the rest of the circuits it is smaller. That is, for some circuits like C499, activity at
some nodes is sensitive to uncertainty, and therefore the activity with probabilistic delay is
greater than the activity with non-random nominal delay. This is shown in Figure 14. Also
note that the estimated activity at an individual node with probabilistic delay models can be
twice as high as that with nominal delay models. On the other hand, for some circuits like
C6288, uncertainty helps balance the paths of the circuit and results in fewer spurious transitions
(Figure 15). The estimated activity at an individual node with nominal delay models can be
twice as high as that with probabilistic delay models. Another interesting observation is
that when activity at a node is sensitive to uncertainty, the higher the uncertainty is, the
more spurious transitions the node has. However, when uncertainty helps balance the paths of
the circuit, the higher the uncertainty is, the fewer spurious transitions the node has. These are
illustrated by Figure 16 (C499) and Figure 17 (C1908), where $\Phi$ represents the normalized
power dissipation measure and $\sigma$ and $\mu$ are the standard deviation and mean of the transport
delay.
Figure 14: Activity sensitive to uncertainty (C499)
Figure 15: Uncertainty helping balance paths of circuits (C6288)
The accuracy of the normal run results is shown in Table 2. The average relative error
(% error) is the average of the relative errors of the individual nodes, expressed as a percentage
and taken over the nodes with $a(y) \ge 0.1$. This indicates how accurate, on average, the long run
MCPD is for those nodes with higher activity. The maximal absolute error (Max abs error) among
all the nodes is also given. The term a represents the activity of the node at which the maximal
absolute error occurs, and its percentage error is denoted as a % error in the table. Notice that the CPU
time of normal run MCPD is comparable to that of statistical techniques that do not consider
the probabilistic behavior of logic gate delays [11].
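The error measures described above can be computed as in the following Python sketch. It assumes that the long run activities serve as the reference and that the 0.1 threshold is applied to the long run activity of each node; the function name, the dictionary keys, and the sample numbers are hypothetical.

```python
def accuracy_summary(long_run, normal_run, min_activity=0.1):
    """Compare per-node activities of a normal run against a long run."""
    abs_err = [abs(n - l) for l, n in zip(long_run, normal_run)]
    # Relative errors only for nodes whose long-run activity is at least min_activity.
    rel_err = [e / l for e, l in zip(abs_err, long_run) if l >= min_activity]
    # Index of the node with the largest absolute error.
    worst = max(range(len(abs_err)), key=lambda i: abs_err[i])
    return {
        "ave_abs_error": sum(abs_err) / len(abs_err),
        "pct_error": 100.0 * sum(rel_err) / len(rel_err) if rel_err else 0.0,
        "max_abs_error": abs_err[worst],
        "a": long_run[worst],
        "a_pct_error": 100.0 * abs_err[worst] / long_run[worst],
    }

# Hypothetical activities for three nodes (not taken from the tables).
summary = accuracy_summary([0.5, 0.05, 1.0], [0.55, 0.06, 0.9])
print(summary)
```

In this made-up example the second node is excluded from the average relative error because its reference activity is below 0.1, while it still counts toward the average absolute error.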
Figure 16: Activity sensitive to uncertainty (C499)
Figure 17: Uncertainty helping balance paths of circuits (C1908)

Circuit   CPU (sec)   Ave. abs error   % error   Max abs error   a      a % error
          89.2        0.0067           1.2       0.052           0.96   5.4
C499      78.3        0.0065           1.1       0.034           1.23   2.8

Table 3: Long run results on ISCAS sequential benchmark circuits
Table 4: Individual node information on MCPD in comparison with long run MCPD results

samples   CPU (sec)   NDΦ    PDΦ     Ave. abs error   % error   Max abs error
146       119.3       83.8   106.0   0.0033           0.99

Table 5: Individual node information on s526 MCPD with different number of samples

Similarly, for sequential circuits we run MCPD for a long time with 99% confidence, 1.5%
error, 0.1 threshold ($a_{min}$), 2nd largest eigenvalue ($\lambda_2$) of 0.99, near-close set probability
($P(G_i)$) of 0.1, and the upper bound on the relative error of $P(G_i)$ ($\epsilon_G$) being 1%. The
results are shown in Table 3. Number of primary inputs (#PI) and latches (#ff), length
of warmup period (warmup length), and CPU time in seconds on SPARC 5 workstations
are also shown. In contrast to long run MCPD, the normal run MCPD takes the following
parameters: 95% confidence, 7.5% error, 0.3 threshold ($a_{min}$), 2nd largest eigenvalue ($\lambda_2$) of
0.9, near-close set probability ($P(G_i)$) of 0.5, with $\epsilon_G$ being 5%. A comparison between the results
of long run MCPD and normal run MCPD is shown in Table 4. In the table, Ave. abs error,
% error, Max abs error, a, and a % error have the same meaning as those of Table 2. NDΦ
and PDΦ represent the normalized power dissipation with zero delay and with probabilistic
delay, respectively. Notice that we assume that there are only two near-close sets. Therefore,
the activity sample can be taken from a bimodal population ($G_1$ and $G_2$) rather than a single
population ($G_1$ or $G_2$). Though the stopping criterion (Inequality 2) is derived based on the
assumption that the distribution is normal, we can still apply it with a slight modification,
as follows. When a user specifies the upper bound on relative error to be $\epsilon_b$,
$\epsilon'$ in Inequality 2 must be less than or equal to $\epsilon_b / 1.5$ (Section 3). For example, when the user
specifies the relative error to be 7.5%, Inequality 2 will be applied with $\epsilon'$ being 5% to check if
the simulation converges.
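The tightened convergence check can be sketched as follows. Inequality 2 is not reproduced in this section, so the Python code below assumes the commonly used Monte Carlo form $z_{\alpha/2} s / (m \sqrt{N}) \le \epsilon'$, with m and s the sample mean and sample standard deviation. The function name and the sample data are hypothetical; only the division of the user-specified error by 1.5 is taken from the text.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def has_converged(samples, user_rel_error, confidence=0.95, bimodal_ratio=1.5):
    """Check an assumed form of the stopping criterion with a tightened bound.

    The bound applied is user_rel_error / bimodal_ratio, e.g. 7.5% becomes 5%.
    """
    n = len(samples)
    if n < 2:
        return False
    m = mean(samples)
    s = stdev(samples)  # sample standard deviation (n - 1 in the denominator)
    if m <= 0:
        return False
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z_{alpha/2}
    eps_prime = user_rel_error / bimodal_ratio
    return z * s / (m * sqrt(n)) <= eps_prime

tight = [1.0, 1.1, 0.9, 1.0] * 30  # 120 hypothetical activity samples
loose = [0.2, 1.8] * 3             # 6 widely spread samples
print(has_converged(tight, 0.075), has_converged(loose, 0.075))
```

The first sample set has a small spread and many samples, so it passes the 5% check; the second set is too small and too spread out, so the simulation would continue.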
Notice that the sequential circuits s400, s444, and s526 have higher relative error with respect to
the long run results. It is observed that these three circuits have fewer than 120
samples. In particular, s526 has only 50 samples. Recall the discussion on bimodal population
sampling in Section 3. The ratio $\xi$ of the relative error assuming a bimodal population to the relative
error assuming a normal distribution is higher when the number N of samples is smaller. This
is shown in Figure 8. Therefore, we suspect that the high percentage error can be attributed
to a smaller number of samples than necessary. That is, the ratio $\xi$ is greater than 1.5.
Therefore, we also tried cases of N being 90 and 120. The results are shown in the table as
s526a (N=90) and s526b (N=120), respectively. In Table 5 we examine the activities at 12
different nodes that have relative error of more than 7.5% when N=50. The 7.5% relative error
is what we specified up-front. As the number of samples increases, the relative error drops
dramatically. This can be explained from Figure 8 as follows. When N increases from 60 to
120, the ratio $\xi$ drops dramatically, too. Therefore, the 1.5 ratio assumption for $\xi$ is met
and the relative error falls within the user-specified range.
Table 6: Run time test for large sequential circuits
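The effect of the sample count on a bimodal population can be illustrated numerically. The Python sketch below does not reproduce the ratio of Figure 8; it simply draws N samples from a mixture of two normal populations and reports the 95th percentile of the relative error of the sample mean. The mixture weights, means, and standard deviations are made-up values, and the function name is hypothetical.

```python
import random

def p95_rel_error(n, trials, p1=0.1, mu=(0.2, 1.0), sigma=(0.05, 0.1), seed=0):
    """95th percentile of |sample mean - true mean| / true mean over many trials.

    Each sample comes from population 0 with probability p1, else from population 1.
    """
    rng = random.Random(seed)
    true_mean = p1 * mu[0] + (1.0 - p1) * mu[1]
    errors = []
    for _ in range(trials):
        total = 0.0
        for _ in range(n):
            k = 0 if rng.random() < p1 else 1
            total += rng.gauss(mu[k], sigma[k])
        errors.append(abs(total / n - true_mean) / true_mean)
    errors.sort()
    return errors[int(0.95 * trials)]

e50 = p95_rel_error(50, 2000)
e120 = p95_rel_error(120, 2000)
print(e50, e120)
```

Under these assumed parameters the error bound at N=120 is clearly smaller than at N=50, which is consistent with the drop in relative error reported for s526 when more samples are taken.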
In order to test the run time for larger sequential circuits, three larger circuits (s9234,
s13207, s15850) have been tried with the same parameters as those of the normal run except for
$\lambda_2$. $\lambda_2$ is assumed to be 0.5 to speed up the simulation (shorter warmup period), since these
three circuits have a large number of latches. The run time (CPU) is shown in Table 6.
6 Conclusions 
In this paper we have proposed a statistical estimation technique considering probabilistic
delay models for both combinational and sequential CMOS logic circuits. Experimental
results show the great impact that the probabilistic behavior of gate delays can have on
the activity of individual nodes as well as on the power dissipation of a whole circuit. The CPU run time
of estimating activity is reasonable and comparable to that of estimating activity without
considering uncertainty. Though for our experiments we chose transport delay with inertial
delay models, the technique is not restricted to a particular model. When more accurate delay
models are provided, say rise/fall delay models rather than transport ones, our technique
can easily adopt the new model. Together with worst-case analysis, the proposed technique
can be integrated into a statistical design process that takes advantage of both.
A Appendix 
In Section 3 we assume that a bimodal distribution function f(x) is a linear combination
of two normal distribution functions $f_1(x)$ and $f_2(x)$. That is, $f(x) = P(G_1) \cdot f_1(x) + P(G_2) \cdot f_2(x)$,
where $f_i(\cdot)$ represents the population of $G_i$. We will show in this section that $\xi$, the ratio of
the resultant relative error to the user-specified relative error with the same confidence, satisfies
Inequality 11.
Assume that in each sample we start to take data after simulating the sequential circuit
for a warmup period. If a total of N samples are taken when the stopping criterion
(Inequality 2) is met (the simulation converges), we have $N_i$ ($\approx N P(G_i)$) samples collected
from $G_i$. N is determined by the stopping criterion (Inequality 2), referred to below as Inequality 14,
to meet the user-specified relative error $\epsilon$ and $(1-\alpha) \times 100\%$ confidence level. However, the
convergence criterion is based on the assumption that the samples form a normal distribution
rather than a bimodal one. Therefore, we suspect that the relative error with the same confidence
level may not be the same as $\epsilon$, and we calculate the resultant relative error based
on a bimodal population. For simplicity, we compare the relative errors in terms of $|m - \mu|/m$,
denoted as $\epsilon'$, rather than in terms of $|m - \mu|/\mu$, denoted as $\epsilon$. m and $\mu$ are the sample mean and
the true mean, respectively. Notice that the denominators are different in these two expressions.
Let $s_i$ and $m_i$ be the sample standard deviation and sample mean for $G_i$, s be the
sample standard deviation for the overall circuit, and $y_i$ and $z_j$ be samples from $G_1$ and $G_2$,
respectively. Hence, we have
$$ s_1^2 = \frac{\sum_{i=1}^{N_1} y_i^2 - N_1 m_1^2}{N_1 - 1}, \qquad s_2^2 = \frac{\sum_{j=1}^{N_2} z_j^2 - N_2 m_2^2}{N_2 - 1}, $$
together with the corresponding expression for $s^2$ (Equation 15).
Without loss of generality, we will assume $P(G_1) \le P(G_2)$. Let us consider the case when
$N_1 < 30$ and $N_2 \ge 30$. This implies that if we use $N_2$ samples to estimate the true mean
value ($\mu_2$) for $G_2$, the sample mean $m_2$ estimated from $G_2$ is approximately normally
distributed. Therefore, $\epsilon_2$, the upper bound on relative error with N samples (N calculated
by Equation 14), can be calculated by Inequality 1 (Equation 16).
However, if we use $N_1$ samples to estimate $\mu_1$, $m_1$ follows a t-distribution with ($N_1 - 1$) degrees of
freedom [20] rather than a normal distribution. Consequently, the upper bound on relative
error, $\epsilon_1$, is given by Equation 17, where
$z_{\alpha/2}$ and $t_{\alpha/2}$ are specific values such that the area under the normal distribution (or
t-distribution) from $z_{\alpha/2}$ (or $t_{\alpha/2}$) to $\infty$ is $\alpha/2$. Therefore, the overall resultant error is
given by Equation 18.
The last equality is derived by substituting $\epsilon_1$ and $\epsilon_2$ with Equations 17 and 16. From
Inequality 14, we solve for m by rearranging terms, and Equation 18 becomes Equation 19.
Define the ratios $r_m$, $r_{sm_1}$, and $r_{sm_2}$, which yield Equations 20 and 21.
Recall that in Equation 15
$$ s^2 = \frac{s_1^2 (N_1 - 1) + s_2^2 (N_2 - 1)}{N - 1}. $$
If we substitute $s_1$ and $s_2$ with Equations 20
and 21 and rearrange terms, we obtain Equation 22.
Similarly, after $s_1$ and $s_2$ have been replaced in Equation 19, we obtain Equation 23.
Combining Equations 22 and 23, we derive Inequality 11.
References 
[1] F.N. Najm, "Transition Density, A New Measure of Activity in Digital Circuits," IEEE
Trans. on Computer-Aided Design, vol. 12, No. 2, Feb. 1993, pp. 310-323.
[2] T.-L. Chou, K. Roy, and S. Prasad, "Estimation of Circuit Activity Considering Signal
Correlations and Simultaneous Switching," IEEE Intl. Conf. on Computer-Aided-Design,
1994, pp. 300-303.
[3] T.-L. Chou and K. Roy, "Statistical Estimation of Sequential Circuit Activity," IEEE
Intl. Conf. on Computer-Aided-Design, 1995, to appear.
[4] T.-L. Chou and K. Roy, "Estimation of Sequential Circuit Activity Considering Spatial
and Temporal Correlations," IEEE Intl. Conf. on Computer Design, 1995, pp. 577-583.
[5] A.P. Chandrakasan, S. Sheng, and R. Brodersen, "Low Power CMOS Digital Design,"
IEEE Trans. on Solid-State Circuits, vol. 27, No. 4, April 1992, pp. 473-483.
[6] A. Ghosh, S. Devadas, K. Keutzer, and J. White, "Estimation of Average Switching
Activity in Combinational and Sequential Circuits," ACM/IEEE Design Automation
Conf., 1992, pp. 253-259.
[7] F.N. Najm, "A Survey of Power Estimation Techniques in VLSI Circuits," IEEE Trans.
on VLSI Systems, Dec. 1994, pp. 446-455.
[8] C.-Y. Tsui, M. Pedram, and A. M. Despain, "Exact and Approximate Methods for
Calculating Signal and Transition Probabilities in FSMs," ACM/IEEE Design Automation
Conf., 1994, pp. 18-23.
[9] J. Monteiro, S. Devadas, and B. Lin, "A Methodology for Efficient Estimation of Switching
Activity in Sequential Logic Circuits," ACM/IEEE Design Automation Conf., 1994,
pp. 12-17.
[10] R. Burch, F. N. Najm, P. Yang, and T. N. Trick, "A Monte Carlo Approach for Power
Estimation," IEEE Trans. on VLSI Systems, vol. 1, No. 1, March 1993, pp. 63-71.
[11] M. G. Xakellis and F. N. Najm, "Statistical Estimation of the Switching Activity in
Digital Circuits," ACM/IEEE Design Automation Conf., 1994, pp. 728-733.
[12] E. Seneta, "Non-Negative Matrices and Markov Chains," 2nd Edition, Springer-Verlag.
[13] C. Tsui, M. Pedram, and A. Despain, "Efficient estimation of dynamic power consumption
under a real delay model," IEEE Intl. Conf. on Computer-Aided-Design, 1993,
pp. 224-228.
[14] R. Marculescu, D. Marculescu, and M. Pedram, "Efficient Power Estimation for Highly
Correlated Input Streams," ACM/IEEE Design Automation Conf., 1995, pp. 628-634.
[15] R. Marculescu, D. Marculescu, and M. Pedram, "Switching Activity Analysis Considering
Spatiotemporal Correlations," IEEE Intl. Conf. on Computer-Aided-Design, 1994,
pp. 294-299.
[16] F.N. Najm, S. Goel, and I.N. Hajj, "Power Estimation in Sequential Circuits,"
ACM/IEEE Design Automation Conf., 1995, pp. 635-640.
[17] F.N. Najm and M.Y. Zhang, "Extreme Delay Sensitivity and the Worst-Case Switching
Activity in VLSI Circuits," ACM/IEEE Design Automation Conf., 1995, pp. 623-627.
[18] S. G. Duvall, "A Practical Methodology for the Statistical Design of Complex Logic
Products for Performance," IEEE Trans. on VLSI Systems, Mar. 1995, pp. 112-123.
[19] A. Papoulis, "Probability, Random Variables, and Stochastic Processes," 3rd Edition,
New York: McGraw-Hill, 1991.
[20] I. Miller and J. E. Freund, "Probability and Statistics for Engineers," Prentice-Hall,
1965.
[21] E. Cinlar, "Introduction to Stochastic Processes," Prentice-Hall, 1975.
