Estimating the power dissipation and the reliability of integrated circuits is a major concern of the semiconductor industry. Previously [1], we showed that a good measure of power dissipation and reliability is the extent of circuit switching activity, called the transition density. However, the algorithm for computing the density in [1] is very basic and does not take into account the effect of inertial delays of logic gates. Thus, as we will show in this paper, the transition density may be severely overestimated in high-frequency applications. To overcome this problem, we model the effect of gate delay on logic signals in the form of a conceptual low-pass filter module that does not allow unacceptably short logic pulses to propagate. Using a stochastic model of logic signals, we then derive the equations required to propagate the transition density through the filter. We present experimental results that illustrate the validity and importance of these results. Computer-Aided Design, 1994.
Introduction
The dramatic decrease in feature size and the corresponding increase in the number of devices on a chip, combined with the growing demand for portable communication and computing systems, have made power consumption one of the major concerns in VLSI circuit and system design. Indeed, excessive power dissipation in integrated circuits not only discourages their use in a portable environment, but also causes overheating, which can lead to soft errors or permanent damage. As such, power dissipation joins other reliability concerns, such as electromigration and hot-carrier degradation, that are becoming increasingly important with today's technology.
A crucial observation is that the power dissipation and, in general, the reliability of a chip are directly related to the extent of its switching activity, i.e., the rate at which its nodes are switching. Less active circuits consume less power and are more reliable. However, estimating the level of activity has traditionally been very hard because it depends on the specific signals being applied to the circuit primary inputs. These signals are generally unknown during the design phase because they depend on the system in which the chip will eventually be used. Furthermore, it is practically impossible to simulate large circuits for all possible inputs. To address these issues, the transition density was introduced in [1] as a compact measure of switching activity in digital circuits. Simply put, the transition density at a node is the average number of transitions per second at that node, and it can be efficiently evaluated without requiring exact information about the primary input signals.
However, the algorithm for computing the density in [1] is very basic and does not take into account the effect of inertial delays of logic gates. Thus, the transition density may be severely overestimated for high-speed circuits, as we will now demonstrate.
Consider a multi-input multi-output logic module $M$ whose outputs are Boolean functions of its inputs. $M$ may be a single gate, a cell, or a higher-level module. A simplified timing model was used in [1] to represent the propagation delays through $M$, consisting of a single value of delay for every input-output node pair. The main result in [1] was a simple expression for the density at the outputs of $M$, in terms of its input densities and its Boolean difference probabilities, as follows (assuming $M$ has $n$ inputs $x_i$ and $m$ outputs $y_j$):
$$D(y_j) = \sum_{i=1}^{n} P\!\left(\frac{\partial y_j}{\partial x_i}\right) D(x_i), \qquad j = 1, \ldots, m \tag{1.1}$$
The resulting algorithm requires only two pieces of information about every primary input node, namely its equilibrium probability $P(x)$ (the fraction of time that it is high) and its transition density $D(x)$. Thus (1.1) provides a very efficient way of propagating these values throughout the circuit so that $P(x)$ and $D(x)$ are computed for all internal nodes.
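As a rough illustration of how such a propagation step might look, the sketch below evaluates one gate by brute-force enumeration of its truth table. It assumes that (1.1) has the Boolean-difference form $D(y) = \sum_i P(\partial y / \partial x_i)\, D(x_i)$ and that the gate inputs are mutually independent. The function name `propagate` and its interface are hypothetical and are not taken from densim.

```python
from itertools import product

def propagate(f, probs, dens):
    """Return (P(y), D(y)) for y = f(x_1, ..., x_n), assuming independent inputs.

    f     : Boolean function of a tuple of 0/1 values (hypothetical interface)
    probs : equilibrium probabilities P(x_i)
    dens  : transition densities D(x_i), in transitions per second
    """
    n = len(probs)
    p_y = 0.0
    p_diff = [0.0] * n  # P(dy/dx_i = 1) for each input i
    for bits in product((0, 1), repeat=n):
        # Probability of this input assignment under the independence assumption.
        w = 1.0
        for b, p in zip(bits, probs):
            w *= p if b else 1.0 - p
        if f(bits):
            p_y += w
        for i in range(n):
            # The Boolean difference is 1 when toggling x_i toggles y.
            lo = bits[:i] + (0,) + bits[i + 1:]
            hi = bits[:i] + (1,) + bits[i + 1:]
            if f(lo) != f(hi):
                p_diff[i] += w
    d_y = sum(pd * d for pd, d in zip(p_diff, dens))
    return p_y, d_y

# 2-input AND: each Boolean difference is the other input, so D(y) = 0.5*d + 0.5*d.
print(propagate(lambda b: b[0] & b[1], [0.5, 0.5], [1e6, 1e6]))
# 3-input XOR: every Boolean difference is identically 1, so D(y) = 3*d.
print(propagate(lambda b: b[0] ^ b[1] ^ b[2], [0.5] * 3, [1e6] * 3))
```

For the XOR case the computed density grows linearly with the number of inputs, which is one simple way to see that this kind of propagation places no upper bound on the output density.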
The problem with this approach is that it places no checks or restrictions on the maximum density or, equivalently, the minimum pulse width at a module's output nodes. This is a result of the simplified timing model. To illustrate, consider an $n$-input OR gate whose inputs have equal probabilities $P = 0.5$ and equal densities $D = d$. Since the Boolean difference $\partial y_j / \partial x_i$ is the OR of the $n - 1$ other inputs, its probability is at least $0.5$, which leads to $D(y) \ge nd/2$. Thus, for large enough $n$, the gate output will carry arbitrarily high density, and therefore unrealistically short pulses. In practice, such short pulses are not generated; they are glitches that are filtered out because the module is not fast enough to respond to them. In order to model this filtration effect of the circuit inertial delays, we introduce a new delay block called a filter block at every module output, as shown in Fig. 1.
[Figure 1: Boolean logic module.]
The internal block $M'$ is a zero-delay Boolean block that has the same Boolean function as $M$. The delay blocks, which can have different delay values $t_d$, simply introduce a delay in the logic signal which is the same for a rising or falling transition. In the general case of a multi-output module, these delays may be different for different output nodes.
The filter block is a delay block with a low-pass filtering property, which may be defined as follows: a $0 \to 1$ ($1 \to 0$) transition at the filter input is transmitted to its output after a delay of $\tau_1$ ($\tau_0$) if and only if its input does not change state during that time. Thus the filter block effectively sets a minimum pulse width at the output $y$: the minimum high (low) pulse width at the module output is $\tau_0$ ($\tau_1$). This paper is devoted to the analysis of a filter block, in order to propagate $P(x)$ and $D(x)$ from its input to its output. In section 2, we review the formalism of a companion process of a logic signal that was introduced in [1]. The following section discusses the distribution of pulse widths of a logic signal and makes a simplifying assumption that is required for the remainder of this paper. In section 4, we study the behavior of a filter block and present our main result on the propagation of density values through it. The following two sections are devoted to experimental results and conclusions. Finally, some proofs relevant to the simplifying assumption are given in appendix A, and the proof of our main result is given in appendix B.
Companion Process of a Logic Signal
Throughout this paper, we will use bold font to represent random quantities, and will denote the probability of an event $A$ by $P\{A\}$. Furthermore, if $\mathbf{x}$ is a random variable, we will denote its mean or expected value by $E[\mathbf{x}]$. In this section, we will briefly review the concept of a companion process from [1]. Based on these results, the expression (1.1) was derived [1] and used to propagate probability and density values throughout the circuit.
Pulse Width Distributions
The purpose of this section is to introduce a simplifying assumption that will make possible the solution of the filter block in section 4. In order to illustrate the need for this assumption, and to show that it is actually mild in the sense that it is approximately true in practice, we will lead up to it by discussing the distribution of pulse widths in a logic signal. In general, a logic signal $x(t)$ can have an infinite number of high pulses (corresponding to $x = 1$) and low pulses ($x = 0$). The width of a high pulse of $x(t)$ can be any value in the interval $(0, +\infty)$. Let the population of high pulse widths have the cumulative distribution function (cdf) $F_1(t)$. Thus $F_1(t)$ is the fraction of high pulses that are shorter than or equal to $t$. Let $\mathbf{r}_1$ be a random variable with the same distribution $F_1(t)$, i.e., $P\{\mathbf{r}_1 \le t\} = F_1(t)$, and the probability density function (pdf) $f_1(t) = \frac{d}{dt} F_1(t)$. Likewise, let $\mathbf{r}_0$ be a random variable distributed as the low pulses of $x(t)$, with the cdf $F_0(t) = P\{\mathbf{r}_0 \le t\}$, and the pdf $f_0(t) = \frac{d}{dt} F_0(t)$. Consider the situation shown in Fig. 2, where $t_0$ is a fixed (non-random) time point, and where we focus on the case $\mathbf{x}(t_0) = 0$. In this case, let the length of the low pulse around $t_0$ be $\tilde{\mathbf{r}}_0$. It is interesting to note that $\tilde{\mathbf{r}}_0$ does not have the same distribution as $\mathbf{r}_0$ (this is an instance of the so-called inspection paradox; see [5], page 69). This happens because it is more probable for $t_0 + \boldsymbol{\tau}$ to lie in the longer pulses of $x(t)$ (recall that $\mathbf{x}(t_0) = x(t_0 + \boldsymbol{\tau})$, and $\boldsymbol{\tau}$ is uniform over the whole real line). Let $\tilde{F}_0(t)$ and $\tilde{f}_0(t)$ be the cdf and pdf of $\tilde{\mathbf{r}}_0$, respectively. We show in appendix A that these distributions are given by:
Likewise, when $\mathbf{x}(t_0) = 1$, if the length of the high pulse around $t_0$ is $\tilde{\mathbf{r}}_1$, with a cdf of $\tilde{F}_1(t)$ and a pdf of $\tilde{f}_1(t)$, it can be shown that:
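For reference, the inspection paradox usually leads to the length-biased form sketched below. This is the standard renewal-theory result, written here on the assumption that it matches what appendix A derives; the tilde notation for the pulse that covers $t_0$, and the use of $E[\mathbf{r}_0]$ and $E[\mathbf{r}_1]$ for the mean low and high pulse widths, are choices made for this sketch.

```latex
\tilde{f}_0(t) = \frac{t\, f_0(t)}{E[\mathbf{r}_0]}, \qquad
\tilde{F}_0(t) = \frac{1}{E[\mathbf{r}_0]} \int_0^t u\, f_0(u)\, du ,
\\[4pt]
\tilde{f}_1(t) = \frac{t\, f_1(t)}{E[\mathbf{r}_1]}, \qquad
\tilde{F}_1(t) = \frac{1}{E[\mathbf{r}_1]} \int_0^t u\, f_1(u)\, du .
```

As a quick check of the idea, if the low pulses are exponential with mean $\mu$, so that $f_0(t) = \frac{1}{\mu} e^{-t/\mu}$, then $\tilde{f}_0(t) = \frac{t}{\mu^2} e^{-t/\mu}$, whose mean is $2\mu$: the pulse that happens to cover a fixed time point is, on average, twice as long as a typical pulse.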
Going back to the $\mathbf{x}(t_0) = 0$ situation shown in Fig. 2, let $\mathbf{r}'_1$ be the length of the first high pulse to the right of $\tilde{\mathbf{r}}_0$, as shown in the figure. We show in appendix A that, if $\mathbf{r}'_1$ and $\tilde{\mathbf{r}}_0$ are independent, then $\mathbf{r}'_1$ is distributed as $\mathbf{r}_1$.
We can generalize this result, so that if $\mathbf{r}'_1$ is any high pulse to the right or left of $\tilde{\mathbf{r}}_0$, and if it is independent of $\tilde{\mathbf{r}}_0$, then $\mathbf{r}'_1$ is distributed as $\mathbf{r}_1$. Likewise, when $\mathbf{x}(t_0) = 1$, if $\mathbf{r}'_0$ is any low pulse to the right or left of $\tilde{\mathbf{r}}_1$, and if it is independent of $\tilde{\mathbf{r}}_1$, then $\mathbf{r}'_0$ is distributed as $\mathbf{r}_0$. Otherwise, i.e., if these pulses are not independent, then their distributions will depend on their correlation with $\tilde{\mathbf{r}}_0$ or $\tilde{\mathbf{r}}_1$. For the interested reader: the reason that these pulses are not all distributed according to $F_1(t)$ or $F_0(t)$ is that the fixed time reference $t_0$ was chosen arbitrarily, and not, for instance, at the beginning of a pulse. This choice will become important when Fig. 2 is again invoked in the derivations in the appendices.
In practice, it is reasonable to assume for a general logic signal that two pulses that are sufficiently separated in time will be uncorrelated or independent. By extending this intuitive notion, we arrive at the following simplifying assumption, which will make possible the solution of a filter block in the next section:
Assumption: The width of every pulse of $x(t)$ is independent of all other pulse widths in its past and future.
This assumption is mild in the sense that it is approximately true in practice, such as when two pulses are widely separated in time.
Since the collection of all previous (future) pulse widths completely determines the past (future) of $x(t)$, every pulse of $x(t)$ is independent of its past and future. As a result, the process $x(t)$ probabilistically restarts itself after every transition. A stochastic process with this property is commonly called a stationary, time-reversible, semi-Markov 0-1 process [5]. We will make use of these properties in the next section as we study the propagation through a filter block.
Filter Block Analysis
Let $F$ be a filter block with input $x(t)$ and output $y(t)$. The behavior of a filter can be formally defined by the state diagram shown in Fig. 3. A filter has four states, determined by the current values of $x$ and $y$. The states $S_0$ (corresponding to $x = y = 0$) and $S_3$ (corresponding to $x = y = 1$) are called stable states. The filter will stay in these states indefinitely if $x$ does not change. The states $S_1$ ($x = 0$, $y = 1$) and $S_2$ ($x = 1$, $y = 0$) are called unstable states. If the filter gets into state $S_1$ ($S_2$), then it can stay there for at most $\tau_0$ ($\tau_1$), after which time it will automatically transition to the stable state $S_0$ ($S_3$). Transitions of $y(t)$ are generated only during these autonomous transitions from an unstable to a stable state. If the filter is in an unstable state, and a transition at $x$ occurs, then it will move to a stable state immediately, and no transition at $y$ will be generated.
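The four-state behavior described above can be sketched as a small event-driven routine. This is only an illustrative reading of the state diagram, not the densim implementation: the function name `filter_block`, the representation of a waveform as a sorted list of transition times, and the choice to let a pulse of width exactly equal to the hold time pass are all assumptions made here.

```python
def filter_block(x_times, x0, tau0, tau1):
    """Apply the filter to an input waveform given by its transition times.

    x starts at value x0 and toggles at each time in the sorted list x_times.
    y starts equal to x0, i.e. the filter starts in a stable state.
    Returns the output transitions as a list of (time, new_y_value) pairs.
    """
    x = x0
    y = x0
    out = []
    for k, t in enumerate(x_times):
        x ^= 1  # the input toggles at time t
        if x == y:
            # Unstable -> stable: the input came back before the timeout,
            # so no output transition is generated.
            continue
        # Stable -> unstable: y follows x only if x holds long enough.
        hold = tau1 if x == 1 else tau0
        t_next = x_times[k + 1] if k + 1 < len(x_times) else float("inf")
        if t_next - t >= hold:
            y = x
            out.append((t + hold, y))
    return out

# Low pulses of width 0.4 and 1.5 are both shorter than tau0 = 2 and are removed.
print(filter_block([0, 5, 5.4, 9, 10.5, 20], x0=0, tau0=2.0, tau1=1.0))
```

In this example the input makes six transitions while the output makes only two, a rise at time 1.0 and a fall at time 22.0, because both short low pulses are absorbed while the filter sits in state $S_1$.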
The main result of this paper is the following theorem, which shows how $P(y)$ and $D(y)$ can be computed from $P(x)$ and $D(x)$:
Theorem 1. For a filter with input $x$ and output $y$, and given the basic assumption made
Notice that all that is needed to use these results are the two distribution functions $F_0(t)$ and $F_1(t)$. In practice, it is not clear what these functions should be, or how one might estimate them. We will have more to say on this in the next section. For now, we will show that by using a simple approximation, we can simplify this requirement so that only $F_0(\tau_0)$ and $F_1(\tau_1)$ are required, as follows. The ratio $D(y)/D(x)$ will be called the transmission probability of the filter $F$, and will be denoted by $P_F$:
Thus all that is needed is $F_0(\tau_0)$ and $F_1(\tau_1)$. The experimental results in the next section will not use these approximations, but will be based on the accurate expressions (4.1) and (4.2).
Experimental Results and Discussion
Given the equilibrium probability $P(x)$ and transition density $D(x)$ at the primary inputs of a combinational logic circuit, one can compute the corresponding probability and density at every internal node using (1.1). The density values can be used to estimate the circuit power dissipation as well as the susceptibility to certain reliability problems such as hot-carrier degradation and electromigration. The density propagation algorithm based on (1.1) was implemented in the program densim and presented in [1]. The results of this paper (equations 4.1 and 4.2) have been incorporated into densim by simply applying the filter operation to the output of every gate as shown in the block diagram in Fig. 1. The filter parameters $\tau_0$ and $\tau_1$ can be set by the user in the module library; otherwise, they are derived from the propagation delay and rise/fall times of a module. In order to use the filter equations (4.1) and (4.2), however, we need to know the pulse width distribution functions $F_0(t)$ and $F_1(t)$. The form of these distributions is generally unknown, but one may make reasonable assumptions about them, as follows. We will again make use of the intuitive property that the values of a logic signal at widely separated time points are relatively independent. If we extend this property to the point that the future value of the signal is independent of its past, once its present value is specified, then it is said to be Markov [2] and its high and low pulses are known to be exponentially distributed. In the absence of any other information, therefore, it seems that the exponential distribution is a reasonable assumption. We will come back to this point later in this section, after we have considered the effect of these distributions on the filter behavior.
If the mean pulse width is $\mu$, then the probability density function (pdf) for an exponential distribution is $\frac{1}{\mu} e^{-t/\mu}$ and is shown in Fig. 4a. This distribution is a special case of the gamma distribution; it is a gamma distribution of order 1. Two other gamma distributions, of orders 2 and 3, are also shown in the same figure.
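The three distributions can be evaluated with the closed-form expressions for a gamma distribution of integer order. The sketch below assumes that all three orders are parameterized to share the same mean pulse width, which the text does not state explicitly; the function names `gamma_pdf` and `gamma_cdf` are hypothetical.

```python
import math

def gamma_pdf(t, mean, order=1):
    """pdf of a gamma distribution of integer order with the given mean."""
    lam = order / mean  # rate chosen so that the mean is order / lam = mean
    return lam**order * t**(order - 1) * math.exp(-lam * t) / math.factorial(order - 1)

def gamma_cdf(t, mean, order=1):
    """cdf of the same distribution: the fraction of pulses no wider than t."""
    lam = order / mean
    partial = sum((lam * t)**j / math.factorial(j) for j in range(order))
    return 1.0 - math.exp(-lam * t) * partial

# Fraction of pulses shorter than one tenth of the mean width, for orders 1 to 3.
for k in (1, 2, 3):
    print(k, gamma_cdf(0.1, mean=1.0, order=k))
```

Under this parameterization, the higher-order distributions place less probability on very short pulses, so one would expect a filter with a small hold time to remove fewer pulses from them than from an exponential signal of the same mean.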
The effect of the filter on an input waveform for the three distributions is shown in Fig. 4b. The density of the filtered signal starts to deviate appreciably from that of the unfiltered signal at high densities. This plot was obtained using equations (4.1) and (4.2).
It is prudent at this point to experimentally validate the results of theorem 1 and of Fig. 4b. To do this, we applied a randomly generated logic signal to the input of a filter block, and processed the signal as one would in a logic simulator. We then monitored the signal at the filter output. Averaging over a long enough simulation time, the output probability and density should converge to those predicted by theorem 1. This behavior was indeed observed, for the three different distributions, as shown in Figs. 5 and 6. In both figures and in the remainder of this section, the results of logic simulation are marked "logsim," while the results of applying theorem 1 are marked "densim."
[Figures 5 and 6: caption fragment "... and $D(x) = 1.2 \times 10^9$ (600 MHz) for different input distributions."]
Going back to the issue of the form of the distributions $F_0(t)$ and $F_1(t)$, we have performed extensive experimental studies on several kinds of circuits, but there seem to be no general statements that one can make about the shape of the distributions in practice. Fortunately, though, we have found that the overall power dissipation of a circuit (a weighted average of the node densities) is relatively insensitive to the pulse width distributions at its primary inputs. For instance, the average power dissipated in a 32-bit ripple adder (measured with a logic simulation), with an input density of $2 \times 10^9$, was found to be 17.74 mW for the exponential distribution, 17.49 mW for a gamma distribution of order 2, and 17.33 mW for a gamma distribution of order 3. Therefore, at least for purposes of computing the average power, it is enough to assume some arbitrary input pulse width distribution. For the reasons given above, we have chosen to use the exponential distribution. We should point out that the $2 \times 10^9$ input density chosen for this test case is high enough for the filter mechanism to "make a difference" in the results.
The plot shown in Fig. 7 compares the average power dissipation of the circuit, as measured by logic simulation, to that measured by densim with and without the filter mechanism. The horizontal axis shows the average frequency of the signals applied to the circuit primary inputs (the transition density is twice the average frequency [1]). It clearly shows the need for the filter mechanism at higher frequencies. Fig. 8 shows the results of a similar analysis for a 4-bit parallel multiplier and a 4-bit ALU. Finally, some more results are shown in Fig. 9 for the first two ISCAS-85 benchmark circuits.
The above experimental results demonstrate the validity of the results in theorem 1, and the fact that if the filter mechanism is not used, then the basic density propagation algorithm (1.1) will severely deviate from the correct results at higher frequencies.
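A validation of this kind can be imitated on a small scale with a Monte Carlo sketch. The code below is not the logic simulator used for the figures; it drives a single filter block with independent, exponentially distributed pulse widths and measures how often a rising input edge finds the output low. That fraction is compared with the conditional probability obtained in appendix B, $(1 - F_0(\tau_0)) / (1 - F_0(\tau_0) F_1(\tau_1))$. The function name `simulate`, the seed, and the parameter values are arbitrary choices made for illustration.

```python
import math
import random

def simulate(mean0, mean1, tau0, tau1, n_pulses=200_000, seed=1):
    """Monte Carlo run of one filter block with exponential pulse widths.

    Returns (fraction of rising input edges that find y == 0,
             ratio of output transitions to input transitions).
    """
    rng = random.Random(seed)
    x = 0
    y = 0  # start in the stable state x = y = 0
    rising = rising_y0 = 0
    x_trans = y_trans = 0
    for _ in range(n_pulses):
        x ^= 1  # an input transition starts a new pulse
        x_trans += 1
        if x == 1:
            rising += 1
            rising_y0 += (y == 0)
        width = rng.expovariate(1.0 / (mean1 if x else mean0))
        hold = tau1 if x else tau0
        if x != y and width >= hold:
            y = x  # the pulse outlasted the hold time, so the output follows
            y_trans += 1
    return rising_y0 / rising, y_trans / x_trans

p_sim, ratio = simulate(mean0=1.0, mean1=1.0, tau0=0.5, tau1=0.5)
F0 = 1.0 - math.exp(-0.5 / 1.0)  # F_0(tau_0) for exponential low pulses
F1 = 1.0 - math.exp(-0.5 / 1.0)  # F_1(tau_1) for exponential high pulses
p_pred = (1.0 - F0) / (1.0 - F0 * F1)
print(p_sim, p_pred, ratio)
```

With a long enough run the simulated fraction should settle close to the predicted value, and the measured output-to-input transition ratio gives a direct sense of how much density the filter removes at these settings.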
As a final note, we should say that the improved accuracy afforded by the filter mechanism is obtained at virtually no speed penalty. Equations (4.1) and (4.2) have to be evaluated only once for a logic gate. Thus, the density propagation algorithm remains as efficient as was shown in [1].
On the other hand, the overall approach still has some accuracy problems, even at low frequencies, due to the independence assumptions implicit in (1.1). As was discussed in [1], this is due to node correlations resulting from reconvergent fanout. This issue is part of our continuing work in this area.
Summary and Conclusions
The average number of logic transitions per second, called the transition density, was introduced in [1] as a measure of circuit power dissipation and reliability. An algorithm was also presented to compute the node densities by propagating the transition densities specified at the circuit primary inputs. In this paper, we have pointed out that this algorithm does not place any checks or restrictions on the maximum transition density at a node. Realistically, an upper bound on the node density does exist because a logic gate with non-zero delay cannot propagate arbitrarily short logic pulses. Pulses that are too short appear as glitches and do not propagate through the gate. In order to overcome this problem, we have presented an extension to the transition density approach in [1] by taking into account the effect of the inertial delay of a logic gate. In the framework of the stochastic representation of logic signals of [1], we have modeled this effect with a conceptual low-pass filter block. Detailed analysis of this block has yielded compact expressions for the transition density at its output given the density at its input.
Experimental results demonstrate that the filter module behaves as it should, and that the filter mechanism is required in order to maintain accuracy at higher frequencies.
Appendix A: Some Proofs Relevant to Section 3
Recall that $\mathbf{r}_1$ is a random variable distributed as the high pulses of $x(t)$, with the cumulative distribution function (cdf) $F_1(t) = P\{\mathbf{r}_1 \le t\}$, and the probability density function (pdf) $f_1(t) = \frac{d}{dt} F_1(t)$. Likewise, $\mathbf{r}_0$ is a random variable distributed as the low pulses of $x(t)$, with the cdf $F_0(t) = P\{\mathbf{r}_0 \le t\}$, and the pdf $f_0(t) = \frac{d}{dt} F_0(t)$. In an interval $\left[-\frac{T}{2}, +\frac{T}{2}\right]$, let $n_{x,0}(T)$ be the total number of low pulses of $x(t)$ and $n^t_{x,0}(T)$ be the number of those low pulses whose width is in the interval $(t, t + dt]$. From the definition of a pdf, it follows that:
$$f_0(t)\, dt = P\{t < \mathbf{r}_0 \le t + dt\} = \lim_{T \to \infty} \frac{n^t_{x,0}(T)}{n_{x,0}(T)} \tag{A.1}$$
Recall that, in the definition of the companion process $\mathbf{x}(t)$, $\boldsymbol{\tau}$ is a random variable uniformly distributed over the whole real line (time axis). Thus, for any fixed $t_0$, $\mathbf{x}(t_0) = x(t_0 + \boldsymbol{\tau})$ is a random variable equal to either 0 or 1. If $\mathbf{x}(t_0) = 0$, let the length of the low pulse around $t_0$ be $\tilde{\mathbf{r}}_0$, as shown in Fig. 2. It is interesting to note that, as we will now show, $\tilde{\mathbf{r}}_0$ does not have the same distribution as $\mathbf{r}_0$ (this is an instance of the so-called inspection paradox; see [5], page 69). This happens because it is more probable for $t_0 + \boldsymbol{\tau}$ to lie in the longer pulses of $x(t)$.
Let $R_0 \triangleq \{t : x(t) = 0\}$ be the subset of the time axis for which $x = 0$, and $R^t_0 \subseteq R_0$ be the set of those $x = 0$ intervals whose width is in $(t, t + dt]$. Let $\tilde{F}_0(t)$ and $\tilde{f}_0(t)$ be the cdf and pdf of $\tilde{\mathbf{r}}_0$, respectively. From the definition of a pdf, it follows that:
Otherwise, i.e., if these pulses are not independent, then their distributions will depend on their correlation with $\tilde{\mathbf{r}}_0$ and $\tilde{\mathbf{r}}_1$. Since the collection of all previous (future) pulse widths completely determines the past (future) of $x(t)$, every pulse of $x(t)$ is independent of its past and future. As a result, the process $x(t)$ probabilistically restarts itself after every transition. A stochastic process with this property is commonly called a stationary, time-reversible, semi-Markov 0-1 process [5].
Proof: Since companion processes are stationary [1], $P\{\mathbf{y}(t_0) = 0 \mid \mathbf{x}(t_0) = \uparrow\}$ does not depend on $t_0$. Therefore, for any fixed $t$:
$$P\{\mathbf{y}(t_0) = 0 \mid \mathbf{x}(t_0) = \uparrow\} = P\{\mathbf{y}(t) = 0 \mid \mathbf{x}(t) = \uparrow\}, \quad \forall t \tag{B.2}$$
Let $\mathbf{t}_{-1} < t_0$ be the random time of the last $0 \to 1$ transition of $\mathbf{x}(t)$ before $t_0$, and $\mathbf{t}'_{-1}$ be the random time of the $1 \to 0$ transition of $\mathbf{x}(t)$ that lies between $\mathbf{t}_{-1}$ and $t_0$.
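If $\mathbf{y}$ is 0 at the end of a 0-pulse, $(\mathbf{t}'_{-1}, t_0)$, of $\mathbf{x}(t)$, then either that pulse persisted long enough (i.e., $t_0 - \mathbf{t}'_{-1} \ge \tau_0$), or $\mathbf{y}(t)$ was already low at the beginning of that pulse (i.e., $\mathbf{y}(\mathbf{t}'_{-1}) = 0$). This corresponds to the filter arriving at state $S_0$ at $t_0^-$ via either $S_1$ or $S_2$, and can be formally expressed as: $P\{\mathbf{y}(t_0) = 0 \mid \mathbf{x}(t_0) = \uparrow\} = P$
where we have used the fact that, by the same argument given for $\mathbf{t}'_{-1}$, $\mathbf{y}(\mathbf{t}_{-1})$ and $\mathbf{t}'_{-1} - \mathbf{t}_{-1}$ are independent (in fact, we also have that $\mathbf{y}(\mathbf{t}_{-1})$ and $\mathbf{t}_{-1}$ are independent). Therefore:
$$P\{\mathbf{y}(t_0) = 0 \mid \mathbf{x}(t_0) = \uparrow\} = 1 - F_0(\tau_0) + F_0(\tau_0)\, F_1(\tau_1)\, P\{\mathbf{y}(\mathbf{t}_{-1}) = 0 \mid \mathbf{x}(t_0) = \uparrow\} \tag{B.6}$$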
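Since $\mathbf{y}(\mathbf{t}_{-1})$ and $\mathbf{t}_{-1}$ are independent, then for any fixed $t < t_0$ we have:
$$P\{\mathbf{y}(\mathbf{t}_{-1}) = 0 \mid \mathbf{x}(t_0) = \uparrow\} = P\{\mathbf{y}(\mathbf{t}_{-1}) = 0 \mid \mathbf{t}_{-1} = t,\ \mathbf{x}(t_0) = \uparrow\} \tag{B.7}$$
The event $\{\mathbf{t}_{-1} = t\}$ is equivalent to the intersection of the two events $\{\mathbf{x}(t) = \uparrow\}$ and $\{\mathbf{x}(t)$ makes no $0 \to 1$ transitions in the interval $(t, t_0)\}$. However, when $\mathbf{x}(t) = \uparrow$, $\mathbf{y}(\mathbf{t}_{-1}) = \mathbf{y}(t)$ is independent of $\mathbf{x}(t)$ for all time larger than $t$. Therefore:
$$P\{\mathbf{y}(\mathbf{t}_{-1}) = 0 \mid \mathbf{x}(t_0) = \uparrow\} = P\{\mathbf{y}(t) = 0 \mid \mathbf{x}(t) = \uparrow\} = P\{\mathbf{y}(t_0) = 0 \mid \mathbf{x}(t_0) = \uparrow\} \tag{B.8}$$
where we have used (B.2) to write the last equality. This, coupled with (B.6), leads to (B.1) and completes the proof. Likewise, one can show that:
$$P\{\mathbf{y}(t_0) = 1 \mid \mathbf{x}(t_0) = \downarrow\} = \frac{1 - F_1(\tau_1)}{1 - F_0(\tau_0)\, F_1(\tau_1)}$$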
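The final algebraic step of the proof can be written out explicitly. In the sketch below, $p$ stands for $P\{\mathbf{y}(t_0) = 0 \mid \mathbf{x}(t_0) = \uparrow\}$, and $a = F_0(\tau_0)$, $b = F_1(\tau_1)$ are shorthand introduced here only for brevity. Substituting (B.8) into (B.6) gives a linear equation in $p$, which can be solved provided $ab < 1$:

```latex
p = 1 - a + a\,b\,p
\;\Longrightarrow\;
p\,(1 - a\,b) = 1 - a
\;\Longrightarrow\;
p = \frac{1 - F_0(\tau_0)}{1 - F_0(\tau_0)\,F_1(\tau_1)} .
```

This is presumably the statement referred to as (B.1). It has the same form as the falling-edge expression above, with the roles of $F_0(\tau_0)$ and $F_1(\tau_1)$ exchanged in the numerator.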
