A Pattern Independent Approach to Maximum Current Estimation in CMOS Circuits by Kriplani, Harish et al.
April 1993 UILU-ENG-93-2209
DAC-36
Analog and Digital Circuits
A PATTERN INDEPENDENT 
APPROACH TO MAXIMUM 
CURRENT ESTIMATION 
IN CMOS CIRCUITS
Harish Kriplani, Farid Najm, and Ibrahim Hajj
Coordinated Science Laboratory 
College of Engineering
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Approved for Public Release. Distribution Unlimited.
REPORT DOCUMENTATION PAGE (DD Form 1473, JUN 86)
1a. Report security classification: Unclassified
1b. Restrictive markings: None
3. Distribution/availability of report: Approved for public release; distribution unlimited
4. Performing organization report number: UILU-ENG-93-2209 (DAC-36)
6a. Name of performing organization: Coordinated Science Lab, University of Illinois
6c. Address: 1308 W Main St, Urbana, IL 61801
7a. Name of monitoring organization: SRC & Texas Instruments
7b. Address: SRC, PO 12053, Research Triangle Park, NC 27709; TI, Dallas, TX 75205
8a. Name of funding/sponsoring organization: SRC / TI
8b. Office symbol (if applicable): 92-DP-109
11. Title: A Pattern Independent Approach to Maximum Current Estimation in CMOS Circuits
12. Personal author(s): Kriplani, Harish; Najm, Farid; and Hajj, Ibrahim
13a. Type of report: Technical
14. Date of report: 93/04/01
15. Page count: 33
18. Subject terms: Maximum current, worst-case voltage drop, pattern independent approach, signal correlation, partial input enumeration
19. Abstract: (attached)
21. Abstract security classification: Unclassified
A Pattern Independent Approach to Maximum
Current Estimation in CMOS Circuits
Abstract
Currents flowing in the power and ground (P&G) lines of CMOS digital circuits affect both
circuit reliability and performance by causing excessive voltage drops. Excessive voltage drops
manifest themselves as glitches on the P&G lines and cause erroneous logic signals and
degradation in switching speeds. Maximum current estimates are needed at every contact point in
the P&G lines to study the severity of the voltage drop problems and to redesign the supply
lines accordingly. These currents, however, depend on the specific input patterns that
are applied to the circuit. Since it is prohibitively expensive to enumerate all possible input
patterns, this problem has, for a long time, remained largely unsolved. In this paper, we
propose a pattern-independent, linear time algorithm (iMax) that estimates, at every contact
point, an upper bound envelope of all possible current waveforms that result from the application
of different input patterns to the circuit. The algorithm is extremely efficient and produces
good results for most circuits, as is demonstrated by experimental results on several benchmark
circuits. The accuracy of the algorithm can be further improved by resolving the signal
correlations that exist inside a circuit. We also present a novel partial input enumeration (PIE)
technique to resolve signal correlations and significantly improve the upper bounds for circuits
where the bounds produced by iMax are not tight. We establish with extensive experimental
results that these algorithms represent a good time-accuracy trade-off and are applicable to
VLSI circuits.
1 Introduction
A major concern in present day VLSI circuits is the design of power and ground (P&G) lines in 
a way that ensures design reliability and performance. Excessive supply currents can severely 
affect both circuit reliability and performance by causing excessive voltage drops in the P&G 
lines. Excessive voltage drops manifest themselves as glitches on the P&G lines, and cause 
erroneous logic signals (soft errors) and degradation in switching speeds. The severity of the voltage
drop problems intensifies with the continuing push for denser chips and finer technologies. As
is known from the classical scaling theory [1], as the minimum feature size and supply voltage 
are scaled down, while the total power dissipation on the chip remains constant, the currents 
flowing in the P&G lines increase. With higher currents flowing in narrower lines, the voltage 
drops in the P&G lines go up and quickly become a limiting factor in the design of VLSI chips. 
Furthermore, a lower supply voltage means that the noise margins [1] for the correct operation 
of the transistors on the chip decrease. In short, in order to avoid logic errors, the circuit 
needs to be appropriately designed to take care of increased voltage drops and reduced noise 
margins. This highlights the need for efficient CAD tools to estimate voltage drops in power 
and ground lines. Since worst case currents determine worst case voltage drops, our research 
is focused on the problem of estimating maximum currents in the P&G lines.
Power and ground lines deliver power to all the gates in a circuit. Points at which individual
gates or cells are tied to the bus are called contact points. In VLSI circuits, P&G
lines take up an appreciable amount of routing area, typically 20-50% or even more in some
circuits. Several design methods, such as [2, 3], have appeared in the literature that make use of
the maximum current estimates at the contact points to redesign the P&G lines. The output
of a design optimization procedure depends upon the accuracy with which maximum currents
are estimated. A poor estimate of maximum currents will result in a pessimistic design and
therefore wasted silicon area. Clearly, an accurate estimation of currents at every contact point
in a circuit is crucial and is the subject of this paper.
Current drawn by a CMOS circuit depends upon the specific input pattern applied at its
inputs. An input pattern for a circuit with n inputs is defined as a vector of n excitations,
where each excitation could be any one of four possibilities, i.e., low, high, high to low, or
low to high. For different input patterns, different transient current waveforms are drawn at
the contact points. Therefore, in the presence of such input dependent and transient current
waveforms, we need to define what we mean by the maximum current waveform at a contact
point. Chowdhury et al. [4] find the maximum of the peaks of various transient current
waveforms at every contact point for all possible input patterns. In the final analysis of the
supply lines, they then assume that these constant peak values are applied at the contact
points for all time, i.e., they have dc currents flowing in the lines. This assumption, however,
gives pessimistic results since separate sections in a circuit rarely draw their maximum currents
simultaneously. In this paper, we propose a better measure of the maximum current waveform 
at a contact point called maximum envelope current (MEC) waveform. This maximum current 
estimate is discussed in section 4.
Accurate estimation of the maximum current waveform at every contact point is extremely 
difficult since for that we need to determine current waveforms corresponding to all possible 
input patterns. If the circuit has n primary inputs then we need to explore the set of $2^{2n}$ (i.e., $4^n$) input
patterns to calculate the maximum current waveforms. This makes the problem practically 
impossible to handle by any of the known search procedures for large circuits. As will be shown 
in the next section, most previous work in this area has been based on search techniques. In this 
paper, we propose a pattern independent, linear time (in the number of gates) algorithm (iMax) 
that provides tight upper bounds on the MEC waveforms. The proposed approach represents a 
trade-off between execution speed and tightness of these bounds.
In order to maintain reasonable execution times, the iMax algorithm neglects various signal
correlations that exist inside a circuit. As will be shown later, while in most cases iMax
produces good upper bound waveforms, in some cases the loss due to signal correlations can be
significant. We then propose a new partial input enumeration (PIE) algorithm that efficiently
resolves these correlations and leads to significant improvement in the upper bound waveforms.
The PIE algorithm is based on (1) intelligently selecting a few critical inputs and (2) enumerating
a limited number of cases at these inputs to produce an overall improvement in the upper
bound waveform at every contact point. It turns out that the choice of these critical inputs is
the key to improving the upper bound. We present two heuristics for automatically selecting
the critical inputs that have shown good results in practice. While the PIE algorithm is slower
than the simple iMax algorithm, we demonstrate good speed and accuracy performance results
on circuits with over twenty thousand gates. Furthermore, the algorithm has the attractive
property that it improves the bound iteratively, so that one can stop the algorithm at any time
and obtain a better upper bound than the simple iMax result.
This paper is organized as follows. In the next section, we briefly discuss previous and 
related work in this area. In section 3, we discuss various assumptions on which our algorithms 
are based. In section 4, we describe the proposed maximum current estimate. After that, we 
present the iMax algorithm in detail in section 5. Experimental results on several benchmark 
circuits using iMax are also provided in this section. The signal correlation problem is described 
in section 6. This is followed by a discussion of possible methods that can be used to resolve 
the signal correlations in section 7. In section 8, we present the partial input enumeration 
algorithm along with extensive experimental results on several benchmark circuits. Finally, in 
section 9, conclusions and some guidelines for future work are presented.
2 Previous Work
Several papers (such as [5, 6, 7, 8]) have appeared in the literature on the estimation of P&G
currents from deterministic input patterns. These methods offer significant improvement in
execution times compared to SPICE, while providing acceptable accuracy in the current
waveforms. These methods can be used for finding maximum currents for small circuits having a
few inputs, by calculating the current waveforms corresponding to all possible input patterns.
However, they are of little help for large circuits, as they do not guide us in selecting an
input pattern that leads to the maximum currents.
Chowdhury et al. have addressed the problem of maximum current estimation in [4]. In
their methodology, they divide the circuit into a set of macros, where each macro consists of
a combinational interconnection of logic gates. Considering each macro separately, they use
either an exact search technique (namely, branch and bound) or a heuristic technique to find
the maximum of its transient currents, assuming that the macro has only one contact point
and its inputs switch simultaneously. In the analysis of the bus, to calculate maximum voltage 
drops, the authors assume that all the macros draw their maximum currents simultaneously. 
In addition, they assume that internal nodes of a circuit make at most one signal transition. 
Our experience with various benchmark circuits indicates that multiple signal transitions (or 
glitches) at internal nodes can contribute a significant amount to the P&G currents. Because of 
these assumptions, their methodology overestimates the worst case currents and voltage drops. 
Secondly, due to the huge size of the input space, their branch and bound search technique 
is slow on large circuits. Furthermore, their heuristic approach does not guarantee an upper 
bound on the maximum currents.
Devadas et al. have addressed a similar problem in [9]. They consider the estimation
of worst case power dissipation in CMOS combinational circuits. They reduce this problem
to a weighted max-satisfiability problem, on a set of multi-output Boolean functions obtained
from the circuit logic description. These functions are appropriately weighted to account for
different load capacitances. They then use either a disjoint cover enumeration algorithm or
the branch and bound algorithm to solve the (NP-complete) max-satisfiability problem. They
are able to account for the glitches at various internal nodes. However, for a multilevel logic 
circuit, even under a unit gate delay assumption, the functions generated by their algorithm 
are fairly complex. Consequently, even for small circuits, their analysis is slow. Analysis of 
multi-level circuits under a general delay model was not attempted.
From this brief survey, it is clear that existing methods for the calculation of maximum
current are computationally too expensive to handle large VLSI circuits. In order to be able to
handle these circuits, (near) linear algorithms (rather than exponential) are necessary.
Therefore, pattern independent algorithms become a natural choice. Hercules [10] was an initial
attempt in the direction of a pattern independent approach to maximum current estimation.
However, the analysis presented in [10] makes several simplifying assumptions. The approach
subdivides the circuit into stages but does not discuss how information is represented at the
output of each stage and how it is propagated from one stage to others. Further, the signal
correlation problem is not mentioned in the paper. In this paper, we present a novel approach
that is able to address these problems. This approach is discussed in section 5.
Figure 1. An example of a latch controlled synchronous digital circuit.
3 Assumptions
In order to reduce the complexity of the problem, we focus on a specific, but very common
design style, namely (edge-triggered) latch-controlled synchronous digital circuits. These circuits
consist of combinational blocks separated by latches (see Fig. 1) such that all the inputs to
each block switch simultaneously. As a result, we will focus the analysis, from the next section,
on a single combinational block all of whose inputs switch simultaneously (if at all). This
effectively eliminates the time domain uncertainty about the input transitions, and significantly
simplifies the problem. This assumption has also been used by all the previous approaches.
We assume that the delay of each gate in the circuit is fixed and is specified ahead of time. 
Different gates can have different delays. Further, we assume that each time the output of a 
gate switches, a triangular pulse of current is drawn from the supply lines, as shown in Fig. 2. 
The duration of this pulse is computed from the delay of the gate (by charge conservation), 
and its height (peak value) is a user-specified value. Two separate values for the peak current 
are allowed, corresponding to low to high and high to low transitions at the gate output.
Given the specific clocking scheme of the synchronous circuit, the maximum current waveforms
from different combinational blocks can be appropriately shifted in time depending upon
the individual clock trigger, and used to find the maximum voltage drops in the bus. Therefore, 
for the purposes of this paper, we will focus on the analysis of a single combinational block 
whose inputs switch at time zero.
Figure 2. Model of a gate current pulse.
Figure 3. The Maximum Envelope Current (MEC) waveform.
4 Maximum Current Estimate
We define excitation at a node (or net) at any time t as the stimulus (or signal value) present
at the node at that time. At any time, a node in the circuit could be either stable at low or
high, or could transition from high to low or from low to high. Thus, the excitation could be
any single value from the set $X = \{l, h, hl, lh\}$, where l = low, h = high, hl = high to low
transition and lh = low to high transition.
We now describe the measure we use in our approach to represent maximum currents. 
For the purposes of this illustration, let us consider a specific contact point in a circuit. As 
mentioned in the introduction, the current drawn by a CMOS circuit is a complex function 
of input excitations. For each input pattern that is applied to the circuit, a transient current 
waveform is drawn at the contact point. Instead of representing the maximum current at the 
contact point by a single dc value, in our approach, we represent it by a waveform whose value 
at any time is the maximum current value that the circuit can draw at that time (see Fig. 3). 
We call this the Maximum Envelope Current (MEC) waveform.
Let us suppose that a circuit under consideration has n inputs and when an input pattern
$p = (e_1, e_2, \ldots, e_n)$, where $e_i \in X$, $1 \le i \le n$, is applied to the circuit, a transient current
waveform $I_p(t)$ is drawn from the circuit at the contact point. Let us denote the set of all
possible input patterns that can be applied to the circuit by
$U = \{(e_1, e_2, \ldots, e_n) \mid e_i \in X,\ 1 \le i \le n\}$.
If the value of the MEC waveform at any time t at the contact point is denoted by
$I_{MEC}(t)$, then we have the following equation:
$$I_{MEC}(t) = \max_{p \in U} I_p(t) \qquad (1)$$
Clearly, (also see Fig. 3) the MEC waveform is the maximum envelope of all transient current 
waveforms corresponding to all possible input patterns (and hence the name). There is a 
unique MEC waveform at every contact point.
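Eq. (1) can be read as a point-wise maximum over sampled waveforms. The following is a minimal Python sketch of that reading, assuming every waveform is sampled on the same time grid; the function name `mec_envelope` and the three waveforms are made up for illustration and do not come from the report.

```python
def mec_envelope(waveforms):
    """Point-wise maximum of equally sampled current waveforms.

    `waveforms` is a non-empty list of lists; waveforms[p][k] is the current
    drawn under input pattern p at sample time k (all the same length).
    """
    if not waveforms:
        raise ValueError("need at least one waveform")
    length = len(waveforms[0])
    if any(len(w) != length for w in waveforms):
        raise ValueError("all waveforms must share the same time grid")
    # zip(*waveforms) groups the samples that belong to the same time point.
    return [max(samples) for samples in zip(*waveforms)]


# Three made-up transient waveforms; no single one dominates at every time.
i_a = [0.0, 2.0, 1.0, 0.0]
i_b = [0.0, 0.5, 3.0, 0.5]
i_c = [1.0, 1.0, 1.0, 1.0]
envelope = mec_envelope([i_a, i_b, i_c])
```

In this toy case the envelope peaks at the largest individual peak but is lower at the other sample times, which is why an envelope waveform is less pessimistic than applying the constant peak value for all time.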
If the power or the ground bus of a circuit is represented by an equivalent RC network, 
then we have the following result:
Theorem 1: The voltage drop $V_k^{MEC}(t)$, calculated at any node k in the power or
ground bus when the MEC waveforms are applied at the contact points, is an upper bound on
the voltage drop $V_k^p(t)$ occurring at that node when any input pattern p is applied to the
circuit, i.e., $V_k^{MEC}(t) \ge V_k^p(t)$, for all t.
Proof: From Eq. (1), we have
$$I_{MEC}(t) \ge I_p(t)$$
Therefore, from Theorem A1 in the appendix, the result follows. ■
Estimating MEC waveforms at every contact point in a circuit is an extremely difficult
problem because it requires the current waveforms corresponding to all possible input patterns.
In the next section, we describe a linear time algorithm that provides tight upper bounds on 
the MEC waveforms at every contact point.
5 The Proposed Algorithm (iMax)
The proposed pattern independent, linear time algorithm operates at the gate level description 
of the circuit. Unless specified by the user, we assume that nothing is known about the specific 
excitations at the primary inputs, except that they may transition (only) at time zero, i.e., 
each primary input may carry any excitation from the set X at time zero. We call this an 
uncertainty about these input signals. The basic idea of the proposed algorithm is to propagate 
this uncertainty present at the inputs inside the circuit, so that, at the output of every logic 
gate, we know the set of all possible excitations and their associated timing. From this, the 
worst case gate currents are computed, as explained below.
Figure 4. An illustration of uncertainty waveform at a gate output.
5.1 Signal Representation
Perhaps the first question that comes to mind is what kind of information one maintains in 
order to represent the signal uncertainty about internal circuit nodes. Ideally, one would like 
to compute the set of all possible transitions (along with their timing information) that occur 
at the output of every gate in the circuit. That would certainly be enough to estimate the 
MEC waveforms. However, due to the uncertainty at the primary inputs and the general gate 
delay model used, the number of possible transitions at internal gates grows exponentially, 
and quickly becomes a bottleneck. To avoid this problem, we maintain information, not about 
individual transitions, but about intervals during which the output of the gate might switch. 
Thus, for each of the excitations low, high, hl and lh, we store a list of intervals during which
a node might carry that excitation. These intervals, which might overlap, serve to describe the 
signal uncertainty. We call these intervals uncertainty intervals.
Definition 1 (Uncertainty Set $X_n(t)$): The uncertainty set at time t for a node n
defines the set of all possible excitations that the node can assume at that time, so $X_n(t) \subseteq X$.
Definition 2 (Uncertainty Waveform): The uncertainty waveform describes the signal
uncertainty present at a node as a function of time. At time t, the set of values taken by the
waveform is the uncertainty set for the node at that time.
An example of the uncertainty waveform is given in Fig. 4. In this figure, we show an
uncertain signal U(t) represented as four sets of intervals (one set for each of the low, high,
hl and lh excitations) along the time axis. Thus, if u(t)
is a logic signal that belongs to the family U(t), i.e., $u(t) \in U(t)$, then u(t) will be low up to
$t_1$, will switch from low to high sometime between $t_1$ and $t_2$, will then be high up to $t_3$, etc.
Since the signal can switch from low to high at any time between $t_1$ and $t_2$, it can
be either high or low during that interval. Notice that in the interval beginning at $t_6$ the signal may make
any number of low to high and/or high to low transitions. At the primary inputs, signals
are represented by such waveforms with a single point of possible transition at time 0. As 
internal signals are generated, the number of points at which transitions can possibly occur, 
increases. In order to contain the complexity, we then start to merge neighboring transition 
points into intervals. In general, this strategy can be stated as follows : when the number of 
intervals associated with a gate corresponding to any excitation exceeds a certain user-specified 
threshold (Max_No_Hops), we repeatedly merge closest-neighbor intervals, so as to keep their 
count below the threshold.
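The merging strategy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: intervals are (begin, end) pairs, "closest" is taken to mean the smallest gap between neighbors, and intervals that already overlap are fused first. The name `merge_closest` and these tie-breaking choices are assumptions, not details taken from the report.

```python
def merge_closest(intervals, max_no_hops):
    """Merge closest-neighbor intervals until at most max_no_hops remain.

    `intervals` is a list of (begin, end) pairs with begin <= end.
    Merging two neighbors replaces them with the smallest interval that
    contains both, so the result always covers the original intervals.
    """
    if max_no_hops < 1:
        raise ValueError("max_no_hops must be at least 1")
    # Sort by begin time and fuse intervals that already overlap or touch.
    merged = []
    for begin, end in sorted(intervals):
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    # Repeatedly close the smallest gap between two neighbors.
    while len(merged) > max_no_hops:
        gaps = [merged[k + 1][0] - merged[k][1] for k in range(len(merged) - 1)]
        k = gaps.index(min(gaps))
        merged[k:k + 2] = [(merged[k][0], merged[k + 1][1])]
    return merged


# Two point intervals collapse into one when only a single interval is allowed.
single = merge_closest([(2, 2), (3, 3)], max_no_hops=1)
```

Because every merge only enlarges the covered time range, the merged list remains a conservative description of when the node may switch.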
5.2 Independence Assumption
While propagating information at a logic gate, we know the uncertainty waveforms at each of 
its inputs and we would like to derive the corresponding waveform at its output. However, one 
cannot do this accurately without knowing how some of these inputs, if any, are correlated. For 
instance, certain combinations of the gate input excitations may not be possible. Unfortunately, 
maintaining information about correlation between various circuit nodes is very expensive. 
We, therefore, use a conservative approximation, one that does not underestimate the MEC 
waveforms, as follows. If we assume that all combinations of the gate input excitations are 
possible, i.e., the gate inputs are independent, then the worst case current in that case will 
be an upper-bound on the gate current for the case when the inputs are dependent. In other 
words, the worst case current over all combinations of inputs is certainly an upper bound on 
the worst case current over some.
5.3 Single Gate Simulation
Given the type of a Boolean gate and the independence assumption for the uncertainty
waveforms at its inputs, we now describe how the uncertainty waveform at the output of the gate
is calculated. This process is divided into the following two parts:
1. Calculation of the uncertainty set at the output of the gate at a time t.
2. Calculation of uncertainty intervals at the output of the gate.
5.3.1 Calculating Uncertainty Set:
One can calculate all possible excitations at the output of the gate at time t from the excitations
present at the inputs at time t - D, where D is the delay of the gate. Let us denote the
uncertainty set at the ith input of the gate at time t - D by $X_i$. Let us further suppose that
the gate has m inputs. Then the set of all possible input patterns that lead to an excitation at
the output of the gate at time t can be represented by
$\{(x_1, x_2, \ldots, x_m) \mid x_i \in X_i,\ \forall\, 1 \le i \le m\}$.
For each input pattern, the output of the gate can be easily determined from the Boolean
equation of the gate. Thus, by calculating the output of the gate corresponding to each input
pattern, the uncertainty set at the output of the gate at time t can be determined. This process
would require one to generate and evaluate $|X_1||X_2| \cdots |X_m|$ input patterns. This worst case
complexity can be greatly reduced by the following observations.
1. The above input pattern generation and evaluation process can be stopped when the 
uncertainty set at the output of the gate becomes equal to X. Obviously, trying out any 
more input patterns would not lead to any improvement in the uncertainty set at the 
output of the gate.
2. An input to a gate is completely ambiguous at time t if its uncertainty set at time t is X. 
If all the inputs to the gate are completely ambiguous at time t - D, then the output of
the gate is also completely ambiguous at time t.
3. All the gates in a circuit can be divided into the following two categories.
(a) Gates whose outputs depend not only on the excitations present on their inputs but
also on the total count of input lines, e.g., XOR and XNOR gates.
(b) Gates whose outputs do not depend on the count of the inputs. The outputs of such
gates depend only on the specific excitations present on the inputs, e.g., NAND and
NOR gates.
For all the gates which belong to the second category above, all the input lines which 
have the same uncertainty sets can be merged into a single line (only for the purpose 
of calculating uncertainty set!). This observation leads to a reduction in the number of 
input lines and thus the number and size of the input patterns.
All of these observations lead to tremendous savings in the calculation of uncertainty sets at 
the output of the gate and thus contribute to the speed of the algorithm.
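The enumeration with early termination (observation 1) can be sketched as follows. This is an illustrative Python sketch, not the report's C implementation: each excitation is encoded as a (value before, value after) pair and the output excitation is obtained by evaluating the gate's Boolean function on the "before" and "after" values separately. That encoding, and the names `output_uncertainty_set` and `nand`, are assumptions made for the example.

```python
from itertools import product

# Each excitation is modeled as a (value before, value after) pair.
# This encoding is an assumption made for the sketch.
EXCITATIONS = {"l": (0, 0), "h": (1, 1), "hl": (1, 0), "lh": (0, 1)}
NAMES = {pair: name for name, pair in EXCITATIONS.items()}
FULL_SET = frozenset(EXCITATIONS)


def output_uncertainty_set(gate, input_sets):
    """Uncertainty set at a gate output given the sets at its inputs.

    `gate` maps a tuple of 0/1 input values to a 0/1 output value.
    `input_sets` holds one set of excitation names per gate input,
    taken at time t - D.
    """
    result = set()
    # Independence assumption: every combination of input excitations is tried.
    for pattern in product(*input_sets):
        before = gate(tuple(EXCITATIONS[e][0] for e in pattern))
        after = gate(tuple(EXCITATIONS[e][1] for e in pattern))
        result.add(NAMES[(before, after)])
        if result == FULL_SET:
            break  # observation 1: no further pattern can add anything
    return result


def nand(values):
    """Boolean NAND of a tuple of 0/1 values."""
    return 0 if all(values) else 1


# One input held high, the other known to switch: the output must switch too.
switching = output_uncertainty_set(nand, [{"h"}, {"lh", "hl"}])
```

A controlling value removes all uncertainty: with one NAND input known to be low, the output set is `{"h"}` no matter how ambiguous the other input is.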
5.3.2 Calculating Uncertainty Intervals:
In iMax, signals are represented in the form of uncertainty intervals at the inputs of the
gate, so the output of the gate is also in the form of uncertainty intervals. An
interval at the output of the gate could begin or end at time t only if an interval begins or ends
at any of the inputs at time t - D. Between one time at which an input interval begins or ends
and the next such time, the sets of excitations that the inputs can assume do not
change and therefore no corresponding uncertainty interval can begin or end at the output
during that time shifted by D. Based upon these observations, the uncertainty intervals at the
output of the gate can be easily calculated.
An example illustrating how uncertainty waveforms at various circuit nodes are calculated
is shown in Fig. 5.
Input Description: i1, i2 ∈ {l, h, hl, lh} at time 0.
Uncertainty Intervals:
i1, i2: lh[0, 0], hl[0, 0], l[0, ∞), h[0, ∞)
n1: lh[1, 1], hl[1, 1], l[0, ∞), h[0, ∞)
o1: lh[2, 2][3, 3], hl[2, 2][3, 3], l[0, ∞), h[0, ∞)
if Max_No_Hops = 1 then
o1: lh[2, 3], hl[2, 3], l[0, ∞), h[0, ∞)
Key: Excitation[Interval Begin, Interval End]
Figure 5. An example of uncertainty waveforms calculation.
Figure 6. Current waveform due to an uncertainty interval.
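The observation in Section 5.3.2 that output intervals can only begin or end at input endpoints shifted by D suggests collecting those candidate times first. The Python sketch below does only that step; the data layout (one dict of interval lists per input), the function name `output_breakpoints`, and the two-input gate in the example are hypothetical and are not taken from the report.

```python
import math


def output_breakpoints(input_waveforms, delay):
    """Finite times at which an output uncertainty interval may begin or end.

    `input_waveforms` holds one dict per gate input, mapping an excitation
    name to its list of (begin, end) uncertainty intervals. An output
    interval can only begin or end at an input endpoint shifted by the
    gate delay, so those shifted endpoints are the only candidates.
    """
    points = set()
    for waveform in input_waveforms:
        for intervals in waveform.values():
            for begin, end in intervals:
                for endpoint in (begin, end):
                    if math.isfinite(endpoint):
                        points.add(endpoint + delay)
    return sorted(points)


INF = math.inf
# Hypothetical two-input gate: one input may switch at time 0, the other at time 1.
a = {"lh": [(0, 0)], "hl": [(0, 0)], "l": [(0, INF)], "h": [(0, INF)]}
b = {"lh": [(1, 1)], "hl": [(1, 1)], "l": [(0, INF)], "h": [(0, INF)]}
candidates = output_breakpoints([a, b], delay=2)
```

A full implementation would then evaluate the output uncertainty set at each candidate time and on each open segment between consecutive candidates, and assemble the intervals per excitation; that part is omitted here.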
5.4 Current Calculation
After the uncertainty waveform at the output of a gate is known, its current contribution is
calculated next. Since the output of the gate could switch at any time during an uncertainty
interval, a triangular pulse of current could be drawn at any time during that interval
(shifted backwards by the delay of the gate) from the P&G lines due to these transitions, as
shown in Fig. 6. Hence, by taking an envelope of all possible triangular current pulses,
we get the worst case current contribution of the gate for an uncertainty interval. At every
gate, there are two types of uncertainty intervals that result in some switching activity at the
output and therefore, there are two possible current waveforms, one due to the hl uncertainty
intervals, called hlCurrent, and the other due to the lh uncertainty intervals, called lhCurrent.
Since at any time, the output of the gate could switch either from high to low or from low
to high, by taking an envelope of the hlCurrent and lhCurrent waveforms, we get
the maximum current contribution of the gate. Once all the gate currents are calculated, the
current waveform at each contact point is calculated by combining the individual currents
of the gates that are tied to it.
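The envelope of all triangular pulses that may start anywhere in a time window is a trapezoid, which can be evaluated in closed form. The Python sketch below assumes a symmetric triangular pulse and takes the window of possible pulse start times (`first_start` to `last_start`, i.e. the uncertainty interval after shifting by the gate delay) as given; the pulse shape, the parameter names, and both function names are assumptions for illustration.

```python
def triangle(t, start, width, peak):
    """Symmetric triangular pulse of base `width` beginning at `start`."""
    half = width / 2.0
    x = t - start
    if x <= 0.0 or x >= width:
        return 0.0
    return peak * (x / half if x <= half else (width - x) / half)


def interval_envelope(t, first_start, last_start, width, peak):
    """Envelope at time t of all pulses starting in [first_start, last_start]."""
    half = width / 2.0
    if t <= first_start or t >= last_start + width:
        return 0.0
    if t < first_start + half:
        # Only the rising edge of the earliest pulse has reached this far.
        return peak * (t - first_start) / half
    if t <= last_start + half:
        # Some pulse in the window has its peak exactly at t.
        return peak
    # Only the falling edge of the latest pulse remains.
    return peak * (last_start + width - t) / half


def gate_envelope(t, hl_value, lh_value):
    """Worst case of the hlCurrent and lhCurrent values at one time point."""
    return max(hl_value, lh_value)


# Pulses of width 1 and peak 4 that may start anywhere between t = 2 and t = 3.
plateau = interval_envelope(3.0, 2.0, 3.0, 1.0, 4.0)
```

The closed form agrees with a brute-force maximum over many candidate start times, which is a convenient way to check the breakpoints of the trapezoid.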
5.5 Implementation Details
The above approach has been implemented in a program in C. In the program, the circuit is
first levelized so that the output of a gate at level j does not feed any other gate at a level less
than or equal to j. Any user-specified restrictions on certain inputs are then imposed, while
all other inputs are assumed to take all possible excitations from the set X. After this, the 
circuit is analyzed in a level by level fashion, starting from the lowest level, by propagating the 
uncertainty waveforms at the inputs of every gate to its output. As a result, we get a current 
waveform that is a point-wise upper bound on the MEC waveform at every contact point:
Theorem: The current waveform calculated at any contact point by the above (iMax)
algorithm, $I_{iMax}(t)$, is a point-wise upper bound on the corresponding MEC waveform, $I_{MEC}(t)$,
i.e., $I_{iMax}(t) \ge I_{MEC}(t)$.
Proof: We start with the set of all possible excitations at the primary inputs of the
circuit. Merging uncertainty intervals only produces bigger intervals that contain the old set
of intervals, the signal independence assumption is conservative, and the worst case
current is calculated for each uncertainty interval. The theorem follows. ■
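The levelization step described in this section can be sketched with a topological sort. The report's program is written in C; the Python sketch below only illustrates the idea, and the netlist layout (a dict from gate output name to its input names), the function name `levelize`, and the small example circuit are assumptions.

```python
from graphlib import TopologicalSorter


def levelize(gates, primary_inputs):
    """Assign a level to every node of a combinational netlist.

    `gates` maps each gate output name to the list of its input names.
    Primary inputs are at level 0 and every gate is one level above its
    highest-level input, so a gate never feeds a gate at its own level
    or below.
    """
    levels = {name: 0 for name in primary_inputs}
    # static_order() yields each node after all of its predecessors and
    # raises graphlib.CycleError if the netlist is not combinational.
    for node in TopologicalSorter(gates).static_order():
        if node in gates:
            levels[node] = 1 + max(levels[src] for src in gates[node])
    return levels


# Hypothetical two-gate circuit: n1 is driven by i1 and i2, o1 by i2 and n1.
netlist = {"n1": ["i1", "i2"], "o1": ["i2", "n1"]}
levels = levelize(netlist, primary_inputs=["i1", "i2"])
```

Processing gates in increasing level order then guarantees that the uncertainty waveforms at all inputs of a gate are known before the gate itself is simulated.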
5.6 Quality Assessment
In order to assess the quality of the solution at every contact point, we need to determine
how close the upper bound waveforms obtained from iMax are to the exact MEC waveforms.
One way of doing this would be to perform an exhaustive enumeration over all possible input
patterns and actually calculate the MEC waveforms at every contact point. However, this
would be very expensive and practically impossible for circuits with more than about 10 inputs
(Note: $4^{10}$ = 1,048,576). We, therefore, resort to the following random optimization approach.
We repeatedly apply different input patterns, randomly selected from the set of all possible 
patterns, to the circuit. Then we use a logic simulator to calculate the outputs of various 
gates. From these gate outputs, the P&G current waveforms at every contact point are easily 
calculated. At every contact point, by maintaining an upper-bound envelope of the current 
waveforms obtained for different input patterns, we basically get a lower-bound on the MEC 
waveform. Naturally, the more patterns are simulated the closer this waveform will get to the 
MEC waveform. Ideally, one would like to see the upper bound obtained from iMax come as 
close to this lower bound as possible. The program that implements this random optimization 
technique is called iLogSim (Current Logic Simulator).

Table 1. iMax and SA results for 9 small circuits.

Circuit          No. Gates   No. Inputs   iMax10      SA     Ratio
BCD Decoder          18           4        32.41    32.41    1.00
Comparator A         31          11        54.47    54.47    1.00
Comparator B         33          11        54.00    54.00    1.00
Decoder              16           6        26.75    26.75    1.00
P. Decoder A         29           9        44.11    44.11    1.00
P. Decoder B         31           9        49.35    49.35    1.00
Full Adder           36           9        58.00    55.44    1.05
Parity               46           9        47.57    47.57    1.00
Alu (SN74181)        63          14        78.89    70.87    1.11

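
The random optimization approach described above can be illustrated with a short sketch. It is not the iLogSim program: the function `toy_simulate` is a hypothetical stand-in for logic simulation followed by current calculation, and all names are assumptions made for this example. The only point illustrated is the running point-wise envelope, which is a lower bound on the MEC waveform.

```python
import random

EXCITATIONS = ("l", "h", "hl", "lh")  # low, high, high-to-low, low-to-high


def toy_simulate(pattern):
    """Hypothetical current waveform for one input pattern.

    Toy model: input i causes a 2-unit current sample at time step i
    when it makes a transition, and no current otherwise.
    """
    return [2.0 if e in ("hl", "lh") else 0.0 for e in pattern]


def random_lower_bound(n_inputs, simulate, n_patterns, seed=0):
    """Point-wise envelope of the waveforms of randomly chosen patterns."""
    rng = random.Random(seed)
    env = None
    for _ in range(n_patterns):
        pattern = tuple(rng.choice(EXCITATIONS) for _ in range(n_inputs))
        wave = simulate(pattern)
        # Keep, at every time point, the largest current seen so far.
        env = wave if env is None else [max(a, b) for a, b in zip(env, wave)]
    return env


lower_bound = random_lower_bound(5, toy_simulate, 200)
```

Because every simulated pattern is a valid input pattern, the envelope can never exceed the MEC waveform, and it can only grow as more patterns are tried.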
In our experiments, iLogSim calculates the lower bound by trying out several thousand 
randomly generated input patterns. However, such a technique does not make any use of the 
current waveform of a currently applied input pattern in generating the next (hopefully better) 
input pattern. By making use of this information, we can generate better input patterns and 
thus avoid wasting time in randomly enumerating the input space. We have experimented 
with one such iterative optimization scheme, namely the simulated annealing (SA) algorithm 
[11, 12]. However, the simulated annealing algorithm needs an objective function to indicate how 
well the search is progressing. We have supplied the peak of the total current waveform as the 
objective function to the algorithm. The total current waveform is the sum of the current waveforms 
at individual contact points. (Also see the definition of objective function in section 8). In our 
experimental results, we compare the peak of the total current waveform obtained from iMax 
to that obtained from SA.
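
A minimal sketch of such an annealing search is given below. The report does not state the move set, cooling schedule or acceptance rule, so the ones used here (change one input's excitation per move, geometric cooling, Metropolis acceptance) are assumptions, as are all names and parameter values; `toy_peak` is a hypothetical objective standing in for the peak of the total current waveform.

```python
import math
import random

EXCITATIONS = ("l", "h", "hl", "lh")


def toy_peak(pattern):
    """Hypothetical objective: 2 units of current per switching input."""
    return 2.0 * sum(e in ("hl", "lh") for e in pattern)


def anneal_peak(n_inputs, peak, steps=2000, t0=5.0, alpha=0.995, seed=0):
    """Search for an input pattern with a large peak current.

    Returns the best objective value found and the pattern producing it.
    The value is a lower bound on the true maximum of the objective.
    """
    rng = random.Random(seed)
    pattern = [rng.choice(EXCITATIONS) for _ in range(n_inputs)]
    cur = peak(pattern)
    best, best_pattern = cur, list(pattern)
    temp = t0
    for _ in range(steps):
        i = rng.randrange(n_inputs)
        old = pattern[i]
        pattern[i] = rng.choice([e for e in EXCITATIONS if e != old])
        new = peak(pattern)
        # Always accept improvements; accept a worse pattern with a
        # probability that shrinks as the temperature drops.
        if new >= cur or rng.random() < math.exp((new - cur) / temp):
            cur = new
            if cur > best:
                best, best_pattern = cur, list(pattern)
        else:
            pattern[i] = old  # reject the move
        temp = max(temp * alpha, 1e-9)
    return best, best_pattern


sa_best, sa_pattern = anneal_peak(6, toy_peak)
```

Unlike purely random sampling, each new pattern here is derived from the current one, which is the property the text above asks for.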
5.7 Experimental Results
In all the circuit examples considered below, two assumptions are made. First, a fixed number 
is assigned to each gate as its delay value. This delay value is different for different gates. 
Second, the peak of the transition current for every gate for both lh and hl transitions is taken 
to be 2 units of current. The results of simulated annealing were collected after trying about 
100,000 input patterns for each circuit.
Table 1 lists the results of running the iMax and SA algorithms on nine small CMOS circuits. 
These circuits have between 16 and 63 gates and between 4 and 14 inputs. Under columns 
iMax10 and SA, we report the peak values of the respective current waveforms; their ratio 
is shown in the last column. The number (10) next to iMax in the table indicates the value 
of the Max_No_Hops parameter. The ratio shown in the table represents an upper bound on the 
worst case error, as it is the ratio of an upper bound to a lower bound. A value close to one 
indicates that the iMax upper bound is close to the SA lower bound. From the table, we observe 
that for most of the circuits, the results of iMax are in perfect agreement with the SA results.

Table 2. iMax and SA results for 10 ISCAS-85 circuits.

           No.     No.         Peak Currents             CPU Times
Circuit   Gates   Inputs   iMax10      SA     Ratio   iMax10   SA (10k)
c432       160      36      181.9    162.0    1.12     1.2s     9m 40s
c499       202      41      247.9    186.6    1.33     1.2s    10m 46s
c880       383      60      418.5    321.4    1.30     3.0s    26m 32s
c1355      546      41      633.8    415.8    1.52     4.6s    36m 18s
c1908      880      33      732.1    445.6    1.64     7.6s     1h 34m
c2670     1193     233     1169.2    866.1    1.35     7.9s     2h 14m
c3540     1669      50     1740.4    866.1    2.01    12.7s     3h 39m
c5315     2307     178     2312.6   1558.9    1.48    16.0s     4h 40m
c6288     2406      32     4096.2   3193.2    1.28    37.8s    61h 58m
c7552     3512     207     4339.7   2768.6    1.57    28.4s     7h 28m

In Table 2, we report similar results on the ten ISCAS-85 benchmark circuits [13]. These 
circuits have number of gates ranging from 160 to 3512 and all the circuits have at least 32 
inputs. In the table, in the last two columns, we document the cpu times needed by the iMax 
algorithm and the typical times needed for trying 10,000 input patterns by the SA algorithm on 
a Sun SPARCstation ELC. We observe that for all the circuits, the linear time iMax algorithm 
took only a few seconds of cpu time compared to several hours of time needed by the SA 
algorithm. Furthermore, for most of these circuits, the ratio of the upper bound obtained from 
iMax to the lower bound obtained from SA is less than 1.57. There are two possible reasons for 
this mismatch. Firstly, it is quite possible that the lower bound obtained from SA is not very 
close to the MEC waveform. Since all the circuits have at least 32 inputs, the space of possible 
input patterns is huge, and the lower bound waveform obtained after trying about 100,000 
input patterns may not be very close to the MEC. For smaller circuits (Table 1) where the input 
space is not so huge, 100,000 input patterns were enough to get a lower bound estimate of 
MEC waveform that is quite close to the actual waveform, as is reflected by the results. The 
second possible source of mismatch is our conservative independence assumption for the signals 
at the gate inputs. One can improve on this assumption by attempting to resolve the signal 
correlations, as discussed in the following sections.

Table 3. iMax results vs. Max_No_Hops.

                            iMax: Max_No_Hops
Circuit         1                 5                 10                 ∞
c432      236.3 ( 0.4)     185.1 ( 0.9)     181.9 ( 1.2)     181.7 (   2.1)
c499      292.1 ( 0.4)     247.9 ( 0.8)     247.9 ( 1.2)     247.9 (   1.3)
c880      550.6 ( 0.8)     424.1 ( 2.0)     418.5 ( 3.0)     415.1 (  11.8)
c1355     749.9 ( 1.3)     646.8 ( 2.7)     633.8 ( 4.6)     633.8 (  19.5)
c1908     908.3 ( 2.0)     740.5 ( 5.0)     732.1 ( 7.6)     724.9 (  86.3)
c2670    1476.0 ( 2.3)    1188.6 ( 5.4)    1169.2 ( 7.9)    1166.4 (  21.7)
c3540    2088.2 ( 3.7)    1752.4 ( 8.2)    1740.4 (12.7)    1730.3 ( 170.0)
c5315    2632.0 ( 5.1)    2336.5 (11.2)    2312.6 (16.0)    2309.0 ( 109.5)
c6288    4134.8 (10.4)    4098.4 (20.7)    4096.2 (37.8)    4096.1 (7086.0)
c7552    5163.1 ( 8.0)    4401.6 (20.0)    4339.7 (28.4)    4325.8 ( 177.8)

We next discuss the effect of varying the Max_No_Hops parameter on the performance of 
iMax. Table 3 lists the peak values of the upper bound waveforms for ISCAS-85 circuits for 
different values of Max_No_Hops. In parentheses, we also tabulate the cpu times (in sec.) needed 
by the algorithm. As the value of Max_No_Hops increases, the number of intervals being merged 
at every gate decreases and therefore, the cpu time needed by the algorithm increases. This 
also improves the peak value of the current waveform, as shown in the table. However, as the 
Max_No_Hops parameter increases further, the cpu time continues to increase while the improvement 
in peak value is not significant beyond Max_No_Hops = 10. Similar behavior is observed for the 
entire current waveform and not just for the peak value. In Fig. 7, we plot the upper bound 
waveforms for c1908 for three different values of Max_No_Hops. The difference between the 
upper bound waveforms for iMax10 and iMax∞ is almost negligible. Similar plots are obtained 
for all other circuits. Therefore, we conclude that a value between 5 and 10 seems to be a good 
choice for the Max_No_Hops parameter.
6 The Signal Correlation Problem
In general, signals at internal nodes of a circuit are correlated. This limits the number of 
transitions that can possibly occur at the outputs of the gates, an effect that is ignored by the 
iMax algorithm. Two examples of how signal correlation limits the number of transitions are 
illustrated in Fig. 8.
Figure 7. iMax current waveforms for different values of Max_No_Hops parameter (plot title: c1908 Current Waveforms).

Figure 8. Two examples to illustrate the signal correlation problem. (a) At most one of the two 
gates can switch. (b) Correlated signals at the inputs of the NAND gate block the transition.

In Fig. 8(a), signal lines x1 and x2 are correlated (in this case, they carry the same signal).
Depending upon the specific excitation present at x, only one of the two gates can switch 
at a time. However, since iMax ignores the signal correlation present between x1 and x2, it 
calculates the uncertainty sets at the outputs of the two gates as shown in the figure and thus 
erroneously concludes that both gates may switch at the same time, and therefore adds two 
triangular current pulses due to both gates switching simultaneously to the overall current 
waveform. It is this kind of approximation that contributes to a loose iMax upper bound. 
Similarly, in Fig. 8(b), the output of the inverter is correlated with its input and therefore, 
the NAND gate may never switch. However, ignoring this correlation, iMax concludes that the 
NAND gate can switch.
As is clear from these examples, the source of the signal correlation problem, in general, is 
a gate (or input) whose output fans out to several other gates. Such gates are called multiple 
fan-out (MFO) gates. The general situation is shown in Fig. 9, where an MFO gate G with 
output node n fans out to nodes $n_1, n_2, \ldots, n_k$ that in turn feed gates $G_1, G_2, \ldots, G_k$. In this 
figure, inputs to the gates $G_1, G_2, \ldots, G_k$ (which are $n_1, n_2, \ldots, n_k$, respectively) are correlated. 
Due to this correlation, even though the output of each gate ($G_1, G_2, \ldots, G_k$) can assume all 
possible excitations as calculated by iMax, they may not simultaneously carry their worst case 
excitations. As one goes deeper into the circuit, where these correlated outputs reconverge and 
feed the same gate, the inputs of that gate become correlated (e.g., NAND gate in Fig. 8(b)). 
Such gates are called reconvergent fan-out (RFO) gates. With correlated signals at the inputs 
of a gate, the number of transitions that can possibly occur at its output is reduced. The 
signal correlations considered above, which exist among various nodes throughout the circuit, 
are called spatial correlations.

Figure 9. An example of a Multiple Fan-out Gate.

Besides the spatial correlations, there is another set of correlations that the iMax algorithm 
completely ignores. The excitation assumed by a node at time t restricts the set of possible 
excitations that the node can assume at an earlier or a later time. For example, if a node is 
low at time t, then it can either stay at low or switch from high to low at time $t^-$ and it can 
either stay at low or switch from low to high at time $t^+$. These correlations, which exist in the 
time domain, are called temporal correlations.
The iMax algorithm completely ignores all spatial and temporal signal correlations and, 
therefore, overestimates the supply currents. The advantage of ignoring correlations in the 
algorithm is its very desirable linear time performance.
Table 4. Number of MFO gates/inputs in ISCAS-85 circuits.

Circuit   No. Inputs   No. MFO
c432          36         153
c499          41         170
c880          60         357
c1355         41         514
c1908         33         855
c2670        233        1129
c3540         50        1647
c5315        178        2184
c6288         32        2384
c7552        207        3405

7 Resolving Signal Correlations
The upper bound produced by the iMax algorithm can be made exact by doing a brute-force 
enumeration at the inputs of the circuit and calculating an envelope of the current waveforms 
produced. In enumeration, since unambiguous input patterns are applied to the circuit, there 
is no “uncertainty” present at the inputs and therefore, signal correlations do not become an 
issue. In a similar fashion, one can improve the results of the iMax algorithm by doing a partial 
enumeration at a few selected nodes in the circuit.
An example of how partial enumeration helps improve the upper bound can be seen from 
Fig. 8(a). In this circuit with no enumeration, iMax would assume that the signal lines x1 
and x2 are mutually independent and therefore infer that both NAND and NOR gates can 
switch at the same time. However, if we do partial enumeration at signal line x, then we would 
generate four cases corresponding to when x = l, x = h, x = hl and x = lh. When x = l or 
hl, only the NOR gate switches. Similarly, when x = h or lh, only the NAND gate switches. 
Thus, by splitting the problem into four sub-problems, we have improved our result, i.e., found 
that only one of the two gates may switch at any given time.
While enumerating a node, we only need to process a small subset of the gates. We define 
the COne of INfluence, COIN(n), of a node n as the set of all the gates that can possibly be 
affected by a change in excitation at the node. Thus, a gate is in the COIN(n) of a node n if 
it is either directly fed by n or is connected to the output of a gate that is in COIN(n). While 
enumerating a node, we only need to consider those gates that are in its COIN.
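
The recursive definition of COIN(n) amounts to a reachability computation over the fan-out graph. The sketch below is one way to express it; the dictionary representation (each node or gate name mapped to the gates it feeds, with a gate's name also standing for its output node) and the function name `coin` are assumptions made for this illustration.

```python
from collections import deque


def coin(node, fanout):
    """Return the set of gates in the cone of influence of `node`.

    fanout maps a node (primary input or gate output, named after the
    gate) to the list of gates that it feeds directly.
    """
    seen = set()
    queue = deque(fanout.get(node, []))
    while queue:
        gate = queue.popleft()
        if gate in seen:
            continue
        seen.add(gate)
        # Everything fed by a gate already in the cone is also in the cone.
        queue.extend(fanout.get(gate, []))
    return seen


# Hypothetical circuit: x feeds g1 and g2, y feeds g2, both reconverge at g3.
example_fanout = {"x": ["g1", "g2"], "y": ["g2"], "g1": ["g3"], "g2": ["g3"], "g3": []}
```

In this example `coin("x", example_fanout)` contains all three gates, while `coin("y", example_fanout)` contains only g2 and g3, so enumerating y requires reprocessing fewer gates than enumerating x.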
One technique to partially enumerate the internal nodes of a circuit, called Multi-Cone 
Analysis (MCA), was reported in [14]. The motivation behind such an approach was to be able 
to enumerate at the outputs of the MFO gates, which are the sources of the signal correlation 
problem. However, from the results in [14] and Tables 6, 7 in this paper, it can be seen that 
the MCA approach offers only a modest improvement in the upper bound. There are several 
reasons for this.
As shown in Table 4, there are usually several MFO gates/inputs in a circuit and all of 
these nodes should be enumerated to properly resolve the signal correlation problem. From
our experience with ISCAS-85 benchmark circuits, we have found that the COINs of several of 
these nodes overlap and therefore, to properly handle signal correlations, these nodes should 
be enumerated simultaneously. Furthermore, because of the presence of glitches in a circuit, 
signals at internal nodes span several time points (i.e., signal transitions occur at several time 
points). To take care of the temporal correlation problem, the node should be enumerated 
at each of these time points. Simultaneous enumeration is an extremely expensive process 
especially when there are several nodes and each node needs to be enumerated at several time 
points. For example, to enumerate two nodes simultaneously, the cpu times needed to enumerate 
each node individually get multiplied together. To avoid this multiplicative growth of cpu time, we made several 
simplifying assumptions in the implementation of the MCA algorithm [14]. Because of these 
simplifications, the algorithm led to only mild improvement in iMax results.
There are usually several RFO gates in a circuit. If at the output of a RFO gate, a false 
transition is predicted by iMax (such as in Fig. 8(b)), then as this transition propagates 
through the circuit to the output nodes, it causes several other false transitions in the circuit 
along its way. Therefore, locating and suppressing such transitions in iMax is important. 
However, these false transitions can be isolated only by doing a simultaneous enumeration at 
the source MFO gate(s) that created them. In effect, one needs to construct the supergate [15] 
for each RFO node in the circuit and for each supergate, do a simultaneous enumeration at 
its MFO inputs. However, these supergates can be as big as the entire circuit and therefore 
enumerating their inputs becomes intractable. We have implemented an approximate algorithm 
based on enumerating primary stem regions [16, 17]. However, our results show only modest 
improvement in the upper bound waveforms.
From our experience with enumerating internal nodes of a circuit, as explained above, 
we infer that improving iMax upper bound waveforms by enumerating internal nodes is very 
expensive and does not offer a practical solution for VLSI circuits. In the next section, we 
present an alternative partial input enumeration approach that significantly improves the iMax 
results and represents a good speed-accuracy trade-off.
8 Partial Input Enumeration (PIE)
First, as shown in Table 4, there are usually many more MFO nodes than primary inputs in a circuit. 
Second, as stated in section 3, all the inputs to a circuit switch at most once at time zero. 
Therefore, there is only one time point at which a primary input needs to be enumerated. This 
is in contrast to an internal circuit node, which usually needs to be enumerated at several time 
points. These observations, combined with the fact that iMax is an extremely fast algorithm, 
led us to explore the following partial input enumeration (PIE) algorithm to improve the iMax 
upper bound at every contact point.
8.1 The algorithm
Let $x_1, x_2, \ldots, x_N$ be the $N$ primary inputs of a circuit under consideration. Let $X_i$ represent 
the uncertainty set for input $x_i$ at time zero. The input search space for the circuit consists of 
the set of all valid input patterns that can be applied to the circuit. Mathematically, the input 
search space is $\{(e_1, e_2, \ldots, e_N) \mid e_1 \in X_1, e_2 \in X_2, \ldots, e_N \in X_N\}$. For brevity, we denote this 
by $(X_1, X_2, \ldots, X_N)$. Suppose, for the purposes of this illustration, that for a particular input $x_i$, 
$X_i = X$. Then the input search space $(X_1, X_2, \ldots, X_N)$ for the circuit can be divided into four 
disjoint parts, namely $(X_1, X_2, \ldots, \{l\}, \ldots, X_N)$, $(X_1, X_2, \ldots, \{h\}, \ldots, X_N)$, $(X_1, X_2, \ldots, \{hl\}, \ldots, X_N)$ 
and $(X_1, X_2, \ldots, \{lh\}, \ldots, X_N)$. We can compute the maximum current waveforms (at every 
contact point) for each of these four parts by running the iMax algorithm and in each case, 
restricting the excitation on input $x_i$ to the value in its respective uncertainty subset. Since 
the four parts combined together constitute the complete search space, by taking an upper 
bound envelope of the four current waveforms at every contact point, we can still guarantee an 
upper bound on the respective MEC waveforms. Since, in each of the four runs of iMax, specific 
excitations are present at input $x_i$, signal correlations due to $x_i$ disappear and the resulting 
current waveform should be an improvement on the original upper bound. In a similar fashion, 
the upper bounds for the individual subcases can be improved.
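
The partition argument above can be checked on a small example. The sketch below splits a search space on one input and verifies that the parts are disjoint and together cover the original space; it also shows the point-wise envelope used to combine the per-part waveforms. The tuple-of-sets representation and the names `split_space`, `patterns` and `envelope` are assumptions made for this illustration, and no iMax run is performed.

```python
from itertools import product

X = frozenset({"l", "h", "hl", "lh"})  # full uncertainty set


def split_space(space, i):
    """Split a search space into one part per excitation of input i."""
    return [space[:i] + (frozenset({e}),) + space[i + 1:] for e in sorted(space[i])]


def patterns(space):
    """All input patterns contained in a search space."""
    return set(product(*space))


def envelope(waveforms):
    """Point-wise maximum of several equally sampled waveforms."""
    return [max(samples) for samples in zip(*waveforms)]


space = (X, X, X)               # three inputs, all fully uncertain
parts = split_space(space, 1)   # enumerate the second input
covered = set().union(*(patterns(p) for p in parts))
```

Since every input pattern lies in exactly one part, an upper bound computed for each part, combined through `envelope`, remains an upper bound for the whole space.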
The set of inputs selected for enumeration has a direct influence on the quality as well as 
the cost of the solution obtained. If all the inputs are selected and enumerated then the upper 
bound obtained at every contact point would be exact. However, doing this is practically 
impossible for most circuits. The extent to which an input contributes to signal correlations 
inside a circuit is different for different inputs. For example, in Fig. 8(a), enumerating input 
x is more beneficial than enumerating any of the other two inputs. Hence, by selecting and 
enumerating inputs in an intelligent fashion, we can significantly improve the iMax upper 
bounds, without spending too much cpu time.
We have developed an intelligent best first search (BFS) algorithm [18] that is very effective 
in selecting and enumerating inputs and thereby improving the upper bound at every contact 
point. An example of this search process is given in Fig. 10 in which the second input is 
enumerated first. The search proceeds along a conceptual search tree in which each node 
corresponds to a partial assignment to the primary inputs, i.e., at each node, some inputs have 
specified excitations (e.g. say low), while others have uncertain (or unspecified) excitations. We 
will refer to these nodes as “s_nodes” (search nodes). The process of enumerating a primary 
input at a s_node translates, in the search tree domain, to the so-called expansion of the s_node, 
as shown in the figure. An objective function is associated with the search, which specifies the 
quantity that is being minimized. This function will be explained later in this section. We also 
associate an upper bound (UB) and a lower bound (LB) with the search. UB is the highest value 
of the objective function, corresponding to any s_node generated during the search. LB is the 
value of the objective function corresponding to a specific input pattern. During the search, 
s_nodes which correspond to the best objective value are repeatedly expanded. Because of 
this best first strategy, there is a gradual reduction in the value (at any time) of the upper 
bound waveform at every contact point in the circuit. This iterative improvement is a very 
important feature of the algorithm for large circuits where an exhaustive exploration of the 
input space is practically impossible. The BFS algorithm can be stopped at any intermediate 
stage and the current best upper bounds at various contact points can still be reported.

Figure 10. An example search tree for the BFS algorithm.

The BFS algorithm starts with the initial uncertain state, i.e., s_node = $(X_1, X_2, \ldots, X_N)$, 
and a known LB, which is the objective value for some input pattern. During the search, a 
s_node with the highest objective value is repeatedly selected and its descendant s_nodes are 
generated by enumerating an input, as explained in the following outline:

Remark: List is an ordered list of s_nodes, arranged in decreasing objective values.

1. List ← starting s_node (initial uncertain state).
   UB ← objective value of the starting s_node.
   LB ← objective value for a specific input pattern (otherwise 0.0).
2. While Stopping Criterion is not satisfied, do
   2.1. Remove the top s_node from the List.
   2.2. Calculate next input number to enumerate from the Splitting Criterion.
   2.3. Generate all (≤ 4) children s_nodes by enumerating the input and calculate
        their objective values.
   2.4. If these children are leaf s_nodes, then update the LB,
        else, insert them in List, after pruning if any.
   2.5. UB ← objective value of the top s_node in List.
3. Report the best upper bound current waveform at every contact point. STOP.

where a leaf s_node is one which has exactly one excitation associated with each of its 
uncertainty sets (i.e., it corresponds to an input pattern). The following functions are used in 
the outline above.
Objective Function: This function specifies the quantity that is being minimized during 
the search. There are usually several contact points within a combinational block and we would 
like to minimize the (iMax) upper bound waveform at each of these points, so that they are 
close to their respective MEC waveforms. One possibility is to minimize the peak of a weighted 
sum of the upper bound waveforms, where these weights are determined depending upon how 
much ‘influence’ the contact point has on the overall voltage drops. We are currently working 
on this problem. In the experiments reported in this paper, we have assumed these weights to 
be unity and we minimize the peak value of the sum of all the upper bound waveforms. This 
corresponds to minimizing the worst case total current of the combinational block.
Stopping Criterion: We stop the search if any of the following two conditions is satisfied.
a. Best UB ≤ LB × ETF.
b. Number of s_nodes generated > User specified parameter (Max_No_Nodes).
The Error Tolerance Factor (ETF) is a user-specified parameter that provides control over 
the final desired accuracy of the algorithm. The value of this parameter is always bigger than 1. 
The first condition above specifies that when the UB value is within ETF factor of some known 
LB, then the search can be terminated. In large circuits where calculating an exact solution 
by running the search to completion is extremely expensive, and an overestimation by 20% to 
30% may be acceptable, such a parameter can be extremely useful. The second condition puts 
a hard limit (Max_No_Nodes) on the number of s_nodes that are to be generated in the search.
Pruning Criterion: During the search, if we come across a s_node for which the UB 
satisfies the following condition:
UB ≤ LB × ETF,
then, such a s_node can be deleted from the search as its UB value is already acceptable. This 
pruning criterion deletes several unnecessary s_nodes during the search and thus keeps the 
memory usage down.
Splitting Criterion (SC): This criterion specifies the input which should be enumerated 
next from any s_node during the search. In the next subsection, we describe two heuristic 
functions for doing this.
The BFS algorithm always processes s_nodes which are on its current wavefront, see Fig. 11. 
At the start, this wavefront consists of only one s_node, namely the initial uncertain state. As 
the search progresses, this wavefront moves forward through the input search space. An input 
pattern leading to the maximum objective value could belong to any of the s_nodes on the 
wavefront. Therefore, when the algorithm terminates, at every contact point, we compute an 
envelope of the current waveforms of all the s_nodes on the wavefront (at every contact point) 
and report that as the upper bound waveforms.

Figure 11. Exploring s_nodes through the input search space.

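
The search outlined in this section can be sketched compactly with a priority queue. The sketch below is an illustration under stated assumptions, not the authors' implementation: it tracks a single scalar objective instead of waveforms, `toy_objective` is a hypothetical stand-in for the peak of the iMax bound (modelled on the two-gate situation of Fig. 8(a)), and the splitting criterion simply picks the first uncertain input. All function and parameter names are invented for this example.

```python
import heapq
from itertools import count

X = frozenset({"l", "h", "hl", "lh"})


def toy_objective(s_node):
    """Hypothetical bound: 2 units per gate that may switch.

    One gate may switch only if input 0 can be h or lh, the other only
    if input 0 can be l or hl (compare Fig. 8(a)).
    """
    x0 = s_node[0]
    return 2.0 * (bool(x0 & {"h", "lh"}) + bool(x0 & {"l", "hl"}))


def first_uncertain(s_node):
    """Trivial splitting criterion: first input with more than one excitation."""
    return next(i for i, s in enumerate(s_node) if len(s) > 1)


def is_leaf(s_node):
    return all(len(s) == 1 for s in s_node)


def pie_search(root, objective, pick_input, etf=1.0, max_nodes=1000, lb=0.0):
    """Best-first partial input enumeration; returns (UB, LB, s_nodes generated)."""
    root_value = objective(root)
    if is_leaf(root):
        return root_value, max(lb, root_value), 1
    tie = count()  # tie-breaker so the heap never compares s_nodes
    heap = [(-root_value, next(tie), root)]
    generated = 1
    closed = lb  # largest objective among leaves and pruned s_nodes
    while heap:
        ub = -heap[0][0]
        if ub <= lb * etf or generated >= max_nodes:  # stopping criterion
            break
        _, _, node = heapq.heappop(heap)
        i = pick_input(node)
        for e in sorted(node[i]):
            child = node[:i] + (frozenset({e}),) + node[i + 1:]
            value = objective(child)
            generated += 1
            if is_leaf(child):
                lb = max(lb, value)
                closed = max(closed, value)
            elif value <= lb * etf:
                closed = max(closed, value)  # pruned: already acceptable
            else:
                heapq.heappush(heap, (-value, next(tie), child))
    # Final bound: envelope over the remaining wavefront and closed s_nodes.
    ub = max([closed] + [-item[0] for item in heap[:1]])
    return ub, lb, generated


ub, lb, n_generated = pie_search((X, X), toy_objective, first_uncertain)
```

On this toy problem the initial bound of 4 units drops to 2 units after the first expansion, and the search stops as soon as one leaf confirms a lower bound of 2 units, without visiting all 16 input patterns.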
8.2 Heuristic SC and Experimental Results
We now describe two heuristics for the splitting criterion that have shown good results in 
practice. The first heuristic selects an input which has the highest sensitivity while the second 
one selects an input based upon the influence it has inside the circuit.
8.2.1 H1 Heuristic
Let us suppose that during the search, we are at a particular s_node n and we select an input 
$x_i$ for enumeration. If we assume that the uncertainty set for $x_i$ at time zero is X, then by 
enumerating $x_i$, we would generate four children s_nodes, as shown in Fig. 12. We assume 
that the objective value of s_node n is denoted by $\mathrm{obj}_n$ and the objective values of the children 
s_nodes are denoted by $\mathrm{obj}_l$, $\mathrm{obj}_h$, $\mathrm{obj}_{hl}$ and $\mathrm{obj}_{lh}$. If we denote

$$\Delta \mathrm{obj}_i = \mathrm{obj}_n - \max\{\mathrm{obj}_l, \mathrm{obj}_h, \mathrm{obj}_{hl}, \mathrm{obj}_{lh}\},$$

then by enumerating $x_i$ we can improve the objective value of s_node n by an amount $\Delta \mathrm{obj}_i$. 
We can repeat this calculation for every input and then select an input which gives rise to the 
maximum improvement in the objective value.
However, if $\Delta \mathrm{obj}_i$ is zero for all the inputs, which happens very often in practice, then the 
above selection process would not work well. For a specific input $x_i$, $\Delta \mathrm{obj}_i = 0$ means that 
the objective value of one of its children s_nodes is equal to $\mathrm{obj}_n$. However, for the remaining 
children s_nodes, the objective value may not be equal to $\mathrm{obj}_n$ and this information can be 
used in assigning credit (or relative importance) to $x_i$. Based on these observations, we have 
come up with the following H1 heuristic function corresponding to an input $x_i$:

$$H_1 = A \times (\mathrm{obj}_n - \mathrm{obj}_1) + B \times (\mathrm{obj}_n - \mathrm{obj}_2) + C \times (\mathrm{obj}_n - \mathrm{obj}_3) + (\mathrm{obj}_n - \mathrm{obj}_4)$$

where $\mathrm{obj}_1$, $\mathrm{obj}_2$, $\mathrm{obj}_3$ and $\mathrm{obj}_4$ are the objective values of the children s_nodes, generated by 
enumerating $x_i$, arranged in decreasing order and A, B and C are three constants such that 
A > B > C > 1. At any s_node during the search, we compute the heuristic values for 
every input and select an input with the maximum associated heuristic value. This splitting 
criterion is called dynamic (H1) splitting criterion because at each s_node, it calculates the 
heuristic value for every input and then selects the best input for enumeration.
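
As a small worked illustration of the heuristic, the sketch below evaluates both the simple improvement measure and the weighted H1 value for one input. The constants A = 8, B = 4, C = 2 are arbitrary values chosen only to satisfy A > B > C > 1; the report does not give the values actually used, and the function names are assumptions.

```python
def delta_obj(obj_n, children):
    """Improvement in the objective obtained by enumerating the input."""
    return obj_n - max(children)


def h1(obj_n, children, a=8.0, b=4.0, c=2.0):
    """Weighted H1 heuristic for one input.

    children holds the objective values of the (assumed four) children
    s_nodes in any order; they are sorted in decreasing order first, so
    the largest child gets the largest weight.
    """
    weights = (a, b, c, 1.0)
    ordered = sorted(children, reverse=True)
    return sum(w * (obj_n - o) for w, o in zip(weights, ordered))


# One child keeps the parent's objective value, so delta_obj is zero,
# but H1 still credits the input for the three improved children.
example_delta = delta_obj(10.0, [4.0, 10.0, 6.0, 8.0])
example_h1 = h1(10.0, [4.0, 10.0, 6.0, 8.0])
```

In this example the simple measure is 0, while H1 evaluates to 8×0 + 4×2 + 2×4 + 1×6 = 22, so inputs that would tie at zero under the simple measure can still be ranked.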
The results of partial input enumeration using the BFS algorithm and using the dynamic 
(H1) splitting criterion for the nine small circuits are documented in Table 5. For all the 
circuits, the algorithm was run to completion, i.e., till the UB became equal to the LB (ETF = 
1). The results clearly show that the PIE algorithm is very efficient in scanning the input space. 
As an example, the last circuit in the table (Alu) has 14 inputs and therefore, the number of 
possible input patterns for this circuit is $4^{14} = 268,435,456$. The PIE algorithm was able to 
scan the entire search space after generating just 233 s_nodes. This also shows that the upper 
bound produced by the iMax algorithm is very tight for these circuits.
As can be seen from Table 5, the number of iMax runs needed in the dynamic splitting 
criterion far exceeds the number of s_nodes generated. At a s_node, to calculate the H1 value 
for a particular input $x_i$, we need to run the iMax algorithm $|X_i|$ times. If a s_node has k inputs 
which are possible candidates for enumeration (i.e., their $|X_i| > 1$), then we need to run the 
iMax algorithm $\sum_{i=1}^{k} |X_i|$ times to find the best input to enumerate next. For bigger 
circuits, with a large number of inputs, this time will be even more dominant, rendering the PIE 
algorithm prohibitively expensive. Therefore, we have experimented with other less expensive 
alternatives.

Table 5. Results of PIE for 9 small circuits.

                  Dynamic (H1) Splitting Criterion     Static (H1) Splitting Criterion
                  No. s_nodes   iMax runs              No. s_nodes   iMax runs
Circuit           generated     in SC        Time      generated     in SC        Time
BCD Decoder            17           40        1.9s          17           17        1.2s
Comparator A          109          664       87.7s         277           45       35.1s
Comparator B           45          264       30.6s          45           45        9.7s
Decoder                25           84        3.6s          25           25        1.8s
P. Decoder A           37          180       11.4s          37           37        3.9s
P. Decoder B           37          180       11.7s          37           37        4.2s
Full Adder             53          296       29.2s          53           37        7.7s
Parity                 37          180       18.7s          37           37        6.4s
Alu (SN74181)         233         1952      378.2s        2525           57      352.7s

Instead of calculating the heuristic function value (H1) for every input at every s_node 
during the search, we calculate the heuristic value for every input at the beginning of the 
search. All the inputs are arranged in the decreasing order of their heuristic values. During 
the search, at every s_node, inputs are selected in this fixed order. This criterion is called 
static (H1) splitting criterion. The amount of time spent in the static splitting criterion is 
fixed and is equal to $\sum_{i=1}^{N} |X_i|$ runs of the iMax algorithm for a circuit with $N$ inputs. The 
results of the PIE algorithm using the static (H1) splitting criterion are also summarized in 
Table 5. With static splitting criterion, the number of runs of the iMax algorithm needed in 
the splitting criterion goes down, but the number of s_nodes generated during the search goes 
up for some circuits. However, for all the circuits, we observe an overall reduction in the cpu 
times needed for the algorithm to complete.
8.2.2 H2 Heuristic
The number of gates that are affected by a change in excitation at an input is a good heuristic 
measure of how much influence the input has on the upper bound waveforms. Therefore, inputs 
which affect more gates (i.e., which have larger COINs) should be enumerated before 
others. This leads us to another (static) splitting criterion H2, in which we calculate the size 
of the COIN($x_i$) associated with each input $x_i$. As with H1, all the inputs are arranged in the 
decreasing order of H2 values and during the search, at every s_node, inputs are selected in 
this fixed order. We will show in the next section that, while both static H1 and H2 give good 
results in practice, H2 is much better in terms of speed and has accuracy comparable to H1.

Table 6. Results of PIE for 10 ISCAS-85 circuits.

                                   Static H1 SC                   Static H2 SC
                            BFS     BFS     Time           BFS     BFS     Time
Circuit    iMax    MCA     (100)    (1k)    (100)         (100)    (1k)    (100)
c432       1.12    1.12    1.08     1.05     5m 14s       1.12     1.12     1m 34s
c499       1.33    1.20    1.33     1.33     4m 40s       1.33     1.33     1m 23s
c880       1.31    1.26    1.25     1.22    17m 16s       1.28     1.26     4m 5s
c1355      1.52    1.52    1.52     1.52    21m 28s       1.52     1.52     6m 13s
c1908      1.64    1.55    1.49     1.46    33m 17s       1.58     1.54    11m 51s
c2670      1.35    1.34    1.29     1.28     1h 57m       1.35     1.35    11m 56s
c3540      2.01    1.95    1.45     1.36    51m 12s       1.59     1.37    17m 3s
c5315      1.48    1.44    1.42     1.40     3h 2m        1.48     1.47    26m 2s
c6288      1.28    1.28    1.28     1.27     2h 5m        1.28     1.28    57m 28s
c7552      1.57    1.55    1.52     1.50     6h 21m       1.53     1.53    45m 4s

The results of partial input enumeration using both H1 and H2 static splitting criteria for 
the ISCAS-85 benchmark circuits are shown in Table 6. In the table, under various iMax, MCA 
and BFS columns, we show the ratio of the respective upper bound to the lower bound obtained 
from simulated annealing. The numbers in parentheses under the BFS columns indicate the 
number of s_nodes that were generated before stopping the search (i.e., the Max_No_Nodes 
parameter; 1k stands for 1000). Total cpu times needed by the algorithm on a Sun 
SPARCstation ELC (with Max_No_Nodes=100) are also shown in the table. From Table 6, we note 
that for all the circuits, the ratio of the upper bound to the lower bound is at most 1.52 (as 
opposed to a worst case of 2.02 for the simple iMax algorithm). This ratio can be further 
improved by running the PIE algorithm for longer durations. We emphasize that, since we 
can only compare the upper bound to a lower bound, the numbers in the table are only upper 
bounds on the error. It is prohibitively expensive to measure the true error.
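This remark can be made precise with a one-line inequality. The symbols are introduced only for this illustration: $I^{*}$ is the (unknown) true maximum current, $I_{LB}$ is the simulated annealing lower bound, and $I_{UB}$ is any of the upper bounds reported in Table 6.

```latex
I_{LB} \le I^{*} \le I_{UB}
\quad\Longrightarrow\quad
\frac{I_{UB} - I^{*}}{I^{*}} \;\le\; \frac{I_{UB}}{I_{LB}} - 1
```

A tabulated ratio of 1.37, for example, guarantees that the upper bound over-estimates the true maximum by no more than 37%.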
While the improvement over the original iMax algorithm is not large in all cases, in 
those cases where the iMax bound was very loose, such as c3540, the new PIE algorithm with 
the H1 or H2 heuristic gives a significant improvement: the ratio of 2.02 (maximum over-estimation 
of 1.02) is now 1.37 (maximum over-estimation of 0.37) with H2, a reduction in the maximum 
over-estimation of about 64%.
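The quoted percentage follows directly from the two ratios given in the text:

```latex
\frac{(2.02 - 1) - (1.37 - 1)}{2.02 - 1}
  = \frac{1.02 - 0.37}{1.02}
  = \frac{0.65}{1.02}
  \approx 0.64
```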
We also emphasize the following attractive property of the algorithm: a significant amount 
of the improvement in the upper bound occurs in the first few s_nodes (about 50-200) of the 
algorithm. This is shown in Fig. 13 for c3540, where the ratio of the upper bound to the
Figure 13. ‘Upper Bound / Lower Bound vs Time’ plot for c3540 (vertical axis: ratio; horizontal axis: time in minutes).
lower bound is plotted as a function of cpu time for the first 1000 s_nodes. This indicates that 
our heuristics are working well to select the most critical s_nodes first. Similar behavior is 
observed for most other circuits.
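Early improvement of this kind is typical of best-first search on a bound that tightens as inputs are resolved. The sketch below is a toy illustration of that mechanism and not the PIE implementation: the objective (weighted terms over three binary inputs), the bound (the sum of the terms not yet contradicted, standing in for iMax), and every name in the code are hypothetical.

```python
import heapq

# Hypothetical objective: a term pays its weight when all its literals hold.
TERMS = [
    (3.0, {"a": 1, "b": 1}),
    (2.0, {"a": 0, "c": 1}),
    (2.0, {"b": 0, "c": 1}),
    (1.0, {"c": 0}),
]
INPUT_ORDER = ["a", "b", "c"]  # fixed (static) splitting order


def upper_bound(partial):
    """Sum the weights of all terms that the partial assignment does not contradict."""
    return sum(
        weight
        for weight, literals in TERMS
        if all(partial.get(var, val) == val for var, val in literals.items())
    )


def best_first_bound(max_no_nodes):
    """Expand the loosest frontier node first; record the bound after each expansion."""
    frontier = [(-upper_bound({}), 0, {})]  # max-heap via negated bound
    counter, generated = 1, 1               # counter breaks ties between equal bounds
    trace = [upper_bound({})]
    while generated < max_no_nodes:
        _, _, partial = frontier[0]
        if len(partial) == len(INPUT_ORDER):
            break  # loosest node is fully assigned, so the bound is exact
        heapq.heappop(frontier)
        var = INPUT_ORDER[len(partial)]
        for val in (0, 1):
            child = {**partial, var: val}
            heapq.heappush(frontier, (-upper_bound(child), counter, child))
            counter += 1
            generated += 1
        trace.append(-frontier[0][0])  # overall bound = loosest frontier node
    return trace


print(best_first_bound(100))
```

The overall bound is the largest bound on the frontier, so it can only decrease as nodes are split, and the search may be stopped at any node budget with a valid upper bound in hand.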
The cpu time needed for generating the input list by the H2 splitting criterion is negligible 
compared to the time needed by the H1 criterion. For VLSI circuits with several hundred 
inputs, where the time needed by the H1 criterion may be large, the H2 criterion may be used 
instead. As can be seen from Table 6 (also see Table 7), the results produced by using either 
splitting criterion are quite comparable, especially for those circuits where iMax did not produce 
a good upper bound.
In order to demonstrate the applicability of the partial input enumeration algorithm for 
VLSI circuits with several thousand gates, we have also experimented with the ISCAS-89 
benchmark circuits [19]. For these synchronous sequential circuits, we have extracted the 
combinational blocks by deleting the flip-flops. These combinational blocks have gate counts 
ranging up to 22,000 and input counts ranging up to 1750. The results of the PIE algorithm 
on some of the ISCAS-89 circuits (combinational blocks) using both the H1 and H2 splitting criteria 
are summarized in Table 7. It is clear from the table that even for circuits of this size, our 
algorithms show good speed and accuracy.
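The extraction step can be sketched as follows. The text states only that the flip-flops were deleted; treating each flip-flop output as an additional (pseudo) primary input and each flip-flop data input as an additional (pseudo) primary output is a common convention that is assumed here, and the function name and data layout are hypothetical.

```python
def extract_combinational(primary_inputs, primary_outputs, flip_flops):
    """Return the (inputs, outputs) of the combinational block.

    flip_flops maps each flip-flop output signal (Q) to its data input (D).
    Deleting a flip-flop leaves Q undriven, so Q becomes a pseudo input,
    and leaves D unobserved, so D becomes a pseudo output.
    """
    comb_inputs = list(primary_inputs) + list(flip_flops)
    comb_outputs = list(primary_outputs)
    for d_signal in flip_flops.values():
        if d_signal not in comb_outputs:  # avoid listing a signal twice
            comb_outputs.append(d_signal)
    return comb_inputs, comb_outputs


# Hypothetical sequential circuit: 2 primary inputs, 1 primary output, 2 flip-flops.
inputs, outputs = extract_combinational(
    primary_inputs=["a", "b"],
    primary_outputs=["z"],
    flip_flops={"q0": "n3", "q1": "n5"},
)
print(inputs, outputs)
```

Under this convention the number of inputs to the combinational block is the number of primary inputs plus the number of flip-flops, which is how input counts can grow well beyond the pin count of the original circuit.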
9 Conclusions and Future Work
In this paper, we have proposed a linear time algorithm (iMax) that computes maximum 
currents in the supply lines. Most of the previous algorithms on maximum current estimation
Table 7. Results of PIE for 10 ISCAS-89 (Comb.) circuits.

                                 |        Static H1 SC          |        Static H2 SC
Circuit  No. Gates  iMax   MCA   | BFS(100)  BFS(1k)  Time(100) | BFS(100)  BFS(1k)  Time(100)
s1423    657        1.35   1.32  | 1.32      1.29     37m 22s   | 1.35      1.34     7m 43s
s1488    653        2.21   2.10  | 1.40      1.08     5m 32s    | 1.41      1.06     2m 49s
s1494    647        2.18   2.08  | 1.37      1.06     5m 35s    | 1.39      1.05     2m 51s
s5378    2779       1.38   1.37  | 1.29      1.25     2h 23m    | 1.30      1.23     13m 21s
s9234    5597       1.76   1.74  | 1.51      1.47     7h 24m    | 1.56      1.56     37m 18s
s13207   7951       1.37   1.35  | -         -        -         | 1.30      1.26     36m 53s
s15850   9772       1.81   1.80  | -         -        -         | 1.64      1.57     1h 11m
s35932   16065      1.66   1.66  | -         -        -         | 1.56      1.56     2h 6m
s38417   22179      1.73   1.70  | -         -        -         | 1.72      1.68     2h 46m
s38584   19253      1.45   1.38  | -         -        -         | 1.39      1.37     2h 15m
suffer from exponential complexity and are not adequate for large circuits. Our approach avoids 
exponential complexity by adopting a pattern independent approach. The results produced by 
the algorithm are within acceptable bounds for most circuits. We have also presented a new 
partial input enumeration algorithm that partially resolves the signal correlations and further 
improves the upper bound obtained from iMax. The algorithm is based on the best first search 
(BFS) technique and represents a good time-accuracy trade-off. The PIE algorithm involves 
a search procedure, but this search need not be carried too deep to obtain good results. The 
algorithm is quite applicable to VLSI circuits, as is demonstrated by the experimental results 
on circuits with up to 22,000 gates. In our future research, we plan to extend the study to 
include better gate delay and current models and to identify troublesome voltage drop sites in 
supply lines, using RC models, from the maximum current estimates.
Acknowledgements
Part of this research was performed while the first author was at Texas Instruments during 
the summers of 1991 and 1992. The authors are thankful to the technical staff members of 
the Semiconductor Process and Design Center at TI Dallas, especially Dr. Ping Yang and Dr. 
Jue-Hsien Chern for providing valuable discussions and guidance. Thanks are also due to Prof. 
Tamer Basar and Prof. Janak Patel of University of Illinois for their help. This research is 
supported by Texas Instruments Inc. and the Semiconductor Research Corporation.
Appendix (Voltage Drops in Buses)
To calculate voltage drops occurring in the power or ground bus, we need to extract the 
equivalent RC network of the bus. Various circuit extraction algorithms, such as the one 
described in [20], can be used for this purpose. In the RC network, capacitances are assumed 
to be lumped between a node and ground. If we denote the vector of “voltage drops” at 
the various nodes (which is (Vdd - node voltage) for the power bus and (node voltage) for the 
ground bus) by $V$ and the vector of node currents by $I$, then by properly adjusting the sign 
of $I$ (current entering (leaving) the ground (power) bus is taken positive), one can show the 
following relationship between $V$ and $I$:
$$ Y V = I - C \dot{V} \qquad (2) $$
where $Y$ is the node admittance matrix of the resistive circuit and $C$ is a diagonal matrix of 
node capacitances. The $i$th diagonal entry of the $C$ matrix is equal to the node capacitance 
between node $i$ and ground. The following lemma holds between the node voltages and the 
applied currents:
Lemma: If non-negative current waveforms are injected at various nodes of an RC network, 
then all the node voltages will be non-negative.
Proof: The current-voltage equation for the system is given by Eq. 2:
$$ C \dot{V}(t) = I(t) - Y V(t), \qquad V(0) = 0 $$
To prove that the node voltage $v_i(t)$ at any node $i$ is non-negative for all time (i.e., $v_i(t) \ge 0 \;\; \forall t$), 
all that we need to prove is that whenever the node voltage becomes zero ($v_i(\tau) = 0$), its time 
derivative at that time is non-negative ($\dot{v}_i(\tau) \ge 0$) and therefore, the node voltage never 
becomes negative. In the beginning ($t = 0$), $V(0) = 0$, therefore,
$$ C \dot{V}(0) = I(0) \ge 0 $$
Therefore, the time derivatives of all the node voltages are non-negative at time 0.
At some arbitrary time $t$, let us suppose that the node voltage at some node $i$ is zero 
($v_i(t) = 0$), while all other node voltages are non-negative ($v_j(t) \ge 0$, $1 \le j \le n$, $j \ne i$). 
The differential equation corresponding to node $i$ can be written as
$$ c_i \dot{v}_i(t) = I_i(t) - \sum_{j=1}^{n} y_{ij} v_j(t) $$
Here, $c_i$ is the $i$th diagonal entry of $C$ and $I_i$ is the $i$th component of $I$. Since $v_i(t) = 0$,
$$ c_i \dot{v}_i(t) = I_i(t) - \sum_{\substack{j=1 \\ j \ne i}}^{n} y_{ij} v_j(t) $$
Now, $c_i > 0$, $I_i(t) \ge 0$, $v_j(t) \ge 0$ for $j = 1, 2, \ldots, n$, $j \ne i$, and the $y_{ij}$, $i \ne j$, are the off-diagonal 
terms of the node admittance matrix [21] and therefore are all non-positive. Hence $\dot{v}_i(t) \ge 0$. ■
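The lemma can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not part of the original analysis: it integrates Eq. 2 with the forward Euler method on a hypothetical three-node resistive chain tied to an ideal supply pad, with unit conductances and capacitances and a triangular current pulse injected at the far node. The function name `simulate_drops` and all numerical values are invented for the example.

```python
def simulate_drops(Y, c, current, steps, h):
    """Forward-Euler integration of C dV/dt = I(t) - Y V, starting from V(0) = 0.

    Y is the node admittance matrix (list of rows), c the diagonal of C,
    current(t) returns the injected current vector at time t.
    """
    n = len(c)
    v = [0.0] * n
    history = [list(v)]
    for k in range(steps):
        i_now = current(k * h)
        v = [v[i] + h * (i_now[i] - sum(Y[i][j] * v[j] for j in range(n))) / c[i]
             for i in range(n)]
        history.append(list(v))
    return history


# Hypothetical bus: pad --G-- node0 --G-- node1 --G-- node2 (unit values).
G = 1.0
Y = [[2 * G, -G, 0.0],
     [-G, 2 * G, -G],
     [0.0, -G, G]]
c = [1.0, 1.0, 1.0]


def pulse(t):
    """Non-negative triangular current pulse injected at node 2 only."""
    return [0.0, 0.0, max(0.0, 1.0 - abs(t - 1.0))]


history = simulate_drops(Y, c, pulse, steps=1000, h=0.01)
min_drop = min(min(v) for v in history)
print("smallest voltage drop seen:", min_drop)
```

The step size is chosen so that h * y_ii / c_i is well below 1. Under that condition each Euler update is a non-negative combination of the previous drops and the injected currents, so the discrete simulation mirrors the argument of the proof.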
Theorem A1: If the vectors of node voltages of an RC network in the two cases when 
non-negative current waveforms $I_1(t)$ and $I_2(t)$ are injected into various nodes of the network 
are denoted by $V_1(t)$ and $V_2(t)$ respectively, and if $I_1(t) \le I_2(t)$, then
$$ V_1(t) \le V_2(t) $$
Proof: Note that each of $I_1(t)$ and $I_2(t)$ consists of a set of $n$ non-negative current waveforms, 
if the circuit has $n$ nodes. Let $\Delta V(t)$ denote the vector of voltage drops at various nodes 
when the $I_2(t) - I_1(t)$ current waveforms are applied at the nodes. Since $I_2(t) \ge I_1(t)$, we have 
$I_2(t) - I_1(t) \ge 0$. Hence, from the lemma just proved, we conclude that $\Delta V(t) \ge 0$. Note that
$$ I_2(t) = I_1(t) + (I_2(t) - I_1(t)) $$
Therefore, from the linearity of the RC network, we have
$$ V_2(t) = V_1(t) + \Delta V(t) $$
Since $\Delta V(t) \ge 0$, therefore $V_2(t) \ge V_1(t)$.
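Theorem A1 is what allows an upper-bound current waveform to be turned into upper-bound voltage drops. The following sketch checks both the ordering and the superposition step of the proof numerically. It is an illustration only: the three-node chain, the unit element values, the forward Euler integrator, and the names `simulate_drops`, `i_small`, `i_large` and `i_diff` are assumptions introduced here.

```python
def simulate_drops(Y, c, current, steps, h):
    """Forward-Euler integration of C dV/dt = I(t) - Y V, starting from V(0) = 0."""
    n = len(c)
    v = [0.0] * n
    history = [list(v)]
    for k in range(steps):
        i_now = current(k * h)
        v = [v[i] + h * (i_now[i] - sum(Y[i][j] * v[j] for j in range(n))) / c[i]
             for i in range(n)]
        history.append(list(v))
    return history


def tri(t, center, height):
    """Non-negative triangular pulse of unit half-width."""
    return max(0.0, height * (1.0 - abs(t - center)))


def i_small(t):  # plays the role of I_1(t)
    return [tri(t, 1.0, 0.5), 0.0, tri(t, 2.0, 1.0)]


def i_large(t):  # plays the role of I_2(t); dominates i_small at every node
    return [tri(t, 1.0, 0.8), tri(t, 1.5, 0.3), tri(t, 2.0, 1.0)]


def i_diff(t):   # I_2(t) - I_1(t), non-negative by construction
    return [b - a for a, b in zip(i_small(t), i_large(t))]


# Hypothetical bus: pad --1-- node0 --1-- node1 --1-- node2 (unit values).
Y = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
c = [1.0, 1.0, 1.0]

v_small = simulate_drops(Y, c, i_small, steps=1000, h=0.01)
v_large = simulate_drops(Y, c, i_large, steps=1000, h=0.01)
v_diff = simulate_drops(Y, c, i_diff, steps=1000, h=0.01)

# Ordering claimed by the theorem: V_1(t) <= V_2(t) at every node and time step.
dominated = all(a <= b + 1e-12
                for row_a, row_b in zip(v_small, v_large)
                for a, b in zip(row_a, row_b))
# Superposition used in the proof: V_2 - V_1 equals the response to I_2 - I_1.
max_gap = max(abs(b - a - d)
              for row_a, row_b, row_d in zip(v_small, v_large, v_diff)
              for a, b, d in zip(row_a, row_b, row_d))
print(dominated, max_gap)
```

The small tolerance in the comparison only absorbs floating-point rounding; the ordering itself is exact for the underlying linear system.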
References
[1] C. Mead and L. Conway, Introduction to VLSI Systems. Reading, MA: Addison-Wesley, 
1980.
[2] R. Dutta and M. Marek-Sadowska, “Automatic sizing of power/ground (p/g) networks in 
VLSI,” in Proceedings of 26th ACM/IEEE Design Automation Conference, pp. 783-786, 
June 25-29, 1989.
[3] S. Chowdhury, “Optimum design of reliable IC power networks having general graph 
topologies,” in Proceedings of 26th ACM/IEEE Design Automation Conference, pp. 787-790, 
June 25-29, 1989.
[4] S. Chowdhury and J. S. Barkatullah, “Estimation of maximum currents in MOS IC logic 
circuits,” IEEE Transactions on Computer-Aided Design, vol. 9, no. 6, pp. 642-654, June 
1990.
[5] F. Rouatbi, B. Haroun, and A. J. Al-Khalili, “Power estimation tool for sub-micron CMOS 
VLSI circuits,” in Proceedings of IEEE/ACM International Conference on Computer-Aided 
Design, pp. 204-209, Santa Clara, CA, November 8-12, 1992.
[6] A. Nabavi-Lishi and N. Rumin, “Delay and bus current evaluation in CMOS logic circuits,” 
in Proceedings of IEEE/ACM International Conference on Computer-Aided Design, 
pp. 198-203, Santa Clara, CA, November 8-12, 1992.
[7] U. Jagau, “SIMCURRENT - an efficient program for the estimation of the current flow of 
complex CMOS circuits,” in Proceedings of IEEE International Conference on Computer-Aided 
Design, pp. 396-399, Santa Clara, CA, November 11-15, 1990.
[8] A.-C. Deng, Y.-C. Shiau, and K.-H. Loh, “Time domain current waveform simulation of 
CMOS circuits,” in Proceedings of IEEE International Conference on Computer-Aided 
Design, pp. 208-211, Santa Clara, CA, November 7-10, 1988.
[9] S. Devadas, K. Keutzer, and J. White, “Estimation of power dissipation in CMOS combinational 
circuits using boolean function manipulation,” IEEE Transactions on Computer-Aided 
Design, no. 3, pp. 373-383, March 1992.
[10] A. Tyagi, “Hercules: a power analyzer for MOS VLSI circuits,” in Proceedings of IEEE 
International Conference on Computer-Aided Design, pp. 530-533, Santa Clara, CA, 
November 9-12, 1987.
[11] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, “Optimization by simulated annealing,” 
Science, vol. 220, no. 4598, pp. 671-680, 13 May 1983.
[12] P. J. M. van Laarhoven and E. H. L. Aarts, Simulated Annealing: Theory and Applications. 
Norwell, MA: Kluwer Academic Publishers, 1987.
[13] F. Brglez and H. Fujiwara, “A neutral netlist of 10 combinational benchmark circuits and 
a target translator in Fortran,” in Proceedings of International Symposium on Circuits and 
Systems, pp. 695-698, June 1985.
[14] H. Kriplani, F. Najm, and I. Hajj, “Maximum current estimation in CMOS circuits,” in 
Proceedings of 29th ACM/IEEE Design Automation Conference, pp. 2-7, Anaheim, CA, 
June 8-12, 1992.
[15] S. Seth, L. Pan, and V. D. Agrawal, “Predict - probabilistic estimation of digital circuit 
testability,” in The 15th International Symposium on Fault Tolerant Computing, pp. 220-225, 
Ann Arbor, MI, June 19-21, 1985.
[16] F. Maamari and J. Rajski, “A reconvergent fanout analysis for efficient exact fault simulation 
of combinational circuits,” in IEEE 18th International Symposium on Fault Tolerant 
Computing, pp. 122-127, Tokyo, Japan, June 27-30, 1988.
[17] S. Dey, F. Brglez, and G. Kedem, “Corolla based circuit partitioning and resynthesis,” in 
Proceedings of 27th ACM/IEEE Design Automation Conference, pp. 607-612, Orlando, 
FL, June 24-28, 1990.
[18] J. Pearl, Heuristics - Intelligent Search Strategies for Computer Problem Solving. Reading, 
MA: Addison-Wesley, 1984.
[19] F. Brglez, D. Bryan, and K. Kozminski, “Combinational profiles of sequential benchmark 
circuits,” in Proceedings of International Symposium on Circuits and Systems, pp. 1929-1934, 
May 1989.
[20] H. Cha, “Current density calculation using rectilinear region splitting algorithm for very 
large scale integration metal migration analysis,” Master’s Thesis UILU-ENG-90-2223, 
University of Illinois at Urbana-Champaign, August 1990.
[21] C. A. Desoer and E. S. Kuh, Basic Circuit Theory. New York, NY: McGraw-Hill, 1969.
