Currents flowing in the power and ground (P&G) buses of CMOS digital circuits affect both circuit reliability and performance by causing excessive voltage drops. Excessive voltage drops manifest themselves as glitches on the P&G buses and cause erroneous logic signals and degradation in switching speeds. Maximum current estimates are needed at every contact point in the buses to study the severity of the voltage drop problems and to redesign the supply lines accordingly. These currents, however, depend on the specific input patterns that are applied to the circuit. Since it is prohibitively expensive to enumerate all possible input patterns, this problem has, for a long time, remained largely unsolved. In this paper, we propose a pattern-independent, linear time algorithm (iMax) that estimates, at every contact point, an upper bound envelope of all possible current waveforms that result from the application of different input patterns to the circuit. The algorithm is extremely efficient and produces good results for most circuits, as is demonstrated by experimental results on several benchmark circuits. The accuracy of the algorithm can be further improved by resolving the signal correlations that exist inside a circuit. We also present a novel partial input enumeration (PIE) technique to resolve signal correlations and significantly improve the upper bounds for circuits where the bounds produced by iMax are not tight. We establish with extensive experimental results that these algorithms represent a good time-accuracy trade-off and are applicable to VLSI circuits.
Introduction
A major concern in present-day VLSI circuits is the design of power and ground (P&G) buses in a way that ensures design reliability and performance. Excessive currents can severely affect both circuit reliability and performance by causing excessive voltage drops in the P&G buses. Excessive voltage drops manifest themselves as glitches on the buses and cause erroneous logic signals (soft errors) and degradation in switching speeds. The severity of the voltage drop problems intensifies with the continuing push for denser chips and finer technologies. As is known from classical scaling theory [1], as the minimum feature size and supply voltage are scaled down while the total power dissipation on the chip remains constant, the currents flowing in the P&G buses increase. With higher currents flowing in narrower buses, the voltage drops in the P&G buses go up and become a limiting factor in the design of VLSI chips. Furthermore, a lower supply voltage means that the noise margins [1] for the correct operation of the transistors on chip decrease. In short, in order to avoid logic errors, the circuit needs to be designed to tolerate increased voltage drops and reduced noise margins. This highlights the need for efficient CAD tools to estimate voltage drops in the buses. Since worst case currents determine worst case voltage drops, our research is focused on the problem of estimating maximum currents in the P&G buses.
Power and ground buses deliver power to all the gates in a circuit. Points at which individual gates or cells are tied to the buses are called contact points. In VLSI circuits, P&G buses take up an appreciable amount of routing area, typically 20-50% or even more in some circuits. Several design methods, such as [2], [3], have appeared in the literature that make use of the maximum current estimates at the contact points to redesign the buses. The output of a design optimization procedure, however, depends upon the accuracy with which maximum currents are estimated. A poor estimate of maximum currents will result in a pessimistic design and wasted silicon area. Clearly, an accurate estimation of currents at every contact point in a circuit is crucial and is the subject of this paper.
The current drawn by a CMOS circuit depends upon the specific input pattern applied at its inputs. An input pattern for a circuit with n inputs is defined as a vector of n excitations, where each excitation can be any one of four possibilities: l (low), h (high), hl (high to low) or lh (low to high). For different input patterns, different transient current waveforms are drawn at the contact points. Therefore, in the presence of such input dependent and transient current waveforms, we need to define what we mean by the maximum current waveform at a contact point. Chowdhury et al. [4] find the maximum of the peaks of the various transient current waveforms at every contact point over all possible input patterns. They then use these constant peak values at the contact points to redesign the supply lines. This assumption, however, gives pessimistic results, since separate sections of a circuit rarely draw their maximum currents simultaneously. In this paper, we propose a better measure of the maximum current waveform at a contact point, called the maximum envelope current (MEC) waveform. This maximum current estimate is discussed in section 4. Accurate estimation of the maximum current waveform at every contact point is extremely difficult, since it requires the current waveforms corresponding to all possible input patterns. If a circuit has n primary inputs, then we need to simulate it for $4^n$ input patterns, since each input can be l, h, hl or lh. For large circuits, this makes the problem practically impossible to handle by any of the known search procedures. As will be shown in the next section, most previous work in this area has been based on search techniques. In this paper, we propose a pattern independent algorithm (iMax), linear time in the number of gates, that provides tight upper bounds on the MEC waveforms. The proposed approach represents a trade-off between execution speed and tightness of these bounds.
In order to maintain reasonable execution times, the iMax algorithm neglects various signal correlations that exist inside a circuit. As will be shown later, while in most cases iMax produces good upper bound waveforms, in some cases the loss due to signal correlations can be significant. We then propose a new partial input enumeration (PIE) algorithm that efficiently resolves these correlations and leads to significant improvement in the upper bound waveforms. The PIE algorithm is based on (1) intelligently selecting a few critical inputs and (2) enumerating a limited number of cases at these inputs to produce an overall improvement in the upper bound waveforms at the contact points. It turns out that the choice of these critical inputs is the key to improving the upper bounds. We present two heuristics for automatically selecting the critical inputs that have shown good results in practice. While the PIE algorithm is slower than the simple iMax algorithm, we demonstrate good speed and accuracy results on circuits with over twenty-five thousand gates. Furthermore, the algorithm has the attractive property that it performs iterative improvement, so that one can stop it at any time and obtain better upper bounds than the simple iMax results.
This paper is organized as follows. In the next section, we briefly discuss the previous and related work in this area. We then discuss the various assumptions that our algorithms are based on. In section 4, we describe the proposed maximum current estimate. After that, we present the iMax algorithm in detail in section 5. Experimental results on several benchmark circuits using iMax are also provided in that section. The signal correlation problem is described in section 6. This is followed, in section 7, by a discussion of possible methods that can be used to resolve the signal correlations. In section 8, we present the partial input enumeration algorithm along with extensive experimental results on several benchmark circuits. Finally, in section 9, conclusions and some guidelines for future work are presented.
Previous Work
Several papers, such as [5], [6], [7], [8], have appeared in the literature on the estimation of P&G currents from deterministic input patterns. These methods offer significant improvement in execution times compared to SPICE, while providing acceptable accuracy in the current waveforms. They can be used for finding maximum currents for small circuits having a few inputs, by calculating the current waveforms corresponding to all possible input patterns. However, they are of little help for large circuits, as they do not guide us in selecting an input pattern that leads to the maximum currents.
Chowdhury et al. have addressed the problem of maximum current estimation in [4]. In their methodology, they divide the circuit into a set of macros, where each macro consists of a combinational interconnection of logic gates. Considering each macro separately, they use either an exact search technique (namely, branch and bound) or a heuristic technique to find the maximum of its transient currents, assuming that the macro has only one contact point and that its inputs switch simultaneously. In the analysis of the bus, to calculate maximum voltage drops, they apply these peak currents as constant values at the contact points. Because of this assumption, their methodology overestimates the worst case currents and voltage drops. Secondly, due to the huge size of the input space, their branch and bound search technique is slow on large circuits. Furthermore, their heuristic approach does not guarantee an upper bound on the maximum currents.
Devadas et al. have addressed a similar problem in [9]. They consider the estimation of worst case power dissipation in CMOS combinational circuits. They reduce this problem to a weighted max-satisfiability problem on a set of multi-output Boolean functions. These functions are obtained from the logic description of the circuit. The functions are appropriately weighted to account for different load capacitances. They then use either a disjoint cover enumeration algorithm or the branch and bound algorithm to solve the NP-complete max-satisfiability problem. However, for a multilevel logic circuit, even under a unit gate delay assumption, the functions generated by their algorithm are fairly complex. Consequently, even for small circuits, their analysis is slow. Analysis of multi-level circuits under a general delay model was not attempted.
From this brief survey, it is clear that existing methods for the calculation of maximum current are computationally too expensive to handle large VLSI circuits. For these circuits, near-linear, rather than exponential, algorithms are necessary. Therefore, pattern independent algorithms become a natural choice. Hercules [10] was an initial attempt in the direction of a pattern independent approach to maximum current estimation. However, the analysis presented in [10] makes several simplifying assumptions. The approach subdivides the circuit into stages but does not discuss how information is represented at the output of each stage and how it is propagated from one stage to the others. Further, the signal correlation problem is not discussed in that paper. In this paper, we present a novel approach that is able to address these problems. This approach is discussed in section 5.
Assumptions
In order to reduce the complexity of the problem, we focus on a specific, but very common, design style, namely edge-triggered latch-controlled synchronous digital circuits. These circuits consist of combinational blocks separated by latches (see Fig. 1) such that all the inputs to each block switch simultaneously. As a result, we will focus the analysis, from the next section, on a single combinational block all of whose inputs switch simultaneously (if at all). This effectively eliminates the time domain uncertainty about the input transitions, and significantly simplifies the problem. This assumption has also been used by all the previous approaches.
Figure 1: A latch controlled synchronous digital circuit.
We assume that the delay of each gate in the circuit is fixed and is calculated ahead of time. Different gates can have different delays. Further, we assume that for every transition at the output of a gate, the current waveform drawn from the power or ground bus, called the transition current waveform, is represented by a triangular waveform, as shown in Fig. 2. The value of the delay and the various parameters of the transition current waveform, such as its duration, peak value and the time point at which the peak occurs, are calculated in a preprocessing phase from the circuit level parameters of the gate under consideration as well as of the other gates that are connected to its inputs and output. This work, as well as the extension of the proposed algorithms to more general delay and current models, is the subject of another paper, and the interested reader is referred to [11, 12]. In this paper, due to space limitations, we focus on the algorithmic aspect of the maximum current estimation process under these simplified gate models.
Given the specific clocking scheme of the synchronous circuit, the maximum current waveforms from different combinational blocks are appropriately shifted in time depending upon the individual clock trigger, and are used to find the maximum voltage drops in the buses. Therefore, for the purposes of this paper, we will focus on the analysis of a single combinational block whose inputs switch at time zero.
Maximum Current Estimate
We define the excitation at a node (or net) at any time t as the stimulus or signal value present at the node at that time. At any time, a node in the circuit could be either stable at low or high, or could transition from high to low or from low to high. Thus, the excitation could be any single value from the set $X = \{l, h, hl, lh\}$.
We now describe the measure we use in our approach to represent maximum currents. For the purposes of this illustration, let us consider a specific contact point in a circuit. As mentioned in the introduction, the current drawn by a CMOS circuit is a complex function of the input excitations. For each input pattern that is applied to the circuit, a different current waveform results at the contact point. Instead of representing the maximum current at the contact point by a single dc value, in our approach we represent it by a waveform whose value at any time is the maximum current value that the circuit can draw at that time (see Fig. 3). We call this the Maximum Envelope Current (MEC) waveform.
Let us suppose that a circuit under consideration has n inputs and that, when an input pattern $p = (e_1, e_2, \ldots, e_n)$, where $e_i \in X$, $1 \le i \le n$, is applied to it, a transient current waveform $I_p(t)$ is drawn at the contact point. Let us denote the set of all possible input patterns that can be applied to the circuit by $U$. The MEC waveform is then defined as
$$I_{MEC}(t) = \max_{p \in U} I_p(t) \qquad (1)$$
Clearly (also see Fig. 3), the MEC waveform is the maximum envelope of all transient current waveforms corresponding to all possible input patterns, hence the name. There is a unique MEC waveform at every contact point.
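To make the definition of the MEC waveform above concrete, the following minimal Python sketch computes the point-wise maximum over all $4^n$ input patterns by brute force. It is an illustration only: the function names and the toy current model `toy_current` are hypothetical, waveforms are assumed to be sampled on a common time grid, and the exhaustive loop is exactly the exponential enumeration that the paper seeks to avoid.

```python
from itertools import product

EXCITATIONS = ("l", "h", "hl", "lh")

def mec_waveform(n_inputs, current_of_pattern, n_samples):
    """Point-wise maximum of the sampled current waveforms of all 4**n patterns."""
    envelope = [0.0] * n_samples
    for pattern in product(EXCITATIONS, repeat=n_inputs):
        wave = current_of_pattern(pattern)
        envelope = [max(e, w) for e, w in zip(envelope, wave)]
    return envelope

def toy_current(pattern):
    """Hypothetical model: hl inputs draw a unit pulse at sample 0, lh inputs at sample 1."""
    wave = [0.0, 0.0]
    for excitation in pattern:
        if excitation == "hl":
            wave[0] += 1.0
        elif excitation == "lh":
            wave[1] += 1.0
    return wave

mec = mec_waveform(2, toy_current, 2)
```

In this toy model no single pattern draws 2 units at both samples, yet the envelope does, which is why the MEC waveform is an envelope rather than the waveform of any one pattern.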
If the power or the ground bus of a circuit is represented by an equivalent RC network, then we have the following result:
Estimating the MEC waveforms at all the contact points in a circuit is an extremely difficult problem, since it requires the current waveforms corresponding to all possible input patterns. In the next section, we describe a linear time algorithm that provides tight upper bounds on the MEC waveforms at the contact points.
The iMax Algorithm
The proposed pattern independent, linear time algorithm operates on the gate level description of the circuit. Unless specified by the user, it assumes that nothing is known about the specific excitations at the primary inputs, except that they may transition only at time zero, i.e., each primary input may carry any excitation from the set X at time zero. We call this an uncertainty about these input signals. The basic idea of the proposed algorithm is to propagate this uncertainty present at the inputs inside the circuit, so that, at the output of every logic gate, we know the set of all possible excitations and their associated timing. From this, the worst case gate currents are computed, as explained below.
Signal Representation
Perhaps the first question that comes to mind is what kind of information one maintains in order to represent the signal uncertainty at internal circuit nodes. Ideally, one would like to compute the set of all possible transitions, along with their timing information, that occur at the output of every gate in the circuit. However, as will become clear soon, due to the uncertainty at the primary inputs and the general gate delay model used, the number of possible transitions at internal nodes grows exponentially, and quickly becomes a bottleneck. To avoid this problem, we maintain information, not about individual transitions, but about intervals during which the outputs of the gates might switch. Thus, at each node, for each of the excitations l, h, hl, lh, we maintain a list of intervals during which the node might carry those excitations. These intervals, which might overlap, serve to describe the signal uncertainty. We call these intervals uncertainty intervals.
Definition 1 (Uncertainty Set $X_n(t)$): The uncertainty set at time t for a node n defines the set of all possible excitations that the node can assume at that time. $X_n(t) \subseteq X$.
Definition 2 (Uncertainty Waveform): The uncertainty waveform describes the signal uncertainty present at a node as a function of time. At time t, the set of values taken by the waveform is the uncertainty set for the node at that time.
An example of the uncertainty waveform is given in Fig. 4. In this figure, we show an uncertainty waveform $U(t)$ represented as four sets of intervals (one set for each of the l, h, hl and lh excitations) along the time axis. Thus, if $u(t)$ is a logic signal that belongs to the family $U(t)$, i.e., $u(t) \in U(t)$, then $u(t)$ will be low up to $t_1$, will switch from low to high sometime between $t_1$ and $t_2$, will then be high up to $t_3$, etc. Since the signal can switch from low to high at any time between $t_1$ and $t_2$, it can be either high or low during that interval. Notice that between $t_6$ and $t_7$ the signal may make any number of low to high and/or high to low transitions.
At the primary inputs, signals are represented by such waveforms with a single point of possible transition at time 0. As internal signals are generated, the number of points at which transitions can possibly occur increases. In order to contain the complexity, we then start to merge neighboring transition points into intervals. In general, this strategy can be stated as follows: when the number of intervals associated with a gate for any excitation exceeds a certain user-specified threshold Max_No_Hops, we repeatedly merge closest-neighbor intervals, so as to keep their count within the threshold.
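The interval merging strategy can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' C implementation: intervals are (start, end) pairs, "closest" is taken to mean the smallest gap between the end of one interval and the start of the next, and the function name `merge_closest` is hypothetical.

```python
def merge_closest(intervals, max_no_hops):
    """Merge closest-neighbor intervals until at most max_no_hops remain."""
    ivs = sorted(intervals)  # sort by start time
    while len(ivs) > max_no_hops and len(ivs) > 1:
        # Gap between each interval and its right neighbor (negative if they overlap).
        gaps = [ivs[i + 1][0] - ivs[i][1] for i in range(len(ivs) - 1)]
        i = gaps.index(min(gaps))
        merged = (ivs[i][0], max(ivs[i][1], ivs[i + 1][1]))
        ivs[i:i + 2] = [merged]
    return ivs

coarse = merge_closest([(0, 1), (5, 6), (7, 8)], 2)
```

Merging only ever enlarges the region in which a node is allowed to switch, so the result remains conservative; the price is a looser bound in exchange for a fixed amount of storage per node.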
Independence Assumption
While propagating information at a logic gate, we know the uncertainty waveforms at each of its inputs and we would like to derive the corresponding waveform at its output. However, one cannot do this accurately without knowing how some of these inputs, if any, are correlated. For instance, certain combinations of the gate input excitations may not be possible. Unfortunately, maintaining information about the correlation between various circuit nodes is very expensive. We, therefore, use a conservative approximation, one that does not underestimate the MEC waveforms, as follows. If we assume that all combinations of the gate input excitations are possible, i.e., the gate inputs are independent, then the worst case current in that case will be an upper bound on the gate current for the case when the inputs are dependent. In other words, the worst case current over all combinations of inputs is certainly an upper bound on the worst case current over some of them.
Single Gate Simulation
Given the type of a Boolean gate and the independence assumption for the uncertainty waveforms at its inputs, we now describe how the uncertainty waveform at the output of the gate is calculated. This process is divided into the following two parts:
1. Calculation of the uncertainty set at the output of the gate at a time t.
2. Calculation of uncertainty intervals at the output of the gate.
Calculating Uncertainty Set:
One can calculate all possible excitations at the output of the gate at time t from the uncertainty sets at its inputs at time $t - D$, where D is the delay of the gate. Let us denote the uncertainty set at the i-th input of the gate at time $t - D$ by $X_i$. Let us further suppose that the gate has m inputs. Then the set of all possible input patterns that lead to an excitation at the output of the gate at time t can be represented by $\{(x_1, x_2, \ldots, x_m) \mid x_i \in X_i,\ \forall\, 1 \le i \le m\}$. For each input pattern, the output of the gate can be easily determined from the Boolean equation of the gate, as explained below. For simple functions, such as AND, OR and NOT, it is easy to verify that the output of the gate is as shown in Fig. 5. In fact, the set {l, h, hl, lh} along with the above definitions for AND, OR and NOT constitutes a 4-valued Boolean algebra [13]. The output of a gate realizing any arbitrary Boolean function can be easily calculated by repeated applications of the above. By calculating the output of the gate for each and every input pattern, the resulting activity or uncertainty set at the output of the gate at time t can be determined. This process, however, requires one to generate and evaluate $|X_1| |X_2| \cdots |X_m|$ input patterns. This worst case complexity can be greatly reduced by the following observations.
1. The above input pattern generation and evaluation process can be stopped when the uncertainty set at the output of the gate becomes equal to X. Obviously, trying out any more input patterns would not lead to any further change in the uncertainty set.
2. If the uncertainty sets for all of the inputs at time $t - D$ are equal to X, then the uncertainty set at the output of the gate at time t is also X. It is trivial to verify this fact for simple NAND, NOR and INVERTER gates. Because of the functional completeness of NAND, NOR and INVERTER [14], any composite gate can be represented in terms of these simple gates. Therefore, the fact holds for any gate type.
Both of these observations lead to tremendous savings in the calculation of uncertainty sets at the output of the gate and thus contribute to the speed of the algorithm.
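The uncertainty set calculation and the two shortcuts above can be sketched in Python as follows. Since Fig. 5 is not reproduced here, the truth tables are an assumption: each excitation is encoded as a (value before, value after) pair and a gate is applied to each component separately, which is one standard way to obtain a four-valued algebra on {l, h, hl, lh}. All names (`lift`, `output_uncertainty_set`, etc.) are hypothetical.

```python
from itertools import product

X = frozenset({"l", "h", "hl", "lh"})
ENC = {"l": (0, 0), "h": (1, 1), "hl": (1, 0), "lh": (0, 1)}
DEC = {bits: name for name, bits in ENC.items()}

def lift(fn):
    """Lift a Boolean function on bits to the four excitations (assumed encoding)."""
    def gate(*excitations):
        before = fn(*(ENC[e][0] for e in excitations))
        after = fn(*(ENC[e][1] for e in excitations))
        return DEC[(before, after)]
    return gate

AND = lift(lambda *b: int(all(b)))
OR = lift(lambda *b: int(any(b)))
NOT = lift(lambda b: 1 - b)
NAND = lift(lambda *b: 1 - int(all(b)))

def output_uncertainty_set(gate, input_sets):
    """Excitations possible at the gate output, assuming independent inputs."""
    # Observation 2: fully uncertain inputs give a fully uncertain output
    # (stated in the text for gates built from NAND, NOR and INVERTER).
    if all(s == X for s in input_sets):
        return set(X)
    out = set()
    for pattern in product(*input_sets):
        out.add(gate(*pattern))
        if out == X:  # Observation 1: nothing more can be added
            break
    return out

nand_out = output_uncertainty_set(NAND, [{"l"}, X])
```

With one NAND input held low, the output set collapses to {h} regardless of the other input, which shows how user-specified restrictions on inputs can tighten the analysis.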
Calculating Uncertainty Intervals:
In iMax, since signals are represented in the form of uncertainty intervals at the inputs of a gate, the output of the gate is also in the form of uncertainty intervals. An interval at the output of a gate can begin or end at time t only if an interval begins or ends at any of its inputs at time $t - D$. Between one time point at which an interval begins or ends at the inputs and the next such time point, the sets of excitations that the inputs can assume do not change, and therefore no corresponding uncertainty interval can begin or end at the output during that time (shifted by D). Thus, by calculating the uncertainty sets at the output of the gate at every time point at which an uncertainty interval begins or ends at any of its inputs, the uncertainty intervals at the output are calculated.
An example illustrating how uncertainty intervals at various circuit nodes are calculated is shown in Fig. 6. At the primary inputs, it is assumed that each of the inputs may carry any excitation from the set X. Thus, each input may switch hl or lh at time 0, or stay at l or h for all time. Given this information at the input of the inverter and assuming its delay to be 1 time unit, its output may switch lh or hl at time instant 1, or stay at l or h. Similarly, the NAND gate may switch at time 2 because of the second primary input, or at time 3 because of the output of the inverter, or stay at l or h for all time. In this fashion, uncertainty waveforms are propagated from one gate to another. From this example, we also notice that while each of the inputs to the NAND gate may switch at most once, its output may switch at two time points. Thus, as the uncertainty waveforms are propagated through the NAND gate, the number of time points at which the gate can switch has doubled. This multiplicative growth of the number of intervals can potentially lead to memory bottlenecks for large circuits.
In order to contain this growth, we have suggested merging neighboring intervals to form bigger intervals. Thus, if the Max_No_Hops parameter is set to 1, then we would merge intervals [2, 2] and [3, 3] to form the interval [2, 3], as shown in the figure.
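The growth in possible switching instants in the inverter and NAND example can be reproduced with a short Python sketch. It is deliberately simplified: it tracks only the time points at which a transition may occur, assumes every input may carry any excitation from X (so every input event can cause an output event), and uses a hypothetical helper `propagate_switch_times`. The NAND delay of 2 time units is inferred from the switching times quoted in the text.

```python
def propagate_switch_times(input_times, delay):
    """Possible output switching instants: every input instant shifted by the gate delay."""
    return sorted({t + delay for times in input_times for t in times})

primary_a = [0]  # each primary input may switch only at time 0
primary_b = [0]
inverter_out = propagate_switch_times([primary_a], delay=1)
nand_out = propagate_switch_times([primary_b, inverter_out], delay=2)
```

Here `inverter_out` holds the single instant 1 and `nand_out` holds the two instants 2 and 3, matching the doubling described above.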
Current Calculation
After the uncertainty waveform at the output of a gate is known, its current contribution is calculated next. Since the output of the gate could switch at any time during an uncertainty interval, a transition current waveform could be drawn from the P&G buses at any time during the interval (shifted backwards by the delay of the gate), as shown in Fig. 7. Hence, by taking an envelope of all possible transition current waveforms, we get the worst case current contribution of the gate due to the uncertainty interval. At every gate, there are two types of uncertainty intervals that result in some switching activity at the output and, therefore, there are two possible current waveforms: one due to the hl uncertainty intervals, called hlCurrent, and the other due to the lh uncertainty intervals, called lhCurrent. Since at any time the output of the gate could switch either from high to low or from low to high, by taking an envelope of the hlCurrent and lhCurrent waveforms, we get the maximum current contribution of the gate. Once all the gate currents are calculated, the current waveform at each contact point is calculated by adding the individual currents (appropriately shifted in time) of those gates that are tied to it.
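The envelope of a triangular transition current waveform slid across an uncertainty interval is a trapezoid, and can be evaluated in closed form. The Python sketch below illustrates this under stated assumptions: `start` and `end` are the interval bounds already shifted back by the gate delay, the triangle is described by a hypothetical (peak_time, peak_value, duration) triple with 0 < peak_time < duration, and, for brevity, the same triangle shape is used for hl and lh transitions.

```python
def triangle(u, peak_time, peak_value, duration):
    """Triangular transition current, u being the time since the waveform started."""
    if u <= 0.0 or u >= duration:
        return 0.0
    if u <= peak_time:
        return peak_value * u / peak_time
    return peak_value * (duration - u) / (duration - peak_time)

def interval_envelope(t, start, end, peak_time, peak_value, duration):
    """Maximum over all start times s in [start, end] of triangle(t - s)."""
    # The triangle is unimodal, so the maximum over u in [t - end, t - start]
    # is attained at the point of that range closest to the peak.
    u = min(max(peak_time, t - end), t - start)
    return triangle(u, peak_time, peak_value, duration)

def gate_current(t, hl_intervals, lh_intervals, shape):
    """Envelope of the hlCurrent and lhCurrent contributions of one gate."""
    hl = max((interval_envelope(t, a, b, *shape) for a, b in hl_intervals), default=0.0)
    lh = max((interval_envelope(t, a, b, *shape) for a, b in lh_intervals), default=0.0)
    return max(hl, lh)

shape = (1.0, 1.0, 2.0)  # peak after 1 time unit, peak value 1, duration 2
samples = [interval_envelope(t, 2.0, 3.0, *shape) for t in (2.0, 3.0, 4.0, 4.5, 5.0)]
```

For the interval [2, 3] the envelope rises from time 2 to 3, stays at the peak value until time 4, and falls back to zero at time 5. The contact point waveform is then the sum of `gate_current` over the gates tied to that contact point.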
Calculation of Voltage Drops
It is shown in the appendix that when the equivalent network of the power or ground bus is represented by a resistive network, the vector of voltage drops appearing at its nodes, V, is related to the corresponding vector of contact point currents, I, as follows:
Implementation Details
The above approach has been implemented in a program in C. In the program, the circuit is first levelized so that the output of a gate at level j does not feed any other gate at a level less than or equal to j. Any user-specified restrictions on certain inputs are then imposed, while all other inputs are assumed to take all possible excitations from the set X. After this, the circuit is analyzed in a level by level fashion, starting from the lowest level, by propagating the uncertainty waveforms at the inputs of every gate to its output. From these uncertainty waveforms, we calculate the current waveforms at the contact points, which are point-wise upper bounds on the corresponding MEC waveforms, i.e., if the current waveform calculated at any contact point by the iMax algorithm is denoted by $I_{iMax}(t)$ and the corresponding MEC waveform by $I_{MEC}(t)$, then
$$I_{iMax}(t) \ge I_{MEC}(t), \quad \forall\, t \ge 0.$$
Here $MaxVD_p$ denotes the maximum voltage drop that occurs in the bus when an input pattern p is applied to the circuit.
An important property of the iMax algorithm is that each gate is considered exactly once in the entire analysis. Further, because of the interval merging feature, the memory space requirement per gate is fixed. Therefore, the algorithm is linear in time as well as space in the number of gates in the circuit.
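The levelization step that makes a single pass over the gates possible can be sketched in Python as follows. This is a minimal illustration, not the authors' C code: the netlist is assumed to be a dictionary mapping each gate name to the names of its drivers, any name that is not a key is treated as a primary input at level 0, and the recursive formulation is chosen for brevity (very deep circuits would call for an iterative version).

```python
def levelize(fanin):
    """Assign each gate a level one greater than the highest level among its drivers."""
    level = {}

    def visit(node):
        if node not in fanin:  # primary input
            return 0
        if node not in level:  # each gate is computed once and then reused
            level[node] = 1 + max((visit(d) for d in fanin[node]), default=0)
        return level[node]

    for gate in fanin:
        visit(gate)
    return level

netlist = {"nand": ["b", "inv"], "inv": ["a"]}  # hypothetical two-gate circuit
levels = levelize(netlist)
order = sorted(levels, key=levels.get)  # process gates from the lowest level up
```

Processing gates in `order` guarantees that the uncertainty waveforms at all inputs of a gate are available before the gate itself is evaluated, so each gate is visited exactly once.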
Quality Assessment
In order to assess the quality of the solution obtained from the iMax algorithm, we need to determine how close the upper bound obtained is to the WC Max VD. One way of doing this would be to perform an exhaustive enumeration over all possible input patterns and actually calculate the WC Max VD. However, doing this is very expensive and practically impossible for circuits with more than about 10 inputs (note: $4^{10} = 1{,}048{,}576$). Therefore, the following repeated enumeration approach is used for the verification of results. In this approach, different input patterns are repeatedly applied to the circuit. For each pattern, a logic simulator is used to calculate the outputs of the various gates. From these gate outputs, the current waveforms at the contact points are calculated, as in the case of iMax. Using these current waveforms, the maximum voltage drop in the bus is calculated, as described in Section 5.5. By repeating this process for a finite number of input patterns (say V), we obtain a lower bound on the WC Max VD, as seen from the following equation (for an input pattern p, the maximum voltage drop is denoted by $MaxVD_p$):
Naturally, as more patterns are simulated, this lower bound approaches the worst case value more closely. In our experiments, for those cases where it is not possible to calculate the WC Max VD, we compare the iMax upper bound with this lower bound. The program that implements this repeated enumeration technique is called iLogSim (Current Logic Simulator).
The choice of input patterns for the above repeated enumeration process is crucial to the quality of the lower bound obtained. With a poor selection of input patterns, we may end up wasting cpu time without much improvement in the lower bound value. For the experimental results, we have tried a combination of schemes such as random selection, simulated annealing and exhaustive enumeration on a reduced input space, and have reported the best lower bound obtained.
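The random selection variant of the repeated enumeration scheme can be sketched in Python as follows. It is an illustration only, not iLogSim: `max_vd_of_pattern` stands for a hypothetical callback that simulates one pattern and returns its maximum voltage drop, and the toy model used here simply counts switching inputs so that the exact worst case is known.

```python
import random

EXCITATIONS = ("l", "h", "hl", "lh")

def lower_bound_max_vd(n_inputs, max_vd_of_pattern, n_trials, seed=0):
    """Best maximum voltage drop seen over n_trials randomly drawn input patterns."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_trials):
        pattern = tuple(rng.choice(EXCITATIONS) for _ in range(n_inputs))
        best = max(best, max_vd_of_pattern(pattern))
    return best

def toy_max_vd(pattern):
    """Hypothetical model: the drop grows with the number of switching inputs."""
    return sum(1 for e in pattern if e in ("hl", "lh"))

n = 10
exhaustive_patterns = 4 ** n  # size of the full input space
bound_small = lower_bound_max_vd(n, toy_max_vd, 200)
bound_large = lower_bound_max_vd(n, toy_max_vd, 2000)
```

Because every simulated pattern is a member of the full input space, the value returned can never exceed the true worst case (10 in this toy model), and with a fixed seed it can only grow as more patterns are tried.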
Experimental Results
In this section, we tabulate the results obtained from running the iMax and iLogSim algorithms on the power buses of several small and large circuits. Similar results can be obtained for the ground buses. For lack of real data, the bus network for each circuit was generated by randomly assigning each gate to a contact point and randomly generating links between the contact points. The network was, however, not restricted to a simple tree topology.
Table 1 lists the results of running the iMax and iLogSim algorithms on nine small circuits. These circuits have between 16 and 121 gates and between 4 and 14 inputs. The number of contact points for each circuit is also shown in the table.
In Table 2, we report similar results for the ten ISCAS-85 benchmark circuits [16]. These circuits have between 218 and 5066 gates, and all of them have at least 32 inputs. In the last two columns of the table, we document the cpu times needed by the iMax algorithm and the typical times needed for trying 10,000 input patterns by the iLogSim algorithm on a Sun SPARCstation ELC. The actual iLogSim results were obtained after trying about 100,000 input patterns. We observe that for all the circuits, the linear time iMax algorithm took only a few seconds of cpu time, compared to the several hours needed by the iLogSim algorithm. Furthermore, for most of these circuits, the ratio of the iMax upper bound to the iLogSim lower bound is less than 1.72. There are two possible reasons for this mismatch. Firstly, it is quite possible that the iLogSim lower bound is not close to the WC Max VD. Since all the circuits have at least 32 inputs, the space of possible input patterns is huge, and the lower bound obtained after trying about 100,000 input patterns may not be very close to the WC Max VD. For circuits where the input space is not so huge (Table 1), we were able to obtain WC Max VD results, and these are in good agreement with the iMax results.
The second possible source of mismatch is our conservative independence assumption for signals at various nodes. One can improve on this assumption by attempting to resolve the signal correlations, as discussed in the following sections.
We next discuss the effect of varying the Max_No_Hops parameter on the performance of iMax. Table 3 lists the iMax upper bound results for the ISCAS-85 circuits for different values of Max_No_Hops. In parentheses, we also tabulate the cpu times (in sec.) needed by the algorithm. As the value of Max_No_Hops increases, the number of intervals being merged at every node
The Signal Correlation Problem
In general, signals at internal nodes of a circuit are correlated, and this limits the number of transitions that can possibly occur at the outputs of the gates. Two examples of how signal correlation limits the number of transitions are illustrated in Fig. 8.
In Fig. 8a, signal lines x1 and x2 are correlated; in this case, they carry the same signal. It is easy to verify that, depending upon the specific excitation present at x, only one of the two gates can switch at a time. However, since iMax ignores the signal correlation between lines x1 and x2, it calculates the uncertainty sets at the outputs of the two gates as shown in the figure and thus erroneously concludes that both gates may switch at the same time. It therefore adds two transition current waveforms (due to both gates switching simultaneously) to the contact point current waveforms. Similarly, in Fig. 8b, the output of the inverter is correlated with its input, and so the NAND gate may never switch. However, ignoring this correlation, iMax concludes that the NAND gate can switch. Thus, the iMax algorithm calculates more transitions than can actually occur in a circuit. It is these kinds of approximations that contribute to a loose iMax upper bound. As is clear from these examples, the source of the signal correlation problem, in general, is a node (the output of a gate or an input) which fans out to several other gates. Such nodes are called multiple fan-out (MFO) nodes. The general situation is shown in Fig. 9, where an MFO gate G with output node n fans out to nodes $n_1, n_2, \ldots, n_k$ that in turn feed gates $G_1, G_2, \ldots, G_k$. In this figure, the inputs to the gates $G_1, G_2, \ldots, G_k$, which are $n_1, n_2, \ldots, n_k$ respectively, are correlated. Due to this correlation, even though the output of each gate $G_1, G_2, \ldots, G_k$ can assume all possible excitations as calculated by iMax, the outputs may not simultaneously carry their worst case excitations. As one goes deeper into the circuit, where these correlated outputs reconverge and feed the same gate, the inputs of that gate become correlated (e.g., the NAND gate in Fig. 8b). Such gates are called reconvergent fan-out (RFO) gates.
With correlated signals at the inputs of a gate, the number of transitions that can possibly occur at its output is reduced. The signal correlations considered so far exist among various nodes throughout the circuit and are called spatial correlations.
Besides the spatial correlations, there is another set of correlations that the iMax algorithm completely ignores. The excitation assumed by a node at time t restricts the set of possible excitations that the node can assume at an earlier or a later time. For example, if a node is low at time t, then it can either stay low or switch from high to low at time $t^-$, and it can either stay low or switch from low to high at time $t^+$. These correlations, which exist in the time domain, are called temporal correlations.
The iMax algorithm completely ignores all spatial and temporal signal correlations and, therefore, overestimates the supply currents. The advantage of ignoring correlations in the algorithm is its very desirable linear time performance.
Resolving Signal Correlations
The upper bound produced by the iMax algorithm can be made exact by doing a brute-force enumeration at the inputs of the circuit. In enumeration, since unambiguous input patterns are applied to the circuit, there is no uncertainty present at the inputs and therefore, signal correlations do not become an issue. In a similar fashion, one can improve the results of the iMax algorithm by doing a partial enumeration at a few selected nodes in the circuit.
An example of how partial enumeration helps improve the upper bound can be seen from Fig. 8a. In this circuit, with no enumeration, iMax would assume that the signal lines x1 and x2 are mutually independent and therefore infer that both the NAND and NOR gates can switch at the same time. However, if we do partial enumeration at signal line x, then we would generate four cases corresponding to x = l, x = h, x = hl and x = lh. When x = l or hl, only the NOR gate switches. Similarly, when x = h or lh, only the NAND gate switches.
Thus, by splitting the problem into four sub-problems, we have improved the result, i.e., found that only one of the two gates may switch at any given time.
While enumerating a node, we only need to process a small subset of gates that are present in its fanout cone. The fanout cone (FOC) of a node n is defined as the set of all gates that can possibly be affected by a change in excitation at the node. Thus, a gate is in the FOC of a node n if the gate is either directly fed by n or is connected to the output of a gate that is in the FOC.
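The recursive definition of the fanout cone amounts to a reachability computation over the gate-level netlist. The following is a minimal sketch of that idea, not the implementation used in the paper; the `fanout` dictionary, the function name `fanout_cone` and the small netlist are hypothetical and serve only as an illustration.

```python
from collections import deque

def fanout_cone(node, fanout):
    """Return the set of gates that a change at `node` can affect.

    `fanout` (hypothetical representation) maps a node, i.e. a primary
    input or the output of a gate named after that gate, to the gates
    it feeds directly.
    """
    cone = set()
    queue = deque(fanout.get(node, ()))
    while queue:
        gate = queue.popleft()
        if gate in cone:
            continue  # already reached through another path
        cone.add(gate)
        # Gates fed by a gate in the cone are also in the cone.
        queue.extend(fanout.get(gate, ()))
    return cone

# Hypothetical netlist: x feeds G1 and G2, which reconverge at G3;
# y feeds the unrelated gate G4.
fanout = {"x": ["G1", "G2"], "G1": ["G3"], "G2": ["G3"], "y": ["G4"]}
cone_x = fanout_cone("x", fanout)
cone_y = fanout_cone("y", fanout)
```

In this toy netlist, enumerating x requires reprocessing only G1, G2 and G3, while G4 is left untouched.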
One technique to partially enumerate the internal nodes of a circuit, called Multi-Cone Analysis (MCA), was reported in [17]. The motivation behind such an approach was to be able to enumerate at the MFO nodes, which are the sources of the signal correlation problem. The approach involves partitioning the circuit in such a fashion that each gate belongs to the FOC of at most one MFO node and then enumerating a finite number of cases at these nodes. As shown in Table 4, there are typically several MFO nodes in a circuit, and all of these nodes should be enumerated to properly resolve the signal correlation problem. From our experience with the ISCAS-85 benchmark circuits, we have found that the FOCs of several of these nodes overlap and therefore, to properly handle signal correlations, these nodes should be enumerated simultaneously. Furthermore, because of the presence of glitches in a circuit, signals at internal nodes span several time points (i.e., signal transitions occur at several time points). To take care of the temporal correlation problem, the nodes should be enumerated at each of these time points. Simultaneous enumeration is an extremely expensive process, especially when there are several nodes and each node needs to be enumerated at several time points. As an example, to enumerate two nodes simultaneously, the cpu time needed is the product of the times needed to enumerate each node separately. To avoid this multiplicative growth of cpu time, several simplifying assumptions were made in the implementation of the MCA algorithm [17]. Because of these simplifications, the algorithm led to only mild improvement in the iMax results.
From the above, it is clear that improving the iMax upper bound by enumerating internal nodes is very expensive and does not offer a practical solution for large circuits. In the next section, we present an alternative partial input enumeration approach that significantly improves the iMax results and represents a good speed-accuracy trade-off.
Partial Input Enumeration (PIE)
As shown in Table 4, there are usually many more MFO nodes than primary inputs in a circuit. Further, as discussed in section 3, all of the inputs to a circuit switch at most once, at time zero. Therefore, there is only one time point at which a primary input needs to be enumerated. This is in contrast to an internal circuit node, which usually needs to be enumerated at several time points. These observations, combined with the fact that iMax is an extremely fast algorithm, led us to explore the following partial input enumeration (PIE) algorithm to improve the iMax upper bound.
The PIE Algorithm
Let $x_1, x_2, \ldots, x_N$ be the N primary inputs of a circuit under consideration. Let $X_i$ represent the uncertainty set for input $x_i$ at time zero. The input search space for the circuit consists of the set of all valid input patterns that can be applied to the circuit. Mathematically, the input search space is $\{(e_1, e_2, \ldots, e_N) \mid e_1 \in X_1, e_2 \in X_2, \ldots, e_N \in X_N\}$. For brevity, we denote this by $(X_1, X_2, \ldots, X_N)$. We assume, without loss of generality, that for a particular input $x_i$, $X_i = X$, the full set $\{l, h, hl, lh\}$. Then the input search space $(X_1, X_2, \ldots, X_i, \ldots, X_N)$ for the circuit can be divided into four disjoint parts, namely $(X_1, X_2, \ldots, \{l\}, \ldots, X_N)$, $(X_1, X_2, \ldots, \{h\}, \ldots, X_N)$, $(X_1, X_2, \ldots, \{hl\}, \ldots, X_N)$ and $(X_1, X_2, \ldots, \{lh\}, \ldots, X_N)$. We can compute the bounds on maximum voltage drop in the bus for each of these four parts by running the iMax algorithm and, in each case, restricting the excitation on input $x_i$ to the value in its respective uncertainty subset. Since the four parts combined constitute the complete search space, by taking the maximum of the four bounds on maximum voltage drop, we can still guarantee an upper bound on the worst case maximum voltage drop in the bus. In each of the four runs of iMax, specific excitations are present at input $x_i$; therefore, signal correlations due to $x_i$ disappear, and the resulting bound on maximum voltage drop should be an improvement on the original iMax upper bound. In a similar fashion, the upper bounds for the individual subcases can be improved.
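The splitting step can be illustrated with a small sketch. The function `toy_bound` below is a hypothetical stand-in for iMax, not the real algorithm: it models two gates fed by the first input that can never switch together (one reacts to l or hl, the other to h or lh) and counts how many gates may switch, treating each gate independently. The names `FULL`, `split_on_input` and `toy_bound` are assumptions made for this illustration.

```python
# Full uncertainty set X for a primary input at time zero.
FULL = frozenset({"l", "h", "hl", "lh"})

def split_on_input(space, i):
    """Split a search space (tuple of uncertainty sets) on input i.

    Returns one sub-space per excitation in space[i]; together the
    sub-spaces cover exactly the original space.
    """
    return [space[:i] + (frozenset({e}),) + space[i + 1:]
            for e in sorted(space[i])]

def toy_bound(space):
    """Hypothetical stand-in for iMax (counts gates that may switch)."""
    x = space[0]
    nor_may_switch = bool(x & {"l", "hl"})   # assumed to react to l, hl
    nand_may_switch = bool(x & {"h", "lh"})  # assumed to react to h, lh
    return int(nor_may_switch) + int(nand_may_switch)

space = (FULL, FULL, FULL)
original = toy_bound(space)            # both gates assumed to switch
parts = split_on_input(space, 0)
# The maximum over the parts is still an upper bound for the whole space.
refined = max(toy_bound(p) for p in parts)
```

With these assumptions the bound drops from 2 (both gates switching) to 1 after enumerating the first input, which mirrors the situation described for Fig. 8a.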
The set of inputs selected for enumeration has a direct influence on the quality as well as the cost of the solution obtained. If all of the inputs are selected and enumerated, then the upper bound obtained would be exact. However, doing this is practically impossible for most circuits. The extent to which an input contributes to signal correlation inside a circuit is different for different inputs. For example, in Figure 8a, enumerating input x is more beneficial than enumerating any of the other two inputs. Similarly, in Figure 8b, enumerating input I is better than enumerating the other input. Hence, by selecting and enumerating inputs in an intelligent fashion, we can significantly improve the iMax upper bound without spending too much cpu time.
We have developed an algorithm based on the best-first search (BFS) approach [18] that is very effective in selecting and enumerating inputs and thereby improving the upper bound on maximum voltage drop. Before describing the details of the algorithm, we note that both PIE and the previous approaches mentioned in section 2 are based on search techniques. However, unlike previous approaches, which produce a meaningful result only after a long exploration of the entire input space, the PIE algorithm starts by resolving signal correlations due to those inputs which contribute the most to the problem. Therefore, significant improvements in results are observed early in the search process.
The algorithm proceeds along a conceptual search tree in which each node corresponds to a partial assignment to the primary inputs, i.e., at each node, some inputs have specified excitations (e.g., h), while others have uncertain excitations (e.g., {l, hl}). We will refer to these nodes as "s nodes" (search nodes), to avoid confusing them with circuit nodes. The process of enumerating a primary input at an s node translates, in the search tree domain, to the so-called expansion of the s node into children s nodes. After an s node has been expanded, it is dropped from consideration and its children s nodes are added to the list of s nodes yet to be explored. This list of "yet to be explored" s nodes is called a wavefront. The BFS algorithm always processes s nodes which are on its current wavefront. At the start, the wavefront consists of only one s node, namely the initial uncertain state $(X_1, X_2, \ldots, X_N)$. As the search progresses, this wavefront moves forward, as shown in Figure 10. At any time, the set of all the s nodes on the wavefront constitutes the complete input search space for the circuit, i.e., if $W_1, W_2, \ldots, W_k$ are the s nodes on the wavefront and $W = (X_1, X_2, \ldots, X_N)$ is the initial uncertain state, then

$$\text{Input Search Space} = \{p \mid p \in W\} = \{p \mid p \in W_1 \text{ or } p \in W_2, \ldots, \text{ or } p \in W_k\} \qquad (8)$$

Thus, an input pattern leading to the WC Max VD must belong to one of the s nodes on the wavefront.
An upper bound value is associated with every s node generated during the search. The value of this bound for an s node is the upper bound on maximum voltage drop obtained from the iMax algorithm for the corresponding input assignment. Further, two parameters, an upper bound (UB) and a lower bound (LB), are associated with the search. The value of UB at any stage during the search is the current best estimate on maximum voltage drop. It is the maximum value of the upper bounds of all the s nodes on the wavefront. The second parameter, LB, keeps track of the maximum value of the upper bound corresponding to all of the input patterns seen thus far. As the search progresses, the estimates on LB and UB improve. During the search, s nodes which correspond to the maximum (or best) value of upper bound are repeatedly expanded. Because of this best-first strategy, there is a gradual reduction in the upper bound on maximum voltage drop (UB). This iterative improvement is a very important feature of the algorithm for large circuits, where an exhaustive exploration of the input space is practically impossible. The PIE algorithm can be stopped at any intermediate stage and the current best UB can still be reported.
The PIE algorithm starts with the initial uncertain state and a known LB, which is the bound on maximum voltage drop for some input pattern. During the search, an s node with the best bound on maximum voltage drop is repeatedly selected and its descendant s nodes are generated by enumerating an input, as explained in the outline shown in Figure 11. Here a leaf s node is one which corresponds to an input pattern. Before explaining various functions used in this outline, an example to illustrate the algorithm follows.
In Figure 12, we show how the PIE algorithm progresses for a circuit with three inputs. Various s nodes generated during the search are shown by ovals in the figure. Upper bounds on maximum voltage drop, as obtained from iMax, for the s nodes are shown within the ovals. We assume that the uncertainty set for each of the inputs at time zero is X. Then the initial uncertain state for the circuit can be denoted by XXX. If the upper bound on maximum voltage drop corresponding to this s node is 50, then the value of UB at the start of the algorithm is 50. At this time, suppose we select the second input for enumeration. Then we would generate the four children s nodes, as shown in the figure. With their associated upper bound values as shown, the value of UB improves from 50 to 47 by this enumeration. At this stage, one could either stop with 47 as the estimate on maximum voltage drop or continue with the enumeration process. To continue, we would select s node XhlX for enumeration, as this s node has the maximum associated upper bound value. Note that by improving the upper bound value associated with this node, the overall upper bound on maximum voltage drop (UB) can be improved. At this s node, if we select the first input for enumeration, then we would generate four more s nodes, as shown. With this enumeration, the value of UB improves to 45. At this stage, the current wavefront consists of the following s nodes: XlX, XhX, l hlX, h hlX, hl hlX, lh hlX and XlhX. To continue further, we would select s node XlX for enumeration, and so on. We now explain the stopping, pruning and splitting criteria mentioned in the above outline.
Stopping criterion
We stop the search when any one of the following two conditions is satisfied.
(a) $UB \le LB \times ETF$.
(b) Number of s nodes generated $\ge$ user-specified parameter Max No Nodes.
The Error Tolerance Factor (ETF) is a user-specified parameter that provides control over the final desired accuracy of the algorithm. The value of this parameter is always at least 1. The first condition above specifies that when the UB value is within the ETF factor of some known LB, then the search can be terminated. In large circuits, where calculating an exact solution by running the search to completion is extremely expensive and an overestimation by 30 to 40% is often acceptable, such a parameter can be useful. The second condition puts a hard limit, Max No Nodes, on the number of s nodes that are to be generated during the search.
Pruning criterion
During the search, if we come across an s node for which the associated upper bound value satisfies the condition

$$\text{upper bound}(\text{s node}) \le LB \times ETF,$$

then such an s node can be deleted from the search, as its upper bound value is already acceptable. This pruning criterion deletes unnecessary s nodes during the search and thus keeps the memory usage down.
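The search loop, together with the stopping and pruning criteria described above, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the outline of Figure 11: `toy_bound` is a hypothetical stand-in for iMax on a two-input toy circuit, inputs are enumerated in a fixed, user-supplied `order`, the initial state is assumed to have at least one uncertain input, and all function and parameter names are invented for this example.

```python
import heapq
from itertools import count

FULL = frozenset({"l", "h", "hl", "lh"})

def toy_bound(space):
    # Hypothetical stand-in for iMax: number of gates that may switch.
    x0, x1 = space
    return (bool(x0 & {"l", "hl"})      # gate assumed to react to l, hl on x0
            + bool(x0 & {"h", "lh"})    # gate assumed to react to h, lh on x0
            + bool(x1 & {"hl", "lh"}))  # gate assumed to react to transitions on x1

def pie_search(space, bound, order, lb, etf=1.0, max_nodes=1000):
    """Best-first partial input enumeration; returns (UB, LB, s nodes generated)."""
    tie = count()  # tie-breaker so the heap never compares search spaces
    wavefront = [(-bound(space), next(tie), space)]  # max-heap via negation
    generated = 1
    accepted = 0.0  # largest bound among pruned and leaf s nodes
    while wavefront:
        ub = max(-wavefront[0][0], accepted)
        # Stopping criterion: (a) UB <= LB * ETF, or (b) node limit reached.
        if ub <= lb * etf or generated >= max_nodes:
            return ub, lb, generated
        _, _, node = heapq.heappop(wavefront)  # s node with the largest bound
        i = next(j for j in order if len(node[j]) > 1)  # next uncertain input
        for e in sorted(node[i]):
            child = node[:i] + (frozenset({e}),) + node[i + 1:]
            b = bound(child)
            generated += 1
            if all(len(s) == 1 for s in child):
                lb = max(lb, b)              # leaf: bound of one input pattern
                accepted = max(accepted, b)
            elif b <= lb * etf:
                accepted = max(accepted, b)  # pruning criterion
            else:
                heapq.heappush(wavefront, (-b, next(tie), child))
    return max(accepted, lb), lb, generated

# LB is initialised from one known input pattern, here (h, h).
start_lb = toy_bound((frozenset({"h"}), frozenset({"h"})))
result = pie_search((FULL, FULL), toy_bound, order=[0, 1], lb=start_lb)
```

For this toy circuit the initial bound is 3, and the sketch stops with UB = LB = 2 after generating 9 s nodes, well before all 16 input patterns have been visited.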
Splitting Criterion and Experimental Results
The splitting criterion (SC) specifies the input which should be enumerated next from any s node during the search. We now describe two heuristics for the SC that have shown good results in practice. The first heuristic selects an input which has the highest sensitivity, while the second one selects an input based upon the heuristically determined influence it has inside the circuit.
$H_1$ heuristic
Let us suppose that during the search, we are at a particular s node n and we select an input $x_i$ for enumeration. Without loss of generality, we assume that the uncertainty set for $x_i$ at time zero is X. Then by enumerating $x_i$, we would generate four children s nodes, as shown in Figure 13. We assume that the upper bound associated with s node n is denoted by $\mathit{bound}_n$ and the upper bounds associated with the children s nodes are denoted by $\mathit{bound}_l$, $\mathit{bound}_h$, $\mathit{bound}_{hl}$ and $\mathit{bound}_{lh}$ respectively. If we denote

$$\Delta\mathit{bound}_i = \mathit{bound}_n - \max\{\mathit{bound}_l, \mathit{bound}_h, \mathit{bound}_{hl}, \mathit{bound}_{lh}\}, \qquad (9)$$

then by enumerating input $x_i$ at s node n, we can improve its associated bound by an amount $\Delta\mathit{bound}_i$. This calculation can be repeated for every input (one at a time, while the uncertainty sets of the other inputs are not perturbed), and an input that gives rise to the maximum improvement in the bound can be selected, i.e.,

$$\text{find } k \text{ such that } \Delta\mathit{bound}_k \ge \Delta\mathit{bound}_i, \quad 1 \le i \le N, \; i \ne k. \qquad (10)$$

However, if $\Delta\mathit{bound}_i$ is zero for all of the inputs, which occurs very often in practice, then the above selection process would not work well. For a specific input $x_i$, $\Delta\mathit{bound}_i = 0$ means that the bound associated with at least one of its children s nodes is equal to $\mathit{bound}_n$. However, for the remaining children s nodes, the upper bound values may not be equal to $\mathit{bound}_n$, and this information can be used in the selection of the best input. Based on these observations, we have come up with the following $H_1$ heuristic function for assigning a value to an input $x_i$:
$$H_1(x_i) = A(\mathit{bound}_n - \mathit{bound}_1) + B(\mathit{bound}_n - \mathit{bound}_2) + C(\mathit{bound}_n - \mathit{bound}_3) + (\mathit{bound}_n - \mathit{bound}_4)$$

where $\mathit{bound}_1$, $\mathit{bound}_2$, $\mathit{bound}_3$ and $\mathit{bound}_4$ are the bounds associated with the children s nodes generated by enumerating $x_i$, arranged in decreasing order. A, B and C are three constants such that $A \ge B \ge C \ge 1$. At any s node during the search, we compute the heuristic values for all of the inputs and select an input with the maximum associated heuristic value. This splitting criterion is called the dynamic $H_1$ splitting criterion because, at each s node, it calculates the heuristic values for all the inputs and then selects the best input for enumeration.

The results of partial input enumeration using the PIE algorithm with the dynamic $H_1$ splitting criterion for the nine small circuits are documented in Table 5. For the table, the iMax algorithm with Max No Hops = 10 was used. For all the circuits, the search was run to completion (footnote 5), i.e., until UB became equal to LB (ETF = 1). The results clearly show that the PIE algorithm is very efficient in scanning the input space. As an example, the last circuit in the table (Alu) has 14 inputs; therefore, the number of possible input patterns for this circuit is $4^{14}$ = 268,435,456. The PIE algorithm was able to scan the entire search space after generating just 57 s nodes. Since the iMax upper bound is used to guide the search, the table also indicates that the bound produced by the iMax algorithm is very tight for these circuits.
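A small numerical sketch shows how the $H_1$ function separates inputs that a plain comparison of the improvement in the maximum child bound cannot distinguish. The weights A = 4, B = 3, C = 2 and all bound values below are hypothetical choices made only for illustration; the sketch also assumes that the enumerated input has the full uncertainty set, so that exactly four children exist.

```python
def h1_value(bound_n, child_bounds, A=4.0, B=3.0, C=2.0):
    """H1 value of an input, given the bound of the parent s node and
    the bounds of its four children (hypothetical weights A, B, C)."""
    # Arrange the child bounds in decreasing order: b1 >= b2 >= b3 >= b4.
    b1, b2, b3, b4 = sorted(child_bounds, reverse=True)
    return (A * (bound_n - b1) + B * (bound_n - b2)
            + C * (bound_n - b3) + (bound_n - b4))

bound_n = 50
# Hypothetical child bounds for two candidate inputs. For both inputs the
# largest child bound equals bound_n, so the improvement in the maximum
# bound is zero and cannot be used to rank them.
candidates = {"a": [50, 40, 40, 40], "b": [50, 50, 50, 48]}
scores = {name: h1_value(bound_n, kids) for name, kids in candidates.items()}
best = max(scores, key=scores.get)
```

Here input a receives the larger value because three of its four children have clearly smaller bounds, so it would be enumerated first.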
As can be seen from Table 5, the number of iMax runs needed in the dynamic splitting criterion far exceeds the number of s nodes generated. At any s node, to calculate the $H_1$ heuristic value for a particular input $x_i$, we need to run the iMax algorithm $|X_i|$ times, where $|X_i|$ is the number of elements in the uncertainty set. If the s node has k inputs which are possible candidates for enumeration (i.e., their $|X_i| > 1$), then the iMax algorithm will be run $\sum_{i=1}^{k} |X_i|$ times to find the best input to enumerate next. For bigger circuits, with a large number of inputs, this time will be even more dominant, rendering the PIE algorithm prohibitively expensive. Therefore, we have experimented with other, less expensive alternatives. Instead of calculating the $H_1$ heuristic values for all the inputs at every s node during the search, the heuristic value for every input is calculated at the beginning of the search. All of the inputs are arranged in decreasing order of their heuristic values. During the search, at every s node, inputs are selected in this fixed order. This criterion is called the static $H_1$ splitting criterion. The amount of time spent in the static splitting criterion is fixed and is equal to $\sum_{i=1}^{N} |X_i|$ runs of the iMax algorithm for a circuit with N inputs. The results of the PIE algorithm using the static $H_1$ splitting criterion are also summarized in Table 5. With the static splitting criterion, the number of runs of the iMax algorithm required in the splitting criterion goes down, but the number of s nodes generated during the search goes up for some circuits. However, for all of the circuits, an overall reduction in the cpu times required by the algorithm to finish is observed.

Footnote 5: This is how the WC Max VD results were obtained for circuits in Tables 1
$H_2$ heuristic
The number of gates that are affected by a change in excitation at an input is a good heuristic measure of how much influence the input has on the upper bound current waveforms at the contact points and thus on the maximum voltage drop. Inputs which affect more gates (i.e., which have larger FOCs) should be enumerated before others. This leads us to another static splitting criterion, $H_2$, in which the sizes of the FOCs associated with all the inputs are calculated. As with $H_1$, all of the inputs are arranged in decreasing order of $H_2$ values (i.e., FOC sizes) and during the search, at every s node, inputs are selected in this fixed order. We will show that, while both static $H_1$ and $H_2$ give good results in practice, $H_2$ is much better in terms of speed and has accuracy comparable to $H_1$.
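The static $H_2$ ordering can be sketched in a few lines. This is only an illustration under assumptions: the netlist is represented by a hypothetical `fanout` dictionary mapping each node to the gates it feeds directly, and the function names and the example circuit are invented for this sketch.

```python
def foc_size(node, fanout):
    """Number of gates in the fanout cone of `node`."""
    seen, stack = set(), list(fanout.get(node, ()))
    while stack:
        gate = stack.pop()
        if gate not in seen:
            seen.add(gate)
            stack.extend(fanout.get(gate, ()))
    return len(seen)

def h2_order(inputs, fanout):
    """Primary inputs sorted by decreasing fanout cone size (static H2)."""
    return sorted(inputs, key=lambda x: foc_size(x, fanout), reverse=True)

# Hypothetical circuit: a reaches G1 and G4; b reaches G1, G2 and G4;
# c reaches only G3.
fanout = {"a": ["G1"], "b": ["G1", "G2"], "c": ["G3"],
          "G1": ["G4"], "G2": ["G4"], "G3": []}
order = h2_order(["a", "b", "c"], fanout)
```

Because the ordering depends only on the circuit structure, it is computed once before the search and requires no runs of the bounding algorithm.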
The results of the PIE algorithm using both the $H_1$ and $H_2$ static splitting criteria for the ISCAS-85 benchmark circuits are shown in Table 6. In the tables, under the various iMax and PIE columns, we show the ratio of the respective upper bound to the lower bound obtained from iLogSim. The numbers in parentheses under the PIE columns indicate the number of s nodes that were generated before stopping the search (i.e., the Max No Nodes parameter; 1k stands for 1000). Total cpu times required by the algorithm on a SUN SPARCstation ELC with Max No Nodes = 100 are also shown in the tables. From the tables, we note that for most of the circuits, the PIE algorithm leads to some improvement in results, as is reflected by the ratio. This ratio can be further improved by running the PIE algorithm for longer durations. We emphasize that, since we can only compare the upper bound to a lower bound, the numbers in the table are only upper bounds on the error. It is prohibitively expensive to measure the

We also emphasize the following attractive property of the algorithm: a significant amount of improvement in the upper bound occurs in the first few s nodes (about 50-100) of the algorithm. This is shown in Figure 14 for c3540, where the ratio of the upper bound to the lower bound is plotted as a function of cpu time for the first 1000 s nodes. The figure also indicates that our heuristics are working well to select the most critical s nodes first. Similar behavior is observed for most other circuits.
The cpu time needed for generating the input list by the $H_2$ splitting criterion is negligible compared to the time needed by the $H_1$ criterion. For VLSI circuits with several hundred inputs, where the time needed by the $H_1$ criterion may be large, the $H_2$ criterion may be used instead. As can be seen from Table 6 (also see Table 7), the results produced by using either splitting criterion are quite comparable, especially for those circuits where iMax did not produce a good upper bound.
In order to demonstrate the applicability of the PIE algorithm to large circuits with several thousand gates, we have also experimented with the ISCAS-89 benchmark circuits [19]. For these synchronous sequential circuits, we have extracted the combinational blocks by deleting the flip-flops. These combinational blocks have gate counts ranging up to 27,400 and numbers of inputs (primary inputs and D flip-flops) ranging up to 1750. The results of the PIE algorithm on these circuits using both the $H_1$ and $H_2$ splitting criteria are summarized in Table 7. Similar improvements in results, as for those of Table 6, are observed. It is clear from the table that even for circuits of this size, our algorithms show good speed and accuracy performance.
Conclusions and Future Work
In this paper, we have proposed a linear time algorithm, iMax, that computes maximum currents in the supply lines. Most of the previous algorithms for maximum current estimation suffer from exponential complexity and are not adequate for large circuits. Our approach avoids exponential complexity by adopting a pattern-independent approach. The results produced by the algorithm are within acceptable bounds for most circuits. We have also presented a new partial input enumeration algorithm that partially resolves the signal correlations and further improves the upper bound obtained from iMax. The algorithm is based on the best-first search (BFS) technique and represents a good time-accuracy trade-off. The PIE algorithm involves a search procedure, but this search need not be carried too deep to obtain good results. The algorithm is quite applicable to VLSI circuits, as is demonstrated by the experimental results on circuits with up to 27,400 gates. In our future research, we plan to extend the study to include better gate delay and current models and to identify troublesome voltage drop sites in supply lines, using RC models, from the maximum current estimates.

At some arbitrary time $t \ge 0$, let us assume that the node voltage at node i is zero ($v_i(t) = 0$), while all other node voltages are nonnegative ($v_j(t) \ge 0$, $1 \le j \le n$, $j \ne i$). The differential equation corresponding to node i can be written as
