Excessive instantaneous power consumption in VLSI circuits may reduce the reliability and performance of VLSI chips. Hence, to synthesize circuits with high reliability, it is imperative to efficiently obtain a precise estimate of the maximum power dissipation. However, because the problem is inherently input-pattern dependent, an exhaustive search is intractable for circuits with a large number of primary inputs. The practical approach is therefore to generate a tight lower bound and an upper bound for the maximum power dissipation within a reasonable amount of CPU time. In this paper, instead of using traditional simulation-based techniques, we propose a novel approach that obtains a lower bound of the maximum power consumption using Automatic Test Generation (ATG) techniques. Experiments with MCNC and ISCAS-85 benchmark circuits show that our approach generates lower bounds of a quality that cannot be achieved using simulation-based techniques. In addition, a Monte Carlo based technique to estimate maximum power dissipation is described. It not only serves as a comparison for our ATG approach, but also provides a metric to measure the quality of a lower bound from a statistical point of view.
Introduction
With the demand for high reliability in today's VLSI design, it is essential to accurately estimate the maximum power consumption during the synthesis of VLSI circuits. Peak power dissipation can have a large impact on reliability, and hence proper design guidelines should be considered for highly reliable systems. For a CMOS circuit, the power dissipation is mainly due to switching activity that charges and discharges the capacitances at the internal and output nodes of the circuit. Accurately estimating the maximum power consumption of a CMOS circuit involves exhaustively searching for two consecutive binary input vectors that induce as many switchings as possible. Unfortunately, the problem is NP-complete and the time complexity of this search is O(4^n), where n is the number of primary inputs of the circuit, since there are 2^n choices for each of the two vectors. Hence, for circuits with a large number of primary inputs, the only practical way to solve the problem is to generate a tight lower bound and an upper bound for the maximum power consumption such that the range between the two bounds is as narrow as possible.
Several approaches have been proposed to estimate the maximum power consumption of CMOS circuits. In [4], Devadas et al. formulated the power dissipation of CMOS circuits as a Boolean function in terms of the primary inputs and tried to maximize this function by solving a weighted max-satisfiability problem. During the branch-and-bound process that maximizes the objective function, the lower bound can be improved successively. However, the time needed to obtain the objective function and the time needed to optimize it are both exponential in the number of primary inputs (PI's). Hence, this approach is only feasible for small circuits. In [5], [6], Kriplani et al. addressed the problem of determining an upper bound for the maximum power dissipation. They first generate an upper bound by propagating the signal uncertainty through the circuit. The bound is then successively tightened by considering the spatial correlation of signals at the outputs of logic gates. In this approach, the time to obtain the initial upper bound is linear in the number of gates, whereas the branch-and-bound strategy used to improve the upper bound is CPU time intensive.
To generate a lower bound, traditional simulation-based approaches search for two consecutive binary input vectors that maximize the instantaneous power dissipation. This process is CPU time intensive. Furthermore, for a circuit with a large number of PI's, simulation tends to generate a loose lower bound.
Motivated by these observations, we propose a novel approach to estimate maximum power using the techniques of Automatic Test Generation (ATG) for stuck-at faults. This technique generates lower bounds of good quality. In CMOS circuits, the power dissipation is dominated by the dynamic power consumption [1], and hence the instantaneous power dissipation due to two consecutive binary input vectors is proportional to:
$$P_i = \sum_{\text{all gates } g} T(g)\, C(g) \qquad (1)$$
where C(g) denotes the output capacitance of gate g, and T(g) is a binary variable that indicates whether gate g switches in response to the two input vectors: T(g) equals 1 if gate g switches and 0 if it does not. To maximize Pi efficiently, we sort the gates by output capacitance C(·) in non-increasing order, and then assign transitions to gates (i.e. let T(·) = 1) from the sorted list, starting with the gates that have the largest output capacitance.
To justify the transitions (i.e. to see whether T(·) = 1 is achievable), we use a modified version of the justification mechanism of the 9-V algorithm [9] (an ATG algorithm for stuck-at faults), which was originally used to justify the fault-propagation paths in combinational circuits. Experimentally, the execution time of our algorithm is approximately proportional to the number of gates in the circuit, and is comparable to the time needed to simulate the circuit once. In addition, experiments show that the quality of the bounds is superior to the results obtained from simulation.
To measure the quality of the lower bounds (of peak power), we propose a simulation-based approach that estimates the maximum power by a Monte Carlo based technique. During the circuit simulation, we monitor the maximum power while estimating the mean and deviation of the power. The quality of the simulated maximum power (a lower bound of the peak power) can then be measured from a statistical point of view after the simulation.
The paper is organized as follows. Section 2 illustrates our approach to estimating the maximum instantaneous power dissipation. Section 3 discusses how to measure the quality of a lower bound by Monte Carlo simulation. Section 4 presents the implementation and experimental results. Finally, conclusions are given in Section 5.
2 Maximum Power Estimation: the ATG Approach
In CMOS circuits, the capacitive load at a gate output can be approximated by the fanout factor of the gate [10]. Hence, the power dissipation due to two consecutive input vectors is given by:
$$P_i = \sum_{\text{all gates } g} \bigl(g(V_1) \oplus g(V_2)\bigr)\, F(g) \qquad (2)$$
where V1 and V2 denote two consecutive binary input vectors to the circuit, g(·) denotes the Boolean function of gate g in terms of the PI's, and F(g) denotes the number of fanouts of gate g. Pi is a measure of the instantaneous power dissipation due to two consecutive input vectors, and is the objective function to be maximized.
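The objective Pi can be read as a fanout-weighted count of the gates whose outputs differ between V1 and V2. The following is a minimal sketch of that evaluation together with the exhaustive O(4^n) search it would otherwise require. The netlist, the gate names, and the convention that a gate driving no other gate counts as fanout 1 are illustrative assumptions, not taken from the paper.

```python
from itertools import product

# Hypothetical 3-input netlist (not from the paper): name -> (gate type, fanins).
# Gates are listed in topological order; a, b, c are the primary inputs.
INPUTS = ["a", "b", "c"]
NETLIST = {
    "n1": ("AND", ["a", "b"]),
    "n2": ("OR", ["b", "c"]),
    "n3": ("XOR", ["n1", "n2"]),
    "n4": ("NAND", ["n1", "c"]),
}
OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
    "NAND": lambda x, y: 1 - (x & y),
}

def fanout_counts(netlist):
    """F(g): number of gate inputs driven by g.
    Assumption: a gate that drives no other gate counts as fanout 1."""
    counts = {g: 0 for g in netlist}
    for _, fanins in netlist.values():
        for f in fanins:
            if f in counts:
                counts[f] += 1
    return {g: max(c, 1) for g, c in counts.items()}

def evaluate(netlist, vector):
    """Logic values of all nodes for one input vector (2-input gates only)."""
    values = dict(zip(INPUTS, vector))
    for g, (kind, fanins) in netlist.items():
        values[g] = OPS[kind](values[fanins[0]], values[fanins[1]])
    return values

def power(netlist, v1, v2):
    """Sum over gates of (g(V1) XOR g(V2)) * F(g)."""
    f = fanout_counts(netlist)
    s1, s2 = evaluate(netlist, v1), evaluate(netlist, v2)
    return sum((s1[g] ^ s2[g]) * f[g] for g in netlist)

def exhaustive_max(netlist):
    """Try all 2^n * 2^n vector pairs; only viable for very small n."""
    vectors = list(product([0, 1], repeat=len(INPUTS)))
    return max(power(netlist, v1, v2) for v1 in vectors for v2 in vectors)
```

In this toy circuit the XOR gate cannot switch when both of its fanins switch, so not every gate can toggle at once; this is the kind of dependency that makes the maximization hard.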
To maximize Pi, instead of searching for appropriate V1 and V2, we greedily assign transitions to the gates with large fanout. The gates are first sorted by fanout number in non-increasing order. Then, in each iteration, we select the untried gate g that currently has the largest fanout number F(g) and try to justify the assignment g(V1) ⊕ g(V2) = 1. The justification mechanism in our algorithm includes two processes, backtracing and implication, which are discussed in Sections 2.1 and 2.2 respectively. If the justification for assigning a transition to a gate fails, the state of the circuit is recovered from the incorrect decision, and the next gate is tried. The gates in the circuit are assigned and justified one by one until all gates have been processed.
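The greedy loop can be sketched as follows. This is only an illustration under strong assumptions: the toy circuit is hypothetical, and the justification step is replaced by a brute-force check over all vector pairs, which stands in for (and is far more expensive than) the backtracing and implication mechanism described in the paper.

```python
from itertools import product

# Hypothetical toy circuit: gate -> (function of the inputs (a, b, c), fanout F(g)).
GATES = {
    "n1": (lambda a, b, c: a & b, 2),
    "n2": (lambda a, b, c: b | c, 1),
    "n3": (lambda a, b, c: (a & b) ^ (b | c), 1),
    "n4": (lambda a, b, c: 1 - ((a & b) & c), 1),
}
N_INPUTS = 3

def justifiable(required):
    """Brute-force stand-in for justification: does some pair (V1, V2)
    make every gate in `required` switch?"""
    vectors = list(product([0, 1], repeat=N_INPUTS))
    return any(
        all(GATES[g][0](*v1) != GATES[g][0](*v2) for g in required)
        for v1 in vectors
        for v2 in vectors
    )

def greedy_lower_bound():
    """Assign transitions in non-increasing fanout order, keeping a gate
    only if its transition is compatible with those already accepted."""
    order = sorted(GATES, key=lambda g: GATES[g][1], reverse=True)
    accepted = []
    for g in order:
        if justifiable(accepted + [g]):
            accepted.append(g)
        # Otherwise the gate is skipped and the next gate is tried.
    return accepted, sum(GATES[g][1] for g in accepted)
```

Because every accepted transition is realized by at least one vector pair, the returned sum is a valid lower bound on the maximum of the fanout-weighted objective for this toy circuit.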
To recover the state of the circuit from incorrect decisions, we store all the values that have been either assigned or implied to the gates in the circuit. To implement this, we associate each gate g with a stack that stores all the composite logic values a/b that have been assigned to g (a and b denote g(V1) and g(V2) respectively). The variables a and b can be 1, 0 or u (unknown value). At each gate, the top of the stack stores the most recent value of the gate.
For example, in Figure 1, the tops of the stacks of gates x, y and z are 0/u, u/u and u/1 respectively. After a rising transition (0/1) is assigned to x, y(V2) is forced to be 1, while y(V1) is still unknown. Hence, u/1 is pushed onto the stack of y as gate y's current value. The most recent values of x, y and z then become 0/1, u/1 and u/1 respectively, and x and y are the gates whose values have changed. If it is determined later that assigning 0/1 to x causes conflicts, the stacks of gates x and y are popped to recover the state of the circuit from the incorrect decision.
To make a gate switch, either 0/1 or 1/0 can be tried, provided the current value of the gate is not in conflict with the new assignment. We say two composite values are in conflict if one has a 0 and the other a 1 at the same position. For example, 1/0 cannot be assigned to gate x in Figure 1, since 1/0 is in conflict with 0/u.
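The per-gate value stacks and the conflict rule can be illustrated with a short sketch. The class and function names below (`Gate`, `assign`, `undo`, the `trail` list) are hypothetical, and the implication that forces y to u/1 is applied by hand rather than derived from a circuit.

```python
U = "u"  # unknown value

def in_conflict(v, w):
    """Two composite values conflict if one has 0 and the other 1
    at the same position."""
    return any(a != b and U not in (a, b) for a, b in zip(v, w))

def merge(v, w):
    """Combine two non-conflicting composite values, keeping known bits."""
    return tuple(b if a == U else a for a, b in zip(v, w))

class Gate:
    def __init__(self, name, value=(U, U)):
        self.name = name
        self.stack = [value]  # top of the stack = most recent value

    @property
    def value(self):
        return self.stack[-1]

def assign(gate, new_value, trail):
    """Push the refined value on the gate's stack; return False on conflict."""
    if in_conflict(gate.value, new_value):
        return False
    gate.stack.append(merge(gate.value, new_value))
    trail.append(gate)  # remember which stacks to pop if the decision fails
    return True

def undo(trail):
    """Recover the circuit state by popping every stack touched by a decision."""
    while trail:
        trail.pop().stack.pop()
```

Replaying the example with x at 0/u and y at u/u: assigning 0/1 to x and then pushing the implied u/1 on y changes both stacks, and `undo` restores 0/u and u/u.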
Backtracing
To justify the transition assigned to a gate, the backtracing process generates a path for propagating the transition toward the PI's. The implication process then justifies the path by computing and checking the values of all implied gates.
The transition assigned to the output of a gate is justified if either one of the following paths can be generated:
1. The primary path: a path for backtracing the transition from the assigned gate to a PI whose composite value is not in conflict with the transition.
2. The secondary path: a path for backtracing the transition from the assigned gate to a primary path that was generated for justifying another transition.
For example, in Figure 2, path 1 is a primary path for the transition at g. We can also backtrace the transition at g to the primary path w, which was generated for justifying the transition at the output of gate x. Consequently, a secondary path (path 2) can also be generated. Either path 1 or path 2 justifies the transition at gate g.
The justification process in our algorithm is complete, i.e. it can exhaustively search for a primary or secondary path to justify a transition. If the process returns FAIL for both the rising and the falling transition at a gate output, it is impossible for that signal to switch under the current state of the circuit. The justification process consists of two processes: backtracing and implication. The backtracing process makes decisions about how to justify a transition, and incorrect decisions made by backtracing are detected by the implication process. For example, in Figure 2, if fanin g3 is chosen for backtracing the transition 1/0 at gate g, gate h will be implied to have the value 0/u, which is in conflict with its current value 1/u.
In MAXP, the backtracing process is implemented by a recursive subroutine Backtrace(). To justify an assigned transition, it backtraces the transition from the assigned gate toward the PI's, gate by gate. At each gate the transition passes through, if there are several fanins not in conflict with the transition, Backtrace() computes the priority of these fanins. The fanins are then sorted by priority to decide the order in which they will be tried. We define the priority pr(·) of a fanin gi as:
$$pr(g_i) = \alpha\, F(g_i) - \beta\, \mathrm{level}(g_i) \qquad (3)$$
where α and β denote two constants that can be specified by the user before executing MAXP, F(gi) denotes the number of fanouts of gi, and level(gi) denotes the level of gate gi. The level of a gate is defined as 1 + (the maximum level of the gate's fanins), and the level of the PI's is defined as 0. To increase ... In Figure 2, the priorities of g1, g2 and g3 are 1.4, 1.4 and 1.2 respectively. That means g1 and g2 are currently the most favored fanins, and hence Backtrace() will try to backtrace the transition through g1 or g2 first.
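A small sketch of the level and priority computation may help. It assumes the priority has the form α·F(gi) − β·level(gi), i.e. it reads the operator between the two terms as a minus sign, and it uses a hypothetical netlist (not the circuit of Figure 2) chosen so that the fanins of gate g receive priorities 1.4, 1.4 and 1.2 for (α, β) = (0.8, 0.2).

```python
def levels(fanins, primary_inputs):
    """level(PI) = 0; level(gate) = 1 + maximum level of its fanins."""
    level = {pi: 0 for pi in primary_inputs}

    def visit(g):
        if g not in level:
            level[g] = 1 + max(visit(f) for f in fanins[g])
        return level[g]

    for g in fanins:
        visit(g)
    return level

def fanout(fanins):
    """F(g): number of gate inputs driven by each node."""
    count = {}
    for g, ins in fanins.items():
        count.setdefault(g, 0)
        for f in ins:
            count[f] = count.get(f, 0) + 1
    return count

def priority(g, count, level, alpha=0.8, beta=0.2):
    # Assumed form of the priority: alpha * F(g) - beta * level(g).
    return alpha * count[g] - beta * level[g]

def order_fanins(g, fanins, count, level, alpha=0.8, beta=0.2):
    """Fanins of g, most favored first (ties keep their original order)."""
    return sorted(
        fanins[g],
        key=lambda f: priority(f, count, level, alpha, beta),
        reverse=True,
    )

# Hypothetical netlist: gate -> list of fanins; a, b, c, d are primary inputs.
FANINS = {
    "g1": ["a", "b"],
    "g2": ["c", "d"],
    "g3": ["g1", "c"],
    "g": ["g1", "g2", "g3"],
    "h": ["g2", "g3"],
}
LEVEL = levels(FANINS, ["a", "b", "c", "d"])
COUNT = fanout(FANINS)
```

With these settings, a fanin is favored when it has many fanouts and lies close to the primary inputs, so `order_fanins("g", FANINS, COUNT, LEVEL)` tries g1 and g2 before g3.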
Implication
In stuck-at testing, the implication process is used to verify the decisions made by the backtracing process. In MAXP, the implication process works as in the 9-V algorithm, except in its handling of j-Frontiers. In the course of test generation, there are some gates whose outputs are known but not implied by their inputs (for example, a 2-input AND gate that has the output 0 and two unknown inputs). The j-Frontier is the set of all such gates in the circuit. In the 9-V algorithm, after a path for propagating the fault to a primary output has been built, the path is immediately justified by eliminating the j-Frontier. In our approach, however, the j-Frontier left by a successful backtracing is not eliminated immediately, because justifying the transition assigned to one gate may reduce the possibility of assigning transitions to the others. To maximize the number of transitions we can possibly obtain, j-Frontiers are not justified immediately after each assignment in MAXP. Instead, they are justified automatically while other transitions are assigned.
In Figure 3, for example, g1 is an unjustified j-Frontier gate, and it can be justified automatically when a falling transition is assigned to g2. If g1 were justified at the time it was implied, an incorrect decision (I1 = 1/1, I2 = 0/1) might be made that prohibits the transition at g2.
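The definition of a j-Frontier gate (output known, but not implied by the current input values) can be checked with three-valued logic. The sketch below is a simplification under stated assumptions: it handles a single time frame rather than composite a/b values, supports only AND and OR gates, and uses a hypothetical gate table.

```python
U = "u"  # unknown value

def and3(inputs):
    """Three-valued AND over 0, 1 and unknown."""
    if 0 in inputs:
        return 0
    if all(v == 1 for v in inputs):
        return 1
    return U

def or3(inputs):
    """Three-valued OR over 0, 1 and unknown."""
    if 1 in inputs:
        return 1
    if all(v == 0 for v in inputs):
        return 0
    return U

EVAL = {"AND": and3, "OR": or3}

def is_j_frontier(kind, output, inputs):
    """True if the output is known but the inputs do not yet imply it."""
    return output != U and EVAL[kind](inputs) == U

def j_frontier(gates):
    """gates: name -> (gate type, output value, list of input values)."""
    return {
        name
        for name, (kind, out, ins) in gates.items()
        if is_j_frontier(kind, out, ins)
    }

# Hypothetical snapshot of a partially assigned circuit.
SNAPSHOT = {
    "g1": ("AND", 0, [U, U]),  # output 0 with two unknown inputs: unjustified
    "g2": ("AND", 0, [0, U]),  # output 0 already implied by the 0 input
    "g3": ("OR", 1, [U, 0]),   # output 1 not yet implied
    "g4": ("OR", U, [U, U]),   # output unknown: not in the j-Frontier
}
```

Here `j_frontier(SNAPSHOT)` contains g1 and g3; a later assignment that sets one input of g1 to 0 would remove it from the set without any explicit justification step.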
However, the unjustified j-Frontiers may be in conflict. For example, in Figure 4, it is impossible to assign I1, I2 and I3 consistently, where g1, g2 and g3 are j-Frontier gates. Since assigning transitions to the gates in a different order may result in different j-Frontiers, one possible solution to the problem is to change the assigning order and redo the whole process. The procedure is repeated until ... O(2 n!), where n is the number of gates in the circuit. Hence, in MAXP, the problem is solved in a different way. We incorporate a justifying mechanism with an adjustable criterion of justification into the implication process. In the algorithm, a justification index Jx controls the criterion for justifying an assigned transition. Unless the maximum justification index is used, all transitions are only partially justified, i.e. the j-Frontiers generated by two transitions may be in conflict even if they have passed the consistency check.
In MAXP, the implication process justifies the j-Frontiers ...
2. Sim = 1 denotes a forced assigning mode for simple gates. For a j-Frontier gate, a favored unknown fanin is selected to be assigned the controlling value. For example, in circuit (a) of Figure 5, one of the unknown fanins of gate A (B) will be chosen to have the value u/0.
3. Xor = 0 denotes the normal implication mode for XOR and XNOR gates. For example, in circuit (b) of Figure 5, the two fanins of gate P are left unassigned.
4. Xor = 1 denotes a forced assigning mode for XOR and XNOR gates. The two unknown fanins of a j-Frontier gate will be assigned if either one of the fanins has reconvergent fanout. In circuit (b) of Figure 5, the two fanins of gate P will be left unassigned in this mode, since they do not have reconvergent fanout.
5. Xor = 2 also denotes a forced assigning mode for XOR and XNOR gates. In this mode, the two unknown fanins of j-Frontier gates are always assigned. For example, in circuit (b) of Figure 5, either (u/1, u/0) or (u/0, u/1) will be assigned to the two fanins of P when P is implied to have the value u/1.
The functionality of Jx can then be described as follows.
Jx = 0 denotes the normal implication setting (Sim = 0, Xor = 0). Let us consider the two examples of Figure 5. ... I1 consistently under the current state of the circuit. The inconsistency can be detected by MAXP, which then increases the value of Jx by 1 and re-executes the assigning process.
Jx = 1 denotes the settings (Sim = 1, Xor = 0) and (Sim = 0, Xor = 1). In this mode, MAXP tries both settings to determine whether a consistent assignment for the circuit can be obtained. Under the setting (Sim = 1, Xor = 0), all simple gates in j-Frontiers are justified immediately. In circuit (a), gate A will be implied to have the value u/1 if gate B is assigned u/1, and gate B will be implied to have the value u/0 if gate A is assigned u/0. Note that one of the gates (A or B) will be implied during the justification of the other under the setting (Sim = 1, Xor = 0). Hence, for circuit (a), consistent assignments can be obtained if Jx = 1 is used.
In circuit (b), note that I2 cannot be assigned consistently. To remove the inconsistency, the XOR gate P should be justified immediately when it is implied. Since neither fanin of P has reconvergent fanout, P will be justified immediately when it is implied only under the setting (Sim = 0, Xor = 2) (i.e. Jx = 2). Hence, a consistent assignment for circuit (b) can be obtained in the mode Jx = 2.
Time Complexity
To facilitate analyzing the complexity of MAXP , we introduce the following notations:
1. Lm: the number of levels in the circuit.
2. fm: the maximum fanin number of the gates in the circuit.
Experimentally, the program works quite efficiently for all the circuits we tried, no matter what bound was set. One reason is that the worst-case behavior of justifying an assigned transition is observed only when the gate cannot have the transition under the current state of the circuit (in that case, the algorithm exhaustively tries all possible paths). These situations are expected to be rare at the beginning of the assigning process, since most of the gates are still unassigned at that time. On the other hand, at the end of the assigning process, since most of the gates are already assigned, only a small number of gates need to be checked to justify an assignment. Hence, justifying a transition is expected to be fast.
Since the number of backtrackings at each gate is bounded by the number of its fanins, the algorithm works efficiently for circuits with a small fm (maximum fanin number of the gates).
Statistical Approach
For the estimation of average power dissipation, researchers have used statistical techniques [7], [8]. For a pre-specified accuracy, such techniques generate an estimate of the average power based on a number of simulation runs. In [7], Burch et al. made the assumption that the power dissipation of a CMOS combinational circuit over a fixed time interval is a normally distributed random variable. Based on this assumption, they then generated bounds for the average power with the pre-set accuracy.
In our statistical approach to estimating the peak power, we propose to simulate the circuit and monitor the maximum value of the instantaneous power generated from simulation. ... The measure of quality is a function of the mean and deviation of the instantaneous power. Therefore, we apply the Monte Carlo technique to estimate the mean and deviation of the instantaneous power while simulating the circuit. The application is straightforward, and hence the detailed description is omitted. In this approach, the desired quality of the simulated maximum power can be specified before the simulation. The program stops simulating the circuit if (1) the pre-specified bound on the number of simulations (say, 100000) has been exceeded, or (2) the measure of quality of the simulated maximum power has exceeded the pre-specified criterion. This approach can be implemented in a gate-level simulator that estimates the mean and deviation of the power while stressing the circuit through PI's with high activity values. (We assume that high activity at the inputs produces high activity at the internal nodes. XOR gates can be an exception. Extensive experiments suggest that our assumption is valid [3].)
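The monitoring loop with its two stopping rules can be sketched as below. This is an illustration under assumptions: the running mean and deviation are computed with Welford's online update (the paper does not say how they are accumulated), `quality` is a caller-supplied function of the maximum, mean and deviation, and the function name `monitor` is hypothetical.

```python
import math

def monitor(samples, quality, target, max_runs=100000):
    """Track the maximum, mean and deviation of instantaneous-power samples.
    Stop when max_runs samples have been seen, or when
    quality(peak, mean, std) reaches the target."""
    n, mean, m2 = 0, 0.0, 0.0
    peak, std = float("-inf"), 0.0
    for p in samples:
        n += 1
        delta = p - mean
        mean += delta / n
        m2 += delta * (p - mean)  # Welford's update of the squared deviations
        peak = max(peak, p)
        std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
        if n >= max_runs:
            break
        if std > 0 and quality(peak, mean, std) >= target:
            break
    return peak, mean, std, n
```

As a usage example, `samples` could be a generator that simulates one vector pair per iteration and yields the resulting fanout-weighted switching count, so no sample history has to be stored.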
To facilitate estimating the mean and deviation of the instantaneous power Pi, the normality assumption on Pi should be justified first. As mentioned in [7], it cannot be proved that the power dissipation over a finite period is normally distributed, especially if the period is as short as one clock cycle. However, according to (1) ... Pi, the distribution function of Pi, denoted by D_Pi(·), can be calculated if both the mean and the deviation of Pi are known. For a simulated maximum power X with a large value of D_Pi(X), the probability of generating a sample of instantaneous power larger than X by further simulations is low. Hence, we propose to associate the simulated maximum power X with the probability D_Pi(X) as its measure of quality.
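If Pi is treated as normally distributed with the estimated mean and deviation, the quality measure is simply the normal cumulative distribution function evaluated at X. The sketch below makes that assumption explicit; the function names are hypothetical, and it does not compute the confidence bounds on the measure.

```python
import math

def quality(x, mean, std):
    """Normal CDF at x: the probability that a sample of Pi is <= x,
    assuming Pi is normal with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def exceed_probability(x, mean, std):
    """Probability that a further sample is larger than x
    (same normality assumption)."""
    return 1.0 - quality(x, mean, std)
```

For instance, a simulated maximum that lies 1.96 deviations above the mean has a quality of about 0.975 under this assumption, so roughly one sample in forty would be expected to exceed it.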
For example, in Figure 6, Pi denotes the instantaneous power, which is a random variable. The value of D_Pi(X) is ... Du and Dl denote the bounds of D_Pi(X). Statistically, the quality measure of X, D_Pi(X), falls into the range (Dl, Du) with a given confidence. We can interpret the measure Dl as follows: with that confidence, the probability of the event {pi ≤ X} is greater than Dl, where pi is a random sample of the instantaneous power. A simulated maximum power with a large value of Dl at a high confidence level is unlikely to be increased by further simulations.
Experimental Results
The proposed ATG-like algorithm (MAXP), along with the statistical approach (SIMP), has been implemented in C on an HP 715/50 workstation. In MAXP, the parameters for computing the backtracing priority of the fanins are set to (α, β) = (0.8, 0.2). In the simulator SIMP, the probability and activity values [2] of the primary inputs are specified before simulating the circuits. The probability of a signal is the ratio of the number of clock cycles in which the signal is high to the total number of clock cycles. The activity of a signal is the ratio of the number of clock cycles in which the signal switches to the total number of clock cycles. In SIMP, the probability values of the PI's are set to 0.5. It is generally believed that PI's with high activity values tend to stress the circuit with high switching activity at the internal and output nodes [3]. To maximize the instantaneous power, the activity values of the PI's should be set as high as possible. We set the activity values of the PI's to 0.9 in SIMP for the following two reasons:
1. The bit-correlations of the two input vectors that induce the peak power in the circuit are unknown. Hence, the activity values of the PI's should not be set to 1.
2. PI's with very high activity values may not be suitable for stressing circuits with XOR or XNOR gates (e.g. xor and cccxor in Table 1).
Table 1 shows the results for 6 small circuits. Table 2 ... (g(V1) ⊕ g(V2)) F(g), where F(g) denotes the fanout number of gate g. In SIMP, the maximum power X and the CPU time are measured based on 10000 input patterns. As mentioned in Section 3, we can measure the mean and distribution of the power while monitoring the maximum power X. In Tables 1, 2 and 3, we associate each maximum power X with the quality measure Dl at the 95% confidence level.
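The input statistics used for SIMP (probability 0.5 and activity 0.9 at every primary input) can be illustrated with a simple generator of consecutive vectors. This is a sketch under assumptions: each bit is flipped independently with probability equal to the activity, which preserves a signal probability of 0.5 but would not preserve other probability values; the function names and the seeding are hypothetical.

```python
import random

def random_vector(n, probability, rng):
    """Initial vector: each bit is 1 with the given signal probability."""
    return [int(rng.random() < probability) for _ in range(n)]

def next_vector(v, activity, rng):
    """Flip each bit independently with probability `activity`."""
    return [bit ^ (rng.random() < activity) for bit in v]

def vector_pairs(n_inputs, n_pairs, probability=0.5, activity=0.9, seed=0):
    """Yield consecutive (V1, V2) pairs; V2 of one pair is V1 of the next."""
    rng = random.Random(seed)
    v1 = random_vector(n_inputs, probability, rng)
    for _ in range(n_pairs):
        v2 = next_vector(v1, activity, rng)
        yield v1, v2
        v1 = v2
```

Setting the activity to 1 would make V2 the exact complement of V1 in every pair, which explores only one bit-correlation pattern; a value such as 0.9 keeps most inputs switching while still varying which ones hold their value.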
From the results in Tables 1, 2 and 3, the significance of the ATG-like approach can be summarized as follows:
1. Considering the speed of MAXP, for very large circuits (say, over 100000 gates), the ATG-like approach may be the only feasible way to generate a reliable lower bound for peak power.
2. The ATG-like approach generates lower bounds superior to the results of the conventional simulation-based techniques (e.g. i10, i5 and i4 in Table 3, and C7552, C6288 and C880 in Table 2).
It should be mentioned that some parameters within MAXP (α, β, and the assigning order of the gates) can be varied to improve the results. Considering the speed of MAXP, it is reasonable to modify MAXP to use several sets of parameters for estimation and to report the best bound generated. For example, for circuits C5315 and C1355 in Table 2, better lower bounds of 2620 and 384 respectively can be obtained from MAXP by setting (α, β) = (0.5, 0.5).
Conclusions and Future work
In this paper, a novel approach has been proposed to estimate the maximum power using test generation techniques for stuck-at faults. Furthermore, a statistical approach has been proposed to generate a lower bound for peak power; it can also measure the quality of the lower bound from a statistical point of view. Experiments show that the test generation approach is superior to the traditional simulation-based technique in both efficiency and the quality of the results. Within a very small amount of time, it generates lower bounds superior to the results obtained by simulating the circuit for a long period of time. Considering the speed of our algorithm, it not only serves as an estimator that is superior to simulation, but may also be the only practical way to estimate the peak power of very large circuits.
