We present new techniques for estimating the maximum instantaneous current through the power supply lines of CMOS circuits. We investigate four different approaches: (1) a timed-ATPG based approach, (2) a probability based approach, (3) a genetic algorithm based approach, and (4) an integer linear programming (ILP) based approach. The first three approaches produce tight lower bounds on the maximum current. The ILP based approach produces exact solutions for small circuits and tight upper bounds for large circuits. Our experimental results show that the upper bounds produced by the ILP approach, combined with the lower bounds produced by the other three approaches, confine the exact maximum instantaneous current to a small range.
INTRODUCTION
Continuous shrinking of device feature sizes introduces and emphasizes many new problems in VLSI design. Large voltage drops caused by high instantaneous current flowing through the power supply lines can affect the reliability as well as the performance of a circuit. High current density is a cause of electromigration, which can lead to short or open circuits. To design the power supply lines properly, it is necessary to estimate the maximum instantaneous current flowing through them.
For CMOS circuits, the maximum instantaneous current is mainly due to signal switching, which depends on the input patterns applied to the circuit. To observe switching on signals, a two-vector sequence $V = (v_1, v_2)$ has to be applied at the inputs. One way to find the maximum instantaneous current would be to simulate all possible patterns. Since each primary input can take one of four excitations (stable 0, stable 1, rising or falling), a circuit with n primary inputs would require simulation of $4^n$ two-vector sequences. This is practical only for circuits with a small number of primary inputs. Moreover, the maximum instantaneous current depends strongly on the circuit delays, because it depends on the number of signals that switch simultaneously or within a small time interval. Therefore, accurate circuit timing (gate and interconnect delays) should be considered when estimating the maximum current.
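To make the $4^n$ count concrete, the following Python sketch enumerates every two-vector sequence of a small circuit by giving each primary input one of the four excitations. It is illustrative only: the function name `all_two_vector_sequences` is hypothetical, and each generated sequence would still have to be simulated at the transistor level.

```python
from itertools import product

# Bit values (v1, v2) that a primary input takes under each excitation.
EXCITATIONS = {"S0": (0, 0), "S1": (1, 1), "r": (0, 1), "f": (1, 0)}

def all_two_vector_sequences(n):
    """Yield every two-vector sequence (v1, v2) for a circuit with n primary inputs.

    Each input independently takes one of four excitations, so there are 4**n sequences."""
    for combo in product(EXCITATIONS.values(), repeat=n):
        v1 = tuple(bit1 for bit1, _ in combo)
        v2 = tuple(bit2 for _, bit2 in combo)
        yield v1, v2

seqs = list(all_two_vector_sequences(3))
print(len(seqs))   # 64
print(4 ** 32)     # 18446744073709551616 sequences for only 32 inputs
```

The count grows by a factor of four per input, which is why exhaustive simulation is limited to circuits with very few primary inputs.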
Several research groups have worked on estimating the maximum instantaneous current through the power supply lines of CMOS circuits [1, 4, 6, 10, 14, 15]. The methodologies proposed in [1, 4] are applicable only to small circuits. They produce a lower bound on the maximum instantaneous current through the power and ground lines. Kriplani et al. [10] present a pattern-independent algorithm to find an upper bound on the maximum current. However, because they assume that all signals (primary inputs as well as internal signals) are independent, the estimated maximum current for most circuits is a very loose upper bound. To take some of the signal correlations into account, they propose an algorithm for partial input enumeration. However, this methodology might not be effective for large circuits, since a large number of inputs would need to be enumerated to tighten the upper bound. Hsiao et al. [6] propose a genetic-algorithm-based approach to find a lower bound on the maximum power dissipation. The fitness function in their GA-based approach is evaluated at the gate level, so it cannot take short-circuit current and partial voltage swing into account. The problem of estimating the maximum instantaneous current is also addressed in [14, 15], where a test generation strategy is devised for finding test patterns that produce the maximum instantaneous current. Signals with large fanout counts are assigned transitions, which are then justified backwards until the primary inputs are reached. Therefore, the estimated maximum current is a lower bound. The main problem with this methodology is its use of the zero-delay model. Glitches are known to contribute significantly to the maximum instantaneous current, and they cannot be considered when a zero-delay model is used. Moreover, even if two signals both have transitions under the zero-delay model, their transitions may not occur at the same time, so assuming that both transitions contribute to the maximum instantaneous current is not realistic.
In this paper, we present four approaches for estimating the maximum instantaneous current through the supply lines of CMOS circuits: timed-ATPG [9], probability based [9], genetic algorithm based [7], and integer-linear-programming (ILP) based [8] approaches. The first three approaches produce tight lower bounds on the maximum current, while the ILP-based approach obtains exact solutions for small circuits and tight upper bounds for large circuits. The timed-ATPG based, probability based, and ILP based approaches operate at the gate level, while the genetic algorithm based approach can be used at any level of abstraction.
In our timed-ATPG approach, a set of signals whose simultaneous switching produces high current is assigned transitions, and timed-ATPG [3] is used to justify the assignments and derive two-vector sequences. In the probability-based approach, a set of selected gates is assigned weights based on their possible current contribution at the given time. Next, these weights are propagated backwards to the primary inputs, and the sequences for the maximum instantaneous current are derived from these values. The vectors derived by the timed-ATPG-based and probability-based approaches are simulated using a commercially available event-driven transistor-level power/current simulator [13]. The highest currents caused by these vectors are lower bounds on the exact worst-case solutions. The genetic-algorithm-based approach applies iterative genetic operations to generate patterns causing high instantaneous current; new generations of patterns are generated from "good" patterns derived in previous iterations. In the ILP-based approach, we model the problem as an ILP problem. Solving the corresponding ILP formulae yields the exact maximum current; however, this approach is impractical for large circuits. Therefore, we propose to partition a large circuit into sub-circuits and obtain the exact solution of each sub-circuit by solving its corresponding ILP formulae. Since the current of each sub-circuit can never exceed its own worst-case value, the sum of the worst-case solutions of all sub-circuits is an upper bound on the worst-case solution for the entire circuit.
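The partitioning argument can be checked on toy data. The sketch below assumes hypothetical currents `CURRENT[k][v][t]` for sub-circuit k under input sequence v at time index t (the numbers are made up, and the sub-circuits are assumed to see the same input sequences, which is a simplification). The exact worst case maximizes the summed current jointly over v and t, whereas the partitioned bound maximizes each sub-circuit separately, so it can only be larger or equal.

```python
# Hypothetical currents (mA): CURRENT[k][v][t] for sub-circuit k,
# input sequence v and time index t. The numbers are made up.
CURRENT = [
    [[4, 7, 2], [6, 3, 1]],  # sub-circuit 0
    [[5, 1, 3], [2, 6, 4]],  # sub-circuit 1
]

def exact_worst_case(current):
    """Maximize the total current of all sub-circuits jointly over v and t."""
    n_seq, n_time = len(current[0]), len(current[0][0])
    return max(
        sum(sub[v][t] for sub in current)
        for v in range(n_seq)
        for t in range(n_time)
    )

def partitioned_upper_bound(current):
    """Sum of per-sub-circuit worst cases, each maximized on its own."""
    return sum(max(max(row) for row in sub) for sub in current)

exact = exact_worst_case(CURRENT)
bound = partitioned_upper_bound(CURRENT)
print(exact, bound)  # 9 13
```

The gap (9 vs. 13 here) appears because the sub-circuits reach their individual peaks under different sequences or at different times.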
All of the above approaches have advantages and disadvantages. In general, the timed-ATPG approach produces a tighter lower bound on the maximum instantaneous current at the expense of memory and computation time. The probability-based approach is more efficient, but it produces a looser lower bound. The estimation quality of the genetic-algorithm-based approach depends on the population size and the number of generations. According to our experimental results, the bound derived by the genetic-algorithm-based approach is similar to that derived by the timed-ATPG-based approach, while its execution time is shorter. Also, for small circuits, the lower bounds derived by the timed-ATPG and genetic-algorithm based approaches are close to the exact solutions obtained by the ILP-based approach. The timed-ATPG, probability-based, and ILP approaches are applicable only to combinational circuits, while the GA approach applies to both combinational and sequential circuits.
For large circuits, the upper and lower bounds on the worst-case solution give designers useful guidelines for estimating the exact worst-case solution, since together they confine it to a small range. Our experimental results show that the ILP-based approach produces, on average, an upper bound on the maximum instantaneous current that is 47% and 32% tighter than those obtained by the iMax and PIE algorithms [10], respectively (Table 5).
The rest of this paper is organized as follows. In Section 2, we briefly review the iMax algorithm [10] since some of our approaches are based on this work. In Section 3, we introduce the timed-ATPG based, probability based and genetic algorithm based approaches for estimating the maximum instantaneous current. In Section 4, we formulate the maximum instantaneous current problem as an ILP problem. We also propose a partitioning strategy for large circuits. Section 5 gives the experimental results. Section 6 concludes the paper.
PRIOR WORK, MODEL AND ASSUMPTIONS
Our gate-level approaches (timed-ATPG, probability-based and ILP-based) rely on the iMax algorithm [10] in the pre-processing step. The iMax algorithm assumes that all inputs to the combinational logic switch simultaneously at time t = 0 and that the delays of the gates and interconnects can take arbitrary values, which are known for each gate and each wire. We assume that the current drawn from the supply lines during switching of a signal has the triangular form shown in Fig. 1. The peak current is assumed to coincide with the transitions at the input of the gate. At any point in time a signal is assumed to have one of four possible excitations: stable 1 (S1), stable 0 (S0), rising transition (r) or falling transition (f) [10]. To find an upper bound on the maximum instantaneous current by simultaneously considering all possible two-vector input sequences, the excitations on the signals are represented using uncertainty waveforms [10]. An uncertainty waveform U(t) captures all possible excitations that a signal can have under any two-vector sequence applied to the primary inputs. The following example illustrates the iMax procedure.
Example 1:
Consider the circuit in Fig. 2(a). The rising and falling delays of all gates are assumed to be 0.1ns. At t = 0 any input signal can have any of the four possible excitations. Fig. 2(b) shows the uncertainty waveforms for all signals in this circuit. The transient current is assumed to be triangular with a peak value of 3mA and a pulse duration of 0.3ns. From the uncertainty waveforms in Fig. 2(b) we can see that at time t = 0 only the inputs of gates d and e may be switching. Therefore, the total current at t = 0 is $I_{tot}(0) = I_d + I_e = 3 + 3 = 6$mA. Next, to find the total current at time t = 0.1ns, we note that since the inputs to gate d can only switch at time t = 0, the current contribution of gate d at time t = 0.1ns is 2mA. The inputs to gate e can switch either at time t = 0 or at time t = 0.1ns, so the maximum current contribution of gate e at time t = 0.1ns is $I_e = \max(2, 3) = 3$mA. Also, at time t = 0.1ns the maximum current contribution of gate f is 3mA. Therefore, the total current at time t = 0.1ns is $I_{tot}(0.1) = I_d + I_e + I_f = 2 + 3 + 3 = 8$mA. The current at other time points can be found in a similar way: $I_{tot}(0.2) = I_d + I_e + I_f = 1 + 2 + 3 = 6$mA, $I_{tot}(0.3) = I_e + I_f = 1 + 2 = 3$mA, $I_{tot}(0.4) = I_f = 1$mA, and $I_{tot}(0.5) = 0$. Hence, the maximum current is 8mA at time 0.1ns.
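The arithmetic of Example 1 can be reproduced with a short Python sketch. It simplifies iMax by assuming that each gate's possible input-switching times are read off the uncertainty waveforms of Fig. 2(b), and that the triangular pulse peaks at the input transition and decays linearly to zero over 0.3ns; time is counted in steps of 0.1ns so the arithmetic stays exact. The names `SWITCH_TIMES`, `pulse` and `total_current` are hypothetical.

```python
from fractions import Fraction

PEAK_MA = 3    # peak of the triangular pulse (mA), from Example 1
DURATION = 3   # pulse duration in 0.1 ns steps (0.3 ns), from Example 1

def pulse(k):
    """Current (mA) k steps after an input transition: peak at the transition,
    decaying linearly to zero after DURATION steps."""
    if 0 <= k < DURATION:
        return Fraction(PEAK_MA * (DURATION - k), DURATION)
    return Fraction(0)

# Possible input-switching times (in 0.1 ns steps) of each gate, as in Example 1.
SWITCH_TIMES = {"d": [0], "e": [0, 1], "f": [1, 2]}

def gate_current(times, t):
    # Worst case over all possible switching times of the gate's inputs.
    return max(pulse(t - s) for s in times)

def total_current(t):
    return sum(gate_current(times, t) for times in SWITCH_TIMES.values())

waveform = {t: total_current(t) for t in range(6)}
t_max = max(waveform, key=waveform.get)
print({t / 10: int(i) for t, i in waveform.items()})
# {0.0: 6, 0.1: 8, 0.2: 6, 0.3: 3, 0.4: 1, 0.5: 0}
print(f"max = {int(waveform[t_max])} mA at t = {t_max / 10} ns")
# max = 8 mA at t = 0.1 ns
```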
DERIVING A LOWER BOUND ON THE MAXIMUM INSTANTANEOUS CURRENT
3.1. Timed-ATPG and probability based approaches
Timed-ATPG and probability based methods approach the problem of estimating the maximum instantaneous current through test generation. The goal is to find a small set of two-vector patterns that produce high instantaneous current. Both approaches use gate-level models and rely on the iMax algorithm [10] in the pre-processing step.
Pre-processing step: Our goal is to use the current waveform produced by iMax to find a set of time instances at which the current is most likely to be high. Fig. 3 shows a possible current waveform produced by the iMax algorithm. Because signal correlations are ignored in the iMax algorithm, the predicted maximum current ($I_{iMax}$) might be much higher than the actual one. We assume that the actual maximum instantaneous current lies somewhere between $I_{cutoff}$ and $I_{iMax}$, where $I_{cutoff}$ is some percentage of $I_{iMax}$. We call the time instances at which the current lies in the interval $[I_{cutoff}, I_{iMax}]$ the target times. Corresponding to each time instance t is a set of gates whose simultaneous switching would cause the current $I_{max}(t)$ to flow through the power supply lines. The set of gates that corresponds to a target time T is called the set of target gates G. According to the assumed current waveform (Fig. 1), the output of a target gate g is required to switch at time $T + t_r(g)$ or $T + t_f(g)$, where $t_r(g)$ and $t_f(g)$ are the rising and falling delays of gate g, respectively. If a two-vector test can be found such that all target gates contribute to the transient current at the target time, the maximum instantaneous current equals the current predicted by the iMax algorithm. Often, however, such a test cannot be found. In our algorithm, for each pair (T, G) we try to find a two-vector test that maximizes the total current produced at time T by switching of the outputs of the target gates $g \in G$.
We propose two methods for generating the two-vector test. The first method is based on the timed-ATPG technique [3]. The second is a probabilistic approach.
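The pre-processing step can be sketched in Python as follows. The per-gate contributions are those of Example 1; the cutoff fraction of 0.7 and the logic levels in `LEVEL` (used to break ties in favour of gates further from the primary inputs, as in Example 2) are assumptions for illustration, as are all function names.

```python
# Per-gate iMax current contributions (mA) at each time (ns), from Example 1.
CONTRIB = {
    0.0: {"d": 3, "e": 3},
    0.1: {"d": 2, "e": 3, "f": 3},
    0.2: {"d": 1, "e": 2, "f": 3},
    0.3: {"e": 1, "f": 2},
    0.4: {"f": 1},
}
# Assumed logic levels (distance from the primary inputs), used only for ties.
LEVEL = {"d": 1, "e": 2, "f": 3}

def target_pairs(contrib, cutoff_fraction):
    """Return (T, G) pairs whose iMax current lies in [I_cutoff, I_iMax].

    Target times are ordered by decreasing total current; within each G, gates are
    ordered by decreasing current contribution, larger level first on ties."""
    totals = {t: sum(gates.values()) for t, gates in contrib.items()}
    i_imax = max(totals.values())
    i_cutoff = cutoff_fraction * i_imax
    times = sorted((t for t in totals if totals[t] >= i_cutoff), key=lambda t: -totals[t])
    return [
        (t, sorted(contrib[t], key=lambda g: (-contrib[t][g], -LEVEL[g])))
        for t in times
    ]

for T, G in target_pairs(CONTRIB, cutoff_fraction=0.7):
    print(T, G)
# 0.1 ['f', 'e', 'd']
# 0.0 ['e', 'd']
# 0.2 ['f', 'e', 'd']
```

The chosen cutoff directly controls how many (T, G) pairs are passed to test generation: a lower fraction explores more time instances at a higher cost.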
Timed-ATPG based approach
Timed-ATPG. Timed-ATPG is ATPG with an additional dimension: time. It was first proposed in [3] for timing analysis. In timed-ATPG each signal is characterized by its logic value and the time interval in which this logic value should occur. Therefore, conflicts on the signals can be of two kinds: logic conflicts and timing conflicts. A logic conflict occurs when a signal is required to have two different logic values simultaneously. A timing conflict occurs when a signal is required to have a given logic value during a time interval in which it cannot have that value. An example of a timing conflict is shown in Fig. 4. Let us assume that the value 0 at the output of the AND gate has to be justified and that the output is required to be 0 in the interval [0.6ns, 0.7ns]. The gate delay is assumed to be 0.1ns for both transitions, so some input must be 0 at every instant of the interval [0.5ns, 0.6ns]. The inputs, on the other hand, can have value 0 only in the intervals [0.2ns, 0.4ns] and [0.1ns, 0.3ns], respectively. Therefore, it is impossible to justify the 0 value at the output of the AND gate because of a timing conflict. The timed-ATPG proposed in [3] generates one-vector tests only. Here, we extend the timed-ATPG concept to handle vector pairs. Our algorithm consists of several steps.
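The timing-conflict check of Fig. 4 can be sketched as an interval-coverage test. This is a simplified model, assuming that the controlling 0 may be supplied by different inputs at different instants and that times are integers in picoseconds; `covered` and `justify_and_zero` are hypothetical helpers, not part of the timed-ATPG of [3].

```python
def covered(interval, pieces):
    """True if the closed interval [lo, hi] lies inside the union of closed pieces."""
    lo, hi = interval
    reach, started = lo, False
    for a, b in sorted(pieces):
        if a > reach:           # gap before this piece: coverage stops here
            break
        if b >= reach:          # piece overlaps the covered prefix; extend it
            reach, started = b, True
        if started and reach >= hi:
            return True
    return False

def justify_and_zero(required, delay, zero_windows):
    """Can an AND output be 0 throughout `required` (times in ps)?

    The output at time t reflects the inputs at t - delay, so some input must be 0
    at every instant of the required window shifted back by the gate delay."""
    lo, hi = required
    needed = (lo - delay, hi - delay)
    pieces = [w for windows in zero_windows for w in windows]
    return covered(needed, pieces)

# Fig. 4: output 0 required in [0.6, 0.7] ns, delay 0.1 ns; the inputs can be 0
# only in [0.2, 0.4] ns and [0.1, 0.3] ns, respectively.
ok = justify_and_zero((600, 700), 100, [[(200, 400)], [(100, 300)]])
print("justified" if ok else "timing conflict")  # timing conflict
```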
Step 1. Given the target time T and its corresponding set of target gates G, we try to assign transitions to as many target gates as possible. We process the gates in the target set one at a time, in an order determined by the current contribution of each gate, i.e., the current produced when the output of the target gate switches. The current contribution of a gate is a function of the load of the gate (the sum of the input capacitances of its fanouts), the gate type (NAND, NOR, etc.), the type of transition (rising or falling), and the number of inputs that are switching. Also, since we assume that the current waveform is triangular, the current contribution of a gate depends on time. To estimate the peak current and the duration of the current pulse as a function of these variables, we use a transistor-level simulator [13] to characterize the library cells and create lookup tables for different cells with different numbers of inputs.
To decide which type of transition will be assigned to a target gate at the target time, we use the signal uncertainty waveforms derived by the iMax algorithm (see Fig. 2). For example, if at the target time the uncertainty waveform shows that the given signal can only have a rising (falling) transition, we assign a rising (falling) transition to the gate. However, in some cases the uncertainty waveform might indicate that the signal can be assigned either a rising or a falling transition at the target time. For such gates we pick the transition that produces a higher current at the target time.
Step 2. After the processing order has been decided, we assign the required transition to the target gate at the top of the list. Using the target time T, the gate delays ($t_r$ and $t_f$) and the uncertainty waveforms derived by iMax, we try to sensitize a path from the given gate g to a primary input. The sensitized path has to be such that the required time for the transition at the target gate is $T + t_r(g)$ or $T + t_f(g)$ (depending on the transition type) and the required time at the primary input is 0. In this process, we use only mandatory assignments and their implications for the on- and off-inputs of the path, and we keep updating the uncertainty waveforms using these assignments.
If there is a conflict in this phase, we backtrack to sensitize another path to a primary input. The number of paths that can be sensitized under the general delay model is usually not very large, because a necessary condition for a path to be sensitized is that the transition at the primary input at the source of the path is applied at time t = 0. If no path to a primary input can be sensitized for some gate, we leave the output of this gate unassigned and proceed with the next gate in the list. Gates that are processed first have a larger search space for the sensitized path, which is why we process the gates with higher current contribution first. During the path sensitization phase, we keep track of the gates that require justification, i.e., we build a justification list.
Step 3. After all gates in the target set have been processed, we check whether the justification list is empty. Some signals might appear in the justification list more than once, since their values might need to be justified in more than one time interval. If all signals are successfully justified, some primary inputs may still have unspecified values. For each such PI, we obtain the set of possible excitations from its derived uncertainty waveform and randomly assign one of them. On the other hand, if it is impossible to justify all the signals in the justification list for a given target time, we backtrack to the last decision in the path sensitization phase and try to sensitize a different path. If a different path can be sensitized, the justification procedure is attempted again. The procedure ends when either all gates are justified or all possibilities for sensitizing the paths have been explored. The whole procedure is then repeated for the next (T, G) pair.
Example 2: Consider again the circuit in Fig. 2. From Example 1, we get that the target times are 0.1ns, 0.2ns and 0ns. Since the current is highest at time t = 0.1ns, we process this target time first. For t = 0.1ns, the set of target gates contains gates f, e and d. Their current contributions at the target time are 3mA, 3mA and 2mA, respectively. Either gate f or gate e can be processed first; we pick gate f since it is further from the primary inputs than gate e. From the uncertainty waveforms in Fig. 2(b) we see that the current contribution of gate f at the target time could be due to either a rising or a falling transition at time t = 0.2ns. In our example, the currents due to the falling and rising transitions are the same, so we randomly assign a falling transition to signal f. Next, we have to sensitize a path from f to some primary input.
The only two paths that can satisfy the timing requirements for a falling transition at t = 0.2ns at the output of f are paths {adf, falling} and {bdf, falling}. Let the chosen sensitized path be {adf, falling}. The sensitized path and the requirements on the path on-inputs are shown in Fig. 5(a). Next, we describe the process of updating the uncertainty waveforms: (1) Since the on-input d must have a rising transition at time t = 0.1ns (in order to have a falling transition at time t = 0.2ns on signal f), the only possible excitations at the off-input e are S1, a rising transition at t = 0.1ns and a falling transition at t = 0.2ns. Therefore, the uncertainty waveform at e is updated, and the new uncertainty waveform is implied across gate f (Fig. 5(a)). (2) In order to have a rising transition on signal d at time t = 0.1ns when input a has a falling transition at t = 0, signal b can only be assigned a falling transition or an S1 value. The new uncertainty waveforms for signals a and b are shown in Fig. 5(b). (3) A rising transition at t = 0.1ns at input d combined with any excitation at the primary input c cannot produce a rising transition at t = 0.1ns at the output e. Therefore, the uncertainty waveform of signal e has to be further updated, as shown in Fig. 5(b).
After repeating the same process for gates e and d, we find that, for example, assigning falling transitions to inputs a and b and a rising transition to input c results in the maximum current of 8mA at time 0.1ns.
Probability based approach
Timed-ATPG approach can be computationally expensive for large designs. Therefore, for such designs we need to develop a more efficient method. In this subsection we describe our probability based approach to generate test vectors for high instantaneous current given a pair (T, G). This method is more practical for large designs than the timed-ATPG at the price of a looser lower bound on the maximum instantaneous current.
In this approach, the idea is to derive good weights for switching at the primary inputs in order to generate patterns that produce the maximum current. Our algorithm starts from a target time T and the set of target gates G generated in the pre-processing phase. It numerically characterizes each of the four possible excitations at the output of each gate. Next, these values are propagated backward to the primary inputs. Once the primary inputs are reached, the derived numerical values are used as weights for generating a small set of two-vector tests that have a high probability of producing a high current at the target time T. To handle arbitrary circuit delays, the assigned numerical values have to be associated with time.
Each gate in the target set is assigned four excitation lists: $L_0$, $L_1$, $L_r$ and $L_f$. Each excitation list contains pairs of the form (w, t). The value w is a numerical measure of the preference for the gate to have the given type of excitation at time t. For example, if for a gate g the list $L_1(g)$ contains a pair (0, t), then at time t it is not desirable for g to have a stable 1 value. A higher value of w denotes a higher preference for the signal to have the given excitation at time t.
Step 1. Since our goal is to have as many gates in G as possible switching at the target time T, each target gate is initially assigned the following values:
For each target gate g, the value $w_f$ ($w_r$) represents the current contribution caused by the falling (rising) transition at the output of g at time $T + t_f(g)$ ($T + t_r(g)$).
Step 2. After initializing all the excitation lists at the outputs of target gates, we propagate these lists backward to the primary inputs. The gates are processed in a topological order starting from the target gates towards the primary inputs. For each gate, the lists for all four excitations are propagated from the output to each of its inputs by backward propagating each (w, t) pair in the output list according to the rules explained below. Table 2 shows all the coefficients for a 2-input NAND gate.
The excitation lists for fanout stems are found by combining the excitation lists of the fanout branches. To combine excitation lists, pairs (w, t) with matching time t are merged by summing their w values, and pairs with different times are concatenated. An example of obtaining the excitation list for a rising transition at a fanout stem is given in Figure 6.
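Combining excitation lists at a fanout stem amounts to summing weights at matching times and keeping the remaining pairs. A minimal Python sketch, assuming times are exact integers (e.g. picoseconds) so that matching is reliable; the list values below are made up and are not those of Figure 6.

```python
from collections import defaultdict

def combine_fanout_lists(branch_lists):
    """Merge the (w, t) excitation lists of all fanout branches into the stem's list.

    Pairs with the same time t are combined by summing their w values; pairs with
    distinct times are kept as they are. The result is sorted by time."""
    total = defaultdict(int)
    for branch in branch_lists:
        for w, t in branch:
            total[t] += w
    return [(w, t) for t, w in sorted(total.items())]

# Hypothetical L_r lists of two fanout branches (w in mA, t in ps).
branch_1 = [(3, 100), (2, 200)]
branch_2 = [(1, 100), (4, 300)]
stem = combine_fanout_lists([branch_1, branch_2])
print(stem)  # [(4, 100), (2, 200), (4, 300)]
```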
Step 3. Once the primary inputs are reached, the derived numerical values can be used as weights to generate weighted random tests for estimating the maximum instantaneous current.
Example 4: Consider again the circuit in Fig. 2(a). From Example 1, the maximum current occurs at target time T = 0.1ns, and the target gates are f, e and d. For each target gate, we initialize the excitation lists using the information about the current contribution of the gate at the target time:
gates f and e:
2)}. Next, we need to backward propagate these excitation lists. We illustrate this step for gate f; the procedure for gates e and d is similar. Using the rules from Table 2 for gate f, we get: w r e w r d
The bottleneck of the probability based approach is the iMax algorithm used in the first step. Since the excitation lists can be propagated backward in linear time, this approach is efficient for larger designs.
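Step 3 can be sketched as weighted random test generation. The sketch assumes that each primary input ends up with one weight per excitation (S0, S1, r, f) at t = 0 and samples each input's excitation with probability proportional to its weight; the weights in `PI_WEIGHTS` and the function name are hypothetical. Inputs whose weights are all zero fall back to a uniform choice.

```python
import random

# Bit values (v1, v2) of each excitation.
EXCITATION_BITS = {"S0": (0, 0), "S1": (1, 1), "r": (0, 1), "f": (1, 0)}

# Hypothetical primary-input weights at t = 0 (one weight per excitation).
PI_WEIGHTS = {
    "a": {"S0": 0, "S1": 0, "r": 0, "f": 5},
    "b": {"S0": 0, "S1": 1, "r": 0, "f": 4},
    "c": {"S0": 1, "S1": 0, "r": 3, "f": 0},
}

def weighted_tests(pi_weights, n_tests, seed=0):
    """Draw n_tests two-vector tests; each input's excitation is sampled
    with probability proportional to its weight."""
    rng = random.Random(seed)
    tests = []
    for _ in range(n_tests):
        v1, v2 = {}, {}
        for pi, weights in pi_weights.items():
            names = list(weights)
            w = [weights[name] for name in names]
            if sum(w) <= 0:
                w = [1] * len(names)  # no preference: uniform choice
            excitation = rng.choices(names, weights=w, k=1)[0]
            v1[pi], v2[pi] = EXCITATION_BITS[excitation]
        tests.append((v1, v2))
    return tests

for v1, v2 in weighted_tests(PI_WEIGHTS, n_tests=3):
    print(v1, v2)
```

Each generated test would then be simulated at the transistor level, and the highest observed current is kept as the lower bound.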
Genetic algorithm based approach
Genetic algorithms (GAs) [5] are search algorithms based on the mechanics of natural selection and natural genetics. To use GAs, the elements of the solution space are coded into finite-length strings. Each string has an associated fitness value, which depends on the application, and the GA searches for the string with the highest fitness. A simple GA involves three operations: selection, crossover, and mutation. These operations are applied to the current string population to create a new string population. The initial population contains N random strings of length L, and the fitness value of each string is calculated by a fitness function. The objective of a simple GA is to evolve a population of individuals with high fitness values. A new population is generated by selecting two individuals from the current population, crossing the two selected strings, and mutating the elements of the new strings with a given mutation probability. This process is repeated until the new population contains N strings. Selection is biased towards individuals with higher fitness values, so the average fitness value tends to increase. Each subsequent population is generated from the current one using the same procedure. The process continues until the number of generations reaches a pre-defined value or the optimal solution has been found.
To apply the GA to estimating the maximum instantaneous current, we need to address several issues. (1) We must choose a coding technique to translate each element (a two-vector sequence) of the solution space into a finite-length string. (2) There are many schemes for selection, crossover, and mutation in GAs, so we need to select appropriate ones. (3) The results of the GA depend strongly on the fitness function, so we need to find an adequate fitness function.
Coding scheme: We apply the following straightforward coding rules. For combinational circuits, if a bit is 0 in the first input vector and 0 in the second vector, it is coded as 0. Similarly, 01 is coded as 1, 10 as 2, and 11 as 3. For sequential circuits, we assume that any vector can be applied at the pseudo primary inputs as the first vector. Since the second vector at the pseudo primary inputs is determined by the first vector, different coding rules apply to the pseudo primary input bits: if the bit value is 0 (1) in the first vector, the bit is coded as 0 (1). In other words, for pseudo primary input bits we only consider and code the first vector.
Selection scheme: We use tournament selection [12] without replacement. This scheme repeatedly picks two individuals from the current population, selects the one with the larger fitness value, and temporarily removes both picked individuals from the population until all individuals have been removed.
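The coding rule for combinational circuits and tournament selection without replacement can be sketched in Python as follows. The function names are hypothetical, the pseudo-primary-input coding for sequential circuits is omitted, and an even population size is assumed so that individuals can be split into disjoint pairs.

```python
import random

def encode(v1, v2):
    """Code each input's bit pair (v1[i], v2[i]): 00 -> 0, 01 -> 1, 10 -> 2, 11 -> 3."""
    return [2 * a + b for a, b in zip(v1, v2)]

def decode(code):
    """Inverse of encode: recover the two input vectors from the coded string."""
    return [c >> 1 for c in code], [c & 1 for c in code]

def tournament_without_replacement(population, fitness, rng):
    """Binary tournament selection without replacement (even population size assumed).

    Each pass shuffles the population, splits it into disjoint pairs and keeps the
    fitter individual of every pair, yielding N/2 winners; the population is then
    restored and a second pass fills the remaining N/2 slots."""
    n = len(population)
    selected = []
    while len(selected) < n:
        order = list(range(n))
        rng.shuffle(order)
        for a, b in zip(order[0::2], order[1::2]):
            selected.append(population[a] if fitness[a] >= fitness[b] else population[b])
    return selected[:n]

code = encode([0, 0, 1, 1], [0, 1, 0, 1])
print(code)          # [0, 1, 2, 3]
print(decode(code))  # ([0, 0, 1, 1], [0, 1, 0, 1])
```

With two passes over disjoint pairs, a uniquely best individual wins both of its tournaments and a uniquely worst one wins none, which matches the guarantee described in the text.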
Since two individuals are removed from the population for every individual selected, the original population is restored once the new population is half-filled, and selection is applied again to fill the rest of the new population. This technique guarantees that the individual with the highest fitness value is selected twice and the individual with the lowest fitness value is not selected at all. Since individuals with lower fitness values tend to be removed, the average fitness value of the generation tends to increase, and so does the probability of finding the individual with the largest fitness value.
Crossover scheme and mutation probability: Crossover is the operation of generating two child-strings from two parent-strings. We apply one-point crossover [5]: a position is randomly selected between 2 and L - 1, where L is the number of primary inputs of the circuit, and the two parent-strings are crossed at that point. Thus, the first child is identical to the first parent before the crossing point and to the second parent after it. This crossover scheme can generate new individuals with higher fitness values. As the mutation probability we use 1/L.
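A sketch of the crossover and mutation operators on the 0-3 coded strings. Two points are our assumptions: the 1-based crossing position k in [2, L - 1] is taken to mean that symbols from position k onward are exchanged, and a mutated symbol is replaced by a different code chosen uniformly at random.

```python
import random

def one_point_crossover(p1, p2, rng):
    """Swap the tails of two parents at a random 1-based position k in [2, L-1]."""
    L = len(p1)
    k = rng.randint(2, L - 1)
    cut = k - 1  # symbols at 0-based index >= cut come from the other parent
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(code, p, rng):
    """Replace each symbol, with probability p, by a different code in 0..3."""
    return [
        rng.choice([c for c in range(4) if c != s]) if rng.random() < p else s
        for s in code
    ]

rng = random.Random(0)
parent_1, parent_2 = [0] * 6, [3] * 6
child_1, child_2 = one_point_crossover(parent_1, parent_2, rng)
print(child_1, child_2)
print(mutate(child_1, 1 / len(child_1), rng))  # mutation probability 1/L
```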
Fitness function for the maximum instantaneous current:
The maximum instantaneous current cannot be estimated accurately at the gate level, so we use the transistor-level power/current simulator PowerMill [13] to simulate each two-vector sequence. The peak current reported by PowerMill is used as the fitness value of the sequence. Because the solutions derived by the GA partly depend on the initial individuals, the GA may converge to and remain at a local optimum. We apply the following heuristic to move the GA away from a local optimum: if there is no improvement after a pre-defined number of generations k, the mutation probability of the next generation is changed to 1/2. Setting the mutation probability to this value generates new, largely random individuals from the old good individuals and can lead the GA to search in a completely different direction. In our experiments, k is set to 5. Our GA-based algorithm is summarized in Fig. 7.
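The loop of Fig. 7 can be sketched in Python. Because PowerMill cannot be called here, `toy_fitness` is a hypothetical stand-in that merely counts switching inputs; in the described method the fitness is the peak current reported by the transistor-level simulator. We also interpret the restart heuristic as raising the mutation probability to 1/2 for one generation after k generations without improvement; population size, generation count and seed are arbitrary choices.

```python
import random

def toy_fitness(code):
    """Hypothetical stand-in for the transistor-level simulation:
    counts inputs coded 1 (rising) or 2 (falling)."""
    return sum(1 for c in code if c in (1, 2))

def ga_max_current(fitness, L, pop_size=8, max_gen=30, k=5, seed=1):
    """Sketch of the GA loop; pop_size must be even and L >= 3."""
    rng = random.Random(seed)
    pop = [[rng.randrange(4) for _ in range(L)] for _ in range(pop_size)]
    best_fit, best_code, stall = -1, None, 0
    for _ in range(max_gen):
        fit = [fitness(ind) for ind in pop]
        i_best = max(range(pop_size), key=fit.__getitem__)
        if fit[i_best] > best_fit:
            best_fit, best_code, stall = fit[i_best], list(pop[i_best]), 0
        else:
            stall += 1
        p_mut = 1.0 / L
        if stall >= k:             # no improvement for k generations:
            p_mut, stall = 0.5, 0  # boost mutation for the next generation only
        # Selection: binary tournaments without replacement, two passes.
        parents = []
        while len(parents) < pop_size:
            order = list(range(pop_size))
            rng.shuffle(order)
            for a, b in zip(order[0::2], order[1::2]):
                parents.append(pop[a] if fit[a] >= fit[b] else pop[b])
        # One-point crossover on consecutive parent pairs, then mutation.
        pop = []
        for p1, p2 in zip(parents[0::2], parents[1::2]):
            cut = rng.randint(2, L - 1) - 1
            for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                pop.append([
                    rng.choice([c for c in range(4) if c != g]) if rng.random() < p_mut else g
                    for g in child
                ])
    return best_fit, best_code

best, code = ga_max_current(toy_fitness, L=10)
print(best, code)
```

Tracking the best individual outside the population means a good sequence is never lost, even when the boosted mutation rate scrambles the current generation.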
DERIVING AN UPPER BOUND ON THE MAXIMUM INSTANTANEOUS CURRENT
In this section, we present our integer linear programming based approach for estimating the maximum instantaneous current. This approach produces the exact solution for smaller circuits, while for large circuits it produces a tight upper bound. The ILP based approach consists of several steps. First, we derive a set of transformation rules for converting the logic functions of primitive gates into ILP formulae. Next, using these rules, we transform the logic description of a circuit into a set of integer linear constraints. Then, maximizing the instantaneous current corresponds to optimizing an objective function with respect to the set of linear constraints. In the following, we describe our transformation rules.

GA for estimating maximum instantaneous current ()
  Best-fitness = 0; gen = 0;
  Generate the initial population randomly;
  While (gen < the max. number of generations)
    Compute the fitness values for all individuals in the population;
    Update the best-fitness;
    Set the mutation probability to 1/2 if no improvement after k generations;
    Selection using tournament without replacement;
    One-point crossover;
    Mutation;
    Update the population;
    gen += 1;
Figure 7: GA for estimating the maximum instantaneous current.

For CMOS circuits, the current through the supply lines is mainly due to switching on the signals. Based on the general-delay model and using the iMax algorithm [10], we can obtain all possible switching times for each signal and calculate the current contributions of all gates at all times. The instantaneous current at time t corresponds to the sum of the current contributions of all gates at time t. The instantaneous current at each time instance can be modeled as a single function, and the instantaneous current for all time instances can be represented as a multi-function. Since the ILP package we use [11] can only optimize a single objective function, we further transform the optimization of a multi-function into the optimization of a single function using the following proposition.
Proposition 1. For an m-output multi-function with outputs $c_1, c_2, \ldots, c_m$, the maximum value can be found by solving the following ILP formulae:
Proof. Since the values of $\alpha_1, \alpha_2, \ldots, \alpha_m$ are limited to 0 and 1, constraint (2) can only be satisfied if one α is set to 1 and the others to 0. Constraints (3), (4), and (5) ensure that if $\alpha_i$ is set to 1, then $k_i$ is equal to $c_i$; otherwise $k_i$ is equal to 0. Together, these four constraints ensure that only one k can have a non-zero value, while the others are equal to zero. Therefore, maximizing the objective function (1) yields the maximum value of the multi-function.
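The ILP formulae of Proposition 1 are not reproduced in this text. One encoding that has the properties used in the proof is shown below; it is our assumption, not necessarily the exact constraints (1)-(5), and it uses a large constant L with $L \ge c_i$ for all i and nonnegative currents $c_i$.

```latex
\begin{aligned}
\max \quad & \sum_{i=1}^{m} k_i \\
\text{s.t.} \quad & \sum_{i=1}^{m} \alpha_i = 1, \qquad \alpha_i \in \{0, 1\}, \\
& k_i \le c_i, \qquad k_i \le L\,\alpha_i, \qquad k_i \ge c_i - L\,(1 - \alpha_i), \qquad k_i \ge 0, \qquad 1 \le i \le m.
\end{aligned}
```

If $\alpha_i = 1$, the constraints force $c_i \le k_i \le c_i$, i.e. $k_i = c_i$; if $\alpha_i = 0$, they force $0 \le k_i \le 0$, because $c_i - L \le 0$. With exactly one $\alpha_i = 1$, the objective equals the selected $c_i$, so its maximum is $\max_i c_i$.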
ILP Formulation for the Maximum Instantaneous Current
Before introducing our ILP formulation for the maximum instantaneous current, we define the following notation:
G is the set of all gates.
g_i is the gate with index i.
g_i(t) is the output value of g_i at time t.
Fin(g_i(t)) is the set of all of g_i's fanin nodes at the corresponding times, which determine the value g_i(t).
T is the set of all time instances.
T(g_i) is the set of all possible transition times of the output of gate g_i.
[T(g_i)]_j is the set of switching times of gate g_i whose transitions contribute to the instantaneous current at time j.
is the current at time j contributed by gate g_i.
is the current value at time j contributed by the output switching of gate g_i at time m if g_i switches at time m, and 0 otherwise.
I(j) is the total instantaneous current at time j (for the entire circuit).
The maximum instantaneous current problem can be formulated as follows:
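The following sketch shows one formulation consistent with the descriptions of (6)-(14) given in the next paragraph. The auxiliary symbols are assumptions introduced for illustration and may not match the original notation: k_j and α_j, the switching indicator s_i(t), the per-gate current I_{g_i}(j), the per-switch current I^m_{g_i}(j), the pulse value c^m_{g_i}(j) (taken here as independent of the transition direction), and the gate function f_{g_i}. The XOR in (12) and the max in (13) still have to be expressed with integer linear constraints.

```latex
\begin{align}
\text{maximize}\quad & \sum_{j \in T} k_j \tag{6}\\
\text{subject to}\quad & \sum_{j \in T} \alpha_j = 1, \qquad \alpha_j \in \{0,1\} \tag{7}\\
& k_j \le I(j), \qquad j \in T \tag{8}\\
& k_j \le L\,\alpha_j, \qquad j \in T \tag{9}\\
& k_j \ge 0, \qquad j \in T \tag{10}\\
& g_i(t) = f_{g_i}\bigl(\mathrm{Fin}(g_i(t))\bigr), \qquad g_i \in G,\ t \in T \tag{11}\\
& s_i(t_k) = g_i(t_k) \oplus g_i(t_{k-1}), \qquad t_k \in T(g_i) \tag{12}\\
& I_{g_i}(j) = \max_{m \in [T(g_i)]_j} I^{m}_{g_i}(j), \qquad I^{m}_{g_i}(j) = s_i(m)\, c^{m}_{g_i}(j) \tag{13}\\
& I(j) = \sum_{g_i \in G} I_{g_i}(j), \qquad j \in T \tag{14}
\end{align}
```

Constraints (7)-(10) are Proposition 1 applied to the multi-function whose outputs are I(j) for j in T.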
Here, L is a large real number whose value is greater than or equal to any possible value of the instantaneous current at any time instance, and the values of α_1, α_2, ..., α_m are limited to 0 and 1. The objective function (6) maximizes the instantaneous current. Constraints (7), (8), (9), and (10) implement the multi-function optimization (see Proposition 1); constraint (7) states that the maximum instantaneous current occurs at only one time instance. Constraint (11) states that the value g_i(t) is determined by the values of all fanin nodes at the corresponding times; the functionality of g_i is expressed using Rules 1-4. Constraint (12) states that gate g_i switches at time t_k when its output values at times t_k and t_{k-1} differ. Constraint (13) defines the instantaneous current at time j contributed by gate g_i as the maximum of all possible current contributions that g_i can have at time j. The Max operation can be expressed using integer linear constraints (this transformation is tedious, and the details are omitted here). Constraint (14) states that the instantaneous current of the entire circuit at each time instance is the sum of the current contributions of all gates at that time instance. The optimal value of the objective function is the maximum instantaneous current.
Example 6: Consider again the circuit in Fig. 2(a). The rising and falling delays of all gates are assumed to be 0.1ns. The transient current is assumed to be triangular, with a peak value of 3mA and a pulse duration of 0.3ns. The ILP formulae for finding the maximum current of the circuit are shown in Fig. 9(b). In this figure, t_{-1} and t_0 are the time instances of the first and second vectors applied to the primary inputs, respectively. The unconstrained variables are a(t_{-1}), ..., c(t_0). The solution of the ILP formulae gives the maximum current as well as the corresponding values of variables a(t_{-1}) through c(t_0). The objective function (Eq. 4.0) corresponds to Equations (6)-(10), and Equations (4.1)-(4.8), (4.9)-(4.13), (4.14)-(4.24) and (4.25)-(4.29) correspond to Equations (11), (12), (13) and (14), respectively.
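For very small circuits, the quantity that the ILP maximizes can also be computed by exhaustive simulation of all 4^n two-vector sequences, which is useful as a cross-check. The sketch makes several assumptions:
- a hypothetical netlist of three 2-input NAND gates (not the circuit of Fig. 2(a))
- transport delays of 0.1ns on every gate
- a symmetric triangular pulse (3mA peak, 0.3ns duration) that starts at each output switch, borrowing the parameters of Example 6
- time sampled in 0.01ns steps

As in (13), a gate's contribution at time j is the maximum over its switches. As in (14), the contributions of all gates are summed.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

# Hypothetical netlist: every gate is a 2-input NAND; gates listed in topological order.
NETLIST: Dict[str, Tuple[str, str]] = {"d": ("a", "b"), "e": ("d", "c"), "f": ("d", "e")}
INPUTS = ("a", "b", "c")
DELAY = 10       # 0.1 ns gate delay, in 0.01 ns steps
PULSE = 30       # 0.3 ns pulse duration, in 0.01 ns steps
PEAK_MA = 3.0    # pulse peak current in mA
HORIZON = DELAY * (len(NETLIST) + 1) + PULSE  # last sampled time step


def pulse(dt: int) -> float:
    """Triangular current pulse sampled dt steps after an output switch (assumed shape)."""
    if dt < 0 or dt > PULSE:
        return 0.0
    half = PULSE / 2
    return PEAK_MA * (1 - abs(dt - half) / half)


def peak_current(v1: Sequence[int], v2: Sequence[int]) -> float:
    """max_j I(j) for one two-vector sequence; the inputs switch from v1 to v2 at t = 0."""
    times = range(-DELAY, HORIZON + 1)
    wave = {x: {t: (v1[n] if t < 0 else v2[n]) for t in times} for n, x in enumerate(INPUTS)}
    for g, (p, q) in NETLIST.items():
        # Transport delay: the output at t is the NAND of the inputs at t - DELAY;
        # times before the sampled window fall back to the steady state under v1.
        wave[g] = {t: 1 - (wave[p].get(t - DELAY, wave[p][-DELAY])
                           & wave[q].get(t - DELAY, wave[q][-DELAY])) for t in times}
    switches = {g: [t for t in times if t > -DELAY and wave[g][t] != wave[g][t - 1]]
                for g in NETLIST}
    best = 0.0
    for j in range(HORIZON + 1):
        # Per-gate max over switches as in (13), then the sum over gates as in (14).
        total = sum(max((pulse(j - m) for m in switches[g]), default=0.0) for g in NETLIST)
        best = max(best, total)
    return best


def exhaustive_max_current() -> Tuple[float, Tuple[int, ...], Tuple[int, ...]]:
    """Enumerate all 4^n two-vector sequences and return the worst case found."""
    best: Tuple[float, Tuple[int, ...], Tuple[int, ...]] = (0.0, (), ())
    for v1 in product((0, 1), repeat=len(INPUTS)):
        for v2 in product((0, 1), repeat=len(INPUTS)):
            current = peak_current(v1, v2)
            if current > best[0]:
                best = (current, v1, v2)
    return best
```

For this toy netlist, the worst case is 7mA, reached for example by v1 = (0, 0, 0), v2 = (1, 1, 1). Gates d and e both switch at 0.1ns and peak together, while the pulse of gate f (switching at 0.2ns) is still rising. The exponential number of sequences is what makes this check impractical beyond a handful of inputs.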
Partitioning-based approach
The time required for solving the ILP formulae grows rapidly with the increase of the size of the circuit. Therefore, we propose a partitioning-based approach to obtain upper bounds of the worst-case solutions for larger circuits.
This approach partitions a large circuit into sub-circuits, and applies our ILP-based approach for each sub-circuit to obtain the worst-case solution for each sub-circuit. After these worst-case solutions are obtained, the summation of the solutions of all the sub-circuits represents an upper bound of the worst-case solution for the entire circuit. In this section, we investigate the partitioning issues in order to achieve tight upper bounds of the worst-case solutions.
In its first step, our partitioning-based approach for obtaining a tight upper bound on the maximum instantaneous current uses the iMax algorithm [10] to produce an upper bound on the instantaneous current at each time instance. As mentioned before, the bound given by iMax for each time instance is loose. All time instances with a non-zero upper bound on the instantaneous current are then put in a processing list, which is sorted in descending order of the corresponding upper bounds.
We select the time instance with the highest upper bound from the processing list and extract the part of the circuit that contributes to this upper bound. We then apply the partitioning algorithm K-MAFM [2] to partition this part of the circuit into sub-circuits. The maximum number of gates allowed in each sub-circuit is set to 300 in our experiments; empirically, this value gives tight upper bounds in a reasonable CPU time for our ILP-based approach. After partitioning the extracted circuit into sub-circuits, we obtain all possible switching time instances of all inputs of all sub-circuits. For each sub-circuit, the input values at the possible switching times are left unconstrained. Solving the corresponding ILP formulae yields the maximum instantaneous current of the sub-circuit together with its input values at all corresponding time instances.
We apply the ILP-based approach to each sub-circuit and then sum up the exact solutions of the sub-circuits. Because signal correlations within each sub-circuit are taken into account, the result is a new upper bound on the instantaneous current at the given time that is tighter than the bound obtained by iMax. Time instances in the processing list whose upper bounds are already lower than the newly obtained upper bound need not be processed, so we remove them from the processing list. The process continues by selecting the time instance with the next highest upper bound on the instantaneous current, and it ends when the processing list is empty. The maximum of the new upper bounds over all time instances is reported as a tight upper bound on the maximum instantaneous current of the entire circuit. Figure 10 summarizes our algorithm.
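The processing-list loop summarized in Figure 10 can be sketched as follows. The callbacks are hypothetical:
- `contributing_gates` stands in for extracting the part of the circuit that contributes to the bound at a time instance.
- `solve_subcircuit` stands in for running the ILP-based approach on one sub-circuit.
- `chunk_partition` is a naive stand-in for K-MAFM [2], not that algorithm.

Pruning against each newly obtained bound is cumulative, so it removes every pending instance whose iMax bound is below the best refined bound found so far.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Gate = Hashable


def chunk_partition(gates: Sequence[Gate], max_gates: int = 300) -> List[List[Gate]]:
    """Hypothetical stand-in for K-MAFM [2]: consecutive chunks of at most max_gates gates."""
    return [list(gates[i:i + max_gates]) for i in range(0, len(gates), max_gates)]


def partitioned_upper_bound(
    imax_bound: Dict[float, float],
    contributing_gates: Callable[[float], Sequence[Gate]],
    partition: Callable[[Sequence[Gate]], List[List[Gate]]],
    solve_subcircuit: Callable[[List[Gate], float], float],
) -> Tuple[float, Dict[float, float]]:
    """Refine iMax bounds one time instance at a time, following the flow of Figure 10."""
    # Processing list: time instances with a non-zero iMax bound, highest bound first.
    pending = sorted((t for t, b in imax_bound.items() if b > 0),
                     key=lambda t: imax_bound[t], reverse=True)
    refined: Dict[float, float] = {}
    while pending:
        t = pending.pop(0)
        cone = contributing_gates(t)  # part of the circuit contributing at time t
        # Summing the exact maxima of the sub-circuits gives an upper bound at time t.
        refined[t] = sum(solve_subcircuit(sub, t) for sub in partition(cone))
        # Drop pending instances whose iMax bound is already below the new bound.
        pending = [u for u in pending if imax_bound[u] >= refined[t]]
    return (max(refined.values()) if refined else 0.0), refined


# Toy usage with made-up bounds and a hypothetical solver (1 mA per gate).
imax = {1: 10.0, 2: 7.0, 3: 3.0, 4: 0.0}
bound, per_time = partitioned_upper_bound(
    imax,
    contributing_gates=lambda t: [f"g{i}" for i in range(t + 3)],
    partition=lambda gates: chunk_partition(gates, max_gates=2),
    solve_subcircuit=lambda sub, t: float(len(sub)),
)
print(bound, sorted(per_time))  # prints: 5.0 [1, 2]
```

In the toy run, time 3 is never solved: its iMax bound (3.0) is already below the refined bound of 4.0 obtained at time 1.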
EXPERIMENTAL RESULTS
We have implemented the four proposed approaches and tested them on benchmark circuits. In the experiments for the timed-ATPG and probability based approaches, the value of I_cutoff was set to 50% of the maximum current predicted by the iMax algorithm. The time granularity for calculating the current was 0.01ns for all approaches. The parameter p used in the probability based approach was set to 2. For both of these approaches, we generate a small set of patterns and simulate them using PowerMill [13]; the number of generated patterns does not exceed 100 for any of the tested benchmark circuits. For the genetic algorithm based approach, the execution time is directly proportional to the population size and the number of generations, so both parameters must be chosen to explore the solution space adequately while limiting the execution time. The population size is set to 24 and the number of generations to 100, so 24 × 100 = 2400 patterns are simulated with PowerMill. We also generate a set of weighted random patterns, of the same size as the number of patterns used by the GA based approach, and simulate them using PowerMill. These patterns are generated so that 90% of the inputs are assigned transitions, because simulating patterns in which every input has a transition does not necessarily produce the maximum current; this is especially true for circuits with XOR gates. In the experiments for the ILP based approach, we formulate the problem as ILP formulae and solve them with the commercially available tool LINDO [11]. The exact solutions obtained by the ILP based approach can be used to evaluate the estimation quality of the other approaches.
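The weighted random baseline can be sketched as follows. The text does not say whether exactly 90% of the inputs switch in every pattern or each input switches with probability 0.9. This sketch assumes the former, with random transition directions. The function and parameter names are hypothetical.

```python
import random
from typing import List, Tuple

Pattern = Tuple[List[int], List[int]]  # (v1, v2)


def weighted_random_patterns(n_inputs: int, count: int, frac_switching: float = 0.9,
                             seed: int = 0) -> List[Pattern]:
    """Two-vector patterns in which round(frac_switching * n_inputs) randomly chosen
    inputs switch between v1 and v2, while the remaining inputs stay stable."""
    rng = random.Random(seed)
    n_switch = round(frac_switching * n_inputs)
    patterns = []
    for _ in range(count):
        v1 = [rng.randint(0, 1) for _ in range(n_inputs)]
        switching = set(rng.sample(range(n_inputs), n_switch))
        # A switching input flips between the two vectors; its direction follows v1.
        v2 = [1 - b if i in switching else b for i, b in enumerate(v1)]
        patterns.append((v1, v2))
    return patterns


patterns = weighted_random_patterns(n_inputs=20, count=2400)
print(len(patterns), sum(a != b for a, b in zip(*patterns[0])))  # prints: 2400 18
```

Using the same budget as the GA (2400 patterns) keeps the comparison between the two simulation-based approaches fair.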
The estimated maximum instantaneous current for 9 small MCNC benchmark circuits is shown in Table 3. All values are normalized with respect to the exact solution obtained by ILP. In this table, Columns 2-3, 4-5, 6-7, 8-9, 10-11 and 12-13 show the maximum instantaneous current and the normalized values estimated by (1) the iMax algorithm, (2) the ILP-based approach, (3) the timed-ATPG-based approach, (4) the probability-based approach, (5) the genetic-algorithm-based approach and (6) the random approach, respectively. The values estimated by the timed-ATPG, probability-based, genetic-algorithm-based and random approaches are lower bounds, the values estimated by the iMax algorithm are upper bounds, and the exact solution is derived by the ILP-based approach. All values except those derived by the iMax algorithm are validated by PowerMill. The average CPU time over all benchmarks is also reported for each approach.
The experimental results show that the timed-ATPG and genetic-algorithm based approaches produce tight lower bounds on the maximum instantaneous current. The probability-based approach is more efficient, but it produces a looser lower bound. The genetic-algorithm-based approach runs faster than the timed-ATPG based approach, while the bounds derived by the two approaches are similar. Also, for these small circuits, the lower bounds derived by the timed-ATPG and genetic algorithm based approaches are close to the exact solutions derived by the ILP based approach.
The estimated maximum instantaneous current for the 8 largest ISCAS85 benchmark circuits is shown in Table 4. The maximum instantaneous current and normalized values estimated by (1) the iMax algorithm, (2) the PIE algorithm with heuristic H2 and parameter value 1000, (3) the ILP-with-partitioning approach, (4) the probability-based approach, (5) the genetic algorithm based approach, and (6) the random approach are shown in Columns 2-3, 4-5, 6-7, 8-9, 10-11, and 12-13, respectively. All normalized values are with respect to the values derived by the ILP-with-partitioning approach. The values estimated by the ILP-with-partitioning approach and by the iMax and PIE algorithms are upper bounds, and the values estimated by the probability-based, genetic algorithm based and random approaches are lower bounds.
    Perform the iMax algorithm to obtain the upper bound of the instantaneous current at each time instance;
    Sort the time instances into a processing list;
    While the processing list is not empty {
        Select the time instance with the highest upper bound of the instantaneous current;
        Extract the part of the circuit which contributes to this current;
        Partition the extracted circuit into sub-circuits;
        Apply the ILP-based approach to each sub-circuit to obtain the instantaneous current at this time;
        Sum up the instantaneous current at this time over all sub-circuits;
        Update the upper bound of the instantaneous current at this time;
        Update the processing list;
    }
Figure 10: The flow of the partitioning approach for estimating the maximum instantaneous current.
The CPU times of all approaches are reported in Table 5 .
The experimental results show that the ILP-with-partitioning approach provides tighter upper bounds of the worst-case solutions for large circuits as compared to the bounds derived by iMax algorithm. Note that the upper bounds derived by ILP-with-partitioning approach are close to the lower bounds derived by the genetic algorithm based approach. Therefore, the two bounds confine the worst-case solutions to a small range.
The results for the 12 largest ISCAS89 sequential benchmark circuits are shown in Table 6. Columns 2 and 3 show the maximum instantaneous current estimated by the random approach and by the GA approach, respectively. Column 4 shows the ratio of the maximum current produced by GA to that produced by the random approach. On average, the GA gives 21% tighter lower bounds on the maximum instantaneous current than the random approach. The CPU times of the two approaches are reported in Columns 5 and 6, respectively.
We have also tested our GA approach on a set of large industrial designs whose statistics are summarized in Table 7. The table gives the number of primary inputs, the technology, the power supply voltage, and the number of transistors of these circuits. The results for the maximum instantaneous current are shown in Table 8. Column 2 shows the maximum instantaneous current estimated with the functional vectors, while Column 3 shows the current estimated by the GA approach. Column 4 shows the ratio of the maximum current produced by GA to that produced by the functional vectors. Columns 5 and 6 give the number of patterns used for the functional vectors and for GA. On average, the GA gives 82% tighter lower bounds on the maximum instantaneous current than the functional vectors. The CPU times are reported in Columns 7 and 8, respectively.
Table 9 summarizes the comparison of the proposed approaches. The ILP based approach has the highest computational complexity due to the need for solving the ILP formulae, while the probability-based approach has the lowest. In terms of estimation quality, the ILP based approach offers the exact solution and is therefore of the highest quality. The timed-ATPG and GA based approaches also offer high estimation quality, while the quality of the random approach is the lowest. Finally, the GA based and random approaches are the most flexible for handling circuits at different levels of abstraction, because they are primarily simulation based and can thus be combined with simulators at different levels of abstraction.
DISCUSSION AND CONCLUSIONS
We investigate four different approaches for estimating the maximum instantaneous current through the power supply lines of CMOS circuits. The experimental results show that in comparison with the lower and upper bounds derived by other approaches, the bounds produced by our approaches are much tighter. For producing the lower bound of large circuits, the GA-based approach seems to be most efficient and effective. As for producing a tight upper bound, the ILP-with-partitioning approach significantly outperforms previous approaches. Also, the bounds confine the exact solutions to a relatively small range. 
PRIOR WORK, MODEL AND ASSUMPTIONS
Example 1:
DERIVING A LOWER BOUND ON THE MAXIMUM INSTANTANEOUS CURRENT
3.1. Timed-ATPG and probability based approaches
Timed-ATPG based approach
Step 3. After all gates in the target set have been processed, we check whether the justification list is empty. Some signals may appear in the justification list more than once, since their values may need to be justified in more than one time interval. If all signals are successfully justified, some primary inputs may still have unspecified values; for each such PI, we obtain the set of possible excitations from the derived uncertainty waveforms at the primary inputs and randomly assign one of them. On the other hand, if it is impossible to justify all the signals in the justification list for a given target time, we backtrack to the last decision in the path sensitization phase and try to sensitize a different path. If a different path can be sensitized, justification is attempted again. The procedure ends when either all gates are justified or all possibilities for sensitizing the paths have been explored. The whole procedure is then repeated for the next (T, G) pair.
Example 2: Consider again the circuit in Fig. 2(a), for which the target times are 0.1ns, 0.2ns and 0ns. Since the current is highest at time t = 0.1ns, we process this target time first. For t = 0.1ns, the set of target gates contains gates f, e and d, whose current contributions at the target time are 3mA, 3mA and 2mA, respectively. Either gate f or gate e can be processed first; we pick gate f since it is further from the primary inputs than gate e. From the uncertainty waveforms in Fig. 2(b), we see that the current contribution of gate f at the target time could be due to either a rising or a falling transition at time t = 0.2ns. In our example, the currents due to the falling and rising transitions are the same, and we randomly assign a falling transition to signal f. Next, we have to sensitize a path from f to some primary input.
The only two paths that can satisfy the timing requirements for a falling transition at t = 0.2ns at the output of f are paths {adf, falling} and {bdf, falling}. Let the chosen sensitized path be {adf, falling}. The sensitized path and the requirements on the path on-inputs are shown in Fig. 5(a). Next, we describe the process of updating the uncertainty waveforms: (1) Since the on-input d must have a rising transition at time t = 0.1ns (in order to have a falling transition at time t = 0.2ns on signal f), the only possible excitations at the off-input e are S1, a rising transition at t = 0.1ns, and a falling transition at t = 0.2ns. Therefore, the uncertainty waveform at e is updated, and the new uncertainty waveform is implied across gate f (Fig. 5(a)). (2) In order to have a rising transition at signal d at time t = 0.1ns when input a has a falling transition at t = 0, signal b can only be assigned a falling transition or an S1 value. The new uncertainty waveforms for signals a and b are shown in Fig. 5(b). (3) A rising transition at t = 0.1ns at input d, combined with any excitation at primary input c, cannot produce a rising transition at t = 0.1ns at output e. Therefore, the uncertainty waveform of signal e has to be updated further, as shown in Fig. 5(b).
Probability based approach
Step 1. Since our goal is to have as many gates in G as possible switching at the target time T, each target gate g is initially assigned excitation lists containing the pair (w_f, T+t_r(g)) for a falling output transition and the pair (w_r, T+t_f(g)) for a rising one. Here, w_f (w_r) represents the current contribution caused by the falling (rising) transition at the output of g at time T+t_r(g) (T+t_f(g)).
Step 2. After all the excitation lists at the outputs of the target gates have been initialized, we propagate these lists backward to the primary inputs. The gates are processed in topological order, starting from the target gates towards the primary inputs. For each gate, the lists for all four excitations are propagated from the output to each of its inputs by backward propagating each (w, t) pair in the output list according to the rules explained below. We illustrate the rules for a rising transition at an input of a 2-input NAND gate; the rules for other excitations and other gate types can be derived in a similar way. Table 1 shows the truth table for a 2-input NAND gate. The first three rows in the left half of the table show all possible input excitations that result in a falling transition at the output of the NAND gate, the next three rows show the same for the rising transition, etc. There are three input combinations that lead to a falling transition at the output. From column in1 (or in2), we see that in two of these combinations input in1 (in2) has a rising transition and in one it has an S1 value. Therefore, the coefficient A_r, the weight of a rising transition at the input when the output has a falling transition, is A_r = 2/(1+2). Because we are interested in finding the maximum current, we prefer transitions at the internal signals to stable values. Let p denote the weight of a transition with respect to a stable value: if p = 1, a transition and a stable value are weighted equally; if p = 2, a transition is twice as valuable as a stable value; and so on. The expression for A_r then becomes A_r = 2p/(1+2p). If the output of a NAND gate has a rising transition or an S0 value, Table 1 shows that no input can have a rising transition; therefore, the corresponding coefficients are B_r = 0 and C_r = 0. When the output of a NAND gate has an S1 value, nine input combinations are possible: the input is stable in five of them and has a transition in four, two of which are rising. Therefore, using the transition weight p, we get D_r = 2p/(5+4p). Table 2 shows all the coefficients for a 2-input NAND gate.
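The counting argument above can be reproduced mechanically. The sketch enumerates the two-vector input excitations of a 2-input NAND gate and computes the coefficients for a rising transition at input in1, weighting transitions by p. It evaluates the gate only at v1 and v2, ignoring hazards, which matches the counts used above. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Dict

# Two-vector excitations as (value under v1, value under v2).
EXCITATIONS = {"S0": (0, 0), "S1": (1, 1), "R": (0, 1), "F": (1, 0)}


def nand_output(in1: str, in2: str) -> str:
    """Output excitation of a 2-input NAND for the given input excitations."""
    a, b = EXCITATIONS[in1], EXCITATIONS[in2]
    out = (1 - (a[0] & b[0]), 1 - (a[1] & b[1]))
    return next(name for name, v in EXCITATIONS.items() if v == out)


def rising_coefficients(p: Fraction) -> Dict[str, Fraction]:
    """Weight that in1 has a rising transition, given each output excitation.

    Each input combination producing the output excitation counts p if in1
    has a transition and 1 if in1 is stable.
    """
    coeffs = {}
    for out in ("F", "R", "S0", "S1"):  # A_r, B_r, C_r, D_r in the text
        rising = total = Fraction(0)
        for in1, in2 in product(EXCITATIONS, repeat=2):
            if nand_output(in1, in2) != out:
                continue
            weight = p if in1 in ("R", "F") else Fraction(1)
            total += weight
            if in1 == "R":
                rising += weight
        coeffs[out] = rising / total
    return coeffs


print(rising_coefficients(Fraction(2)))  # A_r = 4/5, B_r = 0, C_r = 0, D_r = 4/13
```

Using `Fraction` keeps the coefficients exact, so they can be compared directly with the closed forms 2p/(1+2p) and 2p/(5+4p).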
Table 1: Truth table for a 2-input NAND gate.
Next, we need to backward propagate these excitation lists. We illustrate this step for gate f; the procedure for gates e and d is similar. Using the rules from Table 2 for gate f, we get:
Similar to what was said for the fanout stems, for each gate the pairs with matching time components are combined by adding up their numerical components, while pairs with different time components are simply concatenated. After backward propagation through all gates in the circuit, at the primary inputs we get:
input c:
inputs a and b:
where s_1, s_2, s_3 and s_4 are complicated functions of p that are irrelevant for our discussion.
From the above expressions we see that all primary inputs have excitations that contain pairs of type (w, 0) and, for example, for p = 2 from these pairs we get:
input c: w_r = w_f = 2.28, w_0 = 0.36, w_1 = 1.05
inputs a and b: w_r = w_f = 4.20, w_0 = 0.36, w_1 = 3.21
These values can next be used to assign excitations at the primary inputs. For all inputs, the maximum values appear for falling/rising transitions; for example, assigning falling transitions to a and b and a rising transition to c produces the maximum current of 8mA.
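The sketch below covers two steps: the combination rule used during backward propagation, and the final choice of an excitation at each primary input. Under the combination rule, pairs with matching time components are added and all other pairs are kept side by side. Two choices are assumptions of this sketch: representing an excitation list as (w, t) tuples, and breaking ties by key order. The weights in the usage line are the p = 2 values for inputs a and b.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[float, float]  # (w, t): numerical component and time component


def combine(*lists: Iterable[Pair]) -> List[Pair]:
    """Merge excitation lists: weights of pairs with the same time component are
    added; pairs with different time components are kept as separate entries."""
    acc: Dict[float, float] = defaultdict(float)
    for pairs in lists:
        for w, t in pairs:
            acc[t] += w
    return sorted(((w, t) for t, w in acc.items()), key=lambda pair: pair[1])


def pick_excitation(weights: Dict[str, float]) -> str:
    """Assign the excitation with the largest weight; ties go to the first key listed."""
    return max(weights, key=lambda name: weights[name])


print(combine([(1.0, 0.1), (2.0, 0.2)], [(0.5, 0.1)]))  # [(1.5, 0.1), (2.0, 0.2)]
print(pick_excitation({"f": 4.20, "r": 4.20, "0": 0.36, "1": 3.21}))  # f
```

Because w_r = w_f for every input in this example, the tie-breaking order decides between a rising and a falling transition. Either choice yields the same weighted preference for switching.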
The bottleneck of the probability based approach is the iMax algorithm used in the first step. The excitation lists can be propagated backward in linear time, which makes this approach efficient for larger designs.
Genetic algorithm based approach
Fitness function for the maximum instantaneous current:
The maximum instantaneous current cannot be estimated accurately at the gate level, so we use the transistor-level power/current simulator PowerMill [13] to simulate each two-vector sequence. The peak current reported by PowerMill is used as the fitness value of the sequence. Because the solutions derived by the GA partly depend on the initial individuals, the GA may converge to and remain at a locally optimal solution. We apply the following heuristic to move the GA away from a local optimum: if there is no improvement after a pre-defined number of generations (k), the mutation probability of the next generation is changed to 1/2. Setting the mutation probability to this value generates new random individuals from the old good individuals, which can lead the GA to search for solutions in a completely different direction. In our experiments, k is set to 5. Our GA based algorithm is summarized in Fig. 7.
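The loop of Fig. 7 can be sketched in Python as follows. Several details are assumptions not given in the text:
- Individuals are bit strings encoding (v1, v2).
- The fitness is `toy_fitness`, a hypothetical stand-in for a PowerMill run that does not model current.
- Crossover is always applied.
- The base mutation rate is 1/(2n).
- The offspring replace the whole population.
- The stall counter is reset after the generation that uses mutation probability 1/2.

```python
import random
from typing import Callable, List, Optional, Tuple

Individual = List[int]  # bits of v1 followed by bits of v2


def toy_fitness(ind: Individual) -> float:
    """Hypothetical stand-in for a PowerMill peak-current run:
    the number of primary inputs that switch between v1 and v2."""
    n = len(ind) // 2
    return float(sum(a != b for a, b in zip(ind[:n], ind[n:])))


def tournament_without_replacement(pop: List[Individual], fits: List[float],
                                   rng: random.Random) -> List[Individual]:
    """Binary tournaments over two shuffled copies of the population."""
    winners = []
    for _ in range(2):
        order = list(range(len(pop)))
        rng.shuffle(order)
        for a, b in zip(order[0::2], order[1::2]):
            winners.append(pop[a] if fits[a] >= fits[b] else pop[b])
    return winners


def run_ga(n_inputs: int, fitness: Callable[[Individual], float], pop_size: int = 24,
           generations: int = 100, k: int = 5, base_mutation: Optional[float] = None,
           seed: int = 0) -> Tuple[Individual, float]:
    assert pop_size % 2 == 0, "pairing below assumes an even population size"
    rng = random.Random(seed)
    length = 2 * n_inputs
    base = base_mutation if base_mutation is not None else 1.0 / length
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_ind: Individual = []
    best_fit = 0.0  # Best-fitness = 0, as in Fig. 7
    stall = 0
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        top = max(range(pop_size), key=lambda i: fits[i])
        if fits[top] > best_fit or not best_ind:
            best_ind, best_fit, stall = list(pop[top]), fits[top], 0
        else:
            stall += 1
        # Escape heuristic: after k generations without improvement, mutate with 1/2 once.
        mutation = 0.5 if stall >= k else base
        if stall >= k:
            stall = 0
        parents = tournament_without_replacement(pop, fits, rng)
        children = []
        for p1, p2 in zip(parents[0::2], parents[1::2]):
            cut = rng.randrange(1, length)  # one-point crossover
            children += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
        pop = [[1 - b if rng.random() < mutation else b for b in child] for child in children]
    return best_ind, best_fit


best, fit = run_ga(n_inputs=8, fitness=toy_fitness)
```

With the defaults (population 24, 100 generations, k = 5), the fitness function is called 24 × 100 = 2400 times. This matches the PowerMill simulation budget reported in the experiments. In a real setup, each call would write the two vectors out, run the simulator, and parse the peak current.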
DERIVING AN UPPER BOUND ON THE MAXIMUM INSTANTANEOUS CURRENT
Figure 9: The ILP formulae of an example.
