A mixed integer linear programming (MILP) technique simultaneously minimizes the leakage and glitch power consumption of a static CMOS circuit for any specified input to output delay. Using dual-threshold devices the number of high-threshold devices is maximized and a minimum number of delay elements are inserted to reduce the differential path delays below the inertial delays of incident gates. The key features of the method are that the constraint set size for the MILP model is linear in the circuit size and power-performance tradeoff is allowed. Experimental results show 96%, 40%, and 70% reductions of leakage power, dynamic power, and total power, respectively, for the benchmark circuit C7552 implemented in the 70 nm BPTM CMOS technology.
INTRODUCTION
In the past, dynamic power dominated the total power dissipation of CMOS devices. However, with the continuing trend of technology scaling, leakage power is becoming a main contributor to power consumption. To reduce leakage power, several techniques have been proposed, including transistor sizing, multi-$V_{th}$, dual-$V_{th}$, optimal standby input vector selection, transistor stacking, and dual $V_{dd}$. Among these, dual-$V_{th}$ assignment is an efficient technique for decreasing leakage power. Its basic idea is to use the timing slack of non-critical paths to assign high $V_{th}$ to gates on those paths and thereby decrease the leakage. There are heuristic algorithms [8, 12, 20-24] that search for an optimal dual-$V_{th}$ assignment. For example, the backtrace algorithm [21, 22] can determine a feasible dual-$V_{th}$ assignment but does not guarantee an optimal one (see the example of Fig. 10). Because the backtrace search direction for non-critical paths is from primary outputs to primary inputs, the gates close to the primary outputs have a higher priority for high $V_{th}$ assignment, even though their leakage power savings may be smaller than those of gates close to the primary inputs. Wang et al. [20] treat the dual-$V_{th}$ assignment as a constrained 0-1 programming problem with non-linear constraint functions. They use a heuristic algorithm based on circuit graph enumeration to solve this problem. Although their swapping algorithm tries to avoid local optima, a global optimum is still not guaranteed.
When both the objective function and the constraints are described as linear functions, linear programming (LP) can readily find a globally optimal solution. Nguyen et al. [11] use LP to minimize the leakage and dynamic power by gate sizing and dual-threshold voltage device assignment. The optimization work is separated into several steps. An LP is first used to distribute slack to gates with the objective of maximizing total power reduction. Then, another independent algorithm resizes gates and assigns threshold levels. This means that the LP still needs the assistance of a heuristic algorithm to complete the optimization [11]. Gao and Hayes [5] use mixed integer linear programming (MILP) to optimize the total power consumption by dual-threshold assignment and gate sizing.
The techniques cited above [5, 8, 11, 12, 20, 22, 23] have not considered the glitch power, which can account for 20%-70% of the dynamic switching power [4]. To eliminate these unnecessary transitions, a designer can adopt the techniques of hazard filtering [2, 25-28] and path balancing [3, 14, 29]. In hazard filtering, gate sizing or transistor sizing is used to increase a gate's inertial delay so that it filters the glitches. An obvious disadvantage of hazard filtering, when used alone, is that it may increase the circuit delay because the gate delay increases. Alternatively, any given performance can be maintained by path delay balancing, although the area overhead and additional power consumption of the inserted delay elements can become a major concern.
In the present research, a new MILP model is proposed to minimize leakage power by dual-$V_{th}$ assignment and simultaneously eliminate dynamic glitch power by inserting zero-subthreshold-leakage delay elements to balance path delays. To our knowledge, no previous work on optimizing dynamic and static power has adopted such a combined approach. This MILP method is specifically devised with a set of constraints whose size is linear in the number of gates. Thus, large circuits can be handled. Although the theoretical worst-case complexity of MILP is exponential, the actual complexity depends on the nature of the problem. A discussion of this point is presented at the end of Subsection 6.1. To deal with the complexities of delay models and leakage calculation, two look-up tables, one for the delay and one for the leakage current, covering both low and high threshold versions, are constructed in advance for each cell. This greatly simplifies the optimization procedure.
To further reduce power, other approaches such as gate sizing can be easily implemented by extending our cell library and look-up tables. However, a dual-$V_{dd}$ technique may require additional considerations beyond the delay look-up tables for low $V_{dd}$ and high $V_{dd}$.
This paper is organized as follows. Section 2 presents the necessary background on subthreshold leakage, delay, and glitches. Section 3 proposes the mixed integer linear programming model for power minimization. Sections 4 and 5 discuss the implementation of delay elements for glitch elimination and the superiority of MILP, respectively. In Section 6, experimental results are presented and discussed. A conclusion is given in Section 7. Some work from this paper has appeared in a recent presentation by the authors [9].
BACKGROUND

Leakage and Delay
The leakage current of a transistor is mainly the result of reverse biased PN junction leakage and subthreshold leakage. Compared to the subthreshold leakage, the reverse bias PN junction leakage can be ignored. The subthreshold leakage is the weak inversion current between source and drain of an MOS transistor when the gate voltage is less than the threshold voltage [24]. It is given by:
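A commonly used form of this expression, consistent with the symbols defined next, is sketched below. It is stated as an assumption: in particular, the prefactor $e^{1.8}$ comes from the usual BSIM-style subthreshold model and is not confirmed by the text here.

```latex
I_{sub} = \mu_0 C_{ox} \frac{W_{eff}}{L_{eff}} V_T^2 \, e^{1.8}
          \exp\!\left(\frac{V_{gs} - V_{th}}{n V_T}\right)
          \left(1 - \exp\!\left(\frac{-V_{ds}}{V_T}\right)\right)
```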
where $\mu_0$ is the zero bias electron mobility, $n$ is the subthreshold slope coefficient, $V_{gs}$ and $V_{ds}$ are the gate-to-source voltage and drain-to-source voltage, respectively, $V_T$ is the thermal voltage, $V_{th}$ is the threshold voltage, $C_{ox}$ is the oxide capacitance per unit area, and $W_{eff}$ and $L_{eff}$ are the effective channel width and length, respectively. Due to the exponential relation between $V_{th}$ and $I_{sub}$, an increase in $V_{th}$ sharply reduces the subthreshold current. Our Spice simulation results on the leakage current of a two-input NAND gate are given in Table I for 70 nm BPTM CMOS technology [1] ($V_{dd}$ = 1 V, low $V_{th}$ = 0.20 V, high $V_{th}$ = 0.32 V). The leakage current of a high $V_{th}$ gate is only about 2% of that of a low $V_{th}$ gate. If all gates in a CMOS circuit could be assigned the high threshold voltage, the total leakage power consumed in the active and standby modes could be reduced by up to 98%, which is a significant improvement. However, according to the following equation, the gate delay increases with the increase of $V_{th}$.
where $\alpha$ equals 1.3 for short channel devices [17]. Table II gives the delays of NAND gates obtained from Spice simulation when the output fans out to varying numbers of inverters. We observe that by increasing $V_{th}$ from 0.20 V to 0.32 V, the gate delay increases by 30%-40%.
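The relation referred to here is presumably the alpha-power law. One form, given as an assumption, with $C_L$ denoting the load capacitance (a symbol introduced only for this illustration), is:

```latex
T_{pd} \propto \frac{C_L V_{dd}}{(V_{dd} - V_{th})^{\alpha}}
```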
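As a rough numerical illustration of the delay penalty, the sketch below evaluates the ratio of high-threshold to low-threshold delay. It assumes the alpha-power dependence $T_{pd} \propto 1/(V_{dd} - V_{th})^{\alpha}$ with all other factors held fixed, which is an assumption about the form of the delay equation; the function name `delay_ratio` is hypothetical.

```python
# First-order estimate of the delay penalty of a high-Vth gate.
# Assumption: delay depends on Vth only through 1 / (Vdd - Vth)**alpha.
VDD = 1.0        # supply voltage in volts (from the text)
VTH_LOW = 0.20   # low threshold in volts (from the text)
VTH_HIGH = 0.32  # high threshold in volts (from the text)
ALPHA = 1.3      # short-channel exponent (from the text)


def delay_ratio(vdd: float, vth_low: float, vth_high: float, alpha: float) -> float:
    """Return T_pd(high Vth) / T_pd(low Vth) under the assumed model."""
    return ((vdd - vth_low) / (vdd - vth_high)) ** alpha


ratio = delay_ratio(VDD, VTH_LOW, VTH_HIGH, ALPHA)
print(f"estimated delay ratio: {ratio:.3f}")
```

Under this assumption the estimate is an increase of roughly 24%, somewhat below the 30%-40% obtained from Spice simulation; the simulated values of Table II remain the reference.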
We can make tradeoffs between leakage power and performance, leading to a significant reduction in the leakage power while sacrificing only some or none of circuit performance. Such a tradeoff is made in MILP. Results in Section 6.1 show that the leakage power of all ISCAS85 benchmark circuits can be reduced by over 90% if the delay of the critical path is allowed to increase by 25%.
Glitch Elimination Techniques
When transitions are applied at the inputs of a gate, the output may have multiple transitions before reaching a steady state (Fig. 1 and Fig. 2(a)). Among these, at most one is an essential transition, and all others are unnecessary transitions, often called glitches or hazards. Because the switching power consumed by the gate is directly proportional to the number of output transitions, glitches reportedly account for 20%-70% of dynamic power [4]. Agrawal et al. [3] prove that a combinational circuit is a minimum transient energy design, i.e., there is no glitch at the output of any gate, if the difference of the signal arrival times at every gate's inputs remains smaller than the inertial delay of the gate. This condition is expressed by the following inequality:
where we assume $t_1$ is the earliest arrival time at the inputs, $t_n$ is the latest arrival time at another input, and $d_i$ is the gate's inertial delay, as illustrated in Figure 1. The interval $t_n - t_1$ is referred to as the gate output timing window [14]. To satisfy inequality (4), we can either increase the inertial delay $d_i$ (hazard filtering) or decrease the path delay difference $t_n - t_1$ (path balancing). Figures 2(b) and (c) illustrate these procedures for the gate of Figure 2(a). Hazard filtering, when used alone, can increase the overall input to output delay. Path balancing does not increase the delay but requires insertion of delay elements. A combination of the two procedures can give an optimum design [3, 29].
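In the notation explained next, the condition stated in words above can be written as follows (a reconstruction from the surrounding definitions):

```latex
t_n - t_1 < d_i \qquad (4)
```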
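The glitch-free condition and the path balancing idea can be sketched for a single gate as follows. This is a minimal illustration, not the optimization used in this paper: the function names and the small `margin` parameter are hypothetical, and only the early inputs are padded so that the latest arrival time is left unchanged.

```python
from typing import List


def is_glitch_free(arrivals: List[float], inertial_delay: float) -> bool:
    """True if the input timing window is narrower than the inertial delay."""
    return max(arrivals) - min(arrivals) < inertial_delay


def balancing_delays(arrivals: List[float], inertial_delay: float,
                     margin: float = 1e-9) -> List[float]:
    """Delay to add on each input so that the timing window fits.

    Early inputs are pushed up to (latest - inertial_delay + margin);
    inputs already inside the window get zero added delay.
    """
    earliest_allowed = max(arrivals) - inertial_delay + margin
    return [max(0.0, earliest_allowed - t) for t in arrivals]


arrivals = [1.0, 4.0]   # hypothetical input arrival times
d_i = 2.0               # hypothetical inertial delay of the gate
pads = balancing_delays(arrivals, d_i)
balanced = [t + p for t, p in zip(arrivals, pads)]
print(is_glitch_free(arrivals, d_i), is_glitch_free(balanced, d_i))
```

Treating each gate in isolation like this ignores the fact that a delay inserted at one gate shifts arrival times downstream, which is why a global formulation is needed.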
AN MILP FOR POWER MINIMIZATION
We use a mixed integer linear programming (MILP) model to determine the optimal assignment of $V_{th}$ while maintaining any given performance requirement on the overall circuit delay. To minimize the total leakage, the MILP assigns high $V_{th}$ to the largest possible number of gates while controlling the critical path delays. Unlike the heuristic algorithms [8, 12, 20, 22, 23], the MILP gives us a globally optimal solution, as discussed in Section 5.
To eliminate the glitch power, additional MILP constraints determine the positions and values of the delay elements to be inserted to balance path delays within the inertial delay of the incident gates. We can easily make a tradeoff between power reduction and performance degradation by changing the constraint for the maximum path delay in the MILP model.
Variables
Each gate is characterized by four variables:
$X_i$: assignment of low or high $V_{th}$ to gate $i$, specified by an integer that can only be 0 or 1. A value of 1 means that gate $i$ is assigned low $V_{th}$, and 0 means that gate $i$ is assigned high $V_{th}$. Each gate has two possible delay values, $D_{Li}$ and $D_{Hi}$, corresponding to low and high thresholds, respectively.
$T_i$: latest time at which the output of gate $i$ can produce an event after the occurrence of an input event at primary inputs of the circuit.
$t_i$: earliest time at which the output of gate $i$ can produce an event after the occurrence of an input event at primary inputs of the circuit.
$d_{ij}$: delay of a possible delay element that may be inserted at the input of gate $i$ from gate $j$.
Thus, an $n$-input gate is characterized by $n + 5$ quantities, i.e., $n$ input buffer delay variables, two inertial delay constants, one [0, 1] integer variable, and two output timing window variables.
Objective Function
The objective function for the MILP is minimization of the sum of all gate leakage currents $I_{leak,i}$ and the sum of all inserted delays:
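Written out from this description, with $d_{ij}$ the inserted delay variables defined above, the objective takes the following form (a reconstruction, with both terms unweighted):

```latex
\min \left\{ \sum_i I_{leak,i} + \sum_i \sum_j d_{ij} \right\} \qquad (5)
```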
For a static CMOS circuit, the leakage power is
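Assuming the usual relation between supply voltage and total leakage current, this can be written as:

```latex
P_{leak} = V_{dd} \sum_i I_{leak,i}
```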
If we know the leakage currents of all gates, the leakage power can be easily obtained. Therefore, the first term in the objective function of this MILP minimizes the sum of all gate leakage currents, i.e.,
$I_{Li}$ and $I_{Hi}$ are the leakage currents of gate $i$ with low $V_{th}$ and high $V_{th}$, respectively. Recognizing that the subthreshold current of a gate depends on its input state, we build a leakage current look-up table of $I_{Li}$ and $I_{Hi}$ for all gates $i$ through simulation. These look-up tables are similar to Table I and are used for power estimation by logic simulation as discussed in Section 6. For the MILP, we need one set of $I_{Li}$ and $I_{Hi}$ for each gate, and the average values from the look-up tables can be used.
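With the 0-1 variable $X_i$ selecting between the two leakage values of each gate, this term can be written as follows (a reconstruction from the variable definitions):

```latex
\min \sum_i \left[ X_i I_{Li} + (1 - X_i) I_{Hi} \right]
```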
Besides the leakage power, we also minimize the glitch power, simultaneously. We insert minimal delays to satisfy the glitch elimination conditions at all gates. This leads to the second term in the objective function:
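In terms of the delay variables $d_{ij}$, this second term can be written as follows (a reconstruction from the description):

```latex
\min \sum_i \sum_j d_{ij}
```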
When implementing these delay elements, we use transmission gates, which have only gate leakage. The two terms in the objective function, $\sum_i I_{leak,i}$ and $\sum_i \sum_j d_{ij}$, have different units, and numerically $\sum_i I_{leak,i}$ is 50 to 1000 times larger than $\sum_i \sum_j d_{ij}$ in our examples of benchmark circuits. Therefore, the objective function of Eq. (5) puts greater emphasis on leakage power, assuming it to be the dominant contributor to the total power. Experimental results show that an objective function $\min \{ A \sum_i I_{leak,i} + B \sum_i \sum_j d_{ij} \}$ with $A$ a large constant and $B = 1$ generates the same results as the objective function of Eq. (5), in which the terms are left unweighted. In general, suitable weight factors $A$ and $B$ can be used to make tradeoffs between leakage power reduction and glitch power elimination.
Constraints
Constraints are imposed on each gate i with respect to each of its fanin j, where j refers to the gate providing the fanin:
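Based on the variable definitions and on the explanation that follows, the three constraints can be reconstructed as below. The term $X_i D_{Li} + (1 - X_i) D_{Hi}$ is the delay of gate $i$ under its chosen threshold; the exact form is an assumption consistent with the surrounding text.

```latex
T_i \ge T_j + d_{ij} + X_i D_{Li} + (1 - X_i) D_{Hi} \qquad (9)
t_i \le t_j + d_{ij} + X_i D_{Li} + (1 - X_i) D_{Hi} \qquad (10)
X_i D_{Li} + (1 - X_i) D_{Hi} \ge T_i - t_i \qquad (11)
```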
where $D_{Hi}$ and $D_{Li}$ are the delays of gate $i$ with high $V_{th}$ and low $V_{th}$, respectively. With the increase in fanouts, the delay of the gate increases proportionately. Therefore, a look-up table is constructed by simulation that specifies the delays for all gate types for varying fanout numbers. $D_{Li}$ and $D_{Hi}$ for gate $i$ are obtained from the look-up table, whose entries are indexed by the gate type and the number of fanouts. As discussed in Subsection 2.2, constraints (9-11) ensure that the inertial delay of gate $i$ is always larger than the delay difference of its input paths. This is done by inserting the minimal number of delay elements while maintaining the critical path delay constraints. We explain constraints (9-11) using the circuit shown in Figure 3. Here the numbers on gates are gate indexes and not the delays. Red (bold) lines show critical paths, and two grey shaded triangles are delay elements possibly inserted on the input paths of gate 2. Similar delay elements are placed on all primary inputs and fanout branches throughout the circuit. Let us assume that all primary input (PI) signals on the left arrive at the same time. For gate 2, one input is from gate 0 and the other input is directly from a PI. Its constraints corresponding to inequalities (9-11) are:
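Instantiating the three constraint types for gate 2 gives five inequalities. In the sketch below, the PI arrival time is taken as 0, and $d_{2,0}$ and $d_{2,PI}$ are assumed names for the delay elements on the inputs of gate 2 from gate 0 and from the primary input, respectively.

```latex
T_2 \ge T_0 + d_{2,0} + X_2 D_{L2} + (1 - X_2) D_{H2} \qquad (12)
T_2 \ge 0 + d_{2,PI} + X_2 D_{L2} + (1 - X_2) D_{H2} \qquad (13)
t_2 \le t_0 + d_{2,0} + X_2 D_{L2} + (1 - X_2) D_{H2} \qquad (14)
t_2 \le 0 + d_{2,PI} + X_2 D_{L2} + (1 - X_2) D_{H2} \qquad (15)
X_2 D_{L2} + (1 - X_2) D_{H2} \ge T_2 - t_2 \qquad (16)
```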
Variable $T_2$, which satisfies inequalities (12) and (13), is the latest time at which an event (signal change) could occur at the output of gate 2. Variable $t_2$ is the earliest time at which an event could occur at the output of gate 2, and it satisfies both inequalities (14) and (15). Constraint (16) means that the difference of $T_2$ and $t_2$, which equals the delay difference between the two input paths, is smaller than the inertial delay of gate 2. The critical path delay $T_{max}$ is specified at primary output (PO) gates 1 and 3, as:
$T_{max}$ can be the maximum delay specified by the circuit designer. Alternatively, the delay of the critical path ($T_c$) can be obtained from a linear program (LP) by assigning all gates to low $V_{th}$, i.e., $X_i = 1$ for all $i$. The objective function of this LP is minimization of the sum of the $T_k$, where $k$ refers to primary outputs. The critical path delay $T_c$ is then the maximum of the $T_k$ found by the LP.
If $T_{max}$ equals $T_c$, the actual objective of the MILP model is to minimize the total leakage current without affecting the circuit performance. By making $T_{max}$ larger than $T_c$, we can further reduce leakage power with some performance compromise, and thus make a tradeoff between leakage power consumption and performance.
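For the two primary output gates of Figure 3, this bound presumably reads:

```latex
T_i \le T_{max}, \quad i = 1, 3 \qquad (17)
```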
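The tradeoff described above can be illustrated on a toy example. The sketch below is not the MILP itself: it swaps in exhaustive enumeration of all threshold assignments, which is only practical for a handful of gates, and it covers the leakage-only case (all $d_{ij} = 0$, no glitch constraint). The 4-gate circuit, its delays, and its leakage values are hypothetical.

```python
from itertools import product

# Hypothetical 4-gate circuit. "PI" marks a primary input.
# DL/DH: delay with low/high Vth; IL/IH: leakage with low/high Vth (made-up units).
GATES = {
    "g0": {"fanin": ["PI"],       "DL": 2.0, "DH": 2.8, "IL": 100.0, "IH": 2.0},
    "g1": {"fanin": ["PI"],       "DL": 1.0, "DH": 1.4, "IL": 60.0,  "IH": 1.2},
    "g2": {"fanin": ["g0", "g1"], "DL": 2.0, "DH": 2.8, "IL": 100.0, "IH": 2.0},
    "g3": {"fanin": ["g1"],       "DL": 1.0, "DH": 1.4, "IL": 60.0,  "IH": 1.2},
}
ORDER = ["g0", "g1", "g2", "g3"]  # topological order
OUTPUTS = ["g2", "g3"]            # primary output gates


def critical_delay(x):
    """Largest T_k over primary outputs; x[g] = 1 is low Vth, 0 is high Vth."""
    latest = {"PI": 0.0}
    for g in ORDER:
        info = GATES[g]
        delay = x[g] * info["DL"] + (1 - x[g]) * info["DH"]
        latest[g] = max(latest[j] for j in info["fanin"]) + delay
    return max(latest[k] for k in OUTPUTS)


def total_leakage(x):
    """Sum of X_i * I_Li + (1 - X_i) * I_Hi over all gates."""
    return sum(x[g] * GATES[g]["IL"] + (1 - x[g]) * GATES[g]["IH"] for g in ORDER)


def best_assignment(t_max):
    """Enumerate every 0-1 assignment and keep the feasible one with least leakage."""
    best = None
    for bits in product((0, 1), repeat=len(ORDER)):
        x = dict(zip(ORDER, bits))
        if critical_delay(x) <= t_max + 1e-9:
            leak = total_leakage(x)
            if best is None or leak < best[0]:
                best = (leak, x)
    return best


t_c = critical_delay({g: 1 for g in ORDER})    # all gates low Vth
leak_same, x_same = best_assignment(t_c)       # no performance loss
leak_relaxed, _ = best_assignment(1.25 * t_c)  # 25% slower critical path
print(t_c, leak_same, leak_relaxed)
```

In this toy circuit the two off-critical gates receive high $V_{th}$ when $T_{max} = T_c$, and relaxing $T_{max}$ by 25% allows one critical-path gate to switch as well, which mirrors the behavior described for Figures 7 and 8.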
When we use this MILP model to simultaneously minimize leakage power with dual-$V_{th}$ assignments and reduce dynamic power by balancing path delays with inserted delay elements, the optimized version of the circuit of Figure 4 is as shown in Figure 5. In these figures the labels in or near gates are inertial delays. Three black shaded gates are assigned high $V_{th}$. They are not on critical paths (shown by red or bold lines) and their delay increase does not affect the critical path delay. Although delay elements were assumed to be present on all primary inputs and fanout branches, only two were assigned non-zero values. They are shown as grey triangles with delays of 1.5 and 3.0 units, respectively. To minimize the additional leakage and dynamic power consumed by these delay elements, we implement them with CMOS transmission gates. In Section 4, we will show that an always turned-on CMOS transmission gate can be treated as a zero-subthreshold-leakage and low-dynamic-power-consumption delay element [15, 16, 32].
A 14-gate full adder is used as a further illustration. Figure 6 is the original circuit with all low $V_{th}$ gates. Critical paths are shown in red (or bold) lines. Figure 7 shows an MILP solution. All gates on non-critical paths were assigned high $V_{th}$ (black shaded) to minimize leakage power. At the same time, three delay elements (grey shaded) are inserted to balance path delays and eliminate glitches. When the critical path delay is increased by 25%, the MILP gives the solution of Figure 8. Greater leakage power saving is achieved since some gates on the critical path are also assigned high $V_{th}$. All three circuits were implemented in the 70 nm BPTM CMOS technology [1] mentioned in Section 2.1. The three delay elements use high-$V_{th}$ devices and their design is described in the next section.
The average leakage currents for the circuits of Figures 6 (unoptimized), 7 (optimized with no critical path delay increase), and 8 (optimized with 25% increase in critical path delay) were 161 pA, 73 pA, and 16 pA, respectively.
DELAY ELEMENT IMPLEMENTATION
In our design, all the delay elements are implemented by transmission gates, whose obvious advantage is that they consume very little dynamic power because they are not driven by any supply rails [10]. They also have lower area overhead and leakage power consumption compared with the more conventional two-cascaded-inverter buffer [15, 16, 18, 19]. CMOS transmission gates are adopted in our design to avoid the voltage drop when a signal passes through series transistors. The circuits in Figure 9, simulated for the subthreshold current by Smart-Spice, were used to compare the leakage power dissipation in the two delay elements. In Figure 9(a), there is only a gate leakage path and no subthreshold leakage. The two transistors are always turned on. In the two cascaded inverters of Figure 9(b), besides gate leakage, subthreshold paths always exist. Hence, we can treat a transmission-gate delay element as a zero-subthreshold-leakage delay element. The delay of a transmission gate is given by:
where $R_{eq}$ is the equivalent resistance of a CMOS transmission gate, and $C_L$ is the load capacitance. By changing the widths and lengths of the transistors, we can change the delay of the transmission gate. We simulated the circuit of Figure 9(a) for nearly 80 transmission gates with transistors whose dimensions were varied. By subtracting the delay of the circuit in which the transmission gate was replaced by a short, we obtained the delay of the transmission gate. These data were arranged in a look-up table of delays versus transmission gate dimensions. For any required delay between two entries in the look-up table, the size of the transmission gate is determined by interpolation.
This is a deterministic approach in which the inertial delay of a gate is assumed to have a fixed value. However, variations of process parameters, especially in nanometer technologies, can change gate delays and affect the path delay balancing, causing incomplete suppression of glitches. Hu [6, 30] proposes a statistical analysis that treats the gate delays as random variables with normal distributions. The results show that the spread of the power distribution due to process variation can be reduced. Our deterministic MILP model can also be extended to a statistical MILP model to minimize the impact of process variation on the glitch elimination and leakage power [31].
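The interpolation step can be sketched as follows. The table below is hypothetical: it uses a single size parameter (channel length) and made-up delays, whereas the actual look-up table covers about 80 transmission gates with varied widths and lengths. Linear interpolation between neighboring entries is also an assumption.

```python
from bisect import bisect_left

# Hypothetical look-up table: (channel length in um, delay in ps), sorted by delay.
TABLE = [(0.07, 2.0), (0.14, 5.0), (0.28, 12.0), (0.56, 30.0)]


def size_for_delay(target_ps: float) -> float:
    """Return the size giving target_ps by linear interpolation in TABLE."""
    delays = [d for _, d in TABLE]
    if not delays[0] <= target_ps <= delays[-1]:
        raise ValueError("target delay outside the range of the table")
    k = bisect_left(delays, target_ps)   # first entry with delay >= target
    if delays[k] == target_ps:
        return TABLE[k][0]               # exact table hit
    (s0, d0), (s1, d1) = TABLE[k - 1], TABLE[k]
    return s0 + (s1 - s0) * (target_ps - d0) / (d1 - d0)


print(size_for_delay(8.5))   # halfway between the 5 ps and 12 ps entries
```

A required delay outside the table range raises an error here; in practice such a case would call for extending the table or cascading elements, which this sketch does not attempt.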
ILP AND HEURISTIC ALGORITHMS
In the introduction, we mentioned several heuristic algorithms [8, 12, 20, 22, 23] used for dual-$V_{th}$ assignment. Due to their intrinsic limitations, heuristic algorithms normally achieve only a locally optimal solution. In MILP, the objective function and constraints are both linear, ensuring a global optimum. To illustrate the point, we examine the backtrace algorithm [22] as an example to show the advantage of the MILP.
In Figure 10, the XOR gate (gate 1) close to the primary inputs has the largest leakage power reduction if assigned a high threshold. However, in Figure 10(a), the slacks of the non-critical paths are first consumed by gates 6, 3, and 4, which are closer to the primary outputs. Hence, by the time the backtrace arrives at the XOR gate, the slack has already been used up and the gate cannot be assigned high $V_{th}$. In Figure 10(b), MILP considers the leakage reduction and delay increase of each gate simultaneously, making sure that the best candidates (gates with the largest leakage reduction that do not violate the timing constraints) are selected. Due to the global optimization, MILP achieves 26% greater leakage power saving than the heuristic backtrace algorithm. Other heuristic algorithms have similar problems, because the available slack for each gate depends on the search direction or the selected cut [20] in the circuit graph. Thus, a global optimum cannot be guaranteed.

Table III. Leakage reduction alone due to dual-$V_{th}$ assignment (27 °C).
RESULTS
To study the increasingly dominant effect of leakage power, we use the BPTM 70 nm CMOS technology [1]. Low $V_{th}$ for NMOS and PMOS devices are 0.20 V and −0.22 V, respectively. High $V_{th}$ for NMOS and PMOS are 0.32 V and −0.34 V, respectively. We regenerated the netlists of ISCAS'85 benchmark circuits using a cell library in which the maximum gate fanin is 5. Two look-up tables, for gate delays and leakage currents, respectively, of each type of cell were constructed using Spice simulation. A C program parses the netlist and generates the constraint set (see Section 3) for the CPLEX ILP solver in the AMPL software package [13]. CPLEX then gives the optimal $V_{th}$ assignment as well as the value and position of every delay element. The dynamic power is estimated by an event driven logic simulator that incorporates an inertial delay glitch filtering analysis.
Leakage Power Reduction
The results of the leakage power reduction for ISCAS'85 benchmark circuits are shown in Table III. Here the objective of the MILP was set to minimization of leakage alone. All $d_{ij}$ variables were forced to be 0 and constraint (11) was suppressed. The numbers of gates in column 2 are for our gate library and differ from those in the original benchmark netlists. $T_c$ in column 3 is the minimum delay of the critical path when all gates have low $V_{th}$. This was determined by the LP discussed in Subsection 3.3 in the paragraph following Eq. (17). Column 4 shows the total leakage current with all gates assigned low $V_{th}$. Column 5 shows the optimized circuit leakage current with gate $V_{th}$ reassigned according to the MILP optimization. Column 6 shows the leakage reduction (%) for optimization without sacrificing any performance. Column 9 shows the leakage reduction with 25% performance sacrifice.
From Table III, we see that by $V_{th}$ reassignment the leakage current of most benchmark circuits is reduced by more than 60% without any performance sacrifice (column 6). For several large benchmarks leakage is reduced by 90% due to a smaller percentage of gates being on critical paths. However, for some highly symmetrical circuits, which have many critical paths, such as C499 and C1355, the leakage reduction is less. Column 9 shows that the leakage reduction reaches the highest level, around 98%, with some performance sacrifice.
The curves in Figure 11 show the relation between normalized leakage power and normalized critical path delay in a dual-$V_{th}$ process. Unoptimized circuits with all low $V_{th}$ gates are at point (1, 1) and have the largest leakage power and smallest delay. With optimal $V_{th}$ assignment, leakage power can be reduced sharply by 60% (from point (1, 1) to point (1, 0.4)) to 90% (from point (1, 1) to point (1, 0.1)), depending on the circuit, without sacrificing any performance. When normalized $T_{max}$ becomes greater than 1, i.e., we sacrifice some performance, leakage power decreases further but at a slower rate. When the delay increase is more than 30%, the leakage reduction saturates at about 98%. Thus, Figure 11 provides a guide for making tradeoffs between leakage power and performance.
The CPU times shown in columns 7 and 10 of Table III are for the MILP. Although in the worst case the solution time of MILP is exponential in the problem size, the actual complexity depends on the nature of the problem, as noted in the Introduction.

Table IV. Leakage, glitch, and total power reduction for ISCAS'85 benchmark circuits (90 °C).

The leakage current increases with temperature because $V_T$ (thermal voltage, $kT/q$) and $V_{th}$ both depend on the temperature. Our Spice simulation shows that for a 2-input NAND gate with low $V_{th}$, when the temperature increases from 27 °C to 90 °C, the leakage current increases by a factor of 10. For a 2-input NAND gate with high $V_{th}$, this factor is 20.
The leakage in our look-up table is from simulation for operation at 27 °C. To show the dominant effect of the leakage power, we estimate the leakage currents at 90 °C by multiplying the total leakage current obtained from CPLEX [13] by a factor between 10 and 20, as determined by the proportion of low to high threshold transistors.
The dynamic power is estimated by a glitch filtering event driven simulator, and is given by
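One form consistent with the quantities defined next is sketched below. It is an assumption, not a confirmed reproduction: the factor 0.5 (energy per transition of $\frac{1}{2} C V_{dd}^2$) and the division by the total simulated time of 1,000 vector periods of $1.2\,T_c$ each are inferred from the description.

```latex
P_{dyn} = \frac{E_{dyn}}{1000 \times 1.2\, T_c}
        = \frac{0.5\, C_{inv} V_{dd}^2 \sum_i T_i \, FO_i}{1000 \times 1.2\, T_c}
```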
where $C_{inv}$ is the gate capacitance of an inverter, $T_i$ is the number of transitions at the output of gate $i$ when 1,000 random vectors are applied at PIs, and $FO_i$ is the number of fanouts of gate $i$. The vector period is assumed to be 20% greater than the critical path delay, $T_c$. By simulating each gate's number of transitions, we can estimate the glitch power reduction. To demonstrate the projected dominant effect of leakage power in a sub-micron CMOS technology, we compare the leakage power and dynamic power at 90 °C in Table IV. "All low $V_{th}$" means the unoptimized circuit that has all low threshold gates, and "Dual $V_{th}$" means the optimized circuit whose $V_{th}$ has been optimally assigned for minimum leakage. Column 6 gives the glitch power of the optimized design, which is further reduced as shown in column 7 when glitches are eliminated. We observe that for 70 nm BPTM CMOS technology at 90 °C, unoptimized leakage power (column 3) of some large ISCAS'85 benchmark circuits can account for about one half or more of the total power consumption (column 9). With $V_{th}$ reassignment, the optimized leakage power of most benchmark circuits is reduced to less than 10%. With further glitch (dynamic) power reduction, total power reductions for all circuits are more than 50%. Some have a total reduction of up to 70%.

Table V.
Circuit   Gates #   Delay elements #
C432      160       160
C499      182       128
C880      328       303
C1355     214       112
C1908     319       313
C2670     362       330
C3540     1097      1258
C5315     1178      1198
C6288     1189      1307
C7552     1046      845

However, the area overhead due to the inserted delay elements is large. From Table V, we observe that the number of delay elements (Delay elements #) is almost equal to the number of gates (Gates #), except for C1355. If we assume that the average number of transistors in a gate is 4 (e.g., consider a 2-input NAND gate), and each delay element implemented by a transmission gate has 2 transistors, the area overhead will be around 50% due to delay element insertion. The main reason is that our cell library has some complex gates, for example, AOI (AND-OR-INVERT) gates whose fanin number may be up to 5. Some NAND or NOR gates can also have up to 4 inputs. As a result, it is quite likely that more than one delay buffer is inserted for a gate. A possible remedy is to use a simpler and smaller cell library, which we plan to do in our future research.
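The 50% estimate can be checked against the per-circuit counts with a few lines of arithmetic. In the sketch below, the reading of each data row as (circuit, gates, delay elements) is an assumption about the column order of Table V, and transistor count is used as a stand-in for area, as in the text.

```python
# (circuit, gates, delay elements); the column order is an assumption.
TABLE_V = [
    ("C432", 160, 160), ("C499", 182, 128), ("C880", 328, 303),
    ("C1355", 214, 112), ("C1908", 319, 313), ("C2670", 362, 330),
    ("C3540", 1097, 1258), ("C5315", 1178, 1198), ("C6288", 1189, 1307),
    ("C7552", 1046, 845),
]
TRANSISTORS_PER_GATE = 4           # average assumed in the text
TRANSISTORS_PER_DELAY_ELEMENT = 2  # one CMOS transmission gate


def area_overhead(gates: int, delay_elements: int) -> float:
    """Added transistors divided by original transistors."""
    return (delay_elements * TRANSISTORS_PER_DELAY_ELEMENT) / (gates * TRANSISTORS_PER_GATE)


total_gates = sum(g for _, g, _ in TABLE_V)
total_delay_elements = sum(d for _, _, d in TABLE_V)
overall = area_overhead(total_gates, total_delay_elements)

for name, g, d in TABLE_V:
    print(f"{name}: {area_overhead(g, d):.0%}")
print(f"all circuits: {overall:.1%}")
```

Summed over all ten circuits, the overhead comes to about 49% under these assumptions, in line with the estimate of around 50%.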
CONCLUSION
A new technique to reduce the leakage and dynamic glitch power simultaneously in a dual-$V_{th}$ process is proposed in this paper. A mixed integer linear programming (MILP) model is generated from the circuit netlist, and the AMPL CPLEX [13] solver determines the optimal $V_{th}$ assignments for leakage power minimization and the delays and positions for inserting delay elements for glitch power reduction. Experimental results for ISCAS'85 benchmarks show reductions of 20%-96% in leakage, 28%-76% in dynamic (glitch), and 27%-76% in total power. We believe some of the other techniques, such as gate sizing and dual power supply, can also be incorporated in the MILP formulation. Ongoing work incorporating process variation in this power reduction technique will be the topic of a future publication [31]. The transmission gate delay elements avoid the comparatively larger capacitive dissipation and subthreshold leakage inherent in the alternative two-inverter type of delay element. However, the gate leakage of the transmission gate delay element could become a concern and will require further investigation.
