Glitches are an important source of power dissipation in static CMOS ICs and can contribute as much as 70% of total power dissipation in certain cases (e.g., arithmetic modules). Although research into various aspects of glitch power dissipation has been undertaken in the past, most approaches to addressing it are ad hoc and limited in their applicability. In this paper, we propose a new framework, gate triggering, for systematically minimizing glitch power dissipation in static CMOS ICs. The framework is based on the idea that glitches can be effectively minimized by triggering logic evaluation at a gate only when all of its inputs have stabilized. For this purpose, a small amount of control logic is added to every potentially glitchy gate; when enabled, it triggers logic evaluation at the gate. A clocked delay chain is employed to generate enable signals at the proper times for all gates to be triggered. We present an integer linear programming (ILP) formulation to minimize the overheads (viz. delay element, control logic, and extra wiring) of our approach subject to a critical-path delay constraint. Application of the new approach to test circuits (such as a ripple carry adder and an array multiplier) in 1.2µ technology yields 95% or more elimination of glitch power dissipation with negligible area and timing overheads after optimization. An added advantage of the approach is that short-circuit power dissipation at all triggered gates is also minimized; short-circuit power dissipation in current standard-cell based designs can exceed 50% of the total power dissipation.
Introduction
Increasing levels of device integration, die size, and operating frequency, combined with a burgeoning portable computing and communications market, have made power dissipation a major concern in VLSI design [12, 13]. Among VLSI technologies, digital CMOS is by far the dominant one and consumes relatively less power. Complementary static CMOS is a popular digital CMOS logic style that is being increasingly employed because of its robustness, which lends itself to design automation, and because of its amenability to voltage scaling for low power [13]. In this logic style, power is primarily dissipated during logic transitions when gate load capacitances charge and discharge.
While some logic transitions are necessary and are dictated by circuit functionality, others, such as glitches, are not.
Glitches are spurious transitions that occur before a gate output reaches a stable value and are caused by unequal propagation delays of input signals to the gate. This is depicted in Fig. 1(c) for a synchronous sequential circuit, a popular and structured design style [11]. Also evident in Fig. 1(c) is that glitches multiply as they propagate through the combinational logic block. Glitch power is typically significant and can be as high as 70% of total power dissipation in some cases [12]. As we go into deeper submicron technologies, interconnect delays become more dominant, which leads to differential delays and more glitching [16].
Glitch minimization is important not only for low power but also for other reasons. Power estimation is a difficult problem because glitch power dissipation is significant and is hard to estimate accurately [11, 12]. Thus minimizing it will improve the accuracy of power estimation. Asynchronous systems need to be glitch-free to operate correctly [17], as does the clock-generation circuitry in a synchronous system [4]. Finally, glitches are also important to minimize in high-performance digital-to-analog converters [19].
Short-circuit power dissipation, which occurs because of the DC current that flows from VDD to VSS during switching when both N pull-down and P pull-up conduct simultaneously, is another source of concern. This is especially
0-7803-5766-3/00/$10.00 ©2000 IEEE
Control logic comprising n and p control transistors connected to VSS and VDD, respectively, is enabled when the last input to the gate arrives. This prevents unnecessary charging and discharging to VDD and VSS, respectively, of output capacitance and internal capacitances in P pull-up and N pull-down and also prevents short-circuit current during the time when input signals are unsteady. Note that for this particular control logic, two delay chains are needed, one for the p and the other for the n control transistor.
Conclusions are in Sec. 6.
Proposed Methodology: Gate Triggering
The key idea we employ to minimize glitches is to trigger logic evaluation at a gate only after all of its inputs have stabilized. For this purpose, to every potentially glitchy gate, we add a small amount of control logic, which, when enabled, triggers logic evaluation at the gate. To determine the triggering times (i.e., when the last input to the individual gates has stabilized), we first perform a timing simulation of the combinational block. Timing simulation is an essential step in the design flow of a VLSI chip [18] (e.g., to determine the critical-path delay in a combinational block, which in turn determines the clock period). Hence it does not represent an extra step in the application of our method. From this timing simulation, we obtain the delays of different gates and also the latest times by which the various inputs of a gate will have stabilized. For instance, in Fig. 1(d), gate g01 has a delay of five time units, and its top, middle, and bottom inputs will have steady-state values by time five, four, and two, respectively, at the latest. In order to prevent glitches at the output of gate g01, its control logic can be enabled at time five and should remain enabled for at least five time units, which is the delay of the gate, so that the gate logic may evaluate completely.
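The timing analysis described above can be illustrated with a minimal sketch. It assumes a simplified model that is not part of the original method description: each signal has a single stabilization time, a gate is flagged as potentially glitchy whenever its input arrival times differ, and its output stabilizes one gate delay after its latest input. The function name `analyze_arrivals`, the gate `g_next`, and the primary input names are hypothetical; only the numbers for g01 (delay five, inputs stable at five, four, and two) come from the text.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def analyze_arrivals(fanin, delay, primary_arrival):
    """Propagate stabilization times through a combinational netlist.

    fanin: gate -> list of driving signals (primary inputs or other gates)
    delay: gate -> gate delay in time units
    primary_arrival: primary input -> time at which it is stable
    Returns (report, stable), where report[gate] says whether the gate is
    potentially glitchy and when it could be triggered.
    """
    # Only gate-to-gate edges matter for the evaluation order.
    gate_deps = {g: [s for s in ins if s in fanin] for g, ins in fanin.items()}
    stable = dict(primary_arrival)
    report = {}
    for gate in TopologicalSorter(gate_deps).static_order():
        arrivals = [stable[s] for s in fanin[gate]]
        latest = max(arrivals)
        report[gate] = {
            "glitchy": min(arrivals) < latest,  # unequal input arrival times
            "trigger_at": latest,               # enable once the last input is stable
        }
        stable[gate] = latest + delay[gate]     # simplified output stabilization time
    return report, stable


# g01 as in the text; g_next and input d are hypothetical.
fanin = {"g01": ["a", "b", "c"], "g_next": ["g01", "d"]}
delay = {"g01": 5, "g_next": 3}
primary_arrival = {"a": 5, "b": 4, "c": 2, "d": 10}
report, stable = analyze_arrivals(fanin, delay, primary_arrival)
print(report["g01"], stable["g01"])
```

Under these assumptions, g01 is flagged as potentially glitchy with a triggering time of five and an output that is stable at time ten, while the hypothetical gate `g_next`, whose inputs both arrive at time ten, needs no control logic.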
Therefore we use an enable signal for the combinational block with a high period equal to the maximum delay of any gate in the block (five units for the combinational block of Fig. 1(d)). As shown in Figs. 1(a) and (d), the enable signal is generated by ANDing the clock complement with the clock signal delayed by this maximum delay. This initial enable signal is then delayed by various amounts using a delay chain comprising delay elements, as in Fig. 1(d).
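The generation of the enable signal can be checked with a small discrete-time sketch. It assumes an ideal clock with a 50% duty cycle, integer time units, ideal delay elements, and a maximum gate delay no larger than half the clock period; the function names and the clock period of 20 units are hypothetical.

```python
def clock(t, period):
    """Ideal clock: high during the first half of each period."""
    return 1 if (t % period) < period // 2 else 0


def enable(t, period, max_delay):
    """Initial enable: clock complement ANDed with the clock delayed by max_delay."""
    return (1 - clock(t, period)) & clock(t - max_delay, period)


def delayed_enable(t, period, max_delay, d):
    """Enable signal as seen at a delay-chain tap that adds d time units."""
    return enable(t - d, period, max_delay)


period, max_delay = 20, 5
high = [t for t in range(period) if enable(t, period, max_delay)]
print(high)
```

In this model the initial enable signal is high for exactly `max_delay` time units in each clock period, and each tap of the delay chain shifts that pulse later by its cumulative delay.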
The output of a delay element in this chain provides an appropriately delayed version of the initial enable signal that can be used to trigger one or more gates. For example, in Fig. 1(d), gates g01 and g11, for both of which the last input stabilizes by time five, are controlled by the initial enable signal delayed by five time units. Similarly, gate g32 is enabled at time nine, since that is when its last input (the bottom input) stabilizes. In contrast, gate g21 does not need any control logic or enable signal since both of its inputs have equal delays.
Using the above approach, all potentially glitchy gates are triggered by the enable signal when the last input to them stabilizes, thus ideally preventing all glitches in the combinational block. In practice, however, minor or partial glitches may occur due to the nonideal behavior of transistors. It should also be noted that short-circuit power dissipation in all triggered gates can be minimized by triggering them after the last input has stabilized, since before triggering, the gate's connection to VDD and/or VSS is cut off by the control logic. However, in most cases, it will not be cost-effective to control all gates in this manner to minimize short-circuit power dissipation because of the overheads it will entail. Rather, it will be best to control a few select gates where the potential for glitch and short-circuit power savings is maximum.
The main overheads of our approach are logic (delay element and control logic) overhead, wiring overhead for generating and routing the enable signal for potentially glitchy gates, and a delay overhead because of an increased delay for the combinational block. The logic overhead for generating the initial enable signal using an AND gate and a delay element is minimal. We have observed in our simulations that the delay overhead is negligible. Note that reducing the number of delay elements or the amount of control logic should lead to lower wiring overhead, since each delay element corresponds to a distinct enable signal to be routed and each control logic block requires an enable signal to be routed to it. In the next section, we provide some ways by which logic overhead, and thus wiring overhead, can be minimized.
Logic and Wiring Overhead Optimization

Delay Element Optimization
The delay element overhead depends primarily upon the total delay provided by all delay elements and the number of delay elements. The number of delay elements in turn depends upon the number of delayed versions of the enable signal needed from the delay chain. Therefore, the number of delay elements can be reduced by synchronously triggering with a common enable signal as many gates as possible after their last inputs have stabilized. For example, in Fig. 1(e), the set of gates g01 and g11, the set of gates g02, g12, g22, and g32, and the set of gates g03, g13, and g23 are all triggered synchronously by enable signals delayed by five, ten, and fourteen time units, respectively. Note that to synchronize, some gates are triggered later than normal (such as g32 and g13 in Fig. 1(e) compared to Fig. 1(d)). Also, note that the gates selected for late triggering are not on any critical path (shown with bold lines in Fig. 1(e)) so as not to increase the overall delay of the combinational block. The application of this optimization thus results in a much reduced number of delay elements in the delay chain (three in Fig. 1(e) compared to five in Fig. 1(d)). Note that a smaller number of delay elements also means a smaller wiring overhead, since fewer distinct enable signals need to be routed. The delay element we chose is a transmission gate with appropriately sized transistors to provide the required amount of delay. We selected a transmission gate because it requires less area and consumes very little power. A detailed discussion and comparison of delay elements motivating our choice is the subject of another paper [9].
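The grouping of gates under common enable signals can be sketched as an interval-covering problem. This is an illustrative simplification and not the ILP used in the paper: each gate is assumed to have a fixed window of allowed triggering times (from its latest input arrival time to the latest time that does not lengthen any critical path), and the effect of late triggering on downstream arrival times is ignored. The greedy rule below (sort by window end, reuse the last chosen time when it falls inside the window) is the standard minimum-point cover for intervals. The windows are hypothetical, except that g32 may be triggered no earlier than time nine, as stated in the text.

```python
def group_triggers(windows):
    """Choose a small set of triggering times that covers every gate window.

    windows: gate -> (earliest, latest) allowed triggering time
    Returns (chosen_times, assignment), with assignment[gate] in its window.
    """
    chosen = []
    assignment = {}
    # Process gates by increasing window end (sorted() keeps ties in input order).
    for gate, (lo, hi) in sorted(windows.items(), key=lambda kv: kv[1][1]):
        if chosen and chosen[-1] >= lo:
            # The most recently chosen time already lies inside this window.
            assignment[gate] = chosen[-1]
        else:
            # Open a new enable signal as late as this window allows.
            chosen.append(hi)
            assignment[gate] = hi
    return chosen, assignment


# Hypothetical windows; critical-path gates have zero-width windows.
windows = {
    "g01": (5, 5), "g11": (5, 5),
    "g02": (10, 10), "g32": (9, 10),
    "g13": (12, 14), "g03": (14, 14),
}
chosen, assignment = group_triggers(windows)
print(chosen, assignment["g32"], assignment["g13"])
```

With these assumed windows, three triggering times suffice, and g32 and g13 are triggered later than their earliest possible times, which mirrors the situation described for Fig. 1(e).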
Control Logic Optimization
There are two ways in which control logic may be optimized. First, after applying the technique to reduce the number of delay elements discussed above, we can use the same control logic to control all gates that are to be triggered simultaneously. For instance, in Fig. 1(e), the set of gates g01 and g11 can be controlled by the same control logic as shown in Fig. 1(f). However, it should be noted that sharing the control logic in this way may mean that the transistors of the control logic will need to be sized up (compared to when no sharing is done) to avoid an increase in the delay of the combinational block.
Another way to reduce the amount of control logic is to schedule the triggering of earlier gates so that inputs to later gates are synchronized. For instance, in Fig. 1(g), gates g11 and g02 are triggered later than normal and control logic is added to gate g21 (compare Fig. 1(g) to Fig. 1(f)) so that all inputs to gates g12, g22, g03, and g23 are synchronized, thereby obviating the need for controlling these gates using control logic. Note again that the gates selected for late triggering are those not on any critical path in order not to increase the delay of the combinational block. This results in less control logic, and possibly less delay element and wiring overhead, since some enable signals no longer need to be generated and routed.
The various optimizations discussed above are not exclusive, but may be used in conjunction to various degrees depending upon the combinational logic block under consideration to minimize the total overhead. In the next section, we formulate this overhead minimization problem as an integer linear program (ILP) subject to a critical-path delay constraint.
ILP Formulation for Overhead Minimization
The overhead minimization problem can be stated as follows:
Problem 1 Minimize a weighted sum of the total amount of delay and the number of delay elements in the delay chain, and the number of gates triggered (which corresponds to the amount of control logic and wiring required), such that: (1) there are no glitches, and (2) the critical-path delay of the circuit does not exceed a specified upper bound.
Clearly, the total amount of delay corresponds to the latest triggering time over all gates, while the number of delay elements corresponds to the number of distinct gate triggering times (see Fig. 1). No glitching requires that every gate with asynchronous inputs must be triggered no earlier than the latest input arrival time for that gate; obviously, a gate with synchronous inputs will not glitch and hence should not be triggered. The problem of glitch minimization, in which the amount of glitching is part of the objective to be minimized, as opposed to the glitch elimination being considered here, seems to be more difficult and will be considered in future research. The second constraint in Problem 1 implies the following theorem.
Theorem 1 There exists a finite set T_u of triggering time instants for every gate u in the circuit such that the optimal solution to Problem 1 is not affected by restricting u's triggering time to T_u.
Proof Outline: The latest input arrival time for a gate in the original circuit (before applying the glitch-minimization technique) and the upper bound on critical-path delay set lower and upper bounds, respectively, on the triggering time of the gate. Triggering a gate later than the lower-bound time, rather than triggering it at the lower-bound time, can lead to lower overhead only if it satisfies one of the following two conditions. (1) The gate triggering time synchronizes with the triggering time of at least one other gate, so that a common control signal can be used for both gates, thereby saving a delay element (see Fig. 1(f)). (2) The gate triggering time is such that the gate output synchronizes with the arrival of other inputs to some fan-out gate, thereby saving a control element at the fan-out gate, which does not need to be controlled (see Fig. 1(g)). There is only a finite set of triggering time instants that will lead to one of these two synchronizations.
Space constraints do not permit us to specify and prove what the exact finite set of triggering times implied by Theorem 1 is for a gate. The constants, variables, expressions, objective (corresponding to Problem 1), and constraints for the ILP and their descriptions are given for easy reference in Table 1. Only two points need to be explained, and after that the rest of the ILP is easily understood by inspection of the detailed descriptions in the table. The second point to be understood is the correspondence between the two principal constraints of Problem 1 (i.e., no glitches and bounded critical-path delay increase) and the constraints of the table. Constraints R1 through R7 ensure that a gate is triggered when its inputs are asynchronous.
Constraints R8 and R9 ensure that whenever a gate is triggered, its triggering time is no earlier than the latest input arrival time for the gate, so that all glitches are eliminated. We note that the upper bound on the triggering time for every gate automatically enforces the constraint on the increase in the critical-path delay. The objective function in the table directly corresponds to the objective in Problem 1.
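To make the objective of Problem 1 concrete, the following sketch evaluates the weighted overhead of a given assignment of triggering times. It is only an evaluator, not the ILP itself; the function name `overhead`, the unit weights, and the two example schedules are hypothetical. As stated with Problem 1, the total delay is taken as the latest triggering time and the number of delay elements as the number of distinct triggering times.

```python
def overhead(trigger, w_delay=1.0, w_elements=1.0, w_gates=1.0):
    """Weighted overhead of a schedule, given as triggered gate -> triggering time."""
    if not trigger:
        return 0.0
    times = list(trigger.values())
    total_delay = max(times)        # latest triggering time over all gates
    num_elements = len(set(times))  # one delay element per distinct triggering time
    num_gates = len(trigger)        # one control logic block per triggered gate
    return w_delay * total_delay + w_elements * num_elements + w_gates * num_gates


# Hypothetical schedules: each gate at its earliest time vs. synchronized groups.
earliest = {"g01": 5, "g11": 5, "g32": 9, "g02": 10, "g13": 12, "g03": 14}
grouped = {"g01": 5, "g11": 5, "g32": 10, "g02": 10, "g13": 14, "g03": 14}
print(overhead(earliest), overhead(grouped))
```

With unit weights, the grouped schedule costs less because it needs three distinct triggering times instead of five, while the latest triggering time and the number of triggered gates are unchanged.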
Related Work
Glitch and short-circuit power dissipation are discussed in [12, 13]. Glitch estimation, modeling, and propagation issues are covered in [3, 11, 16]. The importance of glitch minimization for various applications is considered in [4, 17, 19]. Designing two-level glitch-free circuits using logic redesign, assuming only one input changes at a time, is addressed in [6]. Glitch removal through path balancing obtained via, say, transistor sizing or layout changes, is discussed in [10, 12, 13]; this can be cumbersome and involves trial and error. Furthermore, in deep submicron technologies, transistor sizing will not be very effective for path balancing since logic delays become relatively smaller compared to interconnect delays. Retiming and buffer placement approaches to filter or reduce glitches and glitch propagation are described in [2, 7]; these approaches, although somewhat effective, entail appreciable area overheads for flip-flops and buffers. Glitch reduction at the RTL level in control-flow-intensive designs is given in [15].
Therefore, current methods for glitch reduction (i) are not applicable in all contexts, or (ii) cannot be automated and are ad hoc, or (iii) are not very effective, or (iv) have high area/delay overheads, or (v) restrict the manner in which logic is transformed to a gate realization. There is no methodical, generally applicable approach to minimizing glitch power dissipation. Our proposed gate triggering approach attempts to overcome all the above limitations of current approaches.
Conclusions
Although research into various aspects of glitch power dissipation has been undertaken in the past, most approaches to addressing it are ad hoc and limited in their applicability. This paper presented a new framework called gate triggering for systematically minimizing glitch power dissipation in static CMOS ICs. The logic and wiring overheads of our approach were analyzed, and an ILP formulation was given to minimize these overheads subject to a critical-path delay constraint. Application of the new approach to test circuits (such as a ripple carry adder and an array multiplier) yields 95% or more elimination of glitch power dissipation and, in the gates to which it is applied, of short-circuit power dissipation, with very little to negligible area and timing overheads after optimization.
times for gate u.
EIO(U) 5 El(%) + M . J 3 7 (~)
M is a sufficiently large number, u ∈ V
R12
... arrival times are equal), the time cannot be earlier than the sum of its earliest input arrival time and its gate delay. If gate u is not triggered (i.e., all its inputs are synchronous or its earliest and latest input arrival times are equal), the time at which its output stabilizes cannot be any later than the sum of its latest input arrival time and its gate delay.
Triggering time T_i is not chosen (i.e., A_i = 0) if no gate is triggered at time T_i.
The number of latest triggering times can be no greater than 1.
The latest triggering time can only be chosen from one of the
Table 1: ILP formulation for minimizing delay element, control logic, and wiring overheads when applying the glitch-minimization technique of Sec. 2 to a combinational logic block so that no glitches occur and the increase in critical-path delay of the block does not exceed a specified upper bound.
