The time taken for a CMOS logic gate output to change after one or more inputs have changed is called the delay of the gate. A conventional multi-input CMOS gate is designed to have the same input-to-output delay irrespective of which input caused the output to change. A gate that can offer different delays for different input-output paths through it is known as a variable input delay (VID) gate, and the maximum difference in delays of any two paths through the gate is known as $u_b$. VID gates have a known application in minimizing the active power of a digital CMOS circuit. A previous publication has proposed three different designs for implementing VID gates. In this paper, we describe transistor sizing methods to implement the three types of VID gates for any specified delay requirement. We also describe techniques for calculating $u_b$ for each type of gate design. We outline an algorithm for efficiently determining the transistor sizes of a gate for given delays and output load capacitance. The algorithm is a two-step approach, with a look-up table of sizes in the first stage and a sensitivity-based steepest descent method in the second stage. We also give a brief introduction to the power saving potential of maximizing $u_b$ when used in conjunction with the previously published technique.
INTRODUCTION
In this section, we describe the prior work and motivation for this work. We then discuss the new sizing procedures and algorithms in the following sections.
Prior Work
Dynamic power consumed in the normal operation of a circuit consists of the essential power and glitch power. Glitches are spurious transitions caused by imbalances in the arrival times of signals at the inputs of gates. Techniques such as delay balancing, hazard filtering, transistor sizing, gate sizing, and linear programming have been proposed for eliminating glitches [1-7, 11-14, 19-21, 23, 25]. For further references the reader is directed to recent books and articles [8-10, 15, 16, 18, 24, 27]. Our focus in this paper is on a recent technique known as variable input delay logic [19, 22, 23]. Raja et al. have described a technique for reducing glitches using special gates known as variable input delay (VID) gates, where the delay through any input-output path can be manipulated, up to a certain limit, without affecting the delays of the other paths. This limit is known as the differential delay upper bound, or $u_b$. The parameter $u_b$ depends on the technology in which the circuit is implemented and is needed for finding an optimal implementable solution from the linear program.
Motivation
Raja et al. describe three new ways of implementing the VID gate, viz., capacitance manipulation, nMOS transistor insertion, and CMOS transistor insertion [19, 22, 23]. Any of these gate designs can be used for efficient manipulation of input delay without altering the output delay of the gate. However, the published literature has the following shortcomings.
Transistor Sizing of Logic Gates to Maximize Input Delay Variability

Raja et al.
Fig. 1. The RC components along the charging path ($R_{on}$, $C_r$, $C_p$, $C_{in}$).
• What is the algorithm for finding the right sizes and what are the trade-offs?
We answer these questions in this paper.
RC Delay of a Gate
Gate delay is the time taken for the signal at the output of the gate to reach 50% of $V_{dd}$ after the signal at the input of the gate has reached 50% of $V_{dd}$ [17, 26]. Consider the path shown in Figure 1. The delay of a gate is a function of the on resistance $R_{on}$ (ignoring saturation effects) and the load capacitance $C_L$. The load capacitance is given by:
$$C_L = C_p + C_r + C_{in} \qquad (1)$$
where $C_p$ is the parasitic capacitance due to the "on" transistor, $C_r$ is the routing capacitance of the path, and $C_{in}$ is the input capacitance of the fanout transistors. $C_{in}$ is the major component of $C_L$. $C_r$ and $C_p$ are not considered controllable and hence, we ignore them in the current discussion. The delay of the path during a signal transition is given by:
$$\text{Delay} = R_{on} \times C_L \qquad (2)$$
The delay can be manipulated by changing $C_L$, or by changing $R_{on}$ through transistor sizing. This alters the gate delay along all paths equally and is called conventional gate sizing. For VID logic, we describe the gate delay as the sum of an output delay and an input delay of the fanin signal. The output delay is the common delay component of the gate no matter which input has caused the transition. The input delay for an input of a gate is the delay of the input-output path through that input. The input and output delays should be independent of each other. Clearly, conventional gate sizing cannot be used for designing a VID gate. In this paper, we describe the variable input delay gate sizing for the VID gates proposed by Raja et al. [19, 22, 23].
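As a rough numerical illustration of the first-order model above, the following Python sketch sums the three capacitance components and multiplies by the on resistance. It assumes the simple relation delay = $R_{on} \times C_L$ with no threshold correction factor; the function names and all component values are hypothetical and chosen only for illustration.

```python
def load_capacitance(c_p, c_r, c_in):
    """Total load capacitance in farads: parasitic + routing + fanout input."""
    return c_p + c_r + c_in


def rc_delay(r_on, c_load):
    """First-order RC delay in seconds (assumed model, no correction factor)."""
    return r_on * c_load


# Hypothetical values, for illustration only.
R_ON = 10e3    # on resistance of the driving path, ohms
C_P = 1e-15    # parasitic capacitance, farads
C_R = 2e-15    # routing capacitance, farads
C_IN = 5e-15   # input capacitance of the fanout transistors, farads

c_load = load_capacitance(C_P, C_R, C_IN)
delay = rc_delay(R_ON, c_load)

# Conventional sizing: halving R_on scales the delay of every path by the
# same factor, so it cannot create a difference between input paths.
delay_after_sizing = rc_delay(R_ON / 2, c_load)
```

In this model, resizing the driver changes a single number that is shared by every input-output path, which is why conventional sizing alone cannot produce a VID gate.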
GATE DESIGN BY INPUT CAPACITANCE MANIPULATION
The overall gate delay is given by Eq. (2). In the new gate design we need to manipulate the input delay of the gate without affecting the output delay too much. Substituting Eq. (1) into Eq. (2), we get:
$$\text{Delay} = R_{on}(C_p + C_r) + R_{on} C_{in} \qquad (3)$$
where $R_{on}$ is the ON resistance of the fanin gate. From the above analysis we separate the input and output delays of the gate. The output delay depends on $C_p$ and $C_r$, which are unalterable. The input delay is a function of $R_{on}$ and $C_{in}$ of the transistor pair. Thus, the delay of an input of a CMOS gate can be changed by adjusting the $C_{in}$ offered by the transistor pair that the input feeds into. Note that this does not alter the input delays of the other inputs of the gate (this is not always true as shown in Section 2.2).
Calculation of $u_b$
The delay of the transistor pair can be calculated by using Eq. (3). The input capacitance of a transistor pair is given by:
$$C_{in} = W \times L \times C_{ox} \qquad (4)$$
where $W$ is the transistor width, $L$ is the transistor length, and $C_{ox}$ is the oxide capacitance per unit area, which is technology and process dependent. The range of manipulation for $C_{in}$ is limited by the range of $W$ and $L$ allowed for the transistors. The range of dimensions for digital design, in any technology, is governed by second-order effects, such as channel length modulation, threshold voltage variation, standard cell height, etc. [16, 26]. We have chosen the limit of the transistor length for 0.25 μm technology as 3 μm, which is determined by the standard cell height. The minimum gate length in the same technology is 0.3 μm.
Hence, the maximum difference in input capacitance is $2.7 \times C_{ox}$. The maximum differential delay $d_{diff}$ and the minimum differential delay $d_{min}$ obtainable in the technology can thus be:
Thus, the gate differential delay upper bound $u_b$ is given by:
Thus, the $u_b$ of the technology can be calculated by using the bounds on the dimensions of the transistors in the particular technology. There are several design issues in this gate design as described below.
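The capacitance range discussed above can be checked with a few lines of Python. This is only a sketch: it assumes the gate-area model $C_{in} = W \times L \times C_{ox}$ and a first-order delay of $R_{on} \times C_{in}$, and the unit width, the oxide capacitance value, and the fanin resistance are hypothetical numbers, not values from this paper.

```python
def input_capacitance(width_um, length_um, c_ox):
    """Gate input capacitance (farads) for dimensions in micrometres and
    c_ox in farads per square micrometre (assumed area model)."""
    return width_um * length_um * c_ox


def capacitance_range(width_um, l_min_um, l_max_um, c_ox):
    """Largest change in input capacitance obtainable by varying only L."""
    return (input_capacitance(width_um, l_max_um, c_ox)
            - input_capacitance(width_um, l_min_um, c_ox))


def differential_delay(r_on, delta_c):
    """First-order differential input delay for a capacitance change."""
    return r_on * delta_c


L_MIN_UM = 0.3   # minimum gate length quoted in the text
L_MAX_UM = 3.0   # maximum length quoted in the text (standard cell height)
WIDTH_UM = 1.0   # hypothetical unit width
C_OX = 6e-15     # hypothetical oxide capacitance, F per square micrometre
R_ON = 10e3      # hypothetical ON resistance of the fanin gate, ohms

delta_c = capacitance_range(WIDTH_UM, L_MIN_UM, L_MAX_UM, C_OX)  # 2.7 * C_ox
max_diff_delay = differential_delay(R_ON, delta_c)
```

With a unit width, the capacitance range evaluates to 2.7 times the oxide capacitance, matching the figure quoted in the text; the resulting delay number is only as meaningful as the assumed resistance.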
Design Issues
The gate design proposed in the previous section has several drawbacks.
• In this gate design, the output and input delays are not independent for both falling and rising transitions. For example, the NAND gate consists of two pMOS transistors in parallel and two nMOS transistors in series. The gate has different rising delays along the two inputs if the pMOS transistors are sized differently. But the same is not true for a falling delay. Altering the size of one of the nMOS transistors affects the $R_{on}$ of the output discharging path and, thus, the output delay. This dependency makes the sizing for a given delay a non-linear problem, whose convergence to a solution may be difficult.
• The parasitic capacitance $C_p$ is assumed to be constant and independent of the transistor sizes. But, in reality, $C_p$ is a function of the transistor sizes. Altering the size of one transistor can affect $C_p$ and the output gate delay.
• When the transistors are connected in series to one another, some of them are ON and some are OFF. This causes the threshold voltages of the transistors to change drastically due to body effect [17, 26]. This makes the output delay of the gate input-pattern dependent. This is a problem because conventional design methods require a single delay for every gate output [20, 21].
GATE DESIGN WITH nMOS PASS TRANSISTORS
In the design proposed in Section 2, the main problem was the inter-dependence of output and input delays. In this second design, we propose to leave the input capacitance unaltered, and increase the resistance of the path.
Effects of Increasing Resistance and Input Slope
Consider the charging path shown in Figure 1. Energy is drawn from the supply to charge $C_L$ through $R_{on}$. The energy consumed by a signal transition is given by $0.5\, C_L V_{dd}^2$, where $C_L$ is the load capacitance and $V_{dd}$ is the supply voltage. Note that the energy expression does not include the resistance $R_{on}$. The resistance governs the switching time but the overall energy per transition remains the same. Hence, increasing the resistance of the path does not alter the energy consumed per transition. Increasing resistance degrades the slew of the input waveform. This increase in input slope affects gate delay and needs to be accounted for:
$$\text{Gate Delay} = t_{step} + t_{slew}$$
where $t_{step}$ is the gate delay when the input is a step waveform and $t_{slew}$ is the gate delay due to the input slope or slew. Thus, by increasing $R_{on}$ we manipulate the $t_{slew}$ part of the gate delay. But increasing the input slew decreases the robustness and noise immunity of the circuit [16]. A large input slope means that the circuit is in transition for a longer period of time and is more susceptible to noise and short-circuit power. The input slope is restored or improved by using regenerative gates. CMOS logic gates are regenerative, as they improve the slope of the waveform while passing the signal transition from the input to the output. In our new VID gate design, which inserts resistance, we use this regenerative property of the CMOS gates at the output for restoring the slope. However, the slope restoration also has limits and hence, there is a practical limit to degrading the input slope. This is one of the major factors that influence the practical value of $u_b$ for a given technology.
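The claim that the energy per transition does not depend on the path resistance can be checked numerically. The sketch below assumes an ideal step supply charging a linear capacitor through a linear resistor, and integrates the power dissipated in the resistor with a midpoint rule; the function names and the component values are hypothetical.

```python
import math


def dissipated_energy(r_on, c_load, v_dd, steps=100_000, horizon_tau=20.0):
    """Numerically integrate i(t)^2 * R while an ideal RC path charges.

    The charging current of an ideal RC circuit driven by a step of v_dd is
    i(t) = (v_dd / r_on) * exp(-t / (r_on * c_load)).
    """
    tau = r_on * c_load
    dt = horizon_tau * tau / steps
    energy = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                      # midpoint of the k-th interval
        current = (v_dd / r_on) * math.exp(-t / tau)
        energy += current * current * r_on * dt
    return energy


def time_to_half_vdd(r_on, c_load):
    """Time for the ideal RC output to reach 50% of the supply voltage."""
    return math.log(2.0) * r_on * c_load


V_DD = 2.5          # hypothetical supply voltage, volts
C_LOAD = 10e-15     # hypothetical load capacitance, farads
RESISTANCES = (1e3, 1e4, 1e5)   # hypothetical path resistances, ohms

expected = 0.5 * C_LOAD * V_DD ** 2
energies = [dissipated_energy(r, C_LOAD, V_DD) for r in RESISTANCES]
half_times = [time_to_half_vdd(r, C_LOAD) for r in RESISTANCES]
```

Under these idealized assumptions every entry of `energies` agrees with `expected`, while `half_times` grows in proportion to the resistance: the resistance stretches the transition without changing its energy.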
Proposed Gate Design
We insert a single nMOS transistor that is always ON, with resistance $R_s$, in the series charging path. A modified NAND gate is shown in Figure 2. The delays of the gate along both I/O paths are given by:
Thus, the input and output delays are separated completely from each other. The output delay can be controlled by sizing the gate transistors and the input delay can be controlled through $R_s$. Delay $d_{2 \to 3}$ is not affected by altering $d_{1 \to 3}$. This concept can be extended to an n-input gate. The differential delay of path x with respect to the other n − 1 paths can be controlled by inserting n − 1 transistors in series with the inputs. These paths can be independently controlled by sizing the n − 1 transistors. Thus, we have a VID gate design that is extendible to all multi-input gate types.
Calculation of $u_b$
As seen from Eq. (8), the input delay can be controlled independently by altering the size of the nMOS transistor. The nMOS transistor passes logic 0 effectively but degrades the signal when passing logic 1. Let us assume that there is a degradation of voltage when logic 1 is passed through the transistor [17, 26]. When the transistor is acting as a resistor, there is also an IR voltage drop across the capacitor. The drop can be significant for two reasons:
• If the drop is too large, then the transistors in the fanout will not switch OFF completely. This increases short circuit dissipation of the fanout gate.
• The leakage power of the transistors is a function of the gate-to-source voltage ($V_{gs}$). Hence, a larger drop would increase the leakage current of the fanout gate.
The circuit in Figure 3(a) shows a single transistor pair at the output of the nMOS. The operating regions for the transistors are as shown. The critical condition in this configuration is the pMOS transistor remaining in cutoff. If this condition is not met, the pMOS transistor is also ON and, hence, there is a direct path from the supply to the ground. This increases the short circuit dissipation. To meet the condition, we need to make sure that $V_g > V_{dd} - V_{tp}$, where $V_{tp}$ is the threshold voltage of the pMOS transistor. There are two factors that control the input voltage $V_g$ in this case: (1) $I_{ds} R_s$, where $I_{ds}$ is the drain-to-source stand-by current through the series transistor, and (2) the signal degradation of the pass transistor.
Consider the input configuration in Figure 3(b). The nMOS transistor passes a logic 0 without any degradation. The critical condition here is the nMOS transistor in cutoff. By using a similar analysis as above, the condition is given by:
Equations (9) and (10) give an upper bound on $R_s$. This limits the amount of resistance that can be added to the charging path. Thus, the amount of input delay that can be added is also limited by this condition.
$$u_b = R_{max} \times C_L \qquad (11)$$
where $R_{max}$ is the maximum resistance that can be added and $C_L$ is the load capacitance of the gate. This is the theoretical limit of $u_b$, but the practical limit is governed by signal integrity issues as explained in Section 3.1.
Design Issues
This new VID gate design, although an improvement over the design in Section 2, has the following issues:
• The theoretical $u_b$ may be further reduced by dimension limits on the series nMOS transistors.
• The short circuit dissipation is a function of the ratio of the input and output waveform slopes [17]. By inserting resistance we are increasing the input waveform slope, thereby increasing the short circuit dissipation.
• The leakage power is a function of the gate-to-source voltage ($V_{gs}$). Since the signal degradation is nonzero when passing a 1, the leakage power of the fanout transistors increases. This drawback is alleviated in the design discussed in the next section.
• This design has an area overhead due to extra transistors added.
GATE DESIGN WITH CMOS PASS TRANSISTORS
In the gate design described in Section 3, the single nMOS transistor degrades logic 1, thereby increasing leakage power. This disadvantage can be alleviated by adding a CMOS pass transistor instead. The CMOS pass transistor consists of an nMOS and a pMOS transistor connected in parallel. Both transistors are kept always ON, and the signal degradation is zero while passing either logic 1 or logic 0.
Calculation of $u_b$
The $u_b$ calculation is similar to that of the single nMOS design but with zero signal degradation. Note that the resistance $R_s$ is the effective parallel resistance of both the transistors together.
Design Issues
The design issues involved in this gate design are:
• $R_s$ is the effective series/parallel resistance of both the nMOS and the pMOS transistors. Hence, the effective resistance per unit length reduces and the transistors have to be longer to achieve the same resistance as a single nMOS transistor.
• Larger area overhead than the design in Section 3.
TECHNOLOGY MAPPING
The process of designing gates that implement a given delay by altering the dimensions of the transistors is called technology mapping or transistor sizing. In this section we describe the transistor sizing of VID gates. From Eq. (2), gate delay is dependent on $C_L$ of the gate, which in turn depends on the dimensions of the fanout gates. Hence, to obtain a valid transistor sizing for delay at a gate G, the sizes of the gates in the fanout of G have to be decided first. Therefore, to design an entire circuit, we use a reverse breadth-first search methodology: we first design the gates connected to the primary outputs and work towards the inputs of the circuit. Each n-input CMOS gate must be designed for an output capacitance $C_L$ and n + 1 delays, one for the output and one for each of its fanins. The minimum of the fanin delays is added to the output delay and subtracted from all fanin delays, thus leaving a total of n non-zero delays. These delays are realized by designing n gates: one n-input CMOS gate and n − 1 one-input transmission gates that feed into the non-zero delay inputs of the CMOS gate. Each gate is designed for its own delay $d_{req}$ and load capacitance. For a k-input gate (k = n for the CMOS gate and k = 1 for a transmission gate), the (4k + 1)-dimensional design space consists of the lengths and widths of 2k transistors and a load capacitance.
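The two bookkeeping steps described above, ordering the gates from outputs to inputs and folding the smallest fanin delay into the output delay, can be sketched in Python as follows. The netlist representation (a dictionary from each gate to the gates driving it) and the function names are hypothetical. The ordering shown is a fanout-counting variant of breadth-first search (Kahn-style), used here as an assumption so that a gate is visited only after all of its fanout gates, which a plain breadth-first traversal does not guarantee on reconvergent paths.

```python
from collections import deque


def normalize_delays(output_delay, fanin_delays):
    """Move the smallest fanin delay into the output delay.

    Returns the new output delay and the shifted fanin delays, at least
    one of which is zero afterwards.
    """
    shift = min(fanin_delays)
    return output_delay + shift, [d - shift for d in fanin_delays]


def reverse_sizing_order(fanins):
    """Order gates so that every gate appears after all gates it drives.

    fanins maps each gate name to the list of gates driving it; drivers
    that are not keys of the mapping are treated as primary inputs.
    """
    fanout_count = {gate: 0 for gate in fanins}
    for drivers in fanins.values():
        for driver in drivers:
            if driver in fanout_count:
                fanout_count[driver] += 1

    # Gates with no fanout gates are connected to the primary outputs.
    queue = deque(g for g, count in fanout_count.items() if count == 0)
    order = []
    while queue:
        gate = queue.popleft()
        order.append(gate)
        for driver in fanins[gate]:
            if driver in fanout_count:
                fanout_count[driver] -= 1
                if fanout_count[driver] == 0:
                    queue.append(driver)
    return order


# Hypothetical three-gate circuit: G1 drives G2 and G3, G2 drives G3.
example_fanins = {"G3": ["G1", "G2"], "G2": ["G1"], "G1": []}
sizing_order = reverse_sizing_order(example_fanins)

# Hypothetical delays for a two-input gate, in arbitrary units.
new_output_delay, new_fanin_delays = normalize_delays(1.0, [0.5, 2.0])
```

In the example, G1 is sized last because its load depends on both G2 and G3. After normalization only one fanin keeps a non-zero input delay, so a single transmission gate would be needed for this two-input gate.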
Look-Up Table Generation
The first stage is to generate a look-up table of sizes by simulation, for different $d_{req}$ and $C_L$. For every gate type, we simulated the gate with the smallest sizes to find the rising delay $d_{rise}$ and the falling delay $d_{fall}$. The objective function is to minimize:
Delays $d_{rise}$ and $d_{fall}$ can be increased by increasing the length of the transistors and decreased by increasing the width. Thus, by an iterative process, an implementation for the given $d_{req}$ and $C_L$ can be achieved (to within acceptable values of error) and noted in the look-up table. Thus, the look-up table has size assignments for all different gate types and some values of $C_L$. This look-up table can be used for all circuits.
Fine Tuning Size Assignments
When a particular circuit is being optimized, the look-up table may not have the exact $C_L$. In such cases, we go to the second stage of fine tuning the sizes. We start with the closest entry in the look-up table. Each dimension is perturbed by one unit (since dimensions are discrete in a technology) and the sensitivity is calculated as,
where $d_{current}$ is the present measured gate delay, and $d_{rise}$ and $d_{fall}$ are the rise and fall delays after a perturbation in the dimension. There can be 8 perturbations, two for each of the dimensions. The perturbation with the highest sensitivity is incorporated and the gate is simulated again. The objective is to minimize the error function given earlier. This procedure is called the steepest descent method, as the objective function is minimized by driving the dimensions based on sensitivities. The complexity is greatly reduced by using the look-up table, as the search is limited to the neighborhood of the solution. Hence, local minima will not be a problem. The procedure can also be tuned to include the area of the cell in the objective function.
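The two-stage procedure can be illustrated with the following Python sketch. Everything numerical in it is an assumption made for illustration: the function `toy_delays` is a hypothetical stand-in for circuit simulation, the error measure (sum of absolute rise and fall deviations from $d_{req}$) and the choice of the perturbation giving the largest error reduction are assumed forms of the objective and sensitivity, and the table entries and size limits are invented.

```python
def toy_delays(size, c_load):
    """Hypothetical stand-in for simulation: returns (d_rise, d_fall).

    size = (w_n, l_n, w_p, l_p) in discrete grid units. Delay grows with
    transistor length and shrinks with width, as described in the text.
    """
    w_n, l_n, w_p, l_p = size
    return c_load * l_p / w_p, c_load * l_n / w_n


def error(size, d_req, c_load):
    """Assumed objective: total deviation of both delays from d_req."""
    d_rise, d_fall = toy_delays(size, c_load)
    return abs(d_req - d_rise) + abs(d_req - d_fall)


def nearest_entry(table, d_req, c_load):
    """Stage 1: pick the table entry closest to the requested point."""
    key = min(table, key=lambda point: abs(point[0] - d_req)
              + abs(point[1] - c_load))
    return table[key]


def fine_tune(start, d_req, c_load, tol=0.05, min_dim=1, max_dim=10,
              max_iters=100):
    """Stage 2: greedy descent over one-unit perturbations of each dimension."""
    size = tuple(start)
    for _ in range(max_iters):
        current = error(size, d_req, c_load)
        if current <= tol:
            break
        best, best_err = None, current
        for i in range(len(size)):
            for step in (-1, 1):            # two perturbations per dimension
                trial = list(size)
                trial[i] += step
                if not min_dim <= trial[i] <= max_dim:
                    continue
                trial_err = error(tuple(trial), d_req, c_load)
                if trial_err < best_err:    # keep the largest improvement
                    best, best_err = tuple(trial), trial_err
        if best is None:                    # no perturbation helps: stop
            break
        size = best
    return size


# Hypothetical look-up table: (d_req, C_L) -> (w_n, l_n, w_p, l_p).
table = {(2.0, 4.0): (4, 2, 4, 2), (2.0, 8.0): (8, 2, 8, 2)}

start = nearest_entry(table, d_req=2.0, c_load=5.0)
tuned = fine_tune(start, d_req=2.0, c_load=5.0)
```

For the four dimensions of a single transistor pair, the inner loops evaluate eight perturbations per iteration, matching the count given in the text. Starting from a table entry keeps the search short, since only a few unit steps separate the starting sizes from a solution.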
APPLICATION OF MAXIMUM $u_b$ TO LOW POWER DESIGN
In this paper, we have described techniques for maximizing the differential delay that can be achieved between two input-output paths through a single gate, $u_b$. How does this translate to dynamic power savings in real circuits? To answer this question we refer to two theorems proven in earlier papers that deal with reducing glitches in circuits through the use of linear programming.
Theorem 1: Glitches can be eliminated in a circuit by balancing the arrival times at each input of a gate such that the differential delay between any two inputs is less than the inertial delay of the gate. This is known as hazard filtering [2].
Theorem 2: Glitches cannot be eliminated by hazard filtering alone when the design is constrained by a maximum critical path delay (maxdelay). In this situation, buffers need to be inserted in non-critical paths to eliminate all glitches in the circuit. We shall call this the Agrawal LP technique for easy reference [1, 2].
These theorems mean that glitches can be eliminated in a circuit, through an appropriate delay assignment to every gate, by using hazard filtering alone, but this will increase the maxdelay of the circuit. In order to eliminate glitches without increasing the maxdelay of the circuit, delay buffers may have to be inserted.
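The hazard filtering condition of Theorem 1 amounts to a simple comparison per gate, sketched below in Python. The function names and the arrival times are hypothetical, and delays are in arbitrary units.

```python
def differential_delay(arrival_times):
    """Largest difference between the arrival times at a gate's inputs."""
    return max(arrival_times) - min(arrival_times)


def glitch_filtered(arrival_times, inertial_delay):
    """Theorem 1 condition: differential delay below the inertial delay."""
    return differential_delay(arrival_times) < inertial_delay


# Hypothetical two-input gate with an inertial delay of 3 units.
balanced = glitch_filtered([4.0, 6.0], inertial_delay=3.0)      # True
unbalanced = glitch_filtered([1.0, 6.0], inertial_delay=3.0)    # False
```

In the unbalanced case either the gate must be made slower, which can lengthen the critical path, or the early input must be delayed, which is the role a buffer plays under a maxdelay constraint.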
However, there are disadvantages of using explicit buffers as delay elements:
• Delay buffers consume switching power thereby increasing the overall dynamic power of the circuit.
• Delay buffers add area overhead to the circuit.
Raja et al. proposed the variable input delay logic, where the differential delay can be reduced by the use of variable input delay gates, thereby eliminating glitches without the insertion of buffers [19, 21]. That technique relies on the technology parameter $u_b$ to provide the necessary differential delay offset to meet the hazard filtering condition and eliminate glitches. In this case as well, if the circuit is constrained by maxdelay, and there exists a path that has more differential delay than the $u_b$ achievable in the technology of implementation, then a delay buffer needs to be inserted in the circuit to eliminate all glitches. However, the advantages of this method are:
• VID gates add less extra dynamic power than delay buffers.
• VID gates add less area overhead than delay buffers.
• In maxdelay-constrained circuits, a much smaller number of buffers is inserted compared to the Agrawal LP technique [19, 21].
These advantages are illustrated in Figure 4. The plot shows the circuit design space in terms of critical path delay (maxdelay) and total dynamic power consumed. Each point in the space represents a circuit design solution with the given maxdelay, consuming the shown total dynamic power. By using the hazard filtering technique, the circuit can be made to consume the least power by eliminating all glitches in the design. This optimized solution is shown by the point at $u_b = 0$. This solution circuit does not contain any buffers but is much slower than the unoptimized circuit. To increase the speed of the circuit, buffers are inserted by the Agrawal LP technique. The power consumed by the extra buffers is added to the total power consumed by the circuit. According to this technique, more buffers need to be added to increase the speed of the circuit with no glitches [21]. Thus, the total power increases as maxdelay is decreased, and the solutions follow the Agrawal LP technique solution curve. In the VID logic technique proposed by Raja et al., suppose that we can achieve $u_b = 5$. This allows an optimized solution circuit shown by the point $u_b = 5$. The power consumed by this design is the same as that of the $u_b = 0$ design, as there are no buffers added in either case. The maxdelay of the $u_b = 5$ design is smaller than that of the $u_b = 0$ design, thereby making the $u_b = 5$ design faster. To make the $u_b = 5$ circuit even faster, more buffers can be added, and this leads to the solutions shown by the dotted curves [19]. Note that as maxdelay is decreased, more buffers need to be added, and this increases the total dynamic power of the circuit. As stated earlier, $u_b$ is a technology-dependent parameter and can be increased by using one of the VID gate designs described in this paper. As can be seen in the solution space, power is reduced by choosing a higher $u_b$.
Fig. 4. The solid line shows the circuit design solutions achieved by the LP technique proposed by Agrawal et al. [1, 2]. The points shown as $u_b = 5$, $u_b = 10$, etc., are the circuit design solutions obtained by using the VID technique proposed by Raja et al. [19, 23]. The $u_b = 5$ and $u_b = 10$ curves show that by increasing the $u_b$ of a technology, dynamic power for a given maxdelay can be reduced. Thus, the techniques for designing gates with maximum $u_b$ described in this paper can be used to reduce the dynamic power of circuits.
In this paper, we have proposed three different implementations for VID gates and techniques for evaluating the $u_b$ of each type. Each of the proposed designs tries to maximize the $u_b$ that can be achieved in a given technology. These proposed gate designs can be used by the VID logic technique to reduce the glitches in a circuit by using fewer buffers. The power savings results are described in previous publications and have not been duplicated here [19, 21, 23].
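The way a larger $u_b$ reduces the need for buffers can be sketched with a simplified per-gate check. This is an assumed reading of the condition described above, not the linear program of the earlier publications: a VID gate is taken to absorb up to $u_b$ of the differential arrival time at its inputs, and a buffer is needed only if the remainder still violates the hazard filtering condition. Function names and numbers are hypothetical, and delays are in arbitrary units.

```python
def residual_differential(differential_delay, u_b):
    """Differential delay left after a VID gate absorbs up to u_b."""
    return max(0.0, differential_delay - u_b)


def needs_buffer(differential_delay, inertial_delay, u_b):
    """True if the hazard filtering condition fails even with a VID gate."""
    return residual_differential(differential_delay, u_b) >= inertial_delay


# Hypothetical gate: 12 units of differential delay, inertial delay of 4.
results = {u_b: needs_buffer(12.0, 4.0, u_b) for u_b in (0.0, 5.0, 10.0)}
```

In this toy case the gate needs a buffer for $u_b$ of 0 and 5 but not for 10, which mirrors the trend in the design space: a technology with a higher $u_b$ reaches the same maxdelay with fewer buffers.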
CONCLUSION
We have explained why conventional CMOS gates cannot be used as variable input delay (VID) gates. We have presented three new implementations of VID gates. We presented an analysis of each of the gates and listed their limitations. Then, we proposed a two-step approach for fixing the transistor sizes of every gate instance in the circuit. The main idea of this paper is to present the transistor-level implementation details of the variable input delay logic. We gave a brief introduction to the power saving potential of these gate designs when used with a previously published gate-level linear programming technique [19, 23]. The advantages of the technique, its power reduction results, and comparisons with other techniques are the same as presented in earlier publications and are not duplicated here [19, 21, 23].
