# Dynamic Gates with Hysteresis and Configurable Noise Tolerance

Krishna Santhanam, Kenneth S. Stevens

Electrical and Computer Engineering, University of Utah

krishna.santhanam@utah.edu

kstevens@ece.utah.edu

# Abstract

Dynamic logic can provide significant performance and power benefits compared to implementations using static gates. Unfortunately, dynamic gates have traditionally suffered from low noise margins, which limits their reliability. A new logic family, called complementary dynamic logic (CDL), is presented. CDL replaces the standard keeper logic with a dual dynamic keeper gate that is applicable to all dynamic gate structures. CDL provides dynamic gates with two novel characteristics: hysteresis and arbitrarily configurable noise margins. However, these two benefits come at the cost of reducing the gain and increasing the energy of the dynamic gate. This paper compares the noise, energy, performance, gain, and total transistor width tradeoffs of CDL and three other logic families applied to a 65nm cell library consisting of 23 functions. The results show that the performance advantages of dynamic domino gates can be maintained while providing significantly enhanced noise margins using CDL structures.

# 1. Introduction

Deep submicron designs involve a number of competing critical design tradeoffs. Performance has traditionally been the most important design metric. Others such as power and noise have become increasingly important due to scaling. Indeed, wires are having an enormous effect on our architectures and circuits by increasing the delay, noise, and power of our designs [9, 13].

The circuit family used in our designs maintains a direct relationship to the performance, power, noise tolerance, and time to market of a design. The robustness and ease of mapping combinational functions to static logic are significant advantages that keep this logic family at the forefront of our design world. However, other logic families hold distinct advantages in terms of power and performance over traditional static logic design. For example, a domino implementation of a six-gate two-input NAND pipeline is 40% faster with 21% less peak switching energy than a static implementation driving an identical load.

A dynamic gate owes its significant performance advantages to its unique logic structure. Dynamic gates implement the *state change* of a function, then act as *latches*. Hence the transistor logic is only implemented to effectuate the change in a function from high to low (or low to high). Otherwise the gate is left in a high impedance state. This results in a very efficient gate. For instance, a traditional 2-input domino NAND gate has a logical effort [14] *less* than that of an inverter<sup>1</sup>, giving an input-to-output gain greater than that of an inverter in traditional CMOS processes. The high gain of these gates is the primary reason for both the performance and power advantages of dynamic gates, and their latching property can create design advantages.

However, these structures also have a serious drawback. The dynamic latched output states that are not covered by the set-reset function are sensitive to noise. Noise is one of the primary reasons that dynamic gates are not exploited in more designs for their performance and power advantages.

This work presents a new dynamic gate structure, called *complementary dynamic logic*, or CDL, that can provide dynamic gates with hysteresis and a configurable noise margin. Hysteresis provides the gates with a high switching threshold when the output is low, and a low switching threshold when the output is high. The configurable noise margin allows propagation and coupling noise effects to be mitigated – even to the point that noise sensitivity is less than that of a comparable static gate.

The noise tolerance comes at the cost of gain – increasing the delay and power of the gate. Therefore CDL gates will be sized to optimize gain based on the specific noise requirements of the interconnect.

In this paper, we report the results of our characterization of these logic gates as we trade off performance, power, gain, and total transistor width to increase the noise margin of this novel circuit family. We characterize and compare CDL gates to static and traditional dynamic domino gates with a weak feedback keeper.

# 2. Dynamic Gate Noise Reduction

Several gate structures have been used to increase the noise margin of dynamic gates while retaining speed and energy advantages compared to static gates. These techniques fall under two categories.

The first category of circuits was developed to reduce the propagated noise of dynamic gates by (a) dynamically increasing the switching threshold and (b) precharging intermediate nodes to increase the body effect [15, 1, 5, 3, 4]. These methods do nothing to improve coupling noise, and have a relatively limited improvement on propagated noise, but do reduce the leakage of these gates. We classify these as uncompetitive alternatives.

A second more effective method is to employ *keeper* structures that retain the output voltage in dynamic states of the gate. The circuits in this category are effective against both propagation noise and crosstalk noise. The most successful keeper design has been to implement a *jam latch*, or back-to-back inverters, on the output of the dynamic gate<sup>2</sup>.

<sup>1</sup> Logical effort is a metric of the gain of a gate that takes into account the complexity of logic required to switch a gate. An inverter is traditionally assumed to be the highest gain gate.

<sup>2</sup> When dynamic gates are used inefficiently in clocked pipelines by connecting the precharge to the clock, the pull-down structure of the second inverter in the jam latch can be removed because the gate will never be in a dynamic state with the output low.



Figure 1. Footed Domino 2-input NAND gates with (a) jam latch, (b) CDL

This latch, shown in Figure 1(a), provides a pull-up or pull-down path for the output at all times. However, it also has one deleterious property: the jam latch keeper logic always opposes the output transition in the dynamic gate. This has several drawbacks: (a) The power dissipation of the gate increases due to the short circuit current between power and ground when the output switches. (b) The switching delay of the gate is increased due to the fight between the keeper and dynamic gate. (c) The keeper becomes a ratioed gate and must be sized properly or the gate will not function. If the keeper is too large, the gate will not switch. This limits the ability to create a dynamic gate with a large noise tolerance. (d) The feedback inverter of the dynamic gate must be sized to switch the keeper logic quickly to reduce short circuit current. This increases the load on the output of the dynamic gate.

Stronger noise margins are needed than can be provided by the standard jam latch keeper. CDL gates provide the noise tolerance needed to use domino logic further into deep submicron technologies. CDL requires a second dynamic gate that is the dual of the dynamic set-reset gate. The outputs of these two gates are tied together as shown in Figure 1(b). The complementary CDL gate provides noise tolerance to the high impedance states of a dynamic gate. This secondary dynamic gate will never switch the output of the gate and does not fight the gate's transition. Instead, its sole purpose is to provide noise tolerance. The complementary dual gate can be arbitrarily sized to achieve any necessary noise margin, even to the point where it is stronger than the gate that toggles the output. CDL gates can therefore drive long-distance communication wires. However, in most applications dynamic gates will continue to drive short local wires. In such cases the complementary gate will be very small, having a minimal impact on the gain of the dynamic gate.

There are two main advantages to CDL logic. It is the only set-reset dynamic logic family that can have an effectively controllable noise margin. The second key advantage of CDL is hysteresis. This is due to the dual gate which continues to provide current to retain the previous state until the inputs have fully switched or the output has toggled.



Figure 2. Static and dynamic C-Elements

# 3. Dynamic Gate Architectures

The logic of a dynamic gate is only intended to toggle the output. Therefore dynamic gates are most efficient when technology mapped from *production rules* [7] or another set-reset synthesis methodology. Such an approach is employed by asynchronous synthesis CAD [2, 16]. This is in stark contrast to the wasteful approach typically used in clocked designs where the precharge input is tied to the clock.

A simple C-Element, or rendezvous, will be used as an example to illustrate the design and benefits of the CDL gate using set-reset synthesis. The production rules for a C-Element are given as:

$$\begin{array}{ccc} a \uparrow \cdot\, b \uparrow & \mapsto & o \uparrow \\ a \downarrow \cdot\, b \downarrow & \mapsto & o \downarrow \end{array}$$
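The two production rules can be read as a small behavioral model: the output is set when both inputs are high, reset when both are low, and held otherwise. The sketch below is only a logic-level illustration; the function name `c_element_next` is a hypothetical helper and is not taken from the paper or from any synthesis tool.

```python
def c_element_next(a: int, b: int, o: int) -> int:
    """Next output of a C-Element, following the two production rules."""
    if a == 1 and b == 1:
        return 1  # set rule: both inputs high drive the output high
    if a == 0 and b == 0:
        return 0  # reset rule: both inputs low drive the output low
    return o      # inputs disagree: dynamic state, the output is held


if __name__ == "__main__":
    o = 0
    trace = []
    for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        o = c_element_next(a, b, o)
        trace.append(o)
    print(trace)  # [0, 1, 1, 0]
```

The two input patterns for which the function returns `o` unchanged are exactly the states in which a dynamic gate is left in high impedance and relies on a keeper.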

The implementations mapped to a static gate and to a dynamic gate are shown in Figure 2 and Table 1. The dynamic gate implementation is 40% faster with a 20% reduction in energy compared to the static implementation when driving the same output load. Hence a design methodology that efficiently uses dynamic gates has a two-fold benefit: (1) there can be a substantial reduction in overall logic when exploiting the dynamic states, and (2) the gates themselves have higher gain, particularly when the set and reset functions have disjoint input conditions.

The design of the complementary dual gate is illustrated based on the dynamic C-Element. The KMap of the dual gate is shown in Table 2. All dynamic states from Table 1 are specified as 0 or 1 based on the value of the output $o$. All other states are don't care but must be covered with either a zero or one based on the values of the dynamic gate. The full CDL logic is shown in Figure 3, where (a) is the set-reset gate and (b) is an inverter to provide proper polarity to the CDL gate (c).
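The relationship between the set-reset coverings of Table 1 and the keeper coverings of Table 2 can be checked exhaustively over the eight input/output states. The sketch below works purely at the logic level and makes two assumptions: $o$ is treated as the fed-back output value (the polarity inverter of Figure 3 is ignored), and the low covering is read as $\overline{a}\,\overline{o} + \overline{b}\,\overline{o}$. The function names are hypothetical.

```python
from itertools import product


def set_reset(a: int, b: int) -> tuple[bool, bool]:
    """Set and reset coverings of the dynamic C-Element (Table 1)."""
    return bool(a and b), bool(not a and not b)


def keeper(a: int, b: int, o: int) -> tuple[bool, bool]:
    """High and low coverings of the dual keeper gate (Table 2)."""
    high = bool((a and o) or (b and o))
    low = bool((not a and not o) or (not b and not o))
    return high, low


def check_dual() -> list[tuple[int, int, int]]:
    """Verify the dual gate and return the states where the keeper is off."""
    released = []
    for a, b, o in product((0, 1), repeat=3):
        s, r = set_reset(a, b)
        high, low = keeper(a, b, o)
        assert not (high and low)   # keeper never drives both rails
        assert not (s and low)      # keeper never opposes a set
        assert not (r and high)     # keeper never opposes a reset
        if not s and not r:         # dynamic state: keeper must hold o
            assert (high, low) == (o == 1, o == 0)
        if not high and not low:
            released.append((a, b, o))
    return released


if __name__ == "__main__":
    print(check_dual())  # [(0, 0, 1), (1, 1, 0)]
```

Under these assumptions the keeper is inactive only in the two states where the inputs have fully switched but the output has not yet toggled, which is consistent with the description of a keeper that holds the previous state without fighting the transition.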

The primary disadvantage of CDL gates is the complexity of the keeper logic. First, since the two gates are duals, the keeper gate will produce additional load on the inputs, reducing the gain of the set-reset gate. Second, since the dual keeper gate does not switch the output, it will always have the output feeding back as an input, creating a latching structure. The keeper gate will therefore have more transistors than the set-reset dynamic gate. Third, the complementary nature of the gate results in inefficient series structures when the set-reset gate has wide OR functionality.

However, the complexity of the dual CDL gate may not be as significant as one initially expects. The complexity is mitigated by two factors. First, mapping functions into set-reset logic can have a significant overall reduction in the

| $o$ \ $ab$ | 00 | 01 | 11 | 10 |
|------------|----|----|----|----|
| 0          | 0  | ×  | 1  | ×  |
| 1          | 0  | ×  | 1  | ×  |

Table 1. KMap for C-Element: $\times$ states are dynamic. Set, reset coverings = $ab$, $\overline{a}\,\overline{b}$

| $o$ \ $ab$ | 00 | 01 | 11 | 10 |
|------------|----|----|----|----|
| 0          | -0 | 0  | -1 | 0  |
| 1          | -0 | 1  | -1 | 1  |

Table 2. Dynamic C-Element dual: high, low coverings = $(a\,o + b\,o)$, $(\overline{a}\,\overline{o} + \overline{b}\,\overline{o})$

logic, particularly when mapping to sequential functions. For example, 18 transistors are required for the static implementation of the C-Element, versus 12 for the CDL and 8 for a dynamic gate with a jam latch keeper. Second, what really matters in a design from a power and performance perspective is the *size* in terms of total transistor widths, not the number of transistors<sup>3</sup>. Unless very high noise margins are required, the total transistor width in a dynamic gate, including the more complicated CDL logic, can be quite small when compared to a traditional static gate. For example, the total transistor widths of the standard keeper and CDL C-Elements are 23% and 27% respectively of the static gate for an identical load where the keepers are sized to drive 20% of the switching current of the devices. Note that the CDL implementation is only 18% larger than a gate using a traditional keeper. This highlights two important factors to remember: (a) the increased gain and reduction in logic of dynamic gates can result in a significant reduction in total transistor widths, and (b) the extra transistor width for the dual gate in CDL logic is very small unless high noise margins or hysteresis are needed.

The CDL gate structure has in the past been arbitrarily applied to a few dynamic gates. The CDL C-Element shown here was used in the design of the Post Office [12] and is compared against other C-Element circuit structures in [10]. However, a general approach to the design and sizing of a complementary keeper to control noise is novel.

Because of the complex relationship between the noise advantages and gain costs of the CDL, a rigorous evaluation of these gates has been carried out and will be reported in Section 6.

# 4. Circuit Comparison

The circuit comparison has been carried out using the functions of a complete 65nm cell library. This library contains 23 independent functions [11]. We have mapped four gate families to these 23 functions: two dynamic cell libraries and a complex gate static library. The dynamic libraries consist of traditional dynamic domino with a jam latch keeper and domino CDL gates. Only a fraction of this



Figure 3. CDL Inverting C-Element

data can be presented here due to space limitations and the scope of the study. We will therefore use a representative example and summarize the full data results here. The complete data set is contained in [8].

**Noise Margin:** A domino gate will fail if it flips state or produces non-monotonic output changes. Therefore an aggressive definition of failure under noise is adopted: any change of  $V_{th}$  on the output is deemed a failure. Noise immunity curves are reported because they show how noise margins scale compared to the size of the keeper logic and provide a timing perspective which is absent in a DC analysis.

**Performance:** Performance is measured as the delay between 50% change in the input to a 50% change in the output. Transistors are sized in this study by setting the PMOS to NMOS ratio to 2:1.
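The 50%-to-50% delay definition can be applied to sampled waveforms by locating the mid-rail crossing of each signal with linear interpolation. The sketch below is an illustration only: `crossing_time` and `delay_50` are hypothetical helper names, and the waveform samples in the usage example are made up rather than taken from the simulations.

```python
def crossing_time(times, volts, level):
    """Time at which the waveform first crosses `level` (linear interpolation)."""
    for i in range(len(times) - 1):
        d0, d1 = volts[i] - level, volts[i + 1] - level
        if d0 * d1 <= 0 and d0 != d1:
            return times[i] + d0 / (d0 - d1) * (times[i + 1] - times[i])
    raise ValueError("waveform never crosses the requested level")


def delay_50(times, v_in, v_out, vdd):
    """Delay from the 50% point of the input to the 50% point of the output."""
    return crossing_time(times, v_out, vdd / 2) - crossing_time(times, v_in, vdd / 2)


if __name__ == "__main__":
    t = [0.0, 100.0, 200.0, 300.0]   # hypothetical sample times in ps
    v_in = [0.0, 1.8, 1.8, 1.8]      # rising input
    v_out = [1.8, 1.8, 0.0, 0.0]     # falling output
    print(delay_50(t, v_in, v_out, vdd=1.8))
```

In this made-up example the input crosses mid-rail at 50 ps and the output at 150 ps, so the reported delay is about 100 ps.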

The current drive of the equivalent static gate is used as the baseline. Dynamic gates are sized to have an equivalent switching current and load as the static gate shown in Figure 4. This results in similar delay, but greatly underestimates the gain advantage of dynamic gates. An alternative approach is to match the input loads of the gate which takes the gain into account [6]. We opted to use identical drive size to create the worst case scenario for the dynamic gates and to mitigate variations based on our sizing of the keeper transistors.

**Switching Power:** The sum of the energy to drive both the inputs and the outputs is reported.

**Gain:** Gain of the gate is calculated as the ratio of output load to the input load  $C_{out}/C_{in}$ . The gain is reported as  $C_{in}$  in this study because the output load remains constant.

**Transistor Width:** The total transistor width is the best first order metric of the cost of the circuit in terms of leakage and transistor area (but perhaps not layout area).

**Hysteresis:** Measured by the DC switching points.
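One way to turn DC switching points into a hysteresis figure is to sweep the input upward and downward and take the difference between the input voltages at which the output crosses mid-rail. The sketch below assumes that mid-rail definition of the switching point, which is a choice made for illustration rather than one stated in the text; the helper names and the sweep data are hypothetical.

```python
def switching_point(v_in, v_out, level):
    """Input voltage at which v_out first crosses `level` along the sweep."""
    for i in range(len(v_in) - 1):
        d0, d1 = v_out[i] - level, v_out[i + 1] - level
        if d0 * d1 <= 0 and d0 != d1:
            frac = d0 / (d0 - d1)
            return v_in[i] + frac * (v_in[i + 1] - v_in[i])
    raise ValueError("output never crosses the requested level")


def hysteresis_width(up_sweep, down_sweep, vdd):
    """Difference between the rising-sweep and falling-sweep switching points."""
    v_up = switching_point(up_sweep[0], up_sweep[1], vdd / 2)
    v_down = switching_point(down_sweep[0], down_sweep[1], vdd / 2)
    return v_up - v_down


if __name__ == "__main__":
    # Hypothetical DC sweeps as (input voltages, output voltages).
    up = ([0.0, 1.0, 1.2, 1.8], [0.0, 0.0, 1.8, 1.8])
    down = ([1.8, 0.8, 0.6, 0.0], [1.8, 1.8, 0.0, 0.0])
    print(hysteresis_width(up, down, vdd=1.8))
```

With these made-up sweeps the output switches near 1.1 V on the way up and near 0.7 V on the way down, giving a width of roughly 0.4 V.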

# 5. Experimental Setup

Noise margins are modified by changing the size of the keepers. Our experiments consisted of measuring the seven design metrics in the previous section upon varying the size of the keepers. The circuits are evaluated under both input noise propagation and aggressor or crosstalk noise coupling on the output of the gate.

#### 5.1. Keeper Sizing Parameters

Three parameters are used to modify the gate sizes in the keeper structures.

<sup>3</sup> Many deep submicron libraries now split single logical transistors into many smaller transistors through "*legging*" to reduce variation, etc.



Figure 4. Propagation noise setup

The keeper logic is sized in relation to the set and reset function of the gate using the parameter s. When s = 1 the keepers will drive nearly the same current as the set-reset logic function as shown in Figure 4. When s = 0 the gate is fully dynamic. This is the primary keeper scaling parameter used to control noise immunity. Most of our graphs show results of keeper sizing with s ranging from 0 to 1.

Parameter r is used to optimize the sizing of the keeper in the CDL gate by keeping the noise margin the same while reducing the load on the input pins. This optimization technique improves gain at the cost of slightly increased delay and switching power of the gate.

Parameter t is used to optimize the size of the first inverter in the feedback path to the keeper as shown in Figure 4. The fanout load of the inverter was varied using this parameter while measuring the delay, power, and noise margin. The optimal value of t for the domino gate with regular keeper is approximately a fan out of four, whereas in the case of the CDL gate structure it is a fan out of 10. The optimal values have been used in all the simulations while varying the other parameters.

#### 5.2. Noise Modeling

**Propagated noise:** The setup used to measure noise propagation is shown in Figure 4 using 2-input NAND gates as an example. All simulations switch the output closer to ground. The transition to the input is a ramp that saturates at approximately  $V_{cc}/2$ .

Dynamic noise immunity curves are reported varying the s parameter that dictates the size of the keeper transistors. SPICE simulations step the parameter s, sweeping the duration of the input noise pulse until the propagated noise changes by $V_{th}$. The results plot input duration versus parameter s. The traditional design range in size for a jam latch is 0.1–0.2 times that of the dynamic gate. However, we have plotted the graphs across a much larger dynamic range, varying s from zero to one.
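The sweep described above can be organized as an outer loop over the keeper size s and an inner search for the shortest noise pulse that produces a failure. The sketch below shows that structure using bisection, and it assumes the peak noise grows monotonically with pulse duration. The function `toy_peak_noise` is a stand-in for a circuit simulation: it is an invented first-order model with hypothetical constants, not the authors' setup and not data from the paper.

```python
import math


def toy_peak_noise(s: float, duration: float) -> float:
    """Placeholder for a simulator run: peak output disturbance in volts.

    Invented model for illustration only: the disturbance grows with the
    pulse duration and shrinks as the keeper size s increases.
    """
    v_max, tau, k = 1.8, 100.0, 2.0  # hypothetical volts, ps, keeper strength
    return v_max * (1.0 - math.exp(-duration / tau)) / (1.0 + k * s)


def critical_duration(s, v_th, peak_noise=toy_peak_noise, d_max=2000.0, tol=1e-3):
    """Shortest pulse duration whose propagated noise reaches v_th, or None."""
    if peak_noise(s, d_max) < v_th:
        return None  # no failure for any pulse up to d_max
    lo, hi = 0.0, d_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if peak_noise(s, mid) >= v_th:
            hi = mid
        else:
            lo = mid
    return hi


def immunity_curve(s_values, v_th):
    """Pairs of (keeper size, critical pulse duration) for plotting."""
    return [(s, critical_duration(s, v_th)) for s in s_values]


if __name__ == "__main__":
    for s, d in immunity_curve([0.0, 0.25, 0.5, 0.75, 1.0], v_th=0.4):
        print(s, d)
```

Replacing `toy_peak_noise` with a call into an actual simulator would keep the sweep logic unchanged; only the callable passed as `peak_noise` differs.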



Figure 5. Crosstalk noise configuration

**Crosstalk noise:** Parameter *l* is introduced to model aggressor noise as shown in Figure 5. This parameter specifies the fraction of the effective load on the gate that can be associated with a noise source, and ranges from 0 to 1. The total capacitance on the output node remains constant, but as *l* increases more of the total capacitance is attributed to cross-coupled wires. Therefore this is a figure of merit that can be used to determine the maximum wire length that can be safely driven for a given keeper size. The aggressor signals are ramps that saturate at $V_{cc}$. Dynamic noise graphs are created by incrementing the keeper size *s*, and sweeping *l* until the maximum noise on the output changes by a threshold $V_{th}$.
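As a first-order sanity check on what *l* means, consider the limiting case of a fully dynamic gate ($s = 0$) whose output floats while the aggressor makes a full-rail transition. The symbols $C_c$ (coupling capacitance) and $C_g$ (remaining capacitance to ground) are introduced here for illustration only, and the estimate ignores the finite ramp time and any keeper current, so it is a rough upper bound rather than a simulation result. With $l = C_c / (C_c + C_g)$, the capacitive divider gives

$$\Delta V_{out} \approx \frac{C_c}{C_c + C_g}\, V_{cc} = l\, V_{cc}, \qquad \text{failure when } l\, V_{cc} \geq V_{th} \;\Longleftrightarrow\; l \geq \frac{V_{th}}{V_{cc}} .$$

Using the 1.8V supply and 0.4V threshold reported in Section 6, this bound is $l \geq 0.4 / 1.8 \approx 0.22$. A keeper with $s > 0$ supplies restoring current during the aggressor ramp, which is why the tolerable *l* grows with keeper size.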

# 6. Simulation Results and Comparisons

All values are taken from SPICE simulations of the gates in a 180nm process with a power supply of 1.8V and threshold voltage of 0.4V. All signal ramps for propagated and crosstalk noise have a 150ps rise time, which is equal to FO4 values.

Dynamic noise graphs are plotted in Figures 6 and 8. These plot footed domino NAND and NOR structures that range from two to four inputs using jam latches and CDL keepers. The NAND and NOR structures give the best intuition for the scaling and cost, since we cannot show the results of all 23 gate functions. The more complicated AOI gates exhibit an additive combination of the characteristics of these structures.

All values in these graphs are normalized to the values of a static gate. A value of 2 on the vertical axis is twice as good as the static gate, and 0.5 is half as good. The horizontal axis scales parameter s. Changes in parameter s have an effect on the keeper structures but there is no change in the static transistor's sizes. When zero, the gate is fully dynamic. When s = 1, the keeper logic has approximately the same drive as the static gate.
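The "twice as good" convention implies that metrics where smaller is better must be inverted before plotting. The sketch below shows one plausible way to do this; the exact formula used for the graphs is not given in the text, so the ratio direction, the metric names, and the helper `normalized_score` are all assumptions.

```python
# Metrics for which a smaller raw value is better (assumed classification).
LOWER_IS_BETTER = {"delay", "energy", "input_cap"}


def normalized_score(metric: str, value: float, static_value: float) -> float:
    """Score relative to the static gate: 2.0 is twice as good, 0.5 half as good."""
    if value <= 0 or static_value <= 0:
        raise ValueError("metric values must be positive")
    if metric in LOWER_IS_BETTER:
        return static_value / value
    return value / static_value


if __name__ == "__main__":
    print(normalized_score("delay", 50.0, 100.0))        # 2.0
    print(normalized_score("noise_margin", 0.2, 0.4))    # 0.5
```

The inputs in the usage lines are made-up numbers chosen only to reproduce the two reference points mentioned in the text.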

The devices are sized pessimistically for the dynamic gates by matching the drive strengths and loads. Therefore the



Figure 6. Dynamic Noise Graphs of NAND structures

performance and power of the dynamic gates are similar to the static gate. However, the gain for these circuits is considerably better than for the static gate. This implies that from a system perspective, substantial performance and power improvements are possible beyond what is reported here.

The arcs in the graphs align so that, on the horizontal axis, the order of the arcs is gain, delay, and power, which are an improvement over the static gate. These values degrade as the keeper size is enlarged. The exception is the gain of the domino with a weak keeper, which remains constant because the inputs are independent of the keeper logic. Next the coupling and propagated noise appear, each with a worse value than the static gate. Noise immunity improves as the size of the keeper is increased in all gates. The dynamic gates are identical with a zero sized keeper for gain, performance, power and noise margin. They begin to diverge based on the particular properties of the keeper.

The CDL gate scales better in NAND structures than the jam latch for performance, power, and coupled noise. The CDL noise immunity is equal to the static gate with a keeper size about 80% the size of the dynamic gate. The CDL gate doesn't scale as well for propagated noise. The performance/noise tradeoff improves compared to the static gate with the deeper NAND structures. A CDL gate has a 20% performance penalty when the same noise margin as the static gate is achieved for the 4-input NAND. The jam latch cannot reliably reach this point.

There is a substantially larger improvement in performance and power for the NOR structures in Figure 8 compared to the static gate with small keepers even given equivalent switching transistor sizes. Performance improves by a factor of over $2\times$ for the 3-input NORs. However, the propagated noise (the lowest line in the graphs) is substantially worse than the static gate. The CDL gate shows better noise margin scaling for large keeper structures. (The jam latch failed at a little over 80% the size of the dynamic gate in the 2-input NAND case.) The CDL gate achieves approximately the same performance as a static gate for an identical coupling noise margin.

Figure 7 shows the DC analysis of a 2-input NAND gate as the CDL keeper gate scales. For large keepers a substantial hysteresis differential is created. This was consistent across all of the gates in our 23 function library.

# 7. Conclusion and Future Work

A new gate structure (CDL) has been designed and evaluated showing configurable noise tolerance for dynamic gates that improves robustness and range of application. While there is no advantage for small keeper logic over the standard weak inverter implementation, the CDL gates show advantages in both NOR and NAND structures when increased noise tolerance is required and large keeper structures are used. The results show that the CDL gate can



Figure 7. CDL Hysteresis of 2-input NAND

2007 IFIP International Conference on Very Large Scale Integration (VLSI-SoC 2007)



Figure 8. Dynamic Noise Graphs of NOR structures

achieve the same coupled noise margin as a static NOR gate with approximately the same performance and an improved gain. Dynamic gates were also shown to have over 600mV of hysteresis in a 1.8V process with large CDL structures.

There is significant cost in transistor width and decrease in gain for the NOR structures in a CDL gate. We briefly studied applying ratioed gates to the PMOS NOR structures in a CDL gate. This showed a potential for substantial improvements in gain and transistor width at a cost of some contention when the gate switches. We are also investigating datapath implementations using CDL logic, automatic sizing of the keeper gate for noise and wire lengths, and comparing CDL against static and traditional keeper designs in a test chip.

# References

- [1] G. Balamurugan and N. R. Shanbhag. The twin-transistor noise-tolerant dynamic circuit technique. *IEEE Journal of Solid-State Circuits*, 36(2):273–280, Feb 2001.
- [2] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Yakovlev. Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers. *IEICE Transactions on Information and Systems*, E80-D(3):315–325, 1997.
- [3] J. J. Covino. Dynamic CMOS circuits with noise immunity. Technical Report US Patent 5650733, IBM, Jul 1997.
- [4] G. P. D'Souza. Dynamic logic circuit with reduced charge leakage. Technical Report US Patent 5483181, Sun Microsystems, Inc., Jan 1996.
- [5] S. Goel, T. Darwish, and M. Bayoumi. A novel technique for noise-tolerance in dynamic circuits. In *IEEE Computer Society Symposium on VLSI*, pages 203–206, Feb 2003.
- [6] D. Harris, G. Breed, M. Erler, and D. Diaz. Comparison of Noise Tolerant Precharge to Conventional Feedback Keepers for Dynamic Logic. In *Great Lakes Symposium on VLSI*, pages 261–264, 2003.

- [7] A. Martin. Compiling Communicating Processes into Delay-Insensitive VLSI Circuits. *Distributed Computing*, 1(1):226–234, 1986.
- [8] K. Santhanam. Novel Dynamic Gate Structure with Configurable Noise Tolerance. Master's thesis, University of Utah, May 2007.
- [9] P. Saxena, N. Menezes, P. Cocchini, and D. A. Kirkpatrick. Repeater scaling and its impact on CAD. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 23(4):451–463, April 2004.
- [10] M. Shams, J. C. Ebergen, and M. I. Elmasry. Modeling and Comparing CMOS Implementations of the C-Element. *IEEE Transactions on VLSI Systems*, 6(4):563–567, December 1998.
- [11] K. S. Stevens and F. Dartu. Algorithms for MIS Vector Generation and Pruning. In *International Conference on Computer-Aided Design (ICCAD-06)*, pages 408–414. IEEE Computer Society, Nov. 2006.
- [12] K. S. Stevens, A. L. Davis, and W. S. Coates. The Post Office Experience: Designing a Large Asynchronous Chip. In *Proceedings of the 26th Hawaii International Conference on System Sciences*, pages 409–418, January 1993.
- [13] R. Suaya, R. Escovar, S. Ortiz, K. Banerjee, and N. Srivastava. Modeling and extraction of nanometer scale interconnects: Challenges and opportunities. In *Advanced Metallization Conference (AMC-2006)*, pages 17–27. Materials Research Society, Sept. 2006.
- [14] I. Sutherland, B. Sproull, and D. Harris. *Logical Effort: Designing Fast CMOS Circuits*. Morgan Kaufmann Publishers, Inc., San Francisco, 1999.
- [15] L. Wang and N. R. Shanbhag. Noise tolerant dynamic circuit design. In *International Symposium on Circuits and Systems* (ISCAS), pages 549–552. IEEE, Jun 1999.
- [16] K. Y. Yun and D. L. Dill. Automatic Synthesis of 3D Asynchronous Finite-State Machines. In *International Conference on Computer Aided Design, ICCAD-92*, pages 576–580, Los Alamitos, Calif., November 1992. IEEE Computer Science Press.
