Abstract-High speed and small area are the main advantages of dynamic logic for digital circuits, while the power consumption of this logic family is its main drawback. In this paper a new method for reducing the power consumption of dynamic circuits is presented. The proposed technique is especially suitable for large fan-in gates where the dynamic node discharges very frequently. These kinds of gates are widely used in high performance applications like microprocessors. The proposed method is applied to an 8-input NOR gate and an 8-input OR gate. The power-delay product of these gates is reduced by 46.7% and 35.15%, respectively, in a 90nm CMOS technology, compared to their conventional dynamic counterparts. We also show that an inherent data latching capability exists in the proposed circuit that can result in reduced silicon area in pipelined structures.
I. INTRODUCTION
Digital dynamic circuits can provide higher speed and are more area efficient than static CMOS circuits, and they have been the subject of much research [1] [2] [3] [4] [5] . When the fan-in of a gate is large the dynamic architecture becomes more attractive. For example, in a 40-bit tag comparator an OR gate with a fan-in of 80 is employed. If this wide fan-in OR gate were implemented as a static circuit, the number of transistors that would have to be stacked would be impractically large. However, wide fan-in dynamic gates are very power hungry because the capacitance of the dynamic node is very large and it discharges in almost every clock cycle. Also, dynamic circuits suffer from a lower noise margin in comparison to static logic. Many approaches have so far been presented to enhance the noise margin of wide fan-in dynamic circuits [1, 6] . These kinds of gates are extensively used in multiplexers [6] , tag comparators [7] , register files [8] , fast adders [9] , SRAM predecoder gates [10] , programmable logic arrays (PLA) [11] , and programmable encoders [12] .
The techniques that have been proposed for power reduction of dynamic gates can be generally categorized as follows.
a) Reducing the voltage swing of the dynamic node
b) Improving the performance of the keeper
The first approach is effective since the dynamic power is proportional to the square of the voltage swing. Therefore, reducing the voltage swing can lead to power saving. In dynamic gates the voltage of the dynamic node may change due to leakage current or noise. Keepers are used as a remedy for this problem. The problem with using a keeper transistor is that a contention current is generated when the dynamic node is to be discharged. This leads to more power dissipation. In the second approach the main goal is to reduce the contention current by using smaller/smarter keeper circuitry. Examples of both techniques are discussed in section II.
In this paper a new charging scheme for dynamic circuits is proposed that is suitable for reducing the power dissipation of large fan-in OR gates. In this approach the dynamic node experiences the full swing only if it is supposed to be 1 in the succeeding evaluation phase. The paper is organized as follows. In section II a few state of the art techniques for power saving in dynamic circuits are discussed. Section III presents the proposed architecture. The inherent latching property of the circuit is discussed in section IV. Simulation results and discussions are provided in section V. Section VI concludes the paper.
II. PRIOR TECHNIQUES
As mentioned above, the main techniques for reducing power dissipation of dynamic circuits can be divided into two categories. In this section some of the previous methods are discussed.
978-1-4673-5634-3/13/$31.00 ©2013 IEEE
A. Current-Comparison-Based Domino
Recently an approach was proposed to enhance the robustness and speed of wide fan-in dynamic circuits [1] . The main idea is to compare the current of the dynamic gate with its leakage current during the evaluation phase ( Fig. 1-a) . The reference circuit is a copy of the OR gate when all the inputs are high. This reference circuit produces the leakage current that is mirrored by M 3 . In the precharge phase, M D turns on and the Out signal charges to V DD . Also in this phase, M 1 and the main keeper, K 1 , are off. In the evaluation phase, M D and Mp turn off. If the PDN turns on, the gate voltage of M 2 will drop and this transistor will be turned on. The difference between the drain current of M 2 and the mirrored leakage flows to the Dyn node. Hence, the Dyn node charges up and the Out signal goes low. The main drawback of this technique is the number of extra transistors.
B. Conditional Keeper Domino (CKD)
In [13] a method is presented to enhance the keeper performance of wide fan-in dynamic OR gates. As shown in Fig. 1-b , the keeper transistor is divided into two smaller keepers, K 1 and K 2 . K 1 is active unconditionally at the beginning of the evaluation phase but K 2 is off and the PDN needs to overpower only K 1 . If the PDN stays off, K 2 turns on after a delay equal to T delay_element +T NAND . In order to obtain a desired unity noise gain (UNG), the sum of K 1 and K 2 sizes is selected equal to that of a conventional keeper.
C. High Speed Domino (HS-Domino)
Anis et al. have proposed a technique called high speed domino that reduces the contention current at the beginning of the evaluation phase ( Fig. 1-c) [14] . In this circuit the keeper transistor is off at the rising edge of the CLK signal. This is because transistor M 1 charges the gate of K 1 in the precharge phase. After a delay determined by two inverters, and if the PDN stays off, M 1 turns off and M 2 turns on. This activates K 2 . Although this technique is very effective in reducing the contention current, the circuit operation is not reliable at the beginning of the CLK cycle.
D. Reduced Dynamic Swing Domino Logic
Another technique for reducing the power consumption of dynamic circuits is reducing the voltage swing of the dynamic nodes by lowering/raising the high/low voltage values as shown in Fig. 1-d [15] . In this approach two voltage levels (V DDL <V DD , V GNDH >V GND ) are generated by diode-connected transistors. The voltage of the dynamic node swings between V DDL and V GNDH . This leads to power saving in the dynamic node since the power consumption is proportional to the voltage swing. The main drawback of this technique will be the leakage current generated in the succeeding stage if the following stage is to be a full swing gate. In addition, converting the low swing voltage levels to the full swing voltage levels is a non-trivial task. Moreover, if V DDL and V GNDH are generated by diode connected MOSFETs, the voltage swing will be affected by process variations.
The idea of reduced voltage swing is applied to a wide fan-in OR gate in [16] (Fig. 1-e) . In this case only the voltage swing across the large capacitance at the output of the PDN is reduced and the rest of the circuit experiences full swing. This technique adds a pre-evaluation phase to the timing of the circuit, which increases the delay of the gate. Moreover, the low voltage swing at the dynamic node reduces the noise margin of the circuit. Another constraint of this logic is that it needs two extra power supplies and two additional clock pulses.
E. Other Methods
In [6] parameters such as the power consumption and speed of traditional dynamic logic are optimized in order to achieve a targeted noise level. This is achieved by using resonant tunneling diodes and depletion-mode NMOS transistors to implement smart keepers. The main drawback is that these devices are not compatible with standard CMOS technology. In [17] an XOR gate is used instead of a NOT gate to turn the keeper on and off, as shown in Fig. 1-f . This allows the designer to achieve larger noise immunity. However, using an XOR gate makes the circuit more complex. Moreover, considerable noise may be imposed on the dynamic node during the clock transition.
The reader can refer to [1] and [3] for more detailed comparisons between previous works.
III. THE PROPOSED LOW POWER DYNAMIC ARCHITECTURE
In dynamic logic circuits the dynamic nodes charge to V DD in every clock cycle. This results in high power consumption, especially when the capacitance of the dynamic node is large and the dynamic node discharges very frequently in the evaluation phase. This is the case in wide fan-in NOR/OR gates. In the proposed circuit the dynamic node charges to V DD only if it is not to be discharged in the succeeding evaluation phase. Fig. 2 shows the basic circuit. In this circuit, transistor M 1 and the pull-down network (PDN) behave as in conventional dynamic logic. Instead of a PMOS pull-up transistor, an NMOS (M n ) is used. This causes the dynamic node to charge to V DD -V th,n in the pre-charge phase. In low voltage circuits it is possible to make this voltage around V DD /2 by choosing a proper size for M n with respect to the duration of the pre-charge phase. Transistor M b in Fig. 2 is the bleeder transistor, which compensates for the leakage current and the impact of power supply, substrate and input noise. A clocked inverter is used to drive the bleeder. This is another difference between the proposed circuit and its conventional counterpart. Using a clocked inverter helps to reduce the direct path current in this inverter.
The proposed circuit operates as follows. During the pre-charge phase (CLK=0) the dynamic node (Out) is charged to V DD -V th,n , which will be called V S throughout this paper. During this phase the bleeder transistor is off. In the evaluation phase (CLK=1) the clocked inverter is activated and the node Out may either discharge to GND through the PDN or go to V DD through the bleeder transistor. We discuss each case separately below.
a) The dynamic node (Out) is to be discharged in the evaluation phase:
Since this node discharges to GND from a voltage less than V DD , the power consumption is reduced compared to the case where Out is pre-charged to V DD . Ignoring the leakage and direct path currents, the power consumption of the conventional dynamic circuit can be found from the following equation [18] .
However, in the proposed circuit the dynamic power is:
Here, the activity factor of the 0⟶V DD transition is the probability of the dynamic node being charged to V DD . In the proposed architecture the 0⟶V DD transition happens only when the output node is to evaluate to 1 during the evaluation phase. Since this probability is small in wide fan-in gates, according to (3), and depending on the value of V S , we can expect a dynamic power reduction of up to 1-V S /V DD . Note that in this case the delay of the circuit is also less than that of a conventional dynamic gate, since the output is discharged to GND from V S , which is less than V DD .
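As a rough sketch of the energy accounting behind this bound, suppose the dynamic node starts each cycle at GND, and ignore leakage, direct path and keeper contention currents. The symbols $C_L$ (dynamic node capacitance) and $p_1$ (probability that the output evaluates to 1) are assumptions introduced here for illustration. Charging the node from the supply to $V_S$ draws $C_L V_{DD} V_S$ per cycle, while charging it to $V_{DD}$ draws $C_L V_{DD}^2$:

```latex
\begin{align*}
E_{\text{conv}} &= C_L V_{DD}^{2}, \\
E_{\text{prop}} &= (1-p_1)\, C_L V_{DD} V_S + p_1\, C_L V_{DD}^{2}, \\
1 - \frac{E_{\text{prop}}}{E_{\text{conv}}}
  &= (1-p_1)\left(1 - \frac{V_S}{V_{DD}}\right)
  \;\xrightarrow{\;p_1 \to 0\;}\; 1 - \frac{V_S}{V_{DD}}.
\end{align*}
```

Under these assumptions, a pre-charge level near V DD /2 would put the upper bound on the dynamic power saving at about 50%.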
b) The dynamic node (Out) is not to be discharged in the evaluation phase:
In the evaluation phase the PDN is off and the following clocked NOT gate turns on since CLK is 1. The voltage V S is large enough to turn on the NMOS of the inverter. This pulls the gate of the keeper transistor to GND and the dynamic node charges to V DD . Since this node is charged to V DD , the proposed circuit behaves like a full-swing gate when connected to the following gates. In this regard, the keeper transistor plays a crucial role.
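The two cases above can be summarized in a small behavioral sketch. This is an idealized logic-level model, not a circuit simulation: the class name, the threshold value of 0.3 V, and the assumption that every node settles fully within its phase are all illustrative choices that do not come from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ProposedGateModel:
    """Idealized model of the dynamic node Out (hypothetical helper class)."""

    vdd: float = 1.0    # supply voltage in volts (1 V, as in the simulations)
    vth_n: float = 0.3  # ASSUMED NMOS threshold in volts, illustrative only

    @property
    def vs(self) -> float:
        # Pre-charge level V_S = V_DD - V_th,n set by the NMOS pull-up.
        return self.vdd - self.vth_n

    def precharge(self) -> float:
        # CLK = 0: Out is charged to V_S and the bleeder is off.
        return self.vs

    def evaluate(self, inputs: Sequence[int]) -> float:
        # CLK = 1, case a): any high input turns the PDN on, Out goes to GND.
        if any(inputs):
            return 0.0
        # CLK = 1, case b): PDN off, the bleeder restores Out to full V_DD.
        return self.vdd


gate = ProposedGateModel()
print(gate.precharge())                         # V_S for the assumed threshold
print(gate.evaluate([0, 0, 1, 0, 0, 0, 0, 0]))  # prints 0.0
print(gate.evaluate([0] * 8))                   # prints 1.0
```

The model makes the key point visible: the node only ever reaches V DD in case b), so the full-swing charging cost is paid only when the output evaluates to 1.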
An important point that should be emphasized here is that charging the dynamic node to a voltage less than V DD does not affect the power consumption of the clocked inverter used for the keeper. This can be explained as follows. During the precharge phase (CLK=0) the pull-down path of the clocked inverter is off and no direct path exists between V DD and GND. At the beginning of the evaluation phase (CLK=1), the output of the clocked inverter ( $\overline{\text{Out}}$ ) is high. Hence, no current passes through M 2 irrespective of its gate voltage. In this way, the direct path current in the inverter is not more than that of a conventional gate, although the gate of M 2 is not fully charged to V DD .
In order to check the functionality of the proposed architecture, an 8-input OR/NOR gate is implemented. The waveforms of the important nodes are shown in Fig. 3 . As can be seen in this figure, most of the time the node Out is not charged to V DD , which leads to a considerable power saving. Also note that node $\overline{\text{Out}}$ stays at V DD most of the time, instead of discharging to GND in every clock cycle as in the conventional dynamic circuit, which leads to a further power saving. Moreover, if $\overline{\text{Out}}$ is connected to a static gate, the switching activity of the cascaded gate will be lower, which leads to less power consumption.
Note that in the reduced swing technique the propagation delay of the gate is increased due to the lower voltage across the gate-source terminals. In the proposed circuit the propagation delay is not degraded since this voltage is not lowered.
IV. INTRINSIC PROPERTY OF DATA-LATCHING IN THE PROPOSED CIRCUIT
A pipeline structure is one of the most [...] high performance applications, especially when the critical path is long. While this [...] higher clock frequency, it requires extra registers between stages. This comes at the expense of [...] power consumption, and timing issues. A [...] structure incorporates a chain of cascaded [...] modules alternately. These modules contain dynamic or static gates and they are separated [...]. The latch can be realized using C 2 MOS logic [...] structure as shown in Fig. 4 [19] . When the [...] the precharge phase, the following $\overline{\text{CLK}}$ module [...] evaluation phase and the intermediate latch [...] mode. The proposed dynamic architecture has an inherent data-latching property that can be used in pipelined structures. In this section we describe how data-latching is embedded in the proposed dynamic structure (Fig. 2) . During the evaluation phase, the output may go to 0 or V DD depending on the inputs [...] degrade. This can be avoided by upsizing the PMOS transistor. As can be seen in Fig. 5 [...] times that of a conventional one. As a result, the register between pipelined stages can be omitted if the proposed dynamic circuit is used (Fig. 6) . It is worth noting that this technique is race free: since the registers are omitted, a race cannot occur.
V. SIMULATION RESULTS

A. Basic Gates
In order to explore the performance of the proposed technique, we have implemented an 8-input OR/NOR gate (OR-8, NOR-8) in the 90nm CMOS technology. The supply voltage is 1V and the load is a static inverter with a fan out of 4 (FO4). The clock frequency is set to 2.4GHz. The power consumption, delay, and unity noise gain (UNG) are obtained for the circuit. In the simulations, the input is changed from 0 to 255 uniformly. In order to calculate the UNG the definition in [3] is used. The average power consumption is obtained over a long period of time so that all possible input combinations have happened. The results are shown in Table I and are compared with a similar conventional dynamic circuit.
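To illustrate why this stimulus exercises the frequent-discharge case, the following sketch counts how many of the 256 input patterns turn the pull-down network on. It assumes that "uniformly" means each 8-bit pattern from 0 to 255 is applied equally often and that the PDN is a parallel NMOS network; the function name is hypothetical.

```python
def discharge_fraction(n_inputs: int = 8) -> float:
    """Fraction of equally likely input patterns that discharge the dynamic node.

    With a parallel pull-down network, the node discharges whenever at least
    one input is high, i.e. for every pattern except the all-zeros pattern.
    """
    total = 2 ** n_inputs
    discharging = sum(1 for pattern in range(total) if pattern != 0)
    return discharging / total


print(discharge_fraction(8))  # 255 of 256 patterns, prints 0.99609375
```

Under these assumptions the dynamic node stays high in only one cycle out of 256, which is the situation in which pre-charging to a reduced level is expected to pay off most.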
The results are provided for two cases. For the NOR-8 gate the output is taken from Out, while for the OR-8 gate the output is taken from $\overline{\text{Out}}$ . In the case of OR-8, since the load is connected to $\overline{\text{Out}}$ , the transistors of the output inverter are sized up.
According to Table I , the proposed structure is very effective in reducing the power consumption, while the other parameters are not changed considerably. Thus, the normalized FOM of the proposed circuit is larger. As can be seen in this table, the proposed circuit consumes less power for the case of the NOR gate compared to the OR gate. This is due to the fact that in the OR gate the capacitance of the dynamic node is smaller. In order to check the robustness of the proposed technique against process variations, the NOR-8 circuit is simulated in all process corners. The results are shown in Table II . The circuit operates properly in all process corners and the power saving remains considerable in all of them. For the sake of comparison, an OR-16 gate is simulated and its performance is compared with a few state of the art dynamic circuits. The results are shown in Table III . The simulations are done under the same conditions as for the OR-8 gate explained above. All the gates are designed such that they have the same UNG (305 mV). As can be seen in Table III , HS domino, CKD domino and the XOR-based keeper have better performance compared to the conventional dynamic circuit. Although the reduced dynamic swing inherently offers higher UNG (due to the diode inserted in the foot of the circuit), its performance and power dissipation are not as good as those of the other circuits. According to Table III , the proposed circuit is very effective in saving power and PDP, while it requires a minimal number of extra transistors.
B. Pipelined Architecture Using the Concept of Self-Data Latching
A NOR-4 pipelined with a 4-input OR gate is selected as the benchmark circuit to show the inherent data latching property of the proposed circuit. The designed circuit is depicted in Fig. 7 . As can be seen in this figure, the NOR-4 gate operates with the rising edge of the CLK signal and the 4-input OR gate operates with the rising edge of the $\overline{\text{CLK}}$ signal. This structure can be used as a PLA in which the first stage (NOR gate) operates as the AND plane [11] . This is because using De Morgan's theorem we can write:
$\overline{A + B + C + D} = \overline{A} \cdot \overline{B} \cdot \overline{C} \cdot \overline{D}$
For this purpose, we suppose that the inputs of the NOR gate are inverted. The second stage works as the OR plane. Simulation results are reported in Table IV . This table shows the benefits of the proposed architecture in PDP reduction without considerable speed degradation. It is assumed that the inputs of the second stage are delayed by half of the clock period. This simulation is carried out in the 1-V 90-nm CMOS technology at a clock frequency of 1GHz. The output load is considered to be a 4×minimum-size inverter.
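The use of the NOR stage as an AND plane can be checked exhaustively with a short sketch. The helper names below are hypothetical; the code only verifies the Boolean identity for a 4-input gate and says nothing about timing or latching.

```python
from itertools import product


def nor(*bits: int) -> int:
    """NOR of any number of 0/1 inputs."""
    return int(not any(bits))


def and_of_complements(*bits: int) -> int:
    """AND of the inverted inputs."""
    return int(all(not b for b in bits))


def demorgan_holds(n_inputs: int = 4) -> bool:
    """Check NOR(x) == AND(not x) for every input combination."""
    return all(
        nor(*bits) == and_of_complements(*bits)
        for bits in product((0, 1), repeat=n_inputs)
    )


def and_plane(*bits: int) -> int:
    """AND term realized by feeding inverted inputs to the NOR stage."""
    return nor(*(1 - b for b in bits))


print(demorgan_holds(4))      # prints True
print(and_plane(1, 1, 1, 1))  # prints 1
print(and_plane(1, 0, 1, 1))  # prints 0
```

Feeding inverted inputs to the NOR gate therefore produces the AND of the original inputs, which is what allows the first stage to serve as the AND plane.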
VI. CONCLUSIONS
A new technique for designing wide fan-in dynamic logic circuits is presented that modifies the charging scheme of the highly capacitive dynamic node. The voltage of the dynamic node is kept below V DD in the precharge phase, and the dynamic node is charged to V DD at the beginning of the evaluation phase only if the output is to evaluate to 1.
This technique is very efficient in reducing the power consumption of wide fan-in dynamic gates. The proposed technique is applied to 8- and 16-input OR/NOR gates and the performance is compared with that of a few previously presented circuits. We also show that, using the proposed charging scheme, it is possible to lower the circuit area on silicon by omitting the positive latches between stages in pipelined architectures.
