Abstract-This paper presents two domino logic circuits that employ a footer transistor that is initially OFF in the evaluation phase to reduce leakage and is then turned ON to complete the evaluation. In addition, a new circuit using a NAND gate is introduced that improves performance by 10%-15% compared with the previously proposed circuit. According to simulations in a predictive 70-nm process, the proposed circuits increase noise immunity by up to 26X for wide OR gates and show a performance improvement of up to 20% compared to conventional domino logic circuits. The proposed circuits reduce the contention between the keeper transistor and the NMOS evaluation transistors at the beginning of the evaluation phase. High fan-in comparators and multiplexers built with the proposed circuits demonstrate higher noise immunity than previously proposed designs.
I. INTRODUCTION
High fan-in compact dynamic gates are often used in performance-critical units of microprocessors. However, wide dynamic gates are strongly affected by subthreshold leakage and noise sources [1]. This is mainly due to the reduced threshold voltage, which exponentially increases leakage current in scaled technologies. To reduce power consumption, the supply voltage is scaled down with each technology generation. However, the threshold voltage must be scaled down as well to maintain transistor overdrive for large ON currents. A lower threshold voltage means a lower gate switching trip point in domino circuits, and a lower trip point makes the domino circuit more prone to input noise. Moreover, excessive leakage can discharge the precharge (dynamic) node of a domino circuit, resulting in a logic failure (wrong evaluation). In addition to the reduced trip point and increased leakage, other noise sources such as supply noise and crosstalk noise also increase with technology scaling, further degrading the robustness of domino logic. Fig. 1 shows different noise sources and how they impact the robustness of domino logic [2]. The conventional approach for improving the robustness of domino circuits is keeper transistor upsizing. However, as the keeper transistor is upsized, the contention between the keeper transistor and the NMOS evaluation network increases in the evaluation phase. This current contention increases the evaluation delay of the circuit and increases power dissipation. Thus, keeper upsizing trades off delay and power to improve noise and leakage immunity. Such a trade-off is not acceptable when it makes the circuit too slow or too power hungry. Several techniques have been proposed in the literature to address this issue. High-speed domino logic [3] and conditional keeper [4] are among the most effective solutions for improving the robustness of domino logic.
Fig. 1. The main sources of noise in domino logic circuits [2]
In this paper, we propose new domino circuits for high fan-in and high-speed applications in ultra-deep-submicron technologies. The proposed circuits employ a footer transistor that is initially OFF in the evaluation phase to reduce leakage and is then turned ON to complete the evaluation. To avoid the delay penalty of an initially OFF footer transistor, an extra evaluation path is provided that is controlled by the output. According to simulations in a predictive 70-nm process [5], the proposed circuits increase noise immunity by up to 26X for wide OR gates and show a performance improvement of up to 20% compared to conventional domino logic circuits. The proposed circuits reduce the contention between the keeper transistor and the NMOS evaluation transistors at the beginning of the evaluation phase, which results in less power dissipation. The rest of the paper is organized as follows. Section II describes the proposed circuits, and Sections III and IV apply them to multiplexers and high fan-in comparators. The simulation results are presented in Section V. Finally, Section VI concludes the paper.
II. PROPOSED DOMINO LOGIC CIRCUITS
Many circuits have been proposed to reduce leakage current and total power consumption. One existing leakage-tolerant domino circuit is high-speed domino logic (HS), which is described in [3]. Another is conditional keeper domino logic (CKL) [4], in which a conditional keeper that is turned on during the hold time is employed to overcome the leakage through the parallel NMOS network. The schematic of our proposed circuit is shown in Fig. 2 (referred to as proposed circuit-1). Proposed circuit-1 employs the stacking effect (by adding the footer transistor MN1) to improve noise immunity and uses the steady-state voltage of the N_FOOT node at the beginning of the evaluation phase to reduce leakage of the evaluation network. The operation of the circuit is analyzed below for the different operational modes.
a) Precharge mode:
When the clock is low, the circuit is in the precharge phase. MP1 is turned on and the dynamic node starts charging to VDD. In addition, the PMOS keeper transistor (MP2) is turned on, helping the precharge. At the beginning of the precharge phase, MN1 is on, so it pulls the N_FOOT node to ground. Meanwhile, node GMN2 is low and MN2 is off. After the delay of the inverters (delay element), MN1 is turned off, and the voltage of N_FOOT rises to an intermediate level. The evaluation transistors are sized such that the DC voltage of the GMN2 node does not exceed the threshold voltage of MN2, which avoids any short-circuit current in the precharge phase. We have selected MN2 to be larger than the other NMOS transistors.
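The footer control described above can be illustrated with a minimal behavioral sketch. It assumes that the gate of MN1 is driven by a non-inverting, delayed copy of the clock, which is consistent with MN1 being ON at the start of precharge and OFF at the start of evaluation. The function names, the sampled clock, and the two-sample delay are illustrative assumptions, not values taken from the design.

```python
def footer_gate(clk, delay):
    """Gate signal of footer MN1, modeled as the clock delayed by `delay` samples.

    ASSUMPTION: the delay element is non-inverting and `clk` holds a whole
    number of clock periods, so the delay is applied as a cyclic shift.
    """
    n = len(clk)
    return [clk[(i - delay) % n] for i in range(n)]


def mn1_states(clk, delay):
    """Pair each clock sample with the phase name and the ON/OFF state of MN1."""
    states = []
    for c, g in zip(clk, footer_gate(clk, delay)):
        phase = "evaluate" if c else "precharge"
        states.append((phase, "ON" if g else "OFF"))
    return states


if __name__ == "__main__":
    # One clock period: four precharge samples (0), four evaluation samples (1).
    clock = [0, 0, 0, 0, 1, 1, 1, 1]
    for step, (phase, state) in enumerate(mn1_states(clock, delay=2)):
        print(step, phase, "MN1", state)
```

With this model, MN1 is ON for the first samples of precharge and OFF for the first samples of evaluation, and it switches state once the assumed delay has elapsed.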
b) All inputs at zero in evaluation phase:
At the beginning of the evaluation phase, the NMOS footer transistor MN1 is off, so the N_FOOT node is floating and its voltage settles to a DC level. If this voltage exceeds $|V_{t,MP3}| + V_{OUT}$, where $V_{t,MP3}$ is the threshold voltage of MP3, MP3 is turned on. In other words:
$$V_{N\_FOOT} > |V_{t,MP3}| + V_{OUT} \qquad (1)$$
In that case, the GMN2 node is charged to $V_{N\_FOOT}$, and therefore MN2 is turned on if:
$$V_{GMN2} = V_{N\_FOOT} > V_{t,MN2} \qquad (2)$$
If condition (2) is satisfied (MN2 is turned on), a wrong evaluation occurs. However, in our design we have sized MN1, MN2, MP3, and MP4 considering the voltage of GMN2, such that conditions (1) and (2) are not satisfied. Therefore, the DC voltage of N_FOOT acts as a source bias for the evaluation network without affecting the functionality of the circuit. This DC voltage substantially reduces the leakage of the evaluation network, resulting in significant leakage tolerance. The proposed circuit also has significant immunity to input noise: because the source terminals of the NMOS transistors in the evaluation network sit at this DC voltage, their threshold voltage increases. Thus, their trip point increases and, due to the stacking effect, their subthreshold leakage current is reduced significantly. In our proposed circuit, performance improvement is achieved by upsizing the MN2 transistor. This is further described in the following subsection.
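The sizing constraint and the benefit of source biasing can be made concrete with a short numerical sketch. It checks the two turn-on conditions discussed above (MP3 conducts when the N_FOOT voltage exceeds the MP3 threshold magnitude plus the output voltage; MN2 then conducts when the same voltage exceeds the MN2 threshold) and estimates the leakage reduction with the standard first-order subthreshold model, in which current scales as exp((V_GS - V_t)/(n v_T)). The threshold values, the slope factor n = 1.5, and all function names are assumptions chosen for illustration; body effect and DIBL are ignored, and both would reduce leakage further.

```python
import math

K_OVER_Q = 8.617333e-5  # Boltzmann constant over electron charge, in V/K


def mp3_turns_on(v_nfoot, v_out, vt_mp3):
    """Condition (1): MP3 conducts when V_N_FOOT > |Vt_MP3| + V_OUT."""
    return v_nfoot > abs(vt_mp3) + v_out


def mn2_turns_on(v_nfoot, v_out, vt_mp3, vt_mn2):
    """Condition (2): GMN2 is charged to V_N_FOOT through MP3, and MN2
    conducts when that voltage exceeds its threshold."""
    return mp3_turns_on(v_nfoot, v_out, vt_mp3) and v_nfoot > vt_mn2


def leakage_reduction_factor(v_source, n=1.5, temp_c=110.0):
    """Ratio of grounded-source to source-biased subthreshold leakage for an
    OFF transistor with its gate at 0 V (first-order model, ASSUMED n)."""
    v_thermal = K_OVER_Q * (temp_c + 273.15)
    return math.exp(v_source / (n * v_thermal))


if __name__ == "__main__":
    # Hypothetical operating point: N_FOOT settles at 0.1 V, output is low.
    print("MP3 on:", mp3_turns_on(0.1, 0.0, vt_mp3=-0.2))
    print("False evaluation:", mn2_turns_on(0.1, 0.0, vt_mp3=-0.2, vt_mn2=0.2))
    print("Leakage reduced by a factor of", leakage_reduction_factor(0.1))
```

In this sketch a footer voltage below both thresholds keeps MP3 and MN2 off, while even a modest source bias reduces the modeled leakage exponentially.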
c) An input switching high in evaluation phase:
The waveforms of the circuit in this mode are shown in Fig. 3. As observed, the increased voltage of the N_FOOT node at the beginning of the evaluation phase causes MP3 to turn on. Therefore, the GMN2 node is charged to the voltage of the N_FOOT node, which rises above the threshold voltage of MN2. The NMOS transistor MN2 is thus turned on at the onset of the evaluation phase (while the footer transistor MN1 is still off), connecting the dynamic node to ground. After the delay of the delay element, the N_FOOT node is pulled strongly to zero, and MP3 switches off. Since the output node is now high, it turns on MN3, which connects the GMN2 node to ground and turns MN2 off. The rest of the evaluation phase (discharging of the dynamic node) then completes through the evaluation network and the footer transistor, which is fully on. This gives more degrees of freedom for increasing speed or enhancing noise immunity. For example, to improve speed, upsizing MP3, MN2, MN1, or the evaluation transistors are all options.
To further improve the evaluation speed of proposed circuit-1, we add an extra circuit, shown in Fig. 4 (referred to as proposed circuit-2). After the initial period of evaluation, the dynamic node starts to discharge (when at least one of the inputs is high), and the gate input of the small keeper transistor MP2 starts to go high. Both inputs of the NAND gate are then at logic high, so the output of the NAND gate goes low and the gate of the NMOS transistor MD goes high, which helps discharge the dynamic node. The speed of the proposed circuit is thereby improved significantly. At all other times, at least one of the inputs of the NAND gate is 0, so the gate voltage of the MD transistor is low and MD is switched off.
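The control of the extra discharge transistor MD in proposed circuit-2 reduces to a small piece of Boolean logic, sketched below. The text states that MD is driven high only when both NAND inputs are high, which implies an inversion between the NAND output and the gate of MD; that inversion, and the generic input names `in_a` and `in_b`, are assumptions, since the exact signals wired to the NAND gate are given only in Fig. 4.

```python
from itertools import product


def nand(a, b):
    """Two-input NAND gate."""
    return not (a and b)


def md_is_on(in_a, in_b):
    """MD conducts only when the NAND output is low.

    ASSUMPTION: the gate of MD is the logical inverse of the NAND output,
    as implied by "the NAND output goes low and the gate of MD goes high".
    """
    return not nand(in_a, in_b)


if __name__ == "__main__":
    # Truth table over both (hypothetical) NAND inputs.
    for in_a, in_b in product([False, True], repeat=2):
        print(int(in_a), int(in_b), "MD", "ON" if md_is_on(in_a, in_b) else "OFF")
```

MD therefore assists the discharge only in the single input combination where both NAND inputs are high and stays off otherwise, so it adds no discharge path when the gate should hold its precharged state.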
III. PROPOSED MULTIPLEXER
High fan-in dynamic MUXs are commonly used in register files to implement bit-lines [7]. In register files, because of the fairly small size of the memory, the bit-lines are implemented using wide domino MUX gates [8]. High fan-in multiplexers are also widely used in other applications, such as the column decoders of memories. In a footless domino logic (FLDL) MUX, the excessive leakage of the evaluation network can cause logic failure during the read operation. The pseudostatic bit-line, proposed in [7], is a method for improving the leakage immunity of register file bit-lines. This technique reduces the subthreshold leakage considerably. However, it incurs a considerable increase in transistor count as well as a delay penalty due to the use of many static OR gates. The proposed technique can be applied to the register file bit-line MUX as shown in Fig. 5. The worst-case scenario for noise at the inputs is when all the inputs from the memory cells are high and all the RS signals are low and receive the same noise in the evaluation phase. In the FLDL MUX, the keeper transistor is upsized from a keeper ratio of 1 to 2 in order to obtain different data points for delay and noise immunity. The FLDL MUX fails to operate for smaller keeper sizes because of high subthreshold leakage in the 70-nm technology. In the pseudostatic MUX, the keeper transistor is upsized from a keeper ratio of 0.1 to 1 to obtain the corresponding data points. In the proposed MUX, the keeper transistor is sized for a keeper ratio of 0.1 to 0.3.
Proposed MUX-2 is designed based on the configuration in Fig. 4. To obtain the different data points, we increase the keeper ratio from 0.1 to 0.3. To improve the noise immunity of this circuit, minimum-size devices can be used for the NAND gate in proposed circuit-2.
IV. PROPOSED HIGH FAN-IN COMPARATORS
The worst case for delay is when inputs A and B differ in only a single bit position. In this case, only one of the evaluation paths conducts and discharges the precharge node. The worst-case scenario for noise at the inputs is the case where all the inputs are low and receive the same noise in the evaluation phase. In the standard domino comparator, the keeper transistor is upsized from a keeper ratio of 1 to 2 in order to obtain different data points for delay and noise immunity. The standard domino comparator fails to operate for smaller keeper sizes because of high leakage in scaled technologies (70 nm). In the proposed circuit, the keeper size variations are very small: we changed the keeper size from the minimum size to 0.15 to obtain the data points. In fact, in the proposed circuit the speed is almost independent of the keeper size and is determined mainly by the sizing of MN2. We can achieve even more speed by increasing the size of the NMOS transistor MN2. The unity noise gain (UNG) is defined as the amplitude of the input noise Vnoise that causes a noise pulse of equal amplitude at Vout.
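The UNG definition above amounts to finding the input noise amplitude at which the output noise amplitude equals the input. A minimal sketch of such a search by bisection is given below. The callable `noise_out` stands in for a circuit simulation that returns the output noise amplitude for a given input noise amplitude; it, the bracket values, and the quadratic `toy_noise_transfer` are hypothetical placeholders and not a model of the proposed circuits.

```python
def find_ung(noise_out, v_lo, v_hi, tol=1e-6):
    """Return the input noise amplitude v with noise_out(v) == v, by bisection.

    `noise_out` maps an input noise amplitude (V) to the resulting output
    noise amplitude (V). The bracket must satisfy noise_out(v_lo) < v_lo
    (noise is attenuated) and noise_out(v_hi) > v_hi (noise is amplified).
    """
    def excess(v):
        return noise_out(v) - v

    if excess(v_lo) >= 0 or excess(v_hi) <= 0:
        raise ValueError("bracket does not enclose the unity-gain point")

    while v_hi - v_lo > tol:
        mid = 0.5 * (v_lo + v_hi)
        if excess(mid) < 0:
            v_lo = mid  # still attenuating: UNG lies above mid
        else:
            v_hi = mid  # amplifying: UNG lies at or below mid
    return 0.5 * (v_lo + v_hi)


def toy_noise_transfer(v):
    """Purely illustrative transfer curve (NOT a circuit model): 4 * v^2."""
    return 4.0 * v * v


if __name__ == "__main__":
    ung = find_ung(toy_noise_transfer, v_lo=0.01, v_hi=0.9)
    print("UNG of the toy curve: %.4f V" % ung)
```

In practice each call to `noise_out` would be one transient simulation at a given keeper size, and repeating the search across keeper ratios would produce one UNG value per delay data point.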
Our simulation results are obtained in the worst-case corner of the 70-nm predictive technology at 0.9 V and 110°C. The proposed circuit has better performance and a better UNG than diode-footed domino logic [8]. Proposed comparator-2 is designed using a configuration based on Fig. 3.
V. SIMULATION RESULTS
In this section, we study the behavior of our proposed circuits based on simulation results. The results are obtained using the predictive technology model of a 70-nm technology at a temperature of 110°C. The noise immunity metric used in our work is the unity noise gain (UNG) [8]. UNG is the amount of DC noise at all inputs that results in the same amount of noise at the output [8]-[10]. Therefore, a larger UNG indicates higher noise (leakage) immunity. The UNG of our proposed circuits is obtained by varying the keeper transistor size from 0.3 W_EVAL to 1 W_EVAL (W_EVAL being the width of the evaluation transistors) for both circuits. The keeper ratio is defined as the ratio of the size of the keeper transistor to the size of the evaluation transistors. Proposed circuit-1 shows a very high UNG compared to other proposed circuits and higher speed than some existing designs [9]-[18]. Therefore, proposed circuit-1 has higher performance and a much higher UNG than conventional domino circuits. The results show that the UNG improvement of our proposed circuits over conventional circuits is as large as 26X. In addition, the speed of our proposed circuit is acceptable, and it shows a 20% improvement in some cases. In proposed circuit-2, the performance is increased significantly due to the added configuration, which improves the discharging of the dynamic node during the evaluation phase (when at least one of the input signals is high). However, the UNG of this circuit is lower than that of proposed circuit-1. In the evaluation mode with all inputs at zero, we observed that the subthreshold leakage current is reduced significantly in our circuits. By sizing the transistors carefully, our proposed circuits achieve lower power dissipation than conventional circuits. Simulation results show that the power dissipation of our proposed circuits is lower than that of other domino logic circuits.
The proposed circuits employ small devices for the evaluation network, and therefore their area is smaller than that of conventional circuits [6].
Fig.8. UNG vs. delay for our proposed comparators
In summary, according to the simulation results, the proposed circuits show 3.58X to 26X UNG improvement, 10% to 41% performance enhancement, and 6% to 22% power reduction compared to existing leakage-tolerant domino techniques. Proposed circuit-2 has a lower UNG than the circuit proposed in [6], but it improves the speed by 10%-15%. We employ our two proposed circuits to design new comparators and MUXs. Fig. 7 shows the results of this experiment for a 16-input MUX in the worst-case Ioff corner of the 70-nm predictive technology at 0.9 V and 110°C. As observed from Fig. 7, the UNGs of the proposed circuits are larger than that of the FLDL design, and the proposed circuits show the best delay among all the designs. The pseudostatic implementation has the highest UNG, but its delay is larger than that of our proposed circuit. Also, proposed circuit-2 shows a higher speed than the FLDL MUX together with a higher UNG. For comparator circuit-1, as observed from Fig. 8, the UNG of the proposed circuit is considerably larger than that of the footless domino logic comparator (FLDLC), and the delay of the proposed design is comparable to the best delay of the standard domino design. For proposed comparator-2, the speed is improved compared to the FLDL comparator in some cases. However, proposed circuit-2 has an area overhead of more than 10% compared with the other circuits.
VI. CONCLUSIONS
We have proposed new leakage-tolerant, high-speed domino logic circuits with reduced power dissipation. These circuits achieve excellent noise immunity and higher speed compared to existing domino circuits. The proposed techniques use a small keeper transistor to reduce power dissipation. The proposed circuits have also been employed in comparator and MUX circuits, where the results compare favorably with previous works. The results demonstrate the leakage tolerance obtained by using a footer transistor.
