ABSTRACT
INTRODUCTION
The rapid advancement of VLSI has been driven by the growing use of battery-operated devices such as laptops, PDAs and mobile phones; advances in wireless communication and computation have further created demand for low power budgets and compactness. To achieve this, transistor sizes have been scaled down continually, and supply voltages have been scaled with them to keep the devices operating properly. As the technology scales aggressively, device density on the chip increases and, with it, interconnect density, which raises the coupling capacitance of the circuit. This leads to stronger interaction between interconnects and therefore to more crosstalk and system failures. On the other hand, as the supply voltage decreases, the transistor threshold voltage is lowered to preserve system throughput, so leakage currents increase and the noise margins of the gates are reduced.
Dynamic logic circuits are widely used in high-speed, low-power applications such as microprocessors, digital signal processors and dynamic memories because of their low device count, high speed, freedom from short-circuit power and glitch-free operation [2]. A dynamic logic gate can also be made smaller than its static counterpart. A dynamic gate consists of a pull-down network that realizes the logic function; the circuit is pre-charged and evaluated in every clock cycle. At high clock frequencies, a large amount of noise is induced and power consumption increases. The main drawbacks of dynamic logic are charge sharing and the difficulty of cascading gates directly. Domino logic is used to overcome these problems: a dynamic gate followed by a static inverter is called a domino gate. Domino gates run faster than static gates because they present a much lower input capacitance for the same output current and have a lower switching threshold.
Leakage immunity is of greater concern in high fan-in circuits, because the larger number of parallel evaluation paths produces more leakage. Since the leakage current grows in proportion to the fan-in, noise immunity decreases as fan-in increases. Leakage and noise immunity are therefore major issues for high fan-in domino circuits, in which the evaluation transistors are all in parallel and leak charge from the pre-charged dynamic node. In this paper we propose a technique to reduce power and to increase the speed and noise immunity of a high fan-in domino gate.
PROBLEM STATEMENT
The basic domino logic stage consists of a pull-down network in which the logic is realized with N-MOS transistors, and a pull-up network consisting of a single P-MOS transistor (Mp) that pre-charges the dynamic node to logic high, as shown in Fig.1. The dynamic node drives a static inverter, from which the gate output is taken; this output can be connected to an N-MOS input of the next stage [1]. When clock = 0 (pre-charge phase), the dynamic node charges to Vdd, and the bottom N-MOS transistor Mn, which is off in this phase, ensures that the charge on the dynamic node is held irrespective of the input combination applied to the pull-down network. The output therefore goes to logic 0 during this interval. When clock = 1 (evaluation phase), the pre-charge transistor Mp turns off and the dynamic node settles to a state determined by the inputs. Depending on the logic implemented, either the dynamic node retains its charge at logic 1 and the output remains at logic 0, or the dynamic node is discharged to logic 0 and the output rises to logic 1.
During the evaluation phase, when all inputs are at logic 0, the dynamic node should remain at logic 1, but the pull-down network leaks the charge stored on the dynamic node through sub-threshold leakage. This loss is compensated by a P-MOS keeper (Fig.2), which restores the charge on the dynamic node. However, when a noise pulse at any input causes the pull-down network to provide a direct path to ground, the keeper may not be able to retain the charge, and the dynamic node is wrongly discharged. Because noise in domino gates is becoming more important than area, power and delay in the sub-micrometer regime, several techniques have recently been proposed [6], [7] to reduce noise in domino circuits. All of these techniques aim at reducing the effect of noise, but they have drawbacks in terms of area, power and delay.
Section II reviews existing domino techniques, Section III describes the proposed scheme, Section IV discusses noise metrics, Section V presents the encoder design using the proposed OR gate, Section VI presents simulation results and compares them with existing schemes, and Section VII concludes the paper.
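The two clock phases described above can be summarized in a purely behavioural sketch. It models a footed domino OR gate at the logic level only; leakage, the keeper and all timing are ignored, and the function name domino_or and the dyn_prev argument are hypothetical, chosen for illustration.

```python
def domino_or(clock, inputs, dyn_prev=1):
    """Logic-level sketch of one clock phase of a footed domino OR gate.

    clock    : 0 for the pre-charge phase, 1 for the evaluation phase
    inputs   : iterable of 0/1 values applied to the parallel pull-down network
    dyn_prev : logic value already stored on the dynamic node
    Returns (dynamic_node, output).
    """
    if clock == 0:
        # Pre-charge: Mp is on and the foot transistor is off,
        # so the dynamic node is high whatever the inputs are.
        dyn = 1
    else:
        # Evaluation: Mp is off; any high input opens a path to ground.
        # A node that is already low cannot be recharged in this phase.
        dyn = 0 if any(inputs) else dyn_prev
    out = 1 - dyn  # static inverter at the gate output
    return dyn, out
```

Calling domino_or(0, [1, 0]) models pre-charge with an input already high, and domino_or(1, [1, 0]) models the subsequent evaluation in which the dynamic node discharges and the output rises.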
BACKGROUND AND RELATED WORK
To compensate for leakage at the dynamic node, a weak transistor called a keeper is used. It prevents charge loss and holds the dynamic node at a strong high level when the pull-down network is off. In the first domino proposal [3], the gate of the keeper is connected to ground, which keeps it always ON. If the pull-down network (PDN) turns ON at the beginning of the evaluation phase, the dynamic node tends to discharge through the PDN while the always-ON keeper keeps injecting charge into the node, which results in contention. This technique therefore introduces potential DC power consumption. To reduce this extra power dissipation, a feedback keeper was proposed in [4], [5]. Here the gate of the P-MOS keeper (Fig.2) is connected to the output of the static inverter. During pre-charge the dynamic node is high, the output is low and the keeper remains on. During the evaluation phase, if the pull-down network is on, the dynamic node is discharged and the output node goes to logic high, which turns the keeper off and eliminates the contention.
In [6] a diode-footed domino was proposed (Fig.3), in which an NMOS transistor Mn is connected in diode configuration. In this configuration, the leakage current flowing through the PDN in the evaluation phase produces a voltage drop across the diode-connected transistor, which makes the gate-to-source voltage (Vgs) of the off pull-down transistors negative and thus reduces the leakage current. The resulting performance degradation can be compensated by the mirror network. The inverted clock increases the capacitive load on the clock driver, and the associated network increases the power and area requirements.
In [7] the circuit is based on a pull-up network consisting only of n-MOS transistors; this style has no p-MOS transistor. When the clock is low, the pre-charge transistor M1 is switched on and the dynamic node is set to 0 V. When the clock is high, M1 is off and the dynamic node is conditionally charged towards Vdd by the pull-up network. Because the pull-up path contains only n-MOS transistors, the node reaches only Vdd-Vth, and this drop is compensated by the keeper transistor M2. The circuit needs an inverted clock, which increases the capacitive load and requires additional area for the clock inversion.
In [8] an additional evaluation transistor M5 is added in order to stack M3 and make its gate-to-source voltage small, which makes the circuit more robust to noise and reduces its leakage power (Fig. 4). The performance degradation is compensated by widening the keeper transistor M2.
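The difference between the grounded-gate keeper and the feedback keeper can be illustrated with a small logic-level sketch. It only captures when the keeper conducts and when it fights the pull-down network; the style labels "always_on" and "feedback" and both function names are hypothetical.

```python
def keeper_is_on(style, out):
    """Return True if the P-MOS keeper conducts.

    style : "always_on" (keeper gate tied to ground) or
            "feedback"  (keeper gate driven by the inverter output)
    out   : current logic value of the gate output (0 or 1)
    """
    if style == "always_on":
        return True
    if style == "feedback":
        # A P-MOS device conducts when its gate is low.
        return out == 0
    raise ValueError("unknown keeper style: " + repr(style))


def contention(style, pdn_on, out):
    """Keeper and pull-down network conduct at the same time."""
    return bool(pdn_on) and keeper_is_on(style, out)
```

In this sketch the feedback keeper still contends with the pull-down network while the output is low at the start of evaluation, but the contention disappears once the output has risen, whereas the grounded-gate keeper contends for as long as the pull-down network is on.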
PROPOSED SCHEME
The proposed domino circuit is shown in Fig.5. Transistors M3 and M5 are connected in series between the dynamic node and ground, and the gate of M3 is connected to the OUT terminal. During the evaluation phase, when the PDN is on because one or more inputs are at logic '1', transistor M4 discharges the dynamic node and the OUT terminal goes to logic '1'. M3 then turns on and, together with the PDN and M4, helps to discharge any accumulated charge on dyn_node more quickly. The rate of discharge can be controlled by changing the W/L ratio of M4. When all inputs are at logic '0', the output stays at logic '0', M3 remains off and dyn_node retains its charge. If any input changes from logic '0' to logic '1', the PDN conducts, and during the evaluation phase, when the clock is high, dyn_node discharges below the threshold voltage of the inverter, turning its output to logic '1'. When the output becomes logic '1', M3 turns on and provides a path that discharges dyn_node quickly, since M5 is also ON during the evaluation phase. At the same time, because the source of M6 is connected to the N-foot node, the effect of noise on the circuit is also reduced.
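The feedback action of M3 and M5 can be sketched at the logic level as follows. This is only an illustration of the description above, assuming that M5 conducts whenever the clock is high and that M3 conducts whenever OUT is high; M4, M6 and all analog behaviour are not modelled, and the function name is hypothetical.

```python
def proposed_gate_eval(inputs, clock=1):
    """Logic-level sketch of the proposed gate for one clock phase.

    Returns (dyn_node, out, aux_path_on), where aux_path_on is True when the
    stacked M3/M5 path between dyn_node and ground conducts.
    """
    dyn = 1  # dyn_node starts pre-charged
    if clock == 1 and any(inputs):
        dyn = 0  # PDN conducts during evaluation and discharges dyn_node
    out = 1 - dyn  # static inverter

    m3_on = out == 1    # gate of M3 is tied to OUT
    m5_on = clock == 1  # assumption: M5 is on during the evaluation phase
    aux_path_on = m3_on and m5_on
    return dyn, out, aux_path_on
```

With all inputs low the auxiliary path stays off and dyn_node keeps its charge; with any input high during evaluation the output rises and the M3/M5 path turns on to assist the discharge.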
NOISE METRICS
Noise that can cause unrecoverable logic errors has a high amplitude and a large pulse width. As the amplitude and pulse width decrease, the effect of the noise on dynamic circuits diminishes. This can be better understood using noise immunity curves (NICs) [9], [10]. Fig.6 shows two typical NICs, NIC1 and NIC2, where every point on or above a curve represents a combination of noise amplitude (Ai) and noise pulse width (Wi) that causes a logic error. Amplitude and width combinations that lie below the curve do not affect the behavior of the circuit. A circuit whose NIC lies higher is more robust than the others. These (Ai, Wi) pairs are used to calculate the ANTE (Average Noise Threshold Energy) metric
where i runs over the points defining the noise immunity curve. A higher ANTE implies that more energy is required to generate a logic failure.
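A minimal sketch of the ANTE calculation follows, assuming the definition commonly used in the noise-tolerance literature, in which ANTE is the average of Ai^2 * Wi over the points of the noise immunity curve. The function name is hypothetical, and any points passed to it in an example are illustrative rather than measured values from this work.

```python
def ante(points):
    """Average Noise Threshold Energy of a noise immunity curve.

    points : sequence of (amplitude_in_volts, width_in_seconds) pairs,
             one pair per point on the curve
    Assumed definition: mean of amplitude**2 * width over all points.
    """
    points = list(points)
    if not points:
        raise ValueError("at least one (amplitude, width) pair is required")
    return sum(a * a * w for a, w in points) / len(points)
```

Because the amplitude enters squared, a curve that is shifted upwards in amplitude raises ANTE more strongly than one that is merely extended in pulse width.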
ENCODER DESIGN
An encoder has 2^n inputs and n outputs. Taken together, the output lines generate the binary code corresponding to the active input. The encoder can be implemented with OR gates whose inputs are determined directly from the truth table. A 4-to-2 encoder and an octal-to-binary encoder are designed using the proposed OR circuit.
For a 4-to-2 encoder the outputs out0 and out1 are given by
out0 = in1 + in3 (2)
out1 = in2 + in3 (3)
where in0, in1, in2 and in3 are the inputs to the encoder.
Similarly, for an octal-to-binary encoder the outputs out0, out1 and out2 are defined as
out0 = in1 + in3 + in5 + in7 (4)
out1 = in2 + in3 + in6 + in7 (5)
out2 = in4 + in5 + in6 + in7 (6)
where in0, in1, in2, in3, in4, in5, in6 and in7 are the inputs to the 8-to-3 encoder.
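The OR equations for the octal-to-binary encoder can be checked with a short logic-level sketch. It assumes one-hot inputs, as is usual for a plain encoder, and the function name encode_8to3 is hypothetical.

```python
def encode_8to3(inputs):
    """Octal-to-binary encoder built from the three OR equations.

    inputs : sequence of eight 0/1 values, in0 .. in7 (one-hot assumed)
    Returns (out2, out1, out0).
    """
    if len(inputs) != 8:
        raise ValueError("exactly eight inputs are required")
    i = inputs
    out0 = i[1] | i[3] | i[5] | i[7]
    out1 = i[2] | i[3] | i[6] | i[7]
    out2 = i[4] | i[5] | i[6] | i[7]
    return out2, out1, out0


def one_hot(k, width=8):
    """Helper for the sketch: input vector with only line k asserted."""
    return [1 if n == k else 0 for n in range(width)]
```

For each k from 0 to 7, encode_8to3(one_hot(k)) yields the three-bit binary code of k; note that in0 does not appear in any equation, so asserting in0 gives the same output as asserting no input at all.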
SIMULATION AND RESULTS

a. Power and Delay measurements
The circuits were simulated using Tanner T-Spice 15.0 with 16nm PTM technology files and a 1V supply. The proposed OR gate is compared with OR gates built with the existing techniques. The OR gate was chosen because it is a typical example of a wide pull-down network. The proposed technique is found to perform better than the existing techniques. Power and delay were measured using T-Spice, and the power-delay product (PDP) was calculated from them. Delay for the various circuits is measured using the window technique: the inputs are set to the required combination during the pre-charge phase, the clock is made high to start the evaluation phase, and the delay is measured with respect to the start of the evaluation phase. Table 1 shows the power and delay measured for the basic domino with keeper, for the scheme proposed in [8] and for the proposed technique at various supply voltages (with a 20% variation from the nominal voltage). Table 2 and Chart 1 show the PDP of the proposed circuit in comparison with the previous circuits. A plot of the effect of supply voltage on delay is shown in Fig.8, and Fig.9 shows the simulated waveforms of a two-input OR gate for the various techniques. As can be seen from the waveforms, the ripple in the output is smaller in the proposed technique than in the others, which reduces power dissipation. The output fall time is also shorter than in the other schemes, which results in a lower PDP.
Table 1. Power and delay comparison of the proposed scheme with existing schemes
Table 2. PDP comparison of the proposed scheme with existing schemes
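The PDP figures and the percentage comparisons follow from simple arithmetic on the measured power and delay, sketched below. The function names are hypothetical, and any numbers supplied to them in an example are placeholders, not the measured values of Table 1 or Table 2.

```python
def pdp(power_w, delay_s):
    """Power-delay product in joules from power (W) and delay (s)."""
    return power_w * delay_s


def percent_reduction(baseline, proposed):
    """Reduction of `proposed` relative to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - proposed) / baseline
```

For instance, percent_reduction(pdp(p_ref, d_ref), pdp(p_new, d_new)) expresses how much lower the PDP of one scheme is than that of a reference scheme, given their measured power and delay values.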
b. Noise analysis
The proposed circuit and the other circuits are subjected to noise pulses of different pulse widths, with a clock period of 4 ns (250 MHz), and the noise-pulse amplitude required to produce a logic failure is measured for each width. Table 3 shows the resulting amplitude and width pairs (Ai, Wi) for the scheme of [8] and for the proposed scheme, for noise pulse widths from 400 ps to 2 ns.
Table 3. Noise pulse width vs. amplitude
The calculated ANTE is 1.55e-9 J for the scheme of [8] and 1.57e-9 J for the proposed scheme. The proposed scheme shows an improvement of 1.83 percent in Average Noise Threshold Energy. Fig.11 shows the noise immunity curves of both schemes.
c. Encoder simulation results
A 4-to-2 encoder designed with the proposed technique is simulated for all input combinations, and Fig.11 shows the simulation results. Fig.12 shows the simulation results of the octal-to-binary encoder for all input combinations.
CONCLUSION
In this paper we have proposed a high-speed, low-PDP domino logic circuit that exhibits some noise tolerance at the output node. Simulations were carried out using Tanner T-Spice with PTM 16nm low-power technology files. The results show that the proposed design performs better than the previous designs, offering about a 29% reduction in delay and a 3.5% reduction in PDP. The Average Noise Threshold Energy (ANTE) was calculated and found to increase by 1.83%. An octal-to-binary encoder was designed and simulated with the proposed technique.
