This paper describes the Differential Pass Transistor Pulsed Latch (DPTPL), which improves D-Q delay and reduces power consumption by using NMOS pass transistors and feedback PMOS transistors. The proposed flip-flop exploits the fact that an NMOS pass transistor has stronger drive capability than a transmission gate of the same total transistor width. Positive-feedback PMOS transistors speed up the latch and guarantee full swing on the internal nodes. In addition, the power consumption of the proposed pulsed latch is reduced significantly because its clock load and total transistor width are smaller than those of conventional differential flip-flops. DPTPL reduces E × D by 45.5% compared with ep-SFF. The simulations were performed in a 0.1 µm CMOS technology at a 1.2 V supply voltage with a 1.25 GHz clock frequency.
Introduction
As microprocessor clock frequencies rise through advanced process technology and deep pipelining, the clock period gets shorter and flip-flop overhead takes up a larger fraction of it. Because the clock period can shrink to as little as 6 to 8 FO4 delays [1], high-speed flip-flops are in strong demand. Flip-flop power consumption is another significant problem in many digital systems. In a recent high-frequency microprocessor, the clocking system consumed 70% of the total chip power [2]. Within the clocking system, 90% of the power is consumed by the flip-flops. It is therefore important to reduce flip-flop power consumption. In addition, because deep pipeline architectures require more flip-flops, the area overhead of flip-flops has become a serious problem.
Many flip-flops in the literature reduce delay, power consumption, or both. The Master-Slave Latch (MSL) is a good candidate for low-power applications [3], [4]. The hybrid latch flip-flop (HLFF) and the semi-dynamic flip-flop (SDFF) achieve small delay at the cost of power consumption [4]-[6]. The sense-amplifier-based flip-flop (SAFF) and the modified sense-amplifier-based flip-flop (MSAFF) are differential-type flip-flops [4], [7]. The ep-SFF offers lower power consumption and small delay [8]. In addition, the reduced clock-swing flip-flop (RCSFF) and the low-swing clock double-edge-triggered flip-flop use a reduced clock swing to lower the power consumption of clock networks [9], [10]. The modified SDFF (MSDFF) is one of the fastest flip-flops [11], but it still consumes a large amount of power. This paper proposes pulsed latches using pass-transistor logic that offer high speed, low power consumption, and a simple structure. To overcome the voltage drop of pass-transistor logic for 'High' input data, the proposed pulsed latches use positive-feedback PMOS transistors to restore the full VDD level. The rest of this paper describes the conventional and proposed flip-flops and compares their simulated D-Q (data to Q) delay, total transistor width, setup time, power, P × D (power-delay product), and E × D (energy-delay product).
Conventional Flip-Flops
In this section, several conventional flip-flops are described. The MSL, built from a pair of transmission-gate master and slave latches, is reported as a low-power flip-flop. Although the Clk-Q (clock to Q) delay of MSL is small, its large setup time makes its D-Q delay relatively large. The positive setup time of MSL also makes slack borrowing, which uses time left over by the previous pipeline partition, difficult. Both HLFF and SDFF have been reported as fast flip-flops. Their D-Q delay is smaller than that of MSL because they have negative setup times. However, both have two disadvantages. First, they consume much more power because they use dynamic circuits. Second, the Q output can show a voltage bump when 'High' input data arrives while output Q is 'Low.' MSDFF improves on SDFF by reducing the D-Q delay and avoiding this glitch, which wastes power. However, MSDFF still consumes more power than MSL, HLFF, and ep-SFF.
SAFF and its modified version (MSAFF) are based on a sense amplifier and sense a small difference between inputs D and Db. However, SAFF has asymmetric rise and fall times because its SR latch is a speed bottleneck. SAFF also incurs a large area cost and large power consumption because it has many transistors. MSAFF has symmetric rise and fall times and a shorter D-Q delay than SAFF, but it still uses many transistors and therefore incurs a large area cost. The ep-SFF is a single latch driven by a pulsed clock generator, as shown in Fig. 1. The pulsed clock generator provides the pulsed clock shown in Fig. 1(c), and the width of the short pulse is set by the delay of a three-stage inverter chain. The ep-SFF has a fair D-Q delay, consumes little energy, and occupies a small area.
Copyright © 2006 The Institute of Electronics, Information and Communication Engineers

Proposed Flip-Flop Design
Figure 2 shows the proposed differential pass transistor pulsed latch (DPTPL). DPTPL is a differential flip-flop with two data inputs and two outputs. It consists of two parts: a pulsed clock generator and a static latch. The static latch in turn consists of four parts: pass transistors, feedback PMOS transistors, clocked feedback NMOS transistors, and output drivers. In general, an NMOS transistor has higher carrier mobility than a PMOS transistor. Because an NMOS pass transistor therefore has better drive strength than a transmission gate of equivalent total size, the PMOS transistor (P1) of the ep-SFF transmission gate is removed. However, an NMOS pass transistor cannot pass a full 'High' level: its output rises only to about VDD − Vth. Therefore, PMOS transistors are connected to internal nodes A and B to prevent this Vth drop. Because these PMOS transistors are cross-coupled, they not only eliminate the Vth drop but also shorten the evaluation time of DPTPL through positive feedback.
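The Vth-drop argument can be illustrated with a minimal switch-level sketch. It assumes an ideal NMOS switch that stops charging its node at VDD − Vth, a hypothetical threshold `VTH_N` of 0.35 V with body effect ignored, and an assumed VDD/2 switching point for the cross-coupled PMOS pair. The function names are illustrative and do not describe the authors' circuit.

```python
# Hypothetical switch-level sketch of one DPTPL input pair; NOT the authors' netlist.
VDD = 1.2     # supply voltage used in the paper's simulations (V)
VTH_N = 0.35  # assumed NMOS threshold voltage (V); body effect ignored


def nmos_pass(v_in: float, gate_on: bool, v_prev: float) -> float:
    """Voltage an NMOS pass transistor leaves on its output node.

    A '0' is passed fully, but a '1' only charges the node to about VDD - Vth,
    because the transistor cuts off once its gate-source voltage falls to Vth.
    """
    if not gate_on:
        return v_prev  # switch off: the node keeps its previous value
    return min(v_in, VDD - VTH_N)


def cross_coupled_pmos_restore(v_a: float, v_b: float) -> tuple[float, float]:
    """Each PMOS has its gate on the opposite node, so a node whose partner is
    low (below an assumed VDD/2 switching point) is pulled up to full VDD."""
    if v_b < VDD / 2 <= v_a:
        v_a = VDD
    elif v_a < VDD / 2 <= v_b:
        v_b = VDD
    return v_a, v_b


# D = 'High', Db = 'Low' while the pulsed clock is high
a = nmos_pass(VDD, True, 0.0)   # degraded high, about VDD - Vth
b = nmos_pass(0.0, True, VDD)   # full low
print(cross_coupled_pmos_restore(a, b))  # (1.2, 0.0): node A restored to VDD
```

The pass transistor alone leaves node A at a degraded level. The PMOS whose gate sits on the low node B then completes the swing, which is the full-swing guarantee described above.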
In Fig. 2(a), small feedback NMOS transistors controlled by dclkb hold the previous state while dclk is 'Low.' Whereas HLFF and SDFF use back-to-back inverters at the output node without clock control, the small feedback NMOS transistors in DPTPL are clock-controlled to prevent contention (fighting) current, which makes DPTPL faster and lowers its power consumption. DPTPL has a short D-Q delay because the path from input to output contains only one NMOS pass transistor and one inverter. Because DPTPL is symmetric, its D-Qb (data to Qb) and Db-Q delays are almost the same. The pulsed clock generator in DPTPL supplies the static latch with a pulsed clock, and the power of this generator can be a significant portion of the total power consumption of a pulsed latch. If an external Local Clock Buffer (LCB) includes the pulsed clock generator and distributes the pulsed clock to the flip-flops, the power consumed by the pulsed clock generator in each flip-flop can be reduced. Thanks to the pulsed clock, DPTPL has a large negative setup time, which makes slack borrowing possible. As shown in Fig. 2, its schematic is simpler than those of other differential flip-flops, which greatly reduces area cost.
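A pulsed clock generator whose pulse width is set by three inverters can be sketched at the sample level. This sketch assumes one sample per picosecond, a hypothetical 10 ps inverter delay, and an assumed topology in which the clock is ANDed with its own delayed, inverted copy, with the AND-gate delay ignored. It is not the exact generator of Fig. 1 or Fig. 2.

```python
# Hypothetical sample-level model of a pulsed clock generator (1 sample = 1 ps).
# Assumed topology: pulse = clk AND (clk delayed and inverted by an odd inverter chain).


def inverter_chain(signal: list[int], stages: int, t_inv: int) -> list[int]:
    """Delay a sampled 0/1 waveform by stages * t_inv samples; invert it if stages is odd."""
    delay = stages * t_inv
    invert = stages % 2
    # Before the delay has elapsed, assume the chain input sat at its initial level.
    return [(signal[t - delay] if t >= delay else signal[0]) ^ invert
            for t in range(len(signal))]


def pulsed_clock(clk: list[int], stages: int = 3, t_inv: int = 10) -> list[int]:
    """AND the clock with its delayed complement: a pulse about stages * t_inv
    wide follows every rising clock edge (AND-gate delay ignored)."""
    delayed_inv = inverter_chain(clk, stages, t_inv)
    return [c & d for c, d in zip(clk, delayed_inv)]


clk = [0] * 50 + [1] * 50          # one rising edge at t = 50 ps
pulse = pulsed_clock(clk)          # three 10 ps inverters
print(pulse.index(1), sum(pulse))  # 50 30
```

The pulse width scales directly with the inverter-chain delay. In the LCB arrangement described above, one such generator would drive many latches instead of being replicated in every flip-flop.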
DPTPL operates as follows. During the short pulse generated by the pulsed clock generator, dclk is 'High,' so the NMOS pass transistors turn on and transfer the input data to the output while the feedback NMOS transistors are off. Because the latch is transparent only during this short pulse, DPTPL can be regarded as an edge-triggered flip-flop. The PMOS transistors of DPTPL prevent the Vth drop on internal nodes A and B. When dclk is 'Low,' the pass transistors are off and the feedback NMOS transistors are on, so the small feedback NMOS transistors and the cross-coupled PMOS transistors hold the previous state.
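The operating sequence above can be captured by a logic-level model. This is a minimal sketch under stated assumptions: ideal 0/1 levels, node A following D and node B following Db while dclk is 'High', and Qb and Q driven by output inverters on A and B respectively (consistent with the Fig. 6 description, where Qb falls after node A follows D). The class name `DPTPLModel` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DPTPLModel:
    """Hypothetical logic-level model of DPTPL operation; not a circuit netlist."""
    a: int = 0  # internal node A: follows D while dclk is 'High'
    b: int = 1  # internal node B: follows Db while dclk is 'High'

    def step(self, d: int, dclk: int) -> tuple[int, int]:
        """Apply one sample of (D, dclk) and return (Q, Qb)."""
        if dclk:
            # Pulse 'High': NMOS pass transistors on, clocked feedback NMOS off.
            self.a, self.b = d, 1 - d
        # Pulse 'Low': pass transistors off; the feedback NMOS and cross-coupled
        # PMOS transistors hold A and B, so nothing changes here.
        qb = 1 - self.a  # output inverter on node A drives Qb
        q = 1 - self.b   # output inverter on node B drives Q
        return q, qb


ff = DPTPLModel()
print(ff.step(1, 0))  # (0, 1): D changed outside the pulse, previous state held
print(ff.step(1, 1))  # (1, 0): captured during the pulse
print(ff.step(0, 0))  # (1, 0): held after the pulse ends
```

Because data is accepted only while the narrow pulse is high, the model responds to D only once per clock edge, which is why the pulsed latch behaves like an edge-triggered flip-flop.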
Simulation Conditions and Test Bench
DPTPL and the conventional flip-flops are evaluated with two methods. First, all flip-flops are simulated in a 0.13 µm CMOS technology at 100 °C with a 1.2 V supply voltage and the normal process corner, at an operating clock frequency of 1.25 GHz. For a fair comparison, all flip-flops are optimized for minimum E × D with the same 25 fF output load. Figure 3 shows the block diagram used to measure delay and power. Flip-flop speed is measured as the data-to-output (D-Q) delay, which reflects the real performance of a flip-flop. Each flip-flop is designed to have balanced 'Low'-to-'High' and 'High'-to-'Low' transition delays, and the worst-case delay is selected. Second, the test circuit whose layout is shown in Fig. 4 is simulated at an operating frequency of 1 GHz. Because it is difficult to apply clk and data signals separated by very small time differences such as 5 ps or 2 ps, a signal generator that controls this time difference is needed.
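The D-Q delay measurement and the worst-case selection can be sketched as post-processing of a data-to-clock sweep. The sketch assumes one common convention: D-Q delay equals the data-to-clock offset plus the Clk-Q delay measured at that offset, the minimum over the sweep is the D-Q delay, and the offset at that minimum is the (optimum) setup time. All sweep numbers are made up for illustration and are not results from this paper.

```python
# Hypothetical post-processing of a setup-time sweep; all numbers are made up.


def min_dq_delay(sweep: list[tuple[int, int]]) -> tuple[int, int]:
    """Return (minimum D-Q delay, data-to-clock offset at that minimum).

    sweep holds (t_dc, t_cq) pairs in ps: t_dc is how long D settles before
    the clock edge (negative = after the edge) and t_cq is the Clk-Q delay
    measured at that offset. D-Q delay = t_dc + t_cq.
    """
    t_dc, t_cq = min(sweep, key=lambda p: p[0] + p[1])
    return t_dc + t_cq, t_dc


def worst_case_dq(rise_sweep: list[tuple[int, int]],
                  fall_sweep: list[tuple[int, int]]) -> int:
    """Worst of the 'Low'-to-'High' and 'High'-to-'Low' minimum D-Q delays."""
    return max(min_dq_delay(rise_sweep)[0], min_dq_delay(fall_sweep)[0])


rise = [(20, 70), (0, 75), (-20, 82), (-40, 95), (-60, 140)]
fall = [(20, 65), (0, 70), (-20, 78), (-40, 100)]
print(min_dq_delay(rise))         # (55, -40)
print(worst_case_dq(rise, fall))  # 58
```

A flip-flop whose minimum falls at a negative offset, as in the rising sweep here, has a negative setup time: data may arrive after the clock edge and still yield the shortest D-Q delay.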
As shown in Fig. 5, the delay τ of the delay cell is divided into delays of τ/2, τ/4, and τ/8 by passing through three phase interpolators. Using these delays, clk and data signals can be generated with a fine time difference. The multiplexer outputs provide two cases: the clk signal leading the data signal, and the clk signal lagging the data signal. The former case is used to measure the positive setup time and the latter the negative setup time of each flip-flop. The time difference between the clk and data signals is further controlled by Vctrl, which is applied from off-chip to adjust the delay of the delay cell. Figure 4 shows that the layout is divided into four sub-blocks. The first sub-block is for power measurement; thirty-two flip-flops are connected in it to measure power consumption. The second sub-block is the clk-leading block, used to measure the positive setup time of each flip-flop. The third sub-block is the clk-lagging block, used to measure the negative setup time of each flip-flop. The fourth sub-block is the signal-generation circuit, which provides all the signals needed by the other sub-blocks. With this circuit, the setup time, Clk-Q delay, D-Q delay, and power of DPTPL, MSL, SDFF, ep-SFF, and MSAFF can be measured.
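The τ/2, τ/4, τ/8 division can be modeled with ideal phase interpolators, each placing its output edge halfway between its two input edges. This is a minimal sketch, assuming each interpolator is fed by the undelayed edge and the previous stage's output, and assuming the multiplexer can route clk and data through any two taps. The tap arrangement and function names are hypothetical, and the 40 ps value of τ is arbitrary.

```python
from itertools import product


def interpolate(t_early: float, t_late: float) -> float:
    """Ideal phase interpolator: output edge halfway between its two input edges."""
    return (t_early + t_late) / 2


def tap_times(tau: float) -> list[float]:
    """Edge times after cascading three ideal interpolators on a delay cell of delay tau."""
    t0, t1 = 0.0, tau
    t_half = interpolate(t0, t1)           # tau/2
    t_quarter = interpolate(t0, t_half)    # tau/4
    t_eighth = interpolate(t0, t_quarter)  # tau/8
    return [t0, t_eighth, t_quarter, t_half, t1]


def clk_data_offsets(tau: float) -> list[float]:
    """All (t_clk - t_data) offsets from routing clk and data through any two taps.
    Positive: the data edge comes before the clock edge."""
    taps = tap_times(tau)
    return sorted({t_clk - t_data for t_clk, t_data in product(taps, repeat=2)})


print(tap_times(40.0))  # [0.0, 5.0, 10.0, 20.0, 40.0]
print(clk_data_offsets(40.0))
```

Every offset is a multiple of τ/8. Tuning τ through Vctrl rescales the whole set, so offsets of a few picoseconds become reachable without an external signal source that must itself resolve them.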
Simulation Results
Waveforms
The signal waveforms of DPTPL are shown in Fig. 6. Input D arrives 45 ps after the clk signal goes 'High,' and the voltage at node A then follows input D. Finally, output Qb goes 'Low' after 57 ps.
Delay
The worst-case delay of each flip-flop is plotted in Fig. 7. The setup time of the proposed DPTPL is about −45 ps. As shown in Fig. 7, because DPTPL has the largest negative setup time, its minimum D-Q delay is smaller than those of the conventional flip-flops. Consequently, the speed improvement over conventional flip-flops is up to 45%.
Power
Figure 8 presents the power comparison with 20% input data activity. The internal power shown in Fig. 8 for ep-SFF and DPTPL includes the power consumed by the pulsed clock generator. DPTPL reduces power consumption by 23% to 25% compared with conventional differential flip-flops. Among single-ended flip-flops, DPTPL consumes less power than HLFF and SDFF but more than ep-SFF and MSL. If the pulsed clock generator is embedded in a local clock buffer (LCB), additional power savings can be achieved.
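A stimulus with a given input data activity can be generated deterministically. This sketch assumes "activity" means the fraction of clock cycles in which D toggles, and it spreads the toggles evenly. It is a hypothetical stimulus for illustration, not the test vectors used for Fig. 8.

```python
import math
from fractions import Fraction


def data_pattern(n_cycles: int, activity: float, start: int = 0) -> list[int]:
    """D value per clock cycle, toggling in an evenly spread `activity` fraction of cycles."""
    act = Fraction(activity).limit_denominator(1000)  # 0.2 -> 1/5, exact arithmetic
    d, seq = start, []
    for i in range(n_cycles):
        # Toggle whenever the running count of required toggles steps up.
        if math.floor((i + 1) * act) > math.floor(i * act):
            d ^= 1
        seq.append(d)
    return seq


def measured_activity(seq: list[int], start: int = 0) -> float:
    """Fraction of cycles in which D differs from the previous cycle."""
    toggles, prev = 0, start
    for v in seq:
        toggles += v != prev
        prev = v
    return toggles / len(seq)


seq = data_pattern(100, 0.2)
print(seq[:10])                # [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
print(measured_activity(seq))  # 0.2
```

Using exact fractions avoids floating-point rounding deciding whether a boundary cycle toggles, so the pattern contains exactly 20 toggles in 100 cycles.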
PDP and EDP
The D-Q and D-Qb delays are summarized in Table 1, and the worse of the two is used to calculate both P × D and E × D. The P × D and E × D results are also presented in Table 1. DPTPL reduces P × D and E × D by up to 46% and 69%, respectively, compared with differential flip-flops. The PDP and EDP of DPTPL improve significantly because DPTPL is faster than conventional differential flip-flops and also has a smaller clock load and total transistor width. Table 1 also summarizes the general characteristics of the proposed and conventional flip-flops. Ep-SFF and DPTPL have favorable negative setup times but large hold times, so some delay padding may be needed to avoid short-path problems. The total transistor width of the proposed pulsed latch is significantly smaller than that of its counterparts. If the pulse generator is embedded in the LCB, further area and power savings can be achieved.
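The P × D and E × D figures of merit can be computed as below. This sketch assumes that E is taken as P × D, so that E × D = P × D², and that the worst-case D-Q delay is used as D, as stated above. The class name and both sets of input numbers are made up for illustration and are not values from Table 1.

```python
from dataclasses import dataclass


@dataclass
class FlipFlopResult:
    name: str
    power_uw: float  # average power in microwatts
    delay_ps: float  # worst-case D-Q delay in picoseconds

    @property
    def pdp_fj(self) -> float:
        """P x D: 1 uW * 1 ps = 1e-18 J, so divide by 1000 to get fJ."""
        return self.power_uw * self.delay_ps / 1000

    @property
    def edp_fj_ps(self) -> float:
        """E x D, assuming E = P x D, i.e. P x D^2 (fJ * ps)."""
        return self.pdp_fj * self.delay_ps


def reduction(reference: float, proposed: float) -> float:
    """Fractional reduction of a metric relative to a reference design."""
    return 1 - proposed / reference


ref = FlipFlopResult("reference", power_uw=100.0, delay_ps=100.0)  # made-up numbers
new = FlipFlopResult("candidate", power_uw=75.0, delay_ps=60.0)    # made-up numbers
print(f"P x D reduction: {reduction(ref.pdp_fj, new.pdp_fj):.0%}")        # 55%
print(f"E x D reduction: {reduction(ref.edp_fj_ps, new.edp_fj_ps):.0%}")  # 73%
```

Because delay enters E × D twice under this definition, a faster design gains more in E × D than in P × D. This matches the pattern in which DPTPL's E × D reduction exceeds its P × D reduction.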
Conclusion
In this paper, DPTPL is presented. Its speed, power consumption, P × D, and E × D are compared with those of conventional flip-flops. By exploiting the strong drive capability of NMOS pass transistors and positive-feedback PMOS transistors, DPTPL operates faster than its conventional counterparts. It also consumes less power, mainly because of its simple structure, small clock load, and small total gate width. As a result, DPTPL reduces E × D by 45.5% compared with ep-SFF, which has the best characteristics among the conventional flip-flops in our simulation, at 100 °C, a 1.2 V supply voltage, and the normal process corner with a 1.25 GHz clock frequency. Hence, DPTPL is a good candidate for deep-pipeline multi-GHz microprocessors that demand high-speed, low-power operation with small area.
