Abstract-In this paper, a new charge-recycling differential logic named split-level precharge differential logic (SPDL) is presented. It employs a new push-pull type output driver which is simple and separated from the NMOS logic tree. Therefore, it can improve energy efficiency, driving capability, and reliability compared with the previous differential logic structures which use cross-coupled inverters as the output driver. To verify the reliability and the applicability of the proposed SPDL in VLSI systems, an 8-bit full adder is fabricated in a 0.6-µm CMOS technology. Experimental results show that the performance of the SPDL is about two times as good as that of the previous half-rail differential logic (HRDL) in terms of power-delay product. Moreover, the SPDL has stable operation under mismatch or parameter variation.
I. INTRODUCTION
RECENTLY, various logic styles have been introduced to implement high-performance digital systems, replacing conventional static CMOS logic styles. Among them, some clocked CMOS differential logic structures were introduced [1]-[5]. They have high-speed and low-power characteristics due to small load capacitance, using only the NMOS logic tree as in domino logic. Moreover, they can acquire high speed and good logic flexibility through the effective use of differential output signals.
As energy efficiency becomes a more important issue in portable systems and other applications, several charge-recycling techniques for differential logic have been introduced in order to increase the energy efficiency [3]-[5]. A full-level precharge differential logic precharges and discharges one of two complementary output nodes in every clock cycle. In the charge-recycling differential logic, on the other hand, the discharged and pulled-up output nodes are precharged to about half-level by equalizing the two output nodes in the precharge phase, without additional charging. Therefore, it is about twice as energy efficient as the full-level precharge differential logic structures. This technique is more efficient in heavily loaded conditions. Previous charge-recycling structures generally use cross-coupled inverters as output drivers. However, they have a reliability problem because they are sensitive to offset voltage. Moreover, since the input and output nodes of the output drivers are precharged to a half-VDD gate-to-source voltage for the charge-recycling operation, the driving capability is degraded in the initial evaluation phase.
In this paper, we describe a new high-speed and low-power charge-recycling differential logic, named split-level precharge differential logic (SPDL). It uses new output drivers which are reliable and insensitive to the offset problems. Moreover, since the input nodes are precharged to the full-swing signal level instead of half-VDD, their driving capability is better than that of previous ones.
II. PREVIOUS CHARGE-RECYCLING DIFFERENTIAL LOGIC STRUCTURES
Fig. 1(a) shows a charge-recycling differential logic (CRDL), which applied the charge-recycling concept to the differential logic structure for the first time [4]. The complementary outputs of a CRDL gate are precharged to half-level by connecting OUT and its complement in the precharge phase. In an ideal situation, the charge-recycling technique saves 50% of the power consumption over the full-level precharge differential logic.
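The ideal 50% figure can be checked with a first-order energy count. The sketch below is only an illustration under stated assumptions: lossless switches, equal capacitance on both output nodes, and a hypothetical 100-fF load at a 3.3-V supply. The function names are introduced here for the example and do not come from the original work.

```python
def supply_energy_full_precharge(c_load, vdd):
    """Energy drawn from the supply per cycle when the discharged output
    node is recharged from 0 V all the way to VDD (ideal model)."""
    return c_load * vdd * vdd


def supply_energy_charge_recycling(c_load, vdd):
    """Energy drawn from the supply per cycle when the two outputs are
    first equalized to VDD/2 (no supply charge) and only one node is
    then pulled from VDD/2 up to VDD."""
    return c_load * vdd * (vdd - vdd / 2.0)


def equalized_voltage(c_a, v_a, c_b, v_b):
    """Voltage after two capacitive nodes share charge through a switch."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


# Hypothetical operating point, chosen only for illustration.
C_LOAD = 100e-15  # farads
VDD = 3.3         # volts

e_full = supply_energy_full_precharge(C_LOAD, VDD)
e_recycle = supply_energy_charge_recycling(C_LOAD, VDD)
saving = 1.0 - e_recycle / e_full
v_mid = equalized_voltage(C_LOAD, VDD, C_LOAD, 0.0)

print(f"equalized level: {v_mid:.2f} V, ideal saving: {saving:.0%}")
```

With unequal node capacitances, `equalized_voltage` no longer returns exactly VDD/2, which is one way load asymmetry shows up in a charge-recycling gate.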
In the CRDL, cross-coupled inverters are generally used as the output driver. If the mid-level output voltage for the charge recycling is higher than the threshold voltage of the output-driving devices, it induces static power consumption in the cross-coupled inverters. To solve this problem, the CRDL uses high-threshold PMOS transistors as output-driving devices. However, they require additional bias and their driving capability is reduced due to the high threshold voltage. Fig. 1(b) shows another charge-recycling differential logic, named half-rail differential logic (HRDL) [5]. It uses complementary clock signals, allowing the tristate output driver to isolate the output nodes from supply or ground without additional threshold voltage control.
The CRDL and HRDL have good performance in terms of power consumption and speed, but they have reliability and large through-current problems in the output drivers. Generally, they use cross-coupled inverters as output drivers. These can efficiently accelerate logic evaluation by positive feedback even when the logic tree is deep. However, since equalizing the input nodes of the cross-coupled inverters for charge recycling makes the initial voltage difference between the nodes very small, the inverters are sensitive to offset voltage and noise. Moreover, since the gate-to-source voltage of the output-driving transistors is half-VDD, they have small driving capability in the initial evaluation state and increase the evaluation time compared with a full-level precharge structure. This gate-to-source voltage also results in large through current in logic evaluation because all the pull-up and pull-down transistors with half-VDD gate voltage in the output driving path are turned on during the initial evaluation phase.
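The through-current condition described above can be expressed as a simple threshold check. This is a minimal sketch, not a circuit simulation: it assumes an inverter-like driver whose gate sits at half the supply, and the threshold voltages (0.7 V, 0.8 V, and 1.8 V) are hypothetical values picked for illustration rather than parameters of the process used here.

```python
def both_drivers_conduct(vdd, vt_n, vt_p_abs):
    """Return True when a pull-up/pull-down pair whose common gate is
    precharged to VDD/2 has both transistors above threshold, i.e. when
    a through-current path from supply to ground exists."""
    v_gate = vdd / 2.0
    nmos_on = v_gate > vt_n               # NMOS gate-to-source = VDD/2
    pmos_on = (vdd - v_gate) > vt_p_abs   # PMOS source-to-gate = VDD/2
    return nmos_on and pmos_on


VDD = 3.3  # volts

# Ordinary thresholds (hypothetical): both devices are on at half-level.
normal_case = both_drivers_conduct(VDD, vt_n=0.7, vt_p_abs=0.8)

# A high-threshold PMOS (hypothetical 1.8 V) keeps the pull-up off,
# which is the remedy attributed to the CRDL in the text.
high_vt_case = both_drivers_conduct(VDD, vt_n=0.7, vt_p_abs=1.8)

print(normal_case, high_vt_case)
```

The check only says whether a conducting path exists; how much current flows depends on device sizes and on how long the gate stays near half-level.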
For correct and fast evaluation, discharging through the NMOS logic tree must induce enough voltage difference at the input nodes of the output driver before the evaluation starts. However, providing the proper timing margin for a sufficient difference in voltage is a difficult task. Moreover, since the NMOS complementary logic tree is directly connected to the output loads, these structures have not only large output loads driven by the tree but also large offset voltage caused by the asymmetry of the tree itself.
III. SPLIT-LEVEL PRECHARGE DIFFERENTIAL LOGIC
We propose the SPDL to overcome these problems, as it efficiently separates the NMOS logic tree from the output loads. Fig. 2 shows the block diagram of the proposed SPDL gate. It consists of an output driver, precharge circuitry, and an NMOS complementary logic tree. The ein signals (ein and its complement) are enable signals for the NMOS complementary logic tree. If the SPDL is used for implementing a self-timed structure, eout and its complement of the preceding stage are connected to ein and its complement of the next stage, respectively.
Fig. 3 shows the proposed SPDL structure. The detailed operation of the SPDL is as follows. When CLK becomes high, the precharge phase begins and the NMOS and PMOS precharge transistors (the MN and MP devices of the precharge circuits in Fig. 3) are turned on. They set the complementary NG (eout) nodes to ground and the complementary PG nodes to VDD, respectively. In the output driver, the OUT node and its complement are connected through the equalizing NMOS transistor, recycling the charge used during the previous evaluation phase. Thus, all transistors in the output driver except the equalizing transistor are turned off, so that the charge-recycling part is isolated from the supply and ground rails.
When CLK goes low, the output nodes retain their floating state. To improve noise immunity during the evaluation phase, small [...]
The advantages of the SPDL can be summarized as follows. First, unlike the previous structures, it separates the NMOS complementary logic tree from the output loads. Thus, it can decrease the propagation delay because the tree drives only small parasitic loads (at the complementary PG nodes). Moreover, it can increase reliability because the asymmetry of the tree does not significantly affect the offset voltage of the output driver. The second advantage is that the pull-up and pull-down transistors in the output driving paths are not turned on at the same time during the evaluation phase. This raises the energy efficiency by eliminating through current. It also decreases the probability of a wrong or metastable state. The third advantage lies in the driving capability of the pull-up and pull-down transistors. In the case of the CRDL and HRDL, the gate nodes of these output-driving transistors are precharged to half-VDD for the charge recycling. In the SPDL, the gate-to-source voltages of the output-driving transistors can be maximized, with the gates driven to VDD or ground level by the precharge circuitry, enhancing the driving capability. Only the output nodes of the output driver are precharged to half-VDD for charge recycling. Finally, the simplicity of the control circuit is another advantage. To generate control signals for the next stage, the CRDL needs five transistors and additional threshold voltage control, and the HRDL needs nine transistors for generating complementary enable signals. The SPDL needs only four additional transistors for enable signals (the two MP transistors and the enable transistors in Fig. 3). The other transistors in the precharge circuit are used to separate the output loads from the NMOS logic tree.
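The driving-capability argument can be illustrated with the long-channel square-law model, in which saturation current scales with the square of the gate overdrive. This is a rough sketch under explicit assumptions: the square-law model itself, a hypothetical 0.7-V threshold, and an arbitrary transconductance factor. Short-channel devices deviate from the square law (velocity saturation), so the real advantage is likely smaller than this idealized ratio.

```python
def saturation_current(k, v_gs, v_t):
    """Long-channel square-law drain current in saturation.
    k is the transconductance factor (A/V^2); returns 0 below threshold."""
    v_ov = max(v_gs - v_t, 0.0)  # gate overdrive
    return 0.5 * k * v_ov ** 2


VDD = 3.3   # volts, supply used in the reported experiments
VT = 0.7    # volts, hypothetical threshold for illustration
K = 100e-6  # A/V^2, arbitrary; it cancels in the ratio below

i_half_gate = saturation_current(K, VDD / 2.0, VT)  # gate at half-VDD
i_full_gate = saturation_current(K, VDD, VT)        # gate at full VDD
drive_ratio = i_full_gate / i_half_gate

print(f"full-swing gate drive / half-level gate drive = {drive_ratio:.1f}")
```

The ratio depends strongly on the assumed threshold: the closer the threshold is to half the supply, the weaker the half-level drive becomes.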
IV. PERFORMANCE COMPARISON
To evaluate the performance of the proposed SPDL, we simulated prelayout SPDL circuits and measured postlayout ones, comparing them with the previously reported CVSL [1] and HRDL [4].
It is difficult to compare the CRDL [3] with the SPDL because the CRDL uses special process parameters with high-threshold PMOS transistors. Simulations are performed by HSPICE using 0.6-µm CMOS parameters. Fig. 4 shows the simulated performance of a four-input XOR gate according to tree height (fan-in number) and load capacitance (fan-out number) at a 3.3-V supply voltage and 50-MHz clock frequency. Since the XOR logic tree has good logic symmetry, it is well suited to comparing the performance of the various differential logic gates without asymmetric effects. Fig. 4(a) shows that the SPDL has the best performance in terms of power consumption and propagation delay according to tree height (fan-in). The SPDL consumes less power than the HRDL due to its small through current. In the SPDL, increasing the tree height only lengthens the discharge path; in the HRDL, it not only lengthens the discharge path but also increases the through current in the cross-coupled inverters during the delay time. The power consumption of the CVSL is larger than that of the others by about two to three times because it does not have a charge-recycling operation.
The propagation delay of the SPDL is also less than that of the HRDL, though it increases more rapidly than that of the HRDL, whose cross-coupled inverters can easily accelerate logic evaluation in a deep logic tree. If the fan-in number increases greatly, we can efficiently use cascaded SPDL gates with a high-speed characteristic. The propagation delay of the CVSL increases rapidly as the fan-in number increases because the CVSL drives the output loads directly by the cross-coupled PMOS transistors and the NMOS logic tree, without additional output drivers. Fig. 4(b) shows the performance variation as the load capacitance (fan-out) increases. The characteristic of the power consumption is similar to that described above. The propagation delay of the SPDL is also less than that of the others. The propagation delay of the HRDL increases more rapidly than that of the CVSL or SPDL due to the effects of the large output load. In the HRDL, large output loads can increase the discharging time through the NMOS logic tree because the tree directly drives the large output load before the cross-coupled inverters have enough voltage difference.
The SPDL separates the output loads from the NMOS logic tree. It can improve both the speed and the reliability in high fan-in and fan-out situations because the interaction between the NMOS logic tree and the output node is negligible. Fig. 4(a) and (b) show that the power-delay product of the SPDL is one and one-half to two times as good as that of the HRDL and two to five times as good as that of the CVSL.
Fig. 5 shows the reliability of the SPDL and HRDL with an asymmetric NMOS logic tree. Six- and eight-input AND/NAND gates are used as the asymmetric tree. As the logic tree mismatch increases, the HRDL exhibits errors in operation with the alternating input signals, but the SPDL does not. Since the HRDL uses cross-coupled inverters as output drivers, it is sensitive to the offset voltage which results from mismatch of the logic tree, load capacitance, etc. In contrast, the SPDL, which does not use them, is insensitive to the offset voltage. Moreover, since it separates the output nodes from the NMOS logic tree, the asymmetry of the NMOS logic tree does not significantly affect the offset voltage of the output driver, as mentioned above. Therefore, the SPDL can improve logic reliability over the HRDL. Similar results are obtained with asymmetric output loads, which also show that the reliability of the SPDL is better than that of the HRDL.
To compare system performance using the SPDL and HRDL, self-timed 8-bit full adders are implemented with a 0.6-µm 1-poly 3-metal CMOS process in each logic structure. A microphotograph of the fabricated chip is shown in Fig. 6. The active area of the SPDL adder is [...]. That of the HRDL adder is almost equal. Fig. 7 shows the waveforms of the final ripple carry outputs of the adders operating at a 3.3-V supply voltage. The delay of the SPDL adder is about 10 ns while that of the HRDL adder is about 25 ns. This delay can be decreased by about 40% by optimizing the cascaded carry chain [3]. Fig. 8 shows the measured delay time of the carry output versus supply voltage.
The speed improvement of the SPDL over the HRDL exceeds a factor of two. The power consumption of the SPDL and HRDL adders is 390 µW and 490 µW, respectively, measured at 10 MHz and a supply voltage of 3.3 V. These results show that the delay-power product of the SPDL is about three times as good as that of the HRDL.
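The factor of about three follows directly from the reported adder measurements. The short calculation below uses the 10-ns and 25-ns delays and the 390 and 490 power figures from the text, assuming the power values are in microwatts (the ratio itself does not depend on the unit). Variable names are chosen here for illustration.

```python
def power_delay_product(power_w, delay_s):
    """Power-delay product in joules."""
    return power_w * delay_s


# Measured values reported for the self-timed 8-bit adders at 3.3 V.
# Power figures are assumed to be in microwatts.
spdl_pdp = power_delay_product(390e-6, 10e-9)
hrdl_pdp = power_delay_product(490e-6, 25e-9)

improvement = hrdl_pdp / spdl_pdp

print(f"SPDL: {spdl_pdp * 1e12:.2f} pJ, HRDL: {hrdl_pdp * 1e12:.2f} pJ, "
      f"ratio: {improvement:.2f}")
```

Most of the gain comes from the delay ratio (2.5); the power ratio contributes a further factor of roughly 1.26.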
V. CONCLUSION
An energy-efficient charge-recycling CMOS differential logic, SPDL, is presented. A simple output driver of the push-pull type improves energy efficiency, driving capability, and reliability compared with the previous output driver using cross-coupled inverters. The operation of the proposed SPDL is simulated and verified with an 8-bit full adder. The measured and simulated performance of the SPDL gates is about twice as good as that of the previous HRDL in terms of power-delay product. Moreover, the SPDL has stable operation under size mismatch or parameter variation.
