Abstract: In this paper, we present a highly reliable and flexible CMOS differential logic called current sensing differential logic (CSDL). This CSDL eliminates the timing constraints between the enable signal and input signals, which cause difficulties in design with conventional differential logic families, by employing a simple clocking scheme. The power-delay product of CSDL is also reduced by using a swing suppression technique. To verify the reliability and the applicability of the proposed CSDL in large very large-scale-integration systems, a 64-bit carry-lookahead adder has been fabricated in a 0.6-μm CMOS technology. Experimental results show that the critical path delay is 3.5 ns with a power consumption of 27 mW at 50 MHz.
Current Sensing Differential Logic: A CMOS Logic for High Reliability and Flexibility
Joonbae Park, Jeongho Lee, and Wonchan Kim
I. INTRODUCTION
A large number of CMOS differential logic families have been introduced for applications in low-power and high-speed logic systems [1]-[6]. The main trend underlying the evolution of these differential logic families is to employ an acceleration circuit in order to reduce the large RC delay, which increases with the tree height of the nMOS transistors. Although the driving capability of a complex logic function implemented in a single gate can be improved, there is a penalty to be paid: such a gate suffers from a severe timing constraint between the enable signal of the output drivers and the input signals when several gates are cascaded.
The fundamental reason for that constraint, we observed, is that the acceleration circuits operate irrespective of input signals. Usually, for the activation of a stage, the enable signal generated by the preceding stage is used. If the enable signal arrives earlier than input signals, false outputs can be generated by output drivers regardless of input signals. To avoid this problem, it should be ensured that the input signals are fully settled before the logic gate is enabled. The determination of that timing interval is affected by many parameters, such as mismatches in device parameters and differences in load capacitance values. These environmental variations set the speed limit for reliable operation.
Because of this restriction, the actual speed improvement obtained by the output stages in cascaded gates is not as large as anticipated.
In this paper, we present a new approach that eliminates the timing problems and enables highly reliable and flexible logic evaluation: current sensing differential logic (CSDL). When cascaded, CSDL operates in a manner similar to domino logic, providing a reliable operation. There is no negative side effect of this logic, and the power-delay product of CSDL is comparable to that of the previous differential logic families. Therefore, the proposed CSDL is suitable for implementing large very large-scale-integration (VLSI) systems consisting of iterative modular cells, enabling fast design time and simplified interconnections.
II. CURRENT SENSING DIFFERENTIAL LOGIC
CSDL is conceived to solve the problems mentioned above while maintaining the advantage of low power consumption. These improvements are made primarily by its simple clocking strategy. Fig. 1 shows the schematic of the proposed CSDL. It employs the same number of transistors as differential current switch logic (DCSL) in [5], consisting of a complementary nMOS logic tree, pull-up transistors, two inverters, and five clocked transistors for precharge and logic evaluation. Here, the voltage swings of parasitic nodes in the nMOS logic tree are restricted, as in DCSL, ensuring the reduction of power consumption.
In CSDL, output drivers with a separate enable signal are removed in order to obtain a reliable operation. This eliminates the skew problem between the enable signal and input signals. The two inverters connected to nodes and separate these nodes from load capacitance values, which restricts the load capacitance values of and to gate capacitances of the inverters. Hence the speed degradation of CSDL in the case of heavy load conditions is not so large owing to the buffering of these inverters. Instead, the operation of CSDL is more sensitive to fan-in rather than fan-out numbers. The swing suppression technique adopted in DCSL is also used to reduce the discharging time of internal nodes. The overall operation of CSDL becomes quite similar to the domino logic due to these inverters.

0018-9200/99$10.00 © 1999 IEEE
The detailed operation of CSDL is as follows. The precharge cycle begins with a high-to-low transition of the clock signal. Both outputs become low because and become high due to the pull-up action of transistors and , respectively. Transistors and are turned off during the precharge period, while transistors and are turned on. Since transistors and are turned off, no through current flows from to ground. The voltages of and are equalized by transistor during the precharge period.

When the clock signal goes high, CSDL enters the evaluation period. Transistors and turn on, and CSDL undergoes a conditional discharge. Unlike other differential logic families, a CSDL gate is evaluated only after the current path to the ground is established through the nMOS logic tree. The outputs are evaluated according to the difference of currents in transistors and . When the current flowing in transistor is larger than the current in transistor , remains high by transistor , while , , and are discharged to ground through the logic tree. Then becomes high, while remains low. Just after the clock signal becomes high, both transistors and strongly turn on. Since the pulldown strength of transistor is greater than that of transistor , discharges faster than . This, in turn, further weakens the pulldown strength of transistor , because transistors and have a positive feedback configuration. As the voltage of decreases, the current flowing through transistor decreases. After the voltage of becomes lower than the threshold voltage of transistor , the current flowing to the logic tree becomes zero, and thus no further charge is supplied to the internal nodes of the logic tree.

This cross-coupled connection saves power, as in DCSL, by disconnecting and from the nMOS logic tree after the logic evaluation is completed. If there were no cross-coupled transistors, all parasitic nodes connected to would be fully charged, wasting power. Thus, the swing suppression technique helps to reduce power consumption.
During the precharge period, only and are precharged to , and the voltages of the internal nodes in the logic tree are close to the ground rather than , which causes the charge-sharing problem. This charge-sharing noise generates glitches in and . Transistors and together with transistors and prevent this charge-sharing noise. The two inverters in a CSDL gate, and , not only pull up the load capacitors but also prevent the glitches from being transferred to the next stages. To this end, the logic threshold voltages of and must be lowered so that those glitches vanish at and . This glitch suppression also helps to achieve a reliable operation by avoiding the simultaneous current path through the differential logic tree. The amount of charge-sharing noise depends strongly on the tree height and the ordering of input signals in the logic tree. To minimize the charge-sharing noise, the input signals that arrive late must be connected to the points close to and . Transistors and are also used for eliminating the static power consumption by fully charging node or to after the logic evaluation. If they are removed, and cannot be fully charged to due to the threshold voltages of transistors and . This causes one of the transistors or to turn on weakly and the CSDL gate to consume static current. Transistors and may have very small ratios because they pull up and to only after the logic evaluation.
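As a rough illustration of the charge-sharing mechanism described above, consider an idealized two-capacitor model in which a precharged node is suddenly connected to discharged internal tree nodes. The symbols C_pre (capacitance of the precharged node) and C_int (total capacitance of the internal nodes that become connected to it), as well as the capacitance values, are assumptions made purely for illustration; only the 3.3-V supply matches the simulation conditions reported in Section IV.

```latex
% Idealized charge sharing between a precharged node (C_pre at V_DD)
% and discharged internal nodes (C_int at 0 V).
% The capacitance values are hypothetical, not taken from the paper.
\begin{align*}
V_{\text{after}} &= V_{DD}\,\frac{C_{\text{pre}}}{C_{\text{pre}} + C_{\text{int}}} \\
                 &= 3.3\ \text{V} \times \frac{20\ \text{fF}}{20\ \text{fF} + 5\ \text{fF}} = 2.64\ \text{V}, \\
\Delta V &= V_{DD} - V_{\text{after}} = 0.66\ \text{V}.
\end{align*}
```

In this simplified picture, a taller tree contributes a larger C_int and therefore a larger voltage droop, which is consistent with the dependence on tree height noted above.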
III. PIPELINED OPERATION
In clocked logic families including CSDL, the availability of a pipelined operation is essential in maximizing throughput. Since valid outputs of a CSDL gate are only available during one clock level, the outputs must be latched in order to be used by the next pipeline stage during the other clock level. Fig. 2 shows a CSDL gate with a latched output stage, which is formed by cross-coupled NAND gates. Since inputs of the NAND gate connected to and all become high during the precharge period, the logic levels of the outputs are maintained.
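The hold behavior of a cross-coupled NAND latch can be sketched with a small behavioral model. This is a generic textbook latch, not the circuit of Fig. 2; the names `nand_latch`, `set_n`, and `reset_n` are hypothetical, and the assumption that the two latch inputs are driven by the precharged nodes of the CSDL gate is an interpretation of the text above.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def nand_latch(set_n, reset_n, q, q_bar):
    """Settle a cross-coupled NAND latch with active-low inputs.

    (q, q_bar) is the state before the inputs are applied.
    Returns the state after the feedback loop has settled.
    """
    for _ in range(4):  # a few passes suffice for this two-gate loop
        new_q = nand(set_n, q_bar)
        new_q_bar = nand(reset_n, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar


# Evaluation: one input is pulled low and the latch captures the value.
state = nand_latch(0, 1, 0, 1)
# Precharge: both inputs are high, so the captured value is held.
held = nand_latch(1, 1, *state)
```

With both inputs high, each NAND gate simply inverts the other gate's output, so the stored value is retained while the gate precharges.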
Occasionally, several CSDL gates are cascaded in a pipeline stage to perform a complex function. Fig. 3 shows an example of cascaded CSDL gates. The clocked transistors and can be eliminated only if all inputs are supplied from the former CSDL gates within the same pipeline stage, as shown in Fig. 3. Since all input signals of a CSDL gate are low during the precharge period, transistors and can be safely eliminated without the through current. The elimination of those clocked transistors gives an advantage of lowering the delay time by reducing the number of stacked transistors.
Cascaded CSDL gates operate like domino logic: the logic propagates through the circuit from left to right. The logic evaluation of the present stage is performed only after the preceding stages are all evaluated. Thus, the evaluation of the CSDL gate is determined by the inputs, not by the clock signal. This greatly simplifies the design and improves reliability when several CSDL gates are cascaded. In cascaded CSDL gates, the completion signal of a gate is identical to the low-to-high transitions of output signals in former stages. This makes it possible for a CSDL gate to avoid the logic transmission errors, which occur in conventional differential logic families, caused by the timing skew between the enable signal and input signals. Hence CSDL provides a highly reliable operation without any degradation of performance for moderate fan-in numbers, which makes it suitable for the system design.
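The input-driven, domino-like propagation described above can be sketched with a purely behavioral dual-rail model. This is an illustration under simplifying assumptions, not a model of the transistor-level circuit: every gate is given a unit delay, a gate is assumed to wait until all of its inputs are valid (a real differential tree may resolve earlier for some input patterns), and all names (`Rail`, `gate`, `simulate_chain`) are hypothetical.

```python
from typing import Callable, List, Tuple

Rail = Tuple[int, int]          # (true_rail, false_rail)
PRECHARGED: Rail = (0, 0)       # both outputs low: not yet evaluated


def encode(bit: int) -> Rail:
    """Dual-rail encoding of a resolved logic value."""
    return (1, 0) if bit else (0, 1)


def is_valid(signal: Rail) -> bool:
    """A signal is valid once exactly one rail has gone high."""
    return signal in ((1, 0), (0, 1))


def gate(func: Callable[..., int], inputs: List[Rail], clock: int) -> Rail:
    """Outputs stay low during precharge and until every input is valid."""
    if clock == 0 or not all(is_valid(s) for s in inputs):
        return PRECHARGED
    return encode(func(*[s[0] for s in inputs]))


def simulate_chain(first: Rail, side: List[Rail], clock: int, steps: int):
    """Unit-delay simulation of AND gates chained left to right."""
    outputs = [PRECHARGED] * len(side)
    history = []
    for _ in range(steps):
        previous = outputs[:]
        for i, side_input in enumerate(side):
            left = first if i == 0 else previous[i - 1]
            outputs[i] = gate(lambda a, b: a & b, [left, side_input], clock)
        history.append(outputs[:])
    return history


history = simulate_chain(encode(1), [encode(1), encode(1), encode(0)], clock=1, steps=3)
for step, outputs in enumerate(history, start=1):
    print(step, outputs)
```

In this model, no separate enable signal is needed: the low-to-high transition on one rail of a preceding gate is what allows the next gate to evaluate, so the outputs resolve one stage per step from left to right.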
IV. PERFORMANCE COMPARISONS

A. Comparison Between CSDL and DCSL
The performances of the CSDL and the previous DCSL in [5] have been compared using 0.6-μm CMOS model parameters. In Fig. 4(a) and (b), the propagation delay and the power consumption versus the tree heights of NAND gates are shown, respectively. The propagation delay was measured as the time from 10% of the input value to 90% of the output value. The supply voltage was 3.3 V, and the number of fan-outs was six.
The propagation delay of DCSL is smaller than that of CSDL. This is due to the shorter discharging path to ground in DCSL. As mentioned above, the problem of CSDL is that the propagation delay is sensitive to the tree height and increases rapidly as the tree height increases. The propagation delay of DCSL, in contrast, is insensitive to the tree height because of the output stages formed by cross-coupled inverters. It is to be noted, however, that the power consumption of CSDL is smaller than that of DCSL when the tree height is small. This is because the short-circuit current flowing through the output stage in DCSL is larger than the sum of the current flowing through inverters and the charging current through the parasitic nodes in CSDL. However, as the tree height increases, the discharging slope of and in CSDL becomes steeper, which increases the charge buildup in internal parasitic nodes and the short-circuit current flowing through inverters. Fig. 5 shows the power-delay product of CSDL and DCSL. The performance of CSDL is comparable with that of DCSL because of the swing suppression technique if the depth of the logic tree does not exceed five.
B. Sensitivity Analysis
Although low power consumption and short propagation delay are considered primary issues in designing logic gates, reliable operation must be guaranteed for practical use. The reliability is affected by many parameters, such as load mismatches, device mismatches, and structural imbalances. The variations of device sizes and threshold voltages are the main sources of device mismatches. Fig. 6 shows the simulated sensitivity of DCSL and CSDL for NAND gates. When the offset voltage caused by the mismatches of the output stages has enough driving strength to exceed the signal current transferred to the output stages, DCSL malfunctions. On the other hand, the sensitivity of CSDL means allowable size variations between transistors and relative to transistors and in Fig. 1. The values of size variations have been calculated at the point when the glitches in outputs begin to exceed the threshold voltage of an nMOS transistor.
The charge-sharing noise of CSDL becomes larger as the tree height increases. Nonetheless, it must be noted that, unlike in DCSL, the sensitivity of CSDL does not imply incorrect operation. If the pulse width of a glitch in an output node is narrow and its height is small, it does not cause any significant problem in the operation of the next stage.
The simulation above reveals that circuit imbalances will result in a serious problem in the case of DCSL rather than CSDL, which means that CSDL is much more reliable and flexible. Thus CSDL is suitable for implementing random logic functions, while the availability of the DCSL is restricted to

The improved flexibility and reliability of the CSDL expand the application area to large and complex VLSI circuits. A 64-bit carry-lookahead adder (CLA) has been designed and fabricated to verify these features. The CLA is widely used because it is advantageous in terms of operating speed, area, and modularity [7], [8]. The critical path of our adder is designed to have six stages to restrict the maximum tree height to five, which prevents the speed degradation caused by large tree height. Combining other components has been avoided as much as possible because it increases delay time due to the large fan-in numbers. Instead, the basic blocks were used repeatedly, which simplifies the adder design. This modular design makes the design regular and the design period short because only a few cells need to be designed. Fig. 7 shows the microphotograph of the fabricated 64-bit CLA. The active area of the adder is 870 × 1380 μm². Fig. 8 shows measured operating waveforms of the fabricated adder. The sum outputs are toggled according to the least-significant-bit input signal, which shows a correct operation. Fig. 9 shows the detailed waveforms of the adder. The critical path delay is 3.5 ns at a supply voltage of 3.3 V, and the power consumption including that of the clock buffer is 27 mW at 50 MHz.
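For readers unfamiliar with the carry-lookahead principle, the following is a minimal software sketch of a textbook two-level CLA built from repeated blocks. It is not the authors' gate-level design: the 4-bit group size, the function name `cla_add`, and the two-level organization are assumptions chosen for illustration, and the sketch says nothing about the six-stage critical path or the tree heights of the fabricated adder.

```python
def cla_add(a, b, width=64, group=4, carry_in=0):
    """Add two unsigned integers using generate/propagate lookahead terms.

    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Bit-level generate (g) and propagate (p) signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # Level 1: group generate/propagate, one identical block per group.
    starts = list(range(0, width, group))
    groups = []
    for start in starts:
        group_g, group_p = 0, 1
        for i in range(start, min(start + group, width)):
            group_g = g[i] | (p[i] & group_g)
            group_p &= p[i]
        groups.append((group_g, group_p))

    # Level 2: carry into each group from the group-level signals.
    group_carry = [carry_in]
    for group_g, group_p in groups:
        group_carry.append(group_g | (group_p & group_carry[-1]))

    # Level 3: bit carries and sum bits inside each group.
    total = 0
    for k, start in enumerate(starts):
        c = group_carry[k]
        for i in range(start, min(start + group, width)):
            total |= (p[i] ^ c) << i
            c = g[i] | (p[i] & c)

    return total, group_carry[-1]


print(cla_add(123456789, 987654321))
```

The same small block is reused for every group, which mirrors the modular style described above in spirit only; the actual cell partitioning of the fabricated adder is not specified in the text.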
VI. CONCLUSIONS
In this paper, a new differential logic called CSDL has been proposed, which is suitable for designing systems with enhanced reliability and flexibility. Regenerating output drivers with enable signals are not used so that the operation of CSDL is insensitive to structural imbalances and device mismatches. The elimination of the timing constraint between the enable signal and input signals increases the degree of freedom in constructing logic functions and the connections of gates. The speed degradation caused by the elimination of the regenerating output driver is partially compensated for with the swing suppression technique and buffering inverters. The features of CSDL, flexibility and reliability, make it effective for constructing VLSI systems. The fabricated 64-bit carry-lookahead adder as an example of system applications has a critical path delay of 3.5 ns and consumes 27 mW at 50 MHz.
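To put the measured figures in perspective, the short calculation below derives two quantities that follow directly from them. The symbols E_cycle, T_clk, and t_crit are introduced here for illustration; the calculation assumes that the 27-mW figure is the average power at the 50-MHz clock rate, as stated in the text.

```latex
% Derived from the reported measurements: P = 27 mW at f = 50 MHz, t_crit = 3.5 ns.
\begin{align*}
E_{\text{cycle}} &= \frac{P}{f} = \frac{27\ \text{mW}}{50\ \text{MHz}} = 0.54\ \text{nJ}, \\
T_{\text{clk}} &= \frac{1}{f} = 20\ \text{ns}, \qquad
\frac{t_{\text{crit}}}{T_{\text{clk}}} = \frac{3.5\ \text{ns}}{20\ \text{ns}} = 0.175.
\end{align*}
```

Under this assumption, the adder dissipates about 0.54 nJ per clock cycle, and the critical path occupies less than one fifth of the 20-ns clock period at which the power was measured.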
