Abstract - A new CMOS differential logic, called the latched CMOS differential logic (LCDL), is proposed and analyzed. LCDL circuits can implement a complex combinational logic function in a single gate and can also form a pipeline structure. It is shown that the LCDL with a fan-in number between 6 and 15 has the highest operation speed among the differential logic circuits compared. It is also free from charge-sharing, clock-skew, and race problems. Experimental results partly verify the speed and race-free performance of the proposed LCDL.
No difficulties with numerical stability or convergence have been observed when the model is used with WATAND. On the other hand, expected thermal instabilities occur with some circuits and can be studied with the self-heating model.
It should be noted that the increase in model complexity over the standard GP model causes some increase in CPU time. For this work with the current mirror, CPU time for a single dc solution was a fraction of a second with the standard model versus a little over one second with the self-heating model. No rigorous study of CPU time was done since such studies depend upon the simulator and techniques used with it [8].
The self-heating capability adds two nodes and two current variables to the basic GP model for an increase of four variables for each BJT. The number of variables is six for the standard GP model versus ten for the self-heating model.
VI. CONCLUSION
This paper has presented a self-heating Gummel-Poon BJT model and has demonstrated it with discrete and IC BJT current-mirror circuits. The discrete simulations showed error with respect to experimental measurements of 6.1% or less. In contrast, the standard GP model produced errors up to 84%. The IC current-mirror simulation showed the expected current tracking with the self-heating model, which also calculated the transistors' junction temperatures. LCDL circuits can implement a complex combinational logic function in a single gate and achieve high operation speed and high driving capability without static power dissipation. Moreover, the pipeline structure of the LCDL circuits is shown to have no charge-sharing problem and no race problem.
II. CIRCUIT TECHNIQUES AND CLOCKING STRATEGY

A. Pseudo-Two-Phase LCDL
The schematic diagram of the pseudo-two-phase LCDL circuit is shown in Fig. 1. It consists of five major components: 1) the differential cascode NMOS logic tree, which performs the complex logic function; 2) the five-transistor latching sense amplifier M1-M5; 3) two precharge transistors M6 and M7; 4) the control MOS transistors M9 and M10, which isolate the sense amplifier from the NMOS logic tree, and the control transistor M8, which activates the logic tree; and 5) the dynamic clocked CMOS (C²MOS) output latches [6] M11-M18, which enable the LCDL to form the pipeline connection. The clock timing diagram of the pseudo-two-phase LCDL is shown in ... immunity of the circuit to this type of clock skew. During time t3, both φ2 and φ̄2 are high. The loading of the NMOS complex logic is added to the sense amplifier. This does not affect the output voltages, since definite results have already been generated. Meanwhile, the C²MOS latches still hold the same results. Thus, this type of skew does not cause a logic fault. From the above analysis, it is evident that the pseudo-two-phase LCDL has no clock-skew and race problems.
B. Pseudo-One-Phase LCDL
The pseudo-one-phase LCDL circuit is shown in Fig. 3. Compared with the pseudo-two-phase LCDL, it has a simpler clock scheme and fewer MOS transistors. Moreover, it requires only one clock line in each gate. The pseudo-one-phase LCDL has two phases of operation, namely, the precharge phase and the evaluation phase. As the clock signal goes low, M6 and M7 are turned on and the circuit is in the precharge phase. Nodes A and B are precharged to VDD. Meanwhile, Q and Q̄ are held in the previous output state by the modified C²MOS output latches [7]. In the evaluation phase, the clock signal rises to high and M5, M8, M10, and M13 are turned on. A path exists from node A or B to ground through one side of the differential NMOS cascode tree. This leads to a voltage difference between nodes A and B, which causes the sense amplifier to trip. Thus, the node (A or B) with the lower voltage is discharged rapidly to ground while the other node remains at VDD.
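The two-phase operation described above can be illustrated with a purely behavioral sketch. It is not a transistor-level model: the class name `LcdlGate`, the idealized 0/1 levels, and the polarity convention (which of nodes A and B discharges for a given logic value, and that Q follows the logic value) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

VDD, GND = 1, 0  # idealized logic levels (assumption: no analog behavior modeled)


@dataclass
class LcdlGate:
    """Behavioral sketch of one pseudo-one-phase LCDL gate (hypothetical model)."""
    logic: Callable[..., bool]  # Boolean function realized by the NMOS tree
    q: int = 0                  # latched true output
    q_bar: int = 1              # latched complementary output

    def step(self, clk: int, *inputs: int) -> Tuple[int, int]:
        """Apply one clock level; return the levels of internal nodes (A, B)."""
        if clk == 0:
            # Precharge phase: A and B go to VDD, the output latch holds Q, Q-bar.
            return VDD, VDD
        # Evaluation phase: one side of the tree conducts and the sense
        # amplifier pulls that node fully to ground; the other stays at VDD.
        f = bool(self.logic(*inputs))
        node_a, node_b = (GND, VDD) if f else (VDD, GND)  # assumed polarity
        self.q, self.q_bar = int(f), int(not f)           # latch is transparent
        return node_a, node_b


# Example: a 3-input NAND realized as a single gate.
nand3 = LcdlGate(logic=lambda a, b, c: not (a and b and c))
```

Calling `nand3.step(1, 1, 1, 1)` evaluates the gate, and a following `nand3.step(0, 0, 0, 0)` returns both nodes to VDD while `nand3.q` keeps its previous value, mirroring the hold behavior of the output latches.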
The pipeline connection of the pseudo-one-phase LCDL is shown in Fig. 3(b). The circuit schematic diagram and the corresponding clock timing of the φ stage are the same as those shown in Fig. 3(a). The circuit structure of the φ̄ stage is similar to that of the φ stage but with the clock signal φ replaced by φ̄. Based upon considerations similar to those for the pseudo-two-phase LCDL, it can be shown that this circuit also has no charge-sharing problem and is free from clock-skew and race problems.
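The alternation of φ and φ̄ stages can be sketched behaviorally as follows. This is an idealized illustration with perfectly complementary clocks and no skew; the function name `run_pipeline` and the use of `None` for a stage that has not yet received data are hypothetical choices, not part of the original design.

```python
def run_pipeline(stage_fns, input_stream):
    """Idealized pipeline of alternating phi / phi-bar stages.

    Even-indexed stages evaluate while phi is high; odd-indexed stages
    evaluate while phi is low (phi-bar high). A stage that is not
    evaluating holds its latched output.
    """
    held = [None] * len(stage_fns)  # latched output of each stage
    outputs = []
    for x in input_stream:          # one new input per full clock cycle
        for phi in (1, 0):          # first half-cycle, then second half-cycle
            for k, fn in enumerate(stage_fns):
                active = (phi == 1) if k % 2 == 0 else (phi == 0)
                if not active:
                    continue
                # The driving stage k-1 uses the opposite phase, so it is
                # holding its value while stage k evaluates.
                src = x if k == 0 else held[k - 1]
                held[k] = None if src is None else fn(src)
        outputs.append(held[-1])    # sample the last stage once per cycle
    return outputs
```

Because adjacent stages are never active in the same half-cycle in this model, each stage always reads a held value from its predecessor, which is the behavioral counterpart of the race-free property stated above.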
III. COMPARISON AND EXPERIMENTAL VERIFICATION
A. Comparisons
The speed comparisons of the NORA-type pipeline structures with a multi-input NAND gate in each stage for the clocked CVSL, two-stage clocked CVSL, ECDL, SSDL, pseudo-two-phase LCDL, and pseudo-one-phase LCDL are shown in Fig. 4, where SPICE-simulated minimum clock periods are plotted as a function of the fan-in number of the NAND gate. The logic gate has a unity fan-out number and a 0.2-pF output capacitive load. The 0.2 pF is equivalent to a fan-out number of 8. These SPICE simulation results are based upon the device parameters of a 1.2-μm CMOS process. Since the clocked CVSL can form a multistage domino-type structure in each pipeline section [5], it is also separated into two stages in the comparison of Fig. 4. For example, the 12-input NAND gate is separated into three 4-input NAND gates and one 3-input NAND gate. These simulation results are denoted in Fig. 4 as the two-stage clocked CVSL. Generally, the multistage clocked CVSL is faster than the single-stage clocked CVSL, but its device count is higher.
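The two-stage decomposition of the 12-input NAND gate can be checked exhaustively with a short script. This is a logic-level sketch only; it assumes that the complementary output of each first-stage NAND gate (available in a differential logic family) drives the second-stage 3-input NAND gate. The helper names are hypothetical.

```python
from itertools import product


def nand(*bits: int) -> int:
    """Logic-level NAND of any number of inputs."""
    return int(not all(bits))


def nand12_single(bits) -> int:
    """12-input NAND realized as one gate."""
    return nand(*bits)


def nand12_two_stage(bits) -> int:
    """Three 4-input NAND gates followed by one 3-input NAND gate.

    Assumption: the second stage takes the complementary (inverted)
    output of each first-stage gate.
    """
    first_stage = [nand(*bits[i:i + 4]) for i in (0, 4, 8)]
    complemented = [1 - y for y in first_stage]
    return nand(*complemented)


# Compare both realizations over all 2**12 input combinations.
equivalent = all(
    nand12_single(b) == nand12_two_stage(b)
    for b in product((0, 1), repeat=12)
)
```

The check covers only logical equivalence; the speed and device-count trade-off between the two realizations comes from the SPICE results discussed in the text, not from this sketch.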
From Fig. 4 it is seen that the LCDL has the fastest operation speed in complex logic applications with a fan-in number greater than 6 and smaller than 15. For gates with low fan-in numbers, the speed of the SSDL is similar to that of the LCDL, but the SSDL has considerable dc power dissipation. Moreover, for gates with a fan-in number less than 6, the LCDL has no major benefit because of the required overhead devices. For gates with a fan-in number beyond 15, the two-stage clocked CVSL tends to be faster than the LCDL. Thus, in this case, a multistage LCDL design could be considered as a compromise between speed and device count.
B. Experimental Verification
Several experimental circuits were designed and fabricated to verify part of the simulated results on the proposed logic circuits. The experimental chips were fabricated in a 3.5-μm, single-metal, single-poly, p-well CMOS process. The test circuit of the pseudo-two-phase LCDL is a 15-input NAND gate. Fig. 5 shows the chip photograph of this test circuit. From the measurement results shown in Fig. 6(a), this test circuit can work at a clock period of 24 ns. This result is in agreement with the SPICE-simulated minimum clock period of 24 ns. As shown in Fig. 6(b), the fabricated 15-input pseudo-two-phase LCDL NAND gate also works correctly even when there is a 5-ns clock skew between φ2 and φ̄2 and the pulse widths of both clocks are only 6 ns. This verifies that the pseudo-two-phase LCDL circuit has no clock-skew problem. Fig. 7(a) shows the chip photograph of the fabricated pseudo-one-phase LCDL ten-input NAND gate. Fig. 7(b) shows the measurement results of this gate. It is seen that this circuit can work with a clock rate of more than 60 MHz, which is consistent with the SPICE-simulated maximum clock rate of 62.5 MHz.
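The clock figures quoted above mix clock periods (ns) and clock rates (MHz). The small helper below, whose function names are hypothetical, converts between the two so the numbers can be compared on one scale.

```python
def period_ns_to_rate_mhz(period_ns: float) -> float:
    """Clock rate in MHz for a clock period in ns (f = 1/T)."""
    return 1e3 / period_ns


def rate_mhz_to_period_ns(rate_mhz: float) -> float:
    """Clock period in ns for a clock rate in MHz (T = 1/f)."""
    return 1e3 / rate_mhz


# Figures taken from the text.
two_phase_rate = period_ns_to_rate_mhz(24.0)     # 24-ns period, about 41.7 MHz
one_phase_period = rate_mhz_to_period_ns(62.5)   # 62.5-MHz rate, 16-ns period
```

On this scale, the 24-ns period of the 15-input pseudo-two-phase gate corresponds to roughly 41.7 MHz, and the simulated 62.5-MHz maximum rate of the ten-input pseudo-one-phase gate corresponds to a 16-ns period. The two gates differ in fan-in, so the figures are not a like-for-like comparison of the two clocking schemes.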
IV. CONCLUSIONS
In this paper, new differential CMOS logic circuits called the latched CMOS differential logic (LCDL) circuits are proposed and analyzed. The circuits can implement complex random logic functions and achieve a high operation speed. Moreover, the proposed logic circuits have no static power dissipation, no charge-sharing problem, and no clock-skew problem in the pipeline structure. The performance of the proposed LCDL circuits has been partly verified through an experimental chip.
