Abstract-This paper introduces a new reduced swing logic style called dynamic current mode logic (DyCML) that reduces both gate and interconnect power dissipation. DyCML circuits combine the advantages of MOS current mode logic (MCML) circuits with those of dynamic logic families to achieve high performance at a low supply voltage with low power dissipation. Unlike CML circuits, DyCML gates do not have a static current source, which makes DyCML a good candidate for portable devices and battery-powered systems. Simulation and test results show that DyCML circuits are superior to other logic styles in terms of power and delay. A 16-bit DyCML carry look-ahead adder (CLA), fabricated in 0.6-µm CMOS technology, attains a delay of 1.24 ns and dissipates 19.2 mW at 400 MHz.
Index Terms-CMOS integrated circuits, current mode logic, digital circuits, high-speed integrated circuits, logic design.
I. INTRODUCTION
LOW-POWER design has become a critical issue in VLSI design, especially for portable devices and high-density systems [1]. Usually, switching power dominates the power dissipation in most digital systems. Switching power is calculated using the following equation [1]:

$$P_{\text{switching}} = \alpha \, C_L \, V_{DD} \, V_{\text{swing}} \qquad (1)$$

where $\alpha$ is the switching rate of the logic gate output, $V_{DD}$ is the supply voltage, $V_{\text{swing}}$ is the output voltage swing, which is normally equal to $V_{DD}$, and $C_L$ is the load capacitance including interconnect parasitics.
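As a rough illustration of how the output swing enters the switching power, the sketch below evaluates the common form P = α · C_L · V_DD · V_swing for a full swing and a reduced swing output. The function name and the operating point (100 fF, 100 MHz, a swing of 20% of the supply) are illustrative assumptions, not figures from this paper.

```python
def switching_power(alpha_hz, c_load_f, v_dd, v_swing=None):
    """Switching power in watts, assuming P = alpha * C_L * V_DD * V_swing.

    alpha_hz : switching rate of the gate output (transitions per second)
    c_load_f : load capacitance in farads, including interconnect
    v_dd     : supply voltage in volts
    v_swing  : output swing in volts; defaults to full swing (V_DD)
    """
    if v_swing is None:
        v_swing = v_dd
    return alpha_hz * c_load_f * v_dd * v_swing


# Hypothetical operating point: 100 fF load switching at 100 MHz from 3.3 V.
full = switching_power(100e6, 100e-15, 3.3)
reduced = switching_power(100e6, 100e-15, 3.3, v_swing=0.2 * 3.3)
print(f"full swing:    {full * 1e6:.1f} uW")
print(f"reduced swing: {reduced * 1e6:.1f} uW")
```

With the supply held constant, the power scales linearly with the swing, which is why reducing the swing saves power without the speed penalty of a lower supply.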
It is clear from the equation that reducing the supply voltage is the most effective approach to decrease switching power dissipation. Unfortunately, lower supply voltage degrades performance dramatically [1]. Limiting the output voltage swing is another method, which reduces dynamic power and delay simultaneously. However, the design of reduced swing logic circuits is more complex than that of full swing logic circuits, because in reduced swing circuits some of the transistors are fully ON while the others are only partially ON. To illustrate the difference between full swing and reduced swing logic circuits, we may refer to full swing logic circuits as voltage mode logic (VML) circuits. Such circuits rely on the value of the input voltage(s) to switch the different gate transistors to either the ON or the OFF state, creating a path from the output node to one of the supply rails. This is different from reduced voltage swing circuits, where some transistors are completely ON while the other transistors are partially ON. Such transistors do not work properly in VML. Current mode logic, where all the transistors are ON (partially or fully), is preferred for such cases. In current mode logic circuits, the output logic value of the gate is based on the difference between the currents passing through the two branches of the circuit, as in the case of MOS current mode logic (MCML) [2]. However, MCML is not widely used in digital design because of its static power dissipation and design complexity. In this paper, we present a new logic family, referred to as dynamic current mode logic (DyCML). The new logic family utilizes the current mode scheme to reduce dynamic power and enhance performance. At the same time, DyCML circuits utilize dynamic operation to eliminate the static power dissipation associated with current mode logic circuits. Therefore, DyCML achieves high performance at low voltage with low power dissipation.
The authors are with the Department of Electrical and Computer Engineering, University of Waterloo, ON N2L-3G1, Canada (e-mail: mwaleed@vlsi.uwaterloo.ca; elmasry@vlsi.uwaterloo.ca).
Publisher Item Identifier S 0018-9200(01)01469-X.
This paper is organized as follows. The next section outlines the operation of MCML circuits and describes their advantages and disadvantages. Then, the DyCML circuit architecture and operation are introduced in Section III. Section IV describes the implementation of DyCML logic and the simulation results. The following section presents the experimental results, followed by a summary.
II. MOS CURRENT MODE LOGIC (MCML)
In order to explain the architecture and operation of MCML circuits, an MCML inverter is shown in Fig. 1 [2]. The tail transistor acts as a dc current source controlled by a reference voltage. Two resistors connected to the supply act as pull-up loads. The logic function is implemented by the logic block connected between the resistors and the current source. For an inverter/buffer, the logic block is a differential pair of two transistors. The operation of MCML logic is based on the differential pair circuit. Each input variable is connected to a differential pair circuit. The value of the input variable controls the flow of current through the two branches of the differential pair. For example, if one input of the pair is higher than the other, the current through the transistor it drives exceeds the current through the other transistor. Therefore, the voltage of the output node on that branch begins to drop until it reaches a steady state, where the current through the load resistor matches the current through the transistor. In the meantime, the other output node is charged to $V_{DD}$ through its load resistor. The output voltage swing is defined as the voltage difference between the two output nodes at steady state. The amount of current passing through the ON branch controls the discharge delay of the logic gate (1 → 0 transition), while the load resistor controls the charging of the output nodes (0 → 1 transition).
0018-9200/01$10.00 © 2001 IEEE
To achieve the best performance, all of the current needs to pass through the ON branch only, and the load resistors should be small in order to reduce the RC delay. This guarantees that the voltage at one of the output nodes is $V_{DD}$, while the other nodal voltage is $V_{DD} - IR$, where $I$ is the value of the current flowing through the current source and $R$ is the load resistance of each pull-up resistor. MCML circuits are faster than other logic families because they use NMOS transistors only, and these transistors operate only in the saturation or linear regions.
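The steady-state levels described above can be sketched numerically. The snippet assumes an ideal gate in which the whole bias current flows through the ON branch, so the swing is simply the bias current times the load resistance; the function names and all numeric values are hypothetical.

```python
def mcml_output_levels(v_dd, i_bias, r_load):
    """Return (v_high, v_low, swing) in volts for an idealized MCML gate.

    Assumes all of the bias current flows through the ON branch.
    """
    v_high = v_dd                    # OFF branch: pulled up to the supply
    v_low = v_dd - i_bias * r_load   # ON branch: drop across the load resistor
    return v_high, v_low, v_high - v_low


def rc_time_constant(r_load, c_load):
    """First-order time constant (seconds) of the 0 -> 1 output transition."""
    return r_load * c_load


# Hypothetical values: 3.3 V supply, 100 uA bias, 6.6 kOhm loads, 50 fF load.
v_high, v_low, swing = mcml_output_levels(v_dd=3.3, i_bias=100e-6, r_load=6.6e3)
tau = rc_time_constant(6.6e3, 50e-15)
print(f"high = {v_high:.2f} V, low = {v_low:.2f} V, swing = {swing:.2f} V")
print(f"RC time constant = {tau * 1e9:.2f} ns")
```

For a fixed swing, a smaller load resistance shortens the RC time constant but needs a proportionally larger bias current, which is the trade-off behind the static power drawback discussed next.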
The small output swing of MCML circuits reduces the cross talk between adjacent signals. The constant current source reduces the switching noise and supply fluctuations. For these reasons, MCML is recommended for mixed signal design to reduce the interference between the digital and analog blocks. The reduced output swing also reduces the dynamic power dissipation in the case of long busses. Therefore, MCML may be used in the implementation of bus transceivers to reduce power and noise. Another important feature of MCML circuits is their high noise immunity, which results from their differential nature and is valuable at high operating frequencies [3].
However, MCML has some major drawbacks which limit its use in digital systems. First is the static power dissipation due to the constant current source which is independent of the operating frequency. By using Power/MHz as a measure for power dissipation, MCML power dissipation is reasonable at high operating frequency, but the Power/MHz becomes much higher at lower operating frequencies because the current source is fixed. Compared with CMOS circuits, MCML consumes more power at low frequencies. Therefore, MCML is preferred in high-frequency applications only, in order to reduce the overhead of its static biasing power. Secondly, MCML is not suitable for power-down modes because of the dc current source. Hence, it is inappropriate for large systems, where power-down techniques are used to reduce the system power. MCML circuits also require special fabrication technologies to implement the large load resistors in a reasonable area. This increases the cost and area of the chip. MCML designs need to include a reference voltage distribution tree to control the current source of each gate, leading to larger chip area and more complex routing. Finally, the matching of the rise and fall delays is not an easy task, because it is a function of the load of each gate.
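The Power/MHz argument can be made concrete with a small sketch. It compares the static bias power of a constant-current gate with the full swing switching power of a CMOS gate, under strong simplifications: the dynamic power of the MCML gate is ignored and the CMOS gate is assumed to charge and discharge its load once per clock cycle. All names and numbers are illustrative assumptions.

```python
def mcml_static_power_per_mhz(v_dd, i_bias, f_mhz):
    """Static bias power per MHz (W/MHz) of a constant-current gate."""
    return v_dd * i_bias / f_mhz


def cmos_power_per_mhz(c_load, v_dd):
    """Full swing switching power per MHz (W/MHz), one cycle per clock."""
    return c_load * v_dd ** 2 * 1e6


def crossover_mhz(v_dd, i_bias, c_load):
    """Frequency (MHz) at which the two per-MHz figures are equal."""
    return i_bias / (c_load * v_dd) / 1e6


# Hypothetical gate: 3.3 V supply, 100 uA bias current, 100 fF load.
V_DD, I_BIAS, C_LOAD = 3.3, 100e-6, 100e-15
for f_mhz in (10, 100, 1000):
    mcml = mcml_static_power_per_mhz(V_DD, I_BIAS, f_mhz)
    cmos = cmos_power_per_mhz(C_LOAD, V_DD)
    print(f"{f_mhz:5d} MHz: MCML {mcml * 1e6:6.2f} uW/MHz, CMOS {cmos * 1e6:.2f} uW/MHz")
print(f"crossover near {crossover_mhz(V_DD, I_BIAS, C_LOAD):.0f} MHz")
```

Because the bias power does not depend on frequency, its per-MHz cost falls as 1/f, while the CMOS figure stays constant.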
III. DyCML CIRCUIT ARCHITECTURE AND OPERATION
To achieve the high-speed characteristics of MCML while avoiding its drawbacks, the current source and load resistors of the MCML gate should be redesigned. Dynamic current mode logic (DyCML) employs a dynamic current source with a virtual ground to eliminate the static power and other side effects associated with the conventional static current source. The new architecture also utilizes active loads instead of the traditional load resistors to reduce power dissipation. Fig. 2 shows the basic architecture of a DyCML logic gate. It consists of the following: an MCML block for logic function evaluation, a precharge circuit (two output precharge transistors and a transistor that discharges the capacitor), a dynamic current source (a clocked transistor and a capacitor), and a latch of two cross-connected transistors to preserve the logic value after evaluation. DyCML operates as follows. During the low phase of the clock, the two precharge transistors turn ON to charge the output nodes to $V_{DD}$, while the discharge transistor turns ON to discharge the capacitor to ground. Meanwhile, the current-source transistor is OFF, eliminating the dc path from $V_{DD}$ to ground. During the high clock phase, the three transistors of the precharge circuit turn OFF, while the current-source transistor switches ON, creating a current path from the two precharged output nodes to the capacitor. The latter acts as a virtual ground. These two paths have different impedances depending on the logic function and inputs; therefore, one of the output nodes drops faster than the other. The cross-connected transistors speed up the evaluation and maintain the logic levels after evaluation. During the evaluation phase, when the voltage of one output node drops below $V_{DD} - |V_{tp}|$, where $V_{tp}$ is the PMOS threshold voltage, the latch transistor whose gate is connected to this node turns ON, charging the other output node back to $V_{DD}$. Fig. 3 shows the voltages at different nodes in a DyCML gate.
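The precharge and evaluation phases can be summarized with a purely behavioral model. It tracks idealized node voltages only and is not an electrical simulation; the class name, the 1.0-V swing, and the assumption that the virtual ground settles at the low output level are illustrative choices rather than values taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class DyCMLGateModel:
    """Idealized node voltages of one DyCML gate (behavioral sketch only)."""
    v_dd: float = 3.3      # supply voltage in volts
    v_swing: float = 1.0   # hypothetical output swing in volts
    out: float = 0.0       # true output node
    out_b: float = 0.0     # complementary output node
    v_cap: float = 0.0     # virtual ground (capacitor) node

    def precharge(self):
        """Clock low: both outputs charged to V_DD, capacitor discharged."""
        self.out = self.out_b = self.v_dd
        self.v_cap = 0.0

    def evaluate(self, logic_value):
        """Clock high: one output drops by the swing, the latch holds the other."""
        low = self.v_dd - self.v_swing
        if logic_value:
            self.out, self.out_b = self.v_dd, low
        else:
            self.out, self.out_b = low, self.v_dd
        # Assumption: the virtual ground settles near the low output level.
        self.v_cap = low


gate = DyCMLGateModel()
for value in (True, False):
    gate.precharge()
    gate.evaluate(value)
    print(f"logic {value}: out = {gate.out:.1f} V, out_b = {gate.out_b:.1f} V")
```

In every cycle exactly one output node loses one swing's worth of charge, whichever value is evaluated.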
A transistor is used as the capacitor. It acts as a virtual ground to limit the amount of charge transferred from the output node(s). The value of this capacitor depends on the load capacitance (fan out) and the required output voltage swing. From Fig. 3, it is clear that the capacitor node settles at $V_{DD} - V_{\text{swing}}$ after the evaluation. Since the charge stored on the capacitor transistor equals the charge drained from the output node, the following equations are used to calculate its size:

$$C_L \, V_{\text{swing}} = C_{ox} \, W L \, (V_{DD} - V_{\text{swing}}) \qquad (2)$$

$$W L = \frac{C_L}{C_{ox}} \cdot \frac{V_{\text{swing}}}{V_{DD} - V_{\text{swing}}} \qquad (3)$$

where $V_{\text{swing}}$ is the output voltage swing, $W$ and $L$ are the width and length of the capacitor transistor, respectively, $C_{ox}$ is the gate oxide capacitance per unit area, and $C_L$ is the load capacitance per output node. The parasitic capacitances of the MCML block are included in $C_L$, as well as the gate capacitances of the latch transistors and the parasitic capacitances of the precharge transistors. Although the voltage of only one output node drops, a small current (charge) flows from the other output node to the capacitor until the latch switches ON. Thus, the capacitor transistor should be sized up to accommodate this extra charge. From simulation results, the required increase in size was found to be 20%. The area of the capacitor transistor is a minor fraction of the total gate area. For example, for a DyCML inverter gate with a fan out of 8, implemented in 0.6 µm, the capacitor transistor area should be 10 µm², while the logic gate area is around 250 µm², i.e., the capacitor size is about 4% of the gate area. The size of the capacitor transistor is small because of the following:
• The output voltage swing is around 20% of the supply voltage [2].
• The gate oxide capacitance per unit area is large, especially for new fabrication technologies where the thickness of the gate oxide is reduced.
• The input transistors of the DyCML logic gate are small because these transistors are responsible for steering the current only. The logic transistors are not supposed to completely charge or discharge the output load.
Unfortunately, in deep submicron (DSM) technologies, the interconnect capacitance grows rapidly, leading to a higher load and, consequently, a larger capacitor. However, the increase in the interconnect capacitance is largely offset by the increase in gate capacitance per unit area, which leads to smaller capacitor sizes. Therefore, the percentage increase in the capacitor size in DSM technologies is limited and is not expected to have a major effect on the logic gate size.
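The sizing argument can be sketched as a short calculation. It rests on an assumed charge balance, C_L · V_swing = C_1 · (V_DD − V_swing) with C_1 = C_ox · W · L, meaning that the capacitor node is taken to settle at V_DD − V_swing, plus the 20% margin quoted from simulation. The function name and the numeric inputs (100 fF load, 2.5 fF/µm² oxide capacitance, 0.66-V swing) are hypothetical.

```python
def capacitor_area_um2(c_load_ff, c_ox_ff_per_um2, v_dd, v_swing, margin=0.20):
    """Gate area W*L (um^2) of the capacitor-connected transistor.

    Assumes the charge drained from one output node, C_L * V_swing, ends up
    on a capacitor C_1 = C_ox * W * L charged to (V_DD - V_swing).
    `margin` covers the extra charge drawn before the latch switches ON.
    """
    if not 0 < v_swing < v_dd:
        raise ValueError("swing must lie between 0 and the supply voltage")
    c1_ff = c_load_ff * v_swing / (v_dd - v_swing)
    return (1 + margin) * c1_ff / c_ox_ff_per_um2


# Hypothetical gate: 100 fF load, 2.5 fF/um^2 oxide capacitance, 0.66 V swing.
base = capacitor_area_um2(100, 2.5, 3.3, 0.66)
# Hypothetical scaled process: the load and C_ox both double.
scaled = capacitor_area_um2(200, 5.0, 3.3, 0.66)
print(f"base area:   {base:.1f} um^2")
print(f"scaled area: {scaled:.1f} um^2")
```

Under this model the area depends only on the ratio of load to oxide capacitance, so a heavier interconnect load in a scaled process is offset when the gate capacitance per unit area rises by the same factor.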
A. Operation of the Dynamic Current Source
The clocked transistor and the capacitor form a dynamic current source, which enhances the performance of DyCML gates dramatically. The current source operates as follows: at the beginning of the evaluation phase, the transistor acts as a current source with its gate biased at $V_{DD}$, drawing a large current from the MCML block. As the current charges the capacitor, the capacitor node voltage starts to rise, limiting the current flowing through the transistor until it eventually turns OFF when the voltage across it becomes zero. This large instantaneous current speeds up evaluation, leading to a smaller delay. The instantaneous current is shown in Fig. 4(b), whereas Fig. 4(a) shows the terminal voltages of the current-source transistor. Dynamic power dissipation of DyCML gates is small compared with other dynamic differential logic styles because of the reduced output swing and small input transistors. The cross-connected latch transistors eliminate the subthreshold leakage problem that degrades the stability of dynamic logic circuits.
DyCML does not suffer from static power dissipation because the clocked transistors in the path from $V_{DD}$ to ground never turn ON simultaneously. Only dynamic power exists, and it is independent of the input combinations. This occurs because the voltage at one of the output nodes is $V_{DD}$, whereas the other drops to $V_{DD} - V_{\text{swing}}$ after each precharge/evaluation cycle.
The proposed DyCML gate operates properly at supply voltages as low as . This value guarantees that, during the evaluation phase, the latch switches ON to avoid any charge leakage.
B. Cascading DyCML Gates
DyCML gates may be cascaded in two different fashions by using:
• a clock delay (CD) mechanism, where the clock signal is buffered from one gate to another;
• a self-timing scheme, where a gate generates the clock signal for the gates in the following logic level.
1) Clock Delay (CD):
Clock delay is a well-known scheme in dynamic circuits. The clock signal is delayed between cascaded gates by adding a buffer. A single clock buffer may be used to generate the clock signal feeding more than one gate. This is possible as long as the gates have equal logic depths. This scheme is the simplest and gives the best power and delay results. However, the clock delay must be larger than the gate delay. From the simulation results, it was found that even complex DyCML logic gates with large fan outs have half the delay of the clock buffer. Therefore, this condition is satisfied because of the speed of the DyCML gate.
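The timing condition for the clock delay scheme can be written as a simple check. The sketch assumes one identical clock buffer between consecutive logic levels; the function names and the delay values are hypothetical.

```python
def clock_delay_is_safe(gate_delays_ns, clock_buffer_delay_ns):
    """True if every gate finishes evaluating before the next stage is clocked."""
    return all(d < clock_buffer_delay_ns for d in gate_delays_ns)


def last_output_time_ns(gate_delays_ns, clock_buffer_delay_ns):
    """Time from the first clock edge until the last stage has evaluated."""
    stages = len(gate_delays_ns)
    return (stages - 1) * clock_buffer_delay_ns + gate_delays_ns[-1]


# Hypothetical three-level chain with a 0.25-ns clock buffer.
delays = [0.10, 0.12, 0.11]
print("safe:", clock_delay_is_safe(delays, 0.25))
print(f"last output after {last_output_time_ns(delays, 0.25):.2f} ns")
```

In this model the latency of a chain is set mainly by the clock buffers rather than by the gates, which is why the buffer delay only needs to exceed the slowest gate delay by a safe margin.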
2) Self-Timed Scheme (ST): Self-timing requires each gate to generate a completion signal for the following logic level, as shown in Fig. 5. In DyCML, this signal may be the voltage on the capacitor transistor (node d in the DyCML gate schematic). A special buffer is used to convert this signal to a full swing signal to be used as the clock signal for the next block. Fig. 6 shows the architecture of this buffer. It consists of a cascade of two clocked inverters. The PMOS transistor of the second buffer is removed to reduce the delay of the generated clock signal. The input to the first inverter, EOE (the End Of the Evaluation), is the voltage on the capacitor transistor of the previous logic level. The buffer operates as follows: when the clock (CLK) is low, the precharge transistor of the first inverter turns ON, charging the internal node to $V_{DD}$, which turns the output pull-up transistor OFF. The output pull-down transistor turns ON and discharges the output node to "0." Since the gate of the capacitor transistor is discharged to "0" while the clock is low, the input transistor is OFF during this clock phase.
When the clock signal becomes high, the precharge transistor and the output pull-down transistor both turn OFF. Until the EOE input starts to rise, no current passes from the internal node to ground, keeping the output pull-up transistor OFF. When the input starts to rise, the input transistor switches ON, discharging the internal node to "0." Consequently, the output pull-up transistor turns ON to charge the output node to $V_{DD}$. Fig. 7 shows the voltages of various nodes in the buffer.
The ST clocking scheme is more appropriate for circuits with large variations in operating conditions such as supply voltage and temperature. The reason is that each logic level does not start evaluation until the previous level has finished evaluating, unlike the CD technique, where a gate starts evaluation as soon as the delayed clock signal arrives. The price for the increased stability is higher delay and power dissipation because of the buffers.
C. DyCML-CMOS Interfacing
DyCML gates may be used in conjunction with CMOS gates in the same design. Inputs of DyCML logic may be connected directly to CMOS gates' outputs. No buffering or interfacing circuits are required. To connect the outputs of DyCML gates to the inputs of CMOS gates, a special buffer is required to convert the reduced swing signal to a full swing signal. Many differential-to-single-ended buffers exist. Unfortunately, most of them are complex and rely on a dc current bias. To take advantage of the presence of a clock signal in DyCML logic, a new conversion circuit is designed as shown in Fig. 8. It consists of a clocked inverter followed by a regular CMOS inverter. The buffer operates as follows: when the clock is "0," the clocked transistor is ON, discharging the internal node to ground, and therefore the output node becomes high. Since the DyCML gate outputs are precharged to $V_{DD}$ when the clock is low, the input transistor is OFF. When the clock becomes high, the clocked transistor turns OFF. Depending on the input signal (the output of the DyCML gate), the input transistor will either turn ON, leading to a "0" output, or stay OFF, keeping the output at "1." To speed up the interfacing circuit, the voltage swing of the DyCML gate needs to be increased. This increase is required only at the gates driving the CMOS logic gates. The voltages on the interfacing circuit are shown in Fig. 9.
IV. CIRCUIT IMPLEMENTATION AND SIMULATION RESULTS
To evaluate the performance of the DyCML logic style, a set of logic gates was designed and simulated using 0.6-µm CMOS (HP/MOSIS/CMC) technology. This technology has an effective channel length of 0.5 µm and threshold voltages of about 0.7 and 0.9 V for the NMOS and PMOS transistors, respectively. To compare the performance of DyCML with other logic styles, six logic gates (AND, OR, XOR, MUX, AOI, FA) were implemented using five different logic families, namely, Conventional CMOS [4], Complementary Pass Logic (CPL) [4], Domino [5], Dynamic Differential Cascode Voltage Switch (DDCVS) [3], and MCML. A 16-bit CLA adder was also used to evaluate the performance of each family on the block level.
The stability of DyCML circuits was also examined under different operating conditions.
A. Gate Simulation
For comparison purposes, five of the most frequently used logic gates were chosen, namely, NAND/AND, NOR/OR, XOR, MUX, and AOI. The full adder is also included, as it has been used historically to evaluate logic families. The two different cascading techniques for DyCML (CD, ST) were compared to CMOS, CPL, Domino, DDCVS, and MCML logic styles.
The performance of each logic family is a function of the input waveform [4]. The rise and fall times control short circuit currents, charge sharing, power dissipation, and speed of evaluation. For these reasons, it is preferable to drive each logic gate by a logic gate from the same logic style. Thus, the setup shown in Fig. 10 was used to compare the different logic gates. For each simulation, ten logic gates of the same type were cascaded to decrease the percentage of error in the measurements. Each logic gate had a fan out of two gates of the same type. The output of each gate was alternately connected to one of the inputs in the following stage. This ensured that the gate was properly loaded. This was particularly important for CPL implementations, where the output of the CPL gate was alternately connected to the "gate" or "drain" of the device in the next stage. A driving gate was added to drive the input of the first gate; its power and delay were not taken into account in the measurements. All gates for all the logic styles were sized to achieve the minimum energy-delay product (EDP). Delay was calculated as the worst-case delay, whereas Power/MHz was used as a measure of power consumption. The effect of interconnect capacitance is included in the simulation as a lumped wiring capacitance, estimated as the wiring capacitance between two logic gates that are five logic gates apart in both the vertical and horizontal directions. Clock tree power was included in the power measurement for Domino and DDCVS circuits. For MCML gates, the reference voltage is a static signal. Therefore, it was neglected during power calculation since it charged the current source only once during startup.
The simulations were executed at 100-MHz frequency while the voltage supply was 3.3 V. Because of the noninverting nature of conventional Domino logic, the XOR and full adder were implemented using NP-Domino, whereas the MUX was not implemented in Domino because it is impractical. Table I shows the simulation results for the different logic styles. All the results are normalized with respect to CMOS results to simplify the comparison. The delay of DyCML gates is the lowest among all the logic styles. As mentioned earlier, this is a result of the high evaluation current, reduced output voltage swing, and the logic transistors that do not operate in the cut-off region. The second best delay is obtained using DDCVS followed by Domino, MCML, CPL, and CMOS, respectively. The DyCML (ST) had 17%-30% higher delay compared to the DyCML (CD) because of the self-timing circuitry.
Since the energy-delay product (EDP) is equal to delay² × power, DyCML has the best EDP among all the logic styles.
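The figures of merit used in the comparison follow directly from power and delay. The sketch below computes the power-delay product (PDP = power × delay) and the energy-delay product (EDP = power × delay²) and normalizes them to a reference style. The power and delay numbers are made-up placeholders, not the results of Table I.

```python
def figures_of_merit(power_w, delay_s):
    """Return (PDP, EDP): energy per operation and energy-delay product."""
    pdp = power_w * delay_s   # joules
    edp = pdp * delay_s       # joule-seconds, i.e. power * delay**2
    return pdp, edp


def normalized(value, reference):
    """Express a figure of merit relative to the reference logic style."""
    return value / reference


# Placeholder numbers only: (power in W, delay in s) for two styles.
styles = {"reference": (100e-6, 1.0e-9), "candidate": (80e-6, 0.5e-9)}
ref_pdp, ref_edp = figures_of_merit(*styles["reference"])
for name, (power, delay) in styles.items():
    pdp, edp = figures_of_merit(power, delay)
    print(f"{name}: PDP = {normalized(pdp, ref_pdp):.2f}, EDP = {normalized(edp, ref_edp):.2f}")
```

Because delay enters EDP squared, a style that halves the delay improves EDP by a factor of four at equal power, so EDP rewards speed more strongly than PDP does.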
From the simulation results, it is clear that DyCML circuits achieve high speed at a reasonable power dissipation. DyCML is more suitable for complex logic gates, where the power overhead of the extra circuitry vanishes with respect to the power of the logic evaluation block.
B. CLA Adder Simulation
On the block level, a 16-bit CLA adder was used to compare DyCML against the five logic styles. Fig. 11 illustrates the architecture of the 16-bit CLA adder. For more details on CLA adders, refer to [6] . CLA was chosen as a comparison figure because of the reasons mentioned in [7] .
Figs. 12 and 13 illustrate a block generate gate and a carry propagation gate implemented in DyCML. Both gates utilize the exclusive relationship between propagate and generate signals to reduce the number of transistors and the gate complexity. The logic expression for the generate gate is (4) while the logic expression for the carry propagation gate is (5) where ranges from one to three. Table II presents the power (including clock power if applicable), delay, power-delay product, and energy-delay product results for the seven logic styles. All the results are normalized to CMOS logic results. Both versions of DyCML proved to have the smallest delay among all the logic styles, because of the large evaluation current and reduced output swing. DyCML also had limited power dissipation due to its reduced output swing. DyCML tops all logic styles for minimum EDP, which proves that the CLA adder is efficiently implemented using DyCML. The clock power in DyCML is only 35% of the total power dissipation, because all the evaluation and precharge transistors are minimum-size transistors.
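For readers unfamiliar with look-ahead logic, the sketch below uses the textbook definitions of generate (g = a AND b) and propagate (p = a XOR b) for a 4-bit block. It is not a reconstruction of expressions (4) and (5); it only illustrates that g and p are mutually exclusive and that the block carry out can be formed from the block generate and propagate signals. The carries are evaluated in a loop for brevity, whereas hardware flattens the recurrence. All names are hypothetical.

```python
def cla_4bit(a, b, c_in):
    """Textbook 4-bit look-ahead block.

    Returns (sum, carry_out, block_generate, block_propagate).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate bits
    carries = [c_in]
    for i in range(4):
        carries.append(g[i] | (p[i] & carries[i]))
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    block_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    block_p = p[3] & p[2] & p[1] & p[0]
    return total, carries[4], block_g, block_p


def generate_and_propagate_exclusive(a, b):
    """True if no bit position has both generate and propagate set."""
    return ((a & b) & (a ^ b)) == 0


s, c_out, block_g, block_p = cla_4bit(0b1011, 0b0110, 1)
print(f"sum = {s:04b}, carry out = {c_out}, G = {block_g}, P = {block_p}")
```

The mutual exclusivity of g and p in each bit position is the property that lets a gate implementation share or drop transistors, as the paragraph above notes for the DyCML gates.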
C. Stability
A divide-by-2 circuit (T F/F) is designed to determine the effect of supply voltage scaling on the performance of the DyCML logic. The schematic of the circuit is shown in Fig. 14 . The maximum operating frequency of the circuit is defined as the maximum frequency at which the divide-by-2 circuit is able to generate a voltage swing of 20% at the output nodes. Fig. 15 shows the maximum frequency versus supply voltage for the divide-by-2 circuit.
The simulation results show that the maximum toggle frequency of DyCML is about 2.7 GHz at a 3.3-V supply, which is about 40% higher than that of the equivalent CMOS circuit. The high speed is achieved mainly because of the reduced output swing, which requires a smaller amount of charge to be transferred through the logic block. In addition, the logic transistors are already ON at the start of the evaluation phase, unlike in CMOS logic, where some transistors are OFF and must be turned ON before the output nodes can toggle between states.
V. EXPERIMENTAL RESULTS
The 16-bit DyCML (CD) CLA adder test chip was fabricated in a 0.6-µm CMOS process. The chip has a built-in clock recovery circuit and a scan chain to force the CLA inputs into the chip. To monitor the outputs, reduced-to-full-swing buffers are used to convert the outputs of the CLA to full swing signals before the output pads. The adder and buffers occupy 690 × 160 µm², compared to 870 × 150 µm² for the equivalent CMOS implementation. The microphotograph of the chip is shown in Fig. 16.
The chip was tested at frequencies ranging from 0 to 400 MHz. The testing was done in two phases. The first phase checks the functionality. In this test, a set of random input vectors is shifted into the chip, and the output is compared to the expected results. The test vector generation and validation are carried out using an IMS XL100 logic tester running at 50 MHz. The CLA chip operates successfully down to a supply voltage of 2.2 V at 50 MHz. Because of the limitation on the clock frequency of the IMS tester, another test setup had to be implemented to measure the maximum operating frequency, power, and delay.
The second test scheme forces the test vectors that yield the worst-case delay. These vectors are A = 0's, B = 1's, and Cin toggling between "1" and "0" at half the clock frequency. Because the power of DyCML is not a function of the input combinations, this test can also be used to measure the power. The output signals were recorded using a 50-GHz Tektronix 11801C digital sampling scope. Fig. 17 shows the measured waveforms. The signals from left to right are the clock signal, the carry out of the first 4-bit CLA block, the carry in to the last CLA block, and the last output sum bit. The measured delay was 7% less than the simulated delay, while the measured power was 4.5% less than the value simulated from the layout, which falls within process variation limits. The measured delay of the CLA adder is 1.24 ns. At 400 MHz and a 3.3-V supply, the chip consumes 19.2 mW. The measurement results include the power and delay of the conversion buffers.
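The measured figures can be converted into the Power/MHz metric used earlier with a few lines of arithmetic. Only the three input values (19.2 mW, 400 MHz, 1.24 ns) come from the measurements above; the derived quantities and the variable names are computed here for illustration and are not reported in the paper.

```python
# Measured values quoted in the text.
power_w = 19.2e-3    # power at 400 MHz and a 3.3-V supply
freq_hz = 400e6      # clock frequency
delay_s = 1.24e-9    # measured adder delay

# Derived quantities (illustrative arithmetic only).
power_per_mhz_w = power_w / (freq_hz / 1e6)     # Power/MHz metric
energy_per_cycle_j = power_w / freq_hz          # energy drawn per clock cycle
power_delay_product_j = power_w * delay_s       # PDP
energy_delay_product_js = power_delay_product_j * delay_s  # EDP

print(f"Power/MHz:        {power_per_mhz_w * 1e6:.1f} uW/MHz")
print(f"Energy per cycle: {energy_per_cycle_j * 1e12:.1f} pJ")
print(f"PDP:              {power_delay_product_j * 1e12:.2f} pJ")
print(f"EDP:              {energy_delay_product_js:.3e} J*s")
```

Power/MHz and energy per cycle are the same quantity in different units, which is why the former serves as a frequency-independent measure for purely dynamic logic.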
VI. SUMMARY
In this paper, a new logic style, DyCML, is introduced. The DyCML family combines the advantages of both MCML logic circuits and dynamic logic styles. A major advantage of DyCML is the dynamic current source, which achieves smaller delays compared to basic MCML circuits. Other advantages inherited from MCML are high performance, noise immunity, and robustness to supply voltage scaling. DyCML gates reduce power dissipation by reducing the output voltage swing.
Simulation results show that DyCML circuits have better delay, power-delay, and energy-delay products compared to five of the most widely used logic styles. A 16-bit CLA adder was fabricated in 0.6-µm CMOS technology to validate the simulation results. Experimental results show that DyCML circuits achieve high speed with low power dissipation. The area of DyCML circuits is comparable to the area of the equivalent CMOS circuits.
