Abstract: In this paper we present a Low Voltage Differential Current Switch Logic (LVDCSL) gate which is capable of achieving high performance for large fan-in gates. High fan-in is enabled by allowing large stacked NMOS tree heights using a pre-discharged NMOS tree; at the same time, the power penalty of an increased number of internal nodes in the gate is mitigated by restricting internal node voltage swings. Topologically, it is a Cascode Voltage Switch Logic gate with a cross-coupled inverter based load. However, unlike other DCVS gates with cross-coupled inverters, it is fairly robust and relatively insensitive to load imbalances at the output. The salient features of this low-voltage DCSL family are: high speed for high fan-in, large stack height NMOS trees; low power due to restricted internal voltage swings; and a latching nature which locks out inputs once outputs are evaluated. While the gate exhibits spikes at its differential outputs (in common with other sense-amp based CVSL logic gates such as SSDL), transitions are greatly reduced, simplifying the interface to conventional CMOS circuits. Our results show that LVDCSL is capable of working at under 2 V in a 0.35 µm CMOS process while being faster than comparable Domino gates. At the same time, total power consumption is reduced. LVDCSL achieves a 40% delay improvement and a 22% power reduction in comparison with Domino gates for 8 bit carry lookahead circuits. The effect of changing circuit parameters on the energy-delay performance of LVDCSL is presented. Results for the critical path of an adder reveal that the complexity afforded by the gate effectively decreases the number of logic levels and leads to improved performance.
I. INTRODUCTION
This work was supported in part by Intel and by DARPA under contract F33615-95-C-1625. A preliminary version of the paper appeared in the proceedings of the Symposium on Low Power Electronics, in August 1997. Dinesh Somasekhar is with the Department of Electrical Engineering, Purdue University, West Lafayette, IN 47906 USA. E-mail: somasekh@ecn.purdue.edu. Kaushik Roy is with the Department of Electrical Engineering, Purdue University, West Lafayette, IN 47906 USA. E-mail: kaushik@ecn.purdue.edu

Unlike conventional CMOS circuit designs, which use low functionality gates with limited fan-in, differential cascode voltage switch (DCVS) circuits allow much higher functionality with greater fan-in [1], [2], [3]. This is especially true for DCVS logic styles which use internal sense circuits (a cross-coupled inverter pair) to speed output transitions, as in the case of Enable/Disable Cascode Voltage Switch Logic (ECDL) [5], [7], Sample Set Differential Logic (SSDL) [4], and Latched Differential Cascode Logic (LCDL) [6]. The DCVS logic style is especially good at implementing the exclusive-or type of functionality found in arithmetic circuits, using large NMOS trees [7], [8], [9]. The high functionality afforded by such gates allows the number of logic levels and gate output nodes to be minimized. However, DCVS circuits are not free of shortcomings. They exhibit high power consumption due to the precharged differential nature of the outputs. More importantly, a reduction in gate output nodes (due to multiple simple gates being replaced by a higher functionality gate) does not give any benefit in power, because the number of internal nodes in the NMOS tree which see near rail-to-rail swings is large. While higher complexity gates can reduce the number of logic levels, an overall performance improvement is achieved only if the gate delay does not degrade rapidly with an increase in the complexity of the gate, measured in terms of stacked MOS devices in the NMOS tree.
Apart from ECDL, the use of precharged NMOS stacks results in body-affected devices, which limits the stack height of the gates.
Low Voltage Differential Current Switch Logic is a DCVS logic gate which attempts to achieve low power consumption by restricting the voltage swing of internal nodes. At the same time, high fan-in gates are enabled using a pre-discharged NMOS tree, with the performance advantages of a precharged CVSL design. Previous versions of DCSL [10], while achieving these goals, were limited to high supply to threshold voltage ratios ($V_{DD} \gg 3V_t$) because of a large number of stacked MOS devices in discharge paths. In this paper we present LVDCSL, whose circuit topology enables it to operate with lower supply to threshold voltage ratios. It is thus capable of operating well in newer low voltage CMOS processes, with threshold voltages being an appreciable fraction of the supply voltage.

This paper is organized as follows. We briefly relate LVDCSL to various DCVS logic families in section I. A description of the topology of LVDCSL, with an emphasis on solutions to possible problems, is presented in section II. A description of the various stages of operation is given in section III. Section IV discusses the effects of having a pre-discharged NMOS tree, which enables high fan-in gates. The performance of the LVDCSL gate is compared with that of Domino gates in section V. The improvement in performance obtained is evaluated by replacing input sections in the critical path of a 64 bit adder with high fan-in LVDCSL gates. We summarize the salient features and the shortcomings of LVDCSL in section VI.
II. Low Voltage Differential Current Switch Logic
Low Voltage Differential Current Switch Logic (LVDCSL) is a differential cascode voltage switch (DCVS) logic gate. In common with DCVS logic families, it consists of a large evaluation NMOS tree with differential/complementary inputs and outputs, which provides the gate functionality. The NMOS tree is designed so that there is exactly one path from one of the outputs to ground through the tree. We can conceive the simplest form of precharged DCVS by precharging the tree outputs high. The NMOS tree now evaluates its inputs by discharging one of the outputs. LVDCSL uses techniques employed in ECDL and SSDL to speed the output transition, namely the addition of a cross-coupled inverter pair across the output, which acts as a simple sense amplifier. In fact, it is possible to extend ECDL and SSDL gates to get precharged low and precharged high DCSL logic families, which give superior performance with reduced power [10]. The topology of such a gate derived from SSDL (a precharged high DCSL gate) is shown in Figure 1. However, such a gate has a number of shortcomings which make it difficult to use in practice. The more important ones are: the large number of NMOS devices in series forces high supply to threshold voltage ratios; steep output spikes are observed; and the gate is not robust, since it is very sensitive to differences in output loading.
The new topology of the DCSL gate (LVDCSL) is shown in Figure 2. The gate uses a pre-discharged NMOS tree with a sensing stage to determine the outputs. In common with precharged CVSL gates, it has precharge and evaluate phases. Conventional DCVS gates allow a path from both outputs to the NMOS tree during the evaluate phase. This allows the output which is driven high to charge internal nodes of the NMOS tree to $V_{CC} - V_{th}$, where $V_{CC}$ is the supply voltage and $V_{th}$ the threshold voltage of the NMOS devices. LVDCSL mitigates this power penalty by disconnecting the high going output from the NMOS tree using transistors T7 and T8, which are cross-coupled to the outputs. Since the outputs are differential, if OUT is high, OUT# goes low and switches off T7. This disconnects the NMOS tree from OUT. The resultant effect is to limit the rise in voltage of the internal nodes of the NMOS tree. Simulations show that in a 3.3 V process with a 0.6 V threshold it is possible to restrict internal voltage swings to well under 1 V. Disconnecting the NMOS tree also reduces the total capacitance seen by the high going output and speeds its transitions. A detailed description of this feature of DCSL gates is provided in [10]. This disconnection of outputs from the NMOS tree after stabilization of the outputs allows the use of a pre-discharged NMOS tree without any rush-through currents from supply to ground.
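To give a rough feel for what the restricted swing buys, the sketch below compares the energy drawn from the supply when an internal tree node is charged to $V_{CC} - V_{th}$ with the energy drawn when its swing is held to 1 V. It assumes the usual first-order expression $E = C \, \Delta V \, V_{CC}$ for charging a capacitance from the supply. The 10 fF node capacitance is a hypothetical value chosen only for illustration; the supply, threshold and 1 V bound are the figures quoted above.

```python
def supply_energy(c_node, v_swing, v_cc):
    """First-order energy drawn from the supply to raise c_node by v_swing."""
    return c_node * v_swing * v_cc


V_CC = 3.3       # supply voltage quoted in the text (V)
V_TH = 0.6       # NMOS threshold quoted in the text (V)
C_NODE = 10e-15  # hypothetical internal node capacitance (F), for illustration

# Conventional DCVS: internal nodes charge to V_CC - V_th.
full_swing_energy = supply_energy(C_NODE, V_CC - V_TH, V_CC)

# LVDCSL: internal swing held to the 1 V upper bound quoted in the text.
restricted_energy = supply_energy(C_NODE, 1.0, V_CC)

ratio = restricted_energy / full_swing_energy
print(f"per-node energy ratio (restricted / full swing): {ratio:.2f}")
```

Under these assumptions the ratio depends only on the two swings, not on the assumed capacitance, so the hypothetical 10 fF value does not affect the comparison.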
Transistors T5, T6, P1, P2, P5 are used for precharging, and ensuring rail to rail swings at the drain of T5 and T8. P3/T3 and P4/T4 form an inverter pair which drives the output. The inputs of the inverter are the drains of T7 and T8. The inverter loop is closed during evaluation with T1 and T2 turning on. The sense structure formed speeds up the output transitions.
The DCSL gate shown in Figure 1, as mentioned previously, suffers from a number of shortcomings. We discuss these in greater detail, and present the way in which the LVDCSL topology of Figure 2 solves them.
Output Spikes: Both outputs start falling low until the inverter loop cuts in to drive one of the outputs high. To allow the gate to be easily interfaced to conventional CMOS circuits, this low going spike has to be limited. It is also advantageous to limit the spike since it leads to an added power loss. In Figure 1 we observe that the output spike will have a magnitude of at least $V_{tp}$, since the PMOS devices do not turn on and limit the spike until the output falls below $V_{CC} - V_{tp}$.
In reality the spike lies between $V_{tp}$ and $V_{CC}/2$. LVDCSL avoids this problem by having devices P3 and P4 active at the start of evaluation. The strong drive at the outputs limits the glitch.

Reduced supply voltages: Modern processes have lowered the supply voltage from 5 V to between 1.5 V and 3.3 V. In contrast, threshold voltages have not decreased to the same extent, in order to prevent excessive leakage and reduced noise margins in Domino gates. A 3.3 V process has thresholds in the range of 0.7 to 0.9 V, as opposed to 1 to 1.1 V in the older 5 V processes. While the original DCSL gate worked well with $V_t$ being $\frac{1}{5} V_{DD}$, it fails to operate well with $V_t$ around $\frac{1}{3} V_{DD}$. As mentioned before, the output spike is greater than $V_{tp}$; hence the source of T4 and T5 must fall well below $V_{CC} - V_{tp} - V_{tn}$ before a path to the NMOS tree is even established. This fact severely affects gate operation at low voltages with relatively high threshold voltages. Simulations in a 0.5 µm process revealed that at least 3 V was required for proper operation with a 0.9 V $V_t$. A side effect seen was that gate height is limited to quite low values. In contrast, since the LVDCSL gate limits the output spike, the NMOS tree charges from $V_{CC} - V_{tn}$.
Gate Robustness: Gate robustness is adversely affected for the following reasons.

In the evaluate phase for the gate shown in Figure 1, devices T1 and T2 shunt currents away from the evaluate tree. Only a fraction of the discharge current for the output is directly controlled by the NMOS evaluation tree. LVDCSL has devices T3 and T4 off at the start of evaluation, since their gates are precharged low by T5 and T6. The entire discharge current flows into the NMOS tree.

The cross-coupled inverter pair is as sensitive to imbalances in capacitive output loading as it is to current differences in the NMOS tree. This is aggravated because the disconnection of the evaluation NMOS tree after the outputs toggle prevents the outputs from recovering. In the case of LVDCSL, the inputs of the inverters are decoupled from the outputs through T1 and T2. T1 and T2 are saturated at the start of evaluation, since their sources are pre-discharged to GND through T5 and T6 and their drains are held at the supply. The sources of T1 and T2 (which are the inputs to the inverter pair) are therefore only weakly sensitive to the outputs.

While there are other points of concern, namely the sensitivity of the gate to injected noise, these can be avoided by proper design. A better understanding of these effects can be achieved by studying the operation of the LVDCSL gate.
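The supply headroom argument made under "Reduced supply voltages" can be put into numbers with a small sketch. It simply evaluates the two levels named in the text: $V_{CC} - V_{tp} - V_{tn}$ for the original DCSL gate and $V_{CC} - V_{tn}$ for LVDCSL. The function name is hypothetical, and taking $V_{tp} = V_{tn}$ is a simplifying assumption; the supply and threshold pairs are those mentioned in the discussion.

```python
def tree_start_level(v_cc, v_tn, v_tp, limits_spike):
    """Voltage level from which the NMOS tree starts to be driven.

    limits_spike=False models the original DCSL gate (V_CC - V_tp - V_tn);
    limits_spike=True models LVDCSL (V_CC - V_tn), as stated in the text.
    """
    if limits_spike:
        return v_cc - v_tn
    return v_cc - v_tp - v_tn


# (supply, threshold) pairs; equal NMOS and PMOS thresholds are assumed.
cases = [(5.0, 1.0), (3.3, 0.9), (3.0, 0.9)]

for v_cc, v_t in cases:
    dcsl = tree_start_level(v_cc, v_t, v_t, limits_spike=False)
    lvdcsl = tree_start_level(v_cc, v_t, v_t, limits_spike=True)
    print(f"V_CC={v_cc:.1f} V, V_t={v_t:.1f} V: "
          f"DCSL {dcsl:.2f} V, LVDCSL {lvdcsl:.2f} V")
```

The difference between the two levels is always one PMOS threshold, which becomes a large share of the available headroom as the supply approaches three times the threshold.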
III. Operation of the Gate
We can view the operation of the gate as being split over three states, namely the precharge, evaluate, and the stable output state.
A. Precharge State
The precharge state has CLK# high and CLK low. The state of the gate is shown in Figure 3. The grey transistors in the figure indicate transistors not in the active path, while the other transistors play a role in driving the outputs or internal nodes of the gate. As shown in the figure, the outputs are precharged high, while internal nodes A and B are discharged low. We note that if the inputs of the NMOS tree are all high (which is the case when the gate is fed by a preceding Domino internal node), there is a path from A/B to ground via the NMOS tree. This allows CLK#, the precharge clock, to be deactivated with the gate maintaining its precharge state.
Unlike the previous DCSL circuit of Figure 1, every transistor in the path from $V_{CC}$ to the evaluation NMOS tree is ...

Figure 4 shows the paths switched on when the gate enters the evaluate stage, with CLK high, CLK# deactivated, and the inputs to the NMOS tree set up. When T1 and T2 turn on, the NMOS tree begins to charge up through T1, T7 and T2, T8. Assume that the NMOS tree has a stronger path on the left; node A will then be held at a lower voltage than B. As node B rises above $V_{tn}$, T3 switches on and the positive feedback loop rapidly drives the outputs in the proper direction. T8 turns off because of OUT going low, which in turn disconnects the NMOS tree from the high output OUT#. This limits the voltage swing in the internal nodes of the NMOS tree. In common with the previous DCSL circuits, this achieves our goal of limiting the power consumption at internal nodes of the large NMOS evaluation tree.

The gate comes to rest in the state shown in Figure 5. In this state, changes in the inputs will not cause the gate state to be disturbed. We observe that the NMOS evaluation tree can disturb the state of the gate only by pulling an output node to ground. Since the path from the high output is disconnected, the NMOS evaluation tree cannot affect the outputs. Hence all inputs to the gate may be precharged after evaluation, in case they are fed by a similar gate. This allows for simple pipelined configurations.
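The three states described above (precharge, evaluate, stable output) and the resulting input lock-out can be captured in a purely behavioral sketch. This is a logic-level abstraction under stated assumptions, not a circuit model: the class name is hypothetical, and the mapping of the evaluated function value onto OUT and OUT# is an assumed polarity.

```python
class LatchingGateModel:
    """Logic-level abstraction of a precharge / evaluate / latched gate."""

    def __init__(self):
        self.precharge()

    def precharge(self):
        # Both differential outputs are precharged high; the latch is released.
        self.out, self.out_b = 1, 1
        self.locked = False

    def evaluate(self, tree_value):
        # Once the outputs have resolved, later input changes are ignored,
        # because the tree is disconnected from the high output.
        if not self.locked:
            self.out, self.out_b = (1, 0) if tree_value else (0, 1)
            self.locked = True
        return self.out, self.out_b


gate = LatchingGateModel()
first = gate.evaluate(True)    # outputs resolve on the first evaluation
second = gate.evaluate(False)  # inputs changed after evaluation: no effect
gate.precharge()               # a new cycle releases the latch
third = gate.evaluate(False)
print(first, second, third)
```

The point of the sketch is the `locked` flag: it stands in for the disconnection of the NMOS tree from the high output, which is what lets the inputs be precharged early in a pipelined configuration.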
The various states of operation and the voltage waveforms at various nodes are shown in Figure 6. These show the three states of operation, the voltage buildup at the internal nodes, and the final outputs. We note that CLK# needs to go low before CLK goes high (it may go low well in advance of the CLK rising edge). CLK needs to go low before CLK# goes high to avoid through-path currents during the precharge phase. While not shown, our simulations are carried out with all inputs originally high, and some set of them going low just before the CLK rising edge.

LVDCSL designs appear to be similar to CVSL in having an NMOS evaluation tree and a load network. However, as opposed to the precharged NMOS tree in most CVSL designs, the NMOS tree in LVDCSL is pre-discharged. Enable/disable cascode voltage switch logic is similar to LVDCSL in this sense. The use of pre-discharged NMOS trees accounts for the large fan-in capability of the gate, since transistors farther away from GND in an NMOS stack are not body-affected. This factor is well known for ECDL [7], where very high fan-in adder circuits are presented. The pre-discharged NMOS tree imposes a somewhat different set of constraints as opposed to a precharged high NMOS tree. The design of the NMOS tree should account for two factors which can cause a failure of the gate. First, the NMOS tree appears as an RC network, with the capacitance in both branches being as significant as the difference in resistance of the two paths. Second, the inputs to the tree transition from high to low and can couple to the internal nodes of the tree via the gate to source/drain capacitances.
A simplified circuit for one half of LVDCSL is shown in Figure 7. The feedback paths are not shown in this circuit. Transistor numbering follows the previous figures to show the correspondence. The NMOS tree is modeled as a simple RC network. The rise in voltage at node X is modeled as a ramp (based upon simulation results), i.e. the voltage at X at time $t$ is $V_x(t) = \alpha t$, where the ramp rate $\alpha$ is a constant. We justify this model since the source of T7 follows the gate of T7. The gate of T7 is connected to the output, where the voltage decay is nearly linear at the start of evaluation. Thus:

$$I_1 = \alpha \left( C_{eff} + \frac{t}{R} \right)$$

The cross-coupled inverter loop responds to the differential current arising from the two halves. We note that $I_1$ is dependent not only on $R$, the path resistance to ground, but also on the effective capacitance seen in the tree. Hence the NMOS network implementing the tree should ensure that the inactive half of the NMOS tree (high $R$) has a lower value of effective $C$, as seen from X, than the other half. Many structures, such as XOR trees, automatically satisfy this constraint. However, in cases such as a NAND/AND configuration, it may not be automatically satisfied unless proper transistor sizing is employed.
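The capacitance constraint can be illustrated numerically. The sketch below applies the ramp model to both halves of the tree: driving a parallel RC network with a voltage ramp of rate alpha gives a current alpha * (C_eff + t / R). All component values are hypothetical and chosen only to show the effect; the symbol name alpha for the ramp rate is likewise an assumption made for this sketch.

```python
def branch_current(alpha, c_eff, r, t):
    """Current into a parallel RC branch driven by the ramp V(t) = alpha * t."""
    return alpha * (c_eff + t / r)


def crossover_time(c_on, r_on, c_off, r_off):
    """Time after which the on path (low R) draws more current than the off path.

    Returns 0.0 when the on path wins from the start (c_off <= c_on).
    Assumes r_on < r_off.
    """
    if c_off <= c_on:
        return 0.0
    return (c_off - c_on) / (1.0 / r_on - 1.0 / r_off)


ALPHA = 1e9                   # hypothetical ramp rate: 1 V/ns
C_ON, R_ON = 20e-15, 5e3      # hypothetical conducting half: 20 fF, 5 kOhm
C_OFF, R_OFF = 30e-15, 1e6    # hypothetical non-conducting half: 30 fF, 1 MOhm

t_x = crossover_time(C_ON, R_ON, C_OFF, R_OFF)
early_on = branch_current(ALPHA, C_ON, R_ON, 0.5 * t_x)
early_off = branch_current(ALPHA, C_OFF, R_OFF, 0.5 * t_x)
print(f"crossover at {t_x * 1e12:.1f} ps; "
      f"before it, off path draws more current: {early_off > early_on}")
```

With these values the non-conducting half initially draws the larger current because of its larger capacitance, which is the failure mode the sizing constraint is meant to exclude. Making the off-path capacitance no larger than the on-path capacitance moves the crossover to zero.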
The inputs to LVDCSL gates transition from high to low before the evaluate state. A finite capacitance exists from gate to source/drain of every transistor in the NMOS tree. The falling transitions at the gate inputs couple to the source/drain regions and attempt to drive them lower than GND (the tree is pre-discharged before evaluate and all nodes are at GND). In the half of the NMOS tree where there is a path to ground, this effect is harmless since the nodes are held at GND; however, in the other half, because of the absence of any GND path, internal nodes and the node X can be driven negative. If a sufficiently large negative voltage builds up, the current $I_1$ in the off path can exceed that in the on path of the tree. The inverter loop then flips in the opposite direction. Tree structures which have a lot of NMOS devices connected at the node X are susceptible to such effects, and are not suitable for LVDCSL NMOS trees. As an example, an inverted carry propagate-generate structure is not suitable for use with LVDCSL gates.
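A first-order estimate of this coupling treats the floating node as a capacitive divider: the falling inputs drive the node through the gate to source/drain capacitances, against the remaining capacitance of the node. The sketch below uses that standard divider expression. The capacitance values are hypothetical, and it assumes the worst case in which all devices attached to the node have inputs that fall together.

```python
def coupled_node_shift(v_in_swing, n_devices, c_couple, c_node):
    """Voltage shift on a floating node when n_devices inputs fall by v_in_swing.

    c_couple is the gate to source/drain capacitance per device and c_node is
    the remaining capacitance of the node to ground (capacitive divider).
    """
    c_total = n_devices * c_couple
    return -v_in_swing * c_total / (c_total + c_node)


V_SWING = 2.2     # input swing (V), matching the supply used in section V
C_COUPLE = 1e-15  # hypothetical coupling capacitance per device (F)
C_NODE = 10e-15   # hypothetical remaining node capacitance (F)

for n in (1, 2, 4, 8):
    shift = coupled_node_shift(V_SWING, n, C_COUPLE, C_NODE)
    print(f"{n} devices on node X: shift of about {shift * 1e3:.0f} mV")
```

The shift grows with the number of devices attached to the node, which is consistent with the observation that trees with many NMOS devices connected at node X are the susceptible ones.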
It is apparent from Figure 7 that lowering the switching threshold of the inverter will improve the performance of the gate at the expense of noise margin. This effect will be quantified in section V. This means gate performance is dependent on the ratio of PMOS to NMOS device widths in the inverter. A larger NMOS and a smaller PMOS lower the switching threshold. The voltage at node A needs to increase by a smaller amount for the gate outputs to be exercised.
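The dependence of the switching threshold on the width ratio can be sketched with the textbook long-channel (square-law) expression for a CMOS inverter, obtained by equating the saturation currents of the two devices. This is an idealized model, not the simulation setup used in the paper: the mobility ratio of 2.5 and equal channel lengths are assumptions, and short-channel effects are ignored. The supply and threshold defaults follow the values quoted in section V.

```python
import math


def switching_threshold(w_n, w_p, v_dd=2.2, v_tn=0.45, v_tp=0.45,
                        mobility_ratio=2.5):
    """Square-law switching threshold of a CMOS inverter.

    w_n, w_p: NMOS and PMOS widths (same units, equal lengths assumed).
    v_tp is the magnitude of the PMOS threshold.
    mobility_ratio is the assumed electron to hole mobility ratio.
    """
    # r = sqrt(k_p / k_n), with k proportional to mobility times width.
    r = math.sqrt((w_p / w_n) / mobility_ratio)
    return (v_tn + r * (v_dd - v_tp)) / (1.0 + r)


for w_n, w_p in ((1.0, 2.0), (1.0, 1.0), (2.0, 1.0), (4.0, 1.0)):
    v_m = switching_threshold(w_n, w_p)
    print(f"W_n/W_p = {w_n / w_p:.1f}: switching threshold about {v_m:.2f} V")
```

In this model the threshold falls monotonically as the NMOS is widened relative to the PMOS, approaching the NMOS threshold voltage in the limit, which matches the qualitative statement above.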
The operation of the gate at the start of evaluation can be modeled as follows. Node A is low, and the output is strongly driven by the PMOS in the inverter (P4). The device is in its linear region and can be modeled as a resistor
The discharge currents into the NMOS tree cause a voltage drop across this resistor, which manifests itself as an output spike. Clearly the spike can be reduced by increasing the PMOS width. However, this runs counter to what was stated in the previous paragraph, where we would like to make the ratio of T3 to P3 widths large.
V. Performance with Respect to Domino
Performance of LVDCSL was verified with a 0.35 µm process. The supply voltage used was 2.2 V and below. Threshold voltages of the devices were around 0.45 V. Previous work [10] has shown the marked advantage of using DCSL gates in comparison to similar DCVS gates. In this work, instead of contrasting with DCVS gates, we compare with high performance Domino gates. A 64 bit adder was selected for this purpose. Domino gates were selectively replaced to evaluate the advantage of LVDCSL. The critical carry lookahead path has a basic building block of an 8 bit carry lookahead circuit, composed of two 4-input propagate-generate Domino gates followed by a 2-input static CMOS gate. The overall delay of the Domino gates (not including the static CMOS gate) was 210 ps. The functional complexity achieved by LVDCSL allows the two stage 8 bit CLA circuit to be implemented in a single 8 input stage. The 8 input CLA tree is a stack of stages, each implementing:

$$c_{i+1} = p_i \cdot c_i + g_i$$
$$\overline{c_{i+1}} = \overline{p_i} + \overline{c_i} \cdot \overline{g_i}$$

where $p_i$, $g_i$, and $c_i$ are the input propagate, generate and carry terms. The 8 input LVDCSL stage directly implements $c_8$ and $\overline{c_8}$.

Figure 8 shows the performance and the depth of the output spike with respect to variation in the width of the NMOS device in the inverter loop. We see that the gate is capable of surpassing the speed of a 4-input Domino gate (210 ps). In fact, the functionality achieved by this gate is a full 8 bit propagate-generate computation, as opposed to the 4 bit computation achieved in the Domino gate. The output voltage at the bottom of the glitch stays above $V_{DD}/2$, which allows the gate to directly drive static CMOS gates. While the speed of the gate is high, additional time has to be allowed for the setup time of inputs with respect to the clock. In spite of this (a 100 ps setup margin), the initial stage delay achieved by the 8 bit CLA dropped from 0.55 ns to 0.33 ns, a 40% improvement in performance.
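The stacked carry recurrence and its complementary form can be checked exhaustively in a few lines. The sketch below ripples both recurrences over all operand pairs and confirms that the two rails are always complementary and that the true rail equals the arithmetic carry out. It assumes an OR-type propagate ($p_i = a_i + b_i$) and $g_i = a_i \cdot b_i$, which is the condition under which the complementary form shown above is exact. The function names are hypothetical.

```python
from itertools import product


def carry_chain(p, g, c0):
    """Ripple c[i+1] = p[i]*c[i] + g[i] and return the final carry."""
    c = bool(c0)
    for p_i, g_i in zip(p, g):
        c = (p_i and c) or g_i
    return bool(c)


def carry_chain_complement(p, g, c0):
    """Ripple the complementary recurrence on the inverted signals."""
    c_bar = not c0
    for p_i, g_i in zip(p, g):
        c_bar = (not p_i) or (c_bar and not g_i)
    return bool(c_bar)


def bits(value, n_bits):
    """Little-endian list of the low n_bits of value."""
    return [bool((value >> i) & 1) for i in range(n_bits)]


def check_complementary(n_bits=8):
    """True if both rails agree with the arithmetic carry for all operands."""
    for a, b, c0 in product(range(2 ** n_bits), range(2 ** n_bits), (0, 1)):
        a_bits, b_bits = bits(a, n_bits), bits(b, n_bits)
        p = [x or y for x, y in zip(a_bits, b_bits)]   # assumed OR-type propagate
        g = [x and y for x, y in zip(a_bits, b_bits)]  # generate
        carry = carry_chain(p, g, c0)
        if carry == carry_chain_complement(p, g, c0):
            return False
        if carry != (((a + b + c0) >> n_bits) == 1):
            return False
    return True


print(check_complementary(8))
```

Each loop iteration of `carry_chain` corresponds to one level of the stacked tree, so an 8 input stage is eight applications of the recurrence resolved in a single gate evaluation.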
We note that the following factors help in improving the delay with respect to Domino. A decrease in the number of stages results because the higher functionality of the LVDCSL gate allows Domino gates to be combined. This may not be true in all situations; however, in the case of the adder it allows the compression of two stages into one. LVDCSL has a much lighter loading at the gate inputs, since the NMOS transistors in the evaluation tree are small. This factor also improves power consumption. The main factor which causes an increase in delay as compared to Domino is the need to allow for setup time for inputs with respect to CLK.

LVDCSL achieves the above high performance without compromising power. Figure 10 shows the power consumption of an 8 bit stage as compared to Domino. The graph shows the highly spiked Domino currents (8 mA peak) consumed during precharge. In contrast, LVDCSL draws supply current during both precharge and evaluate how- ...

... width of NMOS to PMOS increases. However, increasing the NMOS widths excessively dramatically increases the size of the output glitch. The low tree internal node voltages for the 3.3 V supply used are well illustrated in Figure 13. Increasing the size of T3 and T4 decreases the inverter switching thresholds, and is reflected in the decrease in the internal node voltages. The effect of inverter ratio on scaling is shown in Figure 13. Increasing the size of the NMOS devices increases the energy consumption proportionately. Normalizing energy with respect to the width of T3 + P3 (T4 + P4) indicates that the energy consumption per unit device width is substantially unchanged.

Figure 14 shows simulations for an 8 bit carry lookahead circuit using the robust LVDCSL gate. The height of the NMOS tree is 4. While the gate voltage degrades as we approach $V_{tp} + V_{tn} = 1.1$ V, it shows that the gate is usable down to twice the threshold voltage.
The graph shows the ...

A feature of LVDCSL is the automatic lock-out of inputs once gate evaluation is completed. This allows DCSL to operate correctly with partial voltage swing at its inputs. We verify this effect by restricting the input voltage swing to between $V_{DD}$ and $V_{LO}$ in Figure 15. Figure 16 plots the performance and energy consumption with an input voltage swing of $V_{DD} - V_{LO}$. While the delay increases as the swing decreases, the change is small (< 10%). LVDCSL gates may thus prove useful in interfacing to partial swing buses.
The advantage of using the new gate was verified by replacing the first stage of the critical path of a 64 bit adder in a 0.35 µm process. Single stage LVDCSL gates replace the carry lookahead circuitry for the 8 bit propagate-generate computation. The reduction in stages, as opposed to implementing it using a combination of LVDCSL and Domino, gives an overall speed improvement of 26%. The number of stages decreases from 6 to 4.
VI. CONCLUSION
In this paper we have presented a differential current switch logic gate which is capable of high performance. Its salient features are:
• High performance with large stack height in the NMOS tree.
• A robust gate in spite of the use of a latching cross-coupled inverter loop sense amplifier. Load mismatches at the output of a factor of 5 are tolerated.
• It is capable of operating at voltages down to $2V_t$. Previous designs limit the lowest supply to $3V_t$.
• A latching output stage which automatically locks out gate inputs once evaluation is completed.
• Power consumption is limited by decreasing the voltage swing at internal nodes of the NMOS tree.
• Compared to a high performance logic family like Domino, the gate is capable of higher speed at a lower power consumption.

The main disadvantages of LVDCSL, in the authors' view, are:
• The high complexity of the output stage prevents its use in simple gates.
• The layout of the output stage is critical, in the sense that internal nodes A and B have to be balanced.
• Unlike Domino, the gate does have a setup time with respect to CLK.
• In addition, the complexity of the output structure does not allow very short cycle times.

It is, however, possible, by judiciously replacing initial stages of existing high performance designs, to greatly reduce transistor count and power without impacting performance. LVDCSL circuits are targeted at very high performance circuits where power is often a secondary issue. As such, we have restricted our comparisons to Domino; however, further work is needed to quantify the power-performance trade-off with respect to static CMOS.
