The power versus frequency performance of a micropipelined conventional CMOS logic family is compared with that of three similarly pipelined energy-recovering logic families. Using a circuit simulator, the supplies and operating voltages of each family are optimized for minimum power consumption at each frequency. One of the energy-recovering logic families is shown to be capable of substantially lower dissipation than the conventional case, one is comparable, and one is worse.
Introduction
Adiabatic switching is a recently advocated circuit technique for reducing the power dissipation in digital logic by recovering some of the energy that would be dissipated in conventional logic (1, 2). A variety of circuit approaches to creating adiabatic switching logic have been proposed, including both retractile and micro-pipelined techniques. Analyses of the retractile approaches have shown that they can only do better than voltage-scaled CMOS at very low frequency (2, 3). This work presents a comparison of conventional circuits and adiabatic circuits in the high speed micro-pipelined regime, using voltage-scaling for all circuit families considered.
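As background, the energy argument behind adiabatic switching can be written as a standard first-order estimate. This estimate is not taken from the analysis in this work, and the symbols R (effective series resistance of the charging path), C (node capacitance), and T (duration of the supply ramp) are introduced here only for illustration:

```latex
E_{\mathrm{conv}} = \tfrac{1}{2}\, C\, V_{DD}^{2},
\qquad
E_{\mathrm{adiab}} \approx \frac{RC}{T}\, C\, V_{DD}^{2}
\quad (T \gg RC)
```

Under these assumptions the adiabatic dissipation per transition falls as the ramp is slowed, which is consistent with the observation that the advantage is easiest to obtain at low frequency.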
Circuits and Methodology
The four circuit families that have been compared are: (a) TSPC (True Single Phase Clock, a conventional CMOS logic family) (4), (b) 2N-2N2D (one of the first adiabatic families) (5), (c) 2N-2N2P (a more recently proposed variation on the same idea as 2N-2N2D) (6), and (d) Hot Clock nMOS (as originally proposed by Seitz) (7, 8). An example of a 2-input NAND gate is shown for each of these families in Fig. 1(a-d). Note that (b) and (c) are dual rail logic families, which simplifies some of the logic. waveforms used. All four circuit families yield a micro-pipelined type of architecture, and so are well suited to arithmetic functional blocks and other DSP types of circuits. Fig. 3 shows the basic circuit that is simulated for each logic family. It is a slice of a 32 bit adder, including representative carry-lookahead circuits. The circuit is modified as little as necessary to adapt it to each of the 4 logic families. The TSPC version is implemented in a repeated PC2-PC-NC2-NC pipeline (see (4)), with the most complex gates formed in PC2 or NC2 to reduce capacitance. Since it is a little more complex, the TSPC adder circuit is shown in more detail in Fig. 4. The carry-lookahead gates are intended to be more complex than would generally be used in a circuit design. By guaranteeing that these circuits work under worst case conditions, it is expected that circuits with simpler logic gates will function reliably.
To provide a forward-looking view of what technology will be capable of, the simulations reported here assume 0.1 μm conventional bulk MOSFETs throughout. The nominal nFET width is taken to be 2 μm, and the nominal pFET width is 4 μm. For series stacked devices, the width is in all cases increased with the number of devices in series, and some tapering is used in the TSPC family. For 2N-2N2D the nominal nFETs are 2 μm, while in 2N-2N2P the nominal nFET and pFET devices are 1 μm and 4 μm, respectively, because there are always two parallel nFET paths for pulldown. In Hot Clock nMOS, the nominal bootstrap isolation devices are taken to be 0.6 μm, as a minimum dimension, while the nominal driven pull-up devices are 2.4 μm, and the nominal state holding pull-down devices are 1 μm. The capacitance on the final output node, SO, is chosen to be large enough to represent a 1 mm line being driven to carry the signal to some other part of the chip. The inverters leading up to it are scaled up appropriately. Also, extra capacitance is added to some of the nodes to account for logic loads that would exist in the full 32 bit adder. From an estimate of the size of the entire adder (see Table 1), the typical internal nodes of the adder are expected to have short wires (5-20 μm), with capacitances of order 2 fF. Since this capacitance is much smaller than the capacitance due to the logic gates, the capacitance of the internal wires is neglected.
The necessity of using the fully driven form of Hot Clock nMOS shown in Fig. 1 The small wiring capacitance in this circuit macro is detrimental to many proposed adiabatic circuits, such as REL, 2N-2N2D, CAMOS, and some variations of hot-clock nMOS. These circuits have floating nodes during some part of the clock cycle, and only work well if there is substantial capacitance to ground. In the presence of such capacitance, they can work quite efficiently, but in its absence, they easily fail to have adequate operating margins because of unfavorable capacitive coupling between the floating nodes and the clocks.
The circuit simulations were constructed so as to locate the minimum dissipation conditions for both supply voltage, VDD, and threshold voltage, VT, while still maintaining operating margins. These margins are implemented by requiring that the circuit operate correctly not only at nominal conditions, but also under worst case conditions. Worst case conditions consisted of the supply voltage (AC or DC) being up to 10% high or 10% low, and the FETs simultaneously varying up to gate length +30% and ΔVT = 20 mV + 0.05 · VDD, or down to gate length -30% and ΔVT = -(40 mV + 0.05 · VDD). The VT shifts proportional to VDD are intended to account for voltage drops and inductive effects in the on-chip wiring. For TSPC the power includes both the DC supply and the clock driver dissipation, while for the adiabatic families only the logic dissipation is computed, since the supply efficiencies are unknown (but are expected to be high). Correct circuit operation at each set of conditions was verified by simulating a 10 bit input sequence designed to test a variety of state change conditions.
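The margin conditions above can be enumerated as a small set of simulation corners. The following is a minimal sketch, not the procedure actually used for the simulations: the function names and dictionary keys are hypothetical, only the extreme points of each range are generated, and the VT shift is assumed to be computed from the nominal (not the perturbed) supply voltage.

```python
from itertools import product

def delta_vt(vdd_nominal, direction):
    """Threshold shift in volts for the high (+1) or low (-1) device corner.

    Assumption: the 0.05 * VDD term uses the nominal supply voltage.
    """
    if direction > 0:
        return 0.020 + 0.05 * vdd_nominal
    return -(0.040 + 0.05 * vdd_nominal)

def corners(vdd_nominal):
    """Return the nominal point plus the four extreme margin corners."""
    points = [{"vdd": vdd_nominal, "length_scale": 1.0, "dvt": 0.0}]
    # supply: +/-10 %; device: gate length +/-30 % together with the VT shift
    for supply, device in product((+1, -1), (+1, -1)):
        points.append({
            "vdd": vdd_nominal * (1 + 0.10 * supply),
            "length_scale": 1 + 0.30 * device,
            "dvt": delta_vt(vdd_nominal, device),
        })
    return points
```

Each returned dictionary would then parameterize one circuit simulation; intermediate points inside the ranges are not covered by this sketch.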
Results
The 0.1 μm circuits simulated here all show excellent performance compared to present technology, with operating speeds up to more than 2 GHz for the best circuits. A plot of the dissipation versus frequency for all 4 families is shown in Fig. 5, and the optimized nominal voltage conditions are indicated in Fig. 6. Note that the supply voltage scales down with frequency for all of these circuits, resulting in at least quadratic power reduction with frequency over most of the range. The threshold voltages rise somewhat with decreasing frequency, as required to reduce leakage current contributions to dissipation. Table 1 shows the values of the various parameters at 500 MHz.
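The link between voltage scaling and the frequency dependence of the power can be illustrated with the usual first-order switching-power expression. This is a textbook approximation that ignores leakage and short-circuit currents, and the activity factor α, the switched capacitance C, and the exponent γ are symbols assumed here for illustration rather than quantities fitted to the simulations:

```latex
P_{\mathrm{conv}} \approx \alpha\, C\, V_{DD}^{2}\, f,
\qquad
V_{DD} \propto f^{\gamma}
\;\Longrightarrow\;
P_{\mathrm{conv}} \propto f^{\,1+2\gamma}
```

Under this model, a supply voltage that falls at least as fast as the square root of the frequency (γ ≥ 1/2) would give a power that falls at least quadratically with frequency.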
The worst case dissipation plotted in Fig. 5 is found by perturbing about the nominal operating conditions using the margin conditions described above, to find that set of conditions that results in the highest power. (It is this worst case power that is minimized in the procedure described above.) As can be seen, the conventional circuit performance and that of the Hot Clock nMOS are substantially comparable. The Hot Clock nMOS is probably significantly degraded by the need to dissipatively create dynamic inverses at each input, but this is necessary to achieve robust operation under the assumed circuit conditions. The CPL form of Hot Clock nMOS would avoid the dissipation associated with dynamic inverters, and might well give lower power performance.
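The minimization of worst case power can be pictured as a min-max search over the nominal voltages. The sketch below is one possible way to organize such a search and is not claimed to be the optimizer used here: the exhaustive grid, the function name, and the `simulate` callback (a stand-in for a circuit-simulator run that reports whether the circuit functioned and how much power it dissipated) are all assumptions.

```python
from itertools import product

def minimize_worst_case_power(simulate, vdd_grid, vt_grid, corners):
    """Grid-search (vdd, vt) for the lowest worst-case power.

    simulate(vdd, vt, corner) is a hypothetical callback returning
    (functional, power) for one simulation at one margin corner.
    Returns (vdd, vt, worst_power), or None if no point is functional.
    """
    best = None
    for vdd, vt in product(vdd_grid, vt_grid):
        results = [simulate(vdd, vt, corner) for corner in corners]
        if not all(ok for ok, _ in results):
            continue  # reject nominal points that fail at any corner
        worst = max(power for _, power in results)
        if best is None or worst < best[2]:
            best = (vdd, vt, worst)
    return best
```

Repeating such a search at each clock frequency would trace out a worst case power versus frequency curve of the kind plotted in Fig. 5.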
Although the 2N-2N2D and 2N-2N2P logic families are in many ways quite similar, their performance is dramatically different. Both logic families can run at very low voltages as the speed is decreased, but the energy-recovering logic achieves up to 6X less dissipation, because the energy is recovered into the power supplies instead of being converted to heat. Thus, even if the power supplies do not recover energy perfectly, it should still be possible to do better than the conventional CMOS circuit.
In conclusion, it has been shown that there is at least one adiabatic logic family that may be able to do better than conventional CMOS in the high speed micro-pipelined arena. The challenge for this family, 2N-2N2P, is the distribution and phase locking of 4 highly efficient sinusoidal power sources. None of the other adiabatic approaches evaluated here appear very promising in the low wiring capacitance regime considered.
