A new Quasi-Static Energy Recovery Logic family (QSERL) using the principle of adiabatic switching is proposed in this paper. Most previously proposed adiabatic logic families are dynamic and require complex clocking schemes. The proposed Quasi-Static Energy Recovery Logic uses two complementary sinusoidal supply clocks and resembles the behavior of static CMOS. Thus, its switching activity is significantly lower than that of dynamic logic. In addition, QSERL circuits can be derived directly from static CMOS circuits.
Introduction
In the design of low-power circuits, adiabatic logic shows great potential. Numerous adiabatic logic designs have been presented in [1, 12, 5, 6, 9, 10, 7], demonstrating the possibility of ultra-low-energy computing. However, realizing adiabatic logic poses some common problems. With few exceptions, adiabatic logic families are dynamic in nature and often use differential signaling, which is suitable only for some arithmetic circuits such as adders. The higher switching activity of circuit nodes in dynamic approaches is unfavorable for low-power design. Large transistor and wiring overheads are further difficulties in applying adiabatic logic, and careful transistor sizing may also be required in many designs to ensure proper circuit operation. Finally, previous adiabatic logic designs required complex trapezoidal clock schemes, which are difficult to generate. In most previously published research on adiabatic circuits, the clock generation circuitry was either not included or designed separately. Sinusoidal waveforms can be generated with higher energy efficiency than trapezoidal waveforms; however, most previous designs cannot use sinusoidal clocks efficiently.
The research was supported in part by NSF (9633516-MIP), by DARPA (F33615-95-C-1625), and by IBM Corporation. A preliminary version of this paper appeared in ISLPED'97.
We propose a Quasi-Static Energy Recovery Logic (QSERL) that uses two complementary sinusoidal supply clocks and shares several positive characteristics of static CMOS logic. Circuit nodes do not necessarily charge and discharge every clock cycle, which significantly reduces node switching activity, and the lower switching activity in turn reduces energy dissipation. A preliminary version of QSERL was first introduced by the authors in [9] and by De and Meindl in [4]. The QSERL presented in this paper has been modified for noise reduction. We also present an alternative approach that eliminates the diodes by using augmented circuitry. QSERL circuits can be converted directly from static CMOS circuits without drastically increasing circuit complexity or transistor overhead.
A high-efficiency clock generation circuit, which generates the two complementary sinusoidal clocks required by QSERL circuits, is also presented in this paper. The adiabatic clock circuitry locks both the frequency and the phase of the clock signals, which makes it possible to integrate the adiabatic module into a VLSI system.
We have designed and simulated an 8x8 QSERL carry-save multiplier using two-phase sinusoidal supply clocks. Results are compared to a static-CMOS carry-save multiplier.
The rest of the paper is organized as follows. Sec. 2 briefly discusses adiabatic switching and energy-recovery concepts. Sec. 3 presents the quasi-static adiabatic logic. An 8x8 multiplier using Quasi-Static Energy Recovery Logic is presented in Sec. 4, together with quantitative results on the performance of our adiabatic circuits based on the MOSIS 0.5 µm CMOS NWELL process. In Sec. 5 we propose a new scheme for generating supply clocks for adiabatic circuits. Conclusions are given in Sec. 6.
Energy Recovery Using Adiabatic Switching
To date, all microchips are designed to dissipate the entire amount of electrical energy transferred from the power supply during a switching transition. However, new approaches based on the second law of thermodynamics point the way to recovering the switching energy by avoiding the erasure of information and by switching under quasi-equilibrium (adiabatic) conditions.
Let us examine how energy is dissipated during a switching transition in standard CMOS circuits. Fig. 1 shows a simple RC tree. The transition of a circuit node from LOW to HIGH can be modeled as charging an RC tree through a switch. When the switch closes, a sudden current flows through R, and C is charged to $V_{dd}$ after a short time. The energy taken from the power supply is $CV_{dd}^2$; however, only half of it, $\frac{1}{2}CV_{dd}^2$, is stored in C. The other half is dissipated in R. Whenever a current i experiences a voltage drop $\Delta V$, energy is dissipated at the rate $i\,\Delta V$ (the instantaneous dissipative power). Thus, the energy dissipated in R is $\int_0^\infty i\,\Delta V\,dt = \frac{1}{2}CV_{dd}^2$ (refer to [9]), which is exactly half of the total energy transferred from the power supply. Now consider adiabatic switching in Fig. 2. When the same RC tree is charged by a gradually increasing supply voltage, the voltage drop $\Delta V$ across R is very small, and hence the energy dissipated in R is also small.
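Both claims above (a step charge loses $\frac{1}{2}CV_{dd}^2$ in R whatever R is, while a slow ramp loses far less) can be checked numerically. The sketch below integrates the RC circuit with forward Euler and accumulates $i\,\Delta V$ in R. The component values (R of 500 Ω and 1 kΩ, C = 100 fF so that RC is about 0.1 ns, and a 10 ns linear ramp) are illustrative assumptions, not values taken from the paper.

```python
def energy_in_resistor(v_source, r, c, t_end, steps=200_000):
    """Charge C from 0 V through R and return the energy dissipated in R.

    v_source: function of time giving the supply voltage.
    Forward-Euler integration; the power in R is i * dV = i**2 * R.
    """
    dt = t_end / steps
    v_c = 0.0
    energy = 0.0
    for k in range(steps):
        i = (v_source(k * dt) - v_c) / r   # current through R
        energy += i * i * r * dt           # i * dV integrated over dt
        v_c += i * dt / c                  # capacitor charges by i*dt/C
    return energy


VDD = 3.3
C = 100e-15        # assumed 100 fF load
T_RAMP = 10e-9     # assumed 10 ns ramp (T ~ 1/f at 100 MHz)


def step(t):
    return VDD                          # switch closes at t = 0


def ramp(t):
    return VDD * min(t / T_RAMP, 1.0)   # gradually increasing supply


ideal = 0.5 * C * VDD ** 2
for r in (500.0, 1000.0):               # step-charge loss is independent of R
    e_step = energy_in_resistor(step, r, C, 20 * r * C)
    print(f"step, R={r:.0f} ohm: {e_step / ideal:.3f} x (1/2)CV^2")

e_ramp = energy_in_resistor(ramp, 1000.0, C, T_RAMP + 20 * 1000.0 * C)
print(f"ramp, R=1000 ohm: {e_ramp / ideal:.4f} x (1/2)CV^2")
```

Changing R moves only the time constant of the step response, not the energy lost; the ramp loss scales with RC/T, which is the dependence quantified next.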
Power dissipation in static CMOS circuits can now be easily understood by considering the inverter of Fig. 3(a). A logic ONE to ZERO transition on input x turns the p-mos transistor on, and the output node, with its associated capacitance C, is charged from 0 to $V_{dd}$. In such a transition, $V_{dd}Q\,(= CV_{dd}^2)$ of energy is drawn from the supply; half of it, $\frac{1}{2}CV_{dd}^2$, is stored temporarily in the capacitance, and the other half is dissipated in the charging path. When the input makes a LOW to HIGH transition, the stored $\frac{1}{2}CV_{dd}^2$ is dissipated as the output discharges. Hence, $CV_{dd}^2$ of energy is dissipated over an entire cycle. This dissipation can be greatly reduced by adiabatic switching. Consider the circuit of Fig. 3(b), where the supply voltage swings gradually from 0 to $V_{dd}$ (evaluation period), stays at $V_{dd}$ for some time (hold period), and then swings back from $V_{dd}$ to 0 (restoration). If the output y is at logic ZERO at the beginning of the evaluation period and input x is valid and also at logic ZERO, then y follows the supply up to logic ONE with very little voltage drop across the channel of the p-mos transistor. Hence, only a small amount of energy is dissipated. Fig. 3(c) and Fig. 3(d) illustrate the difference in charging a circuit node between standard CMOS and adiabatic switching. The energy dissipated in the resistance is given by
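$$E_{dissipation} = \int i\,\Delta V\,dt = \begin{cases} \frac{1}{2}CV_{dd}^2 & \text{in standard CMOS} \\ \left(\frac{RC}{T}\right)CV_{dd}^2 & \text{in adiabatic switching} \end{cases} \tag{2}$$
where $\Delta V$ is the voltage drop across the resistance and T is the transition time in adiabatic switching (the adiabatic expression assumes $T \gg RC$). Since $RC \approx 0.1$ ns for a moderate fanout in current technology, and $T \approx 1/f$ (f is the operating frequency), $E_{dissipation}$ in adiabatic switching is small for operating frequencies up to about 100 MHz.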
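To see how quickly the $RC/T$ advantage erodes with frequency, the sketch below evaluates the standard closed-form loss for charging an RC node with a linear ramp of duration T, $E = \frac{RC}{T}CV_{dd}^2\left[1 - \frac{RC}{T}\left(1 - e^{-T/RC}\right)\right]$, which tends to the adiabatic case of Eq. (2) when $T \gg RC$. It assumes a linear (trapezoidal) ramp rather than the sinusoidal clock used later, takes $RC = 0.1$ ns from the text, and uses the text's approximation $T \approx 1/f$; the function name is hypothetical.

```python
import math

RC = 0.1e-9   # ~0.1 ns for a moderate fanout, as stated in the text


def ramp_loss_ratio(rc: float, t: float) -> float:
    """Loss of a linear-ramp charge of duration t into an RC node,
    divided by the step-charging loss (1/2) C V^2.

    Exact ramp result: E = (RC/T) C V^2 [1 - (RC/T)(1 - exp(-T/RC))],
    which approaches (RC/T) C V^2 for T >> RC.
    """
    x = rc / t
    return 2.0 * x * (1.0 - x * (1.0 - math.exp(-t / rc)))


for f in (10e6, 100e6, 400e6, 1e9):
    t = 1.0 / f   # the text's approximation T ~ 1/f
    print(f"{f / 1e6:6.0f} MHz: adiabatic loss = "
          f"{ramp_loss_ratio(RC, t):.4f} x standard CMOS")
```

The ratio is roughly $2RC/T$, so under these assumptions it grows linearly with frequency; device and diode losses discussed later are not included.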
Quasi-Static Energy Recovery Logic
Fig. 4 shows a schematic of quasi-static energy recovery logic, which resembles static CMOS logic but operates in a nearly adiabatic fashion. A QSERL gate is a static CMOS gate with two additional diodes. The diode at the top of the p-mos tree controls the charging path, while the diode at the bottom of the n-mos tree controls the discharging path. Two sinusoidal clocks in complementary phases, $\phi$ and $\bar{\phi}$, are sufficient. Hence, the complexity of circuit wiring and design is greatly reduced compared to other adiabatic logic families. The supply clock signal consists of two phases, evaluation and hold, as shown in Fig. 4. Consider the first gate in Fig. 4. In the evaluation phase, $\phi$ swings up while $\bar{\phi}$ swings down, and one of the two paths, the p-mos pull-up tree or the n-mos pull-down tree, is turned ON. There are four cases:
1) The circuit output node X is LOW and the p-mos tree is ON. X then follows $\phi$ as it swings HIGH.
2) The circuit node X is LOW and the n-mos tree is ON. X remains LOW and no transition occurs.
3) The circuit node X is HIGH and the p-mos tree is ON. X remains HIGH and no transition occurs.
4) The circuit node X is HIGH and the n-mos tree is ON. X follows $\bar{\phi}$ down to LOW.
In the hold phase, $\phi$ swings down while $\bar{\phi}$ swings up, and the diodes keep node X unchanged. Note that cascaded gates operate in alternate phases: the second gate in Fig. 4 evaluates its logic value while the first gate is in the hold phase. The advantages of this quasi-static logic include its simplicity and its similarity to static CMOS. In contrast to dynamic adiabatic logic, in which each gate charges and discharges in every cycle, QSERL is "static": circuit nodes do not necessarily charge and discharge every clock cycle, which reduces node switching activity substantially. To illustrate the significance of this difference, take a multiplier as an example. For a sequence of independent random input vectors in which each input is equally likely to be 1 or 0, Monte Carlo simulation indicates that the average internal node switching probability (the probability of a 0→1 or 1→0 transition) in a multiplier is 0.29. In dynamic circuits with differential signaling, one of the two branches in a gate always charges (0→1) and discharges (1→0) in every clock cycle. Thus, the lower switching activity of QSERL reduces energy dissipation.
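The four evaluation cases and the hold behavior can be captured in a small behavioral model, which also makes the switching-activity argument concrete. The sketch below is a hypothetical illustration, not the authors' simulation: it models one QSERL output node driven by a 2-input NAND (an arbitrary choice) with independent equiprobable inputs and counts transitions per cycle, next to a dual-rail dynamic gate whose active rail charges and discharges every cycle. It does not attempt to reproduce the 0.29 multiplier figure.

```python
import random


def qserl_step(x: int, pull_up_on: bool) -> tuple[int, bool]:
    """One clock cycle (evaluation + hold) of a QSERL output node X.

    Evaluation: exactly one tree conducts. The p-mos tree plus top diode can
    only charge X (following phi up); the n-mos tree plus bottom diode can
    only discharge X (following phi-bar down). Hold: both diodes block, so X
    keeps its value. Returns (new X, whether X made a transition).
    """
    new_x = 1 if pull_up_on else 0   # cases 1 and 3 end HIGH; cases 2 and 4 end LOW
    return new_x, new_x != x         # only cases 1 and 4 are transitions


def activity_nand2(cycles: int = 100_000, seed: int = 1) -> float:
    """Transitions per cycle at a QSERL 2-input NAND output (hypothetical
    example) with independent, equiprobable random inputs."""
    rng = random.Random(seed)
    x, transitions = 0, 0
    for _ in range(cycles):
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        x, switched = qserl_step(x, pull_up_on=not (a and b))
        transitions += switched
    return transitions / cycles


# Dual-rail dynamic gate: one rail charges (0->1) and discharges (1->0)
# every cycle, i.e. 2 transitions per cycle regardless of the data.
DYNAMIC_TRANSITIONS_PER_CYCLE = 2.0

print(f"QSERL NAND2: {activity_nand2():.3f} transitions/cycle")
print(f"dual-rail dynamic: {DYNAMIC_TRANSITIONS_PER_CYCLE} transitions/cycle")
```

For the NAND output, which is HIGH with probability 3/4, the expected QSERL activity is $2 \cdot \frac{3}{4} \cdot \frac{1}{4} = 0.375$ transitions per cycle, far below the data-independent activity of the dynamic gate.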
The diodes used in QSERL to control the charging and discharging paths can be replaced by low-threshold-voltage MOSFETs. There are three possible ways to connect the diodes, as shown in Fig. … . Since $P_d$ and $N_d$ function as diodes, the energy dissipated in $P_d$ ($N_d$) when charging (discharging) a circuit node is $CV_t(V_{dd} - V_t)$. Hence, it is essential to reduce the threshold voltage of the control transistors $P_d$ and $N_d$ for lower power dissipation. In regular CMOS applications, the threshold voltage is usually determined by circuit noise tolerance, standby leakage current, overall device optimization, and process variations. For adiabatic circuits, a lower threshold voltage often results in better performance. Standby current is usually not a problem because the supply clocks are shut down in standby mode. However, if circuit nodes are floating for part of a clock cycle, the larger active-mode leakage current due to the lower threshold voltage can cause more noise. The voltage change $\Delta V$ due to leakage current has to be negligible, so the following condition must hold for noise immunity:
$$\Delta Q = C\,\Delta V \approx I_{leakage}\cdot\frac{1}{f} \ll CV_{dd} \tag{3}$$
Simulations indicate that it is safe to lower $|V_t|$ of both p-mos and n-mos devices to 0.2 V when $f > 10$ MHz at a 3.3 V supply voltage.
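The two quantities in this paragraph, the per-transition loss $CV_t(V_{dd} - V_t)$ in a diode-connected control transistor and the leakage droop $\Delta V \approx I_{leakage}/(fC)$ from Eq. (3), are easy to tabulate. In the sketch below the 100 fF node capacitance and the 1 nA active-mode leakage are hypothetical values chosen for illustration; only the 3.3 V supply and the 10 MHz frequency come from the text.

```python
def diode_loss(c: float, vdd: float, vt: float) -> float:
    """Energy dissipated in P_d (or N_d) per charge (discharge): C*Vt*(Vdd - Vt)."""
    return c * vt * (vdd - vt)


def leakage_droop(i_leak: float, c: float, f: float) -> float:
    """Voltage change on a floating node over about one period: dV = I_leak / (f*C)."""
    return i_leak / (f * c)


C_NODE = 100e-15   # hypothetical 100 fF node
VDD = 3.3          # supply voltage from the text

for vt in (0.2, 0.5, 0.7):
    frac = diode_loss(C_NODE, VDD, vt) / (0.5 * C_NODE * VDD ** 2)
    print(f"Vt={vt:.1f} V: diode loss = {frac:.2f} x (1/2)C Vdd^2")

# Hypothetical 1 nA leakage on a floating node at f = 10 MHz
dv = leakage_droop(1e-9, C_NODE, 10e6)
print(f"leakage droop = {dv * 1e3:.2f} mV ({dv / VDD:.3%} of Vdd)")
```

Because $V_t(V_{dd} - V_t)$ grows as $V_t$ rises toward $V_{dd}/2$, lowering the threshold cuts the diode loss, while the droop term shows why a higher clock frequency makes the leakage constraint of Eq. (3) easier to meet.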
We simulated an 8-inverter chain implemented using QSERL logic of … . The QSERL chain saves slightly more than 50% of energy at 100 MHz and 14% at 400 MHz. Assuming ideal Schottky diodes, the energy savings of the QSERL inverter chain increase considerably, especially at high frequencies (approximately 65% at 100 MHz and 50% at 400 MHz).
We have designed and simulated several QSERL circuits. In this section, we use a QSERL adder as an example to address some issues we observed in designing QSERL circuits. The design and simulation results of an 8x8 QSERL multiplier are also explained in detail in this section.
QSERL adder
Fig. 7 shows a QSERL adiabatic adder, directly modified from the static CMOS mirror adder. Let us consider how this adder operates. When $\phi$ and $\bar{\phi}$ are in the evaluation phase, there is a conducting path in either the p-mos tree or the n-mos tree. Node CARRY may evaluate from low to high, from high to low, or remain unchanged, just as in a static CMOS circuit. Thus, there is no need to restore the node voltage to 0 (or $V_{dd}$) every cycle. When $\phi$ and $\bar{\phi}$ are in the hold phase, node CARRY holds its value even though $\phi$ and $\bar{\phi}$ are changing. The reader can easily verify this by noting the diodes and the fact that the inputs of a gate are in a different phase from its output (A, B, and C are in the evaluation phase while CARRY is in the hold phase).
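The argument above relies on the static-CMOS property that, for every input combination, exactly one of the two trees conducts. The sketch below checks this for the carry stage of a textbook static mirror adder, whose pull-down network is $(A + B)\cdot C + A\cdot B$ and whose pull-up network has the same topology with p-mos devices. This is an assumption about the transistor topology of Fig. 7, which is not reproduced here; in the standard mirror adder this node carries the complemented carry, and whether Fig. 7 inverts it afterwards is not visible in this text.

```python
from itertools import product


def nmos_tree_on(a: int, b: int, c: int) -> bool:
    """Mirror-adder carry pull-down: n-mos devices conduct on HIGH inputs.
    Network: (A parallel B) in series with C, in parallel with (A series B)."""
    return bool(((a or b) and c) or (a and b))


def pmos_tree_on(a: int, b: int, c: int) -> bool:
    """Mirror pull-up: same topology, but p-mos devices conduct on LOW inputs."""
    na, nb, nc = not a, not b, not c
    return bool(((na or nb) and nc) or (na and nb))


for a, b, c in product((0, 1), repeat=3):
    up, down = pmos_tree_on(a, b, c), nmos_tree_on(a, b, c)
    assert up != down, "exactly one tree must conduct"  # static-CMOS property
    node = 1 if up else 0                              # value the node evaluates to
    carry_out = int(a + b + c >= 2)                    # majority function
    print(a, b, c, "node =", node, "carry-out =", carry_out)
```

Because exactly one tree conducts, the node either follows the rising clock, follows the falling clock, or stays put, which is why the four QSERL cases carry over to the adder unchanged.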
Noise reduction for complex gates
The output node of a QSERL gate floats for half of each clock cycle (in the hold phase), which introduces noise. We observed that this noise is tolerable for two-input logic gates. However, when a complex function is implemented in one gate, noise may cause incorrect operation, especially at high speed. For circuits such as a multiplier, which consists of an array of adders, a mixed approach (mixing conventional CMOS and QSERL logic) may be a good choice to keep noise to a minimum. Consider the complex gate in Fig. 8(a). A clocked inverter loop keeps the output signal constant in the hold phase. CL is a conventional pulse clock signal that is HIGH when $\phi$ is in the hold phase and LOW when $\phi$ is in the evaluation phase. Thus, the inverter loop is ON in the hold phase and keeps the output node at a constant voltage level. The integration of adiabatic gates with standard CMOS circuits is ensured by the supply clock generation scheme given in Sec. 5, in which the supply clock is synchronized to the reference clock. Minimum-size transistors can be used for P1, P2, and N2. N1 should be larger so that the inverter loop settles to its state quickly. In contrast, P1 can be minimum size since it is not clocked. For a full adder, 10 transistors are needed to generate the carry and 14 transistors for the sum. Many of these transistors are fairly large because of the large fan-out (10 for the carry) in a multiplier. Hence, the energy consumed in P2, N2, and the clock signals is still a small portion of the total for a complex gate.
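The timing relation between the sinusoidal supply and the conventional pulse clock CL can be stated precisely with a small model. The sketch below assumes $\phi(t) = \frac{V_{dd}}{2}(1 - \cos 2\pi f t)$, so that $\phi$ rises (evaluation) in the first half of each cycle and falls (hold) in the second half, following the phase description given earlier; the waveform form, the function names, and the 100 MHz value are illustrative assumptions.

```python
import math


def phi(t: float, f: float, vdd: float = 3.3) -> float:
    """Sinusoidal supply clock swinging between 0 and Vdd (assumed form)."""
    return 0.5 * vdd * (1.0 - math.cos(2.0 * math.pi * f * t))


def phi_bar(t: float, f: float, vdd: float = 3.3) -> float:
    """Complementary supply clock."""
    return vdd - phi(t, f, vdd)


def in_evaluation(t: float, f: float) -> bool:
    """phi rises in the first half of each cycle (evaluation), falls in the second (hold)."""
    return (t * f) % 1.0 < 0.5


def cl(t: float, f: float) -> int:
    """Pulse clock for the noise-reduction inverter loop:
    HIGH while phi is in hold, LOW while phi is in evaluation."""
    return 0 if in_evaluation(t, f) else 1


F = 100e6
for frac in (0.0, 0.25, 0.5, 0.75):
    t = frac / F
    print(f"t={frac:.2f}T  phi={phi(t, F):.2f} V  "
          f"phi_bar={phi_bar(t, F):.2f} V  CL={cl(t, F)}")
```

In hardware CL would be derived from the same PLL that locks the supply clock; this model only shows which half-cycle the latch must cover.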
When differential signaling is used (often preferred in arithmetic circuits), the noise-reduction inverter loop is simpler to implement. Take the 3-input XOR gate of Fig. 9 as an example. Both sum and $\overline{\text{sum}}$ are available, so a pair of cross-coupled inverters locks the outputs in the hold phase (Fig. 9(b)). Minimum-size transistors can be used in the two inverters.
Elimination of diodes in QSERL logic
Diodes can be avoided by adding an extra latch, as shown in Fig. 8.
An 8x8 QSERL multiplier
We have designed an 8 by 8 carry-save adiabatic multiplier using QSERL logic and a two-phase sinusoidal supply clock. Fig. 10 shows the organization of the QSERL multiplier, which is identical to that of a conventional CMOS carry-save multiplier. The adder cell is implemented using the 3-input XOR gate of Fig. 9. The last stage of the multiplier is a 7-bit carry-lookahead adder, detailed in Fig. 11. Thus, we use a differential signaling approach in the carry-save array and a regular static approach in the carry-lookahead adder.
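The carry-save organization can be illustrated with a bit-level functional model. The sketch below is a generic carry-save array, not a transcription of Fig. 10: each row adds one partial product to the saved sum and carry vectors using only full adders (sum = 3-input XOR, carry = majority), and the final carry-propagate addition, done by a carry-lookahead adder in the paper, is stood in for by Python integer addition. The function names are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One adder cell: sum = 3-input XOR, carry = majority(a, b, c)."""
    return a ^ b ^ c, (a & b) | (c & (a | b))


def carry_save_multiply(x: int, y: int, n: int = 8) -> int:
    """Bit-level model of an n x n carry-save array multiplier.

    Row i adds partial product x * y_i to the saved sum/carry vectors with
    full adders only (no carry propagation inside the array). The final
    carry-propagate addition is modeled by Python integer addition.
    """
    xb = [(x >> j) & 1 for j in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    s = [xb[j] & yb[0] for j in range(n)]   # row 0: plain partial product
    c = [0] * n
    low = s[0]                               # product bit 0
    for i in range(1, n):
        new_s, new_c = [0] * n, [0] * n
        for j in range(n):
            shifted = s[j + 1] if j + 1 < n else 0   # all three inputs weigh 2**(i+j)
            new_s[j], new_c[j] = full_adder(xb[j] & yb[i], shifted, c[j])
        s, c = new_s, new_c
        low |= s[0] << i                     # product bit i leaves the array
    # Remaining sum bits s[1..n-1] and carries c[0..n-1] form the upper half
    high = sum(s[j] << (j - 1) for j in range(1, n)) + sum(c[j] << j for j in range(n))
    return low + (high << n)


for x, y in [(0, 0), (255, 255), (200, 123), (17, 93)]:
    assert carry_save_multiply(x, y) == x * y
print("carry-save model matches x * y on the sample inputs")
```

Since no carry ripples inside the array, each row of full adders has a fixed depth, which is what makes the structure map naturally onto alternating QSERL phases.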
The multiplier was simulated using the MOSIS 0.5 µm CMOS NWELL process. The transistors functioning as diodes need to be sized up as the frequency increases. The remaining n-mos transistors have size $W/L = 3\lambda/2\lambda$ and the p-mos transistors $W/L = 6\lambda/2\lambda$, where $\lambda = 0.3\,\mu m$. Results are compared to a static CMOS carry-save multiplier using the same transistor sizes. Simulation was performed using 64 randomly generated input vectors, independent of each other. For each input in each clock cycle, the probability of being HIGH (or LOW) is 0.5, and the probability that the input switches in the following cycle is also 0.5. Fig. 12 shows the energy consumption per clock cycle of the 8x8 QSERL multiplier and the static CMOS multiplier. The adiabatic circuits exhibit significant energy savings: more than 50% at 10 MHz and 37% at 100 MHz. As the clock frequency increases, the "diode transistors" have to be sized up proportionately and the factor RC/T increases as well (refer to Eq. 1). At 200 MHz, the adiabatic multiplier saves only 8% of energy. At low frequency, the energy consumption of QSERL circuits is mainly due to the diodes and the conventional clocks (used to reduce noise in adders). This portion of the energy consumption per cycle is independent of frequency. While adiabatic circuits maintain reasonably high throughput, the input/output latency is large: our 8x8 QSERL multiplier has a latency of 12 clock phases (6 clock cycles). Hence, adiabatic technology seems more suitable for specific applications where speed and latency are not critical.
Figure 11: The 7-bit QSERL carry-lookahead adder used in the 8 by 8 QSERL multiplier.
Supply Clock Generation Circuit
Research on the design of highly efficient resonant drivers for generating adiabatic supply clocks has started in earnest [3, 12, 6]. So far, the frequency range of existing schemes is fairly low (well below 100 MHz), prohibiting their use in high-frequency environments. In addition, existing schemes operate at a frequency determined by the inductor(s) and the capacitive load rather than by an external reference frequency. Hence, it is not possible to integrate the adiabatic module into a VLSI system in which the rest of the system operates at a frequency defined by the system. The scheme proposed in this paper (see also [11]) can operate at 100 MHz with more than 90% power efficiency (using 0.5 µm technology). Moreover, the operating frequency of our design is determined by the external reference clock; i.e., it locks the frequency using a regular Phase-Locked Loop (PLL).
Previous research
The underlying idea of adiabatic clock generation circuits is to use a resonant driver. Let us first consider the generic resonant driver shown in Fig. 13. Ideally, the circuit oscillates between 0 and $2V_{ref}$. The circuit starts to oscillate when $S_0$ is turned ON and stops oscillating when $S_0$ is turned OFF. A pull-up path and a pull-down path replenish the energy dissipated by the resistances in the load. The pull-up p-mos transistor $S_p$ is turned ON and the pull-down n-mos transistor $S_n$ is turned OFF when the voltage at node y is higher than $V_{ref}$. Conversely, the pull-down path is ON and the pull-up path is OFF when the voltage at y is below $V_{ref}$. Thus, the control signals at $S_p$ and $S_n$ are 180 degrees out of phase. The sizes of the replenishing transistors $S_p$ and $S_n$ should be chosen to maintain the desired oscillation amplitude.
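Every resonant driver of this kind oscillates near the LC resonance $f_0 = 1/(2\pi\sqrt{LC})$, and the loss per cycle of the tank is set by its quality factor, $Q = \frac{1}{R}\sqrt{L/C}$ for a series RLC loop. The sketch below sizes the inductor for a target frequency. The effective capacitance seen by the inductor depends on the driver topology; the 100 pF and 0.5 Ω used here are assumptions borrowed from the load values quoted for the simulations later in this section.

```python
import math


def resonant_frequency(l: float, c: float) -> float:
    """f0 = 1 / (2*pi*sqrt(L*C)) for an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l * c))


def inductance_for(f0: float, c: float) -> float:
    """Inductance that resonates with capacitance c at f0: L = 1 / ((2*pi*f0)^2 * C)."""
    return 1.0 / ((2.0 * math.pi * f0) ** 2 * c)


def quality_factor(r: float, l: float, c: float) -> float:
    """Q of a series RLC loop: Q = sqrt(L/C) / R."""
    return math.sqrt(l / c) / r


C_EFF = 100e-12   # assumed effective tank capacitance
R_EFF = 0.5       # assumed series resistance in ohms
L = inductance_for(100e6, C_EFF)
print(f"L = {L * 1e9:.2f} nH for 100 MHz, Q = {quality_factor(R_EFF, L, C_EFF):.1f}")
```

A higher Q means a smaller fraction of the stored energy (roughly $2\pi/Q$ per cycle) must be replenished by $S_p$ and $S_n$.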
The generic circuit has the following problems, which need to be solved:
The series-connected control transistor $S_0$ has finite resistance, which decreases the energy efficiency substantially.
The control signals for $S_p$ in the pull-up path and $S_n$ in the pull-down path have to be generated by extra circuitry. All the energy used to charge the gate capacitances of $S_0$, $S_p$, and $S_n$ is dissipated. Since these transistors are large, this dissipation can be significant.
The circuit generates a single-phase clock, which is not enough for general adiabatic circuits. Hence, more than one clock generation circuit is required.
A clock generation scheme proposed recently in [3] is shown in Fig. 14. It consists of two branches that are 180 degrees out of phase, generating two almost non-overlapping clock signals. Only a pull-down path is used in each branch to replenish energy and maintain undamped oscillation. For lack of a pull-up path, non-sinusoidal "blip" waveforms are produced. For resonant circuits, a sinusoidal waveform has the highest energy recycling percentage. Any other waveform contains a fundamental sinusoidal component plus higher-order harmonics. The component at the fundamental frequency $f_0$ can be recycled efficiently (as determined by the Q value of the resonant circuit), while the components at higher frequencies ($2f_0, 3f_0, 4f_0, \ldots$) are almost completely dissipated, as illustrated in Fig. 15. Therefore, the energy efficiency of this scheme can still be substantially improved. In addition, this scheme uses two inductors and an extra reference voltage source.
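The harmonic argument can be quantified by asking what fraction of a periodic waveform's AC energy lies in the fundamental; by the argument above, roughly the rest is lost. The sketch below computes that fraction from a DFT using Parseval's theorem. A narrow rectangular pulse is used as a rough stand-in for the "blip" waveform, which is an assumption: the exact shape produced by the scheme of Fig. 14 is not reproduced here.

```python
import numpy as np


def fundamental_fraction(samples) -> float:
    """Fraction of a periodic waveform's AC energy carried by the fundamental.

    `samples` must cover exactly one period. By Parseval's theorem the sum of
    |X_k|^2 over k != 0 is proportional to the AC energy; the fundamental is
    the pair of bins k = 1 and k = N - 1.
    """
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()                      # remove the DC component
    power = np.abs(np.fft.fft(x)) ** 2
    n = len(x)
    return float((power[1] + power[n - 1]) / power[1:].sum())


N = 1024
t = np.arange(N) / N                      # one period, normalized time
waveforms = {
    "sinusoid": 0.5 * (1 - np.cos(2 * np.pi * t)),
    "square (50% duty)": (t < 0.5).astype(float),
    "narrow pulse (10% duty)": (t < 0.1).astype(float),
}
for name, w in waveforms.items():
    print(f"{name:24s} fundamental share = {fundamental_fraction(w):.3f}")
```

A sinusoid puts all of its AC energy in the fundamental, a square wave about $8/\pi^2$ of it, and a narrow pulse much less, which is why blip-shaped clocks leave substantial room for improvement.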
Two possible variations are illustrated in Fig. 16 and Fig. 17, respectively. Note that in both schemes the reference voltage source can be removed and the two inductors can be replaced by a single inductor. The two branches in the scheme of Fig. 16 are not symmetric: a pull-up path is used in one branch and a pull-down path in the other. In the scheme of Fig. 17, both pull-up and pull-down paths are used in each branch, so the generated waveforms are closer to a sinusoid. While a sinusoidal waveform is important for higher energy efficiency, this scheme has a severe shortcoming: within a certain period of every cycle, both the n-mos and p-mos transistors are ON when the voltage $V_1$ or $V_2$ is in the vicinity of $V_{dd}/2$.
Hence, short-circuit current dissipates a significant amount of energy. In the following section, we propose a new scheme that generates better sinusoidal waveforms and also prevents short-circuit current.
A new scheme of adiabatic clock generation circuitry
We propose a new resonant driver for generating adiabatic supply clocks, shown in Fig. 18. The oscillator generates two complementary phases of nearly sinusoidal waveforms. Two p-mos transistors ($P_1$ and $P_2$) and two n-mos transistors ($N_1$ and $N_2$) are used for energy replenishment and frequency-phase locking. The circuit starts to oscillate when the control signal enable = 1 and stops oscillating when enable = 0 (because the pull-down transistors $N_1$ and $N_2$ are then turned ON and the pull-up transistors $P_1$ and $P_2$ are turned OFF). The sizes of these four transistors are determined by the operating frequency and the capacitive load the circuit drives: the higher the frequency and/or the larger the capacitive load, the larger the transistors must be to achieve the peak oscillating voltage $V_{dd}$. The optimal size can be found by simulation.
Unlike the previous scheme [3], no reference voltage source is needed. There is no series-connected transistor in the driver circuitry, so the energy efficiency is mainly determined by the load, i.e., the resistance R of the clock distribution lines and the capacitive load C. One inductor is sufficient; its value is determined by the resonance condition at the given frequency. Simulation results indicated that L should be slightly larger than the theoretically calculated value for optimal efficiency.
The PLL (Phase-Locked Loop) samples the clock signal(s) at the load and produces two control signals, $C_1$ and $C_2$, at the frequency of the reference clock, which in turn forces the circuit to oscillate at the reference frequency. The two control signals $C_1$ and $C_2$ are 180 degrees out of phase, and each is a pulse signal with only a 25% duty cycle. The transistors of the inverters (INV1, INV2), NAND gates (NAND1, NAND2), and NOR gates (NOR1, NOR2) that control the replenishing transistors should be much smaller than the replenishing transistors (e.g., approximately 1/20 of their size at 100 MHz). Thus, the voltage at the gate of each replenishing transistor has finite rise and fall times, and an approximately triangular waveform is produced there. The optimal rise and fall times are close to 25% of a clock cycle, so that each replenishing transistor is ON for approximately 50% of the time. The advantages of the above scheme are as follows:
The replenishing transistors turn ON and OFF gradually, so only small disturbances are imposed on the resonant circuit, which keeps the waveforms at both sides of the inductor nearly sinusoidal. As mentioned in Section 5.1, sinusoidal waveforms have the highest energy efficiency: the energy in higher-order harmonic components is almost completely dissipated in the resistances of the clock distribution lines and the resistive loads.
The p-mos and n-mos replenishing transistors on one side (e.g., $P_1$ and $N_1$) are never turned ON simultaneously, which prevents short-circuit current.
Figure 19: Waveforms of the resonant driver.
Fig. 19 shows the waveform of the control signal $C_1$ from the PLL (top), the voltage at the gates of replenishing transistors $P_1$ and $N_1$ (middle), and the clock waveforms produced at the output (bottom).
We define the energy efficiency of the supply clock generation as one minus the ratio of the energy delivered from the DC supply (which equals the energy dissipated) to $CV^2$, where V is the peak voltage. The replenishing transistors are sized so that $V_{peak} \approx V_{dd}$. Fig. 20 shows the energy efficiency of our scheme, obtained from simulations with $R = R_1 = R_2 = 0.5\,\Omega$ and $C = C_1 + C_2 = 100$ pF. The energy efficiency is approximately 95% at 100 MHz for these R and C values. Minimizing R in the supply clock distribution network is essential for high energy efficiency. When the load capacitances are excessively large, multiple clock generation circuits connected in parallel are needed to maintain high energy efficiency.
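A first-order estimate shows why R dominates. If a node of capacitance C tracks a sinusoid $v(t) = \frac{V}{2}(1 - \cos\omega t)$ through a series resistance R (assuming $RC \ll 1/f$), the current is $i = \frac{C\omega V}{2}\sin\omega t$ and the loss per cycle is $\int R i^2\,dt = \frac{\pi^2}{2}RC^2fV^2$. The sketch below turns this into an efficiency, taking efficiency as one minus the ratio of dissipated energy to $CV^2$, the reading consistent with the ~95% figure quoted above. The even split of the 100 pF between the two branches and the 3.3 V peak are assumptions. Only distribution-resistance loss is counted (no replenishing-transistor or control losses), so the estimate, about 98.8% at 100 MHz, should sit above the simulated value.

```python
import math


def branch_loss_per_cycle(r: float, c: float, f: float, v_peak: float) -> float:
    """Energy lost per cycle in series resistance r when a sinusoid swinging
    0..v_peak charges and discharges capacitance c: (pi^2/2) r c^2 f v_peak^2.
    Assumes r*c << 1/f so the node tracks the clock."""
    return (math.pi ** 2 / 2.0) * r * c ** 2 * f * v_peak ** 2


def efficiency_estimate(branches, f: float, v_peak: float) -> float:
    """1 - (resistive loss per cycle) / (C_total * V^2); branches = [(R, C), ...]."""
    c_total = sum(c for _, c in branches)
    loss = sum(branch_loss_per_cycle(r, c, f, v_peak) for r, c in branches)
    return 1.0 - loss / (c_total * v_peak ** 2)


# R1 = R2 = 0.5 ohm and C1 + C2 = 100 pF from the text (split evenly here)
branches = [(0.5, 50e-12), (0.5, 50e-12)]
for f in (50e6, 100e6, 200e6):
    print(f"{f / 1e6:.0f} MHz: {efficiency_estimate(branches, f, 3.3):.3%}")
```

The loss grows linearly with R, C, and f, consistent with the observation that minimizing R, or splitting a very large load across parallel drivers, keeps efficiency high.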
There are other possible variations of this clock generation circuit. Fig. 21 shows a slightly different version of the scheme. The main difference is that the pair of p-mos transistors is self-controlled in a cross-coupled fashion. The advantage of this change is that the energy for charging the p-mos transistors can be recycled while the oscillation frequency is still locked to the external reference frequency.
Simulation also indicates that the proposed clock generation circuit offers large tolerance to process and load variations. A 10% mismatch between $C_1$ and $C_2$ causes only a slight decrease in energy efficiency. With a 50% mismatch between $C_1$ and $C_2$, the clock frequency is still locked to the reference clock, although the waveform deviates severely from a sinusoid. The impact of other process variations is less significant.
Conclusions
In this paper, we presented QSERL, a low-energy, quasi-static adiabatic logic family. QSERL has lower switching activity than dynamic adiabatic logic and possesses several of the positive characteristics of static CMOS. An 8 x 8 QSERL multiplier has been designed, demonstrating the applicability of QSERL in a low-power system. Simulations indicate energy savings of nearly 40% over a static CMOS multiplier.
A scheme to generate two complementary sinusoidal supply clocks for QSERL circuits is also presented, and the high energy efficiency of this clock generation circuit is verified by simulation. More importantly, the ability to lock the supply clock frequency to the system clock makes QSERL circuits suitable for integration into a conventional VLSI system.
