Abstract: Clock Distribution Networks (CDNs) in high-speed designs can consume 30-50% of the total chip dynamic power. Adiabatic clock circuits can save some of this power, but they depend on a time-varying power supply, which is difficult to implement in practice. In this paper, we present the first quasi-adiabatic clock circuit that operates with a constant supply voltage at high speeds. Our proposed adiabatic clocks attain an average 23% clock power savings with better slew rate and the same skew compared to traditional buffered clocks.
I. INTRODUCTION
Clock Distribution Networks (CDNs) synchronize most data signals by providing a common timing reference. Although clock signals seem like simple control signals, they consume significant amounts of power and require high accuracy. In high-speed applications, CDNs have been shown to consume 30-50% of the total chip dynamic power. In modern technologies, power consumption has become a primary design issue: it can degrade circuit performance and reliability, and it can require bulky, expensive cooling systems such as heat sinks and fans. Reducing the power consumed in the clock network is highly challenging and is an actively researched area of circuit design and design automation [1].
Over the past decades, numerous effective techniques have been developed to decrease dynamic power consumption. Clock gating can reduce clock power, but its savings are limited because the clock cannot be disabled for devices that are active. Reducing the supply voltage decreases dynamic power quadratically; however, threshold-voltage lower bounds limit how far the supply can scale, since $V_{dd}^2/V_{threshold}^2 \geq 10$ [2]. Frequency scaling is often used in conjunction with voltage scaling to achieve a cubic power reduction, but many high-performance applications require improvements in peak performance. Recently, distributed resonant clocking has reduced dynamic clock power by up to 90% in theory [3] and 30% in practice [4]. However, the area overhead of its passive components prevents adoption. While these techniques effectively reduce power, increasing integration levels demand additional power-reduction techniques.
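The quadratic and cubic scaling claims above follow from the standard CMOS switching-power model $P = \alpha C V_{dd}^2 f$. The sketch below evaluates that model for a clock-like load (activity factor 1, 100 fF, 1.2 V, 2 GHz, chosen to resemble the 130 nm circuit in Section V); the function name is illustrative, and the model ignores buffer self-loading and short-circuit current.

```python
def dynamic_power(alpha, c_load, v_dd, freq):
    """Standard CMOS switching power P = alpha * C * Vdd^2 * f, in watts."""
    return alpha * c_load * v_dd ** 2 * freq


# Clock-like load: activity factor 1 (one full charge/discharge per cycle).
base = dynamic_power(1.0, 100e-15, 1.2, 2e9)

# Halving Vdd alone cuts power by 4x (quadratic dependence).
v_only = dynamic_power(1.0, 100e-15, 0.6, 2e9)

# Halving Vdd and f together cuts power by 8x (cubic dependence).
v_and_f = dynamic_power(1.0, 100e-15, 0.6, 1e9)

print(f"baseline:      {base * 1e6:.0f} uW")
print(f"Vdd/2:         {v_only / base:.3f} x baseline")
print(f"Vdd/2 and f/2: {v_and_f / base:.3f} x baseline")
```

The cubic case only helps when the lower clock frequency is acceptable, which is exactly the limitation noted for high-performance applications.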
Adiabatic circuits reduce dynamic power by recycling energy. Previous adiabatic clocking circuits were differential, relied on time-varying supplies that are hard to implement, and were practical only at low speeds. Quasi-adiabatic circuits with constant supplies, on the other hand, do not recycle 100% of the dynamic power, but they often have simpler designs that make them more practical to implement. In this paper, we introduce the first quasi-adiabatic clock circuit with a constant supply that saves substantial dynamic power at high speeds.
The remainder of this paper proceeds as follows: Section II presents adiabatic circuit models and reviews adiabatic behavior. Section III proposes our quasi-adiabatic circuit and its clock distribution application. Section IV presents the circuit design and implementation. Section V quantifies the major performance metrics of power, slew, and skew through circuit simulations on industrial clock benchmarks, compared against buffered clocks. Section VI concludes the paper and discusses the remaining roadmap for integrating our quasi-adiabatic clock circuit.
II. ADIABATIC CIRCUITS
The word adiabatic derives from the Greek for "impassable"; in thermodynamics, it describes a change of state with no heat transfer. Adiabatic switching circuits recycle energy between clock cycles, which decreases dynamic power dissipation.
There are two types of adiabatic circuits: fully adiabatic circuits, with asymptotically zero energy dissipation, and quasi-adiabatic circuits, with non-zero energy dissipation [5]. Fully adiabatic circuits are impossible in practice due to fundamental energy losses in real devices, although they are theoretically possible with ideal devices. Quasi-adiabatic circuits, on the other hand, cannot achieve zero energy dissipation even with ideal devices. However, this concession can enable simpler circuit implementations that still recycle substantial energy in the system.
Many quasi-adiabatic circuits use differential logic and a time-varying supply voltage. The two halves of a differential circuit perform complementary operations (they receive negated inputs and produce negated outputs). With a time-varying supply, only one half of the differential circuit operates while the other half is off, and each supply cycle produces an output and its complement. Even when the supply is low, the process completes with no loss of data. With a constant supply voltage, the energy dissipated per transition is typically $\frac{1}{2}CV_{dd}^2$, as shown in Figure 1 (a). With a time-varying supply, the energy dissipated becomes $\frac{RC}{T_s}CV_{dd}^2$, where $R$ is the driver resistance, $C$ is the switched capacitance, and $T_s$ is the supply period [5]. This reduction comes from spreading the charge transfer across the time $T_s$, as shown in Figure 1 (b), which reduces the peak current. Increasing $T_s$ therefore yields more dynamic power savings. Examples of this type of quasi-adiabatic circuit are 2N-2P inverter logic, 2N-2N2P logic [6], Clocked Adiabatic Logic (CAL) [7], Pass-Transistor Adiabatic Logic (PAL) [8], Positive Feedback Adiabatic Logic (PFAL) [9], and Differential Cascode Pre-resolve Adiabatic Logic (DCPAL) [10].
III. PROPOSED QUASI-ADIABATIC CLOCK
Our quasi-adiabatic clock circuits use a switched capacitor, as shown in Figure 2. Energy is recycled between the load capacitance ($C_{clk}$) and an extra storage element ($C_{adiabatic}$) to reduce overall power. The extra capacitor has two operating modes. First, it helps discharge the load capacitor, $C_{clk}$, from high to low while recovering some of the charge. Second, it aids the driving buffer by supplying charge to $C_{clk}$ during the low-to-high transition. The amount of charge recovered and reused in $C_{adiabatic}$ determines the overall efficiency of the energy recycling.
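To see why the time-varying-supply approach of Section II loses its advantage at high speeds, the sketch below compares the constant-supply energy $\frac{1}{2}CV_{dd}^2$ with the adiabatic energy $\frac{RC}{T_s}CV_{dd}^2$. It assumes a hypothetical 1 kΩ driver and 100 fF load, and takes $T_s$ equal to the clock period; the $RC/T_s$ expression is only accurate when $T_s \gg RC$, so the 4 GHz point is a rough indication at best.

```python
def conventional_energy(c, v_dd):
    """Energy dissipated per transition with a constant supply: (1/2) C Vdd^2."""
    return 0.5 * c * v_dd ** 2


def adiabatic_energy(r, c, v_dd, t_s):
    """Energy dissipated with a time-varying supply of period T_s: (RC/T_s) C Vdd^2.

    This approximation is only meaningful when T_s is much larger than RC.
    """
    return (r * c / t_s) * c * v_dd ** 2


R, C, VDD = 1e3, 100e-15, 1.2  # hypothetical: 1 kOhm driver, 100 fF load, RC = 100 ps

ratios = {}
for freq in (250e6, 1e9, 4e9):
    t_s = 1.0 / freq  # assume the supply period equals the clock period
    ratios[freq] = adiabatic_energy(R, C, VDD, t_s) / conventional_energy(C, VDD)
    print(f"{freq / 1e9:.2f} GHz (T_s = {t_s * 1e12:.0f} ps): "
          f"adiabatic / conventional = {ratios[freq]:.2f}")
```

The ratio simplifies to $2RC/T_s$, so it grows linearly with frequency: it is small at 250 MHz but reaches 0.8 at 4 GHz, where most of the benefit has disappeared.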
Our proposed clock is a switched-capacitor quasi-adiabatic circuit that does not recycle 100% of the energy, because the storage capacitor is active during only a limited part of each signal transition.
When two capacitors at different potentials are connected in parallel, current flows from the higher potential to the lower potential until both capacitors reach the same potential. At this point, current stops and all nodes are in steady state. If $C_{adiabatic}$ were never disconnected, the clock buffer would have to charge and discharge the total capacitive load $C_{clk} + C_{adiabatic}$ each cycle, which would increase the overall power consumption to $(C_{clk} + C_{adiabatic})V_{dd}^2 f$. To avoid this, $C_{adiabatic}$ is switched into the circuit only during the charge recovery and reuse stages using a passgate, as shown in Figure 2. The timing of the control signals, $control$ and $\overline{control}$, determines the duration of energy recovery and reuse. Figure 3 illustrates the key idea of our quasi-adiabatic circuit, which connects $C_{adiabatic}$ only when $V_{adiabatic} > V_{clk}$ and $C_{clk}$ is charging, or when $V_{adiabatic} < V_{clk}$ and $C_{clk}$ is discharging. If $C_{adiabatic}$ is sufficiently larger than $C_{clk}$, it holds a steady-state voltage, $V_{steadystate}$, somewhere between $V_{dd}$ and 0, as shown in Figure 3.
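A minimal sketch of the charge-sharing behavior described above, under idealized assumptions that are ours rather than the paper's circuit analysis: ideal switches, each sharing phase runs until $V_{clk} = V_{adiabatic}$, and the buffer then completes the transition. Iterating many cycles shows $C_{adiabatic}$ settling between 0 and $V_{dd}$, close to $V_{dd}/2$ when $C_{adiabatic} \gg C_{clk}$. The function name `simulate_cycles` is hypothetical; the capacitor values match the 130 nm experiment in Section V.

```python
def simulate_cycles(c_clk, c_adi, v_dd, v_adi0=0.0, n_cycles=2000):
    """Idealized charge recycling between C_clk and C_adiabatic (hypothetical model).

    Assumes ideal switches, full voltage equalization in each sharing phase,
    and a buffer that then completes each transition. Returns the final
    C_adiabatic voltage and the supply charge drawn during the last cycle.
    """
    v_adi = v_adi0
    q_supply = 0.0
    for _ in range(n_cycles):
        # Rising edge: C_clk starts at 0 V and shares charge with C_adiabatic.
        v_adi = c_adi * v_adi / (c_adi + c_clk)
        # The buffer then charges C_clk from the shared voltage up to Vdd.
        q_supply = c_clk * (v_dd - v_adi)
        # Falling edge: C_clk starts at Vdd and returns charge to C_adiabatic;
        # the buffer then discharges C_clk to 0 V without drawing supply charge.
        v_adi = (c_adi * v_adi + c_clk * v_dd) / (c_adi + c_clk)
    return v_adi, q_supply


C_CLK, C_ADI, VDD = 100e-15, 10e-12, 1.2  # 100 fF load, 10 pF storage, 1.2 V
v_ss, q = simulate_cycles(C_CLK, C_ADI, VDD)
savings = 1 - q / (C_CLK * VDD)  # relative to a buffer drawing C_clk * Vdd per cycle
print(f"V_adiabatic after falling edge: {v_ss:.3f} V")
print(f"ideal supply-charge savings:    {savings:.1%}")
```

In this idealized model the supply charge per cycle approaches half of a plain buffer's, so the resulting figure is a loose ceiling rather than a prediction: it ignores passgate resistance, the buffer conducting during the sharing phase, and the power of the control signals.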
The control signals depend on the crossover point where $V_{clk} = V_{adiabatic}$, which in turn depends on the slew rate of the clock and any fluctuation in $V_{adiabatic}$. Typical clocks have slews on the order of 10% of the clock period [1]. We size $C_{adiabatic}$ so that its fluctuation can be neglected to first order, but the amounts of charging and discharging must be equal to maintain the steady-state voltage. The passgate control signals, $control$ and $\overline{control}$, run at double the clock frequency, since they must enable $C_{adiabatic}$ twice per clock period: once for charging and once for discharging. For simplicity, we assume that the recovery and reuse stages have the same duration. Different recovery and reuse periods would require increasingly complex circuits, potentially using voltage comparators, and would not preserve a steady-state voltage on $C_{adiabatic}$.
IV. CIRCUIT IMPLEMENTATION
The switched control signals must arrive before the clock transition so that the passgate is enabled in time. This is achieved by tapping them from a higher level of the clock tree, as shown in Figure 4. Typically, the load on these control signals is much smaller than that of the clock and the remaining levels of the clock tree, so this timing constraint is not difficult to meet.
To double the control signal frequency, we use the proposed pulse generator circuit shown in Figure 5. The output of the NAND gate, a short pulse, drives an even number of series inverters, which delay the pulse while preserving its polarity. At the same time, the pulse from the NAND gate is an input to an OR gate. This results in two pulses at different times. The number of inverters before the NAND gate determines the pulse width, while the number of inverters before the OR gate determines the time gap between the two pulses. The output signal is $control$; inverting $control$ with a simple inverter produces $\overline{control}$.
Fig. 5. The pulse generator circuit controls the pulse width and pulse separation to control the two phases of charge recovery and reuse.
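The following is a behavioral sketch of the two-pulse idea, not a gate-level netlist of Figure 5. Assumptions: time advances in unit steps of one gate delay, pulses are modeled as active-high (an AND-style edge detector stands in for the NAND stage), and the gap delay is set to half a clock period so that the second pulse lands on the falling edge, giving two control pulses per clock period. The names `delay` and `pulse_generator` are hypothetical.

```python
def delay(sig, n):
    """Delay a sampled logic waveform by n steps, padding the start with logic 0."""
    if not 0 <= n <= len(sig):
        raise ValueError("delay must be between 0 and the waveform length")
    return [0] * n + sig[:len(sig) - n]


def pulse_generator(clk, n_width, n_gap):
    """Behavioral two-pulse generator (hypothetical, active-high pulses).

    n_width: length of the delay chain that sets the pulse width.
    n_gap:   length of the delay chain that sets the spacing between the pulses.
    Assumes clk starts low, so the zero padding does not create a false pulse.
    """
    # Edge detector: clk AND (delayed, inverted clk) is high for n_width steps
    # after every rising edge of clk.
    delayed = delay(clk, n_width)
    pulse = [c & (1 - d) for c, d in zip(clk, delayed)]
    # OR the pulse with a copy delayed by n_gap to obtain the second pulse.
    late = delay(pulse, n_gap)
    control = [p | q for p, q in zip(pulse, late)]
    control_bar = [1 - x for x in control]
    return control, control_bar


# Three clock periods of 20 steps each, 50% duty cycle, starting low.
clk = ([0] * 10 + [1] * 10) * 3
control, control_bar = pulse_generator(clk, n_width=3, n_gap=10)
print("control high at steps:", [i for i, v in enumerate(control) if v])
```

With `n_gap` equal to half the clock period, `control` pulses once at each rising edge and once at each falling edge, which is the frequency doubling the text describes.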
The sizes of the passgate and clock buffer transistors in Figure 2 determine the efficiency of our proposed system. The equivalent circuit of Figure 2 is shown in Figure 6. The current flowing into $C_{clk}$ during the $0 \to 0.5V_{dd}$ transition is the sum of the buffer current and the current from $C_{adiabatic}$, and can be expressed by
where $R_p$ is the passgate equivalent resistance, $(V_{dd}/2)(1 - e^{-t/R_pC_{clk}})$ is the change in potential from $C_{adiabatic}$, and $V_{dd}(1 - e^{-t/R_{buf}C_{clk}})$ is the change in potential from $V_{dd}$ through the buffer. There is an additional current from $V_{dd}$ to $C_{adiabatic}$, but it has no net effect on $C_{clk}$.
When the clock voltage reaches $0.5V_{dd}$, the passgate turns off and $C_{clk}$ is driven solely by the clock buffer for the remainder of the transition from $0.5V_{dd}$ to $V_{dd}$. The clock buffer must be sized to satisfy the 10%-90% clock slew requirement when it "takes over" midway through the transition. This can be expressed by
where $R_{buf}$ is the buffer equivalent resistance and $T_{slew}$ is the required slew time. This assumes that the slew rate is constant during the entire transition, which is an approximation. In practice, the approximation is conservative, since the first half of the transition is likely faster due to the adiabatic charging assistance. Solving Equation 2 with the Newton-Raphson method gives the required value of $R_{buf}$.
Since $q = \int i(t)\,dt$, the charge $q$ can be expressed from Equation 1 as
To obtain the voltage $V_{clk}(t)$, we use $V_{clk}(t) = q/C_{clk}$ together with the condition $V_{clk}(T_{CO}) = V_{dd}/2$, where $T_{CO}$ is the crossover time at which $V_{clk} = V_{steadystate}$. This gives one equation in one unknown, $R_p$, which can be obtained by
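Equations 1 to 4 are not reproduced in this text. As a hedged illustration only, assume that $V_{clk}(t)$ during the $0 \to 0.5V_{dd}$ phase is the sum of the two potential terms named after Equation 1. Under that assumption, the condition $V_{clk}(T_{CO}) = V_{dd}/2$ has a closed-form solution for $R_p$, so no iterative solver is needed. The helper names, the 2 kΩ buffer resistance, and the 50 ps crossover time are hypothetical.

```python
import math


def v_clk(t, r_p, r_buf, c_clk, v_dd):
    """Assumed model: (Vdd/2)(1 - e^{-t/(R_p C_clk)}) + Vdd (1 - e^{-t/(R_buf C_clk)})."""
    return (0.5 * v_dd * (1.0 - math.exp(-t / (r_p * c_clk)))
            + v_dd * (1.0 - math.exp(-t / (r_buf * c_clk))))


def solve_r_p(t_co, r_buf, c_clk):
    """Return R_p such that V_clk(T_CO) = Vdd/2 under the assumed model.

    With b = 1 - e^{-T_CO/(R_buf C_clk)} (buffer contribution in units of Vdd),
    the condition reduces to e^{-T_CO/(R_p C_clk)} = 2b, so
    R_p = T_CO / (C_clk * ln(1 / (2b))). A positive R_p exists only if 0 < b < 1/2.
    """
    b = 1.0 - math.exp(-t_co / (r_buf * c_clk))
    if not 0.0 < b < 0.5:
        raise ValueError("no positive R_p satisfies V_clk(T_CO) = Vdd/2")
    return t_co / (c_clk * math.log(1.0 / (2.0 * b)))


C_CLK, VDD = 100e-15, 1.2  # load and supply from the 130 nm experiment
R_BUF = 2e3                # hypothetical buffer resistance (2 kOhm)
T_CO = 50e-12              # hypothetical crossover time (50 ps)

r_p = solve_r_p(T_CO, R_BUF, C_CLK)
print(f"R_p = {r_p:.0f} ohm, V_clk(T_CO) = {v_clk(T_CO, r_p, R_BUF, C_CLK, VDD):.3f} V")
```

The guard reflects a physical limit of the assumed model: if the buffer alone already reaches $V_{dd}/2$ by $T_{CO}$, no passgate resistance can place the crossover there.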
Fig. 6. While charging, during the $0 \to 0.5V_{dd}$ transition, the total current flowing to $C_{clk}$ is the buffer current $i_{buf}(t)$ plus the current flowing from $C_{adiabatic}$, $i_p(t)$.
From Equation 2 and Equation 4, the buffer and passgate equivalent resistances that maintain $V_{steadystate} = V_{dd}/2$ are known. The buffer and passgate widths are then obtained from $R_{buf}$ and $R_p$, respectively, using the transistor ON-resistance equation:
$$R_{on} = \frac{1}{\mu C_{ox} (W/L)(V_{GS} - V_{TH})}$$
where $\mu$ is the mobility, $C_{ox}$ is the gate oxide capacitance per unit area, $W/L$ is the aspect ratio, $V_{GS}$ is the gate-source voltage, and $V_{TH}$ is the threshold voltage.
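Assuming the first-order linear-region form $R_{on} = 1/(\mu C_{ox}(W/L)(V_{GS} - V_{TH}))$, a target resistance can be inverted directly into a width, $W = L/(\mu C_{ox} R_{on}(V_{GS} - V_{TH}))$. The sketch below does that arithmetic; the process parameters ($\mu C_{ox}$ = 300 µA/V², $V_{TH}$ = 0.4 V) and the 600 Ω target are hypothetical placeholders, not IBM 130 nm values, and the equation ignores short-channel effects.

```python
def on_resistance(k_prime, w, l, v_gs, v_th):
    """First-order linear-region ON resistance: 1 / (mu*C_ox * (W/L) * (V_GS - V_TH))."""
    return 1.0 / (k_prime * (w / l) * (v_gs - v_th))


def width_for_resistance(r_on, k_prime, l, v_gs, v_th):
    """Solve the ON-resistance equation for W given a target R_on."""
    return l / (k_prime * r_on * (v_gs - v_th))


K_PRIME = 300e-6       # hypothetical mu*C_ox in A/V^2
L_CH = 0.13e-6         # channel length in m (130 nm node)
V_GS, V_TH = 1.2, 0.4  # full-swing gate drive; hypothetical threshold
R_TARGET = 600.0       # hypothetical target resistance in ohms

w = width_for_resistance(R_TARGET, K_PRIME, L_CH, V_GS, V_TH)
print(f"W = {w * 1e6:.2f} um for R_on = {on_resistance(K_PRIME, w, L_CH, V_GS, V_TH):.0f} ohm")
```

In a real flow, the extracted width would normally be checked and adjusted in SPICE, since the first-order equation is only a starting point.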
Sizing $C_{adiabatic}$ is another important consideration. A larger $C_{adiabatic}$ saves more power by avoiding large voltage fluctuations. In our experiments, ratios $C_{adiabatic}/C_{clk}$ between 10 and 100, with varying efficiency, keep the fluctuation minimal while reducing dynamic power without area overhead.
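The sizing trend can be made concrete with an idealized charge-sharing model (ideal switches and full equalization in each phase; this is our assumption, not the paper's analysis). If $C_{adiabatic}$ sits at $V_a$ before a rising edge, sharing brings both capacitors to $kV_a$ with $k = C_{adiabatic}/(C_{adiabatic} + C_{clk})$, and the falling edge returns it to $k^2V_a + (1-k)V_{dd}$. Setting this equal to $V_a$ gives $V_a = V_{dd}/(1+k)$, from which the steady-state swing $V_{dd}/(2r+1)$ and ideal charge savings $r/(2r+1)$ follow, with $r = C_{adiabatic}/C_{clk}$. The helper names are hypothetical.

```python
def steady_state_ripple(ratio, v_dd):
    """Peak-to-peak swing on C_adiabatic: Vdd / (2r + 1), with r = C_adiabatic / C_clk."""
    return v_dd / (2 * ratio + 1)


def ideal_savings(ratio):
    """Ideal fraction of supply charge saved versus a plain buffer: r / (2r + 1)."""
    return ratio / (2 * ratio + 1)


VDD = 1.2
for r in (1, 10, 100):
    print(f"r = {r:>3}: ripple = {steady_state_ripple(r, VDD) * 1e3:6.1f} mV, "
          f"ideal savings = {ideal_savings(r):.1%}")
```

The ripple shrinks roughly as $1/(2r)$ while the savings ceiling saturates near 50%, which is one way to see why ratios in the 10 to 100 range are a reasonable compromise against capacitor size.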
V. EXPERIMENTS

A. Experimental Setup
Our simulations use an IBM 130 nm technology for circuit-level analysis and the ISPD2010 clock synthesis benchmarks in FreePDK 45 nm for system-level analysis, since we do not have benchmarks in the IBM technology. The supply voltages are 1.2 V and 1 V for the IBM and FreePDK technologies, respectively. The circuit simulations use operating frequencies of 1, 2, and 4 GHz, while the system benchmarks are analyzed only at 1 GHz. Figure 7 shows the charging and discharging activity of $C_{adiabatic}$ relative to the output clock signal. $C_{adiabatic}$ discharges during the rising edge of the clock until $C_{adiabatic}$ and $C_{clk}$ reach the same potential, and then $C_{adiabatic}$ is disconnected. At the falling edge of the clock, $C_{adiabatic}$ charges from $C_{clk}$ until both reach the same potential, and then $C_{adiabatic}$ is disconnected. These simulations confirm at the fundamental level that our circuit recycles charge back and forth between $C_{clk}$ and $C_{adiabatic}$. Experiments also show that our proposed quasi-adiabatic clock converges to steady state after 80 ns.
B. Circuit Analysis
In IBM 130 nm, we built a single buffer driving a $C_{clk}$ of 100 fF at a 2 GHz clock frequency. We then added a passgate (8 µm NMOS and 8 µm PMOS) and a $C_{adiabatic}$ of 100× $C_{clk}$ (10 pF). Simulation results show that the proposed clock circuit has a smaller slew than a buffered clock with the same specifications. Moreover, the dynamic power consumption is reduced by 24%, from 482 µW (buffered clock) to 366 µW. This 24% saving uses the same buffer size, a 6 µm PMOS transistor and a 3 µm NMOS transistor. To match the signal performance (i.e., slew) of the buffered clock, the proposed clock buffer can be reduced to a 5 µm PMOS and a 2.8 µm NMOS. This reduction in buffer size results in a dynamic power of 342 µW, which is almost 29% power savings. Table I shows the total power consumption, including the passgate power consumption with ideal control signals, for a range of frequencies. The dynamic power saving trend is consistent at nearly all frequencies. However, at higher frequencies the passgate control logic consumes more power, since its power increases linearly with frequency, which reduces the overall total power savings. Transient analysis shows that the voltage across $C_{adiabatic}$ discharges during the rising edge of the clock and charges during the falling edge, and in both cases $C_{adiabatic}$ is disconnected once the two capacitors reach the same potential.
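The percentages quoted above follow directly from the reported powers. The short check below recomputes them and converts each power to energy per 2 GHz clock cycle ($E = P/f$); only the helper name `savings_percent` is ours, and all numbers come from this subsection.

```python
def savings_percent(p_baseline, p_new):
    """Percentage reduction of p_new relative to p_baseline."""
    return 100.0 * (p_baseline - p_new) / p_baseline


F_CLK = 2e9           # 2 GHz clock used in the experiment
P_BUFFERED = 482e-6   # W, buffered clock
P_SAME_BUF = 366e-6   # W, proposed clock with the same buffer size
P_SMALL_BUF = 342e-6  # W, proposed clock with the downsized buffer

for label, p in (("same buffer", P_SAME_BUF), ("downsized buffer", P_SMALL_BUF)):
    print(f"{label:16s}: {savings_percent(P_BUFFERED, p):.1f}% savings, "
          f"{p / F_CLK * 1e15:.0f} fJ per cycle "
          f"(buffered: {P_BUFFERED / F_CLK * 1e15:.0f} fJ)")
```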
C. Benchmark Analysis
In Table II, the results show that the proposed adiabatic clock reduces power consumption by an average of 23.4% in high-frequency systems across different chip sizes and different sink capacitances and placements. The highest savings occur for smaller benchmarks with higher capacitance per unit area. For large benchmarks with small capacitance per unit area, the power savings drop to an average of 5%, likely because interconnect parasitics decrease the system efficiency.
VI. CONCLUSION
In this paper, we presented the first quasi-adiabatic clock circuit with a constant supply voltage at high speeds. SPICE simulation results show that the proposed technique is a promising approach that can save on average 23% of clock power over a range of clock frequencies. In addition, the proposed circuit is easier to implement than existing adiabatic clock circuits because it uses a constant supply voltage, and, unlike previous adiabatic circuits, it can work at very high clock frequencies.
Since the $C_{adiabatic}$ capacitors can be quite large, we plan to investigate integrated on-package capacitors, which offer high capacitance density in a small footprint. Together with the proposed pulse generator, this yields a complete clock design that is ready for fabrication and use at the system level.
