An adiabatic charge pump based charge recycling design style by Manne, Vineela
Retrospective Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 
1-1-2003 
Recommended Citation 
Manne, Vineela, "An adiabatic charge pump based charge recycling design style" (2003). Retrospective 
Theses and Dissertations. 19497. 
https://lib.dr.iastate.edu/rtd/19497 
An adiabatic charge pump based charge recycling design style 
by 
Vineela Manne 
A thesis submitted to the graduate faculty 
in partial fulfillment of the requirements for the degree of 
MASTER OF SCIENCE 
Major: Computer Engineering 
Program of Study Committee: 
Akhilesh Tyagi (Major Professor) 
Chris Chu 
Soma Chaudhuri 





Iowa State University 
This is to certify that the master's thesis of 
Vineela Manne 
has met the thesis requirements of Iowa State University 
TABLE OF CONTENTS 
LIST OF FIGURES 
LIST OF TABLES 
ACKNOWLEDGEMENTS 
ABSTRACT 
1 INTRODUCTION 
2 POWER DISSIPATION IN CMOS DIGITAL CIRCUITS 
3 EXISTING LOW POWER TECHNIQUES 
4 BLOCK LEVEL DESCRIPTION 
5 CHARGE PUMP OPERATION 
5.1 Modification to Standard Charge Pump 
5.2 Circuit operation 
5.3 Determining Output Capacitor Value: A Design Issue 
5.4 Energy analysis of Charge Pump Circuit 
6 BLOCK LEVEL IMPLEMENTATION 
6.1 Multiple Charge Pumps based Implementations 
7 SIMULATION RESULTS AND PERFORMANCE ANALYSIS 
8 CONCLUSIONS 
BIBLIOGRAPHY 
LIST OF FIGURES 
Figure 1. Short circuit current 
Figure 2. Conceptual description of the proposed architecture 
Figure 3. Simplified model of a charge pump circuit 
Figure 4. Current path during charge pump operation 
Figure 5. Three-stage input model for a charge pump 
Figure 6. Step 1 
Figure 7. Step 2 
Figure 8. Step 3 
Figure 9. Step 4 
Figure 10. Snapshot of charge pump circuit schematic 
Figure 11. Voltage comparator 
Figure 12. Block level connections for charge pumps 
Figure 13. FIR filter structure 
Figure 14. Spatial implementation of multiple charge pumps 
Figure 15. Time multiplexing multiple charge pumps 
Figure 16. Energy cost comparison 
Figure 17. Energy savings 
Figure 18. Energy costs - Time multiplexed implementation 
Figure 19. Area figures with and without charge pumps 
LIST OF TABLES 
Table 1(a). Energy costs with and without the charge pump (CP) circuit in various circuit blocks with two random sample inputs (set1 and set2). 
Table 1(b). Energy costs with and without time multiplexed charge pumps (CP) in various circuit blocks with two random sample inputs (set1 and set2). 
Table 1(c). Area comparison as a measure of transistor count with and without the charge pumps and control circuitry. 
Table 2. Delay figures with and without the charge pump circuit 
ACKNOWLEDGEMENTS 
I would like to take this opportunity to express my gratitude to all who helped me in various 
aspects of my research and writing this thesis. First and foremost I would like to thank Dr. 
Akhilesh Tyagi for his encouragement and backing, and stimulating suggestions that made 
working on my thesis an exciting experience. His expert guidance helped me gain very useful 
insights into the area of Low Power VLSI, which proved to be an invaluable contribution to 
my thesis. 
I would also like to thank my parents and brother for their support, encouragement and 
understanding. 
Last but not least, I would also like to thank all my friends for standing by me in my good 
and bad times during my stay at Iowa State University. 
ABSTRACT 
A typical CMOS gate draws energy equal to $C_L V_{dd}^2$ from the power supply (Vdd), where $C_L$ 
is the load capacitance. Half of this energy is dissipated in the pull-up p-type network, and the 
other half is dissipated in the pull-down n-type network. Adiabatic CMOS circuits reduce the 
dissipated energy by providing the charge at a rate significantly lower than the inherent RC 
delay of the gate. The charge can also be recovered with an RLC oscillator based power 
supply. However, the two main problems with the adiabatic design style are the design of a high 
frequency RLC oscillator for the power supply, and the need to slow down the rate of charge 
supply for lower energy. This reduction in speed of operation is a critical concern in 
circuits which are designed to satisfy certain timing requirements, thereby rendering the 
adiabatic technique inapplicable in certain situations. A new approach incorporating an 
adiabatic charge pump that moves the slower adiabatic components away from the critical 
path of the logic is proposed in this work. The adiabatic delays of a charge pump are 
overlapped with the computing path logic delays. Hence, the proposed charge pump based 
recycling technique is especially effective for pipelined datapath computations (digital signal 
processing, DSP, is such a domain) where timing considerations are important. Also the 
proposed design style does not interfere with the critical path of the system, and hence the 
delay introduced by this scheme does not reduce the overall computational speed. In this 
work, we propose one implementation schema that involves tapping the ground-bound 
charge in a capacitor (virtual ground) and using an adiabatic charge-pump circuit to feed 
internal virtual power supplies. As the design relies on leakage charge to generate virtual 
power supplies, it is most effective in large circuits that undergo considerable switching 
activity resulting in substantial charge recycling by the proposed scheme. The proposed 
method has been implemented in DSP applications such as FIR filters, DCT/IDCT filters and FFT 
filters. Simulation results in SPICE indicate that the proposed scheme reduces energy 
consumption in these DSP circuits by as much as 18% with no loss in performance, paving 
the way for a new approach towards conserving energy in complex digital systems. 
1 INTRODUCTION 
The cost of VLSI chips was earlier driven primarily by area and performance considerations. 
Recent improvements in silicon process technologies have brought down these costs to some 
extent. While these new technologies have delivered higher performance and higher 
integration, they have also resulted in higher power dissipation in these chips. This 
increase in power dissipation has compounded the problems of packaging and ensuring 
reliable operation of these chips. Thus, for many applications, it has become essential to 
minimize the power dissipated in order to reduce the costs associated with cooling the chip. 
The rapid growth in the deployment of highly complex digital circuits in communication, 
signal processing and portable wireless applications has created a large market for low-cost, 
high throughput and low-power devices. Considerable research efforts over the last decade 
have been focused on building digital circuits that can operate at high clock frequency [1, 2]. 
Coupled with developments in processing technology and shrinking feature size, complex 
systems with high computational capacity are the norm of the day. Paralleling the increased 
performance requirements are the low power specifications (often needed simultaneously 
with high performance). In fact power consumption is slowly becoming the main bottleneck 
in the implementation of these high-complexity circuits in certain applications. 
Although traditionally low power digital circuits have mainly targeted low throughput 
markets like calculators and palmtops, recently the industry has witnessed an increasing 
demand for Personal Communication Services (PCSs), such as digital cell phones and 
portable computing [3, 4]. On one hand, with PCS applications extending to wireless data 
transfer, multimedia applications with voice and image compression capabilities, complex 
DSP circuits are required to satisfy the thirst for high computation power. On the other hand, 
critical issues like power supply design, thermal issues, along with limited improvements in 
battery energy per unit weight and volume have created the need for building systems that consume 
less energy without sacrificing performance. Low power designs are also attractive from an 
alternate viewpoint. Although it is not directly evident, even in non-portable wired 
applications, low power design is highly critical. With increasing package density and 
operating speeds, power dissipation and package cooling limits are becoming important 
issues. Designing large cooling fans (or worse yet having to resort to liquid cooling) is not 
only difficult but also adds to overall system cost. 
The first step towards implementing a low power technique or alternatively a power 
minimization technique in any VLSI system is power estimation. Power estimation refers to 
the process of determining with a high level of confidence, the power consumed by a circuit. 
The most general and the most accurate power estimation may be performed at the circuit 
level using tools such as SPICE. Although estimation at such a low level is very accurate, the 
time taken for the estimation precludes the use of this technique for large circuits. For large 
circuits, it is therefore essential to estimate power at higher levels of abstraction in the design 
process. 
Power optimization and low power design techniques can be implemented at various levels 
of abstraction: technology, circuit, architecture and algorithm [5]. In this work, a new method 
of conserving power is proposed: the charge bound for the ground terminal (ground-bound 
charge/current) is tapped and recycled to generate virtual internal Vdd terminals. This 
reduces energy wastage and consumption. 
A brief description of the various leakage mechanisms in CMOS circuits is given in 
Chapter 2. The existing low power methodologies are explained in Chapter 3. 
Chapters 4, 5 and 6 give a complete block level description of the proposed scheme, with design 
details and operational methodology. Simulation results and a detailed performance 
analysis are presented in Chapters 7 and 8. Finally, the various target applications where the 
approach can be adopted are described in Chapter 8. 
2 POWER DISSIPATION IN CMOS DIGITAL CIRCUITS 
The main sources of power consumption in CMOS circuits can be classified into two 
types, namely static power dissipation and dynamic power dissipation, as given in Equation (1): 

$$P_{Total} = \underbrace{I_{leakage} V_{dd}}_{static} + \underbrace{C_L V_{dd}^2 f_{clk}\,\alpha + I_{short\,circuit} V_{dd}}_{dynamic} \qquad (1)$$

The first term in Equation (1) refers to the static dissipation that occurs when the circuit is 
inactive. Power dissipation during those times is mainly due to the leakage current arising 
from substrate injection and sub-threshold effects. These components are determined by 
fabrication technology and hence are not design dependent. Also, this is a small fraction of 
the total dissipation in comparison to the other components, for computational logic. This 
component is larger in case of NMOS circuits which has made the replacement of NMOS 
with CMOS worthwhile although the CMOS process is more complex than the NMOS 
process. 
The second term in Equation (1) refers to dynamic power dissipation that occurs when the 
outputs toggle due to change in input logic level. This is by far the dominant component of 
power dissipation. This dynamic component of power can be further classified into switching 
dissipation and short circuit dissipation. 
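
As a rough illustration of Equation (1), the following Python sketch evaluates the static, switching and short circuit terms separately. The function name `total_power` and every numerical value are assumptions chosen only to show the relative size of the terms; they are not taken from the simulations reported in this work.

```python
def total_power(i_leakage, c_load, vdd, f_clk, activity, i_short_circuit):
    """Evaluate the three terms of Equation (1); all quantities in SI units."""
    static = i_leakage * vdd                          # leakage (static) term
    switching = c_load * vdd ** 2 * f_clk * activity  # capacitive switching term
    short_circuit = i_short_circuit * vdd             # short circuit term
    return {
        "static": static,
        "switching": switching,
        "short_circuit": short_circuit,
        "total": static + switching + short_circuit,
    }


# Assumed, illustrative operating point (not measured data from this work).
example = total_power(i_leakage=1e-9, c_load=1e-12, vdd=2.5,
                      f_clk=100e6, activity=0.2, i_short_circuit=1e-6)
for name, watts in example.items():
    print(f"{name:>13}: {watts:.3e} W")
```

With these assumed values the switching term dominates, which is the situation the rest of this chapter concentrates on.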
Switching dissipation is the effect of charging and discharging of load capacitances (CL) in 
the circuit. When CMOS circuits switch, the output is either charged up to Vdd, or discharged 
down to ground. In static logic design, the output only transitions on an input transition, 
while in dynamic logic, the output is precharged during half the clock cycle, and transitions 
can only occur in the second clock phase, depending upon the input values. In both cases, the 
power dissipated during switching is proportional to the capacitive load; however, they have 
different transition frequencies. Each time a capacitor charges, energy is drawn from the 
supply terminal which is then dumped to ground when the capacitor discharges. The energy 
loss by this mechanism is dependent on factors like size of load capacitance (CL), supply 
voltage (Vdd), clock frequency ($f_{clk}$) and the "activity factor ($\alpha$)" which refers to the 
probability with which the power consuming transitions occur. Significant savings in 
dynamic power dissipation can be obtained from operation at a reduced supply voltage. 
However to improve noise margins and performance concurrently, threshold voltages need to 
be lowered too. The threshold voltages place a limit on the minimum supply voltage that can 
be used without incurring unreasonable delay penalties, which in turn determines the 
switching power dissipation. Threshold voltages as low as possible are desired but 
unfortunately at too low threshold voltages the static component of power due to 
subthreshold currents becomes significant. Thus design of devices and selection of supply 
and threshold voltages require tradeoffs. Since dynamic switching power is the major 
component of overall power dissipation, the low-power design methodology concentrates on 
minimizing total capacitance, supply voltage, and frequency of transitions. The effective 
final switching power dissipation term as given in Equation (1) can be derived by considering 
the instantaneous current and voltage at the various time instances as given in Equation (2) 
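
$$P_d = \frac{1}{T}\int_{0}^{T/2} I_n(t)\,V_{out}\,dt + \frac{1}{T}\int_{T/2}^{T} I_p(t)\,(V_{dd} - V_{out})\,dt \qquad (2)$$

Since $I_n(t) = C_L\,dV_{out}/dt$ and $I_p(t) = C_L\,d(V_{dd} - V_{out})/dt$, 

$$P_d = \frac{C_L}{T}\int_{0}^{V_{dd}} V_{out}\,dV_{out} + \frac{C_L}{T}\int_{V_{dd}}^{0} (V_{dd} - V_{out})\,d(V_{dd} - V_{out}) = \frac{C_L V_{dd}^2}{T} \qquad (3)$$
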
which is the same as the switching term in Equation (1) with $f_{clk} = 1/T$ and an activity factor of one. From Equation (3) we can see that dynamic power is 
proportional to switching frequency and square of supply voltage but independent of device 
parameters. Also, since 2/T is the average number of transitions per second, $C_L V_{dd}^2/2$ is the 
energy dissipated per transition. 
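
The claim that the switching loss does not depend on device parameters can be checked numerically. The sketch below is a simple forward-Euler simulation, written for illustration only, of a capacitor charged from 0 to Vdd through a resistor that stands in for the pull-up network. The function name `charging_loss`, the resistor model and the component values are assumptions and not part of the original analysis.

```python
def charging_loss(c_load, vdd, resistance, steps=20_000):
    """Integrate i^2 * R while c_load charges from 0 V to vdd through `resistance`."""
    tau = resistance * c_load
    dt = 10.0 * tau / steps          # simulate ten RC time constants
    v_cap = 0.0
    loss = 0.0
    for _ in range(steps):
        current = (vdd - v_cap) / resistance
        loss += current * current * resistance * dt  # energy burned in the resistor
        v_cap += current * dt / c_load               # capacitor voltage update
    return loss


c_load, vdd = 1e-12, 2.5             # assumed values
expected = 0.5 * c_load * vdd ** 2   # C_L * Vdd^2 / 2
for resistance in (1e3, 1e5):        # two very different "device strengths"
    print(resistance, charging_loss(c_load, vdd, resistance), expected)
```

Both resistances give approximately the same loss, close to $C_L V_{dd}^2/2$. Only the time taken to charge the capacitor changes.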









Figure 1. Short circuit current 
The second component of the dynamic power dissipation is the short circuit dissipation. 
Short circuit dissipation occurs mainly during switching transients. In static CMOS circuits, 
while $V_{tn} < V_{in} < V_{dd} - |V_{tp}|$, where $V_{in}$ is the voltage at a changing input with 
the other inputs steady, both the NMOS (pull down) and PMOS (pull up) subnetworks 
conduct simultaneously and a short circuit path exists for direct current to flow from Vdd to 
the ground terminal. This is typically observed when the inputs vary slowly and the 
input voltage lies in a region where it is higher than the threshold voltage $V_{tn}$ of the n-type 
transistor and at the same time lower than $V_{dd} - |V_{tp}|$, where $V_{tp}$ is the threshold voltage 
of the p-type transistor. The mean short circuit power dissipation is given by 

$$P_{sc} = I_{mean} V_{dd}$$

The short circuit dissipation of the gate varies with output load and the input signal slope as 
shown in Figure 1 [11, 12]. It decreases in both absolute terms and as a fraction of total 
dissipation as the output load increases. In Figure 1, T is the time period, $\tau_R$ is the rise time 
and $\tau_F$ is the fall time of the input signal $V_{in}$. 
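
To make the conduction condition concrete, the following sketch computes how long both networks conduct during one input edge. It assumes a linear input ramp from 0 to Vdd over the rise time, which is a simplification introduced here; the function names and the numbers are illustrative assumptions only.

```python
def short_circuit_window(vdd, vtn, vtp_abs, rise_time):
    """Time for which Vtn < Vin < Vdd - |Vtp| during a linear 0 -> vdd input ramp."""
    conducting_range = (vdd - vtp_abs) - vtn   # width of the overlap region in volts
    if conducting_range <= 0:
        return 0.0                             # the two networks never conduct together
    return rise_time * conducting_range / vdd


def mean_short_circuit_power(i_mean, vdd):
    """Mean short circuit power, I_mean * Vdd."""
    return i_mean * vdd


# Assumed values: 2.5 V supply, 0.5 V thresholds, 1 ns input rise time.
print(short_circuit_window(2.5, 0.5, 0.5, 1e-9))
# With Vdd below Vtn + |Vtp| there is no overlap region at all.
print(short_circuit_window(0.9, 0.5, 0.5, 1e-9))
```

A slower input edge lengthens the window proportionally, which matches the observation that slowly varying inputs make the short circuit component worse.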
The aim of this work is to tap the ground-bound charge that is lost by any combination of 
these mechanisms and recycle it by generating internal power supplies (as will be explained 
in the later chapters). 
3 EXISTING LOW POWER TECHNIQUES 
The common practices followed to reduce power consumption in CMOS circuits can be 
listed as follows [16, 17]. 
• Good design practices 
• Process size shrinking 
• Scaling of supply voltage 
• Transistor Sizing 
• Clock gating to reduce transitions 
• Power down testability blocks when not in test mode 
• Power down functional blocks 
• Minimum number of sequential elements in circuit 
• Checking for low slope signals in design and rectifying them. 
• Downsizing the non critical path circuits 
• Reduction of clock loading 
• Parallelism 
• Charge recycling 
• Adiabatic Circuits 
One of the most common low power synthesis techniques is lowering the supply voltage 
coupled with techniques that compensate for the loss of performance. Logic and circuit level 
transformations minimize dynamic power dissipation based on switching activity in the 
circuit. Power dissipation is now considered at all phases of the design cycle so that it can be 
reduced efficiently. Power dissipation in a circuit can be calculated by estimating the 
circuit activity, assuming average case input signals to the system [18, 19, 20]. 
Behavioral level optimization can reduce the power dissipation to a large extent because the 
operations haven't been assigned and area and hardware allocations have not been performed 
yet and the entire design space can be explored efficiently. It is typically targeted to optimize 
the hardware resources required and the average number of clock cycles per task required to 
perform a set of tasks. Logic synthesis tools are often the first tools in the design flow. Logic 
optimization is carried out by state assignment and combinational logic synthesis. In the next 
stage circuit level optimization such as transistor resizing and input signal reordering are 
performed. 
Significant savings in power consumption can be achieved by reducing the supply voltage 
and maintaining a good noise margin through reduced threshold voltages [13]. Reduced 
supply voltages increase the circuit delays. The channel length to width ratios of the devices 
in the circuit could be altered to adjust the delay. Another practice is to exploit parallelism 
and pipelining. This increases latency and adds circuit overhead. Yet, power savings by 
a factor of 10 have been shown to be obtainable [3, 9]. In memory circuits one effective way 
to reduce power is to lower supply voltage and increase the effective capacitance to maintain 
sufficient charge in the cell [10]. 
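
The interaction between supply scaling and parallelism can be sketched with the switching term of Equation (1). The function below scales capacitance, supply voltage and clock frequency relative to a reference design. The name `relative_dynamic_power` and the example ratios (two parallel units, half the clock rate, 60% of the supply) are assumptions made for illustration and are not results from the cited references.

```python
def relative_dynamic_power(cap_ratio, vdd_ratio, freq_ratio):
    """Switching power relative to a reference design, from P = C * Vdd^2 * f."""
    return cap_ratio * vdd_ratio ** 2 * freq_ratio


# Supply scaling alone: quadratic saving, but the circuit becomes slower.
print(relative_dynamic_power(cap_ratio=1.0, vdd_ratio=0.6, freq_ratio=1.0))

# Assumed parallel design: twice the capacitance, each unit clocked at half rate,
# which leaves room to run at the reduced supply with the same throughput.
print(relative_dynamic_power(cap_ratio=2.0, vdd_ratio=0.6, freq_ratio=0.5))
```

In this idealized case the doubled capacitance and the halved clock cancel, so the saving comes entirely from the quadratic dependence on supply voltage. Real designs also pay for routing and multiplexing overhead, which is not modeled here.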
The scheme introduced in this work incorporates the adiabatic concepts to reduce power 
dissipation in CMOS circuits along with internal virtual supply terminals fed by charge 
pumps that recycle the ground bound charge. 
4 BLOCK LEVEL DESCRIPTION 
The conceptual description of the proposed scheme is given in Figure 2. The main strategy is 
to collect the ground-bound charge from one module (a logic block) into a capacitor, which 
acts as a virtual ground. This charge is used to pump up the voltage of other capacitors 
(virtual Vdds) to a higher level using an adiabatic charge pump circuit. These virtual Vdds 
then serve as power supplies for other logic blocks. 











Figure 2. Conceptual description of the proposed architecture (labels: capacitor to capture the leakage charge, logic block, virtual Vdd) 
This scheme is most applicable in systems with considerable activity that contain a large 
number of transistors. Although not a necessary condition, it best suits pipelined logic blocks. 
The earlier pipeline stages can be driven by clean Vdd terminals (and virtual ground 
terminals). The charge collected and recycled from the earlier stages can drive the later 
pipeline stages through virtual Vdd nodes. This pipelining can be inserted explicitly to 
accommodate the proposed charge recycling technique. Note that the pipelining need not be 
of the synchronous variety in our scheme. In fact, the scheme adopted by us deploys 
self-timed (asynchronous) control for the charge recycling. The following discussion and the 
specific DSP filter implementations described later illustrate the concept further. 
Such systems are first divided into conceptual logic blocks with specific functionality. Each 
of these logic blocks is then associated with either a virtual ground node or a virtual supply 
node or both. Depending on the transition activity in the blocks, the block with large number 
of transistors and large ground-bound charge is chosen as a `source block'. The ground node 
of the `source block' is connected to a capacitor that collects all the leakage charge from that 
particular logic block and serves as "virtual ground". The voltage on the "virtual ground 
capacitor" is continuously monitored to ensure that it still serves as logic `0' (a 
predetermined voltage should not be exceeded) and does not hinder the performance of the 
circuit. Once the capacitor reaches the desirable voltage threshold, it is disconnected from the 
logic block. Any further ground-bound charge from the `source block' is either collected in 
another capacitor or dumped to ground. The charged capacitor is then connected to a charge 
pump circuit that amplifies the DC voltage to a higher level so that it can act as a `virtual 
supply'. 
Having generated a higher voltage on the output of the charge pump, a suitable sub-block in 
the system is chosen to act as the "target block". The supply for this "target block" is 
provided by the charge pump output. The choice of this "target block" is dependent on 
certain design issues. Firstly, the time separation between the operation of the "source block" 
and the "target block" should match or exceed the delay introduced by the charge pump 
circuit. In other words, the "target block" must operate a certain time 
after the "source block" so that the charge dumped by the "source block" can be used 
to generate the required supply voltage. This is also the main rationale behind the suitability 
of this scheme to systems that can be divided into sub-blocks that operate sequentially as 
indicated earlier. 
Secondly, the energy available at the output of the charge pump should exceed the energy 
required by the "target block" to undergo a certain number of transitions. This places an 
indirect constraint on the complexity, or in other words the `transition activity', of the "target 
block". Since each logic transition is associated with charging and discharging of certain load 
capacitance, for almost all simple systems, the energy required by the block can be computed 
considering an average input case. Based on the energy requirement, an appropriately loaded 
block is then chosen as "target block". 
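
The two selection criteria can be written as a first-order feasibility check. In the sketch below the energy on the charge pump output capacitor is taken as the stored energy CV²/2 and each transition of the "target block" is assumed to draw CV² from its supply. These modeling choices, the function names and the numbers are assumptions introduced for illustration; they are not the sizing procedure used in this work.

```python
def stored_energy(capacitance, voltage):
    """Energy held on a capacitor, C * V^2 / 2."""
    return 0.5 * capacitance * voltage ** 2


def target_block_is_feasible(source_to_target_delay, pump_delay,
                             c_out, v_out, c_switched, v_swing, transitions):
    """Check the timing and energy criteria for a candidate target block."""
    # Criterion 1: the target must run late enough for the pump to finish.
    timing_ok = source_to_target_delay >= pump_delay
    # Criterion 2: the pump output must hold enough energy for the transitions.
    energy_required = transitions * c_switched * v_swing ** 2
    energy_ok = stored_energy(c_out, v_out) >= energy_required
    return timing_ok and energy_ok


# Assumed example: 100 pF output capacitor at 1.5 V, 10 fF switched per transition.
print(target_block_is_feasible(5e-9, 3e-9, 100e-12, 1.5, 10e-15, 1.5, 1000))
print(target_block_is_feasible(5e-9, 6e-9, 100e-12, 1.5, 10e-15, 1.5, 1000))
print(target_block_is_feasible(5e-9, 3e-9, 100e-12, 1.5, 10e-15, 1.5, 10000))
```

The check ignores the droop of the virtual supply as charge is drawn from it, so it should be read as an upper bound on what a given output capacitor can drive.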
An alternate strategy to the above scheme is to first decide a "target block" and based on the 
transition activity and the power requirements of the block, a suitable sub-block in the system 
is chosen as a "source block". The charge pump can then be designed such that the delay and 
the power requirements of the blocks are satisfied. However, in big systems, it is usually 
desirable that the "source block" and the "target block" are chosen so that they are symmetric 
in terms of logic transitions. 
As the complexity of the system under consideration increases, more than one block can be 
chosen as "source block" and more than one block as "target block". The virtual ground 
consists of a sea of capacitors that are dynamically connected to various "source blocks" as 
they dump charge. This not only ensures the availability of a charged capacitor for use by the 
charge pump at any instant of time, but also increases the efficiency of the circuit by making 
sure that almost all the dumped charge is collected for recycling. The charge pump can then 
be continuously operated by time multiplexing the various charged capacitors. Further 
extension to the scheme involves implementing an array of charge pump circuits, so that 
almost all ground-bound charges in the system can be optimally utilized to source some sub-
block at any instant of time. In summary, the source charge can be either spatially distributed, 
or temporally distributed, or a combination of both. 
Note that this scheme also provides a limited degree of voltage scaling naturally. The logic 
blocks with a virtual Vdd or ground operate with a lower voltage swing. 
In the following chapter the functionality of the traditional charge pump is presented along 
with the associated design issues. In Chapter 5, details of the actual implementation of the 
above scheme and how the traditional charge pump can be modified for optimum 
performance are presented. 
5 CHARGE PUMP OPERATION 
Charge pumps are widely used DC-DC converter circuits that generate a voltage higher than 
the supply voltage from which they operate. Most charge pumps are based on the Dickson charge pump 
circuit [6, 14, 15]. The basic theory behind charge pump operation can be explained with the 
simplified model shown in Figure 3. 





Figure 3. Simplified model of a charge pump circuit 
The circuit mainly comprises diodes D1 and D2 and a capacitor Cp. The diodes act as 
self-timed switches and ensure uni-directional flow of charge. The lower plate of 
capacitor Cp is controlled by a clock signal of magnitude $V_{clk}$. The circuit operates by 
pumping charge along the diode chain as the capacitor Cp is charged and discharged during 
each clock cycle. The basic operation of this circuit occurs in two phases [7, 8]. In Phase 1, 
the lower plate of capacitor Cp is connected to ground (clock low), and the capacitor is 
charged through D1 as shown in Figure 3. Assuming sufficient charging time is provided, 
the capacitor is charged to the voltage $V_{in} - V_d$, where $V_d$ represents the drop across the diode. 
Figure 4. Current path during charge pump operation 
During Phase 2, the clock signal is raised high, with the voltage at node `b' rising from 0 to 
$V_{clk}$ and the voltage at node `a' rising from $V_{in} - V_d$ to $(V_{in} - V_d) + V_{clk}$. This turns OFF diode D1 and 
switches ON diode D2. Charge is pushed from node `a' to node `out' until the output node 
reaches a final voltage given by: 

$$V_{out} = V_{in} + (V_{clk} - V_d) - V_d \qquad (2)$$

Assuming the clock voltage is much higher than the diode drop, a boosted voltage is obtained 
at the output node that is higher than the voltage at the input node. 
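
The two-phase operation can be traced node by node with a few lines of Python. The sketch follows the ideal, unloaded case described above, with a constant diode drop. The function name `pump_node_voltages` and the example voltages are assumptions used only for illustration.

```python
def pump_node_voltages(v_in, v_clk, v_d):
    """Ideal no-load node voltages of the single-stage pump in both phases."""
    node_a_phase1 = v_in - v_d             # Cp charged through D1, clock low
    node_a_phase2 = node_a_phase1 + v_clk  # lower plate raised by the clock
    v_out = node_a_phase2 - v_d            # D2 conducts until the output settles
    return {"a_phase1": node_a_phase1, "a_phase2": node_a_phase2, "out": v_out}


# Assumed values: 1.0 V input, 2.5 V clock swing, 0.6 V diode drop.
voltages = pump_node_voltages(v_in=1.0, v_clk=2.5, v_d=0.6)
print(voltages)
```

With these assumed values the output settles at V_in + (V_clk - V_d) - V_d, that is 2.3 V, which is above the 1.0 V input as long as the clock swing exceeds two diode drops.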
More detailed analysis of the charge pump, including effects of external resistive load and 
stray capacitances, along with the necessary and sufficient condition for charge pump 
operation, is given in [7, 8]. In a practical implementation of the above circuit, the diodes are 
replaced by diode-connected NMOS transistors, where the diode drop $V_d$ is replaced by the MOS 
threshold voltage $V_{th}$. In cases where the drop across the MOSFETs is critical, with some 
slight additional modifications, the diode-connected NMOS transistors are replaced by 
bi-directional low drop switches that are implemented using transmission gates. 
5.1 Modification to Standard Charge Pump 
As explained in Chapter 4, the conceptual idea of the proposed architecture is to collect the 
dumped charge from a logic block into a capacitor and then use a charge pump circuit to 
boost the virtual supply voltage to a higher level so as to act as virtual supply for a "target 
block". Although the additional area overhead due to charge pump is not an important issue, 
the energy consumed by the charge pump itself, during the process of voltage boosting is 
critical. It needs to be ensured that the additional energy spent in the charge pump operation 
does not constitute a significant portion of the total energy saved by the scheme, thereby 
rendering the scheme inapplicable. Although the traditional charge pump well serves the 
purpose of boosting the voltage, it needs to be observed that considerable energy is 
consumed from the external clock sources. This is acceptable in cases where the main 
criterion is to generate high voltage and not optimize power. If we consider the operation of 
traditional charge pump in our application, with the "virtual ground capacitor" as the source 
of input energy, every time the charge is shared between the "virtual ground capacitor" and 
capacitor Cp of the charge pump, half of the energy is dissipated in the switches. In other 
words, if the charging of the capacitor Cp occurs abruptly, then the well known $\frac{1}{2}CV^2$ 
formula expresses the dissipated energy, where V is the voltage difference between the input 
capacitor and Cp. This loss is inevitable regardless of the network design parameters. 
However, if the charge transport is slowed down, energy dissipation can be reduced 
considerably. This is the well known "adiabatic charging principle". The principle states that 
the dissipation while charging a given node capacitance to a particular voltage, can to a first 
approximation be asymptotically reduced to zero, if the charging time tends to infinity. In 
other words, if the voltage difference between the two capacitors is reduced, the 
corresponding energy dissipation during charge distribution also reduces. This is because the 
dissipation is proportional to the square of the voltage difference between the capacitors 
during charge distribution. 
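
The quadratic dependence on the voltage difference follows directly from charge and energy bookkeeping. The sketch below connects two ideal capacitors through a switch and reports the energy that disappears from the capacitors. The function name `share_charge` and the capacitor values are assumptions for illustration.

```python
def share_charge(c_a, v_a, c_b, v_b):
    """Connect two capacitors; return (common_voltage, energy_dissipated)."""
    v_common = (c_a * v_a + c_b * v_b) / (c_a + c_b)   # charge conservation
    before = 0.5 * c_a * v_a ** 2 + 0.5 * c_b * v_b ** 2
    after = 0.5 * (c_a + c_b) * v_common ** 2
    return v_common, before - after                    # loss ends up in the switch


# Assumed 1 pF capacitors; the initial voltage difference is 1.0 V, then 0.5 V.
_, loss_full = share_charge(1e-12, 1.0, 1e-12, 0.0)
_, loss_half = share_charge(1e-12, 1.0, 1e-12, 0.5)
print(loss_full, loss_half, loss_full / loss_half)
```

Halving the initial voltage difference reduces the dissipated energy by a factor of four, independent of the switch resistance.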
With this idea in mind, it is better to charge capacitor Cp in incremental steps of $\Delta V$ rather 
than in one step [21]. This then requires the availability of multiple input capacitors to the 
charge pump, with input voltages $V_1, V_2, V_3, \ldots, V_N$, such that 

$$V_1 = \Delta V,\quad V_2 = 2\Delta V,\quad V_3 = 3\Delta V,\quad \ldots,\quad V_N = N\Delta V$$

where $\Delta V = V_{max}/N$. 
The energy dissipated during the charging process is then given by 

$$E_{Dissipated} = \frac{C_L V_{max}^2}{2N} \qquad (4)$$





It can be seen from Equation (4) that this energy decreases as N increases. However, as the number 
of stages is increased, not only does the complexity of the circuit increase, but issues like 
additional dissipation in the control circuitry required to monitor the input capacitor voltages, 
leakage effects from the multiple input capacitors, speed of operation, area overhead etc. also 
become critical. Hence a trade-off must be reached between energy dissipation and circuit 
complexity. In this work, the number of input stages to the charge pump was therefore 
restricted to three. The circuit diagram of the three-stage input model for a charge pump is 
shown in Figure 5. 











Figure 5. Three-stage input model for a charge pump 
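
The benefit of stepwise charging can be reproduced with a short calculation. The sketch below charges a capacitor through N equal voltage steps and sums the loss of each step. It assumes ideal voltage sources at each step level, whereas the proposed circuit uses charged capacitors whose voltage droops during sharing, so the numbers are only a best-case illustration. The function name `stepwise_charging_loss` and the values are assumptions.

```python
def stepwise_charging_loss(c_load, v_max, n_steps):
    """Total loss when c_load is charged to v_max in n equal steps from ideal sources."""
    delta_v = v_max / n_steps
    v_cap = 0.0
    loss = 0.0
    for k in range(1, n_steps + 1):
        v_source = k * delta_v
        # Each abrupt step across a difference dV dissipates C * dV^2 / 2.
        loss += 0.5 * c_load * (v_source - v_cap) ** 2
        v_cap = v_source
    return loss


c_load, v_max = 1e-12, 2.5   # assumed values
for n in (1, 3, 10):
    print(n, stepwise_charging_loss(c_load, v_max, n), c_load * v_max ** 2 / (2 * n))
```

The simulated loss matches $C_L V_{max}^2/(2N)$ for every N tried, and with three steps the loss is one third of the single-step value.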
5.2 Circuit operation 
In the scheme discussed here we adopt a three stage charge pump circuit as shown above. 
The inputs to the charge pump are three different virtual ground capacitors with the capacitor 
voltages ($V_{in1}$, $V_{in2}$, $V_{in3}$) differing by a constant pre-determined value. These voltage values 
on the input capacitors are achieved by monitoring the charging of the capacitors and 
disconnecting them from the "source block" once the required voltage is achieved. This task 
of correctly observing the voltage, breaking the connection between the "virtual ground 
capacitor" and the "source block", and then connecting the "virtual ground capacitor" to the 
appropriate input of the charge pump depending on the voltage level is controlled by a 
voltage comparator. The design of the voltage comparator implemented in this scheme is 
similar to an SRAM cell with one input fixed at a reference voltage level and the other input 
connected to the "virtual ground capacitor". Once the capacitor is charged to the 
predetermined voltage value ($V_{in1}$/$V_{in2}$/$V_{in3}$), the comparator output toggles and this signal 
is used to control a switch which either connects or disconnects the virtual ground from the 
charge pump circuit. 
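
The control role of the comparator can be summarized as a small behavioral model. The sketch below is not a description of the transistor-level comparator; it only mimics the decision it makes. The names `Connection`, `route_virtual_ground` and `collect_ground_charge`, the charge packet model and the threshold value are all hypothetical.

```python
from enum import Enum


class Connection(Enum):
    SOURCE_BLOCK = "source block"   # capacitor still collecting ground-bound charge
    CHARGE_PUMP = "charge pump"     # capacitor handed over to a pump input


def route_virtual_ground(v_cap, v_ref):
    """Behavioral comparator: hand the capacitor to the pump once v_cap reaches v_ref."""
    return Connection.CHARGE_PUMP if v_cap >= v_ref else Connection.SOURCE_BLOCK


def collect_ground_charge(c_virtual_ground, v_ref, charge_packets):
    """Accumulate charge packets until the comparator toggles.

    Returns (capacitor_voltage, packets_used).
    """
    v_cap = 0.0
    used = 0
    for charge in charge_packets:
        v_cap += charge / c_virtual_ground
        used += 1
        if route_virtual_ground(v_cap, v_ref) is Connection.CHARGE_PUMP:
            break
    return v_cap, used


# Assumed example: 1 pF virtual ground capacitor, 0.25 V reference,
# ten identical 0.1 pC packets dumped by the source block.
v_final, packets_used = collect_ground_charge(1e-12, 0.25, [1e-13] * 10)
print(v_final, packets_used)
```

In this assumed example the capacitor is disconnected after the third packet, and the remaining charge would have to go to another capacitor or to ground, as described in Chapter 4.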
The charge collected in the various input capacitors of the charge pump is then transferred in 
the following steps. 

Figure 6. Step 1 
Step 1: Initially the gate g1 is closed with g2, g3 and g4 open. The lower plate 
of the capacitor Cp is held at 0 V. This enables a portion of the charge to flow from the input 
capacitor C1 to the charge pump capacitor Cp. This raises the potential of the upper plate of the 
capacitor Cp. This step causes charge sharing between the two capacitors C1 and Cp. 
Figure 7. Step 2
Step 2: Now the gate g1 is opened and g2 is closed, connecting the input capacitor C2 to Cp. Charge is shared between these two capacitors by charge flow from C2, which is at a higher voltage level than Cp. At the end of this step a portion of the charge from input C2 has been transferred to Cp.









Figure 8. Step 3
Step 3: Gate g3 is closed after opening the gate g2. Since C3 has a higher voltage than Cp, charge flows from C3 to Cp. During steps 1, 2 and 3 the voltage on the lower plate of capacitor Cp is kept at 0 V to maintain the flow of current.





Figure 9. Step 4
Step 4: By this time most of the charge on the input capacitors C1, C2 and C3 has been transferred to the charge pump capacitor Cp. At this instant all the input capacitors are disconnected from Cp by opening the gates g1, g2 and g3. Now g4 is closed after raising the voltage of the lower plate of Cp from 0 V to Vp. Charge sharing between the charge pump capacitor Cp and the output capacitor Cout results in power dissipation, which varies as the square of the voltage difference between these two capacitors. To keep this energy loss as low as possible, the voltage difference is lowered. This is done by adiabatically charging the output capacitor Cout, raising the voltage at the lower plate of Cp in gradual steps. The number of stages or steps for adiabatic operation of the charge pump has been fixed to three for convenience and symmetry with the input end. The current flows from Cp to Cout as shown in Figure 9.
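The benefit of raising the lower plate of Cp in gradual steps can be illustrated with a small numerical sketch. It assumes ideal capacitors and a switch whose resistance only sets the settling time, so that each charge-sharing event dissipates half the series capacitance times the square of the voltage difference. The function name, the capacitor values and the voltages are hypothetical and are not taken from the simulated circuit.

```python
def transfer_loss(c_p, c_out, v_cp, v_out, v_pump, steps):
    """Energy lost while dumping charge from Cp into Cout (illustrative model).

    The lower plate of Cp is raised from 0 to v_pump in `steps` equal
    increments with g4 closed; after each increment the two capacitors
    are allowed to equalize. Returns (dissipated energy, final v_out).
    """
    c_series = c_p * c_out / (c_p + c_out)  # capacitance seen across the switch
    v_bottom = 0.0
    loss = 0.0
    for _ in range(steps):
        v_bottom += v_pump / steps
        dv = (v_bottom + v_cp) - v_out       # top plate of Cp minus Cout voltage
        q = c_series * dv                    # charge moved until both sides are equal
        v_cp -= q / c_p                      # voltage across Cp falls
        v_out += q / c_out                   # voltage on Cout rises
        loss += 0.5 * c_series * dv ** 2     # energy dissipated in the switch
    return loss, v_out


# Hypothetical values: 1 pF capacitors, both sides at 0.6 V, 1.8 V pump swing.
loss_1, v_1 = transfer_loss(1e-12, 1e-12, 0.6, 0.6, 1.8, steps=1)
loss_3, v_3 = transfer_loss(1e-12, 1e-12, 0.6, 0.6, 1.8, steps=3)
print(loss_1, loss_3, v_1, v_3)
```

Under these assumptions the final output voltage is the same in both cases, while the three-step transfer dissipates one third of the energy of the single-step transfer.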
A snapshot of the implementation of this circuit in Cadence's schematic editor is shown in Figure 10.

Figure 10. Snapshot of charge pump circuit schematic
Now the output capacitor has accumulated the charge from the three input capacitors. This process is repeated until the energy on the capacitor Cout reaches a value that is sufficient to serve as a virtual Vdd for a target logic block supporting a certain number of transitions.
The different capacitors are connected to the intermediate capacitor through transmission gates controlled by special step-wave signals. The three-stage charge pump operation explained in these four steps can be summarized as follows. The capacitor with the least voltage (say C1) is first connected to the intermediate capacitor by closing the first gate (`gate1'). The charge on C1 is now shared between the two capacitors C1 and Cp. If C1 and Cp are designed to be of equal value, the voltage on both capacitors settles to Vin1/2 (assuming Cp is initially discharged and the drop across the transmission gate is negligible). After this voltage settles to a steady value, `gate1' is opened and the second gate is closed. This disconnects C1 and connects C2 to Cp. Since the charge pump is designed such that the voltage on C2, denoted Vin2, is greater than Vin1 and hence greater than the steady-state voltage from step 1 (i.e. Vin1/2), it is essential to disconnect C1 before connecting C2 to ensure that no reverse charge flows from C2 into C1. Also, since Vin2 > Vin1, charge flows from capacitor C2 to Cp when `gate2' is closed. Once maximum charge transfer has occurred between these two capacitors, `gate2' is opened and `gate3' is closed to connect the last capacitor C3 to Cp for charge sharing. Once the capacitor Cp has accumulated all the charge, it is disconnected from the inputs and connected to the output capacitor through `gate4'. During the entire duration of the first three steps (when the capacitor Cp is connected to the various inputs), the lower plate of Cp is at ground potential, thereby causing charge to flow from the inputs into the charge pump. During Step 4, the clock signal, and thereby the lower plate of Cp, is raised to a voltage Vp, forcing the voltage at the top plate of Cp to increase by Vp. When Cp is connected to the output capacitor Cout, charge sharing takes place between the two capacitors and energy is dumped from the charge pump into the output capacitor. This process is repeated cyclically until the energy on the output capacitor equals the energy required by the "target logic block" to which this capacitor is to be connected as virtual supply.
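The input phase summarized above (steps 1 to 3) can be sketched as repeated ideal charge sharing. The sketch assumes lossless charge conservation, an initially discharged Cp, equal input capacitors and no drop across the transmission gates; the function names and the input voltages are hypothetical.

```python
def share(c_a, v_a, c_b, v_b):
    """Common voltage after two capacitors are connected (charge is conserved)."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


def pump_input_phase(v_inputs, c_in, c_p):
    """Voltage on Cp after each input capacitor is connected in turn.

    v_inputs must be in ascending order (Vin1 < Vin2 < Vin3) so that charge
    always flows from the input capacitor into Cp.
    """
    v_p = 0.0  # Cp assumed initially discharged, lower plate held at 0 V
    trace = []
    for v_in in v_inputs:
        v_p = share(c_in, v_in, c_p, v_p)
        trace.append(v_p)
    return trace


# Hypothetical input voltages differing by a constant 0.2 V, with C1 = C2 = C3 = Cp.
print(pump_input_phase([0.2, 0.4, 0.6], c_in=1e-12, c_p=1e-12))
```

With equal capacitors the first entry is Vin1/2, as stated in the text, and each later entry is the average of the next input voltage and the voltage already on Cp.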
Similar to the design of the input stages, the charging of the output capacitor is also performed adiabatically to reduce the energy dissipation while dumping the charge from capacitor Cp to Cout. This is done by increasing the voltage at the lower plate of capacitor Cp (i.e. Vp) in small incremental steps to allow adiabatic charge transfer. This reduces the stepwise voltage difference between Cp and Cout. Hence the energy dissipation, which varies as the square of this voltage difference, is reduced.





Figure 11. Voltage comparator
The voltages on the capacitors C1, C2 and C3 as well as Cout are monitored constantly using a voltage comparator circuit so that they do not exceed their threshold values. The design of the voltage comparator (Figure 11) implemented in this scheme is similar to an SRAM cell, with one input fixed at a reference voltage level (Vn) and the other input (Vp) connected to the "virtual ground capacitor". The voltage comparator behaves like a differential amplifier. Its output settles to `0' or `1' based on whether Vp < Vn or Vp > Vn. This output is used to control the gates that connect the ground capacitor to the charge pump.
5.3 Determining Output Capacitor Value — A Design Issue 
The choice of the output load capacitor of the charge pump, which in turn determines the size of the input and stage capacitors of the charge pump, is based on the energy consumption of the block that the capacitor needs to drive. To determine the value of this capacitor, a "target logic block" is considered. Consider a case where the logic block has `2n' transistors, comprising `n' NMOS and `n' PMOS transistors in a complementary CMOS logic family. The worst-case energy is drawn when all the n transistor pairs undergo switching transitions. This causes charging or discharging of the parasitic capacitances at the various transistor nodes. Assuming each parasitic capacitance to be of value Cx, each CMOS pair takes 0.5·Cx·Vdd² energy for such a transition. Thus for a block with 2n transistors, the virtual Vdd should be able to provide sufficient energy to charge or discharge these n capacitances.
In this work, we consider an average-case input vector to determine an energy figure. In most cases the average input case is far from the worst-case input, and considering a worst-case input may simply result in an over-design of the system. With this scheme (mostly targeting applications like DSP), where the input usually follows a standard pattern, the assumption of an average-case input is justified. Thus, based on the average input case, the output capacitor value is determined such that it can deliver this energy to the logic block without its logic level dropping from Vdd to a voltage below the acceptable "logic high".
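One way to turn this sizing rule into a number is a first-order energy balance: the energy released by Cout while its voltage droops from Vdd to the minimum acceptable "logic high" must cover the energy drawn by the switching nodes. The sketch below assumes exactly that balance and ignores the dependence of the per-transition energy on the drooping supply. The function name, the number of switching pairs and all component values are hypothetical.

```python
def min_output_capacitance(n_switching, c_x, v_dd, v_high_min):
    """Smallest Cout (farads) satisfying the first-order energy balance.

    Energy needed by the block : n_switching * 0.5 * c_x * v_dd**2
    Energy released by Cout    : 0.5 * Cout * (v_dd**2 - v_high_min**2)
    """
    if not 0.0 <= v_high_min < v_dd:
        raise ValueError("v_high_min must lie between 0 and v_dd")
    energy_needed = n_switching * 0.5 * c_x * v_dd ** 2
    return 2.0 * energy_needed / (v_dd ** 2 - v_high_min ** 2)


# Hypothetical average case: 100 switching pairs, 2 fF per node,
# 1.8 V supply, 1.5 V taken as the lowest acceptable logic high.
c_out = min_output_capacitance(100, 2e-15, 1.8, 1.5)
print(c_out)  # capacitance in farads
```

A tighter bound on the acceptable "logic high" shrinks the usable voltage window and therefore increases the required capacitance.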
5.4 Energy analysis of Charge Pump Circuit 
The modified charge pump circuit was designed in order to utilize the wasted energy and reduce the overall energy consumption of the system. The charge pump and its control circuitry draw a certain amount of energy to operate. The input energy to the charge pump is provided from dumped charge which would have leaked to ground in the absence of this scheme. This energy is therefore obtained for free and does not play any role in the cost of virtual supply generation. The energy spent by the charge pump and its control circuit, however, must be taken into account when computing the total energy advantage of the system.
Consider a logic block which needs energy Eo for a certain number of transitions. Without 
the scheme discussed here, this entire energy Eo would be drawn from a clean Vdd source. In 
this case the energy cost would thus be Eo . 
Consider the case in which we have the provision to generate virtual Vdd. The energy Ein 
obtained from virtual ground terminals is fed as input to a charge pump. This Ein is a result 
of collection of ground-bound charge after certain switching activity in a "source block". 
As discussed, this Ein is now free energy. The energy drawn from the inputs to the charge pump is given by
$\Delta E_{in} = E_{in}(0) - E_{in}(T)$
where Ein(0) is the energy on the inputs to the charge pump before the charge pump begins to operate and Ein(T) is the energy remaining on the inputs to the charge pump after the charge pump operation is complete (one iteration).
Similarly, the energy delivered to the output of the charge pump is
$\Delta E_{out} = E_{out}(0) - E_{out}(T)$
Besides the virtual ground inputs, the other source of energy to the charge pump is the clocks which control the charge pump operation. This energy is denoted by Eclk. The charge pump circuit includes some transmission gates, which also dissipate a certain amount of energy. The total dissipation in the charge pump circuit is denoted by Ediss.
The charge pump circuit therefore receives energy from the inputs and the clocks. A portion of this energy is dissipated in the circuit elements and the rest settles in the capacitors of the circuit at the end of time T.
Based on the law of conservation of energy, the energy balance of the circuit is given as
$\Delta E_{in} + \Delta E_{clk} = \Delta E_{out} + \Delta E_{p} + \Delta E_{diss}$
where $\Delta E_{p} = E_{p}(0) - E_{p}(T)$ is the resultant energy that is stored in the capacitor Cp.
All the E values included in this equation vary depending on the input and output logic blocks and the activity that they undergo.
From this equation we see that to generate a ΔEout equal to Eo with this scheme, the new cost is Eclk, which is the only other source of energy. The benefit, or savings in energy, obtained by using the charge pump scheme is the difference in cost between the two schemes, i.e. Eo - Eclk.
The percentage improvement in energy is given by
$\frac{E_o - E_{clk}}{E_o} \times 100$
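The bookkeeping of this section can be written out as two small helper functions. They only restate the energy balance and the percentage improvement given above; the function names and the numerical values are hypothetical and do not correspond to any measured circuit.

```python
def dissipated_energy(d_e_in, d_e_clk, d_e_out, d_e_p):
    """Solve the energy balance dEin + dEclk = dEout + dEp + dEdiss for dEdiss."""
    return d_e_in + d_e_clk - d_e_out - d_e_p


def percent_improvement(e_o, e_clk):
    """Percentage saving when an energy e_o is generated at a cost of only e_clk."""
    return (e_o - e_clk) / e_o * 100.0


# Hypothetical energies in pJ.
print(dissipated_energy(d_e_in=3.0, d_e_clk=4.0, d_e_out=5.0, d_e_p=0.5))
print(percent_improvement(e_o=5.0, e_clk=4.0))
```

In this made-up example the clocks supply 4 pJ to produce 5 pJ at the output, so the saving relative to drawing the full 5 pJ from the clean supply is about 20%.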
6 BLOCK LEVEL IMPLEMENTATION 
A typical block level implementation of the proposed scheme is shown in Figure 12. The different source blocks are selected based on criteria explained in the previous chapters and are clustered to provide virtual ground inputs to the charge pump. The system is also designed such that the charge pump's computational delay (Tcp) overlaps with the "source cluster to target block" delay (Tdelay). This serves the dual purpose of avoiding any further delay degradation in the circuit due to the adiabatic charge pump circuit, and of ensuring that the output of the charge pump circuit is boosted to the proper supply voltage (Vdd) to support the desired number of transitions of the target block.
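The overlap requirement can be sketched as a simple selection rule: among the blocks that execute after the source block, pick the earliest one whose start is separated from the end of the source block by more than the charge pump delay. The sketch assumes strictly sequential execution and takes Tdelay as the sum of the delays of the intervening blocks; the function name and the delay values are hypothetical.

```python
def earliest_target(block_delays_ns, source_idx, t_cp_ns):
    """Index of the first block that starts more than t_cp_ns after the source
    block finishes, or None if no later block satisfies Tdelay > Tcp."""
    elapsed = 0.0  # time since the source block finished
    for idx in range(source_idx + 1, len(block_delays_ns)):
        if elapsed > t_cp_ns:
            return idx
        elapsed += block_delays_ns[idx]  # this block runs before the next one starts
    return None


# Hypothetical pipeline of six blocks; block 2 is the source, Tcp = 12 ns.
delays = [5.0, 5.0, 4.0, 6.0, 7.0, 5.0]
print(earliest_target(delays, source_idx=2, t_cp_ns=12.0))
```

Here blocks 3 and 4 (6 ns and 7 ns) must finish before the accumulated delay exceeds 12 ns, so block 5 is the earliest valid target.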
Figure 12. Block level connections for charge pumps
An example of a system that can derive the best results from the proposed scheme is one in which the components work in a pipelined fashion. One such system is an `n' tap FIR filter as shown in Figure 13 [22, 23]. The different blocks shown in the system are as follows.
D: Delay Element 
M: Multiplier Block 
FA: Full Adder Block 
W: Weight by which the corresponding input is multiplied. 
The proposed charge pump circuit fits conveniently into a structure like this, since for any `r-th' block to execute, all the previous blocks numbered "1" to "r-1" must have executed. This introduces an intrinsic delay within the logic computation. A charge pump can be inserted in parallel with this path so that the computation delays are overlapped with the charge pump delay. The logic blocks M1, M2 and FA1 of Figure 13 can be mapped onto SB1, SB2 and SB3 of the Source Block cluster in Figure 12.
The ground terminals of these source blocks are connected to capacitors which form virtual grounds at these nodes. These capacitors serve as inputs to the charge pump. Let us denote the time it takes the charge pump to provide a boosted virtual voltage as Tcp. The target block, TB, is chosen such that the delay Tdelay from the source block to TB satisfies Tdelay > Tcp. This ensures that the charge pump has sufficient energy at its output to supply virtual Vdd to the target block by the time it gets activated. When the output of the charge pump is drained by the target block, the target block is disconnected from this virtual Vdd (charge pump output).


























Figure 13. FIR filter structure 
6.1 Multiple Charge Pumps based Implementations 
In a complex system comprising several sub-blocks that are scattered both in terms of spatial location on the chip and in terms of their temporal schedule, it may be more suitable to use multiple charge pump circuits rather than one complex charge recycling mechanism. Multiple charge pumps can also be used to increase the utility and applicability of this scheme to achieve improved energy savings. This can be implemented in two ways.
(1) Spatially implementing multiple charge pumps, interlaced among different source and 
target blocks. 
(2) Time multiplexing multiple charge pumps to provide continuous virtual supply to certain 
target blocks. 
Figure 14. Spatial implementation of multiple charge pumps
Figure 14 shows the implementation involving multiple charge pumps to reduce energy consumption further. In this method, various source block clusters provide virtual grounds as inputs to different charge pumps, which in turn provide virtual supplies for various target blocks. Here each charge pump is associated with a single target block. When the output of the charge pump is drained of its virtual Vdd due to activity in the target block, the target block is disconnected from this charge pump and connected to clean Vdd. It is re-connected to the same charge pump once the charge pump has regained sufficient energy at its output to sustain the transitions in the target block.
Figure 15. Time multiplexing multiple charge pumps
Figure 15 shows an alternate way of connecting multiple charge pumps in the circuit. Here some target blocks are constantly fed from virtual Vdd throughout the operation of the circuit. The charge pump outputs are multiplexed in the time domain and connected to and disconnected from the target block accordingly, maintaining a continuous virtual supply. For example, consider a system of two charge pumps that are multiplexed in time to feed the target block as shown in Figure 15. At first the output of CP1 is connected as virtual Vdd to the target block. As the target block draws energy from CP1, charge pump CP2 is in the charging phase and generates sufficient boosted voltage at its output by tapping the ground-bound charges from its source blocks. Once the output of CP1 falls below a threshold value, it is disconnected from the target block and CP2 is connected to the target block. The source and target blocks are chosen such that CP1 and CP2 (and so forth) keep alternating to supply continuous virtual Vdd to the target block.
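The alternation between CP1 and CP2 can be sketched as a small state machine. The sketch works in integer millivolts, lets the active pump droop by a fixed amount per cycle while the idle pump recharges by a fixed amount, and swaps the pumps once the active output falls below the threshold. It assumes the idle pump always recharges fast enough to be ready at the swap. The function name and all voltage figures are hypothetical.

```python
def time_multiplex(n_cycles, v_full=1800, v_min=1500, droop=100, recharge=150):
    """Return which pump feeds the target block in each cycle (values in mV)."""
    v = [v_full, v_full]   # output voltage of CP1 and CP2
    active = 0             # CP1 supplies the target block first
    schedule = []
    for _ in range(n_cycles):
        schedule.append("CP1" if active == 0 else "CP2")
        idle = 1 - active
        v[active] -= droop                         # target block drains the active pump
        v[idle] = min(v_full, v[idle] + recharge)  # idle pump recharges from its sources
        if v[active] < v_min:                      # output fell below the threshold
            active = idle                          # swap: the idle pump takes over
    return schedule


print(time_multiplex(10))
```

With these made-up numbers each pump supplies the target block for four cycles before handing over, and the idle pump is back at full voltage by the time it is needed again.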
7 SIMULATION RESULTS AND PERFORMANCE ANALYSIS 
The proposed scheme is most suitable for large systems which can be divided into many sub-blocks that function sequentially. This scheme can be implemented even more efficiently if the application can tolerate some inaccuracy. This is because the voltage levels of the virtual ground terminals are based on the average-case charge flow through the source logic blocks. If the actual ground-bound charge flow is significantly lower than the average, the subsequent virtual Vdd terminals may not reach the voltage levels designed to provide reasonable noise margins. The noise margins therefore drop, potentially resulting in more inaccurate results. In these cases, no backup and voltage monitors need be designed. DSP applications, where the least significant bits do not carry much information, can tolerate these inaccuracies in the least significant bits. They are also easily pipelined. Three DSP applications, namely FIR, FFT and DCT, were chosen because they have architectures which satisfy both conditions (sequential computation blocks and inaccuracy tolerance). These systems are also mostly found in areas where power reduction is a major issue, for example in cellular phones. All three systems were implemented in SPICE (in the Cadence design setup) in TSMC 0.18 µm technology (with a supply voltage of 1.8 V) and tested with the adiabatic charge pump circuits.
To estimate the energy savings of the proposed scheme, various small and big circuits were considered and implemented in Cadence using BSIM3 (Level 49) HSPICE models. All three architectures (FFT, FIR, DCT) were also implemented with and without the charge recovery scheme discussed in this paper, and the resulting total energy savings were computed. The circuits were first divided into smaller blocks like multipliers, adders, or combination stages of both. Certain blocks were chosen for virtual ground (source blocks) and certain blocks for virtual Vdd (target blocks). The source and target blocks were chosen such that they were separated by a significant computational delay. The time separation between these two blocks ensures that the charge pump circuit has enough time to generate the virtual supply for the target block.
Table 1(a). Energy costs with and without the charge pump (CP) circuit in various circuit blocks with two random sample inputs (set1 and set2).
Circuit under      Energy cost    External energy cost   Total energy savings (%)
consideration      w/o CP (pJ)    with CP (pJ)           (incl. control circuit dissipation)
FIR set1
set2
2.656 2.06 11.12 






3.77 3.033 10.88 
FFT set1
set2
2.239 1.65 11.67 
6.125 4.975 13.44 
64 bit ALU set1
set2
0.962 0.62 1.56 
1.438 1.009 7.1 
An adiabatically controlled charge pump was included in each of these systems. During the 
period of computation between the source and target blocks, the adiabatic charge pump 
recycles the charge and generates sufficient virtual Vdd for the target block. 
Table 1(a) shows a set of energy cost values, i.e., the energy drawn from the power supply 
pin, for different systems with and without the CPs (charge pumps). Two random sets of 
inputs (sets 1 and 2) were given to these systems and their energy consumption figures are 
given in Table 1(a). 
Figure 16 depicts the percentage savings achieved by using this scheme in different circuits. For most of the systems about 15% energy savings are observed. This figure can be further improved by refining the charge pump circuits to draw as little energy as possible. In Figure 17 we notice that this scheme is most beneficial, i.e. has higher savings, when used to generate supply terminals for large logic blocks with high activity. This is because in small logic blocks with less signal activity the energy required is low, so the energy dissipation in the control circuitry, which is a constant quantity, dominates this energy and reduces the efficiency of the scheme. On the other hand, in large circuits with high signal activity, the energy spent in the control circuit forms a negligible fraction of the saved energy.
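The argument about the constant control overhead can be made concrete with a toy model. It assumes that a fixed fraction of the block energy is recycled and that the control circuitry costs a fixed energy regardless of block size. Both the model and the numbers are hypothetical and are not fitted to the simulation results.

```python
def net_savings_percent(e_block, recycled_fraction, e_control):
    """Net saving (%) when a fixed control energy is charged against the
    energy recycled from a block that consumes e_block."""
    return (recycled_fraction * e_block - e_control) / e_block * 100.0


# Hypothetical: 20% of the block energy is recycled, control costs 0.15 pJ.
for e_block in (1.0, 2.0, 4.0, 6.0):
    print(e_block, net_savings_percent(e_block, 0.2, 0.15))
```

In this model the net saving rises with block energy and approaches the recycled fraction, which is the trend described for large, high-activity blocks.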
Figure 16. Energy cost without and with charge pumps (FIR, DCT, FFT and ALU, two input sets each)









Figure 17. Energy savings (percentage savings vs. original energy cost)
Another variation of the scheme involving various multiplexed charge pumps yields higher 
energy savings for large circuits. This is illustrated in Table 1(b) which shows energy cost 
values for selected circuits with and without the charge pumps. 
Table 1(b). Energy costs with and without time multiplexed charge pumps (CP) in various circuit blocks with two random sample inputs (set1 and set2).
Circuit under      Energy cost    External energy cost   Total energy savings (%)
consideration      w/o CP (pJ)    with CP (pJ)           (incl. control circuit dissipation)
FIR set1           5.237          3.901                  13.63
FIR set2           6.492          4.763                  17.40
DCT set1           2.230          1.160                  9.98
DCT set2           4.860          3.440                  12.36
FFT set1           2.501          1.566                  10.97
FFT set2           7.117          5.528                  13.61
64 bit ALU set1    1.581          1.049                  0.042
64 bit ALU set2    1.939          1.400                  1.530
Figure 18 shows a graph of percentage energy savings with respect to the energy cost of the system. It can be observed that this scheme is most beneficial, i.e. has higher savings, when used to generate supply for large logic blocks with high activity. Note that time multiplexing of multiple charge pumps to provide extended virtual supply to target blocks can improve the overall energy savings in large circuits.
Figure 18. Energy costs - Time multiplexed implementation
To get an estimate of the area overhead due to the proposed scheme, the number of transistors in the system with and without the charge pump were compared. To account for the area occupied by the various input capacitors and the intermediate capacitors, an equivalent representative number of transistors was computed based on the capacitor density in the process and the minimum transistor size. Depending on the size of the capacitor, a fixed transistor count was then added to the overall system while computing the area overhead. The additional circuitry, including the charge pump and control circuitry, has fewer than 100 equivalent transistors, which form less than 2% of the total transistor count of the circuits considered, as shown in Table 1(c). The energy savings can be increased further by spatial implementation of multiple charge pumps. Though this would result in an increase in transistor count, the additional area overhead would still be a small fraction of the entire device area. This measure is represented graphically in Figure 19, from which it can be inferred that the area increase is small enough not to constitute a significant overhead.




Circuit    # of transistors w/o CP    # of transistors with CP    % increase in area
FIR        24224                      24302                       0.32
DCT        20032                      20110                       0.38
FFT        11552                      11630                       0.67
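The area figures follow directly from the transistor counts. The short check below recomputes the percentage increase from the counts listed above; the variable and function names are illustrative, and truncation to two decimal places is an assumption made here because it reproduces the tabulated values.

```python
import math

# Transistor counts without and with the charge pump, as listed in the table.
TRANSISTOR_COUNTS = {
    "FIR": (24224, 24302),
    "DCT": (20032, 20110),
    "FFT": (11552, 11630),
}


def percent_area_increase(without_cp, with_cp):
    """Relative increase in equivalent transistor count, in percent."""
    return (with_cp - without_cp) / without_cp * 100.0


def truncate_2dp(value):
    """Truncate (not round) to two decimal places."""
    return math.floor(value * 100.0) / 100.0


for name, (without_cp, with_cp) in TRANSISTOR_COUNTS.items():
    extra = with_cp - without_cp
    print(name, extra, truncate_2dp(percent_area_increase(without_cp, with_cp)))
```

Each circuit gains the same 78 equivalent transistors, consistent with the statement that the additional circuitry has fewer than 100 equivalent transistors.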







Figure 19. Area figures with and without charge pumps
Another figure-of-merit that needs to be computed is the speed of operation of the target block, or equivalently the circuit delay. Since the virtual Vdd generated by the charge pump is not exactly equal to the clean Vdd and is also not an ideal voltage source with an infinite supply of charge, its voltage drops gradually with time. The logic block, when powered with virtual Vdd, will therefore be slightly slower than when fed with a clean Vdd due to the reduced voltage swing. The delay values due to the lowered voltage swing in a few sample circuits are shown in Table 2. It can be seen that the percentage increase in delay is small (below 2.5% in all cases) and does not significantly degrade the circuit performance.









Circuit       Delay with clean Vdd    Delay with virtual Vdd    % increase in delay
FIR           26.73                   26.94                     0.78
DCT           41.35                   41.82                     1.13
FFT           18.21                   18.35                     0.76
64 bit ALU    10.49                   10.74                     2.38
8 CONCLUSIONS 
Typical CMOS logic is designed to draw energy from the power supply at Vdd and to dump it into the ground terminal. The adiabatic CMOS design style reduces the energy converted into heat in the transistors of the logic by keeping the drain-to-source voltages as low as possible. The traditional adiabatic recovery phase recovers the energy from the logic into an oscillating power supply. The main shortcoming of adiabatic CMOS is the need to slow down the logic to accommodate the adiabatic charge transfer. Additionally, a way of integrating high-frequency RLC power supplies into a reasonably sized silicon area has not been found. The proposed charge recovery mechanism cuts the Vdd to ground path by introducing virtual ground nodes to collect the dumped charge, which is then recycled into virtual Vdd nodes to supply other logic blocks (that are activated later in time, and hence can tolerate the intervening charge pump latency). The charge pumps incorporate adiabatic charge transfer in order to save energy (a charge pump operating normally would result in a net energy loss). The proposed charge recycling places adiabatic blocks in the paths that can tolerate their latency, hence this scheme does not result in any performance loss. Pipelined datapath units and algorithms are ideal candidates for the proposed scheme. This scheme also incorporates a limited degree of voltage scaling naturally, since all the virtual ground and Vdd logic blocks operate with a reduced voltage swing. We incorporated this design methodology in several DSP filters such as FIR and DCT/IDCT. The resulting energy savings are of the order of 18%. Future work includes other spatially and temporally multiplexed charge pumps to increase the deployability of the scheme. We are also considering CAD algorithms to identify the logic blocks that can benefit from the proposed scheme, both at the logic synthesis and at the netlist levels.
BIBLIOGRAPHY
[1] Arslan, T., Horrocks, D.H. and Erdogan, A.T., "Overview and Design Directions for Low-Power Circuits and Architectures for Digital Signal Processing", IEE Colloquium on Low Power Analogue and Digital VLSI: ASICs, June 1995. pp. 6/1-6/5.
[2] Garcia, A., Burleson, W. and Danger, J.L., "Low Power Digital Design in FPGAs: A study of Pipeline Architectures implemented in a FPGA using a low supply voltage to reduce power consumption", IEEE International Symposium on Circuits and Systems, Vol. 5, 2000, Geneva. pp. 561-564.
[3] Chandrakasan, A.P., Sheng, S. and Brodersen, R.W., "Low-Power CMOS Digital Design", IEEE Journal of Solid-State Circuits, Vol. 27, No. 4, April 1992. pp. 473-484.
[4] Uming Ko, Balsara, T. and Wai Lee, "Low-Power Design Techniques for High-Performance CMOS Adders", IEEE Trans. on VLSI Systems, Vol. 3, No. 2, June 1995. pp. 327-333.
[5] Guyot, A. and Abou-Samra, S., "Low Power CMOS Digital Design", Proceedings of the Tenth International Conference on Microelectronics (ICM), 1998. pp. I.P.6-I.P.13.
[6] Dickson, J., "On-chip High-Voltage Generation in NMOS Integrated Circuits Using an Improved Voltage Multiplier Technique", IEEE Journal of Solid-State Circuits, Vol. 11, No. 6, June 1976. pp. 374-378.
[7] Pylarinos, L., "Charge Pumps: An Overview", Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto.
[8] San, H. et al., "Highly-Efficient Low-Voltage-Operation Charge Pump Circuits Using Bootstrapped Gate Transfer Switches", T. IEE Japan, Vol. 120-C, No. 10, 2000. pp. 1339-1345.
[9] A. P. Chandrakasan et al., "A Low Power Chipset for Portable Multimedia Applications," IEEE ISSCC, pp. 82-83, 1994.
[10] M. Takada, "Low Power Memory Design," IEDM Short Course Program, 1993.
[11] H. Veendrick, "Short Circuit Dissipation of Static CMOS Circuitry and its Impact on the Design of Buffer Circuits," IEEE Journal for Solid State Circuits, vol. 19, no. 4, Aug. 1984.
[12] L. Bisdounis, O. Koufopavlou, and S. Nikolaidas, "Accurate Evaluation of CMOS Short Circuit Power Dissipation for Short Channel Devices," International Symposium on Low Power Electronics and Design, Monterey, CA, Aug. 1996, pp. 181-192.
[13] Y. Taur et al., "High Performance 0.1 µm CMOS Devices with 1.5V Power Supply," IEDM Tech. Dig., vol. 38, pp. 127-130, 1993.
[14] T. Tanzawa and T. Tanaka, "A Dynamic Analysis of the Dickson Charge Pump," IEEE Journal for Solid State Circuits, vol. 32, pp. 1231-1240, Aug. 1997.
[15] J. S. Witters, G. Groesenken, and H. E. Maes, "Analysis and Modeling of on-chip high voltage generator circuits for use in EEPROM circuits," IEEE Journal for Solid State Circuits, vol. 24, pp. 1372-1380, Oct. 1989.
[16] J. D. Meindl, "Low Power Microelectronics: Retrospect and Prospect," Proceedings of the IEEE, vol. 83, pp. 619-635, 1995.
[17] J. D. Meindl, "Theoretical, Practical, and Analogical Limits in ULSI," IEDM Tech. Dig., vol. 28, pp. 127-130, 1993.
[18] J. Monteiro and S. Devadas, "A Methodology for Efficient Estimation of Switching Activity in Sequential Circuits," ACM/IEEE Design Automation Conference, pp. 12-17, 1994.
[19] F. Najm, "A Survey of Power Estimation Techniques in VLSI Circuits," IEEE Trans. VLSI Systems, pp. 446-455, Dec. 1994.
[20] T. L. Chou and K. Roy, "Accurate Estimation of Power Dissipation in CMOS Sequential Circuits", IEEE Trans. VLSI Systems, pp. 369-380, Sept. 1996.
[21] L. J. Svensson and J. G. Koller, "Driving a Capacitive Load without Dissipating fCV²," IEEE Symposium on Low Power Electronics, pp. 100-101, 1994.
[22] D. E. Borth, I. A. Gerson, J. R. Haug, and C. D. Thompson, "A flexible adaptive FIR filter VLSI IC," IEEE Journal on Select. Areas Commun., SAC, pp. 494-503, Apr. 1988.
[23] T. Yoshino, R. Jain et al., "A 100-MHz 64-tap FIR digital filter in 0.8 µm BiCMOS gate array," IEEE Journal on Solid State Circuits, vol. 25, pp. 1494-1501, Dec. 1990.
