Citation: Nicolas Butzen, Michiel Steyaert (2016), "A 94.6%-Efficiency Fully
Integrated Switched-Capacitor DC-DC Converter in Baseline 40nm CMOS Using
Scalable Parasitic Charge Redistribution," Digest of Technical Papers - IEEE
International Solid-State Circuits Conference, 59, pp. 220-221.
Archived version: Author manuscript. The content is identical to the content of
the published paper, but without the final typesetting by the publisher.
Published version: http://dx.doi.org/10.1109/ISSCC.2016.7417986
Journal homepage: http://www.isscc.org
Author contact: nicolas.butzen@esat.kuleuven.be, +32 (0)16 325534
12.2 A 94.6%-Efficiency Fully Integrated 
Switched-Capacitor DC-DC Converter in Baseline 
40nm CMOS Using Scalable Parasitic Charge 
Redistribution 
 
Nicolas Butzen, Michiel Steyaert 
 
KU Leuven, Leuven, Belgium 
 
In recent years, there has been an ever-increasing interest in monolithic power
supplies. Integrating the power supply with the application has many direct
benefits, including a smaller bill of materials and reduced size. Even more
substantial are the potential efficiency gains from reduced power-delivery-network
losses and voltage margins, especially in a world dominated by energy-limited
devices, where these gains translate directly into longer battery life. However,
to warrant a migration to integrated power supplies, it is crucial that these gains
are not eclipsed by the lower efficiency of the power converter caused by the
reduced quality of integrated passives. Switched-Capacitor (SC) converters have
become increasingly popular because, unlike inductive converters, they only use
transistors and capacitors, both of which are native to CMOS technologies and
scale well into deep-submicron nodes. The maximum efficiency of an SC converter
depends on two factors [1]: a topological parameter whose optimal value depends
on the Voltage Conversion Ratio (VCR), and α, the ratio of the parasitic,
so-called Bottom-Plate (BP), capacitance to the flying capacitance. The further
the VCR is from 1/1 and the larger α, the lower the achievable efficiency. With α
typically around 1.5% for MOM and MIM capacitors and 7% for MOS capacitors, an
SC converter could theoretically achieve an efficiency of 89% and 79%,
respectively, for a 1/2 conversion [1]. Additional losses (control, leakage, etc.)
lower this ceiling further in practice, and previous work confirms its existence.
The highest reported monolithic SC converter efficiencies in baseline CMOS are
87% [2], albeit at a more favorable VCR of 4/5, and 85% [3], both using MIM
capacitors. Higher efficiencies have been demonstrated using high-density
Deep-Trench (90%) [4] or Ferro-Electric (91% in 1/2) [5] capacitors, which
reportedly have up to 25 times smaller α. However, these capacitors are not part
of baseline CMOS and thus require additional masks and cost.
 
The design in [6] reduces the impact of the parasitic coupling by shorting the
BP nodes of two converters operating in antiphase during the dead time between
phase transitions, effectively redistributing half of the charge from the
discharging BP capacitor to the charging one. The supply consequently only needs
to provide the remaining half of the charge to the BP capacitor, resulting in a
2x reduction in BP losses. While effective, this method does not scale to higher
levels of redistribution because it still uses regular two-phase control signals
for each core. This paper presents a scalable technique that significantly
increases the efficiency of integrated converters by redistributing parasitic
charge to any desired level.
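A worked version of the 2x figure for [6], assuming two equal BP capacitances C_BP, ideal switches, and a discharging BP node at Vout shorted to a charging BP node at Vss (0V):

```latex
V_{\mathrm{shared}} = \frac{C_{BP}V_{out} + C_{BP}\cdot 0}{2C_{BP}} = \frac{V_{out}}{2},
\qquad
Q_{\mathrm{supplied}} = C_{BP}\left(V_{out} - \frac{V_{out}}{2}\right) = \frac{C_{BP}V_{out}}{2}
```

Because the BP voltage waveform is periodic, all energy drawn from Vout for the BP capacitor (V_out · Q_supplied) is eventually dissipated, so under these assumptions the BP loss per core cycle drops from C_BP·V_out² without redistribution to C_BP·V_out²/2 with it.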
 
Figure 12.2.1 compares the presented technique, with 8 cores and 3 Charge
Redistribution Steps (CRS), to regular 8-core Time-Interleaving (TI) for a 1/2
converter. With TI, the converter is split into N smaller converter cores. At each
clock edge, the 2 cores that have been in the high/low state the longest
transition to the next state, fully discharging/charging their BP capacitor
voltage (VBP) to Vss/Vout in the process. In the proposed method, the converter is
also split into N cores, but instead of transitioning directly from high/low to
low/high, a core enters an intermediate, dedicated BP discharging/charging state
in which all the regular power transistors are non-conducting. At every clock
edge, each BP discharging core is paired with the BP charging core that has the
closest, yet lower, VBP. By shorting the BP nodes of each pair, their VBPs
average out as charge is transferred from the BP discharging core to the BP
charging core. BP charging/discharging cores for which there is no BP
discharging/charging core with a higher/lower VBP instead transition to the
high/low state, just like in a regular SC converter. In general, only 1/(CRS+1)
of the initial charge has to be supplied by Vout, so the associated BP losses are
drastically reduced. Because all cores are only shifted in phase relative to each
other, the necessary connections between cores and the timing of the charge
exchanges are known at design time. Moreover, the phases shown in Fig. 12.2.1
are stable and require no initialization.
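The claim that only 1/(CRS+1) of the initial charge must come from Vout can be checked with a small idealized model. The Python sketch below is not the authors' schedule and does not attempt to reproduce the 16-core, 9-CRS chip: it assumes equal BP capacitances, voltages normalized to Vout, exactly one core leaving the high state and one leaving the low state per clock edge (which implies 2·CRS+2 cores, i.e., 8 cores for 3 CRS), and a hypothetical rank-wise pairing in which the i-th lowest charging core is shorted to the i-th lowest discharging core. The function name `simulate_bp_redistribution` is illustrative.

```python
def simulate_bp_redistribution(crs: int, edges: int = 5000) -> float:
    """Charge drawn from Vout per clock edge, in units of C_BP * Vout.

    Idealized model (assumptions, not the paper's exact schedule):
    equal BP capacitances, voltages normalized to Vout, one core leaving the
    high state and one leaving the low state at every clock edge, and a
    hypothetical rank-wise pairing of charging and discharging cores.
    """
    if crs == 0:
        # Regular time-interleaving: Vout charges each BP capacitor from 0 to 1.
        return 1.0
    charging = [0.0] * crs     # VBP of BP-charging cores, ascending
    discharging = [1.0] * crs  # VBP of BP-discharging cores, ascending
    supplied = 0.0
    for _ in range(edges):
        # Short the i-th lowest charging core to the i-th lowest discharging
        # core; with equal capacitances both settle at the average voltage.
        shared = [(c + d) / 2 for c, d in zip(charging, discharging)]
        # The highest charging core moves to the high state, where Vout
        # supplies the remaining charge to bring its VBP up to 1.
        supplied = 1.0 - shared[-1]
        # The lowest discharging core moves to the low state (dumped to Vss).
        # A core leaving the low state starts charging at VBP = 0, and a core
        # leaving the high state starts discharging at VBP = 1.
        charging = [0.0] + shared[:-1]
        discharging = shared[1:] + [1.0]
    return supplied


if __name__ == "__main__":
    for crs in (0, 1, 3, 9):
        print(crs, round(simulate_bp_redistribution(crs), 4))
    # 0 1.0
    # 1 0.5
    # 3 0.25
    # 9 0.1
```

In this model the charging cores settle at 0, 1/(CRS+1), ..., CRS/(CRS+1) of Vout, so each pair averages to the next level up and Vout only tops up the last 1/(CRS+1). CRS = 1 reproduces the 2x reduction of [6], and CRS = 9 gives the tenfold reduction quoted for the fabricated design. Since the BP voltages are periodic, the BP energy drawn from Vout (and hence the BP loss) scales with the supplied charge.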
 
The presented technique has been extended to 16 cores and 9 CRS, reducing the
BP losses tenfold and leading to a record fully integrated DC-DC conversion
efficiency of 94.6%. Figure 12.2.2 shows the system overview of this
implementation. With 16 cores, a total of 72 different core connections are
required for the BP charge redistribution. Instead of a dedicated interconnect
line for each of those connections, 8 Charge Redistribution Buses are used: when
two BP nodes need to be shorted, both are connected to the same bus, and the bus
they use depends on the voltage that results once their VBPs average out. As a
result, the BP interconnects need significantly less area, and the swing on each
interconnect is approximately zero, which reduces the charging/discharging losses
of the interconnects' parasitic coupling and effectively turns them into DC
voltage rails. Normally this would require 9 buses (one for each intermediate
voltage). In this design, however, the 0.5×Vout bus is replaced by a short
connection between the two cores of each in-/anti-phase pair, which are placed
physically next to each other. Because they only transfer the charge of the
relatively small BP capacitors, the redistribution transistors are significantly
smaller than the regular power transistors. Consequently, the increase in
transistor losses is much smaller than the achieved reduction in capacitor
losses. The output of the converter is regulated at a fixed voltage Vref under
varying input voltages and load conditions using a hysteretic controller clocked
at 50MHz. Because a core is only switched to Vout every other phase, each trigger
event passes two clock pulses on to the 32-phase Non-Overlapping Clock (NOC)
generator, ensuring a fast response time. The converter frequency is consequently
approximately 3MHz (2 × 50MHz / 32 = 3.125MHz). The width of the CLKmult pulse,
set by a Voltage-Controlled Delay (VCD), determines the non-overlap time between
phases. All 32 phase signals are subsequently used to locally generate the control
signals at each core pair. Figure 12.2.3 shows an example local decoding scheme
together with the NOC generator implementation. Moreover, because the presented
technique inherently builds on the time-interleaving concept, no decoupling
capacitor is required at the output of the converter, which allows the power
density of the presented work to be higher than that of previous record-efficiency
baseline CMOS designs [2, 3].
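Two numbers in this paragraph follow from simple arithmetic. Under an idealized equal-capacitance model, shorting two BP nodes leaves both at their average, and with 9 CRS the settled averages are the intermediate levels j/(CRS+1)·Vout for j = 1 to 9, one bus per level, minus the 0.5×Vout level served by the local in-/anti-phase links. The converter frequency follows from two clock pulses per trigger on a 50MHz clock driving a 32-phase NOC, assuming at most one trigger per clock period. The Python sketch below is illustrative only; the function names are hypothetical, and the evenly spaced levels are an assumption of the ideal model, not a verified description of the chip's bus assignment.

```python
from fractions import Fraction

F_CLK_HZ = 50_000_000   # hysteretic controller clock (from the text)
N_PHASES = 32           # phases of the NOC generator (from the text)
PULSES_PER_TRIGGER = 2  # clock pulses forwarded per trigger event (from the text)


def intermediate_levels(crs: int) -> list[Fraction]:
    """Ideal settled BP voltages after a redistribution step, in units of Vout."""
    return [Fraction(j, crs + 1) for j in range(1, crs + 1)]


def bus_for_pair(v_charging: Fraction, v_discharging: Fraction) -> Fraction:
    """Hypothetical bus selection: the bus whose DC level equals the pair's average."""
    return (v_charging + v_discharging) / 2


def max_converter_frequency_hz() -> Fraction:
    """Each trigger advances the 32-phase NOC by two phases (assumed: one trigger per clock at most)."""
    return Fraction(PULSES_PER_TRIGGER * F_CLK_HZ, N_PHASES)


levels = intermediate_levels(9)
buses = [v for v in levels if v != Fraction(1, 2)]  # 0.5xVout handled by local links
print(len(levels), len(buses))                          # 9 8
print(bus_for_pair(Fraction(3, 10), Fraction(5, 10)))   # 2/5
print(float(max_converter_frequency_hz()))              # 3125000.0
```

Exact fractions are used so that bus levels compare without floating-point rounding; the 3.125MHz result is the value the text rounds to 3MHz.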
 
The design is realized in a 40nm baseline CMOS process using 10nF of MOM
capacitance. The measured efficiency versus output power is plotted in Fig.
12.2.4: a peak efficiency of 94.6% is achieved at loads of 2.7 to 3.15mW. The
converter produces an output voltage of 0.9V from an input voltage ranging from
1.855V to 2.07V. Figure 12.2.5 shows the load-step response of the system,
switching from full load to self-loading and back with a transition time of 8ns.
Even without an output decoupling capacitor (Cdc), the droop and overshoot are
only 21mV and 18mV, respectively. Figure 12.2.6 compares the results with the
current state of the art of highly efficient SC converters. The presented work
achieves a significantly higher efficiency than other SC converters, including
those using high-density capacitors that require extra masks. The full chip
measures 2.4mm² excluding bond pads, as shown in the micrograph of Fig. 12.2.7.
 
This work introduces a scalable parasitic charge-redistribution technique that
significantly increases the efficiency of integrated SC converters. A circuit
fabricated in a 40nm baseline CMOS process demonstrates the presented technique
and advances the state of the art by achieving a record efficiency of 94.6% for
fully integrated SC converters.
 
References: 
[1] H.-P. Le, et al., "Design Techniques for Fully Integrated Switched-Capacitor
DC-DC Converters," IEEE JSSC, vol. 46, no. 9, pp. 2120-2131, Sep. 2011.
[2] T. Van Breussegem, et al., "A fully integrated gearbox capacitive
DC/DC-converter in 90nm CMOS: Optimization, control and measurements,"
IEEE Control and Modeling for Power Electronics (COMPEL), Jun. 2010.
[3] L. G. Salem, et al., "An 85%-efficiency fully integrated 15-ratio recursive
switched-capacitor DC-DC converter with 0.1-to-2.2V output voltage range,"
IEEE ISSCC Dig. Tech. Papers, pp. 88-89, Feb. 2014.
[4] L. Chang, et al., "A Fully-Integrated Switched-Capacitor 2:1 Voltage Converter
with Regulation Capability and 90% Efficiency at 2.3A/mm²," IEEE Symp. VLSI
Circuits, pp. 55-56, Jun. 2010.
[5] D. El-Damak, et al., "A 93% Efficiency Reconfigurable Switched-Capacitor
DC-DC Converter using On-Chip Ferroelectric Capacitors," IEEE ISSCC Dig.
Tech. Papers, pp. 374-375, Feb. 2013.
[6] T. M. Andersen, et al., "A 4.6W/mm² power density 86% efficiency on-chip
switched capacitor DC-DC converter in 32 nm SOI CMOS," IEEE Applied Power
Electronics Conference and Exposition (APEC), pp. 692-699, Mar. 2013.

Figure 12.2.1: Comparison of regular time-interleaving with the presented
charge-redistribution technique.
 
Figure 12.2.2: System overview of the converter together with transistor 
level implementation of the converter cores. 
 
Figure 12.2.3: Implementation of the 32-phase Non-Overlapping Clock
(NOC) generator and an example decoding scheme.
 
Figure 12.2.4: Measured efficiency versus output power for Vin=1.855V 
and Vout=900mV. The peak efficiency is achieved for output powers of 
2.7 to 3.15mW. 
 
Figure 12.2.5: Full load-step transient response with Vin=1.855V and 
Vref=900mV. The load current switches from 4.25mA to zero and back 
with a transition time of 8ns. 
 
Figure 12.2.6: Comparison with fully-integrated, highly-efficient 
switched-capacitor DC-DC converters. Baseline CMOS designs are 
highlighted. 
 
Figure 12.2.7: Chip micrograph with a total area of 2.4mm².
