Abstract-A monolithic switched-capacitor power converter is designed in the nanoscale CMOS technology, employing a two-stage system architecture to mitigate overvoltage breakdown risk. The efficiency impact of various parasitic components is analyzed for design optimization on both the topology level and the device level. An external capacitorless low-dropout regulator with a wide bandwidth and a high power supply rejection ratio is introduced, removing high-frequency ripples and enabling a fast load transient response. The proposed design was implemented with the 180-nm CMOS process. At a switching frequency of 100 MHz, it regulates V out at 0.8 V, with a maximum power efficiency of 60%. When the load current switches between 1.5 and 15 mA, V out responds within 45 ns, with a voltage droop below 45 mV.
I. INTRODUCTION
THE exponential growth of mobile electronic devices has fueled perpetual demands for significantly broadened functionality and advanced signal processing capabilities. This has led to the continuous growth of mixed-signal system-on-a-chip (SoC) technologies, as CMOS technology enters nanometer regimes [1]. To power multiple function blocks and facilitate such diverse functionalities in a SoC, it is imperative to implement power converters fully on-chip. For inductor-based switching power converters, either a technologically intensive on-chip inductor or a bulky off-chip inductor is essential, leading to a high cost and a large system volume [2]. In contrast to inductor-based converters, switched capacitor (SC) power converters employ capacitors as power storage devices [3]-[9], which are feasible for an on-chip implementation. However, for a fully integrated SC power converter, a large output filter capacitor is traditionally required to attain a small ripple voltage, demanding a large die area. In addition, although the operation speed, power consumption, and the integration density are highly improved in the nanoscale CMOS technology, the breakdown voltages of standard core transistors have degraded, exacerbating major failure mechanisms, including oxide breakdown and source/drain-substrate junction diode breakdown [10], [11]. This makes it extremely challenging to realize highly reliable SC power converters on the nanoscale CMOS and seriously limits the input and output voltage ranges. Moreover, as advanced SoCs are generally powered by battery cell(s), the voltage of which is several times higher than the breakdown voltage of core transistors, the reliability of such an SC power converter is further threatened.
To elaborate this, Fig. 1 shows a conventional SC power converter in the nanoscale CMOS technology. When it operates in charge phase Φ ch , circuit node V 1 is connected to input V in , and the voltage stresses across the thin gate oxide (V DG2 ) and the drain-source (V DS2 ) of power switch M 2 are V high = V in − V out , which is higher than the rated supply voltage V DD . In discharge phase Φ dch , node V 1 is connected to output V out ; thus, the V DG and V SD stresses on switch M 1 are also V high . For example, overvoltage stress V high is up to 2.1 V (much larger than the V DD of nanoscale CMOS processes) under the 3-V V in and the 0.9-V V out . Hence, M 1 and M 2 both face multiple reliability risks since their oxides and channels undergo considerable overvoltage stresses. Even a small increment of such a stress can lead to device malfunction or breakdown. For example, a study shows that a 10% increase in the gate electric field can reduce the average lifetime for the gate oxide by ten times [1] .
In order to mitigate breakdown risks and achieve robust operations, both device- and circuit-level measures can be taken. Device-level techniques employ technological solutions such as thickened oxide or extended drains to increase breakdown voltages. However, these techniques require extra process steps and a larger die area, thereby significantly increasing the fabrication overhead. On the other hand, circuit techniques employ transistor stacking structures [10]-[12]. However, a major drawback is the reduction of the power efficiency as a consequence of increased power losses from additional gate drivers and the stacked transistors. Decent power efficiency is critical to extend the battery lifetime. On the other hand, by migrating to a high-frequency operation, the normally off-chip capacitors in the SC power converters can be fully integrated, therefore achieving a highly compact solution. However, the high-frequency operation can dramatically reduce the efficiency. In addition, a bottom-plate parasitic capacitor (C par ), which is located between the bottom plate of a pumping capacitor and the chip substrate, is another main source of power loss. Since its capacitance is usually not negligible, a large amount of charge on the bottom plate is usually dumped to ground, essentially degrading the efficiency. Lastly, the RF/wireless circuits in the SoCs are sensitive to the power supply noise. Thus, a power supply with a low voltage ripple, a fast load transient response, and a predictable noise spectrum is highly desirable. For traditional discrete implementations, output filter capacitors are made sufficiently large to lower the ripple noise; however, large output capacitors are difficult to integrate and limit load transient responses.
To overcome the aforementioned challenges, a fully integrated two-stage SC power converter, which consists of an SC power stage and a low-dropout (LDO) regulator, is proposed to achieve high reliability and high supply noise rejection, and to maintain decent power efficiency. The rest of this paper is organized as follows. The system design strategy and architecture are presented in Section II. Detailed circuit implementations are addressed in Section III. Experimental results are then provided in Section IV to verify the functionality and to evaluate the performance of the proposed converter. Finally, we conclude this paper in Section V.
II. SYSTEM DESIGN STRATEGY AND ARCHITECTURE

A. System Design Strategy
The main objective of our system design strategy is to mitigate the breakdown risks imposed by a high input voltage while still benefiting from the high performance of the thin-oxide core transistors. For the conventional converter in Fig. 1, the voltage swing V 1 at the top plate of capacitor C 1 is high, so the overvoltage stress V high across the gate oxide and the drain-source exceeds V DD . Reducing the swing of V 1 mitigates this stress: if the output voltage of the first stage (denoted V O−SC in Section III) meets the requirement of Fig. 2(b), the voltages across the gate oxide (V DG ) and the drain-source (V DS ) of power switches M 1 and M 2 are limited within V DD . Therefore, the aforementioned stresses on power switches M 1 and M 2 are eliminated. Since the output of the SC stage is raised above the final output voltage, a second-stage circuit, i.e., an LDO regulator, is employed to regulate the final output V out , as proposed in Fig. 2(a). With the proposed architecture, the reliability of the power switches is ensured, and the converter can operate beyond the breakdown voltage limit of the process.
In addition to the breakdown-resilient feature, the proposed design also achieves a highly compact monolithic implementation. Since the power supply rejection ratio (PSRR) of the LDO regulator is high, the ripple voltage of the SC power stage can be suppressed. In this way, although the output ripple of the SC power stage is high, a low ripple at V out can be still attained. On the other hand, for a conventional SC power converter, the output filter capacitor has to be large to achieve a low ripple voltage, which leads to a high silicon cost. Thus, the LDO regulator with a high PSRR not only reduces the output ripple but also facilitates an efficient implementation, even if the area of the LDO is taken into account. Furthermore, because the output voltage of the SC power stage is increased with the proposed two-stage architecture, the bottom-plate parasitic capacitances have a lower voltage swing and, thus, a lower power loss. Fig. 3 illustrates the circuit block diagram with the key waveforms of the proposed converter. The input voltage is 2.8 V, while the output is 0.8 V to power SoCs in nanoscale technologies. As shown in Fig. 3(b) , to realize the targeted voltage conversion, an SC converter with a 1/3 conversion ratio is traditionally adopted. However, power transistors M 1 and M 2 would suffer a 2-V overvoltage stress. In addition, the output capacitor should be sufficiently large to minimize the ripple noise. The conversion ratio of the proposed SC power stage is designed as 2/5 instead of 1/3. With a higher conversion ratio, the voltages across the gate oxide and the drain-source of power switches M 1 and M 2 are limited within V DD ; thus, the critical reliability issues are overcome under the 2.8-V input voltage. To further ensure the reliability of each power transistor, the gate drivers are carefully designed to limit the oxide voltage within V DD when each power transistor is turned on.
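The effect of the conversion ratio on switch stress can be checked with a short calculation. The sketch below is only illustrative: it assumes an ideal, unloaded SC stage (output equal to ratio times V in ) and takes the stress on an off input or output switch as V in minus the stage output, as described for Fig. 1. The function names are hypothetical. The 2-V stress quoted for the 1/3 case appears to correspond to V in minus the regulated 0.8-V output.

```python
from fractions import Fraction


def ideal_sc_output(v_in: float, ratio: Fraction) -> float:
    """Ideal (unloaded, lossless) output of an SC stage with the given conversion ratio."""
    return v_in * float(ratio)


def off_switch_stress(v_in: float, v_stage_out: float) -> float:
    """Voltage across an off input/output power switch: V_in minus the stage output."""
    return v_in - v_stage_out


V_IN = 2.8   # input voltage from the text, in volts
V_OUT = 0.8  # regulated final output from the text, in volts

for ratio in (Fraction(1, 3), Fraction(2, 5)):
    v_o = ideal_sc_output(V_IN, ratio)
    stress = off_switch_stress(V_IN, v_o)
    print(f"ratio {ratio}: ideal stage output = {v_o:.2f} V, switch stress = {stress:.2f} V")

# Stress if a single 1/3 stage regulates its output directly at 0.8 V
print(f"single-stage stress at regulated output: {off_switch_stress(V_IN, V_OUT):.1f} V")
```

The 2/5 ratio lowers the stress only because the stage output is allowed to sit above the final 0.8-V rail, which is why the second-stage LDO regulator is needed.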
B. System Architecture
The proposed SC power stage is an open-loop design considering the output drooping during the load transient, the output ripple in the steady state, and the switching noise spectrum in the frequency domain. A closed-loop control scheme incurs a time delay due to the feedback, which worsens the voltage drooping during the load transient. In addition, a high-frequency fast-switching open-loop stage can ensure a small output ripple in the steady state. As the proposed SC stage has a predictable noise spectrum from the fixed switching frequency, it is more suitable for noise-sensitive analog/RF systems.
The second stage of the system is implemented by a wideband LDO regulator. As shown in the key waveforms in Fig. 3(b) , due to its wideband closed-loop control scheme, a fast load transient response is achieved, thereby minimizing the voltage droop during the load transient. The fast feedback loop also facilitates a small output capacitor, which enables its on-chip integration. As discussed in Section II-A, since the PSRR of the LDO regulator is high, the output ripple can be minimized without a large output capacitor.
III. CIRCUIT IMPLEMENTATIONS OF POWER CONVERTER
A. Breakdown-Resilient SC Power Stage With Two-Level Efficiency Optimization
The proposed SC power stage is optimized on both the circuit and device levels to maintain the efficiency at a high switching frequency. In the meantime, it mitigates the breakdown risks without using either thick-oxide or drain-extended transistors. The implementation with only the thin-oxide transistor significantly improves the power efficiency and reduces the fabrication cost.
For the circuit implementation, two potential topologies (topologies A and B) with a 2/5 conversion ratio are proposed in Fig. 4. The parasitic capacitance C par,i of pumping capacitor C i and its voltage V par,i are illustrated correspondingly. The total power loss (P par ) from the parasitic capacitances is

$$P_{par} = f_{sw} \sum_{i} C_{par,i} \left(V_{par,i,h} - V_{par,i,l}\right)^{2} \tag{1}$$

where f sw is the switching frequency, and V par,i,h and V par,i,l are the high and low values of voltage V par,i , respectively [13]-[17]. The P par of topology A can be calculated as 0.56f sw C par V in ², which is less than half of that of topology B (1.16f sw C par V in ²). Thus, topology A is adopted to realize the 2/5 SC power stage in this paper.
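The 0.56 coefficient for topology A can be reproduced numerically. The sketch below assumes that each bottom-plate parasitic capacitor dissipates C par ΔV² per switching cycle and that all three parasitic capacitances are equal. The swings of 3/5, 2/5, and 1/5 of V in are an inference from the capacitor voltages given for the two phases, not values stated in the text, and the function name is hypothetical.

```python
from fractions import Fraction


def parasitic_loss_coefficient(swings):
    """Sum of squared normalized swings, so that P_par = coeff * f_sw * C_par * V_in**2."""
    return sum(s * s for s in swings)


# Assumed bottom-plate voltage swings for topology A, as fractions of V_in
# (inferred for illustration; not listed explicitly in the text).
TOPOLOGY_A_SWINGS = [Fraction(3, 5), Fraction(2, 5), Fraction(1, 5)]

# Coefficient quoted in the text for topology B
TOPOLOGY_B_COEFF = 1.16

coeff_a = parasitic_loss_coefficient(TOPOLOGY_A_SWINGS)
print(f"topology A coefficient: {float(coeff_a):.2f}")
print(f"A / B loss ratio: {float(coeff_a) / TOPOLOGY_B_COEFF:.2f}")
```

Under these assumptions the ratio comes out slightly below one half, which agrees with the comparison between the two topologies.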
The proposed SC stage based on topology A is shown in Fig. 5(a). Its timing diagram and reliable operation scheme are illustrated in Fig. 5(b) and (c), respectively. In charge phase Φ ch , power switches M 1 , M 4 , M 5 , M 9 , and M 10 are turned on, with the voltage across the oxide V oxide = V DD . Pumping capacitor C 1 is connected with input source V in ; thus, all pumping capacitors C 1 , C 2 , and C 3 are charged up. At the same time, the energy drawn from V in is also delivered to the output of the power stage. Assume that C 1 , C 2 , and C 3 are equal to C. At the end of Φ ch , the voltages across capacitors C 1 and [...] 7 , and M 8 are turned on, with V oxide = V DD . As a result, C 1 is disconnected from V in . The energy stored in C 1 , C 2 , and C 3 during the previous Φ ch is discharged to output V O−SC . At the end of phase Φ dch , the voltages across capacitors C 1 and C 2 (C 3 ) are V O−SC and V O−SC /2, respectively. When the power stage works in the steady state, the total charge difference on C 1 , C 2 , and C 3 between two phases is equal to the charge consumed by load resistor R L in one switching period T as follows:
Considering that C 1 , C 2 , and C 3 are equal to C, we have

As shown in Fig. 5, when transistor M 1 or M 2 is off, the voltages across the oxide and the drain-source are V in − V O−SC = V DD , indicating that the overvoltage stresses of M 1 and M 2 are overcome. The junction diode breakdown between the drain/source and the body node is also examined. In advanced nanoscale CMOS technologies, junction breakdown occurs at voltage levels that are several times larger than V DD [18]. If a triple-well (deep n-well) nMOS transistor is available in the technology [18], [19], the body node of each nMOS transistor is connected to its source node, which overcomes the risk of drain/source-to-body junction breakdown. Meanwhile, by tying the body to the source node, the body effect is eliminated, leading to a smaller threshold voltage. This means a lower ON-resistance and less conduction power loss [6]. If the technology does not provide a deep triple-well nMOS transistor, the proposed power stage still works. However, the input voltage is limited within a value of V in−max such that, for the output power transistor M 2 , the requirement of V DB2−max = V in−max < V j−breakdown is satisfied to avoid the junction breakdown. In addition, the body node of the pMOS (n-well) is accessible in a standard p-substrate CMOS technology; thus, the body can be tied to the source node. Consequently, the largest reverse voltage of the body-to-drain junction diode is equal to the source-to-drain voltage V SD , which is not higher than V DD in the power stage. For the pMOS body-to-substrate (n-well to chip substrate) junction diode, the reverse breakdown voltage is even higher than that of the drain/source-to-body diode due to its light doping level. Therefore, the pMOS transistors in the proposed SC stage do not suffer from the junction breakdown.
In addition to the efficiency optimization on the circuit level, the optimization is also performed on the device level, following the approaches in [6]-[9]. The total power losses (P total ) can be calculated as

$$P_{total} = P_{R_{on}} + P_{C_{GS}} + P_{C} + P_{par} \tag{4}$$

where P R on is the conduction power loss of the transistor ON-resistance R on , P C GS is the switching power loss of the transistor gate-source capacitance C GS , P C is the intrinsic loss due to the charge redistribution of capacitors, and P par is the power loss from the parasitic capacitances. The power losses from the gate drivers and the leakage to the bulk are neglected since both are much lower than the other losses in (4). Conduction power loss P R on is
where D i , R on,i , a r,i , and I out stand for the duty ratio, the ON-resistance of power transistor M i , the switch charge multiplier of M i , and the output current of the power stage, respectively. D i is 0.5 in the proposed design. For power transistor M i with 
Consider that V O−SC = 2V in /5 and that Q in,Φch is equal to 2Q out /5. Thus, Q out,Φch = 2Q out /5 and Q out,Φdch = 3Q out /5 can be derived with the law of conservation of charge. Then, |a r,i | = 2/5 (i = 1, 2, 3, 4, 10) and |a r,i | = 1/5 (i = 5, 6, 7, 8, 9) can be obtained [6], [15]. In Table I ,
The minimum length L min is used for the power transistors. From (5), conduction power loss P R on is calculated as
Switching power loss P C GS is
where C GS,i represents the gate-source capacitance of power transistor M i . Note that the width of nMOS transistor W n is equal to k μ W p ; thus, the switching power loss is
The intrinsic switched-capacitor loss P C can be expressed as
[...] Cf sw + 2 × [...]
The power loss from the parasitic capacitances, P par , is computed as in (1). From (7) and (9), P R on and P C GS have opposite trends with regard to width W p . Therefore, by satisfying ∂P total /∂W p = 0, an optimized width W optimized can be reached to minimize the total power losses, leading to the maximum power efficiency. The optimized width is derived as
With the widths of the pMOS transistors determined, the optimized widths of the nMOS can be calculated according to Table I .
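The width optimization can be illustrated with a minimal numerical sketch. It assumes only what the text states qualitatively: the conduction loss falls with transistor width and the gate switching loss rises with it, while the remaining losses do not depend on the width. The model P total (W) = a/W + bW + c, the coefficient values, and the function names are hypothetical and do not come from the paper.

```python
import math


def total_loss(w, a, b, c):
    """Assumed model: conduction loss a/W, gate switching loss b*W, width-independent loss c."""
    return a / w + b * w + c


def optimal_width(a, b):
    """Width at which d(P_total)/dW = -a/W**2 + b = 0."""
    return math.sqrt(a / b)


# Hypothetical coefficients, chosen only for illustration
A_COND = 4.0e-3   # W*um, conduction-loss coefficient
B_GATE = 1.0e-9   # W/um, gate-switching-loss coefficient
C_FIXED = 2.0e-3  # W, width-independent losses (P_C + P_par)

w_opt = optimal_width(A_COND, B_GATE)
p_min = total_loss(w_opt, A_COND, B_GATE, C_FIXED)
print(f"W_opt = {w_opt:.0f} um, P_total at W_opt = {p_min * 1e3:.3f} mW")

# Moving away from W_opt in either direction should not reduce the loss.
for scale in (0.5, 2.0):
    p = total_loss(scale * w_opt, A_COND, B_GATE, C_FIXED)
    print(f"W = {scale:.1f} * W_opt -> P_total = {p * 1e3:.3f} mW")
```

At the optimum of this simplified model, the conduction and gate switching losses are equal, which is a quick sanity check on any sizing result.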
B. Design of Gate Drivers
The clock generator, the level shifters, and the gate drivers are shown in Fig. 6 . By adopting cross-coupled capacitive level shifters in Fig. 6(b) and (c) , the nonoverlapping clocks CLK 1 , CLK 1p , and CLK 2 (0 ≤ clocks ≤ V DD ) can be reliably shifted to the required level (V O−SC ≤ level ≤ V in ) and applied to the gate drivers. Although the highest voltage of the level shifters is V in , the voltage across the gate oxide of transistors M 1 and M 2 is still within V DD , which ensures the oxide reliability. Moreover, the M 1 and M 2 in the level shifter do not suffer drain/source-to-body junction breakdown [11] . In addition, a classic level shifter is utilized to translate CLK 1 to the input (0 ≤ input ≤ V in ) of the gate driver for power switch M 9 [20] . All the other gate drivers are inverters or inverter chains with an appropriate input to match the phases in Fig. 5(b) . The power supplies of all the gate drivers are tailored to the required voltage levels in Fig. 5(b) ; thus, the voltage across the gate oxide of any power switch is limited within V DD . In this way, the breakdown under the 2.8-V V in is overcome during the whole operation period. The clock generator in Fig. 6(a) is utilized to generate nonoverlapping clocks and to eliminate the shoot-through current [11] .
C. LDO Regulator Design
The proposed second stage of the system, i.e., a wideband LDO regulator, is shown in Fig. 7(a) , including an error amplifier (EA), a buffer stage, and a power stage. To suppress the high-frequency ripple from the SC power stage and to realize an area-efficient implementation, the LDO regulator is designed to achieve a high PSRR. In addition, the feedback loop of the LDO regulator has a wide bandwidth to achieve a fast load transient response.
To achieve a wide bandwidth and to stabilize the LDO regulator, a source follower (M 11 , M 12 , and M 13 ) is employed as the buffer stage to drive power transistor M power . The high output resistance R O,EA (g m8 r o8 r o10 ||g m4 r o4 r o6 ) of the EA is isolated from the large gate capacitor (C gate ) of M power ; thus, the dominant pole p 1 at the EA output is pushed to a higher frequency than a conventional LDO regulator without a buffer stage [see Fig. 7(b) ]. Meanwhile, the resistance R O,BUF (= 1/g m11 ) at the gate of power transistor M power is much lower than R O,EA without the buffer stage. This pushes the nondominant pole p 2 at the gate of M power to a higher frequency. In addition, on-chip resistor R Z and capacitor C Z create a left-half plane (LHP) zero (z 1 ), which is located before the nondominant pole p 2 to extend the unity gain frequency (UGF) and to increase the phase margin for better stability. Therefore, the LHP zero and the buffer stage jointly extend the bandwidth and facilitate the loop stability.
The design employs a small on-chip output capacitor C out to achieve a fully monolithic implementation. Because the proposed LDO regulator uses an nMOS power transistor (instead of a conventional pMOS), the output impedance of the power transistor is reduced. This low output impedance, in combination with the small output capacitor, results in a nondominant pole (p 3 ) at a high frequency, simplifying the frequency compensation of the feedback loop.
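The pole-zero arrangement described above can be explored numerically. The sketch below assumes a loop gain with one LHP zero (z 1 ) and three poles (p 1 , p 2 , p 3 ) and a low-frequency gain of A EA β. All numerical values are hypothetical placeholders, not measured or designed values from this work, and the function names are illustrative.

```python
import math


def loop_gain(f_hz, a_ea, beta, f_z1, f_p1, f_p2, f_p3):
    """Complex loop gain of an assumed one-zero, three-pole model at frequency f_hz."""
    s = 1j * f_hz  # all corner frequencies are in Hz, so s/omega_x reduces to j*f/f_x
    num = a_ea * beta * (1 + s / f_z1)
    den = (1 + s / f_p1) * (1 + s / f_p2) * (1 + s / f_p3)
    return num / den


def loop_phase_deg(f_hz, f_z1, f_p1, f_p2, f_p3):
    """Unwrapped phase: +atan for the LHP zero, -atan for each pole."""
    phase = math.atan(f_hz / f_z1) - sum(math.atan(f_hz / fp) for fp in (f_p1, f_p2, f_p3))
    return math.degrees(phase)


def unity_gain_frequency(params, f_lo=1.0, f_hi=1e12, iters=200):
    """Bisect on a log-frequency axis for |T| = 1 (|T| falls monotonically when f_z1 > f_p1)."""
    for _ in range(iters):
        f_mid = math.sqrt(f_lo * f_hi)
        if abs(loop_gain(f_mid, **params)) > 1.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)


# Hypothetical values for illustration only
PARAMS = dict(a_ea=2000.0, beta=0.5, f_z1=8e6, f_p1=10e3, f_p2=50e6, f_p3=500e6)

f_ugf = unity_gain_frequency(PARAMS)
pm = 180.0 + loop_phase_deg(f_ugf, PARAMS["f_z1"], PARAMS["f_p1"], PARAMS["f_p2"], PARAMS["f_p3"])
print(f"UGF = {f_ugf / 1e6:.1f} MHz, phase margin = {pm:.0f} deg")
```

Moving f_z1 above f_p2 in this model lowers both the unity gain frequency and the phase margin, which illustrates why the zero is placed before the nondominant pole.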
Since a small on-chip output capacitor is employed, a relatively small amount of charge can be held in the output capacitor. Thus, a large voltage droop could be observed if the feedback loop cannot respond to the load change promptly. However, the aforementioned wide bandwidth reduces the time delay of the feedback loop and facilitates a faster transient response, as illustrated in Fig. 7(c) , leading to a lower output undershoot/overshoot than a conventional LDO regulator.
The loop gain T(s) and the PSRR are modeled and analyzed. The loop gain T(s) is

$$T(s) = A_{EA}\,\beta\,\frac{1 + s/\omega_{z1}}{\left(1 + s/\omega_{p1}\right)\left(1 + s/\omega_{p2}\right)\left(1 + s/\omega_{p3}\right)}$$

where A EA is the small-signal gain of the EA, β is the feedback factor, ω z1 is the frequency of zero z 1 , and ω p1 , ω p2 , and ω p3 are the frequencies of poles p 1 , p 2 , and p 3 , respectively, as follows:
[...] C out (16)
where g m,power is the transconductance of power transistor M power .
The PSRR of the LDO regulator is
where Z O and r o are the output impedance of the LDO regulator and the output resistance of the power transistor, respectively. Substituting (13) and (18) into (17), the PSRR can be expressed as
Since the nMOS power transistor has a high input resistance, it provides better noise isolation than a pMOS transistor, which is demonstrated by (19) . A high UGF and better noise isolation by the nMOS power transistor jointly enable a high PSRR, as indicated by (19) .
IV. EXPERIMENTAL VERIFICATIONS
The proposed SC power converter is implemented with the 180-nm CMOS process. Fig. 8 shows the chip micrograph. The total chip area is 3.2 mm², including all bonding pads and testing circuits. The nominal operation frequency of the SC power stage is 100 MHz, which can be increased up to 200 MHz for higher power delivery.

Fig. 9 shows the measured load transient response. When the load steps up from 1.5 to 15 mA, the output voltage recovers within 45 ns with a droop of 40 mV. As the load drops back to 1.5 mA, the output overshoot is 45 mV, with a recovery time of 45 ns. Although the on-chip output capacitor C out is small, the droop during the load transient remains low. Because no output ringing is observed during the load transient, the stability of the feedback loop is confirmed.

Under a 15-mA load current, the V out ripple is about 15 mV, whereas the output ripple of the SC stage is around 50 mV, which shows that the LDO regulator accomplishes the ripple reduction. The proposed system achieves an area-efficient implementation with a small output ripple. Without the LDO regulator, the output filter capacitor of the SC stage would have to increase by 3.3 times to achieve the same ripple performance.

The measured PSRR is shown in Fig. 10. It demonstrates a 40-dB power supply noise rejection at 100 MHz. Consistent with (19), at low frequencies up to the −3-dB bandwidth (10 kHz), the PSRR is dominated by the loop gain of the LDO regulator (≈ A EA β). As the frequency goes beyond 10 kHz, the PSRR begins to roll off with a −20-dB/decade slope as the loop gain decreases. In addition to the zero z 1 , output capacitor C O−SC forms a second zero z 2 . Thus, when the frequency reaches the two zeros at around 8 MHz, the PSRR starts to increase. As the load current increases, the noise isolation capability of the nMOS power transistor degrades; thus, the PSRR is higher at 5 mA than at 10 mA.

Fig. 11 shows the measured efficiency of the SC power stage at 75, 100, and 125 MHz. The SC stage has a peak power efficiency of 68%, whereas the entire system achieves a maximum efficiency of 60% with the 2.8-V V in and the 0.8-V V out . This is about twice the efficiency of a stand-alone LDO regulator, which achieves only 28.6% (0.8 V/2.8 V) under the same conditions. In Fig. 11, the SC power stage reaches the peak efficiency at a 15-mA load current, which is the nominal operating point of this design. As the total power loss increases with the switching frequency, the efficiency at 75 MHz is higher than that at 125 MHz. A maximum efficiency of 60% can be achieved for the entire converter at a 21-mA load.

Table II compares the proposed design with the prior art in [21] and [22]. As the switching frequency in [21] is much lower, it achieves higher power efficiency than the work in [22] and this paper. However, that design does not accomplish a fully monolithic integration since external pumping capacitors are employed. The design in [22] is monolithic, but it has much lower power efficiency than this paper even though its switching frequency is lower, which is due to the different strategies on the system architecture and circuit designs.
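Two of the reported comparisons can be cross-checked with simple arithmetic. The sketch below assumes an ideal linear regulator whose efficiency is bounded by V out /V in (quiescent current ignored) and uses only the voltages, ripple values, and the 60% system efficiency quoted in this section. The function names are hypothetical.

```python
def ideal_ldo_efficiency(v_in: float, v_out: float) -> float:
    """Upper bound for a linear regulator, ignoring quiescent current: V_out / V_in."""
    return v_out / v_in


def ripple_reduction(ripple_before_mv: float, ripple_after_mv: float) -> float:
    """Ratio of the SC-stage ripple to the ripple at the LDO output."""
    return ripple_before_mv / ripple_after_mv


V_IN, V_OUT = 2.8, 0.8     # volts, from the measurements above
SYSTEM_EFFICIENCY = 0.60   # measured maximum of the two-stage converter

eta_ldo_only = ideal_ldo_efficiency(V_IN, V_OUT)
print(f"stand-alone LDO efficiency: {eta_ldo_only * 100:.1f} %")
print(f"two-stage vs. LDO-only: {SYSTEM_EFFICIENCY / eta_ldo_only:.2f}x")
print(f"ripple reduction by the LDO: {ripple_reduction(50.0, 15.0):.1f}x")
```

The ripple ratio matches the 3.3-times increase in output filter capacitance that the SC stage would otherwise need, under the assumption that ripple scales inversely with that capacitance.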
V. CONCLUSION
This paper has presented a high-frequency, monolithic, and breakdown-resilient power converter. The two-stage cascaded structure overcomes the high risk of breakdown, and the implementation of each stage enables the area-efficient fully monolithic integration. The 2/5 SC power stage realizes reliable and efficient voltage conversion, while the fully on-chip LDO regulator regulates the output with a wide bandwidth and a high PSRR.
