Abstract-
I. INTRODUCTION

POWER management plays a key role in an era of autonomously connected devices. Such devices form the backbone of the so-called "Internet-of-Things," and so must be as power efficient as possible in order to maximize their power autonomy and battery life. However, the steady scaling of CMOS technology has been accompanied by lower supply voltages, which translates into larger supply currents for a given power dissipation. At the same time, interconnect resistance has increased, leading to larger IR drops and making it increasingly difficult to realize efficient power management systems. Furthermore, battery voltages have not scaled at the same rate as supply voltages, adding to the costs.
To cope with this challenge and to deliver power with high efficiency and minimum area, various types of voltage regulators (VRs) are used [1] , [2] . However, none of these are ideal, with each type having its own unique set of advantages and disadvantages. Although linear regulators can be quite area efficient and can provide high amounts of power, their efficiency is ultimately limited by the ratio of their input to output voltages [3] . Inductive switched mode power supplies have high efficiency but typically require off-chip discrete components such as inductors. Capacitive switched mode power supplies can be integrated on-chip with moderate efficiency, at the cost of low area efficiency and output current. While some of these limitations can be mitigated by conventional techniques such as the use of trench capacitors [4] or in-package inductors [5] , they cannot be completely circumvented. In this paper, we will introduce the voltage stacking technique [3] and show that it can significantly increase both the power and area efficiency of on-chip power management systems.
Voltage stacking [3] is a technique that involves connecting power domains in series rather than in parallel. As shown in Fig. 1, disregarding the calculations for now, this is analogous to connecting resistors in series rather than in parallel. If a system has two power domains, each modeled as an impedance between supply and ground rails, then a conventional implementation would see these impedances connected in parallel, i.e., connecting the ground rail to the ground rail and the supply rail to the supply rail. In contrast, a voltage stacking scenario would involve connecting the supply rail of the first power domain [see the bottom of Fig. 1(a) and (b)] to the ground rail of the second power domain [see the top of Fig. 1(a) and (b)]. Assuming that each power domain uses the same supply voltage and current, this means that the total supply voltage doubles, while the total supply current halves. In other words, an implicit 2:1 conversion has been realized without the power conversion losses associated with voltage converters and with no area overhead.
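The implicit 2:1 conversion can be illustrated with a few lines of arithmetic. The sketch below is not taken from the measured system: the function names are hypothetical, and the 1.1-V supply and 2-mA domain current are assumed values chosen only to make the comparison concrete for two perfectly matched domains.

```python
import math

def parallel_supply(v_dd, i_domain, n_domains=2):
    """Conventional scheme: every domain sits between the same rails,
    so the supply voltage is V_DD and the domain currents add up."""
    return v_dd, n_domains * i_domain

def stacked_supply(v_dd, i_domain, n_domains=2):
    """Stacked scheme: matched domains are in series, so the supply
    voltage is n * V_DD and the same current flows through each domain."""
    return n_domains * v_dd, i_domain

V_DD = 1.1        # assumed supply voltage per domain, in volts
I_DOMAIN = 2e-3   # assumed (matched) current per domain, in amperes

v_par, i_par = parallel_supply(V_DD, I_DOMAIN)
v_stk, i_stk = stacked_supply(V_DD, I_DOMAIN)

# Delivered power is identical; only the voltage/current split changes.
p_par = v_par * i_par
p_stk = v_stk * i_stk
print(f"parallel: {v_par:.2f} V, {i_par * 1e3:.1f} mA, {p_par * 1e3:.2f} mW")
print(f"stacked:  {v_stk:.2f} V, {i_stk * 1e3:.1f} mA, {p_stk * 1e3:.2f} mW")
```

The load power is unchanged, while the current drawn from the higher supply rail is halved, which is the property the rest of the paper builds on.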
In practical implementations, the top and bottom power domains will not consume the same current and so a VR will be needed to stabilize the midrail between the series connected power domains. If the supply current mismatch is small, the power supplied by this regulator will be much less than that supplied by the regulator of a conventional parallel connected system. In consequence, the VR of a voltage stacked system can be quite compact. Furthermore, since it delivers less current, its losses will not be significant, so the overall system power efficiency will be higher than that of a conventional system. Therefore, the implicit conversion step associated with voltage stacking relaxes the requirements on the explicit conversion step (e.g., VR).
The benefits of voltage stacking have been demonstrated before [6], but its use in realistic applications has only been described in [7]. Earlier work [6], [7], [8], [9], [10] featured either simple circuit blocks or larger but disconnected systems. In [6], multipliers were used to read out operands from on-chip SRAM. The circuit infrastructure, such as the level shifters between the power domains and a VR that can control the midnode, was well studied and implemented, but the system lacks the complexity of a real application. Similarly, in [8], the concept of voltage regulation is further developed into several linear regulator blocks providing the current in the design, and the application space has been well established in the form of a lockstep microcontroller (MCU) system. The final silicon implementation, however, featured only phase-locked loops (PLLs) stacked on top of each other, which is still not a complete system that could demonstrate the feasibility of voltage stacking. In [9], the complexity of the implemented system was much higher, with stacked memory blocks and processor cores; however, the separate MCUs were disconnected from each other and did not function as one system, as is needed in a realistic application. The other novelty of [9] was the introduction of switched capacitor converters into a voltage stacking implementation, which employed an adaptive regulation scheme enhanced by per-core frequency scaling that reduced the current imbalance even further. The final work that treated the topic was [10], where a stacked IO driver was proposed with thin-oxide transistors for low power and high speed. The benefits compared with thick-oxide IO pad implementations were due to voltage stacking: although the delivered supply voltage was 2V DD, the thin-oxide devices observed only V DD over their respective voltage range.
In this paper, the realized system addresses the shortcomings of the previous implementations. First, a full microcontroller IC is implemented featuring an ARM Cortex-M0+ processor, 16-kB SRAM, 4-kB ROM, and an on-chip switched capacitor VR (SCVR) in a 40-nm technology, using only thin-oxide single threshold voltage transistors. It is important to realize that while this system features sufficient complexity (20k standard cells), it also functions on its own without the use of duplicates or multiple systems stacked on top of each other. The system is stacked at the block (IP) level, where level shifters interface between the power domains. To the best of our knowledge, this is the first work where such a scheme is implemented on heterogeneous components in a scalable manner, fully compatible with the conventional digital design flow. Second, while previous voltage stacking implementations were fixed and could not be configured into an alternative operating mode with voltage stacking off, this work features a reconfigurable chip where the system can work with voltage stacking turned either on or off. This way, the benefits of voltage stacking can explicitly be measured on the same chip, which was not done in previous works. Third, previous level shifter implementations were typically impractical for high complexity systems and were not standard cell compatible, while here a novel level shifter is presented that does not rely on exotic elements like capacitors and thick-oxide devices and can be densely laid out along with the standard cells.
The testchip not only fulfills the above-mentioned features, but also the benefits from voltage stacking are significant. While voltage stacking was off, the maximum power efficiency was 81%, which improved to 96% for the case when voltage stacking was enabled. Similarly, the converter area has also reduced and the supply quality has improved.
II. SYSTEM DESIGN
The primary goal of voltage stacking is to reduce the amount of power processed by the VRs, which brings both power and area benefits. While the area of the converter can be estimated from its type and its output current, the power benefit has to be calculated. If a 100% efficient power delivery system existed, voltage stacking would not confer power benefits, since it does not influence the power consumption of the load circuitry itself but only reduces the losses of the power conversion step. The power efficiency of the stacked system can be calculated as the ratio of the output to input powers, as illustrated in Fig. 1(b). At the input, the current I flows into the upper domain supply rail, while I in,VR flows into the VR input, both at 2V DD. At the output, the current I is the same as at the input, while the VR output delivers I out,VR at V DD. With the VR efficiency η VR taken as the ratio of the VR output power V DD · I out,VR to the VR input power 2V DD · I in,VR, this adds up to a combined system efficiency η sys, which is also derived in Fig. 1
\[ \eta_{sys} = \frac{P_{out}}{P_{in}} = \frac{2V_{DD} I + V_{DD} I_{out,VR}}{2V_{DD} I + 2V_{DD} I_{in,VR}} = \frac{2I + \Delta I}{2I + \Delta I / \eta_{VR}} \tag{1} \]
where I out,VR = ΔI is the difference current between the top and bottom domains that needs to be supplied by the VR. In the limiting case when I = 0, voltage stacking is off since there is no charge recycling between the power domains and η sys = η VR, while when ΔI = 0, there is perfect voltage stacking with no current mismatch between the power domains and η sys = 100%. Since the converter size is also proportional to its output current ΔI, the ultimate goal while designing a voltage stacked system is to minimize ΔI. The illustration of (1) for various VR efficiencies can be seen in Fig. 2. While (1) seems simple enough to optimize, there are several secondary effects that need to be taken into account. The first effect is that the efficiency of the VR depends on the output current ΔI. On one hand, according to (1), the higher the current imbalance, the more current ΔI is sourced from the VR, increasing the proportion of the current that suffers from conversion losses. On the other hand, this is limited by the output current dependence of the VR efficiency: η VR(ΔI) is itself a function of ΔI rather than a constant. The second mechanism that influences the efficiency is the additional power consumption of the level shifters, which is not represented in (1). Finally, the output impedance of the VR drops the output voltage by R out · ΔI, which scales the voltages applied to the bottom and top power domains and influences their power consumption.
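To make the behavior of the system efficiency concrete, the following sketch evaluates it numerically. It assumes that (1) takes the form η sys = (2I + ΔI)/(2I + ΔI/η VR), which follows from the input and output powers described above with all terms normalized to V DD; the function name is hypothetical. The VR efficiency of 81% and the 1.75× worst-case current ratio are the values reported in the experimental results, while treating the VR efficiency as a constant is a simplification, since the secondary effects discussed above are ignored.

```python
def system_efficiency(i_shared, delta_i, eta_vr):
    """Efficiency of a two-domain stacked system, normalized to V_DD.

    i_shared : current I recycled through both domains
    delta_i  : magnitude of the mismatch current handled by the VR
    eta_vr   : VR efficiency, assumed constant here (a simplification)
    """
    if i_shared < 0 or delta_i < 0 or not 0 < eta_vr <= 1:
        raise ValueError("currents must be non-negative and 0 < eta_vr <= 1")
    p_in = 2.0 * i_shared + delta_i / eta_vr
    if p_in == 0:
        raise ValueError("no power is drawn")
    p_out = 2.0 * i_shared + delta_i
    return p_out / p_in

ETA_VR = 0.81  # peak SCVR efficiency reported in the measurements

# Limiting cases: no recycled current, and perfectly matched domains.
eta_no_recycling = system_efficiency(0.0, 1.0, ETA_VR)
eta_matched = system_efficiency(1.0, 0.0, ETA_VR)

# Worst-case ratio from the measurements: I_BOT = 1.75 * I_TOP,
# hence I = I_TOP and delta_I = 0.75 * I_TOP.
eta_worst = system_efficiency(1.0, 0.75, ETA_VR)

print(f"no recycling: {eta_no_recycling:.3f}")
print(f"matched:      {eta_matched:.3f}")
print(f"1.75x ratio:  {eta_worst:.3f}")
```

Even at the largest reported current ratio, this idealized model stays well above the standalone VR efficiency, because most of the current bypasses the conversion step.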
Based on these considerations, a practical implementation of voltage stacking should, on one hand, benefit in power and area, while on the other hand it should be simple enough to be implemented with minimum overhead in design effort. In the following, we present our voltage stacked microcontroller system, where we focused on maximizing gains with only small modifications to the otherwise complex standard design flow. Fig. 1(a) shows a conventional power delivery scheme where the power domains are connected in parallel and are supplied by the same VR. The first power domain in our case incorporates most of the logic of the system, while the second power domain contains the memory. The voltage stacking scenario, which has beneficial power efficiency over the traditional approach, is shown in Fig. 1(b). In previous works, the benefit of voltage stacking has been largely implicit and relied on many assumptions. Therefore, we transform the system into a reconfigurable architecture where the first power domain containing the logic is fixed between 0 V and V DD, while the second power domain with the memory is reconfigurable to be either between 0 V and V DD or between V DD and 2V DD. The same system is compared in two power modes, "stacking off" and "stacking on," as in Fig. 1(a) and (b), respectively.
A. Power Delivery
In the "stacking on" mode, the memory becomes the top power domain and the logic becomes the bottom power domain. The current that is drawn by the memory from the external 2V DD supply is routed from the V DD ground of the top power domain to the V DD supply rail of the bottom power domain, which means that this current is directly sourced from the 2V DD supply without conversion losses. The SCVR regulating the V DD supply now has to be capable of supplying the system in both power modes.
The transition between power modes should be as seamless as possible. The system should not require a powerup or even a reset after stepping from stacking off to stacking on and back. The whole procedure should only require the clock of the system to be stopped and then resumed.
B. System Architecture
Fig. 3 depicts the architecture of the complete microcontroller system. The bottom power domain incorporates the core and the peripherals and the top power domain incorporates the memory. The top power domain is placed in a deep n-well so that its supply and ground voltages can be arbitrarily set. With either stacking on or off, the system requires an external supply, V IN = 2V DD, with V OUT = V DD generated by the SCVR. The two power domains communicate via level shifters in the stacking on mode and through regular buffers acting as bypass circuitry in the stacking off mode.
The Cortex-M0+ core accesses the ROM and SRAM through the advanced high-performance bus (AHB), which can also be controlled through the serial wire (SW) interface. Next to the general-purpose IO, the AHB is connected to the advanced peripheral bus, which provides access to various peripherals like the Universal Asynchronous Receiver/Transmitter (UART) module, a 32-b timer, and the clock generation unit. The Instruction SRAM (ISRAM) can be programmed through the SW interface. The choice of placing memory on top of logic was motivated, on one hand, by the system power composition for the typical testcase of Fig. 4. As can be seen from Fig. 4, memory and logic power consumption is well matched; furthermore, there is no scenario in which one is completely off: in active mode, the core needs to read instructions from the instruction SRAM every clock cycle, toggling several nodes in both power domains and securing a well-balanced power consumption scenario. While more refined approaches that exploit high granularity partitioning are possible, our heuristic approach produced good results with minimum design effort, which is important in typical applications. On the other hand, the system's overhead needs to be taken into account. The number of level shifters and the layout partitioning cost limit the partitioning efforts. From this perspective, simplicity is important, e.g., a minimum number of level shifters and power domains. When counting the number of level shifters, the connections to the IO pads also need to be included. This is why the memory was placed in the top domain and the logic in the bottom one, and not the other way around. No IO pad is directly connected to the memories, so this stacking order ensured that no level shifter was needed to interface to the IO pads, which reduced the number of level shifters.
C. Level Shifter
As shown before, the level shifter is critical to the effectiveness of voltage stacking: it must offer minimum timing, power, and area overhead. The level shifter needs to be standard cell compatible and laid out in a dense manner. These requirements necessitate the use of thin-oxide devices, which in turn raises questions about reliability. The 2V DD voltage that is the maximum voltage drop across the level shifter is usually harmful for thin-oxide devices, so special care has to be taken to ensure that these devices are not subjected to voltage overstress. Furthermore, the extreme voltage shift between the input and output is a challenge since there is barely any overlap between the voltages of the input and the output. The signal voltages within the level shifter have to be high enough to toggle a change and low enough to avoid the problem of voltage overstress.
Meeting the aforementioned requirements is not possible with most conventional level shifters. Commonly, the applications that require these kinds of extreme level shifters are floating-voltage HV drivers, e.g., boost converter control circuitry as in [12]. In these applications, however, thick-oxide devices are used. Thin-oxide devices are used in [6] for voltage stacking purposes; however, these cells were still heavily dependent on the ratio of the input to output devices and need a large capacitor to operate.
The proposed level shifters are compatible with standard digital cells and are entirely made of thin-oxide devices. The up- and down-level shifter instances are depicted in Fig. 5(a) and (b), respectively. They are fully static and can be densely laid out, enabling a standard cell-based design. Illustrating the operation on the up-level shifter, the input signal is buffered by two inverters that, shielded by four clamp transistors, control a pMOS latch, whose output swings from rail to rail without floating nodes, due to the pMOS pulldown transistors. To protect these devices from overvoltage, the pMOS devices have been placed in a so-called "hot" n-well, which limits the voltages between two arbitrary terminals of the device to V DD.
The up-level shifter operates as follows. The input signal propagates through the I1 and I2 inverters, which buffer it and convert the single-ended input into a differential signal. Being differential, either I1 or I2 assumes a low-level value, and the corresponding transistors (either the M1-M5 or the M2-M6 pair) open and pull down the appropriate node in the pMOS latch. Having pulled down one node of the latch, either M8 or M7 opens and pulls up the other node, producing a differential signal, which is buffered by I3 at the output.
From the point of view of possible voltage overstress, devices M1-M8 need to be carefully analyzed. In the following, V DSGB denotes the drain, source, gate, and bulk voltages of a device. If the output of I2 is at 0 V, M2 is open and pulls down the drains of M4 and M6 to 0 V. In turn, M4 and M6 pull down the gate of M7 to the VSS TOP voltage (1.1 V). The reason why it is not pulled further is that M6 connects the gate of M7 to VSS TOP and this turns off M4, which also has VSS TOP at its gate, and therefore, V GS (M4) becomes 0 V. We can now observe that M2 has V DSGB = (0, 0, 1.1, 0) V, while M7 has V DSGB = (2.2, 2.2, 1.1, 2.2) V, and therefore, no overvoltage occurs. For M4, V DSGB = (0, 1.1, 1.1, 1.1) V holds, and for M6, V DSGB = (1.1, 1.1, 0, 1.1) V. The last concern is transistor M1, since M5 and M3 pull up its drain to 2.2 V. However, the voltage across M1 still does not exceed 1.1 V since I1 keeps its source at the VDD level: V DSGB = (2.2, 1.1, 1.1, 1.1) V. The behavior of the down-level shifter cell is similar to that illustrated for the up-level shifter cell, and the degradation mechanisms for both cells have been simulated with industry standard aging simulation tools. It has been found that the lifetime of the level shifters met the standard requirements.
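The terminal voltages listed above can be checked mechanically. The sketch below takes each quoted voltage tuple, read here as (drain, source, gate, bulk), and computes the largest voltage difference between any two terminals; the 1.1-V limit follows the hot-well criterion stated earlier, and the helper name and the small numerical tolerance are assumptions of this example.

```python
from itertools import combinations

V_DD = 1.1   # nominal supply in volts, also used as the stress limit
TOL = 1e-9   # tolerance for floating-point rounding (assumed)

# (drain, source, gate, bulk) voltages quoted in the text, in volts.
V_DSGB = {
    "M1": (2.2, 1.1, 1.1, 1.1),
    "M2": (0.0, 0.0, 1.1, 0.0),
    "M4": (0.0, 1.1, 1.1, 1.1),
    "M6": (1.1, 1.1, 0.0, 1.1),
    "M7": (2.2, 2.2, 1.1, 2.2),
}

def max_terminal_stress(voltages):
    """Largest absolute voltage difference between any two terminals."""
    return max(abs(a - b) for a, b in combinations(voltages, 2))

stress = {name: max_terminal_stress(v) for name, v in V_DSGB.items()}
overstressed = [name for name, s in stress.items() if s > V_DD + TOL]

for name, s in stress.items():
    print(f"{name}: max terminal-to-terminal voltage = {s:.2f} V")
print("overstressed devices:", overstressed)
```

For the listed operating point, every device sees at most V DD between any pair of terminals, so the list of overstressed devices is empty. The check covers only this static state, not the transitions between states.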
To enhance the operation, hot wells are used for transistors M1-M6. I3, M7, and M8 are placed in a triple well so that the nMOS body bias voltage equals V DD. When voltage stacking is off, the level shifter is in the off state and the transistors M3-M8 receive a supply that is lower by 1 × V DD, so that the M7 source is connected to V DD while the M5 drain is connected to 0 V, and so on. The sizing of the transistors in the design is important: I1N, I2N, M1, and M2 need to be about 5× larger than the rest of the devices. The up- and down-level shifters were implemented with high threshold voltage devices due to process limitations that did not allow multiple threshold voltage design.
D. Bypass
To ensure correct operation with voltage stacking on or off, the level shifters in Fig. 5 must be bypassed in the conventional power mode when voltage stacking is off. To achieve this, the scheme in Fig. 6 is proposed, which shows the implementation for up-and down-level shifters, respectively. The signal path is split into two paths: 1) the first path with a level shifter and 2) the second bypassing the level shifter. The two paths are selected with demultiplexers and multiplexers. The demultiplexers are realized with isolation cells that select and drive the path that is active in the given power mode (stacking on and stacking off). The multiplexer at the output of the level shifter then selects the active path and forwards it to the output.
The bypass circuit operates without an external control signal. Staying with the example shown in Fig. 6(a), when voltage stacking is on, the memory ground node VSS TOP is at V DD and the memory supply VDD TOP is at 2V DD. This enables the AND isolation gate and disables the OR isolation gate, since one of their inputs is connected to VSS TOP. The OR gate output settles at 1.1 V, not causing voltage overstress in the multiplexer. Therefore, the level shifter is activated and produces an output signal between V DD and 2V DD. The multiplexer receives the VDD node at its select input, which corresponds to a logic 0 since the ground node VSS TOP is at V DD, the same voltage as the VDD node. Therefore, the "SEL = 0" input is selected, which is that of the level shifter. When voltage stacking is turned off, the memory ground node VSS TOP is at 0 V and the memory supply VDD TOP is at V DD. This disables the AND isolation gate and enables the OR isolation gate, activating the bypass path and disabling the level shifter. The multiplexer selects the "SEL = 1" input since its ground voltage is 0 V and the select input is hooked up to the VDD node. The output of the multiplexer copies the bypass signal.
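The self-selecting behavior described above can be summarized in a small behavioral model. This is a logic-level sketch only, not the circuit of Fig. 6: the function name, the returned field names, and the mid-rail threshold used to interpret a voltage as a logic level are assumptions of this example.

```python
V_DD = 1.1  # nominal supply in volts

def bypass_state(stacking_on):
    """Behavioral view of the up-shifting path for a given power mode."""
    # Rails of the top (memory) domain in the two power modes.
    vss_top = V_DD if stacking_on else 0.0
    vdd_top = vss_top + V_DD

    # Isolation gates live in the bottom domain (0 V .. V_DD) and read
    # VSS_TOP as a logic level; a V_DD/2 threshold is assumed here.
    vss_top_is_high = vss_top > V_DD / 2
    and_gate_enabled = vss_top_is_high      # drives the level shifter path
    or_gate_enabled = not vss_top_is_high   # drives the bypass path

    # The multiplexer lives in the top domain and sees the VDD node
    # relative to its own ground, VSS_TOP.
    sel = 1 if (V_DD - vss_top) > V_DD / 2 else 0
    selected = "bypass" if sel == 1 else "level_shifter"

    return {
        "vss_top": vss_top,
        "vdd_top": vdd_top,
        "and_gate_enabled": and_gate_enabled,
        "or_gate_enabled": or_gate_enabled,
        "sel": sel,
        "selected": selected,
    }

state_on = bypass_state(stacking_on=True)
state_off = bypass_state(stacking_on=False)
print("stacking on: ", state_on)
print("stacking off:", state_off)
```

In both modes the enabled isolation gate and the multiplexer selection agree, which is what allows the path selection to follow the supply rails without a dedicated control signal.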
E. Switched Capacitor Voltage Regulator
The task of the VR is to regulate the mid node that serves as the supply rail of the top power domain as well as the ground of the bottom power domain. It has to be sized for the worst case current consumption so that the midnode is always kept in the desired voltage level.
Just like [9], we chose an SCVR since it provides superior efficiency compared with a linear regulator, and a higher regulator efficiency means that there is less room for power saving for voltage stacking, making the comparison with conventional power delivery more realistic. Since the system is composed of two power domains stacked on top of each other, the voltage halving architecture was chosen for its simplicity, high efficiency, and low area, as shown in Fig. 7. The concept of the voltage halving SCVR is simple: a so-called flying capacitor swaps places between the input and the output. There are three main nodes present in the converter: the input (V IN), the output (V OUT), and ground (V SS). In phase 1, which corresponds to clock phase φ1, the capacitor is connected between V IN and V OUT, while in phase 2, it is connected between V OUT and V SS. Since the charge stored on the capacitor in each phase corresponds to the voltage present, over time there is a net charge transfer between V IN and V OUT, which reaches equilibrium once the voltage of V OUT is half that of V IN. This charge transfer works both ways, and thus both positive and negative excess current can be handled by the SCVR, making it suitable for voltage stacking.
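The equilibrium at half the input voltage, and the ability to both source and sink current, can be reproduced with an idealized charge-conservation model. The sketch below assumes ideal switches, a lumped output capacitor, and a constant load current; the function name, the 1-nF output capacitance, the 10-MHz clock, and the 0.1-mA load are illustrative assumptions, while the 41-pF flying capacitance corresponds to the 2 × 20.5 pF of one SCVR instance.

```python
def simulate_halver(v_in, c_fly, c_out, f_sw, i_load=0.0, cycles=2000):
    """Ideal two-phase 2:1 switched-capacitor converter.

    Returns V_OUT at the end of the last cycle. A positive i_load is
    sourced by the converter, a negative i_load is sunk by it.
    """
    v_out, v_fly = 0.0, 0.0
    dq = i_load / (2.0 * f_sw)  # load charge removed per half period
    c_sum = c_out + c_fly
    for _ in range(cycles):
        # Phase 1: flying capacitor between V_IN and V_OUT.
        # Charge conservation on the V_OUT node sets the new voltage.
        v_out = (c_out * v_out + c_fly * (v_in - v_fly) - dq) / c_sum
        v_fly = v_in - v_out
        # Phase 2: flying capacitor between V_OUT and V_SS (0 V).
        v_out = (c_out * v_out + c_fly * v_fly - dq) / c_sum
        v_fly = v_out
    return v_out

V_IN = 2.2     # 2 * V_DD, in volts
C_FLY = 41e-12 # flying capacitance of one instance, in farads
C_OUT = 1e-9   # assumed output (decoupling) capacitance, in farads
F_SW = 10e6    # assumed switching frequency, in hertz
I_LOAD = 1e-4  # assumed load current magnitude, in amperes

v_no_load = simulate_halver(V_IN, C_FLY, C_OUT, F_SW)
v_source = simulate_halver(V_IN, C_FLY, C_OUT, F_SW, i_load=+I_LOAD)
v_sink = simulate_halver(V_IN, C_FLY, C_OUT, F_SW, i_load=-I_LOAD)

# Droop predicted by an output impedance of 1 / (4 * f_sw * C_fly).
r_ssl = 1.0 / (4.0 * F_SW * C_FLY)
print(f"no load:  {v_no_load:.4f} V")
print(f"sourcing: {v_source:.4f} V (expected {V_IN / 2 - I_LOAD * r_ssl:.4f} V)")
print(f"sinking:  {v_sink:.4f} V (expected {V_IN / 2 + I_LOAD * r_ssl:.4f} V)")
```

Under these assumptions the unloaded output settles at V IN/2, and the loaded output deviates symmetrically for sourced and sunk current, by an amount consistent with an output impedance of 1/(4 f sw C).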
The exact circuit implementation of the voltage halver is shown in Fig. 7 . There is a clock signal needed to control phases 1 and 2, which needs to be level shifted for the switches that are connecting the flying capacitor to V IN and V OUT . For this purpose, the level shifter discussed in Section II-C is used. Furthermore, the switches cannot have overlap since that would result in undesired short-circuit current, and therefore, the switch control signals that are derived from the clock signal need to be nonoverlapping. To achieve this, the clock signal is converted to a nonoverlapping clock signal pair through a separate circuit block. The control signals on the top and the bottom of Fig. 7 always have some latency compared with each other. Due to the synchronicity of the signals, the overlap margin was chosen to be high since the level shifter delay is hard to account for over several process-voltage-temperature (PVT) corners. This high timing margin meant that the clock signal had a maximum frequency of 20 MHz, which could not be exceeded.
To benefit from high efficiency, the flying capacitor and the switches have to be sized so that the switching and the conduction losses are in balance and neither of them limits the peak efficiency considerably. During the design stage, a total of 656-pF flying capacitance was available in the form of 2 × 20.5 pF accumulation PMOS capacitors per SCVR instance, and the maximum switching frequency was set to 25 MHz. According to the Seeman model [14], this meant a minimum output impedance with ideal switches (slow switching limit) of about R SSL = 1/(4 f sw C) ≈ 19.1 Ω. Considering a maximum of 7.5% allowed average output voltage drop (82.5 mV) and a 3-mA maximum output current, the maximum allowed output impedance is 27.5 Ω. Adding to this the effect of finite switch conductance, the goal was to stay under a 25-Ω output impedance, which indicated about 8.1-Ω switch impedance based on [14].
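The sizing numbers above can be retraced with a short calculation. The sketch rests on several assumptions that are not stated in the text: the total output impedance is approximated as the root-sum-square of the slow and fast switching limits, the fast switching limit of a 2:1 converter with four equal switches at 50% duty cycle is taken as twice the switch resistance, and a switching frequency of 20 MHz (the clock limit mentioned in the previous paragraph) is used, since that value reproduces the quoted 19.1 Ω. The function and variable names are hypothetical.

```python
import math

def r_ssl_halver(f_sw, c_fly):
    """Slow switching limit of a 2:1 converter: 1 / (4 * f_sw * C)."""
    return 1.0 / (4.0 * f_sw * c_fly)

V_DD = 1.1                        # nominal output voltage, in volts
C_FLY_TOTAL = 16 * 2 * 20.5e-12   # 16 instances x 2 x 20.5 pF = 656 pF
F_SW = 20e6                       # assumed frequency for this estimate
I_OUT_MAX = 3e-3                  # maximum output current, in amperes
R_OUT_TARGET = 25.0               # targeted output impedance, in ohms

# Budget: 7.5% average output voltage drop at the maximum current.
v_drop_max = 0.075 * V_DD
r_out_max = v_drop_max / I_OUT_MAX

# Slow switching limit set by the flying capacitance.
r_ssl = r_ssl_halver(F_SW, C_FLY_TOTAL)

# Assumed combination rule: R_out^2 = R_SSL^2 + R_FSL^2.
r_fsl = math.sqrt(R_OUT_TARGET ** 2 - r_ssl ** 2)

# Assumed fast switching limit for four equal switches: R_FSL = 2 * R_sw.
r_switch = r_fsl / 2.0

print(f"allowed drop:        {v_drop_max * 1e3:.1f} mV")
print(f"max output impedance: {r_out_max:.1f} ohm")
print(f"R_SSL:               {r_ssl:.1f} ohm")
print(f"R_FSL:               {r_fsl:.1f} ohm")
print(f"switch resistance:   {r_switch:.1f} ohm")
```

With these assumptions, the calculation lands close to the quoted 19.1-Ω slow switching limit and 8.1-Ω switch impedance, leaving some margin between the 25-Ω target and the 27.5-Ω budget.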
There were in total 16 SCVR instances implemented. To adjust for the output current requirements between stacking on and off modes, ten instances could be turned off for six active instances in stacked mode, while all the 16 instances were active when voltage stacking was off. Further, the efficiency is enhanced by not interleaving the regulator instances, similar to the approach in [15] . This meant connecting the SCVR instances in a daisy chain configuration, one 6× and one 10× chain, as shown in Fig. 7 . The reduction in charge sharing by connecting the capacitor instances in series increased the SCVR efficiency at the cost of higher voltage ripple. Just like the level shifter cells, overvoltage has to be avoided for the devices, since they are implemented with traditional thin-oxide devices. To ensure proper operation upon startup, parallel to the power switches, startup helper circuits were implemented to bring the flying capacitor's voltages to a known state. This ensures that the thin-oxide flying capacitor is not exposed to a higher voltage than 1.21 V (V DD + 10%), which could cause overstress and degrade the device. To ensure proper operation, the powerup of the SCVR is performed in a stepwise fashion. First V IN is brought to V DD voltage, then V OUT also receives V DD voltage, and finally V IN is ramped up to 2V DD voltage. 
F. Experimental Results
The testchip has been fabricated in a CMOS 40-nm process. The micrograph can be seen in Fig. 8. The total area amounts to 1.49 mm², of which the area of the bottom power domain is 0.077 mm² and that of the top power domain is 0.113 mm², while the level shifter instances occupy 0.028 mm².
Since voltage stacking claims to improve the VR efficiency, it is first important to look at the SCVR itself to characterize its behavior under various load conditions. Therefore, the SCVR efficiency was measured separately under different load currents. The results can be seen in Fig. 9. It is important to note that while with voltage stacking off the output current I OUT is always positive, i.e., it is sourced from the SCVR, in stacked mode the current can be either positive or negative. The sign is determined by the current consumption of the power domains. If the bottom power domain consumes more, then I OUT will be positive, meaning that the useful power is consumed between V OUT and V SS. However, if the top power domain consumes more, I OUT will take a negative sign as the current has to be sunk by the converter. In this latter case, the useful power is consumed between V IN and V OUT. As can be seen from Fig. 9, the SCVR has a similar efficiency profile for both current sink and source scenarios, which is expected since the capacitors are mostly symmetric with respect to the sign of the charge transfer. The peak efficiency achieved in both cases was 81%. At low load current I OUT, the switching losses of the converter start to dominate since they represent a load current independent portion of the losses. The switching losses depend on the switching frequency of the converter, which was fixed in this experiment. The measurements were taken at room temperature on a typical sample.
After characterizing the SCVR, the next step was to measure the current consumption that can be expected from the two power domains. Over 400 testcases were examined, where the current consumption of the top and the bottom power domains were compared. The results in Fig. 10 show that the logic in extreme cases consumes 1.75× more current than the memory, while on average the mismatch is much smaller. The power consumption of the two power domains is well correlated since the MCU core has to read instructions from the memory every clock cycle, and there is no scenario in which one of the power domains is inactive.
The timing and power overhead from the level shifting must be minimized to keep voltage stacking beneficial for various systems. In the current testchip, the timing impact from the level shifting was a 1.5-ns added delay to the critical path between the memory and the logic, which permitted only an 80-MHz operation in the stacked mode instead of 100 MHz, which was used with stacking off. Further, the power penalty from the level shifters can be captured with the difference in the power drawn by the system from the voltage conversion stage with stacking on and off. It has been found that on average the output power of the conversion stage is 8% higher in the stacked mode, which is partially the power overhead of the level shifting; however, the total power consumed by the entire system was still lower, as can be seen in the following, due to the higher efficiency of the conversion stage.
Having characterized the SCVR and the digital MCU part separately, the next step is to operate them simultaneously with voltage stacking on and off and quantify the benefits of voltage stacking. The system power efficiency was measured across the 400 testcases in Fig. 11. In the stacked mode, the SCVR needs to provide 4× less maximum output current, and the system achieves a 96% efficiency, a 15-percentage-point improvement over the SCVR peak efficiency of 81%, despite the fact that the SCVR becomes less efficient at low load currents. The number of SCVR instances and their switching frequency were determined separately for the stacking on and stacking off modes, keeping the criterion that their output voltage drop and efficiency have to be aligned as much as possible. After calibration, it was found that when stacking is off, using 16 SCVR instances at 10 MHz resulted in a similar output voltage and a similar SCVR standalone efficiency as six SCVR instances at 5 MHz in the stacked mode.
Fig. 12 shows a reconfiguration scenario where the system is in transition between the two power modes. The supply and ground rails switch simultaneously so that the memory content is preserved, and after reenabling the clock, the program execution can be resumed. The clock has to be disabled before the supply rails change, and similarly, some minimum time is required before reenabling the clock after the supply transition. The transition did not take longer than 0.5 μs.
Zooming in on the supply waveforms in Fig. 13, the waveforms were compared for the cases of voltage stacking on and off under active load. With stacking off, 16 instances of the SCVR, switched at 10 MHz, are used to process the current I BOT + I TOP, while with stacking on, only six SCVR instances are turned on at 5 MHz to process the current I BOT − I TOP. Turning on voltage stacking resulted in a 3.4-dB supply ripple reduction, and the voltage drop of the SCVR decreased from 58 to 36 mV, in accordance with [16]. Due to the smaller current that is processed in the stacked mode, even a weaker VR can produce a cleaner supply, relaxing the tough constraints that the scaled CMOS process places on the power delivery system.
Summarizing the results of this paper, Table I shows a comparison of the results of this paper with those of the previously reported works on voltage stacking. In the comparison, this is the first IC that implements voltage stacking within a practical MCU system. Despite using bulk CMOS process, higher efficiency can be obtained using voltage stacking, compared with converters implemented with deep trench capacitors. Another important point to note is that no previous work features a comparison of both conventional and voltage stacking power delivery for the same system on the same die. Here, we present a comparison for the voltage stacking on and off modes. The efficiency has improved from 81% to 96%, while the power density increased threefold.
The benefit of voltage stacking can be further understood by comparing the efficiency and power density with those of other converters. In Fig. 14, the x-axis shows the power density, while the y-axis shows the efficiency of various SCVR implementations, ranging from bulk through SOI to processes that allow deep trench capacitors. It can be observed that for the same technology, there is a tradeoff between efficiency and power density: high-efficiency converters typically have poor power density, and high-power-density converters have poor efficiency. The same limitation applies to the SCVR implemented in this paper; with voltage stacking, however, the tradeoff can be bypassed, and both efficiency and power density improve. In the comparison, only advanced processes achieve the same kind of benefit that voltage stacking delivers in standard bulk CMOS.
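The notion of bypassing the tradeoff can be stated as a dominance check: one operating point dominates another when it is at least as good on both axes and strictly better on at least one. The sketch below applies this to the stacking-on and stacking-off figures reported in this paper, with the power density normalized to the stacking-off case (an assumption made for illustration). The third point is hypothetical and represents a converter that trades power density for efficiency.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    efficiency: float      # fraction, 0..1
    power_density: float   # normalized to the stacking-off case (ASSUMPTION)


def dominates(a: OperatingPoint, b: OperatingPoint) -> bool:
    """True if `a` is no worse than `b` on both axes and better on at least one."""
    no_worse = a.efficiency >= b.efficiency and a.power_density >= b.power_density
    strictly_better = a.efficiency > b.efficiency or a.power_density > b.power_density
    return no_worse and strictly_better


# Reported: 81% -> 96% efficiency and a threefold power density increase.
stacking_off = OperatingPoint(efficiency=0.81, power_density=1.0)
stacking_on = OperatingPoint(efficiency=0.96, power_density=3.0)

# Hypothetical converter that gains efficiency by giving up power density.
hypothetical_tradeoff = OperatingPoint(efficiency=0.90, power_density=0.5)

print("stacking on dominates stacking off:", dominates(stacking_on, stacking_off))
print("tradeoff point dominates stacking off:", dominates(hypothetical_tradeoff, stacking_off))
```

A point that only moves along the tradeoff curve, like the hypothetical one, does not dominate the baseline, whereas the stacking-on point does.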
III. CONCLUSION
A microcontroller system with voltage stacking has been presented. For the first time, voltage stacking has been applied to a practical system. Unlike previous works, the implementation did not focus on the stacking of simple circuit blocks or of larger but independent systems; instead, a realistic system was chosen as a demonstration vehicle for the benefits of voltage stacking. Furthermore, for the first time, voltage stacking could be turned off on the same system, making the benefits of voltage stacking directly quantifiable.
The benefits of voltage stacking are as follows. The system power efficiency improved from 81% with voltage stacking off to 96% with voltage stacking on, while using five times less effective power converter area. All this was achieved in a bulk CMOS process with a fully on-chip solution. Moreover, despite the use of a much weaker converter, the supply ripple was reduced by 3.4 dB and the voltage drop decreased from 58 to 36 mV. As future work, the technique can be extended to an arbitrary system, provided that the partitioning of the design is automated.
In 2014, he joined NXP Semiconductors, Eindhoven, The Netherlands, as a Senior Scientist, where he was involved in low power techniques for microcontrollers. His current research interests include energy-efficient digital systems and power conversion methods.
Ajay Kapoor received the B.Tech. degree in electrical engineering from IIT Delhi, New Delhi, India, and the M.Sc. degree (cum laude) in embedded systems from the University of Twente, Enschede, The Netherlands.
Since 1999, he has been with Philips (now NXP) Semiconductors, Eindhoven, The Netherlands, working on the design of low-power circuits, systems, and algorithms. His current research interests include wireless energy-based systems, low-power circuits, architectures, and signal processing.
Arjun Majumdar, photograph and biography not available at the time of publication.
He is an Innovation Lead or the Department Manager at NXP Semiconductors. He has authored or co-authored several scientific publications and presentations, and holds more than 25 U.S. patents. His current research interests include low-power design, multiprocessors, heterogeneous and reconfigurable systems, and variability tolerance design.
He was a Faculty Member with the Department of Electrical Engineering, Texas A&M University, Texas, USA. Currently, he is a Fellow at NXP Semiconductors. He also holds the professorship "Resilient Nanoelectronics" with the Department of Electrical Engineering, Eindhoven University of Technology. He has co-authored more than 150 publications in the fields of testing, nonlinear circuits, and low power design. He has also co-authored four books and holds a number of granted patents.
Dr. de Gyvez has been an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS PART I and PART II and of Technology for the IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING. He is also a member of the Editorial Board of the Journal of Low Power Electronics.
