On-die adaptive power regulation and distribution for digital loads by Gangopadhyay, Samantak

of the Requirements for the Degree
Doctor of Philosophy in the
School of Electrical and Computer Engineering
Georgia Institute of Technology
December 2017
Copyright © Samantak Gangopadhyay 2017
ON-DIE ADAPTIVE POWER REGULATION AND DISTRIBUTION FOR
DIGITAL LOADS
Approved by:
Dr. Arijit Raychowdhury, Advisor
School of Electrical and Computer
Engineering
Georgia Institute of Technology
Dr. Sudhakar Yalamanchili
School of Electrical and Computer
Engineering
Georgia Institute of Technology
Dr. Hua Wang
School of Electrical and Computer
Engineering
Georgia Institute of Technology
Dr. Visvesh Sathe
School of Electrical Engineering
University of Washington
Dr. Keith A. Bowman
Qualcomm Technologies, Inc.
Raleigh, NC
Dr. Rahul M. Rao
IBM India Private Ltd
Bangalore
Date Approved: October 27, 2017
If everything seems under control, you are not going fast enough.
Mario Andretti
To Sanjana, For her support, her patience, her faith. Because she always understood.
ACKNOWLEDGEMENTS
Firstly, I would like to express my sincere gratitude to my advisor, Prof. Arijit
Raychowdhury, without whose support and guidance during my entire Ph.D. journey this thesis
would not have been possible. Your knowledge, creativity and logical viewpoints have
helped and inspired me at every step of this journey. Thank you for being the best
teacher, mentor and advisor I have ever had.
Besides my advisor, I would like to thank the rest of my reading committee: Prof.
Sudhakar Yalamanchili and Prof. Hua Wang for their insightful comments, encouragement,
and questions which helped me to widen my perspective and enhance my understanding of
my research area.
My sincere thanks go to Dr. Keith Bowman, who provided me an opportunity to
join his team as an intern. Working with you not only gave me the chance to learn about
technical details but also taught me valuable life lessons of discipline, focus and creativity.
I would like to thank Dr. Rahul Rao for being one of my first mentors in the professional
world. I got my inspiration to start the Ph.D. journey from you and without you I do not
think the path I took would be the same.
Many thanks to my fellow ICSRL labmates for the daily stimulating discussions and
for all the fun we have had in the last few years.
I would like to thank my family: my parents and my sister for supporting me throughout
my life. My father who inspires me every day with his kindness, humility and intelligence.
My mother whose unconditional love has made everything possible. My sister who has
been my best friend, confidant and advisor.
Finally, thank you to my wife for bearing this entire journey, every step of the way and
for all the love, encouragement and patience.
TABLE OF CONTENTS
Acknowledgments
List of Tables
List of Figures
Chapter 1: Introduction and Background
Chapter 2: Literature Survey
2.1 Digital Assists/Substitution for Analog Low Dropout Voltage regulator
2.1.1 All Digital LDOs
2.1.2 Hybrid LDOs
2.1.3 Wider Power PFET
2.1.4 Digitally Assisted Analog LDO for DSP application
2.2 Advanced Clock Generation and Distribution Techniques
2.2.1 Adaptive Frequency Control using Supply Voltage Tracking
2.2.2 Adaptive Clock Distribution for Supply Voltage Droop Tolerance
2.2.3 Adaptive Phase Shifting PLL
2.3 Multi Ratio Switched Capacitor Converters
2.3.1 Recursive Switched Capacitor Converter
2.3.2 Tri-output PMU for IoT systems
2.4 Reconfigurable Switched Capacitor Converter
2.4.1 Dual-Symmetrical-Output Switched-Capacitor
2.5 Summary
Chapter 3: Digitally-Assisted Leakage Current Supply (LCS) Circuit
3.1 Introduction
3.2 Test Chip Design
3.3 Test Chip Measurements
3.4 Summary
Chapter 4: All Digital Low Dropout Voltage Regulators
4.1 Introduction
4.2 Discrete Time (DT) Digital LDO
4.2.1 Design Principles
4.2.2 Model Analysis and Adaptation Result
4.3 Continuous Time (CT) Digital LDO
4.3.1 Design Principles
4.3.2 Control Model
4.3.3 Measured Results
4.3.4 Summary
Chapter 5: Unified Voltage and Frequency Regulator (UVFR)
5.1 Introduction
5.2 Design Principles
5.2.1 Overrun Protection
5.2.2 Digital Logic Load
5.2.3 Local Voltage Controlled Oscillator
5.3 Design Analysis
5.3.1 Small Signal Model and behaviour
5.3.2 Large signal behavior
5.3.3 Output Voltage Ripple and Local clock phase noise
5.4 Test Chip and Measurements
5.5 Summary
Chapter 6: Quad-Output Elastic Switched Capacitor Converter
6.1 Introduction
6.2 Architecture, Design Principle of Operation
6.2.1 Quad-output Elastic SCC Design
6.2.2 Extended Binary Bit Switched Capacitor
6.2.3 Per-core LDO
6.2.4 Core and Load Circuit
6.2.5 SCN Clock and Cross-domain Regulation
6.3 Dynamic Dual-loop control and phase allocation via FSM
6.4 Measured Results
6.5 Summary
References
Vita
LIST OF TABLES
3.1 Power management unit (PMU) configurations.
3.2 Comparison between output-pole and internal-pole dominated analog LDOs.
3.3 LDO comparisons for VDO,MIN and FOMs.
4.1 Voltage and current ranges for measurement.
5.1 Comparison with LDOs for voltage regulation.
6.1 Comparison table with other SC topologies.
LIST OF FIGURES
1.1 Static and dynamic variations cause supply voltage noise that can lead to timing fails.
1.2 Voltage regulators for power management in SoC designs.
2.1 Schematic diagram for (a) output pole dominant (b) internal pole dominant analog LDO.
2.2 Schematic for the digitally assisted analog LDO [40].
2.3 Block diagram for adaptive frequency control using supply voltage tracking [44].
2.4 Testchip diagram for Adaptive Clock Distribution (ACD) [47].
2.5 Block diagram for adaptive phase shifting PLL [46].
2.6 Recursive switched capacitor basic cell [50].
2.7 Examples of recursive switched capacitor implementation for 1/4 and 3/8 ratios [50].
2.8 Architecture diagram for the tri-output power management unit for IoT applications. The three SC converters have been encircled through dotted shapes [51].
2.9 Strategy of dynamic power-cell allocation and system architecture of Dual Symmetrical output SC [53].
3.1 Test-chip architecture of a dual core voltage (VCORE) design on a shared voltage rail (VIN).
3.2 Load current versus dropout voltage in an analog LDO.
3.3 Power management unit (PMU) block diagram with header switches (HS), analog LDO and the leakage current supply (LCS) circuit.
3.4 LCS leakage-current-starved ring oscillator (RO) schematic with VCNTL < VTH.
3.5 Three-stage pipeline prototypical core.
3.6 Output-pole dominant two-stage analog LDO schematic and simulated LDO loop gain for heavy and light load conditions.
3.7 Measured oscilloscope captures after a load step for (a) analog LDO, which fails to regulate, and (b) LCS assisted hybrid LDO, which continues to regulate.
3.8 Measured minimum dropout voltage (VDO,MIN) for analog LDO and LCS assisted hybrid LDO as well as LCS assisted hybrid LDO VDO,MIN reduction across temperature versus VIN.
3.9 Measured LCS leakage-current-starved RO frequency (FRO) versus core leakage current (ILEAK) with temperature ranging from 25 °C to 85 °C for each die.
3.10 Measured voltage droop (VDROOP) and effective VCORE versus VREF for a load step from 0.8 mA to 2.8 mA (Dotted Lines: Analog LDO, Solid Lines: LCS assisted hybrid LDO).
3.11 Measured Core1 power versus clock frequency across multiple dynamic voltage-frequency scaling (DVFS) states with Core0 setting VIN. A and C represent the maximum VCORE1 while satisfying VDO,MIN for the analog LDO and LCS assisted hybrid LDO, respectively. B and D represent the switch to HS mode for the analog LDO and LCS assisted hybrid LDO, respectively.
3.12 Measured power supply rejection ratio.
3.13 Measured load regulation.
3.14 Measured current efficiency.
3.15 Test-chip micrograph and characteristics.
4.1 Schematic diagram of a generic discrete time digital LDO.
4.2 Schematic diagram of the ADC stage.
4.3 Generation of the control signals for the barrel shifter corresponding to the ADC outputs.
4.4 Measured current efficiency of Discrete Time Digital LDO.
4.5 Architecture of the Phase locked LDO.
4.6 Design of current starved ring oscillator based VCO.
4.7 JC stage illustrating phase detection and the level-shifting output pass PMOS devices (P1 to P4).
4.8 Schematic diagram of overrun protection (OP) block.
4.9 Instantaneous phase difference (Δφ) created when a transient event changes the output voltage by ΔVOUT. The overrun protection guarantees that the resultant transient phase saturates to 0 on one end and 2π on another.
4.10 Level-shifter (LS) schematic.
4.11 Block diagram of the eight JC stages illustrating operation on both clock edges. Clock gating on each section provides higher efficiency of the control logic.
4.12 Small signal Laplace model illustrating a second order system.
4.13 Simulated Bode plots of the open loop system illustrating a phase margin of (a) 45° at light load (0.625 mA) in red and dashed (b) 98° at heavy load, 10X (6.25 mA) in blue and solid.
4.14 Chip micrograph and characteristics.
4.15 VCO frequency with varying VCTL.
4.16 Chip micrograph and characteristics.
4.17 Measured transient response for switching load current.
4.18 Measured load regulation.
4.19 Effect of VLOGIC on the output settling time.
4.20 Power efficiency vs VLDO (VIN = 0.8 V and ILOAD = 3 mA) for different VLOGIC.
4.21 Integration of Digital LDO with an FFT engine [67]. The memory is powered by the input VCC while the low-power core logic is operated at VccCORE which is generated by the integrated digital LDO.
4.22 (a) Measured FMAX vs VccCORE, when VccCORE is powered externally and when VccCORE is powered through the Digital LDO (b) Measured power of the logic core both with and without the digital LDO.
5.1 Traditional two-loop system for providing voltage and frequency.
5.2 Unified voltage frequency regulator (UVFR) architecture.
5.3 Johnson Counter based multi-phase unified voltage and frequency regulator with a divide ratio of N=1.
5.4 At steady state FREF (R) and FLOC (L) settle down at same frequency with a constant phase difference.
5.5 Overrun protection (OP) to prevent aliasing in large phase errors.
5.6 Timing diagrams of the overrun protection unit. Here R represents FREF, L represents the steady state FLOC and L’ represents FLOC under a transient event. (a) If R=L=1, then R should be held at 1, (b) if R=L=0, then R should be held at 0, (c) if R=0 and L=1 then L should be held at 1, and (d) if R=1 and L=0, then L should be held at 0, to prevent phase aliasing.
5.7 Schematic diagram of overrun protection (OP) block.
5.8 Digital Logic Load with (1) pipeline with EDS and (2) programmable DC load and (3) programmable noise generator.
5.9 Schematics for the (a) TRC-based VCO and (b) level shifter.
5.10 Small signal s-domain model of the UVFR control loop.
5.11 Simulated Bode plots of the open loop system indicating a phase margin of (a) 52° at light load (0.5 mA) in dashed and (b) 89° at heavy load (5 mA) in solid.
5.12 Plot of phase margin (PM) versus load capacitance variation. The PM reduces because the pole moves to lower frequencies as the output capacitance increases.
5.13 Output voltage ripple versus number of interleaving stage at (a) constant load current (b) constant reference frequency.
5.14 Chip micrograph and characteristics.
5.15 Measured results show VREG adapting with (a) temperature, (b) process and (c) aging variations to maintain frequency lock. The process VT = 350 mV and UVFR operates from 0.84 V to 0.27 V.
5.16 Measured oscilloscope capture showing full load step and local clock adapting to VREG changes.
5.17 Measured (a) voltage droop and (b) settling time for varying FREF.
5.18 Measured scope data on high-speed active probe demonstrates that UVFR enables error-free operation even under large voltage droops.
5.19 Measured voltage regulation (VREG) versus reference clock frequency (FREF).
5.20 Measured (a) load regulation and (b) line regulation.
5.21 Measured current efficiency versus load current.
6.1 Detailed top-level structure of the Quad-Output Elastic Switched Capacitor Converter supplying power to 4 cores.
6.2 Block level diagram for QOESC architecture.
6.3 Decision flow chart for resource allocation in QOESC architecture.
6.4 Detailed top-level structure of interleaving and resource sharing scheme loop control.
6.5 Detailed top-level circuit diagram of interleaving and resource sharing scheme loop control.
6.6 Extended binary switched capacitor converter circuit diagram and switch control tables for 3/4, 1/2 and 1/4 ratios.
6.7 Switched Capacitor configuration for N=3/4 based on EXB codes.
6.8 Switched Capacitor configuration for N=1/2 based on EXB codes.
6.9 Switched Capacitor configuration for N=1/4 based on EXB codes.
6.10 KVL equations for the SC configurations for N=3/4, 1/2 and 1/4 (Fig. 6.7, Fig. 6.8, Fig. 6.9).
6.11 Block Diagram of PLDO and the prototype core.
6.12 Timing diagram of dual loop control.
6.13 (a) FSM for resource allocation and flowchart of operation principle (b) Decision flow for resource slice allocation.
6.14 Circuit level implementation of FSM.
6.15 Measured SC power with respect to (a) varying load current (b) varying output voltage.
6.16 Measured scope capture showing boot-up of all the 4 cores using QOESC.
6.17 QOESC internal resistance (ROUT) versus switching frequency.
6.18 Measured output voltage of proposed vs baseline design vs. varying load current shows improvement of 43-64%.
6.19 Measured power efficiency of proposed vs baseline design by varying output power shows increase of 68-90% in efficiency.
6.20 Measured output voltage ripple of proposed vs baseline design for different load current shows improvement of 43-50%.
6.21 Scope capture demonstrating the regulation under load step.
6.22 Power vs frequency for one of the cores showing improved operating range.
6.23 Measured system efficiency shows that proposed design through flexible allocation of resources allows cores to perform at higher power states at consistent efficiency.
6.24 Measured data shows coupling on steady state cores can be reduced by transient boosting.
6.25 Chip micrograph and characteristics.
SUMMARY
The objective of this dissertation is to provide a power architecture solution in which
guardband reduction and consistent performance are the key goals for power delivery
networks in multicore SoCs. The need to maximize energy efficiency without compromising
performance has led to the implementation of fine-grain Dynamic Voltage and
Frequency Scaling (DVFS). However, as DVFS schemes support an ever-increasing number of
supply-frequency operating points, static and dynamic variations result in larger design
guardbands and degrade system power efficiency. The research presented here attempts
to address these concerns through multiple approaches geared towards the different
components in the power delivery network hierarchy. Digital assists for analog LDOs,
a novel approach that integrates the clocking and supply voltage loops, and elastic,
multiple-output switched capacitor networks are discussed, including theoretical models
and measurements from silicon test-chips. For the different techniques discussed in this thesis

CHAPTER 1
INTRODUCTION AND BACKGROUND

With each successive technology generation, the need for low-power circuits and systems
grows. As this requirement grows, the importance of designing for low voltages and wide
dynamic ranges has become indisputable. The need to maximize energy efficiency has led
to fine-grain voltage domains and voltage assignments in multi-core microprocessors [1–9].
However, fine-grain Dynamic Voltage and Frequency Scaling (DVFS) brings significant
challenges to power delivery, voltage regulation and clocking in digital
Systems-on-Chip (SoCs).
Further, as DVFS schemes support an ever-increasing number of supply-frequency operating
points, static and dynamic variations result in larger design guardbands and degrade system
power efficiency. Fig. 1.1 shows some examples of variations and their first-order effects.
Variation-induced noise on the supply can cause timing failures that lead to functional errors.
While it is possible to calibrate for and adapt to static variations and slow dynamic
variations such as temperature and aging, the fast dynamic variations caused by events such
as power state transitions can be extremely difficult to mitigate. Most of the techniques
proposed to address them have high area or power overhead [10–20]. As such, voltage
guardbands remain the most popular and effective measure against fast dynamic variations.
When the guardbands are large, the SoC operates at DVFS states that are sub-optimal in
terms of power efficiency. In short, power efficiency is traded for functional correctness.
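The cost of a voltage guardband can be estimated with the standard first-order model of switching power, P = C·V²·f. The sketch below is only an illustration under that model: the function names and the numeric values (an 0.8 V minimum supply with an 80 mV guardband) are assumptions chosen for the example, not figures from this work, and leakage power is ignored.

```python
def dynamic_power(c_eff, v_dd, freq):
    """First-order switching power, P = C * V^2 * f (leakage ignored)."""
    return c_eff * v_dd ** 2 * freq


def guardband_power_overhead(v_min, guardband):
    """Fractional extra switching power when the supply is raised from
    v_min to v_min + guardband at an unchanged clock frequency."""
    return ((v_min + guardband) / v_min) ** 2 - 1.0


if __name__ == "__main__":
    # Assumed example values: 0.8 V needed for timing, 80 mV (10%) guardband.
    overhead = guardband_power_overhead(v_min=0.8, guardband=0.08)
    print(f"extra switching power: {overhead:.1%}")
```

Because the supply enters quadratically, a guardband of a given percentage costs roughly twice that percentage in switching power, which is why reducing the guardband is treated as a primary goal.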
To implement fine-grained DVFS in multi-core SoC designs, integrated voltage
regulators (IVRs) are essential [21–27]. Further, it is well understood that the integrated
voltage regulators and DC-DC converters that constitute the power delivery network need to
(1) be flexible and adaptive, to maintain consistent performance across a wide dynamic range
of load, and (2) improve resilience to variations and reduce the guardband, so that additional
and possibly more efficient DVFS states are available.
Figure 1.1: Static and dynamic variations cause supply voltage noise that can lead to timing
fails.
On-chip power delivery networks for today’s systems-on-chip (SoCs) are characterized
by dynamic supply voltages, many embedded voltage regulators (VRs), lower decoupling
capacitance (de-cap), high current ranges, multiple power modes and fast transient loads,
and they are designed to minimize AC load transients and supply noise. Such networks are
designed hierarchically: off-die buck converters, followed by on-die switched capacitor (SC)
VRs, followed by on-die linear VRs, to address power hotspots across multiple voltage
domains and wide dynamic operation. Fig. 1.2 compares the different kinds of regulators
used and shows their general schematics.
Embedded VRs provide finer temporal and spatial voltage distribution, but often at
the expense of lower system efficiency. Analog LDOs are the primary choice in current SoC
CPUs to satisfy the high bandwidth requirement for fast transient performance [28–32].
Area constraints and the high bandwidth requirement restrict the size of the power
PFET in these LDOs, which limits the current drive and results in a higher minimum
dropout voltage (VDO,MIN). This essentially translates into a loss of DVFS range. At the
other end, at voltages close to threshold, the analog LDO loses its efficacy, as the
analog principles it is based on do not allow operation at low voltages. One possible way
to operate at low voltage is to provide a higher voltage to the analog controller block of the
LDO. However, such analog components are usually buried deep within digital units that
are powered by dense power grids, and it is therefore challenging to provide them with
separate higher supply rails. This work attempts to solve these issues through assist
circuits and digital alternatives.
Figure 1.2: Voltage regulators for power management in SoC designs.
As mentioned at the beginning of this chapter, in a DVFS ecosystem static and dynamic
variations increase guardbands, specifically voltage guardbands. The dynamic variations
caused by supply droops are especially difficult to address. Techniques that mitigate
these issues have been proposed, but in general they carry high area and power overhead.
The scenario demands a solution that is more disruptive than a simple assist- or
substitution-based approach. To this end, this thesis presents a novel approach that
integrates the clocking and supply voltage loops and modulates the clock frequency
according to supply voltage transients, maintaining error-free, guardband-oblivious
operation.
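The benefit of letting the clock follow the supply can be illustrated with a toy timing model. The sketch below is an assumption-laden illustration and not the regulator developed in this thesis: gate delay follows the commonly used alpha-power law, delay ∝ V/(V − VT)^α, with VT = 0.35 V and α = 1.3 chosen for the example, and the stage counts and function names are hypothetical. It compares a fixed clock whose period is sized with a voltage guardband against a clock derived from an oscillator on the same rail as the logic.

```python
def gate_delay(v_dd, v_t=0.35, alpha=1.3):
    """Alpha-power-law gate delay in arbitrary units: V / (V - Vt)^alpha."""
    if v_dd <= v_t:
        raise ValueError("model only covers supplies above threshold")
    return v_dd / (v_dd - v_t) ** alpha


def fixed_clock_ok(v_droop, v_nom, guardband, path_stages=20):
    """Fixed clock: the period is sized for the critical path at
    (v_nom - guardband); a deeper droop makes the path miss timing."""
    period = path_stages * gate_delay(v_nom - guardband)
    return path_stages * gate_delay(v_droop) <= period


def tracking_clock_ok(v_droop, path_stages=20, osc_stages=22):
    """Tracking clock: the oscillator shares the rail, so its period
    stretches by the same factor as the critical-path delay."""
    period = osc_stages * gate_delay(v_droop)
    return path_stages * gate_delay(v_droop) <= period


if __name__ == "__main__":
    for v in (0.78, 0.70, 0.60):
        print(v, fixed_clock_ok(v, v_nom=0.80, guardband=0.05),
              tracking_clock_ok(v))
```

In this model the fixed clock fails as soon as the droop exceeds the guardband, while the tracking clock keeps its margin at every supply value above threshold, at the price of a temporarily lower frequency.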
On-die switched capacitor VRs (SCVRs) provide high efficiency, but at the cost of a large
area, and hence are suited to providing a single on-die output voltage. Further, switched
capacitor converters achieve high efficiency only within a small range of input and output
voltages because they are designed to be optimal for a desired conversion ratio. The SCVR
output is typically regulated with linear VRs (including LDOs) to provide power to local
grids. However, if the regulated voltage is far from the SCVR output voltage, power
efficiency drops significantly. A per-core SCVR would be an improvement, but it would imply
(1) a reduction in the total capacitance and switch area available per core, resulting in
lower conversion efficiency, and (2) inefficient usage of capacitance and switch resources
when a core is in sleep or idle mode. To address these issues, this thesis presents an
elastic, multi-ratio, multiple-output switched capacitor architecture. The multi-ratio
capability extends the range over which the SCVR operates at high efficiency. In addition,
a control scheme is proposed through which the capacitance and switch resources are
distributed to different cores based on the load requirement. Much like turbo mode for
thermal management, the proposed topology allows one core to run at a power of
approximately 4PMAX while the others are in standby (approximately zero power).
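Both ideas in the paragraph above can be sketched numerically. The first part uses the standard ideal bound for a step-down switched capacitor converter with ratio N, efficiency ≤ VOUT/(N·VIN), to show why several ratios widen the efficient range. The second part is a hypothetical proportional policy for sharing a pool of capacitor and switch slices among cores; the pool size of 16 slices, the function names and the rounding rule are assumptions for illustration and are not the control scheme implemented in this thesis.

```python
RATIOS = (0.75, 0.5, 0.25)  # ratios 3/4, 1/2 and 1/4


def ideal_efficiency(v_out, v_in, ratio):
    """Ideal efficiency bound VOUT / (N * VIN); 0.0 if VOUT is out of reach."""
    v_no_load = ratio * v_in
    if v_out > v_no_load:
        return 0.0
    return v_out / v_no_load


def best_ratio(v_out, v_in, ratios=RATIOS):
    """Ratio with the highest ideal efficiency for the requested VOUT."""
    return max(ratios, key=lambda n: ideal_efficiency(v_out, v_in, n))


def allocate_slices(demands, total_slices):
    """Share a pool of slices in proportion to per-core demand
    (hypothetical policy, largest-remainder rounding)."""
    total = sum(demands)
    if total == 0:
        return [0] * len(demands)
    exact = [d * total_slices / total for d in demands]
    alloc = [int(x) for x in exact]
    leftovers = total_slices - sum(alloc)
    # Give the remaining slices to the cores with the largest remainders.
    order = sorted(range(len(demands)),
                   key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in order[:leftovers]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    for v_out in (0.70, 0.45, 0.20):
        n = best_ratio(v_out, 1.0)
        print(v_out, n, round(ideal_efficiency(v_out, 1.0, n), 3))
    print(allocate_slices([1, 0, 0, 0], 16))  # one active core, three idle
```

With a single fixed ratio of 1/2, an output of 0.20 V from a 1.0 V input is bounded at 40% efficiency, whereas switching to 1/4 raises the bound to 80%. In the allocation example, one active core receives the whole pool, which mirrors the turbo-like behavior described above.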
To summarize, the objective of this dissertation is to provide a power architecture
solution in which guardband reduction and consistent performance are the key goals for the
power delivery network in multicore SoCs. The need to maximize energy efficiency without
compromising performance has led to the implementation of fine-grain Dynamic Voltage
and Frequency Scaling (DVFS). However, as DVFS schemes support an ever-increasing number
of supply-frequency operating points, static and dynamic variations result in larger design
guardbands and degrade system power efficiency. The research presented here attempts
to address these concerns through multiple approaches geared towards the different
components in the power delivery network hierarchy. Digital assists for analog LDOs,
a novel approach that integrates the clocking and supply voltage loops, and elastic,
multiple-output switched capacitor networks are discussed, including theoretical models
and measurements from silicon test-chips. For the different techniques discussed in this thesis

CHAPTER 2
LITERATURE SURVEY

The survey presented in this chapter is divided into four sections. The first two
sections relate to low dropout voltage regulators and the last two sections focus on
switched capacitor designs. The goal of this survey is to present a selected list of
state-of-the-art designs with respect to the three main techniques presented in this
research: the leakage current supply circuit, the unified voltage and frequency regulator,
and the quad-output elastic switched capacitor converter.
2.1 Digital Assists/Substitution for Analog Low Dropout Voltage regulator
Premium-tier SoC CPU cores typically require high bandwidth for fast transient
performance, and analog LDOs are usually the primary choice. With scaling in every
generation and the accompanying reduction in core area, the load capacitance available to
an analog LDO is shrinking. As a result, most of the analog LDOs in use are internal pole
dominated. For high bandwidth, this restricts the size of the power PFET. Area constraints
on the power PFET typically limit the current drive of an analog LDO and result in a higher
minimum dropout voltage (VDO,MIN). The VDO,MIN of high-bandwidth analog LDOs ranges
from 150 mV to 300 mV in order to supply the maximum core current demand under worst-case
dynamic and leakage power conditions [33, 34]. A key challenge in industrial analog
LDOs is this large VDO,MIN, which limits the opportunities to enable LDO mode and obtain
the power benefits of voltage scaling.
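The effect of VDO,MIN on the usable voltage range is simple arithmetic, sketched below. The 1.0 V input rail and the function names are assumptions chosen for illustration; the 150 mV and 300 mV dropout values are the range quoted above, and the efficiency expression is the usual linear-regulator relation, efficiency ≈ (VOUT/VIN) × current efficiency.

```python
def max_ldo_output(v_in, v_do_min):
    """Highest core voltage the LDO can regulate: VIN - VDO,MIN."""
    return v_in - v_do_min


def ldo_mode_allowed(v_target, v_in, v_do_min):
    """True if the target core voltage leaves at least VDO,MIN of headroom."""
    return v_target <= max_ldo_output(v_in, v_do_min)


def ldo_power_efficiency(v_out, v_in, current_efficiency=1.0):
    """Linear-regulator efficiency: (VOUT / VIN) * current efficiency."""
    return (v_out / v_in) * current_efficiency


if __name__ == "__main__":
    v_in = 1.0  # assumed shared rail voltage
    for v_do_min in (0.15, 0.30):
        v_max = max_ldo_output(v_in, v_do_min)
        eta = ldo_power_efficiency(v_max, v_in)
        print(f"VDO,MIN={v_do_min:.2f} V -> VCORE up to {v_max:.2f} V, "
              f"efficiency at that point {eta:.0%}")
```

Under these assumptions, core voltages between 0.70 V and 0.85 V are reachable in LDO mode only with the smaller dropout, so every reduction in VDO,MIN directly adds DVFS states in which LDO mode can be used.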
2.1.1 All Digital LDOs
In recent years, all-digital LDOs have received significant attention as a way to address
the VDO,MIN issue [35–37]. Digital LDOs have a lower VDO,MIN requirement because the power
PFETs operate in the linear region, but these designs suffer from low gain and high output
ripple due to limit cycle oscillations [35]. Hence, high-bandwidth analog LDOs are preferred
over digital LDOs in high-performance cores.
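The origin of limit cycle oscillation can be seen in a deliberately simplified model, sketched below. It assumes a one-bit comparator that steps the number of enabled PFET switches up or down by one every clock cycle, treats each enabled switch as a fixed linear-region resistance, and ignores the output capacitor. All component values and names are hypothetical and are not taken from the cited designs.

```python
def simulate_digital_ldo(v_in=1.0, v_ref=0.78, r_load=100.0,
                         r_on_unit=1000.0, n_max=64, steps=100):
    """Static model of a comparator plus up/down counter digital LDO.

    Returns the switch count after each cycle and the output voltage
    observed by the comparator in each cycle."""
    n_on = 0
    codes, v_outs = [], []
    for _ in range(steps):
        # n_on parallel switches behave as one resistor r_on_unit / n_on.
        if n_on == 0:
            v_out = 0.0
        else:
            v_out = v_in * r_load / (r_load + r_on_unit / n_on)
        # One-bit decision: enable one more switch or disable one.
        if v_out < v_ref:
            n_on = min(n_on + 1, n_max)
        else:
            n_on = max(n_on - 1, 0)
        codes.append(n_on)
        v_outs.append(v_out)
    return codes, v_outs


if __name__ == "__main__":
    codes, v_outs = simulate_digital_ldo()
    print("steady-state codes:", sorted(set(codes[-10:])))
    print("ripple (V):", max(v_outs[-10:]) - min(v_outs[-10:]))
```

Because no integer number of switches produces exactly the reference voltage, the controller never settles and alternates between two adjacent codes, and the resulting output ripple is set by the size of one switch step.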
2.1.2 Hybrid LDOs
Recently, hybrid LDOs [38, 39] that employ both digital and analog loops have been
proposed to trade off the strengths and weaknesses of traditional digital and analog designs.
The challenge with hybrid LDO designs is managing the complex load-current sharing
between the analog and digital loops while maintaining high bandwidth and stability. The
load sharing problem often leads to an overdesign of both the analog and digital loops.
2.1.3 Wider Power PFET
Figure 2.1: Schematic diagram for (a) output pole dominant (b) internal pole dominant
analog LDO
An alternative approach to reduce the headroom is to increase the width of the power
PFET. The obvious penalty of this approach is expensive silicon area and in several SoC
designs that have limited die size this approach might not be feasible. Apart from the area
cost, increasing the power PFET size degrades the control loop dynamics of both internal-pole and
output-pole dominated analog LDOs.
In the case of an output-pole dominant analog LDO, the dominant pole is formed by the load
resistance (RL) and load capacitance (CL) (Fig. 2.1(a)). When the size of the PFET is increased,
the non-dominant pole formed by the gate capacitance of the PFET (CM) and the output resistance
of the error amplifier stage (REA) moves to a lower frequency. This brings the
two poles closer together and reduces the phase margin of the system. As the phase margin
shrinks, the system becomes underdamped and exhibits oscillatory and, eventually,
unstable behavior. Such a system will have high overshoot and undershoot during load
transients and a longer settling time. Conversely, if the system is internal-pole dominant,
then the dominant pole is at the gate of the PFET (Fig. 2.1(b)). If the power PFET size
is increased, the dominant pole moves to a lower frequency and therefore reduces the loop
bandwidth. Since the response time of the system to correct an error is inversely proportional
to the loop bandwidth, a wider PFET degrades the speed with which the analog
LDO responds to large load steps.
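The stability argument for the output-pole dominant case can be illustrated numerically. The following Python sketch is not taken from the referenced designs: it assumes an idealized two-pole loop gain L(s) = A0/((1 + s/p1)(1 + s/p2)) with p1 = 1/(RL CL) and p2 = 1/(REA CM), holds A0 fixed, and scales CM linearly with the PFET width. All component values and function names are hypothetical.

```python
import math

def phase_margin_deg(a0, p1, p2):
    """Phase margin of L(s) = a0 / ((1 + s/p1)(1 + s/p2)); poles in rad/s."""
    # |L(jw)| = 1  ->  (1 + u/p1^2)(1 + u/p2^2) = a0^2, with u = w^2.
    a = 1.0 / (p1 * p1 * p2 * p2)
    b = 1.0 / (p1 * p1) + 1.0 / (p2 * p2)
    c = 1.0 - a0 * a0
    u = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    w = math.sqrt(u)  # unity-gain frequency in rad/s
    return 180.0 - math.degrees(math.atan(w / p1) + math.atan(w / p2))

def output_pole_ldo_pm(width_scale, a0=100.0, rl=100.0, cl=400e-12,
                       rea=1e3, cm_unit=0.2e-12):
    """Phase margin when the PFET width (and hence CM) is scaled.

    Assumed, illustrative values: RL = 100 ohm, CL = 400 pF,
    REA = 1 kohm, CM = 0.2 pF per unit width, DC loop gain A0 = 100.
    """
    p1 = 1.0 / (rl * cl)                      # dominant pole at the output
    p2 = 1.0 / (rea * cm_unit * width_scale)  # non-dominant pole at the PFET gate
    return phase_margin_deg(a0, p1, p2)

if __name__ == "__main__":
    for scale in (1, 4, 16):
        print(scale, round(output_pole_ldo_pm(scale), 1))
```

Under these assumptions the phase margin falls monotonically as the width scale grows. In a real design a wider PFET also raises the loop gain, which would move the unity-gain frequency further out and make the trend worse.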
2.1.4 Digitally Assisted Analog LDO for DSP application
In this work [40], a high-bandwidth internal-pole dominant LDO with digital assist from
block head switches has been designed (Fig. 2.2). The LDO does not require an external
capacitor. However, the lack of capacitance makes it necessary for the LDO to be fast at high
load conditions. To improve the transient response of the LDO, digital assist is provided
to offload a significant portion of the load current. The analog loop has a current mirror
driving an analog-to-digital converter (ADC). The ADC senses the current provided by
the analog loop. The digital loop can then offload the excess current from the analog
loop. Sizing the analog pass transistor to deliver the maximum total current would make
it much larger and would degrade its transient response. If the ADC detects that the analog LDO
is supplying less current than a preset low threshold, an FSM is triggered that turns off
one of the block head switches (BHS). Similarly, if the high threshold is reached, one BHS
is turned on. The dominant pole is at the output of the error amplifier and a second
pole is located at the load. At light load, the pole at the load moves to a lower
frequency and makes the design unstable. Therefore, enforcing a minimum current supplied
by the analog LDO helps stability.
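The threshold-based behavior of the digital assist loop can be sketched as a small state machine. The Python model below is only an illustration of the description above, not the implementation in [40]: it assumes that each enabled BHS offloads a fixed current (in practice this depends on the voltage across the switch), and the names `bhs_step` and `settle` are hypothetical.

```python
def bhs_step(n_on, i_analog, low_thr, high_thr, n_max):
    """One FSM update: return the new number of enabled BHS devices."""
    if i_analog >= high_thr and n_on < n_max:
        return n_on + 1   # analog loop overloaded: turn one BHS on
    if i_analog < low_thr and n_on > 0:
        return n_on - 1   # analog loop under-used: turn one BHS off
    return n_on           # inside the window: hold

def settle(i_load, i_per_bhs, low_thr, high_thr, n_max, steps=50):
    """Iterate the FSM for a constant load; return (n_on, analog current)."""
    n_on = 0
    for _ in range(steps):
        i_analog = max(i_load - n_on * i_per_bhs, 0.0)
        n_on = bhs_step(n_on, i_analog, low_thr, high_thr, n_max)
    return n_on, max(i_load - n_on * i_per_bhs, 0.0)

if __name__ == "__main__":
    # Arbitrary units: 10 units of load, 1 unit per BHS, window [2, 4).
    print(settle(10.0, 1.0, 2.0, 4.0, 16))
```

In this sketch the window between the two thresholds must be wider than the current of one BHS; otherwise the loop would toggle a switch on every update.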
In a nutshell, through the low-bandwidth digital assist loop, the load current provided
by the analog LDO is maintained within a smaller range and therefore the size of
the power PFET of the analog LDO can be reduced. However, this implies that the complete
system can only cater to a limited range of load transients at high speed. In case of droop,
if the load transient is higher than the maximum load current capability of the analog LDO
then the LDO will have to wait for the slow digital loop to respond and pull the output
voltage node back to its steady state. Further, the scheme does not address the issue of
headroom improvement for increased DVFS states for enhanced power efficiency.
Figure 2.2: Schematic for the digitally assisted analog LDO[40].
2.2 Advanced Clock Generation and Distribution Techniques
In recent years, fine-grained dynamic voltage and frequency scaling (DVFS) has become
one of the most effective techniques to reduce power consumption in multi-core designs. Integrated
voltage regulators, both switching and linear, are being actively pursued to provide
optimal power to meet target frequencies and throughput. Recent advances in compact,
power-efficient linear regulators, operating in the low dropout (LDO) mode exhibit capa-
bilities of supplying large load transients in a few clock cycles [35, 36, 41, 42].
As DVFS schemes support an ever-increasing number of supply-frequency operating points, variations
result in increasing design guardbands. Static variations, such as process-induced variations,
can be calibrated for [13–16]. Dynamic variations are more difficult to address;
slower dynamic variations, e.g., those induced by temperature or aging, require run-time sensing
and calibration [17, 43]. Although this requires additional circuits at the cost of power and
area, its promise has already been demonstrated. However, variations induced by high-frequency
supply droops, caused by power state transitions or clock gating, pose serious risks to
correct operation and can be mitigated with overdesign and significant supply guardbands.
Techniques that employ double sampling on data paths [18–20] can be used to detect timing
errors, which can be used to flush the pipeline and restart computation. However, such techniques
have high overhead; [17] has 9.4% power overhead and 6.9% area overhead when
compared to a baseline design. In light of this, it can be concluded that the voltage guardband
still remains one of the more effective techniques to protect against dynamic variations.
Another notable parallel effort in alleviating the effect of supply droops is to modulate
the clock generation or distribution by the supply voltage of the digital circuit [44–48].
As opposed to traditional systems, where clock and supply voltage are generated from
separate and independent control loops, if the clock adapts to the changes in voltage due
to variations then it should reduce the voltage guardband required for correct functionality.
In the following subsections, designs that implement such adaptive clock techniques
are discussed.
2.2.1 Adaptive Frequency Control using Supply Voltage Tracking
[44] proposes the use of a combination of the digital supply and a clean supply to power
the voltage-controlled oscillator (VCO) in the PLL loop with the goal of tracking and adapting the
clock to first-order droops. Fig. 2.3 shows the block diagram of the implemented design.
The adaptation mechanism consists of a voltage mixer between the analog and digital voltage
supplies that powers the VCO in the phase locked loop. The analog supply is a regulated
voltage generated by an on-die linear voltage regulator inside the phase locked loop unit,
and the digital voltage is the supply provided to the digital load. The mixer creates a short
between the analog and digital supplies and, since the digital supply is noisy due to
static and dynamic variations, these voltages can often differ. To avoid crowbar
current due to the voltage difference, another control circuit called voltage compare and
track (VCAT) is used. This loop causes the analog supply to closely follow the digital
supply. When the digital supply voltage drops to lower values during power retention states,
the adaptation mechanism is completely shut down to prevent crowbar current. During a
first-order voltage droop, the VCO slows down proportionately, and if the VCO's supply
sensitivity matches that of the critical data path, pipeline errors can be avoided. This scheme
shows a 5% performance improvement through supply guardband reduction in an industrial
microprocessor. However, this scheme has two major shortcomings. Firstly, the bandwidth
(BW) of the PLL dictates the capability of the VCO to track supply droops. Low-frequency
supply droops (within the PLL's bandwidth) are quickly detected by the loop and suppressed,
providing no immunity to such supply droops. The VCO remains sensitive only to high-frequency
droops outside the PLL's loop BW. As fast-lock PLLs with increasing loop BW
become a reality, the efficacy of the scheme decreases. Secondly, a detailed study conducted
in [46, 49] does show the promise and process scalability of such supply-sensitive PLL designs,
but also reveals the extensive calibration required to get the VCO's supply sensitivity right.
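The first shortcoming can be made concrete with a simple frequency-domain picture. The sketch below is an assumption-laden illustration rather than a model from [44]: it treats the path from a supply disturbance at the VCO to the output clock as a first-order high-pass response with its corner at the PLL loop bandwidth, whereas real PLLs are usually of higher order. The function name is hypothetical.

```python
import math

def vco_tracking_gain(f_droop_hz, f_pll_bw_hz):
    """Fraction of a supply-induced VCO frequency change that survives
    the PLL feedback, assuming a first-order high-pass response."""
    r = f_droop_hz / f_pll_bw_hz
    return r / math.sqrt(1.0 + r * r)

if __name__ == "__main__":
    bw = 1e6  # assumed 1 MHz loop bandwidth
    for f in (1e5, 1e6, 1e7, 1e8):
        print(f, round(vco_tracking_gain(f, bw), 3))
```

Under this assumption, a droop one decade below the loop bandwidth retains only about a tenth of the intended clock adaptation, while a droop well above the bandwidth passes almost unattenuated.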
Figure 2.3: Block diagram for adaptive frequency control using supply voltage tracking
[44].
2.2.2 Adaptive Clock Distribution for Supply Voltage Droop Tolerance
In this scheme, [47] utilizes a tunable replica circuit (TRC) based logic array that makes
the sensitivity of the clock path to voltage droops identical to the data-path sensitivity (Fig. 2.4). This
essentially provides compensation for a pre-decided fixed number of clock cycles (configured
through calibration) at the beginning of the droop. After that, the clock frequency is
dropped to half of its original value until the effects of the voltage droop subside. The design
integrates a tunable-length delay prior to the global clock distribution to prolong the
clock-data delay compensation in critical paths during a voltage droop. The tunable-length
delay is achieved through a standard TRC that includes both transistor
and interconnect delay components. Through calibration, the TRC is tuned such that
the clock distribution path's supply voltage sensitivity matches the voltage sensitivity of the
critical pipeline path. The delay in the TRC determines the absolute amount of time during which
there is optimum clock-data compensation. An on-die dynamic variation monitor
(DVM) is used to detect the onset of the voltage droop and generate the corresponding signals
to drive a finite state machine (FSM) that either gates the clock or adjusts its frequency.
The first version of the design required extensive calibration [47]. An auto-calibrated version
of the scheme proposed in [45] shows its effectiveness in commercial designs.
This work provides an effective resilient technique where functional errors are avoided
by delaying the effect of droop by some clock cycles, followed by a reduction in clock fre-
quency. However, since the clock frequency gets halved for multiple cycles the throughput
can decrease, especially for low frequency droops.
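The throughput penalty for long droops follows from simple cycle counting. The sketch below is a back-of-the-envelope model under stated assumptions, not data from [47]: the clock runs at full rate for the compensated cycles and at exactly half rate for the remainder of the droop. The function name and the numbers are hypothetical.

```python
def relative_throughput(droop_duration_s, f_clk_hz, comp_cycles):
    """Cycles completed during a droop relative to an undisturbed clock."""
    nominal = droop_duration_s * f_clk_hz   # cycles without any slowdown
    if nominal <= comp_cycles:
        return 1.0                          # droop ends inside the compensated window
    half_rate = (nominal - comp_cycles) / 2.0
    return (comp_cycles + half_rate) / nominal

if __name__ == "__main__":
    for duration in (5e-9, 100e-9, 10e-6):
        print(duration, relative_throughput(duration, 1e9, 10))
```

As the droop duration grows, the ratio approaches 0.5, which is why slow (low-frequency) droops are the costly case for this scheme.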
Figure 2.4: Testchip diagram for Adaptive Clock Distribution (ACD) [47].
2.2.3 Adaptive Phase Shifting PLL
In this work [46], the authors provide an extensive model and analysis of the clock-data compensation
phenomenon and conclude that, to achieve optimum clock-data compensation, both
the supply noise sensitivity and the phase of the supply noise observed by the clock
need to be tuned. To achieve this objective, the design uses a large capacitor bank that
can be binary programmed (Fig. 2.5). As can be noted, Cu, Cd and Cf are all programmable.
The capacitor banks and transistors M1 and M2 form a high-pass filter so that the resonant
supply noise can be AC coupled to the bias voltage of the VCO to generate an adaptive
clock signal. Using a proper configuration of the three capacitor banks, the desired phase
shift and noise sensitivity can be achieved. While this design was successful in achieving its
desired objectives, it suffers from inherent limitations. The design needs extensive
calibration, since droops of different frequencies require different settings of sensitivity
and phase. Additionally, the massive capacitor banks incur a heavy penalty in
precious SoC area. Finally, since this is a control loop, any noise introduced within
the bandwidth of the PLL is rejected. Thus, the design has no immunity against
slowly varying (low-frequency) droops.
Figure 2.5: Block diagram for adaptive phase shifting PLL[46].
2.3 Multi Ratio Switched Capacitor Converters
2.3.1 Recursive Switched Capacitor Converter
[50] proposes a recursive switched capacitor (RSC) DC-DC converter topology that achieves
high efficiency across a wide output voltage range by providing 2^N-1 conversion ratios using
N 2:1 switched capacitor (SC) cells with minimal hardware overhead. Fig. 2.6 shows
the basic 2:1 converter cell. The converter produces an output voltage at node MID that is
the average of the voltages at INTOP and INBOTTOM. To produce multiple ratios, this
basic cell is instantiated repeatedly in series to obtain ratios of higher resolution.
Figure 2.6: Recursive switched capacitor basic cell [50].
The first instantiated 2:1 SC cell is connected between Vin and circuit ground. The
INTOP ports of all subsequent 2:1 cells are connected either to Vin or to another stage's MID
port, while the INBOTTOM ports are connected either to circuit ground or to another stage's
MID port. Through these connections, the amount of charge through the flying capacitors
is minimized. To improve efficiency, parallel connections of these cells are also
allowed. The number of iterations (i.e., recursion depth N) defines the resolution of the
output voltage as VIN/2^N, where VOUT is obtained through the MID port of the final conversion
stage. Figure 2.7 illustrates simplified examples of 1/4 and 3/8 ratios.
Figure 2.7: Examples of recursive switched capacitor implementation for 1/4 and 3/8 ratios
[50].
The design achieves high efficiency by maximizing the number of connections to VIN and ground in
order to minimize the total charge transferred through the flying capacitors. This minimizes
cascading losses. Further, the design uses parallel connections to utilize the maximum amount of
resources, with optimal relative sizing of switches and capacitance depending on the current
drive.
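The way successive 2:1 cells build up a ratio such as 3/8 can be sketched as a binary search over ideal, unloaded voltage levels. The Python fragment below is an illustration under that assumption only: it tracks node voltages as fractions of VIN and does not claim to reproduce the exact connection assignment or the parallel cell usage of [50]. The function name `rsc_stages` is hypothetical.

```python
from fractions import Fraction

def rsc_stages(k, n):
    """Per-stage (INBOTTOM, INTOP, MID) levels, as fractions of VIN,
    for a target ratio k / 2**n built from ideal 2:1 averaging cells."""
    if not 0 < k < 2 ** n:
        raise ValueError("k must satisfy 0 < k < 2**n")
    target = Fraction(k, 2 ** n)
    lo, hi = Fraction(0), Fraction(1)   # first cell sits between ground and VIN
    stages = []
    while True:
        mid = (lo + hi) / 2             # a 2:1 cell averages its two inputs
        stages.append((lo, hi, mid))
        if mid == target:
            return stages
        if target > mid:
            lo = mid                    # next INBOTTOM taken from this MID
        else:
            hi = mid                    # next INTOP taken from this MID

if __name__ == "__main__":
    for k, n in ((1, 2), (3, 3)):
        print(k, n, [str(s[2]) for s in rsc_stages(k, n)])
```

For 3/8 the sketch yields MID levels 1/2, 1/4 and 3/8, with the last cell connected between the two earlier MID nodes. With recursion depth N, the targets k/2^N for k = 1 to 2^N-1 give the 2^N-1 ratios mentioned above.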
The major shortcoming of such a cascaded design is that after every stage an output
capacitance is required to act as a low-pass filter and produce a static voltage at MID
that is the average of the voltages at INTOP and INBOTTOM. This capacitance adds to the area overhead
and also causes parasitic losses.
2.3.2 Tri-output PMU for IoT systems
Figure 2.8: Architecture diagram for the tri-output power management unit for IoT applications.
The three SC converters have been encircled through dotted shapes [51].
[51] presents a power management unit (PMU) design specifically geared towards Internet
of things (IoT) applications. The design is fully integrated and converts an input voltage
within a 0.9V to 4V range to 3 fixed output voltages: 0.6V, 1.2V, and 3.3V. The maximum
efficiency is provided within load conditions from 5nW to 500µW. In the given input and
output conditions the converter demonstrates maximum power efficiency of nearly 60%.
The work also proposes a load-proportional bias scheme that helps maintain high efficiency
at low output power without sacrificing the response time during high output power condi-
tions. Figure 2.8 shows the overall structure of the system. It contains three SC converters
(binary-reconfigurable SC up/downconverter, 1:3 Dickson upconverter, 2:1 SC downcon-
verter) with each responsible for generating one of the three output voltages: 1.2V, 3.3V,
and 0.6V. The binary-reconfigurable up/downconverter converts a wide range of input volt-
ages into a 1.2V output voltage. The Dickson upconverter and 2:1 downconverter then
receive this 1.2V output and convert it into 3.3V and 0.6V, respectively. Proper conversion
ratio configuration of the binary converter is important for robust and power-efficient 1.2V
generation. If the ratio is set too low, the binary converter output cannot reach 1.2V, while
if the ratio is set too high, conversion efficiency worsens due to large conduction loss. The
system regulates the conversion ratio by using both feedback and feedforward control [52].
When the system input voltage (VBAT) becomes available, the main controller starts up and
turns on the binary converter with a small default ratio. The conversion ratio is continually increased
by feedback control until the converter output voltage reaches ~1.2V, which triggers
the 'output on detector'.
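The start-up ramp of the conversion ratio can be sketched as a search for the smallest ratio that reaches the target. The fragment below is a simplified illustration, not the controller of [51]: it assumes an ideal, unloaded converter whose output is ratio times VBAT, and the list of available ratios is a made-up example supplied by the caller rather than the actual ratio set of the design.

```python
def ramp_ratio(vbat, ratios, v_target=1.2):
    """Step through the ratios in ascending order and return the first one
    whose ideal output reaches v_target, or None if none does."""
    for r in sorted(ratios):
        if r * vbat >= v_target:
            return r
    return None

if __name__ == "__main__":
    example_ratios = [0.5, 0.75, 1.0, 1.5, 2.0]   # hypothetical ratio set
    for vbat in (0.9, 3.0, 4.0):
        print(vbat, ramp_ratio(vbat, example_ratios))
```

Stopping at the first ratio that reaches the target reflects the trade-off described above: a lower ratio cannot reach 1.2V, and a higher one costs efficiency.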
While the PMU structure is well suited to IoT designs, it uses three separate switched
capacitor converters to obtain the multiple ratios. Implementing three ratios with three converters
is not optimal in terms of area and resource utilization. Further, the design will also suffer
from cascading losses, as the Dickson converter and the 2:1 converter are placed in series
after the binary-reconfigurable SC converter.
2.4 Reconfigurable Switched Capacitor Converter
2.4.1 Dual-Symmetrical-Output Switched-Capacitor
[53] presents a fully integrated dual-output SC converter with dynamic power-cell alloca-
tion for application processors. The power cells are shared and can be dynamically allo-
cated according to load demands. A dual-path VCO that works independently of power-cell
allocation is proposed to realize a fast and stable regulation loop. The converter can deliver
a maximum current of 100mA and this total current can be distributed between the two
outputs in different ratios. To illustrate, if one output drives a load current of 100 mA, then
the other output will handle an extremely low current (a few µA). At the other extreme of this
distribution, both outputs drive a load current of 50 mA each, with over 80%
efficiency. Figure 2.9 shows the dynamic power-cell allocation strategy. The converter
consists of two channels, CH1 and CH2, with output voltages, VO1 and VO2, respectively.
Each output is regulated through frequency modulation. The switching frequencies of the
two channels are f1 and f2. The goal is to adjust them to be equal so that both channels have
the same power density, and the converter achieves the best overall efficiency. Assume, for
example, that the two channels start with the same number of power cells, but the load of
CH1 is larger than that of CH2. To regulate the outputs properly, we should initially have
f1>f2, and assign more power cells to CH1. This means the physical boundary should migrate
to the right until f1 and f2 are approximately equal. By balancing the power densities of
the two channels with an optimal switching frequency, both switching and parasitic losses
are reduced. By dynamically adjusting both the number of power cells and the optimal
switching frequencies, the channels are able to provide sufficient power to the loads, and
utilization of capacitors is maximized.
Figure 2.9: Strategy of dynamic power-cell allocation and system architecture of Dual
Symmetrical output SC [53].
The power cells are connected to either CH1 or CH2 by channel selection switches.
The boundary between the two channels is controlled by the outputs of the bidirectional
shift register (SR) sel[1:m+n]. The direction of boundary shifting is determined by the
frequency comparator. After each comparison, the boundary will only shift along adjacent
power cells as sel[1:m+n] will only shift by one bit. As such, potential glitches due to
reconnecting power cells are minimized. There are a total of 82 power cells, and they work
with interleaving phases to reduce the output ripple voltage.
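The boundary-shifting behavior can be illustrated with a small behavioral model. The Python sketch below is not the circuit of [53]: it assumes that the switching frequency each channel needs is proportional to its load divided by its number of power cells, and it replaces the shift register and frequency comparator with a plain integer and a comparison. The function name and the load values are hypothetical; only the total of 82 cells comes from the text.

```python
def allocate_cells(load1, load2, total_cells=82, comparisons=100):
    """Return the number of cells assigned to CH1 after repeated
    frequency comparisons, moving the boundary one cell at a time."""
    m = total_cells // 2                   # both channels start with equal cells
    for _ in range(comparisons):
        f1 = load1 / m                     # assumed: f is proportional to load per cell
        f2 = load2 / (total_cells - m)
        if f1 > f2 and m < total_cells - 1:
            m += 1                         # CH1 works harder: give it one more cell
        elif f1 < f2 and m > 1:
            m -= 1                         # CH2 works harder: take one cell back
    return m

if __name__ == "__main__":
    print(allocate_cells(60.0, 22.0))      # CH1 carries most of the load
    print(allocate_cells(50.0, 50.0))      # balanced loads
```

When the loads do not divide evenly, this model dithers by one cell around the balance point. That is consistent with the one-bit shift per comparison described above, although the real design may handle this case differently.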
While this design works effectively for dual outputs, the strategy will become exponen-
tially complex for multiple outputs. Further, since it is still a single output SC it suffers
from the limited range of VIN to VOUT ratio where the SC has the high efficiency.
2.5 Summary
Minimum dropout voltage (VDO,MIN) of analog LDOs limits the opportunities to enable
LDO mode for voltage scaling power benefits. While digital LDOs and hybrid analog-
digital LDOs can resolve this to some extent, they each have their own limitations. Digital
LDOs suffer from limit cycle oscillations, low bandwidth and poor power supply rejection
(PSR). Hybrid LDOs require careful design to maintain stability due to complex load sharing,
and this often leads to over-design. The intuitive approach of increasing the area of the power
MOSFET to reduce VDO,MIN creates either bandwidth- or stability-related issues, depending
on the type of LDO. In Chapter 3, we provide a digital assist technique that achieves the goal
of VDO,MIN reduction without suffering from the above-mentioned issues.
Several advanced phase locked loop (PLL) and clock distribution techniques that re-
spond to supply voltage droops and modulate clock frequency (or provide beneficial jitter)
have been proposed. However, their effectiveness is limited either in terms of droop sensitivity
(limited by PLL loop bandwidth, response time, etc.) or in terms of complex auto-tuning
or calibration requirements. Further, some of the implementations also need high-overhead
clock buffers and finely controlled clock gating. On top of this, the fundamental
limitation in conventional systems is the fact that voltage and frequency are generated by
separate control loops. In Chapter 5, we provide a single control loop that unifies the supply
voltage and frequency regulation. The proposed implementation provides a tight coupling
between the local clock frequency and the regulated voltage that allows voltage guardband
reduction.
Multi-ratio switched capacitor (SC) converters can extend the range of high-efficiency power
conversion. Existing approaches achieve multiple ratios either by cascading different instances
of an SC converter or through separate individual SC converters. As a result, such
approaches are inefficient in terms of power conversion and area requirements. To
address these issues, in Chapter 6, we provide a quad-output SC design that achieves the
different ratios without cascading or separate SC designs. In addition, the design also
features a control scheme that allocates capacitance and switch area resources in an elastic
manner, based on workload requirement.
CHAPTER 3
DIGITALLY-ASSISTED LEAKAGE CURRENT SUPPLY (LCS) CIRCUIT
3.1 Introduction
Industrial system-on-chip (SoC) processors contain a number of distinct supply voltage
(VDD) rails driven from a power management integrated circuit (PMIC). A cluster of SoC
processor cores share the same VDD and clock frequency (FCLK) from a dedicated phase
locked loop (PLL). Each core in a cluster must either operate at the same VDD and FCLK
as the other cores in the cluster or disable operation with a power gate configuration. With
on-die low-dropout (LDO) voltage regulators [19, 33–37, 40, 54], each cluster on a shared
voltage rail may employ a unique VDD and FCLK. In this case, the cluster requiring the
highest VDD and FCLK determines the shared VDD rail. A cluster with a lower target VDD
and FCLK always operates at the lower FCLK for a linear FCLK power reduction. If this cluster
satisfies the LDO minimum dropout voltage (VDO,MIN) requirement, this cluster executes at
the lower target VDD via LDO mode for an additional linear VDD power reduction, which
accounts for the LDO power loss. The dual-core design in Fig. 3.1 represents two separate
clusters on a shared voltage rail (VIN) with each cluster containing a unique core, FCLK
generator, and power management unit (PMU).
Premium-tier SoC CPU cores typically prefer analog LDOs to satisfy the high-bandwidth
requirements for fast transient performance. The analog LDO along with header switches
form the PMU. The header switches are large switches that are capable of providing max-
imum load current even at extremely low source to drain voltage difference. When turned
on completely, they ensure that VOUT is virtually equal to VIN. Area constraints on the size
of the power PFET (i.e., transistor MPA in the PMU in Fig. 3.1) typically limit the current
drive in an analog LDO and result in a higher VDO,MIN.
Figure 3.1: Test-chip architecture of a dual core voltage (VCORE) design on a shared voltage
rail (VIN).
Minimum dropout voltage, VDO,MIN, is the minimum input-to-output voltage differential
that has to be maintained so that the analog LDO is able to provide the maximum load
current without losing regulation. Fig. 3.2 provides an intuition for this requirement
by plotting the drain current of the LDO versus the source-to-drain voltage of the power PFET
(MPA) for varying source-to-gate voltage. As shown in the figure, for a given
maximum load current, the source-to-drain voltage becomes equal to the minimum dropout
voltage when the gate voltage goes to its minimum value (0 V) so that |VGS| (the source-to-gate
voltage difference in the PFET) is maximum. This is an absolute fundamental limit, as the
transistor will not be capable of providing the required maximum load current if the source-to-drain
voltage (VIN - VOUT) falls below VDO,MIN. However, in the case of an analog LDO,
the practical limit of the minimum dropout voltage is even higher. When the gate voltage
of the power PFET is reduced, the transistor starts to move from the saturation to the linear region.
For proper functioning of the analog LDO, the power PFET must remain in saturation,
and therefore the minimum dropout voltage for the analog LDO (VDO,MIN) is reached when
the gate voltage is low enough to be at the edge of saturation and its drain current is equal
to the maximum load current.
Figure 3.2: Load current versus dropout voltage in an analog LDO.
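The dependence of the practical VDO,MIN on the maximum load current can be estimated with a first-order device model. The sketch below assumes the long-channel square-law expression I = (beta/2) Vov^2 at the edge of saturation, where the source-to-drain voltage equals the overdrive Vov. This is only a rough approximation for a short-channel process, and the value of beta and the currents used are hypothetical.

```python
import math

def vdo_min_saturation(i_max_a, beta_a_per_v2):
    """Dropout at the edge of saturation for a square-law PFET:
    I = (beta / 2) * Vov**2 and V_SD = Vov, so Vov = sqrt(2 * I / beta)."""
    return math.sqrt(2.0 * i_max_a / beta_a_per_v2)

def vdo_min_with_offload(i_max_a, i_offload_a, beta_a_per_v2):
    """Same estimate when another path supplies part of the load current."""
    return vdo_min_saturation(i_max_a - i_offload_a, beta_a_per_v2)

if __name__ == "__main__":
    beta = 0.2   # assumed device transconductance parameter, A/V^2
    print(vdo_min_saturation(2.8e-3, beta))
    print(vdo_min_with_offload(2.8e-3, 1.0e-3, beta))
```

In this model VDO,MIN scales with the square root of the current the PFET must deliver, so removing part of the load current from the analog path lowers the required dropout without widening the device.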
The VDO,MIN values of high-bandwidth analog LDOs range from 150-300mV in order
to supply the core maximum current demand at the worst-case dynamic and leakage power
conditions [33, 34]. A key challenge in industrial analog LDOs is this large VDO,MIN, which
limits the opportunities to enable LDO mode for voltage scaling power benefits. This
chapter describes a digitally-assisted leakage current supply (LCS) circuit and 130nm test-
chip measurements to reduce the maximum current demand for analog LDOs. The low-
bandwidth LCS circuit supplies the slow-changing leakage current and the high bandwidth
analog LDO supplies the fast-changing dynamic current. By decreasing the maximum cur-
rent requirement for the analog LDO, the LCS reduces the analog LDO VDO,MIN, resulting
in core power savings.
3.2 Test Chip Design
Figure 3.3: Power management unit (PMU) block diagram with header switches (HS),
analog LDO and the leakage current supply (LCS) circuit.
The block diagram in Fig. 3.1 captures the behavior of an industrial SoC with two cores
on a shared voltage rail with each core containing a separate FCLK generator and PMU to
allow unique core voltage (VCORE) and FCLK operation. The PMU in Fig. 3.3 consists of
header switches (HS), an analog LDO, and an LCS circuit to allow four configurations as
described in Table 3.1: (1) power gate mode with the analog LDO and HS disabled, (2)
HS mode with the analog LDO disabled and HS enabled to directly connect VCORE to VIN,
(3) LDO mode in the baseline design with LDO enabled and HS disabled, and (4) LDO
mode in the proposed design with LDO enabled and the LCS circuit controlling the HS
transistors. In the LDO mode for the proposed design, the LCS circuit supplies the slow
changing leakage current while the high-bandwidth analog LDO supplies the fast-changing
dynamic current. By decreasing the maximum current requirement for the analog LDO, the
LCS lowers the analog LDO VDO,MIN while keeping the power PFET (MPA) in saturation,
thus, increasing the opportunities to enable LDO mode for core power reduction.
Table 3.1: Power management unit (PMU) configurations.
An intuitive alternative approach to reduce the headroom is to increase the width of the
power PFET (MPA). This leads to a larger PMU area and degrades the loop dynamics in
both internal-pole and output-pole dominant analog LDO loops. As described in Table 3.2,
if the analog LDO is output-pole dominant, then a wider MPA shifts the internal pole at the
gate of MPA to a lower frequency, thereby reducing the phase margin. Conversely, if the
analog LDO is internal-pole compensated, then a wider MPA reduces the loop bandwidth,
thus negatively affecting the response time to large load steps. To address these issues, the
proposed LCS circuit enables a lower VDO,MIN while minimizing the impact on the area
and the analog loop dynamics [55]. In the proposed design, the LCS circuit supplies a
portion of the load current. This is particularly effective at high temperatures when the
leakage current, and hence, the total load current is the highest. Due to load sharing, the
analog power PFET (MPA) is smaller, thus decreasing the gate capacitance and allowing a
higher frequency pole compared to a baseline analog-only design. As a result, the proposed
design fully integrates an output-pole dominant, capacitor-less analog LDO with superior
performance as summarized in Table 3.2.
From Fig. 3.3, load sharing through the HS devices is enabled by the LCS circuit. The
LCS circuit includes: (1) Leakage-current-starved ring oscillator (RO), as described in Fig.
3.4, to monitor the changes in core leakage across temperature (T) and process variation,
(2) RO frequency counter to map the RO frequency output (FRO) to a digital signature over
a programmable period of time (e.g., 1ms), and (3) control logic that receives the digital
signature to enable a target number of HS transistors to supply the load leakage current.
The mapping from the digital signature to the target number of HS transistors is obtained
through post-silicon calibration and with the help of configuration registers. The LCS
leakage-current-starved RO contains an NFET footer device with a control voltage (VCNTL)
biased below the NFET threshold voltage (VTH). From silicon measurements, a VCNTL of
200mV ensures the voltage discharge of the internal RO nodes is governed by the NFET
leakage current to allow the RO frequency (FRO) to track leakage current while maintaining
a sufficiently high FRO to allow leakage monitoring and LCS configuration every 1ms. The
control logic requires post-silicon characterization to determine the configuration register
settings. An external on-board circuit contains the control logic and provides the interface
for silicon characterization. The test-chip contains a three-stage pipeline circuit in Fig.
3.5 with built-in self-test to mimic core functionality and scan programmable NFETs to
generate realistic load steps.
Table 3.2: Comparison between output-pole and internal-pole dominated analog LDOs.
Figure 3.4: LCS leakage-current-starved ring oscillator (RO) schematic with VCNTL < VTH.
Figure 3.5: Three-stage pipeline prototypical core.
Figure 3.6: Output-pole dominant two-stage analog LDO schematic and simulated LDO
loop gain for heavy and light load conditions.
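The LCS control flow (count RO edges over a window, then map the count to a number of HS transistors) can be sketched as follows. This Python fragment is a hypothetical illustration of that flow, not the test-chip logic: the threshold table stands in for the configuration registers obtained from post-silicon calibration, and its values, like the function names, are invented for the example.

```python
import bisect

def ro_signature(f_ro_hz, window_s=1e-3):
    """Digital signature: number of full RO cycles counted in the window."""
    return int(f_ro_hz * window_s)

def hs_enable_count(signature, thresholds):
    """Number of HS transistors to enable. `thresholds` is an ascending
    list of signature values; reaching the i-th entry enables i+1 devices."""
    return bisect.bisect_right(thresholds, signature)

if __name__ == "__main__":
    calibrated = [100, 200, 400]           # made-up calibration table
    for f_ro in (50e3, 250e3, 400e3):
        sig = ro_signature(f_ro)
        print(f_ro, sig, hs_enable_count(sig, calibrated))
```

A higher FRO indicates higher leakage, so more HS transistors are enabled. Updating the mapping once per counting window keeps this loop slow, in line with the low-bandwidth role of the LCS circuit.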
The output-pole dominant analog LDO in Fig. 3.6 features a two-stage error amplifier
design, consisting of an operational transconductance amplifier (OTA) stage with a low
output capacitance followed by a shunt feedback stage with a low-output resistance. This
places the internal poles of the system at high frequencies (i.e., 100s of MHz), which is
well beyond the unity gain bandwidth of the loop. The dominant pole of the analog amplifier
is at the output node (VCORE). Even with a small load capacitance of 400pF and no
external capacitance, the worst-case phase margin is simulated at 88°. Excellent light load
stability allows the analog LDO to provide retention voltage to the load circuits, when state
preserving flip-flops consume ~100 µA of total current.
3.3 Test Chip Measurements
Measured oscilloscope captures in Fig. 3.7 with VIN=1.2V, T=85°C, and a dropout of
180mV demonstrate that the baseline design fails to regulate under a load step, whereas
the LCS assisted hybrid LDO continues to regulate. Here it is important to note that the LCS
assisted design is able to maintain regulation because the HS switches provide the excess current. In the
baseline design, the HS switches remain 'off' and are not utilized.
In comparison to the analog LDO, measurements in Fig. 3.8 reveal that the LCS assisted
hybrid LDO reduces VDO,MIN by 30-38% for three different VIN values. The efficacy
of the LCS assisted hybrid LDO is most pronounced at high temperature (85°C), where
leakage is high, providing an additional 9-14% VDO,MIN reduction relative to the analog
LDO as compared to T=25°C.
Figure 3.7: Measured oscilloscope captures after a load step for (a) analog LDO, which
fails to regulate, and (b) LCS assisted hybrid LDO, which continues to regulate.
Figure 3.8: Measured minimum dropout voltage (VDO,MIN) for analog LDO and LCS as-
sisted hybrid LDO as well as LCS assisted hybrid LDO VDO,MIN reduction across tempera-
ture versus VIN.
From measurements in Fig. 3.9, across four dies and temperatures ranging from 25-85°C,
the leakage sensor FRO closely tracks the changes in core leakage current. For an
industrial SoC processor, this data indicates that post-silicon characterization of a
relatively small number of parts (e.g., 100s) across wide ranges of T can provide the
configuration register settings for the LCS control logic for every part in high-volume
shipping, thus avoiding the expensive test time of per-part calibration.
Figure 3.9: Measured LCS leakage-current-starved RO frequency (FRO) versus core leak-
age current (ILEAK) with temperature ranging from 25° C to 85° C for each die.
Figure 3.10: Measured voltage droop (VDROOP) and effective VCORE versus VREF for a load
step from 0.8mA to 2.8mA (Dotted Lines: Analog LDO, Solid Lines: LCS assisted hybrid
LDO).
Detailed transient measurements in Fig. 3.10 of the proposed design as compared to the
analog LDO at VIN=1.2V and T=85°C with a load step of 800µA to 2.8mA demonstrate:
(1) Voltage droop (VDROOP) becomes worse in both designs as the reference voltage (VREF)
increases due to the diminishing loop gain, (2) The analog LDO regulates until 0.94V, whereas
the LCS assisted hybrid LDO operates until 1.02V, thus providing an extended operating
range, and (3) Effective core voltage (VREF-VDROOP) is 46mV higher in the proposed design
at VREF=0.94V, translating to a lower VDD or a higher FCLK.
In measuring the impact of the LCS assisted hybrid LDO on digital loads in Fig. 3.11,
Core0 operates at the highest VDD and FCLK, and thus determines VIN. Core0 VIN:FCLK values
are 1.2V:486MHz, 1.15V:463MHz, 1.1V:415MHz, and 0.9V:280MHz. Core1 executes at
the lower FCLK. If VIN-VCORE1 ≥ VDO,MIN, then Core1 operates at the lower VDD via LDO
mode to support the Core1 FCLK; otherwise VCORE1 remains connected to VIN in HS mode.
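The mode-selection rule above can be written as a one-line comparison. The following is a minimal sketch, not the test-chip control logic; the function name and the numeric values in the usage note are illustrative assumptions.

```python
def select_core1_mode(v_in: float, v_core1_target: float, v_do_min: float) -> str:
    """Return "LDO" when the dropout headroom is sufficient, otherwise "HS".

    v_in           -- shared input rail set by Core0 (V)
    v_core1_target -- supply Core1 needs for its FCLK (V)
    v_do_min       -- minimum dropout at which the LDO still regulates (V)
    """
    headroom = v_in - v_core1_target
    return "LDO" if headroom >= v_do_min else "HS"
```

For example, with a hypothetical VIN of 1.2V and a Core1 target of 1.05V, a regulator that needs 0.18V of dropout falls back to HS mode, whereas one that needs only 0.12V can stay in LDO mode. This is how a lower VDO,MIN widens the LDO operating range.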
Each plot in Fig. 3.11 contains four distinct operating points (A-D). For the baseline
design, A represents the maximum VCORE1 (i.e., maximum FCLK in LDO mode) in which the
analog LDO satisfies VDO,MIN while maintaining regulation and B indicates the necessary
switch to HS mode.
Figure 3.11: Measured Core1 power versus clock frequency across multiple dynamic
voltage-frequency scaling (DVFS) states with Core0 setting VIN. A and C represent the
maximum VCORE1 while satisfying VDO,MIN for the analog LDO and LCS assisted hybrid
LDO, respectively. B and D represent the switch to HS mode for the analog LDO and LCS
assisted hybrid LDO, respectively.
For the proposed design, C represents the maximum VCORE1 in which the LCS assisted
hybrid LDO satisfies VDO,MIN while maintaining regulation and D indicates the switch to
HS mode. The VDO,MIN reduction from the LCS assisted hybrid LDO enables a wider range
of LDO operation, as defined from point A to point C in Fig. 3.11, thus resulting in new
VCORE1:FREF DVFS states. The availability of these new DVFS states results in core power
reduction of 21-28% at iso-FCLK within this range.
It is also significant to note that since this power reduction essentially stems from a
reduction in VCORE1, it would also lead to an improvement in the reliability of Core1 due to
the reduction in electric field.
From power supply rejection ratio (PSRR) measurements in Fig. 3.12, the additional
LCS PFET shunt devices in parallel with the analog LDO have a small effect on the loop
gain. The PSRR plot also demonstrates: (1) high overall bandwidth and (2) no peaking
effect. Load regulation in Fig. 3.13 is less than 1mV/mA.
Peak current efficiency in Fig. 3.14 is 97.2%. The LCS circuits are duty cycled and
operated every 1ms, resulting in a small decrease in the overall current efficiency. A
comparison in Table 3.3 with state-of-the-art designs indicates competitive figures of
merit (FOMs) and low VDO,MIN as compared to traditional analog LDO solutions. Fig. 3.15
shows the chip micrograph and characteristics.
Figure 3.12: Measured power supply rejection ratio.
Figure 3.13: Measured load regulation.
Figure 3.14: Measured current efficiency.
3.4 Summary
A digitally-assisted leakage current supply (LCS) circuit lowers the maximum current re-
quirement for analog LDOs to reduce the minimum dropout voltage (VDO,MIN), thus, ex-
panding the LDO operating range for reducing SoC core power. Silicon measurements
from a 130nm test chip demonstrate that the LCS assisted hybrid LDO lowers VDO,MIN by
30-38%, resulting in core power reduction of 21-28% at iso-FCLK within the wider LDO
operating range.
Work                                   | This Work          | [34]           | [33]           | [35]     | [37]
Technology (nm)                        | 130                | 28             | 65             | 130      | 28
LDO Type                               | Hybrid             | Analog         | Analog         | Digital  | Digital
VIN (V)                                | 0.9-1.2            | 0.9-1.1        | 1              | 0.5-1.2  | 1.1
VOUT (V)                               | 0.6-1.02           | 0.5-0.8        | 0.85           | 0.45-1.1 | 0.9
Load Regulation (mV/mA)                | 1                  | 0.027          | 11             | 10       | 1
Total Capacitance (nF)                 | 0.4                | 0.48           | 0.14           | 0.8      | 23.5
Load Type                              | Pipelined Core     | NA             | NMOS           | NMOS     | Processor
Peak Current Efficiency (%)            | 97.2               | 98.4           | 99             | 98.3     | 99.94
Pole Position (Bandwidth)              | Output node (high) | Internal (Low) | Tri-Loop (Med) | NA       | NA
Voltage Domains                        | 2                  | 1              | 1              | 1        | 1
FOM1: Minimum Dropout (mV) at peak VIN | 180                | 300            | 200            | 100      | 200
*FOM2: (Transient Time)*ICTL/IMAX (ns) | 4.73               | 0.32           | 3.01           | 76.5     | 7.75
*Normalized to Technology
Table 3.3: LDO comparisons for VDO,MIN and FOMs.
Figure 3.15: Test-chip micrograph and characteristics.
CHAPTER 4
ALL DIGITAL LOW DROPOUT VOLTAGE REGULATORS
4.1 Introduction
Modern SoC design methodology uses multiple voltage domains to provide fine-grained
spatial and temporal control of the operating voltage and frequency, and software-controlled
chip power-states that enable lower standby power along with faster wake-up. This allows
the digital circuits to expand their dynamic ranges of operation. The integration of on-die
voltage regulation on the core microprocessor [29, 35, 36, 41, 42, 54, 56] allows faster and
wider dynamic voltage and frequency scaling (DVFS).
On-chip power delivery networks for today's systems-on-chip (SoCs) are characterized
by dynamic supply voltage, many embedded VRs, lower de-cap, high current ranges,
multiple power modes and fast transient loads, and are designed to minimize AC load
transients and supply noise. Such power delivery networks are designed in a hierarchical
manner, combining slower and more efficient switching VRs with faster and less efficient
linear regulators, to address power hotspots across multiple voltage domains and wide
dynamic operation. For regulators that are the closest to the load circuits and operate
close to the incoming line voltage, linear regulators (that are often configured in the
low-dropout (LDO) mode) are widely used [28, 57–59]. Traditional LDOs have been analog
in nature and employ a high-gain error amplifier to provide regulation. They provide high
bandwidth, low ripple, fast response times and high power supply rejection (PSR) [58].
However, analog designs do not allow operation at low input and control voltages and are
difficult to integrate as collaterals embedded deep within a digital functional unit.
This has inspired the design of digital implementations of the LDO [29, 56, 60–62].
Digital LDOs have digital control that can be designed using digital design methodologies
and libraries. Such LVRs can be discrete time (1) or continuous time (2) and provide
compact, process compatible, high efficiency design solutions. In this chapter, the
following sections discuss both the discrete and continuous time digital LVRs in terms of
design principles and performance.
4.2 Discrete Time (DT) Digital LDO
With the popularity of digital LVRs, it is prudent to investigate not only the overall stability
of LVRs, but also how to maximize efficiency with adaptive control under
wide dynamic digital loads. This problem is further exacerbated by the fact that digital
loads undergo large dynamic ranges, resulting in significant movement of the output pole
frequency, thereby making it difficult to guarantee overall system stability across the
operating range. The time and frequency domain response of the closed loop system also
changes as the output load changes, going from an under-damped to an over-damped
system. Further, designing for the highest load current leads to an inefficient design solution
in light load conditions. This calls for autonomous and adaptive control strategies in the
VR loops that are cognizant of the position of the output pole. In this section, a discrete
time regulator design emphasizing programmable gain and high system efficiency is
discussed.
4.2.1 Design Principles
The proposed discrete-time digital LDO consists of three main stages: an ADC input stage,
a controller stage with programmable gain and a current-based DAC at its output stage
(Fig. 4.1). In this section we discuss a generalized form of the design, illustrating the key
design components. As shown in Fig. 4.1, the analog-to-digital converter (ADC) samples
the output voltage at the rising edge of the ADC clock. The resolution of the ADC presents a
design trade-off between the speed of the regulator loop and the complexity of design. For
most practical designs a 1-4b flash ADC suffices. Bias currents in the ADC comparator can
Figure 4.1: Schematic diagram of a generic discrete time digital LDO.
be avoided by employing a CLK-ed sense amplifier (SA) based ADC front-end. Fig. 4.2
shows a typical flash ADC block diagram and a simple architecture of a SA with a latch
connected at the output for signal restoration. For an N-bit thermometer coded ADC output,
the circuit employs N comparators with reference voltages determining the corresponding
resolution of the converter. Thus, the ADC provides a digitally sampled measure of the
error voltage (VREF-VOUT) and this encoded error is used in the control loop to turn on or
off power MOSFETs. In steady state the closed loop control will ensure an infinitesimally
small error, and the output voltage (VOUT) will track the reference (VREF). The ADC output
drives a bidirectional barrel shifter. The purpose of the barrel shifter is to take in parallel
data, shift it, and drive control signals to the power PMOSs. If the error (ADC output) is
negative, indicating VOUT>VREF, then the shifter shifts down, turning off more PMOSs.
On the other hand, a positive error leads to a shift-up, resulting in the turning-on of more
PMOS devices. The number of PMOS devices that will be turned on for each bit of error
is programmable and implemented using the barrel shifter. The architecture of the parallel
barrel shifter allows the shifter to achieve multiple gains of two and three shifts in a single
cycle. A higher gain is instrumental for a faster convergence when the error voltage is
Figure 4.2: Schematic diagram of the ADC stage.
larger in magnitude, thereby causing a multi-bit error. The shifter used in the current design
is 128b wide and uses two 4x2 bit multiplexers for control signal generation (Fig. 4.3).
The first-level mux makes the choice between latch outputs An, An+2 and An-2 to produce
the output Bn. The second-level mux makes a choice between Bn, Bn+1, and Bn-1 to
determine the input to each latch. The select signals are chosen according to the sign and the
magnitude of the error. As an example, different programmability modes corresponding to
different gains are shown in Table I of Fig. 4.3.
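The two-level mux structure described above can be illustrated with a small behavioural model. This is a sketch under stated assumptions rather than the fabricated circuit: `bits[n]` is taken to be True when pull-up device n is ON, devices enter ON from the low end of the thermometer code, and the names `shift` and `barrel_step` are hypothetical.

```python
WIDTH = 128  # shifter width stated in the text


def shift(bits, amount):
    """Shift a thermometer code up (amount > 0) or down (amount < 0).

    Shifting up fills the low positions with ON devices; shifting down
    fills the vacated top positions with OFF devices.
    """
    n = len(bits)
    out = []
    for i in range(n):
        src = i - amount
        if src < 0:
            out.append(True)    # new devices enter ON from the bottom
        elif src >= n:
            out.append(False)   # vacated top positions are OFF
        else:
            out.append(bits[src])
    return out


def barrel_step(bits, error_sign, coarse, fine):
    """One clock of the two-level mux network.

    coarse in {0, 2}: first level picks A[n] or A[n -/+ 2] to form B[n].
    fine   in {0, 1}: second level picks B[n] or B[n -/+ 1].
    error_sign is +1 (turn more devices ON), -1 (turn devices OFF) or 0.
    """
    b = shift(bits, error_sign * coarse)
    return shift(b, error_sign * fine)
```

Combining the two levels gives net shifts of 0, 1, 2 or 3 positions per cycle in either direction, which corresponds to the programmable gain discussed above.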
The output stage of the digital LDO comprises a bank of pull-up PMOS devices.
Depending on the demand of the load current as well as the target output voltage (VREF), a
section of the PMOSs is turned ON and the rest are OFF. In steady state, when regulation is
achieved, the number of ON PMOSs is just enough to supply the load current and suppress
the error voltage to an infinitesimal value.
Figure 4.3: Generation of the control signals for the barrel shifter corresponding to the
ADC outputs.
4.2.2 Model Analysis and Adaptation Result
Since this digital LDO operates in discrete time, its control model has to
be derived in the z-domain. The open loop transfer function [35, 63, 64] for the DT
digital LDO is given as
Open Loop Transfer Function = KBARREL KDC (z ... 0.5) ... (4.1)
where KBARREL is the gain of the barrel shifter and KDC is the DC gain of the plant, which is
usually a low pass filter. In this case the plant is a first order filter with the pole formed
by the load capacitance (CLOAD) and load resistance (RLOAD). FS is the sampling frequency,
i.e., the clock provided to the digital controller, and FLOAD = 1/(RLOADCLOAD). Noting that
for a digital system to be stable, the poles in the z-domain need to lie within the unit
circle, it can be seen that if the sampling frequency increases then the system approaches
unstable or underdamped behavior. Similarly, if the load current decreases then FLOAD starts
to approach 0 and makes the system unstable. It is evident that the dynamic nature of digital
load circuits necessitates an online adaptation of the control loop such that the closed loop
system poles are constrained within bounds. In essence, as the output pole changes, a truly
adaptive control scheme [65] should be able to adjust the sampling frequency (FS), such
that the z-domain open loop pole (e^(-FLOAD/FS)) remains invariant. Such fine-grained control
is, of course, not energetically viable. Hence, we propose a simple adaptation scheme,
which, instead of keeping e^(-FLOAD/FS) invariant, ensures that it is constrained within
certain pre-defined bounds.
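To make the dependence of the open loop pole on load and sampling frequency concrete, the following sketch evaluates e^(-FLOAD/FS) and picks a sampling frequency that keeps it inside a chosen window. The function names, the pole bounds, and the component values in the usage note are illustrative assumptions, not measured design values.

```python
import math


def open_loop_pole(r_load, c_load, f_s):
    """z-domain open loop pole exp(-F_LOAD / F_S), with F_LOAD = 1 / (R_LOAD * C_LOAD)."""
    f_load = 1.0 / (r_load * c_load)
    return math.exp(-f_load / f_s)


def pick_sampling_frequency(r_load, c_load, candidates, pole_min, pole_max):
    """Return the first candidate F_S whose pole lies in [pole_min, pole_max], else None."""
    for f_s in candidates:
        if pole_min <= open_loop_pole(r_load, c_load, f_s) <= pole_max:
            return f_s
    return None
```

With a hypothetical 1nF load capacitance and candidate frequencies of 10MHz, 40MHz and 160MHz, a light load (1kΩ) selects the lowest frequency and a heavy load (25Ω) selects the highest one when the pole is bounded to [0.6, 0.92]. In every case the pole moves toward 1 as FS increases or as the load current falls.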
A programmable ring-oscillator based CLK generator, capable of providing three CLK
frequencies (FHIGH, FNOMINAL and FLOW), automatically selects one of the three sampling
frequencies depending on the location of the output pole. The online adaptation scheme is
described as follows. It can be noted from equation 4.1 that the output pole is a function
of the load current, and the load current can be estimated from the number of pull-up PMOSs
that are ON. We can use this knowledge to predict if the frequency of the output pole is below
or above a predefined threshold. The circuit implementation of this adaptive controller
logic involves observing the value at two specific bit locations of the barrel shifter
(bit-40 and bit-80 in this design) and feeding the observed output to two 10 bit counters. If
bit-80 is '0' for 1024 consecutive cycles, then the counter output reaches all ones,
indicating that for the last 1024 cycles the load current has been such that at least 80
pull-up devices were ON. In other words, the location of the load pole (FLOAD) has moved to a
higher frequency. In such a case, the output of the counter will trigger the CLK generator
to switch to the high frequency FHIGH. Conversely, if a similar situation is observed for
bit-40 (it remains '1'), then the clock generator switches to FLOW. By following this
mechanism, the design maintains the open loop pole e^(-FLOAD/FS) within bounds. Apart from
the stability and consistent performance, the biggest advantage of adaptive control is in
power efficiency. Since the sampling frequency is also lowered at light load conditions, the
controller power reduces significantly, thereby increasing the power efficiency. This has
been measured in silicon (130 nm CMOS) and is shown in Fig. 4.4. A 4x higher current
efficiency is observed at light load when compared with the baseline design.
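The bit-observation logic described above can be sketched as two run-length counters. This is a behavioural illustration only: the class and method names are hypothetical, and the fallback to FNOMINAL when neither counter has saturated is an assumption, since the text does not state the return condition.

```python
class AdaptiveClockSelector:
    """Sketch of the bit-40 / bit-80 observation logic.

    Convention taken from the text: a '0' at a shifter bit means that
    pull-up device is ON; a '1' means it is OFF.
    """

    def __init__(self, window=1024):
        self.window = window   # consecutive cycles required before a switch
        self.high_count = 0    # run of cycles with bit-80 == 0 (heavy load)
        self.low_count = 0     # run of cycles with bit-40 == 1 (light load)

    def step(self, bit40, bit80):
        """Consume one cycle of observed bits and return the selected clock."""
        # Each counter saturates at the window length; any break resets the run.
        self.high_count = min(self.high_count + 1, self.window) if bit80 == 0 else 0
        self.low_count = min(self.low_count + 1, self.window) if bit40 == 1 else 0
        if self.high_count >= self.window:
            return "F_HIGH"
        if self.low_count >= self.window:
            return "F_LOW"
        return "F_NOMINAL"     # assumed fallback, not stated in the text
```

Because the shifter holds a thermometer code, the two conditions are mutually exclusive: bit-80 at '0' implies bit-40 is also '0', so both counters cannot saturate at the same time.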
Figure 4.4: Measured current efficiency of Discrete Time Digital LDO.
4.3 Continuous Time (CT) Digital LDO
Although quite compact and easy to integrate, the discrete time digital LDO
topology suffers from low closed loop bandwidth and limit cycle oscillations, where the
output PMOS devices switch continuously between one or more steady state values, and it
requires small signal sensing, which is prone to mismatches and comparator offsets. Further,
the number of PMOS stages at the output stage provides a trade-off between the precision
of the voltage output and the response time of the LDO. Coarse quantization levels lead to
faster transient response at the expense of higher limit cycle oscillations and a steady state
error between the reference voltage and the output voltage. To address the above-mentioned
shortcomings of the DT digital LDO, we present in this chapter a phase-locked
LDO [56, 66]. By implementing a continuous time (CT) control loop, similar in loop
dynamics to a phase-locked loop (PLL), we can provide regulation at high efficiency and
high bandwidth across a wide range of load currents. The regulator, implemented in a 32
nm process technology, has been used with a resistive load, where it shows 97% current
efficiency and transient response times of 1020 ns. The LDO has also been inserted in a
digital signal processor where the embedded SRAM runs off a common high voltage
supply and the proposed LDO is used to drop down to a digital supply to power the
core logic in the DSP. The details of the DSP and its implementation can be found in [67].
Figure 4.5: Architecture of the Phase locked LDO.
4.3.1 Design Principles
Fig. 4.5 illustrates the basic LDO design. It comprises two voltage-controlled oscillators
(VCOs) with configurable lengths, one running off a reference voltage (VREF) and the other
off the sense voltage (VS), which is the output of the LDO (VOUT=VS). The two VCO
outputs (RCLK and SCLK) are used to clock a 32-bit Johnson Counter (JC) with embedded
output drivers, divided into four sections of eight stages each. For converting from voltage
Figure 4.6: Design of current Starved ring oscillator based VCO.
to frequency we have used a current starved ring oscillator based VCO. Current starving
the VCO has the advantage that VREF and VOUT draw no current. Further, the output level
of the VCO is at the appropriate voltage level and requires no level shifting before clocking the
JC. Fig. 4.6 illustrates the design of the VCO.
A single Johnson Counter stage is shown in Fig. 4.7. The data input to each
stage is the output of the previous stage, i.e., Si-1 and Ri-1 are the inputs for the ith instance.
These data inputs are latched at the rising edges of SCLK and RCLK. The path between
VIN and VOUT consists of two parallel paths, each containing two PMOS device switches.
When Si and Ri are "00", PMOS P1 and P2 are on. Similarly, when Si and Ri are "11",
PMOS P3 and P4 are on. The path from VIN to VOUT is closed or shorted whenever
Si and Ri have the same logical value. Thus, there is an implicit XOR-ing of the Si and Ri
signals, and a short circuit path exists for a time that is proportional to the phase difference
between these two signals. At the steady-state condition, this phase difference between RCLK
and SCLK locks to a constant value such that the amount of current provided by the pull-up
devices in this period of time matches the load current and holds VOUT at VREF. The phase
locking occurs at each stage of the JC, and the total current provided by all the PMOSs in a
time interleaved manner enables voltage regulation. If a load transient causes the output
Figure 4.7: JC stage illustrating phase detection and the level-shifting output pass PMOS
devices (P1 to P4).
voltage to decrease below VREF, the VCO responds by slowing down SCLK and stretching
the pulse Si. This perturbs the phase locking and an additional phase difference is created
between Si and Ri, allowing the pass devices to supply higher current until re-locking and
regulation are again achieved. Similarly, if the output voltage increases above
VREF, SCLK speeds up, which reduces the phase difference, and the loop goes out of lock.
This in turn reduces the supply of current by the pull-up devices and ultimately reduces
VOUT until re-locking is achieved.
The dynamics of this loop are similar to those of a phase locked loop used in CLK
generation and recovery; hence the range of locking and/or pulling needs to be investigated.
As we will see in this subsection, an overrun protection block increases the locking range
such that the phase detector does not limit the locking range. Instead, the locking range
is governed only by the maximum current handling capacity of the output pull-up devices,
and hence the frequency range of the VCO. If the LDO is pushed far from lock, then RCLK
and SCLK would tend to overrun each other. This is prevented in the design by collision
Figure 4.8: Schematic diagram of overrun protection (OP) block.
detection and overrun protection (OP) (Fig. 4.7). This does not allow the phase difference to
be cyclic with a period of 2π, as is the case in a simple XOR based phase detector. Instead, in
the extreme cases where SCLK is too slow or too fast, the output of the phase detector with
the OP saturates at a phase difference of 0 or 2π. If the output voltage is so low that,
even after keeping the PMOSs on for the entire cycle time, VOUT is unable to catch up
to VREF, then the LDO will hold that state. On the other hand, if VOUT goes to an extremely
high value, then all the PMOSs are turned off for the entire cycle and this state is
maintained until VOUT can be discharged to a point where regulation can restart. It can be
seen that in both these extreme cases, regulation fails not because the phase detector does
not lock, but rather because the VCO frequency fails to catch up with the instantaneous
transients on VOUT. The implementation of the OP follows the logic:
(a) propagates Ri if Si ≠ Ri and
(b) propagates Si if Si = Ri. The circuit implementation is shown in Fig. 4.8.
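The effect of overrun protection on the phase-detector characteristic, as described above, can be contrasted with a cyclic detector in a few lines. This is a purely behavioural sketch of the saturating versus wrapping characteristic; it does not model the gate-level rules (a) and (b), and both function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def xor_detector_phase(delta_phi):
    """Phase seen by a simple cyclic detector: wraps with a period of 2*pi."""
    return delta_phi % TWO_PI


def op_detector_phase(delta_phi):
    """Phase seen with overrun protection: clamps to the range [0, 2*pi]."""
    return min(max(delta_phi, 0.0), TWO_PI)
```

For an accumulated phase difference of 3π, the cyclic detector reports π and would drive the loop as if it were near lock, whereas the protected detector holds at 2π until the output recovers.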
In Fig. 4.9, we plot a family of curves obtained by studying the transient phase
difference characteristics when VOUT is perturbed by a small ΔVOUT at different load
currents (obtained by changing RLOAD). These curves demonstrate that when the transient
Figure 4.9: Instantaneous phase difference (Δφ) created when a transient event changes
the output voltage by ΔVOUT. The overrun protection guarantees that the resultant transient
phase saturates to 0 on one end and 2π on another.
droops (overshoots) go beyond the regulation range of the LDO it saturates to a maximum
(minimum) phase corner. For example, the phase difference at RLOAD=400Ω reaches the
phase difference of 2π at ΔVOUT of -0.2V and it maintains that for higher droops.
In an effort to lower the controller power, we investigate the possibility of using a
lower supply voltage (VLOGIC) for the controller logic than VIN. By allowing the control
signals Si, Ri, and their complements to be level-shifted (Fig. 4.10) at the output stage,
the logic supply (VLOGIC to the VCO+JC) can be lowered below VIN, thereby gaining in
energy efficiency. Of course, this requires access to a second supply, VLOGIC and may not
be practical in all applications.
It is interesting to note that the JC computes phase differences in parallel. Commonly-
used phase-frequency detectors (PFD) [68] are operated at slow frequencies, whereas the
current design can be clocked at several GHz. Further, by virtue of the fact that at each
instance, at least one stage of the JC is operating on an edge, any perturbation from the
steady-state condition is immediately identified and corrected, a design aspect that is absent
Figure 4.10: Level-shifter (LS) schematic.
in a PFD running off a sub-sampled clock. Because the pass devices are driven by phase
differences, VLDO can reach VREF with infinitesimal voltage error (unlike a DT controller
where finite quantization levels at the input ADC and the output stages cause steady-state
limit cycle oscillations [64]).
As further optimizations, the LDO includes two identical counters (one clocked on
the rising edge and the other on the falling edge), allowing the VCOs to run at half the
frequency without sacrificing transient response time (Fig. 4.11). Furthermore, because a
JC propagates only one data edge at a time, significant power savings are obtained by clock
gating each section of the JC, in a manner shown in Fig. 4.11. A significant amount of
power (about 15%) is wasted in unnecessarily CLK-ing the flip-flops of the JC even when
it is not propagating any data edge. This is reduced by breaking up the 32-stage JC into
four sections (with eight stages in each). If the data-in and the data-out of a section are the
same, then that particular section is not propagating a data edge and can be CLK gated.
The choice of four sections is dictated by the trade-off that lies in the amount of extra logic
required to implement CLK gating and the benefits from fine-grained CLK gating.
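The clock-gating rule above can be checked with a short simulation of a single 32-stage Johnson counter split into four sections. This is a simplified sketch: the actual design uses paired S and R counters clocked by two VCOs, and the function names here are hypothetical.

```python
STAGES = 32   # Johnson counter length from the text
SECTION = 8   # stages per clock-gated section


def section_enables(state):
    """Enable a section only when its data-in differs from its data-out."""
    enables = []
    for start in range(0, STAGES, SECTION):
        # Stage 0 is fed by the inverted last stage (twisted-ring feedback).
        data_in = (1 - state[-1]) if start == 0 else state[start - 1]
        data_out = state[start + SECTION - 1]
        enables.append(data_in != data_out)
    return enables


def step(state, gated=True):
    """Advance the counter by one clock, optionally with section clock gating."""
    enables = section_enables(state) if gated else [True] * (STAGES // SECTION)
    nxt = list(state)
    for i in range(STAGES):
        if enables[i // SECTION]:
            nxt[i] = (1 - state[-1]) if i == 0 else state[i - 1]
    return nxt
```

Starting from the all-zero state, exactly one of the four sections is clocked on every cycle, and the gated counter steps through the same 64-state sequence as the ungated one. This is why gating removes redundant clocking without altering the counter behaviour.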
Figure 4.11: Block diagram of the eight JC stages illustrating operation on both clock
edges. Clock gating on each section provides higher efficiency of the control logic.
4.3.2 Control Model
In this section we present an s-domain control model of the proposed phase-locked LDO,
whose dynamics are similar to a second order phase locked loop [69]. Assuming a linear
model, we can relate the applied control voltage (VCTL) and the output phase (φ) generated







Here N is the number of inverters in the chain and α is the proportionality constant determined
by the speed of a component inverter. The product αN can be replaced by the constant KVCO.







Figure 4.12: Small signal Laplace model illustrating a second order system.





The resultant phase difference between the two clock signals is:
\[ \phi_{D\_SS} = \frac{K_{VCO}\,(V_{REF} - V_{LOC})}{s} \tag{4.5} \]
The phase difference in (4.5) determines the amount of time the PMOS will be on. In a
simplified linear model, the power PMOSs can be modelled using the effective
transconductance (GM SS) of the pull-up devices. This leads to

























RLOAD and CLOAD are the effective resistance and capacitance of the load respectively and
GM SS refers to the steady state effective transconductance of all the distributed power-














\[ V_{OUT} = \frac{K_{OP} K_{VCO}}{s^2 + s\tau + K_{OP} K_{VCO}}\, V_{REF} \tag{4.10} \]
Equation (4.10) illustrates a second order system, much akin to a phase locked loop, whose
loop gain is controlled by the gain of the VCO and the output stage. It is also important
to note that both the loop gain and the output poles are affected by RLOAD, i.e., the load
current. The main source of error in the loop is phase noise, which arises
from jitter in the VCO clock. Modeling the phase error as Eφ(s) and using a linear model,
we can obtain the phase noise transfer function as:
\[ V_{OUT} = \frac{s\,K_{OP}}{s^2 + s\tau + K_{OP} K_{VCO}}\, E_{\phi}(s) \tag{4.11} \]
From (4.10) and (4.11) we can write the overall transfer function of the control loop as:
\[ V_{OUT} = \frac{K_{OP} K_{VCO}}{s^2 + s\tau + K_{OP} K_{VCO}}\, V_{REF} + \frac{s\,K_{OP}}{s^2 + s\tau + K_{OP} K_{VCO}}\, E_{\phi}(s) \tag{4.12} \]
The schematic representation of the model has been shown in Fig. 4.12.
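The characteristic polynomial s^2 + sτ + KOP·KVCO shared by (4.11) and (4.12) can be compared with the standard second order form s^2 + 2ζω_n·s + ω_n^2. The sketch below computes the resulting natural frequency, damping ratio and poles; the function name and the numeric values in the usage note are illustrative assumptions, not extracted design parameters.

```python
import cmath
import math


def loop_parameters(k_op, k_vco, tau):
    """Natural frequency, damping ratio and poles of s^2 + s*tau + k_op*k_vco."""
    loop_gain = k_op * k_vco
    omega_n = math.sqrt(loop_gain)        # omega_n^2 = K_OP * K_VCO
    zeta = tau / (2.0 * omega_n)          # 2 * zeta * omega_n = tau
    disc = cmath.sqrt(tau * tau - 4.0 * loop_gain)
    poles = ((-tau + disc) / 2.0, (-tau - disc) / 2.0)
    return omega_n, zeta, poles
```

For instance, with tau fixed at 2, raising the product KOP·KVCO from 4 to 16 halves the damping ratio from 0.5 to 0.25. A higher loop gain therefore gives a more oscillatory response and less stability margin.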
Fig. 4.13 shows the Bode plots for the LDO for a light load and a heavy load condition.
Even under light load sufficient phase margin can be achieved. A high system gain will
Figure 4.13: Simulated Bode plots of the open loop system illustrating a phase margin of
(a) 45° at light load (0.625 mA) in red and dashed (b) 98° at heavy load, 10X (6.25 mA) in
blue and solid.
Figure 4.14: Chip micrograph and characteristics.
cause the system to have a lower phase margin thus making it more prone towards insta-
bility. System gain can be decreased by either reducing the oscillator gain KVCO (i.e., by
increasing the number of inverter stages in the VCO) or by increasing the capacitive load,
CLOAD, which increases KOP. The loop can also be stabilized by inserting a zero, much
akin to the zero introduced by the equivalent series resistor (ESR) of the output capacitance
in analog LDOs.
Table 4.1: Voltage and current ranges for measurement.
Figure 4.15: VCO frequency with varying VCTL.
4.3.3 Measured Results
The digital LDO along with programmable and switchable NMOS loads is fabricated in 32
nm CMOS (Fig. 4.14 and Table 4.1) and occupies a total area of 7705 µm2. The LDO
is used to power the DSP core logic in a fast Fourier transform (FFT) engine where the
memory supply and the logic supply are separate. The details of the DSP core are not
relevant to this discussion and can be found in [67]. In this subsection, we present the
measurement results of using the proposed LDO on a realistic load circuit and show how
embedded regulation allows a separate supply to be used for the core logic and the embedded
memory, thereby allowing the logic to run at a lower supply (and power). The LDO is first
characterized with a resistive NMOS load whose strength can be programmed using built-in scan.
Figure 4.16: Chip micrograph and characteristics.
Further, the NMOS load can also be switched with programmable strengths and frequen-
cies to emulate supply droops and load transients. Table 4.1 shows the ranges of input,
output and reference voltages and load currents that were used for the measurements.
The VCO which performs large signal sensing of the output and the reference voltages
is first characterized for the sensitivity of the output frequency to the control voltage. In the
present design two current-starved VCOs are used, one with a thirty-one stage oscillator
and another with a seventeen stage-oscillator with a MUX based switchable feedback stage,
as is shown in Fig. 4.6. Fig. 4.15 illustrates the measured VCO frequency response for
both the longer and the shorter VCO chains. The two chains provide two different gains
(KVCO) of the loop and as is shown here, this is one way of controlling the open loop gain.
The gain KVCO as a function of the control voltage has been shown in Fig. 4.16 and this
has been used to calibrate the control model described in the previous subsection. For VCTL
between 0.6V and 0.9V, VCO frequencies in excess of 1GHz were measured, illustrating
the fast response time of the proposed structure. It should be mentioned that although the
phase locked design uses a VCO as a sense circuit, the dynamics exhibited by the loop are
in continuous time; hence, the slow transients which characterize discrete-time CLKed
controllers are not present here.
Figure 4.17: Measured transient response for switching load current.
Fig. 4.17 illustrates the transient response of the LDO under a load transient where
the load current was changed by 3X (1.2mA to 400µA and back up to 1.2mA). The
measured transient response shows nanosecond response times for both voltage overshoots and
droops.
Fig. 4.18 illustrates measured load regulation of the LDO from ILOAD=0.5mA to 3mA.
We note less than 1% load regulation for the measured current and voltage ranges.
Figure 4.18: Measured load regulation.
Figure 4.19: Effect of VLOGIC on the output settling time.
Figure 4.20: Power efficiency vs VLDO (VIN=0.8V and ILOAD=3mA) for different VLOGIC.
As was previously noted, the control logic can run at a lower voltage (VLOGIC), which yields
significant power savings in the controller and increases the overall energy efficiency. However,
it comes at the cost of increased transient-response time, an energy-efficiency trade-off that
is often acceptable in digital circuits operating in the low-voltage/low-power mode. Fig.
4.19 illustrates the simulated normalized settling time as a function of VLOGIC.
Fig. 4.20 illustrates the measured power efficiency of the design for varying VLDO and a
nominal load current of 3mA. The smallest drop-out voltage of 50mV has been measured.
The ideal LDO efficiency (VLDO/VIN) has also been plotted. By lowering VLOGIC, the power
dissipation in the digital control loop can be significantly lowered and it slowly approaches
ideal efficiency. At ILOAD=3mA, VLDO=0.7V (VIN=0.8V) and VLOGIC=0.5V, the power
efficiency reaches 85% when the ideal efficiency is 87.5% (i.e., 97% of the ideal efficiency,
or, equivalently, a current efficiency of 97%).
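The arithmetic behind these figures can be reproduced directly. The sketch below assumes the usual decomposition of LDO power efficiency into the ideal term VLDO/VIN multiplied by the current efficiency; the function names are hypothetical.

```python
def ideal_efficiency(v_ldo, v_in):
    """Upper bound on linear-regulator power efficiency: V_LDO / V_IN."""
    return v_ldo / v_in


def current_efficiency(power_efficiency, v_ldo, v_in):
    """Current efficiency implied by a measured power efficiency."""
    return power_efficiency / ideal_efficiency(v_ldo, v_in)
```

With VLDO=0.7V and VIN=0.8V, the ideal efficiency is 0.875, and a measured power efficiency of 0.85 corresponds to a ratio of 0.85/0.875, or about 0.97.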
To understand the ability of the designed LDO to regulate an embedded load, it is
integrated with the logic core of a digital signal processor used for audio pre-processing.
The details of the design of the signal processor can be found in [67]. The schematic of
the overall structure is shown in Fig. 4.21, illustrating the use of the incoming line
voltage (VCC) to power the embedded memory and the LDO; the core logic supply
is derived from the output of the LDO. The minimum operating voltage (VMIN) of the
embedded memory for successful read and write operations is measured at 0.8V. Hence,
the line voltage is kept constant at 0.8V and the LDO output voltage is reduced (thereby
reducing the core logic supply).
A separate experiment is performed where the memory supply is kept constant at 0.8V,
and the supply to the core logic is externally forced using an ideal voltage source. Fig.
4.22(a) illustrates the measured frequency response (FMAX vs VCORE) for the two cases
and demonstrates the ability of the embedded LDO to handle the current transients
associated with the computation. Fig. 4.22(b) illustrates the measured power of the overall
design with and without considering the power dissipated in the LDO. The LDO exhibits
better than 70% current efficiency across the entire dynamic range.
4.3.4 Summary
This section has discussed a phase-locked continuous-time digital LDO that can be embedded
in digital designs for fine-grained power management. Measured results on a 32nm test-chip
with a resistive load show current efficiency as high as 97%. The digital LDO has been
embedded in a DSP processor and demonstrates a wide dynamic range of operation. It achieves
better than 70% current efficiency when operated from 0.4V to 0.75V with an incoming
supply of 0.8V.
Figure 4.21: Integration of Digital LDO with an FFT engine [67]. The memory is powered
by the input VCC while the low-power core logic is operated at VccCORE which is generated
by the integrated digital LDO.
Figure 4.22: (a) Measured FMAX vs VccCORE, when VccCORE is powered externally and when
VccCORE is powered through the Digital LDO. (b) Measured power of the logic core both
with and without the digital LDO.
CHAPTER 5
UNIFIED VOLTAGE AND FREQUENCY REGULATOR (UVFR)
5.1 Introduction
In recent years, fine-grain dynamic voltage and frequency scaling (DVFS) has become one
of the most popular and effective techniques to reduce power consumption in multi-core
system-on-chip (SoC) designs. The applications that run on such SoCs demand run-time
adjustments of both the supply voltage and clock frequency in order to maximize energy
efficiency. In a traditional digital system, there are two separate and independent control
loops for voltage and frequency. External voltage rail control incurs significant cost in
terms of turn-on delays, board-level complexity, area and the number of available power
pins in the SoC. As a result, integrated voltage regulators (IVRs) have gained importance,
and in current multi-core SoC designs, multiple IVRs are required to provide fine-grain
spatio-temporal voltage control. For embedding in the SoC, linear regulators operating in
low-dropout (LDO) mode are preferred [35, 36, 41, 42, 54, 56]. Similarly, phase-locked
loops (PLLs) with a wide range of programmable divide ratios are used to provide inde-
pendent frequency control for the different domains.
Fig. 5.1 describes a traditional two loop system for voltage and clock control. The
IVR block (linear regulator) uses a voltage reference and provides a well-regulated volt-
age to the core. Similarly, a PLL uses a reference frequency to provide a local clock to
the digital core. However, in these systems, the clock frequency regulation is unaware of
dynamic changes in voltage, temperature or aging. Similarly, the voltage control loop does
not account for temperature, aging or the impact of dynamic parameter variations on clock
jitter or phase noise. Conventional designs apply voltage or clock frequency guardbands to
ensure correct operation of the digital pipelines during the presence of dynamic parameter
Figure 5.1: Traditional two-loop system for providing voltage and frequency.
variations. In order to reduce these guardbands, designs may employ adaptive or resilient
circuit techniques [13–20, 43, 45, 47] to mitigate the impact of dynamic parameter varia-
tions on performance and energy efficiency. However, such techniques have considerable
overhead; for example, [18] cites 9.4% power overhead and 6.9% area overhead when
compared to a baseline design. Further, these techniques also contribute towards increased
test time. Additionally, most of these techniques have limited response time and range of
resiliency for reducing the impact of high-frequency voltage droops.
This chapter describes a unified voltage and frequency regulator (UVFR) that combines
the voltage and clock frequency generation into a single control loop. The UVFR reduces
the circuit area and power overhead to support fine-grained DVFS as compared to a tradi-
tional two-loop control system. By incorporating clock generation and voltage regulation
in a single loop, there is a tight one-to-one coupling between the instantaneous voltage and
frequency. As a result, the frequency and voltage in UVFR intrinsically adapt to dynamic
parameter variations, thus significantly reducing the guardbands or the overhead of adap-
tive and resilient circuits for the traditional voltage and frequency regulation systems with
two separate and independent control loops. This also caters to current design trends for
per-core DVFS where embedded linear regulators are used for fine-grain spatio-temporal
power management in commercial as well as research SoCs [5, 8, 34, 40, 56, 57].
Figure 5.2: Unified voltage frequency regulator (UVFR) architecture.
Fig. 5.2 shows the UVFR top-level architecture. Synthesized from all-digital cells, the
UVFR simultaneously generates and co-regulates a local clock (FLOC) and a local sup-
ply voltage (VREG) for a digital circuit block embedded in a multi-domain SoC. Here, a
frequency-only reference (FREF) is provided from a shared PLL. The single loop contains
a local tunable replica circuit (TRC) voltage-controlled oscillator (VCO) to produce FLOC.
After dividing FLOC by the PLL divide ratio (N), a phase detector compares FREF to FLOC/N
to drive a pulse-width modulator (PWM) for controlling the PMOS header devices to reg-
ulate VREG. Since VREG supplies power to both the TRC VCO and the digital circuit block,
FLOC and VREG are tightly coupled. For example, a change in VREG due to voltage noise
results in a simultaneous change in FLOC and the digital circuit path delays, thus mitigating
the timing-margin degradation for the digital circuit paths. In a multi-domain SoC design
with UVFR, the system states are determined by the target frequency of a domain and the
TRC VCO setting while the local VREG internally adapts to support the target frequency. A
DVFS state is uniquely defined by the performance (i.e., FREF) and the TRC VCO setting.
5.2 Design Principles
Figure 5.3: Johnson Counter based multi-phase unified voltage and frequency regulator
with a divide ratio of N=1.
The UVFR system utilizes FREF to produce FLOC from a local VCO (LVCO) that is
powered by VREG. Fig. 5.3 illustrates the circuit implementation with a divide ratio of N=1.
The reference clock and the LVCO outputs are used to clock a 16-bit Johnson Counter (JC)
with overrun protection (OP), PWM generation and embedded output drivers. The OP block
will be discussed in detail in the next section. The outputs of the different JC stages (Ri
for the reference clock and Li for the local clock) form multi-phase, 16x subsampled versions
of the reference clock and the LVCO clock. Another 16-bit JC triggered by the negative
clock edges provides further multi-phase capabilities. At steady-state condition, the phase
difference between FREF and FLOC locks to a constant value and turns the power PMOS on
for the exact duration of time that the load current demands to keep VREG constant. This
is shown in Fig. 5.4, where δ̄φ represents the duty-cycle of the Power PMOSs. The phase
locking occurs at each stage of the JC and the total current provided by all the PMOS
devices in time interleaved manner enables voltage regulation. If a load transient causes
Figure 5.4: At steady state FREF (R) and FLOC (L) settle down at same frequency with a
constant phase difference
the VREG to decrease from its steady-state value, then the LVCO responds by slowing down
FLOC and stretching the pulse at Li. This perturbs the phase locking and creates additional
phase difference, allowing the pull-up devices to supply higher current until re-locking and
regulation are again achieved. Similarly, if VREG increases from its steady-state value,
FLOC speeds up, which reduces the phase difference, and the loop goes out of lock. This in
turn reduces the current supplied by the pull-up devices and ultimately reduces VREG until
re-locking is achieved. The process locks FLOC to FREF, and a multi-phase design enables a
ripple-free VREG.
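The locking behaviour described above can be illustrated with a toy behavioural model. The sketch below is not the circuit itself: it assumes N=1, a linear VCO gain, a resistive load, a duty cycle that simply integrates the frequency error, and a single lumped pull-up device. All parameter values (including the gain k_pd and the 0.2 nF capacitance) are hypothetical choices made for illustration.

```python
def simulate_uvfr(f_ref=400e6, v_lock=0.76, i_max=6e-3, c_load=0.2e-9,
                  i_before=3e-3, i_after=5e-3, k_pd=1 / 16,
                  dt=0.1e-9, t_step=0.5e-6, t_end=3e-6):
    """Toy single-loop model: phase error sets the PMOS duty cycle, the duty
    cycle sets the supplied current, and V_REG sets F_LOC (N = 1)."""
    k_trc = f_ref / v_lock            # assumed linear VCO gain, Hz per volt
    r_before = v_lock / i_before      # resistive load drawing i_before at lock
    r_after = v_lock / i_after        # heavier load applied at t_step
    v = v_lock
    duty = i_before / i_max           # duty cycle that balances the initial load
    v_min = v
    for k in range(int(round(t_end / dt))):
        r_load = r_before if k * dt < t_step else r_after
        f_loc = k_trc * v             # the VCO runs from V_REG
        # Phase detector: accumulated frequency error widens or narrows the pulse.
        duty = min(1.0, max(0.0, duty + k_pd * (f_ref - f_loc) * dt))
        # Output node: pull-up current minus load current charges C_LOAD.
        v += (duty * i_max - v / r_load) / c_load * dt
        v_min = min(v_min, v)
    return v, v_min, k_trc * v


v_final, v_min, f_final = simulate_uvfr()
print(f"final V_REG = {v_final * 1e3:.1f} mV, worst droop = {(0.76 - v_min) * 1e3:.0f} mV")
print(f"final F_LOC = {f_final / 1e6:.2f} MHz")
```

In this model a load step first pulls VREG and FLOC down together, the growing phase error widens the pull-up pulse, and the loop settles back to the point where FLOC equals FREF. The numbers it prints depend entirely on the assumed parameters and should not be read as predictions for the test chip.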
5.2.1 Overrun Protection
An XOR-based phase detector (PD) suffers from phase aliasing. The overrun protection (OP)
circuit helps to remove this effect, as shown in Fig. 5.5. The OP block functions
based on the following logic: (a) it holds the value of Ri if Li = Ri and propagates the
previous-stage value (Ri-1) to Ri if Li ≠ Ri, and (b) it holds the value of Li if Li ≠ Ri and
propagates the previous-stage value (Li-1) to Li if Li = Ri. The reasoning behind this logic
can be explained through the timing diagrams in Fig. 5.6. In this figure, FREF is represented
by R, FLOC is represented by L, and L’ represents FLOC during a transient event such as a
droop or overshoot.
When VREG goes through a droop at steady state:
Figure 5.5: Overrun protection (OP) to prevent aliasing in large phase errors.
(a) If Ri=1 and Li=1, then Ri should be held at 1 (Fig. 5.6(a)).
(b) If Ri=0 and Li=0, then Ri should be held at 0 (Fig. 5.6(b)).
When VREG goes through an overshoot at steady state:
(c) If Ri=0 and Li=1, then Li should be held at 1 (Fig. 5.6(c)).
(d) If Ri=1 and Li=0, then Li should be held at 0 (Fig. 5.6(d)).
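The hold/propagate rule and the four cases above can be written as a small next-state function. This is a logical sketch only: it ignores the fact that the R and L counters are clocked by different clocks, and the function name and argument order are assumptions made for illustration.

```python
def op_next_state(r_prev, r_i, l_prev, l_i):
    """Next values of stage i in the reference (R) and local (L) counters.

    r_prev, l_prev are the previous-stage values R(i-1), L(i-1);
    r_i, l_i are the current values of stage i. All are 0 or 1.
    """
    if r_i == l_i:
        # Equal: R is held, L is free to take the previous-stage value.
        return r_i, l_prev
    # Unequal: L is held, R is free to take the previous-stage value.
    return r_prev, l_i


# Cases (a)-(d): the held signal keeps its value whatever the previous stage offers.
for r_i, l_i, held in [(1, 1, "R"), (0, 0, "R"), (0, 1, "L"), (1, 0, "L")]:
    for r_prev in (0, 1):
        for l_prev in (0, 1):
            r_next, l_next = op_next_state(r_prev, r_i, l_prev, l_i)
            assert (r_next == r_i) if held == "R" else (l_next == l_i)
print("cases (a)-(d) hold")
```

Because exactly one of the two signals is frozen in every state, neither counter can advance a full stage past the other, which is the property that prevents the XOR output from wrapping around.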
The circuit implementation of the OP block is shown in Fig. 5.7. The OP
block is needed to remove the locking-range limitation imposed by the XOR-gate-based
phase detector. With it, the loop is able to lock and operate up to the maximum current
limit of the power devices. For a detailed discussion to this effect, interested readers
are pointed to [66]. It can be concluded that the locking range is governed only by the
maximum current-handling capacity of the output pull-up devices, and hence the frequency
range of the VCO. During a large load transient, when FLOC slows down with respect to
FREF, the phase difference between FLOC and FREF saturates to 0. This implies 100% duty
cycle for the pull-up devices, i.e., the devices remain on throughout the cycle and provide
the maximum possible load current. At the other extreme, if FLOC is much higher than
FREF, due to either a change in FREF or a large negative load step, then the phase difference
Figure 5.6: Timing diagrams of the overrun protection unit. Here R represents FREF, L
represents the steady-state FLOC and L’ represents FLOC under a transient event. (a) If R=L=1,
then R should be held at 1, (b) if R=L=0, then R should be held at 0, (c) if R=0 and L=1
then L should be held at 1, and (d) if R=1 and L=0, then L should be held at 0, to prevent
phase aliasing.
approaches π. This implies that the pull up devices are turned off until the output voltage
decreases to restore the locking between FREF and FLOC. It is interesting to note that the JC
computes phase differences in parallel. By virtue of the fact that at any instance, at least one
stage of the JC is operating on an edge, any perturbation from the steady-state condition
is immediately identified and corrective action is taken. Further, the proposed circuit is
designed with digital gates and all the control nodes are full swing. However, the control is
essentially analog in the sense that it is time-based. Hence, as a voltage regulator loop, the
proposed circuit doesn’t exhibit limit cycle induced ripple which is a major shortcoming of
all-digital LDOs [20].
Figure 5.7: Schematic diagram of overrun protection (OP) block.
5.2.2 Digital Logic Load
The UVFR is designed to drive digital logic circuits. In the test-chip implementation, the
digital load consists of a pipeline stage of random logic with an error-detection capability,
as shown in Fig. 5.8. A positive-edge-triggered flip-flop and a positive latch sample
the same input data. Since the flip-flop samples data on the rising clock edge and the
latch samples data on the falling clock edge, the latch allows a longer delay based on the
clock high-phase delay. This configuration, referred to as error-detection sequential (EDS),
produces an error signal if the output of the flip-flop and the latch are unequal, which
signifies a delay error [20]. The EDS is used to measure timing errors during dynamic
variations and to evaluate the UVFR circuits. The error-detection window is equal to the
high phase of the clock, which captures dynamic delay variations of ~50% of the cycle
time for the test-chip implementation. From measurements it was observed that for droops
of up to 35% we can correctly capture any pipeline error. Scan-programmable DC load
circuits and high-speed noise generation circuits are integrated to produce a large dynamic
load range and abrupt load steps to mimic realistic load conditions. Capability is provided
Figure 5.8: Digital Logic Load with (1) pipeline with EDS and (2) programmable DC load
and (3) programmable noise generator.
through high speed pads to excite load transients as well as observe FREF, FLOC, error signals
from the pipeline output and the output voltage node, VREG.
5.2.3 Local Voltage Controlled Oscillator
The LVCO consists of a scan programmable inverting tunable replica circuit (TRC), which
is calibrated to mimic half of the critical path delay and consists of both transistor-dominated
and interconnect-dominated delay as illustrated in Fig. 5.9a. It is half of the critical path
delay because 1/FLOC should equal twice the TRC path delay. A level shifter is used to feed
FLOC to the JC. The schematic diagram for the level shifter is provided in Fig.
5.9b. At steady state, VREG is at the correct voltage such that the TRC-based VCO locks its
frequency to N·FREF. Consequently, VREG is also the correct voltage to enable the critical
path of the pipeline circuit to meet the timing requirement (1/FLOC). The digital load is
clocked by LVCO. Hence, any voltage droop (overshoot) at VREG leads to LVCO slowing
down (speeding up) proportional to the critical path thereby preventing delay errors in the
pipeline. This leads to a larger (smaller) phase difference between FREF and FLOC which
in turn increases (decreases) the duty cycle of power PMOS. This brings VREG back to
regulation and FLOC back to FREF simultaneously.
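The factor of one half can be made explicit. The relations below are a worked restatement of the paragraph above, assuming the LVCO behaves as a ring oscillator whose period is two traversals of the inverting TRC. The symbols t_TRC (TRC delay) and t_crit (critical path delay) are notation introduced here for illustration.

```latex
\begin{aligned}
\frac{1}{F_{LOC}} &= 2\,t_{TRC}(V_{REG}), \qquad t_{TRC}(V_{REG}) = \tfrac{1}{2}\,t_{crit}(V_{REG}) \\
\Rightarrow\quad \frac{1}{F_{LOC}} &= t_{crit}(V_{REG}), \qquad
\text{and at lock}\quad F_{LOC} = N F_{REF} \;\Rightarrow\; t_{crit}(V_{REG}) = \frac{1}{N F_{REF}}.
\end{aligned}
```

Since both the clock period and the critical path delay are set by the same VREG, a droop stretches them together, which is why the timing margin is preserved.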
Figure 5.9: Schematics for the (a) TRC-based VCO and (b) level shifter.
UVFR has low calibration overhead. Instead of calibrating the supply voltage corre-
sponding to a frequency as in a conventional DVFS, the UVFR TRC setting is calibrated
for each FREF value. TRC circuits are programmable within 0.5% of clock-cycle time,
which reduces design pessimism.
5.3 Design Analysis
5.3.1 Small Signal Model and behaviour
To understand the system dynamics, we linearize the loop and formulate the model. The
derivation assumes a linear relationship between VREG and FLOC, with the TRC oscillator using VREG as its
supply voltage and FLOC as its output frequency. The model derivation for the UVFR design
is similar to a phase locked LDO small signal model [66]. The resultant phase difference
between the two clock signals can be given as:






The phase difference in (5.1) determines the amount of time the PMOS will be on. In a
simplified linear model, the power PMOSs can be modelled using the effective
transconductance (GM SS) of the pull-up devices. This leads to


























RLOAD and CLOAD are the effective resistance and capacitance of the load, respectively, and
GM SS refers to the steady-state effective transconductance of all the distributed power
PMOSs combined. Since this is a small-signal model, a linear relationship between FLOC
and VREG is assumed:
\[ F_{LOC} = K_{TRC} V_{REG} \tag{5.5} \]
Here, KTRC is the TRC oscillator gain. Using Eqns. (5.3) and (5.5), the closed-loop transfer
functions from FREF to VREG and FLOC are given as:
\[ V_{REG} = \frac{K_{PMOS}}{s^{2} + s\tau + K_{PMOS}\dfrac{K_{TRC}}{N}}\,F_{REF}, \qquad
F_{LOC} = \frac{K_{PMOS}K_{TRC}}{s^{2} + s\tau + K_{PMOS}\dfrac{K_{TRC}}{N}}\,F_{REF} \tag{5.6} \]
The entire derivation has been summarized in the form of a block diagram and has been
provided in the Fig. 5.10.
Eqn. 5.6 describes a second-order system, similar to a phase locked loop, whose loop
gain is controlled by the gain of the TRC oscillator and the output stage. It is also important
to note that both the loop gain and the output poles are affected by RLOAD, i.e., the load
current.
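For orientation, a second-order form like Eqn. (5.6) follows from three ingredients already named in the text. The sketch below is a reconstruction under stated assumptions rather than the derivation used in this work: the phase detector is modelled as an ideal integrator of the frequency error, the pull-up devices as a transconductance G_{M,SS} driving the parallel R_{LOAD}, C_{LOAD} output node, and the identifications K_{PMOS} = G_{M,SS}/C_{LOAD} and \tau = 1/(R_{LOAD}C_{LOAD}) are assumed so that the result matches the form of Eqn. (5.6).

```latex
\begin{aligned}
\Delta\phi(s) &= \frac{1}{s}\left(F_{REF} - \frac{F_{LOC}}{N}\right), \\
V_{REG}(s) &= G_{M,SS}\,\Delta\phi(s)\,\frac{R_{LOAD}}{1 + sR_{LOAD}C_{LOAD}},
\qquad F_{LOC} = K_{TRC}V_{REG}, \\
\Rightarrow\; V_{REG}(s) &=
\frac{G_{M,SS}/C_{LOAD}}
     {s^{2} + \dfrac{s}{R_{LOAD}C_{LOAD}} + \dfrac{G_{M,SS}}{C_{LOAD}}\dfrac{K_{TRC}}{N}}\;F_{REF}.
\end{aligned}
```

Under these assumptions the open loop has an integrator pole at the origin and an output pole at 1/(R_{LOAD}C_{LOAD}), and R_{LOAD} enters both the loop gain and the output pole, in line with the remark above.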
Figure 5.10: Small signal s-domain model of the UVFR control loop.
Fig. 5.11 shows the Bode plots of the loop for light-load (500µA) and heavy-load (5mA)
conditions. We observe that in light-load conditions the phase margin of the system
degrades to 52°, as compared to the heavy-load condition where the phase margin is
nearly 90°. The dominant pole of the open loop is at the origin and originates from the
VCO. A decrease in the load current moves the output pole to a lower frequency and reduces
the phase margin. Hence, in this topology the light-load stability of the loop has to
be ensured, which is similar to the design of a capacitor-less internal-pole-compensated
analog LDO.
Since the dominant pole of the system is not at the output, decreasing the output capacitance
(CLOAD) makes the system more stable by increasing the phase margin. This is shown
in Fig. 5.12. This makes the proposed system apt for multi-domain SoCs, where the decoupling
capacitance on individual rails is tightly constrained. Finally, the high system gain
at DC (pole at the origin) makes the local clock (FLOC) capable of tracking the reference
frequency (FREF) within the limits of thermal jitter and phase noise, thus showing high DC
regulation.
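The two stability trends discussed above (lower phase margin at light load, higher phase margin with less output capacitance) can be reproduced with a minimal open-loop model. The sketch assumes an open loop of the form L(s) = K / (s (1 + s/wp)) with K = gm * RLOAD * KTRC / N and wp = 1/(RLOAD * CLOAD). The default values of gm and k_trc are hypothetical and are not fitted to Fig. 5.11 or Fig. 5.12, so only the trends are meaningful.

```python
import math


def phase_margin_deg(r_load, c_load, gm=2e-5, k_trc=5e8, n=1):
    """Phase margin of L(s) = K / (s * (1 + s/wp)) with
    K = gm * r_load * k_trc / n and wp = 1 / (r_load * c_load)."""
    k = gm * r_load * k_trc / n
    wp = 1.0 / (r_load * c_load)
    # |L(jw)| = 1  ->  w^2 * (1 + (w/wp)^2) = K^2, a quadratic in w^2.
    wc_sq = 0.5 * wp**2 * (-1.0 + math.sqrt(1.0 + 4.0 * (k / wp) ** 2))
    wc = math.sqrt(wc_sq)
    # The integrator contributes -90 degrees, the output pole -atan(wc/wp).
    return 90.0 - math.degrees(math.atan(wc / wp))


V_REG = 0.76        # assumed operating voltage, used only to convert current to resistance
C_LOAD = 0.2e-9     # assumed load capacitance

pm_light = phase_margin_deg(V_REG / 0.5e-3, C_LOAD)       # 0.5 mA load
pm_heavy = phase_margin_deg(V_REG / 5e-3, C_LOAD)         # 5 mA load
pm_small_c = phase_margin_deg(V_REG / 0.5e-3, C_LOAD / 4)  # same load, less capacitance
print(f"light load: {pm_light:.1f} deg, heavy load: {pm_heavy:.1f} deg")
print(f"light load with C/4: {pm_small_c:.1f} deg")
```

In this model the product K/wp grows with both RLOAD and CLOAD, so a lighter load or a larger capacitor pushes the crossover further past the output pole and lowers the phase margin.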
Figure 5.11: Simulated Bode plots of the open loop system indicating a phase margin of
(a) 52° at light load (0.5 mA) in dashed and (b) 89° at heavy load (5 mA) in solid.
5.3.2 Large signal behavior
In the previous section we assumed a linear relationship between VREG and FLOC. As pre-
viously mentioned, even if FLOC changes non-linearly with VREG, which is typical for large
voltage droops, the timing margin will not degrade as long as the sensitivity of the TRC
and the critical path to the supply voltage are similar. Consider that during a droop VREG
changes and in response FLOC changes as FLOC = f(VREG ), where f represents the non-linear
dependence between the instantaneous frequency and the output voltage. Since the critical
path of the pipeline has the same voltage sensitivity as the TRC path, the critical path
delay increases at the same rate as the cycle time (=1/FLOC). This scheme allows intentional
phase deviation on FLOC that is perfectly correlated to VREG. This phase deviation,
in response to voltage droops, dominates over the random component of jitter and creates
a tightly coupled VREG-FLOC pair, even if there are non-linearities in the loop.
5.3.3 Output Voltage Ripple and Local clock phase noise
Figure 5.12: Plot of phase margin (PM) versus load capacitance variation. The PM reduces
as the load capacitance increases.
During general operation of UVFR, due to the JC implementation, at least one of the power
MOSFETs will continue to provide the load current as long as the following condition holds:
\[ \frac{\text{Maximum load current}}{\text{Load current}} < \text{No. of interleaving stages} \tag{5.7} \]
Here, the maximum load current refers to the load current that will be provided if all
the power PMOSs are turned on for the complete cycle. If the condition in Eqn. (5.7)
is violated then UVFR will act in a discontinuous conduction mode and that can lead to
higher output voltage ripple. When the load current increases, the duty cycle of the PMOSs
increases and there is higher overlap between PMOSs controlled by adjacent stages of the JC,
which leads to a reduction in output voltage ripple. A higher FREF has a similar
effect: it requires a higher VREG, which increases both the dynamic
and static load current. Fig. 5.13(a) shows the output voltage ripple versus the number of
interleaved phases (total stages of JC) by varying the reference frequency at a fixed load
current of 3mA. As we increase the number of interleaving stages for a constant load current
the voltage ripple decreases. As seen in the plot the increase in reference frequency reduces
the output ripple as well. Fig. 5.13(b) shows the output ripple voltage versus interleaved
phases by varying the load current at a constant reference frequency of 600 MHz. As we
Figure 5.13: Output voltage ripple versus number of interleaving stages at (a) constant load
current (b) constant reference frequency.
reduce the load current for the same reference frequency, the plots show an increase in voltage
ripple. At a load current of 1mA, linear regulator configurations with fewer than 6 stages fail
to regulate, and at a load current of 500µA, configurations with fewer than 12
stages fail.
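The stage counts quoted above can be related to the continuous-conduction condition with a short calculation. The sketch assumes that Eqn. (5.7) compares the ratio of maximum load current to actual load current against the number of interleaving stages, that the boundary case is sufficient, and that the maximum load current is the 6 mA listed in Table 5.1. The function name is illustrative.

```python
def min_interleaving_stages(i_max_ua, i_load_ua):
    """Smallest whole number of stages that is at least i_max / i_load.

    Currents are integers in microamps so the division stays exact.
    """
    return -(-i_max_ua // i_load_ua)   # integer ceiling division


I_MAX_UA = 6000  # assumed: 6 mA maximum load current, as listed in Table 5.1

for i_load_ua in (3000, 1000, 500):
    n = min_interleaving_stages(I_MAX_UA, i_load_ua)
    print(f"I_LOAD = {i_load_ua} uA -> at least {n} stages")
```

With these assumptions the calculation gives 6 stages at 1 mA and 12 stages at 500 µA, which matches the thresholds below which the configurations are reported to fail.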
5.4 Test Chip and Measurements
The test chip is fabricated in GF 130nm 8-M CMOS process and the UVFR occupies an
active area of 0.0204mm2 as shown in Fig. 5.14. The total silicon area is 0.11 mm2, which
includes active devices, local TRC VCO, load circuit and scan logic. The test-interface is
a QFN package.
UVFR allows VREG to autonomously adapt to process, voltage, and temperature (PVT)
variations while the LVCO maintains frequency locking to FREF. Measurements in Fig. 5.15
demonstrate that the loop can track (a) temperature, (b) process and (c) aging variations
to adjust VREG to maintain the target FREF while reducing the voltage guardband. Fig.
5.15(a) illustrates how VREG automatically changes with FREF all the way to near-threshold
voltage (NTV) operation. At FREF=100kHz and T=90°C, the loop maintains regulation
Figure 5.14: Chip micrograph and characteristics.
Figure 5.15: Measured results show VREG adapting with (a) temperature, (b) process and (c)
aging variations to maintain frequency lock. The process VT = 350mV and UVFR operates
from 0.84V to 0.27V.
with VREG=270mV, which is below the process threshold voltage (VT) (linear VT=300mV).
At FREF=500MHz and T=90° C, the loop locks with VREG=0.84V. When FREF is low (<
1MHz), the VREG is regulated at voltage levels below the threshold voltage. In this condi-
tion, the drain current of the gates in the TRC-based LVCO is primarily the subthreshold
current that increases exponentially as temperature increases. As the drain current directly
influences the gate delays, the regulated voltage reduces when temperature is higher. On
the other hand, when FREF is high (> 100 MHz), the drain current in the TRC-based LVCO
gates is dominated by saturation current, which has a negative temperature coefficient at
Figure 5.16: Measured oscilloscope capture showing full load step and local clock adapting
to VREG changes.
high voltages. The temperature coefficient is negative at high voltages because, in this
regime, the temperature dependence of the carrier mobility dominates over that of the gate
overdrive in determining the drain current. Therefore, the required VREG increases when the
temperature increases for the same FREF. Fig. 5.15(b) shows UVFR performance with respect to
process variations. In this figure, die 2 represents a fast part and die 3 represents a slow part.
Fig. 5.15(c) demonstrates UVFR performance under aging. For this experiment, the
chips are kept at high temperature with the clock enabled at high frequency, and the
measurements are taken at periodic intervals as marked on the x-axis (aging time). The measured
guardband reduction is 14-32% for temperature, 30% for process and 6-7% for aging.
Fig. 5.16 is an oscilloscope capture of a 3mA load step at FREF=400MHz. The steady-state
VREG required to lock FLOC to N·FREF is 760 mV. The maximum droop observed is 160
mV and the droop recovery time is 180 ns. The magnified block shows that the local clock
also slows down in response to the supply voltage droop and follows the same profile as
VREG.
Fig. 5.17 shows UVFR performance in terms of voltage droop and settling time when
FREF is varied. As FREF increases to transition from a low-power mode to a high-performance
Figure 5.17: Measured (a) voltage droop and (b) settling time for varying FREF.
mode, the voltage droop decreases (Fig. 5.17(a)) and the settling time improves (Fig.
5.17(b)). In conventional designs, the pipeline waits for the voltage regulator and the PLL
to both settle before operating at the new DVFS mode. However, in the case of UVFR,
since there is a tight coupling between VREG and FLOC, timing margins do not degrade during
power state transitions. Since the load is constituted of digital blocks, the dynamic
current (i.e., switching and short-circuit current) increases when FREF, and therefore FLOC,
are higher. An increase in load current causes the output pole frequency to increase, thus
improving the bandwidth. Further, a higher FLOC causes the charging and discharging pe-
riod of the load capacitor to reduce. As a result, UVFR shows improved performance in
terms of voltage droop and settling time when FREF is higher.
Fig. 5.18 shows the ability of the UVFR scheme to avoid delay errors during voltage
droops. For this experiment, FLOC at steady state is regulated at a frequency of 200 MHz
(N=1, FREF=200 MHz) and a load step of 3 mA is applied. The maximum voltage droop
is 155mV, resulting in a minimum FLOC of 98 MHz. As the VREG droops under a load
step, the baseline design violates the timing-margin requirements, resulting in pipeline er-
rors. In contrast, the UVFR continues to operate without any pipeline error, demonstrating
Figure 5.18: Measured scope data on high-speed active probe demonstrates that UVFR
enables error-free operation even under large voltage droops.
compensation between the clock and data to maintain the timing-margin target.
Fig. 5.19 shows the measured VREG-FREF trade-off between the baseline design and
the UVFR while only considering the guardband for voltage droops. Owing to a smaller
voltage guardband, UVFR enables 18-27% reduction of VREG at iso-FREF for FREF ranging
from 500MHz to 10MHz.
Fig. 5.20a shows 2mV/mA of load regulation for different FREF-VREG combinations.
Fig. 5.20b shows 10mV/V line regulation with corresponding frequency locking as mea-
sured over an average of 1000 cycles. During these measurements, the digital load circuit
is continuously clocked.
In Fig. 5.21, current efficiency versus load current is shown across a range of reference
frequencies from 10MHz to 500MHz. The controller power consumption increases when
FREF is higher as the switching and short-circuit currents increase. The UVFR macro consumes
from 36 µA at FREF=100kHz to 330 µA at FREF=500MHz with VIN=1V. The peak current
efficiency is 99.4%.
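As a sanity check on the quoted peak figure, current efficiency can be computed as the load current divided by the sum of load and controller currents. The pairing used below (the 6 mA maximum load current from Table 5.1 with the 36 µA controller current) is an assumption chosen for illustration, as is the light-load operating point.

```python
def load_current_efficiency(i_load_ua, i_ctl_ua):
    """Share of the total input current that reaches the load."""
    return i_load_ua / (i_load_ua + i_ctl_ua)


# Assumed pairing: 6 mA maximum load (Table 5.1) with the 36 uA controller current.
peak = load_current_efficiency(6000.0, 36.0)
# Light-load example with the same controller current (hypothetical operating point).
light = load_current_efficiency(500.0, 36.0)
print(f"peak: {peak:.1%}, light load: {light:.1%}")
```

Under this pairing the result is about 99.4%, and the same controller current costs noticeably more efficiency at light load, which is the shape expected for the curves in Fig. 5.21.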
Table 5.1 compares the LDO characteristics with other published results, indicating
competitive figures of merit (FOMs).
Figure 5.19: Measured voltage regulation (VREG) versus reference clock frequency (FREF).
5.5 Summary
A single control loop unifies the supply voltage and frequency regulation in a 130nm
CMOS test chip. The unified voltage and frequency regulator (UVFR) provides a tight
coupling between the local clock frequency and the regulated voltage. As a result, the local
clock frequency autonomously adapts to variations in voltage, thereby allowing a voltage
guardband reduction as compared to a traditional voltage and frequency regulation system
with two separate control loops. The system demonstrates error-free pipeline operation
during large voltage droops and overshoots. Measured silicon data across a wide range of
voltage and frequency conditions reveals an 18-27% voltage reduction at iso-performance
through adaptation that is intrinsic to the UVFR control loop.
Figure 5.20: Measured (a) load regulation and (b) line regulation.
Figure 5.21: Measured current efficiency versus load current.
Work                          | This Work     | [36]      | [35]      | [62]         | [61]       | [60]
Type                          | LDO+Clock     | LDO       | LDO       | LDO          | LDO        | LDO
Technology (nm)               | 130           | 65        | 130       | 65           | 65         | 40
LDO Type                      | Digital       | Digital   | Digital   | Digital      | Digital    | Digital
Control Methodology           | Co-regulation | Linear    | Linear    | Event-driven | SAR/PD/PWM | Linear
VIN (V)                       | 0.6-1         | 0.6-1     | 0.5-1.2   | 0.45-1       | 0.5-1      | 0.6-1.1
VOUT (V)                      | 0.38-0.81     | 0.55-0.95 | 0.45-1.14 | 0.4-0.95     | 0.3-0.45   | 0.5-1
Headroom (mV)                 | 190           | 50        | 50        | 50           | 150-200    | 100
Load Current: IMAX (mA)       | 6             | 500       | 4.6       | 3.4          | 2          | 210
Load Regulation (mV/mA)       | 1.8           | 0.25      | 10        | NA           | 5.6        | 0.075
Controller Current: ICTL (µA) | 36-300        | 300       | 24-221    | 8.1-258      | 14         | 22.6-98.5
Total Capacitance (nF)        | 0.2           | 1.5       | 0.8       | 0.1          | 0.4        | 20O
Active Area (mm2)             | 0.0204        | 0.158     | 0.021     | 0.03         | 0.0023     | 0.1926
Peak Current Efficiency (%)   | 99.4          | 99.99     | 98.3      | 99.2         | 99.8       | NA
Droop (mV) @ Load-step (mA)   | 163@3         | 35@100    | 40@0.7    | 34@1.44      | 40@1.06    | 36@200
FOM1 (ns/mA)                  | 0.32          | 0.4       | NA        | 3294.11      | 8.7E-5     | 6.5
FOM2 (ps)*                    | 666           | 1.6       | 76.5      | 26682        | 105        | 57.14
FOM1 = Droop recovery time / Load-step; FOM2 = (Transient time)*ICTL/IMAX; NA = Insufficient data; * Normalized to technology node
Table 5.1: Comparison with LDOs for voltage regulation.
CHAPTER 6
QUAD-OUTPUT ELASTIC SWITCHED CAPACITOR CONVERTER
6.1 Introduction
With an increasing number of power domains, fine-grain per-core DVFS and decreasing
decoupling capacitance per domain, power delivery and management in digital SoCs continue
to pose serious challenges. Switched capacitor (SC) converters have gained popularity due to
their high efficiency and ease of on-chip integration [70–73]. However,
SC converters provide high efficiency only over a limited input and output range, as they
are designed and optimized for discrete conversion ratios. Multi-ratio SC DC-DC
converters provide high energy efficiency at multiple conversion ratios and can therefore
extend this range [50, 51, 53, 74–80]. Selected multi-ratio SC designs have been discussed
in Chapter 2. Unfortunately, in spite of the extended high-efficiency
range, multi-ratio SCs usually suffer from low energy density due to low on-die capacitance
density.
In multi-core SoC designs, the output of a switched capacitor voltage regulator (SCVR) is
typically regulated with linear voltage regulators (VRs) (including LDOs) to provide power
to local grids. However, if the regulated voltage is far from the SCVR output voltage,
power efficiency drops significantly. In such a case, a per-core SCVR would be a better
choice. However, a per-core SCVR has the following major shortcomings: (1) a reduction in
the total capacitance and switch area available per core, and (2) inefficient usage of
capacitance and switch resources when a core is in sleep or idle mode. These shortcomings
lead to a reduction in power conversion efficiency. To address this, [53] presents an
integrated dual-output SC converter with dynamic power-cell
allocation. However, while such redistribution of resources is easy for a two-core
system, the distribution logic becomes exponentially more complex for three or more
Figure 6.1: Detailed top-level structure of the Quad-Output Elastic Switched Capacitor
Converter supplying power to 4 cores.
cores.
In this chapter, we present a quad-output elastic SC (QOESC) converter with per-core
LDOs (Fig. 6.1) that provides a regulated voltage supply to 4 cores. As opposed to a baseline
design where an SC converter (SCC) is dedicated per core, the current design routes power
on demand by sharing the total capacitance network across all the cores and delivering
power to each core in a time-interleaved manner. As the current demand of a particular
core increases (as indicated by the duty-cycle of the local phase-based LDO [56, 66, 81]),
more cycles/resources are dynamically and autonomously allotted to the core. If the power
demand increases further, the corresponding SCC moves to a higher output voltage by
dynamically switching the conversion ratio. Each core is supported by three ratios (3⁄4,
1⁄2 and 1⁄4) and the dynamic resource allocation/power management is realized through a
fully digital finite state machine (FSM). Similar to a turbo mode for thermal management, the
proposed topology allows one core to run at a power of approximately 4PMAX while the others
are in standby (approximately 0 power), as opposed to a baseline design where each core
can run at a maximum power of PMAX only. In the following sections of this chapter, we
will discuss the QOESC architecture, design and principles of operation, dynamic control
82
Figure 6.2: Block level diagram for QOESC architecture.
and phase allocation via the FSM design, measurement results and conclusions.
6.2 Architecture, Design and Principles of Operation
Fig. 6.1 shows the top-level structure of the QOESC supplying power to 4 cores. As shown,
each core has its own LDO. For this work, we have used a phase-locked LDO (PLDO), a
continuous-time digital LDO, as it retains the low-voltage operation benefit of a digital
LDO without suffering from the limit cycle oscillations that are present in discrete-time
digital LDOs. For the switched capacitor design, we have used the Extended Binary (EXB)
scheme, which uses two flying capacitors to produce 3 ratios with 1⁄4 resolution. VIN
ranges from 1V to 1.2V and VOUT ranges from 0.15V to 0.9V for this design.
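To make the voltage ranges above concrete, the following minimal sketch lists the ideal
(no-load) SCN output for each ratio and picks the lowest ratio that still leaves headroom
for the LDO. The function names, the `dropout` argument and the selection rule are
illustrative assumptions, not the test-chip's control logic. The efficiency bound
VREG/V_SCN is the usual ideal linear-regulator limit, which ignores quiescent current and
SC losses.

```python
RATIOS = (0.25, 0.5, 0.75)  # conversion ratios supported per core, ascending


def ideal_scn_output(v_in, ratio):
    """Ideal (no-load) SCN output voltage for a given conversion ratio."""
    return v_in * ratio


def pick_ratio(v_in, v_reg, dropout=0.0):
    """Lowest ratio whose ideal output still covers v_reg + dropout (assumed rule)."""
    for ratio in RATIOS:
        if ideal_scn_output(v_in, ratio) >= v_reg + dropout:
            return ratio
    return None  # target not reachable from this input voltage


def ldo_efficiency_bound(v_reg, v_scn):
    """Ideal linear-regulator efficiency limit, ignoring quiescent current."""
    return v_reg / v_scn


if __name__ == "__main__":
    v_in, v_reg = 1.0, 0.4
    ratio = pick_ratio(v_in, v_reg)
    print(ratio, ldo_efficiency_bound(v_reg, ideal_scn_output(v_in, ratio)))
```

With VIN = 1 V and a 0.4 V target, the sketch selects the 1⁄2 ratio, for which the ideal
LDO bound is 0.8; staying on the 3⁄4 ratio would cap it at about 0.53.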
The detailed architecture of the QOESC test-chip is shown in Fig. 6.2. The figure shows
only a single core for ease of representation. The capacitance and switch resources have
been divided into 32 identical resource slices, each forming a unit EXB SC block. The QOESC
design routes power on demand by sharing the total capacitance network across all the cores
and delivering power to each core in a time-interleaved manner. As mentioned before,
when the current demand of a core increases, more resource slices are dynamically and
autonomously allotted to the core. The current demand is indicated by the duty-cycle of
the pulse width modulated (PWM) signals at the inputs of the PFETs of the local phase-based
LDO (the phase-locked LDO and the PWM signals are discussed in upcoming subsections).
The PWM signal forms the input of a 32-bit counter that uses the same reference clock
as the PLDO. The output of the counter, duty_cycle_LDO_PFET_IN, is a digital signature that
measures the duty-cycle and is compared to preset duty-cycle thresholds using a
digital comparator. If the duty-cycle is found to be high (low), then resource slices are
added to (removed from) the core by increasing (decreasing) the number of interleaving
cycles allotted to it. An upper and a lower limit have been set for the total number of
resource slices that can be dedicated to a single core. If the duty-cycle of the PLDO for a
core remains higher than the upper duty-cycle threshold even after reaching the upper limit
of resource slices, then the SCC responds by increasing the conversion ratio. A similar
corollary exists for the case when the duty-cycle is below the lower duty-cycle threshold.
To improve cross regulation during sudden and large voltage droops, every core is provided
with a droop-detector. If a core experiences a droop, the droop-detector triggers the
multiplexer control to raise the switching frequency to its transient value, minimizing the
impact of the droop on the core as well as on the neighboring cores. Fig. 6.3 provides
further clarity by showing the decision flow for resource slice allocation in the QOESC
architecture.
Figure 6.3: Decision flow chart for resource allocation in QOESC architecture.
Figure 6.4: Detailed top-level structure of interleaving and resource sharing scheme loop
control.
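The counter-and-comparator step described above can be sketched as follows. This is a
behavioral illustration only: the 32-tick window, the threshold counts and the function
names are hypothetical and are not taken from the test-chip.

```python
def duty_cycle_count(pwm_samples):
    """Counter output: number of reference-clock ticks on which the PWM was high."""
    return sum(1 for level in pwm_samples if level)


def slice_decision(count, upper_count, lower_count):
    """Digital comparator against preset thresholds (both expressed in counts)."""
    if count > upper_count:
        return "add"      # duty-cycle high: the core needs more resource slices
    if count < lower_count:
        return "remove"   # duty-cycle low: the core has surplus slices
    return "hold"


# Hypothetical observation window and thresholds, chosen only for illustration.
WINDOW, UPPER, LOWER = 32, 26, 10
heavy_load = [1] * 28 + [0] * (WINDOW - 28)
light_load = [1] * 8 + [0] * (WINDOW - 8)
mid_load = [1] * 16 + [0] * (WINDOW - 16)

if __name__ == "__main__":
    for name, samples in (("heavy", heavy_load), ("light", light_load), ("mid", mid_load)):
        print(name, slice_decision(duty_cycle_count(samples), UPPER, LOWER))
```

Expressing the thresholds directly in counts keeps the comparison purely integer, which
mirrors a digital comparator more closely than a floating-point duty-cycle would.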
6.2.1 Quad-output Elastic SCC Design
The SC network (SCN) uses the EXB scheme, mentioned before, to generate multiple
step-down ratios. Fig. 6.4 shows the top-level structure. The current design supports
conversion ratios of 3⁄4, 1⁄2 and 1⁄4. The goal of the design is to flexibly allocate
capacitor and switch resources according to the load demand of each of the 4 cores. Further,
for each output, 32 time-interleaved phases are generated to reduce the voltage ripple at
the SCN output. The 32-stage time interleaving is realized through 7 circular 32-bit shift
registers (bank1) for each of the phases, as shown in the columns of the switch control
table in Fig. 6.6. The resource sharing is implemented through 4 circular 32-bit shift
registers (bank2) for the 4 cores. The 32 unit SC blocks obtain their phase inputs from the
shift registers in bank1 and generate different ratios in a periodic sequence. The registers
in bank2 are responsible for making the correct connection between the desired ratio and the
desired core. To add further clarity, consider traditional interleaved SC designs, where a
single ratio is generated for all the phases and is connected to a single output in each of
those phases. In the case of QOESC, a sequence of ratios is generated periodically based
upon the inputs provided by the bank1 registers, and each ratio is directed to the desired
core through additional switches that are controlled by the registers in bank2. It takes 4
cycles to generate each ratio; therefore, 8 ratios are generated in 32 cycles. To summarize,
a sequence of 8 step-down voltage states is generated and allocated to the 4 cores as
determined by the individual load requirements.
Figure 6.5: Detailed top-level circuit diagram of interleaving and resource sharing scheme
loop control.
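The slot arithmetic above (32 cycles, 4 cycles per ratio, hence 8 ratio slots) can be
illustrated with a small model. The representation below, in which each core holds a 32-bit
ownership mask that is rotated like a circular shift register, is an assumption made for
illustration; the actual bank2 register contents are not specified in the text.

```python
PHASES = 32
CYCLES_PER_RATIO = 4
SLOTS = PHASES // CYCLES_PER_RATIO  # 8 ratio slots per 32-cycle period
FULL = (1 << PHASES) - 1


def core_masks(slot_owner, cores=4):
    """Expand a list of 8 slot owners (core indices) into per-core 32-bit masks."""
    if len(slot_owner) != SLOTS:
        raise ValueError("expected one owner per ratio slot")
    masks = [0] * cores
    for slot, core in enumerate(slot_owner):
        for k in range(CYCLES_PER_RATIO):
            masks[core] |= 1 << (slot * CYCLES_PER_RATIO + k)
    return masks


def rotate(mask, width=PHASES):
    """One step of a circular shift register of the given width."""
    return ((mask << 1) | (mask >> (width - 1))) & ((1 << width) - 1)


def popcount(mask):
    return bin(mask).count("1")


if __name__ == "__main__":
    masks = core_masks([0, 0, 1, 1, 2, 2, 3, 3])  # equal split: 2 slots per core
    print([hex(m) for m in masks])
```

In this model every phase is owned by exactly one core, and rotating all masks together
preserves that property, which is the invariant a time-interleaved sharing scheme needs.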
Fig. 6.5 shows the detailed circuit-level implementation. Each of the two register
banks supports parallel loading during initialization. The loading operation is performed
when the signal sc_en is set to 0. The initialization values dictate the series of ratios as
well as the core connections. The initialization values that pertain to core connections can
be provided either by the phase allocation FSM, when fsm_en is equal to 1, or externally,
when fsm_en is set to 0. All the above-mentioned registers are synchronously
driven by the SCN reference clock, which provides a single switching frequency for all
the ratios. In general, the clock frequency is set to a value that provides the highest
efficiency for the core with the highest power consumption.
Figure 6.6: Extended binary switched capacitor converter circuit diagram and switch
control tables for 3⁄4, 1⁄2 and 1⁄4 ratios.
6.2.2 Extended Binary Bit Switched Capacitor
The EXB scheme can be used to generate multiple step-down ratios in binary resolution [82,
83]. Unlike conventional binary representation, EXB refers to a modified signed-digit
representation with 0, 1 and -1 as its numerals. This allows multiple representations of
the same number through non-unique EXB codes. To provide further clarity, the following
example is provided. Any number N in the range (0, 1) can be represented in the form:
\[
N = A_0 + \sum_{j=1}^{n} A_j \cdot 2^{-j} \tag{6.1}
\]
where A0 can be either 0 or 1, Aj takes any of the three values -1, 0 and 1, and n defines
the resolution. For illustration, the codes {1 0 -1 1} and {1 0 0 -1} both represent 7/8:
\[
\{1\ 0\ {-1}\ 1\} = 1 + 0 \cdot 2^{-1} - 1 \cdot 2^{-2} + 1 \cdot 2^{-3} = 7/8,
\]
\[
\{1\ 0\ 0\ {-1}\} = 1 + 0 \cdot 2^{-1} + 0 \cdot 2^{-2} - 1 \cdot 2^{-3} = 7/8
\]
The process of generating the different EXB codes for a given N is intuitive and iterative.
The procedure starts with the conventional signed binary code representation of the number
N. We begin from any Aj that is equal to "1". The first step is to add 1 to the jth column
or location in the signed bit representation of the number N. This results in Aj becoming
"0", with a carry into the column on its left. In order to maintain the original value of N,
we then add "-1" to the jth location. This makes the original Aj, which was "1", convert to
"-1". In short, replace a 1 by -1 and then add a 1 to the bit on the left. The procedure is
repeated for all Aj = 1 in the original code and for all Aj = 1 in each newer EXB code
generated. Since the resolution n is equal to 2 for this design, the EXB generation process
is shown below for 3⁄4, 1⁄2 and 1⁄4.
N=3⁄4
(1) {0 1 1} + {0 0 1} = {1 0 0}, {1 0 0} + {0 0 -1} = {1 0 -1}
(2) {0 1 1} + {0 1 0} = {1 0 1}, {1 0 1} + {0 -1 0} = {1 -1 1}
N=1⁄2
(1) {0 1 0} + {0 1 0} = {1 0 0}, {1 0 0} + {0 -1 0} = {1 -1 0}
N=1⁄4
(1) {0 0 1} + {0 0 1} = {0 1 0}, {0 1 0} + {0 0 -1} = {0 1 -1}
(2) {0 1 -1} + {0 1 0} = {1 0 -1}, {1 0 -1} + {0 -1 0} = {1 -1 -1}
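The code sets above can be cross-checked by brute force. The sketch below does not
implement the iterative carry procedure; instead it enumerates every digit combination
allowed by equation 6.1 and keeps those that evaluate to the target. The function names are
illustrative.

```python
from fractions import Fraction
from itertools import product


def exb_value(code):
    """Value of an EXB code (A0, A1, ..., An): A0 + sum of Aj * 2^-j."""
    return sum(Fraction(a, 2 ** j) for j, a in enumerate(code))


def exb_codes(target, n):
    """All codes with A0 in {0, 1} and Aj in {-1, 0, 1} that evaluate to target."""
    return [
        (a0,) + rest
        for a0 in (0, 1)
        for rest in product((-1, 0, 1), repeat=n)
        if exb_value((a0,) + rest) == target
    ]


if __name__ == "__main__":
    for num, den in ((3, 4), (1, 2), (1, 4)):
        print(f"{num}/{den}:", exb_codes(Fraction(num, den), n=2))
```

For n=2 the enumeration finds three codes for 3⁄4, two for 1⁄2 and three for 1⁄4. Exact
fractions are used so that the equality test is not affected by floating-point rounding.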
For 3⁄4 and 1⁄4 there are a total of 3 codes each, including the standard binary signed bit
representation. However, for 1⁄2 there are only 2 codes. This is because representing 1⁄2
requires a resolution of only n=1. For a number N with resolution n (equation 6.1), the
minimum number of EXB codes is n+1. This is because each Aj that is equal to "1" in the
conventional binary code with resolution n generates a new EXB code and a carry.
Furthermore, since each generated EXB code propagates a carry, each Aj that is equal
to "0" in the binary code will turn into a "1", which will result in a newer code. Another
conclusion that can be drawn is that, for each Aj = 1 in the original (signed binary) code
and in the newly generated EXB codes of a given N, there will be at least one Aj = -1 in
another EXB code.
Figure 6.7: Switched Capacitor configuration for N=3/4 based on EXB codes {1 -1 1},
{1 0 -1} and {0 1 1}.
Figure 6.8: Switched Capacitor configuration for N=1/2 based on EXB codes {1 -1 0} and
{0 1 0}.
Figure 6.9: Switched Capacitor configuration for N=1/4 based on EXB codes {1 -1 -1},
{0 1 -1} and {0 0 1}.
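N = 3/4:
\[
\begin{aligned}
0 \cdot V_{IN} + 1 \cdot V_{C1} + 1 \cdot V_{C2} &= V_{OUT} \\
1 \cdot V_{IN} + 0 \cdot V_{C1} - 1 \cdot V_{C2} &= V_{OUT} \\
1 \cdot V_{IN} - 1 \cdot V_{C1} + 1 \cdot V_{C2} &= V_{OUT}
\end{aligned}
\]
N = 1/2:
\[
\begin{aligned}
0 \cdot V_{IN} + 1 \cdot V_{C1} + 0 \cdot V_{C2} &= V_{OUT} \\
1 \cdot V_{IN} - 1 \cdot V_{C1} + 0 \cdot V_{C2} &= V_{OUT}
\end{aligned}
\]
N = 1/4:
\[
\begin{aligned}
0 \cdot V_{IN} + 0 \cdot V_{C1} + 1 \cdot V_{C2} &= V_{OUT} \\
0 \cdot V_{IN} + 1 \cdot V_{C1} - 1 \cdot V_{C2} &= V_{OUT} \\
1 \cdot V_{IN} - 1 \cdot V_{C1} - 1 \cdot V_{C2} &= V_{OUT}
\end{aligned}
\]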
Figure 6.10: KVL equations for the SC configurations for N=3⁄4, 1⁄2 and 1⁄4 (Fig.6.7, Fig.6.8,
Fig.6.9).
The various EXB codes of a given number N ∈ (0,1) can be translated into a
sequence of SCC topologies that creates an output voltage such that the ratio
of VOUT to VIN is equal to N. For such a step-down SCC, the circuit consists of
a voltage source VIN, n flying capacitors Cj and an output load. The connection of VIN is
defined by the coefficient A0 in each of the EXB codes for a given N: if A0=1, VIN
is connected, and if A0=0, VIN is not connected. The flying capacitors Cj are always
connected serially according to the coefficients Aj in the EXB codes. If Aj=1, Cj is
connected in series to the output with the same polarity as the load; if Aj=-1, Cj is
connected in series to the output with the opposite polarity; and if Aj=0, Cj is bypassed.
Fig. 6.7, Fig. 6.8 and Fig. 6.9 show the one-to-one translation of the EXB codes into
circuit configurations. If the SC cycles through these configurations, then at steady state
the voltage across capacitor C1 (VC1) is equal to VIN/2 and the voltage across capacitor C2
(VC2) is equal to VIN/4. The ideal value of VOUT is N·VIN.
As mentioned before, since there is a "-1" corresponding to every "1" in the EXB codes,
every flying capacitor goes through both a discharge and a charge cycle and thereby settles
at its desired nominal value. The capacitors do not need to start from the steady-state
value; they can charge from an initial condition of 0V.
Fig. 6.10 shows the circuit equations for the different configurations of the three
ratios. The equations follow Kirchhoff's Voltage Law (KVL). It can be noted that
for the ratios of 3⁄4 and 1⁄4 there are three unknowns (VOUT, VC1 and VC2) and three
equations. For the ratio of 1⁄2, there are two unknowns (VOUT, VC1) and two equations. Since
the equations for a given N are linearly independent of one another, they lead to a unique
solution for the unknown variables. Here, I have used n=2 for ease of explanation and
demonstration. In practice, these results can be extended to any natural number n. In
summary, for a resolution of n, the EXB scheme requires one input source, n flying
capacitors and one output node or capacitor. By running through all the codes for the given
N, the SCC is in fact subjecting the capacitors (including the output capacitor) to the set
of equations shown in Fig. 6.10.
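The uniqueness argument can be checked numerically. The sketch below builds one KVL
equation per EXB code, A0·VIN + A1·VC1 + A2·VC2 = VOUT, and solves the resulting square
system exactly. The solver and helper names are illustrative, VIN is normalized to 1, and a
capacitor that is bypassed in every code of a ratio is dropped from the unknowns (as for
VC2 at N=1⁄2).

```python
from fractions import Fraction


def solve(a, b):
    """Gauss-Jordan elimination with exact fractions; `a` must be square and non-singular."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]


def steady_state(codes, v_in=1):
    """Solve the KVL equations of all codes for VOUT and the capacitor voltages."""
    n_caps = len(codes[0]) - 1
    # Keep only capacitors that appear with a non-zero coefficient in some code.
    used = [j for j in range(1, n_caps + 1) if any(c[j] != 0 for c in codes)]
    # Rearranged form: -VOUT + sum(Aj * VCj) = -A0 * VIN
    rows = [[-1] + [c[j] for j in used] for c in codes]
    rhs = [-c[0] * v_in for c in codes]
    sol = solve(rows, rhs)
    return {"VOUT": sol[0], **{f"VC{j}": v for j, v in zip(used, sol[1:])}}


CODES = {
    "3/4": [(0, 1, 1), (1, 0, -1), (1, -1, 1)],
    "1/2": [(0, 1, 0), (1, -1, 0)],
    "1/4": [(0, 0, 1), (0, 1, -1), (1, -1, -1)],
}

if __name__ == "__main__":
    for ratio, codes in CODES.items():
        print(ratio, steady_state(codes))
```

For all three ratios the solution gives VC1 = VIN/2 and, where C2 is used, VC2 = VIN/4,
with VOUT equal to N·VIN, which agrees with the steady-state values stated above.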
This design scheme allows for a seamless multi-output SCN design through an arrangement
of flying capacitors, reconfiguration switches and digital control. In the implemented
design n=2; therefore, we can generate the 3⁄4, 1⁄2 and 1⁄4 ratios. For each ratio in this
scheme, 4 EXB representations are used, which translate into 4 circuit arrangements. Since
the maximum number of representations possible for n=2 is 3, at least one of the
representations is repeated. Through this mapping, a multi-phase SC design is formed that
can generate 2^n − 1 ratios, where n is the total number of flying capacitors. The circuit
diagram and the switch control table for generating these ratios are provided in Fig. 6.6.
Figure 6.11: Block Diagram of PLDO and the prototype core
6.2.3 Per-core LDO
The SCN produces discrete output voltage levels, which are regulated via per-core
LDOs (Fig. 6.11). The current design utilizes a phase-locked LDO (PLDO) with 16 parallel
phases [56, 66, 81]. The PLDO utilizes two clocks: FREF, the output of a reference voltage
controlled oscillator (VCO), and FLOC, generated by a local VCO (LVCO) that is powered by
VREG. Fig. 6.11 illustrates the circuit implementation with a divide ratio of N=1. The
reference clock and the LVCO outputs are used to clock a 32-bit Johnson Counter (JC) with
overrun protection [66], PWM generation and embedded output drivers. The outputs of the
different JC stages (Ri for the reference clock and Li for the local clock) form multi-phase,
32x subsampled versions of the reference clock and the LVCO clock. At steady state,
the phase difference between FREF and FLOC locks to a constant value and turns the
power PMOS on for exactly the duration that the load current demands to keep VREG
constant. The phase locking occurs at each stage of the JC, and the total current provided
by all the PMOS devices in a time-interleaved manner enables voltage regulation. If a load
transient causes VREG to decrease from its steady-state value, then the LVCO responds
by slowing down FLOC and stretching the pulse at Li. This perturbs the phase locking and
creates additional phase difference, allowing the pull-up devices to supply higher current
until re-locking and regulation are again achieved. Similarly, if VREG increases from its
steady-state value, FLOC speeds up, which reduces the phase difference, and the loop goes
out of lock. This in turn reduces the current supplied by the pull-up devices and ultimately
reduces VREG until re-locking is achieved. The process locks FREF to FLOC, and the
multi-phase design enables a ripple-free VREG. The overrun protection (OP) circuit removes
any phase aliasing in the XOR-based phase detector (PD). For each phase, the duty-cycle of
the PWM signal at the input of the PLDO's power PFET indicates the current demand of the
local core. A high-speed clock samples the PWM signal of the first phase to produce a 4-bit
digital representation of the load.
6.2.4 Core and Load Circuit
The power network has 4 cores as its load. Each core consists of an SRAM array, an ALU, an
instruction decoder and a three-stage pipeline (Fig. 6.11). Further, scan-programmable DC
load circuits and high-speed noise generation circuits are integrated to mimic a large
dynamic load range and the abrupt load steps characteristic of power gating/un-gating or
power state transitions under realistic load conditions. High-speed pads are provided
to excite load transients as well as to observe the output voltage node VREG.
6.2.5 SCN Clock and Cross-domain Regulation
Since the SCN is designed to act as a converter only, it is run at a clock frequency that
provides the highest efficiency for the core with the highest power consumption. This is
implemented through a scan-programmable VCO, whose frequency is set by the highest SCN
conversion ratio. Cross-domain regulation/noise is minimized by (1) detecting voltage droops
at a core with a high-speed droop detector and (2) temporarily boosting the SCN clock to a
transient frequency (FSW_TRANSIENT = 30MHz) as shown in Fig. 6.4. This allows the flying
capacitors to quickly replenish their charge and reduces cross-domain noise propagation.
Figure 6.12: Timing diagram of dual loop control
6.3 Dynamic Dual-loop Control and Phase Allocation via FSM
Altogether, there are 3 nested loops operating simultaneously, involving each of the cores
and the QOESC network. The first loop is the phase-locked loop of the LDO, which is
instrumental in regulating the core supply voltage. The resource allocation for the SCN is
implemented through two loops that form the dynamic dual-loop control.
If, for a given core, the duty cycle of the PWM rises above (falls below)
the predefined upper (lower) threshold, then there is a deficit (surplus) of resources
allocated to it and the FSM increases (decreases) the resource slices allotted to that
core. If reallocating resources still does not reduce (increase) the duty-cycle, then the
conversion ratio is increased (decreased). Such dual-loop control guarantees that optimal
power is routed to each core, while providing fine-grain elasticity from the SCN and
regulation from the PLDO.
Figure 6.13: (a) FSM for resource allocation and flowchart of operation principle
(b) Decision flow for resource slice allocation
Fig. 6.12 shows the timing diagram of the dual loop control for the case when the current
demand increases. As the current demand increases, the duty-cycle of the power PMOS devices
also increases, i.e. they remain on for a longer time to supply the higher load current. If
the PMOS duty-cycle, indicated by the signal duty_cycle_LDO_PFET_IN, rises above
duty_cycle_limit_high, a preset threshold, then additional capacitance and switch area
resources are allocated, as indicated by the c1 graph. Finally, if duty_cycle_LDO_PFET_IN
remains high even after the maximum allowed number of resource slices has been allocated,
the ratio control loop increases the conversion ratio. For this design, maximum and minimum
bounds on the total number of phases allotted to a core have also been set (a maximum of 20
and a minimum of 4).
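The two allocation loops can be summarized in a short behavioral sketch. The bounds of 4
and 20 phases come from the text; the step of 4 phases per decision, the threshold values
used in the example and the function name are assumptions made for illustration.

```python
RATIOS = ("1/4", "1/2", "3/4")        # ordered from lowest to highest output voltage
MIN_PHASES, MAX_PHASES = 4, 20        # per-core bounds stated in the text
STEP = 4                              # assumed: one ratio slot (4 phases) per decision


def dual_loop_step(duty, phases, ratio_idx, hi, lo):
    """One decision of the resource loop, falling back to the ratio loop at the bounds."""
    if duty > hi:
        if phases < MAX_PHASES:
            return min(phases + STEP, MAX_PHASES), ratio_idx    # allocate more phases
        return phases, min(ratio_idx + 1, len(RATIOS) - 1)      # raise conversion ratio
    if duty < lo:
        if phases > MIN_PHASES:
            return max(phases - STEP, MIN_PHASES), ratio_idx    # release phases
        return phases, max(ratio_idx - 1, 0)                    # lower conversion ratio
    return phases, ratio_idx                                    # within thresholds: hold


if __name__ == "__main__":
    state = (8, 1)  # 8 phases at the 1/2 ratio
    for duty in (0.9, 0.9, 0.9, 0.9, 0.5):
        state = dual_loop_step(duty, *state, hi=0.8, lo=0.3)
        print(duty, state[0], RATIOS[state[1]])
```

The ratio only changes once the phase count has saturated at a bound, which reflects the
ordering described above: resources are reallocated first and the conversion ratio is
switched only if that is not sufficient.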
Fig. 6.13(a) shows the operation of the FSM for the phase allocation loop
through an example, and Fig. 6.13(b) explains it through a flowchart. In this example,
core 1 (C1) indicates that it needs additional resources and core 3 (C3) indicates that it
has a surplus. The FSM assigns two pivots, one at the beginning of the phases of C1 and the
other at the beginning of the phases of C3. Once the pivots are decided, the rest of the
registers are clock-gated and only the registers between the two pivots are shifted in the
clockwise direction from C1 towards C3.
Figure 6.14: Circuit level implementation of FSM
The circuit implementation of the FSM is shown in Fig. 6.14. Since there
are 4 cores, we need 2 bits to represent each core; hence, two 8-bit circular shift
registers are used to store the resource allocation states. The outputs Q0(0:7) and Q1(0:7)
represent the current resource allocation states. The duty-cycle of the PLDO of each core is
compared to the preset upper and lower thresholds to generate two 4-bit signals, ds_up(0:3)
and ds_down(0:3). The duty-cycle information from all four cores, along with the current
Q0(0:7) and Q1(0:7), allows the clock_gating_gen block to determine the pivots. If a
resource demand is noted, then clock gating is removed for all the flip-flops between the
two pivots and a right shift operation is performed. Further, the sc_en signal is set to 0
for 4 cycles and init_value generates fresh sets of values to update pf_con and pf_sc_bus
(the destinations of pf_con and pf_sc_bus can be seen in Fig. 6.5).
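One plausible reading of the pivot-and-shift operation is sketched below. The ring of 8
slots holds a 2-bit core ID per slot; shifting the span between the two pivots by one
position hands one slot from the donor core to the needy core. The slot bounds (1 and 5,
i.e. 4 and 20 phases at 4 phases per slot), the mapping of C1..C4 to IDs 0..3, the
LSB/MSB assignment to Q0/Q1 and all function names are assumptions for illustration, not
the circuit of Fig. 6.14.

```python
MIN_SLOTS, MAX_SLOTS = 1, 5  # assumed: 4 and 20 phases at 4 phases per slot


def first_slot(slots, core):
    """Pivot: index at which the core's contiguous run begins on the ring."""
    for i in range(len(slots)):
        if slots[i] == core and slots[i - 1] != core:  # slots[-1] wraps around the ring
            return i
    raise ValueError(f"core {core} owns no slot")


def reallocate(slots, needy, donor):
    """Move one slot from donor to needy by shifting the span between the pivots."""
    if slots.count(needy) >= MAX_SLOTS or slots.count(donor) <= MIN_SLOTS:
        return list(slots)  # bounds reached: leave the allocation unchanged
    n = len(slots)
    start, end = first_slot(slots, needy), first_slot(slots, donor)
    span = (end - start) % n
    new = list(slots)
    for k in range(1, span + 1):
        # Each register between the pivots takes the value of its predecessor.
        new[(start + k) % n] = slots[(start + k - 1) % n]
    return new


def to_bitplanes(slots):
    """Split 2-bit core IDs into two 8-bit registers (assumed: Q0 = LSBs, Q1 = MSBs)."""
    return [s & 1 for s in slots], [s >> 1 for s in slots]


if __name__ == "__main__":
    before = [0, 0, 1, 1, 2, 2, 3, 3]           # equal split between C1..C4
    after = reallocate(before, needy=0, donor=2)
    print(after, to_bitplanes(after))
```

Registers outside the span keep their values, which corresponds to the clock gating of
everything that is not between the two pivots.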
6.4 Measured Results
The design is fabricated in 130nm CMOS, occupies an area of 2mm x 2mm, and uses 4nF of dual
MIMCAP for switching capacitance and 0.3 mm2 of switch area. The test interface package
used is a QFN. Fig. 6.15 shows the power efficiency of the SCC at Core1 for the three ratios
as a function of the output load current (all the resources are allocated to Core1). A peak
efficiency of 87% is measured. The power efficiency of the SCC+LDO as a function of the
output voltage is measured at Core1 (Fig. 6.15(a)), showing peak efficiencies of 87%, 81%
and 67% for SCN ratios of 3⁄4, 1⁄2 and 1⁄4. Fig. 6.15(b) plots the power efficiency versus
output voltage for a constant load current of 1mA. The graph demonstrates the typical
behavior of a multiple-ratio switched capacitor design. The three peaks correspond to the
three target ratios of 3⁄4, 1⁄2 and 1⁄4.
Figure 6.15: Measured SC power with respect to (a) varying load current (b) varying output
voltage.
Fig. 6.16 shows a wake-up time of less than 600ns for the SCC+LDO as the four cores
are simultaneously enabled. Fig. 6.17 plots the QOESC internal resistance versus
switching frequency for the 3 conversion ratios. The peak efficiency is measured at the
knee of the curve, as pointed out in the figure.
Figure 6.16: Measured scope capture showing boot-up of all the 4 cores using QOESC.
Figure 6.17: QOESC internal resistance (ROUT) versus switching frequency.
The output voltage is measured as a function of the output load current for the proposed
design and compared with a baseline design where each core is assumed to have a dedicated
SCC and LDO, thus allocating 1/4th of the SCN resources per-core. In Fig. 6.18, we note
more than 2X increase in output current at iso-output voltage and 64%, 50% and 43%
increase in the output voltage for SCN ratios of 3⁄4, 1⁄2 and 1⁄4.
Power efficiency is measured as a function of output power for the proposed and base-
line designs for all three ratios and the results are shown in Fig. 6.19. We note 2-2.7X
increase in the output power as well as 68%-90% peak increase in power efficiency in the
proposed design. Similarly, the output ripple of the SCN (which is indicative of the total
SCN losses) is measured for three ratios for the proposed and the baseline designs and
shows 43% to 52% reduction of ripple (Fig. 6.20).
For a full 1 mA load step at a target dropout of 300 mV, the droop recovery time is 650ns
through the dual-loop SCC+LDO feedback (Fig. 6.21).
As a result of the increased operating range from the QOESC converter, the
voltage-frequency trade-off of a core shows an extended range of 18% in power and 1.5X in
operating frequency, thus enabling new DVFS states per core (Fig. 6.22).
Figure 6.18: Measured output voltage of proposed vs baseline design vs. varying load
current shows improvement of 43-64%.
Figure 6.19: Measured power efficiency of proposed vs baseline design by varying output
power shows increase of 68-90% in efficiency.
Dynamic resource allocation across various operating states shows (1) extended operating
configurations and (2) high power efficiency in all states (Fig. 6.23). The table in the
figure shows the amount of resources allocated to each core. This is also a measure of the
operating power of the core, i.e. the higher the operating power of a core, the more
resources are allocated to it. State1 represents a case where, in the baseline design, all
the cores are operating at the same power P. The power P represents the maximum operating
power possible for a core in the baseline design. As shown in the figure, in this state the
proposed and baseline systems operate at the same efficiency. With QOESC, additional new
states such as State2, State3 and State4 are possible, where an individual core can operate
at power levels up to ~4P (subject to the availability of resources when other cores are in
idle or sleep mode) while maintaining similar power efficiency. It is important to note that
these states are not available in the baseline, either because of the high voltage dropout
caused by internal resistance and high load current or because the power efficiency is at
an unacceptable level.
Figure 6.20: Measured output voltage ripple of proposed vs baseline design for different
load current shows improvement of 43-50%.
Figure 6.21: Scope capture demonstrating the regulation under load step.
The use of transient boosting during a voltage droop reduces cross-domain noise by as
much as 85% (Fig. 6.24) (less than 12mV/V of cross-domain noise). The chip micrograph
is shown in Fig. 6.25. Table 6.1 shows competitive metrics compared to state-of-the-art
designs.
Figure 6.22: Power vs frequency for one of the cores showing improved operating
range.
6.5 Summary
A quad-output elastic SCC with per-core LDO shows peak efficiency of 87% and 150%
increase in operating frequency range, through dynamic allocation of SC and switch re-
sources through an all-digital FSM.
Figure 6.23: Measured system efficiency shows that proposed design through flexible
allocation of resources allows cores to perform at higher power states at consistent
efficiency.
Figure 6.24: Measured data shows coupling on steady state cores can be reduced by
transient boosting.
Figure 6.25: Chip micrograph and characteristics.
Work                        This Work       [51]            [78]            [79]            [53]
Technology (nm)             130             180             65              350             28
Topology                    Step-down       Step-down       Step-up/down    Step-up         Step-down
Number of outputs           4               3               2               2               2
Passive                     On-chip         On-chip         On/Off-chip     Off-chip        On-chip
VIN (V)                     1-1.2           0.9-4           0.85-3.6        1.1-1.8         1.3-1.6
VOUT (V)                    0.15-0.9        0.6, 1.2, 3.3   0.1-1.9         2, 3            0.4-0.9
Total Capacitance (nF)      4               3               1000            9400            8.1
Power Efficiency (ηPEAK)    87%             81%             95.8%           89.5%           83%
Max load per Output (mA)    6.4             0.033           1 or 10         12              100
Regulation                  LDO             Freq-mod        Freq-mod        SHAOT           Freq-mod
Multi-Ratio                 Yes (3 ratios)  Yes (3 ratios)  Yes (6 ratios)  Yes (2 ratios)  Yes
Fully Integrated            Yes             No              No              Yes             No
Elastic SC allocation       Yes             No              Partial         No              Yes
Power density (µW/mm2)      1800            250             N/A             87000           150000
Table 6.1: Comparison table with other SC topologies.
CONCLUSION
In this thesis, a power architecture solution has been provided for power delivery net-
works in multicore SoCs, with guardband reduction and consistent performance as key
goals. In order to address the key challenges of wide dynamic load range and variations,
this work proposed multiple approaches geared towards the different components that form
the power delivery network (PDN).
The various approaches, focused primarily on integrated LDOs and switched capacitor
(SC) converters, have been organized as different chapters in the thesis. The leakage
current supply circuit, a digital assist scheme for a conventional analog LDO, lowers the
maximum current requirement of the analog LDO to reduce the minimum dropout voltage
(VDO,MIN) by 30-38% and the core power by 21-28% at iso-frequency conditions. This is
followed by a discussion on all-digital LDOs, where the operation and analysis of digital
LDOs are provided and their merits with respect to low-voltage operation and high power
efficiency are highlighted. In the chapter on the Unified Voltage and Frequency Regulator
(UVFR), we introduce a single control loop that unifies supply voltage and frequency
regulation. In this design, due to the tight coupling between clock and supply, the local
clock frequency autonomously adapts to variations in the supply voltage, thereby allowing
an 18-27% reduction in voltage guardband at iso-performance. Finally, we demonstrate a
fully-integrated quad-output multi-ratio elastic switched capacitor converter with per-core
LDO that shows an improvement of 68-75% in power efficiency and a 150% increase in core
operating frequency, with respect to a baseline SC design, through dynamic allocation of SC
and switch resources.
For all the above techniques and schemes, detailed descriptions of the design principles and
implementation from a circuits perspective have been provided. Comprehensive analysis of
the designs is shown through theoretical models, simulations and measurements from silicon
test-chips taped out in scaled CMOS nodes.
SoC power delivery network designs are typically application and specification
oriented. As such, there are several perspectives, and for each perspective many
permutations of implementation exist. Hence, a single monolithic solution for the entire
hierarchical PDN is impractical. Therefore, the approach followed in this thesis is to
optimize and enhance the various components that constitute the PDN, with the governing
notion being to enable system designers to move away from worst-case to adaptive designs.
The underlying theme across all the designs is to sense the dynamic nature of digital loads
and, based on it, make intelligent choices in the mode of the converter or regulator that
eventually lead to a more power efficient design.
Beyond the work described in this thesis, there are two important trends in the area of
power management that will influence future power delivery network (PDN) design. In
order to handle "Big Data", we are noticing a dramatic increase in the number of data
centers. For these data centers, power and energy efficiency is critical to lowering
operating costs. Since AC power must ultimately be converted to DC to be used by compute
and storage elements, DC-DC converters that can handle high voltages and convert them
down to the low digital voltages suitable for SoC designs at high power efficiency will
become extremely critical. The other trend is that of self-powered wearable and implantable
devices and distributed sensor nodes, now more popularly known as components of the
"Internet of Things". These devices will present newer and more complex challenges to PDN
design. IoT devices are characterized by long idle times and sporadic active modes, which
require very high power efficiency during light load conditions and fast switching
between operating modes. Further, many such IoT devices, like sensor nodes at remote
locations, will use multiple energy harvesting sources to enhance battery life. In such
scenarios, the PDN will need to adapt not only to the dynamic load but also to the
varying nature of the supply source. We will continue to see growing challenges in power
management and system design, and we expect that innovations in circuit topology and control
design, as well as a stronger coupling between the power delivery network and load circuits,
will power the next technology revolution.
REFERENCES
[1] Kihwan Choi, Ramakrishna Soma, and Massoud Pedram. “Fine-grained dynamic
voltage and frequency scaling for precise energy and performance tradeoff based on
the ratio of off-chip access to on-chip computation times”. In: IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems 24.1 (2005), pp. 18–28.
[2] Sebastian Herbert and Diana Marculescu. “Analysis of dynamic voltage/frequency
scaling in chip-multiprocessors”. In: Low Power Electronics and Design (ISLPED),
2007 ACM/IEEE International Symposium on. IEEE. 2007, pp. 38–43.
[3] Jason Howard et al. “A 48-core IA-32 message-passing processor with DVFS in
45nm CMOS”. In: Solid-State Circuits Conference Digest of Technical Papers (ISSCC),
2010 IEEE International. IEEE. 2010, pp. 108–109.
[4] Kevin J Nowka et al. “A 32-bit PowerPC system-on-a-chip with support for dynamic
voltage scaling and dynamic frequency scaling”. In: IEEE Journal of Solid-State
Circuits 37.11 (2002), pp. 1441–1447.
[5] Nasser Kurd et al. “Haswell: A family of IA 22 nm processors”. In: IEEE Journal of
Solid-State Circuits 50.1 (2015), pp. 49–58.
[6] Efraim Rotem et al. “Power-management architecture of the Intel microarchitecture
code-named Sandy Bridge”. In: IEEE Micro 32.2 (2012), pp. 20–27.
[7] Eric J Fluhr et al. “5.1 POWER8™: A 12-core server-class processor in 22nm
SOI with 7.6 Tb/s off-chip bandwidth”. In: Solid-State Circuits Conference Digest
of Technical Papers (ISSCC), 2014 IEEE International. IEEE. 2014, pp. 96–97.
[8] Christopher Gonzalez et al. “3.1 POWER9: A processor family optimized for cog-
nitive computing with 25Gb/s accelerator links and 16Gb/s PCIe Gen4”. In: Solid-
State Circuits Conference (ISSCC), 2017 IEEE International. IEEE. 2017, pp. 50–
51.
[9] Teja Singh et al. “3.2 Zen: A next-generation high-performance x86 core”. In:
Solid-State Circuits Conference (ISSCC), 2017 IEEE International. IEEE. 2017,
pp. 52–53.
[10] James W Tschanz et al. “Adaptive body bias for reducing impacts of die-to-die and
within-die parameter variations on microprocessor frequency and leakage”. In: IEEE
Journal of Solid-State Circuits 37.11 (2002), pp. 1396–1402.
[11] James W Tschanz et al. “Dynamic sleep transistor and body bias for active leakage
power control of microprocessors”. In: IEEE Journal of Solid-State Circuits 38.11
(2003), pp. 1838–1845.
[12] Pierce I-Jen Chuang et al. “26.2 Power supply noise in a 22nm z13 microprocessor”. In: Solid-
State Circuits Conference (ISSCC), 2017 IEEE International. IEEE. 2017, pp. 438–
439.
[13] Kelin Kuhn et al. “Managing Process Variation in Intel’s 45nm CMOS Technology.”
In: Intel Technology Journal 12.2 (2008).
[14] Chris H Kim et al. “A process variation compensating technique with an on-die
leakage current sensor for nanometer scale dynamic circuits”. In: IEEE Transactions
on Very Large Scale Integration (VLSI) Systems 14.6 (2006), pp. 646–649.
[15] Jim Tschanz, Keith Bowman, and Vivek De. “Variation-tolerant circuits: circuit solu-
tions and techniques”. In: Proceedings of the 42nd annual Design Automation Con-
ference. ACM. 2005, pp. 762–763.
[16] Shekhar Borkar et al. “Parameter variations and impact on circuits and microarchi-
tecture”. In: Proceedings of the 40th annual Design Automation Conference. ACM.
2003, pp. 338–342.
[17] James Tschanz et al. “Adaptive frequency and biasing techniques for tolerance to
dynamic temperature-voltage variations and aging”. In: Solid-State Circuits Con-
ference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International. IEEE.
2007, pp. 292–604.
[18] David Bull et al. “A power-efficient 32 bit ARM processor using timing-error de-
tection and correction for transient-error tolerance and adaptation to PVT variation”.
In: IEEE Journal of Solid-State Circuits 46.1 (2011), pp. 18–31.
[19] Dan Ernst et al. “Razor: A low-power pipeline based on circuit-level timing specula-
tion”. In: Microarchitecture, 2003. MICRO-36. Proceedings. 36th Annual IEEE/ACM
International Symposium on. IEEE. 2003, pp. 7–18.
[20] Keith A Bowman et al. “Energy-efficient and metastability-immune resilient cir-
cuits for dynamic variation tolerance”. In: IEEE Journal of Solid-State Circuits 44.1
(2009), pp. 49–63.
[21] Wonyoung Kim et al. “System level analysis of fast, per-core DVFS using on-chip
switching regulators”. In: High Performance Computer Architecture, 2008. HPCA
2008. IEEE 14th International Symposium on. IEEE. 2008, pp. 123–134.
[22] Jason Howard et al. “A 48-core IA-32 processor in 45 nm CMOS using on-die
message-passing and DVFS for performance and power scaling”. In: IEEE Journal
of Solid-State Circuits 46.1 (2011), pp. 173–183.
[23] Wonyoung Kim, David Brooks, and Gu-Yeon Wei. “A fully-integrated 3-level DC-
DC converter for nanosecond-scale DVFS”. In: IEEE Journal of Solid-State Circuits
47.1 (2012), pp. 206–219.
[24] Zeynep Toprak-Deniz et al. “5.2 Distributed system of digitally controlled microregulators
enabling per-core DVFS for the POWER8™ microprocessor”. In: Solid-State Circuits
Conference Digest of Technical Papers (ISSCC), 2014 IEEE International. IEEE. 2014,
pp. 98–99.
[25] Noah Sturcken et al. “A switched-inductor integrated voltage regulator with nonlin-
ear feedback and network-on-chip load in 45 nm SOI”. In: IEEE Journal of Solid-
State Circuits 47.8 (2012), pp. 1935–1945.
[26] Rinkle Jain and Seth Sanders. “A 200mA switched capacitor voltage regulator on
32nm CMOS and regulation schemes to enable DVFS”. In: Power Electronics and
Applications (EPE 2011), Proceedings of the 2011-14th European Conference on.
IEEE. 2011, pp. 1–10.
[27] Hanh-Phuc Le et al. “A sub-ns response fully integrated battery-connected switched-
capacitor voltage regulator delivering 0.19 W/mm² at 73% efficiency”. In: Solid-State
Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International.
IEEE. 2013, pp. 372–373.
[28] Gabriel A Rincón-Mora and Phillip E Allen. “A low-voltage, low quiescent current,
low drop-out regulator”. In: IEEE Journal of Solid-State Circuits 33.1 (1998),
pp. 36–44.
[29] Yasuyuki Okuma et al. “0.5-V input digital LDO with 98.7% current efficiency and
2.7-µA quiescent current in 65nm CMOS”. In: Custom Integrated Circuits Confer-
ence (CICC), 2010 IEEE. IEEE. 2010, pp. 1–4.
[30] Vishal Gupta and Gabriel A Rincón-Mora. “A 5mA 0.6 µm CMOS Miller-compensated
LDO regulator with -27dB worst-case power-supply rejection using 60pF of on-chip
capacitance”. In: Solid-State Circuits Conference, 2007. ISSCC 2007. Digest of
Technical Papers. IEEE International. IEEE. 2007, pp. 520–521.
[31] Vishal Gupta, Gabriel A Rincón-Mora, and Prasun Raha. “Analysis and design of
monolithic, high PSR, linear regulators for SoC applications”. In: SOC Conference,
2004. Proceedings. IEEE International. IEEE. 2004, pp. 311–315.
[32] Doyun Kim and Mingoo Seok. “8.2 Fully integrated low-drop-out regulator based
on event-driven PI control”. In: Solid-State Circuits Conference (ISSCC), 2016 IEEE
International. IEEE. 2016, pp. 148–149.
[33] Yan Lu, Wing-Hung Ki, and C Patrick Yue. “17.11 A 0.65 ns-response-time 3.01
ps FOM fully-integrated low-dropout regulator with full-spectrum power-supply-
rejection for wideband communication systems”. In: Solid-State Circuits Conference
Digest of Technical Papers (ISSCC), 2014 IEEE International. IEEE. 2014, pp. 306–
307.
[34] Inna Vaisband et al. “Distributed LDO regulators in a 28 nm power delivery system”.
In: Analog Integrated Circuits and Signal Processing 83.3 (2015), pp. 295–309.
[35] Saad Bin Nasir, Samantak Gangopadhyay, and Arijit Raychowdhury. “5.6 A 0.13 µm
fully digital low-dropout regulator with adaptive control and reduced dynamic sta-
bility for ultra-wide dynamic range”. In: Solid-State Circuits Conference-(ISSCC),
2015 IEEE International. IEEE. 2015, pp. 1–3.
[36] Fan Yang and Philip KT Mok. “A 0.6–1V input capacitor-less asynchronous digital
LDO with fast transient response achieving 9.5 b over 500mA loading range in 65-
nm CMOS”. In: European Solid-State Circuits Conference (ESSCIRC), ESSCIRC
2015-41st. IEEE. 2015, pp. 180–183.
[37] Yong-Jin Lee et al. “A 200-mA Digital Low Drop-Out Regulator With Coarse-Fine
Dual Loop in Mobile Application Processor”. In: IEEE Journal of Solid-State Cir-
cuits 52.1 (2017), pp. 64–76.
[38] Kazuaki Mori et al. “Analog-assisted digital low dropout regulator with fast transient
response and low output ripple”. In: Japanese Journal of Applied Physics 53.4S
(2014), 04EE22.
[39] Mo Huang et al. “20.4 An output-capacitor-free analog-assisted digital low-dropout
regulator with tri-loop control”. In: Solid-State Circuits Conference (ISSCC), 2017
IEEE International. IEEE. 2017, pp. 342–343.
[40] Martin Saint-Laurent et al. “A 28 nm DSP powered by an on-chip LDO for high-
performance and energy-efficient mobile applications”. In: IEEE Journal of Solid-
State Circuits 50.1 (2015), pp. 81–91.
[41] Chao-Chang Chiu et al. “A 0.6 V resistance-locked loop embedded digital low
dropout regulator in 40nm CMOS with 77% power supply rejection improvement”.
In: VLSI Circuits (VLSIC), 2013 Symposium on. IEEE. 2013, pp. C166–C167.
[42] John F Bulzacchelli et al. “Dual-loop system of distributed microregulators with
high DC accuracy, load response time below 500 ps, and 85-mV dropout voltage”.
In: IEEE Journal of Solid-State Circuits 47.4 (2012), pp. 863–874.
[43] James Tschanz et al. “A 45nm resilient and adaptive microprocessor core for dy-
namic variation tolerance”. In: Solid-State Circuits Conference Digest of Technical
Papers (ISSCC), 2010 IEEE International. IEEE. 2010, pp. 282–283.
[44] Nasser Kurd et al. “Next generation Intel Core micro-architecture (Nehalem) clocking”.
In: IEEE Journal of Solid-State Circuits 44.4 (2009), pp. 1121–1129.
[45] Keith Bowman et al. “8.5 A 16nm auto-calibrating dynamically adaptive clock dis-
tribution for maximizing supply-voltage-droop tolerance across a wide operating
range”. In: Solid-State Circuits Conference-(ISSCC), 2015 IEEE International. IEEE.
2015, pp. 1–3.
[46] Dong Jiao, Bongjin Kim, and Chris H Kim. “Design, modeling, and test of a pro-
grammable adaptive phase-shifting PLL for enhancing clock data compensation”.
In: IEEE Journal of Solid-State Circuits 47.10 (2012), pp. 2505–2516.
[47] Keith A Bowman et al. “A 22 nm all-digital dynamically adaptive clock distribution
for supply voltage droop tolerance”. In: IEEE Journal of Solid-State Circuits 48.4
(2013), pp. 907–916.
[48] Michael S Floyd et al. “26.5 Adaptive clocking in the POWER9 processor for voltage
droop protection”. In: Solid-State Circuits Conference (ISSCC), 2017 IEEE Interna-
tional. IEEE. 2017, pp. 444–445.
[49] Dong Jiao, Jie Gu, and Chris H Kim. “Circuit design and modeling techniques for en-
hancing the clock-data compensation effect under resonant supply noise”. In: IEEE
Journal of Solid-State Circuits 45.10 (2010), pp. 2130–2141.
[50] Loai G Salem and Patrick P Mercier. “4.6 An 85%-efficiency fully integrated 15-ratio
recursive switched-capacitor DC-DC converter with 0.1-to-2.2 V output voltage range”.
In: Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE
International. IEEE. 2014, pp. 88–89.
[51] Wanyeong Jung et al. “8.5 A 60%-efficiency 20nW-500µW tri-output fully inte-
grated power management unit with environmental adaptation and load-proportional
biasing for IoT systems”. In: Solid-State Circuits Conference (ISSCC), 2016 IEEE
International. IEEE. 2016, pp. 154–155.
[52] Toke Meyer Andersen et al. “20.3 A feedforward controlled on-chip switched-capacitor
voltage regulator delivering 10W in 32nm SOI CMOS”. In: Solid-State Circuits
Conference-(ISSCC), 2015 IEEE International. IEEE. 2015, pp. 1–3.
[53] Junmin Jiang et al. “20.5 A dual-symmetrical-output switched-capacitor converter
with dynamic power cells and minimized cross regulation for application processors
in 28nm CMOS”. In: Solid-State Circuits Conference (ISSCC), 2017 IEEE Interna-
tional. IEEE. 2017, pp. 344–345.
[54] Saad Bin Nasir, Shreyas Sen, and Arijit Raychowdhury. “A 130nm hybrid low dropout
regulator based on switched mode control for digital load circuits”. In: European
Solid-State Circuits Conference, ESSCIRC Conference 2016: 42nd. IEEE. 2016,
pp. 317–320.
[55] Samantak Gangopadhyay et al. “Digitally-assisted leakage current supply circuit for
reducing the analog LDO minimum dropout voltage”. In: Custom Integrated Circuits
Conference (CICC), 2017 IEEE. IEEE. 2017, pp. 1–4.
[56] Arijit Raychowdhury et al. “A fully-digital phase-locked low dropout regulator in
32nm CMOS”. In: VLSI Circuits (VLSIC), 2012 Symposium on. IEEE. 2012, pp. 148–
149.
[57] Peter Hazucha et al. “Area-efficient linear regulator with ultra-fast load regulation”.
In: IEEE Journal of Solid-State Circuits 40.4 (2005), pp. 933–940.
[58] Mohamed El-Nozahi et al. “High PSR low drop-out regulator with feed-forward
ripple cancellation technique”. In: IEEE Journal of Solid-State Circuits 45.3 (2010),
pp. 565–577.
[59] Yat-Hei Lam and Wing-Hung Ki. “A 0.9 V 0.35 µm adaptively biased CMOS LDO
regulator with fast transient response”. In: Solid-State Circuits Conference, 2008.
ISSCC 2008. Digest of Technical Papers. IEEE International. IEEE. 2008, pp. 442–
626.
[60] Wen-Jie Tsou et al. “20.2 Digital low-dropout regulator with anti PVT-variation tech-
nique for dynamic voltage scaling and adaptive voltage scaling multicore processor”.
In: Solid-State Circuits Conference (ISSCC), 2017 IEEE International. IEEE. 2017,
pp. 338–339.
[61] Loai G Salem, Julian Warchall, and Patrick P Mercier. “20.3 A 100nA-to-2mA
successive-approximation digital LDO with PD compensation and sub-LSB duty
control achieving a 15.1 ns response time at 0.5 V”. In: Solid-State Circuits Confer-
ence (ISSCC), 2017 IEEE International. IEEE. 2017, pp. 340–341.
[62] Doyun Kim et al. “20.6 A 0.5 V-VIN 1.44 mA-class event-driven digital LDO
with a fully integrated 100pF output capacitor”. In: Solid-State Circuits Conference
(ISSCC), 2017 IEEE International. IEEE. 2017, pp. 346–347.
[63] Saad Bin Nasir, Samantak Gangopadhyay, and Arijit Raychowdhury. “All-Digital
Low-Dropout Regulator With Adaptive Control and Reduced Dynamic Stability for
Digital Load Circuits”. In: IEEE Transactions on Power Electronics 31.12 (2016),
pp. 8293–8302.
[64] Samantak Gangopadhyay et al. “Modeling and analysis of digital linear dropout reg-
ulators with adaptive control for high efficiency under wide dynamic range digi-
tal loads”. In: Design, Automation and Test in Europe Conference and Exhibition
(DATE), 2014. IEEE. 2014, pp. 1–6.
[65] Karl J Åström and Björn Wittenmark. Adaptive control. Courier Corporation, 2013.
[66] Samantak Gangopadhyay et al. “A 32 nm embedded, fully-digital, phase-locked low
dropout regulator for fine grained power management in digital circuits”. In: IEEE
Journal of Solid-State Circuits 49.11 (2014), pp. 2684–2693.
[67] Arijit Raychowdhury et al. “A 2.3 nJ/frame voice activity detector-based audio front-
end for context-aware system-on-chip applications in 32-nm CMOS”. In: IEEE Jour-
nal of Solid-State Circuits 48.8 (2013), pp. 1963–1969.
[68] C Chen, JH Wu, and ZX Wang. “150 mA LDO with self-adjusting frequency
compensation scheme”. In: Electronics Letters 47.13 (2011), pp. 767–768.
[69] William F Egan. Phase-lock basics. John Wiley & Sons, 2007.
[70] MD Seeman et al. “The future of integrated power conversion: The switched capac-
itor approach”. In: IEEE COMPEL Workshop. 2010, pp. 1430–1434.
[71] Vincent Ng and Seth Sanders. “A 92%-efficiency wide-input-voltage-range switched-
capacitor DC-DC converter”. In: Solid-State Circuits Conference Digest of Technical
Papers (ISSCC), 2012 IEEE International. IEEE. 2012, pp. 282–284.
[72] Yogesh Ramadass et al. “A 0.16 mm² completely on-chip switched-capacitor DC-
DC converter using digital capacitance modulation for LDO replacement in 45nm
CMOS”. In: Solid-State Circuits Conference Digest of Technical Papers (ISSCC),
2010 IEEE International. IEEE. 2010, pp. 208–209.
[73] Yogesh K Ramadass and Anantha P Chandrakasan. “Voltage scalable switched ca-
pacitor DC-DC converter for ultra-low-power on-chip applications”. In: Power Elec-
tronics Specialists Conference, 2007. PESC 2007. IEEE. IEEE. 2007, pp. 2353–
2359.
[74] Wanyeong Jung, Dennis Sylvester, and David Blaauw. “12.1 A rational-conversion-
ratio switched-capacitor DC-DC converter using negative-output feedback”. In: Solid-
State Circuits Conference (ISSCC), 2016 IEEE International. IEEE. 2016, pp. 218–
219.
[75] Loai G Salem and Patrick P Mercier. “A battery-connected 24-ratio switched ca-
pacitor PMIC achieving 95.5%-efficiency”. In: VLSI Circuits (VLSI Circuits), 2015
Symposium on. IEEE. 2015, pp. C340–C341.
[76] Junmin Jiang et al. “20.5 A 2-/3-phase fully integrated switched-capacitor DC-DC
converter in bulk CMOS for energy-efficient digital circuits with 14% efficiency im-
provement”. In: Solid-State Circuits Conference-(ISSCC), 2015 IEEE International.
IEEE. 2015, pp. 1–3.
[77] Nicolas Butzen and Michiel Steyaert. “12.2 A 94.6%-efficiency fully integrated
switched-capacitor DC-DC converter in baseline 40nm CMOS using scalable para-
sitic charge redistribution”. In: Solid-State Circuits Conference (ISSCC), 2016 IEEE
International. IEEE. 2016, pp. 220–221.
[78] Chen Kong Teh and Atsushi Suzuki. “12.3 A 2-output step-up/step-down switched-
capacitor DC-DC converter with 95.8% peak efficiency and 0.85-to-3.6 V input volt-
age range”. In: Solid-State Circuits Conference (ISSCC), 2016 IEEE International.
IEEE. 2016, pp. 222–223.
[79] Zhe Hua and Hoi Lee. “A Reconfigurable Dual-Output Switched-Capacitor DC-DC
Regulator With Sub-Harmonic Adaptive-On-Time Control for Low-Power Applica-
tions”. In: IEEE Journal of Solid-State Circuits 50.3 (2015), pp. 724–736.
[80] Suyoung Bang et al. “A fully integrated successive-approximation switched-capacitor
DC-DC converter with 31mV output voltage resolution”. In: Solid-State Circuits
Conference Digest of Technical Papers (ISSCC), 2013 IEEE International. IEEE.
2013, pp. 370–371.
[81] Samantak Gangopadhyay et al. “UVFR: A Unified Voltage and Frequency Regulator
with 500MHz/0.84 V to 100KHz/0.27 V operating range, 99.4% current efficiency
and 27% supply guardband reduction”. In: European Solid-State Circuits Confer-
ence, ESSCIRC Conference 2016: 42nd. IEEE. 2016, pp. 321–324.
[82] Sam Ben-Yaakov and Michael Evzelman. “Generic and unified model of switched
capacitor converters”. In: Energy Conversion Congress and Exposition, 2009. ECCE
2009. IEEE. IEEE. 2009, pp. 3501–3508.
[83] Alexander Kushnerov and Sam Ben-Yaakov. “Algebraic synthesis of Fibonacci switched
capacitor converters”. In: Microwaves, Communications, Antennas and Electronics
Systems (COMCAS), 2011 IEEE International Conference on. IEEE. 2011, pp. 1–4.
VITA
Samantak Gangopadhyay received his B.Tech and M.Tech degrees in Electronics and
Electrical Communication Engineering (specialization in Microelectronics and VLSI design)
from the Indian Institute of Technology, Kharagpur in May 2009. After graduation, he
joined IBM India as a Physical Design R&D Engineer and worked on POWER series and
Z-Mainframe microprocessors. In this role, he was responsible for implementing the
synthesizable blocks of his units and ensuring that the timing, noise, power,
electromigration, DFT and DFM specifications were met.
In Fall 2013, he started his doctoral studies at the Georgia Institute of Technology and
joined the Integrated Circuits and Systems Research Laboratory (ICSRL). At ICSRL, his
primary focus has been adaptive power management and clock generation techniques for
wide-range computation in multi-core VLSI designs. His wider interests include energy
harvesting in multi-source IoT systems and discrete dynamical systems through distributed
architectures.
S. Gangopadhyay has published 7 technical papers in refereed conferences and 2 journal
papers, and holds 2 patents. He was a finalist for the Qualcomm Innovation Fellowship (2015).
He has also received the CETL Outstanding TA Award (2015) and the ISSCC Student Travel
Grant Award (2017).
