High Performance Distributed On-Chip Voltage Regulation for Modern Integrated Systems by Wang, Longfei
University of South Florida
Scholar Commons
Graduate Theses and Dissertations Graduate School
November 2018
High Performance Distributed On-Chip Voltage
Regulation for Modern Integrated Systems
Longfei Wang
University of South Florida, longfei@mail.usf.edu
Part of the Electrical and Computer Engineering Commons
This Dissertation is brought to you for free and open access by the Graduate School at Scholar Commons. It has been accepted for inclusion in
Graduate Theses and Dissertations by an authorized administrator of Scholar Commons. For more information, please contact
scholarcommons@usf.edu.
Scholar Commons Citation
Wang, Longfei, "High Performance Distributed On-Chip Voltage Regulation for Modern Integrated Systems" (2018). Graduate Theses
and Dissertations.
https://scholarcommons.usf.edu/etd/7590
High Performance Distributed On-Chip Voltage Regulation for Modern Integrated Systems
by
Longfei Wang
A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Electrical Engineering
College of Engineering
University of South Florida
Major Professor: Selçuk Köse, Ph.D.
Ismail Uysal, Ph.D.
Nasir Ghani, Ph.D.
Mehran Mozaffari Kermani, Ph.D.
Ulya R. Karpuzcu, Ph.D.
Date of Approval:
October 29, 2018
Keywords: On-chip power delivery network, voltage regulator, power efficiency, stability, reliability
Copyright © 2018, Longfei Wang
DEDICATION
To those who seeded or nourished my life.
ACKNOWLEDGMENTS
It was not an easy journey to finally approach the end of my Ph.D. study. Hundreds of
people enlightened, motivated, encouraged, and helped me along the road to reach this point. First
of all, I would like to express my sincerest thanks to my Ph.D. advisor Professor Selçuk Köse
for his continuous support, guidance, patience, and trust. Without the endorsement of Professor
Selçuk Köse, I would not have been awarded the prestigious University Graduate Fellowship when I
entered USF and later the Allan R. Gondeck Memorial Scholarship and Chih Foundation Research
and Publication Award. Nor would I have been part of the collaborative research projects
with University of Minnesota and University of Central Florida to hone my research skills. I will
remember every word Professor Selçuk Köse edited to improve the quality of my conference and
journal paper submissions, many times even at the last minute.
I would also like to thank Professor Ulya R. Karpuzcu, Professor Ismail Uysal, Professor
Mehran Mozaffari Kermani, and Professor Nasir Ghani for serving on my dissertation committee
and Professor Attila A. Yavuz for serving as the chair of my defense. Your insightful comments and
advice strengthened my research and dissertation.
I would like to thank my previous and current lab mates Prof. Weize Yu, Dr. Orhun Aras
Uzun, Dr. Mahmood Azhar, Mohammad Ali Vosoughi, and Soner Seckiner for their support. Weize
shared with me his wisdom for research. Orhun helped me significantly with Cadence simulations.
Mohammad was readily available for discussion whenever I had questions.
Finally, I would like to thank my family members for their selfless love. They accompanied,
experienced, sustained, understood, and kept an eye on every step of my move.
TABLE OF CONTENTS
LIST OF TABLES iv
LIST OF FIGURES v
ABSTRACT viii
CHAPTER 1: INTRODUCTION 1
1.1 Power Delivery Networks 1
1.1.1 Buck Converters 3
1.1.2 Low-Dropout Regulators 4
1.1.3 Switched Capacitor Converters 5
1.2 Our Contribution 5
1.3 Organization 6
CHAPTER 2: UNBALANCED CURRENT SHARING AMONG DISTRIBUTED
ON-CHIP VOLTAGE REGULATORS 7
2.1 Introduction 7
2.2 The Unbalanced Current Sharing Problem 9
2.2.1 Large Current Variations 11
2.2.2 Voltage Regulator Malfunctions 12
2.3 Efficiency Implications of Unbalanced Current Sharing 13
2.3.1 Regulator Loss Model and Efficiency 14
2.3.2 Efficiency Degradation of Distributed Regulators with Unbalanced Current Sharing 16
2.4 Stability Implications of Unbalanced Current Sharing 18
2.4.1 Stability of Individual On-Chip Voltage Regulators 19
2.4.2 Stability of the Power Delivery Network 21
2.5 Reliability Implications of Unbalanced Current Sharing 23
2.6 Adaptive Reference Voltage Control 26
2.6.1 Adaptive Reference Voltage Control Mechanism 26
2.6.2 Adaptive Reference Voltage Control Implementation 29
2.6.2.1 Average Current Sensor 29
2.6.2.2 N Comparator 30
2.6.2.3 Current Mismatch Decision 33
2.6.2.4 Multi-Level Reference Voltage Generation 35
2.6.3 Simulation Verifications 36
2.6.4 Practical Concerns 37
2.7 Case Study: IBM POWER8 Like Microprocessor 38
2.7.1 Benchmarks 38
2.7.2 Architecture 38
2.7.3 Simulation Framework 39
2.7.4 Power Grid and Voltage Regulator Properties 40
2.8 Conclusion 42
CHAPTER 3: AGING EFFECTS ON THE TRANSIENT PERFORMANCE
DEGRADATION OF ON-CHIP VOLTAGE REGULATORS 43
3.1 Introduction 43
3.2 Background 45
3.2.1 Conventional DLDO Regulator 45
3.2.2 Negative Bias Temperature Instability 47
3.3 NBTI Induced Performance Degradation 48
3.3.1 Maximum Current Supply Capability 48
3.3.2 Load Response Time 49
3.3.3 Magnitude of the Droop 50
3.4 NBTI-Aware DLDO Voltage Regulator 51
3.5 Evaluation 53
3.5.1 Simulation Framework 54
3.5.1.1 IBM POWER8 Like Microprocessor 54
3.5.1.2 DLDO Design Specifications 55
3.5.1.3 Evaluation of NBTI Induced Performance Degradation 55
3.5.2 Simulation Results 56
3.5.2.1 Performance Degradation within Conventional DLDO 56
3.5.2.2 Mitigation with Proposed NBTI-Aware DLDO 56
3.5.2.3 Discussions 57
3.6 Conclusion 58
CHAPTER 4: AGING EFFECTS ON THE STEADY STATE PERFORMANCE
DEGRADATION OF ON-CHIP VOLTAGE REGULATORS 59
4.1 Introduction 59
4.2 Limit Cycle Oscillation 63
4.3 Reduced Clock Pulse Width 65
4.4 LCO Mitigation with Proposed Aging-Aware DLDO 67
4.5 Trade-Off between Area Overhead and Program Output Quality 69
4.6 Conclusion 72
CHAPTER 5: CONCLUSIONS 73
CHAPTER 6: FUTURE WORK 75
6.1 Co-Optimizing Different Design Aspects to Avoid Overdesign 75
6.2 NBTI-Aware Digital LDO with Adaptive Gain Scaling Control 75
REFERENCES 77
APPENDICES 89
Appendix A: Copyright Notices 90
ABOUT THE AUTHOR End Page
LIST OF TABLES
Table 2.1 Technology and Architecture Parameters 37
Table 3.1 Load Characteristics of Different Functional Blocks within One Core of
an IBM POWER8 Like Microprocessor Chip under All Experimented
Benchmarks 53
Table 3.2 Conventional DLDO Performance Degradation for Different Functional
Blocks under All Experimented Benchmarks for a Five-Year Time Frame 55
Table 4.1 TFF Setup Time, Logic Delay, and Comparator Delay Before and After
a Five-Year Aging Period 66
Table 4.2 Maximum LCO Mode under Different Sampling Clock Frequency and
Load Current Condition for Conventional Dual Edge (CDE) and Aging-
Aware (AA) DLDO 67
LIST OF FIGURES
Figure 1.1 Representative power delivery network components for modern integrated systems. 2
Figure 1.2 Schematic of a buck converter. 3
Figure 1.3 Schematic of a low-dropout regulator. 4
Figure 1.4 Schematic of a switched capacitor converter with 2:1 conversion ratio. 5
Figure 2.1 On-chip power delivery network with distributed voltage regulators. 9
Figure 2.2 Unbalanced current sharing between two identical distributed on-chip
buck regulators. 10
Figure 2.3 Unbalanced current sharing among 96 identical distributed on-chip
VRs within IBM POWER8 like microprocessor. 11
Figure 2.4 Conventional buck regulator, SC regulator, and LDO efficiency curves. 13
Figure 2.5 Simplified schematic of a synchronous buck regulator. 15
Figure 2.6 Unbalanced current sharing induced extra power loss and efficiency
degradation as a function of CSRi for N identical distributed on-chip VRs. 17
Figure 2.7 Stability of individual on-chip VR as a function of CSRi and N . 20
Figure 2.8 λimin(jωk) as a function of fk under different values of CSRi and N . 22
Figure 2.9 MTTF as a function of CSRi. 24
Figure 2.10 Simplified model of two identical distributed on-chip VRs with power
grid effective resistances. 25
Figure 2.11 System level block diagram of the proposed Vref control method and
multi-level Vref generator for N identical distributed on-chip VRs. 26
Figure 2.12 Flowchart of the proposed adaptive Vref control algorithm. 27
Figure 2.13 Schematic of the average current sensor. 29
Figure 2.14 Schematic of the analog N comparator for maximum and minimum
current decision. 30
Figure 2.15 Schematic of the current_mismatch decision block. 31
Figure 2.16 Simulation results with and without the proposed adaptive Vref control
scheme for two identical distributed on-chip VRs. 32
Figure 2.17 Simulation results with and without the proposed adaptive Vref control
scheme for three identical distributed on-chip VRs. 33
Figure 2.18 Simulation results with sinusoidal and step load current for three identical
distributed on-chip VRs. 34
Figure 2.19 Simulation results with and without the proposed adaptive Vref control
scheme for three distributed on-chip VRs under distribution wire and
VR mismatches. 35
Figure 2.20 Chiplet simplified floorplan. 38
Figure 2.21 Calibrated efficiency curve for the on-chip voltage regulator. 39
Figure 2.22 Power saving and regulator power loss saving with balanced current
sharing for different applications. 40
Figure 3.1 Schematic of conventional DLDO. 45
Figure 3.2 Digital controller for conventional DLDO. 46
Figure 3.3 Percentage IpMOS degradation of conventional bDSR based DLDO. 48
Figure 3.4 Proposed uni-directional shift register for NBTI-aware DLDO. 51
Figure 3.5 Operation of the proposed uni-directional shift register. 52
Figure 3.6 A schematic diagram demonstrating the floor plan of one core within
IBM POWER8 like microprocessor chip. 54
Figure 3.7 Percentage IpMOS degradation mitigation of the proposed NBTI-aware
DLDO as compared to the conventional DLDO design for different
functional blocks under all experimented benchmarks. 56
Figure 3.8 Percentage TR degradation mitigation of the proposed NBTI-aware
DLDO as compared to the conventional DLDO design for different
functional blocks under all experimented benchmarks. 57
Figure 3.9 Percentage ∆V degradation mitigation of the proposed NBTI-aware
DLDO as compared to the conventional DLDO design for different
functional blocks under all experimented benchmarks. 57
Figure 4.1 Nonlinear sampled feedback model of a conventional DLDO. 62
Figure 4.2 Schematic of the proposed aging-aware DLDO. 64
Figure 4.3 Maximum LCO mode with simulation results superimposed for conventional
and aging-aware DLDO under different load current conditions
after a five-year aging period. 66
Figure 4.4 Simulated output voltage ripple and LCO mode reduction with the
proposed aging-aware DLDO under 10 mA load current. 68
Figure 4.5 Percentage area overhead (OH) utilizing conventional DLDO and percentage
area overhead saving (OH_S) utilizing aging-aware DLDO for
∆V degradation mitigation within each functional unit. 69
Figure 4.6 Percentage area overhead utilizing conventional DLDO and percentage
area overhead saving utilizing aging-aware DLDO for ∆V degradation
mitigation within LSU under different temperature profile. 69
Figure 4.7 Percentage ∆V degradation mitigation of the proposed aging-aware
DLDO as compared to the conventional DLDO design for LSU under
all experimented benchmarks and different temperature profile. 70
Figure 4.8 Percentage area overhead within each functional unit for percentage
error rate degradation mitigation utilizing bDSR and uDSR based DLDO. 71
ABSTRACT
Distributed on-chip voltage regulation, where multiple voltage regulators are distributed
among different locations of the chip, demonstrates advantages as compared to on-chip voltage
regulation utilizing a single voltage regulator. Better on-chip voltage noise performance and faster
transient response can be realized due to localized voltage regulation. Despite these advantages,
unbalanced current sharing can occur among the voltage regulators, which has been demonstrated
to deteriorate the power conversion efficiency, stability, and reliability of the power delivery
network. An effective balanced current sharing scheme that can be applied to most voltage
regulator types is proposed. Furthermore, a relatively high on-chip temperature induced by
increased power density leads to prominent voltage regulator performance degradations due to
aging. The emerging digital low-dropout regulator is investigated regarding aging induced
transient and steady state performance degradations. Reliability enhancement techniques for
digital low-dropout regulators are developed and verified. Such techniques introduce negligible
power and area overhead and do not affect the normal operations of digital low-dropout regulators.
Reliability enhancement techniques also reduce the area overhead needed to mitigate aging induced
performance degradations. Area overhead saving further translates into more space for an
increased number of distributed on-chip voltage regulators, enabling scalable on-chip voltage
regulation.
CHAPTER 1:
INTRODUCTION
1.1 Power Delivery Networks
Power delivery networks are essential parts of modern integrated systems such as processors,
internet of things (IoT) devices, and energy harvesting platforms to supply the required power
and voltage levels to each functional block within these systems. Representative power delivery
network components [1–3] for modern integrated systems are shown in Fig. 1.1, which include
power supply/off-chip voltage regulators, printed circuit board (PCB) with parasitics, package and
bumps, global power grid, on-chip voltage regulators, and local power grid. The power and voltage
levels generated by the power supply or off-chip voltage regulators are fed to the integrated system
through the PCB. The PCB and the global power grid are connected through the package and bumps.
The global and local power grids are, respectively, connected to the inputs and outputs of the
on-chip voltage regulators. On-chip voltage regulators provide the required power and voltage
levels to the load circuits through the local power grid.
Performance of the whole power delivery network such as power conversion efficiency, stability,
and reliability largely relies on the characteristics of each individual component and the
interactions among them [4–29]. Conventionally, a single on-chip voltage regulator per functional
block is utilized to supply the load circuits. Such an on-chip power delivery scheme may not be
able to satisfy the stringent on-chip voltage noise performance requirement, especially when each
functional block occupies a relatively large chip area and the load current changes can be
considerable and abrupt [30, 31]. The emerging distributed on-chip voltage regulation [4, 7, 18, 19, 25] is a
viable solution to achieve better on-chip voltage noise performance. Within such a power delivery
scheme, multiple tiny on-chip voltage regulators are distributed within each functional block to
[Figure 1.1 block labels, from source to load: Power Supply/Off-chip Voltage Regulators; Printed Circuit Board with Parasitics; Package and Bumps; Global Power Grid; On-chip Voltage Regulators; Local Power Grid; Load Circuits]
Figure 1.1: Representative power delivery network components for modern integrated systems.
regulate the voltage and supply the load current. Localized on-chip voltage regulation enabled by
each individual voltage regulator can rapidly mitigate the voltage overshoot and undershoot due to
load current changes to realize better on-chip voltage noise profile.
By adaptively turning on or off some of the distributed on-chip voltage regulators, the
regulator-gating methodology [15] can be adopted to achieve optimal power efficiency. Moreover,
once the optimal number of active on-chip voltage regulators is determined, the locations of these
active on-chip voltage regulators can be decided in a thermally-aware fashion to optimize the on-
chip thermal profile [31]. The locations of the distributed on-chip voltage regulators together with
decoupling capacitors at the design stage have also been investigated to achieve better voltage
noise performance [5]. Due to the large number of distributed on-chip voltage regulators and the
sophisticated interactions among different voltage regulators and the on-chip power grids,
conventional stability checking criteria for individual voltage regulators cannot be directly applied.
Efficient localized stability checking schemes [32, 33] have been proposed to address this issue.
More recently, power delivery networks have been leveraged to enhance the security of modern
integrated systems [34–61]. The output voltage ripple of a multiphase switched capacitor converter
has been shown to be related to the activation/deactivation pattern of different phases in [61].
By randomly changing the activation/deactivation pattern of a multiphase switched capacitor
converter, the security level against side-channel power analysis attacks is enhanced. A
converter-reshuffling technique is proposed in [59] to further improve the security against side-channel
attacks by randomly changing the active converter stages even when there is no load current change.
Time-delayed converter-reshuffling [48], charge-withheld converter-reshuffling [46], and false key-
[Figure 1.2 labels: Vin, Vout, feedback control and gate drivers, L, C, Iload, MP, MN, Vref, Vramp]
Figure 1.2: Schematic of a buck converter.
controlled aggressive voltage scaling [42] are more advanced countermeasures against power analysis
attacks that leverage different characteristics of the power delivery network. Multiphase on-chip
voltage regulators have been exploited as strong physical unclonable function (PUF) primitives to
secure IoT devices [35]. Distributed on-chip power delivery also demonstrates advantages against
electromagnetic (EM) side-channel attacks [51].
The individual voltage regulator is an essential part of distributed on-chip voltage regulation
systems. Different types of on-chip voltage regulators demonstrate their respective advantages that
can be utilized for different application scenarios. Typical on-chip voltage regulator types include
buck converters, low-dropout regulators (LDOs), and switched capacitor converters, which are briefly
illustrated below.
1.1.1 Buck Converters
Buck converters consist of power switches MP and MN, an LC filter, feedback control
circuits, and gate drivers as shown in Fig. 1.2. Feedback control circuits sense changes in the
output voltage Vout and compare the instantaneous Vout with a reference voltage Vref to determine
if the duty cycle of the power transistor gate signals needs to be increased or decreased. A ramp
signal Vramp is utilized for pulse width modulation (PWM) signal generation. The intermediate
voltage generated at the node between the power transistors MP and MN is filtered to supply the
load. Buck converters
[Figure 1.3 labels: Vin, Vout, C, Iload, MP, Vref, error amplifier inputs Vref and Vout]
Figure 1.3: Schematic of a low-dropout regulator.
can achieve high power conversion efficiency over a wide load current range and are implemented
in recent Intel Core™ SoCs [62].
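As an illustrative aside, the PWM generation and the resulting step-down behavior described above can be sketched in a few lines of Python. This is a minimal behavioral sketch, not the circuit of Fig. 1.2: the function names, the sawtooth ramp running from 0 to v_ramp_peak, the control voltage v_control (standing in for the output of the feedback circuits), and the lossless continuous-conduction relation Vout = D x Vin are all assumptions introduced here for illustration.

```python
def pwm_duty_cycle(v_control, v_ramp_peak, samples=1000):
    """Fraction of one switching period during which v_control exceeds the ramp.

    Assumes an ideal sawtooth ramp rising linearly from 0 to v_ramp_peak.
    """
    high = 0
    for k in range(samples):
        v_ramp = v_ramp_peak * k / samples  # sampled sawtooth ramp
        if v_control > v_ramp:              # comparator output is high
            high += 1
    return high / samples


def ideal_buck_vout(v_in, duty):
    """Steady-state output of a lossless buck converter (assumed ideal model)."""
    return duty * v_in


if __name__ == "__main__":
    d = pwm_duty_cycle(v_control=0.5, v_ramp_peak=1.0)
    print(d, ideal_buck_vout(1.8, d))
```

In this idealized picture, raising the control voltage widens the PWM pulse, which raises the duty cycle and therefore the filtered output voltage; the feedback loop uses this to pull Vout toward Vref.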
1.1.2 Low-Dropout Regulators
LDOs have the benefits of easy implementation and fast transient response speed. The
schematic of a conventional analog LDO is shown in Fig. 1.3. It consists of a power transistor
MP, an error amplifier, and an output capacitor. The error amplifier senses changes in the output
voltage Vout and compares the instantaneous Vout with the reference voltage Vref to generate the
gate signal of MP for output voltage regulation. When MP is divided into a power transistor array
and controlled by digital logic, digital LDOs [63] are formed, which can operate under low supply
voltages for IoT applications. LDOs are widely used in recent IBM [64] and Intel products [65, 66].
[Figure 1.4 labels: Vin, Vout, Cfly, switch control signals clk and clk_b]
Figure 1.4: Schematic of a switched capacitor converter with 2:1 conversion ratio.
1.1.3 Switched Capacitor Converters
Switched capacitor (SC) converters consist of only capacitors and switches. The schematic
of an SC converter with 2:1 conversion ratio is shown in Fig. 1.4. It includes four switches and one
flying capacitor Cfly. The switches are controlled by complementary clock signals clk and clk_b.
Cfly is charged when connected to input voltage Vin and discharged when only connected to Vout
and ground. As no inductor is needed, SC converters have the benefit of easy integration. By
varying the topologies of capacitors and switches, different conversion ratios as well as reconfigurable
conversion ratios can be achieved [67]. A high power density of 3.2 W/mm² can also be realized
utilizing a switched capacitor converter [68]. A distributed version of a multiphase SC converter has
also been implemented in the form of a DC-DC converter ring to achieve fast dynamic voltage
scaling [69].
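The charge and discharge phases described above can be illustrated with a small charge-conservation model. The following Python sketch is an idealized illustration only: it assumes ideal switches, no load current, and an output capacitor c_out at the Vout node (the output capacitor, the function name, and the capacitor values are assumptions made here, not values from this work). Under these assumptions the output settles at Vin/2, which is the 2:1 conversion ratio.

```python
def simulate_sc_2to1(v_in, c_fly, c_out, cycles=200):
    """Return the no-load output voltage after a number of clock cycles.

    Idealized model: ideal switches, instantaneous charge redistribution,
    and a hypothetical output capacitor c_out from Vout to ground.
    """
    v_out = 0.0  # output capacitor starts discharged
    v_fly = 0.0  # flying capacitor starts discharged
    c_sum = c_fly + c_out
    for _ in range(cycles):
        # Charging phase: Cfly is connected between Vin and Vout.
        # Charge at the Vout node is conserved while Vfly becomes Vin - Vout.
        v_out = (c_out * v_out - c_fly * v_fly + c_fly * v_in) / c_sum
        v_fly = v_in - v_out
        # Discharging phase: Cfly is connected between Vout and ground.
        # Both capacitors share charge and settle at a common voltage.
        v_out = (c_out * v_out + c_fly * v_fly) / c_sum
        v_fly = v_out
    return v_out


if __name__ == "__main__":
    print(simulate_sc_2to1(v_in=1.8, c_fly=1e-9, c_out=10e-9))
```

With a load current, the output would sit below Vin/2 by an amount that depends on the switching frequency and capacitor sizes; that effect is deliberately left out of this sketch.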
1.2 Our Contribution
Efficiency, stability, and reliability implications of the unbalanced current sharing phenomenon
among distributed on-chip voltage regulators are investigated. An effective balanced current sharing
scheme that is general enough for most voltage regulator types within the framework of distributed
on-chip voltage regulation is proposed. The effectiveness and benefits of the proposed scheme are
verified through extensive simulations under practical simulation settings. Furthermore, aging effects
on the emerging digital LDOs, which are essential parts of distributed on-chip voltage regulation
systems, are demonstrated to degrade transient and steady state performance. Aging mitigation
techniques that induce negligible power and area overhead are proposed and verified.
1.3 Organization
The content of this work is organized as follows. In Chapter 2, the unbalanced current sharing
issue is discussed and the corresponding balanced current sharing scheme is proposed and verified.
Aging effects on the transient and steady state performance degradation of on-chip voltage regulators
are, respectively, investigated in Chapters 3 and 4. Conclusions and future work are, respectively,
offered in Chapters 5 and 6.
CHAPTER 2:
UNBALANCED CURRENT SHARING AMONG DISTRIBUTED
ON-CHIP VOLTAGE REGULATORS
2.1 Introduction
Efficient, stable, and reliable operation of power delivery networks (PDNs) is crucial to
sustain high performance and low power design targets of modern large scale integrated circuits
(ICs).¹ Thermal design power (TDP) of microprocessors has increased over generations and can
go beyond 100W [2]. The peak power of a microprocessor can, however, be 1.5 times the TDP
rating [70]. Even small power conversion efficiency degradations within such power-hungry ICs lead
to tremendous power loss, resulting in higher heat dissipation. Meanwhile, the complexity and large
component count incur serious stability and reliability concerns.
Voltage regulators (VRs), an essential part of PDNs that includes the commonly used buck,
switched capacitor, and low-dropout regulators, have been moved from off-chip placements to
on-chip implementations to save board area and to enable efficient, fast, and secure localized voltage
regulation [61, 71, 72]. Distributed on-chip voltage regulation has recently become an emerging
research field where multiple on-chip VRs are connected in parallel and distributed across the power
grid to supply current across the whole die [32, 33, 73–78]. Previous work mainly focuses on the
efficiency improvement of stand-alone VRs [71] and that of the PDNs as a whole [79]. The
implications of the complex interactions among on-chip VRs and the power grid have, however, been
typically overlooked. Although there are appealing benefits of distributed on-chip voltage
regulation, complex interactions among regulators and the power grid may lead to significant efficiency,
stability, and reliability issues. Among the various implications of distributed on-chip voltage
regulation, unbalanced current sharing, if not carefully controlled, can negate the previously proposed
efficiency enhancement benefits or even shorten the lifetime of the chip.
¹This chapter was published in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no.
11, pp. 3019-3032, November 2017 "Efficiency, Stability, and Reliability Implications of Unbalanced Current Sharing
among Distributed On-Chip Voltage Regulators". Permission is included in Appendix A.
The unbalanced current sharing problem has been widely studied in the conventional power
electronics field for multiphase interleaving buck regulators [80, 81]. Little attention has, however, been
paid to this problem within the microelectronics field for distributed on-chip voltage regulation and, to
the best of the authors’ knowledge, the efficiency, stability, and reliability implications of unbalanced
current sharing within distributed on-chip PDNs have not yet been investigated.
Voltage regulators within distributed on-chip PDNs are connected to a passive mesh
network [2], which supplies the required current to the load circuits. Several factors may lead to
unbalanced current sharing within distributed on-chip power delivery systems that consist of
multiple parallel VRs. These factors include mismatches in the component values and control loop
mismatches, which are common factors leading to unbalanced current within conventional
centralized multiphase regulators [80, 81]. Specific to distributed on-chip PDNs, the power grid parasitic
impedance among the VRs and load circuits, although quite small, may have significant variations
based on the placement of the VRs and the load circuits. Therefore, even with perfectly matched
components and control loops among different distributed on-chip VRs, the variations of the power
grid resistance among individual VRs and load circuits may lead to non-negligible mismatch and
severe current sharing problems.
The contribution of this chapter is threefold. First, the unbalanced current sharing problem
is presented with extensive simulations in both Cadence Virtuoso and VoltSpot [82]. Power
efficiency, stability, and reliability implications of the unbalanced current sharing within distributed
on-chip PDNs are investigated. Theoretical derivations and simulation results lead to the
observation that unbalanced current sharing can adversely affect these important design concerns, which
necessitates an efficient current balancing scheme. Second, an adaptive reference voltage control
mechanism is proposed as the current balancing scheme for distributed on-chip VRs to
dynamically modulate the reference voltage of each individual VR. Circuit implementations are analyzed
for the proposed control algorithm and preliminary simulations are performed to verify the
effectiveness. Finally, an IBM POWER8-like [64] microprocessor simulation platform is constructed in
VoltSpot [82] to study the implications of the unbalanced current sharing problem in practical
applications. Extensive simulations based on several benchmarks are performed and simulation results
confirm the benefits of balanced current sharing. Although the analyses are conducted assuming a
homogeneous PDN with buck regulators, without loss of generality, the proposed technique can be
easily applied to heterogeneous PDNs that house different regulator types.
[Figure 2.1 labels: global power grid; local power grids; global ground distribution; distributed on-chip VRs; load circuits; VDD C4 pads and GND C4 pads connected to the package]
Figure 2.1: On-chip power delivery network with distributed voltage regulators.
2.2 The Unbalanced Current Sharing Problem
An on-chip PDN model with distributed VRs is shown in Fig. 2.1. The inputs of the
distributed VRs are connected to a global power grid that is connected to the package through
[Figure 2.2 panels (a) to (d): inductor current (A) versus t (s), with the time axis scaled by 10^-6]
Figure 2.2: Unbalanced current sharing between two identical distributed on-chip buck regulators.
(a) Inductor currents of two identical regulators supplying a total load current of 1A. (b) A zoomed
view of the inductor current profiles at steady state. (c) Inductor currents of two identical regulators
supplying a total load current of 2A; one inductor current saturates because the maximum load
current one regulator can supply is 1.27A. (d) A zoomed view of the inductor current profiles showing
the saturation of one inductor current.
the dedicated C4 pads. The outputs of the distributed on-chip VRs provide the required current
at the target voltage level to the local power grid that feeds the load circuits. The global ground
distribution provides the ground plane for the load circuits and is connected to the package through
the dedicated GND C4 pads. The global and local power grids and the global ground distribution
are composed of orthogonal metal lines connected with vias [2]. With a first order approximation,
these power grids can be modeled as a resistive mesh where the effective resistance between any
two nodes on the power grid depends on the distance between the two nodes [30, 83]. The effective
resistance mismatch between the distributed VRs with only local voltage regulation loops may cause
unbalanced current sharing among the VRs and may even cause VR malfunctions.
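The first order resistive mesh approximation can be made concrete with a small numerical experiment. The Python sketch below is an illustration under stated assumptions rather than the model used in this work: it assumes a uniform rectangular mesh with one identical resistor r_seg between neighboring nodes, and the function name, grid size, and node coordinates are hypothetical. It computes the effective resistance between two nodes by grounding one node, injecting 1 A at the other, and solving the nodal equations with plain Gauss-Jordan elimination.

```python
def grid_effective_resistance(rows, cols, r_seg, node_a, node_b):
    """Effective resistance between two nodes of a uniform resistive mesh."""
    if node_a == node_b:
        return 0.0
    n = rows * cols
    g = 1.0 / r_seg

    def idx(r, c):
        return r * cols + c

    # Build the nodal conductance matrix of the mesh.
    G = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    i, j = idx(r, c), idx(rr, cc)
                    G[i][i] += g
                    G[j][j] += g
                    G[i][j] -= g
                    G[j][i] -= g

    a, b = idx(*node_a), idx(*node_b)
    # Ground node b, inject 1 A at node a, and form the augmented system.
    keep = [k for k in range(n) if k != b]
    A = [[G[i][j] for j in keep] + [1.0 if i == a else 0.0] for i in keep]
    m = len(keep)
    for col in range(m):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, m), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for row in range(m):
            if row != col and A[row][col] != 0.0:
                f = A[row][col]
                A[row] = [x - f * y for x, y in zip(A[row], A[col])]
    # The voltage at node a for a 1 A injection equals the effective resistance.
    return A[keep.index(a)][m]


if __name__ == "__main__":
    near = grid_effective_resistance(5, 5, 1.0, (0, 0), (0, 1))
    far = grid_effective_resistance(5, 5, 1.0, (0, 0), (4, 4))
    print(near, far)
```

In this toy mesh, the node pair that is farther apart sees a larger effective resistance, which is the distance dependence that gives differently placed VRs different effective resistances to the same load.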
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 3
[Plot: on-chip VR load current (A), axis range 0.0 to 2.5, versus on-chip VR location on the IBM POWER8 like processor chip]
Fig. 3. Unbalanced current sharing among 96 identical distributed on-chip
VRs within IBM POWER8 like microprocessor.
metal layers in [23]. Second, a buck regulator model is
extracted and included in VoltSpot [20] for PDN simulations
with a large number of on-chip VRs. An IBM POWER8 like
processor with 96 identical distributed regulators is used in the
simulations. The detailed VoltSpot simulation setup is explained
in Section VII. Simulation results demonstrating the unbalanced
current sharing problem in both Cadence Virtuoso and
VoltSpot are summarized in this section.
A. Large current variations
The load current supplied by a buck regulator is the average
value of the respective inductor current. The inductor currents
of the two regulators when the total load current is 1A are
shown in Fig. 2 (a) and (b). Due to the difference in the
effective resistance for the two regulators, these regulators
have different average inductor current values of 328.7mA and
671.3mA, respectively. With unbalanced current sharing, one
regulator supplies more than twice the average output current
of the other. With a larger effective resistance mismatch, the
difference can be even larger.
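The dependence of the current split on the effective resistance mismatch can be illustrated with a simple current divider. The Python sketch below is a simplification made here for illustration: it assumes each regulator behaves as an ideal source at the same regulated voltage and feeds a common load node through its own effective resistance, so the currents divide in proportion to the conductances. The function name is hypothetical, and the relative resistance values are chosen only so that the split matches the 328.7mA and 671.3mA reported above; they are not the extracted power grid resistances.

```python
def share_currents(i_total, r_eff):
    """Split i_total among parallel paths in proportion to their conductance.

    Assumes identical ideal regulated sources behind each resistance
    (a first order simplification, not the full regulator model).
    """
    g = [1.0 / r for r in r_eff]
    g_sum = sum(g)
    return [i_total * gi / g_sum for gi in g]


if __name__ == "__main__":
    # Relative effective resistances (arbitrary units), an illustrative choice.
    i1, i2 = share_currents(1.0, [671.3, 328.7])
    print(round(i1, 4), round(i2, 4), round(i2 / i1, 3))
```

Under this simplified model, the reported split corresponds to an effective resistance ratio of roughly 2:1 between the two paths, and matched resistances would give an even 0.5A share to each regulator.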
The output current values of the 96 identical distributed
on-chip VRs within an IBM POWER8 like microprocessor
chip for application lu_ncb are shown in Fig. 3. The detailed
simulation setup is explained in Section VII. In this simulation,
96 on-chip VRs are evenly distributed across the chip. As can
be seen from Fig. 3, large current variations occur among these
on-chip VRs. The highest current supplied by one VR goes up
to nearly 2.5A and the lowest current supplied by one VR is
around 0.5A. There is a 5x difference between the highest and
lowest on-chip VR current.
B. Voltage regulator malfunctions
For the same two buck regulator designs at the same physical
locations on the power grid as used in Section II-A, with a
higher total load current of 2A, the inductor current distribution
between the two regulators is shown in Fig. 2 (c) and (d).
As can be seen from these figures, the difference between the
two regulator inductor currents gradually becomes larger and
at steady state one inductor becomes saturated and provides a
constant current. For the saturated regulator, the pull-up PMOS
is always on, leading to 100% duty cycle operation and the
malfunction of the VR. When the total load current is equally
Bu
ck
 re
gu
lat
or
 e
ffic
ien
cy
 (%
)
Load current (A)                      Load current (A)
SC
 re
gu
lat
or
 e
ffic
ien
cy
 (%
)
Load current (A)
LD
O 
cu
rre
nt
 e
ffic
ien
cy
 (%
)
I1 Io/2 Io/2I1I2 I1 I2Io/2I2
Fig. 4. Conventional buck regulator, SC regulator, and LDO efficiency curves.
shared between the two, the malfunction of the VRs could be
avoided as the current supplied by each VR is less than the
maximum VR current capability.
Please note that, in Fig. 3, on-chip VR model is included
in VoltSpot for current distribution simulations and no limit
is set for the maximum current that an individual VR can
provide. If the output current capability of a VR is designed
to be 1.5A, there would be more than ten on-chip VRs that
enter this saturation point in this simulation, leading to chip-
wide VR malfunctions. As over-current protection schemes
are implemented for most of the DC-DC converters, VR mal-
functions can be avoided. However, overloaded current can
lead to output voltage drop [24], which is still not acceptable.
Furthermore, as one VR supplies 5x current than the other,
huge current density can lead to local hotspots of the VR and
even destroy the VR and the nearby functional blocks.
III. EFFICIENCY IMPLICATIONS OF UNBALANCED
CURRENT SHARING
Power conversion efficiency curves for the conventional
buck, SC, and LDO regulators are shown in Fig. 4. Consider
two identical distributed on-chip buck or SC regulators with
each design optimized at Io/2 for a total load current of Io.
With balanced current sharing, each buck or SC regulator
operates at the optimum design point, providing maximum
efficiency. With unbalanced current sharing, one regulator
provides lower current I1 while the other one provides higher
current I2. As can be seen from Fig. 4, any variation in the
load current from the optimum load current point leads to an
unavoidable power efficiency loss. For LDOs, the efficiency
is determined by
⌘LDO =
IoVo
(Io + Iq)Vi
, (1)
where Io is the output current of the LDO and Iq is the qui-
escent current. With balanced current sharing, each LDO pro-
vides Io/2 current and the total efficiency is (IoVo/2)/(Io/2+
Iq)Vi = IoVo/(Io + 2Iq)Vi. With unbalanced current sharing,
one of the LDOs provides I1 current and the other one
provides I2 current with I1 + I2 = Io. Since MOS transistors
have a nearly constant quiescent current with respect to the
load current [25], the total efficiency can be expressed as
(I1+I2)Vo/(I1+I2+2Iq)Vi, which is the same as the balanced
current sharing case. Theoretically, there is no significant
efficiency degradation due to unbalanced current sharing for
LDOs, however, larger currents induced by the unbalanced
Figure 2.3: Unbalanced current sharing among 96 identical distributed on-chip VRs within an IBM
POWER8-like microprocessor.
To demonstrate the unbalanced current sharing problem, two sets of simulations are performed.
First, two identical buck regulators providing localized voltage regulation are designed and
simulated in Cadence Virtuoso using an IBM 130nm CMOS process. The input voltage of the buck
regulator is 3.3V and the output voltage is 1V. The switching frequency is 140MHz with a 5nH
inductor. The peak-to-peak current ripple on the inductor is about 1A and the load regulation
is 0.02%/A. Each regulator has a maximum load current supply capability of 1.27A. The on-chip
power grid is designed as a resistive mesh using the design parameters of the respective metal layers
in [84]. Second, a buck regulator model is extracted and included in VoltSpot [82] for PDN
simulations with a large number of on-chip VRs. An IBM POWER8-like processor with 96 identical
distributed regulators is used in the simulations. The detailed VoltSpot simulation setup is explained
in Section 2.7. Simulation results demonstrating the unbalanced current sharing problem in both
Cadence Virtuoso and VoltSpot are summarized in this section.
2.2.1 Large Current Variations
The load current supplied by a buck regulator is the average value of the respective inductor
current. The inductor current of the two regulators when the total load current is 1A is shown in
Fig. 2.2 (a) and (b). Due to the difference in the effective resistance for the two regulators, these
regulators have different average inductor current values of 328.7mA and 671.3mA, respectively.
With unbalanced current sharing, one regulator supplies more than twice the output current of
the other. With a larger effective resistance mismatch, the difference can be even larger.
The output current values of the 96 identical distributed on-chip VRs within an IBM
POWER8-like microprocessor chip for application lu_ncb are shown in Fig. 2.3. The detailed
simulation setup is explained in Section 2.7. In this simulation, 96 on-chip VRs are evenly distributed
across the chip. As can be seen from Fig. 2.3, large current variations occur among these on-chip
VRs. The highest current supplied by one VR is nearly 2.5A and the lowest current supplied by
one VR is around 0.5A, a 5x difference between the highest and lowest on-chip VR currents.
2.2.2 Voltage Regulator Malfunctions
For the same two buck regulator design at the same physical locations on the power grid
as used in Section 2.2.1, with a higher total load current of 2A, the inductor current distribution
between the two regulators is shown in Fig. 2.2 (c) and (d). As can be seen from these figures, the
difference between the two regulator inductor currents gradually becomes larger and at steady state
one inductor becomes saturated and provides a constant current. For the saturated regulator, the
pull-up pMOS transistor is always on, leading to 100% duty cycle operation and the malfunction
of the VR. When the total load current is equally shared between the two, the malfunction of the
VRs could be avoided as the current supplied by each VR is less than the maximum VR current
capability.
Note that, in Fig. 2.3, an on-chip VR model is included in VoltSpot for current distribution
simulations and no limit is set on the maximum current that an individual VR can provide. If the
output current capability of a VR is designed to be 1.5A, more than ten on-chip VRs would reach
this saturation point in this simulation, leading to chip-wide VR malfunctions. As over-current
protection schemes are implemented for most VRs, VR malfunctions can be avoided. However,
overloaded current can lead to an output voltage drop [85], which is still not acceptable.
Furthermore, as one VR supplies 5x the current of another, the resulting high current density can
lead to local hotspots at the VR and even destroy the VR and the nearby functional blocks.

[Figure 2.4 panels: buck regulator efficiency (%), SC regulator efficiency (%), and LDO current
efficiency (%) versus load current (A), with the operating points I1, Io/2, and I2 marked.]
Figure 2.4: Conventional buck regulator, SC regulator, and LDO efficiency curves.

With unbalanced current sharing, each on-chip VR needs to be designed for the worst-case
scenario to be able to supply the highest possible current with high efficiency. The size of the power
MOSFETs needs to be increased compared to a design targeting the total load current
divided by N for N distributed VRs, which may introduce extra power and area overhead as power
MOSFETs can occupy a large percentage of the total VR area.
2.3 Efficiency Implications of Unbalanced Current Sharing
Power conversion efficiency curves for the conventional buck, SC, and LDO regulators are
shown in Fig. 2.4. Consider two identical distributed on-chip buck or SC regulators with each design
optimized at Io/2 for a total load current of Io. With balanced current sharing, each buck or SC
regulator operates at the optimum design point, providing maximum efficiency. With unbalanced
current sharing, one regulator provides lower current I1 while the other one provides higher current
I2. As can be seen from Fig. 2.4, any variation in the load current from the optimum load current
point leads to an unavoidable power efficiency loss. For LDOs, the efficiency is determined by
\[ \eta_{LDO} = \frac{I_o V_o}{(I_o + I_q) V_i}, \tag{2.1} \]
where Io is the output current of the LDO and Iq is the quiescent current. With balanced current
sharing, each LDO provides Io/2 current and the total efficiency is (IoVo/2)/[(Io/2 + Iq)Vi] =
IoVo/[(Io + 2Iq)Vi]. With unbalanced current sharing, one of the LDOs provides I1 current and the
other one provides I2 current with I1 + I2 = Io. Since MOS transistors have a nearly constant
quiescent current with respect to the load current [86], the total efficiency can be expressed as
(I1 + I2)Vo/[(I1 + I2 + 2Iq)Vi], which is the same as in the balanced current sharing case.
Theoretically, there is no significant efficiency degradation due to unbalanced current sharing for
LDOs; however, the larger currents induced by unbalanced current sharing do adversely affect
reliability, as discussed in Section 2.5.
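
As a quick numerical check of the LDO argument above, the following Python sketch evaluates the total efficiency of two LDOs under balanced and unbalanced sharing. It assumes, as the text does, that each LDO draws the same constant quiescent current Iq regardless of load. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
def ldo_pair_efficiency(i1, i2, i_q, v_o, v_i):
    """Total efficiency of two LDOs delivering i1 and i2, each drawing i_q."""
    p_out = (i1 + i2) * v_o
    p_in = (i1 + i_q + i2 + i_q) * v_i
    return p_out / p_in


# Hypothetical operating point (not taken from the dissertation).
V_I, V_O, I_Q, I_O = 1.2, 1.0, 0.001, 1.0

BALANCED = ldo_pair_efficiency(I_O / 2, I_O / 2, I_Q, V_O, V_I)
UNBALANCED = ldo_pair_efficiency(0.3 * I_O, 0.7 * I_O, I_Q, V_O, V_I)
```

Under this constant-Iq assumption the two values coincide, because only the sum I1 + I2 enters the expression.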
Buck regulators are the focus throughout the chapter; however, the proposed techniques
can also be tailored for SC and LDO regulators. The regulator loss model and optimum efficiency
discussions are provided in Section 2.3.1. The extra power loss and efficiency degradation induced
by unbalanced current sharing for the general case of N identical distributed on-chip regulators are
theoretically explored in Section 2.3.2.
2.3.1 Regulator Loss Model and Efficiency
The simplified schematic of a synchronous buck regulator is shown in Fig. 2.5. It is composed
of high-side (Q1) and low-side (Q2) power MOSFETs for synchronous rectification, LC filter with
parasitic resistances RDCR and RESR, and a feedback control path.
The simplified power loss model in [87] is enhanced by including the conduction loss of the
capacitor ESR (PESR) for the power loss analysis in synchronous buck regulators
\[ P_{loss} = R_{eq} \cdot i_{rms}^2 + P_{ESR} + A \cdot f \tag{2.2} \]
where Req is the regulator equivalent resistance, irms is the inductor RMS current, A is the switching
power loss factor, and f is the regulator switching frequency. Detailed power loss analysis and
expressions for Req, PESR, and A can be found in [87, 88].
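[Schematic labels: power MOSFETs Q1 and Q2, inductor L with RDCR, capacitor C with RESR, error
amplifier (EA), compensator, comparator (CMP) with ramp input, Vref, dead time generation and gate
drivers; signals Vi, Ii, IL, Io, Vo.]
Figure 2.5: Simplified schematic of a synchronous buck regulator.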
Power conversion efficiency can be written as
\[ \eta = \frac{P_{out}}{P_{out} + P_{loss}}. \tag{2.3} \]
Since PESR is independent of the regulator output current Io, by setting ∂η/∂Io = 0, the maximum
efficiency for the continuous conduction mode (CCM) operation is obtained as [87]
\[ \eta_{max} = \frac{1}{1 + 2\frac{R_{eq}}{V_o} \cdot I_{o\_opt}} \tag{2.4} \]
at the optimum load current of
\[ I_{o\_opt} = \sqrt{\frac{A \cdot f + P_{ESR}}{R_{eq}} + \frac{1}{12} I_{p-p}^2} \tag{2.5} \]
where Vo and Ip−p are, respectively, the regulator output voltage and inductor peak to peak current.
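
The relationship between (2.2) through (2.5) can be checked numerically. The Python sketch below assumes a triangular inductor ripple in CCM, so that the squared RMS current is Io² + Ip−p²/12, which is consistent with the 1/12 term in (2.5). The function names and all numeric values are hypothetical, and `a_f` stands for the product A·f.

```python
import math


def buck_loss(i_o, r_eq, p_esr, a_f, i_pp):
    """Power loss per (2.2); a_f is the product A*f (switching loss, in watts)."""
    # Assumption: triangular ripple in CCM, so i_rms^2 = I_o^2 + I_pp^2 / 12.
    i_rms_sq = i_o ** 2 + i_pp ** 2 / 12.0
    return r_eq * i_rms_sq + p_esr + a_f


def efficiency(i_o, v_o, r_eq, p_esr, a_f, i_pp):
    """Power conversion efficiency per (2.3)."""
    p_out = v_o * i_o
    return p_out / (p_out + buck_loss(i_o, r_eq, p_esr, a_f, i_pp))


def optimum_current(r_eq, p_esr, a_f, i_pp):
    """Optimum load current per (2.5)."""
    return math.sqrt((a_f + p_esr) / r_eq + i_pp ** 2 / 12.0)


def max_efficiency(v_o, r_eq, i_o_opt):
    """Maximum efficiency per (2.4)."""
    return 1.0 / (1.0 + 2.0 * r_eq * i_o_opt / v_o)


# Hypothetical design values, for illustration only.
V_O, R_EQ, P_ESR, A_F, I_PP = 1.0, 0.2, 0.001, 0.01, 0.1
I_OPT = optimum_current(R_EQ, P_ESR, A_F, I_PP)
ETA_MAX = max_efficiency(V_O, R_EQ, I_OPT)
ETA_AT_OPT = efficiency(I_OPT, V_O, R_EQ, P_ESR, A_F, I_PP)
```

Evaluating `efficiency` at `I_OPT` should match `ETA_MAX`, and any other load current should give a lower value, which is the behavior the efficiency curves in Fig. 2.4 describe.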
2.3.2 Efficiency Degradation of Distributed Regulators with Unbalanced Current Sharing
Consider two identical buck regulators and assume the total load current supplied by these
two regulators is Io and each regulator design is optimized at Io/2. With unbalanced current sharing,
the load currents supplied by regulators 1 and 2 are, respectively, I1 and I2. The current sharing
ratios (CSRs) of the two regulators are
\[ CSR_1 = \frac{I_1}{I_o}, \quad CSR_2 = \frac{I_2}{I_o}. \tag{2.6} \]
According to (2.2), for CCM operation, the extra power loss induced by unbalanced
current sharing for two regulators as compared to the balanced case is
\[ P^{ex}_{loss\_2} = R_{eq} \cdot I_o^2 \cdot \left( CSR_1^2 + CSR_2^2 - \frac{1}{2} \right) \tag{2.7} \]
and $P^{ex}_{loss\_2} = 0$ if and only if CSR1 = CSR2 = 1/2; otherwise $P^{ex}_{loss\_2} > 0$, which
means that unbalanced current sharing leads to extra power loss.
Efficiency degradation due to unbalanced current sharing can be written as
\[ \eta_{deg\_2} = \eta_{max}|_{I_{o\_opt}=\frac{I_o}{2}} - \frac{V_o}{\frac{V_o}{\eta_{max}|_{I_{o\_opt}=\frac{I_o}{2}}} + R_{eq} \cdot I_o \cdot \left( CSR_1^2 + CSR_2^2 - \frac{1}{2} \right)}, \tag{2.8} \]
where $\eta_{max}|_{I_{o\_opt}=I_o/2}$ is the maximum efficiency at the optimum load current of Io/2.
Note that $\eta_{deg\_2} = 0$ for balanced current sharing.
Equations (2.7) and (2.8) can be generalized for N identical distributed on-chip VRs with each
design optimized at Io/N for a total load current of Io, as explained below.
The extra power loss induced by unbalanced current sharing with CSRi for the ith regulator
is
\[ P^{ex}_{loss\_N} = R_{eq} \cdot I_o^2 \cdot \left( \sum_{i=1}^{N} CSR_i^2 - \frac{1}{N} \right). \tag{2.9} \]
[Figure 2.6 panels: (a) extra power loss (mW) versus CSR1; (b) efficiency degradation (%) versus
CSR1; (c) extra power loss (mW) versus CSR1 and CSR2; (d) efficiency degradation (%) versus CSR1
and CSR2.]
Figure 2.6: Unbalanced current sharing induced extra power loss and efficiency degradation as a
function of CSRi for N identical distributed on-chip VRs. (a) Extra power loss, N=2. (b) Efficiency
degradation, N=2. (c) Extra power loss, N=3. (d) Efficiency degradation, N=3.
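The total efficiency degradation induced by unbalanced current sharing is
\[ \eta_{deg\_N} = \eta_{max}|_{I_{o\_opt}=\frac{I_o}{N}} - \frac{V_o}{\frac{V_o}{\eta_{max}|_{I_{o\_opt}=\frac{I_o}{N}}} + R_{eq} \cdot I_o \cdot \left( \sum_{i=1}^{N} CSR_i^2 - \frac{1}{N} \right)}. \tag{2.10} \]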
Note that (2.9) and (2.10) can be applied over a wide range of load currents. As the phase shedding
technique [62, 89] for conventional multiphase converters and the converter gating technique [61] for
distributed on-chip VRs are well developed to enhance the light load efficiency and achieve a high
efficiency over a wide load range, the number of active VRs Nactive can be dynamically changed
so that each regulator operates at the optimal efficiency point under various load
conditions with balanced current sharing. Thus, (2.9) and (2.10) hold for extra power loss and
efficiency degradation calculations over a wide load range.
As an example, using the design parameters in [71] for the fully integrated buck regulator,
the extra power loss and efficiency degradation are evaluated for two and three distributed buck
regulator cases with different CSR values. Each regulator is optimized at 225mA and the total load
currents are, respectively, 450mA and 675mA for the two and three regulator cases. As can be seen
from Fig. 2.6, as CSR moves away from the balanced current sharing point (CSR1 = 0.5 for the two
regulator case, CSR1 = CSR2 = 1/3 for the three regulator case), the additional power loss and
efficiency degradation increase rapidly. Moreover, the highest extra power loss and efficiency
degradation points for the three regulator case are worse than for the two regulator case. It is
difficult to visually demonstrate the change in extra power loss and efficiency degradation when the
number of regulators increases beyond three. With more regulators and a larger output current,
however, the highest extra power loss and efficiency degradation further increase. This indicates
that significant attention should be paid to guaranteeing proper current sharing among distributed
on-chip VRs, which are widely used in high performance microprocessors.
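
Equations (2.9) and (2.10) are straightforward to evaluate for any list of current sharing ratios. The Python sketch below is a minimal illustration: the total load current of 450mA is the two-regulator value from the text, while the values of Req, Vo, and the maximum efficiency are hypothetical placeholders, not the parameters of [71].

```python
def _csr_spread(csr):
    """Sum of squared CSRs minus 1/N; zero only for balanced sharing."""
    if abs(sum(csr) - 1.0) > 1e-9:
        raise ValueError("current sharing ratios must sum to 1")
    return sum(c ** 2 for c in csr) - 1.0 / len(csr)


def extra_power_loss(csr, r_eq, i_o):
    """Extra power loss per (2.9), in watts."""
    return r_eq * i_o ** 2 * _csr_spread(csr)


def efficiency_degradation(csr, r_eq, i_o, v_o, eta_max):
    """Efficiency degradation per (2.10), as a fraction (not percent)."""
    return eta_max - v_o / (v_o / eta_max + r_eq * i_o * _csr_spread(csr))


# I_O follows the two-regulator example; the other values are hypothetical.
R_EQ, I_O, V_O, ETA_MAX = 0.2, 0.45, 1.0, 0.9

LOSS_BALANCED = extra_power_loss([0.5, 0.5], R_EQ, I_O)
LOSS_UNBALANCED = extra_power_loss([0.3, 0.7], R_EQ, I_O)
DEG_UNBALANCED = efficiency_degradation([0.3, 0.7], R_EQ, I_O, V_O, ETA_MAX)
```

Passing a list of length N covers the general case, for example `[1/3, 1/3, 1/3]` for three balanced regulators.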
2.4 Stability Implications of Unbalanced Current Sharing
Stable operation of the stand-alone on-chip VR as well as the whole PDN is the basis for
every other performance metric. Oscillations can occur due to an unstable internal feedback loop
of a single VR or interactions among different VRs. The stability issue, if not properly addressed,
can adversely affect important design aspects including line and load regulations, making other
performance enhancing techniques useless.
Stability implications of unbalanced current sharing are explored for both individual on-chip
VRs and the PDN as a whole in this section. To evaluate the effects of unbalanced current sharing
on individual on-chip VRs, the state-space averaging method [90] is applied to obtain the various
important transfer functions of closed loop synchronous buck regulators while considering parasitic
impedances. For the stability of the whole PDN, the implications of unbalanced current sharing
can be examined by analyzing the Y-parameter model of the individual on-chip VRs based on the
recently proposed hybrid stability framework for PDNs [32].
2.4.1 Stability of Individual On-Chip Voltage Regulators
The state-space expression for a conventional voltage mode controlled buck regulator with
diode rectification and g-parameters has been explored in [91]. For the synchronous buck regulator
operating in CCM, as shown in Fig. 2.5, the open-loop g-parameter set can be written as
\[
\begin{bmatrix} Y_{i\_o} & T_{oi\_o} \\ G_{io\_o} & -Z_{o\_o} \end{bmatrix}
= \frac{\begin{bmatrix} \dfrac{D^2 s}{L} & \dfrac{D(1+sR_{ESR}C)}{LC} \\[2ex] \dfrac{D(1+sR_{ESR}C)}{LC} & -\dfrac{(R_E+sL)(1+sR_{ESR}C)}{LC} \end{bmatrix}}{s^2 + s\dfrac{R_E+R_{ESR}}{L} + \dfrac{1}{LC}}
\tag{2.11}
\]
\[
\begin{bmatrix} G_{ci} \\ G_{co} \end{bmatrix}
= \frac{\begin{bmatrix} \dfrac{sDU_E}{L} \\[2ex] \dfrac{U_E(1+sR_{ESR}C)}{LC} \end{bmatrix}}{s^2 + s\dfrac{R_E+R_{ESR}}{L} + \dfrac{1}{LC}}
+ \begin{bmatrix} I_o \\ 0 \end{bmatrix}
\tag{2.12}
\]
where
\[ R_E = R_{DCR} + R_{on\_hs} D + R_{on\_ls}(1 - D) \tag{2.13} \]
\[ U_E = V_i + (R_{on\_ls} - R_{on\_hs}) I_o. \tag{2.14} \]
$Y_{i\_o}$, $T_{oi\_o}$, $G_{io\_o}$, $Z_{o\_o}$, $G_{ci}$, $G_{co}$, and $D$ are, respectively, the open loop
input admittance, the output to input current transfer function, the input to output voltage transfer
function, the output impedance, the control to input current transfer function, the control to output
voltage transfer function, and the duty cycle of the buck regulator.
[Plot: S(CSRi, N) (×10^6) versus CSRi from 0 to 1 for N=2, N=10, N=20, N=50, and N=80, with the
stable and unstable regions marked.]
Figure 2.7: Stability of individual on-chip VR as a function of CSRi and N.

The line and load regulation capabilities of a buck regulator can be examined by analyzing
the closed-loop input to output voltage transfer function Gio_c and the output impedance Zo_c,
respectively. To achieve a stable line and load regulation, all poles of the corresponding transfer
function need to lie within the left-half of the s-plane. The closed-loop g-parameters can be obtained
based on the open-loop g-parameters and the relationship demonstrated in [91]. Assuming Type III
compensation [92], the characteristic equation of Gio_c and Zo_c is
\[ CLs^2 + (C G_a G_{cc} G_{se} U_E R_{ESR} + C R_{ESR} + C R_E)s + G_a G_{cc} G_{se} U_E + 1 = 0 \tag{2.15} \]
where Gse, Gcc, Ga are, respectively, the sensing gain of the output voltage, the transfer function
of the error amplifier (EA) and compensator, and the PWM generator gain. Typically, Gse and
Ga are constant. As some of the coefficients are functions of Io, the solutions of (2.15) change as
Io changes. For N identical distributed on-chip VRs with unbalanced current sharing, some of the
parallel on-chip VRs will supply more current while others will supply less, leading to the movement
of system poles. As the stability is affected by the right-half plane (RHP) poles, we define a CSR-
and N-dependent function S(CSR,N) as
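\[
S(CSR, N) =
\begin{cases}
\max\limits_{i=1,\dots,n} \{\mathrm{Re}(p_i)\}, & \max\limits_{i=1,\dots,n} \{\mathrm{Re}(p_i)\} < 0 \\
\min\limits_{i=1,\dots,j} \{\mathrm{Re}(p_i^+)\}, & \text{otherwise}
\end{cases}
\tag{2.16}
\]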
where $n$, $j$, $p_i$ ($i = 1, ..., n$), and $p_i^+$ ($i = 1, ..., j$) are, respectively, the total number
of system poles, the total number of RHP (or 0) poles, the ith system pole, and the ith RHP (or 0)
pole. |S(CSR,N)| indicates either how close the system is to becoming unstable (when
$\max_{i=1,\dots,n} \{\mathrm{Re}(p_i)\} < 0$) or how far the system has gone beyond the marginally
stable point (otherwise). The system is stable if S(CSR,N) < 0 and unstable otherwise.
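
The definition in (2.16) can be sketched directly in Python once the poles are known. The function below is an illustrative sketch, not the author's implementation: it takes the pole list as given (for example, the roots of (2.15) obtained separately) and groups poles with zero real part together with the RHP poles, following the "(or 0)" wording above. The function name is hypothetical.

```python
def stability_metric(poles):
    """Evaluate S per (2.16) for an iterable of real or complex poles."""
    real_parts = [complex(p).real for p in poles]
    worst = max(real_parts)
    if worst < 0:
        # All poles in the left-half plane: report the pole closest to the axis.
        return worst
    # Otherwise report the smallest real part among RHP (or zero) poles.
    return min(r for r in real_parts if r >= 0)


def is_stable(poles):
    """Stable if and only if S < 0."""
    return stability_metric(poles) < 0
```

For instance, `stability_metric([-2 + 1j, -2 - 1j, -5])` evaluates to `-2.0`, while `stability_metric([-1, 0.5, 3])` evaluates to `0.5`.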
Using design parameters similar to those in [71], S(CSRi, N) for the ith VR within N identical
distributed on-chip VRs is plotted as a function of CSRi and N in Fig. 2.7. It can be seen from
Fig. 2.7 that, for a fixed N, S(CSRi, N) increases as CSRi increases. Note that although
all CSRi values are plotted even for large N in Fig. 2.7 for completeness, due to the
maximum current supply capability of a single VR, the inductor current of an individual VR can
become saturated, in which case the CCM model is no longer valid. The output voltage can drop [85]
for large values of N and CSRi, for example N = 80 and CSRi = 0.5. Also, as N becomes large,
S(CSRi, N) approaches the unstable region from the stable one as CSRi increases, indicating the
negative effects of unbalanced current sharing on the stability and proper operation of an individual
VR.
2.4.2 Stability of the Power Delivery Network
A sufficient condition for stability checking of the PDN is proposed in [32] based
on the hybrid stability framework. This condition consists of a complementary way of using either
passivity evaluation or system gain evaluation for LTI systems. By satisfying either one of these two
conditions, the stability of the PDN can be guaranteed. For stability checking using the system gain
condition, a Z-parameter model of the passive subnetwork is needed for evaluation. The passive
subnetwork model can vary for different applications or design requirements, which makes it difficult
to evaluate the general effects of unbalanced current sharing on the stability of the PDN. However,
the passivity evaluation does shed light on this point.

[Plot: $\lambda^i_{min}$ versus $f_k$ (Hz) from 10^7 to 10^8 for ∀N balanced CS; N=2, CSRi=0.6;
N=10, CSRi=0.15; N=20, CSRi=0.1; and N=50, CSRi=0.05, with the passive and potentially unstable
regions and the 45MHz point marked.]
Figure 2.8: $\lambda^i_{min}(j\omega_k)$ as a function of $f_k$ under different values of CSRi and N.
$\lambda^i_{min}(j\omega_k)$ shifts rightwards as N · CSRi increases, demonstrating the adverse effects
of unbalanced current sharing on VR passivity.

The synchronous buck regulator system is approximated as a linear continuous-time
time-invariant system through the state-space averaging method [93]. Thus, the passivity criterion
[32] can be applied, which is given by
\[ \lambda_{min}(j\omega_k) = \min_{i=1,\dots,N;\; j=1,2} \{ \lambda_j ( Y_i(j\omega_k) + Y_i^H(j\omega_k) ) \} \tag{2.17} \]
where $\lambda_{min}(j\omega_k)$ is the minimum eigenvalue over all VRs at $\omega_k$ and $H$ denotes
the complex conjugate transpose. The passivity condition is met for the VRs if
$\lambda_{min}(j\omega_k) \geq 0$.
The Y-parameter model for the ith VR can be obtained through the closed-loop g-parameters.
Note that the Y-parameter model is a function of the individual VR output current Io; thus, with
unbalanced current sharing, it is affected, and so is $\lambda_{min}(j\omega_k)$. Using the same design
parameters as in Section 2.4.1, $\lambda^i_{min}(j\omega_k)$ is examined for the ith VR under different
CSRi and N values in Fig. 2.8, where
\[ \lambda^i_{min}(j\omega_k) = \min_{j=1,2} \{ \lambda_j ( Y_i(j\omega_k) + Y_i^H(j\omega_k) ) \}. \tag{2.18} \]
$\lambda^i_{min}(j\omega_k)$ remains negative for fk < 10MHz and positive for fk > 100MHz. As the
current supplied by the ith VR (i.e., N · CSRi) increases, $\lambda^i_{min}(j\omega_k)$ shifts rightwards,
rendering the following
\[ \lambda_{min}(j\omega_k)|_{\omega_k \leq \omega_{k0}} = \min_{i=1,\dots,N} \{ \lambda^i_{min}(j\omega_k) \} = \lambda^i_{min}(j\omega_k)|_{CSR_i = CSR_{max}} \tag{2.19} \]
where
\[ \lambda_{min}(j\omega_{k0}) = 0, \quad CSR_{max} = \max_{i=1,\dots,N} CSR_i. \tag{2.20} \]
For example, at fk = 45MHz, with balanced current sharing (CS) (i.e., ∀N, balanced CS),
$\lambda_{min}(j\omega_k)|_{\omega_k = 9\pi \cdot 10^7} > 0$ and the passivity condition is satisfied.
However, with unbalanced current sharing (e.g., N = 20, CSRi = 0.1),
$\lambda_{min}(j\omega_k)|_{\omega_k = 9\pi \cdot 10^7} < 0$, which pushes the originally
passive point into the potentially unstable region, indicating the adverse effects of unbalanced
current sharing on the stability of the whole PDN.
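
The passivity test in (2.17) and (2.18) reduces to finding the smaller eigenvalue of a 2×2 Hermitian matrix, which has a closed form. The Python sketch below illustrates this with only the standard library. It assumes each Y-parameter matrix is supplied as a nested pair `[[y11, y12], [y21, y22]]` already evaluated at the frequency of interest; deriving those entries from the closed-loop g-parameters is outside the sketch, and the function names are hypothetical.

```python
import math


def min_eig_hermitian_part(y):
    """Smaller eigenvalue of Y + Y^H for a 2x2 complex matrix Y, as in (2.18)."""
    (y11, y12), (y21, y22) = y
    a = 2.0 * complex(y11).real                  # (Y + Y^H)[0][0]
    d = 2.0 * complex(y22).real                  # (Y + Y^H)[1][1]
    b = complex(y12) + complex(y21).conjugate()  # (Y + Y^H)[0][1]
    # Eigenvalues of [[a, b], [conj(b), d]] are (a+d)/2 +/- sqrt(((a-d)/2)^2 + |b|^2).
    return (a + d) / 2.0 - math.sqrt(((a - d) / 2.0) ** 2 + abs(b) ** 2)


def lambda_min(y_matrices):
    """Minimum over all VRs, as in (2.17), at a single frequency."""
    return min(min_eig_hermitian_part(y) for y in y_matrices)


def is_passive(y_matrices):
    """Passivity condition: lambda_min >= 0."""
    return lambda_min(y_matrices) >= 0
```

Sweeping this check over a list of frequencies, with one Y matrix per VR at each frequency, would give the kind of curve summarized for Fig. 2.8.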
2.5 Reliability Implications of Unbalanced Current Sharing
Electromigration (EM) induced wear-out dictates the lifetime of each component of the
PDN. EM results in gradual mass transport in metal conductors along the direction of an applied
electric field, which in turn may cause open or short circuits. The metal wires in the PDN are
particularly vulnerable to EM as they experience uni-directional currents [94], and such constant
stress reveals EM failures faster. EM grows with current density J.
Black’s equation [95] captures the mean time to failure (MTTF) due to EM:
\[ MTTF = A J^{-n} \exp\left( \frac{E_a}{kT} \right) \tag{2.21} \]
where A is a constant that depends on the geometry, Ea is the EM activation energy, k is the
Boltzmann constant, n is a material-specific constant, and T is the temperature. Following [82],
Black’s equation can be adjusted to consider current crowding and Joule heating as
\[ MTTF = A (cJ)^{-n} \exp\left[ \frac{Q}{k(T + \Delta T)} \right] \tag{2.22} \]
where both Q and c are material-specific constants.

[Plot: MTTF (year) versus CSRi from 0.4 to 1.0, with curves for 2 VRs and 3 VRs.]
Figure 2.9: MTTF as a function of CSRi.

Consider N identical distributed on-chip VRs, each of which is optimized for a load current of
Io/N, where Io represents the total load current. Since J is directly related to CSRi at a specific
Io, the MTTF of the metal wire at the output of the ith regulator can be expressed in terms of CSRi
as
\[ MTTF_i = A' (c \, CSR_i)^{-n} \exp\left[ \frac{Q}{k(T + \Delta T)} \right] \tag{2.23} \]
where A′ is a constant that depends on the geometry and Io.
[Schematic labels: VR1, VR2, Vi, Vo1, Vo2, Vo, Reff1, Reff2, Reff3, I1, I2, I3, Iload.]
Figure 2.10: Simplified model of two identical distributed on-chip VRs with power grid effective
resistances.
For the same example in [71], for the two and three regulator cases with a total load current of
450mA and 675mA, respectively, Fig. 2.9 shows how MTTFi for the ith regulator changes due to
unbalanced current sharing. Fig. 2.9 captures the impact of unbalanced current sharing on MTTF
under EM per (2.23). We report how the MTTF varies as a function of CSR where n = 1.8, Q
= 0.8eV, c = 10, and ∆T = 40°C [96]. We observe that differences in CSR can result in notable
differences in MTTF. The MTTF at CSR = 0.5 (0.33), which corresponds to perfect load balance,
is 5 years at 65°C for the two (three) regulator case. For the two regulator case, both regulators
would have this same MTTF of 5 years at CSR = 0.5. If CSR assumes a value higher than 0.5 for
one of the regulators, the MTTF value quickly decreases below 5 years. The other regulator’s CSR
in this case remains lower than 0.5, and hence induces an MTTF of more than 5 years. In this case,
one of the regulators would fail much earlier than the other. Better load balance (i.e., CSR = 0.5
for the two regulator case) mitigates this adverse effect on reliability. Fig. 2.9 reveals a similar trend
for the three VR case.
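
The CSR dependence in (2.23) can be illustrated by scaling a known balanced-case MTTF. The Python sketch below is a simplification: it assumes that A′, c, Q, T, and ∆T are unchanged as CSR varies, so only the (c · CSRi)^(−n) term matters, and it therefore does not attempt to reproduce Fig. 2.9 exactly. The function name is hypothetical; n = 1.8 and the 5-year balanced MTTF are the values quoted in the text.

```python
def mttf_vs_csr(csr, csr_balanced, mttf_balanced, n=1.8):
    """Scale a balanced-case MTTF to another CSR via the (c*CSR)^(-n) term of (2.23).

    Assumption: all other factors in (2.23), including the Joule heating
    term Delta T, are held constant as CSR changes.
    """
    if csr <= 0 or csr_balanced <= 0:
        raise ValueError("CSR values must be positive")
    return mttf_balanced * (csr / csr_balanced) ** (-n)


# Two-regulator case from the text: 5 years at CSR = 0.5, n = 1.8.
MTTF_HEAVY = mttf_vs_csr(0.6, 0.5, 5.0)  # regulator carrying 60% of the load
MTTF_LIGHT = mttf_vs_csr(0.4, 0.5, 5.0)  # regulator carrying 40% of the load
```

Under this assumption the more heavily loaded regulator falls below the balanced MTTF while the lightly loaded one rises above it, matching the qualitative trend described above.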
[Block diagram labels: N distributed on-chip VRs, each with Vi, dead time generation and gate
drivers, L, Cout, EA, compensator, CMP with triangular wave, and an average current sensor, connected
to the power grids; Vsense1, Vsense2, ..., VsenseN; N comparator; current_mismatch decision; Vref
control logic with EN; switch network; bandgap voltage reference; Rb; Rs; Vref1, Vref<1:N>.]
Figure 2.11: System level block diagram of the proposed Vref control method and multi-level Vref
generator for N identical distributed on-chip VRs.
2.6 Adaptive Reference Voltage Control
The implications of unbalanced current sharing on power efficiency, stability, reliability, and
overall functionality of the chip are demonstrated above. Balanced current sharing is beneficial for
maintaining the overall PDN performance. An adaptive reference voltage control method designed
specifically for distributed on-chip VRs is proposed to balance the current sharing. The proposed
technique is scalable to different numbers of distributed on-chip VRs and can be used for different
types of VRs. The control algorithm is explained, and circuit implementation and simulations are
presented to verify the effectiveness of the proposed technique. Practical concerns are also addressed
in this section.
2.6.1 Adaptive Reference Voltage Control Mechanism
Consider two identical distributed VRs connected to the same power grid. The simplified
model is shown in Fig. 2.10 with the power grid effective resistance included between any two
connection nodes within the grid. With a large number of VDD C4 pads, the input voltage of the
VRs Vi can be considered ideal and constant. To perform a steady state analysis with multiple VRs,
suppose Vo1 = Vo2; then I3 = 0, and Reff3 can be removed as an open circuit. When Vo1 = Vo2, to
make I1 = I2 for balanced current sharing, Reff1 and Reff2 have to be equal. However, in practice,
due to the location variations of the VRs with respect to the load, Reff1 and Reff2 can hardly be
equal, which means that variations between Vo1 and Vo2 are unavoidable to make I1 = I2 for balanced
current sharing. In fact, the effective resistances Reff1, Reff2, and Reff3 are very small, making
balanced current sharing possible with quite small variations of Vo1 and Vo2 and with negligible
effects on the regulated output voltage Vo.

[Flowchart: Start; skip n cycles and take the average inductor current for each VR; find the min and
max average inductor current; decide whether max − min > current_mismatch; if yes, decrease Vref_max
by a voltage step and increase Vref_min by a voltage step; if no, leave the reference voltages
unchanged.]
Figure 2.12: Flowchart of the proposed adaptive Vref control algorithm.

Based on the above analyses, an adaptive reference voltage Vref control mechanism that is
tailored specifically for distributed on-chip VRs is proposed. A system level block diagram of the
proposed adaptive Vref control method is illustrated in Fig. 2.11 and the Vref control algorithm is
presented in Fig. 2.12 for N identical distributed on-chip VRs. The proposed adaptive Vref control
block consists of an average current sensor within each VR, two comparators with N inputs each
(N comparator) [97] to determine the maximum and minimum currents, a current_mismatch decision
block, and a Vref control logic. For each iteration, the average current value of each VR for that
cycle is obtained through the average current sensor and represented by the respective output voltage
Vsensei (i = 1, ..., N). The maximum and minimum values of Vsensei (i = 1, ..., N) are decided by the
N comparator [97]. The difference between the maximum and minimum current is compared to a
current_mismatch value by the current_mismatch decision block. The processed outputs of the
N comparator and current_mismatch decision block serve as the control signals for the Vref control
logic for multi-level Vref generation through the switch network and resistor string. Mismatch
between the maximum and minimum average inductor current indicates unbalanced current sharing.
If the mismatch is larger than a certain threshold current_mismatch, the proposed Vref control
algorithm is triggered and the corresponding reference voltages are adjusted. The current_mismatch
value is added as an option to adjust the desired accuracy of the current matching among the VRs
and to eliminate constant toggling during steady state, where all the VR output currents are close
to each other. If the optimal load current (Io_opt in (2.5)) a single VR can supply is in the range
of several hundred mA, a mismatch within a threshold value of a few mA can be considered balanced
current sharing. A threshold value of 30 mA is used in the simulations. Too small a threshold value
can lead to toggling reference voltages at steady state.
By increasing (decreasing) Vref of an individual on-chip VR, the output current supplied by
that VR will increase (decrease). Vref_max and Vref_min in Fig. 2.12 denote the reference voltages
for the on-chip VRs with the maximum and minimum average inductor current, respectively. Once
the difference between the maximum and minimum average inductor current values is greater than
current_mismatch, Vref_max is decreased by a voltage step to decrease the output current supplied
by the VR which provides the maximum output current. Vref_min is increased by a voltage step to
increase the output current supplied by the VR which provides the minimum output current. The
reference voltages of other VRs remain unchanged.
Note that the Vref control loop waits n clock cycles before changing Vref again. This
allows the VR’s voltage regulation feedback loop to respond before any further change is
made to Vref in the next step. Making the reference control loop slower than the VR’s voltage
regulation feedback loop improves the stability of the overall system.
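
The control loop described above can be sketched behaviorally in Python. This is an illustrative model under simplifying assumptions, not the circuit implementation: each VR is assumed to hold its output exactly at its Vref once settled (standing in for the "skip n cycles" wait), each VR reaches one common load node through its own effective resistance, and the resistances between VR outputs are ignored. The function names and all numeric values except the 30 mA threshold are hypothetical.

```python
def share_currents(v_out, r_eff, i_load):
    """Steady-state current of each VR in a star-shaped resistive model.

    Assumption: VR i holds its output at v_out[i] and reaches one common load
    node through r_eff[i]; grid resistances between VR outputs are ignored.
    """
    g = [1.0 / r for r in r_eff]
    v_load = (sum(gi * v for gi, v in zip(g, v_out)) - i_load) / sum(g)
    return [gi * (v - v_load) for gi, v in zip(g, v_out)]


def adaptive_vref(v_ref, r_eff, i_load, current_mismatch, v_step, max_iters=1000):
    """Iterate the loop of Fig. 2.12 until the mismatch is within the threshold."""
    v_ref = list(v_ref)
    for _ in range(max_iters):
        # Each pass stands for "skip n cycles": the VRs have settled (v_out == v_ref).
        currents = share_currents(v_ref, r_eff, i_load)
        hi = max(range(len(currents)), key=currents.__getitem__)
        lo = min(range(len(currents)), key=currents.__getitem__)
        if currents[hi] - currents[lo] <= current_mismatch:
            break
        v_ref[hi] -= v_step  # VR with the maximum current: lower its Vref
        v_ref[lo] += v_step  # VR with the minimum current: raise its Vref
    return v_ref, share_currents(v_ref, r_eff, i_load)


# Hypothetical two-VR example: 1 A load, 30 mA threshold, 0.5 mV step.
V_REF, CURRENTS = adaptive_vref([1.0, 1.0], [0.02, 0.01], 1.0, 0.03, 0.0005)
```

In this model a voltage step that is large relative to the threshold can overshoot the balanced point and keep toggling, which is why `max_iters` bounds the loop and why the threshold and step size need to be chosen together.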
[Schematic labels: VR power stage with power MOSFET and sense MOSFET, Vi, opamp, transmission gate
clocked by φ, Chold, Rsense, Vsense, and connections to the gate driver.]
Figure 2.13: Schematic of the average current sensor.
As compared with [73], the proposed method does not rely on equalizing duty cycles to
balance the current sharing, and thus can be applied to most regulator types that need a reference
voltage to operate. Furthermore, as the reference voltage of each VR is adjusted individually with
respect to an initial reference voltage, the power noise on the local power grids is less affected by
localized load fluctuations.
2.6.2 Adaptive Reference Voltage Control Implementation
Circuit level implementation of the proposed adaptive Vref control method is analyzed in
this section. Although a buck regulator is adopted for demonstration, the proposed Vref control
method can be applied to other regulator types by adopting an appropriate current sensor for that
regulator type, as the proposed method is a general way of modulating Vref to balance the current.
2.6.2.1 Average Current Sensor
The schematic of the average current sensor [98] is shown in Fig. 2.13. When the sampling
clock φ becomes high, the drain voltages of the power MOSFET and the sense MOSFET are
equalized by the operational amplifier. The inductor current from the power MOSFET is mirrored
Figure 2.14: Schematic of the analog N comparator for maximum and minimum current decision.
to the sense MOSFET and a corresponding voltage Vsense that is proportional to the inductor
current is generated as output. Vsense is maintained when φ becomes low. By replacing the ramp
signal in Fig. 2.5 with a symmetrical triangular waveform shown in Fig. 2.11, a clock signal φ′ can
be generated to sample the instantaneous inductor current value in the middle of the inductor energizing
or de-energizing phase, which corresponds to the average inductor current value [98]. As n clock
cycles need to be skipped before taking the next sample for average inductor current, the frequency
fφ of the actual sampling clock signal φ needs to be fφ′/(n+ 1).
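The two relations used above can be checked numerically with a short sketch. It assumes an ideal, piecewise-linear inductor current ramp (so that the mid-phase sample equals the average), and the function names and numeric values are hypothetical.

```python
def ramp_current(i_start: float, i_end: float, fraction: float) -> float:
    """Inductor current at a fractional position (0..1) along a linear ramp."""
    return i_start + (i_end - i_start) * fraction


def sampling_clock_frequency(f_phi_prime: float, n: int) -> float:
    """Frequency of the actual sampling clock when n cycles are skipped."""
    return f_phi_prime / (n + 1)


# Hypothetical ripple: the current ramps from 0.8 A (valley) to 1.2 A (peak).
i_valley, i_peak = 0.8, 1.2
mid_sample = ramp_current(i_valley, i_peak, 0.5)  # sample taken mid-phase
average = (i_valley + i_peak) / 2                 # average of a linear ramp

# Hypothetical clock: a 100 MHz mid-phase clock with n = 4 skipped cycles.
f_phi = sampling_clock_frequency(100e6, 4)
```

For a linear ramp the mid-phase sample and the average coincide, which is why a single sample per switching period is sufficient under this assumption.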
2.6.2.2 N Comparator
The schematic of the N comparator [97] for maximum and minimum current decision is
shown in Fig. 2.14. Vsensei (i = 1, ..., N) from the output of the average current sensor serves as the
input of the N comparator. For the N comparator for maximum current decision, the tail current
Figure 2.15: Schematic of the current_mismatch decision block.
provided by transistor Mtail is divided into each branch equally when the same voltage is given
to all inputs. Mi (i = 1, ..., N) devices are biased and sized appropriately ((W/L)Mtail = N(W/L)Mi) to
reflect this distribution. The voltage input Vsensei determines the portion of the tail current that
passes through each branch. Since the sum of the currents from all the branches must be equal to
the tail current provided by the Mtail device, the branch with the highest input voltage gets the
largest portion of the tail current. The branch currents are then mirrored and a high resistance
output node is formed using the Mi (i = 1, ..., N) devices. Since Mi (i = 1, ..., N) devices are biased
for 1/N of the tail current, the output voltage becomes logic high when a branch gets more than
1/N of the tail current, which is true for the branch with the highest voltage, and logic low if a
branch gets less than 1/N of the tail current. The high resistance node provides high gain at the
output but further cascading may be needed to provide rail to rail outputs. Less than 1mV input
Figure 2.16: Simulation results with and without the proposed adaptive Vref control scheme for two
identical distributed on-chip VRs. (a) Inductor currents before and after the proposed Vref control
is applied. (b) A zoomed view of balanced current sharing showing the effectiveness of the proposed
Vref control method. (c) A zoomed view of unbalanced current sharing without the proposed Vref
control. (d) Vrefs signal change showing the operation of the proposed Vref control method.
voltage difference can be distinguished by cascading three stages in the simulations. In the case
where the input voltages are very close to each other, this comparator may give incorrect outputs
in which more than one current is flagged as minimum or maximum. To handle this case, the outputs of the
N comparator Vmaxi and Vmini (i = 1, ..., N) are processed by digital logic to generate V′maxi
and V′mini (i = 1, ..., N) to control the current_mismatch decision block and Vref control logic
shown in Fig. 2.11. If there is more than one maximum or minimum current, the digital logic
simply selects the VR with the smallest i as the one that supplies the maximum or minimum current.
The N comparator for minimum current decision can be implemented as a complement of the N
comparator for maximum current decision shown in Fig. 2.14.
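A behavioral sketch of the maximum/minimum decision followed by the tie-breaking logic is given here. It is an idealized model rather than a transistor-level description: the function name one_hot_extreme and the resolution parameter (the input difference below which the analog stage is assumed to flag several branches at once) are hypothetical.

```python
from typing import List


def one_hot_extreme(v_sense: List[float], find_max: bool = True,
                    resolution: float = 1e-4) -> List[bool]:
    """Idealized N comparator followed by the tie-breaking digital logic.

    The raw comparator outputs flag every input that lies within
    `resolution` volts of the extreme value, so several outputs may be
    high at once. The digital logic keeps only the smallest index.
    """
    extreme = max(v_sense) if find_max else min(v_sense)
    raw = [abs(v - extreme) < resolution for v in v_sense]  # may not be one-hot
    first = raw.index(True)                                 # smallest flagged index
    return [i == first for i in range(len(v_sense))]


# Inputs 1 and 2 are too close to separate, so index 1 (the smaller) is chosen.
v_max_sel = one_hot_extreme([0.30, 0.45005, 0.45], find_max=True)
v_min_sel = one_hot_extreme([0.30, 0.45005, 0.45], find_max=False)
```

The one-hot outputs play the role of V′maxi and V′mini, which select a single Vsensei through the transmission gates.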
Figure 2.17: Simulation results with and without the proposed adaptive Vref control scheme for
three identical distributed on-chip VRs. (a) Inductor currents before and after the proposed Vref
control is applied. (b) A zoomed view of balanced current sharing showing the effectiveness of
the proposed Vref control method. (c) A zoomed view of unbalanced current sharing without the
proposed Vref control. (d) Vrefs signal change showing the operation of the proposed Vref control
method.
2.6.2.3 Current Mismatch Decision
The schematic of the current_mismatch decision block is shown in Fig. 2.15. The processed
outputs of the N comparator V′maxi and V′mini (i = 1, ..., N) are fed to 2N transmission gates (TG) as
selection signals for the maximum and minimum value of Vsensei (i = 1, ..., N). The maximum and
minimum value of Vsensei serve as the inputs of the current_mismatch comparator as, respectively,
Vmax and Vmin to generate the enable signal EN for subsequent Vref control logic. An intentional
input transistor size mismatch is introduced for the current_mismatch comparator with larger
transistor size connected to Vmin as compared to that connected to Vmax to achieve the offset voltage
Voffset that corresponds to the current_mismatch value. Only when Vmax − Vmin > Voffset will
the EN signal be active. As current_mismatch does not need to be accurate as long as it is
Figure 2.18: Simulation results with sinusoidal and step load current for three identical distributed
on-chip VRs. (a) Sinusoidal load current applied at 2µs. (b) Step load current waveform applied.
(c) Balanced inductor currents under sinusoidal current load. (d) Balanced inductor currents under
step current load. (e) A zoomed view of balanced inductor currents near the rising edge of the step
current load. (f) A zoomed view of balanced inductor currents near the falling edge of the step
current load.
larger than ∆(∆I), as will be discussed next, practical circuit implementations considering process
variations have negligible impacts on the circuit function.
Figure 2.19: Simulation results with and without the proposed adaptive Vref control scheme for
three distributed on-chip VRs under distribution wire and VR mismatches. (a) Inductor currents
before and after the proposed Vref control is applied. (b) A zoomed view of balanced current
sharing showing the effectiveness of the proposed Vref control method under distribution wire and
VR mismatches. (c) A zoomed view of unbalanced current sharing without the proposed Vref
control. (d) Vrefs signal change showing the operation of the proposed Vref control method.
2.6.2.4 Multi-Level Reference Voltage Generation
The proposed multi-level Vref generator is composed of a Vref control logic, a bandgap
voltage reference, and a simple resistor string DAC as shown in Fig. 2.11. There are two resistors
with large resistance Rb at the top and bottom of the string and a few resistors with smaller
resistance Rs connected in the middle to generate the desired Vrefs. V′maxi, V′mini (i = 1, ..., N),
EN, and a clock signal, which is a delayed version of φ, are given to the Vref control logic. This
logic determines how the reference voltages for each VR should behave according to the algorithm
in Fig. 2.12. The logic can be implemented completely in Verilog and synthesized.
The reference voltage generation requires analog implementation, and this implementation
can be a resistor string DAC. The voltage step level that can achieve the desired current_mismatch
value is the LSB of the DAC. The goal of the adaptive Vref control method is to achieve ∆I =
Imax − Imin < current_mismatch. If ∆I = ∆I0 without Vref control and one voltage step changes
∆I by ∆(∆I), the number of DAC bits (NDAC) that is fine enough
for balanced current sharing can be estimated as NDAC > log2(∆I0/∆(∆I)). A 7-bit DAC is used
to achieve a 30mA current_mismatch value with a voltage step of 1mV in the simulations. In
the case where a large number of VRs and a high resolution DAC are needed, a charge pump can be
utilized for each phase after the Vref control logic for DAC implementation to avoid possible routing
problems induced by the resistor string.
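The resolution estimate NDAC > log2(∆I0/∆(∆I)) can be evaluated with a small helper. The function name and the numeric inputs are hypothetical and chosen only to illustrate the strict inequality; they are not the values behind the 7-bit DAC used in the simulations.

```python
import math


def min_dac_bits(delta_i0: float, delta_delta_i: float) -> int:
    """Smallest integer N_DAC satisfying N_DAC > log2(delta_i0 / delta_delta_i).

    delta_i0: current spread without Vref control, in A.
    delta_delta_i: change of the current spread per voltage step, in A.
    """
    # floor(x) + 1 is the smallest integer strictly greater than x.
    return math.floor(math.log2(delta_i0 / delta_delta_i)) + 1


# Hypothetical example: 0.6 A initial spread, 10 mA of correction per step.
bits = min_dac_bits(0.6, 0.01)       # log2(60) is about 5.9
# When the ratio is an exact power of two, the strict inequality adds one bit.
bits_pow2 = min_dac_bits(8.0, 1.0)   # log2(8) = 3
```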
2.6.3 Simulation Verifications
To demonstrate the effectiveness of the proposed control method, two and three identical
distributed on-chip VR cases are simulated. The power grid parameters are provided in Section 2.7.
Simulation results with constant DC load current are shown in Fig. 2.16 and Fig. 2.17, respectively,
for the two and three VR cases. In the simulations, ideal Vref = 0.5V is used to realize 1V output
voltage. A Vref step of 1mV is used in the simulations. The proposed adaptive Vref control method
begins to operate at 5µs. As can be seen from Fig. 2.16 (a), (c) and Fig. 2.17 (a), (c), for stand-alone
VRs operating without proper Vref control, large inductor current variations occur among
those VRs. After the proposed Vref control mechanism is applied, as seen from Fig. 2.16 (a), (b) and
Fig. 2.17 (a), (b), the unbalanced current converges quickly to the balanced one for both the two and
three VR cases. Also, as can be seen from Fig. 2.16 (d) and Fig. 2.17 (d), only small variations of
the reference voltage are needed to achieve a good inductor current match while the proper operation of
the VRs is guaranteed. Simulation results with a fast changing sinusoidal and a step current load
are shown in Fig. 2.18. In the simulations, the frequency of the sinusoidal wave is ten times the
VR switching frequency. As can be seen from Fig. 2.18, the proposed Vref control method works
well under changing load currents.
Table 2.1: Technology and Architecture Parameters
Technology Parameters
  Technology node: 22 nm
  Frequency: 4.0 GHz
  TDP: 150 W
  Area: 441 mm^2
  Vdd: 1.03 V
Architecture Parameters
  Number of cores: 8
  Issue width: 8
  Register files: 64 architected FRF, 32 architected IRF
  L1-I cache: 32 KB, 8-way, 64 B, LRU, 1-cycle hit
  L1-D cache: 64 KB, 8-way, 64 B, LRU, 1-cycle hit
  L2 cache: 512 KB, 8-way, 128 B, LRU, 11-cycle hit
  L3 cache: 64 MB, 8-way, 128 B, LRU, 30-cycle hit
2.6.4 Practical Concerns
Considering the practical implementations of the Vref control method, there are parasitic
impedances between the generated reference voltage and the corresponding error amplifier introduced
by the distribution wires. The impedance of the distribution wires among different VRs can
be different. Also, there can be VR component and control loop mismatches. Considering these
effects, simulations are performed by introducing wire resistances and capacitances as well as VR
component and loop delay mismatches to verify the effectiveness of the proposed method. A 1mm
distribution wire is assumed in the simulations. Based on the IBM 130nm process, the parasitic resistance
and capacitance are, respectively, around 70Ω and 230fF. A 10% mismatch is introduced
among the VRs regarding distribution wire impedance, L, C, RDCR, RESR, and Q1, Q2 size. A 5ns control
loop delay difference is introduced among the phases. The simulation results for three phases
are shown in Fig. 2.19. As can be seen from the simulation results, the proposed method remains
effective under these mismatches.
Figure 2.20: Chiplet simplified floorplan. (a) Core. (b) Chip.
2.7 Case Study: IBM POWER8 Like Microprocessor
2.7.1 Benchmarks
All the benchmarks used in the simulations are from SPLASH2x [99]. The selected benchmarks
represent typical application domains and features. Eight threads are involved in the
simulations and analysis is limited to the region-of-interest of the benchmarks.
2.7.2 Architecture
An IBM POWER8-like [64] processor is modeled to quantitatively characterize unbalanced
current sharing effects. The technology and architecture parameters of the processor are summarized
in Table 2.1. The schematic of a core is shown in Fig. 2.20a, which contains a private L2,
an instruction scheduling unit (ISU), an execution unit (EXU), a load store unit (LSU), and an
instruction fetch unit (IFU). L1 data cache is a part of LSU, while L1 instruction cache resides
Figure 2.21: Calibrated efficiency curve for the on-chip voltage regulator.
inside IFU. Fig. 2.20b illustrates the whole chip floor plan, which contains 8 cores, 96 identical
on-chip regulators, shown as little squares, network-on-chip (NOC), and memory controller (MC).
2.7.3 Simulation Framework
Dynamic power traces are collected by integrating MR2 [100] version of McPAT [101] into
SNIPER6.0 [102] micro-architectural simulator. Then, we calculate the static power of each unit
based on its temperature and area. We use the equation from [103] to capture temperature-
dependence of static power. The static power of the whole chip is calibrated in a way that it
takes less than 30% of the total chip power at 80◦C. Hotspot6.0 [104] is used to find the transient
temperature across the chip. Transient temperature (output of Hotspot) is used to calculate the
static power (input to Hotspot), so we iteratively run Hotspot and update the static power numbers
until they converge. Default parameters of Hotspot are used. VoltSpot is deployed to capture
the current distribution among VRs at different locations and the method from [82] is followed to
generate cycle-accurate power traces. One sample contains 2K cycles and 200 samples are obtained
Figure 2.22: Power saving and regulator power loss saving with balanced current sharing for
different applications.
with equal distance for each application. The first 1K cycles are used for warm-up and the rest for
analysis. 4 clock cycles are used as the power trace sampling interval.
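The iteration between temperature and static power described in this section is a fixed-point loop, which can be sketched as follows. The thermal and leakage models here are toy stand-ins with made-up coefficients; they are not the Hotspot model or the equation from [103], and the function names are hypothetical.

```python
import math
from typing import Callable, Tuple


def converge_static_power(p_dynamic: float,
                          thermal_model: Callable[[float], float],
                          leakage_model: Callable[[float], float],
                          tol: float = 1e-6,
                          max_iter: int = 100) -> Tuple[float, float]:
    """Alternate the thermal and leakage models until static power settles."""
    p_static = 0.0
    for _ in range(max_iter):
        temperature = thermal_model(p_dynamic + p_static)  # thermal solver step
        p_new = leakage_model(temperature)                 # leakage update step
        if abs(p_new - p_static) < tol:
            return p_new, temperature
        p_static = p_new
    raise RuntimeError("static power did not converge")


def toy_thermal(p_total: float) -> float:
    """Toy model: temperature in degrees C rises linearly with total power."""
    return 45.0 + 0.3 * p_total


def toy_leakage(temperature: float) -> float:
    """Toy model: static power in W grows exponentially with temperature."""
    return 20.0 * math.exp(0.01 * (temperature - 80.0))


p_static, temp = converge_static_power(100.0, toy_thermal, toy_leakage)
```

With these toy coefficients the loop contracts quickly; a steeper leakage model could fail to converge, in which case the sketch raises an error instead of returning a stale value.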
2.7.4 Power Grid and Voltage Regulator Properties
In VoltSpot configurations, the on-chip power grid is designed as a resistive mesh using
similar metal width, pitch, and thickness parameters in [84] for the global, intermediate, and local
PDN layers. The unit power grid resistance is around 8mΩ and the total power grid size is 345 by
345. The effective resistance between any two nodes can be estimated using the equations in [30,
83].
LDOs used in the IBM POWER8 microprocessor and FIVRs used in the Intel Haswell microprocessor
are two state-of-the-art on-chip power delivery solutions. It is demonstrated in [78] that
the FIVR-based power delivery scheme is more advantageous with a large number of cores due to high
efficiency over a wide conversion ratio. The growing momentum and benefits of distributed on-chip
voltage regulation together with the advantages of FIVR motivate us to investigate distributed
buck regulators in the simulation setups.
96 identical on-chip VRs, each with an area of 0.04 mm^2, are distributed across the chip in the
simulations, as shown in Fig. 2.20b. The optimal placement of LDOs is first investigated
in [105] to meet the IR-drop constraint. To avoid any adversely biased analysis in our simulations,
we mimic the algorithm proposed in [106] where a voltage-noise-minimizing technique is proposed
to determine the locations of the C4 pads across several benchmarks. We use this algorithm to
determine the optimal locations of the on-chip VRs that would minimize the voltage-noise. Since
the resulting maximum voltage noise only decreases by less than 0.4% with the optimal placement
as compared to the uniform distribution, we adopt the uniform placement of the VRs to simplify
the analysis. These on-chip regulators are calibrated to match the conversion efficiency of FIVR
design in Intel’s Haswell processor [62] as it is one of the most efficient regulators in industry.
Efficiency curves in [62] are picked for calibration and each VR provides around 1A load current
with the optimum efficiency of about 90%. The calibrated efficiency curve is shown in Fig. 2.21.
The on-chip VR is modeled as an ideal supply voltage in series with an RLC network in VoltSpot [82]
simulations. Simpler RL and RC based models have previously been used, respectively, in [2, 107]
and in [108] to model VRs. The proposed adaptive Vref control method can be applied to balance
the current sharing.
Simulation results showing the power saving and regulator power loss saving with balanced
current sharing for different applications are shown in Fig. 2.22. Power saving up to 1W and
VR power loss saving up to 8% are observed. Note that balancing the current may lead to extra
power losses on the power-grid resistors. The total gained power saving is due to the fact that
the power saving induced by balanced current sharing can be much larger than the extra power
loss consumed on the power grid resistors. For a general case of N distributed VRs supplying a total load
current NIo_opt with any CSRi (i = 1, ..., N) for the ith VR, as CSRi moves further from the
balanced current sharing point, balanced current sharing may introduce more loss on the power-grid
parasitic resistors; however, the power saving induced by balanced current sharing also increases, as can
be seen from Fig. 2.6 and (2.9). With a large number of VRs deployed, distributed load currents
are supplied by adjacent VRs, which effectively reduces the distance VR output currents travel to
balance others. Furthermore, the effective resistance between two nodes on the power grid does not
increase linearly with distance [30, 83]. Even over a quite large distance, the effective resistance can be
only a few times the unit power-grid resistance. All these factors contribute to the power savings
seen from Fig. 2.22. More importantly, with balanced current sharing, VR malfunctions can be
avoided and stability and reliability are enhanced.
2.8 Conclusion
Efficiency, stability, and reliability implications of unbalanced current sharing among distributed
on-chip voltage regulators are investigated in this chapter both theoretically and through
extensive simulations. A current balancing scheme that can be applied to most regulator types
is proposed in this work. A simple relationship between the individual voltage regulator output
current and its corresponding Vref is identified for balanced current sharing, and an adaptive Vref
control method based on this relationship is proposed. The proposed method generates and modulates
the Vref for each regulator to balance the output current. The implementation of the method
is analyzed and simulations are presented to verify the effectiveness. Regulator power loss saving
up to 8%, enhanced system stability, and several years of MTTF improvement are verified through
practical case studies.
CHAPTER 3:
AGING EFFECTS ON THE TRANSIENT PERFORMANCE DEGRADATION OF
ON-CHIP VOLTAGE REGULATORS
3.1 Introduction
With ubiquitous applications of on-chip voltage regulation [2] within modern microprocessors,
Internet of Things (IoT), wireless energy harvesting, and applications such as aerospace
engineering, the reliable operation and lifetime of on-chip voltage regulators have become one of the
most significant and challenging design considerations.¹ Within those applications, large variations
in the load current, voltage, and temperature can occur. These variations may speed up the aging
process of the devices under stress and further deteriorate the performance and lifetime of on-chip
voltage regulators. As those regulators are already deployed in the field, replacing them can
be costly or even impossible. The conflicting needs of harsh environment applications and highly
reliable designs necessitate reliability evaluations at the design stage as well as reliability enhancement
techniques.
¹ This chapter was published in IEEE Design, Automation and Test in Europe Conference and Exhibition, Dresden,
Germany, 2018, pp. 803-808 "Mitigation of NBTI Induced Performance Degradation in On-Chip Digital
LDOs" and in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Early Access, DOI (identifier)
10.1109/TVLSI.2018.2871381 "Exploiting Algorithmic Noise Tolerance for Scalable On-Chip Voltage Regulation".
Permissions are included in Appendix A.
The major transistor aging mechanisms include bias temperature instability (BTI), hot
carrier injection (HCI), time dependent dielectric breakdown (TDDB), and electromigration (EM),
among which BTI is the dominant reliability concern for nanometer integrated circuits design [2-4].
BTI can induce threshold voltage increase and consequent circuit level performance degradation.
Positive BTI (PBTI) induces aging of nMOS transistors while negative BTI (NBTI) causes aging of
pMOS transistors [109]. The impact of BTI aging mechanism is a strong function of temperature,
electrical stress, and time.
On the other hand, as an essential part of large scale integrated circuits, on-chip voltage
regulators need to be active most of the time to provide the required power to the load circuit.
The load current and temperature can vary widely, especially for microprocessor applications [31]. All
of these variations partially contribute to different aging mechanisms of on-chip voltage regulators,
which should be considered to avoid overdesign for a targeted lifetime.
Several studies have been performed regarding the reliability issues in nanometer CMOS
designs [6-8]. There is, however, only a limited amount of work on the reliability of on-chip voltage
regulators. The effect of device aging on the electro-magnetic interference (EMI) immunity level of
low-dropout regulators (LDOs) is characterized in [110]. A method of distributing the aging stress by
rotating the phase to shed at light load is proposed in [111] to enhance the light load efficiency for
multiphase buck converters. The reliability of metal wires connected to on-chip voltage regulators is
investigated in [6]. Nonetheless, quantitative analysis of aging effects on on-chip voltage regulators
considering load current characteristics and temperature variations, as well as efficient reliability
enhancement techniques under arbitrary load conditions, have not yet been investigated.
As compared to other voltage regulator types, the emerging digital LDO (DLDO) has gained
momentum due to its design simplicity, ease of integration, high power density, and fast response
[63, 112]. DLDOs have demonstrated major advantages in modern processors including the recent
IBM POWER8 processor [64]. More importantly, as compared to the analog LDOs, DLDO can
provide certain advantages for low-power and low-voltage IoT applications due to its capability for
low supply voltage operations [113]. However, as pMOS is used as the power transistor for DLDOs,
NBTI induced degradations largely affect important performance metrics such as the maximum
output current capability Imax, load response time TR, and magnitude of the droop ∆V as defined
in [73]. It is therefore imperative to investigate aging mitigation techniques for DLDOs to achieve
reliable operation of critical systems.
The main contributions of this chapter are threefold. First, NBTI induced threshold voltage
Vth degradations are theoretically shown to deteriorate DLDO performance metrics
Figure 3.1: Schematic of conventional DLDO.
including Imax, TR, and ∆V , making NBTI-aware DLDO designs necessary. Second, a novel uni-
directional shift register (uDSR) is proposed to mitigate the NBTI induced DLDO performance
degradation under arbitrary load conditions without degrading the performance. Third, possible
mitigation strategies of DLDO performance degradation using the proposed technique are evaluated
and reliability-aware design considerations are explored within practical applications.
The rest of this chapter is organized as follows. Background information regarding the conventional
DLDO regulator and NBTI is introduced in Section 3.2. NBTI induced DLDO performance
degradation including Imax, TR, and ∆V is demonstrated theoretically in Section 3.3. The proposed
uDSR based NBTI-aware DLDO is described in Section 3.4. Evaluation of the benefits of the
proposed NBTI-aware DLDO through simulation of an IBM POWER8 like processor is provided in
Section 3.5. Concluding remarks are offered in Section 3.6.
3.2 Background
3.2.1 Conventional DLDO Regulator
The schematic of a conventional DLDO [63] is shown in Fig. 3.1. A DLDO is composed
of N parallel pMOS transistors Mi (i = 1, ..., N) connected between the input voltage Vin and
Figure 3.2: Digital controller for conventional DLDO. (a) Bi-directional shift register. (b) Operation
of bi-directional shift register.
[Panel (b) steps: (1) Initialize: all Mi turned off (all Qi = 1). (2) Step k. (3-a) Step k+1 if Vcmp = H: shift right. (3-b) Step k+1 if Vcmp = L: shift left.]
output voltage Vout, and a feedback control loop implemented with a clocked comparator and a digital
controller. The value of Vout and the reference voltage Vref are compared through the comparator at
the rising edge of the clock signal clk. More (fewer) Mi are turned on through the digital
controller output signals Qi (i = 1, ..., N) if Vout < Vref , Vcmp = H (Vout > Vref , Vcmp = L). A
bi-directional shift register (bDSR), as shown in Fig. 3.2a, is conventionally implemented for the
digital controller to turn on (off) power transistors M1 to Mm (Mm+1 to MN) with the value of m
decided by the load current Iout. At a certain step k+1, Mm+1 (Mm) is turned on (off) if Vcmp = H
(Vcmp = L) and the bDSR shifts right (left) as demonstrated in Fig. 3.2b.
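The bDSR behavior can be modeled with a list of the Qi bits. This is a behavioral sketch only: the function name bdsr_step is hypothetical, and it assumes active-low gate signals (Qi = 0 turns the pMOS Mi on), which is inferred from the all-ones initialization labeled "all Mi turned off" in Fig. 3.2b.

```python
from typing import List


def bdsr_step(q: List[int], v_cmp_high: bool) -> List[int]:
    """Advance the bi-directional shift register by one clock edge.

    q[i] == 0 means power transistor M(i+1) is on (pMOS gate driven low);
    q[i] == 1 means it is off.
    """
    if v_cmp_high:
        # Vout < Vref: shift right, a 0 enters from the left, one more Mi turns on.
        return [0] + q[:-1]
    # Vout > Vref: shift left, a 1 enters from the right, one Mi turns off.
    return q[1:] + [1]


state = [0, 0, 0, 1, 1, 1, 1, 1]          # M1 to M3 on (m = 3), M4 to M8 off
more_current = bdsr_step(state, True)     # M4 turns on
less_current = bdsr_step(state, False)    # M3 turns off
```

Because the zeros always grow from the left, M1 is the first transistor to turn on and the last to turn off, which is the uneven usage pattern that makes M1 to Mm age fastest.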
A DLDO needs to be able to supply the maximum possible load current Imax. It is, however,
demonstrated that, within most practical applications, including but not limited to smart phone
[111] and chip multiprocessors [114], less than the average power is consumed most of the time. The
application environment of DLDO together with the conventional activation scheme of Mi leads to
the heavy use of M1 to Mm and less or even no use of Mm+1 to MN . This scheme can therefore
introduce serious degradation to M1 to Mm due to NBTI. The subsequent DLDO performance
deteriorations are discussed in Sections 3.2.2 and 3.3.
3.2.2 Negative Bias Temperature Instability
NBTI can introduce significant Vth degradations to pMOS transistors due to negatively
applied gate to source voltage Vgs. The increase in |Vth| due to NBTI is considered to be related
to the generation of interface traps at the Si/SiO2 interface when there is a gate voltage [115].
|Vth| increases when electrical stress is applied and partially recovers when stress is removed. This
process is commonly explained using a reaction-diffusion (R-D) model [115]. The Vth degradation
can be estimated during each stress and recovery phase using a cycle-to-cycle model and can also be
evaluated using a long-term reliability model [109, 116, 117]. As the long-term reliability evaluation
is the focus of this work, the analytical model for long-term worst case threshold voltage degradation
∆Vth estimation in [109] is adopted in this work as
$$\Delta V_{th} = K_{lt}\sqrt{C_{ox}\,(|V_{gs}| - |V_{th}|)}\; e^{-E_a/(kT)}\,(\alpha t)^{1/6} \tag{3.1}$$
where Cox, k, T , α, and t are, respectively, the oxide capacitance, Boltzmann constant, temperature,
the fraction of time (activity factor) when the device is under stress, and operation time. Klt and
Ea are the fitting parameters to match the model with the experimental data [109]. Note that NBTI
recovery phase is already included in the model.
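Equation (3.1) translates directly into a small function. The sketch below assumes Ea is expressed in eV (so the Boltzmann constant is taken in eV/K); the function name and all numeric parameter values are hypothetical placeholders, not the fitted values from [109].

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def delta_vth(k_lt: float, c_ox: float, v_gs: float, v_th: float,
              e_a: float, temperature: float, alpha: float, t: float) -> float:
    """Long-term NBTI threshold voltage shift following the form of (3.1).

    k_lt, e_a: fitting parameters (e_a assumed to be in eV).
    temperature: absolute temperature in K.
    alpha: fraction of time under stress; t: operation time.
    """
    overdrive = abs(v_gs) - abs(v_th)
    return (k_lt * math.sqrt(c_ox * overdrive)
            * math.exp(-e_a / (BOLTZMANN_EV_PER_K * temperature))
            * (alpha * t) ** (1.0 / 6.0))


# Placeholder parameters, used only to show the scaling behavior.
params = dict(k_lt=1.0, c_ox=1.0, v_gs=-0.9, v_th=-0.3, e_a=0.1, alpha=1.0)
time_ratio = delta_vth(temperature=300.0, t=64.0, **params) / \
    delta_vth(temperature=300.0, t=1.0, **params)
hot = delta_vth(temperature=398.0, t=1.0, **params)
cold = delta_vth(temperature=300.0, t=1.0, **params)
```

The one-sixth power law means that 64 times longer stress only doubles the shift, while the exponential term makes the shift grow with temperature.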
Figure 3.3: Percentage IpMOS degradation of conventional bDSR based DLDO.
[Axes: percentage IpMOS degradation (%) versus t (years); curves for 27°C (300 K), 75°C (348 K), and 125°C (398 K).]
3.3 NBTI Induced Performance Degradation
Imax, TR, and ∆V are among the most important design parameters for DLDOs. The effect
of NBTI induced degradations on these important performance metrics is examined in this section.
3.3.1 Maximum Current Supply Capability
Without NBTI induced degradations, Imax=NIpMOS , where IpMOS is the maximum output
current of a single pMOS stage. For DLDO, |Vgs| in (3.1) is equal to Vin when Mi is active. The
pMOS transistor Mi operates in linear region when turned on and the on-resistance Ron of a single
pMOS stage can be approximated as [109]
$$R_{on} \approx \left[(W/L)\,\mu_p C_{ox}\,(V_{in} - |V_{th}|)\right]^{-1} \tag{3.2}$$
where W , L, µp, and Cox are, respectively, the width, length, mobility, and oxide capacitance of
Mi. IpMOS can thus be expressed as
$$I_{pMOS} = \frac{V_{sd}}{R_{on}} = (V_{in} - V_{out})(W/L)\,\mu_p C_{ox}\,(V_{in} - |V_{th}|) \tag{3.3}$$
where Vsd is the source drain voltage of Mi. NBTI induced degradation factor DFi for Mi can be
defined as
$$DF_i = \frac{I^{deg}_{pMOS_i}}{I_{pMOS}} = \frac{V_{in} - |V_{th}| - \Delta V_{th_i}}{V_{in} - |V_{th}|} \tag{3.4}$$
where ∆Vthi and $I^{deg}_{pMOS_i}$ are, respectively, NBTI induced Vth degradation and the degraded IpMOS
for Mi. Degraded Imax can be expressed as
$$I^{deg}_{max} = I_{pMOS}\sum_{i=1}^{N} DF_i. \tag{3.5}$$
As an example, the percentage IpMOS degradation 1−DFi for smaller value of i, considering
Mi is active most of the time, is shown in Fig. 3.3 as a function of time under different temperatures.
A 32 nm metal gate, high-k strained-Si CMOS technology from PTM model library [117] is utilized.
A nominal supply voltage Vin = 0.9 V is used. PTM is adopted for simulation as it is widely used
for BTI study due to the availability of fitting parameter values in the ∆Vth degradation model [109,
118], [6-8]. As shown in Fig. 3.3, NBTI can induce significant IpMOS degradations, especially at
high temperatures. Degraded IpMOS can further lead to reduced Imax and lower output voltage
regulation capability under high load current. Moreover, as discussed in Sections 3.3.2 and 3.3.3,
degraded IpMOS also exacerbates TR and ∆V , necessitating reliability enhancement techniques.
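Equations (3.4) and (3.5) can be evaluated numerically with a short Python sketch. The threshold voltage, per-stage current, number of stages, and the linear ∆Vth profile used below are hypothetical values chosen only for illustration; they are not the PTM-derived values behind Fig. 3.3.

```python
def degradation_factor(v_in, v_th, delta_v_th):
    """DF_i from (3.4): ratio of degraded to fresh single-stage current."""
    overdrive = v_in - abs(v_th)
    return (overdrive - delta_v_th) / overdrive


def degraded_i_max(i_pmos, delta_v_ths, v_in, v_th):
    """Degraded I_max from (3.5): I_pMOS times the sum of all DF_i."""
    return i_pmos * sum(degradation_factor(v_in, v_th, d) for d in delta_v_ths)


V_IN = 0.9       # V, nominal supply quoted for Fig. 3.3
V_TH = -0.4      # V, hypothetical pMOS threshold voltage (assumption)
I_PMOS = 2e-3    # A, illustrative single-stage current (assumption)
N = 256          # illustrative number of parallel stages (assumption)

# Hypothetical stress profile: M_1 is used the most (80 mV shift),
# M_N is never used (no shift), with a linear fall-off in between.
shifts = [0.08 * (1 - i / (N - 1)) for i in range(N)]

i_max_fresh = N * I_PMOS
i_max_deg = degraded_i_max(I_PMOS, shifts, V_IN, V_TH)
print(f"fresh I_max    = {i_max_fresh * 1e3:.1f} mA")
print(f"degraded I_max = {i_max_deg * 1e3:.1f} mA")
```

Because each DFi is below one whenever ∆Vthi is positive, the degraded sum is always smaller than N·IpMOS, which is the loss of current supply capability described above.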
3.3.2 Load Response Time
Load response time TR measures how fast the feedback loop responds to a step load. TR
can be estimated as [119]
$$T_R = RC \ln\left(1 + \frac{\Delta i_{load}}{I_{pMOS} f_{clk} RC}\right) \tag{3.6}$$
where R, C, fclk, and ∆iload are, respectively, the average DLDO output resistance before and after
∆iload, load capacitance, clock frequency, and amplitude of the load change. Considering NBTI
effect, degraded TR can be expressed as
$$T_R^{deg} = RC \ln\left(1 + \frac{\Delta i_{load}}{DF\, I_{pMOS} f_{clk} RC}\right). \tag{3.7}$$
As 0 < DF < 1, it follows that $T_R < T_R^{deg}$, so NBTI induced degradation slows down DLDO response.
3.3.3 Magnitude of the Droop
Magnitude of the droop ∆V reflects the Vout noise profile under transient response and can
be estimated as [119]
$$\Delta V = R \Delta i_{load} - I_{pMOS} f_{clk} R^2 C \ln\left(1 + \frac{\Delta i_{load}}{I_{pMOS} f_{clk} RC}\right). \tag{3.8}$$
Considering NBTI effect, degraded ∆V can be expressed as
$$\Delta V_{deg} = R \Delta i_{load} - DF\, I_{pMOS} f_{clk} R^2 C \ln\left(1 + \frac{\Delta i_{load}}{DF\, I_{pMOS} f_{clk} RC}\right). \tag{3.9}$$
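Let $A = \Delta i_{load} / (I_{pMOS} f_{clk} RC)$, A > 0. Under 0 < DF < 1, the following holds
$$1 + A > \left(1 + \frac{A}{DF}\right)^{DF} \tag{3.10}$$
thus
$$I_{pMOS} f_{clk} R^2 C \ln\left(1 + \frac{\Delta i_{load}}{I_{pMOS} f_{clk} RC}\right) > DF\, I_{pMOS} f_{clk} R^2 C \ln\left(1 + \frac{\Delta i_{load}}{DF\, I_{pMOS} f_{clk} RC}\right) \tag{3.11}$$
and ∆V < ∆Vdeg, which means NBTI can degrade the transient voltage noise profile.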
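The effect of DF on (3.6) through (3.9) can be checked numerically. The following is a minimal Python sketch; the output resistance R and the load step ∆iload are hypothetical values chosen for illustration, and DF = 0.838 corresponds to a 16.2% IpMOS degradation taken as an example.

```python
import math


def response_time(r, c, f_clk, i_pmos, d_i_load, df=1.0):
    """T_R from (3.6); df < 1 gives the degraded value of (3.7)."""
    k = df * i_pmos * f_clk * r * c
    return r * c * math.log(1.0 + d_i_load / k)


def droop(r, c, f_clk, i_pmos, d_i_load, df=1.0):
    """Delta V from (3.8); df < 1 gives the degraded value of (3.9)."""
    k = df * i_pmos * f_clk * r * c
    return r * d_i_load - k * r * math.log(1.0 + d_i_load / k)


R = 1.0          # ohm, hypothetical average output resistance (assumption)
C = 15e-9        # F, illustrative load capacitance
F_CLK = 10e6     # Hz, illustrative clock frequency
I_PMOS = 2e-3    # A, illustrative single-stage current
D_I_LOAD = 50e-3 # A, hypothetical load step (assumption)
DF = 0.838       # example degradation factor (16.2% I_pMOS degradation)

t_r_fresh = response_time(R, C, F_CLK, I_PMOS, D_I_LOAD)
t_r_aged = response_time(R, C, F_CLK, I_PMOS, D_I_LOAD, DF)
dv_fresh = droop(R, C, F_CLK, I_PMOS, D_I_LOAD)
dv_aged = droop(R, C, F_CLK, I_PMOS, D_I_LOAD, DF)

print(f"T_R increase:     {100 * (t_r_aged / t_r_fresh - 1):.2f} %")
print(f"Delta V increase: {100 * (dv_aged / dv_fresh - 1):.2f} %")
```

Both printed increases are positive for any 0 < DF < 1, in line with (3.7) and (3.11). Their magnitudes depend on the assumed R and ∆iload and are not expected to reproduce the values reported later in the chapter.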
Furthermore, it is worth noting that, as seen from (3.5), (3.6), and (3.8), NBTI induced
DLDO performance degradations are mainly due to the degradation of the power transistors Mi
(i = 1, ..., N) rather than the control loop. Thus, mitigation of power transistor degradations should
be taken as a priority.
Figure 3.4: Proposed uni-directional shift register for NBTI-aware DLDO.
Power transistor Mis with smaller values of i are more heavily used than those with larger
values of i for conventional bDSR based DLDO. As studied in [114], load current variation per
processor clock cycle can be small most of the time. It is thus reasonable to assume that the newly
activated/deactivated power stages have a similar level of IpMOS degradation. As below-average
power is consumed most of the time, conventional bDSR based DLDOs experience worst case TR and ∆V
degradations since the worst degraded Mis are utilized most of the time.
3.4 NBTI-Aware DLDO Voltage Regulator
To mitigate NBTI induced DLDO performance degradations, distributing the electrical
stress among all available power transistors as evenly as possible under arbitrary load current con-
ditions is essential. Reliability is not considered in conventional bDSR based DLDO designs, and
therefore too much stress is exerted on a small portion of Mis. A novel uDSR is thus proposed in
this work to evenly distribute the electrical stress among all of the Mis to realize an NBTI-aware
DLDO voltage regulator and enhance reliability.
The schematic and operation of the proposed uDSR are shown, respectively, in Figs. 3.4
and 3.5. The elementary D flip-flop (DFF) and multiplexer within bDSR, as shown in Fig. 3.2a, are
replaced with T flip-flop (TFF) and simple logic gates within the proposed uDSR, respectively. The
rest of the DLDO including parallel power transistors and clocked comparator remains unchanged.
[Figure 3.5 diagram: states of Q1 to QN for (1) initialization with all Mi turned off, (2) step k, (3-a) step k+1 if Vcmp = H: shift right, and (3-b) step k+1 if Vcmp = L: shift right.]
Figure 3.5: Operation of the proposed uni-directional shift register.
The idea is to balance the utilization of each available Mi under all load current conditions. To
achieve this objective, control signals Qi−1 and Qi for two adjacent power transistors Mi−1 and Mi,
respectively, are XORed to determine if Mi−1 and Mi are at the boundary of active and inactive
power transistor portions. Normally, there are two such boundaries if at least one power transistor
is active, as shown in Fig. 3.5. Qi−1 and the output of the comparator Vcmp are thus XORed to decide
which power transistor at the boundaries needs to be turned on/off at the rising edge of the clock
signal. The inactive (active) power transistor at the right (left) boundary is turned on (off) if Vcmp
is logic high (low). A uni-directional shift register is realized through this activation/deactivation
scheme, as demonstrated in Fig. 3.5. Qi−1 for the first stage is QN from the last stage and
thus a loop is formed. Considering the initialization step when all Mis are off and the full load
current condition when all Mis are on, additional control signals are inserted as Tb and Tc in the
first stage, to avoid inaction under these two situations, where $T_b = Q_1 \cdot Q_2 \cdots Q_N \cdot V_{cmp}$ and
$T_c = \overline{Q_1 + Q_2 + \cdots + Q_N + V_{cmp}}$. The logic functions for Tb and Tc can be implemented with
n-input AND/NOR gates [120]. Considering the similar area of DFF and TFF, the proposed uDSR
only induces ∼ 3.8% area overhead per control stage compared to bDSR. The total area overhead
is thus ∼ 2.6% of a single DLDO area designed with µA current supply capability [63]. As few
extra transistors are added per control stage and the bDSR only consumes a few µW of power [63],
the uDSR induced power overhead is also negligible. With larger IpMOS for higher load current
rating, both the area and power overhead can be significantly less.
Table 3.1: Load Characteristics of Different Functional Blocks within One Core of an IBM POWER8
Like Microprocessor Chip under All Experimented Benchmarks
                 IFU     LSU      ISU     EXU     L2
Min Iload (A)    0.091   0.172    0.125   0.251   0.178
Max Iload (A)    3.245   12.092   1.356   5.056   2.195
Avg Iload (A)    1.138   0.908    0.201   1.294   1.719
Under transient load current conditions, if Vout < Vref (Vout > Vref ) due to increased (de-
creased) load current, inactive (active) power transistors at the right (left) boundary are gradually
turned on (off) to supply the required output current and regulate Vout. Under steady state con-
ditions, the number of active power transistors changes dynamically due to limit cycle oscillations
as shown in [63] in order to supply the required current. Newly activated (deactivated) power
transistors always occur at the right (left) boundary, leading to the right shift of the active power
transistors all the time. Thus, regardless of the load current conditions, electrical stress can always
be evenly distributed among all of the available power transistors. Furthermore, as compared to
conventional bDSR based DLDO, the number of activated/deactivated power transistors per clock
cycle remains the same and thus the DLDO performance is not negatively affected.
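The stress-balancing behavior described above can be illustrated with a behavioral Python sketch. This is an abstraction rather than the gate-level uDSR: the comparator is modeled as "high when fewer stages are active than the load requires", and the `targets` sequence, the 64-stage array, and all function names are assumptions introduced for illustration.

```python
import random


def simulate(n, targets, unidirectional):
    """Count, per power transistor, the clock cycles spent turned on.

    targets[k] is the number of active stages the load needs at cycle k
    (a stand-in for the load current; an assumption of this sketch).
    """
    on = [False] * n
    stress = [0] * n
    active = 0   # number of transistors currently on
    head = 0     # next stage to turn on (right boundary)
    tail = 0     # next stage to turn off (left boundary, uDSR only)
    for target in targets:
        v_cmp = active < target        # comparator high: more current needed
        if v_cmp and active < n:
            on[head] = True
            head = (head + 1) % n if unidirectional else head + 1
            active += 1
        elif not v_cmp and active > 0:
            if unidirectional:
                on[tail] = False       # uDSR: release the oldest active stage
                tail = (tail + 1) % n
            else:
                head -= 1              # bDSR: release the newest active stage
                on[head] = False
            active -= 1
        for i in range(n):
            stress[i] += on[i]
    return stress


rng = random.Random(0)                 # fixed seed for repeatability
targets = []
for _ in range(400):                   # 400 load levels, 50 cycles each
    targets += [rng.randint(8, 24)] * 50

N = 64
stress_b = simulate(N, targets, unidirectional=False)
stress_u = simulate(N, targets, unidirectional=True)
print("bDSR busiest/idlest stage (cycles on):", max(stress_b), min(stress_b))
print("uDSR busiest/idlest stage (cycles on):", max(stress_u), min(stress_u))
```

In this model both controllers switch exactly one stage per clock cycle and keep the same number of stages active at every cycle, so the total on-time is identical. Only its distribution differs: the bidirectional variant concentrates it on the low-index stages, while the uni-directional variant rotates it around the ring.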
Unlike the rotating phase-shedding scheme for multiphase buck converters implemented
in [111], which only mitigates aging effects at light load conditions with only one active phase, the
proposed uDSR is effective under all load current conditions. Moreover, as the proposed uDSR is
a generalized method to determine which parallel power stage needs to be turned on/off, it can
also be tailored for reliability enhancement within multiphase buck or switched capacitor voltage
regulators with phase-shedding functionality.
3.5 Evaluation
To evaluate the benefits of the proposed uDSR based DLDO architecture in terms of reli-
ability enhancement and to provide design insights for a targeted lifetime, an IBM POWER8 like
microprocessor [64] simulation platform is constructed.
[Figure 3.6 floor plan blocks: L2, LSU, EXU, IFU, and ISU.]
Figure 3.6: A schematic diagram demonstrating the floor plan of one core within IBM POWER8
like microprocessor chip.
3.5.1 Simulation Framework
3.5.1.1 IBM POWER8 Like Microprocessor
The IBM POWER8 microprocessor [64] is one of the state-of-the-art server-class processors
and is thus representative for evaluation of the proposed NBTI-aware DLDO scheme. The same
corresponding technology and architecture parameters listed in Table 2.1 are considered. The IBM
POWER8 like microprocessor, as shown in Fig. 3.6, includes a load store unit (LSU), an execution
unit (EXU), an instruction fetch unit (IFU), an instruction scheduling unit (ISU), an L1 data
cache inside LSU, an L1 instruction cache inside IFU, and a private L2. All benchmarks are from
SPLASH2x [121] and cover a wide range of representative application domains. Analysis is restricted
to the region-of-interest of the benchmarks and eight threads are involved in the simulations. The
load characteristics of different functional blocks, as shown in Fig. 3.6, under all experimented
benchmarks are summarized in Table 3.1.
Table 3.2: Conventional DLDO Performance Degradation for Different Functional Blocks under All
Experimented Benchmarks for a Five-Year Time Frame
                       IFU    LSU    ISU    EXU    L2
% IpMOS degradation    16.2   21.4   15.3   16.6   15.1
% TR degradation       9.4    12.9   8.9    9.7    8.7
% ∆V degradation       6.4    8.7    6.1    6.6    6
3.5.1.2 DLDO Design Specifications
Distributed micro-regulators are implemented in IBM POWER8 microprocessor [122]. In
this simulation example, a switch array of 256 pMOS transistors, which is typical in DLDO designs
[63], is implemented in each micro-regulator. Two different DLDO designs with bDSR and uDSR
controls are implemented using 32 nm PTM CMOS technology where Vin = 1.1 V and Vout = 1 V
as in [122]. IpMOS = 2 mA and Imax = 512 mA are used in the simulations, leading to 7, 24, 3, 10,
and 5 micro-regulators (DLDOs) in, respectively, IFU, LSU, ISU, EXU, and L2 blocks to be able
to supply the maximum load current across all benchmarks in each block. Load current of each
block is assumed to be supplied by micro-regulators within that block, which is reasonable due to
the principle of spatial locality [30] regarding current distribution. Each micro-regulator within a
certain block is assumed to provide equal current due to the availability of current balancing scheme
implemented within IBM POWER8 microprocessor [73]. fclk = 10 MHz and C = 15 nF are used
for each DLDO to achieve smaller than 10% Vdd transient voltage noise [31] most of the time. The
total output capacitance of 735 nF is comparable to 750 nF used in [122].
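The micro-regulator counts and the total output capacitance quoted above follow directly from Table 3.1 and the stated DLDO parameters. A minimal Python sketch of that arithmetic is given below; the variable names are illustrative.

```python
import math

# Maximum load current per functional block in amperes (Table 3.1).
MAX_I_LOAD = {"IFU": 3.245, "LSU": 12.092, "ISU": 1.356, "EXU": 5.056, "L2": 2.195}

N_STAGES = 256       # pMOS transistors per micro-regulator
I_PMOS = 2e-3        # A, current of a single pMOS stage
C_PER_DLDO = 15e-9   # F, output capacitance per DLDO

i_max = N_STAGES * I_PMOS   # maximum current of one micro-regulator

# Smallest number of micro-regulators that covers the peak load of each block.
counts = {block: math.ceil(i_peak / i_max) for block, i_peak in MAX_I_LOAD.items()}
total_c = sum(counts.values()) * C_PER_DLDO

print(counts)
print(f"total output capacitance = {total_c * 1e9:.0f} nF")
```

Rounding up per block gives 49 micro-regulators in total, which is where the 735 nF figure comes from.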
3.5.1.3 Evaluation of NBTI Induced Performance Degradation
Equations (3.1), (3.3), (3.6), and (3.8) are leveraged for evaluation of NBTI induced performance
degradation. A typical temperature profile [31, 123] of 90°C, 69°C, 67°C, 63°C, and 62°C
for, respectively, LSU, EXU, IFU, ISU, and L2 is adopted for evaluations. The activity factors
for both DLDO designs under different benchmarks and functional blocks are estimated through
simulations in Cadence Virtuoso. The worst case IpMOS degradations are used for evaluations of
both designs, which is reasonable due to load characteristics of typical applications [114] and the
consequent heavy use of a portion of Mis in conventional DLDOs.
Figure 3.7: Percentage IpMOS degradation mitigation of the proposed NBTI-aware DLDO as
compared to the conventional DLDO design for different functional blocks under all experimented
benchmarks.
3.5.2 Simulation Results
3.5.2.1 Performance Degradation within Conventional DLDO
Conventional DLDO performance degradation for different functional blocks for a five-year
time frame is summarized in Table 3.2. These degradations apply to all the experimented bench-
marks as the worst case IpMOS degradation is considered. As shown in Table 3.2, NBTI can induce
serious IpMOS , TR, and ∆V degradations for all functional blocks. IpMOS degradation can lead
to the deterioration of DLDO Vout regulation capability and possible Vout drop under large load
current conditions. Larger than 10% Vout drop can lead to voltage emergencies and potential execu-
tion errors for microprocessors. Similarly, TR and ∆V degradations can, respectively, increase the
duration and frequency of voltage emergencies, which can slow down microprocessor executions as
further actions may need to be taken to remedy the errors. Moreover, for a longer targeted lifetime
of more than five years, the degradations are expected to be more disastrous as IpMOS degradations
are even worse, as seen from Fig. 3.3, which may not be tolerable for critical applications where
replacement of the devices can be costly or even impossible.
3.5.2.2 Mitigation with Proposed NBTI-Aware DLDO
Simulation results for all benchmarks are summarized in Figs. 3.7, 3.8, and 3.9 regarding,
respectively, IpMOS , TR, and ∆V degradation mitigation of the proposed NBTI-aware DLDO as
compared to the conventional DLDO design for a five-year time frame. Up to 39.6%, 43.2%, and
42% performance improvement is achieved for, respectively, IpMOS , TR, and ∆V . The highest
performance improvement is obtained for LSU with the highest operation temperature. Even at the
lowest operation temperature within L2, degradation mitigations of up to 15.1%, 16.4%, and 15.9%
are achieved for, respectively, IpMOS , TR, and ∆V .
Figure 3.8: Percentage TR degradation mitigation of the proposed NBTI-aware DLDO as compared
to the conventional DLDO design for different functional blocks under all experimented benchmarks.
Figure 3.9: Percentage ∆V degradation mitigation of the proposed NBTI-aware DLDO as compared
to the conventional DLDO design for different functional blocks under all experimented benchmarks.
3.5.2.3 Discussions
For high temperature applications and applications with a high maximum to average current
ratio, such as the LSU block, NBTI can induce greater performance degradations as summarized
in Tables 3.1 and 3.2. The benefits of the proposed NBTI-aware DLDO scheme are also more
pronounced for such applications, as shown in Figs. 3.7, 3.8, and 3.9 for the LSU portions. For
applications where the average current is close to the maximum current, such as the L2 block, the
performance degradation mitigations using the proposed NBTI-aware DLDO are less significant but
still beneficial as compared to the conventional design considering the negligible extra power and area
overhead induced by the proposed design.
DLDO performance degradations can vary under different load characteristics and temper-
ature. It is thus essential to examine these degradations in early design stage with the applied
reliability enhancement techniques. Extra design margins, such as increased number of Mi and/or
output capacitance, should be adopted adaptively according to the aging speed of different func-
tional blocks and benchmark applications instead of utilizing a uniform margin to avoid potential
overdesign.
3.6 Conclusion
The DLDO regulators can experience serious NBTI induced performance degradations in-
cluding IpMOS , TR, and ∆V . These degradations are typically overlooked in the design of DLDOs
and can deteriorate the regulation capability, response speed, and transient voltage noise profile. A
novel uni-directional shift register is proposed in this chapter to evenly distribute the electrical stress
among different power transistors to mitigate NBTI induced performance degradation with nearly
no extra power and area overhead under arbitrary load conditions. Through practical simulations
of an IBM POWER8 like microprocessor and benchmark evaluations, it is demonstrated that up to
39.6%, 43.2%, and 42% degradation mitigation can be achieved for, respectively, IpMOS , TR, and
∆V with the proposed technique. Simulation results also highlight the necessity of adaptive design
margins to avoid overdesign.
CHAPTER 4:
AGING EFFECTS ON THE STEADY STATE PERFORMANCE
DEGRADATION OF ON-CHIP VOLTAGE REGULATORS
4.1 Introduction
Distributed on-chip voltage regulation [2, 124] in fine temporal and spatial granularity en-
ables fast and timely control of the operating point.1 Thereby the operating voltage and frequency
can better match the needs of the workload to maximize energy efficiency. As a function of the
workload, throughout the execution time, different components of a processor chip exhibit different
microarchitectural activities, which translates into different demands for current to be pulled from
the respective regulators. Different components of the processor chip also show different degrees of
tolerance to errors, which may result from deviation of design parameters from their target values
due to device wear-out, voltage noise, temperature, or process variations. For example, it has been
observed that the emerging recognition, mining, and synthesis (RMS) [125] applications can tolerate
errors in the data flow, but not in control [126].
Heterogeneous distributed on-chip voltage regulation has been explored to best capture
spatio-temporal variations in current demand of different processor components, where the regula-
tor operating regimes are tailored to the activity range of the respective load (processor component).
Such tailoring can be achieved by (i) keeping the regulator design constant across chip, but making
each regulator reconfigurable; or (ii) by designing each regulator from the ground up to match
different load conditions. A promising direction which has not been explored is how such heterogeneity
can help in trading the program output quality for area overhead, by, e.g., assigning error-prone
(i.e., slower and/or less accurate) regulators to feed processor components in charge of data flow
which can tolerate errors. Control-heavy components, on the other hand, should not be permitted
to leave the error-free zone to avoid catastrophic program termination or excessive loss in program
output quality even if the program does not crash.
1This chapter includes portions of the published paper in IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, Early Access, DOI 10.1109/TVLSI.2018.2871381 "Exploiting Algorithmic Noise Tolerance
for Scalable On-Chip Voltage Regulation". Permission is included in Appendix A.
To this end, we must understand the type and impact of errors that voltage regulators can
introduce to the system, such that we can assess to what extent such regulator-induced errors can be
masked by their respective loads (i.e., data flow heavy processor components) and how regulator-
induced errors interact with load-induced potential errors in determining the final computation
accuracy. In this chapter, we will try to shed light on this question by quantifying the impact
of one of the most prevalent reliability concerns, aging, on regulator robustness, without loss of
generality.
The major transistor aging mechanisms include bias temperature instability (BTI), hot
carrier injection (HCI), time dependent dielectric breakdown (TDDB), and electromigration (EM),
among which BTI is the dominant reliability concern for nanometer integrated circuits design [109,
118, 127]. BTI can induce threshold voltage increase and consequent circuit level performance
degradation. Positive BTI (PBTI) induces aging of nMOS transistors while negative BTI (NBTI)
causes aging of pMOS transistors [109]. The impact of BTI aging mechanism is a strong function
of temperature, electrical stress, and time.
As an essential part of a processor chip, on-chip voltage regulators need to be active most
of the time to provide the required power to different components of the processor chip. The load
current and temperature can vary significantly among these components for different processor ap-
plications [31]. All of these variations partially contribute to different aging mechanisms of on-chip
voltage regulators, which should be considered to avoid over-design for a targeted lifetime. Addi-
tionally, in certain processor components that can show higher degrees of tolerance to errors, the
regulators can be intentionally under-designed to save valuable chip area and potentially
power-conversion efficiency. In other words, a heterogeneous distributed power delivery network can be
designed that consists of accurate voltage regulators, which house additional circuitry to mitigate
the aging induced supply voltage variations, and approximate voltage regulators, which are
intentionally under-designed to mitigate just enough of the aging-induced variations.
The quality of the supply voltage directly affects data path delay and signal quality, and fluctua-
tions in the supply voltage result in delay uncertainty and clock jitter. Accordingly, the supply noise
tolerance of certain processor components is investigated as an area-quality control knob where the
quality of the supply voltage can be compromised to save valuable chip area.
Several studies have been performed regarding the reliability issues in nanometer CMOS
designs [116, 128, 129]. There is, however, only a limited amount of work on the reliability of on-chip
voltage regulators. Device aging on the immunity level of electro-magnetic interference (EMI) for
low-dropout regulators (LDO) is characterized in [110]. A method of distributing the aging stress
by rotating the phase to shed at light load is proposed in [111] to enhance the light load efficiency
for multiphase buck converters. An algorithm to uniformly distribute the current provided by the
power transistor array of a digital LDO (DLDO) is proposed in [65] to reduce hot spots and ensure
reliable silicon operation. The reliability of metal wires connected to on-chip voltage regulators is
investigated in [6]. Nonetheless, quantitative analysis of aging effects on on-chip voltage regulators
considering load current characteristics and temperature variations as well as efficient reliability
enhancement techniques under arbitrary load conditions have not yet been investigated.
As compared to other voltage regulator types, the emerging digital LDO has gained impetus
due to its design simplicity, ease of integration, high power density, and fast response [63, 112,
130]. DLDOs have demonstrated major advantages in modern processors including the recent IBM
POWER8 processor [64]. More importantly, as compared to the analog LDOs, DLDO can provide
certain advantages for low-power and low-voltage IoT applications due to its capability for low
supply voltage operations [113]. However, as pMOS is used as the power transistor for DLDOs,
NBTI induced degradations largely affect important performance metrics such as the maximum
output current capability Imax, load response time TR, and magnitude of the droop ∆V as defined
in [73]. Meanwhile, the combined NBTI and PBTI induced control loop degradations can potentially
increase the mode of inherent limit cycle oscillations (LCOs) within DLDOs and adversely affect
the steady state output voltage ripple performance. It is therefore imperative to investigate aging
mitigation techniques for DLDOs to achieve reliable operation of critical components. As aging
effects on the transient performance degradation of DLDOs have been investigated in Chapter 3,
aging effects on the steady state performance degradation of DLDOs will be the focus of this chapter.
Furthermore, when a circuit component can tolerate higher degrees of errors, the DLDOs can be
designed with minimal area overhead, achieving heterogeneous power delivery. A voltage regulator
is proposed in this chapter that can be tailored at design time to the supply noise
resiliency requirement of the circuitry it powers. Since the number of voltage regulators can be as
high as several hundreds in modern processors [64], the area and number of voltage regulators can
thereby be easily scaled to satisfy the diverse needs of systems that house components with varying
degrees of noise tolerance.
[Figure 4.1 block diagram: Vref, N(A, ϕ), D(z) (delay, z^-1), S(z), and P(z) (ZOH with pMOS array and load circuit), with signals x(t) and y*(t).]
Figure 4.1: Nonlinear sampled feedback model of a conventional DLDO.
The rest of this chapter is organized as follows. The potential side effects of limit cycle
oscillation on the steady state performance of DLDO is studied in Section 4.2. Aging-aware limit
cycle oscillation mitigation technique is investigated in Section 4.3. Effectiveness of the proposed
technique is verified in Section 4.4. Trade-off between area overhead and program output quality is
illustrated in Section 4.5. Concluding remarks are offered in Section 4.6.
4.2 Limit Cycle Oscillation
In conventional DLDOs, when the shift register turns on/off the pass transistor, the output
voltage of the DLDO cannot change instantaneously due to the output pole of the DLDO. The
delay between the operation of the shift register and fluctuation of the output voltage, together
with the quantization effects of the comparator and the delay between the sampling instant and
the time of pMOS array actuation lead to the occurrence of LCO. Such behavior can be examined
by a nonlinear sampled feedback model developed in [131] to determine the possible modes and
amplitudes of LCOs.
The model consists of N(A,ϕ), P (z), S(z), and D(z) as shown in Fig. 4.1, which represent,
respectively, the describing function of the clocked comparator, transfer function of the zero-order
hold (ZOH) together with the pMOS array and load circuit, transfer function of the shift register,
and delay element between the comparator and shift register. A and ϕ stand for the LCO amplitude
and the phase shift of x(t), respectively.
N(A,ϕ), P(z), S(z), and D(z) can be expressed, respectively, as [132, 133]
$$N(A,\varphi) = \frac{2D}{MTA} \sum_{m=0}^{M-1} \sin\left(\frac{\pi}{2M} + \frac{m\pi}{M}\right) \angle\left(\frac{\pi}{2M} - \varphi\right) \tag{4.1}$$
$$P(z) = K_{OUT} \frac{1 - e^{-F_l T}}{F_l (z - e^{-F_l T})} \tag{4.2}$$
$$S(z) = \frac{z}{z - 1} \tag{4.3}$$
$$D(z) = z^{-1} \tag{4.4}$$
where KOUT = KDC IpMOS, T = 1/fclk, Fl = 1/((RL||RpMOS)C), and ϕ ∈ (0, π/M). D, Fl, KOUT,
KDC, RL, and RpMOS are, respectively, the amplitude of the comparator output, load pole, gain of
P(z), DC proportional constant, load resistance, and resistance of the power transistor array.
The mode and amplitude of LCO can be determined by the following Nyquist criterion,
$$N(A,\varphi)\, P(e^{j\omega T})\, S(e^{j\omega T})\, D(e^{j\omega T}) = 1\angle(-\pi) \tag{4.5}$$
where ω = π/(TM) is the angular LCO frequency. The phase shift ϕLCO for a steady LCO can thus
be expressed as [132]
$$\varphi_{LCO} = \frac{\pi}{2} - \frac{\pi}{2M} - \tan^{-1}\left(\frac{\pi}{M T F_l}\right). \tag{4.6}$$
ϕLCO needs to be within (0, π/M) for mode M to exist.
[Figure 4.2 schematic labels: Vin, Vout, Iload, C, M1 to MN, uni-directional shift register, Q1 to QN, Vref, clk, Vcmp, and clock period T with minimized pulse width tc.]
Figure 4.2: Schematic of the proposed aging-aware DLDO.
Transistor aging can lead to increased path delay [134]. Considering BTI induced propaga-
tion delay degradation of the clocked comparator and shift register, the delay element in Fig. 4.1
becomes
$$D'(z) = z^{-1}\, z^{-\frac{t_c^d}{T}}\, z^{-\frac{(t_s^d - t_c^d)}{T}} = z^{-1 - \frac{t_s^d}{T}} \tag{4.7}$$
where $t_c^d$ and $t_s^d$ are, respectively, the degraded propagation delay of the clocked comparator and
shift register. Note that $t_c^d$ is canceled out in D′(z) and thus the propagation delay of the clocked
comparator has negligible effects on the mode of LCO. ϕLCO then becomes
$$\varphi'_{LCO} = \frac{\pi}{2} - \frac{\pi}{2M} - \tan^{-1}\left(\frac{\pi}{M T F_l}\right) - \frac{\pi t_s^d}{MT}. \tag{4.8}$$
The negative effect of the propagation delay of the shift register on LCO can be explained as
follows. If an LCO mode Ma exists and the propagation delay of the shift register is not considered,
the phase shift ϕLCO is within (0, π/Ma). That is, $0 < \pi/2 - \pi/(2M_a) - \tan^{-1}(\pi/(M_a T F_l)) < \pi/M_a$.
For a larger LCO mode Ma + 1 to exist, the following condition needs to be satisfied
$$0 < \frac{\pi}{2} - \frac{\pi}{2(M_a + 1)} - \tan^{-1}\left(\frac{\pi}{(M_a + 1) T F_l}\right) < \frac{\pi}{M_a + 1}. \tag{4.9}$$
Typically
$$\frac{\pi}{2} - \frac{\pi}{2(M_a + 1)} - \tan^{-1}\left(\frac{\pi}{(M_a + 1) T F_l}\right) > \frac{\pi}{2} - \frac{\pi}{2M_a} - \tan^{-1}\left(\frac{\pi}{M_a T F_l}\right) \tag{4.10}$$
and if $\pi/2 - \pi/(2M_a) - \tan^{-1}(\pi/(M_a T F_l))$ is very close to π/Ma, it is likely that
$$\varphi_{LCO}|_{M=M_a+1} = \frac{\pi}{2} - \frac{\pi}{2(M_a + 1)} - \tan^{-1}\left(\frac{\pi}{(M_a + 1) T F_l}\right) > \frac{\pi}{M_a} > \frac{\pi}{M_a + 1} \tag{4.11}$$
such that LCO mode Ma + 1 cannot exist as (4.9) is violated.
However, if the propagation delay of the shift register is included, for LCO mode Ma + 1,
ϕLCO becomes
$$\varphi'_{LCO}|_{M=M_a+1} = \frac{\pi}{2} - \frac{\pi}{2(M_a + 1)} - \tan^{-1}\left(\frac{\pi}{(M_a + 1) T F_l}\right) - \frac{\pi t_s^d}{(M_a + 1) T}. \tag{4.12}$$
The contribution of the $\pi t_s^d/((M_a + 1)T)$ term may push $\varphi'_{LCO}|_{M=M_a+1}$ to be within the range of
(0, π/(Ma + 1)), making a larger LCO mode Ma + 1 possible. This demonstrates the potential
negative effect of the propagation delay of the shift register on LCO.
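The mode existence test built on (4.6) and (4.8) can be sketched in a few lines of Python. The load pole Fl and the shift register delay used below are hypothetical values, with the delay deliberately exaggerated so that the effect is visible for a 10 MHz clock; the function names are likewise illustrative.

```python
import math


def phi_lco(m, t_clk, f_l, t_ds=0.0):
    """Phase shift from (4.6); a nonzero t_ds adds the delay term of (4.8)."""
    return (math.pi / 2
            - math.pi / (2 * m)
            - math.atan(math.pi / (m * t_clk * f_l))
            - math.pi * t_ds / (m * t_clk))


def mode_exists(m, t_clk, f_l, t_ds=0.0):
    """Mode M can exist only if the phase shift lies within (0, pi/M)."""
    return 0.0 < phi_lco(m, t_clk, f_l, t_ds) < math.pi / m


def possible_modes(t_clk, f_l, t_ds=0.0, m_max=50):
    """All modes up to m_max that satisfy the existence condition."""
    return [m for m in range(1, m_max + 1) if mode_exists(m, t_clk, f_l, t_ds)]


T_CLK = 100e-9   # s, clock period for f_clk = 10 MHz
F_L = 2e7        # rad/s, hypothetical load pole (assumption)
T_DS = 5e-9      # s, hypothetical and exaggerated shift register delay

print("modes without shift register delay:", possible_modes(T_CLK, F_L))
print("modes with shift register delay:   ", possible_modes(T_CLK, F_L, T_DS))
```

With these illustrative values, mode 4 violates the upper bound in (4.9) when the delay is ignored and satisfies it once the delay term of (4.12) is included, which is the mechanism described above.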
4.3 Reduced Clock Pulse Width
Dual clock edge triggering has been employed in [133, 135] to reduce the control signal delay,
where the clocked comparator and shift register are triggered at the rising and falling edge of the
clock signal, respectively. Considering the potential side effect of the control loop delay element
D′(z) on LCO, a reduced clock pulse width tc, as shown in Fig. 4.2, is proposed to minimize the
delay element. With dual clock edge triggering implementation of the control loop, the following
condition needs to be satisfied regarding tc for proper operation of the uDSR based DLDO
$$t_c > t_c^d + t_l^d + t_t^{st} \tag{4.13}$$
where $t_l^d$ and $t_t^{st}$ are, respectively, the total propagation delay of the logic gates connected to the
first stage TFF within the uDSR and the setup time of the TFF. Aging induced degradation of $t_c^d$,
$t_l^d$, and $t_t^{st}$ needs to be considered with the targeted lifetime to decide the value of tc. The one-shot
pulse generator in [98] can be leveraged for reduced pulse width clock generation.
Within the proposed aging-aware DLDO, ϕLCO becomes
$$\varphi''_{LCO} = \frac{\pi}{2} + \frac{\pi}{2M} - \tan^{-1}\left(\frac{\pi}{M T F_l}\right) - \frac{\pi (t_s^d + t_c)}{MT}. \tag{4.14}$$
Table 4.1: TFF Setup Time, Logic Delay, and Comparator Delay Before and After a Five-Year
Aging Period
                   TFF setup time   Logic delay   Comparator delay
Fresh (ps)         170              209.6         171.5
Aged 5 yrs (ps)    180              227.4         225
[Figure 4.3 plot: maximum LCO mode, 0 to 6, versus Iload (mA) at 5, 10, 50, 100, 300, and 500 mA for the conventional dual edge DLDO and the aging-aware DLDO.]
Figure 4.3: Maximum LCO mode with simulation results superimposed for conventional and aging-
aware DLDO under different load current conditions after a five-year aging period.
Table 4.2: Maximum LCO Mode under Different Sampling Clock Frequency and Load Current
Condition for Conventional Dual Edge (CDE) and Aging-Aware (AA) DLDO
CDE/AA LCO mode    Sampling clock frequency fclk (MHz)
Iload (mA)         10     50     100     300      500
10                 4/2    8/6    11/9    20/18    27/27
100                3/2    3/2    4/3     6/6      8/8
500                3/2    3/2    3/2     3/3      4/4
4.4 LCO Mitigation with Proposed Aging-Aware DLDO
To verify the benefits of the proposed reduced clock pulse width DLDO regarding LCO mit-
igation, the theoretical maximum LCO mode for dual edge triggered and reduced clock pulse width
DLDO with uDSR implementation are respectively examined by considering BTI induced threshold
voltage degradation of the control loop. An average IBM POWER8 microprocessor temperature
profile of 70°C is utilized for Vth degradation evaluation. NBTI and PBTI are considered as the
major Vth degradation factor for pMOS and nMOS transistors in the control loop, respectively.
Under different load current conditions, the activity factor of each transistor within the control loop
is obtained through Cadence Virtuoso simulations. Equation (3.1) is then leveraged to calculate the
Vth degradation for each transistor within a five-year time frame. The calculated Vth degradation is
embedded in each transistor by adopting the subcircuit model for BTI effect in [136] within Cadence
Virtuoso simulations. The fresh and aged TFF setup time $t_t^{st}$, logic delay $t_l^d$, and comparator delay
$t_c^d$ are summarized in Table 4.1. The aged $t_t^{st}$, $t_l^d$, and $t_c^d$ are approximately load current
independent. tc = 1 ns is adopted to satisfy the timing constraint in (4.13). The maximum LCO mode for dual
edge triggered and reduced clock pulse width DLDO under different load current conditions after a
five-year aging period is illustrated in Fig. 4.3.
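The timing constraint in (4.13) can be checked against the delays in Table 4.1 with a short Python sketch. The dictionary keys and helper name are illustrative.

```python
# Delays in picoseconds, taken from Table 4.1.
FRESH_PS = {"comparator_delay": 171.5, "logic_delay": 209.6, "tff_setup": 170.0}
AGED_PS = {"comparator_delay": 225.0, "logic_delay": 227.4, "tff_setup": 180.0}

T_C_PS = 1000.0   # adopted clock pulse width t_c = 1 ns


def min_pulse_width_ps(delays):
    """Right-hand side of (4.13): comparator delay + logic delay + TFF setup."""
    return sum(delays.values())


for label, delays in (("fresh", FRESH_PS), ("aged 5 yrs", AGED_PS)):
    bound = min_pulse_width_ps(delays)
    print(f"{label}: t_c must exceed {bound:.1f} ps, "
          f"margin with t_c = 1 ns is {T_C_PS - bound:.1f} ps")
```

Sizing tc against the aged sum rather than the fresh one is what keeps (4.13) satisfied over the targeted lifetime.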
As seen from Fig. 4.3, with the reduced clock pulse width that accounts for aging imposed limitations,
the maximum LCO mode can be greatly reduced, especially at light load conditions. The simulated
steady state output voltages for both the conventional dual edge triggered DLDO and the proposed
aging-aware DLDO under 10 mA load current are shown in Fig. 4.4. The LCO mode is reduced
from 4 to 2, and the output voltage ripple amplitude is reduced by a factor of three. As the minimum
and average Iload can be much smaller than the maximum Iload shown in Table 3.1, especially for LSU,
[Figure 4.4 plot: Vout (V) versus t (s, ×10^-5); curves: conventional dual edge DLDO and aging-aware DLDO.]
Figure 4.4: Simulated output voltage ripple and LCO mode reduction with the proposed aging-
aware DLDO under 10 mA load current.
light and medium load conditions are experienced most of the time, so the proposed aging-aware
DLDO provides substantial benefits given its negligible power and area overhead.
Furthermore, in many applications the clock frequency can be much higher than 10 MHz, such
as 1 GHz in [137]. However, a 1 GHz sampling clock comes at the cost of higher quiescent current.
Recent work such as [135] and [138] utilizes a high clock frequency for fast transient response and a
much lower frequency for steady state operation. To better verify the LCO improvement of the proposed
reduced clock pulse width scheme, the maximum LCO mode under different sampling clock frequencies
and load current conditions for both the conventional dual edge and aging-aware DLDO is shown in Table
4.2. As seen from the table, the proposed reduced clock pulse width scheme reduces the maximum
LCO mode over a wide fclk range, especially under light load current conditions. For a
clock frequency of 1 GHz, there would be no room to further reduce the pulse width due to the
timing constraint. However, as discussed above, the clock frequency utilized at steady state operation
is typically much lower.
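The trend in Table 4.2 can be checked numerically. The following Python sketch transcribes the table entries and computes the maximum LCO mode reduction (CDE minus AA) at each operating point. The names TABLE_4_2 and mode_reduction are illustrative and do not appear in the original work.

```python
# Maximum LCO mode from Table 4.2, stored as (CDE, AA) pairs.
# Outer key: load current in mA. Inner key: sampling clock frequency in MHz.
TABLE_4_2 = {
    10:  {10: (4, 2), 50: (8, 6), 100: (11, 9), 300: (20, 18), 500: (27, 27)},
    100: {10: (3, 2), 50: (3, 2), 100: (4, 3),  300: (6, 6),   500: (8, 8)},
    500: {10: (3, 2), 50: (3, 2), 100: (3, 2),  300: (3, 3),   500: (4, 4)},
}


def mode_reduction(table):
    """Return {(iload_mA, fclk_MHz): CDE mode minus AA mode}."""
    return {
        (iload, fclk): cde - aa
        for iload, row in table.items()
        for fclk, (cde, aa) in row.items()
    }


reductions = mode_reduction(TABLE_4_2)
for (iload, fclk), delta in sorted(reductions.items()):
    cde, aa = TABLE_4_2[iload][fclk]
    print(f"Iload={iload:>3} mA, fclk={fclk:>3} MHz: "
          f"CDE={cde:>2}, AA={aa:>2}, reduction={delta}")
```

With the transcribed values, the reduction is 2 at 10 mA for fclk up to 300 MHz, 1 at 100 mA and 500 mA for fclk up to 100 MHz, and 0 at the remaining operating points.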
[Figure 4.5 plot: percentage area overhead (saving) (%) versus t (years); curves: OH and OH_S for IFU, LSU, ISU, EXU, and L2.]
Figure 4.5: Percentage area overhead (OH) utilizing conventional DLDO and percentage area
overhead saving (OH_S) utilizing aging-aware DLDO for ∆V degradation mitigation within each
functional unit.
[Figure 4.6 plot: percentage area overhead (saving) for LSU (%) versus t (years); curves: OH and OH_S at 25 °C, 90 °C, and 125 °C.]
Figure 4.6: Percentage area overhead utilizing conventional DLDO and percentage area overhead
saving utilizing aging-aware DLDO for ∆V degradation mitigation within LSU under different
temperature profiles.
4.5 Trade-Off between Area Overhead and Program Output Quality
Considering aging effects, regulators are typically designed and optimized for the expected
service life of the processor. Deploying regulators optimized for a shorter service life cannot
guarantee error-free operation; however, if such regulators are confined to feeding error-tolerant loads, the
[Figure 4.7 plot: % ∆V degradation mitigation for LSU at 25 °C, 90 °C, and 125 °C for each benchmark (x-axis labels as printed: fft, chol, barnes, lu_ncb, lu_cb, fmm, oc_ncp, oc_cp, radix, radio, water_s, water_n, volr, rayt) and the average.]
Figure 4.7: Percentage ∆V degradation mitigation of the proposed aging-aware DLDO as
compared to the conventional DLDO design for LSU under all experimented benchmarks and different
temperature profiles.
service life can be traded for lower hardware complexity, which almost always directly translates
into area savings. Note that area is a scarce on-chip resource for distributed voltage
regulators, as many of these regulators are squeezed between various circuit blocks. Such area savings
can enable a higher number of on-chip voltage regulators and hence enhance the scalability of on-chip
voltage regulation. To illustrate this point, the percentage area overhead within each functional unit
to achieve the same fresh ∆V performance utilizing conventional DLDO is examined in Fig. 4.5.
The relative area of the pMOS array and shift register versus the output capacitance is estimated
from the data in [138]. The estimation assumes that extra output capacitance is added to mitigate ∆V
degradation. The percentage area overhead is relative to the original DLDO area, including output
capacitance, designed in an aging-unaware fashion. The percentage area overhead saving within
each functional unit for ∆V degradation mitigation utilizing the proposed aging-aware DLDO is
also demonstrated in Fig. 4.5. As shown in Fig. 4.5, a large area overhead can be introduced
to mitigate aging induced transient voltage noise degradation for conventional DLDOs. Similar to
the trend demonstrated in Fig. 3.3, the area penalty required to compensate for the aging-related
deterioration of ∆V is significant especially in the first two years. The percentage area overhead
also plateaus to within 10% after two years. These trends need to be considered to realize optimal
[Figure 4.8 plot: percentage area overhead (%) versus percentage error rate degradation mitigation (%); curves: bDSR and uDSR for IFU, LSU, ISU, EXU, and L2.]
Figure 4.8: Percentage area overhead within each functional unit for percentage error rate degra-
dation mitigation utilizing bDSR and uDSR based DLDO.
design based on the application environment and lifetime target. Furthermore, because the
proposed aging-aware DLDO mitigates aging induced ∆V degradation, significant area
overhead savings compared to the conventional DLDO case can be achieved, as shown in Fig. 4.5.
Our proof-of-concept analysis reveals savings of approximately 1% of the total DLDO area, which
corresponds to ∼36% of the active DLDO area, per year of service life reduction.
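The two percentages quoted for the per-year saving imply a rough ratio between the active and total DLDO area. The short calculation below treats both figures as exact for illustration, although they are approximate in the text. The symbols A_total and A_active are introduced here and are not part of the original notation.

```latex
% Assumption: the same absolute area saving is expressed once relative to
% the total DLDO area and once relative to the active DLDO area.
\[
0.01\,A_{\mathrm{total}} \approx 0.36\,A_{\mathrm{active}}
\quad\Longrightarrow\quad
\frac{A_{\mathrm{active}}}{A_{\mathrm{total}}} \approx \frac{0.01}{0.36} \approx 0.028
\]
```

Under this reading, the active circuitry is roughly 3% of the total DLDO area, with the remainder presumably dominated by output capacitance.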
The effects of temperature variation on the percentage area overhead (saving) within LSU are
demonstrated in Fig. 4.6. As seen from the figure, as temperature increases, the percentage area
overhead needed for the conventional DLDO to mitigate ∆V degradation increases significantly. The
percentage area overhead saving achieved by the aging-aware DLDO also greatly increases. Although,
as shown in Fig. 4.7, the relative benefits of the aging-aware DLDO do not improve significantly as
temperature increases, the area overhead saving is considerable due to the relatively large ratio between
the area of output capacitance and that of active DLDO.
For a proof-of-concept analysis, considering a five-year aging period, the percentage area
overhead within each functional unit for percentage error rate degradation mitigation utilizing bDSR
and uDSR based DLDO is demonstrated in Fig. 4.8, based on the relationship between error rate and
supply voltage demonstrated in [139, 140]. The percentage error rate degradation mitigation is with
respect to the degraded error rate utilizing bDSR based DLDO, and a 100% error rate degradation
mitigation means that the same error rate as the fresh one is achieved within each functional unit after
a five-year aging period. As seen from Fig. 4.8, contrary to the bDSR curves, the uDSR curves do not
start from the origin, which means that with negligible area overhead, uDSR based DLDO achieves a
certain amount of error rate degradation mitigation compared to bDSR based DLDO. Also, for the same
amount of error rate degradation mitigation, the area overhead needed for uDSR based DLDO is
lower than that of bDSR based DLDO.
4.6 Conclusion
As an emerging and essential part of the modern processor power delivery network, DLDO
regulators experience serious aging induced degradation of performance metrics including IpMOS, TR,
and ∆V. In particular, DLDO degradation can increase noise in the supply voltage and further
deteriorate program output quality. The area overhead needed to fully compensate for these degradations
can be significant, especially when a conventional DLDO design is utilized. Algorithmic noise tolerance
of different processor components is leveraged as an area-quality control knob to alleviate the
area overhead requirement through scalable on-chip voltage regulation at design time. Furthermore,
a DLDO designed in an aging-aware fashion is proposed to mitigate aging induced performance
degradations with negligible power and area overhead. With reduced DLDO performance degradation,
a significantly better area and quality trade-off can be achieved due to aging-aware DLDO induced
area overhead savings. Therefore, more efficient scalable on-chip voltage regulation can be realized
with the proposed aging-aware DLDO. Up to 3X steady state DLDO performance improvement as
well as more than 10% area overhead saving can be achieved utilizing the proposed aging-aware
paradigm.
CHAPTER 5:
CONCLUSIONS
Unbalanced current sharing among distributed on-chip voltage regulators negatively affects
the power conversion efficiency, stability, and reliability of the power delivery network. An effective
balanced current sharing technique is proposed to enhance the power efficiency, stability, and
reliability. The proposed technique slightly increases the reference voltages of on-chip voltage regulators
that provide less current and decreases the reference voltages of those providing more current to
balance the overall current sharing. Because the effective resistances connecting voltage
regulators at different locations vary only slightly, the reference voltage changes needed to balance
the current are also negligible. Simulation results demonstrate up to 8% regulator power loss saving,
several years of MTTF improvement, and enhanced system stability.
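The reference voltage adjustment summarized above can be illustrated with a toy model. The sketch below is not the dissertation's implementation: it assumes each regulator is an ideal source at its reference voltage in series with an effective resistance to one shared load node, and it uses a simple proportional update. All function names, resistance values, the gain, and the load current are hypothetical.

```python
def node_voltage(vrefs, resistances, i_load):
    """Voltage of the shared load node by nodal analysis (toy model)."""
    conductances = [1.0 / r for r in resistances]
    weighted = sum(v * g for v, g in zip(vrefs, conductances))
    return (weighted - i_load) / sum(conductances)


def regulator_currents(vrefs, resistances, i_load):
    """Current delivered by each regulator through its effective resistance."""
    v_node = node_voltage(vrefs, resistances, i_load)
    return [(v - v_node) / r for v, r in zip(vrefs, resistances)]


def balance_references(vrefs, resistances, i_load, gain_ohm=0.05, iterations=200):
    """Raise Vref where current is below the equal share, lower it where above."""
    vrefs = list(vrefs)
    target = i_load / len(vrefs)  # equal share of the load current
    for _ in range(iterations):
        currents = regulator_currents(vrefs, resistances, i_load)
        vrefs = [v + gain_ohm * (target - i) for v, i in zip(vrefs, currents)]
    return vrefs


# Hypothetical values chosen only for illustration.
R_EFF = [0.10, 0.12, 0.15, 0.11]   # effective resistance to the load node (ohm)
VREF0 = [1.0, 1.0, 1.0, 1.0]       # identical initial reference voltages (V)
I_LOAD = 0.4                       # total load current (A)

initial_currents = regulator_currents(VREF0, R_EFF, I_LOAD)
balanced_vrefs = balance_references(VREF0, R_EFF, I_LOAD)
balanced_currents = regulator_currents(balanced_vrefs, R_EFF, I_LOAD)

print("initial currents (A): ", [round(i, 4) for i in initial_currents])
print("balanced Vref (V):    ", [round(v, 4) for v in balanced_vrefs])
print("balanced currents (A):", [round(i, 4) for i in balanced_currents])
```

In this toy case the references move by only a few millivolts while the currents equalize, which mirrors the observation that small effective resistance variations require only small reference voltage changes.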
NBTI increases the magnitude of the threshold voltage and has been demonstrated
to degrade the current supply capability, transient response time, and voltage droop performance of
digital LDOs. Conventional digital LDOs that utilize a bidirectional shift register for power transistor
array control impose excessive stress on a certain portion of the power transistors. The proposed
unidirectional shift register based NBTI-aware digital LDO distributes the electrical
stress more evenly among all of the power transistors to mitigate NBTI induced performance
degradations. Under practical simulation settings, the NBTI-aware digital LDO can achieve up to 42%
voltage droop and 43.2% transient response time degradation mitigation.
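The difference in stress distribution between the two shift register styles can be sketched with a small behavioral model. The model below is an assumption made for illustration only: the bidirectional register is treated as a thermometer code whose boundary moves back and forth, and the unidirectional register is treated as a window whose turn-on and turn-off pointers both advance in the same direction with wraparound. The function name simulate, the array size, and the command sequence are hypothetical.

```python
def simulate(n, commands, unidirectional):
    """Count on-cycles per power transistor for a sequence of +1/-1 commands."""
    on = [False] * n
    head = 0      # index of the next transistor to turn on
    tail = 0      # index of the next transistor to turn off (unidirectional only)
    active = 0    # number of transistors currently on
    stress = [0] * n
    for cmd in commands:
        if cmd > 0 and active < n:
            on[head % n] = True
            head += 1
            active += 1
        elif cmd < 0 and active > 0:
            if unidirectional:
                # Turn off the transistor that has been on the longest.
                on[tail % n] = False
                tail += 1
            else:
                # Turn off the transistor that was turned on most recently.
                head -= 1
                on[head % n] = False
            active -= 1
        for i in range(n):
            if on[i]:
                stress[i] += 1
    return stress


N = 8
# Four turn-on steps, then a steady state limit cycle of alternating commands.
COMMANDS = [+1] * 4 + [+1, -1] * 400

b_stress = simulate(N, COMMANDS, unidirectional=False)
u_stress = simulate(N, COMMANDS, unidirectional=True)
print("bidirectional on-cycles: ", b_stress)
print("unidirectional on-cycles:", u_stress)
```

Both variants supply the same total on-time, but in this model the bidirectional register concentrates it on the first few transistors while the rotating window spreads it nearly uniformly across the array.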
BTI also leads to control loop degradation of digital LDOs, specifically propagation delay
degradation of the control loop, which is not desirable for steady state performance. It is
demonstrated through simulations that the propagation delay degradation can be small compared to
half a clock cycle of a typical digital LDO clock signal, such that reduced clock pulse width triggering
can be implemented to further reduce the mode and amplitude of steady state limit cycle
oscillations, especially at light load conditions. Up to three times steady state digital LDO performance
improvement is achieved.
The error rate of a functional block is largely affected by the supply voltage level. Aging
can degrade the on-chip voltage noise profile and, in turn, the error
rate. The algorithmic noise tolerance of different functional blocks can vary. Meanwhile, additional area
overhead is needed to mitigate aging induced on-chip voltage regulator performance degradations.
The higher algorithmic noise tolerance of certain functional blocks can be leveraged to reduce area
overhead and allow more on-chip voltage noise degradation while the desired error rate is still
maintained. Area overhead reduction may further enable an increased number of distributed voltage
regulators for functional blocks that have a lower level of algorithmic noise tolerance.
CHAPTER 6:
FUTURE WORK
6.1 Co-Optimizing Different Design Aspects to Avoid Overdesign
A high performance on-chip power delivery network involves different design aspects such
as power conversion efficiency, thermal issues, and reliability. Different application scenarios and
environments may impose different design targets and specifications. Power conversion efficiency,
thermal issues, and reliability affect one another and should be considered as a whole to realize
an optimal design and avoid overdesign. For example, power conversion efficiency can be a function
of temperature. Reliability can be largely affected by temperature. Implementation of thermal
mitigation techniques may need to sacrifice power conversion efficiency. Reliability enhancement
techniques can also introduce additional power and area overhead. Depending on the targeted
lifetime, power efficiency requirement, and on-chip temperature profile, appropriate efficiency boost,
thermal emergency mitigation, and aging mitigation techniques need to be
adopted. Furthermore, the algorithmic noise tolerance capability of different functional blocks needs to
be considered. A generic design flow that accounts for different design aspects and the trade-offs among
them will be considered in our future work.
6.2 NBTI-Aware Digital LDO with Adaptive Gain Scaling Control
A unidirectional shift register that can activate or deactivate a single power transistor per clock
cycle is proposed in our recent work [4, 130] to mitigate NBTI induced digital LDO performance
degradations. A digital LDO with improved transient performance [141] has been proposed to achieve
faster response time by turning on or off more power transistors per clock cycle during
the load transient. However, a bidirectional shift register is utilized in [141], which can lead to the heavy
use of a portion of the power transistors. The unidirectional shift register proposed in [4, 130] cannot be
directly applied to digital LDOs with adaptive gain scaling control capability. A novel NBTI-aware
digital LDO with adaptive gain scaling control capability will be proposed in our future work to
mitigate NBTI induced performance degradations.
REFERENCES
[1] X. Wang, J. Xu, Z. Wang, K. J. Chen, X. Wu, Z. Wang, P. Yang, and L. H. K. Duong, “An
Analytical Study of Power Delivery Systems for Many-Core Processors Using On-Chip and
Off-Chip Voltage Regulators,” IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol. 34, no. 9, pp. 1401–1414, September 2015.
[2] I. Vaisband, R. Jakushokas, M. Popovich, A. V. Mezhiba, S. Köse, and E. G. Friedman,
On-Chip Power Delivery and Management, Fourth Edition. Springer, 2016.
[3] R. Jakushokas, M. Popovich, A. V. Mezhiba, S. Köse, and E. G. Friedman, Power Distribution
Networks with On-Chip Decoupling Capacitors, Second Edition. Springer, 2011.
[4] L. Wang, S. K. Khatamifard, U. R. Karpuzcu, and S. Köse, “Exploiting Algorithmic Noise
Tolerance for Scalable On-Chip Voltage Regulation,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 2018, in press.
[5] S. A. Sadat, M. Canbolat, and S. Köse, “Optimal Allocation of LDOs and Decoupling Capac-
itors within a Distributed On-Chip Power Grid,” ACM Transactions on Design Automation
of Electronic Systems (TODAES), vol. 23, no. 4, pp. 49:1–49:15, May 2018.
[6] L. Wang, S. K. Khatamifard, O. A. Uzun, U. R. Karpuzcu, and S. Köse, “Efficiency, Stability,
and Reliability Implications of Unbalanced Current Sharing among Distributed On-Chip Volt-
age Regulators,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25,
no. 11, pp. 3019–3032, November 2017.
[7] I. Vaisband, B. Price, S. Köse, Y. Kolla, E. G. Friedman, and J. Fischer, “Distributed Power
Delivery with 28 nm Ultra-Small LDO Regulator,” Analog Integrated Circuits and Signal
Processing, vol. 83, no. 3, pp. 295–309, April 2015.
[8] I. Vaisband, M. Azhar, E. G. Friedman, and S. Köse, “Digitally Controlled Pulse Width Mod-
ulator for On-Chip Power Management,” IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 22, no. 12, pp. 2527–2534, December 2014.
[9] I. Savidis, S. Köse, and E. G. Friedman, “Power Noise in TSV-based 3-D Integrated Circuits,”
IEEE Journal of Solid-State Circuits, vol. 48, no. 2, pp. 587–597, February 2013.
[10] S. Köse, E. Salman, and E. G. Friedman, “Shielding Methodologies in the Presence of Power/-
Ground Noise,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19,
no. 8, pp. 1458–1468, August 2011.
[11] L. Wang and S. Köse, “Reliable On-Chip Voltage Regulation for Sustainable and Compact
IoT and Heterogeneous Computing Systems,” in Proceedings of the ACM/IEEE Great Lakes
Symposium on VLSI (GLSVLSI), May 2018, pp. 285–290.
[12] L. Wang, “Design of Integrated Voltage-Mode Controlled DC-DC Buck Converter,” Master’s
thesis, Texas Tech University, Lubbock, TX, May 2013.
[13] L. Wang and S. Köse, “Reliability Enhanced On-Chip Digital LDO with Limit Cycle Oscil-
lation Mitigation,” in Proceedings of the Government Microcircuit Applications and Critical
Technology Conference, March 2019.
[14] M. E. Belviranli, W. Yu, and S. Köse, “Ultra-Fine Grain Power Management at Datapath-
Level: Fact or Fiction,” in International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS) WACI Session, March 2015, pp. 1–2.
[15] O. A. Uzun and S. Köse, “Regulator-Gating Methodology With Distributed Switched Capac-
itor Voltage Converters,” in Proceedings of the IEEE Computer Society Annual Symposium
on VLSI, July 2014, pp. 13–18.
[16] M. Azhar and S. Köse, “An Enhanced Pulse Width Modulator with Adaptive Duty Cycle and
Frequency Control,” in Proceedings of the IEEE International Symposium on Circuits and
Systems, June 2014, pp. 958–961.
[17] S. Köse, I. Vaisband, and E. G. Friedman, “Digitally Controlled Wide Range Pulse Width
Modulator for On-Chip Power Supplies,” in Proceedings of the IEEE International Symposium
on Circuits and Systems, May 2013, pp. 2251–2254.
[18] S. Köse and E. G. Friedman, “Distributed Power Delivery for Energy Efficient and Low Power
Systems,” in Asilomar Conference on Signals, Systems, and Computers, November 2012, pp.
757–761.
[19] S. Köse and E. G. Friedman, “Design Methodology to Distribute On-Chip Power in Next Gen-
eration Integrated Circuits,” in IEEE 27th Convention of Electrical and Electronics Engineers
in Israel, November 2012, pp. 1–4.
[20] S. Köse and E. G. Friedman, “Power Delivery in Heterogeneous Integrated Circuits,” in IEEE
CAS-FEST Workshop (in conjunction with ISCAS2012), May 2012.
[21] S. Köse, S. Tam, S. Pinzon, B. McDermott, and E. G. Friedman, “An Area Efficient On-Chip
Hybrid Voltage Regulator,” in Proceedings of the IEEE International Symposium on Quality
Electronic Design (ISQED), March 2012, pp. 398–403.
[22] S. Köse and E. G. Friedman, “Fast Algorithms for IR Voltage Drop Analysis Exploiting
Locality,” in Proceedings of the IEEE/ACM Design Automation Conference (DAC), June
2011, pp. 996–1001.
[23] S. Köse and E. G. Friedman, “Distributed Power Network Co-Design with On-Chip Power
Supplies and Decoupling Capacitors,” in Proceedings of the ACM/IEEE International Work-
shop on System Level Interconnect Prediction (SLIP), June 2011, pp. 1–5.
[24] I. Savidis, S. Köse, and E. G. Friedman, “Power Grid Noise in TSV-Based 3-D Integrated Sys-
tems,” in Government Microcircuit Applications and Critical Technology Conference, March
2011, pp. 1–4.
[25] S. Köse and E. G. Friedman, “Simultaneous Co-Design of Distributed On-Chip Power Sup-
plies and Decoupling Capacitors,” in Proceedings of the IEEE International SoC Conference,
September 2010, pp. 15–18.
[26] S. Köse and E. G. Friedman, “An Area Efficient Fully Monolithic Hybrid Voltage Regulator,”
in Proceedings of the IEEE International Symposium on Circuits and Systems, May/June
2010, pp. 2718–2721.
[27] W. Yu and S. Köse, “Fast Algorithms for Power Grid Analysis Based on Effective Resistance,”
in Proceedings of the IEEE International Symposium on Circuits and Systems, May/June 2010,
pp. 3661–3664.
[28] S. Köse, E. Salman, and E. G. Friedman, “Shielding Methodologies in the Presence of Power/-
Ground Noise,” in Proceedings of the IEEE International Symposium on Circuits and Systems,
May 2009, pp. 2277–2280.
[29] S. Köse, E. Salman, Z. Ignjatovic, and E. G. Friedman, “Pseudo-Random Clocking to Enhance
Signal Integrity,” in Proceedings of the IEEE International SoC Conference, September 2008,
pp. 47–50.
[30] S. Köse and E. G. Friedman, “Efficient Algorithm for Fast IR Drop Analysis Exploiting
Locality,” Integration, the VLSI Journal, vol. 45, no. 2, pp. 149–161, March 2012.
[31] S. K. Khatamifard, L. Wang, W. Yu, S. Köse, and U. R. Karpuzcu, “ThermoGater: Thermally-
Aware On-Chip Voltage Regulation,” in Proceedings of the International Symposium on Com-
puter Architecture, June 2017, pp. 120–132.
[32] S. Lai, B. Yan, and P. Li, “Localized Stability Checking and Design of IC Power Delivery with
Distributed Voltage Regulators,” IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol. 32, no. 9, pp. 1321–1334, September 2013.
[33] I. Vaisband and E. G. Friedman, “Stability of Distributed Power Delivery Systems with Mul-
tiple Parallel On-Chip LDO Regulators,” IEEE Transactions on Power Electronics, vol. 31,
no. 8, pp. 5625–5634, August 2016.
[34] S. K. Khatamifard, L. Wang, S. Köse, and U. R. Karpuzcu, “A New Class of Covert Channels
Exploiting Power Management Vulnerabilities,” IEEE Computer Architecture Letters, vol. 17,
no. 2, pp. 201–204, July-Dec. 2018.
[35] W. Yu, Y. Wen, S. Köse, and J. Chen, “Exploiting Multi-Phase On-Chip Voltage Regula-
tors as Strong PUF Primitives for Securing IoT,” Journal of Electronic Testing: Theory and
Applications, vol. 34, no. 5, pp. 587–598, October 2018.
[36] M. Azhar, F. Amsaad, and S. Köse, “Duty Cycle-Based Controlled Physical Unclonable Func-
tion,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 9, pp.
1647–1658, September 2018.
[37] S. K. Khatamifard, L. Wang, S. Köse, and U. R. Karpuzcu, “POWERT Channels: A Novel
Class of Covert Communication Exploiting Power Management Vulnerabilities,” in Proceed-
ings of the 25th IEEE International Symposium on High-Performance Computer Architecture,
February 2019.
[38] F. Amsaad, M. Niamat, A. Dawoud, and S. Köse, “Reliable Delay Based Algorithm to Boost
PUF Security against Modeling Attacks,” Information, vol. 9, no. 9, pp. 1–15, September
2018.
[39] W. Yu and S. Köse, “Exploiting Voltage Regulators to Enhance Various Power Attack Coun-
termeasures,” IEEE Transactions on Emerging Topics in Computing, vol. 6, no. 2, pp. 244–257,
April-June 2018.
[40] M. Yilmaz, E. Guvenkaya, H. M. Furqan, S. Köse, and H. Arslan, “Cognitive Security of Wire-
less Communication Systems in the Physical Layer,” Wireless Communications and Mobile
Computing, vol. 2017, pp. 1–9, December 2017.
[41] M. Yilmaz, S. Köse, N. Chamok, M. Ali, and H. Arslan, “Partially Overlapping Filtered Mul-
titone with Reconfigurable Antennas in Uncoordinated Networks,” Physical Communication,
vol. 25, no. 1, pp. 249–258, December 2017.
[42] W. Yu and S. Köse, “False Key-Controlled Aggressive Voltage Scaling: A Countermeasure
Against LPA Attacks,” IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, vol. 36, no. 12, pp. 2149–2153, December 2017.
[43] W. Yu and S. Köse, “A Lightweight Masked AES Implementation for Securing IoT Against
CPA Attacks,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 11,
pp. 2934–2944, November 2017.
[44] W. Yu and S. Köse, “Security-Adaptive Voltage Conversion as a Lightweight Countermeasure
Against LPA Attacks,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 25, no. 7, pp. 2183–2187, July 2017.
[45] W. Yu and S. Köse, “A Voltage Regulator-Assisted Lightweight AES Implementation Against
DPA Attacks,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 63, no. 8,
pp. 1152–1163, August 2016.
[46] W. Yu and S. Köse, “Charge-Withheld Converter-Reshuffling (CoRe): A Countermeasure
Against Power Analysis Attacks,” IEEE Transactions on Circuits and Systems II: Express
Briefs, vol. 63, no. 5, pp. 438–442, May 2016.
[47] W. Yu and S. Köse, “Security Implications of Simultaneous Dynamic and Leakage Power
Analysis Attacks on Nanoscale Cryptographic Circuits,” IET Electronics Letters, vol. 52,
no. 6, pp. 466–468, March 2016.
[48] W. Yu and S. Köse, “Time-Delayed Converter-Reshuffling: An Efficient and Secure Power
Delivery Architecture,” IEEE Embedded Systems Letters, vol. 7, no. 3, pp. 73–76, September
2015.
[49] L. Wang and S. Köse, “When Hardware Security Moves to the Edge and Fog,” in Proceedings
of the IEEE International Conference on Digital Signal Processing, November 2018, pp. 1–5.
[50] M. Azhar and S. Köse, “Process, Voltage, and Temperature-stable Adaptive Duty Cycle based
PUF,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS),
May 2018, pp. 1–5.
[51] A. W. Khan, T. Wanchoo, G. Mumcu, and S. Köse, “Implications of Distributed On-Chip
Power Delivery on EM Side-Channel Attacks,” in Proceedings of the IEEE International Con-
ference on Computer Design (ICCD), November 2017, pp. 329–336.
[52] B. Pekoz, S. Köse, and H. Arslan, “Adaptive Windowing of Insufficient CP for Joint Minimiza-
tion of ISI and ACI Beyond 5G,” in Proceedings of the IEEE Annual International Symposium
on Personal, Indoor, and Mobile Radio Communications (PIMRC), October 2017, pp. 1–5.
[53] A. Roohi, R. Demara, L. Wang, and S. Köse, “Secure Intermittent-Robust Computation
for Energy Harvesting Device Security and Outage Resilience,” in Proceedings of the IEEE
Conference on Advanced and Trusted Computing, August 2017, pp. 1–6.
[54] W. Yu and S. Köse, “A Lightweight AES Implementation Against Bivariate First-Order DPA
Attacks,” in Proceedings of the ACM Workshop on Hardware and Architectural Support for
Security and Privacy (HASP), June 2017, pp. 1–7.
[55] S. Köse, “Efficient and Secure On-Chip Reconfigurable Voltage Regulation for IoT Devices,”
in Proceedings of the ACM/IEEE Great Lakes Symposium on VLSI, May 2017, pp. 369–374.
[56] V. T. Alaparthy and S. Köse, “An Adaptive Senior Design Course with an Emphasis on
Undergraduate Course Curriculum,” in Proceedings of the IEEE International Conference on
Microelectronics System Education, May 2017, pp. 59–62.
[57] W. Yu and S. Köse, “Implications of Noise Insertion Mechanisms of Different Countermea-
sures Against Side-Channel Attacks,” in Proceedings of the IEEE International Symposium on
Circuits and Systems, May 2017, pp. 1–4.
[58] S. Köse, L. Wang, and R. Demara, “On-Chip Sensor Circle Distribution Technique for Real-
Time Hardware Trojan Detection,” in Proceedings of the Government Microcircuit Applica-
tions and Critical Technology Conference, March 2017, pp. 1–4.
[59] W. Yu, O. A. Uzun, and S. Köse, “Leveraging On-Chip Voltage Regulators as a Countermea-
sure Against Side-Channel Attacks,” in Proceedings of the IEEE/ACM Design Automation
Conference (DAC), June 2015, pp. 1–6.
[60] S. Köse, E. G. Friedman, R. M. Secareanu, and O. Hartin, “Current Profile of a Microcon-
troller to Determine Electromagnetic Emissions,” in Proceedings of the IEEE International
Symposium on Circuits and Systems, May 2013, pp. 2650–2653.
[61] O. A. Uzun and S. Köse, “Converter-Gating: A Power Efficient and Secure On-Chip Power
Delivery System,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems,
vol. 4, no. 2, pp. 169–179, June 2014.
[62] E. A. Burton, G. Schrom, F. Paillet, J. Douglas, W. J. Lambert, Radhakrishnan, and M. J.
Hill, “FIVR - Fully Integrated Voltage Regulators on 4th Generation Intel Core SoCs,” in
Proceedings of the IEEE Applied Power Electronics Conference and Exposition, March 2014,
pp. 432–439.
[63] Y. Okuma, K. Ishida, Y. Ryu, X. Zhang, P.-H. Chen, K. Watanabe, M. Takamiya, and
T. Sakurai, “0.5-V Input Digital LDO with 98.7% Current Efficiency and 2.7-µA Quiescent
Current in 65 nm CMOS,” in Proceedings of the IEEE Custom Integrated Circuits Conference,
September 2010, pp. 1–4.
[64] E. J. Fluhr, J. Friedrich, D. Dreps, V. Zyuban, G. Still, C. Gonzalez, A. Hall, D. Hogenmiller,
F. Malgioglio, R. Nett, J. Paredes, J. Pille, D. Plass, R. Puri, P. Restle, D. Shan, K. Stawiasz,
Z. T. Deniz, D. Wendel, and M. Ziegler, “POWER8: A 12-Core Server-Class Processor in 22nm
SOI with 7.6Tb/s Off-Chip Bandwidth,” in Proceedings of the IEEE International Solid-State
Circuits Conference, February 2014, pp. 96–97.
[65] P. Patra, R. Muthukaruppan, and S. Mangal, “A Reliable Digitally Synthesizable Linear
Drop-out Regulator Design for 14nm SOC,” in Proceedings of the 2015 IEEE International
Symposium on Nanoelectronic and Information Systems (iNIS), December 2015, pp. 73–76.
[66] R. Muthukaruppan, T. Mahajan, H. K. Krishnamurthy, S. Mangal, A. Dhanashekar,
R. Ghayal, and V. De, “A Digitally Controlled Linear Regulator for Per-Core Wide-Range
DVFS of Atom™ Cores in 14nm Tri-gate CMOS Featuring Non-linear Control, Adaptive
Gain and Code Roaming,” in Proceedings of the IEEE European Solid State Circuits Confer-
ence, September 2017, pp. 275–278.
[67] H.-P. Le, S. R. Sanders, and E. Alon, “Design Techniques for Fully Integrated Switched-
Capacitor DC-DC Converters,” IEEE Journal of Solid-State Circuits, vol. 46, no. 9, pp. 2120–
2131, September 2011.
[68] T. M. Andersen, F. Krismer, J. W. Kolar, T. Toifl, C. Menolfi, L. Kull, T. Morf, M. Kossel,
M. Brandli, and P. A. Francese, “A 10 W On-Chip Switched Capacitor Voltage Regulator
With Feedforward Regulation Capability for Granular Microprocessor Power Delivery,” IEEE
Transactions on Power Electronics, vol. 32, no. 1, pp. 378–393, January 2017.
[69] Y. Lu, J. Jiang, and W.-H. Ki, “A Multiphase Switched-Capacitor DC–DC Converter
Ring With Fast Transient Response and Small Ripple,” IEEE Journal of Solid-State Circuits,
vol. 52, no. 2, pp. 579–591, February 2017.
[70] J. L. Hennessy and D. A. Patterson, Computer Architecture, Fifth Edition: A Quantitative
Approach. Morgan Kaufmann Publishers Inc., 2011.
[71] C. Huang and P. K. T. Mok, “An 84.7% Efficiency 100-MHz Package Bondwire-Based Fully
Integrated Buck Converter With Precise DCM Operation and Enhanced Light-Load Efficiency,”
IEEE Journal of Solid-State Circuits, vol. 48, no. 11, pp. 2595–2607, November 2013.
[72] S. S. Chong and P. K. Chan, “A 0.9-uA Quiescent Current Output-Capacitorless LDO Regu-
lator with Adaptive Power Transistors in 65-nm CMOS,” IEEE Transactions on Circuits and
Systems I: Regular Papers, vol. 60, no. 4, pp. 1072–1081, April 2013.
[73] J. F. Bulzacchelli, Z. Toprak-Deniz, T. M. Rasmus, J. A. Iadanza, W. L. Bucossi, S. Kim,
R. Blanco, C. E. Cox, M. Chhabra, C. D. LeBlanc, C. L. Trudeau, and D. J. Friedman, “Dual-
Loop System of Distributed Microregulators with High DC Accuracy, Load Response Time
Below 500 ps, and 85-mV Dropout Voltage,” IEEE Journal of Solid-State Circuits, vol. 47,
no. 4, pp. 863–874, April 2012.
[74] Z. Zeng, X. Ye, Z. Feng, and P. Li, “Tradeoff Analysis and Optimization of Power Deliv-
ery Networks with On-Chip Voltage Regulation,” in Proceedings of the IEEE/ACM Design
Automation Conference, June 2010, pp. 831–836.
[75] S. Köse and E. G. Friedman, “Distributed On-Chip Power Delivery,” IEEE Journal on Emerg-
ing and Selected Topics in Circuits and Systems, vol. 2, no. 4, pp. 704–713, December 2012.
[76] S. Köse, S. Tam, S. Pinzon, B. McDermott, and E. G. Friedman, “Active Filter-Based Hybrid
On-Chip DC-DC Converter for Point-of-Load Voltage Regulation,” IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, vol. 21, no. 4, pp. 680–691, April 2013.
[77] S. Köse and E. G. Friedman, “On-Chip Point-of-Load Voltage Regulator for Distributed Power
Supplies,” in Proceedings of the ACM/IEEE Great Lakes Symposium on VLSI, May 2010, pp.
377–380.
[78] A. Paul, S. P. Park, D. Somasekhar, Y. M. Kim, N. Borkar, U. R. Karpuzcu, and C. H. Kim,
“System-Level Power Analysis of a Multicore Multipower Domain Processor with On-chip
Voltage Regulators,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 24, no. 12, pp. 3468–3476, December 2016.
[79] W. Lee, Y. Wang, and M. Pedram, “Optimizing a Reconfigurable Power Distribution Network
in a Multicore Platform,” IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, vol. 34, no. 7, pp. 1110–1123, July 2015.
[80] J. A. Abu-Qahouq, “Analysis and Design of N-Phase Current-Sharing Autotuning Controller,”
IEEE Transactions on Power Electronics, vol. 25, no. 6, pp. 1641–1651, June 2010.
[81] S. R. Sanders and G. Eirea, “Phase Current Unbalance Estimation in Multi-Phase Buck
Converters,” in Proceedings of the IEEE Power Electronics Specialists Conference, June 2006,
pp. 1–6.
[82] R. Zhang, K. Wang, B. H. Meyer, M. R. Stan, and K. Skadron, “Architecture Implications of
Pads as a Scarce Resource,” in Proceedings of the ACM/IEEE 41st International Symposium
on Computer Architecture (ISCA), June 2014, pp. 373–384.
[83] S. Köse and E. G. Friedman, “Effective Resistance of a Two Layer Mesh,” IEEE Transactions
on Circuits and Systems II: Express Briefs, vol. 58, no. 11, pp. 739–743, November 2011.
[84] K. Mistry, C. Allen, C. Auth, B. Beattie, D. Bergstrom, M. Bost, M. Brazier, M. Buehler,
A. Cappellani, R. Chau, C.-H. Choi, G. Ding, K. Fischer, T. Ghani, R. Grover, W. Han,
D. Hanken, M. Hattendorf, J. He, J. Hicks, R. Huessner, D. Ingerly, P. Jain, R. James,
L. Jong, S. Joshi, C. Kenyon, K. Kuhn, K. Lee, and H. Liu, “A 45nm Logic Technology
with High-k+Metal Gate Transistors, Strained Silicon, 9 Cu Interconnect Layers, 193nm Dry
Patterning, and 100% Pb-free Packaging,” in Proceedings of the IEEE International Electron
Devices Meeting, December 2007, pp. 247–250.
[85] F. Ma, W. Chen, and J. Wu, “A Monolithic Current-Mode Buck Converter with Advanced
Control and Protection Circuits,” IEEE Transactions on Power Electronics, vol. 22, no. 5, pp.
1836–1846, September 2007.
[86] B. S. Lee, “Understanding the Terms and Definitions of LDO Voltage Regulators,”
http://www.ti.com/lit/an/slva079/slva079.pdf, 1999, [Online; accessed 08-October-2018].
[87] M. Gildersleeve, H. P. Forghani-zadeh, and G. A. Rincon-Mora, “A Comprehensive Power
Analysis and a Highly Efficient, Mode-Hopping DC-DC Converter,” in Proceedings of the
IEEE Asia-Pacific Conference on ASIC, August 2002, pp. 153–156.
[88] X. Zhou, T. G. Wang, and F. C. Lee, “Optimizing Design for Low Voltage DC-DC Converters,”
in Proceedings of the Applied Power Electronics Conference, February 1997, pp. 612–616.
[89] P. Zumel, C. Fernández, A. Castro, and O. García, “Efficiency Improvement in Multiphase
Converter by Changing Dynamically the Number of Phases,” in Proceedings of the IEEE
Power Electronics Specialists Conference, June 2006, pp. 1–6.
[90] R. Middlebrook and S. Cuk, “A General Unified Approach to Modelling Switching-converter
Power Stages,” in Proceedings of the IEEE Power Electronics Specialists Conference, June
1976, pp. 18–34.
[91] M. Hankaniemi, “Dynamical Profile of Switched-Mode Converter - Fact or Fiction?” Ph.D.
dissertation, Tampere University of Technology, 2007.
[92] C. P. Basso, Switch-Mode Power Supplies SPICE Simulations and Practical Designs.
McGraw-Hill, Inc., 2008.
[93] B. Johansson, “DC-DC Converters - Dynamic Model Design and Experimental Verification,”
Ph.D. dissertation, Lund University, 2004.
[94] X. Huang, T. Yu, V. Sukharev, and S. X.-D. Tan, “Physics-based Electromigration Assessment
for Power Grid Networks,” in Proceedings of the IEEE/ACM Design Automation Conference,
June 2014, pp. 1–6.
[95] J. R. Black, “Electromigration: A brief survey and some recent results,” IEEE Transactions
on Electron Devices, vol. 16, no. 4, pp. 338–347, April 1969.
[96] R. Zhang, B. H. Meyer, K. Wang, M. R. Stan, and K. Skadron, “Tolerating the Consequences of
Multiple EM-Induced C4 Bump Failures,” IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 24, no. 6, pp. 2335–2344, June 2016.
[97] J. G. Delgado-Frias and W. R. Moore, VLSI for Neural Networks and Artificial Intelligence.
Springer, 1994.
[98] V. R. H. Lorentz, S. E. Berberich, M. Marz, A. J. Bauer, H. Ryssel, P. Poure, and F. Braun,
“Lossless Average Inductor Current Sensor for CMOS Integrated DC-DC Converters Operating
at High Frequencies,” Analog Integrated Circuits and Signal Processing, vol. 62, no. 3, pp. 333–
344, March 2009.
[99] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC Benchmark Suite: Characterization
and Architectural Implications,” Princeton University, Tech. Rep. TR-811-08, 2008.
[100] S. L. Xi, H. Jacobson, P. Bose, G. Wei, and D. Brooks, “Quantifying Sources of Error in
McPAT and Potential Impacts on Architectural Studies,” in Proceedings of the IEEE Interna-
tional Symposium on High Performance Computer Architecture (HPCA), February 2015, pp.
577–589.
[101] S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi, “McPAT: An
Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Ar-
chitectures,” in Proceedings of the IEEE/ACM International Symposium on Microarchitecture
(MICRO), December 2009, pp. 469–480.
[102] T. E. Carlson, W. Heirman, and L. Eeckhout, “Sniper: Exploring the Level of Abstraction
for Scalable and Accurate Parallel Multi-core Simulation,” in Proceedings of the International
Conference for High Performance Computing, Networking, Storage and Analysis, November
2011, pp. 1–12.
[103] Y. Zhang, D. Parikh, K. Sankaranarayanan, K. Skadron, and M. R. Stan, “HotLeakage: A
Temperature-Aware Model of Subthreshold and Gate Leakage for Architects,” University of
Virginia, Tech. Rep. CS-2003-05, 2003.
[104] K. Skadron, M. R. Stan, K. Sankaranarayanan, W. Huang, S. Velusamy, and D. Tarjan,
“Temperature-Aware Microarchitecture: Modeling and Implementation,” ACM Transactions
on Architecture and Code Optimization (TACO), vol. 1, no. 1, pp. 94–125, March 2004.
[105] T. Yu and M. D. F. Wong, “Efficient simulation-based optimization of power grid with on-chip
voltage regulator,” in Proceedings of the Asia and South Pacific Design Automation Conference
(ASP-DAC), January 2014, pp. 531–536.
[106] K. Wang, B. H. Meyer, R. Zhang, M. Stan, and K. Skadron, “Walking pads: Managing C4
placement for transient voltage noise minimization,” in Proceedings of the IEEE/ACM Design
Automation Conference (DAC), June 2014, pp. 1–6.
[107] Altera, “AN 574: Printed Circuit Board (PCB) Power Delivery Network (PDN) Design
Methodology,” https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an574.pdf,
2009, [Online; accessed 08-October-2018].
[108] R. Zhang, K. Mazumdar, B. H. Meyer, K. Wang, K. Skadron, and M. R. Stan, “Transient
Voltage Noise in Charge-Recycled Power Delivery Networks for Many-Layer 3D-IC,” in Pro-
ceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design,
July 2015, pp. 1–6.
[109] D. Rossi, V. Tenentes, S. Yang, S. Khursheed, and B. M. Al-Hashimi, “Reliable Power Gat-
ing with NBTI Aging Benefits,” IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 24, no. 8, pp. 2735–2744, August 2016.
[110] J. Wu, A. Boyer, J. Li, S. B. Dhia, and R. Shen, “Characterization of Changes in LDO
Susceptibility After Electrical Stress,” IEEE Transactions on Electromagnetic Compatibility,
vol. 55, no. 5, pp. 883–890, October 2013.
[111] Y. Ahn, I. Jeon, and J. Roh, “A Multiphase Buck Converter with a Rotating Phase-Shedding
Scheme for Efficient Light-Load Control,” IEEE Journal of Solid-State Circuits, vol. 49, no. 11,
pp. 2673–2683, November 2014.
[112] S. Köse, “Regulator-Gating: Adaptive Management of On-Chip Voltage Regulators,” in Pro-
ceedings of the ACM/IEEE Great Lakes Symposium on VLSI, May 2014, pp. 105–110.
[113] M. Alioto, Enabling the Internet of Things - from Integrated Circuits to Integrated Systems.
Springer, 2016.
[114] D. Pathak, H. Homayoun, and I. Savidis, “Smart Grid on Chip: Work Load-Balanced On-
Chip Power Delivery,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 25, no. 9, pp. 2538–2551, September 2017.
[115] M. A. Alam and S. Mahapatra, “A Comprehensive Model of PMOS NBTI Degradation,”
Microelectronics Reliability, vol. 45, no. 1, pp. 71–81, January 2005.
[116] I. Agbo, M. Taouil, D. Kraak, S. Hamdioui, H. Kukner, P. Weckx, P. Raghavan, and
F. Catthoor, “Integral Impact of BTI, PVT Variation, and Workload on SRAM Sense Ampli-
fier,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 4, pp.
1444–1454, April 2017.
[117] Y. Cao, Predictive Technology Model for Robust Nanoelectronic Design. Springer, 2011.
[118] M. M. Mahmoud, N. Soin, and H. A. H. Fahmy, “Design Framework to Overcome Aging
Degradation of the 16 nm VLSI Technology Circuits,” IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, vol. 33, no. 5, pp. 691–703, May 2014.
[119] S. Leitner, P. West, C. Lu, and H. Wang, “Digital LDO Modeling for Early Design Space
Exploration,” in Proceedings of the IEEE International System-on-Chip Conference (SOCC),
September 2016, pp. 7–12.
[120] M. Alioto and G. Palumbo, “NAND/NOR Adiabatic Gates: Power Consumption Evaluation
and Comparison Versus the Fan-In,” IEEE Transactions on Circuits and Systems I: Regular
Papers, vol. 49, no. 9, pp. 1253–1262, September 2002.
[121] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, “The SPLASH-2 Programs: Char-
acterization and Methodological Considerations,” in Proceedings of the International Sympo-
sium on Computer Architecture, June 1995, pp. 24–36.
[122] Z. Toprak-Deniz, M. Sperling, J. Bulzacchelli, G. Still, R. Kruse, S. Kim, D. Boerstler,
T. Gloekler, R. Robertazzi, K. Stawiasz, T. Diemoz, G. English, D. Hui, P. Muench, and
J. Friedrich, “Distributed System of Digitally Controlled Microregulators Enabling Per-Core
DVFS for the POWER8 Microprocessor,” in Proceedings of the IEEE International Solid-State
Circuits Conference, February 2014, pp. 98–99.
[123] S. Köse, “Thermal Implications of On-Chip Voltage Regulation: Upcoming Challenges and
Possible Solutions,” in Proceedings of the IEEE/ACM Design Automation Conference, June
2014, pp. 1–6.
[124] V. De, “Fine-Grain Power Management in Manycore Processor and System-on-Chip (SoC)
Designs,” in Proceedings of the IEEE/ACM International Conference on Computer-Aided De-
sign, November 2015, pp. 159–164.
[125] Y.-K. Chen, J. Chhugani, P. Dubey, C. J. Hughes, D. Kim, S. Kumar, V. W. Lee, A. D.
Nguyen, and M. Smelyanskiy, “Convergence of Recognition, Mining, and Synthesis Workloads
and Its Implications,” Proceedings of the IEEE, vol. 96, no. 5, pp. 790–807, May 2008.
[126] H. Cho, L. Leem, and S. Mitra, “ERSA: Error Resilient System Architecture for Probabilis-
tic Applications,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, vol. 31, no. 4, pp. 546–558, April 2012.
[127] T. Chan, J. Sartori, P. Gupta, and R. Kumar, “On the Efficacy of NBTI Mitigation Tech-
niques,” in Proceedings of the IEEE/ACM Conference on Design, Automation and Test in
Europe, March 2011, pp. 1–6.
[128] K. Wu, I. Lin, Y. Wang, and S. Yang, “BTI-Aware Sleep Transistor Sizing Algorithm for
Reliable Power Gating Designs,” IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol. 33, no. 10, pp. 1591–1595, October 2014.
[129] J. Fang and S. S. Sapatnekar, “The Impact of BTI Variations on Timing in Digital Logic
Circuits,” IEEE Transactions on Device and Materials Reliability, vol. 13, no. 1, pp. 277–286,
March 2013.
[130] L. Wang, S. K. Khatamifard, U. R. Karpuzcu, and S. Köse, “Mitigation of NBTI Induced
Performance Degradation in On-Chip Digital LDOs,” in Proceedings of the IEEE/ACM Con-
ference on Design, Automation and Test in Europe, March 2018, pp. 809–814.
[131] S. B. Nasir and A. Raychowdhury, “On Limit Cycle Oscillations in Discrete-Time Digital
Linear Regulators,” in Proceedings of the IEEE Applied Power Electronics Conference and
Exposition (APEC), March 2015, pp. 371–376.
[132] M. Huang, Y. Lu, S.-W. Sin, S.-P. U, R. P. Martins, and W.-H. Ki, “Limit Cycle Oscillation
Reduction for Digital Low Dropout Regulators,” IEEE Transactions on Circuits and Systems
II: Express Briefs, vol. 63, no. 9, pp. 903–907, September 2016.
[133] S. B. Nasir and A. Raychowdhury, “A Model Study of an All-Digital, Discrete-Time and
Embedded Linear Regulator,” http://arxiv.org/abs/1501.00579, 2015.
[134] H. Yi, T. Yoneda, M. Inoue, Y. Sato, S. Kajihara, and H. Fujiwara, “A Failure Prediction
Strategy for Transistor Aging,” IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 20, no. 11, pp. 1951–1959, November 2012.
[135] S. B. Nasir, S. Gangopadhyay, and A. Raychowdhury, “All-Digital Low-Dropout Regulator
with Adaptive Control and Reduced Dynamic Stability for Digital Load Circuits,” IEEE
Transactions on Power Electronics, vol. 31, no. 12, pp. 8293–8302, December 2016.
[136] W. Wang, V. Reddy, A. T. Krishnan, R. Vattikonda, S. Krishnan, and Y. Cao, “Compact
Modeling and Simulation of Circuit Reliability for 65-nm CMOS Technology,” IEEE Trans-
actions on Device and Materials Reliability, vol. 7, no. 4, pp. 509–517, December 2007.
[137] M. Onouchi, K. Otsuga, Y. Igarashi, T. Ikeya, S. Morita, K. Ishibashi, and K. Yanagisawa, “A
1.39-V Input Fast-transient-response Digital LDO Composed of Low-Voltage MOS Transistors
in 40-nm CMOS Process,” in Proceedings of the IEEE Asian Solid-State Circuits Conference,
November 2011, pp. 37–40.
[138] M. Huang, Y. Lu, S.-W. Sin, S.-P. U, and R. P. Martins, “A Fully Integrated Digital LDO
With Coarse-Fine-Tuning and Burst-Mode Operation,” IEEE Transactions on Circuits and
Systems II: Express Briefs, vol. 63, no. 7, pp. 683–687, July 2016.
[139] D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin,
K. Flautner, and T. Mudge, “Razor: A Low-Power Pipeline Based on Circuit-Level Timing
Speculation,” in Proceedings of the Annual IEEE/ACM International Symposium on Microar-
chitecture, December 2003, pp. 7–18.
[140] S. R. Sarangi, B. Greskamp, R. Teodorescu, J. Nakano, A. Tiwari, and J. Torrellas, “VAR-
IUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects,” IEEE
Transactions on Semiconductor Manufacturing, vol. 21, no. 1, pp. 3–13, February 2008.
[141] J.-H. Lin, W.-J. Tsou, K.-H. Chen, C.-L. Wey, Y.-H. Lin, J.-R. Lin, and L.-Y. Lsai, “A Digital
Low-Dropout-Regulator with Steady-State Load Current (SLC) Estimator and Dynamic Gain
Scaling (DGS) Control,” in Proceedings of the 2016 IEEE Asia Pacific Conference on Circuits
and Systems (APCCAS), October 2016, pp. 37–40.
APPENDICES
Appendix A: Copyright Notices
The following notice is for the material in Chapter 2.
Title: Efficiency, Stability, and Reliability Implications of Unbalanced Current Sharing Among Distributed On-Chip Voltage Regulators
Author: Longfei Wang
Publication: Very Large Scale Integration Systems, IEEE Transactions on
Publisher: IEEE
Date: Nov. 2017
Copyright © 2017, IEEE
Thesis / Dissertation Reuse
The IEEE does not require individuals working on a thesis to obtain a formal reuse license, however,
you may print out this statement to be used as a permission grant: 
Requirements to be followed when using any portion (e.g., figure, graph, table, or textual material) of an IEEE
copyrighted paper in a thesis:
1) In the case of textual material (e.g., using short quotes or referring to the work within these papers) users
must give full credit to the original source (author, paper, publication) followed by the IEEE copyright line © 2011
IEEE. 
2) In the case of illustrations or tabular material, we require that the copyright line © [Year of original
publication] IEEE appear prominently with each reprinted figure and/or table. 
3) If a substantial portion of the original paper is to be used, and if you are not the senior author, also obtain the
senior author's approval. 
Requirements to be followed when using an entire IEEE copyrighted paper in a thesis: 
1) The following IEEE copyright/ credit notice should be placed prominently in the references: © [year of original
publication] IEEE. Reprinted, with permission, from [author names, paper title, IEEE publication title, and
month/year of publication] 
2) Only the accepted version of an IEEE copyrighted paper can be used when posting the paper or your thesis
on-line.
3) In placing the thesis on the author's university website, please display the following message in a prominent
place on the website: In reference to IEEE copyrighted material which is used with permission in this thesis, the
IEEE does not endorse any of [university/educational entity's name goes here]'s products or services. Internal or
personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for
advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to
http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License
from RightsLink. 
If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies
of the dissertation.
    
 Copyright © 2018 Copyright Clearance Center, Inc. All Rights Reserved. Privacy statement. Terms and Conditions. Comments? We would like to hear from you. E-mail us at customercare@copyright.com 
Appendix A (Continued)
The following notices are for the material in Chapters 3 and 4.
Title: Mitigation of NBTI induced performance degradation in on-chip digital LDOs
Conference Proceedings: 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Author: Longfei Wang
Publisher: IEEE
Date: March 2018
Copyright © 2018, IEEE
Thesis / Dissertation Reuse
The IEEE does not require individuals working on a thesis to obtain a formal reuse license, however,
you may print out this statement to be used as a permission grant: 
Requirements to be followed when using any portion (e.g., figure, graph, table, or textual material) of an IEEE
copyrighted paper in a thesis:
1) In the case of textual material (e.g., using short quotes or referring to the work within these papers) users
must give full credit to the original source (author, paper, publication) followed by the IEEE copyright line © 2011
IEEE. 
2) In the case of illustrations or tabular material, we require that the copyright line © [Year of original
publication] IEEE appear prominently with each reprinted figure and/or table. 
3) If a substantial portion of the original paper is to be used, and if you are not the senior author, also obtain the
senior author's approval. 
Requirements to be followed when using an entire IEEE copyrighted paper in a thesis: 
1) The following IEEE copyright/ credit notice should be placed prominently in the references: © [year of original
publication] IEEE. Reprinted, with permission, from [author names, paper title, IEEE publication title, and
month/year of publication] 
2) Only the accepted version of an IEEE copyrighted paper can be used when posting the paper or your thesis
on-line.
3) In placing the thesis on the author's university website, please display the following message in a prominent
place on the website: In reference to IEEE copyrighted material which is used with permission in this thesis, the
IEEE does not endorse any of [university/educational entity's name goes here]'s products or services. Internal or
personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for
advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to
http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License
from RightsLink. 
If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies
of the dissertation.
Appendix A (Continued)
Title: Exploiting Algorithmic Noise Tolerance for Scalable On-Chip Voltage Regulation
Author: Longfei Wang
Publication: Very Large Scale Integration Systems, IEEE Transactions on
Publisher: IEEE
Date: Dec 31, 1969
Copyright © 1969, IEEE
Thesis / Dissertation Reuse
The IEEE does not require individuals working on a thesis to obtain a formal reuse license, however,
you may print out this statement to be used as a permission grant: 
Requirements to be followed when using any portion (e.g., figure, graph, table, or textual material) of an IEEE
copyrighted paper in a thesis:
1) In the case of textual material (e.g., using short quotes or referring to the work within these papers) users
must give full credit to the original source (author, paper, publication) followed by the IEEE copyright line © 2011
IEEE. 
2) In the case of illustrations or tabular material, we require that the copyright line © [Year of original
publication] IEEE appear prominently with each reprinted figure and/or table. 
3) If a substantial portion of the original paper is to be used, and if you are not the senior author, also obtain the
senior author's approval. 
Requirements to be followed when using an entire IEEE copyrighted paper in a thesis: 
1) The following IEEE copyright/ credit notice should be placed prominently in the references: © [year of original
publication] IEEE. Reprinted, with permission, from [author names, paper title, IEEE publication title, and
month/year of publication] 
2) Only the accepted version of an IEEE copyrighted paper can be used when posting the paper or your thesis
on-line.
3) In placing the thesis on the author's university website, please display the following message in a prominent
place on the website: In reference to IEEE copyrighted material which is used with permission in this thesis, the
IEEE does not endorse any of [university/educational entity's name goes here]'s products or services. Internal or
personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for
advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to
http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License
from RightsLink. 
If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies
of the dissertation.
ABOUT THE AUTHOR
Longfei Wang received the B.S. degree in electronic information engineering from Southwest
Jiaotong University, Chengdu, China, in 2010, and the M.S. degree in electrical engineering from
Texas Tech University, Lubbock, TX, USA, in 2013. He is currently pursuing the Ph.D. degree in
electrical engineering with the University of South Florida, Tampa, FL, USA. His main research
interests include on-chip voltage regulation and power management.
