Inductive noise characterization and mitigation techniques for modern computer systems by Smith, Andrew Timothy
© 2020 Andrew Timothy Smith
INDUCTIVE NOISE CHARACTERIZATION AND MITIGATION




Submitted in partial fulfillment of the requirements
for the degree of Master of Science in Electrical and Computer Engineering
in the Graduate College of the
University of Illinois at Urbana-Champaign, 2020
Urbana, Illinois
Adviser:
Associate Professor Rakesh Kumar
ABSTRACT
As the processor market continues to diversify, and device form factor contin-
ues to change, the challenges of processor power delivery and voltage noise
become increasingly difficult to address. This thesis characterizes voltage
emergencies and evaluates voltage noise mitigation techniques across a range
of processors and power distribution networks. Each processor class has
unique design constraints that affect voltage noise to differing degrees. This
work considers three processor and PDN combinations representative of the
current processor market. We evaluate the voltage noise observed, and for
each system we evaluate three predictive architectural techniques to mitigate
voltage noise. We describe the challenges inherent to each of these tech-
niques, and the potential improvements to the architectural techniques. We
also present Predict-T, a modern simulation framework for evaluating power
supply and processor interaction. The framework is useful for evaluating
collaborative circuit level and architecture level noise mitigation techniques.
Finally, an instruction dependency-based prediction mechanism is proposed
and evaluated across the different processors.
To my parents Tim and Julie Smith
ACKNOWLEDGMENTS
I would like to acknowledge the guidance of my adviser, Professor Rakesh
Kumar, without whom this research would not be possible. I would also like
to acknowledge the support of my peers Husnain Mubarik, Jingyang Liu,
Matthew Tomei, Nathan Bleier, Nicholas Cebry, Adam Auten, and Ramakr-
ishna Kanungo. I am also indebted to Professor Yue Cao, Professor Pavan




LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
CHAPTER 2 VOLTAGE EMERGENCIES
2.1 Power Distribution Network
2.2 Impedance and Droop
2.3 Circuit-level Solutions
CHAPTER 3 ARCHITECTURAL TECHNIQUES TO MITIGATE VOLTAGE EMERGENCIES
3.1 Reactive Mechanisms
3.2 Predictive Mechanisms
CHAPTER 4 POWER DISTRIBUTION NETWORKS FOR MODERN SYSTEMS
4.1 Power Distribution Configuration
CHAPTER 5 A METHODOLOGY FOR EVALUATING PDN-ARCHITECTURE INTERACTIONS
5.1 Simulating Processors and PDNs
5.2 Predict-T: A Simulation Framework for Voltage Emergency Mitigation
5.3 Processor Configurations
CHAPTER 6 CHARACTERIZING VOLTAGE EMERGENCIES ACROSS SYSTEMS
6.1 Desktop Class
6.2 Laptop Class
CHAPTER 7 CHARACTERIZING EFFICACY OF PREDICTION MECHANISMS ACROSS SYSTEMS
7.1 Voltage Sensors
7.2 µArchitecture Event Predictor and Signature Predictor
7.3 All Things Considered
CHAPTER 8 POTENTIAL IMPROVEMENTS TO VOLTAGE EMERGENCY PREDICTORS
8.1 Addressing DeCoR Domino Effect
8.2 Improved Prediction through Instruction Dependency Analysis
8.3 Reconsidering Throttling As a Mitigation Technique
CHAPTER 9 CONCLUSION
REFERENCES
LIST OF TABLES
5.1 Processor Parameters for the Mobile Processor Class
5.2 Mobile Processor Electrical Specification
5.3 Processor Parameters for the Laptop Processor Class
5.4 Laptop Processor Electrical Specification
5.5 Processor Parameters for the Desktop Processor Class
5.6 Desktop Processor Electrical Specification
8.1 Power Distribution Network Parameters for Controlled Models
LIST OF FIGURES
2.1 First- and second-order voltage droop during load step
3.1 State machine for a generic reactive mechanism
3.2 Black box power prediction unit and the underlying state machine of a power prediction unit
3.3 Microarchitectural Event Predictor’s black box circuitry
3.4 Inner workings of signature based predictor showing CAM and Bloom filter
4.1 Mobile class PDN frequency response
4.2 Mobile class PDN step response
4.3 Full mobile PDN model
4.4 Approximated mobile PDN model
4.5 Laptop class PDN frequency response
4.6 Laptop class PDN step response
4.7 Full laptop PDN model
4.8 Approximated laptop PDN model
4.9 Desktop class PDN frequency response
4.10 Desktop class PDN step response
4.11 Full desktop PDN model
4.12 Approximated desktop PDN model
5.1 Simulation framework
6.1 Desktop processor class, single-core current (left) and a detailed view (right)
6.2 Desktop processor class, single-core voltage (left) and a detailed view (right)
6.3 Total number of voltage emergencies over execution time for desktop class Core+PDN
6.4 Laptop processor class, single-core current (left) and a detailed view (right)
6.5 Laptop processor class, single-core voltage (left) and a detailed view (right)
6.6 Total number of voltage emergencies over execution time for laptop class Core+PDN
6.7 Mobile processor class, single-core current (left) and a detailed view (right)
6.8 Mobile processor class, single-core voltage (left) and a detailed view (right)
6.9 Total number of voltage emergencies over execution time for mobile class Core+PDN
6.10 Voltage emergency rates across the different classes of CPU and PDN
7.1 Total number of voltage threshold crossings over execution time for desktop class Core+PDN
7.2 Total number of voltage threshold crossings over execution time for laptop class Core+PDN
7.3 Total number of voltage threshold crossings over execution time for mobile class Core+PDN
7.4 Speedup with respect to a DeCoR only system for the Desktop Class CPU and PDN
7.5 Speedup with respect to a DeCoR only system for the Laptop Class CPU and PDN
7.6 Speedup with respect to a DeCoR only system for the Mobile Class CPU and PDN
7.7 Reduction in the number of voltage emergencies across the three systems
8.1 DeCoR domino effect. Immediately after the DeCoR event is over, the device transitions back to a high dynamic power state causing a droop event.
8.2 Reduction in number of voltage emergencies with throttling on restore
8.3 Speedup with respect to a DeCoR only system for the desktop class CPU and PDN
8.4 Speedup with respect to a DeCoR only system for the laptop class CPU and PDN
8.5 Speedup with respect to a DeCoR only system for the mobile class CPU and PDN
8.6 Average speedup across the three different systems with realistic PDN models
8.7 Number of VEs improved for all except qsort with the signature predictor
8.8 Average improvement in number of VEs across the three realistic systems
8.9 Desktop voltage emergencies observed across prediction mechanisms
8.10 Mobile voltage emergencies observed across prediction mechanisms
8.11 Harvard PDN frequency response plots for the three classes of processor
8.12 Average improvement in number of VEs across the three controlled PDN systems
8.13 Average speedup across the three controlled PDN systems
LIST OF ABBREVIATIONS
AMS Analog and Mixed Signal







L# Level # Cache
PCB Printed Circuit Board




SIMD Single Instruction Multiple Data
SQ Store Queue
TAS Throttle after Stall
VAMS Verilog Analog Mixed Signal
VE Voltage Emergency
VPI Verilog Procedural Interface




As processors continue to scale towards the limits of physics, so do the chal-
lenges of power delivery. Processor technology has scaled much more rapidly
than the more mature field of power distribution network design, which has
made the challenge of mitigating voltage noise in microprocessors an increasingly
difficult issue. As processors become more complex and optimized for
performance, it also becomes harder to manage the rate at which the operating
point can shift. A single processor core can transition from operating
at a low dynamic power state to a high-power state in as many cycles as
the pipeline is deep. The transitions create large magnitude supply current
steps with high slew rates which in turn excite high frequency impedance
peaks in the power delivery network. Increasingly optimized core designs
only increase the slew rate as the processor can shift operating points more
rapidly.
Effective power distribution network design is further challenged by other
recent developments in processor technology such as the almost exponential
increase in the number of cores per device. Today processors are produced
with upwards of 64 cores in a single socket solution [1], [2]. In collaborative
multi-threaded applications, synchronization points, where all cores transi-
tion from a waiting state to an executing state, can cause a large magnitude
current step [3], [4]. Power distribution network design is also increasingly
constrained by form factor, especially in mobile applications where printed
circuit board space is expensive, posing challenges to designers. All of these
developments place a burden on engineers to design large voltage margins
into the device or require an increase in power distribution size and cost.
Previously, voltage emergency mitigation has been studied at both the
circuit level and at the architecture level. Circuit level techniques have the
advantage of not incurring an execution time penalty. However, they usu-
ally incur the cost of increased power consumption and reduced efficiency.
Techniques such as adding auxiliary circuits to source or sink current dur-
ing transients [5], [6] either have physical limitations due to sensor delay or
require accurate load transient information far in advance of the transient.
Other circuit level techniques have attempted to move the off-chip switching
voltage regulators onto the die [7]. Although the control bandwidth is orders
of magnitude larger than the control bandwidth of an off-chip regulator, they
are still susceptible to first-order voltage droop.
Architectural level techniques attempt to control the rate of change from
one operating point to another through throttling [8], [9], dynamic instruction issue
width [10], or no-op insertion [11]. When these techniques are activated by
voltage sensors, the sensor delay and phase delay of the PDN can limit their
effectiveness. Thus, predictive mechanisms have been proposed to identify
transient occurrences in advance. Although these techniques are effective at
mitigating voltage noise, invoking them incurs a performance
penalty.
There have also been many new developments in device technology as
well as shifts in device form factor that constrain the PDN designs, such
as 2.5D and 3D integration techniques [12]. The extra interface layers from
interposers and stacked chiplets all add complexity to the PDN. These PDNs
in conjunction with the modern processors have exacerbated the problems of
mitigating voltage emergencies. The efficacy of a prediction mechanism and
mitigation technique are closely coupled when considered in the context of
modern PDN and processor design.
We characterize modern processors from the mobile, laptop, and desktop
classes of devices and their corresponding PDNs. Each PDN and processor
pair is evaluated with six architectural voltage emergency mitigation tech-
niques and the efficacy of each is analyzed. Each configuration has unique
processor-PDN interactions that impact the efficacy of the mitigation tech-
nique. Further research is required to implement mitigation techniques that
are specific to the nuances of the PDN and processor pairs.
In Chapter 2, we discuss the basics of power distribution networks, the
origins of voltage emergencies, and the negative impacts on processor exe-
cution. Various state-of-the-art circuit-level droop mitigation techniques are
presented, and their corresponding design trade-offs are analyzed.
In Chapter 3, we cover the various architectural techniques that have been
proposed to mitigate voltage emergencies, their implementation details, and
design tradeoffs. There are two distinct classes of architectural mitigation
techniques: predictive and reactive. To achieve the greatest reduction in
voltage emergencies, a predictive mechanism must be used. However, there
are many challenges in designing an effective predictive mechanism.
In Chapter 4, we discuss the need for PDN designs adapted for specific
classes of processors. We present and analyze three PDN designs for a modern
mobile, laptop, and desktop processor.
In Chapter 5, we introduce the three processor configurations and their
electrical specifications. We also present an overview of Predict-T — our
simulation framework used to evaluate the PDN-processor-predictor combi-
nations.
In Chapter 6, we analyze the quantity, magnitude, and rate of voltage
emergencies across the three classes of systems. Our research shows how
different PDN and processor pairings create different noise profiles across
the benchmark applications.
In Chapter 7, we analyze the effectiveness of the voltage emergency miti-
gation techniques, and show how differing PDN-processor interactions limit
the efficacy of prior mitigation techniques. We also discuss limitations of
these techniques and finish with a discussion of the performance degrading
factors for the mobile and desktop class devices.
In Chapter 8, we present some possible improvements over prior baseline
predictive mechanisms. In this chapter we also propose two derived pre-
diction mechanisms based on the state of the processor and compare their
efficacy to the prior prediction mechanisms. Finally, we discuss some of the
issues of using throttling as a mitigation technique and highlight the need
for developing new methods for reducing slew rate while maintaining perfor-
mance.
The Predict-T simulator framework provides a rapid way to implement and
evaluate processor-PDN interactions. Our simulator combines the benefits of
detailed RTL simulations by modeling the PDN in Verilog AMS [13], with the
speed of cycle accurate simulators. We present the challenges of mitigating
the voltage noise on modern systems with prior architectural techniques, and
we adapt two state-based techniques for predicting voltage emergencies.
The work described in this thesis was part of a larger multidisciplinary
collaborative project between the author, students and professors in the Pas-
sat Research Group at UIUC, and Integrated Circuits and Systems Group
at UIUC. We received guidance from researchers at Intel and ARM. The
author was primarily responsible for background research and building the
Predict-T simulation framework to perform rapid studies of PDN-processor
interactions. Other students in the Passat Group assisted with modifying
and testing the tools that comprise the framework. Students from the
Integrated Circuits and Systems Group assisted with the PDN modeling for
the three test systems. As processors and device form factors continue to
change, in order to address voltage droop in modern systems there must be





A voltage emergency occurs when the supply voltage drops below
the rated Vmin of the processor. Each device has an electrical specification
requiring that, at all the operating points of the device, the supply voltage
remains within a range of the nominal voltage of the device (Vnom). When
voltage noise is severe, it is possible for the voltage to temporarily drop
below the lower bound or rise above the upper bound. When this occurs the
timing of the circuits is affected, and setup or hold time violations can occur.
In turn, timing violations can cause an incorrect value to be stored in the
next pipeline stage, thereby affecting the correctness of program execution.
To understand voltage emergencies, a basic knowledge of
the topology of the power distribution network (PDN) is required.
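
As a minimal sketch of the definition above, voltage emergencies can be counted in a sampled supply-voltage trace as excursions outside the allowed band around Vnom. The function name, the 5% tolerance, and the idea of a per-cycle sampled trace are illustrative assumptions and are not taken from the thesis or from any device specification.

```python
def count_voltage_emergencies(trace, v_nom, tolerance=0.05):
    """Count excursions of a sampled voltage trace outside the allowed band.

    tolerance is a hypothetical fractional margin around v_nom; a real
    device takes its bounds from the electrical specification.
    """
    v_min = v_nom * (1.0 - tolerance)
    v_max = v_nom * (1.0 + tolerance)
    emergencies = 0
    in_emergency = False
    for v in trace:
        out_of_band = v < v_min or v > v_max
        # Count only the first sample of each excursion so that one long
        # droop is not counted once per sample.
        if out_of_band and not in_emergency:
            emergencies += 1
        in_emergency = out_of_band
    return emergencies
```

Counting excursions rather than out-of-band samples is a design choice made here for illustration: a single droop that lasts several cycles then contributes one emergency.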
2.1 Power Distribution Network
A power distribution network is the path from the power supply to the load.
Normally there are four stages in a power distribution network: the voltage
regulator module (VRM), the printed circuit board (PCB), the package, and
the die. The voltage regulator module is usually a switched mode power
supply (SMPS) due to its high efficiency for converting the system voltage
down to the low voltage, high current requirements of modern processors.
The SMPS commonly switches across an inductor. Due to the physical nature
of the SMPS it cannot change its output current as fast as the
load can change its operating point. To ensure that the voltage remains
within an acceptable tolerance required by the device, bulk capacitors are
placed at each stage of the power delivery network. These capacitors act as
a stopgap to supply current while the SMPS responds at a much slower rate
to the new operating point. Each PDN stage consists of a trace or copper
pour depending on the current requirement, and a bulk capacitance.
2.2 Impedance and Droop
In reality, the capacitors and traces are not ideal and have an intrinsic par-
asitic inductance. Parasitic inductance at each stage of the PDN resists the
change in current required by the load. This parasitic inductance, along
with the resistance of the copper pour and the bulk capacitances, forms an
RLC circuit. These RLC circuits formed by each PDN stage have resonant
frequencies determined by the inductance and the capacitance. Load transitions can
excite these resonant frequencies and cause a voltage droop at the device. By
convention, each stage of the PDN corresponds to a different order droop.
The die stage has the smallest L and C values and the highest frequency
resonant peak. This resonant peak is known as the first-order peak and is
responsible for first-order droop. Working back towards the VRM, the stages
usually have larger capacitances and thus lower frequency peaks and are re-
sponsible for second- and third-order droops. The first- and second-order
resonances can clearly be seen in Figure 2.1. The high frequency sinusoid is
from the die stage of the PDN and the lower frequency sinusoid is produced
by the package stage. The voltage droop due to the stages of the PDN is
commonly referred to as inductive noise or di/dt noise.
Figure 2.1: First- and second-order voltage droop during load step
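
To illustrate why the die stage produces the highest frequency peak, the sketch below evaluates the standard resonant frequency of an ideal LC pair, f = 1 / (2π√(LC)), for each stage. The inductance and capacitance values are hypothetical placeholders chosen only to show the ordering of the peaks; they are not the PDN parameters used in this thesis.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC pair: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Hypothetical per-stage parasitic inductance (H) and bulk capacitance (F).
# These are illustrative values only, not measured PDN parameters.
stages = {
    "die": (10e-12, 100e-9),
    "package": (100e-12, 10e-6),
    "pcb": (1e-9, 1e-3),
}

for name, (l_h, c_f) in stages.items():
    freq_mhz = resonant_frequency_hz(l_h, c_f) / 1e6
    print(f"{name}: {freq_mhz:.3f} MHz")
```

With these placeholder values the die stage resonates at the highest frequency and the PCB stage at the lowest, matching the first-, second-, and third-order naming convention described above.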
2.3 Circuit-level Solutions
There have been many proposed solutions for handling voltage emergencies
at the circuit level. These proposed solutions have differing implementation
challenges, with some more feasible than others.
One approach is the Intel Fully Integrated Voltage Regulator (FIVR) [7].
In the field of power electronics, there has been a recent movement to point
of load (PoL) power supplies. These supplies move the power supply reg-
ulation as close to the load as possible. Relocating the regulator closer to
the load improves the response of the feedback loop. Intel’s FIVR technol-
ogy moves the switching regulator directly onto the chip and integrates the
power inductors into the package substrate. The switching transistors are
implemented on the same process as the processor and thus can switch on
the order of 100 MHz. The control bandwidth of a SMPS is on the order of
one tenth of the switching frequency and scales with the number of phases
(the number of supplies in parallel operating at staggered intervals) [6]. The
combination of high switching frequency and many phases gives FIVR an
impressive control bandwidth that can effectively counteract the second- and
third-order droop. However, it is still susceptible to the first-order droop.
Another solution is to use a current shunt mechanism to source and sink
current during the load transients. The challenge with this approach is that
it requires a voltage level greater than Vnom to be maintained on the device
[14]. This is costly, as a percentage of the area available to integrate the
metal-insulator-metal (MIM) capacitors [15] must be devoted to holding the
charge for the higher potential. The reduction in capacitance on the normal
voltage line would increase the die impedance peak. However, this solution
has been successfully implemented in a commercial 10nm CMOS process and
has shown promising reductions of up to 74% in voltage droop [16].
Other approaches attempt to use an auxiliary circuit that can reduce the
slew rate of the device seen by the power distribution network. These circuits
source or sink current leading up to a load transition. These approaches are
disadvantaged due to losses in the inductor and capacitor. These circuits also





As discussed in Chapter 2, implementing solutions to inductive noise at the
circuit level is costly and difficult. Architectural solutions can be effective and
less expensive as they can be implemented on die and require no extra cir-
cuits or different voltage domains. The architectural solutions can roughly be
divided into two distinct classes: reactive mechanisms and predictive mecha-
nisms. The following sections discuss their differences and implementations.
3.1 Reactive Mechanisms
Reactive mechanisms merely detect and react to the voltage emergency by
invoking a safety net mechanism that ensures correctness of execution but
does nothing to stop the emergency from occurring. The state machine for a
generic reactive mechanism is shown in Figure 3.1. There are two states: a
normal state and an emergency handler state. When in the normal state, the
processor is executing as expected. If the processor detects that a voltage
emergency occurred, then it transitions to the emergency handler state. In
the emergency handler state, the processor stalls execution and invokes a
checkpoint and restore mechanism. After the restore completes, the device
transitions back to the normal state and continues execution. This approach
is known as a checkpoint and rollback mechanism.
Figure 3.1: State machine for a generic reactive mechanism
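
The two-state behavior described above can be sketched as a small cycle-stepped model. This is an illustrative sketch only: the names, the per-cycle boolean detection input, and the three-cycle restore latency are assumptions made for the example, not details of any specific implementation.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    EMERGENCY_HANDLER = "emergency_handler"


def run_reactive_mechanism(emergency_flags, restore_cycles=3):
    """Step the two-state machine once per cycle and return the visited states.

    emergency_flags: iterable of booleans, True when a voltage emergency is
    detected in that cycle. restore_cycles is a hypothetical restore latency.
    """
    state = State.NORMAL
    remaining = 0
    history = []
    for detected in emergency_flags:
        if state is State.NORMAL:
            if detected:
                # Stall execution and begin the checkpoint restore.
                state = State.EMERGENCY_HANDLER
                remaining = restore_cycles
        else:
            # Detections are ignored while the restore is in progress.
            remaining -= 1
            if remaining == 0:
                # Restore finished; resume normal execution.
                state = State.NORMAL
        history.append(state)
    return history
```

In this sketch the handler state is occupied for exactly restore_cycles cycles after each detected emergency, which is the execution time penalty that a reactive mechanism pays per event.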
3.1.1 DeCoR
An implementation of a checkpointing and rollback mechanism specifically
designed to handle voltage emergencies is the DeCoR [18] delayed commit
and rollback scheme. This mechanism was designed to be lightweight and low
latency compared to traditional error correction mechanisms. The need for
a lightweight mechanism designed for voltage emergency recovery originates
from the realization that voltage emergencies are becoming more frequent.
At the rate voltage emergencies occur, the overhead of using traditional error
correcting checkpointing schemes would be prohibitive. In all the following
results and comparisons, the DeCoR mechanism is modeled to take 100 cycles
to restore and roll back to the last saved execution context prior to the voltage
emergency.
3.2 Predictive Mechanisms
The predictive mechanism attempts to predict if a voltage emergency is going
to occur within the next X cycles and takes an action to mitigate the voltage
emergency. The action can be throttling, or it could be something more
elaborate like restricting functional unit utilization or inserting no-ops like
the techniques proposed by Powell and Vijaykumar [10], [11]. In the most
generic sense, the predictive mechanism is a black box circuit that takes in a
series of input signals [i0, ..., in−1] and produces an action signal (Figure 3.2).
If the action signal is non-zero, then an action is performed to mitigate the
voltage emergency. All the predictive mechanisms have the same underlying
state machine (Figure 3.2).
Figure 3.2: Black box power prediction unit and the underlying state
machine of a power prediction unit
All predictive mechanisms need a fallback mechanism to ensure correctness
of execution in the event of a misprediction. All the following mechanisms
use DeCoR commit and rollback to handle a voltage emergency in the event
of an incorrect prediction.
3.2.1 Voltage Sensors
The first predictive mechanism is a voltage sensor-based approach. This
approach is evaluated by Gupta et al. [8] and Reddi et al. [9]. The prediction
unit takes in inputs from sensors scattered across the die area and compares
the values with a threshold voltage. This threshold voltage is chosen such
that the following constraint is satisfied: Vemergency < Vthreshold < Vnominal.
If Vthreshold is selected aggressively (close to Vnominal), then many false positives
may be issued. For all the following voltage sensor evaluations, Vthreshold is chosen to
be 98% of the Vnominal of the device.
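
A minimal sketch of the threshold comparison follows. The 98% default comes from the text above; the function names, the rule that a single sensor below the threshold is enough to trigger an action, and the example voltages are assumptions made for illustration.

```python
def make_sensor_predictor(v_nominal, v_emergency, threshold_fraction=0.98):
    """Build a predictor that flags a likely emergency from sensor readings.

    Enforces V_emergency < V_threshold < V_nominal, with V_threshold set to
    threshold_fraction * V_nominal.
    """
    v_threshold = threshold_fraction * v_nominal
    if not (v_emergency < v_threshold < v_nominal):
        raise ValueError("threshold must lie between V_emergency and V_nominal")

    def predict(sensor_readings):
        # Assumption: act if any sensor on the die reads below the threshold.
        return min(sensor_readings) < v_threshold

    return predict
```

For example, with a hypothetical Vnominal of 1.0 V and Vemergency of 0.95 V, readings of 1.0 V and 0.975 V would trigger the mitigation action because 0.975 V is below the 0.98 V threshold.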
3.2.2 Microarchitectural Event Predictor
The work performed by Gupta et al. [8] highlights the strong causality be-
tween microarchitectural events such as an L2 cache miss or branch mispre-
diction and a voltage emergency. Essentially these microarchitectural events
are associated with the processor being stalled, blocked, or squashed. During
these events, the processor is in a state of low computational activity, and
after the event completes, execution returns to normal. This results in
a load step and potentially a voltage emergency. Gupta et al. also point out
that there are only a few unique instruction addresses and event combina-
tions for programs that lead to these voltage emergencies, so they use a small
content-addressable memory to store the PC and event that have caused a
voltage emergency in the past. Each time the event updates, the CAM is
checked for a match with the current <PC, Event> pair. If there is a match,
then a throttling action is issued. The black box prediction logic is shown in
Figure 3.3. The prediction circuit takes in a <PC, Event> pair and outputs
a binary throttle / no throttle decision.
Figure 3.3: Microarchitectural Event Predictor’s black box circuitry
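
The lookup described above can be sketched in software as a small table keyed by <PC, Event> pairs. This is a behavioral sketch, not the hardware design from the cited work: the class and method names, the capacity, and the least-recently-used replacement policy are all assumptions made for the example.

```python
from collections import OrderedDict


class EventPredictor:
    """Behavioral model of a small CAM holding <PC, Event> pairs."""

    def __init__(self, capacity=8):
        self.capacity = capacity      # hypothetical number of CAM entries
        self.cam = OrderedDict()      # keys are (pc, event) pairs

    def predict(self, pc, event):
        """Return True (throttle) if this pair caused an emergency before."""
        key = (pc, event)
        if key in self.cam:
            self.cam.move_to_end(key)  # mark as most recently used
            return True
        return False

    def record_emergency(self, pc, event):
        """Store the pair that preceded an observed voltage emergency."""
        key = (pc, event)
        self.cam[key] = True
        self.cam.move_to_end(key)
        if len(self.cam) > self.capacity:
            # Assumed LRU replacement: drop the least recently used entry.
            self.cam.popitem(last=False)
```

A predictor built this way outputs the binary throttle / no throttle decision directly from a table hit, so its accuracy depends on how many distinct <PC, Event> pairs the table can hold.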
3.2.3 Event Signature Predictor
Reddi et al. [9] make three notable modifications to the microarchitectural
event predictor to improve its efficacy. First, they reduce the aliasing of
the input data to the predictor by taking a longer signature of events. This
signature is an array of the last n events to have occurred in the processor.
Second, a long-term storage container to remember signatures and PCs that
have been evicted from the CAM is added. The underlying container is a
Bloom filter, which hashes the value n-ways and sets the bits in a bit array to
1 corresponding to the hashes. This type of container can only be added to;
elements cannot be removed, which may lead to higher false positive rates
over time. Finally, the addition of a counter stored along with the <PC,
Signature> pairs acts as a threshold to store evicted elements in the Bloom
filter if they have been observed more than threshold number of times. The
resulting black box logic is shown in Figure 3.4.
Figure 3.4: Inner workings of signature based predictor showing CAM and
Bloom filter
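
The three modifications can be sketched together as follows: a sliding signature of recent events, a CAM with per-entry counters, and a Bloom filter that receives frequently observed entries when they are evicted. This is a behavioral sketch under stated assumptions. The class names, table sizes, SHA-256 based hashing, and LRU eviction are illustrative choices, not details of the design in the cited work.

```python
import hashlib
from collections import OrderedDict, deque


class BloomFilter:
    """Add-only set membership structure; it may report false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class SignaturePredictor:
    """Behavioral model of a <PC, Signature> predictor with Bloom backing."""

    def __init__(self, signature_len=4, cam_capacity=8, threshold=2):
        self.events = deque(maxlen=signature_len)  # last n events
        self.cam = OrderedDict()   # (pc, signature) -> times observed
        self.cam_capacity = cam_capacity
        self.threshold = threshold
        self.bloom = BloomFilter()

    def observe_event(self, event):
        self.events.append(event)

    def _key(self, pc):
        return (pc, tuple(self.events))

    def predict(self, pc):
        """Return True (throttle) on a CAM hit or a Bloom filter hit."""
        key = self._key(pc)
        return key in self.cam or key in self.bloom

    def record_emergency(self, pc):
        key = self._key(pc)
        self.cam[key] = self.cam.get(key, 0) + 1
        self.cam.move_to_end(key)
        if len(self.cam) > self.cam_capacity:
            evicted, count = self.cam.popitem(last=False)  # assumed LRU
            # Keep only entries seen more than threshold times.
            if count > self.threshold:
                self.bloom.add(evicted)
```

Because the Bloom filter cannot delete entries, every evicted signature that passes the counter threshold permanently sets bits, which is the source of the rising false positive rate noted above.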
3.2.4 Other Notable Approaches
Although the primary focus of this thesis is on the performance of the
prediction mechanisms highlighted in Subsections 3.2.1, 3.2.2, and 3.2.3, there
have been other proposed architectural voltage noise prediction techniques.
Kayanam et al. [19] propose a statistical analysis-based approach to predict-
ing the power in a scalar and vector processor. They use stall probe points in
the processor pipeline and watch for the occurrence of select microarchitec-
tural events. Using trace data collected from test applications they generate
a statistical model for predicting the instantaneous power. Although this
method is meant for use in the pre-silicon design stage of the processor and
not for integration into the processor, their study is important as
they can extract accurate power predictions by tracking stall points in the
processor and using microarchitectural events. The prediction mechanism
proposed by Leng et al. [20], although tuned for GPU compute platforms,
uses analysis of the dispatched instruction type and quantity along with regis-
ter file accesses to predict a voltage droop. We attempt a basic version of this
mechanism analyzing quantity of instructions issued by the dispatch stage
in Chapter 8. The technique proposed by Miller et al. [3] focuses on power
supply droop caused by synchronization in collaborative multi-threaded
applications. The voltage emergencies caused by barrier entry and exit last
on the order of microseconds. These types of
voltage emergency are better suited to be mitigated through software and the
OS. Other work by Reddi et al. [21] analyzes the constructive and destructive
effects of voltage noise in a multi-programmed environment. Their proposed
solution is to schedule tasks that have destructive interference on separate





Previous works have developed and evaluated voltage noise mitigation tech-
niques in the context of desktop processors. The prior works have used
simplified PDN models from early Intel Pentium chips. However, when eval-
uating voltage noise across different systems it is important to use a PDN
model that suits a device’s electrical requirements. It is not sufficient to
connect a mobile processor to a desktop class PDN as the desktop PDN was
designed to supply a much larger load current and is subject to different
spatial constraints than a PDN designed for a mobile device. Thus, updated
PDN designs representative of modern PDNs are required for each class of
processor: mobile, laptop, and desktop. In Section 4.1 the three specialized
classes of PDN are described and their operating ranges are characterized.
4.1 Power Distribution Configuration
Power distribution networks are constrained by the device form factor, cost,
and physics. Each device and PDN class is limited to a differing degree.
There have been prior power distribution works by Zhang et al. [22], Herrell
and Becker [23], and Smith et al. [24] that study voltage noise in distributed
grid models of the power distribution network. This type of power distri-
bution network is useful for studying intradie voltage droop; however, when
studying the effects of the power distribution network in the absence of a
chip floor plan, or information on the package’s C4 pad placement, a sim-
pler lumped model can be used. Impedance profiles seen by the device as a
whole are nearly identical in the distributed model and the lumped model,
as shown in the work by M. S. Gupta et al. [25]. The power distribution
networks do not model the switching behavior of the power supply, nor do
they model the feedback loop of the VRM. Intel has been able to achieve a
VRM bandwidth of 100 MHz with their FIVR on-chip switching regulators
[7]; however, for off-chip switching regulators, the control bandwidth is 1 to
2 orders of magnitude lower [26], [27], [28]. The greater distance between the
load and the regulation point can limit the regulation performance as the
feedback loop is subject to the same parasitic inductance and capacitance
as the PDN. Common feedback points are either at the package pin or the
die [29], [30]. The high order impedance peaks are usually outside of the
control bandwidth of the VRM. This allows the PDN to be approximated
by removing stages that result in the lower order impedance peaks. The
result is a simplified model that only simulates the effects of the high fre-
quency impedance; otherwise, the DC IR droop normally compensated by
the feedback loop can result in voltage emergencies from DC current alone.
Similar PDN approximations are used by Gupta et al. [8] and Reddi et al. [9]
for the PDN models used in their analysis. In all the following systems the
approximate PDN model closely follows the impedance profile of the high
frequency peaks and matches the step response of the full model, albeit
with slightly less DC IR droop. Improvements to the PDN model such as
modeling a VRM with feedback are reserved for future work.
4.1.1 Mobile Power Distribution Network
The mobile PDN is most limited by the device form factor and the cost of
the solution, posing a difficult design challenge. Mobile devices are often on
the cutting edge of the process node [31] to extract the most performance
and battery hours in as small a die area as possible. The smaller die size
limits the die capacitance and the smaller device package limits the package
capacitance. In smart phones and tablets the motherboard may only be two
or three square inches as a majority of the internal volume is used to fit
larger batteries. The processor and power delivery network need to share
this confined space with all of the other devices such as memory, battery
management, peripheral controllers, and RF power stages, leaving very little
space for decoupling capacitors at each stage of the PDN (Figure 4.3). The
incredibly dense PCB design also limits the layers the PDN can use and
the trace widths. All these factors lead to a PDN with high impedance
peaks from the package and die stages. The limited capacitance results in
impedance peaks of 0.3 Ω for the die stage and 0.05 Ω for the package stage, as
shown in Figure 4.1. The large high frequency impedance is highly excitable
during a load transient event. The die impedance causes a high frequency
resonance on top of the lower frequency package impedance droop, as shown
in Figure 4.2.
Figure 4.1: Mobile class PDN
frequency response
Figure 4.2: Mobile class PDN step
response
The mobile PDN is modeled from the frequency response plot of a simpli-
fied PDN for a dual core ARM device [32]. The full mobile PDN is shown in
Figure 4.3 and its approximate model is shown in Figure 4.4. The full PDN
model in Figure 4.3 shows each of the stages of the PDN. When compared
to the laptop or desktop PDNs shown in Section 4.1.2 and Section 4.1.3 the
mobile PDN has notably less capacitance at each stage; however, the mobile
PDN has similar levels of parasitic inductance and damping resistance. The
lower capacitance is the primary reason that the peak impedance is so large.
Figure 4.3: Full mobile PDN model
Figure 4.4: Approximated mobile PDN model
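The link between stage capacitance and peak impedance can be illustrated with a minimal sketch. It assumes a single lumped stage in which a series resistance and inductance feed a shunt capacitance. The 5 mΩ and 20 pH values are illustrative assumptions, not values taken from the PDN models in this work; only the 30 nF and 600 nF capacitances are borrowed from the mobile and laptop die capacitances listed in Chapter 5.

```python
import math

def stage_impedance(freq_hz, r_ohm, l_henry, c_farad):
    """|Z| seen by the load for a series R-L branch in parallel with a capacitor."""
    omega = 2.0 * math.pi * freq_hz
    z_rl = complex(r_ohm, omega * l_henry)
    z_c = 1.0 / complex(0.0, omega * c_farad)
    return abs(z_rl * z_c / (z_rl + z_c))

def impedance_peak(r_ohm, l_henry, c_farad, f_lo=1e5, f_hi=1e10, points=4000):
    """Log-spaced frequency sweep; returns (frequency, |Z|) at the largest |Z|."""
    best = (f_lo, 0.0)
    for i in range(points):
        f = f_lo * (f_hi / f_lo) ** (i / (points - 1))
        z = stage_impedance(f, r_ohm, l_henry, c_farad)
        if z > best[1]:
            best = (f, z)
    return best

# Hypothetical parasitics, held constant so that only the capacitance changes.
R_ASSUMED = 5e-3    # ohms
L_ASSUMED = 20e-12  # henries

small_c_peak = impedance_peak(R_ASSUMED, L_ASSUMED, 30e-9)   # mobile-like die capacitance
large_c_peak = impedance_peak(R_ASSUMED, L_ASSUMED, 600e-9)  # laptop-like die capacitance
print("small C peak (Hz, ohm):", small_c_peak)
print("large C peak (Hz, ohm):", large_c_peak)
```

With the parasitics held fixed, the smaller capacitance gives both a higher resonant frequency and a larger impedance peak, which is the qualitative trend described for the mobile PDN.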
4.1.2 Laptop Power Distribution Network
The laptop class PDN is less space constrained than the mobile PDN, but still
more space constrained than the desktop PDN. The laptop class processor has
a larger die area and package size than the mobile class processor so it can fit a
larger capacitance at each stage (Figure 4.7). The larger capacitance reduces
the resonant frequency of the impedance peak and its magnitude, as shown
in Figure 4.5. The resulting PDN responds much more slowly to a load
transient, as shown in Figure 4.6. The resulting voltage waveform is much
more predictable than that of the mobile system. The slower response gives
the mitigation mechanism more time to respond to and correct for the load
transient than in the mobile system. Unlike the mobile PDN, the approximate
model of the laptop PDN has only a single high order peak outside of the
control bandwidth of the VRM (Figure 4.5). This single, lower frequency peak
produces a single sinusoid in response to the load step (Figure 4.6). The full
laptop PDN is shown in Figure 4.7 and the approximate model is shown in
Figure 4.8.
Figure 4.5: Laptop class PDN
frequency response
Figure 4.6: Laptop class PDN step
response
Figure 4.7: Full laptop PDN model
Figure 4.8: Approximated laptop PDN model
4.1.3 Desktop Power Distribution Network
The least constrained power distribution network is the desktop class. With
the largest die size and board space it can afford the most capacitance at
each stage. This results in a PDN with a lower magnitude high order peak
at a slightly lower frequency than the mobile PDN (Figure 4.9) and a similar
step response behavior to the laptop PDN (Figure 4.10). The full desktop
PDN is shown in Figure 4.11 and the corresponding approximate model is
shown in Figure 4.12.
Figure 4.9: Desktop class PDN
frequency response
Figure 4.10: Desktop class PDN step
response
Figure 4.11: Full desktop PDN model
Figure 4.12: Approximated desktop PDN model
CHAPTER 5
A METHODOLOGY FOR EVALUATING
PDN - ARCHITECTURE INTERACTIONS
To evaluate the three modern PDN designs proposed in Chapter 4, three
corresponding classes of processor are required for characterizing voltage
emergencies and evaluating mitigation mechanisms. This chapter presents
the simulation framework for analyzing voltage emergencies and mitigation
techniques, as well as the three processors representative of modern mobile,
laptop, and desktop devices.
5.1 Simulating Processors and PDNs
Previous works focusing on voltage noise analysis have used many different
evaluation methods. Prior circuit level works have used pure analog simula-
tions, and even real implementations of the auxiliary circuits [6], [14], [15].
The works that purely focus on circuit level techniques usually simplify the
processor as a repetitive load step. While these types of simulations are
well accepted for evaluating the performance limitations of circuit level tech-
niques, they lose the context of the load profile generated by real processors
and the execution of benchmark applications. Other circuit level works, such
as Intel’s FIVR [7], have evaluated the performance through integration on
commercial CPU designs. While this approach provides the most accurate
performance evaluations coupled with real load transient profiles, it is also
the most expensive and least flexible evaluation technique.
Architecture level studies have primarily used three types of tools to evaluate
techniques. First, a tool for simulating a processor and program execution
is needed. Second, a tool to analyze the instantaneous power of the system
is required. Third, a tool to simulate the model of the power distribution
network is required in order to evaluate the voltage noise. The previous
studies by Gupta et al. [8] and Reddi et al. [9] have used SimpleScalar [33]
and Wattch [34] to model the execution and the power of the processor. These
tools, although adequate at the time, no longer have the features or scalability
to simulate modern processors. Prior studies by Miller et al. [3] have used
the SESC [35] cycle accurate simulator and power tools such as CACTI [36]
for computing SRAM access power, and Orion 2 [37] for evaluating network
on chip (NoC) power. SESC has since evolved into ESESC [38]; however, ESESC
supports only a limited number of ISAs and lacks a full system mode that can
boot Linux. This limits the scope of the studies that can be performed.
To evaluate a statistical approach to power prediction, Kalyanam et al. [19]
use industry grade tools such as Synopsys Verilog Compiler Simulator [39],
and Ansys PowerArtist [40]. However, this approach, despite being detailed,
also requires the actual core implementation RTL, which is difficult
to obtain. The work performed by Chen et al. [41] uses the Gem5 simulator
[42] combined with McPAT [43] for instantaneous power and HSPICE [44]
for the analog simulation.
5.2 Predict-T: A Simulation Framework for Voltage
Emergency Mitigation
After analyzing previous works, we identified a need for a framework that
could simulate complex power delivery networks and switching regulators,
with the speed and flexibility of a cycle accurate simulator. We designed a
framework with three goals in mind: to be scalable, to be interactive, and
to have high resolution. We wanted the framework to be a quality tool that
could model processor power interactions for single-core systems, multicore,
and massive multicore systems. We also wanted the simulation to be inter-
active. For example, the processor should be able to communicate with the
power supply during execution, enabling the study of fine-grained interac-
tions between the processor and power supply. Communication between the
processor and power supply enables studies of collaborative noise mitigation
techniques. Lastly, a high-resolution view of the power profile of the device
in the spatial and temporal dimensions is highly desirable for exploring high
frequency voltage noise.
To build this simulation framework we used three reputable preexisting
tools and built interfaces to move data between the tools. To simulate the
processor execution, we chose the Gem5 [42] simulator. Gem5 is a good fit
because it is well documented, well supported, very extensible and has a
thriving active community. Gem5 allows us to easily add to and extend on
the functionality of the processor due to its modular design. The power
prediction units are all built on a common power prediction unit base class,
allowing for rapid design and testing of new power prediction schemes.
To model the instantaneous power of the processor over time we used the
Multicore Power Area and Timing (McPAT) [43] simulator. McPAT provides
a fine-grained temporal and spatial analysis of the static, dynamic, and leak-
age power of the device. McPAT can perform power analysis for network on
chip (NoC), SRAM memory accesses, and logic all in a single tool. McPAT
allows us to perform various sweeps on the electrical characteristics of the
device with support for different process optimizations and technology nodes.
A wide range of lithographies is supported, although the smallest node,
22 nm, is already antiquated. McPAT is also highly scalable, allowing for the
power and area analysis of many core systems. To improve the performance
for our application while collecting long traces of execution data, extensive
modifications to McPAT were made. The area calculation is done only at
the start of the application profile and the subsequent calculations are only
dynamic power calculations. In some of our testing this improved simulation
execution time by 99%.
Finally, all the analog behavior is modeled in VerilogAMS [13] and simu-
lated with Cadence NCVerilog [45] and Spectre [46] tools. These tools are
industry verified and can perform large analog simulations quickly and accu-
rately. We chose them over other candidate tools because they can do both
digital and analog simulation in the same environment, allowing complex
power distribution and SMPS models to be used that may contain digital
logic in the feedback and control loop.
The simulation framework we developed is similar to the simulation frame-
work outlined in the work by Chen et al. [41] as we use Gem5 and McPAT;
however, we use a different analog simulator to allow for the real time in-
teraction between the processor and power supply. Chen et al. use only a
forward flow of data, so the CPU does not know its current impact on the
supply voltage.
The overall simulation framework and flow of signals between the different
tools are shown in Figure 5.1.
Figure 5.1: Simulation framework
To evaluate the three processor and PDN pairs, each pair is simulated
on the framework across a subset of applications from the MiBench [47]
benchmark suite. Each processor and PDN pair is evaluated across all the
benchmarks, first without a power prediction unit to establish the rates of
voltage emergency, and subsequently with different power prediction units to
evaluate the efficacy of the power prediction units across the systems. Future
work should incorporate benchmarks from MiBench, Spec2006 [48], and Par-
sec 3.0 [49]. However, because at the time of writing fine-grained temporal
power prediction mechanisms for multi-core systems are still a relatively new
area of research, multicore benchmarks are reserved for future studies.
5.3 Processor Configurations
The three processors were configured to meet area and peak power targets
from the power area and timing tool: McPAT [43]. To keep the comparisons
consistent across the devices, all the processors were given a voltage emer-
gency value of 97% of the device’s nominal voltage Vnom. This allows for a
3% voltage droop from the Vnom of the processor. Any droop that is larger
than the 3% allowable droop is considered a voltage emergency. This margin
is not tied to the real-world voltage margins of any specific device.
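The threshold definition above can be expressed as a short sketch. The function names are hypothetical, and the counting rule (each excursion below the threshold counts as one emergency, however long it lasts) is an assumption made for illustration rather than the exact accounting used by the framework.

```python
def emergency_threshold(v_nom, allowed_droop=0.03):
    """Voltage below which a droop is treated as a voltage emergency."""
    return v_nom * (1.0 - allowed_droop)

def count_emergencies(voltage_trace, v_nom, allowed_droop=0.03):
    """Count excursions below the threshold (assumed: one emergency per excursion)."""
    threshold = emergency_threshold(v_nom, allowed_droop)
    count = 0
    in_emergency = False
    for v in voltage_trace:
        if v < threshold and not in_emergency:
            count += 1  # new excursion below the threshold
        in_emergency = v < threshold
    return count

# Thresholds for the three nominal supply voltages used in this chapter.
for v_nom in (0.9, 1.2, 1.4):
    print(v_nom, "V nominal ->", round(emergency_threshold(v_nom), 3), "V threshold")
```

Dividing the count by the simulated time in microseconds gives a rate in VE/µs.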
5.3.1 Mobile Processor
A typical mobile handset processor has a power budget of 0-15 W [50]. The
mobile device class is one of the most diverse market segments. There are
many different types of chips with different core configurations and accel-
erators. Due to limitations in the McPAT simulator only classical system
configurations can be modeled. Unfortunately, this ignores many of the SoC
components such as modem, video, graphics, and machine learning (ML)
accelerators that are found on modern mobile platform devices. Future it-
erations of this work should address these recent developments and consider
the system as a whole. The mobile processor specifications are shown in
Table 5.1 and the processor’s electrical operating specifications are shown in
Table 5.2. This processor is representative of a high-end mobile device.
Table 5.1: Processor Parameters for the Mobile Processor Class
Parameter          Value               Parameter         Value
Lithography        22 nm               IQ Size           16 Entry
Cores/Threads      4c/4t               Int ALU           6
Optimization       Low Power           Int Mult/Div      4
Clk Frequency      3.0 GHz             FP ALU            2
Core Width         6 Way Superscalar   FP Mult/Div       1
Fetch Buffer Size  16 Entry            L1 ICache Size    4 kB
Fetch Queue Size   8 Entry             L1 DCache Size    2 kB
Load Queue Size    8 Entry             L2 Cache Size     64 kB
Store Queue Size   8 Entry             L3 Cache Size     2 MB
ROB Size           48 Entry            Area              41.31 mm²
Int Registers      64                  Die Capacitance   30 nF
FP Registers       64
Table 5.2: Mobile Processor Electrical Specification
Parameter Value
Supply Voltage 0.9 V
Emergency Voltage 0.87 V
Peak Power 20 W
5.3.2 Laptop Processor
The second device evaluated is the laptop class device. These devices, despite
having a similar core count to the mobile processor class, usually have higher
peak power. The form factor and integration are also not as tightly con-
strained as the mobile system. Laptop systems can afford a larger die area
and active cooling solutions, allowing designers to use larger super-scalar
structures, more functional units in the FU pool and larger caches at each
level of the hierarchy. A laptop processor’s peak power consumption can
range from 10 W [51] to 65 W [52]. The processor parameters are shown in
Table 5.3 and the electrical specification is shown in Table 5.4.
Table 5.3: Processor Parameters for the Laptop Processor Class
Parameter          Value               Parameter         Value
Lithography        22 nm               IQ Size           32 Entry
Cores/Threads      4c/4t               Int ALU           6
Optimization       Low Power           Int Mult/Div      4
Clk Frequency      3.0 GHz             FP ALU            4
Core Width         6 Way Superscalar   FP Mult/Div       2
Fetch Buffer Size  32 Entry            L1 ICache Size    8 kB
Fetch Queue Size   16 Entry            L1 DCache Size    4 kB
Load Queue Size    16 Entry            L2 Cache Size     128 kB
Store Queue Size   16 Entry            L3 Cache Size     4 MB
ROB Size           96 Entry            Area              50.12 mm²
Int Registers      128                 Die Capacitance   600 nF
FP Registers       128
Table 5.4: Laptop Processor Electrical Specification
Parameter Value
Supply Voltage 1.2 V
Emergency Voltage 1.16 V
Peak Power 50 W
5.3.3 Desktop Processor
The final processor class studied is the desktop class. The desktop processor
is the least constrained by device form factor but the most by cost and
thermal solution. Desktop processors can range from a low power option
similar in specification to a laptop processor with two or four cores, to a
massive many-core processor with upwards of 64 cores [1], [2]. The peak
power for desktop processors ranges from 65 W [53] all the way to 250 W
[54]. The processor specifications for the desktop class device are shown in
Table 5.5 and the electrical specifications are shown in Table 5.6.
Table 5.5: Processor Parameters for the Desktop Processor Class
Parameter          Value               Parameter         Value
Lithography        22 nm               IQ Size           64 Entry
Cores/Threads      8c/8t               Int ALU           8
Optimization       High Performance    Int Mult/Div      6
Clk Frequency      4.0 GHz             FP ALU            6
Core Width         8 Way Superscalar   FP Mult/Div       4
Fetch Buffer Size  64 Entry            L1 ICache Size    16 kB
Fetch Queue Size   32 Entry            L1 DCache Size    8 kB
Load Queue Size    32 Entry            L2 Cache Size     256 kB
Store Queue Size   32 Entry            L3 Cache Size     8 MB
ROB Size           192 Entry           Area              141.01 mm²
Int Registers      256                 Die Capacitance   1.2 µF
FP Registers       256
Table 5.6: Desktop Processor Electrical Specification
Parameter Value
Supply Voltage 1.4 V
Emergency Voltage 1.36 V





As discussed in Chapter 5, each processor and PDN class is subject to differ-
ent design and implementation constraints. The constraints affect the rate
and magnitude of the voltage emergencies observed over the course of an
application’s execution. Not only are the average rates different across the
classes of processors, the rates for specific applications can differ across the
different classes of processor. In the following sections, the interactions be-
tween the power distribution networks and the processors are characterized.
6.1 Desktop Class
Of the three classes of processor, the desktop class processor has the largest
and most complex core design and can draw the most current. Even when
coupled with the least constrained PDN design and a large die capacitance,
large sustained shifts in the operating point still cause a voltage droop.
Figure 6.1 shows the processor currents over time as the benchmark appli-
cations execute. The processor current follows a similar pattern across all
benchmarks. When the processor is idle it is consuming around 10 A of
static current. When the core transitions from idle to active and executes
instructions there is a large change in operating current. The current can
reach magnitudes of over 35 A in approximately 20 ns, as shown in Figure
6.1. The large optimized core design coupled with large caches allows the
core to sustain a period of activity for longer than the other two classes of
smaller processors.
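The size of this transient can be put in perspective with a back-of-the-envelope sketch. The current and timing figures are the ones quoted above; the 10 pH effective inductance is a purely hypothetical value chosen for illustration, and the estimate V = L·di/dt ignores decoupling capacitance and resonance.

```python
def current_slew_a_per_ns(i_start_a, i_end_a, rise_time_ns):
    """Average current slew rate over the transition, in A/ns."""
    return (i_end_a - i_start_a) / rise_time_ns

def inductive_drop_v(inductance_h, delta_i_a, rise_time_s):
    """First-order inductive voltage drop, V = L * di/dt."""
    return inductance_h * delta_i_a / rise_time_s

# Idle-to-active transition quoted above: roughly 10 A to 35 A in about 20 ns.
slew = current_slew_a_per_ns(10.0, 35.0, 20.0)

L_ASSUMED_H = 10e-12  # hypothetical effective loop inductance, not from the PDN models
drop = inductive_drop_v(L_ASSUMED_H, 35.0 - 10.0, 20e-9)

margin = 0.03 * 1.4  # 3% allowable droop on the 1.4 V desktop supply
print(f"slew rate: {slew} A/ns, inductive drop: {drop * 1e3:.1f} mV, margin: {margin * 1e3:.1f} mV")
```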
Figure 6.1: Desktop processor class, single-core current (left) and a detailed
view (right)
The large capacitance at each stage of the desktop power distribution net-
work reduces the voltage noise caused by the smaller current transient events.
In Figure 6.1 there are smaller current transients in the qsort benchmark.
These current transients result in almost no voltage droop due to the larger
capacitance as shown in Figure 6.2. However, the desktop processor and
PDN pair is not immune to voltage emergencies. For example, during the fft
benchmark there is a large sustained current transient starting at 4000 ns.
The magnitude and duration of the transient are responsible for the large
resulting voltage droop shown in Figure 6.2.
Figure 6.2: Desktop processor class, single-core voltage (left) and a detailed
view (right)
The large current transients produced by the processor result in the second
highest voltage emergency rate of 4.316 VE/µs among the single threaded
test workloads (Figure 6.3). Figure 6.3 also illustrates that the rate of voltage
emergencies is application dependent. The type of work that the application
performs and its dependency on memory are contributing factors to the num-
ber of observed voltage emergencies. A memory bound application such as
dijkstra leads to repetitive phases of memory accesses and computation. Pe-
riods of inactivity while waiting on memory requests followed by periods of
activity after the requests complete result in a high rate of voltage emergen-
cies compared to the other applications tested.
Figure 6.3: Total number of voltage emergencies over execution time for
desktop class Core+PDN
6.2 Laptop Class
The laptop class PDN has a similar resonant frequency to the desktop PDN and
only double its peak impedance; however, when paired with the laptop
processor it exhibits different voltage emergency behavior than the desktop
processor and PDN pair. The laptop class processor has a smaller and nar-
rower core design than the desktop core and it is coupled to smaller caches
than the desktop processor. Physically, the laptop core uses the low power
optimizations from McPAT so it has a lower static and dynamic power than
the desktop core. The laptop class processor has an idle current of 3 A with
peak currents reaching 15 A for certain workloads, as shown in Figure 6.4.
The lower magnitude transients also have a lower slew rate than those of the
desktop processor due to the lower clock speed. All of the characteristics of the
laptop core are well suited for the PDN; hence, this combination experiences
fewer total voltage emergencies. Despite the fewer emergencies, the
bursty behavior that results from having a narrower core, fewer functional
units, and smaller caches produces a noisier supply line, as can be seen by
comparing the laptop voltage supply in Figure 6.5 with the desktop voltage
supply shown in Figure 6.2.
Figure 6.4: Laptop processor class, single-core current (left) and a detailed
view (right)
Figure 6.5: Laptop processor class, single-core voltage (left) and a detailed
view (right)
Performance of the laptop class processor and PDN combination highlights
the differing rates of voltage emergency across applications as shown in Figure
6.6. Again, the highly memory dependent dijkstra, qsort, and susan applica-
tions see the highest rate of voltage emergency due to their periodic idle and
activity phases of computation. Applications that are not as memory bound
such as toast, untoast, and fft have lower magnitude current transients and
longer duration transients, resulting in a lower rate of voltage emergency.
The combination of only a slightly higher impedance peak and a lower peak
current core results in the lowest average voltage emergency rate of 2.726
VE/µs as shown in Figure 6.10.
Figure 6.6: Total number of voltage emergencies over execution time for
laptop class Core+PDN
6.2.1 Mobile Class
It is common to push the mobile core to its operational limits to get the
most performance from the processor. When the processor is combined with
the constrained mobile PDN, the result is the highest voltage emergency
rate among all three classes. The mobile processor has an idle current of
approximately 2.25 A with currents as high as 8 A as shown in Figure 6.7.
The mobile processor also has the smallest core parameters, cache sizes,
and functional unit pool. The execution of the test workloads results in
high frequency transitions between the idle and active phases. The mobile
class with its highly constrained PDN coupled with the smallest cache and
core parameters results in current transients that can excite the resonant
frequency of the PDN.
Figure 6.7: Mobile processor class, single-core current (left) and a detailed
view (right)
These transitions excite the high frequency die stage impedance at the
resonant frequency as shown in Figure 6.8. There is also not much delay
between the occurrence of a transient and the resulting droop. For example,
benchmark application qsort exhibits two large transients occurring at 5600
ns as shown in Figure 6.7, and the droop occurs instantaneously as shown
in Figure 6.8. The delay interval between transient and droop in the laptop
and desktop is much more pronounced. The reduced delay between transient
and droop in the mobile configuration requires a fast mitigation mechanism
to effectively compensate for the load transient. Resonance is another
challenge in the mobile systems that is much less prevalent in
the laptop and desktop class systems. An example of resonance occurs shortly
after 5640 ns in Figure 6.8. The second load transient in the qsort
benchmark occurs shortly after the first, before the first-order resonance
has been damped. The droops interfere constructively, and the resulting
droop is larger than the original.
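The constructive interference described above can be reproduced with a toy model. The sketch treats the droop from each load step as a damped sinusoid and sums the contributions. The amplitude, resonant frequency, and decay rate are hypothetical values chosen for illustration and are not fitted to the mobile PDN.

```python
import math

def ringing(t_ns, start_ns, amp_v, freq_mhz, decay_per_ns):
    """Damped-sinusoid droop caused by one load step at start_ns (zero before it)."""
    dt = t_ns - start_ns
    if dt < 0:
        return 0.0
    omega = 2.0 * math.pi * freq_mhz * 1e-3  # rad/ns
    return -amp_v * math.exp(-decay_per_ns * dt) * math.sin(omega * dt)

def worst_droop(step_times_ns, amp_v=0.02, freq_mhz=200.0, decay_per_ns=0.1,
                t_end_ns=40.0, dt_ns=0.01):
    """Largest droop magnitude (V) of the summed ringing over the simulated window."""
    worst = 0.0
    for k in range(int(t_end_ns / dt_ns) + 1):
        t = k * dt_ns
        v = sum(ringing(t, s, amp_v, freq_mhz, decay_per_ns) for s in step_times_ns)
        worst = min(worst, v)
    return -worst

single = worst_droop([0.0])             # one isolated load step
in_phase = worst_droop([0.0, 5.0])      # second step one resonant period (5 ns) later
out_of_phase = worst_droop([0.0, 2.5])  # second step half a period later
print(single, in_phase, out_of_phase)
```

In this toy model a second step that arrives one resonant period after the first deepens the droop, while a step that arrives half a period later does not.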
Figure 6.8: Mobile processor class, single-core voltage (left) and a detailed
view (right)
The mobile PDN and processor combination leads to the highest rate of
voltage emergencies out of the three classes of processor with a rate of 8.562
VE/µs as shown in Figure 6.9 and Figure 6.10. The high rate of voltage
emergencies can be attributed in part to the PDN and to the behavior of the
core. The high frequency impedance of the die PDN stage coupled with the
small core parameters results in load transients that excite the resonances of
the PDN. The voltage emergencies are short in duration but occur at a much
higher rate. A single transient can trigger many voltage emergencies due to
the high frequency die impedance, making the voltage noise more difficult
to control.
Figure 6.9: Total number of voltage emergencies over execution time for
mobile class Core+PDN
A comparison of voltage emergency rates across all systems is shown in
Figure 6.10. As discussed, the mobile class PDN has the highest rate of
voltage emergency due to the large die stage impedance and small core design.
The laptop has the lowest rate of voltage emergency due to the appropriately
sized power distribution network and slightly smaller core than the desktop
processor. The desktop processor has the second greatest rate due to its large
performance optimized core design and large caches.
Figure 6.10: Voltage emergency rates across the different classes of CPU
and PDN
Not only do the different classes of processors and PDNs have different
rates of voltage emergency, the duration, delay and magnitude of the volt-
age emergency are different among the classes. These differences impact
the efficacy of the different architectural mechanisms for mitigating voltage






To evaluate the efficacy of the predictive mechanisms, each mechanism is
tested for each device class. There are two important metrics that determine
the efficacy of a mechanism. First, it must reduce the number of voltage
emergencies; second, it must offer a speedup in execution over the baseline
reactive mechanism. A mechanism that is able to mitigate every single emer-
gency, but takes longer than the baseline reactive mechanism to complete,
would be ineffective. The reactive mechanism with DeCoR as the emergency
backstop will be the baseline for evaluating the predictive mechanisms. The
following predictive mechanisms attempt to reduce the number of times that
the voltage emergencies occur and limit the number of times the DeCoR
emergency handler must be invoked. A mechanism will be considered effec-
tive if it achieves a reduction in number of voltage emergencies and provides
a speed increase over the baseline. Each mechanism has varying degrees of
success across the different systems and applications.
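The two-part effectiveness criterion can be written down as a small sketch. The function names and arguments are hypothetical; speedup is taken as the baseline (DeCoR only) execution time divided by the execution time with the predictive mechanism.

```python
def speedup(baseline_time, mechanism_time):
    """Speedup of the predictive mechanism relative to the reactive baseline."""
    return baseline_time / mechanism_time

def is_effective(baseline_emergencies, mechanism_emergencies,
                 baseline_time, mechanism_time):
    """Effective only if it has fewer emergencies AND runs faster than the baseline."""
    fewer_emergencies = mechanism_emergencies < baseline_emergencies
    faster = speedup(baseline_time, mechanism_time) > 1.0
    return fewer_emergencies and faster

# A mechanism that removes every emergency but runs slower is still ineffective.
print(is_effective(100, 0, baseline_time=10.0, mechanism_time=12.0))
```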
7.1 Voltage Sensors
The voltage sensor approach throttles the processor when the supply voltage
drops below an arbitrary trigger threshold. Setting this threshold more ag-
gressively allows the processor to have more time to respond to the voltage
emergency by throttling; however, the higher threshold will have a higher
performance overhead due to the greater number of crossings that do not
cause emergencies. Crossings that do not result in emergencies are known
as benign crossings. If the voltage crosses the threshold, the processor issues
the throttle signal. If the threshold crossing were benign, this would result in
a performance degradation of the processor by throttling when unnecessary.
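The distinction between benign crossings and crossings that lead to an emergency can be sketched as follows. The function name and the sample trace are hypothetical. A crossing is treated as benign if the voltage recovers above the trigger threshold without ever falling below the emergency threshold; crossings still unresolved at the end of the trace are not counted as benign.

```python
def classify_crossings(voltage_trace, v_trigger, v_emergency):
    """Return (total trigger crossings, benign crossings) for a voltage trace."""
    crossings = 0
    benign = 0
    below_trigger = False
    saw_emergency = False
    for v in voltage_trace:
        if not below_trigger and v < v_trigger:
            # New downward crossing: the sensor would raise the throttle signal here.
            below_trigger = True
            saw_emergency = False
            crossings += 1
        if below_trigger:
            if v < v_emergency:
                saw_emergency = True
            if v >= v_trigger:
                # Voltage recovered; benign if no emergency occurred meanwhile.
                below_trigger = False
                if not saw_emergency:
                    benign += 1
    return crossings, benign

# Illustrative trace around a 1.2 V supply with a 1.164 V emergency level.
trace = [1.20, 1.17, 1.19, 1.17, 1.15, 1.19]
print(classify_crossings(trace, v_trigger=1.18, v_emergency=1.164))
```

Raising `v_trigger` toward the nominal voltage gives the throttle more lead time but increases the number of benign crossings, which is the trade-off discussed above.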
The desktop class processor has a low rate of voltage emergency threshold
crossings that do not result in a voltage emergency, as shown in Figure 7.1.
In this case, we would expect the voltage sensor approach to be able to mit-
igate the voltage emergencies with limited overhead, as a threshold crossing
is more likely to result in an emergency. In Section 6.1 it was shown that
the desktop supply voltage is the least noisy of the systems analyzed; thus,
there are fewer benign threshold crossings.
Figure 7.1: Total number of voltage threshold crossings over execution time
for desktop class Core+PDN
The laptop class processor and PDN combination experiences a higher
rate of benign threshold crossings. Consequently, the overhead of the
throttling scheme is expected to be higher than on the desktop processor
(Figure 7.2). On average the laptop system experiences approximately twice
as many threshold crossings as voltage emergencies.
Figure 7.2: Total number of voltage threshold crossings over execution time
for laptop class Core+PDN
In Section 6.2.1 we observed high frequency resonances on the mobile PDN.
These resonances cause many threshold crossings, which we would expect to
severely degrade the performance of the mobile system. Benchmark application
qsort, when run on the mobile device, approaches 900 threshold crossings in
25 µs (Figure 7.3). On average the mobile system experiences approximately
three times as many threshold crossings as voltage emergencies.
Figure 7.3: Total number of voltage threshold crossings over execution time
for mobile class Core+PDN
7.2 µArchitecture Event Predictor and Signature
Predictor
The µArchitecture event-based predictor and signature-based predictor have
varied performance across all the devices. There are two primary disadvan-
tages of these predictors. The first disadvantage arises from the existence
of multiple events occurring in a single cycle. The multi-event cycles mean
that the order in which events arrive has an impact on the performance of
the predictor. In the case of the µArchitecture predictor, some events may
be aliased out because it is only tracking a single event. In the case of the
signature predictor there can be many different orderings of events, resulting
in multiple signatures that all correspond to the same droop. The second
main drawback of these two predictors is the point at which they throttle.
The monitored events are events that cause stalls, and the predictor throt-
tles immediately after the event comes in and the history register is updated.
There is the possibility that the processor is still stalled even after the throt-
tle period is over. In this case the throttle issued will have no benefit over
the DeCoR rollback system.
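The multi-event problem described above can be sketched in a few lines. This is an illustration under stated assumptions, not the predictor implementation evaluated in this work: the event names are hypothetical, the signature is modeled simply as the tuple of events in arrival order, and the single-event tracker is modeled as keeping only the last event seen in a cycle.

```python
from itertools import permutations


def signature(history):
    """Order-sensitive signature: the events in their arrival order."""
    return tuple(history)


def single_event_tracker(events_in_cycle):
    """Keep only the last event of the cycle; earlier ones are aliased out."""
    tracked = None
    for event in events_in_cycle:
        tracked = event
    return tracked


# Three hypothetical events that all occur in the same cycle.
same_cycle = ("L2_MISS", "BR_MISPREDICT", "TLB_MISS")

# Every arrival order produces a different signature for the same droop.
distinct_signatures = {signature(p) for p in permutations(same_cycle)}

# Which event the single-event tracker records depends on arrival order.
tracked_events = {single_event_tracker(p) for p in permutations(same_cycle)}

print(len(distinct_signatures), len(tracked_events))
```

With three simultaneous events there are six possible signatures for one underlying droop, and the single-event tracker may record any of the three events, which is the aliasing and ordering issue noted in the text.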
7.3 All Things Considered
When comparing the predictive mechanisms to the reactive mechanism, the
results are varied. The laptop class system sees the most performance im-
provement from the prediction mechanisms while the higher power desktop
and constrained PDN mobile systems experience negligible improvement.
The discrepancies and poor performance of the predictors can be attributed
to two potential implementation details. First, in the high-power desktop
system there is a side effect known as the DeCoR domino effect. The DeCoR
domino effect, discussed in Section 8.1, is a result of the performance of
the predictive mechanisms being hindered by the chain reactions of rollback
events. The second potential issue is that the core-PDN pairing is such that
throttling is not an effective mechanism to mitigate the voltage emergencies,
as will be discussed in Section 8.3. The performance may also be hindered
by the aliasing and ordering issues of multi-event cycles that would reduce
the prediction accuracy of the event and signature predictors, as shown in
Figures 7.4, 7.5, and 7.6.
Figure 7.4: Speedup with respect to a DeCoR only system for the Desktop
Class CPU and PDN
Figure 7.5: Speedup with respect to a DeCoR only system for the Laptop
Class CPU and PDN
Figure 7.6: Speedup with respect to a DeCoR only system for the Mobile
Class CPU and PDN
The greatest improvements were observed in the laptop processor class for
all of the prediction mechanisms (Figure 7.7) while the mobile and desktop
processor classes perform marginally better than the DeCoR only system in
reducing the number of observed voltage emergencies.






8.1 Addressing the DeCoR Domino Effect
We observed that the systems equipped with the DeCoR delayed commit
and rollback mechanism suffered from many more voltage emergencies than
a system without the DeCoR system in place. During a DeCoR delayed
commit and rollback, the device enters a low activity phase as it restores
the last voltage emergency free context. After the device completes this
rollback, it transitions back to the normal execution state. In the case of
high-performance workloads this usually generates a large transient, which
then causes another voltage emergency. For certain processors, PDNs, and
applications this can cause a chain reaction of voltage emergencies, as shown
in Figure 8.1.
Figure 8.1: DeCoR domino effect. Immediately after the DeCoR event is
over, the device transitions back to a high dynamic power state causing a
droop event.
The chain reaction of emergencies is an unexpected consequence of the
error handling mechanism. A second consequence of the chain reaction is
unnecessary additions to the two CAM based predictors. In the case of the
microarchitectural event predictor, the limited size of the CAM and the least
recently used replacement policy could evict meaningful <PC, Event> pairs
from the memory and hinder its ability to make predictions.
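The eviction hazard can be illustrated with a small model of a least recently used table. This is a sketch only: the class name, the capacity of four entries, the program counter values, and the event names are all hypothetical and are not taken from the predictor evaluated here.

```python
from collections import OrderedDict


class LruCam:
    """Toy model of a fixed-size CAM of <PC, Event> pairs with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first

    def insert(self, pc, event):
        """Insert a pair; return the evicted pair, or None if nothing was evicted."""
        key = (pc, event)
        if key in self.entries:
            self.entries.move_to_end(key)  # refresh recency
            return None
        evicted = None
        if len(self.entries) >= self.capacity:
            evicted, _ = self.entries.popitem(last=False)  # drop the LRU entry
        self.entries[key] = True
        return evicted

    def predicts(self, pc, event):
        return (pc, event) in self.entries


cam = LruCam(capacity=4)
cam.insert(0x400, "L2_MISS")  # a meaningful pair learned from a real droop

# A chain of rollback-induced emergencies adds four unrelated pairs.
for i in range(4):
    cam.insert(0x800 + 4 * i, "ROLLBACK_RESUME")

print(cam.predicts(0x400, "L2_MISS"))
```

After the chain reaction fills the table, the originally learned pair is no longer present, so the next occurrence of that event at that PC would not be predicted.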
A solution to mitigate the chain reaction would be to gradually ramp up the
core utilization after checkpoint restoration, using techniques such as
pipeline muffling and damping, throttling, or instruction reordering. This
would reduce the slew rate of the transition back to normal execution.
Using throttling, a slight reduction of 91 voltage emergencies, or 16.18%,
was observed for the laptop class processor, as shown in Figure 8.2. As
discussed in Section 8.3, the mobile and desktop systems are immune to the
throttle action, so the improvement in the number of emergencies is
negligible.
Figure 8.2: Reduction in number of voltage emergencies with throttling on
restore
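The effect of a gradual ramp on the slew rate can be sketched as follows. The code is a minimal illustration, assuming that the per-cycle issue-width cap is a reasonable proxy for current draw; the function names, the width of 8, and the ramp length of 4 cycles are hypothetical and are not parameters of the evaluated system.

```python
def issue_width_schedule(full_width, ramp_cycles, total_cycles):
    """Per-cycle issue-width cap after a restore, rising linearly to full width."""
    schedule = []
    for cycle in range(total_cycles):
        if ramp_cycles <= 1:
            cap = full_width  # abrupt return to normal execution
        else:
            cap = min(full_width,
                      max(1, (full_width * (cycle + 1)) // ramp_cycles))
        schedule.append(cap)
    return schedule


def max_step(schedule, idle_level=0):
    """Largest cycle-to-cycle increase, a stand-in for the load slew rate."""
    previous = idle_level
    worst = 0
    for cap in schedule:
        worst = max(worst, cap - previous)
        previous = cap
    return worst


abrupt = issue_width_schedule(full_width=8, ramp_cycles=1, total_cycles=8)
ramped = issue_width_schedule(full_width=8, ramp_cycles=4, total_cycles=8)
print(abrupt, max_step(abrupt))
print(ramped, max_step(ramped))
```

In this toy model the abrupt transition jumps from idle to full width in one cycle, while the ramp spreads the same change over four cycles, reducing the largest per-cycle step by a factor of four at the cost of three cycles of reduced issue width.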
8.2 Improved Prediction through Instruction
Dependency Analysis
Another potential improvement is to evaluate more architectural states along
with the event history. Tracking the event history only tells part of the story
of what is occurring in the processor. Essentially, if an event signals a stall,
then it is possible that there could be a droop after the stall completes and
the processor begins executing. In workloads that have a high degree of in-
struction level parallelism, it is often the case that after a stall there are many
instructions that can be dispatched because they have no data dependencies.
In an application that has limited instruction level parallelism, only a
fraction of the core's width in instructions may be available for dispatch
at the end of the stall. We developed two more power prediction units to
exploit the extra architectural state: the Throttle After Stall (TAS) and the
Instruction Pending (InstPending). Both of the state-based units function in
a similar manner. The TAS and InstPending mechanisms monitor the state
of the instruction dispatch stage of the processor. If the dispatch stage is
idle, the processor will be in a low power state over the next several cycles
because no instructions were issued. However, if the dispatch unit was ac-
tive and issuing instructions, then the pipelines will have high activity as the
issued instructions are processed by the downstream stages. This technique
for identifying load transients is similar to the approach studied by Leng
et al. [20] in GPU systems. If the power prediction units see a period of
inactivity followed by instructions being issued, then the prediction unit
issues a throttle signal. The TAS unit will always throttle if one or more
instructions are observed after the phase of inactivity. However, the
InstPending mechanism will only issue the throttle if more than a threshold
number of instructions are dispatched. The two new prediction units can
perform nearly as well as the sensor and microarchitecture-based
approaches, as shown in Figure 8.4. Again, there are no improvements in the
mobile and desktop classes (Figure 8.3 and Figure 8.5).
Figure 8.3: Speedup with respect to a DeCoR only system for the desktop
class CPU and PDN
Figure 8.4: Speedup with respect to a DeCoR only system for the laptop
class CPU and PDN
Figure 8.5: Speedup with respect to a DeCoR only system for the mobile
class CPU and PDN
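The decision rule shared by the TAS and InstPending units can be sketched as below. This is a simplified model under stated assumptions rather than the hardware evaluated in this work: the per-cycle dispatch counts are made-up data, and the length of the inactivity phase (`idle_cycles`) and the InstPending `threshold` are hypothetical parameters.

```python
def _throttle_after_idle(dispatch_counts, idle_cycles, threshold):
    """Signal a throttle when more than `threshold` instructions are
    dispatched immediately after at least `idle_cycles` idle cycles."""
    signals = []
    idle_run = 0  # consecutive cycles with no dispatched instructions
    for dispatched in dispatch_counts:
        signals.append(idle_run >= idle_cycles and dispatched > threshold)
        idle_run = idle_run + 1 if dispatched == 0 else 0
    return signals


def tas_throttle(dispatch_counts, idle_cycles):
    """Throttle After Stall: any dispatch after the idle phase throttles."""
    return _throttle_after_idle(dispatch_counts, idle_cycles, threshold=0)


def inst_pending_throttle(dispatch_counts, idle_cycles, threshold):
    """InstPending: throttle only if more than `threshold` are dispatched."""
    return _throttle_after_idle(dispatch_counts, idle_cycles, threshold)


# Hypothetical dispatch counts per cycle: two stalls, then 1 and 6 instructions.
dispatch = [4, 0, 0, 0, 1, 3, 0, 0, 0, 6]
tas = tas_throttle(dispatch, idle_cycles=3)
pending = inst_pending_throttle(dispatch, idle_cycles=3, threshold=2)
print([i for i, s in enumerate(tas) if s])
print([i for i, s in enumerate(pending) if s])
```

In this example TAS throttles after both stalls, while InstPending ignores the first one because only a single instruction is dispatched, which reflects the lower overhead and lower coverage of the thresholded variant.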
The total speedup across all the systems and prediction mechanisms is
shown in Figure 8.6. The Sensor, Microarchitecture, and TAS mechanisms
all reduce the number of voltage emergencies by over 50%, while the
Signature and InstPending techniques reduce the number of voltage
emergencies by over 20%. Despite these reductions, the improvement is
not enough to provide a sizeable performance gain over the DeCoR only
baseline system. The TAS and InstPending mechanisms are decoupled from
monitoring the voltage, making them more adaptable than the other
mechanisms to a multicore system.
Figure 8.6: Average speedup across the three different systems with
realistic PDN models
Not only is the performance improved for the laptop class system, but the
average number of voltage emergencies is also reduced across all the
prediction mechanisms evaluated. The performance of the TAS and
InstPending predictors falls between that of the microarchitectural event
predictor and that of the signature predictor (Figure 8.7).
Figure 8.7: Number of VEs improved for all except qsort with the signature
predictor
Of the systems with the realistic PDN models, the laptop system performs
the best across all the prediction mechanisms where throttling is an effective
mitigation technique (Figure 8.8).
Figure 8.8: Average improvement in number of VEs across the three
realistic systems
Although the TAS can dramatically reduce the number of voltage emergencies
observed, in certain applications the TAS scheme performs worse than the
sensor scheme. For example, the voltage emergencies in the untoast
benchmark are nearly eliminated (Figure 8.7); however, the TAS predictor
delivered a lower speedup than the sensor-based scheme (Figure 8.4). The
high overhead of such a naive always-throttle mechanism means it can
benefit from more informed decisions. In the work performed by Kalyanam
et al. [19], activity probes were inserted at all the stages of the
pipeline. Inserting more probes would provide extra context on the state
of the processor and the likelihood and magnitude of a transient. If the
relationship between instruction type, operands, and functional unit (FU)
utilization is considered, there is potential to improve the accuracy and
reduce the misprediction rate.
8.3 Reconsidering Throttling As a Mitigation
Technique
All of the mechanisms studied so far have shown improvement for the laptop
class system, while the mobile and desktop systems have shown very little
performance improvement from any of the prediction mechanisms. There
are many potential factors that may deteriorate the efficacy of the prediction
mechanisms. Recall that for the desktop and mobile class systems, the rate of
voltage emergencies is double and quadruple, respectively, that of the laptop
class. The high emergency rate is an indicator that the power distribution
network is nearing its limits. If the PDN is operating at its limits, the
throttle mechanism will not have much of an effect on the ability to avoid
the emergency, and this is the behavior that is observed on the desktop
and mobile systems. Even though the prediction mechanisms are issuing the
throttle signal, there is a negligible reduction in voltage emergencies observed,
as shown in Figure 8.9 and Figure 8.10. Thus, a new underlying mechanism
to reduce the slew rate of the processor must be introduced.
Figure 8.9: Desktop voltage emergencies observed across prediction
mechanisms
Figure 8.10: Mobile voltage emergencies observed across prediction
mechanisms
To evaluate the impact of the PDN on the performance of the prediction
mechanisms, three new single-order PDNs were evaluated with the systems.
The new PDN models reduce the variability introduced by the realistic mod-
els described in Chapter 4. The PDNs are all based on the PDN model
studied by Gupta et al. [8] and Reddi et al. [9]. Each of the power
distribution networks is scaled to match the current requirement of the
device. The parameters of the power distribution networks are provided in
Table 8.1 and the corresponding impedance profiles are shown in
Figure 8.11.
Table 8.1: Power Distribution Network Parameters for Controlled Models
Class      L          C          R       Q       ω0
Desktop    3.98 pH    636.6 nF   1 mΩ    10      100 MHz
Laptop     8.36 pH    303.2 nF   1 mΩ    5.25    100 MHz
Mobile     15.91 pH   159.1 nF   1 mΩ    2.5     100 MHz
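As a quick consistency check on Table 8.1, the resonant frequency implied by each L and C pair can be computed directly. The sketch below assumes that the ω0 column reports the resonant frequency of a second-order LC network, f0 = 1/(2π√(LC)), expressed in hertz; that interpretation, the dictionary layout, and the function name are assumptions made for illustration.

```python
import math

# (inductance in henries, capacitance in farads), values from Table 8.1
PDN_PARAMETERS = {
    "Desktop": (3.98e-12, 636.6e-9),
    "Laptop": (8.36e-12, 303.2e-9),
    "Mobile": (15.91e-12, 159.1e-9),
}


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC network: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


freqs_mhz = {
    name: resonant_frequency_hz(l, c) / 1e6
    for name, (l, c) in PDN_PARAMETERS.items()
}

for name, f in freqs_mhz.items():
    print(f"{name}: {f:.2f} MHz")
```

Under this assumption all three controlled PDNs resonate at approximately 100 MHz, so the models differ in characteristic impedance and damping rather than in resonant frequency.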
Figure 8.11: Harvard PDN frequency response plots for the three classes of
processor
After controlling for the power distribution network, the state-based
prediction units effectively use throttling to reduce the number of voltage
emergencies at the expense of performance across all the processor classes
(Figures 8.12 and 8.13). However, the checkpoint and rollback penalty is
set to 100 cycles, which is a very optimistic penalty. If the checkpoint
and rollback mechanism had a greater penalty for a voltage emergency, the
speedup would be expected to improve. The effectiveness of throttling
across systems with the controlled PDN highlights the need to develop new
and improved solutions for mitigating voltage noise on systems with
realistic PDNs.
Figure 8.12: Average improvement in number of VEs across the three
controlled PDN systems
Figure 8.13: Average speedup across the three controlled PDN systems
Interestingly, with the controlled PDN models, the performance of the
sensor-based prediction mechanism is very poor, as shown in Figure 8.12.
The three new PDNs have a three to four times lower DC resistance than
the realistic PDN models discussed in Section 4.1. The lower DC resistance
reduces the ability of the sensor-based approach to detect and mitigate the
voltage emergency with throttling. Each of the PDNs has a phase delay so the
voltage droop is lagging the event that generated the transient, hindering the
ability to issue a throttle in time. In this case, the state-based predictors are





The great variety of present day device form factors and processors leads to a
wide range of voltage emergency rates and methods available to mitigate the
emergencies. Reductions in board space limit the bulk capacitance at each
stage of the PDN, increasing the inductive noise on the voltage supply. To
ensure that the performance of the processors is maintained even though the
PDN designs are constrained by the form factor, new techniques for voltage
emergency prediction and mitigation are required.
To support the development of new prediction and noise mitigation tech-
niques, a new framework for simulating systems was developed. Previously,
to evaluate voltage noise on the supply line in conjunction with complex
power supply designs, a slow but detailed RTL and SPICE simulation had
to be run. Difficult-to-access RTL and long simulation times have pushed
researchers to develop frameworks that combine a simplified mathematical
PDN model with a cycle accurate microarchitectural simulator. However,
the mathematical model of the PDN does not allow for interactions between
architecture level techniques and circuit level techniques. Our Predict-T
framework benefits from the speed and flexibility of a cycle accurate
simulator, and the detail provided by an RTL power distribution network. This
hybrid technique can enable the exploration of interactive mitigation tech-
niques. To evaluate the framework and highlight the need for new mitigation
techniques, we created three PDN designs representative of current mobile,
laptop, and desktop systems.
We characterized the magnitude, rate, and duration of the voltage emer-
gencies observed across these systems. We evaluated the base implementation
of four prior mitigation techniques and proposed improvements to them. We
presented two derived state-based predictive techniques, TAS and
InstPending, that use the decode stage as a litmus test for future voltage emergencies.
These two new techniques were able to perform as well as the prior proposed
solutions. All the mitigation techniques performed well on the laptop class
processor and PDN. However, the mobile and desktop class systems posed a
challenge. All the techniques showed either a marginal improvement in num-
ber of voltage emergencies at the cost of performance degradation or showed
both more voltage emergencies and performance degradation.
Through the evaluation of the six predictive mechanisms, we identified a
need for new mitigation techniques. The mobile and desktop class devices
are immune to the throttle response, and thus new techniques to reduce the
slew rate and magnitude of a load transient are required. Future work should
evaluate collaborative mitigation techniques between both architectural and
circuit level techniques. Another possible path to explore is compiler opti-
mization for voltage noise mitigation. A compiler has a complete view of a
program’s dataflow, whereas the processor’s instruction window has a lim-
ited view. A compiler could reorder instructions to potentially reduce the
slew rate and magnitude of the transients. Finally, at the architecture level
there is more room for improvement in mitigating the DeCoR domino effect
through techniques other than throttling. This work highlights the need for
new techniques to mitigate voltage noise as processors and PDNs diversify
due to shifts in manufacturing technologies and device form factors, and can
serve as a foundation for further research.
REFERENCES
[1] AMD, “AMD EPYC 7742,” AMD, Databrief, 2019. [Online]. Available:
https://www.amd.com/en/products/cpu/amd-epyc-7742
[2] Ampere, “Ampere Altra 64-bit multi-core Arm processor,” Ampere,
Databrief, 2020. [Online]. Available: https://amperecomputing.com/
wp-content/uploads/2020/03/Altra_PB_v0.60_20200227.pdf
[3] T. N. Miller, R. Thomas, X. Pan, and R. Teodorescu, “VRSync: Char-
acterizing and eliminating synchronization-induced voltage emergencies
in many-core processors,” in 2012 39th Annual International Symposium
on Computer Architecture (ISCA), 2012, pp. 249–260.
[4] R. Bertran, A. Buyuktosunoglu, P. Bose, T. J. Slegel, G. Salem,
S. Carey, R. F. Rizzolo, and T. Strach, “Voltage noise in multi-core
processors: Empirical characterization and optimization opportunities,”
in 2014 47th Annual IEEE/ACM International Symposium on Microar-
chitecture, 2014, pp. 368–380.
[5] A. V. Peterchev and S. R. Sanders, “Load-line regulation with estimated
load-current feedforward: Application to microprocessor voltage regu-
lators,” IEEE Transactions on Power Electronics, vol. 21, no. 6, pp.
1704–1717, 2006.
[6] P. S. Shenoy and P. T. Krein, “Power supply aware computing,” in 2010
International Conference on Energy Aware Computing, 2010, pp. 1–4.
[7] E. A. Burton, G. Schrom, F. Paillet, J. Douglas, W. J. Lambert, K. Rad-
hakrishnan, and M. J. Hill, “FIVR — Fully integrated voltage regulators
on 4th generation Intel® Core™ SoCs,” in 2014 IEEE Applied Power
Electronics Conference and Exposition - APEC 2014, 2014, pp. 432–439.
[8] M. S. Gupta, V. J. Reddi, G. Holloway, G.-Y. Wei, and D. M. Brooks,
“An event-guided approach to reducing voltage noise in processors,” in
Proceedings of the Conference on Design, Automation and Test in Eu-
rope, ser. DATE ’09. Leuven, BEL: European Design and Automation
Association, 2009, p. 160–165.
[9] V. J. Reddi, M. S. Gupta, G. Holloway, G. Wei, M. D. Smith, and
D. Brooks, “Voltage emergency prediction: Using signatures to reduce
operating margins,” in 2009 IEEE 15th International Symposium on
High Performance Computer Architecture, 2009, pp. 18–29.
[10] M. D. Powell and T. N. Vijaykumar, “Pipeline damping: A
microarchitectural technique to reduce inductive noise in supply
voltage,” in Proceedings of the 30th Annual International Symposium
on Computer Architecture, ser. ISCA ’03. New York, NY, USA:
Association for Computing Machinery, 2003. [Online]. Available:
https://doi.org/10.1145/859618.859628 p. 72–83.
[11] M. D. Powell and T. N. Vijaykumar, “Pipeline muffling and a priori
current ramping: Architectural techniques to reduce high-frequency
inductive noise,” in Proceedings of the 2003 International Symposium on
Low Power Electronics and Design, ser. ISLPED ’03. New York, NY,
USA: Association for Computing Machinery, 2003. [Online]. Available:
https://doi.org/10.1145/871506.871562 p. 223–228.
[12] D. Milojevic, P. Marchal, E. J. Marinissen, G. Van der Plas, D. Verkest,
and E. Beyne, “Design issues in heterogeneous 3d/2.5d integration,” in
2013 18th Asia and South Pacific Design Automation Conference (ASP-
DAC), 2013, pp. 403–410.
[13] Accellera, “Verilog-AMS language reference manual version 2.4.0,”
Accellera Systems Initiative, User Manual, 2014. [Online].
Available: https://accellera.org/images/downloads/standards/v-ams/
VAMS-LRM-2-4.pdf
[14] E. Alon and M. Horowitz, “Integrated regulation for energy-efficient
digital circuits,” IEEE Journal of Solid-State Circuits, vol. 43, no. 8,
pp. 1795–1807, 2008.
[15] T. Charania, A. Opal, and M. Sachdev, “Analysis and design of on-
chip decoupling capacitors,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 21, no. 4, pp. 648–658, 2013.
[16] S. Bang, M. Cho, P. A. Meinerzhagen, A. Malavasi, M. M. Khellah,
J. W. Tschanz, and V. De, “An all-digital, vmax -compliant, stable, and
scalable distributed charge injection scheme in 10-nm CMOS for fast and
local mitigation of voltage droop,” IEEE Journal of Solid-State Circuits,
vol. 55, no. 7, pp. 1898–1908, 2020.
[17] Z. Shan, C. K. Tse, and S. Tan, “Pre-energized auxiliary circuits for very
fast transient loads: Coping with load-informed power management for
computer loads,” IEEE Transactions on Circuits and Systems I: Regular
Papers, vol. 61, no. 2, pp. 637–648, 2014.
[18] M. S. Gupta, K. K. Rangan, M. D. Smith, Gu-Yeon Wei, and D. Brooks,
“DeCoR: A delayed commit and rollback mechanism for handling induc-
tive noise in processors,” in 2008 IEEE 14th International Symposium
on High Performance Computer Architecture, 2008, pp. 381–392.
[19] V. K. Kalyanam, P. G. Sassone, and J. A. Abraham, “Power prediction
of embedded scalar and vector processor: Challenges and solutions,”
in 2017 18th International Symposium on Quality Electronic Design
(ISQED), 2017, pp. 221–228.
[20] J. Leng, Y. Zu, and V. J. Reddi, “GPU voltage noise: Characterization
and hierarchical smoothing of spatial and temporal voltage noise inter-
ference in GPU architectures,” in 2015 IEEE 21st International Sympo-
sium on High Performance Computer Architecture (HPCA), 2015, pp.
161–173.
[21] V. J. Reddi, S. Kanev, W. Kim, S. Campanoni, M. D. Smith, G. Wei,
and D. Brooks, “Voltage smoothing: Characterizing and mitigating volt-
age noise in production processors via software-guided thread schedul-
ing,” in 2010 43rd Annual IEEE/ACM International Symposium on Mi-
croarchitecture, 2010, pp. 77–88.
[22] R. Zhang, K. Wang, B. H. Meyer, M. R. Stan, and K. Skadron, “Archi-
tecture implications of pads as a scarce resource,” in 2014 ACM/IEEE
41st International Symposium on Computer Architecture (ISCA), 2014,
pp. 373–384.
[23] D. J. Herrell and B. Beker, “Modeling of power distribution systems
for high-performance microprocessors,” IEEE Transactions on Advanced
Packaging, vol. 22, no. 3, pp. 240–248, 1999.
[24] L. Smith, R. Anderson, and T. Roy, “Chip-package resonance in core
power supply structures for a high power microprocessor,” Proceedings
of IPAC ’01, 2001.
[25] M. S. Gupta, J. L. Oatley, R. Joseph, G. Wei, and D. M. Brooks, “Under-
standing voltage variations in chip multiprocessors using a distributed
power-delivery network,” in 2007 Design, Automation Test in Europe
Conference Exhibition, 2007, pp. 1–6.
[26] International Rectifier, “8+0/7+1/6+2 dual output digital multi-phase




[27] Linear Technology, “Ltc1736 5-bit adjustable high efficiency
synchronous step-down switching regulator,” Linear Technology,
Datasheet, 1999. [Online]. Available: https://www.analog.com/media/
en/technical-documentation/data-sheets/1736f.pdf
[28] ON Semiconductor, “8/7/6/5/4/3/2/1 phase buck controller with
pwm vid and i2c interface,” ON Semiconductor, Datasheet,
2017. [Online]. Available: https://www.onsemi.com/pub/Collateral/
NCP81274-D.PDF
[29] Intel Corporation, “Voltage regulator module (VRM)
and enterprise voltage regulator-down (EVRD) 11.1,”




[30] J. Radhakrishnan, J. Ejury, and W. Moussa, “How load-
line will help the application,” Infineon Technologies, Re-
port, 2019. [Online]. Available: https://www.infineon.com/
dgdl/Infineon-DCDC_FPGA_load-line-AN-v01_00-EN.pdf?fileId=
5546d46269e1c019016a7b06d9e53116
[31] Qualcomm, “Qualcomm® Snapdragon™ 865+ 5G mobile
platform,” Qualcomm Corporation, Databrief, 2020. [On-
line]. Available: https://www.qualcomm.com/media/documents/files/
qualcomm-snapdragon-865-5g-mobile-platform-product-brief.pdf
[32] S. Das, P. Whatmough, and D. Bull, “Modeling and characterization of
the system-level power delivery network for a dual-core ARM Cortex-
A57 cluster in 28nm CMOS,” in 2015 IEEE/ACM International Sympo-
sium on Low Power Electronics and Design (ISLPED), 2015, pp. 146–
151.
[33] T. Austin, E. Larson, and D. Ernst, “Simplescalar: An infrastructure
for computer system modeling,” Computer, vol. 35, no. 2, p. 59–67,
Feb. 2002. [Online]. Available: https://doi.org/10.1109/2.982917
[34] D. Brooks, V. Tiwari, and M. Martonosi, “Wattch: A framework
for architectural-level power analysis and optimizations,” SIGARCH
Comput. Archit. News, vol. 28, no. 2, p. 83–94, May 2000. [Online].
Available: https://doi.org/10.1145/342001.339657
[35] J. Renau, B. Fraguela, J. Tuck, W. Liu, M. Prvulovic, L. Ceze,
S. Sarangi, P. Sack, K. Strauss, and P. Montesinos, “SESC simulator,”
January 2005, http://sesc.sourceforge.net.
[36] R. Balasubramonian, A. B. Kahng, N. Muralimanohar, A. Shafiee,
and V. Srinivas, “Cacti 7: New tools for interconnect exploration in
innovative off-chip memories,” ACM Trans. Archit. Code Optim., vol. 14,
no. 2, June 2017. [Online]. Available: https://doi.org/10.1145/3085572
[37] A. B. Kahng, B. Li, L. Peh, and K. Samadi, “Orion 2.0: A power-area
simulator for interconnection networks,” IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 20, no. 1, pp. 191–196,
2012.
[38] E. K. Ardestani and J. Renau, “ESESC: A fast multicore simulator
using time-based sampling,” in International Symposium on High Per-
formance Computer Architecture, ser. HPCA’19, 2013.
[39] Synopsys, “VCS,” Synopsys, Datasheet, 2020. [Online]. Available:
https://www.synopsys.com/verification/simulation/vcs.html
[40] Ansys, “Ansys powerartist comprehensive RTL design-for-power
platform: analyze, debug, reduce,” Ansys, Datasheet, 2020. [Online].
Available: https://www.synopsys.com/verification/ams-verification/
hspice.html
[41] Y. Chen, M. Shintani, T. Sato, Y. Shi, and S. Chang, “Pattern
based runtime voltage emergency prediction: An instruction-aware block
sparse compressed sensing approach,” in 2017 22nd Asia and South Pa-
cific Design Automation Conference (ASP-DAC), 2017, pp. 543–548.
[42] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu,
J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell,
M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, “The gem5
simulator,” SIGARCH Comput. Archit. News, vol. 39, no. 2, p. 1–7, Aug.
2011. [Online]. Available: https://doi.org/10.1145/2024716.2024718
[43] S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen,
and N. P. Jouppi, “McPAT: An integrated power, area, and timing
modeling framework for multicore and manycore architectures,” in
Proceedings of the 42nd Annual IEEE/ACM International Symposium
on Microarchitecture, ser. MICRO 42. New York, NY, USA:
Association for Computing Machinery, 2009. [Online]. Available:
https://doi.org/10.1145/1669112.1669172 p. 469–480.
[44] Synopsys, “HSPICE,” Synopsys, Datasheet, 2020. [Online]. Available:
https://www.synopsys.com/verification/ams-verification/hspice.html
[45] Cadence, “Incisive enterprise simulator,” Cadence,




[46] Cadence, “Spectre simulation platform,” Cadence,
Datasheet, 2020. [Online]. Available: https://www.cadence.
com/content/dam/cadence-www/global/en_US/documents/tools/
custom-ic-analog-rf-design/spectre-simulation-platform-ds.pdf
[47] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge,
and R. B. Brown, “MiBench: A free, commercially representative em-
bedded benchmark suite,” in Proceedings of the Fourth Annual IEEE
International Workshop on Workload Characterization. WWC-4 (Cat.
No.01EX538), 2001, pp. 3–14.
[48] J. L. Henning, “SPEC CPU2006 benchmark descriptions,” SIGARCH
Comput. Archit. News, vol. 34, no. 4, p. 1–17, Sep. 2006. [Online].
Available: https://doi.org/10.1145/1186736.1186737
[49] C. Bienia, “Benchmarking modern multiprocessors,” Ph.D. dissertation,
Princeton University, January 2011.
[50] Intel Corporation, “Intel atom z8000 processor series datasheet
volume 1,” Intel Corporation, Datasheet, 2016. [Online]. Avail-
able: https://www.intel.com/content/www/us/en/products/docs/
processors/atom/z-series/z8000-datasheet-vol-1.html
[51] Intel Corporation, “Intel Core i3-10110u processor,” Intel Corporation,
Databrief, 2019. [Online]. Available: https://www.intel.com/content/
www/us/en/products/processors/core/i3-processors/i3-10110u.html
[52] Intel Corporation, “Intel Core i9-10900t processor,” Intel Corporation,
Databrief, 2020. [Online]. Available: https://www.intel.com/content/
www/us/en/products/processors/core/i9-processors/i9-10900t.html
[53] Intel Corporation, “Intel Core i3-10100 processor,” Intel Corporation,
Databrief, 2020. [Online]. Available: https://www.intel.com/content/
www/us/en/products/processors/core/i3-processors/i3-10100.html
[54] Intel Corporation, “Intel Xeon platinum 8380h pro-
cessor,” Intel Corporation, Databrief, 2020. [Online].
Available: https://www.intel.com/content/www/us/en/products/
processors/core/i9-processors/i9-10900t.html
