Dynamic power management for computing and communication by Wei, Da
© 2019 Da Wei




Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Electrical and Computer Engineering
in the Graduate College of the
University of Illinois at Urbana-Champaign, 2019
Urbana, Illinois
Doctoral Committee:
Professor Pavan Kumar Hanumolu, Co-chair
Professor José Schutt-Ainé, Co-chair
Professor Naresh R. Shanbhag
Assistant Professor Arijit Banerjee
ABSTRACT
Power management is essential in state-of-the-art many-core processor and system-on-chip designs due to the ever-increasing demand for performance and the diminishing benefits from technology scaling. As more processing cores and functional blocks are integrated, aggressive power management is needed to keep chips from overheating, to reduce the power requirements of circuit boards, and to allow data centers to host more machines within their power density limits.
On the communication side, total on- and off-chip bandwidth has been increasing exponentially. However, transceiver energy efficiency has by and large remained constant. Previous studies have explored dynamic voltage and frequency scaling (DVFS) and rapid on/off (ROO) techniques to optimize the power efficiency of transceivers for short-reach, on-board chip-to-chip links. In this thesis, we explore power scaling techniques for on-chip high-speed links. To this end, a 10Gb/s rapid-on/off on-chip link transceiver is presented to demonstrate architecture and circuit techniques that improve energy efficiency at all utilization levels. Fabricated in a 65nm CMOS process, the proposed transceiver uses single-ended signaling with only 0.5μm wire width and spacing and achieves 5Gb/s/μm throughput density. Fast-lock signaling and clocking circuits greatly reduce the power-on time to 17ns. More than 125x effective data rate scaling (10Gb/s to 80Mb/s) is obtained with an energy efficiency degradation of only 1.6x (627fJ/b/mm to 997fJ/b/mm). When the supply voltage is scaled from 1V to 0.7V, the peak data rate scales from 10Gb/s to 6Gb/s and the power scalable range increases to 208x (10Gb/s to 48Mb/s) with an energy efficiency degradation of only 1.2x (627fJ/b/mm to 753fJ/b/mm).
On the computing side, increasingly aggressive power management profiles are deployed on modern processors. However, their effectiveness is limited by the supply droops that large load current steps can cause, which compromise power integrity. To mitigate these droops, we explore circuit and architecture techniques for a fast load transient DC-DC converter that can withstand large load steps without supply droop or overshoot. The proposed converter achieves 89% peak efficiency and less than 8mV droop/overshoot when a 480mA/1ns load step is applied.
ACKNOWLEDGMENTS
I would like to express my gratitude to my advisor, Professor Pavan Kumar Hanumolu. Your insightful vision, continuous encouragement and patient guidance have always inspired me on my journey toward my Ph.D. I admire your total dedication to me and the other members of the research group.
I am also grateful to my co-advisor Professor José Schutt-Ainé and to doctoral committee members Professor Naresh R. Shanbhag and Professor Arijit Banerjee for providing valuable comments and suggestions that helped me enhance my research and this thesis.
I would like to express heartfelt appreciation for Dr. Tejasvi Anand, a former member of
our research group, who spent countless nights in the lab helping me to diagnose the faulty
circuits.
I would also like to extend my appreciation to my talented colleagues, whom I know I can always count on: Junheng (Charlie) Zhu, Nilanjan Pal, Dongwook Kim, Timir Nandi, Ahmed Elmallah, Mostafa Ahmed, Dr. Guanghua Shu, Dr. Woo-Seok Choi, Dr. Mrunmay Talegaonkar, Dr. Saurabh Saxena, Dr. Ahmed Elkholy and Dr. Romesh Kumar Nandwana.
Finally, I would like to express my deepest gratitude to my parents and my wife for their
dedicated love and support throughout this endeavor.
TABLE OF CONTENTS
CHAPTER 1 INTRODUCTION
1.1 Motivation
1.2 Thesis proposal organization
CHAPTER 2 CONVENTIONAL DYNAMIC POWER MANAGEMENT TECHNIQUES
2.1 Power management in on-chip links
2.2 Power management in computing cores
2.3 Summary
CHAPTER 3 A 10GBIT/S/CH RAPID-ON/OFF TRANSCEIVER FOR ON-CHIP INTERCONNECTS
3.1 Introduction
3.2 Proposed architecture
3.3 Transmitter
3.4 Receiver
3.5 Measurement results
3.6 Summary
CHAPTER 4 A 20MHZ NEAR-ZERO DROOP BUCK CONVERTER USING LOAD-ASSISTED CURRENT COMPENSATION
4.1 Introduction
4.2 Proposed architecture
4.3 Building blocks
4.4 Measurement results
4.5 Conclusion
4.6 Alternative compensation correction method
CHAPTER 5 CONCLUSION
5.1 Conclusion
CHAPTER 1
INTRODUCTION
1.1 Motivation
Ever-increasing demand for higher performance, combined with aggressive scaling of CMOS technology, has led to the integration of more and more computing blocks on a single chip. To move data in and out of these computing blocks, the bandwidth of the on- and off-chip communication fabric has also been increasing exponentially. Consequently, the power consumed by these large chips is also increasing. To sustain this growth in both computing and communication applications, aggressive dynamic power management techniques are needed to stay within the power and thermal budgets.
In the context of a communication fabric, conventional low-power techniques such as low-voltage differential signaling (LVDS) usually focus on optimizing efficiency at peak speed. However, these methods ignore the fact that most communication links carry bursty data. For example, DDR memory buses are activated only when a last-level cache (LLC) miss occurs on the processor and memory is accessed; PCI Express buses are activated only when peripherals such as network adapters or graphics cards need to communicate with the CPU. Studies have shown that these buses are idle approximately 85% of the time [1, 2]. Unfortunately, random data must still be transmitted to maintain synchronization between the transmitter and receiver, so that bandwidth is available as soon as a data request arrives. This is mainly due to the slow locking process of conventional low-bandwidth phase-locked loop (PLL) based clock generators. Dynamic voltage and frequency scaling (DVFS) partially mitigates the problem but still suffers from slow frequency locking. Recent research [3, 4] shows that the rapid on/off (ROO) technique can effectively improve the power efficiency of short-reach off-chip communication fabrics by turning off the transceiver and its clock generator once an idle period is detected, and then rapidly powering on and locking the transceiver when data arrives. Although the ROO technique has proven effective for off-chip links, such a technique is still missing for the increasingly demanding on-chip links, which share similar challenges. In addition, the unique environment and requirements of on-chip links bring further challenges. For example, the channel length is usually much longer than the width (> 10000x); together with the close proximity between channels, this results in RC-dominated channels with high loss and high crosstalk. The large number of channels requires area- and power-efficient transmitter (TX) and receiver (RX) architectures, as well as extremely low latency between TX and RX. On the other hand, the ability to co-design the transmitter, the channel and the receiver may enable novel optimizations that are impossible in an off-chip link scenario.
In the context of computing cores, dynamic power management was historically intended only to reduce idle power consumption. However, with integrated DC-DC converters [5], advanced power management techniques such as per-core DVFS and even load-aware power management are becoming popular to reduce active power and/or increase performance. For example, Intel's Turbo Boost technique allows processors to operate at higher frequencies with higher voltage as long as temperature and power consumption are within the design limits. When a single-threaded program is running, the entire power budget can be allocated to the active core, resulting in dramatically increased single-core frequency and performance. The maximum turbo frequency of the Intel i7-8700K processor versus the number of active cores is listed in Table 1.1.
Table 1.1: Maximum turbo frequency vs. number of active cores (Intel i7-8700K).
Number of active cores      1     2     3     4     5     6
Turbo frequency (GHz)     4.7   4.6   4.5   4.4   4.4   4.3
Source: https://www.anandtech.com/show/11859/the-anandtech-coffee-lake-review-8700k-and-8400-initial-numbers/18
Application-aware frequency scaling is an example of fine-grained power management used to improve performance. Instead of being designed for the worst case, Intel Haswell was designed for the nominal case. Its frequency plan is shown in Figure 1.1 (b). When the processor is executing non-AVX instructions, a higher frequency is chosen. However, when heavy-load instructions such as AVX are detected, the processor can throttle its frequency to a lower level to satisfy the timing requirements of the longer critical paths activated by these instructions. Techniques such as Intel's Speed Shift technology allow processors to switch quickly between power states/operating frequencies (shown in Figure 1.1 (a)). However, the positive/negative load steps introduced by these system-level power management techniques may cause large supply voltage droops/overshoots. Thus, a novel DC-DC converter is needed to support such aggressive power management techniques and maximize the potential power savings and performance improvement.
This thesis explores circuit and architecture design techniques, with a focus on dynamic power management for on-chip communication fabrics and computing cores, to further improve the power efficiency of state-of-the-art many-core processors and SoCs.
1.2 Thesis proposal organization
The thesis proposal is organized as follows:
Chapter 2 reviews the basic operation and limitation of conventional power management
techniques for on-chip communication fabrics and computing cores.
Chapter 3 discusses a high-speed, high-throughput-density rapid-on/off on-chip transceiver to illustrate the power efficiency improvement achievable by implementing novel power management on on-chip links.
Chapter 4 discusses a DC-DC converter with fast load transient response that eliminates supply droop/overshoot despite large load current steps.
CHAPTER 2
CONVENTIONAL DYNAMIC POWER MANAGEMENT TECHNIQUES
2.1 Power management in on-chip links
On-chip high-speed links have become vital components for connecting different functional blocks in modern SoCs and many-core processors. For example, to connect all the cores together and ensure cache coherence, the Intel Nehalem-EX architecture introduced a ring bus that contains more than 1000 wires and is able to deliver over 3TB/s peak bandwidth [6] (shown in Figure 2.1 (a)). When the number of cores increased to 18 in Haswell and 24 in Broadwell-DE, a second ring bus was introduced and the frequency was doubled [7], allowing a massive 10+TB/s aggregate on-chip bandwidth (shown in Figure 2.1 (b)). For many-core processors such as the Skylake-SP architecture, a 2D mesh network-on-chip (NoC) is implemented to ensure that all the cores can communicate with each other efficiently, as shown in Figure 2.1 (c) [8].
Not only high-performance processors but also low-power applications can benefit from NoCs. In [9], an LTE baseband IC implemented a 15-router asynchronous NoC to allow run-time reconfiguration to support multiple communication protocols.
Although NoC brings substantial performance improvements, it also creates many chal-
lenges and presents designers with several requirements such as:
• High throughput density (higher Gb/s/µm).
• Low power even while communicating across long distances (lower fJ/bit/mm).
• Maintain high power efficiency under all workload scenarios.
Unfortunately, fulfilling these requirements is challenging. For instance, driving global NoC interconnects, which are often very long (> 5mm), at high data rates results in excessive power consumption. A 5mm global interconnect wire has a unit resistance of 310Ω/cm, a unit capacitance of 2.12pF/cm and a buffer delay of 10ps [10]; the conventional buffer insertion
Figure 2.1: NoC architecture involving [6, 7, 8].
For the 1000-wire ring bus mentioned above, 2500 buffers are needed for a single 5mm segment. Even a highly optimized global bus consumes high power. For example, the on-chip mesh network discussed in [11] consumes 13.2W, accounting for 10% of the processor's total power budget.
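For a sense of scale, the wire parameters quoted above can be plugged into Bakoglu's classic optimal-repeater estimate. This is a minimal sketch, assuming that the 10ps buffer delay can stand in for the repeater's intrinsic R0·C0 product. The text does not give R0 and C0 separately, and it does not say which repeater model produced the 2500-buffer figure.

```python
import math

def wire_rc(r_per_cm, c_per_cm, length_cm):
    """Total resistance (ohm) and capacitance (F) of a uniform RC wire."""
    return r_per_cm * length_cm, c_per_cm * length_cm

def bakoglu_repeater_count(r_wire, c_wire, r0c0):
    """Bakoglu's optimal number of repeater stages,
    k = sqrt(0.4 * R_wire * C_wire / (0.7 * R0 * C0)),
    where R0*C0 is the intrinsic delay product of one repeater."""
    return math.sqrt(0.4 * r_wire * c_wire / (0.7 * r0c0))

# Wire parameters quoted in the text for a 5mm global wire [10].
R, C = wire_rc(r_per_cm=310.0, c_per_cm=2.12e-12, length_cm=0.5)
# ASSUMPTION: the quoted 10ps buffer delay is used as the R0*C0 product.
k = bakoglu_repeater_count(R, C, r0c0=10e-12)

print(f"R = {R:.0f} ohm, C = {C * 1e12:.2f} pF, RC = {R * C * 1e12:.1f} ps")
print(f"unrepeated 50% delay ~ {0.38 * R * C * 1e12:.1f} ps")
print(f"optimal repeater stages per wire ~ {k:.2f}")
```

Under this assumption the estimate comes to roughly three repeater stages per wire. That is the same order as the 2.5 buffers per wire implied by 2500 buffers for 1000 wires. The exact count depends on the buffer model, which this sketch does not try to reproduce.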
Other techniques, such as low-swing signaling [12], power-efficient voltage-mode drivers [13], high-sensitivity receivers that employ power-efficient equalizers [14] and offset-cancelled sense amplifiers [15], are used to lower power consumption. However, these techniques do not consider the bursty and application-dependent nature of data traffic. As a result, they are ineffective at scaling power in proportion to link utilization [16]. In other words, the aforementioned low-power techniques only help reduce active power and are ineffective at reducing idle power. In fact, on-chip links consume nearly the same amount of power regardless of whether the link is active or idle. Consequently, link energy efficiency degrades severely at lower utilization levels [17]. This is particularly true in mobile applications, where data traffic can be very sporadic (short active periods interspersed between long inactive periods). Interconnect power can be partially scaled by clock gating [10]; however, because the clock-generating circuits are always on, such a solution suffers from low energy efficiency at low link utilization. The designs in [10], [18], and [15] also suffer from low throughput density due to large spacing between lanes and wide wires. Techniques such as dynamic voltage and frequency scaling (DVFS) could potentially scale power in proportion to link utilization [19]. However, the time required to scale the supply voltage can be much longer than the data burst interval, which makes DVFS alone ineffective in such use cases.
The solution proposed in Chapter 3 achieves high throughput density at its peak speed
and less than 2x power efficiency degradation when the effective data rate is scaled by more
than 100x, greatly reducing the link’s overall power consumption in practical scenarios.
2.2 Power management in computing cores
The system-level power management methods discussed in Chapter 1 are effective at saving power and increasing performance. However, aggressive DVFS usually creates large load current steps that droop the supply voltage below its tolerance because of the low bandwidth of the DC-DC power converter. Conventionally, the supply voltage is raised under light load so that it does not droop below the target voltage when a light-to-heavy load transition occurs. However, this method dramatically decreases power efficiency under light load, because dynamic power depends quadratically, and leakage power exponentially, on the supply voltage. This is especially problematic for mobile applications, which spend most of their time in the idle state and are sensitive to power efficiency. Many efforts, such as [20], have tried to reduce the droop by increasing the converter's bandwidth with a fast compensator. Unfortunately, they are ultimately limited by the slew rate of the inductor current. Cheng [21] proposed a dedicated compensation path that bypasses the slow inductor to supply extra current to the load. However, only a limited number of predefined compensator strengths are available, so the technique is effective for only a few power-state transitions and specific L-C output filter setups. In addition, detector-based compensation requires a droop to occur before it can compensate, which limits its effectiveness. The clock-data compensation proposed in [22] approaches the problem differently: instead of removing the droop, it reduces the frequency of the processor (the load) in proportion to the estimated droop. However, its effectiveness is limited by inaccurate droop estimation and by a prolonged performance penalty during droops that may last tens of microseconds. An advance notice from the load that informs the converter of an upcoming power-state transition can be used to eliminate the droop completely. The pre-energized inductor method proposed in [23] achieved promising results. Unfortunately, its extra output inductor and capacitor make it unsuitable for mobile applications. In addition, all the prior techniques were developed for analog DC-DC converters that require passive components such as capacitors and resistors. This makes them impossible to implement in the integrated DC-DC converters found within processors or SoCs.
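First-order formulas can put numbers on the two limitations above: the cost of a light-load voltage margin, and the inductor slew-rate limit. The sketch below is illustrative only. The 480mA step is the one quoted for Chapter 4. The nominal voltage, margin, inductance, output capacitance and input/output voltages are hypothetical and are not the parameters of the converter in this thesis.

```python
def dynamic_power_penalty(v_nominal, v_margin):
    """Fractional increase in dynamic power (P ~ C * V^2 * f) when the supply
    is held v_margin above nominal. Leakage, which grows even faster, is ignored."""
    return ((v_nominal + v_margin) / v_nominal) ** 2 - 1.0

def inductor_slew_time(delta_i, inductance, v_in, v_out):
    """Fastest possible ramp of the inductor current by delta_i in an ideal buck
    converter: the full (v_in - v_out) is applied across L (100% duty cycle)."""
    return inductance * delta_i / (v_in - v_out)

def output_droop(delta_i, inductance, c_out, v_in, v_out):
    """First-order droop while the inductor current catches up with a load step:
    the output capacitor supplies the triangular charge deficit 0.5 * delta_i * t_slew.
    ESR, ESL and controller delay are ignored."""
    t_slew = inductor_slew_time(delta_i, inductance, v_in, v_out)
    return 0.5 * delta_i * t_slew / c_out

# HYPOTHETICAL values, chosen only for illustration.
penalty = dynamic_power_penalty(v_nominal=0.8, v_margin=0.05)
t_slew = inductor_slew_time(delta_i=0.48, inductance=100e-9, v_in=1.8, v_out=1.0)
droop = output_droop(delta_i=0.48, inductance=100e-9, c_out=1e-6, v_in=1.8, v_out=1.0)
print(f"dynamic power penalty of a 50 mV margin: {penalty:.1%}")
print(f"inductor slew time: {t_slew * 1e9:.0f} ns, droop: {droop * 1e3:.1f} mV")
```

Even with an ideal, instantly saturating controller, the droop scales with L·ΔI²/(2·C·(VIN - VOUT)). A faster compensator alone cannot remove this term, which is why bypassing the inductor or getting advance notice of the load step is attractive.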
The lack of a DC-DC converter with fast load transient response significantly limits the effectiveness of DVFS and ROO techniques. The study conducted in [24] indicates that even transitions between active power states (P-states) in modern processors take 30 µs or more, let alone transitions from idle/low-power states (C-states) to the fully active state. For example, any real-time application that requires a response time of less than 30 µs forces the CPU to run at its full clock rate even if the actual utilization is low. Consider the P-state table shown in Table 2.1 [25] and assume an actual utilization of 50%. Under these conditions, an ideal system that can change power states instantly could potentially save 36% of the power of a conventional system that has to run at full speed all the time. To better support DVFS and ROO operation, a fast load transient DC-DC converter is proposed in Chapter 4. It achieves less than 8mV droop/overshoot when a 480mA/1ns load current step is applied, potentially enabling more aggressive power management techniques.
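The kind of estimate behind the 36% figure can be sketched as follows. An ideal system instantly selects the slowest P-state that still meets demand, and it is compared against one pinned at the fastest P-state. The P-state table below is HYPOTHETICAL because the values of Table 2.1 are not reproduced here. The model also counts dynamic power (V²·f) only, so it illustrates the method and is not meant to reproduce the 36% result.

```python
def ideal_dvfs_saving(p_states, utilization):
    """Dynamic-power saving of an ideal system that instantly runs at the slowest
    P-state meeting the demand, versus always running at the fastest P-state.

    p_states: list of (frequency_GHz, voltage_V) pairs; power ~ V^2 * f."""
    f_max, v_max = max(p_states)
    demand = utilization * f_max
    f_sel, v_sel = min((f, v) for f, v in p_states if f >= demand)
    return 1.0 - (v_sel ** 2 * f_sel) / (v_max ** 2 * f_max)

# HYPOTHETICAL P-state table (not Table 2.1).
P_STATES = [(3.0, 1.10), (2.4, 1.00), (1.8, 0.90), (1.2, 0.80)]
saving = ideal_dvfs_saving(P_STATES, utilization=0.5)
print(f"ideal saving at 50% utilization: {saving:.0%}")
```

Adding leakage and idle-state power to the model would change the number. The point is only that the saving exists when state transitions are fast enough to follow the workload.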
Table 2.1: P-state table.
2.3 Summary
This chapter introduced basic applications of on-chip links and DC-DC converters and discussed the need for innovative dynamic power management techniques. Prior publications were briefly compared, and their limitations were identified to motivate the solutions presented in the following chapters.
CHAPTER 3
A 10GBIT/S/CH RAPID-ON/OFF TRANSCEIVER
FOR ON-CHIP INTERCONNECTS
3.1 Introduction
It has been demonstrated that rapid-on/off is a promising way to reduce the power consumption of serial links by almost 70% [26, 4, 27, 28, 29, 30]. However, these examples are limited to off-chip interconnects. They also suffer from long power-on times and low energy efficiency in the always-on condition, and they operate at low data rates. A high-speed, power-scalable rapid-on/off on-chip interconnect does not exist.
In this chapter, we present a complete source-synchronous rapid-on/off transceiver for on-chip interconnects that can scale its power down to near zero in accordance with utilization. The proposed 10Gb/s prototype transceiver is implemented in a 65nm CMOS process and features a three-tap FFE based on a capacitive driver. It achieves 5Gb/s/µm throughput density along with 627fJ/b/mm energy efficiency (471fJ/b/mm if the clocking power is amortized among nine data lanes). More importantly, its energy efficiency degrades to only 997fJ/b/mm (768fJ/b/mm assuming nine data lanes) even when the effective data rate is scaled by more than 100x to about 80Mb/s.
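The one-lane and nine-lane figures together fix how the energy splits between the data lane and the shared clock path. The sketch below assumes that every data lane costs the same, that the forwarded-clock and clock-generation energy is simply divided by the number of lanes, and that nothing else changes. That is our reading of "amortized among nine data lanes", not a breakdown reported in the text.

```python
def split_clock_energy(e_one_lane, e_n_lanes, n_lanes):
    """Solve  e_one_lane = e_data + e_clk  and  e_n_lanes = e_data + e_clk / n_lanes
    for the per-lane data energy and the (shareable) clock energy."""
    e_clk = (e_one_lane - e_n_lanes) / (1.0 - 1.0 / n_lanes)
    return e_one_lane - e_clk, e_clk

# Energy efficiencies (fJ/b/mm) quoted above for one and nine data lanes.
peak_data, peak_clk = split_clock_energy(627.0, 471.0, 9)   # at 10Gb/s
low_data, low_clk = split_clock_energy(997.0, 768.0, 9)     # at ~80Mb/s effective
print(f"10Gb/s : data {peak_data:.1f}, clock {peak_clk:.1f} fJ/b/mm")
print(f"80Mb/s : data {low_data:.1f}, clock {low_clk:.1f} fJ/b/mm")
```

Under these assumptions, the shared clock path accounts for roughly 28% of the single-lane energy at the peak rate (about 175.5 of 627 fJ/b/mm). This is why amortizing it across lanes pays off.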
The rest of the chapter is organized as follows. Section 3.2 presents the proposed architecture. Sections 3.3 and 3.4 describe the circuit implementation of the transmitter and receiver, respectively. Experimental results from the test chip are presented in Section 3.5. Key contributions of this work are summarized in Section 3.6.
3.2 Proposed architecture
The block diagram of the proposed transceiver is shown in Figure 3.1. The transmitter consists of parallel PRBS generators of different lengths; a serializer that serializes 16 parallel streams of 625Mb/s data into 10Gb/s data; and a half-rate, capacitively driven CMOS-based driver with three-tap pre-emphasis that launches a 10Gb/s low-swing data signal onto a 0.5µm wide, 5mm long wire (cross section shown in Figure 3.2) using a 5GHz low-jitter clock
Figure 3.2: Channel cross section.
The clock is forwarded to the receiver on a dedicated channel using a differential, capacitively driven output driver. Differential signaling helps minimize duty-cycle distortion. Because the clock lane can be amortized among several data lanes, it incurs insignificant area or power overhead. A dedicated 7-bit phase interpolator (PI) in each transmitter corrects the path mismatch between the data and CLK channels and ensures near-optimal sampling of the data by the clock at the receive end.
Crosstalk limits the maximum achievable data rate and throughput of on-chip interconnects [31]. It can be partially mitigated by twisted routing [10]. Such twisting, however, requires two dedicated metal stacks for the channels as opposed to one, which severely constrains lower-level floor-planning. In this work, a single-ended data lane shielded by a ground lane is used to reduce crosstalk and achieve high throughput density.
To confirm the frequency response and crosstalk performance of the proposed channel, the RLGC parameters of two adjacent channels can be obtained with an electromagnetic solver. The simulation indicates that, compared with the case without ground shielding, the crosstalk of a shielded single-ended line is reduced by 75.4%.
The proposed channel can be compared with a differential channel in the same way. The differential channel configuration with the same throughput density as the proposed channel is shown in Figure 3.3. In this case, line 2 is the victim of line 1, while line 3 is the victim of line 2. The direct coupling between line 1 and line 3 is weak and is ignored for ease of analysis (it is not ignored in the simulation). Again, the simulation shows that the crosstalk of the unshielded differential line is 229% more than the proposed
Figure 3.3: Differential channel configuration.
A physical model of the channel, together with the surrounding dielectric layers, is built in HFSS, and S-parameters are extracted from the physical model by the full-wave solver. The simulation indicates that at the worst-case frequency (5GHz), the signal-to-crosstalk ratio of the shielded single-ended channel is 3dB better than that of the unshielded differential channel at the target throughput density. This is due to the asymmetry between the aggressors on either side of the differential traces. To obtain the same signal-to-crosstalk ratio as the proposed scheme, the unshielded differential channel needs either increased spacing or additional shielding, both of which lower throughput density.
The receiver consists of a broadband data amplifier, two half-rate charge-based samplers, and a 2:16 de-serializer. The samplers are clocked by the amplified and duty-cycle-corrected (DCC) forwarded clock. The de-serializer generates 16 parallel streams of 625Mb/s data from the 10Gb/s data. A parallel PRBS checker verifies the correctness of the recovered data and flags an error signal if any bit is found to be in error.
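The PRBS generator/checker pair used for the link test can be sketched behaviorally. The text says only "PRBS generators of different lengths", so the PRBS7 sequence here (7-stage LFSR, period 127) is an assumption chosen for illustration. The serial model also ignores the 16-way parallelization of the real checker. The checker is self-synchronizing: it seeds its LFSR from the received bits, so no separate alignment step is needed.

```python
def prbs7(n_bits, seed=0x7F):
    """PRBS7 generator (7-stage LFSR, recurrence s[n] = s[n-6] XOR s[n-7]).
    Any nonzero seed gives a maximal-length sequence with period 127."""
    state = seed & 0x7F
    out = []
    for _ in range(n_bits):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        out.append(bit)
        state = ((state << 1) | bit) & 0x7F
    return out

def prbs7_errors(received):
    """Self-synchronizing checker: seed the LFSR with the first 7 received bits,
    then compare every later bit with the LFSR prediction. Returns error count."""
    state = 0
    for bit in received[:7]:
        state = ((state << 1) | bit) & 0x7F
    errors = 0
    for bit in received[7:]:
        predicted = ((state >> 6) ^ (state >> 5)) & 1
        errors += bit != predicted
        state = ((state << 1) | bit) & 0x7F   # feed back the *received* bit
    return errors

tx = prbs7(254)
rx = list(tx)
rx[20] ^= 1                                  # inject a single bit error
print(prbs7_errors(tx), prbs7_errors(rx))
```

Because the received bit is fed back into the LFSR, one channel error is counted three times: once directly and once for each feedback tap it passes. A hardware checker that only raises an error flag, as described above, is unaffected by this multiplication. A bit-error-rate counter built the same way would need to account for it.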
Transceiver power is scaled in proportion to utilization by cycling the transceiver between active and sleep modes. An externally provided signal (WKPTX) indicating inactive periods (lack of traffic) puts the link in sleep mode. In this mode, all transmitter, receiver, and clocking circuits are powered down to bring the link power to near zero (155 µW in the prototype). When WKPTX is asserted high, the link exits the sleep state and enters the active state in the following sequence: (1) the MDLL and PI bias are powered on by the WKPTX signal, which is synchronized to the MDLL reference clock; (2) the line driver is powered on and the common-mode voltage is restored; (3) WKPTX is transmitted to the receiver, where it turns on the bias circuitry of the amplifiers; (4) all the digital circuits are turned on once the MDLL reaches steady state. The power-down process executes the same steps in reverse order. As described later, all circuits are designed to turn on rapidly and minimize exit latency (less than 17ns in the prototype). Near-zero off-state power and very small exit latency help achieve linear power scaling across a very wide range of traffic profiles.
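The following simple traffic model shows why off-state power and exit latency dominate energy efficiency at low utilization. It uses the 155 µW sleep power, the 17ns exit latency, and an active power back-computed from 627fJ/b/mm at 10Gb/s over 5mm. The traffic model itself is HYPOTHETICAL: data arrives in fixed-length bursts at the peak rate, and each burst pays one wake-up at full active power. Power-down time and clock-lane amortization are ignored.

```python
PEAK_RATE_BPS = 10e9      # peak data rate (text)
LENGTH_MM = 5.0           # wire length (text)
P_OFF_W = 155e-6          # sleep-mode power (text)
T_EXIT_S = 17e-9          # exit latency (text)
# Active power back-computed from 627 fJ/b/mm at 10Gb/s over 5mm (~31.35 mW).
P_ON_W = 627e-15 * PEAK_RATE_BPS * LENGTH_MM

def energy_fj_per_bit_mm(utilization, burst_s, p_on=P_ON_W, p_off=P_OFF_W, t_exit=T_EXIT_S):
    """Average energy efficiency (fJ/b/mm) of a rapid on/off link.

    HYPOTHETICAL traffic model: data arrives in bursts of burst_s seconds at the
    peak rate; every burst pays one wake-up of t_exit seconds at full active
    power; the rest of the time the link sleeps at p_off."""
    bursts_per_s = utilization / burst_s
    on_fraction = min(1.0, utilization + bursts_per_s * t_exit)
    p_avg = on_fraction * p_on + (1.0 - on_fraction) * p_off
    bits_mm_per_s = utilization * PEAK_RATE_BPS * LENGTH_MM
    return p_avg / bits_mm_per_s * 1e15

for u in (1.0, 0.1, 0.01, 0.008):
    print(f"utilization {u:6.3f}: {energy_fj_per_bit_mm(u, burst_s=1e-6):7.1f} fJ/b/mm")
```

In this model, energy per bit rises at low utilization only through the wake-up overhead and the sleep-power floor. A link with a microsecond-scale PLL relock or milliwatt-level idle power would degrade far faster under the same traffic.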
We note that an ideal rapid-on/off link should consume zero off-state power and turn on from the idle state instantly. However, this is difficult to achieve in practice because of slow-settling circuits. For instance, clock generators implemented with phase-locked loops take a long time to reach steady state, greatly increasing the time the link takes to turn on.
3.3 Transmitter
3.3.1 Rapid-on/off clock generator
The block diagram of the fast power-on-lock clock generator is shown in Figure 3.4. Since phase-locked loops (PLLs) suffer from sluggish phase locking [28], we employ a multiplying delay-locked loop (MDLL) based clock generator that is capable of almost instantaneous phase locking [26]. The proposed enhancements reduce the MDLL power-on-lock time to two reference cycles (2TREF), compared to three reference cycles for past fast-locking MDLLs [26]. The MDLL consists of a bang-bang phase detector (BBPD), a digital loop filter implemented using an accumulator, a resistive DAC (rDAC), a voltage-controlled oscillator (VCO) and the selection logic. The proposed MDLL generates a 5GHz output clock from a 312.5MHz reference clock. The BBPD measures the phase difference between the reference and feedback clocks in the
Figure 3.4: Proposed MDLL architecture.
accumulated by a 12-bit accumulator clocked at a reduced frequency of 156.25MHz. To reduce the dithering jitter resulting from the bang-bang behavior of the feedback loop, the 4 LSBs of the 12-bit accumulator output are dropped. Note that this also reduces the integral path bandwidth by 16x. The remaining 8 MSBs are fed to the rDAC, which controls the VCO output frequency. Since the MDLL frequency is controlled by an 8-bit Nyquist-rate DAC, the quantization noise is dictated by the DAC step size, and dropping the 4 LSBs does not affect it.
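The effect of dropping the 4 LSBs can be seen in a behavioral model of the digital loop filter: a 12-bit accumulator of ±1 BBPD decisions whose top 8 bits drive the DAC. The saturating accumulator and the mid-scale starting value are assumptions for illustration. The text does not describe overflow handling or the reset value.

```python
def loop_filter_codes(decisions, acc_init=2048, acc_bits=12, dropped_lsbs=4):
    """Integrate +1/-1 BBPD decisions in an acc_bits-wide accumulator and return
    the DAC code after each update. Dropping `dropped_lsbs` LSBs divides the
    integral gain seen by the DAC by 2**dropped_lsbs."""
    acc_max = (1 << acc_bits) - 1
    acc = acc_init
    codes = []
    for d in decisions:
        acc = min(max(acc + d, 0), acc_max)   # saturating accumulator (assumption)
        codes.append(acc >> dropped_lsbs)     # 12-bit accumulator -> 8-bit DAC code
    return codes

ramp = loop_filter_codes([+1] * 16)           # 16 consistent "too slow" decisions
dither = loop_filter_codes([+1, -1] * 8)      # bang-bang limit-cycle dithering
print(ramp[-1], set(dither))                  # one DAC step after 16 decisions; dither hidden
print(set(loop_filter_codes([+1, -1] * 8, dropped_lsbs=0)))   # without truncation
```

A sustained phase error still moves the DAC, one step per 16 decisions, while the ±1 limit cycle stays inside the dropped LSBs. The exception is an accumulator sitting right at a multiple of 16, where the dither can still toggle the DAC code by one step.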
The selection logic replaces every 16th VCO output edge with a reference clock edge. The two voltage-controlled delay lines (VCDLs) at the input of the BBPD cancel the static phase offset resulting from mismatches between the reference and feedback paths. Without these two VCDLs, any delay mismatch would translate into a residual phase error injected into the VCO loop. Because this phase error is injected periodically at the reference frequency, it would produce a large reference spur. The wake-up signal WKPTX, retimed by the reference clock, powers the MDLL on and off.
The VCO is implemented as a four-stage ring oscillator in which only two of the stages are controlled, to reduce the VCO gain. A MUX selects the VCO edge during normal operation and the reference edge during edge injection. The 8-bit output of the accumulator tunes the VCO frequency by varying the delay-cell supply voltage through the rDAC shown in Figure 3.5. A Nyquist-rate rDAC is chosen because it settles faster than current-mode or oversampling DACs [32]. The rDAC is divided into eight banks, each of which contains a parallel combination of MOS transistors [32]. Transistors within each bank are sized to achieve linear transfer
Figure 3.5: rDAC schematic.
versus VCO period step is depicted in Figure 3.6. The period resolution is about 100fs
Figure 3.6: Simulated MDLL period step vs. code.
and the tuning range in the typical process corner is 4.9GHz to 5.2GHz. While process variation is compensated by tuning the load capacitor, the 300MHz tuning range of the rDAC allows the MDLL to operate at 5GHz across a wide range of temperatures and supply voltages. The worst-case peak-to-peak jitter caused by quantization error is 1.6ps.
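The 1.6ps figure is consistent with a simple accounting. Assume the worst-case VCO period error is one 100fs rDAC step, ΔT_q, and that it accumulates over the N = 16 cycles between reference injections. The injected cycle then absorbs the error built up by the N - 1 free-running periods. This derivation is our reading of the numbers, not one given in the text:

```latex
\begin{aligned}
T_{\mathrm{VCO}} &= T_0 + \Delta T_q, \qquad T_{\mathrm{REF}} = N\,T_0,\\
T_{\mathrm{inj}} &= T_{\mathrm{REF}} - (N-1)\,T_{\mathrm{VCO}} = T_0 - (N-1)\,\Delta T_q,\\
J_{\mathrm{pp}} &= T_{\mathrm{VCO}} - T_{\mathrm{inj}} = N\,\Delta T_q = 16 \times 100\,\mathrm{fs} = 1.6\,\mathrm{ps}.
\end{aligned}
```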
While the rDAC enables fast settling of the VCO frequency, it is susceptible to supply noise. On-chip decoupling capacitors can mitigate supply noise during run time. However, supply droops during a turn-on event can significantly affect the locking behavior of the MDLL and increase its settling time. Ideally, the power-on time of the MDLL can be very small (about 1 TREF) [26], but transient settling of the VCO control voltage can increase the locking time to several TREF, as illustrated in Figure 3.7. Since the control voltage is pulled up to VDD during the off state, the VCO runs faster than the target frequency immediately after it is turned on, as depicted by region 1. Increased supply impedance due to bond-wire inductance limits the supply current during the power-on transient. As a result, the MDLL current discharges the on-chip decoupling capacitor and the internal supply voltage droops (see Figure 3.7). This large voltage droop (about 80mV in the case illustrated in Figure 3.8) slows down
Figure 3.7: MDLL control voltage (Vctrl) settling behavior.
When the VCO finally finishes the 16th cycle, the VCO loop is broken by the selection logic and kept open until the next reference edge is injected, so a full reference cycle is lost.
This phenomenon significantly increases the power-on-lock time. To reduce the
power-on-lock time in the presence of supply droops, an architectural solution is proposed: a
programmable divider temporarily decreases N (to N = 13 or 14) during the power-on
process, so that the VCO completes its count in time for the reference edge to be injected.
This reduces di/dt noise and control-voltage ripple. Compared to the fixed-divider-ratio case,
the programmable divider reduces the supply droop by 50% and increases the timing margin
of the selection signal to 0.5ns, which enables the MDLL to lock within 2TREF, as shown in
Figure 3.9. Note that the temporary division ratio cannot be set too small, as it would stop
the oscillator too soon; this drives the control voltage above its target value and increases
the settling time. Moreover, the temporary division ratio depends only on the supply droop
during the fast-on process, which in turn is set by the load current step. Since the load
current step and the parasitic inductance do not vary significantly with process, voltage, or
temperature, the same temporary division ratio can be used across PVT and at all effective
data rates.
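To see why shortening the count helps, the toy model below (a sketch, not the MDLL's actual dynamics) treats the supply droop as stretching every VCO period by a fixed fraction and asks whether the VCO finishes its N-cycle count before the next reference edge. TREF = 3.2ns follows from the 5GHz output and N = 16; the 10% slowdown is a hypothetical value chosen only for illustration.

```python
# Toy timing model of the MDLL power-on race (droop numbers are hypothetical).
T_REF = 3.2e-9          # reference period: 5 GHz / 16 = 312.5 MHz
T_VCO_NOM = T_REF / 16  # nominal VCO period, 200 ps


def finish_time(n_cycles: int, slowdown: float) -> float:
    """Time for the VCO to count n_cycles when every period is stretched by (1 + slowdown)."""
    return n_cycles * T_VCO_NOM * (1.0 + slowdown)


def injection_margin(n_cycles: int, slowdown: float, t_ref: float = T_REF) -> float:
    """Positive: the count ends before the next reference edge, so that edge can be injected.
    Negative: the edge is missed and the loop waits about one more reference period."""
    return t_ref - finish_time(n_cycles, slowdown)


if __name__ == "__main__":
    for n in (16, 14, 13):
        margin = injection_margin(n, slowdown=0.10)  # hypothetical 10% droop-induced slowdown
        print(f"N = {n}: margin = {margin * 1e12:+.0f} ps")
```

With the assumed 10% slowdown, N = 16 misses the reference edge by about 320ps, whereas N = 14 and N = 13 finish about 120ps and 340ps early. A real droop is time-varying, and an overly small N overshoots the control voltage, which this model does not capture.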
Figure 3.8: Simulated MDLL power on behavior.
Figure 3.9: Simulated MDLL power on behavior with programmable divider.
3.3.2 Line driver with capacitive feed forward equalization (FFE)
A voltage-mode driver is more energy efficient than a current-mode driver [33, 28].
Unfortunately, the separate supply rail required to generate low-swing signals complicates
the design of a voltage-mode driver. Because the data rate significantly exceeds the 3-dB
bandwidth of the channel, transmitter-side equalization is necessary. Such equalization can
be performed using a linear feed-forward equalizer (FFE) implemented with a DAC [34].
However, such a DAC requires partitioning the line driver and pre-drivers depending on the




























































Figure 3.10: Block diagram of proposed line driver.
An FFE built around a capacitively driven line driver is an attractive way to break the
trade-off between complexity and energy efficiency present in voltage- and current-mode
drivers [10]. As shown in Figure 3.10, the output swing is governed only by the ratio of the
output capacitance to the channel capacitance, so it can be accurately programmed by
adjusting the output capacitance. Also, the outputs of multiple drivers can be
shorted without creating a short circuit between supply and ground. As depicted in Figure
3.10, a smaller auxiliary driver is shorted with the main driver. By setting the polarity of the
auxiliary driver to positive or negative, the charge induced on the output capacitors can be
reinforced or partially canceled, respectively. Thus, a voltage-mode driver with feed-forward
$$\frac{d[n-1]\,C_{\mathrm{post}} + d[n]\,C_{\mathrm{main}} + d[n+1]\,C_{\mathrm{pre}}}{C_{\mathrm{ch}}} \qquad (3.1)$$
The pulse response of the channel, obtained with the HFSS electromagnetic field solver, is shown in
Figure 3.11. Equalizer coefficients of -0.184 and -0.0179 are used to suppress post-cursor
inter-symbol interference. The simulated transmitted eye with the three-tap FFE is shown in
Figure 3.12. Peak distortion analysis indicates that the FFE improves the worst-case vertical eye
opening by about 58% (from 45.7mV to 72mV).
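The peak-distortion estimate can be sketched in a few lines of Python. The cursor values and the -0.3 post-cursor tap below are hypothetical, not the HFSS pulse response or the coefficients used in this design; normalizing the taps so their magnitudes sum to one models a fixed peak swing, which is an assumption about how the capacitive driver shares its output capacitance among taps.

```python
def convolve(x, h):
    """Full discrete convolution y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def worst_case_eye(h, main_idx):
    """Peak-distortion estimate of the worst-case half eye for +/-1 NRZ data:
    the main cursor minus the sum of the magnitudes of all ISI cursors."""
    isi = sum(abs(v) for i, v in enumerate(h) if i != main_idx)
    return h[main_idx] - isi


def ffe_pulse(pulse, c_pre, c_post):
    """Pulse response after a 3-tap TX FFE with taps (c_pre, 1, c_post).

    The pre tap weights the next symbol d[n+1] and the post tap the previous symbol
    d[n-1]; the main cursor moves one index to the right after convolution.
    """
    taps = [c_pre, 1.0, c_post]
    norm = sum(abs(t) for t in taps)  # fixed peak swing (assumption)
    return convolve(pulse, [t / norm for t in taps])


if __name__ == "__main__":
    pulse = [0.0, 1.0, 0.4, 0.15, 0.05]  # hypothetical cursors, one sample per UI
    print(worst_case_eye(pulse, 1))                      # unequalized
    print(worst_case_eye(ffe_pulse(pulse, 0.0, -0.3), 2))  # with a post-cursor tap
```

For these made-up cursors the worst-case opening rises from about 0.40 to about 0.65 of the unequalized main cursor, even though the normalization lowers the main cursor itself; the same procedure applied to the simulated pulse response is what yields the 45.7mV to 72mV comparison.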
Figure 3.11: Simulated channel pulse response.
The preceding analysis is valid only when the channel is RC-dominant. The on-chip
channel can be analyzed as follows. A basic transmission line model of an on-chip wire is
shown in Figure 3.13.









Figure 3.13: General transmission line model.
Each distributed section contains a series inductor (L) and resistor (R) that come from
the wire, and a shunt conductance (G) and capacitance (C) between the wire and ground that model
the dielectric loss and parasitic capacitance to ground. The voltage along the transmission line






To validate the RC-dominant channel assumption, $|Ri| \gg \left|L\,\partial i/\partial t\right|$ must hold. Consider the
general current-mode driver in Figure 3.14: when the output transitions from high to low, M1
turns on, pulling Voutp to VL and sinking I1 from the supply through RTX and I2 through the channel.
Assuming a transition time of 20% of a bit period T, right after the transition it can be written
















Figure 3.14: General transceiver model.
is much smaller than the channel bandwidth; thus, constant-current charging/discharging
during bit transitions can be assumed. Rch and Lch, extracted with the electromagnetic solver,
are 191.57Ω and 2.72nH, respectively. According to Eq. 3.3, the
maximum data rate for which the RC-dominant assumption remains valid is 14.19Gb/s, 42% higher
than the 10Gb/s target data rate of the proposed transceiver; hence, the RC-dominant assumption holds.
The frequency response of the channel is also extracted by full-wave simulation. The channel
loss at 5GHz is about 16.6dB, which indicates that a reflected wave will be attenuated by 33.2dB
before reaching the far end.
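Eq. 3.3 itself is not reproduced in this copy. A minimal sketch, assuming the criterion reduces to R·I = L·I/(0.2T) (constant-current transitions lasting 20% of the bit period), gives a boundary of about 14.1Gb/s, close to but not identical to the 14.19Gb/s quoted, so Eq. 3.3 may include details this simplification omits. The second helper reproduces the 33.2dB reflection figure.

```python
R_CH = 191.57    # ohm, extracted channel resistance (from the text)
L_CH = 2.72e-9   # H, extracted channel inductance (from the text)
RISE_FRAC = 0.2  # transition time as a fraction of the bit period (from the text)


def rc_boundary_rate(r: float, l: float, rise_frac: float = RISE_FRAC) -> float:
    """Bit rate at which |R*i| equals |L*di/dt|, taking di/dt ~ i / (rise_frac * T).

    Setting R*i = L*i / (rise_frac*T) gives 1/T = rise_frac*R/L; the RC-dominant
    assumption requires the operating bit rate to stay below this value.
    """
    return rise_frac * r / l


def reflection_attenuation_db(one_way_loss_db: float) -> float:
    """A wave reflected at the far end must cross the channel twice more (back and
    forth) before it reappears at the far end, so it sees twice the one-way loss."""
    return 2.0 * one_way_loss_db


if __name__ == "__main__":
    print(f"boundary rate ~ {rc_boundary_rate(R_CH, L_CH) / 1e9:.2f} Gb/s")
    print(f"reflection attenuation ~ {reflection_attenuation_db(16.6):.1f} dB")
```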
3.3.3 DC driver
Since neither the transmitter nor the receiver is terminated, a dedicated DC driver is needed to
set the common-mode voltage Vcm of the data line. To this end, a transistor-based voltage
divider is used. As shown in Figure 3.10, a long-channel PMOS
in series with an NMOS transistor forms the voltage divider. The channel lengths of both the
PMOS and NMOS transistors are programmable with 3-bit resolution. The resistance
is chosen to keep the static current below 10µA per branch. Because the driver's output capacitor
and the DC driver form a high-pass filter, its 3-dB corner frequency is designed to be low enough
to pass at least PRBS-31 data. Note that when the transmitter enters sleep
mode, the output of the driver is pulled high, and the DC line driver should maintain that voltage
level. To achieve this, a second branch is added. When the WKPTX signal is asserted, the
main branch is activated and the common-mode voltage of the channel is kept at Vcm; when it is
de-asserted, the main branch is deactivated and the auxiliary branch is activated to shift
the common-mode voltage of the channel to Vcm + Vswing.
Figures 3.15(a) and (b) show the simulated power-on settling with and without the auxiliary
branch, respectively. Without the auxiliary branch, the turn-on time increases by at least 250ns.
3.4 Receiver
3.4.1 Data amplifier
The channel output is fed into the data amplifier, which is implemented as the resistively
loaded common-source amplifier shown in Figure 3.16. The load resistor is set to 500Ω
to balance gain, bandwidth, and power consumption, and to produce an
output common-mode voltage (800mV) within the input common-mode range of
the following sampler stage. Two 6-bit programmable current sources with a
Figure 3.15: Simulated Vcm settling during power-on process.




















Figure 3.16: Schematic of proposed data amplifier.
full-scale current of 63µA are attached to the output to perform offset cancellation. Monte
Carlo simulations indicate an output-referred offset of ±14.1mV (±3σ), which mandates a
resolution of 1µA to cancel the offset to within 0.5mV. The data amplifiers are
turned off during the idle state by a switch inserted between the input pair and the tail
current source. The switch is sized to minimize kick-back onto the bias node during the
power-on process. Because kick-back is governed by the capacitive voltage divider formed by CP
and CD, CD is made large compared to CP. The reference generator is implemented using
a replica of the DC line driver of the transmitter. It is kept always on to reduce power-on
time, at the cost of about 10µA of idle current per lane.
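The sizing arithmetic in this paragraph can be checked with a short sketch. The 500Ω load, 1µA step, and 6-bit range come from the text; the "p"/"n" side labels and the round-to-nearest policy are hypothetical, and the kick-back helper assumes CP couples the switching node to the bias node, which is loaded by CD.

```python
R_LOAD = 500.0  # ohm, data-amplifier load resistor (from the text)
I_LSB = 1e-6    # A, offset-DAC step (from the text)
N_BITS = 6      # bits per programmable current source (from the text)


def cancellation_code(offset_v: float, r_load: float = R_LOAD, i_lsb: float = I_LSB):
    """Return (side, code) for a given output-referred offset.

    A current i drawn from one output shifts it by i*r_load, so the code is the offset
    divided by the per-LSB step (0.5 mV here), rounded and clipped to the 6-bit range.
    'side' is a hypothetical label for the branch that receives the correction.
    """
    step = i_lsb * r_load
    code = min(round(abs(offset_v) / step), 2 ** N_BITS - 1)
    side = "p" if offset_v > 0 else "n"
    return side, code


def residual_offset(offset_v: float) -> float:
    """Offset left after cancellation (V)."""
    _, code = cancellation_code(offset_v)
    return abs(offset_v) - code * I_LSB * R_LOAD


def kickback_fraction(c_p: float, c_d: float) -> float:
    """Fraction of a switching step that reaches the bias node through the CP/CD divider."""
    return c_p / (c_p + c_d)


if __name__ == "__main__":
    print(cancellation_code(14.1e-3), residual_offset(14.1e-3))
```

The full-scale current of 63µA gives 31.5mV of correction per side, comfortably covering the ±14.1mV (3σ) offset, and making CD large relative to CP drives the kick-back fraction toward zero.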
3.4.2 Clock amplifier
The clock channel’s output is fed into the clock amplifier, which has an architecture similar to
that of the data amplifier. A source degeneration capacitor is added between two differential
branches to reduce the impact of common-mode mismatch on the output clock duty cycle.
The output of the clock amplifier is AC-coupled through a 220fF metal-oxide-metal capacitor and
fed to a self-biased inverter that generates a rail-to-rail clock signal.
3.5 Measurement results
The prototype transceiver is implemented in 65nm CMOS process; the die photo is shown
in Figure 3.17 and the area breakdown is shown in Figure 3.18.
Note that the circuit area scales with technology, but the channel footprint cannot be
scaled if the same channel frequency response is to be maintained. The die is packaged in a
10mm × 10mm 88-pin plastic QFN package. The critical supplies of the MDLL, PI, and line
drivers are carefully designed to avoid supply voltage droops caused by the large current
step during a turn-on event. Because of the high supply sensitivity of the MDLL, its supply
inductance is reduced by dedicating four bond pads (∼1nH supply inductance) to the
MDLL supply, while all other supplies use two bond pads (∼2nH supply inductance). A
large number of pads are used to reduce the impact of the long bond wires of the chosen QFN
package. Off-chip decoupling capacitors and damping resistors are used to prevent supply
ringing. The measured performance of the transceiver in always-on and rapid-on/off conditions
is presented in the following sections.
Figure 3.17: Die photo.
MDLL: 36413 µm²
PRBS generator: 6589 µm²
Serializer: 7718 µm²
CLK driver: 5236 µm²
Data driver: 7683 µm²
PI: 14287 µm²
CLK amplifier: 3110 µm²
Data amplifier: 4649 µm²
Sampler: 2349 µm²
De-serializer: 23767 µm²
PRBS checker: 8097 µm²
Figure 3.18: Area breakdown.
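The breakdown can be totaled to see where the area goes. This sketch sums only the blocks listed in Figure 3.18, so the result may not equal the full die area: the listed blocks add up to about 0.12mm², and the MDLL and de-serializer together take roughly half of it.

```python
# Block areas in um^2, copied from the Figure 3.18 breakdown.
AREAS_UM2 = {
    "MDLL": 36413, "PRBS generator": 6589, "Serializer": 7718,
    "CLK driver": 5236, "Data driver": 7683, "PI": 14287,
    "CLK amplifier": 3110, "Data amplifier": 4649, "Sampler": 2349,
    "De-serializer": 23767, "PRBS checker": 8097,
}


def area_shares(areas: dict) -> tuple:
    """Return the total area and each block's percentage of it."""
    total = sum(areas.values())
    return total, {name: 100.0 * a / total for name, a in areas.items()}


if __name__ == "__main__":
    total, shares = area_shares(AREAS_UM2)
    for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:15s} {pct:5.1f} %")
    print(f"total: {total} um^2")
```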
3.5.1 Always-on measurements
The measured output jitter of the MDLL operating at 5GHz is shown in Figure 3.19.
Figure 3.19: Measured MDLL jitter at 5GHz output frequency.
The proposed MDLL achieves 1.3ps RMS jitter and 11.7ps pk-pk jitter. The phase noise
measurement result is shown in Figure 3.20.
The measured reference spur is -47.8dBc, and the largest in-band spur is -41.7dBc, as shown in
Figure 3.21. The BER bathtub curve of the transceiver operating at 10Gb/s with PRBS-31
data, with and without FFE, is shown in Figure 3.22. These data are
obtained by synchronizing one sub-rate output of the 1:8 deserializer with a Tektronix
BSA260C BERT. The sampling position in the data eye was swept by adjusting the transmitter
PI code. The bathtub curve indicates that the FFE improves the horizontal eye opening
by 280%, from 0.1UI to 0.38UI. The PI code and FFE ratio are then set to maximize the eye
opening for the best BER and retained throughout the remaining measurements.
A similar method is used to obtain the statistical data eye. In addition to
adjusting the sampling instant, the sampler's reference voltage is also varied. Figure 3.23 shows
the receiver data eye with and without FFE. It confirms that the FFE increases the horizontal
opening by 280% and the vertical opening by 133% (from 30mV to 70mV), demonstrating the
effectiveness of the proposed FFE scheme.
Figure 3.20: Measured MDLL phase noise plot at 5GHz.

[Spectrum analyzer markers: carrier at 5.000GHz, -15.92dBm; -47.75dB at -312.525MHz offset; -41.71dB at -156.275MHz offset; -50.64dB at +312.500MHz offset.]
Figure 3.21: Measured MDLL output voltage spectrum at 5GHz.
Figure 3.22: Measured transceiver bathtub curve.
Figure 3.23: Measured receiver eye diagram.
3.5.2 Rapid-on/off measurements
Before turn-off, the pre-calibrated PI control code, RX amplifier mismatch control code, FFE
ratio, and MDLL frequency code are stored digitally and retained during the off state. These
control codes are restored at turn-on to minimize the wake-up time. This assumes no
temperature drift during the power-off state; if the temperature changes, either the
duration of the off time must be limited or lookup-table-based compensation [4] must
be used.
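The store-and-restore policy can be sketched as follows. The class, field names, and the 5°C drift window are hypothetical; the text only states that the codes are kept digitally during the off state and that the off time must be limited (or a lookup table used) if the temperature changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CalState:
    """Calibration codes retained while the link is off (field names are hypothetical)."""
    pi_code: int
    rx_offset_code: int
    ffe_ratio: int
    mdll_freq_code: int


class RapidOnOffCal:
    """Store codes at turn-off and restore them at turn-on.

    max_drift_c is a hypothetical temperature window inside which the stored codes
    are assumed valid; outside it the caller must recalibrate or apply a
    lookup-table correction instead of reusing the codes.
    """

    def __init__(self, max_drift_c: float = 5.0):
        self.max_drift_c = max_drift_c
        self._saved: Optional[CalState] = None
        self._saved_temp_c: Optional[float] = None

    def turn_off(self, state: CalState, temp_c: float) -> None:
        self._saved, self._saved_temp_c = state, temp_c

    def turn_on(self, temp_c: float) -> Optional[CalState]:
        """Return the stored codes, or None if they must not be reused."""
        if self._saved is None or self._saved_temp_c is None:
            return None
        if abs(temp_c - self._saved_temp_c) > self.max_drift_c:
            return None
        return self._saved
```

Reusing the stored codes skips recalibration entirely at wake-up, which is what allows the power-on time to be set by the MDLL and checker rather than by calibration loops.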
Figure 3.24: Measured transceiver on/off behavior.
Figure 3.24 shows the measurement setup and the on/off performance of the transceiver.
PRBS-31 data is transmitted through the on-chip channel, recovered by the receiver, and
examined by the on-chip PRBS checker. Approximately 3.6 billion on/off transitions are
captured in these measurements. The error signal generated by the PRBS checker is
captured by a high-speed real-time oscilloscope, which shows that it goes low within 17ns
(170 bits); that is, the transceiver operates error-free 17ns after the power-on event. Of
the 17ns power-on time, the MDLL needs 1ns to start and two reference injections (6.4ns)
to settle. This represents a 33% improvement over prior work [26], thanks to the proposed
programmable divider. The PRBS checker seeding latency is roughly 7 to 9ns. We believe the
remaining power-on time is due to the slow common-mode settling of the channel.
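The power-on budget can be tallied directly. This sketch assumes the phases occur back to back; every duration comes from the text (TREF = 3.2ns, so two injections take 6.4ns).

```python
BIT_RATE = 10e9  # b/s
T_REF = 3.2e-9   # one reference period (two injections = 6.4 ns, from the text)


def power_on_budget(t_total=17e-9, t_mdll_start=1e-9, n_injections=2,
                    t_seed=(7e-9, 9e-9)):
    """Split the measured power-on time into the parts quoted in the text.

    Returns the power-on time in bit periods and the (min, max) time that remains
    after MDLL start-up, MDLL settling, and PRBS-checker seeding; the text attributes
    this remainder to common-mode settling of the channel.
    """
    t_mdll = t_mdll_start + n_injections * T_REF
    remainder = (t_total - t_mdll - t_seed[1], t_total - t_mdll - t_seed[0])
    return round(t_total * BIT_RATE), remainder


if __name__ == "__main__":
    bits, (lo, hi) = power_on_budget()
    print(f"{bits} bits; unexplained {lo * 1e9:.1f} to {hi * 1e9:.1f} ns")
```

Under this accounting the MDLL consumes 7.4ns, and roughly 0.6 to 2.6ns is left for channel common-mode settling.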
The effective data rate is scaled by changing the duty cycle of the WKPTX signal, and the
resulting power-scalable behavior of the transceiver is illustrated in Figure 3.25. More than
125x effective data rate scaling (10Gb/s to 80Mb/s) is achieved with an energy efficiency
degradation of only 1.6x (627fJ/b/mm to 997fJ/b/mm). When the supply voltage is scaled
from 1V to 0.7V, the peak data rate ranges from 10Gb/s to 6Gb/s, and the power-scalable
range increases to 208x (10Gb/s to 48Mb/s) with an energy efficiency degradation of only 1.2x
(627fJ/b/mm@1V to 753fJ/b/mm@0.7V). The wake-up signal path contains only a few
minimum-sized inverters operating at low frequency (several MHz), which consume negligible
power.
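A first-order duty-cycling model shows why the efficiency degrades so little. Assuming the link draws its full 10Gb/s power (627fJ/b/mm over the 5mm channel, about 31.35mW) while on and the 0.155mW off-state power from Table 3.1 while off, and ignoring wake-up transients, the model gives about 1011fJ/b/mm at 80Mb/s, in the same range as the measured 997fJ/b/mm. It is a sketch under those assumptions, not the measured characteristic.

```python
E_ON_FJ_PER_BIT_MM = 627.0  # total energy efficiency at 10 Gb/s (Table 3.1)
CHANNEL_MM = 5.0            # channel length (Table 3.1)
PEAK_RATE = 10e9            # b/s
P_OFF_W = 0.155e-3          # off-state power (Table 3.1)

P_ON_W = E_ON_FJ_PER_BIT_MM * 1e-15 * CHANNEL_MM * PEAK_RATE  # about 31.35 mW


def efficiency_fj_per_bit_mm(effective_rate: float) -> float:
    """Energy per bit per mm when the link is on for a fraction d = rate/peak of the time.

    The link burns P_ON_W while on and P_OFF_W while off; wake-up transients are
    ignored (assumption).
    """
    d = effective_rate / PEAK_RATE
    p_avg = d * P_ON_W + (1.0 - d) * P_OFF_W
    return p_avg / effective_rate / CHANNEL_MM * 1e15


if __name__ == "__main__":
    for rate in (10e9, 1e9, 80e6):
        print(f"{rate / 1e6:8.0f} Mb/s: {efficiency_fj_per_bit_mm(rate):7.1f} fJ/b/mm")
```

The model makes the design target explicit: the off-state power, not the active energy per bit, sets how fast the efficiency degrades at low utilization.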
The performance of the transceiver is compared to the state-of-the-art in Table 3.1. To
the best of our knowledge, the proposed transceiver achieves the shortest turn-on time, the
lowest off-state power, and nearly constant energy efficiency.
Table 3.1: Performance comparison of proposed transceiver.
                                   This work     JSSC’15 [10]   IITC’07 [36]
Link data rate (Gb/s)              10            10             10
Channel length (mm)                5             6              5
Technology                         65nm          65nm           90nm
Throughput density (Gb/s/µm)       5             2.1            0.5
TX swing (V)                       0.45          N/A            0.12
Equalization                       3-tap FFE     Pre-emphasis   Passive
Fast on/off functionality          Yes           Yes            No
Off-state power (mW)               0.155         4.5            N/A
Energy eff. (fJ/b/mm)(1)           148           48             54
Total energy eff. (fJ/b/mm)(2)     627(3)        N/A            N/A
Total energy eff. (fJ/b/mm)(4)     472(4,3)      255            N/A
Channel loss at 5GHz (dB)          16.6(5)       12             < 8
Energy eff./unit loss (fJ/b)(6)    51.1          96.5           N/A
(1) TX and RX power only.
(2) Total power including SerDes, pattern generation, clocking, TX and RX for one data lane.
(3) The energy efficiency is traded off for higher throughput density.
(4) Assuming nine data lanes share the clocking power.
(5) Estimated channel loss.
(6) Calculated as (energy per bit over the channel) / (channel loss in linear scale).
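Footnote 6 can be evaluated directly. Reading "linear scale" as the power ratio 10^(dB/10), which reproduces the JSSC’15 entry (255fJ/b/mm over 6mm and 12dB gives 96.5fJ/b), is an interpretation rather than something the table states. For this work the rounded 16.6dB loss gives about 51.6fJ/b, slightly above the tabulated 51.1; a more precise loss value would plausibly account for the gap.

```python
def energy_per_unit_loss(eff_fj_per_bit_mm: float, length_mm: float, loss_db: float) -> float:
    """Energy per bit over the whole channel (fJ/b) divided by the channel loss
    expressed as a linear power ratio, 10**(loss_db / 10)."""
    energy_per_bit_fj = eff_fj_per_bit_mm * length_mm
    return energy_per_bit_fj / 10 ** (loss_db / 10)


if __name__ == "__main__":
    print(f"JSSC'15  : {energy_per_unit_loss(255, 6, 12):.1f} fJ/b")
    print(f"This work: {energy_per_unit_loss(472, 5, 16.6):.1f} fJ/b")
```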
Another way to further decrease the exit latency is to keep the MDLL always on. In
this case, the exit latency can be reduced to below 10ns, but the off-state power increases from
155µW to 6.4mW. A 2x reduction in exit latency thus costs a 41x increase in off-state
power, which severely degrades the overall power efficiency.




On-chip interconnects used in NoC applications must deliver high bandwidth with low com-
munication latency and high power efficiency. Because of the bursty traffic, the utilization
of these on-chip interconnects varies with application. Therefore, it is paramount to main-
tain excellent power efficiency across all utilization levels. However, the power efficiency of
conventional interconnects severely degrades at low utilization levels because they consume
large power even when they are idle. This work presents techniques to implement power
efficient transceivers for on-chip interconnects that can eliminate idle power almost entirely
thus achieving energy proportional operation.
Fabricated in 65nm CMOS technology, the proposed 10Gb/s transceiver delivers 5Gb/s/µm
throughput density and a power-on-lock time of less than 17ns, thanks to the programmable
divider that allows the MDLL to lock within 2TREF. In rapid-on/off mode, more than 125x
effective data rate scaling (10Gb/s to 80Mb/s) is achieved with an energy efficiency degradation
of only 1.6x (627fJ/b/mm to 997fJ/b/mm). When the supply voltage is scaled from 1V to
0.7V, the peak data rate ranges from 10Gb/s to 6Gb/s and the power-scalable range increases to




A 20MHZ NEAR-ZERO DROOP BUCK
CONVERTER USING LOAD-ASSISTED CURRENT
COMPENSATION
4.1 Introduction
Digitally controlled DC-DC converters are very well suited for monolithic integration within
large digital systems such as processors and systems-on-chip [37]. They benefit from
technology scaling, as they obviate the need for passives (resistors/capacitors) and are less
susceptible to transistor imperfections. Furthermore, the ability to dynamically reconfigure the
digital compensator offers flexibility in setting the loop response and helps keep the
converter stable under component (L/C) variations [37]. However, they require high-resolution
analog-to-digital converters (ADCs) and digital pulse-width modulators (DPWMs) that complicate
the design and degrade efficiency. A common remedy is to reduce the compensator loop
bandwidth to a small fraction of the switching frequency, which relaxes the ADC/DPWM
speed and resolution requirements. However, this severely degrades the
converter transient response. This trade-off is especially problematic in modern processors
that rapidly transition between power states [38] to save system power. Large load
current steps during power-state transitions cause excessive output voltage droop that
degrades processor performance [39]. Therefore, there is a need for integrated
DC-DC converters with excellent load transient response.
Many recent efforts to improve load transient response are based on increasing the
bandwidth of analog controllers [20, 5]. Although effective, these techniques
cannot be applied to digital controllers because of the conflicting bandwidth requirements
described earlier. As a result, the transient performance of digitally controlled converters
lags far behind that of their analog counterparts. In view of these drawbacks,
we seek to improve the load transient response of digitally controlled DC-DC converters by
leveraging feed-forward techniques rather than classical feedback approaches that require
wide-bandwidth controllers. To this end, we present robust and accurate compensation
techniques that nearly eliminate the voltage droop by leveraging power-state transition
information from the processor and the reconfigurability of the digital controller. The
prototype buck converter achieves near-zero droop/overshoot (less than 8mV) in response to a
480mA/1ns load step and employs safeguards to limit performance degradation when the
processor provides false information.
The rest of this chapter is organized as follows. Section 4.2 presents the proposed droop
compensation scheme. Section 4.3 describes the circuit implementation of the converter. Experimental
results from the test chip are presented in Section 4.4. The key contributions of this work are
summarized in Section 4.5.
4.2 Proposed architecture


























































Figure 4.1: Block diagram of the proposed DC-DC converter.
analog-to-digital converter (LADC) that quantizes the error voltage (VE = VOUT − VREF) into
a digital word, DE, which drives a digital proportional-integral-derivative (PID) compensator
implemented using a fixed-point algorithm. Using multiple high-frequency clock phases
provided by the clock generator, the DPWM unit converts the PID compensator's output code
into a duty-cycle-modulated signal that drives the power FETs, MP and MN. The DPWM uses
a hybrid (counter + multi-phase selector) architecture [40] and achieves better than 0.4%
duty-cycle resolution. The proposed droop compensator detects the droop from the LADC
output, combines it with the estimate of the load step provided by the smart load
(i.e., the processor), and injects a compensation current (IC) into the output node to prevent
voltage droops. A free-running ring oscillator provides the sampling clock to the LADC and a
16-phase 320MHz clock to the DPWM, which sets the converter switching frequency to FSW =
20MHz.
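The structure of the fixed-point PID is not detailed here. As an illustration only, the incremental (velocity-form) PID below is one common way to implement a digital compensator with integer arithmetic; the coefficients, the 8 fractional bits, and the 8-bit output range (chosen to match the 8-bit DPWM) are assumptions, not values from this design.

```python
class FixedPointPID:
    """Incremental (velocity-form) PID with integer arithmetic.

    acc holds the controller output with `frac_bits` fractional bits:
        acc[n] = acc[n-1] + a0*e[n] + a1*e[n-1] + a2*e[n-2]
    and its integer part (acc >> frac_bits) is the DPWM duty code.
    Coefficients and word lengths are hypothetical placeholders.
    """

    def __init__(self, a0: int, a1: int, a2: int, frac_bits: int = 8,
                 u_min: int = 0, u_max: int = 255, u_init: int = 0):
        self.a0, self.a1, self.a2 = a0, a1, a2
        self.frac_bits = frac_bits
        self.acc_min = u_min << frac_bits
        self.acc_max = u_max << frac_bits
        self.acc = u_init << frac_bits
        self.e1 = 0  # e[n-1]
        self.e2 = 0  # e[n-2]

    def step(self, e: int) -> int:
        """e is the signed regulation error in LADC LSBs, taken as VREF - VOUT (i.e. -DE)."""
        self.acc += self.a0 * e + self.a1 * self.e1 + self.a2 * self.e2
        self.acc = max(self.acc_min, min(self.acc_max, self.acc))  # clamp doubles as anti-windup
        self.e2, self.e1 = self.e1, e
        return self.acc >> self.frac_bits


if __name__ == "__main__":
    pid = FixedPointPID(a0=512, a1=-384, a2=0, u_init=128)
    print([pid.step(e) for e in (0, 1, 1, 1, 1, 1)])
```

Keeping fractional bits in the accumulator lets small errors integrate over several cycles before they move the 8-bit duty code.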
Figure 4.2 shows the model of the converter during compensation mode and its operation















































Figure 4.3: Proposed digital droop compensator and waveforms illustrating its behavior.
The onset of droop is indicated by the START signal, which goes high TADV seconds (about
3ns in this work) before the load step occurs. The compensation current, IC, is set to ÎSTEP by
pre-loading the accumulator in the ramp generator with the desired DCOMP, and the duty
cycle is set to 100% to maximize the inductor current slew rate, SRL, before the
positive/negative load step occurs. IC is then ramped down to zero at the target slew rate,
KT, such that IC + IL = ILOAD. The compensation phase ends when IC reaches zero, at
which time a new duty cycle corresponding to the new load current is programmed into the
PID compensator to minimize the droop.
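The slopes in Table 4.1 appear to follow from the inductor slew at 100% duty, (VIN − VOUT)/L with VIN = 1.8V and VOUT = 1.0V: the actual 0.6µH gives KT ≈ 1.33×10⁶ A/s and the nominal 0.5µH gives K0 = 1.6×10⁶ A/s. A sketch of the compensation ramp under that assumption:

```python
V_IN = 1.8   # V (measurement section)
V_OUT = 1.0  # V


def inductor_slew(l_henry: float, v_in: float = V_IN, v_out: float = V_OUT) -> float:
    """Inductor current slope at 100% duty, (VIN - VOUT) / L; the ramp rate K_T must match it."""
    return (v_in - v_out) / l_henry


def ramp_duration(i_step: float, k: float) -> float:
    """Time for I_C to ramp from the estimated step down to zero at slope k."""
    return i_step / k


def comp_current(t: float, i_step: float, k: float) -> float:
    """I_C(t) during compensation, with t measured from the load step."""
    return max(i_step - k * t, 0.0)


if __name__ == "__main__":
    for label, l in (("actual 0.6 uH", 0.6e-6), ("nominal 0.5 uH", 0.5e-6)):
        k = inductor_slew(l)
        print(f"{label}: K = {k:.3g} A/s, ramp = {ramp_duration(0.5, k) * 1e9:.1f} ns")
```

A 500mA step ramps out in 375ns with the true slope but in 312.5ns with the slope computed from the nominal inductor, which is why an L/C estimation error leaves IC + IL short of ILOAD before the inductor has caught up.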
The effectiveness of the proposed compensation scheme depends on the accuracy of the
estimated load current step timing (TADV), load current step size (ÎSTEP), and slew rate (KT);
the last depends on the poorly controlled output filter components (L/C). While the PID controller
would eventually compensate for these imperfections, the resulting voltage droop can be
excessive. For example, a 20% error in the slew rate or load step estimate can result in a droop
of 20mV or 50mV, respectively, with the parameters shown in Table 4.1. To avoid this
droop performance degradation, the load current and slope estimates are updated during

































Figure 4.4: Compensation behavior w/ on-chip compensation monitor algorithm.
Consider the scenario in which the inductance (L) is underestimated and the capacitance (C)
is overestimated, which makes the compensation current slope greater than the target
slope (K0 > KT). Under this condition, as depicted in Figure 4.4, when a positive load step
occurs at time t0, VOUT droops as a result of imperfect compensation and reaches one LSB
of the LADC at time t1, which can be calculated as follows:
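(The original expression is not legible in this copy. Assuming the mismatch current (K0 − KT)(t − t0) flows entirely into COUT while the inductor current ramps at KT and IC falls at K0, a closed form consistent with the 189.7ns quoted below, using COUT = 0.8µF and VLSB = 6mV, is:)

$$V_D(t) = \frac{(K_0 - K_T)(t - t_0)^2}{2\,C_{OUT}}, \qquad t_1 - t_0 = \sqrt{\frac{2\,C_{OUT}\,V_{LSB}}{K_0 - K_T}}$$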








Using the parameters in Table 4.1, t1 is calculated to be
189.7ns. At this point, the LADC output changes, indicating the presence of droop, and can
therefore be used to decrease the slope from K0 to K1 = 0.93×10⁶ A/s. Also, note that at
time t1 the compensation current IC has reached 196.5mA, while the target compensation current
IT should be 247.1mA. Using the updated slope, the time at which IC reduces to 0 can be





$$\int_{0}^{t_3-t_1}\Big[(I_T - K_T t) - (I_C - K_1 t)\Big]\,dt = \frac{(K_1 - K_T)(t_3 - t_1)^2}{2} + (I_T - I_C)(t_3 - t_1) \qquad (4.2)$$
Solving the above equation yields VD1 = 2.2mV, leading to a total residual voltage droop
(VDRES) of 8.2mV (0.82% of VOUT) after compensation. The maximum voltage droop
(VDMAX) occurs when IC = IT at t2 = 316.2ns; calculated similarly using
Eq. 4.2, it is 10.3mV. This indicates that the proposed compensation monitor can
achieve near-optimal results even if the output capacitor and inductor values are off by 20%.
Table 4.1: Numeric values used in simulation.
Symbol   Description                                  Value
COUT     Output capacitance (actual)                  0.8µF
CNOM     Output capacitance (nominal)                 1µF
LOUT     Output inductance (actual)                   0.6µH
LNOM     Output inductance (nominal)                  0.5µH
KT       Compensation current slew rate (target)      1.33×10⁶ A/s
K0       Compensation current slew rate (initial)     1.6×10⁶ A/s
VLSB     ADC resolution                               6mV
ISTEP    Load current step                            500mA
IT       Initial compensation current (target)        500mA
IC       Initial compensation current (estimated)     500mA
Since VOUT changes by less than 1% during the compensation process, the inductor current
IL can be accurately approximated as

$$I_L(t) \approx I_L(t_0) + \frac{V_L}{L}\,(t - t_0) \qquad (4.3)$$

Thus, at the end of compensation (t = t3), the inductor current increment is
$I_L(t_3) - I_L(t_0) = \frac{0.8}{0.6\times10^{-6}} \times (400.2\times10^{-9}) \approx 533\,\mathrm{mA}$, which is 6.06% higher than the target
value. If the slope error is not compensated, t3 = 312.5ns, the VOUT droop equals 16mV, and
the inductor current increment is only 416mA, 16.8% smaller than the target value.
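The two-phase calculation can be reproduced numerically. This sketch assumes, as in the text, that IC and IT both start at 500mA at t0 = 0, that the mismatch current charges only COUT, and that K0 and KT are (VIN − VOUT)/L for the nominal and actual inductors. It reproduces t1, IC(t1), IT(t1), and the uncorrected 16mV/416mA case; its residual and peak droops (about 8.1mV and 10.0mV) land slightly below the quoted 8.2mV and 10.3mV, presumably because of rounding in intermediate values.

```python
import math

# Parameters from Table 4.1; slopes follow from (VIN - VOUT) / L with VIN - VOUT = 0.8 V.
C_OUT = 0.8e-6      # actual output capacitance (F)
V_LSB = 6e-3        # LADC LSB (V)
I_STEP = 0.5        # load step, equal to the initial compensation current (A)
K_T = 0.8 / 0.6e-6  # true inductor slew with L_OUT = 0.6 uH (A/s)
K_0 = 0.8 / 0.5e-6  # initial ramp slope computed from L_NOM = 0.5 uH (A/s)
K_1 = 0.93e6        # corrected slope quoted in the text (A/s)


def with_monitor(k0=K_0, k1=K_1, kt=K_T, c=C_OUT, v_lsb=V_LSB, i_step=I_STEP):
    # Phase 1: I_C falls at k0 while I_L rises at kt; the mismatch (k0 - kt)*t discharges C.
    t1 = math.sqrt(2 * c * v_lsb / (k0 - kt))
    ic1 = i_step - k0 * t1  # compensation current at t1
    it1 = i_step - kt * t1  # target compensation current at t1

    def extra_droop(dt):
        # Eq. 4.2 divided by C: extra droop accumulated dt seconds after t1.
        return ((k1 - kt) * dt ** 2 / 2 + (it1 - ic1) * dt) / c

    dt3 = ic1 / k1                  # phase 2 ends when I_C reaches zero
    dt2 = (it1 - ic1) / (kt - k1)   # I_C = I_T here, so the droop peaks
    return {
        "t1": t1, "ic1": ic1, "it1": it1, "t2": t1 + dt2, "t3": t1 + dt3,
        "vd_res": v_lsb + extra_droop(dt3),
        "vd_max": v_lsb + extra_droop(dt2),
        "il_increment": kt * (t1 + dt3),
    }


def without_monitor(k0=K_0, kt=K_T, c=C_OUT, i_step=I_STEP):
    t3 = i_step / k0  # I_C reaches zero too early
    return {"t3": t3,
            "droop_at_t3": (k0 - kt) * t3 ** 2 / (2 * c),
            "il_increment": kt * t3}


if __name__ == "__main__":
    print(with_monitor())
    print(without_monitor())
```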
Note that an inaccurate load current step estimate IS cannot be corrected in this way, and the
benefit of the compensation monitor diminishes as the IS inaccuracy increases. In Section 4.6, an
alternative method is discussed that can obtain more accurate estimates of IS, COUT, and
LOUT.
In addition to load step estimation, the droop monitor is equipped with safeguards
that turn off the compensation if a false-positive prediction is made. For example, if a droop
prediction is received but an overshoot exceeding a threshold voltage is detected, the
compensation is shut down immediately to prevent over-voltage.
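The false-positive safeguard reduces to a simple rule evaluated on every LADC sample. The class below is a hypothetical sketch; the overshoot threshold is not given in the text and is left as a parameter.

```python
from dataclasses import dataclass


@dataclass
class DroopSafeguard:
    """Abort load-assisted compensation on a false-positive droop prediction.

    overshoot_limit_v is a hypothetical threshold (the text does not give its value).
    """
    overshoot_limit_v: float
    active: bool = False

    def on_prediction(self) -> None:
        self.active = True  # START asserted: begin injecting I_C

    def on_sample(self, v_out: float, v_ref: float) -> bool:
        """Called every LADC sample; returns True while compensation stays enabled."""
        if self.active and (v_out - v_ref) > self.overshoot_limit_v:
            self.active = False  # the predicted droop never came: shut I_C off now
        return self.active
```

Once tripped, the guard stays off until the next prediction, so a burst of false alarms cannot repeatedly push the output above its limit within one event.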
4.3 Building blocks
4.3.1 Logarithmic analog-to-digital converter
The two important requirements of the ADC are (i) a small quantization step to minimize
steady-state output ripple [41] and (ii) a high conversion rate to reduce the
latency in detecting the voltage droop. Unfortunately, classical high-speed, high-accuracy
ADCs that meet these requirements are unattractive because they consume significant power
[42]. Instead of using such conventional architectures, we observe that high accuracy is
needed only when VOUT is in the vicinity of VREF, and the quantization error requirement can
be relaxed otherwise. This enables both high resolution and high speed with little
power and area overhead using the flash ADC architecture shown in Figure 4.5 [43]. It uses
12 comparators whose thresholds are logarithmically scaled to achieve a wide input range of
about 300mV.
The gain of the LADC is kept nearly constant by generating the digital output according
to the mapping in Figure 4.5. This minimizes converter loop gain variation and alleviates stability
concerns. The reduced resolution away from VREF does, however, cause slower settling during large
transients. The simulation in Figure 4.6 compares the start-up process of the proposed ADC
with that of a conventional 8-bit ADC; larger overshoot and ringing are observed with the proposed
ADC.
The comparator is implemented using a conventional sense amplifier, as shown in Figure 4.7.

















Vout (mV)               ADC code
Vout ≤ 807              82
807 < Vout ≤ 919        114
919 < Vout ≤ 975        122
975 < Vout ≤ 982        123
982 < Vout ≤ 989        124
989 < Vout ≤ 996        125
996 < Vout ≤ 1003       126
1003 < Vout ≤ 1010      127
1010 < Vout ≤ 1017      128
1017 < Vout ≤ 1024      129
1024 < Vout ≤ 1080      130
1080 < Vout ≤ 1192      138
Vout > 1192             154

Figure 4.5: LADC architecture.
Figure 4.6: LADC settling.
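The code mapping of Figure 4.5 can be written as a lookup on the 12 comparator thresholds, one per comparator. A sketch in Python; treating each bin as the half-open interval (lower, upper] follows the inequalities as printed, and the function name is hypothetical.

```python
import bisect

# Upper bin edges (mV) and output codes from the Figure 4.5 table.
EDGES_MV = [807, 919, 975, 982, 989, 996, 1003, 1010, 1017, 1024, 1080, 1192]
CODES = [82, 114, 122, 123, 124, 125, 126, 127, 128, 129, 130, 138, 154]


def ladc_code(v_out_mv: float) -> int:
    """Map VOUT (mV) to the LADC output code.

    bisect_left counts the edges strictly below v_out_mv, so a value equal to an
    edge falls into the lower bin, matching the '<= edge' inequalities.
    """
    return CODES[bisect.bisect_left(EDGES_MV, v_out_mv)]


if __name__ == "__main__":
    for v in (800, 950, 1000, 1020, 1100, 1250):
        print(v, ladc_code(v))
```

Near VREF the bins are 7mV wide, while the outer bins are 56mV to 112mV wide, which is how 12 comparators cover the full range.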






















Figure 4.7: LADC comparator.























Figure 4.8: Clock generator.
The schematic of the free-running 16-phase clock generator, which nominally oscillates at
320MHz, is shown in Figure 4.8. A current-starved differential ring oscillator is adopted for its
compact size. Eight differential delay stages generate the 16 phases, achieving
0.5% duty-cycle resolution when used with the DPWM generator.
4.3.3 Digital PWM generator
The DPWM generator consists of a DPWM core that outputs Set (S) and Reset (R)














Figure 4.9: DPWM architecture.
The DPWM core schematic is shown in Figure 4.10. A hybrid architecture is adopted in which
a counter provides the 4 MSBs and a multi-phase selector (MPS), which selects one of the
16 input phases, provides the 4 LSBs, achieving 8-bit duty-cycle resolution
at a 20MHz PWM frequency. An anti-glitch circuit prevents glitches when the MPS
switches phases.
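A minimal sketch of the hybrid code split, assuming an idealized core in which the reset edge lands a number of clock cycles (the MSBs) plus a number of phase steps (the LSBs) after the set edge; glitch masking and the restricted code range are not modeled, and the function names are hypothetical. With the 320MHz clock and 16 phases from the text, one phase step is about 195ps, and a 4-bit counter wrap gives the 50ns (20MHz) switching period.

```python
F_CLK = 320e6              # ring-oscillator frequency (from the text)
N_PHASES = 16              # clock phases (from the text)
T_CLK = 1 / F_CLK          # 3.125 ns
T_SW = N_PHASES * T_CLK    # 4-bit counter wraps every 16 cycles: 50 ns (20 MHz)
T_PHASE = T_CLK / N_PHASES  # LSB step, about 195.3 ps


def split_code(code: int) -> tuple:
    """Split an 8-bit duty code into counter MSBs and phase-select LSBs."""
    return code >> 4, code & 0xF


def reset_time(code: int) -> float:
    """Delay of the R edge after S: MSB clock cycles plus LSB phase steps."""
    msb, lsb = split_code(code)
    return msb * T_CLK + lsb * T_PHASE


def duty(code: int) -> float:
    return reset_time(code) / T_SW


if __name__ == "__main__":
    for code in (16, 128, 0x5A):
        print(code, split_code(code), f"{duty(code):.4f}")
```

In this idealized form the duty cycle is simply code/256, so each LSB is about 0.39% of the period.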
The operation of the DPWM core is illustrated by the timing diagram shown
in Figure 4.11. A switching cycle starts when the counter value reaches zero, which asserts the S signal.
The R signal is asserted by the rising edge of the selected phase. In addition, a set of mask
signals is generated from the MUXen signal to ensure the selected phase is passed through































Figure 4.10: DPWM core.
Figure 4.11: DPWM timing.
signal is synchronized with the falling edge of phase 0 and is set high when the counter
value equals 7. The MUXen signal is then further synchronized with phases 3, 7, 11, and 15
to generate the mask[0:3] signals that control phases [0:3], [4:7], [8:11], and [12:15],
respectively, so that only the unmasked rising edge of the selected phase is passed through.
To further avoid possible glitches, the valid PWM codes are limited to 16 through 235
(counter values 1 to 14).
4.3.4 Deadtime control unit















Figure 4.12: Deadtime control unit.
64ps resolution with the minimum deadtime of 253.8ps.
4.4 Measurement results
Figure 4.13: Die photo.




DPWM and CLK gen 1250
Droop compensator 1500
Deadtime ctrl 300
MP and gate driver 135700
MN and gate driver 29000
Top comp. current source 17500
Bottom comp. current source 3800
total 206600
A prototype of the proposed buck converter was fabricated in a 65nm CMOS process. All the
digital logic (digital droop compensator, etc.) is fully synthesized using standard cells.
The power FETs are implemented using 2.5V devices in order to operate with VIN = 1.8V. The
converter operates with L = 500nH and C = 1µF. The die photo is shown in Figure 4.13, and the
area breakdown is shown in Table 4.2.
4.4.1 Steady state measurements
Efficiency The efficiency vs. load current is shown in Figure 4.14. The converter achieves a
peak efficiency of 89.8% at a load current of 350mA. The maximum load current is 2.1A.
































Figure 4.15: Load regulation.
Load regulation The VOUT vs. load current is shown in Figure 4.15. The output voltage
is tightly regulated between 0.996V and 1.002V, one LSB of the error ADC, when load
current scales from 14mA to 2.1A.
LADC measurement The LADC response measurement is shown in Figure 4.16, and the
non-linearity measurement within the linear region of the ADC is shown in Figure 4.17. The
accurate response of the LADC ensures good load regulation of the converter.






















Figure 4.16: LADC response.


























Figure 4.17: LADC non-linearity.
DPWM measurement The DPWM response measurement is shown in Figure 4.18 and
the non-linearity measurement is shown in Figure 4.19. The maximum values of DNL and
INL are both less than 1 LSB, indicating the DPWM achieves 8-bit resolution.

























Figure 4.18: DPWM response.


























Figure 4.19: DPWM non-linearity.
Steady-state output voltage ripple The steady-state ripple is shown in Figure 4.20.
Because the DPWM is designed to always drive the output voltage into the dead zone of the
ADC, bang-bang behavior is avoided and a ripple of less than 3mV is achieved.
Figure 4.20: Steady-state output voltage ripple.
4.4.2 Droop compensator measurement
Droop compensator with perfect advance notice The VOUT droop/overshoot when a
positive and negative 480mA (maximum compensation current) load current step is applied
with and without the droop compensation is shown in Figure 4.21. The compensator reduces
the droop/overshoot from 110mV to less than 5mV (less than 0.5% of VOUT). Thanks to the
Figure 4.21: Droop measurement w/ highest load step.
Figure 4.22: Droop measurement w/ different load steps.
fully programmable compensator, different load steps can also be compensated. Figure 4.22
shows the compensated/uncompensated VOUT droop/overshoot for load steps
of 280mA, 340mA, and 410mA. In all cases, the droop/overshoot is reduced to less than
5mV.
Droop compensator with imperfect advance notice Figure 4.23 shows the effectiveness
of the droop compensator in the presence of inaccurate predictions made by the load. When
an unpredicted 302mA/1ns load step is applied, the VOUT droop is reduced from 96mV to 30mV
and the 1% settling time is improved from 1.3µs to 0.4µs. Similarly, the overshoot due to a false
prediction of a 302mA load step is reduced from 80mV to 18mV. When the load provides an
inaccurate estimate of ISTEP, the latency of the droop compensator leads to an additional 10mV
VOUT droop in the worst case. The impact of an inaccurate compensation ramp rate is measured
by setting it to the lowest/highest possible values. For a 302mA/1ns load step, the voltage
droop/overshoot was measured to be 20mV/30mV under these worst-case conditions.
Figure 4.23: Impact of inaccurate prediction.
Duty cycle correction The measured compensation with and without duty cycle correction is
shown in Figure 4.24. Without the proposed duty cycle correction unit, an additional 35mV
droop appears when a 300mA/1ns load step is applied, even if a perfect prediction is
received. This is due to the finite settling time of the PID compensator.
Figure 4.24: Duty cycle correction.
The performance of the converter is compared to the state of the art in Table 4.3. To
the best of our knowledge, the proposed converter achieves the smallest droop/overshoot and
1% settling time, and it can compensate for load steps of variable magnitude (up to 0.48A).
4.5 Conclusion
This work presents techniques to implement a fast-transient DC-DC buck converter that uses
advance notice from the load to completely eliminate output voltage droop/overshoot due to
large load current steps.
Fabricated in 65nm CMOS technology, the proposed digitally controlled fast-load-transient
buck converter can deliver load current up to 2.1A, with a peak efficiency of 89.8% at a load
current of 350mA and a switching frequency of 20MHz. The proposed DPWM unit intentionally
drives the output voltage into the error ADC's dead-zone to avoid bang-bang behavior,
reducing the measured steady-state ripple to less than 3mV. Using load-assisted droop
compensation techniques, the prototype converter achieves less than 8mV output voltage
droop/overshoot in response to a 480mA/1ns load step. When the load provides erroneous
information, the proposed safeguards limit the worst-case droop to less than 40mV.

Table 4.3: Performance comparison of proposed converter.
Parameter          | This work                  | ISSCC’10 [45]   | ISSCC’16 [20]         | ISSCC’17 [21]
Tech.              | 65nm                       | 180nm           | 180nm                 | 130nm
Controller type    | Digital                    | Digital         | Analog                | Analog
FSW                | 20MHz                      | 0.5MHz          | 30MHz                 | 30MHz
Inductor           | 500nH                      | 18.8µH          | 220nH ×4              | 90nH
Capacitor          | 1µF                        | 22µF            | 0.62µF                | 0.47µF ×2
VIN                | 1.8V                       | 3.3V (typical)  | 3.3V                  | 3.3V
VOUT               | 1.0V                       | 1.8V            | 1.8V                  | 1.8V
Peak efficiency    | 89.8%                      | 94%             | 86.5%                 | 88% @ VOUT = 1.8V
Load current step  | Variable, max 0.48A (1ns)  | 0.3A            | 1.8A (5ns, 4 phases)  | 1.25A (2ns) / 0.62A (2ns)
TADV               | 3ns                        | N/A             | N/A                   | N/A
Droop              | 8mV                        | 130mV           | 100mV                 | 36mV / 12mV
1% settling time   | 0                          | 100µs           | 4.8µs                 | 125ns / 0ns
FoM1               | 22.5                       | 0.39            | 0.48                  | 2.77 / 2.04
4.6 Alternative compensation correction method
In this section we present a more accurate compensation correction method that estimates
the unknown output filter component values and the load step current.
To estimate the output filter values as well as the load step current, two independent
stimuli are applied to the output filter to obtain three state equations. Consider the
worst-case scenario in which LOUT and ISTEP are underestimated (K0 > KT, IC0 < IT) while COUT
is overestimated, as depicted in Figure 4.25: the positive load step occurs again at time t0 and
The first stimulus can be created by assuming that IC0 is the only error source and
updating it to IC1 as:
Figure 4.25: Compensation behavior w/ advanced compensation monitor algorithm.
$$\int \left[ \left(I_T - K_T t_1 - K_T t\right) - \left(I_{C1} - K_0 t\right) \right] dt$$
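Each interval equation in this section integrates the difference between the required compensation current, which falls at KT, and the delivered compensation current, which falls at the programmed rate. Assuming the integral runs over an interval of length τ and is divided by COUT to give the droop (a reading of the expression above, not an equation stated in the text), it has the closed form [(a − b)τ + (K − KT)τ²/2]/COUT, where a and b are the required and delivered currents at the start of the interval. For the expression above, a = IT − KT·t1, b = IC1 and K = K0; the same form with a = IT, b = IC0 and K = K0 presumably describes the initial interval. The sketch checks the closed form against a numerical integral; the function names and t1 = 10ns are hypothetical, while the currents, rates and COUT come from Table 4.4.

```python
def interval_droop(i_target0, k_target, i_comp0, k_comp, tau, c_out):
    """Closed form of (1/c_out) * integral_0^tau of
    [(i_target0 - k_target*t) - (i_comp0 - k_comp*t)] dt.

    Positive result = charge shortfall, i.e. output droop (assumed sign).
    """
    charge = (i_target0 - i_comp0) * tau + (k_comp - k_target) * tau**2 / 2
    return charge / c_out


def numeric_droop(i_target0, k_target, i_comp0, k_comp, tau, c_out, n=10000):
    """Midpoint-rule evaluation of the same integral, as a cross-check."""
    dt = tau / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        total += ((i_target0 - k_target * t) - (i_comp0 - k_comp * t)) * dt
    return total / c_out


# Initial interval with Table 4.4 values; t1 = 10ns is hypothetical.
args = dict(i_target0=1.0, k_target=1e6, i_comp0=0.1, k_comp=2e6,
            tau=10e-9, c_out=0.5e-6)
print(f"closed form: {interval_droop(**args) * 1e3:.2f} mV")
print(f"numerical:   {numeric_droop(**args) * 1e3:.2f} mV")
```

The (a − b)τ term dominates for short intervals, which is why an underestimated IC0 shows up in the droop almost immediately, while slew-rate errors only appear through the τ² term.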
The second stimulus can be created by assuming that the only error source is now the
inductor, so that K0 is updated to K1 as:
$$\int \left[ \left(I_T - K_T (t_1 + t_2) - K_T t\right) - \left(I_{C1} - K_0 t_2 - K_1 t\right) \right] dt$$
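Why three measured deviations suffice can be seen from the closed-form integrals. COUT times each interval's droop is linear in the three unknowns COUT, IT and KT, so the three interval equations form a 3×3 linear system. The sketch below solves that system with Cramer's rule and checks it on synthetic data. It is a hypothetical illustration, not the thesis's procedure. It assumes ideal (unquantized) droop measurements and does not model the update rules that produce IC1 and K1. It also uses made-up interval lengths and updated values (IC1 = 0.8A, K1 = 1.2×10^6 A/s); only the hidden plant values come from Table 4.4.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def identify(dv, t, ic0, ic1, k0, k1):
    """Solve for (c_out, i_t, k_t) from three measured interval droops.

    Row i encodes c_out*dv_i - i_t*tau_i + k_t*(...) = known terms,
    obtained by expanding the closed-form integral of interval i.
    """
    t1, t2, t3 = t
    a = [
        [dv[0], -t1, t1**2 / 2],
        [dv[1], -t2, t1 * t2 + t2**2 / 2],
        [dv[2], -t3, (t1 + t2) * t3 + t3**2 / 2],
    ]
    b = [
        -ic0 * t1 + k0 * t1**2 / 2,
        -ic1 * t2 + k0 * t2**2 / 2,
        -(ic1 - k0 * t2) * t3 + k1 * t3**2 / 2,
    ]
    d = det3(a)
    sol = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side.
        m = [row[:col] + [b[i]] + row[col + 1:] for i, row in enumerate(a)]
        sol.append(det3(m) / d)
    return tuple(sol)  # (c_out, i_t, k_t)


def forward(c_out, i_t, k_t, t, ic0, ic1, k0, k1):
    """Synthesize the three droops from known (hidden) plant values."""
    t1, t2, t3 = t

    def seg(i_tgt0, i_cmp0, k_cmp, tau):
        return ((i_tgt0 - i_cmp0) * tau + (k_cmp - k_t) * tau**2 / 2) / c_out

    return (seg(i_t, ic0, k0, t1),
            seg(i_t - k_t * t1, ic1, k0, t2),
            seg(i_t - k_t * (t1 + t2), ic1 - k0 * t2, k1, t3))


# Hidden plant from Table 4.4; interval lengths and updates are hypothetical.
t = (10e-9, 10e-9, 20e-9)
knobs = dict(ic0=0.1, ic1=0.8, k0=2e6, k1=1.2e6)
dv = forward(0.5e-6, 1.0, 1e6, t, **knobs)
c_out, i_t, k_t = identify(dv, t, **knobs)
print(f"C_OUT = {c_out * 1e6:.3f} uF, I_T = {i_t:.3f} A, K_T = {k_t:.3e} A/s")
```

In this idealized setting any pair of updates that keeps the matrix non-singular works; with a real error ADC, the VLSB quantization of each droop measurement limits the accuracy of the estimates.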
Combining Eqs. 4.4, 4.5, 4.6 and 4.8, the values of IT, COUT, KT and the target compensation
time tT can be calculated. Assuming that the output voltage is currently VD volts below its
target, the compensation current and the ramp rate should be changed to:
IC2 =
2COUTVD
(tT − t1 − t2 − t3)2




(tT − t1 − t2 − t3)
(4.10)
so that at the end of the compensation the output voltage and the inductor current reach
their target values.
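One self-consistent way to obtain settings of this kind is sketched here, under assumptions that the text does not state. Let T = tT − t1 − t2 − t3 be the remaining time, and assume (i) the required compensation current falls at KT and reaches zero at tT, so it equals KT(T − t) during the final interval; (ii) the delivered compensation current IC2 − K2·t must also reach zero at tT, i.e. IC2 = K2·T; and (iii) the difference between the two must supply the charge COUT·VD needed to restore the output. Condition (iii) gives (K2 − KT)T²/2 = COUT·VD, so K2 = KT + 2COUTVD/T² and IC2 = KT·T + 2COUTVD/T. The function names and the 12mV/0.9µs operating point are hypothetical; COUT and KT are taken from Table 4.4.

```python
def final_correction(c_out, v_d, k_t, t_remaining):
    """Hypothetical final-interval settings under assumptions (i)-(iii).

    Returns (i_c2, k2) such that the delivered current i_c2 - k2*t reaches
    zero at the end of the interval and supplies the extra charge c_out*v_d.
    """
    k2 = k_t + 2 * c_out * v_d / t_remaining**2
    i_c2 = k2 * t_remaining
    return i_c2, k2


def extra_charge(i_c2, k2, k_t, t_remaining, n=10000):
    """Midpoint-rule integral of delivered minus required current."""
    dt = t_remaining / n
    q = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        q += ((i_c2 - k2 * t) - k_t * (t_remaining - t)) * dt
    return q


# Hypothetical: 12mV residual droop, 0.9us left, Table 4.4 C_OUT and K_T.
i_c2, k2 = final_correction(0.5e-6, 12e-3, 1e6, 0.9e-6)
print(f"I_C2 = {i_c2 * 1e3:.1f} mA, K2 = {k2:.4e} A/s")
print(f"extra charge = {extra_charge(i_c2, k2, 1e6, 0.9e-6) * 1e9:.2f} nC")
```

The 2COUTVD/T² term grows rapidly as T shrinks, so a late correction requires a much steeper ramp than an early one.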
To verify the performance of this algorithm, a MATLAB simulation was performed using the
parameters listed in Table 4.4; the compensation behavior with and without the proposed
algorithm is shown in Figure 4.26 and Figure 4.27, respectively.
Table 4.4: Numeric values used in simulation.
Symbol | Description                                  | Value
COUT   | Output capacitor (actual)                    | 0.5µF
CNOM   | Output capacitor (nominal)                   | 1µF
LOUT   | Output inductor (actual)                     | 1µH
LNOM   | Output inductor (nominal)                    | 0.5µH
KT     | Compensation current slew rate (target)      | 1×10^6 A/s
K0     | Compensation current slew rate (initial)     | 2×10^6 A/s
VLSB   | ADC resolution                               | 6mV
ISTEP  | Load current step                            | 1000mA
IT     | Initial compensation current (target)        | 1000mA
IC0    | Initial compensation current (estimated)     | 100mA
With the proposed algorithm, the compensation ends after 1.25µs. The inductor current
reaches its target value of 1.1A, and the calculated output capacitor and inductor values are
more than 99% accurate. The maximum overshoot/droop is bounded within ±1 VLSB despite a
50% error in the predicted output capacitor and inductor values and a 90% error in the load
step current. Without the proposed algorithm, on the other hand, the output voltage droops
to zero in less than 0.5µs, demonstrating the effectiveness of the proposed algorithm.
Figure 4.26: Compensation behavior w/ advanced compensation monitor algorithm, MATLAB sim.






This thesis demonstrates circuit and architecture techniques that address the high idle
power consumption of on-chip interconnects. The proposed solution greatly increases the
power efficiency of bursty, low-utilization on-chip links.
Specifically, a 10Gb/s/ch, 0.6pJ/bit/mm power-scalable rapid-on/off on-chip interconnect
transceiver is presented. The transceiver employs a capacitive-driver-based FFE scheme to
combine the high efficiency of a voltage-mode driver with equalization capability, without
requiring additional transmitter supply rails. The programmable divider deployed in the MDLL
reduces the power-on-lock time to 2 TREF. Fabricated in 65nm CMOS technology, the proposed
transceiver delivers 5Gb/s/µm throughput density and a power-on-lock time of less than 17ns.
The rapid-on/off capability allows the effective data rate to scale by 125x at the cost of only
a 1.6x degradation in energy efficiency.
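The 125x rate scaling and the 1.6x efficiency penalty together determine how far link power actually drops. A quick check, using the end-point efficiencies quoted in the abstract (627fJ/b/mm at 10Gb/s and 997fJ/b/mm at 80Mb/s) and assuming that power per unit length is simply energy per bit times effective data rate:

```python
# End-point energy efficiencies and effective data rates (from the abstract)
e_full = 627e-15   # J/b/mm at 10 Gb/s
e_low = 997e-15    # J/b/mm at 80 Mb/s
r_full, r_low = 10e9, 80e6  # b/s

rate_scaling = r_full / r_low          # effective data rate reduction
efficiency_penalty = e_low / e_full    # energy-per-bit degradation
p_full = e_full * r_full               # W/mm at full rate
p_low = e_low * r_low                  # W/mm at lowest effective rate

print(f"rate scaling: {rate_scaling:.0f}x, efficiency penalty: {efficiency_penalty:.2f}x")
print(f"power: {p_full * 1e3:.2f} mW/mm -> {p_low * 1e6:.1f} uW/mm "
      f"({p_full / p_low:.1f}x lower)")
```

Under that assumption, power per millimeter falls by roughly 79x (from about 6.3mW/mm to about 80µW/mm), which is the 125x rate reduction divided by the 1.6x efficiency penalty.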
To mitigate the supply droop/overshoot that the proposed rapid-on/off operation or aggressive
dynamic voltage and frequency scaling may cause, a fast-transient DC-DC buck converter is
presented that uses advance notice from the load to completely eliminate output voltage
droop/overshoot due to large load current steps. Fabricated in 65nm CMOS technology, the
proposed digitally controlled fast-load-transient buck converter can deliver load current up
to 2.1A, with a peak efficiency of 89.8% at a load current of 350mA and a switching frequency
of 20MHz. Using load-assisted droop compensation techniques, the prototype converter achieves
less than 8mV output voltage droop/overshoot in response to a 480mA/1ns load step. When
the load provides erroneous information, the proposed safeguards limit the worst-case droop
to less than 40mV.
REFERENCES
[1] D. Abts, M. R. Marty, P. M. Wells, P. Klausler, and H. Liu, “Energy proportional
datacenter networks,” in Proceedings of the 37th Annual International Symposium on
Computer Architecture, ser. ISCA ’10. New York, NY, USA: ACM, 2010, pp. 338–347.
[Online]. Available: http://doi.acm.org/10.1145/1815961.1816004
[2] L. A. Barroso and U. Hölzle, “The case for energy-proportional computing,” Computer,
vol. 40, no. 12, pp. 33–37, Dec 2007.
[3] G. Shu, W. Choi, S. Saxena, S. Kim, M. Talegaonkar, R. Nandwana, A. Elkholy, D. Wei,
T. Nandi, and P. Hanumolu, “A 16Mb/s-to-8Gb/s 14.1-to-5.9pJ/b source synchronous
transceiver using DVFS and rapid on/off in 65nm CMOS,” in 2016 IEEE International
Solid-State Circuits Conference, ISSCC 2016, vol. 59. United States: Institute of
Electrical and Electronics Engineers Inc., Feb 2016, pp. 398–399.
[4] T. Anand, M. Talegaonkar, A. Elkholy, S. Saxena, A. Elshazly, and P. K. Hanumolu,
“A 7Gb/s embedded clock transceiver for energy proportional links,” IEEE Journal of
Solid-State Circuits, vol. 50, no. 12, pp. 3101–3119, Dec 2015.
[5] E. A. Burton, G. Schrom, F. Paillet, J. Douglas, W. J. Lambert, K. Radhakrishnan,
and M. J. Hill, “FIVR - Fully integrated voltage regulators on 4th generation Intel Core
SoCs,” in 2014 IEEE Applied Power Electronics Conference and Exposition - APEC
2014, March 2014, pp. 432–439.
[6] C. Park, R. Badeau, L. Biro, J. Chang, T. Singh, J. Vash, B. Wang, and T. Wang, “A
1.2 TB/s on-chip ring interconnect for 45nm 8-core enterprise Xeon processor,” in 2010
IEEE International Solid-State Circuits Conference - (ISSCC), Feb 2010, pp. 180–181.
[7] M. K. Kumashikar, S. G. Bendi, S. Nimmagadda, A. J. Deka, and A. Agarwal, “14nm
Broadwell Xeon processor family: Design methodologies and optimizations,” in 2017
IEEE Asian Solid-State Circuits Conference (A-SSCC), Nov 2017, pp. 17–20.
[8] S. M. Tam, H. Muljono, M. Huang, S. Iyer, K. Royneogi, N. Satti, R. Qureshi, W. Chen,
T. Wang, H. Hsieh, S. Vora, and E. Wang, “SkyLake-SP: A 14nm 28-core Xeon processor,”
in 2018 IEEE International Solid-State Circuits Conference - (ISSCC), Feb 2018,
pp. 34–36.
[9] F. Clermidy, C. Bernard, R. Lemaire, J. Martin, I. Miro-Panades, Y. Thonnart, P. Vivet,
and N. Wehn, “A 477mW NoC-based digital baseband for MIMO 4G SDR,” in 2010
IEEE International Solid-State Circuits Conference - (ISSCC), Feb 2010, pp. 278–279.
[10] S. Höppner, D. Walter, T. Hocker, S. Henker, S. Hänzsche, D. Sausner, G. Ellguth,
J. U. Schlüßler, H. Eisenreich, and R. Schüffny, “An energy efficient multi-Gbit/s NoC
transceiver architecture with combined AC/DC drivers and stoppable clocking in 65nm
and 28nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 50, no. 3, pp. 749–762,
March 2015.
[11] P. Salihundam, S. Jain, T. Jacob, S. Kumar, V. Erraguntla, Y. Hoskote, S. Vangal,
G. Ruhl, P. Kundu, and N. Borkar, “A 2Tb/s 6×4 mesh network with DVFS and
2.3Tb/s/W router in 45nm CMOS,” in 2010 Symposium on VLSI Circuits, June 2010,
pp. 79–80.
[12] K. Lee, S.-J. Lee, S.-E. Kim, H.-M. Choi, D. Kim, S. Kim, M.-W. Lee, and H.-J. Yoo,
“A 51mW 1.6GHz on-chip network for low-power heterogeneous SoC platform,” in
Solid-State Circuits Conference, 2004. Digest of Technical Papers. ISSCC. 2004 IEEE
International, Feb 2004, pp. 152–518 Vol. 1.
[13] D. Schinkel, E. Mensink, E. A. M. Klumperink, E. van Tuijl, and B. Nauta, “Low-power,
high-speed transceivers for network-on-chip communication,” IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, vol. 17, no. 1, pp. 12–21, Jan 2009.
[14] B. Kim and V. Stojanovic, “A 4Gb/s/ch 356fJ/b 10mm equalized on-chip interconnect
with nonlinear charge-injecting transmit filter and transimpedance receiver in 90nm
CMOS,” in 2009 IEEE International Solid-State Circuits Conference - Digest of Tech-
nical Papers, Feb 2009, pp. 66–67,67a.
[15] S. K. Lee, S. H. Lee, D. Sylvester, D. Blaauw, and J. Y. Sim, “A 95fJ/b current-mode
transceiver for 10mm on-chip interconnect,” in 2013 IEEE International Solid-State
Circuits Conference Digest of Technical Papers, Feb 2013, pp. 262–263.
[16] A. E. Kiasari, Z. Lu, and A. Jantsch, “An analytical latency model for networks-on-
chip,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21,
no. 1, pp. 113–123, Jan 2013.
[17] T. Anand, A. Elshazly, M. Talegaonkar, B. Young, and P. K. Hanumolu, “A 5Gb/s, 10ns
power-on-time, 36µW off-state power, fast power-on transmitter for energy proportional
links,” IEEE Journal of Solid-State Circuits, vol. 49, no. 10, pp. 2243–2258, Oct 2014.
[18] J. S. Seo, R. Ho, J. Lexau, M. Dayringer, D. Sylvester, and D. Blaauw, “High-bandwidth
and low-energy on-chip signaling with adaptive pre-emphasis in 90nm CMOS,” in 2010
IEEE International Solid-State Circuits Conference - (ISSCC), Feb 2010, pp. 182–183.
[19] J. Kim and M. A. Horowitz, “Adaptive supply serial links with sub-1-V operation and
per-pin clock recovery,” IEEE Journal of Solid-State Circuits, vol. 37, no. 11, pp. 1403–
1413, Nov 2002.
[20] S. Huang, K. Fang, Y. Huang, S. Chien, and T. Kuo, “12.6 capacitor-current-sensor
calibration technique and application in a 4-phase buck converter with load-transient
optimization,” in 2016 IEEE International Solid-State Circuits Conference (ISSCC),
Jan 2016, pp. 228–229.
[21] L. Cheng and W. Ki, “10.6 A 30MHz hybrid buck converter with 36mV droop and
125ns 1% settling time for a 1.25A/2ns load transient,” in 2017 IEEE International
Solid-State Circuits Conference (ISSCC), Feb 2017, pp. 188–189.
[22] A. Grenat, S. Pant, R. Rachala, and S. Naffziger, “5.6 adaptive clocking system for
improved power efficiency in a 28nm x86-64 microprocessor,” in 2014 IEEE Interna-
tional Solid-State Circuits Conference Digest of Technical Papers (ISSCC), Feb 2014,
pp. 106–107.
[23] Z. Shan, C. K. Tse, and S. Tan, “Pre-energized auxiliary circuits for very fast tran-
sient loads: Coping with load-informed power management for computer loads,” IEEE
Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 2, pp. 637–648,
Feb 2014.
[24] A. Mazouz, A. Laurent, B. Pradelle, and W. Jalby, “Evaluation of CPU frequency
transition latency,” Computer Science - Research and Development, vol. 29, no. 3, pp.
187–195, Aug 2014. [Online]. Available: https://doi.org/10.1007/s00450-013-0240-x
[25] R. Ge, X. Feng, W. Feng, and K. W. Cameron, “CPU MISER: A performance-directed,
run-time system for power-aware clusters,” in 2007 International Conference on Parallel
Processing (ICPP 2007), Sep. 2007, p. 18.
[26] T. Anand, M. Talegaonkar, A. Elshazly, B. Young, and P. K. Hanumolu, “A 2.5GHz
2.2mW/25µW on/off-state power 2psrms-long-term-jitter digital clock multiplier with
3-reference-cycles power-on time,” in 2013 IEEE International Solid-State Circuits Con-
ference Digest of Technical Papers, Feb 2013, pp. 256–257.
[27] J. Zerbe, B. Daly, W. Dettloff, T. Stone, W. Stonecypher, P. Venkatesan, K. Prabhu,
B. Su, J. Ren, B. Tsang, B. Leibowitz, D. Dunwell, A. C. Carusone, and J. Eble, “A
5.6Gb/s 2.4mW/Gb/s bidirectional link with 8ns power-on,” in 2011 Symposium on
VLSI Circuits - Digest of Technical Papers, June 2011, pp. 82–83.
[28] B. Leibowitz, R. Palmer, J. Poulton, Y. Frans, S. Li, J. Wilson, M. Bucher, A. M.
Fuller, J. Eyles, M. Aleksic, T. Greer, and N. M. Nguyen, “A 4.3GB/s mobile memory
interface with power-efficient bandwidth scaling,” IEEE Journal of Solid-State Circuits,
vol. 45, no. 4, pp. 889–898, April 2010.
[29] F. O’Mahony, J. E. Jaussi, J. Kennedy, G. Balamurugan, M. Mansuri, C. Roberts,
S. Shekhar, R. Mooney, and B. Casper, “A 47×10Gb/s 1.4mW/Gb/s parallel interface
in 45nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 45, no. 12, pp. 2828–2837,
Dec 2010.
[30] K. Christensen, P. Reviriego, B. Nordman, M. Bennett, M. Mostowfi, and J. A. Mae-
stro, “IEEE 802.3az: The road to energy efficient ethernet,” IEEE Communications
Magazine, vol. 48, no. 11, pp. 50–56, November 2010.
[31] E. Mensink, D. Schinkel, E. Klumperink, E. van Tuijl, and B. Nauta, “Optimally-placed
twists in global on-chip differential interconnects,” in Proceedings of the 31st European
Solid-State Circuits Conference, 2005. ESSCIRC 2005, Sept 2005, pp. 475–478.
[32] M. Talegaonkar, A. Elshazly, K. Reddy, P. Prabha, T. Anand, and P. K. Hanumolu,
“An 8 Gb/s-64 Mb/s, 2.3-4.2 mW/Gb/s burst-mode transmitter in 90nm CMOS,” IEEE
Journal of Solid-State Circuits, vol. 49, no. 10, pp. 2228–2242, Oct 2014.
[33] A. K. Joy, H. Mair, H. C. Lee, A. Feldman, C. Portmann, N. Bulman, E. C. Crespo,
P. Hearne, P. Huang, B. Kerr, P. Khandelwal, F. Kuhlmann, S. Lytollis, J. Machado,
C. Morrison, S. Morrison, S. Rabii, D. Rajapaksha, V. Ravinuthula, and G. Surace,
“Analog-DFE-based 16Gb/s SerDes in 40nm CMOS that operates across 34dB loss
channels at Nyquist with a baud rate CDR and 1.2Vpp voltage-mode driver,” in 2011
IEEE International Solid-State Circuits Conference, Feb 2011, pp. 350–351.
[34] E. J. Choi, S. Kim, Y. K. Jeong, K. W. Kwon, and J. H. Chun, “A low-swing AC- and
DC- coupled voltage-mode driver with pre-emphasis,” in 2011 IEEE 54th International
Midwest Symposium on Circuits and Systems (MWSCAS), Aug 2011, pp. 1–4.
[35] Y. Lu, K. Jung, Y. Hidaka, and E. Alon, “Design and analysis of energy-efficient recon-
figurable pre-emphasis voltage-mode transmitters,” IEEE Journal of Solid-State Cir-
cuits, vol. 48, no. 8, pp. 1898–1909, Aug 2013.
[36] H. Ito, J. Seita, T. Ishii, H. Sugita, K. Okada, and K. Masu, “A low-latency and high-
power-efficient on-chip LVDS transmission line interconnect for an RC interconnect
alternative,” in 2007 IEEE International Interconnect Technology Conference, June
2007, pp. 193–195.
[37] H. K. Krishnamurthy, V. Vaidya, S. Weng, K. Ravichandran, P. Kumar, S. Kim, R. Jain,
G. Matthew, J. Tschanz, and V. De, “20.1 A digitally controlled fully integrated voltage
regulator with on-die solenoid inductor with planar magnetic core in 14nm tri-gate
CMOS,” in 2017 IEEE International Solid-State Circuits Conference (ISSCC), Feb
2017, pp. 336–337.
[38] E. Fayneh, M. Yuffe, E. Knoll, M. Zelikson, M. Abozaed, Y. Talker, Z. Shmuely, and
S. A. Rahme, “4.1 14nm 6th-generation core processor SoC with low power consumption
and improved performance,” in 2016 IEEE International Solid-State Circuits Confer-
ence (ISSCC), Jan 2016, pp. 72–73.
[39] J. Kwak and B. Nikolić, “A self-adjustable clock generator with wide dynamic range in
28nm FDSOI,” IEEE Journal of Solid-State Circuits, vol. 51, no. 10, pp. 2368–2379,
Oct 2016.
[40] S. Höppner, S. Haenzsche, S. Scholze, and R. Schüffny, “An all-digital PWM gener-
ator with 62.5ps resolution in 28nm CMOS technology,” in 2015 IEEE International
Symposium on Circuits and Systems (ISCAS), May 2015, pp. 1738–1741.
[41] M. P. Chan and P. K. T. Mok, “Fully integrated digital controller IC for buck converter
with a differential-sensing ADC,” in 2008 IEEE International Conference on Electron
Devices and Solid-State Circuits, Dec 2008, pp. 1–4.
[42] S. Fan, L. Geng, and S. Wang, “An algorithmic ADC applied in digital controlled
switched DC-DC converters,” in 2010 10th IEEE International Conference on Solid-
State and Integrated Circuit Technology, Nov 2010, pp. 330–332.
[43] M. Pagin and M. Ortmanns, “Evaluation of logarithmic vs. linear ADCs for neural signal
acquisition and reconstruction,” in 2017 39th Annual International Conference of the
IEEE Engineering in Medicine and Biology Society (EMBC), July 2017, pp. 4387–4390.
[44] R. C. N. Pilawa-Podgurski and D. J. Perreault, “Merged two-stage power converter with
soft charging switched-capacitor stage in 180nm CMOS,” IEEE Journal of Solid-State
Circuits, vol. 47, no. 7, pp. 1557–1567, July 2012.
[45] H. H. Ahmad and B. Bakkaloglu, “A 300mA 14mV-ripple digitally controlled buck
converter using frequency domain ∆Σ ADC and hybrid PWM generator,” in 2010
IEEE International Solid-State Circuits Conference - (ISSCC), Feb 2010, pp. 202–203.
