Clock multiplication techniques for high-speed I/Os by Nandwana, Romesh Kumar
© 2017 ROMESH KUMAR NANDWANA
CLOCK MULTIPLICATION TECHNIQUES FOR HIGH-SPEED I/Os
BY
ROMESH KUMAR NANDWANA
DISSERTATION
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Electrical and Computer Engineering
in the Graduate College of the
University of Illinois at Urbana-Champaign, 2017
Urbana, Illinois
Doctoral Committee:
Associate Professor Pavan Kumar Hanumolu, Chair
Professor Naresh R. Shanbhag
Professor Deming Chen
Professor Pramod Viswanath
ABSTRACT
Generation of a low-jitter, high-frequency clock from a low-frequency reference clock using
classical analog phase-locked loops (PLLs) requires a large loop filter capacitor and a power-
hungry oscillator. Digital PLLs can help reduce area, but their jitter performance is severely
degraded by quantization error. This dissertation explores clock multiplication techniques
suitable for high-speed wireline systems. With an emphasis on ring-oscillator-based
architectures using cascaded stages, three architectures are
explored.
First, a scrambling TDC (STDC) is presented to improve deterministic jitter (DJ) perfor-
mance when used with a low-frequency reference clock. A cascaded architecture with digital
multiplying delay locked loop as the first stage and hybrid analog/digital PLL as the second
stage is used to achieve low random jitter in a power-efficient manner. Fabricated in a 90 nm
CMOS process, the prototype frequency synthesizer consumes 4.76 mW from a 1.0 V
supply and generates 160 MHz and 2.56 GHz output clocks from a 1.25 MHz crystal reference.
The long-term absolute jitter of the 160 MHz digital MDLL and 2.56 GHz digital
PLL outputs is 2.4 ps rms and 4.18 ps rms, and the peak-to-peak jitter is 22.1 ps and 35.2 ps,
respectively. The proposed frequency synthesizer occupies an active die area of 0.16 mm²
and achieves a power efficiency of 1.86 mW/GHz.
Second, a hybrid phase/current-mode phase interpolator (HPC-PI) is presented to im-
prove phase noise performance of ring oscillator-based fractional-N PLLs. The proposed
HPC-PI alleviates the bandwidth trade-off between VCO phase noise suppression and ∆Σ
quantization noise suppression. By combining the phase detection and interpolation func-
tions into an XOR phase detector/interpolator (XOR PD-PI) block, accurate quantization
error cancellation is achieved without using calibration. Use of a digital MDLL in front
of the fractional-N PLL helps in alleviating the bandwidth limitation due to reference fre-
quency and enables bandwidth extension even further. The extended bandwidth helps in
suppressing the ring-VCO phase noise and lowering the in-band noise floor. Fabricated in a
65 nm CMOS process, the prototype generates fractional frequencies from 4.25 to 4.75 GHz,
with an in-band phase noise floor of -104 dBc/Hz and 1.5 ps rms integrated jitter. The clock
multiplier achieves a power efficiency of 2.4 mW/GHz and an FoM of -225.8 dB.
Finally, efficient clock generation, recovery, and distribution techniques for flexible-rate
transceivers are presented. Using a fixed-frequency low-jitter clock provided by an integer-N
PLL, fractional frequencies are generated/recovered locally using multi-phase fractional clock
multipliers. Fabricated in a 65 nm CMOS process, the prototype transceiver can be programmed to
operate at any rate from 3 to 10 Gb/s. At 10 Gb/s, the integrated jitter of the Tx output and
recovered clock is 360 fs rms and 758 fs rms, respectively.
To my parents,
for their love, encouragement, and support.
ACKNOWLEDGMENTS
Pursuing a doctoral degree has been a very rewarding journey for me and helped in gaining
a better understanding of many aspects of life. This was made possible only with the
constant support and assistance that I have received from my family, friends and many
others throughout this journey. For that, I would like to thank all the people who made it
possible, and made my graduate school an enjoyable and pleasant experience.
First and foremost, I would like to express my deepest gratitude to my adviser Prof. Pavan
Kumar Hanumolu, for his excellent guidance, encouragement, caring, and patience with me
for doing research. I am deeply indebted to him for giving me such a wonderful opportunity
to work under his supervision even when I was not so certain about my capabilities for
pursuing a doctoral degree. Working with him gave me a better understanding in different
aspects of circuit design as well as life.
I would like to extend my sincerest gratitude to Prof. Naresh Shanbhag, Prof. Deming
Chen and Prof. Pramod Viswanath for serving on my thesis committee. Their constructive
comments and questions have always helped provide a different perspective to understanding
circuit topologies. I would also like to thank Prof. Karti Mayaram, Prof. Un-Ku Moon,
and Prof. Gabor Temes of Oregon State University, who shaped my interest in circuit
design when I was just a Master’s degree student.
I would also like to thank many friends who made this journey extremely enjoyable starting
with my labmate as well as roommate Saurabh Saxena, as well as Mrunmay Talegaonkar,
Ahmad Elkholy, Guanghua Shu, Woo-Seok Choi, Charlie Zhu, Seong-Jung Kim, Tejasvi
Anand, Amr Elshazly, Rajesh Inti, Sachin Rao, Karthik Reddy, Manideep Gande, Ankur
Guha Roy, Timir Nandi, Brady Salz, Danny Coombs, Ahmed Safwat, Mustafa Gamal, Da
Wei, Min-Sun Keel and Mingu Kang. I would also like to thank my friend since under-
graduate studies, Mohit Singh, for his constant encouragement throughout my engineering
studies.
I am also deeply grateful to Ken Chang, Yohan Frans, Parag Upadhaya of Xilinx Inc. and
Ganesh Balamurugan of Intel Labs for providing me with internship opportunities and an
industrial perspective on circuit design.
This acknowledgment would be incomplete without expressing my gratitude to my beau-
tiful wife Priya, for bearing with me throughout our courtship period and thereafter. She
patiently listened to all my mumblings and rants about circuits like an all-reject filter and pro-
vided constant love and support. Lastly, and most importantly, I wish to thank my parents
(Ram Gopal and Aruna), and my sibling (Mahesh). They receive my deepest gratitude for
all their endless love, encouragement and moral support through all these years. To them, I
dedicate this dissertation.
TABLE OF CONTENTS
CHAPTER 1 INTRODUCTION
1.1 Introduction
CHAPTER 2 DIGITAL FREQUENCY SYNTHESIZER FOR LOW-FREQUENCY
REFERENCE CLOCKS USING SCRAMBLING TDC
2.1 Introduction
2.2 Jitter in Low Reference Frequency DPLLs
2.3 Evaluation of Frequency Synthesizer Architecture Options
2.4 Proposed Architecture
2.5 Second-Stage DPLL
2.6 Experimental Results
CHAPTER 3 CALIBRATION-FREE FRACTIONAL-N CLOCK GENERATION
USING HYBRID PHASE DETECTOR/INTERPOLATOR
3.1 Introduction
3.2 Prior Art
3.3 Proposed Architecture
3.4 Building Blocks
3.5 Experimental Results
CHAPTER 4 FLEXIBLE CLOCKING SCHEME BASED WIDE DATA-RATE
TRANSCEIVER
4.1 Introduction
4.2 Prior Art
4.3 Proposed Architecture
4.4 Building Blocks
4.5 Experimental Results
CHAPTER 5 CONCLUSION
REFERENCES
CHAPTER 1
INTRODUCTION
1.1 Introduction
System-on-chip (SoC) design has ushered in an era of high-level integration.
Figure 1.1 shows the key components of a typical SoC. It can be divided into two key
functional blocks, known as the Accelerated Processing Unit (APU) and the Input/Output
Processing Unit (I/O). While the APU provides the necessary computational power with the
help of the Central Processing Unit (CPU) (single or multicore) and the Graphics Processing
Unit (GPU), the I/O block provides essential data transfer capabilities among the different
processing units and with external data interfaces.
Figure 1.1: Key functional components of a typical SoC.
To illustrate further, the top-level block diagram of an SoC from the AMD Embedded
G-Series is shown in Fig. 1.2 [1]. It consists of four CPU cores and one GPU core as
APU processing units and has multiple communication links, ranging from a low-speed RS-
232 interface to very high-speed Gigabit Ethernet (GbE) and PCI Express links, which
are vital for data transfer between multiple cores, between CPU and memory, and with other
external interfaces. As the level of integration increases, the I/O processing unit becomes
more critical.
Figure 1.2: Block diagram of AMD embedded G series SoC [1].
As an example of a typical communication link that is used in the SoCs, a block-level
diagram of a serial link transceiver is shown in Fig. 1.3. It takes serial data, Din, and
transmits over a physical channel using an output driver. On the receiver side, the received
signal is amplified and the data, Dout, is recovered using a clock and data recovery (CDR)
block. A Phase-Locked Loop (PLL) on the transmitter side (TxPLL) and one on the receiver
side (RxPLL) provide the necessary clocking for the transceiver. These PLLs are among the
most important parts of high-speed communication circuits. The PLL takes an off-chip crystal
oscillator output as a reference clock and generates a high-frequency on-chip clock. In
almost all SoCs that operate at clock frequencies higher than 50 MHz, PLLs are used
as de facto frequency multipliers and also play a key role in the clock distribution network
Figure 1.3: Block diagram of a typical serial link transceiver.
on an SoC.
Since SoCs have different data communication links working simultaneously at different
data rates, they need different clock frequencies. For error-free transmission, these
clocks also have different noise specifications in terms of clock jitter. However, no single
PLL solution can fulfill the requirements of all the data communication
links. Therefore, on a typical SoC the number of PLLs can range anywhere from 4 to 50 or
even higher. With so many PLLs, it is impractical to use LC-tank-based PLLs
throughout the chip, because they occupy a significant area and have a narrow output
clock range. The aim of this work is therefore to explore different clocking methods that achieve
reasonable noise performance using ring-VCO-based PLLs. The dissertation is organized
as follows. Chapter 2 describes the issues associated with PLL design using low-frequency
reference clocks and presents a cascaded architecture using a scrambling time-to-
digital converter to provide a fully integrated clocking solution that overcomes the challenges
of a low-frequency reference clock. Chapter 3 examines the issues associated with ring-VCO-
based fractional-N clock generation and provides a calibration-free method to generate a
fractional-N clock that achieves superior jitter performance by breaking the trade-off between
quantization noise suppression and VCO phase noise suppression.
Based on this work, we present a clocking strategy in Chapter 4 for efficient clock gener-
ation, recovery, and distribution in flexible-rate transceivers. Using a fixed-frequency low-
jitter clock provided by an integer-N PLL, fractional frequencies are generated/recovered
locally using multi-phase fractional-N clock multipliers, which offers the following advan-
tages. First, clock distribution power is reduced because only a single-phase fixed-frequency
clock is needed. Second, it can provide multiple phases with infinite phase-shifting capability
across a wide frequency range without using phase interpolators. Third, by maximizing the
bandwidth of a local ring-based clock multiplier, it achieves jitter performance comparable
to that of LC-based multipliers. Finally, a summary of the key contributions is presented in
Chapter 5.
CHAPTER 2
DIGITAL FREQUENCY SYNTHESIZER FOR
LOW-FREQUENCY REFERENCE CLOCKS USING
SCRAMBLING TDC
2.1 Introduction
High-frequency clock generation from a low-frequency reference clock is needed in many
general-purpose application-specific integrated circuits (ASICs). Phase-locked loops
(PLLs), which take an off-chip crystal oscillator as a reference clock and generate a high-
frequency on-chip clock, are most commonly used in such frequency multiplication appli-
cations. The design of fully integrated PLLs with an acceptable noise performance is a
significant design challenge. To elucidate the main challenges, consider the design of a clas-
sical Type-II analog charge-pump PLL shown in Fig. 2.1(a) [2] that generates a 2GHz output
from a 1 MHz reference clock. Since the PLL bandwidth, FBW, must be less than 1/10 of
the reference frequency (FBW < FREF/10) for stable operation, the PLL bandwidth
can be at most 100 kHz. This small bandwidth is not sufficient to adequately suppress the
phase noise of a ring oscillator. Additionally, achieving this bandwidth with a reason-
able phase margin requires the loop-stabilizing zero frequency, FZ, to be about 10 kHz (≈
FBW/10) [3]. With a loop filter resistor of 15 kΩ, the loop filter capacitor needs to be at
least 1 nF, which is prohibitively large to implement on-chip.
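The capacitor estimate above can be checked numerically. The following Python sketch assumes a series RC loop filter whose zero lies at FZ = 1/(2πRC); the function name and the default ratios of 10 are illustrative choices that mirror the rules of thumb quoted in the text, not part of the original design flow.

```python
import math


def loop_filter_capacitor(f_ref_hz, r_ohm, bw_ratio=10.0, zero_ratio=10.0):
    """Return (F_BW, F_Z, C) for a charge-pump PLL with a series RC loop filter.

    Assumptions (illustrative): F_BW = F_REF / bw_ratio, F_Z = F_BW / zero_ratio,
    and the stabilizing zero of the series RC network is F_Z = 1 / (2*pi*R*C).
    """
    f_bw = f_ref_hz / bw_ratio
    f_z = f_bw / zero_ratio
    c_farad = 1.0 / (2.0 * math.pi * r_ohm * f_z)
    return f_bw, f_z, c_farad


# Example from the text: 1 MHz reference, 15 kOhm loop filter resistor.
f_bw, f_z, c = loop_filter_capacitor(1e6, 15e3)
print(f"F_BW = {f_bw / 1e3:.0f} kHz, F_Z = {f_z / 1e3:.0f} kHz, C = {c * 1e9:.2f} nF")
```

With FREF = 1 MHz and R = 15 kΩ, the sketch gives a capacitance of roughly 1.06 nF, consistent with the 1 nF figure quoted above.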
Recently, much research has been devoted to moving PLLs further into the digital
domain to leverage the benefits of process scaling. Since the PLL loop acts on every reference
edge, it is periodic and discrete-time in nature. Because of this property, the analog
loop filter can be replaced with a digital loop filter by mapping it from continuous time to
discrete time using the first-order Taylor series expansion of the discrete-time operator
z = e^{sT}, as shown in Eq. (2.1).
Figure 2.1: Block diagram of conventional PLLs with a low-frequency reference clock: (a)
charge-pump-based analog PLL, and (b) digital PLL.
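$$ i_{cp} R + \frac{i_{cp}}{C s} \;\Longrightarrow\; K_P + K_I \frac{z^{-1}}{1 - z^{-1}} \qquad (2.1) $$
where
$$ K_P = i_{cp} R \quad \text{and} \quad K_I = \frac{i_{cp} T_{REF}}{C} $$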
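As an illustration of Eq. (2.1), the sketch below computes KP and KI and runs the resulting proportional-integral filter on a unit step. The charge-pump current of 10 µA is a hypothetical value chosen only for illustration; R, C, and TREF follow the example in Section 2.1. The function names are assumptions, not part of the original work.

```python
def pi_gains(i_cp, r_ohm, c_farad, t_ref):
    """Map analog loop filter parameters to digital gains as in Eq. (2.1)."""
    k_p = i_cp * r_ohm            # proportional gain, K_P = i_cp * R
    k_i = i_cp * t_ref / c_farad  # integral gain, K_I = i_cp * T_REF / C
    return k_p, k_i


def digital_pi_filter(samples, k_p, k_i):
    """Apply H(z) = K_P + K_I * z^-1 / (1 - z^-1) to a list of input samples."""
    acc = 0.0    # integral path state
    prev = 0.0   # previous input sample (the z^-1 in the integral path)
    out = []
    for x in samples:
        acc += k_i * prev
        out.append(k_p * x + acc)
        prev = x
    return out


# Hypothetical charge-pump current; R, C, T_REF taken from the Section 2.1 example.
k_p, k_i = pi_gains(i_cp=10e-6, r_ohm=15e3, c_farad=1e-9, t_ref=1e-6)
step_response = digital_pi_filter([1.0] * 5, k_p, k_i)
print(k_p, k_i, step_response)
```

For a unit step input, sample n of the output equals KP + n·KI, which matches the analog response icp·R + icp·t/C sampled at t = n·TREF.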
A digital PLL (DPLL) shown in Fig. 2.1(b) offers an attractive alternative to reduce the
area of a traditional charge-pump based PLL [4–6]. A DPLL is composed of mostly digital
circuits such as the time to digital converter (TDC), digital proportional-integral loop filter,
digitally controlled oscillator (DCO), and a feedback divider. Replacing the analog loop filter
with a digital loop filter obviates the need for a large loop filter capacitor, thus resulting
in significant area savings. However, DPLLs suffer from a conflicting bandwidth trade-off
to suppress the TDC quantization error and DCO phase noise simultaneously. A low PLL
bandwidth is needed to suppress the TDC quantization error while a wide bandwidth is
desirable to suppress the DCO phase noise [7]. As a result, the DPLL bandwidth is typically
much lower than FREF/10. This constraint further exacerbates the random jitter (RJ) issue
present in analog PLLs.
Additionally, the DPLL also suffers from a unique trade-off between the reference fre-
quency and deterministic jitter (DJ) resulting from the accumulation of phase/frequency
quantization error [8]. Both of these issues are discussed in Section 2.2. The focus of
the chapter is to address RJ and DJ performance issues associated with a low-frequency
input reference clock. To this end, we propose a cascaded digital MDLL and digital PLL
frequency synthesizer architecture with a scrambling TDC (STDC). The proposed STDC
alleviates the trade-off between the deterministic jitter accumulation and the input reference
frequency and achieves reference frequency independent deterministic jitter. Fabricated in
a 90 nm CMOS process, the prototype frequency synthesizer consumes 4.76mW power from
a 1.0V supply and generates a 2.56GHz output clock with a long-term absolute jitter of
4.18 psrms from a 1.25MHz crystal reference frequency.
The rest of the chapter is organized as follows. After describing the RJ and DJ issues in
Section 2.2, we evaluate different possible architecture options in Section 2.3 and present the
proposed architecture in Section 2.4. The implementation details of key building blocks are
discussed in Sections 2.4 and 2.5. Experimental results from a prototype implementation
fabricated in a 90 nm CMOS process are shown in Section 2.6.
2.2 Jitter in Low Reference Frequency DPLLs
[Plot: integrated jitter r.m.s. (% UI, 0 to 10) versus bandwidth (kHz, 0 to 1000).]
Figure 2.2: Effect of PLL loop bandwidth on the oscillator phase noise suppression.
The DPLL output clock jitter can be decomposed into two categories: (i) random jitter
(RJ) resulting from thermal/flicker noise sources, and (ii) deterministic jitter (DJ) caused by
quantization error sources. RJ is typically dominated by the phase noise of the ring oscillator,
which can be reduced only by increasing the PLL bandwidth or by increasing the power
consumption of the oscillator. Since the phase noise only improves by 3 dB with doubling
the power, it is more power efficient to suppress the phase noise by increasing the PLL
bandwidth to the extent possible. Therefore, for a given power budget, RJ is dictated by
the maximum allowable bandwidth, which is FREF/10, where FREF is the PLL reference
update rate.
To quantify this, consider a 2.5GHz VCO designed with a power budget of 3mW to
achieve a spot phase noise of -98 dBc/Hz at 1MHz offset frequency. When this VCO is
embedded in a PLL, the resulting RJ is plotted as a function of the PLL bandwidth in
Fig. 2.2. To achieve less than 1% UIrms of RJ, the PLL bandwidth must be at least 700 kHz,
which translates to a lower limit on FREF of about 7 MHz. Alternatively, the plot also
Figure 2.3: Effect of proportional and integral path gain on the deterministic jitter
accumulation: (a) TDC output waveform, (b) output phase dithering in the conventional
DPLL, and (c) output phase dithering in the DMDLL and the hybrid-PLL due to the TDC
output.
reveals that RJ can be at best 5% UIrms when FREF=1MHz (100 kHz bandwidth). This
issue of large RJ is further exacerbated in DPLLs because the PLL bandwidth has to be
further lowered to suppress the TDC quantization error. Assuming the PLL bandwidth is
reduced to FREF/20 (50 kHz in this example), RJ due to the VCO phase noise alone increases
to about 10% UIrms (see Fig. 2.2), which is unacceptable for most applications.
In addition to the RJ issues discussed thus far, DPLLs also suffer from a large DJ. The
hard nonlinearity of the TDC around the lock point (zero phase error) makes the DPLL
behave like a nonlinear system even in the steady state. As a result, the steady state of
a DPLL is a bounded limit cycle whose frequency and magnitude are governed by noise
and the loop delay. The DJ resulting from such a limit cycle behavior has been calculated
in [9]. However, to highlight how a low-frequency reference exacerbates the limit cycle
induced DJ we simply assume that the TDC output dithers only between two states (±1)
at FREF/2, as depicted in Fig. 2.3(a) (see footnote 1). Under this assumption, the DCO output frequency
dithers between ±(KP + KI) · KDCO, where KDCO is the DCO gain and KP and KI are
the proportional and integral path gains, respectively. The output phase accumulates as
illustrated in Fig. 2.3(b), which appears as DJ in the time domain or as a large spur at FREF/2
in the frequency spectrum. The magnitude of DJ is proportional to the limit cycle period,
2TD, the proportional path gain, KP, the integral path gain, KI, and the DCO gain, KDCO.
Because KP ≫ KI to ensure an over-damped response, output DJ is dominated by the
proportional path. In other words, the TDC quantization error appears at the output
mostly through the proportional path while a majority of it is suppressed in the integral
path. Thus, DJ is proportional to KPKDCO and the limit cycle period, 2TD. As a result,
lowering KPKDCO can reduce DJ, but it also reduces the loop bandwidth and exacerbates
RJ caused by the DCO phase noise. Therefore, new techniques are needed to eliminate the
DJ contribution from the proportional path without degrading the RJ performance and to
reduce the jitter accumulation time to much less than 2TD.
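The proportionality between DJ, the frequency step, and the accumulation time can be illustrated with a small numerical sketch. It assumes the idealized two-state limit cycle described above, in which the DCO frequency error is a square wave of amplitude (KP + KI)·KDCO; the function name, the 100 kHz step, and the time grid are hypothetical.

```python
def limit_cycle_dj_pp(freq_step_hz, half_period_s, steps_per_half=1000, cycles=4):
    """Peak-to-peak phase excursion, in output-clock cycles (UI), of an idealized limit cycle.

    The DCO frequency error is modeled as a square wave that alternates between
    +freq_step_hz and -freq_step_hz every half_period_s seconds; the phase is the
    running integral of that frequency error.
    """
    dt = half_period_s / steps_per_half
    phase = 0.0
    lo = hi = 0.0
    for k in range(2 * steps_per_half * cycles):
        sign = 1.0 if (k // steps_per_half) % 2 == 0 else -1.0
        phase += sign * freq_step_hz * dt  # integrate frequency error into phase
        lo = min(lo, phase)
        hi = max(hi, phase)
    return hi - lo


# Hypothetical numbers: 100 kHz frequency step, accumulated for 1 us and for 0.5 us.
dj_long = limit_cycle_dj_pp(100e3, 1e-6)
dj_short = limit_cycle_dj_pp(100e3, 0.5e-6)
print(dj_long, dj_short)
```

In this idealized model the peak-to-peak phase excursion equals the frequency step multiplied by the half period of the limit cycle, so halving either quantity halves DJ. The numbers are illustrative only.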
A hybrid-PLL reported in [10] provides a means to eliminate the limit cycle induced DJ in
the proportional path. By using an analog proportional path, the TDC quantization error
and its associated DJ are eliminated. The integral control, however, was implemented in the
digital domain using a simple TDC to obviate the need for a large loop filter capacitor. Be-
cause the proportional path governs PLL loop dynamics, the hybrid-PLL exhibits linear loop
dynamics and the DJ performance is greatly improved compared to a DPLL implemented
using a TDC in the proportional path. However, much like in an analog PLL, the bandwidth
is limited to FREF/10, which is shown to be inadequate to meet the RJ performance using a
low-power DCO (see Fig. 2.2).
Multiplying delay locked loops (MDLLs) are shown to achieve superior RJ performance
compared to the PLLs [11–14]. By replacing the noisy VCO edge with a clean reference clock
edge, MDLLs periodically reset the VCO jitter accumulation at a rate faster than in PLLs.
In the frequency domain, this translates to an equivalent noise suppression bandwidth of
about FREF/4, which is 2.5 times larger than the maximum allowable PLL bandwidth. As a
Footnote 1: This assumption is valid because the loop delay can be reduced to be much lower than
the reference clock period (≈ 1 µs).
[Plot: deterministic jitter (UI, 0 to 0.8) versus DCO frequency step resolution (ppm, 0 to 400).]
Figure 2.4: Effect of the DCO frequency resolution on deterministic jitter with limit cycle
period of 1µs.
result, MDLLs exhibit nearly 8 dB superior phase noise compared to a PLL using the same
oscillator [14]. Referring to Fig. 2.2, the RJ reduces to about 2% UIrms with FREF = 1MHz,
which represents an improvement of 4.5x and 2.5x compared to a DPLL and an analog PLL,
respectively.
More importantly, resetting of the DCO phase by a periodic reference injection obviates
the need for an explicit proportional path in an MDLL. As a result, digital MDLLs also
exhibit superior DJ performance because jitter accumulation is proportional to KIKDCO
instead of KPKDCO as in DPLLs (see Fig. 2.3(c)). These features make digital MDLLs
particularly suitable for high-frequency clock generation using very low-frequency reference
clocks. Next, we take a closer look at the DJ performance limitations of a digital MDLL.
The DJ of the output clock generated by a digital MDLL is proportional to KIKDCO
and the limit cycle period, 2TD. Hence, an improvement of the DJ performance requires
lowering of either KIKDCO or the limit cycle period (or both). To reduce KIKDCO a very
high-resolution DCO is required.
Figure 2.4 shows the effect of the DCO frequency resolution on DJ with a limit cycle
period of 1 µs (equivalent to a 1 MHz reference clock). Keeping DJ below 0.25 UI of the
output clock requires a DCO resolution better than 150 ppm/LSB. This translates to a very
small current step size for the DAC used in the DCO (<1 nA/LSB). On the other hand, the
limit cycle period depends upon the loop latency and the input clock period and cannot
be shorter than 2TREF. Thus, for a given DCO resolution, the reference frequency sets
the lower bound on DJ. In view of this, we present a cascaded frequency synthesizer that
uses the proposed scrambling TDC, to decouple DJ from the reference clock period. The
proposed scrambling TDC facilitates the design of a frequency synthesizer with excellent DJ
performance even when operating with a very low-frequency reference clock.
2.3 Evaluation of Frequency Synthesizer Architecture Options
Figure 2.5: Block diagram of the available architecture options: (a) DPLL only, (b)
DMDLL only, (c) cascaded DMDLL+DPLL, and (d) cascaded DMDLL+DMDLL.
MDLLs offer superior jitter performance compared to PLLs, but they are susceptible to
non-idealities of the circuit responsible for injecting a clean reference edge (select logic) in
place of a noisy VCO edge [11]. Typically, such imperfections manifest as increased DJ,
negating some of the lower DJ benefits of MDLLs. The strict waveform shape matching and
timing requirements to reduce select logic induced DJ get exacerbated at low reference fre-
quencies and large divide ratios, thus limiting MDLLs to low-to-moderate output frequencies
and small division ratios (<50) [11,14].
A cascaded frequency synthesizer in which the multiplication factor is split between two
stages reduces the division ratio in each stage and eases the design of the low-frequency
first stage. For comparison, we consider the noise performance of four possible combinations
namely, DPLL-only, DMDLL-only, cascaded DMDLL+DPLL, and cascaded DMDLL+DMDLL
as shown in Fig. 2.5. We assume that the VCO is the dominant source of noise and fix
the total VCO power at 3 mW in all four architectures. In the case of the cascaded
architectures, the VCO power is optimally split between the two stages. Since the first stage
dominates the overall noise performance, the majority of the power is used in the first-stage
VCO. The second-stage VCO (LP-VCO) is designed to guarantee sustained oscillation. With
these considerations, the first-stage VCO is designed with 2.7 mW, while the second-stage
VCO consumes only 325 µW.
The simulated output phase noise plots shown in Fig. 2.6 indicate that cascaded archi-
tectures with a digital MDLL (DMDLL) as the first stage show superior RJ performance
compared to both DMDLL-only and DPLL-only topologies. Of the two cascaded archi-
tectures, using a DMDLL in the second stage offers slightly better phase noise performance
than using a DPLL. However, as discussed earlier, additional power and design ef-
fort are needed to design high-speed select logic for the second-stage DMDLL. Thus, the
DPLL second stage is chosen in the proposed architecture for simplicity. This configura-
tion has previously been used in fractional-N frequency synthesizers to reduce the fractional divider
quantization noise [15,16]. In our work, it is used to improve the power-jitter (RJ) trade-off.
2.4 Proposed Architecture
The block diagram of the proposed frequency synthesizer is shown in Fig. 2.7 [17, 18]. It is
composed of a cascade of a digital MDLL (DMDLL) and a Type-II DPLL. The DMDLL
operates with a 1.25MHz input reference clock and provides a 160MHz output, OUT1, which
is then multiplied using a DPLL to generate the 2.56GHz output, OUT2. Figure 2.8 shows
[Plot: phase noise (dBc/Hz, -140 to -30) versus frequency (Hz, 10^3 to 10^8).]
Figure 2.6: Simulated phase noise comparison between DPLL only, DMDLL only, cascaded
DMDLLs and the proposed cascaded DMDLL+DPLL architecture.
Figure 2.7: Simplified block diagram of the proposed architecture.
the simulated phase noise contributions of the different stages of the proposed frequency
synthesizer architecture.
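The frequency plan of the two stages can be summarized with simple arithmetic. The sketch below uses only the frequencies and the power figure quoted in this dissertation; the variable names are illustrative.

```python
F_REF_HZ = 1.25e6    # crystal reference
F_OUT1_HZ = 160e6    # DMDLL output, OUT1
F_OUT2_HZ = 2.56e9   # DPLL output, OUT2
POWER_MW = 4.76      # total power quoted for the prototype

n_first = round(F_OUT1_HZ / F_REF_HZ)    # multiplication factor of the DMDLL
n_second = round(F_OUT2_HZ / F_OUT1_HZ)  # multiplication factor of the DPLL
n_total = n_first * n_second             # overall multiplication factor

# Power efficiency in mW per GHz of output frequency.
efficiency_mw_per_ghz = POWER_MW / (F_OUT2_HZ / 1e9)

print(n_first, n_second, n_total, round(efficiency_mw_per_ghz, 2))
```

Splitting the overall factor of 2048 into 128 and 16 keeps the division ratio of each stage far below what a single-stage multiplier would need.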
As discussed earlier, because of the absence of the proportional path, the DMDLL DJ is
dictated by the integral path gain and the jitter accumulation period, which is about 1µs.
[Plot: phase noise (dBc/Hz, -140 to -30) versus frequency (Hz, 10^3 to 10^8).]
Figure 2.8: Simulated phase noise contributions of the different stages in the proposed
architecture.
In this work, the proposed scrambling TDC (STDC) minimizes DJ by reducing the jitter
accumulation time as described next. An accumulator (ACC) integrates the STDC output
and drives the multiplexed ring oscillator through a current-mode digital to analog converter
(DAC). The control signal to the edge replacement multiplexer is generated by the select
logic. Details of each of these building blocks are provided next starting with the STDC.
2.4.1 Scrambling TDC
The block diagram of the proposed scrambling TDC is shown in Fig. 2.9. It consists of
a sub-sampling bang-bang phase detector (BBPD) followed by a gain scaling second-order
delta-sigma (∆Σ) modulator. The BBPD is implemented using a cascade of two flip-flops,
FF1 and FF2. The flip-flops are designed using the symmetric sense-amplifier architecture
reported in [19]. FF1 sub-samples the VCO output with the reference clock and detects the
sign of the phase error. FF2 re-samples FF1 on the negative edge of the reference clock to
Figure 2.9: Block diagram of the scrambling TDC.
Figure 2.10: Output waveforms of the scrambling TDC: (a) first-stage output, (b) final
scrambling TDC output, and (c) output phase accumulation due to scrambling TDC
output.
reduce the output state-dependent hysteresis [14]. The output of FF2, denoted as FF2OUT, is
a single bit representation of the sign of the phase error and is directly used to drive the loop
filter accumulator in conventional DMDLLs. Because FF2OUT gets updated only at every
reference clock period TREF, the jitter accumulates in proportion to TREF, as illustrated in
Fig. 2.10.
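The sub-sampling action of FF1 can be sketched behaviorally. The model below treats the VCO as an ideal square wave and returns the sign of the phase error seen at a reference edge. The +1 for early polarity, the 50% duty cycle, and the function names are assumptions; the re-sampling by FF2 and any metastability are not modeled.

```python
def vco_level(phase_cycles):
    """Ideal 50% duty-cycle square wave: high during the first half of each cycle."""
    return 1 if (phase_cycles % 1.0) < 0.5 else 0


def bbpd(vco_phase_at_ref_edge):
    """Sub-sample the VCO at the reference edge and return the sign of the phase error.

    vco_phase_at_ref_edge is the VCO phase, in cycles, at the instant of the
    reference edge. A small positive value means the VCO edge arrived early,
    so the sampled level is high. The polarity (+1 for early) is an assumption.
    """
    return 1 if vco_level(vco_phase_at_ref_edge) == 1 else -1


# Small phase errors on either side of lock give opposite decisions.
decisions = [bbpd(p) for p in (0.02, 0.001, -0.001, -0.02)]
print(decisions)
```

Because only the sign is reported, the detector output has the same magnitude for a large and a small phase error, which is the hard nonlinearity referred to in Section 2.2.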
The STDC reduces the jitter accumulation period by scrambling FF2OUT such that the
output rate is increased (to 64FREF in our implementation) without altering the mean of the
BBPD output. A digital ∆Σ modulator is used to perform scrambling as shown in Fig. 2.9.
[Plot: magnitude (dB, -150 to -60) versus frequency (Hz, 10^4 to 10^7).]
Figure 2.11: Scrambling TDC output spectrum using first- and second-order ∆Σ
modulator.
The magnitude of FF2OUT is first scaled by a factor of 2^{-12}, and the resulting 12 bits are
truncated to 1 bit using a second-order ∆Σ modulator, implemented using an error-feedback
architecture.
Clocked at a frequency of 64FREF, STDCOUT is a 1-bit sequence whose mean is equal to
that of a conventional BBPD, while the gain and output rate are scaled by factors of 2^{-12}
and 64, respectively. As a result of the increased output rate, the jitter accumulation time
is reduced, translating into a much smaller DJ, as illustrated in Fig. 2.10.
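A behavioral sketch of the scrambling operation is given below. It assumes a textbook second-order error-feedback modulator with noise transfer function (1 - z^-1)^2 and a 1-bit quantizer, using integer arithmetic in which the ±1 output corresponds to ±2^12. The circuit implementation is not described at this level of detail in the text, so the function name and coding choices are assumptions.

```python
def scramble(ff2_out, osr=64, shift=12):
    """Scale a +/-1 BBPD sequence by 2**-shift and truncate it to 1 bit at osr times the rate.

    ff2_out: iterable of +1/-1 decisions, one per reference period.
    Returns a list of +1/-1 bits, osr bits per input decision.
    Error-feedback form: v[n] = x[n] + 2*e[n-1] - e[n-2], y[n] = Q(v[n]), e[n] = v[n] - y[n].
    """
    full_scale = 1 << shift  # integer value representing a 1-bit output of +1
    e1 = e2 = 0              # truncation errors from the previous two clock cycles
    bits = []
    for decision in ff2_out:
        for _ in range(osr):              # modulator clocked osr times per reference period
            v = decision + 2 * e1 - e2    # input plus second-order shaped error feedback
            y = full_scale if v >= 0 else -full_scale  # 1-bit quantizer
            e2, e1 = e1, v - y
            bits.append(1 if y > 0 else -1)
    return bits


# Hold the BBPD output at +1 for 4096 reference periods and inspect the output mean.
stdc_out = scramble([1] * 4096)
mean_out = sum(stdc_out) / len(stdc_out)
print(len(stdc_out), mean_out)
```

With FF2OUT held at +1, the long-run mean of the output bit stream approaches 2^-12, which is the scaled BBPD output, while the bit stream itself toggles at 64 times the reference rate.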
Note that simply clocking the STDC at a higher frequency (without scaling the BBPD
output) reduces the jitter accumulation period but also increases the overall TDC gain,
potentially increasing the DJ to an unacceptable value. To mitigate this undesirable effect,
the BBPD output FF2OUT is scaled by 2^{-12} before feeding it to the ∆Σ modulator.
A second-order (as opposed to first-order) ∆Σ modulator is used in the STDC to mini-
mize jitter accumulation caused by limit cycles of the ∆Σ modulator itself. The simulated
spectrum of the STDC output, STDCOUT, shown in Fig. 2.11, reveals large spectral peaks
when a first-order ∆Σ modulator is used. These spectral peaks indicate the presence of low-
frequency limit cycles in the MDLL loop, which increase the jitter accumulation time and
degrade DJ. When a second-order ∆Σ modulator is used, no spectral peaks are observed,
resulting in a limit cycle free behavior of the MDLL loop.
The shaped ∆Σ truncation error at the output of the STDC is filtered by the DMDLL’s
low-pass jitter-transfer characteristic. Because the oscillator phase noise suppression is
mainly from the feed-forward reference edge injection, the DMDLL input jitter transfer
bandwidth can be reduced to adequately suppress the STDC quantization error. This is in
contrast to a DPLL, which suffers from a conflicting noise bandwidth trade-off to simulta-
neously suppress the TDC quantization error and oscillator phase noise [10].
It is also important to note that ∆Σ modulators are conventionally used to truncate
a high-accuracy input into fewer bits while maintaining that accuracy.
In clock multipliers, this input is provided either by a power-hungry, high-resolution
TDC [13, 20] or by a large accumulator [10, 14]. Achieving high accuracy in this way
requires a large adder in the accumulator block and an area-consuming
low-pass post filter. In contrast, the ∆Σ modulator used in the STDC provides a power- and
area-efficient means of gain scaling ($2^{-12}$) of the input, FF2OUT, while still maintaining the
effective 1-bit resolution at the input, FF2OUT, as well as at the output, STDCOUT.
2.4.2 Accumulator and DAC
The block diagram of the integral path accumulator along with the DAC schematic is shown
in Fig. 2.12. It consists of an 8-bit accumulator, a binary to thermometer converter, and a
255 unit element-based current mode DAC. The fully synthesized accumulator integrates a
single-bit STDC output and generates an 8-bit output. Since the STDC output is updated
at 64FREF, the accumulator and the subsequent DAC are also updated at 64FREF. Clocking
them at a lower rate is equivalent to down sampling the STDC output, which corrupts the
noise shaping, and severely degrades the DJ performance benefits offered by the STDC.
The 8-bit output of the accumulator is passed through a binary to thermometer converter
whose output is used to drive an 8-bit current-mode DAC consisting of 255 identical stages
Figure 2.12: Block diagram and schematic of the accumulator and the current mode DAC
used in the DMDLL.
of PMOS unit current sources (see Fig. 2.12). The output current of the DAC, IOUT, is
directly used to drive the multiplexed ring oscillator.
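The integral path described above can be illustrated with a small behavioral sketch. It assumes a saturating 8-bit accumulator (the saturation behavior is not stated in the text) and an ideal DAC with identical unit currents. The names `accumulate`, `binary_to_thermometer`, and `dac_current`, and the unit current value, are hypothetical.

```python
N_BITS = 8
N_UNITS = (1 << N_BITS) - 1   # 255 unit current sources

def accumulate(code, stdc_bit):
    """Integrate one +/-1 STDC sample into the 8-bit code.
    Saturating at the code limits is an assumption of this sketch."""
    return max(0, min(N_UNITS, code + stdc_bit))

def binary_to_thermometer(code):
    """Return 255 on/off flags with exactly `code` unit sources enabled."""
    return [1 if i < code else 0 for i in range(N_UNITS)]

def dac_current(code, i_unit=1.0):
    """Output current as the sum of enabled unit currents (i_unit is hypothetical)."""
    return i_unit * sum(binary_to_thermometer(code))

# One accumulator/DAC update per STDC sample, i.e. at 64 x FREF
code = 128
for bit in (+1, +1, -1, +1):
    code = accumulate(code, bit)
print(code, dac_current(code))
```

Because each code step enables or disables exactly one unit source, a thermometer-coded DAC of this kind is monotonic by construction.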
2.4.3 Multiplexed Ring Oscillator
The schematic of the multiplexed ring oscillator along with the select logic is shown in
Fig. 2.13. The core ring oscillator is implemented using 44 current starved CMOS inverter-
based delay stages. Since the oscillation frequency of the ring is relatively low, a large
number of inverter stages are used to achieve sharp rise and fall times, which helps minimize
pattern jitter during reference injection. A divider along with the select logic is used for
periodically injecting a reference clock into the ring oscillator. The select logic is similar to
the one employed in [13] and is implemented using standard cells. Using the divider output
and the output of an intermediate delay stage in the oscillator, the select logic generates a
select signal, SEL, for the multiplexer. A NAND-gate based multiplexer is used for selecting
between the buffered reference clock, REFB and oscillator output, OUTB. To minimize
clock feed through, both REFB and OUTB signals are connected farthest away from the
Figure 2.13: Schematic of the multiplexed ring oscillator.
multiplexer output (see Fig. 2.13).
Figure 2.14: Simulated sense-amplifier offset using Monte-Carlo analysis. (Histogram of
occurrence versus sense-amplifier offset [ps]; σoffset = 0.14 ps, Trise = 30 ps, total runs = 1000.)
Figure 2.15: Effect of reference injection on the multiplexed ring oscillator output
waveform: (a) with common supply voltage, and (b) with independent supply.
The static phase offset (SPO) in the DMDLL must be minimized as it directly appears as
pattern jitter at the output [14]. The three main sources of SPO are: (a) the voltage offset of
FF1 in the STDC (see Fig. 2.9), (b) the rise/fall time mismatch between REFB and OUTB
signals, and (c) the periodic voltage ripple on the DCO supply node (VCC in Fig. 2.13). The
voltage offset of FF1 is reduced by up-sizing the transistors that contribute to voltage offset
and the impact of the voltage offset on SPO is minimized by reducing the rise/fall times of
both the reference clock and the VCO inputs of FF1. Monte-Carlo simulations indicate that
the standard deviation of the input referred voltage offset of FF1 is 3.7mV, which translates
to an SPO of only 0.14 ps with a rise/fall time of 30 ps as shown in Fig. 2.14. The rise/fall
time mismatch between REFB and OUTB is minimized by buffering the reference clock using
replica delay stages. Using a large number of stages in the oscillator lowers the rise/fall time,
which helps to reduce the impact of the rise/fall time mismatch on the static phase offset.
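The conversion from voltage offset to static phase offset quoted above follows from dividing the offset voltage by the slew rate at the FF1 inputs. As a rough check, assuming that the 30 ps figure is a 10%-90% transition time on a 1.0 V swing (neither assumption is stated in the text), the numbers are consistent:

```latex
\[
t_{\mathrm{SPO}} \approx \frac{V_{\mathrm{OS}}}{\mathrm{d}V/\mathrm{d}t}
 = \frac{V_{\mathrm{OS}}\, t_{\mathrm{rise}}}{0.8\, V_{\mathrm{DD}}}
 = \frac{3.7\,\mathrm{mV} \times 30\,\mathrm{ps}}{0.8 \times 1.0\,\mathrm{V}}
 \approx 0.14\,\mathrm{ps}
\]
```

Under this model the offset-induced SPO scales linearly with the transition time, which is why sharper edges at the FF1 inputs reduce it for a given voltage offset.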
Figure 2.15(a) shows the effect of the reference injection on the VCO supply node VCC
when the MDLL ring and the multiplexer supply are connected to the same node (VCC). In
this case, the voltage swing difference between the injected reference REFB (VCC1) and the
ring output OUTB (VCC) causes a periodic ripple on the VCO supply node VCC. This ripple
changes the VCO output clock period (TSHORT) and appears as DJ at the DMDLL output.
To minimize DJ, the ring VCO control is split into two parts: a variable control voltage VCC
and a fixed voltage VDD. The multiplexer along with the delay stages connected to its input
and output are connected to VDD while the rest of the delay stages are connected to the
control node VCC as shown in Fig. 2.13. Separating the VCO control voltage from the
circuitry responsible for reference injection minimizes reference clock feedthrough via the
supply node and helps reduce DJ (see Fig. 2.15(b)).
2.5 Second-Stage DPLL
Figure 2.16: Block diagram of the second-stage DPLL.
The second stage of the proposed frequency synthesizer is implemented using a hybrid-
PLL as shown in Fig. 2.16 [10]. It uses a 160MHz DMDLL output as the reference clock and
generates a 2.56GHz output signal (multiplication factor of M=16). Classical TDC-based
digital PLLs suffer from a coupled noise bandwidth trade-off, which severely limits their
jitter performance. In other words, the need to suppress the TDC quantization by low-pass
filtering and the DCO phase noise by high-pass filtering mandates either a high-resolution
TDC or a low-noise oscillator both of which increase power dissipation. By using an analog
proportional path, a hybrid-PLL eliminates the TDC quantization error and alleviates the
coupled noise bandwidth trade-off of digital PLLs. Furthermore, an analog proportional
path also helps ease the resolution requirements of the digital integral path. As a result, a
hybrid-PLL is capable of achieving low jitter with low power consumption. Other than
the analog proportional path, the hybrid-PLL resembles a conventional digital PLL with the
digital integral path controlling the current controlled oscillator (CCO) through a digital to
analog converter.
Analog proportional control is implemented using a three-state PFD that directly drives
the CCO through a three-level current-mode DAC, denoted as PDAC in Fig. 2.16 [10].
Because the PFD produces an output proportional to the input phase error without any
quantization error, the hybrid-PLL behaves like a linear system in steady-state and exhibits
well-controlled loop dynamics. Integral control is implemented by digitally accumulating the
sign of the phase error, which is detected from the UP/DN
outputs of the PFD. A flip-flop (FF) performs sign detection and its output is integrated
in an 18-bit accumulator. The four least significant bits (LSBs) are dropped to reduce the
gain of the integral path and the remaining 14 bits are fed to a ∆Σ DAC. A second-order
digital ∆Σ modulator operating at twice the reference frequency truncates 14 bits to 8 bits
and generates the integral control using an 8-bit current-mode DAC.
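The bit widths in the integral path can be traced with a short behavioral sketch. It assumes a saturating accumulator, an error-feedback form for the second-order ∆Σ truncation, and no clamping of the 8-bit code; none of these implementation details are specified in the text. The function name `integral_path` and the initial accumulator value are hypothetical.

```python
ACC_BITS = 18
DROP_LSBS = 4                                        # gain scaling of 2^-4
DSM_OUT_BITS = 8
TRUNC_BITS = (ACC_BITS - DROP_LSBS) - DSM_OUT_BITS   # 14 -> 8 bits drops 6 bits
DSM_RATE = 2                                         # modulator runs at 2 x FREF

def integral_path(signs, acc0):
    """Accumulate +/-1 phase-error signs and return the DAC codes
    produced by a second-order error-feedback truncation."""
    acc = acc0
    e1 = e2 = 0                      # previous two truncation errors
    codes = []
    for s in signs:
        acc = max(0, min((1 << ACC_BITS) - 1, acc + s))
        word14 = acc >> DROP_LSBS    # drop the four LSBs
        for _ in range(DSM_RATE):
            v = word14 + 2 * e1 - e2
            code = v >> TRUNC_BITS   # keep the upper bits (no clamping modeled)
            e2, e1 = e1, v - (code << TRUNC_BITS)
            codes.append(code)
    return codes

# Hypothetical steady state: sign toggles around a fixed accumulator value
acc0 = (1 << 17) + 16 * 21           # gives a 14-bit word of 8213
codes = integral_path([1, -1] * 500, acc0)
print(sum(codes) / len(codes), 8213 / 64)
```

The average of the 8-bit codes tracks the 14-bit word divided by 64, so the resolution lost in truncation is recovered on average while the truncation error is pushed to high frequencies.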
The schematic of the current controlled ring oscillator used in the DPLL is shown in
Fig. 2.17. The delay cells are implemented using current starved pseudo differential CMOS
inverters coupled in a feed-forward manner using transmission gates. The oscillation fre-
quency is controlled by varying the input current, ICTRL, with the help of a current source.
An ac coupled inverter biased with a replica inverter is used as an output buffer to achieve
rail-to-rail output swing with low power and minimal duty cycle distortion.
Figure 2.17: Schematic of the ring oscillator used in the second-stage DPLL.
2.6 Experimental Results
A complete block diagram of the proposed frequency synthesizer is shown in Fig. 2.18. Using
a 1.25MHz reference clock, the first-stage DMDLL and the overall DMDLL+DPLL provide
160MHz, OUT1 (multiplication factor of N=128), and 2.56GHz, OUT2 (multiplication
factor of N×M=2048), outputs, respectively. A prototype of this frequency synthesizer is
implemented in a 90 nm CMOS logic process and occupies an active area of 0.16mm2. The
die micrograph of the prototype is shown in Fig. 2.19. The die is packaged in a standard
48-pin plastic QFN package and characterized using a four-layer FR-4 printed circuit board.
The reference clock is provided by a 1.25MHz crystal oscillator. Operating from a 1.0V
supply, the prototype chip consumes 4.76mW of power, out of which 3.56mW is consumed
by the DMDLL and the rest (1.2mW) is consumed by the second-stage DPLL.
The measured voltage spectrum of the DMDLL output is depicted in Fig. 2.20(a) when
the STDC is not enabled. In this case, 1-bit TDC output, FF2OUT, is directly fed to the
Figure 2.18: Complete block diagram of the proposed frequency synthesizer.
Figure 2.19: Die micrograph of the proposed frequency synthesizer. (Block labels in the
micrograph: 8-bit DAC, DMDLL, DPLL, PDAC, IDAC, ACC, VCO, DSM, STDC & ACC, RING OSC & SEL LOGIC.)
accumulator. As expected, large limit cycles appear as spectral peaks at various locations in
the output clock spectrum. When the proposed STDC is enabled the jitter accumulation pe-
riod reduces and the magnitude of limit cycles is greatly suppressed as shown in Fig. 2.20(b).
In this case, the dominant spurious tone is the reference spur of magnitude -53 dBc caused
by the residual static phase offset.
Time domain measurement of long-term absolute jitter of the DMDLL output at 160MHz
is performed using a digital sampling oscilloscope (Tektronix DSA8300). When the STDC
is OFF, the long-term jitter is 20.07 psrms and 99 ps peak-to-peak as shown in Fig. 2.21(a).
When the STDC is turned ON, the jitter reduces to 2.4 psrms (22.1 ps peak-to-peak jitter)
limited only by the random noise component (see Fig. 2.21(b)). This represents approxi-
mately an 8× improvement in rms jitter and a 4× improvement in the peak-to-peak jitter.
To quantify the performance of the stand-alone second-stage DPLL which provides a
2.56GHz output, OUT2, its jitter performance is measured by providing a 160MHz external
reference clock generated using an arbitrary waveform generator (Tektronix AWG70002A).
The measured long-term absolute jitter is 3.4 psrms and 26.1 ps peak-to-peak, as shown in
Fig. 2.22.
Figure 2.23 shows the time domain measurement of the complete 1.25MHz to 2.56GHz
frequency synthesizer in the cascaded DMDLL+DPLL configuration. When the STDC is
OFF, the final 2.56GHz output, OUT2, has a long-term absolute jitter of 30.2 psrms and
154.8 ps peak-to-peak (see Fig. 2.23(a)). The poor jitter performance is mainly attributed
to the large deterministic jitter introduced in the DMDLL. This large peak-to-peak jitter is
approximately 40% of the output time period (≈390 ps), which makes the 2.56GHz clock
output, OUT2, unusable for any practical application. When the STDC is enabled, the jitter
reduces to 4.18 psrms and 35.2 ps peak-to-peak, as shown in Fig. 2.23(b), which represents
approximately a 7× improvement in rms jitter and a 4× improvement in the peak-to-peak
jitter. Figure 2.24 shows the measured phase noise plots of the DMDLL and cascaded
DMDLL+DPLL outputs using an Agilent E4440A spectrum analyzer. At a 100 kHz off-
set frequency, the DMDLL and DMDLL+DPLL achieve an in-band phase noise floor of
-106.2 dBc/Hz and -81.8 dBc/Hz, respectively. The difference between the two noise floors
is 24.4 dB, which is only 0.4 dB larger than the ideal value of 24 dB ($10\log_{10}(M^2)$). Thus the
in-band phase noise contribution at the output is dominated by the DMDLL phase noise.
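The ratios quoted in the measurements above can be reproduced with a few lines of arithmetic. This is only a consistency check on the reported numbers; the variable names are illustrative.

```python
import math

F_OUT2 = 2.56e9                 # final output frequency [Hz]
M = 16                          # second-stage multiplication factor
T_OUT2_PS = 1e12 / F_OUT2       # output period [ps]

# Cascaded DMDLL+DPLL jitter, STDC OFF versus ON (values from the text)
pkpk_fraction_off = 154.8 / T_OUT2_PS   # peak-to-peak jitter as a fraction of the period
rms_gain = 30.2 / 4.18                  # improvement in rms jitter
pkpk_gain = 154.8 / 35.2                # improvement in peak-to-peak jitter

# In-band phase noise scaling between the 160 MHz and 2.56 GHz outputs
ideal_scaling_db = 20 * math.log10(M)   # same as 10*log10(M^2)
measured_db = -81.8 - (-106.2)

print(f"period = {T_OUT2_PS:.1f} ps, pk-pk/period = {pkpk_fraction_off:.1%}")
print(f"rms improvement = {rms_gain:.1f}x, pk-pk improvement = {pkpk_gain:.1f}x")
print(f"ideal = {ideal_scaling_db:.2f} dB, measured = {measured_db:.1f} dB")
```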
Table 2.1: Performance Comparison of the Proposed Frequency Synthesizer with State-of-the-Art Low-Frequency Input Reference Designs.

                             This Work   This Work      Watanabe &      Chung &    Chen
                             DMDLL       DMDLL+DPLL     Yamauchi [21]   Ko [22]    et al. [23]
Technology                   90 nm       90 nm          65 nm           65 nm      0.18 µm
Supply Voltage [V]           1.0         1.0            5.0             1.0        1.8
Reference Frequency [MHz]    1.25        1.25           0.3397          0.941      0.019
Output Frequency [MHz]       160         2560           61.3            527        4.3
Divide Ratio                 128         2048           1022            5600       224
Power Consumption [mW]       3.6         4.76           N/A             1.81       15*
Jitter RMS/Pk-Pk [ps]        2.4/22.1    4.18/35.2      234/1590        8.6/165    45/110
Area [mm2]                   0.10        0.16           1.17            0.07       0.16
Power Efficiency [mW/GHz]    22.5        1.86           N/A             3.5        39.7
* @378MHz output frequency.
Table 2.1 shows the performance summary of the proposed frequency synthesizer at both
the 160MHz MDLL and 2.56GHz MDLL+DPLL outputs. Performance of the state-of-
the-art frequency synthesizers operating with a low-frequency reference is also presented in
Table 2.1. In comparison to the existing frequency synthesizers, the proposed architecture
achieves the best power efficiency of 1.86mW/GHz and lowest long-term absolute jitter of
4.18 psrms and 35.2 ps peak-to-peak.
Figure 2.20: DMDLL output spectrum at 160MHz: (a) when STDC is OFF, and (b) when STDC is ON.
Figure 2.21: Measured DMDLL output jitter histogram at 160MHz: (a) when STDC is OFF, and (b) when STDC is ON.
Figure 2.22: Measured stand-alone DPLL output jitter histogram at 2.56GHz.
Figure 2.23: Measured MDLL+DPLL output jitter histogram at 2.56GHz: (a) when STDC is OFF, and (b) when STDC is ON.
Figure 2.24: Measured phase noise plot at the DMDLL and DMDLL+DPLL cascaded outputs.
CHAPTER 3
CALIBRATION-FREE FRACTIONAL-N CLOCK
GENERATION USING HYBRID PHASE
DETECTOR/INTERPOLATOR
3.1 Introduction
Figure 3.1: Block diagram of a conventional charge-pump-based fractional-N PLL.
Fractional-N phase-locked loops (PLLs) provide a convenient means to generate high fre-
quency clocks whose frequency can be controlled accurately. Conventional fractional-N PLLs
are typically implemented using the charge-pump-based architecture shown in Fig. 3.1 [24].
In addition to a phase frequency detector (PFD), loop filter, charge-pump, and a voltage
controlled oscillator (VCO), it consists of a dual modulus divider that is dithered by a delta-
sigma (∆Σ) modulator. Because frequency division ratio of the feedback divider is restricted
to only integer values (N), fractional division ratio (N+α) is obtained by switching the di-
vide ratio between integer values using the ∆Σ output, y[k], i.e. instantaneous division ratio
is equal to N+y[k]. The ∆Σ modulator truncates the input frequency control word and
generates a sequence of integers, y[k], with a running average equal to the fractional division
ratio, α. The quantization error introduced by the ∆Σ modulator is filtered by the low-pass
action of the PLL feedback loop. The impact of quantization error on output phase noise
can be reduced to negligible levels by reducing the PLL bandwidth.
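The divider dithering described above can be illustrated with a minimal sketch. It assumes a first-order (accumulator-overflow) modulator for simplicity, since the modulator order of the conventional PLL is not specified here, and the values N = 4 and α = 1/4 are arbitrary examples. The function name `first_order_dsm` is hypothetical.

```python
from fractions import Fraction

def first_order_dsm(alpha, n_cycles):
    """Accumulator-overflow (first-order) modulator.
    Returns the 0/1 sequence y[k] whose running average approaches alpha."""
    acc = Fraction(0)
    y = []
    for _ in range(n_cycles):
        acc += alpha
        if acc >= 1:
            acc -= 1
            y.append(1)      # divide by N+1 this cycle
        else:
            y.append(0)      # divide by N this cycle
    return y

N = 4                        # integer part of the division ratio (example value)
alpha = Fraction(1, 4)       # fractional part (example value)
y = first_order_dsm(alpha, 1000)
ratios = [N + yk for yk in y]               # instantaneous division ratio N + y[k]
average_ratio = Fraction(sum(ratios), len(ratios))
print(ratios[:8], float(average_ratio))
```

The instantaneous ratio is always an integer, while the long-term average equals N+α. The difference between the two at each cycle is the quantization error that the loop must filter.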
Besides the ∆Σ modulator, the VCO is a major source of phase noise in a fractional-N PLL.
Owing to its high-pass noise transfer function, VCO phase noise can be suppressed by in-
creasing the PLL bandwidth. Therefore, choosing the PLL bandwidth that suppresses both
the ∆Σ quantization error and VCO phase noise adequately becomes very challenging. It is
this conflicting bandwidth requirement that complicates the design of low-noise fractional-N
PLLs with low power.
The noise bandwidth trade-off is quantified by using the simulated phase noise plots shown
in Fig. 3.2. In the first case, PLL bandwidth is chosen low enough to make the contribution of
quantization to the output phase noise negligible (see Fig. 3.2(a)). In this particular example,
the bandwidth is only 250 kHz, the reference frequency is 50MHz and the spot phase noise
of the free running VCO is -87 dBc/Hz at 1MHz offset. Due to the low bandwidth, the total
output phase noise is dominated by the VCO phase noise resulting in an integrated jitter of
about 10 psrms, which is unacceptably large for most applications. On the other hand, when
the PLL bandwidth is increased to 2.5MHz ($F_{REF}/20$), VCO phase noise is greatly suppressed
(see Fig. 3.2(b)). But output phase noise is dominated by ∆Σ quantization error resulting
in an integrated jitter of more than 7 psrms, which is also large and unacceptable. This
noise bandwidth trade-off makes the conventional architecture not well suited particularly
for ring-VCO based PLLs because ring-VCO phase noise is significantly higher than that of
LC-VCOs.
Several techniques that seek to mitigate the phase noise and bandwidth trade-off using
quantization error cancellation have been proposed [25–28]. As described in Section 3.2, the
effectiveness of these techniques is limited by gain mismatch between the quantization error
and the cancellation paths as well as nonlinearity caused by analog circuit imperfections. As
a result, complex calibration is often required to achieve high performance. In this chapter,
Figure 3.2: Simulated phase noise plot for a charge-pump-based fractional-N PLL (a) using
low bandwidth for ∆Σ quantization noise cancellation, and (b) using high bandwidth for
VCO phase noise suppression. (Both panels plot phase noise [dBc/Hz] versus frequency [Hz]
from 10^3 to 10^7, with traces for VCO Open Loop, VCO Noise, Detector Noise, ∆Σ Noise, and
Total PLL Noise.)
we demonstrate a ring-VCO based fractional-N PLL that employs a hybrid phase/current
phase interpolator (HPC-PI) to achieve highly accurate error cancellation without using
calibration. The maximum loop bandwidth to suppress ring-VCO phase noise is limited
to one-tenth the reference frequency. To overcome this, reference clock frequency to the
fractional-N PLL is increased from 50MHz to 500MHz using an on-chip low-noise integer-N
digital frequency multiplier. Fabricated in a 65 nm CMOS process, the prototype fractional-
N frequency synthesizer consumes 11.6mW power and generates fractional output frequency
in the range of 4.25 to 4.75GHz from a fixed 50MHz reference clock. The proposed clock
multiplier achieves an integrated jitter of 1.5 psrms with a power efficiency of 2.4mW/GHz
resulting in an FoM of -225.8 dB.
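The quoted figure of merit is consistent with the widely used jitter-power FoM. The definition below is an assumption, since it is not given in this section, but it reproduces the stated value from the reported jitter and power:

```latex
\[
\mathrm{FoM} = 10\log_{10}\!\left[\left(\frac{\sigma_t}{1\,\mathrm{s}}\right)^{2}
 \cdot \frac{P}{1\,\mathrm{mW}}\right]
 = 20\log_{10}\!\left(1.5\times10^{-12}\right) + 10\log_{10}(11.6)
 \approx -236.48 + 10.64 \approx -225.8\ \mathrm{dB}
\]
```

The power efficiency of 2.4 mW/GHz is likewise consistent with 11.6 mW at the upper end of the output range (11.6/4.75 ≈ 2.44), although the operating point used for that number is not stated here.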
The rest of this chapter is organized as follows. Prior art on quantization error cancellation
is briefly discussed in Section 3.2 with the goal of motivating the proposed PLL architecture
presented in Section 3.3. The circuit implementation details are described in Section 3.4.
The measured results are presented in Section 3.5.
3.2 Prior Art
Quantization error cancellation is a common technique used to mitigate the bandwidth trade-
off in fractional-N PLLs. As shown in Fig. 3.3, quantization error of the ∆Σ modulator
(EQ), is computed by subtracting its input from the output and canceled at the output
of charge-pump using a current-mode digital-to-analog converter (IDAC) [25]. Because the
dual modulus divider acts as a phase integrator, a digital phase accumulator is needed in the
cancellation path to explicitly integrate EQ. In other words, IDAC outputs a current whose
magnitude is equal to the amount of charge-pump current resulting from the quantization
error of the ∆Σ modulator. Perfect cancellation of EQ enables an increase in the PLL loop
bandwidth to adequately suppress VCO phase noise. However in practice, the effectiveness
of this approach is severely limited by gain and timing mismatch between the quantization
error path through the divider, PFD, charge-pump and the cancellation path through the
digital phase accumulator and the IDAC. As a result, high-precision analog circuitry and
extensive calibration is needed to mitigate these path mismatches and to achieve acceptable
Figure 3.3: Block diagram of fractional-N PLL utilizing current-mode DAC based error
cancellation.
performance [26].
Figure 3.4: Block diagram of phase interpolator based fractional-N PLL.
An alternative technique shown in Fig. 3.4 performs quantization error cancellation at the
output of the divider [27–29]. As before, EQ is first integrated using the phase accumulator.
Figure 3.5: Phase interpolation operation using conventional PI.
However, in contrast to the previous IDAC-based approach, cancellation is performed in the
phase domain by using a phase interpolator (PI), which converts phase accumulator output
into phase and subtracts it from the phase of the divider output. Compared to the IDAC-
based approach, this technique is less susceptible to path mismatch because of the absence of
additional phase to current conversion. Nevertheless, the effectiveness of this approach also
depends on the accuracy with which cancellation path gain matches the gain of quantization
error path through the divider. Therefore, the PI must be designed to have: (i) a fixed and
known gain and (ii) a linear range that is large enough to cancel phase quantization error,
ΦEQ , at the output of the divider. The range of ΦEQ depends on the number of output bits
of the ∆Σ modulator. For a 1-bit output ∆Σ modulator, ΦEQ spans one VCO period.
A PI is most commonly implemented using a two-step architecture. Two adjacent clock
phases are selected from M phases based on the most significant bits (MSBs) of the input
digital word. An interpolator, controlled by the remaining least significant bits (LSBs),
mixes these two phases to generate the output phase. The M phases are either generated
using an explicit multi-phase generator [29] or tapped off from the ring-VCO itself [28].
Figure 3.5 shows the basic working principle of the conventional interpolator [29]. It uses
two adjacent phases Φ1, and Φ2, as input and generates an interpolated waveform ΦOUT
based on input control word, α. The interpolation operation is performed in two steps.
First, inputs Φ1, and Φ2, are passed through a signal conditioning circuit (SigCon), which
slows down the edges by increasing rise/fall times and generates outputs Φ1S, and Φ2S, such
that both waveforms overlap during transition time. In the second step, Φ1S, and Φ2S, are
passed through an interpolator circuit (Interp) consisting of two buffers. These buffers scale
Φ1S, and Φ2S, by interpolation weights 1 − α and α, respectively, and the resulting buffer
outputs are summed to generate the interpolated output phase, ΦOUT.
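The ideal behavior of this two-input interpolation can be written compactly. The expressions below are a sketch of the idealized linear model only, assuming M uniformly spaced input phases and an interpolation weight set by an L-bit code k. The symbol k is introduced here for illustration, and nonidealities of a real interpolator are ignored.

```latex
\[
\Phi_{\mathrm{OUT}} \approx (1-\alpha)\,\Phi_{1} + \alpha\,\Phi_{2}
 = \Phi_{1} + \alpha\,(\Phi_{2}-\Phi_{1}),
 \qquad \alpha = \frac{k}{2^{L}},\quad \Phi_{2}-\Phi_{1} = \frac{2\pi}{M}
\]
\[
\Rightarrow\quad \frac{\Delta\Phi_{\mathrm{OUT}}}{\Delta k} = 2^{-L}\cdot\frac{2\pi}{M}\ \ \mathrm{rad/LSB}
\]
```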
An important advantage of this implementation is that the gain of the PI is very well defined and
is equal to $2^{-L} \cdot 2\pi/M$ rad/LSB, where L is the number of input bits. However, the main
drawback is that the PI linearity depends on several factors, such as the integral nonlinearity
(INL) of the multi-phase generator, input waveform shapes, their rise/fall times, phase sepa-
ration of the PI inputs, and the interpolator output time constant [30]. Additionally, it also
depends heavily upon the inherent nonlinearity of current to rise/fall time conversion process
in SigCon block. The interpolation range of the conventional PI is also limited depending
upon the transition time overlap between SigCon outputs, which necessitates the use of a
greater number of phases from the multi-phase generator to cover the entire interpolation
range. These factors limit the PI linearity to about 4-5 bits in practice, which severely
restricts the effectiveness of the PI-based quantization error cancellation approach. To over-
come phase interpolator nonlinearity, elaborate calibration [27] or noise-shaped segmentation
techniques [31] are needed, both of which add to the design complexity.
The quantization error cancellation can also be achieved by using a delay line in-place of
a phase interpolator [32] in the feedback path or in the reference clock path [33]. However,
for efficient cancellation the delay line range needs to be precisely one VCO clock period.
Additionally, the delay line should also be linear across the whole range of operation. To meet
the gain and linearity requirements, this approach requires additional calibration circuitry,
which increases design complexity.
Even if the PLL bandwidth is extended to FREF/15 (2MHz in [31]) using one of the above
quantization error cancellation techniques, it may not be sufficient to adequately suppress
ring-VCO phase noise, especially in deep sub-micron technologies. For instance, behavioral
simulations indicate integrated jitter can be as high as 2.3 psrms when the PLL bandwidth
is chosen to be 3.5MHz (FREF/15) and VCO phase noise is -87 dBc/Hz at 1MHz offset.
In view of these drawbacks, we present a ring-VCO based calibration-free fractional-N PLL
that achieves accurate quantization error cancellation and efficient ring-VCO phase noise
suppression simultaneously.
3.3 Proposed Architecture
Figure 3.6: Simplified block diagram of the proposed HPC-PI based PLL.
A simplified block diagram of the proposed hybrid phase/current phase interpolator based
fractional-N PLL is shown in Fig. 3.6 [34]. In addition to a conventional type-II loop filter
(LF) and the ring-VCO, it consists of a HPC-PI formed by a dual modulus multi-phase
divider along with an XOR-based phase detector/interpolator (XOR PD-PI) block. By
combining the functions of phase detection and phase interpolation in XOR PD-PI block as
discussed next, this architecture eliminates the path mismatch issues associated with current-
mode DAC based cancellation approach. By using a HPC-PI it mitigates the nonlinearity
of conventional voltage-mode phase interpolators. Thus, this architecture can accurately
cancel quantization error without the need for any calibration.
A detailed block diagram of the hybrid phase/current phase interpolator is shown in
Fig. 3.7. A divide-by-2 stage divides the VCO output and generates four equally spaced
phases denoted as I, IB, Q and QB at half the VCO frequency (FVCO/2). Out of the four
phases, QB phase is fed to the dual modulus 4/5 divider that is controlled by the truncated
frequency control word (FCW). A second-order ∆Σ modulator truncates the 20-bit frequency
Figure 3.7: Block diagram of hybrid phase/current phase interpolator (HPC-PI). Phase Mux
selection shown in the figure:
SUM MSBs   Φ1    Φ2
00         PI    PQ
01         PQ    PIB
10         PIB   PQB
11         PQB   PId
control word into 6 bits and feeds them to a digital phase accumulator. The digital phase
accumulator performs integration and a modulo-2π operation to generate a 6-bit SUM output
and a 1-bit CARRY output. The phase accumulator can be viewed as a 1-bit first-order
∆Σ modulator, hence the CARRY output is used to control the divide-by-4/5 dual modulus
divider. The SUM output represents the phase quantization error, ΦEQ, and is used to control
the HPC-PI such that the ∆Σ quantization error is canceled. This architecture maps the ∆Σ
modulator output bits directly into the phase domain such that the range of quantization error
is limited to two VCO clock periods (2TVCO), irrespective of the number of bits in the modulator output.
The divider output is synchronized by all the four phases using flip-flops to generate four
phases denoted as PI, PQ, PIB and PQB (see Fig. 3.7). Apart from these four phases, a
version of output phase PI delayed by two VCO cycles, called PId, is also generated by
synchronizing PI again with phase I of the divide-by-2 based quadrature phase generator. These
phases along with the SUM output of the digital phase accumulator are fed to the HPC-PI.
The HPC-PI performs phase quantization error cancellation by interpolating between the
synchronized divider output phases based on the 6-bit SUM output of the digital phase
accumulator. The phase interpolation itself is implemented in two stages. In the first stage,
a 5-to-2 Phase Mux selects two adjacent phases, Φ1 and Φ2, from the five input phases PI,
PQ, PIB, PQB and PId based on two MSBs of the SUM signal. This phase selection operation
provides 2-bit coarse error cancellation in phase domain and can generate error-free fractional
division ratios of 4.25, 4.5, and 4.75 with respect to half the VCO frequency.
Figure 3.8: Timing waveforms of the MSB interpolation using HPC-PI with divide ratio of
4.25.
Figure 3.8 shows the detailed timing waveforms of coarse error cancellation for a divide
ratio of 4.25. The MSBs of the SUM signal are incremented by one code every reference
period so that the Phase Mux adds a quarter clock period phase shift to the divider output,
resulting in a fractional division ratio of 4.25. Under this condition, phase quantization error
is perfectly canceled by the first-stage coarse PI itself. For glitch-free operation the SUM
signal is synchronized with the negative edge of phase PId, so that all the four phases are
Figure 3.9: Timing waveforms of the LSB interpolation using HPC-PI.
at the same voltage level (0 V in this case) when the Phase Mux control signal is changed.
In addition to Φ1, its adjacent phase Φ2, which is half the VCO period (TVCO/2) delayed
from Φ1 is also selected for fine phase interpolation in the second stage. Note that adjacent
phase, Φ2, is readily available when either PI, PQ or PIB is selected as Φ1. However, when
phase PQB is selected (Φ1 = PQB), PI cannot be used as the adjacent phase output Φ2,
because instead of being delayed by half a VCO period relative to PQB, phase PI is one and a
half VCO periods (3TVCO/2) ahead of PQB (see Fig. 3.8). Therefore, instead of phase PI, a version
of PI delayed by two VCO cycles, denoted as PId, is used.
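The control flow of the phase accumulator and Phase Mux can be sketched as follows. This is a behavioral illustration only: it feeds a constant 6-bit fractional word directly into the accumulator (bypassing the ∆Σ truncation of the 20-bit control word) and ignores the retiming of SUM to the negative edge of PId. The name `hpc_pi_control` and the tuple layout are hypothetical, while the selection table follows Fig. 3.7.

```python
SUM_BITS = 6
MODULUS = 1 << SUM_BITS          # 64 codes span the accumulator range

# Phase Mux selection versus the two SUM MSBs (from Fig. 3.7)
PHASE_MUX = {
    0b00: ("PI", "PQ"),
    0b01: ("PQ", "PIB"),
    0b10: ("PIB", "PQB"),
    0b11: ("PQB", "PId"),        # PId replaces PI as the adjacent phase
}

def hpc_pi_control(frac_word, n_cycles, base_div=4):
    """Per reference cycle, return (divide value, phi1, phi2, alpha)."""
    acc = 0
    rows = []
    for _ in range(n_cycles):
        acc += frac_word
        carry, acc = divmod(acc, MODULUS)   # CARRY selects divide-by-4 or 5
        msbs = acc >> 4                     # two MSBs: coarse phase selection
        alpha = (acc & 0xF) / 16            # four LSBs: fine interpolation weight
        phi1, phi2 = PHASE_MUX[msbs]
        rows.append((base_div + carry, phi1, phi2, alpha))
    return rows

# Fractional word 16/64 = 0.25, i.e. a divide ratio of 4.25
rows = hpc_pi_control(16, 8)
for row in rows:
    print(row)
print("average divide ratio:", sum(r[0] for r in rows) / len(rows))
```

For this input the SUM MSBs advance by one code per reference cycle and the four LSBs stay at zero, so the coarse phase selection alone absorbs the quantization error, in line with the 4.25 example in Fig. 3.8.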
Fine phase interpolation in the second stage is directly implemented inside the phase
detector/interpolator (XOR PD-PI) block by scaling its output as depicted in Fig. 3.7. The
phase detector consists of two XOR gates that compare Phase Mux outputs Φ1 and Φ2 with
reference clock, REF. Denoting interpolation weight corresponding to the four LSBs of SUM
signal as α, fine phase adjustment is achieved by weighting the output currents of two phase
detectors by α and 1 − α and by summing them to generate the output current. Fig. 3.9
shows the timing waveforms for the 4-bit LSB interpolation. Similar to the two MSBs case
in the first stage, four LSBs of the SUM signal are also synchronized with the negative edge
of PId phase for glitch-free operation. Because phase interpolation is performed by scaling
output currents, much better linearity can be achieved [35], [36], compared to inherently
nonlinear current or voltage to phase conversion based phase interpolator [37]. Slew-rate
controllers are also not needed. Another important advantage of this architecture is its
suitability at low frequencies because of its insensitivity to Φ1 and Φ2 waveform shapes.
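The linearity argument above can be checked with an idealized discrete-time sketch. It assumes 50% duty-cycle square waves, ideal XOR gates, and ideal current weighting, with weight 1−α applied to the Φ1 detector and α to the Φ2 detector (the text does not say which detector receives which weight). The function names and the sample values are hypothetical.

```python
def square(t, period):
    """Ideal 50% duty-cycle square wave sampled at integer time t."""
    return 1 if (t % period) < period // 2 else 0

def xor_pd_average(delay, period=1024):
    """Average XOR output of REF and a copy delayed by `delay` samples."""
    return sum(square(t, period) ^ square(t - delay, period)
               for t in range(period)) / period

def xor_pd_pi(delay1, delay2, alpha, period=1024):
    """Weighted sum of two XOR detector outputs, modeling current scaling."""
    return ((1 - alpha) * xor_pd_average(delay1, period)
            + alpha * xor_pd_average(delay2, period))

# Two adjacent phases (example delays in samples) and a 4-bit weight of 5/16
d1, d2, alpha = 192, 320, 5 / 16
combined = xor_pd_pi(d1, d2, alpha)
equivalent = xor_pd_average(int(d1 + alpha * (d2 - d1)))
print(combined, equivalent)
```

Within the linear range of the XOR characteristic, the weighted sum of the two detector outputs equals the output of a single detector driven by the interpolated phase, regardless of the edge shapes of Φ1 and Φ2.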
Compared to a three-state phase and frequency detector (PFD), the XOR phase detector
has higher gain and is also well suited for incorporating charge-pump function needed for
phase interpolation. It can also be implemented in a power-efficient manner, especially at
high reference frequencies. These characteristics allow the PLL to achieve large bandwidth
with low power consumption. Furthermore, because XOR PD locks with a phase offset of
π/2, it obviates the additional circuitry needed in a PFD to generate phase offsets to achieve
linear interpolation [25], [38]. A potential drawback of the XOR PD is its sensitivity to
the input duty cycle. While the input range is prone to duty cycle errors, it has been
demonstrated that the interpolation is immune to duty cycle errors [35].
Simulated output phase noise plot of the proposed HPC-PI based PLL is shown in
Fig. 3.10. Thanks to the proposed noise cancellation scheme, quantization error has minimal
impact on output phase noise. As a result, PLL bandwidth can be increased to as high as
FREF/20 to maximize VCO phase noise suppression. However, because of relatively poor
phase noise of the ring oscillator, even with a bandwidth of 2.5MHz (FREF/20 at FREF=
50MHz), integrated jitter is dominated by the unfiltered VCO phase noise and is about
2.6 psrms (see Fig. 3.10). Hence, further reduction in the jitter requires either lowering the
VCO phase noise or further increasing PLL bandwidth. VCO phase noise improvement
comes with a large power penalty (reducing phase noise by 3 dB doubles the VCO power
consumption). Instead, we explore the possibility of increasing PLL bandwidth. The maximum
PLL bandwidth is limited by stability considerations to about FREF/10, which results
in a trade-off between VCO phase noise suppression and the reference frequency.
Figure 3.10: Simulated phase noise plot for the proposed HPC-PI based PLL. Axes: phase noise
[dBc/Hz] versus frequency [Hz] (10^3 to 10^7). Curves: VCO open loop, VCO noise, detector
noise, ∆Σ noise, total PLL noise.
Figure 3.11: Simplified block diagram of the proposed two-stage architecture.
Figure 3.12: Simulated phase noise plot of the proposed HPC-PI based PLL architecture
with high-frequency reference clock and quantization error cancellation. Axes: phase noise
[dBc/Hz] versus frequency [Hz] (10^3 to 10^7). Curves: VCO open loop, VCO noise, REFH +
detector noise, ∆Σ noise, total PLL noise.
To decouple the trade-off between VCO phase noise suppression and the reference fre-
quency, we propose to increase the effective reference frequency by 10× by cascading a low
noise integer-N frequency multiplier with the fractional-N PLL (see Fig. 3.11). This helps
in increasing the PLL bandwidth by 10× so that VCO phase noise is suppressed adequately
without increasing input crystal oscillator frequency beyond 50MHz. Simulated phase noise
plot of this architecture, shown in Fig. 3.12, illustrates that integrated jitter can be reduced
to 1.4 psrms. The noise performance of integer-N frequency multiplier is crucial to the low-
noise operation of the proposed architecture. It is implemented using a digital multiplying
delay-locked loop to achieve low jitter as well as low power. Its implementation details are
presented in Section 3.4.
Note that higher-reference frequency also increases oversampling ratio of the ∆Σ mod-
ulator used in the fractional divider [15]. As a result, in-band phase quantization error is
reduced, which brings the need and effectiveness of the phase noise cancellation into ques-
tion. So it is instructive to evaluate the phase noise performance improvement solely because
of increasing the reference frequency in the absence of cancellation. To this end, we look
at the simulated output phase noise plot shown in Fig. 3.13 when the HPC-PI is turned
off and PLL bandwidth is readjusted for optimum noise performance. As expected, the
integrated jitter improves to 2.3 psrms when the reference frequency is increased to 500MHz.
However, this is approximately 1.7× worse than the case when HPC-PI is used under the
same conditions.
Figure 3.13: Simulated phase noise plot of the proposed HPC-PI based PLL architecture
with a high-frequency reference clock and no error cancellation. Axes: phase noise [dBc/Hz]
versus frequency [Hz] (10^3 to 10^7). Curves: VCO open loop, VCO noise, REFH + detector
noise, ∆Σ noise, total PLL noise.
The high-frequency reference generation also comes with an additional power penalty. So the
performance of the proposed two-stage architecture should also be compared to the case when
the first stage is eliminated and its power is used to improve the phase noise performance of the
VCO used in the fractional-N PLL. To this end, the phase noise performance was simulated
using a new low noise VCO (HPVCO) that consumes 2.5× more power. Figure 3.14 shows the
simulated phase noise performance with HPVCO and 50MHz reference clock. When HPC-PI
is on, integrated jitter is equal to 1.8 psrms. This indicates that HPC-PI combined with a
high-frequency reference generator yields the lowest jitter in a ring-VCO-based fractional-N
PLL compared to other possible design choices.
Figure 3.14: Simulated phase noise plot of the proposed HPC-PI-based PLL architecture
with low noise VCO and quantization error cancellation. Axes: phase noise [dBc/Hz] versus
frequency [Hz] (10^3 to 10^7). Curves: HPVCO open loop, HPVCO noise, detector noise,
∆Σ noise, total PLL noise.
3.4 Building Blocks
3.4.1 Phase Detector/Interpolator
The schematic of the XOR-based phase detector/interpolator unit is shown in Fig. 3.15. It
consists of four XOR-based phase detectors (XOR 1-4) and a 4-bit current-steering DAC to
scale the PD output currents. XOR-1 and XOR-2 are used to detect phase difference between
REFH and feedback phases (Φ1 and Φ2). XOR gates are implemented using current mode
logic with current source loads instead of resistive loads [36]. The use of current source
loads not only eases summing of output currents but also allows the integration of charge-
pump functionality into the phase detector, obviating the need for a separate charge-pump.
Current weighting is performed by varying the tail current sources using a thermometer-
coded 4-bit DAC. Two additional phase detectors XOR-3 and XOR-4 are used to steer the
tail current when XOR-1 and XOR-2 are off. A unity gain buffer is also added between the
two PMOS current sources, MP1 and MP2 to reduce the VDS mismatch between them. These
two techniques greatly improve the interpolation linearity. The output current IOUT is then
passed to the loop filter of the PLL. Figure 3.16 shows the simulated static nonlinearity of
the XOR PD-PI, which indicates an INL of 0.6 LSB and a DNL of 0.23 LSB.
Figure 3.15: Schematic of the phase detector/interpolator.
Figure 3.16: Simulated non-linearity of the phase detector/interpolator. Axes: DNL [LSB] and
INL [LSB] versus digital control word (DP), 0 to 15. Annotations: INL = 0.6 LSB, DNL = 0.23 LSB.
3.4.2 Ring-VCO
Figure 3.17: Schematic of the ring-VCO.
The schematic of the ring oscillator is shown in Fig. 3.17. It is composed of four pseudo
differential delay cells connected in ring topology. The delay cells are implemented using
two current-starved CMOS inverters coupled in a feed-forward manner using resistors for
differential operation. The frequency of the oscillator is controlled by varying the gate
voltage of the PMOS current source, MP. An AC coupled output buffer biased by a replica
inverter is used to convert low-swing delay cell output to rail-to-rail output. The simulated
spot phase noise of the ring-VCO at 1MHz frequency offset from the carrier frequency of
4.75GHz is -87 dBc/Hz.
3.4.3 Quadrature Phase Generation
Figure 3.18: Schematic of divide-by-2 quadrature phase generator.
The schematic of the quadrature phase generator is shown in Fig. 3.18. It is implemented
using a cascade of two flip-flops, SA-FF1 and SA-FF2, which provide quadrature phases, I,
IB, Q, and QB, at half the input clock frequency. To minimize phase spacing errors caused
by asymmetric low-to-high and high-to-low transitions, the flip-flops are implemented using
symmetric sense amplifier architecture [19]. The sense-amplifiers are sized to minimize the
effect of mismatch at the expense of additional power. Note that multiple phases of the
ring-VCO can also be used but maintaining their phase relationship while buffering and
routing the signals over long distances is difficult and power hungry. In contrast, using
only differential output of the VCO as proposed eliminates this issue albeit at the expense
of increased phase spacing. Thanks to the hybrid phase current interpolation technique,
increased phase spacing does not degrade the linearity of the HPC-PI.
3.4.4 Dual Modulus Divider
Figure 3.19 shows the schematic of divide-by-4/5 dual modulus divider. It consists of a
cascade of two divide-by-2/3 stages that are implemented using digital logic standard cells.
Each of the divide-by-2/3 stages consists of two active high-level and two active low-level
transparent latches (L1-L4) along with a few logic gates. By selecting the control node P the
divide-by-2/3 operation is performed. To achieve divide-by-4/5 operation only one divide-
by-2/3 stage is controlled using DIVCTRL node with the CARRY output of digital phase
accumulator while the control of the second divide-by-2/3 stage is kept at logic low. The
clock input to the dual modulus divider is provided through phase QB of the quadrature
phase generator and output of divider is synchronized with all four quadrature phases using
flip-flops. The range of this divider can be extended to achieve higher divide ratios either
by using control node P of the second divide-by-2/3 stage or by cascading more of these
stages. However, a dual modulus divider with a division ratio between 4.25 and 4.75 is sufficient
to provide an output frequency range of 500MHz between 4.25GHz and 4.75GHz because of
the high reference frequency of 500MHz and the presence of a static divide-by-2 stage (used
for quadrature phase generation) in front of the dual modulus divider.
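The frequency plan stated above can be checked with a few lines of arithmetic. The sketch below assumes the locked-loop relation output frequency = REFH × 2 × (average division ratio); the constant and function names are illustrative.

```python
F_REFH_HZ = 500e6   # high-frequency reference from the MDLL
PRESCALER = 2       # static divide-by-2 used for quadrature phase generation

def average_division_ratio(frac, n_low=4):
    """Average ratio of a divide-by-4/5 divider that divides by 5 a fraction frac of the time."""
    return n_low + frac

def output_frequency_hz(frac):
    """VCO frequency when the divided clock is locked to REFH."""
    return F_REFH_HZ * PRESCALER * average_division_ratio(frac)

f_min = output_frequency_hz(0.25)   # lower end of the fractional control range
f_max = output_frequency_hz(0.75)   # upper end of the fractional control range
```

With fractional control words between 0.25 and 0.75 the sketch reproduces the 4.25GHz to 4.75GHz range quoted in the text.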
Figure 3.19: Schematic of divide-by-4/5 dual modulus divider.
3.4.5 High-Frequency Reference Generation
Figure 3.20: Block diagram of a high-frequency reference generator digital MDLL.
A digital multiplying delay-locked loop (MDLL) is used to generate 500MHz reference
clock for the fractional-N PLL from a 50MHz crystal oscillator. Compared to a PLL, an
MDLL offers superior phase noise due to the periodic reference injection and consumes low
power due to relaxed phase noise requirements of the oscillator [11, 12, 14]. The block dia-
gram of the digital MDLL is shown in Fig. 3.20. A D flip-flop (FF) detects the sign of the
phase error between the reference clock and oscillator output and its output is integrated by
a 16-bit accumulator. The two LSBs are dropped to reduce gain of the integral path and the
rest of the 14 bits are fed to a second-order digital ∆Σ modulator. The modulator, clocked
at 125MHz, truncates 14 bits to 5 bits and drives the binary-to-thermometer (BIN2THM)
converter. The output of BIN2THM controls a 31-level thermometer coded IDAC imple-
mented using identical stages of NMOS unit current sources. The output current of the
IDAC is then converted to voltage using a resistor. A fourth-order passive low-pass filter
(LPF) is used at the IDAC output to suppress out-of-band quantization noise and generate
control voltage of the multiplexed ring-VCO. A divider along with the select logic is used for
periodic injection of the reference clock into multiplexed ring-VCO. Select logic similar to
the one reported in [13] is employed and is implemented using standard cells. It generates
the select signal, SEL, using the divider output and an intermediate delay stage output of
Figure 3.21: Schematic of multiplexed ring-VCO used in digital MDLL.
the oscillator.
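The word-length handling in the integral path (16-bit accumulator, two LSBs dropped, 14 bits truncated to 5 bits by a second-order ∆Σ modulator) can be sketched behaviorally as follows. The error-feedback structure and the function names are assumptions chosen for illustration; the text does not specify the modulator topology.

```python
def integral_path_word(acc16):
    """Drop the two LSBs of the 16-bit accumulator, leaving a 14-bit word."""
    return (acc16 & 0xFFFF) >> 2

def delta_sigma_truncate(samples, in_bits=14, out_bits=5):
    """Second-order error-feedback truncation (assumed topology), NTF = (1 - z^-1)^2."""
    step = 1 << (in_bits - out_bits)       # weight of one output LSB in input LSBs
    max_code = (1 << out_bits) - 1         # 5-bit code range 0..31
    e1 = e2 = 0                            # previous two truncation errors
    codes = []
    for x in samples:
        v = x + 2 * e1 - e2                # add shaped error from earlier samples
        y = min(max(v // step, 0), max_code)
        e2, e1 = e1, v - y * step          # update error history
        codes.append(y)
    return codes

codes = delta_sigma_truncate([5000] * 4096)
average_input_equivalent = sum(codes) / len(codes) * 512
```

For a constant mid-range input the 5-bit codes toggle between neighbouring values while their long-term average tracks the 14-bit input, which is what allows the IDAC and low-pass filter to recover the fine resolution.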
Figure 3.21 shows the schematic of the multiplexed ring-VCO used in digital MDLL.
The multiplexed ring-VCO is implemented using 32 pseudo differential delay stages and a
multiplexer. Since the oscillation frequency of the ring is relatively low, a large number of
inverter stages are used to achieve sharp rise and fall times. Fast edge transitions help min-
imize pattern jitter during reference injection. The delay cells are implemented by coupling
two current starved CMOS inverters in a feed-forward manner using resistors. A NAND-gate
based multiplexer is used for selecting between the buffered reference clock and the oscil-
lator output, REFH. The frequency of the multiplexed ring-VCO is controlled by varying
the gate voltage, VCTRL, of the PMOS current source. To minimize the impact of reference
injection on the VCO virtual supply voltage, VS, multiplexer and the delay stage driving
the multiplexer are connected to the supply voltage, VDD, instead of VS.
3.5 Experimental Results
Figure 3.22: Complete block diagram of the proposed architecture.
The block diagram of the complete two-stage fractional-N PLL is shown in Fig. 3.22. The
prototype chip is fabricated in a 65 nm CMOS process and occupies an active area of 0.48mm2.
The die micrograph is shown in Fig. 3.23. The die is packaged in a standard 60-pin QFN
plastic package and characterized using a four-layer printed circuit board. Operating from
1.0V supply the proposed clock multiplier can generate fractional frequencies over a range
of 4.25 to 4.75GHz while consuming 11.6mW of power.
The spectrum of the MDLL output measured using Agilent MXA N9020A spectrum an-
alyzer is shown in Fig. 3.24. Operating with a 50MHz reference clock provided by a crystal
oscillator, the MDLL generates a 500MHz output and the measured reference spur magni-
tude is -57 dBc. The measured phase noise plot of the MDLL output is depicted in Fig. 3.25.
The in-band phase noise at 400 kHz offset frequency is -122.8 dBc/Hz and the jitter obtained
by integrating phase noise from 2 kHz to 20MHz is 1.1 psrms.
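Integrated jitter figures such as the one above follow from the standard relation between single-sideband phase noise L(f) and RMS jitter, σ = sqrt(2 ∫ L(f) df) / (2π f0). The sketch below applies it with a simple trapezoidal rule on a linear scale; the flat profile in the usage lines is hypothetical and is not the measured data.

```python
import math

def rms_jitter_s(offsets_hz, phase_noise_dbc_hz, carrier_hz):
    """RMS jitter from SSB phase noise samples using trapezoidal integration."""
    area = 0.0
    for i in range(len(offsets_hz) - 1):
        l0 = 10 ** (phase_noise_dbc_hz[i] / 10)        # dBc/Hz to linear
        l1 = 10 ** (phase_noise_dbc_hz[i + 1] / 10)
        area += 0.5 * (l0 + l1) * (offsets_hz[i + 1] - offsets_hz[i])
    phase_rms_rad = math.sqrt(2 * area)                # factor 2: both sidebands
    return phase_rms_rad / (2 * math.pi * carrier_hz)

# Hypothetical flat -120 dBc/Hz profile from 2 kHz to 20 MHz on a 500 MHz carrier.
jitter = rms_jitter_s([2e3, 20e6], [-120.0, -120.0], 500e6)
```

A measured profile would be sampled densely (ideally on a logarithmic grid) before integration, since coarse linear trapezoids overestimate steeply falling regions.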
Measured output phase noise plots of the complete fractional-N PLL in integer- and
fractional-N modes are shown in Fig. 3.26. When operated in the integer-N mode at 4GHz
output frequency, in-band phase noise is -104.7 dBc/Hz at 400 kHz offset frequency and the
integrated (2 kHz to 20MHz) jitter is 1.42 psrms. When the mode of operation is changed from
Figure 3.23: Die micrograph of the proposed clock generator.
Figure 3.24: Measured output spectrum of the high-frequency reference generator MDLL.
integer-N to fractional-N, the in-band phase noise floor rises only by 0.9 dB to -103.8 dBc/Hz
at 400 kHz offset indicating that the quantization noise has minimal impact on the overall
Figure 3.25: Measured output phase noise plot of the MDLL used for high-frequency
reference generation.
Figure 3.26: Measured output phase noise plot of the proposed architecture in integer and
fractional-N mode.
noise performance. The integrated jitter also remains almost the same at 1.46 psrms.
The measured output phase noise plots shown in Fig. 3.27 quantify the effectiveness of
Figure 3.27: Measured output phase noise plot of the proposed architecture with HPC-PI
ON and OFF.
different steps in the HPC-PI cancellation method. Shown in the figure are output phase
noise plots in different HPC-PI states when generating a fractional output frequency with
a frequency control word corresponding to 4.75 − (2^−7 + 2^−8 + 2^−19) GHz. When the HPC-PI
is OFF, integrated jitter is 6.7 psrms (2 kHz to 20MHz) and the in-band phase noise
floor is -90.6 dBc/Hz at 400 kHz offset. When only the two MSBs of the PI are turned ON
(i.e., only the Phase Mux portion of the PI is exercised), the integrated jitter and the noise
floor reduce to 3.52 psrms and -94.5 dBc/Hz, respectively. When the HPC-PI functionality is
turned ON completely, integrated jitter is 1.46 psrms and the noise floor reduces to -103.8 dBc/Hz,
corresponding to an improvement of 13.2 dB compared to the case when the HPC-PI is
disabled.
The output spectrum of the complete fractional-N PLL is shown in Fig. 3.28. The ref-
erence spur at 50MHz is about -60 dBc and its magnitude depends on two factors: (a) the
reference spur magnitude at the output of the MDLL and (b) the bandwidth of second-stage
fractional-N PLL. The leakage of the MDLL reference spur (-57 dBc in our implementa-
tion) to the output can be reduced by lowering the fractional-N PLL bandwidth. However,
this exacerbates the impact of VCO phase noise on output jitter. As a compromise between
the suppression of the MDLL reference spur and VCO phase noise, the PLL bandwidth is
chosen to be 12MHz, which is about two times smaller than the possible maximum (recall
FREF/20=25MHz for the fractional-N PLL).
Figure 3.28: Measured output voltage spectrum of the proposed architecture at 4.75GHz.
Figure 3.29: Measured fractional spur of the proposed architecture.
Figure 3.30: Power breakdown of the proposed architecture.
With the use of a high-reference frequency, REFH, of 500MHz, along with divide-by-
2 based quadrature phase generator, the fractional frequency control word ranging from
only 0.25-0.75 can provide an output frequency range of 4.25 to 4.75GHz. As a result, the
frequency control word does not need to be close to the integer boundary, which greatly im-
proves the worst-case in-band spur performance. Figure 3.29 shows the measured fractional
spur performance at the output. With 12MHz PLL bandwidth, the in-band fractional spur
magnitude is -51 dBc.
Power breakdown of the fractional-N PLL is shown in Fig. 3.30. Operating from a 1.0V
supply, the proposed clock multiplier consumes a total power of 11.6mW, of which the MDLL
consumes 3.2mW. Performance summary and comparison of key parameters with the state-
of-the-art ring-based fractional-N PLLs is depicted in Table 3.1. For fair comparison, phase
noise is normalized to 4.75GHz. The proposed clock multiplier achieves the lowest jitter and
2× power efficiency improvement compared to state-of-the-art fractional-N ring-based PLL
designs. It also achieves the best figure of merit (FoM) of -225.8 dB, defined as a metric for
jitter/power efficiency, among all ring oscillator-based fractional-N PLLs.
Table 3.1: Performance Comparison of the Proposed Clock Multiplier with
State-of-the-Art Designs.
Parameter                                 This Work        Kao [31]         Park [39]        Jee [40]         Chen [41]        Yu [42]
Publication                               -                ISSCC’13         JSSC’12          JSSC’12          JSSC’10          JSSC’09
Technology                                65nm             90nm             130nm            130nm            65nm             0.18µm
VCO Topology                              Ring             Ring             Ring             Ring             Ring             Ring
Supply Voltage [V]                        1.0              2.5/1.2          N/A              N/A              1.1-1.3          1.8
Area [mm2]                                0.48             0.055            0.17             0.35             0.027            N/A
Output Frequency [GHz]                    4.25-4.75        1.872-1.896      2.4              1                0.6-0.8          0.17-1.25
Reference Frequency [MHz]                 50               26               32               32               26 (2-40)        24
Integrated Jitter [psrms]                 1.5              3.4              N/A              N/A              21.5             17.3
Reference Spur [dBc]                      -60              -67              -60              -67              -52              N/A
In-band Phase Noise [dBc/Hz]              -103.8 @400kHz   -100 @400kHz     -97 @100kHz      -106 @100kHz     -93 @1kHz        N/A
Normalized Phase Noise @4.75GHz [dBc/Hz]  -103.8 @400kHz   -100 @400kHz     -97 @100kHz      -106 @100kHz     -93 @1kHz        N/A
Power Consumption [mW]                    11.6             10               15.2             16.8             2.9-3.5          6.1
Power Efficiency [mW/GHz]                 2.4              5.24             6.33             16.8             26.8             13.8
FoM* [dB]                                 -225.8           -219.4           -216             N/A              -207.9           -207.3
*$\mathrm{FoM} = 10\log_{10}\left[\left(\frac{\sigma_t}{1\,\mathrm{sec}}\right)^{2}\cdot\frac{P_{\mathrm{PLL}}}{1\,\mathrm{mW}}\right]$
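As a quick numerical check of the FoM definition in the table footnote, the sketch below evaluates it for the jitter and power values listed in Table 3.1. The function name is illustrative.

```python
import math

def pll_fom_db(jitter_rms_s, power_w):
    """FoM = 10*log10[(sigma_t / 1 s)^2 * (P_PLL / 1 mW)]."""
    return 10 * math.log10((jitter_rms_s / 1.0) ** 2 * (power_w / 1e-3))

fom_this_work = pll_fom_db(1.5e-12, 11.6e-3)   # 1.5 psrms, 11.6 mW
fom_kao = pll_fom_db(3.4e-12, 10e-3)           # 3.4 psrms, 10 mW
```

Both values land within 0.1 dB of the tabulated -225.8 dB and -219.4 dB.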
CHAPTER 4
FLEXIBLE CLOCKING SCHEME BASED WIDE
DATA-RATE TRANSCEIVER
4.1 Introduction
Figure 4.1: Block diagram of an FPGA-based system with chip-to-chip serial links.
Chip-to-chip serial link transceivers that can operate across a wide range of data-rates
offer a great deal of flexibility in optimizing the performance of large compute systems. Such
flexible transceivers are especially desirable in FPGAs as shown in Fig. 4.1, where they can
support different wireline communication standards to communicate with different peripheral
chips with a single solution [43, 44]. However, designing such wide range transceivers is
a challenging task due to the following reasons. First, signaling circuits such as TX drivers,
equalizers, and samplers that operate over a wide range of data-rates will incur hardware
redundancy, which increases area and degrades power efficiency. As a result, compared to
transceivers that are optimized to operate at one single data-rate, flexible-rate transceivers
are power and area hungry. These challenges are addressed in [43].
These transceivers also require clock generation, distribution, and recovery architectures
that can support flexible rate operation. Because a single PLL cannot generate clocks across
the entire interface operating range, [43, 44] use multiple LC tanks, carefully optimized
waveform shaping circuits, power hungry clock distribution, and complex frequency plan-
ning methods. While it is possible to achieve wide range operation using classical clocking
techniques, such an approach will again result in large area and power efficiency penalty.
In this work, we present a clocking strategy for efficient clock generation, recovery, and
distribution in flexible-rate transceivers. Using a fixed-frequency low-jitter clock provided
by an integer-N PLL, fractional frequencies are generated/recovered locally using multi-
phase fractional-N clock multipliers, which offers the following advantages. First, clock
distribution power is reduced because only a single-phase fixed-frequency clock is needed.
Second, it can provide multiple phases with infinite phase shifting capability across a wide
frequency range without using phase interpolators. Finally, by maximizing the bandwidth
of a local ring-based clock multiplier, it achieves jitter performance comparable to that of
LC-based multipliers. Fabricated in a 65 nm CMOS process, the prototype transceiver can
be programmed to operate at any rate from 3-to-10Gb/s. At 10Gb/s, integrated jitter of
the transmitter output and recovered clock is 360 fsrms and 758 fsrms, respectively.
The rest of this chapter is organized as follows. Prior art on different clocking schemes
for flexible rate transceivers is briefly discussed in Section 4.2 with the goal of motivating
the proposed clocking architecture presented in Section 4.3. The circuit implementation
details are described in Section 4.4. The summary and discussion of the measured results
are presented in Section 4.5.
4.2 Prior Art
Figure 4.2 shows a commonly used clock distribution architecture in which a master clock
generator is shared among multiple data lanes [45, 46]. To provide wide tuning range of at
least one octave, typically a dual LC tank type structure or two independent clock generators
are used. Because all the lanes share the same master clock, they must operate at nominally
the same data-rate. In other words, this architecture is not suitable if independent lane
control is desired.
Figure 4.2: Block diagram of a conventional clock distribution architecture.
On the receiver side, Fig. 4.3 shows the block diagram of a conventional phase
interpolator (PI) based clock and data recovery (CDR) circuit.
Figure 4.3: Block diagram of a conventional phase-interpolator-based clock and data
recovery circuit.
In this PI-based architecture, multiple phases are generated from a high-frequency clock that is
typically shared with the transmitter (TxCLK). A digital control loop consisting of a proportional
and an integral path, denoted as KP and KI, respectively, drives the PI, which operates on these
multiple phases, through an accumulator, ACCP, to generate the recovered clock, RCLK. In the
case of sub-rate operation as shown in Fig. 4.3, multiple PIs are used to generate multiple phases
of the recovered clock. There are two major power hungry operations in this architecture:
First, multi-phase clock generation and distribution with precise phase spacing and low
Figure 4.4: Block diagram of wide data-rate transceiver with multi-phase clock routing
scheme.
Figure 4.5: Block diagram of wide data-rate transceiver with high-frequency single-phase
clock routing scheme.
jitter presents a direct trade-off between power and performance. This trade-off worsens if
multiple phases are to be routed to multiple lanes. Due to the wide range operation, it also
precludes the use of low power clock distribution techniques such as resonant clocking. The
second issue is the use of multiple high-speed phase interpolators which also incur a large
area and power penalty as described earlier in Chapter 3.
There have been several attempts made to address the issues described with clock distribu-
tion and recovery. In the approach depicted in Fig. 4.4, multiple LC-based clock generators
are used to simultaneously meet low jitter, wide range, and independent lane control [43].
Quadrature clocks are distributed to each lane to perform clock recovery.
In the second approach shown in Fig. 4.5, only a single-phase clock is distributed and
quadrature phases are generated locally using a delay-locked loop (DLL) or a quadrature
injection-locked oscillator (ILO) [47]. While this architecture saves some power by rout-
ing only one phase, it cannot provide independent per lane data-rate control as the clock
generator is shared among all the lanes.
The third approach as shown in Fig. 4.6 routes a low-frequency clock and uses a local PLL
or multiplying delay locked loop (MDLL) to perform frequency multiplication and generate
multiple phases [48]. By reducing the distributed clock frequency to below a GHz range
Figure 4.6: Block diagram of wide data-rate transceiver with low-frequency single-phase
clock routing scheme.
the clock distribution power can be significantly reduced. However, similar to the previous
architecture, it cannot provide the data-rate control for each lane independently. In view of
these drawbacks, we present a flexible clocking architecture that seeks to locally generate
fractional frequencies needed for flexible-rate operation and also provides independent per
lane data-rate control.
4.3 Proposed Architecture
Figure 4.7: Block diagram of proposed transceiver with flexible clocking scheme.
In the proposed approach we use a single-phase, fixed-frequency clock distribution as
shown in Fig. 4.7 [49]. Only a single-phase low-jitter clock generated by a fixed-frequency
LC clock multiplier is routed to all the channels.
A truly fractional divider is used to generate fractional frequencies from the fixed-frequency
clock based on independent data-rate control code shown as DFCW in Fig. 4.8. A ring oscil-
lator based integer-N MDLL uses this fractional frequency clock as reference and generates
high-frequency multi-phase clocks needed for half-rate operation of the transmitter and re-
ceiver. By using this architecture, a wide operating range with independent lane control
can be achieved using only a single LC tank and significantly reduced clock distribution
power. Some key design details of this architecture are described next starting with a truly
fractional divider.
Figure 4.8: Block diagram of proposed multi-phase fractional clock generation unit.
Figure 4.9: Block diagram of a conventional multi-modulus fractional divider.
Figure 4.9 shows the block diagram of a ∆Σ based multi-modulus fractional divider. It
consists of a digital ∆Σ modulator that truncates the digital frequency control word DFCW,F
and controls the division ratio of a multi-modulus divider (MMD). The timing diagram
shown in Fig. 4.10 illustrates a division operation by 4.25. The multi-modulus divider
divides by 4 three times and then divides by 5 once, so the average division ratio is
(3 × 4 + 5)/4 = 4.25.
Figure 4.10: Timing waveforms of the conventional multi-modulus fractional divider
showing deterministic jitter accumulation for division ratio of 4.25.
However, due to open-loop behavior, the truncation error (eq) introduced by the delta-sigma
modulator is not filtered and it directly appears as a phase error. The resulting jitter is
deterministic and can be as large as one input period (THF). By comparing the output clock
to an ideal clock, 0.25THF deterministic jitter (DJ) appears in the first cycle and accumulates
to 0.75THF by the third cycle. In the fourth cycle, the output clock aligns with the ideal
clock. In this example, the DJ pattern repeats every four cycles and the maximum DJ is
0.75THF. This jitter is directly related to the ∆Σ truncation error (eq) as shown in Eq. (4.1).
DJ = −eq × THF (4.1)
In order to cancel the deterministic jitter due to delta-sigma quantization noise, a digitally
controlled delay line (DCDL) is inserted to provide the necessary phase shift as shown in
Fig. 4.11. The phase shift provided by the DCDL, TQNC, should be equal in magnitude and
opposite in sign to the deterministic jitter as shown in Eq. (4.2).
TQNC = eq × THF (4.2)
The timing diagram depicted in Fig. 4.12 shows the phase shift added by the DCDL in
yellow. The output clock now matches the ideal clock, and the DJ is completely canceled.
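The divide-by-4.25 example and its cancellation can be reproduced with a short behavioral model. The sketch assumes a first-order accumulator as the modulator, takes eq to be the accumulator residue, and expresses all times in units of THF; the function name and these modeling choices are illustrative.

```python
from fractions import Fraction

def simulate_divider(n_int, frac, cycles, cancel=False):
    """Return (division ratios, output edge error vs. ideal clock) per output cycle."""
    acc = Fraction(0)                    # accumulated truncation error e_q
    edge = Fraction(0)                   # MMD output edge time, in units of T_HF
    ratios, errors = [], []
    for k in range(1, cycles + 1):
        acc += frac
        carry = int(acc >= 1)            # carry selects divide-by-(n_int + 1)
        acc -= carry
        edge += n_int + carry
        t_qnc = acc if cancel else Fraction(0)   # Eq. (4.2): T_QNC = e_q * T_HF
        ideal = k * (n_int + frac)
        ratios.append(n_int + carry)
        errors.append(edge + t_qnc - ideal)      # Eq. (4.1) when cancel is False
    return ratios, errors

ratios, dj = simulate_divider(4, Fraction(1, 4), 4)
_, dj_cancelled = simulate_divider(4, Fraction(1, 4), 4, cancel=True)
```

Without cancellation the edges are early by 0.25, 0.5 and 0.75 THF before realigning in the fourth cycle; with the DCDL delay added, every edge coincides with the ideal clock.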
Figure 4.11: Block diagram of a multi-modulus fractional divider with quantization error
cancellation.
Figure 4.12: Timing waveforms of the fractional divider with quantization error
cancellation.
For perfect error cancellation, the DCDL delay range should be exactly equal to one clock
period of the input clock, as denoted in Eq. (4.3).
{0 : DCW[max]} × KDCDL ⇒ 0 : TCLK,HF (4.3)
In other words, the digital control range of the delay line must map to exactly one input clock
period. However, in practice, it is difficult to ensure this condition due to PVT variations.
For example, as illustrated by the transfer characteristic of a DCDL in Fig. 4.13, the delay
range exceeds one input clock period. The DCDL control code mapping for this scenario can
be represented as Eq. (4.4).
{0 : DCW[max]} × KDCDL ⇏ 0 : TCLK,HF (4.4)
Figure 4.13: Gain characteristics plot for a practical DCDL.
Therefore, in practice, the DCDL range has to be calibrated to be equal to THF. Figure 4.14
shows the block diagram of the fractional divider with DCDL calibration. Here a calibration
unit takes the MMD output clock, DCDL output clock and quantization error, eq, as inputs
and generates a scaling factor to calibrate the delay line. The calibration is achieved by scaling
the maximum range of the DCDL by an appropriately chosen scaling factor KD as illustrated
by the red line in Fig. 4.15.
Figure 4.14: Block diagram of the fractional divider with DCDL gain calibration.
Figure 4.15: Gain characteristics plot for an ideal DCDL with gain calibration.
The scaling operation corresponds to the factor KD in Eq. (4.5).
{0 : DCW[max]} × KD × KDCDL ⇒ 0 : TCLK,HF (4.5)
For practical implementation, this scaling can be performed completely in the digital
domain simply by scaling the input as illustrated in Fig. 4.16 [32]. However, this requires
a wide bit width digital multiplier operating at the divider output frequency. Such a multiplier
consumes a large amount of power, especially when the output frequency is high, on the order
of several hundred MHz.
Figure 4.16: Implementation of the fractional divider with DCDL gain calibration with
digital scaling.
In view of these drawbacks, we propose an alternative DCDL calibration scheme in
which the delay line gain KDCDL is scaled by a factor of KD such that the full scale range of the
DCDL spans one input period. This scaling is achieved by changing the supply voltage of
the DCDL as shown in Fig. 4.17. The scaling operation can be understood by looking at the
DCDL gain characteristics shown in Fig. 4.18. When the power supply of the DCDL is
increased, the effective delay range of the DCDL decreases. By tuning the DCDL supply
voltage appropriately, the DCDL full scale range can be mapped to one input clock period
as highlighted by the red plot.
This supply-based scaling operation is represented by the factor KD applied to KDCDL in Eq. (4.6).
{0 : DCW[max]} × KD × KDCDL ⇒ 0 : TCLK,HF (4.6)
Figure 4.17: Block diagram of power supply based alternate DCDL gain calibration for the
fractional divider.
Figure 4.18: Effect of power supply scaling on DCDL gain characteristics.
Figure 4.19 shows the block diagram of the proposed gain calibration scheme. A low
dropout regulator (LDO) is used to control the supply voltage of the DCDL. By scaling the
Figure 4.19: Block diagram of the proposed DCDL gain calibration scheme.
reference voltage of the LDO, VREF, the DCDL supply is varied which changes the DCDL
gain. The LDO reference is generated using a ∆Σ DAC which takes a digital control word,
KD,DIGITAL, and converts it into corresponding analog voltage, VREF. The digital scaling
code is generated in the background using an LMS-based correlator which correlates the
residual quantization error by multiplying a single bit MDLL phase detector output and
DCDL delay code.
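The behavior of such a correlator can be illustrated with a small behavioral model. The sketch below is not the implemented circuit: it assumes a normalized, mean-removed delay code, a residual error proportional to the gain mismatch plus Gaussian noise, and a hypothetical step size mu. The sign conventions are also assumptions.

```python
import random


def lms_gain_calibration(true_gain=0.70, mu=2e-4, steps=20000, seed=1):
    """Sign-based LMS loop that drives the gain estimate toward true_gain."""
    rng = random.Random(seed)
    gain = 1.0                                   # initial (uncalibrated) gain estimate
    for _ in range(steps):
        code = rng.random() - 0.5                # normalized, mean-removed DCDL code (assumed)
        # Residual timing error: vanishes on average when gain == true_gain.
        residual = (gain - true_gain) * code + rng.gauss(0.0, 0.01)
        pd_out = 1 if residual > 0 else -1       # single-bit phase detector decision
        gain -= mu * pd_out * code               # correlate with the code and accumulate
    return gain


calibrated_gain = lms_gain_calibration()
print(f"calibrated gain = {calibrated_gain:.3f}")
```

Because only the sign of the residual error is used, the update needs no multi-bit multiplier; the accumulator output plays the role of the digital scaling code.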
Since the DCDL gain is sensitive to LDO load variations, a dummy DCDL driven with the
complementary code is used to maintain a constant LDO load, as shown in Fig. 4.20. This
dummy DCDL reduces the LDO load dependency, which improves the DCDL linearity and
spurious performance. Compared to the digital calibration in [32], this approach
eliminates the need for a high-frequency multiplier at the input of the DCDL and is also
insensitive to DCDL offset delay variations.
Figure 4.21 shows the complete block diagram of the proposed multi-phase
fractional-N clock generator. A multi-modulus divider followed by a DCDL, used
for fractional clock generation, is cascaded with a ring-based MDLL to perform frequency
multiplication and multi-phase generation.
Figure 4.20: Complete block diagram of the proposed DCDL gain calibration scheme with
a dummy delay line.
Figure 4.21: Block diagram of the proposed multi-phase fractional-N clock generator.
[Diagram labels: CKHF, MMD, CKMMD, DCDL, CKLF, gain calibration (VDCDL, ERR[k]), ∆Σ1, ∆Σ2,
DFCW (6-bit DFCW,INT, 20-bit DFCW,FRAC), DIV[k], DCW[k], ring-based integer-N MDLL (x8),
CKMDLL.]
Figure 4.22: Block diagram of the CDR logic implementation.
[Diagram labels: DataIN, RX front-end (CTLE, VGA), frequency detector, KP,CDR, KI,CDR, ACC,
∆Σ1, ∆Σ2, DFCW,INT, DFCW,FRAC, DIV[k], DCW[k], Tx/Rx select, CKMDLL.]
A second-order ∆Σ modulator truncates the 20-bit fractional control word, DFCW,FRAC,
to 10 bits. This 10-bit truncated output is then re-quantized to three levels using a first-
order ∆Σ modulator implemented using an accumulator. The three-level carry and 10-bit
sum outputs of the accumulator represent the frequency control information and the
accumulated truncation error of the modulator, respectively. The carry output is added to the
integer part, DFCW,INT, and used to control the MMD portion of the fractional divider, while
the 10-bit sum output is used to cancel the ∆Σ truncation error using the DCDL. This two-step
truncation and cancellation method provides higher-order modulation while avoiding -1 to 1
and 1 to -1 jumps, which limits the needed DCDL range to one cycle of CKHF, independent of
the output frequency/data rate. This restricted range eases the design of a DCDL with good
integral nonlinearity (INL). As described earlier, an LMS-based gain calibration method
controls the DCDL range in the background so that the truncation error is cancelled
accurately and the MMD+DCDL combination behaves as a true fractional divider.
Consequently, explicit filtering of the truncation error is unnecessary and the MDLL can be
optimized for best random jitter performance.
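The cancellation principle can be checked with a simplified behavioral sketch. It models only the first-order accumulator stage with a constant, non-negative 10-bit input, so the carry takes the values 0 and 1 rather than three levels; the second-order stage, the gain calibration, and all circuit non-idealities are omitted. The function name and the example divide ratio are illustrative assumptions.

```python
FRAC_BITS = 10
MOD = 1 << FRAC_BITS


def fractional_divider_edges(n_int, frac_code, cycles):
    """Return (mmd_edges, corrected_edges) in units of the CKHF period."""
    acc = 0                     # accumulator sum = accumulated truncation error
    t_mmd = 0                   # MMD output edge position, in whole CKHF cycles
    mmd_edges, corrected_edges = [], []
    for _ in range(cycles):
        acc += frac_code
        carry = acc >> FRAC_BITS            # 0 or 1 for a non-negative input
        acc &= MOD - 1                      # keep the 10-bit sum
        t_mmd += n_int + carry              # MMD divides by N or N + 1
        mmd_edges.append(t_mmd)
        # The DCDL delays the edge by sum / 2^10 of one CKHF period.
        corrected_edges.append(t_mmd + acc / MOD)
    return mmd_edges, corrected_edges


mmd, corrected = fractional_divider_edges(n_int=11, frac_code=205, cycles=2000)
ideal = [(k + 1) * (11 + 205 / MOD) for k in range(2000)]
worst = max(abs(c - i) for c, i in zip(corrected, ideal))
print(f"worst-case deviation from the ideal edge: {worst:.3e} CKHF periods")
```

In this idealized model the corrected edges coincide with those of an ideal divide-by-(11 + 205/1024) divider, and the applied delay never exceeds one CKHF period.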
Figure 4.23: Complete block diagram of the proposed CDR clocking scheme.
Figure 4.22 shows the block diagram of the CDR logic used on the receiver side. The
CDR consists of an analog front-end followed by four charge-based samplers operating at
half-rate, which provide data (D) and edge (E) samples, and a 1:16 deserializer. The samplers
and deserializer are clocked using the MDLL output clock phases. The early/late signals
derived from the deserialized sampler outputs are used to generate the proportional and
integral path control signals for the CDR loop, shown as KP,CDR and KI,CDR. The phase
detection and the digital proportional-integral (PI) loop filter operate at 1/16th of the
data rate by decimating the D/E samples by a factor of 8. The CDR integral control word is
added to the fractional control word from the frequency detector and fed to the second-order
∆Σ modulator. Because the ∆Σ modulator delay adds about 32 UI of loop latency, it is bypassed
in the proportional path, and the phase error is added to the input of the first-order ∆Σ
modulator, as highlighted in blue in the figure. This reduced loop latency helps improve
the CDR jitter tolerance. Two MUXs, highlighted in red, are added to configure the TX/RX
modes of operation.
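A minimal behavioral sketch of the early/late decision and the PI loop filter is given below. It assumes a conventional bang-bang (Alexander-style) decision rule and arbitrary integer gains; the vote polarity, the class and function names, and the gain values are assumptions and are not taken from the implemented logic.

```python
def early_late_votes(data, edge):
    """Sum bang-bang votes over one block of samples.

    edge[i] is the sample taken between data[i] and data[i + 1].
    Assumed polarity: +1 when the clock is late, -1 when it is early.
    """
    vote = 0
    for i in range(len(data) - 1):
        if data[i] != data[i + 1]:                    # only transitions carry phase information
            vote += 1 if edge[i] == data[i + 1] else -1
    return vote


class PiLoopFilter:
    """Digital PI filter: the proportional term bypasses the second-order modulator."""

    def __init__(self, kp, ki):
        self.kp, self.ki = kp, ki
        self.integral = 0

    def update(self, vote):
        sign = (vote > 0) - (vote < 0)                # decimated early/late decision
        self.integral += self.ki * sign               # added to the fractional control word
        return self.kp * sign, self.integral          # (proportional term, integral word)


loop = PiLoopFilter(kp=4, ki=1)
print(loop.update(early_late_votes([0, 1, 0, 1], [1, 0, 1])))
```

The two return values correspond to the two injection points described above: the integral word feeds the second-order ∆Σ modulator, while the proportional term is applied at the input of the first-order modulator.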
Combining this CDR logic with the true fractional divider described earlier, Fig. 4.23
shows the complete block diagram of the clock generation and recovery circuit. In the CDR
mode, the MMD+DCDL+MDLL combination performs the function of the high-resolution DCO in a
conventional CDR. Since the MDLL phase noise is independent of the CDR loop bandwidth, this
architecture alleviates the jitter generation vs. jitter transfer trade-off present in
conventional CDRs.
Figure 4.24: Block diagram of the proposed transmitter with flexible clocking scheme.
Figure 4.25: Block diagram of the proposed receiver with flexible clocking scheme.
Figure 4.24 shows the conceptual block diagram of the proposed transmitter. In a multi-
lane interface, both the transmitter and receiver in every lane share the low-noise LC-based
clock generator, which provides a fixed 7GHz frequency clock (CKHF) generated from a
125MHz reference clock. On the transmitter side, CKHF is fed to a true fractional divider
(FDIV) implemented with an integer multi-modulus divider (MMD) followed by a fixed-
range digitally controlled delay line (DCDL). FDIV output, CKLF, is used as the input to
the local MDLL that generates the transmitter clock, CKMDLL. The MDLL, as opposed to a
PLL, offers: (1) superior phase noise suppression of the ring oscillator, and (2) reduced CDR
loop latency on the receiver side, which lowers jitter peaking and improves jitter tolerance
(JTOL). The CKMDLL frequency is set by digital control word, DFCW, and can take any
value within the MDLL operating range. In the prototype, DFCW is 26 bits wide, which
provides a resolution of 8 ppm from 1.5-6GHz.
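To make the role of the control word concrete, the sketch below encodes a target CKMDLL frequency into integer and fractional fields. It assumes that the output frequency equals 8 x f(CKHF) divided by the programmed divide ratio, with a 6-bit integer and 20-bit fractional split; the exact word format of the prototype is not specified here, so the mapping and the function names are illustrative assumptions.

```python
F_CKHF = 7e9        # fixed high-frequency clock shared by all lanes
MDLL_MULT = 8       # multiplication factor of the ring-based integer-N MDLL
INT_BITS = 6        # assumed width of the integer field
FRAC_BITS = 20      # assumed width of the fractional field


def encode_fcw(f_out):
    """Return (integer, fraction) fields of the divide ratio for a target f_out."""
    ratio = F_CKHF * MDLL_MULT / f_out
    word = round(ratio * (1 << FRAC_BITS))
    n_int, n_frac = word >> FRAC_BITS, word & ((1 << FRAC_BITS) - 1)
    if n_int >= (1 << INT_BITS):
        raise ValueError("divide ratio does not fit in the integer field")
    return n_int, n_frac


def decode_fcw(n_int, n_frac):
    """Return the output frequency produced by the given control word."""
    return F_CKHF * MDLL_MULT / (n_int + n_frac / (1 << FRAC_BITS))


n_int, n_frac = encode_fcw(5e9)
print(n_int, n_frac, decode_fcw(n_int, n_frac))
```

Under these assumptions, a 5GHz output corresponds to a divide ratio of 11.2, and the 1.5GHz lower limit corresponds to a ratio of about 37.3, which still fits in a 6-bit integer field.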
On the receiver side, the clock is recovered from the incoming data by a digital clock
recovery unit using the FDIV+MDLL as the digitally controlled oscillator (DCO) as shown
in Fig. 4.25. Because CKHF can be generated with low jitter (160 fsrms), all the transceiver
jitter metrics are dictated by the performance of FDIV+MDLL combination.
4.4 Building Blocks
In this section, implementation details of a few key building blocks are described, starting
with the MDLL.
4.4.1 Multiplying Delay Locked Loop (MDLL)
Figure 4.26: Schematic of the multiplying delay locked loop for multi-phase generation.
[Diagram labels: multiplexed ring oscillator with input mux and dummy mux; selection logic
(SEL); BBPD; integral path (KI, ACCI); P-RDAC and N-RDAC (DCTRL[7:0], VCTRL, VDDVCO);
auto-deskew loop (KDS, ACCDS, DCWDS[7:0], DCDL); PRBS15 gating; multi-modulus divider
(÷4~16) built from ÷2/3 cells with DIV control; inputs CKLF,P and CKLF,N; outputs CKMDLL,I,
CKMDLL,IB, CKMDLL,Q, CKMDLL,QB.]
Figure 4.26 shows the detailed schematic of the MDLL. A four-stage multiplexed ring
oscillator is used for multi-phase generation. To improve the quadrature accuracy, a dummy
MUX is also employed in the ring oscillator. An auto-deskew loop using a small DCDL is
employed to reduce the static phase offset in the MDLL [14,50]. At a 0.9V supply, the MDLL
has a tuning range from 3.4GHz to 6.1GHz; by reducing the supply to 0.7V, the tuning
range can be extended to 1.5GHz to 6.1GHz.
4.4.2 DCDL Calibration Circuit
Figure 4.27 shows the DCDL calibration circuitry. A second-order digital ∆Σ is used to
truncate 14-bit correlator output to 5 bits followed by a 5-bit thermometer DAC and RC-
based low-pass filter to generate the LDO reference voltage.
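The truncation step can be illustrated with a behavioral model. The sketch below uses an error-feedback structure with a (1 - z^-1)^2 noise transfer function, which is one common way to realize a second-order digital ∆Σ modulator; the topology actually used in the prototype is not stated, and saturation handling is omitted, so this is an assumption-laden illustration rather than the implemented design.

```python
IN_BITS, OUT_BITS = 14, 5
SHIFT = IN_BITS - OUT_BITS              # 9 LSBs are discarded on every sample


def delta_sigma_truncate(x, n_samples):
    """Truncate a constant 14-bit input x to a stream of coarse codes."""
    e1 = e2 = 0                         # truncation errors of the two previous samples
    codes = []
    for _ in range(n_samples):
        v = x + 2 * e1 - e2             # error feedback shapes the noise by (1 - z^-1)^2
        y = v >> SHIFT                  # coarse code driving the thermometer DAC
        e2, e1 = e1, v - (y << SHIFT)   # store the LSBs that were dropped
        codes.append(y)                 # no clamping: input is assumed to be mid-range
    return codes


codes = delta_sigma_truncate(x=8000, n_samples=4096)
average = sum(codes) / len(codes) * (1 << SHIFT)
print(f"average of the truncated stream, rescaled: {average:.3f}")
```

Although each output code carries only 5 bits, the average of the stream tracks the 14-bit input closely, which is why a simple RC low-pass filter after the DAC can recover a finely resolved LDO reference voltage.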
Figure 4.27: Implementation details of the proposed DCDL gain calibration scheme.
4.4.3 Digitally Controlled Delay Line (DCDL)
Figure 4.28: Schematic of the DCDL used in quantization error cancellation.
[Diagram label: 128 units.]
Figure 4.28 shows the DCDL schematic [51]. The 10-bit digital control is segmented into
four cascaded delay stages. This segmented delay distribution approach helps to improve
the DCDL linearity performance. It also helps in maintaining fast rise and fall edges for
the clock signal. The fast edges improve the effective high-to-low and low-to-high transition
time and reduce the DCDL sensitivity to noise and mismatch.
4.4.4 Receiver Front-End
Figure 4.29: Detailed schematic of receiver front-end.
Figure 4.29 shows the detailed schematic of the receiver front-end. Incoming data is ac-
coupled and terminated with a 50Ω impedance to minimize reflections. It is then passed
through the analog front-end, consisting of a CTLE and a variable gain stage, which
drives four front-end samplers for half-rate operation. To minimize the power consumption
of the front-end, a charge-based sense amplifier (CSA) and limited-swing sample-and-hold
circuits (SHC) are used in the sampler design, similar to [48,52]. The sampler output is
passed through a 2:8 deserializer stage while maintaining low swing. This deserialized
output is then passed to the final 8:16 stage, which provides the full-swing data and edge
outputs at 1/16th of the data rate.
Figure 4.30: Schematic design of injection locking based high-frequency clock generator.
4.4.5 High-Frequency Clock Generator
Figure 4.30 shows the schematic of the high-frequency clock generator. Using a reference
clock, FREF, of 109MHz with a fixed multiplication ratio of 64, it provides a clean 7GHz
output clock that is used as the reference for the fractional dividers. To achieve better
power efficiency, an injection-locked approach is used instead of a traditional phase-locking
technique. To avoid PVT-induced frequency drift during operation, a dual pulse gating
architecture similar to [53] is used. It disables the reference injection when measuring the
frequency error, thus providing a drift-free, low-jitter clock solution.
4.4.6 LC Oscillator
Figure 4.31: Cross-coupled DCO schematic used in high-frequency clock generator.
Figure 4.31 shows the schematic of the LC oscillator used in the injection-locked clock
multiplier. A cross-coupled LC tank architecture is used with a single-turn inductor
implemented in ultra-thick metal. This combination provides a high-quality-factor inductance,
resulting in low phase noise. Frequency is controlled using capacitor banks with 8-bit coarse
and 8-bit fine control. While the coarse bank is controlled externally, the fine bank
is controlled by the injection-locking loop accumulator. An NMOS switch is also added
between the two output nodes to enable pulse injection [53].
4.5 Experimental Results
Figure 4.32: Die micrograph of the proposed flexible-rate transceiver.
The prototype transceiver is fabricated in a 65 nm CMOS process and assembled in an 88-
pin QFN package. Figure 4.32 shows the die micrograph of the prototype chip. It occupies
an active area of 1.04mm2, of which the LC clock generator occupies 0.33mm2. The entire
chip operates from 1.0V and 0.9V supplies and consumes about 57.5mW of power per lane.
Figure 4.33 shows the clocking performance of the overall architecture at 10Gb/s. An LC
DCO-based injection-locked clock multiplier provides the 7GHz clock using a reference clock,
FREF, of 109MHz with a fixed multiplication ratio of 64. The integrated jitter
(10 kHz-100MHz) of the 7GHz integer LC clock generator is measured to be 160 fsrms. This
high-frequency clock is then passed through the fractional divider + MDLL to provide the
fractional clock at 5GHz with 360 fsrms jitter. The spurs seen on the 5GHz clock are due to
coupling between the clocking path and the digital supply. The spurs are not observed when
the MDLL is tested with an independent reference clock. In this case, the MDLL jitter is
further reduced to 249 fsrms, as shown in Fig. 4.34.
Figure 4.33: Measured phase noise plot for the proposed clocking scheme.
Figure 4.35 shows the measured transmit eye diagram at 10Gb/s with the proposed clocking
scheme. The transmitter has 420mV vertical and 0.71UI horizontal eye opening. The
statistical BER bathtub curve shown in Fig. 4.36 indicates that the phase margin for
BER < 10^-12 is 0.53UI.
Figure 4.37 shows the measured receiver jitter tolerance (JTOL) plot with and without
100 ppm frequency error between transmitter and receiver. The proposed receiver achieves
a JTOL corner of 15MHz and a high-frequency JTOL of 0.5 UIpk-pk for BER < 10^-12.
The JTOL corner can be programmed up to 1MHz by scaling the CDR proportional and
integral path gains.
Figure 4.38 shows the phase noise plot for the recovered clock at 10Gb/s. With PRBS-31
data, the measured recovered clock jitter is 760 fsrms. At a 0.9V supply, the MDLL has a
tuning range from 3.4GHz to 6.1GHz; by reducing the supply to 0.7V, the tuning range can
be extended to 1.5GHz to 6.1GHz.
Figure 4.34: Measured phase noise plot for the proposed clocking scheme using external
reference clock.
Figure 4.35: Measured transmitter eye diagram at 10Gb/s.
Figure 4.36: Measured transmitter statistical BER bathtub plot at 10Gb/s.
Figure 4.37: Measured receiver jitter tolerance plot at 10Gb/s.
Figure 4.38: Measured recovered clock phase noise plot.
Figure 4.39: Detailed power breakdown of the proposed transceiver.
Table 4.1: Performance Summary Comparison of the Proposed Flexible Clocking
Transceiver with State-of-the-Art Designs.
                                     This Work       Savoj et al.    Balan et al.     Frans et al.
                                                     CICC’12 [44]    ISSCC’14 [54]    JSSC’15 [43]
Technology [nm]                      65              28              28               20
Clocking Architecture                LC+Ring         LC, Ring        LC               Ring
Supply Voltage [V]                   0.9, 1.0        1, 1.2          0.9, 1.35        0.95, 1.0, 1.2
Area [mm2]                           1.05            3.28            2.66             4.36
                                     (1 Lane + LC)   (1 Quad)        (16 Lane + LC)   (1 Quad)
Data Rate [Gb/s]                     3-10            0.6-13.1        14-20            0.5-16.3
Clocking Integrated Jitter [fsrms]   360             399             420              352/621
Power Consumption [mW/Channel]       57.5            285             130              278
Power Efficiency [mW/Gbps]           5.75            21.75           6.5              17.2
The total transceiver power consumption at 10Gb/s with BER < 10^-12 is 57.5mW, of
which the transmitter, receiver, and LC clock generator consume 27.2mW, 26.6mW, and 3.7mW,
respectively. A detailed power breakdown is shown in Fig. 4.39. The pie chart indicates that
only 7% of the total power is used in each fractional divider, while the MDLL consumes only
9% of the total power. Table 4.1 shows the performance summary and a comparison with
state-of-the-art wide-range transceivers. The transceiver achieves a programmable data rate
from 3 to 10Gb/s with independent lane control and a power efficiency of 5.75mW/Gbps.
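The reported totals can be cross-checked with a few lines of arithmetic. The sketch below uses only the power and data-rate figures quoted above; the variable names are illustrative.

```python
import math

# Per-block power at 10 Gb/s, in mW, as quoted in the text.
power_mw = {"transmitter": 27.2, "receiver": 26.6, "lc_clock_generator": 3.7}
data_rate_gbps = 10.0

total_mw = sum(power_mw.values())
efficiency_mw_per_gbps = total_mw / data_rate_gbps
share_percent = {name: 100.0 * p / total_mw for name, p in power_mw.items()}

print(f"total = {total_mw:.1f} mW, efficiency = {efficiency_mw_per_gbps:.2f} mW/Gbps")
for name, share in share_percent.items():
    print(f"{name}: {share:.1f}% of total")
```

The three contributions add up to the quoted 57.5mW, which at 10Gb/s gives the 5.75mW/Gbps figure listed in Table 4.1.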
CHAPTER 5
CONCLUSION
A cascaded digital MDLL and digital PLL frequency synthesizer with a scrambling TDC
(STDC) is presented to achieve optimal random jitter and deterministic jitter performance
with a low-frequency input reference clock. The proposed STDC alleviates the trade-off
between the deterministic jitter accumulation and the input reference frequency and achieves
a reference frequency-independent deterministic jitter. The cascaded architecture with the
first stage as digital MDLL and a second-stage as digital PLL provides wide bandwidth for
improved VCO phase noise suppression and achieves optimal random jitter. Fabricated in a
90 nm CMOS process, the prototype frequency synthesizer consumes 4.76mW power from
a 1.0V supply and generates 160MHz and 2.56GHz output clocks from a 1.25MHz crystal
reference frequency. The measured results show that with the new scrambling TDC, the
long-term absolute jitter of the 160MHz digital MDLL and 2.56GHz digital PLL outputs
are 2.4 psrms and 4.18 psrms, while the peak-to-peak jitter values are 22.1 ps and 35.2 ps,
respectively. The proposed frequency synthesizer occupies an active die area of 0.16mm2
and achieves power efficiency of 1.86mW/GHz. With the new scrambling TDC, the MDLL
output shows an 8× improvement in rms jitter and a 4× improvement in peak-to-peak jitter,
while the cascaded output shows a 7× improvement in rms jitter and a 4× improvement in
peak-to-peak jitter.
As a second work, a hybrid phase/current-mode phase interpolation (HPC-PI) technique is
presented to improve the phase noise performance of ring oscillator based fractional-N PLLs.
The proposed HPC-PI alleviates the bandwidth trade-off between VCO phase noise suppression
and ∆Σ quantization noise suppression. By combining the phase detection and interpolation
functions into an XOR PD-PI, accurate quantization error cancellation is achieved without
calibration. Use of a digital MDLL in front of the fractional-N PLL alleviates the bandwidth
limitation imposed by a low reference frequency, which allows the PLL bandwidth to be
extended further to suppress VCO phase noise and lowers the in-band noise floor. Fabricated
in a 65 nm CMOS process, the prototype generates fractional output frequencies from 4.25 to
4.75GHz with an in-band noise floor of -104 dBc/Hz and 1.5 psrms integrated jitter. The
measured results show that the overall clock generator provides a 13.2 dB noise floor
improvement. The proposed clock multiplier achieves a power efficiency of 2.4mW/GHz and the
best figure of merit of -225.8 dB for ring-VCO-based fractional-N PLLs reported in the
literature.
In the final work, we present a clocking strategy for efficient clock generation, recovery, and
distribution in flexible-rate transceivers. Using a fixed-frequency low-jitter clock provided
by an integer-N PLL, fractional frequencies are generated/recovered locally using multi-
phase fractional-N clock multipliers, which offers the following advantages. First, clock
distribution power is reduced because only a single-phase fixed-frequency clock is needed.
Second, it can provide multiple phases with infinite phase-shifting capability across a wide
frequency range without using phase interpolators. Finally, by maximizing the bandwidth of
a local ring-based clock multiplier, it achieves jitter performance comparable to that of LC-
based multipliers. Fabricated in a 65 nm CMOS process, the prototype transceiver achieves
a programmable data rate from 3 to 10Gb/s with independent lane control and the power
efficiency of 5.75mW/Gbps. At 10Gb/s, the integrated jitter of the transmitter output and
the recovered clock is 360 fsrms and 758 fsrms, respectively.
REFERENCES
[1] “Datasheet-AMD embedded G-series family of processors.” [Online]. Available:
http://www.amd.com/en-us/products/embedded/processors/g-series
[2] F. Gardner, “Charge-pump phase-lock loops,” IEEE Trans. Commun., vol. 28, no. 11,
pp. 1849–1858, Nov. 1980.
[3] P. K. Hanumolu, M. Brownlee, K. Mayaram, and U.-K. Moon, “Analysis of charge-
pump phase-locked loops,” IEEE Trans. Circuits Syst. I, vol. 51, no. 9, pp. 1665–1674,
Sep. 2004.
[4] J. Dunning, G. Garcia, J. Lundberg, and E. Nuckolls, “An all-digital phase-locked loop
with 50-cycle lock time suitable for high-performance microprocessors,” IEEE J. Solid-
State Circuits, vol. 30, no. 4, pp. 412–422, Apr. 1995.
[5] T.-Y. Hsu, C.-C. Wang, and C.-Y. Lee, “Design and analysis of a portable high-speed
clock generator,” IEEE Trans. Circuits Syst. II, vol. 48, no. 4, pp. 367–375, Apr. 2001.
[6] C.-C. Chung and C.-Y. Lee, “An all-digital phase-locked loop for high-speed clock
generation,” IEEE J. Solid-State Circuits, vol. 38, no. 2, pp. 347–351, Feb. 2003.
[7] C.-M. Hsu, M. Straayer, and M. Perrott, “A low-noise wide-BW 3.6-GHz digital
fractional-N frequency synthesizer with a noise-shaping time-to-digital converter and
quantization noise cancellation,” IEEE J. Solid-State Circuits, vol. 43, no. 12,
pp. 2776–2786, Dec. 2008.
[8] E. Roth, M. Thalmann, N. Felber, and W. Fichtner, “A delay-line based DCO for
multimedia applications using digital standard cells only,” in IEEE ISSCC Dig. Tech.
Papers, Feb. 2003, pp. 432–505.
[9] N. Da Dalt, “A design-oriented study of the nonlinear dynamics of digital bang-bang
PLLs,” IEEE Trans. Circuits Syst. I, vol. 52, no. 1,
pp. 21–31, Jan. 2005.
[10] W. Yin, R. Inti, A. Elshazly, B. Young, and P. K. Hanumolu, “A 0.7-to-3.5 GHz 0.6-to-
2.8 mW highly digital phase-locked loop with bandwidth tracking,” IEEE J. Solid-State
Circuits, vol. 46, no. 8, pp. 1870–1880, Aug. 2011.
[11] R. Farjad-Rad, W. Dally, H.-T. Ng, R. Senthinathan, M. J. Lee, R. Rathi, and J. Poul-
ton, “A low-power multiplying DLL for low-jitter multigigahertz clock generation in
highly integrated digital chips,” IEEE J. Solid-State Circuits, vol. 37, no. 12, pp. 1804–
1812, Dec. 2002.
[12] S. Ye, L. Jansson, and I. Galton, “A multiple-crystal interface PLL with VCO realign-
ment to reduce phase noise,” IEEE J. Solid-State Circuits, vol. 37, no. 12, pp. 1795–
1803, Dec. 2002.
[13] B. M. Helal, M. Z. Straayer, G.-Y. Wei, and M. H. Perrott, “A highly digital MDLL-
based clock multiplier that leverages a self-scrambling time-to-digital converter to
achieve subpicosecond jitter performance,” IEEE J. Solid-State Circuits, vol. 43, no. 4,
pp. 855–863, Apr. 2008.
[14] A. Elshazly, R. Inti, B. Young, and P. Hanumolu, “Clock multiplication techniques using
digital multiplying delay-locked loops,” IEEE J. Solid-State Circuits, vol. 48, no. 6, pp.
1416–1428, Jun. 2013.
[15] D. Park and S. Cho, “A 14.2 mW 2.55-to-3 GHz cascaded PLL with reference injection
and 800 MHz delta-sigma modulator in 0.13 µm CMOS,” IEEE J. Solid-State Circuits,
vol. 47, no. 12, pp. 2989–2998, Dec. 2012.
[16] R. Nandwana, T. Anand, S. Saxena, S.-J. Kim, M. Talegaonkar, A. Elkholy, W.-S.
Choi, A. Elshazly, and P. Hanumolu, “A calibration-free fractional-N ring PLL using
hybrid phase/current-mode phase interpolation method,” IEEE J. Solid-State Circuits,
vol. 50, no. 4, pp. 882–895, Apr. 2015.
[17] R. K. Nandwana, S. Saxena, A. Elshazly, K. Mayaram, and P. K. Hanumolu, “A 2.5GHz
5.4mW 1-to-2048 digital clock multiplier using a scrambling TDC,” in IEEE VLSI
Circuits Symp. Dig. Tech. Papers, Jun. 2013, pp. 156–157.
[18] R. K. Nandwana, S. Saxena, A. Elshazly, K. Mayaram, and P. K. Hanumolu, “A 1-to-
2048 fully-integrated cascaded digital frequency synthesizer for low frequency reference
clocks using scrambling TDC,” IEEE Trans. Circuits Syst. I, vol. 64, no. 2, pp. 283–295,
Feb. 2017.
[19] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K.-S. Chiu, and M. Ming-Tak Le-
ung, “Improved sense-amplifier-based flip-flop: Design and measurements,” IEEE J.
Solid-State Circuits, vol. 35, no. 6, pp. 876–884, Jun. 2000.
[20] M. H. Perrott, Y. Huang, R. T. Baird, B. W. Garlepp, D. Pastorello, E. T. King, Q. Yu,
D. B. Kasha, P. Steiner, L. Zhang, J. Hein, and B. Del Signore, “A 2.5-Gb/s multi-rate
0.25-µm CMOS clock and data recovery circuit utilizing a hybrid analog/digital loop
filter and all-digital referenceless frequency acquisition,” IEEE J. Solid-State Circuits,
vol. 41, no. 12, pp. 2930–2944, Dec. 2006.
[21] T. Watanabe and S. Yamauchi, “An all-digital PLL for frequency multiplication by 4
to 1022 with seven-cycle lock time,” IEEE J. Solid-State Circuits, vol. 38, no. 2, pp.
198–204, Feb. 2003.
[22] C.-C. Chung and C.-Y. Ko, “A fast phase tracking ADPLL for video pixel clock gen-
eration in 65 nm CMOS technology,” IEEE J. Solid-State Circuits, vol. 46, no. 10, pp.
2300–2311, Oct. 2011.
[23] P.-L. Chen, C.-C. Chung, J.-N. Yang, and C.-Y. Lee, “A clock generator with cascaded
dynamic frequency counting loops for wide multiplication range applications,” IEEE J.
Solid-State Circuits, vol. 41, no. 6, pp. 1275–1285, Jun. 2006.
[24] T. Riley, M. Copeland, and T. Kwasniewski, “Delta-sigma modulation in fractional-N
frequency synthesis,” IEEE J. Solid-State Circuits, vol. 28, no. 5, pp. 553–559, May
1993.
[25] S. Pamarti, L. Jansson, and I. Galton, “A wideband 2.4-GHz delta-sigma fractional-N
PLL with 1-Mb/s in-loop modulation,” IEEE J. Solid-State Circuits, vol. 39, no. 1, pp.
49–62, Jan. 2004.
[26] A. Swaminathan, K. Wang, and I. Galton, “A wide-bandwidth 2.4 GHz ISM band
fractional-N PLL with adaptive phase noise cancellation,” IEEE J. Solid-State Circuits,
vol. 42, no. 12, pp. 2639–2650, Dec. 2007.
[27] M. Zanuso, S. Levantino, C. Samori, and A. Lacaita, “A wideband 3.6 GHz digital ∆Σ
fractional-N PLL with phase interpolation divider and digital spur cancellation,” IEEE
J. Solid-State Circuits, vol. 46, no. 3, pp. 627–638, Mar. 2011.
[28] C.-H. Heng and B.-S. Song, “A 1.8-GHz CMOS fractional-N frequency synthesizer with
randomized multiphase VCO,” IEEE J. Solid-State Circuits, vol. 38, no. 6, pp. 848–854,
Jun. 2003.
[29] R. Nonis, W. Grollitsch, T. Santa, D. Cherniak, and N. Da Dalt, “digPLL-Lite: A
low-complexity, low-jitter fractional-N digital PLL architecture,” IEEE J. Solid-State
Circuits, vol. 48, no. 12, pp. 3134–3145, Dec. 2013.
[30] P. K. Hanumolu, V. Kratyuk, G.-Y. Wei, and U.-K. Moon, “A sub-picosecond resolution
0.5–1.5 GHz digital-to-phase converter,” IEEE J. Solid-State Circuits, vol. 43, no. 2,
pp. 414–424, Feb. 2008.
[31] T.-K. Kao, C.-F. Liang, H.-H. Chiu, and M. Ashburn, “A wideband fractional-N ring
PLL with fractional-spur suppression using spectrally shaped segmentation,” in IEEE
ISSCC Dig. Tech. Papers, 2013, pp. 416–417.
[32] D. Tasca, M. Zanuso, G. Marzin, S. Levantino, C. Samori, and A. L. Lacaita, “A 2.9–4.0-
GHz fractional-N digital PLL with bang-bang phase detector and 560-fsrms integrated
jitter at 4.5-mW power,” IEEE J. Solid-State Circuits, vol. 46, no. 12, pp. 2745–2758,
Dec. 2011.
[33] P.-C. Huang, W.-S. Chang, and T.-C. Lee, “A 2.3GHz fractional-N dividerless phase-
locked loop with -112dBc/Hz in-band phase noise,” in IEEE ISSCC Dig. Tech. Papers,
Feb. 2014, pp. 362–363.
[34] R. K. Nandwana, T. Anand, S. Saxena, S.-J. Kim, M. Talegaonkar, A. Elkholy, W.-
S. Choi, A. Elshazly, and P. K. Hanumolu, “A 4.25 GHz-4.75 GHz calibration-free
fractional-N ring PLL using hybrid phase/current-mode phase interpolator with 13.2
dB phase noise improvement,” in IEEE VLSI Circuits Symp. Dig. Tech. Papers, Jun.
2014, pp. 230–231.
[35] T. Toifl, C. Menolfi, P. Buchmann, M. Kossel, T. Morf, R. Reutemann, M. Ruegg,
M. Schmatz, and J. Weiss, “A 0.94-ps-RMS-Jitter 0.016-mm2 2.5-GHz multiphase gen-
erator PLL with 360◦ digitally programmable phase shift for 10-Gb/s serial links,” IEEE
J. Solid-State Circuits, vol. 40, no. 12, pp. 2700–2712, Dec. 2005.
[36] G. Shu, S. Saxena, W.-S. Choi, M. Talegaonkar, R. Inti, A. Elshazly, B. Young, and
P. Hanumolu, “A reference-less clock and data recovery circuit using phase-rotating
phase-locked loop,” IEEE J. Solid-State Circuits, vol. 49, no. 4, pp. 1036–1047, Apr.
2014.
[37] K.-Y. K. Chang, J. Wei, C. Huang, Y. Li, K. Donnelly, M. Horowitz, Y. Li, and
S. Sidiropoulos, “A 0.4-4-Gb/s CMOS quad transceiver cell using on-chip regulated
dual-loop PLLs,” IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 747–754, May 2003.
[38] S. Meninger and M. Perrott, “A 1-MHz bandwidth 3.6-GHz 0.18-µm CMOS fractional-N
synthesizer utilizing a hybrid PFD/DAC structure for reduced broadband phase noise,”
IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 966–980, Apr. 2006.
[39] P. Park, D. Park, and S. Cho, “A 2.4 GHz fractional-N frequency synthesizer with high-
OSR ∆Σ modulator and nested PLL,” IEEE J. Solid-State Circuits, vol. 47, no. 10, pp.
2433–2443, Oct. 2012.
[40] D.-W. Jee, Y.-H. Seo, H.-J. Park, and J.-Y. Sim, “A 2 GHz fractional-N digital PLL
with 1b noise shaping ∆Σ TDC,” IEEE J. Solid-State Circuits, vol. 47, no. 4,
pp. 875–883, Apr. 2012.
[41] M.-W. Chen, D. Su, and S. Mehta, “A calibration-free 800 MHz fractional-N digital
PLL with embedded TDC,” IEEE J. Solid-State Circuits, vol. 45, no. 12,
pp. 2819–2827, Dec. 2010.
[42] X. Yu, Y. Sun, W. Rhee, and Z. Wang, “An FIR-embedded noise filtering method for
fractional-N PLL clock generators,” IEEE J. Solid-State Circuits, vol. 44, no. 9, pp.
2426–2436, Sep. 2009.
[43] Y. Frans, D. Carey, M. Erett, H. Amir-Aslanzadeh, W. Y. Fang, D. Turker, A. P. Jose,
A. Bekele, J. Im, P. Upadhyaya, Z. D. Wu, K. C. H. Hsieh, J. Savoj, and K. Chang,
“A 0.5-16.3 Gb/s fully adaptive flexible-reach transceiver for FPGA in 20 nm CMOS,”
IEEE J. Solid-State Circuits, vol. 50, no. 8, pp. 1932–1944, Aug. 2015.
[44] J. Savoj, K. Hsieh, P. Upadhyaya, F. T. An, J. Im, X. Jiang, J. Kamali, K. W. Lai,
D. Wu, E. Alon, and K. Chang, “Design of high-speed wireline transceivers for backplane
communications in 28nm CMOS,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC),
Sep. 2012, pp. 1–4.
[45] J. Savoj, H. Aslanzadeh, D. Carey, M. Erett, W. Fang, Y. Frans, K. Hsieh, J. Im,
A. Jose, D. Turker, P. Upadhyaya, D. Wu, and K. Chang, “Wideband flexible-reach
techniques for a 0.5-16.3Gb/s fully-adaptive transceiver in 20nm CMOS,” in Proc. IEEE
Custom Integr. Circuits Conf. (CICC), Sep. 2014, pp. 1–4.
[46] P. Upadhyaya, J. Savoj, F. T. An, A. Bekele, A. Jose, B. Xu, D. Wu, D. Turker,
H. Aslanzadeh, H. Hedayati, J. Im, S. W. Lim, S. Chen, T. Pham, Y. Frans, and
K. Chang, “A 0.5-to-32.75Gb/s flexible-reach wireline transceiver in 20nm CMOS,” in
IEEE ISSCC Dig. Tech. Papers, Feb. 2015, pp. 1–3.
[47] M. Raj, S. Saeedi, and A. Emami, “A wideband injection locked quadrature clock
generation and distribution technique for an energy-proportional 16-32 Gb/s optical
receiver in 28 nm FDSOI CMOS,” IEEE J. Solid-State Circuits, vol. 51, no. 10, pp.
2446–2462, Oct. 2016.
[48] S. Saxena, G. Shu, R. K. Nandwana, M. Talegaonkar, A. Elkholy, T. Anand, S. J. Kim,
W. S. Choi, and P. K. Hanumolu, “A 2.8mW/Gb/s 14Gb/s serial link transceiver in
65nm CMOS,” in IEEE VLSI Circuits Symp. Dig. Tech. Papers, pp. 352–353.
[49] R. K. Nandwana, S. Saxena, A. Elkholy, M. Talegaonkar, J. Zhu, W. S. Choi, A. Elmal-
lah, and P. K. Hanumolu, “A 3-to-10Gb/s 5.75pJ/bit transceiver with flexible clocking
in 65nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, pp. 492–493.
[50] G. Marucci, A. Fenaroli, G. Marzin, S. Levantino, C. Samori, and A. Lacaita, “A 1.7GHz
MDLL-based fractional-N frequency synthesizer with 1.4ps RMS integrated jitter and
3mW power using a 1b TDC,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2014,
pp. 360–361.
[51] A. Elkholy, A. Elmallah, M. Elzeftawi, K. Chang, and P. K. Hanumolu,
“6.75-to-8.25GHz, 250fsrms-integrated-jitter 3.25mW rapid on/off PVT-insensitive
fractional-N injection-locked clock multiplier in 65nm CMOS,” in IEEE ISSCC Dig. Tech.
Papers, Jan. 2016, pp. 192–193.
[52] S. Saxena, G. Shu, R. K. Nandwana, M. Talegaonkar, A. Elkholy, T. Anand, S. J. Kim,
W. S. Choi, and P. K. Hanumolu, “A 2.8mW/Gb/s 14Gb/s serial link transceiver in
65nm CMOS,” IEEE J. Solid-State Circuits, 2017, accepted.
[53] A. Elkholy, M. Talegaonkar, T. Anand, and P. K. Hanumolu, “Design and analysis
of low-power high-frequency robust sub-harmonic injection-locked clock multipliers,”
IEEE J. Solid-State Circuits, vol. 50, no. 12, pp. 3160–3174, Dec. 2015.
[54] V. Balan, O. Oluwole, G. Kodani, C. Zhong, S. Maheswari, R. Dadi, A. Amin,
G. Bhatia, P. Mills, A. Ragab, and E. Lee, “A 130mW 20Gb/s half-duplex serial link in
28nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2014, pp. 438–439.
