Ring oscillator based injection locked clock multiplier by Coombs, Daniel R
© 2017 Daniel Coombs
RING OSCILLATOR BASED INJECTION LOCKED CLOCK
MULTIPLIER
BY
DANIEL COOMBS
THESIS
Submitted in partial fulfillment of the requirements
for the degree of Master of Science in Electrical and Computer Engineering
in the Graduate College of the
University of Illinois at Urbana-Champaign, 2017
Urbana, Illinois
Adviser:
Associate Professor Pavan Kumar Hanumolu
ABSTRACT
This thesis describes a ring-based injection locked clock multiplier (ILCM)
designed with the goal of generating a high-frequency and low-jitter clock.
Building on prior research done on injection locking, this design uses a refer-
ence frequency doubling technique to push the noise bandwidth of the circuit
to FREF/3 to suppress DCO noise to a large extent. A background duty cy-
cle error correction technique is employed to correct errors on the doubled
clock that could be detrimental to performance. The design also modifies an
existing architecture to achieve type-II suppression of DCO noise in order
to fully suppress the flicker noise which becomes prevalent in low process
nodes. The prototype ILCM was fabricated in TSMC 65 nm CMOS tech-
nology. Thorough testing was performed to characterize the effectiveness
of the aforementioned techniques. The circuit achieves 340 fsrms integrated
jitter when operating at 5 GHz while only consuming 5.3 mW of power.
The ILCM’s figure of merit, -242.4 dB, is on par with state-of-the-art ring-
based clock multipliers while operating at a much higher output frequency
and multiplication factor than previously published work. These results in-
dicate the effectiveness of reference frequency doubling in a ring-based, high-
performance clock multiplier design.
ACKNOWLEDGMENTS
I would like to thank my advisor, Professor Hanumolu, for giving me the op-
portunity to work on this project and for patiently working with me through-
out the entire process. Many thanks to the members of my research group
who helped with this thesis, especially Ahmed Elkholy, who was the driving
force behind much of this project. Without his patience and intelligence this
work would not have been possible. I would also like to thank Brady Salz
for his help with the PCB design and for answering thousands of my ques-
tions over the last two years. Lastly, a special thanks to my parents who
provided me with the opportunity to study at the University of Illinois and
have supported me unconditionally throughout my life.
TABLE OF CONTENTS
CHAPTER 1 INTRODUCTION
1.1 Digital Phase-Locked Loops
1.2 Voltage Controlled Oscillators
1.3 Overview
CHAPTER 2 VCO NOISE SUPPRESSION
2.1 Architecture Comparison
2.2 Injection Locking Basics
CHAPTER 3 RING-BASED INJECTION LOCKED CLOCK MULTIPLIER
3.1 Design Considerations
3.2 Reference Frequency Doubling
3.3 Frequency Tracking
3.4 Flicker Noise Suppression
CHAPTER 4 CIRCUIT BUILDING BLOCKS
4.1 Ring-Based Digitally Controlled Oscillator
4.2 Digitally Controlled Delay Line
4.3 Full Architecture
CHAPTER 5 RESULTS
5.1 Measurement Setup
5.2 Measurements
5.3 Performance Summary
CHAPTER 6 CONCLUSION
REFERENCES
CHAPTER 1
INTRODUCTION
Phase-locked loops (PLLs) are one of the most common building blocks
found in modern integrated circuits. Operating from a low-frequency ref-
erence clock, PLLs generate a high-frequency clock for use in a wide variety
of systems including: serial links, RF communication, processor clocking,
and system on chip (SoC) clock distribution. PLL performance is generally
graded on the circuit’s ability to generate a low-jitter output clock. This
is desirable for many reasons. For example, consider a serial link using the
output of a PLL to clock its sampling of an incoming data stream. It is
extremely important for the receiver to sample the data at exactly the same
point in each symbol, typically the point where the signal to noise ratio is
the highest, in order to correctly recover the data. Therefore, not only does
the PLL clock need to operate at the same frequency as the incoming data,
it needs to be phase locked such that its sampling location is constant.
In the frequency domain, time domain jitter is represented by phase noise.
The relationship between time domain jitter and phase noise is given by
Equation 1.1, where SΦOUT(f) is the phase noise at frequency f and TOUT is
the period of the output clock.
\[ \sigma_{\Delta T} = \sqrt{\int_{0}^{\infty} S_{\Phi_{OUT}}(f)\, df} \cdot \frac{T_{OUT}}{2\pi} \tag{1.1} \]
Jitter is therefore a function of the total amount of phase noise on the output
clock. PLLs achieve low-jitter clocks using feedback loops designed to sup-
press phase noise. Within these loops, clock multiplication can be enabled
such that the PLL can operate at a multiple of the reference clock. Refer-
ence clocks in most systems are generated by a crystal oscillator circuit which
typically operates in the range of 50-200 MHz. High-frequency clocks, which
can range from a couple of GHz to tens of GHz and are used in many
high-performance computing systems to push computational power and
communication speed, require large reference clock multiplication factors. Modern
SoCs use multiple PLLs to generate separate clocks for different subsystems,
meaning that area and power consumption of the PLL are important
attributes. Also, SoCs require flexible frequency generation for purposes
such as dynamic frequency scaling, which scales the operating frequency of
a processor to match its required computational load to save power. PLL
designers for SoCs are therefore mostly concerned about phase noise, power
consumption, area, and frequency range. Ideally, a high-frequency, low phase
noise clock is generated using small amounts of power and chip area, and has
the ability to operate at a wide range of frequencies.
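As an illustration of Equation 1.1, the integral can be evaluated numerically from a sampled phase noise profile. The following Python sketch is not part of the original design flow: the function name `rms_jitter`, the trapezoidal integration, and the flat phase noise profile are assumptions chosen only to show the arithmetic, and the phase noise is assumed to be expressed in rad^2/Hz.

```python
import math

def rms_jitter(freqs_hz, s_phi, t_out_s):
    """Evaluate Equation 1.1 with trapezoidal integration.

    freqs_hz: increasing offset frequencies in Hz
    s_phi:    phase noise PSD at those frequencies, in rad^2/Hz (assumed unit)
    t_out_s:  output clock period in seconds
    """
    integrated = 0.0
    for i in range(1, len(freqs_hz)):
        df = freqs_hz[i] - freqs_hz[i - 1]
        integrated += 0.5 * (s_phi[i] + s_phi[i - 1]) * df
    return math.sqrt(integrated) * t_out_s / (2.0 * math.pi)

# Hypothetical profile: flat 1e-12 rad^2/Hz from 0 to 100 MHz on a 5 GHz clock.
freqs = [i * 1.0e6 for i in range(101)]
psd = [1.0e-12] * len(freqs)
jitter_s = rms_jitter(freqs, psd, t_out_s=1.0 / 5.0e9)
print(f"RMS jitter: {jitter_s * 1e15:.0f} fs")
```

A measured profile would normally be integrated over a finite offset range rather than from zero to infinity, so the integration limits here are also an assumption.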
1.1 Digital Phase-Locked Loops
Traditionally, PLLs have been built using purely analog circuitry. Due to the
large area requirement of analog loop filters, digital PLLs (DPLLs), which
implement loop filtering digitally, have been a major area of interest for
researchers [1]. A simple DPLL is shown in Figure 1.1. A time to digital
converter (TDC) compares the phase of the output clock, OUT , with the
reference clock, REF , and converts this information into a digital signal.
This digital representation of the phase error is sent to a digital loop filter
(DLF) which controls a digitally controlled oscillator (DCO). The digitally
controlled oscillator consists of a digital-to-analog converter (DAC) and a
voltage controlled oscillator (VCO). The DAC converts the output of the
DLF into the control voltage for the VCO. A divider (÷N) placed in the feedback
path forces the loop to phase lock the output clock to the Nth multiple of
the reference clock.
The phase noise of the output clock is affected by many factors such as the
noise from the reference clock, random and quantization noise of the building
blocks such as the TDC, and noise from the DCO. In many systems the DCO
is the largest contributor to output clock phase noise. The rest of this work
will assume a clean reference clock and focus on lowering jitter generated
by PLL loop components. Furthermore, DCO noise is treated as the largest
source of jitter generation and the main focus will be on improving jitter
performance through its suppression.
Figure 1.1: Block diagram of a TDC-based digital PLL.
Figure 1.2: Circuit diagram for an LC-based voltage controlled oscillator.
1.2 Voltage Controlled Oscillators
Oscillators can be implemented using either ring-based or LC-based archi-
tectures. LC-VCOs, like the one shown in Figure 1.2, have superior phase
noise performance compared to ring-VCOs. However, they suffer from sev-
eral drawbacks. First, they require large area especially when the oscillation
frequency is lower than a couple of GHz. Second, they do not scale well
across technologies and need to be redesigned when moving to a new process
node. Third, to maintain a high quality factor (and thus low phase noise),
their frequency tuning range is typically narrow.
Figure 1.3: Circuit diagram for a ring-based voltage controlled oscillator.
Ring-VCOs, like the one shown in Figure 1.3, have much smaller area and
scale well across CMOS process nodes. They can be easily designed to achieve
a wide frequency tuning range. A survey of published LC and ring-VCOs
depicted in Figure 1.4 shows their oscillator figure-of-merit (FoM) given by
Equation 1.2, where L{∆ω} is the VCO phase noise at the offset frequency
∆ω, ωLO is the oscillation frequency, and PDC[mW] is the oscillator power
expressed in mW.
\[ \mathrm{FoM}_{VCO} = \frac{(\omega_{LO}/\Delta\omega)^{2}}{\mathcal{L}\{\Delta\omega\}\, P_{DC}[\mathrm{mW}]} \tag{1.2} \]
The FoM for ring-VCOs is approximately 20 dB worse than that of LC-VCOs
[2]. Per the analysis of Hajimiri in [3] and Abidi in [4], it is well known that
the most effective way to reduce ring-VCO phase noise is to increase power
consumption. This holds for phase noise caused by both thermal and
flicker noise. Therefore, to achieve the same FoM performance as LC-VCOs,
ring-VCOs need to consume roughly 100 times more power.
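The power figures quoted in this thesis follow from a simple scaling rule. The sketch below assumes that ring-VCO phase noise is inversely proportional to power, which is the relationship implied by the 20 dB and 100 times figures above; the function name is hypothetical.

```python
def power_factor_for_improvement(improvement_db):
    """Power multiplier needed to lower phase noise by improvement_db,
    assuming phase noise scales inversely with power (10 dB per 10x power)."""
    return 10.0 ** (improvement_db / 10.0)

# 20 dB is the FoM gap to LC-VCOs; 6 dB is the example used in Chapter 2.
for gap_db in (6.0, 20.0):
    factor = power_factor_for_improvement(gap_db)
    print(f"{gap_db:.0f} dB lower phase noise -> about {factor:.1f}x power")
```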
1.3 Overview
With a fixed power budget, PLLs that use ring oscillators display a gap
in performance compared to those using an LC-VCO. This work seeks to reduce
the gap and make the performance of ring-VCO based clock generators
comparable to that of LC-based clock generators. Therefore, a technique that
strongly suppresses oscillator noise is sought. The technique must
also preserve the goal of a wide-range, high-frequency clock multiplier. The
rest of this thesis is organized to illustrate a proposed technique for achiev-
ing these goals. Chapter 2 reviews and compares some prior art focused on
VCO noise suppression. Chapter 3 describes the proposed method based on
reference frequency doubling and injection locking. Chapter 4 then describes
the circuit implementation of this technique and Chapter 5 presents the measured
results. Finally, Chapter 6 concludes the thesis with a summary of the
key points.
Figure 1.4: FoM comparison of LC and ring-based VCOs [2].
CHAPTER 2
VCO NOISE SUPPRESSION
In this chapter, techniques for VCO noise suppression are discussed. As
described in Chapter 1, ring-VCOs have significantly worse jitter performance
than LC-VCOs. In order for a circuit to reap the benefits of ring-VCOs,
namely their smaller area and wider tuning range, while maintaining low-
jitter performance, sufficient VCO noise suppression is required. Section 2.1
will discuss design trade-offs and prior art of high-frequency, ring-based clock
multipliers. Section 2.2 will then delve deeper into the basics of injection
locked clock multipliers.
2.1 Architecture Comparison
Due to the accumulation of thermal noise, VCO phase noise increases by 20
dB/decade from high to low frequency. Therefore, at least a first-order high-
pass filtering of oscillator noise is needed in any practical clock multiplier.
The transfer function from VCO noise to the output, referred to as the VCO
noise transfer function (NTFVCO), for all clock multipliers needs a high-pass
characteristic. The bandwidth of NTFVCO is referred to as the VCO noise
bandwidth (NBW). A higher NBW leads to more suppression of VCO phase
noise.
Clock multipliers have an inherent NBW limit that depends heavily on
choice of architecture and reference clock frequency. Assuming no other noise
in the system, to improve output jitter due to oscillator noise after the NBW
limit is reached, VCO phase noise must be reduced. As discussed in Chapter
1, the most effective way to do this is to increase VCO power consumption.
For instance, to lower phase noise by 6 dB, power must be quadrupled. This
observation exhibits a noise bandwidth/power consumption trade-off. The
NBW limit can therefore be used as a metric to compare the amount of VCO
noise suppression across different clock multiplier architectures.
2.1.1 Digital PLL
To ensure stability in a PLL, the maximum noise bandwidth is typically
limited to FREF/10 [5]. In a conventional DPLL, the NTF of the TDC and
loop filter have a low-pass characteristic with a bandwidth equal to the NBW
of the DCO. By extending the NBW, more noise from these components
leaks to the output, degrading performance. This typically limits the extent
to which the NBW can be increased. Sub-sampling PLL architectures [6]
are designed to improve this trade-off; however, they still suffer from the
FREF/10 limit.
2.1.2 Multiplying Delay-locked Loop
Multiplying delay-locked loops (MDLLs) offer a means to push the noise
bandwidth further while also decoupling the bandwidth of the TDC and the
DCO [7]. By replacing a noisy oscillator output clock edge with a clean
reference clock edge, MDLLs increase the noise bandwidth to FREF/4. As
a consequence, the DCO phase noise is suppressed by an additional 6 dB
compared to that of a PLL. The architecture for an MDLL is shown in Figure
2.1. It consists of a ring oscillator preceded by a mux which can either feed
the oscillator output clock edge back into the oscillator, or replace it with a
reference edge. The digital selection logic generates a pulse to initiate this
edge replacement. Many high-performance ring-based clock multipliers have
been demonstrated using this architecture [7], [8], [9], [10]. However, while
the high noise bandwidth of an MDLL is appealing, replacing the oscillator
edge reliably at high frequencies complicates design and often results in either
degraded deterministic jitter performance or increased power consumption.
For example, in [10] a complex tuning scheme was needed in order to operate
at 4.6 GHz. For this reason, the output frequency of most MDLLs in
the literature is limited to below 2.5 GHz.
Figure 2.1: Multiplying delay-locked loop block diagram.
Figure 2.2: Simplified injection locked clock multiplier.
2.1.3 Injection Locked Clock Multiplier
Injection locked clock multipliers (ILCMs) lock the DCO to an integer multiple
of the reference frequency by injecting narrow pulses into an oscillator
whose free running frequency is close to a multiple of FREF. A block
diagram for a simple ILCM is shown in Figure 2.2. Because ILCMs do not
require edge replacement logic, they are better suited for generating higher
frequencies compared to MDLLs. However, in practice their noise bandwidth
is typically about FREF/6, leading to less DCO noise suppression compared
to MDLLs as shown in Figure 2.3. ILCMs outperform traditional PLLs for
DCO noise suppression and can operate at higher frequencies than MDLLs.
Thus, they are the ideal choice for a high-frequency, ring-based clock multi-
plier.
Figure 2.3: VCO noise suppression comparison across popular architectures.
(a) Transfer functions (dB versus frequency in Hz) for the PLL, MDLL, and ILCM.
(b) VCO output noise (dBc/Hz versus frequency in Hz) for the open-loop
oscillator, PLL, MDLL, and ILCM.
2.2 Injection Locking Basics
Recently, the design and analysis of injection locked clock multipliers have
been an active research topic due to their ability to generate extremely low jitter
output clocks [11], [12], [13], [14]. In an ILCM like the one shown in Figure
2.2, a pulse generator turns the reference clock into narrow pulses which are
injected into an oscillator with a free running frequency close to an integer
multiple of FREF (FOUT ≈ NFREF ). The injection pulls the phase of the
injected clock (OUT ) toward the phase of the injecting clock (REF ) until
the two clocks are phase locked. This is done by triggering pulses with the
positive edge of REF that drive a switch in the oscillator which pulls the
output clock toward its zero crossing.
Analysis of injection locking by Dunwell et al. [12], later extended by Elkholy
et al. [14], introduced a robust theoretical model based on the phase domain
response (PDR) of the injection. The PDR is the relationship between the
oscillator phase prior to the injection and the phase post injection. Its zero
crossings indicate stability points where the output clock and input clock are
locked. The slope of the PDR at a given input phase corresponds to the
injection strength, β. As will be shown in Section 3.4, β plays an important
role in determining the noise bandwidth in an ILCM. These analyses also
quantify the effect of oscillator frequency error (discussed in Section 3.3) and
oscillator pull-in range (discussed in Section 5.2) on overall performance.
This work seeks to build upon these advancements to design a practical
ring-based injection locked clock multiplier that operates at high-frequency
with extremely low jitter due to large amounts of VCO noise suppression.
CHAPTER 3
RING-BASED INJECTION LOCKED
CLOCK MULTIPLIER
As shown in Chapter 2, a ring-based ILCM is a good choice for a high-
frequency clock multiplier with large amounts of VCO noise suppression.
The use of a ring oscillator allows the circuit to achieve a wide tuning range
and have a small form factor. This chapter will discuss the design of such an
architecture and address the major implementation challenges.
3.1 Design Considerations
By removing the need for complicated selection logic, ILCMs can operate re-
liably at higher frequencies compared to MDLLs. However, as was shown in
Chapter 2, their noise bandwidth is typically around FREF/6 which is lower
than the bandwidth achievable by MDLLs, FREF/4. This smaller noise band-
width is a major limiting factor for overall performance because improving
DCO output phase noise further after reaching the maximum noise band-
width requires an increase in DCO power. In this design, an injection rate
doubling approach that can achieve a noise bandwidth of FREF/3 is proposed
and discussed in Section 3.2.
Another well-known issue with injection locked clock multipliers is that
they require precise frequency tracking in order to maintain low-noise per-
formance [14]. Many different approaches have been taken in ILCMs to
mitigate this issue [13], [14], [15]. In this design, a scheme similar to the one
in [14] is adapted to maintain the free-running frequency of the VCO close
to the desired output frequency. Its design is described in Section 3.3.
At lower process nodes the flicker noise of a ring oscillator can become
a major contributing factor to the overall noise in the circuit. Previously
published digital ring-based ILCMs have a type-1 DCO noise transfer func-
tion. This allows the low-frequency DCO flicker noise to leak onto the output
clock. In view of this, our work uses a digital version of the pulse-position-
modulation approach in [16], presented in Section 3.4, to achieve a type-2
NTFDCO to fully suppress DCO flicker noise.
3.2 Reference Frequency Doubling
The NBW of an ILCM is largely determined by its injection strength, β (see
Section 3.4). As β is increased, the distortion incurred by the injection on
the output clock begins to negatively affect its performance. Also, achieving
a high β requires using a large injection switch which adds significant loading
to the oscillator, reducing its frequency. To achieve high frequencies, more
power needs to be burned to overcome the loading of the switch. For these
reasons, β is typically limited in practice, thus limiting the NBW to around
FREF/6.
To overcome this issue, the reference clock frequency can be doubled
internally before it is fed into the injection locking circuit, doubling the
injection rate. Doubling the reference frequency at the input to the injection
locking circuit doubles the NBW. The NBW of the ILCM stays constant
with respect to the doubled clock REFx2, at FREFx2/6.
However, due to the internal doubling, the effective NBW with respect to
the true reference clock is FREF/3.
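The bandwidth argument above can be written out as a short worked equation. The numerical values assume the 125 MHz reference used in the simulations of this chapter and take FREFx2/6 as the practical NBW limit stated in the text.

```latex
\[
\mathrm{NBW} \approx \frac{F_{REFx2}}{6} = \frac{2F_{REF}}{6} = \frac{F_{REF}}{3},
\qquad
F_{REF} = 125\ \mathrm{MHz}:\quad
\frac{F_{REF}}{6} \approx 20.8\ \mathrm{MHz}
\;\rightarrow\;
\frac{F_{REF}}{3} \approx 41.7\ \mathrm{MHz}
\]
```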
As shown in Figure 3.1, doubling the reference frequency has a large impact
on DCO phase noise suppression. In this simulation, a 5 GHz output clock is
generated by injecting a clean 125 MHz reference clock into an oscillator with
a spot phase noise of -115 dBc/Hz at 10 MHz offset. The resulting integrated
jitter on the output clock is 505 fsrms. When the reference frequency is
increased to 250 MHz, DCO phase noise is suppressed by an additional 6 dB,
and the output integrated jitter reduces to 335 fsrms. Doubling the reference
frequency from 125 MHz to 250 MHz therefore reduces the integrated jitter on
the 5 GHz output clock by roughly 34%.
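The additional 6 dB of suppression can be checked with a simple model. The sketch below assumes that the DCO noise transfer function is a first-order high-pass response with its corner at the NBW, which is a simplification and not the full ILCM model; the function name and the 100 kHz offset are illustrative choices.

```python
import math

def highpass_gain_db(offset_hz, nbw_hz):
    """Magnitude in dB of a first-order high-pass NTF with corner nbw_hz (assumed model)."""
    mag = offset_hz / math.hypot(offset_hz, nbw_hz)
    return 20.0 * math.log10(mag)

F_REF = 125e6      # reference frequency from the simulation described above
offset = 100e3     # offset frequency well below either noise bandwidth

single = highpass_gain_db(offset, F_REF / 6)        # injection at F_REF
doubled = highpass_gain_db(offset, 2 * F_REF / 6)   # injection at 2 * F_REF
extra_suppression_db = single - doubled
print(f"Extra suppression from doubling: {extra_suppression_db:.2f} dB")
```

For offsets far below the noise bandwidth, doubling the corner frequency of a first-order high-pass response lowers the gain by 20 log10(2), or about 6 dB, which agrees with the simulated result quoted above.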
3.2.1 Frequency Doubling Circuit
Figure 3.1: Impact of doubling FREF on DCO phase noise. Phase noise (dBc/Hz)
versus frequency (Hz) for the open-loop oscillator, Fref = 125 MHz, and
Fref = 250 MHz.
Figure 3.2 shows a simple XOR based clock frequency doubling circuit. The
input clock, REF, is XORed with its delayed version, REFD. As illustrated
in Figure 3.3, both the rising and falling edges of the reference clock generate
a rising edge at the XOR output, resulting in an output clock, REFx2, with
twice the frequency. The high duration of the output is equal to the time
delay of REFD, TD. It is important to notice that this delay must be large
enough to prevent narrow REFx2 clock pulses that are difficult for the
ensuing circuitry to process.
Figure 3.2: Schematic of a simple doubling circuit.
The output of this frequency doubling circuit is used as the input to an
injection locking circuit, as shown in Figure 3.4, to double the effective NBW.
When operating at maximum injection strength, this leads to a NBW of
FREF/3.
Figure 3.3: Timing diagram for an ideal doubler.
3.2.2 Impact of Duty Cycle Error
A major drawback of this simple doubler circuit is its sensitivity to input
clock duty cycle error. To understand this issue, consider the timing diagram
shown in Figure 3.5. An ideal clock spends 50% of its clock period in the high
state and 50% in the low state. When duty cycle error is present, the time for
which the clock is high, TH, is not equal to the duration for which it is low, TL.
Because both the rising and falling edges of the input clock create a rising
edge on the output clock, the output clock period will alternate between
TH and TL from cycle to cycle. This results in an average output period of
(TH + TL)/2 = TREF/2, with TREF being the time period of the reference
clock. This is desired as the average output period has been reduced by half,
doubling the frequency. However, the duty cycle error appears on the output
clock as deterministic period jitter of |TH − TL| = TDCE.
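The alternating output period can be reproduced with a small behavioral model of the XOR doubler. This is a sketch on a 1 ps time grid, not the circuit used in the prototype; the function names, the 1 ns delay TD, and the 8 ps high-time error are hypothetical values chosen for illustration.

```python
def ref_level(t_ps, period_ps, high_ps):
    """Logic level of a clock that is high for the first high_ps of each period."""
    return 1 if (t_ps % period_ps) < high_ps else 0

def doubler_rising_edges(period_ps, high_ps, delay_ps, n_periods):
    """Rising-edge times (ps) of REF xor REFD, simulated on a 1 ps grid."""
    def out(t):
        ref = ref_level(t, period_ps, high_ps)
        ref_d = ref_level(t - delay_ps, period_ps, high_ps)  # delayed copy REFD
        return ref ^ ref_d
    edges = []
    prev = out(-1)
    for t in range(n_periods * period_ps):
        cur = out(t)
        if cur and not prev:
            edges.append(t)
        prev = cur
    return edges

def periods(edges):
    """Time between consecutive rising edges."""
    return [b - a for a, b in zip(edges, edges[1:])]

# 125 MHz reference (8000 ps period), TD = 1000 ps, three reference periods.
ideal = periods(doubler_rising_edges(8000, 4000, 1000, 3))
skewed = periods(doubler_rising_edges(8000, 4008, 1000, 3))
print("50% duty cycle:", ideal)
print("TH = 4008 ps:  ", skewed)
```

With TH = 4008 ps and TL = 3992 ps the output periods alternate between the two values, so the average period is still TREF/2 while the deterministic period jitter equals |TH − TL| = 16 ps.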
The effect of duty cycle error on ILCM performance is best understood
in the frequency domain. Because the doubler output period alternates
between TH and TL at a rate of FREF, duty cycle error appears as large spurs
at an offset frequency of FREF from FREFx2, as illustrated by the spectrum
of the clock at the output of the frequency doubler in Figure 3.6a.
When an oscillator is injection locked to the doubler output, as in Figure
3.4, the spurious tones present in the doubled clock appear almost unchanged at
the output of the injection locked oscillator, as shown in Figure 3.6b. This is a
direct result of pushing the noise bandwidth higher by increasing the injection
rate and strength for DCO noise suppression. As described in Section 3.4, the
reference noise has a low-pass NTF with the same bandwidth as the NTF
for the DCO. Pushing this bandwidth to higher frequencies results in less
suppression of reference clock noise and any spurious tones. In other words,
because of the large noise bandwidth, the spurs present in the injection signal
are not adequately suppressed.
Figure 3.4: Block diagram for injection locking with reference frequency
doubling.
Figure 3.5: Frequency doubler timing diagram with duty cycle error on
input clock (TH ≠ TL; duty cycle error = |TH − TL|).
Figure 3.6: Frequency doubler input and output clock spectrums with duty
cycle error on input clock. (a) Input. (b) Output.
Transient simulations indicate that 0.2% duty cycle error, which corresponds
to 15 ps at 125 MHz, causes deterministic jitter of 7.2 ps on the output clock.
This represents only about 6 dB of suppression by the oscillator. Therefore, to make
the proposed frequency doubler scheme viable in practice, a duty cycle error
mitigation technique is needed.
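The suppression figure follows directly from the two jitter values quoted above. The short calculation below expresses the ratio of input to output deterministic jitter in dB; the variable names are illustrative.

```python
import math

input_dj_ps = 15.0    # deterministic jitter on the doubled clock (from the text)
output_dj_ps = 7.2    # deterministic jitter on the output clock (from the text)

suppression_db = 20.0 * math.log10(input_dj_ps / output_dj_ps)
print(f"Suppression: {suppression_db:.1f} dB")
```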
Figure 3.7: Ideal duty cycle error correction timing diagram, showing the
REFx2 clock with error, the corrected REFx2 clock, and the ideal REFx2 clock.
3.2.3 Duty Cycle Error Correction
Duty cycle error can be removed by adding a sequence of delays to each
positive edge of the REFx2 clock, as shown in Figure 3.7. The shorter of the
two periods is increased by applying a −∆T delay to its leading edge, while
the longer period gets shortened by applying a +∆T delay to its leading
edge. Therefore, setting |∆T| = TDCE/2 eliminates deterministic jitter on
the doubled clock. This correction method is depicted by the block diagram
in Figure 3.8. The delays are applied by a delay block with a magnitude of
∆T . The sign of the delay is controlled by a signal which alternates between
±1 at each falling edge of the REFx2 clock, resulting in a +∆T/−∆T delay
sequence. This delay sequence completely removes the duty cycle error.
In practice, “negative delay” can be achieved by setting the total delay
applied to each edge to an offset delay plus ±∆T , as in Equation 3.1. When
a −∆T is added the resulting edge will occur earlier than an edge that was
only delayed by Toff , and an edge with +∆T added will occur later. This
realizes a “negative delay” and a “positive delay” with respect to the edge
with only Toff applied.
\[ T_{Delay} = T_{off} \pm \Delta T \tag{3.1} \]
The complete block diagram of the proposed duty cycle error correction
scheme is shown in Figure 3.9. A digitally controlled delay line (DCDLCAL)
is used to apply delays to the REFx2 clock edges. DCDLCAL is controlled
by a digital control word, DCW[k], which controls the total delay at time
step k according to Equation 3.2. DCW[k] is added to the mid-code value
of the delay line to create the effective control word. The total delay is equal
to this effective control word times the DCDL gain, GDCDL, which is defined
as the delay change per one control word step. Relating this expression to
Equation 3.1, the offset delay is equal to a static value given by Equation
3.3, and ∆T is a time-varying quantity given by Equation 3.4. A “negative
delay” occurs by applying a negative DCW[k], while a “positive delay” is
added by a positive DCW[k].
\[ T_{Delay}[k] = G_{DCDL}\,(DCW_{mid} + DCW[k]) \tag{3.2} \]
\[ T_{off} = G_{DCDL}\, DCW_{mid} \tag{3.3} \]
\[ \Delta T[k] = G_{DCDL}\, DCW[k] \tag{3.4} \]
Figure 3.8: Ideal duty cycle error correction technique.
Figure 3.9: Complete duty cycle error correction architecture.
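Equations 3.2 through 3.4 can be illustrated with a few lines of Python. The delay line gain of 0.5 ps per code and the mid-code of 128 are hypothetical values chosen for the example, not parameters of the fabricated DCDL.

```python
def dcdl_delay_ps(dcw, dcw_mid, gain_ps_per_lsb):
    """Total delay of the delay line for control word dcw (Equation 3.2)."""
    return gain_ps_per_lsb * (dcw_mid + dcw)

G_DCDL = 0.5     # hypothetical delay line gain, ps per control word step
DCW_MID = 128    # hypothetical mid-code of the delay line

t_off = dcdl_delay_ps(0, DCW_MID, G_DCDL)     # offset delay (Equation 3.3)
early = dcdl_delay_ps(-4, DCW_MID, G_DCDL)    # negative DCW[k]: edge arrives earlier
late = dcdl_delay_ps(+4, DCW_MID, G_DCDL)     # positive DCW[k]: edge arrives later

print(f"T_off = {t_off} ps, early edge = {early} ps, late edge = {late} ps")
print(f"Delta T = {late - t_off} ps per edge (Equation 3.4)")
```

Both delays remain positive, so the delay line is physically realizable, yet one edge arrives 2 ps earlier and the other 2 ps later than an edge delayed by Toff alone.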
The expression for DCW[k] is given by Equation 3.5. Here, a control signal,
KCAL[k], is multiplied by a ±1 template sequence, T[k], which flips its sign
at every falling edge of the REFx2 clock. Therefore, T[k] determines the sign
of the delay while KCAL determines its magnitude by Equation 3.6. KCAL
can be thought of as the tuning knob that precisely sets the magnitude of the
delay to TDCE/2, and it is the key parameter to tune in order to
remove the duty cycle error. Setting |∆T| = TDCE/2 in Equation 3.6 and
solving for KCAL gives the ideal value for removing the duty cycle error,
Equation 3.7.
\[ DCW[k] = T[k]\, K_{CAL}[k] \tag{3.5} \]
\[ |\Delta T[k]| = G_{DCDL}\, K_{CAL}[k] \tag{3.6} \]
\[ K_{CAL} = \frac{T_{DCE}}{2\, G_{DCDL}} \tag{3.7} \]
In reality, GDCDL is unknown and sensitive to PVT variations, and TDCE
is time varying and depends on the quality of the reference clock. Because
of this, KCAL must be able to adapt to changes in these two quantities to
precisely remove the duty cycle error.
KCAL[k] is estimated in a background manner using a sign-sign least-mean-square
(LMS) correlation algorithm that relies upon the detection of duty
cycle error. To detect the error, a sub-sampling bang-bang phase detector
is used that compares the phase of the REFx2 clock with the output clock.
This comparison generates an error signal, ERR[k], which when correlated
with the ±1 template signal, T[k], quantifies the presence of duty cycle error.
This observation relies upon the fact that the magnitude of the deterministic
period jitter on the output clock is slightly less than the magnitude on the
injection clock due to some suppression by its NBW, as was observed in
Section 3.2.2. Therefore, the rising edges of the output clock will always be
closer in time to an ideal rising edge. When REFx2 is compared to OUT in
the presence of duty cycle error, the edges of REFx2 will occur early, then late,
as illustrated in Figure 3.10. As long as the deterministic difference between the
edges of REFx2 and OUT is larger than the random jitter on both edges, the
duty cycle error is detectable by the sub-sampling bang-bang phase detector.
The resulting ERR[k] signal will be a recurring sequence of ±1. Thus, when
ERR[k] is correlated with T[k], a large positive result indicates the presence
of duty cycle error.
Figure 3.10: Relationship between REFx2 edges and OUT edges with duty
cycle error.
The LMS block correlates T[k] with ERR[k] to determine KCAL by
multiplying the two signals and accumulating the result, as given by
Equation 3.8. The LMS gain factor, µLMS, is set to ensure stability of the loop.
Initialized to 0, KCAL[k] will increase in the presence of duty cycle error,
gradually increasing |∆T| until ERR[k] and T[k] are no longer correlated.
At this point, any detectable duty cycle error has been removed and ERR[k]
is dominated by the random noise in the loop.
\[ K_{CAL}[k] = K_{CAL}[k-1] + \mu_{LMS}\, ERR[k]\, T[k] \tag{3.8} \]
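The update in Equation 3.8 can be exercised with a toy behavioral model. The sketch below is not the simulation used in this thesis. It assumes that the bang-bang phase detector sees the residual edge error T[k](∆Ttarget − GDCDL KCAL[k]) plus Gaussian random jitter, and the target delay, delay line gain, LMS gain, and noise level are all hypothetical values.

```python
import random

def sign(x):
    """Bang-bang decision: +1 for non-negative input, -1 otherwise."""
    return 1 if x >= 0 else -1

def run_lms(target_delta_ps, gain_ps_per_lsb, mu, noise_ps, n_steps, seed=1):
    """Sign-sign LMS estimate of K_CAL under the assumed toy error model."""
    rng = random.Random(seed)
    k_cal = 0.0
    history = []
    for k in range(n_steps):
        template = 1 if k % 2 == 0 else -1                    # T[k], alternating +/-1
        residual = target_delta_ps - gain_ps_per_lsb * k_cal  # uncorrected edge error
        err = sign(template * residual + rng.gauss(0.0, noise_ps))  # ERR[k]
        k_cal += mu * err * template                          # Equation 3.8
        history.append(k_cal)
    return history

G_DCDL = 0.5  # hypothetical delay line gain, ps per code
history = run_lms(target_delta_ps=20.0, gain_ps_per_lsb=G_DCDL,
                  mu=0.05, noise_ps=1.0, n_steps=2500)
final_delta_ps = G_DCDL * history[-1]
print(f"Applied |Delta T| after settling: {final_delta_ps:.2f} ps")
```

In this model KCAL ramps up while ERR[k] and T[k] are correlated and then dithers around the value at which the applied delay matches the target, which mirrors the qualitative behavior described above.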
It is important to notice that delays in the pulse generation and injection
mean there is no predefined relationship between OUT and REFx2, making
it impossible to directly compare the two. This time offset may saturate the
bang-bang PD, resulting in a continuous output stream of either +1 or −1
and rendering it ineffective for detecting duty cycle error. To alleviate this, a
delay locked loop (DLL) consisting of the bang-bang PD and an accumulator
(ACCDLL) tunes the delay of another digitally controlled delay line, DCDLDLL,
such that the bang-bang PD inputs are aligned on average, removing any
static offset. This ensures that the error signal contains the residual duty cycle
error, allowing its use in the LMS algorithm.
Figure 3.11: Simulated duty cycle error correction loop settling.
3.2.4 Settling Behavior
The simulated settling behavior of the duty cycle error correction loop with a
125 MHz reference clock is shown in Figure 3.11. Starting from a duty cycle
error of 40 ps, the loop converges in less than 10 µs. The period jitter of the
REFx2 clock is gradually reduced as KCAL settles. Once the duty cycle error
is removed, KCAL dithers around its optimal point. After 10 µs, the REFx2
clock can be treated as a clean 250 MHz input clock for the injection locking
circuit and the ILCM achieves the benefits of extended noise bandwidth.
3.3 Frequency Tracking
As illustrated by Figure 3.12, frequency error, defined as the difference be-
tween the free running frequency of the DCO and the target output frequency,
in an ILCM can have a detrimental effect on overall phase noise performance
Figure 3.12: Injection locked clock spectrum with frequency error. Left: free running frequency = N*FREF (FOUT = FFR). Right: free running frequency = N*FREF-FERR, giving a higher reference spur and degraded phase noise performance.
[14]. Figure 3.13 shows the results of a simulation in which a 125 MHz clock is
doubled and injected into an oscillator running at 5 GHz. In the absence of
frequency error, the total integrated jitter is 482 fsrms. When frequency error
is added, the phase noise increases drastically. Each additional 2 MHz of error
between the free running frequency of the DCO and the desired output frequency
roughly doubles the total integrated jitter and severely degrades spurious
performance. As a result, it is extremely important to precisely tune
the free running frequency in an ILCM.
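The scale of the problem can be checked with a short Python calculation. The jitter values are the ones listed in the legend of Figure 3.13. The drift estimate is only a first-order illustration and assumes the DCO runs freely for one full 250 MHz (REFx2) injection period; the function name is hypothetical.

```python
def drift_per_injection_ps(f_out_hz, f_inj_hz, f_err_hz):
    """Timing error built up over one injection period when the DCO
    free-runs f_err_hz below the target output frequency."""
    n_cycles = f_out_hz / f_inj_hz            # output cycles between injections
    t_free = 1.0 / (f_out_hz - f_err_hz)      # free-running period
    t_target = 1.0 / f_out_hz                 # desired period
    return n_cycles * (t_free - t_target) * 1e12


# Integrated jitter in ps versus frequency error in MHz (Figure 3.13 legend).
jitter_ps = {0: 0.482, 2: 0.944, 4: 1.837}
jitter_ratios = [jitter_ps[2] / jitter_ps[0], jitter_ps[4] / jitter_ps[2]]

drift_2mhz_ps = drift_per_injection_ps(5e9, 250e6, 2e6)
print(jitter_ratios, drift_2mhz_ps)
```

Under these assumptions a 2 MHz error accumulates roughly 1.6 ps of timing error before each injection pulse resets the phase, which is already several times the 0.482 ps integrated jitter of the error-free case.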
3.3.1 Frequency Tracking Loop
To remove frequency drift and maintain the free running frequency of the
DCO equal to the target frequency, this work uses a background frequency
tracking loop shown in Figure 3.14. This technique is similar to the one
introduced by Elkholy et al. in [14].
In the presence of frequency error, phase error between REFx2 and OUT
will build between each injection pulse. When the injection pulse arrives,
the phase of OUT is reset, resulting in large jitter and a high REFx2 spur
as can be seen in Figures 3.12 and 3.13. The frequency tracking loop detects
this accumulated phase error and corrects the frequency of the oscillator. To
do this, because the phase is reset at each injection pulse, one pulse out of
every M (the gating rate) pulses is “gated”, meaning that the injection is
disabled. Because there was no injection pulse, the FTL is able to measure
the accumulated phase in N (the multiplication factor) cycles by comparing
the phase of REFx2 with OUT using the sub-sampling bang-bang phase
detector. Because REFx2 and OUT are aligned on average by the delay
Figure 3.13: Effect of frequency error on phase noise. Axes: magnitude (dBc/Hz) versus frequency (Hz). Traces: FERR = 0 MHz, σrms = 0.482 psrms; FERR = 2 MHz, σrms = 0.944 psrms; FERR = 4 MHz, σrms = 1.837 psrms.
Figure 3.14: Block diagram of the frequency tracking loop. Labeled blocks: !!PD, DCDLDLL, ACCDLL (gain KDLL), ACCI (gain KI), Pulse Gating Logic, Pulse Gen., FTL.
locked loop as described in Section 3.2.3, this direct comparison is possible.
The output of the phase detector is the sign of the frequency error present
in the oscillator. This sign is integrated using a digital accumulator (ACCI)
whose output updates the frequency of the injection locked DCO incremen-
tally at every pulse gating event. Eventually, the output of the accumulator,
and thus the frequency of the oscillator, will reach a point where the sign
of the frequency error will dither between each gating event. This indicates
that the frequency error has been fully corrected.
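The bang-bang integration described above can be sketched as a noiseless toy model in Python. It assumes that frequency error is defined as target minus free running frequency, that each accumulator LSB raises the DCO frequency by a fixed step (50 kHz is used here for illustration), and that the PD returns the exact sign of the residual error at every gating event. The function and parameter names are hypothetical.

```python
def simulate_ftl(f_err_hz=2.0e6, step_hz=50e3, k_i=1, n_gating_events=200):
    """Integrate the sign of the frequency error once per gating event."""
    acc_i = 0
    residuals = []
    for _ in range(n_gating_events):
        residual_hz = f_err_hz - acc_i * step_hz  # error left after correction
        residuals.append(residual_hz)
        pd_sign = 1 if residual_hz > 0 else -1    # bang-bang PD output
        acc_i += k_i * pd_sign                    # ACC_I update
    return acc_i, residuals


acc_final, residual_trace = simulate_ftl()
print(acc_final, residual_trace[-4:])
```

In this model the accumulator ramps until the residual error changes sign and then toggles between two adjacent codes, which corresponds to the dithering that signals convergence.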
The FTL is controlled by a digital logic block labeled “Pulse Gating Logic.”
Its function is to control the gating signal by counting clock cycles and gating
every Mth pulse. It is clocked by the falling edge of the REFx2 clock. The
logic block also controls the inputs to accumulators ACCDLL and ACCI such
that ACCI only accumulates the output of the phase detector after a pulse
has been gated. Conversely, ACCDLL and the LMS block from Figure 3.9 do
not accumulate the phase detector output after a pulse gating event. The
FTL is run slowly so as not to interfere with the DCE correction loop
or the DLL offset removal, which ensures robust operation across supply and
temperature variations.
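The routing performed by the pulse gating logic can be illustrated with a minimal Python sketch. It simplifies the timing by treating sample k as the phase detector decision associated with injection slot k, and it gates the last slot of every group of M. The function name and the list-based interface are hypothetical.

```python
def route_pd_samples(pd_samples, m):
    """Split PD decisions between ACC_I and the DLL/LMS accumulators."""
    acc_i_inputs = []     # decisions taken after a gated pulse (FTL)
    dll_lms_inputs = []   # decisions taken after a normal injection
    for k, sample in enumerate(pd_samples):
        gated = (k % m) == (m - 1)   # every Mth pulse is gated
        if gated:
            acc_i_inputs.append(sample)
        else:
            dll_lms_inputs.append(sample)
    return acc_i_inputs, dll_lms_inputs


ftl_in, dll_in = route_pd_samples([1, -1, 1, 1, -1, -1, 1, -1], m=4)
print(ftl_in, dll_in)
```

Because only one decision in M reaches ACCI, a larger gating rate M makes the FTL update less often, which is one way to keep it slow relative to the other two loops.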
3.3.2 Startup Sequence
To ensure convergence of each loop, a particular startup sequence must be
followed. First, the injection locking is enabled and the DLL is locked. During
this time, it is important that the frequency tracking loop and the duty
cycle error correction loop are turned off because the output of the phase
detector is stuck at either +1 or −1 while the DLL is settling. If enabled, these
loops could be pushed in the wrong direction, increasing their convergence
time or even preventing convergence altogether. Once the DLL is locked,
allowing the phase detector to directly compare the REFx2 and OUT clocks,
the duty cycle error correction loop is enabled. Once duty cycle error has
been removed, the FTL can safely be turned on and pulse gating is enabled.
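The ordering above can be written down as a small Python table of enables. This is only a sketch of the sequence in the text: the phase names and dictionary keys are hypothetical, and the conditions for advancing (DLL locked, duty cycle error removed) are not modeled.

```python
from enum import Enum


class StartupPhase(Enum):
    DLL_LOCK = 1             # injection on, DLL settling
    DCE_CORRECTION = 2       # DLL locked, duty cycle loop on
    FREQUENCY_TRACKING = 3   # duty cycle error removed, FTL and gating on


def enabled_loops(phase):
    """Return which loops are enabled in a given startup phase."""
    loops = {"injection": True, "dll": True,
             "dce_correction": False, "ftl": False, "pulse_gating": False}
    if phase in (StartupPhase.DCE_CORRECTION, StartupPhase.FREQUENCY_TRACKING):
        loops["dce_correction"] = True
    if phase is StartupPhase.FREQUENCY_TRACKING:
        loops["ftl"] = True
        loops["pulse_gating"] = True
    return loops


for phase in StartupPhase:
    print(phase.name, enabled_loops(phase))
```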
3.4 Flicker Noise Suppression
Ring oscillator flicker noise, because of its accumulation within the ring,
grows by 30 dB/decade from the flicker noise corner toward lower frequencies.
As process nodes scale downward, the flicker noise corner of ring oscillators
tends to occur at higher frequencies. This allows for a significant amount of
low-frequency noise to degrade the performance of a ring-based ILCM. Most
ILCMs in the literature exhibit first-order suppression of low-frequency oscillator
noise, allowing 10 dB/decade of flicker noise to accumulate from the flicker
noise corner toward lower frequencies. While this is acceptable for LC-based architectures
or ring-based architectures in larger process nodes, flicker noise needs to be
addressed for ring-ILCMs at smaller process nodes. As was shown by Chien et
al. in [16], it is possible to achieve second-order suppression in an ILCM
using a technique called pulse-position-modulation.
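The slopes quoted above follow from simple asymptotics. The relations below are a sketch that holds only well below the flicker noise corner and well below the loop poles; H_1 and H_2 are generic first-order and second-order high-pass responses introduced here for illustration, not symbols from the thesis.

```latex
% Free-running ring oscillator, flicker region: rises 30 dB/decade toward low f
S_{\Phi,\mathrm{DCO}}(f) \propto \frac{1}{f^{3}}

% First-order suppression (20 dB/decade): 10 dB/decade of flicker noise remains
|H_1(f)|^{2} \propto f^{2}
\;\Rightarrow\;
S_{\Phi,\mathrm{OUT}}(f) \propto \frac{1}{f}

% Second-order suppression (40 dB/decade): output noise falls toward low f
|H_2(f)|^{2} \propto f^{4}
\;\Rightarrow\;
S_{\Phi,\mathrm{OUT}}(f) \propto f
```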
This work uses pulse-position-modulation and adapts the work by Chien
into a digital architecture. Comparing the architecture in Figure 3.14 to the
FTL introduced in [14], there is one major difference. In this work, the DCDL
has been moved from the reference path and placed directly in the injection
path. This move achieves second-order flicker noise suppression of the ring
DCO at the expense of more DCDL quantization noise leaking to the output.
To understand this trade-off, observe the phase domain block diagram in Figure
3.15, which is built using the PDR analysis
introduced in [12] and extended in [14]. The three major sources of noise in
the system (DCO flicker and thermal noise, DCDL quantization noise, and
reference noise) are labeled ΦDCO, ΦDCDL, and ΦREF, respectively.
The noise transfer functions for each of the noise sources are given by
Equations 3.9–3.12. The NTFs are plotted versus frequency in Figure 3.16.
The DCO has a second-order transfer function resulting in 40 dB/decade of
low-frequency noise suppression. The two poles in the system are the result of
the injection and the delay locked loop used for aligning the REFx2 and OUT
phase. With FREF at 125 MHz, the injection pole (and thus the noise bandwidth)
sits at approximately 40 MHz (FREF/3) in
Figure 3.16 and is programmable through the injection strength,
β. The low-frequency pole is programmable through
KDLL, the gain of the delay locked loop accumulator.
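The 40 dB/decade behavior can be visualized with a generic two-pole high-pass response in Python. This is a continuous-time stand-in with two real poles, not an evaluation of Equations 3.9 to 3.12. The 40 MHz injection pole comes from the text, while the 1 MHz low-frequency pole is a hypothetical value chosen only for illustration.

```python
import math


def ntf_dco_mag_db(f_hz, f_inj_hz=40e6, f_dll_hz=1e6):
    """Magnitude in dB of a generic high-pass response with two real poles."""
    mag = 1.0
    for pole_hz in (f_inj_hz, f_dll_hz):
        x = f_hz / pole_hz
        mag *= x / math.sqrt(1.0 + x * x)   # one first-order high-pass section
    return 20.0 * math.log10(mag)


slope_db_per_decade = ntf_dco_mag_db(1e4) - ntf_dco_mag_db(1e3)
print(slope_db_per_decade, ntf_dco_mag_db(1e10))
```

Well below both poles the response falls at 40 dB/decade, and well above both it approaches 0 dB, so oscillator noise passes unattenuated beyond the noise bandwidth.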
Figure 3.15: Phase domain model of the ILCM. Labeled blocks: REF, N, KDCDL, DCDL, DCO, OUT.
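\[ LG(s) = \frac{\beta}{1 - z^{-1}} \cdot \frac{G_{!!PD}\, G_{DCDL}\, K_{DLL}}{1 - z^{-1}} \qquad (3.9) \]
\[ NTF_{DCO}(s) = \frac{\Phi_{DCO}(s)}{\Phi_{OUT}(s)} = \frac{1}{1 + LG(s)} \qquad (3.10) \]
\[ NTF_{DCDL}(s) = \frac{\Phi_{DCDL}(s)}{\Phi_{OUT}(s)} = \frac{\beta}{1 - z^{-1}} \cdot \frac{1}{1 + LG(s)} \qquad (3.11) \]
\[ NTF_{REF}(s) = \frac{\Phi_{REF}(s)}{\Phi_{OUT}(s)} = N \left( 1 - \frac{G_{!!PD}\, G_{DCDL}\, K_{DLL}}{1 - z^{-1}} \right) \frac{1}{1 + LG(s)} \qquad (3.12) \]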
While this flicker noise suppression is desirable, the addition of the DCDL
quantization noise in the reference path incurs a significant penalty. Because
the low-frequency pole needs to be placed far from
the injection pole to prevent peaking in the DCO transfer function, the DCDL
noise transfer function has a large flat region where all of its noise leaks to the output.
Therefore, the locations of the two poles are extremely important. In this design, these
pole locations are programmable so that an optimal point can be found. Lastly,
as noted above, the reference noise has a low-pass transfer function with its
pole located at the noise bandwidth, motivating the need for the duty cycle
error correction.
Figure 3.16: Noise transfer functions of various blocks in the ILCM. Axes: magnitude (dB) versus frequency (Hz). Traces: NTF DCO, NTF DLL, NTF REF.
CHAPTER 4
CIRCUIT BUILDING BLOCKS
In this chapter, the circuit realization of the block diagram described in
Chapter 3 is presented. In particular, the transistor-level implementation
of a few major building blocks is discussed. Section 4.1 describes the ring-
based digitally controlled oscillator and Section 4.2 discusses the digitally
controlled delay line. Lastly, Section 4.3 shows how these blocks fit together
to implement the full architecture.
4.1 Ring-Based Digitally Controlled Oscillator
The circuit diagram for the ring-DCO is shown in Figure 4.1. The oscillator
is composed of four pseudo-differential delay cells connected in a ring topology.
Differential operation was chosen for its superior power supply noise rejection
compared to single-ended [4]. Each of the delay cells is implemented using
two current starved CMOS inverters with resistor feed-forward coupling for
differential operation. Inverter-based delay cells were chosen for their su-
perior speed at lower power compared to other delay cell implementations.
The frequency of the oscillator is controlled by precisely tuning the amount
of current fed into each of the inverters.
To achieve a wide tuning range and fine frequency control, current control
is split into coarse and fine paths. The coarse path is controlled by a 5-
bit digital-to-analog converter (DAC) with a current output. The current
DAC is constructed by simply switching 32 PMOS current sources with equal
current set by an external current reference. The control signals for the
switches are generated by digital registers that are externally writable. The
fine path is controlled by the frequency tracking loop with 12 bits. A second-
order digital ∆Σ modulator truncates the 12 bits to 5 bits and a 32-bit
thermometer encoded version of this 5-bit signal controls another current
DAC. This DAC is similar to the one in the coarse path, but with a separate
external current reference. The output of the current DAC is converted into
voltage through resistor R1. Due to the noise shaping of the ∆Σ modulator,
which generates high-frequency noise, the voltage on resistor R1 is low-pass
filtered by capacitors C1 and C2 and resistor R2 to generate the control
voltage for the fine-path current source. The output of this current source
is then summed with the current from the coarse path to generate the total
control current for the current-starved inverters, thus precisely controlling
the frequency of the oscillator.
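The fine path word-length reduction can be sketched in Python. The text does not give the modulator topology, so this example assumes a second-order error-feedback structure with noise transfer function (1 − z⁻¹)², a common choice for digital ∆Σ truncation. The function names are hypothetical, and the thermometer width of 32 follows the text.

```python
def delta_sigma_2nd_order(codes_12bit, in_bits=12, out_bits=5):
    """Truncate in_bits codes to out_bits codes with second-order noise shaping."""
    lsb = 1 << (in_bits - out_bits)    # one output LSB in input units (128)
    max_out = (1 << out_bits) - 1      # largest output code (31)
    e1 = e2 = 0                        # last two truncation errors
    out = []
    for x in codes_12bit:
        v = x + 2 * e1 - e2            # error feedback: NTF = (1 - z^-1)^2
        y = min(max(v // lsb, 0), max_out)
        e2, e1 = e1, v - y * lsb       # store the new truncation error
        out.append(y)
    return out


def to_thermometer(code, width=32):
    """Thermometer code: the lowest `code` positions are set."""
    return [1 if i < code else 0 for i in range(width)]


stream = delta_sigma_2nd_order([2000] * 4096)
print(sum(stream) / len(stream), to_thermometer(stream[0])[:18])
```

The 5-bit output toggles among a few adjacent codes whose average equals the 12-bit input divided by 128, and the shaped high-frequency error is what the R1, R2, C1, C2 filter is meant to remove.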
The pulse injection is performed by driving an NMOS switch that shorts
the two differential clock signals, pulling the phase of the ring oscillator
toward the phase of the injection pulse. To make the injection strength, β,
programmable, the width of the pulse is adjustable and the injection switch
is split into two. The injection pulse can be directed to the switches
independently, so that either switch or both can be chosen to inject. The
injection switches were sized to achieve a large injection strength,
but their size was limited by the switch loading effect described in Section
3.2.
The oscillator frequency is dependent on the control current, as well as
the size of the inverters and the injection switch. The inverters were sized
such that the oscillator consumes approximately 3 mW of power at 5 GHz.
The current sources were designed to allow for oscillator frequencies between
2.5 GHz and 5.75 GHz across a supply range of 0.9 V to 1.1 V. At 1.0 V
the coarse frequency step was set to be 100 MHz, but is externally tunable
by modifying its reference current. The fine frequency step size was set at
approximately 50 kHz and is externally tunable as well. The DCO was given a
custom layout, with care taken in placing the clock lines to
reduce noise. The DCO layout is shown in Figure 4.2.
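A quick consistency check on the step sizes quoted above, assuming uniform steps and a 5-bit ∆Σ output that spans the same range as the 12-bit fine word (both are assumptions, not statements from the text):

```latex
% Total fine tuning range versus one coarse step
\Delta f_{\mathrm{fine,range}} \approx 2^{12} \times 50\ \mathrm{kHz}
  = 204.8\ \mathrm{MHz} \approx 2 \times 100\ \mathrm{MHz}

% Frequency change per LSB of the 5-bit fine DAC
\Delta f_{\mathrm{DAC,LSB}} \approx 2^{12-5} \times 50\ \mathrm{kHz}
  = 6.4\ \mathrm{MHz}
```

Under these assumptions the fine path covers about two coarse steps, so adjacent coarse settings overlap within the fine range.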
4.2 Digitally Controlled Delay Line
The architecture consists of two DCDLs, DCDLCAL and DCDLDLL, as shown
in Figure 3.9. Both are implemented using a 10-bit controllable series of
digitally controlled delay cells (DCDC). Figure 4.3 shows the circuit block
diagram for the DCDL. The circuit is similar to the one demonstrated in
Figure 4.1: Schematic of the digitally controlled ring oscillator.
Figure 4.2: Layout of the ring oscillator.
Figure 4.3: Block diagram of the digitally controlled delay line. Labeled signals: CLKIN, CLKOUT, DCW[9:4], DCW[3:0]; cell groups: x8, x4, x2, x1, x1.
[17] with extra stages added for 10-bit operation. The DCDL consists of
16 cascaded DCDC blocks which ensures fast rise and fall times by placing
strong inverters in each cell to restore the edges. Each DCDC block consists
of an inverter, followed by a 6-bit tunable 64-unit MOS-capacitor bank and
a single unit capacitor, followed by another inverter. The six MSBs are fed
into each of the DCDC blocks while the four LSBs are fed into a number of
DCDC blocks depending on their binary weight and control the single unit
capacitor.
DCDLCAL was designed to correct ±5% duty cycle error on
a 125 MHz reference clock, a figure typical of crystal oscillators. It was
configured for a tunable range of 400 ps with an approximately 400 fs
step size at a supply of 1.0 V. DCDLDLL was designed to span one oscillator
period at the lowest output frequency, 2.5 GHz, and was also configured for
a range of 400 ps with a 400 fs step size, at 0.9 V. When operating toward
the high-frequency range, the supply of DCDLDLL was raised in order to
achieve better precision. At 1.1 V, its step size reduces to 200 fs for a range
of 200 ps.
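The segmented control described in this section can be checked with a short Python sketch. It assumes that the 6-bit MSB value sets how many unit capacitors are enabled in every one of the 16 banks, that LSB bit i drives the single unit capacitor in 2^i blocks (the x1, x2, x4, x8 groups), and that delay is linear in the number of enabled units. These mapping details and all names are assumptions made for illustration.

```python
N_BLOCKS = 16                 # cascaded DCDC blocks
LSB_GROUP_SIZES = (1, 2, 4, 8)  # blocks driven by DCW[0], DCW[1], DCW[2], DCW[3]


def enabled_unit_caps(dcw):
    """Total unit capacitors switched in for a 10-bit delay control word."""
    if not 0 <= dcw < 1024:
        raise ValueError("dcw must be a 10-bit value")
    msb = dcw >> 4            # DCW[9:4], applied to every block
    lsb = dcw & 0xF           # DCW[3:0], applied to binary-weighted groups
    bank_units = N_BLOCKS * msb
    single_units = sum(size for bit, size in enumerate(LSB_GROUP_SIZES)
                       if (lsb >> bit) & 1)
    return bank_units + single_units


def delay_ps(dcw, step_ps=0.4):
    """Added delay, assuming each unit capacitor contributes one step."""
    return enabled_unit_caps(dcw) * step_ps


print(enabled_unit_caps(0b1000000101), delay_ps(1023))
```

With this mapping the number of enabled units equals the control word itself, so the line behaves as a 10-bit linear delay control with a full-scale range of about 409 ps at a 0.4 ps step. The 400 ps target is consistent with both requirements in the text: 5% of the 8 ns period of a 125 MHz clock and one period at 2.5 GHz are each 400 ps.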
4.3 Full Architecture
Figure 4.4 shows the complete architecture of the ring-based ILCM. The
reference doubler consists of the XOR based doubling circuit, DCDLCAL for
edge delays, and associated digital calibration logic. The FTL and DLL
consist of digital logic and DCDLDLL. These blocks interface with the ring-
DCO. The output clock is fed into the bang-bang phase detector along with
Figure 4.4: Block diagram of the full ILCM architecture. Stage 1: Ref. Doubler (REF input, calibrated delay stage, REF2 output). Stage 2: ILCM (!!PD, ACCDLL, ACCI, Pulse Gating Logic, ∆Σ, DCDLDLL, Pulse Gen., Ring DCO).
the output of the reference doubler. The fine frequency path is controlled by
the FTL, and the injection switch is driven by the pulse generation block.
The ILCM was implemented in a 65 nm CMOS process and its die photo is
shown in Figure 4.5. It occupies an active area of 0.09 mm².
Figure 4.5: Die micrograph of the ILCM. Labeled blocks: DCO, DCDL, DUB, DIGITAL. Dimensions: 310 µm by 290 µm.
CHAPTER 5
RESULTS
In this chapter, measurement results for the ring-based injection locked clock
multiplier are presented. The ILCM (circuit diagram shown in Figure 4.4,
die micrograph in Figure 4.5) was fabricated in a TSMC 65 nm CMOS process. The
36-pin chip was placed onto a custom designed printed circuit board (PCB)
and tested using high-precision clocking equipment. The measurement setup
is described in Section 5.1 and the resulting measurements are presented in
Section 5.2. Finally, a summary of the results is given in Section 5.3.
5.1 Measurement Setup
To properly measure the fabricated circuit, a custom PCB was designed. Its
functions were: (1) to provide clean power supplies to each power domain;
(2) to generate current references for each of the external current mirrors;
(3) to interface the digital control of the chip with a USB connection such
that internal digital settings could be changed; (4) to connect the input and
output clocks to their respective instruments; and (5) to do this in such a
way that everything is measurable without interfering with the performance
of the chip. The custom PCB is shown in Figure 5.1.
The PCB provides power to each of the seven power domains on the chip
(DCO, DCDL, pulse injection, reference buffer, output buffer, digital, and
serial interface). Since the chip has no voltage regulators, external LDOs are
used to supply clean voltages and allow for external control of the supply
levels. Each of the power domains has its own LDO and its current draw is
measurable such that the power consumed by each portion of the circuit is
observable. The current references for the DCO and for the input and output
buffers are tapped from these clean supplies. The supply is passed through
a potentiometer such that the external current reference is controllable.
Figure 5.1: PCB for testing the fabricated ILCM.
Digital registers on chip that contain the configuration information for
the digital circuitry are programmable using a serial interface. The serial
interface on the chip is connected via SPI to an external microcontroller which
sends configuration messages. The microcontroller is connected to a lab
computer via USB and register configuration is controlled with a MATLAB
interface.
The reference and output clocks are fed into SMA connectors which allow
for connection with lab instruments. The reference clock is provided by a
Rohde & Schwarz SMA100A Signal Generator. The output clock is connected
to a Rohde & Schwarz FSUP Signal Source Analyzer for measurement. The
full test setup consists of the signal generator, source analyzer, PCB with
fabricated ILCM, microcontroller, and lab computer.
5.2 Measurements
5.2.1 Phase Noise
The phase noise measurement begins by allowing the ILCM to phase lock at
a particular output frequency. With the DCO supply fixed, this is done by
manually setting the coarse frequency control word such that the open-loop
output frequency is within the pull-in range of the injection. Because of the
large coarse step, some of the MSBs of the fine path may also be manually
set. Once within the pull-in range, the pulse-injection circuit is enabled and
the circuit phase locks the output clock to the reference clock. The duty
cycle correction loop and the frequency tracking loop are then enabled.
The measured phase noise plot of the ILCM at 5 GHz is shown in Figure
5.2. The green line denotes the phase noise when a 125 MHz reference clock
is used with the doubler turned off. Integrated jitter under this condition is
480 fsrms with integration limits of 10 kHz to 40 MHz. When a 250 MHz ref-
erence clock is used with the doubler also off, phase noise is greatly reduced.
The resulting integrated jitter is 320 fsrms, an improvement of 33%. When
the doubler is enabled with FREF equal to 125 MHz, an integrated jitter of 337
fsrms is achieved. This result demonstrates the effectiveness of the internal
reference frequency doubler and duty cycle error correction as only 17 fsrms
extra integrated jitter is added compared to the case when a 250 MHz ex-
ternal reference clock was employed. To achieve 3.5 dB in-band phase noise
improvement solely by increasing the power of the DCO, its power would
need to be more than doubled. The implemented approach achieves this
improvement with only 0.5 mW extra power (15% of the power penalty of
doubling DCO power).
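The figures in this subsection can be reproduced with a few lines of Python. The power comparison assumes that oscillator phase noise improves by 3 dB for each doubling of power, a common thermal-noise scaling rule that is an assumption here, and it uses the 3.3 mW DCO power reported in Section 5.2.4. Variable names are hypothetical.

```python
jitter_125_fs = 480.0       # 125 MHz reference, doubler off
jitter_250_fs = 320.0       # 250 MHz reference, doubler off
jitter_doubler_fs = 337.0   # 125 MHz reference, doubler on

improvement_pct = 100.0 * (jitter_125_fs - jitter_250_fs) / jitter_125_fs
doubler_penalty_fs = jitter_doubler_fs - jitter_250_fs

# Power ratio needed for 3.5 dB lower phase noise if noise scales as 1/power.
power_ratio_for_3p5_db = 10.0 ** (3.5 / 10.0)

# Extra 0.5 mW compared with the 3.3 mW cost of doubling the DCO power.
extra_power_fraction = 0.5 / 3.3

print(improvement_pct, doubler_penalty_fs,
      power_ratio_for_3p5_db, extra_power_fraction)
```

Under the stated scaling assumption, 3.5 dB corresponds to a power ratio of about 2.2, which is why the text describes the required DCO power as more than doubled.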
5.2.2 Duty Cycle Error Correction
To quantify the effectiveness of the duty cycle error correction, Figure 5.3
shows measured reference spurs versus duty cycle error. To create this plot,
DCDLCAL was manually tuned to its optimal point where all of the duty cycle
error has been removed (within the LSB of the DCDL) which corresponds to
the lowest point for the measured spur at FREF. Then, the spur was measured
as the duty cycle error was increased by stepping DCDLCAL away from this optimum, and
the resulting duty cycle error was estimated by multiplying the step size by the
simulated gain of DCDLCAL. As shown in the output clock spectrum on the
left of Figure 5.3, when the correction is turned on, the loop reduces the spur
at FREF to -54 dBc, indicating that the duty cycle correction settles to within
one step of the ideal DCDLCAL code. This result indicates the effectiveness
of the duty cycle error correction technique described in Section 3.2.3.
Also shown in the spectrum in Figure 5.3, the measured spur at the injection
rate, FREFx2, is around -46 dBc. This indicates the convergence of the
frequency tracking loop to remove any free running frequency error.
Figure 5.2: Measured phase noise of the ILCM output. Traces: 125 MHz REF without REF doubler (jitter = 480 fsrms); 250 MHz REF without REF doubler (jitter = 320 fsrms); 125 MHz REF with REF doubler (jitter = 337 fsrms). Annotated in-band difference: 3.5 dB.
5.2.3 Supply Tracking and Output Frequency Range
The performance of the frequency tracking loop and the duty cycle error
correction as the supply voltage is swept is plotted in Figure 5.4. Both loops
operate in a background manner and thus should adapt to changes in supply
voltage which change the free running frequency of the DCO as well as the
gain of DCDLCAL. Variation in supply voltage can act as a proxy for the two
other major sources of circuit variation, process and temperature, and the
circuit performance as supply voltage is swept indicates its resilience to PVT
variations. In this measurement, the supply voltage is externally modified by
tuning the LDOs for the DCO and DCDL supplies and the loops are given
time to settle before a new measurement is made.
As the supply is swept from 0.97 V to 1.0 V, the magnitudes of the spurs
at FREF and FREFx2 stay relatively constant, as shown by the plot in
the bottom-left. The plot in the top-left indicates that the integrated jitter
stays around 350 fsrms across the entire supply sweep. This indicates that the
Figure 5.3: Measured reference spur versus duty cycle error and measured output spectrum. Spectrum annotations: REF2 spur at -46 dBc; REF spur at -54 dBc.
FTL and the duty cycle error correction loop both sufficiently track supply
variations and can be expected to maintain the performance of the ILCM
across a wide range of PVT variations.
Also shown in Figure 5.4 is the performance of the circuit across its wide
operating range. The ILCM is able to lock to frequencies within a range of
2.5-5.75 GHz when a reference frequency of 125 MHz is used. Because of
the internal doubling, the circuit locks to frequencies that are a multiple of
250 MHz. This wide frequency range is achieved across three DCO supply
voltage settings: 0.9 V, 1.0 V, and 1.1 V. As shown in the top-right, the
ILCM outputs a clock with sub-500 fsrms integrated jitter across the wide
frequency range. Power ranges from 3.5 mW to 6 mW across the frequency range
and increases with the DCO frequency, as shown in the bottom-right plot.
5.2.4 Power
The power consumption breakdown for the various voltage supply domains
on the chip is shown in Figure 5.5. When calculating total power, the input/output
buffers and the serial interface were left out. When operating
at 5 GHz, the ILCM consumes 5.3 mW from a 1.0 V supply, leading to
a power efficiency of 1.06 mW/GHz. The majority of this power is consumed
Figure 5.4: Measured performance of the ILCM across supply voltages and output frequencies. Panels: spur (dBc) versus supply voltage (V) for REF2 (FTL off), REF2 (FTL on), and REF (FTL on); integrated jitter (fsrms) versus supply voltage (V) with FTL off and FTL on; integrated jitter (fsrms) versus output frequency (GHz) at 0.9 V, 1.0 V, and 1.1 V; power (mW) versus output frequency (GHz) at 0.9 V, 1.0 V, and 1.1 V.
by the DCO, drawing 3.3 mW.
5.3 Performance Summary
A summary of the measurement results as well as a comparison to other
state-of-the-art architectures is given in Table 5.1. The ILCM achieves a
figure of merit (FoM) of -242.4 dB when operating at 5 GHz (a multiplication
factor of 40). Compared to the other works, this FoM is achieved at a
much higher operating frequency, a higher multiplication factor, and a larger
output frequency range. This demonstrates the effectiveness of the reference
frequency doubling technique as well as the corresponding duty cycle error
correction.
The work by Kim et al. in [9] also employed a reference frequency doubler
as well as duty cycle error correction. However, the highly analog nature of
their error detection led to an increase in power consumption, hurting their
overall FoM. Also, the MDLL architecture limits the achievable output frequency.
This work presented a fully digital correction scheme for an injection
locked clock multiplier and was able to achieve a better (lower) FoM while operating
at roughly twice the output frequency.
Figure 5.5: Power breakdown of the ILCM at 5 GHz output frequency.
Table 5.1: ILCM performance comparison.
                          Kim [9]       Musa [13]     Choi [15]     This Work [18]
Venue                     ISSCC’16      JSSC’14       ISSCC’16      ISSCC’17
Architecture              MDLL          ILCM          ILCM          ILCM
Technology [nm]           28            65            65            65
Output Freq. [GHz]        2.4           0.5-1.6       0.96-1.44     2.5-5.75
Ref. Freq. [MHz]          75            125-400       120           125
Mult. Factor              32            4             8-12          20-46
Output Jitter [fsrms]     700           700           185           340
Jitter Integration Range  [1k-40MHz]    [10k-40MHz]   [10k-40MHz]   [10k-40MHz]
Ref. Spur [dB]            -51.4         -57           -53           -45
Power [mW]                1.51          0.97          9.5           5.3
Power Eff. [mW/GHz]       0.63          0.80          6.60          1.06
Area [mm²]                0.024         0.022         0.060         0.090
FoM [dB] (1)              -241.3        -243.0        -244.9        -242.4
(1) \mathrm{FoM} = 10\log_{10}\left[\left(\frac{\sigma_{rms}}{1\,\mathrm{s}}\right)^{2} \cdot \left(\frac{\mathrm{Power}}{1\,\mathrm{mW}}\right)\right]
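The FoM definition in the table footnote can be evaluated directly in Python. The sketch below applies it to the jitter and power values listed in Table 5.1; the function and dictionary names are hypothetical. The computed values agree with the tabulated FoM entries to within a few tenths of a dB, and the remaining differences may come from rounding of the tabulated jitter and power.

```python
import math


def fom_db(jitter_s, power_mw):
    """FoM = 10*log10[(sigma_rms / 1 s)^2 * (Power / 1 mW)]."""
    return 10.0 * math.log10((jitter_s / 1.0) ** 2 * (power_mw / 1.0))


# (rms jitter in seconds, power in mW, FoM listed in Table 5.1)
table_5_1 = {
    "Kim [9]": (700e-15, 1.51, -241.3),
    "Musa [13]": (700e-15, 0.97, -243.0),
    "Choi [15]": (185e-15, 9.5, -244.9),
    "This Work [18]": (340e-15, 5.3, -242.4),
}

for name, (jitter_s, power_mw, listed_db) in table_5_1.items():
    print(name, round(fom_db(jitter_s, power_mw), 2), listed_db)
```

A more negative FoM is better, since it corresponds to lower jitter, lower power, or both.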
Figure 5.6: Figure of merit versus frequency for state-of-the-art ring-based clock multipliers. Axes: FoMJ [dB] versus frequency [GHz], where FoM = 10log[(σrms/1sec)² x (P/1mW)]. Plotted works: This Work (N=40); Elshazly, JSSC’13 (N=4); Ali, ISSCC’11 (N=8); Kim, ISSCC’16 (N=32); Helal, JSSC’08 (N=32); Choi, ISSCC’16 (N=12); Musa, JSSC’14 (N=4); Deng, ISSCC’13 (N=5); Deng, ISSCC’14 (N=5); Kim, VLSI’15 (N=5); Chien, ISSCC’14 (N=8).
The plot in Figure 5.6 shows that this circuit achieves a significant improvement
in FoM at a high frequency and a high multiplication ratio compared
to recently published ring-based clock multipliers. It occupies an area of the
chart where there had previously been a gap.
CHAPTER 6
CONCLUSION
This work presented a ring-based, injection locked clock multiplier that
achieves a state-of-the-art figure of merit while operating at a high
frequency. To do this, DCO noise was strongly suppressed by pushing the
noise bandwidth of the circuit to FREF/3 using a reference frequency doubling
technique. Duty cycle error correction was used to ensure robust operation of
the doubling circuit. A frequency tracking loop was also employed to remove
DCO frequency error, making the circuit immune to changes in supply voltage
and temperature. Lastly, a novel flicker noise suppression technique was
presented that could be used to ensure this circuit achieves high performance
at smaller process nodes.
The proposed ILCM was designed and fabricated in a TSMC 65 nm CMOS
process. Thorough lab testing was done to characterize the performance of
the design. The circuit achieved a figure of merit of -242.4 dB when operating
at 5 GHz from a 125 MHz clock, a large multiplication factor of 40. In doing
this, it consumes 5.3 mW of power and occupies 0.09 mm² of area. These results
demonstrate the effectiveness of the reference clock doubling technique and
make a strong case for its use in systems that require a high-performance clock
generated by a ring oscillator.
REFERENCES
[1] P. K. Hanumolu, G. Y. Wei, U. K. Moon, and K. Mayaram, “Digitally-
enhanced phase-locking circuits,” in 2007 IEEE Custom Integrated Cir-
cuits Conference, Sep. 2007, pp. 361–368.
[2] H. Wu, M. Mikhemar, D. Murphy, H. Darabi, and M. C. F. Chang, “A
blocker-tolerant inductor-less wideband receiver with phase and thermal
noise cancellation,” IEEE Journal of Solid-State Circuits, vol. 50, no. 12,
pp. 2948–2964, Dec. 2015.
[3] A. Hajimiri, S. Limotyrakis, and T. Lee, “Jitter and phase noise in ring
oscillators,” IEEE Journal of Solid-State Circuits, vol. 34, no. 6, pp.
790–804, June 1999.
[4] A. Abidi, “Phase noise and jitter in CMOS ring oscillators,” IEEE Jour-
nal of Solid-State Circuits, vol. 41, no. 8, pp. 1803–1816, Aug. 2006.
[5] F. Gardner, “Charge-pump phase-lock loops,” IEEE Transactions on
Communications, vol. 28, no. 11, pp. 1849–1858, Nov. 1980.
[6] X. Gao, E. A. M. Klumperink, M. Bohsali, and B. Nauta, “A low noise
sub-sampling PLL in which divider noise is eliminated and PD/CP noise
is not multiplied by N^2,” IEEE Journal of Solid-State Circuits, vol. 44,
no. 12, pp. 3253–3263, Dec. 2009.
[7] A. Elshazly, R. Inti, B. Young, and P. Hanumolu, “Clock multiplication
techniques using digital multiplying delay-locked loops,” IEEE Journal
of Solid-State Circuits, vol. 48, no. 6, pp. 1416–1428, June 2013.
[8] B. Helal, M. Straayer, G.-Y. Wei, and M. Perrott, “A highly digital
MDLL-based clock multiplier that leverages a self-scrambling time-to-
digital converter to achieve subpicosecond jitter performance,” IEEE
Journal of Solid-State Circuits, vol. 43, no. 4, pp. 855–863, Apr. 2008.
[9] H. Kim, Y. Kim, T. Kim, H. Park, and S. Cho, “19.3 A 2.4 GHz 1.5
mW digital MDLL using pulse-width comparator and double injection
technique in 28 nm CMOS,” in 2016 IEEE International Solid-State
Circuits Conference (ISSCC), Jan. 2016, pp. 328–329.
[10] T. A. Ali, A. A. Hafez, R. Drost, R. Ho, and C. K. K. Yang, “A 4.6
GHz MDLL with -46 dBc reference spur and aperture position tuning,”
in 2011 IEEE International Solid-State Circuits Conference, Feb. 2011,
pp. 466–468.
[11] B. Helal, C.-M. Hsu, K. Johnson, and M. Perrott, “A low jitter pro-
grammable clock multiplier based on a pulse injection-locked oscillator
with a highly-digital tuning loop,” IEEE Journal of Solid-State Circuits,
vol. 44, no. 5, pp. 1391–1400, May 2009.
[12] D. Dunwell and A. Carusone, “Modeling oscillator injection locking us-
ing the phase domain response,” IEEE Transactions on Circuits and
Systems I: Regular Papers, vol. 60, no. 11, pp. 2823–2833, Nov. 2013.
[13] A. Musa, W. Deng, T. Siriburanon, M. Miyahara, K. Okada, and
A. Matsuzawa, “A compact, low-power and low-jitter dual-loop injec-
tion locked PLL using all-digital PVT calibration,” IEEE Journal of
Solid-State Circuits, vol. 49, no. 1, pp. 50–60, Jan. 2014.
[14] A. Elkholy, M. Talegaonkar, T. Anand, and P. K. Hanumolu, “Design
and analysis of low-power high-frequency robust sub-harmonic injection-
locked clock multipliers,” IEEE Journal of Solid-State Circuits, vol. 50,
no. 12, pp. 3160–3174, Dec. 2015.
[15] S. Choi, S. Yoo, and J. Choi, “10.7 A 185 fsrms-integrated-jitter and -245
dB FOM PVT-robust ring-VCO-based injection-locked clock multiplier
with a continuous frequency-tracking loop using a replica-delay cell and
a dual-edge phase detector,” in 2016 IEEE International Solid-State
Circuits Conference (ISSCC), Jan. 2016, pp. 194–195.
[16] J. C. Chien, P. Upadhyaya, H. Jung, S. Chen, W. Fang, A. M. Niknejad,
J. Savoj, and K. Chang, “2.8 A pulse-position-modulation phase-noise-
reduction technique for a 2-to-16 GHz injection-locked ring oscillator in
20 nm CMOS,” in 2014 IEEE International Solid-State Circuits Con-
ference Digest of Technical Papers (ISSCC), Feb. 2014, pp. 52–53.
[17] A. Elkholy, T. Anand, W. S. Choi, A. Elshazly, and P. K. Hanumolu,
“A 3.7 mW low-noise wide-bandwidth 4.5 GHz digital fractional-N PLL
using time amplifier-based TDC,” IEEE Journal of Solid-State Circuits,
vol. 50, no. 4, pp. 867–881, Apr. 2015.
[18] D. Coombs, A. Elkholy, R. K. Nandwana, A. Elmallah, and P. K. Hanu-
molu, “8.6 A 2.5-to-5.75 GHz 5 mW 0.3 psrms-jitter cascaded ring-based
digital injection-locked clock multiplier in 65 nm CMOS,” in 2017 IEEE
International Solid-State Circuits Conference (ISSCC), Feb. 2017, pp.
152–153.
