Wright State University

CORE Scholar

2008

The Development of a Nonlinear Phase-lock Loop with Adaptive
Gain Control Based on Modern Control Theory
Michael D. Myers
Wright State University

Repository Citation
Myers, Michael D., "The Development of a Nonlinear Phase-lock Loop with Adaptive Gain Control Based
on Modern Control Theory" (2008). Browse all Theses and Dissertations. 224.
https://corescholar.libraries.wright.edu/etd_all/224

THE DEVELOPMENT OF A NONLINEAR PHASE-LOCK LOOP
WITH ADAPTIVE GAIN CONTROL
BASED ON MODERN CONTROL THEORY

A dissertation submitted in partial fulfillment of the
Requirements for a degree of
Doctor of Philosophy

By

MICHAEL D. MYERS
B.S. Electrical Engineering, Wright State University, 2001
M.S. Engineering, Wright State University, 2002

______________________________________

2008
Wright State University

COPYRIGHT BY
MICHAEL D. MYERS
2008

WRIGHT STATE UNIVERSITY
SCHOOL OF GRADUATE STUDIES
February 20, 2008
I HEREBY RECOMMEND THAT THE DISSERTATION PREPARED UNDER
MY SUPERVISION BY Michael D. Myers ENTITLED The Development of a
NPLL with Adaptive Gain Control Based on Modern Control Theory BE
ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR
THE DEGREE OF Doctor of Philosophy.

____________________________________
Raymond Siferd, Ph.D.
Dissertation Director
____________________________________
Ramana V. Grandhi, Ph.D.
Director, Engineering Ph.D. Program
____________________________________
Joseph F. Thomas, Jr., Ph.D.
Dean, School of Graduate Studies
Committee on
Final Examination
____________________________________
Raymond Siferd, Ph.D.
____________________________________
Henry Chen, Ph.D.
____________________________________
John Marty Emmert, Ph.D.
____________________________________
Marian K. Kazimierczuk, Ph.D.
____________________________________
Frank Scarpino, Ph.D.

ABSTRACT
Myers, Michael D. Ph.D., Engineering Ph.D. Program, Department of Electrical
Engineering, Wright State University, 2008. The Development of a NPLL with Adaptive
Gain Control Based on Modern Control Theory.
As the performance of integrated circuits (ICs) improves, a more precise clock-signal is
needed to regulate their actions. The primary objective of this dissertation is to improve the
phase-lock loop (PLL), which is the most common type of clock-generator. A nonlinear phase-lock
loop (NPLL) was developed by adding a nonlinear-gain unit to a standard PLL. The NPLL
implementation improves on existing PLLs by demonstrating faster
acquisition times and superior jitter performance. The nonlinear gain is achieved by the use of a
fuzzy controller, which takes in a value and generates outputs based upon the rules
that are programmed into it. The developed NPLL takes a 62.5 MHz off-chip clock-signal and
generates a 2 GHz on-chip clock.
To demonstrate and confirm the viability of this approach, a clock-distribution system
was designed based on the NPLL. The clock-distribution system is a global-shielded H-tree,
coupled with a regional grid system. The NPLL and the clock-distribution system were designed
using an IBM 130 nm CMOS process.
The maximum jitter values achieved are as low as 2.6 ps. The NPLL locks onto its
input signal in approximately 200 ns while consuming 1.98 mW of power in an area of 133.9 μm
by 60.9 μm. The clock-distribution system supplies a low-jitter clock signal to a chip area of
81 mm².

TABLE OF CONTENTS

1 INTRODUCTION ........................................................................................................... 1
2 BACKGROUND .............................................................................................................. 5
2.1 Asynchronous Systems ........................................................................ 5
2.2 Synchronous Systems .......................................................................... 6
2.3 Figures Of Merit .................................................................................. 9
2.3.1 Maximum Frequency ................................................................. 9
2.3.2 Power Consumptions.................................................................. 10
2.3.3 Size ............................................................................................. 11
2.3.4 Lock-Time.................................................................................. 11
2.3.5 Timing Accuracy........................................................................ 11
2.3.5.1 Skew ................................................................................. 11
2.3.5.2 Latency ............................................................................. 13
2.3.5.3 Rise And Fall Time........................................................... 13
2.3.5.4 Jitter .................................................................................. 14
2.3.5.5 Phase Noise....................................................................... 16
2.4 CLOCK DISTRIBUTION ................................................................... 17
2.4.1 Distribution Hierarchy ......................................................... 17
2.4.2 Global Level ........................................................................ 18
2.4.3 Regional Level..................................................................... 19
2.4.4 Local Level .......................................................................... 19
2.5 TYPES OF DISTRIBUTION .............................................................. 20
2.5.1 Single Driver .............................................................................. 20
2.5.2 H-Tree ........................................................................................ 21

2.5.3 Spines ......................................................................................... 22
2.5.4 Grid............................................................................................. 23
2.5.5 Length-Matched Serpentines...................................................... 24
2.6 CLOCK GENERATORS .................................................................... 24
2.6.1 Ring Oscillators.......................................................................... 24
2.6.2 Voltage-Controlled Ring Oscillators.......................................... 26
2.6.3 Crystal Oscillators ...................................................................... 27
2.6.4 Negative-Resistance Circuits ..................................................... 28
2.6.5 Standing Wave Generators......................................................... 28
2.6.6 Phase-Lock Loop........................................................................ 28
2.6.7 Delayed Lock-Loops .................................................................. 32

2.7 CHOSEN SYSTEM....................................................................33
3 PROPOSED APPROACH................................................................................................ 34
3.1 Introduction .......................................................................................... 34
3.2 Theoretical Development..................................................................... 37
4 FUZZY LOGIC ................................................................................................................ 43
4.1 Introduction .......................................................................................... 43
4.2 Fuzzy Logic Controller........................................................................ 44
4.2.1 The Knowledge-Base ................................................................. 44
4.2.2 Fuzzification............................................................................... 46
4.2.3 Inference Engine ........................................................................ 48
4.2.4 Defuzzifier.................................................................................. 49
4.3 Implementation .................................................................................... 50
4.3.1 Look-Up Table ........................................................................... 52
4.4 Design of the Controller ...................................................................... 52

5 SUBSYSTEM DESIGN & SIMULATION ..................................................................... 59
5.1 Basic Gates .......................................................................................... 59
5.1.1 The Inverter ................................................................................ 60
5.1.2 Nand Gate................................................................................... 62
5.1.3 3-Input NAND Gate ................................................................... 64
5.1.4 4-Input NAND Gate ................................................................... 64
5.1.5 2-Input NOR Gate ...................................................................... 64
5.1.6 2-Input AND Gate ...................................................................... 65
5.1.7 3-Input AND Gate ...................................................................... 66
5.1.8 Buffer ......................................................................................... 66
5.1.9 XOR ........................................................................................... 67
5.1.10 AND_XOR............................................................................... 67
5.1.11 Fast Inverter.............................................................................. 68
5.1.12 D Flip-Flop............................................................................... 69
5.1.13 Toggle Flip-Flop ...................................................................... 72
5.2 Phase Frequency Detector ................................................................... 73
5.3 Digitizer ............................................................................................... 76
5.4 Time-To-Binary Converter .................................................................. 78
5.4.1 4-Bit Counters ............................................................................ 81
5.4.2 4-Bit Register ............................................................................. 85
5.5 Signed-Magnitude Subtractor .............................................................. 86
5.5.1 Full Adder .................................................................................. 89
5.5.2 Half Adder.................................................................................. 92
5.5.3 Half Subtractor ........................................................................... 94
5.5.4 Subtractor........................................................................................... 96

5.6 Programmable Logic Device ............................................................... 98
5.6.1 NAND Decoder.......................................................................... 102
5.6.2 NOR Encoder ............................................................................. 105
5.7 Digital-To-Analog Converter............................................................... 107
5.7.1 Path Selectors ............................................................................. 110
5.7.2 Resistor-Tree .............................................................................. 113
5.8 Low-Pass Filter .................................................................................... 116
5.9 Voltage Controlled Dual-Delay, Difference-Differential Ring Oscillator......... 118
5.9.1 Differential Delay Cell ............................................................... 120
5.10 Divide-By-32 ..................................................................................... 123
6 CLOCK DISTRIBUTION ................................................................................................ 125
6.1 H-Tree Distribution System................................................................. 125
6.1.1 Area ............................................................................................ 126
6.1.2 Driving Strength......................................................................... 126
6.1.3 Stages And Sizing ...................................................................... 127
6.2 Regional Distribution “The Grid”........................................................ 130
6.3 Schematic............................................................................................. 130
6.4 Layout .................................................................................................. 135
7 RESULTS ......................................................................................................................... 140
7.1 Nonlinear Phase-Lock Loop ................................................................ 141
7.1.1 Jitter............................................................................................ 147
7.1.1.1 Adjacent Period Jitter ...................................................... 147
7.1.1.2 Period Jitter ....................................................................... 150
7.1.2 Phase Noise ................................................................................ 150
7.1.3 Signal-to-Noise Ratio ................................................................. 153

7.1.4 Spurious-Free Dynamic-Range .................................................. 154
7.1.5 Power.......................................................................................... 154
7.1.6 Area ............................................................................................ 156
7.1.7 Lock-Time.................................................................................. 157
7.1.8 Overall Performance................................................................... 158
7.2 Clock-Distribution Results .................................................................. 158
7.2.1 Power.......................................................................................... 158
7.2.2 Area ............................................................................................ 159
7.2.3 Latency ....................................................................................... 159
7.2.4 Rise And Fall-Time .................................................................... 160
7.2.5 Jitter............................................................................................ 160
7.2.6 Skew ........................................................................................... 162
7.2.7 Jitter And Skew .......................................................................... 164
7.2.8 Clock Distribution Summary...................................................... 165
8 CONCLUSIONS AND FUTURE WORK ....................................................................... 166
8.1 Conclusion .......................................................................................... 166
8.2 Future Work......................................................................................... 167
8.2.1 Clock Generator ......................................................................... 167
8.2.2 Clock-Distribution System ......................................................... 170
Appendix............................................................................................................................... 173
A Acronyms............................................................................................... 173
B Bibliography........................................................................................... 175

LIST OF FIGURES
Figure                                                                                          Page
1.1 NPLL Clock Generation and Clock-Distribution System .................... 3
2.1 Synchronous System of a Transmission Line ............................. 7
2.2 Timing Diagram of a Synchronous System ................................ 7
2.3 Synchronous System of Combinational Logic ............................. 8
2.4 Clock Skew Layout ..................................................... 12
2.5 Clock Skew Timing Diagram ............................................. 12
2.6 Rise and Fall-Time .................................................... 14
2.7 Jitter ................................................................ 14
2.8 Adjacent-Period Jitter ................................................ 15
2.9 Measuring Phase Noise ................................................. 17
2.10 Distribution Hierarchy ............................................... 18
2.11 Single Driver Scheme ................................................. 20
2.12 H-Tree ............................................................... 21
2.13 Spine's Distribution System .......................................... 23
2.14 Grid Distribution System ............................................. 23
2.15 Length-Matched Serpentines-Distribution System ....................... 24
2.16 Ring Oscillator ...................................................... 25
2.17 Voltage-Controlled Ring Oscillators .................................. 26
2.18 Layout Pattern of a Voltage-Controlled Ring Oscillator ............... 27
2.19 PLL .................................................................. 29
2.20 Variable Bandwidth PLL ............................................... 30
2.21 Variable Gain PLL .................................................... 31
2.22 Frequency-to-Voltage Converter Variable Gain PLL ..................... 31
2.23 Delayed Lock-Loops ................................................... 32
3.1 Block Diagram of a Standard PLL ....................................... 34
3.2 Block Diagram of a Nonlinear Phase-Lock Loop .......................... 35
3.3 Mathematical Model of the Nonlinear Phase-Lock Loop ................... 37
3.4 Absolute Jitter vs. Gain .............................................. 41
3.5 Period Jitter vs Gain ................................................. 42
3.6 Normalized Plot of Absolute (Red) and Adjacent-Period (Blue) Jitter ... 42
4.1 Block Diagram of a Fuzzy Controller ................................... 44
4.2 Seven Membership Functions Fuzzy Sets ................................. 45
4.3 Three Membership Functions Fuzzy Sets ................................. 45
4.4 Knowledge-Based ....................................................... 46
4.5 Membership Functions .................................................. 46
4.6 Seven Membership Functions Fuzzy Sets ................................. 47
4.7 Fuzzy Processing ...................................................... 47
4.8 Fuzzification Using Minimum/Maximum Method ............................ 48
4.9 Fuzzification and Defuzzification with Center of Gravity Method ....... 49
4.10 Fuzzification and Defuzzification with Weighted Average Method ....... 50
4.11 Control Surfaces of a Three-Rule Controller .......................... 53
4.12 Control Surfaces for Curve Fittings .................................. 54
4.13 Control Surfaces for a Low Gain Area ................................. 54
4.14 Control Surfaces ..................................................... 55
4.15 Ideal Control Surfaces (Blue) and 4-Bit Quantized Control Surfaces (Red) ... 56
4.16 Ideal Control Surfaces (Blue), 6-Bit Quantized Control Surfaces (Red) ...... 56
4.17 Ideal Control Surfaces Scaled for 4-Bit System (Blue) and 4-Bit Quantized Control Surfaces (Red) ... 57
5.1 Inverter Schematic .................................................... 60
5.2 Two Input NAND Gate Schematic ......................................... 62
5.3 Two Input NOR Gate Schematic .......................................... 65
5.4 Buffer Schematic ...................................................... 66
5.5 Schematic of a XOR .................................................... 67
5.6 Schematic of an AND_XOR ............................................... 68
5.7 Fast Inverter Schematic ............................................... 69
5.8 Typical DFF ........................................................... 70
5.9 Schematic of the DFFCLRSYS ............................................ 71
5.10 Layout of the DFFCLRSYS .............................................. 71
5.11 Toggle Flip-Flop Schematic ........................................... 73
5.12 Phase Frequency Detector Schematic ................................... 74
5.13 Functionality of the Phase Frequency Detector ........................ 75
5.14 Layout of the Phase Frequency Detector ............................... 76
5.15 Schematic of the Digitizer ........................................... 77
5.16 Digitizer Layout ..................................................... 78
5.17 Simulation of the Time-to-Binary Converter ........................... 79
5.18 Time-to-Binary Converter Schematic ................................... 80
5.19 Time-to-Binary Converter Layout ...................................... 80
5.20 4-Bit Counter With Clear and an Enable-Signal ........................ 81
5.21 4-Bit Counter With Only RCO .......................................... 82
5.22 Simulation of a 4-Bit Counter With Clear and an Enable-Signal ........ 82
5.23 Simulation of a 4-Bit Counter With Only RCO .......................... 83
5.24 Layout of a 4-Bit Counter With Clear and an Enable-Signal ............ 84
5.25 Layout of a 4-Bit Counter With RCO ................................... 84
5.26 Schematic of the 4-Bit Register ...................................... 85
5.27 Layout of the 4-Bit Register ......................................... 86
5.28 Signed-Magnitude Subtractor Schematic ................................ 87
5.29 Functionality of the Signed-Magnitude Subtractor ..................... 88
5.30 Layout of a Signed-Magnitude Subtractor .............................. 89
5.31 Schematic of the Full Adder .......................................... 90
5.32 Simulation of the Full Adder ......................................... 90
5.33 Layout of the Full Adder ............................................. 91
5.34 Half Adder Schematic ................................................. 92
5.35 Half Adder Layout .................................................... 93
5.36 Half Subtractor Schematic ............................................ 94
5.37 Simulation of a Half Subtractor ...................................... 95
5.38 Layout of the Half Subtractor Schematic .............................. 96
5.39 Subtractor Schematic ................................................. 97
5.40 Subtractor Layout .................................................... 98
5.41 Schematic of the PLD ................................................. 100
5.42 Layout of the PLD .................................................... 101
5.43 Digital Output of the PLD Blue; the Desired Curve is Shown in Red .... 102
5.44 NAND Decoder Schematic ............................................... 103
5.45 Simulation Run of the NAND Decoder, Digital Input Red, Outputs Multicolor ... 104
5.46 Layout of the NAND Decoder ........................................... 105
5.47 NOR Encoder Schematic ................................................ 106
5.48 NOR Encoder Layout ................................................... 107
5.49 Schematic of the Digital-To-Analog Converter ......................... 108
5.50 Layout of the Digital-To-Analog Converter ............................ 109
5.51 Simulation Plot of the DAC, Output Red, Input Bits Multicolor ........ 110
5.52 Schematic Path Selectors Two Pass Gates (a) Four Pass Gates (b) ...... 111
5.53 Layout of the Two Pass Gates Path Selectors Two Pass Gates ........... 112
5.54 Layout of the Four Pass Gates Path Selectors Two Pass Gates .......... 113
5.55 Resistor Tree Schematic .............................................. 114
5.56 Simulation Plot of the Resistor-Tree Outputs Toggling the Sign-bit ... 115
5.57 Layout of the Resistor-Tree .......................................... 116
5.58 Low-pass Filter Schematic ............................................ 117
5.59 Simulation Plot of the AC Response ................................... 117
5.60 Low-pass Filter Layout ............................................... 118
5.61 Voltage Ring Oscillator Schematic .................................... 119
5.62 Simulation Plot of the Control Voltage Swept Versus the Output Frequency ... 119
5.63 Voltage Ring Oscillator Layout ....................................... 120
5.64 Differential-Delay Cell Schematic .................................... 121
5.65 Differential-Delay Cell Layout ....................................... 123
5.66 Divide-By-32 Counter Schematic ....................................... 123
5.67 Divide-By-32 Counter Layout .......................................... 124
6.1 H-Tree ................................................................ 126
6.2 Buffer Stages ......................................................... 129
6.3 Schematic of a Buffer ................................................. 131
6.4 Schematic of the H-tree Clock Distribution System Covering 64 Grids ... 131
6.5 Schematic of the H-tree Clock Distribution System Covering 16 Grids ... 132
6.6 Schematic of the H-tree Clock Distribution System Covering 4 Grids .... 132
6.7 Schematic of the Grid Model ........................................... 133
6.8 Simulation of H-tree Clock Distribution System Input (Black) Output (Red) ... 133
6.9 Example of the Monte Carlo Simulation for the H-tree Clock Distribution System ... 134
6.10 Example of the Monte Carlo Simulation for the Grid ................... 135
6.11 Grid Layout, With Load Capacitance ................................... 136
6.12 Grid Layout, With Level-ten and One inv9b Inverters .................. 136
6.13 Layout of Four Grids, With Level-eight Inverters ..................... 137
6.14 Layout of Sixteen Grids, With Level-five Inverters ................... 137
6.15 Layout of the H-tree Clock Distribution System Covering 64 Grids ..... 138
6.16 Simulation of H-tree Clock Distribution System Input (Black) Output (Blue) ... 139
7.1 NPLL Schematic ........................................................ 143
7.2 NPLL Layout ........................................................... 143
7.3 PLL One PFD Low Gain Larger Bandwidth for Faster Locking Time ......... 145
7.4 PLL Two XOR Larger Bandwidth for Faster Locking Time .................. 145
7.5 PLL Three PFD High Gain Larger Bandwidth for Faster Locking Time ...... 145
7.6 PLL Four XOR Smaller Bandwidth for Low Jitter ......................... 146
7.7 PLL Five PFD Low Gain Smaller Bandwidth for Low Jitter ................ 146
7.8 PLL Six PFD High Gain Smaller Bandwidth for Low Jitter ................ 146
7.9 Close-Up of The Frequency-Versus-Time Plots for The NPLL .............. 148
7.10 Close-Up of The Frequency-Versus-Time Plots for The PLL .............. 148
7.11 Frequency-Versus-Time Plots for the NPLL While Frequency Hopping ..... 149
7.12 Frequency-Versus-Time Plots for PLL One While Frequency Hopping ...... 149
7.13 DFT of the NPLL at 2GHz .............................................. 151
7.14 DFT of the NPLL Frequency Hopping at 2GHz and 1.88GHz ................ 152
7.15 DFT of PLL Five at 1.778GHz .......................................... 152
7.16 DFT of PLL Three Frequency Hopping at 2GHz and 1.778GHz .............. 153
7.17 DFT of PLL Five ...................................................... 154
7.18 Examples of the Monte Carlo Simulation Input to Grid R3Q3G3 .......... 161
7.19 Examples of the Monte Carlo Simulation Input to Grid R1Q3G3 .......... 161
7.20 Skew Grid R1Q1G1 to R3Q1G1 ........................................... 162
7.21 Skew Grid R1Q1G1 to R3Q3G3 ........................................... 163
7.22 In Grid Skew Legs 1 to 3 ............................................. 163
7.23 In Grid Skew Legs 1 to 4 ............................................. 164
8.1 NPLL With PI Fuzzy Controller ......................................... 169
8.2 Limited Swing Differential Transmitter ................................ 171
xv

8.3

Differential Receiver................................................................................171

8.4

Variable Delay Buffers ............................................................................172

xvi

List of Tables
Table ..... Page
2.1 Bit-Rate Error and Alpha ..... 15
4.1 Digital Inputs and Outputs of the 4-Bit Fuzzy Controller ..... 58
5.1(a) Schematic Timing and Power Consumption ..... 63
5.1(b) Layout Timing and Power Consumption ..... 63
5.2 Logic Characteristics of the NAND Gate ..... 63
5.3 Logic Characteristics of the NOR Gate ..... 65
5.4 Logic Characteristics of the AND Gate ..... 65
5.5 Logic Characteristics of the XOR Gate ..... 67
5.6 Logic Characteristics of the AND_XOR Gate ..... 68
5.7 Logic Characteristics of the Adder Gate ..... 90
5.8 Logic Characteristics of the Half Adder Gate ..... 93
5.9 Logic Characteristics of the Half Subtractor Gate ..... 95
5.10 Digital Inputs and Outputs of the 4-Bit Fuzzy Controller ..... 99
7.1 Adjacent Period Jitter Phase-Lock Loops ..... 150
7.2 Power for Phase-Lock Loops ..... 155
7.3 Area for Phase-Lock Loops ..... 156
7.4 Lock-Time for Phase-Lock Loops ..... 157
7.5 Power for the Clock-Distribution Systems ..... 159
7.6 Area for the Clock-Distribution Systems ..... 159
7.7 Jitter and Skew for the Clock-Distribution Systems ..... 164

List of Equations
Equation ..... Page
2.1 Short Circuit Power ..... 10
2.2 Optimal Number of Stages for Minimum Delay ..... 21
3.1 Transfer Function of the NPLL ..... 37
3.2 Normalized Second Order Transfer Function ..... 38
3.3 Natural Frequency ..... 38
3.4 Damping Factor ..... 38
3.5 Signal to Noise Ratio (1) ..... 38
3.6 Noise Bandwidth (1) ..... 38
3.7 Noise Bandwidth (2) ..... 38
3.8 Signal to Noise Ratio (2) ..... 38
3.9 Phase Noise ..... 39
3.10 Locking Range ..... 39
3.11 Locking-Time ..... 39
3.12 Absolute Jitter ..... 40
3.13 Equations Used to Simplify the Expression for Absolute Jitter: ωd ..... 40
3.14 Equations Used to Simplify the Expression for Absolute Jitter: cos θ ..... 40
3.15 Equations Used to Simplify the Expression for Absolute Jitter: α ..... 40
3.16 Equations Used to Simplify the Expression for Absolute Jitter: β ..... 40
3.17 Period Jitter in Radians ..... 41
3.18 Period Jitter ..... 41
4.1 Center of Gravity ..... 49
4.2 Weighted Average ..... 50
6.1 Optimal Number of Stages ..... 128
6.2 Gain Between Stages ..... 128
7.1 Power Scaled ..... 155
7.2 Voltage Scaling Factor ..... 155
7.3 Area Scaling Factor ..... 156
7.4 Area Scaled ..... 156

Acknowledgment
I wish to thank Dr. Raymond E. Siferd, my advisor, for his support and assistance
in completing my dissertation. The experience he has given me will assist me in my
future endeavors. I also would like to thank Dr. Marian K. Kazimierczuk, Dr. John
Emmert, Dr. Henry Chen, and Dr. Frank Scarpino for their aid and for being on my
dissertation committee. In addition, I would like to thank Nick Vanderpool and Milissa
Cain for their generous help in proofreading my dissertation.
Most importantly, thanks to my parents, George and Sue Myers, who made all of
this possible.

1. INTRODUCTION
The primary objective of this doctoral research is to design and implement a
high-speed, low-noise phase-lock loop (PLL) in IBM 0.13μm CMOS technology for use in a
high-frequency clock-distribution system in VLSI circuits. This will enhance on-chip
clock generation by providing a technique that reduces jitter by using a low gain in the
forward path of the PLL while it is locked, and that automatically shifts to a higher
gain for fast locking time when the PLL is not locked onto its input signal. This gain
shifting is done with a fuzzy controller. In this dissertation, this method is called a
nonlinear phase-lock loop (NPLL). The NPLL is a new approach that combines modern
control theory with PLLs in the field of VLSI to produce significant improvements. The
NPLL is designed for a clock-distribution system that will broadcast the clock signal
across the chip area with minimum degradation of the signal.
Today there is a growing demand for high-speed integrated circuits. This is a
byproduct of Moore's Law,[41] which states that every two years the number of
transistors on a computer chip will double with no increase in cost. Over this time, the
speed at which the chip operates will also increase. To operate these transistor circuits
efficiently, a precise clock signal is needed. The latest high-speed analog-to-digital
converters (ADCs) and communication receivers also demand fast, accurate clock signals.
To achieve these high clock-speeds, a more precise clock generator and distribution
system must be designed. The standard clock generator used on microprocessors is based
on a phase-lock loop (PLL). The PLL works by taking a very stable clock signal from
off-chip. This off-chip clock-signal normally comes from a crystal oscillator.
Although crystal oscillators are stable, they cannot be made on a silicon chip.
They can be made to operate at high speeds, but the capacitance and inductance of the
wire from the oscillator to the chip act as a low-pass filter that blocks a
high-frequency signal, so low-frequency crystal oscillators are used. The low-frequency
crystal oscillator in this dissertation operates at 62.5MHz and is used in conjunction
with the on-chip PLL. The PLL takes the slow off-chip clock signal and compares it to the
feedback signal generated by the last stage of the PLL. The error between the off-chip
clock and the feedback signal is then filtered and conditioned for best performance, and
the result is used to control the local high-speed oscillator.[4] The high-speed clock
signal, which operates at 2GHz in this implementation, goes to the chip through the
distribution system. It is also slowed down by a clock divider. This slowed-down signal
forms the feedback signal, which is then compared again with the off-chip clock. This
repeated comparison locks the on-chip clock's output to the off-chip clock's signal.
This method allows the clock generator to produce a precise signal that is very stable
and has very little noise.
The standard PLL clock is improved in this dissertation with a nonlinear-gain
block. The nonlinear gain is used to produce a high gain for fast locking times and
small absolute jitter. Then, once lock is achieved, the nonlinear phase-lock loop (NPLL)
shifts to a low gain to reduce phase noise and period jitter and to achieve a large
signal-to-noise ratio (SNR). The nonlinear-gain unit consists of a fuzzy controller that
is implemented with a lookup table, where all possible inputs are matched with their
pre-calculated outputs.
The second part of this dissertation is the development of a clock-distribution
network that will not introduce errors that negate the accuracy made possible by the
improved clock. This network takes the 2GHz signal generated by the NPLL clock and
distributes it throughout the chip to the end user's circuits. This is done by buffering
the small signal of the NPLL clock and distributing it through an H-tree that splits six
times to end with 64 grids. These grids provide a uniform clock signal. Each grid is
capable of driving a load of 2 to 3pF while still providing a high-quality signal. The
NPLL is designed to take a 62.5MHz signal from the off-chip clock and multiply it up to
a 2GHz clock signal.
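The two figures quoted above can be checked with a little arithmetic. The sketch below assumes an integer feedback divider (the text only says "clock divider") and assumes that each H-tree split doubles the number of branches; the function and constant names are illustrative and do not come from the dissertation.

```python
# Values taken from the text; names are illustrative assumptions.
F_REF_HZ = 62.5e6    # off-chip crystal reference
F_OUT_HZ = 2.0e9     # on-chip NPLL output
H_TREE_SPLITS = 6    # number of times the H-tree splits


def divider_ratio(f_out_hz: float, f_ref_hz: float) -> int:
    """Divide ratio N such that f_out / N equals f_ref when the loop is locked."""
    ratio = f_out_hz / f_ref_hz
    if abs(ratio - round(ratio)) > 1e-9:
        raise ValueError("output is not an integer multiple of the reference")
    return round(ratio)


def h_tree_endpoints(splits: int) -> int:
    """Number of end points if every split turns one branch into two."""
    return 2 ** splits


print(divider_ratio(F_OUT_HZ, F_REF_HZ))   # 32
print(h_tree_endpoints(H_TREE_SPLITS))     # 64
```

Under these assumptions the feedback divider would divide by 32, and six binary splits give the 64 grids mentioned in the text.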

Figure 1.1 NPLL Clock Generation and Clock-Distribution System
This dissertation starts with a Background Chapter that discusses the different types
of clock generators and distribution systems that are currently available. The following
chapter deals with the mathematical dual requirements of high gain for fast locking time
and low absolute jitter, and low gain for low period jitter. These differing requirements
lead to an approach that meets both. This is followed by a chapter discussing fuzzy
controls. The next chapter is on the type of circuit that will be added to a PLL to
adjust the gain of the forward path of the PLL automatically. This new circuit is called
an NPLL. The NPLL varying-gain structure will be shown to improve noise reduction and
decrease the locking time of traditional PLLs. The final subcircuits needed for the
design of the NPLL are discussed in Chapter 5. Chapter 6 contains the design of the
clock-distribution system, which contains the global-level H-tree as well as the local
grid system. The results of both the NPLL and the clock-distribution system will be
discussed in Chapter 7. In Chapter 8, conclusions will be reached concerning the
feasibility of the NPLL and the clock-distribution architecture. This will be followed
by suggestions on future work.

2. BACKGROUND
Digital circuits require a clock signal to regulate their functions. This supersedes
the older method of asynchronous systems. In this Background Chapter, basic
information on the uses of clock systems is discussed. First, asynchronous and
synchronous systems are covered as well as the errors that can occur in them. This is
followed by the figures of merit that are used to evaluate a clock-distribution system.
Next, there is a discussion of the types of clock-distribution systems. This is followed by
a section on the common types of clock generators.
2.1 ASYNCHRONOUS SYSTEMS
Asynchronous systems were used in the beginning of digital circuit design
because not many components were being put on a single chip, and high-speed was not a
design requirement. In this design methodology, inputs are applied to the
combinational logic, and eventually the correct outputs are produced. However, this
system has several problems. First, only one piece of data can be manipulated at a
time. Other problems involve the types of errors asynchronous designs can produce.
One of the two main types of error is a race, which occurs when two internal nodes
change at slightly different times so that the outputs go through an intermediate state
before settling on the correct answer. The other type of error is known as a hazard.
This occurs when a change in the inputs should not affect the outputs, but the outputs
briefly go to a wrong answer before returning to the correct one. These problems are
corrected in a synchronous design system.[70]
2.2 SYNCHRONOUS SYSTEMS
In today's world, all high-performance microprocessors rely on synchronous
systems. In synchronous systems, clock signals are used to trigger registers that hold the
data in between stages of combinational logic. This makes it much easier for designers
to determine when the output data is correct. It also allows multiple operations to be
carried out at the same time. The extra circuitry in the clock generator and
distribution systems raises the design cost, but the benefit is well worth it.
Some simple examples of synchronous systems will demonstrate their usefulness.
Modern microprocessor systems are generally much more complex, but operate on the
same principles.
The simplest example of a synchronous system consists of two flip-flops at
either end of a transmission line. This is shown in Figure 2.1. For data to be transmitted
from the output of the first flip-flop to the input of the second, it must pass through
the transmission line. The transmission line acts as an RC circuit with a finite time
constant, meaning that it takes time for its output to change from low to high and from
high to low. In order for the receiving D flip-flop (DFF) to capture the incoming data
accurately, a few things must occur. First, the incoming data must reach its final
value. Then, the data must remain stable for the set-up time, ts, of the DFF. Next, the
clock edge triggers the receiving DFF, latching the data. The data must still be held at
the input for the hold time, th, of the DFF. Once the receiving flip-flop has latched,
the first flip-flop can apply new data to the transmission line. This is shown in
Figure 2.2. The quantity tcDQ, the contamination delay, is the time between the clock
edge and the start of the change in the output of the DFF. Another measure is tdDQ,
which is the time between the clock edge and the point at which the outputs of the DFF
are correct.

Figure 2.1 Synchronous System of a Transmission Line

Figure 2.2 Timing Diagram of a Synchronous System
Regulating the flow of data on the wire increases the rate of transfer by separating
the delay of the transmission line from the delays of the circuits before and after it.
In this circuit, the first of the two types of errors in a synchronous system can be
demonstrated. When the data from the first flip-flop has not settled before the set-up
and hold window of the second DFF, the second DFF may register the wrong data. This is
known as a long-path error. This error sets the maximum frequency for the clock. The
solution to long-path errors is to reduce the frequency of the clock. The maximum
frequency for this transmission line can be determined as
F = 1/(tdDQ + delay of wire + ts). In this case, there is only one path, making it the
longest path.[70]
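As a rough illustration of the relation F = 1/(tdDQ + delay of wire + ts), the sketch below evaluates it for one set of delays. The delay values are hypothetical and were chosen only to give a round answer; they are not measurements from this work, and the function name is an assumption.

```python
def max_clock_frequency(t_dDQ: float, t_wire: float, t_setup: float) -> float:
    """Maximum clock frequency in Hz for a single flip-flop-to-flip-flop path.

    All delays are in seconds: clock-to-correct-output delay of the sending
    DFF, delay of the wire, and set-up time of the receiving DFF.
    """
    return 1.0 / (t_dDQ + t_wire + t_setup)


# Hypothetical delays: 100 ps clock-to-output, 300 ps wire, 100 ps set-up.
f_max = max_clock_frequency(t_dDQ=100e-12, t_wire=300e-12, t_setup=100e-12)
print(f"{f_max / 1e9:.2f} GHz")
```

With these assumed delays the path takes 500 ps in total, so the clock could run at about 2 GHz. Any increase in the wire delay lowers the limit directly.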
The second example is slightly more complicated. It will demonstrate how most
circuitry is designed, as well as the second type of error in a synchronous system. This
second example consists of an input, A, flowing into a small segment of unknown
combinational logic. The output is then latched into the DFF when the clock pulse occurs.
The outputs of this DFF are then passed back into the combinational logic as an
additional input X. This can be seen in Figure 2.3.

Figure 2.3 Synchronous System of Combinational Logic
In this system, some timing issues must be considered. The first is the delay of the
combinational logic. The second is the set-up and hold time of the flip-flop. In this
example, the correct operation of the circuit depends on the combinational logic reaching
its final value before the set-up time begins. If the path in the combinational logic is
too long, the signal will still be changing during the set-up time, causing an error
known as a long-path error. The second error that can be seen in this example is known
as a short-path error. This occurs when the outputs of the DFF race through the
combinational logic and get back to the input of the flip-flop so quickly that the
set-up or hold time is violated. This type of error is less common, but it cannot be
corrected without redesigning the chip. In the redesign, either delay buffers are added
or the whole circuit is redesigned, which can be very costly.
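The two error types can be expressed as simple inequality checks. The sketch below uses the common textbook form of these constraints, with zero clock skew assumed and the short-path check written against the hold time only; this is an illustrative simplification rather than the analysis used in this dissertation, and all names and delay values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Delays in seconds for one register-to-register path (illustrative names)."""
    t_cDQ: float        # clock edge to start of output change
    t_dDQ: float        # clock edge to correct output
    t_logic_min: float  # shortest delay through the combinational logic
    t_logic_max: float  # longest delay through the combinational logic
    t_setup: float      # set-up time of the receiving DFF
    t_hold: float       # hold time of the receiving DFF


def has_long_path_error(p: PathTiming, clock_period: float) -> bool:
    """True if the slowest signal is still changing when set-up must begin."""
    return p.t_dDQ + p.t_logic_max + p.t_setup > clock_period


def has_short_path_error(p: PathTiming) -> bool:
    """True if the fastest signal arrives before the hold time has elapsed."""
    return p.t_cDQ + p.t_logic_min < p.t_hold


# Hypothetical numbers for a 2 GHz clock (500 ps period).
path = PathTiming(t_cDQ=40e-12, t_dDQ=80e-12, t_logic_min=20e-12,
                  t_logic_max=350e-12, t_setup=50e-12, t_hold=30e-12)
print(has_long_path_error(path, clock_period=500e-12))  # False
print(has_short_path_error(path))                       # False
```

Note that only the long-path check depends on the clock period, which is why lowering the clock frequency fixes a long-path error but does nothing for a short-path error.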

The reliability and functionality of synchronous circuit design depend on the
accuracy with which the clock signal arrives at the individual registers at the required
times. If it is early, set-up and hold times may be violated. If it is too late, the
next section will be delayed, or a short-path error may occur during the next clock
cycle. Therefore, a reliable clock generator and distribution system is a basic
requirement for all modern computer chips.[70]
2.3 FIGURES OF MERIT
The figures of merit for the design of any clock generator are maximum
frequency, power consumption, size, and timing accuracy. These figures of merit have
great impact on the design of the clock generator. They also influence which clock
generator will be chosen for the individual chip. An appropriate selection will have
significant impact on the performance and reliability of the chip.
2.3.1 Maximum Frequency
Maximum frequency describes the fastest speed at which the clock generator can
switch from high to low and back to high, while allowing time for the combinational
logic to function. This must include time for other design criteria, namely the rise time
(trise) and fall time (tfall), the jitter requirements, and the set-up and hold times of
the registers. The maximum frequency is important in today's world of ever-increasing
clock frequencies.

2.3.2 Power Consumption
Power is the amount of energy per unit time required to operate the clock system.
The clock generator uses a small portion of the total power of the clock system. Power
is divided into two main categories, static and dynamic. Static power is the power used
when the circuit is not actively changing states. For CMOS integrated circuits, this
static power is in the form of leakage current or comes from pseudo-NMOS circuits.
Pseudo-NMOS is not often used in clock generators or distribution systems. The clock
system is a very active circuit, so dynamic power is the dominant power. Dynamic power
is a combination of the switching power needed to charge the load capacitance and the
short-circuit power lost during switching. The switching power is defined by the
equation P = αCfV², where α is the fraction of clock cycles in which a transistor
switches. In most cases, for a clock system, α = 1, indicating that every transistor
switches on every clock cycle. The next term is C, which is the total capacitance of the
clock generator. The frequency of the clock generator, when operating, is f, and V is
the rail voltage of the clock generator. From this equation, it can be seen that meeting
the modern requirements for low power and high frequency is a very delicate balancing
act.[26][50]
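To make the trade-off in the switching-power relation concrete, the sketch below evaluates P = αCfV² for a clock network. The capacitance and rail voltages are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def switching_power(alpha: float, capacitance_f: float,
                    frequency_hz: float, rail_voltage_v: float) -> float:
    """Switching power P = alpha * C * f * V^2, in watts."""
    return alpha * capacitance_f * frequency_hz * rail_voltage_v ** 2


# Hypothetical clock network: 100 pF total, 2 GHz, alpha = 1, 1.2 V rail.
p_full = switching_power(1.0, 100e-12, 2.0e9, 1.2)
# Same network at a 1.0 V rail, to show the quadratic dependence on V.
p_low = switching_power(1.0, 100e-12, 2.0e9, 1.0)
print(f"{p_full * 1e3:.0f} mW at 1.2 V, {p_low * 1e3:.0f} mW at 1.0 V")
```

With these assumed values the power is about 288 mW at 1.2 V and 200 mW at 1.0 V. Power scales linearly with frequency and capacitance but with the square of the rail voltage, which is why lowering the voltage is the most effective of the three.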
The equation for the short circuit power is:

\[ P = V \cdot \frac{1}{12} \cdot \frac{k \, t \, f}{V_{dd}} \cdot (V_{dd} - 2V_t)^3 \qquad (2.1) \]

This is the power lost to the current that flows directly from Vdd to ground during
the switching of states. In this equation, t is the average rise and fall time of the
input signal, Vt is the threshold voltage of the transistors, and f is the operating
frequency. The last value is k, which is the process transconductance parameter. If t is
much less than 1/f, then the short circuit power is not a significant factor. From these
different equations, we can calculate the total power usage.[26]
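The dependence of the short circuit power on the rise and fall time can be sketched numerically. The code below assumes that Equation 2.1 has the form P = V · (1/12) · (k·t·f / Vdd) · (Vdd − 2Vt)³ with V equal to the rail voltage Vdd; the parameter values are hypothetical, and the clamp to zero for Vdd ≤ 2Vt is an added assumption.

```python
def short_circuit_power(k: float, t: float, f: float,
                        v_dd: float, v_t: float) -> float:
    """Short circuit power in watts, following the assumed form of Equation 2.1.

    k   : process transconductance parameter (A/V^2)
    t   : average rise and fall time of the input signal (s)
    f   : operating frequency (Hz)
    v_dd: rail voltage (V); v_t: transistor threshold voltage (V)
    """
    overdrive = v_dd - 2.0 * v_t
    if overdrive <= 0.0:
        # Assumption: with Vdd <= 2*Vt the two devices never conduct together.
        return 0.0
    return v_dd * (1.0 / 12.0) * (k * t * f / v_dd) * overdrive ** 3


# Hypothetical values: k = 100 uA/V^2, f = 2 GHz, Vdd = 1.2 V, Vt = 0.3 V.
for t_edge in (50e-12, 5e-12):
    p = short_circuit_power(k=100e-6, t=t_edge, f=2.0e9, v_dd=1.2, v_t=0.3)
    print(f"t = {t_edge * 1e12:.0f} ps -> P = {p * 1e9:.0f} nW")
```

Because the result is proportional to the product t·f, a tenfold reduction in edge time gives a tenfold reduction in short circuit power, which matches the remark that the term becomes insignificant when t is much less than 1/f.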
2.3.3 Size
Size is the physical area that the clock system occupies on the chip. Due to the
cost of the silicon area, the size should be kept as small as possible.
2.3.4 Lock-Time
Locking time is the time needed for the PLL to lock onto its external oscillator
signal. The shorter this value, the quicker the clock generator will be able to produce a
stable clock signal. This is vital when a microprocessor is turned on or comes back from
a power saving mode. Another time that a small lock-time is valuable is when recovering
from an unexpected glitch in the control voltage.
2.3.5 Timing Accuracy
Timing accuracy involves two major sources of error, skew and clock jitter. Skew
and jitter determine how accurately the clock signal can be produced and distributed.
Timing accuracy must also consider rise and fall-times of the clock and latency of the
clock system.
2.3.5.1 Skew
Skew is defined as the difference between the times at which the same clock signal
arrives at two points on the chip. Ideally, skew should be zero. Skew is caused by
differences in the length of the wires of the clock-distribution network. An example of
this can be seen in Figures 2.4 and 2.5. In this example, the clock signal arrives at
Point 1 tskew seconds before it arrives at Point 2. Even if the lengths of the wires are
the same, their capacitance can be different, causing skew. The extra capacitance can be
in the form of vias or cross coupling between wires. Differing power supply voltages at
the clock-distribution buffers can also cause skew. Skew can be a factor in the design of
pipeline stages. If the clock line runs in the direction of the pipeline, each
combinational logic section will have a small amount of extra time. If the clock is run
in the opposite direction, stages at the beginning will have less time.

Figure 2.4 Clock Skew Layout

Figure 2.5 Clock Skew Timing Diagram
A related measure is local skew. This is the worst-case skew within an area that a
signal can travel across in a single clock cycle. Because the signal will have to be
registered to leave this area, skew outside this region is not important.[52]

2.3.5.2 Latency
Latency is the delay between when the original clock changes and when the first
circuit receives the change. This can be seen in Figures 2.4 and 2.5. It is proportional
to the amount of jitter in the clock-distribution system. This factor needs to be
considered during the design of a chip. A larger latency indicates more time passing
through the buffers. Even without buffers, a longer wire has more latency and is more
susceptible to crosstalk.[39][44][52]
2.3.5.3 Rise and Fall-Time
An ideal clock generator produces a square wave, but non-ideal factors affect the
output. In the non-ideal case, parasitic capacitances slow the transitions from high to
low and from low to high. This tends to make the clock generator's output look more like
a trapezoid than a square wave. The slopes of the trapezoid are measured by trise and
tfall (Figure 2.6). The rise time, trise, is the time it takes for the clock signal to
change from 10 percent to 90 percent of its final value. The fall time, tfall, is the
time required to change from 90 percent to 10 percent. As the load on the clock generator
increases, the values of trise and tfall will increase, creating a greater average
propagation delay tp. In addition, the rise and fall delays for the various nodes can be
different from one another, depending on the size of the buffer driving them. Rise and
fall times are a concern because, as they increase, the triggering of the registers is
delayed. Also, as previously discussed, trise and tfall must be much less than 1/fclk to
limit short circuit power consumption.[26]

Figure 2.6 Rise and Fall-Time
2.3.5.4 Jitter
Jitter is a small uncertainty in the timing of the clock signal. There are three main
definitions of jitter. The first is absolute jitter, sometimes called aperture
uncertainty. This is the difference between a perfect clock and the real clock (see
Figure 2.7). A situation where absolute jitter degrades the functionality of a circuit is
a sample-hold circuit. In sample-hold circuits, slight variations of the clock mean that
the voltage will be sampled at the wrong moment, and hence the wrong value will be
recorded.

Figure 2.7 Jitter

The second definition is adjacent-period jitter. This is the difference in length
between two consecutive clock cycles, as seen in Figure 2.8. Adjacent-period jitter is
more of a problem in digital circuits and clock generators, because if one clock pulse
runs a little longer and the next a little shorter, long-path errors can occur.

Figure 2.8 Adjacent-Period Jitter
Root mean square (RMS) jitter is a standard method of measuring adjacent-period jitter,
which is one method of measuring the amount of uncertainty in a circuit. It is defined, in
the jitter histogram, as plus-or-minus one standard deviation from the mean. This can be
measured directly from the jitter histogram. This method assumes Gaussian noise, and
that the measuring tools used do not produce noise themselves. From RMS jitter,
adjacent-period jitter can be calculated from the total spread of the histogram and the
bit-rate error. Since the distribution is Gaussian, the histogram has infinite tails, so
the bit-rate error is used to limit the range. The adjacent-period jitter is given as
jitterap = α*jitterRMS, where α is determined by the bit-rate error (BRE) from Table 2.1.
BRE       Alpha      BRE       Alpha
10^-3     6.18       10^-10    12.72
10^-4     7.43       10^-11    13.41
10^-5     8.53       10^-12    14.07
10^-6     9.51       10^-13    14.7
10^-7     10.4       10^-14    15.3
10^-8     11.22      10^-15    15.88

Table 2.1 Bit-Rate Error and Alpha
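The values of α in Table 2.1 appear to be consistent with the usual Gaussian relation α = 2·Q⁻¹(BRE), where Q is the one-sided tail probability of the standard normal distribution. The sketch below is written under that assumption, which is not stated in the text; the function names are illustrative and the 1 ps RMS jitter is a hypothetical input.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def alpha_from_bre(bre: float) -> float:
    """Multiplier alpha for a given bit-rate error, assuming Gaussian jitter.

    Uses the symmetry Q^-1(p) = -inv_cdf(p), which stays accurate for very
    small p where computing 1 - p would lose precision.
    """
    return -2.0 * _STD_NORMAL.inv_cdf(bre)


def adjacent_period_jitter(jitter_rms: float, bre: float) -> float:
    """Adjacent-period jitter from RMS jitter: jitter_ap = alpha * jitter_RMS."""
    return alpha_from_bre(bre) * jitter_rms


for exponent in (3, 12, 15):
    print(f"BRE 10^-{exponent}: alpha = {alpha_from_bre(10.0 ** -exponent):.2f}")

# Hypothetical example: 1 ps RMS jitter at a bit-rate error of 10^-12.
print(f"{adjacent_period_jitter(1e-12, 1e-12) * 1e12:.2f} ps")
```

Under this assumption the computed multipliers agree with the tabulated entries to within rounding, so the same function could be used for bit-rate errors that the table does not list.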

The third definition of jitter is period jitter. This is the variation of a clock
period from its nominal value. It has effects similar to those of adjacent-period jitter.
There are many causes of jitter. The two main ones are noise in the power supply
and crosstalk between signal lines and the clock. The worst case of crosstalk occurs
when a signal line directly above or below the clock line switches at the same time as
the clock signal, but in the opposite direction. Noise in the power supply can cause
buffers to change their propagation delay at random.[33] Also, the generator itself can
produce some jitter. Jitter can be reduced by isolating the clock lines and reducing the
noise in the power lines to the clock generator and the distribution buffers.[22][23]
Jitter from the generator can be reduced by using a more stable design that is less
susceptible to noise. In the design process of an integrated circuit (IC), a time budget
is formed that accounts for the delays of the pipeline sections and the uncertainty of
the clock, and this budget is used to determine the maximum frequency. If the amount of
time budgeted for clock jitter and skew is reduced, the overall clock frequency can be
increased.[70]
2.3.5.5 Phase Noise
Phase noise is a different method of looking at the uncertainty in the output of the
clock generator. The main difference is that phase noise is in the frequency domain, not
the time domain as jitter is. However, it can be transformed into the time domain to
form jitter. Phase noise is described as the unwanted signals surrounding the desired
signal in a fast Fourier transform (FFT) that are not blocked by the low-pass filter.
Phase noise is measured by determining the power in a 1Hz bandwidth at a set frequency
offset from the desired signal. This is shown in Figure 2.9.[55][71] A second definition
of phase noise is the value of the integral over a range of frequencies, normally the
entire noise bandwidth. This method is harder to measure but is more commonly used in
calculations.

Figure 2.9 Measuring Phase Noise
2.4 CLOCK DISTRIBUTION
The clock-distribution system is responsible for receiving the clock output-signal,
increasing its strength, and dispersing it to all the required circuits on the chip. This
distribution system should also minimize the effects of jitter and skew while also
minimizing power consumption. Buffers are key components of a clock-distribution
system. While they are necessary, the number of buffers should be kept to a minimum
because each buffer stage introduces some skew and jitter and consumes power.
2.4.1 Distribution Hierarchy
Clock distribution is usually separated into three layers: global, regional, and
local. These layers increase in fan-out and complexity. The way the sections are divided
is shown in Figure 2.10. Each layer has different requirements, which alter the best
choice of distribution network. The different methods of distribution will be described
in a following section.[30]

Figure 2.10 Distribution Hierarchy
2.4.2 Global Level
The global level is responsible for distributing the clock signal from the clock
generator across the chip to each of the section buffers. This level has the largest area of
concern and generally covers the entire chip. With this large area, the considerations are
to produce fast rise and fall-times while not producing large amounts of clock jitter or
skew. The global level has the smallest amount of the total capacitance load of the entire
distribution system. With this small load, the global level accounts for only a small
amount of power dissipation from the total clock system. With these design
characteristics, the choice of the distribution system for the global level is determined.
For most VLSI chips, the H-tree or a single driver is chosen for the global level. The
H-tree and single driver will be discussed later.

2.4.3 Regional Level
The regional level is the middle ground between the global and the local clock-distribution networks. It starts after the sector buffers and ends with the clock pins of the
macro. Each region covers less area than the global system, but they have more of the
total capacitance load of the clock-distribution system. With this larger total capacitance,
they will consume more power, generally an order of magnitude more than the global
system. The regional section has some effect on clock jitter and skew, but not as much as
the global.[52]
There are four main methods for the regional-distribution network. The first
extends the H-tree down to the local system. The second method uses shorting bars to
form spines. The third is the formation of a grid. The last method uses length matching.
These methods will be discussed in detail later.
2.4.4 Local Level
The local level consists of everything after the clock pin of the macro or standard
cell. Each local-distribution system covers the smallest area but combined, they have the
largest total capacitance of the clock-distribution system. All the small, local level, clock
distribution across the entire chip combined consume most of the power used in the
clock-distribution system. The power is generally an order-of-magnitude above the power
requirements of the rest of the distribution system. The designer of a clock system usually
does not have a say in how the local level is designed. In the macros, the clock wires are
run with more concern for signal routing than skew or jitter. This is due to the unusual
shapes of the macros. For high-frequency systems, local-clock distribution would have
to be considered with respect to skew and jitter. There are several methods to implement
these levels in the clock-distribution system.
2.5 TYPES OF DISTRIBUTION
Distributing the clock signal effectively through the differing regions can require
different methods. Different design requirements such as power, delay, skew, and ease of
design can also affect which distribution system is chosen. Some of the types of
distribution and their advantages and disadvantages will be discussed.

2.5.1 Single Driver
The first design method for a clock-distribution system at the global level is
known as the single driver scheme, and is shown in Figure 2.11. This is also referred to as
the water main system. It has a set of large buffers at the trunk of the distribution tree.
Smaller branches form at the end of each larger branch.

Figure 2.11 Single Driver Scheme
The clock lines branch off to the individual components. This ensures that there is
roughly the same distance from the clock generator to each component. The advantages
of this design scheme are that the buffer delay can easily be adjusted by changing the
buffer size. The delay between the different clock lines can also be adjusted by changing
the capacitance of the load at the end of the individual clock branches. This is done to
either slow down or speed up the signal. Also, the individual wire length can be changed.
A longer wire will have a longer delay. The problem with this design scheme is that the
initial buffers are quite large, and draw a significant amount of current.[50]
The correct number of buffers for the single driver scheme can be
determined from Equation 2.2. In this equation, N is the optimal number of stages
for minimum delay, the capacitance of the load is Cl, and Cin is the input capacitance of
the first stage. This also assumes the drain capacitance is zero. If the drain capacitance is
considered, the denominator can vary between 2 and 2.5.[50]

$$N = \frac{\ln\left(\dfrac{C_l}{C_{in}}\right)}{e}$$    (2.2)

2.5.2 H-Tree

The second major design scheme for a clock-distribution network is the
distribution buffer scheme or the H-tree, as seen in Figure 2.12. This method has several
small buffers spread throughout its distribution tree. The buffers are located just before
each clock branch splits. This system is more flexible in layout design because the
individual buffers can be placed in small, unused areas of the layout. The distributed
buffers are smaller than the single driver is, and will consume less power. Another
advantage is that each individual buffer size can be altered to change the delay
of its branch. The problem with this design is that each individual buffer must be tuned
to reduce skew. Some modifications can be made to the H-tree to improve its
performance.[50]

Figure 2.12 H-Tree

The first modification is the deskew buffer. The deskew buffer does not change
the physical structure of the distribution network. Instead, it changes the delay of the
particular buffer for a specific region of the network. It corrects the delay of the signal by
using a phase-detector to determine if the buffer needs to speed up or slow down. For
comparison purposes, the inputs to the phase-detector are from an area on a nearby
branch, or from a second balanced tree.[30]
Another modification is the use of differential signals, which provide both the signal and the
reference. This reduces the effect of crosstalk from the power lines and of common-mode
noise.
2.5.3 Spines

A commonly used method at the regional level is the use of shorting bars.
Shorting bars are wire connections to the output of some of the buffers of a particular
stage. Every two to three stages can be connected. These wires connect together forming
a spine, and are shown in Figure 2.13. This averages the distribution system together. The
drawback to this scheme is that as frequency increases, the connecting wires start to act
as resistors shutting down the averaging effect.

Figure 2.13 Spine's Distribution System
2.5.4 Grid

The grid distribution system is a method to limit skew and jitter. This is done by
connecting the output of the buffers together at the edges of a grid (See Figure 2.14.).
This method averages the skew and jitter from previous stages. There will be some skew
produced as the signal travels from the edges toward the center. This system also has the
smallest latency of the regional distribution methods. The main drawback is the large
amount of capacitance in the grid. Significant amounts of power are required to drive a
grid. Reference planes for the grid are advisable to reduce reflections.[3]

Figure 2.14 Grid Distribution System


2.5.5 Length-Matched Serpentines

Length matching is a technique that ensures each clock signal has the same length
of wire leading to it. After determining the longest length of wire needed in a sector, all
other clock lines are made the same length. The extra wire is run out and folded back
upon itself as seen in Figure 2.15. The advantage of this design is that zero skew can be
achieved. This method has no effect on jitter, but it does have one of the largest
capacitance loads of all the regional designs.[2]
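
As a rough illustration of the length-matching step described above, the following Python sketch pads every clock route in a sector to the length of the longest one. The sink names and route lengths are hypothetical values chosen only for illustration; they are not taken from this design.

```python
def serpentine_padding(route_lengths_um):
    """Return the extra wire (in micrometers) to fold into each route."""
    longest = max(route_lengths_um.values())
    return {sink: longest - length for sink, length in route_lengths_um.items()}


# Hypothetical direct route lengths from the sector buffer to each clock pin.
routes = {"macro_a": 420.0, "macro_b": 615.0, "macro_c": 510.0}
padding = serpentine_padding(routes)

# After padding, every route has the same total wire length.
matched = {sink: routes[sink] + padding[sink] for sink in routes}
print(padding)
print(matched)
```

The padded wire is what gives this scheme its large capacitance: every route carries the capacitance of the longest one.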

Figure 2.15 Length-Matched Serpentines-Distribution System
2.6 CLOCK GENERATORS

Even with a perfect distribution system, the clock signal must first be formed.
There are several different types of clock generators, with each one having strong and
weak points. The clock generator forms the signal that regulates the circuits throughout
most of the chips today.
2.6.1 Ring Oscillators

The ring oscillator is one of the most basic types of clock generators. It consists of
an odd number of inverters connected in a loop (See Figure 2.16). For stability reasons,
more than five inverters are used, because at least one complete oscillation must be
contained in the loop.

Figure 2.16 Ring Oscillator

The basic principle is that with an odd number of inverters there are no stable
points. This means each node, between the inverters, fluctuates from high-to-low and
low-to-high. This fluctuation propagates through the entire loop. To access the clock
pulse, a node is tapped and then buffered to power the clock-distribution system. If the
complement to the clock is needed, an additional inverter is run off the same node and
then buffered to power the clock's complement distribution net. The basic equation
governing the frequency at which the ring oscillator operates is f = 1/(N(tpHL + tpLH)),[2]
where N is the number of inverters and tpHL is the time it takes for a high-to-low change in the
input to appear at the output. Likewise, tpLH is the time it takes a
low-to-high change in the input to affect the output. These measurements are
usually taken at the halfway point between the ground and the rail voltage. The
advantages of the ring oscillator are its simplicity in design, the small area needed on the
chip, and the lack of a required input to start or operate. The drawbacks of the ring
oscillator are that it only operates at one frequency. This can be a problem if you wish to
test the chip at a lower frequency. An additional problem is process variations between
chips that can cause the propagation delay to be slightly different, thus making the same
ring oscillator operate at different frequencies between two chips. The last major problem
with ring oscillators is that they are highly subject to jitter due to noise in the power
supply and ground bounce.[2]
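
The frequency relation above, and its sensitivity to process variation, can be sketched in a few lines of Python. The inverter count and propagation delays below are hypothetical values chosen for illustration, not measurements from this work.

```python
def ring_oscillator_frequency(n_inverters, t_phl, t_plh):
    """Return f = 1 / (N * (tpHL + tpLH)) in hertz."""
    if n_inverters < 3 or n_inverters % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    return 1.0 / (n_inverters * (t_phl + t_plh))


# Hypothetical delays (seconds) for a seven-stage ring.
nominal = ring_oscillator_frequency(7, 40e-12, 50e-12)
# Same layout on a chip whose delays are 10% longer due to process variation.
slow_chip = ring_oscillator_frequency(7, 44e-12, 55e-12)

print(f"nominal:   {nominal / 1e9:.3f} GHz")
print(f"slow chip: {slow_chip / 1e9:.3f} GHz")
```

Because the delays enter the denominator directly, a 10% increase in propagation delay lowers the frequency by roughly 9%, which is the chip-to-chip variation described above.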


2.6.2 Voltage-Controlled Ring Oscillators

Voltage-controlled ring oscillators operate on the same basic principles as the ring
oscillator. The difference is that there is an additional PMOS connected between the
supply voltage and each inverter and an NMOS in line to the ground (see Figure 2.17). This
lets a control signal change the voltage powering the inverters. At lower voltage levels,
the delay is increased. This is a useful tool when testing circuits for faults, or to back off
the clock, if long-wire errors are manifesting themselves. The voltage controller also
requires more circuitry than the ring oscillator, and needs an extra input pin for the
control signal. It is susceptible to the same noise sources as the ring oscillator.[2]

Figure 2.17 Voltage-Controlled Ring Oscillators

While the voltage-controlled ring oscillators are shown as a straight line in
schematics, this is not the case in most layouts. If all the inverters are in line, the
feedback wire must run the entire length of the circuit. Therefore, the last inverter must
power a larger capacitor load, which leads to a larger delay in the last stage. There are
two methods of solving this problem. The first is to ensure that each stage has the same
length of wire to the next stage. This is accomplished by alternating the connections to
the inverters. The odd numbered inverters connect to each other in the forward direction.
The even numbered inverters connect backwards. This is shown in Figure 2.18. The
second method is to form a loop with a line of inverters going out and another line
coming back. Both of these methods will evenly distribute the capacitor load of the
interconnects among all the stages.[2][74]

Figure 2.18 Layout Pattern of a Voltage-Controlled Ring Oscillator
2.6.3 Crystal Oscillators

Crystal oscillators are the oldest known types of electrical oscillators. They were
designed in 1921. Many different circuits use crystals to produce oscillations. Some
crystal oscillators are less expensive to design, while others are used for stability of the
frequency generated. They operate by having a feedback loop with a gain > 1, and the
crystal element in the loop is frequency sensitive. The crystal will set up a scenario where
it oscillates at its fundamental frequency. The oscillations in the crystals are set up by the
piezoelectric effect causing physical deformation in the crystal when electricity is
applied. For quartz crystals, the operating frequency is determined by the thickness of the
crystal and governed by the equation f=1670/t. The major problem with crystal
oscillators is that there are very few processes that allow a crystal to be formed on a
standard CMOS chip. Another problem is that a crystal only operates at one frequency.
However, with just a little bit of circuitry, a highly stable clock can be produced. Crystal
oscillators are generally used as off-chip clocks that can be fed to a chip.[38]


2.6.4 Negative-Resistance Circuits

Negative-resistance circuits are generally inductor-capacitor circuits with some
form of negative resistance. The negative resistance is normally achieved through
op-amps with positive feedback loops or cross-coupled, differential pairs. The problem
with these circuits is that they are only as accurate as their components, so small
differences in the inductor and capacitor values can mean differences in the operating
frequency.[32]
2.6.5 Standing-Wave Generators

Standing-wave generators are a relatively new idea for forming a clock signal and
the upper layer of the clock-distribution network all at once. They operate by taking a
loop of wire and producing negative resistance along that wire by means of cross-coupled
inverters. With this negative resistance associated with the wire, oscillations are produced
based upon its length. If more than one cross-coupled inverter is used, the circuit can be
made to operate at a higher harmonic frequency than its fundamental. Smaller, local
clock-distribution networks can be attached to the standing-wave oscillator, which acts as
a clock. Also, several standing-wave oscillators can be coupled to ensure a uniform clock
over the entire area of the chip. Theoretically, this has a high potential for being a stable
clock generation method, but little research has yet been performed.[44][45][72]
2.6.6 Phase-Lock Loop

PLLs are the most common types of clock generators on modern high-speed
chips. PLLs have three major components: a phase-detector, a loop filter, and a
VCO. The phase-detector compares the input signal, normally from an off-chip crystal
oscillator, to the feedback signal. The loop filter takes the information from the phase-detector
and removes all high-frequency glitches, leaving an average signal to control the
third part, the VCO. The VCO produces the output as well as the feedback to the phase-detector, as
seen in Figure 2.19. An optional clock divider in the feedback path allows the output to
be slowed down to the speed of the input signal.
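
The loop described above can be sketched as a small phase-domain simulation in Python. This is a simplified behavioral model under stated assumptions, not the circuit of this dissertation: the loop filter is modeled as an idealized proportional-plus-integral stage, phases are measured in cycles, and every numeric value (reference frequency, divider ratio, VCO free-running frequency, gains) is hypothetical.

```python
def simulate_pll(f_ref=1.0, divide_by=8, f_free=7.0, k_vco=1.0,
                 k_p=80.0, k_i=800.0, dt=1e-3, steps=5000):
    """Return the VCO frequency after the loop has had time to settle."""
    theta_ref = 0.0  # reference phase, in cycles
    theta_div = 0.0  # phase of the divided VCO output, in cycles
    integral = 0.0   # integrator state of the idealized loop filter
    f_vco = f_free
    for _ in range(steps):
        error = theta_ref - theta_div            # phase-detector
        integral += error * dt                   # loop filter (PI model)
        control = k_p * error + k_i * integral
        f_vco = f_free + k_vco * control         # VCO
        theta_ref += f_ref * dt
        theta_div += (f_vco / divide_by) * dt    # feedback clock divider
    return f_vco


locked_frequency = simulate_pll()
print(f"VCO frequency after settling: {locked_frequency:.6f}")
```

With a divide-by-8 feedback path, the loop settles with the VCO running at eight times the reference frequency, which is how a slow reference can regulate a much faster on-chip clock.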

Figure 2.19 PLL

The PLL has several advantages over other clock generators. The first is that it
can operate over a wide range of frequencies. The second advantage of the PLL is that it
can use a lower frequency off-chip clock oscillator as a reference to regulate a much
higher speed, on-chip clock. In the clock system, the PLL also isolates its output from noise
and jitter in the sources used as the reference signal.
An important performance parameter for PLLs is locking time, which is the time
the PLL needs to lock onto an off-chip reference signal. This is very important in
frequency synthesizers and wireless transceivers. PLLs can also be used as a
microprocessor's clock source, as in this dissertation. Another criterion is how quickly the
system can be operated after the initial power startup. If lock is lost on the signal for
some reason, such as a static discharge on the input clock, it is important to recover
quickly because the output clock will vary causing timing errors. The lock on the input
also can be lost if the input frequency changes quickly.[4]

One method of improving the performance of a PLL is to change the bandwidth
of the PLL's low-pass filter. In an analog filter, the simplest of these methods uses a
pass-gate to connect a different resistor in the low-pass filter. This allows for a larger
bandwidth for rapid acquisition and a smaller one for noise rejection once lock is
achieved. This set-up is shown in Figure 2.20.[4][66]

Figure 2.20 Variable Bandwidth PLL

In systems with digital filters, the coefficients of the filter can be changed,
altering the filter bandwidth to produce results similar to that discussed above for analog
filters. Another method of varying the bandwidth involves the PLL's active filter. The
biasing of the active filter can be altered to achieve control over its bandwidth.[68]
A second major method used to improve the performance of a PLL in both
locking time and jitter is to change the gain of the PLL. The simplest implementations
would be to change the current in the charge pump. This can be done by turning on a
switch, which connects a second current source to the charge pump, or separate charge
pumps, when the PLL is attempting to acquire the signal. Normally, this is limited to two
different values, giving just two possible gains. This scheme can be seen in Figure
2.21.[18][36]

Figure 2.21 Variable Gain PLL

Another method of changing the gain is the use of voltage-to-current
transconductors. These voltages are then compared to determine the current to be used at
the charge pump. This scheme can be seen in Figure 2.22.[17]

Figure 2.22 Frequency-to-Voltage Converter Variable Gain PLL

A different technique to change the gain is the use of a nonlinear-gain element in
the feed forward path of the PLL. This will be discussed later in Chapters 3 and 4 of this
dissertation.


2.6.7 Delayed Lock-Loops

Delayed lock-loops are another popular method of generating clocks for VLSI
circuitry. Delayed lock-loops are simpler than PLLs and have most of their advantages.
They take a reference clock-signal, and run it through the phase-detector against the
delayed clock. The phase-detector then sends an up-or-down signal to the charge pump.
From there it is sent to the low-pass filter, whose output is then sent to a voltage-controlled
delay line. This is shown in Figure 2.23. This delay line is similar to the ring
oscillator, except it is not fed back in a loop. The delay line uses the control voltage to
determine the amount of time it takes the reference clock to go through the series of
inverters. Now, the delayed-reference clock is fed back to the phase-detector. One of the
greatest advantages of this type of device is that different delay amounts of the clock can
be achieved, making it a useful tool when designing time-interleave circuitry. The
delayed lock-loop has good jitter characteristics like the PLL. Its main problem is that
it has a smaller phase capture range.[61]

Figure 2.23 Delayed Lock-Loops


2.7 CHOSEN SYSTEM

To meet the criteria of this dissertation, a stable clock generator that quickly
recovers from glitches or other noise sources, a PLL is chosen. To improve the
performance of a standard PLL, an automatic gain adjuster is added. This will be shown in
Chapter 3 to improve locking time and decrease jitter. A fuzzy controller that is discussed
in Chapter 4 performs this automatic gain adjustment. The fuzzy controller forms the core
of the new NPLL that was designed for this dissertation. The NPLL will improve the
field of clock generators. To test the performance of the NPLL, a clock-distribution
system was designed. The clock-distribution system uses an H-tree global distribution
level. This H-tree passes the clock signal to a regional grid system that will be discussed
in Chapter 6.


3. PROPOSED APPROACH
3.1 INTRODUCTION

A high frequency, low-noise phase-lock loop (PLL) for the production of a clock
signal is still a challenge for integrated circuits. In this dissertation, a method of using
a fuzzy logic controller (FLC) to act as a nonlinear-gain circuit in the feed forward
path of the PLL is demonstrated to improve performance of the PLL. This will be
called a nonlinear phase-lock loop (NPLL). The NPLL also outperforms other
methods of improving lock-time and jitter. The NPLL will be used to supply a clock-distribution
network to show viability in a real-world application. The block diagrams
of a standard PLL and the nonlinear phase-lock loop (NPLL) are shown in Figure 3.1
and Figure 3.2, respectively.

Figure 3.1 Block Diagram of a Standard PLL


Figure 3.2 Block Diagram of a Nonlinear Phase-Lock Loop

While the PLLs are used in clock generators, they alone cannot make a clock
signal. The basic PLL can only speed up, slow down, or match an input signal. The PLL
needs a reference clock. For clock generators, this is normally an off-chip, slow-speed
crystal oscillator. Crystal oscillators are one of the most stable sources for a signal. The
problem is that they cannot be formed on an integrated circuit (IC), so an off-chip crystal
oscillator is used for the reference for either the PLL or the NPLL. The reason the crystal
oscillator works at low speed is that a high-frequency signal from the crystal cannot pass
through the wire connecting it to the IC. The wire itself acts as a low-pass filter that will
block the signal. To achieve a high-frequency signal for the clock distribution, the PLL
must increase the speed of the stable off-chip signal. The NPLL achieves this without
adding extra noise to the system.
The NPLL in this dissertation will provide high-gain for fast acquisition and then
a lower gain section for lower jitter and better stability. Coupling this with a low-noise
voltage-controlled oscillator (VCO) provides a PLL with highly desirable characteristics.
This design is demonstrated in IBM 0.13μm CMOS technology. This work will further
the solution to the long-standing problems of high-speed clock generation and
distribution.


The nonlinear-gain unit is implemented by means of a fuzzy logic controller,
which is a nonlinear, proportional controller. This allows for a large gain to reduce the
locking time (TL), and a smaller gain for noise reduction.
The first advantage of this proposed design is its simplicity. It requires far fewer
circuit components than most adaptive PLLs.[4][18][69] Implementation of the fuzzy
controller can be done with a small, programmable logic device (PLD), or a
current-mirror, analog controller.[24] This will allow for high-speed operations. Another
advantage is that the controller can be designed by simple guidelines rather than design
optimization. In addition, this fuzzy controller can be added to any existing PLL to
improve its performance. It will work on both frequency-up converters and
frequency-down converters. While varying the gain with multiple charge pumps is
possible, this is generally limited to two or three different charge pumps each giving a
different gain. The NPLL in this dissertation has four different gains and the possibility
of having an unlimited number of different gain values.
The working idea for the gain controller is that for fast acquisition a large gain is
needed, but this leads to oscillation in the value of the voltage powering the VCO. This is
known as the ripple voltage. If the control voltage to the VCO varies, the output
frequency will also vary. This is an undesirable effect. To compensate, a lower gain
section is used in the operational region of the VCO. This reduces oscillations and
improves noise suppression. This nonlinear-gain will be designed with Modern Control
Theory, and implemented with a fuzzy controller.
The fuzzy controller will have a small, low-gain section, which is used around the
operating point of the VCO. This operating point is the input voltage of the VCO needed
to produce the desired output frequency. If designed correctly, it should correspond to
the center frequency of the VCO. This low-gain area should be surrounded by a high-gain
area for fast acquisition. When this idea was presented,[59][60] no implementation was
given. This idea is extremely good for clock generators where the desired output
frequency is known. The following sections describe why varying the gain affects the
signal-to-noise ratio (SNR), phase noise, locking range, locking time, and jitter.
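
The shape of the gain curve described above, a low-gain region around the VCO operating point surrounded by a high-gain region, can be sketched as a simple piecewise function. This is only an illustration of the idea, not the fuzzy controller developed in this dissertation: the function name, the region widths, the two gain values, and the linear blend between them are all assumptions made for the example.

```python
def nonlinear_gain(v_control, v_operating, inner=0.05, outer=0.15,
                   low_gain=1.0, high_gain=4.0):
    """Gain as a function of distance from the VCO operating point.

    All thresholds and gain values are hypothetical.
    """
    distance = abs(v_control - v_operating)
    if distance <= inner:
        return low_gain    # near lock: low gain for noise suppression
    if distance >= outer:
        return high_gain   # far from lock: high gain for fast acquisition
    # Linear blend so the gain changes smoothly between the two regions.
    fraction = (distance - inner) / (outer - inner)
    return low_gain + (high_gain - low_gain) * fraction


for v in (0.60, 0.68, 0.70, 0.90):
    print(f"control voltage {v:.2f} V -> gain {nonlinear_gain(v, 0.60):.2f}")
```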

Figure 3.3 Mathematical Model of the Nonlinear Phase-Lock Loop
3.2 THEORETICAL DEVELOPMENT

To show mathematically that the SNR increases as the gain of the NPLL
decreases, the transfer function of the NPLL must be found. Equation
3.1 can be determined from Figure 3.3, assuming an active lag filter.

$$H(s) = \frac{V_{out}}{V_{in}} = \frac{\dfrac{k_d k_a k_f k_o}{\tau_1}\left(1 + s\tau_2\right)}{s^2 + s\,\dfrac{k_d k_a k_f k_o \tau_2}{\tau_1} + \dfrac{k_d k_a k_f k_o}{\tau_1}}$$    (3.1)

In Equation 3.1, kd is the gain of the phase-detector and ko is the gain of the VCO. The
gain of the fuzzy nonlinear unit is kf and the gain of the lag filter is ka. The time
constants (RC) of the low-pass filter are τ1 and τ2. From Equation 3.1 and the control
theory normalization in Equation 3.2, Equation 3.3 for the natural frequency, ωn,
and Equation 3.4 for the damping factor, ξ, can also be obtained.
$$H(s) = \frac{2 s \xi \omega_n + \omega_n^2}{s^2 + 2 s \xi \omega_n + \omega_n^2}$$    (3.2)[29]

$$\omega_n = \sqrt{\frac{k_d k_a k_f k_o}{\tau_1}}$$    (3.3)

$$\xi = \frac{\omega_n \tau_2}{2}$$    (3.4)

The SNR of the output is equal to the SNR of the input times the input bandwidth
Bi divided by twice the noise bandwidth, Bo, of the phase-lock loop. This is shown in
Equation 3.5.

$$SNR_o = SNR_i \frac{B_i}{2 B_o}$$    (3.5)[2]

The noise bandwidth of the phase-lock loop is the integral of the squared magnitude of the
transfer function from zero to infinity.

$$B_o = \int_0^{\infty} \left| H(j 2\pi f) \right|^2 df = \frac{\omega_n}{2}\left(\xi + \frac{1}{4\xi}\right)$$    (3.6)

Substitution and reduction can be used to solve for the SNR of the output.

$$B_o = \frac{\omega_n^2 \tau_2^2 + 1}{4\tau_2} = \frac{k_d k_a k_f k_o \tau_2^2 + \tau_1}{4 \tau_1 \tau_2}$$    (3.7)

Equations 3.3 through 3.7 are combined to obtain the final SNR of the output.

$$SNR_o = SNR_i \frac{2 B_i \tau_1 \tau_2}{k_d k_a k_f k_o \tau_2^2 + \tau_1}$$    (3.8)

Now, it is simple to determine that as kf decreases the SNR increases, and as kf increases
the SNR decreases. Using the second definition of phase noise, the power in the noise
bandwidth, the phase noise is shown to be related to the SNR as follows:

$$\theta_n^2 = \frac{1}{2 \times SNR_o}$$    (3.9)[4]

From the equation, it can be determined that the mean-square phase noise is inversely
proportional to the SNR. As the SNR improves, the phase noise will decrease. This is
why SNR is a good measure of the performance of a PLL.
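
The dependence of the output SNR and phase noise on the fuzzy gain can be checked numerically. The Python sketch below evaluates Equations 3.3 through 3.6 and 3.9 for several values of kf. All numeric loop values are hypothetical and serve only to show the trend; they are not the parameters of the NPLL designed here.

```python
import math


def loop_parameters(k_d, k_a, k_f, k_o, tau_1, tau_2):
    """Natural frequency, damping factor, and noise bandwidth of the loop."""
    omega_n = math.sqrt(k_d * k_a * k_f * k_o / tau_1)  # Equation 3.3
    xi = omega_n * tau_2 / 2.0                          # Equation 3.4
    b_o = (omega_n / 2.0) * (xi + 1.0 / (4.0 * xi))     # Equation 3.6
    return omega_n, xi, b_o


def output_snr(snr_in, b_i, b_o):
    """Equation 3.5: SNRo = SNRi * Bi / (2 * Bo)."""
    return snr_in * b_i / (2.0 * b_o)


def phase_noise(snr_out):
    """Equation 3.9: mean-square phase noise from the output SNR."""
    return 1.0 / (2.0 * snr_out)


# Hypothetical loop values, for illustration only.
K_D, K_A, K_O = 0.5, 1.0, 1.0e6
TAU_1, TAU_2 = 1.0e-3, 1.0e-4
SNR_IN, B_I = 100.0, 1.0e6

results = []
for k_f in (1.0, 2.0, 4.0, 8.0):
    _, _, b_o = loop_parameters(K_D, K_A, k_f, K_O, TAU_1, TAU_2)
    snr_out = output_snr(SNR_IN, B_I, b_o)
    results.append((k_f, b_o, snr_out, phase_noise(snr_out)))
    print(f"kf={k_f:.0f}  Bo={b_o:.0f}  SNRo={snr_out:.1f}")
```

Raising kf widens the noise bandwidth, so the output SNR falls and the phase noise rises, which is the trade-off the low-gain region is meant to manage.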
Another effect of increasing the gain is the improvement of the locking range,
ΔWL, which means that as the gain is increased, a larger change in the input frequency is
required to cause the PLL to lose lock. This process is governed by Equation 3.10:

$$\Delta W_L = 4\pi\xi\omega_n = \frac{2\pi k_d k_a k_f k_o \tau_2}{\tau_1}$$    (3.10)

The fact that the lock can be lost makes acquisition time important. By design of
the NPLL, if the lock is lost, the gain will be increased so that locking-time will be
reduced. Locking-time, TL of a PLL is governed by Equation 3.11.
$$T_L = \frac{2\pi}{\omega_n} = \frac{2\pi}{\sqrt{\dfrac{k_d k_a k_f k_o}{\tau_1}}}$$    (3.11)[4]

By looking at this equation, it can be determined that as kf increases TL decreases and
vice versa.
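
A short numerical sketch makes the effect of the gain on locking range and locking-time concrete. It evaluates Equations 3.3, 3.4, 3.10, and 3.11 for a low and a high value of kf. The loop values are hypothetical and are used only to show the direction and size of the change.

```python
import math


def natural_frequency(k_d, k_a, k_f, k_o, tau_1):
    """Equation 3.3."""
    return math.sqrt(k_d * k_a * k_f * k_o / tau_1)


def locking_range(omega_n, tau_2):
    """Equation 3.10, with the damping factor taken from Equation 3.4."""
    xi = omega_n * tau_2 / 2.0
    return 4.0 * math.pi * xi * omega_n


def locking_time(omega_n):
    """Equation 3.11."""
    return 2.0 * math.pi / omega_n


# Hypothetical loop values, for illustration only.
K_D, K_A, K_O, TAU_1, TAU_2 = 0.5, 1.0, 1.0e6, 1.0e-3, 1.0e-4

low = natural_frequency(K_D, K_A, 1.0, K_O, TAU_1)   # kf = 1
high = natural_frequency(K_D, K_A, 4.0, K_O, TAU_1)  # kf = 4

print(f"kf=1: range={locking_range(low, TAU_2):.3e}  TL={locking_time(low):.3e} s")
print(f"kf=4: range={locking_range(high, TAU_2):.3e}  TL={locking_time(high):.3e} s")
```

Because the natural frequency grows with the square root of the gain, quadrupling kf halves the locking-time while quadrupling the locking range.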
The equation governing the absolute jitter, σ²ΔT, of a PLL was derived in the paper
“Jitter Optimization Based on Phase-Locked Loop Design Parameters”[40] and is defined
as:

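$$\sigma_{\Delta T}^2 = \left(\frac{4\pi^2 N_{VCO}}{\omega_o^2}\right) * \left\{ \frac{1}{2\xi\omega_n} + \frac{e^{-\xi\omega_n \Delta T}}{2(1 - \xi^2)} * \left( \frac{\sin(\omega_d \Delta T + \theta)}{\omega_n} - \frac{\cos(\omega_d \Delta T)}{\xi\omega_n} \right) \right\}, \quad \xi < 1$$

$$\sigma_{\Delta T}^2 = \left(\frac{4\pi^2 N_{VCO}}{\omega_o^2}\right) * \left\{ \frac{1}{2\xi\omega_n} - e^{-a\Delta T}\left( \frac{2\alpha\beta}{a + b} + \frac{\alpha^2}{a} \right) - e^{-b\Delta T}\left( \frac{2\alpha\beta}{a + b} + \frac{\beta^2}{b} \right) \right\}, \quad \xi > 1$$    (3.12)[40]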

NVCO is the noise from the VCO, and ωo is the center frequency of the VCO. ωn is the
natural frequency, and ξ is the damping factor. ΔT is the amount of time being observed.
The following equations are used to simplify the expression.[40][29]

$$\omega_d = \omega_n \sqrt{1 - \xi^2}$$    (3.12)[40]

$$\cos\theta = \sqrt{1 - \xi^2}$$    (3.13)

$$a, b = \xi\omega_n \mp \omega_n \sqrt{\xi^2 - 1}$$    (3.14)

$$\alpha = \frac{-a}{(b - a)}$$    (3.15)

$$\beta = \frac{b}{(b - a)}$$    (3.16)

When Equation 3.12 is plotted (see Figure 3.4), it can be seen that as the gain increases the
absolute jitter should decrease. The equation also shows that, past a gain of one, a rather
significant gain increase is required to produce a large reduction in the amount of jitter.


Figure 3.4 Absolute Jitter vs. Gain

However, as the gain increases the period jitter will increase. Period jitter is larger
and has more effect on the performance of clock-distribution systems than absolute jitter.
The equation for period jitter can be found by using the phase noise to determine the
period jitter-in-radians (see Equation 3.17). To find period jitter, the last step is to divide
jitter-in-radians by 2π times the center frequency of the oscillator (Equation 3.18).
This gives the equation for period jitter.
$$\text{jitter}_{rad} = \sqrt{10^{\theta_n^2 / 10} * 2}$$    (3.17)

$$\text{period jitter} = \frac{\text{jitter}_{rad}}{2\pi f_{osc}}$$    (3.18)

The plot of the period jitter versus gain is shown in Figure 3.5. From this plot, it is
clear that period jitter decreases as the gain decreases. These plots of gain versus
jitter show the problem for PLLs: absolute jitter falls as the gain rises, while period
jitter grows. In this dissertation, a fuzzy controller will be used to provide a high gain
to start and achieve lock and to reduce absolute jitter. Then, the gain of
the circuit transitions to a low gain to manage the period jitter. A normalized plot of both
absolute and adjacent-period jitter is shown in Figure 3.6.

Figure 3.5 Period Jitter vs. Gain

Figure 3.6 Normalized Plot of Absolute (Red) and Adjacent-Period (Blue) Jitter


4. FUZZY LOGIC
4.1 INTRODUCTION

The nonlinear phase-lock loop (NPLL) clock system requires a nonlinear-gain.
This nonlinear-gain is designed and implemented by means of a fuzzy controller. Lotfi
Zadeh did the first work on fuzzy logic in the 1960s.[54][43] He worked on multi-value
rule systems. Traditionally, logic systems used Boolean Logic, either yes or no, true or
false. Fuzzy logic allows for varying degrees of trueness or falseness. A simple example
of this is cold, warm, hot water, with warm being neither hot nor cold. The new system
has elements within sets that can have partial membership to multiple sets.
A field that has benefited from fuzzy logic is controls. Fuzzy logic allows
nonlinear and complex systems to be controlled. The fuzzy controller is capable of
controlling systems that are not well modeled, while providing robust control for noisy or
unstable systems. Another advantage is the simplicity of the controller design. This
allows a human expert to describe the controller properties in the form of if and then
statements. The if and then statements are known as rules. Each rule has a separate
membership function for each input and output. A simple example of this would be the
temperature of the water out of the faucet. If water is cold then increase hot. This form
allows a very intuitive method to describe a controller. [54][43]
Fuzzy controls implement a real-time, expert-guided, nonlinear controller. This is
capable of taking linguistic rules from an expert and implementing them. A piece-wise,
linear controller can be made with smooth transitions to implement a nonlinear control
curve.
4.2 FUZZY LOGIC CONTROLLER

A fuzzy logic controller (FLC) is comprised of three parts: fuzzification, inference
engine, and defuzzification. There is also the knowledge-base, which is used to design the
controller but is not an actual component of the controller. The block diagram of a fuzzy
controller is shown in Figure 4.1. [54]

Figure 4.1 Block Diagram of a Fuzzy Controller
4.2.1 The Knowledge-Base

The most important part of an FLC is the knowledge-base. It determines the rules
that govern the controller. The expert, or the system designer, provides this. The form
that the rules use is if, premise, then, consequence. [14] This describes the time-varying
inputs and outputs. Some examples of possible inputs and outputs are error, change-in-error,
force, voltage, current, pressure, and temperature. These variables are then broken
down into fuzzy sets, such as: negative-big (NB), negative-medium (NM), negative-small
(NS), zero (Z), positive-small (PS), positive-medium (PM), positive-big (PB), or cold,
warm, and hot, as seen in Figures 4.2 and 4.3.

Figure 4.2 Seven Membership Functions Fuzzy Sets

Figure 4.3 Three Membership Functions Fuzzy Sets

The fuzzy sets are used in the rules. An example is: If error is PS and change-in-error
is NB, then force is NB. All the rules for a system are gathered together to make the
knowledge-base. The knowledge-base for a small two-input, one-output system governing an
inverted pendulum is shown in Figure 4.4, where e is the error, ė is the change-in-error, and
the output value is contained in the grid. From this table, several important points can be
shown. The first is how, for a positional controller, the outputs are set along diagonals.
This is no accident. It guarantees the system will converge to zero error and zero
change-in-error.


Figure 4.4 Knowledge-Base
4.2.2 Fuzzification

Fuzzification is the process of turning crisp inputs into fuzzy inputs. Fuzzy sets
are used to achieve fuzzification, and are derived from the rules. Each fuzzy set is
comprised of membership functions. Although the membership functions are used in converting a
crisp number to a fuzzy number, they themselves are not fuzzy. Membership functions
are precise mathematical functions. They can have varying shapes such as Gaussian,
trapezoidal, sharp peak, triangular, and others, as seen in Figure 4.5.[43]

Figure 4.5 Membership Functions

Triangular functions are often used to simplify calculations. An example of a
fuzzy set is seen in Figure 4.6. This is also an example of a fuzzy partition, where the
membership values of all the sets sum to one for any input.
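
The fuzzy-partition property can be checked with a short Python sketch. It builds seven evenly spaced triangular membership functions on the normalized range from -1 to 1 and confirms that the degrees of membership sum to one. The even spacing and the function names are assumptions made for the example; the actual shapes used in this work are those shown in the figures.

```python
def triangular(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


LABELS = ["NB", "NM", "NS", "Z", "PS", "PM", "PB"]
STEP = 1.0 / 3.0                                   # assumed even spacing
PEAKS = [-1.0 + i * STEP for i in range(len(LABELS))]


def memberships(x):
    """Degrees of membership of a normalized input in all seven sets."""
    return {
        label: triangular(x, peak - STEP, peak, peak + STEP)
        for label, peak in zip(LABELS, PEAKS)
    }


for x in (-1.0, -0.4, 0.0, 0.2, 1.0):
    degrees = memberships(x)
    active = {k: round(v, 3) for k, v in degrees.items() if v > 0.0}
    print(f"x={x:+.1f}  active={active}  sum={sum(degrees.values()):.3f}")
```

At most two neighboring sets are active for any input, which is one reason triangular functions keep the later inference step inexpensive.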

Figure 4.6 Seven Membership Functions, Fuzzy Set

The process of changing a crisp input to a fuzzy input usually starts with the input
being scaled to one. This is called normalization. Although not necessary, it is often
done. Next, the normalized input is applied to the membership functions to determine
which fuzzy sets the input belongs to, and these active sets are noted, as shown in
Figure 4.7. Then the mathematical output of each active membership function is computed.
This is called the degree of membership. The information on which fuzzy sets are active,
and their degrees of membership, is then handed to the inference engine.
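
The fuzzification steps described above can be sketched in Python as follows. This is only an illustration: the evenly spaced centers, the half-width of 1/3, and the names `triangle` and `fuzzify` are assumptions made for the example and are not taken from the NPLL design.

```python
# Seven triangular sets spanning the normalized range [-1, 1] (assumed layout).
LABELS = ["NB", "NM", "NS", "Z", "PS", "PM", "PB"]
CENTERS = [-1.0, -2 / 3, -1 / 3, 0.0, 1 / 3, 2 / 3, 1.0]
HALF_WIDTH = 1 / 3  # equal to the spacing, so neighbouring sets overlap by half


def triangle(x, center, half_width):
    """Degree of membership of x in a symmetric triangular set."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def fuzzify(raw, full_scale):
    """Normalize a crisp input, then return the active sets and their degrees."""
    x = max(-1.0, min(1.0, raw / full_scale))  # normalization step
    degrees = {lab: triangle(x, c, HALF_WIDTH) for lab, c in zip(LABELS, CENTERS)}
    return {lab: d for lab, d in degrees.items() if d > 0.0}  # active sets only


print(fuzzify(0.5, 1.0))  # two neighbouring sets are active
```

With this layout at most two sets are active for any input, and their degrees of membership sum to one, which is the fuzzy-partition property mentioned for Figure 4.6.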

Figure 4.7 Fuzzy Processing


4.2.3 Inference Engine

The inference engine is where the degree of membership from the input fuzzy
sets is combined with the corresponding output fuzzy sets prescribed by the
knowledge-base. This is done by taking the non-zero fuzzy sets, determining which rules
they belong to, and activating those rules. Conjunction is the process where the degree of
membership of the input is applied to the corresponding output fuzzy set. The main method
is the minimum/maximum method, in which the output fuzzy set is cut off at the value of
the degree of membership from the input. An example of this is the rule: if X is A then Z
is C, with the input set to M. The value of the fuzzy set A at M is 0.35. This degree of
membership is transferred to the output fuzzy set C. The part of C above the line at 0.35
is disregarded, leaving the final fuzzy output. This is shown in Figure 4.8.

Figure 4.8 Fuzzification Using Minimum/Maximum Method

In more complex systems, consider the rule: if error is A and change-in-error
is B, then force is C. The output of the membership function for error is A is minimized
with the output of the membership function for change-in-error is B. The minimum function
simply takes the smaller of these two values and is represented in the rule base by the
word and. In the opposite case, consider the rule: if error is A or change-in-error is B,
then force is C. Here the output of the membership function for error is A is maximized
with the output for change-in-error is B. The maximum function simply takes the larger of
these two values and is represented in the rule base by the word or. The maximum function
is also used when two input fuzzy sets lead to the same output fuzzy set. The outputs of
the inference engine, one for each active rule, are then passed to the defuzzifier. In
some designs, the inference and the defuzzification can occur in the same circuit.
[54][14][43]
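
A minimal Python sketch of the minimum/maximum method is given below. The triangular output set `set_c`, its center of 0.5, and the function names are hypothetical; only the firing strength of 0.35 comes from the example in the text.

```python
def fuzzy_and(*degrees):
    """'and' in a rule: take the smallest degree of membership."""
    return min(degrees)


def fuzzy_or(*degrees):
    """'or' in a rule: take the largest degree of membership."""
    return max(degrees)


def clip(output_set, strength):
    """Cut an output membership function off at the rule's firing strength."""
    return lambda z: min(strength, output_set(z))


def set_c(z):
    """Hypothetical triangular output set C, centered at 0.5 with half-width 0.5."""
    return max(0.0, 1.0 - abs(z - 0.5) / 0.5)


# Rule "if X is A then Z is C" with A evaluated at M equal to 0.35.
clipped_c = clip(set_c, 0.35)
print(clipped_c(0.5))          # the peak of C is cut down to 0.35
print(fuzzy_and(0.35, 0.8))    # two-input rule joined by 'and'
print(fuzzy_or(0.35, 0.8))     # two-input rule joined by 'or'
```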
4.2.4 Defuzzifier

Defuzzification occurs in the defuzzifier. The defuzzifier changes the fuzzy
output to a single, crisp output. There are many methods to accomplish this, but there are
two main ones. The center of gravity (COG) method calculates the COG of the shape made
from all the active membership functions after they have been scaled by the inputs. This
forms the value of the crisp output. The equation for COG is Equation 4.1. An example is
shown in Figure 4.9.

$$\text{Crisp Output} = \frac{\int_U \mu_a(x)\, x \, dx}{\int_U \mu_a(x)\, dx} \tag{4.1}$$ [14][54]

Figure 4.9 Fuzzification and Defuzzification with Center of Gravity Method
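
As a rough numerical illustration of Equation 4.1, the sketch below approximates both integrals with a midpoint sum. The step count, the integration limits, and the example output set (a triangle centered at 0.5 and clipped at 0.35) are assumptions made for the example.

```python
def center_of_gravity(mu, lo, hi, steps=1000):
    """Approximate Equation 4.1 over [lo, hi] with a midpoint Riemann sum."""
    dx = (hi - lo) / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        m = mu(x)
        numerator += m * x * dx
        denominator += m * dx
    if denominator == 0.0:
        raise ValueError("no output fuzzy set is active")
    return numerator / denominator


def clipped_triangle(x):
    """Hypothetical output set: triangle centered at 0.5, clipped at 0.35."""
    return min(0.35, max(0.0, 1.0 - abs(x - 0.5) / 0.5))


print(center_of_gravity(clipped_triangle, -1.0, 1.0))  # close to 0.5 by symmetry
```

When several rules are active, `mu` would be the pointwise maximum of the clipped output sets, which is why this method costs more computation than the weighted average.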


The other major method is the weighted average, which is more often used because it
needs less computational processing. The weighted average method is governed by
Equation 4.2, where x is the location of the center of each output fuzzy set, and μ_a(x)
is the value of the conjunction for that fuzzy set. An example is shown in Figure 4.10.

$$\text{Crisp Output} = \frac{\sum_U \mu_a(x)\, x}{\sum_U \mu_a(x)} \tag{4.2}$$ [14][54]

Figure 4.10 Fuzzification and Defuzzification with Weighted Average Method
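
The weighted average of Equation 4.2 can be sketched in a few lines of Python. Here each active output set contributes its center location weighted by its conjunction value; the function name and the example numbers are assumptions.

```python
def weighted_average(centers, strengths):
    """Equation 4.2: output-set centers weighted by their conjunction values."""
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule is active")
    return sum(c * s for c, s in zip(centers, strengths)) / total


# Two active rules: output centers 0.0 and 0.5 with strengths 0.25 and 0.75.
print(weighted_average([0.0, 0.5], [0.25, 0.75]))  # 0.375
```

Only one multiplication per active rule and a single division are needed, in contrast to the integration required by the center of gravity method.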

4.3 IMPLEMENTATION

There are two major ways to implement a fuzzy controller, and each has
advantages and disadvantages. They are the application-independent fuzzy processor
(AIFP) and the application-specific fuzzy hardware (ASFH). [25][67]

The most common type of fuzzy controller is the AIFP, which uses a software
implementation. AIFPs are the most versatile, but they were not considered for use in this
dissertation. An AIFP would use a microprocessor to run code that manipulates data taken
from the NPLL by way of an analog-to-digital converter (ADC). This method is far too
slow to be accurate. Changes in the input frequency would go unnoticed by the controller
for long periods of time before the output is adjusted. Voltage ripples going into the
voltage-controlled oscillator (VCO) would also be hard to control. The next possibility is
the ASFH family of circuits.
The ASFH hardware method is faster than the AIFP. It can also be implemented on a
chip beside the phase-lock loop, a requirement for this dissertation. Fuzzy hardware can
be broken down further into the general-purpose fuzzy processor (GPFP), application-set
fuzzy processor (ASFP), task-dedicated fuzzy processor (TDFP), and look-up table
(LUT). The GPFP is capable of emulating many types of fuzzy controllers. This is more
than is needed in the phase-lock loop (PLL), and would require unnecessary circuitry. This
would use more power and require a larger circuit area than can be justified. The GPFP
is also the slowest type of hardware-based fuzzy processor, which is not a desired trait.
The next type of fuzzy processor is the ASFP. It is more limited than the GPFP, but can
still perform multiple types of fuzzy processing by using different membership functions
and defuzzification techniques. Again, this type of fuzzy processor hardware has extra
circuitry that requires power and area that will not be used. The TDFP only handles one
system of rules. This type of fuzzy processor is the smallest and consumes the least
power. It accomplishes this by stripping away all unnecessary hardware. It would be a
good choice for the NPLL. There is one other option, the LUT. It differs from the other
types of fuzzy processors because it does not process the data in real time. It stores all
possible outputs for each input in a programmable logic device (PLD). This makes it very
small and very fast. This circuit was chosen to implement the fuzzy processor for the
NPLL. [25][67]
4.3.1 Look-Up Table

The last type of fuzzy controller is the LUT. While technically a subset of digital
ASFH, this type of controller operates in a very different manner. The other
fuzzy controllers all go through the steps of fuzzification, inference engine, and
defuzzification. In the LUT method, all the answers are pre-calculated and stored ahead
of time. When an input value is applied, its corresponding value is output. This makes
the LUT the fastest method, one of the smallest, and the least subject to internal noise.
The drawbacks are the inflexibility and the addition of a quantization error.
The LUT's implementation is simple to design. First, the fuzzy controller is
developed, normally in a software program. Then all possible inputs are applied to the
design, and their corresponding outputs are recorded. This data is written in the form of a
table, which is used to program the LUT. The simplest method to perform this is with a
decoder that takes the digital input value and changes it to a signal that has one output
line for each possible input value. This is known as a one-hot signal. These individual
signals turn on elements in the encoder that store the output. [25][67]
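
The LUT design flow described above can be sketched in Python as follows. The `saturating_gain` function is only a stand-in for a fully designed fuzzy controller, and the names, the signed range, and the use of Python's built-in rounding are assumptions for illustration.

```python
def build_lut(controller, bits=4):
    """Apply every possible signed input code to the controller and store the result."""
    full = 2 ** bits - 1  # 15 for a 4-bit magnitude
    return {code: round(controller(code / full) * full)
            for code in range(-full, full + 1)}


def one_hot(code, bits=4):
    """Decoder: one output line per possible input value, exactly one asserted."""
    full = 2 ** bits - 1
    lines = [0] * (2 * full + 1)
    lines[code + full] = 1
    return lines


def saturating_gain(x):
    """Placeholder controller (hypothetical): gain of 2, limited to [-1, 1]."""
    return max(-1.0, min(1.0, 2.0 * x))


LUT = build_lut(saturating_gain)
print(LUT[3], LUT[8], LUT[-8])   # pre-calculated outputs, read back by input code
print(one_hot(3).index(1))       # index of the single active decoder line
```

At run time the controller is never evaluated; the input code simply selects one stored entry, which is where the speed advantage and the quantization error both come from.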
4.4 DESIGN OF THE CONTROLLER

Control theory is used to design the shape of the nonlinear-gain circuit. In the
nonlinear element, a small fuzzy center set is established, [-a,0,a], and a rule set of
[-1,0,1]. The gain of the circuit is adjusted by varying the value of a (Figure 4.11).


Figure 4.11 Control Surfaces of a Three-Rule Controller

Three values of a must be determined: the base line, the value for the maximum
overshoot, and the value for the rise-time. The first is the simplest: the base line is
a = 1, where the gain value is 1. To find the value for the maximum overshoot, a is
reduced until half of the overshoot is achieved, and this value is recorded. A larger
damping factor results in a smaller overshoot. With a large damping factor, there will be
less jitter, but little improvement can be achieved beyond a damping factor of 1.15. [39]
The desired rise and fall-times are achieved by again decreasing a, and this value
is also recorded. The three gain lines are determined by the values of a. These are a gain
of 1, the gain for the overshoot, and the gain for the rise and fall time. These gain lines
are then drawn. The area surrounding the middle line was balanced so that the triangle
formed from the smallest value of a to the mid line is approximately the same as the area
of the triangle formed from the mid-line to the gain of the base line as seen in Figure
4.12. The XY coordinates of the transition points are used to set the final rule and center
values. This system produces a faster locking-time than the standard PLL.


Figure 4.12 Control Surfaces for Curve Fitting

In Control Theory, these steps have been previously laid out. To reduce the noise
further, a small low-gain area can be made (see Figure 4.13).

Figure 4.13 Control Surfaces for a Low Gain Area

This is achieved with a fourth value of a. The fourth value is larger than the base line
value. After experimenting in MATLAB, a process was developed that uses 1/8 the area
of the other triangles formed by the rise-line and the overshoot-line. This dramatically
reduces the noise, but the precise operating voltage of the VCO is needed. The lower the
gain in this small section, the larger the noise reduction will be. The nonlinear-gain
circuit can also be used in a different manner to reduce noise. The phase detector, while
theoretically linear, has a small distortion around zero. [54] This can be corrected by
post-distorting its output. This will also reduce the phase noise. From MATLAB simulations
and experimenting with the NPLL, the input rules were found to be [-1, -0.6, -0.3, -0.1, 0,
0.1, 0.3, 0.6, 1], and the output centers are [-1, -0.9, -0.6, -0.05, 0, 0.05, 0.6, 0.9, 1]. This
control surface is shown in Figure 4.14.

Figure 4.14 Control Surfaces
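
Under the assumption of triangular input sets that form a fuzzy partition, together with weighted-average defuzzification, a single-input controller reduces to linear interpolation between the (input center, output center) pairs. The Python sketch below evaluates the control surface that way from the values quoted above. It is not the MATLAB program used in the dissertation, and the function name is hypothetical.

```python
INPUT_CENTERS = [-1, -0.6, -0.3, -0.1, 0, 0.1, 0.3, 0.6, 1]
OUTPUT_CENTERS = [-1, -0.9, -0.6, -0.05, 0, 0.05, 0.6, 0.9, 1]


def control_surface(x):
    """Piecewise-linear control surface for a normalized input in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    for i in range(len(INPUT_CENTERS) - 1):
        x0, x1 = INPUT_CENTERS[i], INPUT_CENTERS[i + 1]
        if x0 <= x <= x1:
            w = (x - x0) / (x1 - x0)  # degree of membership in the right-hand set
            return (1 - w) * OUTPUT_CENTERS[i] + w * OUTPUT_CENTERS[i + 1]


for x in (0.05, 0.1, 0.2, 0.6):
    print(x, control_surface(x))
```

The small output centers at ±0.05 give the low-gain region around zero, while the steep segment between 0.1 and 0.3 provides the high gain used for fast acquisition.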

If a single-input, single-output controller is used, these values are all that are
needed. If a LUT is used, one further step is necessary: the quantization of the control
surface. For this, the number of bits the system is using must be known. To this end, a
MATLAB program was developed that produces the values of the LUT when given the rules
and the fuzzy sets. From this, it can be determined that four bits are the minimum
requirement to achieve a reasonable approximation of the control surface. With anything
beyond seven bits, the hardware cost does not result in justifiable improvements in the
resolution of the approximation of the control surface. Examples of control surfaces and
quantization are shown in Figures 4.14 and 4.15.

Figure 4.15 Ideal Control Surfaces (Blue)
4-Bit Quantized Control Surfaces (Red)

Figure 4.16 Ideal Control Surfaces (Blue)
6-Bit Quantized Control Surfaces (Red)


To implement the designed controller on the NPLL hardware, more steps are
needed. The fuzzy sets were designed with a range of -1 to 1. This uses decimals that are
hard to implement in a digital system, so the values must be scaled to the number of bits
in the physical system. In this dissertation, the hardware was designed as a 4-bit number
plus a 1-bit sign to produce a -15 to 15 range (see Figure 4.17). All possible inputs and
outputs are calculated, and then the outputs are quantized. This is shown in Table 4.1.
The circuitry for this will be discussed in the next chapter.

Figure 4.17 Ideal Control Surfaces Scaled for 4-bit System
(Blue) 4-Bit Quantized Control Surfaces (Red)
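
The scaling from the normalized range to a sign bit plus a 4-bit magnitude might look like the Python sketch below. The round-half-up choice, the treatment of zero as positive, and the function names are assumptions; the sketch illustrates the encoding only and is not claimed to reproduce the entries of Table 4.1.

```python
def to_sign_magnitude(value, bits=4):
    """Scale a value in [-1, 1] to (sign bit, unsigned magnitude)."""
    full = 2 ** bits - 1                          # 15 for a 4-bit magnitude
    clamped = max(-1.0, min(1.0, value))
    magnitude = int(abs(clamped) * full + 0.5)    # round half up (assumed)
    sign = 1 if clamped < 0 and magnitude > 0 else 0
    return sign, magnitude


def from_sign_magnitude(sign, magnitude):
    """Recover the signed integer code in the -15 to 15 range."""
    return -magnitude if sign else magnitude


print(to_sign_magnitude(-0.6))   # (1, 9)
print(to_sign_magnitude(1.0))    # (0, 15)
```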


Input    Fuzzy Output    Input    Fuzzy Output
-15      -16             15       16
-14      -15             14       15
-13      -15             13       15
-12      -15             12       15
-11      -15             11       15
-10      -15             10       15
-9       -14             9        14
-8       -13             8        13
-7       -12             7        12
-6       -11             6        11
-5       -10             5        10
-4       -9              4        9
-3       -6              3        6
-2       -3              2        3
-1       -1              1        1
0        0               0        0

Table 4.1 Digital Inputs and Outputs of the 4-Bit Fuzzy Controller


5. SUBSYSTEM DESIGN AND SIMULATION
In this dissertation, a clock generator and a clock-distribution system were
designed using a new method of variable gains by means of a fuzzy controller. The
subsystems in this section were combined to form the design of the nonlinear phase-lock
loop (NPLL). Each of these subsystems was built and tested for correct operation.
The process used to make the NPLL was the IBM 0.13μm dm technology. It is a
2.5/1.2V process, which means that it supports both 2.5V transistors, which are
typically used for the I/O pads, and 1.2V transistors, which are used in the core
circuitry to save power. The IBM 0.13μm process is a radio frequency (RF) process capable
of forming capacitors, resistors, and inductors. It has three thin layers, two thick
layers, and two RF layers. The process is capable of forming transistors as small as
0.12μm in length, but warns that larger gates should be used for a reliable yield.
Throughout this design, gates with lengths of 0.14μm were used, unless faster speeds were
required for particular circuits.
5.1 BASIC GATES

For the NPLL, several minor gates are needed, such as the Inverter, ANDs, ORs,
NANDs, NORs, and Buffers. These gates are used by the higher-level sub-circuits.


5.1.1 The Inverter

The inverter is a basic circuit that takes a signal, either high or low, and outputs
the opposite. In measuring the performance of this circuit or others, several
characteristics are examined.
The inverter works by having PMOS and NMOS transistors in series between Vdd and
Vss, as seen in Figure 5.1. Vdd is the upper rail voltage that powers the circuits; for
this process it is 1.2V. Vss is the lower rail voltage, which is 0V and is considered
ground. The input is connected to the gates of both transistors, and the output is taken
from the junction of the two transistors.

Figure 5.1 Inverter Schematic

The PMOS transistor is connected between Vdd and the output, while the NMOS
transistor is connected between the output and Vss. If the input is high, the PMOS is
turned off and the NMOS is turned on. This allows the output to be pulled down through
the NMOS transistor. When the input is low, the PMOS is turned on and the NMOS is turned
off. This allows the output to be pulled high through the PMOS transistor.
The propagation delay is the time from when the input changes to when the
corresponding output changes. These measurements are taken at the 50 percent value of
the signal. The propagation delay for the inverter output to change from high-to-low is
tpHL = 16.9ps, and from low-to-high is tpLH = 20.2ps. The schematic version has a
maximum power consumption of 91.3μW and an average power consumption of 1.76μW. These
power measurements are taken at a signal frequency of 1GHz unless otherwise stated. For
the schematic version of the inverter, the rise-time is 22ps and the fall-time is 16ps.
This is achieved with an appropriate load for the circuit, in this case two NAND gates.
Observing the layout, the inverter is found to require an area of 3.14μm X
1.86μm. The layout is extracted with capacitance, and simulated for power consumption
and timing criteria. The maximum power consumed is 78.86μW, with an average power
consumption of 1.37μW. This is less than the schematic required. The rise and fall-times
for the layout are Tr = 13.5ps and Tf = 18.7ps. These measurements are less than those of
the schematic, due in large part to the fact that the layout simulations did not have a
load attached. This was done because attaching a separate load and then extracting and
simulating it would be a time-consuming and difficult process for each of the
sub-circuits. The propagation delays for the layout version of the inverter are
tpHL = 13.7ps and tpLH = 17.5ps.


5.1.2 NAND Gate

The NAND Gate is the most common building block of digital circuits,
because almost all digital circuitry can be built from simple NAND Gates. The inputs
can be tied together to form an Inverter, four can be combined to form an XOR Gate, and
several can be connected to form larger AND or OR Gates. The NAND Gate only outputs a
low signal when both inputs, A and B, are high. This is done by having two PMOS
transistors connected in parallel between Vdd and the output, with their gates attached
to their respective inputs. There are also two NMOS transistors connected in series from
the output to ground, with their gates connected to their respective inputs. This is
shown in Figure 5.2. When either of the input signals is low, one or both of the PMOS
transistors are on, pulling the output high. If both inputs are high, the two NMOS
transistors are turned on, pulling the output low. The timing and power consumption data
are located in Table 5.1. The logic characteristics of the NAND Gate are found in
Table 5.2.
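
The claim that other gates can be built from NAND Gates can be checked with a short logic-level sketch in Python. It models logic values only, not the transistor circuits, and the function names are chosen for the example.

```python
def nand(a, b):
    """2-input NAND: low only when both inputs are high."""
    return 0 if (a and b) else 1


def inv(a):
    """Inverter: both NAND inputs tied together."""
    return nand(a, a)


def and_(a, b):
    """AND: a NAND followed by an inverter."""
    return inv(nand(a, b))


def or_(a, b):
    """OR: a NAND of the inverted inputs."""
    return nand(inv(a), inv(b))


def xor(a, b):
    """XOR built from four NAND gates."""
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand(a, b), and_(a, b), or_(a, b), xor(a, b))
```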

Figure 5.2 Two Input NAND Gate Schematic


Schematic    Tr (ps)   Tf (ps)   TpHL (ps)   TpLH (ps)   Pmax (μW)   Pavg (μW)
Inverter     48        27        25          29          3.25        1.86
NOR          58.2      41.38     34          45          89.6        5.54
NAND         58        56        39          43          62          2.47
NAND3        49        46.7      20          31.3        139.6       3.46
NAND4        65        37        29.9        29          155         2.7
AND          50        31        61.5        47.1        117.5       5.48
AND3         66.88     20.8      53.6        66.6        211.7       8.99
Buffer       31.9      30.9      50.4        66.6        148.8       10.4
XOR          82.5      44.6      86          97          203.1       20.17
AND_XOR      195.5     105       49          49.5        316         9.73
DFFclrSYS    61.6      66.7      160         73          100         7.488
DFF4         61.2      61.4      127         161.3       160.8       29.21
DFF3         80        75.5      168.6       135.1       160.8       32.29
TFF2         96        79.7      189.6       97.8        172.4       29.29
TFF3         137.1     102       200         117.1       260.5       21.73
TFF4         138.1     113.3     184         162         409         33.02
TFF5         87.6      77.6      128.3       168         302         42.78

Table 5.1(a) Schematic Timing and Power Consumption

Layout       Tr (ps)   Tf (ps)   TpHL (ps)   TpLH (ps)   Pmax (μW)   Pavg (μW)   Length (μm)   Width (μm)
Inverter     18.6      13.3      13.7        17.5        78.86       2.76        1.86          3.14
NOR          39.8      32        28          42.7        78.8        4.45        3.1           2.32
NAND         30        25        22          26          67.6        2.15        3.12          2.4
NAND3        28        30.7      18.5        26.1        83.9        1.69        2.94          3.44
NAND4        23.1      32.5      24.1        22.1        125.9       1.53        3.48          4.68
AND          19        13        50.2        53          121.8       5.01        4.22          3.12
AND3         35        12.8      58.1        51          233         9.18        6.44          3.18
Buffer       16.2      21        63          40          114.5       7.5         2.8           3.12
XOR          62.6      44.3      81          114         209         23.9        8.14          3.52
AND_XOR      24        72        55          99          303.8       9.6         7.78          4.1
DFFclrSYS    79        80        400         480         105.5       18.36       11.8          3.8
DFF4         36        45        158.6       140         210.5       54.07       10.43         3.5
DFF3         59        65.4      174         154         243         57.39       10.9          3.5
TFF2         58.2      67.3      200         213         230.8       44.95       19.9          3.8
TFF3         100       89.7      222.7       236         246.5       35.17       19.6          4.1
TFF4         85.2      80.3      177.7       167         453.2       59.52       18.06         4.1
TFF5         50.9      56.5      149.8       161.2       310.6       71.85       18.4          3.5

Table 5.1(b) Layout Timing and Power Consumption

Input A    Input B    Output NAND
0          0          1
0          1          1
1          0          1
1          1          0

Table 5.2 Logic Characteristics of the NAND Gate


5.1.3 3-Input NAND Gate

The 3-Input NAND Gate is a modification of the NAND Gate that allows a third
input, C, to be applied. The basic functionality is the same: all three of the inputs
must be high for the output to be low. The extra transistors required for the third input
slow down the circuit, and it requires more power to operate. The timing and power
consumption are located in Table 5.1.
5.1.4 4-Input NAND Gate

The 4-Input NAND Gate has one more input than the 3-Input NAND Gate, while
maintaining the same functionality. The extra transistors add more capacitance, slowing
the circuit down further and requiring even more power to operate. The data for the
4-Input NAND Gate is located in Table 5.1.
5.1.5 2-Input NOR Gate

The 2-Input NOR Gate is the other basic building block for most digital circuits.
It produces a low when either of its inputs, A or B, goes high. This is done by having
two PMOS transistors connected in series between Vdd and the output, with their gates
attached to their respective inputs. There are also two NMOS transistors connected in
parallel from the output to ground, with their gates connected to their respective
inputs. This is shown in Figure 5.3. When both input signals are low, both PMOS
transistors are turned on, pulling the output high. If either input is high, the
corresponding NMOS transistor is turned on, pulling the output low. The logic
functionality of the NOR is shown in Table 5.3, and the timing data is shown in Table 5.1.


Figure 5.3 Two Input NOR Gate Schematic
Input A    Input B    Output NOR
0          0          1
0          1          0
1          0          0
1          1          0

Table 5.3 Logic Characteristics of the NOR Gate
5.1.6 2-Input AND Gate

The 2-Input AND Gate is made of a 2-Input NAND Gate followed by an
Inverter. This makes the output of the AND Gate high only when both inputs are
high. The logic characteristics of the AND Gate are located in Table 5.4. The timing and
power consumption are located in Table 5.1.
Input A    Input B    Output AND
0          0          0
0          1          0
1          0          0
1          1          1

Table 5.4 Logic Characteristics of the AND Gate


5.1.7 3-Input AND Gate

The 3-Input AND Gate is composed of a 2-Input NAND Gate feeding a 2-Input
NOR Gate. The other input of the NOR Gate is formed by inverting the third input of the
AND Gate. This circuit produces a high output only if all three inputs are high. It also
consumes significantly more power than the 2-Input AND Gate. The data for the 3-Input
AND Gate is located in Table 5.1.
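
That this NAND, NOR, and inverter arrangement behaves as a 3-input AND can be confirmed with a small truth-table check in Python. The sketch assumes the inverted input is the third input, C; the function names are chosen for the example.

```python
from itertools import product


def nand2(a, b):
    return int(not (a and b))


def nor2(a, b):
    return int(not (a or b))


def inv(a):
    return int(not a)


def and3(a, b, c):
    """NAND of A and B feeding a NOR whose other input is the inverted C."""
    return nor2(nand2(a, b), inv(c))


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, and3(a, b, c))
```

The NOR output is high only when the NAND output is low (A and B both high) and the inverted C is low (C high), which is the AND of all three inputs.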
5.1.8 Buffer

The Buffer is a circuit that takes a weak signal and outputs the same signal, only
stronger. This is accomplished by the use of two Inverters. The first is a small inverter,
so that it does not put a large load on the previous stage of the circuit. The first
inverter then powers the second, larger inverter, which then powers the following
circuits. The schematic is shown in Figure 5.4. The characteristics are listed in
Table 5.1.

Figure 5.4 Buffer Schematic


5.1.9 XOR

The exclusive OR (XOR) gate is a circuit that outputs a high only if exactly one of
its two inputs is high. If both or neither of its inputs are high, the output will be
low. The schematic is shown in Figure 5.5. This circuit is often used in a Toggle
Flip-Flop (TFF), or in the signed-magnitude subtractor. Both of these will be discussed
later. Timing data on the XOR Gate can be seen in Table 5.1. The logic characteristics of
the XOR Gate are found in Table 5.5.

Figure 5.5 Schematic of a XOR
Input A    Input B    Output XOR
0          0          0
0          1          1
1          0          1
1          1          0

Table 5.5 Logic Characteristics of the XOR Gate
5.1.10 AND_XOR

The AND_XOR is not a standard gate. This circuit is a custom design. It contains
combinational logic that acts like an XOR Gate with one input attached to an AND Gate
(Figure 5.6). This was done so that the enable in the counters would have equal delay for
each of the TFFs in the counters. The logic functionality is shown in Table 5.6, and the
timing data is shown in Table 5.1.

Figure 5.6 Schematic of an AND_XOR
Input A    Input EN2    Input EN1    Output AND_XOR
0          0            0            0
0          0            1            0
0          1            0            0
0          1            1            1
1          0            0            1
1          0            1            1
1          1            0            1
1          1            1            0

Table 5.6 Logic Characteristics of the AND_XOR Gate
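
The logic in Table 5.6 corresponds to A XOR (EN1 AND EN2). The short Python sketch below checks that expression against every row of the table; it models the logic function only, and the names are chosen for the example.

```python
def and_xor(a, en2, en1):
    """XOR of A with the AND of the two enables."""
    return a ^ (en1 & en2)


# Rows of Table 5.6 as (A, EN2, EN1) -> output.
TABLE_5_6 = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 0, (0, 1, 1): 1,
    (1, 0, 0): 1, (1, 0, 1): 1, (1, 1, 0): 1, (1, 1, 1): 0,
}

for inputs, expected in TABLE_5_6.items():
    print(inputs, and_xor(*inputs), expected)
```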
5.1.11 Fast Inverter

Another nonstandard gate is the fast inverter. It is a circuit specific to this design
that produces a complemented version of the input as well as a buffered version. It also
strengthens the input signal so that it can power a large load (Figure 5.7). This circuit
is exclusively used for the NAND decoder, so that all the transistors for each word line
can be turned on when the correct input is applied. Timing and power figures can be seen
in Table 5.1.

Figure 5.7 Fast Inverter Schematic
5.1.12 D Flip-Flop

The D Flip-Flop (DFF) is a digital device used to store the value of an input when
a clock edge occurs. This can be achieved by many different methods. The most
common is the use of two D latches (Figure 5.8). When the clock is low, the input
data can pass into the first latch from the input pin D. This state is called transparent.
When the clock changes to high, the first latch closes, and its value is maintained with a
feedback loop. At this time, the second latch opens, and allows the value to flow through
it to the output pin Q. The output stage can also use an Inverter to make the complemented
version of the output signal, Q-bar. When the clock is low again, the second latch closes,
and the data value is maintained with the use of a feedback loop. At this point, the first
latch opens again, and the cycle repeats. This set-up can also be modified to include a
clear signal or a reset signal, or the clock signal to the latches can be reversed so that
the flip-flop is triggered on the falling edge.
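
The two-latch arrangement can be illustrated with a behavioral Python model. This is a logic-level sketch only, with no timing, setup, or hold behavior, and the class names are assumptions.

```python
class DLatch:
    """Level-sensitive latch: transparent when enabled, otherwise holds its value."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class DFlipFlop:
    """Rising-edge-triggered flip-flop built from two latches."""

    def __init__(self):
        self.first = DLatch()
        self.second = DLatch()

    def step(self, d, clk):
        m = self.first.update(d, enable=not clk)   # first latch open while clock is low
        return self.second.update(m, enable=clk)   # second latch open while clock is high


ff = DFlipFlop()
# (D, clock) pairs: D is sampled when the clock goes high and ignored while it stays high.
print([ff.step(d, c) for d, c in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]])
```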

Figure 5.8 Typical DFF

In the NPLL developed for this dissertation, there are three different types of
flip-flops. The first is the D flip-flop with a synchronous clear (DFFCLRSYS). This
flip-flop has a synchronous, active-low clear signal. This means that when the clear
signal is pulled low, the flip-flop clears itself on the next clock cycle. This DFF only
has an uncomplemented output, thus saving power and area in the layout.
From the schematic, Figure 5.9, the DFFCLRSYS has rise and fall-times of Tr =
61.6ps and Tf = 66.7ps. The propagation delays are tpHL = 160ps and tpLH = 73ps. The
maximum power consumed is 100μW and the average power consumption is 7.48μW.
From the layout, Figure 5.10, the area required is 11.8μm X 3.8μm. The rise and
fall-times are Tr = 79ps and Tf = 80ps. The propagation delays are tpHL = 400ps and tpLH
= 480ps. The maximum power consumed is 105.5μW and the average power
consumption is 18.36μW.


Figure 5.9 Schematic of the DFFCLRSYS

Figure 5.10 Layout of the DFFCLRSYS

The second type of DFF is DFF3. This flip-flop has both the output Q and the
complemented output Q-Bar. It does not have the clearing function that the previous
flip-flop has, which reduces the layout area.
From the schematic for the output Q, this DFF3 has rise and fall-times of Tr =
80ps and Tf = 75.5ps. The propagation delays are tpHL = 168.6ps and tpLH = 135.1ps. The
maximum power consumed is 160.8μW, and the average power consumption is
32.29μW.


From the layout, the area required is 10.9μm X 3.5μm. For the output Q, the rise
and fall-times are Tr = 59ps and Tf = 65.4ps. The propagation delays are tpHL = 174ps and
tpLH = 154ps. The maximum power consumed is 243μW, and the average power
consumption is 57.39μW.
From the schematic for the output Q-Bar, the rise and fall-times are Tr = 53ps and
Tf = 36ps. The propagation delays are tpHL = 164.6ps and tpLH = 202.8ps.
From the layout, the rise and fall-times for the output Q-Bar are Tr = 22.6ps and
Tf = 16ps. The propagation delays are tpHL = 169ps and tpLH = 194.3ps.
The third type of DFF is DFF4. The only difference between DFF4 and DFF3 is
the lack of a complementary output. This should save power and size.
From the schematic, the DFF4 has rise and fall times of Tr = 61.2ps and Tf =
61.4ps. The propagation delays are tpHL = 127ps and tpLH =161.3ps. The maximum power
consumed is 160.8μW, and the average power consumption is 29.21μW.
From the layout, the area required is 10.43μm X 3.5μm, with rise and fall-times
of Tr = 36ps and Tf = 45ps. The propagation delays are tpHL = 158.6ps and tpLH = 140ps.
The maximum power consumed was 210.5μW, and the average power consumption was
54.07μW.
5.1.13 Toggle Flip-Flop

The Toggle Flip-Flop (TFF) is a type of digital memory device. The TFF changes
its output from high-to-low or low-to-high every time a rising edge of the clock occurs.
In some cases, it does not need to change on every clock cycle. For this case, an enable
is used to prevent it from changing.

There are several different methods to make a TFF. The simplest is to take the
Q-Bar output of a DFF and feed that signal back to the D input. Another method, which
forms an enable-TFF, is to use an XOR Gate. The Q output of the DFF is connected to one
of the inputs of the XOR Gate. The other input of the XOR Gate is used for the enable
signal. The output of the XOR Gate is then connected to the D input of the DFF. In the
NPLL, there are four types of TFF used: with a clear signal and without, and using an XOR
Gate for the enable or the AND_XOR Gate for a 2-input enable. An example of a two-input
enabled TFF is shown in Figure 5.11. Timing and power data are located in Table 5.1.
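
The enable-TFF described above can be modeled at the logic level in a few lines of Python. Each call represents one rising clock edge; the function names are assumptions, and clear signals are not modeled.

```python
def tff_next(q, enable):
    """Value presented to the DFF's D input: Q XOR enable."""
    return q ^ enable


def tff_next_2en(q, en1, en2):
    """Two-input enable version, using the AND_XOR function."""
    return q ^ (en1 & en2)


def run(enables, q=0):
    """Apply one enable value per rising clock edge and record Q after each edge."""
    trace = []
    for en in enables:
        q = tff_next(q, en)
        trace.append(q)
    return trace


print(run([1, 1, 0, 1]))  # toggles, toggles, holds, toggles -> [1, 0, 0, 1]
```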

Figure 5.11 Toggle Flip-Flop Schematic

5.2 PHASE FREQUENCY DETECTOR

The phase frequency detector (PFD) is a circuit that compares the input clock from the
off-chip crystal oscillator with the feedback signal to determine if the
voltage-controlled oscillator (VCO) needs to speed up or slow down. The PFD output is
determined by both the phase and frequency of the inputs, making the PFD the most accurate
type of phase detector for phase-lock loops. PFDs are capable of locking onto the widest
range of input frequencies, and are capable of measuring the smallest differences between
the two input signals. The PFD schematic is shown in Figure 5.12. It consists of two Set
and Reset (SR) latches, as well as circuitry that clears the SR latches. This circuit has
two inputs: one is the reference clock, and the other is the feedback from the VCO.
The PFD circuit has two outputs. The up output indicates if the VCO needs to go faster.
The other output indicates if it needs to slow down, and is called the down output. When
the rising edge of the input clock occurs and the feedback signal is low, the next edge of
the input signal will set the up output to low. The next falling edge of the feedback
signal will set the up output back to high. In the second case, when the rising edge of
the input clock occurs and the feedback signal is high, the next falling edge of the input
will change the down output to low. The following falling edge of the feedback will reset
the latch to high; during this sequence the up output will be high. When both edges occur
at the same time, both outputs will be high. A simulation plot is shown in Figure 5.13.

Figure 5.12 Phase Frequency Detector Schematic


Figure 5.13 Functionality of the Phase Frequency Detector

From the schematic for the output, the PFD has rise- and fall-times of Tr = 138ps
and Tf = 68.5ps. The propagation delays for the up output are tpHL = 150ps and tpLH =
163ps. The down output has similar timing measurements. The maximum power
consumed is 350.3μW, and the average power consumption is 9.38μW. This was
measured at an operating frequency of 100MHz, which is above the predicted operating
frequency of 62MHz.
From the layout, the area required was 12.9μm X 7.16μm, and is shown in Figure
5.14. Both the up and down outputs have the same rise and fall-times, which were Tr =
120ps and Tf = 59ps. The propagation delays between a change in the input signal to the
correct output change are tpHL = 164ps and tpLH = 175ps. The maximum power consumed
is 314.4μW, and the average power consumption is 10.67μW.


Figure 5.14 Layout of the Phase Frequency Detector
5.3 DIGITIZER

The digitizer is the circuit between the phase detector and the counters. This
circuit consists of two flip-flops (Figure 5.15), clocked at the same speed as the counters
that follow. It is designed to latch in the data from the phase detector and synchronize
it with the rest of the NPLL. This is required because changes in the output of the
PFD can occur at any time. The up and down signals are used as the enable signals
to the 4-bit counters in the next circuit. If a change occurs to the up signal that enables
one of the counters at a time close to when the clock edge occurs, the setup and
hold-times of the flip-flops could be violated, causing errors. Some of the flip-flops in
the counters may change when they should not. An example would be the count of 1, 2, 3, 4,
5, 6, 7; the number 7 is digitally represented by 0111. If the enable goes low and arrives
at the two least significant bits, but not at the two most significant bits, before the
next clock edge, the count will jump to 1011, which is the number 11, rather than staying
on 7 or continuing on to 8, causing a major spike in the output. This error occurs with
about a 0.2% chance; while this seems small, it means an error will occur every 500ns.
This is not acceptable. The solution to this problem is the use of a digitizer to ensure
that changes will not occur close to the clock edge.
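
The 0111 to 1011 glitch and the quoted error interval can be reproduced with the Python sketch below. It assumes a synchronous toggle counter in which each bit sees its own copy of the enable, that the 0.2% chance applies per counter clock edge, and that the counter clock is 16 times the 62MHz input frequency mentioned for the PFD. These are assumptions made for the illustration, and the function names are hypothetical.

```python
def next_count(count, enables):
    """Advance a 4-bit toggle counter; enables[i] is the enable seen by bit i (LSB first)."""
    new = 0
    carry = 1                          # bit 0 toggles on every enabled clock edge
    for i in range(4):
        bit = (count >> i) & 1
        toggle = carry & enables[i]
        new |= (bit ^ toggle) << i
        carry &= bit                   # higher bits toggle only if all lower bits are 1
    return new


def mean_time_between_errors_ns(p_error, clock_hz):
    """Average time between errors if each clock edge fails with probability p_error."""
    return (1.0 / p_error) / clock_hz * 1e9


print(next_count(7, [1, 1, 1, 1]))   # enable seen by all bits: 7 -> 8
print(next_count(7, [0, 0, 0, 0]))   # enable low at all bits: stays at 7
print(next_count(7, [0, 0, 1, 1]))   # enable low only at the two LSBs: 7 -> 11
print(mean_time_between_errors_ns(0.002, 16 * 62e6))  # roughly 500 ns
```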

Figure 5.15 Schematic of the Digitizer

From the schematic, the digitizer has rise and fall-times of Tr = 61.2ps and Tf =
61.4ps. The propagation delays are tpHL = 127ps and tpLH =161.3ps. The maximum power
consumed is 284.3μW, and the average power consumption is 39.92μW.
From the layout (Figure 5.16), the area required was 12.4μm X 7.14μm, with rise
and fall-times of Tr = 36ps and Tf =450ps. The propagation delays are tpHL = 158.6ps and
tpLH = 140ps. The maximum power consumed is 188.2μW, and the average power
consumption is 57.46μW.

Figure 5.16 Digitizer Layout
5.4 TIME-TO-BINARY CONVERTER

The time-to-binary converter is a circuit that takes the digitized outputs of the
PFD and changes them to 4-bit numbers based on how long each signal is high. This
design requires three counters of two different types. The type-C counter takes the
digitized data of either the up or the down signal as an enable for the counter. When the
input is high the counter advances, and when it is low the counter stops. This
produces a 4-bit number equal to the length of time the input-signal is high. The 4-bit
counter is clocked at 16 times the input-frequency. This clock signal is taken from a
tap-off of the clock divider, which will be discussed later. Running the counters at 16
times the input-frequency gives full coverage of every input-cycle. At the end of the
16-cycle count, the up and down-counters have their outputs clocked into a 4-bit register.
Then the counters are cleared, and the cycle repeats. To keep track of the 16
cycles, a separate type-D counter is used. This counter, unlike the others, has no clear or
enable signals, but does have a ripple carry out signal (RCO) that goes high when the
count reaches 16. The RCO signal, with some combinational logic, tells the registers to
latch in the data and then clears the other two counters. The functionality of the
time-to-binary converter is shown in Figure 5.17.
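
The counting scheme can be sketched behaviorally as follows. This is a minimal model under stated assumptions, not the circuit itself: the function name `time_to_binary` is hypothetical, the inputs are assumed to be already digitized 0/1 samples taken at 16 times the input frequency, and the latch and clear are assumed to occur on the cycle in which the free-running counter reads 1111.

```python
def time_to_binary(up, down):
    """Model the three counters over a stream of digitized samples.

    up, down -- sequences of 0/1 samples, 16 samples per input cycle
    Returns a list of (up_count, down_count) pairs, one per 16-cycle window.
    """
    up_count = down_count = 0      # type-C counters (enable and clear)
    window = 0                     # type-D counter (free running)
    latched = []                   # values clocked into the 4-bit registers
    for u, d in zip(up, down):
        if u:                      # the sample acts as the counter enable
            up_count = (up_count + 1) & 0xF
        if d:
            down_count = (down_count + 1) & 0xF
        rco = window == 0xF        # all four type-D outputs are high
        if rco:
            latched.append((up_count, down_count))
            up_count = down_count = 0   # clear for the next window
        window = (window + 1) & 0xF
    return latched


# Two input cycles: up is high for 5 and then 3 clocks, down for 2 and then 0.
up = [1] * 5 + [0] * 11 + [1] * 3 + [0] * 13
down = [1] * 2 + [0] * 14 + [0] * 16
print(time_to_binary(up, down))
```

Each pair in the result is the up and down pulse width for one input cycle, measured in periods of the 16x clock.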
From the schematic shown in Figure 5.18, the time-to-binary converter has rise
and fall-times of Tr = 61.2ps and Tf = 61.4ps. The average propagation delay is tp =
730ps. The maximum power consumed is 2.46mW, and the average power consumption
is 382.2μW.
From the layout (Figure 5.19), the area required was 44.2μm X 57.24μm. The rise
and fall-times are Tr = 36ps and Tf = 45ps. The average propagation delay is tp =
697ps. The maximum power consumed is 2.38mW, and the average power consumption
is 368.8μW.

Figure 5.17 Simulation of the Time-to-Binary Converter

Figure 5.18 Time-to-Binary Converter Schematic

Figure 5.19 Time-to-Binary Converter Layout

5.4.1 4-Bit Counters

There are two types of 4-bit counters. One has a clear and an enable-signal and no
RCO (Figure 5.20), and the other has only an RCO (Figure 5.21). Both counters are 4-bit
synchronous binary counters with parallel enable logic. The clock-signal is applied to all
of the T flip-flops (TFFs) so that all the outputs change at the same time. Whether a TFF
changes state is determined by enabling each one individually at the correct time. The TFF
enable-signal consists of the outputs of all previous stages of the counter and the
counter’s enable signal ANDed together. The last AND gate is merged with the XOR gate,
forming the AND_XOR gate in some of the TFFs. This is done to ensure that the
enable-input-pin has the same delay through the entire counter. The type-D counter differs
in that it has no enable or clear input pins, but it does have the RCO. The RCO is formed
from all four outputs by use of an AND gate built from a 4-input NAND and an inverter.
Simulation runs of both types of counters are shown in Figures 5.22 and 5.23.
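
The parallel enable logic described above can be summarized in equation form. The expressions below are a sketch inferred from the description rather than taken from the schematics. The symbols are notation assumed here: Q_i is the output of stage i (Q_0 is the LSB), T_i is the toggle enable of stage i, Q_i^+ is the output after the clock edge, and EN is the counter's enable input. The clear input is omitted.

```latex
\begin{align*}
T_i     &= EN \cdot \prod_{j=0}^{i-1} Q_j, \qquad i = 0, \dots, 3 \\
Q_i^{+} &= Q_i \oplus T_i \\
RCO     &= Q_0 \cdot Q_1 \cdot Q_2 \cdot Q_3
\end{align*}
```

For the type-D counter, which has no enable pin, EN can be read as a constant 1. The empty product for i = 0 equals 1, so the first stage toggles on every enabled clock edge.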

Figure 5.20 4-Bit Counter With Clear and an Enable-Signal

Figure 5.21 4-Bit Counter With Only RCO

Figure 5.22 Simulation of a 4-Bit Counter With Clear and an Enable-Signal

Figure 5.23 Simulation of a 4-Bit Counter With Only RCO

From the schematic, the 4-bit counter with clear and enable-signals, has an
average rise and fall-time for all of its outputs of Tr = 29.05ps and Tf = 27.8 ps. The
average propagation delays are tpHL = 337.5ps and tpLH = 241ps. The maximum power
consumed is 1.072mW and the average power consumption is 169.5μW.
From the layout (Figure 5.24), the area required is 29.62μm X 18.38μm. The
average rise and fall-times are Tr = 40.6ps and Tf = 39.1ps. The average propagation
delays are tpHL = 249.4ps and tpLH = 241.3ps. The maximum power consumed is
1.23mW, and the average power consumption is 327.8μW.
From the schematic, the counter with just an RCO output has rise and fall-times
of Tr = 58.58ps and Tf = 28.1ps. The propagation delays are tpHL = 375.4ps and tpLH =
347.9ps. The maximum power consumed is 1.149mW and the average power
consumption is 166.2μW.

From the layout (Figure 5.25), the area required is 27.21μm X 17.44μm. The rise
and fall-times are Tr =40.1 ps and Tf =32 ps. The propagation delays are tpHL =299.4 ps
and tpLH =310ps. The maximum power consumed is 1.44mW, and the average power
consumption is 338.3μW.

Figure 5.24 Layout of a 4-Bit Counter With Clear and an Enable-Signal

Figure 5.25 Layout of a 4-Bit Counter With RCO

5.4.2 4-Bit Register

The 4-bit Register consists of four Type-4 D flip-flops (DFFs) with their clocks tied
together to form a register. The schematic is shown in Figure 5.26. This unit is used to
store the 4-bit data coming from the counters. It maintains this data so that the
signed-magnitude subtractor has time to operate. The clock controlling the 4-bit Registers
is the same clock that clears the counting registers, which determine the length of the up
and down-signals.
From the schematic, the 4-Bit Register has rise and fall-times of Tr = 61.2ps and
Tf = 61.4ps. The propagation delays are tpHL = 127 ps and tpLH = 161.3ps. The maximum
power consumed is 643.7μW, and the average power consumption is 92.77μW.

Figure 5.26 Schematic of the 4-Bit Register

From the layout (Figure 5.27), the area required is 11.6μm X 13.7μm. The rise and
fall-times are Tr = 36ps and Tf = 45ps. The propagation delays are tpHL = 158.6ps and tpLH
= 140ps. The maximum power consumed is 735μW, and the average power consumption
is 184μW.

Figure 5.27 Layout of the 4-Bit Register
5.5 SIGNED-MAGNITUDE SUBTRACTOR

The outputs of the up and down-counter registers go to the signed-magnitude
subtractor. The subtractor takes the two digital outputs and subtracts one from the other,
forming a signed-magnitude number, as seen in Figure 5.28. This number is the
difference between the up value and the down value of the digital PFD. This process
completes the conversion of the two outputs of the PFD to a single number that
represents the command to speed up or slow down the VCO. The signed-magnitude
subtractor is used rather than a traditional binary subtractor to reduce the size and
complexity of the DAC, and to increase the resolution of the programmable logic device
(PLD) that forms the nonlinear gain unit. The savings in the size of the PLD are possible
because the gain curve is an odd function about the origin. This will be
discussed later.

Figure 5.28 Signed-Magnitude Subtractor Schematic

The signed-magnitude subtractor works by performing a 2's complement
subtraction. The final C-out-bit is used as the sign-bit of the signed-magnitude subtractor.
If the C-out-bit of the 2's complement subtraction is low, the output is negative, and the
output of the 2's complement subtractor is inverted via a set of XOR gates. If the
output is positive, i.e., C-out is high, this output is not inverted. The output of the
XOR gates is sent to a set of four half-adders. These adders add 1 if the C-out is
low. If the C-out is high, a 0 is added. This can be seen in the example 6 - 9 (0110 - 1001).
Through the 2's complement subtraction, the output is 01101, so the C-out bit is 0, and the
new number is 1101. This is complemented through the XOR gates to become 0010. Then
1 is added via the half adders to get an answer of 0011, which represents the number 3
with a sign-bit of 1 to indicate a negative number. The functionality of the
signed-magnitude subtractor is shown in Figure 5.29.
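
The steps of the worked example can be traced in a short behavioral sketch. It models the arithmetic only, not the gate-level circuit; the function name `signed_magnitude_subtract` is hypothetical, and the sign convention (1 for a negative result, i.e., the inverse of C-out) is an assumption taken from the 6 - 9 example in the text.

```python
def signed_magnitude_subtract(a, b):
    """Return (sign_bit, magnitude) of a - b for 4-bit unsigned a and b."""
    # 2's complement subtraction: a + (inverted b) + 1, kept to 5 bits.
    raw = a + ((~b) & 0xF) + 1
    c_out = (raw >> 4) & 1
    diff = raw & 0xF
    if c_out:
        # Result is zero or positive: pass through, add 0.
        magnitude = diff
    else:
        # Result is negative: the XOR stage inverts every bit,
        # then the half adders add 1.
        inverted = diff ^ 0xF
        magnitude = (inverted + 1) & 0xF
    sign_bit = 1 - c_out   # assumption: 1 marks a negative result
    return sign_bit, magnitude


print(signed_magnitude_subtract(0b0110, 0b1001))  # the 6 - 9 example
print(signed_magnitude_subtract(0b1001, 0b0110))  # 9 - 6
```

For 6 - 9 the intermediate value is 01101, the inverted value is 0010, and the final magnitude is 0011 with the sign-bit set, matching the text.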

Figure 5.29 Functionality of the Signed-Magnitude Subtractor

The power used in the schematic was measured at 60MHz, the speed at which the
signed-magnitude subtractor will operate under standard conditions. The maximum
power is 1.072mW, with an average power consumption of 29.85μW. From the
schematic, the signed-magnitude subtractor outputs have average rise and fall-times of
Tr = 35.5ps and Tf = 36.5ps. The sign-bit achieves rise and fall-times of Tr = 56.6ps and
Tf = 63.6ps. The longest path propagation delay is that of the most significant bit and is
tp = 600ps.
In the layout (Figure 5.30), the maximum power required is 711μW, with an
average power consumption of 37.22μW. The layout required an area of 30.8μm X
18.6μm. From the layout, the signed-magnitude subtractor outputs have average rise
and fall-times of Tr = 24.8ps and Tf = 34.2ps. The sign-bit achieves rise and fall-times of
Tr = 89ps and Tf = 95ps. The longest path propagation delay is that of the most significant
bit and is tp = 580ps. The sub-circuits of the signed-magnitude subtractor are
discussed in the following sections.

Figure 5.30 Layout of a Signed-Magnitude Subtractor
5.5.1 Full Adder

The signed-magnitude subtractor uses many sub-cells. The first is the full adder
circuit (Figure 5.31). The adder takes three 1-bit numbers, A, B, and Cin, and outputs two
values that represent the 2-bit sum of the three inputs. The two outputs
are commonly called sum, the least significant bit, and carry, the most significant bit. The
three inputs allow two values to be added while also accepting a carry from
a preceding adder unit. Three inputs also make full use of the 2-bit output, whose maximum
value of 3 occurs when all three inputs are high. The logic outputs are shown in Table 5.7,
and a simulation plot is shown in Figure 5.32.

Figure 5.31 Schematic of the Full Adder

Figure 5.32 Simulation of the Full Adder
Input A    Input B    Input Cin    Output Sum    Output Carry
0          0          0            0             0
0          0          1            1             0
0          1          0            1             0
0          1          1            0             1
1          0          0            1             0
1          0          1            0             1
1          1          0            0             1
1          1          1            1             1

Table 5.7 Logic Characteristics of the Adder Gate

From the schematic for the output sum, the adder has rise and fall times of Tr
=52ps and Tf = 60ps. The propagation delays are tpHL = 172ps and tpLH = 92ps. The
maximum power consumed is 184.1μW, and the average power consumption is
22.82μW.
From the layout, the area required is 10.2μm X 4.5μm and is presented in Figure
5.33. For the output sum, the rise and fall-times are Tr = 24.4ps and Tf = 34.9ps. The
propagation delays are tpHL =184ps and tpLH = 66ps. The maximum power for the
extracted version of the adder is 186.4μW, with an average power of 22.9μW.

Figure 5.33 Layout of the Full Adder

From the schematic, the output carry has rise and fall-times of Tr = 46ps and Tf =
42ps. The propagation delays are tpHL = 82.4ps and tpLH = 101ps. From the layout, rise
and fall-times for the output carry are Tr = 23ps and Tf = 23.2ps. The propagation delays
are tpHL = 91ps and the tpLH = 75ps.

5.5.2 Half Adder

The half adder acts like a full adder, but has only two inputs, A and B, and two
outputs, carry and sum (see Figure 5.34). This circuit assumes that the input Cin is
always low. The main advantage of this is that the full adder requires 28 transistors while
the half adder requires only 14 transistors, with corresponding size and power reductions.
In designing the half adder, a full adder is taken and its Cin is pulled low. The PMOS
transistors driven by Cin are then always on, so they are shorted and removed from the
design. The affected NMOS transistors are always off, i.e., open circuits, and are cut out
along with any transistors they are in series with. The logic outputs are shown in Table 5.8.

Figure 5.34 Half Adder Schematic

Input A    Input B    Output Sum    Output Carry
0          0          0             0
0          1          1             0
1          0          1             0
1          1          0             1

Table 5.8 Logic Characteristics of the Half Adder Gate

From the schematic, for the output sum, the half adder has rise and fall-times of Tr
=56.9ps and Tf = 31.9ps. The propagation delays are tpHL = 112.8ps and tpLH = 87ps. The
maximum power consumed is 168.3μW, and the average power consumption is 14.9μW.
From the layout (Figure 5.35), the area required is 7.56μm X 3.68μm. For the
output sum in the layout, the rise and fall-times are Tr = 22.8ps and Tf =20.8ps. The
propagation delays are tpHL =124ps and tpLH = 65.4ps. The maximum power for the
extracted version of the half adder is 168.1μW, and an average power consumption of
16.11μW.

Figure 5.35 Half Adder Layout

From the schematic, the output carry has rise and fall-times of Tr =
59ps and Tf = 30.4ps. The propagation delays are tpHL = 67ps and tpLH = 84ps. From the
layout, the rise and fall-times for the output carry are Tr = 24.9ps and Tf = 14ps. The
propagation delays are tpHL = 62ps and tpLH = 47.4ps.
5.5.3 Half Subtractor

The half subtractor is similar to the half adder, but instead of the Cin being pulled
low, it is effectively pulled high, and the affected transistors are removed, as seen in
Figure 5.36. The half subtractor is used as the first block in the 4-bit subtractor. The half
subtractor has 14 transistors compared to the full adder's 28, which gives a decrease
in size and power required. This circuit is used in the 2's complement subtractor (which
will be discussed later). The functionality is shown in Table 5.9. A simulation plot is
shown in Figure 5.37.

Figure 5.36 Half Subtractor Schematic

Figure 5.37 Simulation of a Half Subtractor
Input A    Input B    Output Sum    Output Carry
0          0          1             0
0          1          0             1
1          0          0             1
1          1          1             1

Table 5.9 Logic Characteristics of the Half Subtractor Gate
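
The relationship between Tables 5.7, 5.8, and 5.9 can be checked with a short logic-level sketch: tying Cin low in the full adder gives the half adder, and tying it high gives the half subtractor. The function names are illustrative and model only the truth tables, not the transistor-level circuits.

```python
def full_adder(a, b, cin):
    """Return (sum, carry) of three 1-bit inputs, as in Table 5.7."""
    s = a ^ b ^ cin
    carry = (a & b) | (a & cin) | (b & cin)
    return s, carry


def half_adder(a, b):
    """Full adder with Cin tied low (Table 5.8)."""
    return full_adder(a, b, 0)


def half_subtractor(a, b):
    """Full adder with Cin tied high (Table 5.9)."""
    return full_adder(a, b, 1)


print("A B | half adder (sum, carry) | half subtractor (sum, carry)")
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "|", half_adder(a, b), "|", half_subtractor(a, b))
```

With Cin high, the sum output becomes the XNOR of A and B and the carry output becomes the OR of A and B, which is the pattern listed in Table 5.9.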

From the schematic, for the output sum, the half subtractor has rise and fall-times
of Tr = 123.6ps and Tf = 89.37ps. The propagation delays are tpHL = 155.9ps and tpLH
= 128.4ps. For the output carry, the half subtractor has rise and fall-times of Tr = 52.6ps
and Tf = 48.5ps. The propagation delays are tpHL = 113.3ps and tpLH = 75.6ps. The
maximum power consumed is 174.2μW, and the average power consumption is
26.16μW.
From the layout (Figure 5.38), the area required is 8.02μm X 4.08μm. For the
output sum in the layout, the rise and fall-times are Tr = 21.9ps and Tf = 20.36ps. The
propagation delays are tpHL = 98.1ps and tpLH = 62.5ps. For the output carry, the half
subtractor has rise and fall-times of Tr = 25.8ps and Tf = 18.5ps. The propagation delays
are tpHL = 97.3ps and tpLH = 53.5ps. The maximum power for the extracted version of the
half subtractor is 210.3μW, with an average power consumption of 26.24μW.

Figure 5.38 Layout of the Half Subtractor Schematic
5.5.4 Subtractor

The subtractor performs 2's complement subtraction on the values of the output
registers from the counters. The schematic is shown in Figure 5.39. The output of the
subtractor proceeds to the XOR gates of the signed-magnitude subtractor. The final
carry-bit is used as the sign-bit and determines whether the XORs invert or not. The
subtractor works by inverting the number that is to be subtracted, then adding that to the
other input. A 1 is also added; this is done automatically by use of the half subtractor.
The addition of the two numbers is performed with a 4-bit ripple carry adder in which
the first full adder has been replaced by a half subtractor. The ripple carry adder is a
parallel adder that has all of its inputs applied at the same time to produce the sum. The
adders are connected in a cascading manner, with the carry out of each adder feeding the
carry in of the following adder. This allows changes in the least significant bit (LSB) to
ripple through all the adders to the most significant bit (MSB) if needed.
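
The ripple structure can be sketched bit by bit. This is a logic-level illustration under stated assumptions, not the circuit of Figure 5.39: the names `full_adder` and `ripple_subtract` are hypothetical, and the half subtractor in the first stage is modeled as a full adder whose carry input is tied high.

```python
def full_adder(a, b, cin):
    """Return (sum, carry) of three 1-bit inputs."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def ripple_subtract(a, b):
    """Compute 4-bit a - b; return (difference bits, final carry out)."""
    carry = 1          # first stage: carry input tied high adds the 1
    diff = 0
    for i in range(4):             # LSB first, carry ripples upward
        a_i = (a >> i) & 1
        nb_i = ((b >> i) & 1) ^ 1  # inverted bit of the subtrahend
        s, carry = full_adder(a_i, nb_i, carry)
        diff |= s << i
    return diff, carry


d, c = ripple_subtract(0b0110, 0b1001)
print(format(d, "04b"), c)
```

For 6 - 9 this gives difference bits 1101 with a final carry of 0, the same intermediate value used in the signed-magnitude example of Section 5.5.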
Analysis of the schematic shows a maximum power of 670.5μW and an average
power of 54.08μW. These power calculations were taken at a frequency of 500MHz.
From the schematic, the subtractor outputs have average rise and fall-times of Tr =
41.4ps and Tf = 40.7ps. The carry out-bit achieves rise and fall-times of Tr = 51ps and Tf
= 64ps. The longest path propagation delay is that of the most significant bit and is tp =
515ps.
The layout required an area of 19.5μm X 14.6μm and is shown in Figure 5.40,
with a maximum power of 697.1μW and an average power consumption of 67.73μW.
From the layout, the subtractor outputs have average rise and fall-times of Tr = 30.5ps
and Tf = 30.6ps. The carry out-bit achieves rise and fall-times of Tr = 24ps and Tf =
46ps. The longest path propagation delay is that of the most significant bit and is tp = 422.6ps.

Figure 5.39 Subtractor Schematic

Figure 5.40 Subtractor Layout
5.6 PROGRAMMABLE LOGIC DEVICE

The fuzzy controller is implemented by using a PLD composed of two parts, the
NAND decoder and the NOR encoder. The schematic is shown in Figure 5.41, with the
layout in Figure 5.42. The PLD takes the output number of the signed-magnitude
subtractor and outputs a different number that corresponds to the desired output of the
fuzzy controller. The change between the input and the output represents a gain. This is
shown in Table 5.10. The digital output of the PLD is shown in blue, and the desired
curve is shown in red (Figure 5.43). This plot also shows that the PLD only covers the
first quadrant of the plot. The negative side is covered by use of the sign-bit from the
signed-magnitude subtractor in the digital-to-analog converter (DAC). This process also
saves hardware. If it were necessary to cover both the negative and positive numbers with
a 5-bit number, the PLD would require 330 transistors, compared to the 165 used for the
4-bit version, with the 5th bit, the sign-bit, being used only in the DAC.
Binary Input    Input    Fuzzy Output    Binary Output
1111            15       15              1111
1110            14       15              1111
1101            13       15              1111
1100            12       15              1111
1011            11       15              1111
1010            10       14              1110
1001            9        14              1110
1000            8        13              1101
0111            7        12              1100
0110            6        11              1011
0101            5        10              1010
0100            4        9               1001
0011            3        6               0110
0010            2        3               0011
0001            1        1               0001
0000            0        0               0000

Table 5.10 Digital Inputs and Outputs of the 4-Bit Fuzzy Controller

The PLD works by taking the 4-bit input by way of a NAND decoder (which will
be discussed later) and changing that signal to a 15-bit one-hot-signal. A one-hot-signal
means that each bit represents a single number. Thus, if bit-7 is high, that represents the
number 7. These 15 bits then go to the NOR encoder, which takes the one-hot-signal and
turns it back into a 4-bit number corresponding to the output value of the fuzzy controller.
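
The decode-then-encode behavior can be sketched as a lookup built from Table 5.10. This is a behavioral model only: the names `FUZZY_OUTPUT`, `decode_one_hot`, `encode`, and `pld` are hypothetical, and for simplicity the sketch uses 16 one-hot positions (one per input value), whereas the text describes a 15-bit signal.

```python
# Fuzzy output for each input magnitude 0..15, taken from Table 5.10.
FUZZY_OUTPUT = [0, 1, 3, 6, 9, 10, 11, 12, 13, 14, 14, 15, 15, 15, 15, 15]


def decode_one_hot(value):
    """Decoder stage: turn a 4-bit value into a one-hot list."""
    return [1 if i == value else 0 for i in range(16)]


def encode(one_hot):
    """Encoder stage: map the active line to its 4-bit output code."""
    active_line = one_hot.index(1)
    return FUZZY_OUTPUT[active_line]


def pld(value):
    """Full PLD path: 4-bit input to 4-bit fuzzy controller output."""
    return encode(decode_one_hot(value))


for value in (1, 3, 7, 12):
    print(format(value, "04b"), "->", format(pld(value), "04b"))
```

The gain is largest for small inputs (an input of 3 maps to 6) and saturates at 15 for inputs of 11 and above, which is the nonlinear curve the table describes.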

Figure 5.41 Schematic of the PLD

Figure 5.42 Layout of the PLD

Figure 5.43 Digital Output of the PLD Blue; the Desired Curve is Shown in Red
5.6.1 NAND Decoder

The NAND decoder is the first block in the PLD (Figure 5.44). It takes the 4-bit
number and changes it to a 15-bit one-hot-signal. Each output bit is called a word line.
When an input is applied, the signal and its complement are formed in the fast buffer. The
output of the fast buffer, either the complement or the true form of the input-bits, is
connected to the gates of the NMOS transistors, which control the word lines. There are 4
NMOS transistors in a row, which control each word line. Each word line has a PMOS
transistor pulling the word line high at one end. This transistor is then followed by 4
NMOS transistors, which are controlled by the input-bits and connected in series. The
last NMOS transistor terminates in the ground wire. To turn on an NMOS gate, the
voltage must be high. Because the NMOS transistors are connected to either the input-bit
or its complement, to turn on the NMOS when the input-bit is 1, it must be connected
directly to the signal. To turn it on when the input-bit is 0, it must be connected to the
complement signal. When all the NMOSs are turned on in a word line, the line is pulled
to ground. At this time there is some leakage current from Vdd through the PMOS
transistor and then through the NMOS transistors to ground. This slight power drain
could be avoided by changing this circuit to a dynamic circuit. Also, because both the
NMOS and PMOS transistors are turned on, the output signal does not achieve true low
status. This is corrected by use of an inverter on the output of the word line. A simulation
run is shown in Figure 5.45.
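
The wiring rule for the series NMOS transistors can be sketched as follows. The signal names `A0`..`A3` and `A0_bar`..`A3_bar`, and both function names, are hypothetical labels introduced for illustration; the sketch models only which gate signal each transistor receives and when the chain conducts.

```python
def word_line_connections(n):
    """List the gate signal for each of the 4 series NMOS on word line n.

    A bit that must be 1 uses the true signal; a bit that must be 0
    uses the complement signal.
    """
    return ["A%d" % i if (n >> i) & 1 else "A%d_bar" % i for i in range(4)]


def word_line_pulled_low(n, value):
    """True when every series NMOS on word line n conducts for this input."""
    for i in range(4):
        bit = (value >> i) & 1
        gate = bit if (n >> i) & 1 else 1 - bit  # true or complement signal
        if not gate:
            return False   # one off transistor breaks the path to ground
    return True


print(word_line_connections(7))
print([n for n in range(16) if word_line_pulled_low(n, 7)])
```

For any input value exactly one chain conducts, so only the matching word line is pulled low before its output inverter.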

Figure 5.44 NAND Decoder Schematic

Figure 5.45 Simulation Run of the NAND Decoder, Digital Input Red, Outputs
Multicolor

From the schematic, the NAND decoder has average rise and fall-times of
Tr = 29.3ps and Tf = 53.3ps. The average propagation delays are tpHL = 205ps and tpLH =
130ps. The maximum power consumed is 2.538mW, and the average power consumption
is 189.4μW.
From the layout (Figure 5.46), the area required is 13.04μm X 54.4μm. The
average rise and fall-times are Tr = 25.7ps and Tf = 31.1ps. The average propagation
delays are tpHL = 148.3ps and tpLH = 141.8ps. The maximum power consumed is
2.553mW, and the average power consumption is 237.8μW.

Figure 5.46 Layout of the NAND Decoder
5.6.2 NOR Encoder

The NOR encoder takes the one-hot-signal from the NAND decoder and changes
it back to a 4-bit number. The schematic is shown in Figure 5.47. The NOR encoder
works by having four PMOS transistors pulling up the four output lines. Each of the 15
inputs connects to the gates of the corresponding NMOS transistors, which pull the output
lines down. For example, if word line 7 is activated, the NMOS
transistors attached to the output lines B0 and B2 are activated, pulling those output
lines down. This leaves lines B1 and B3 high, forming 1010, or the number 10. Again,
there is the problem that power and ground are connected together, causing a leakage
current. This only occurs on lines pulled to ground. The other problem is that the line
does not achieve full-low status. This is corrected by buffers in the DAC.

Figure 5.47 NOR Encoder Schematic

From the schematic, the NOR encoder has average rise and fall-times of
Tr = 138ps and Tf = 64.7ps. The average propagation delays are tpHL = 46ps and tpLH =
49ps. The maximum power consumed is 300μW, and the average power consumption is
131.3μW.
From the layout (see Figure 5.48), the area required is 5.68μm X 47.9μm.
The average rise and fall-times are Tr = 246.6ps and Tf = 104.4ps. The average
propagation delays are tpHL = 74.1ps and tpLH = 98.8ps. The maximum power consumed
is 590.9μW, and the average power consumption is 76μW.

Figure 5.48 NOR Encoder Layout
5.7 DIGITAL-TO-ANALOG CONVERTER

The DAC (see Figure 5.49) takes the digital output of the PLD and changes it to
an analog value that controls the VCO. The basic structure is a resistor-tree that produces
differing voltages. These voltages are connected to a series of path selectors. The path
selectors contain two pass gates that open and close, allowing one of the two voltages to
proceed to the output. If the control bit is low, the lower input voltage is chosen. If the
control bit is high, the higher input voltage is chosen. The LSB, D0, of the DAC controls
the most switches. The output of the first level of path selectors connects to the second
set, which has half as many path selectors as the first level. The second level of path
selectors is controlled by D1. These outputs are then passed to the third level, where they
are reduced further, and passed on to the fourth level of path selectors, where the outputs
are again reduced. This process results in a single output that is selected by the digital
inputs. In this design, 30 pass gates are used for four inputs: 16 in the first level, 8 in
the second, 4 in the third, and 2 in the fourth. If the sign-bit were used, making it a 5-bit
system, 62 pass gates would be required. This would require a significant size and power
increase. [2]
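
The selection tree and the pass gate counts can be checked with a short sketch. It is a behavioral model under stated assumptions: the function names `pass_gate_count` and `dac_select` are hypothetical, and the tap voltages in the usage lines are placeholder values, not the voltages of the actual resistor-tree.

```python
def pass_gate_count(bits):
    """Total pass gates in a binary selection tree with `bits` control bits.

    Level k (k = 0 is driven by the LSB) has 2**(bits - k) pass gates,
    two for each path selector at that level.
    """
    return sum(2 ** (bits - k) for k in range(bits))


def dac_select(tap_voltages, code):
    """Select one of 16 tap voltages (lowest first) with a 4-bit code."""
    level = list(tap_voltages)
    for k in range(4):                 # D0 drives the first level
        bit = (code >> k) & 1
        # Each path selector passes the lower input when its control
        # bit is 0 and the higher input when it is 1.
        level = [level[2 * i + bit] for i in range(len(level) // 2)]
    return level[0]


print(pass_gate_count(4), pass_gate_count(5))
taps = [0.05 * i for i in range(16)]   # placeholder tap voltages
print(dac_select(taps, 0b1010) == taps[10])
```

The counts reproduce the 30 and 62 pass gates quoted in the text, and the tree reduces 16 taps to 8, 4, 2, and finally 1, so the selected tap index equals the input code.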

Figure 5.49 Schematic of the Digital-To-Analog Converter

From the schematic, the DAC has rise and fall-times of Tr = 1.47ns and Tf
= 1.5ns. The propagation delays are tpHL = 943ps and tpLH = 1.1ns. A simulation plot of
the DAC is shown in Figure 5.50. The maximum power consumed is 1.419mW, and the
average power consumption is 128.1μW. This was measured with an input clock
frequency of 60 MHz. The resistor-tree was powered from Vdd to Vss, which is a larger
range than would be used in normal operations of the NPLL, but gives the worst case of
power consumption. The load for the timing measurements was the low-pass filter.
From the layout (Figure 5.51), the area required is 32.14μm X 35.64μm. The rise
and fall-times are Tr = 1.78ns and Tf = 2.16ns. The propagation delays are tpHL = 756.5ps
and tpLH = 1.01ns. The maximum power consumed was 947.6μW, and the average power
consumption was 175.4μW.

Figure 5.50 Layout of the Digital-To-Analog Converter

Figure 5.51 Simulation Plot of the DAC, Output Red, Input Bits Multicolor
5.7.1 Path Selectors

In this design, two types of path selectors are used. They are shown in Figure
5.52a and Figure 5.52b. The first contains two pass gates, and the other contains four pass
gates. The path selectors work by selecting between two input-voltages, to produce one
output-voltage. This is done with the use of two pass gates opening and closing at
opposite times. The outputs of the two pass gates are connected together to form the
single output of the sub circuit.

Figure 5.52 Schematic of the Path Selectors: (a) Two Pass Gates, (b) Four Pass Gates

A pass gate is a circuit that can allow a voltage to pass through it or stop it. A
PMOS transistor transmits a high value without loss, but it does not pass a low value with
as much efficiency. The NMOS transistor transmits a low value without loss, but it does
not pass a high value with as much efficiency. To pass a range of voltages, both must be
used. To form the pass gate, a PMOS and an NMOS have their sources and drains tied
together to form a single input and output. To turn both transistors on and off together, a
control signal and its complement are applied to their gates. For a pass gate that is on
when the control signal is high, the true signal is connected to the NMOS gate and the
complement of the control signal is connected to the PMOS gate. To form a pass gate that
is turned on when the control signal is low, the wires to the gates of the PMOS and NMOS
transistors are reversed. The control signals for the pass gates are formed by the use of a
small buffer.

In the four-path selector, an extra set of pass gates is used, forming a second
selector that has its own inputs and outputs. Both sets of path selectors are controlled by
the same buffer. This was done to aid in the design of the circuit and to reduce
complexity.
From the schematic, the 2-input path selector has propagation delays of tpHL =
223ps and tpLH = 246ps. The maximum power consumed is 139.7μW, and the average
power consumption is 8.71μW.
From the layout (See Figure 5.53), the area required was 7.04μm X 4.5μm. The
propagation delays are tpHL = 207ps and tpLH = 215.6ps. The maximum power consumed
is 181.8μW, and the average power consumption is 11.66μW.

Figure 5.53 Layout of the Two Pass Gates Path Selector

From the schematic, the 4-input path selector timing information is the same for
both path selectors. The propagation delays are tpHL = 231ps and tpLH = 203ps. The
maximum power consumed is 199.2μW, and the average power consumption is
15.33μW.
From the layout (See Figure 5.54), the area required by the 4-input path selector is
7.04μm X 8.92μm. The average propagation delays are tpHL = 298ps and tpLH = 286ps.
The maximum power consumed is 214.7μW, and the average power consumption is
18.69μW.

Figure 5.54 Layout of the Four Pass Gates Path Selector
5.7.2 Resistor Tree

The resistor-tree produces the voltages used in the DAC. It consists of a series of
resistors connected together between a high (top) voltage, a middle bias voltage, and a
low (bottom) voltage. The schematic for the resistor-tree is shown in Figure 5.55. This
resistor-tree has the ability to switch between the top and bottom voltages, which
produces differing voltages between each resistor. This switching is controlled by the
sign-bit produced in the signed-magnitude subtractor. The control signal turns two pass
gates on and off, which allows the top and bottom voltages to be connected to the resistor
tree at different times. When the control signal is high, the top voltage is applied to the
resistor tree, forming the upper half of the voltage range. When the control signal is low,
the bottom voltage is applied to the resistor tree, forming the lower half of the voltage
range. This switching halves the size of the DAC and also reduces the size
of the PLD that makes up the fuzzy controller. A simulation plot of the resistor-tree
outputs is shown in Figure 5.56.

Figure 5.55 Resistor Tree Schematic

Figure 5.56 Simulation Plot of the Resistor-Tree Outputs Toggling the Sign-bit

From the schematic, the resistor tree has rise and fall-times during the switch of
the top and bottom voltage of Tr = 51ps and Tf = 68ps. The propagation delays are tpHL =
114ps and tpLH = 94ps. The maximum power consumed, using the rail voltages for the top
and bottom voltages, is 301μW, and the average power consumption is 33.68μW.
From the layout (See Figure 5.57), the area required is 8.9μm X 34.78μm. The
rise and fall-times are Tr = 46ps and Tf = 60ps. The propagation delays are tpHL = 135 ps
and tpLH = 94ps. The maximum power consumed is 670.2μW, and the average power
consumption is 104.9μW.

Figure 5.57 Layout of the Resistor-Tree
5.8 LOW-PASS FILTER

The low-pass filter is a very important part of all standard phase-lock loops. The
schematic is shown in Figure 5.58. It removes high frequency glitches and is also
the major influence on the pull-in-range and lock-range. The lower the corner frequency
of the low-pass filter, the better the noise performance, but the lock-range and pull-in-time
will suffer. The higher the corner frequency of the filter, the better the lock-range and
pull-in-time will be, but noise will increase. The NPLL uses a first order low-pass filter
with a corner frequency of 377.35MHz. A simulation plot of the AC response is shown in
Figure 5.59. This rejects the spikes of the DAC switching while allowing large lock
ranges and small pull-in times. The varying gain of the NPLL handles the noise
reduction. In this design, a 5.82 kOhm resistor and a 0.5pF capacitor are used. To achieve
the same noise criteria, the standard phase-lock loop requires a third-order low-pass filter
with a corner frequency of 10MHz. This is accompanied by reductions in pull-in range
and an increased lock time. The layout for the low-pass filter is shown in Figure 5.60, and
the area required for it is 27.3μm X 26.3μm.

Figure 5.58 Low-pass Filter Schematic

Figure 5.59 Simulation Plot of the AC Response

Figure 5.60 Low-pass Filter Layout
5.9 VOLTAGE CONTROLLED DUAL-DELAY, DIFFERENCE-DIFFERENTIAL
RING OSCILLATOR

The VCO converts a control voltage into an oscillating signal, the frequency of which
depends upon the input voltage. The schematic is shown in Figure 5.61. A simulation plot
of the control voltage swept versus the output frequency is shown in Figure 5.62. This
plot shows that as the input-voltage increases, the frequency decreases. In this
design, a dual-delay differential pair was used. This design allows a low-noise,
high-frequency signal to be achieved. In the difference differential VCO, each unit has two
inputs and two outputs. Each stage of the differential VCO works by forming a signal and
its complement using a fully differential operational amplifier (OPAmp) that measures the
difference between its inputs. The advantages of a fully differential voltage controlled
ring oscillator are that it produces different phases of the clock without extra hardware
and that noise on the output-pins of one stage or on the power lines is rejected by the
following stage. [47]

Figure 5.61 Voltage Ring Oscillator Schematic

Figure 5.62 Simulation Plot of the Control Voltage Swept Versus the Output
Frequency

The dual-delay VCO has four input and two output-pins. The extra two inputs are
used for negative skew. This gives each stage of the VCO a warning that the stage
before it is about to change, reducing the apparent propagation delay.
From the schematic, the VCO has a maximum power consumed of 540μW and
the average power consumption is 450μW at the operating point of 0.6V. The rise and
fall-times are Tr = 37.2 ps and Tf = 38.4ps.
From the layout (See Figure 5.63), the area required is 32.82μm X 5.2μm. The
rise and fall-times are Tr = 28.4ps and Tf = 26.7ps. The maximum power consumed is
513μW, and the average power consumption is 458μW at the operating point of 0.6V.

Figure 5.63 Voltage Ring Oscillator Layout
5.9.1 Differential-Delay Cell

Six differential-delay cells make up the VCO. Each cell has four PMOS
transistors, two in each leg of the differential pair. The negative-skew inputs,
Vin2+ and Vin2-, each connect to one of the PMOS transistors, one on each
side of the differential pair. These inputs are driven by the outputs of the cell two stages
before it. The other two PMOS transistors, one on each side of the differential pair, form
a latch with two NMOS transistors connecting to the opposite side of the differential pair.
The schematic is shown in Figure 5.64. The gates of the NMOS transistors are controlled
by the V-control signal. When the control signal is low, the latch is weak allowing the
change in the output on one side to turn off or on the opposite side of the differential pair.
When the control is high, the latch is strong and the crossover effect is weak. The main
inputs, V+ and V-, connect to the NMOS transistors. The outputs are taken from between
the PMOS and the NMOS transistors.

Figure 5.64 Differential-Delay Cell Schematic

The dual-delay cell starts working when the cell two units before it changes its
output. This change is applied to the inputs Vin2+ and Vin2-, giving an alert that a
change is about to occur. As the PMOS transistors change their state, one leg will start
the pull-up process and the other will have its pull-up strength reduced. This pre-sets
the dual-delay cell for the change by raising one output and lowering the other. If the
V-control signal is low, some of this change is felt on the other leg of the
differential pair, which further boosts the pull-up on one side and further lowers the
other side. This feeds back upon itself. If the V-control signal is high, this effect
occurs more slowly, and not until the main inputs, V+ and V-, change. The output of the cell
prior to this cell is connected to two NMOS transistors on opposite sides of the
differential pair. When the prior cell changes its outputs, the legs of the differential pair
will be either pulled down or pulled up. [47]
From the schematic, the output Vo+ of the differential-delay cell has rise and fall-times
of Tr = 51.6ps and Tf = 40.8ps. The propagation delays are tpHL = 32ps and tpLH =
10.4ps. From the schematic, the output Vo- of the differential-delay cell has rise and
fall-times of Tr = 72.5ps and Tf = 35.3ps. The propagation delays are tpHL = 28.4ps and tpLH =
27.1ps. The maximum power consumed is 149.7μW, and the average power consumption
is 51.83μW.
From the layout (Figure 5.65), the area required is 4.79μm X 5.41μm. The output Vo+ of the
differential-delay cell has rise and fall-times of Tr = 45.9ps and Tf = 24.6ps. The
propagation delays are tpHL = 23ps and tpLH = 18.2ps. From the layout, the output Vo- of
the differential-delay cell has rise and fall-times of Tr = 28.2ps and Tf = 50ps. The
propagation delays are tpHL = 26.5ps and tpLH = 15.14ps. The maximum power consumed
is 125.4μW, and the average power consumption is 39.14μW.

Figure 5.65 Differential-Delay Cell Layout
5.10 DIVIDE-BY-32

The VCO outputs a higher-frequency signal than the input signal. To compare these
two signals accurately in the phase detector, the output of the VCO must
be slowed down. A clock divider achieves this. In this NPLL, it is a Divide-by-32
counter. The schematic is shown in Figure 5.66. It uses toggle flip-flops (TFFs), each of
which halves the frequency of its input. Going from 2GHz to 62.5MHz requires five TFFs
in series. The Divide-by-32 counter also has a tap-out for a 1GHz signal, after the first
TFF, to operate the digitizer and the counters.
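
The frequency at each tap of the divider chain follows directly from repeated halving. The short sketch below assumes ideal toggle flip-flops that each divide the frequency by exactly two; the function name is illustrative and not taken from the design.

```python
def divider_taps(input_frequency_hz: float, stages: int) -> list[float]:
    """Return the output frequency (Hz) after each ideal divide-by-two stage."""
    taps = []
    frequency = input_frequency_hz
    for _ in range(stages):
        frequency /= 2.0  # each toggle flip-flop halves the frequency
        taps.append(frequency)
    return taps


taps = divider_taps(2e9, 5)
for stage, frequency in enumerate(taps, start=1):
    print(f"after TFF {stage}: {frequency / 1e6:g} MHz")
```

The first tap corresponds to the 1GHz signal used by the digitizer and counters, and the fifth tap is the 62.5MHz feedback signal.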

Figure 5.66 Divide-By-32 Counter Schematic

From the schematic, the Divide-by-32 counter has rise and fall-times for the
1GHz output of Tr = 19.2ps and Tf = 17ps. The propagation delays are tpHL = 70.4ps
and tpLH = 23.5ps. The maximum power consumed is 902.5μW, and the average power
consumption is 353.7μW.
The 62.5MHz output of the Divide-by-32 counter has rise and fall-times of Tr = 19.2ps and Tf
= 15.8ps. The propagation delays are tpHL = 177.8ps and tpLH = 150.4ps.
From the layout (Figure 5.67), the area required is 32μm X 6.9μm. The Divide-by-32
counter has rise and fall-times for the 1GHz output of Tr = 19.3ps and Tf = 16.6ps.
The propagation delays are tpHL = 232ps and tpLH = 200.8ps. The 62.5MHz output of the
Divide-by-32 counter has rise and fall-times of Tr = 18.4ps and Tf = 15.8ps. The propagation
delays are tpHL = 392.5ps and tpLH = 322.8ps. The maximum power consumed is 902.5μW, and
the average power consumption is 353.7μW.

Figure 5.67 Divide-By-32 Counter Layout

6. CLOCK DISTRIBUTION
No matter how good a clock generator is, the clock signal must still be distributed
to the rest of the circuit. This is achieved with the clock-distribution system, which is a
high-quality test for the nonlinear phase-lock loop (NPLL) in this dissertation. The
clock-distribution system is designed to supply a chip with a steady clock signal. In this
dissertation, a global H-tree distribution system is coupled with a regional-grid system.
Local distribution is left to the end user to design.
The clock-distribution system starts with an H-tree having a minimum buffer and
increases in size, through seven inverters, to power the regional grids. Each grid can
support a load of 2 to 3pF from its local distribution network. There are 64 grids in the
total clock-distribution system. Each grid provides a clock signal to an area of
1.266mm².

6.1 H-TREE DISTRIBUTION SYSTEM

The H-tree distribution system takes a small signal and distributes it across the
surface of the chip. Buffers are used to power each of the branching paths. These paths
are formed in a series of H shaped structures of decreasing size. (See Figure 6.1)

Figure 6.1 H-Tree

This system is small and flexible, and provides excellent coverage over a wide
area of a chip. To design the H-tree, the load, the area to be covered, and the driving
capability of the first small buffer must be considered. For this design, the load of the
H-tree is the regional grid distribution system, which produces a large load capacitance.
This load is not only from the grid itself, but also from all the local clocks attached to
the regional grid. In the design, a load of 2 to 3pF was used. [50]
6.1.1 Area

The second consideration is the area of the chip to be supplied with the clock
signal. The design is for an 81mm² chip area, allowing an extra millimeter ring for the
input-output pads. This would give a total area of 100mm², which is a standard chip run
from MOSIS. For this area, 64 grids will be used. If each branch of the clock tree splits
into two sub-branches, 64 grids would require six stages with an additional inverter
powering each grid.
6.1.2 Driving Strength

The third characteristic is the driving strength of each inverter. This is based on
the input capacitance of the minimum-size inverter. This first small inverter is designed to
have rise and fall-times that are roughly equal and as small as they can be without
increasing the size of the transistors. The first small inverter has a 160nm/130nm NMOS
transistor and a 500nm/130nm PMOS transistor. This gives rise and fall-times of Tr =
32.6ps and Tf = 32.7ps. The propagation delays are tpHL = 24.3ps and tpLH = 25.6ps. With
this small inverter as the base of the clock-distribution system, the next step is to
determine the input capacitance of the inverter.
To determine the input capacitance of the minimum-sized inverter, a series of
measurements of the propagation delay of the inverter with different loads must be taken.
Two equations are used to determine the input capacitance. These equations are tpHL =
K0HL + K1HL * CL and tpLH = K0LH + K1LH * CL, where K0HL and K0LH are the intrinsic gate
delays with no load for high-to-low and low-to-high transitions, and K1HL and K1LH are the
sensitivities of the gate delay to variations in the load. CL is the load on the inverter.
To find the values of K0HL, K1HL, K0LH, and K1LH, the two equations, each with two
unknowns, must be solved with data from the measurements of the minimum inverter
with differing loads. For best results, an average of tpHL and tpLH should be used. The
values are found to be K0HL = 21.3, K1HL = 8.98, K0LH = 19.83 and K1LH = 8.36. With
these values, a third test, in which the small inverter powers an identical copy of itself,
was chosen so that the load capacitance is the same as the input capacitance. The equations
tpHL = K0HL + K1HL * CL and tpLH = K0LH + K1LH * CL are used this time to solve for the
unknown load capacitance. This gives two slightly different values that are averaged to
determine the input load. The input capacitance is calculated to be 0.505fF. [50]
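
The final step described above, inverting the two linear delay equations and averaging the results, can be sketched as follows. The text does not state units for the constants; the sketch assumes picoseconds for K0 and picoseconds per femtofarad for K1. The delays fed in are synthetic values generated from the model itself (not measurements from the dissertation), used only to show that the inversion is self-consistent. All function names are illustrative.

```python
def load_from_delay(t_p: float, k0: float, k1: float) -> float:
    """Invert t_p = k0 + k1 * C_L to recover the load capacitance C_L."""
    return (t_p - k0) / k1


def input_capacitance(t_phl: float, t_plh: float,
                      k0_hl: float, k1_hl: float,
                      k0_lh: float, k1_lh: float) -> float:
    """Average the two load estimates from the high-to-low and low-to-high delays."""
    c_hl = load_from_delay(t_phl, k0_hl, k1_hl)
    c_lh = load_from_delay(t_plh, k0_lh, k1_lh)
    return 0.5 * (c_hl + c_lh)


# Constants quoted in the text (units assumed: ps and ps/fF).
K0_HL, K1_HL, K0_LH, K1_LH = 21.3, 8.98, 19.83, 8.36

# Synthetic delays generated from the model for a 0.505 fF load (hypothetical data).
c_true = 0.505
t_phl = K0_HL + K1_HL * c_true
t_plh = K0_LH + K1_LH * c_true

c_in = input_capacitance(t_phl, t_plh, K0_HL, K1_HL, K0_LH, K1_LH)
print(f"recovered input capacitance: {c_in:.3f} fF")
```

With real measurements the two estimates differ slightly, which is why the text averages them.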
6.1.3 Stages and Sizing

With the input capacitance, the number of stages and the sizing between stages
can be determined. There are two methods to calculate this. The first is Equation 6.1,
where N is the number of stages, CL is the final load of the circuit, Cin is the input
capacitance of the minimum inverter, and e is Euler's number (e = 2.71828). This number
is the optimal value to minimize the delay between input and output and is the
multiplying factor between the widths of the transistors between stages. With a load of
2pF, the number of stages becomes 8.29 and with 3pF, the number of stages is 8.69. If
eight stages are used, the gain between stages becomes 2.81 and 2.96 for 2 and 3pF loads,
respectively. This is at the higher end of what would be desirable for the gain. With nine
stages, the gain factor drops to 2.51 and 2.63, but with an odd number of stages this
clock-distribution system would give the complement of the signal, not the true signal.
Ten inverters would give the true signal, and would only have a gain of 2.28 and 2.38 per
stage for the loads of 2 and 3pF.
$$N = \frac{\ln\left(C_L / C_{in}\right)}{\ln(e)} \qquad (6.1)\ [50]$$

The second method determines the gain between stages when the number of
stages is specified. The equation for this is

$$\text{Gain} = \left[\frac{C_L}{C_{in}}\right]^{1/N} \qquad (6.2)\ [2]$$
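
Equations 6.1 and 6.2 can be evaluated numerically with the values given in the text (an input capacitance of 0.505fF and loads of 2pF and 3pF). The sketch below is a plain transcription of the two formulas; the function names are illustrative.

```python
import math


def optimal_stage_count(c_load: float, c_in: float) -> float:
    """Equation 6.1: N = ln(C_L / C_in) / ln(e), i.e. a per-stage gain of e."""
    return math.log(c_load / c_in) / math.log(math.e)


def stage_gain(c_load: float, c_in: float, stages: int) -> float:
    """Equation 6.2: per-stage gain when the number of stages is fixed."""
    return (c_load / c_in) ** (1.0 / stages)


C_IN = 0.505e-15  # input capacitance of the minimum inverter, in farads

for c_load in (2e-12, 3e-12):
    n_opt = optimal_stage_count(c_load, C_IN)
    gains = {n: stage_gain(c_load, C_IN, n) for n in (8, 9, 10)}
    summary = ", ".join(f"N={n}: {g:.2f}" for n, g in gains.items())
    print(f"C_L = {c_load * 1e12:.0f} pF: N_opt = {n_opt:.2f}; gain {summary}")
```

The computed values agree with the figures quoted in the text to within rounding in the last digit.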

With a load of 2 to 3pF and ten stages, the gain factor becomes 2.28 to 2.38. With
these values as a guideline, a series of tests of the distribution system can be performed.
To achieve fast rise and fall-times and have a reasonable propagation delay, a gain value
of 1.9 was chosen. This is at the low end of the acceptable gain factors, and is done to
save power. This becomes a gain of 3.8 for the branches that split. This gives the
clock-distribution system the following PMOS transistor widths for each stage: 500nm,
950nm, 1.8μm, 3.4μm, 6.5μm, 12.3μm, 23.5μm, 44.6μm, 84.9μm, and 161.3μm. For the
NMOS transistors, the widths are 160nm, 304nm, 577nm, 1.1μm, 2.0μm, 3.9μm, 7.5μm,
14.3μm, 27.1μm, and 51.6μm. A further concern is when the H-tree branches. An inverter
at a branching path powers two inverters of the next stage. The combined width of the
two inverters should not be more than that of a single inverter of the desired gain factor.
The clock tree covering 64 grids only needs six branching paths to cover them, but the
clock tree has ten stages. This means that there are four levels that do not branch. Three
of these levels are at the start of the distribution system. At the start, the transistors are
small and use less power. These stages also have higher gain than the stages that split.
The stage-six inverters power only one stage-seven inverter each. The inverters for stages
nine and ten are large, and are broken down into smaller inverters along the path of the
clock-distribution system. The ninth stage is broken down into two inverters, with each
half powering one level-ten inverter. The level-ten inverter is broken down into eight
smaller inverters. Two sets of four are distributed on two of the edges of the regional
grid, as seen in Figure 6.2.
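
The width progression quoted above is a geometric series with ratio 1.9, starting from the minimum inverter (500nm PMOS, 160nm NMOS). A minimal sketch that regenerates the list is given below; the function name is illustrative. The values in the text appear to be truncated rather than rounded, so the last printed digit can differ slightly.

```python
def stage_widths_nm(first_width_nm: float, gain: float, stages: int) -> list[float]:
    """Return the transistor width (nm) of each stage for a fixed per-stage gain."""
    return [first_width_nm * gain ** stage for stage in range(stages)]


pmos_widths = stage_widths_nm(500.0, 1.9, 10)
nmos_widths = stage_widths_nm(160.0, 1.9, 10)

for stage, (wp, wn) in enumerate(zip(pmos_widths, nmos_widths), start=1):
    print(f"stage {stage:2d}: PMOS {wp / 1000:8.3f} um, NMOS {wn / 1000:7.3f} um")
```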

Figure 6.2 Buffer Stages

The H-tree structure consists of one inv1, one inv2, one inv3, two inv4s, four
inv5s, eight inv6s, eight inv7s, sixteen inv8s, thirty-two inv9s, and sixty-four inv10s. The
power consumed by each stage increases, due to both the number of inverters increasing
and the increasing size of the transistors in the inverters. The power consumed by the first
stage is 10.6μW, by the second stage is 31.7μW, by the third stage is 115.5μW, by the
fourth stage is 444μW, and by the fifth stage is 1.81mW. Carrying on to the sixth stage,
the power consumed is 3.94mW, by the seventh stage is 14.4mW, by the eighth stage is
42.6mW, by the ninth stage is 137mW, and by the tenth stage is 435.4mW.
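
The inverter counts per stage follow from the fan-out at each level of the tree. In the sketch below, the fan-out list is inferred from the counts given in the text (it is not stated in this form in the dissertation), and the function name is illustrative.

```python
def inverters_per_stage(fan_outs: list[int]) -> list[int]:
    """Return the number of inverters in each stage, starting from a single root inverter."""
    counts = [1]
    for fan_out in fan_outs:
        counts.append(counts[-1] * fan_out)
    return counts


# Fan-out from stage i to stage i+1, inferred from the stage counts in the text:
# stages 1 and 2 drive a single inverter, stage 6 drives a single stage-7 inverter,
# and the remaining six transitions each split into two branches.
FAN_OUTS = [1, 1, 2, 2, 2, 1, 2, 2, 2]

counts = inverters_per_stage(FAN_OUTS)
print(counts)
print("branching levels:", sum(1 for f in FAN_OUTS if f == 2))
```

Six of the nine transitions split in two, which gives the 64 final-stage inverters needed for the 64 grids.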
6.2 REGIONAL DISTRIBUTION "THE GRID"

The method of regional distribution is a grid. This method is excellent for
distributing a clock signal with low skew, but consumes a large amount of power. The
grid consists of 64 squares, each 1.25mm on a side. The grid is laid out in the first thick
metal layer. This allows end users to place their circuits anywhere, including under the
grid itself, without conflicts between the metal layers in their sub-cells and the grid. Each
grid is further broken down into sub-grids that are 375μm X 1.125mm. There are eight
inverters powering each grid. These inverters must power the grid and all the circuits
that the end user attaches to it. The eight inverters are located on the
corners of the sub-grids. To reduce noise, the clock lines are shielded by ground wires on
either side. Shielding wires are very effective at reducing noise. They are capable of
almost eliminating crosstalk noise that is less than 0.54 volts. The only more effective
method is to use reduced-voltage, differential signals. To further reduce skew, the input
lines to the inverters are routed through serpentines to ensure that they are all the
same length. This ensures that the inverters close to the source of the input have the same
length of wire leading to them as do the ones further away.
6.3 SCHEMATIC

In the schematic version of the clock-distribution system, there are 11 types of
inverters: inv1, inv2, inv3, inv4, inv5, inv6, inv7, inv8, inv9b, inv10, and inv10b. The basic
structure of these inverters is shown in Figure 6.3.

Figure 6.3 Schematic of a Buffer

These inverters make up the basis of the clock distribution system. From the clock
generator, the clock signal passes through a single inv1 that powers inv2, which in turn
powers inv3. The inv3 controls two inv4s. Then one inv4 powers two inv5s. Each inv5, in
turn, controls two inv6s. Each inv6 powers one inv7. One inv7 delivers a signal to two
inv8s. An inv8 powers two inv9s, each of which is composed of two inv9bs. Each of the
inv9bs powers one inv10 that is composed of eight inv10bs. This is shown in Figures 6.4,
6.5 and 6.6.

Figure 6.4 Schematic of the H-tree Clock Distribution System Covering 64 Grids

Figure 6.5 Schematic of the H-tree Clock Distribution System Covering 16 Grids

Figure 6.6 Schematic of the H-tree Clock Distribution System Covering 4 Grids

The final step is the schematic that models the load. This schematic represents the
load to be applied by the end user as well as the load of the grid itself. The wires
themselves are modeled as transmission lines with resistance, capacitance, and
inductance. This is shown in Figure 6.7.

Figure 6.7 Schematic of the Grid Model

The maximum power for the clock-distribution system operating at 2GHz is
1.75W and the average power consumption is 564.9mW. From the schematic, the
clock-distribution system with a load of 2pF has rise and fall-times of Tr = 72.6ps
and Tf = 70.8ps. The propagation delays are tpHL = 443.1ps and tpLH = 436.9ps (See
Figure 6.8). With a load of 3pF, it has rise and fall-times of Tr = 96.76ps and Tf = 96.2ps.
The propagation delays are tpHL = 454.8ps and tpLH = 450.7ps. Varying the load in this way
demonstrates that the clock-distribution system supplies a low-skew clock signal with
little variance due to the load on the grid.

Figure 6.8 Simulation of H-tree Clock Distribution System Input (Black) Output
(Red)

Monte Carlo analysis is a means of determining how small changes in the circuit
affect the performance of the circuit. In simulation runs, parameters that affect the
delay of the circuit are changed. For each run in which the circuit has a delay within a
particular range, the count in the corresponding histogram bin is incremented. Further
testing using Monte Carlo found
that by varying the process parameters and the mismatch among individual components,
the propagation delay varied between 368ps and 384ps with a standard deviation of
6.94ps and a mean of 374.5ps. These results are shown in Figure 6.9. The mean deviation
of the clock signal within the grid is 17.4fs, with a standard deviation of 192.1fs. This is
shown in Figure 6.10, where the axes are the delay between two points and how many
times this value occurred.
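
The histogram-building step of a Monte Carlo run can be illustrated with a small sketch. This is not the simulator flow used in the dissertation: each circuit simulation is replaced here by a hypothetical Gaussian draw using the mean and standard deviation reported above, and the bin width, run count, and function name are arbitrary choices for illustration.

```python
import math
import random


def delay_histogram(runs: int, mean_ps: float, sigma_ps: float,
                    bin_width_ps: float, seed: int = 0) -> dict[float, int]:
    """Bin the delays from repeated runs; each run increments one histogram bin."""
    rng = random.Random(seed)
    bins: dict[float, int] = {}
    for _ in range(runs):
        # Stand-in for one circuit simulation with perturbed process parameters.
        delay_ps = rng.gauss(mean_ps, sigma_ps)
        lower_edge = bin_width_ps * math.floor(delay_ps / bin_width_ps)
        bins[lower_edge] = bins.get(lower_edge, 0) + 1
    return dict(sorted(bins.items()))


hist = delay_histogram(runs=500, mean_ps=374.5, sigma_ps=6.94, bin_width_ps=2.0)
for lower_edge, count in hist.items():
    print(f"{lower_edge:6.1f} to {lower_edge + 2.0:6.1f} ps: {'#' * (count // 5)}")
```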

Delay Between Input and grid R1Q1G1 in ps
Figure 6.9 Example of the Monte Carlo Simulation for the H-tree Clock
Distribution System

Delay Within Grid R1Q1G1 in ps
Figure 6.10 Example of the Monte Carlo Simulation for the Grid
6.4 LAYOUT

The layout is broken down into five layers. This is done to reduce demands on
designing the clock-distribution system, as well as to allow the reuse of smaller parts. The
first is the grid itself with the load capacitors attached to the center of each wire, as
shown in Figure 6.11. The wires in these grids are shielded to help in the reduction of
noise. The next layer has the level-ten inverters and one inv9b that powers the level-ten
inverters, as shown in Figure 6.12. The next design level takes four grids and attaches the
level-eight inverters. This is called a 4-H block and is shown in Figure 6.13. Four 4-H
blocks, with the addition of the inverters for stages five, six, and seven, form a 16-H block,
making up a quarter of the H-tree distribution system, as shown in Figure 6.14. The next
layer has four 16-H blocks and the inverter stages one through four. This completes the
H-tree distribution system, as shown in Figure 6.15.

Figure 6.11 Grid Layout, With Load Capacitance

Figure 6.12 Grid Layout, With Level-ten and One inv9b Inverters

Figure 6.13 Layout of Four Grids, With Level-eight Inverters

Figure 6.14 Layout of Sixteen Grids, With Level-five Inverters

Figure 6.15 Layout of the H-tree Clock Distribution System Covering 64 Grids

In between the grids lie the power and ground lines that are intended to power the
H-tree distribution system, along with any circuits the end user may add. Each line has
the extra current capacity of 200mA. If more current is needed, the E1 wires can be
widened for extra capacity.
From the layout, the maximum power for the clock-distribution system is
2.1376W and the average power consumption is 0.534W. From the layout, the clock
distribution system with a load of 2pF has rise and fall-times of Tr = 66.92ps and Tf
=66.75 ps (See Figure 6.16). The propagation delays are tpHL = 389.8ps and tpLH =
358.94 ps. The area covered is 9300.44μm X 9112.3μm.

Figure 6.16 Simulation of H-tree Clock Distribution System from Layout
Input (Black) Output (Blue)

The design of the H-tree clock distribution provides an excellent, stable, low-noise,
low-skew signal throughout an entire chip area, and provides an excellent test for
the NPLL. This is done with ten buffer stages in an H-tree that branches six times. The 64
regional grids can handle 2 to 3pF loads applied by the end user. Noise is reduced with
shielding wires that protect the clock lines. Monte Carlo simulations show only a 14.5ps
skew between grids.

7. RESULTS
Phase-lock loops (PLLs) are very important in communications and signal
processing. Along with these uses, a PLL is used to form a stable clock signal for a
microprocessor. While a standard PLL is useful, there is room for improvement in
reducing locking time and noise. For this dissertation, a clock-distribution system
has been developed using an NPLL clock generator and an H-tree distribution network.
Both of these outperformed their published competitors.
There are four main methods to improve a PLL: multiple charge pumps, variable
bandwidths of the loop-filter, multiple controls for the voltage-controlled oscillator (VCO),
and integrating the error signal. Multiple charge pumps first give a high current and high
gain for faster locking time, and then switch to a separate charge pump with a small
current that produces a lower gain once the PLL is locked, reducing the ripple voltage to
the VCO and therefore the jitter. Usually, this scheme is limited to two charge pumps and an
in-lock detector.[73][2] Another method to achieve this is to use two phase-detectors (PD),
each controlling a single charge pump.[37]
The second method is to change the loop-filter of the PLL. Adjustable loop-filters
have wide bandwidths for fast locking. Then, once lock is detected, the
bandwidth is reduced to lessen the ripple voltage and jitter. The simplest method to
achieve this is to switch in and out an extra capacitor or resistor in the filter.[61] Digital
filters are simpler, because they switch on more stages or adjust the coefficients of the
filter.
The third method is to use a VCO that has both coarse and fine-tuning ranges.
This is typically done in an inductor-capacitor (LC) oscillator. Two different inputs can
be connected to variable resistors that affect the speed of oscillation. They are controlled
by differing PDs and charge pumps.[23][10] While this method does reduce some noise, it is
typically done to increase frequency range.
The last method is to use an integrator of the error signal. This is typically used
with a multiple-input VCO. The integrator keeps track of the error signal over time, and
corrects for it.[30][68] This method slows down locking time, but can be significant in
reducing error. The NPLL will be compared to PLLs using these techniques, and will be
shown to outperform them.
7.1 NONLINEAR PHASE-LOCK LOOP

The methods, discussed in the previous section, have been used in the past to
reduce jitter and decrease lock time. However, the nonlinear phase-lock loop (NPLL) can
further decrease lock-time and reduce jitter. The NPLL operates by adjusting the gain of
the forward path of the NPLL. This is done so that high-gain provides fast locking time.
When the desired frequency is achieved, there is a reduction in gain, which lowers jitter.
This is achieved with a fuzzy controller.
The NPLL works by taking an input signal that is generated from a low-speed off-chip
oscillator and comparing it with the feedback signal in a phase frequency detector
(PFD). The PFD has two outputs. The first indicates whether the VCO needs to speed up and
the other indicates whether the VCO needs to slow down. The outputs of the PFD are
synchronized with the clock by the use of two D flip-flops (DFFs). The synchronized signals
are used to enable two 4-bit counters. The counters act as a time-to-digital converter. The
longer the input signal is high, the larger the digital-output value will be. These digital
values are loaded into registers and the counters are cleared. This is accomplished with a
third counter. The values of the registers are then subtracted to determine the average value
needed to control the VCO. The subtracter is a sign-magnitude subtracter. This type of
subtracter is used because the positive and negative values mirror each other,
allowing the size of the next circuit, the programmable logic device (PLD), to be
halved.
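
The behaviour of the sign-magnitude subtracter can be modelled in a few lines. This is a behavioural sketch only, not the gate-level design: the sign convention (1 when the "slow down" count exceeds the "speed up" count) and the function and argument names are assumptions made for illustration.

```python
def sign_magnitude_subtract(up_count: int, down_count: int, bits: int = 4) -> tuple[int, int]:
    """Subtract two unsigned counter values and return (sign_bit, magnitude)."""
    limit = (1 << bits) - 1
    if not (0 <= up_count <= limit and 0 <= down_count <= limit):
        raise ValueError(f"counter values must fit in {bits} bits")
    difference = up_count - down_count
    sign_bit = 1 if difference < 0 else 0  # assumed convention: 1 means negative
    return sign_bit, abs(difference)


print(sign_magnitude_subtract(9, 4))  # positive difference
print(sign_magnitude_subtract(4, 9))  # same magnitude, opposite sign
```

Because the two cases share the same magnitude and differ only in the sign bit, a lookup stage that follows needs entries for the magnitudes only, which is the halving the text refers to.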
The PLD implements the fuzzy controller, where any given input value will
output a corresponding value described by the fuzzy sets. Inputs away from the operating
point will be increased, producing a gain. Values close to the operating point will be
decreased, reducing the gain. The PLD outputs a digital value, which needs to be changed
into an analog value for the low-pass filter and the VCO. To accomplish this, a
digital-to-analog converter (DAC) is used. It consists of a resistor tree that switches values
from the high range to the low range based on the value of the sign-bit from the sign-magnitude
subtracter. The switching of the resistor tree saves area and power in the DAC. The
voltages formed from the resistor tree are connected to a series of pass-gates that are
opened and closed based on the value out of the PLD. From this, a single voltage is
applied to the low-pass filter. The low-pass filter eliminates any glitches from the switching
of the DAC, and consists of a simple, first-order time-constant (RC) circuit that forms the
control signal for the VCO. The VCO is a dual-delay, differential, six-stage ring
oscillator that produces the desired output frequency. The VCO has a range of about
2.3GHz to 1.7GHz. The output is fed back to the phase detector by way of a divider that
consists of five toggle flip-flops (TFFs). The NPLL schematic is shown in Figure 7.1 and
the layout is shown in Figure 7.2.

Figure 7.1 NPLL Schematic

Figure 7.2 NPLL Layout

With this architecture, a fast-locking NPLL can be achieved. This design
outperforms specifically structured standard PLLs, and other design concepts, in
lock time and jitter. Six other PLLs were designed with standard architectures in the same
IBM 0.13μm process for comparison purposes. In this dissertation, they are called the
"standard PLLs". The standard PLLs use as many of the same sub-cells as possible to
show that the improvement in performance is due to architecture. These standard PLL designs
differ in the type of phase detector used. One is the standard XOR phase detector.
The second is the set & reset (SR) PFD. Two instances of the charge pumps were
designed for the SR PFD. One pump has high gain and the other has a lower gain. In
three cases, differing second-order low-pass filters were used to achieve the same locking
time as the NPLL. In the other three cases, the low-pass filter was designed to achieve
minimum jitter. These six PLLs are shown in Figures 7.3 to 7.8. All the standard PLLs use
the same VCO and divide-by counter as the NPLL, and the ones that use the SR PFD type
also use the same PD. This has led to some problems: the VCO is sensitive to changes in the
control voltage. This can cause some increases in the phase noise and rippling effects in the
output frequency. The fact that both the NPLL and the standard PLLs have this problem
shows that it is from the VCO and not the NPLL architecture. If a better-performing PLL
is desired, the VCO should be redesigned along with the PD.

Figure 7.3 PLL One PFD Low Gain Larger Bandwidth for Faster Locking Time

Figure 7.4 PLL Two XOR Larger Bandwidth for Faster Locking Time

Figure 7.5 PLL Three PFD High Gain Larger Bandwidth for Faster Locking Time

Figure 7.6 PLL Four XOR Smaller Bandwidth for Low Jitter

Figure 7.7 PLL Five PFD Low Gain Smaller Bandwidth for Low Jitter

Figure 7.8 PLL Six PFD High Gain Smaller Bandwidth for Low Jitter

7.1.1 Jitter

Jitter is a small uncertainty in the clock signal. There are three main definitions of
jitter: absolute jitter, adjacent-period jitter, and period jitter. There is also root mean
square (RMS) jitter, which is a standard method of measuring the amount of uncertainty
in a circuit. The measurements from the NPLL show an improvement over both the
reported systems found in the literature review and the standard PLLs.
7.1.1.1 Adjacent Period Jitter

The adjacent period jitter is the difference between one period of the clock and the
next. This is sometimes called peak-to-peak jitter. In the NPLL, the adjacent period jitter
was found to be 2.59ps. This is a reduction of 84.5 percent over the best reported value.
The comparison data is shown in Table 7.1. Figures 7.9 and 7.10 show close-ups of
the frequency-versus-time plots for both the NPLL and the best-performing standard
PLL. However, the standard PLL requires 1.6μs to reach steady-state and achieve its
minimum jitter value. The NPLL maintained its jitter values even when frequency
hopping. This is not true for the standard PLL. Frequency-versus-time plots are shown in
Figures 7.11 and 7.12. They show that as the input frequency changes, so does the output,
while maintaining low jitter.

Figure 7.9 Close-Up of the Frequency-Versus-Time Plots for the NPLL

Figure 7.10 Close-Up of the Frequency-Versus-Time Plots for the Standard PLL

Figure 7.11 Frequency-Versus-Time Plots for the NPLL While Frequency Hopping

Figure 7.12 Frequency-Versus-Time Plots for Standard PLL While Frequency
Hopping

Reference #        Adjacent Period Jitter (ps)    % Improvement
NPLL Layout        2.59                           100
NPLL Schematic     0.23                           -1026.1
27                 16.7                           84.5
23                 25                             89.6
47                 80                             96.8
9                  45                             94.2
5                  62                             95.8
34                 20                             87
35                 28.8                           91
8                  110                            97.6
20                 222                            98.8
6                  17.6                           85.3
PLL1 Schematic     25.54                          89.9
PLL2 Schematic     47.27                          94.5
PLL3 Schematic     36.36                          92.9
PLL4 Schematic     0.3929                         -559.2
PLL5 Schematic     0.7115                         -264
PLL6 Schematic     0.3445                         -651.8

Table 7.1 Adjacent Period Jitter For Phase-Lock Loops
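
The "% Improvement" column of Table 7.1 is consistent with the relative reduction of the NPLL layout jitter (2.59ps) with respect to each compared design. The formula below is inferred from the tabulated values rather than stated in the text, and the function name is illustrative.

```python
def percent_improvement(npll_jitter_ps: float, other_jitter_ps: float) -> float:
    """Relative jitter reduction of the NPLL versus another design, in percent."""
    return (1.0 - npll_jitter_ps / other_jitter_ps) * 100.0


NPLL_LAYOUT_JITTER_PS = 2.59

# A few rows of Table 7.1: (label, adjacent period jitter in ps).
rows = [("27", 16.7), ("6", 17.6), ("PLL1 Schematic", 25.54), ("NPLL Schematic", 0.23)]
for label, jitter_ps in rows:
    improvement = percent_improvement(NPLL_LAYOUT_JITTER_PS, jitter_ps)
    print(f"{label:15s} {jitter_ps:8.3f} ps  {improvement:9.1f} %")
```

A negative value means the compared design has lower jitter than the NPLL layout.
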
7.1.1.2 Period Jitter

Period jitter is the variation in a period of the clock produced by the PLL from the
perfect clock period. The NPLL has a period jitter of 6ps. This is an improvement of 62
percent over the best specifically designed PLL. The reason the period jitter is more than
half of the peak-to-peak jitter is that the output is slightly less than the perfect clock on
average, so one side will have a larger jitter value. Comparing with systems found in the
literature review is difficult, because this value is seldom reported.
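
The two jitter definitions used in this section can be written down compactly. The sketch below takes the worst-case (peak) value over a record of measured periods, which is an assumption about how the figures were extracted; the period values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def period_jitter(periods_ps: list[float], ideal_period_ps: float) -> float:
    """Largest deviation of any single period from the ideal clock period."""
    return max(abs(period - ideal_period_ps) for period in periods_ps)


def adjacent_period_jitter(periods_ps: list[float]) -> float:
    """Largest difference between one period and the next."""
    return max(abs(later - earlier)
               for earlier, later in zip(periods_ps, periods_ps[1:]))


# Hypothetical measured periods around an ideal 500 ps (2 GHz) clock.
periods = [500.0, 501.0, 499.0, 500.5]

print("period jitter:", period_jitter(periods, 500.0), "ps")
print("adjacent period jitter:", adjacent_period_jitter(periods), "ps")
```
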
7.1.2 Phase Noise

Phase noise is sometimes used as a measure of performance. It is defined here as
the difference in power between the main signal and a 1Hz bandwidth located at the offset
frequency away from the main signal in the DFT. The offset frequency is normally either
1kHz or 1MHz. The NPLL has a phase noise of -56dBc/Hz at 1MHz from the center
frequency of 2GHz. This is not the most impressive value. At 10MHz, the phase noise
drops to -67.4dBc/Hz. Reported values for phase noise are usually below -100dBc/Hz.
For cases of frequency hopping, the phase noise degrades to -39.41dBc/Hz at 1MHz. This is
for a case of switching between 2GHz and 2.28GHz at a switching rate of 6MHz. The DFTs
are shown in Figures 7.13 and 7.14. The best-performing standard PLL for the same
process has a phase noise of -15.2dBc/Hz at 1MHz and -28.99dBc/Hz at 10MHz (Figure
7.15). When the standard PLL is used for frequency hopping, the phase noise, on average,
changes to -16.83dBc/Hz at 1MHz and -22.41dBc/Hz at 10MHz. This is for a case of
switching between 2GHz and 1.778GHz at a switching rate of 6MHz. The DFTs are
shown in Figure 7.16.

Figure 7.13 DFT of the NPLL at 2GHz

Figure 7.14 DFT of the NPLL Frequency Hopping at 2GHz and 1.88GHz

Figure 7.15 DFT of PLL Five at 1.778GHz

Figure 7.16 DFT of PLL Three Frequency Hopping at 2GHz and 1.778GHz

7.1.3 Signal-to-Noise Ratio

Signal-to-noise ratio (SNR) is the difference between the peak of the signal and
the noise floor. The NPLL has an SNR of 102dB. The best-performing standard PLL has an
SNR of 69.68dB. This data was taken after a settling time of 1.4μs. The DFTs for both
of these cases are shown in Figures 7.13 and 7.17.

Figure 7.17 DFT of PLL Five
7.1.4 Spurious Free Dynamic Range

Spurious free dynamic range (SFDR) is another performance measure. The SFDR
is defined as the difference between the signal peak and the highest harmonic. For the
NPLL, this value is 73.8dB, and for the standard PLL it is 14.57dB. The data for the
standard PLL was taken after a longer settling time than for the NPLL. The DFTs for
these cases are shown in Figure 7.13 for the NPLL and Figure 7.17 for the best-performing
standard PLL.
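
The SNR and SFDR definitions used in Sections 7.1.3 and 7.1.4 can be sketched as follows. The spectrum values are made up, and taking the noise floor as the median of the non-signal bins is an assumption; the dissertation reads both quantities from the DFT plots.

```python
import statistics


def sfdr_db(spectrum_db, signal_bin):
    """SFDR: signal peak minus the largest other spectral component (dB)."""
    others = [p for i, p in enumerate(spectrum_db) if i != signal_bin]
    return spectrum_db[signal_bin] - max(others)


def snr_db(spectrum_db, signal_bin):
    """SNR: signal peak minus the noise floor (dB).

    The noise floor is estimated as the median of the other bins,
    which is an assumption made for this sketch.
    """
    others = [p for i, p in enumerate(spectrum_db) if i != signal_bin]
    return spectrum_db[signal_bin] - statistics.median(others)


# Hypothetical five-bin spectrum in dB relative to the signal peak.
spectrum = [-102.0, -102.0, 0.0, -73.8, -102.0]
print(sfdr_db(spectrum, 2), snr_db(spectrum, 2))
```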
7.1.5 Power

The NPLL developed for this dissertation requires a maximum power of 3.97mW
and an average power of 1.98mW for a 1.2V system. This is less than comparable systems
found in the literature review. This could be attributed to the low rail voltage. To show that it is
not due to power-supply differences, the power requirements of the other systems are scaled by
Equation 7.1.
\[ P_{new} = \frac{P_{old}}{U^{2}} \tag{7.1} \]

To find the scale factor U, Equation 7.2 is used.

\[ U = \frac{V_{dd,old}}{V_{dd,new}} \tag{7.2} \]

The NPLL shows a scaled improvement of 14 percent to 93.4 percent over the
systems found in the literature review when corrections are made for the different
supply voltages. This data is shown in Table 7.2. Only two of the compared systems
use less power after scaling.[8][46] This is mainly due to their slow oscillator
frequency. Most of the specifically designed PLLs use less power due to their simplicity.
The two exceptions are the ones with large charge pumps that require more power.
Reference #      Vdd (V)  Power (mW)  Power Scaled (mW)  NPLL/Scaled  Improvement (%)
NPLL layout      1.2      1.982       1.982              1            0
NPLL Schematic   1.2      1.44        1.44               1.376        -37.6
27               1.8      6.4         2.844              0.697        30.3
23               3.3      130         17.19              0.115        88.5
13               1.5      19.5        12.48              0.159        84.1
46               3        4.1         0.656              3.021        -202.1
9                3.3      23.1        3.055              0.649        35.1
5                2.5      100         23.04              0.086        91.4
34               3        15          2.4                0.826        17.4
35               2.5      10          2.304              0.86         14
31               2.5      20          4.608              0.43         57
12               1.3      35          29.822             0.066        93.4
21               2        50          18                 0.11         89
8                1.8      2.95        1.311              1.512        -51.2
20               3.3      200         26.446             0.075        92.5
6                2.2      23          6.843              0.29         71
63               1.8      37.62       16.72              0.119        88.1
PLL1 Schematic   1.2      0.992       0.992              1.998        -99.8
PLL2 Schematic   1.2      0.558       0.558              3.552        -255.2
PLL3 Schematic   1.2      11.244      11.244             0.176        82.4
PLL4 Schematic   1.2      560.8       560.8              0.004        99.6
PLL5 Schematic   1.2      1.002       1.002              1.978        -97.8
PLL6 Schematic   1.2      10.265      10.265             0.193        80.7

Table 7.2 Power for Phase-Lock Loops
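
The scaling of Equations 7.1 and 7.2 can be checked with a short sketch. The function names are hypothetical, and the improvement formula is inferred from the columns of Table 7.2 rather than stated in the text.

```python
def scale_power(p_old_mw, vdd_old, vdd_new=1.2):
    """Scale a reported power to the NPLL supply voltage."""
    u = vdd_old / vdd_new      # Equation 7.2
    return p_old_mw / u ** 2   # Equation 7.1


def improvement_percent(p_npll_mw, p_scaled_mw):
    """Percentage by which the NPLL power is below the scaled power.

    Inferred from Table 7.2: (1 - NPLL / scaled) * 100.
    """
    return (1.0 - p_npll_mw / p_scaled_mw) * 100.0


# Reference [27]: 6.4 mW at 1.8 V, compared with the 1.982 mW NPLL layout.
scaled = scale_power(6.4, 1.8)
print(round(scaled, 3), round(improvement_percent(1.982, scaled), 1))
```

For reference [27] this reproduces the 2.844 mW and 30.3 percent entries of Table 7.2.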


7.1.6 Area

The NPLL has a length of 133.89μm and a width of 60.96μm, giving a total area
of 0.00817mm2 and making the NPLL designed for this dissertation one of the smallest
compared to published papers. However, this is helped by the small feature size. The area
can be scaled to accommodate different feature sizes so that different PLLs can be
compared accurately. Equation 7.3 determines the scaling factor for area, and Equation 7.4
applies it. When the area is scaled for different technologies, the NPLL is still smaller than
many other PLLs by at least 15 percent.
\[ S = \frac{\text{technology}_{old}}{\text{technology}_{new}} \tag{7.3} \]

\[ A_{new} = \frac{A_{old}}{S^{2}} \tag{7.4} \]

There are just two compared systems that have less area.[34][35] The specifically
designed PLLs are the same size or smaller than the NPLL. The size and scaling data are in
Table 7.3.
Reference #  Area (mm2)  Tech. (μm)  Area Scaled (mm2)  NPLL/Scaled  Improvement (%)
NPLL         0.008167    0.13        -                  -            -
27           0.046       0.18        0.024              0.34         66
23           1.48        0.15        1.1116             0.007        99.3
13           0.21        0.13        0.21               0.039        96.1
46           0.07        0.35        0.0097             0.842        15.8
9            0.0868      0.35        0.012              0.681        31.9
5            0.6656      0.3         0.125              0.065        93.5
34           0.11        0.6         0.0052             1.571        -57.1
35           0.028       0.25        0.0076             1.075        -7.5
31           0.2925      0.25        0.0791             0.103        89.7
12           0.7         0.12        0.8215             0.01         99
19           3           0.35        0.4139             0.02         98
8            0.16        0.35        0.0221             0.37         63
20           2.89        0.35        0.3987             0.02         98
6            0.88        0.18        0.459              0.018        98.2
63           4.82        0.18        2.5141             0.003        99.7

Table 7.3 Area for Phase-Lock Loops
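
As a worked example of Equations 7.3 and 7.4, the row for reference [27] in Table 7.3 can be reproduced by hand. The improvement expression in the second line is inferred from the table columns and is not stated explicitly in the text.

```latex
\[
S = \frac{0.18}{0.13} \approx 1.385, \qquad
A_{new} = \frac{0.046}{S^{2}} \approx \frac{0.046}{1.917} \approx 0.024\ \text{mm}^{2}
\]
\[
1 - \frac{A_{NPLL}}{A_{new}} = 1 - \frac{0.008167}{0.024} \approx 0.66
\quad \text{(about 66 percent smaller)}
\]
```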


7.1.7 Lock-Time

Lock-time is the time needed for the NPLL to lock onto its external oscillator
signal. The longest the NPLL takes to achieve lock is 238ns, but it can be as short as 20ns
depending on how large the step is. This covers both start-up and frequency hopping. The
NPLL in this dissertation has a much shorter locking time than all but one of the other
published systems. The paper “A Low-Noise Fast-Lock Phase-Locked Loop with
Adaptive Bandwidth Control” has a shorter locking time of 115ns. The reason for this
is that it operates at a much lower frequency. Most published PLLs are in the 3-60μs
range. The specifically designed PLLs can achieve locking times of 200ns at the
cost of jitter. A standard PLL with a fast locking time has a peak-to-peak jitter of 47.27ps
and a period jitter of 24.26ps. This is a jitter increase of 67 percent to achieve the same
locking time. Table 7.4 shows the locking time of published systems and the
percentage decrease given by the NPLL.
Reference #      T-Lock (μs)  NPLL/T-Lock  Improvement (%)
NPLL Schematic   0.238        -            -
NPLL Layout      0.0705       3.376        -237.6
9                3            0.079        92.1
74               30           0.008        99.2
34               0.15         1.587        -58.7
31               60           0.004        99.6
62               50           0.005        99.5
8                1.5          0.159        84.1
20               2.5          0.095        90.5
6                16           0.015        98.5
63               35           0.007        99.3
PLL1 Schematic   0.238        1            0
PLL2 Schematic   0.212        1.123        -12.3
PLL3 Schematic   0.226        1.053        -5.3
PLL4 Schematic   1.5          0.159        84.1
PLL5 Schematic   1.344        0.177        82.3
PLL6 Schematic   1.66         0.143        85.7

Table 7.4 Lock-Time for Phase-Lock Loops


7.1.8 Overall Performance

The NPLL in this dissertation shows significant improvements in the desired
areas of jitter and lock time. It outperforms most of the systems found in the literature
review. On the jitter criterion, the NPLL shows a greater than 90% reduction against the
published PLLs. On the design merit of locking time, the reduction is again more than
90%. The NPLL performs well in the area of power use, using less power than most of
the other systems. The physical size of the NPLL is larger than the standard PLL,
but it is average for PLLs that use additional techniques to improve performance. The
drawback of the NPLL in this dissertation is the phase noise. Compared with reported values, the
phase noise of the NPLL could be improved drastically. Overall, the NPLL performs superbly and
outperforms many of the other PLLs found in the literature review.
7.2 CLOCK-DISTRIBUTION RESULTS

The clock-distribution system designed for this dissertation is a six-stage H-tree
with ten buffer stages. It propagates the clock signal across a chip area of 9.3mm X
9.1mm with very low skew and jitter. This is helped by shielded wires that further reduce
jitter. The H-tree powers 64 grids, each capable of driving a 2 to 3pF load. Each
grid is a square with two cross members dividing the square into three rectangles, with the
powering transistors at the corners of each rectangle.
7.2.1 Power

The total average power for the clock-distribution system is 0.61W at a speed of
2GHz. While this value is small compared to the other clock-distribution systems, it is
difficult to compare it to published papers. Often the total power for the chip is given,
not just the power for the distribution system. Both speed and chip area will greatly affect
the amount of power needed. The power scaling and comparison data are shown in Table
7.5.
Reference No.  Vdd (V)  Power (W)  Power Scaled (W)  NPLL/Scaled  Improvement (%)
NPLL           1.2      0.61       0.61              1            0
7              1.35     130        102.716           0.006        99.4
15             1.8      1          0.444             1.374        -37.4
30             1.5      82         52.48             0.012        98.8
64             1.3      130        110.769           0.006        99.4

Table 7.5 Power for the Clock-Distribution Systems
7.2.2 Area

The area is the physical size of the clock-distribution system; in this dissertation,
it covers 9.3mm X 9.1mm, or an area of 84.63mm2. While this is substantial for a custom
integrated circuit (IC), the area is small compared to a major microprocessor. Of the
compared published papers, only one clock-distribution system is smaller after scaling.[15]
The scaled area data and comparison are located in Table 7.6.
Reference #  Area (mm2)  Tech. (μm)  Area Scaled (mm2)  NPLL/Scaled  % Bigger
NPLL         84.63       0.13        -                  -            -
7            432         0.13        432                0.196        -80.4
15           44.73       0.18        23.331             3.627        262.7
30           217         0.18        113.188            0.748        -25.2
64           374         0.13        374                0.226        -77.4
65           400         0.18        208.642            0.406        -59.4

Table 7.6 Area for the Clock-Distribution Systems
7.2.3 Latency

Latency is the time difference between when the input-clock is switched and
when that change is seen at the output. For this clock-distribution system, the average
latency is 374.5ps.

7.2.4 Rise and Fall-Time

The rise time, trise, is the time required for the clock signal to change from 10 percent of its
value to 90 percent of its value. The fall time, tfall, is the time required to change from 90 percent to
10 percent. The average rise time with a 2pF load is 90.94ps, and the fall time was found
to be 78.92ps. Ideally, these two values should be equal.
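
A minimal sketch of how the 10 percent to 90 percent rise and fall times could be extracted from a sampled waveform is given below. The function names and the use of linear interpolation between samples are assumptions; the values in this section come from circuit simulation, not from this code.

```python
def crossing_time(times, values, level, rising=True):
    """First time the waveform crosses `level`, by linear interpolation."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        v0, v1 = values[i], values[i + 1]
        if rising:
            crossed = v0 < level <= v1
        else:
            crossed = v0 > level >= v1
        if crossed:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def rise_time(times, values, v_low, v_high):
    """Time between the 10 percent and 90 percent points of a rising edge."""
    swing = v_high - v_low
    t10 = crossing_time(times, values, v_low + 0.1 * swing, rising=True)
    t90 = crossing_time(times, values, v_low + 0.9 * swing, rising=True)
    return t90 - t10


def fall_time(times, values, v_low, v_high):
    """Time between the 90 percent and 10 percent points of a falling edge."""
    swing = v_high - v_low
    t90 = crossing_time(times, values, v_low + 0.9 * swing, rising=False)
    t10 = crossing_time(times, values, v_low + 0.1 * swing, rising=False)
    return t10 - t90


# Hypothetical ideal ramp from 0 V to 1.2 V over 100 ps.
print(rise_time([0.0, 100.0], [0.0, 1.2], 0.0, 1.2))
```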
7.2.5 Jitter

Jitter is the uncertainty of when a clock edge will occur. Jitter for the clock-distribution
system was determined by running 100 Monte Carlo simulations, varying
both process and mismatch values. The delay between the input and the output of the
clock-distribution system was determined. The maximum standard deviation from the
mean was 6.94ps. The RMS jitter is then determined to be 15ps. Examples of the Monte
Carlo runs are shown in Figures 7.18 and 7.19. Jitter within the grid itself is also a
concern. The RMS jitter within the grid is 164fs, with a jitter of 1ps, though this jitter may be caused in
part by the level-9 and level-10 inverters, which are simulated at the same time as the grid.
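
The post-processing of a set of Monte Carlo delay samples can be sketched as below. The sample values are made up, the function name is hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics


def delay_statistics(delays_ps):
    """Summarize input-to-output delays from a set of Monte Carlo runs."""
    return {
        "mean_ps": statistics.mean(delays_ps),
        "std_ps": statistics.stdev(delays_ps),  # sample standard deviation
        "peak_to_peak_ps": max(delays_ps) - min(delays_ps),
    }


# Three hypothetical delay samples in ps (the dissertation uses 100 runs).
stats = delay_statistics([370.0, 374.0, 378.0])
print(stats)
```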

Figure 7.18 Examples of the Monte Carlo Simulation Input to Grid R3Q3G3 (axis: delay in ps)

Figure 7.19 Examples of the Monte Carlo Simulation Input to Grid R1Q3G3 (axis: delay in ps)

7.2.6 Skew

Skew is the difference in the arrival time of the clock edge at any two points of
the clock-distribution system. Skew was determined by running 100 Monte Carlo
simulations, varying both process and mismatch values. The delay between random
points was determined; the mean difference was 197.8fs, with the worst-case
skew being 537fs. The standard deviation of the skew is the same as for the jitter.
Examples of the Monte Carlo runs are shown in Figures 7.20 and 7.21.
In-grid skew has the same definition as skew, but it is confined to the bounds of one
grid of the clock-distribution system. The skew within the grid is at most 235fs.
Examples of the Monte Carlo runs are shown in Figures 7.22 and 7.23.
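
The mean and worst-case skew between two endpoints can be computed from paired Monte Carlo delays as in the following sketch. The function name and the sample values are hypothetical; pairing the delays run by run is an assumption about how the simulations were compared.

```python
def skew_statistics(delays_a_ps, delays_b_ps):
    """Mean and worst-case skew between two endpoints.

    Each list holds the input-to-endpoint delay for the same sequence
    of Monte Carlo runs, so the entries are compared run by run.
    """
    if len(delays_a_ps) != len(delays_b_ps) or not delays_a_ps:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [abs(a - b) for a, b in zip(delays_a_ps, delays_b_ps)]
    return sum(diffs) / len(diffs), max(diffs)


# Hypothetical delays (ps) at two grids for three runs.
mean_skew, worst_skew = skew_statistics(
    [374.0, 374.5, 375.0], [374.25, 374.5, 374.5]
)
print(mean_skew, worst_skew)
```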

Figure 7.20 Skew Grid R1Q1G1 to R3Q1G1 (axis: delay in ps)

Figure 7.21 Skew Grid R1Q1G1 to R3Q3G3 (axis: delay in ps)

Figure 7.22 In Grid Skew Legs 1 to 3 (axis: delay in ps)

Figure 7.23 In Grid Skew Legs 1 to 4 (axis: delay in ps)
7.2.7 Jitter and Skew

The total uncertainty of when a signal arrives at a point is a combination of jitter
and skew. For this clock-distribution system, it is found to be 15.5ps. This compares
favorably to other published papers. After scaling, an improvement of at least 23.3
percent in the reduction of jitter and skew was found. This is an impressive feat without
deskew buffers or differential-signal pairs. Table 7.7 contains the data for the scaled
combined jitter and skew.
Reference No.  Skew+Jitter (ps)  Tech. (μm)  Delay Scaled (ps)  NPLL/Scaled  Improvement (%)
NPLL           15.5              0.13        15.5               1            0
15             100               0.18        72.222             0.215        78.5
30             51                0.18        36.833             0.421        57.9
64             31                0.13        31                 0.5          50
65             28                0.18        20.222             0.766        23.4

Table 7.7 Jitter and Skew for the Clock-Distribution Systems
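
The scaled values in Table 7.7 are consistent with a linear scaling of delay with feature size. That relation is inferred from the table entries and is not stated as an equation in the text, so the sketch below should be read as an assumption.

```python
def scale_delay(delay_ps, tech_old_um, tech_new_um=0.13):
    """Scale a timing uncertainty linearly with feature size (assumed)."""
    return delay_ps * tech_new_um / tech_old_um


def improvement_percent(npll_ps, scaled_ps):
    """Percentage reduction of the NPLL value relative to the scaled value."""
    return (1.0 - npll_ps / scaled_ps) * 100.0


# Reference [15]: 100 ps of combined skew and jitter in a 0.18 um process.
scaled = scale_delay(100.0, 0.18)
print(round(scaled, 3), round(improvement_percent(15.5, scaled), 1))
```

For reference [15] this gives 72.222 ps and a 78.5 percent improvement, matching the table.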


7.2.8 Clock Distribution Summary

The clock-distribution system is used to propagate the clock signal throughout the
chip while adding as little jitter and skew as possible. The clock-distribution
system in this dissertation is used as a test of the NPLL. It provides a low-noise
distribution system. The jitter and skew results show a substantial improvement over
other clock-distribution systems while covering only an 81mm2 area, which is large for
custom integrated circuits (ICs) but small for top-of-the-line microprocessors. It is
one third to one fifth the size of a Pentium®. The power for the clock-distribution system
is low at 0.53W. However, this is hard to compare because most papers list the total power of
the chip, not just the clock-distribution system. Overall, the clock-distribution system is an
appropriate test for the NPLL.


8. CONCLUSIONS AND FUTURE WORK
8.1 CONCLUSION

In this dissertation, clock systems were discussed. A nonlinear phase-lock loop
(NPLL) clock generator was developed that improves the jitter and locking-time
characteristics of the clock generator. This NPLL has a nonlinear-gain controller. It
achieves a faster locking time than a standard phase-lock loop (PLL). It also increases the
signal-to-noise ratio (SNR) and lowers jitter compared to a
standard PLL. The faster locking time and noise reduction are critical improvements in
the quest to produce faster clock systems. The NPLL works by the use of a fuzzy
controller. With the fuzzy controller, the gain of the forward path of the NPLL is
automatically adjusted. When the output is away from the desired operating point, the
gain is high, which quickly brings the circuit back to the correct operating point. When
lock is achieved, in less than 200ns, the VCO is at the desired operating point. The
NPLL gain will then be in a low-gain region for low noise. Phase noise is another major cause
of the jitter that is produced by the voltage-controlled oscillator (VCO).
The phase noise is reduced more by the NPLL than by the standard PLL. This
decrease in gain will improve the jitter characteristics. For the NPLL, the maximum jitter
values are as small as 2.59ps. It requires an area of 133.9μm X 60.96μm and consumes
only 1.98mW of power. The reduction in noise, the improvement in the SNR, and the
decrease in locking time indicate that the NPLL is an improvement in the field of PLLs.

A clock-distribution system was designed for the output of the NPLL. This
distribution system is an H-tree design with six branches. At the end of each branch is a
regional grid, 64 in total, each capable of handling a 3pF load. The clock lines running to
the grids, and the grids themselves, use shielding wires to reduce noise. All clock lines
are length-matched so that they have the same delay. Monte Carlo simulations
show that this distribution system has a maximum skew and jitter of 15.5ps. The
clock-distribution system covers an area of 9.3mm X 9.1mm and consumes 0.534W of power.
This will enhance the performance of clock-distribution systems.
8.2 Future Work

The methods suggested here would improve the performance of a clock-distribution
system. Some would improve locking time, such as adding a derivative to the fuzzy
controller, and others would reduce jitter, such as increasing the number of bits in the controller.
Another method would be to separate the shielding wires in the clock-distribution system.
Power savings could be made by switching off different grids.
8.2.1 Clock Generator

In this dissertation, a design for a new clock generator using an NPLL that has
faster locking times and lower jitter has been demonstrated. Locking times of 200ns and
jitter at less than 2.59ps have been achieved with a clock-distribution system capable of
powering a chip with an area of 81mm2 while driving a load of 128pF. Even with the
dramatic results, more can be accomplished.
The first area to improve is the number of bits in the fuzzy controller. An increase
to 5 bits will improve resolution in the digital-to-analog converter (DAC), and the
extra bit will reduce quantization error. It will also reduce phase noise. The modification
to achieve this is rather simple. First, the counters must have an extra toggle flip-flop
(TFF) for the extra bit. The subtractor can be expanded with the addition of a full
adder for the subtractor and a half adder and an XOR gate for the sign-magnitude
circuitry. The programmable logic device (PLD) will need to be redesigned for the extra
bit. This modification will increase its size by 2.5 times, which will require extra buffers
to power the PLD. Finally, the DAC will need to be upgraded to 5 bits. Of course, these
increases will take more power and area. In addition, to keep the same throughput, the
speed of the counters will need to be doubled.
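
The effect of the extra bit on resolution can be illustrated with a small sketch. It assumes an ideal, uniformly spaced DAC whose full-scale range equals the 1.2V rail; both are assumptions, and the function names are hypothetical.

```python
def dac_lsb_volts(v_full_scale, bits):
    """Step size (one LSB) of an ideal uniform DAC."""
    return v_full_scale / (2 ** bits)


def max_quantization_error_volts(v_full_scale, bits):
    """Worst-case quantization error of an ideal DAC: half an LSB."""
    return dac_lsb_volts(v_full_scale, bits) / 2.0


# Assumed 1.2 V full scale: compare the 4-bit and 5-bit cases.
for bits in (4, 5):
    print(bits, dac_lsb_volts(1.2, bits), max_quantization_error_volts(1.2, bits))
```

Under these assumptions each added bit halves both the step size and the worst-case quantization error. It also doubles the number of counter states, which is why the counters must run twice as fast to keep the same throughput.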
A yet unexplored method is to forego the PLD altogether, and vary the value of
the resistors in the DAC. The varying resistor values will automatically apply the desired
voltage to the DAC for the given input. This will give a small increase in resolution, and
will make the control surface smoother.
These methods should give a reasonable improvement in the performance of the
NPLL. A larger improvement can be achieved by adding a second input to the fuzzy
controller. In control theory, integrating the error signal can be used to reduce the
steady-state error to zero. A controller with both proportional and integral control is called a
proportional integral (PI) controller. Adding the integrator can be done with a digital
accumulator at the output of the subtractor and by expanding the PLD. The accumulator
keeps track of the output of the sign-magnitude subtractor and keeps a running total of
the error signal. The output of the accumulator selects between different PLDs that
contain pre-calculated values for all possible contingencies. The sign-magnitude
subtractor then selects a value from that PLD. The proposed block diagram for this PI
fuzzy controller is shown in Figure 8.1. In a 4-bit system, the accumulator can pick
between 16 PLDs. This will increase both size and power.
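
The table-selection idea described above can be sketched behaviorally. Everything in the sketch is hypothetical: the class name, the offset-binary indexing (the hardware uses a sign-magnitude subtractor), the saturating accumulator, and the placeholder table contents, which only encode which word was selected.

```python
class PIFuzzyLookup:
    """Behavioral sketch: the accumulator picks a table, the error picks a word."""

    def __init__(self, tables, bits=4):
        self.levels = 2 ** bits
        if len(tables) != self.levels or any(len(t) != self.levels for t in tables):
            raise ValueError("expected 2**bits tables of 2**bits words each")
        self.tables = tables
        self.accumulator = 0

    def _clamp(self, value):
        half = self.levels // 2
        return max(-half, min(half - 1, value))

    def step(self, error):
        """Update the running total of the error and return the selected word."""
        half = self.levels // 2
        self.accumulator = self._clamp(self.accumulator + error)
        table = self.tables[self.accumulator + half]  # integral term selects the PLD
        return table[self._clamp(error) + half]       # error selects the word


# Placeholder contents: word value = 16 * table index + word index.
tables = [[16 * a + e for e in range(16)] for a in range(16)]
controller = PIFuzzyLookup(tables)
print([controller.step(err) for err in (1, 1, -3)])
```

In a real design the tables would hold the pre-calculated fuzzy control surface rather than index markers.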
Adding a derivative will not help the error, but will improve locking time.
Inserting a derivative follows the same steps as adding an integrator where the output of
the sign-magnitude subtractor is used to determine if the rate of change of the error is
increasing or decreasing. The output of the derivative selects between differing PLDs that
contain pre-calculated values for all possible contingencies. The sign-magnitude
subtractor then selects a value from within that PLD. [14] [54]

Figure 8.1 NPLL With PI Fuzzy Controller

Both the integrator and the derivative can be used together to reduce both locking time and
jitter. This is called a proportional integral derivative (PID) controller. While this system is
fast and has no steady-state error, a 4-bit system would require a PLD with 4096 words in
it, which is quite large. [14] [54]
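
The 4096-word figure follows if each of the three controller inputs (error, integral, and derivative) is quantized to 4 bits, which is an assumption consistent with the 16 PLDs quoted for the 4-bit PI case:

```latex
\[
N_{PI} = 2^{4} \times 2^{4} = 256 \ \text{words}, \qquad
N_{PID} = 2^{4} \times 2^{4} \times 2^{4} = 2^{12} = 4096 \ \text{words}
\]
```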
In addition to these methods, there is the possibility of using an on-chip
temperature sensor to adjust the PLD, which counters the effects of temperature. The
output of the temperature sensor selects between differing PLDs that contain
pre-calculated values that correct for temperature fluctuation in the chip. The sign-magnitude
subtractor then selects a value from within that PLD.
Another possible method of improving the NPLL is to switch to analog control. This
would eliminate quantization error and reduce some of the latency in the forward path of
the NPLL. This structure may be larger than the proposed NPLL, mainly due to
the voltage-follower operational amplifiers (op-amps). However, the power requirements
should be about the same.
A calculation intensive process, yet to be achieved, for the clock generator is to
derive the noise equations for the NPLL. This will not be a simple task due to the system
being nonlinear.
8.2.2 Clock-Distribution System

The clock-distribution system can be improved by a variety of methods. The most
obvious method is to use separate ground wires to shield the clock lines, as opposed to
using the power lines as the shield lines. This would reduce the crosstalk noise on the
clock signals. The second major method to reduce noise would be to use a reduced-voltage
differential signal. Limiting the swing voltage to 0.3V allows smaller drivers for
the same latency, lowering the drive current and therefore the self-inductance noise. The
second way a differential signal reduces noise is that both wires of the differential pair must
be affected to create an error. Circuit diagrams for this system are shown in Figure 8.2
and Figure 8.3. While this method is preferred for distributing across large areas, it does
not buffer up to power large loads, such as a grid. [37]
Saving power in the distribution grid can be done by adding enable signals to turn
the unused grids on and off. Turning off a single grid saves 8.9mW of power. This would
require a control signal of 6 bits if a demultiplexer were used.
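
The 6-bit figure and the resulting power saving can be checked with a short sketch. The function names are hypothetical, and the demultiplexer is modeled as a simple one-hot decode, which is an assumption about how the enable lines would be driven.

```python
GRID_COUNT = 64          # grids driven by the H-tree
POWER_PER_GRID_MW = 8.9  # power saved by turning off one grid


def select_width(grid_count=GRID_COUNT):
    """Number of select bits a demultiplexer needs to address every grid."""
    return (grid_count - 1).bit_length()


def demux_enable(select, grid_count=GRID_COUNT):
    """One-hot list marking the single grid addressed by `select`."""
    if not 0 <= select < grid_count:
        raise ValueError("select out of range")
    return [index == select for index in range(grid_count)]


def power_saved_mw(grids_off):
    """Power saved when `grids_off` grids are disabled."""
    return grids_off * POWER_PER_GRID_MW


print(select_width(), power_saved_mw(10))
```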


Figure 8.2 Limited Swing Differential Transmitter

Figure 8.3 Differential Receiver

Delay buffers can be used to further reduce skew. Individual control of each grid
is possible. This method is shown in Figure 8.4, where the control signal adjusts the
voltage to a PMOS transistor to speed up or slow down a delay circuit. This circuit also
has the enable option to turn off grids, which would require an extra 128 control lines. To
reduce this, the delay buffers could be moved higher in the H-tree. [30]
As can be seen, there is more work to be done on the clock system and the NPLL.
Breakthroughs in this area are coming and the industry will profit from them.

Figure 8.4 Variable Delay Buffers


Appendix A

ACRONYMS
ADC         Analog-to-Digital Converters
AIFP        Application-Independent Fuzzy Processor
ASFH        Application-Specific Fuzzy Hardware
ASFP        Application-Set Fuzzy Processors
CMOS        Complementary Metal-Oxide-Semiconductor
COG         Center of Gravity
DAC         Digital-to-Analog Converter
DEMUX       Demultiplexers
DFF         D-Flip-Flop
DFF3        D-Flip-Flop 2
DFF4        D-Flip-Flop 4
DFFCLRSYS   D-Flip-Flop with a Synchronous Clear
DFT         Discrete Fourier Transform
FFT         Fast Fourier Transform
FET         Field Effect Transistor
FLC         Fuzzy Logic Controller
GPFP        General-Purpose Fuzzy Processors
IC          Integrated Circuit
LC          Inductor Capacitance
LSB         Least Significant Bit
LUT         Look-Up Tables
MFC         Membership Function Circuits
MFEB        Membership Function Evaluation Block
MOSFET      Metal-Oxide-Semiconductor Field-Effect Transistor
MOSIS       Metal Oxide Semiconductor Implementation Service
MSB         Most Significant Bit
MUX         Multiplexers
NB          Negative-Big
NM          Negative-Medium
NMOS        N-channel MOSFET
NPLL        Nonlinear Phase-Lock Loop
NS          Negative-Small
OPAmp       Operational Amplifier
PB          Positive-Big
PD          Phase-Detector
PFD         Phase Frequency Detector
PLD         Programmable Logic Device
PLL         Phase-Lock Loops
PM          Positive-Medium
PMOS        P-channel MOSFET
PI          Proportional Integral
PS          Positive-Small
RC          Time Constant
RCO         Ripple Carry-Out Signal
RF          Radio Frequency
RMS         Root Mean Square
ROM         Read Only Memory
SFDR        Spurious-Free Dynamic-Range
SNR         Signal-to-Noise Ratio
SR          Set and Reset
TDFP        Task-Dedicated Fuzzy Processors
TFF         Toggle Flip-Flop
TL          Locking Time
TPHL        Propagation High-to-Low
TPLH        Propagation Low-to-High
VCC         Voltage-to-Current Converter
VCO         Voltage-Controlled Oscillator
Z           Zero


Appendix B

Bibliography
1. Abidi, Asad A., “Phase Noise and Jitter in CMOS Ring Oscillators,” IEEE Journal of
Solid State Circuits, Vol. 41, No. 8, pp. 1803-1816, Aug. 2006.
2. Baker, R. Jacob, Li, Harry W., and Boyce, David E., CMOS Circuit Design, Layout,
And Simulation. Piscataway: IEEE Press.
3. Bernstein, Kerry, High Speed CMOS Design Styles. Boston: Kluwer Academic
Publishers, 1998.
4. Best Roland E., Phase-Locked Loops Design, Simulation, and Applications. New
York: McGraw – Hill Companies, Inc. 1999.
5. Boerstler, D. W. “A Low-Jitter PLL Clock Generator for Microprocessors with Lock
Range of 340-612 MHz,” IEEE Journal of Solid-State Circuits, Vol. 34, pp. 513-519, April 1999.
6. Chang, Hsiang-Hui and Liu, Shen-Iuan, “A Wide-Range and Fast-Locking All-Digital
Cycle-Controlled Delay-Locked Loop,” IEEE Journal of Solid-State Circuits,
Vol. 40, No. 3, pp. 661-670, March 2005.
7. Chang, Jonathan and others, “A 130-nm Triple-Vt 9-MB Third-Level On-Die Cache
for the 1.7-GHz Itanium® Processor,” IEEE Journal of Solid-State Circuits, Vol.
40, No. 1, pp. 195-203, Jan. 2005.
8. Chen, Oscal T.C., and Sheen, Robin Ruey-Bin, “A Power-Efficient Wide-Range
Phase-Locked Loop,” IEEE Journal of Solid-State Circuits, Vol. 37, No. 1, pp.
51-62, Jan. 2002.
9. Cheng, Kuo-Hsing and others, “A Dual-Slope Phase Frequency Detector and Charge
Pump Architecture to Achieve Fast Locking of Phase-Locked Loop,” IEEE
Transactions on Circuits and Systems, II Analog and Digital Signal Processing,
Vol.50, No. 11, Nov. 2003.
10. Hung, Chih-Ming and O, Kenneth K., “A Fully Integrated 1.5-V 5.5-GHz CMOS
Phase-Locked Loop,” IEEE Journal of Solid State Circuits, Vol. 37, No.4, Nov.
2002.
11. Chung, Daehyun, Ryu and others, “Chip-Package Hybrid Clock Distribution
Network and DLL for Low Jitter Clock Delivery,” IEEE Journal of Solid-State
Circuits, Vol. 41, No. 1, pp. 274-286, Jan. 2006.
12. Dalt, Nicola Da and Sandner, Christoph, “A Subpicosecond Jitter PLL for Clock
Generation in 0.12- μm,” IEEE Journal of Solid-State Circuits, Vol. 38, No. 7 pp.
1275-1278, July 2003.

13. Dalt, Nicola Da and others, “A Compact Triple-Band Low-Jitter Digital LC PLL
With Programmable Coil in 130-nm CMOS,” IEEE Journal of Solid-State
Circuits, Vol. 40, No. 7 pp.1482-1490, July 2005.
14. Driankov, Dimiter and others, An Introduction to Fuzzy Control, Second Edition.
New York: Springer-Verlag, 1995.
15. Elboim, Yaron and others, “A Clock-Tuning Circuit For System-On-Chip,” IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 11, No. 4, pp.
616-626, Aug. 2003.
16. Fahim, Amr M. and Elmasry, Mohamed I., “A Fast Lock Digital Phase-Locked-Loop
Architecture for Wireless Applications,” IEEE Transactions on Circuits and
Systems, –II Analog and Digital Signal Processing, Vol. 50, No. 2, Feb. 2003.
17. Fouzar, Y. and others, “A New Controlled Gain Phase-Locked Loop Technique,”
The 2001 IEEE International Symposium on Circuits and Systems 2001, Vol. 4,
pp. 810-813, 6-9 May 2001.
18. Fouzar, Y. and others, “Very Short Locking Time PLL Based On Controlled Gain
Technique,” Electronics, Circuits and Systems, 2000. ICECS 2000, Vol.1, pp.
252-255, 17-20 Dec. 2000.
19. Gutnik, Vadim and Chandrakasan, Anantha P., “Active GHz Clock Network Using
Distributed PLLs,” IEEE Journal of Solid-State Circuits, Vol. 35, No. 11, pp.
1553-1560, Nov. 2000.
20. Han, Sung-Rung and others, “A Time-Constant Calibrated Phase-Locked Loop With
a Fast-Locked Time,” IEEE Transactions on Circuits and Systems – II: Express
Briefs, Vol. 54, No. 1, pp. 34-37, Jan. 2007.
21. Herzel, F. and others, “An Integrated CMOS PLL For Low-Jitter Applications,”
IEEE Transactions on Circuits & System – II, Vol. 49, No. 6, pp. 427-429, June
2002.
22. Heydari, Payam, “Analysis of the Phase Lock Loop Jitter Due To Power/Ground and
Substrate Noise.” IEEE Transaction on Solid-State Circuits, Vol. 51, No. 12, Dec.
2004.
23. Ingino, Joseph M. and von Kaenel, Vincent R., “A 4-GHz Clock System for a High-Performance
System-On-A-Chip Design,” IEEE Journal of Solid-State Circuits,
Vol. 36, No. 11, pp. 1693-1698, Nov. 2001.
24. Jain Ph.D., L.C., Editor, Microelectronic Design of Fuzzy Logic-Based Systems. New
York: CRC Press.
25. Kandel, Abraham and Langholz, Gideon, Fuzzy Hardware Architectures and
Applications. Norwell: Kluwer Academic Publishers, 1998.

26. Kang, Sung-Mo, Leblebici, Yusuf, CMOS Digital Integrated Circuits: Analysis and
Design, Second Edition. Boston: McGraw-Hill, 1999.
27. Kim, Byung-Guk and Kim, Lee-Sup, “A 250-MHz-2-GHz Wide-Range Delay-Locked
Loop,” IEEE Journal of Solid-State Circuits, Vol. 40, No. 6, pp. 1310-1321, June
2005.
28. Kroupa, Venceslav F., Phase Locked Loops and Frequency Synthesis. West Sussex:
John Wiley & Sons Ltd, 2003.
29. Kuo Benjamin C., Automatic Control System. New Jersey: Prentice Hall 1995.
30. Kurd, Nasser A. and others, “A Multigigahertz Clocking Scheme for the Pentium® 4
Microprocessor,” IEEE Journal of Solid-State Circuits, Vol. 36, No. 11, pp.
1647-1653, Nov. 2001.
31. Lee Tai-Cheng and Razavi, Behzad, “A Stabilization Technique for Phase-Locked
Frequency Synthesizers,” IEEE Journal of Solid-State Circuits, Vol. 38, pp. 894,
2003.
32. Lee Thomas H., The Design of CMOS Radio-Frequency Integrated Circuits. New
York: Cambridge University Press 1998.
33. Lee, David C., “Analysis of Jitter in Phase-Locked Loops.” IEEE Transactions on
Circuits and Systems – II: Analog and Digital Signal Processing, Vol. 49, No. 11,
Nov. 2002.
34. Lee, Joonsuk and Kim, Beomsup, “A Low-Noise Fast-Lock Phase-Locked Loop
With Adaptive Bandwidth Control,” IEEE Journal of Solid-State Circuits, Vol.
35, No. 8, pp. 1137-1145, Aug. 2000.
35. Mansuri, Mozhgan, and Yang, Chih-Kong Ken, “A Low-Power Adaptive Bandwidth
PLL and Clock Buffer With Supply-Noise Compensation,” IEEE Journal of
Solid-State Circuits, Vol. 38, No. 11, pp. 1804-1812, Nov. 2003.
36. Mansuri, Mozhgan, and others, “Methodology for On-Chip Adaptive Jitter
Minimization in Phase-Locked Loops,” IEEE Transactions on Circuits & Systems
– II, Vol. 50, No.11 pp. 870-878, Nov. 2003.
37. Massoud, Yehia and others, “Managing On-Chip Inductive Effects,” IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 10, No. 6, pp.
789-798, Dec. 2002.
38. Matthys, Robert J., Crystal Oscillator Circuits, New York: John Wiley & Sons, Inc
1983.


39. Meghelli, Mounir and others, “A 0.18-μm SiGe BiCMOS Receiver and Transmitter
Chipset for SONET OC-768 Transmission Systems,” IEEE Journal of Solid State
Circuits, Vol. 38, No.12, Dec. 2003.
40. Mansuri, Mozhgan and Yang, Ken Chih-Kong, “Jitter Optimization Based on Phase-Locked
Loop Design Parameters,” IEEE Journal of Solid-State Circuits, Vol. 37,
No. 11, pp. 1375-1382, Nov. 2002.
41. Moore, Gordon E., “Cramming More Components Onto Integrated Circuits”
Electronics, Vol. 38 No. 8 April 19, 1965
42. Mule, Anthony V. and others, “Electrical and Optical Clock Distribution Networks
for Gigascale Microprocessors,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, Vol. 10, pp. 582-594, Oct. 2002.
43. Nguyen, Hung T. and Walker, Elbert, A First Course in Fuzzy Logic, Second
Edition. Boca Raton: Chapman & Hall/CRC Press, 2000.
44. O'Mahony, Frank P., 10GHz Global Clock Distribution Using Coupled Standing-Wave
Oscillators, Ph.D. Dissertation, Stanford University, Aug. 2003.
45. O'Mahony, Frank P. and others, “A 10-GHz Global Clock Distribution Using
Coupled Standing-Wave Oscillators,” IEEE Journal of Solid State Circuits, Vol.
38, No. 11, Nov. 2003.
46. Olsson, Thomas and Nilsson, Peter, “A Digitally Controlled PLL for SoC
Applications,” IEEE Journal of Solid-State Circuits, Vol. 39, No. 5, pp. 751-760,
May 2004.
47. Park, Chan-Hong and Kim, Beomsup, “A Low-Noise, 900-MHz VCO in 0.6-μm
CMOS,” IEEE Journal of Solid State Circuits, Vol. 34, No. 5, pp. 586-590, May
1999.
48. Passino, Kevin M. and Yurkovich, Stephen, Fuzzy Control. Menlo Park: Addison-Wesley
Longman, Inc., 1998.
49. Patyra, M.J. and Mlynek, D.M., Fuzzy Logic Implementation and Applications. West
Sussex: John Wiley & Sons Ltd., 1996.
50. Rabaey, Jan M. Digital Integrated Circuits A Design Perspective. New Jersey:
Prentice Hall 1996.
51. Rabaey, Jan M. and Pedram, Massoud, Low-Power Design Methodologies.
Massachusetts: Kluwer Academic Publishers.
52. Renaud, Mathieu and Savaria, Yvon, “Linear Phase Detector for Arbitrary Clock
Signals,” IEEE Circuits and Systems, 2002. ISCAS 2002, Vol. 4, pp. 886-871, 26-29
May 2002.

53. Restle, Phillip J. and others, “A Clock Distribution Network for Microprocessors,”
IEEE Journal of Solid-State Circuits, Vol. 36, No. 5, pp. 792-799, May 2001.
54. Reznik, Leonid, Fuzzy Controllers. Oxford: Butterworth-Heinemann Linacre House,
1997.
55. Roberts, Neil, Phase Noise and Jitter – A primer for Digital Designers Zarlink
Semiconductor, http://assets.zarlink.com/CA/Phase_Noise_and_Jitter_Article.pdf , Dec.
20, 2007.

56. Ryu, Heung-Gyoon and Lee, Hyun-Seok, “Analysis of Switching Characteristics of
the Digital Hybrid PLL Frequency Synthesizer,” Vol. 52, No. 4, pp. 1044-1048,
July 2003.
57. Ryu, Heung-Gyoon and others, “A New Triple-Controlled Type Frequency
Synthesizer Using Simplified DDFS-Driven Digital Hybrid PLL System,” IEEE
Transactions on Consumer Electronics, Vol. 48, pp. 63-71, 2002.
58. Rusu, S. Tam and others, “A 65-nm Dual-Core Multithreaded Xeon® Processor
With 16-MB L3 Cache,” IEEE Journal of Solid-State Circuits, Vol. 42, No. 1,
pp. 17-25, 2007.
59. Shahruz, Shahram M., “A Phase-Locked Loop,” Review of Scientific Instruments,
Vol. 72, No. 3, pp. 1888-1892, March 2001.
60. Shahruz, Shahram M., “Low-Noise and Fast-Locking Phase-Locked Loop,” Review
of Scientific Instruments, Vol. 73, No. 12, pp. 4347-4353, Dec. 2002.
61. Sidiropoulos, Stefanos and Horowitz, Mark A., “A Semidigital Dual Delay-Locked
Loop,” IEEE Journal of Solid-State Circuits, Vol. 32, No. 11, Nov. 1997.
62. Staszewski, Robert Bogdan and Balsara, Poras T., “All-Digital PLL With Ultra Fast
Settling,” IEEE Transactions on Circuits & Systems – II, Vol. 54, No. 2, pp. 181-185, Feb. 2007.
63. Swaminathan, Ashok and others, “A Wide-Bandwidth 2.4 GHz ISM Band
Fractional-N PLL With Adaptive Phase Noise Cancellation,” IEEE Journal of
Solid-State Circuits, Vol. 42, No. 12, pp. 1369-2650, Dec. 2007.
64. Tam, Simon and others, “Clock Generation and Distribution for the 130-nm
Itanium® 2 Processor With 6-MB On-Die L3 Cache,” IEEE Journal of Solid-State Circuits, Vol. 39, No. 4, pp. 636-642, April 2004.
65. Tam, Simon and others, “Clock Generation and Distribution for the First IA-64
Microprocessor,” IEEE Journal of Solid-State Circuits, Vol. 35, No. 11, pp. 1545-1552, Nov. 2000.


66. Tang, Yiwu and others, “A New Fast-Settling Gearshift Adaptive PLL to Extend
Loop Bandwidth Enhancement in Frequency Synthesizers,” IEEE International
Symposium On Circuits and Systems, pp. 787-790, May 2002.
67. Teodorescu, Horia-Nicolai and others, Hardware Implementation of Intelligent
Systems. Heidelberg: Physica-Verlag, 2001.
68. Toyama, M., Dosho, S. and Yanagisawa, N., “A Design of a Compact 2 GHz-PLL
With a New Adaptive Active Loop Filter Circuit,” 17th Symposium on VLSI
Circuits, Tokyo, Japan, pp. 185-188, June 2003.
69. Vaucher, Cicero S., “An Adaptive PLL Tuning System Architecture Combining
High Spectral Purity and Fast Settling Time,” IEEE Journal of Solid-State
Circuits, Vol. 35, No. 4, pp. 490-502, April 2000.
70. Wakerly, John F., Digital Design: Principles & Practices, Second Edition. New
Jersey: Prentice Hall, 1994.
71. Weigandt, Todd Charles, Low-Phase-Noise, Low-Timing-Jitter Design Techniques
for Delay Cell Based VCO’s and Frequency Synthesizers, Ph.D. Dissertation,
University of California, Berkeley, Spring 1998.
72. Wood, John, Edwards and others, “Rotary Traveling-Wave Oscillator Arrays: A New
Clock Technology,” IEEE Journal of Solid-State Circuits, Vol. 36, No. 11, Nov.
2001.
73. Xiu, Liming, “A Novel All-Digital PLL With Software Adaptive Filter,” IEEE
Journal of Solid-State Circuits, Vol. 39, No. 3, pp. 476-483, Mar. 2004.
74. Zhang, Benyong and others, “A Fast Switching PLL Frequency Synthesizer With an
On-Chip Passive Discrete-Time Loop Filter in 0.25-μm CMOS,” IEEE Journal of
Solid-State Circuits, Vol. 38, No. 6, pp. 835-865, June 2003.


