Fast offset compensation for a 10Gbps limit amplifier by Crain, Ethan A. (Ethan Alan), 1972-
Fast Offset Compensation for a 10Gbps Limit
Amplifier
by
Ethan A. Crain
Bachelor of Science in Electrical Engineering and Computer Science,
Massachusetts Institute of Technology, December 1995
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Masters of Engineering in Electrical Engineering and Computer
Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
May 2004
© Massachusetts Institute of Technology 2004. All rights reserved.
Author
Department of Electrical Engineering and Computer Science
May 15, 2004
Certified by
Michael H. Perrott
Assistant Professor
Thesis Supervisor
Accepted by
Arthur C. Smith
Chairman, Department Committee on Graduate Students
Fast Offset Compensation for a 10Gbps Limit Amplifier
by
Ethan A. Crain
Submitted to the Department of Electrical Engineering and Computer Science
on May 15, 2004, in partial fulfillment of the
requirements for the degree of
Masters of Engineering in Electrical Engineering and Computer Science
Abstract
A novel offset voltage compensation method is presented that significantly modifies
the existing tradeoff between control loop bandwidth, and therefore total compensa-
tion time, and total output jitter. The proposed system achieves comparable output
jitter performance to traditional approaches while significantly reducing the total
compensation time by nearly three orders of magnitude.
Traditional offset compensation methods are based on simple offset measurement
techniques that generally rely on passive compensation blocks and exhibit a direct
inverse relationship between total compensation time and resulting output jitter.
Therefore, current high-speed data-link systems suffer from extremely long offset
compensation loop settling times in order to satisfy the strict protocol jitter specifi-
cations. In the proposed system, the new CMOS peak detector design is the enabling
component that allows us to break this relationship and achieve extremely fast settling
behavior while preventing data dependence of the control signal.
Simulated results show that the implemented system can achieve output jitter per-
formance similar to existing methods while dramatically improving the compensation
time. Specifically, the proposed system can achieve less than 2ps of peak-to-peak
jitter, or less than 700fs of RMS jitter, while reducing the total compensation time
from roughly 500µs to less than 1µs. The system was implemented in National
Semiconductor's CMOS9 0.18µm CMOS process. Packaged parts will be tested to verify
agreement with simulated performance.
Thesis Supervisor: Michael H. Perrott
Title: Assistant Professor
"It is not the critic who counts: not the man who
points out how the strong man stumbles or where
the doer of deeds could have done better. The
credit belongs to the man who is actually in the
arena, whose face is marred by dust and sweat
and blood, who strives valiantly, who errs and
comes up short again and again, because there is
no effort without error or shortcoming, but who
knows the great enthusiasms, the great devotions,
who spends himself for a worthy cause; who, at
the best, knows, in the end, the triumph of high
achievement, and who, at the worst, if he fails, at
least he fails while daring greatly, so that his place
shall never be with those cold and timid souls who
knew neither victory nor defeat."
Theodore Roosevelt
University of Paris, Sorbonne
April 23, 1910
Acknowledgments
It was a significant decision to leave the work force, uproot my entire family from
their home and friends and return to academia after an eight year hiatus. The path
has not been without its trials and I was tempted to give in on more than one
occasion. That is why I feel extremely lucky to have an incredible group of friends,
family, lab partners and mentors that helped me along the way. I owe a debt of
gratitude to several people who made it possible for me to complete my Masters of
Engineering thesis and dare to continue with my PhD.
First, I would like to thank my advisor, Michael Perrott, for taking a chance and
believing in me. I have learned a tremendous amount in the last two years in both
my work with you and in taking and TAing 6.976. I look forward to a white-knuckle
PhD experience over the next couple of years.
I would also like to thank all of my lab partners, Charlotte Lau, Belal Helal,
Shawn Kuo, Matt Park and Min Park, for keeping me sane and tolerating me for the
last two years. I would especially like to thank Scott Meninger whose undying drive
motivated me to keep going at my lowest points. In retrospect, the countless hours
we spent slaving away on layout and debugging CAD tools were kind of fun in a sick
and twisted way. I owe you a brew at the Thirsty after you tape out.
I would like to thank National Semiconductor for graciously fabricating my chip
on their 0.18µm CMOS9 process. I would not have been able to tape out without
the help of Sangamesh Buddhiraju and Matthew Courcy who coordinated getting my
chip onto the shuttle on time and ungrudgingly answered all of my questions.
Most importantly, I would like to thank Michelle, my wife, for daring to believe in
me. Without your support this thesis would not have been possible. I love you and,
in the words of a man much wiser than I, I owe you big time. My two sons, Jacob
and Samuel, have been extremely patient and understanding. Some day I hope that
you understand why I made the decision to come back to school and forgive me for
not being around as much as you would like. I owe you quite a few play dates at
the park.
I would like to thank my parents, Stephen and Pauline, for pointing me in the
right direction at an early age. I hope that we get to spend some time visiting my
brothers Brad, Geoffrey and Justin, my sister Michelle and their families when I get
a little down time this summer.
Finally, I would like to thank Fairchild Semiconductor for their generous financial
support that they provided in my first year of graduate school.
Contents
1 Introduction
  1.1 Background
  1.2 Motivation
    1.2.1 Review of Offset Voltage
    1.2.2 Impact of Offset Voltage on Amplifiers
  1.3 Prior Offset Compensation Approaches
    1.3.1 Sampled Offset Compensation
    1.3.2 Low-Pass Filter Compensation
    1.3.3 Other Approaches
  1.4 Proposed Approach and Contribution
  1.5 Thesis Organization
2 Proposed Approach
  2.1 Measuring Offset Voltage
    2.1.1 Extracting the Offset Voltage with Min/Max Detectors
    2.1.2 Issues with Sensing Offset with Min/Max Detectors
    2.1.3 Extracting Offset with Simple Max Detectors
    2.1.4 Issues with Sensing Output Referred Offset with Max Detectors
    2.1.5 Final Peak Detector Design
  2.2 Summary
3 System Modeling
  3.1 System Level Implementation
  3.2 Linear System Modeling
    3.2.1 Modeling System Response with PLL Design Assistant
    3.2.2 Modeling the Impact of System Parameter Variation on Stability and Compensation Time
  3.3 Summary
4 Numerical Design of High Speed Differential Amplifiers
  4.1 Methodology
  4.2 Proposed Approach
    4.2.1 Derivation of Gain/Swing Constraint Formulation
    4.2.2 Derivation of Gain-Bandwidth Tradeoff
  4.3 Intuitive Insights from Method
  4.4 Results
  4.5 Application to SCL Digital Circuits
  4.6 Summary
5 Circuit Design of Systems Blocks
  5.1 High Speed Limit Amplifier
    5.1.1 Determining Optimal Number of Stages
    5.1.2 Bandwidth Extension Techniques
    5.1.3 Final Amplifier Design
  5.2 Peak Detector
  5.3 Integrator
  5.4 Output Buffer
  5.5 Comparator and Logic
  5.6 ESD
  5.7 Summary
6 Results
  6.1 CppSim Modeling and Simulation Results
    6.1.1 Limit Amplifier
    6.1.2 Peak Detector
    6.1.3 Integrator
    6.1.4 Control Logic
  6.2 CppSim Simulation Results
  6.3 Hspice Simulation Results
  6.4 Summary
7 Layout
  7.1 Peak Detector
  7.2 Integrator
  7.3 High Speed Limit Amplifier
  7.4 Output Buffer
  7.5 Top Level
  7.6 Summary
8 Conclusions and Future Work
  8.1 Contributions
  8.2 Future Work
A Derivation of Input Referred Offset Voltage
    A.0.1 Square-Law Operation
    A.0.2 Velocity Saturation
B Circuit Design Details
  B.1 ESD Design
C CppSim Code
  C.1 Limit Amplifier Code
  C.2 Peak Detector Code
  C.3 Integrator Code
D Optimal Gain/Stage for Maximum Bandwidth
    D.0.1 Determining Optimal Number of Stages
E Matlab Amplifier Script
  E.1 Script for Fixed Bandwidth
  E.2 Script for Fixed Power Dissipation
List of Figures
1-1 Block Diagram of High-Speed Data Link System
1-2 High-Speed, Multi-Stage Limit Amplifier
1-3 Implementation of Each Stage in Limit Amplifier
1-4 LPF to Extract Output Referred Offset in High-Speed Data Link Systems
2-1 Measuring Output Referred Offset Voltage Using Minimum and Maximum Detectors
2-2 Traditional Implementations for Minimum and Maximum Detectors
2-3 Influence of Symbol Period on Droop of Simple Peak Detector
2-4 Measuring Output Referred Offset Voltage Using Maximum Detectors Only
2-5 Schematic of Typical CMOS Maximum Detector
2-6 Schematic of Proposed Peak Detector with Reduced Droop
2-7 Comparison of Influence of Symbol Period on Droop of Simple Peak Detector vs Proposed Peak Detector Design
3-1 System Level of Limit Amplifier with Offset Compensation
3-2 Typical Transfer Function for Limit Amplifier Cell
3-3 Complete System Showing Multiple Control Loops and Logic
3-4 Linear Model of Limit Amplifier with Offset Compensation
3-5 Bode Plot Showing Stability Degradation with Increasing Gain
3-6 Root Locus Plot of G(s) Showing Necessary Condition for Stability
3-7 PLL Design Assistant Graphical Interface
3-8 Step Response of System Designed with PLL Design Assistant
3-9 PLL Design Assistant Graphical Interface
3-10 Impact of ±20% Variation in Loop Gain and Dominant Pole Location on the Step Response of the System
4-1 Differential amplifier used in calculations
4-2 Small signal model for amplifier
4-3 Calculated Gain-Bandwidth product vs Iden
4-4 Current density settings versus gain/swing
4-5 Digital high speed circuits
5-1 High-Speed, Multi-Stage Limit Amplifier
5-2 Number of Stages vs Total Bandwidth: Normalized Total Bandwidth vs Number of Stages for G^(1/n) = A = 1.65, 2.0 and 3.0 and Total Gains, G, of 10, 100 and 1000
5-3 Total Power Dissipation of Limit Amplifier for a Total Gain of 100 and Bandwidths/Stage from 2-10GHz
5-4 (a) Full Resistive-Loaded Differential Amplifier (b) Half-Circuit with Noise Sources Added (c) Half-Circuit with Noise Source Referred to Input
5-5 Total Input Referred Voltage Noise Versus Number of Amplifier Stages for a Fixed Total Gain
5-6 Minimum Input Voltage Versus Number of Amplifier Stages for a Fixed Total Gain
5-7 Limit Amplifier Stage with Neutralization Capacitors
5-8 Eye Diagrams for the Limit Amplifier at Data Rates of 5Gbps and 10Gbps and input amplitudes of 2mV, 20mV and 200mV peak-to-peak
5-9 Simplified Schematic of Fully Differential Peak Detector
5-10 Basic Differential RC Integrator
5-11 Simplified Schematic of Differential gmC Integrator
5-12 Bode Plot of Modified Open-Loop Parameter A(s)
5-13 Schematic showing how integrator array is configured
5-14 Bode Plot of integrator demonstrating how transfer function varies with n, the number of parallel integrator cells
5-15 (a) CMFB with Resistive Output Common-Mode Level Sensing, (b) CMFB Using Differential Amplifier to Sense Output Common-Mode Level
5-16 Simplified Schematic of Integrator Showing Biasing and CMFB
5-17 Differential Package Model Showing the Bond Pad and Package Capacitance and the Bond Wire Inductance
5-18 Final Output Buffer Design
5-19 Eye Diagram at Output of Output Buffer (a) 5Gbps, (b) 10Gbps
5-20 Typical Implementation of Clocked Comparator
5-21 Implementation of Comparator in Windowing Block
6-1 3rd Order Polynomial Fit to Limit Amplifier Transfer Function Measured in Hspice
6-2 Control voltage of offset compensation loop during compensation from CppSim: (A) 1MHz bandwidth, (B) 5MHz bandwidth, (C) 10MHz bandwidth
6-3 Eye diagram of limit amplifier output after compensation from CppSim
6-4 Control voltage of offset compensation loop during compensation from Hspice: (A) 1MHz bandwidth, (B) 5MHz bandwidth, (C) 10MHz bandwidth
6-5 Eye diagram of limit amplifier output after compensation from Hspice
7-1 Detail of Peak Detector Cell Showing Common-Centroid Layout and Dummy Devices
7-2 Layout of Base Integrator Cell
7-3 Layout of Integrator Array
7-4 Layout of Limit Amplifier Stage
7-5 Layout of Limit Amplifier Top Level
7-6 Layout of Output Buffer Stage
7-7 Layout of Output Buffer Top Level
7-8 Layout of Chip Top Level
A-1 Implementation of Each Stage in Limit Amplifier
B-1 Simplified schematic of ESD circuitry used on all pads
B-2 Layouts for the two pads with ESD structures
D-1 High-Speed, Multi-Stage Limit Amplifier
List of Tables
1.1 Typical vs Goal Performance Specifications for Offset Compensation
4.1 Calculated vs simulated amplifier performance
5.1 Simulated Hysteresis vs Process and Temperature Corner
Chapter 1
Introduction
In today's information age there is an ever increasing demand for products that deliver
higher performance, lower power dissipation and smaller form factor than existing
designs. New products that improve in these three areas will enable the continued
exponential growth of the worldwide communication infrastructure. Ultimately, a
combination of novel architectural and circuit techniques needs to be developed to
achieve this end.
Offset compensation is important for aggressive high speed design to achieve high
input sensitivity and low DC offset. In traditional offset compensation implementations
the data dependence of the control signal is proportional to the compensation
loop bandwidth. As a result of this limitation, current approaches suffer from long
compensation times and require expensive off-chip components to minimize data-dependent
jitter and to meet protocol (e.g., SONET/SDH) jitter specifications. Manipulation
of fundamental characteristics of the differential architecture allows us to
modify this relationship in the proposed approach. The goal of this thesis is to develop
a broadband limit amplifier with dynamic, fully integrated, continuous-time
DC offset cancellation that achieves sub-1µs compensation times while providing a
low jitter, constant amplitude output. The limit amplifier will be used as the main
amplifier in an optical network receiver. The chip has been implemented in National
Semiconductor's CMOS9 0.18µm CMOS process.
Additionally, a simple numerical procedure is introduced that enables straightfor-
ward design of high speed, resistor loaded, differential amplifiers in modern CMOS
processes. The design procedure is beneficial because the device characteristics of
modern CMOS processes dramatically depart from traditional square law character-
istics. The analytical form of the procedure allows for an intuitive perspective of
the varying gain-bandwidth product for such amplifiers. Calculations based on the
method are compared to Hspice simulated results based on National Semiconductor's
0.18µm CMOS process. Application of the design methodology to the design of
high speed, source-coupled logic (SCL) gates and latches is also discussed.
1.1 Background
One well established industry standard for broadband optical fiber networks known
as SONET, or Synchronous Optical NETwork, is defined for various data speeds.
Operating at a data rate of 10Gbps, the OC-192 SONET standard allows for very high
data transfer rates on optical fiber cable over long distances. One obstacle to achieving
the desired performance is DC offset in the data link. The offset can be introduced by
the transmitter, the transmit path or the circuit components in the front-end receiver.
The desire to reduce the cost structure and improve the efficiency of these networks
will increase the demand for quick power-up and switching between multiple incoming
links. This demand will drive the need for fast DC offset compensation.
Figure 1-1 shows a block diagram of a modern optical-link system. At the
transmit end of the fiber cable, a laser driver feeds a synchronous, Non-Return to Zero
(NRZ) data stream into the cable, which can range in length from tens to thousands of
kilometers. Sending distinct clock and data signals would require separate, dedicated
cables and would be prohibitively expensive. Therefore, a single serial data stream
is transmitted with the clock encoded in the data transitions. At the receiver end of
the cable an avalanche photo-diode drives a Transimpedance Amplifier (TIA) that
translates the current signal to a voltage signal. The output of the TIA then feeds the
main amplifier, which is the focus of this thesis. The main amplifier must amplify this
small voltage signal to a large enough level so that the clock and data recovery (CDR)
circuitry can operate reliably. The purpose of the CDR is to extract the clock signal
from the data stream and re-time the data to the new, synchronized clock signal.
[Figure: block diagram. Transmitter: MUX and Tx Clock. Transmit Path: optical fiber. Receiver: PD, TIA, LA, CDR and Demux, with recovered Clock and Data.]
Figure 1-1: Block Diagram of High-Speed Data Link System
The overall design goal of the main amplifier is to increase the signal strength with-
out degrading the SNR achieved by the system front-end. Additionally, the amplifier
must provide a low jitter, constant amplitude input to the CDR. There are two main
architectures that can be used to implement the amplifier, namely an Automatic Gain
Control (AGC) amplifier or a Limit Amplifier (LA). As the name suggests, the AGC
amplifier dynamically adjusts its gain depending on the input amplitude to achieve
a constant amplitude output signal. In contrast, the limit amplifier achieves the
same effect by forcing the output to saturate at a known amplitude for input signals
greater than some predetermined minimum amplitude. The limit amplifier design
was selected for this work because it is more amenable to high-speed operation.
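The limiting behavior described above can be sketched with an idealized model: linear gain for small inputs and a fixed output level once the input exceeds a minimum amplitude. The function name, the gain of 100 and the 0.25V limit are assumptions chosen only for illustration, not values from this design.

```python
def limit_amplifier(v_in, gain, v_limit):
    """Ideal limiter: linear gain until the output clips at +/- v_limit (volts)."""
    return max(-v_limit, min(v_limit, gain * v_in))


GAIN = 100.0     # hypothetical total voltage gain (40dB)
V_LIMIT = 0.25   # hypothetical saturated output amplitude in volts

# Inputs above V_LIMIT / GAIN all produce the same output amplitude.
v_min = V_LIMIT / GAIN
for v_in in (0.001, 0.01, 0.1):
    v_out = limit_amplifier(v_in, GAIN, V_LIMIT)
    region = "limited" if v_in > v_min else "linear"
    print(f"v_in = {v_in * 1e3:.0f} mV -> v_out = {v_out:.2f} V ({region})")
```

An AGC amplifier would instead adjust GAIN through a feedback loop to hold the output amplitude constant, which is the added complexity the limit amplifier avoids.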
The key advantages of the limiting amplifier over the AGC are higher operating
speed and lower implementation complexity. AGCs will exhibit inferior high-frequency
characteristics compared to limit amplifiers because of the increased capacitive
loading of the sense and feedback circuitry. Also, the added feedback network,
for gain control, increases the design complexity. Therefore, the limiting amplifier
topology was used in this thesis. A limit amplifier can be implemented as a cascade
of resistively loaded differential pair amplifiers, as shown in Figure 1-2. Each of the
amplifier blocks is implemented as a differential pair, as shown in Figure 1-3.
[Figure: cascade of gain stages, Vin -> Av(1) -> Av(2) -> ... -> Av(n-1) -> Av(n) -> Vout.]
Figure 1-2: High-Speed, Multi-Stage Limit Amplifier
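[Figure: resistively loaded differential pair with load resistors R, input devices of width W driven by Vin+ and Vin-, outputs Vout+ and Vout-, and tail current source Ibias.]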
Figure 1-3: Implementation of Each Stage in Limit Amplifier
Cost is the fundamental driving force behind process selection for modern day
integrated circuits. CMOS has become the process of choice compared to more specialized
processes like SiGe and the III-V compounds GaAs and InP due to the immense inertia behind
process development in the PC market. However, this economic advantage does not
come without added design challenges. CMOS presents its own unique problems for
high speed design due to its lower ft and higher 1/f noise corner compared to the
aforementioned processes. Despite these challenges, CMOS is the preferred technology due
to its low cost, high integration density and fast paced technology development driven
by Moore's Law.
1.2 Motivation
1.2.1 Review of Offset Voltage
As mentioned earlier, the data-link network suffers from two main sources of DC
offset. The first source of offset is the received signal. The circuit components of the
receiver are the other source of offset. The following section will examine the origin
of these two sources of offset and the impact of the offset on the system.
The offset introduced by the incoming signal originates at the transmitter and
is due to the power of the incoming signal. When the receiver initially powers up,
the received signal power may introduce a DC offset in the receiver. Additionally,
the magnitude of the offset can change when switching from one transmit path to
another. Different power levels between the two transmit paths, caused by different
output powers of the two transmitters or different losses in the transmit paths, will
result in a change in the DC voltage offset of the receiver.
The main amplifier itself may also introduce a finite DC offset component due to
mismatches in the differential paths. To understand the origin of the offset voltage,
consider the resistively loaded differential pair shown in Figure 1-3. Assuming $V_{in} = 0$
and perfect symmetry, $V_{out} = 0$. However, this assumption is violated in practice due
to device mismatches in transistor physical dimensions, threshold voltages and resistor
values so that $V_{out} \neq 0$. The output referred offset voltage is defined as the voltage
that exists at the output with $V_{in} = 0$. By convention, this voltage is referred to the
input and the offset voltage is therefore the voltage that must be applied to the input
to force $V_{out} = 0$. The input referred offset is related to the output referred offset
voltage by $|V_{os,i}| = |V_{os,o}| / |A_v|$, where $A_v$ is the gain of the amplifier.
It is beneficial to develop an expression for the offset voltage in terms of the circuit
parameters to determine how we, as the designer, can minimize it. Assuming that
the input devices operate in velocity saturation, the final expression for the input
referred offset voltage is:
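$$V_{os,i}^2 = (V_{GS} - V_{TH})^2 \left[ \left(\frac{\Delta W}{W}\right)^2 + \left(\frac{\Delta R}{R}\right)^2 \right] + V_{TH}^2 \left(\frac{\Delta V_{TH}}{V_{TH}}\right)^2 \qquad (1.1)$$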
where $(\Delta W/W)$, $(\Delta R/R)$ and $(\Delta V_{TH}/V_{TH})$ are the normalized variations in the
transistor width, load resistance and transistor threshold voltage, respectively, of the
amplifier. The reader is invited to refer to Appendix A for the full development. The
resulting equation is similar to the result found in [1], where square-law operation
was assumed. The exceptions are a scaling in the magnitude of the over-drive term
and the absence of any direct dependence on device length, L. The offset is indirectly
dependent on L through the $V_{TH}^2$ term.
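As a rough numerical illustration, the sketch below evaluates the input referred offset assuming Equation 1.1 takes the form described in the text: width and load resistance mismatches scaled by the over-drive, plus the threshold mismatch referred directly to the input. The function name and all numerical values are hypothetical and are not taken from this design.

```python
import math


def input_referred_offset(v_ov, dw_over_w, dr_over_r, dvth):
    """Input referred offset (V) for velocity-saturated input devices.

    v_ov       -- over-drive voltage VGS - VTH, in volts
    dw_over_w  -- normalized width mismatch, dW/W
    dr_over_r  -- normalized load resistance mismatch, dR/R
    dvth       -- threshold mismatch dVTH = VTH * (dVTH/VTH), in volts
    """
    # Width and resistance mismatches are scaled by the over-drive voltage.
    scaled = v_ov ** 2 * (dw_over_w ** 2 + dr_over_r ** 2)
    # The threshold mismatch appears at the input without scaling.
    return math.sqrt(scaled + dvth ** 2)


# Hypothetical values chosen only for illustration.
v_os = input_referred_offset(v_ov=0.3, dw_over_w=0.01, dr_over_r=0.01, dvth=0.003)
print(f"input referred offset = {v_os * 1e3:.2f} mV")
```

Halving the over-drive in this sketch reduces only the first term, which is why the threshold mismatch sets a floor on the achievable offset.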
By examining this result some useful insights can be obtained. First, the offset
voltage is dependent on transistor length mismatches through the dependence on
threshold mismatches. Second, this analysis shows that threshold voltage mismatches
are directly referred to the input and that mismatches in transistor width and load
resistance are scaled by the transistor over-drive. Therefore, to minimize offset voltage,
the transistor over-drives should be minimized by either reducing the bias current or
by increasing the device widths. Reducing the bias current is only appropriate in low
power (i.e. low-speed) designs. As will be shown in Chapter 4, the transistor widths
and load resistance are not free variables when designing resistively loaded, differential
amplifiers. The device dimensions and bias conditions are uniquely determined
when the gain, output swing and either bandwidth or power are specified. Additionally,
appropriate layout matching techniques such as common-centroid layout and
using dummy devices/stripes to minimize device mismatch should be incorporated
where appropriate. Unfortunately, as shown in Equation 1.1, the offset voltage of
the amplifier cannot be reduced to zero even if the utmost care is taken in the design
and layout. In the next section we will explore the impact of DC offsets on the
performance of the amplifier.
1.2.2 Impact of Offset Voltage on Amplifiers
Both the undesired input-referred offset and the desired input signal experience the
large gain of the limit amplifier, which is usually on the order of 40dB to 60dB.
Typical values of input-referred offset voltage can range from 1mV to 10mV since the
high-speed amplifier stages are designed for maximum bandwidth at the expense of
matching and offset issues. The input-referred offset can be comparable in magnitude
to the input signal levels for high-speed optical receivers. For this reason, the offset
voltage can decrease sensitivity to incoming signals or, even worse, drive the later
stages into nonlinear operation and cause the outputs to saturate. In extreme cases,
the offset can be large enough to block the desired signal. For these reasons, some
form of offset compensation is required in modern, high-speed data-link systems.
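To make the scale of the problem concrete, the sketch below multiplies the quoted range of input-referred offsets by the quoted range of gains and compares the result against a limiting amplitude. The 0.4V limiting amplitude and the function name are assumptions for illustration only.

```python
def output_offset(v_os_in, gain_db, v_sat):
    """Return the output referred offset (V), clipped to the limiting amplitude."""
    gain = 10 ** (gain_db / 20)   # convert dB to linear voltage gain
    v_out = v_os_in * gain        # the offset sees the same gain as the signal
    return min(v_out, v_sat)      # the output cannot exceed its saturated level


V_SAT = 0.4  # assumed limiting amplitude in volts (hypothetical)

for v_os_mv in (1, 10):
    for gain_db in (40, 60):
        v_out = output_offset(v_os_mv * 1e-3, gain_db, V_SAT)
        state = "saturated" if v_out >= V_SAT else "linear"
        print(f"{v_os_mv} mV offset, {gain_db} dB gain -> {v_out:.2f} V ({state})")
```

Under these assumed numbers only the smallest offset at the lowest gain leaves the output in its linear range; every other combination pins the output at the limit with no signal applied.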
1.3 Prior Offset Compensation Approaches
Several existing offset compensation techniques can be found in the literature. The
existing approaches generally fall into one of two categories: active sampled systems
or passive continuous time systems. We will explore the most significant methods
in this section. Two important characteristics of each design are the compensation
time and the data dependent output jitter. Long compensation times lead to loss
of data and decrease system efficiency. One source of jitter in the output signal
is data dependence of the offset compensation control signal. Unless the measured
offset is sufficiently filtered, the control signal, which is proportional to it, will lead to
increased output jitter. All of the systems discussed below exhibit a direct tradeoff
between compensation time and output jitter.
1.3.1 Sampled Offset Compensation
The three most common offset compensation techniques that fall into the sampled
system classification are auto-zeroing, correlated double sampling and chopper
stabilization [2]. The basic principle behind auto-zeroing and correlated double sampling
is to sample the undesired offset that exists in the system on one clock phase and to
subtract it from the desired signal on the following clock phase. By design, both of
these techniques require a clock and a sampling phase to measure the offset in the
system, fundamentally limiting the maximum input data rate to half the sampling rate.
Additionally, each of these techniques requires sampling capacitors in the data path,
which can be quite large if designed for minimal noise. On the other hand, chopper
stabilization achieves the same result by operating in the frequency domain.
Compensation is performed by modulating the desired signal to a higher frequency, where
the undesired offset and noise signals do not exist, performing the amplification on
the modulated signal and finally demodulating the amplified signal back to baseband.
Chopper-stabilization methods are fundamentally limited to low-speed applications
because the residual offset, or the offset remaining after compensation, is proportional
to the sampling rate. If the sampling rate is set too high, the residual offset
will increase. Also, the forward path gain can be attenuated and the noise floor will
increase due to the aliasing of the wide-band noise into the frequency band of interest.
Ultimately, none of these techniques is amenable to high-speed, continuous-time
systems.
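The two-phase principle behind auto-zeroing can be sketched as follows. This is an idealized, unity-gain model that assumes the offset is constant over both clock phases; the function name and sample values are hypothetical.

```python
def auto_zero(inputs, offset):
    """Idealized two-phase auto-zero amplifier with unity gain.

    inputs -- signal values applied during the amplify phase (V)
    offset -- amplifier offset (V), assumed constant across both phases
    Returns (corrected outputs, number of clock phases consumed).
    """
    outputs = []
    phases = 0
    for v_in in inputs:
        # Phase 1: the inputs are shorted, so only the offset appears and is stored.
        stored = offset
        phases += 1
        # Phase 2: signal plus offset is amplified and the stored value is subtracted.
        outputs.append((v_in + offset) - stored)
        phases += 1
    return outputs, phases


signal = [0.010, -0.010, 0.010]   # hypothetical input samples in volts
corrected, phases = auto_zero(signal, offset=0.005)
print(corrected, phases)
```

Each input sample consumes two clock phases, which is the factor-of-two limit on the input data rate noted above.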
1.3.2 Low-Pass Filter Compensation
By far, the most common technique for offset compensation in continuous-time,
high-speed, broadband systems is to use a low-pass filter in a feedback configuration [3,
4, 5], as shown in Figure 1-4. The bandwidth of the filter must be set sufficiently
low to ensure stability of the overall system and to ensure that there is minimal data
dependence of the control loop.
[Schematic: amplifier from In to Out with an RC low-pass filter in the feedback path generating Vcontrol; BW = 1/(2πRC)]
Figure 1-4: LPF to Extract Output Referred Offset in High-Speed Data Link Systems
Due to the requirement for such a low loop bandwidth, this design has two sig-
nificant disadvantages. First, the very small loop bandwidth translates to a very
large time constant for the loop dynamics which results in very long compensation
times [6, 7]. Long compensation times will become a significant issue as emerging
standards such as Optical Time-Division Multiplexing (OTDM) [8] and Dense
Wavelength-Division Multiplexing (DWDM) [9] take hold in commercial applications. Second,
from the viewpoint of cost and ease of integration, the small bandwidth requirement
results in large component values that are not economically feasible to implement on
chip. Specifically, capacitor values on the order of 10's of pF are required, which would
consume considerable silicon area if implemented on chip. The result is the need for
expensive off-chip components.
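As a rough illustration of why a low loop bandwidth implies both long compensation times and large passive components, the following Python sketch treats the loop as a single pole with BW = 1/(2πRC). The bandwidth and resistance values are hypothetical and are not taken from the cited designs; the function names are likewise chosen only for this example.

```python
import math

def rc_lowpass(bandwidth_hz, resistance_ohm):
    """Return (time constant in s, capacitance in F) for a single-pole RC filter."""
    tau = 1.0 / (2.0 * math.pi * bandwidth_hz)  # BW = 1/(2*pi*R*C)  =>  R*C = tau
    capacitance = tau / resistance_ohm
    return tau, capacitance

def settling_time(tau, tolerance=0.01):
    """Time for a first-order step response to come within `tolerance` of its final value."""
    return -tau * math.log(tolerance)

# Hypothetical example values (assumptions, not from the thesis)
tau, cap = rc_lowpass(bandwidth_hz=10e3, resistance_ohm=1e6)
print(f"tau = {tau:.3e} s, C = {cap:.3e} F, 1% settling = {settling_time(tau):.3e} s")
```

Halving the bandwidth in this model doubles both the time constant and the required RC product, which is the tradeoff described above.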
1.3.3 Other Approaches
One of the earliest forms of offset compensation in the literature uses Minimum Mean-
Square Estimation (MMSE) [10, 11]. Similar to Chopper Stabilization, MMSE per-
forms the offset compensation in the frequency domain using adaptive equalizers.
The equalizers, which are slowly time-varying linear filters, will insert a null in the
transfer function at DC to compensate for DC offsets. One issue with this approach
is that the magnitude of the residual offset depends on both the number of filter
taps and the magnitude of the uncompensated offset of the system. Specifically, low
residual DC offset requires high equalizer complexity and a small input referred
offset. As CMOS processes continue to scale, the increased digital complexity required
to implement the higher order filters will become less of an issue and this approach
may become more feasible.
Another solution, implemented in a silicon bipolar process, uses peak detectors to
measure the output referred DC offset (drift) of the main amplifier [11]. The output
of the peak detector is low-pass filtered to reduce the data dependence of the control
signal. Finally, the output of the low-pass filter feeds the input stage to perform the
compensation. Similar to the low-pass filter approach, the data dependence of the
control signal is proportional to the control loop bandwidth. To minimize the output
jitter this design also suffers from very long compensation times.
An alternative solution that also uses maximum detectors in the feedback loop to
extract the output referred offset voltage was proposed by Tanabe et al. [12]. However,
this design was implemented in a CMOS process. Compensation is performed
by feeding the difference between the instantaneous maximum value of the two dif-
ferential data signals to the input stage. Assuming a 50% duty cycle between high
and low data transitions (i.e. the average value of the data is zero) and that the
bandwidth of the maximum detector is sufficiently low, this implementation works
as intended. However, this design suffers from the same limitations as previous ap-
proaches. Specifically, the data dependence of the control signal is directly related to
the bandwidth of the compensation loop. When the data has extended periods with
a non-zero mean value the control signal is data dependent. To minimize the data
dependence of the control signal, the bandwidth of the maximum detector must be
very low which leads to long compensation times.
All of the offset compensation designs considered above suffer from the same
limitation. Namely, the magnitude of the output jitter is directly coupled to the offset
compensation time. To reduce the output jitter in these designs the compensation
loop bandwidth must be very low. This restriction results in long compensation times.
The following section introduces the proposed approach which dramatically reduces
the dependence of the output jitter on compensation time.
1.4 Proposed Approach and Contribution
There are two significant obstacles to designing a fast offset compensation network in
CMOS. First, accurately measuring the offset voltage of the system is difficult. The
biggest reason for this is that CMOS transistors used to perform diode functions (i.e.
source follower) have limited high frequency capability. Additionally, mismatches
between the minimum (min) and maximum (max) detectors introduce error into the
measurement. Second, it is difficult to simultaneously generate a control signal that
is independent of the data while achieving fast settling performance. The reason
for the difficulty is that the amount of droop at the output of the peak detector is
proportional to the loop bandwidth in typical min/max detector designs.
The key contribution of this thesis is the development of a peak detector that
enables the design of a fast offset compensation loop in CMOS processes that also
meets strict jitter specifications. In traditional CMOS peak detectors the bandwidth
is proportional to the bias current when the input is high. Similarly, when the input
is low the amount of droop at the output is also proportional to Ibias. In the proposed
peak detector this restriction has been practically eliminated. The bandwidth is still
proportional to the bias current. However, the amount of droop at the output is
now proportional to an NMOS transistor off-state leakage current. Since the off-
state leakage current of modern CMOS devices is typically orders of magnitude less
than the peak detector bias current, the amount of droop is also reduced by several
orders of magnitude. This thesis focuses on the peak detector design and system
implementation details.
Output jitter and settling time performance for typical offset compensation designs
are compared to the targeted performance of the proposed approach in Table 1.1.
Although the output jitter targets are identical, the settling time goal in the proposed
solution is almost 3 orders of magnitude shorter than the typical design goals. The
peak detector allows the proposed system to meet both the jitter and the aggressive
settling time performance goals.
Specification     Typical Design     Proposed Design
Output Jitter     < 2pS p-p          < 2pS p-p
Settling Time     ~ 500µS            < 1µS
Table 1.1: Typical vs Goal Performance Specifications for Offset Compensation
1.5 Thesis Organization
This thesis is organized as follows. Chapter 2 introduces the proposed architecture to
achieve the sub-1µS offset compensation time while still satisfying the output jitter
requirement of the CDR input. Chapter 3 discusses the linear modeling of the pro-
posed system architecture and computes the system parameters required to guarantee
stability and the desired dynamic response. Chapter 4 presents a novel, closed-form
numerical methodology for designing resistively loaded, high-speed, differential ampli-
fiers that make up the limit amplifier. Circuit design issues and CppSim and Hspice
simulation details are discussed in Chapters 5 and 6, respectively. Important lay-
out issues are discussed in Chapter 7. Finally, Chapter 8 presents conclusions and
potential extensions for future work.
Chapter 2
Proposed Approach
2.1 Measuring Offset Voltage
The most significant obstacles to designing a fast offset compensation network are
accurately measuring the offset voltage of the system and generating a control signal
that is independent of the data. This chapter incrementally develops the proposed
design of the key enabling component in the offset compensation loop, the peak
detector.
2.1.1 Extracting the Offset Voltage with Min/Max Detectors
One possible solution for measuring the offset voltage of the limit amplifier is to take
the difference between the common-mode voltages of each output signal [13]. The out-
put common-mode level can be obtained by taking the average of the instantaneous
maximum and minimum output values with max and min detectors, respectively, as
shown in Figure 2-1. The max and min detectors can be either continuous time or
sampled systems. Although high performance peak detectors have been designed in
BiCMOS and Bipolar processes [14, 15, 16], it is not trivial to do so in CMOS pro-
cesses. The fundamental design challenge is the limited high-frequency performance
of CMOS transistors used to perform diode functions (i.e. source follower).
Typical maximum and minimum detector implementations are shown in Figure 2-
2. The output of the minimum detector will track its input, plus a VGS shift equal to
(VTH + VDSAT)M1. Similarly, the output of the maximum detector will track its input,
less a VGS shift equal to (VTH + VDSAT)M2. In steady-state operation, both M1 and
M2 remain on and either sink or source a current equal to Ibias, such that the charge
stored on Cmin and Cmax remains unchanged. If the output referred offset voltage
increases, ID,M1 will decrease in order to cancel charge on Cmin. Vout will increase
until steady state conditions are met. Likewise, ID,M2 will increase to add charge to
Cmax until steady state conditions are met. Conversely, if the output referred offset
voltage decreases then ID,M1 will increase and ID,M2 will decrease until steady-state
conditions are once again met.
[Plot: Vout versus time, with the offset VOS marked]
Figure 2-1: Measuring Output Referred Offset Voltage Using Minimum and Maximum Detectors
[Schematics: Min Detector built around M1 and Max Detector built around M2, each with input Vin, bias current Ibias, and a storage capacitor (Cmin, Cmax) at Vout]
Figure 2-2: Traditional Implementations for Minimum and Maximum Detectors
2.1.2 Issues with Sensing Offset with Min/Max Detectors
There are two main issues with measuring the offset voltage with the different types
of detectors, as shown in Figure 2-2. First, since we are attempting to measure the
offset with min and max detectors that are based on PMOS and NMOS transistors,
respectively, the accuracy of the measured offset is limited to the matching between
the two device types. The transistor threshold voltages, transconductance and even
physical dimensions will change with process, voltage and temperature variations
and these changes will not necessarily track in the two devices. Ultimately, these
differences will introduce offsets into the compensation loop and limit the effectiveness
of the compensation.
To understand the second issue, we need to examine the max and min detectors'
ability to track changes at their inputs. We will only consider the response of the max
detector, hereafter referred to as a peak detector, and infer the min detector operation
by extension.
For a unit change at the input, the output will follow by either adding charge to or
subtracting charge from Cmax. Consider first a step increase at the input. The output
voltage will increase by M1 sourcing current onto Cmax and the rate of change at the
output will be limited by M1's transconductance. The ratio of the transconductance
of M1 to Cmax corresponds to the bandwidth of the peak detector while the device is
on:
$$f_{3dB} = \frac{g_{m,M1}}{2\pi C_{max}} \qquad (2.1)$$
Since gm is proportional to the bias current, the bandwidth is also proportional to
the bias current. If the bandwidth of the peak detector is much higher than the data
rate then M1 will fully charge Cmax so that the output will take on the correct value
at each successive peak, thus operating as a zero order hold that samples the peaks
of the input. If the bandwidth of the peak detector is set much lower than the data
rate, the high frequency components of the input will be greatly attenuated and the
output will track the lower frequency components of the input.
Alternately, consider a step decrease in the magnitude of the input signal. The
output will slew according to the tail current source's ability to strip charge away
from Cmax and the change in the output voltage will be:
$$\Delta = \frac{I_{bias} - I_{M1}}{C_{max}}\,\delta t \qquad (2.2)$$
where δt is the data symbol period. The magnitude of the droop at the output is
proportional to the bias current and the number of symbol periods that the input is
low.
To understand why the asymmetric response of the peak detector is an issue, consider
the case when the input is driven by a constant-amplitude, Non-Return to Zero
(NRZ), pseudo-random data stream. Let us also
assume that the peak detector operates in steady-state (i.e. the offset compensation
has been performed), as shown in the top of Figure 2-3. When the input is high, the
output of the peak detector will be refreshed to its correct value. However, when the
input goes low, VGS,M1 will be reduced so ID,M1 will be either very low or zero and
Cmax will discharge according to Equation 2.2 as shown in the bottom of Figure 2-3.
If the input is low for n symbol periods, then Cmax will discharge according to:
$$\Delta V = \frac{I_{bias}}{C_{max}}\, n\,\delta t = n\Delta \qquad (2.3)$$
where n is the number of successive low bits at the input. The total droop is
defined as n · Δ. When the input goes high the output will return to the correct,
steady-state value. Therefore, the measured offset voltage is data dependent and
violates one of our design requirements for the offset compensation.
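To make the data dependence concrete, the short Python sketch below evaluates Equations 2.2 and 2.3 for a few run lengths. The bias current and capacitor value are hypothetical numbers chosen only for illustration; the 100 ps symbol period corresponds to a 10Gbps NRZ stream.

```python
def droop_per_symbol(i_bias, i_m1, c_max, symbol_period):
    """Equation 2.2: droop of the hold capacitor over one symbol period (volts)."""
    return (i_bias - i_m1) / c_max * symbol_period

def total_droop(n_low_bits, i_bias, c_max, symbol_period):
    """Equation 2.3: droop after n consecutive low bits, assuming M1 is off (I_M1 = 0)."""
    return n_low_bits * droop_per_symbol(i_bias, 0.0, c_max, symbol_period)

# Hypothetical values (assumptions): 10 uA bias, 1 pF hold capacitor, 10 Gbps data
I_BIAS = 10e-6
C_MAX = 1e-12
SYMBOL_PERIOD = 100e-12

for n in (1, 5, 20):
    droop = total_droop(n, I_BIAS, C_MAX, SYMBOL_PERIOD)
    print(f"n = {n:2d} low bits -> droop = {droop * 1e3:.3f} mV")
```

Because the droop grows linearly with the run length n, the detector output, and therefore the measured offset, varies with the data pattern.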
Figure 2-3: Influence of Symbol Period on Droop of Simple Peak Detector
2.1.3 Extracting Offset with Simple Max Detectors
The offset issue due to the mismatch between the NMOS and PMOS devices in the
max and min detectors can be solved by taking advantage of the symmetry of the limit
amplifier structure. Since the data paths through the limit amplifier are differential
and the amplifier stages are symmetric, the gains through each path are nearly
equal in practice. If the gains through each path are similar then the peak-to-peak
values must also be similar. With zero offset in the limit amplifier, the peak values of
the two paths must also be equal. However, as shown in Figure 2-4, if the output
referred offset is non-zero then the peak values of the two outputs will be different. In
fact, the difference will equal the output referred offset and the max/min pair can
be replaced by a simple peak detector, as shown in Figure 2-5. This observation
eliminates the offset issue due to the mismatched min/max detectors [12].
[Plot: Vout versus time]
Figure 2-4: Measuring Output Referred Offset Voltage Using Maximum Detectors Only
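The observation that the difference between the two peak values equals the output referred offset can be checked with a small idealized model. The Python sketch below assumes ideal NRZ levels, equal gain in both paths, and an offset split evenly between the two outputs; all names and values are hypothetical.

```python
def differential_outputs(bits, amplitude, v_cm, v_os):
    """Ideal differential NRZ outputs with an output referred offset v_os.

    Assumption: the offset appears as +v_os/2 on one output and -v_os/2 on the other.
    """
    half = amplitude / 2.0
    vo_p = [v_cm + (half if b else -half) + v_os / 2.0 for b in bits]
    vo_n = [v_cm + (-half if b else half) - v_os / 2.0 for b in bits]
    return vo_p, vo_n

def offset_from_peaks(vo_p, vo_n):
    """Offset estimate: difference between the peak (maximum) values of the two outputs."""
    return max(vo_p) - max(vo_n)

bits = [0, 1, 1, 0, 1, 0, 0, 1]  # must contain both a 0 and a 1
vo_p, vo_n = differential_outputs(bits, amplitude=0.4, v_cm=1.0, v_os=0.03)
print(f"estimated offset = {offset_from_peaks(vo_p, vo_n) * 1e3:.1f} mV")
```

Note that the estimate does not depend on the common-mode level or the signal amplitude, only on the symmetry of the two paths.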
2.1.4 Issues with Sensing Output Referred Offset with Max
Detectors
The offset issue caused by mismatches between different type detectors has been
solved by taking advantage of the symmetry of the limit amplifier stages. However,
the basic architecture of the peak detector has not changed and it still suffers from
the same data dependence issue described in Section 2.1.2. To fix this issue we
need to develop a new peak detector design.
[Schematic: Max Detector with source follower M1 driven by Vin, storage capacitor Cmax and current source Ibias at Vout]
Figure 2-5: Schematic of Typical CMOS Maximum Detector
2.1.5 Final Peak Detector Design
The operation of the basic peak detector was described in Section 2.1.2. The remain-
ing issue is related to the droop of the peak detector output when the input voltage
goes low. The absolute magnitude of the droop is not specifically the issue, rather
the dependence of the degree of droop on the symbol period, and hence the data
dependent control signal, is the issue.
The fundamental problem is that both the bandwidth of the peak detector when
the input is high and the amount that the sampling capacitor is discharged when
the input is low are proportional to Ibias. One possible solution is to decrease Ibias,
effectively reducing the rate at which charge is stripped from the storage capacitor and
reducing the amount that the output droops each data period. However, since the
bandwidth of the peak detector is also proportional to Ibias when the input is high,
this approach will directly impact the tracking ability of the peak detector. We need
to develop a method of measuring the offset that preserves the required slew rate
and bandwidth during the tracking phase while reducing the discharge current on the
hold phase. Fundamentally, there is no way to reduce the dependence of the amount
of droop on the symbol period with the current topology without paying a severe
performance penalty.
We propose that the peak detector circuit shown in Figure 2-6 provides a simple
solution to this problem. Let's consider the operation of the circuit. Transistors M1
and M2 act as simple source followers, similar to device M1 in the basic peak detector
shown in Figure 2-5, and transistors M3 and M4 act as switches. When the input
is high, M3 and M4 are closed so that the peak detector behaves as a traditional peak
detector. Alternately, when the input is low, the switch devices are open and prevent
Ibias from discharging Cmax.
[Schematic: source followers M1 (Vin+) and M2 (Vin-) driving outputs Vo+ and Vo- with hold capacitors CL, and series switch devices M3 and M4]
Figure 2-6: Schematic of Proposed Peak Detector with Reduced Droop
Compared to traditional peak detector designs, the droop in this design is greatly
reduced because the switch devices, M3 and M4, dramatically reduce the dependence
of droop on Ibias. The amount of droop per data period in the traditional peak
detector design is determined by Ibias, while the amount of droop per data period in
the proposed peak detector design is determined by the off-state current of the switch
devices. Although the output of the peak detector is still dependent on the symbol
length, the magnitude of the variation is greatly attenuated. This point is illustrated
in Figure 2-7 by the difference in droop between the response of the traditional peak
detector, represented by the solid black line, and the response of the new peak detector
design, represented by the dashed line.
[Plot: peak detector output versus time t for the traditional and proposed designs]
Figure 2-7: Comparison of Influence of Symbol Period on Droop of Simple Peak Detector vs Proposed Peak Detector Design
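The comparison can also be sketched numerically with a simple behavioural model. The Python code below assumes that the detector bandwidth is much higher than the data rate, so the output is fully refreshed on every high bit, and that the hold capacitor discharges linearly on every low bit. The discharge current is Ibias for the traditional design and an off-state leakage current for the proposed design; the current, capacitor, and bit-pattern values are hypothetical.

```python
def peak_detector_output(bits, v_peak, discharge_current, c_hold, symbol_period):
    """Behavioural model: refresh to v_peak on a high bit, linear droop on a low bit."""
    v = v_peak
    trace = []
    for b in bits:
        if b:
            v = v_peak  # tracking phase: output refreshed to the peak value
        else:
            v -= discharge_current / c_hold * symbol_period  # hold phase: droop
        trace.append(v)
    return trace

def worst_droop(trace, v_peak):
    """Largest deviation below the peak value over the whole trace."""
    return v_peak - min(trace)

# Hypothetical values (assumptions, not taken from the thesis)
BITS = [1, 0, 0, 0, 0, 1, 0, 1]
V_PEAK, C_HOLD, SYMBOL_PERIOD = 1.0, 1e-12, 100e-12
I_BIAS, I_LEAK = 10e-6, 10e-9

traditional = peak_detector_output(BITS, V_PEAK, I_BIAS, C_HOLD, SYMBOL_PERIOD)
proposed = peak_detector_output(BITS, V_PEAK, I_LEAK, C_HOLD, SYMBOL_PERIOD)
print(f"traditional worst droop: {worst_droop(traditional, V_PEAK):.3e} V")
print(f"proposed worst droop:    {worst_droop(proposed, V_PEAK):.3e} V")
```

In this model the ratio of the two droop values equals the ratio of Ibias to the leakage current, while the tracking behaviour on high bits is unchanged.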
2.2 Summary
This chapter presented the design of the proposed peak detector implementation. The
addition of series switch devices, which are controlled by the input, prevents the peak
detector bias current from discharging the sampling capacitor when the peak detector
input is low. In traditional peak detector designs both the peak detector bandwidth
and droop are determined by the peak detector bias current. In the proposed design
the bandwidth of the peak detector is determined by its bias current while the droop
is determined by transistor off-state leakage current. By substantially reducing the
dependence of the droop on the bias current, this peak detector design enables the
system to simultaneously achieve the fast settling time and low output jitter goals.
In the next chapter the system modeling issues will be discussed.
Chapter 3
System Modeling
The peak detector design presented in the previous chapter is the cornerstone of
the offset compensation loop. This chapter presents the system topology and the
modeling of each control loop, and determines the system parameters that provide
the desired system dynamics. There are
several performance parameters that must be considered when determining the system
parameters:
* Loop bandwidth → compensation time: Determined by compensation time goal
and impact on jitter of output signal.
* Forward path gain: Based on input and output signal characteristics
* Output jitter: Need to minimize output jitter for the CDR that follows
* System stability: Unconditional requirement
3.1 System Level Implementation
The next step is to pull all of the pieces together and implement the complete control
loop. A fully differential implementation of the system is shown in Figure 3-1. The
peak detector is used to measure the offset referred to the output of the limit amplifier.
The integrator in the feedback path filters the instantaneous peak detector output.
Additionally, the integrator forces the steady-state, output-referred offset voltage to
be zero regardless of the loop gain. However, there are a few changes that need to
be made to the system based on the assumptions that we made in developing the
proposed peak detector design.
As explained in Section 2.1.3, the offset information is contained in the peaks of the
outputs (i.e. the output referred offset is equal to the difference in the peak values).
Therefore, we need to account for the case when the output of the limit amplifier
becomes saturated. A typical DC transfer function between the input voltage and
output voltage of the basic limit amplifier stage is shown in Figure 3-2. The output
nonlinearly approaches a maximum value, determined by the positive power supply,
and ultimately saturates due to either a large input amplitude or a large offset.
[Block diagram: limit amplifier A from Vin to Vout with Vos + Vn injected, and a feedback path through a Peak Detector and an Integrator on each differential side]
Figure 3-1: System Level of Limit Amplifier with Offset Compensation
Compensation will be ineffective, or at least severely degraded in performance, if we
attempt to measure the offset from the saturated output. To solve this problem,
multiple control loops are used with taps located at each of the limit amplifier stage
outputs, as shown in Figure 3-3. The windowing and select logic determines the first
amplifier stage with a non-saturated output and compensates the output referred
offset at the selected output. Additionally, the select logic is dynamic and the selected
output tap location can change as the system is compensated and outputs later in
the amplifier chain become unsaturated.
[Plot: output voltage versus Input Voltage [Volts], with the Non-Linear Region marked]
Figure 3-2: Typical Transfer Function for Limit Amplifier Cell
As shown in Figure 3-2, for large amplitude inputs the output nonlinearly ap-
proaches the maximum value defined by the positive power supply. This nonlinearity
can also reduce the effective gain of the offset compensation loop. To minimize this
undesired effect, the switching threshold of the windowing logic can be set lower than
the positive supply voltage, say by 50-100mV, so that the non-linear portion of the
amplifier transfer function does not impact the offset compensation. There are two
apparent drawbacks to this solution.
[Block diagram: three cascaded amplifier stages (Av) from In to Out, a Peak Detector at each stage output (Vpeak1±, Vpeak2±, Vpeak3±), Windowing and Select Logic, two Analog Muxes, and two Integrators]
Figure 3-3: Complete System Showing Multiple Control Loops and Logic
First, the maximum amplitude of input referred offset that the system can compensate
is reduced because we have limited the range of each loop to avoid the non-linear
portion of the amplifier transfer function. However, the input referred offset would
have to be large enough to saturate the output of the first stage in the limit amplifier
for this to become an issue. In this design, the maximum input referred offset that
can be compensated is 450mV. However, it is highly unlikely that the input referred
offset would be this large.
The second potential issue is that the amplifier cells situated after the selected
compensation tap in the limit amplifier operate open-loop and any offset added by
these stages will not be compensated. As mentioned in Chapter 1, the DC offset
introduced by each limit amplifier stage, referred to its own input, will be on the
order of a few millivolts. Each of these offset components is referred to the output
of the limit amplifier through the gain of subsequent stages. The aggregate output
referred offset that cannot be compensated is the sum of these components. Assuming
that the total output referred offset remains on the order of a few tens of millivolts,
which will be true in practice, this condition is acceptable. The goal of the offset
compensation is to eliminate the gross offset that causes the output of any stage in
the limit amplifier to saturate.
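The aggregate uncompensated offset can be estimated with a short calculation. The Python sketch below assumes that each open-loop stage contributes an input referred offset that is amplified by that stage and by every stage after it, and that all contributions add with the same sign (a worst case). The per-stage gain and offset values are hypothetical.

```python
def uncompensated_output_offset(stage_offsets, stage_gains):
    """Sum of open-loop stage offsets referred to the limit amplifier output.

    stage_offsets[i] is the input referred offset of open-loop stage i (volts);
    stage_gains[i] is its DC gain. Stages are listed in signal-path order.
    """
    total = 0.0
    for i, v_os in enumerate(stage_offsets):
        gain_to_output = 1.0
        for g in stage_gains[i:]:  # this stage and all subsequent stages
            gain_to_output *= g
        total += v_os * gain_to_output
    return total

# Hypothetical example: three open-loop stages after the selected tap,
# each with a gain of 2 and a 2 mV input referred offset
offset = uncompensated_output_offset([2e-3, 2e-3, 2e-3], [2.0, 2.0, 2.0])
print(f"uncompensated output referred offset = {offset * 1e3:.1f} mV")
```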
To ensure that the system dynamics are consistent over all possible offset values,
the system parameters for each control loop are set equal. Since the gain increases at
each subsequent output of the limit amplifier, the gain in each feedback path must
be adjusted to satisfy this requirement. Full details of the system modeling will be
covered later in this chapter.
3.2 Linear System Modeling
To model the control loop we need to make some simplifying assumptions. First, to
eliminate the difficulty of analyzing multiple control loops, we only consider the case
when there is one active control loop. In the end, we can extend the analysis to the
more general case when there are multiple control loops and test that this assumption
is valid in simulation. Further, we can assume that all blocks in the system are linear
about a given operating point and make use of LTI modeling techniques. We will
now develop models for each of the system blocks.
Each of the amplifiers in the forward amplifier path can be modeled by a DC gain
and a single pole, representing the bandwidth of the amplifier. Therefore, the linear
model for each amplifier is:
$$H(s) = \frac{A_v}{1 + s/p_1} \qquad (3.1)$$
where Av is the gain and p1 is the pole at the 3dB frequency. If we consider a cascade
of n amplifiers, the aggregate transfer function becomes:
$$H(s) = \left(\frac{A_v}{1 + s/p_1}\right)^n \qquad (3.2)$$
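One consequence of Equation 3.2 is that the 3dB bandwidth of the cascade is lower than that of a single stage. The Python sketch below evaluates the magnitude of Equation 3.2 and the resulting cascade bandwidth, p1·sqrt(2^(1/n) − 1), which follows from standard single-pole algebra rather than from a result stated in this thesis. The gain and pole values are hypothetical.

```python
import math

def cascade_magnitude(av, p1, n, omega):
    """|H(j*omega)| for n identical single-pole stages (Equation 3.2)."""
    return (av / math.sqrt(1.0 + (omega / p1) ** 2)) ** n

def cascade_bandwidth(p1, n):
    """Frequency at which the cascade magnitude falls 3dB below its DC value."""
    return p1 * math.sqrt(2.0 ** (1.0 / n) - 1.0)

# Hypothetical values: per-stage gain of 2 and per-stage pole at 2*pi*10 GHz
AV = 2.0
P1 = 2.0 * math.pi * 10e9
for n in (1, 2, 4):
    bw = cascade_bandwidth(P1, n)
    print(f"n = {n}: DC gain = {AV ** n:.0f}, bandwidth = {bw / (2 * math.pi) / 1e9:.2f} GHz")
```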
Additionally, the peak detector can be similarly modeled by its DC gain, K1, and
a single pole, p2, and has the same form as Equation 3.1. If the bandwidth of the
peak detector is low enough the model takes the same form as Equation 2.1 after
some simplification. The justification for this abstraction is that the peak detector
only needs to measure the average output referred offset of the system, or the DC
component of the output signal. To first order, the output of the peak detector is not
affected by instantaneous variations at its input. The final form of the peak detector
model is:
$$H(s) = \frac{K_1}{1 + s/p_2} \approx \frac{K_1 p_2}{s} \qquad (3.3)$$
The integrator can be modeled as an ideal integrator:
$$H(s) = \frac{K_2}{s} \qquad (3.4)$$
where K2 is the gain. Putting all of the pieces together, the complete model for the
forward amplifier path and the offset compensation is shown in Figure 3-4.
For the following discussion, we assume that p1 > p2. This is valid in this system
since p1 corresponds to the bandwidth of the limit amplifier (10GHz), and p2 corresponds
to the bandwidth of the peak detector (~ 10MHz). In a similar fashion to
the linear model for a PLL, where the state variable is phase and not the data signal
itself, the variable of interest in this system is the offset voltage. To characterize the
system response, we define the open loop response to be:
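$$A(s) = \left(\frac{A}{1 + s/p_1}\right)\left(\frac{K_1}{1 + s/p_2}\right)\left(\frac{K_2}{s}\right) \qquad (3.5)$$
[Block diagram: forward path Amplifier A/(1 + s/p1) from Vos,in to Vos,out; feedback path Peak Detector K1/(1 + s/p2) followed by Integrator K2/s]
Figure 3-4: Linear Model of Limit Amplifier with Offset Compensation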
The necessary criteria for stability can be determined based on traditional feedback
heuristics by analyzing the behavior of the open-loop parameter A(s). In this system,
p2 is the peak detector pole location and ft is the unity-gain frequency. From the Bode
plot in Figure 3-5, we can readily see that the integrator in the feedback path reduces
the magnitude at 20dB/dec at frequencies below the first pole, p2, and introduces a 90°
phase shift. Phase margin is defined as the difference in phase from 180° at unity gain.
If we require greater than 45° of phase margin to be stable then a necessary condition
is that p2 > ft. Additionally, as the loop gain increases, the unity-gain frequency
increases and the phase margin, and therefore stability, degrades. Ultimately, there
are optimal values for p2 and the loop gain, AK1K2, that guarantee stability and
provide the desired loop dynamics.
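The relationship between loop gain, unity-gain frequency, and phase margin can be explored numerically. The Python sketch below evaluates the open-loop response of Equation 3.5 on the jω axis, with k standing for the product AK1K2 and all frequencies in rad/s. The pole locations follow the approximate values quoted above (10GHz and 10MHz); the gain values are hypothetical.

```python
import math

def loop_magnitude(omega, k, p1, p2):
    """|A(j*omega)| for A(s) = k / (s * (1 + s/p1) * (1 + s/p2))."""
    return k / (omega * math.hypot(1.0, omega / p1) * math.hypot(1.0, omega / p2))

def unity_gain_frequency(k, p1, p2, lo=1e-3, hi=1e15):
    """Find omega where |A| = 1 by bisection on a log scale (|A| decreases with omega)."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if loop_magnitude(mid, k, p1, p2) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def phase_margin_deg(k, p1, p2):
    """Phase margin: 180 degrees plus the open-loop phase at the unity-gain frequency."""
    w = unity_gain_frequency(k, p1, p2)
    return 90.0 - math.degrees(math.atan(w / p1) + math.atan(w / p2))

P1 = 2.0 * math.pi * 10e9   # limit amplifier pole
P2 = 2.0 * math.pi * 10e6   # peak detector pole
for scale in (0.1, 0.5, math.sqrt(2.0), 5.0):  # hypothetical gains, as multiples of p2
    pm = phase_margin_deg(scale * P2, P1, P2)
    print(f"k = {scale:.3f} * p2 -> phase margin = {pm:.1f} degrees")
```

With k = sqrt(2)·p2 the unity-gain frequency sits approximately at p2 and the phase margin is close to 45°, which matches the necessary condition p2 > ft stated above.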
To gain more intuition of the system modeling, we can further define a closed loop
response parameterizing function G(s) as:
$$G(s) = \frac{A(s)}{1 + A(s)} \qquad (3.6)$$
where A(s) is the open loop response defined above. If AK1K2 = 0 then the loop is
open, there is one closed-loop pole, p1,closed-loop, located at the origin and there is a
pair of closed-loop poles located at:
$$p_{2/3,closed\text{-}loop} = -0.5\left[(p_1 + p_2) \mp \sqrt{(p_1 + p_2)^2 - 4p_1p_2}\right] \qquad (3.7)$$
The pole locations, p2/3,closed-loop, roughly correspond to the open-loop pole locations,
p1 and p2. To understand how the closed-loop poles vary with increased gain, we
construct the root locus plot in Figure 3-6. As the DC loop gain increases, the first
two closed-loop poles, p1,closed-loop and p2,closed-loop, approach each other from zero
and the first open-loop pole location (p2) along the negative real axis as shown in
Figure 3-6. Additionally, p3,closed-loop moves away from the origin along the negative
real axis. When AK1K2 ≈ p2/4, where p2 is the open-loop pole corresponding
to the peak detector, a complex conjugate pole pair is formed that diverges at an angle
of ±60° to the real axis. As the open-loop gain increases, the poles will cross into the
right-half-plane and the system will become unstable. So, how do we determine the
value of gain that not only guarantees stability but also provides the desired settling
response? One solution is to use the PLL Design Assistant [17].
[Bode plot: magnitude with 20dB/dec and 40dB/dec slopes, phase, loop gain, and poles p1 and p2 marked, versus Frequency (Hz)]
Figure 3-5: Bode Plot Showing Stability Degradation with Increasing Gain
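The pole movement described above can be checked with a few lines of Python. The sketch below first evaluates Equation 3.7 at zero loop gain, and then uses a two-pole approximation that neglects p1 (an assumption justified only because p1 is far above p2), so that the characteristic equation becomes s² + p2·s + k·p2 = 0 with k standing for AK1K2. This approximation shows the breakaway at k = p2/4 but cannot show the eventual crossing into the right-half-plane, which is caused by the third pole. The numeric values are hypothetical.

```python
import cmath
import math

def open_loop_pole_pair(p1, p2):
    """Equation 3.7 at zero loop gain: the pole pair sits at -p2 and -p1."""
    root = math.sqrt((p1 + p2) ** 2 - 4.0 * p1 * p2)
    return (-0.5 * ((p1 + p2) - root), -0.5 * ((p1 + p2) + root))

def dominant_closed_loop_poles(k, p2):
    """Roots of s^2 + p2*s + k*p2 = 0 (two-pole approximation, p1 neglected)."""
    disc = cmath.sqrt(p2 * p2 - 4.0 * k * p2)
    return ((-p2 + disc) / 2.0, (-p2 - disc) / 2.0)

print(open_loop_pole_pair(10.0, 2.0))
P2 = 2.0 * math.pi * 10e6  # peak detector pole, rad/s
for scale in (0.1, 0.25, 1.0):  # hypothetical gains, as multiples of p2
    s1, s2 = dominant_closed_loop_poles(scale * P2, P2)
    kind = "complex pair" if abs(s1.imag) > 0.0 else "real poles"
    print(f"k = {scale:.2f} * p2 -> {kind}: {s1:.3e}, {s2:.3e}")
```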
3.2.1 Modeling System Response with PLL Design Assistant
The PLL Design Assistant is a useful tool that was developed to aid the design of PLL
systems and can be downloaded at http://www-mtl.mit.edu/perrottgroup/tools.html.
However, with a little imagination this tool can be used to model nearly any linear
system. We can think of the system in Figure 3-4 as a second order, type I PLL with
the forward amplifier path corresponding to a high frequency parasitic pole. Let's
assume that the open-loop pole p1 is set to 10GHz, based on the desired data-rate, and
that the gain of the peak detector, K1, is unity. Then, using the PLL Design Assistant
we can specify a desired closed-loop bandwidth, based on the desired compensation
settling time, and step-response shape to achieve the optimal settling time.
For example, Figure 3-7 illustrates the GUI of the PLL Design Assistant
designing the system loop with a Bessel shape and a bandwidth of 2.5MHz. The
resulting gain coefficient, K, corresponds to the product AK2, assuming that the
peak detector has a gain of one. The pole frequency, fr, corresponds to the open-loop
pole of the peak detector, p2. Note that the closed-loop complex conjugate pole pair
follows the trajectory determined in the root locus analysis and that the frequency
of the open-loop pole p2 is higher than the dominant closed-loop pole frequency, as
required by our earlier stability analysis. The resulting step-response of the closed-loop
system, shown in Figure 3-8, indicates that the total settling time for the offset
compensation is roughly 500nS and that the system is stable.
[Root locus plot: closed-loop pole trajectories versus Real Axis]
Figure 3-6: Root Locus Plot of G(s) Showing Necessary Condition for Stability
[Screenshot: PLL Design Assistant GUI]
Figure 3-7: PLL Design Assistant Graphical Interface
[Plot: closed-loop step response versus Time (seconds)]
Figure 3-8: Step Response of System Designed with PLL Design Assistant
3.2.2 Modeling the Impact of System Parameter Variation
on Stability and Compensation Time
Despite the designer's best effort, variations in both process and environmental
variables will affect the system parameters and therefore the overall system operation.
We want to design the system to be robust to some degree of variation so that it
will operate as intended over a wide range of process and environmental conditions.
The PLL Design Assistant provides the designer with the ability to investigate the
impact of system parameter variations on stability and the dynamic response of the
system. For example, variations in system parameters can be specified in the PLL
Design Assistant GUI as shown in the alter commands in Figure 3-9. In this case we
introduce a ±20% variation in both the open-loop gain and dominant open-loop pole
location. Figure 3-10 demonstrates the impact on the step response of the system.
Even with these large variations in the system parameters, the compensation loop is
still stable and the total compensation time remains less than 1µs.
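The corner analysis described above can be approximated outside the PLL Design Assistant. The following is a minimal sketch, not the tool's own algorithm: it assumes the loop reduces to an integrator plus the single open-loop pole P2 (the 10GHz parasitic pole is ignored), that the nominal closed-loop response is a second-order Bessel shape with a 2.5MHz -3dB bandwidth, and that settling is judged against a 2% band. The function names and the forward-Euler integration are illustrative choices.

```python
import math

def bessel2_loop(f_bw_hz):
    """Return (K, p2) in rad/s so that K*p2 / (s^2 + p2*s + K*p2) is a
    second-order Bessel response with the given -3 dB bandwidth.
    Simplified model: open loop G(s) = K / (s * (1 + s/p2))."""
    w3db = 2.0 * math.pi * f_bw_hz
    wn = 1.2720 * w3db            # Bessel: natural frequency / -3 dB frequency
    zeta = math.sqrt(3.0) / 2.0   # Bessel: Q = 1/sqrt(3)
    p2 = 2.0 * zeta * wn
    k = wn * wn / p2
    return k, p2

def settling_time(k, p2, tol=0.02, t_end=2e-6, dt=1e-10):
    """Forward-Euler unit-step simulation of y'' + p2*y' + k*p2*y = k*p2.
    Returns the last time at which |y - 1| exceeds tol."""
    y, dy, t, last_outside = 0.0, 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        ddy = k * p2 * (1.0 - y) - p2 * dy
        y += dt * dy
        dy += dt * ddy
        t += dt
        if abs(y - 1.0) > tol:
            last_outside = t
    return last_outside

k_nom, p2_nom = bessel2_loop(2.5e6)
scales = (0.8, 1.0, 1.2)          # +/-20% variation in loop gain and pole
corners = {(gk, gp): settling_time(gk * k_nom, gp * p2_nom)
           for gk in scales for gp in scales}
worst = max(corners.values())
print("nominal settling (s):", corners[(1.0, 1.0)])
print("worst-case settling (s):", worst)
```

Under these assumptions every corner settles in well under 1µs, which agrees qualitatively with the behavior reported for Figure 3-10; exact times depend on the settling tolerance and on the poles neglected here.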
3.3 Summary
This chapter presented the linearized model for the offset compensation system. To
simplify the analysis, only a single control loop was considered in this chapter. The
system component values required for stability and the desired dynamic behavior
were determined. The assumption that the analysis can be extended to multiple
loops will be tested in Chapter 6. The next two chapters will present a numerical
design procedure for resistor-loaded differential amplifiers and the circuit design of
each of the system components.
Figure 3-9: PLL Design Assistant Graphical Interface
Figure 3-10: Impact of ±20% Variation in Loop Gain and Dominant Pole Location
on the Step Response of the System
Chapter 4
Numerical Design of High Speed
Differential Amplifiers
A novel numerical design procedure was developed to design the limit amplifier stages
[18] that provides quick and accurate results. After specifying the desired performance
metrics for the amplifier, the amplifier device parameters and bias conditions are
determined using the methodology. Accuracy is achieved by leveraging numerical
computation and basing the design on device characteristics extracted from SPICE.
4.1 Methodology
CMOS analog design techniques have traditionally assumed square law characteris-
tics for device I-V curves when calculating the impact of device properties on circuit
performance. However, the square law assumption is quickly becoming highly inaccu-
rate with the introduction of finer line width processes due to non-ideal effects such
as velocity saturation. As a result, the accuracy of traditional design equations is
steadily degrading, and analog designers are in need of alternate approaches to such
formulations.
Thus far, there have been two responses to dealing with changing device charac-
teristics in the analog design community. The first has been to assume square law I-V
characteristics in calculations, and then rely on a simulator such as SPICE to tweak in
final device parameter adjustments. Unfortunately, the square law is rapidly becom-
ing inaccurate to the point that the analytical calculations are practically useless -
all design time is then spent on SPICE simulations. Such an approach removes intu-
ition from the designer's grasp, leads to a lengthy design process (since many tweaks
are required), and often leads to suboptimal performance. The second approach is
to completely automate the analog design process - the user simply specifies per-
formance specifications and some possible topologies, and customized software takes
care of the rest [19]. Unfortunately, while very useful for the design of standard ana-
log blocks, such an approach removes creativity from the designer's grasp and offers
little intuition for the creation of new circuit topologies.
We propose an alternate approach to this issue - develop numerical procedures for
designing specific classes of circuits which resemble hand analysis, but use simulated
device characteristics in place of analytical expressions. By sticking with procedures
similar to hand analysis, much intuition can be gained about design tradeoffs. By
using simulated device characteristics, the results are made accurate so that little
or no tweaking is required in SPICE. This chapter applies the above philosophy to
the design of high speed, resistor-loaded, differential amplifiers. These structures are
tremendously useful in circuit applications whose speed requirements exceed the
abilities of full-swing logic circuits. Implications for the design of SCL latches, registers,
and gates are also discussed.
Figure 4-1: Differential amplifier used in calculations.
Figure 4-1 displays a resistor-loaded, differential amplifier used in high speed ap-
plications. The resistors are often realized within a reasonably small area using un-
silicided polysilicon, and introduce less capacitance than other loads such as triode
PMOS devices or diode-connected NMOS devices. Further increases in bandwidth
can be achieved at the expense of chip area by introducing inductors into the loads
[20].
Design of resistor-loaded amplifiers entails choosing appropriate device sizes and
resistance values given three design specifications:
* Allowable power dissipation: $I_{bias}$
* Desired voltage swing: $V_{sw}$
* Desired DC voltage gain: $|A_v|$
An additional specification for the amplifier is its bandwidth - its value is constrained
by choice of the above three specifications as well as the load that the amplifier is
required to drive (assumed capacitive). We define intrinsic bandwidth (BW) as the
amplifier bandwidth that results when the amplifier drives an identical stage without
additional wiring capacitance. Since actual circuits contain wiring capacitance, the
intrinsic bandwidth offers only an upper bound on achievable performance, but is still
a very useful metric. Note that, to achieve the maximum bandwidth, the transistor
length, L, will always be assumed to be set to its minimum value for the discussion
to follow.
Figure 4-1 allows us to relate the first two design specifications to other circuit
parameters. When zero differential input voltage is applied to the amplifier, the bias
current through each transistor is observed to be
$$I_0 = I_{bias}/2.$$
As the input differential voltage is varied, the current through each resistor ranges
between 0 and $I_{bias}$. Therefore, the maximum single-ended voltage swing at the
amplifier output is
$$V_{sw} = I_{bias} R = 2 I_0 R \tag{4.1}$$
The third design specification, DC gain, is derived about the bias point of zero
differential input voltage using the small signal transistor model shown in Figure 4-2.
Here we have assumed that node n0 in Figure 4-1 is at incremental ground as the
differential voltage is varied. Ignoring capacitance for this DC calculation, we write
$$|A_v| = g_m \left( R \,\|\, \frac{1}{g_{ds}} \right) \;\Rightarrow\; g_m = \frac{|A_v|}{R} + |A_v|\, g_{ds} \tag{4.2}$$
Unfortunately, evaluation of the above equation requires calculation of $g_m$ and
$g_{ds}$ as a function of the device bias current and size. As pointed out earlier, hand
calculations assuming square law I-V characteristics prove inaccurate for this task.
Our proposed method of addressing this issue is described in the following section.
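As a small worked example of Equations 4.1 and 4.2, the sketch below computes the load resistance from the swing and bias current, and the transconductance needed for a target gain. The swing (1V) and bias current (2mA) match the values used later in Section 4.4; the output conductance value and the function names are hypothetical and chosen only for illustration.

```python
def load_resistance(i_bias, v_sw):
    """Equation 4.1: V_sw = I_bias * R, solved for R (ohms)."""
    return v_sw / i_bias

def required_gm(gain, r_load, g_ds):
    """Equation 4.2: g_m = |A_v|/R + |A_v|*g_ds (siemens)."""
    return gain / r_load + gain * g_ds

def dc_gain(g_m, r_load, g_ds):
    """Equation 4.2 in its original form: |A_v| = g_m * (R || 1/g_ds)."""
    return g_m / (1.0 / r_load + g_ds)

r = load_resistance(i_bias=2e-3, v_sw=1.0)      # 500 ohms
g_ds_example = 0.5e-3                           # hypothetical output conductance
gm = required_gm(gain=3.0, r_load=r, g_ds=g_ds_example)
print("R =", r, "ohms; required g_m =", gm, "S")
```

The difficulty noted in the text remains: $g_{ds}$ is not known in advance because it depends on the device width and bias, which is what motivates the simulated-characteristic approach of the next section.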
Figure 4-2: Small signal model for amplifier.
4.2 Proposed Approach
We will now show that we can create a design framework in which all design
calculations revolve around the solution of just one key variable given the three constraints
described earlier. This key variable is current density, and is defined as
$$I_{den} = \frac{I_0}{W},$$
where W is the width of the transistor as indicated in Figure 4-1.
Two key relationships involving current density will now be derived. The first is
a gain/swing constraint formulation that will set the value of $I_{den}$. The second is a
gain-bandwidth product expression that incorporates the impact of $I_{den}$.
4.2.1 Derivation of Gain/Swing Constraint Formulation
Given a fixed transistor length, $L = L_{min}$, both the $g_m$ and $g_{ds}$ values of a CMOS
device are dependent primarily on the transistor width, W, and bias current $I_0$. Given
a fixed value for $I_0$, as set by power dissipation requirements, it is straightforward to
sweep W of the device in SPICE to obtain simulated plots of $g_m(I_0, W)$ and $g_{ds}(I_0, W)$.
We then define $g_{m0}(I_{den})$ and $g_{ds0}(I_{den})$ as
$$g_{m0}(I_{den}) = g_m(I_0, W)/W, \qquad g_{ds0}(I_{den}) = g_{ds}(I_0, W)/W \tag{4.3}$$
Let us now revisit the swing and gain constraints introduced in Section 4.1.
Combining Equation 4.1 and Equation 4.2, we obtain
$$g_m = \frac{2|A_v|}{V_{sw}} I_0 + |A_v|\, g_{ds}.$$
We relate $g_m$ and $g_{ds}$ to the simulated characteristics defined in (4.3) as
$$W g_{m0}(I_{den}) = \frac{2|A_v|}{V_{sw}} I_0 + |A_v|\, W g_{ds0}(I_{den}).$$
Dividing through by W, we obtain the key gain/swing constraint formulation as a
function of current density:
$$g_{m0}(I_{den}) = \frac{2|A_v|}{V_{sw}} I_{den} + |A_v|\, g_{ds0}(I_{den}) \tag{4.4}$$
The above expression states that current density is completely set by the choice of
gain, swing, and the simulated $g_m$, $g_{ds}$ curves.
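The gain/swing constraint of Equation 4.4 has no closed-form solution when the device curves come from simulation, so it is solved numerically. The sketch below uses bisection. The two device curves are hypothetical analytic stand-ins (they are NOT data extracted from SPICE or from the process used in this work); in practice they would be replaced by interpolated simulation data. All function names are illustrative.

```python
def gm0_model(i_den):
    """Hypothetical normalized transconductance g_m0 in S/m versus A/m."""
    return 1200.0 * i_den / (i_den + 150.0)

def gds0_model(i_den):
    """Hypothetical normalized output conductance g_ds0 in S/m versus A/m."""
    return 0.1 * i_den

def solve_current_density(gain, v_sw, gm0=gm0_model, gds0=gds0_model,
                          lo=1e-6, hi=1000.0, iters=200):
    """Solve Equation 4.4 for I_den (A/m) by bisection on [lo, hi]."""
    def residual(i_den):
        return gm0(i_den) - (2.0 * gain / v_sw) * i_den - gain * gds0(i_den)
    if residual(lo) <= 0.0 or residual(hi) >= 0.0:
        raise ValueError("no current density in the bracket meets the target")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid      # g_m0 still exceeds the constraint line: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

i_den = solve_current_density(gain=2.0, v_sw=1.0)
width = (2e-3 / 2.0) / i_den      # W = I_0 / I_den with I_0 = I_bias / 2
print("I_den (A/m):", i_den, " W (m):", width)
```

With these stand-in curves, raising the target gain lowers the solved current density, and a sufficiently high gain has no solution at all, which is the behavior Equation 4.4 predicts for a saturating $g_{m0}$ curve.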
4.2.2 Derivation of Gain-Bandwidth Tradeoff
To examine the tradeoff between gain and intrinsic bandwidth, we first note that the
capacitive load can be approximately related to the amplifier device size as
$$C_L = W C_{L0} = W (C_{gs0} + C_{d0}), \tag{4.5}$$
where $C_{L0}$ is the simulated capacitive load normalized to an effective W equal to
one. Justification for the above expression follows from the fact that the amplifier
is driving an identical structure for its load and that both $C_{gs}$ and $C_d$ scale linearly
with the device width, W.
The intrinsic bandwidth is computed as
$$BW\,(\mathrm{rad/s}) = \frac{1}{R C_L} = \frac{2 I_{den}}{V_{sw} C_{L0}}. \tag{4.6}$$
The amplifier gain is found through algebraic manipulation of Equation 4.4:
$$|A_v| = \frac{g_{m0}(I_{den})}{(2/V_{sw})\, I_{den} + g_{ds0}(I_{den})} \tag{4.7}$$
The gain-bandwidth product is then found by combining the above two expressions:
$$|A_v| \cdot BW = \frac{g_{m0}(I_{den})}{C_{L0}} \cdot \frac{1}{1 + V_{sw}\, g_{ds0}(I_{den})/(2 I_{den})} \tag{4.8}$$
Given $g_{ds0}$ is negligibly small, the above expression reverts to the classic $g_m/C$
expression familiar to analog designers. However, one must note that $g_m$ is a function of
current density - the implications of this point will be brought home in the following
section.
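For reference, the algebra that combines the intrinsic bandwidth and gain expressions into the gain-bandwidth product can be written out step by step. This is only a restatement of the relations above, with the argument $(I_{den})$ dropped from $g_{m0}$ and $g_{ds0}$ for brevity:

```latex
\begin{aligned}
|A_v| \cdot BW
  &= \frac{g_{m0}}{\frac{2}{V_{sw}} I_{den} + g_{ds0}}
     \cdot \frac{2 I_{den}}{V_{sw} C_{L0}} \\
  &= \frac{g_{m0}}{C_{L0}}
     \cdot \frac{2 I_{den}/V_{sw}}{2 I_{den}/V_{sw} + g_{ds0}} \\
  &= \frac{g_{m0}}{C_{L0}}
     \cdot \frac{1}{1 + V_{sw}\, g_{ds0}/(2 I_{den})}
\end{aligned}
```

Setting $g_{ds0} = 0$ in the last line leaves $g_{m0}/C_{L0}$, the width-normalized form of the $g_m/C$ limit.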
4.3 Intuitive Insights from Method
The first useful insight of the proposed method is that it provides an intuitive picture
of the dependence of gain-bandwidth product on current density. Figure 4-3 displays a
gain-bandwidth plot for a 0.18µm NMOS device according to Equation 4.8. Each curve
utilized a $g_{m0}(I_{den})$ curve and estimate of $C_{L0}$ generated in Hspice from a SPICE
model file for the 0.18µm CMOS process. The top curve assumes $g_{ds} = 0$, while the
bottom one includes its effect based on $g_{ds0}(I_{den})$ generated from Hspice. In either
case, we see that gain-bandwidth product is increased as current density is increased,
so that high current density is desirable in high speed applications.
The second useful insight of the proposed method is that it reveals that current
density is not a free variable - it is determined by the gain and swing requirements
of the amplifier as well as the $g_{m0}(I_{den})$ and $g_{ds0}(I_{den})$ characteristics of the device.
Figure 4-4 displays a graphical interpretation of Equation 4.4 in setting the current
density. Ignoring the influence of $g_{ds0}(I_{den})$, the current density is determined as the
intersection of the $g_{m0}(I_{den})$ curve for the CMOS process with a straight line whose
slope is $2|A_v|/V_{sw}$. As gain is increased relative to a given voltage swing, the line slope
is increased and $I_{den}$ must be reduced. Combining this observation with Figure 4-3,
we see that higher gains lead to reduced gain-bandwidth products.
Note that the impact of finite output conductance, $g_{ds0}(I_{den})$, is to add to the
straight line whose slope is $2|A_v|/V_{sw}$, which leads to further reduction of the resulting
current density setting. Therefore, finite output conductance degrades the achievable
gain-bandwidth product of the differential amplifier structure.
Figure 4-3: Calculated Gain-Bandwidth product vs Iden.
4.4 Results
The proposed procedure was used to design several differential amplifiers in a 0.18µm
CMOS process (only NMOS devices were used) with varying gain values. The swing
and power dissipation were held constant at $V_{sw}$ = 1V and $I_{bias}$ = 2mA, respectively,
and the bandwidth was calculated based on Equation 4.6. Table 4.1 displays a
comparison of the calculated gain and bandwidth values to the Hspice simulation results.
In the Hspice simulation, the amplifier has the same topology as shown in Figure 4-1
and is loaded by an identical amplifier stage whose output is set to a constant voltage
in order to eliminate Miller effect on the capacitive load it presents.
Figure 4-4: Current density settings versus gain/swing.
Table 4.1: Calculated vs simulated amplifier performance.
Target Gain    Calculated BW (GHz)    Simulated Gain    Simulated BW (GHz)
2.00           14.45                  2.03              13.74
3.00           8.30                   3.02              8.17
4.00           5.18                   4.00              5.31
5.00           3.27                   4.98              3.48
6.00           1.99                   5.97              2.19
Table 4.1 reveals that the proposed design procedure is quite accurate with
respect to achieving the desired gain for the amplifier. The calculated versus simulated
bandwidth values are not as accurate, but are still within ±10% of each other.
The discrepancy in bandwidth is likely due to the fact that the capacitive load is not
strictly a linear function of W as assumed in Equation 4.5.
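The accuracy claims can be checked directly from the entries of Table 4.1. The short sketch below computes the percentage error of the simulated gain relative to the target and of the calculated bandwidth relative to the simulated bandwidth; the choice of the simulated value as the bandwidth reference is an assumption made here for illustration.

```python
# (target gain, calculated BW in GHz, simulated gain, simulated BW in GHz)
TABLE_4_1 = [
    (2.00, 14.45, 2.03, 13.74),
    (3.00, 8.30, 3.02, 8.17),
    (4.00, 5.18, 4.00, 5.31),
    (5.00, 3.27, 4.98, 3.48),
    (6.00, 1.99, 5.97, 2.19),
]

def percent_error(value, reference):
    """Signed error of value relative to reference, in percent."""
    return 100.0 * (value - reference) / reference

gain_errors = [percent_error(sim_gain, target)
               for target, _, sim_gain, _ in TABLE_4_1]
bw_errors = [percent_error(calc_bw, sim_bw)
             for _, calc_bw, _, sim_bw in TABLE_4_1]

for row, ge, be in zip(TABLE_4_1, gain_errors, bw_errors):
    print(f"gain {row[0]:.2f}: gain error {ge:+.2f}%, BW error {be:+.2f}%")
```

The bandwidth error grows in magnitude toward the high-gain rows, where the current density is lowest.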
4.5 Application to SCL Digital Circuits
It is interesting to note that high speed digital structures also make use of such
differential amplifier structures. Figure 4-5 illustrates a high speed SCL latch and
a NAND/AND gate. The differential amplifiers embedded in such structures are
turned on or off based on other differential pairs below them. When turned on, their
behavior corresponds to that of the basic differential amplifier structure.
We have found that the following heuristic design method works well for such
circuit structures:
1. Use the proposed method to design the differential amplifier portion of the
structure with given gain, swing, and bias current requirements. We have found
that a choice of gain in the range of 1.25 to 1.75 works well (the swing and bias
current values depend on the application). In the latch example of Figure 4-5
(a), this step yields sizes for M0 and M1.
2. Choose identical sizes for transistors that feed off the same diff pair as the
differential amplifier above. In the latch example, this would lead to M2 and
M3 having the same sizes as M0 and M1.
3. Choose sizes that are roughly 20% larger for the transistors that feed the above
differential pairs. In the latch example, the widths of M4 and M5 would then
be set to be 20% higher than the widths of M0 and M1 (L should be minimum
in all cases). Note that this progressive scaling technique is commonly applied
in digital design (see page 298 of [21]) - the value of 20% is not necessarily
optimal but has worked well for us in practice.
(a) SCL Latch
Figure 4-5: Digital high speed circuits.
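The three-step heuristic for the latch example can be summarized in a few lines. This is only a sketch of the sizing rule as stated above; the function name, the dictionary layout and the example width are hypothetical, and the width passed in is assumed to come from the differential amplifier design procedure of this chapter.

```python
def scl_latch_widths(w_diff_pair, lower_scale=1.20):
    """Apply the sizing heuristic to the latch of Figure 4-5(a).

    w_diff_pair: width of M0/M1 from the differential amplifier design.
    lower_scale: width ratio for the devices that feed the differential
                 pairs (M4, M5); 1.20 corresponds to the 20% rule.
    """
    widths = {name: w_diff_pair for name in ("M0", "M1", "M2", "M3")}
    widths.update({name: w_diff_pair * lower_scale for name in ("M4", "M5")})
    return widths

latch = scl_latch_widths(w_diff_pair=8e-6)   # hypothetical 8 um input pair
for device in sorted(latch):
    print(device, latch[device])
```

Keeping the scale factor as a parameter reflects the remark that 20% is a practical starting point, not a proven optimum.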
4.6 Summary
This chapter presented a simple numerical technique to design high speed differential
amplifiers with resistor loads without relying on square law assumptions for the CMOS
devices. By combining hand analysis with SPICE generated data, intuition of such
issues as gain-bandwidth product properties is achieved while still obtaining highly
accurate design calculations. Calculations from the method were compared to Hspice
simulations, and reveal that the formulations are highly accurate with respect to
achieving desired gain, and reasonably accurate for bandwidth estimation. A heuristic
extension of the method can be applied to high speed SCL logic gates and latches. The
next chapter will discuss the circuit design issues related to each system component.
(b) SCL NAND/AND gate
Chapter 5
Circuit Design of Systems Blocks
This section of the thesis is intended to highlight important design details for each
of the critical system blocks. The limit amplifier design will be presented first since
the design of the peak detector, integrator and control logic depend on the final limit
amplifier design. Next, the peak detector and integrator design will be presented.
Finally, the feedback control logic and output buffer will be briefly discussed. Full
cell schematics and additional characterization details for each cell can be found in
Appendix B.
5.1 High Speed Limit Amplifier
The limit amplifier is not the main focus of this thesis; rather, it is the system to
be compensated. It is necessary to implement the limit amplifier to demonstrate
the performance of the proposed offset compensation technique. There are numerous
ways to implement the amplifier structure. As explained in Chapter 1, the selected
topology is a limit amplifier composed of a cascade of resistively loaded differential
amplifiers as shown in Figure 5-1. Following are the design specifications for the limit
amplifier:
* Total Gain: > 100 (> 40dB)
* Total Bandwidth (3dB): > 6GHz
* RMS Jitter: < 1ps
* Output Swing: 1V
* Input Sensitivity: 3mV
Figure 5-1: High-Speed, Multi-Stage Limit Amplifier
A key question to ask is how are these design constraints determined, specifically,
the gain, output swing and bandwidth? The total gain of the limit amplifier is
based on the difference between the TIA output amplitude and the clock and data
recovery (CDR) input amplitude requirements. The output swing is determined by
a combination of the CDR input requirements and its influence on bandwidth and
power dissipation. The total bandwidth is set according to the data rate to minimize
inter-symbol interference (ISI). The gain per stage is set to balance the trade-off
between maximizing the bandwidth and power efficiency and maximizing the input
sensitivity. The number of stages is calculated by simply dividing the total gain by
the gain per stage, with both expressed in dB. Once the number of stages has been determined, the required
bandwidth per stage can then be calculated. Therefore, the gain, output voltage
swing and bandwidth are based on trade-offs between signal conditioning, power
dissipation and input SNR requirements. Chapter 4 presented the design methodology
for determining the appropriate biasing, transistor widths and resistance once the
gain, swing and bandwidth are determined.
5.1.1 Determining Optimal Number of Stages
The total gain, bandwidth and output swing for the limit amplifier are fixed and
the number of stages must be designed to meet the above specifications. How do
we determine the optimal number of stages, or gain per stage? Also, how does the
optimal number of stages required to maximize bandwidth compare to the optimal
number of stages required to minimize power dissipation or to minimize the input
referred noise (i.e. maximize the input sensitivity)? The following discussion will
address these issues.
Optimal Number of Stages for Maximum Bandwidth
Following the derivation in Chapter 8 of [20], we can determine the optimal gain per
stage, or conversely the optimal number of stages, of the limit amplifier to maximize
bandwidth for a given total amplifier gain. As before, if we model each stage of the
limit amplifier by a gain and a single pole, the model for the limit amplifier is:
$$\frac{V_{out}}{V_{in}} = \left[ \frac{A_v}{1 + j\omega_1/\omega_0} \right]^n \tag{5.1}$$
$A_v$ is the gain/stage, $\omega_1$ is the overall bandwidth of the limit amplifier, $\omega_0$ is the
bandwidth of each stage and n is the number of stages. Setting the magnitude of
(5.1) equal to $A_v^n/\sqrt{2}$ and solving for $\omega_1$, we can see
that as the number of stages, n, increases, the bandwidth decreases much slower than
the gain increases:
$$\omega_1 = \omega_0 \sqrt{2^{1/n} - 1} \tag{5.2}$$
This means that we can increase the gain-bandwidth product as the number of stages
increases, to a limit. If the total gain of the limit amplifier is G then the gain per
stage is $A_v = G^{1/n}$. After a significant amount of algebra (refer to Appendix D), the
optimal gain per stage to maximize the total bandwidth for a specified total gain is
found to be:
$$G^{1/n} = e^{1/2} \tag{5.3}$$
Therefore, the optimal gain per stage, neglecting the impact on power dissipation,
implementation size and noise, is approximately 1.65.
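The optimum can also be located by brute force. The sketch below assumes, as in the single-pole cascade model above, that every stage has the same gain-bandwidth product, so that the per-stage bandwidth falls as the per-stage gain rises. It evaluates the total bandwidth, normalized to that per-stage gain-bandwidth product, for each integer number of stages and picks the best. The function names are illustrative.

```python
import math

def normalized_bandwidth(total_gain, n):
    """Total bandwidth divided by the per-stage gain-bandwidth product.

    Each of the n identical single-pole stages has gain G**(1/n); the
    cascade shrinks the bandwidth by sqrt(2**(1/n) - 1) (Equation 5.2).
    """
    gain_per_stage = total_gain ** (1.0 / n)
    return math.sqrt(2.0 ** (1.0 / n) - 1.0) / gain_per_stage

def best_stage_count(total_gain, n_max=30):
    """Integer number of stages that maximizes the total bandwidth."""
    return max(range(1, n_max + 1),
               key=lambda n: normalized_bandwidth(total_gain, n))

n_opt = best_stage_count(100.0)
print("stages:", n_opt, " gain per stage:", 100.0 ** (1.0 / n_opt))
```

For a total gain of 100 the search returns nine stages, i.e. a gain per stage of about 1.67, close to the $e^{1/2}$ result of Equation 5.3; the small difference comes from restricting n to integers.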
The following discussion is based on the analysis presented in [22]. The total limit
amplifier bandwidth, normalized to $\omega_t = A_v \omega_0$, is plotted as solid lines in Figure 5-2
versus n for total gains, G, of 10, 100 and 1000. The total normalized bandwidths are
also plotted versus n for gain per stage, $A_v$, of 1.65 (the optimum), 2 and 3 as dashed
lines. The intersection of two lines corresponds to a possible design point. The line
corresponding to a gain of $A_v$ = 1.65 per stage intersects each of the solid lines at
its peak, indicating maximum achievable bandwidth. This agrees with our previous
analysis. Each of the solid curves for total normalized bandwidth is quite shallow
at its maximum, especially for large total gains. From a designer's perspective,
this is desirable because it provides flexibility in the design. A gain per stage slightly
different from the optimal value can be used with minimal impact on total bandwidth.
The advantage is an extra degree of freedom in determining the number of stages.
Figure 5-2: Number of Stages vs Total Bandwidth: Normalized Total Bandwidth vs
Number of Stages for $G^{1/n} = A_v$ = 1.65, 2.0 and 3.0 and Total Gains, G, of 10, 100
and 1000
Optimal Number of Stages for Minimum Power Dissipation
Power dissipation is an important consideration that must be considered in the design.
If the power dissipation of the final design is too high then the design will not be
practical. As mentioned in the previous section, as n increases to the number required
to maximize bandwidth, the total gain-bandwidth product also increases. This occurs
because the total gain increases faster than the bandwidth decreases. As n continues
to increase past the optimum, the gain-bandwidth product decreases. Therefore, for
a fixed total gain and bandwidth, the power per stage should also decrease faster than
the total power increases as n increases to the optimum. After n exceeds the optimum,
the power per stage continues to decrease but the total power starts to increase.
Intuitively, there should be some number of stages that provides the minimum power
dissipation.
Making use of the design methodology presented in Chapter 4, a Matlab script
was generated that returns the transistor width, resistance and bias current required
for a specified bandwidth. The script can be found in Appendix E. Using this
information, the total power dissipation for the limit amplifier is plotted in Figure 5-3
as the dashed lines versus n for a total gain, G, of 100 and bandwidths of 2, 4, 6, 8
and 10GHz. The point at the minimum of each line corresponds to the number of
stages required to achieve minimum power dissipation for each bandwidth. Point A,
which lies at the minimum power for a bandwidth per stage of 8GHz, corresponds
to 9 amplifier stages, or a gain per stage of 1.65. Therefore, the number of stages
required for maximum bandwidth determined in Figure 5-2 equals the number of
stages required for minimum power dissipation. Moving to the minimum of a lower
dashed line results in a lower total bandwidth, but minimum power dissipation for
the resulting bandwidth. The simulated results generated with the script match the
intuition presented above. This analysis demonstrates the powerful result that it
is possible to simultaneously achieve maximum total bandwidth and minimal total
power dissipation for a specified total gain and bandwidth.
Optimal Number of Stages to Minimize Total Noise
The input sensitivity of the limit amplifier is limited by the total input referred
noise from the amplifier. If the input referred noise is too large then the input SNR
decreases. Similar to the analysis performed above, we can analyze the impact on
the number of stages, or the gain per stage, on the input sensitivity. Figure 5-4a
shows a single limit amplifier stage. Figure 5-4b displays the half-circuit for each
limit amplifier stage with the noise sources for each of the devices included. The
current noise source for the resistor thermal noise is:
$$\overline{i_{n,R}^2} = 4KT \frac{1}{R} \Delta f \tag{5.4}$$
where K is Boltzmann's constant ($K = 1.38 \times 10^{-23}$ J/K), R is the resistance value
and $\Delta f$ is the bandwidth of interest. Both of the input devices, M1 and M2, and
the tail current source, M3, have flicker noise, thermal noise and gate noise. The
cross-over frequency, $f_c$, is defined as the frequency where the flicker noise and the
thermal noise of the transistor are equal and depends on process and circuit variables.
Since $f_c$ is on the order of 10's of MHz and the limit amplifier is broadband, with a
bandwidth of 10GHz, we can ignore flicker noise to first order. Gate noise is a high
frequency, narrow-band noise phenomenon that can also be ignored to first order.
Therefore, in the half-circuit model the channel induced drain noise current for M1
and M3 can be expressed as:
$$\overline{i_{nd,M1}^2} = 4KT \gamma\, g_{do,M1} \Delta f \tag{5.5}$$
$$\overline{i_{nd,M3}^2} = 4KT \gamma\, g_{do,M3} \Delta f \tag{5.6}$$
In the above equations, $\gamma$ is the excess noise factor, which is process dependent
and is roughly 2-3 in short channel devices. $g_{do}$ is the output conductance of the
transistors at zero $V_{ds}$. Input referring all of these noise sources to a voltage noise
source, as shown in Figure 5-4c, yields:
$$\overline{v_{n,in}^2} = \frac{4KT}{g_{m,M1}^2} \left[ \frac{1}{R} + \gamma\, g_{do,M1} + 0.5 \cdot \gamma\, g_{do,M3} \right] \Delta f \tag{5.7}$$
Figure 5-3: Total Power Dissipation of Limit Amplifier for a Total Gain of 100 and
Bandwidths/Stage from 2-10GHz
Figure 5-4: (a) Full Resistive-Loaded Differential Amplifier (b) Half-Circuit with
Noise Sources Added (c) Half-Circuit with Noise Source Referred to Input
Finally, to determine the total input referred noise voltage of the limit amplifier,
refer each of the individual input referred noise sources in each stage to the input of
the limit amplifier. Each noise voltage will simply be inversely scaled by the gain of
the preceding stages, so each mean-square noise voltage is scaled by the square of
that gain. Therefore, the total input referred noise voltage of the limit
amplifier is:
$$\overline{v_{n,in,tot}^2} = \sum_{i=1}^{n} \frac{\overline{v_{n,in,i}^2}}{A_v^{2(i-1)}} \tag{5.8}$$
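where n is the number of stages and $A_v$ is the gain per stage. To understand how the
number of stages in the limit amplifier impacts the total noise, we need to make a
few assumptions. First, we can assume that the tail current source adds negligibly
to the total input referred noise. Next, if we assume that $g_m = \alpha g_{do}$, where $\alpha$ is a
scaling factor, and use $A_v \approx g_{m,M1} R$, then we can simplify Equation 5.7 to:
$$\overline{v_{n,in}^2} \approx \frac{4KT}{g_{m,M1}^2} \left[ \frac{1}{R} + \gamma\, g_{do,M1} \right] \Delta f \tag{5.9}$$
$$= \frac{4KT}{g_{m,M1}^2} \left[ \frac{1}{R} + \frac{\gamma}{\alpha}\, g_{m,M1} \right] \Delta f \tag{5.10}$$
$$= \frac{4KT}{g_{m,M1}} \left[ \frac{1}{A_v} + \frac{\gamma}{\alpha} \right] \Delta f \tag{5.11}$$
$$= \frac{4KTR}{A_v} \left[ \frac{1}{A_v} + \frac{\gamma}{\alpha} \right] \Delta f \tag{5.12}$$
To be conservative, we assume that the excess noise factor $\gamma$ = 3 and that $\alpha$ = 0.5.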
Finally, using the amplifier script in Appendix E, the total input referred voltage noise
is plotted versus n for 3 different amplifier stage bandwidths and a total gain of 100 in
Figure 5-5. The lines marked with diamonds are the total input referred noise voltage
of the limit amplifier. The lines without the diamonds are the total input referred
noise voltage due only to the input devices of each amplifier stage. It is evident that
the channel noise of the input devices dominates the total noise and that the total
input referred noise increases nearly linearly with the number of stages. Interestingly,
the total noise spectral density decreases as the bandwidth of the limit amplifier
increases. This is expected since, for a fixed gain, the current density increases as
the bandwidth requirement increases. As the current density increases the resistance
decreases and the transistor transconductance increases to maintain a constant gmRo
product. However, the improved noise performance is achieved at the expense of
higher power dissipation. This trade-off between noise and power dissipation is well
documented [13, 23, 1]. From Equation 5.10 the total noise is inversely proportional
to gm, to first order, so it will decrease as the bandwidth increases. Therefore, to
minimize the total noise, and maximize the input SNR, it is desirable to minimize
the total number of stages in the limit amplifier.
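The trend with the number of stages can be illustrated with a short calculation based on Equations 5.8 and 5.12. This is a simplified sketch and not the Appendix E script: it assumes identical stages, a fixed and hypothetical load resistance for every stage count (in the actual design the resistance changes with the bandwidth and gain targets), a temperature of 300K, and that mean-square noise is referred to the input by the squared gain of the preceding stages.

```python
K_BOLTZMANN = 1.38e-23   # J/K

def stage_noise_psd(r_load, gain, gamma=3.0, alpha=0.5, temp_k=300.0):
    """Equation 5.12: input-referred noise of one stage in V^2/Hz."""
    return (4.0 * K_BOLTZMANN * temp_k * r_load / gain
            * (1.0 / gain + gamma / alpha))

def total_noise_psd(r_load, total_gain, n_stages,
                    gamma=3.0, alpha=0.5, temp_k=300.0):
    """Equation 5.8 for n identical stages, in V^2/Hz."""
    gain = total_gain ** (1.0 / n_stages)
    per_stage = stage_noise_psd(r_load, gain, gamma, alpha, temp_k)
    # Stage i (counting from 0) is preceded by i stages of voltage gain.
    return sum(per_stage / gain ** (2 * i) for i in range(n_stages))

R_EXAMPLE = 200.0        # hypothetical load resistance in ohms
for n in (3, 6, 9, 12):
    psd = total_noise_psd(R_EXAMPLE, 100.0, n)
    print(n, "stages:", psd ** 0.5, "V/sqrt(Hz)")
```

Even with the resistance held fixed, the total rises with the number of stages because the gain per stage falls, which both raises the noise of each stage and weakens the attenuation of later stages when referred to the input.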
Figure 5-5: Total Input Referred Voltage Noise Versus Number of Amplifier Stages
for a Fixed Total Gain
Finally, we can translate the noise performance to input sensitivity. The input
sensitivity can be determined using the following expression [4]:
$$V_{in,min} = 12 \cdot v_{n,in,tot} \sqrt{BW} \tag{5.13}$$
where BW is the total bandwidth of the limit amplifier. The minimum input voltage
is plotted in Figure 5-6 for the same conditions as in Figure 5-5.
Figure 5-6: Minimum Input Voltage Versus Number of Amplifier Stages for a Fixed
Total Gain
5.1.2 Bandwidth Extension Techniques
The analysis in the previous sections assumed that the amplifier stages could be mod-
eled by a gain and single pole which provides a first-order response. However, there
are several design techniques that can be used to provide a higher order response and,
therefore, enhance the bandwidth of the amplifier. The most common technique for
enhancing the bandwidth of resistively loaded, differential amplifiers is shunt peaking
with inductors [20, 24]. However, a significant disadvantage of using inductors in in-
tegrated designs is their large area. For this reason, this design will not use inductors
or shunt peaking.
A simple trick known as neutralization [20] utilizes negative capacitors to cancel
the gate to drain capacitance, $C_{gd}$, as shown in Figure 5-7. Although not as
effective as shunt peaking, neutralization can provide significant bandwidth enhancement
while occupying much less die area. The neutralization capacitors, $C_n$, are connected
in parallel with $C_{gd}$ between each input and output with a gain of -1 as shown in
Figure 5-7a. When the input goes high the charge is supplied to $C_{gd}$ by $C_n$ rather
than the driving stage, effectively increasing the bandwidth of the limit amplifier.
Similarly, when the input goes low the charge on $C_{gd}$ simply transfers to $C_n$ rather
than discharging to ground.
A severe obstacle to using the neutralization technique is implementing the inverting
buffers at each output. Recognizing that the differential outputs are 180° out
of phase, the neutralization capacitors can simply be connected between each input
and the opposite output, as shown in Figure 5-7b. If Cn is smaller than Cgd, there
will be a residual Miller effect. If Cn is the same size as Cgd, the Miller effect will be
eliminated. Following this train of thought, if Cn is larger than Cgd there will be a
net negative capacitance, or an equivalent inductive effect over a narrow frequency
band. Over a broad frequency range, the magnitude of the impedance of an inductor
is proportional to frequency. However, the magnitude of the impedance of the
negative capacitance is inversely proportional to frequency. The final size of Cn was
determined by inspecting eye diagrams of the limit amplifier output.
Figure 5-7: Limit Amplifier Stage with Neutralization Capacitors
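The three cases above can be checked with a short numerical sketch. It assumes the standard first-order Miller approximation, which the text does not state explicitly: with a differential stage gain of magnitude A, Cgd sees (1 + A) times the input swing while the cross-coupled Cn sees (1 - A) times the input swing, so the capacitance added at each input is Cgd(1 + A) + Cn(1 - A) = (Cgd + Cn) + A(Cgd - Cn). The function names and the capacitor values are hypothetical.

```python
def miller_term(cgd, cn, gain):
    """Gain-multiplied part of the input capacitance, A * (Cgd - Cn).

    First-order Miller approximation for a differential stage with
    cross-coupled neutralization capacitors (an assumption, not a
    formula taken from the thesis).
    """
    return gain * (cgd - cn)


def input_capacitance(cgd, cn, gain):
    """Total capacitance contributed at one input by Cgd and Cn."""
    return (cgd + cn) + miller_term(cgd, cn, gain)


if __name__ == "__main__":
    cgd = 10e-15   # hypothetical gate-drain capacitance (F)
    gain = 2.0     # per-stage gain used in this design
    for cn in (5e-15, 10e-15, 15e-15):
        print(f"Cn = {cn * 1e15:4.1f} fF -> Miller term = "
              f"{miller_term(cgd, cn, gain) * 1e15:+5.1f} fF, "
              f"total = {input_capacitance(cgd, cn, gain) * 1e15:5.1f} fF")
```

A positive Miller term corresponds to the residual Miller effect, zero to full cancellation, and a negative term to the net negative capacitance described above.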
Other bandwidth enhancement techniques such as inverse scaling of the limit
amplifier stages were investigated but not employed in this design. Inverse scaling
would have significantly increased the layout burden and the amplifier design was not
the focus of this thesis.
5.1.3 Final Amplifier Design
To review the design goals, the limit amplifier is designed for a total gain of 40dB,
bandwidth of 6GHz, output swing of 1V and input sensitivity of 3mV. Based on
the analysis above, the final amplifier design was performed using the Matlab script
included in Appendix E. The gain-bandwidth requirement of the limit amplifier
tests the upper performance limits of this process. Therefore, the gain per stage was
selected near the value corresponding to the optimal number of stages, which
maximizes the bandwidth for the desired
gain. There is some flexibility in the choice of n since the curves of total bandwidth
versus number of stages in Figure 5-2 are very shallow. Considering the impact of
large n on power and noise, the amplifier was designed with 7 stages, rather than
the optimum of 9. The impact of this choice is a 1.2% reduction in total bandwidth
and a 25% increase in power dissipation. However, the total input referred noise is
reduced from approximately 6.9nV/√Hz to just under 2.5nV/√Hz, resulting in an
improvement in input sensitivity from 8.3mV to 3.0mV. To keep the power of the
chip manageable, the total power dissipation for the limit amplifier was limited to
100mW, or 55.5mA from a 1.8V supply. This power budget translates to 7.9mA bias
current per stage with 7 amplifier stages. Based on the total gain goal and number of
stages, each amplifier stage was designed for a gain of 2. The output swing was set to
1V. Based on these design choices and the script in Appendix E, each limit amplifier
stage has a bandwidth of 9.0GHz, neglecting the enhancement techniques. The total
bandwidth of the limit amplifier neglecting enhancement techniques is 2.9GHz. With
the neutralization technique described in Section 5.1.2 the final bandwidth is roughly
4.0GHz, less than the desired bandwidth of 6.0GHz.
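The arithmetic behind these design values can be reproduced with a short sketch. It assumes n identical single-pole stages, for which the cascaded 3dB bandwidth is the per-stage bandwidth multiplied by sqrt(2^(1/n) - 1). The function names are illustrative and are not taken from the Matlab script in Appendix E.

```python
import math


def gain_per_stage(total_gain_db, n_stages):
    """Linear gain each stage must supply for a given total gain in dB."""
    total_gain = 10 ** (total_gain_db / 20)
    return total_gain ** (1 / n_stages)


def bias_current_per_stage(power_w, supply_v, n_stages):
    """Bias current available to each stage from a total power budget."""
    return power_w / supply_v / n_stages


def cascaded_bandwidth(stage_bw_hz, n_stages):
    """3dB bandwidth of n identical single-pole stages in cascade."""
    return stage_bw_hz * math.sqrt(2 ** (1 / n_stages) - 1)


if __name__ == "__main__":
    n = 7
    print(f"gain per stage : {gain_per_stage(40, n):.2f}")
    print(f"bias per stage : {bias_current_per_stage(0.1, 1.8, n) * 1e3:.1f} mA")
    print(f"total bandwidth: {cascaded_bandwidth(9.0e9, n) / 1e9:.1f} GHz")
```

With the values quoted above (40dB, 100mW, 1.8V, 7 stages, 9.0GHz per stage) this gives a per-stage gain of about 1.9, roughly 7.9mA per stage and a total bandwidth of about 2.9GHz before any bandwidth enhancement, consistent with the text.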
The total bandwidth predicted by the linear model in Section 5.1.1 for a cascaded
amplifier design may be a conservative estimate [24]. In the analysis for optimal gain
per stage we assumed that each of the amplifier stages operates in the small-signal
region. We also assumed that the bandwidth of each stage is related to a single time
constant determined by the output resistance of the driving stage in parallel with
the total capacitance at the output. However, the later amplifier cells experience a
larger amplitude input than the first several stages, completely switching the bias
current, and exhibit large-signal behavior. The large transconductance of the input
devices of the later stages quickly switches the bias current and sharpens the edge rates
of the signals. When the delayed current flows to the output it only has to charge
the capacitance of the load resistor and input capacitance of the loading stage. The
capacitance due to the input stage has effectively been eliminated and the speed is
therefore determined by a single stage.
Ultimately, the performance of the limit amplifier must be judged by the quality
of the opening of the eye diagram. Figure 5-8 exhibits Hspice eye diagrams at the
output of the limit amplifier, loaded by the output buffer, for data rates of 5Gbps
and 10Gbps and for input amplitudes of 2mV, 20mV and 200mV (peak-to-peak).
The 5Gbps eye diagrams exhibit almost no ISI and all three eyes are very open.
The peak-to-peak jitter due to the offset compensation and ISI is less than 1.5ps
for all three eyes. The 10Gbps eye diagrams suffer from significant ISI and the eye
for Vin = 2mVpp exhibits closure. Due to the limited bandwidth of the limit amplifier, the
outputs do not reach the minimum and maximum values for successive high-low bits
that transition at the maximum transition rate. The peak-to-peak jitter due to the
offset compensation and the limited bandwidth of the limit amplifier for the three
eyes is 4.5ps, 3.5ps and 2.3ps. All three eyes exhibit jitter which is greater than the
goal of 2ps.
Figure 5-8: Eye Diagrams for the Limit Amplifier at Data Rates of 5Gbps and 10Gbps
and Input Amplitudes of 2mV, 20mV and 200mV Peak-to-Peak
5.2 Peak Detector
The peak detector topology, shown in Figure 5-9, and its operation were discussed in
detail in Chapter 2. In this section we will review the basic operation of the peak
detector and analyze the two most significant design issues for the peak detector.
The first issue is that variations in the peak detector pole location directly impact
the system dynamics. Techniques for minimizing this undesired effect are discussed.
Also, the size of the peak detector input devices represents a trade-off between limit
amplifier bandwidth and offset introduced by the peak detector input devices. This
trade-off will be explored. We will also explore how the peak detector was designed
to allow its bandwidth to be programmed.
As discussed in Chapter 2, when the input is high, the switch devices M3 and M4
are on and the peak detector behaves as a traditional peak detector (source follower).
With the input high, the bandwidth of the peak detector can be expressed as:
$$f_{3dB} = \frac{g_{m,M1/2}}{2\pi C_L} \qquad (5.14)$$
Alternately, when the input is low the switch devices are open and the output
will retain its current value minus any droop. The amount of droop is determined by
the off-state leakage of the switch devices, M3 and M4, rather than the bias current.
In contrast to traditional peak detector designs, the bandwidth of the proposed peak
detector (when the input is high) can be set independently from the droop (when
the input is low).
In Chapter 3 we determined that the dominant open-loop pole of the offset com-
pensation corresponded to the pole of the peak detector. To minimize variation in the
system dynamics we would like to define the pole location to minimize its sensitivity
to process and temperature variations. Biasing devices M1 and M2 in the peak detector
in subthreshold causes gm,M1/2 to be a linear, rather than quadratic, function
of the bias current and independent of device dimensions to first order. Therefore,
variations in the bias current and physical device dimensions will have a minimal impact
on the dominant open-loop pole location. Additionally, it is trivial to reprogram
the bandwidth for different system dynamics if the bandwidth is a linear function of
the bias current.
Figure 5-9: Simplified Schematic of Fully Differential Peak Detector
The most significant design issue for the peak detector is the trade-off between
the offset due to device mismatch, introduced by the peak detector input devices, and
the limit amplifier bandwidth. As we know from the development in Appendix A,
the total offset due to device mismatch is inversely proportional to the square-root
of the peak detector input device gate area. Additionally, this offset will be directly
referred to the output of the limit amplifier through a low-pass filter transfer function.
However, the gate capacitance, and hence the loading on the limit amplifier output
due to the peak detector input, increases linearly with the gate area of the input
devices. The gate oxide capacitance density is approximately 8.5fF/µm², so large peak detector
input devices cannot be tolerated. To limit the impact on the limit amplifier bandwidth,
the total gate capacitance of the peak detector input devices was limited to
10fF. This results in input device dimensions of 4.0µm x 0.35µm (WxL).
One potential downside to biasing the input devices in subthreshold is that, by
definition, they operate at low current densities (amps/width). For a desired band-
width the peak detector must satisfy some gm/C ratio, which is linearly proportional
to Ibias/C. Due to the upper bound placed on total gate capacitance of the input
devices, the peak detector has a maximum achievable bandwidth. In this process,
the current density for the peak detector must remain below 1µA/µm to stay in
subthreshold. Equivalently, the total bias current must be less than 4µA since the
width of the input devices is 4µm. With a load capacitance of 1pF, the maximum
bandwidth of the peak detector at room temperature is:
$$BW_{max} = \frac{g_m}{2\pi C_L} = \frac{q I_{bias}}{2\pi C_L n k T} \approx 20\,\mathrm{MHz} \qquad (5.15)$$
where n is a fitting constant approximately equal to 1.5 and T is 294 K. The bandwidth
was simulated using Hspice across process variations and a temperature range
from 0°C to 70°C. The maximum bandwidth of the peak detector varies between
16.3MHz (slow/70°C) and 19.5MHz (fast/0°C) with values between 17.9MHz and
18.4MHz at 27°C across process variation. Peak detector bandwidths between
1.7MHz and 17MHz are required to achieve loop bandwidths between 1MHz and
10MHz so this design satisfies the design goals. In case the measured results differ
dramatically from the simulated results, the peak detector bias current and load
capacitance can be independently adjusted to compensate the design.
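As a rough check of Equation 5.15, the sketch below evaluates the subthreshold transconductance gm = q·Ibias/(n·k·T) and the resulting bandwidth for the values quoted above (Ibias = 4µA, CL = 1pF, n = 1.5, T = 294K). The function name is illustrative, and the physical constants are standard values rather than numbers from the thesis.

```python
import math

Q = 1.602176634e-19   # electron charge (C)
K_B = 1.380649e-23    # Boltzmann constant (J/K)


def peak_detector_bw(i_bias, c_load, n=1.5, temp_k=294.0):
    """Bandwidth gm / (2*pi*CL) with gm = q*Ibias / (n*k*T) in subthreshold."""
    gm = Q * i_bias / (n * K_B * temp_k)
    return gm / (2 * math.pi * c_load)


if __name__ == "__main__":
    bw = peak_detector_bw(4e-6, 1e-12)
    print(f"BWmax = {bw / 1e6:.1f} MHz")
    # Bandwidth scales linearly with bias current, which is what makes
    # the pole location easy to reprogram.
    print(f"BW at 0.4 uA = {peak_detector_bw(0.4e-6, 1e-12) / 1e6:.2f} MHz")
```

With these inputs the expression evaluates to roughly 17MHz, the same order as the approximately 20MHz quoted in Equation 5.15 and within the 16.3MHz to 19.5MHz range seen in simulation.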
5.3 Integrator
There were several challenges with designing the integrator. Each of the issues will
be discussed and the final integrator design, which addresses each of the issues, will
be presented. Figure 5-10 shows a typical implementation of a differential active
integrator.
Figure 5-10: Basic Differential RC Integrator
The transfer function is:
$$\frac{V_{out}}{V_{in}} = \frac{1}{RC} \cdot \frac{1}{s} \qquad (5.16)$$
and corresponds to a gain of 1/(RC) cascaded with an ideal integrator. The unde-
sirable characteristic of this design for integrated CMOS designs is that it presents a
resistive load and can significantly attenuate the gain of the driving stage. Ideally, the
integrator should have a purely capacitive input impedance. Therefore, the gmC filter
shown in Figure 5-11, which satisfies the input impedance requirement, was used as
the integrator. Unfortunately, the filter's frequency response, shown here, does not
match the desired response from Equation 5.16:
$$\frac{V_{out}}{V_{in}} = \frac{g_{m,M1} \cdot (R_{o,M1} \parallel R_{o,M4})}{1 + s \cdot (R_{o,M1} \parallel R_{o,M4}) \cdot C} = \frac{K_2}{1 + s/p_3} \qquad (5.17)$$
where Ro,M1 and Ro,M4 are the output impedances of M1 and M4. Specifically, the
gmC filter, as designed, has a finite DC gain:
$$\frac{V_{out}}{V_{in}}(s = 0) = g_{m,M1} \cdot (R_{o,M1} \parallel R_{o,M4}) = K_2 \qquad (5.18)$$
and a dominant low-frequency pole:
$$p_3 = \frac{1}{(R_{o,M1} \parallel R_{o,M4}) \cdot C} \qquad (5.19)$$
Figure 5-11: Simplified Schematic of Differential gmC Integrator
Since the gain is proportional to the output resistance and the pole location is inversely
proportional to the output resistance, the filter will approximate an ideal integrator
for large output resistance. There are two main design techniques that can be used
to increase the output resistance: cascode the output devices M1-M2 and M4-M5
and/or use long length devices. Unfortunately, it was not feasible to use cascoded
devices in this design due to head-room issues. The gate lengths of M1-M2 and M4-M5
were set to 1µm to achieve the desired high output impedance. To determine if
this restriction is an issue, we need to investigate the impact of the new design on
the overall compensation performance.
To understand the impact of the non-ideal characteristics of the proposed imple-
mentation on the system operation, we will refer back to Chapter 3 where the linear
system model was developed and compare the impact of the two integrator models on
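the system transfer function. First, expanding Equation 3.6 and solving for Vout/Vin
yields:
$$G(s) = \frac{A(s)}{1 + A(s)} \qquad (5.20)$$
$$= \frac{A_v K_1 K_2}{s \cdot (1 + s/p_1) \cdot (1 + s/p_2) + A_v K_1 K_2} \qquad (5.21)$$
$$\frac{V_{out}}{V_{in}}(s) = G(s) \cdot \frac{s \cdot (1 + s/p_2)}{K_1 K_2} \qquad (5.22)$$
$$= \frac{A_v \cdot s \cdot (1 + s/p_2)}{s \cdot (1 + s/p_1) \cdot (1 + s/p_2) + A_v K_1 K_2} \qquad (5.23)$$
$$\frac{V_{out}}{V_{in}}(s = 0) = 0 \qquad (5.24)$$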
For the ideal model we can see that the offset compensation drives the steady-state
output-referred offset voltage to zero because of the infinite DC gain of the integrator.
Alternatively, consider that in steady-state the input to the integrator, which corre-
sponds to the output referred offset voltage, must be equal to zero. If the input to
the integrator were non-zero then the integrator output would ramp and the system
would not be in steady state.
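Now, consider substituting the model for the proposed integrator into Equation 5.20
and solving for Vout/Vin:
$$G(s) = \frac{A_v K_1 K_2}{(1 + s/p_1) \cdot (1 + s/p_2) \cdot (1 + s/p_3) + A_v K_1 K_2} \qquad (5.25)$$
$$\frac{V_{out}}{V_{in}}(s) = G(s) \cdot \frac{(1 + s/p_2) \cdot (1 + s/p_3)}{K_1 K_2} \qquad (5.26)$$
$$= \frac{A_v \cdot (1 + s/p_2) \cdot (1 + s/p_3)}{(1 + s/p_1) \cdot (1 + s/p_2) \cdot (1 + s/p_3) + A_v K_1 K_2} \qquad (5.27)$$
$$\frac{V_{out}}{V_{in}}(s = 0) = \frac{A_v}{1 + A_v K_1 K_2} \approx \frac{1}{K_1 K_2} \qquad (5.28)$$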
As expected, the proposed integrator causes a non-zero steady-state output referred
offset voltage that is inversely proportional to the DC gain of the feedback path.
To minimize the steady-state error (i.e. output referred offset voltage) we need to
maximize the DC gain of the feedback path, K1K2. Since the gain of the peak
detector, K1, is roughly unity, the gain of the feedback path is dominated by the
integrator gain term, K2.
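To put numbers on this result, the sketch below evaluates the DC value of Vout/Vin obtained by setting s = 0 in Equation 5.27, which gives Av/(1 + Av·K1·K2). It uses the total amplifier gain of 40dB (Av = 100), K1 = 1 and the 40dB integrator DC gain (K2 = 100) quoted elsewhere in this chapter. Treating Av as the full 7-stage gain is an assumption, since each control loop sees a different number of stages, and the function name is illustrative.

```python
def dc_offset_gain(a_v, k1, k2):
    """Vout/Vin at s = 0 with a finite-gain integrator: Av / (1 + Av*K1*K2)."""
    return a_v / (1 + a_v * k1 * k2)


if __name__ == "__main__":
    a_v = 100.0   # 40 dB limit amplifier gain (all 7 stages, an assumption)
    k1 = 1.0      # peak detector gain, roughly unity
    k2 = 100.0    # 40 dB integrator DC gain
    residual = dc_offset_gain(a_v, k1, k2)
    print(f"uncompensated DC gain for offset: {a_v:.0f}")
    print(f"compensated DC gain for offset  : {residual:.4f}")
    print(f"approximation 1/(K1*K2)         : {1 / (k1 * k2):.4f}")
```

Under these assumptions an input-referred offset is multiplied by about 0.01 at the output instead of by 100, which is why a larger K2 directly lowers the residual offset.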
The impact of the finite DC gain can also be seen if we plug Equation 5.28 into
our system model and inspect the Bode plot of the open-loop parameter A(s) in
Figure 5-12. The additional pole in the integrator transfer function changes the low
frequency characteristics of A(s), as explained above, and forces us to lower the open
loop gain to ensure stability. The impact of the new pole, P3, and the finite DC gain
on stability can be seen more clearly if we create a Root Locus plot of the new G(s)
term in Equation 5.26. The root locus has the same shape as the root locus plot shown
in Figure 3-6 located in Section 3.2. However, the loop gain where P1,closed-loop and
P2,closed-loop cross into the right-half plane is reduced by approximately the magnitude
of p3, the pole location in the modified integrator transfer function. Therefore, there
is a tradeoff between system stability and the magnitude of the residual offset.
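The stability limit can also be estimated without drawing the root locus. The sketch below applies the Routh-Hurwitz test to the characteristic equation implied by Equation 5.25, (1 + s/p1)(1 + s/p2)(1 + s/p3) + L = 0, where L = Av·K1·K2 is the DC loop gain. The pole frequencies are hypothetical placeholders, not the design values, the assignment of p1 and p2 to particular blocks is an assumption, and the Routh-Hurwitz test is offered as an alternative check rather than the method used in the thesis.

```python
import math


def max_stable_loop_gain(p1, p2, p3):
    """Largest DC loop gain L that keeps all closed-loop poles in the LHP.

    Expanding (1 + s/p1)(1 + s/p2)(1 + s/p3) + L gives
    a3*s^3 + a2*s^2 + a1*s + (1 + L); for a cubic with positive
    coefficients the Routh-Hurwitz condition is a2*a1 > a3*(1 + L).
    """
    t1, t2, t3 = 1 / p1, 1 / p2, 1 / p3
    a3 = t1 * t2 * t3
    a2 = t1 * t2 + t1 * t3 + t2 * t3
    a1 = t1 + t2 + t3
    return a1 * a2 / a3 - 1


def is_stable(loop_gain, p1, p2, p3):
    """True if the closed loop is stable for the given DC loop gain."""
    return 0 <= loop_gain < max_stable_loop_gain(p1, p2, p3)


if __name__ == "__main__":
    # Hypothetical pole frequencies in rad/s.
    p1 = 2 * math.pi * 4e9    # assumed limit amplifier pole
    p2 = 2 * math.pi * 10e6   # assumed peak detector pole
    for f3 in (1e3, 1e4, 1e5):
        p3 = 2 * math.pi * f3  # integrator pole
        l_max = max_stable_loop_gain(p1, p2, p3)
        print(f"p3 = {f3:8.0f} Hz -> max loop gain = {l_max:.3g}")
```

For three equal poles the function returns 8, the textbook limit for a three-pole loop. When p3 is much lower than p1 and p2, the allowable DC loop gain is approximately (p1 + p2)/p3, so it scales inversely with the integrator pole frequency.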
Figure 5-12: Bode Plot of Modified Open-Loop Parameter A(s)
Another issue in the design of the integrator cells relates to the overall system im-
plementation. In Chapter 2 we determined that multiple control loops are required to
account for the outputs of the limit amplifiers saturating. Also, to keep the dynamics
of each of the control loops equal, both the loop gain and bandwidth of all loops
must be equal. The loop bandwidth is determined by the peak detectors, which are
identical in all loops. However, there are different numbers of limit amplifier stages
in each control loop resulting in different gains in each of the control loops. To keep
the total loop gains equal, the integrator gain, K2, must be different for each loop.
The easiest way to implement the gain variation is to create 2^(n-1) unit integrator cells,
where n is the number of taps (or the number of stages in the limit amplifier in this
case), and select the required number depending on which control loop is active. Since
there are 7 stages in the limit amplifier, the integrator was designed as an array of 64
identical unit integrator cells with a single common load capacitance. The outputs
are connected through pass-gates to control which integrators are active as shown in
Figure 5-13.
Figure 5-13: Schematic showing how integrator array is configured
By superposition, the transfer function of n integrators in parallel is:
$$\frac{V_{out}}{V_{in}} = \frac{n \cdot g_{m,M1} \cdot \frac{(R_{o,M1} \parallel R_{o,M4})}{n}}{1 + s \cdot \frac{(R_{o,M1} \parallel R_{o,M4})}{n} \cdot C} \qquad (5.29)$$
$$= \frac{g_{m,M1} \cdot (R_{o,M1} \parallel R_{o,M4})}{1 + s \cdot (R_{o,M1} \parallel R_{o,M4}) \cdot \frac{C}{n}} \qquad (5.30)$$
The DC gain of the integrator versus n is:
$$\frac{V_{out}}{V_{in}}(s = 0) = n \cdot g_{m,M1} \cdot \frac{(R_{o,M1} \parallel R_{o,M4})}{n} \qquad (5.31)$$
$$= g_{m,M1} \cdot (R_{o,M1} \parallel R_{o,M4}) \qquad (5.32)$$
$$= K_2 \qquad (5.33)$$
and the dominant low-frequency pole becomes:
$$p_3 = \frac{n}{(R_{o,M1} \parallel R_{o,M4}) \cdot C} \qquad (5.34)$$
This is demonstrated in Figure 5-14. As n increases, the DC gain remains constant,
the dominant pole frequency increases and the effective integrator gain, or gain in the
frequency range when the integrator acts as an ideal integrator, increases. Therefore,
although the DC gain remains constant, the effective integrator gain is scaled linearly
by the number of active integrator cells, as desired. A DC gain of 40dB was achieved
by using long length devices (length = 1µm) and, as will be shown in Section 6, this
is a large enough gain to achieve an acceptably low residual output referred offset voltage.
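The scaling with n can be made explicit with a short numerical sketch of the parallel gmC integrator described above. The unit-cell values of gm, Ro and C below are hypothetical, chosen only so that the DC gain is 40dB; here r_o stands for the parallel combination of Ro,M1 and Ro,M4, and the function name is illustrative.

```python
import math


def integrator_gain(freq_hz, n_cells, gm=100e-6, r_o=1e6, c_load=10e-12):
    """Magnitude of the gmC integrator response with n cells in parallel.

    n parallel cells give n*gm driving (r_o / n) in parallel with one
    shared load capacitor, so the DC gain stays gm*r_o while the pole
    moves to n / (r_o * c_load).
    """
    dc_gain = (n_cells * gm) * (r_o / n_cells)
    pole = n_cells / (r_o * c_load)          # rad/s
    s = 1j * 2 * math.pi * freq_hz
    return abs(dc_gain / (1 + s / pole))


if __name__ == "__main__":
    for n in (1, 64):
        dc = integrator_gain(0.0, n)
        hf = integrator_gain(1e9, n)         # well above the pole
        print(f"n = {n:2d}: DC gain = {20 * math.log10(dc):.1f} dB, "
              f"gain at 1 GHz = {20 * math.log10(hf):.1f} dB")
```

In the region where the cell behaves as an integrator, the gain with 64 cells is about 64 times (36dB) higher than with a single cell, while the DC gain is unchanged.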
Figure 5-14: Bode Plot of Integrator Demonstrating How the Transfer Function
Varies with n, the Number of Parallel Integrator Cells
The final two issues for the integrator design are the biasing and common-mode
feedback (CMFB) implementation. Since the integrator was implemented as an array
of 64 unit integrator cells, it was important to keep the biasing and, more importantly,
the CMFB circuitry compact. It is especially important to keep the CMFB circuitry
compact because it is repeated in each unit integrator cell. The design of both the
integrator biasing network and CMFB are discussed here.
There are two typical CMFB implementations in CMOS processes. The first
design requires extremely large sensing resistors to extract the output common-mode
level to prevent the CMFB circuitry from loading the output of the integrator and
degrading the gain and bandwidth. A simplified schematic of this approach is shown
in Figure 5-15a. The second design uses a differential amplifier cell to extract the
output common-mode level. The CMFB amplifier can be comparable in size to the
unit integrator cell. A schematic for this approach is shown in Figure 5-15b. Neither
of these CMFB approaches was acceptable because they consumed too much area.
Figure 5-15: (a) CMFB with Resistive Output Common-Mode Level Sensing, (b)
CMFB Using Differential Amplifier to Sense Output Common-Mode Level
The solution was to use the CMFB scheme shown in Figure 5-16. The desired
common-mode level is applied to the gate of M4R, and devices M4a and M4b form the
feedback control loop in each integrator cell [1]. The output common-mode level is
defined as (Vout+ + Vout-)/2 in Figure 5-16. If the output common-mode level is at
the desired level then Icm = 2I1, assuming that the biasing stage and the integrator
cell have a 1:1 sizing ratio. However, if the output common-mode level increases,
VGS,M4 will increase causing Icm to increase. I1, the drain current of M6 and M7,
will increase forcing VDS,M6/7 to increase. The output common-mode level is then
lowered back to the desired value. If the output common-mode level decreases, the
operation is similar whereby Icm and I1 both decrease forcing VDS,M6/7 to decrease.
The output common-mode level will then increase to the desired value. The circuit
is ideal because it operates as intended and consumes negligible die area.
Figure 5-16: Simplified Schematic of Integrator Showing Biasing and CMFB
Figure 5-16 also shows the replica biasing, composed of transistors M1R-M4R and
M6, used in the integrator. The design is based on [25]. The replica biasing stage is
common to all integrator cells.
5.4 Output Buffer
The output buffer is subject to most of the same constraints as the limit amplifier.
However, the output buffer must also satisfy fairly harsh output drive requirements
dictated by the package parasitics and the medium that the output buffer is driving
into. Luckily, the output buffer does not have to provide appreciable gain. The gain
per stage only needs to be large enough to guarantee that the total output buffer gain
remains greater than unity over all process, voltage and temperature corners. The
model for the output bond pad, bond wire inductance and package capacitance is shown
in Figure 5-17. The pad capacitance, due to both the bond pad and ESD, was
extracted with StarRC and found to be approximately 185fF. Based on the chip size
and available packages, the bond wire inductance was assumed to be 2nH and the
package capacitance was conservatively assumed to be 500fF. Finally, the transmission
line at the output was assumed to be 50Ω.
Figure 5-17: Differential Package Model Showing the Bond Pad and Package Capacitance
and the Bond Wire Inductance
To drive the large load in the package model, the output buffer was designed as a
cascade of 3 stages. Each successive stage has twice the output drive as the preceding
stage, resulting in a total scaling of 8 [26]. Following the same design methodology
used for the limit amplifier, each of the 3 stages was designed with a gain of 1.3, output
swing of 750mV and bandwidth per stage of 12.5GHz. To avoid laying out 3 different
high-speed amplifier stages a different approach was used. Instead of using 3 scaled
stages the output drive scaling was achieved by cascading parallel configurations of
a single output buffer to achieve the correct drive scaling. The conceptual schematic
is shown in Figure 5-18. The final power dissipation of the output buffer is 225mW,
or 125mA from a 1.8V supply, and the bias current per buffer cell is 17.9mA. The
effective total bandwidth of the output buffer is 6.4GHz. Neutralization was used on
the first two stages to increase the bandwidth to approximately 8.5GHz.
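The output buffer figures can be cross-checked with a short sketch. It assumes the three stages use 1, 2 and 4 parallel copies of the unit buffer cell (inferred from the factor-of-two scaling, not stated explicitly) and that each stage behaves as a single-pole amplifier. The function name and dictionary keys are illustrative.

```python
import math


def buffer_summary(cells_per_stage=(1, 2, 4), cell_current=17.9e-3,
                   supply_v=1.8, stage_bw_hz=12.5e9, stage_gain=1.3):
    """Total current, power, gain and bandwidth of the scaled buffer chain."""
    n_stages = len(cells_per_stage)
    total_current = sum(cells_per_stage) * cell_current
    return {
        "current_a": total_current,
        "power_w": total_current * supply_v,
        "gain": stage_gain ** n_stages,
        # n identical single-pole stages: BW * sqrt(2^(1/n) - 1)
        "bandwidth_hz": stage_bw_hz * math.sqrt(2 ** (1 / n_stages) - 1),
    }


if __name__ == "__main__":
    result = buffer_summary()
    print(f"total current: {result['current_a'] * 1e3:.0f} mA")
    print(f"total power  : {result['power_w'] * 1e3:.0f} mW")
    print(f"total gain   : {result['gain']:.2f}")
    print(f"bandwidth    : {result['bandwidth_hz'] / 1e9:.1f} GHz")
```

Seven unit cells at 17.9mA give about 125mA and roughly 226mW from 1.8V, close to the quoted 225mW, and three 12.5GHz stages cascade to about 6.4GHz before neutralization.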
As with the limit amplifier, the best judge of the output buffer performance is
an eye diagram. Eye diagrams of the output buffer output generated with Hspice,
driving the package model, are shown in Figure 5-19 for 5Gbps and 10Gbps data.
Both eye diagrams are satisfactory, although the 10Gbps output visibly has more ISI.
The measured jitter for the two data rates due to ISI alone is 800fs peak-to-peak, or 280fs
RMS, for the 5Gbps data and 1.0ps peak-to-peak, or 350fs RMS, for the 10Gbps data.
Figure 5-18: Final Output Buffer Design
Figure 5-19: Eye Diagram at Output of Output Buffer (a) 5Gbps, (b) 10Gbps
5.5 Comparator and Logic
Comparators fall into one of two classes, clocked or unclocked, with each type having
its associated strengths and weaknesses. There are several factors to consider when
designing a comparator:
* Implementation complexity
* Robustness
* Power dissipation
* Speed
Each of these issues will be addressed in the design of the comparator in the feedback
windowing circuitry.
The most significant advantages of clocked comparators are their inherent robustness
to noise and high input sensitivity. These advantages occur because the comparison
is done in a discrete-time fashion and the decision circuitry is reset prior to
each comparison. In Figure 5-20, when the clk signal goes high the decision circuitry
is reset to prepare for the comparison phase on the falling clock edge. The gates of
M3 and M6 - M8 are pulled high, cutting off current flow through the output stage
and forcing Vo+ and Vo- low. Nodes A and B, at the drains of M9 and M10, see
a high impedance path through M7 and M8. Therefore, any charge stored at these
nodes cannot be discharged, but M11 ensures that this charge is evenly distributed
between the two nodes. When the clock goes low, M7 - M8 turn on and M3, M6 and
M11 turn off. M1 and M2 sense the input signal and mirror a current to the output
stage through the PMOS current mirrors to charge nodes A and B accordingly. The
decision circuitry is composed of the cross-coupled NMOS devices, M4 - M5, and
PMOS devices, M9 - M10. The positive feedback of the cross-coupled devices forces
the outputs to be either high or low. Since the latches are reset between each decision
phase, the output does not have to over-drive the positive feedback when the input
changes sign. As a result, the design can be very low power. One undesirable aspect
of this design is that it requires a clock signal. One of the design goals for the offset
compensation was to avoid clock signals so this design was undesirable.
Figure 5-20: Typical Implementation of Clocked Comparator
Consider the simplified schematic of the continuous time comparator in Figure 5-21.
The input stage is identical to the clocked comparator case in that it mirrors a
current proportional to the input voltage through the PMOS current source devices
to the output. This is where the similarities end. The mirrored current feeds diode
connected NMOS devices M3 and M4. In turn, M3 and M4 generate an input voltage
for M5 and M6, the input devices to the decision circuitry. The significant difference
from the clocked implementation is that the decision circuitry does not have separate
sample/reset and decision phases. As a result, M5 and M6 must over-drive the cross-coupled
PMOS devices M7 and M8 if the output is to change sign. If the comparator
is designed for low power operation, M7 - M8 must be made small so that M5 - M6
can over-drive their positive feedback. The problem with this approach is that the
circuit will become vulnerable to noise from the rest of the system. To improve the
robustness of the circuit, the power must be increased to ensure that M7 - M8 will
reliably latch and that M5 - M6 can over-drive the positive feedback. Therefore,
there is a trade-off between power and robustness. Finally, the comparator
design has a differential to full-swing converter in the output stage so that it can
drive the windowing control logic.
Figure 5-21: Implementation of Comparator in Windowing Block
The comparator was designed for minimum power dissipation while guaranteeing
greater than 20mV of hysteresis over all process and temperature corners. The final
power dissipation for each comparator is 165µW with a 1.8V supply. Additionally,
the comparator operates up to a maximum frequency of 80MHz over all process and
temperature corners. This performance is much greater than the targeted 10MHz
bandwidth for the feedback loop. Table 5.1 summarizes the simulated hysteresis for
the critical process and temperature corners.
Process Corner    Temperature    Hysteresis
Slow              70°C           19mV
Typical           25°C           36mV
Fast              0°C            43mV
Table 5.1: Simulated Hysteresis vs Process and Temperature Corner
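The role of hysteresis in the continuous-time comparator can be illustrated with a behavioral sketch. This is not a model of the circuit in Figure 5-21; it is a generic comparator whose output only changes when the input moves past a threshold offset by half the hysteresis, with the 20mV design target used as the default. The class name and the input sequence are hypothetical.

```python
class HysteresisComparator:
    """Behavioral comparator with symmetric hysteresis around zero."""

    def __init__(self, hysteresis_v=0.020, initial_state=False):
        self.half_width = hysteresis_v / 2
        self.state = initial_state

    def update(self, v_diff):
        """Apply a differential input and return the (possibly new) output."""
        if self.state and v_diff < -self.half_width:
            self.state = False
        elif not self.state and v_diff > self.half_width:
            self.state = True
        return self.state


if __name__ == "__main__":
    comp = HysteresisComparator()
    # Excursions smaller than 10 mV around zero do not toggle the output.
    inputs = [0.005, -0.005, 0.015, 0.002, -0.008, -0.012, 0.004]
    print([comp.update(v) for v in inputs])
```

Only the 15mV and -12mV samples change the output in this sequence, which is the noise rejection that the hysteresis specification is meant to guarantee.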
5.6 ESD
Refer to Appendix B for the discussion of the ESD circuitry.
5.7 Summary
The design details of each of the system blocks were presented in this chapter. In the
limit amplifier analysis we investigated the impact of the number of stages on total
bandwidth, power dissipation and input sensitivity. It was shown that for a chain
of resistively loaded, differential amplifiers it is possible to simultaneously achieve
maximum bandwidth and minimum power dissipation for a given total gain and
bandwidth. The final design parameters for the limit amplifier were determined based
on the trade-off between bandwidth, power dissipation and input sensitivity. The peak
detector was biased in subthreshold to allow for bandwidth adjustment and the impact
of mismatch in the peak detector was explored. The integrator was implemented as
a gmC filter with a high DC gain and low-frequency dominant pole. The impact
of the non-ideal transfer function on the system performance was determined to be
acceptable. The gain of each loop is adjusted by using the appropriate number of
parallel integrator base cells. Finally, the details of the output buffer, which parallel
the limit amplifier discussion, were presented. The next chapter will discuss behavioral
and SPICE simulation results.
Chapter 6
Results
The linear modeling analysis presented in Chapter 3 is useful for evaluating stability
issues of an individual control loop. However, we made several simplifying assumptions
that allowed us to develop analytic expressions to describe the system behavior.
We now need to validate these assumptions with both behavioral and SPICE simula-
tions of the entire system. We will explore the effect of non-idealities in the modeling
of each of the system blocks such as the nonlinear transfer function of the limit am-
plifier and the modeling of the peak detector as a gain and simple pole. Also, we
will explore the impact of multiple control loops and how transitioning between them
affects the overall system response.
6.1 CppSim Modeling and Simulation Results
Performing behavioral simulations with CppSim allows the designer to quickly gain
intuition about system issues and to iterate the design at a fast pace. CppSim is not
intended as a replacement for SPICE simulations. Rather, CppSim is an enhancement
to the tool set. Performing simulations as compiled C++ programs enhances the
user's ability to design innovative solutions by dramatically shortening the design
cycle. Specifics about working with CppSim can be found in the CppSim Manual,
located at http://www-mtl.mit.edu/perrottgroup/tools.html.
The following sections describe how each of the system components was modeled
in CppSim. The code for each block can be found in Appendix C. Following the
modeling discussion, CppSim simulation results are presented.
6.1.1 Limit Amplifier
In Section 3.2 the limit amplifier was simply modeled as a gain and a single pole
to enable us to get intuition about basic system issues. In reality there are several
non-idealities that limit the accuracy of this basic model that should be accounted for
in the behavioral model. First, the limit amplifier generally does not have a perfectly
linear relationship between the input and the output. In fact, as shown in Figure 6-1,
the transfer function can be modeled by a 3rd order polynomial over most of the
input range. Outside of the valid input range the transfer function can be modeled
by a simple limiting action. In other words, the output will be constant for inputs
larger than some input amplitude determined by the limit amplifier output swing,
gain and linearity.
[Plot: measured limit amplifier transfer characteristic and fitted line, Vout = a1*Vin + a2*Vin^2 + a3*Vin^3; x-axis: Differential Input Voltage (Volts), -0.6 to 0.6]
Figure 6-1: 3rd Order Polynomial Fit to Limit Amplifier Transfer Function Measured
in Hspice
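The static part of this model can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis code: the quadratic and cubic coefficients (0.6 and -3.3) are the values that appear in the Appendix C listing, while the linear gain of 4.0 and the valid input range of ±0.5V are hypothetical values chosen so that the polynomial stays monotonic inside the range. Limiting is applied by holding the output constant once the input leaves the valid range.

```python
def limit_amp(vin, gain=4.0, a2=0.6, a3=-3.3, vin_limit=0.5):
    """Static limit-amplifier sketch: cubic fit inside the valid input range,
    constant output outside it.

    gain and vin_limit are hypothetical illustration values; a2 and a3 are the
    coefficients used in the Appendix C listing.
    """
    # Hold the input at the edge of the valid range so the output stays constant
    # for larger inputs (the "simple limiting action").
    v = max(-vin_limit, min(vin_limit, vin))
    return gain * v + a2 * v**2 + a3 * v**3


if __name__ == "__main__":
    for vin in (-1.0, -0.2, 0.0, 0.005, 0.2, 1.0):
        print(f"vin = {vin:+.3f} V -> vout = {limit_amp(vin):+.4f} V")
```

Clamping the input rather than the output avoids the fold-back that a cubic with a negative third-order coefficient would otherwise show for very large inputs.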
The approximation of modeling the frequency response of the amplifier with a
single pole is valid and provides excellent results. Additional modifications to the
model can be made to account for bandwidth extension techniques.
6.1.2 Peak Detector
The linear model that was developed for the peak detector simply consisted of a
gain and single pole, similar to the model for the limit amplifier. As discussed in
Section 2, the peak detector has asymmetric large signal tracking behavior due to the
difference between the pull-up and pull-down ability of the source follower. However,
based on the system modeling we know that the bandwidth of the peak detector,
which roughly corresponds to the loop bandwidth, is much lower than the data rate.
The peak detector simply filters the high-frequency components of the limit amplifier
output and extracts the DC component, so the large signal changes in the limit
amplifier output are rejected. Changes in the instantaneous DC
component of each limit amplifier output are small in amplitude and occur at a rate
determined by the loop bandwidth. Therefore modeling the peak detector as a simple
low-pass filter is accurate for this specific system.
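As a rough illustration of what a "gain and single pole" model does to a signal, the following Python sketch discretizes H(s) = 1/(1 + s/(2*pi*fp)) with a backward-Euler step and applies a unit step. The 10MHz pole and 0.1ns time step are hypothetical values chosen for illustration; CppSim's own Filter class is not reproduced here.

```python
import math


def lowpass_step_response(fp_hz, ts, n_steps, x=1.0):
    """Backward-Euler discretization of H(s) = 1 / (1 + s / (2*pi*fp)).

    Returns the output samples for a constant input x applied at t = 0.
    """
    tau = 1.0 / (2.0 * math.pi * fp_hz)   # time constant of the single pole
    alpha = ts / (tau + ts)               # backward-Euler update weight
    y = 0.0
    out = []
    for _ in range(n_steps):
        y += alpha * (x - y)              # y[n] = y[n-1] + alpha * (x - y[n-1])
        out.append(y)
    return out


if __name__ == "__main__":
    fp = 10e6        # hypothetical pole frequency (Hz)
    ts = 1e-10       # hypothetical simulation time step (s)
    y = lowpass_step_response(fp, ts, 2000)
    n_tau = round(1.0 / (2.0 * math.pi * fp) / ts)
    print(f"output after one time constant: {y[n_tau - 1]:.3f}")
    print(f"output after {len(y) * ts * 1e9:.0f} ns: {y[-1]:.4f}")
```

Because the pole sits far below the data rate, individual bit transitions barely move the output and only the slowly varying component survives.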
6.1.3 Integrator
The integrator model needs to account for the non-ideal, low-pass transfer function
of the design discussed in Section 5. The model for the integrator is a simple low-pass
filter and is very similar to the model for the peak detector.
6.1.4 Control Logic
Specifics of the models for each of the CppSim modules can be found in Appendix C.
6.2 CppSim Simulation Results
The two metrics that we are interested in measuring with CppSim are compensation
settling time and output jitter versus loop bandwidth. In CppSim, simulations were
performed with the loop bandwidth ranging from 1MHz to 10MHz, a data rate of
5Gbps and an input amplitude of 5mVp-p. The system settling time can be measured
from the settling of the control voltage and the output jitter can be measured from
an eye diagram of the differential output signals. Figure 6-2 shows three plots of
the control voltage that correspond to loop bandwidths of (A) 1MHz, (B) 5MHz
and (C) 10MHz. Approximate settling times for the 3 cases are 1µS, 200nS and
100nS, respectively. All three designs meet the settling time goal of 1µS and agree
well with the behavior predicted by the linear modeling in Chapter 3. Figure 6-3
exhibits corresponding eye diagrams for the same 3 loop bandwidths. The impact of
the bandwidth on the output jitter is readily apparent. Interestingly, the measured
peak-to-peak jitter for the three bandwidths is 1pS, 5pS and 10pS, respectively.
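The numbers quoted above already show the trade-off in a compact form. The short Python sketch below simply tabulates them; it assumes the first settling time is 1 microsecond and that the jitter values are in picoseconds, and it introduces no data beyond the values in the text.

```python
# CppSim values quoted in the text (5Gbps data, three loop bandwidths).
# Assumption: settling times are 1 us, 200 ns, 100 ns; jitter is in picoseconds.
loop_bw_hz = [1e6, 5e6, 10e6]
settling_s = [1e-6, 200e-9, 100e-9]
jitter_pp_s = [1e-12, 5e-12, 10e-12]


def settling_bandwidth_products(bw, ts):
    """Dimensionless product of loop bandwidth and settling time."""
    return [b * t for b, t in zip(bw, ts)]


def jitter_per_mhz(bw, jit):
    """Peak-to-peak jitter in picoseconds per MHz of loop bandwidth."""
    return [(j / 1e-12) / (b / 1e6) for b, j in zip(bw, jit)]


for bw, prod, slope in zip(loop_bw_hz,
                           settling_bandwidth_products(loop_bw_hz, settling_s),
                           jitter_per_mhz(loop_bw_hz, jitter_pp_s)):
    print(f"{bw / 1e6:4.0f} MHz: bandwidth x settling = {prod:.2f}, "
          f"jitter = {slope:.2f} ps per MHz")
```

With these values the settling time scales as the inverse of the loop bandwidth while the peak-to-peak jitter scales in proportion to it, which is the trade-off the remainder of the section discusses.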
To meet the specified jitter target of the design there is a maximum loop band-
width for a given data rate. There is also a lower bound on settling time, based on the
jitter performance, which is dependent on the data rate. The dominant cause of jitter
in these simulations is the data dependence of the compensation control voltage. As
explained in Chapter 2, the data dependence is caused by the droop at the output of
the peak detector, which is proportional to the symbol period of the data. Therefore,
higher data rates result in lower absolute jitter. However, most jitter specifications
are relative and are expressed as a percentage of the minimum symbol period. Since
both the absolute jitter and the minimum symbol period decrease as the data rate
increases, the minimum settling time for a given design should remain roughly constant for
a given relative jitter goal, independent of data rate.
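The relative-jitter argument can be checked with a small calculation. The sketch below converts the peak-to-peak jitter values quoted for the CppSim runs into a fraction of the symbol period at 5Gbps, and then uses a hypothetical proportionality constant k to illustrate why jitter that scales with the symbol period gives the same relative jitter at any data rate. The function names and the constant k are illustrative assumptions, not part of the thesis.

```python
def unit_interval(data_rate_bps):
    """Minimum symbol period (seconds) for NRZ data at the given rate."""
    return 1.0 / data_rate_bps


def relative_jitter(jitter_pp_s, data_rate_bps):
    """Peak-to-peak jitter expressed as a fraction of the symbol period."""
    return jitter_pp_s / unit_interval(data_rate_bps)


def droop_limited_jitter(k, data_rate_bps):
    """Hypothetical model: absolute jitter proportional to the symbol period."""
    return k * unit_interval(data_rate_bps)


if __name__ == "__main__":
    for j in (1e-12, 5e-12, 10e-12):
        pct = 100.0 * relative_jitter(j, 5e9)
        print(f"{j * 1e12:.0f} ps at 5Gbps is {pct:.1f}% of the symbol period")
    for rate in (2.5e9, 5e9, 10e9):
        j = droop_limited_jitter(0.05, rate)
        print(f"{rate / 1e9:.1f} Gbps: absolute {j * 1e12:.1f} ps, "
              f"relative {relative_jitter(j, rate):.3f}")
```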
6.3 Hspice Simulation Results
This section will discuss the Hspice simulation results and compare them to the
CppSim simulation results. The circuit design details were discussed in Chapter 5.
Similar to the CppSim simulations, the SPICE design was simulated with the loop
bandwidth ranging from 1MHz to 10MHz, a data rate of 5Gbps and an input amplitude
of 5mVp-p. Figure 6-4 shows three plots of the control voltage that correspond to loop
bandwidths of (A) 1MHz, (B) 5MHz and (C) 10MHz. Approximate settling times for
the 3 cases are 800nS, 400nS and 200nS, respectively. The settling time performance
for the three cases is in good agreement with the CppSim results. Figure 6-5 exhibits
corresponding eye diagrams for the same 3 loop bandwidths. Both the shape of the
[Figure 6-2 panels (A), (B) and (C): control voltage versus TIME]
Figure 6-2: Control voltage of offset compensation loop during compensation from
CppSim: (A) 1MHz bandwidth, (B) 5MHz bandwidth, (C) 10 MHz Bandwidth
eye and the jitter behavior are in excellent agreement with the CppSim result. The
close match is due to the modeling of the non-linearities of the limit amplifier.
6.4 Summary
The simulation results presented show that the offset compensation loop operates
as intended. More extensive simulations will be conducted to verify the full set
of design specifications, including noise simulations, extracted simulations with StarRC and
Monte Carlo simulations. Additionally, the trade-off between loop bandwidth and
output jitter will be characterized in Hspice and compared to the results from CppSim.
Finally, the Hspice results will be compared to measured results from packaged parts.
[Figure 6-3 panels (A), (B) and (C): limit amplifier output versus TIME]
Figure 6-3: Eye diagram of limit amplifier output after compensation from CppSim
[Figure 6-4 panels: control voltage versus TIME]
Figure 6-4: Control voltage of offset compensation loop during compensation from
Hspice: (A) 1MHz bandwidth, (B) 5MHz bandwidth, (C) 10 MHz Bandwidth
[Figure 6-5 panels (A), (B) and (C): limit amplifier output versus TIME]
Figure 6-5: Eye diagram of limit amplifier output after compensation from Hspice
Chapter 7
Layout
In this chapter key layout details that impact the performance of the forward amplifier
path and the offset compensation are discussed. Since both the forward data path and
the feedback compensation path are differential, inevitable mismatches between the
devices in the two paths will add offset to the loop. Mismatches in the forward path
will be compensated for by the feedback loop, but offset introduced by the feedback
path will be simply low pass filtered and appear directly at the output. The layout
of the limit amplifier cells is optimized for high-speed operation at the expense of
matching issues. Techniques such as common-centroid layout and the addition of
dummy devices must be used in the feedback cells in addition to sizing and biasing
devices for acceptable matching. The layout for cells in the feedback path will be
discussed in the context of matching and the layout of the limit amplifier will be
discussed in the context of achieving maximum bandwidth and an acceptable level of
matching.
7.1 Peak Detector
As mentioned in Chapter 5, the input devices of the peak detector are sized to achieve
an acceptable trade-off between matching of the devices in the differential paths and
the impact on the limit amplifier bandwidth. To enhance the matching between the
two input devices in the peak detector, a common centroid layout and dummy devices
were used as shown in Figure 7-1. The maximum bandwidth of the peak detector is
less than 20MHz so the added capacitance from the dummy devices could be tolerated.
Additionally, a guard-ring completely surrounds the peak detector and isolates the
sensitive input devices from substrate noise. Finally, no metal interconnect overlaps
the sensitive poly gates of the input devices.
7.2 Integrator
Similar to the peak detector, offset between the differential input devices of the integrator
is directly referred to the output through a low-pass filter transfer function.
Therefore, the devices were laid out with dummy loading devices to either side, as
Figure 7-1: Detail of Peak Detector Cell Showing Common-Centroid Layout and
Dummy Devices
shown in the top of Figure 7-2. The tail current-source devices are located at the
bottom of the cell and arranged as an array of parallel gates with dummy loading
devices on either end. The interconnect for biasing and data signals is located in
the center and forms buses that connect adjacent cells in the array. Also similar to the
layout for the peak detector, substrate guard-rings surround all sensitive circuitry.
The top level layout of the integrator cell, an array of 64 integrator base cells, is
shown in Figure 7-3.
7.3 High Speed Limit Amplifier
Unlike the peak detector and integrator layouts where matching was the biggest
concern, the focus in the layout of the limit amplifier, Figure 7-4, is to minimize
parasitic capacitance. To this end, the aspect ratio of the limit amplifier cell, or
the ratio between the width and height of the cell, is quite small to minimize the
length of the interconnect. To balance the impact of the RC time-constant of the
interconnect and the coupling capacitance to ground, the signal lines are 2µm wide.
Also, to minimize the coupling capacitance between the differential signal lines, a
ground shield is placed between the lines and all interconnect lines are separated by
3µm. Also, no dummy stripes or devices were used.
Due to the large bias currents in the limit amplifier, required by the input tran-
sistors for high-speed operation, special care was used in sizing all of the metal inter-
connect. The tail current source device, located at the bottom of the cell, is large to
achieve a low over-drive with the large bias current levels. Substrate taps are located
to either side of the bias current device to ensure that the local substrate remains
Figure 7-2: Layout of Base Integrator Cell
[...] The total signal path length
in the limit amplifier is less than 200µm and the resulting bandwidth is greater than
50GHz.
7.4 Output Buffer
The layout concerns for the output buffer are similar to those in the limit amplifier,
but perhaps more severe. The current levels in the output buffer are approximately
three times larger than in the limit amplifier so the current carrying metal lines are
exceptionally large. Also, since the tail current source is proportionally larger than
[...] The output buffer layout is shown in Figure 7-6.
Figure 7-5: Layout of Limit Amplifier Top Level
The top level layout view is shown in Figure 7-8. The final die size, after shrink,
is just over 1mm x 1mm. The system component cells were laid out to minimize
interconnect length, to ensure that there is sufficient power and ground bussing to all
cells and to satisfy the strict density rules for poly, composite and each of the metal
layers.
Both the limit amplifier and the output buffer are located close to bond pads, in the
center of the left side and bottom, respectively, to minimize the on-chip interconnect
length. Also, the interconnect between the bond pads and the limit amplifier and
output buffer cells is laid out symmetrically (i.e. the two differential paths are the
same length and have the same bends).
Figure 7-6: Layout of Output Buffer Stage
Figure 7-7: Layout of Output Buffer Top Level
7.6 Summary
In this chapter the critical layout issues for each of the cells were discussed. Matching
issues are paramount for both the peak detector and integrator cells so common-centroid
layout configurations and dummy loading devices were used. In the limit amplifier,
however, speed takes priority over matching. The layout of the limit
amplifier cells is symmetric to improve matching between the differential paths but
the layout is optimized to minimize diffusion and interconnect capacitances. The
layout of the output buffer parallels the layout of the limit amplifier. The individual
cells were arranged in the top-level layout to minimize interconnect length and to
spatially isolate the high-speed blocks from the sensitive feedback cells.
Figure 7-8: Layout of Chip Top Level
Chapter 8
Conclusions and Future Work
The design of a low power, fully integrated, fast offset compensation system for high-
speed data-links was described in this thesis. The system meets all of the performance
goals of total compensation time, output jitter and residual offset voltage. The key
enabling system component, the peak detector, allows this design to achieve simi-
lar jitter and residual offset performance to current compensation techniques while
achieving superior settling time performance. Additionally, the design eliminates the
need for expensive off-chip components, which is attractive in the drive to reduce
component count and system costs.
The initial system architecture and performance were proven in CppSim, a fast and
accurate behavioral simulator, and verified in SPICE. The system was implemented
in National Semiconductor's 0.18µm CMOS9 process and is currently being
fabricated. Measured results will be compared to simulated results when parts are
available.
8.1 Contributions
The key contribution of this thesis is the introduction of a new offset compensation
implementation for CMOS limit amplifiers that significantly reduces the dependence of
output jitter on loop bandwidth, or settling time. In all current offset compensation
designs that we are aware of, the output jitter is proportional to the loop bandwidth.
As a result, all of these systems suffer from long compensation times to meet strict jitter
specifications. By taking advantage of the symmetry of the limit amplifier design, a
novel peak detector design was developed that allows the designed system to modify
this relationship. Both the bandwidth and droop of traditional peak detectors are
proportional to the bias current. However, only the bandwidth is dependent on the
bias current in the proposed design. The droop is determined by NMOS off-state
leakage current. The designed offset compensation loop achieves both fast settling
time and low output jitter.
Additionally, a novel numerical design procedure was developed for high-speed, re-
sistor loaded, differential, CMOS amplifiers. Extremely accurate designs are realized
because inaccurate square-law models are replaced with simulated device characteristics
from SPICE. The design procedure was implemented in a Matlab script and is
available from http://www-mtl.mit.edu/perrottgroup/tools.html.
8.2 Future Work
A possible enhancement to the proposed approach would allow us to further alle-
viate the trade-off between settling time and jitter. We want to achieve very fast
settling time but also have low output jitter. The compensation time of the system is
proportional to the loop bandwidth. However, even with the proposed design there
is a trade-off between settling time and output jitter - faster settling times result in
higher jitter. We could slightly modify the existing design so that the loop bandwidth
adapts based on the required operation. The loop bandwidth could be increased to
provide fast settling performance during initial compensation and then decreased to
provide low jitter performance during steady-state operation. This could be accom-
plished simply by controlling the peak detector bias current and would provide further
performance enhancements.
A possible extension of the work completed in this thesis is to investigate the
application of the proposed peak detector to AGCs. AGCs are prevalent in bipolar
processes but see limited application in CMOS designs because it is difficult to perform
the amplitude detection. Current CMOS AGC designs use complex peak detector
circuitry in the gain control loop. The proposed design offers a simple alternate
means of performing the peak detection.
The system described in this thesis was specifically designed to be used in wire-
line applications with NRZ data. However, the issue of offset compensation is not
restricted to broadband systems. Offset is also a significant issue in direct conversion
wireless systems and it would be interesting to investigate the application of this
compensation technique to direct conversion receiver designs.
A full characterization of the relationship between the loop bandwidth and the
resulting jitter performance was not completed in this thesis. However, the circuitry
required to adjust the loop bandwidth, and the required loop gain, were included
in the system design. When final packaged parts are available, this analysis will be
performed and measured results will be compared to the simulated results.
Appendix A
Derivation of Input Referred Offset
Voltage
Following are the derivations for offset voltage assuming both square-law operation
and velocity saturation. As will be shown, the results have similar forms. In fact, the
results are only different by a fixed scale factor and, as expected, the result assuming
velocity saturation is dependent on L through the VTH term. In reality, the offset
voltage will be somewhere in between these two results. For the following discussion
refer to Figure A-1.
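[Schematic: differential pair with load resistors R, input devices of width W, inputs Vin+ and Vin-, outputs Vout+ and Vout-, and tail current Ibias]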
Figure A-1: Implementation of Each Stage in Limit Amplifier
In order to quantify the offset voltage as a function of circuit parameters, consider,
as mentioned in Chapter 1, mismatches in the transistor physical dimensions, where
$(W/L)_1 = W/L$ and $(W/L)_2 = W/L + \Delta(W/L)$, transistor threshold voltages, where
$V_{TH1} = V_{TH}$ and $V_{TH2} = V_{TH} + \Delta V_{TH}$, and the load resistor values, where $R_1 = R$
and $R_2 = R + \Delta R$. In order for $V_{out} = 0$, $I_1 R_1 = I_2 R_2$, which means $I_1 \neq I_2$, since
$R_1 \neq R_2$ [1]. Similar to the previous parameterizations, $I_1 = I$ and $I_2 = I + \Delta I$.
A.0.1 Square-Law Operation
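Following is the development assuming square-law operation. We can quantify the
currents in M1 and M2 as:
\begin{align}
I = \frac{\mu_n C_{ox}}{2}\,\frac{W}{L}\,(V_{GS} - V_{TH})^2 \tag{A.1}
\end{align}
Solving for $V_{GS}$:
\begin{align}
V_{GS} = \sqrt{\frac{2I}{\mu_n C_{ox}\,(W/L)}} + V_{TH} \tag{A.2}
\end{align}
With $V_{os,in} = V_{GS1} - V_{GS2}$, substituting for the appropriate parameters yields:
\begin{align}
V_{os,in} &= \sqrt{\frac{2I_1}{\mu_n C_{ox}\,(W/L)_1}} + V_{TH1} - \sqrt{\frac{2I_2}{\mu_n C_{ox}\,(W/L)_2}} - V_{TH2} \tag{A.3}\\
&= \sqrt{\frac{2}{\mu_n C_{ox}}}\left[\sqrt{\frac{I}{(W/L)}} - \sqrt{\frac{I + \Delta I}{(W/L) + \Delta(W/L)}}\right] - \Delta V_{TH} \tag{A.4}\\
&= \sqrt{\frac{2}{\mu_n C_{ox}}}\sqrt{\frac{I}{(W/L)}}\left[1 - \sqrt{\frac{1 + \Delta I/I}{1 + \Delta(W/L)/(W/L)}}\right] - \Delta V_{TH} \tag{A.5}
\end{align}
To recast the above expression in terms of the transistor and resistor physical parameters,
we need to solve for $\Delta I/I$ in terms of $\Delta R/R$. From before, $R_1 = R$,
$R_2 = R + \Delta R$, $I_1 = I$ and $I_2 = I + \Delta I$. To satisfy the condition that $V_{out} = 0$,
$I_1 R_1 = I_2 R_2$:
\begin{align}
IR &= (R + \Delta R)\cdot(I + \Delta I) \tag{A.6}\\
&= IR + \Delta I\,R + I\,\Delta R + \Delta I\,\Delta R \tag{A.7}\\
&\approx IR + \Delta I\,R + I\,\Delta R \tag{A.8}
\end{align}
After solving for $\Delta I/I$ (which gives $\Delta I/I \approx -\Delta R/R$), assuming that the product
of two small quantities approaches zero and recognizing that
$\frac{2I}{\mu_n C_{ox}(W/L)} = (V_{GS} - V_{TH})^2$, the following equation results:
\begin{align}
V_{os,in} = \frac{(V_{GS} - V_{TH})}{2}\left[\frac{\Delta(W/L)}{(W/L)} + \frac{\Delta R}{R}\right] - \Delta V_{TH} \tag{A.9}
\end{align}
Finally, recognizing that offset voltage is a statistical value, similar to noise, we can
square each side of the last equation to find the variance (the square of the standard
deviation) of the input referred offset voltage.
\begin{align}
V_{os,in}^2 = \frac{(V_{GS} - V_{TH})^2}{4}\left\{\left[\frac{\Delta(W/L)}{(W/L)}\right]^2 + \left[\frac{\Delta R}{R}\right]^2\right\} + \Delta V_{TH}^2 \tag{A.10}
\end{align}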
A.0.2 Velocity Saturation
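Following is the development assuming velocity saturation. We can quantify the
currents in M1 and M2 as:
\begin{align}
I &= \frac{\mu_n C_{ox}}{2}\,\frac{W}{L}\,(V_{GS} - V_{TH})\,V_{DSAT} \tag{A.11}\\
&= \frac{\mu_n C_{ox}}{2}\,\frac{W}{L}\,(V_{GS} - V_{TH})\,L E_{SAT} \tag{A.12}
\end{align}
Solving for $V_{GS}$:
\begin{align}
V_{GS} = \frac{2I}{\mu_n C_{ox} E_{SAT} W} + V_{TH}
\end{align}
With $V_{os,in} = V_{GS1} - V_{GS2}$, substituting for the appropriate parameters yields:
\begin{align}
V_{os,in} &= \frac{2I_1}{\mu_n C_{ox} E_{SAT} W_1} + V_{TH1} - \frac{2I_2}{\mu_n C_{ox} E_{SAT} W_2} - V_{TH2} \tag{A.13}\\
&= \frac{2}{\mu_n C_{ox} E_{SAT}}\left[\frac{I}{W} - \frac{I + \Delta I}{W + \Delta W}\right] - \Delta V_{TH} \tag{A.14}\\
&= \frac{2I}{\mu_n C_{ox} E_{SAT} W}\left[1 - \frac{1 + \Delta I/I}{1 + \Delta W/W}\right] - \Delta V_{TH} \tag{A.15}
\end{align}
To recast the above expression in terms of the transistor and resistor physical parameters,
we need to solve for $\Delta I/I$ in terms of $\Delta R/R$. From before, $R_1 = R$,
$R_2 = R + \Delta R$, $I_1 = I$ and $I_2 = I + \Delta I$. To satisfy the condition that $V_{out} = 0$,
$I_1 R_1 = I_2 R_2$:
\begin{align}
IR &= (R + \Delta R)\cdot(I + \Delta I) \tag{A.16}\\
&= IR + \Delta I\,R + I\,\Delta R + \Delta I\,\Delta R \tag{A.17}\\
&\approx IR + \Delta I\,R + I\,\Delta R \tag{A.18}
\end{align}
After solving for $\Delta I/I$ (which gives $\Delta I/I \approx -\Delta R/R$), assuming that the product
of two small quantities approaches zero and recognizing that
$\frac{2I}{\mu_n C_{ox} E_{SAT} W} = (V_{GS} - V_{TH})$, the following equation results:
\begin{align}
V_{os,in} = (V_{GS} - V_{TH})\left[\frac{\Delta W}{W} + \frac{\Delta R}{R}\right] - \Delta V_{TH} \tag{A.19}
\end{align}
Finally, recognizing that offset voltage is a statistical value, similar to noise, we
can square each side of the last equation to find the variance (the square of the standard
deviation) of the input referred offset voltage.
\begin{align}
V_{os,in}^2 = (V_{GS} - V_{TH})^2\left\{\left[\frac{\Delta W}{W}\right]^2 + \left[\frac{\Delta R}{R}\right]^2\right\} + \Delta V_{TH}^2 \tag{A.20}
\end{align}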
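To get a feel for the two results, the following Python sketch evaluates the input referred offset as a standard deviation, assuming the size, resistor and threshold mismatches are independent random variables. The squared-offset expressions used are the forms stated in the docstring (the square-law result carries a factor of 1/2 on the overdrive term, the velocity-saturation result does not). All numeric values (200mV overdrive, 1% size and resistor mismatch, 3mV threshold mismatch) are hypothetical and chosen only for illustration.

```python
import math


def offset_sigma(vov, sigma_size_rel, sigma_r_rel, sigma_vth, model="square_law"):
    """Standard deviation of the input referred offset voltage (volts).

    Implements
        sigma^2 = (k * vov)^2 * (sigma_size_rel^2 + sigma_r_rel^2) + sigma_vth^2
    with k = 1/2 for the square-law model and k = 1 for velocity saturation.

    vov            -- overdrive voltage VGS - VTH (V)
    sigma_size_rel -- relative device size mismatch (W/L or W), e.g. 0.01 for 1%
    sigma_r_rel    -- relative load resistor mismatch
    sigma_vth      -- threshold voltage mismatch (V)
    """
    if model == "square_law":
        k = 0.5
    elif model == "velocity_saturation":
        k = 1.0
    else:
        raise ValueError("model must be 'square_law' or 'velocity_saturation'")
    variance = (k * vov) ** 2 * (sigma_size_rel ** 2 + sigma_r_rel ** 2) + sigma_vth ** 2
    return math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical mismatch numbers, for illustration only.
    for model in ("square_law", "velocity_saturation"):
        s = offset_sigma(0.2, 0.01, 0.01, 3e-3, model)
        print(f"{model}: sigma(Vos,in) = {s * 1e3:.2f} mV")
```

As the appendix notes, a real device operates somewhere between the two regimes, so the two values can be read as rough lower and upper estimates for a given set of mismatch numbers.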
Appendix B
Circuit Design Details
B.1 ESD Design
It is essential to add ESD protection to all pins that interact with the outside world.
Static charge during the manufacturing, packaging and testing of the parts can destroy
sensitive circuitry at the outputs of the chip if ESD is not added. However, the ESD
circuitry adds capacitance that is directly proportional to the level of protection it
provides. Therefore, there is a direct conflict between high-speed operation and the
necessary ESD protection.
To achieve the greatest flexibility in the chip design, custom ESD cells were cre-
ated. A simplified schematic of the basic ESD structure used is shown in Figure B-1.
The ESD protection essentially forms a high-pass filter with a very high cut-off fre-
quency formed by the gate resistance and the capacitances at the drain of the NMOS
device. The transfer function can be approximated by:
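\begin{align}
Z_{in}(s) = \frac{1}{sC_{gd}}\left[\frac{1 + R_g(C_{gd} + C_{gs})s}{1 + g_m R_g + R_g C_{gd}s}\right] \approx \frac{1}{sC_{gd}} \tag{B.1}
\end{align}
[Schematic: I/O pad connected through a small series resistor (Rsmall) to the drain of an NMOS device of width W, with a poly gate resistor of 750Ω to ESD GND; Zin is the impedance seen from the pad]
Figure B-1: Simplified schematic of ESD circuitry used on all pads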
The level of ESD protection is controlled primarily by the device width, W. The
gate resistance, Rg, has limited impact based on analytical analysis and Hspice simulations.
As a rough rule of thumb, the structure provides 10V of protection per µm
of gate width. Additionally, the gate resistance should be sized somewhere between
300Ω and 1kΩ to achieve the desired level of protection. The small series resistor at
the drain of the NMOS is used in lieu of an unsilicided region and acts as a current limiter
to protect the NMOS device. Data signals are not affected by the ESD circuitry, other
than by the added capacitive load. Since ESD events are characterized by very short
pulses of charge injection, most of the energy in the pulse is at very high frequencies
and will be filtered by the ESD circuitry.
We can tolerate a significant amount of additional capacitance from the ESD
structures on all of the programming pins, the bias pins and the supply pins. Therefore,
the ESD for the control pins, bias pins and the supply pins was designed for
1kV of protection. The total gate width for these structures is 100µm, Rg was
set to 750Ω based on Hspice simulations and Rsmall is just 20Ω. The total capacitive
load due to the ESD and bond pad, extracted using StarRC, is 198fF. Specialized,
low capacitance ESD structures are required on both the RF input pads and the RF
output pads. The ESD for the RF I/O pads was designed for 200V of protection.
The total gate width for these structures is 40µm, Rg is 750Ω and Rsmall is 20Ω.
The bond pads measure 75µm x 75µm and are constructed of metal layers 3, 4 and
5. The total capacitive load due to ESD and the bond pad on the RF I/O's was also
extracted using StarRC and is just 185fF. The final pad and ESD layouts for the two
levels of protection are shown in Figure B-2.
[Left panel: ESD Structure and Pad w/ 1kV Protection. Right panel: RF ESD Structure and Pad w/ 200V Protection]
Figure B-2: Layouts for the two pads with ESD structures
Appendix C
CppSim Code
C.1 Limit Amplifier Code
The CppSim code to implement the limit amplifier is shown below. The transfer
function for the amplifier is separated into several stages. The two amplifier class
definitions, amp1 and amp2, are used to implement the non-linear transfer function
and limiting action. To model the amplifier stages without the neutralization
capacitors, filt1 and filt2 are used to model the bandwidth of the amplifier stages.
Alternately, filt3-filt6 are used to model the amplifier frequency characteristics with
the neutralization capacitors. The two differential paths of each stage of the limit
amplifier are implemented as a cascade of each of these classes.
module: amp1
parameters: double off double gain double min double max
inputs: double a double b
outputs: double y double z
classes:
Amp amp1("off/2+gain*a+A1*a^2+A2*a^3","off,gain,min,max,A1,A2",off,
gain,min,max,0.6,-3.3);
Amp amp2("-off/2+gain*b+B1*b^2+B2*b^3","off,gain,min,max,B1,B2",off,
gain,min,max,0.6,-3.3);
Filter filt1("1","1 + 1/(2*pi*fp)*s","fp,Ts,Min,Max",10e9,Ts,min,max);
Filter filt2("1","1 + 1/(2*pi*fp)*s","fp,Ts,Min,Max",10e9,Ts,min,max);
Filter filt3("1 + 1/(2*pi*fz)*s","1","fz,Ts",100e9,Ts);
Filter filt4("1 + 1/(2*pi*fz)*s","1","fz,Ts",100e9,Ts);
Filter filt5("1","1 + 1/(2*pi*fp)*s + 1/(fp*fz*(2*pi)^2)*s^2","fp,fz,
Ts,Min,Max",10e9,60e9,Ts,min,max);
Filter filt6("1","1 + 1/(2*pi*fp)*s + 1/(fp*fz*(2*pi)^2)*s^2","fp,fz,
Ts,Min,Max",10e9,60e9,Ts,min,max);
static_variables:
init:
code:
// Non-linear (3rd order polynomial) transfer function of each differential path
amp1.inp(a);
amp2.inp(b);
// Zero introduced by the neutralization capacitors
filt3.inp(amp1.out);
filt4.inp(amp2.out);
// Second-order response with output limiting
filt5.inp(filt3.out);
filt6.inp(filt4.out);
y = filt5.out;
z = filt6.out;
C.2 Peak Detector Code
The CppSim code used to implement the peak detector is shown below. Initially, the
code modeled the peak detector as a low pass filter when the input was high and
as a constant droop when the input was low. However, the final code models the
peak detector as a simple low-pass filter with unity gain. The entire peak detector
operation is performed by the filt1 and filt2 class definitions.
module: peakdetect
parameters: double fp
inputs: double in double inb
outputs: double out double outb
static_variables:
classes:
Filter filt1("K","1 + 1/(2*pi*fp)*s","K,fp,Ts",1,fp,Ts);
Filter filt2("K","1 + 1/(2*pi*fp)*s","K,fp,Ts",1,fp,Ts);
init:
out = 0.0;
outb = 0.0;
code:
// Unity-gain, single-pole low-pass filter on each differential path
filt1.inp(in);
out = filt1.out;
filt2.inp(inb);
outb = filt2.out;
C.3 Integrator Code
Similar to the CppSim code for the peak detector, the integrator CppSim code is
omitted because it merely implements a low-pass filter with a gain term and closely
resembles the peak detector code.
Appendix D
Optimal Gain/Stage for Maximum
Bandwidth
[Block diagram: Vin, followed by cascaded gain stages Av(1), Av(2), ..., Av(n-1), Av(n), to Vout]
Figure D-1: High-Speed, Multi-Stage Limit Amplifier
D.0.1 Determining Optimal Number of Stages
Following the derivation in Chapter 8 of [20], we can determine the optimal gain per
stage, or conversely, the optimal number of stages, of the limit amplifier to maximize
bandwidth for a given total amplifier gain. As before, if we model each stage of the
limit amplifier by a gain and a single pole, the model for the limit amplifier is:
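\begin{align}
\frac{V_{out}}{V_{in}} = \left[\frac{A}{1 + j\omega/\omega_0}\right]^n \tag{D.1}
\end{align}
where $A$ is the gain/stage, $\omega_1$ is the overall bandwidth of the limit amplifier, $\omega_0$ is the
bandwidth of each stage and $n$ is the number of stages. The overall 3dB bandwidth
of the amplifier is found from:
\begin{align}
\left|\frac{V_{out}}{V_{in}}\right|_{\omega = \omega_1} = \left|\frac{A}{1 + j\omega_1/\omega_0}\right|^n &= \frac{A^n}{\sqrt{2}} \tag{D.2}\\
\left(\frac{1}{\sqrt{1 + (\omega_1/\omega_0)^2}}\right)^n &= \frac{1}{\sqrt{2}} \tag{D.3}
\end{align}
If we solve for $\omega_1$ we can see that as the number of stages, $n$, increases the bandwidth
decreases much more slowly than the gain increases:
\begin{align}
\left(1 + (\omega_1/\omega_0)^2\right)^n &= 2 \tag{D.4}\\
\omega_1 &= \omega_0\sqrt{2^{1/n} - 1} \tag{D.5}
\end{align}
This means that we can increase the gain-bandwidth product as the number of stages
increases, to a limit. To determine what this limit is, let's assume that to first order
each individual stage has a constant gain-bandwidth product of:
\begin{align}
A\omega_0 = \omega_t \;\Rightarrow\; \omega_0 = \frac{\omega_t}{A} \tag{D.6}
\end{align}
Substituting Equation D.6 into Equation D.5 yields the following result:
\begin{align}
\omega_1 = \frac{\omega_t}{A}\sqrt{2^{1/n} - 1} \tag{D.7}
\end{align}
If we want to achieve a total gain $G$ with $n$ stages, where $A = G^{1/n}$, then Equation D.7
becomes:
\begin{align}
\omega_1 = \frac{\omega_t}{G^{1/n}}\sqrt{2^{1/n} - 1} \tag{D.8}
\end{align}
To determine the optimal gain per stage, recognize that:
\begin{align}
2^{1/n} = \exp\left(\frac{\ln 2}{n}\right) \tag{D.9}
\end{align}
which, for large $n$, can be rewritten as:
\begin{align}
\exp\left(\frac{\ln 2}{n}\right) \approx 1 + \frac{\ln 2}{n} \tag{D.10}
\end{align}
Substituting Equation D.10 into Equation D.8 and simplifying terms yields:
\begin{align}
\omega_1 &\approx \frac{\omega_t}{G^{1/n}}\sqrt{1 + \frac{\ln 2}{n} - 1} \tag{D.11}\\
&= \frac{\omega_t}{G^{1/n}}\sqrt{\frac{\ln 2}{n}} \tag{D.12}\\
&= \frac{\omega_t\sqrt{\ln 2}}{G^{1/n}\sqrt{n}} \tag{D.13}\\
\frac{\omega_1}{\omega_t} &\approx \frac{\sqrt{\ln 2}}{G^{1/n}\sqrt{n}} \tag{D.14}
\end{align}
Finally, determine the number of stages, $n$, to achieve the maximum total bandwidth
for a total gain by taking the derivative of Equation D.14 with respect to $n$ and setting
the result equal to zero to find the maximum.
\begin{align}
\frac{d}{dn}\left(\frac{\omega_1}{\omega_t}\right) &= 0 \tag{D.15}\\
\ln\left(G^{1/n}\right) = \frac{1}{2} \;&\Rightarrow\; G^{1/n} = e^{1/2} \tag{D.16}
\end{align}
Therefore, the optimal gain per stage, for maximum bandwidth, is approximately
1.65.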
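The result can be checked numerically. The Python sketch below evaluates the exact expression for the overall bandwidth of n identical single-pole stages, w1/wt = sqrt(2^(1/n) - 1) / G^(1/n), searches over integer stage counts, and compares the outcome with the large-n approximation n = 2 ln(G). The total gain of 100 used in the example is a hypothetical value chosen for illustration, and the function names are not from the thesis.

```python
import math


def bandwidth_ratio(n, total_gain):
    """Exact w1/wt for n identical single-pole stages with total gain G."""
    return math.sqrt(2.0 ** (1.0 / n) - 1.0) / total_gain ** (1.0 / n)


def best_stage_count(total_gain, n_max=50):
    """Integer number of stages that maximizes the overall bandwidth."""
    return max(range(1, n_max + 1), key=lambda n: bandwidth_ratio(n, total_gain))


def approx_optimal_stages(total_gain):
    """Large-n approximation: ln(G^(1/n)) = 1/2, so n = 2 ln(G)."""
    return 2.0 * math.log(total_gain)


if __name__ == "__main__":
    G = 100.0  # hypothetical total gain (40dB)
    n_exact = best_stage_count(G)
    n_approx = approx_optimal_stages(G)
    print(f"exact search: n = {n_exact}, gain/stage = {G ** (1.0 / n_exact):.3f}")
    print(f"approximation: n = {n_approx:.2f}, gain/stage = {G ** (1.0 / n_approx):.3f}")
```

The maximum is quite flat, so in practice the stage count is usually chosen somewhat below this optimum to save power at a small cost in bandwidth; the sketch only illustrates where the bandwidth peak lies.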
Appendix E
Matlab Amplifier Script
Following are two Matlab scripts based on the design methodology presented in Chap-
ter 4. The first version of the script accepts output voltage swing (Vsw), gain (Av),
desired bandwidth (BW), scaling ratio of loading stage to driving stage (scale) and
process and temperature corner (corner) as input arguments and returns device width
(W), resistance (R) and bias current (I) for the design. The second version of the
script accepts output voltage swing, gain, desired power dissipation, scaling ratio of
loading stage to driving stage and process and temperature corner as input arguments
and returns device width (W), resistance (R) and 3dB bandwidth (F) for the resulting
design.
E.1 Script for Fixed Bandwidth
This is the Matlab script for calculating transistor width, load resistance and amplifier
bias current for a given output voltage swing, gain and bandwidth.
% Script for generating device size and biasing for resistively
% loaded differential pair with National Semiconductor CMOS9
% 0.18um process. Inputs are output voltage swing (Vsw), DC
% gain (Av), bandwidth (BW), scaling factor (Scale) which represents
% the ratio of widths for the loading stage and the stage under test,
% and model corner (corner).
% Script returns the input pair width (WnE, assuming l=0.18um), bias
% current (IbiasE), load resistance (RlE), bandwidth (FbwoE), output
% swing (VswoE) and calculated gain (AvoE).
% Format for specifying corner: <model corner><temp>
% choices: s-40, s25, s85, t-40, t25, t85, f-40, f25, f85
% [WnE,RlE,IbiasE]=diffampnscBW(Vsw,Av,BW,Scale,corner)
% Alternatively...
% [WnE,RlE,IbiasE,FbwoE,VswoE,AvoE]=diffampnscBW(Vsw,Av,BW,Scale,corner)
function [WnE,RlE,IbiasE,FbwoE,VswoE,AvoE]=diffampnscBW(Vsw,Av,BW,Scale,corner)
% Process constants: NSC CMOS9 0.18um
% format is [t s f]
switch corner
  case 't-40'
    c = 1;
    file1 = 'spicefiles/test.sw4';
    file2 = 'spicefiles/test.ac4';
  case 't25'
    c = 1;
    file1 = 'spicefiles/test.sw5';
    file2 = 'spicefiles/test.ac5';
  case 't85'
    c = 1;
    file1 = 'spicefiles/test.sw6';
    file2 = 'spicefiles/test.ac6';
  case 's-40'
    c = 2;
    file1 = 'spicefiles/test.sw1';
    file2 = 'spicefiles/test.ac1';
  case 's25'
    c = 2;
    file1 = 'spicefiles/test.sw2';
    file2 = 'spicefiles/test.ac2';
  case 's85'
    c = 2;
    file1 = 'spicefiles/test.sw3';
    file2 = 'spicefiles/test.ac3';
  case 'f-40'
    c = 3;
    file1 = 'spicefiles/test.sw7';
    file2 = 'spicefiles/test.ac7';
  case 'f25'
    c = 3;
    file1 = 'spicefiles/test.sw8';
    file2 = 'spicefiles/test.ac8';
  case 'f85'
    c = 3;
    file1 = 'spicefiles/test.sw9';
    file2 = 'spicefiles/test.ac9';
end;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Design variables - NEED TO UPDATE PROCESS & CIRCUIT PARAMETERS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Hdin = 0.56e-6; % minimum s/d extension w/ sharing (m)
Hdout = 0.52e-6; % minimum s/d extension w/out sharing (m)
Rsh = 185; % sheet rho of pp-poly resistor (ohms/sq)
Lmin = 0.40e-6; % use minimum L devices (m)
CRp = 0.5*98.6e-6; % 1/2 of capacitance of np-poly (F/m^2)
Idenmax = 0.5e3; % maximum current den for poly res (A/m)
MinSq = 5; % Minimum # of squares for poly res
Cfix = 15e-15; % Assume fixed wiring cap. (F)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Empirical solution for device width - need to call Hspice
% and extract typical gm/W value for the given Av and Vsw
% Alternately, load characterization file
%hspsim                                  % file is test.par
x = loadsig(file1);                      % load Hspice dc data
gmSIM = 1.1*evalsig(x,'lx7_m0');         % simulated gm
gdsSIM = evalsig(x,'lx8_m0');            % simulated gds
IdSIM = evalsig(x,'i_m0');               % simulated Id
wSIM = 2*evalsig(x,'w');                 % width from char
y = loadsig(file2);                      % load Hspice ac data
cgSIM = 2/1.1*evalsig(y,'lx18_m0');      % cg for load device
cgdSIM = -2/1.1*evalsig(y,'lx19_m0');    % cgd for load device
cgsSIM = -2/1.1*evalsig(y,'lx20_m0');    % cgs for load device
cgbSIM = -2/1.1*evalsig(y,'lx21_m0');    % cgb for load device
cdbSIM = -2/1.1*evalsig(y,'lx22_m0');    % cdb for driving device
% Determine the appropriate current density and
% transconductance density from characterization
% data based on desired gain and voltage swing.
k = length(wSIM);
while (gmSIM(k)/wSIM(k)) > ...
      (2*Av/Vsw*(IdSIM(k)/wSIM(k))+Av*gdsSIM(k)/wSIM(k))
  k=k-1;
end;
gmDen = gmSIM(k+1)/wSIM(k+1);   % empirical gm/w
gdsDen = gdsSIM(k+1)/wSIM(k+1); % empirical gds/w
iDen = IdSIM(k+1)/wSIM(k+1);    % empirical ibias/w
cgDen = cgSIM(k+1)/wSIM(k+1);   % empirical cg/w
cgdDen = cgdSIM(k+1)/wSIM(k+1); % empirical cgd/w
cgsDen = cgsSIM(k+1)/wSIM(k+1); % empirical cgs/w
cgbDen = cgbSIM(k+1)/wSIM(k+1); % empirical cgb/w
cdbDen = cdbSIM(k+1)/wSIM(k+1); % empirical cdb/w
% Determine the circuit parameters based on the process
% characterization data, design specifications and the script
% These equations are for inputs - Vsw, Av, BW
WnE = 0.28e-6;
IbiasE = 2*iDen*WnE;
RlE = Vsw/IbiasE;
AvoE = gmDen*WnE*RlE/(1+gdsDen*WnE*RlE);
VswoE = IbiasE*RlE;
%% To ignore Miller effect use this calculation for load device
%% capacitance
%% for some strange reason NSC models have Cg = Cgs+Cgb (no Cgd)
%Cvar = WnE*(cdbDen+Scale*(cgsDen+cgbDen+cgdDen));
% To account for Miller effect, use this calculation
Cvar = WnE*(cdbDen+Scale*(cgsDen+cgbDen+(1+AvoE)*cgdDen));
% Need to guarantee that Nsq >= MinSq
% This code calculates resistor size based on satisfying
% minimum number of squares (Nsq) and current density
% requirements based on design rule recommendations for
% device matching.
% Nsq = VswoE/(IbiasE*Rsh);
% Width = 2e-6;
% if ((Nsq >= MinSq)&&(IbiasE/Width <= Idenmax))
% Length = Width*Nsq;
% else if ((Nsq >= MinSq)&&(IbiasE/Width > Idenmax))
% Width = IbiasE/Idenmax;
% Length = Nsq*Width;
% else if ((Nsq < MinSq)&&(IbiasE/Width <= Idenmax))
% Width = ceil(MinSq/Nsq)*2e-6;
% Length = Nsq*Width*ceil(MinSq/Nsq);
% else if ((Nsq < MinSq)&&(IbiasE/Width > Idenmax))
% Width = max(ceil(MinSq/Nsq)*2e-6, IbiasE/Idenmax);
% Length = Nsq*Width*ceil(MinSq/Nsq);
% end
% end
% end
% end
% This code calculates resistor size based on satisfying
% current density rules only and optimizes the size for
% speed. The total area is kept larger than 10um^2.
Nsq = VswoE/(IbiasE*Rsh);
Width = max(2e-6,IbiasE/Idenmax);
Length = Nsq*Width;
if (Width*Length < 10e-12)
ratio = sqrt(10e-12/Width/Length);
Width = Width*ratio;
Length = Length*ratio;
end
FbwoE = (2*pi*RlE/(1+gdsDen*WnE*RlE)*(Cfix+Cvar+CRp*Length*Width))^-1;
if (FbwoE<BW)
  while (FbwoE<BW)
    WnE = WnE+0.01e-6;
    IbiasE = 2*iDen*WnE;
    RlE = Vsw/IbiasE;
    AvoE = gmDen*WnE*RlE/(1+gdsDen*WnE*RlE);
    VswoE = IbiasE*RlE;
    %% To ignore Miller effect use this calculation for load
    %% device capacitance
    %% for some strange reason NSC models have Cg = Cgs+Cgb (no Cgd)
    %Cvar = WnE*(cdbDen+Scale*(cgsDen+cgbDen+cgdDen));
    %% To account for Miller effect, use this calculation
    Cvar = WnE*(cdbDen+Scale*(cgsDen+cgbDen+(1+AvoE)*cgdDen));
    Nsq = VswoE/(IbiasE*Rsh);
    Width = max(2e-6,IbiasE/Idenmax);
    Length = Nsq*Width;
    if (Width*Length < 10e-12)
      ratio = sqrt(10e-12/Width/Length);
      Width = Width*ratio;
      Length = Length*ratio;
    end
    FbwoE = (2*pi*RlE/(1+gdsDen*WnE*RlE)* ...
            (Cfix+Cvar+CRp*Length*Width))^-1;
    if WnE>700e-6
      FbwoE = 1e20;
      WnE = 0.01e-6;
    end;
  end;
  WnE = WnE-0.01e-6;
else
  FbwoE = 1e20;
  WnE = 0.01e-6;
end;
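The width sweep at the heart of this script can be sketched in a self-contained form as follows. This is a simplified illustration rather than the script itself: it replaces the Hspice characterization data with made-up constant densities, lumps all device capacitance into a single per-width term (cDen, a hypothetical name), and omits the Miller term and the resistor parasitic capacitance. Only Cfix, the width step, the starting width and the 700um limit are taken from the listing above.

```matlab
% Sketch of the fixed-bandwidth width sweep with made-up densities.
% The density values below are illustrative assumptions, not NSC CMOS9 data.
gmDen  = 600;       % transconductance per unit width (S/m), hypothetical
gdsDen = 40;        % output conductance per unit width (S/m), hypothetical
iDen   = 50;        % drain current per unit width (A/m), hypothetical
cDen   = 2e-9;      % lumped capacitance per unit width (F/m), hypothetical
Cfix   = 15e-15;    % fixed wiring capacitance (F), as in the script
Vsw    = 0.4;       % target output swing (V), hypothetical
BW     = 5e9;       % target 3dB bandwidth (Hz), hypothetical
step   = 0.01e-6;   % width increment (m), as in the script
Wmax   = 700e-6;    % stop searching beyond this width (m), as in the script

W     = 0.28e-6;    % starting width (m), as in the script
found = false;
while W <= Wmax
    Ibias = 2*iDen*W;                        % tail current for this width
    Rl    = Vsw/Ibias;                       % load that gives the target swing
    Rout  = Rl/(1 + gdsDen*W*Rl);            % load in parallel with 1/gds
    Fbw   = 1/(2*pi*Rout*(Cfix + cDen*W));   % single-pole 3dB bandwidth (Hz)
    if Fbw >= BW
        found = true;                        % smallest width meeting BW
        break;
    end
    W = W + step;
end
Av = gmDen*W*Rl/(1 + gdsDen*W*Rl);           % resulting DC gain
fprintf('found = %d, W = %.2f um, Fbw = %.2f GHz, Av = %.2f\n', ...
        found, W*1e6, Fbw/1e9, Av);
```

Because the bias current and the load resistance both track the width, the bandwidth rises with width only until the device capacitance dominates the fixed wiring capacitance. A target above that asymptote is never met, which is why an upper width limit is needed.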
E.2 Script for Fixed Power Dissipation
This is the Matlab script for calculating transistor width, load resistance and amplifier
bias current for a given output voltage swing, gain and power dissipation.
% Script for generating device size and biasing for resistively
% loaded differential pair with National Semiconductor CMOS9
% 0.18um process. Inputs are output voltage swing (Vsw), DC
% gain (Av), power (Ibias), scaling factor (Scale) which represents
% the ratio of widths for the loading stage and the stage under test,
% and model corner (corner).
% Script returns the input pair width (WnE, assuming l=0.18um), load
% resistance (RlE), bandwidth (FbwoE), bias current (IbiasE), output
% swing (VswoE) and calculated gain (AvoE).
% Format for specifying corner: <model corner><temp>
% choices: s-40, s25, s85, t-40, t25, t85, f-40, f25, f85
% [WnE,RlE,FbwoE,IbiasE]=diffampnscPWR(Vsw,Av,Ibias,Scale,corner)
% Alternatively...
% [WnE,RlE,FbwoE,IbiasE,VswoE,AvoE]=diffampnscPWR(Vsw,Av,Ibias,Scale,corner)
function [WnE,RlE,FbwoE,IbiasE,VswoE,AvoE]=diffampnscPWR(Vsw,Av,Ibias,Scale,corner)
% Process constants: NSC CMOS9 0.18um
% format is [t s f]
switch corner
  case 't-40'
    c = 1;
    file1 = 'spicefiles/test.sw4';
    file2 = 'spicefiles/test.ac4';
  case 't25'
    c = 1;
    file1 = 'spicefiles/test.sw5';
    file2 = 'spicefiles/test.ac5';
  case 't85'
    c = 1;
    file1 = 'spicefiles/test.sw6';
    file2 = 'spicefiles/test.ac6';
  case 's-40'
    c = 2;
    file1 = 'spicefiles/test.sw1';
    file2 = 'spicefiles/test.ac1';
  case 's25'
    c = 2;
    file1 = 'spicefiles/test.sw2';
    file2 = 'spicefiles/test.ac2';
  case 's85'
    c = 2;
    file1 = 'spicefiles/test.sw3';
    file2 = 'spicefiles/test.ac3';
  case 'f-40'
    c = 3;
    file1 = 'spicefiles/test.sw7';
    file2 = 'spicefiles/test.ac7';
  case 'f25'
    c = 3;
    file1 = 'spicefiles/test.sw8';
    file2 = 'spicefiles/test.ac8';
  case 'f85'
    c = 3;
    file1 = 'spicefiles/test.sw9';
    file2 = 'spicefiles/test.ac9';
end;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Design variables - NEED TO UPDATE PROCESS & CIRCUIT PARAMETERS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Hdin = 0.56e-6; % minimum s/d extension w/ sharing (m)
Hdout = 0.48e-6; % minimum s/d extension w/out sharing (m)
Rsh = 185; % sheet rho of pp-poly resistor (ohms/sq)
Lmin = 0.40e-6; % use minimum L devices (m)
CRp = 0.5*98.6e-6; % 1/2 of capacitance of np-poly (F/m^2)
Idenmax = 0.5e3; % maximum current den for poly res (A/m)
MinSq = 5; % Minimum # of squares for poly res
Cfix = 15e-15; % Assume fixed wiring cap. (F)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Empirical solution for device width - need to call Hspice
% and extract typical gm/W value for the given Av and Vsw
% Alternately, load characterization file
%hspsim                                  % file is test.par
x = loadsig(file1);                      % load Hspice dc data
gmSIM = 1.1*evalsig(x,'lx7_m0');         % simulated gm
gdsSIM = evalsig(x,'lx8_m0');            % simulated gds
IdSIM = evalsig(x,'i_m0');               % simulated Id
wSIM = 2*evalsig(x,'w');                 % width from char
y = loadsig(file2);                      % load Hspice ac data
cgSIM = 2/1.1*evalsig(y,'lx18_m0');      % cg for load device
cgdSIM = -2/1.1*evalsig(y,'lx19_m0');    % cgd for load device
cgsSIM = -2/1.1*evalsig(y,'lx20_m0');    % cgs for load device
cgbSIM = -2/1.1*evalsig(y,'lx21_m0');    % cgb for load device
cdbSIM = -2/1.1*evalsig(y,'lx22_m0');    % cdb for driving device
% Determine the appropriate current density and
% transconductance density from characterization
% data based on desired gain and voltage swing.
k = length(wSIM);
while (gmSIM(k)/wSIM(k)) > ...
      (2*Av/Vsw*(IdSIM(k)/wSIM(k))+Av*gdsSIM(k)/wSIM(k))
  k=k-1;
end;
gmDen = gmSIM(k+1)/wSIM(k+1);   % empirical gm/w
gdsDen = gdsSIM(k+1)/wSIM(k+1); % empirical gds/w
iDen = IdSIM(k+1)/wSIM(k+1);    % empirical ibias/w
cgDen = cgSIM(k+1)/wSIM(k+1);   % empirical cg/w
cgdDen = cgdSIM(k+1)/wSIM(k+1); % empirical cgd/w
cgsDen = cgsSIM(k+1)/wSIM(k+1); % empirical cgs/w
cgbDen = cgbSIM(k+1)/wSIM(k+1); % empirical cgb/w
cdbDen = cdbSIM(k+1)/wSIM(k+1); % empirical cdb/w
% Determine the circuit parameters based on the process
% characterization data, design specifications and the script
% These equations are for inputs - Vsw, Av, Ibias
IbiasE = Ibias;
RlE = Vsw/IbiasE;
VswoE = IbiasE*RlE;
WnE = 0.5*IbiasE/iDen;
AvoE = gmDen*WnE*RlE/(1+gdsDen*WnE*RlE);
%% To ignore Miller effect use this calculation for load device
%% capacitance
%% for some strange reason NSC models have Cg = Cgs+Cgb (no Cgd)
%Cvar = WnE*(cdbDen+Scale*(cgsDen+cgbDen+cgdDen));
%% To account for Miller effect, use this calculation
Cvar = WnE*(cdbDen+Scale*(cgsDen+cgbDen+(1+AvoE)*cgdDen));
% Need to guarantee that Nsq >= MinSq
% This code calculates resistor size based on satisfying
% minimum number of squares (Nsq) and current density
% requirements based on design rule recommendations for
% device matching.
% Nsq = VswoE/(IbiasE*Rsh);
% Width = 2e-6;
% if ((Nsq >= MinSq)&&(IbiasE/Width <= Idenmax))
% Length = Width*Nsq;
% else if ((Nsq >= MinSq)&&(IbiasE/Width > Idenmax))
% Width = IbiasE/Idenmax;
% Length = Nsq*Width;
% else if ((Nsq < MinSq)&&(IbiasE/Width <= Idenmax))
% Width = ceil(MinSq/Nsq)*2e-6;
% Length = Nsq*Width*ceil(MinSq/Nsq);
% else if ((Nsq < MinSq)&&(IbiasE/Width > Idenmax))
% Width = max(ceil(MinSq/Nsq)*2e-6, IbiasE/Idenmax);
% Length = Nsq*Width*ceil(MinSq/Nsq);
% end
% end
% end
% end
% This code calculates resistor size based on satisfying
% current density rules only and optimizes the size for
% speed. The total area is kept larger than 10um^2.
Nsq = VswoE/(IbiasE*Rsh);
Width = max(2e-6,IbiasE/Idenmax);
Length = Nsq*Width;
if (Width*Length < 10e-12)
ratio = sqrt(10e-12/Width/Length);
Width = Width*ratio;
Length = Length*ratio;
end
FbwoE = (2*pi*RlE/(1+gdsDen*WnE*RlE)* ...
        (Cfix+Cvar+CRp*Length*Width))^-1;
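The resistor sizing step shared by both scripts can be run on its own as sketched below. Rsh, Idenmax, the 2um minimum width and the 10um^2 area floor come from the listings. The swing and bias current are hypothetical inputs chosen only to exercise the area rule, and the names Wmin and Amin are introduced here for readability.

```matlab
% Sketch of the poly resistor sizing rule used in both scripts.
% Vsw and Ibias are hypothetical inputs chosen only for illustration.
Rsh     = 185;       % sheet resistance of the poly resistor (ohms/sq)
Idenmax = 0.5e3;     % maximum current density for the resistor (A/m)
Wmin    = 2e-6;      % minimum resistor width (m)
Amin    = 10e-12;    % minimum resistor area (m^2), i.e. 10 um^2
Vsw     = 0.4;       % hypothetical output swing (V)
Ibias   = 1e-3;      % hypothetical bias current (A)

Rl     = Vsw/Ibias;                  % load resistance (ohms)
Nsq    = Rl/Rsh;                     % number of squares needed
Width  = max(Wmin, Ibias/Idenmax);   % width set by the current density rule
Length = Nsq*Width;                  % length that gives Nsq squares
if Width*Length < Amin
    ratio  = sqrt(Amin/(Width*Length));  % scale both dimensions equally
    Width  = Width*ratio;
    Length = Length*ratio;
end
fprintf('R = %.0f ohm, Nsq = %.2f, W = %.2f um, L = %.2f um\n', ...
        Rl, Nsq, Width*1e6, Length*1e6);
```

Scaling width and length by the same ratio keeps the number of squares, and therefore the resistance, unchanged while bringing the area up to the floor.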
Bibliography
[1] Behzad Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill, New
York, 2001.
[2] C. C. Enz and G. C. Temes, "Circuit Techniques for Reducing the Effects of
Op-Amp Imperfections: Autozeroing, Correlated Double Sampling, and Chopper
Stabilization," Proceedings of the IEEE, vol. 84, no. 11, pp. 1584-1614, Nov.
1996.
[3] E. Sackinger and W. Fischer, "A 3-GHz 32-dB CMOS Limiting Amplifier for
SONET OC-48 Receivers," IEEE Journal of Solid State Circuits, vol. 35, no.
12, pp. 1884-1888, Dec. 2000.
[4] H. Rein, "Multi-Gigabit-Per-Second Silicon Bipolar IC's for Future Optical-Fiber
Transmission Systems," IEEE Journal of Solid State Circuits, vol. 23, no. 3, pp.
664-674, June 1988.
[5] S. Galal and B. Razavi, "10Gb/s Limiting Amplifier and Laser/Modulator Driver
in 0.18µm CMOS Technology," IEEE International Solid-State Circuits
Conference, 2003.
[6] AMCC, "S3455 SONET/SDH/ATM OC-48 4-Bit Transceiver with CDR,"
AMCC Device Specification, vol. Revision A, Jan. 2002.
[7] Sumitomo Electric Industries Ltd., "SDG2101/2102 OC-192 Optical Transceiver
Data Sheet," Sumitomo Electric Industries, Ltd. Specification TS-S00D027C,
Apr. 2001.
[8] R. Schmid, T.F. Meister, and H.-M. Rein, "SiGe Driver Circuit with High Output
Amplitude Operating up to 23Gb/s," IEEE Journal of Solid State Circuits, vol.
34, no. 6, pp. 886-891, June 1999.
[9] Y. Miyamoto, M. Yoneyama, T. Otsuji, K. Yonenaga, and N. Shimizu,
"40-Gbit/s TDM Transmission Technologies Based on Ultra-High-Speed IC's," IEEE
Journal of Solid State Circuits, vol. 34, no. 9, pp. 1246-1253, Sept. 1999.
[10] D. Lyon, "Elimination of DC Offset by MMSE Adaptive Equalizers," IEEE
Transactions on Communications, vol. 24, no. 9, pp. 1049-1052, Sept. 1976.
[11] M. Aiki, T. Tsuchiya, and M. Amemiya, "446 Mbit/s Integrated Optical
Repeater," Journal of Lightwave Technology, vol. 3, no. 2, pp. 392-399, Apr. 1985.
[12] A. Tanabe, M. Soda, Y. Nakahara, A. Furukawa, T. Tamura, and K. Yoshida, "A
Single Chip 2.4Gb/s CMOS Optical Receiver IC with Low Substrate Crosstalk
Preamplifier," International Solid State Circuits Conference, Feb. 1998.
[13] Paul R. Gray, Paul J. Hurst, Stephen H. Lewis, and Robert G. Meyer, Analysis
and Design of Analog Integrated Circuits, John Wiley and Sons, Inc., 2001.
[14] James K. Roberge, Operational Amplifiers: Theory and Practice, John Wiley
and Sons Inc., New York, 1975.
[15] Kenneth R. Laker and Willy M. Sansen, Design of Analog Integrated Circuits
and Systems, McGraw Hill, New York, 1994.
[16] Robert G. Meyer, "Low-Power Monolithic RF Peak Detector Analysis," IEEE
Journal of Solid State Circuits, vol. 30, no. 1, pp. 65-67, Jan. 1995.
[17] C.Y. Lau and M.H. Perrott, "Phase Locked Loop Design at the Transfer Function
Level Based on a Direct Closed Loop Realization Algorithm," Design Automation
Conference, no. 11, pp. 526-531, June 2003.
[18] E. Crain and M. H. Perrott, "A Numerical Design Approach For High Speed,
Differential, Resistor-Loaded, CMOS Amplifiers," in International Symposium
on Circuits and Systems, May 2004.
[19] H. Liu, A. Singhee, R.A. Rutenbar, and L.R. Carley, "Remembrance of Circuits
Past: Macromodeling by Data Mining in Large Analog Design Spaces," in Design
Automation Conference, June 2002, pp. 437-442.
[20] T.H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits, Cambridge
University Press, 1998.
[21] Jan M. Rabaey, Digital Integrated Circuits, A Design Perspective, Prentice Hall,
1996.
[22] Michael Perrott, 6.976 - High Speed Communication Circuits and Systems:
Lecture 6, MIT, Cambridge, MA, 2004.
[23] David Johns and Ken Martin, Analog Integrated Circuit Design, Wiley, 1997.
[24] Behzad Razavi, Design of Integrated Circuits for Optical Communications,
McGraw Hill, New York, 2003.
[25] K. Gulati and Hae-Seung Lee, "A high-swing CMOS telescopic operational
amplifier," IEEE Journal of Solid State Circuits, vol. 33, no. 12, pp. 2010-2019,
Dec. 1998.
[26] Sataporn Pornpromlikit, "PRBS Generator for High-Speed System Test,"
Internal MIT Documentation, Nov. 2003.
