Ultra-Wideband Transceiver with Error Correction for Cortical Interfaces in Nanometer CMOS Process by Luo, Yi
Utah State University 
DigitalCommons@USU 
All Graduate Theses and Dissertations Graduate Studies 
5-2017 
Recommended Citation 
Luo, Yi, "Ultra-Wideband Transceiver with Error Correction for Cortical Interfaces in Nanometer CMOS
Process" (2017). All Graduate Theses and Dissertations. 5859.
https://digitalcommons.usu.edu/etd/5859 
ULTRA-WIDEBAND TRANSCEIVER WITH ERROR CORRECTION FOR
CORTICAL INTERFACES IN NANOMETER CMOS PROCESS
by
Yi Luo
A dissertation submitted in partial fulfillment
of the requirements for the degree
of
DOCTOR OF PHILOSOPHY
in
Electrical Engineering
Approved:
Chris Winstead, Ph.D. Todd Moon, Ph.D.
Major Professor Committee Member
Sanghamitra Roy, Ph.D. Koushik Chakraborty, Ph.D.
Committee Member Committee Member
Xiaojun Qi, Ph.D. Mark R. McLellan, Ph.D.
Committee Member Vice President for Research and
Dean of the School of Graduate Studies
UTAH STATE UNIVERSITY
Logan, Utah
2017
Copyright © Yi Luo 2017
All Rights Reserved
ABSTRACT
Ultra-Wideband Transceiver with Error Correction for Cortical Interfaces in Nanometer
CMOS Process
by
Yi Luo, Doctor of Philosophy
Utah State University, 2017
Major Professor: Chris Winstead, Ph.D.
Department: Electrical and Computer Engineering
This dissertation reports a high-speed wideband wireless transmission solution that meets
the tight power constraints of cortical interface applications. The proposed system employs
the Impulse Radio Ultra-wideband (IR-UWB) technique to achieve very high-rate communication.
However, impulse radio signals suffer from significant attenuation within the body,
and power limitations force the use of very low-power receiver circuits, which introduce
additional noise and jitter. Moreover, the coils’ self-resonance has to be suppressed to
minimize pulse distortion and inter-symbol interference, which adds significant attenuation.
To compensate for these losses, an error correction code (ECC) layer is added to the system
so that it functions reliably. The performance is evaluated by modeling a pair of physically
fabricated coils, and the results show that the ECC is essential to the system’s
reliability.
Furthermore, the gm/ID methodology, which is based on a complete exploration of
all inversion regions in which the transistors can be biased, is studied and applied to optimize
the system at the circuit level. The focus is on the RF blocks: the low noise amplifier
(LNA) and the injection-locked voltage controlled oscillator (IL-VCO). By analytically
deriving the circuit’s characteristics as functions of gm/ID for each transistor,
it is possible to select the optimum operating region for the circuit to achieve the
target specification. Other circuit blocks, including the phase shifter, frequency divider,
and mixer, are also described and analyzed. The prototype is fabricated in a 65-nm CMOS
(Complementary Metal-Oxide-Semiconductor) process.
(152 pages)
PUBLIC ABSTRACT
Ultra-Wideband Transceiver with Error Correction for Cortical Interfaces in Nanometer
CMOS Process
Yi Luo
Wireless data transmission for cortical interfaces requires a minimum data rate of
27.0 Mbps for high-resolution visual functions within a very tight power constraint
(< 10 mW). Most published designs for this application employ narrowband techniques
and achieve throughputs ranging from several Mbps to tens of Mbps. This work
proposes a wideband solution using error-correction codes that can achieve a
throughput of more than 100 Mbps within a power budget of 10 mW.
To my wife, Yuzhe, and our adorable son, Evan
ACKNOWLEDGMENTS
Creating a Ph.D. thesis is not an individual experience; rather it takes place in a social
context and includes several persons, whom I would like to thank sincerely.
The most important acknowledgment of gratitude I wish to express is to my noble and
esteemed guide, Dr. Chris Winstead. Dr. Winstead is a great thinker and a man who never
tires of his passion for looking for new insights and ways to comprehend the reality around
us. This thesis covers a variety of areas, including communication and information theory
and analog/RF and digital circuitry, and requires many CAD tools. Dr. Winstead always
brought clarity of thought to the research goals and had the immense knowledge and
experience needed to guide me through the work. He supported me not only by providing a research
assistantship over almost seven years but also academically and emotionally through the
rough road to complete this thesis. Thanks to him I had the opportunity to fabricate the
chip. And during the most difficult times, he gave me the moral support and the freedom
I needed to move on. This dissertation would not have been possible without his expert
guidance, key insights, patience, and motivation.
I also would like to express my gratitude to my committee members: Dr. Todd Moon,
Dr. Sanghamitra Roy, Dr. Koushik Chakraborty and Dr. Xiaojun Qi, for their insightful
comments, encouragement and their time and effort in service to my doctoral committee
despite their already heavy loads of responsibility. In particular, Dr. Moon’s class gave me
a solid foundation in mathematical methods and information theory, while Dr. Roy’s
class gave me a basic background in digital VLSI. The discussions in and out of these courses
were always helpful and insightful.
My sincere thanks also go to Dr. Reyhan Baktur for her help during the coil fabrication
and test. Her advice and suggestions inspired my new ideas and helped me solve the
technical problems in my research efficiently. I would like to thank Dr. Ryan Gerdes for
allowing me to participate in part of his project, in which I learned many testing skills.
My graduate studies would not have been the same without the support and encour-
agement provided by all my colleagues and friends. Many thanks go to Gopalakrishnan
Sundararajan for his collaboration on this project. He optimized and laid out the digital
circuitry and maintained the CAD system to ensure the completion of the project on schedule.
I learned a great deal from the discussions with him. I would also like to thank Tasnuva
Tithi for her valuable investigation on the ADC design. Thank you to Taha Shahvirdi for
his hands-on help on the coil test. Besides, I also appreciated the friendship and encourage-
ment from Toribio David, Abiezer Tejeda, Eduardo Monzon, Magathi Jayaram and Peter
Beshay during the early period of this research.
My deepest appreciation also goes to the ECE Department and all of the staff members.
I am very thankful to Mary Lee Anderson for the invaluable support in helping me solve
various administrative matters; Scott Kimber for the immediate technical assistance and the
server system maintenance; Heidi Harper for the immediate help with the test equipment
and tools on so many occasions. You have contributed greatly in allowing me to meet paper
deadlines.
I would also like to thank my good friends, Yanyan Yang, Guanyu Ma and JiaWang
Gao for their unreserved help and encouragement. The time spent with them in Los
Angeles greatly helped me pull myself together and get back on my feet.
Most importantly, at a personal level, I would like to sincerely thank my family without
whose support the completion of this Ph.D. would not have been possible. I wish to thank
my beloved wife, Yuzhe Liang, for her unconditional support and sacrifice all these years.
Even in the most difficult times, she never gave up on me and continuously trusted, encouraged,
and helped me. Special appreciation goes to my parents and parents-in-law, who have
given up many things for me to be at USU and supported me whenever I needed it. Thanks
also to my little boy Evan, for being my everlasting source of joy and motivation.
Yi Luo
CONTENTS
ABSTRACT
PUBLIC ABSTRACT
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
1.1 Motivation
1.2 Two Approaches: Narrowband and Wideband Modulations
1.2.1 Narrowband Modulation
1.2.2 Wideband Modulation
1.3 Challenges on Wideband Modulation
1.4 Contributions and Outline
2 SYSTEM ARCHITECTURE AND ANALYSIS
2.1 The State-of-Art for IR-UWB Receivers
2.1.1 Coherent Pulse-Template Correlation Receiver
2.1.2 Non-Coherent Self-Correlation Receiver
2.1.3 Super-regenerative Receiver
2.1.4 Injection-locked Receiver
2.1.5 Comparison of the Four Architectures
2.2 Proposed System Architecture and Link Analysis
2.2.1 Proposed Architecture
2.2.2 Link Transfer Function
2.2.3 Link Impulse Response
2.3 Coil Fabrication and Measurement
2.3.1 Coil Fabrication
2.3.2 UWB Transmitter
2.3.3 Coil Measurement
2.4 Receiver Performance Evaluation
3 DESIGN METHODOLOGY FOR NANOMETER CMOS PROCESS
3.1 Introduction
3.2 MOSFET gm/ID Model
3.2.1 gm/ID Versus Normalized Current i = ID/(W/L)
3.2.2 fT Versus gm/ID
3.2.3 gds/ID Versus gm/ID
3.2.4 gmb/gm Versus gm/ID
3.2.5 Normalized Capacitance Cij Versus gm/ID
3.2.6 Data Acquisition Scheme
3.3 On-Chip Inductor gm/ID Model
4 UWB TRANSCEIVER CIRCUITS IMPLEMENTATION
4.1 Low Noise Amplifier
4.1.1 Brief Introduction
4.1.2 Gain-Boosted CG-LNA
4.1.3 Signal Modeling
4.1.4 gm/ID Design Verification
4.2 Injection-Locked VCO
4.2.1 Brief Introduction
4.2.2 Signal Modeling
4.2.3 gm/ID Design Verification
4.2.4 Digital Controlled Capacitor Bank
4.3 Polyphase Filter
4.4 Phase Shifter
4.5 Frequency Divider
4.5.1 Brief Introduction
4.5.2 Signal Modeling
4.6 Mixer
4.7 Error Correction Encoder and Decoder
5 CONCLUSIONS AND FUTURE WORK
APPENDIX
A Impulse Response Calculation for the Coils
CURRICULUM VITAE
LIST OF TABLES
1.1 NARROW BAND AND WIDEBAND RECEIVER COMPARISON
2.1 LRC VALUES AND COUPLING COEFFICIENT OF THE COIL
4.1 ADVANTAGE (✓) AND DISADVANTAGE (✗) OF LNA STAGES
4.2 MATLAB CALCULATIONS AND SPECTRERF SIMULATIONS COMPARISON FOR THE CG-LNA
4.3 MATLAB CALCULATIONS AND SPECTRERF SIMULATIONS COMPARISON FOR THE IL-VCO
5.1 AVERAGE POWER FOR EACH CIRCUIT BLOCK OF THE RECEIVER
LIST OF FIGURES
1.1 Cortical implant system for stimulating the visual cortex [1].
1.2 Receiver Data Rate vs. Power Consumption.
2.1 FCC spectrum mask for UWB indoor and outdoor communications [2].
2.2 Time domain and frequency domain mapping of carrier-based UWB signals with different pulse width. Carrier frequency fc = 4 GHz.
2.3 Time domain and frequency domain mapping of carrier-less UWB signals with different pulse width.
2.4 IR-UWB receiver architecture overview.
2.5 Receiver Data Rate vs. Power Consumption.
2.6 Data Rate vs. Energy/bit.
2.7 Proposed UWB transmission system: Transmitter, Inductive Coil Link, and IR-UWB Receiver (modified from [3]).
2.8 Time responses for carrier-based UWB pulse (input pulse width = 2 ns).
2.9 Time responses for carrier-less UWB pulse (input pulse width = 2 ns).
2.10 Bode plot of H12(s).
2.11 The UWB pulse generator schematic.
2.12 Photo of (a) UWB pulse generator; (b) Tx coil; (c) Rx coil.
2.13 Measured S-parameters of the fabricated coil.
2.14 Experimental setup for measuring time response of the inductive coil link between the beef tissue.
2.15 Time response measurement of the coils.
2.16 Correlation coefficients of 300 measured and simulated samples.
2.17 Measured power spectral density for both pulses from Tx and Rx.
2.18 (a) System performance when the jitter RMS is 35 ps, 40 ps, 45 ps; NF of LNA is constant at 5 dB; (b) System performance when the LNA NF = 4 dB, 5 dB, 6 dB; jitter RMS is constant at 40 ps; (c) System performance when jitter RMS is 60 ps, 80 ps; LNA NF = 5 dB; (d) Showing the quantities of performance improved by the ECC. The BER with ECC is unobservable when Vrxb = 24.7 mV and clock jitter < 55 ps.
3.1 Id versus Vds for different transistor length L. Vgs and W/L are constant at 0.6 V and 10, respectively.
3.2 ID versus VGS in logarithmic scale for an NMOS transistor with W/L = 10 µm/70 nm.
3.3 Comparison of square-law based method and gm/ID-based method [4].
3.4 gm/ID versus i for an NMOS transistor.
3.5 (a) NMOS transistor; (b) PMOS transistor; (c) MOSFET capacitances.
3.6 Layout of an NMOS transistor with W = 6 µm (nf = 4, Wn = 1.5 µm), L = 70 nm.
3.7 gm/ID versus i for different transistor width W.
3.8 gm/ID versus i for different transistor length L.
3.9 gm/ID versus i for different drain-source voltage.
3.10 fT versus gm/ID for NMOS and PMOS transistors.
3.11 gm/ID and fT versus overdrive voltage Vov.
3.12 The product of gm/ID and fT peaks in moderate inversion [4].
3.13 gds/ID versus gm/ID for different Vds.
3.14 gds/ID versus gm/ID for different transistor length L.
3.15 gmb/gm versus gm/ID for NMOS and PMOS transistor.
3.16 Normalized capacitance Cij versus gm/ID for NMOS transistor, ij = gs, gd, gb, bd, bs.
3.17 Normalized capacitance Cij versus gm/ID for PMOS transistor, ij = gs, gd, gb, bd, bs.
3.18 Test circuits to acquire gm/ID, gds/ID, gmb/gm and Cij.
3.19 AC analysis for (a) single-ended inductor and (b) differential inductor.
3.20 (a) Serial inductor modeling and (b) parallel inductor modeling.
3.21 Layout of a differential inductor with its parameters: coil width w, number of turns, and diameter d.
3.22 Series resistance, parallel resistance and quality factor (Q) versus L at f0 = 1.6 GHz.
4.1 Block diagram of the UWB transceiver prototype.
4.2 Basic LNA stage: (a) CS-LNA; (b) CG-LNA.
4.3 Gain-boosting basic idea: (a) conventional CG-LNA; (b) gain-boosted CG-LNA.
4.4 Complete gain-boosted CG-LNA.
4.5 Effective transconductance calculation of capacitive cross-coupled CG-LNA.
4.6 Equivalent half circuit model of the CG-LNA.
4.7 Small signal model for gain-boosted CG-LNA.
4.8 Power gain calculation.
4.9 SpectreRF simulation results of S11, S22, power gain, and noise figure (NF) for the three CG-LNA designs.
4.10 Voltage gain when Vtune_LNA = 0 V and Vtune_LNA = 1.2 V.
4.11 Injection locking phenomenon: (a) free-running oscillator; (b) injection-locked oscillator.
4.12 Injection-locking phenomenon: (a) first-harmonic injection locking; (b) super-harmonic injection locking; (c) sub-harmonic injection locking.
4.13 Injection-locked VCO.
4.14 Cross-coupled NMOS small signal analysis.
4.15 Complete VCO Core small signal model.
4.16 Small signal model for cascode structure.
4.17 Complete small signal model of IL-VCO.
4.18 IL-VCO free-running status: transient signal and coarse tunable frequencies from b3b2b1b0 = 0000 to b3b2b1b0 = 1111.
4.19 IL-VCO running status with injected signal.
4.20 IL-VCO locking range 1.52-1.66 GHz.
4.21 Capacitor bank.
4.22 Quadrature generation by divide-by-2 flip-flops.
4.23 Basic diagram for quadrature VCO.
4.24 RC-CR polyphase filter.
4.25 Layout sketch for the polyphase filter.
4.26 Phase shifter working principle.
4.27 Complete phase shifter circuit.
4.28 Phase shift in one quadrature: SI = 0, SQ = 0 ($2^7 = 128$ in total).
4.29 Regenerative (Miller) divider [5].
4.30 Conventional divide-by-16 CML frequency divider.
4.31 Conventional LC-Type frequency divider [6].
4.32 Modified divide-by-16 frequency divider.
4.33 Modified resettable CML circuit.
4.34 Timing diagram of the divide-by-16 frequency divider.
4.35 Zoomed-in timing diagrams: (a) minimum requirement for Tck; (b) maximum requirement for Tck.
4.36 Small signal model to calculate RL and CL for the CML circuit.
4.37 Buffer: a complementary self-biased differential amplifier (CSDA) [7].
4.38 Output clock from the frequency divider.
4.39 Average power for the frequency divider.
4.40 Complete mixer circuit.
4.41 Trellis diagram for the encoder with constraint length K = 3, generator polynomial $G(x) = \begin{bmatrix} 1 + x^2 & 1 + x + x^2 \end{bmatrix}$.
4.42 Block diagram of the Viterbi decoder.
CHAPTER 1
INTRODUCTION
Motivated by the need for wireless data transmission in cortical interfaces, this
dissertation asserts that Impulse Radio Ultra-wideband (IR-UWB) signaling with error
correction is a viable solution for short-distance, high-data-rate, low-cost,
highly integrated, low-power radio communication. Existing narrowband transceivers are
mostly one or more orders of magnitude away from the transmission speed target for cortical
interfaces. IR-UWB systems are widely considered a high-speed, low-power solution for
short-distance applications; however, the power of existing designs is still outside the
power constraint for cortical interfaces. The goal of this dissertation is to identify and
specify a candidate architecture for low-power (< 10 mW), high-speed (> 100 Mbps),
short-distance (< 10 mm) applications, especially cortical interfaces. The system has been
fabricated in a 65-nm CMOS (Complementary Metal-Oxide-Semiconductor) process.
1.1 Motivation
There is a continuing demand for high-speed, very low-power communication
systems for bio-implantable electronic devices. One area of significant interest
is cortical interfaces, which are used to stimulate the brain’s cortex directly. Visual cortical
prosthetics are a class of cortical interfaces that restore functional vision in those suffering
from partial or total blindness [8].
An implant system for stimulating the visual cortex is illustrated in Fig. 1.1 [1]. In this
system, image data is captured by a camera integrated into eyeglass frames, then trans-
mitted through an inductive transcutaneous link to an array of microelectrode stimulators
implanted in the visual cortex. Direct cortical stimulation results in the perception of an
image.
Fig. 1.1: Cortical implant system for stimulating the visual cortex [1]. (Labels: optic nerve, retina, eyeglass frame, visual cortex, receiver, transmitter.)
The system requires a high rate of data transmission to interface with a large array
of stimulating sites. Weiland and Humayun showed that at least 1000 pixels are required
to restore important visual functions such as face recognition and reading [9]. In such
a visual implant, a resolution of 32×32 (1024) pixels, for example, requires 10 bits for
addressing. If 8 bits are used for pulse amplitude control and 4 bits for polarity, parity
checking, and sequencing, the total word length is 22 bits. Assuming that each
channel is driven at a frequency of, say, 200 Hz (for physiological reasons [10]),
the required data rate is at least (4 words per biphasic pulse)×22 (bits/word)×1024
(channels)×200 (pulses/channel/s), or 18.0 Mbps (megabits per second) [10]. If a refresh rate
of 300 Hz is required (for high-resolution visual functions), the minimum data rate
increases to 27.0 Mbps. Thus, transceivers with data rates on the order of tens of Mbps or
higher can be expected to be in high demand in the future.
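The data-rate estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch using only the figures quoted in the text; the constant and function names are illustrative and do not come from the cited references.

```python
# Word layout taken from the text: 10 address bits (32 x 32 = 1024 sites),
# 8 amplitude bits, and 4 bits for polarity, parity checking, and sequencing.
ADDRESS_BITS = 10
AMPLITUDE_BITS = 8
CONTROL_BITS = 4
WORDS_PER_BIPHASIC_PULSE = 4
CHANNELS = 32 * 32


def required_data_rate_bps(pulse_rate_hz):
    """Minimum forward-link data rate in bit/s for a given per-channel pulse rate."""
    word_bits = ADDRESS_BITS + AMPLITUDE_BITS + CONTROL_BITS  # 22 bits per word
    return WORDS_PER_BIPHASIC_PULSE * word_bits * CHANNELS * pulse_rate_hz


for rate_hz in (200, 300):
    mbps = required_data_rate_bps(rate_hz) / 1e6
    print(f"{rate_hz} Hz per channel -> {mbps:.1f} Mbps")
```

Because the rate scales linearly with both the channel count and the pulse rate, doubling either one doubles the required throughput.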
To guarantee safe long-term use of bio-implantable devices, severe power constraints
must be satisfied. The power density must be no greater than 0.8 mW/mm² to avoid
tissue damage through heating effects [11]. For example, Lazzi showed that a chip with a
power of 12.4 mW and dimensions of 4×4×0.5 mm would induce a temperature increase of
approximately 0.8 °C on the surface of the chip when positioned in the center of a human
eyeball [12]. Generally, 10 mW is considered the safe upper limit for the overall system,
since a temperature rise of more than 2 °C would damage the surrounding neurons [8].
It has proven difficult to design wireless receiver systems that meet this power constraint
while simultaneously delivering high data rates on the order of tens or even hundreds of
Mbps.
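As a rough consistency check, the power density of the example chip can be compared with the 0.8 mW/mm² limit. The text does not state which surface area the limit refers to, so the sketch below evaluates two assumed cases: dissipation through a single 4×4 mm face, and dissipation spread evenly over all six faces. Both cases are illustrative assumptions, not the method of [11] or [12].

```python
POWER_MW = 12.4                                    # chip power quoted in the text
LENGTH_MM, WIDTH_MM, THICKNESS_MM = 4.0, 4.0, 0.5  # chip dimensions quoted in the text
LIMIT_MW_PER_MM2 = 0.8                             # power-density limit quoted in the text


def power_density_mw_per_mm2(power_mw, area_mm2):
    """Average power density over the given surface area."""
    return power_mw / area_mm2


# Assumption 1: heat leaves through one 4 x 4 mm face only.
one_face_mm2 = LENGTH_MM * WIDTH_MM
# Assumption 2: heat is spread evenly over all six faces of the chip.
all_faces_mm2 = 2.0 * (LENGTH_MM * WIDTH_MM
                       + LENGTH_MM * THICKNESS_MM
                       + WIDTH_MM * THICKNESS_MM)

for label, area in (("one face", one_face_mm2), ("all faces", all_faces_mm2)):
    density = power_density_mw_per_mm2(POWER_MW, area)
    print(f"{label}: {density:.3f} mW/mm^2 (limit {LIMIT_MW_PER_MM2} mW/mm^2)")
```

Under either assumption the example chip stays below the quoted limit, with the single-face case coming close to it.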
1.2 Two Approaches: Narrowband and Wideband Modulations
1.2.1 Narrowband Modulation
There are two modulation approaches for cortical interface applications: narrowband
and wideband. In the available literature, most designs are optimized for narrowband
modulation, delivering high-amplitude signals to the implanted receiver by utilizing the
harmonic resonance of coil interfaces operating in the near-field domain [13]. In such
solutions, a high quality factor (Q) is required to increase the selectivity of the coil. Thus a
great number of narrowband techniques deploying high-Q coils have been reported.
In 2004 Ghovanloo et al. demonstrated a Frequency-Shift Keying (FSK) system op-
erating at 2.5 Mbps with 0.38 mW power consumption [14]. In 2007 Harrison described a
bi-directional Amplitude-Shift Keying (ASK) system in which the forward link operated at
6.5 kbps (kilobits per second) and the reverse link operated at 330 kbps, with a total power
consumption of 13.5 mW [15]. Harrison’s system was subsequently improved upon and used
to implement a 100-site stimulation system [16].
More recent designs placed a greater emphasis on high-speed transmission. In 2007
Coulombe et al. described an ASK system achieving 1.5 Mbps at 0.9 mW [17]. In 2008
Mandal et al. described a Load-Shift Keying (LSK) system operating at 4 Mbps with 2.5 mW
power consumption [18]. Zhou et al. demonstrated a Phase-Shift Keying (PSK) system at
2 Mbps while consuming 6.2 mW [19], and Luo et al. demonstrated a Binary Phase-Shift
Keying (BPSK) design operating at 20 kbps and consuming 3 mW [20]. In 2012, Nabovati
et al. described a high-speed BPSK demodulator capable of transmitting one bit per carrier
cycle. The design achieved 16 Mbps with a carrier frequency of 16 MHz, while consuming
only 27 µW in simulation [21]. In 2015, Zgaren et al. presented an FSK receiver operating at 8 Mbps
while consuming only 0.64 mW [22].
In 2011, a speed/power of 10.2 Mbps/3 mW was demonstrated by Inanlou et al. using
an impulse-based wideband solution known as Pulse Harmonic Modulation (PHM) [23,24].
In this technique, two pulses with specific amplitudes and time delays are transmitted every
bit period through a high-Q coil link whose resonance frequency is centered at 67.5 MHz. The
first pulse generates a decaying oscillation at the harmonic frequency of the coil, while the
second pulse, sent after some delay, produces a smaller oscillation that is opposite in phase
to the first. The inter-symbol interference (ISI) is thus minimized, and a high data
rate is reached without reducing the coils’ quality factor.
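The cancellation principle behind PHM can be illustrated with an idealised model. The sketch below assumes the coil link behaves as a second-order resonator with an exponentially decaying sinusoidal impulse response, and that the second pulse is delayed by an odd number of half periods and scaled to match the decayed envelope of the first. The quality factor, the delay choice, and the function names are hypothetical; this is not the circuit or the timing used in [23,24].

```python
import math

F0_HZ = 67.5e6                 # coil resonance frequency quoted in the text
Q = 10.0                       # hypothetical quality factor, for illustration only
ALPHA = math.pi * F0_HZ / Q    # envelope decay rate of a second-order resonator


def ringing(t):
    """Idealised link response to a unit impulse applied at t = 0."""
    if t < 0.0:
        return 0.0
    return math.exp(-ALPHA * t) * math.sin(2.0 * math.pi * F0_HZ * t)


def phm_response(t, half_periods=1):
    """Response to two pulses; half_periods must be odd for the ringing to cancel."""
    delay = half_periods / (2.0 * F0_HZ)
    scale = math.exp(-ALPHA * delay)   # second pulse is smaller than the first
    return ringing(t) + scale * ringing(t - delay)


# Compare the residual ringing after the second pulse with and without cancellation.
delay_s = 1 / (2 * F0_HZ)
times = [delay_s * (1.0 + 0.01 * k) for k in range(1, 400)]
single = max(abs(ringing(t)) for t in times)
paired = max(abs(phm_response(t)) for t in times)
print(f"peak ringing after one pulse:  {single:.3f}")
print(f"peak ringing after two pulses: {paired:.2e}")
```

In this idealised model the two oscillations cancel exactly after the second pulse, which is why the next bit can be sent without waiting for the high-Q ringing to decay.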
In 2013, an improved PHM receiver was proposed by Kiani et al.
with an additional automatic gain control (AGC) mechanism; it reached a speed/power
of 20 Mbps/0.24 mW [25]. To the authors’ knowledge, this system has the highest throughput
and is the most efficient cortical interface receiver reported to date within the narrowband
modulation category. Like most cortical interface designs, the system is a narrowband
solution that benefits from resonance in the coil interface. It is unlikely that this method,
or any narrowband method, can be directly applied at much higher carrier frequencies,
because the coils’ resonant frequency can only be controlled within a limited range.
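One way to compare the designs surveyed above is energy per bit, that is, power divided by data rate. The sketch below uses only the rate and power pairs quoted in this section, and treats each quoted power as the figure relevant to the receiver, which is an assumption. The Nabovati entry is a simulation result and is therefore not directly comparable with measured designs. The last entry is the design target stated in the introduction (10 mW at 100 Mbps).

```python
# (label, data rate in bit/s, power in W) as quoted in this section.
REPORTED = [
    ("Ghovanloo 2004, FSK", 2.5e6, 0.38e-3),
    ("Coulombe 2007, ASK", 1.5e6, 0.9e-3),
    ("Mandal 2008, LSK", 4e6, 2.5e-3),
    ("Zhou, PSK", 2e6, 6.2e-3),
    ("Nabovati 2012, BPSK (simulated)", 16e6, 27e-6),
    ("Zgaren 2015, FSK", 8e6, 0.64e-3),
    ("Inanlou 2011, PHM", 10.2e6, 3e-3),
    ("Kiani 2013, PHM with AGC", 20e6, 0.24e-3),
    ("Target of this work (upper bound)", 100e6, 10e-3),
]


def energy_per_bit_pj(rate_bps, power_w):
    """Energy spent per received bit, in picojoules."""
    return power_w / rate_bps * 1e12


for label, rate_bps, power_w in REPORTED:
    pj = energy_per_bit_pj(rate_bps, power_w)
    print(f"{label:36s} {rate_bps / 1e6:7.2f} Mbps  {pj:9.1f} pJ/bit")
```

The comparison shows the trade-off between throughput and efficiency that the dissertation addresses: the 100 Mbps target corresponds to a budget of at most 100 pJ/bit.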
1.2.2 Wideband Modulation
Because the narrowband approach is limited in transmission speed, researchers have
begun to consider wideband modulations. Kiani et al. proposed a Pulse Delay Modulation
(PDM) technique that can offer a wide bandwidth for simultaneous data and power transmission
across inductive links [26,27]. To transmit each bit, a pattern of narrow pulses is
generated at the frequency of the power carrier across the transmitter (Tx) data coil
with specific time delays to initiate decaying ringing across the tuned receiver (Rx) data
coil [27]. However, the data rate is limited by the low frequency of the power carrier: a rate
of 13.56 Mbps was reported in [27], with the receiver side consuming 2.2 mW.
Since the FCC (Federal Communications Commission) deregulated the use of Ultra-Wideband
(UWB) in 2002 [28], UWB has become popular for short-range, ultra-low-power
applications such as contact-less chip testing [29] and body area networks (BAN)
[30, 31]. IR-UWB modulation was proposed in 2007 by Charles [32, 33] as a promising
candidate for biomedical devices, due to its inherently high transmission speed (up to
1 Gbps [34]), low power consumption, and low complexity. However, an additional Clock Data
Recovery (CDR) circuit is necessary because of the blind synchronization at the receiver.
Comparatively little work has been done on realizing UWB solutions for cortical
interfaces. In 2010, Jow et al. demonstrated a UWB solution for back telemetry applications
(transmission from internal to external), but only the design methodology for the link coils
was presented [35]. In 2012, the authors described a system model for an implantable UWB
receiver for cortical stimulators [3].
1.3 Challenges on Wideband Modulation
There are three challenges that may limit the application of UWB modulation for
forward transmission (from external to internal) in cortical interfaces. These challenges
are described in the following paragraphs. The contributions of this dissertation are to
address these challenges, to demonstrate feasible solutions for realizing the target system,
and to quantify trade-offs for optimizing the UWB forward link.
Coil implementation challenge
In an IR-UWB system, the data is modulated as a series of short pulses in time (less
than 2 ns), whose center frequency is usually hundreds of MHz or on the order of GHz,
with a bandwidth greater than 500 MHz. To transmit such high-frequency signals, the
self-resonance frequency (SRF) of both the Tx and Rx coils must be kept high so that the
high-frequency components of such narrow pulses can pass through the inductive link,
minimizing pulse distortion and ISI. Obtaining an SRF in the hundreds of MHz or even the
GHz range for implanted coils is quite challenging because the inductance, dimensions, and
separation of on-board [36, 37] or handmade [38] coils are much larger than those of the
on-chip coils used in chip-to-chip communication, while their parasitic resistance is much
lower [29]. Also, the high conductivity of the tissue increases the parasitic capacitance
around the coils, which decreases the SRF (since $\mathrm{SRF} = 1/(2\pi\sqrt{LC})$, where L and
C are the inductance and capacitance of the coil, respectively). Jow and Ghovanloo reported
vertical and figure-8 data coils built with thin multistrand Litz wire (diameter = 100 µm)
and standard FR4 PCB (1.5 mm thick), which brought the coils’ SRF up to 98.8 MHz and
256 MHz [39], well below the UWB range.
Very low Signal-to-Noise Ratio (SNR)
One possible solution to the bandwidth limitation is to reduce the quality factor (Q)
of the coil by adding series or parallel resistors so that the transfer function is flat
across the entire UWB frequency range. With this approach, however, the amplitude of the
received signal is reduced significantly because of the coils’ low Q, and the receiver’s
selectivity is degraded as well. As a result, the SNR at the Rx side is significantly
reduced, possibly leading to unacceptable performance.
Moreover, high-frequency signals are strongly absorbed in vivo and would be “blocked”
by the human skull and conductive tissues [40–45], further decreasing the system’s SNR.
Experiments have shown that the path loss (power attenuation of an electromagnetic wave)
for an IR-UWB signal in the 3.5-4.5 GHz band varies between −20 dB and −30 dB for an
in-body propagation distance of approximately 10 mm [45].
High power consumption
Typical IR-UWB receivers contain a low-noise amplifier (LNA), a multiplier, an
analog-to-digital converter (ADC), and synchronization circuitry, which together consume
tens or hundreds of mW [46–61]. While these receiver circuits may be considered “low power”
in many application domains, their power consumption is still much higher than the 10 mW
constraint specified for bio-implantable electronics.
Data for UWB radios as well as narrowband radios for bio-implantable devices are
plotted in Fig. 1.2 and listed in Table 1.1. The plot shows receiver data throughput
against power. The shaded region is our final target for the full receiver for cortical
interfaces: data rate > 100 Mbps with power < 10 mW. An interesting observation from this
plot is that almost all of the reported designs fall short of the target specification, by
up to an order of magnitude. The narrowband solutions stay in the low-power region but are
limited to low data rates. The wideband solutions achieve high data throughput at the cost
of high power consumption. Interesting exceptions are the designs in [56, 59, 62], which
display admirable speed/power efficiency. The low-complexity receiver in [62] consists of
only an LNA, a down-conversion mixer, and a buffer, leading to a very low power of 5 mW
when operating at 100 Mbps. The design in [59] does not include an LNA, and the one in [56]
has the disadvantage of a large frequency offset. None of these three designs, however,
includes a data synchronization mechanism.
Fig. 1.2: Receiver Data Rate vs. Power Consumption. (Log-log plot of power consumption in
mW versus data rate in Mbps for narrowband and wideband receivers; the designs in [56], [59],
and [62] are labeled.)
1.4 Contributions and Outline
The goal of this research is the exploration, design, implementation, and demonstration
of a highly integrated, low-power, impulse UWB transceiver for high-rate data transmission
in cortical interfaces. This research focuses on the complete transceiver and attempts to
quantify the trade-offs between system performance and implementation. The maximum
power target for the receiver is set at 10 mW at a speed of 100 Mbps. A 65-nm CMOS
process is chosen for implementation because it allows full integration of the digital
logic with the analog processing blocks at a very low power cost.
In order to achieve a single-chip solution with low power consumption, the following
approach is proposed:
• Investigation of ultra-wideband communication and determination of feasibility.
• Exploration, design, and fabrication of inductive, low-quality factor coil data link.
• Identification, modeling, and evaluation of the proposed system architecture.
• Identification of low power design techniques (gm/ID methodology) for the circuits
implemented in a 65-nm CMOS process.
The remainder of this dissertation is organized as follows. Chapter 2 presents the state
of the art of UWB systems, followed by the proposed system model and performance
evaluation. Chapter 3 details the gm/ID circuit design methodology and constructs a
database for circuit optimization. Chapter 4 discusses the circuit-level design and
power/performance optimization using the gm/ID methodology. Chapter 5 summarizes
the main contributions of this dissertation.
Table 1.1: NARROW BAND AND WIDEBAND RECEIVER COMPARISON
Reference                       Data Rate (Mbps)  Power (mW)  Comm. Scheme  Data Synch ∗
Donnell et al. (2006) [46]      10                1.8         UWB           No
Zheng et al. (2006) [49]        200               81          UWB           No
Ali et al. (2008) [47]          1000              98          UWB           No
Barras et al. (2009) [48]       5                 42.4        UWB           Yes
Zhou et al. (2009) [50]         2000              145.8       UWB           No
Zhang et al. (2009) [51]        100               156         UWB           No
Michael et al. (2009) [57]      10                11.2        UWB           No
Nick et al. (2010) [52]         39                6.2         UWB           Yes
Denis et al. (2010) [55]        16                11          UWB           Yes
Pelissier et al. (2010) [59]    112               5.4         UWB           No
Zhuo et al. (2011) [53]         33                16.3        UWB           No
Marco et al. (2011) [54]        1                 4.1         UWB           Yes
Xia et al. (2011) [56]          100               13.2        UWB           No
Prakash et al. (2011) [58]      10                10.8        UWB           No
Hu et al. (2011) [60]           500               45          UWB           Yes
Gambini et al. (2012) [63]      1                 0.29        UWB           Yes
Hu et al. (2013) [64]           250               47.5        UWB           Yes
Wang et al. (2014) [65]         1                 0.1         UWB           Yes
Rezaei et al. (2016) [62]       100               5           UWB           No
Brenna et al. (2016) [66]       20                0.965       UWB           Yes
Ghovanloo et al. (2004) [14]    2.5               0.38        Narrowband    Yes
Ghovanloo et al. (2007) [67]    2.5               8.25        Narrowband    Yes
Harrison et al. (2007) [15]     0.3               13.5        Narrowband    Yes
Sawan et al. (2007) [17]        1.5               0.9         Narrowband    Yes
Mandal et al. (2008) [18]       4                 2.5         Narrowband    Yes
Zhou et al. (2008) [19]         2                 6.2         Narrowband    Yes
Luo et al. (2008) [20]          0.02              3           Narrowband    Yes
Chen et al. (2010) [68]         2                 5.7         Narrowband    Yes
Inanlou et al. (2011) [23,24]   10.2              3           Narrowband    Yes
Nabovati et al. (2012) [21]     16                0.027       Narrowband    Yes
Chou et al. (2013) [69]         4                 0.27        Narrowband    Yes
Kiani et al. (2013) [25]        20                0.24        Narrowband    Yes
Wilkerson et al. (2013) [70]    1                 0.082       Narrowband    Yes
Tan et al. (2014) [71]          0.1               0.78        Narrowband    Yes
Karimi et al. (2014) [72]       10                0.078       Narrowband    Yes
Zgaren et al. (2015) [22]       8                 0.64        Narrowband    Yes
Ba et al. (2015) [73]           4.5               1.59        Narrowband    Yes
Hsieh et al. (2016) [74]        0.12              0.35        Narrowband    Yes
∗ Yes: data reported with synchronization; No: data reported without synchronization.
CHAPTER 2
SYSTEM ARCHITECTURE AND ANALYSIS
2.1 The State-of-Art for IR-UWB Receivers
Every wireless communication system designer would be pleased to have more data
modulation bandwidth, because more bandwidth offers a higher data throughput according
to Shannon’s equation [123]:
$$C = B \log_2\left(1 + \frac{S}{N}\right) \qquad (2.1)$$
where C is the maximum channel capacity in bits/s, B is the channel bandwidth in
Hz, S is the signal power in watts, and N is the noise power, also in watts. This equation
tells us that there are three ways to improve the capacity of a channel: increase the
bandwidth, increase the signal power, or decrease the noise power. The ratio S/N is known
as the signal-to-noise ratio (SNR) of the channel. The capacity of a channel grows linearly
with the bandwidth B, but only logarithmically with the signal power S. Thus, Shannon’s
equation shows that UWB systems have great potential for high-capacity wireless
communications.
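The linear-versus-logarithmic trade-off in (2.1) can be illustrated with a minimal Python sketch. The two operating points below (10 MHz at 20 dB SNR and 500 MHz at 0 dB SNR) are hypothetical values chosen only for illustration; they are not taken from this dissertation.

```python
import math


def db_to_linear(snr_db: float) -> float:
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Channel capacity in bits/s from C = B * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


# Hypothetical operating points, chosen only for illustration.
narrowband = shannon_capacity(10e6, db_to_linear(20.0))   # 10 MHz at 20 dB SNR
wideband = shannon_capacity(500e6, db_to_linear(0.0))     # 500 MHz at 0 dB SNR

print(f"narrowband: {narrowband / 1e6:.1f} Mbps")
print(f"wideband:   {wideband / 1e6:.1f} Mbps")
```

Under these assumed numbers, the wideband channel retains the larger capacity even though its SNR is 20 dB lower, which is the argument made above for UWB.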
UWB characterizes transmission systems with spectral occupancy over 500 MHz or a
fractional bandwidth of more than 20%. The power spectral emission mask of the UWB
systems by FCC is illustrated in Fig. 2.1 [2]. The regulation allows spectrum sharing with
low emission limit (-41.3 dBm/MHz Equivalent Isotropically Radiated Power (EIRP)) where
the transmitted signal does not cause harmful interference to other band devices.
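The two-part UWB criterion above can be written as a short check. This is a sketch that assumes the usual definition of fractional bandwidth, namely the occupied bandwidth divided by the center frequency (f_high + f_low)/2; the function names and the example bands are illustrative only.

```python
def fractional_bandwidth(f_low_hz: float, f_high_hz: float) -> float:
    """Occupied bandwidth divided by the center frequency (assumed definition)."""
    return 2.0 * (f_high_hz - f_low_hz) / (f_high_hz + f_low_hz)


def is_uwb(f_low_hz: float, f_high_hz: float) -> bool:
    """True if the band occupies over 500 MHz or has more than 20% fractional bandwidth."""
    bandwidth_hz = f_high_hz - f_low_hz
    return bandwidth_hz > 500e6 or fractional_bandwidth(f_low_hz, f_high_hz) > 0.20


print(is_uwb(3.1e9, 10.6e9))    # the full 3.1-10.6 GHz band
print(is_uwb(2.40e9, 2.42e9))   # a 20 MHz channel near 2.4 GHz
print(is_uwb(0.4e9, 0.6e9))     # a 200 MHz wide sub-GHz band
```

The third case shows why the fractional criterion matters for sub-GHz designs: the band is narrower than 500 MHz but still qualifies because it is wide relative to its center frequency.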
Fig. 2.1: FCC spectrum mask for UWB indoor and outdoor communications [2]. (Plot of UWB
EIRP emission level in dBm/MHz versus frequency in GHz, showing the indoor and outdoor
system limits, the GPS band, and band edges at 0.96, 1.61, 1.99, 3.1, and 10.6 GHz.)
UWB signaling can be broadly grouped into two categories: carrier-based UWB and
carrier-less UWB. A carrier-based UWB signal consists of narrow pulses modulated
by an RF carrier. In this case, the center frequency of the modulated signal is simply
determined by the carrier frequency, while the signal power spectral density at RF is
determined by the baseband pulse shape. Carrier-less UWB directly transmits the modulated
narrow pulses without an RF carrier. Two example pulse-based UWB signals with different
pulse widths and their power spectral densities (PSD) are shown in Fig. 2.2 and Fig. 2.3.
An extremely short pulse of a few nanoseconds spreads its spectrum over a very wide band.
The spectrum width can be controlled by transmitting pulses with different durations.
The signal can be modulated in several ways, including pulse-position modulation (PPM),
pulse-amplitude modulation (PAM), pulse-width modulation (PWM), on-off keying (OOK),
and binary phase-shift keying (BPSK).
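The inverse relation between pulse duration and spectrum width can be sketched analytically. The example below assumes a Gaussian pulse envelope on a carrier, takes the "width" to be the full width at half maximum (FWHM) of that envelope, and measures bandwidth between the −10 dB points of the PSD. These are illustrative assumptions; the pulse shapes used for Fig. 2.2 and Fig. 2.3 may differ.

```python
import math


def gaussian_bw_10db(fwhm_s: float) -> float:
    """-10 dB bandwidth (Hz) around the carrier for a Gaussian envelope of given FWHM."""
    # Envelope exp(-t^2 / (2 sigma^2)) has FWHM = 2 * sqrt(2 ln 2) * sigma.
    sigma_t = fwhm_s / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    # Its PSD is proportional to exp(-4 pi^2 sigma^2 f^2), which falls by
    # 10 dB where the exponent equals ln(10).
    f_edge = math.sqrt(math.log(10.0)) / (2.0 * math.pi * sigma_t)
    return 2.0 * f_edge  # both sides of the carrier


for width_ns in (2.0, 0.5):
    bw = gaussian_bw_10db(width_ns * 1e-9)
    print(f"width = {width_ns} ns -> -10 dB bandwidth = {bw / 1e6:.0f} MHz")
```

Under these assumptions, a 2 ns pulse already occupies more than 500 MHz, and shortening the pulse by a factor of four widens the spectrum by the same factor.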
A great variety of architectures have been proposed for pulse-based IR-UWB receivers.
Four types, which have the greatest potential to meet the power constraints of cortical
interfaces while delivering data rates of 100 Mbps or higher, are shown in Fig. 2.4:
(a) coherent pulse-template correlation receiver [49–52], (b) non-coherent self-correlation
receiver [53–56], (c) super-regenerative receiver [57–59], and (d) injection-locked
receiver [60].
It should be noted that UWB receiver architectures are not limited to these four
types. Multi-band orthogonal frequency-division multiplexing (MB-OFDM) UWB, which
channelizes the signal into three or more sub-bands [47, 48], has been used primarily for
streaming video or wireless USB, where high-performance electronics are required. Another
type is the direct-oversampling receiver, which requires very fast analog-to-digital
converters (ADCs) running at Nyquist rates of up to several gigahertz, leading to extremely
high power consumption [46]. These systems are therefore not amenable to the
energy-constrained cortical interface application and are not discussed in this dissertation.
Fig. 2.2: Time domain and frequency domain mapping of carrier-based UWB signals with
different pulse width. Carrier frequency fc = 4 GHz. (Panels: magnitude in V versus time in
ns, and normalized PSD in dBm versus frequency in GHz, for pulse widths of 0.5 ns and 2 ns.)
2.1.1 Coherent Pulse-Template Correlation Receiver
Both coherent and non-coherent receivers first correlate the received pulse so that
the center frequency is down-converted to baseband. The difference is that, in a coherent
receiver, the received pulse is correlated with a local template pulse, whereas in a
non-coherent receiver, the received pulse is correlated with itself [56].
Fig. 2.3: Time domain and frequency domain mapping of carrier-less UWB signals with
different pulse width. (Panels: magnitude in V versus time in ns, and normalized PSD in dBm
versus frequency in GHz, for pulse widths of 0.5 ns and 2 ns.)
Fig. 2.4a presents the main blocks of the pulse-template correlation receiver. The received
UWB pulses are correlated with local template pulses, which are generated according to the
information acquired by the synchronization algorithms and/or channel estimation. The
result is then integrated and further amplified by the variable gain amplifier (VGA) to a
constant level for analog-to-digital conversion [49–52]. Since the receiver has no
information about the arrival time of the transmitted signals, phase synchronization with
the incoming Tx pulses is critical and difficult, because accurate alignment between the Tx
impulses and the Rx templates must be achieved [75]. Furthermore, the transmitted impulse
may be significantly distorted by the channel and the antennas, increasing the difficulty
of generating an accurate pulse template. Synchronization uses two modes of operation: data
acquisition and data reception. A known header is first sent during the acquisition mode.
The receiver scans all the possible window positions for the header and measures the energy
in each window. Once the header window is found, the receiver is locked to the transmitter
and is then switched to the reception mode [75]. The synchronization algorithms are usually
run in the digital backend, which controls the analog front-end.
Fig. 2.4: IR-UWB receiver architecture overview: (a) coherent pulse-template correlation
receiver; (b) non-coherent self-correlation receiver; (c) super-regenerative receiver;
(d) injection-locked receiver. (Block diagrams; labeled blocks include the LNA, VGA, ADC,
template pulse, CDR, phase adjust, quench signal, LC-VCO, injection-locking VCO, and the
digital backend for synchronization.)
The analog front-end of a coherent receiver usually consumes more than 100 mW because
of power-hungry blocks such as the phase-locked loop (PLL) and the multipliers that this
architecture requires. Zheng et al. [49] described a carrier-less receiver (200 Mbps/81 mW)
analog front-end for wireless personal area networking, in which a PLL with a ring
oscillator was used for clock generation and two cascaded delay-locked loops (DLLs) were
used for synchronization. However, the power of the separate ADC (4-bit/81 mW) and the
baseband circuitry was not considered. Zhang et al. [51] reported a receiver front-end that
used two channels to receive signals across a broad 3.1-9.5 GHz band and consumed 156 mW
while offering 100 Mbps. In this receiver front-end, 54% of the power was spent on the
multipliers. In the receiver reported in [52], signals operated in the sub-1 GHz band, which
relaxes the design burden of the radio-frequency (RF) blocks. The entire receiver consumed
6.2 mW in the acquisition mode and 4.2 mW in the reception mode, with a maximum speed of
39 Mbps.
2.1.2 Non-Coherent Self-Correlation Receiver
Fig. 2.4b presents the typical architecture of the non-coherent self-correlation receiver.
The advantage of a non-coherent receiver is that it avoids generating a local pulse: the
received signal is first mixed with itself at radio frequency. Generating a local pulse
usually requires a power-hungry delay-locked loop (DLL) or PLL to synchronize the received
pulse with the local pulse, with a precision on the order of several tens of picoseconds [56].
Denis et al. proposed a scheme in which, after the received signal is self-correlated, a
windowed integrator and an ADC at baseband generate a digital signal representing the total
energy received in a given time window [55]. The ADC values are passed to a digital backend,
which performs packet detection, synchronization, and decoding. This receiver was reported
to consume 11 mW at 16 Mbps. Zhuo and David et al. proposed another synchronization scheme
in which a packet composed of preambles and payloads is transmitted, with an amplitude
sufficient to cover the worst-case channel, at a low pulse repetition frequency for receiver
acquisition [53]. At the receiver side, parameters such as the SNR and the channel delay
spread are estimated to determine the operation mode and the baseband behavior for
synchronization and data reception. The reported performance was 16.3 mW at 33 Mbps [53].
Xia et al. [56] proposed a symbol-level synchronization that further simplified the
timing scheme. In this method, the received signal is first squared and integrated by
the correlator; a comparator then compares the result with a reference voltage and performs
digital quantization. Only a sliding correlator, rather than a DLL/PLL, is employed for
data synchronization. This design achieves an admirable 13.2 mW at 100 Mbps.
However, the authors particularly emphasized that the inevitable large frequency
offset between the baseband clocks of the transmitter and receiver must be compensated
by additional digital baseband circuitry. Even though the same clock source was used
in the measurement of the design, the receiver’s sensitivity was −50 dBm at a BER of
$10^{-3}$, much worse than the −66 dBm to −99 dBm of the other designs in their comparison
table. Most of the power saving came from the integrator circuit, which was realized
with only two capacitors.
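The square, integrate, and compare chain described above can be sketched in a few lines of Python. This is a generic energy-detection illustration, not the circuit of [56]: the toy waveform, the deterministic ripple standing in for noise, the bit window of eight samples, and the threshold are all assumptions, and perfect symbol timing is taken for granted.

```python
import math


def ook_waveform(bits, samples_per_bit=8, amplitude=1.0):
    """Toy OOK signal: a sine burst for a 1, silence for a 0, plus a small ripple."""
    wave = []
    for i, bit in enumerate(bits):
        for n in range(samples_per_bit):
            index = i * samples_per_bit + n
            ripple = 0.01 * ((index * 7) % 11 - 5)       # stand-in for noise, within +/-0.05
            carrier = math.sin(2.0 * math.pi * n / 4.0)  # four samples per carrier cycle
            wave.append(amplitude * bit * carrier + ripple)
    return wave


def energy_detect(samples, samples_per_bit, threshold):
    """Square and integrate each bit window, then compare with a threshold."""
    bits = []
    for start in range(0, len(samples) - samples_per_bit + 1, samples_per_bit):
        window = samples[start:start + samples_per_bit]
        energy = sum(v * v for v in window)              # squarer followed by integrator
        bits.append(1 if energy > threshold else 0)      # comparator decision
    return bits


sent = [1, 0, 0, 1, 1, 0, 1, 0]
received = energy_detect(ook_waveform(sent), samples_per_bit=8, threshold=2.0)
print(received)
```

Because the decision uses only the energy in each window, no local template or carrier phase is needed, which is where the non-coherent receiver saves power. The price is that window timing still has to be found, for example with a sliding correlator.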
2.1.3 Super-regenerative Receiver
Another attractive architecture, the super-regenerative receiver, was proposed in [57–59];
its main blocks are shown in Fig. 2.4c. This receiver is built around an LC-VCO
(inductor-capacitor voltage-controlled oscillator) that is driven between oscillating and
non-oscillating states by an external quench signal. An RF input pulse drives the LC-VCO
to oscillate quickly, while the trapezoidal quench signal is triggered synchronously
with the input. When the quench signal is switched off, the oscillations are damped
until the next RF input sample phase. By detecting the envelope of the LC-VCO output
and comparing it with an appropriate threshold, the bit sequence can be detected. Because
no multiplier is required and the LC-VCO oscillates only part of the time, this
architecture consumes less power. In [57] and [58], the analog front-end of the receiver
offered 10 Mbps while consuming only 11.2 mW and 10.8 mW, respectively.
Furthermore, Pelissier et al. [59] presented a UWB receiver front-end for radio-frequency
identification (RFID) applications that consumed only 5.4 mW at a data rate of 112 Mbps,
excluding the low-noise amplifier (LNA) and synchronization circuitry.
The super-regenerative receiver suffers from low sensitivity in comparison with the
coherent and non-coherent receivers, which are based on energy detection, because the
quench signal slope duration, which is relatively close to the core oscillator period,
builds up the oscillation even in the absence of a pulse, thus degrading the SNR [58].
Furthermore, the receiver requires synchronization both between Tx and Rx and between the
quench signal and the RF input.
2.1.4 Injection-locked Receiver
Injection locking is a frequency effect that can occur when a harmonic oscillator is
disturbed by a second oscillator operating at a nearby frequency. When the coupling is
strong enough and the frequencies are close enough, the second oscillator can capture the
first oscillator, causing it to run at essentially the same frequency as the second [76, 77].
The injection-locked receiver, which eliminates the clock data recovery circuitry, the
multiplexer, and any synchronization in the digital backend, is another promising
architecture, proposed by Hu et al. [60]. As shown in Fig. 2.4d, the local oscillator is
injection-locked to the incoming OOK-modulated pulses and hence is automatically
phase-aligned with the transmitted clock. Because the receiver clock is injection-locked
and synchronized with the transmitter, the ADC sampling requirements are greatly relaxed
and the ADC can run at the actual data rate. This local oscillator is called an
injection-locked VCO (IL-VCO). It was reported that such a receiver could deliver 500 Mbps
while consuming only 45 mW [60].
2.1.5 Comparison of the Four Architectures
Fig. 2.5 compares the data rate and power consumption for the recently reported IR-
UWB receivers. The solid symbols represent full receivers which include synchronization
processes.
Fig. 2.5: Receiver Data Rate vs. Power Consumption. (Log-log plot of power consumption in
mW versus data rate in Mbps for the receivers in [49]–[60], grouped as pulse-template,
self-correlation, super-regenerative, and IL-VCO.)
The energy per bit, shown in Fig. 2.6, is another popular figure of merit for
transceivers because it relates directly to the energy required to transmit or receive one
bit of data. Compared with the other architectures, the IL-VCO-based receiver offers
the best energy efficiency among all the full receivers.
In summary, the IL-VCO-based architecture has the best potential to offer very high
data speed while respecting the rigorous power limits. The transceiver proposed in this
dissertation is based on this type of architecture.
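The energy-per-bit comparison can be reproduced from the data rates and power figures listed in Table 1.1. The sketch below is an illustration rather than the script used to produce Fig. 2.6; the architecture labels follow the grouping of [49]–[60] given in Section 2.1, and the last field marks full receivers that include data synchronization.

```python
# (reference, data rate in Mbps, power in mW, architecture, includes synchronization)
RECEIVERS = [
    ("[49]", 200, 81, "pulse-template", False),
    ("[50]", 2000, 145.8, "pulse-template", False),
    ("[51]", 100, 156, "pulse-template", False),
    ("[52]", 39, 6.2, "pulse-template", True),
    ("[53]", 33, 16.3, "self-correlation", False),
    ("[54]", 1, 4.1, "self-correlation", True),
    ("[55]", 16, 11, "self-correlation", True),
    ("[56]", 100, 13.2, "self-correlation", False),
    ("[57]", 10, 11.2, "super-regenerative", False),
    ("[58]", 10, 10.8, "super-regenerative", False),
    ("[59]", 112, 5.4, "super-regenerative", False),
    ("[60]", 500, 45, "IL-VCO", True),
]


def energy_per_bit_pj(power_mw: float, rate_mbps: float) -> float:
    """Energy per bit in pJ: 1 mW / 1 Mbps = 1 nJ/bit = 1000 pJ/bit."""
    return power_mw / rate_mbps * 1000.0


full_receivers = [r for r in RECEIVERS if r[4]]
best = min(full_receivers, key=lambda r: energy_per_bit_pj(r[2], r[1]))

for ref, rate, power, arch, _ in full_receivers:
    print(f"{ref} {arch}: {energy_per_bit_pj(power, rate):.1f} pJ/bit")
print("most efficient full receiver:", best[0], best[3])
```

Restricting the comparison to full receivers matters: some front-ends without synchronization show a lower energy per bit, but that figure omits circuitry a complete system would need.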
2.2 Proposed System Architecture and Link Analysis
2.2.1 Proposed Architecture
There are, however, three further concerns with the injection-locked receiver:
1. Because the data is OOK-modulated, a long string of empty data transitions would
result in loss of phase synchronization.
2. The receiver’s SNR would deteriorate significantly if the power is constrained to within
10 mW (it was originally 45 mW in [60]), possibly leading to unacceptable BER
performance.
3. As mentioned earlier, the amplitude of the received signal would be reduced
significantly because of the coils’ low quality factor, further decreasing the receiver’s SNR.
Fig. 2.6: Data Rate vs. Energy/bit. (Log-log plot of energy per bit in pJ/bit versus data
rate in Mbps for the receivers in [49]–[60], grouped as pulse-template, self-correlation,
super-regenerative, and IL-VCO.)
To address these three concerns, a low-complexity error correction decoder is added to
compensate for the loss of SNR in our proposed UWB system architecture. As depicted
in Fig. 2.7 [1,3], the data is first encoded by the error correction encoder and then modulated
using OOK in the transmitter. The lumped model of the low-Q coils L1 and L2, through which
the UWB pulses are inductively coupled, is shown in the center of the figure. The clock
signal ck generated by the IL-VCO is synchronized with the actual data rate and is used
as the sample clock by the 2-bit ADC. The data is then recovered by the low-complexity
error correction decoder.
We use two pairs of coils to transmit power and data separately. The power and
data coils can be made approximately independent of each other (the carrier frequency for
the power coils is < 10 MHz [32]), so that the data channel is isolated from the power
channel. In practice, there is likely to be a small mutual inductance between all coils,
leading to potential interference [38]. Even though coils with a GHz-order SRF behave as a
band-pass filter, the amplitude of the power signals can be as high as 400 V [38], so
a passive high-pass filter (Rf, Cf shown in Fig. 2.7) is inserted between the LNA and the
ADC in order to cancel the potential interference from the power coils.
Fig. 2.7: Proposed UWB transmission system: Transmitter, Inductive Coil Link, and IR-
UWB Receiver (modified from [3]). (Block diagram: error correcting encoder, OOK modulation,
UWB pulse generator, and driver in the transmitter; lumped model of the coil link with Rs1,
R1, C1, L1, coupling K12 across the skin barrier, L2, R2, and C2; LNA, Rf-Cf filter, ADC,
IL-VCO with phase adjust, and error correcting decoder in the receiver.)
2.2.2 Link Transfer Function
The data coils are designed following the methods described in [36–39, 78]. The
LRC values and the estimated coupling coefficient k are shown in Table 2.1, where
$k = M/\sqrt{L_1 L_2}$ and M is the mutual inductance. R2 is designed to be large
in order to decrease the quality factor. The coils’ self-resonance frequency is chosen to be
4 GHz in this theoretical analysis.
Table 2.1: LRC VALUES AND COUPLING COEFFICIENT OF THE COIL
             Coils in this work
Parameter    LTX (L1)    LRX (L2)
L (µH)       0.16        0.13
C (fF)       10.12       12.25
R (Ω)        1.83        2000
Rs (Ω)       50          50
k            0.0841
Since L1 is loosely coupled (small k) with L2 and the current in tank L2C2 is very small,
we can neglect the effect of L2C2 loading on the tank L1C1 to simplify the equation [24].
The transfer function in the s-domain is given in (2.2) [3]:
$$H_{12}(s) = \frac{V_{od}(s)}{V_{id}(s)} = \frac{s M_{12}}{R_{s1} L_1 C_1 s^2 + (R_{s1} R_1 C_1 + L_1)s + (R_1 + R_{s1})} \times \frac{1}{L_2 C_2 s^2 + R_2 C_2 s + 1} \qquad (2.2)$$
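As a numerical check of (2.2), the transfer function can be evaluated with the Table 2.1 values. The Python sketch below assumes that M12 equals k√(L1 L2), as defined above, and that the self-resonance frequency is 1/(2π√(LC)); the variable and function names are illustrative.

```python
import math

# Coil parameters from Table 2.1.
L1, C1, R1, RS1 = 0.16e-6, 10.12e-15, 1.83, 50.0   # Tx coil: H, F, ohm, ohm
L2, C2, R2 = 0.13e-6, 12.25e-15, 2000.0            # Rx coil: H, F, ohm
K = 0.0841                                         # coupling coefficient

M12 = K * math.sqrt(L1 * L2)                       # mutual inductance, from k = M / sqrt(L1 L2)


def srf(inductance_h: float, capacitance_f: float) -> float:
    """Self-resonance frequency in Hz: 1 / (2 pi sqrt(L C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def h12(freq_hz: float) -> complex:
    """Evaluate the link transfer function (2.2) at s = j 2 pi f."""
    s = 2j * math.pi * freq_hz
    tx = s * M12 / (RS1 * L1 * C1 * s**2 + (RS1 * R1 * C1 + L1) * s + (R1 + RS1))
    rx = 1.0 / (L2 * C2 * s**2 + R2 * C2 * s + 1.0)
    return tx * rx


gain_db = 20.0 * math.log10(abs(h12(4e9)))
print(f"Tx SRF = {srf(L1, C1) / 1e9:.2f} GHz, Rx SRF = {srf(L2, C2) / 1e9:.2f} GHz")
print(f"M12 = {M12 * 1e9:.2f} nH, |H12| at 4 GHz = {gain_db:.1f} dB")
```

With these values both coils resonate close to the 4 GHz design target, and the gain at 4 GHz comes out at roughly −18 dB.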
2.2.3 Link Impulse Response
The impulse response h(t) can be derived from (2.2) (see Appendix A). If the input
signal is x(t), the output signal y(t) is the convolution of x(t) and h(t):
$$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(t-\tau)\, h(\tau)\, d\tau \qquad (2.3)$$
A carrier-based (carrier frequency fc = 4 GHz) UWB pulse with 2 ns width is transmitted
from coil L1 to L2 under high-Q and low-Q conditions, as shown in Fig. 2.8. A second
example, with a carrier-less Scholtz’s monocycle UWB pulse [79] of 2 ns width as the input
signal, is presented in Fig. 2.9. The Bode plot of H12(s) is shown in Fig. 2.10. The plots
indicate that the low-Q coils have a flat frequency response, which helps minimize pulse
distortion and ISI, but also introduce a larger attenuation, estimated to be about 20 dB.
In the next section, a pair of low-Q coils is designed, fabricated, and tested in
biological tissue to observe the actual attenuation and the feasibility of implementation.
Fig. 2.8: Time responses for carrier-base UWB pulse (input pulse width = 2 ns). (Panels:
input, high-Q output, and low-Q output, magnitude in V versus time in ns; input PSD, high-Q
output PSD, and low-Q PSD versus frequency in GHz, with marked points (4, −20), (4, −8), and
(4, −38), respectively.)
Fig. 2.9: Time responses for carrier-less UWB pulse (input pulse width = 2 ns). (Panels:
input, high-Q output, and low-Q output, magnitude in V versus time in ns; input PSD, high-Q
output PSD, and low-Q PSD versus frequency in GHz, with marked points (2.5, −26), (4, −23),
and (2.9, −46), respectively.)
Fig. 2.10: Bode plot of H12(s). (Magnitude in dB versus frequency in GHz for the high-Q and
low-Q cases, with marked points (4, 14) and (4, −18.4).)
2.3 Coil Fabrication and Measurement
2.3.1 Coil Fabrication
Wideband data transmission requires high-SRF coils and thus demands very low parasitic
capacitance and inductance, since $\mathrm{SRF} = 1/(2\pi\sqrt{LC})$. The bandwidth of the UWB
signals is on the order of GHz. In our experience, standard FR4 material, with a substrate
dielectric constant εr = 4.4 and a dielectric loss tangent δ = 0.02, can hardly satisfy this
requirement. We therefore chose a high-frequency circuit material, RO4003C (1.5 mm thick,
Rogers Corporation, Rogers, CT), for fabricating our Tx and Rx coils. The corresponding
parameters of this material are (εr, δ) = (3.55, 0.0029). Both coils are square, with
only one loop and a very small width, in order to lower the parasitic capacitance and
inductance. We used a commercial field solver, HFSS (Ansys, Canonsburg, PA), to design and
simulate all the model parameters necessary for the coil design described in [39,80]. The
photo and the dimensions of the Tx and Rx coils are shown in Fig. 2.12. Lt = 19.2 mm and
Lr = 8.0 mm are the lengths of the Tx and Rx coils, respectively. Wt = 2.4 mm and
Wr = 1.0 mm are the widths of the Tx and Rx coils, respectively. A 2 kΩ resistor is added
to the Rx coil to lower the Q value, as described in Section 2.2.
2.3.2 UWB Transmitter
A great variety of techniques may be used to generate UWB waveforms. To test the
fabricated coils before designing the chip, we adopted a low-cost architecture based on
the transient response of passive filters excited by step signals [81]. A step signal with
sufficiently short rise/fall times is applied directly to the 3rd-order band-pass filter
shown in Fig. 2.11, which then generates the UWB pulses. We realized the circuit, shown in
Fig. 2.12, using standard component values (L = 0.5 nH, C = 1.0 pF).
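Fig. 2.11: The UWB pulse generator schematic. (Step source e(t) with Rs = 50 Ω driving an
LC band-pass filter terminated in RL = 50 Ω.)
Fig. 2.12: Photo of (a) UWB pulse generator; (b) Tx coil; (c) Rx coil. (Annotations: Tx coil
dimensions Lt and Wt, Rx coil dimensions Lr and Wr, and the 2 kΩ resistor.)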
2.3.3 Coil Measurement
It should be noted that the coils presented here are not optimized. The scattering
parameters (S-parameters) of the fabricated coils, measured in air and in beef tissue, are
shown in Fig. 2.13.
Fig. 2.14 shows the measurement setup used to test the time response of the coils.
We used a digital oscilloscope (Tektronix TDS7704B) to measure the transient signal
waveforms from both the Tx and Rx coils while they were mounted vertically at different
coupling distances on a support frame made of the MicroRax mini T-slot building system
(Twintec, Auburn, WA). Two plastic Ziploc bags (∼45 µm thick), hanging from a horizontal
clamp, were filled with a combination of saline solution (0.9% NaCl) and thin beef slices
(ranging from 5 to 15 mm), both at room temperature. The Rx coil was sandwiched between
the two bags of beef tissue, while the Tx coil was aligned with it, touching the outer
surface of one bag.
Fig. 2.13: Measured S-parameters of the fabricated coil. (Panels: S11, S21, S22, and S12
magnitude in dB versus frequency from 1 to 3 GHz, each measured in air and in beef.)
A train of square pulses, with an amplitude of 3.3 V and a rise/fall time of 130 ps, was
produced by a pulse generator (HP8133A, 3 GHz) and applied to the UWB transmitter.
The transmitted UWB pulses, with peak-to-peak voltage Vtx = 1.16 V, and the corresponding
received pulses in air (Vrxa = 46.3 mV) and in beef tissue (Vrxb = 69.2 mV) are shown in
Fig. 2.15. The distance between the Tx and Rx coils is 10 mm, and the maximum pulse rate is
166 Mbps. It should be noted that Vtx can be adjusted by changing the amplitude of the
square pulses at the pulse generator, so the received amplitude (for example, Vrxb in the
tissue environment) is adjustable as well and is not limited to 69.2 mV.
In order to observe and evaluate the system performance in the next section, a
rational-function-based model that characterizes the coils’ port behavior as a function of
frequency is generated using the Matlab RF Toolbox (Mathworks, Natick, MA), and the
simulation results are also presented in Fig. 2.15. The simulated output waveform matches
the physical measurement well. For further validation, 300 randomly selected input UWB
signals are fed into both the physical coils and the generated computational model,
producing the output signals denoted by Yc and Ys, respectively. The correlation
coefficients between Yc and Ys in the air environment (ra) and in the beef tissue
environment (rb), which represent their similarity, are shown in Fig. 2.16. They indicate
that the model is in good agreement with the physical coils.
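The similarity measure between a measured output Yc and a simulated output Ys can be sketched as follows. This is a minimal illustration that assumes the Pearson definition of the correlation coefficient (the text does not name the variant); the waveforms and the names `pearson_r`, `y_c` and `y_s` are hypothetical stand-ins, not measured data.

```python
import math

def pearson_r(y_c, y_s):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(y_c) != len(y_s) or len(y_c) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_c = sum(y_c) / len(y_c)
    mean_s = sum(y_s) / len(y_s)
    cov = sum((a - mean_c) * (b - mean_s) for a, b in zip(y_c, y_s))
    var_c = sum((a - mean_c) ** 2 for a in y_c)
    var_s = sum((b - mean_s) ** 2 for b in y_s)
    if var_c == 0 or var_s == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_c * var_s)

# Hypothetical waveforms: a damped sinusoid ("measured") and a scaled,
# slightly distorted copy ("simulated"). Not data from the experiment.
t = [k * 0.05 for k in range(200)]
y_c = [math.exp(-0.3 * x) * math.sin(2 * math.pi * x) for x in t]
y_s = [0.9 * v + 0.02 * math.cos(5 * x) for v, x in zip(y_c, t)]
r = pearson_r(y_c, y_s)
```

Because the coefficient is insensitive to amplitude scaling, it compares waveform shape rather than absolute level, which suits a model-versus-measurement check of this kind.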
A drawback of this type of transmitter is that adjacent UWB pulses are 180° phase-shifted
because they are produced by opposite rising and falling edges, but it is sufficient for
this measurement. In practice, the transmitter is generally implemented as a digital circuit
with tunable power and spectrum [56], which is outside the scope of this section. The
transmitter design on our chip is discussed in Chapter 4.
Fig. 2.17 shows the measured power spectral density of both the Tx and Rx pulses.
It indicates that the high-frequency components of the Tx pulses (> 1.8 GHz) are strongly
absorbed in the tissue environment and hardly pass through the conductive tissue.
Fig. 2.14: Experimental setup for measuring time response of the inductive coil link through
the beef tissue.
2.4 Receiver Performance Evaluation
The realistic physical channel is inserted into the proposed UWB receiver model
shown in Fig. 2.7. The sampling clock jitter is modeled as a Gaussian-distributed random
error in the sampling time of the received pulse. The noise figure (NF) of the nonideal LNA,
ranging from 4 to 6 dB, is also taken into account. OOK modulation, together with a
low-complexity convolutional code with generator polynomial
$G(x) = [\,1 + x^2 \quad 1 + x + x^2\,]$ and rate R = 1/2, decoded by a Viterbi decoder, is
adopted for the performance evaluation. The BER is evaluated as the primary measure of the
system reliability.
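To make the code structure concrete, the following is a minimal sketch of a rate-1/2 encoder for the generators 1 + x² and 1 + x + x², with a hard-decision Viterbi decoder. It is an illustration only and not the decoder evaluated in this chapter: the two zero tail bits, the Hamming branch metric and all function names are assumptions.

```python
# Rate-1/2, constraint-length-3 convolutional code with generators
# g0 = 1 + x^2 and g1 = 1 + x + x^2.
G = ((1, 0, 1), (1, 1, 1))  # tap weights on (u[n], u[n-1], u[n-2])

def step(state, bit):
    """Return (next_state, (c0, c1)) for one input bit; state = (u[n-1], u[n-2])."""
    window = (bit,) + state
    out = tuple(sum(g * w for g, w in zip(taps, window)) % 2 for taps in G)
    return (bit, state[0]), out

def encode(bits):
    """Encode bits, appending two zero tail bits to return to the all-zero state."""
    state, coded = (0, 0), []
    for b in list(bits) + [0, 0]:
        state, out = step(state, b)
        coded.extend(out)
    return coded

def viterbi_hard(coded):
    """Hard-decision Viterbi decoding with the Hamming distance as branch metric."""
    states = [(a, b) for a in (0, 1) for b in (0, 1)]
    inf = float("inf")
    metric = {s: (0 if s == (0, 0) else inf) for s in states}
    paths = {s: [] for s in states}
    for i in range(0, len(coded), 2):
        rx = coded[i:i + 2]
        new_metric = {s: inf for s in states}
        new_paths = {s: [] for s in states}
        for s in states:
            if metric[s] == inf:
                continue  # state not reachable yet
            for b in (0, 1):
                nxt, out = step(s, b)
                m = metric[s] + sum(x != y for x, y in zip(out, rx))
                if m < new_metric[nxt]:
                    new_metric[nxt] = m
                    new_paths[nxt] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    return paths[(0, 0)][:-2]  # survivor ending in state 00, minus the tail bits

message = [1, 0, 1, 1, 0, 0, 1, 0]
coded = encode(message)
```

A soft-decision decoder would keep the same trellis and replace the Hamming distance with a metric computed from the unquantized samples.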
Fig. 2.18a shows the system performance when the root mean square (RMS) of the
jitter is 35 ps, 40 ps and 45 ps, while the noise figure of the LNA is constant at 5 dB. The
results illustrate that the system cannot function reliably without an error correction code
(ECC). The sampling jitter creates an obvious error floor that limits the system’s
performance, independent of the SNR of the received signals. With the use of ECC, the error
floor is effectively eliminated.
Fig. 2.18b shows the system performance when the LNA noise figure is 4 dB, 5 dB and
6 dB, while the jitter RMS remains at 40 ps. It reveals that (1) the level of the error
floor is determined by the sampling jitter, and (2) improving the LNA’s noise figure does
not improve the system performance without the use of ECC.
[Figure: four panels of amplitude (V) versus time (0 to 50 ns): generated UWB pulses; pulse train from HP8133A; measured and simulated output in air; measured and simulated output in beef.]
Fig. 2.15: Time response measurement of the coils.
[Figure: two panels of correlation coefficient (0.5 to 0.9) versus sample index (0 to 300): ra in air environment; rb in beef tissue environment.]
Fig. 2.16: Correlation coefficients of 300 measured and simulated samples.
[Figure: PSD (dBm/Hz, −100 to 0) versus frequency (0 to 10 GHz) for the Tx pulse, the Rx pulse in tissue and the Rx pulse in air, with the bands 0.8∼1.8 GHz and 0.8∼3.8 GHz marked.]
Fig. 2.17: Measured power spectral density for both pulses from Tx and Rx.
The performance with two exaggerated jitter RMS values, 60 ps and 80 ps, is also evaluated
in Fig. 2.18c, with the LNA NF = 5 dB. The result indicates that the error floor still
exists if the jitter RMS is extremely large, but the ECC can still be expected to improve
the performance by more than four orders of magnitude. We introduce a factor η to quantify
the BER comparison (only hard decoding is considered), defined as:
$$\eta = \frac{\mathrm{BER}_{\text{Uncoded}}}{\mathrm{BER}_{\text{Hard decoded}}} \tag{2.4}$$
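As a small worked illustration of (2.4), the sketch below computes η and the corresponding number of orders of magnitude. The BER values are hypothetical and chosen only to show the arithmetic; the function name is likewise an assumption.

```python
import math

def improvement_factor(ber_uncoded, ber_hard_decoded):
    """eta from (2.4): ratio of the uncoded BER to the hard-decoded BER."""
    if ber_hard_decoded <= 0:
        # A coded BER of zero means no errors were observed, so eta is undefined.
        raise ValueError("hard-decoded BER must be positive")
    return ber_uncoded / ber_hard_decoded

eta = improvement_factor(1e-2, 1e-6)  # hypothetical BER values
orders = math.log10(eta)              # improvement in orders of magnitude
```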
Fig. 2.18d shows the performance improvement provided by the ECC when the received
peak-to-peak pulse amplitude is Vrxb = 7.6 mV, 10.4 mV and 24.7 mV. It demonstrates that
the reliability improves exponentially as the jitter RMS decreases.
This is a useful reference showing that ECC can offer large compensation to the UWB
system. The design burden on power-hungry cells, such as the LNA, VCO and ADC, is
greatly relaxed, and thus more power can be saved.
[Figure: panels (a) to (c) plot BER (logarithmic, 10⁻⁷ to 10⁰) versus Vrxb (0 to 40 mV) for uncoded, hard-decoded and soft-decoded reception; panel (d) plots η (logarithmic, 10⁰ to 10⁵) versus clock jitter (40 to 80 ps) for Vrxb = 7.6 mV, 10.4 mV and 24.7 mV.]
Fig. 2.18: (a) System performance when the jitter RMS is 35 ps, 40 ps, 45 ps; NF of LNA
is constant at 5 dB; (b) System performance when the LNA NF = 4 dB, 5 dB, 6 dB; jitter
RMS is constant at 40 ps; (c) System performance when jitter RMS is 60 ps, 80 ps; LNA
NF=5 dB; (d) Showing the quantities of performance improved by the ECC. The BER with
ECC is unobservable when Vrxb = 24.7 mV and clock jitter < 55 ps.
CHAPTER 3
DESIGN METHODOLOGY FOR NANOMETER CMOS PROCESS
3.1 Introduction
In Chapter 2, the system-level architecture and performance were discussed and evaluated.
However, translating these characteristics into a final design, especially in a nanometer
CMOS process, is a big challenge. Many design parameters trade off against each other, such
as gain, linearity, power consumption, noise, speed, input-output impedance and voltage
swing, making the design a multi-dimensional optimization problem. Power consumption
constrains the design of the transceiver and makes the design of each circuit block in the
chain more challenging. The trade-off between power and the inherent noise of the RF blocks
is especially noticeable. For example, a very low-noise application must accept high
power consumption, whereas a very low-power design has to tolerate higher noise.
The trade-off between power consumption and noise is strongly determined by the active
element: the Metal-Oxide-Semiconductor Field-Effect Transistor, or MOSFET. For deep
submicron or nanometer CMOS processes, however, the short and narrow channel effects on
the device characteristics bring not only larger noise and mismatch and smaller
transconductance, but also limitations on the conventional square-law based analog design
flow [82]. The voltage-current (V-I) square-law relation of the transistor is given by [83]:
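$$I_D = \begin{cases} \mu_n C_{ox} \dfrac{W}{L} \left[ (V_{GS} - V_{TH}) V_{DS} - \dfrac{V_{DS}^2}{2} \right], & V_{DS} \le V_{GS} - V_{TH},\ V_{GS} \ge V_{TH} \\[2ex] \dfrac{1}{2} \mu_n C_{ox} \dfrac{W}{L} (V_{GS} - V_{TH})^2 (1 + \lambda V_{DS}), & V_{DS} > V_{GS} - V_{TH},\ V_{GS} \ge V_{TH} \end{cases} \tag{3.1}$$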
where ID is the drain-source current, µn is the mobility of the charge carriers, Cox is the
gate oxide capacitance per unit area, and (W/L) is the width-to-length ratio of the
transistor. VGS, VDS and VTH are the gate-source voltage, drain-source voltage and
threshold voltage, respectively. λ is the channel-length modulation coefficient, which is
inversely proportional to the channel length L, i.e., λ ∝ (1/L) [83].
This square-law equation matches well with the transistor behavior when VDS is small
and/or L is large because of the linear relation between carrier velocity and horizontal
electric field. However, the behavior of short channel devices deviates considerably from this
model due to the velocity saturation effect. At high field, the carrier velocity approaches
the thermal velocity, and when the electric field reaches a critical value Ec, the carrier
velocity tends to saturate due to scattering effects [82]. In this case, the V -I relation of the
transistor is modified as:
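$$I_D = \begin{cases} \kappa(V_{DS})\, \mu_n C_{ox} \dfrac{W}{L} \left[ (V_{GS} - V_{TH}) V_{DS} - \dfrac{V_{DS}^2}{2} \right], & V_{DS} \le V_{GS} - V_{TH},\ V_{GS} \ge V_{TH} \\[2ex] \kappa(V_{DS})\, \dfrac{1}{2} \mu_n C_{ox} \dfrac{W}{L} (V_{GS} - V_{TH})^2 (1 + \lambda V_{DS}), & V_{DS} > V_{GS} - V_{TH},\ V_{GS} \ge V_{TH} \end{cases} \tag{3.2}$$
where
$$\kappa(V_{DS}) = \frac{1}{1 + \left( \dfrac{V_{DS}}{E_c L} \right)} \tag{3.3}$$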
For large values of L or small values of VDS , κ approaches 1 and (3.2) reduces to (3.1).
For short channel devices κ < 1, the current is smaller than what would be expected, as
shown in Fig. 3.1.
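The effect of the velocity-saturation factor can be illustrated numerically. The sketch below evaluates the square-law current and its κ-scaled version for a long and a short channel at the same W/L. It uses the usual channel-length-modulation factor (1 + λVDS) in saturation, and every parameter value (µnCox, VTH, λ, Ec) is a hypothetical placeholder rather than data from the process used in this work.

```python
def kappa(vds, ec, length):
    """Velocity-saturation factor of (3.3); ec in V/m, length in m."""
    return 1.0 / (1.0 + vds / (ec * length))

def drain_current(vgs, vds, w_over_l, length, mu_cox, vth, lam, ec=None):
    """Strong-inversion drain current; pass ec to apply kappa as in (3.2)."""
    vov = vgs - vth
    if vov < 0:
        raise ValueError("model is only valid for VGS >= VTH")
    if vds <= vov:  # triode region
        i_d = mu_cox * w_over_l * (vov * vds - vds ** 2 / 2)
    else:           # saturation region
        i_d = 0.5 * mu_cox * w_over_l * vov ** 2 * (1 + lam * vds)
    if ec is not None:
        i_d *= kappa(vds, ec, length)
    return i_d

# Hypothetical parameters, for illustration only.
PARAMS = dict(w_over_l=10, mu_cox=200e-6, vth=0.4, lam=0.2)
EC = 4e6  # assumed critical field in V/m

i_long = drain_current(0.6, 1.0, length=1e-6, ec=EC, **PARAMS)
i_short = drain_current(0.6, 1.0, length=70e-9, ec=EC, **PARAMS)
i_square_law = drain_current(0.6, 1.0, length=70e-9, **PARAMS)
```

With these numbers the short-channel current falls well below both the long-channel and the pure square-law values, which is the qualitative trend described for Fig. 3.1.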
Equations (3.1) and (3.2) are considered to be valid for VGS ≥ VTH. In reality, for
VGS ≈ VTH, a “weak” inversion layer still exists, and some current flows from drain to
source. Even for VGS < VTH, ID is finite [83]. This can be seen in Fig. 3.2, where ID
versus VGS is plotted on a logarithmic scale. The plot shows that below the threshold
voltage VTH the current is not zero and depends exponentially on the gate-source voltage.
This current is often referred to as sub-threshold current.
Circuit features change as a function of the inversion region in which the MOSFET
is biased. MOSFETs used in analog and radio-frequency circuits have traditionally been
biased in the strong inversion (SI) region. This region is characterized by high power
consumption as well as a high MOSFET transition frequency (fT) due to the small sizing of
the MOSFET. In this zone, VGS is well above the threshold voltage VTH. Two other inversion
regions can be distinguished: the weak inversion (WI) region, or sub-threshold region, where
VGS is well below VTH; and the moderate inversion (MI) region, which lies between weak and
strong inversion, approximately “around” threshold. In the weak and moderate regions, VGS
and the overdrive voltage VOD = VGS − VTH are very low, which makes these zones well suited
to low-supply-voltage operation. They therefore consume much less power, but at the cost of
a lower transition frequency and higher noise due to the larger MOSFET sizes.
[Figure: Id (µA, 0 to 80) versus Vds (0 to 1.2 V) for L = 70 nm, 0.18 µm, 0.35 µm, 0.5 µm and 1 µm, with Vgs = 0.6 V and W/L = 10.]
Fig. 3.1: Id versus Vds for different transistor length L. Vgs and W/L are constant at 0.6 V
and 10, respectively.
Therefore, in low-power analog circuits, it is crucial to use MOSFETs in the
moderate and/or weak inversion regions, taking advantage of the nanometer technologies
now proliferating. However, optimizing MOSFETs in the moderate or weak inversion regions
and achieving the “best” specifications for a circuit block is significantly more difficult
in a short-channel nanometer process because of (but not limited to) the considerable
deviation from the square-law V-I relation. In fact, because of the complexity of modern
MOSFETs, any square-law driven design optimization will be far off from Spice results.
[Figure: ID (A, logarithmic, 10⁻¹⁰ to 10⁰) versus VGS (0 to 1.2 V), with VTH and the weak, moderate and strong inversion regions marked.]
Fig. 3.2: ID versus VGS in logarithmic scale for an NMOS transistor with W/L =
10 µm/70 nm.
In 1996, Silveira et al. [84] proposed a powerful transconductance-to-drain-current (gm/ID)
technique to optimize an operational transconductance amplifier (OTA), which has since
become the basis of many later developments in analog circuit design. The gm/ID
methodology was originally developed to help designers size transistors quickly with good
accuracy and calculate parameters such as small-signal gain and bandwidth. More recently,
the approach has been applied to phase noise optimization of an LC-VCO [85], MOSFET
nonlinearity characterization [86], and low noise amplifier design [87].
A book dedicated to the gm/ID methodology for analog amplifier design has also been
published [88]. Some top-rated universities (Stanford [89] and Berkeley [90]) teach courses
specifically focused on this design methodology. Utah State University has offered similar
courses on this methodology for many years as well [91]. Typically, analog circuit designers
reach their goal by drawing on their own experience while performing many simulations and
optimizations until the circuit “somehow” meets the specifications. With the gm/ID
methodology, designers can systematically and quickly iterate through tens of different
designs at one time (Fig. 3.3).
[Figure: two lecture slides (B. Murmann, EE214B, Winter 2012-13, HO6), “Observations” and “Comparison”. The slides note that a design based on pre-computed Spice data lands essentially on target, with typical discrepancies of no more than 10-20% due to VDS dependencies, finite output resistance, etc.; that the hand calculations use parameters that also exist in Spice (e.g. gm/ID, fT); and that square-law calculations use parameters such as µCox and VOV that do not exist or have no significance in the Spice model.]
Fig. 3.3: Comparison of square-law based method and gm/ID-based method [4].
The basic idea of the gm/ID methodology is that each gm/ID value is related one-to-one
to the normalized current i = ID/(W/L). This follows from the fact that the gm/ID ratio is
equal to the derivative of the logarithm of ID with respect to VGS, as shown below [84]:
$$\frac{g_m}{I_D} = \frac{1}{I_D} \frac{\partial I_D}{\partial V_{GS}} = \frac{\partial \ln I_D}{\partial V_{GS}} = \frac{\partial \left\{ \ln \left[ \dfrac{I_D}{(W/L)} \right] \right\}}{\partial V_{GS}} = \frac{\partial \ln (i)}{\partial V_{GS}} \tag{3.4}$$
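The identity in (3.4) suggests a direct way to obtain gm/ID from a swept ID versus VGS curve. The following sketch differentiates ln(ID) numerically; the exponential weak-inversion model and its parameters (slope factor, thermal voltage, I0) are assumptions used only to generate test data, not extracted device values.

```python
import math

def gm_over_id(vgs, i_d):
    """Central-difference estimate of d(ln ID)/dVGS at the interior sweep points."""
    ln_id = [math.log(x) for x in i_d]
    return [(ln_id[k + 1] - ln_id[k - 1]) / (vgs[k + 1] - vgs[k - 1])
            for k in range(1, len(vgs) - 1)]

# Hypothetical weak-inversion sweep: ID = I0 * exp(VGS / (n * VT)).
N_SLOPE, V_T, I_0 = 1.5, 0.02585, 1e-12
vgs = [0.01 * k for k in range(21)]
i_d = [I_0 * math.exp(v / (N_SLOPE * V_T)) for v in vgs]

ratios = gm_over_id(vgs, i_d)
# Dividing ID by a constant W/L leaves the derivative of the logarithm unchanged.
ratios_normalized = gm_over_id(vgs, [x / 50.0 for x in i_d])
```

For this assumed model the result is the constant 1/(n·VT), roughly 26 V⁻¹, and it is unchanged when the current is normalized by W/L, which is the size independence the methodology relies on.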
Ideally, the normalized current i is independent of the transistor size according to (3.2),
so the gm/ID ratio is also size independent (in reality, i has a slight dependence on W and
L, which will be shown later). Therefore, the relationship between gm/ID and the normalized
current i is a unique characteristic for all transistors of the same type (NMOS or
PMOS) [84]. With this approach, as illustrated in Fig. 3.4, we consider a range of gm/ID
between 4 V⁻¹ (strong inversion) and 26 V⁻¹ (weak inversion), and a drain current ID
varying from ID,min to ID,max. Then, for each pair (gm/ID, ID), the normalized current i,
the transconductance gm and the transistor size (W/L) are deduced. The transistor length L
is assumed to be constant, for example the technology minimum in order to reach the highest
transition frequency fT, and the width W is then solved.
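The sizing step described above can be sketched as a lookup followed by two divisions. The table below is a hypothetical stand-in for an extracted gm/ID versus i curve, and the log-linear interpolation and function names are assumptions made for this illustration.

```python
import math

# Hypothetical (gm/ID [1/V], i [A]) samples standing in for an extracted curve.
LOOKUP = [(4.0, 1e-5), (10.0, 2e-6), (18.0, 2e-7), (26.0, 1e-9)]

def normalized_current(gm_id):
    """Interpolate log10(i) linearly in gm/ID between neighbouring samples."""
    for (g0, i0), (g1, i1) in zip(LOOKUP, LOOKUP[1:]):
        if g0 <= gm_id <= g1:
            frac = (gm_id - g0) / (g1 - g0)
            return 10 ** (math.log10(i0) + frac * (math.log10(i1) - math.log10(i0)))
    raise ValueError("gm/ID outside the tabulated range")

def size_transistor(gm_id, i_d, length):
    """Return (gm, W/L, W) for a chosen (gm/ID, ID) pair and a fixed length."""
    i_norm = normalized_current(gm_id)
    w_over_l = i_d / i_norm           # from i = ID / (W/L)
    return gm_id * i_d, w_over_l, w_over_l * length

gm, w_over_l, width = size_transistor(gm_id=10.0, i_d=200e-6, length=70e-9)
```

Sweeping `gm_id` and `i_d` over their ranges with this function yields the family of candidate sizes from which a design point is selected.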
Fiorelli et al. [85] explored the gm/ID design methodology in depth for LC-VCO
optimization. This chapter details some of their design flows and creates the database
used throughout the methodology. The database includes:
1. gm/ID versus normalized current i = ID/(W/L);
2. fT versus gm/ID;
3. gds/ID versus gm/ID;
4. gmb/gm versus gm/ID;
5. Normalized capacitance Cij versus gm/ID, ij = {gs, gd, gb, bd, bs};
6. On-chip inductors.
[Figure: gm/ID (V⁻¹, 0 to 30) versus i = ID/(W/L) (A, logarithmic, 10⁻¹² to 10⁻⁴) for an NMOS transistor with L = 70 nm, W = 10 µm and Vds = 0.6 V.]
Fig. 3.4: gm/ID versus i for an NMOS transistor.
3.2 MOSFET gm/ID Model
Before explaining the database, a “lumped” MOSFET model needs to be introduced.
Fig. 3.5 shows the NMOS and PMOS transistor symbols with the gate (G), drain
(D) and source (S). Since in most circuits the bulk terminals of the NMOS and PMOS
transistors are tied to ground and VDD, respectively, the bulks (B) are omitted in the
symbols.
A capacitance exists between every pair of the four terminals of a MOSFET, as
also shown in Fig. 3.5. Cgs, Cgd, Cgb, Cbd and Cbs are the gate-source, gate-drain,
gate-bulk, bulk-drain and bulk-source capacitances of the MOSFET, respectively. The
capacitance between S and D is negligible [83].
3.2.1 gm/ID Versus Normalized Current i = ID/(W/L)
[Figure: (a) NMOS symbol and (b) PMOS symbol with terminals D, G and S; (c) NMOS transistor with terminals D, G, S and B and the capacitances Cgs, Cgd, Cgb, Cbd and Cbs.]
Fig. 3.5: (a) NMOS transistor; (b) PMOS transistor; (c) MOSFET capacitances.
The gm/ID versus normalized current i = ID/(W/L) characteristic is extracted for the
transistor length L = 70 nm, which is widely used in the circuit designs that follow in
order to obtain the maximum transition frequency fT. Physically, for a MOSFET layout as
presented in Fig. 3.6, if the transistor finger width is wn and the number of fingers is
nf, the total transistor width is W = wn × nf. For the data acquisition, a set of widths
W = {1, 2, ..., 160} µm is simulated, formed by the pairwise products of
wn = {1, 2, 3, 4, 5} µm and nf = {1, 2, 4, 8, 16, 32}. This range is sufficient to cover
the variations of gm/ID versus i over the whole transistor operating region. The data are
presented in Fig. 3.7, indicating that the gm/ID versus i plot is nearly independent of
width. Even though the curves vary slightly, especially in strong inversion, any single one
of them is accurate enough for quickly sizing transistors at the beginning of the design.
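The swept width set can be enumerated directly from the finger widths and finger counts quoted above. This is only a sketch of the bookkeeping; the function name and the dictionary layout are assumptions, not part of the original simulation scripts.

```python
from itertools import product

FINGER_WIDTHS_UM = (1, 2, 3, 4, 5)    # wn, from the text
FINGER_COUNTS = (1, 2, 4, 8, 16, 32)  # nf, from the text

def sweep_widths():
    """Map each total width W = wn * nf (in um) to the (wn, nf) pairs producing it."""
    layouts = {}
    for wn, nf in product(FINGER_WIDTHS_UM, FINGER_COUNTS):
        layouts.setdefault(wn * nf, []).append((wn, nf))
    return dict(sorted(layouts.items()))

widths = sweep_widths()
```

Several widths, such as 8 µm, can be realized by more than one (wn, nf) combination, so the same W is simulated with different finger arrangements.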
Fig. 3.8 shows the gm/ID versus i plots when transistor width is constant at W = 10 µm
and the length L = {70, 80, 90, ..., 300} nm. In this case, larger variations exist especially in
the weak inversion region.
For a complete picture, the variations of the gm/ID versus i plot with the Vds voltage
are presented in Fig. 3.9. The transistor length is constant at L = 70 nm. Vds, wn and nf
are swept over {0.3, 0.6, 0.9, 1.2} V, {1, 2, 3, 4, 5} µm and {1, 2, 4, 8, 16, 32},
respectively. Note that the drain-source voltage for the PMOS is Vsd. It can be observed
that the gm/ID versus i plot does not change considerably with transistor size or Vds.
Therefore, the “gm/ID versus i” plot can be used as an intrinsic property of the transistor
throughout the circuit design process.
[Figure: layout of a multi-finger NMOS transistor with the finger width Wn, the number of fingers nf and the guard ring marked.]
Fig. 3.6: Layout of an NMOS transistor with W = 6 µm (nf = 4, Wn = 1.5 µm), L = 70 nm.
[Figure: NMOS and PMOS panels of gm/ID (V⁻¹, 0 to 30) versus i = ID/(W/L) (A, logarithmic, 10⁻¹² to 10⁻⁴) for wn = 1 to 5 µm and nf = {1, 2, 4, 8, 16, 32} (W = {1, 2, ..., 160} µm), with L = 70 nm and Vds (Vsd) = 0.6 V.]
Fig. 3.7: gm/ID versus i for different transistor width W.
[Figure: NMOS and PMOS panels of gm/ID (V⁻¹, 0 to 30) versus i = ID/(W/L) (A, logarithmic, 10⁻¹² to 10⁻⁴) for L = 70, 80, 90, 100, 110, 120, 150, 200, 250 and 300 nm, with W = 10 µm (nf = 2, wn = 5 µm) and Vds (Vsd) = 0.6 V.]
Fig. 3.8: gm/ID versus i for different transistor length L.
In practice, a single curve, for example the one for L = 70 nm, W = 10 µm, is sufficiently
accurate throughout the design process. It is not complicated, however, to
use the curve for a specific L for better accuracy in the Matlab scripts generated for the
circuit parameter calculation, which will be discussed in Chapter 4.
3.2.2 fT Versus gm/ID
For a MOS transistor, the transition frequency fT is defined as the frequency at which
the magnitude of the short-circuit, common-source current gain falls to unity [82]:
$$f_T = \frac{g_m}{2\pi (C_{gs} + C_{gd} + C_{gb})} \tag{3.5}$$
This parameter predicts, only approximately, the frequency limit of a single transistor
and can be considered a figure of merit for a transistor. The inaccuracy comes from the
approximate “lumped” transistor model and many high-order effects at higher frequencies,
as well as from the layout design and device connections. As a rule of thumb, the “lumped”
transistor model is accurate enough up to about fT /5 [4].
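As a quick numerical check of (3.5), the sketch below evaluates fT and the fT/5 rule-of-thumb limit for one bias point. The transconductance and capacitance values are hypothetical and are not taken from the extracted database.

```python
import math

def transition_frequency(gm, c_gs, c_gd, c_gb):
    """fT from (3.5), in Hz, for gm in siemens and capacitances in farads."""
    return gm / (2 * math.pi * (c_gs + c_gd + c_gb))

# Hypothetical small-signal values for one bias point.
f_t = transition_frequency(gm=5e-3, c_gs=5e-15, c_gd=2e-15, c_gb=1e-15)
f_usable = f_t / 5  # rule-of-thumb validity limit of the lumped model
```

With these assumed values fT is close to 100 GHz, so the lumped model would be trusted up to roughly 20 GHz.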
The fT versus gm/ID plot in Fig. 3.10 indicates that, for an NMOS transistor
biased in strong inversion, fT can surpass 100 GHz, while the PMOS transistor reaches
80 GHz. In weak inversion, fT drops below 1 GHz. This check simply gives rough frequency
limits of the technology in each inversion region.
An interesting observation is the design trade-off between gm/ID and fT. Fig. 3.11
shows gm/ID and fT versus the overdrive voltage Vov, which is defined as the VGS in excess
of the threshold voltage VTH, i.e., Vov = VGS − VTH. In weak inversion, gm/ID is large but
fT is small; in strong inversion, gm/ID is small but fT is large. The product of gm/ID and
fT, which is shown in Fig. 3.12, peaks in moderate inversion. Operating the transistor in
moderate inversion is optimal when speed and power efficiency are valued equally.
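The search for the bias point that maximizes the product of gm/ID and fT can be sketched as a simple scan over tabulated data. The samples below are hypothetical values shaped like the trends described above; they are not read from the figures.

```python
# Hypothetical (Vov [V], gm/ID [1/V], fT [GHz]) samples, for illustration only.
SAMPLES = [
    (-0.2, 26.0, 2.0),
    (-0.1, 22.0, 10.0),
    (0.0, 17.0, 35.0),
    (0.1, 12.0, 70.0),
    (0.2, 8.0, 100.0),
    (0.4, 5.0, 130.0),
]

def best_bias(samples):
    """Return the sample that maximizes the figure of merit (gm/ID) * fT."""
    return max(samples, key=lambda s: s[1] * s[2])

vov_opt, gm_id_opt, ft_opt = best_bias(SAMPLES)
```

In this assumed data set the maximum falls at a small positive overdrive, between the high-gm/ID and high-fT extremes.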
[Figure: NMOS and PMOS panels of gm/ID (V⁻¹, 0 to 30) versus i = ID/(W/L) (A, logarithmic, 10⁻¹² to 10⁻⁴) for Vds (Vsd) = 0.3, 0.6, 0.9 and 1.2 V, with L = 70 nm and W = {1, 2, ..., 160} µm.]
Fig. 3.9: gm/ID versus i for different drain-source voltage.
[Figure: NMOS and PMOS panels of fT (GHz, logarithmic) versus gm/ID (0 to 28 V⁻¹) for Vds (Vsd) = 0.3, 0.6, 0.9 and 1.2 V, with L = 70 nm and W = {1, 2, ..., 160} µm; the 100 GHz (NMOS), 80 GHz (PMOS) and 1 GHz levels are marked.]
Fig. 3.10: fT versus gm/ID for NMOS and PMOS transistors.
[Figure: two panels, fT (GHz, 0 to 200) and gm/ID (V⁻¹, 0 to 40), versus Vov (−0.6 to 0.8 V), with the weak, moderate and strong inversion regions marked.]
Fig. 3.11: gm/ID and fT versus overdrive voltage Vov.
[Figure: (gm/ID)·fT (V⁻¹·GHz, 0 to 1,000) versus Vov (−0.6 to 0.8 V), with the weak, moderate and strong inversion regions marked.]
Fig. 3.12: The product of gm/ID and fT peaks in moderate inversion [4].
3.2.3 gds/ID Versus gm/ID
The drain current ID also varies with the drain-source voltage Vds due to channel-length
modulation. A very important MOSFET parameter for analog circuit design
is the drain-to-source conductance gds, defined by [83]:
$$g_{ds} = \frac{\partial I_D}{\partial V_{ds}} = \frac{1}{r_o} = \lambda I_D \tag{3.6}$$
where ro is the output resistance of the transistor and λ = gds/ID is the channel-length
modulation coefficient introduced in (3.1).
The gds/ID versus gm/ID plot for four drain-source voltages is presented in Fig. 3.13.
The transistor length is L = 70 nm, while the transistor width is swept over
{1, 2, 3, 4, 6, ..., 160} µm. The variations are clearly appreciable. The ratio gds/ID,
i.e. λ, remains low when gm/ID is small, in which case the transistor is in the strong
inversion region, leading to a higher output resistance (ro = 1/(λID)) at a constant ID.
Note that the transistor intrinsic gain Ai, as defined in (3.7), is the reciprocal of the
slope of the line joining the origin to the operating point on the curve in Fig. 3.13.
$$A_i = g_m r_o = \frac{g_m}{g_{ds}} = \frac{g_m / I_D}{g_{ds} / I_D} \tag{3.7}$$
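Equation (3.7) means the intrinsic gain can be read directly from the two normalized quantities stored in the database. A minimal sketch follows, with a hypothetical operating point and assumed function names.

```python
def intrinsic_gain(gm_over_id, gds_over_id):
    """Ai from (3.7): ratio of the two normalized conductances."""
    return gm_over_id / gds_over_id

def output_resistance(gds_over_id, i_d):
    """ro = 1 / (lambda * ID), with lambda = gds/ID as in (3.6)."""
    return 1.0 / (gds_over_id * i_d)

# Hypothetical operating point read from a gds/ID versus gm/ID curve.
a_i = intrinsic_gain(gm_over_id=12.0, gds_over_id=0.8)
r_o = output_resistance(gds_over_id=0.8, i_d=100e-6)
```

Since both inputs are normalized to ID, the gain estimate does not depend on the absolute bias current, whereas ro does.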
It is worth observing the gds/ID versus gm/ID plot for different transistor lengths L. As
shown in Fig. 3.14, gds/ID decreases as L increases. This is expected because gds/ID,
i.e. λ, is inversely proportional to the channel length L (λ ∝ (1/L)) in the first-order
model approximation [83].
The gds parameter has a considerable spread with variations in the drain-source
voltage and transistor length. We therefore expect some differences between the
computed data and simulations if the correct drain voltage is not chosen.
3.2.4 gmb/gm Versus gm/ID
[Figure: NMOS and PMOS panels of gds/ID (V⁻¹, 0 to 3) versus gm/ID (0 to 30 V⁻¹) for Vds (Vsd) = 0.3, 0.6, 0.9 and 1.2 V, with L = 70 nm and W = {1, 2, ..., 160} µm.]
Fig. 3.13: gds/ID versus gm/ID for different Vds.
The bulk potential influences the threshold voltage and hence the gate-source overdrive.
This is called the bulk effect, which is mainly factored in as an effective increase in the
threshold voltage VTH. The threshold voltage is given by [83]:
$$V_{TH} = V_{TH0} + \gamma \left( \sqrt{|2\phi_F + V_{SB}|} - \sqrt{|2\phi_F|} \right) \tag{3.8}$$
where VTH0, γ and φF are technology-related parameters. This equation states that
the threshold voltage is a function of the technology and the applied source-bulk voltage
VSB. Defining the bulk transconductance gmb = ∂ID/∂VBS, its relation to gm is [83]:
$$\frac{g_{mb}}{g_m} = \eta = \frac{\gamma}{2\sqrt{2\phi_F + V_{SB}}} \tag{3.9}$$
This equation suggests that the incremental bulk effect becomes less pronounced as VSB
increases. gmb is much smaller than gm, as shown in Fig. 3.15. However, it should
be extracted into the database, especially when the bulk effect is important for the
circuit behavior.
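The two bulk-effect relations can be evaluated side by side to see the trend with the source-bulk voltage. The sketch below assumes the standard forms VTH = VTH0 + γ(√|2φF + VSB| − √|2φF|) and gmb/gm = γ/(2√(2φF + VSB)); the values of VTH0, γ and φF are hypothetical, chosen only to give plausible magnitudes.

```python
import math

def threshold_voltage(v_sb, v_th0, gamma, phi_f):
    """Threshold voltage including the bulk effect, for a source-bulk voltage v_sb."""
    return v_th0 + gamma * (math.sqrt(abs(2 * phi_f + v_sb)) - math.sqrt(abs(2 * phi_f)))

def gmb_over_gm(v_sb, gamma, phi_f):
    """Ratio of bulk transconductance to gate transconductance."""
    return gamma / (2 * math.sqrt(2 * phi_f + v_sb))

# Hypothetical technology parameters, chosen only for illustration.
V_TH0, GAMMA, PHI_F = 0.40, 0.20, 0.45

table = [(v_sb,
          threshold_voltage(v_sb, V_TH0, GAMMA, PHI_F),
          gmb_over_gm(v_sb, GAMMA, PHI_F))
         for v_sb in (0.0, 0.2, 0.4)]
```

With these assumed parameters the threshold rises and the gmb/gm ratio falls as VSB increases, with the ratio staying near ten percent.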
[Figure: NMOS and PMOS panels of gds/ID (V⁻¹, 0 to 2) versus gm/ID (0 to 30 V⁻¹) for L = 70, 80, 90, 100, 110, 120, 150, 200, 250 and 300 nm, with W = 10 µm and Vds = 0.6 V.]
Fig. 3.14: gds/ID versus gm/ID for different transistor length L.
[Figure: NMOS and PMOS panels of gmb/gm (%, 6 to 14) versus gm/ID (0 to 30 V⁻¹) for Vsb (NMOS) or Vbs (PMOS) = 0, 0.2 and 0.4 V, with W = 10 µm, L = 70 nm, Vdb = 0.6 V and the bulk tied to ground (NMOS) or VDD (PMOS).]
Fig. 3.15: gmb/gm versus gm/ID for NMOS and PMOS transistor.
3.2.5 Normalized Capacitance Cij Versus gm/ID
At high frequencies, the intrinsic capacitances of the MOSFET, mainly
{Cgs, Cgd, Cgb, Cbd, Cbs}, must be taken into account since they substantially
influence the circuit behavior. These capacitances change with the inversion level and the
transistor size [83] and are difficult to model or predict before simulation. Fiorelli et
al. [85] proposed a semi-empirical method to model the MOSFET intrinsic capacitances for
use in the gm/ID methodology. In this method, the intrinsic capacitances are considered to
be proportional to the gate area (WL). This is reasonable because these capacitances are
proportional to the gate-oxide capacitance, which itself scales with (WL) [82,83].
The normalized capacitance is defined by:
$$\left. C_{ij} \right|_{\text{normalized}} = \frac{C_{ij}}{WL}, \quad ij = \{gs, gd, gb, bd, bs\} \tag{3.10}$$
Figs. 3.16 and 3.17 show the normalized capacitance Cij versus gm/ID for NMOS and
PMOS transistors over a set of widths and Vds values. The normalized Cij values are used to
estimate the capacitances of transistors in the considered width range simply by
multiplying by the respective WL. A rough estimate can be obtained by extracting a
middle-sized MOSFET at a mid-range Vds bias.
A summary of their basic properties is:
• Cgs: (1) varies between 4-10 fF/µm² through all regions; (2) is the largest capacitance
and the main one to take into account in the circuit design.
• Cgd: (1) mostly remains at around 4 fF/µm² through all regions; (2) can be as large
as Cgs when Vds is low in strong inversion.
• Cgb: (1) varies between 1-3 fF/µm² through all regions; (2) values are close to Cgd.
Therefore, Cgd and Cgb are the second largest capacitances and sometimes need to
be considered.
• Cbd: (1) no more than 0.05 fF/µm² through all regions; (2) can be ignored in most
cases.
• Cbs: (1) no more than 1 fF/µm² through all regions; (2) can be ignored in most cases.
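The use of the normalized capacitances can be sketched as a scaling by the gate area. The per-area values below are hypothetical mid-range picks consistent with the ranges summarized above; actual designs would read them from the extracted curves at the chosen gm/ID and Vds.

```python
# Hypothetical normalized capacitances in fF/um^2, picked inside the ranges
# summarized above; real values come from the extracted curves.
NORMALIZED_FF_PER_UM2 = {"gs": 7.0, "gd": 4.0, "gb": 2.0, "bd": 0.05, "bs": 1.0}

def estimate_capacitances(width_um, length_um, normalized=NORMALIZED_FF_PER_UM2):
    """Scale each normalized capacitance by the gate area W*L; result in fF."""
    area_um2 = width_um * length_um
    return {name: value * area_um2 for name, value in normalized.items()}

caps = estimate_capacitances(width_um=10.0, length_um=0.07)
```

For a 10 µm by 70 nm device the gate area is 0.7 µm², so the assumed Cgs of 7 fF/µm² corresponds to about 4.9 fF.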
3.2.6 Data Acquisition Scheme
To acquire these data, the test circuits of Fig. 3.18 are used. The MOSFET
gate, drain and source nodes are each connected to a DC voltage source. The bulk nodes are
connected either to ground (NMOS) or to the supply voltage VDD (PMOS). VGS is swept
from 0 to VDD, while VD and VS are set around their expected DC values. The power supply is
1.2 V in this process.
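The bias grid for such a DC sweep can be enumerated as below. This is only a sketch of how the sweep points might be organized before being handed to a circuit simulator: the 10 mV step and the function names are assumptions, while VDD and the Vds values are those quoted in this chapter.

```python
VDD = 1.2                          # supply voltage from the text, in volts
VDS_VALUES = (0.3, 0.6, 0.9, 1.2)  # drain-source biases used in the plots

def vgs_sweep(step_mv=10):
    """VGS points from 0 to VDD inclusive; the step size is an assumption."""
    n_steps = round(VDD * 1000 / step_mv)
    return [k * step_mv / 1000 for k in range(n_steps + 1)]

def bias_points(step_mv=10):
    """All (VDS, VGS) pairs to pass to a DC simulation of one test circuit."""
    return [(vds, vgs) for vds in VDS_VALUES for vgs in vgs_sweep(step_mv)]

points = bias_points()
```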
3.3 On-Chip Inductor gm/ID Model
In radio frequency design, on-chip passive inductors need to be correctly characterized
since they strongly determine the performance of the circuit. For example, in a VCO design,
if the tank inductor model exhibits a parasitic parallel resistance much lower than the
real one, the VCO will fail to oscillate.
[Figure: five panels of normalized Cgd, Cgs, Cgb, Cbd and Cbs (fF/µm²) versus gm/ID (0 to 28 V⁻¹) for Vds = 0.3, 0.6, 0.9 and 1.2 V; NMOS, L = 70 nm, W = {1, 2, ..., 160} µm.]
Fig. 3.16: Normalized capacitance Cij versus gm/ID for NMOS transistor, ij = {gs, gd, gb,
bd, bs}.
[Figure: five panels of normalized Cgd, Cgs, Cgb, Cbd and Cbs (fF/µm²) versus gm/ID (0 to 28 V⁻¹) for Vds = 0.3, 0.6, 0.9 and 1.2 V; PMOS, L = 70 nm, W = {1, 2, ..., 160} µm.]
Fig. 3.17: Normalized capacitance Cij versus gm/ID for PMOS transistor, ij = {gs, gd, gb,
bd, bs}.
[Figure: NMOS and PMOS test circuits with DC sources Vg, Vd and Vs, the drain current ID indicated, and the PMOS bulk tied to VDD.]
Fig. 3.18: Test circuits to acquire gm/ID, gds/ID, gmb/gm and Cij .
Inductor models can be extracted with 3-D electromagnetic simulators that solve
Maxwell's equations numerically, such as HFSS (Ansys, Canonsburg, PA), ADS
Momentum (Keysight, Santa Rosa, CA) or ASITIC [92]. Although accurate, they have
three main drawbacks: (1) they need technological data from the foundry to obtain
accurate descriptions; (2) the simulations are computationally intensive, both in
memory and time; and (3) these tools complicate the interface between the
inductor model and the circuit simulator (such as SPICE).
To speed up the design, the inductor models in this dissertation are extracted
from the parameterized library provided by the foundry. The inductors in the library
are simulated using AC analysis at the working frequency to obtain their equivalent complex
impedance.
The schematic in Fig. 3.19 shows the AC analysis setup for single-ended and differential
inductors. For a single-ended inductor, one port is connected to the source and the
other is grounded. For a differential inductor, the middle port is AC grounded, and the other
two are connected to identical sources with a phase difference of 180° so that the
differential voltage between these ports is taken into account.

Fig. 3.19: AC analysis for (a) single-ended inductor and (b) differential inductor.
An extracted inductor can be modeled as an equivalent ideal inductor with a parasitic
resistor, either in series or in parallel, at the working frequency f0, as presented in Fig. 3.20.

Fig. 3.20: (a) Series inductor model (Ls, Rs) and (b) parallel inductor model (Lp, Rp).
The impedance of an inductor is defined by:
\[ Z_{ind} = R_s + j\omega_0 L_s = R_p \parallel j\omega_0 L_p \tag{3.11} \]
The quality factor, Q, is:
\[ Q = \frac{\omega_0 L_s}{R_s} = \frac{R_p}{\omega_0 L_p} \tag{3.12} \]
At the working frequency, the series combination of Fig. 3.20a can be converted to its
equivalent parallel configuration of Fig. 3.20b by the relationship:
\[ L_p = L_s\left(1 + \frac{1}{Q^2}\right), \qquad R_p = R_s\left(1 + Q^2\right) \tag{3.13} \]
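The series-to-parallel conversion of (3.12) and (3.13) can be sketched in a few lines. This is only an illustration: the function name and the example inductor (5 nH, 5 Ω) are assumptions and are not taken from the foundry library.

```python
import math

def series_to_parallel(l_s, r_s, f0):
    """Convert a series (L_s, R_s) inductor model to its parallel form at f0."""
    w0 = 2 * math.pi * f0
    q = w0 * l_s / r_s               # quality factor, series form of (3.12)
    l_p = l_s * (1 + 1 / q**2)       # (3.13)
    r_p = r_s * (1 + q**2)           # (3.13)
    return l_p, r_p, q

# Hypothetical inductor: 5 nH with 5 ohm series resistance at 1.6 GHz.
l_p, r_p, q = series_to_parallel(l_s=5e-9, r_s=5.0, f0=1.6e9)
print(f"Q = {q:.2f}, Lp = {l_p * 1e9:.3f} nH, Rp = {r_p:.1f} ohm")
```

A quick consistency check is that the parallel form of (3.12), Rp/(ω0·Lp), returns the same Q as the series form.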
Fig. 3.21: Layout of a differential inductor with its parameters: coil width w, number of
turns, and diameter d.

Fig. 3.22: Series resistance, parallel resistance and quality factor (Q) versus L at f0 =
1.6 GHz, for turn = {1, 2, 3, 4, 5, 6}, d = (60:5:160) µm, and w = 6, 8, 10 and 12 µm.
CHAPTER 4
UWB TRANSCEIVER CIRCUITS IMPLEMENTATION
This chapter describes the design and implementation of each circuit block, focusing
primarily on the ultra-low-power target and the gm/ID methodology for circuit optimization.
The block diagram of the proposed UWB transceiver is shown in Fig. 4.1. It consists of a
transmitter with an error correction encoder and a receiver with a common-gate low noise
amplifier (CG-LNA), an injection-locked VCO (IL-VCO), a phase shifter and a divide-by-16
frequency divider (/16). The phase shifter aligns the phase of the clock generated by
the divider with the amplified data pulses from the LNA.
An FPGA (field-programmable gate array) is used for testing: it generates PRBS
(pseudorandom binary sequence) data for the TX and collects the decoded (recovered) data
from the RX for the BER calculation. Considering the properties of the fabricated coils
described in Chapter 2, 1.6 GHz was selected as the carrier frequency. With the divide-by-16
divider, the target sample clock generated for the 2-bit ADC is therefore 100 MHz, and the
target data rate is 100 Mbps. Due to fabrication deadlines, however, the 2-bit ADC was not
implemented on-chip.
4.1 Low Noise Amplifier
4.1.1 Brief Introduction
There are two candidate topologies for the LNA: common-source (CS) and common-gate
(CG). The basic CS-LNA and CG-LNA circuits are depicted in Fig. 4.2.
The input impedance of the CG-LNA stage, Zin,CG, is approximately 1/gm, which is easy
to match to 50 Ω, while that of the CS-LNA, Zin,CS, is:
\[ Z_{in,CS} = s\,(L_g + L_s) + \frac{1}{sC_{gs}} + \left(\frac{g_m}{C_{gs}}\right)L_s \tag{4.1} \]
Fig. 4.1: Block diagram of the UWB transceiver prototype.

Fig. 4.2: Basic LNA stage: (a) CS-LNA; (b) CG-LNA.
As can be seen, Zin,CS depends on the input parasitic parameters and is therefore
sensitive to process, voltage and temperature (PVT) variations.
The effective transconductance of the CS-LNA stage, with the input matched to Rs,
is:
\[ G_{m,CS} = \frac{1}{2R_s}\left(\frac{\omega_T}{\omega_0}\right) \tag{4.2} \]
where ωT is the transition frequency and ω0 is the operating frequency.
The effective input transconductance of the CG-LNA under the same perfect matching
conditions is:
\[ G_{m,CG} = \frac{1}{2R_s} \tag{4.3} \]
Typically, the ωT /ω0 lies in the range of 5-10 depending on the operating frequency
and the process details. Therefore, the CS-LNA has higher gain than the CG-LNA [93].
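The theoretical minimum noise figure of the CS-LNA, with the optimum Q of the input
resonant circuit, is [93]:
\[ F_{min,CS} = 1 + \frac{\gamma}{\alpha}\left(\frac{\omega_0}{\omega_T}\right)\frac{2\delta\alpha^2}{5\gamma}\,Q_{opt}, \qquad Q_{opt} = \sqrt{1 + 2|c|\sqrt{\frac{5\gamma}{\delta\alpha^2}} + \frac{5\gamma}{\delta\alpha^2}} \tag{4.4} \]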
α, γ and δ are bias-dependent parameters, c is the correlation coefficient between the
gate noise and drain noise of the MOSFET [94].
For the CG-LNA, the noise figure is approximately:
\[ F_{CG} = 1 + \frac{\gamma}{\alpha} \tag{4.5} \]
The noise figure of the CG-LNA is constant with respect to ωT/ω0, while the excess noise
factor of the CS-LNA in (4.4) scales linearly with ω0/ωT.
The CS-LNA also exhibits inferior reverse isolation and stability due to the Miller effect
because the Cgd provides a feed-forward path between input and output.
Table 4.1 compares the two basic LNA topologies. In Hu's design
[60], two CS-LNAs were deployed in order to achieve a broadband frequency response from
3 to 5 GHz. The first LNA was centered at f1 = 3.5 GHz and the second at
f2 = 4.5 GHz. This solution achieved good gain and noise performance, but at the
cost of large power consumption (45 mW for the RX in [60]).
Table 4.1: ADVANTAGE (✓) AND DISADVANTAGE (✗) OF LNA STAGES
Specification        CS-LNA    CG-LNA
Input Matching       ✗         ✓
Bandwidth            ✗         ✓
Gain                 ✓         ✗
Noise Figure         ✓         ✗
Sensitivity          ✗         ✓
Power                ✗         ✓
Reverse Isolation    ✗         ✓
Parasitic            ✗         ✓

For the design in this dissertation, a single-stage CG-LNA is adopted to meet the
critical power constraint. Furthermore, the capacitor cross-coupled gain-boosting technique
is used to improve the noise figure and gain. Gain-boosted LNA designs have
been described in many publications [93,95,96]. Because it is difficult to accurately model
the parasitic parameters at radio frequencies, most of these designs were based on simplified
models and equations. This section focuses on full model construction and optimization
using the gm/ID methodology in all transistor operating regions.
4.1.2 Gain-Boosted CG-LNA
The basic idea of gain boosting is to introduce an inverting amplification of gain A between
the source and gate terminals, which increases the effective transconductance Gm,eff
to (1 + A)gm, as depicted in Fig. 4.3.

Fig. 4.3: Gain-boosting basic idea: (a) conventional CG-LNA, where the drain current source is gmvgs = −gmvs; (b) gain-boosted CG-LNA, where Vgs = −(1 + A)Vs and the drain current source is gmvgs = −gm(1 + A)vs.
In a differential topology, the inverting gain is naturally available. Fig. 4.4 shows the
complete gain-boosted CG-LNA circuit with capacitive cross-coupling (C0 and C1). In a
conventional CG-LNA, the gates are AC shorted to ground. In the gain-boosted CG-LNA,
however, the gate of M0, for example, is connected to the source of M1 through C1.
As presented in the signal model of M0 in Fig. 4.5, the gate G0 is not AC grounded.
The gate-source voltage Vgs is therefore a fraction of Vs1 − Vs0 = −2Vs, set by the
capacitive divider formed by Cc and Cgs:
\[ \frac{V_{gs}}{-2V_s} = \frac{C_c}{C_c + C_{gs}} \tag{4.6} \]
Hence the drain current of M0 is now:
\[ I_{d0} = -g_m\left[\frac{2C_c}{C_c + C_{gs}}\right]V_s \tag{4.7} \]
Thus the effective transconductance is:
\[ G_{m,eff} = \left[\frac{2C_c}{C_c + C_{gs}}\right]g_m = (1 + A_b)\,g_m \tag{4.8} \]
where Ab = (Cc − Cgs) / (Cc + Cgs). In this case, Cc ≫ Cgs, resulting in Ab ≈ 1 and
Gm,eff ≈ 2gm.
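To see how close the boost factor gets to its ideal value for a finite coupling capacitor, (4.8) can be evaluated numerically. The sketch below assumes illustrative component values (gm = 10 mS, Cc = 1 pF, Cgs = 50 fF); the function names are hypothetical.

```python
def boost_factor(c_c, c_gs):
    """Return A_b = (C_c - C_gs) / (C_c + C_gs), as defined below (4.8)."""
    return (c_c - c_gs) / (c_c + c_gs)

def effective_gm(g_m, c_c, c_gs):
    """Effective transconductance G_m,eff = (1 + A_b) * g_m from (4.8)."""
    return (1 + boost_factor(c_c, c_gs)) * g_m

# Hypothetical values: g_m = 10 mS, C_c = 1 pF, C_gs = 50 fF.
a_b = boost_factor(1e-12, 50e-15)
g_eff = effective_gm(10e-3, 1e-12, 50e-15)
print(f"A_b = {a_b:.3f}, Gm_eff = {g_eff * 1e3:.2f} mS")
```

With Cc twenty times larger than Cgs, the boost factor already exceeds 0.9, so Gm,eff is within about 5 % of 2gm.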
4.1.3 Signal Modeling
The equivalent half-circuit model of the differential CG-LNA is presented in Fig. 4.6.
It consists of a common-gate, source-degenerated transistor M with effective
transconductance Gm = (1 + Ab)(gm + gmb) (bulk effect included). The input signal is fed to the
source terminal via the source impedance Rs (typically 50 Ω), the off-chip capacitor Cext, the
bond wire and the pad. Cext is included to facilitate the input matching condition with
a feasibly low-value source inductor Ls. The output stage consists of an inductor Ld connected
to the power supply and a varactor Cv for inter-stage matching. The load ZL is the total
input impedance of the following stages (IL-VCO and buffer).
Fig. 4.4: Complete gain-boosted CG-LNA.

Fig. 4.5: Effective transconductance calculation of capacitive cross-coupled CG-LNA.

Fig. 4.6: Equivalent half circuit model of the CG-LNA.
In practice, the parasitic parameters of the pad, the bond wire, the package leads and even
the PCB (printed circuit board) traces should be considered and modeled to make sure
that the total input impedance of the LNA, Zin,CG, is resistive and equal to Rs. This can be
done by simulating and extracting those parasitics from the Process Design Kit (PDK) or
from information offered by the foundry. In this dissertation, however, only the circuit
design considerations are discussed; all other parasitics such as the bond wire and pad are
outside the scope of the discussion (although they were considered in the actual chip design).
Therefore, Cext is moved forward and included in the input impedance Zin, as shown in
Fig. 4.6.
The small-signal model for half of the differential CG-LNA is depicted in Fig. 4.7. Rps
and Rpd are the parallel parasitic resistances of Ls and Ld, respectively. Note that the series
parasitic resistance of Ls should be small in order to limit the power consumed (wasted) by
Ls; equivalently, Rps is much larger than Rs. Zin and Zout are the input and output impedances
of the CG-LNA, respectively. Zd,mos is the output impedance seen looking into the transistor
from the drain, and Zd is the input impedance of the output stage.
Input Impedance Match
The input matching condition is Zin = Rs + j0. Computing Zin, we have:
\[ Z_{in}^{-1} = j\omega_0\left(C_{ext} + C_{gs} + C_{bs}\right) + \frac{1}{j\omega_0 L_s} + \left[\frac{1}{R_{ps}} + \frac{1 + G_m r_o}{r_o + Z_d}\right] = \frac{1}{R_s} + j0 \tag{4.9} \]
Solving for Zd and Cext gives:
\[ Z_d = Z_{d,match} = R_s\,\frac{1 + G_m/g_{ds}}{1 - R_s/R_{ps}} - \frac{1}{g_{ds}} \tag{4.10} \]
\[ C_{ext} = \frac{1}{\omega_0^2 L_s} - C_{gs} - C_{bs} \tag{4.11} \]
Here, ro = 1/gds and ω0 is the center frequency of the input signal. Zd must be fixed to
Zd,match in (4.10) to achieve the input matching.
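The input matching solution can be checked numerically by substituting (4.10) and (4.11) back into (4.9). The following is a minimal sketch under assumed values: the operating point (Gm = 40 mS, gds = 2 mS, Rps = 1 kΩ, Ls = 2 nH, Cgs = 50 fF, Cbs = 5 fF) and all function names are illustrative and do not come from the design database.

```python
import math

def input_match(g_m_eff, g_ds, r_s, r_ps, l_s, c_gs, c_bs, f0):
    """Return (Z_d,match, C_ext) from (4.10) and (4.11)."""
    w0 = 2 * math.pi * f0
    z_d_match = r_s * (1 + g_m_eff / g_ds) / (1 - r_s / r_ps) - 1 / g_ds  # (4.10)
    c_ext = 1 / (w0**2 * l_s) - c_gs - c_bs                               # (4.11)
    return z_d_match, c_ext

def input_admittance(g_m_eff, g_ds, r_ps, l_s, c_gs, c_bs, c_ext, z_d, f0):
    """Left-hand side of (4.9), used here only to check the solution."""
    w0 = 2 * math.pi * f0
    r_o = 1 / g_ds
    return (1j * w0 * (c_ext + c_gs + c_bs) + 1 / (1j * w0 * l_s)
            + 1 / r_ps + (1 + g_m_eff * r_o) / (r_o + z_d))

# Hypothetical operating point (assumed values for illustration only).
params = dict(g_m_eff=40e-3, g_ds=2e-3, r_ps=1000.0, l_s=2e-9,
              c_gs=50e-15, c_bs=5e-15, f0=1.6e9)
z_d_match, c_ext = input_match(r_s=50.0, **params)
y_in = input_admittance(c_ext=c_ext, z_d=z_d_match, **params)
print(f"Zd,match = {z_d_match:.1f} ohm, Cext = {c_ext * 1e12:.2f} pF, 1/Yin = {1 / y_in:.2f}")
```

If the solution is correct, the admittance returned by the check is purely real and equal to 1/Rs.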
Output Impedance Match
With Vin shorted to AC ground, and noting that (Cext + Cgs + Cbs) and Ls cancel
out at the center frequency ω0, the output impedance is:
\[ Z_{out}^{-1} = j\omega_0\left(C_v + C_{bd} + C_{gd}\right) + \frac{1}{j\omega_0 L_d} + \left(\frac{1}{R_{pd}} + Z_{d,mos}^{-1}\right) \tag{4.12} \]
where Zd,mos can be obtained after some calculations:
\[ Z_{d,mos} = \frac{1}{g_{ds}} + \left(1 + \frac{G_m}{g_{ds}}\right)\left(R_s \parallel R_{ps}\right) \tag{4.13} \]
Fig. 4.7: Small signal model for gain-boosted CG-LNA.

The output impedance matching condition is that Zout is conjugate-matched to the
load ZL, i.e. ZL = Z*out. Therefore, Zd can be expressed as:
\[ Z_d^{-1} = \frac{1}{j\omega_0 L_d} + j\omega_0\left(C_v + C_{bd} + C_{gd}\right) + \frac{1}{R_{pd}} + \left(Z_{out}^{*}\right)^{-1} \tag{4.14} \]
Substituting (4.12) into (4.14) gives:
\[ Z_d^{-1} = \frac{2}{R_{pd}} + Z_{d,mos}^{-1} \tag{4.15} \]
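The cancellation behind (4.15) can be written out explicitly. The following worked step assumes that Zd,mos is purely real, as in (4.13); the shorthand symbols B and G_out are introduced here only for illustration.

```latex
% Shorthand (illustrative): B is the net susceptance of (C_v + C_bd + C_gd) and L_d,
% G_out is the real part of the output admittance in (4.12).
\begin{align*}
B &= \omega_0\left(C_v + C_{bd} + C_{gd}\right) - \frac{1}{\omega_0 L_d}, \qquad
G_{out} = \frac{1}{R_{pd}} + Z_{d,mos}^{-1} \\
Z_{out}^{-1} &= G_{out} + jB
  \quad\Rightarrow\quad \left(Z_{out}^{*}\right)^{-1} = G_{out} - jB \\
Z_d^{-1} &= jB + \frac{1}{R_{pd}} + \left(Z_{out}^{*}\right)^{-1}
          = jB + \frac{1}{R_{pd}} + G_{out} - jB
          = \frac{2}{R_{pd}} + Z_{d,mos}^{-1}
\end{align*}
```

The reactive terms cancel because the load is the conjugate of Zout, which leaves only the two copies of 1/Rpd and the transistor term.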
Note that Zd must be set to Zd,match in (4.10) to satisfy the input matching condition;
hence we have:
\[ \frac{1}{R_{pd}} = \frac{1}{2}\left(Z_{d,match}^{-1} - Z_{d,mos}^{-1}\right) \tag{4.16} \]
Power Gain
For convenience, the signal model of the LNA is sketched in Fig. 4.8. Under
perfect matching (at ω0), the input capacitance and inductance cancel out. The input
impedance of the input stage Zin thus equals Rs, which gives Vs = (1/2)Vin. Part of the input
current Iin is shunted into Rps, denoted Ips, while the other part is transferred
to the output stage (Iout).
According to Kirchhoff's current law (KCL), with Vs = IinRs and Ips = Vs/Rps, Iout is:
\[ I_{out} = \left(1 - \frac{R_s}{R_{ps}}\right) I_{in} \tag{4.17} \]
The input power is:
\[ P_{in} = I_{in}^2 R_s \tag{4.18} \]
At the output stage, Iout splits between Zout and ZL, where ZL is conjugate-matched
to Zout because of the output matching. If Zout = a + jb, then ZL = a − jb. Only the real part
consumes power, thus the load power is:
\[ P_{out} = a \cdot I_{out}^2/4 \tag{4.19} \]
The power gain of the CG-LNA can be written as:
\[ G = 10\log\left(\frac{P_{out}}{P_{in}}\right) = 10\log\left[\frac{a}{4R_s}\left(1 - \frac{R_s}{R_{ps}}\right)^2\right] \tag{4.20} \]
The real part can be obtained from (4.12) and (4.16):
\[ a^{-1} = \frac{1}{R_{pd}} + Z_{d,mos}^{-1} = \frac{1}{2}\left(Z_{d,match}^{-1} + Z_{d,mos}^{-1}\right) \tag{4.21} \]
Finally, the noise factor is given by [93]:
\[ F = 1 + \frac{\gamma}{\alpha}\,\frac{1}{(1 + A_b)\,G_m R_s} \tag{4.22} \]
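Equations (4.10), (4.13), (4.16), (4.21) and (4.20) chain together into a single gain estimate. The sketch below strings them together for one assumed operating point (Gm = 40 mS, gds = 2 mS, Rs = 50 Ω, Rps = 1 kΩ); the function names and values are hypothetical, and the error branch mirrors the case where no realizable Rpd exists.

```python
import math

def parallel(x, y):
    """Parallel combination of two resistances."""
    return x * y / (x + y)

def cg_lna_gain_db(g_m_eff, g_ds, r_s, r_ps):
    """Return (power gain in dB, required R_pd, a) for a matched CG-LNA."""
    z_d_match = r_s * (1 + g_m_eff / g_ds) / (1 - r_s / r_ps) - 1 / g_ds  # (4.10)
    z_d_mos = 1 / g_ds + (1 + g_m_eff / g_ds) * parallel(r_s, r_ps)       # (4.13)
    inv_r_pd = 0.5 * (1 / z_d_match - 1 / z_d_mos)                        # (4.16)
    if inv_r_pd <= 0:
        raise ValueError("no positive R_pd satisfies both matching conditions")
    a = 1 / (inv_r_pd + 1 / z_d_mos)                                      # (4.21)
    gain_db = 10 * math.log10(a / (4 * r_s) * (1 - r_s / r_ps) ** 2)      # (4.20)
    return gain_db, 1 / inv_r_pd, a

# Hypothetical operating point (assumed values for illustration only).
gain_db, r_pd, a = cg_lna_gain_db(g_m_eff=40e-3, g_ds=2e-3, r_s=50.0, r_ps=1000.0)
print(f"G = {gain_db:.2f} dB, Rpd = {r_pd:.0f} ohm, a = {a:.1f} ohm")
```

In a design loop, the returned Rpd would then be compared with the parallel resistances available in the inductor database to pick Ld.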
4.1.4 gm/ID Design Verification
With the input/output impedance matching, power gain and noise factor expressed as
functions of the MOSFET model, this section details the gm/ID design flow that optimizes the
power gain and noise figure within a power constraint.
First, the minimum transistor length Lmin = 70 nm is chosen in order to maximize
fT. Next, a set of inductors Ls with large parallel parasitic resistance is selected from the
inductor database; in this section, 10 pairs of (Ls, Rps) are selected. The working frequency
f0 is 1.6 GHz. Cv can be left out of consideration for now; it has to be included in the
system simulation when the LNA is connected to the following circuit blocks. The gm/ID
design algorithm is organized in Algorithm 1 below.

Fig. 4.8: Power gain calculation.
Algorithm 1 gm/ID Methodology for Gain-Boosted Common-Gate LNA Design
Input: (Ls, Rps): a pair from the inductor database; an acceptable Power.
Output: Power gain (G).
1: for each Power do
2:   Compute total current 2ID;
3:   for gm/ID = 3 : 1 : 24 do
4:     Find the normalized current i, compute transistor width W;
5:     Compute the absolute capacitances Cij = W·L·Cij (normalized);
6:     Find the output conductance gds;
7:     Compute Zd,match from (4.10), Cext from (4.11), Zd,mos from (4.13);
8:     Compute Rpd from (4.16), a from (4.21);
9:     Find an Ld with parasitic parallel resistance equal or close to Rpd; if it does
       not exist, return an error flag;
10:    Compute power gain G from (4.20);
11:  end for
12: end for
Table 4.2 shows three different designs with gm/ID = 6, 12 and 18, all targeting a
power gain of 10 dB. The Matlab calculations and the SpectreRF simulations give similar
results: the power gains differ by less than 20 %. Regarding the noise figure, Table 4.2
also reveals better noise performance when moving towards strong inversion.
Table 4.2: MATLAB CALCULATIONS AND SPECTRERF SIMULATIONS COMPARISON FOR THE CG-LNA.
gm/Id (V⁻¹)      Id (mA)         W (µm)           Ld (nH)          G (dB)                  NF (dB)
Calc.   Sim.     Calc.   Sim.    Calc.    Sim.    Calc.    Sim.    Calc.   Sim.            Calc.   Sim.
6       6.37     2.50    2.49    24.24    24.24   15.31    14.81   10      9.05 (9.5%)     -       2.94
12      12.43    1.60    1.61    72.56    72.6    13.18    14.22   10      8.67 (13.3%)    -       3.60
18      18.54    1.05    1.03    180.8    182     10.89    12.73   10      8.30 (17%)      -       4.03
SpectreRF simulation results of S11, S22, power gain, and noise figure (NF) for the
three CG-LNA designs are shown in Fig. 4.9.
The variable capacitance Cv can be adjusted from off-chip by tuning a control signal Vtune
from 0 to 1.2 V to compensate for process variation. The voltage gains with Vtune LNA = 0 V
and 1.2 V are plotted in Fig. 4.10. The average power of the LNA is 3.37 mW.
4.2 Injection-Locked VCO
4.2.1 Brief Introduction
Oscillator injection locking is widely used in communication systems for functions such as
frequency division [6, 97–99], phase-locked loops [100] and quadrature generation [101, 102].
When an external signal is applied to an oscillator, the latter ceases to be an autonomous
circuit and synchronizes to the external signal. This phenomenon was investigated by Adler in
1946 [76], and since then by many other authors, more recently in [77,103–105].
Generally, a free-running oscillator contains a gain block H(jω) and a filter F(jω), as
presented in Fig. 4.11a. According to the Barkhausen criteria [83], the oscillation can be
sustained at a given frequency ω0 if the negative feedback loop has a gain and phase that
satisfy:
\[ |H(j\omega_0)F(j\omega_0)| \geq 1, \qquad \angle H(j\omega_0)F(j\omega_0) = \pi \tag{4.23} \]
Suppose that the gain block is capable of satisfying the gain criterion in (4.23) at any
frequency; then the oscillation occurs at ω0, where the filter satisfies the phase criterion in
(4.23). ω0 is the free-running frequency of the oscillator.
Consider the case in Fig. 4.11b, where an external signal Vinj at frequency ωinj is applied to
the oscillator. The phase shift across the loop at ω0 is no longer π, and the oscillation
cannot remain at ω0. However, under some conditions, the phase response of the filter
compensates the extra phase shift, and Barkhausen's phase criterion is satisfied at ωinj
instead of ω0 [77]. We say that the oscillator is injection-locked by Vinj, and its oscillation
frequency moves from the natural frequency ω0 to the injection frequency ωinj. The range of
frequencies ω0 ± ωL where synchronization occurs defines the locking range of the oscillator.
This is called first-harmonic injection locking [103].

Fig. 4.9: SpectreRF simulation results of S11, S22, power gain, and noise figure (NF) versus frequency for the three CG-LNA designs (gm/Id = 6, Id = 2.5 mA; gm/Id = 12, Id = 1.6 mA; gm/Id = 18, Id = 1 mA).

Fig. 4.10: Voltage gain versus frequency when Vtune LNA = 0 V and Vtune LNA = 1.2 V.
In fact, if the injection signal contains a harmonic frequency component that is close to
ω0, the oscillator will oscillate at a fraction or a multiple of the input frequency ωinj. For
clarity, the injection-locking phenomenon is sketched in Fig. 4.12, where ωout is the output
frequency of the injection-locked oscillator and n is a positive integer
(n = {1, 2, 3, ...}).
• If ωinj is close enough to ω0 (within the locking range of the oscillator), the oscillation will
be sustained at ωinj instead of ω0. This is called first-harmonic injection locking.
• If ωinj is close enough to nω0, that is, (1/n)ωinj is close enough to ω0, the oscillation
will be sustained at (1/n)ωinj. This is called super-harmonic injection locking and is
widely used for frequency dividers [6, 97–99,103,106].
• If ωinj is close enough to (1/n)ω0, that is, nωinj is close enough to ω0, the oscillation will
be sustained at nωinj. This is called sub-harmonic injection locking, which is popular
for frequency multipliers [100].
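The three cases above can be summarized as a small decision rule. The sketch below is a simplification for illustration only: it assumes a single symmetric locking range ωL around ω0 for all three modes, which the text does not state, and the function name and the n_max search limit are hypothetical.

```python
def locked_output(w_inj, w0, w_lock, n_max=8):
    """Classify the locking mode and return (mode, output frequency).

    Assumption: the same half-width w_lock around w0 is used for first-,
    super- and sub-harmonic locking, which is a simplification.
    """
    if abs(w_inj - w0) <= w_lock:
        return "first-harmonic", w_inj
    for n in range(2, n_max + 1):
        if abs(w_inj / n - w0) <= w_lock:      # w_inj close to n * w0
            return "super-harmonic", w_inj / n
        if abs(n * w_inj - w0) <= w_lock:      # w_inj close to w0 / n
            return "sub-harmonic", n * w_inj
    return "unlocked", w0

# Frequencies in GHz for readability; 1.6 GHz free-running, 0.07 GHz half-width.
for f_inj in (1.62, 3.2, 0.8, 2.4):
    print(f_inj, locked_output(f_inj, 1.6, 0.07))
```

The unlocked branch simply returns the free-running frequency, ignoring the pulling effects that occur near the edge of the locking range.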
Fig. 4.11: Injection locking phenomenon: (a) free-running oscillator; (b) injection-locked
oscillator.
This section presents the analysis and modeling of a first-harmonic injection-locked
VCO and derives the equations needed for the gm/ID methodology.
4.2.2 Signal Modeling
The detailed view of the IL-VCO is shown in Fig. 4.13 [60]. It is built around a
complementary VCO composed of NMOS M5 and M6 and PMOS M7 and M8, with an LC tank that
consists of L and a digitally controlled capacitor bank Cbank. To distinguish this part from
the whole IL-VCO, it is referred to as the VCO Core, or simply the Core.
M1 and M2 form the injection differential pair, with M3 and M4 as isolation transistors.
The VCO Core is biased at Iosc while the injection pair is biased at Iinj. The bias current
Ibias is adjusted by an off-chip variable resistor Rv. M10 and M13 exhibit small resistance
since their gates are at the VDD voltage level. M11 and M14 act as MOS capacitors. Together,
they form low-pass filters that suppress noise from the reference current Ibias.
To simplify the analysis, we assume that the four transistors in the VCO Core are sized
so that their transconductances are the same, i.e. gm5,6 = gm7,8 = gm.
Cross-Coupled Structure
The cross-coupled transistors M5/M6 and M7/M8 need to be carefully analyzed before
the complete equations are constructed. Only the NMOS pair M5 and M6 needs to be
examined, since the complementary PMOS pair M7 and M8 behaves similarly.

Fig. 4.12: Injection-locking phenomenon: (a) first-harmonic injection locking; (b) super-harmonic injection locking; (c) sub-harmonic injection locking.

Fig. 4.13: Injection-locked VCO.
Fig. 4.14 shows the NMOS cross-coupled circuit and the simplification steps
from the complete small-signal model to the reduced model. Since M5 and M6 are matched,
Cij5 = Cij6 for ij = {gd, db, ds, gb, gs}, and gds5 = gds6. Thus, as
shown in Fig. 4.14, we have:
\[ C_{t5} = C_{t6} = C_{db5,6} + C_{ds5,6} + C_{gb5,6} + C_{gs5,6} \tag{4.24} \]
\[ C_{nmos} = 2C_{gd5,6} + \frac{1}{2}\left(C_{db5,6} + C_{ds5,6} + C_{gb5,6} + C_{gs5,6}\right) \tag{4.25} \]
where Cnmos is the equivalent parasitic capacitance of the NMOS cross-coupled
circuit. Similarly, for the PMOS cross-coupled circuit:
\[ C_{pmos} = 2C_{gd7,8} + \frac{1}{2}\left(C_{db7,8} + C_{ds7,8} + C_{gb7,8} + C_{gs7,8}\right) \tag{4.26} \]
Note that the final reduced model in Fig. 4.14 contains a negative resistance −2/gm;
this cross-coupled circuit is therefore also called a “negative-Gm oscillator” [83]. An analysis
from a negative feedback perspective can also be found in [83].
Complete IL-VCO Model
Now the complete model of the VCO Core can be constructed as shown in Fig. 4.15.
The inductor L is modeled as an ideal inductor Lind in parallel with its parasitic resistance
Rind (in conductance form, gind = 1/Rind). As can be seen, the VCO Core can finally
be modeled as an RLC network: a positive resistance 1/gcore, a negative resistance −1/gm,
an ideal inductor Lind and the parasitic capacitance Ccore, where:
\[ C_{core} = C_{nmos} + C_{pmos} + C_{bank} \tag{4.27} \]
\[ g_{core} = \frac{1}{2}g_{ds5,6} + \frac{1}{2}g_{ds7,8} + g_{ind} \tag{4.28} \]

Fig. 4.14: Cross-coupled NMOS small signal analysis.
Now consider the injection differential pair, which consists of M1, M2, M3 and M4. The
cascode structure is used for reverse isolation. Intuitively, without M3 and M4,
the output resistances and capacitances seen from the drains of M1 and M2 would be applied
directly to the Core and would affect the Core's behavior. Stacking the transistors M3 and M4
on M1 and M2, respectively, not only increases the applied resistances but also decreases
the applied capacitances.
To see this, and also to derive the equations for the gm/ID design flow, we first examine
the small-signal model of the cascode circuit, which is illustrated in Fig. 4.16. C1 includes
Cgd1, Cbd1, Cbs1 and Cgs3. C3 includes Cbd3 and Cgd3. The bulk effect of M3 is considered in
this case.
In order to calculate the partial impedance Z′inj shown in the last step of Fig. 4.16,
apply a test output voltage vo and output current io at node D3. The voltage at
node D1 is then:
\[ V_{D1} = \left[i_o + (g_{m3} + g_{mb3})V_{D1}\right]\left(\frac{1}{sC_1} \parallel r_{o1} \parallel \frac{1}{g_{m3} + g_{mb3}}\right) = \left[i_o + (g_{m3} + g_{mb3})V_{D1}\right]Z_{D1} \tag{4.29} \]
Dividing both sides by io, and noting that 1/ZD1 = sC1 + 1/ro1 + (gm3 + gmb3), we
obtain:
\[ \frac{V_{D1}}{i_o} = \frac{1}{Z_{D1}^{-1} - (g_{m3} + g_{mb3})} = \frac{r_{o1}}{1 + s\,r_{o1}C_1} = r_{o1} \parallel \frac{1}{sC_1} \tag{4.30} \]
Referring again to Fig. 4.16 and considering the voltage across and the current through ro3, it
holds that:
\[ \left[i_o + (g_{m3} + g_{mb3})V_{D1}\right]r_{o3} = v_o - V_{D1} \tag{4.31} \]
Fig. 4.15: Complete VCO Core small signal model.

Fig. 4.16: Small signal model for cascode structure.
Dividing both sides by io and rearranging, Z′inj = vo/io is:
\[ \frac{v_o}{i_o} = r_{o3} + \left[1 + (g_{m3} + g_{mb3})\,r_{o3}\right]\left[r_{o1} \parallel \frac{1}{sC_1}\right] = r_{o3} + A_c\left[r_{o1} \parallel \frac{1}{sC_1}\right] \tag{4.32} \]
Here, Ac = 1 + (gm3 + gmb3) ro3 is the gain introduced by the cascode structure. Z′inj
can be considered as a resistor ro3 in series with a parallel network consisting of a
resistor Acro1 and a capacitor C1/Ac. As can be seen, the resistances applied to the Core
are increased by a factor of Ac, while the capacitances are reduced by a factor of Ac.
This minimizes the impact of the injection differential pair on the VCO Core and also
prevents the large oscillation output signal of the Core from feeding back through the
injection path. Because M3 and M4 are used for isolation, their length can be made larger in
order to increase their output resistance ro3,4. Finally, the total output impedance of the
cascode circuit, Zinj, equals:
\[ Z_{inj} = Z'_{inj} \parallel \left(\frac{1}{sC_3}\right) \tag{4.33} \]
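The scaling of ro1 and C1 by the cascode gain can be verified with complex arithmetic. The sketch below evaluates (4.32) and (4.33) at one frequency; the device values (gm3 = 5 mS, gmb3 = 1 mS, ro1 = 10 kΩ, ro3 = 5 kΩ, C1 = 50 fF, C3 = 20 fF) and the function names are assumptions chosen only for illustration.

```python
import math

def par(*impedances):
    """Parallel combination of any number of (possibly complex) impedances."""
    return 1 / sum(1 / z for z in impedances)

def cascode_z_inj(g_m3, g_mb3, r_o1, r_o3, c_1, c_3, f):
    """Return (Z_inj, Z'_inj, A_c) of the cascode at frequency f."""
    s = 1j * 2 * math.pi * f
    a_c = 1 + (g_m3 + g_mb3) * r_o3                      # cascode gain
    z_prime = r_o3 + a_c * par(r_o1, 1 / (s * c_1))      # (4.32)
    z_inj = par(z_prime, 1 / (s * c_3))                  # (4.33)
    return z_inj, z_prime, a_c

# Hypothetical device values (assumed for illustration only).
z_inj, z_prime, a_c = cascode_z_inj(5e-3, 1e-3, 10e3, 5e3, 50e-15, 20e-15, 1.6e9)
print(f"Ac = {a_c:.1f}, |Z'inj| = {abs(z_prime):.0f} ohm, |Zinj| = {abs(z_inj):.0f} ohm")
```

With these values, |Zinj| is set almost entirely by C3, which is consistent with the observation that the capacitances of M3 and M4 cannot be ignored while those of M1 and M2 are suppressed.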
To sum up, the complete small-signal model of the whole IL-VCO is illustrated in
Fig. 4.17. Since M3 and M4 are biased in the triode region, ro3 and ro4 are small and can
be ignored for simplicity. CT and gT are the total parasitic capacitance and total positive
conductance of the IL-VCO, given by:
\[ C_T = C_{core} + 2C_{3,4} + \frac{2}{A_c}C_{1,2}, \qquad g_T = g_{core} + \frac{2}{A_c}g_{ds1,2} \tag{4.34} \]
where C1,2 = Cgd1,2 + Cbd1,2 + Cbs1,2 + Cgs3,4 and C3,4 = Cbd3,4 + Cgd3,4.

Fig. 4.17: Complete small signal model of IL-VCO.
Oscillation Start-up Criteria
For oscillation build-up, the absolute value of the positive resistance 1/gT should be larger
than that of the negative resistance 1/gm [83], i.e.:
\[ g_m > g_T \tag{4.35} \]
In order to ensure oscillation in the presence of temperature and process variations,
gm is typically selected larger than the theoretical criterion in (4.35) [83]. A safety
margin can be set by defining a coefficient kosc that turns the inequality into an equality [107]:
\[ g_m = k_{osc}\,g_T \tag{4.36} \]
where kosc is generally in the range of 1.5-3.
The MOSFET intrinsic gain Ai, as defined in (3.7) in Chapter 3, is rewritten below:
\[ A_i = g_m r_o = \frac{g_m}{g_{ds}} = \frac{g_m/I_D}{g_{ds}/I_D} \tag{4.37} \]
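Suppose gmb3,4 = ηgm3,4. Then gT can be transformed into:
\[ \begin{aligned} g_T &= g_{core} + \frac{2}{A_c}\,g_{ds1,2} \\ &= \frac{1}{2}g_{ds5,6} + \frac{1}{2}g_{ds7,8} + g_{ind} + \frac{2}{1 + (g_{m3,4} + g_{mb3,4})\,r_{o3,4}}\,g_{ds1,2} \\ &= \frac{1}{2}g_{ds5,6} + \frac{1}{2}g_{ds7,8} + g_{ind} + \frac{2}{1 + (1 + \eta)\,g_{m3,4}/g_{ds3,4}}\,g_{ds1,2} \end{aligned} \tag{4.38} \]
Expressing each gds through the intrinsic gain of (4.37), with gm5,6 = gm7,8 = gm, gives:
\[ g_T = g_{ind} + \frac{g_m}{2}\left(\frac{1}{A_{i5,6}} + \frac{1}{A_{i7,8}}\right) + \frac{2g_{m1,2}}{A_{i1,2}\left[1 + (1 + \eta)A_{i3,4}\right]} \tag{4.39} \]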
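Substituting (4.36) into (4.39), we get:
\[ g_{ind} = \beta_1 g_m + \beta_2 g_{m1,2} \tag{4.40} \]
where
\[ \beta_1 = \frac{1}{k_{osc}} - \frac{1}{2A_{i5,6}} - \frac{1}{2A_{i7,8}} \tag{4.41} \]
\[ \beta_2 = \frac{-2}{A_{i1,2}\left[1 + (1 + \eta)A_{i3,4}\right]} \tag{4.42} \]
Hence we have:
\[ g_m = \frac{1}{\beta_1}\left(g_{ind} - \beta_2 g_{m1,2}\right) \tag{4.43} \]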
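Equations (4.41) to (4.43) give the core transconductance needed for a chosen start-up margin. The sketch below evaluates them for one assumed set of intrinsic gains and checks the result against gm = kosc·gT; all numerical values and the function name are hypothetical.

```python
def core_gm(g_ind, g_m12, k_osc, a_i56, a_i78, a_i12, a_i34, eta):
    """Return (g_m, beta1, beta2) for the VCO Core from (4.41)-(4.43)."""
    beta1 = 1 / k_osc - 1 / (2 * a_i56) - 1 / (2 * a_i78)   # (4.41)
    beta2 = -2 / (a_i12 * (1 + (1 + eta) * a_i34))          # (4.42)
    if beta1 <= 0:
        raise ValueError("intrinsic gains too low for the requested k_osc")
    g_m = (g_ind - beta2 * g_m12) / beta1                   # (4.43)
    return g_m, beta1, beta2

# Hypothetical values: Rind = 500 ohm, gm1,2 = 5 mS, kosc = 2.
g_m, beta1, beta2 = core_gm(g_ind=1 / 500.0, g_m12=5e-3, k_osc=2.0,
                            a_i56=10.0, a_i78=10.0, a_i12=10.0, a_i34=15.0, eta=0.2)
print(f"gm = {g_m * 1e3:.3f} mS, beta1 = {beta1:.3f}, beta2 = {beta2:.5f}")
```

The guard on β1 reflects that (4.43) only has a positive solution when the transistor losses alone do not already consume the margin 1/kosc.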
The free-running oscillation frequency of the IL-VCO is:
\[ f_{vco} = \frac{1}{2\pi\sqrt{L_{ind}C_T}} \tag{4.44} \]
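In the design flow, (4.44) is used in reverse: the target frequency and the chosen inductor fix CT, and the remaining budget goes to the capacitor bank through (4.34) and (4.27). The sketch below illustrates this bookkeeping with assumed parasitic values; the function names and all numbers are hypothetical.

```python
import math

def total_tank_capacitance(f0, l_ind):
    """Solve (4.44) for C_T at the target free-running frequency f0."""
    return 1 / ((2 * math.pi * f0) ** 2 * l_ind)

def bank_capacitance(c_total, c_nmos, c_pmos, c_34, c_12, a_c):
    """Capacitance left for C_bank, from (4.34) and then (4.27)."""
    c_core = c_total - 2 * c_34 - 2 * c_12 / a_c     # (4.34) solved for C_core
    return c_core - c_nmos - c_pmos                  # (4.27) solved for C_bank

# Hypothetical values: 5 nH tank inductor and femtofarad-scale parasitics.
c_t = total_tank_capacitance(f0=1.6e9, l_ind=5e-9)
c_bank = bank_capacitance(c_t, c_nmos=60e-15, c_pmos=120e-15,
                          c_34=15e-15, c_12=80e-15, a_c=31.0)
print(f"CT = {c_t * 1e12:.3f} pF, Cbank = {c_bank * 1e12:.3f} pF")
```

A negative result for Cbank would indicate that the transistor parasitics alone already exceed the capacitance allowed by the chosen inductor.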
The locking range of a sine-wave IL-VCO is described in [76,77]:
\[ \omega_L = \frac{\omega_{out}}{2Q}\cdot\frac{I_{inj}}{I_{osc}}\cdot\frac{1}{\sqrt{1 - \dfrac{I_{inj}^2}{I_{osc}^2}}} \tag{4.45} \]
where Q represents the quality factor of the tank. If the injection occurs once
every N cycles, the locking range needs to be modified as [103]:
\[ \omega_L = \frac{\omega_{out}}{2Q}\cdot\frac{I_{inj}}{I_{osc}}\cdot\frac{1}{N}\cdot\frac{1}{\sqrt{1 - \dfrac{I_{inj}^2}{N^2 I_{osc}^2}}} \simeq \frac{\omega_{out}}{2Q}\cdot\frac{I_{inj}}{I_{osc}}\cdot\frac{1}{N} \tag{4.46} \]
where the total quality factor Q of the tank is given by:
\[ Q = \frac{1}{2\pi f_0 \cdot g_T \cdot L_{ind}} \tag{4.47} \]
In our case, f0 = 1.6 GHz and the target data rate is DR = 100 Mbps. Therefore, N
can be calculated by:
\[ N = \frac{\frac{1}{2\cdot DR}}{1/f_0} = \frac{f_0}{2\cdot DR} = 8 \tag{4.48} \]
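The chain from (4.48) through (4.47) to the approximate form of (4.46) can be evaluated directly. The sketch below uses the f0 and DR values of the text, while gT, Lind and the current ratio are assumed values chosen only for illustration; the function names are hypothetical.

```python
import math

def cycles_per_injection(f0, data_rate):
    """N from (4.48): carrier cycles per injection event."""
    return f0 / (2 * data_rate)

def tank_q(f0, g_t, l_ind):
    """Total tank quality factor from (4.47)."""
    return 1 / (2 * math.pi * f0 * g_t * l_ind)

def locking_range_approx(w_out, q, i_inj, i_osc, n):
    """Approximate locking range, right-hand side of (4.46), in rad/s."""
    return w_out / (2 * q) * (i_inj / i_osc) / n

f0 = 1.6e9
n = cycles_per_injection(f0, 100e6)
q = tank_q(f0, g_t=2.5e-3, l_ind=5e-9)        # hypothetical gT and Lind
w_l = locking_range_approx(2 * math.pi * f0, q, i_inj=2.0, i_osc=1.0, n=n)
print(f"N = {n:.0f}, Q = {q:.2f}, locking range = +/-{w_l / (2 * math.pi) / 1e6:.1f} MHz")
```

Only the approximate form is coded here, because the square-root term in (4.46) is real only when Iinj is smaller than N·Iosc.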
A short discussion is worthwhile to put the results of the above analysis into
perspective. The VCO Core is responsible for oscillating in the quiescent state. The
Core is a lossy RLC tank, and the cross-coupled NMOS/PMOS pairs supply the energy
to start and maintain oscillation in the form of, intuitively, a negative resistance −1/gm.
In the above analysis, all the capacitances and the inductance are modeled to estimate the
oscillation frequency, and the conductances, or resistances, are modeled for the lossy
parts, which are the actual energy consumers. The gm value should be large enough to
cover the loss introduced by the resistances. The injection part should be considered
simultaneously when designing the VCO Core because the parasitic capacitances
of M3 and M4 are not negligible. However, the impact of M1 and M2 is well suppressed and can
be neglected because of the gain Ac introduced by the cascode structure.
4.2.3 gm/ID Design Verification
With the analytical expressions derived above, the gm/ID design flow can be constructed.
First, set the minimum transistor length Lmin = 70 nm for M1, M2, M5, M6,
M7 and M8, set the transistor length to 250 nm for M3 and M4, and define the working
frequency f0 = 1.6 GHz. Second, pick a high-Q inductor Lind from the database and
find its parallel equivalent resistance Rind, which gives gind = 1/Rind. Finally, suppose
Iosc = 2ID and let Iinj be a multiple of Iosc, with the ratio Iinj/Iosc > 1. The design
algorithm is organized in Algorithm 2 below.
Algorithm 2 gm/ID Methodology For Injection-locked VCO Design
Input: Transistor length; high-Q Lind; kosc = 2;
Output: Tunable oscillation centered at f0 = 1.6 GHz; Power consumption;
1: for each ID = (1 : 1 : 10)× 50 µA do;
2: for each (gm/ID) = 3 : 1 : 24 do
3: Find normalized current of i5,6 and i7,8;
4: Find (gds/ID)5,6 and (gds/ID)7,8;
5: Compute intrinsic gain Ai5,6 and Ai7,8 from (4.37);
6: Find normalized capacitance Cij5,6 and Cij7,8;
7: for each  = 1.5 : 0.5 : 5 do
8: for each (gm/ID) = 3 : 1 : 24 do
9: Find normalized current of i1,2 and i3,4;
10: Compute gm1,2 and gm3,4;
11: Find gmb3,4 and compute η = gmb3,4/gm3,4;
12: Find (gds/ID)3,4;
13: Compute Ai3,4 from (4.37);
14: Find normalized capacitance Cij1,2 and Cij3,4;
15: Compute β2 from (4.42);
16: Compute gm from (4.43);
17: Compute transistor width W5,6 from i5,6 and ID;
18: Compute transistor width W7,8 from i7,8 and ID;
19: Compute transistor width W1,2 from i1,2 and ID;
20: Compute transistor width W3,4 from i3,4 and ID;
21: Compute Cij1,2, Cij3,4, Cij5,6 and Cij7,8;
22: Compute Ac from (4.32);
23: Compute CT from (4.44), Ccore from (4.34);
24: Compute Cnmos from (4.25), Cpmos from (4.26);
25: Compute Cbank from (4.27).
26: Compute Power=VDD × (Ibias + 2ID + 2ID);
27: end for
28: end for
29: end for
30: end for
The Matlab calculations and the SpectreRF simulations yield similar design parameters
with small differences, as shown in Table 4.3. The simulated average power is 1.21 mW.
Table 4.3: MATLAB CALCULATIONS AND SPECTRERF SIMULATIONS COMPARISON FOR THE IL-VCO.

Transistors | gm/ID (V^-1) Calc. | gm/ID (V^-1) Sim. | Width (µm) Calc. | Width (µm) Sim.
M1, M2      | 15                 | 16.4              | 49.1             | 48
M3, M4      | 10                 | 9.1               | 14.8             | 16
M5, M6      | 20                 | 21.8              | 17.8             | 20
M7, M8      | 20                 | 18.6              | 40               | 40

Quantity           | Calc. | Sim.
Iosc (µA)          | 100   | 92
Iinj (µA)          | 1000  | 915
Average Power (mW) | 1.32  | 1.21
Without an injected signal, the IL-VCO runs at its free-running frequency, which can be
coarsely tuned by the 4-bit capacitor bank (described in the next section). As the control
bits b3b2b1b0 change from 0000 to 1111, the IL-VCO runs freely in 16 sub-bands, as shown
in Fig. 4.18.
When a sinusoidal signal is injected, the IL-VCO automatically locks to the frequency of
that signal. If the injected signal is a carrier-based pulse train, the IL-VCO alternates
between the locked and unlocked states, as shown in Fig. 4.19, leading to higher phase
noise, i.e., higher clock jitter. Fig. 4.20 shows that the locking range of the IL-VCO is
1.52-1.66 GHz. The average power of the IL-VCO is 1.61 mW.
4.2.4 Digital Controlled Capacitor Bank
When designing a VCO with a wide tuning range, it is better to divide the full tuning
range into multiple overlapping bands. In this case, binary control signals select the
band of interest (coarse tuning), and an analog signal fine-tunes the free-running
oscillation frequency within the selected band.
Fig. 4.18: IL-VCO free-running status: transient signal and coarse tunable frequencies from
b3b2b1b0 = 0000 to b3b2b1b0 = 1111. [Plots: frequency (GHz) and amplitude (V) versus time (ns).]
Fig. 4.19: IL-VCO running status with injected signal. [Plots: frequency (GHz) and
amplitude (V) versus time (ns).]
Fig. 4.20: IL-VCO locking range 1.52-1.66 GHz. [Plot: frequency (GHz) versus time (ns),
showing the free-running and locked intervals.]
To implement this scheme, a 4-bit bank of binary-weighted switched capacitors (Cd) and a
small differential accumulation-mode varactor (Cv) are employed to realize the Cbank
computed previously, as shown in Fig. 4.21. The capacitors nCd (n = 1, 2, 4, 8) are
connected to each output node and are switched to ground by the switches Mpd. Each switch
adds loss to the tank because of its finite on-resistance. Thus, minimum-length NMOS
devices are used and made as wide as can be tolerated with regard to the resulting parasitic
drain-to-bulk capacitance, which ultimately limits the achievable tuning range [108].
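The coarse-tuning idea can be illustrated with a short numerical sketch. It assumes an ideal LC tank with f = 1/(2π√(LC)), ideal switches, and that each increment of the control code adds one unit capacitor to the tank. The function name and all component values below are hypothetical and are not taken from the design.

```python
import math

def bank_frequencies(l_tank, c_fixed, c_unit, bits=4):
    """Free-running frequency for every control code of a binary-weighted bank.

    Assumption: code n adds n * c_unit to the fixed tank capacitance, and the
    tank is ideal (no switch resistance or parasitics).
    """
    freqs = []
    for code in range(2 ** bits):
        c_total = c_fixed + code * c_unit
        freqs.append(1.0 / (2.0 * math.pi * math.sqrt(l_tank * c_total)))
    return freqs

# Illustrative values only (hypothetical, not the dissertation's sizing).
freqs = bank_frequencies(l_tank=5e-9, c_fixed=1.6e-12, c_unit=50e-15)
for code, f in enumerate(freqs):
    print(f"b3b2b1b0 = {code:04b}: {f / 1e9:.3f} GHz")
```

In a real bank the on-resistance and drain capacitance of the switches shift these values, which is the tradeoff described above.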
4.3 Polyphase Filter
In order to control the clock phase, quadrature signals are required. One way to
generate in-phase and quadrature signals is to use a VCO operating at twice the frequency
of interest followed by a divide-by-2 circuit, implemented either in the digital or in the
analog domain [109]. Fig. 4.22 shows a basic implementation with two master-slave
flip-flops. When the two divide-by-2 flip-flops are triggered by opposite phases of a clock
with a 50 % duty cycle, their outputs are in quadrature but at half the clock frequency.
Important drawbacks of this scheme are the significant increase in power consumption caused
by the flip-flops running at high frequency, and the need for an accurate 50 % duty cycle
at the oscillator output.
Another way to obtain quadrature signals is to use a VCO that delivers such signals
directly. A ring oscillator fulfills this requirement [110]. However, apart from the
notoriously high phase noise of the ring oscillator, an additional oscillator is not
acceptable because of the tight power budget. A third way to generate quadrature signals
is to couple two symmetric LC-VCOs to each other, as exemplified in Fig. 4.23: the
combination of a direct connection and a cross (inverting) connection forces the two VCOs
to oscillate in quadrature [111]. As mentioned, any additional oscillator adds a
significant burden to the power budget.
An attractive method to generate quadrature signals is to use an RC-CR network, or
polyphase filter, as presented in Fig. 4.24 [112–114]. The balanced quadrature signals are
generated directly from the differential signal (Ri = R, Ci = C, i = 1, 2, 3, 4).
Fig. 4.21: Capacitor bank. [Schematic: capacitors Cd, 2Cd, 4Cd and 8Cd switched by Mpd
devices of size Ws/Ls, 2Ws/Ls, 4Ws/Ls and 8Ws/Ls under control bits b0 to b3; varactors Cv
tuned by Vtune.]
Fig. 4.22: Quadrature generation by divide-by-2 flip-flops. [Schematic: VCO at fin = 2fo
clocking two D flip-flops; outputs I+, I-, Q+, Q- at fout = fo.]
Fig. 4.23: Basic diagram for quadrature VCO.
A one-stage polyphase filter exhibits a quadrature phase shift only around the pole
frequency, fp = 1/(2πRC). To achieve a broadband response, several stages must be cascaded.
The cost of the multi-stage polyphase filter is its lossy characteristic. When one stage
loads the output of the previous one without buffering, the resulting voltage division
lowers the gain by 3 dB, i.e., a 3 dB loss per stage [113]. Therefore, one or more
amplifiers are required for a multi-stage polyphase filter [113].
Phase inaccuracy results in larger phase-sweeping steps for the phase shifter over a given
phase range, which leads to worse phase alignment between the clock and the data and
degrades the BER performance. However, in our case, very accurate quadrature phases are not
necessary. As long as the phase shifter can tolerate the 3 dB loss of the I/Q signals, the
polyphase filter can be connected directly to the phase shifter (AC coupled) without
additional amplifiers.
In one stage, the phase error between the I/Q signals is given by [112]:
∆ϕ = −2 arctan (ω0RC) (4.49)
Fig. 4.24: RC-CR polyphase filter. [Schematic: differential inputs Vi at 0° and 180°;
resistors R1 to R4 and capacitors C1 to C4; outputs I+ = Vo at 0°, I- = Vo at 180°,
Q+ = Vo at 90°, Q- = Vo at 270°.]
For a frequency variation of 1.6 GHz ± 150 MHz, the phase error is about 5°, which is
acceptable.
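The quoted figure can be checked numerically. The sketch below assumes that the phase error is the deviation of 2·arctan(ωRC) from the ideal 90°, and that the pole frequency is placed at 1.6 GHz. Both are interpretations made for this example, chosen because they reproduce the quoted value of about 5°. The function name is hypothetical.

```python
import math

def quadrature_error_deg(f, f_pole):
    """Deviation from 90 degrees, assuming error = pi/2 - 2*atan(f / f_pole).

    f / f_pole equals omega*R*C when the pole frequency is 1/(2*pi*R*C).
    """
    return math.degrees(math.pi / 2 - 2 * math.atan(f / f_pole))

f_pole = 1.6e9  # assumed pole frequency (Hz)
errors = {f: quadrature_error_deg(f, f_pole) for f in (1.45e9, 1.6e9, 1.75e9)}
for f, err in errors.items():
    print(f"{f / 1e9:.2f} GHz: {err:+.2f} deg")
```

Under these assumptions the error vanishes at the pole frequency and stays between 5° and 6° in magnitude at both band edges.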
A major challenge is that the polyphase filter is sensitive to RC mismatch. Assuming
a relative resistor mismatch of α and a relative capacitor mismatch of β, the phase shift
∆ϕ can be expressed as:
$$\Delta\varphi = \frac{\pi}{2} - \left[\arctan\left[\omega_0 RC\,(1+\alpha)(1+\beta)\right] - \arctan\left(\omega_0 RC\right)\right] \simeq \frac{\pi}{2} - \frac{\alpha+\beta}{2} \tag{4.50}$$
Therefore, for resistor and capacitor matching of 1 %, the phase error would be
0.6°. If the mismatch is 10 %, the phase error would be 5°.
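These two numbers follow from (4.50). The sketch below compares the exact arctangent difference with the first-order approximation (α + β)/2, assuming operation at the pole frequency (ω0RC = 1) and equal resistor and capacitor mismatch. The function name is hypothetical.

```python
import math

def mismatch_phase_error_deg(alpha, beta, w0rc=1.0):
    """Return (exact, approximate) phase error in degrees for RC mismatch.

    exact:  atan(w0rc*(1+alpha)*(1+beta)) - atan(w0rc), the bracket in (4.50)
    approx: (alpha + beta) / 2, valid for small mismatch at w0rc = 1
    """
    exact = math.atan(w0rc * (1 + alpha) * (1 + beta)) - math.atan(w0rc)
    approx = (alpha + beta) / 2
    return math.degrees(exact), math.degrees(approx)

for mismatch in (0.01, 0.10):
    exact, approx = mismatch_phase_error_deg(mismatch, mismatch)
    print(f"{mismatch:.0%} mismatch: exact {exact:.2f} deg, approx {approx:.2f} deg")
```

The approximation is tight at 1 % and slightly pessimistic at 10 %, where the second-order term starts to matter.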
Careful layout is very important for the polyphase filter. Although common-centroid
structures can alleviate mismatch to a certain extent, special attention is still necessary
for this process-sensitive block. Large components result in better matching for both the
resistors and the capacitors. Resistors achieve better matching, so the dominant source of
performance degradation is the capacitor matching [115]. Also, larger resistor values
result in a larger input impedance and hence less loading of the IL-VCO resonant tank.
Therefore, the resistor values are set to the largest possible values, and the capacitors
are sized to meet the frequency requirement.
A layout sketch of the polyphase filter is shown in Fig. 4.25. Each resistor Ri (i =
1, 2, 3, 4) is split into four identical segments: Ria, Rib, Ric and Rid. The unlabeled
resistors are dummies.
4.4 Phase Shifter
The phase shifter uses a current-steering digital-to-analog converter (DAC) that supplies
the tail currents of two differential pairs sharing the same resistive load [116].
A simplified schematic of the phase shifter is depicted in Fig. 4.26. I1 and I2 are
complementary currents, i.e., their sum is constant: I1 + I2 = I. All four transistors
have the same size and operate in the saturation region. Suppose I1 = ηI, where η is a
ratio between 0 and 1, so that I2 = (1 − η)I. Further suppose that the signal amplitude is
A, the signal frequency is ω, and the phase of I+ is α. Then:
$$\begin{cases} I_+ = A\sin\left(\omega t + \alpha + 0\right) \\ I_- = A\sin\left(\omega t + \alpha + \pi\right) \\ Q_+ = A\sin\left(\omega t + \alpha + \frac{\pi}{2}\right) \\ Q_- = A\sin\left(\omega t + \alpha + \frac{3\pi}{2}\right) \end{cases} \tag{4.51}$$
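According to the first-order square law, the transconductances of the four transistors
are:
$$\begin{cases} g_{m1,2} = \sqrt{2\mu_n C_{ox}\frac{W}{L}\left(\frac{1}{2}\eta I\right)} = K\sqrt{\eta I} \\ g_{m3,4} = \sqrt{2\mu_n C_{ox}\frac{W}{L}\left(\frac{1}{2}(1-\eta) I\right)} = K\sqrt{(1-\eta) I} \end{cases} \tag{4.52}$$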
Fig. 4.25: Layout sketch for the polyphase filter.
Fig. 4.26: Phase shifter working principle.
The output voltage Vo1 can be obtained as:
\begin{align}
V_{o1} &= R\left[g_{m1,2}\cdot(I_+) + g_{m3,4}\cdot(Q_+)\right] \notag \\
&= R\left[K\sqrt{\eta I}\cdot A\sin\left(\omega t+\alpha\right) + K\sqrt{(1-\eta) I}\cdot A\sin\left(\omega t+\alpha+\frac{\pi}{2}\right)\right] \notag \\
&= RAK\sqrt{I}\left[\sqrt{\eta}\,\sin\left(\omega t+\alpha\right) + \sqrt{1-\eta}\,\cos\left(\omega t+\alpha\right)\right] \notag \\
&= RAK\sqrt{I}\,\sin\left(\omega t+\alpha+\Delta\varphi\right) \tag{4.53}
\end{align}
where
$$\Delta\varphi = \arctan\sqrt{\frac{1-\eta}{\eta}} \tag{4.54}$$
Similarly, Vo2 can also be expressed as:
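\begin{align}
V_{o2} &= R\left[g_{m1,2}\cdot(I_-) + g_{m3,4}\cdot(Q_-)\right] \notag \\
&= R\left[K\sqrt{\eta I}\cdot A\sin\left(\omega t+\alpha+\pi\right) + K\sqrt{(1-\eta) I}\cdot A\sin\left(\omega t+\alpha+\frac{3\pi}{2}\right)\right] \notag \\
&= RAK\sqrt{I}\left[-\sqrt{\eta}\,\sin\left(\omega t+\alpha\right) - \sqrt{1-\eta}\,\cos\left(\omega t+\alpha\right)\right] \notag \\
&= -RAK\sqrt{I}\,\sin\left(\omega t+\alpha+\Delta\varphi\right) \tag{4.55} \\
&= -V_{o1} \tag{4.56}
\end{align}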
As can be seen, ∆ϕ ∈ [0, π/2] if η ∈ [0, 1], i.e., choosing a different current weight η
yields an output phase between those of I+ and Q+.
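The identity behind (4.53) and (4.54) can be verified numerically. The sketch below checks that √η·sin(x) + √(1−η)·cos(x) equals sin(x + ∆ϕ) and prints the phase for a few DAC codes. The mapping η = code/127 for the 7-bit bus is an assumption made only for this illustration, and the function names are hypothetical.

```python
import math

def phase_shift(eta):
    """Phase shift from (4.54); atan2 avoids a division by zero at eta = 0."""
    return math.atan2(math.sqrt(1 - eta), math.sqrt(eta))

def weighted_sum(eta, x):
    """Bracketed term of (4.53) with x = omega*t + alpha."""
    return math.sqrt(eta) * math.sin(x) + math.sqrt(1 - eta) * math.cos(x)

# Check the identity at an arbitrary angle for several current weights.
x = 0.7
for eta in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert abs(weighted_sum(eta, x) - math.sin(x + phase_shift(eta))) < 1e-12

# Hypothetical linear mapping from a 7-bit code to the current weight.
for code in (0, 32, 64, 96, 127):
    eta = code / 127
    print(f"code {code:3d}: eta = {eta:.3f}, shift = {math.degrees(phase_shift(eta)):.2f} deg")
```

Because the transconductance follows the square root of the current, a linear sweep of η does not produce uniform phase steps, and the amplitude of the sum stays constant across the sweep.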
The complete phase shifter circuit is shown in Fig. 4.27. I and Q are the input I/Q
signals. MI+, MI−, MQ+ and MQ− are switches controlled by the digital signals Isel and
Qsel, which select the quadrant so that the phase can be shifted from 0 to 360°.
The two identical binary-weighted DACs are controlled by the 7-bit bus A[6 : 0].
The phase shift in one quadrant (Isel=0, Qsel=0) is presented in Fig. 4.28; the average
power is 1.52 mW.
4.5 Frequency Divider
4.5.1 Brief Introduction
Frequency dividers are widely used in high-speed communication circuitry and have
been the subject of extensive study [76]. In the digital domain, a binary counter clocked
by the input signal can be used for power-of-2 integer division [117]. An arrangement of
flip-flops is a classic method for integer-n division. A shift-register network clocked
by the input signal is another popular structure, in which the complemented output of the
last register is fed back to the input of the first register. The digital realization of
the frequency divider is convenient to design and robust to process variation, but it
suffers from comparatively high power consumption and has difficulty handling RF signals.
Fig. 4.27: Complete phase shifter circuit.
Fig. 4.28: Phase shift in one quadrant: SI = 0, SQ = 0 (2^7 = 128 steps in total).
[Plot: amplitude (V) versus time (ns); the steps span T/4 = 156.25 ps.]
Therefore, analog frequency dividers are widely used in the RF domain. A conventional
architecture is the regenerative frequency divider, also known as the Miller frequency
divider [5, 118, 119]. As shown by the basic structure in Fig. 4.29 [5], the Miller
frequency divider contains a mixer that produces the frequency components fin + fout and
fin − fout. If fout = fin/2, divide-by-2 frequency division is realized after the low-pass
filter (LPF) removes the high-frequency component. However, the mixer and the LPF are
power hungry. One such design for an OFDM UWB system in [5] consumes 47 mW in a 0.18-µm
CMOS process.
Fig. 4.29: Regenerative (Miller) divider [5]. [Block diagram: the mixer combines fin and
fout into fin + fout and fin − fout; the LPF output is fout = fin/2.]
Another popular structure is based on divide-by-2 current-mode logic (CML) D flip-flops
(DFFs), as presented in Fig. 4.30 [106, 120]. For each CML stage, the Q terminals are
connected back to the D terminals with reversed polarity, resulting in a divide-by-2
operation. Four stages are cascaded to implement a division ratio of 16. If the input
frequency is 1.6 GHz, as in our case, the first CML stage toggles at 800 MHz, and the
second, third and fourth stages toggle at 400 MHz, 200 MHz and 100 MHz, respectively. Thus,
the maximum input frequency of the entire frequency divider is limited by the maximum
toggling frequency of the first CML stage in the chain. High-speed CML dividers are usually
power hungry because the first stage has to handle very high frequencies [106, 120].
As discussed in Section 4.2 (IL-VCO), the super-harmonic injection-locking phenomenon
can be used for frequency division [6, 97–99, 103, 106]. This is a popular approach since
it consumes very little power. The injection-locked frequency divider (ILFD) can be based
on either an LC-tank oscillator (LC-type) [6, 97, 103] or a ring oscillator (ring-type)
[98, 99, 106]. However, the LC-type divider, shown in Fig. 4.31 [6], suffers from a large
die area because of the passive inductor. The ring-type divider, which includes divide-by-2
circuits based on DFFs with negative feedback, is robust to process variation and has a
wide frequency locking range, leading to its dominant position in industry designs
[120, 121].
Fig. 4.30: Conventional divide-by-16 CML frequency divider. [Schematic: four cascaded
stages CML1 to CML4; 1.6 GHz input and stage outputs at 800 MHz, 400 MHz, 200 MHz and
100 MHz.]
Fig. 4.31: Conventional LC-Type frequency divider [6].
A fully differential, divide-by-8 ILFD based on a ring oscillator, proposed by Cheng et
al. [106], gives a tuned locking range of 4-18 GHz with a power of 3.6 mW in 0.18-µm CMOS
technology. This section presents a modified divide-by-16 ILFD based on Cheng’s design and
derives the corresponding equations and steps for optimizing the divider with the gm/ID
methodology.
4.5.2 Signal Modeling
A modified divide-by-16 frequency divider, based on Cheng’s design, is used in this
dissertation and is shown in Fig. 4.32. It consists of an eight-stage ring of latches, each
of which is a CML D-latch. The output of the last latch (CML8) is connected to the input of
the first latch (CML1) with inverted polarity to obtain an extra phase shift of 180°. The
clock terminals of the eight CMLs are tied together and used to inject the differential
input signal, which is buffered after the phase shifter. The output signal can be taken from
any of the eight Q+/Q− terminals.
Fig. 4.32: Modified divide-by-16 frequency divider.
The modified resettable CML is shown in Fig. 4.33. It is a CML latch with PMOS devices
M9 and M10 operating in the triode region. When RST is low (no reset), the CML works
as a DFF. That is, when (CK+ = 1, CK− = 0), M7 is ON and M8 is OFF, and hence
the CML behaves as a sensing differential amplifier formed by M1, M2, M9 and M10.
When (CK+ = 0, CK− = 1), M7 is OFF and M8 is ON, and hence the CML behaves as
a differential pair with positive feedback, M3 and M4. The positive feedback latches the
output signals (Q+/Q−) at their current levels.
When the reset signal RST is high, M5 pulls Q+ to the inverse of CK+
(Q+ = 0 if CK+ = 1, Q+ = 1 if CK+ = 0). Therefore, the latch stores a static state
according to the input clock signal. Only one of the CMLs (CML1) needs a reset input, which
is controlled off-chip; the reset inputs of CML2 to CML8 are permanently disabled
(RST = ground and its complement = VDD).
The speed of this CML is determined by the bias current of M11 together with the load
resistance and capacitance seen at the output nodes, Q+ and Q−. A higher current charges
the load more quickly and hence allows operation at a higher frequency.
Fig. 4.34 illustrates the timing diagram of the input CK and the outputs Q1 to Q8. The
transition edges of Q1 to Q8 are delayed by tp from the rising edge of CK, where tp is the
propagation time the CML needs to toggle its state after CK goes high.
Fig. 4.33: Modified resettable CML circuit.
Fig. 4.34: Timing diagram of the divide-by-16 frequency divider. [Waveforms: CK and Q1 to
Q8 over t1 to t17, each output edge delayed by tp.]
Suppose that CK stays at a high voltage level all the time. The divider then works as a
ring oscillator with the free-running frequency
$$f_0 = \frac{1}{2Nt_p} \tag{4.57}$$
where N is the number of stages. As shown in [122], the free-running frequency of an
N-stage ring oscillator is given by
$$f_0 \simeq \frac{1}{2NR_LC_L\ln 2} \tag{4.58}$$
where RL and CL are, respectively, the equivalent resistance and capacitance at the
output of each CML cell. Hence we have
$$t_p = R_LC_L\ln 2 \tag{4.59}$$
Further suppose that the input clock (CK) frequency is fck, so that the period is
Tck = 1/fck. The first half period of CK must leave enough time (at least tp) for Q1 to
toggle its state, as shown in Fig. 4.35a. This gives:
$$\frac{T_{ck}}{2} > t_p \tag{4.60}$$
Fig. 4.35: Zoomed-in timing diagrams: (a) minimum requirement for Tck; (b) maximum
requirement for Tck.
On the other hand, Tck cannot be too long; otherwise Q2 would have time to toggle its
state. In other words, Q1 must not propagate to the next stage within half of the clock
period, as presented in Fig. 4.35b. That is:
$$\frac{T_{ck}}{2} < 2t_p \tag{4.61}$$
From (4.60) and (4.61), tp must satisfy:
$$\frac{1}{4f_{ck}} < t_p < \frac{1}{2f_{ck}} \tag{4.62}$$
Substituting (4.59) into (4.62) gives:
$$\frac{1}{(4\ln 2)\,f_{ck}} < R_LC_L < \frac{1}{(2\ln 2)\,f_{ck}} \tag{4.63}$$
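As a worked example of (4.62) and (4.63), the sketch below evaluates the allowed windows for tp and for the RLCL time constant at the 1.6 GHz input frequency used in this design. The function name is hypothetical.

```python
import math

def ilfd_timing_window(f_ck):
    """Bounds on tp from (4.62) and on RL*CL from (4.63), in seconds."""
    tp_min = 1.0 / (4.0 * f_ck)
    tp_max = 1.0 / (2.0 * f_ck)
    rc_min = tp_min / math.log(2)  # tp = RL*CL*ln(2), see (4.59)
    rc_max = tp_max / math.log(2)
    return tp_min, tp_max, rc_min, rc_max

tp_min, tp_max, rc_min, rc_max = ilfd_timing_window(1.6e9)
print(f"tp window:    {tp_min * 1e12:.2f} ps to {tp_max * 1e12:.2f} ps")
print(f"RL*CL window: {rc_min * 1e12:.2f} ps to {rc_max * 1e12:.2f} ps")
```

At 1.6 GHz the propagation delay must lie between 156.25 ps and 312.5 ps, which corresponds to an RLCL time constant of roughly 225 ps to 451 ps.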
Notice that M3 and M4 form a cross-coupled NMOS pair whose model has been detailed
in Fig. 4.14. Considering M1 and M9 as shown in Fig. 4.36 and substituting (4.25), RL and
CL for the CML circuit can be written as:
$$R_L^{-1} = R_{1,2}^{-1} + g_{ds1,2} + g_{ds3,4} - g_{m3,4} \tag{4.64}$$
$$C_L = \left(C_{gd1,2} + C_{bd1,2}\right) + \left(4C_{gd3,4} + C_{ds3,4} + C_{gb3,4} + C_{gs3,4}\right) \tag{4.65}$$
The buffers shown in Fig. 4.32 are complementary self-biased differential amplifiers
(CSDAs) [7], as presented in Fig. 4.37.
In the CSDA, transistors M5 and M6 operate in the linear region. Thus the output
swing of the amplifier can be very close to the difference between the two supply rails.
This large output swing makes interfacing the CSDA to CMOS logic gates straightforward,
since it provides a large margin for variation in the logic threshold of the gates [7].
Furthermore, the linear-region operation of M5 and M6 provides momentarily large output
switching currents while keeping the quiescent current very small. This feature allows the
CSDA to charge and discharge capacitive output loads quickly without consuming a large
amount of power.
With the resulting set of equations, the gm/ID design flow can now be presented.
The transistor length for Mi (i = 1, 2, ..., 8) is set to the minimum Lmin = 70 nm.
Supposing gm1,2,3,4 = gm and Ibias = 2ID, the design flow is given in Algorithm 3. The
output clocks generated by the frequency divider at different input frequencies are
presented in Fig. 4.38, with the average power shown in Fig. 4.39.
Fig. 4.36: Small signal model to calculate RL and CL for the CML circuit. [Model: M1 loaded
by 2Cnmos3,4, 1/gds3,4, -1/gm3,4 and R1,2.]
4.6 Mixer
The mixer basically consists of the analog switches SW1, SW2, SW3 and SW4, each
composed of an NMOS and a PMOS transistor, as presented in Fig. 4.40. An IL-VCO in the
TX provides a constant sinusoidal carrier at 1.6 GHz. When Data=1, SW1/SW2 are ON while
SW3/SW4 are OFF, and the output signals Vop and Von are differential sinusoidal bursts
whose width equals the Data width. When Data=0, SW1/SW2 are OFF while SW3/SW4 are ON, and
the output signals Vop and Von sit at a DC potential set by the signal Bias. Capacitors C1
and C2 provide an AC path to ground in order to suppress the feedthrough signal when
SW1/SW2 are OFF.
Fig. 4.37: Buffer: a complementary self-biased differential amplifier (CSDA) [7].
Fig. 4.38: Output clock from the frequency divider. [Plots: amplitude (V) versus time (ns)
for 1.5 GHz/16 = 93.75 MHz, 1.6 GHz/16 = 100 MHz and 2.1 GHz/16 = 131.25 MHz.]
Fig. 4.39: Average power for the frequency divider. [Plot: average power (mW) versus
input frequency (GHz) from 1.5 to 2.2 GHz; the axis spans 0.66 to 0.68 mW.]
Fig. 4.40: Complete mixer circuit.
Algorithm 3 gm/ID Methodology For Injection-locked Frequency Divider Design
Input: Current ID;
Output: All transistor dimensions; Power.
1: for each ID = (1 : 1 : 15) × 20 µA do
2:   for each gm/ID = 3 : 1 : 24 do
3:     Compute gm3,4;
4:     Find normalized current in = i1,2,3,4;
5:     Find (gds/ID)1,2,3,4 and compute (gds)1,2,3,4;
6:     Find normalized capacitance Cij1,2,3,4;
7:     Compute transistor width Wn = W1,2,3,4 from in and ID;
8:     Compute Cij1,2,3,4;
9:     Compute CL from (4.65);
10:    Compute RL from (4.64);
11:    Compute RLCL. If it does not satisfy (4.63), return error;
12:    Compute total power = VDD × (8 × 2ID);
13:  end for
14: end for
4.7 Error Correction Encoder and Decoder
The design of a low-power error correction decoder is crucial in this system. A
convolutional code with a Viterbi decoder (VD) is chosen for its high speed and low
complexity. Usually, two methods are used to extract the decoded bits: the trace-back (TB)
and the register-exchange (RE) methods [123]. The TB method is acceptable for a trellis
with a large number of states, whereas the RE approach is more suitable for a trellis with
a small number of states. In this work, a low-complexity encoder with constraint length
K = 3, code rate R = 1/2 and generator polynomial G(x) = [1 + x^2, 1 + x + x^2] is
adopted, and thus the RE-based Viterbi algorithm (VA) is chosen for the decoding process.
The encoder has two memory units and generates two encoded bits for each incoming data
bit, i.e., only one parity-check bit is added for each data bit.
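A minimal sketch of such an encoder is given below. It assumes that the first output bit comes from 1 + x^2 and the second from 1 + x + x^2, and that the two memory units start at zero. The function name and the output ordering are assumptions made for illustration.

```python
def conv_encode(bits):
    """Rate-1/2, K = 3 convolutional encoder with G(x) = [1 + x^2, 1 + x + x^2].

    s1 holds the previous input bit and s2 the one before it (both start at 0).
    Assumed output order: (1 + x^2 term, 1 + x + x^2 term).
    """
    s1 = s2 = 0
    symbols = []
    for u in bits:
        c1 = u ^ s2        # 1 + x^2
        c2 = u ^ s1 ^ s2   # 1 + x + x^2
        symbols.append((c1, c2))
        s1, s2 = u, s1     # shift the register
    return symbols

print(conv_encode([1, 0, 1, 1]))  # [(1, 1), (0, 1), (0, 0), (1, 0)]
```

With this ordering, the eight possible state transitions produce the same set of input/output labels that appear on the branches of the trellis: 0/00, 1/11, 0/11, 1/00, 0/01, 1/10, 0/10 and 1/01.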
The trellis diagram of the encoder is depicted in Fig. 4.41. Each state is indicated
by a pair of bits, the first being the least significant bit (lsb) and the second the most
significant bit (msb). The transitions between states are represented by branches labeled
with their input/output values. In the VA, each branch is assigned a weight, referred to as
the branch metric (BM), and the BMs are accumulated along a path to form the path metric
(PM). When two branches enter the same state, the one with the smaller PM survives while
the other is discarded. For simple hard-decision decoding, the weight is the Hamming
distance (the number of differing bits) between the encoded bits and the received bits.
Fig. 4.41: Trellis diagram for the encoder with constraint length K = 3, generator
polynomial G(x) = [1 + x^2, 1 + x + x^2]. [Trellis: states (lsb,msb) over stages t, t+1,
t+2, t+3 and onward; branches labeled input/output: 0/00, 1/11, 0/11, 1/00, 0/01, 1/10,
1/01, 0/10.]
The units for calculating the BMs and PMs are required for both the TB and RE methods.
The difference lies in how the decoded output bits are extracted. In the TB method, the
decoded bit is extracted by tracing backward from the state with the minimum PM at stage
t + Γ (Γ is the decoding depth), following the survivor path, to the original stage t. It
is necessary to trace backward through the trellis once for each output. In the RE
approach, a register is assigned to each state. The register contains the decoded output
sequence along the path from the initial stage t to the final stage t + Γ. The decoded
output sequence is taken from the register assigned to the state with the minimum PM
(double buffering of the data may be necessary so that the register contents are not lost
while being copied from one state to another [123]). Since the RE method needs no
trace-back, it is faster than the TB method.
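The register-exchange idea can be sketched in software as follows. This is a simplified hard-decision model, not the hardware implementation: it decodes a whole block instead of using a sliding window of depth Γ, it assumes the encoder starts in the all-zero state, and it assumes the output ordering (1 + x^2, 1 + x + x^2). All names are hypothetical.

```python
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (s1, s2): s1 is the most recent past bit

def step(state, u):
    """Return (next_state, (c1, c2)) for input bit u, G = [1 + x^2, 1 + x + x^2]."""
    s1, s2 = state
    return (u, s1), (u ^ s2, u ^ s1 ^ s2)

def encode(bits, state=(0, 0)):
    symbols = []
    for u in bits:
        state, sym = step(state, u)
        symbols.append(sym)
    return symbols

def viterbi_re(received, start=(0, 0)):
    """Hard-decision Viterbi decoding with register exchange."""
    inf = float("inf")
    pm = {s: (0 if s == start else inf) for s in STATES}  # path metrics
    reg = {s: [] for s in STATES}                         # one register per state
    for r in received:
        new_pm = {s: inf for s in STATES}
        new_reg = {s: [] for s in STATES}
        for s in STATES:
            if pm[s] == inf:
                continue  # state not reachable yet
            for u in (0, 1):
                nxt, out = step(s, u)
                bm = (out[0] != r[0]) + (out[1] != r[1])  # Hamming branch metric
                if pm[s] + bm < new_pm[nxt]:              # add-compare-select
                    new_pm[nxt] = pm[s] + bm
                    new_reg[nxt] = reg[s] + [u]           # copy survivor, append bit
        pm, reg = new_pm, new_reg
    best = min(STATES, key=lambda s: pm[s])
    return reg[best]  # register of the state with the minimum PM

message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
coded = encode(message)
corrupted = list(coded)
corrupted[2] = (coded[2][0] ^ 1, coded[2][1])  # flip one received bit
decoded = viterbi_re(corrupted)
print(decoded == message)
```

The copy from `reg[s]` into `new_reg[nxt]` plays the role of the register exchange, and the two dictionaries per step mirror the double buffering mentioned above. No trace-back pass is needed because each register already holds the decoded sequence of its survivor path.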
The VD, composed of four functional units, is shown in Fig. 4.42. The BM unit (BMU)
calculates the BMs. The add-compare-select unit (ACSU) adds the BMs to the PMs along
the path, compares the PMs, and then stores the minimum PMs in the PM memory unit
(PMU). Meanwhile, the ACSU appends the associated decoded bit in the survivor-path
memory unit (SMU). All the units are driven by the CONTROL block. It should be
noted that a finite decoding depth can lead to an incorrect decoding decision, which is
called a truncation error. The truncation error is typically small enough to cause
negligible performance degradation if a decoding depth of about four to five constraint
lengths is employed [124, 125]. The decoding depth Γ is 12 (4 × K) in this design, as a
tradeoff between power and performance.
The power estimation method of [126], which has been measured and verified with an absolute
modeling error of 5.2 % and a standard deviation of 6.6 %, is adopted for estimating the
power dissipation of the VD. The VD is designed in the Verilog hardware description language
(Verilog-HDL). Gate mapping is carried out by a synthesis tool (Cadence RTL Compiler).
The delay information of the circuit is written to a standard delay format (.sdf) file. The
HDL simulator, Cadence SimVision, is used to create the value change dump (.vcd) file.
The .vcd file and the .sdf file are then used to carry out cycle-accurate power simulation.
By post-processing the data, the energy distribution can be extracted [126]. The estimated
power of the entire VD is 51.3 µW at a speed of 100 Mbps in this 65-nm CMOS process.
The decoder sets a speed limit for the system. It should be noted that this
Viterbi decoder is not the only option for UWB systems used in cortical interfaces. We
selected the Viterbi decoder because it requires the least area while still operating
above 100 Mbps. In general, higher speed can be achieved through parallel processing,
either by duplicating the Viterbi decoder or by using a more complex option such as the
Gallager decoder. Recently, error correction circuits with sub-threshold operation have been
reported in [127]. With this technology, for example, a Gallager-A (GA) decoder with a
highly parallel architecture consumes only 0.66 mW when operating at 200 Mbps in
a 65-nm low-power high-threshold (LP-HVT) CMOS process [127]. Furthermore, analog
decoding circuits [128] may be another alternative, since the power-consuming ADC
would be removed in this case.
Fig. 4.42: Block diagram of the Viterbi decoder. [Blocks: BMU, ACSU, PMU, SMU and CONTROL;
encoded bits in, decoded bits out.]
CHAPTER 5
CONCLUSIONS AND FUTURE WORK
This work proposed a novel high-speed ultra-wideband wireless communication system
for the cortical interface application. Compared with traditional narrowband techniques,
the wideband solution can offer transmission speeds that are one or more orders of
magnitude higher. An IR-UWB receiver system with a pulse-injection-locking method
is deployed. The injection-locking technique eliminates the clock data recovery circuitry,
leading to a significant power reduction.
However, the injection-locked IR-UWB system also introduces several difficulties for the
communication between devices inside and outside human tissue. The first challenge is to
design and fabricate a pair of low-Q inductive coils with a high self-resonance frequency
on the order of GHz. Second, the low-Q coils and the tissue absorption in the high-frequency
band result in significant signal attenuation, i.e., a much lower SNR. Third,
the injection-locking technique brings a higher clock jitter to the system than
PLL-based designs. All these considerations lead to many uncertainties in the system
performance. To address these difficulties, a pair of low-Q coils is fabricated using the
high-frequency material RO4003C, and the physical communication channel through both
air and biological tissue is measured. Moreover, a rational-function-based model that
characterizes the coils’ port behavior well is built in Matlab for the system reliability
evaluation, in which the BER is the primary measure. Critical results show
that:
1. The sampling clock jitter creates an error floor that is independent of the SNR of the
received signals. With the use of ECC, the error floor is eliminated;
2. The level of the error floor is determined by the sampling jitter. An LNA with a
better noise figure will not improve the system performance without the use of ECC;
3. ECC can improve the system performance by more than four orders of magnitude.
With the ECC technique, the burden of designing power-consuming cells, such as the LNA,
VCO, and ADC, is greatly relieved, and thus more power can be saved.
Another contribution of this work is the exploration of the gm/ID design methodology for
circuit optimization in all inversion regions. The design methodology readily yields the
device sizing of the circuit. Therefore, the design period is significantly reduced, since
only minor adjustments are needed after the Matlab calculation. The work in this part
includes:
1. A database is created that includes the MOSFET semi-empirical models based on the
AC characteristics of the device: gm/ID versus the normalized current i, the drain-to-
source conductance over ID, the bulk-to-source conductance over ID, and the five
capacitances of the quasi-static model. The simplified models for passive inductors are
also obtained by AC simulation at the working frequency;
2. Specific design methodologies are derived for the CG-LNA, the IL-VCO, and the
frequency divider circuit blocks. The approach is efficient because the device modeling
data are collected once and can be reused for other circuit designs;
3. A summary of the average power for each circuit block of the receiver is presented in
Table 5.1.
Table 5.1: AVERAGE POWER FOR EACH CIRCUIT BLOCK OF THE RECEIVER.

Block         | LNA     | IL-VCO  | Phase Shifter | Frequency Divider | ECC     | Total
Average power | 3.37 mW | 1.61 mW | 1.52 mW       | 0.67 mW           | 0.05 mW | 7.22 mW
Future work could address the following:
1. The carrier frequency is determined by the characteristics of the coils. A higher
carrier frequency can not only increase the data throughput but also decrease the power
of the LNA and the VCO. Therefore, further coil optimization is worth pursuing in the
future;
2. For the ECC, higher speed can be achieved through parallel processing, either by
duplicating the Viterbi decoder or by using a more complex option such as the Gallager
decoder;
3. Since the system can tolerate RF blocks with a higher noise level, the LC-tank VCO
can be replaced with an injection-locked ring oscillator, which consumes less power than
its LC-based counterpart at the cost of a higher noise level. In this case, the expensive
on-chip inductor can be eliminated in both the transmitter and the receiver.
Bibliography
[1] C. Winstead and Y. Luo, “Error correction circuits for bio-implantable electronics,”
in Proc. IEEE 55th Int. Midwest Symp. Circuits and Systems (MWSCAS),
2012, pp. 158–161. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6291981
[2] First Report and Order, Federal Communications Commission Std. FCC 02-48,
Feb. 2002. [Online]. Available:
https://transition.fcc.gov/Bureaus/Engineering_Technology/Orders/2002/fcc02048.pdf
[3] Y. Luo, C. Winstead, and P. Chiang, “125Mbps ultra-wideband system evaluation
for cortical implant devices,” in Proc. Annual Int. Conf. of the IEEE Engineering in
Medicine and Biology Society (EMBC), 2012, pp. 779–782. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6346047
[4] B. Murmann, EE 214B: Advanced Analog Integrated Circuit Design, Stanford Univer-
sity, 2012.
[5] T.-C. Lee and Y.-C. Huang, “The design and analysis of a Miller-divider-based clock
generator for MBOA-UWB application,” IEEE Journal of Solid-State Circuits, vol. 41,
no. 6, pp. 1253–1261, Jun. 2006.
[6] M. Tiebout, “A CMOS direct injection-locked oscillator topology as high-frequency
low-power frequency divider,” IEEE Journal of Solid-State Circuits, vol. 39, no. 7,
pp. 1170–1174, Jul. 2004.
[7] M. Bazes, “Two novel fully complementary self-biased CMOS differential amplifiers,”
IEEE Journal of Solid-State Circuits, vol. 26, no. 2, pp. 165–168, Feb. 1991.
[8] K. Wise, D. Anderson, J. Hetke, D. Kipke, and K. Najafi, “Wireless
implantable microsystems: high-density electronic interfaces to the nervous
system,” Proc. IEEE, vol. 92, no. 1, pp. 76–97, 2004. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1258173
[9] J. Weiland and M. Humayun, “Visual prosthesis,” Proceedings of the IEEE, vol. 96,
no. 7, pp. 1076–1084, 2008. [Online]. Available:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4539488
[10] K. Jones and R. Normann, “An advanced demultiplexing system for physiological
stimulation,” IEEE Trans. Biomed. Eng., vol. 44, no. 12, pp. 1210–1220, 1997.
[Online]. Available: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=649992
[11] T. Seese, H. Harasaki, G. Saidel, and C. R. Davies, “Characterization of tissue
morphology, angiogenesis, and temperature in the adaptive response of muscle tissue
to chronic heating,” Laboratory investigation, vol. 78, pp. 1553–1562, 1998. [Online].
Available: http://ukpmc.ac.uk/abstract/MED/9881955
[12] G. Lazzi, “Thermal effects of bioimplants,” IEEE Eng. Med. Biol. Mag., vol. 24, no. 5,
pp. 75–81, 2005.
[13] M. Ghovanloo, “An overview of the recent wideband transcutaneous wireless
communication techniques,” in Proc. Annual Int. Conf. of the IEEE Engineering in
Medicine and Biology Society (EMBC), 2011, pp. 5864–5867. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6091450
[14] M. Ghovanloo and K. Najafi, “A wideband frequency-shift keying wireless link for in-
ductively powered biomedical implants,” IEEE Trans. Circuits Syst. I, vol. 51, no. 12,
pp. 2374–2383, 2004.
[15] R. R. Harrison, P. T. Watkins, R. J. Kier, R. O. Lovejoy, D. J. Black, B. Greger,
and F. Solzbacher, “A low-power integrated circuit for a wireless 100-electrode neural
recording system,” IEEE Journal of Solid-State Circuits, vol. 42, no. 1, pp. 123–133,
Jan. 2007.
[16] B. K. Thurgood, D. J. Warren, N. M. Ledbetter, G. A. Clark, and R. R. Harrison,
“A wireless integrated circuit for 100-Channel charge-balanced neural stimulation,”
IEEE Transactions on Biomedical Circuits and Systems, vol. 3, no. 6, pp. 405–414,
Dec. 2009.
[17] J. Coulombe, M. Sawan, and J.-F. Gervais, “A highly flexible system for
microstimulation of the visual cortex: Design and implementation,” IEEE Trans.
Biomed. Circuits Syst., vol. 1, no. 4, pp. 258–269, 2007. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4432298
[18] S. Mandal and R. Sarpeshkar, “Power-efficient impedance-modulation wireless data
links for biomedical implants,” IEEE Trans. Biomed. Circuits Syst., vol. 2, no. 4,
pp. 301–315, 2008. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4667645
[19] M. Zhou, M. R. Yuce, and W. Liu, “A non-coherent DPSK data receiver
with interference cancellation for dual-band transcutaneous telemetries,” IEEE J.
Solid-State Circuits, vol. 43, no. 9, pp. 2003–2012, 2008. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4625999
[20] Z. Luo and S. Sonkusale, “A novel BPSK demodulator for biological implants,” IEEE
Transactions on Circuits and Systems I: Regular Papers, vol. 55, no. 6, pp. 1478–1484,
Jul. 2008.
[21] G. Nabovati, E. Ghafar-Zadeh, F. Awwad, and M. Sawan, “Fully digital low-power
self-calibrating BPSK demodulator for implantable biosensors,” in Proc. IEEE 55th
Int. Midwest Symp. Circuits and Systems (MWSCAS), Aug. 2012, pp. 354–357.
[22] M. Zgaren and M. Sawan, “A low-power dual-injection-locked RF receiver with
FSK-to-OOK conversion for biomedical implants,” IEEE Transactions on Circuits and
Systems I: Regular Papers, vol. 62, no. 11, pp. 2748–2758, Nov. 2015.
[23] F. Inanlou, M. Kiani, and M. Ghovanloo, “A 10.2 Mbps pulse harmonic
modulation based transceiver for implantable medical devices,” IEEE J. Solid-
State Circuits, vol. 46, no. 6, pp. 1296–1306, 2011. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5764843
[24] F. Inanlou and M. Ghovanloo, “Wideband near-field data transmission using pulse
harmonic modulation,” IEEE Trans. Circuits Syst. I, vol. 58, no. 1, pp. 186–195, 2011.
[25] M. Kiani and M. Ghovanloo, “A 20-Mb/s pulse harmonic modulation transceiver for
wideband near-field data transmission,” IEEE Transactions on Circuits and Systems
II: Express Briefs, vol. 60, no. 7, pp. 382–386, Jul. 2013.
[26] ——, “Pulse delay modulation (PDM) a new wideband data transmission method to
implantable medical devices in presence of a power link,” in Proc. IEEE Biomedical
Circuits and Systems Conf. (BioCAS), Nov. 2012, pp. 256–259.
[27] ——, “A 13.56-Mbps pulse delay modulation based transceiver for simultaneous near-
field data and power transmission,” IEEE Transactions on Biomedical Circuits and
Systems, vol. 9, no. 1, pp. 1–11, Feb. 2015.
[28] Federal Communications Commission, “Revision of Part 15 of the Commission’s rules
regarding ultra-wideband transmission systems,” First Report and Order, FCC 02-48,
2002.
[29] Y. Wang, A. M. Niknejad, V. Gaudet, and K. Iniewski, “A CMOS IR-UWB
transceiver design for contact-less chip testing applications,” IEEE Trans. Circuits
Syst. II, vol. 55, no. 4, pp. 334–338, 2008.
[30] R. Chavez-Santiago, K. Nolan, O. Holland, L. De Nardis, J. Ferro, N. Barroca,
L. Borges, F. Velez, V. Goncalves, and I. Balasingham, “Cognitive radio
for medical body area networks using ultra wideband,” IEEE Wireless
Commun., vol. 19, no. 4, pp. 74–81, 2012. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6272426
[31] J. Yoo, S. Lee, and H.-J. Yoo, “A 1.12pJ/b inductive transceiver with a fault-tolerant
network switch for multi-layer wearable body area network applications,” IEEE J.
Solid-State Circuits, vol. 44, no. 11, pp. 2999–3010, 2009.
[32] C. T. Charles, “Wireless data links for biomedical implants: Current research and
future directions,” in Proc. IEEE Biomedical Circuits and Systems Conf. BIOCAS
2007, 2007, pp. 13–16.
[33] ——, “An implantable I-UWB transceiver architecture with power carrier
synchronization,” in Proc. IEEE Int. Symp. Circuits and Systems ISCAS 2008, 2008,
pp. 1970–1973.
[34] S. Iida, K. Tanaka, H. Suzuki, N. Yoshikawa, N. Shoji, B. Griffiths, D. Mellor,
F. Hayden, I. Butler, and J. Chatwin, “A 3.1 to 5 GHz CMOS DSSS UWB transceiver for
WPANs,” in Proc. ISSCC. 2005 IEEE Int. Digest of Technical Papers. Solid-State
Circuits Conf, Feb. 2005, pp. 214–594 Vol. 1.
[35] U. M. Jow and M. Ghovanloo, “Optimization of data coils in a multiband wireless link
for neuroprosthetic implantable devices,” IEEE Transactions on Biomedical Circuits
and Systems, vol. 4, no. 5, pp. 301–310, Oct. 2010.
[36] G. Wang, P. Wang, Y. Tang, and W. Liu, “Analysis of dual band power and data
telemetry for biomedical implants,” IEEE Trans. Biomed. Circuits Syst., no. 99, 2011,
early Access.
[37] U.-M. Jow and M. Ghovanloo, “Design and optimization of printed spiral coils for
efficient transcutaneous inductive power transmission,” IEEE Trans. Biomed. Circuits
Syst., vol. 1, no. 3, pp. 193–202, 2007.
[38] M. Ghovanloo and S. Atluri, “A wide-band power-efficient inductive wireless link for
implantable microelectronic devices using multiple carriers,” IEEE Trans. Circuits
Syst. I, vol. 54, no. 10, pp. 2211–2221, 2007.
[39] U.-M. Jow and M. Ghovanloo, “Optimization of data coils in a multiband wireless
link for neuroprosthetic implantable devices,” IEEE Trans. Biomed. Circuits Syst.,
vol. 4, no. 5, pp. 301–310, 2010.
[40] W. G. Scanlon, B. Burns, and N. E. Evans, “Radiowave propagation from a tissue-
implanted source at 418 MHz and 916.5 MHz,” IEEE Transactions on Biomedical
Engineering, vol. 47, no. 4, pp. 527–534, Apr. 2000.
[41] Y. Chan, M. Q.-H. Meng, K.-L. Wu, and X. Wang, “Experimental study of radiation
efficiency from an ingested source inside a human body model*,” in Proc. 27th Annual
Int. Conf. of the Engineering in Medicine and Biology Society IEEE-EMBS 2005,
2005, pp. 7754–7757.
[42] H. Bahrami, B. Gosselin, and L. A. Rusch, “Realistic modeling of the
biological channel for the design of implantable wireless UWB communication
systems,” in Proc. Annual Int. Conf. of the IEEE Engineering in Medicine and
Biology Society (EMBC), 2012, pp. 6015–6018. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6347365
[43] A. Khaleghi, R. Chavez-Santiago, and I. Balasingham, “An improved ultra
wideband channel model including the frequency-dependent attenuation for in-body
communications,” in Proc. Annual Int. Conf. of the IEEE Engineering in Medicine
and Biology Society (EMBC), 2012, pp. 1631–1634. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6346258
[44] K. M. S. Thotahewa, J. M. Redouté, and M. R. Yuce, “SAR, SA, and temperature
variation in the human head caused by IR-UWB implants operating at 4 GHz,” IEEE
Transactions on Microwave Theory and Techniques, vol. 61, no. 5, pp. 2161–2169, May
2013.
[45] K. M. S. Thotahewa, J. M. Redouté, and M. R. Yuce, “Propagation, power absorption,
and temperature analysis of UWB wireless capsule endoscopy devices operating in
the human body,” IEEE Transactions on Microwave Theory and Techniques, vol. 63,
no. 11, pp. 3823–3833, Nov. 2015.
[46] I. D. O’Donnell and R. W. Brodersen, “A 2.3mW baseband impulse-UWB transceiver
front-end in CMOS,” in VLSI Circuits, 2006. Digest of Technical Papers. 2006
Symposium on, 2006. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1705379
[47] A. Medi and W. Namgoong, “A high data-rate energy-efficient interference-tolerant
fully integrated CMOS frequency channelized UWB transceiver for impulse radio,”
IEEE J. Solid-State Circuits, vol. 43, no. 4, pp. 974–980, 2008. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4476500
[48] D. Barras, R. Meyer-Piening, G. v. Bueren, W. Hirt, and H. Jaeckel, “A
low-power baseband ASIC for an energy-collection IR-UWB receiver,” IEEE J.
Solid-State Circuits, vol. 44, no. 6, pp. 1721–1733, 2009. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4982867
[49] Y. Zheng, Y. Tong, C. W. Ang, Y.-P. Xu, W. G. Yeoh, F. Lin, and R. Singh, “A CMOS
carrier-less UWB transceiver for WPAN applications,” in Proc. Digest of Technical
Papers. IEEE Int. Solid-State Circuits Conf. ISSCC 2006, 2006, pp. 378–387.
[50] L. Zhou, Z. Chen, C.-C. Wang, F. Tzeng, V. Jain, and P. Heydari, “A 2Gbps RF-
correlation-based impulse-radio UWB transceiver front-end in 130nm CMOS,” in
Proc. IEEE Radio Frequency Integrated Circuits Symp. RFIC 2009, 2009, pp. 65–
68.
[51] F. Zhang, A. Jha, R. Gharpurey, and P. Kinget, “An agile, ultra-wideband
pulse radio transceiver with discrete-time wideband-IF,” IEEE J. Solid-State
Circuits, vol. 44, no. 5, pp. 1336–1351, 2009. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4907327
[52] N. V. Helleputte and M. Verhelst, “A reconfigurable, 130 nm CMOS 108 pJ/pulse,
fully integrated IR-UWB receiver for communication and precise ranging,” IEEE
J. Solid-State Circuits, vol. 45, no. 1, pp. 69–83, 2010. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5357557
[53] Z. Zou, D. S. Mendoza, P. Wang, Q. Zhou, J. Mao, F. Jonsson, H. Tenhunen,
and L. R. Zheng, “A low-power and flexible energy detection IR-UWB
receiver for RFID and wireless sensor networks,” IEEE Trans. Circuits
Syst. I, vol. 58, no. 7, pp. 1470–1482, 2011. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5766792
[54] M. Crepaldi, C. Li, J. R. Fernandes, and P. R. Kinget, “An ultra-wideband
impulse-radio transceiver chipset using synchronized-OOK modulation,” IEEE J.
Solid-State Circuits, vol. 46, no. 10, pp. 2284–2299, 2011. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5979212
[55] D. C. Daly, P. P. Mercier, M. Bhardwaj, A. L. Stone, Z. N. Aldworth,
T. L. Daniel, J. Voldman, J. G. Hildebrand, and A. P. Chandrakasan,
“A pulsed UWB receiver SoC for insect motion control,” IEEE J. Solid-
State Circuits, vol. 45, no. 1, pp. 153–166, 2010. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5357570
[56] L. Xia, K. Shao, H. Chen, Y. Huang, Z. Hong, and P. Y. Chiang, “0.15-nJ/b
3–5-GHz IR-UWB system with spectrum tunable transmitter and merged-correlator
noncoherent receiver,” IEEE Trans. Microw. Theory Techn., vol. 59, no. 4, pp.
1147–1156, 2011. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5727926
[57] M. Pelissier, D. Morche, and P. Vincent, “Super-regenerative architecture for
UWB pulse detection: From theory to RF front-end design,” IEEE Trans.
Circuits Syst. I, vol. 56, no. 7, pp. 1500–1512, 2009. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4671075
[58] P. E. Thoppay, C. Dehollain, M. M. Green, and M. J. Declercq, “A 0.24-
nJ/bit super-regenerative pulsed UWB receiver in 0.18-µm CMOS,” IEEE J.
Solid-State Circuits, vol. 46, no. 11, pp. 2623–2634, 2011. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6034536
[59] M. Pelissier, B. Gomez, G. Masson, S. Dia, M. Gary, J. Jantunen,
J. Arponen, and J. Varteva, “A 112Mb/s full duplex remotely-powered impulse-
UWB RFID transceiver for wireless NV-memory applications,” in VLSI Circuits
(VLSIC), 2010 IEEE Symposium on, 2010, pp. 25–26. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5560255
[60] C. Hu, R. Khanna, J. Nejedlo, K. Hu, H. Liu, and P. Y. Chiang, “A 90 nm-CMOS,
500 Mbps, 3–5 GHz fully-integrated IR-UWB transceiver with multipath equalization
using pulse injection-locking for receiver phase synchronization,” IEEE J. Solid-State
Circuits, vol. 46, pp. 1076–1088, 2011.
[61] S. A. Mirbozorgi, H. Bahrami, M. Sawan, L. A. Rusch, and B. Gosselin, “A single-
chip full-duplex high speed transceiver for multi-site stimulating and recording neural
implants,” IEEE Transactions on Biomedical Circuits and Systems, vol. 10, no. 3, pp.
643–653, Jun. 2016.
[62] M. Rezaei and B. Gosselin, “Low-power high-speed wireless transceivers and antennas
for large-scale neural implants,” in Proc. 14th IEEE Int. New Circuits and Systems
Conf. (NEWCAS), Jun. 2016, pp. 1–4.
[63] S. Gambini, J. Crossley, E. Alon, and J. M. Rabaey, “A fully integrated, 290 pJ/bit
UWB dual-mode transceiver for cm-range wireless interconnects,” IEEE Journal of
Solid-State Circuits, vol. 47, no. 3, pp. 586–598, Mar. 2012.
[64] J. Hu, Y. Zhu, S. Wang, and H. Wu, “An energy-efficient IR-UWB receiver based
on distributed pulse correlator,” IEEE Transactions on Microwave Theory and Tech-
niques, vol. 61, no. 6, pp. 2447–2459, Jun. 2013.
[65] L. Wang, C. H. Heng, and Y. Lian, “A sub-GHz mostly digital impulse radio UWB
transceiver for wireless body sensor networks,” IEEE Journal on Emerging and Se-
lected Topics in Circuits and Systems, vol. 4, no. 3, pp. 344–353, Sep. 2014.
[66] S. Brenna, F. Padovan, A. Neviani, A. Bevilacqua, A. Bonfanti, and A. L. Lacaita,
“A 64-Channel 965-µW neural recording SoC with UWB wireless transmission in 130-nm
CMOS,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 63, no. 6,
pp. 528–532, Jun. 2016.
[67] M. Ghovanloo and K. Najafi, “A wireless implantable multichannel microstimulating
system-on-a-chip with modular architecture,” IEEE Transactions on Neural Systems
and Rehabilitation Engineering, vol. 15, no. 3, pp. 449–457, Sep. 2007.
[68] K. Chen, Z. Yang, L. Hoang, J. Weiland, M. Humayun, and W. Liu, “An integrated
256-Channel epiretinal prosthesis,” IEEE Journal of Solid-State Circuits, vol. 45,
no. 9, pp. 1946–1956, Sep. 2010.
[69] C. W. Chou, L. C. Liu, and C. Y. Wu, “A MedRadio-band low-energy-per-bit 4-Mbps
CMOS OOK receiver for implantable medical devices,” in Proc. 35th Annual Int. Conf.
of the IEEE Engineering in Medicine and Biology Society (EMBC), Jul. 2013, pp.
5171–5174.
[70] B. P. Wilkerson and J. K. Kang, “A low power BPSK demodulator for wireless im-
plantable biomedical devices,” in Proc. IEEE Int. Symp. Circuits and Systems (IS-
CAS2013), May 2013, pp. 626–629.
[71] J. Tan, W. S. Liew, C. H. Heng, and Y. Lian, “A 2.4 GHz ULP reconfigurable
asymmetric transceiver for single-chip wireless neural recording IC,” IEEE Transactions
on Biomedical Circuits and Systems, vol. 8, no. 4, pp. 497–509, Aug. 2014.
[72] M. Karimi, M. H. Maghami, M. Faizollah, and A. M. Sodagar, “A noncoherent low-
power high-data-rate BPSK demodulator and clock recovery circuit for implantable
biomedical devices,” in Proc. IEEE Biomedical Circuits and Systems Conf (BioCAS),
Oct. 2014, pp. 372–375.
[73] A. Ba, M. Vidojkovic, K. Kanda, N. F. Kiyani, M. Lont, X. Huang, X. Wang,
C. Zhou, Y. H. Liu, M. Ding, B. Büsze, S. Masui, M. Hamaminato, H. Sato, K. Philips,
and H. de Groot, “A 0.33 nJ/bit IEEE802.15.6/proprietary MICS/ISM wireless
transceiver with scalable data rate for medical implantable applications,” IEEE Jour-
nal of Biomedical and Health Informatics, vol. 19, no. 3, pp. 920–929, May 2015.
[74] J. Y. Hsieh, Y. C. Huang, P. H. Kuo, T. Wang, and S. S. Lu, “A 0.45-V low-power
OOK/FSK RF receiver in 0.18 µm CMOS technology for implantable medical applications,”
IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 63, no. 8, pp.
1123–1130, Aug. 2016.
[75] N. V. Helleputte and G. Gielen, “A 70 pJ/pulse analog front-end in
130 nm CMOS for UWB impulse radio receivers,” IEEE J. Solid-State
Circuits, vol. 44, no. 7, pp. 1862–1871, 2009. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5109791
[76] R. Adler, “A study of locking phenomena in oscillators,” Proceedings of the IEEE,
vol. 61, no. 10, pp. 1380–1385, Oct. 1973.
[77] B. Razavi, “A study of injection locking and pulling in oscillators,” IEEE Journal of
Solid-State Circuits, vol. 39, no. 9, pp. 1415–1424, Sep. 2004.
[78] U.-M. Jow and M. Ghovanloo, “Optimization of a multiband wireless link for neu-
roprosthetic implantable devices,” in Proc. IEEE Biomedical Circuits and Systems
Conf. BioCAS 2008, 2008, pp. 97–100.
[79] X. Chen and S. Kiaei, “Monocycle shapes for ultra wideband system,” in Proc. IEEE
Int. Symp. Circuits and Systems ISCAS 2002, vol. 1, 2002.
[80] U.-M. Jow and M. Ghovanloo, “Modeling and optimization of printed spiral coils in
air, saline, and muscle tissue environments,” IEEE Trans. Biomed. Circuits Syst.,
vol. 3, no. 5, pp. 339–347, 2009.
[81] R. Thai-Singama, F. Du-Burck, and M. Piette, “Demonstration of low-cost
architectures for UWB pulse generation,” in Proc. Eighth Int Wireless and Optical
Communications Networks (WOCN) Conf, 2011, pp. 1–5. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5872942
[82] P. R. Gray, P. J. Hurst, S. H. Lewis, and R. G. Meyer, Analysis and Design of Analog
Integrated Circuits. John Wiley and Sons, Inc., 2001.
[83] B. Razavi, Design of Analog CMOS Integrated Circuits. McGraw-Hill Higher Edu-
cation, 2001.
[84] F. Silveira, D. Flandre, and P. G. A. Jespers, “A gm/id based methodology for the
design of CMOS analog circuits and its application to the synthesis of a silicon-on-
insulator micropower OTA,” IEEE Journal of Solid-State Circuits, vol. 31, no. 9, pp.
1314–1319, Sep. 1996.
[85] R. Fiorelli, E. J. Peralias, and F. Silveira, “LC-VCO design optimization
methodology based on the gm/ID ratio for nanometer CMOS technologies,” IEEE Transactions on
Microwave Theory and Techniques, vol. 59, no. 7, pp. 1822–1831, Jul. 2011.
[86] J. Ou and F. Farahmand, “Transconductance/drain current based distortion analysis
for analog CMOS integrated circuits,” in Proc. IEEE 10th Int. New Circuits and
Systems Conf. (NEWCAS), Jun. 2012, pp. 61–64.
[87] R. Fiorelli, F. Silveira, and E. Peralías, “MOST moderate–weak-inversion region as the
optimum design zone for CMOS 2.4-GHz CS-LNAs,” IEEE Transactions on Microwave
Theory and Techniques, vol. 62, no. 3, pp. 556–566, Mar. 2014.
[88] P. G. Jespers, The gm/Id Methodology, a sizing tool for low-voltage Analog CMOS
Circuit, M. Ismail, Ed. Springer Dordrecht Heidelberg London New York, 2010.
[89] Stanford University, Electrical Engineering. [Online]. Available: https://ee.stanford.edu/
[90] University of California, Berkeley, Electrical Engineering and Computer Sciences.
[Online]. Available: https://eecs.berkeley.edu/academics/courses
[91] Utah State University, Electrical and Computer Engineering. [Online]. Available:
https://ece.usu.edu/programs/courses/schedule
[92] A. M. Niknejad and R. G. Meyer, “Analysis, design, and optimization of spiral
inductors and transformers for Si RF ICs,” IEEE Journal of Solid-State Circuits, vol. 33,
no. 10, pp. 1470–1481, Oct. 1998.
[93] D. J. Allstot, X. Li, and S. Shekhar, “Design considerations for CMOS low-noise
amplifiers,” in Proc. Digest of Papers Radio Frequency Integrated Circuits (RFIC)
Symp. 2004 IEEE, Jun. 2004, pp. 97–100.
[94] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits. Cambridge
University Press, 2003.
[95] R. F. Ye, T. S. Horng, and J. M. Wu, “Two CMOS dual-feedback common-gate low-
noise amplifiers with wideband input and noise matching,” IEEE Transactions on
Microwave Theory and Techniques, vol. 61, no. 10, pp. 3690–3699, Oct. 2013.
[96] M. D. Wei, S. F. Chang, C. W. Chang, C. H. Han, and R. Negra, “A CMOS fully-
differential current-reuse LNA with gm-boosting technique,” in Proc. European Mi-
crowave Integrated Circuits Conf. (EuMIC), Oct. 2011, pp. 378–381.
[97] S. L. Jang and C. W. Chang, “A 90 nm CMOS LC-tank divide-by-3 injection-locked
frequency divider with record locking range,” IEEE Microwave and Wireless Compo-
nents Letters, vol. 20, no. 4, pp. 229–231, Apr. 2010.
[98] X. Yi, C. C. Boon, M. A. Do, K. S. Yeo, and W. M. Lim, “Design of ring-oscillator-
based injection-locked frequency dividers with single-phase inputs,” IEEE Microwave
and Wireless Components Letters, vol. 21, no. 10, pp. 559–561, Oct. 2011.
[99] A. A. Hafez and C. K. K. Yang, “Analysis and design of superharmonic injection-
locked multipath ring oscillators,” IEEE Transactions on Circuits and Systems I:
Regular Papers, vol. 60, no. 7, pp. 1712–1725, Jul. 2013.
[100] C. Y. Wu, M. C. Chen, and Y. K. Lo, “A phase-locked loop with injection-locked
frequency multiplier in 0.18-µm CMOS for V-band applications,” IEEE Transactions on
Microwave Theory and Techniques, vol. 57, no. 7, pp. 1629–1636, Jul. 2009.
[101] M. Raj and A. Emami, “A wideband injection-locking scheme and quadrature phase
generation in 65-nm CMOS,” IEEE Transactions on Microwave Theory and Tech-
niques, vol. 62, no. 4, pp. 763–772, Apr. 2014.
[102] M. Elbadry, B. Sadhu, J. X. Qiu, and R. Harjani, “Dual-channel injection-locked
quadrature LO generation for a 4-GHz instantaneous bandwidth receiver at 21-GHz
center frequency,” IEEE Transactions on Microwave Theory and Techniques, vol. 61,
no. 3, pp. 1186–1199, Mar. 2013.
[103] J. Lee and H. Wang, “Study of subharmonically injection-locked PLLs,” IEEE Journal
of Solid-State Circuits, vol. 44, no. 5, pp. 1539–1553, May 2009.
[104] S. Verma, H. R. Rategh, and T. H. Lee, “A unified model for injection-locked frequency
dividers,” IEEE Journal of Solid-State Circuits, vol. 38, no. 6, pp. 1015–1027, Jun.
2003.
[105] A. Mirzaei, M. E. Heidari, R. Bagheri, S. Chehrazi, and A. A. Abidi, “The quadrature
LC oscillator: A complete portrait based on injection locking,” IEEE Journal of Solid-
State Circuits, vol. 42, no. 9, pp. 1916–1932, Sep. 2007.
[106] S. Cheng, H. Tong, J. Silva-Martinez, and A. I. Karsilayan, “A fully differential low-
power divide-by-8 injection-locked frequency divider up to 18 GHz,” IEEE Journal of
Solid-State Circuits, vol. 42, no. 3, pp. 583–591, Mar. 2007.
[107] R. Fiorelli, E. J. Peralias, and F. Silveira, “LC-VCO design optimization
methodology based on the gm/ID ratio for nanometer CMOS technologies,” IEEE Transactions on
Microwave Theory and Techniques, vol. 59, no. 7, pp. 1822–1831, Jul. 2011.
[108] A. D. Berny, A. M. Niknejad, and R. G. Meyer, “A 1.8-GHz LC VCO with 1.3-GHz
tuning range and digital amplitude calibration,” IEEE Journal of Solid-State Circuits,
vol. 40, no. 4, pp. 909–917, Apr. 2005.
[109] J. P. Maligeorgos and J. R. Long, “A low-voltage 5.1-5.8-GHz image-reject receiver
with wide dynamic range,” IEEE Journal of Solid-State Circuits, vol. 35, no. 12, pp.
1917–1926, Dec. 2000.
[110] B. Razavi, A Study of Phase Noise in CMOS Oscillators. Wiley-IEEE
Press, 2003, pp. 176–188. [Online]. Available:
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5311457
[111] A. Rofougaran, J. Rael, M. Rofougaran, and A. Abidi, “A 900 MHz CMOS LC-
oscillator with quadrature outputs,” in Proc. IEEE Int. Solid-State Circuits Conf.
Digest of Technical Papers. 42nd ISSCC, Feb. 1996, pp. 392–393.
[112] S. H. Galal, H. F. Ragaie, and M. S. Tawfik, “RC sequence asymmetric polyphase
networks for RF integrated transceivers,” IEEE Transactions on Circuits and Systems
II: Analog and Digital Signal Processing, vol. 47, no. 1, pp. 18–27, Jan. 2000.
[113] F. Behbahani, Y. Kishigami, J. Leete, and A. A. Abidi, “CMOS mixers and polyphase
filters for large image rejection,” IEEE Journal of Solid-State Circuits, vol. 36, no. 6,
pp. 873–887, Jun. 2001.
[114] F. Haddad, W. Rahajandraibe, L. Zaid, and O. Frioui, “Design of fully-integrated
RF front-end for large image rejection and wireless communication applications,” in
Proc. 17th IEEE Int Electronics, Circuits, and Systems (ICECS) Conf, Dec. 2010,
pp. 902–905.
[115] A. Hastings, The Art of Analog Layout. Upper Saddle River, NJ: Prentice Hall, 2001.
[116] J. F. Bulzacchelli, M. Meghelli, S. V. Rylov, W. Rhee, A. V. Rylyakov, H. A.
Ainspan, B. D. Parker, M. P. Beakes, A. Chung, T. J. Beukema, P. K. Pepelju-
goski, L. Shan, Y. H. Kwark, S. Gowda, and D. J. Friedman, “A 10-Gb/s 5-Tap
DFE/4-Tap FFE transceiver in 90-nm CMOS technology,” IEEE Journal of Solid-State
Circuits, vol. 41, no. 12, pp. 2885–2900, Dec. 2006.
[117] K. Y. Kim, Y. J. Min, S. W. Kim, and J. Park, “Low-power programmable divider
with a shared counter for frequency synthesiser,” IET Circuits, Devices & Systems, vol. 5,
no. 3, pp. 170–176, May 2011.
[118] R. L. Miller, “Fractional-frequency generators utilizing regenerative modulation,”
Proceedings of the IRE, vol. 27, no. 7, pp. 446–457, Jul. 1939.
[119] A. Safarian, S. Anand, and P. Heydari, “On the dynamics of regenerative frequency
dividers,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 53,
no. 12, pp. 1413–1417, Dec. 2006.
[120] B. Razavi, K. F. Lee, and R. H. Yan, “Design of high-speed, low-power frequency
dividers and phase-locked loops in deep submicron CMOS,” IEEE Journal of Solid-
State Circuits, vol. 30, no. 2, pp. 101–109, Feb. 1995.
[121] X. Gui, Z. Chen, and M. M. Green, “Analysis of nonlinearities in injection-locked fre-
quency dividers,” IEEE Transactions on Microwave Theory and Techniques, vol. 63,
no. 3, pp. 945–953, Mar. 2015.
[122] A. Mirzaei, M. E. Heidari, R. Bagheri, and A. A. Abidi, “Multi-phase injection widens
lock range of ring-oscillator-based frequency dividers,” IEEE Journal of Solid-State
Circuits, vol. 43, no. 3, pp. 656–671, Mar. 2008.
[123] T. K. Moon, Error correction coding: mathematical methods and algorithms. Wiley-
Interscience, 2005.
[124] C.-C. Lin, Y.-H. Shih, H.-C. Chang, and C.-Y. Lee, “Design of a power-reduction
Viterbi decoder for WLAN applications,” IEEE Transactions on Circuits and Systems
I: Regular Papers, vol. 52, no. 6, pp. 1148–1156, Jun. 2005.
[125] F. Hemmati and D. J. Costello, “Truncation error probability in Viterbi decoding,”
IEEE Transactions on Communications, vol. 25, no. 5, pp. 530–532, May 1977.
[126] O. C. Akgun, J. N. Rodrigues, Y. Leblebici, and V. Owall, “High-level energy
estimation in the sub-VT domain: Simulation and measurement of a cardiac event
detector,” IEEE Trans. Biomed. Circuits Syst., vol. 6, no. 1, pp. 15–27, 2012.
[Online]. Available: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5936651
[127] C. Winstead and J. N. Rodrigues, “Ultra-low-power error correction circuits:
Technology scaling and sub-VT operation,” IEEE Trans. Circuits Syst. II, to be
published, early Access. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6412785
[128] C. Winstead, N. Nguyen, V. C. Gaudet, and C. Schlegel, “Low-voltage CMOS circuits
for analog iterative decoders,” IEEE Trans. Circuits Syst. I, vol. 53, no. 4, pp. 829–
841, 2006.
APPENDIX
Impulse Response Calculation for the Coils
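To obtain the impulse response, rewrite and rearrange the transfer function
(2.2) as:
\[
\begin{aligned}
H_{12}(s) &= \frac{sM_{12}}{R_{s1}L_1C_1s^2 + (R_{s1}R_1C_1 + L_1)s + (R_1 + R_{s1})}
\times \frac{1}{L_2C_2s^2 + R_2C_2s + 1} \\
&= \frac{M_{12}}{R_{s1}L_1C_1L_2C_2} \cdot
\frac{s}{\left(s^2 + \frac{R_{s1}R_1C_1 + L_1}{R_{s1}L_1C_1}s + \frac{R_1 + R_{s1}}{R_{s1}L_1C_1}\right)
\left(s^2 + \frac{R_2}{L_2}s + \frac{1}{L_2C_2}\right)} \\
&= \frac{s\Theta}{(s^2 + 2\zeta_1\omega_1 s + \omega_1^2)(s^2 + 2\zeta_2\omega_2 s + \omega_2^2)}
\end{aligned}
\tag{A.1}
\]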
where
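\[ \Theta = \frac{M_{12}}{R_{s1}L_1C_1L_2C_2} \tag{A.2} \]
\[ \omega_1^2 = \frac{R_1 + R_{s1}}{R_{s1}L_1C_1} \tag{A.3} \]
\[ \omega_2^2 = \frac{1}{L_2C_2} \tag{A.4} \]
\[ \zeta_1 = \frac{R_{s1}R_1C_1 + L_1}{2R_{s1}\sqrt{L_1C_1}} \sqrt{\frac{R_{s1}}{R_1 + R_{s1}}} \tag{A.5} \]
\[ \zeta_2 = \frac{R_2}{2}\sqrt{\frac{C_2}{L_2}} \tag{A.6} \]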
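As a rough numerical illustration of (A.2) to (A.6), the sketch below evaluates the five parameters from a set of component values. The function name `coil_parameters` and all component values are hypothetical, chosen only to exercise the formulas; they are not the coil values used in this work.

```python
import math

def coil_parameters(M12, Rs1, R1, L1, C1, R2, L2, C2):
    """Evaluate (A.2)-(A.6); returns (theta, w1, w2, zeta1, zeta2)."""
    theta = M12 / (Rs1 * L1 * C1 * L2 * C2)                      # (A.2)
    w1 = math.sqrt((R1 + Rs1) / (Rs1 * L1 * C1))                 # (A.3)
    w2 = 1.0 / math.sqrt(L2 * C2)                                # (A.4)
    zeta1 = ((Rs1 * R1 * C1 + L1) / (2.0 * Rs1 * math.sqrt(L1 * C1))
             * math.sqrt(Rs1 / (R1 + Rs1)))                      # (A.5)
    zeta2 = (R2 / 2.0) * math.sqrt(C2 / L2)                      # (A.6)
    return theta, w1, w2, zeta1, zeta2

# Hypothetical component values (SI units), for illustration only.
example = dict(M12=10e-9, Rs1=500.0, R1=1.0, L1=100e-9, C1=10e-12,
               R2=1.0, L2=100e-9, C2=10e-12)
theta, w1, w2, zeta1, zeta2 = coil_parameters(**example)

# The damping terms should reproduce the s-coefficients in the second line of (A.1).
coeff1 = 2.0 * zeta1 * w1   # expected: (Rs1*R1*C1 + L1) / (Rs1*L1*C1)
coeff2 = 2.0 * zeta2 * w2   # expected: R2 / L2
print(f"zeta1={zeta1:.4f}, zeta2={zeta2:.4f}, w1={w1:.4e}, w2={w2:.4e}")
```

Comparing `coeff1` and `coeff2` against the coefficients of s in (A.1) is a quick way to confirm that the damping ratios were transcribed correctly.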
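Notice that the denominator of (A.1) is the product of two second-order factors. Denoting
their roots by $(r_{11}, r_{12})$ and $(r_{21}, r_{22})$, (A.1) can be expanded in partial fractions as:
\[
\begin{aligned}
H_{12}(s) &= \frac{s\Theta}{(s^2 + 2\zeta_1\omega_1 s + \omega_1^2)(s^2 + 2\zeta_2\omega_2 s + \omega_2^2)} \\
&= \frac{s\Theta}{(s - r_{11})(s - r_{12})(s - r_{21})(s - r_{22})} \\
&= \frac{A_1}{s - r_{11}} + \frac{A_2}{s - r_{12}} + \frac{B_1}{s - r_{21}} + \frac{B_2}{s - r_{22}}
\end{aligned}
\tag{A.7}
\]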
where
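\[ A_1 = (s - r_{11})H_{12}(s)\big|_{s=r_{11}} = \frac{\Theta r_{11}}{(r_{11} - r_{12})(r_{11} - r_{21})(r_{11} - r_{22})} \tag{A.8} \]
\[ A_2 = (s - r_{12})H_{12}(s)\big|_{s=r_{12}} = \frac{\Theta r_{12}}{(r_{12} - r_{11})(r_{12} - r_{21})(r_{12} - r_{22})} \tag{A.9} \]
\[ B_1 = (s - r_{21})H_{12}(s)\big|_{s=r_{21}} = \frac{\Theta r_{21}}{(r_{21} - r_{11})(r_{21} - r_{12})(r_{21} - r_{22})} \tag{A.10} \]
\[ B_2 = (s - r_{22})H_{12}(s)\big|_{s=r_{22}} = \frac{\Theta r_{22}}{(r_{22} - r_{11})(r_{22} - r_{12})(r_{22} - r_{21})} \tag{A.11} \]
\[
r_{11,12} =
\begin{cases}
-\zeta_1\omega_1 \pm \omega_1\sqrt{\zeta_1^2 - 1}, & \zeta_1 \ge 1 \\
-\zeta_1\omega_1 \pm j\omega_1\sqrt{1 - \zeta_1^2}, & \zeta_1 < 1
\end{cases}
\tag{A.12}
\]
\[
r_{21,22} =
\begin{cases}
-\zeta_2\omega_2 \pm \omega_2\sqrt{\zeta_2^2 - 1}, & \zeta_2 \ge 1 \\
-\zeta_2\omega_2 \pm j\omega_2\sqrt{1 - \zeta_2^2}, & \zeta_2 < 1
\end{cases}
\tag{A.13}
\]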
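The roots in (A.12) and (A.13) and the residues in (A.8) to (A.11) can be checked numerically. The following is a minimal sketch, assuming four distinct roots (the residue formulas divide by root differences, so a repeated root would fail). The helper names and the normalized parameter values are hypothetical and serve only as an illustration.

```python
import cmath

def second_order_roots(zeta, w):
    """Roots of s^2 + 2*zeta*w*s + w^2, as in (A.12)/(A.13).

    cmath.sqrt returns a real value for zeta >= 1 and a positive
    imaginary value for zeta < 1, so both branches are covered.
    """
    d = w * cmath.sqrt(zeta * zeta - 1.0)
    return -zeta * w + d, -zeta * w - d

def residues(theta, roots):
    """Residues of s*theta / prod(s - r_k) at each root, as in (A.8)-(A.11)."""
    out = []
    for i, r in enumerate(roots):
        denom = 1.0 + 0.0j
        for k, q in enumerate(roots):
            if k != i:
                denom *= r - q      # assumes distinct roots
        out.append(theta * r / denom)
    return out

# Hypothetical normalized parameters (both sections underdamped).
theta, zeta1, w1, zeta2, w2 = 1.0, 0.1, 1.0, 0.05, 1.2
roots = second_order_roots(zeta1, w1) + second_order_roots(zeta2, w2)
res = residues(theta, roots)        # [A1, A2, B1, B2]

# Compare the factored form of (A.1) with the partial fractions of (A.7)
# at an arbitrary test point away from the poles.
s0 = complex(0.3, 0.5)
H_direct = s0 * theta / ((s0**2 + 2*zeta1*w1*s0 + w1**2) *
                         (s0**2 + 2*zeta2*w2*s0 + w2**2))
H_pf = sum(A / (s0 - r) for A, r in zip(res, roots))
print(abs(H_direct - H_pf))
```

Because the numerator of (A.1) is of degree one and the denominator of degree four, the four residues should also sum to zero, which gives a second sanity check.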
I. If $\zeta_1 \ge 1$, $\zeta_2 \ge 1$: In this case, all roots as well as $A_1$, $A_2$, $B_1$, $B_2$ are real. The impulse
response is the inverse Laplace transform of (A.7):
\[ h(t) = A_1 e^{r_{11}t} + A_2 e^{r_{12}t} + B_1 e^{r_{21}t} + B_2 e^{r_{22}t} \tag{A.14} \]
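II. If $\zeta_1 < 1$, $\zeta_2 < 1$: In this case, all the roots as well as $A_1$, $A_2$, $B_1$, $B_2$ are complex.
Notice that $(A_1, A_2)$ and $(B_1, B_2)$ are complex-conjugate pairs:
\[ A_1 = A_2^{*} \tag{A.15} \]
\[ B_1 = B_2^{*} \tag{A.16} \]
Let $A_1 = a_1 + jb_1$, $A_2 = a_1 - jb_1$, $B_1 = a_2 + jb_2$, $B_2 = a_2 - jb_2$, and rewrite
the complex roots as:
\[ r_{11,12} = -\zeta_1\omega_1 \pm j\omega_1\sqrt{1 - \zeta_1^2} = -\tau_1 \pm j\omega_{d1} \tag{A.17} \]
\[ r_{21,22} = -\zeta_2\omega_2 \pm j\omega_2\sqrt{1 - \zeta_2^2} = -\tau_2 \pm j\omega_{d2} \tag{A.18} \]
The inverse Laplace transform of the first second-order system is:
\[
\begin{aligned}
\mathcal{L}_1^{-1} &= \mathcal{L}^{-1}\left[\frac{A_1}{s - r_{11}} + \frac{A_2}{s - r_{12}}\right] \\
&= \mathcal{L}^{-1}\left[\frac{A_1}{s + \tau_1 - j\omega_{d1}} + \frac{A_1^{*}}{s + \tau_1 + j\omega_{d1}}\right] \\
&= A_1 e^{(-\tau_1 + j\omega_{d1})t} + A_1^{*} e^{(-\tau_1 - j\omega_{d1})t} \\
&= e^{-\tau_1 t}\left[A_1 e^{j\omega_{d1}t} + A_1^{*} e^{-j\omega_{d1}t}\right]
\end{aligned}
\tag{A.19}
\]
Expanding (A.19) gives:
\[
\begin{aligned}
\mathcal{L}_1^{-1} &= e^{-\tau_1 t}\left\{(a_1 + jb_1)\left[\cos(\omega_{d1}t) + j\sin(\omega_{d1}t)\right]
+ (a_1 - jb_1)\left[\cos(\omega_{d1}t) - j\sin(\omega_{d1}t)\right]\right\} \\
&= 2e^{-\tau_1 t}\left[a_1\cos(\omega_{d1}t) - b_1\sin(\omega_{d1}t)\right]
\end{aligned}
\tag{A.20}
\]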
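Similarly, the inverse Laplace transform of the other second-order system is:
\[ \mathcal{L}_2^{-1} = 2e^{-\tau_2 t}\left[a_2\cos(\omega_{d2}t) - b_2\sin(\omega_{d2}t)\right] \tag{A.21} \]
The complete impulse response is therefore:
\[
\begin{aligned}
h(t) &= \mathcal{L}_1^{-1} + \mathcal{L}_2^{-1} \\
&= 2e^{-\tau_1 t}\left[a_1\cos(\omega_{d1}t) - b_1\sin(\omega_{d1}t)\right]
+ 2e^{-\tau_2 t}\left[a_2\cos(\omega_{d2}t) - b_2\sin(\omega_{d2}t)\right]
\end{aligned}
\tag{A.22}
\]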
III. If $\zeta_1 < 1$, $\zeta_2 > 1$: In this case, $(r_{11}, r_{12}, A_1, A_2)$ are complex and $(r_{21}, r_{22}, B_1, B_2)$ are
real. Similarly, the impulse response is given by:
\[ h(t) = 2e^{-\tau_1 t}\left[a_1\cos(\omega_{d1}t) - b_1\sin(\omega_{d1}t)\right] + B_1 e^{r_{21}t} + B_2 e^{r_{22}t} \tag{A.23} \]
IV. If $\zeta_1 > 1$, $\zeta_2 < 1$: In this case, $(r_{11}, r_{12}, A_1, A_2)$ are real and $(r_{21}, r_{22}, B_1, B_2)$ are complex.
The impulse response is given by:
\[ h(t) = A_1 e^{r_{11}t} + A_2 e^{r_{12}t} + 2e^{-\tau_2 t}\left[a_2\cos(\omega_{d2}t) - b_2\sin(\omega_{d2}t)\right] \tag{A.24} \]
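The four cases can be cross-checked against one another numerically. The sketch below evaluates h(t) section by section, using the damped-sinusoid form of (A.20) and (A.21) for an underdamped section and real exponentials otherwise, and compares it with the direct sum of the four complex exponentials. All function names and the normalized parameter values are hypothetical, and distinct roots are assumed, so a damping ratio of exactly 1 is not covered.

```python
import cmath
import math

def section_roots(zeta, w):
    # Roots of s^2 + 2*zeta*w*s + w^2; the first root has the positive
    # imaginary part when zeta < 1.
    d = w * cmath.sqrt(zeta * zeta - 1.0)
    return -zeta * w + d, -zeta * w - d

def residues(theta, roots):
    # Residues of s*theta / prod(s - r_k), assuming distinct roots.
    out = []
    for i, r in enumerate(roots):
        denom = 1.0 + 0.0j
        for k, q in enumerate(roots):
            if k != i:
                denom *= r - q
        out.append(theta * r / denom)
    return out

def section_response(zeta, w, res_pair, root_pair, t):
    """Contribution of one second-order section to h(t)."""
    if zeta < 1.0:
        # Underdamped: 2*exp(-tau*t)*(a*cos(wd*t) - b*sin(wd*t)).
        a, b = res_pair[0].real, res_pair[0].imag
        tau = zeta * w
        wd = w * math.sqrt(1.0 - zeta * zeta)
        return 2.0 * math.exp(-tau * t) * (a * math.cos(wd * t) - b * math.sin(wd * t))
    # Overdamped: two real exponentials.
    return sum((A * cmath.exp(r * t)).real for A, r in zip(res_pair, root_pair))

def impulse_response(theta, zeta1, w1, zeta2, w2, t):
    r11, r12 = section_roots(zeta1, w1)
    r21, r22 = section_roots(zeta2, w2)
    A1, A2, B1, B2 = residues(theta, (r11, r12, r21, r22))
    return (section_response(zeta1, w1, (A1, A2), (r11, r12), t)
            + section_response(zeta2, w2, (B1, B2), (r21, r22), t))

def impulse_response_complex(theta, zeta1, w1, zeta2, w2, t):
    # Reference: direct sum of the four complex exponentials.
    roots = section_roots(zeta1, w1) + section_roots(zeta2, w2)
    res = residues(theta, roots)
    return sum(A * cmath.exp(r * t) for A, r in zip(res, roots)).real

# Hypothetical normalized example, one (zeta1, zeta2) pair per case.
for z1, z2 in [(1.5, 1.5), (0.1, 0.05), (0.1, 1.5), (1.5, 0.05)]:
    print(z1, z2, impulse_response(1.0, z1, 1.0, z2, 1.5, 1.0))
```

Dispatching per section rather than per case keeps the code short: each of the four expressions for h(t) is the sum of one term for the first section and one for the second.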
CURRICULUM VITAE
Yi Luo
Published Journal Articles
• Chris Winstead, Abiezer Tejeda, Eduardo Monzon, Yi Luo, “An error correction
method for binary and multiple-valued logic,” Journal of Multiple Valued Logic and
Soft Computing, in press, 2012. IF 0.343.
• Hui Zhao, Yi Luo, Xiaoxing Zhang, Yujie Dai, Yingjie Lv, “1.5Gb/s Transceiver
Design for SubLVDS Interface,” Journal of Microelectronics and Computer, China,
CN 61-1123/TN, No.090311, 2009.
Published Conference Papers
• Yi Luo, Chris Winstead and Patrick Chiang, “125Mbps Ultra-Wideband System Eval-
uation for Cortical Implant Devices,” IEEE Engineering in Medicine and Biology
Conference (EMBC), Aug 2012.
• Chris Winstead, Yi Luo, “Error correction circuits for bio-implantable electronics,”
IEEE Midwest Symposium on Circuits and Systems (MWSCAS), Invited Paper, Aug
2012.
• Chris Winstead, Abiezer Tejeda, Eduardo Monzon, Yi Luo, “An error-correction
method for binary and multiple-valued logic,” IEEE International Symposium on
Multiple-Valued Logic, Tuusula, Finland, May 2011.
